// SPDX-License-Identifier: GPL-2.0+
/*
 * NILFS disk address translation.
 *
 * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
 *
 * Written by Koji Sato.
 */

#include <linux/types.h>
#include <linux/buffer_head.h>
#include <linux/string.h>
#include <linux/errno.h>
#include "nilfs.h"
#include "mdt.h"
#include "alloc.h"
#include "dat.h"

#define NILFS_CNO_MIN	((__u64)1)
#define NILFS_CNO_MAX	(~(__u64)0)

/**
 * struct nilfs_dat_info - on-memory private data of DAT file
 * @mi: on-memory private data of metadata file
 * @palloc_cache: persistent object allocator cache of DAT file
 * @shadow: shadow map of DAT file
 */
struct nilfs_dat_info {
	struct nilfs_mdt_info mi;
	struct nilfs_palloc_cache palloc_cache;
	struct nilfs_shadow_map shadow;
};

static inline struct nilfs_dat_info *NILFS_DAT_I(struct inode *dat)
{
	return (struct nilfs_dat_info *)NILFS_MDT(dat);
}

static int nilfs_dat_prepare_entry(struct inode *dat,
				   struct nilfs_palloc_req *req, int create)
{
	int ret;

	ret = nilfs_palloc_get_entry_block(dat, req->pr_entry_nr,
					   create, &req->pr_entry_bh);
	if (unlikely(ret == -ENOENT)) {
		nilfs_err(dat->i_sb,
			  "DAT doesn't have a block to manage vblocknr = %llu",
			  (unsigned long long)req->pr_entry_nr);
		/*
		 * Return internal code -EINVAL to notify bmap layer of
		 * metadata corruption.
		 */
		ret = -EINVAL;
	}
	return ret;
}

static void nilfs_dat_commit_entry(struct inode *dat,
				   struct nilfs_palloc_req *req)
{
	mark_buffer_dirty(req->pr_entry_bh);
	nilfs_mdt_mark_dirty(dat);
	brelse(req->pr_entry_bh);
}

static void nilfs_dat_abort_entry(struct inode *dat,
				  struct nilfs_palloc_req *req)
{
	brelse(req->pr_entry_bh);
}

int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
	int ret;

	ret = nilfs_palloc_prepare_alloc_entry(dat, req, true);
	if (ret < 0)
		return ret;

	ret = nilfs_dat_prepare_entry(dat, req, 1);
	if (ret < 0)
		nilfs_palloc_abort_alloc_entry(dat, req);

	return ret;
}

void nilfs_dat_commit_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
	struct nilfs_dat_entry *entry;
	size_t offset;

	offset = nilfs_palloc_entry_offset(dat, req->pr_entry_nr,
					   req->pr_entry_bh);
	entry = kmap_local_folio(req->pr_entry_bh->b_folio, offset);
	entry->de_start = cpu_to_le64(NILFS_CNO_MIN);
	entry->de_end = cpu_to_le64(NILFS_CNO_MAX);
	entry->de_blocknr = cpu_to_le64(0);
	kunmap_local(entry);

	nilfs_palloc_commit_alloc_entry(dat, req);
	nilfs_dat_commit_entry(dat, req);
}

void nilfs_dat_abort_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
	nilfs_dat_abort_entry(dat, req);
	nilfs_palloc_abort_alloc_entry(dat, req);
}

static void nilfs_dat_commit_free(struct inode *dat,
				  struct nilfs_palloc_req *req)
{
	struct nilfs_dat_entry *entry;
	size_t offset;

	offset = nilfs_palloc_entry_offset(dat, req->pr_entry_nr,
					   req->pr_entry_bh);
	entry = kmap_local_folio(req->pr_entry_bh->b_folio, offset);
	entry->de_start = cpu_to_le64(NILFS_CNO_MIN);
	entry->de_end = cpu_to_le64(NILFS_CNO_MIN);
	entry->de_blocknr = cpu_to_le64(0);
	kunmap_local(entry);

	nilfs_dat_commit_entry(dat, req);

	if (unlikely(req->pr_desc_bh == NULL || req->pr_bitmap_bh == NULL)) {
		nilfs_error(dat->i_sb,
			    "state inconsistency probably due to duplicate use of vblocknr = %llu",
			    (unsigned long long)req->pr_entry_nr);
		return;
	}
	nilfs_palloc_commit_free_entry(dat, req);
}

int nilfs_dat_prepare_start(struct inode *dat, struct nilfs_palloc_req *req)
{
	return nilfs_dat_prepare_entry(dat, req, 0);
}

void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req,
			    sector_t blocknr)
{
	struct nilfs_dat_entry *entry;
	size_t offset;

	offset = nilfs_palloc_entry_offset(dat, req->pr_entry_nr,
					   req->pr_entry_bh);
	entry = kmap_local_folio(req->pr_entry_bh->b_folio, offset);
	entry->de_start = cpu_to_le64(nilfs_mdt_cno(dat));
	entry->de_blocknr = cpu_to_le64(blocknr);
	kunmap_local(entry);

	nilfs_dat_commit_entry(dat, req);
}

int nilfs_dat_prepare_end(struct inode *dat, struct nilfs_palloc_req *req)
{
	struct nilfs_dat_entry *entry;
	__u64 start;
	sector_t blocknr;
	size_t offset;
	int ret;

	ret = nilfs_dat_prepare_entry(dat, req, 0);
	if (ret < 0)
		return ret;

	offset = nilfs_palloc_entry_offset(dat, req->pr_entry_nr,
					   req->pr_entry_bh);
	entry = kmap_local_folio(req->pr_entry_bh->b_folio, offset);
	start = le64_to_cpu(entry->de_start);
	blocknr = le64_to_cpu(entry->de_blocknr);
	kunmap_local(entry);

	if (blocknr == 0) {
		ret = nilfs_palloc_prepare_free_entry(dat, req);
		if (ret < 0) {
			nilfs_dat_abort_entry(dat, req);
			return ret;
		}
	}
	if (unlikely(start > nilfs_mdt_cno(dat))) {
		nilfs_err(dat->i_sb,
			  "vblocknr = %llu has abnormal lifetime: start cno (= %llu) > current cno (= %llu)",
			  (unsigned long long)req->pr_entry_nr,
			  (unsigned long long)start,
			  (unsigned long long)nilfs_mdt_cno(dat));
		nilfs_dat_abort_entry(dat, req);
		return -EINVAL;
	}

	return 0;
}

void nilfs_dat_commit_end(struct inode *dat, struct nilfs_palloc_req *req,
			  int dead)
{
	struct nilfs_dat_entry *entry;
	__u64 start, end;
	sector_t blocknr;
	size_t offset;

	offset = nilfs_palloc_entry_offset(dat, req->pr_entry_nr,
					   req->pr_entry_bh);
	entry = kmap_local_folio(req->pr_entry_bh->b_folio, offset);
	end = start = le64_to_cpu(entry->de_start);
	if (!dead) {
		end = nilfs_mdt_cno(dat);
		WARN_ON(start > end);
	}
	entry->de_end = cpu_to_le64(end);
	blocknr = le64_to_cpu(entry->de_blocknr);
	kunmap_local(entry);

	if (blocknr == 0)
		nilfs_dat_commit_free(dat, req);
	else
		nilfs_dat_commit_entry(dat, req);
}

void nilfs_dat_abort_end(struct inode *dat, struct nilfs_palloc_req *req)
{
	struct nilfs_dat_entry *entry;
	__u64 start;
	sector_t blocknr;
	size_t offset;

	offset = nilfs_palloc_entry_offset(dat, req->pr_entry_nr,
					   req->pr_entry_bh);
	entry = kmap_local_folio(req->pr_entry_bh->b_folio, offset);
	start = le64_to_cpu(entry->de_start);
	blocknr = le64_to_cpu(entry->de_blocknr);
	kunmap_local(entry);

	if (start == nilfs_mdt_cno(dat) && blocknr == 0)
		nilfs_palloc_abort_free_entry(dat, req);
	nilfs_dat_abort_entry(dat, req);
}

int nilfs_dat_prepare_update(struct inode *dat,
			     struct nilfs_palloc_req *oldreq,
			     struct nilfs_palloc_req *newreq)
{
	int ret;

	ret = nilfs_dat_prepare_end(dat, oldreq);
	if (!ret) {
		ret = nilfs_dat_prepare_alloc(dat, newreq);
		if (ret < 0)
			nilfs_dat_abort_end(dat, oldreq);
	}
	return ret;
}

void nilfs_dat_commit_update(struct inode *dat,
			     struct nilfs_palloc_req *oldreq,
			     struct nilfs_palloc_req *newreq, int dead)
{
	nilfs_dat_commit_end(dat, oldreq, dead);
	nilfs_dat_commit_alloc(dat, newreq);
}

void nilfs_dat_abort_update(struct inode *dat,
			    struct nilfs_palloc_req *oldreq,
			    struct nilfs_palloc_req *newreq)
{
	nilfs_dat_abort_end(dat, oldreq);
	nilfs_dat_abort_alloc(dat, newreq);
}

/**
 * nilfs_dat_mark_dirty - mark the DAT block buffer containing the specified
 *                        virtual block address entry as dirty
 * @dat: DAT file inode
 * @vblocknr: virtual block number
 *
 * Return: 0 on success, or one of the following negative error codes on
 * failure:
 * * %-EINVAL	- Invalid DAT entry (internal code).
 * * %-EIO	- I/O error (including metadata corruption).
 * * %-ENOMEM	- Insufficient memory available.
 */
int nilfs_dat_mark_dirty(struct inode *dat, __u64 vblocknr)
{
	struct nilfs_palloc_req req;
	int ret;

	req.pr_entry_nr = vblocknr;
	ret = nilfs_dat_prepare_entry(dat, &req, 0);
	if (ret == 0)
		nilfs_dat_commit_entry(dat, &req);
	return ret;
}

/**
 * nilfs_dat_freev - free virtual block numbers
 * @dat: DAT file inode
 * @vblocknrs: array of virtual block numbers
 * @nitems: number of virtual block numbers
 *
 * Description: nilfs_dat_freev() frees the virtual block numbers specified by
 * @vblocknrs and @nitems.
 *
 * Return: 0 on success, or one of the following negative error codes on
 * failure:
 * * %-EIO	- I/O error (including metadata corruption).
 * * %-ENOENT	- The virtual block numbers have not been allocated.
 * * %-ENOMEM	- Insufficient memory available.
 */
int nilfs_dat_freev(struct inode *dat, __u64 *vblocknrs, size_t nitems)
{
	return nilfs_palloc_freev(dat, vblocknrs, nitems);
}

/**
 * nilfs_dat_move - change a block number
 * @dat: DAT file inode
 * @vblocknr: virtual block number
 * @blocknr: block number
 *
 * Description: nilfs_dat_move() changes the block number associated with
 * @vblocknr to @blocknr.
 *
 * Return: 0 on success, or one of the following negative error codes on
 * failure:
 * * %-EIO	- I/O error (including metadata corruption).
 * * %-ENOMEM	- Insufficient memory available.
 */
int nilfs_dat_move(struct inode *dat, __u64 vblocknr, sector_t blocknr)
{
	struct buffer_head *entry_bh;
	struct nilfs_dat_entry *entry;
	size_t offset;
	int ret;

	ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh);
	if (ret < 0)
		return ret;

	/*
	 * The given disk block number (blocknr) is not yet written to
	 * the device at this point.
	 *
	 * To prevent nilfs_dat_translate() from returning the
	 * uncommitted block number, this makes a copy of the entry
	 * buffer and redirects nilfs_dat_translate() to the copy.
	 */
	if (!buffer_nilfs_redirected(entry_bh)) {
		ret = nilfs_mdt_freeze_buffer(dat, entry_bh);
		if (ret) {
			brelse(entry_bh);
			return ret;
		}
	}

	offset = nilfs_palloc_entry_offset(dat, vblocknr, entry_bh);
	entry = kmap_local_folio(entry_bh->b_folio, offset);
	if (unlikely(entry->de_blocknr == cpu_to_le64(0))) {
		nilfs_crit(dat->i_sb, "%s: invalid vblocknr = %llu, [%llu, %llu)",
			   __func__, (unsigned long long)vblocknr,
			   (unsigned long long)le64_to_cpu(entry->de_start),
			   (unsigned long long)le64_to_cpu(entry->de_end));
		kunmap_local(entry);
		brelse(entry_bh);
		return -EINVAL;
	}
	WARN_ON(blocknr == 0);
	entry->de_blocknr = cpu_to_le64(blocknr);
	kunmap_local(entry);

	mark_buffer_dirty(entry_bh);
	nilfs_mdt_mark_dirty(dat);

	brelse(entry_bh);

	return 0;
}

/**
 * nilfs_dat_translate - translate a virtual block number to a block number
 * @dat: DAT file inode
 * @vblocknr: virtual block number
 * @blocknrp: pointer to a block number
 *
 * Description: nilfs_dat_translate() maps the virtual block number @vblocknr
 * to the corresponding block number. The block number associated with
 * @vblocknr is stored in the place pointed to by @blocknrp.
 *
 * Return: 0 on success, or one of the following negative error codes on
 * failure:
 * * %-EIO	- I/O error (including metadata corruption).
 * * %-ENOENT	- A block number associated with @vblocknr does not exist.
 * * %-ENOMEM	- Insufficient memory available.
 */
int nilfs_dat_translate(struct inode *dat, __u64 vblocknr, sector_t *blocknrp)
{
	struct buffer_head *entry_bh, *bh;
	struct nilfs_dat_entry *entry;
	sector_t blocknr;
	size_t offset;
	int ret;

	ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh);
	if (ret < 0)
		return ret;

	if (!nilfs_doing_gc() && buffer_nilfs_redirected(entry_bh)) {
		bh = nilfs_mdt_get_frozen_buffer(dat, entry_bh);
		if (bh) {
			WARN_ON(!buffer_uptodate(bh));
			brelse(entry_bh);
			entry_bh = bh;
		}
	}

	offset = nilfs_palloc_entry_offset(dat, vblocknr, entry_bh);
	entry = kmap_local_folio(entry_bh->b_folio, offset);
	blocknr = le64_to_cpu(entry->de_blocknr);
	if (blocknr == 0) {
		ret = -ENOENT;
		goto out;
	}
	*blocknrp = blocknr;

out:
	kunmap_local(entry);
	brelse(entry_bh);
	return ret;
}

ssize_t nilfs_dat_get_vinfo(struct inode *dat, void *buf, unsigned int visz,
			    size_t nvi)
{
	struct buffer_head *entry_bh;
	struct nilfs_dat_entry *entry, *first_entry;
	struct nilfs_vinfo *vinfo = buf;
	__u64 first, last;
	size_t offset;
	unsigned long entries_per_block = NILFS_MDT(dat)->mi_entries_per_block;
	unsigned int entry_size = NILFS_MDT(dat)->mi_entry_size;
	int i, j, n, ret;

	for (i = 0; i < nvi; i += n) {
		ret = nilfs_palloc_get_entry_block(dat, vinfo->vi_vblocknr,
						   0, &entry_bh);
		if (ret < 0)
			return ret;

		first = vinfo->vi_vblocknr;
		first = div64_ul(first, entries_per_block);
		first *= entries_per_block;
		/* first virtual block number in this block */

		last = first + entries_per_block - 1;
		/* last virtual block number in this block */

		offset = nilfs_palloc_entry_offset(dat, first, entry_bh);
		first_entry = kmap_local_folio(entry_bh->b_folio, offset);
		for (j = i, n = 0;
		     j < nvi && vinfo->vi_vblocknr >= first &&
			     vinfo->vi_vblocknr <= last;
		     j++, n++, vinfo = (void *)vinfo + visz) {
			entry = (void *)first_entry +
				(vinfo->vi_vblocknr - first) * entry_size;
			vinfo->vi_start = le64_to_cpu(entry->de_start);
			vinfo->vi_end = le64_to_cpu(entry->de_end);
			vinfo->vi_blocknr = le64_to_cpu(entry->de_blocknr);
		}
		kunmap_local(first_entry);
		brelse(entry_bh);
	}

	return nvi;
}

/**
 * nilfs_dat_read - read or get dat inode
 * @sb: super block instance
 * @entry_size: size of a dat entry
 * @raw_inode: on-disk dat inode
 * @inodep: buffer to store the inode
 *
 * Return: 0 on success, or a negative error code on failure.
 */
int nilfs_dat_read(struct super_block *sb, size_t entry_size,
		   struct nilfs_inode *raw_inode, struct inode **inodep)
{
	static struct lock_class_key dat_lock_key;
	struct inode *dat;
	struct nilfs_dat_info *di;
	int err;

	if (entry_size > sb->s_blocksize) {
		nilfs_err(sb, "too large DAT entry size: %zu bytes",
			  entry_size);
		return -EINVAL;
	} else if (entry_size < NILFS_MIN_DAT_ENTRY_SIZE) {
		nilfs_err(sb, "too small DAT entry size: %zu bytes",
			  entry_size);
		return -EINVAL;
	}

	dat = nilfs_iget_locked(sb, NULL, NILFS_DAT_INO);
	if (unlikely(!dat))
		return -ENOMEM;
	if (!(dat->i_state & I_NEW))
		goto out;

	err = nilfs_mdt_init(dat, NILFS_MDT_GFP, sizeof(*di));
	if (err)
		goto failed;

	err = nilfs_palloc_init_blockgroup(dat, entry_size);
	if (err)
		goto failed;

	di = NILFS_DAT_I(dat);
	lockdep_set_class(&di->mi.mi_sem, &dat_lock_key);
	nilfs_palloc_setup_cache(dat, &di->palloc_cache);
	err = nilfs_mdt_setup_shadow_map(dat, &di->shadow);
	if (err)
		goto failed;

	err = nilfs_read_inode_common(dat, raw_inode);
	if (err)
		goto failed;

	unlock_new_inode(dat);
out:
	*inodep = dat;
	return 0;
failed:
	iget_failed(dat);
	return err;
}
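As a usage note for the translation API above, a minimal caller sketch (illustrative only; example_dat_lookup() is a hypothetical helper, and it assumes the caller already holds the DAT inode and whatever serialization the bmap layer normally provides):

/* Hypothetical helper, not part of dat.c: resolve one virtual block number. */
static int example_dat_lookup(struct inode *dat, __u64 vblocknr)
{
	sector_t pblocknr;
	int err;

	err = nilfs_dat_translate(dat, vblocknr, &pblocknr);
	if (err)
		return err;	/* -ENOENT, -EIO or -ENOMEM per the kernel-doc */

	pr_info("vblocknr %llu -> blocknr %llu\n",
		(unsigned long long)vblocknr,
		(unsigned long long)pblocknr);
	return 0;
}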
/*
 * net/tipc/bearer.c: TIPC bearer code
 *
 * Copyright (c) 1996-2006, 2013-2016, Ericsson AB
 * Copyright (c) 2004-2006, 2010-2013, Wind River Systems
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the names of the copyright holders nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
 *
 * Alternatively, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") version 2 as published by the Free
 * Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

#include <net/sock.h>
#include "core.h"
#include "bearer.h"
#include "link.h"
#include "discover.h"
#include "monitor.h"
#include "bcast.h"
#include "netlink.h"
#include "udp_media.h"
#include "trace.h"
#include "crypto.h"

#define MAX_ADDR_STR 60

static struct tipc_media * const media_info_array[] = {
	&eth_media_info,
#ifdef CONFIG_TIPC_MEDIA_IB
	&ib_media_info,
#endif
#ifdef CONFIG_TIPC_MEDIA_UDP
	&udp_media_info,
#endif
	NULL
};

static struct tipc_bearer *bearer_get(struct net *net, int bearer_id)
{
	struct tipc_net *tn = tipc_net(net);

	return rcu_dereference(tn->bearer_list[bearer_id]);
}

static void bearer_disable(struct net *net, struct tipc_bearer *b);
static int tipc_l2_rcv_msg(struct sk_buff *skb, struct net_device *dev,
			   struct packet_type *pt,
			   struct net_device *orig_dev);

/**
 * tipc_media_find - locates specified media object by name
 * @name: name to locate
 */
struct tipc_media *tipc_media_find(const char *name)
{
	u32 i;

	for (i = 0; media_info_array[i] != NULL; i++) {
		if (!strcmp(media_info_array[i]->name, name))
			break;
	}
	return media_info_array[i];
}

/**
 * media_find_id - locates specified media object by type identifier
 * @type: type identifier to locate
 */
static struct tipc_media *media_find_id(u8 type)
{
	u32 i;

	for (i = 0; media_info_array[i] != NULL; i++) {
		if (media_info_array[i]->type_id == type)
			break;
	}
	return media_info_array[i];
}

/**
 * tipc_media_addr_printf - record media address in print buffer
 * @buf: output buffer
 * @len: output buffer size remaining
 * @a: input media address
 */
int tipc_media_addr_printf(char *buf, int len, struct tipc_media_addr *a)
{
	char addr_str[MAX_ADDR_STR];
	struct tipc_media *m;
	int ret;

	m = media_find_id(a->media_id);

	if (m && !m->addr2str(a, addr_str, sizeof(addr_str)))
		ret = scnprintf(buf, len, "%s(%s)", m->name, addr_str);
	else {
		u32 i;

		ret = scnprintf(buf, len, "UNKNOWN(%u)", a->media_id);
		for (i = 0; i < sizeof(a->value); i++)
			ret += scnprintf(buf + ret, len - ret,
					 "-%x", a->value[i]);
	}
	return ret;
}

/**
 * bearer_name_validate - validate & (optionally) deconstruct bearer name
 * @name: ptr to bearer name string
 * @name_parts: ptr to area for bearer name components (or NULL if not needed)
 *
 * Return: 1 if bearer name is valid, otherwise 0.
 */
static int bearer_name_validate(const char *name,
				struct tipc_bearer_names *name_parts)
{
	char name_copy[TIPC_MAX_BEARER_NAME];
	char *media_name;
	char *if_name;
	u32 media_len;
	u32 if_len;

	/* copy bearer name & ensure length is OK */
	if (strscpy(name_copy, name, TIPC_MAX_BEARER_NAME) < 0)
		return 0;

	/* ensure all component parts of bearer name are present */
	media_name = name_copy;
	if_name = strchr(media_name, ':');
	if (if_name == NULL)
		return 0;
	*(if_name++) = 0;
	media_len = if_name - media_name;
	if_len = strlen(if_name) + 1;

	/* validate component parts of bearer name */
	if ((media_len <= 1) || (media_len > TIPC_MAX_MEDIA_NAME) ||
	    (if_len <= 1) || (if_len > TIPC_MAX_IF_NAME))
		return 0;

	/* return bearer name components, if necessary */
	if (name_parts) {
		if (strscpy(name_parts->media_name, media_name,
			    TIPC_MAX_MEDIA_NAME) < 0)
			return 0;
		if (strscpy(name_parts->if_name, if_name,
			    TIPC_MAX_IF_NAME) < 0)
			return 0;
	}
	return 1;
}

/**
 * tipc_bearer_find - locates bearer object with matching bearer name
 * @net: the applicable net namespace
 * @name: bearer name to locate
 */
struct tipc_bearer *tipc_bearer_find(struct net *net, const char *name)
{
	struct tipc_net *tn = tipc_net(net);
	struct tipc_bearer *b;
	u32 i;

	for (i = 0; i < MAX_BEARERS; i++) {
		b = rtnl_dereference(tn->bearer_list[i]);
		if (b && (!strcmp(b->name, name)))
			return b;
	}
	return NULL;
}

/* tipc_bearer_get_name - get the bearer name from its id.
 * @net: network namespace
 * @name: a pointer to the buffer where the name will be stored.
 * @bearer_id: the id to get the name from.
 */
int tipc_bearer_get_name(struct net *net, char *name, u32 bearer_id)
{
	struct tipc_net *tn = tipc_net(net);
	struct tipc_bearer *b;

	if (bearer_id >= MAX_BEARERS)
		return -EINVAL;

	b = rtnl_dereference(tn->bearer_list[bearer_id]);
	if (!b)
		return -EINVAL;

	strcpy(name, b->name);
	return 0;
}

void tipc_bearer_add_dest(struct net *net, u32 bearer_id, u32 dest)
{
	struct tipc_bearer *b;

	rcu_read_lock();
	b = bearer_get(net, bearer_id);
	if (b)
		tipc_disc_add_dest(b->disc);
	rcu_read_unlock();
}

void tipc_bearer_remove_dest(struct net *net, u32 bearer_id, u32 dest)
{
	struct tipc_bearer *b;

	rcu_read_lock();
	b = bearer_get(net, bearer_id);
	if (b)
		tipc_disc_remove_dest(b->disc);
	rcu_read_unlock();
}

/**
 * tipc_enable_bearer - enable bearer with the given name
 * @net: the applicable net namespace
 * @name: bearer name to enable
 * @disc_domain: bearer domain
 * @prio: bearer priority
 * @attr: nlattr array
 * @extack: netlink extended ack
 */
static int tipc_enable_bearer(struct net *net, const char *name,
			      u32 disc_domain, u32 prio,
			      struct nlattr *attr[],
			      struct netlink_ext_ack *extack)
{
	struct tipc_net *tn = tipc_net(net);
	struct tipc_bearer_names b_names;
	int with_this_prio = 1;
	struct tipc_bearer *b;
	struct tipc_media *m;
	struct sk_buff *skb;
	int bearer_id = 0;
	int res = -EINVAL;
	char *errstr = "";
	u32 i;

	if (!bearer_name_validate(name, &b_names)) {
		NL_SET_ERR_MSG(extack, "Illegal name");
		return res;
	}

	if (prio > TIPC_MAX_LINK_PRI && prio != TIPC_MEDIA_LINK_PRI) {
		errstr = "illegal priority";
		NL_SET_ERR_MSG(extack, "Illegal priority");
		goto rejected;
	}

	m = tipc_media_find(b_names.media_name);
	if (!m) {
		errstr = "media not registered";
		NL_SET_ERR_MSG(extack, "Media not registered");
		goto rejected;
	}

	if (prio == TIPC_MEDIA_LINK_PRI)
		prio = m->priority;

	/* Check new bearer vs existing ones and find free bearer id if any */
	bearer_id = MAX_BEARERS;
	i = MAX_BEARERS;
	while (i-- != 0) {
		b = rtnl_dereference(tn->bearer_list[i]);
		if (!b) {
			bearer_id = i;
			continue;
		}
		if (!strcmp(name, b->name)) {
			errstr = "already enabled";
			NL_SET_ERR_MSG(extack, "Already enabled");
			goto rejected;
		}

		if (b->priority == prio &&
		    (++with_this_prio > 2)) {
			pr_warn("Bearer <%s>: already 2 bearers with priority %u\n",
				name, prio);

			if (prio == TIPC_MIN_LINK_PRI) {
				errstr = "cannot adjust to lower";
				NL_SET_ERR_MSG(extack, "Cannot adjust to lower");
				goto rejected;
			}

			pr_warn("Bearer <%s>: trying with adjusted priority\n",
				name);
			prio--;
			bearer_id = MAX_BEARERS;
			i = MAX_BEARERS;
			with_this_prio = 1;
		}
	}

	if (bearer_id >= MAX_BEARERS) {
		errstr = "max 3 bearers permitted";
		NL_SET_ERR_MSG(extack, "Max 3 bearers permitted");
		goto rejected;
	}

	b = kzalloc(sizeof(*b), GFP_ATOMIC);
	if (!b)
		return -ENOMEM;

	strscpy(b->name, name);
	b->media = m;
	res = m->enable_media(net, b, attr);
	if (res) {
		kfree(b);
		errstr = "failed to enable media";
		NL_SET_ERR_MSG(extack, "Failed to enable media");
		goto rejected;
	}

	b->identity = bearer_id;
	b->tolerance = m->tolerance;
	b->min_win = m->min_win;
	b->max_win = m->max_win;
	b->domain = disc_domain;
	b->net_plane = bearer_id + 'A';
	b->priority = prio;
	refcount_set(&b->refcnt, 1);

	res = tipc_disc_create(net, b, &b->bcast_addr, &skb);
	if (res) {
		bearer_disable(net, b);
		errstr = "failed to create discoverer";
		NL_SET_ERR_MSG(extack, "Failed to create discoverer");
		goto rejected;
	}

	/* Create monitoring data before accepting activate messages */
	if (tipc_mon_create(net, bearer_id)) {
		bearer_disable(net, b);
		kfree_skb(skb);
		return -ENOMEM;
	}

	test_and_set_bit_lock(0, &b->up);
	rcu_assign_pointer(tn->bearer_list[bearer_id], b);
	if (skb)
		tipc_bearer_xmit_skb(net, bearer_id, skb, &b->bcast_addr);

	pr_info("Enabled bearer <%s>, priority %u\n", name, prio);

	return res;
rejected:
	pr_warn("Enabling of bearer <%s> rejected, %s\n", name, errstr);
	return res;
}

/**
 * tipc_reset_bearer - Reset all links established over this bearer
 * @net: the applicable net namespace
 * @b: the target bearer
 */
static int tipc_reset_bearer(struct net *net, struct tipc_bearer *b)
{
	pr_info("Resetting bearer <%s>\n", b->name);
	tipc_node_delete_links(net, b->identity);
	tipc_disc_reset(net, b);
	return 0;
}

bool tipc_bearer_hold(struct tipc_bearer *b)
{
	return (b && refcount_inc_not_zero(&b->refcnt));
}

void tipc_bearer_put(struct tipc_bearer *b)
{
	if (b && refcount_dec_and_test(&b->refcnt))
		kfree_rcu(b, rcu);
}

/**
 * bearer_disable - disable this bearer
 * @net: the applicable net namespace
 * @b: the bearer to disable
 *
 * Note: This routine assumes caller holds RTNL lock.
 */
static void bearer_disable(struct net *net, struct tipc_bearer *b)
{
	struct tipc_net *tn = tipc_net(net);
	int bearer_id = b->identity;

	pr_info("Disabling bearer <%s>\n", b->name);
	clear_bit_unlock(0, &b->up);
	tipc_node_delete_links(net, bearer_id);
	b->media->disable_media(b);
	RCU_INIT_POINTER(b->media_ptr, NULL);
	if (b->disc)
		tipc_disc_delete(b->disc);
	RCU_INIT_POINTER(tn->bearer_list[bearer_id], NULL);
	tipc_bearer_put(b);
	tipc_mon_delete(net, bearer_id);
}

int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
			 struct nlattr *attr[])
{
	char *dev_name = strchr((const char *)b->name, ':') + 1;
	int hwaddr_len = b->media->hwaddr_len;
	u8 node_id[NODE_ID_LEN] = {0,};
	struct net_device *dev;

	/* Find device with specified name */
	dev = dev_get_by_name(net, dev_name);
	if (!dev)
		return -ENODEV;
	if (tipc_mtu_bad(dev)) {
		dev_put(dev);
		return -EINVAL;
	}
	if (dev == net->loopback_dev) {
		dev_put(dev);
		pr_info("Enabling <%s> not permitted\n", b->name);
		return -EINVAL;
	}

	/* Autoconfigure own node identity if needed */
	if (!tipc_own_id(net) && hwaddr_len <= NODE_ID_LEN) {
		memcpy(node_id, dev->dev_addr, hwaddr_len);
		tipc_net_init(net, node_id, 0);
	}
	if (!tipc_own_id(net)) {
		dev_put(dev);
		pr_warn("Failed to obtain node identity\n");
		return -EINVAL;
	}

	/* Associate TIPC bearer with L2 bearer */
	rcu_assign_pointer(b->media_ptr, dev);
	b->pt.dev = dev;
	b->pt.type = htons(ETH_P_TIPC);
	b->pt.func = tipc_l2_rcv_msg;
	dev_add_pack(&b->pt);
	memset(&b->bcast_addr, 0, sizeof(b->bcast_addr));
	memcpy(b->bcast_addr.value, dev->broadcast, hwaddr_len);
	b->bcast_addr.media_id = b->media->type_id;
	b->bcast_addr.broadcast = TIPC_BROADCAST_SUPPORT;
	b->mtu = dev->mtu;
	b->media->raw2addr(b, &b->addr, (const char *)dev->dev_addr);
	rcu_assign_pointer(dev->tipc_ptr, b);
	return 0;
}

/* tipc_disable_l2_media - detach TIPC bearer from an L2 interface
 * @b: the target bearer
 *
 * Mark L2 bearer as inactive so that incoming buffers are thrown away
 */
void tipc_disable_l2_media(struct tipc_bearer *b)
{
	struct net_device *dev;

	dev = (struct net_device *)rtnl_dereference(b->media_ptr);
	dev_remove_pack(&b->pt);
	RCU_INIT_POINTER(dev->tipc_ptr, NULL);
	synchronize_net();
	dev_put(dev);
}

/**
 * tipc_l2_send_msg - send a TIPC packet out over an L2 interface
 * @net: the associated network namespace
 * @skb: the packet to be sent
 * @b: the bearer through which the packet is to be sent
 * @dest: peer destination address
 */
int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
		     struct tipc_bearer *b, struct tipc_media_addr *dest)
{
	struct net_device *dev;
	int delta;

	dev = (struct net_device *)rcu_dereference(b->media_ptr);
	if (!dev)
		return 0;

	delta = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb));
	if ((delta > 0) && pskb_expand_head(skb, delta, 0, GFP_ATOMIC)) {
		kfree_skb(skb);
		return 0;
	}
	skb_reset_network_header(skb);
	skb->dev = dev;
	skb->protocol = htons(ETH_P_TIPC);
	dev_hard_header(skb, dev, ETH_P_TIPC, dest->value,
			dev->dev_addr, skb->len);
	dev_queue_xmit(skb);
	return 0;
}

bool tipc_bearer_bcast_support(struct net *net, u32 bearer_id)
{
	bool supp = false;
	struct tipc_bearer *b;

	rcu_read_lock();
	b = bearer_get(net, bearer_id);
	if (b)
		supp = (b->bcast_addr.broadcast == TIPC_BROADCAST_SUPPORT);
	rcu_read_unlock();
	return supp;
}

int tipc_bearer_mtu(struct net *net, u32 bearer_id)
{
	int mtu = 0;
	struct tipc_bearer *b;

	rcu_read_lock();
	b = bearer_get(net, bearer_id);
	if (b)
		mtu = b->mtu;
	rcu_read_unlock();
	return mtu;
}

int tipc_bearer_min_mtu(struct net *net, u32 bearer_id)
{
	int mtu = TIPC_MIN_BEARER_MTU;
	struct tipc_bearer *b;

	rcu_read_lock();
	b = bearer_get(net, bearer_id);
	if (b)
		mtu += b->encap_hlen;
	rcu_read_unlock();
	return mtu;
}

/* tipc_bearer_xmit_skb - sends buffer to destination over bearer */
void tipc_bearer_xmit_skb(struct net *net, u32 bearer_id,
			  struct sk_buff *skb,
			  struct tipc_media_addr *dest)
{
	struct tipc_msg *hdr = buf_msg(skb);
	struct tipc_bearer *b;

	rcu_read_lock();
	b = bearer_get(net, bearer_id);
	if (likely(b && (test_bit(0, &b->up) || msg_is_reset(hdr)))) {
#ifdef CONFIG_TIPC_CRYPTO
		tipc_crypto_xmit(net, &skb, b, dest, NULL);
		if (skb)
#endif
			b->media->send_msg(net, skb, b, dest);
	} else {
		kfree_skb(skb);
	}
	rcu_read_unlock();
}

/* tipc_bearer_xmit() - send buffer to destination over bearer */
void tipc_bearer_xmit(struct net *net, u32 bearer_id,
		      struct sk_buff_head *xmitq,
		      struct tipc_media_addr *dst,
		      struct tipc_node *__dnode)
{
	struct tipc_bearer *b;
	struct sk_buff *skb, *tmp;

	if (skb_queue_empty(xmitq))
		return;

	rcu_read_lock();
	b = bearer_get(net, bearer_id);
	if (unlikely(!b))
		__skb_queue_purge(xmitq);
	skb_queue_walk_safe(xmitq, skb, tmp) {
		__skb_dequeue(xmitq);
		if (likely(test_bit(0, &b->up) ||
			   msg_is_reset(buf_msg(skb)))) {
#ifdef CONFIG_TIPC_CRYPTO
			tipc_crypto_xmit(net, &skb, b, dst, __dnode);
			if (skb)
#endif
				b->media->send_msg(net, skb, b, dst);
		} else {
			kfree_skb(skb);
		}
	}
	rcu_read_unlock();
}

/* tipc_bearer_bc_xmit() - broadcast buffers to all destinations */
void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
			 struct sk_buff_head *xmitq)
{
	struct tipc_net *tn = tipc_net(net);
	struct tipc_media_addr *dst;
	int net_id = tn->net_id;
	struct tipc_bearer *b;
	struct sk_buff *skb, *tmp;
	struct tipc_msg *hdr;

	rcu_read_lock();
	b = bearer_get(net, bearer_id);
	if (unlikely(!b || !test_bit(0, &b->up)))
		__skb_queue_purge(xmitq);
	skb_queue_walk_safe(xmitq, skb, tmp) {
		hdr = buf_msg(skb);
		msg_set_non_seq(hdr, 1);
		msg_set_mc_netid(hdr, net_id);
		__skb_dequeue(xmitq);
		dst = &b->bcast_addr;
#ifdef CONFIG_TIPC_CRYPTO
		tipc_crypto_xmit(net, &skb, b, dst, NULL);
		if (skb)
#endif
			b->media->send_msg(net, skb, b, dst);
	}
	rcu_read_unlock();
}

/**
 * tipc_l2_rcv_msg - handle incoming TIPC message from an interface
 * @skb: the received message
 * @dev: the net device that the packet was received on
 * @pt: the packet_type structure which was used to register this handler
 * @orig_dev: the original receive net device in case the device is a bond
 *
 * Accept only packets explicitly sent to this node, or broadcast packets;
 * ignores packets sent using interface multicast, and traffic sent to other
 * nodes (which can happen if interface is running in promiscuous mode).
 */
static int tipc_l2_rcv_msg(struct sk_buff *skb, struct net_device *dev,
			   struct packet_type *pt, struct net_device *orig_dev)
{
	struct tipc_bearer *b;

	rcu_read_lock();
	b = rcu_dereference(dev->tipc_ptr) ?:
		rcu_dereference(orig_dev->tipc_ptr);
	if (likely(b && test_bit(0, &b->up) &&
		   (skb->pkt_type <= PACKET_MULTICAST))) {
		skb_mark_not_on_list(skb);
		TIPC_SKB_CB(skb)->flags = 0;
		tipc_rcv(dev_net(b->pt.dev), skb, b);
		rcu_read_unlock();
		return NET_RX_SUCCESS;
	}
	rcu_read_unlock();
	kfree_skb(skb);
	return NET_RX_DROP;
}

/**
 * tipc_l2_device_event - handle device events from network device
 * @nb: the context of the notification
 * @evt: the type of event
 * @ptr: the net device that the event was on
 *
 * This function is called by the Ethernet driver in case of link
 * change event.
 */
static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
				void *ptr)
{
	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
	struct net *net = dev_net(dev);
	struct tipc_bearer *b;

	b = rtnl_dereference(dev->tipc_ptr);
	if (!b)
		return NOTIFY_DONE;

	trace_tipc_l2_device_event(dev, b, evt);
	switch (evt) {
	case NETDEV_CHANGE:
		if (netif_carrier_ok(dev) && netif_oper_up(dev)) {
			test_and_set_bit_lock(0, &b->up);
			break;
		}
		fallthrough;
	case NETDEV_GOING_DOWN:
		clear_bit_unlock(0, &b->up);
		tipc_reset_bearer(net, b);
		break;
	case NETDEV_UP:
		test_and_set_bit_lock(0, &b->up);
		break;
	case NETDEV_CHANGEMTU:
		if (tipc_mtu_bad(dev)) {
			bearer_disable(net, b);
			break;
		}
		b->mtu = dev->mtu;
		tipc_reset_bearer(net, b);
		break;
	case NETDEV_CHANGEADDR:
		b->media->raw2addr(b, &b->addr,
				   (const char *)dev->dev_addr);
		tipc_reset_bearer(net, b);
		break;
	case NETDEV_UNREGISTER:
	case NETDEV_CHANGENAME:
		bearer_disable(net, b);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block notifier = {
	.notifier_call	= tipc_l2_device_event,
	.priority	= 0,
};

int tipc_bearer_setup(void)
{
	return register_netdevice_notifier(&notifier);
}

void tipc_bearer_cleanup(void)
{
	unregister_netdevice_notifier(&notifier);
}

void tipc_bearer_stop(struct net *net)
{
	struct tipc_net *tn = tipc_net(net);
	struct tipc_bearer *b;
	u32 i;

	for (i = 0; i < MAX_BEARERS; i++) {
		b = rtnl_dereference(tn->bearer_list[i]);
		if (b) {
			bearer_disable(net, b);
			tn->bearer_list[i] = NULL;
		}
	}
}

void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts)
{
	struct net_device *dev = net->loopback_dev;
	struct sk_buff *skb, *_skb;
	int exp;

	skb_queue_walk(pkts, _skb) {
		skb = pskb_copy(_skb, GFP_ATOMIC);
		if (!skb)
			continue;

		exp = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb));
		if (exp > 0 && pskb_expand_head(skb, exp, 0, GFP_ATOMIC)) {
			kfree_skb(skb);
			continue;
		}

		skb_reset_network_header(skb);
		dev_hard_header(skb, dev, ETH_P_TIPC, dev->dev_addr,
				dev->dev_addr, skb->len);
		skb->dev = dev;
		skb->pkt_type = PACKET_HOST;
		skb->ip_summed = CHECKSUM_UNNECESSARY;
		skb->protocol = eth_type_trans(skb, dev);
		netif_rx(skb);
	}
}

static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device *dev,
				 struct packet_type *pt, struct net_device *od)
{
	consume_skb(skb);
	return NET_RX_SUCCESS;
}

int tipc_attach_loopback(struct net *net)
{
	struct net_device *dev = net->loopback_dev;
	struct tipc_net *tn = tipc_net(net);

	if (!dev)
		return -ENODEV;

	netdev_hold(dev, &tn->loopback_pt.dev_tracker, GFP_KERNEL);
	tn->loopback_pt.dev = dev;
	tn->loopback_pt.type = htons(ETH_P_TIPC);
	tn->loopback_pt.func = tipc_loopback_rcv_pkt;
	dev_add_pack(&tn->loopback_pt);
	return 0;
}

void tipc_detach_loopback(struct net *net)
{
	struct tipc_net *tn = tipc_net(net);

	dev_remove_pack(&tn->loopback_pt);
	netdev_put(net->loopback_dev, &tn->loopback_pt.dev_tracker);
}

/* Caller should hold rtnl_lock to protect the bearer */
static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
				struct tipc_bearer *bearer, int nlflags)
{
	void *hdr;
	struct nlattr *attrs;
	struct nlattr *prop;

	hdr = genlmsg_put(msg->skb, msg->portid, msg->seq, &tipc_genl_family,
			  nlflags, TIPC_NL_BEARER_GET);
	if (!hdr)
		return -EMSGSIZE;

	attrs = nla_nest_start_noflag(msg->skb, TIPC_NLA_BEARER);
	if (!attrs)
		goto msg_full;

	if (nla_put_string(msg->skb, TIPC_NLA_BEARER_NAME, bearer->name))
		goto attr_msg_full;

	prop = nla_nest_start_noflag(msg->skb, TIPC_NLA_BEARER_PROP);
	if (!prop)
		goto prop_msg_full;
	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_PRIO, bearer->priority))
		goto prop_msg_full;
	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, bearer->tolerance))
		goto prop_msg_full;
	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->max_win))
		goto prop_msg_full;
	if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP)
		if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, bearer->mtu))
			goto prop_msg_full;

	nla_nest_end(msg->skb, prop);

#ifdef CONFIG_TIPC_MEDIA_UDP
	if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP) {
		if (tipc_udp_nl_add_bearer_data(msg, bearer))
			goto attr_msg_full;
	}
#endif

	nla_nest_end(msg->skb, attrs);
	genlmsg_end(msg->skb, hdr);

	return 0;

prop_msg_full:
	nla_nest_cancel(msg->skb, prop);
attr_msg_full:
	nla_nest_cancel(msg->skb, attrs);
msg_full:
	genlmsg_cancel(msg->skb, hdr);

	return -EMSGSIZE;
}

int tipc_nl_bearer_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	int err;
	int i = cb->args[0];
	struct tipc_bearer *bearer;
	struct tipc_nl_msg msg;
	struct net *net = sock_net(skb->sk);
	struct tipc_net *tn = tipc_net(net);

	if (i == MAX_BEARERS)
		return 0;

	msg.skb = skb;
	msg.portid = NETLINK_CB(cb->skb).portid;
	msg.seq = cb->nlh->nlmsg_seq;

	rtnl_lock();
	for (i = 0; i < MAX_BEARERS; i++) {
		bearer = rtnl_dereference(tn->bearer_list[i]);
		if (!bearer)
			continue;

		err = __tipc_nl_add_bearer(&msg, bearer, NLM_F_MULTI);
		if (err)
			break;
	}
	rtnl_unlock();

	cb->args[0] = i;

	return skb->len;
}

int tipc_nl_bearer_get(struct sk_buff *skb, struct genl_info *info)
{
	int err;
	char *name;
	struct sk_buff *rep;
	struct tipc_bearer *bearer;
	struct tipc_nl_msg msg;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
	struct net *net = genl_info_net(info);

	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;
	name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

	rep = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
	if (!rep)
		return -ENOMEM;

	msg.skb = rep;
	msg.portid = info->snd_portid;
	msg.seq = info->snd_seq;

	rtnl_lock();
	bearer = tipc_bearer_find(net, name);
	if (!bearer) {
		err = -EINVAL;
		NL_SET_ERR_MSG(info->extack, "Bearer not found");
		goto err_out;
	}

	err = __tipc_nl_add_bearer(&msg, bearer, 0);
	if (err)
		goto err_out;
	rtnl_unlock();

	return genlmsg_reply(rep, info);
err_out:
	rtnl_unlock();
	nlmsg_free(rep);

	return err;
}

int __tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info)
{
	int err;
	char *name;
	struct tipc_bearer *bearer;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
	struct net *net = sock_net(skb->sk);

	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;

	name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

	bearer = tipc_bearer_find(net, name);
	if (!bearer) {
		NL_SET_ERR_MSG(info->extack, "Bearer not found");
		return -EINVAL;
	}

	bearer_disable(net, bearer);

	return 0;
}

int tipc_nl_bearer_disable(struct sk_buff *skb, struct genl_info *info)
{
	int err;

	rtnl_lock();
	err = __tipc_nl_bearer_disable(skb, info);
	rtnl_unlock();

	return err;
}

int __tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info)
{
	int err;
	char *bearer;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
	struct net *net = sock_net(skb->sk);
	u32 domain = 0;
	u32 prio;

	prio = TIPC_MEDIA_LINK_PRI;

	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;
	bearer = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

	if (attrs[TIPC_NLA_BEARER_DOMAIN])
		domain = nla_get_u32(attrs[TIPC_NLA_BEARER_DOMAIN]);

	if (attrs[TIPC_NLA_BEARER_PROP]) {
		struct nlattr *props[TIPC_NLA_PROP_MAX + 1];

		err = tipc_nl_parse_link_prop(attrs[TIPC_NLA_BEARER_PROP],
					      props);
		if (err)
			return err;

		if (props[TIPC_NLA_PROP_PRIO])
			prio = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
	}

	return tipc_enable_bearer(net, bearer, domain, prio, attrs,
				  info->extack);
}

int tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info)
{
	int err;

	rtnl_lock();
	err = __tipc_nl_bearer_enable(skb, info);
	rtnl_unlock();

	return err;
}

int tipc_nl_bearer_add(struct sk_buff *skb, struct genl_info *info)
{
	int err;
	char *name;
	struct tipc_bearer *b;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
	struct net *net = sock_net(skb->sk);

	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;
	name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

	rtnl_lock();
	b = tipc_bearer_find(net, name);
	if (!b) {
		NL_SET_ERR_MSG(info->extack, "Bearer not found");
		err = -EINVAL;
		goto out;
	}

#ifdef CONFIG_TIPC_MEDIA_UDP
	if (attrs[TIPC_NLA_BEARER_UDP_OPTS]) {
		if (b->media->type_id != TIPC_MEDIA_TYPE_UDP) {
			NL_SET_ERR_MSG(info->extack,
				       "UDP option is unsupported");
			err = -EINVAL;
			goto out;
		}

		err = tipc_udp_nl_bearer_add(b,
					     attrs[TIPC_NLA_BEARER_UDP_OPTS]);
	}
#endif
out:
	rtnl_unlock();

	return err;
}

int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
{
	struct tipc_bearer *b;
	struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
	struct net *net = sock_net(skb->sk);
	char *name;
	int err;

	if (!info->attrs[TIPC_NLA_BEARER])
		return -EINVAL;

	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_BEARER_MAX,
					  info->attrs[TIPC_NLA_BEARER],
					  tipc_nl_bearer_policy, info->extack);
	if (err)
		return err;

	if (!attrs[TIPC_NLA_BEARER_NAME])
		return -EINVAL;
	name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);

	b = tipc_bearer_find(net, name);
	if (!b) {
		NL_SET_ERR_MSG(info->extack, "Bearer not found");
		return -EINVAL;
	}

	if (attrs[TIPC_NLA_BEARER_PROP]) {
		struct nlattr *props[TIPC_NLA_PROP_MAX + 1];

		err = tipc_nl_parse_link_prop(attrs[TIPC_NLA_BEARER_PROP],
					      props);
		if (err)
			return err;

		if (props[TIPC_NLA_PROP_TOL]) {
			b->tolerance = nla_get_u32(props[TIPC_NLA_PROP_TOL]);
			tipc_node_apply_property(net, b, TIPC_NLA_PROP_TOL);
		}
		if (props[TIPC_NLA_PROP_PRIO])
			b->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
		if (props[TIPC_NLA_PROP_WIN])
			b->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
		if (props[TIPC_NLA_PROP_MTU]) {
			if (b->media->type_id != TIPC_MEDIA_TYPE_UDP) {
				NL_SET_ERR_MSG(info->extack,
					       "MTU property is unsupported");
				return -EINVAL;
			}

#ifdef CONFIG_TIPC_MEDIA_UDP
			if (nla_get_u32(props[TIPC_NLA_PROP_MTU]) <
			    b->encap_hlen + TIPC_MIN_BEARER_MTU) {
				NL_SET_ERR_MSG(info->extack,
					       "MTU value is out-of-range");
				return -EINVAL;
			}
			b->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
			tipc_node_apply_property(net, b, TIPC_NLA_PROP_MTU);
#endif
		}
	}

	return 0;
}

int tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
{
	int err;

	rtnl_lock();
	err = __tipc_nl_bearer_set(skb, info);
	rtnl_unlock();

	return err;
}

static int __tipc_nl_add_media(struct tipc_nl_msg *msg,
			       struct tipc_media *media, int nlflags)
{
	void *hdr;
	struct nlattr *attrs;
	struct nlattr *prop;

	hdr = genlmsg_put(msg->skb, msg->portid, msg->seq, &tipc_genl_family,
			  nlflags, TIPC_NL_MEDIA_GET);
	if (!hdr)
		return -EMSGSIZE;

	attrs = nla_nest_start_noflag(msg->skb, TIPC_NLA_MEDIA);
	if (!attrs)
		goto msg_full;

	if (nla_put_string(msg->skb, TIPC_NLA_MEDIA_NAME, media->name))
		goto attr_msg_full;

	prop = nla_nest_start_noflag(msg->skb, TIPC_NLA_MEDIA_PROP);
	if (!prop)
		goto prop_msg_full;
	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_PRIO, media->priority))
		goto prop_msg_full;
	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, media->tolerance))
		goto prop_msg_full;
	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, media->max_win))
		goto prop_msg_full;
	if (media->type_id == TIPC_MEDIA_TYPE_UDP)
		if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, media->mtu))
			goto prop_msg_full;

	nla_nest_end(msg->skb, prop);
	nla_nest_end(msg->skb, attrs);
	genlmsg_end(msg->skb, hdr);

	return 0;

prop_msg_full:
	nla_nest_cancel(msg->skb, prop);
attr_msg_full:
	nla_nest_cancel(msg->skb, attrs);
msg_full:
	genlmsg_cancel(msg->skb, hdr);

	return -EMSGSIZE;
}

int tipc_nl_media_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	int err;
	int i = cb->args[0];
	struct tipc_nl_msg msg;

	if (i == MAX_MEDIA)
		return 0;

	msg.skb = skb;
	msg.portid = NETLINK_CB(cb->skb).portid;
	msg.seq = cb->nlh->nlmsg_seq;

	rtnl_lock();
	for (; media_info_array[i] != NULL; i++) {
		err = __tipc_nl_add_media(&msg, media_info_array[i],
					  NLM_F_MULTI);
		if (err)
			break;
	}
	rtnl_unlock();

	cb->args[0] = i;

	return skb->len;
}

int tipc_nl_media_get(struct sk_buff *skb, struct genl_info *info)
{
	int err;
	char *name;
	struct tipc_nl_msg msg;
	struct tipc_media *media;
	struct sk_buff *rep;
	struct nlattr *attrs[TIPC_NLA_MEDIA_MAX + 1];

	if (!info->attrs[TIPC_NLA_MEDIA])
		return -EINVAL;

	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_MEDIA_MAX,
					  info->attrs[TIPC_NLA_MEDIA],
					  tipc_nl_media_policy, info->extack);
	if (err)
		return err;

	if (!attrs[TIPC_NLA_MEDIA_NAME])
		return -EINVAL;
	name = nla_data(attrs[TIPC_NLA_MEDIA_NAME]);

	rep = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
	if (!rep)
		return -ENOMEM;

	msg.skb = rep;
	msg.portid = info->snd_portid;
	msg.seq = info->snd_seq;

	rtnl_lock();
	media = tipc_media_find(name);
	if (!media) {
		NL_SET_ERR_MSG(info->extack, "Media not found");
		err = -EINVAL;
		goto err_out;
	}

	err = __tipc_nl_add_media(&msg, media, 0);
	if (err)
		goto err_out;
	rtnl_unlock();

	return genlmsg_reply(rep, info);
err_out:
	rtnl_unlock();
	nlmsg_free(rep);

	return err;
}

int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info)
{
	int err;
	char *name;
	struct tipc_media *m;
	struct nlattr *attrs[TIPC_NLA_MEDIA_MAX + 1];

	if (!info->attrs[TIPC_NLA_MEDIA])
		return -EINVAL;

	err = nla_parse_nested_deprecated(attrs, TIPC_NLA_MEDIA_MAX,
					  info->attrs[TIPC_NLA_MEDIA],
					  tipc_nl_media_policy, info->extack);
	if (err)
		return err;

	if (!attrs[TIPC_NLA_MEDIA_NAME])
		return -EINVAL;
	name = nla_data(attrs[TIPC_NLA_MEDIA_NAME]);

	m = tipc_media_find(name);
	if (!m) {
		NL_SET_ERR_MSG(info->extack, "Media not found");
		return -EINVAL;
	}

	if (attrs[TIPC_NLA_MEDIA_PROP]) {
		struct nlattr *props[TIPC_NLA_PROP_MAX + 1];

		err = tipc_nl_parse_link_prop(attrs[TIPC_NLA_MEDIA_PROP],
					      props);
		if (err)
			return err;

		if (props[TIPC_NLA_PROP_TOL])
			m->tolerance = nla_get_u32(props[TIPC_NLA_PROP_TOL]);
		if (props[TIPC_NLA_PROP_PRIO])
			m->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
		if (props[TIPC_NLA_PROP_WIN])
			m->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
		if (props[TIPC_NLA_PROP_MTU]) {
			if (m->type_id != TIPC_MEDIA_TYPE_UDP) {
				NL_SET_ERR_MSG(info->extack,
					       "MTU property is unsupported");
				return -EINVAL;
			}
#ifdef CONFIG_TIPC_MEDIA_UDP
			if (tipc_udp_mtu_bad(nla_get_u32
					     (props[TIPC_NLA_PROP_MTU]))) {
				NL_SET_ERR_MSG(info->extack,
					       "MTU value is out-of-range");
				return -EINVAL;
			}
			m->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
#endif
		}
	}

	return 0;
}

int tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info)
{
	int err;

	rtnl_lock();
	err = __tipc_nl_media_set(skb, info);
	rtnl_unlock();

	return err;
}
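As a usage note for bearer_name_validate() above, TIPC bearer names take the form "media:interface", e.g. "eth:eth0" or "udp:u1". A minimal sketch (illustrative only; example_show_bearer_name() is a hypothetical helper that would have to live inside bearer.c, since bearer_name_validate() is static):

/* Hypothetical helper: split "eth:eth0" into media and interface parts. */
static void example_show_bearer_name(void)
{
	struct tipc_bearer_names parts;

	if (bearer_name_validate("eth:eth0", &parts))
		pr_info("media %s, interface %s\n",
			parts.media_name, parts.if_name);
}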
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/*
 * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
 * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
 */

#include <linux/libnvdimm.h>

#include "rxe.h"
#include "rxe_loc.h"

/* Return a random 8 bit key value that is
 * different than the last_key. Set last_key to -1
 * if this is the first key for an MR or MW
 */
u8 rxe_get_next_key(u32 last_key)
{
	u8 key;

	do {
		get_random_bytes(&key, 1);
	} while (key == last_key);

	return key;
}

int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length)
{
	switch (mr->ibmr.type) {
	case IB_MR_TYPE_DMA:
		return 0;

	case IB_MR_TYPE_USER:
	case IB_MR_TYPE_MEM_REG:
		if (iova < mr->ibmr.iova ||
		    iova + length > mr->ibmr.iova + mr->ibmr.length) {
			rxe_dbg_mr(mr, "iova/length out of range\n");
			return -EINVAL;
		}
		return 0;

	default:
		rxe_dbg_mr(mr, "mr type not supported\n");
		return -EINVAL;
	}
}

static void rxe_mr_init(int access, struct rxe_mr *mr)
{
	u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);

	/* set ibmr->l/rkey and also copy into private l/rkey
	 * for user MRs these will always be the same
	 * for cases where caller 'owns' the key portion
	 * they may be different until REG_MR WQE is executed.
	 */
	mr->lkey = mr->ibmr.lkey = key;
	mr->rkey = mr->ibmr.rkey = key;

	mr->access = access;
	mr->ibmr.page_size = PAGE_SIZE;
	mr->page_mask = PAGE_MASK;
	mr->page_shift = PAGE_SHIFT;
	mr->state = RXE_MR_STATE_INVALID;
}

void rxe_mr_init_dma(int access, struct rxe_mr *mr)
{
	rxe_mr_init(access, mr);

	mr->state = RXE_MR_STATE_VALID;
	mr->ibmr.type = IB_MR_TYPE_DMA;
}

static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
{
	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
}

static unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova)
{
	return iova & (mr_page_size(mr) - 1);
}

static bool is_pmem_page(struct page *pg)
{
	unsigned long paddr = page_to_phys(pg);

	return REGION_INTERSECTS ==
	       region_intersects(paddr, PAGE_SIZE, IORESOURCE_MEM,
				 IORES_DESC_PERSISTENT_MEMORY);
}

static int rxe_mr_fill_pages_from_sgt(struct rxe_mr *mr, struct sg_table *sgt)
{
	XA_STATE(xas, &mr->page_list, 0);
	struct sg_page_iter sg_iter;
	struct page *page;
	bool persistent = !!(mr->access & IB_ACCESS_FLUSH_PERSISTENT);

	__sg_page_iter_start(&sg_iter, sgt->sgl, sgt->orig_nents, 0);
	if (!__sg_page_iter_next(&sg_iter))
		return 0;

	do {
		xas_lock(&xas);
		while (true) {
			page = sg_page_iter_page(&sg_iter);

			if (persistent && !is_pmem_page(page)) {
				rxe_dbg_mr(mr, "Page can't be persistent\n");
				xas_set_err(&xas, -EINVAL);
				break;
			}

			xas_store(&xas, page);
			if (xas_error(&xas))
				break;
			xas_next(&xas);
			if (!__sg_page_iter_next(&sg_iter))
				break;
		}
		xas_unlock(&xas);
	} while (xas_nomem(&xas, GFP_KERNEL));

	return xas_error(&xas);
}

int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
		     int access, struct rxe_mr *mr)
{
	struct ib_umem *umem;
	int err;

	rxe_mr_init(access, mr);

	xa_init(&mr->page_list);

	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
	if (IS_ERR(umem)) {
		rxe_dbg_mr(mr, "Unable to pin memory region err = %d\n",
			   (int)PTR_ERR(umem));
		return PTR_ERR(umem);
	}

	err = rxe_mr_fill_pages_from_sgt(mr, &umem->sgt_append.sgt);
	if (err) {
		ib_umem_release(umem);
		return err;
	}

	mr->umem = umem;
	mr->ibmr.type = IB_MR_TYPE_USER;
	mr->state = RXE_MR_STATE_VALID;

	return 0;
}

static int rxe_mr_alloc(struct rxe_mr *mr, int num_buf)
{
	XA_STATE(xas, &mr->page_list, 0);
	int i = 0;
	int err;

	xa_init(&mr->page_list);

	do {
		xas_lock(&xas);
		while (i != num_buf) {
			xas_store(&xas, XA_ZERO_ENTRY);
			if (xas_error(&xas))
				break;
			xas_next(&xas);
			i++;
		}
		xas_unlock(&xas);
	} while (xas_nomem(&xas, GFP_KERNEL));

	err = xas_error(&xas);
	if (err)
		return err;

	mr->num_buf = num_buf;

	return 0;
}

int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr)
{
	int err;

	/* always allow remote access for FMRs */
	rxe_mr_init(RXE_ACCESS_REMOTE, mr);

	err = rxe_mr_alloc(mr, max_pages);
	if (err)
		goto err1;

	mr->state = RXE_MR_STATE_FREE;
	mr->ibmr.type = IB_MR_TYPE_MEM_REG;

	return 0;

err1:
	return err;
}

static int rxe_set_page(struct ib_mr *ibmr, u64 dma_addr)
{
	struct rxe_mr *mr = to_rmr(ibmr);
	struct page *page = ib_virt_dma_to_page(dma_addr);
	bool persistent = !!(mr->access & IB_ACCESS_FLUSH_PERSISTENT);
	int err;

	if (persistent && !is_pmem_page(page)) {
		rxe_dbg_mr(mr, "Page cannot be persistent\n");
		return -EINVAL;
	}

	if (unlikely(mr->nbuf == mr->num_buf))
		return -ENOMEM;

	err = xa_err(xa_store(&mr->page_list, mr->nbuf, page, GFP_KERNEL));
	if (err)
		return err;

	mr->nbuf++;
	return 0;
}

int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
		  int sg_nents, unsigned int *sg_offset)
{
	struct rxe_mr *mr = to_rmr(ibmr);
	unsigned int page_size = mr_page_size(mr);

	mr->nbuf = 0;
	mr->page_shift = ilog2(page_size);
	mr->page_mask = ~((u64)page_size - 1);
	mr->page_offset = mr->ibmr.iova & (page_size - 1);

	return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
}

static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
			      unsigned int length, enum rxe_mr_copy_dir dir)
{
	unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
	unsigned long index = rxe_mr_iova_to_index(mr, iova);
	unsigned int bytes;
	struct page *page;
	void *va;

	while (length) {
		page = xa_load(&mr->page_list, index);
		if (!page)
			return -EFAULT;

		bytes = min_t(unsigned int, length,
			      mr_page_size(mr) - page_offset);
		va = kmap_local_page(page);
		if (dir == RXE_FROM_MR_OBJ)
			memcpy(addr, va + page_offset, bytes);
		else
			memcpy(va + page_offset, addr, bytes);
		kunmap_local(va);

		page_offset = 0;
		addr += bytes;
		length -= bytes;
		index++;
	}

	return 0;
}

static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
			    unsigned int length, enum rxe_mr_copy_dir dir)
{
	unsigned int page_offset = dma_addr & (PAGE_SIZE - 1);
	unsigned int bytes;
	struct page *page;
	u8 *va;

	while (length) {
		page = ib_virt_dma_to_page(dma_addr);
		bytes = min_t(unsigned int, length,
			      PAGE_SIZE - page_offset);
		va = kmap_local_page(page);

		if (dir == RXE_TO_MR_OBJ)
			memcpy(va + page_offset, addr, bytes);
		else
			memcpy(addr, va + page_offset, bytes);

		kunmap_local(va);
		page_offset = 0;
		dma_addr += bytes;
		addr += bytes;
		length -= bytes;
	}
}

int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
		unsigned int length, enum rxe_mr_copy_dir dir)
{
	int err;

	if (length == 0)
		return 0;

	if (WARN_ON(!mr))
		return -EINVAL;

	if (mr->ibmr.type == IB_MR_TYPE_DMA) {
		rxe_mr_copy_dma(mr, iova, addr, length, dir);
		return 0;
	}

	err = mr_check_range(mr, iova, length);
	if (unlikely(err)) {
		rxe_dbg_mr(mr, "iova out of range\n");
		return err;
	}

	return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
}

/* copy data in or out of a wqe, i.e. sg list
 * under the control of a dma descriptor
 */
int copy_data(
	struct rxe_pd		*pd,
	int			access,
	struct rxe_dma_info	*dma,
	void			*addr,
	int			length,
	enum rxe_mr_copy_dir	dir)
{
	int			bytes;
	struct rxe_sge		*sge	= &dma->sge[dma->cur_sge];
	int			offset	= dma->sge_offset;
	int			resid	= dma->resid;
	struct rxe_mr		*mr	= NULL;
	u64			iova;
	int			err;

	if (length == 0)
		return 0;

	if (length > resid) {
		err = -EINVAL;
		goto err2;
	}

	if (sge->length && (offset < sge->length)) {
		mr = lookup_mr(pd, access, sge->lkey, RXE_LOOKUP_LOCAL);
		if (!mr) {
			err = -EINVAL;
			goto err1;
		}
	}

	while (length > 0) {
		bytes = length;

		if (offset >= sge->length) {
			if (mr) {
				rxe_put(mr);
				mr = NULL;
			}
			sge++;
			dma->cur_sge++;
			offset = 0;

			if (dma->cur_sge >= dma->num_sge) {
				err = -ENOSPC;
				goto err2;
			}

			if (sge->length) {
				mr = lookup_mr(pd, access, sge->lkey,
					       RXE_LOOKUP_LOCAL);
				if (!mr) {
					err = -EINVAL;
					goto err1;
				}
			} else {
				continue;
			}
		}

		if (bytes > sge->length - offset)
			bytes = sge->length - offset;

		if (bytes > 0) {
			iova = sge->addr + offset;
			err = rxe_mr_copy(mr, iova, addr, bytes, dir);
			if (err)
				goto err2;

			offset	+= bytes;
			resid	-= bytes;
			length	-= bytes;
			addr	+= bytes;
		}
	}

	dma->sge_offset = offset;
	dma->resid	= resid;

	if (mr)
		rxe_put(mr);

	return 0;

err2:
	if (mr)
		rxe_put(mr);
err1:
	return err;
}

int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length)
{
	unsigned int page_offset;
	unsigned long index;
	struct page *page;
	unsigned int bytes;
	int err;
	u8 *va;

	/* mr must be valid even if length is zero */
	if (WARN_ON(!mr))
		return -EINVAL;

	if (length == 0)
		return 0;

	if (mr->ibmr.type == IB_MR_TYPE_DMA)
		return -EFAULT;

	err = mr_check_range(mr, iova, length);
	if (err)
		return err;

	while (length > 0) {
		index = rxe_mr_iova_to_index(mr, iova);
		page = xa_load(&mr->page_list, index);
		page_offset = rxe_mr_iova_to_page_offset(mr, iova);
		if (!page)
			return -EFAULT;

		bytes = min_t(unsigned int, length,
			      mr_page_size(mr) - page_offset);
		va = kmap_local_page(page);
		arch_wb_cache_pmem(va + page_offset, bytes);
		kunmap_local(va);

		length -= bytes;
		iova += bytes;
		page_offset = 0;
	}

	return 0;
}

/* Guarantee atomicity of atomic operations at the machine level.
*/ static DEFINE_SPINLOCK(atomic_ops_lock); int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, u64 compare, u64 swap_add, u64 *orig_val) { unsigned int page_offset; struct page *page; u64 value; u64 *va; if (unlikely(mr->state != RXE_MR_STATE_VALID)) { rxe_dbg_mr(mr, "mr not in valid state\n"); return RESPST_ERR_RKEY_VIOLATION; } if (mr->ibmr.type == IB_MR_TYPE_DMA) { page_offset = iova & (PAGE_SIZE - 1); page = ib_virt_dma_to_page(iova); } else { unsigned long index; int err; err = mr_check_range(mr, iova, sizeof(value)); if (err) { rxe_dbg_mr(mr, "iova out of range\n"); return RESPST_ERR_RKEY_VIOLATION; } page_offset = rxe_mr_iova_to_page_offset(mr, iova); index = rxe_mr_iova_to_index(mr, iova); page = xa_load(&mr->page_list, index); if (!page) return RESPST_ERR_RKEY_VIOLATION; } if (unlikely(page_offset & 0x7)) { rxe_dbg_mr(mr, "iova not aligned\n"); return RESPST_ERR_MISALIGNED_ATOMIC; } va = kmap_local_page(page); spin_lock_bh(&atomic_ops_lock); value = *orig_val = va[page_offset >> 3]; if (opcode == IB_OPCODE_RC_COMPARE_SWAP) { if (value == compare) va[page_offset >> 3] = swap_add; } else { value += swap_add; va[page_offset >> 3] = value; } spin_unlock_bh(&atomic_ops_lock); kunmap_local(va); return 0; } #if defined CONFIG_64BIT /* only implemented or called for 64 bit architectures */ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value) { unsigned int page_offset; struct page *page; u64 *va; /* See IBA oA19-28 */ if (unlikely(mr->state != RXE_MR_STATE_VALID)) { rxe_dbg_mr(mr, "mr not in valid state\n"); return RESPST_ERR_RKEY_VIOLATION; } if (mr->ibmr.type == IB_MR_TYPE_DMA) { page_offset = iova & (PAGE_SIZE - 1); page = ib_virt_dma_to_page(iova); } else { unsigned long index; int err; /* See IBA oA19-28 */ err = mr_check_range(mr, iova, sizeof(value)); if (unlikely(err)) { rxe_dbg_mr(mr, "iova out of range\n"); return RESPST_ERR_RKEY_VIOLATION; } page_offset = rxe_mr_iova_to_page_offset(mr, iova); index = rxe_mr_iova_to_index(mr, iova); page = xa_load(&mr->page_list, index); if (!page) return RESPST_ERR_RKEY_VIOLATION; } /* See IBA A19.4.2 */ if (unlikely(page_offset & 0x7)) { rxe_dbg_mr(mr, "misaligned address\n"); return RESPST_ERR_MISALIGNED_ATOMIC; } va = kmap_local_page(page); /* Do atomic write after all prior operations have completed */ smp_store_release(&va[page_offset >> 3], value); kunmap_local(va); return 0; } #else int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value) { return RESPST_ERR_UNSUPPORTED_OPCODE; } #endif int advance_dma_data(struct rxe_dma_info *dma, unsigned int length) { struct rxe_sge *sge = &dma->sge[dma->cur_sge]; int offset = dma->sge_offset; int resid = dma->resid; while (length) { unsigned int bytes; if (offset >= sge->length) { sge++; dma->cur_sge++; offset = 0; if (dma->cur_sge >= dma->num_sge) return -ENOSPC; } bytes = length; if (bytes > sge->length - offset) bytes = sge->length - offset; offset += bytes; resid -= bytes; length -= bytes; } dma->sge_offset = offset; dma->resid = resid; return 0; } struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key, enum rxe_mr_lookup_type type) { struct rxe_mr *mr; struct rxe_dev *rxe = to_rdev(pd->ibpd.device); int index = key >> 8; mr = rxe_pool_get_index(&rxe->mr_pool, index); if (!mr) return NULL; if (unlikely((type == RXE_LOOKUP_LOCAL && mr->lkey != key) || (type == RXE_LOOKUP_REMOTE && mr->rkey != key) || mr_pd(mr) != pd || ((access & mr->access) != access) || mr->state != RXE_MR_STATE_VALID)) { rxe_put(mr); mr = NULL; } return mr; } int 
rxe_invalidate_mr(struct rxe_qp *qp, u32 key) { struct rxe_dev *rxe = to_rdev(qp->ibqp.device); struct rxe_mr *mr; int remote; int ret; mr = rxe_pool_get_index(&rxe->mr_pool, key >> 8); if (!mr) { rxe_dbg_qp(qp, "No MR for key %#x\n", key); ret = -EINVAL; goto err; } remote = mr->access & RXE_ACCESS_REMOTE; if (remote ? (key != mr->rkey) : (key != mr->lkey)) { rxe_dbg_mr(mr, "wr key (%#x) doesn't match mr key (%#x)\n", key, (remote ? mr->rkey : mr->lkey)); ret = -EINVAL; goto err_drop_ref; } if (atomic_read(&mr->num_mw) > 0) { rxe_dbg_mr(mr, "Attempt to invalidate an MR while bound to MWs\n"); ret = -EINVAL; goto err_drop_ref; } if (unlikely(mr->ibmr.type != IB_MR_TYPE_MEM_REG)) { rxe_dbg_mr(mr, "Type (%d) is wrong\n", mr->ibmr.type); ret = -EINVAL; goto err_drop_ref; } mr->state = RXE_MR_STATE_FREE; ret = 0; err_drop_ref: rxe_put(mr); err: return ret; } /* user can (re)register fast MR by executing a REG_MR WQE. * user is expected to hold a reference on the ib mr until the * WQE completes. * Once a fast MR is created this is the only way to change the * private keys. It is the responsibility of the user to maintain * the ib mr keys in sync with rxe mr keys. */ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe) { struct rxe_mr *mr = to_rmr(wqe->wr.wr.reg.mr); u32 key = wqe->wr.wr.reg.key; u32 access = wqe->wr.wr.reg.access; /* user can only register MR in free state */ if (unlikely(mr->state != RXE_MR_STATE_FREE)) { rxe_dbg_mr(mr, "mr->lkey = 0x%x not free\n", mr->lkey); return -EINVAL; } /* user can only register mr with qp in same protection domain */ if (unlikely(qp->ibqp.pd != mr->ibmr.pd)) { rxe_dbg_mr(mr, "qp->pd and mr->pd don't match\n"); return -EINVAL; } /* user is only allowed to change key portion of l/rkey */ if (unlikely((mr->lkey & ~0xff) != (key & ~0xff))) { rxe_dbg_mr(mr, "key = 0x%x has wrong index mr->lkey = 0x%x\n", key, mr->lkey); return -EINVAL; } mr->access = access; mr->lkey = key; mr->rkey = key; mr->ibmr.iova = wqe->wr.wr.reg.mr->iova; mr->state = RXE_MR_STATE_VALID; return 0; } void rxe_mr_cleanup(struct rxe_pool_elem *elem) { struct rxe_mr *mr = container_of(elem, typeof(*mr), elem); rxe_put(mr_pd(mr)); ib_umem_release(mr->umem); if (mr->ibmr.type != IB_MR_TYPE_DMA) xa_destroy(&mr->page_list); }
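A note on the key layout used throughout the functions above: rxe_mr_init() packs the pool index into the upper 24 bits of the 32-bit lkey/rkey and the random 8-bit key into the low byte, lookup_mr() recovers the index with key >> 8, and rxe_reg_fast_mr() only lets the key portion (the low byte) change. Below is a minimal userspace sketch of that arithmetic; make_key() and the sample values are illustrative, not part of the driver:

#include <stdint.h>
#include <stdio.h>

/* Mirror of the layout built by rxe_mr_init(): pool index in the upper
 * 24 bits, variable 8-bit "key portion" in the low byte. */
static uint32_t make_key(uint32_t index, uint8_t key8)
{
	return index << 8 | key8;
}

int main(void)
{
	uint32_t lkey = make_key(0x1234, 0xab);

	printf("lkey        = %#x\n", lkey);        /* 0x1234ab */
	printf("pool index  = %#x\n", lkey >> 8);   /* as in lookup_mr() */
	printf("key portion = %#x\n", lkey & 0xff); /* as from rxe_get_next_key() */

	/* rxe_reg_fast_mr() rejects a new key whose index bits differ: */
	printf("index bits match: %d\n",
	       (make_key(0x1234, 0xcd) & ~0xffu) == (lkey & ~0xffu));
	return 0;
}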
// SPDX-License-Identifier: GPL-2.0-or-later /* * Ioctl handler * Linux ethernet bridge * * Authors: * Lennert Buytenhek <buytenh@gnu.org> */ #include <linux/capability.h> #include <linux/compat.h> #include <linux/kernel.h> #include <linux/if_bridge.h> #include <linux/netdevice.h> #include <linux/slab.h> #include <linux/times.h> #include <net/net_namespace.h> #include <linux/uaccess.h> #include "br_private.h" static int get_bridge_ifindices(struct net *net, int *indices, int num) { struct net_device *dev; int i = 0; rcu_read_lock(); for_each_netdev_rcu(net, dev) { if (i >= num) break; if (netif_is_bridge_master(dev)) indices[i++] = dev->ifindex; } rcu_read_unlock(); return i; } /* called with RTNL */ static void get_port_ifindices(struct net_bridge *br, int *ifindices, int num) { struct net_bridge_port *p; list_for_each_entry(p, &br->port_list, list) { if (p->port_no < num) ifindices[p->port_no] = p->dev->ifindex; } } /* * Format up to a page worth of forwarding table entries * userbuf -- where to copy result * maxnum -- maximum number of entries desired * (limited to a page for sanity) * offset -- number of records to skip */ static int get_fdb_entries(struct net_bridge *br, void __user *userbuf, unsigned long maxnum, unsigned long offset) { int num; void *buf; size_t size; /* Clamp size to PAGE_SIZE, test maxnum to avoid overflow */ if (maxnum > PAGE_SIZE/sizeof(struct __fdb_entry)) maxnum = PAGE_SIZE/sizeof(struct __fdb_entry); size = maxnum * sizeof(struct __fdb_entry); buf = kmalloc(size, GFP_USER); if (!buf) return -ENOMEM; num = br_fdb_fillbuf(br, buf, maxnum, offset); if (num > 0) { if (copy_to_user(userbuf, buf, array_size(num, sizeof(struct __fdb_entry)))) num = -EFAULT; } kfree(buf); return num; } /* called with RTNL */
static int add_del_if(struct net_bridge *br, int ifindex, int isadd) { struct net *net = dev_net(br->dev); struct net_device *dev; int ret; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; dev = __dev_get_by_index(net, ifindex); if (dev == NULL) return -EINVAL; if (isadd) ret = br_add_if(br, dev, NULL); else ret = br_del_if(br, dev); return ret; } #define BR_UARGS_MAX 4 static int br_dev_read_uargs(unsigned long *args, size_t nr_args, void __user **argp, void __user *data) { int ret; if (nr_args < 2 || nr_args > BR_UARGS_MAX) return -EINVAL; if (in_compat_syscall()) { unsigned int cargs[BR_UARGS_MAX]; int i; ret = copy_from_user(cargs, data, nr_args * sizeof(*cargs)); if (ret) goto fault; for (i = 0; i < nr_args; ++i) args[i] = cargs[i]; *argp = compat_ptr(args[1]); } else { ret = copy_from_user(args, data, nr_args * sizeof(*args)); if (ret) goto fault; *argp = (void __user *)args[1]; } return 0; fault: return -EFAULT; } /* * Legacy ioctl's through SIOCDEVPRIVATE * This interface is deprecated because it was too difficult * to do the translation for 32/64bit ioctl compatibility. */ int br_dev_siocdevprivate(struct net_device *dev, struct ifreq *rq, void __user *data, int cmd) { struct net_bridge *br = netdev_priv(dev); struct net_bridge_port *p = NULL; unsigned long args[4]; void __user *argp; int ret; ret = br_dev_read_uargs(args, ARRAY_SIZE(args), &argp, data); if (ret) return ret; switch (args[0]) { case BRCTL_ADD_IF: case BRCTL_DEL_IF: return add_del_if(br, args[1], args[0] == BRCTL_ADD_IF); case BRCTL_GET_BRIDGE_INFO: { struct __bridge_info b; memset(&b, 0, sizeof(struct __bridge_info)); rcu_read_lock(); memcpy(&b.designated_root, &br->designated_root, 8); memcpy(&b.bridge_id, &br->bridge_id, 8); b.root_path_cost = br->root_path_cost; b.max_age = jiffies_to_clock_t(br->max_age); b.hello_time = jiffies_to_clock_t(br->hello_time); b.forward_delay = br->forward_delay; b.bridge_max_age = br->bridge_max_age; b.bridge_hello_time = br->bridge_hello_time; b.bridge_forward_delay = jiffies_to_clock_t(br->bridge_forward_delay); b.topology_change = br->topology_change; b.topology_change_detected = br->topology_change_detected; b.root_port = br->root_port; b.stp_enabled = (br->stp_enabled != BR_NO_STP); b.ageing_time = jiffies_to_clock_t(br->ageing_time); b.hello_timer_value = br_timer_value(&br->hello_timer); b.tcn_timer_value = br_timer_value(&br->tcn_timer); b.topology_change_timer_value = br_timer_value(&br->topology_change_timer); b.gc_timer_value = br_timer_value(&br->gc_work.timer); rcu_read_unlock(); if (copy_to_user((void __user *)args[1], &b, sizeof(b))) return -EFAULT; return 0; } case BRCTL_GET_PORT_LIST: { int num, *indices; num = args[2]; if (num < 0) return -EINVAL; if (num == 0) num = 256; if (num > BR_MAX_PORTS) num = BR_MAX_PORTS; indices = kcalloc(num, sizeof(int), GFP_KERNEL); if (indices == NULL) return -ENOMEM; get_port_ifindices(br, indices, num); if (copy_to_user(argp, indices, array_size(num, sizeof(int)))) num = -EFAULT; kfree(indices); return num; } case BRCTL_SET_BRIDGE_FORWARD_DELAY: if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN)) return -EPERM; ret = br_set_forward_delay(br, args[1]); break; case BRCTL_SET_BRIDGE_HELLO_TIME: if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN)) return -EPERM; ret = br_set_hello_time(br, args[1]); break; case BRCTL_SET_BRIDGE_MAX_AGE: if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN)) return -EPERM; ret = br_set_max_age(br, args[1]); break; case BRCTL_SET_AGEING_TIME: if (!ns_capable(dev_net(dev)->user_ns, 
CAP_NET_ADMIN)) return -EPERM; ret = br_set_ageing_time(br, args[1]); break; case BRCTL_GET_PORT_INFO: { struct __port_info p; struct net_bridge_port *pt; rcu_read_lock(); if ((pt = br_get_port(br, args[2])) == NULL) { rcu_read_unlock(); return -EINVAL; } memset(&p, 0, sizeof(struct __port_info)); memcpy(&p.designated_root, &pt->designated_root, 8); memcpy(&p.designated_bridge, &pt->designated_bridge, 8); p.port_id = pt->port_id; p.designated_port = pt->designated_port; p.path_cost = pt->path_cost; p.designated_cost = pt->designated_cost; p.state = pt->state; p.top_change_ack = pt->topology_change_ack; p.config_pending = pt->config_pending; p.message_age_timer_value = br_timer_value(&pt->message_age_timer); p.forward_delay_timer_value = br_timer_value(&pt->forward_delay_timer); p.hold_timer_value = br_timer_value(&pt->hold_timer); rcu_read_unlock(); if (copy_to_user(argp, &p, sizeof(p))) return -EFAULT; return 0; } case BRCTL_SET_BRIDGE_STP_STATE: if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN)) return -EPERM; ret = br_stp_set_enabled(br, args[1], NULL); break; case BRCTL_SET_BRIDGE_PRIORITY: if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN)) return -EPERM; br_stp_set_bridge_priority(br, args[1]); ret = 0; break; case BRCTL_SET_PORT_PRIORITY: { if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN)) return -EPERM; spin_lock_bh(&br->lock); if ((p = br_get_port(br, args[1])) == NULL) ret = -EINVAL; else ret = br_stp_set_port_priority(p, args[2]); spin_unlock_bh(&br->lock); break; } case BRCTL_SET_PATH_COST: { if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN)) return -EPERM; spin_lock_bh(&br->lock); if ((p = br_get_port(br, args[1])) == NULL) ret = -EINVAL; else ret = br_stp_set_path_cost(p, args[2]); spin_unlock_bh(&br->lock); break; } case BRCTL_GET_FDB_ENTRIES: return get_fdb_entries(br, argp, args[2], args[3]); default: ret = -EOPNOTSUPP; } if (!ret) { if (p) br_ifinfo_notify(RTM_NEWLINK, NULL, p); else netdev_state_change(br->dev); } return ret; } static int old_deviceless(struct net *net, void __user *data) { unsigned long args[3]; void __user *argp; int ret; ret = br_dev_read_uargs(args, ARRAY_SIZE(args), &argp, data); if (ret) return ret; switch (args[0]) { case BRCTL_GET_VERSION: return BRCTL_VERSION; case BRCTL_GET_BRIDGES: { int *indices; int ret = 0; if (args[2] >= 2048) return -ENOMEM; indices = kcalloc(args[2], sizeof(int), GFP_KERNEL); if (indices == NULL) return -ENOMEM; args[2] = get_bridge_ifindices(net, indices, args[2]); ret = copy_to_user(argp, indices, array_size(args[2], sizeof(int))) ? 
-EFAULT : args[2]; kfree(indices); return ret; } case BRCTL_ADD_BRIDGE: case BRCTL_DEL_BRIDGE: { char buf[IFNAMSIZ]; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; if (copy_from_user(buf, argp, IFNAMSIZ)) return -EFAULT; buf[IFNAMSIZ-1] = 0; if (args[0] == BRCTL_ADD_BRIDGE) return br_add_bridge(net, buf); return br_del_bridge(net, buf); } } return -EOPNOTSUPP; } int br_ioctl_stub(struct net *net, struct net_bridge *br, unsigned int cmd, struct ifreq *ifr, void __user *uarg) { int ret = -EOPNOTSUPP; rtnl_lock(); switch (cmd) { case SIOCGIFBR: case SIOCSIFBR: ret = old_deviceless(net, uarg); break; case SIOCBRADDBR: case SIOCBRDELBR: { char buf[IFNAMSIZ]; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; break; } if (copy_from_user(buf, uarg, IFNAMSIZ)) { ret = -EFAULT; break; } buf[IFNAMSIZ-1] = 0; if (cmd == SIOCBRADDBR) ret = br_add_bridge(net, buf); else ret = br_del_bridge(net, buf); } break; case SIOCBRADDIF: case SIOCBRDELIF: ret = add_del_if(br, ifr->ifr_ifindex, cmd == SIOCBRADDIF); break; } rtnl_unlock(); return ret; }
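For context, the deprecated interface decoded by br_dev_read_uargs() and dispatched in old_deviceless() is driven from userspace as a flat array of unsigned longs handed to ioctl(). A hedged sketch of the classic brctl-style call follows, assuming a kernel built with bridge support; the error handling is kept minimal for illustration:

#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if_bridge.h>
#include <linux/sockios.h>

int main(void)
{
	/* Any socket serves as a handle for the deviceless bridge ioctls. */
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	/* args[0] selects the operation; old_deviceless() reads three longs. */
	unsigned long args[3] = { BRCTL_GET_VERSION, 0, 0 };
	int version;

	if (fd < 0)
		return 1;

	/* Routed through br_ioctl_stub() -> old_deviceless() above. */
	version = ioctl(fd, SIOCGIFBR, args);
	printf("bridge ioctl version: %d\n", version);

	close(fd);
	return 0;
}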
// SPDX-License-Identifier: GPL-2.0-only /* * fs/crypto/hooks.c * * Encryption hooks for higher-level filesystem operations. */ #include "fscrypt_private.h" /** * fscrypt_file_open() - prepare to open a possibly-encrypted regular file * @inode: the inode being opened * @filp: the struct file being set up * * Currently, an encrypted regular file can only be opened if its encryption key * is available; access to the raw encrypted contents is not supported. * Therefore, we first set up the inode's encryption key (if not already done) * and return an error if it's unavailable. * * We also verify that if the parent directory (from the path via which the file * is being opened) is encrypted, then the inode being opened uses the same * encryption policy. This is needed as part of the enforcement that all files * in an encrypted directory tree use the same encryption policy, as a * protection against certain types of offline attacks. Note that this check is * needed even when opening an *unencrypted* file, since it's forbidden to have * an unencrypted file in an encrypted directory. * * Return: 0 on success, -ENOKEY if the key is missing, or another -errno code */ int fscrypt_file_open(struct inode *inode, struct file *filp) { int err; struct dentry *dentry, *dentry_parent; struct inode *inode_parent; err = fscrypt_require_key(inode); if (err) return err; dentry = file_dentry(filp); /* * Getting a reference to the parent dentry is needed for the actual * encryption policy comparison, but it's expensive on multi-core * systems.
Since this function runs on unencrypted files too, start * with a lightweight RCU-mode check for the parent directory being * unencrypted (in which case it's fine for the child to be either * unencrypted, or encrypted with any policy). Only continue on to the * full policy check if the parent directory is actually encrypted. */ rcu_read_lock(); dentry_parent = READ_ONCE(dentry->d_parent); inode_parent = d_inode_rcu(dentry_parent); if (inode_parent != NULL && !IS_ENCRYPTED(inode_parent)) { rcu_read_unlock(); return 0; } rcu_read_unlock(); dentry_parent = dget_parent(dentry); if (!fscrypt_has_permitted_context(d_inode(dentry_parent), inode)) { fscrypt_warn(inode, "Inconsistent encryption context (parent directory: %lu)", d_inode(dentry_parent)->i_ino); err = -EPERM; } dput(dentry_parent); return err; } EXPORT_SYMBOL_GPL(fscrypt_file_open); int __fscrypt_prepare_link(struct inode *inode, struct inode *dir, struct dentry *dentry) { if (fscrypt_is_nokey_name(dentry)) return -ENOKEY; /* * We don't need to separately check that the directory inode's key is * available, as it's implied by the dentry not being a no-key name. */ if (!fscrypt_has_permitted_context(dir, inode)) return -EXDEV; return 0; } EXPORT_SYMBOL_GPL(__fscrypt_prepare_link); int __fscrypt_prepare_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry, unsigned int flags) { if (fscrypt_is_nokey_name(old_dentry) || fscrypt_is_nokey_name(new_dentry)) return -ENOKEY; /* * We don't need to separately check that the directory inodes' keys are * available, as it's implied by the dentries not being no-key names. */ if (old_dir != new_dir) { if (IS_ENCRYPTED(new_dir) && !fscrypt_has_permitted_context(new_dir, d_inode(old_dentry))) return -EXDEV; if ((flags & RENAME_EXCHANGE) && IS_ENCRYPTED(old_dir) && !fscrypt_has_permitted_context(old_dir, d_inode(new_dentry))) return -EXDEV; } return 0; } EXPORT_SYMBOL_GPL(__fscrypt_prepare_rename); int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry, struct fscrypt_name *fname) { int err = fscrypt_setup_filename(dir, &dentry->d_name, 1, fname); if (err && err != -ENOENT) return err; fscrypt_prepare_dentry(dentry, fname->is_nokey_name); return err; } EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup); /** * fscrypt_prepare_lookup_partial() - prepare lookup without filename setup * @dir: the encrypted directory being searched * @dentry: the dentry being looked up in @dir * * This function should be used by the ->lookup and ->atomic_open methods of * filesystems that handle filename encryption and no-key name encoding * themselves and thus can't use fscrypt_prepare_lookup(). Like * fscrypt_prepare_lookup(), this will try to set up the directory's encryption * key and will set DCACHE_NOKEY_NAME on the dentry if the key is unavailable. * However, this function doesn't set up a struct fscrypt_name for the filename. * * Return: 0 on success; -errno on error. Note that the encryption key being * unavailable is not considered an error. It is also not an error if * the encryption policy is unsupported by this kernel; that is treated * like the key being unavailable, so that files can still be deleted. 
*/ int fscrypt_prepare_lookup_partial(struct inode *dir, struct dentry *dentry) { int err = fscrypt_get_encryption_info(dir, true); bool is_nokey_name = (!err && !fscrypt_has_encryption_key(dir)); fscrypt_prepare_dentry(dentry, is_nokey_name); return err; } EXPORT_SYMBOL_GPL(fscrypt_prepare_lookup_partial); int __fscrypt_prepare_readdir(struct inode *dir) { return fscrypt_get_encryption_info(dir, true); } EXPORT_SYMBOL_GPL(__fscrypt_prepare_readdir); int __fscrypt_prepare_setattr(struct dentry *dentry, struct iattr *attr) { if (attr->ia_valid & ATTR_SIZE) return fscrypt_require_key(d_inode(dentry)); return 0; } EXPORT_SYMBOL_GPL(__fscrypt_prepare_setattr); /** * fscrypt_prepare_setflags() - prepare to change flags with FS_IOC_SETFLAGS * @inode: the inode on which flags are being changed * @oldflags: the old flags * @flags: the new flags * * The caller should be holding i_rwsem for write. * * Return: 0 on success; -errno if the flags change isn't allowed or if * another error occurs. */ int fscrypt_prepare_setflags(struct inode *inode, unsigned int oldflags, unsigned int flags) { struct fscrypt_inode_info *ci; struct fscrypt_master_key *mk; int err; /* * When the CASEFOLD flag is set on an encrypted directory, we must * derive the secret key needed for the dirhash. This is only possible * if the directory uses a v2 encryption policy. */ if (IS_ENCRYPTED(inode) && (flags & ~oldflags & FS_CASEFOLD_FL)) { err = fscrypt_require_key(inode); if (err) return err; ci = inode->i_crypt_info; if (ci->ci_policy.version != FSCRYPT_POLICY_V2) return -EINVAL; mk = ci->ci_master_key; down_read(&mk->mk_sem); if (mk->mk_present) err = fscrypt_derive_dirhash_key(ci, mk); else err = -ENOKEY; up_read(&mk->mk_sem); return err; } return 0; } /** * fscrypt_prepare_symlink() - prepare to create a possibly-encrypted symlink * @dir: directory in which the symlink is being created * @target: plaintext symlink target * @len: length of @target excluding null terminator * @max_len: space the filesystem has available to store the symlink target * @disk_link: (out) the on-disk symlink target being prepared * * This function computes the size the symlink target will require on-disk, * stores it in @disk_link->len, and validates it against @max_len. An * encrypted symlink may be longer than the original. * * Additionally, @disk_link->name is set to @target if the symlink will be * unencrypted, but left NULL if the symlink will be encrypted. For encrypted * symlinks, the filesystem must call fscrypt_encrypt_symlink() to create the * on-disk target later. (The reason for the two-step process is that some * filesystems need to know the size of the symlink target before creating the * inode, e.g. to determine whether it will be a "fast" or "slow" symlink.) * * Return: 0 on success, -ENAMETOOLONG if the symlink target is too long, * -ENOKEY if the encryption key is missing, or another -errno code if a problem * occurred while setting up the encryption key. */ int fscrypt_prepare_symlink(struct inode *dir, const char *target, unsigned int len, unsigned int max_len, struct fscrypt_str *disk_link) { const union fscrypt_policy *policy; /* * To calculate the size of the encrypted symlink target we need to know * the amount of NUL padding, which is determined by the flags set in * the encryption policy which will be inherited from the directory. 
*/ policy = fscrypt_policy_to_inherit(dir); if (policy == NULL) { /* Not encrypted */ disk_link->name = (unsigned char *)target; disk_link->len = len + 1; if (disk_link->len > max_len) return -ENAMETOOLONG; return 0; } if (IS_ERR(policy)) return PTR_ERR(policy); /* * Calculate the size of the encrypted symlink and verify it won't * exceed max_len. Note that for historical reasons, encrypted symlink * targets are prefixed with the ciphertext length, despite this * actually being redundant with i_size. This decreases by 2 bytes the * longest symlink target we can accept. * * We could recover 1 byte by not counting a null terminator, but * counting it (even though it is meaningless for ciphertext) is simpler * for now since filesystems will assume it is there and subtract it. */ if (!__fscrypt_fname_encrypted_size(policy, len, max_len - sizeof(struct fscrypt_symlink_data) - 1, &disk_link->len)) return -ENAMETOOLONG; disk_link->len += sizeof(struct fscrypt_symlink_data) + 1; disk_link->name = NULL; return 0; } EXPORT_SYMBOL_GPL(fscrypt_prepare_symlink); int __fscrypt_encrypt_symlink(struct inode *inode, const char *target, unsigned int len, struct fscrypt_str *disk_link) { int err; struct qstr iname = QSTR_INIT(target, len); struct fscrypt_symlink_data *sd; unsigned int ciphertext_len; /* * fscrypt_prepare_new_inode() should have already set up the new * symlink inode's encryption key. We don't wait until now to do it, * since we may be in a filesystem transaction now. */ if (WARN_ON_ONCE(!fscrypt_has_encryption_key(inode))) return -ENOKEY; if (disk_link->name) { /* filesystem-provided buffer */ sd = (struct fscrypt_symlink_data *)disk_link->name; } else { sd = kmalloc(disk_link->len, GFP_NOFS); if (!sd) return -ENOMEM; } ciphertext_len = disk_link->len - sizeof(*sd) - 1; sd->len = cpu_to_le16(ciphertext_len); err = fscrypt_fname_encrypt(inode, &iname, sd->encrypted_path, ciphertext_len); if (err) goto err_free_sd; /* * Null-terminating the ciphertext doesn't make sense, but we still * count the null terminator in the length, so we might as well * initialize it just in case the filesystem writes it out. */ sd->encrypted_path[ciphertext_len] = '\0'; /* Cache the plaintext symlink target for later use by get_link() */ err = -ENOMEM; inode->i_link = kmemdup(target, len + 1, GFP_NOFS); if (!inode->i_link) goto err_free_sd; if (!disk_link->name) disk_link->name = (unsigned char *)sd; return 0; err_free_sd: if (!disk_link->name) kfree(sd); return err; } EXPORT_SYMBOL_GPL(__fscrypt_encrypt_symlink); /** * fscrypt_get_symlink() - get the target of an encrypted symlink * @inode: the symlink inode * @caddr: the on-disk contents of the symlink * @max_size: size of @caddr buffer * @done: if successful, will be set up to free the returned target if needed * * If the symlink's encryption key is available, we decrypt its target. * Otherwise, we encode its target for presentation. * * This may sleep, so the filesystem must have dropped out of RCU mode already. * * Return: the presentable symlink target or an ERR_PTR() */ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr, unsigned int max_size, struct delayed_call *done) { const struct fscrypt_symlink_data *sd; struct fscrypt_str cstr, pstr; bool has_key; int err; /* This is for encrypted symlinks only */ if (WARN_ON_ONCE(!IS_ENCRYPTED(inode))) return ERR_PTR(-EINVAL); /* If the decrypted target is already cached, just return it. 
*/ pstr.name = READ_ONCE(inode->i_link); if (pstr.name) return pstr.name; /* * Try to set up the symlink's encryption key, but we can continue * regardless of whether the key is available or not. */ err = fscrypt_get_encryption_info(inode, false); if (err) return ERR_PTR(err); has_key = fscrypt_has_encryption_key(inode); /* * For historical reasons, encrypted symlink targets are prefixed with * the ciphertext length, even though this is redundant with i_size. */ if (max_size < sizeof(*sd) + 1) return ERR_PTR(-EUCLEAN); sd = caddr; cstr.name = (unsigned char *)sd->encrypted_path; cstr.len = le16_to_cpu(sd->len); if (cstr.len == 0) return ERR_PTR(-EUCLEAN); if (cstr.len + sizeof(*sd) > max_size) return ERR_PTR(-EUCLEAN); err = fscrypt_fname_alloc_buffer(cstr.len, &pstr); if (err) return ERR_PTR(err); err = fscrypt_fname_disk_to_usr(inode, 0, 0, &cstr, &pstr); if (err) goto err_kfree; err = -EUCLEAN; if (pstr.name[0] == '\0') goto err_kfree; pstr.name[pstr.len] = '\0'; /* * Cache decrypted symlink targets in i_link for later use. Don't cache * symlink targets encoded without the key, since those become outdated * once the key is added. This pairs with the READ_ONCE() above and in * the VFS path lookup code. */ if (!has_key || cmpxchg_release(&inode->i_link, NULL, pstr.name) != NULL) set_delayed_call(done, kfree_link, pstr.name); return pstr.name; err_kfree: kfree(pstr.name); return ERR_PTR(err); } EXPORT_SYMBOL_GPL(fscrypt_get_symlink); /** * fscrypt_symlink_getattr() - set the correct st_size for encrypted symlinks * @path: the path for the encrypted symlink being queried * @stat: the struct being filled with the symlink's attributes * * Override st_size of encrypted symlinks to be the length of the decrypted * symlink target (or the no-key encoded symlink target, if the key is * unavailable) rather than the length of the encrypted symlink target. This is * necessary for st_size to match the symlink target that userspace actually * sees. POSIX requires this, and some userspace programs depend on it. * * This requires reading the symlink target from disk if needed, setting up the * inode's encryption key if possible, and then decrypting or encoding the * symlink target. This makes lstat() more heavyweight than is normally the * case. However, decrypted symlink targets will be cached in ->i_link, so * usually the symlink won't have to be read and decrypted again later if/when * it is actually followed, readlink() is called, or lstat() is called again. * * Return: 0 on success, -errno on failure */ int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat) { struct dentry *dentry = path->dentry; struct inode *inode = d_inode(dentry); const char *link; DEFINE_DELAYED_CALL(done); /* * To get the symlink target that userspace will see (whether it's the * decrypted target or the no-key encoded target), we can just get it in * the same way the VFS does during path resolution and readlink(). */ link = READ_ONCE(inode->i_link); if (!link) { link = inode->i_op->get_link(dentry, inode, &done); if (IS_ERR(link)) return PTR_ERR(link); } stat->size = strlen(link); do_delayed_call(&done); return 0; } EXPORT_SYMBOL_GPL(fscrypt_symlink_getattr);
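The length bookkeeping in fscrypt_prepare_symlink(), __fscrypt_encrypt_symlink(), and fscrypt_get_symlink() all follows from the same on-disk layout: a 2-byte little-endian ciphertext length, the ciphertext itself, and one NUL byte that is counted but meaningless. A small userspace mirror of that arithmetic; struct symlink_data and disk_len() are illustrative stand-ins for the kernel's definitions:

#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for struct fscrypt_symlink_data: a le16 length
 * prefix followed by the ciphertext (plus one counted NUL on disk). */
struct symlink_data {
	uint16_t len;
	char encrypted_path[];
} __attribute__((packed));

/* disk_link->len as computed in fscrypt_prepare_symlink(). */
static unsigned int disk_len(unsigned int ciphertext_len)
{
	return sizeof(struct symlink_data) + ciphertext_len + 1;
}

int main(void)
{
	unsigned int ct = 32; /* hypothetical ciphertext size */

	/* The 2-byte prefix and the counted NUL are why the longest
	 * ciphertext that fits is max_len - 3 bytes. */
	printf("ciphertext %u bytes -> on-disk %u bytes\n", ct, disk_len(ct));
	return 0;
}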
// SPDX-License-Identifier: GPL-2.0-only /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system. INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * Generic TIME_WAIT sockets functions * * From code originally in TCP */ #include <linux/kernel.h> #include <linux/slab.h> #include <linux/module.h> #include <net/inet_hashtables.h> #include <net/inet_timewait_sock.h> #include <net/ip.h> /** * inet_twsk_bind_unhash - unhash a timewait socket from bind hash * @tw: timewait socket * @hashinfo: hashinfo pointer * * Unhash a timewait socket from the bind hash, if hashed. * The bind hash lock must be held by the caller. */ void inet_twsk_bind_unhash(struct inet_timewait_sock *tw, struct inet_hashinfo *hashinfo) { struct inet_bind2_bucket *tb2 = tw->tw_tb2; struct inet_bind_bucket *tb = tw->tw_tb; if (!tb) return; __sk_del_bind_node((struct sock *)tw); tw->tw_tb = NULL; tw->tw_tb2 = NULL; inet_bind2_bucket_destroy(hashinfo->bind2_bucket_cachep, tb2); inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb); __sock_put((struct sock *)tw); } /* Must be called with locally disabled BHs. */ static void inet_twsk_kill(struct inet_timewait_sock *tw) { struct inet_hashinfo *hashinfo = tw->tw_dr->hashinfo; spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash); struct inet_bind_hashbucket *bhead, *bhead2; spin_lock(lock); sk_nulls_del_node_init_rcu((struct sock *)tw); spin_unlock(lock); /* Disassociate with bind bucket.
*/ bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), tw->tw_num, hashinfo->bhash_size)]; bhead2 = inet_bhashfn_portaddr(hashinfo, (struct sock *)tw, twsk_net(tw), tw->tw_num); spin_lock(&bhead->lock); spin_lock(&bhead2->lock); inet_twsk_bind_unhash(tw, hashinfo); spin_unlock(&bhead2->lock); spin_unlock(&bhead->lock); refcount_dec(&tw->tw_dr->tw_refcount); inet_twsk_put(tw); } void inet_twsk_free(struct inet_timewait_sock *tw) { struct module *owner = tw->tw_prot->owner; twsk_destructor((struct sock *)tw); kmem_cache_free(tw->tw_prot->twsk_prot->twsk_slab, tw); module_put(owner); } void inet_twsk_put(struct inet_timewait_sock *tw) { if (refcount_dec_and_test(&tw->tw_refcnt)) inet_twsk_free(tw); } EXPORT_SYMBOL_GPL(inet_twsk_put); static void inet_twsk_add_node_rcu(struct inet_timewait_sock *tw, struct hlist_nulls_head *list) { hlist_nulls_add_head_rcu(&tw->tw_node, list); } static void inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo) { __inet_twsk_schedule(tw, timeo, false); } /* * Enter the time wait state. * Essentially we whip up a timewait bucket, copy the relevant info into it * from the SK, and mess with hash chains and list linkage. * * The caller must not access @tw anymore after this function returns. */ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw, struct sock *sk, struct inet_hashinfo *hashinfo, int timeo) { const struct inet_sock *inet = inet_sk(sk); const struct inet_connection_sock *icsk = inet_csk(sk); struct inet_ehash_bucket *ehead = inet_ehash_bucket(hashinfo, sk->sk_hash); spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash); struct inet_bind_hashbucket *bhead, *bhead2; /* Step 1: Put TW into bind hash. Original socket stays there too. Note that any socket with inet->num != 0 MUST be bound in the binding cache, even if it is closed. */ bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), inet->inet_num, hashinfo->bhash_size)]; bhead2 = inet_bhashfn_portaddr(hashinfo, sk, twsk_net(tw), inet->inet_num); local_bh_disable(); spin_lock(&bhead->lock); spin_lock(&bhead2->lock); tw->tw_tb = icsk->icsk_bind_hash; WARN_ON(!icsk->icsk_bind_hash); tw->tw_tb2 = icsk->icsk_bind2_hash; WARN_ON(!icsk->icsk_bind2_hash); sk_add_bind_node((struct sock *)tw, &tw->tw_tb2->owners); spin_unlock(&bhead2->lock); spin_unlock(&bhead->lock); spin_lock(lock); /* Step 2: Hash TW into tcp ehash chain */ inet_twsk_add_node_rcu(tw, &ehead->chain); /* Step 3: Remove SK from hash chain */ if (__sk_nulls_del_node_init_rcu(sk)) sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); /* Ensure above writes are committed into memory before updating the * refcount. * Provides ordering vs later refcount_inc(). */ smp_wmb(); /* tw_refcnt is set to 3 because we have: * - one reference for the bhash chain. * - one reference for the ehash chain. * - one reference for the timer. * Also note that after this point, we lose our implicit reference, * so we are not allowed to use tw anymore.
*/ refcount_set(&tw->tw_refcnt, 3); inet_twsk_schedule(tw, timeo); spin_unlock(lock); local_bh_enable(); } EXPORT_SYMBOL_GPL(inet_twsk_hashdance_schedule); static void tw_timer_handler(struct timer_list *t) { struct inet_timewait_sock *tw = from_timer(tw, t, tw_timer); inet_twsk_kill(tw); } struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, struct inet_timewait_death_row *dr, const int state) { struct inet_timewait_sock *tw; if (refcount_read(&dr->tw_refcount) - 1 >= READ_ONCE(dr->sysctl_max_tw_buckets)) return NULL; tw = kmem_cache_alloc(sk->sk_prot_creator->twsk_prot->twsk_slab, GFP_ATOMIC); if (tw) { const struct inet_sock *inet = inet_sk(sk); tw->tw_dr = dr; /* Give us an identity. */ tw->tw_daddr = inet->inet_daddr; tw->tw_rcv_saddr = inet->inet_rcv_saddr; tw->tw_bound_dev_if = sk->sk_bound_dev_if; tw->tw_tos = inet->tos; tw->tw_num = inet->inet_num; tw->tw_state = TCP_TIME_WAIT; tw->tw_substate = state; tw->tw_sport = inet->inet_sport; tw->tw_dport = inet->inet_dport; tw->tw_family = sk->sk_family; tw->tw_reuse = sk->sk_reuse; tw->tw_reuseport = sk->sk_reuseport; tw->tw_hash = sk->sk_hash; tw->tw_ipv6only = 0; tw->tw_transparent = inet_test_bit(TRANSPARENT, sk); tw->tw_prot = sk->sk_prot_creator; atomic64_set(&tw->tw_cookie, atomic64_read(&sk->sk_cookie)); twsk_net_set(tw, sock_net(sk)); timer_setup(&tw->tw_timer, tw_timer_handler, 0); /* * Because we use RCU lookups, we should not set tw_refcnt * to a non-null value before everything is set up for this * timewait socket. */ refcount_set(&tw->tw_refcnt, 0); __module_get(tw->tw_prot->owner); } return tw; } EXPORT_SYMBOL_GPL(inet_twsk_alloc); /* These are always called from BH context. See callers in * tcp_input.c to verify this. */ /* This is for handling early-kills of TIME_WAIT sockets. * Warning: consumes the reference. * Caller should not access tw anymore. */ void inet_twsk_deschedule_put(struct inet_timewait_sock *tw) { struct inet_hashinfo *hashinfo = tw->tw_dr->hashinfo; spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash); /* inet_twsk_purge() walks over all sockets, including tw ones, * and removes them via inet_twsk_deschedule_put() after a * refcount_inc_not_zero(). * * inet_twsk_hashdance_schedule() must (re)init the refcount before * arming the timer, i.e. inet_twsk_purge can obtain a reference to * a twsk that did not yet schedule the timer. * * The ehash lock synchronizes these two: * After acquiring the lock, the timer is always scheduled (else * timer_shutdown returns false), because hashdance_schedule releases * the ehash lock only after completing the timer initialization. * * Without grabbing the ehash lock, we get: * 1) cpu x sets twsk refcount to 3 * 2) cpu y bumps refcount to 4 * 3) cpu y calls inet_twsk_deschedule_put() and shuts timer down * 4) cpu x tries to start timer, but mod_timer is a noop post-shutdown * -> timer refcount is never decremented. */ spin_lock(lock); /* Makes sure hashdance_schedule() has completed */ spin_unlock(lock); if (timer_shutdown_sync(&tw->tw_timer)) inet_twsk_kill(tw); inet_twsk_put(tw); } EXPORT_SYMBOL(inet_twsk_deschedule_put); void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm) { /* timeout := RTO * 3.5 * * 3.5 = 1+2+0.5 to wait for two retransmits. * * RATIONALE: if FIN arrived and we entered TIME-WAIT state, * our ACK acking that FIN can be lost.
If N subsequent retransmitted * FINs (or previous segments) are lost (the probability of such an event * is p^(N+1), where p is the probability of losing a single packet, and * the time to detect the loss is about RTO*(2^N - 1) with exponential * backoff). Normal timewait length is calculated so that we * wait at least for one retransmitted FIN (maximal RTO is 120sec). * [ BTW Linux, following BSD, violates this requirement, waiting * only for 60sec; we should wait at least for 240 secs. * Well, 240 consumes too many resources 8) * ] * This interval is not reduced, so as to catch old duplicates and * responses to our wandering segments living for two MSLs. * However, if we use PAWS to detect * old duplicates, we can reduce the interval to bounds required * by RTO, rather than MSL. So, if the peer understands PAWS, we * kill the tw bucket after 3.5*RTO (it is important that this number * is greater than the TS tick!) and detect old duplicates with the help * of PAWS. */ if (!rearm) { bool kill = timeo <= 4*HZ; __NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED : LINUX_MIB_TIMEWAITED); BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo)); refcount_inc(&tw->tw_dr->tw_refcount); } else { mod_timer_pending(&tw->tw_timer, jiffies + timeo); } } EXPORT_SYMBOL_GPL(__inet_twsk_schedule); /* Remove all non-full sockets (TIME_WAIT and NEW_SYN_RECV) for dead netns */ void inet_twsk_purge(struct inet_hashinfo *hashinfo) { struct inet_ehash_bucket *head = &hashinfo->ehash[0]; unsigned int ehash_mask = hashinfo->ehash_mask; struct hlist_nulls_node *node; unsigned int slot; struct sock *sk; for (slot = 0; slot <= ehash_mask; slot++, head++) { if (hlist_nulls_empty(&head->chain)) continue; restart_rcu: cond_resched(); rcu_read_lock(); restart: sk_nulls_for_each_rcu(sk, node, &head->chain) { int state = inet_sk_state_load(sk); if ((1 << state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV)) continue; if (refcount_read(&sock_net(sk)->ns.count)) continue; if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt))) continue; if (refcount_read(&sock_net(sk)->ns.count)) { sock_gen_put(sk); goto restart; } rcu_read_unlock(); local_bh_disable(); if (state == TCP_TIME_WAIT) { inet_twsk_deschedule_put(inet_twsk(sk)); } else { struct request_sock *req = inet_reqsk(sk); inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req); } local_bh_enable(); goto restart_rcu; } /* If the nulls value we got at the end of this lookup is * not the expected one, we must restart lookup. * We probably met an item that was moved to another chain. */ if (get_nulls_value(node) != slot) goto restart; rcu_read_unlock(); } } EXPORT_SYMBOL_GPL(inet_twsk_purge);
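The "timeout := RTO * 3.5" rationale above reduces to simple jiffies arithmetic: wait long enough for two FIN retransmissions (1 + 2 RTOs) plus half an RTO of slack, with anything at or below 4*HZ counted as an early kill in the MIB statistics. A standalone sketch of that calculation; HZ and the sample RTO are assumed values for illustration:

#include <stdio.h>

#define HZ 1000 /* assumed tick rate for this sketch */

/* 3.5 * RTO, as described in the __inet_twsk_schedule() comment above. */
static unsigned long tw_timeout(unsigned long rto)
{
	return rto * 3 + rto / 2;
}

int main(void)
{
	unsigned long rto = HZ / 5; /* e.g. a 200 ms retransmission timeout */
	unsigned long timeo = tw_timeout(rto);

	printf("timeout = %lu jiffies (%.1f s)\n", timeo, (double)timeo / HZ);
	/* Short timeouts are accounted as LINUX_MIB_TIMEWAITKILLED,
	 * longer ones as LINUX_MIB_TIMEWAITED. */
	printf("counted as %s\n", timeo <= 4 * HZ ? "kill" : "normal wait");
	return 0;
}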
// SPDX-License-Identifier: GPL-2.0
/*
 *
 * Copyright (C) 2019-2021 Paragon Software GmbH, All rights reserved.
 *
 * TODO: Merge attr_set_size/attr_data_get_block/attr_allocate_frame?
 */

#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/kernel.h>

#include "debug.h"
#include "ntfs.h"
#include "ntfs_fs.h"

/*
 * External NTFS_MIN_LOG2_OF_CLUMP/NTFS_MAX_LOG2_OF_CLUMP can be set to
 * tune the preallocation algorithm.
 */
#ifndef NTFS_MIN_LOG2_OF_CLUMP
#define NTFS_MIN_LOG2_OF_CLUMP 16
#endif

#ifndef NTFS_MAX_LOG2_OF_CLUMP
#define NTFS_MAX_LOG2_OF_CLUMP 26
#endif

// 16M
#define NTFS_CLUMP_MIN (1 << (NTFS_MIN_LOG2_OF_CLUMP + 8))
// 16G
#define NTFS_CLUMP_MAX (1ull << (NTFS_MAX_LOG2_OF_CLUMP + 8))

static inline u64 get_pre_allocated(u64 size)
{
	u32 clump;
	u8 align_shift;
	u64 ret;

	if (size <= NTFS_CLUMP_MIN) {
		clump = 1 << NTFS_MIN_LOG2_OF_CLUMP;
		align_shift = NTFS_MIN_LOG2_OF_CLUMP;
	} else if (size >= NTFS_CLUMP_MAX) {
		clump = 1 << NTFS_MAX_LOG2_OF_CLUMP;
		align_shift = NTFS_MAX_LOG2_OF_CLUMP;
	} else {
		align_shift = NTFS_MIN_LOG2_OF_CLUMP - 1 +
			      __ffs(size >> (8 + NTFS_MIN_LOG2_OF_CLUMP));
		clump = 1u << align_shift;
	}

	ret = ((size + clump - 1) >> align_shift) << align_shift;

	return ret;
}

/*
 * attr_load_runs - Load all runs stored in @attr.
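 *
 * Nothing is done when the segment range [svcn, evcn] is already mapped
 * in @run.  When @vcn is not NULL it must fall inside [svcn, evcn];
 * otherwise the segment does not describe that cluster and -EINVAL is
 * returned.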
*/ static int attr_load_runs(struct ATTRIB *attr, struct ntfs_inode *ni, struct runs_tree *run, const CLST *vcn) { int err; CLST svcn = le64_to_cpu(attr->nres.svcn); CLST evcn = le64_to_cpu(attr->nres.evcn); u32 asize; u16 run_off; if (svcn >= evcn + 1 || run_is_mapped_full(run, svcn, evcn)) return 0; if (vcn && (evcn < *vcn || *vcn < svcn)) return -EINVAL; asize = le32_to_cpu(attr->size); run_off = le16_to_cpu(attr->nres.run_off); if (run_off > asize) return -EINVAL; err = run_unpack_ex(run, ni->mi.sbi, ni->mi.rno, svcn, evcn, vcn ? *vcn : svcn, Add2Ptr(attr, run_off), asize - run_off); if (err < 0) return err; return 0; } /* * run_deallocate_ex - Deallocate clusters. */ static int run_deallocate_ex(struct ntfs_sb_info *sbi, struct runs_tree *run, CLST vcn, CLST len, CLST *done, bool trim) { int err = 0; CLST vcn_next, vcn0 = vcn, lcn, clen, dn = 0; size_t idx; if (!len) goto out; if (!run_lookup_entry(run, vcn, &lcn, &clen, &idx)) { failed: run_truncate(run, vcn0); err = -EINVAL; goto out; } for (;;) { if (clen > len) clen = len; if (!clen) { err = -EINVAL; goto out; } if (lcn != SPARSE_LCN) { if (sbi) { /* mark bitmap range [lcn + clen) as free and trim clusters. */ mark_as_free_ex(sbi, lcn, clen, trim); } dn += clen; } len -= clen; if (!len) break; vcn_next = vcn + clen; if (!run_get_entry(run, ++idx, &vcn, &lcn, &clen) || vcn != vcn_next) { /* Save memory - don't load entire run. */ goto failed; } } out: if (done) *done += dn; return err; } /* * attr_allocate_clusters - Find free space, mark it as used and store in @run. */ int attr_allocate_clusters(struct ntfs_sb_info *sbi, struct runs_tree *run, CLST vcn, CLST lcn, CLST len, CLST *pre_alloc, enum ALLOCATE_OPT opt, CLST *alen, const size_t fr, CLST *new_lcn, CLST *new_len) { int err; CLST flen, vcn0 = vcn, pre = pre_alloc ? *pre_alloc : 0; size_t cnt = run->count; for (;;) { err = ntfs_look_for_free_space(sbi, lcn, len + pre, &lcn, &flen, opt); if (err == -ENOSPC && pre) { pre = 0; if (*pre_alloc) *pre_alloc = 0; continue; } if (err) goto out; if (vcn == vcn0) { /* Return the first fragment. */ if (new_lcn) *new_lcn = lcn; if (new_len) *new_len = flen; } /* Add new fragment into run storage. */ if (!run_add_entry(run, vcn, lcn, flen, opt & ALLOCATE_MFT)) { /* Undo last 'ntfs_look_for_free_space' */ mark_as_free_ex(sbi, lcn, len, false); err = -ENOMEM; goto out; } if (opt & ALLOCATE_ZERO) { u8 shift = sbi->cluster_bits - SECTOR_SHIFT; err = blkdev_issue_zeroout(sbi->sb->s_bdev, (sector_t)lcn << shift, (sector_t)flen << shift, GFP_NOFS, 0); if (err) goto out; } vcn += flen; if (flen >= len || (opt & ALLOCATE_MFT) || (fr && run->count - cnt >= fr)) { *alen = vcn - vcn0; return 0; } len -= flen; } out: /* Undo 'ntfs_look_for_free_space' */ if (vcn - vcn0) { run_deallocate_ex(sbi, run, vcn0, vcn - vcn0, NULL, false); run_truncate(run, vcn0); } return err; } /* * attr_make_nonresident * * If page is not NULL - it is already contains resident data * and locked (called from ni_write_frame()). 
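 *
 * Outline: allocate clusters for the resident payload, copy the payload
 * out (ntfs_sb_write_run() for metadata, the page cache for the unnamed
 * DATA attribute, or nothing when @page already holds it), then replace
 * the resident attribute in the record with a nonresident one.  On
 * failure the original resident attribute is restored.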
*/ int attr_make_nonresident(struct ntfs_inode *ni, struct ATTRIB *attr, struct ATTR_LIST_ENTRY *le, struct mft_inode *mi, u64 new_size, struct runs_tree *run, struct ATTRIB **ins_attr, struct page *page) { struct ntfs_sb_info *sbi; struct ATTRIB *attr_s; struct MFT_REC *rec; u32 used, asize, rsize, aoff; bool is_data; CLST len, alen; char *next; int err; if (attr->non_res) { *ins_attr = attr; return 0; } sbi = mi->sbi; rec = mi->mrec; attr_s = NULL; used = le32_to_cpu(rec->used); asize = le32_to_cpu(attr->size); next = Add2Ptr(attr, asize); aoff = PtrOffset(rec, attr); rsize = le32_to_cpu(attr->res.data_size); is_data = attr->type == ATTR_DATA && !attr->name_len; /* len - how many clusters required to store 'rsize' bytes */ if (is_attr_compressed(attr)) { u8 shift = sbi->cluster_bits + NTFS_LZNT_CUNIT; len = ((rsize + (1u << shift) - 1) >> shift) << NTFS_LZNT_CUNIT; } else { len = bytes_to_cluster(sbi, rsize); } run_init(run); /* Make a copy of original attribute. */ attr_s = kmemdup(attr, asize, GFP_NOFS); if (!attr_s) { err = -ENOMEM; goto out; } if (!len) { /* Empty resident -> Empty nonresident. */ alen = 0; } else { const char *data = resident_data(attr); err = attr_allocate_clusters(sbi, run, 0, 0, len, NULL, ALLOCATE_DEF, &alen, 0, NULL, NULL); if (err) goto out1; if (!rsize) { /* Empty resident -> Non empty nonresident. */ } else if (!is_data) { err = ntfs_sb_write_run(sbi, run, 0, data, rsize, 0); if (err) goto out2; } else if (!page) { struct address_space *mapping = ni->vfs_inode.i_mapping; struct folio *folio; folio = __filemap_get_folio( mapping, 0, FGP_LOCK | FGP_ACCESSED | FGP_CREAT, mapping_gfp_mask(mapping)); if (IS_ERR(folio)) { err = PTR_ERR(folio); goto out2; } folio_fill_tail(folio, 0, data, rsize); folio_mark_uptodate(folio); folio_mark_dirty(folio); folio_unlock(folio); folio_put(folio); } } /* Remove original attribute. */ used -= asize; memmove(attr, Add2Ptr(attr, asize), used - aoff); rec->used = cpu_to_le32(used); mi->dirty = true; if (le) al_remove_le(ni, le); err = ni_insert_nonresident(ni, attr_s->type, attr_name(attr_s), attr_s->name_len, run, 0, alen, attr_s->flags, &attr, NULL, NULL); if (err) goto out3; kfree(attr_s); attr->nres.data_size = cpu_to_le64(rsize); attr->nres.valid_size = attr->nres.data_size; *ins_attr = attr; if (is_data) ni->ni_flags &= ~NI_FLAG_RESIDENT; /* Resident attribute becomes non resident. */ return 0; out3: attr = Add2Ptr(rec, aoff); memmove(next, attr, used - aoff); memcpy(attr, attr_s, asize); rec->used = cpu_to_le32(used + asize); mi->dirty = true; out2: /* Undo: do not trim new allocated clusters. */ run_deallocate(sbi, run, false); run_close(run); out1: kfree(attr_s); out: return err; } /* * attr_set_size_res - Helper for attr_set_size(). 
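 *
 * Resizes resident data in place: the bytes following the attribute are
 * shifted by the 8-byte-aligned size delta and any grown area is zeroed.
 * When the record cannot hold the new size, the attribute is converted
 * via attr_make_nonresident() instead.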
*/ static int attr_set_size_res(struct ntfs_inode *ni, struct ATTRIB *attr, struct ATTR_LIST_ENTRY *le, struct mft_inode *mi, u64 new_size, struct runs_tree *run, struct ATTRIB **ins_attr) { struct ntfs_sb_info *sbi = mi->sbi; struct MFT_REC *rec = mi->mrec; u32 used = le32_to_cpu(rec->used); u32 asize = le32_to_cpu(attr->size); u32 aoff = PtrOffset(rec, attr); u32 rsize = le32_to_cpu(attr->res.data_size); u32 tail = used - aoff - asize; char *next = Add2Ptr(attr, asize); s64 dsize = ALIGN(new_size, 8) - ALIGN(rsize, 8); if (dsize < 0) { memmove(next + dsize, next, tail); } else if (dsize > 0) { if (used + dsize > sbi->max_bytes_per_attr) return attr_make_nonresident(ni, attr, le, mi, new_size, run, ins_attr, NULL); memmove(next + dsize, next, tail); memset(next, 0, dsize); } if (new_size > rsize) memset(Add2Ptr(resident_data(attr), rsize), 0, new_size - rsize); rec->used = cpu_to_le32(used + dsize); attr->size = cpu_to_le32(asize + dsize); attr->res.data_size = cpu_to_le32(new_size); mi->dirty = true; *ins_attr = attr; return 0; } /* * attr_set_size - Change the size of attribute. * * Extend: * - Sparse/compressed: No allocated clusters. * - Normal: Append allocated and preallocated new clusters. * Shrink: * - No deallocate if @keep_prealloc is set. */ int attr_set_size(struct ntfs_inode *ni, enum ATTR_TYPE type, const __le16 *name, u8 name_len, struct runs_tree *run, u64 new_size, const u64 *new_valid, bool keep_prealloc, struct ATTRIB **ret) { int err = 0; struct ntfs_sb_info *sbi = ni->mi.sbi; u8 cluster_bits = sbi->cluster_bits; bool is_mft = ni->mi.rno == MFT_REC_MFT && type == ATTR_DATA && !name_len; u64 old_valid, old_size, old_alloc, new_alloc, new_alloc_tmp; struct ATTRIB *attr = NULL, *attr_b; struct ATTR_LIST_ENTRY *le, *le_b; struct mft_inode *mi, *mi_b; CLST alen, vcn, lcn, new_alen, old_alen, svcn, evcn; CLST next_svcn, pre_alloc = -1, done = 0; bool is_ext, is_bad = false; bool dirty = false; u32 align; struct MFT_REC *rec; again: alen = 0; le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, type, name, name_len, NULL, &mi_b); if (!attr_b) { err = -ENOENT; goto bad_inode; } if (!attr_b->non_res) { err = attr_set_size_res(ni, attr_b, le_b, mi_b, new_size, run, &attr_b); if (err) return err; /* Return if file is still resident. */ if (!attr_b->non_res) { dirty = true; goto ok1; } /* Layout of records may be changed, so do a full search. */ goto again; } is_ext = is_attr_ext(attr_b); align = sbi->cluster_size; if (is_ext) align <<= attr_b->nres.c_unit; old_valid = le64_to_cpu(attr_b->nres.valid_size); old_size = le64_to_cpu(attr_b->nres.data_size); old_alloc = le64_to_cpu(attr_b->nres.alloc_size); again_1: old_alen = old_alloc >> cluster_bits; new_alloc = (new_size + align - 1) & ~(u64)(align - 1); new_alen = new_alloc >> cluster_bits; if (keep_prealloc && new_size < old_size) { attr_b->nres.data_size = cpu_to_le64(new_size); mi_b->dirty = dirty = true; goto ok; } vcn = old_alen - 1; svcn = le64_to_cpu(attr_b->nres.svcn); evcn = le64_to_cpu(attr_b->nres.evcn); if (svcn <= vcn && vcn <= evcn) { attr = attr_b; le = le_b; mi = mi_b; } else if (!le_b) { err = -EINVAL; goto bad_inode; } else { le = le_b; attr = ni_find_attr(ni, attr_b, &le, type, name, name_len, &vcn, &mi); if (!attr) { err = -EINVAL; goto bad_inode; } next_le_1: svcn = le64_to_cpu(attr->nres.svcn); evcn = le64_to_cpu(attr->nres.evcn); } /* * Here we have: * attr,mi,le - last attribute segment (containing 'vcn'). * attr_b,mi_b,le_b - base (primary) attribute segment. 
*/ next_le: rec = mi->mrec; err = attr_load_runs(attr, ni, run, NULL); if (err) goto out; if (new_size > old_size) { CLST to_allocate; size_t free; if (new_alloc <= old_alloc) { attr_b->nres.data_size = cpu_to_le64(new_size); mi_b->dirty = dirty = true; goto ok; } /* * Add clusters. In simple case we have to: * - allocate space (vcn, lcn, len) * - update packed run in 'mi' * - update attr->nres.evcn * - update attr_b->nres.data_size/attr_b->nres.alloc_size */ to_allocate = new_alen - old_alen; add_alloc_in_same_attr_seg: lcn = 0; if (is_mft) { /* MFT allocates clusters from MFT zone. */ pre_alloc = 0; } else if (is_ext) { /* No preallocate for sparse/compress. */ pre_alloc = 0; } else if (pre_alloc == -1) { pre_alloc = 0; if (type == ATTR_DATA && !name_len && sbi->options->prealloc) { pre_alloc = bytes_to_cluster( sbi, get_pre_allocated( new_size)) - new_alen; } /* Get the last LCN to allocate from. */ if (old_alen && !run_lookup_entry(run, vcn, &lcn, NULL, NULL)) { lcn = SPARSE_LCN; } if (lcn == SPARSE_LCN) lcn = 0; else if (lcn) lcn += 1; free = wnd_zeroes(&sbi->used.bitmap); if (to_allocate > free) { err = -ENOSPC; goto out; } if (pre_alloc && to_allocate + pre_alloc > free) pre_alloc = 0; } vcn = old_alen; if (is_ext) { if (!run_add_entry(run, vcn, SPARSE_LCN, to_allocate, false)) { err = -ENOMEM; goto out; } alen = to_allocate; } else { /* ~3 bytes per fragment. */ err = attr_allocate_clusters( sbi, run, vcn, lcn, to_allocate, &pre_alloc, is_mft ? ALLOCATE_MFT : ALLOCATE_DEF, &alen, is_mft ? 0 : (sbi->record_size - le32_to_cpu(rec->used) + 8) / 3 + 1, NULL, NULL); if (err) goto out; } done += alen; vcn += alen; if (to_allocate > alen) to_allocate -= alen; else to_allocate = 0; pack_runs: err = mi_pack_runs(mi, attr, run, vcn - svcn); if (err) goto undo_1; next_svcn = le64_to_cpu(attr->nres.evcn) + 1; new_alloc_tmp = (u64)next_svcn << cluster_bits; attr_b->nres.alloc_size = cpu_to_le64(new_alloc_tmp); mi_b->dirty = dirty = true; if (next_svcn >= vcn && !to_allocate) { /* Normal way. Update attribute and exit. */ attr_b->nres.data_size = cpu_to_le64(new_size); goto ok; } /* At least two MFT to avoid recursive loop. */ if (is_mft && next_svcn == vcn && ((u64)done << sbi->cluster_bits) >= 2 * sbi->record_size) { new_size = new_alloc_tmp; attr_b->nres.data_size = attr_b->nres.alloc_size; goto ok; } if (le32_to_cpu(rec->used) < sbi->record_size) { old_alen = next_svcn; evcn = old_alen - 1; goto add_alloc_in_same_attr_seg; } attr_b->nres.data_size = attr_b->nres.alloc_size; if (new_alloc_tmp < old_valid) attr_b->nres.valid_size = attr_b->nres.data_size; if (type == ATTR_LIST) { err = ni_expand_list(ni); if (err) goto undo_2; if (next_svcn < vcn) goto pack_runs; /* Layout of records is changed. */ goto again; } if (!ni->attr_list.size) { err = ni_create_attr_list(ni); /* In case of error layout of records is not changed. */ if (err) goto undo_2; /* Layout of records is changed. */ } if (next_svcn >= vcn) { /* This is MFT data, repeat. */ goto again; } /* Insert new attribute segment. */ err = ni_insert_nonresident(ni, type, name, name_len, run, next_svcn, vcn - next_svcn, attr_b->flags, &attr, &mi, NULL); /* * Layout of records maybe changed. * Find base attribute to update. */ le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, type, name, name_len, NULL, &mi_b); if (!attr_b) { err = -EINVAL; goto bad_inode; } if (err) { /* ni_insert_nonresident failed. */ attr = NULL; goto undo_2; } /* keep runs for $MFT::$ATTR_DATA and $MFT::$ATTR_BITMAP. 
*/ if (ni->mi.rno != MFT_REC_MFT) run_truncate_head(run, evcn + 1); svcn = le64_to_cpu(attr->nres.svcn); evcn = le64_to_cpu(attr->nres.evcn); /* * Attribute is in consistency state. * Save this point to restore to if next steps fail. */ old_valid = old_size = old_alloc = (u64)vcn << cluster_bits; attr_b->nres.valid_size = attr_b->nres.data_size = attr_b->nres.alloc_size = cpu_to_le64(old_size); mi_b->dirty = dirty = true; goto again_1; } if (new_size != old_size || (new_alloc != old_alloc && !keep_prealloc)) { /* * Truncate clusters. In simple case we have to: * - update packed run in 'mi' * - update attr->nres.evcn * - update attr_b->nres.data_size/attr_b->nres.alloc_size * - mark and trim clusters as free (vcn, lcn, len) */ CLST dlen = 0; vcn = max(svcn, new_alen); new_alloc_tmp = (u64)vcn << cluster_bits; if (vcn > svcn) { err = mi_pack_runs(mi, attr, run, vcn - svcn); if (err) goto out; } else if (le && le->vcn) { u16 le_sz = le16_to_cpu(le->size); /* * NOTE: List entries for one attribute are always * the same size. We deal with last entry (vcn==0) * and it is not first in entries array * (list entry for std attribute always first). * So it is safe to step back. */ mi_remove_attr(NULL, mi, attr); if (!al_remove_le(ni, le)) { err = -EINVAL; goto bad_inode; } le = (struct ATTR_LIST_ENTRY *)((u8 *)le - le_sz); } else { attr->nres.evcn = cpu_to_le64((u64)vcn - 1); mi->dirty = true; } attr_b->nres.alloc_size = cpu_to_le64(new_alloc_tmp); if (vcn == new_alen) { attr_b->nres.data_size = cpu_to_le64(new_size); if (new_size < old_valid) attr_b->nres.valid_size = attr_b->nres.data_size; } else { if (new_alloc_tmp <= le64_to_cpu(attr_b->nres.data_size)) attr_b->nres.data_size = attr_b->nres.alloc_size; if (new_alloc_tmp < le64_to_cpu(attr_b->nres.valid_size)) attr_b->nres.valid_size = attr_b->nres.alloc_size; } mi_b->dirty = dirty = true; err = run_deallocate_ex(sbi, run, vcn, evcn - vcn + 1, &dlen, true); if (err) goto out; if (is_ext) { /* dlen - really deallocated clusters. */ le64_sub_cpu(&attr_b->nres.total_size, ((u64)dlen << cluster_bits)); } run_truncate(run, vcn); if (new_alloc_tmp <= new_alloc) goto ok; old_size = new_alloc_tmp; vcn = svcn - 1; if (le == le_b) { attr = attr_b; mi = mi_b; evcn = svcn - 1; svcn = 0; goto next_le; } if (le->type != type || le->name_len != name_len || memcmp(le_name(le), name, name_len * sizeof(short))) { err = -EINVAL; goto bad_inode; } err = ni_load_mi(ni, le, &mi); if (err) goto out; attr = mi_find_attr(ni, mi, NULL, type, name, name_len, &le->id); if (!attr) { err = -EINVAL; goto bad_inode; } goto next_le_1; } ok: if (new_valid) { __le64 valid = cpu_to_le64(min(*new_valid, new_size)); if (attr_b->nres.valid_size != valid) { attr_b->nres.valid_size = valid; mi_b->dirty = true; } } ok1: if (ret) *ret = attr_b; if (((type == ATTR_DATA && !name_len) || (type == ATTR_ALLOC && name == I30_NAME))) { /* Update inode_set_bytes. */ if (attr_b->non_res) { new_alloc = le64_to_cpu(attr_b->nres.alloc_size); if (inode_get_bytes(&ni->vfs_inode) != new_alloc) { inode_set_bytes(&ni->vfs_inode, new_alloc); dirty = true; } } /* Don't forget to update duplicate information in parent. */ if (dirty) { ni->ni_flags |= NI_FLAG_UPDATE_PARENT; mark_inode_dirty(&ni->vfs_inode); } } return 0; undo_2: vcn -= alen; attr_b->nres.data_size = cpu_to_le64(old_size); attr_b->nres.valid_size = cpu_to_le64(old_valid); attr_b->nres.alloc_size = cpu_to_le64(old_alloc); /* Restore 'attr' and 'mi'. 
*/ if (attr) goto restore_run; if (le64_to_cpu(attr_b->nres.svcn) <= svcn && svcn <= le64_to_cpu(attr_b->nres.evcn)) { attr = attr_b; le = le_b; mi = mi_b; } else if (!le_b) { err = -EINVAL; goto bad_inode; } else { le = le_b; attr = ni_find_attr(ni, attr_b, &le, type, name, name_len, &svcn, &mi); if (!attr) goto bad_inode; } restore_run: if (mi_pack_runs(mi, attr, run, evcn - svcn + 1)) is_bad = true; undo_1: run_deallocate_ex(sbi, run, vcn, alen, NULL, false); run_truncate(run, vcn); out: if (is_bad) { bad_inode: _ntfs_bad_inode(&ni->vfs_inode); } return err; } /* * attr_data_get_block - Returns 'lcn' and 'len' for given 'vcn'. * * @new == NULL means just to get current mapping for 'vcn' * @new != NULL means allocate real cluster if 'vcn' maps to hole * @zero - zeroout new allocated clusters * * NOTE: * - @new != NULL is called only for sparsed or compressed attributes. * - new allocated clusters are zeroed via blkdev_issue_zeroout. */ int attr_data_get_block(struct ntfs_inode *ni, CLST vcn, CLST clen, CLST *lcn, CLST *len, bool *new, bool zero) { int err = 0; struct runs_tree *run = &ni->file.run; struct ntfs_sb_info *sbi; u8 cluster_bits; struct ATTRIB *attr, *attr_b; struct ATTR_LIST_ENTRY *le, *le_b; struct mft_inode *mi, *mi_b; CLST hint, svcn, to_alloc, evcn1, next_svcn, asize, end, vcn0, alen; CLST alloc, evcn; unsigned fr; u64 total_size, total_size0; int step = 0; if (new) *new = false; /* Try to find in cache. */ down_read(&ni->file.run_lock); if (!run_lookup_entry(run, vcn, lcn, len, NULL)) *len = 0; up_read(&ni->file.run_lock); if (*len && (*lcn != SPARSE_LCN || !new)) return 0; /* Fast normal way without allocation. */ /* No cluster in cache or we need to allocate cluster in hole. */ sbi = ni->mi.sbi; cluster_bits = sbi->cluster_bits; ni_lock(ni); down_write(&ni->file.run_lock); /* Repeat the code above (under write lock). */ if (!run_lookup_entry(run, vcn, lcn, len, NULL)) *len = 0; if (*len) { if (*lcn != SPARSE_LCN || !new) goto out; /* normal way without allocation. */ if (clen > *len) clen = *len; } le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) { err = -ENOENT; goto out; } if (!attr_b->non_res) { *lcn = RESIDENT_LCN; *len = 1; goto out; } asize = le64_to_cpu(attr_b->nres.alloc_size) >> cluster_bits; if (vcn >= asize) { if (new) { err = -EINVAL; } else { *len = 1; *lcn = SPARSE_LCN; } goto out; } svcn = le64_to_cpu(attr_b->nres.svcn); evcn1 = le64_to_cpu(attr_b->nres.evcn) + 1; attr = attr_b; le = le_b; mi = mi_b; if (le_b && (vcn < svcn || evcn1 <= vcn)) { attr = ni_find_attr(ni, attr_b, &le, ATTR_DATA, NULL, 0, &vcn, &mi); if (!attr) { err = -EINVAL; goto out; } svcn = le64_to_cpu(attr->nres.svcn); evcn1 = le64_to_cpu(attr->nres.evcn) + 1; } /* Load in cache actual information. */ err = attr_load_runs(attr, ni, run, NULL); if (err) goto out; /* Check for compressed frame. */ err = attr_is_frame_compressed(ni, attr_b, vcn >> NTFS_LZNT_CUNIT, &hint, run); if (err) goto out; if (hint) { /* if frame is compressed - don't touch it. */ *lcn = COMPRESSED_LCN; /* length to the end of frame. */ *len = NTFS_LZNT_CLUSTERS - (vcn & (NTFS_LZNT_CLUSTERS - 1)); err = 0; goto out; } if (!*len) { if (run_lookup_entry(run, vcn, lcn, len, NULL)) { if (*lcn != SPARSE_LCN || !new) goto ok; /* Slow normal way without allocation. */ if (clen > *len) clen = *len; } else if (!new) { /* Here we may return -ENOENT. * In any case caller gets zero length. 
*/ goto ok; } } if (!is_attr_ext(attr_b)) { /* The code below only for sparsed or compressed attributes. */ err = -EINVAL; goto out; } vcn0 = vcn; to_alloc = clen; fr = (sbi->record_size - le32_to_cpu(mi->mrec->used) + 8) / 3 + 1; /* Allocate frame aligned clusters. * ntfs.sys usually uses 16 clusters per frame for sparsed or compressed. * ntfs3 uses 1 cluster per frame for new created sparsed files. */ if (attr_b->nres.c_unit) { CLST clst_per_frame = 1u << attr_b->nres.c_unit; CLST cmask = ~(clst_per_frame - 1); /* Get frame aligned vcn and to_alloc. */ vcn = vcn0 & cmask; to_alloc = ((vcn0 + clen + clst_per_frame - 1) & cmask) - vcn; if (fr < clst_per_frame) fr = clst_per_frame; zero = true; /* Check if 'vcn' and 'vcn0' in different attribute segments. */ if (vcn < svcn || evcn1 <= vcn) { struct ATTRIB *attr2; /* Load runs for truncated vcn. */ attr2 = ni_find_attr(ni, attr_b, &le_b, ATTR_DATA, NULL, 0, &vcn, &mi); if (!attr2) { err = -EINVAL; goto out; } evcn1 = le64_to_cpu(attr2->nres.evcn) + 1; err = attr_load_runs(attr2, ni, run, NULL); if (err) goto out; } } if (vcn + to_alloc > asize) to_alloc = asize - vcn; /* Get the last LCN to allocate from. */ hint = 0; if (vcn > evcn1) { if (!run_add_entry(run, evcn1, SPARSE_LCN, vcn - evcn1, false)) { err = -ENOMEM; goto out; } } else if (vcn && !run_lookup_entry(run, vcn - 1, &hint, NULL, NULL)) { hint = -1; } /* Allocate and zeroout new clusters. */ err = attr_allocate_clusters(sbi, run, vcn, hint + 1, to_alloc, NULL, zero ? ALLOCATE_ZERO : ALLOCATE_DEF, &alen, fr, lcn, len); if (err) goto out; *new = true; step = 1; end = vcn + alen; /* Save 'total_size0' to restore if error. */ total_size0 = le64_to_cpu(attr_b->nres.total_size); total_size = total_size0 + ((u64)alen << cluster_bits); if (vcn != vcn0) { if (!run_lookup_entry(run, vcn0, lcn, len, NULL)) { err = -EINVAL; goto out; } if (*lcn == SPARSE_LCN) { /* Internal error. Should not happened. */ WARN_ON(1); err = -EINVAL; goto out; } /* Check case when vcn0 + len overlaps new allocated clusters. */ if (vcn0 + *len > end) *len = end - vcn0; } repack: err = mi_pack_runs(mi, attr, run, max(end, evcn1) - svcn); if (err) goto out; attr_b->nres.total_size = cpu_to_le64(total_size); inode_set_bytes(&ni->vfs_inode, total_size); ni->ni_flags |= NI_FLAG_UPDATE_PARENT; mi_b->dirty = true; mark_inode_dirty(&ni->vfs_inode); /* Stored [vcn : next_svcn) from [vcn : end). */ next_svcn = le64_to_cpu(attr->nres.evcn) + 1; if (end <= evcn1) { if (next_svcn == evcn1) { /* Normal way. Update attribute and exit. */ goto ok; } /* Add new segment [next_svcn : evcn1 - next_svcn). */ if (!ni->attr_list.size) { err = ni_create_attr_list(ni); if (err) goto undo1; /* Layout of records is changed. */ le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) { err = -ENOENT; goto out; } attr = attr_b; le = le_b; mi = mi_b; goto repack; } } /* * The code below may require additional cluster (to extend attribute list) * and / or one MFT record * It is too complex to undo operations if -ENOSPC occurs deep inside * in 'ni_insert_nonresident'. * Return in advance -ENOSPC here if there are no free cluster and no free MFT. */ if (!ntfs_check_for_free_space(sbi, 1, 1)) { /* Undo step 1. */ err = -ENOSPC; goto undo1; } step = 2; svcn = evcn1; /* Estimate next attribute. */ attr = ni_find_attr(ni, attr, &le, ATTR_DATA, NULL, 0, &svcn, &mi); if (!attr) { /* Insert new attribute segment. */ goto ins_ext; } /* Try to update existed attribute segment. 
*/ alloc = bytes_to_cluster(sbi, le64_to_cpu(attr_b->nres.alloc_size)); evcn = le64_to_cpu(attr->nres.evcn); if (end < next_svcn) end = next_svcn; while (end > evcn) { /* Remove segment [svcn : evcn). */ mi_remove_attr(NULL, mi, attr); if (!al_remove_le(ni, le)) { err = -EINVAL; goto out; } if (evcn + 1 >= alloc) { /* Last attribute segment. */ evcn1 = evcn + 1; goto ins_ext; } if (ni_load_mi(ni, le, &mi)) { attr = NULL; goto out; } attr = mi_find_attr(ni, mi, NULL, ATTR_DATA, NULL, 0, &le->id); if (!attr) { err = -EINVAL; goto out; } svcn = le64_to_cpu(attr->nres.svcn); evcn = le64_to_cpu(attr->nres.evcn); } if (end < svcn) end = svcn; err = attr_load_runs(attr, ni, run, &end); if (err) goto out; evcn1 = evcn + 1; attr->nres.svcn = cpu_to_le64(next_svcn); err = mi_pack_runs(mi, attr, run, evcn1 - next_svcn); if (err) goto out; le->vcn = cpu_to_le64(next_svcn); ni->attr_list.dirty = true; mi->dirty = true; next_svcn = le64_to_cpu(attr->nres.evcn) + 1; ins_ext: if (evcn1 > next_svcn) { err = ni_insert_nonresident(ni, ATTR_DATA, NULL, 0, run, next_svcn, evcn1 - next_svcn, attr_b->flags, &attr, &mi, NULL); if (err) goto out; } ok: run_truncate_around(run, vcn); out: if (err && step > 1) { /* Too complex to restore. */ _ntfs_bad_inode(&ni->vfs_inode); } up_write(&ni->file.run_lock); ni_unlock(ni); return err; undo1: /* Undo step1. */ attr_b->nres.total_size = cpu_to_le64(total_size0); inode_set_bytes(&ni->vfs_inode, total_size0); if (run_deallocate_ex(sbi, run, vcn, alen, NULL, false) || !run_add_entry(run, vcn, SPARSE_LCN, alen, false) || mi_pack_runs(mi, attr, run, max(end, evcn1) - svcn)) { _ntfs_bad_inode(&ni->vfs_inode); } goto out; } int attr_data_read_resident(struct ntfs_inode *ni, struct folio *folio) { u64 vbo; struct ATTRIB *attr; u32 data_size; size_t len; attr = ni_find_attr(ni, NULL, NULL, ATTR_DATA, NULL, 0, NULL, NULL); if (!attr) return -EINVAL; if (attr->non_res) return E_NTFS_NONRESIDENT; vbo = folio->index << PAGE_SHIFT; data_size = le32_to_cpu(attr->res.data_size); if (vbo > data_size) len = 0; else len = min(data_size - vbo, folio_size(folio)); folio_fill_tail(folio, 0, resident_data(attr) + vbo, len); folio_mark_uptodate(folio); return 0; } int attr_data_write_resident(struct ntfs_inode *ni, struct folio *folio) { u64 vbo; struct mft_inode *mi; struct ATTRIB *attr; u32 data_size; attr = ni_find_attr(ni, NULL, NULL, ATTR_DATA, NULL, 0, NULL, &mi); if (!attr) return -EINVAL; if (attr->non_res) { /* Return special error code to check this case. */ return E_NTFS_NONRESIDENT; } vbo = folio->index << PAGE_SHIFT; data_size = le32_to_cpu(attr->res.data_size); if (vbo < data_size) { char *data = resident_data(attr); size_t len = min(data_size - vbo, folio_size(folio)); memcpy_from_folio(data + vbo, folio, 0, len); mi->dirty = true; } ni->i_valid = data_size; return 0; } /* * attr_load_runs_vcn - Load runs with VCN. */ int attr_load_runs_vcn(struct ntfs_inode *ni, enum ATTR_TYPE type, const __le16 *name, u8 name_len, struct runs_tree *run, CLST vcn) { struct ATTRIB *attr; int err; CLST svcn, evcn; u16 ro; if (!ni) { /* Is record corrupted? */ return -ENOENT; } attr = ni_find_attr(ni, NULL, NULL, type, name, name_len, &vcn, NULL); if (!attr) { /* Is record corrupted? */ return -ENOENT; } svcn = le64_to_cpu(attr->nres.svcn); evcn = le64_to_cpu(attr->nres.evcn); if (evcn < vcn || vcn < svcn) { /* Is record corrupted? 
*/ return -EINVAL; } ro = le16_to_cpu(attr->nres.run_off); if (ro > le32_to_cpu(attr->size)) return -EINVAL; err = run_unpack_ex(run, ni->mi.sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, ro), le32_to_cpu(attr->size) - ro); if (err < 0) return err; return 0; } /* * attr_load_runs_range - Load runs for given range [from to). */ int attr_load_runs_range(struct ntfs_inode *ni, enum ATTR_TYPE type, const __le16 *name, u8 name_len, struct runs_tree *run, u64 from, u64 to) { struct ntfs_sb_info *sbi = ni->mi.sbi; u8 cluster_bits = sbi->cluster_bits; CLST vcn; CLST vcn_last = (to - 1) >> cluster_bits; CLST lcn, clen; int err; for (vcn = from >> cluster_bits; vcn <= vcn_last; vcn += clen) { if (!run_lookup_entry(run, vcn, &lcn, &clen, NULL)) { err = attr_load_runs_vcn(ni, type, name, name_len, run, vcn); if (err) return err; clen = 0; /* Next run_lookup_entry(vcn) must be success. */ } } return 0; } #ifdef CONFIG_NTFS3_LZX_XPRESS /* * attr_wof_frame_info * * Read header of Xpress/LZX file to get info about frame. */ int attr_wof_frame_info(struct ntfs_inode *ni, struct ATTRIB *attr, struct runs_tree *run, u64 frame, u64 frames, u8 frame_bits, u32 *ondisk_size, u64 *vbo_data) { struct ntfs_sb_info *sbi = ni->mi.sbi; u64 vbo[2], off[2], wof_size; u32 voff; u8 bytes_per_off; char *addr; struct folio *folio; int i, err; __le32 *off32; __le64 *off64; if (ni->vfs_inode.i_size < 0x100000000ull) { /* File starts with array of 32 bit offsets. */ bytes_per_off = sizeof(__le32); vbo[1] = frame << 2; *vbo_data = frames << 2; } else { /* File starts with array of 64 bit offsets. */ bytes_per_off = sizeof(__le64); vbo[1] = frame << 3; *vbo_data = frames << 3; } /* * Read 4/8 bytes at [vbo - 4(8)] == offset where compressed frame starts. * Read 4/8 bytes at [vbo] == offset where compressed frame ends. */ if (!attr->non_res) { if (vbo[1] + bytes_per_off > le32_to_cpu(attr->res.data_size)) { _ntfs_bad_inode(&ni->vfs_inode); return -EINVAL; } addr = resident_data(attr); if (bytes_per_off == sizeof(__le32)) { off32 = Add2Ptr(addr, vbo[1]); off[0] = vbo[1] ? le32_to_cpu(off32[-1]) : 0; off[1] = le32_to_cpu(off32[0]); } else { off64 = Add2Ptr(addr, vbo[1]); off[0] = vbo[1] ? 
le64_to_cpu(off64[-1]) : 0; off[1] = le64_to_cpu(off64[0]); } *vbo_data += off[0]; *ondisk_size = off[1] - off[0]; return 0; } wof_size = le64_to_cpu(attr->nres.data_size); down_write(&ni->file.run_lock); folio = ni->file.offs_folio; if (!folio) { folio = folio_alloc(GFP_KERNEL, 0); if (!folio) { err = -ENOMEM; goto out; } folio->index = -1; ni->file.offs_folio = folio; } folio_lock(folio); addr = folio_address(folio); if (vbo[1]) { voff = vbo[1] & (PAGE_SIZE - 1); vbo[0] = vbo[1] - bytes_per_off; i = 0; } else { voff = 0; vbo[0] = 0; off[0] = 0; i = 1; } do { pgoff_t index = vbo[i] >> PAGE_SHIFT; if (index != folio->index) { struct page *page = &folio->page; u64 from = vbo[i] & ~(u64)(PAGE_SIZE - 1); u64 to = min(from + PAGE_SIZE, wof_size); err = attr_load_runs_range(ni, ATTR_DATA, WOF_NAME, ARRAY_SIZE(WOF_NAME), run, from, to); if (err) goto out1; err = ntfs_bio_pages(sbi, run, &page, 1, from, to - from, REQ_OP_READ); if (err) { folio->index = -1; goto out1; } folio->index = index; } if (i) { if (bytes_per_off == sizeof(__le32)) { off32 = Add2Ptr(addr, voff); off[1] = le32_to_cpu(*off32); } else { off64 = Add2Ptr(addr, voff); off[1] = le64_to_cpu(*off64); } } else if (!voff) { if (bytes_per_off == sizeof(__le32)) { off32 = Add2Ptr(addr, PAGE_SIZE - sizeof(u32)); off[0] = le32_to_cpu(*off32); } else { off64 = Add2Ptr(addr, PAGE_SIZE - sizeof(u64)); off[0] = le64_to_cpu(*off64); } } else { /* Two values in one page. */ if (bytes_per_off == sizeof(__le32)) { off32 = Add2Ptr(addr, voff); off[0] = le32_to_cpu(off32[-1]); off[1] = le32_to_cpu(off32[0]); } else { off64 = Add2Ptr(addr, voff); off[0] = le64_to_cpu(off64[-1]); off[1] = le64_to_cpu(off64[0]); } break; } } while (++i < 2); *vbo_data += off[0]; *ondisk_size = off[1] - off[0]; out1: folio_unlock(folio); out: up_write(&ni->file.run_lock); return err; } #endif /* * attr_is_frame_compressed - Used to detect compressed frame. * * attr - base (primary) attribute segment. * run - run to use, usually == &ni->file.run. * Only base segments contains valid 'attr->nres.c_unit' */ int attr_is_frame_compressed(struct ntfs_inode *ni, struct ATTRIB *attr, CLST frame, CLST *clst_data, struct runs_tree *run) { int err; u32 clst_frame; CLST clen, lcn, vcn, alen, slen, vcn_next; size_t idx; *clst_data = 0; if (!is_attr_compressed(attr)) return 0; if (!attr->non_res) return 0; clst_frame = 1u << attr->nres.c_unit; vcn = frame * clst_frame; if (!run_lookup_entry(run, vcn, &lcn, &clen, &idx)) { err = attr_load_runs_vcn(ni, attr->type, attr_name(attr), attr->name_len, run, vcn); if (err) return err; if (!run_lookup_entry(run, vcn, &lcn, &clen, &idx)) return -EINVAL; } if (lcn == SPARSE_LCN) { /* Sparsed frame. */ return 0; } if (clen >= clst_frame) { /* * The frame is not compressed 'cause * it does not contain any sparse clusters. */ *clst_data = clst_frame; return 0; } alen = bytes_to_cluster(ni->mi.sbi, le64_to_cpu(attr->nres.alloc_size)); slen = 0; *clst_data = clen; /* * The frame is compressed if *clst_data + slen >= clst_frame. * Check next fragments. */ while ((vcn += clen) < alen) { vcn_next = vcn; if (!run_get_entry(run, ++idx, &vcn, &lcn, &clen) || vcn_next != vcn) { err = attr_load_runs_vcn(ni, attr->type, attr_name(attr), attr->name_len, run, vcn_next); if (err) return err; vcn = vcn_next; if (!run_lookup_entry(run, vcn, &lcn, &clen, &idx)) return -EINVAL; } if (lcn == SPARSE_LCN) { slen += clen; } else { if (slen) { /* * Data_clusters + sparse_clusters = * not enough for frame. 
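 * A data run following sparse clusters inside one frame is an
 * inconsistent layout: a compressed frame stores its data
 * clusters first, then a sparse tail.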
*/ return -EINVAL; } *clst_data += clen; } if (*clst_data + slen >= clst_frame) { if (!slen) { /* * There is no sparsed clusters in this frame * so it is not compressed. */ *clst_data = clst_frame; } else { /* Frame is compressed. */ } break; } } return 0; } /* * attr_allocate_frame - Allocate/free clusters for @frame. * * Assumed: down_write(&ni->file.run_lock); */ int attr_allocate_frame(struct ntfs_inode *ni, CLST frame, size_t compr_size, u64 new_valid) { int err = 0; struct runs_tree *run = &ni->file.run; struct ntfs_sb_info *sbi = ni->mi.sbi; struct ATTRIB *attr = NULL, *attr_b; struct ATTR_LIST_ENTRY *le, *le_b; struct mft_inode *mi, *mi_b; CLST svcn, evcn1, next_svcn, len; CLST vcn, end, clst_data; u64 total_size, valid_size, data_size; le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) return -ENOENT; if (!is_attr_ext(attr_b)) return -EINVAL; vcn = frame << NTFS_LZNT_CUNIT; total_size = le64_to_cpu(attr_b->nres.total_size); svcn = le64_to_cpu(attr_b->nres.svcn); evcn1 = le64_to_cpu(attr_b->nres.evcn) + 1; data_size = le64_to_cpu(attr_b->nres.data_size); if (svcn <= vcn && vcn < evcn1) { attr = attr_b; le = le_b; mi = mi_b; } else if (!le_b) { err = -EINVAL; goto out; } else { le = le_b; attr = ni_find_attr(ni, attr_b, &le, ATTR_DATA, NULL, 0, &vcn, &mi); if (!attr) { err = -EINVAL; goto out; } svcn = le64_to_cpu(attr->nres.svcn); evcn1 = le64_to_cpu(attr->nres.evcn) + 1; } err = attr_load_runs(attr, ni, run, NULL); if (err) goto out; err = attr_is_frame_compressed(ni, attr_b, frame, &clst_data, run); if (err) goto out; total_size -= (u64)clst_data << sbi->cluster_bits; len = bytes_to_cluster(sbi, compr_size); if (len == clst_data) goto out; if (len < clst_data) { err = run_deallocate_ex(sbi, run, vcn + len, clst_data - len, NULL, true); if (err) goto out; if (!run_add_entry(run, vcn + len, SPARSE_LCN, clst_data - len, false)) { err = -ENOMEM; goto out; } end = vcn + clst_data; /* Run contains updated range [vcn + len : end). */ } else { CLST alen, hint = 0; /* Get the last LCN to allocate from. */ if (vcn + clst_data && !run_lookup_entry(run, vcn + clst_data - 1, &hint, NULL, NULL)) { hint = -1; } err = attr_allocate_clusters(sbi, run, vcn + clst_data, hint + 1, len - clst_data, NULL, ALLOCATE_DEF, &alen, 0, NULL, NULL); if (err) goto out; end = vcn + len; /* Run contains updated range [vcn + clst_data : end). */ } total_size += (u64)len << sbi->cluster_bits; repack: err = mi_pack_runs(mi, attr, run, max(end, evcn1) - svcn); if (err) goto out; attr_b->nres.total_size = cpu_to_le64(total_size); inode_set_bytes(&ni->vfs_inode, total_size); ni->ni_flags |= NI_FLAG_UPDATE_PARENT; mi_b->dirty = true; mark_inode_dirty(&ni->vfs_inode); /* Stored [vcn : next_svcn) from [vcn : end). */ next_svcn = le64_to_cpu(attr->nres.evcn) + 1; if (end <= evcn1) { if (next_svcn == evcn1) { /* Normal way. Update attribute and exit. */ goto ok; } /* Add new segment [next_svcn : evcn1 - next_svcn). */ if (!ni->attr_list.size) { err = ni_create_attr_list(ni); if (err) goto out; /* Layout of records is changed. */ le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) { err = -ENOENT; goto out; } attr = attr_b; le = le_b; mi = mi_b; goto repack; } } svcn = evcn1; /* Estimate next attribute. 
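 * If a later DATA segment exists, segments swallowed by the repacked
 * range are removed and the survivor re-based at next_svcn, mirroring
 * the same logic in attr_data_get_block().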
*/ attr = ni_find_attr(ni, attr, &le, ATTR_DATA, NULL, 0, &svcn, &mi); if (attr) { CLST alloc = bytes_to_cluster( sbi, le64_to_cpu(attr_b->nres.alloc_size)); CLST evcn = le64_to_cpu(attr->nres.evcn); if (end < next_svcn) end = next_svcn; while (end > evcn) { /* Remove segment [svcn : evcn). */ mi_remove_attr(NULL, mi, attr); if (!al_remove_le(ni, le)) { err = -EINVAL; goto out; } if (evcn + 1 >= alloc) { /* Last attribute segment. */ evcn1 = evcn + 1; goto ins_ext; } if (ni_load_mi(ni, le, &mi)) { attr = NULL; goto out; } attr = mi_find_attr(ni, mi, NULL, ATTR_DATA, NULL, 0, &le->id); if (!attr) { err = -EINVAL; goto out; } svcn = le64_to_cpu(attr->nres.svcn); evcn = le64_to_cpu(attr->nres.evcn); } if (end < svcn) end = svcn; err = attr_load_runs(attr, ni, run, &end); if (err) goto out; evcn1 = evcn + 1; attr->nres.svcn = cpu_to_le64(next_svcn); err = mi_pack_runs(mi, attr, run, evcn1 - next_svcn); if (err) goto out; le->vcn = cpu_to_le64(next_svcn); ni->attr_list.dirty = true; mi->dirty = true; next_svcn = le64_to_cpu(attr->nres.evcn) + 1; } ins_ext: if (evcn1 > next_svcn) { err = ni_insert_nonresident(ni, ATTR_DATA, NULL, 0, run, next_svcn, evcn1 - next_svcn, attr_b->flags, &attr, &mi, NULL); if (err) goto out; } ok: run_truncate_around(run, vcn); out: if (attr_b) { if (new_valid > data_size) new_valid = data_size; valid_size = le64_to_cpu(attr_b->nres.valid_size); if (new_valid != valid_size) { attr_b->nres.valid_size = cpu_to_le64(valid_size); mi_b->dirty = true; } } return err; } /* * attr_collapse_range - Collapse range in file. */ int attr_collapse_range(struct ntfs_inode *ni, u64 vbo, u64 bytes) { int err = 0; struct runs_tree *run = &ni->file.run; struct ntfs_sb_info *sbi = ni->mi.sbi; struct ATTRIB *attr = NULL, *attr_b; struct ATTR_LIST_ENTRY *le, *le_b; struct mft_inode *mi, *mi_b; CLST svcn, evcn1, len, dealloc, alen; CLST vcn, end; u64 valid_size, data_size, alloc_size, total_size; u32 mask; __le16 a_flags; if (!bytes) return 0; le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) return -ENOENT; if (!attr_b->non_res) { /* Attribute is resident. Nothing to do? */ return 0; } data_size = le64_to_cpu(attr_b->nres.data_size); alloc_size = le64_to_cpu(attr_b->nres.alloc_size); a_flags = attr_b->flags; if (is_attr_ext(attr_b)) { total_size = le64_to_cpu(attr_b->nres.total_size); mask = (sbi->cluster_size << attr_b->nres.c_unit) - 1; } else { total_size = alloc_size; mask = sbi->cluster_mask; } if ((vbo & mask) || (bytes & mask)) { /* Allow to collapse only cluster aligned ranges. */ return -EINVAL; } if (vbo > data_size) return -EINVAL; down_write(&ni->file.run_lock); if (vbo + bytes >= data_size) { u64 new_valid = min(ni->i_valid, vbo); /* Simple truncate file at 'vbo'. */ truncate_setsize(&ni->vfs_inode, vbo); err = attr_set_size(ni, ATTR_DATA, NULL, 0, &ni->file.run, vbo, &new_valid, true, NULL); if (!err && new_valid < ni->i_valid) ni->i_valid = new_valid; goto out; } /* * Enumerate all attribute segments and collapse. 
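 *
 * Each segment falls into one of three cases: entirely past the removed
 * range (shift its vcns down by 'len'), overlapping it (deallocate and
 * collapse the overlap, then re-pack), or entirely inside it (delete
 * the whole segment).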
*/ alen = alloc_size >> sbi->cluster_bits; vcn = vbo >> sbi->cluster_bits; len = bytes >> sbi->cluster_bits; end = vcn + len; dealloc = 0; svcn = le64_to_cpu(attr_b->nres.svcn); evcn1 = le64_to_cpu(attr_b->nres.evcn) + 1; if (svcn <= vcn && vcn < evcn1) { attr = attr_b; le = le_b; mi = mi_b; } else if (!le_b) { err = -EINVAL; goto out; } else { le = le_b; attr = ni_find_attr(ni, attr_b, &le, ATTR_DATA, NULL, 0, &vcn, &mi); if (!attr) { err = -EINVAL; goto out; } svcn = le64_to_cpu(attr->nres.svcn); evcn1 = le64_to_cpu(attr->nres.evcn) + 1; } for (;;) { if (svcn >= end) { /* Shift VCN- */ attr->nres.svcn = cpu_to_le64(svcn - len); attr->nres.evcn = cpu_to_le64(evcn1 - 1 - len); if (le) { le->vcn = attr->nres.svcn; ni->attr_list.dirty = true; } mi->dirty = true; } else if (svcn < vcn || end < evcn1) { CLST vcn1, eat, next_svcn; /* Collapse a part of this attribute segment. */ err = attr_load_runs(attr, ni, run, &svcn); if (err) goto out; vcn1 = max(vcn, svcn); eat = min(end, evcn1) - vcn1; err = run_deallocate_ex(sbi, run, vcn1, eat, &dealloc, true); if (err) goto out; if (!run_collapse_range(run, vcn1, eat)) { err = -ENOMEM; goto out; } if (svcn >= vcn) { /* Shift VCN */ attr->nres.svcn = cpu_to_le64(vcn); if (le) { le->vcn = attr->nres.svcn; ni->attr_list.dirty = true; } } err = mi_pack_runs(mi, attr, run, evcn1 - svcn - eat); if (err) goto out; next_svcn = le64_to_cpu(attr->nres.evcn) + 1; if (next_svcn + eat < evcn1) { err = ni_insert_nonresident( ni, ATTR_DATA, NULL, 0, run, next_svcn, evcn1 - eat - next_svcn, a_flags, &attr, &mi, &le); if (err) goto out; /* Layout of records maybe changed. */ attr_b = NULL; } /* Free all allocated memory. */ run_truncate(run, 0); } else { u16 le_sz; u16 roff = le16_to_cpu(attr->nres.run_off); if (roff > le32_to_cpu(attr->size)) { err = -EINVAL; goto out; } run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn1 - 1, svcn, Add2Ptr(attr, roff), le32_to_cpu(attr->size) - roff); /* Delete this attribute segment. */ mi_remove_attr(NULL, mi, attr); if (!le) break; le_sz = le16_to_cpu(le->size); if (!al_remove_le(ni, le)) { err = -EINVAL; goto out; } if (evcn1 >= alen) break; if (!svcn) { /* Load next record that contains this attribute. */ if (ni_load_mi(ni, le, &mi)) { err = -EINVAL; goto out; } /* Look for required attribute. */ attr = mi_find_attr(ni, mi, NULL, ATTR_DATA, NULL, 0, &le->id); if (!attr) { err = -EINVAL; goto out; } goto next_attr; } le = (struct ATTR_LIST_ENTRY *)((u8 *)le - le_sz); } if (evcn1 >= alen) break; attr = ni_enum_attr_ex(ni, attr, &le, &mi); if (!attr) { err = -EINVAL; goto out; } next_attr: svcn = le64_to_cpu(attr->nres.svcn); evcn1 = le64_to_cpu(attr->nres.evcn) + 1; } if (!attr_b) { le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) { err = -ENOENT; goto out; } } data_size -= bytes; valid_size = ni->i_valid; if (vbo + bytes <= valid_size) valid_size -= bytes; else if (vbo < valid_size) valid_size = vbo; attr_b->nres.alloc_size = cpu_to_le64(alloc_size - bytes); attr_b->nres.data_size = cpu_to_le64(data_size); attr_b->nres.valid_size = cpu_to_le64(min(valid_size, data_size)); total_size -= (u64)dealloc << sbi->cluster_bits; if (is_attr_ext(attr_b)) attr_b->nres.total_size = cpu_to_le64(total_size); mi_b->dirty = true; /* Update inode size. 
*/ ni->i_valid = valid_size; i_size_write(&ni->vfs_inode, data_size); inode_set_bytes(&ni->vfs_inode, total_size); ni->ni_flags |= NI_FLAG_UPDATE_PARENT; mark_inode_dirty(&ni->vfs_inode); out: up_write(&ni->file.run_lock); if (err) _ntfs_bad_inode(&ni->vfs_inode); return err; } /* * attr_punch_hole * * Not for normal files. */ int attr_punch_hole(struct ntfs_inode *ni, u64 vbo, u64 bytes, u32 *frame_size) { int err = 0; struct runs_tree *run = &ni->file.run; struct ntfs_sb_info *sbi = ni->mi.sbi; struct ATTRIB *attr = NULL, *attr_b; struct ATTR_LIST_ENTRY *le, *le_b; struct mft_inode *mi, *mi_b; CLST svcn, evcn1, vcn, len, end, alen, hole, next_svcn; u64 total_size, alloc_size; u32 mask; __le16 a_flags; struct runs_tree run2; if (!bytes) return 0; le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) return -ENOENT; if (!attr_b->non_res) { u32 data_size = le32_to_cpu(attr_b->res.data_size); u32 from, to; if (vbo > data_size) return 0; from = vbo; to = min_t(u64, vbo + bytes, data_size); memset(Add2Ptr(resident_data(attr_b), from), 0, to - from); return 0; } if (!is_attr_ext(attr_b)) return -EOPNOTSUPP; alloc_size = le64_to_cpu(attr_b->nres.alloc_size); total_size = le64_to_cpu(attr_b->nres.total_size); if (vbo >= alloc_size) { /* NOTE: It is allowed. */ return 0; } mask = (sbi->cluster_size << attr_b->nres.c_unit) - 1; bytes += vbo; if (bytes > alloc_size) bytes = alloc_size; bytes -= vbo; if ((vbo & mask) || (bytes & mask)) { /* We have to zero a range(s). */ if (frame_size == NULL) { /* Caller insists range is aligned. */ return -EINVAL; } *frame_size = mask + 1; return E_NTFS_NOTALIGNED; } down_write(&ni->file.run_lock); run_init(&run2); run_truncate(run, 0); /* * Enumerate all attribute segments and punch hole where necessary. */ alen = alloc_size >> sbi->cluster_bits; vcn = vbo >> sbi->cluster_bits; len = bytes >> sbi->cluster_bits; end = vcn + len; hole = 0; svcn = le64_to_cpu(attr_b->nres.svcn); evcn1 = le64_to_cpu(attr_b->nres.evcn) + 1; a_flags = attr_b->flags; if (svcn <= vcn && vcn < evcn1) { attr = attr_b; le = le_b; mi = mi_b; } else if (!le_b) { err = -EINVAL; goto bad_inode; } else { le = le_b; attr = ni_find_attr(ni, attr_b, &le, ATTR_DATA, NULL, 0, &vcn, &mi); if (!attr) { err = -EINVAL; goto bad_inode; } svcn = le64_to_cpu(attr->nres.svcn); evcn1 = le64_to_cpu(attr->nres.evcn) + 1; } while (svcn < end) { CLST vcn1, zero, hole2 = hole; err = attr_load_runs(attr, ni, run, &svcn); if (err) goto done; vcn1 = max(vcn, svcn); zero = min(end, evcn1) - vcn1; /* * Check range [vcn1 + zero). * Calculate how many clusters there are. * Don't do any destructive actions. */ err = run_deallocate_ex(NULL, run, vcn1, zero, &hole2, false); if (err) goto done; /* Check if required range is already hole. */ if (hole2 == hole) goto next_attr; /* Make a clone of run to undo. */ err = run_clone(run, &run2); if (err) goto done; /* Make a hole range (sparse) [vcn1 + zero). */ if (!run_add_entry(run, vcn1, SPARSE_LCN, zero, false)) { err = -ENOMEM; goto done; } /* Update run in attribute segment. */ err = mi_pack_runs(mi, attr, run, evcn1 - svcn); if (err) goto done; next_svcn = le64_to_cpu(attr->nres.evcn) + 1; if (next_svcn < evcn1) { /* Insert new attribute segment. */ err = ni_insert_nonresident(ni, ATTR_DATA, NULL, 0, run, next_svcn, evcn1 - next_svcn, a_flags, &attr, &mi, &le); if (err) goto undo_punch; /* Layout of records maybe changed. */ attr_b = NULL; } /* Real deallocate. Should not fail. 
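 * The dry run above (run_deallocate_ex() with a NULL sbi) has already
 * walked exactly this range without touching the bitmap.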
*/ run_deallocate_ex(sbi, &run2, vcn1, zero, &hole, true); next_attr: /* Free all allocated memory. */ run_truncate(run, 0); if (evcn1 >= alen) break; /* Get next attribute segment. */ attr = ni_enum_attr_ex(ni, attr, &le, &mi); if (!attr) { err = -EINVAL; goto bad_inode; } svcn = le64_to_cpu(attr->nres.svcn); evcn1 = le64_to_cpu(attr->nres.evcn) + 1; } done: if (!hole) goto out; if (!attr_b) { attr_b = ni_find_attr(ni, NULL, NULL, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) { err = -EINVAL; goto bad_inode; } } total_size -= (u64)hole << sbi->cluster_bits; attr_b->nres.total_size = cpu_to_le64(total_size); mi_b->dirty = true; /* Update inode size. */ inode_set_bytes(&ni->vfs_inode, total_size); ni->ni_flags |= NI_FLAG_UPDATE_PARENT; mark_inode_dirty(&ni->vfs_inode); out: run_close(&run2); up_write(&ni->file.run_lock); return err; bad_inode: _ntfs_bad_inode(&ni->vfs_inode); goto out; undo_punch: /* * Restore packed runs. * 'mi_pack_runs' should not fail, cause we restore original. */ if (mi_pack_runs(mi, attr, &run2, evcn1 - svcn)) goto bad_inode; goto done; } /* * attr_insert_range - Insert range (hole) in file. * Not for normal files. */ int attr_insert_range(struct ntfs_inode *ni, u64 vbo, u64 bytes) { int err = 0; struct runs_tree *run = &ni->file.run; struct ntfs_sb_info *sbi = ni->mi.sbi; struct ATTRIB *attr = NULL, *attr_b; struct ATTR_LIST_ENTRY *le, *le_b; struct mft_inode *mi, *mi_b; CLST vcn, svcn, evcn1, len, next_svcn; u64 data_size, alloc_size; u32 mask; __le16 a_flags; if (!bytes) return 0; le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) return -ENOENT; if (!is_attr_ext(attr_b)) { /* It was checked above. See fallocate. */ return -EOPNOTSUPP; } if (!attr_b->non_res) { data_size = le32_to_cpu(attr_b->res.data_size); alloc_size = data_size; mask = sbi->cluster_mask; /* cluster_size - 1 */ } else { data_size = le64_to_cpu(attr_b->nres.data_size); alloc_size = le64_to_cpu(attr_b->nres.alloc_size); mask = (sbi->cluster_size << attr_b->nres.c_unit) - 1; } if (vbo >= data_size) { /* * Insert range after the file size is not allowed. * If the offset is equal to or greater than the end of * file, an error is returned. For such operations (i.e., inserting * a hole at the end of file), ftruncate(2) should be used. */ return -EINVAL; } if ((vbo & mask) || (bytes & mask)) { /* Allow to insert only frame aligned ranges. */ return -EINVAL; } /* * valid_size <= data_size <= alloc_size * Check alloc_size for maximum possible. */ if (bytes > sbi->maxbytes_sparse - alloc_size) return -EFBIG; vcn = vbo >> sbi->cluster_bits; len = bytes >> sbi->cluster_bits; down_write(&ni->file.run_lock); if (!attr_b->non_res) { err = attr_set_size(ni, ATTR_DATA, NULL, 0, run, data_size + bytes, NULL, false, NULL); le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) { err = -EINVAL; goto bad_inode; } if (err) goto out; if (!attr_b->non_res) { /* Still resident. */ char *data = Add2Ptr(attr_b, le16_to_cpu(attr_b->res.data_off)); memmove(data + bytes, data, bytes); memset(data, 0, bytes); goto done; } /* Resident files becomes nonresident. */ data_size = le64_to_cpu(attr_b->nres.data_size); alloc_size = le64_to_cpu(attr_b->nres.alloc_size); } /* * Enumerate all attribute segments and shift start vcn. 
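 *
 * run_insert_range() opens the hole in the cached run list; every
 * following DATA segment then has its [svcn, evcn] shifted up by 'len',
 * and a new segment is inserted if the re-packed one cannot hold all of
 * its runs.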
*/ a_flags = attr_b->flags; svcn = le64_to_cpu(attr_b->nres.svcn); evcn1 = le64_to_cpu(attr_b->nres.evcn) + 1; if (svcn <= vcn && vcn < evcn1) { attr = attr_b; le = le_b; mi = mi_b; } else if (!le_b) { err = -EINVAL; goto bad_inode; } else { le = le_b; attr = ni_find_attr(ni, attr_b, &le, ATTR_DATA, NULL, 0, &vcn, &mi); if (!attr) { err = -EINVAL; goto bad_inode; } svcn = le64_to_cpu(attr->nres.svcn); evcn1 = le64_to_cpu(attr->nres.evcn) + 1; } run_truncate(run, 0); /* clear cached values. */ err = attr_load_runs(attr, ni, run, NULL); if (err) goto out; if (!run_insert_range(run, vcn, len)) { err = -ENOMEM; goto out; } /* Try to pack in current record as much as possible. */ err = mi_pack_runs(mi, attr, run, evcn1 + len - svcn); if (err) goto out; next_svcn = le64_to_cpu(attr->nres.evcn) + 1; while ((attr = ni_enum_attr_ex(ni, attr, &le, &mi)) && attr->type == ATTR_DATA && !attr->name_len) { le64_add_cpu(&attr->nres.svcn, len); le64_add_cpu(&attr->nres.evcn, len); if (le) { le->vcn = attr->nres.svcn; ni->attr_list.dirty = true; } mi->dirty = true; } if (next_svcn < evcn1 + len) { err = ni_insert_nonresident(ni, ATTR_DATA, NULL, 0, run, next_svcn, evcn1 + len - next_svcn, a_flags, NULL, NULL, NULL); le_b = NULL; attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b); if (!attr_b) { err = -EINVAL; goto bad_inode; } if (err) { /* ni_insert_nonresident failed. Try to undo. */ goto undo_insert_range; } } /* * Update primary attribute segment. */ if (vbo <= ni->i_valid) ni->i_valid += bytes; attr_b->nres.data_size = cpu_to_le64(data_size + bytes); attr_b->nres.alloc_size = cpu_to_le64(alloc_size + bytes); /* ni->valid may be not equal valid_size (temporary). */ if (ni->i_valid > data_size + bytes) attr_b->nres.valid_size = attr_b->nres.data_size; else attr_b->nres.valid_size = cpu_to_le64(ni->i_valid); mi_b->dirty = true; done: i_size_write(&ni->vfs_inode, ni->vfs_inode.i_size + bytes); ni->ni_flags |= NI_FLAG_UPDATE_PARENT; mark_inode_dirty(&ni->vfs_inode); out: run_truncate(run, 0); /* clear cached values. */ up_write(&ni->file.run_lock); return err; bad_inode: _ntfs_bad_inode(&ni->vfs_inode); goto out; undo_insert_range: svcn = le64_to_cpu(attr_b->nres.svcn); evcn1 = le64_to_cpu(attr_b->nres.evcn) + 1; if (svcn <= vcn && vcn < evcn1) { attr = attr_b; le = le_b; mi = mi_b; } else if (!le_b) { goto bad_inode; } else { le = le_b; attr = ni_find_attr(ni, attr_b, &le, ATTR_DATA, NULL, 0, &vcn, &mi); if (!attr) { goto bad_inode; } svcn = le64_to_cpu(attr->nres.svcn); evcn1 = le64_to_cpu(attr->nres.evcn) + 1; } if (attr_load_runs(attr, ni, run, NULL)) goto bad_inode; if (!run_collapse_range(run, vcn, len)) goto bad_inode; if (mi_pack_runs(mi, attr, run, evcn1 + len - svcn)) goto bad_inode; while ((attr = ni_enum_attr_ex(ni, attr, &le, &mi)) && attr->type == ATTR_DATA && !attr->name_len) { le64_sub_cpu(&attr->nres.svcn, len); le64_sub_cpu(&attr->nres.evcn, len); if (le) { le->vcn = attr->nres.svcn; ni->attr_list.dirty = true; } mi->dirty = true; } goto out; } /* * attr_force_nonresident * * Convert default data attribute into non resident form. */ int attr_force_nonresident(struct ntfs_inode *ni) { int err; struct ATTRIB *attr; struct ATTR_LIST_ENTRY *le = NULL; struct mft_inode *mi; attr = ni_find_attr(ni, NULL, &le, ATTR_DATA, NULL, 0, NULL, &mi); if (!attr) { _ntfs_bad_inode(&ni->vfs_inode); return -ENOENT; } if (attr->non_res) { /* Already non resident. 
*/ return 0; } down_write(&ni->file.run_lock); err = attr_make_nonresident(ni, attr, le, mi, le32_to_cpu(attr->res.data_size), &ni->file.run, &attr, NULL); up_write(&ni->file.run_lock); return err; } /* * Change the compression state of a data attribute. */ int attr_set_compress(struct ntfs_inode *ni, bool compr) { struct ATTRIB *attr; struct mft_inode *mi; attr = ni_find_attr(ni, NULL, NULL, ATTR_DATA, NULL, 0, NULL, &mi); if (!attr) return -ENOENT; if (is_attr_compressed(attr) == !!compr) { /* Already in the requested compression state. */ return 0; } if (attr->non_res) { u16 run_off; u32 run_size; char *run; if (attr->nres.data_size) { /* * There are rare cases when it is possible to change the * compression state without major changes. * TODO: Process these cases. */ return -EOPNOTSUPP; } run_off = le16_to_cpu(attr->nres.run_off); run_size = le32_to_cpu(attr->size) - run_off; run = Add2Ptr(attr, run_off); if (!compr) { /* Remove the field 'attr->nres.total_size'. */ memmove(run - 8, run, run_size); run_off -= 8; } if (!mi_resize_attr(mi, attr, compr ? +8 : -8)) { /* * Bail out in the rare case when the record holding the * attribute has no 8 spare bytes. * TODO: split attribute. */ return -EOPNOTSUPP; } if (compr) { /* Make a gap for 'attr->nres.total_size'. */ memmove(run + 8, run, run_size); run_off += 8; attr->nres.total_size = attr->nres.alloc_size; } attr->nres.run_off = cpu_to_le16(run_off); } /* Update data attribute flags. */ if (compr) { attr->flags |= ATTR_FLAG_COMPRESSED; attr->nres.c_unit = NTFS_LZNT_CUNIT; } else { attr->flags &= ~ATTR_FLAG_COMPRESSED; attr->nres.c_unit = 0; } mi->dirty = true; return 0; }
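/*
 * Editor's sketch, not part of fs/ntfs3: attr_insert_range() above appears
 * to back fallocate(FALLOC_FL_INSERT_RANGE) on NTFS. This userspace
 * illustration exercises the constraints enforced above -- offset and length
 * must be cluster/frame aligned and the offset must lie strictly before EOF,
 * otherwise the call fails with EINVAL. "testfile" and the 64 KiB figures
 * are hypothetical.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Assumes testfile already exists and is larger than 128 KiB. */
	int fd = open("testfile", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/*
	 * Insert a 64 KiB hole at offset 64 KiB; both are multiples of a
	 * typical 4 KiB cluster, satisfying the (vbo & mask) checks above.
	 */
	if (fallocate(fd, FALLOC_FL_INSERT_RANGE, 65536, 65536))
		perror("fallocate(FALLOC_FL_INSERT_RANGE)");
	close(fd);
	return 0;
}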
// SPDX-License-Identifier: GPL-2.0-or-later /* * Copyright (c) 2016 Mellanox Technologies. All rights reserved.
* Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com> */ #include "devl_internal.h" struct devlink_linecard { struct list_head list; struct devlink *devlink; unsigned int index; const struct devlink_linecard_ops *ops; void *priv; enum devlink_linecard_state state; struct mutex state_lock; /* Protects state */ const char *type; struct devlink_linecard_type *types; unsigned int types_count; u32 rel_index; }; unsigned int devlink_linecard_index(struct devlink_linecard *linecard) { return linecard->index; } static struct devlink_linecard * devlink_linecard_get_by_index(struct devlink *devlink, unsigned int linecard_index) { struct devlink_linecard *devlink_linecard; list_for_each_entry(devlink_linecard, &devlink->linecard_list, list) { if (devlink_linecard->index == linecard_index) return devlink_linecard; } return NULL; } static bool devlink_linecard_index_exists(struct devlink *devlink, unsigned int linecard_index) { return devlink_linecard_get_by_index(devlink, linecard_index); } static struct devlink_linecard * devlink_linecard_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) { if (attrs[DEVLINK_ATTR_LINECARD_INDEX]) { u32 linecard_index = nla_get_u32(attrs[DEVLINK_ATTR_LINECARD_INDEX]); struct devlink_linecard *linecard; linecard = devlink_linecard_get_by_index(devlink, linecard_index); if (!linecard) return ERR_PTR(-ENODEV); return linecard; } return ERR_PTR(-EINVAL); } static struct devlink_linecard * devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info) { return devlink_linecard_get_from_attrs(devlink, info->attrs); } struct devlink_linecard_type { const char *type; const void *priv; }; static int devlink_nl_linecard_fill(struct sk_buff *msg, struct devlink *devlink, struct devlink_linecard *linecard, enum devlink_command cmd, u32 portid, u32 seq, int flags, struct netlink_ext_ack *extack) { struct devlink_linecard_type *linecard_type; struct nlattr *attr; void *hdr; int i; hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); if (!hdr) return -EMSGSIZE; if (devlink_nl_put_handle(msg, devlink)) goto nla_put_failure; if (nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, linecard->index)) goto nla_put_failure; if (nla_put_u8(msg, DEVLINK_ATTR_LINECARD_STATE, linecard->state)) goto nla_put_failure; if (linecard->type && nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE, linecard->type)) goto nla_put_failure; if (linecard->types_count) { attr = nla_nest_start(msg, DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES); if (!attr) goto nla_put_failure; for (i = 0; i < linecard->types_count; i++) { linecard_type = &linecard->types[i]; if (nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE, linecard_type->type)) { nla_nest_cancel(msg, attr); goto nla_put_failure; } } nla_nest_end(msg, attr); } if (devlink_rel_devlink_handle_put(msg, devlink, linecard->rel_index, DEVLINK_ATTR_NESTED_DEVLINK, NULL)) goto nla_put_failure; genlmsg_end(msg, hdr); return 0; nla_put_failure: genlmsg_cancel(msg, hdr); return -EMSGSIZE; } static void devlink_linecard_notify(struct devlink_linecard *linecard, enum devlink_command cmd) { struct devlink *devlink = linecard->devlink; struct sk_buff *msg; int err; WARN_ON(cmd != DEVLINK_CMD_LINECARD_NEW && cmd != DEVLINK_CMD_LINECARD_DEL); if (!__devl_is_registered(devlink) || !devlink_nl_notify_need(devlink)) return; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return; err = devlink_nl_linecard_fill(msg, devlink, linecard, cmd, 0, 0, 0, NULL); if (err) { nlmsg_free(msg); return; } devlink_nl_notify_send(devlink, msg); } void 
devlink_linecards_notify_register(struct devlink *devlink) { struct devlink_linecard *linecard; list_for_each_entry(linecard, &devlink->linecard_list, list) devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); } void devlink_linecards_notify_unregister(struct devlink *devlink) { struct devlink_linecard *linecard; list_for_each_entry_reverse(linecard, &devlink->linecard_list, list) devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_DEL); } int devlink_nl_linecard_get_doit(struct sk_buff *skb, struct genl_info *info) { struct devlink *devlink = info->user_ptr[0]; struct devlink_linecard *linecard; struct sk_buff *msg; int err; linecard = devlink_linecard_get_from_info(devlink, info); if (IS_ERR(linecard)) return PTR_ERR(linecard); msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; mutex_lock(&linecard->state_lock); err = devlink_nl_linecard_fill(msg, devlink, linecard, DEVLINK_CMD_LINECARD_NEW, info->snd_portid, info->snd_seq, 0, info->extack); mutex_unlock(&linecard->state_lock); if (err) { nlmsg_free(msg); return err; } return genlmsg_reply(msg, info); } static int devlink_nl_linecard_get_dump_one(struct sk_buff *msg, struct devlink *devlink, struct netlink_callback *cb, int flags) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_linecard *linecard; int idx = 0; int err = 0; list_for_each_entry(linecard, &devlink->linecard_list, list) { if (idx < state->idx) { idx++; continue; } mutex_lock(&linecard->state_lock); err = devlink_nl_linecard_fill(msg, devlink, linecard, DEVLINK_CMD_LINECARD_NEW, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, flags, cb->extack); mutex_unlock(&linecard->state_lock); if (err) { state->idx = idx; break; } idx++; } return err; } int devlink_nl_linecard_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { return devlink_nl_dumpit(skb, cb, devlink_nl_linecard_get_dump_one); } static struct devlink_linecard_type * devlink_linecard_type_lookup(struct devlink_linecard *linecard, const char *type) { struct devlink_linecard_type *linecard_type; int i; for (i = 0; i < linecard->types_count; i++) { linecard_type = &linecard->types[i]; if (!strcmp(type, linecard_type->type)) return linecard_type; } return NULL; } static int devlink_linecard_type_set(struct devlink_linecard *linecard, const char *type, struct netlink_ext_ack *extack) { const struct devlink_linecard_ops *ops = linecard->ops; struct devlink_linecard_type *linecard_type; int err; mutex_lock(&linecard->state_lock); if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) { NL_SET_ERR_MSG(extack, "Line card is currently being provisioned"); err = -EBUSY; goto out; } if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) { NL_SET_ERR_MSG(extack, "Line card is currently being unprovisioned"); err = -EBUSY; goto out; } linecard_type = devlink_linecard_type_lookup(linecard, type); if (!linecard_type) { NL_SET_ERR_MSG(extack, "Unsupported line card type provided"); err = -EINVAL; goto out; } if (linecard->state != DEVLINK_LINECARD_STATE_UNPROVISIONED && linecard->state != DEVLINK_LINECARD_STATE_PROVISIONING_FAILED) { NL_SET_ERR_MSG(extack, "Line card already provisioned"); err = -EBUSY; /* Check if the line card is provisioned in the same * way the user asks. In case it is, make the operation * to return success. 
*/ if (ops->same_provision && ops->same_provision(linecard, linecard->priv, linecard_type->type, linecard_type->priv)) err = 0; goto out; } linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING; linecard->type = linecard_type->type; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); mutex_unlock(&linecard->state_lock); err = ops->provision(linecard, linecard->priv, linecard_type->type, linecard_type->priv, extack); if (err) { /* Provisioning failed. Assume the linecard is unprovisioned * for future operations. */ mutex_lock(&linecard->state_lock); linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; linecard->type = NULL; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); mutex_unlock(&linecard->state_lock); } return err; out: mutex_unlock(&linecard->state_lock); return err; } static int devlink_linecard_type_unset(struct devlink_linecard *linecard, struct netlink_ext_ack *extack) { int err; mutex_lock(&linecard->state_lock); if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) { NL_SET_ERR_MSG(extack, "Line card is currently being provisioned"); err = -EBUSY; goto out; } if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) { NL_SET_ERR_MSG(extack, "Line card is currently being unprovisioned"); err = -EBUSY; goto out; } if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING_FAILED) { linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; linecard->type = NULL; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); err = 0; goto out; } if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONED) { NL_SET_ERR_MSG(extack, "Line card is not provisioned"); err = 0; goto out; } linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONING; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); mutex_unlock(&linecard->state_lock); err = linecard->ops->unprovision(linecard, linecard->priv, extack); if (err) { /* Unprovisioning failed. Assume the linecard is unprovisioned * for future operations. 
*/ mutex_lock(&linecard->state_lock); linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; linecard->type = NULL; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); mutex_unlock(&linecard->state_lock); } return err; out: mutex_unlock(&linecard->state_lock); return err; } int devlink_nl_linecard_set_doit(struct sk_buff *skb, struct genl_info *info) { struct netlink_ext_ack *extack = info->extack; struct devlink *devlink = info->user_ptr[0]; struct devlink_linecard *linecard; int err; linecard = devlink_linecard_get_from_info(devlink, info); if (IS_ERR(linecard)) return PTR_ERR(linecard); if (info->attrs[DEVLINK_ATTR_LINECARD_TYPE]) { const char *type; type = nla_data(info->attrs[DEVLINK_ATTR_LINECARD_TYPE]); if (strcmp(type, "")) { err = devlink_linecard_type_set(linecard, type, extack); if (err) return err; } else { err = devlink_linecard_type_unset(linecard, extack); if (err) return err; } } return 0; } static int devlink_linecard_types_init(struct devlink_linecard *linecard) { struct devlink_linecard_type *linecard_type; unsigned int count; int i; count = linecard->ops->types_count(linecard, linecard->priv); linecard->types = kmalloc_array(count, sizeof(*linecard_type), GFP_KERNEL); if (!linecard->types) return -ENOMEM; linecard->types_count = count; for (i = 0; i < count; i++) { linecard_type = &linecard->types[i]; linecard->ops->types_get(linecard, linecard->priv, i, &linecard_type->type, &linecard_type->priv); } return 0; } static void devlink_linecard_types_fini(struct devlink_linecard *linecard) { kfree(linecard->types); } /** * devl_linecard_create - Create devlink linecard * * @devlink: devlink * @linecard_index: driver-specific numerical identifier of the linecard * @ops: linecards ops * @priv: user priv pointer * * Create devlink linecard instance with provided linecard index. * Caller can use any indexing, even hw-related one. * * Return: Line card structure or an ERR_PTR() encoded error code. 
*/ struct devlink_linecard * devl_linecard_create(struct devlink *devlink, unsigned int linecard_index, const struct devlink_linecard_ops *ops, void *priv) { struct devlink_linecard *linecard; int err; if (WARN_ON(!ops || !ops->provision || !ops->unprovision || !ops->types_count || !ops->types_get)) return ERR_PTR(-EINVAL); if (devlink_linecard_index_exists(devlink, linecard_index)) return ERR_PTR(-EEXIST); linecard = kzalloc(sizeof(*linecard), GFP_KERNEL); if (!linecard) return ERR_PTR(-ENOMEM); linecard->devlink = devlink; linecard->index = linecard_index; linecard->ops = ops; linecard->priv = priv; linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; mutex_init(&linecard->state_lock); err = devlink_linecard_types_init(linecard); if (err) { mutex_destroy(&linecard->state_lock); kfree(linecard); return ERR_PTR(err); } list_add_tail(&linecard->list, &devlink->linecard_list); devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); return linecard; } EXPORT_SYMBOL_GPL(devl_linecard_create); /** * devl_linecard_destroy - Destroy devlink linecard * * @linecard: devlink linecard */ void devl_linecard_destroy(struct devlink_linecard *linecard) { devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_DEL); list_del(&linecard->list); devlink_linecard_types_fini(linecard); mutex_destroy(&linecard->state_lock); kfree(linecard); } EXPORT_SYMBOL_GPL(devl_linecard_destroy); /** * devlink_linecard_provision_set - Set provisioning on linecard * * @linecard: devlink linecard * @type: linecard type * * This is either called directly from the provision() op call or * as a result of the provision() op call asynchronously. */ void devlink_linecard_provision_set(struct devlink_linecard *linecard, const char *type) { mutex_lock(&linecard->state_lock); WARN_ON(linecard->type && strcmp(linecard->type, type)); linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED; linecard->type = type; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); mutex_unlock(&linecard->state_lock); } EXPORT_SYMBOL_GPL(devlink_linecard_provision_set); /** * devlink_linecard_provision_clear - Clear provisioning on linecard * * @linecard: devlink linecard * * This is either called directly from the unprovision() op call or * as a result of the unprovision() op call asynchronously. */ void devlink_linecard_provision_clear(struct devlink_linecard *linecard) { mutex_lock(&linecard->state_lock); linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; linecard->type = NULL; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); mutex_unlock(&linecard->state_lock); } EXPORT_SYMBOL_GPL(devlink_linecard_provision_clear); /** * devlink_linecard_provision_fail - Fail provisioning on linecard * * @linecard: devlink linecard * * This is either called directly from the provision() op call or * as a result of the provision() op call asynchronously. 
*/ void devlink_linecard_provision_fail(struct devlink_linecard *linecard) { mutex_lock(&linecard->state_lock); linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING_FAILED; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); mutex_unlock(&linecard->state_lock); } EXPORT_SYMBOL_GPL(devlink_linecard_provision_fail); /** * devlink_linecard_activate - Set linecard active * * @linecard: devlink linecard */ void devlink_linecard_activate(struct devlink_linecard *linecard) { mutex_lock(&linecard->state_lock); WARN_ON(linecard->state != DEVLINK_LINECARD_STATE_PROVISIONED); linecard->state = DEVLINK_LINECARD_STATE_ACTIVE; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); mutex_unlock(&linecard->state_lock); } EXPORT_SYMBOL_GPL(devlink_linecard_activate); /** * devlink_linecard_deactivate - Set linecard inactive * * @linecard: devlink linecard */ void devlink_linecard_deactivate(struct devlink_linecard *linecard) { mutex_lock(&linecard->state_lock); switch (linecard->state) { case DEVLINK_LINECARD_STATE_ACTIVE: linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); break; case DEVLINK_LINECARD_STATE_UNPROVISIONING: /* Line card is being deactivated as part * of unprovisioning flow. */ break; default: WARN_ON(1); break; } mutex_unlock(&linecard->state_lock); } EXPORT_SYMBOL_GPL(devlink_linecard_deactivate); static void devlink_linecard_rel_notify_cb(struct devlink *devlink, u32 linecard_index) { struct devlink_linecard *linecard; linecard = devlink_linecard_get_by_index(devlink, linecard_index); if (!linecard) return; devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); } static void devlink_linecard_rel_cleanup_cb(struct devlink *devlink, u32 linecard_index, u32 rel_index) { struct devlink_linecard *linecard; linecard = devlink_linecard_get_by_index(devlink, linecard_index); if (linecard && linecard->rel_index == rel_index) linecard->rel_index = 0; } /** * devlink_linecard_nested_dl_set - Attach/detach nested devlink * instance to linecard. * * @linecard: devlink linecard * @nested_devlink: devlink instance to attach or NULL to detach */ int devlink_linecard_nested_dl_set(struct devlink_linecard *linecard, struct devlink *nested_devlink) { return devlink_rel_nested_in_add(&linecard->rel_index, linecard->devlink->index, linecard->index, devlink_linecard_rel_notify_cb, devlink_linecard_rel_cleanup_cb, nested_devlink); } EXPORT_SYMBOL_GPL(devlink_linecard_nested_dl_set);
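/*
 * Editor's sketch, not from the kernel tree: how a driver would typically
 * wire up the ops consumed above. Names prefixed "foo_" are hypothetical;
 * the callback signatures are inferred from the call sites in this file
 * (ops->provision, ops->unprovision, ops->types_count, ops->types_get).
 */
#include <net/devlink.h>

static const char * const foo_linecard_types[] = { "16x100G", "8x200G" };

static int foo_lc_provision(struct devlink_linecard *linecard, void *priv,
			    const char *type, const void *type_priv,
			    struct netlink_ext_ack *extack)
{
	/*
	 * Start provisioning in hardware; on completion (possibly from an
	 * async event handler) report the final state back to devlink.
	 */
	devlink_linecard_provision_set(linecard, type);
	return 0;
}

static int foo_lc_unprovision(struct devlink_linecard *linecard, void *priv,
			      struct netlink_ext_ack *extack)
{
	devlink_linecard_provision_clear(linecard);
	return 0;
}

static unsigned int foo_lc_types_count(struct devlink_linecard *linecard,
				       void *priv)
{
	return ARRAY_SIZE(foo_linecard_types);
}

static void foo_lc_types_get(struct devlink_linecard *linecard, void *priv,
			     unsigned int index, const char **type,
			     const void **type_priv)
{
	*type = foo_linecard_types[index];
	*type_priv = NULL;
}

static const struct devlink_linecard_ops foo_lc_ops = {
	.provision = foo_lc_provision,
	.unprovision = foo_lc_unprovision,
	.types_count = foo_lc_types_count,
	.types_get = foo_lc_types_get,
	/* .same_provision is optional, see devlink_linecard_type_set(). */
};

/*
 * In the driver's init path, with the devlink instance lock held
 * (devl_ prefix):
 *
 *	linecard = devl_linecard_create(devlink, lc_index, &foo_lc_ops,
 *					lc_priv);
 */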
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * include/linux/signalfd.h
 *
 * Copyright (C) 2007 Davide Libenzi <davidel@xmailserver.org>
 *
 */
#ifndef _LINUX_SIGNALFD_H
#define _LINUX_SIGNALFD_H

#include <uapi/linux/signalfd.h>
#include <linux/sched/signal.h>

#ifdef CONFIG_SIGNALFD

/*
 * Deliver the signal to listening signalfd.
 */
static inline void signalfd_notify(struct task_struct *tsk, int sig)
{
	if (unlikely(waitqueue_active(&tsk->sighand->signalfd_wqh)))
		wake_up(&tsk->sighand->signalfd_wqh);
}

extern void signalfd_cleanup(struct sighand_struct *sighand);

#else /* CONFIG_SIGNALFD */

static inline void signalfd_notify(struct task_struct *tsk, int sig) { }

static inline void signalfd_cleanup(struct sighand_struct *sighand) { }

#endif /* CONFIG_SIGNALFD */

#endif /* _LINUX_SIGNALFD_H */
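/*
 * Editor's sketch, not part of this header: signalfd_notify() above wakes
 * the sighand->signalfd_wqh wait queue that backs the signalfd(2) file
 * descriptor. From userspace the mechanism is consumed roughly like this:
 * block normal delivery of the signal, create the descriptor, then read
 * struct signalfd_siginfo records from it.
 */
#include <signal.h>
#include <stdio.h>
#include <sys/signalfd.h>
#include <unistd.h>

int main(void)
{
	sigset_t mask;
	struct signalfd_siginfo fdsi;
	int sfd;

	sigemptyset(&mask);
	sigaddset(&mask, SIGINT);
	/* Block normal delivery so the signal is reported via the fd. */
	sigprocmask(SIG_BLOCK, &mask, NULL);

	sfd = signalfd(-1, &mask, 0);
	if (sfd < 0) {
		perror("signalfd");
		return 1;
	}
	if (read(sfd, &fdsi, sizeof(fdsi)) == sizeof(fdsi))
		printf("got signal %u\n", fdsi.ssi_signo);
	close(sfd);
	return 0;
}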
// SPDX-License-Identifier: GPL-2.0-or-later /* * Common Twofish algorithm parts shared between the c and assembler * implementations * * Originally Twofish for GPG * By Matthew Skala <mskala@ansuz.sooke.bc.ca>, July 26, 1998 * 256-bit key length added March 20, 1999 * Some modifications to reduce the text size by Werner Koch, April, 1998 * Ported to the kerneli patch by Marc Mutz <Marc@Mutz.com> * Ported to CryptoAPI by Colin Slater <hoho@tacomeat.net> * * The original author has disclaimed all copyright interest in this * code and thus put it in the public domain. The subsequent authors * have put this under the GNU General Public License.
* * This code is a "clean room" implementation, written from the paper * _Twofish: A 128-Bit Block Cipher_ by Bruce Schneier, John Kelsey, * Doug Whiting, David Wagner, Chris Hall, and Niels Ferguson, available * through http://www.counterpane.com/twofish.html * * For background information on multiplication in finite fields, used for * the matrix operations in the key schedule, see the book _Contemporary * Abstract Algebra_ by Joseph A. Gallian, especially chapter 22 in the * Third Edition. */ #include <crypto/algapi.h> #include <crypto/twofish.h> #include <linux/bitops.h> #include <linux/errno.h> #include <linux/init.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/types.h> /* The large precomputed tables for the Twofish cipher (twofish.c) * Taken from the same source as twofish.c * Marc Mutz <Marc@Mutz.com> */ /* These two tables are the q0 and q1 permutations, exactly as described in * the Twofish paper. */ static const u8 q0[256] = { 0xA9, 0x67, 0xB3, 0xE8, 0x04, 0xFD, 0xA3, 0x76, 0x9A, 0x92, 0x80, 0x78, 0xE4, 0xDD, 0xD1, 0x38, 0x0D, 0xC6, 0x35, 0x98, 0x18, 0xF7, 0xEC, 0x6C, 0x43, 0x75, 0x37, 0x26, 0xFA, 0x13, 0x94, 0x48, 0xF2, 0xD0, 0x8B, 0x30, 0x84, 0x54, 0xDF, 0x23, 0x19, 0x5B, 0x3D, 0x59, 0xF3, 0xAE, 0xA2, 0x82, 0x63, 0x01, 0x83, 0x2E, 0xD9, 0x51, 0x9B, 0x7C, 0xA6, 0xEB, 0xA5, 0xBE, 0x16, 0x0C, 0xE3, 0x61, 0xC0, 0x8C, 0x3A, 0xF5, 0x73, 0x2C, 0x25, 0x0B, 0xBB, 0x4E, 0x89, 0x6B, 0x53, 0x6A, 0xB4, 0xF1, 0xE1, 0xE6, 0xBD, 0x45, 0xE2, 0xF4, 0xB6, 0x66, 0xCC, 0x95, 0x03, 0x56, 0xD4, 0x1C, 0x1E, 0xD7, 0xFB, 0xC3, 0x8E, 0xB5, 0xE9, 0xCF, 0xBF, 0xBA, 0xEA, 0x77, 0x39, 0xAF, 0x33, 0xC9, 0x62, 0x71, 0x81, 0x79, 0x09, 0xAD, 0x24, 0xCD, 0xF9, 0xD8, 0xE5, 0xC5, 0xB9, 0x4D, 0x44, 0x08, 0x86, 0xE7, 0xA1, 0x1D, 0xAA, 0xED, 0x06, 0x70, 0xB2, 0xD2, 0x41, 0x7B, 0xA0, 0x11, 0x31, 0xC2, 0x27, 0x90, 0x20, 0xF6, 0x60, 0xFF, 0x96, 0x5C, 0xB1, 0xAB, 0x9E, 0x9C, 0x52, 0x1B, 0x5F, 0x93, 0x0A, 0xEF, 0x91, 0x85, 0x49, 0xEE, 0x2D, 0x4F, 0x8F, 0x3B, 0x47, 0x87, 0x6D, 0x46, 0xD6, 0x3E, 0x69, 0x64, 0x2A, 0xCE, 0xCB, 0x2F, 0xFC, 0x97, 0x05, 0x7A, 0xAC, 0x7F, 0xD5, 0x1A, 0x4B, 0x0E, 0xA7, 0x5A, 0x28, 0x14, 0x3F, 0x29, 0x88, 0x3C, 0x4C, 0x02, 0xB8, 0xDA, 0xB0, 0x17, 0x55, 0x1F, 0x8A, 0x7D, 0x57, 0xC7, 0x8D, 0x74, 0xB7, 0xC4, 0x9F, 0x72, 0x7E, 0x15, 0x22, 0x12, 0x58, 0x07, 0x99, 0x34, 0x6E, 0x50, 0xDE, 0x68, 0x65, 0xBC, 0xDB, 0xF8, 0xC8, 0xA8, 0x2B, 0x40, 0xDC, 0xFE, 0x32, 0xA4, 0xCA, 0x10, 0x21, 0xF0, 0xD3, 0x5D, 0x0F, 0x00, 0x6F, 0x9D, 0x36, 0x42, 0x4A, 0x5E, 0xC1, 0xE0 }; static const u8 q1[256] = { 0x75, 0xF3, 0xC6, 0xF4, 0xDB, 0x7B, 0xFB, 0xC8, 0x4A, 0xD3, 0xE6, 0x6B, 0x45, 0x7D, 0xE8, 0x4B, 0xD6, 0x32, 0xD8, 0xFD, 0x37, 0x71, 0xF1, 0xE1, 0x30, 0x0F, 0xF8, 0x1B, 0x87, 0xFA, 0x06, 0x3F, 0x5E, 0xBA, 0xAE, 0x5B, 0x8A, 0x00, 0xBC, 0x9D, 0x6D, 0xC1, 0xB1, 0x0E, 0x80, 0x5D, 0xD2, 0xD5, 0xA0, 0x84, 0x07, 0x14, 0xB5, 0x90, 0x2C, 0xA3, 0xB2, 0x73, 0x4C, 0x54, 0x92, 0x74, 0x36, 0x51, 0x38, 0xB0, 0xBD, 0x5A, 0xFC, 0x60, 0x62, 0x96, 0x6C, 0x42, 0xF7, 0x10, 0x7C, 0x28, 0x27, 0x8C, 0x13, 0x95, 0x9C, 0xC7, 0x24, 0x46, 0x3B, 0x70, 0xCA, 0xE3, 0x85, 0xCB, 0x11, 0xD0, 0x93, 0xB8, 0xA6, 0x83, 0x20, 0xFF, 0x9F, 0x77, 0xC3, 0xCC, 0x03, 0x6F, 0x08, 0xBF, 0x40, 0xE7, 0x2B, 0xE2, 0x79, 0x0C, 0xAA, 0x82, 0x41, 0x3A, 0xEA, 0xB9, 0xE4, 0x9A, 0xA4, 0x97, 0x7E, 0xDA, 0x7A, 0x17, 0x66, 0x94, 0xA1, 0x1D, 0x3D, 0xF0, 0xDE, 0xB3, 0x0B, 0x72, 0xA7, 0x1C, 0xEF, 0xD1, 0x53, 0x3E, 0x8F, 0x33, 0x26, 0x5F, 0xEC, 0x76, 0x2A, 0x49, 0x81, 0x88, 0xEE, 0x21, 0xC4, 0x1A, 0xEB, 0xD9, 0xC5, 0x39, 0x99, 0xCD, 0xAD, 0x31, 0x8B, 0x01, 0x18, 
0x23, 0xDD, 0x1F, 0x4E, 0x2D, 0xF9, 0x48, 0x4F, 0xF2, 0x65, 0x8E, 0x78, 0x5C, 0x58, 0x19, 0x8D, 0xE5, 0x98, 0x57, 0x67, 0x7F, 0x05, 0x64, 0xAF, 0x63, 0xB6, 0xFE, 0xF5, 0xB7, 0x3C, 0xA5, 0xCE, 0xE9, 0x68, 0x44, 0xE0, 0x4D, 0x43, 0x69, 0x29, 0x2E, 0xAC, 0x15, 0x59, 0xA8, 0x0A, 0x9E, 0x6E, 0x47, 0xDF, 0x34, 0x35, 0x6A, 0xCF, 0xDC, 0x22, 0xC9, 0xC0, 0x9B, 0x89, 0xD4, 0xED, 0xAB, 0x12, 0xA2, 0x0D, 0x52, 0xBB, 0x02, 0x2F, 0xA9, 0xD7, 0x61, 0x1E, 0xB4, 0x50, 0x04, 0xF6, 0xC2, 0x16, 0x25, 0x86, 0x56, 0x55, 0x09, 0xBE, 0x91 }; /* These MDS tables are actually tables of MDS composed with q0 and q1, * because it is only ever used that way and we can save some time by * precomputing. Of course the main saving comes from precomputing the * GF(2^8) multiplication involved in the MDS matrix multiply; by looking * things up in these tables we reduce the matrix multiply to four lookups * and three XORs. Semi-formally, the definition of these tables is: * mds[0][i] = MDS (q1[i] 0 0 0)^T mds[1][i] = MDS (0 q0[i] 0 0)^T * mds[2][i] = MDS (0 0 q1[i] 0)^T mds[3][i] = MDS (0 0 0 q0[i])^T * where ^T means "transpose", the matrix multiply is performed in GF(2^8) * represented as GF(2)[x]/v(x) where v(x)=x^8+x^6+x^5+x^3+1 as described * by Schneier et al, and I'm casually glossing over the byte/word * conversion issues. */ static const u32 mds[4][256] = { { 0xBCBC3275, 0xECEC21F3, 0x202043C6, 0xB3B3C9F4, 0xDADA03DB, 0x02028B7B, 0xE2E22BFB, 0x9E9EFAC8, 0xC9C9EC4A, 0xD4D409D3, 0x18186BE6, 0x1E1E9F6B, 0x98980E45, 0xB2B2387D, 0xA6A6D2E8, 0x2626B74B, 0x3C3C57D6, 0x93938A32, 0x8282EED8, 0x525298FD, 0x7B7BD437, 0xBBBB3771, 0x5B5B97F1, 0x474783E1, 0x24243C30, 0x5151E20F, 0xBABAC6F8, 0x4A4AF31B, 0xBFBF4887, 0x0D0D70FA, 0xB0B0B306, 0x7575DE3F, 0xD2D2FD5E, 0x7D7D20BA, 0x666631AE, 0x3A3AA35B, 0x59591C8A, 0x00000000, 0xCDCD93BC, 0x1A1AE09D, 0xAEAE2C6D, 0x7F7FABC1, 0x2B2BC7B1, 0xBEBEB90E, 0xE0E0A080, 0x8A8A105D, 0x3B3B52D2, 0x6464BAD5, 0xD8D888A0, 0xE7E7A584, 0x5F5FE807, 0x1B1B1114, 0x2C2CC2B5, 0xFCFCB490, 0x3131272C, 0x808065A3, 0x73732AB2, 0x0C0C8173, 0x79795F4C, 0x6B6B4154, 0x4B4B0292, 0x53536974, 0x94948F36, 0x83831F51, 0x2A2A3638, 0xC4C49CB0, 0x2222C8BD, 0xD5D5F85A, 0xBDBDC3FC, 0x48487860, 0xFFFFCE62, 0x4C4C0796, 0x4141776C, 0xC7C7E642, 0xEBEB24F7, 0x1C1C1410, 0x5D5D637C, 0x36362228, 0x6767C027, 0xE9E9AF8C, 0x4444F913, 0x1414EA95, 0xF5F5BB9C, 0xCFCF18C7, 0x3F3F2D24, 0xC0C0E346, 0x7272DB3B, 0x54546C70, 0x29294CCA, 0xF0F035E3, 0x0808FE85, 0xC6C617CB, 0xF3F34F11, 0x8C8CE4D0, 0xA4A45993, 0xCACA96B8, 0x68683BA6, 0xB8B84D83, 0x38382820, 0xE5E52EFF, 0xADAD569F, 0x0B0B8477, 0xC8C81DC3, 0x9999FFCC, 0x5858ED03, 0x19199A6F, 0x0E0E0A08, 0x95957EBF, 0x70705040, 0xF7F730E7, 0x6E6ECF2B, 0x1F1F6EE2, 0xB5B53D79, 0x09090F0C, 0x616134AA, 0x57571682, 0x9F9F0B41, 0x9D9D803A, 0x111164EA, 0x2525CDB9, 0xAFAFDDE4, 0x4545089A, 0xDFDF8DA4, 0xA3A35C97, 0xEAEAD57E, 0x353558DA, 0xEDEDD07A, 0x4343FC17, 0xF8F8CB66, 0xFBFBB194, 0x3737D3A1, 0xFAFA401D, 0xC2C2683D, 0xB4B4CCF0, 0x32325DDE, 0x9C9C71B3, 0x5656E70B, 0xE3E3DA72, 0x878760A7, 0x15151B1C, 0xF9F93AEF, 0x6363BFD1, 0x3434A953, 0x9A9A853E, 0xB1B1428F, 0x7C7CD133, 0x88889B26, 0x3D3DA65F, 0xA1A1D7EC, 0xE4E4DF76, 0x8181942A, 0x91910149, 0x0F0FFB81, 0xEEEEAA88, 0x161661EE, 0xD7D77321, 0x9797F5C4, 0xA5A5A81A, 0xFEFE3FEB, 0x6D6DB5D9, 0x7878AEC5, 0xC5C56D39, 0x1D1DE599, 0x7676A4CD, 0x3E3EDCAD, 0xCBCB6731, 0xB6B6478B, 0xEFEF5B01, 0x12121E18, 0x6060C523, 0x6A6AB0DD, 0x4D4DF61F, 0xCECEE94E, 0xDEDE7C2D, 0x55559DF9, 0x7E7E5A48, 0x2121B24F, 0x03037AF2, 0xA0A02665, 0x5E5E198E, 0x5A5A6678, 0x65654B5C, 0x62624E58, 
0xFDFD4519, 0x0606F48D, 0x404086E5, 0xF2F2BE98, 0x3333AC57, 0x17179067, 0x05058E7F, 0xE8E85E05, 0x4F4F7D64, 0x89896AAF, 0x10109563, 0x74742FB6, 0x0A0A75FE, 0x5C5C92F5, 0x9B9B74B7, 0x2D2D333C, 0x3030D6A5, 0x2E2E49CE, 0x494989E9, 0x46467268, 0x77775544, 0xA8A8D8E0, 0x9696044D, 0x2828BD43, 0xA9A92969, 0xD9D97929, 0x8686912E, 0xD1D187AC, 0xF4F44A15, 0x8D8D1559, 0xD6D682A8, 0xB9B9BC0A, 0x42420D9E, 0xF6F6C16E, 0x2F2FB847, 0xDDDD06DF, 0x23233934, 0xCCCC6235, 0xF1F1C46A, 0xC1C112CF, 0x8585EBDC, 0x8F8F9E22, 0x7171A1C9, 0x9090F0C0, 0xAAAA539B, 0x0101F189, 0x8B8BE1D4, 0x4E4E8CED, 0x8E8E6FAB, 0xABABA212, 0x6F6F3EA2, 0xE6E6540D, 0xDBDBF252, 0x92927BBB, 0xB7B7B602, 0x6969CA2F, 0x3939D9A9, 0xD3D30CD7, 0xA7A72361, 0xA2A2AD1E, 0xC3C399B4, 0x6C6C4450, 0x07070504, 0x04047FF6, 0x272746C2, 0xACACA716, 0xD0D07625, 0x50501386, 0xDCDCF756, 0x84841A55, 0xE1E15109, 0x7A7A25BE, 0x1313EF91}, { 0xA9D93939, 0x67901717, 0xB3719C9C, 0xE8D2A6A6, 0x04050707, 0xFD985252, 0xA3658080, 0x76DFE4E4, 0x9A084545, 0x92024B4B, 0x80A0E0E0, 0x78665A5A, 0xE4DDAFAF, 0xDDB06A6A, 0xD1BF6363, 0x38362A2A, 0x0D54E6E6, 0xC6432020, 0x3562CCCC, 0x98BEF2F2, 0x181E1212, 0xF724EBEB, 0xECD7A1A1, 0x6C774141, 0x43BD2828, 0x7532BCBC, 0x37D47B7B, 0x269B8888, 0xFA700D0D, 0x13F94444, 0x94B1FBFB, 0x485A7E7E, 0xF27A0303, 0xD0E48C8C, 0x8B47B6B6, 0x303C2424, 0x84A5E7E7, 0x54416B6B, 0xDF06DDDD, 0x23C56060, 0x1945FDFD, 0x5BA33A3A, 0x3D68C2C2, 0x59158D8D, 0xF321ECEC, 0xAE316666, 0xA23E6F6F, 0x82165757, 0x63951010, 0x015BEFEF, 0x834DB8B8, 0x2E918686, 0xD9B56D6D, 0x511F8383, 0x9B53AAAA, 0x7C635D5D, 0xA63B6868, 0xEB3FFEFE, 0xA5D63030, 0xBE257A7A, 0x16A7ACAC, 0x0C0F0909, 0xE335F0F0, 0x6123A7A7, 0xC0F09090, 0x8CAFE9E9, 0x3A809D9D, 0xF5925C5C, 0x73810C0C, 0x2C273131, 0x2576D0D0, 0x0BE75656, 0xBB7B9292, 0x4EE9CECE, 0x89F10101, 0x6B9F1E1E, 0x53A93434, 0x6AC4F1F1, 0xB499C3C3, 0xF1975B5B, 0xE1834747, 0xE66B1818, 0xBDC82222, 0x450E9898, 0xE26E1F1F, 0xF4C9B3B3, 0xB62F7474, 0x66CBF8F8, 0xCCFF9999, 0x95EA1414, 0x03ED5858, 0x56F7DCDC, 0xD4E18B8B, 0x1C1B1515, 0x1EADA2A2, 0xD70CD3D3, 0xFB2BE2E2, 0xC31DC8C8, 0x8E195E5E, 0xB5C22C2C, 0xE9894949, 0xCF12C1C1, 0xBF7E9595, 0xBA207D7D, 0xEA641111, 0x77840B0B, 0x396DC5C5, 0xAF6A8989, 0x33D17C7C, 0xC9A17171, 0x62CEFFFF, 0x7137BBBB, 0x81FB0F0F, 0x793DB5B5, 0x0951E1E1, 0xADDC3E3E, 0x242D3F3F, 0xCDA47676, 0xF99D5555, 0xD8EE8282, 0xE5864040, 0xC5AE7878, 0xB9CD2525, 0x4D049696, 0x44557777, 0x080A0E0E, 0x86135050, 0xE730F7F7, 0xA1D33737, 0x1D40FAFA, 0xAA346161, 0xED8C4E4E, 0x06B3B0B0, 0x706C5454, 0xB22A7373, 0xD2523B3B, 0x410B9F9F, 0x7B8B0202, 0xA088D8D8, 0x114FF3F3, 0x3167CBCB, 0xC2462727, 0x27C06767, 0x90B4FCFC, 0x20283838, 0xF67F0404, 0x60784848, 0xFF2EE5E5, 0x96074C4C, 0x5C4B6565, 0xB1C72B2B, 0xAB6F8E8E, 0x9E0D4242, 0x9CBBF5F5, 0x52F2DBDB, 0x1BF34A4A, 0x5FA63D3D, 0x9359A4A4, 0x0ABCB9B9, 0xEF3AF9F9, 0x91EF1313, 0x85FE0808, 0x49019191, 0xEE611616, 0x2D7CDEDE, 0x4FB22121, 0x8F42B1B1, 0x3BDB7272, 0x47B82F2F, 0x8748BFBF, 0x6D2CAEAE, 0x46E3C0C0, 0xD6573C3C, 0x3E859A9A, 0x6929A9A9, 0x647D4F4F, 0x2A948181, 0xCE492E2E, 0xCB17C6C6, 0x2FCA6969, 0xFCC3BDBD, 0x975CA3A3, 0x055EE8E8, 0x7AD0EDED, 0xAC87D1D1, 0x7F8E0505, 0xD5BA6464, 0x1AA8A5A5, 0x4BB72626, 0x0EB9BEBE, 0xA7608787, 0x5AF8D5D5, 0x28223636, 0x14111B1B, 0x3FDE7575, 0x2979D9D9, 0x88AAEEEE, 0x3C332D2D, 0x4C5F7979, 0x02B6B7B7, 0xB896CACA, 0xDA583535, 0xB09CC4C4, 0x17FC4343, 0x551A8484, 0x1FF64D4D, 0x8A1C5959, 0x7D38B2B2, 0x57AC3333, 0xC718CFCF, 0x8DF40606, 0x74695353, 0xB7749B9B, 0xC4F59797, 0x9F56ADAD, 0x72DAE3E3, 0x7ED5EAEA, 0x154AF4F4, 0x229E8F8F, 0x12A2ABAB, 0x584E6262, 0x07E85F5F, 0x99E51D1D, 
0x34392323, 0x6EC1F6F6, 0x50446C6C, 0xDE5D3232, 0x68724646, 0x6526A0A0, 0xBC93CDCD, 0xDB03DADA, 0xF8C6BABA, 0xC8FA9E9E, 0xA882D6D6, 0x2BCF6E6E, 0x40507070, 0xDCEB8585, 0xFE750A0A, 0x328A9393, 0xA48DDFDF, 0xCA4C2929, 0x10141C1C, 0x2173D7D7, 0xF0CCB4B4, 0xD309D4D4, 0x5D108A8A, 0x0FE25151, 0x00000000, 0x6F9A1919, 0x9DE01A1A, 0x368F9494, 0x42E6C7C7, 0x4AECC9C9, 0x5EFDD2D2, 0xC1AB7F7F, 0xE0D8A8A8}, { 0xBC75BC32, 0xECF3EC21, 0x20C62043, 0xB3F4B3C9, 0xDADBDA03, 0x027B028B, 0xE2FBE22B, 0x9EC89EFA, 0xC94AC9EC, 0xD4D3D409, 0x18E6186B, 0x1E6B1E9F, 0x9845980E, 0xB27DB238, 0xA6E8A6D2, 0x264B26B7, 0x3CD63C57, 0x9332938A, 0x82D882EE, 0x52FD5298, 0x7B377BD4, 0xBB71BB37, 0x5BF15B97, 0x47E14783, 0x2430243C, 0x510F51E2, 0xBAF8BAC6, 0x4A1B4AF3, 0xBF87BF48, 0x0DFA0D70, 0xB006B0B3, 0x753F75DE, 0xD25ED2FD, 0x7DBA7D20, 0x66AE6631, 0x3A5B3AA3, 0x598A591C, 0x00000000, 0xCDBCCD93, 0x1A9D1AE0, 0xAE6DAE2C, 0x7FC17FAB, 0x2BB12BC7, 0xBE0EBEB9, 0xE080E0A0, 0x8A5D8A10, 0x3BD23B52, 0x64D564BA, 0xD8A0D888, 0xE784E7A5, 0x5F075FE8, 0x1B141B11, 0x2CB52CC2, 0xFC90FCB4, 0x312C3127, 0x80A38065, 0x73B2732A, 0x0C730C81, 0x794C795F, 0x6B546B41, 0x4B924B02, 0x53745369, 0x9436948F, 0x8351831F, 0x2A382A36, 0xC4B0C49C, 0x22BD22C8, 0xD55AD5F8, 0xBDFCBDC3, 0x48604878, 0xFF62FFCE, 0x4C964C07, 0x416C4177, 0xC742C7E6, 0xEBF7EB24, 0x1C101C14, 0x5D7C5D63, 0x36283622, 0x672767C0, 0xE98CE9AF, 0x441344F9, 0x149514EA, 0xF59CF5BB, 0xCFC7CF18, 0x3F243F2D, 0xC046C0E3, 0x723B72DB, 0x5470546C, 0x29CA294C, 0xF0E3F035, 0x088508FE, 0xC6CBC617, 0xF311F34F, 0x8CD08CE4, 0xA493A459, 0xCAB8CA96, 0x68A6683B, 0xB883B84D, 0x38203828, 0xE5FFE52E, 0xAD9FAD56, 0x0B770B84, 0xC8C3C81D, 0x99CC99FF, 0x580358ED, 0x196F199A, 0x0E080E0A, 0x95BF957E, 0x70407050, 0xF7E7F730, 0x6E2B6ECF, 0x1FE21F6E, 0xB579B53D, 0x090C090F, 0x61AA6134, 0x57825716, 0x9F419F0B, 0x9D3A9D80, 0x11EA1164, 0x25B925CD, 0xAFE4AFDD, 0x459A4508, 0xDFA4DF8D, 0xA397A35C, 0xEA7EEAD5, 0x35DA3558, 0xED7AEDD0, 0x431743FC, 0xF866F8CB, 0xFB94FBB1, 0x37A137D3, 0xFA1DFA40, 0xC23DC268, 0xB4F0B4CC, 0x32DE325D, 0x9CB39C71, 0x560B56E7, 0xE372E3DA, 0x87A78760, 0x151C151B, 0xF9EFF93A, 0x63D163BF, 0x345334A9, 0x9A3E9A85, 0xB18FB142, 0x7C337CD1, 0x8826889B, 0x3D5F3DA6, 0xA1ECA1D7, 0xE476E4DF, 0x812A8194, 0x91499101, 0x0F810FFB, 0xEE88EEAA, 0x16EE1661, 0xD721D773, 0x97C497F5, 0xA51AA5A8, 0xFEEBFE3F, 0x6DD96DB5, 0x78C578AE, 0xC539C56D, 0x1D991DE5, 0x76CD76A4, 0x3EAD3EDC, 0xCB31CB67, 0xB68BB647, 0xEF01EF5B, 0x1218121E, 0x602360C5, 0x6ADD6AB0, 0x4D1F4DF6, 0xCE4ECEE9, 0xDE2DDE7C, 0x55F9559D, 0x7E487E5A, 0x214F21B2, 0x03F2037A, 0xA065A026, 0x5E8E5E19, 0x5A785A66, 0x655C654B, 0x6258624E, 0xFD19FD45, 0x068D06F4, 0x40E54086, 0xF298F2BE, 0x335733AC, 0x17671790, 0x057F058E, 0xE805E85E, 0x4F644F7D, 0x89AF896A, 0x10631095, 0x74B6742F, 0x0AFE0A75, 0x5CF55C92, 0x9BB79B74, 0x2D3C2D33, 0x30A530D6, 0x2ECE2E49, 0x49E94989, 0x46684672, 0x77447755, 0xA8E0A8D8, 0x964D9604, 0x284328BD, 0xA969A929, 0xD929D979, 0x862E8691, 0xD1ACD187, 0xF415F44A, 0x8D598D15, 0xD6A8D682, 0xB90AB9BC, 0x429E420D, 0xF66EF6C1, 0x2F472FB8, 0xDDDFDD06, 0x23342339, 0xCC35CC62, 0xF16AF1C4, 0xC1CFC112, 0x85DC85EB, 0x8F228F9E, 0x71C971A1, 0x90C090F0, 0xAA9BAA53, 0x018901F1, 0x8BD48BE1, 0x4EED4E8C, 0x8EAB8E6F, 0xAB12ABA2, 0x6FA26F3E, 0xE60DE654, 0xDB52DBF2, 0x92BB927B, 0xB702B7B6, 0x692F69CA, 0x39A939D9, 0xD3D7D30C, 0xA761A723, 0xA21EA2AD, 0xC3B4C399, 0x6C506C44, 0x07040705, 0x04F6047F, 0x27C22746, 0xAC16ACA7, 0xD025D076, 0x50865013, 0xDC56DCF7, 0x8455841A, 0xE109E151, 0x7ABE7A25, 0x139113EF}, { 0xD939A9D9, 0x90176790, 0x719CB371, 0xD2A6E8D2, 0x05070405, 0x9852FD98, 
0x6580A365, 0xDFE476DF, 0x08459A08, 0x024B9202, 0xA0E080A0, 0x665A7866, 0xDDAFE4DD, 0xB06ADDB0, 0xBF63D1BF, 0x362A3836, 0x54E60D54, 0x4320C643, 0x62CC3562, 0xBEF298BE, 0x1E12181E, 0x24EBF724, 0xD7A1ECD7, 0x77416C77, 0xBD2843BD, 0x32BC7532, 0xD47B37D4, 0x9B88269B, 0x700DFA70, 0xF94413F9, 0xB1FB94B1, 0x5A7E485A, 0x7A03F27A, 0xE48CD0E4, 0x47B68B47, 0x3C24303C, 0xA5E784A5, 0x416B5441, 0x06DDDF06, 0xC56023C5, 0x45FD1945, 0xA33A5BA3, 0x68C23D68, 0x158D5915, 0x21ECF321, 0x3166AE31, 0x3E6FA23E, 0x16578216, 0x95106395, 0x5BEF015B, 0x4DB8834D, 0x91862E91, 0xB56DD9B5, 0x1F83511F, 0x53AA9B53, 0x635D7C63, 0x3B68A63B, 0x3FFEEB3F, 0xD630A5D6, 0x257ABE25, 0xA7AC16A7, 0x0F090C0F, 0x35F0E335, 0x23A76123, 0xF090C0F0, 0xAFE98CAF, 0x809D3A80, 0x925CF592, 0x810C7381, 0x27312C27, 0x76D02576, 0xE7560BE7, 0x7B92BB7B, 0xE9CE4EE9, 0xF10189F1, 0x9F1E6B9F, 0xA93453A9, 0xC4F16AC4, 0x99C3B499, 0x975BF197, 0x8347E183, 0x6B18E66B, 0xC822BDC8, 0x0E98450E, 0x6E1FE26E, 0xC9B3F4C9, 0x2F74B62F, 0xCBF866CB, 0xFF99CCFF, 0xEA1495EA, 0xED5803ED, 0xF7DC56F7, 0xE18BD4E1, 0x1B151C1B, 0xADA21EAD, 0x0CD3D70C, 0x2BE2FB2B, 0x1DC8C31D, 0x195E8E19, 0xC22CB5C2, 0x8949E989, 0x12C1CF12, 0x7E95BF7E, 0x207DBA20, 0x6411EA64, 0x840B7784, 0x6DC5396D, 0x6A89AF6A, 0xD17C33D1, 0xA171C9A1, 0xCEFF62CE, 0x37BB7137, 0xFB0F81FB, 0x3DB5793D, 0x51E10951, 0xDC3EADDC, 0x2D3F242D, 0xA476CDA4, 0x9D55F99D, 0xEE82D8EE, 0x8640E586, 0xAE78C5AE, 0xCD25B9CD, 0x04964D04, 0x55774455, 0x0A0E080A, 0x13508613, 0x30F7E730, 0xD337A1D3, 0x40FA1D40, 0x3461AA34, 0x8C4EED8C, 0xB3B006B3, 0x6C54706C, 0x2A73B22A, 0x523BD252, 0x0B9F410B, 0x8B027B8B, 0x88D8A088, 0x4FF3114F, 0x67CB3167, 0x4627C246, 0xC06727C0, 0xB4FC90B4, 0x28382028, 0x7F04F67F, 0x78486078, 0x2EE5FF2E, 0x074C9607, 0x4B655C4B, 0xC72BB1C7, 0x6F8EAB6F, 0x0D429E0D, 0xBBF59CBB, 0xF2DB52F2, 0xF34A1BF3, 0xA63D5FA6, 0x59A49359, 0xBCB90ABC, 0x3AF9EF3A, 0xEF1391EF, 0xFE0885FE, 0x01914901, 0x6116EE61, 0x7CDE2D7C, 0xB2214FB2, 0x42B18F42, 0xDB723BDB, 0xB82F47B8, 0x48BF8748, 0x2CAE6D2C, 0xE3C046E3, 0x573CD657, 0x859A3E85, 0x29A96929, 0x7D4F647D, 0x94812A94, 0x492ECE49, 0x17C6CB17, 0xCA692FCA, 0xC3BDFCC3, 0x5CA3975C, 0x5EE8055E, 0xD0ED7AD0, 0x87D1AC87, 0x8E057F8E, 0xBA64D5BA, 0xA8A51AA8, 0xB7264BB7, 0xB9BE0EB9, 0x6087A760, 0xF8D55AF8, 0x22362822, 0x111B1411, 0xDE753FDE, 0x79D92979, 0xAAEE88AA, 0x332D3C33, 0x5F794C5F, 0xB6B702B6, 0x96CAB896, 0x5835DA58, 0x9CC4B09C, 0xFC4317FC, 0x1A84551A, 0xF64D1FF6, 0x1C598A1C, 0x38B27D38, 0xAC3357AC, 0x18CFC718, 0xF4068DF4, 0x69537469, 0x749BB774, 0xF597C4F5, 0x56AD9F56, 0xDAE372DA, 0xD5EA7ED5, 0x4AF4154A, 0x9E8F229E, 0xA2AB12A2, 0x4E62584E, 0xE85F07E8, 0xE51D99E5, 0x39233439, 0xC1F66EC1, 0x446C5044, 0x5D32DE5D, 0x72466872, 0x26A06526, 0x93CDBC93, 0x03DADB03, 0xC6BAF8C6, 0xFA9EC8FA, 0x82D6A882, 0xCF6E2BCF, 0x50704050, 0xEB85DCEB, 0x750AFE75, 0x8A93328A, 0x8DDFA48D, 0x4C29CA4C, 0x141C1014, 0x73D72173, 0xCCB4F0CC, 0x09D4D309, 0x108A5D10, 0xE2510FE2, 0x00000000, 0x9A196F9A, 0xE01A9DE0, 0x8F94368F, 0xE6C742E6, 0xECC94AEC, 0xFDD25EFD, 0xAB7FC1AB, 0xD8A8E0D8} }; /* The exp_to_poly and poly_to_exp tables are used to perform efficient * operations in GF(2^8) represented as GF(2)[x]/w(x) where * w(x)=x^8+x^6+x^3+x^2+1. We care about doing that because it's part of the * definition of the RS matrix in the key schedule. Elements of that field * are polynomials of degree not greater than 7 and all coefficients 0 or 1, * which can be represented naturally by bytes (just substitute x=2). 
In that * form, GF(2^8) addition is the same as bitwise XOR, but GF(2^8) * multiplication is inefficient without hardware support. To multiply * faster, I make use of the fact x is a generator for the nonzero elements, * so that every element p of GF(2)[x]/w(x) is either 0 or equal to (x)^n for * some n in 0..254. Note that caret is exponentiation in GF(2^8), * *not* polynomial notation. So if I want to compute pq where p and q are * in GF(2^8), I can just say: * 1. if p=0 or q=0 then pq=0 * 2. otherwise, find m and n such that p=x^m and q=x^n * 3. pq=(x^m)(x^n)=x^(m+n), so add m and n and find pq * The translations in steps 2 and 3 are looked up in the tables * poly_to_exp (for step 2) and exp_to_poly (for step 3). To see this * in action, look at the CALC_S macro. As additional wrinkles, note that * one of my operands is always a constant, so the poly_to_exp lookup on it * is done in advance; I included the original values in the comments so * readers can have some chance of recognizing that this *is* the RS matrix * from the Twofish paper. I've only included the table entries I actually * need; I never do a lookup on a variable input of zero and the biggest * exponents I'll ever see are 254 (variable) and 237 (constant), so they'll * never sum to more than 491. I'm repeating part of the exp_to_poly table * so that I don't have to do mod-255 reduction in the exponent arithmetic. * Since I know my constant operands are never zero, I only have to worry * about zero values in the variable operand, and I do it with a simple * conditional branch. I know conditionals are expensive, but I couldn't * see a non-horrible way of avoiding them, and I did manage to group the * statements so that each if covers four group multiplications. */ static const u8 poly_to_exp[255] = { 0x00, 0x01, 0x17, 0x02, 0x2E, 0x18, 0x53, 0x03, 0x6A, 0x2F, 0x93, 0x19, 0x34, 0x54, 0x45, 0x04, 0x5C, 0x6B, 0xB6, 0x30, 0xA6, 0x94, 0x4B, 0x1A, 0x8C, 0x35, 0x81, 0x55, 0xAA, 0x46, 0x0D, 0x05, 0x24, 0x5D, 0x87, 0x6C, 0x9B, 0xB7, 0xC1, 0x31, 0x2B, 0xA7, 0xA3, 0x95, 0x98, 0x4C, 0xCA, 0x1B, 0xE6, 0x8D, 0x73, 0x36, 0xCD, 0x82, 0x12, 0x56, 0x62, 0xAB, 0xF0, 0x47, 0x4F, 0x0E, 0xBD, 0x06, 0xD4, 0x25, 0xD2, 0x5E, 0x27, 0x88, 0x66, 0x6D, 0xD6, 0x9C, 0x79, 0xB8, 0x08, 0xC2, 0xDF, 0x32, 0x68, 0x2C, 0xFD, 0xA8, 0x8A, 0xA4, 0x5A, 0x96, 0x29, 0x99, 0x22, 0x4D, 0x60, 0xCB, 0xE4, 0x1C, 0x7B, 0xE7, 0x3B, 0x8E, 0x9E, 0x74, 0xF4, 0x37, 0xD8, 0xCE, 0xF9, 0x83, 0x6F, 0x13, 0xB2, 0x57, 0xE1, 0x63, 0xDC, 0xAC, 0xC4, 0xF1, 0xAF, 0x48, 0x0A, 0x50, 0x42, 0x0F, 0xBA, 0xBE, 0xC7, 0x07, 0xDE, 0xD5, 0x78, 0x26, 0x65, 0xD3, 0xD1, 0x5F, 0xE3, 0x28, 0x21, 0x89, 0x59, 0x67, 0xFC, 0x6E, 0xB1, 0xD7, 0xF8, 0x9D, 0xF3, 0x7A, 0x3A, 0xB9, 0xC6, 0x09, 0x41, 0xC3, 0xAE, 0xE0, 0xDB, 0x33, 0x44, 0x69, 0x92, 0x2D, 0x52, 0xFE, 0x16, 0xA9, 0x0C, 0x8B, 0x80, 0xA5, 0x4A, 0x5B, 0xB5, 0x97, 0xC9, 0x2A, 0xA2, 0x9A, 0xC0, 0x23, 0x86, 0x4E, 0xBC, 0x61, 0xEF, 0xCC, 0x11, 0xE5, 0x72, 0x1D, 0x3D, 0x7C, 0xEB, 0xE8, 0xE9, 0x3C, 0xEA, 0x8F, 0x7D, 0x9F, 0xEC, 0x75, 0x1E, 0xF5, 0x3E, 0x38, 0xF6, 0xD9, 0x3F, 0xCF, 0x76, 0xFA, 0x1F, 0x84, 0xA0, 0x70, 0xED, 0x14, 0x90, 0xB3, 0x7E, 0x58, 0xFB, 0xE2, 0x20, 0x64, 0xD0, 0xDD, 0x77, 0xAD, 0xDA, 0xC5, 0x40, 0xF2, 0x39, 0xB0, 0xF7, 0x49, 0xB4, 0x0B, 0x7F, 0x51, 0x15, 0x43, 0x91, 0x10, 0x71, 0xBB, 0xEE, 0xBF, 0x85, 0xC8, 0xA1 }; static const u8 exp_to_poly[492] = { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x4D, 0x9A, 0x79, 0xF2, 0xA9, 0x1F, 0x3E, 0x7C, 0xF8, 0xBD, 0x37, 0x6E, 0xDC, 0xF5, 0xA7, 0x03, 0x06, 0x0C, 0x18, 0x30, 0x60, 0xC0, 0xCD, 
0xD7, 0xE3, 0x8B, 0x5B, 0xB6, 0x21, 0x42, 0x84, 0x45, 0x8A, 0x59, 0xB2, 0x29, 0x52, 0xA4, 0x05, 0x0A, 0x14, 0x28, 0x50, 0xA0, 0x0D, 0x1A, 0x34, 0x68, 0xD0, 0xED, 0x97, 0x63, 0xC6, 0xC1, 0xCF, 0xD3, 0xEB, 0x9B, 0x7B, 0xF6, 0xA1, 0x0F, 0x1E, 0x3C, 0x78, 0xF0, 0xAD, 0x17, 0x2E, 0x5C, 0xB8, 0x3D, 0x7A, 0xF4, 0xA5, 0x07, 0x0E, 0x1C, 0x38, 0x70, 0xE0, 0x8D, 0x57, 0xAE, 0x11, 0x22, 0x44, 0x88, 0x5D, 0xBA, 0x39, 0x72, 0xE4, 0x85, 0x47, 0x8E, 0x51, 0xA2, 0x09, 0x12, 0x24, 0x48, 0x90, 0x6D, 0xDA, 0xF9, 0xBF, 0x33, 0x66, 0xCC, 0xD5, 0xE7, 0x83, 0x4B, 0x96, 0x61, 0xC2, 0xC9, 0xDF, 0xF3, 0xAB, 0x1B, 0x36, 0x6C, 0xD8, 0xFD, 0xB7, 0x23, 0x46, 0x8C, 0x55, 0xAA, 0x19, 0x32, 0x64, 0xC8, 0xDD, 0xF7, 0xA3, 0x0B, 0x16, 0x2C, 0x58, 0xB0, 0x2D, 0x5A, 0xB4, 0x25, 0x4A, 0x94, 0x65, 0xCA, 0xD9, 0xFF, 0xB3, 0x2B, 0x56, 0xAC, 0x15, 0x2A, 0x54, 0xA8, 0x1D, 0x3A, 0x74, 0xE8, 0x9D, 0x77, 0xEE, 0x91, 0x6F, 0xDE, 0xF1, 0xAF, 0x13, 0x26, 0x4C, 0x98, 0x7D, 0xFA, 0xB9, 0x3F, 0x7E, 0xFC, 0xB5, 0x27, 0x4E, 0x9C, 0x75, 0xEA, 0x99, 0x7F, 0xFE, 0xB1, 0x2F, 0x5E, 0xBC, 0x35, 0x6A, 0xD4, 0xE5, 0x87, 0x43, 0x86, 0x41, 0x82, 0x49, 0x92, 0x69, 0xD2, 0xE9, 0x9F, 0x73, 0xE6, 0x81, 0x4F, 0x9E, 0x71, 0xE2, 0x89, 0x5F, 0xBE, 0x31, 0x62, 0xC4, 0xC5, 0xC7, 0xC3, 0xCB, 0xDB, 0xFB, 0xBB, 0x3B, 0x76, 0xEC, 0x95, 0x67, 0xCE, 0xD1, 0xEF, 0x93, 0x6B, 0xD6, 0xE1, 0x8F, 0x53, 0xA6, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x4D, 0x9A, 0x79, 0xF2, 0xA9, 0x1F, 0x3E, 0x7C, 0xF8, 0xBD, 0x37, 0x6E, 0xDC, 0xF5, 0xA7, 0x03, 0x06, 0x0C, 0x18, 0x30, 0x60, 0xC0, 0xCD, 0xD7, 0xE3, 0x8B, 0x5B, 0xB6, 0x21, 0x42, 0x84, 0x45, 0x8A, 0x59, 0xB2, 0x29, 0x52, 0xA4, 0x05, 0x0A, 0x14, 0x28, 0x50, 0xA0, 0x0D, 0x1A, 0x34, 0x68, 0xD0, 0xED, 0x97, 0x63, 0xC6, 0xC1, 0xCF, 0xD3, 0xEB, 0x9B, 0x7B, 0xF6, 0xA1, 0x0F, 0x1E, 0x3C, 0x78, 0xF0, 0xAD, 0x17, 0x2E, 0x5C, 0xB8, 0x3D, 0x7A, 0xF4, 0xA5, 0x07, 0x0E, 0x1C, 0x38, 0x70, 0xE0, 0x8D, 0x57, 0xAE, 0x11, 0x22, 0x44, 0x88, 0x5D, 0xBA, 0x39, 0x72, 0xE4, 0x85, 0x47, 0x8E, 0x51, 0xA2, 0x09, 0x12, 0x24, 0x48, 0x90, 0x6D, 0xDA, 0xF9, 0xBF, 0x33, 0x66, 0xCC, 0xD5, 0xE7, 0x83, 0x4B, 0x96, 0x61, 0xC2, 0xC9, 0xDF, 0xF3, 0xAB, 0x1B, 0x36, 0x6C, 0xD8, 0xFD, 0xB7, 0x23, 0x46, 0x8C, 0x55, 0xAA, 0x19, 0x32, 0x64, 0xC8, 0xDD, 0xF7, 0xA3, 0x0B, 0x16, 0x2C, 0x58, 0xB0, 0x2D, 0x5A, 0xB4, 0x25, 0x4A, 0x94, 0x65, 0xCA, 0xD9, 0xFF, 0xB3, 0x2B, 0x56, 0xAC, 0x15, 0x2A, 0x54, 0xA8, 0x1D, 0x3A, 0x74, 0xE8, 0x9D, 0x77, 0xEE, 0x91, 0x6F, 0xDE, 0xF1, 0xAF, 0x13, 0x26, 0x4C, 0x98, 0x7D, 0xFA, 0xB9, 0x3F, 0x7E, 0xFC, 0xB5, 0x27, 0x4E, 0x9C, 0x75, 0xEA, 0x99, 0x7F, 0xFE, 0xB1, 0x2F, 0x5E, 0xBC, 0x35, 0x6A, 0xD4, 0xE5, 0x87, 0x43, 0x86, 0x41, 0x82, 0x49, 0x92, 0x69, 0xD2, 0xE9, 0x9F, 0x73, 0xE6, 0x81, 0x4F, 0x9E, 0x71, 0xE2, 0x89, 0x5F, 0xBE, 0x31, 0x62, 0xC4, 0xC5, 0xC7, 0xC3, 0xCB }; /* The table constants are indices of * S-box entries, preprocessed through q0 and q1. 
*/ static const u8 calc_sb_tbl[512] = { 0xA9, 0x75, 0x67, 0xF3, 0xB3, 0xC6, 0xE8, 0xF4, 0x04, 0xDB, 0xFD, 0x7B, 0xA3, 0xFB, 0x76, 0xC8, 0x9A, 0x4A, 0x92, 0xD3, 0x80, 0xE6, 0x78, 0x6B, 0xE4, 0x45, 0xDD, 0x7D, 0xD1, 0xE8, 0x38, 0x4B, 0x0D, 0xD6, 0xC6, 0x32, 0x35, 0xD8, 0x98, 0xFD, 0x18, 0x37, 0xF7, 0x71, 0xEC, 0xF1, 0x6C, 0xE1, 0x43, 0x30, 0x75, 0x0F, 0x37, 0xF8, 0x26, 0x1B, 0xFA, 0x87, 0x13, 0xFA, 0x94, 0x06, 0x48, 0x3F, 0xF2, 0x5E, 0xD0, 0xBA, 0x8B, 0xAE, 0x30, 0x5B, 0x84, 0x8A, 0x54, 0x00, 0xDF, 0xBC, 0x23, 0x9D, 0x19, 0x6D, 0x5B, 0xC1, 0x3D, 0xB1, 0x59, 0x0E, 0xF3, 0x80, 0xAE, 0x5D, 0xA2, 0xD2, 0x82, 0xD5, 0x63, 0xA0, 0x01, 0x84, 0x83, 0x07, 0x2E, 0x14, 0xD9, 0xB5, 0x51, 0x90, 0x9B, 0x2C, 0x7C, 0xA3, 0xA6, 0xB2, 0xEB, 0x73, 0xA5, 0x4C, 0xBE, 0x54, 0x16, 0x92, 0x0C, 0x74, 0xE3, 0x36, 0x61, 0x51, 0xC0, 0x38, 0x8C, 0xB0, 0x3A, 0xBD, 0xF5, 0x5A, 0x73, 0xFC, 0x2C, 0x60, 0x25, 0x62, 0x0B, 0x96, 0xBB, 0x6C, 0x4E, 0x42, 0x89, 0xF7, 0x6B, 0x10, 0x53, 0x7C, 0x6A, 0x28, 0xB4, 0x27, 0xF1, 0x8C, 0xE1, 0x13, 0xE6, 0x95, 0xBD, 0x9C, 0x45, 0xC7, 0xE2, 0x24, 0xF4, 0x46, 0xB6, 0x3B, 0x66, 0x70, 0xCC, 0xCA, 0x95, 0xE3, 0x03, 0x85, 0x56, 0xCB, 0xD4, 0x11, 0x1C, 0xD0, 0x1E, 0x93, 0xD7, 0xB8, 0xFB, 0xA6, 0xC3, 0x83, 0x8E, 0x20, 0xB5, 0xFF, 0xE9, 0x9F, 0xCF, 0x77, 0xBF, 0xC3, 0xBA, 0xCC, 0xEA, 0x03, 0x77, 0x6F, 0x39, 0x08, 0xAF, 0xBF, 0x33, 0x40, 0xC9, 0xE7, 0x62, 0x2B, 0x71, 0xE2, 0x81, 0x79, 0x79, 0x0C, 0x09, 0xAA, 0xAD, 0x82, 0x24, 0x41, 0xCD, 0x3A, 0xF9, 0xEA, 0xD8, 0xB9, 0xE5, 0xE4, 0xC5, 0x9A, 0xB9, 0xA4, 0x4D, 0x97, 0x44, 0x7E, 0x08, 0xDA, 0x86, 0x7A, 0xE7, 0x17, 0xA1, 0x66, 0x1D, 0x94, 0xAA, 0xA1, 0xED, 0x1D, 0x06, 0x3D, 0x70, 0xF0, 0xB2, 0xDE, 0xD2, 0xB3, 0x41, 0x0B, 0x7B, 0x72, 0xA0, 0xA7, 0x11, 0x1C, 0x31, 0xEF, 0xC2, 0xD1, 0x27, 0x53, 0x90, 0x3E, 0x20, 0x8F, 0xF6, 0x33, 0x60, 0x26, 0xFF, 0x5F, 0x96, 0xEC, 0x5C, 0x76, 0xB1, 0x2A, 0xAB, 0x49, 0x9E, 0x81, 0x9C, 0x88, 0x52, 0xEE, 0x1B, 0x21, 0x5F, 0xC4, 0x93, 0x1A, 0x0A, 0xEB, 0xEF, 0xD9, 0x91, 0xC5, 0x85, 0x39, 0x49, 0x99, 0xEE, 0xCD, 0x2D, 0xAD, 0x4F, 0x31, 0x8F, 0x8B, 0x3B, 0x01, 0x47, 0x18, 0x87, 0x23, 0x6D, 0xDD, 0x46, 0x1F, 0xD6, 0x4E, 0x3E, 0x2D, 0x69, 0xF9, 0x64, 0x48, 0x2A, 0x4F, 0xCE, 0xF2, 0xCB, 0x65, 0x2F, 0x8E, 0xFC, 0x78, 0x97, 0x5C, 0x05, 0x58, 0x7A, 0x19, 0xAC, 0x8D, 0x7F, 0xE5, 0xD5, 0x98, 0x1A, 0x57, 0x4B, 0x67, 0x0E, 0x7F, 0xA7, 0x05, 0x5A, 0x64, 0x28, 0xAF, 0x14, 0x63, 0x3F, 0xB6, 0x29, 0xFE, 0x88, 0xF5, 0x3C, 0xB7, 0x4C, 0x3C, 0x02, 0xA5, 0xB8, 0xCE, 0xDA, 0xE9, 0xB0, 0x68, 0x17, 0x44, 0x55, 0xE0, 0x1F, 0x4D, 0x8A, 0x43, 0x7D, 0x69, 0x57, 0x29, 0xC7, 0x2E, 0x8D, 0xAC, 0x74, 0x15, 0xB7, 0x59, 0xC4, 0xA8, 0x9F, 0x0A, 0x72, 0x9E, 0x7E, 0x6E, 0x15, 0x47, 0x22, 0xDF, 0x12, 0x34, 0x58, 0x35, 0x07, 0x6A, 0x99, 0xCF, 0x34, 0xDC, 0x6E, 0x22, 0x50, 0xC9, 0xDE, 0xC0, 0x68, 0x9B, 0x65, 0x89, 0xBC, 0xD4, 0xDB, 0xED, 0xF8, 0xAB, 0xC8, 0x12, 0xA8, 0xA2, 0x2B, 0x0D, 0x40, 0x52, 0xDC, 0xBB, 0xFE, 0x02, 0x32, 0x2F, 0xA4, 0xA9, 0xCA, 0xD7, 0x10, 0x61, 0x21, 0x1E, 0xF0, 0xB4, 0xD3, 0x50, 0x5D, 0x04, 0x0F, 0xF6, 0x00, 0xC2, 0x6F, 0x16, 0x9D, 0x25, 0x36, 0x86, 0x42, 0x56, 0x4A, 0x55, 0x5E, 0x09, 0xC1, 0xBE, 0xE0, 0x91 }; /* Macro to perform one column of the RS matrix multiplication. The * parameters a, b, c, and d are the four bytes of output; i is the index * of the key bytes, and w, x, y, and z, are the column of constants from * the RS matrix, preprocessed through the poly_to_exp table. 
*/ #define CALC_S(a, b, c, d, i, w, x, y, z) \ if (key[i]) { \ tmp = poly_to_exp[key[i] - 1]; \ (a) ^= exp_to_poly[tmp + (w)]; \ (b) ^= exp_to_poly[tmp + (x)]; \ (c) ^= exp_to_poly[tmp + (y)]; \ (d) ^= exp_to_poly[tmp + (z)]; \ } /* Macros to calculate the key-dependent S-boxes for a 128-bit key using * the S vector from CALC_S. CALC_SB_2 computes a single entry in all * four S-boxes, where i is the index of the entry to compute, and a and b * are the index numbers preprocessed through the q0 and q1 tables * respectively. */ #define CALC_SB_2(i, a, b) \ ctx->s[0][i] = mds[0][q0[(a) ^ sa] ^ se]; \ ctx->s[1][i] = mds[1][q0[(b) ^ sb] ^ sf]; \ ctx->s[2][i] = mds[2][q1[(a) ^ sc] ^ sg]; \ ctx->s[3][i] = mds[3][q1[(b) ^ sd] ^ sh] /* Macro exactly like CALC_SB_2, but for 192-bit keys. */ #define CALC_SB192_2(i, a, b) \ ctx->s[0][i] = mds[0][q0[q0[(b) ^ sa] ^ se] ^ si]; \ ctx->s[1][i] = mds[1][q0[q1[(b) ^ sb] ^ sf] ^ sj]; \ ctx->s[2][i] = mds[2][q1[q0[(a) ^ sc] ^ sg] ^ sk]; \ ctx->s[3][i] = mds[3][q1[q1[(a) ^ sd] ^ sh] ^ sl]; /* Macro exactly like CALC_SB_2, but for 256-bit keys. */ #define CALC_SB256_2(i, a, b) \ ctx->s[0][i] = mds[0][q0[q0[q1[(b) ^ sa] ^ se] ^ si] ^ sm]; \ ctx->s[1][i] = mds[1][q0[q1[q1[(a) ^ sb] ^ sf] ^ sj] ^ sn]; \ ctx->s[2][i] = mds[2][q1[q0[q0[(a) ^ sc] ^ sg] ^ sk] ^ so]; \ ctx->s[3][i] = mds[3][q1[q1[q0[(b) ^ sd] ^ sh] ^ sl] ^ sp]; /* Macros to calculate the whitening and round subkeys. CALC_K_2 computes the * last two stages of the h() function for a given index (either 2i or 2i+1). * a, b, c, and d are the four bytes going into the last two stages. For * 128-bit keys, this is the entire h() function and a and c are the index * preprocessed through q0 and q1 respectively; for longer keys they are the * output of previous stages. j is the index of the first key byte to use. * CALC_K computes a pair of subkeys for 128-bit Twofish, by calling CALC_K_2 * twice, doing the Pseudo-Hadamard Transform, and doing the necessary * rotations. Its parameters are: a, the array to write the results into, * j, the index of the first output entry, k and l, the preprocessed indices * for index 2i, and m and n, the preprocessed indices for index 2i+1. * CALC_K192_2 expands CALC_K_2 to handle 192-bit keys, by doing an * additional lookup-and-XOR stage. The parameters a, b, c and d are the * four bytes going into the last three stages. For 192-bit keys, c = d * are the index preprocessed through q0, and a = b are the index * preprocessed through q1; j is the index of the first key byte to use. * CALC_K192 is identical to CALC_K but for using the CALC_K192_2 macro * instead of CALC_K_2. * CALC_K256_2 expands CALC_K192_2 to handle 256-bit keys, by doing an * additional lookup-and-XOR stage. The parameters a and b are the index * preprocessed through q0 and q1 respectively; j is the index of the first * key byte to use. CALC_K256 is identical to CALC_K but for using the * CALC_K256_2 macro instead of CALC_K_2. 
*/ #define CALC_K_2(a, b, c, d, j) \ mds[0][q0[a ^ key[(j) + 8]] ^ key[j]] \ ^ mds[1][q0[b ^ key[(j) + 9]] ^ key[(j) + 1]] \ ^ mds[2][q1[c ^ key[(j) + 10]] ^ key[(j) + 2]] \ ^ mds[3][q1[d ^ key[(j) + 11]] ^ key[(j) + 3]] #define CALC_K(a, j, k, l, m, n) \ x = CALC_K_2 (k, l, k, l, 0); \ y = CALC_K_2 (m, n, m, n, 4); \ y = rol32(y, 8); \ x += y; y += x; ctx->a[j] = x; \ ctx->a[(j) + 1] = rol32(y, 9) #define CALC_K192_2(a, b, c, d, j) \ CALC_K_2 (q0[a ^ key[(j) + 16]], \ q1[b ^ key[(j) + 17]], \ q0[c ^ key[(j) + 18]], \ q1[d ^ key[(j) + 19]], j) #define CALC_K192(a, j, k, l, m, n) \ x = CALC_K192_2 (l, l, k, k, 0); \ y = CALC_K192_2 (n, n, m, m, 4); \ y = rol32(y, 8); \ x += y; y += x; ctx->a[j] = x; \ ctx->a[(j) + 1] = rol32(y, 9) #define CALC_K256_2(a, b, j) \ CALC_K192_2 (q1[b ^ key[(j) + 24]], \ q1[a ^ key[(j) + 25]], \ q0[a ^ key[(j) + 26]], \ q0[b ^ key[(j) + 27]], j) #define CALC_K256(a, j, k, l, m, n) \ x = CALC_K256_2 (k, l, 0); \ y = CALC_K256_2 (m, n, 4); \ y = rol32(y, 8); \ x += y; y += x; ctx->a[j] = x; \ ctx->a[(j) + 1] = rol32(y, 9) /* Perform the key setup. */ int __twofish_setkey(struct twofish_ctx *ctx, const u8 *key, unsigned int key_len) { int i, j, k; /* Temporaries for CALC_K. */ u32 x, y; /* The S vector used to key the S-boxes, split up into individual bytes. * 128-bit keys use only sa through sh; 256-bit use all of them. */ u8 sa = 0, sb = 0, sc = 0, sd = 0, se = 0, sf = 0, sg = 0, sh = 0; u8 si = 0, sj = 0, sk = 0, sl = 0, sm = 0, sn = 0, so = 0, sp = 0; /* Temporary for CALC_S. */ u8 tmp; /* Check key length. */ if (key_len % 8) return -EINVAL; /* unsupported key length */ /* Compute the first two words of the S vector. The magic numbers are * the entries of the RS matrix, preprocessed through poly_to_exp. The * numbers in the comments are the original (polynomial form) matrix * entries. 
*/ CALC_S (sa, sb, sc, sd, 0, 0x00, 0x2D, 0x01, 0x2D); /* 01 A4 02 A4 */ CALC_S (sa, sb, sc, sd, 1, 0x2D, 0xA4, 0x44, 0x8A); /* A4 56 A1 55 */ CALC_S (sa, sb, sc, sd, 2, 0x8A, 0xD5, 0xBF, 0xD1); /* 55 82 FC 87 */ CALC_S (sa, sb, sc, sd, 3, 0xD1, 0x7F, 0x3D, 0x99); /* 87 F3 C1 5A */ CALC_S (sa, sb, sc, sd, 4, 0x99, 0x46, 0x66, 0x96); /* 5A 1E 47 58 */ CALC_S (sa, sb, sc, sd, 5, 0x96, 0x3C, 0x5B, 0xED); /* 58 C6 AE DB */ CALC_S (sa, sb, sc, sd, 6, 0xED, 0x37, 0x4F, 0xE0); /* DB 68 3D 9E */ CALC_S (sa, sb, sc, sd, 7, 0xE0, 0xD0, 0x8C, 0x17); /* 9E E5 19 03 */ CALC_S (se, sf, sg, sh, 8, 0x00, 0x2D, 0x01, 0x2D); /* 01 A4 02 A4 */ CALC_S (se, sf, sg, sh, 9, 0x2D, 0xA4, 0x44, 0x8A); /* A4 56 A1 55 */ CALC_S (se, sf, sg, sh, 10, 0x8A, 0xD5, 0xBF, 0xD1); /* 55 82 FC 87 */ CALC_S (se, sf, sg, sh, 11, 0xD1, 0x7F, 0x3D, 0x99); /* 87 F3 C1 5A */ CALC_S (se, sf, sg, sh, 12, 0x99, 0x46, 0x66, 0x96); /* 5A 1E 47 58 */ CALC_S (se, sf, sg, sh, 13, 0x96, 0x3C, 0x5B, 0xED); /* 58 C6 AE DB */ CALC_S (se, sf, sg, sh, 14, 0xED, 0x37, 0x4F, 0xE0); /* DB 68 3D 9E */ CALC_S (se, sf, sg, sh, 15, 0xE0, 0xD0, 0x8C, 0x17); /* 9E E5 19 03 */ if (key_len == 24 || key_len == 32) { /* 192- or 256-bit key */ /* Calculate the third word of the S vector */ CALC_S (si, sj, sk, sl, 16, 0x00, 0x2D, 0x01, 0x2D); /* 01 A4 02 A4 */ CALC_S (si, sj, sk, sl, 17, 0x2D, 0xA4, 0x44, 0x8A); /* A4 56 A1 55 */ CALC_S (si, sj, sk, sl, 18, 0x8A, 0xD5, 0xBF, 0xD1); /* 55 82 FC 87 */ CALC_S (si, sj, sk, sl, 19, 0xD1, 0x7F, 0x3D, 0x99); /* 87 F3 C1 5A */ CALC_S (si, sj, sk, sl, 20, 0x99, 0x46, 0x66, 0x96); /* 5A 1E 47 58 */ CALC_S (si, sj, sk, sl, 21, 0x96, 0x3C, 0x5B, 0xED); /* 58 C6 AE DB */ CALC_S (si, sj, sk, sl, 22, 0xED, 0x37, 0x4F, 0xE0); /* DB 68 3D 9E */ CALC_S (si, sj, sk, sl, 23, 0xE0, 0xD0, 0x8C, 0x17); /* 9E E5 19 03 */ } if (key_len == 32) { /* 256-bit key */ /* Calculate the fourth word of the S vector */ CALC_S (sm, sn, so, sp, 24, 0x00, 0x2D, 0x01, 0x2D); /* 01 A4 02 A4 */ CALC_S (sm, sn, so, sp, 25, 0x2D, 0xA4, 0x44, 0x8A); /* A4 56 A1 55 */ CALC_S (sm, sn, so, sp, 26, 0x8A, 0xD5, 0xBF, 0xD1); /* 55 82 FC 87 */ CALC_S (sm, sn, so, sp, 27, 0xD1, 0x7F, 0x3D, 0x99); /* 87 F3 C1 5A */ CALC_S (sm, sn, so, sp, 28, 0x99, 0x46, 0x66, 0x96); /* 5A 1E 47 58 */ CALC_S (sm, sn, so, sp, 29, 0x96, 0x3C, 0x5B, 0xED); /* 58 C6 AE DB */ CALC_S (sm, sn, so, sp, 30, 0xED, 0x37, 0x4F, 0xE0); /* DB 68 3D 9E */ CALC_S (sm, sn, so, sp, 31, 0xE0, 0xD0, 0x8C, 0x17); /* 9E E5 19 03 */ /* Compute the S-boxes. */ for ( i = j = 0, k = 1; i < 256; i++, j += 2, k += 2 ) { CALC_SB256_2( i, calc_sb_tbl[j], calc_sb_tbl[k] ); } /* CALC_K256/CALC_K192/CALC_K loops were unrolled. * Unrolling produced x2.5 more code (+18k on i386), * and speeded up key setup by 7%: * unrolled: twofish_setkey/sec: 41128 * loop: twofish_setkey/sec: 38148 * CALC_K256: ~100 insns each * CALC_K192: ~90 insns * CALC_K: ~70 insns */ /* Calculate whitening and round subkeys */ for ( i = 0; i < 8; i += 2 ) { CALC_K256 (w, i, q0[i], q1[i], q0[i+1], q1[i+1]); } for ( i = 0; i < 32; i += 2 ) { CALC_K256 (k, i, q0[i+8], q1[i+8], q0[i+9], q1[i+9]); } } else if (key_len == 24) { /* 192-bit key */ /* Compute the S-boxes. */ for ( i = j = 0, k = 1; i < 256; i++, j += 2, k += 2 ) { CALC_SB192_2( i, calc_sb_tbl[j], calc_sb_tbl[k] ); } /* Calculate whitening and round subkeys */ for ( i = 0; i < 8; i += 2 ) { CALC_K192 (w, i, q0[i], q1[i], q0[i+1], q1[i+1]); } for ( i = 0; i < 32; i += 2 ) { CALC_K192 (k, i, q0[i+8], q1[i+8], q0[i+9], q1[i+9]); } } else { /* 128-bit key */ /* Compute the S-boxes. 
*/ for ( i = j = 0, k = 1; i < 256; i++, j += 2, k += 2 ) { CALC_SB_2( i, calc_sb_tbl[j], calc_sb_tbl[k] ); } /* Calculate whitening and round subkeys */ for ( i = 0; i < 8; i += 2 ) { CALC_K (w, i, q0[i], q1[i], q0[i+1], q1[i+1]); } for ( i = 0; i < 32; i += 2 ) { CALC_K (k, i, q0[i+8], q1[i+8], q0[i+9], q1[i+9]); } } return 0; } EXPORT_SYMBOL_GPL(__twofish_setkey); int twofish_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int key_len) { return __twofish_setkey(crypto_tfm_ctx(tfm), key, key_len); } EXPORT_SYMBOL_GPL(twofish_setkey); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Twofish cipher common functions");
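/*
 * Illustrative sketch (not from the original sources): driving the key
 * schedule above from a caller. It assumes struct twofish_ctx and the
 * __twofish_setkey() prototype come from <crypto/twofish.h>; a 16-byte
 * key exercises the 128-bit path (CALC_SB_2/CALC_K), while 24- and
 * 32-byte keys select the 192- and 256-bit paths.
 */
#include <crypto/twofish.h>

static int demo_twofish_key_setup(void)
{
	static const u8 demo_key[16] = {
		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
	};
	struct twofish_ctx ctx;

	/* On success, ctx holds the keyed S-boxes (s) plus the
	 * whitening (w) and round (k) subkeys computed above.
	 */
	return __twofish_setkey(&ctx, demo_key, sizeof(demo_key));
}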
// SPDX-License-Identifier: GPL-2.0-only /* * * Author Karsten Keil <kkeil@novell.com> * * Copyright 2008 by Karsten Keil <kkeil@novell.com> */ #include <linux/slab.h> #include <linux/mISDNif.h> #include <linux/kthread.h> #include <linux/sched.h> #include <linux/sched/cputime.h> #include <linux/signal.h> #include "core.h" static u_int *debug; static inline void _queue_message(struct mISDNstack *st, struct sk_buff *skb) { struct mISDNhead *hh = mISDN_HEAD_P(skb); if (*debug & DEBUG_QUEUE_FUNC) printk(KERN_DEBUG "%s prim(%x) id(%x) %p\n", __func__, hh->prim, hh->id, skb); skb_queue_tail(&st->msgq, skb); if (likely(!test_bit(mISDN_STACK_STOPPED, &st->status))) { test_and_set_bit(mISDN_STACK_WORK, &st->status); wake_up_interruptible(&st->workq); } } static int mISDN_queue_message(struct mISDNchannel *ch, struct sk_buff *skb) { _queue_message(ch->st, skb); return 0; } static struct mISDNchannel * get_channel4id(struct mISDNstack *st, u_int id) { struct mISDNchannel *ch; mutex_lock(&st->lmutex); list_for_each_entry(ch,
&st->layer2, list) { if (id == ch->nr) goto unlock; } ch = NULL; unlock: mutex_unlock(&st->lmutex); return ch; } static void send_socklist(struct mISDN_sock_list *sl, struct sk_buff *skb) { struct sock *sk; struct sk_buff *cskb = NULL; read_lock(&sl->lock); sk_for_each(sk, &sl->head) { if (sk->sk_state != MISDN_BOUND) continue; if (!cskb) cskb = skb_copy(skb, GFP_ATOMIC); if (!cskb) { printk(KERN_WARNING "%s no skb\n", __func__); break; } if (!sock_queue_rcv_skb(sk, cskb)) cskb = NULL; } read_unlock(&sl->lock); dev_kfree_skb(cskb); } static void send_layer2(struct mISDNstack *st, struct sk_buff *skb) { struct sk_buff *cskb; struct mISDNhead *hh = mISDN_HEAD_P(skb); struct mISDNchannel *ch; int ret; if (!st) return; mutex_lock(&st->lmutex); if ((hh->id & MISDN_ID_ADDR_MASK) == MISDN_ID_ANY) { /* L2 for all */ list_for_each_entry(ch, &st->layer2, list) { if (list_is_last(&ch->list, &st->layer2)) { cskb = skb; skb = NULL; } else { cskb = skb_copy(skb, GFP_KERNEL); } if (cskb) { ret = ch->send(ch, cskb); if (ret) { if (*debug & DEBUG_SEND_ERR) printk(KERN_DEBUG "%s ch%d prim(%x) addr(%x)" " err %d\n", __func__, ch->nr, hh->prim, ch->addr, ret); dev_kfree_skb(cskb); } } else { printk(KERN_WARNING "%s ch%d addr %x no mem\n", __func__, ch->nr, ch->addr); goto out; } } } else { list_for_each_entry(ch, &st->layer2, list) { if ((hh->id & MISDN_ID_ADDR_MASK) == ch->addr) { ret = ch->send(ch, skb); if (!ret) skb = NULL; goto out; } } ret = st->dev->teimgr->ctrl(st->dev->teimgr, CHECK_DATA, skb); if (!ret) skb = NULL; else if (*debug & DEBUG_SEND_ERR) printk(KERN_DEBUG "%s mgr prim(%x) err %d\n", __func__, hh->prim, ret); } out: mutex_unlock(&st->lmutex); dev_kfree_skb(skb); } static inline int send_msg_to_layer(struct mISDNstack *st, struct sk_buff *skb) { struct mISDNhead *hh = mISDN_HEAD_P(skb); struct mISDNchannel *ch; int lm; lm = hh->prim & MISDN_LAYERMASK; if (*debug & DEBUG_QUEUE_FUNC) printk(KERN_DEBUG "%s prim(%x) id(%x) %p\n", __func__, hh->prim, hh->id, skb); if (lm == 0x1) { if (!hlist_empty(&st->l1sock.head)) { __net_timestamp(skb); send_socklist(&st->l1sock, skb); } return st->layer1->send(st->layer1, skb); } else if (lm == 0x2) { if (!hlist_empty(&st->l1sock.head)) send_socklist(&st->l1sock, skb); send_layer2(st, skb); return 0; } else if (lm == 0x4) { ch = get_channel4id(st, hh->id); if (ch) return ch->send(ch, skb); else printk(KERN_WARNING "%s: dev(%s) prim(%x) id(%x) no channel\n", __func__, dev_name(&st->dev->dev), hh->prim, hh->id); } else if (lm == 0x8) { WARN_ON(lm == 0x8); ch = get_channel4id(st, hh->id); if (ch) return ch->send(ch, skb); else printk(KERN_WARNING "%s: dev(%s) prim(%x) id(%x) no channel\n", __func__, dev_name(&st->dev->dev), hh->prim, hh->id); } else { /* broadcast not handled yet */ printk(KERN_WARNING "%s: dev(%s) prim %x not delivered\n", __func__, dev_name(&st->dev->dev), hh->prim); } return -ESRCH; } static void do_clear_stack(struct mISDNstack *st) { } static int mISDNStackd(void *data) { struct mISDNstack *st = data; #ifdef MISDN_MSG_STATS u64 utime, stime; #endif int err = 0; sigfillset(¤t->blocked); if (*debug & DEBUG_MSG_THREAD) printk(KERN_DEBUG "mISDNStackd %s started\n", dev_name(&st->dev->dev)); if (st->notify != NULL) { complete(st->notify); st->notify = NULL; } for (;;) { struct sk_buff *skb; if (unlikely(test_bit(mISDN_STACK_STOPPED, &st->status))) { test_and_clear_bit(mISDN_STACK_WORK, &st->status); test_and_clear_bit(mISDN_STACK_RUNNING, &st->status); } else test_and_set_bit(mISDN_STACK_RUNNING, &st->status); while (test_bit(mISDN_STACK_WORK, 
&st->status)) { skb = skb_dequeue(&st->msgq); if (!skb) { test_and_clear_bit(mISDN_STACK_WORK, &st->status); /* test if a race happens */ skb = skb_dequeue(&st->msgq); if (!skb) continue; test_and_set_bit(mISDN_STACK_WORK, &st->status); } #ifdef MISDN_MSG_STATS st->msg_cnt++; #endif err = send_msg_to_layer(st, skb); if (unlikely(err)) { if (*debug & DEBUG_SEND_ERR) printk(KERN_DEBUG "%s: %s prim(%x) id(%x) " "send call(%d)\n", __func__, dev_name(&st->dev->dev), mISDN_HEAD_PRIM(skb), mISDN_HEAD_ID(skb), err); dev_kfree_skb(skb); continue; } if (unlikely(test_bit(mISDN_STACK_STOPPED, &st->status))) { test_and_clear_bit(mISDN_STACK_WORK, &st->status); test_and_clear_bit(mISDN_STACK_RUNNING, &st->status); break; } } if (test_bit(mISDN_STACK_CLEARING, &st->status)) { test_and_set_bit(mISDN_STACK_STOPPED, &st->status); test_and_clear_bit(mISDN_STACK_RUNNING, &st->status); do_clear_stack(st); test_and_clear_bit(mISDN_STACK_CLEARING, &st->status); test_and_set_bit(mISDN_STACK_RESTART, &st->status); } if (test_and_clear_bit(mISDN_STACK_RESTART, &st->status)) { test_and_clear_bit(mISDN_STACK_STOPPED, &st->status); test_and_set_bit(mISDN_STACK_RUNNING, &st->status); if (!skb_queue_empty(&st->msgq)) test_and_set_bit(mISDN_STACK_WORK, &st->status); } if (test_bit(mISDN_STACK_ABORT, &st->status)) break; if (st->notify != NULL) { complete(st->notify); st->notify = NULL; } #ifdef MISDN_MSG_STATS st->sleep_cnt++; #endif test_and_clear_bit(mISDN_STACK_ACTIVE, &st->status); wait_event_interruptible(st->workq, (st->status & mISDN_STACK_ACTION_MASK)); if (*debug & DEBUG_MSG_THREAD) printk(KERN_DEBUG "%s: %s wake status %08lx\n", __func__, dev_name(&st->dev->dev), st->status); test_and_set_bit(mISDN_STACK_ACTIVE, &st->status); test_and_clear_bit(mISDN_STACK_WAKEUP, &st->status); if (test_bit(mISDN_STACK_STOPPED, &st->status)) { test_and_clear_bit(mISDN_STACK_RUNNING, &st->status); #ifdef MISDN_MSG_STATS st->stopped_cnt++; #endif } } #ifdef MISDN_MSG_STATS printk(KERN_DEBUG "mISDNStackd daemon for %s proceed %d " "msg %d sleep %d stopped\n", dev_name(&st->dev->dev), st->msg_cnt, st->sleep_cnt, st->stopped_cnt); task_cputime(st->thread, &utime, &stime); printk(KERN_DEBUG "mISDNStackd daemon for %s utime(%llu) stime(%llu)\n", dev_name(&st->dev->dev), utime, stime); printk(KERN_DEBUG "mISDNStackd daemon for %s nvcsw(%ld) nivcsw(%ld)\n", dev_name(&st->dev->dev), st->thread->nvcsw, st->thread->nivcsw); printk(KERN_DEBUG "mISDNStackd daemon for %s killed now\n", dev_name(&st->dev->dev)); #endif test_and_set_bit(mISDN_STACK_KILLED, &st->status); test_and_clear_bit(mISDN_STACK_RUNNING, &st->status); test_and_clear_bit(mISDN_STACK_ACTIVE, &st->status); test_and_clear_bit(mISDN_STACK_ABORT, &st->status); skb_queue_purge(&st->msgq); st->thread = NULL; if (st->notify != NULL) { complete(st->notify); st->notify = NULL; } return 0; } static int l1_receive(struct mISDNchannel *ch, struct sk_buff *skb) { if (!ch->st) return -ENODEV; __net_timestamp(skb); _queue_message(ch->st, skb); return 0; } void set_channel_address(struct mISDNchannel *ch, u_int sapi, u_int tei) { ch->addr = sapi | (tei << 8); } void __add_layer2(struct mISDNchannel *ch, struct mISDNstack *st) { list_add_tail(&ch->list, &st->layer2); } void add_layer2(struct mISDNchannel *ch, struct mISDNstack *st) { mutex_lock(&st->lmutex); __add_layer2(ch, st); mutex_unlock(&st->lmutex); } static int st_own_ctrl(struct mISDNchannel *ch, u_int cmd, void *arg) { if (!ch->st || !ch->st->layer1) return -EINVAL; return ch->st->layer1->ctrl(ch->st->layer1, cmd, arg); } int 
create_stack(struct mISDNdevice *dev) { struct mISDNstack *newst; int err; DECLARE_COMPLETION_ONSTACK(done); newst = kzalloc(sizeof(struct mISDNstack), GFP_KERNEL); if (!newst) { printk(KERN_ERR "kmalloc mISDN_stack failed\n"); return -ENOMEM; } newst->dev = dev; INIT_LIST_HEAD(&newst->layer2); INIT_HLIST_HEAD(&newst->l1sock.head); rwlock_init(&newst->l1sock.lock); init_waitqueue_head(&newst->workq); skb_queue_head_init(&newst->msgq); mutex_init(&newst->lmutex); dev->D.st = newst; err = create_teimanager(dev); if (err) { printk(KERN_ERR "kmalloc teimanager failed\n"); kfree(newst); return err; } dev->teimgr->peer = &newst->own; dev->teimgr->recv = mISDN_queue_message; dev->teimgr->st = newst; newst->layer1 = &dev->D; dev->D.recv = l1_receive; dev->D.peer = &newst->own; newst->own.st = newst; newst->own.ctrl = st_own_ctrl; newst->own.send = mISDN_queue_message; newst->own.recv = mISDN_queue_message; if (*debug & DEBUG_CORE_FUNC) printk(KERN_DEBUG "%s: st(%s)\n", __func__, dev_name(&newst->dev->dev)); newst->notify = &done; newst->thread = kthread_run(mISDNStackd, (void *)newst, "mISDN_%s", dev_name(&newst->dev->dev)); if (IS_ERR(newst->thread)) { err = PTR_ERR(newst->thread); printk(KERN_ERR "mISDN:cannot create kernel thread for %s (%d)\n", dev_name(&newst->dev->dev), err); delete_teimanager(dev->teimgr); kfree(newst); } else wait_for_completion(&done); return err; } int connect_layer1(struct mISDNdevice *dev, struct mISDNchannel *ch, u_int protocol, struct sockaddr_mISDN *adr) { struct mISDN_sock *msk = container_of(ch, struct mISDN_sock, ch); struct channel_req rq; int err; if (*debug & DEBUG_CORE_FUNC) printk(KERN_DEBUG "%s: %s proto(%x) adr(%d %d %d %d)\n", __func__, dev_name(&dev->dev), protocol, adr->dev, adr->channel, adr->sapi, adr->tei); switch (protocol) { case ISDN_P_NT_S0: case ISDN_P_NT_E1: case ISDN_P_TE_S0: case ISDN_P_TE_E1: ch->recv = mISDN_queue_message; ch->peer = &dev->D.st->own; ch->st = dev->D.st; rq.protocol = protocol; rq.adr.channel = adr->channel; err = dev->D.ctrl(&dev->D, OPEN_CHANNEL, &rq); printk(KERN_DEBUG "%s: ret %d (dev %d)\n", __func__, err, dev->id); if (err) return err; write_lock_bh(&dev->D.st->l1sock.lock); sk_add_node(&msk->sk, &dev->D.st->l1sock.head); write_unlock_bh(&dev->D.st->l1sock.lock); break; default: return -ENOPROTOOPT; } return 0; } int connect_Bstack(struct mISDNdevice *dev, struct mISDNchannel *ch, u_int protocol, struct sockaddr_mISDN *adr) { struct channel_req rq, rq2; int pmask, err; struct Bprotocol *bp; if (*debug & DEBUG_CORE_FUNC) printk(KERN_DEBUG "%s: %s proto(%x) adr(%d %d %d %d)\n", __func__, dev_name(&dev->dev), protocol, adr->dev, adr->channel, adr->sapi, adr->tei); ch->st = dev->D.st; pmask = 1 << (protocol & ISDN_P_B_MASK); if (pmask & dev->Bprotocols) { rq.protocol = protocol; rq.adr = *adr; err = dev->D.ctrl(&dev->D, OPEN_CHANNEL, &rq); if (err) return err; ch->recv = rq.ch->send; ch->peer = rq.ch; rq.ch->recv = ch->send; rq.ch->peer = ch; rq.ch->st = dev->D.st; } else { bp = get_Bprotocol4mask(pmask); if (!bp) return -ENOPROTOOPT; rq2.protocol = protocol; rq2.adr = *adr; rq2.ch = ch; err = bp->create(&rq2); if (err) return err; ch->recv = rq2.ch->send; ch->peer = rq2.ch; rq2.ch->st = dev->D.st; rq.protocol = rq2.protocol; rq.adr = *adr; err = dev->D.ctrl(&dev->D, OPEN_CHANNEL, &rq); if (err) { rq2.ch->ctrl(rq2.ch, CLOSE_CHANNEL, NULL); return err; } rq2.ch->recv = rq.ch->send; rq2.ch->peer = rq.ch; rq.ch->recv = rq2.ch->send; rq.ch->peer = rq2.ch; rq.ch->st = dev->D.st; } ch->protocol = protocol; ch->nr = rq.ch->nr; 
return 0; } int create_l2entity(struct mISDNdevice *dev, struct mISDNchannel *ch, u_int protocol, struct sockaddr_mISDN *adr) { struct channel_req rq; int err; if (*debug & DEBUG_CORE_FUNC) printk(KERN_DEBUG "%s: %s proto(%x) adr(%d %d %d %d)\n", __func__, dev_name(&dev->dev), protocol, adr->dev, adr->channel, adr->sapi, adr->tei); rq.protocol = ISDN_P_TE_S0; if (dev->Dprotocols & (1 << ISDN_P_TE_E1)) rq.protocol = ISDN_P_TE_E1; switch (protocol) { case ISDN_P_LAPD_NT: rq.protocol = ISDN_P_NT_S0; if (dev->Dprotocols & (1 << ISDN_P_NT_E1)) rq.protocol = ISDN_P_NT_E1; fallthrough; case ISDN_P_LAPD_TE: ch->recv = mISDN_queue_message; ch->peer = &dev->D.st->own; ch->st = dev->D.st; rq.adr.channel = 0; err = dev->D.ctrl(&dev->D, OPEN_CHANNEL, &rq); printk(KERN_DEBUG "%s: ret 1 %d\n", __func__, err); if (err) break; rq.protocol = protocol; rq.adr = *adr; rq.ch = ch; err = dev->teimgr->ctrl(dev->teimgr, OPEN_CHANNEL, &rq); printk(KERN_DEBUG "%s: ret 2 %d\n", __func__, err); if (!err) { if ((protocol == ISDN_P_LAPD_NT) && !rq.ch) break; add_layer2(rq.ch, dev->D.st); rq.ch->recv = mISDN_queue_message; rq.ch->peer = &dev->D.st->own; rq.ch->ctrl(rq.ch, OPEN_CHANNEL, NULL); /* can't fail */ } break; default: err = -EPROTONOSUPPORT; } return err; } void delete_channel(struct mISDNchannel *ch) { struct mISDN_sock *msk = container_of(ch, struct mISDN_sock, ch); struct mISDNchannel *pch; if (!ch->st) { printk(KERN_WARNING "%s: no stack\n", __func__); return; } if (*debug & DEBUG_CORE_FUNC) printk(KERN_DEBUG "%s: st(%s) protocol(%x)\n", __func__, dev_name(&ch->st->dev->dev), ch->protocol); if (ch->protocol >= ISDN_P_B_START) { if (ch->peer) { ch->peer->ctrl(ch->peer, CLOSE_CHANNEL, NULL); ch->peer = NULL; } return; } switch (ch->protocol) { case ISDN_P_NT_S0: case ISDN_P_TE_S0: case ISDN_P_NT_E1: case ISDN_P_TE_E1: write_lock_bh(&ch->st->l1sock.lock); sk_del_node_init(&msk->sk); write_unlock_bh(&ch->st->l1sock.lock); ch->st->dev->D.ctrl(&ch->st->dev->D, CLOSE_CHANNEL, NULL); break; case ISDN_P_LAPD_TE: pch = get_channel4id(ch->st, ch->nr); if (pch) { mutex_lock(&ch->st->lmutex); list_del(&pch->list); mutex_unlock(&ch->st->lmutex); pch->ctrl(pch, CLOSE_CHANNEL, NULL); pch = ch->st->dev->teimgr; pch->ctrl(pch, CLOSE_CHANNEL, NULL); } else printk(KERN_WARNING "%s: no l2 channel\n", __func__); break; case ISDN_P_LAPD_NT: pch = ch->st->dev->teimgr; if (pch) { pch->ctrl(pch, CLOSE_CHANNEL, NULL); } else printk(KERN_WARNING "%s: no l2 channel\n", __func__); break; default: break; } return; } void delete_stack(struct mISDNdevice *dev) { struct mISDNstack *st = dev->D.st; DECLARE_COMPLETION_ONSTACK(done); if (*debug & DEBUG_CORE_FUNC) printk(KERN_DEBUG "%s: st(%s)\n", __func__, dev_name(&st->dev->dev)); if (dev->teimgr) delete_teimanager(dev->teimgr); if (st->thread) { if (st->notify) { printk(KERN_WARNING "%s: notifier in use\n", __func__); complete(st->notify); } st->notify = &done; test_and_set_bit(mISDN_STACK_ABORT, &st->status); test_and_set_bit(mISDN_STACK_WAKEUP, &st->status); wake_up_interruptible(&st->workq); wait_for_completion(&done); } if (!list_empty(&st->layer2)) printk(KERN_WARNING "%s: layer2 list not empty\n", __func__); if (!hlist_empty(&st->l1sock.head)) printk(KERN_WARNING "%s: layer1 list not empty\n", __func__); kfree(st); } void mISDN_initstack(u_int *dp) { debug = dp; }
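/*
 * Illustrative sketch (not from the original sources): the start-up
 * handshake used by create_stack() and mISDNStackd() above, reduced to
 * its core. The creator publishes a completion through a notify
 * pointer, starts the thread, and blocks until the thread signals that
 * it is running. All demo_* names are hypothetical.
 */
#include <linux/completion.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

struct demo_worker {
	struct completion *notify;
	struct task_struct *thread;
};

static int demo_worker_fn(void *data)
{
	struct demo_worker *w = data;

	if (w->notify) {			/* tell the creator we are live */
		complete(w->notify);
		w->notify = NULL;
	}
	while (!kthread_should_stop())		/* stand-in for the real work loop */
		schedule_timeout_interruptible(HZ);
	return 0;
}

static int demo_worker_start(struct demo_worker *w)
{
	DECLARE_COMPLETION_ONSTACK(done);

	w->notify = &done;
	w->thread = kthread_run(demo_worker_fn, w, "demo_worker");
	if (IS_ERR(w->thread))
		return PTR_ERR(w->thread);
	wait_for_completion(&done);	/* returns once the thread has started */
	return 0;
}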
// SPDX-License-Identifier: GPL-2.0 /* * sysctl_net_ipv6.c: sysctl interface to net IPV6 subsystem. * * Changes: * YOSHIFUJI Hideaki @USAGI: added icmp sysctl table. */ #include <linux/mm.h> #include <linux/sysctl.h> #include <linux/in6.h> #include <linux/ipv6.h> #include <linux/slab.h> #include <linux/export.h> #include <net/ndisc.h> #include <net/ipv6.h> #include <net/addrconf.h> #include <net/inet_frag.h> #include <net/netevent.h> #include <net/ip_fib.h> #ifdef CONFIG_NETLABEL #include <net/calipso.h> #endif #include <linux/ioam6.h> static int flowlabel_reflect_max = 0x7; static int auto_flowlabels_max = IP6_AUTO_FLOW_LABEL_MAX; static u32 rt6_multipath_hash_fields_all_mask = FIB_MULTIPATH_HASH_FIELD_ALL_MASK; static u32 ioam6_id_max = IOAM6_DEFAULT_ID; static u64 ioam6_id_wide_max = IOAM6_DEFAULT_ID_WIDE; static int proc_rt6_multipath_hash_policy(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct net *net; int ret; net = container_of(table->data, struct net, ipv6.sysctl.multipath_hash_policy); ret = proc_dou8vec_minmax(table, write, buffer, lenp, ppos); if (write && ret == 0) call_netevent_notifiers(NETEVENT_IPV6_MPATH_HASH_UPDATE, net); return ret; } static int proc_rt6_multipath_hash_fields(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct net *net; int ret; net = container_of(table->data, struct net, ipv6.sysctl.multipath_hash_fields); ret = proc_douintvec_minmax(table, write, buffer, lenp, ppos); if (write && ret == 0) call_netevent_notifiers(NETEVENT_IPV6_MPATH_HASH_UPDATE, net); return ret; } static struct ctl_table ipv6_table_template[] = { { .procname = "bindv6only", .data = &init_net.ipv6.sysctl.bindv6only, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, }, { .procname = "anycast_src_echo_reply", .data = &init_net.ipv6.sysctl.anycast_src_echo_reply, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, }, { .procname = "flowlabel_consistency", .data = &init_net.ipv6.sysctl.flowlabel_consistency, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, }, { .procname = "auto_flowlabels", .data =
&init_net.ipv6.sysctl.auto_flowlabels, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, .extra2 = &auto_flowlabels_max }, { .procname = "fwmark_reflect", .data = &init_net.ipv6.sysctl.fwmark_reflect, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, }, { .procname = "idgen_retries", .data = &init_net.ipv6.sysctl.idgen_retries, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "idgen_delay", .data = &init_net.ipv6.sysctl.idgen_delay, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "flowlabel_state_ranges", .data = &init_net.ipv6.sysctl.flowlabel_state_ranges, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, }, { .procname = "ip_nonlocal_bind", .data = &init_net.ipv6.sysctl.ip_nonlocal_bind, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, }, { .procname = "flowlabel_reflect", .data = &init_net.ipv6.sysctl.flowlabel_reflect, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = &flowlabel_reflect_max, }, { .procname = "max_dst_opts_number", .data = &init_net.ipv6.sysctl.max_dst_opts_cnt, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec }, { .procname = "max_hbh_opts_number", .data = &init_net.ipv6.sysctl.max_hbh_opts_cnt, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec }, { .procname = "max_dst_opts_length", .data = &init_net.ipv6.sysctl.max_dst_opts_len, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec }, { .procname = "max_hbh_length", .data = &init_net.ipv6.sysctl.max_hbh_opts_len, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec }, { .procname = "fib_multipath_hash_policy", .data = &init_net.ipv6.sysctl.multipath_hash_policy, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_rt6_multipath_hash_policy, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_THREE, }, { .procname = "fib_multipath_hash_fields", .data = &init_net.ipv6.sysctl.multipath_hash_fields, .maxlen = sizeof(u32), .mode = 0644, .proc_handler = proc_rt6_multipath_hash_fields, .extra1 = SYSCTL_ONE, .extra2 = &rt6_multipath_hash_fields_all_mask, }, { .procname = "seg6_flowlabel", .data = &init_net.ipv6.sysctl.seg6_flowlabel, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec }, { .procname = "fib_notify_on_flag_change", .data = &init_net.ipv6.sysctl.fib_notify_on_flag_change, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_TWO, }, { .procname = "ioam6_id", .data = &init_net.ipv6.sysctl.ioam6_id, .maxlen = sizeof(u32), .mode = 0644, .proc_handler = proc_douintvec_minmax, .extra2 = &ioam6_id_max, }, { .procname = "ioam6_id_wide", .data = &init_net.ipv6.sysctl.ioam6_id_wide, .maxlen = sizeof(u64), .mode = 0644, .proc_handler = proc_doulongvec_minmax, .extra2 = &ioam6_id_wide_max, }, }; static struct ctl_table ipv6_rotable[] = { { .procname = "mld_max_msf", .data = &sysctl_mld_max_msf, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec }, { .procname = "mld_qrv", .data = &sysctl_mld_qrv, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ONE }, #ifdef CONFIG_NETLABEL { .procname = "calipso_cache_enable", .data = &calipso_cache_enabled, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "calipso_cache_bucket_size", .data = &calipso_cache_bucketsize, .maxlen = 
sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif /* CONFIG_NETLABEL */ }; static int __net_init ipv6_sysctl_net_init(struct net *net) { size_t table_size = ARRAY_SIZE(ipv6_table_template); struct ctl_table *ipv6_table; struct ctl_table *ipv6_route_table; struct ctl_table *ipv6_icmp_table; int err, i; err = -ENOMEM; ipv6_table = kmemdup(ipv6_table_template, sizeof(ipv6_table_template), GFP_KERNEL); if (!ipv6_table) goto out; /* Update the variables to point into the current struct net */ for (i = 0; i < table_size; i++) ipv6_table[i].data += (void *)net - (void *)&init_net; ipv6_route_table = ipv6_route_sysctl_init(net); if (!ipv6_route_table) goto out_ipv6_table; ipv6_icmp_table = ipv6_icmp_sysctl_init(net); if (!ipv6_icmp_table) goto out_ipv6_route_table; net->ipv6.sysctl.hdr = register_net_sysctl_sz(net, "net/ipv6", ipv6_table, table_size); if (!net->ipv6.sysctl.hdr) goto out_ipv6_icmp_table; net->ipv6.sysctl.route_hdr = register_net_sysctl_sz(net, "net/ipv6/route", ipv6_route_table, ipv6_route_sysctl_table_size(net)); if (!net->ipv6.sysctl.route_hdr) goto out_unregister_ipv6_table; net->ipv6.sysctl.icmp_hdr = register_net_sysctl_sz(net, "net/ipv6/icmp", ipv6_icmp_table, ipv6_icmp_sysctl_table_size()); if (!net->ipv6.sysctl.icmp_hdr) goto out_unregister_route_table; err = 0; out: return err; out_unregister_route_table: unregister_net_sysctl_table(net->ipv6.sysctl.route_hdr); out_unregister_ipv6_table: unregister_net_sysctl_table(net->ipv6.sysctl.hdr); out_ipv6_icmp_table: kfree(ipv6_icmp_table); out_ipv6_route_table: kfree(ipv6_route_table); out_ipv6_table: kfree(ipv6_table); goto out; } static void __net_exit ipv6_sysctl_net_exit(struct net *net) { const struct ctl_table *ipv6_table; const struct ctl_table *ipv6_route_table; const struct ctl_table *ipv6_icmp_table; ipv6_table = net->ipv6.sysctl.hdr->ctl_table_arg; ipv6_route_table = net->ipv6.sysctl.route_hdr->ctl_table_arg; ipv6_icmp_table = net->ipv6.sysctl.icmp_hdr->ctl_table_arg; unregister_net_sysctl_table(net->ipv6.sysctl.icmp_hdr); unregister_net_sysctl_table(net->ipv6.sysctl.route_hdr); unregister_net_sysctl_table(net->ipv6.sysctl.hdr); kfree(ipv6_table); kfree(ipv6_route_table); kfree(ipv6_icmp_table); } static struct pernet_operations ipv6_sysctl_net_ops = { .init = ipv6_sysctl_net_init, .exit = ipv6_sysctl_net_exit, }; static struct ctl_table_header *ip6_header; int ipv6_sysctl_register(void) { int err = -ENOMEM; ip6_header = register_net_sysctl(&init_net, "net/ipv6", ipv6_rotable); if (!ip6_header) goto out; err = register_pernet_subsys(&ipv6_sysctl_net_ops); if (err) goto err_pernet; out: return err; err_pernet: unregister_net_sysctl_table(ip6_header); goto out; } void ipv6_sysctl_unregister(void) { unregister_net_sysctl_table(ip6_header); unregister_pernet_subsys(&ipv6_sysctl_net_ops); }
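/*
 * Illustrative sketch (not from the original sources): the pointer
 * rebasing trick used by ipv6_sysctl_net_init() above. A template table
 * stores .data pointers into init_net; adding the byte offset between
 * the new struct net and init_net retargets each entry at the
 * corresponding field of the new namespace. demo_rebase_table() is a
 * hypothetical helper.
 */
static void demo_rebase_table(struct ctl_table *tbl, size_t n,
			      struct net *net)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].data)	/* skip entries without per-net data */
			tbl[i].data += (void *)net - (void *)&init_net;
}

/* e.g.: demo_rebase_table(ipv6_table, table_size, net); */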
/* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM handshake #if !defined(_TRACE_HANDSHAKE_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_HANDSHAKE_H #include <linux/net.h> #include <net/tls_prot.h> #include <linux/tracepoint.h> #include <trace/events/net_probe_common.h> #define TLS_RECORD_TYPE_LIST \ record_type(CHANGE_CIPHER_SPEC) \ record_type(ALERT) \ record_type(HANDSHAKE) \ record_type(DATA) \ record_type(HEARTBEAT) \ record_type(TLS12_CID) \ record_type_end(ACK) #undef record_type #undef record_type_end #define record_type(x) TRACE_DEFINE_ENUM(TLS_RECORD_TYPE_##x); #define record_type_end(x) TRACE_DEFINE_ENUM(TLS_RECORD_TYPE_##x); TLS_RECORD_TYPE_LIST #undef record_type #undef record_type_end #define record_type(x) { TLS_RECORD_TYPE_##x, #x }, #define record_type_end(x) { TLS_RECORD_TYPE_##x, #x } #define show_tls_content_type(type) \ __print_symbolic(type, TLS_RECORD_TYPE_LIST) TRACE_DEFINE_ENUM(TLS_ALERT_LEVEL_WARNING); TRACE_DEFINE_ENUM(TLS_ALERT_LEVEL_FATAL); #define show_tls_alert_level(level) \ __print_symbolic(level, \ { TLS_ALERT_LEVEL_WARNING, "Warning" }, \ { TLS_ALERT_LEVEL_FATAL, "Fatal" }) #define TLS_ALERT_DESCRIPTION_LIST \ alert_description(CLOSE_NOTIFY) \ alert_description(UNEXPECTED_MESSAGE) \ alert_description(BAD_RECORD_MAC) \ alert_description(RECORD_OVERFLOW) \ alert_description(HANDSHAKE_FAILURE) \ alert_description(BAD_CERTIFICATE) \ alert_description(UNSUPPORTED_CERTIFICATE) \ alert_description(CERTIFICATE_REVOKED) \ alert_description(CERTIFICATE_EXPIRED) \ alert_description(CERTIFICATE_UNKNOWN) \ alert_description(ILLEGAL_PARAMETER) \ alert_description(UNKNOWN_CA) \ alert_description(ACCESS_DENIED) \ alert_description(DECODE_ERROR) \ alert_description(DECRYPT_ERROR) \ alert_description(TOO_MANY_CIDS_REQUESTED) \ alert_description(PROTOCOL_VERSION) \ alert_description(INSUFFICIENT_SECURITY) \ alert_description(INTERNAL_ERROR) \ alert_description(INAPPROPRIATE_FALLBACK) \ alert_description(USER_CANCELED) \ alert_description(MISSING_EXTENSION) \ alert_description(UNSUPPORTED_EXTENSION) \ alert_description(UNRECOGNIZED_NAME) \ alert_description(BAD_CERTIFICATE_STATUS_RESPONSE) \ alert_description(UNKNOWN_PSK_IDENTITY) \ alert_description(CERTIFICATE_REQUIRED) \ alert_description_end(NO_APPLICATION_PROTOCOL) #undef alert_description #undef alert_description_end #define
alert_description(x) TRACE_DEFINE_ENUM(TLS_ALERT_DESC_##x); #define alert_description_end(x) TRACE_DEFINE_ENUM(TLS_ALERT_DESC_##x); TLS_ALERT_DESCRIPTION_LIST #undef alert_description #undef alert_description_end #define alert_description(x) { TLS_ALERT_DESC_##x, #x }, #define alert_description_end(x) { TLS_ALERT_DESC_##x, #x } #define show_tls_alert_description(desc) \ __print_symbolic(desc, TLS_ALERT_DESCRIPTION_LIST) DECLARE_EVENT_CLASS(handshake_event_class, TP_PROTO( const struct net *net, const struct handshake_req *req, const struct sock *sk ), TP_ARGS(net, req, sk), TP_STRUCT__entry( __field(const void *, req) __field(const void *, sk) __field(unsigned int, netns_ino) ), TP_fast_assign( __entry->req = req; __entry->sk = sk; __entry->netns_ino = net->ns.inum; ), TP_printk("req=%p sk=%p", __entry->req, __entry->sk ) ); #define DEFINE_HANDSHAKE_EVENT(name) \ DEFINE_EVENT(handshake_event_class, name, \ TP_PROTO( \ const struct net *net, \ const struct handshake_req *req, \ const struct sock *sk \ ), \ TP_ARGS(net, req, sk)) DECLARE_EVENT_CLASS(handshake_fd_class, TP_PROTO( const struct net *net, const struct handshake_req *req, const struct sock *sk, int fd ), TP_ARGS(net, req, sk, fd), TP_STRUCT__entry( __field(const void *, req) __field(const void *, sk) __field(int, fd) __field(unsigned int, netns_ino) ), TP_fast_assign( __entry->req = req; __entry->sk = req->hr_sk; __entry->fd = fd; __entry->netns_ino = net->ns.inum; ), TP_printk("req=%p sk=%p fd=%d", __entry->req, __entry->sk, __entry->fd ) ); #define DEFINE_HANDSHAKE_FD_EVENT(name) \ DEFINE_EVENT(handshake_fd_class, name, \ TP_PROTO( \ const struct net *net, \ const struct handshake_req *req, \ const struct sock *sk, \ int fd \ ), \ TP_ARGS(net, req, sk, fd)) DECLARE_EVENT_CLASS(handshake_error_class, TP_PROTO( const struct net *net, const struct handshake_req *req, const struct sock *sk, int err ), TP_ARGS(net, req, sk, err), TP_STRUCT__entry( __field(const void *, req) __field(const void *, sk) __field(int, err) __field(unsigned int, netns_ino) ), TP_fast_assign( __entry->req = req; __entry->sk = sk; __entry->err = err; __entry->netns_ino = net->ns.inum; ), TP_printk("req=%p sk=%p err=%d", __entry->req, __entry->sk, __entry->err ) ); #define DEFINE_HANDSHAKE_ERROR(name) \ DEFINE_EVENT(handshake_error_class, name, \ TP_PROTO( \ const struct net *net, \ const struct handshake_req *req, \ const struct sock *sk, \ int err \ ), \ TP_ARGS(net, req, sk, err)) DECLARE_EVENT_CLASS(handshake_alert_class, TP_PROTO( const struct sock *sk, unsigned char level, unsigned char description ), TP_ARGS(sk, level, description), TP_STRUCT__entry( /* sockaddr_in6 is always bigger than sockaddr_in */ __array(__u8, saddr, sizeof(struct sockaddr_in6)) __array(__u8, daddr, sizeof(struct sockaddr_in6)) __field(unsigned int, netns_ino) __field(unsigned long, level) __field(unsigned long, description) ), TP_fast_assign( const struct inet_sock *inet = inet_sk(sk); memset(__entry->saddr, 0, sizeof(struct sockaddr_in6)); memset(__entry->daddr, 0, sizeof(struct sockaddr_in6)); TP_STORE_ADDR_PORTS(__entry, inet, sk); __entry->netns_ino = sock_net(sk)->ns.inum; __entry->level = level; __entry->description = description; ), TP_printk("src=%pISpc dest=%pISpc %s: %s", __entry->saddr, __entry->daddr, show_tls_alert_level(__entry->level), show_tls_alert_description(__entry->description) ) ); #define DEFINE_HANDSHAKE_ALERT(name) \ DEFINE_EVENT(handshake_alert_class, name, \ TP_PROTO( \ const struct sock *sk, \ unsigned char level, \ unsigned char description \ ), \ 
TP_ARGS(sk, level, description)) /* * Request lifetime events */ DEFINE_HANDSHAKE_EVENT(handshake_submit); DEFINE_HANDSHAKE_ERROR(handshake_submit_err); DEFINE_HANDSHAKE_EVENT(handshake_cancel); DEFINE_HANDSHAKE_EVENT(handshake_cancel_none); DEFINE_HANDSHAKE_EVENT(handshake_cancel_busy); DEFINE_HANDSHAKE_EVENT(handshake_destruct); TRACE_EVENT(handshake_complete, TP_PROTO( const struct net *net, const struct handshake_req *req, const struct sock *sk, int status ), TP_ARGS(net, req, sk, status), TP_STRUCT__entry( __field(const void *, req) __field(const void *, sk) __field(int, status) __field(unsigned int, netns_ino) ), TP_fast_assign( __entry->req = req; __entry->sk = sk; __entry->status = status; __entry->netns_ino = net->ns.inum; ), TP_printk("req=%p sk=%p status=%d", __entry->req, __entry->sk, __entry->status ) ); /* * Netlink events */ DEFINE_HANDSHAKE_ERROR(handshake_notify_err); DEFINE_HANDSHAKE_FD_EVENT(handshake_cmd_accept); DEFINE_HANDSHAKE_ERROR(handshake_cmd_accept_err); DEFINE_HANDSHAKE_FD_EVENT(handshake_cmd_done); DEFINE_HANDSHAKE_ERROR(handshake_cmd_done_err); /* * TLS Record events */ TRACE_EVENT(tls_contenttype, TP_PROTO( const struct sock *sk, unsigned char type ), TP_ARGS(sk, type), TP_STRUCT__entry( /* sockaddr_in6 is always bigger than sockaddr_in */ __array(__u8, saddr, sizeof(struct sockaddr_in6)) __array(__u8, daddr, sizeof(struct sockaddr_in6)) __field(unsigned int, netns_ino) __field(unsigned long, type) ), TP_fast_assign( const struct inet_sock *inet = inet_sk(sk); memset(__entry->saddr, 0, sizeof(struct sockaddr_in6)); memset(__entry->daddr, 0, sizeof(struct sockaddr_in6)); TP_STORE_ADDR_PORTS(__entry, inet, sk); __entry->netns_ino = sock_net(sk)->ns.inum; __entry->type = type; ), TP_printk("src=%pISpc dest=%pISpc %s", __entry->saddr, __entry->daddr, show_tls_content_type(__entry->type) ) ); /* * TLS Alert events */ DEFINE_HANDSHAKE_ALERT(tls_alert_send); DEFINE_HANDSHAKE_ALERT(tls_alert_recv); #endif /* _TRACE_HANDSHAKE_H */ #include <trace/define_trace.h>
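/*
 * Illustrative sketch (not from the original sources): the X-macro
 * pattern this header uses for TLS_RECORD_TYPE_LIST and
 * TLS_ALERT_DESCRIPTION_LIST. One list macro is expanded twice with
 * different definitions of its element macros: once to emit the enum
 * values, once to emit the { value, "name" } pairs that
 * __print_symbolic() consumes. All demo_* names are hypothetical.
 */
#define DEMO_COLOR_LIST		\
	demo_color(RED)		\
	demo_color(GREEN)	\
	demo_color_end(BLUE)

/* First expansion: define the enum itself. */
#define demo_color(x)		DEMO_COLOR_##x,
#define demo_color_end(x)	DEMO_COLOR_##x
enum demo_color { DEMO_COLOR_LIST };
#undef demo_color
#undef demo_color_end

/* Second expansion: build a value/name table from the same list. */
#define demo_color(x)		{ DEMO_COLOR_##x, #x },
#define demo_color_end(x)	{ DEMO_COLOR_##x, #x }
static const struct { int value; const char *name; } demo_color_names[] = {
	DEMO_COLOR_LIST
};
#undef demo_color
#undef demo_color_end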
// SPDX-License-Identifier: GPL-2.0-only #include <linux/interval_tree.h> #include <linux/interval_tree_generic.h> #include <linux/compiler.h> #include <linux/export.h> #define START(node) ((node)->start) #define LAST(node) ((node)->last) INTERVAL_TREE_DEFINE(struct interval_tree_node, rb, unsigned long, __subtree_last, START, LAST,, interval_tree) EXPORT_SYMBOL_GPL(interval_tree_insert); EXPORT_SYMBOL_GPL(interval_tree_remove); EXPORT_SYMBOL_GPL(interval_tree_iter_first); EXPORT_SYMBOL_GPL(interval_tree_iter_next); #ifdef CONFIG_INTERVAL_TREE_SPAN_ITER /* * Roll nodes[1] into nodes[0] by advancing nodes[1] to the end of a contiguous * span of nodes. This makes nodes[0]->last the end of the contiguous used span * of indexes that started at the original nodes[1]->start. nodes[1] is now the * first node starting the next used span. A hole span lies between * nodes[0]->last and nodes[1]->start. nodes[1] must be !NULL. */ static void interval_tree_span_iter_next_gap(struct interval_tree_span_iter *state) { struct interval_tree_node *cur = state->nodes[1]; state->nodes[0] = cur; do { if (cur->last > state->nodes[0]->last) state->nodes[0] = cur; cur = interval_tree_iter_next(cur, state->first_index, state->last_index); } while (cur && (state->nodes[0]->last >= cur->start || state->nodes[0]->last + 1 == cur->start)); state->nodes[1] = cur; } void interval_tree_span_iter_first(struct interval_tree_span_iter *iter, struct rb_root_cached *itree, unsigned long first_index, unsigned long last_index) { iter->first_index = first_index; iter->last_index = last_index; iter->nodes[0] = NULL; iter->nodes[1] = interval_tree_iter_first(itree, first_index, last_index); if (!iter->nodes[1]) { /* No nodes intersect the span, the whole span is a hole */ iter->start_hole = first_index; iter->last_hole = last_index; iter->is_hole = 1; return; } if (iter->nodes[1]->start > first_index) { /* Leading hole on first iteration */ iter->start_hole = first_index; iter->last_hole = iter->nodes[1]->start - 1; iter->is_hole = 1; interval_tree_span_iter_next_gap(iter); return; } /* Starting inside a used span */ iter->start_used = first_index; iter->is_hole = 0; interval_tree_span_iter_next_gap(iter); iter->last_used = iter->nodes[0]->last; if (iter->last_used >= last_index) { iter->last_used = last_index; iter->nodes[0] = NULL; iter->nodes[1] = NULL; } } EXPORT_SYMBOL_GPL(interval_tree_span_iter_first); void interval_tree_span_iter_next(struct interval_tree_span_iter *iter) { if (!iter->nodes[0] && !iter->nodes[1]) { iter->is_hole = -1; return; } if (iter->is_hole) { iter->start_used = iter->last_hole + 1; iter->last_used = iter->nodes[0]->last; if (iter->last_used >= iter->last_index) { iter->last_used = iter->last_index; iter->nodes[0] = NULL; iter->nodes[1] = NULL; } iter->is_hole = 0; return; } if (!iter->nodes[1]) { /* Trailing hole */ iter->start_hole = iter->nodes[0]->last + 1; iter->last_hole = iter->last_index; iter->nodes[0] = NULL; iter->is_hole = 1; return; } /* must have both nodes[0] and [1], interior hole */ iter->start_hole =
iter->nodes[0]->last + 1; iter->last_hole = iter->nodes[1]->start - 1; iter->is_hole = 1; interval_tree_span_iter_next_gap(iter); } EXPORT_SYMBOL_GPL(interval_tree_span_iter_next); /* * Advance the iterator index to a specific position. The returned used/hole is * updated to start at new_index. This is faster than calling * interval_tree_span_iter_first() as it can avoid full searches in several * cases where the iterator is already set. */ void interval_tree_span_iter_advance(struct interval_tree_span_iter *iter, struct rb_root_cached *itree, unsigned long new_index) { if (iter->is_hole == -1) return; iter->first_index = new_index; if (new_index > iter->last_index) { iter->is_hole = -1; return; } /* Rely on the union aliasing hole/used */ if (iter->start_hole <= new_index && new_index <= iter->last_hole) { iter->start_hole = new_index; return; } if (new_index == iter->last_hole + 1) interval_tree_span_iter_next(iter); else interval_tree_span_iter_first(iter, itree, new_index, iter->last_index); } EXPORT_SYMBOL_GPL(interval_tree_span_iter_advance); #endif
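/*
 * Illustrative sketch (not from the original sources): walking every
 * used/hole span of [first, last] with the iterator defined above.
 * After each step, iter.is_hole selects which pair of bounds is valid,
 * and it becomes -1 once the range is exhausted. demo_walk_spans() is
 * a hypothetical caller; pr_info() is used only as a stand-in output.
 */
static void demo_walk_spans(struct rb_root_cached *itree,
			    unsigned long first, unsigned long last)
{
	struct interval_tree_span_iter iter;

	for (interval_tree_span_iter_first(&iter, itree, first, last);
	     iter.is_hole != -1;
	     interval_tree_span_iter_next(&iter)) {
		if (iter.is_hole)
			pr_info("hole %lu-%lu\n",
				iter.start_hole, iter.last_hole);
		else
			pr_info("used %lu-%lu\n",
				iter.start_used, iter.last_used);
	}
}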
// SPDX-License-Identifier: GPL-2.0-only /* * TCP Westwood+: end-to-end bandwidth estimation for TCP * * Angelo Dell'Aera: author of the first version of TCP Westwood+ in Linux 2.4 * * Support at http://c3lab.poliba.it/index.php/Westwood * Main references in literature: * * - Mascolo S, Casetti, M. Gerla et al. * "TCP Westwood: bandwidth estimation for TCP" Proc. ACM Mobicom 2001 * * - A. Grieco, S. Mascolo * "Performance evaluation of New Reno, Vegas, Westwood+ TCP" ACM Computer * Comm. Review, 2004 * * - A. Dell'Aera, L. Grieco, S. Mascolo. * "Linux 2.4 Implementation of Westwood+ TCP with Rate-Halving : * A Performance Evaluation Over the Internet" (ICC 2004), Paris, June 2004 * * Westwood+ employs end-to-end bandwidth measurement to set cwnd and * ssthresh after packet loss. The probing phase is the same as in the * original Reno. */ #include <linux/mm.h> #include <linux/module.h> #include <linux/skbuff.h> #include <linux/inet_diag.h> #include <net/tcp.h> /* TCP Westwood structure */ struct westwood { u32 bw_ns_est; /* first bandwidth estimation..not too smoothed 8) */ u32 bw_est; /* bandwidth estimate */ u32 rtt_win_sx; /* here starts a new evaluation... */ u32 bk; u32 snd_una; /* used for evaluating the number of acked bytes */ u32 cumul_ack; u32 accounted; u32 rtt; u32 rtt_min; /* minimum observed RTT */ u8 first_ack; /* flag which indicates that this is the first ack */ u8 reset_rtt_min; /* Reset RTT min to next RTT sample */ }; /* TCP Westwood functions and constants */ #define TCP_WESTWOOD_RTT_MIN (HZ/20) /* 50ms */ #define TCP_WESTWOOD_INIT_RTT (20*HZ) /* maybe too conservative?! */ /* * @tcp_westwood_create * This function initializes the fields used in TCP Westwood+. It is * called after the initial SYN, so the sequence numbers are correct, but * for new passive connections we have no information about RTTmin at * this time, so we simply set it to TCP_WESTWOOD_INIT_RTT. This value * was deliberately chosen to be overly conservative, since this way we * are sure it will be updated in a consistent way as soon as possible. * That will reasonably happen within the first RTT period of the * connection lifetime.
*/ static void tcp_westwood_init(struct sock *sk) { struct westwood *w = inet_csk_ca(sk); w->bk = 0; w->bw_ns_est = 0; w->bw_est = 0; w->accounted = 0; w->cumul_ack = 0; w->reset_rtt_min = 1; w->rtt_min = w->rtt = TCP_WESTWOOD_INIT_RTT; w->rtt_win_sx = tcp_jiffies32; w->snd_una = tcp_sk(sk)->snd_una; w->first_ack = 1; } /* * @westwood_do_filter * Low-pass filter. Implemented using constant coefficients. */ static inline u32 westwood_do_filter(u32 a, u32 b) { return ((7 * a) + b) >> 3; } static void westwood_filter(struct westwood *w, u32 delta) { /* If the filter is empty, fill it with the first bandwidth sample */ if (w->bw_ns_est == 0 && w->bw_est == 0) { w->bw_ns_est = w->bk / delta; w->bw_est = w->bw_ns_est; } else { w->bw_ns_est = westwood_do_filter(w->bw_ns_est, w->bk / delta); w->bw_est = westwood_do_filter(w->bw_est, w->bw_ns_est); } } /* * @westwood_pkts_acked * Called after processing a group of packets, but all Westwood needs is * the last sample of srtt. */ static void tcp_westwood_pkts_acked(struct sock *sk, const struct ack_sample *sample) { struct westwood *w = inet_csk_ca(sk); if (sample->rtt_us > 0) w->rtt = usecs_to_jiffies(sample->rtt_us); } /* * @westwood_update_window * Updates the RTT evaluation window if it is the right moment to do so; * if it is, it calls the filter to evaluate bandwidth. */ static void westwood_update_window(struct sock *sk) { struct westwood *w = inet_csk_ca(sk); s32 delta = tcp_jiffies32 - w->rtt_win_sx; /* Initialize w->snd_una with the first acked sequence number in order * to fix the mismatch between tp->snd_una and w->snd_una for the first * bandwidth sample */ if (w->first_ack) { w->snd_una = tcp_sk(sk)->snd_una; w->first_ack = 0; } /* * See if an RTT-window has passed. Be careful: if RTT is less than * 50ms we don't filter but keep 'building the sample', since estimates * over such small time intervals are unreliable and better avoided. * Obviously on a LAN we will reasonably always have * right_bound = left_bound + WESTWOOD_RTT_MIN */ if (w->rtt && delta > max_t(u32, w->rtt, TCP_WESTWOOD_RTT_MIN)) { westwood_filter(w, delta); w->bk = 0; w->rtt_win_sx = tcp_jiffies32; } } static inline void update_rtt_min(struct westwood *w) { if (w->reset_rtt_min) { w->rtt_min = w->rtt; w->reset_rtt_min = 0; } else w->rtt_min = min(w->rtt, w->rtt_min); } /* * @westwood_fast_bw * Called on the fast path, in particular when header prediction is * successful; in that case the update is straightforward and doesn't * need any particular care. */ static inline void westwood_fast_bw(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); struct westwood *w = inet_csk_ca(sk); westwood_update_window(sk); w->bk += tp->snd_una - w->snd_una; w->snd_una = tp->snd_una; update_rtt_min(w); } /* * @westwood_acked_count * This function evaluates cumul_ack, which is used to update bk in case * of delayed or partial acks. */ static inline u32 westwood_acked_count(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); struct westwood *w = inet_csk_ca(sk); w->cumul_ack = tp->snd_una - w->snd_una; /* If cumul_ack is 0 this is a dupack, since it's not moving * tp->snd_una.
*/ if (!w->cumul_ack) { w->accounted += tp->mss_cache; w->cumul_ack = tp->mss_cache; } if (w->cumul_ack > tp->mss_cache) { /* Partial or delayed ack */ if (w->accounted >= w->cumul_ack) { w->accounted -= w->cumul_ack; w->cumul_ack = tp->mss_cache; } else { w->cumul_ack -= w->accounted; w->accounted = 0; } } w->snd_una = tp->snd_una; return w->cumul_ack; } /* * TCP Westwood * Here the limit is evaluated as bandwidth estimate * RTTmin (divided by * mss_cache to obtain it in packets). The result is clamped to a minimum * of 2 packets, so this never returns 0. */ static u32 tcp_westwood_bw_rttmin(const struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); const struct westwood *w = inet_csk_ca(sk); return max_t(u32, (w->bw_est * w->rtt_min) / tp->mss_cache, 2); } static void tcp_westwood_ack(struct sock *sk, u32 ack_flags) { if (ack_flags & CA_ACK_SLOWPATH) { struct westwood *w = inet_csk_ca(sk); westwood_update_window(sk); w->bk += westwood_acked_count(sk); update_rtt_min(w); return; } westwood_fast_bw(sk); } static void tcp_westwood_event(struct sock *sk, enum tcp_ca_event event) { struct tcp_sock *tp = tcp_sk(sk); struct westwood *w = inet_csk_ca(sk); switch (event) { case CA_EVENT_COMPLETE_CWR: tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk); tcp_snd_cwnd_set(tp, tp->snd_ssthresh); break; case CA_EVENT_LOSS: tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk); /* Update RTT_min when next ack arrives */ w->reset_rtt_min = 1; break; default: /* don't care */ break; } } /* Extract info for the TCP socket info provided via netlink. */ static size_t tcp_westwood_info(struct sock *sk, u32 ext, int *attr, union tcp_cc_info *info) { const struct westwood *ca = inet_csk_ca(sk); if (ext & (1 << (INET_DIAG_VEGASINFO - 1))) { info->vegas.tcpv_enabled = 1; info->vegas.tcpv_rttcnt = 0; info->vegas.tcpv_rtt = jiffies_to_usecs(ca->rtt); info->vegas.tcpv_minrtt = jiffies_to_usecs(ca->rtt_min); *attr = INET_DIAG_VEGASINFO; return sizeof(struct tcpvegas_info); } return 0; } static struct tcp_congestion_ops tcp_westwood __read_mostly = { .init = tcp_westwood_init, .ssthresh = tcp_reno_ssthresh, .cong_avoid = tcp_reno_cong_avoid, .undo_cwnd = tcp_reno_undo_cwnd, .cwnd_event = tcp_westwood_event, .in_ack_event = tcp_westwood_ack, .get_info = tcp_westwood_info, .pkts_acked = tcp_westwood_pkts_acked, .owner = THIS_MODULE, .name = "westwood" }; static int __init tcp_westwood_register(void) { BUILD_BUG_ON(sizeof(struct westwood) > ICSK_CA_PRIV_SIZE); return tcp_register_congestion_control(&tcp_westwood); } static void __exit tcp_westwood_unregister(void) { tcp_unregister_congestion_control(&tcp_westwood); } module_init(tcp_westwood_register); module_exit(tcp_westwood_unregister); MODULE_AUTHOR("Stephen Hemminger, Angelo Dell'Aera"); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("TCP Westwood+");
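/*
 * Illustrative sketch (not from the original sources): the fixed-point
 * low-pass filter behind westwood_do_filter() above. Each new sample
 * contributes 1/8 of its value ((7 * old + new) >> 3), so the estimate
 * converges toward a steady input without any division or floating
 * point. demo_filter_converge() is a hypothetical helper.
 */
static u32 demo_filter_converge(u32 est, u32 sample, int rounds)
{
	while (rounds-- > 0)
		est = ((7 * est) + sample) >> 3;	/* 7/8 old + 1/8 new */
	return est;
}

/* Starting from est = 0 with a steady sample of 800, successive rounds
 * yield 100, 187, 263, 330, ... and asymptotically approach 800.
 */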
// SPDX-License-Identifier: GPL-2.0 /* Copyright (C) B.A.T.M.A.N.
contributors: * * Edo Monticelli, Antonio Quartulli */ #include "tp_meter.h" #include "main.h" #include <linux/atomic.h> #include <linux/build_bug.h> #include <linux/byteorder/generic.h> #include <linux/cache.h> #include <linux/compiler.h> #include <linux/container_of.h> #include <linux/err.h> #include <linux/etherdevice.h> #include <linux/gfp.h> #include <linux/if_ether.h> #include <linux/init.h> #include <linux/jiffies.h> #include <linux/kref.h> #include <linux/kthread.h> #include <linux/limits.h> #include <linux/list.h> #include <linux/minmax.h> #include <linux/netdevice.h> #include <linux/param.h> #include <linux/printk.h> #include <linux/random.h> #include <linux/rculist.h> #include <linux/rcupdate.h> #include <linux/sched.h> #include <linux/skbuff.h> #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/stddef.h> #include <linux/string.h> #include <linux/timer.h> #include <linux/wait.h> #include <linux/workqueue.h> #include <uapi/linux/batadv_packet.h> #include <uapi/linux/batman_adv.h> #include "hard-interface.h" #include "log.h" #include "netlink.h" #include "originator.h" #include "send.h" /** * BATADV_TP_DEF_TEST_LENGTH - Default test length if not specified by the user * in milliseconds */ #define BATADV_TP_DEF_TEST_LENGTH 10000 /** * BATADV_TP_AWND - Advertised window by the receiver (in bytes) */ #define BATADV_TP_AWND 0x20000000 /** * BATADV_TP_RECV_TIMEOUT - Receiver activity timeout. If the receiver does not * get anything for such amount of milliseconds, the connection is killed */ #define BATADV_TP_RECV_TIMEOUT 1000 /** * BATADV_TP_MAX_RTO - Maximum sender timeout. If the sender RTO gets beyond * such amount of milliseconds, the receiver is considered unreachable and the * connection is killed */ #define BATADV_TP_MAX_RTO 30000 /** * BATADV_TP_FIRST_SEQ - First seqno of each session. The number is rather high * in order to immediately trigger a wrap around (test purposes) */ #define BATADV_TP_FIRST_SEQ ((u32)-1 - 2000) /** * BATADV_TP_PLEN - length of the payload (data after the batadv_unicast header) * to simulate */ #define BATADV_TP_PLEN (BATADV_TP_PACKET_LEN - ETH_HLEN - \ sizeof(struct batadv_unicast_packet)) static u8 batadv_tp_prerandom[4096] __read_mostly; /** * batadv_tp_session_cookie() - generate session cookie based on session ids * @session: TP session identifier * @icmp_uid: icmp pseudo uid of the tp session * * Return: 32 bit tp_meter session cookie */ static u32 batadv_tp_session_cookie(const u8 session[2], u8 icmp_uid) { u32 cookie; cookie = icmp_uid << 16; cookie |= session[0] << 8; cookie |= session[1]; return cookie; } /** * batadv_tp_cwnd() - compute the new cwnd size * @base: base cwnd size value * @increment: the value to add to base to get the new size * @min: minimum cwnd value (usually MSS) * * Return the new cwnd size and ensure it does not exceed the Advertised * Receiver Window size. It is wrapped around safely. 
* For details refer to Section 3.1 of RFC5681 * * Return: new congestion window size in bytes */ static u32 batadv_tp_cwnd(u32 base, u32 increment, u32 min) { u32 new_size = base + increment; /* check for wrap-around */ if (new_size < base) new_size = (u32)ULONG_MAX; new_size = min_t(u32, new_size, BATADV_TP_AWND); return max_t(u32, new_size, min); } /** * batadv_tp_update_cwnd() - update the Congestion Windows * @tp_vars: the private data of the current TP meter session * @mss: maximum segment size of transmission * * 1) if the session is in Slow Start, the CWND has to be increased by 1 * MSS every unique received ACK * 2) if the session is in Congestion Avoidance, the CWND has to be * increased by MSS * MSS / CWND for every unique received ACK */ static void batadv_tp_update_cwnd(struct batadv_tp_vars *tp_vars, u32 mss) { spin_lock_bh(&tp_vars->cwnd_lock); /* slow start... */ if (tp_vars->cwnd <= tp_vars->ss_threshold) { tp_vars->dec_cwnd = 0; tp_vars->cwnd = batadv_tp_cwnd(tp_vars->cwnd, mss, mss); spin_unlock_bh(&tp_vars->cwnd_lock); return; } /* increment CWND at least of 1 (section 3.1 of RFC5681) */ tp_vars->dec_cwnd += max_t(u32, 1U << 3, ((mss * mss) << 6) / (tp_vars->cwnd << 3)); if (tp_vars->dec_cwnd < (mss << 3)) { spin_unlock_bh(&tp_vars->cwnd_lock); return; } tp_vars->cwnd = batadv_tp_cwnd(tp_vars->cwnd, mss, mss); tp_vars->dec_cwnd = 0; spin_unlock_bh(&tp_vars->cwnd_lock); } /** * batadv_tp_update_rto() - calculate new retransmission timeout * @tp_vars: the private data of the current TP meter session * @new_rtt: new roundtrip time in msec */ static void batadv_tp_update_rto(struct batadv_tp_vars *tp_vars, u32 new_rtt) { long m = new_rtt; /* RTT update * Details in Section 2.2 and 2.3 of RFC6298 * * It's tricky to understand. Don't lose hair please. * Inspired by tcp_rtt_estimator() tcp_input.c */ if (tp_vars->srtt != 0) { m -= (tp_vars->srtt >> 3); /* m is now error in rtt est */ tp_vars->srtt += m; /* rtt = 7/8 srtt + 1/8 new */ if (m < 0) m = -m; m -= (tp_vars->rttvar >> 2); tp_vars->rttvar += m; /* mdev ~= 3/4 rttvar + 1/4 new */ } else { /* first measure getting in */ tp_vars->srtt = m << 3; /* take the measured time to be srtt */ tp_vars->rttvar = m << 1; /* new_rtt / 2 */ } /* rto = srtt + 4 * rttvar. 
* rttvar is scaled by 4, therefore doesn't need to be multiplied */ tp_vars->rto = (tp_vars->srtt >> 3) + tp_vars->rttvar; } /** * batadv_tp_batctl_notify() - send client status result to client * @reason: reason for tp meter session stop * @dst: destination of tp_meter session * @bat_priv: the bat priv with all the soft interface information * @start_time: start of transmission in jiffies * @total_sent: bytes acked to the receiver * @cookie: cookie of tp_meter session */ static void batadv_tp_batctl_notify(enum batadv_tp_meter_reason reason, const u8 *dst, struct batadv_priv *bat_priv, unsigned long start_time, u64 total_sent, u32 cookie) { u32 test_time; u8 result; u32 total_bytes; if (!batadv_tp_is_error(reason)) { result = BATADV_TP_REASON_COMPLETE; test_time = jiffies_to_msecs(jiffies - start_time); total_bytes = total_sent; } else { result = reason; test_time = 0; total_bytes = 0; } batadv_netlink_tpmeter_notify(bat_priv, dst, result, test_time, total_bytes, cookie); } /** * batadv_tp_batctl_error_notify() - send client error result to client * @reason: reason for tp meter session stop * @dst: destination of tp_meter session * @bat_priv: the bat priv with all the soft interface information * @cookie: cookie of tp_meter session */ static void batadv_tp_batctl_error_notify(enum batadv_tp_meter_reason reason, const u8 *dst, struct batadv_priv *bat_priv, u32 cookie) { batadv_tp_batctl_notify(reason, dst, bat_priv, 0, 0, cookie); } /** * batadv_tp_list_find() - find a tp_vars object in the global list * @bat_priv: the bat priv with all the soft interface information * @dst: the other endpoint MAC address to look for * * Look for a tp_vars object matching dst as end_point and return it after * having increment the refcounter. Return NULL is not found * * Return: matching tp_vars or NULL when no tp_vars with @dst was found */ static struct batadv_tp_vars *batadv_tp_list_find(struct batadv_priv *bat_priv, const u8 *dst) { struct batadv_tp_vars *pos, *tp_vars = NULL; rcu_read_lock(); hlist_for_each_entry_rcu(pos, &bat_priv->tp_list, list) { if (!batadv_compare_eth(pos->other_end, dst)) continue; /* most of the time this function is invoked during the normal * process..it makes sens to pay more when the session is * finished and to speed the process up during the measurement */ if (unlikely(!kref_get_unless_zero(&pos->refcount))) continue; tp_vars = pos; break; } rcu_read_unlock(); return tp_vars; } /** * batadv_tp_list_find_session() - find tp_vars session object in the global * list * @bat_priv: the bat priv with all the soft interface information * @dst: the other endpoint MAC address to look for * @session: session identifier * * Look for a tp_vars object matching dst as end_point, session as tp meter * session and return it after having increment the refcounter. 
Return NULL * is not found * * Return: matching tp_vars or NULL when no tp_vars was found */ static struct batadv_tp_vars * batadv_tp_list_find_session(struct batadv_priv *bat_priv, const u8 *dst, const u8 *session) { struct batadv_tp_vars *pos, *tp_vars = NULL; rcu_read_lock(); hlist_for_each_entry_rcu(pos, &bat_priv->tp_list, list) { if (!batadv_compare_eth(pos->other_end, dst)) continue; if (memcmp(pos->session, session, sizeof(pos->session)) != 0) continue; /* most of the time this function is invoked during the normal * process..it makes sense to pay more when the session is * finished and to speed the process up during the measurement */ if (unlikely(!kref_get_unless_zero(&pos->refcount))) continue; tp_vars = pos; break; } rcu_read_unlock(); return tp_vars; } /** * batadv_tp_vars_release() - release batadv_tp_vars from lists and queue for * free after rcu grace period * @ref: kref pointer of the batadv_tp_vars */ static void batadv_tp_vars_release(struct kref *ref) { struct batadv_tp_vars *tp_vars; struct batadv_tp_unacked *un, *safe; tp_vars = container_of(ref, struct batadv_tp_vars, refcount); /* lock should not be needed because this object is now out of any * context! */ spin_lock_bh(&tp_vars->unacked_lock); list_for_each_entry_safe(un, safe, &tp_vars->unacked_list, list) { list_del(&un->list); kfree(un); } spin_unlock_bh(&tp_vars->unacked_lock); kfree_rcu(tp_vars, rcu); } /** * batadv_tp_vars_put() - decrement the batadv_tp_vars refcounter and possibly * release it * @tp_vars: the private data of the current TP meter session to be free'd */ static void batadv_tp_vars_put(struct batadv_tp_vars *tp_vars) { if (!tp_vars) return; kref_put(&tp_vars->refcount, batadv_tp_vars_release); } /** * batadv_tp_sender_cleanup() - cleanup sender data and drop and timer * @bat_priv: the bat priv with all the soft interface information * @tp_vars: the private data of the current TP meter session to cleanup */ static void batadv_tp_sender_cleanup(struct batadv_priv *bat_priv, struct batadv_tp_vars *tp_vars) { cancel_delayed_work(&tp_vars->finish_work); spin_lock_bh(&tp_vars->bat_priv->tp_list_lock); hlist_del_rcu(&tp_vars->list); spin_unlock_bh(&tp_vars->bat_priv->tp_list_lock); /* drop list reference */ batadv_tp_vars_put(tp_vars); atomic_dec(&tp_vars->bat_priv->tp_num); /* kill the timer and remove its reference */ del_timer_sync(&tp_vars->timer); /* the worker might have rearmed itself therefore we kill it again. 
Note * that if the worker should run again before invoking the following * del_timer(), it would not re-arm itself once again because the status * is OFF now */ del_timer(&tp_vars->timer); batadv_tp_vars_put(tp_vars); } /** * batadv_tp_sender_end() - print info about ended session and inform client * @bat_priv: the bat priv with all the soft interface information * @tp_vars: the private data of the current TP meter session */ static void batadv_tp_sender_end(struct batadv_priv *bat_priv, struct batadv_tp_vars *tp_vars) { u32 session_cookie; batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Test towards %pM finished..shutting down (reason=%d)\n", tp_vars->other_end, tp_vars->reason); batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Last timing stats: SRTT=%ums RTTVAR=%ums RTO=%ums\n", tp_vars->srtt >> 3, tp_vars->rttvar >> 2, tp_vars->rto); batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Final values: cwnd=%u ss_threshold=%u\n", tp_vars->cwnd, tp_vars->ss_threshold); session_cookie = batadv_tp_session_cookie(tp_vars->session, tp_vars->icmp_uid); batadv_tp_batctl_notify(tp_vars->reason, tp_vars->other_end, bat_priv, tp_vars->start_time, atomic64_read(&tp_vars->tot_sent), session_cookie); } /** * batadv_tp_sender_shutdown() - let sender thread/timer stop gracefully * @tp_vars: the private data of the current TP meter session * @reason: reason for tp meter session stop */ static void batadv_tp_sender_shutdown(struct batadv_tp_vars *tp_vars, enum batadv_tp_meter_reason reason) { if (!atomic_dec_and_test(&tp_vars->sending)) return; tp_vars->reason = reason; } /** * batadv_tp_sender_finish() - stop sender session after test_length was reached * @work: delayed work reference of the related tp_vars */ static void batadv_tp_sender_finish(struct work_struct *work) { struct delayed_work *delayed_work; struct batadv_tp_vars *tp_vars; delayed_work = to_delayed_work(work); tp_vars = container_of(delayed_work, struct batadv_tp_vars, finish_work); batadv_tp_sender_shutdown(tp_vars, BATADV_TP_REASON_COMPLETE); } /** * batadv_tp_reset_sender_timer() - reschedule the sender timer * @tp_vars: the private TP meter data for this session * * Reschedule the timer using tp_vars->rto as delay */ static void batadv_tp_reset_sender_timer(struct batadv_tp_vars *tp_vars) { /* most of the time this function is invoked while normal packet * reception... */ if (unlikely(atomic_read(&tp_vars->sending) == 0)) /* timer ref will be dropped in batadv_tp_sender_cleanup */ return; mod_timer(&tp_vars->timer, jiffies + msecs_to_jiffies(tp_vars->rto)); } /** * batadv_tp_sender_timeout() - timer that fires in case of packet loss * @t: address to timer_list inside tp_vars * * If fired it means that there was packet loss. 
* Switch to Slow Start, set the ss_threshold to half of the current cwnd and * reset the cwnd to 3*MSS */ static void batadv_tp_sender_timeout(struct timer_list *t) { struct batadv_tp_vars *tp_vars = from_timer(tp_vars, t, timer); struct batadv_priv *bat_priv = tp_vars->bat_priv; if (atomic_read(&tp_vars->sending) == 0) return; /* if the user waited long enough...shutdown the test */ if (unlikely(tp_vars->rto >= BATADV_TP_MAX_RTO)) { batadv_tp_sender_shutdown(tp_vars, BATADV_TP_REASON_DST_UNREACHABLE); return; } /* RTO exponential backoff * Details in Section 5.5 of RFC6298 */ tp_vars->rto <<= 1; spin_lock_bh(&tp_vars->cwnd_lock); tp_vars->ss_threshold = tp_vars->cwnd >> 1; if (tp_vars->ss_threshold < BATADV_TP_PLEN * 2) tp_vars->ss_threshold = BATADV_TP_PLEN * 2; batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: RTO fired during test towards %pM! cwnd=%u new ss_thr=%u, resetting last_sent to %u\n", tp_vars->other_end, tp_vars->cwnd, tp_vars->ss_threshold, atomic_read(&tp_vars->last_acked)); tp_vars->cwnd = BATADV_TP_PLEN * 3; spin_unlock_bh(&tp_vars->cwnd_lock); /* resend the non-ACKed packets.. */ tp_vars->last_sent = atomic_read(&tp_vars->last_acked); wake_up(&tp_vars->more_bytes); batadv_tp_reset_sender_timer(tp_vars); } /** * batadv_tp_fill_prerandom() - Fill buffer with prefetched random bytes * @tp_vars: the private TP meter data for this session * @buf: Buffer to fill with bytes * @nbytes: amount of pseudorandom bytes */ static void batadv_tp_fill_prerandom(struct batadv_tp_vars *tp_vars, u8 *buf, size_t nbytes) { u32 local_offset; size_t bytes_inbuf; size_t to_copy; size_t pos = 0; spin_lock_bh(&tp_vars->prerandom_lock); local_offset = tp_vars->prerandom_offset; tp_vars->prerandom_offset += nbytes; tp_vars->prerandom_offset %= sizeof(batadv_tp_prerandom); spin_unlock_bh(&tp_vars->prerandom_lock); while (nbytes) { local_offset %= sizeof(batadv_tp_prerandom); bytes_inbuf = sizeof(batadv_tp_prerandom) - local_offset; to_copy = min(nbytes, bytes_inbuf); memcpy(&buf[pos], &batadv_tp_prerandom[local_offset], to_copy); pos += to_copy; nbytes -= to_copy; local_offset = 0; } } /** * batadv_tp_send_msg() - send a single message * @tp_vars: the private TP meter data for this session * @src: source mac address * @orig_node: the originator of the destination * @seqno: sequence number of this packet * @len: length of the entire packet * @session: session identifier * @uid: local ICMP "socket" index * @timestamp: timestamp in jiffies which is replied in ack * * Create and send a single TP Meter message. 
* * Return: 0 on success, BATADV_TP_REASON_DST_UNREACHABLE if the destination is * not reachable, BATADV_TP_REASON_MEMORY_ERROR if the packet couldn't be * allocated */ static int batadv_tp_send_msg(struct batadv_tp_vars *tp_vars, const u8 *src, struct batadv_orig_node *orig_node, u32 seqno, size_t len, const u8 *session, int uid, u32 timestamp) { struct batadv_icmp_tp_packet *icmp; struct sk_buff *skb; int r; u8 *data; size_t data_len; skb = netdev_alloc_skb_ip_align(NULL, len + ETH_HLEN); if (unlikely(!skb)) return BATADV_TP_REASON_MEMORY_ERROR; skb_reserve(skb, ETH_HLEN); icmp = skb_put(skb, sizeof(*icmp)); /* fill the icmp header */ ether_addr_copy(icmp->dst, orig_node->orig); ether_addr_copy(icmp->orig, src); icmp->version = BATADV_COMPAT_VERSION; icmp->packet_type = BATADV_ICMP; icmp->ttl = BATADV_TTL; icmp->msg_type = BATADV_TP; icmp->uid = uid; icmp->subtype = BATADV_TP_MSG; memcpy(icmp->session, session, sizeof(icmp->session)); icmp->seqno = htonl(seqno); icmp->timestamp = htonl(timestamp); data_len = len - sizeof(*icmp); data = skb_put(skb, data_len); batadv_tp_fill_prerandom(tp_vars, data, data_len); r = batadv_send_skb_to_orig(skb, orig_node, NULL); if (r == NET_XMIT_SUCCESS) return 0; return BATADV_TP_REASON_CANT_SEND; } /** * batadv_tp_recv_ack() - ACK receiving function * @bat_priv: the bat priv with all the soft interface information * @skb: the buffer containing the received packet * * Process a received TP ACK packet */ static void batadv_tp_recv_ack(struct batadv_priv *bat_priv, const struct sk_buff *skb) { struct batadv_hard_iface *primary_if = NULL; struct batadv_orig_node *orig_node = NULL; const struct batadv_icmp_tp_packet *icmp; struct batadv_tp_vars *tp_vars; const unsigned char *dev_addr; size_t packet_len, mss; u32 rtt, recv_ack, cwnd; packet_len = BATADV_TP_PLEN; mss = BATADV_TP_PLEN; packet_len += sizeof(struct batadv_unicast_packet); icmp = (struct batadv_icmp_tp_packet *)skb->data; /* find the tp_vars */ tp_vars = batadv_tp_list_find_session(bat_priv, icmp->orig, icmp->session); if (unlikely(!tp_vars)) return; if (unlikely(atomic_read(&tp_vars->sending) == 0)) goto out; /* old ACK? silently drop it.. */ if (batadv_seq_before(ntohl(icmp->seqno), (u32)atomic_read(&tp_vars->last_acked))) goto out; primary_if = batadv_primary_if_get_selected(bat_priv); if (unlikely(!primary_if)) goto out; orig_node = batadv_orig_hash_find(bat_priv, icmp->orig); if (unlikely(!orig_node)) goto out; /* update RTO with the new sampled RTT, if any */ rtt = jiffies_to_msecs(jiffies) - ntohl(icmp->timestamp); if (icmp->timestamp && rtt) batadv_tp_update_rto(tp_vars, rtt); /* ACK for new data... reset the timer */ batadv_tp_reset_sender_timer(tp_vars); recv_ack = ntohl(icmp->seqno); /* check if this ACK is a duplicate */ if (atomic_read(&tp_vars->last_acked) == recv_ack) { atomic_inc(&tp_vars->dup_acks); if (atomic_read(&tp_vars->dup_acks) != 3) goto out; if (recv_ack >= tp_vars->recover) goto out; /* if this is the third duplicate ACK do Fast Retransmit */ batadv_tp_send_msg(tp_vars, primary_if->net_dev->dev_addr, orig_node, recv_ack, packet_len, icmp->session, icmp->uid, jiffies_to_msecs(jiffies)); spin_lock_bh(&tp_vars->cwnd_lock); /* Fast Recovery */ tp_vars->fast_recovery = true; /* Set recover to the last outstanding seqno when Fast Recovery * is entered. 
RFC6582, Section 3.2, step 1 */ tp_vars->recover = tp_vars->last_sent; tp_vars->ss_threshold = tp_vars->cwnd >> 1; batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: Fast Recovery, (cur cwnd=%u) ss_thr=%u last_sent=%u recv_ack=%u\n", tp_vars->cwnd, tp_vars->ss_threshold, tp_vars->last_sent, recv_ack); tp_vars->cwnd = batadv_tp_cwnd(tp_vars->ss_threshold, 3 * mss, mss); tp_vars->dec_cwnd = 0; tp_vars->last_sent = recv_ack; spin_unlock_bh(&tp_vars->cwnd_lock); } else { /* count the acked data */ atomic64_add(recv_ack - atomic_read(&tp_vars->last_acked), &tp_vars->tot_sent); /* reset the duplicate ACKs counter */ atomic_set(&tp_vars->dup_acks, 0); if (tp_vars->fast_recovery) { /* partial ACK */ if (batadv_seq_before(recv_ack, tp_vars->recover)) { /* this is another hole in the window. React * immediately as specified by NewReno (see * Section 3.2 of RFC6582 for details) */ dev_addr = primary_if->net_dev->dev_addr; batadv_tp_send_msg(tp_vars, dev_addr, orig_node, recv_ack, packet_len, icmp->session, icmp->uid, jiffies_to_msecs(jiffies)); tp_vars->cwnd = batadv_tp_cwnd(tp_vars->cwnd, mss, mss); } else { tp_vars->fast_recovery = false; /* set cwnd to the value of ss_threshold at the * moment that Fast Recovery was entered. * RFC6582, Section 3.2, step 3 */ cwnd = batadv_tp_cwnd(tp_vars->ss_threshold, 0, mss); tp_vars->cwnd = cwnd; } goto move_twnd; } if (recv_ack - atomic_read(&tp_vars->last_acked) >= mss) batadv_tp_update_cwnd(tp_vars, mss); move_twnd: /* move the Transmit Window */ atomic_set(&tp_vars->last_acked, recv_ack); } wake_up(&tp_vars->more_bytes); out: batadv_hardif_put(primary_if); batadv_orig_node_put(orig_node); batadv_tp_vars_put(tp_vars); } /** * batadv_tp_avail() - check if congestion window is not full * @tp_vars: the private data of the current TP meter session * @payload_len: size of the payload of a single message * * Return: true when congestion window is not full, false otherwise */ static bool batadv_tp_avail(struct batadv_tp_vars *tp_vars, size_t payload_len) { u32 win_left, win_limit; win_limit = atomic_read(&tp_vars->last_acked) + tp_vars->cwnd; win_left = win_limit - tp_vars->last_sent; return win_left >= payload_len; } /** * batadv_tp_wait_available() - wait until congestion window becomes free or * timeout is reached * @tp_vars: the private data of the current TP meter session * @plen: size of the payload of a single message * * Return: 0 if the condition evaluated to false after the timeout elapsed, * 1 if the condition evaluated to true after the timeout elapsed, the * remaining jiffies (at least 1) if the condition evaluated to true before * the timeout elapsed, or -ERESTARTSYS if it was interrupted by a signal. 
*/ static int batadv_tp_wait_available(struct batadv_tp_vars *tp_vars, size_t plen) { int ret; ret = wait_event_interruptible_timeout(tp_vars->more_bytes, batadv_tp_avail(tp_vars, plen), HZ / 10); return ret; } /** * batadv_tp_send() - main sending thread of a tp meter session * @arg: address of the related tp_vars * * Return: nothing, this function never returns */ static int batadv_tp_send(void *arg) { struct batadv_tp_vars *tp_vars = arg; struct batadv_priv *bat_priv = tp_vars->bat_priv; struct batadv_hard_iface *primary_if = NULL; struct batadv_orig_node *orig_node = NULL; size_t payload_len, packet_len; int err = 0; if (unlikely(tp_vars->role != BATADV_TP_SENDER)) { err = BATADV_TP_REASON_DST_UNREACHABLE; tp_vars->reason = err; goto out; } orig_node = batadv_orig_hash_find(bat_priv, tp_vars->other_end); if (unlikely(!orig_node)) { err = BATADV_TP_REASON_DST_UNREACHABLE; tp_vars->reason = err; goto out; } primary_if = batadv_primary_if_get_selected(bat_priv); if (unlikely(!primary_if)) { err = BATADV_TP_REASON_DST_UNREACHABLE; tp_vars->reason = err; goto out; } /* assume that all the hard_interfaces have a correctly * configured MTU, so use the soft_iface MTU as MSS. * This might not be true and in that case the fragmentation * should be used. * Now, try to send the packet as it is */ payload_len = BATADV_TP_PLEN; BUILD_BUG_ON(sizeof(struct batadv_icmp_tp_packet) > BATADV_TP_PLEN); batadv_tp_reset_sender_timer(tp_vars); /* queue the worker in charge of terminating the test */ queue_delayed_work(batadv_event_workqueue, &tp_vars->finish_work, msecs_to_jiffies(tp_vars->test_length)); while (atomic_read(&tp_vars->sending) != 0) { if (unlikely(!batadv_tp_avail(tp_vars, payload_len))) { batadv_tp_wait_available(tp_vars, payload_len); continue; } /* to emulate normal unicast traffic, add to the payload len * the size of the unicast header */ packet_len = payload_len + sizeof(struct batadv_unicast_packet); err = batadv_tp_send_msg(tp_vars, primary_if->net_dev->dev_addr, orig_node, tp_vars->last_sent, packet_len, tp_vars->session, tp_vars->icmp_uid, jiffies_to_msecs(jiffies)); /* something went wrong during the preparation/transmission */ if (unlikely(err && err != BATADV_TP_REASON_CANT_SEND)) { batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: %s() cannot send packets (%d)\n", __func__, err); /* ensure nobody else tries to stop the thread now */ if (atomic_dec_and_test(&tp_vars->sending)) tp_vars->reason = err; break; } /* right-shift the TWND */ if (!err) tp_vars->last_sent += payload_len; cond_resched(); } out: batadv_hardif_put(primary_if); batadv_orig_node_put(orig_node); batadv_tp_sender_end(bat_priv, tp_vars); batadv_tp_sender_cleanup(bat_priv, tp_vars); batadv_tp_vars_put(tp_vars); return 0; } /** * batadv_tp_start_kthread() - start new thread which manages the tp meter * sender * @tp_vars: the private data of the current TP meter session */ static void batadv_tp_start_kthread(struct batadv_tp_vars *tp_vars) { struct task_struct *kthread; struct batadv_priv *bat_priv = tp_vars->bat_priv; u32 session_cookie; kref_get(&tp_vars->refcount); kthread = kthread_create(batadv_tp_send, tp_vars, "kbatadv_tp_meter"); if (IS_ERR(kthread)) { session_cookie = batadv_tp_session_cookie(tp_vars->session, tp_vars->icmp_uid); pr_err("batadv: cannot create tp meter kthread\n"); batadv_tp_batctl_error_notify(BATADV_TP_REASON_MEMORY_ERROR, tp_vars->other_end, bat_priv, session_cookie); /* drop reserved reference for kthread */ batadv_tp_vars_put(tp_vars); /* cleanup of failed tp meter variables */ 
batadv_tp_sender_cleanup(bat_priv, tp_vars); return; } wake_up_process(kthread); } /** * batadv_tp_start() - start a new tp meter session * @bat_priv: the bat priv with all the soft interface information * @dst: the receiver MAC address * @test_length: test length in milliseconds * @cookie: session cookie */ void batadv_tp_start(struct batadv_priv *bat_priv, const u8 *dst, u32 test_length, u32 *cookie) { struct batadv_tp_vars *tp_vars; u8 session_id[2]; u8 icmp_uid; u32 session_cookie; get_random_bytes(session_id, sizeof(session_id)); get_random_bytes(&icmp_uid, 1); session_cookie = batadv_tp_session_cookie(session_id, icmp_uid); *cookie = session_cookie; /* look for an already existing test towards this node */ spin_lock_bh(&bat_priv->tp_list_lock); tp_vars = batadv_tp_list_find(bat_priv, dst); if (tp_vars) { spin_unlock_bh(&bat_priv->tp_list_lock); batadv_tp_vars_put(tp_vars); batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: test to or from the same node already ongoing, aborting\n"); batadv_tp_batctl_error_notify(BATADV_TP_REASON_ALREADY_ONGOING, dst, bat_priv, session_cookie); return; } if (!atomic_add_unless(&bat_priv->tp_num, 1, BATADV_TP_MAX_NUM)) { spin_unlock_bh(&bat_priv->tp_list_lock); batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: too many ongoing sessions, aborting (SEND)\n"); batadv_tp_batctl_error_notify(BATADV_TP_REASON_TOO_MANY, dst, bat_priv, session_cookie); return; } tp_vars = kmalloc(sizeof(*tp_vars), GFP_ATOMIC); if (!tp_vars) { spin_unlock_bh(&bat_priv->tp_list_lock); batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: %s cannot allocate list elements\n", __func__); batadv_tp_batctl_error_notify(BATADV_TP_REASON_MEMORY_ERROR, dst, bat_priv, session_cookie); return; } /* initialize tp_vars */ ether_addr_copy(tp_vars->other_end, dst); kref_init(&tp_vars->refcount); tp_vars->role = BATADV_TP_SENDER; atomic_set(&tp_vars->sending, 1); memcpy(tp_vars->session, session_id, sizeof(session_id)); tp_vars->icmp_uid = icmp_uid; tp_vars->last_sent = BATADV_TP_FIRST_SEQ; atomic_set(&tp_vars->last_acked, BATADV_TP_FIRST_SEQ); tp_vars->fast_recovery = false; tp_vars->recover = BATADV_TP_FIRST_SEQ; /* initialise the CWND to 3*MSS (Section 3.1 in RFC5681). * For batman-adv the MSS is the size of the payload received by the * soft_interface, hence its MTU */ tp_vars->cwnd = BATADV_TP_PLEN * 3; /* at the beginning initialise the SS threshold to the biggest possible * window size, hence the AWND size */ tp_vars->ss_threshold = BATADV_TP_AWND; /* RTO initial value is 3 seconds. * Details in Section 2.1 of RFC6298 */ tp_vars->rto = 1000; tp_vars->srtt = 0; tp_vars->rttvar = 0; atomic64_set(&tp_vars->tot_sent, 0); kref_get(&tp_vars->refcount); timer_setup(&tp_vars->timer, batadv_tp_sender_timeout, 0); tp_vars->bat_priv = bat_priv; tp_vars->start_time = jiffies; init_waitqueue_head(&tp_vars->more_bytes); spin_lock_init(&tp_vars->unacked_lock); INIT_LIST_HEAD(&tp_vars->unacked_list); spin_lock_init(&tp_vars->cwnd_lock); tp_vars->prerandom_offset = 0; spin_lock_init(&tp_vars->prerandom_lock); kref_get(&tp_vars->refcount); hlist_add_head_rcu(&tp_vars->list, &bat_priv->tp_list); spin_unlock_bh(&bat_priv->tp_list_lock); tp_vars->test_length = test_length; if (!tp_vars->test_length) tp_vars->test_length = BATADV_TP_DEF_TEST_LENGTH; batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: starting throughput meter towards %pM (length=%ums)\n", dst, test_length); /* init work item for finished tp tests */ INIT_DELAYED_WORK(&tp_vars->finish_work, batadv_tp_sender_finish); /* start tp kthread. 
This way the write() call issued from userspace can * happily return and avoid to block */ batadv_tp_start_kthread(tp_vars); /* don't return reference to new tp_vars */ batadv_tp_vars_put(tp_vars); } /** * batadv_tp_stop() - stop currently running tp meter session * @bat_priv: the bat priv with all the soft interface information * @dst: the receiver MAC address * @return_value: reason for tp meter session stop */ void batadv_tp_stop(struct batadv_priv *bat_priv, const u8 *dst, u8 return_value) { struct batadv_orig_node *orig_node; struct batadv_tp_vars *tp_vars; batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: stopping test towards %pM\n", dst); orig_node = batadv_orig_hash_find(bat_priv, dst); if (!orig_node) return; tp_vars = batadv_tp_list_find(bat_priv, orig_node->orig); if (!tp_vars) { batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: trying to interrupt an already over connection\n"); goto out; } batadv_tp_sender_shutdown(tp_vars, return_value); batadv_tp_vars_put(tp_vars); out: batadv_orig_node_put(orig_node); } /** * batadv_tp_reset_receiver_timer() - reset the receiver shutdown timer * @tp_vars: the private data of the current TP meter session * * start the receiver shutdown timer or reset it if already started */ static void batadv_tp_reset_receiver_timer(struct batadv_tp_vars *tp_vars) { mod_timer(&tp_vars->timer, jiffies + msecs_to_jiffies(BATADV_TP_RECV_TIMEOUT)); } /** * batadv_tp_receiver_shutdown() - stop a tp meter receiver when timeout is * reached without received ack * @t: address to timer_list inside tp_vars */ static void batadv_tp_receiver_shutdown(struct timer_list *t) { struct batadv_tp_vars *tp_vars = from_timer(tp_vars, t, timer); struct batadv_tp_unacked *un, *safe; struct batadv_priv *bat_priv; bat_priv = tp_vars->bat_priv; /* if there is recent activity rearm the timer */ if (!batadv_has_timed_out(tp_vars->last_recv_time, BATADV_TP_RECV_TIMEOUT)) { /* reset the receiver shutdown timer */ batadv_tp_reset_receiver_timer(tp_vars); return; } batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Shutting down for inactivity (more than %dms) from %pM\n", BATADV_TP_RECV_TIMEOUT, tp_vars->other_end); spin_lock_bh(&tp_vars->bat_priv->tp_list_lock); hlist_del_rcu(&tp_vars->list); spin_unlock_bh(&tp_vars->bat_priv->tp_list_lock); /* drop list reference */ batadv_tp_vars_put(tp_vars); atomic_dec(&bat_priv->tp_num); spin_lock_bh(&tp_vars->unacked_lock); list_for_each_entry_safe(un, safe, &tp_vars->unacked_list, list) { list_del(&un->list); kfree(un); } spin_unlock_bh(&tp_vars->unacked_lock); /* drop reference of timer */ batadv_tp_vars_put(tp_vars); } /** * batadv_tp_send_ack() - send an ACK packet * @bat_priv: the bat priv with all the soft interface information * @dst: the mac address of the destination originator * @seq: the sequence number to ACK * @timestamp: the timestamp to echo back in the ACK * @session: session identifier * @socket_index: local ICMP socket identifier * * Return: 0 on success, a positive integer representing the reason of the * failure otherwise */ static int batadv_tp_send_ack(struct batadv_priv *bat_priv, const u8 *dst, u32 seq, __be32 timestamp, const u8 *session, int socket_index) { struct batadv_hard_iface *primary_if = NULL; struct batadv_orig_node *orig_node; struct batadv_icmp_tp_packet *icmp; struct sk_buff *skb; int r, ret; orig_node = batadv_orig_hash_find(bat_priv, dst); if (unlikely(!orig_node)) { ret = BATADV_TP_REASON_DST_UNREACHABLE; goto out; } primary_if = batadv_primary_if_get_selected(bat_priv); if (unlikely(!primary_if)) { ret = 
BATADV_TP_REASON_DST_UNREACHABLE; goto out; } skb = netdev_alloc_skb_ip_align(NULL, sizeof(*icmp) + ETH_HLEN); if (unlikely(!skb)) { ret = BATADV_TP_REASON_MEMORY_ERROR; goto out; } skb_reserve(skb, ETH_HLEN); icmp = skb_put(skb, sizeof(*icmp)); icmp->packet_type = BATADV_ICMP; icmp->version = BATADV_COMPAT_VERSION; icmp->ttl = BATADV_TTL; icmp->msg_type = BATADV_TP; ether_addr_copy(icmp->dst, orig_node->orig); ether_addr_copy(icmp->orig, primary_if->net_dev->dev_addr); icmp->uid = socket_index; icmp->subtype = BATADV_TP_ACK; memcpy(icmp->session, session, sizeof(icmp->session)); icmp->seqno = htonl(seq); icmp->timestamp = timestamp; /* send the ack */ r = batadv_send_skb_to_orig(skb, orig_node, NULL); if (unlikely(r < 0) || r == NET_XMIT_DROP) { ret = BATADV_TP_REASON_DST_UNREACHABLE; goto out; } ret = 0; out: batadv_orig_node_put(orig_node); batadv_hardif_put(primary_if); return ret; } /** * batadv_tp_handle_out_of_order() - store an out of order packet * @tp_vars: the private data of the current TP meter session * @skb: the buffer containing the received packet * * Store the out of order packet in the unacked list for late processing. This * packets are kept in this list so that they can be ACKed at once as soon as * all the previous packets have been received * * Return: true if the packed has been successfully processed, false otherwise */ static bool batadv_tp_handle_out_of_order(struct batadv_tp_vars *tp_vars, const struct sk_buff *skb) { const struct batadv_icmp_tp_packet *icmp; struct batadv_tp_unacked *un, *new; u32 payload_len; bool added = false; new = kmalloc(sizeof(*new), GFP_ATOMIC); if (unlikely(!new)) return false; icmp = (struct batadv_icmp_tp_packet *)skb->data; new->seqno = ntohl(icmp->seqno); payload_len = skb->len - sizeof(struct batadv_unicast_packet); new->len = payload_len; spin_lock_bh(&tp_vars->unacked_lock); /* if the list is empty immediately attach this new object */ if (list_empty(&tp_vars->unacked_list)) { list_add(&new->list, &tp_vars->unacked_list); goto out; } /* otherwise loop over the list and either drop the packet because this * is a duplicate or store it at the right position. * * The iteration is done in the reverse way because it is likely that * the last received packet (the one being processed now) has a bigger * seqno than all the others already stored. */ list_for_each_entry_reverse(un, &tp_vars->unacked_list, list) { /* check for duplicates */ if (new->seqno == un->seqno) { if (new->len > un->len) un->len = new->len; kfree(new); added = true; break; } /* look for the right position */ if (batadv_seq_before(new->seqno, un->seqno)) continue; /* as soon as an entry having a bigger seqno is found, the new * one is attached _after_ it. 
In this way the list is kept in * ascending order */ list_add_tail(&new->list, &un->list); added = true; break; } /* received packet with smallest seqno out of order; add it to front */ if (!added) list_add(&new->list, &tp_vars->unacked_list); out: spin_unlock_bh(&tp_vars->unacked_lock); return true; } /** * batadv_tp_ack_unordered() - update number received bytes in current stream * without gaps * @tp_vars: the private data of the current TP meter session */ static void batadv_tp_ack_unordered(struct batadv_tp_vars *tp_vars) { struct batadv_tp_unacked *un, *safe; u32 to_ack; /* go through the unacked packet list and possibly ACK them as * well */ spin_lock_bh(&tp_vars->unacked_lock); list_for_each_entry_safe(un, safe, &tp_vars->unacked_list, list) { /* the list is ordered, therefore it is possible to stop as soon * there is a gap between the last acked seqno and the seqno of * the packet under inspection */ if (batadv_seq_before(tp_vars->last_recv, un->seqno)) break; to_ack = un->seqno + un->len - tp_vars->last_recv; if (batadv_seq_before(tp_vars->last_recv, un->seqno + un->len)) tp_vars->last_recv += to_ack; list_del(&un->list); kfree(un); } spin_unlock_bh(&tp_vars->unacked_lock); } /** * batadv_tp_init_recv() - return matching or create new receiver tp_vars * @bat_priv: the bat priv with all the soft interface information * @icmp: received icmp tp msg * * Return: corresponding tp_vars or NULL on errors */ static struct batadv_tp_vars * batadv_tp_init_recv(struct batadv_priv *bat_priv, const struct batadv_icmp_tp_packet *icmp) { struct batadv_tp_vars *tp_vars; spin_lock_bh(&bat_priv->tp_list_lock); tp_vars = batadv_tp_list_find_session(bat_priv, icmp->orig, icmp->session); if (tp_vars) goto out_unlock; if (!atomic_add_unless(&bat_priv->tp_num, 1, BATADV_TP_MAX_NUM)) { batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: too many ongoing sessions, aborting (RECV)\n"); goto out_unlock; } tp_vars = kmalloc(sizeof(*tp_vars), GFP_ATOMIC); if (!tp_vars) goto out_unlock; ether_addr_copy(tp_vars->other_end, icmp->orig); tp_vars->role = BATADV_TP_RECEIVER; memcpy(tp_vars->session, icmp->session, sizeof(tp_vars->session)); tp_vars->last_recv = BATADV_TP_FIRST_SEQ; tp_vars->bat_priv = bat_priv; kref_init(&tp_vars->refcount); spin_lock_init(&tp_vars->unacked_lock); INIT_LIST_HEAD(&tp_vars->unacked_list); kref_get(&tp_vars->refcount); hlist_add_head_rcu(&tp_vars->list, &bat_priv->tp_list); kref_get(&tp_vars->refcount); timer_setup(&tp_vars->timer, batadv_tp_receiver_shutdown, 0); batadv_tp_reset_receiver_timer(tp_vars); out_unlock: spin_unlock_bh(&bat_priv->tp_list_lock); return tp_vars; } /** * batadv_tp_recv_msg() - process a single data message * @bat_priv: the bat priv with all the soft interface information * @skb: the buffer containing the received packet * * Process a received TP MSG packet */ static void batadv_tp_recv_msg(struct batadv_priv *bat_priv, const struct sk_buff *skb) { const struct batadv_icmp_tp_packet *icmp; struct batadv_tp_vars *tp_vars; size_t packet_size; u32 seqno; icmp = (struct batadv_icmp_tp_packet *)skb->data; seqno = ntohl(icmp->seqno); /* check if this is the first seqno. This means that if the * first packet is lost, the tp meter does not work anymore! 
*/ if (seqno == BATADV_TP_FIRST_SEQ) { tp_vars = batadv_tp_init_recv(bat_priv, icmp); if (!tp_vars) { batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: seqno != BATADV_TP_FIRST_SEQ cannot initiate connection\n"); goto out; } } else { tp_vars = batadv_tp_list_find_session(bat_priv, icmp->orig, icmp->session); if (!tp_vars) { batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Unexpected packet from %pM!\n", icmp->orig); goto out; } } if (unlikely(tp_vars->role != BATADV_TP_RECEIVER)) { batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Meter: dropping packet: not expected (role=%u)\n", tp_vars->role); goto out; } tp_vars->last_recv_time = jiffies; /* if the packet is a duplicate, it may be the case that an ACK has been * lost. Resend the ACK */ if (batadv_seq_before(seqno, tp_vars->last_recv)) goto send_ack; /* if the packet is out of order enqueue it */ if (ntohl(icmp->seqno) != tp_vars->last_recv) { /* exit immediately (and do not send any ACK) if the packet has * not been enqueued correctly */ if (!batadv_tp_handle_out_of_order(tp_vars, skb)) goto out; /* send a duplicate ACK */ goto send_ack; } /* if everything was fine count the ACKed bytes */ packet_size = skb->len - sizeof(struct batadv_unicast_packet); tp_vars->last_recv += packet_size; /* check if this ordered message filled a gap.... */ batadv_tp_ack_unordered(tp_vars); send_ack: /* send the ACK. If the received packet was out of order, the ACK that * is going to be sent is a duplicate (the sender will count them and * possibly enter Fast Retransmit as soon as it has reached 3) */ batadv_tp_send_ack(bat_priv, icmp->orig, tp_vars->last_recv, icmp->timestamp, icmp->session, icmp->uid); out: batadv_tp_vars_put(tp_vars); } /** * batadv_tp_meter_recv() - main TP Meter receiving function * @bat_priv: the bat priv with all the soft interface information * @skb: the buffer containing the received packet */ void batadv_tp_meter_recv(struct batadv_priv *bat_priv, struct sk_buff *skb) { struct batadv_icmp_tp_packet *icmp; icmp = (struct batadv_icmp_tp_packet *)skb->data; switch (icmp->subtype) { case BATADV_TP_MSG: batadv_tp_recv_msg(bat_priv, skb); break; case BATADV_TP_ACK: batadv_tp_recv_ack(bat_priv, skb); break; default: batadv_dbg(BATADV_DBG_TP_METER, bat_priv, "Received unknown TP Metric packet type %u\n", icmp->subtype); } consume_skb(skb); } /** * batadv_tp_meter_init() - initialize global tp_meter structures */ void __init batadv_tp_meter_init(void) { get_random_bytes(batadv_tp_prerandom, sizeof(batadv_tp_prerandom)); } |
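The sender path above tracks its retransmission timeout with the fixed-point estimator of RFC 6298 (see batadv_tp_update_rto()): srtt is stored scaled by 8 and rttvar scaled by 4, so rto = srtt/8 + 4 * (rttvar/4) collapses to (srtt >> 3) + rttvar. The standalone userspace sketch below mirrors that arithmetic for inspection; it is an illustration only, and struct rto_state / rto_update() are names invented for the example, not kernel symbols.

#include <stdio.h>

struct rto_state {
	unsigned long srtt;	/* smoothed RTT, scaled by 8 */
	unsigned long rttvar;	/* mean deviation, scaled by 4 */
	unsigned long rto;	/* retransmission timeout in msec */
};

/* Mirror of the batadv_tp_update_rto() arithmetic (RFC 6298, 2.2/2.3). */
static void rto_update(struct rto_state *s, long m)
{
	if (s->srtt != 0) {
		m -= (s->srtt >> 3);	/* m is now the error in the estimate */
		s->srtt += m;		/* srtt = 7/8 srtt + 1/8 new sample */
		if (m < 0)
			m = -m;
		m -= (s->rttvar >> 2);
		s->rttvar += m;		/* rttvar = 3/4 rttvar + 1/4 |error| */
	} else {
		s->srtt = m << 3;	/* first sample becomes srtt */
		s->rttvar = m << 1;	/* i.e. rtt / 2, pre-scaled by 4 */
	}
	/* rttvar already carries the factor of 4, so no multiply is needed */
	s->rto = (s->srtt >> 3) + s->rttvar;
}

int main(void)
{
	struct rto_state s = { 0, 0, 1000 };	/* 1000ms initial RTO */
	long samples[] = { 100, 120, 80, 300, 90 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		rto_update(&s, samples[i]);
		printf("rtt=%ldms -> srtt=%lums rto=%lums\n",
		       samples[i], s.srtt >> 3, s.rto);
	}
	return 0;
}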
/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * sha256_base.h - core logic for SHA-256 implementations
 *
 * Copyright (C) 2015 Linaro Ltd <ard.biesheuvel@linaro.org>
 */

#ifndef _CRYPTO_SHA256_BASE_H
#define _CRYPTO_SHA256_BASE_H

#include <asm/byteorder.h>
#include <linux/unaligned.h>
#include <crypto/internal/hash.h>
#include <crypto/sha2.h>
#include <linux/string.h>
#include <linux/types.h>

typedef void (sha256_block_fn)(struct sha256_state *sst, u8 const *src,
			       int blocks);

static inline int sha224_base_init(struct shash_desc *desc)
{
	struct sha256_state *sctx = shash_desc_ctx(desc);

	sha224_init(sctx);
	return 0;
}

static inline int sha256_base_init(struct shash_desc *desc)
{
	struct sha256_state *sctx = shash_desc_ctx(desc);

	sha256_init(sctx);
	return 0;
}

static inline int lib_sha256_base_do_update(struct sha256_state *sctx,
					    const u8 *data,
					    unsigned int len,
					    sha256_block_fn *block_fn)
{
	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;

	sctx->count += len;

	if (unlikely((partial + len) >= SHA256_BLOCK_SIZE)) {
		int blocks;

		if (partial) {
			int p = SHA256_BLOCK_SIZE - partial;

			memcpy(sctx->buf + partial, data, p);
			data += p;
			len -= p;

			block_fn(sctx, sctx->buf, 1);
		}

		blocks = len / SHA256_BLOCK_SIZE;
		len %= SHA256_BLOCK_SIZE;

		if (blocks) {
			block_fn(sctx, data, blocks);
			data += blocks * SHA256_BLOCK_SIZE;
		}
		partial = 0;
	}
	if (len)
		memcpy(sctx->buf + partial, data, len);

	return 0;
}

static inline int sha256_base_do_update(struct shash_desc *desc,
					const u8 *data,
					unsigned int len,
					sha256_block_fn *block_fn)
{
	struct sha256_state *sctx = shash_desc_ctx(desc);

	return lib_sha256_base_do_update(sctx, data, len, block_fn);
}

static inline int lib_sha256_base_do_finalize(struct sha256_state *sctx,
					      sha256_block_fn *block_fn)
{
	const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
	__be64 *bits = (__be64 *)(sctx->buf + bit_offset);
	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;

	sctx->buf[partial++] = 0x80;
	if (partial > bit_offset) {
		memset(sctx->buf + partial, 0x0, SHA256_BLOCK_SIZE - partial);
		partial = 0;

		block_fn(sctx, sctx->buf, 1);
	}

	memset(sctx->buf + partial, 0x0, bit_offset - partial);
	*bits = cpu_to_be64(sctx->count << 3);
	block_fn(sctx, sctx->buf, 1);

	return 0;
}

static inline int sha256_base_do_finalize(struct shash_desc *desc,
					  sha256_block_fn *block_fn)
{
	struct sha256_state *sctx = shash_desc_ctx(desc);

	return lib_sha256_base_do_finalize(sctx, block_fn);
}

static inline int lib_sha256_base_finish(struct sha256_state *sctx, u8 *out,
					 unsigned int digest_size)
{
	__be32 *digest = (__be32 *)out;
	int i;

	for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32))
		put_unaligned_be32(sctx->state[i], digest++);

	memzero_explicit(sctx, sizeof(*sctx));
	return 0;
}

static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
{
	unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
	struct sha256_state *sctx = shash_desc_ctx(desc);

	return lib_sha256_base_finish(sctx, out, digest_size);
}

#endif /* _CRYPTO_SHA256_BASE_H */
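lib_sha256_base_do_finalize() above applies the standard SHA-256 padding rule: a 0x80 byte follows the data, zeros fill up to byte 56 of the final 64-byte block, and the message length in bits goes big-endian into the last 8 bytes; when the 0x80 byte lands past offset 56 the length no longer fits and one extra block is emitted. The standalone sketch below reproduces only that placement arithmetic so the bit_offset boundary can be checked against small inputs; show_padding() is an invented name for illustration.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BLOCK		64
#define BIT_OFFSET	(BLOCK - sizeof(uint64_t))	/* 56 */

static void show_padding(uint64_t count)
{
	unsigned int partial = count % BLOCK;
	uint8_t buf[BLOCK] = { 0 };
	uint64_t bits = count << 3;
	int extra_block = 0;
	int i;

	buf[partial++] = 0x80;			/* terminator byte */
	if (partial > BIT_OFFSET) {
		/* no room left for the length: flush an extra block */
		extra_block = 1;
		memset(buf, 0, sizeof(buf));
		partial = 0;
	}
	/* big-endian bit count occupies the last 8 bytes */
	for (i = 0; i < 8; i++)
		buf[BIT_OFFSET + i] = (uint8_t)(bits >> (8 * (7 - i)));

	printf("msg=%llu bytes -> %s block carries the %llu-bit length\n",
	       (unsigned long long)count,
	       extra_block ? "an extra" : "the same",
	       (unsigned long long)bits);
}

int main(void)
{
	show_padding(0);	/* empty message: single padded block */
	show_padding(55);	/* 0x80 + length just fit in one block */
	show_padding(56);	/* 0x80 lands at offset 56: extra block */
	show_padding(64);	/* block-aligned data: padding-only block */
	return 0;
}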
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2000-2006 Silicon Graphics, Inc.
 * Copyright (c) 2013 Red Hat, Inc.
 * All Rights Reserved.
 */
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_inode.h"
#include "xfs_quota.h"
#include "xfs_trans.h"
#include "xfs_qm.h"
#include "xfs_error.h"
#include "xfs_health.h"
#include "xfs_metadir.h"
#include "xfs_metafile.h"

int
xfs_calc_dquots_per_chunk(
	unsigned int		nbblks)	/* basic block units */
{
	ASSERT(nbblks > 0);
	return BBTOB(nbblks) / sizeof(struct xfs_dqblk);
}

/*
 * Do some primitive error checking on ondisk dquot data structures.
 *
 * The xfs_dqblk structure /contains/ the xfs_disk_dquot structure;
 * we verify them separately because at some points we have only the
 * smaller xfs_disk_dquot structure available.
 */
xfs_failaddr_t
xfs_dquot_verify(
	struct xfs_mount	*mp,
	struct xfs_disk_dquot	*ddq,
	xfs_dqid_t		id)	/* used only during quotacheck */
{
	__u8			ddq_type;

	/*
	 * We can encounter an uninitialized dquot buffer for 2 reasons:
	 * 1. If we crash while deleting the quotainode(s), and those blks got
	 *    used for user data. This is because we take the path of regular
	 *    file deletion; however, the size field of quotainodes is never
	 *    updated, so all the tricks that we play in itruncate_finish
	 *    don't quite matter.
	 *
	 * 2. We don't play the quota buffers when there's a quotaoff logitem.
* But the allocation will be replayed so we'll end up with an * uninitialized quota block. * * This is all fine; things are still consistent, and we haven't lost * any quota information. Just don't complain about bad dquot blks. */ if (ddq->d_magic != cpu_to_be16(XFS_DQUOT_MAGIC)) return __this_address; if (ddq->d_version != XFS_DQUOT_VERSION) return __this_address; if (ddq->d_type & ~XFS_DQTYPE_ANY) return __this_address; ddq_type = ddq->d_type & XFS_DQTYPE_REC_MASK; if (ddq_type != XFS_DQTYPE_USER && ddq_type != XFS_DQTYPE_PROJ && ddq_type != XFS_DQTYPE_GROUP) return __this_address; if ((ddq->d_type & XFS_DQTYPE_BIGTIME) && !xfs_has_bigtime(mp)) return __this_address; if ((ddq->d_type & XFS_DQTYPE_BIGTIME) && !ddq->d_id) return __this_address; if (id != -1 && id != be32_to_cpu(ddq->d_id)) return __this_address; if (!ddq->d_id) return NULL; if (ddq->d_blk_softlimit && be64_to_cpu(ddq->d_bcount) > be64_to_cpu(ddq->d_blk_softlimit) && !ddq->d_btimer) return __this_address; if (ddq->d_ino_softlimit && be64_to_cpu(ddq->d_icount) > be64_to_cpu(ddq->d_ino_softlimit) && !ddq->d_itimer) return __this_address; if (ddq->d_rtb_softlimit && be64_to_cpu(ddq->d_rtbcount) > be64_to_cpu(ddq->d_rtb_softlimit) && !ddq->d_rtbtimer) return __this_address; return NULL; } xfs_failaddr_t xfs_dqblk_verify( struct xfs_mount *mp, struct xfs_dqblk *dqb, xfs_dqid_t id) /* used only during quotacheck */ { if (xfs_has_crc(mp) && !uuid_equal(&dqb->dd_uuid, &mp->m_sb.sb_meta_uuid)) return __this_address; return xfs_dquot_verify(mp, &dqb->dd_diskdq, id); } /* * Do some primitive error checking on ondisk dquot data structures. */ void xfs_dqblk_repair( struct xfs_mount *mp, struct xfs_dqblk *dqb, xfs_dqid_t id, xfs_dqtype_t type) { /* * Typically, a repair is only requested by quotacheck. */ ASSERT(id != -1); memset(dqb, 0, sizeof(struct xfs_dqblk)); dqb->dd_diskdq.d_magic = cpu_to_be16(XFS_DQUOT_MAGIC); dqb->dd_diskdq.d_version = XFS_DQUOT_VERSION; dqb->dd_diskdq.d_type = type; dqb->dd_diskdq.d_id = cpu_to_be32(id); if (xfs_has_crc(mp)) { uuid_copy(&dqb->dd_uuid, &mp->m_sb.sb_meta_uuid); xfs_update_cksum((char *)dqb, sizeof(struct xfs_dqblk), XFS_DQUOT_CRC_OFF); } } STATIC bool xfs_dquot_buf_verify_crc( struct xfs_mount *mp, struct xfs_buf *bp, bool readahead) { struct xfs_dqblk *d = (struct xfs_dqblk *)bp->b_addr; int ndquots; int i; if (!xfs_has_crc(mp)) return true; /* * if we are in log recovery, the quota subsystem has not been * initialised so we have no quotainfo structure. In that case, we need * to manually calculate the number of dquots in the buffer. */ if (mp->m_quotainfo) ndquots = mp->m_quotainfo->qi_dqperchunk; else ndquots = xfs_calc_dquots_per_chunk(bp->b_length); for (i = 0; i < ndquots; i++, d++) { if (!xfs_verify_cksum((char *)d, sizeof(struct xfs_dqblk), XFS_DQUOT_CRC_OFF)) { if (!readahead) xfs_buf_verifier_error(bp, -EFSBADCRC, __func__, d, sizeof(*d), __this_address); return false; } } return true; } STATIC xfs_failaddr_t xfs_dquot_buf_verify( struct xfs_mount *mp, struct xfs_buf *bp, bool readahead) { struct xfs_dqblk *dqb = bp->b_addr; xfs_failaddr_t fa; xfs_dqid_t id = 0; int ndquots; int i; /* * if we are in log recovery, the quota subsystem has not been * initialised so we have no quotainfo structure. In that case, we need * to manually calculate the number of dquots in the buffer. */ if (mp->m_quotainfo) ndquots = mp->m_quotainfo->qi_dqperchunk; else ndquots = xfs_calc_dquots_per_chunk(bp->b_length); /* * On the first read of the buffer, verify that each dquot is valid. 
* We don't know what the id of the dquot is supposed to be, just that * they should be increasing monotonically within the buffer. If the * first id is corrupt, then it will fail on the second dquot in the * buffer so corruptions could point to the wrong dquot in this case. */ for (i = 0; i < ndquots; i++) { struct xfs_disk_dquot *ddq; ddq = &dqb[i].dd_diskdq; if (i == 0) id = be32_to_cpu(ddq->d_id); fa = xfs_dqblk_verify(mp, &dqb[i], id + i); if (fa) { if (!readahead) xfs_buf_verifier_error(bp, -EFSCORRUPTED, __func__, &dqb[i], sizeof(struct xfs_dqblk), fa); return fa; } } return NULL; } static xfs_failaddr_t xfs_dquot_buf_verify_struct( struct xfs_buf *bp) { struct xfs_mount *mp = bp->b_mount; return xfs_dquot_buf_verify(mp, bp, false); } static void xfs_dquot_buf_read_verify( struct xfs_buf *bp) { struct xfs_mount *mp = bp->b_mount; if (!xfs_dquot_buf_verify_crc(mp, bp, false)) return; xfs_dquot_buf_verify(mp, bp, false); } /* * readahead errors are silent and simply leave the buffer as !done so a real * read will then be run with the xfs_dquot_buf_ops verifier. See * xfs_inode_buf_verify() for why we use EIO and ~XBF_DONE here rather than * reporting the failure. */ static void xfs_dquot_buf_readahead_verify( struct xfs_buf *bp) { struct xfs_mount *mp = bp->b_mount; if (!xfs_dquot_buf_verify_crc(mp, bp, true) || xfs_dquot_buf_verify(mp, bp, true) != NULL) { xfs_buf_ioerror(bp, -EIO); bp->b_flags &= ~XBF_DONE; } } /* * we don't calculate the CRC here as that is done when the dquot is flushed to * the buffer after the update is done. This ensures that the dquot in the * buffer always has an up-to-date CRC value. */ static void xfs_dquot_buf_write_verify( struct xfs_buf *bp) { struct xfs_mount *mp = bp->b_mount; xfs_dquot_buf_verify(mp, bp, false); } const struct xfs_buf_ops xfs_dquot_buf_ops = { .name = "xfs_dquot", .magic16 = { cpu_to_be16(XFS_DQUOT_MAGIC), cpu_to_be16(XFS_DQUOT_MAGIC) }, .verify_read = xfs_dquot_buf_read_verify, .verify_write = xfs_dquot_buf_write_verify, .verify_struct = xfs_dquot_buf_verify_struct, }; const struct xfs_buf_ops xfs_dquot_buf_ra_ops = { .name = "xfs_dquot_ra", .magic16 = { cpu_to_be16(XFS_DQUOT_MAGIC), cpu_to_be16(XFS_DQUOT_MAGIC) }, .verify_read = xfs_dquot_buf_readahead_verify, .verify_write = xfs_dquot_buf_write_verify, }; /* Convert an on-disk timer value into an incore timer value. */ time64_t xfs_dquot_from_disk_ts( struct xfs_disk_dquot *ddq, __be32 dtimer) { uint32_t t = be32_to_cpu(dtimer); if (t != 0 && (ddq->d_type & XFS_DQTYPE_BIGTIME)) return xfs_dq_bigtime_to_unix(t); return t; } /* Convert an incore timer value into an on-disk timer value. */ __be32 xfs_dquot_to_disk_ts( struct xfs_dquot *dqp, time64_t timer) { uint32_t t = timer; if (timer != 0 && (dqp->q_type & XFS_DQTYPE_BIGTIME)) t = xfs_dq_unix_to_bigtime(timer); return cpu_to_be32(t); } inline unsigned int xfs_dqinode_sick_mask(xfs_dqtype_t type) { switch (type) { case XFS_DQTYPE_USER: return XFS_SICK_FS_UQUOTA; case XFS_DQTYPE_GROUP: return XFS_SICK_FS_GQUOTA; case XFS_DQTYPE_PROJ: return XFS_SICK_FS_PQUOTA; } ASSERT(0); return 0; } /* * Load the inode for a given type of quota, assuming that the sb fields have * been sorted out. This is not true when switching quota types on a V4 * filesystem, so do not use this function for that. If metadir is enabled, * @dp must be the /quota metadir. * * Returns -ENOENT if the quota inode field is NULLFSINO; 0 and an inode on * success; or a negative errno. 
*/ int xfs_dqinode_load( struct xfs_trans *tp, struct xfs_inode *dp, xfs_dqtype_t type, struct xfs_inode **ipp) { struct xfs_mount *mp = tp->t_mountp; struct xfs_inode *ip; enum xfs_metafile_type metafile_type = xfs_dqinode_metafile_type(type); int error; if (!xfs_has_metadir(mp)) { xfs_ino_t ino; switch (type) { case XFS_DQTYPE_USER: ino = mp->m_sb.sb_uquotino; break; case XFS_DQTYPE_GROUP: ino = mp->m_sb.sb_gquotino; break; case XFS_DQTYPE_PROJ: ino = mp->m_sb.sb_pquotino; break; default: ASSERT(0); return -EFSCORRUPTED; } /* Should have set 0 to NULLFSINO when loading superblock */ if (ino == NULLFSINO) return -ENOENT; error = xfs_trans_metafile_iget(tp, ino, metafile_type, &ip); } else { error = xfs_metadir_load(tp, dp, xfs_dqinode_path(type), metafile_type, &ip); if (error == -ENOENT) return error; } if (error) { if (xfs_metadata_is_sick(error)) xfs_fs_mark_sick(mp, xfs_dqinode_sick_mask(type)); return error; } if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && ip->i_df.if_format != XFS_DINODE_FMT_BTREE)) { xfs_irele(ip); xfs_fs_mark_sick(mp, xfs_dqinode_sick_mask(type)); return -EFSCORRUPTED; } if (XFS_IS_CORRUPT(mp, ip->i_projid != 0)) { xfs_irele(ip); xfs_fs_mark_sick(mp, xfs_dqinode_sick_mask(type)); return -EFSCORRUPTED; } *ipp = ip; return 0; } /* Create a metadata directory quota inode. */ int xfs_dqinode_metadir_create( struct xfs_inode *dp, xfs_dqtype_t type, struct xfs_inode **ipp) { struct xfs_metadir_update upd = { .dp = dp, .metafile_type = xfs_dqinode_metafile_type(type), .path = xfs_dqinode_path(type), }; int error; error = xfs_metadir_start_create(&upd); if (error) return error; error = xfs_metadir_create(&upd, S_IFREG); if (error) return error; xfs_trans_log_inode(upd.tp, upd.ip, XFS_ILOG_CORE); error = xfs_metadir_commit(&upd); if (error) return error; xfs_finish_inode_setup(upd.ip); *ipp = upd.ip; return 0; } #ifndef __KERNEL__ /* Link a metadata directory quota inode. */ int xfs_dqinode_metadir_link( struct xfs_inode *dp, xfs_dqtype_t type, struct xfs_inode *ip) { struct xfs_metadir_update upd = { .dp = dp, .metafile_type = xfs_dqinode_metafile_type(type), .path = xfs_dqinode_path(type), .ip = ip, }; int error; error = xfs_metadir_start_link(&upd); if (error) return error; error = xfs_metadir_link(&upd); if (error) return error; xfs_trans_log_inode(upd.tp, upd.ip, XFS_ILOG_CORE); return xfs_metadir_commit(&upd); } #endif /* __KERNEL__ */ /* Create the parent directory for all quota inodes and load it. */ int xfs_dqinode_mkdir_parent( struct xfs_mount *mp, struct xfs_inode **dpp) { if (!mp->m_metadirip) { xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; } return xfs_metadir_mkdir(mp->m_metadirip, "quota", dpp); } /* * Load the parent directory of all quota inodes. Pass the inode to the caller * because quota functions (e.g. QUOTARM) can be called on the quota files even * if quotas are not enabled. */ int xfs_dqinode_load_parent( struct xfs_trans *tp, struct xfs_inode **dpp) { struct xfs_mount *mp = tp->t_mountp; if (!mp->m_metadirip) { xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; } return xfs_metadir_load(tp, mp->m_metadirip, "quota", XFS_METAFILE_DIR, dpp); } |
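xfs_calc_dquots_per_chunk() near the top of this file is plain unit arithmetic: BBTOB() converts a count of 512-byte basic blocks into bytes, and dividing by sizeof(struct xfs_dqblk) yields how many on-disk dquot records a chunk holds. The standalone sketch below reproduces that calculation; the 136-byte record size is an assumption made for this example (it matches the padded on-disk xfs_dqblk layout, but is not defined anywhere in the file above).

#include <stdio.h>

#define BBSHIFT		9		/* basic blocks are 512 bytes */
#define BBTOB(bbs)	((unsigned int)(bbs) << BBSHIFT)
#define DQBLK_SIZE	136		/* assumed sizeof(struct xfs_dqblk) */

/* Mirror of xfs_calc_dquots_per_chunk(): records that fit in the chunk. */
static unsigned int dquots_per_chunk(unsigned int nbblks)
{
	return BBTOB(nbblks) / DQBLK_SIZE;
}

int main(void)
{
	/* a 4096-byte filesystem block spans 8 basic blocks */
	printf("4k chunk:  %u dquots\n", dquots_per_chunk(8));	/* 30 */
	printf("64k chunk: %u dquots\n", dquots_per_chunk(128));	/* 481 */
	return 0;
}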
// SPDX-License-Identifier: GPL-2.0
#include <linux/module.h>
#include <linux/netfilter/nf_tables.h>
#include <net/netfilter/nf_nat.h>
#include <net/netfilter/nf_tables.h>
#include <net/netfilter/nf_tables_ipv4.h>
#include <net/netfilter/nf_tables_ipv6.h>

static unsigned int nft_nat_do_chain(void *priv, struct sk_buff *skb,
                                     const struct nf_hook_state *state)
{
        struct nft_pktinfo pkt;

        nft_set_pktinfo(&pkt, skb, state);

        switch (state->pf) {
#ifdef CONFIG_NF_TABLES_IPV4
        case NFPROTO_IPV4:
                nft_set_pktinfo_ipv4(&pkt);
                break;
#endif
#ifdef CONFIG_NF_TABLES_IPV6
        case NFPROTO_IPV6:
                nft_set_pktinfo_ipv6(&pkt);
                break;
#endif
        default:
                break;
        }

        return nft_do_chain(&pkt, priv);
}

#ifdef CONFIG_NF_TABLES_IPV4
static const struct nft_chain_type nft_chain_nat_ipv4 = {
        .name           = "nat",
        .type           = NFT_CHAIN_T_NAT,
        .family         = NFPROTO_IPV4,
        .owner          = THIS_MODULE,
        .hook_mask      = (1 << NF_INET_PRE_ROUTING) |
                          (1 << NF_INET_POST_ROUTING) |
                          (1 << NF_INET_LOCAL_OUT) |
                          (1 << NF_INET_LOCAL_IN),
        .hooks          = {
                [NF_INET_PRE_ROUTING]   = nft_nat_do_chain,
                [NF_INET_POST_ROUTING]  = nft_nat_do_chain,
                [NF_INET_LOCAL_OUT]     = nft_nat_do_chain,
                [NF_INET_LOCAL_IN]      = nft_nat_do_chain,
        },
        .ops_register = nf_nat_ipv4_register_fn,
        .ops_unregister = nf_nat_ipv4_unregister_fn,
};
#endif

#ifdef CONFIG_NF_TABLES_IPV6
static const struct nft_chain_type nft_chain_nat_ipv6 = {
        .name           = "nat",
        .type           = NFT_CHAIN_T_NAT,
        .family         = NFPROTO_IPV6,
        .owner          = THIS_MODULE,
        .hook_mask      = (1 << NF_INET_PRE_ROUTING) |
                          (1 << NF_INET_POST_ROUTING) |
                          (1 << NF_INET_LOCAL_OUT) |
                          (1 << NF_INET_LOCAL_IN),
        .hooks          = {
                [NF_INET_PRE_ROUTING]   = nft_nat_do_chain,
                [NF_INET_POST_ROUTING]  = nft_nat_do_chain,
                [NF_INET_LOCAL_OUT]     = nft_nat_do_chain,
                [NF_INET_LOCAL_IN]      = nft_nat_do_chain,
        },
        .ops_register = nf_nat_ipv6_register_fn,
        .ops_unregister = nf_nat_ipv6_unregister_fn,
};
#endif

#ifdef CONFIG_NF_TABLES_INET
static int nft_nat_inet_reg(struct net *net, const struct nf_hook_ops *ops)
{
        return nf_nat_inet_register_fn(net, ops);
}

static void nft_nat_inet_unreg(struct net *net, const struct nf_hook_ops *ops)
{
        nf_nat_inet_unregister_fn(net, ops);
}

static const struct nft_chain_type nft_chain_nat_inet = {
        .name           = "nat",
        .type           = NFT_CHAIN_T_NAT,
        .family         = NFPROTO_INET,
        .owner          = THIS_MODULE,
        .hook_mask      = (1 << NF_INET_PRE_ROUTING) |
                          (1 << NF_INET_LOCAL_IN) |
                          (1 << NF_INET_LOCAL_OUT) |
                          (1 << NF_INET_POST_ROUTING),
        .hooks          = {
                [NF_INET_PRE_ROUTING]   = nft_nat_do_chain,
                [NF_INET_LOCAL_IN]      = nft_nat_do_chain,
                [NF_INET_LOCAL_OUT]     = nft_nat_do_chain,
                [NF_INET_POST_ROUTING]  = nft_nat_do_chain,
        },
        .ops_register = nft_nat_inet_reg,
        .ops_unregister = nft_nat_inet_unreg,
};
#endif

static int __init nft_chain_nat_init(void)
{
#ifdef CONFIG_NF_TABLES_IPV6
        nft_register_chain_type(&nft_chain_nat_ipv6);
#endif
#ifdef CONFIG_NF_TABLES_IPV4
        nft_register_chain_type(&nft_chain_nat_ipv4);
#endif
#ifdef CONFIG_NF_TABLES_INET
        nft_register_chain_type(&nft_chain_nat_inet);
#endif
        return 0;
}

static void __exit nft_chain_nat_exit(void)
{
#ifdef CONFIG_NF_TABLES_IPV4
        nft_unregister_chain_type(&nft_chain_nat_ipv4);
#endif
#ifdef CONFIG_NF_TABLES_IPV6
        nft_unregister_chain_type(&nft_chain_nat_ipv6);
#endif
#ifdef CONFIG_NF_TABLES_INET
        nft_unregister_chain_type(&nft_chain_nat_inet);
#endif
}

module_init(nft_chain_nat_init);
module_exit(nft_chain_nat_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("nftables network address translation support");
#ifdef CONFIG_NF_TABLES_IPV4
MODULE_ALIAS_NFT_CHAIN(AF_INET, "nat");
#endif
#ifdef CONFIG_NF_TABLES_IPV6
MODULE_ALIAS_NFT_CHAIN(AF_INET6, "nat");
#endif
#ifdef CONFIG_NF_TABLES_INET
MODULE_ALIAS_NFT_CHAIN(1, "nat"); /* NFPROTO_INET */
#endif
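/*
 * Usage sketch (editorial addition): the "nat" chain types registered
 * above are instantiated from userspace with the nft(8) tool.  A minimal
 * ruleset exercising the postrouting NAT hook might look like this (the
 * interface name is hypothetical):
 *
 *      nft add table ip nat
 *      nft 'add chain ip nat postrouting { type nat hook postrouting priority 100 ; }'
 *      nft add rule ip nat postrouting oifname "eth0" masquerade
 */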
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_GENERIC_HUGETLB_H
#define _ASM_GENERIC_HUGETLB_H

#include <linux/swap.h>
#include <linux/swapops.h>

static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
{
        return mk_pte(page, pgprot);
}

static inline unsigned long huge_pte_write(pte_t pte)
{
        return pte_write(pte);
}

static inline unsigned long huge_pte_dirty(pte_t pte)
{
        return pte_dirty(pte);
}

static inline pte_t huge_pte_mkwrite(pte_t pte)
{
        return pte_mkwrite_novma(pte);
}

#ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
static inline pte_t huge_pte_wrprotect(pte_t pte)
{
        return pte_wrprotect(pte);
}
#endif

static inline pte_t huge_pte_mkdirty(pte_t pte)
{
        return pte_mkdirty(pte);
}

static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
{
        return pte_modify(pte, newprot);
}

#ifndef __HAVE_ARCH_HUGE_PTE_MKUFFD_WP
static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
{
        return huge_pte_wrprotect(pte_mkuffd_wp(pte));
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR_UFFD_WP
static inline pte_t huge_pte_clear_uffd_wp(pte_t pte)
{
        return pte_clear_uffd_wp(pte);
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTE_UFFD_WP
static inline int huge_pte_uffd_wp(pte_t pte)
{
        return pte_uffd_wp(pte);
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR
static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
                pte_t *ptep, unsigned long sz)
{
        pte_clear(mm, addr, ptep);
}
#endif

#ifndef __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
                unsigned long addr, unsigned long end,
                unsigned long floor, unsigned long ceiling)
{
        free_pgd_range(tlb, addr, end, floor, ceiling);
}
#endif

#ifndef __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
                pte_t *ptep, pte_t pte, unsigned long sz)
{
        set_pte_at(mm, addr, ptep, pte);
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
                unsigned long addr, pte_t *ptep)
{
        return ptep_get_and_clear(mm, addr, ptep);
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
                unsigned long addr, pte_t *ptep)
{
        return ptep_clear_flush(vma, addr, ptep);
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTE_NONE
static inline int huge_pte_none(pte_t pte)
{
        return pte_none(pte);
}
#endif

/* Please refer to comments above pte_none_mostly() for the usage */
#ifndef __HAVE_ARCH_HUGE_PTE_NONE_MOSTLY
static inline int huge_pte_none_mostly(pte_t pte)
{
        return huge_pte_none(pte) || is_pte_marker(pte);
}
#endif

#ifndef __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
static inline int prepare_hugepage_range(struct file *file,
                unsigned long addr, unsigned long len)
{
        return 0;
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
                unsigned long addr, pte_t *ptep)
{
        ptep_set_wrprotect(mm, addr, ptep);
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
                unsigned long addr, pte_t *ptep,
                pte_t pte, int dirty)
{
        return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTEP_GET
static inline pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr,
                pte_t *ptep)
{
        return ptep_get(ptep);
}
#endif

#ifndef __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED
static inline bool gigantic_page_runtime_supported(void)
{
        return IS_ENABLED(CONFIG_ARCH_HAS_GIGANTIC_PAGE);
}
#endif /* __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED */

#endif /* _ASM_GENERIC_HUGETLB_H */
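/*
 * Usage sketch (editorial addition): each helper above sits behind a
 * __HAVE_ARCH_* guard so that an architecture can supply its own
 * implementation and fall back to the generic one for everything else.
 * A hypothetical asm/hugetlb.h would define the guard, provide the
 * replacement, then include the generic header, roughly like this:
 */
#define __HAVE_ARCH_HUGE_PTE_CLEAR
static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
                pte_t *ptep, unsigned long sz)
{
        /* arch-specific behaviour would go here; generic body shown */
        pte_clear(mm, addr, ptep);
}

/* Pull in the generic versions of everything not overridden above. */
#include <asm-generic/hugetlb.h>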
// SPDX-License-Identifier: GPL-2.0-only /* * fs/bfs/inode.c * BFS superblock and inode operations. * Copyright (C) 1999-2018 Tigran Aivazian <aivazian.tigran@gmail.com> * From fs/minix, Copyright (C) 1991, 1992 Linus Torvalds. * Made endianness-clean by Andrew Stribblehill <ads@wompom.org>, 2005. */ #include <linux/module.h> #include <linux/mm.h> #include <linux/slab.h> #include <linux/init.h> #include <linux/fs.h> #include <linux/buffer_head.h> #include <linux/vfs.h> #include <linux/writeback.h> #include <linux/uio.h> #include <linux/uaccess.h> #include "bfs.h" MODULE_AUTHOR("Tigran Aivazian <aivazian.tigran@gmail.com>"); MODULE_DESCRIPTION("SCO UnixWare BFS filesystem for Linux"); MODULE_LICENSE("GPL"); #undef DEBUG #ifdef DEBUG #define dprintf(x...) printf(x) #else #define dprintf(x...)
#endif struct inode *bfs_iget(struct super_block *sb, unsigned long ino) { struct bfs_inode *di; struct inode *inode; struct buffer_head *bh; int block, off; inode = iget_locked(sb, ino); if (!inode) return ERR_PTR(-ENOMEM); if (!(inode->i_state & I_NEW)) return inode; if ((ino < BFS_ROOT_INO) || (ino > BFS_SB(inode->i_sb)->si_lasti)) { printf("Bad inode number %s:%08lx\n", inode->i_sb->s_id, ino); goto error; } block = (ino - BFS_ROOT_INO) / BFS_INODES_PER_BLOCK + 1; bh = sb_bread(inode->i_sb, block); if (!bh) { printf("Unable to read inode %s:%08lx\n", inode->i_sb->s_id, ino); goto error; } off = (ino - BFS_ROOT_INO) % BFS_INODES_PER_BLOCK; di = (struct bfs_inode *)bh->b_data + off; inode->i_mode = 0x0000FFFF & le32_to_cpu(di->i_mode); if (le32_to_cpu(di->i_vtype) == BFS_VDIR) { inode->i_mode |= S_IFDIR; inode->i_op = &bfs_dir_inops; inode->i_fop = &bfs_dir_operations; } else if (le32_to_cpu(di->i_vtype) == BFS_VREG) { inode->i_mode |= S_IFREG; inode->i_op = &bfs_file_inops; inode->i_fop = &bfs_file_operations; inode->i_mapping->a_ops = &bfs_aops; } BFS_I(inode)->i_sblock = le32_to_cpu(di->i_sblock); BFS_I(inode)->i_eblock = le32_to_cpu(di->i_eblock); BFS_I(inode)->i_dsk_ino = le16_to_cpu(di->i_ino); i_uid_write(inode, le32_to_cpu(di->i_uid)); i_gid_write(inode, le32_to_cpu(di->i_gid)); set_nlink(inode, le32_to_cpu(di->i_nlink)); inode->i_size = BFS_FILESIZE(di); inode->i_blocks = BFS_FILEBLOCKS(di); inode_set_atime(inode, le32_to_cpu(di->i_atime), 0); inode_set_mtime(inode, le32_to_cpu(di->i_mtime), 0); inode_set_ctime(inode, le32_to_cpu(di->i_ctime), 0); brelse(bh); unlock_new_inode(inode); return inode; error: iget_failed(inode); return ERR_PTR(-EIO); } static struct bfs_inode *find_inode(struct super_block *sb, u16 ino, struct buffer_head **p) { if ((ino < BFS_ROOT_INO) || (ino > BFS_SB(sb)->si_lasti)) { printf("Bad inode number %s:%08x\n", sb->s_id, ino); return ERR_PTR(-EIO); } ino -= BFS_ROOT_INO; *p = sb_bread(sb, 1 + ino / BFS_INODES_PER_BLOCK); if (!*p) { printf("Unable to read inode %s:%08x\n", sb->s_id, ino); return ERR_PTR(-EIO); } return (struct bfs_inode *)(*p)->b_data + ino % BFS_INODES_PER_BLOCK; } static int bfs_write_inode(struct inode *inode, struct writeback_control *wbc) { struct bfs_sb_info *info = BFS_SB(inode->i_sb); unsigned int ino = (u16)inode->i_ino; unsigned long i_sblock; struct bfs_inode *di; struct buffer_head *bh; int err = 0; dprintf("ino=%08x\n", ino); di = find_inode(inode->i_sb, ino, &bh); if (IS_ERR(di)) return PTR_ERR(di); mutex_lock(&info->bfs_lock); if (ino == BFS_ROOT_INO) di->i_vtype = cpu_to_le32(BFS_VDIR); else di->i_vtype = cpu_to_le32(BFS_VREG); di->i_ino = cpu_to_le16(ino); di->i_mode = cpu_to_le32(inode->i_mode); di->i_uid = cpu_to_le32(i_uid_read(inode)); di->i_gid = cpu_to_le32(i_gid_read(inode)); di->i_nlink = cpu_to_le32(inode->i_nlink); di->i_atime = cpu_to_le32(inode_get_atime_sec(inode)); di->i_mtime = cpu_to_le32(inode_get_mtime_sec(inode)); di->i_ctime = cpu_to_le32(inode_get_ctime_sec(inode)); i_sblock = BFS_I(inode)->i_sblock; di->i_sblock = cpu_to_le32(i_sblock); di->i_eblock = cpu_to_le32(BFS_I(inode)->i_eblock); di->i_eoffset = cpu_to_le32(i_sblock * BFS_BSIZE + inode->i_size - 1); mark_buffer_dirty(bh); if (wbc->sync_mode == WB_SYNC_ALL) { sync_dirty_buffer(bh); if (buffer_req(bh) && !buffer_uptodate(bh)) err = -EIO; } brelse(bh); mutex_unlock(&info->bfs_lock); return err; } static void bfs_evict_inode(struct inode *inode) { unsigned long ino = inode->i_ino; struct bfs_inode *di; struct buffer_head *bh; struct super_block 
*s = inode->i_sb; struct bfs_sb_info *info = BFS_SB(s); struct bfs_inode_info *bi = BFS_I(inode); dprintf("ino=%08lx\n", ino); truncate_inode_pages_final(&inode->i_data); invalidate_inode_buffers(inode); clear_inode(inode); if (inode->i_nlink) return; di = find_inode(s, inode->i_ino, &bh); if (IS_ERR(di)) return; mutex_lock(&info->bfs_lock); /* clear on-disk inode */ memset(di, 0, sizeof(struct bfs_inode)); mark_buffer_dirty(bh); brelse(bh); if (bi->i_dsk_ino) { if (bi->i_sblock) info->si_freeb += bi->i_eblock + 1 - bi->i_sblock; info->si_freei++; clear_bit(ino, info->si_imap); bfs_dump_imap("evict_inode", s); } /* * If this was the last file, make the previous block * "last block of the last file" even if there is no * real file there, saves us 1 gap. */ if (info->si_lf_eblk == bi->i_eblock) info->si_lf_eblk = bi->i_sblock - 1; mutex_unlock(&info->bfs_lock); } static void bfs_put_super(struct super_block *s) { struct bfs_sb_info *info = BFS_SB(s); if (!info) return; mutex_destroy(&info->bfs_lock); kfree(info); s->s_fs_info = NULL; } static int bfs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *s = dentry->d_sb; struct bfs_sb_info *info = BFS_SB(s); u64 id = huge_encode_dev(s->s_bdev->bd_dev); buf->f_type = BFS_MAGIC; buf->f_bsize = s->s_blocksize; buf->f_blocks = info->si_blocks; buf->f_bfree = buf->f_bavail = info->si_freeb; buf->f_files = info->si_lasti + 1 - BFS_ROOT_INO; buf->f_ffree = info->si_freei; buf->f_fsid = u64_to_fsid(id); buf->f_namelen = BFS_NAMELEN; return 0; } static struct kmem_cache *bfs_inode_cachep; static struct inode *bfs_alloc_inode(struct super_block *sb) { struct bfs_inode_info *bi; bi = alloc_inode_sb(sb, bfs_inode_cachep, GFP_KERNEL); if (!bi) return NULL; return &bi->vfs_inode; } static void bfs_free_inode(struct inode *inode) { kmem_cache_free(bfs_inode_cachep, BFS_I(inode)); } static void init_once(void *foo) { struct bfs_inode_info *bi = foo; inode_init_once(&bi->vfs_inode); } static int __init init_inodecache(void) { bfs_inode_cachep = kmem_cache_create("bfs_inode_cache", sizeof(struct bfs_inode_info), 0, (SLAB_RECLAIM_ACCOUNT| SLAB_ACCOUNT), init_once); if (bfs_inode_cachep == NULL) return -ENOMEM; return 0; } static void destroy_inodecache(void) { /* * Make sure all delayed rcu free inodes are flushed before we * destroy cache. 
*/ rcu_barrier(); kmem_cache_destroy(bfs_inode_cachep); } static const struct super_operations bfs_sops = { .alloc_inode = bfs_alloc_inode, .free_inode = bfs_free_inode, .write_inode = bfs_write_inode, .evict_inode = bfs_evict_inode, .put_super = bfs_put_super, .statfs = bfs_statfs, }; void bfs_dump_imap(const char *prefix, struct super_block *s) { #ifdef DEBUG int i; char *tmpbuf = (char *)get_zeroed_page(GFP_KERNEL); if (!tmpbuf) return; for (i = BFS_SB(s)->si_lasti; i >= 0; i--) { if (i > PAGE_SIZE - 100) break; if (test_bit(i, BFS_SB(s)->si_imap)) strcat(tmpbuf, "1"); else strcat(tmpbuf, "0"); } printf("%s: lasti=%08lx <%s>\n", prefix, BFS_SB(s)->si_lasti, tmpbuf); free_page((unsigned long)tmpbuf); #endif } static int bfs_fill_super(struct super_block *s, void *data, int silent) { struct buffer_head *bh, *sbh; struct bfs_super_block *bfs_sb; struct inode *inode; unsigned i; struct bfs_sb_info *info; int ret = -EINVAL; unsigned long i_sblock, i_eblock, i_eoff, s_size; info = kzalloc(sizeof(*info), GFP_KERNEL); if (!info) return -ENOMEM; mutex_init(&info->bfs_lock); s->s_fs_info = info; s->s_time_min = 0; s->s_time_max = U32_MAX; sb_set_blocksize(s, BFS_BSIZE); sbh = sb_bread(s, 0); if (!sbh) goto out; bfs_sb = (struct bfs_super_block *)sbh->b_data; if (le32_to_cpu(bfs_sb->s_magic) != BFS_MAGIC) { if (!silent) printf("No BFS filesystem on %s (magic=%08x)\n", s->s_id, le32_to_cpu(bfs_sb->s_magic)); goto out1; } if (BFS_UNCLEAN(bfs_sb, s) && !silent) printf("%s is unclean, continuing\n", s->s_id); s->s_magic = BFS_MAGIC; if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end) || le32_to_cpu(bfs_sb->s_start) < sizeof(struct bfs_super_block) + sizeof(struct bfs_dirent)) { printf("Superblock is corrupted on %s\n", s->s_id); goto out1; } info->si_lasti = (le32_to_cpu(bfs_sb->s_start) - BFS_BSIZE) / sizeof(struct bfs_inode) + BFS_ROOT_INO - 1; if (info->si_lasti == BFS_MAX_LASTI) printf("NOTE: filesystem %s was created with 512 inodes, the real maximum is 511, mounting anyway\n", s->s_id); else if (info->si_lasti > BFS_MAX_LASTI) { printf("Impossible last inode number %lu > %d on %s\n", info->si_lasti, BFS_MAX_LASTI, s->s_id); goto out1; } for (i = 0; i < BFS_ROOT_INO; i++) set_bit(i, info->si_imap); s->s_op = &bfs_sops; inode = bfs_iget(s, BFS_ROOT_INO); if (IS_ERR(inode)) { ret = PTR_ERR(inode); goto out1; } s->s_root = d_make_root(inode); if (!s->s_root) { ret = -ENOMEM; goto out1; } info->si_blocks = (le32_to_cpu(bfs_sb->s_end) + 1) >> BFS_BSIZE_BITS; info->si_freeb = (le32_to_cpu(bfs_sb->s_end) + 1 - le32_to_cpu(bfs_sb->s_start)) >> BFS_BSIZE_BITS; info->si_freei = 0; info->si_lf_eblk = 0; /* can we read the last block? 
*/ bh = sb_bread(s, info->si_blocks - 1); if (!bh) { printf("Last block not available on %s: %lu\n", s->s_id, info->si_blocks - 1); ret = -EIO; goto out2; } brelse(bh); bh = NULL; for (i = BFS_ROOT_INO; i <= info->si_lasti; i++) { struct bfs_inode *di; int block = (i - BFS_ROOT_INO) / BFS_INODES_PER_BLOCK + 1; int off = (i - BFS_ROOT_INO) % BFS_INODES_PER_BLOCK; unsigned long eblock; if (!off) { brelse(bh); bh = sb_bread(s, block); } if (!bh) continue; di = (struct bfs_inode *)bh->b_data + off; /* check that the filesystem is not corrupted */ i_eoff = le32_to_cpu(di->i_eoffset); i_sblock = le32_to_cpu(di->i_sblock); i_eblock = le32_to_cpu(di->i_eblock); s_size = le32_to_cpu(bfs_sb->s_end); if (i_sblock > info->si_blocks || i_eblock > info->si_blocks || i_sblock > i_eblock || (i_eoff != le32_to_cpu(-1) && i_eoff > s_size) || i_sblock * BFS_BSIZE > i_eoff) { printf("Inode 0x%08x corrupted on %s\n", i, s->s_id); brelse(bh); ret = -EIO; goto out2; } if (!di->i_ino) { info->si_freei++; continue; } set_bit(i, info->si_imap); info->si_freeb -= BFS_FILEBLOCKS(di); eblock = le32_to_cpu(di->i_eblock); if (eblock > info->si_lf_eblk) info->si_lf_eblk = eblock; } brelse(bh); brelse(sbh); bfs_dump_imap("fill_super", s); return 0; out2: dput(s->s_root); s->s_root = NULL; out1: brelse(sbh); out: mutex_destroy(&info->bfs_lock); kfree(info); s->s_fs_info = NULL; return ret; } static struct dentry *bfs_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { return mount_bdev(fs_type, flags, dev_name, data, bfs_fill_super); } static struct file_system_type bfs_fs_type = { .owner = THIS_MODULE, .name = "bfs", .mount = bfs_mount, .kill_sb = kill_block_super, .fs_flags = FS_REQUIRES_DEV, }; MODULE_ALIAS_FS("bfs"); static int __init init_bfs_fs(void) { int err = init_inodecache(); if (err) goto out1; err = register_filesystem(&bfs_fs_type); if (err) goto out; return 0; out: destroy_inodecache(); out1: return err; } static void __exit exit_bfs_fs(void) { unregister_filesystem(&bfs_fs_type); destroy_inodecache(); } module_init(init_bfs_fs) module_exit(exit_bfs_fs)
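/*
 * Worked example (editorial addition, not part of fs/bfs): bfs_iget() and
 * find_inode() both locate an on-disk inode as
 * block = (ino - BFS_ROOT_INO) / BFS_INODES_PER_BLOCK + 1 and
 * off = (ino - BFS_ROOT_INO) % BFS_INODES_PER_BLOCK.  Assuming
 * BFS_ROOT_INO == 2 and BFS_INODES_PER_BLOCK == 8 (eight 64-byte inodes
 * in a 512-byte block), inode 11 lands in disk block (11 - 2) / 8 + 1 == 2
 * at slot (11 - 2) % 8 == 1; the "+ 1" accounts for block 0 holding the
 * superblock.  A hypothetical helper spelling out the math:
 */
static inline void bfs_locate_inode_example(unsigned long ino,
                unsigned long *block, unsigned long *slot)
{
        *block = (ino - 2) / 8 + 1;     /* 2 == BFS_ROOT_INO (assumed) */
        *slot = (ino - 2) % 8;          /* 8 == BFS_INODES_PER_BLOCK (assumed) */
}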
// SPDX-License-Identifier: GPL-2.0-or-later /* * imon.c: input and display driver for SoundGraph iMON IR/VFD/LCD * * Copyright(C) 2010 Jarod Wilson <jarod@wilsonet.com> * Portions based on the original lirc_imon driver, * Copyright(C) 2004 Venky Raju(dev@venky.ws) * * Huge thanks to R. Geoff Newbury for invaluable debugging on the * 0xffdc iMON devices, and for sending me one to hack on, without * which the support for them wouldn't be nearly as good. Thanks * also to the numerous 0xffdc device owners that tested auto-config * support for me and provided debug dumps from their devices.
*/ #define pr_fmt(fmt) KBUILD_MODNAME ":%s: " fmt, __func__ #include <linux/errno.h> #include <linux/init.h> #include <linux/kernel.h> #include <linux/ktime.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/uaccess.h> #include <linux/ratelimit.h> #include <linux/input.h> #include <linux/usb.h> #include <linux/usb/input.h> #include <media/rc-core.h> #include <linux/timer.h> #define MOD_AUTHOR "Jarod Wilson <jarod@wilsonet.com>" #define MOD_DESC "Driver for SoundGraph iMON MultiMedia IR/Display" #define MOD_NAME "imon" #define MOD_VERSION "0.9.4" #define DISPLAY_MINOR_BASE 144 #define DEVICE_NAME "lcd%d" #define BUF_CHUNK_SIZE 8 #define BUF_SIZE 128 #define BIT_DURATION 250 /* each bit received is 250us */ #define IMON_CLOCK_ENABLE_PACKETS 2 /*** P R O T O T Y P E S ***/ /* USB Callback prototypes */ static int imon_probe(struct usb_interface *interface, const struct usb_device_id *id); static void imon_disconnect(struct usb_interface *interface); static void usb_rx_callback_intf0(struct urb *urb); static void usb_rx_callback_intf1(struct urb *urb); static void usb_tx_callback(struct urb *urb); /* suspend/resume support */ static int imon_resume(struct usb_interface *intf); static int imon_suspend(struct usb_interface *intf, pm_message_t message); /* Display file_operations function prototypes */ static int display_open(struct inode *inode, struct file *file); static int display_close(struct inode *inode, struct file *file); /* VFD write operation */ static ssize_t vfd_write(struct file *file, const char __user *buf, size_t n_bytes, loff_t *pos); /* LCD file_operations override function prototypes */ static ssize_t lcd_write(struct file *file, const char __user *buf, size_t n_bytes, loff_t *pos); /*** G L O B A L S ***/ struct imon_panel_key_table { u64 hw_code; u32 keycode; }; struct imon_usb_dev_descr { __u16 flags; #define IMON_NO_FLAGS 0 #define IMON_NEED_20MS_PKT_DELAY 1 #define IMON_SUPPRESS_REPEATED_KEYS 2 struct imon_panel_key_table key_table[]; }; struct imon_context { struct device *dev; /* Newer devices have two interfaces */ struct usb_device *usbdev_intf0; struct usb_device *usbdev_intf1; bool display_supported; /* not all controllers do */ bool display_isopen; /* display port has been opened */ bool rf_device; /* true if iMON 2.4G LT/DT RF device */ bool rf_isassociating; /* RF remote associating */ bool dev_present_intf0; /* USB device presence, interface 0 */ bool dev_present_intf1; /* USB device presence, interface 1 */ struct mutex lock; /* to lock this object */ wait_queue_head_t remove_ok; /* For unexpected USB disconnects */ struct usb_endpoint_descriptor *rx_endpoint_intf0; struct usb_endpoint_descriptor *rx_endpoint_intf1; struct usb_endpoint_descriptor *tx_endpoint; struct urb *rx_urb_intf0; struct urb *rx_urb_intf1; struct urb *tx_urb; bool tx_control; unsigned char usb_rx_buf[8]; unsigned char usb_tx_buf[8]; unsigned int send_packet_delay; struct tx_t { unsigned char data_buf[35]; /* user data buffer */ struct completion finished; /* wait for write to finish */ bool busy; /* write in progress */ int status; /* status of tx completion */ } tx; u16 vendor; /* usb vendor ID */ u16 product; /* usb product ID */ struct rc_dev *rdev; /* rc-core device for remote */ struct input_dev *idev; /* input device for panel & IR mouse */ struct input_dev *touch; /* input device for touchscreen */ spinlock_t kc_lock; /* make sure we get keycodes right */ u32 kc; /* current input keycode */ u32 last_keycode; /* last reported input keycode */ u32 rc_scancode; /* the 
computed remote scancode */ u8 rc_toggle; /* the computed remote toggle bit */ u64 rc_proto; /* iMON or MCE (RC6) IR protocol? */ bool release_code; /* some keys send a release code */ u8 display_type; /* store the display type */ bool pad_mouse; /* toggle kbd(0)/mouse(1) mode */ char name_rdev[128]; /* rc input device name */ char phys_rdev[64]; /* rc input device phys path */ char name_idev[128]; /* input device name */ char phys_idev[64]; /* input device phys path */ char name_touch[128]; /* touch screen name */ char phys_touch[64]; /* touch screen phys path */ struct timer_list ttimer; /* touch screen timer */ int touch_x; /* x coordinate on touchscreen */ int touch_y; /* y coordinate on touchscreen */ const struct imon_usb_dev_descr *dev_descr; /* device description with key */ /* table for front panels */ /* * Fields for deferring free_imon_context(). * * Since reference to "struct imon_context" is stored into * "struct file"->private_data, we need to remember * how many file descriptors might access this "struct imon_context". */ refcount_t users; /* * Use a flag for telling display_open()/vfd_write()/lcd_write() that * imon_disconnect() was already called. */ bool disconnected; /* * We need to wait for RCU grace period in order to allow * display_open() to safely check ->disconnected and increment ->users. */ struct rcu_head rcu; }; #define TOUCH_TIMEOUT (HZ/30) /* vfd character device file operations */ static const struct file_operations vfd_fops = { .owner = THIS_MODULE, .open = display_open, .write = vfd_write, .release = display_close, .llseek = noop_llseek, }; /* lcd character device file operations */ static const struct file_operations lcd_fops = { .owner = THIS_MODULE, .open = display_open, .write = lcd_write, .release = display_close, .llseek = noop_llseek, }; enum { IMON_DISPLAY_TYPE_AUTO = 0, IMON_DISPLAY_TYPE_VFD = 1, IMON_DISPLAY_TYPE_LCD = 2, IMON_DISPLAY_TYPE_VGA = 3, IMON_DISPLAY_TYPE_NONE = 4, }; enum { IMON_KEY_IMON = 0, IMON_KEY_MCE = 1, IMON_KEY_PANEL = 2, }; static struct usb_class_driver imon_vfd_class = { .name = DEVICE_NAME, .fops = &vfd_fops, .minor_base = DISPLAY_MINOR_BASE, }; static struct usb_class_driver imon_lcd_class = { .name = DEVICE_NAME, .fops = &lcd_fops, .minor_base = DISPLAY_MINOR_BASE, }; /* imon receiver front panel/knob key table */ static const struct imon_usb_dev_descr imon_default_table = { .flags = IMON_NO_FLAGS, .key_table = { { 0x000000000f00ffeell, KEY_MEDIA }, /* Go */ { 0x000000001200ffeell, KEY_UP }, { 0x000000001300ffeell, KEY_DOWN }, { 0x000000001400ffeell, KEY_LEFT }, { 0x000000001500ffeell, KEY_RIGHT }, { 0x000000001600ffeell, KEY_ENTER }, { 0x000000001700ffeell, KEY_ESC }, { 0x000000001f00ffeell, KEY_AUDIO }, { 0x000000002000ffeell, KEY_VIDEO }, { 0x000000002100ffeell, KEY_CAMERA }, { 0x000000002700ffeell, KEY_DVD }, { 0x000000002300ffeell, KEY_TV }, { 0x000000002b00ffeell, KEY_EXIT }, { 0x000000002c00ffeell, KEY_SELECT }, { 0x000000002d00ffeell, KEY_MENU }, { 0x000000000500ffeell, KEY_PREVIOUS }, { 0x000000000700ffeell, KEY_REWIND }, { 0x000000000400ffeell, KEY_STOP }, { 0x000000003c00ffeell, KEY_PLAYPAUSE }, { 0x000000000800ffeell, KEY_FASTFORWARD }, { 0x000000000600ffeell, KEY_NEXT }, { 0x000000010000ffeell, KEY_RIGHT }, { 0x000001000000ffeell, KEY_LEFT }, { 0x000000003d00ffeell, KEY_SELECT }, { 0x000100000000ffeell, KEY_VOLUMEUP }, { 0x010000000000ffeell, KEY_VOLUMEDOWN }, { 0x000000000100ffeell, KEY_MUTE }, /* 0xffdc iMON MCE VFD */ { 0x00010000ffffffeell, KEY_VOLUMEUP }, { 0x01000000ffffffeell, KEY_VOLUMEDOWN }, { 
0x00000001ffffffeell, KEY_MUTE }, { 0x0000000fffffffeell, KEY_MEDIA }, { 0x00000012ffffffeell, KEY_UP }, { 0x00000013ffffffeell, KEY_DOWN }, { 0x00000014ffffffeell, KEY_LEFT }, { 0x00000015ffffffeell, KEY_RIGHT }, { 0x00000016ffffffeell, KEY_ENTER }, { 0x00000017ffffffeell, KEY_ESC }, /* iMON Knob values */ { 0x000100ffffffffeell, KEY_VOLUMEUP }, { 0x010000ffffffffeell, KEY_VOLUMEDOWN }, { 0x000008ffffffffeell, KEY_MUTE }, { 0, KEY_RESERVED }, } }; static const struct imon_usb_dev_descr imon_OEM_VFD = { .flags = IMON_NEED_20MS_PKT_DELAY, .key_table = { { 0x000000000f00ffeell, KEY_MEDIA }, /* Go */ { 0x000000001200ffeell, KEY_UP }, { 0x000000001300ffeell, KEY_DOWN }, { 0x000000001400ffeell, KEY_LEFT }, { 0x000000001500ffeell, KEY_RIGHT }, { 0x000000001600ffeell, KEY_ENTER }, { 0x000000001700ffeell, KEY_ESC }, { 0x000000001f00ffeell, KEY_AUDIO }, { 0x000000002b00ffeell, KEY_EXIT }, { 0x000000002c00ffeell, KEY_SELECT }, { 0x000000002d00ffeell, KEY_MENU }, { 0x000000000500ffeell, KEY_PREVIOUS }, { 0x000000000700ffeell, KEY_REWIND }, { 0x000000000400ffeell, KEY_STOP }, { 0x000000003c00ffeell, KEY_PLAYPAUSE }, { 0x000000000800ffeell, KEY_FASTFORWARD }, { 0x000000000600ffeell, KEY_NEXT }, { 0x000000010000ffeell, KEY_RIGHT }, { 0x000001000000ffeell, KEY_LEFT }, { 0x000000003d00ffeell, KEY_SELECT }, { 0x000100000000ffeell, KEY_VOLUMEUP }, { 0x010000000000ffeell, KEY_VOLUMEDOWN }, { 0x000000000100ffeell, KEY_MUTE }, /* 0xffdc iMON MCE VFD */ { 0x00010000ffffffeell, KEY_VOLUMEUP }, { 0x01000000ffffffeell, KEY_VOLUMEDOWN }, { 0x00000001ffffffeell, KEY_MUTE }, { 0x0000000fffffffeell, KEY_MEDIA }, { 0x00000012ffffffeell, KEY_UP }, { 0x00000013ffffffeell, KEY_DOWN }, { 0x00000014ffffffeell, KEY_LEFT }, { 0x00000015ffffffeell, KEY_RIGHT }, { 0x00000016ffffffeell, KEY_ENTER }, { 0x00000017ffffffeell, KEY_ESC }, /* iMON Knob values */ { 0x000100ffffffffeell, KEY_VOLUMEUP }, { 0x010000ffffffffeell, KEY_VOLUMEDOWN }, { 0x000008ffffffffeell, KEY_MUTE }, { 0, KEY_RESERVED }, } }; /* imon receiver front panel/knob key table for DH102*/ static const struct imon_usb_dev_descr imon_DH102 = { .flags = IMON_NO_FLAGS, .key_table = { { 0x000100000000ffeell, KEY_VOLUMEUP }, { 0x010000000000ffeell, KEY_VOLUMEDOWN }, { 0x000000010000ffeell, KEY_MUTE }, { 0x0000000f0000ffeell, KEY_MEDIA }, { 0x000000120000ffeell, KEY_UP }, { 0x000000130000ffeell, KEY_DOWN }, { 0x000000140000ffeell, KEY_LEFT }, { 0x000000150000ffeell, KEY_RIGHT }, { 0x000000160000ffeell, KEY_ENTER }, { 0x000000170000ffeell, KEY_ESC }, { 0x0000002b0000ffeell, KEY_EXIT }, { 0x0000002c0000ffeell, KEY_SELECT }, { 0x0000002d0000ffeell, KEY_MENU }, { 0, KEY_RESERVED } } }; /* imon ultrabay front panel key table */ static const struct imon_usb_dev_descr ultrabay_table = { .flags = IMON_SUPPRESS_REPEATED_KEYS, .key_table = { { 0x0000000f0000ffeell, KEY_MEDIA }, /* Go */ { 0x000000000100ffeell, KEY_UP }, { 0x000000000001ffeell, KEY_DOWN }, { 0x000000160000ffeell, KEY_ENTER }, { 0x0000001f0000ffeell, KEY_AUDIO }, /* Music */ { 0x000000200000ffeell, KEY_VIDEO }, /* Movie */ { 0x000000210000ffeell, KEY_CAMERA }, /* Photo */ { 0x000000270000ffeell, KEY_DVD }, /* DVD */ { 0x000000230000ffeell, KEY_TV }, /* TV */ { 0x000000050000ffeell, KEY_PREVIOUS }, /* Previous */ { 0x000000070000ffeell, KEY_REWIND }, { 0x000000040000ffeell, KEY_STOP }, { 0x000000020000ffeell, KEY_PLAYPAUSE }, { 0x000000080000ffeell, KEY_FASTFORWARD }, { 0x000000060000ffeell, KEY_NEXT }, /* Next */ { 0x000100000000ffeell, KEY_VOLUMEUP }, { 0x010000000000ffeell, KEY_VOLUMEDOWN }, { 0x000000010000ffeell, 
KEY_MUTE }, { 0, KEY_RESERVED }, } }; /* * USB Device ID for iMON USB Control Boards * * The Windows drivers contain 6 different inf files, more or less one for * each new device until the 0x0034-0x0046 devices, which all use the same * driver. Some of the devices in the 34-46 range haven't been definitively * identified yet. Early devices have either a TriGem Computer, Inc. or a * Samsung vendor ID (0x0aa8 and 0x04e8 respectively), while all later * devices use the SoundGraph vendor ID (0x15c2). This driver only supports * the ffdc and later devices, which do onboard decoding. */ static const struct usb_device_id imon_usb_id_table[] = { /* * Several devices with this same device ID, all use iMON_PAD.inf * SoundGraph iMON PAD (IR & VFD) * SoundGraph iMON PAD (IR & LCD) * SoundGraph iMON Knob (IR only) */ { USB_DEVICE(0x15c2, 0xffdc), .driver_info = (unsigned long)&imon_default_table }, /* * Newer devices, all driven by the latest iMON Windows driver, full * list of device IDs extracted via 'strings Setup/data1.hdr |grep 15c2' * Need user input to fill in details on unknown devices. */ /* SoundGraph iMON OEM Touch LCD (IR & 7" VGA LCD) */ { USB_DEVICE(0x15c2, 0x0034), .driver_info = (unsigned long)&imon_DH102 }, /* SoundGraph iMON OEM Touch LCD (IR & 4.3" VGA LCD) */ { USB_DEVICE(0x15c2, 0x0035), .driver_info = (unsigned long)&imon_default_table}, /* SoundGraph iMON OEM VFD (IR & VFD) */ { USB_DEVICE(0x15c2, 0x0036), .driver_info = (unsigned long)&imon_OEM_VFD }, /* device specifics unknown */ { USB_DEVICE(0x15c2, 0x0037), .driver_info = (unsigned long)&imon_default_table}, /* SoundGraph iMON OEM LCD (IR & LCD) */ { USB_DEVICE(0x15c2, 0x0038), .driver_info = (unsigned long)&imon_default_table}, /* SoundGraph iMON UltraBay (IR & LCD) */ { USB_DEVICE(0x15c2, 0x0039), .driver_info = (unsigned long)&imon_default_table}, /* device specifics unknown */ { USB_DEVICE(0x15c2, 0x003a), .driver_info = (unsigned long)&imon_default_table}, /* device specifics unknown */ { USB_DEVICE(0x15c2, 0x003b), .driver_info = (unsigned long)&imon_default_table}, /* SoundGraph iMON OEM Inside (IR only) */ { USB_DEVICE(0x15c2, 0x003c), .driver_info = (unsigned long)&imon_default_table}, /* device specifics unknown */ { USB_DEVICE(0x15c2, 0x003d), .driver_info = (unsigned long)&imon_default_table}, /* device specifics unknown */ { USB_DEVICE(0x15c2, 0x003e), .driver_info = (unsigned long)&imon_default_table}, /* device specifics unknown */ { USB_DEVICE(0x15c2, 0x003f), .driver_info = (unsigned long)&imon_default_table}, /* device specifics unknown */ { USB_DEVICE(0x15c2, 0x0040), .driver_info = (unsigned long)&imon_default_table}, /* SoundGraph iMON MINI (IR only) */ { USB_DEVICE(0x15c2, 0x0041), .driver_info = (unsigned long)&imon_default_table}, /* Antec Veris Multimedia Station EZ External (IR only) */ { USB_DEVICE(0x15c2, 0x0042), .driver_info = (unsigned long)&imon_default_table}, /* Antec Veris Multimedia Station Basic Internal (IR only) */ { USB_DEVICE(0x15c2, 0x0043), .driver_info = (unsigned long)&imon_default_table}, /* Antec Veris Multimedia Station Elite (IR & VFD) */ { USB_DEVICE(0x15c2, 0x0044), .driver_info = (unsigned long)&imon_default_table}, /* Antec Veris Multimedia Station Premiere (IR & LCD) */ { USB_DEVICE(0x15c2, 0x0045), .driver_info = (unsigned long)&imon_default_table}, /* device specifics unknown */ { USB_DEVICE(0x15c2, 0x0046), .driver_info = (unsigned long)&imon_default_table}, {} }; /* USB Device data */ static struct usb_driver imon_driver = { .name = MOD_NAME, .probe = imon_probe, 
.disconnect = imon_disconnect, .suspend = imon_suspend, .resume = imon_resume, .id_table = imon_usb_id_table, }; /* Module bookkeeping bits */ MODULE_AUTHOR(MOD_AUTHOR); MODULE_DESCRIPTION(MOD_DESC); MODULE_VERSION(MOD_VERSION); MODULE_LICENSE("GPL"); MODULE_DEVICE_TABLE(usb, imon_usb_id_table); static bool debug; module_param(debug, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(debug, "Debug messages: 0=no, 1=yes (default: no)"); /* lcd, vfd, vga or none? should be auto-detected, but can be overridden... */ static int display_type; module_param(display_type, int, S_IRUGO); MODULE_PARM_DESC(display_type, "Type of attached display. 0=autodetect, 1=vfd, 2=lcd, 3=vga, 4=none (default: autodetect)"); static int pad_stabilize = 1; module_param(pad_stabilize, int, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(pad_stabilize, "Apply stabilization algorithm to iMON PAD presses in arrow key mode. 0=disable, 1=enable (default)."); /* * In certain use cases, mouse mode isn't really helpful, and could actually * cause confusion, so allow disabling it when the IR device is open. */ static bool nomouse; module_param(nomouse, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(nomouse, "Disable mouse input device mode when IR device is open. 0=don't disable, 1=disable. (default: don't disable)"); /* threshold at which a pad push registers as an arrow key in kbd mode */ static int pad_thresh; module_param(pad_thresh, int, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(pad_thresh, "Threshold at which a pad push registers as an arrow key in kbd mode (default: 28)"); static void free_imon_context(struct imon_context *ictx) { struct device *dev = ictx->dev; usb_free_urb(ictx->tx_urb); WARN_ON(ictx->dev_present_intf0); usb_free_urb(ictx->rx_urb_intf0); WARN_ON(ictx->dev_present_intf1); usb_free_urb(ictx->rx_urb_intf1); kfree_rcu(ictx, rcu); dev_dbg(dev, "%s: iMON context freed\n", __func__); } /* * Called when the Display device (e.g. /dev/lcd0) * is opened by the application. */ static int display_open(struct inode *inode, struct file *file) { struct usb_interface *interface; struct imon_context *ictx = NULL; int subminor; int retval = 0; subminor = iminor(inode); interface = usb_find_interface(&imon_driver, subminor); if (!interface) { pr_err("could not find interface for minor %d\n", subminor); retval = -ENODEV; goto exit; } rcu_read_lock(); ictx = usb_get_intfdata(interface); if (!ictx || ictx->disconnected || !refcount_inc_not_zero(&ictx->users)) { rcu_read_unlock(); pr_err("no context found for minor %d\n", subminor); retval = -ENODEV; goto exit; } rcu_read_unlock(); mutex_lock(&ictx->lock); if (!ictx->display_supported) { pr_err("display not supported by device\n"); retval = -ENODEV; } else if (ictx->display_isopen) { pr_err("display port is already open\n"); retval = -EBUSY; } else { ictx->display_isopen = true; file->private_data = ictx; dev_dbg(ictx->dev, "display port opened\n"); } mutex_unlock(&ictx->lock); if (retval && refcount_dec_and_test(&ictx->users)) free_imon_context(ictx); exit: return retval; } /* * Called when the display device (e.g. /dev/lcd0) * is closed by the application. 
*/ static int display_close(struct inode *inode, struct file *file) { struct imon_context *ictx = file->private_data; int retval = 0; mutex_lock(&ictx->lock); if (!ictx->display_supported) { pr_err("display not supported by device\n"); retval = -ENODEV; } else if (!ictx->display_isopen) { pr_err("display is not open\n"); retval = -EIO; } else { ictx->display_isopen = false; dev_dbg(ictx->dev, "display port closed\n"); } mutex_unlock(&ictx->lock); if (refcount_dec_and_test(&ictx->users)) free_imon_context(ictx); return retval; } /* * Sends a packet to the device -- this function must be called with * ictx->lock held, or its unlock/lock sequence while waiting for tx * to complete can/will lead to a deadlock. */ static int send_packet(struct imon_context *ictx) { unsigned int pipe; unsigned long timeout; int interval = 0; int retval = 0; struct usb_ctrlrequest *control_req = NULL; /* Check if we need to use control or interrupt urb */ if (!ictx->tx_control) { pipe = usb_sndintpipe(ictx->usbdev_intf0, ictx->tx_endpoint->bEndpointAddress); interval = ictx->tx_endpoint->bInterval; usb_fill_int_urb(ictx->tx_urb, ictx->usbdev_intf0, pipe, ictx->usb_tx_buf, sizeof(ictx->usb_tx_buf), usb_tx_callback, ictx, interval); ictx->tx_urb->actual_length = 0; } else { /* fill request into kmalloc'ed space: */ control_req = kmalloc(sizeof(*control_req), GFP_KERNEL); if (control_req == NULL) return -ENOMEM; /* setup packet is '21 09 0200 0001 0008' */ control_req->bRequestType = 0x21; control_req->bRequest = 0x09; control_req->wValue = cpu_to_le16(0x0200); control_req->wIndex = cpu_to_le16(0x0001); control_req->wLength = cpu_to_le16(0x0008); /* control pipe is endpoint 0x00 */ pipe = usb_sndctrlpipe(ictx->usbdev_intf0, 0); /* build the control urb */ usb_fill_control_urb(ictx->tx_urb, ictx->usbdev_intf0, pipe, (unsigned char *)control_req, ictx->usb_tx_buf, sizeof(ictx->usb_tx_buf), usb_tx_callback, ictx); ictx->tx_urb->actual_length = 0; } reinit_completion(&ictx->tx.finished); ictx->tx.busy = true; smp_rmb(); /* ensure later readers know we're busy */ retval = usb_submit_urb(ictx->tx_urb, GFP_KERNEL); if (retval) { ictx->tx.busy = false; smp_rmb(); /* ensure later readers know we're not busy */ pr_err_ratelimited("error submitting urb(%d)\n", retval); } else { /* Wait for transmission to complete (or abort) */ retval = wait_for_completion_interruptible( &ictx->tx.finished); if (retval) { usb_kill_urb(ictx->tx_urb); pr_err_ratelimited("task interrupted\n"); } ictx->tx.busy = false; retval = ictx->tx.status; if (retval) pr_err_ratelimited("packet tx failed (%d)\n", retval); } kfree(control_req); /* * Induce a mandatory delay before returning, as otherwise, * send_packet can get called so rapidly as to overwhelm the device, * particularly on faster systems and/or those with quirky usb. */ timeout = msecs_to_jiffies(ictx->send_packet_delay); set_current_state(TASK_INTERRUPTIBLE); schedule_timeout(timeout); return retval; } /* * Sends an associate packet to the iMON 2.4G. * * This might not be such a good idea, since it has an id collision with * some versions of the "IR & VFD" combo. The only way to determine if it * is an RF version is to look at the product description string. (Which * we currently do not fetch). 
*/ static int send_associate_24g(struct imon_context *ictx) { const unsigned char packet[8] = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x20 }; if (!ictx) { pr_err("no context for device\n"); return -ENODEV; } if (!ictx->dev_present_intf0) { pr_err("no iMON device present\n"); return -ENODEV; } memcpy(ictx->usb_tx_buf, packet, sizeof(packet)); return send_packet(ictx); } /* * Sends packets to setup and show clock on iMON display * * Arguments: year - last 2 digits of year, month - 1..12, * day - 1..31, dow - day of the week (0-Sun...6-Sat), * hour - 0..23, minute - 0..59, second - 0..59 */ static int send_set_imon_clock(struct imon_context *ictx, unsigned int year, unsigned int month, unsigned int day, unsigned int dow, unsigned int hour, unsigned int minute, unsigned int second) { unsigned char clock_enable_pkt[IMON_CLOCK_ENABLE_PACKETS][8]; int retval = 0; int i; if (!ictx) { pr_err("no context for device\n"); return -ENODEV; } switch (ictx->display_type) { case IMON_DISPLAY_TYPE_LCD: clock_enable_pkt[0][0] = 0x80; clock_enable_pkt[0][1] = year; clock_enable_pkt[0][2] = month-1; clock_enable_pkt[0][3] = day; clock_enable_pkt[0][4] = hour; clock_enable_pkt[0][5] = minute; clock_enable_pkt[0][6] = second; clock_enable_pkt[1][0] = 0x80; clock_enable_pkt[1][1] = 0; clock_enable_pkt[1][2] = 0; clock_enable_pkt[1][3] = 0; clock_enable_pkt[1][4] = 0; clock_enable_pkt[1][5] = 0; clock_enable_pkt[1][6] = 0; if (ictx->product == 0xffdc) { clock_enable_pkt[0][7] = 0x50; clock_enable_pkt[1][7] = 0x51; } else { clock_enable_pkt[0][7] = 0x88; clock_enable_pkt[1][7] = 0x8a; } break; case IMON_DISPLAY_TYPE_VFD: clock_enable_pkt[0][0] = year; clock_enable_pkt[0][1] = month-1; clock_enable_pkt[0][2] = day; clock_enable_pkt[0][3] = dow; clock_enable_pkt[0][4] = hour; clock_enable_pkt[0][5] = minute; clock_enable_pkt[0][6] = second; clock_enable_pkt[0][7] = 0x40; clock_enable_pkt[1][0] = 0; clock_enable_pkt[1][1] = 0; clock_enable_pkt[1][2] = 1; clock_enable_pkt[1][3] = 0; clock_enable_pkt[1][4] = 0; clock_enable_pkt[1][5] = 0; clock_enable_pkt[1][6] = 0; clock_enable_pkt[1][7] = 0x42; break; default: return -ENODEV; } for (i = 0; i < IMON_CLOCK_ENABLE_PACKETS; i++) { memcpy(ictx->usb_tx_buf, clock_enable_pkt[i], 8); retval = send_packet(ictx); if (retval) { pr_err("send_packet failed for packet %d\n", i); break; } } return retval; } /* * These are the sysfs functions to handle the association on the iMON 2.4G LT. 
*/
static ssize_t associate_remote_show(struct device *d,
				     struct device_attribute *attr,
				     char *buf)
{
	struct imon_context *ictx = dev_get_drvdata(d);

	if (!ictx)
		return -ENODEV;

	mutex_lock(&ictx->lock);
	if (ictx->rf_isassociating)
		strscpy(buf, "associating\n", PAGE_SIZE);
	else
		strscpy(buf, "closed\n", PAGE_SIZE);

	dev_info(d, "Visit https://www.lirc.org/html/imon-24g.html for instructions on how to associate your iMON 2.4G DT/LT remote\n");
	mutex_unlock(&ictx->lock);
	return strlen(buf);
}

static ssize_t associate_remote_store(struct device *d,
				      struct device_attribute *attr,
				      const char *buf, size_t count)
{
	struct imon_context *ictx;

	ictx = dev_get_drvdata(d);
	if (!ictx)
		return -ENODEV;

	mutex_lock(&ictx->lock);
	ictx->rf_isassociating = true;
	send_associate_24g(ictx);
	mutex_unlock(&ictx->lock);

	return count;
}

/*
 * sysfs functions to control internal imon clock
 */
static ssize_t imon_clock_show(struct device *d,
			       struct device_attribute *attr, char *buf)
{
	struct imon_context *ictx = dev_get_drvdata(d);
	size_t len;

	if (!ictx)
		return -ENODEV;

	mutex_lock(&ictx->lock);

	if (!ictx->display_supported) {
		len = sysfs_emit(buf, "Not supported.");
	} else {
		len = sysfs_emit(buf,
			"To set the clock on your iMON display:\n"
			"# date \"+%%y %%m %%d %%w %%H %%M %%S\" > imon_clock\n"
			"%s", ictx->display_isopen ?
			"\nNOTE: imon device must be closed\n" : "");
	}

	mutex_unlock(&ictx->lock);

	return len;
}

static ssize_t imon_clock_store(struct device *d,
				struct device_attribute *attr, const char *buf,
				size_t count)
{
	struct imon_context *ictx = dev_get_drvdata(d);
	ssize_t retval;
	unsigned int year, month, day, dow, hour, minute, second;

	if (!ictx)
		return -ENODEV;

	mutex_lock(&ictx->lock);

	if (!ictx->display_supported) {
		retval = -ENODEV;
		goto exit;
	} else if (ictx->display_isopen) {
		retval = -EBUSY;
		goto exit;
	}

	if (sscanf(buf, "%u %u %u %u %u %u %u", &year, &month, &day, &dow,
		   &hour, &minute, &second) != 7) {
		retval = -EINVAL;
		goto exit;
	}

	if ((month < 1 || month > 12) ||
	    (day < 1 || day > 31) || (dow > 6) ||
	    (hour > 23) || (minute > 59) || (second > 59)) {
		retval = -EINVAL;
		goto exit;
	}

	retval = send_set_imon_clock(ictx, year, month, day, dow,
				     hour, minute, second);
	if (retval)
		goto exit;

	retval = count;
exit:
	mutex_unlock(&ictx->lock);

	return retval;
}

static DEVICE_ATTR_RW(imon_clock);
static DEVICE_ATTR_RW(associate_remote);

static struct attribute *imon_display_sysfs_entries[] = {
	&dev_attr_imon_clock.attr,
	NULL
};

static const struct attribute_group imon_display_attr_group = {
	.attrs = imon_display_sysfs_entries
};

static struct attribute *imon_rf_sysfs_entries[] = {
	&dev_attr_associate_remote.attr,
	NULL
};

static const struct attribute_group imon_rf_attr_group = {
	.attrs = imon_rf_sysfs_entries
};

/*
 * Writes data to the VFD.  The iMON VFD is 2x16 characters
 * and requires data in 5 consecutive USB interrupt packets,
 * each packet but the last carrying 7 bytes.
 *
 * I don't know if the VFD board supports features such as
 * scrolling, clearing rows, blanking, etc. so the caller
 * must provide a full screen of data.  If fewer than 32
 * bytes are provided spaces will be appended to generate
 * a full screen.
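 *
 * A minimal userspace sketch (hypothetical device node; the USB core
 * assigns the minor number, so the display may not be /dev/lcd0 on a
 * given system):
 *
 *	int fd = open("/dev/lcd0", O_WRONLY);
 *	if (fd >= 0) {
 *		write(fd, "line one        line two        ", 32);
 *		close(fd);
 *	}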
*/ static ssize_t vfd_write(struct file *file, const char __user *buf, size_t n_bytes, loff_t *pos) { int i; int offset; int seq; int retval = 0; struct imon_context *ictx = file->private_data; static const unsigned char vfd_packet6[] = { 0x01, 0x00, 0x00, 0x00, 0x00, 0xFF, 0xFF }; if (ictx->disconnected) return -ENODEV; if (mutex_lock_interruptible(&ictx->lock)) return -ERESTARTSYS; if (!ictx->dev_present_intf0) { pr_err_ratelimited("no iMON device present\n"); retval = -ENODEV; goto exit; } if (n_bytes <= 0 || n_bytes > 32) { pr_err_ratelimited("invalid payload size\n"); retval = -EINVAL; goto exit; } if (copy_from_user(ictx->tx.data_buf, buf, n_bytes)) { retval = -EFAULT; goto exit; } /* Pad with spaces */ for (i = n_bytes; i < 32; ++i) ictx->tx.data_buf[i] = ' '; for (i = 32; i < 35; ++i) ictx->tx.data_buf[i] = 0xFF; offset = 0; seq = 0; do { memcpy(ictx->usb_tx_buf, ictx->tx.data_buf + offset, 7); ictx->usb_tx_buf[7] = (unsigned char) seq; retval = send_packet(ictx); if (retval) { pr_err_ratelimited("send packet #%d failed\n", seq / 2); goto exit; } else { seq += 2; offset += 7; } } while (offset < 35); /* Send packet #6 */ memcpy(ictx->usb_tx_buf, &vfd_packet6, sizeof(vfd_packet6)); ictx->usb_tx_buf[7] = (unsigned char) seq; retval = send_packet(ictx); if (retval) pr_err_ratelimited("send packet #%d failed\n", seq / 2); exit: mutex_unlock(&ictx->lock); return (!retval) ? n_bytes : retval; } /* * Writes data to the LCD. The iMON OEM LCD screen expects 8-byte * packets. We accept data as 16 hexadecimal digits, followed by a * newline (to make it easy to drive the device from a command-line * -- even though the actual binary data is a bit complicated). * * The device itself is not a "traditional" text-mode display. It's * actually a 16x96 pixel bitmap display. That means if you want to * display text, you've got to have your own "font" and translate the * text into bitmaps for display. This is really flexible (you can * display whatever diacritics you need, and so on), but it's also * a lot more complicated than most LCDs... */ static ssize_t lcd_write(struct file *file, const char __user *buf, size_t n_bytes, loff_t *pos) { int retval = 0; struct imon_context *ictx = file->private_data; if (ictx->disconnected) return -ENODEV; mutex_lock(&ictx->lock); if (!ictx->display_supported) { pr_err_ratelimited("no iMON display present\n"); retval = -ENODEV; goto exit; } if (n_bytes != 8) { pr_err_ratelimited("invalid payload size: %d (expected 8)\n", (int)n_bytes); retval = -EINVAL; goto exit; } if (copy_from_user(ictx->usb_tx_buf, buf, 8)) { retval = -EFAULT; goto exit; } retval = send_packet(ictx); if (retval) { pr_err_ratelimited("send packet failed!\n"); goto exit; } else { dev_dbg(ictx->dev, "%s: write %d bytes to LCD\n", __func__, (int) n_bytes); } exit: mutex_unlock(&ictx->lock); return (!retval) ? 
n_bytes : retval; } /* * Callback function for USB core API: transmit data */ static void usb_tx_callback(struct urb *urb) { struct imon_context *ictx; if (!urb) return; ictx = (struct imon_context *)urb->context; if (!ictx) return; ictx->tx.status = urb->status; /* notify waiters that write has finished */ ictx->tx.busy = false; smp_rmb(); /* ensure later readers know we're not busy */ complete(&ictx->tx.finished); } /* * report touchscreen input */ static void imon_touch_display_timeout(struct timer_list *t) { struct imon_context *ictx = from_timer(ictx, t, ttimer); if (ictx->display_type != IMON_DISPLAY_TYPE_VGA) return; input_report_abs(ictx->touch, ABS_X, ictx->touch_x); input_report_abs(ictx->touch, ABS_Y, ictx->touch_y); input_report_key(ictx->touch, BTN_TOUCH, 0x00); input_sync(ictx->touch); } /* * iMON IR receivers support two different signal sets -- those used by * the iMON remotes, and those used by the Windows MCE remotes (which is * really just RC-6), but only one or the other at a time, as the signals * are decoded onboard the receiver. * * This function gets called two different ways, one way is from * rc_register_device, for initial protocol selection/setup, and the other is * via a userspace-initiated protocol change request, either by direct sysfs * prodding or by something like ir-keytable. In the rc_register_device case, * the imon context lock is already held, but when initiated from userspace, * it is not, so we must acquire it prior to calling send_packet, which * requires that the lock is held. */ static int imon_ir_change_protocol(struct rc_dev *rc, u64 *rc_proto) { int retval; struct imon_context *ictx = rc->priv; struct device *dev = ictx->dev; bool unlock = false; unsigned char ir_proto_packet[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x86 }; if (*rc_proto && !(*rc_proto & rc->allowed_protocols)) dev_warn(dev, "Looks like you're trying to use an IR protocol this device does not support\n"); if (*rc_proto & RC_PROTO_BIT_RC6_MCE) { dev_dbg(dev, "Configuring IR receiver for MCE protocol\n"); ir_proto_packet[0] = 0x01; *rc_proto = RC_PROTO_BIT_RC6_MCE; } else if (*rc_proto & RC_PROTO_BIT_IMON) { dev_dbg(dev, "Configuring IR receiver for iMON protocol\n"); if (!pad_stabilize) dev_dbg(dev, "PAD stabilize functionality disabled\n"); /* ir_proto_packet[0] = 0x00; // already the default */ *rc_proto = RC_PROTO_BIT_IMON; } else { dev_warn(dev, "Unsupported IR protocol specified, overriding to iMON IR protocol\n"); if (!pad_stabilize) dev_dbg(dev, "PAD stabilize functionality disabled\n"); /* ir_proto_packet[0] = 0x00; // already the default */ *rc_proto = RC_PROTO_BIT_IMON; } memcpy(ictx->usb_tx_buf, &ir_proto_packet, sizeof(ir_proto_packet)); unlock = mutex_trylock(&ictx->lock); retval = send_packet(ictx); if (retval) goto out; ictx->rc_proto = *rc_proto; ictx->pad_mouse = false; out: if (unlock) mutex_unlock(&ictx->lock); return retval; } /* * The directional pad behaves a bit differently, depending on whether this is * one of the older ffdc devices or a newer device. Newer devices appear to * have a higher resolution matrix for more precise mouse movement, but it * makes things overly sensitive in keyboard mode, so we do some interesting * contortions to make it less touchy. Older devices run through the same * routine with shorter timeout and a smaller threshold. 
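 *
 * Concretely, stabilize() below integrates the incoming deltas and only
 * reports a direction once the accumulated (x, y) clears the threshold:
 * samples summing to (0, +30) against a threshold of 28 yield 0x7F (down),
 * (+30, 0) yields 0x7F00 (right), and a return of 0 means "not decided
 * yet", in which case the caller swallows the event as KEY_UNKNOWN.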
*/ static int stabilize(int a, int b, u16 timeout, u16 threshold) { ktime_t ct; static ktime_t prev_time; static ktime_t hit_time; static int x, y, prev_result, hits; int result = 0; long msec, msec_hit; ct = ktime_get(); msec = ktime_ms_delta(ct, prev_time); msec_hit = ktime_ms_delta(ct, hit_time); if (msec > 100) { x = 0; y = 0; hits = 0; } x += a; y += b; prev_time = ct; if (abs(x) > threshold || abs(y) > threshold) { if (abs(y) > abs(x)) result = (y > 0) ? 0x7F : 0x80; else result = (x > 0) ? 0x7F00 : 0x8000; x = 0; y = 0; if (result == prev_result) { hits++; if (hits > 3) { switch (result) { case 0x7F: y = 17 * threshold / 30; break; case 0x80: y -= 17 * threshold / 30; break; case 0x7F00: x = 17 * threshold / 30; break; case 0x8000: x -= 17 * threshold / 30; break; } } if (hits == 2 && msec_hit < timeout) { result = 0; hits = 1; } } else { prev_result = result; hits = 1; hit_time = ct; } } return result; } static u32 imon_remote_key_lookup(struct imon_context *ictx, u32 scancode) { u32 keycode; u32 release; bool is_release_code = false; /* Look for the initial press of a button */ keycode = rc_g_keycode_from_table(ictx->rdev, scancode); ictx->rc_toggle = 0x0; ictx->rc_scancode = scancode; /* Look for the release of a button */ if (keycode == KEY_RESERVED) { release = scancode & ~0x4000; keycode = rc_g_keycode_from_table(ictx->rdev, release); if (keycode != KEY_RESERVED) is_release_code = true; } ictx->release_code = is_release_code; return keycode; } static u32 imon_mce_key_lookup(struct imon_context *ictx, u32 scancode) { u32 keycode; #define MCE_KEY_MASK 0x7000 #define MCE_TOGGLE_BIT 0x8000 /* * On some receivers, mce keys decode to 0x8000f04xx and 0x8000f84xx * (the toggle bit flipping between alternating key presses), while * on other receivers, we see 0x8000f74xx and 0x8000ff4xx. To keep * the table trim, we always or in the bits to look up 0x8000ff4xx, * but we can't or them into all codes, as some keys are decoded in * a different way w/o the same use of the toggle bit... 
*/ if (scancode & 0x80000000) scancode = scancode | MCE_KEY_MASK | MCE_TOGGLE_BIT; ictx->rc_scancode = scancode; keycode = rc_g_keycode_from_table(ictx->rdev, scancode); /* not used in mce mode, but make sure we know its false */ ictx->release_code = false; return keycode; } static u32 imon_panel_key_lookup(struct imon_context *ictx, u64 code) { const struct imon_panel_key_table *key_table; u32 keycode = KEY_RESERVED; int i; key_table = ictx->dev_descr->key_table; for (i = 0; key_table[i].hw_code != 0; i++) { if (key_table[i].hw_code == (code | 0xffee)) { keycode = key_table[i].keycode; break; } } ictx->release_code = false; return keycode; } static bool imon_mouse_event(struct imon_context *ictx, unsigned char *buf, int len) { signed char rel_x = 0x00, rel_y = 0x00; u8 right_shift = 1; bool mouse_input = true; int dir = 0; unsigned long flags; spin_lock_irqsave(&ictx->kc_lock, flags); /* newer iMON device PAD or mouse button */ if (ictx->product != 0xffdc && (buf[0] & 0x01) && len == 5) { rel_x = buf[2]; rel_y = buf[3]; right_shift = 1; /* 0xffdc iMON PAD or mouse button input */ } else if (ictx->product == 0xffdc && (buf[0] & 0x40) && !((buf[1] & 0x01) || ((buf[1] >> 2) & 0x01))) { rel_x = (buf[1] & 0x08) | (buf[1] & 0x10) >> 2 | (buf[1] & 0x20) >> 4 | (buf[1] & 0x40) >> 6; if (buf[0] & 0x02) rel_x |= ~0x0f; rel_x = rel_x + rel_x / 2; rel_y = (buf[2] & 0x08) | (buf[2] & 0x10) >> 2 | (buf[2] & 0x20) >> 4 | (buf[2] & 0x40) >> 6; if (buf[0] & 0x01) rel_y |= ~0x0f; rel_y = rel_y + rel_y / 2; right_shift = 2; /* some ffdc devices decode mouse buttons differently... */ } else if (ictx->product == 0xffdc && (buf[0] == 0x68)) { right_shift = 2; /* ch+/- buttons, which we use for an emulated scroll wheel */ } else if (ictx->kc == KEY_CHANNELUP && (buf[2] & 0x40) != 0x40) { dir = 1; } else if (ictx->kc == KEY_CHANNELDOWN && (buf[2] & 0x40) != 0x40) { dir = -1; } else mouse_input = false; spin_unlock_irqrestore(&ictx->kc_lock, flags); if (mouse_input) { dev_dbg(ictx->dev, "sending mouse data via input subsystem\n"); if (dir) { input_report_rel(ictx->idev, REL_WHEEL, dir); } else if (rel_x || rel_y) { input_report_rel(ictx->idev, REL_X, rel_x); input_report_rel(ictx->idev, REL_Y, rel_y); } else { input_report_key(ictx->idev, BTN_LEFT, buf[1] & 0x1); input_report_key(ictx->idev, BTN_RIGHT, buf[1] >> right_shift & 0x1); } input_sync(ictx->idev); spin_lock_irqsave(&ictx->kc_lock, flags); ictx->last_keycode = ictx->kc; spin_unlock_irqrestore(&ictx->kc_lock, flags); } return mouse_input; } static void imon_touch_event(struct imon_context *ictx, unsigned char *buf) { mod_timer(&ictx->ttimer, jiffies + TOUCH_TIMEOUT); ictx->touch_x = (buf[0] << 4) | (buf[1] >> 4); ictx->touch_y = 0xfff - ((buf[2] << 4) | (buf[1] & 0xf)); input_report_abs(ictx->touch, ABS_X, ictx->touch_x); input_report_abs(ictx->touch, ABS_Y, ictx->touch_y); input_report_key(ictx->touch, BTN_TOUCH, 0x01); input_sync(ictx->touch); } static void imon_pad_to_keys(struct imon_context *ictx, unsigned char *buf) { int dir = 0; signed char rel_x = 0x00, rel_y = 0x00; u16 timeout, threshold; u32 scancode = KEY_RESERVED; unsigned long flags; /* * The imon directional pad functions more like a touchpad. Bytes 3 & 4 * contain a position coordinate (x,y), with each component ranging * from -14 to 14. We want to down-sample this to only 4 discrete values * for up/down/left/right arrow keys. Also, when you get too close to * diagonals, it has a tendency to jump back and forth, so lets try to * ignore when they get too close. 
*/ if (ictx->product != 0xffdc) { /* first, pad to 8 bytes so it conforms with everything else */ buf[5] = buf[6] = buf[7] = 0; timeout = 500; /* in msecs */ /* (2*threshold) x (2*threshold) square */ threshold = pad_thresh ? pad_thresh : 28; rel_x = buf[2]; rel_y = buf[3]; if (ictx->rc_proto == RC_PROTO_BIT_IMON && pad_stabilize) { if ((buf[1] == 0) && ((rel_x != 0) || (rel_y != 0))) { dir = stabilize((int)rel_x, (int)rel_y, timeout, threshold); if (!dir) { spin_lock_irqsave(&ictx->kc_lock, flags); ictx->kc = KEY_UNKNOWN; spin_unlock_irqrestore(&ictx->kc_lock, flags); return; } buf[2] = dir & 0xFF; buf[3] = (dir >> 8) & 0xFF; scancode = be32_to_cpu(*((__be32 *)buf)); } } else { /* * Hack alert: instead of using keycodes, we have * to use hard-coded scancodes here... */ if (abs(rel_y) > abs(rel_x)) { buf[2] = (rel_y > 0) ? 0x7F : 0x80; buf[3] = 0; if (rel_y > 0) scancode = 0x01007f00; /* KEY_DOWN */ else scancode = 0x01008000; /* KEY_UP */ } else { buf[2] = 0; buf[3] = (rel_x > 0) ? 0x7F : 0x80; if (rel_x > 0) scancode = 0x0100007f; /* KEY_RIGHT */ else scancode = 0x01000080; /* KEY_LEFT */ } } /* * Handle on-board decoded pad events for e.g. older VFD/iMON-Pad * device (15c2:ffdc). The remote generates various codes from * 0x68nnnnB7 to 0x6AnnnnB7, the left mouse button generates * 0x688301b7 and the right one 0x688481b7. All other keys generate * 0x2nnnnnnn. Position coordinate is encoded in buf[1] and buf[2] with * reversed endianness. Extract direction from buffer, rotate endianness, * adjust sign and feed the values into stabilize(). The resulting codes * will be 0x01008000, 0x01007F00, which match the newer devices. */ } else { timeout = 10; /* in msecs */ /* (2*threshold) x (2*threshold) square */ threshold = pad_thresh ? pad_thresh : 15; /* buf[1] is x */ rel_x = (buf[1] & 0x08) | (buf[1] & 0x10) >> 2 | (buf[1] & 0x20) >> 4 | (buf[1] & 0x40) >> 6; if (buf[0] & 0x02) rel_x |= ~0x10+1; /* buf[2] is y */ rel_y = (buf[2] & 0x08) | (buf[2] & 0x10) >> 2 | (buf[2] & 0x20) >> 4 | (buf[2] & 0x40) >> 6; if (buf[0] & 0x01) rel_y |= ~0x10+1; buf[0] = 0x01; buf[1] = buf[4] = buf[5] = buf[6] = buf[7] = 0; if (ictx->rc_proto == RC_PROTO_BIT_IMON && pad_stabilize) { dir = stabilize((int)rel_x, (int)rel_y, timeout, threshold); if (!dir) { spin_lock_irqsave(&ictx->kc_lock, flags); ictx->kc = KEY_UNKNOWN; spin_unlock_irqrestore(&ictx->kc_lock, flags); return; } buf[2] = dir & 0xFF; buf[3] = (dir >> 8) & 0xFF; scancode = be32_to_cpu(*((__be32 *)buf)); } else { /* * Hack alert: instead of using keycodes, we have * to use hard-coded scancodes here... */ if (abs(rel_y) > abs(rel_x)) { buf[2] = (rel_y > 0) ? 0x7F : 0x80; buf[3] = 0; if (rel_y > 0) scancode = 0x01007f00; /* KEY_DOWN */ else scancode = 0x01008000; /* KEY_UP */ } else { buf[2] = 0; buf[3] = (rel_x > 0) ? 0x7F : 0x80; if (rel_x > 0) scancode = 0x0100007f; /* KEY_RIGHT */ else scancode = 0x01000080; /* KEY_LEFT */ } } } if (scancode) { spin_lock_irqsave(&ictx->kc_lock, flags); ictx->kc = imon_remote_key_lookup(ictx, scancode); spin_unlock_irqrestore(&ictx->kc_lock, flags); } } /* * figure out if these is a press or a release. We don't actually * care about repeats, as those will be auto-generated within the IR * subsystem for repeating scancodes. 
*/ static int imon_parse_press_type(struct imon_context *ictx, unsigned char *buf, u8 ktype) { int press_type = 0; unsigned long flags; spin_lock_irqsave(&ictx->kc_lock, flags); /* key release of 0x02XXXXXX key */ if (ictx->kc == KEY_RESERVED && buf[0] == 0x02 && buf[3] == 0x00) ictx->kc = ictx->last_keycode; /* mouse button release on (some) 0xffdc devices */ else if (ictx->kc == KEY_RESERVED && buf[0] == 0x68 && buf[1] == 0x82 && buf[2] == 0x81 && buf[3] == 0xb7) ictx->kc = ictx->last_keycode; /* mouse button release on (some other) 0xffdc devices */ else if (ictx->kc == KEY_RESERVED && buf[0] == 0x01 && buf[1] == 0x00 && buf[2] == 0x81 && buf[3] == 0xb7) ictx->kc = ictx->last_keycode; /* mce-specific button handling, no keyup events */ else if (ktype == IMON_KEY_MCE) { ictx->rc_toggle = buf[2]; press_type = 1; /* incoherent or irrelevant data */ } else if (ictx->kc == KEY_RESERVED) press_type = -EINVAL; /* key release of 0xXXXXXXb7 key */ else if (ictx->release_code) press_type = 0; /* this is a button press */ else press_type = 1; spin_unlock_irqrestore(&ictx->kc_lock, flags); return press_type; } /* * Process the incoming packet */ static void imon_incoming_packet(struct imon_context *ictx, struct urb *urb, int intf) { int len = urb->actual_length; unsigned char *buf = urb->transfer_buffer; struct device *dev = ictx->dev; unsigned long flags; u32 kc; u64 scancode; int press_type = 0; ktime_t t; static ktime_t prev_time; u8 ktype; /* filter out junk data on the older 0xffdc imon devices */ if ((buf[0] == 0xff) && (buf[1] == 0xff) && (buf[2] == 0xff)) return; /* Figure out what key was pressed */ if (len == 8 && buf[7] == 0xee) { scancode = be64_to_cpu(*((__be64 *)buf)); ktype = IMON_KEY_PANEL; kc = imon_panel_key_lookup(ictx, scancode); ictx->release_code = false; } else { scancode = be32_to_cpu(*((__be32 *)buf)); if (ictx->rc_proto == RC_PROTO_BIT_RC6_MCE) { ktype = IMON_KEY_IMON; if (buf[0] == 0x80) ktype = IMON_KEY_MCE; kc = imon_mce_key_lookup(ictx, scancode); } else { ktype = IMON_KEY_IMON; kc = imon_remote_key_lookup(ictx, scancode); } } spin_lock_irqsave(&ictx->kc_lock, flags); /* keyboard/mouse mode toggle button */ if (kc == KEY_KEYBOARD && !ictx->release_code) { ictx->last_keycode = kc; if (!nomouse) { ictx->pad_mouse = !ictx->pad_mouse; dev_dbg(dev, "toggling to %s mode\n", ictx->pad_mouse ? 
"mouse" : "keyboard"); spin_unlock_irqrestore(&ictx->kc_lock, flags); return; } else { ictx->pad_mouse = false; dev_dbg(dev, "mouse mode disabled, passing key value\n"); } } ictx->kc = kc; spin_unlock_irqrestore(&ictx->kc_lock, flags); /* send touchscreen events through input subsystem if touchpad data */ if (ictx->touch && len == 8 && buf[7] == 0x86) { imon_touch_event(ictx, buf); return; /* look for mouse events with pad in mouse mode */ } else if (ictx->pad_mouse) { if (imon_mouse_event(ictx, buf, len)) return; } /* Now for some special handling to convert pad input to arrow keys */ if (((len == 5) && (buf[0] == 0x01) && (buf[4] == 0x00)) || ((len == 8) && (buf[0] & 0x40) && !(buf[1] & 0x1 || buf[1] >> 2 & 0x1))) { len = 8; imon_pad_to_keys(ictx, buf); } if (debug) { printk(KERN_INFO "intf%d decoded packet: %*ph\n", intf, len, buf); } press_type = imon_parse_press_type(ictx, buf, ktype); if (press_type < 0) goto not_input_data; if (ktype != IMON_KEY_PANEL) { if (press_type == 0) rc_keyup(ictx->rdev); else { enum rc_proto proto; if (ictx->rc_proto == RC_PROTO_BIT_RC6_MCE) proto = RC_PROTO_RC6_MCE; else if (ictx->rc_proto == RC_PROTO_BIT_IMON) proto = RC_PROTO_IMON; else return; rc_keydown(ictx->rdev, proto, ictx->rc_scancode, ictx->rc_toggle); spin_lock_irqsave(&ictx->kc_lock, flags); ictx->last_keycode = ictx->kc; spin_unlock_irqrestore(&ictx->kc_lock, flags); } return; } /* Only panel type events left to process now */ spin_lock_irqsave(&ictx->kc_lock, flags); t = ktime_get(); /* KEY repeats from knob and panel that need to be suppressed */ if (ictx->kc == KEY_MUTE || ictx->dev_descr->flags & IMON_SUPPRESS_REPEATED_KEYS) { if (ictx->kc == ictx->last_keycode && ktime_ms_delta(t, prev_time) < ictx->idev->rep[REP_DELAY]) { spin_unlock_irqrestore(&ictx->kc_lock, flags); return; } } prev_time = t; kc = ictx->kc; spin_unlock_irqrestore(&ictx->kc_lock, flags); input_report_key(ictx->idev, kc, press_type); input_sync(ictx->idev); /* panel keys don't generate a release */ input_report_key(ictx->idev, kc, 0); input_sync(ictx->idev); spin_lock_irqsave(&ictx->kc_lock, flags); ictx->last_keycode = kc; spin_unlock_irqrestore(&ictx->kc_lock, flags); return; not_input_data: if (len != 8) { dev_warn(dev, "imon %s: invalid incoming packet size (len = %d, intf%d)\n", __func__, len, intf); return; } /* iMON 2.4G associate frame */ if (buf[0] == 0x00 && buf[2] == 0xFF && /* REFID */ buf[3] == 0xFF && buf[4] == 0xFF && buf[5] == 0xFF && /* iMON 2.4G */ ((buf[6] == 0x4E && buf[7] == 0xDF) || /* LT */ (buf[6] == 0x5E && buf[7] == 0xDF))) { /* DT */ dev_warn(dev, "%s: remote associated refid=%02X\n", __func__, buf[1]); ictx->rf_isassociating = false; } } /* * Callback function for USB core API: receive data */ static void usb_rx_callback_intf0(struct urb *urb) { struct imon_context *ictx; int intfnum = 0; if (!urb) return; ictx = (struct imon_context *)urb->context; if (!ictx) return; /* * if we get a callback before we're done configuring the hardware, we * can't yet process the data, as there's nowhere to send it, but we * still need to submit a new rx URB to avoid wedging the hardware */ if (!ictx->dev_present_intf0) goto out; switch (urb->status) { case -ENOENT: /* usbcore unlink successful! 
*/ return; case -ESHUTDOWN: /* transport endpoint was shut down */ break; case 0: imon_incoming_packet(ictx, urb, intfnum); break; default: dev_warn(ictx->dev, "imon %s: status(%d): ignored\n", __func__, urb->status); break; } out: usb_submit_urb(ictx->rx_urb_intf0, GFP_ATOMIC); } static void usb_rx_callback_intf1(struct urb *urb) { struct imon_context *ictx; int intfnum = 1; if (!urb) return; ictx = (struct imon_context *)urb->context; if (!ictx) return; /* * if we get a callback before we're done configuring the hardware, we * can't yet process the data, as there's nowhere to send it, but we * still need to submit a new rx URB to avoid wedging the hardware */ if (!ictx->dev_present_intf1) goto out; switch (urb->status) { case -ENOENT: /* usbcore unlink successful! */ return; case -ESHUTDOWN: /* transport endpoint was shut down */ break; case 0: imon_incoming_packet(ictx, urb, intfnum); break; default: dev_warn(ictx->dev, "imon %s: status(%d): ignored\n", __func__, urb->status); break; } out: usb_submit_urb(ictx->rx_urb_intf1, GFP_ATOMIC); } /* * The 0x15c2:0xffdc device ID was used for umpteen different imon * devices, and all of them constantly spew interrupts, even when there * is no actual data to report. However, byte 6 of this buffer looks like * its unique across device variants, so we're trying to key off that to * figure out which display type (if any) and what IR protocol the device * actually supports. These devices have their IR protocol hard-coded into * their firmware, they can't be changed on the fly like the newer hardware. */ static void imon_get_ffdc_type(struct imon_context *ictx) { u8 ffdc_cfg_byte = ictx->usb_rx_buf[6]; u8 detected_display_type = IMON_DISPLAY_TYPE_NONE; u64 allowed_protos = RC_PROTO_BIT_IMON; switch (ffdc_cfg_byte) { /* iMON Knob, no display, iMON IR + vol knob */ case 0x21: dev_info(ictx->dev, "0xffdc iMON Knob, iMON IR"); ictx->display_supported = false; break; /* iMON 2.4G LT (usb stick), no display, iMON RF */ case 0x4e: dev_info(ictx->dev, "0xffdc iMON 2.4G LT, iMON RF"); ictx->display_supported = false; ictx->rf_device = true; break; /* iMON VFD, no IR (does have vol knob tho) */ case 0x35: dev_info(ictx->dev, "0xffdc iMON VFD + knob, no IR"); detected_display_type = IMON_DISPLAY_TYPE_VFD; break; /* iMON VFD, iMON IR */ case 0x24: case 0x30: case 0x85: dev_info(ictx->dev, "0xffdc iMON VFD, iMON IR"); detected_display_type = IMON_DISPLAY_TYPE_VFD; break; /* iMON VFD, MCE IR */ case 0x46: case 0x9e: dev_info(ictx->dev, "0xffdc iMON VFD, MCE IR"); detected_display_type = IMON_DISPLAY_TYPE_VFD; allowed_protos = RC_PROTO_BIT_RC6_MCE; break; /* iMON VFD, iMON or MCE IR */ case 0x7e: dev_info(ictx->dev, "0xffdc iMON VFD, iMON or MCE IR"); detected_display_type = IMON_DISPLAY_TYPE_VFD; allowed_protos |= RC_PROTO_BIT_RC6_MCE; break; /* iMON LCD, MCE IR */ case 0x9f: dev_info(ictx->dev, "0xffdc iMON LCD, MCE IR"); detected_display_type = IMON_DISPLAY_TYPE_LCD; allowed_protos = RC_PROTO_BIT_RC6_MCE; break; /* no display, iMON IR */ case 0x26: dev_info(ictx->dev, "0xffdc iMON Inside, iMON IR"); ictx->display_supported = false; break; /* Soundgraph iMON UltraBay */ case 0x98: dev_info(ictx->dev, "0xffdc iMON UltraBay, LCD + IR"); detected_display_type = IMON_DISPLAY_TYPE_LCD; allowed_protos = RC_PROTO_BIT_IMON | RC_PROTO_BIT_RC6_MCE; ictx->dev_descr = &ultrabay_table; break; default: dev_info(ictx->dev, "Unknown 0xffdc device, defaulting to VFD and iMON IR"); detected_display_type = IMON_DISPLAY_TYPE_VFD; /* * We don't know which one it is, allow user to set 
the * RC6 one from userspace if IMON wasn't correct. */ allowed_protos |= RC_PROTO_BIT_RC6_MCE; break; } printk(KERN_CONT " (id 0x%02x)\n", ffdc_cfg_byte); ictx->display_type = detected_display_type; ictx->rc_proto = allowed_protos; } static void imon_set_display_type(struct imon_context *ictx) { u8 configured_display_type = IMON_DISPLAY_TYPE_VFD; /* * Try to auto-detect the type of display if the user hasn't set * it by hand via the display_type modparam. Default is VFD. */ if (display_type == IMON_DISPLAY_TYPE_AUTO) { switch (ictx->product) { case 0xffdc: /* set in imon_get_ffdc_type() */ configured_display_type = ictx->display_type; break; case 0x0034: case 0x0035: configured_display_type = IMON_DISPLAY_TYPE_VGA; break; case 0x0038: case 0x0039: case 0x0045: configured_display_type = IMON_DISPLAY_TYPE_LCD; break; case 0x003c: case 0x0041: case 0x0042: case 0x0043: configured_display_type = IMON_DISPLAY_TYPE_NONE; ictx->display_supported = false; break; case 0x0036: case 0x0044: default: configured_display_type = IMON_DISPLAY_TYPE_VFD; break; } } else { configured_display_type = display_type; if (display_type == IMON_DISPLAY_TYPE_NONE) ictx->display_supported = false; else ictx->display_supported = true; dev_info(ictx->dev, "%s: overriding display type to %d via modparam\n", __func__, display_type); } ictx->display_type = configured_display_type; } static struct rc_dev *imon_init_rdev(struct imon_context *ictx) { struct rc_dev *rdev; int ret; static const unsigned char fp_packet[] = { 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x88 }; rdev = rc_allocate_device(RC_DRIVER_SCANCODE); if (!rdev) { dev_err(ictx->dev, "remote control dev allocation failed\n"); goto out; } snprintf(ictx->name_rdev, sizeof(ictx->name_rdev), "iMON Remote (%04x:%04x)", ictx->vendor, ictx->product); usb_make_path(ictx->usbdev_intf0, ictx->phys_rdev, sizeof(ictx->phys_rdev)); strlcat(ictx->phys_rdev, "/input0", sizeof(ictx->phys_rdev)); rdev->device_name = ictx->name_rdev; rdev->input_phys = ictx->phys_rdev; usb_to_input_id(ictx->usbdev_intf0, &rdev->input_id); rdev->dev.parent = ictx->dev; rdev->priv = ictx; /* iMON PAD or MCE */ rdev->allowed_protocols = RC_PROTO_BIT_IMON | RC_PROTO_BIT_RC6_MCE; rdev->change_protocol = imon_ir_change_protocol; rdev->driver_name = MOD_NAME; /* Enable front-panel buttons and/or knobs */ memcpy(ictx->usb_tx_buf, &fp_packet, sizeof(fp_packet)); ret = send_packet(ictx); /* Not fatal, but warn about it */ if (ret) dev_info(ictx->dev, "panel buttons/knobs setup failed\n"); if (ictx->product == 0xffdc) { imon_get_ffdc_type(ictx); rdev->allowed_protocols = ictx->rc_proto; } imon_set_display_type(ictx); if (ictx->rc_proto == RC_PROTO_BIT_RC6_MCE) rdev->map_name = RC_MAP_IMON_MCE; else rdev->map_name = RC_MAP_IMON_PAD; ret = rc_register_device(rdev); if (ret < 0) { dev_err(ictx->dev, "remote input dev register failed\n"); goto out; } return rdev; out: rc_free_device(rdev); return NULL; } static struct input_dev *imon_init_idev(struct imon_context *ictx) { const struct imon_panel_key_table *key_table; struct input_dev *idev; int ret, i; key_table = ictx->dev_descr->key_table; idev = input_allocate_device(); if (!idev) goto out; snprintf(ictx->name_idev, sizeof(ictx->name_idev), "iMON Panel, Knob and Mouse(%04x:%04x)", ictx->vendor, ictx->product); idev->name = ictx->name_idev; usb_make_path(ictx->usbdev_intf0, ictx->phys_idev, sizeof(ictx->phys_idev)); strlcat(ictx->phys_idev, "/input1", sizeof(ictx->phys_idev)); idev->phys = ictx->phys_idev; idev->evbit[0] = BIT_MASK(EV_KEY) | 
BIT_MASK(EV_REP) | BIT_MASK(EV_REL); idev->keybit[BIT_WORD(BTN_MOUSE)] = BIT_MASK(BTN_LEFT) | BIT_MASK(BTN_RIGHT); idev->relbit[0] = BIT_MASK(REL_X) | BIT_MASK(REL_Y) | BIT_MASK(REL_WHEEL); /* panel and/or knob code support */ for (i = 0; key_table[i].hw_code != 0; i++) { u32 kc = key_table[i].keycode; __set_bit(kc, idev->keybit); } usb_to_input_id(ictx->usbdev_intf0, &idev->id); idev->dev.parent = ictx->dev; input_set_drvdata(idev, ictx); ret = input_register_device(idev); if (ret < 0) { dev_err(ictx->dev, "input dev register failed\n"); goto out; } return idev; out: input_free_device(idev); return NULL; } static struct input_dev *imon_init_touch(struct imon_context *ictx) { struct input_dev *touch; int ret; touch = input_allocate_device(); if (!touch) goto touch_alloc_failed; snprintf(ictx->name_touch, sizeof(ictx->name_touch), "iMON USB Touchscreen (%04x:%04x)", ictx->vendor, ictx->product); touch->name = ictx->name_touch; usb_make_path(ictx->usbdev_intf1, ictx->phys_touch, sizeof(ictx->phys_touch)); strlcat(ictx->phys_touch, "/input2", sizeof(ictx->phys_touch)); touch->phys = ictx->phys_touch; touch->evbit[0] = BIT_MASK(EV_KEY) | BIT_MASK(EV_ABS); touch->keybit[BIT_WORD(BTN_TOUCH)] = BIT_MASK(BTN_TOUCH); input_set_abs_params(touch, ABS_X, 0x00, 0xfff, 0, 0); input_set_abs_params(touch, ABS_Y, 0x00, 0xfff, 0, 0); input_set_drvdata(touch, ictx); usb_to_input_id(ictx->usbdev_intf1, &touch->id); touch->dev.parent = ictx->dev; ret = input_register_device(touch); if (ret < 0) { dev_info(ictx->dev, "touchscreen input dev register failed\n"); goto touch_register_failed; } return touch; touch_register_failed: input_free_device(touch); touch_alloc_failed: return NULL; } static bool imon_find_endpoints(struct imon_context *ictx, struct usb_host_interface *iface_desc) { struct usb_endpoint_descriptor *ep; struct usb_endpoint_descriptor *rx_endpoint = NULL; struct usb_endpoint_descriptor *tx_endpoint = NULL; int ifnum = iface_desc->desc.bInterfaceNumber; int num_endpts = iface_desc->desc.bNumEndpoints; int i, ep_dir, ep_type; bool ir_ep_found = false; bool display_ep_found = false; bool tx_control = false; /* * Scan the endpoint list and set: * first input endpoint = IR endpoint * first output endpoint = display endpoint */ for (i = 0; i < num_endpts && !(ir_ep_found && display_ep_found); ++i) { ep = &iface_desc->endpoint[i].desc; ep_dir = ep->bEndpointAddress & USB_ENDPOINT_DIR_MASK; ep_type = usb_endpoint_type(ep); if (!ir_ep_found && ep_dir == USB_DIR_IN && ep_type == USB_ENDPOINT_XFER_INT) { rx_endpoint = ep; ir_ep_found = true; dev_dbg(ictx->dev, "%s: found IR endpoint\n", __func__); } else if (!display_ep_found && ep_dir == USB_DIR_OUT && ep_type == USB_ENDPOINT_XFER_INT) { tx_endpoint = ep; display_ep_found = true; dev_dbg(ictx->dev, "%s: found display endpoint\n", __func__); } } if (ifnum == 0) { ictx->rx_endpoint_intf0 = rx_endpoint; /* * tx is used to send characters to lcd/vfd, associate RF * remotes, set IR protocol, and maybe more... */ ictx->tx_endpoint = tx_endpoint; } else { ictx->rx_endpoint_intf1 = rx_endpoint; } /* * If we didn't find a display endpoint, this is probably one of the * newer iMON devices that use control urb instead of interrupt */ if (!display_ep_found) { tx_control = true; display_ep_found = true; dev_dbg(ictx->dev, "%s: device uses control endpoint, not interface OUT endpoint\n", __func__); } /* * Some iMON receivers have no display. Unfortunately, it seems * that SoundGraph recycles device IDs between devices both with * and without... 
:\ */ if (ictx->display_type == IMON_DISPLAY_TYPE_NONE) { display_ep_found = false; dev_dbg(ictx->dev, "%s: device has no display\n", __func__); } /* * iMON Touch devices have a VGA touchscreen, but no "display", as * that refers to e.g. /dev/lcd0 (a character device LCD or VFD). */ if (ictx->display_type == IMON_DISPLAY_TYPE_VGA) { display_ep_found = false; dev_dbg(ictx->dev, "%s: iMON Touch device found\n", __func__); } /* Input endpoint is mandatory */ if (!ir_ep_found) pr_err("no valid input (IR) endpoint found\n"); ictx->tx_control = tx_control; if (display_ep_found) ictx->display_supported = true; return ir_ep_found; } static struct imon_context *imon_init_intf0(struct usb_interface *intf, const struct usb_device_id *id) { struct imon_context *ictx; struct urb *rx_urb; struct urb *tx_urb; struct device *dev = &intf->dev; struct usb_host_interface *iface_desc; int ret = -ENOMEM; ictx = kzalloc(sizeof(*ictx), GFP_KERNEL); if (!ictx) goto exit; rx_urb = usb_alloc_urb(0, GFP_KERNEL); if (!rx_urb) goto rx_urb_alloc_failed; tx_urb = usb_alloc_urb(0, GFP_KERNEL); if (!tx_urb) goto tx_urb_alloc_failed; mutex_init(&ictx->lock); spin_lock_init(&ictx->kc_lock); mutex_lock(&ictx->lock); ictx->dev = dev; ictx->usbdev_intf0 = usb_get_dev(interface_to_usbdev(intf)); ictx->rx_urb_intf0 = rx_urb; ictx->tx_urb = tx_urb; ictx->rf_device = false; init_completion(&ictx->tx.finished); ictx->vendor = le16_to_cpu(ictx->usbdev_intf0->descriptor.idVendor); ictx->product = le16_to_cpu(ictx->usbdev_intf0->descriptor.idProduct); /* save drive info for later accessing the panel/knob key table */ ictx->dev_descr = (struct imon_usb_dev_descr *)id->driver_info; /* default send_packet delay is 5ms but some devices need more */ ictx->send_packet_delay = ictx->dev_descr->flags & IMON_NEED_20MS_PKT_DELAY ? 
20 : 5; ret = -ENODEV; iface_desc = intf->cur_altsetting; if (!imon_find_endpoints(ictx, iface_desc)) { goto find_endpoint_failed; } usb_fill_int_urb(ictx->rx_urb_intf0, ictx->usbdev_intf0, usb_rcvintpipe(ictx->usbdev_intf0, ictx->rx_endpoint_intf0->bEndpointAddress), ictx->usb_rx_buf, sizeof(ictx->usb_rx_buf), usb_rx_callback_intf0, ictx, ictx->rx_endpoint_intf0->bInterval); ret = usb_submit_urb(ictx->rx_urb_intf0, GFP_KERNEL); if (ret) { pr_err("usb_submit_urb failed for intf0 (%d)\n", ret); goto urb_submit_failed; } ictx->idev = imon_init_idev(ictx); if (!ictx->idev) { dev_err(dev, "%s: input device setup failed\n", __func__); goto idev_setup_failed; } ictx->rdev = imon_init_rdev(ictx); if (!ictx->rdev) { dev_err(dev, "%s: rc device setup failed\n", __func__); goto rdev_setup_failed; } ictx->dev_present_intf0 = true; mutex_unlock(&ictx->lock); return ictx; rdev_setup_failed: input_unregister_device(ictx->idev); idev_setup_failed: usb_kill_urb(ictx->rx_urb_intf0); urb_submit_failed: find_endpoint_failed: usb_put_dev(ictx->usbdev_intf0); mutex_unlock(&ictx->lock); usb_free_urb(tx_urb); tx_urb_alloc_failed: usb_free_urb(rx_urb); rx_urb_alloc_failed: kfree(ictx); exit: dev_err(dev, "unable to initialize intf0, err %d\n", ret); return NULL; } static struct imon_context *imon_init_intf1(struct usb_interface *intf, struct imon_context *ictx) { struct urb *rx_urb; struct usb_host_interface *iface_desc; int ret = -ENOMEM; rx_urb = usb_alloc_urb(0, GFP_KERNEL); if (!rx_urb) goto rx_urb_alloc_failed; mutex_lock(&ictx->lock); if (ictx->display_type == IMON_DISPLAY_TYPE_VGA) { timer_setup(&ictx->ttimer, imon_touch_display_timeout, 0); } ictx->usbdev_intf1 = usb_get_dev(interface_to_usbdev(intf)); ictx->rx_urb_intf1 = rx_urb; ret = -ENODEV; iface_desc = intf->cur_altsetting; if (!imon_find_endpoints(ictx, iface_desc)) goto find_endpoint_failed; if (ictx->display_type == IMON_DISPLAY_TYPE_VGA) { ictx->touch = imon_init_touch(ictx); if (!ictx->touch) goto touch_setup_failed; } else ictx->touch = NULL; usb_fill_int_urb(ictx->rx_urb_intf1, ictx->usbdev_intf1, usb_rcvintpipe(ictx->usbdev_intf1, ictx->rx_endpoint_intf1->bEndpointAddress), ictx->usb_rx_buf, sizeof(ictx->usb_rx_buf), usb_rx_callback_intf1, ictx, ictx->rx_endpoint_intf1->bInterval); ret = usb_submit_urb(ictx->rx_urb_intf1, GFP_KERNEL); if (ret) { pr_err("usb_submit_urb failed for intf1 (%d)\n", ret); goto urb_submit_failed; } ictx->dev_present_intf1 = true; mutex_unlock(&ictx->lock); return ictx; urb_submit_failed: if (ictx->touch) input_unregister_device(ictx->touch); touch_setup_failed: find_endpoint_failed: usb_put_dev(ictx->usbdev_intf1); ictx->usbdev_intf1 = NULL; mutex_unlock(&ictx->lock); usb_free_urb(rx_urb); ictx->rx_urb_intf1 = NULL; rx_urb_alloc_failed: dev_err(ictx->dev, "unable to initialize intf1, err %d\n", ret); return NULL; } static void imon_init_display(struct imon_context *ictx, struct usb_interface *intf) { int ret; dev_dbg(ictx->dev, "Registering iMON display with sysfs\n"); /* set up sysfs entry for built-in clock */ ret = sysfs_create_group(&intf->dev.kobj, &imon_display_attr_group); if (ret) dev_err(ictx->dev, "Could not create display sysfs entries(%d)", ret); if (ictx->display_type == IMON_DISPLAY_TYPE_LCD) ret = usb_register_dev(intf, &imon_lcd_class); else ret = usb_register_dev(intf, &imon_vfd_class); if (ret) /* Not a fatal error, so ignore */ dev_info(ictx->dev, "could not get a minor number for display\n"); } /* * Callback function for USB core API: Probe */ static int imon_probe(struct usb_interface 
*interface, const struct usb_device_id *id) { struct usb_device *usbdev = NULL; struct usb_host_interface *iface_desc = NULL; struct usb_interface *first_if; struct device *dev = &interface->dev; int ifnum, sysfs_err; int ret = 0; struct imon_context *ictx = NULL; u16 vendor, product; usbdev = usb_get_dev(interface_to_usbdev(interface)); iface_desc = interface->cur_altsetting; ifnum = iface_desc->desc.bInterfaceNumber; vendor = le16_to_cpu(usbdev->descriptor.idVendor); product = le16_to_cpu(usbdev->descriptor.idProduct); dev_dbg(dev, "%s: found iMON device (%04x:%04x, intf%d)\n", __func__, vendor, product, ifnum); first_if = usb_ifnum_to_if(usbdev, 0); if (!first_if) { ret = -ENODEV; goto fail; } if (first_if->dev.driver != interface->dev.driver) { dev_err(&interface->dev, "inconsistent driver matching\n"); ret = -EINVAL; goto fail; } if (ifnum == 0) { ictx = imon_init_intf0(interface, id); if (!ictx) { pr_err("failed to initialize context!\n"); ret = -ENODEV; goto fail; } refcount_set(&ictx->users, 1); } else { /* this is the secondary interface on the device */ struct imon_context *first_if_ctx = usb_get_intfdata(first_if); /* fail early if first intf failed to register */ if (!first_if_ctx) { ret = -ENODEV; goto fail; } ictx = imon_init_intf1(interface, first_if_ctx); if (!ictx) { pr_err("failed to attach to context!\n"); ret = -ENODEV; goto fail; } refcount_inc(&ictx->users); } usb_set_intfdata(interface, ictx); if (ifnum == 0) { if (product == 0xffdc && ictx->rf_device) { sysfs_err = sysfs_create_group(&interface->dev.kobj, &imon_rf_attr_group); if (sysfs_err) pr_err("Could not create RF sysfs entries(%d)\n", sysfs_err); } if (ictx->display_supported) imon_init_display(ictx, interface); } dev_info(dev, "iMON device (%04x:%04x, intf%d) on usb<%d:%d> initialized\n", vendor, product, ifnum, usbdev->bus->busnum, usbdev->devnum); usb_put_dev(usbdev); return 0; fail: usb_put_dev(usbdev); dev_err(dev, "unable to register, err %d\n", ret); return ret; } /* * Callback function for USB core API: disconnect */ static void imon_disconnect(struct usb_interface *interface) { struct imon_context *ictx; struct device *dev; int ifnum; ictx = usb_get_intfdata(interface); ictx->disconnected = true; dev = ictx->dev; ifnum = interface->cur_altsetting->desc.bInterfaceNumber; /* * sysfs_remove_group is safe to call even if sysfs_create_group * hasn't been called */ sysfs_remove_group(&interface->dev.kobj, &imon_display_attr_group); sysfs_remove_group(&interface->dev.kobj, &imon_rf_attr_group); usb_set_intfdata(interface, NULL); /* Abort ongoing write */ if (ictx->tx.busy) { usb_kill_urb(ictx->tx_urb); complete(&ictx->tx.finished); } if (ifnum == 0) { ictx->dev_present_intf0 = false; usb_kill_urb(ictx->rx_urb_intf0); input_unregister_device(ictx->idev); rc_unregister_device(ictx->rdev); if (ictx->display_supported) { if (ictx->display_type == IMON_DISPLAY_TYPE_LCD) usb_deregister_dev(interface, &imon_lcd_class); else if (ictx->display_type == IMON_DISPLAY_TYPE_VFD) usb_deregister_dev(interface, &imon_vfd_class); } usb_put_dev(ictx->usbdev_intf0); } else { ictx->dev_present_intf1 = false; usb_kill_urb(ictx->rx_urb_intf1); if (ictx->display_type == IMON_DISPLAY_TYPE_VGA) { del_timer_sync(&ictx->ttimer); input_unregister_device(ictx->touch); } usb_put_dev(ictx->usbdev_intf1); } if (refcount_dec_and_test(&ictx->users)) free_imon_context(ictx); dev_dbg(dev, "%s: iMON device (intf%d) disconnected\n", __func__, ifnum); } static int imon_suspend(struct usb_interface *intf, pm_message_t message) { struct imon_context 
*ictx = usb_get_intfdata(intf);
	int ifnum = intf->cur_altsetting->desc.bInterfaceNumber;

	if (ifnum == 0)
		usb_kill_urb(ictx->rx_urb_intf0);
	else
		usb_kill_urb(ictx->rx_urb_intf1);

	return 0;
}

static int imon_resume(struct usb_interface *intf)
{
	int rc = 0;
	struct imon_context *ictx = usb_get_intfdata(intf);
	int ifnum = intf->cur_altsetting->desc.bInterfaceNumber;

	if (ifnum == 0) {
		usb_fill_int_urb(ictx->rx_urb_intf0, ictx->usbdev_intf0,
			usb_rcvintpipe(ictx->usbdev_intf0,
				ictx->rx_endpoint_intf0->bEndpointAddress),
			ictx->usb_rx_buf, sizeof(ictx->usb_rx_buf),
			usb_rx_callback_intf0, ictx,
			ictx->rx_endpoint_intf0->bInterval);

		rc = usb_submit_urb(ictx->rx_urb_intf0, GFP_NOIO);
	} else {
		usb_fill_int_urb(ictx->rx_urb_intf1, ictx->usbdev_intf1,
			usb_rcvintpipe(ictx->usbdev_intf1,
				ictx->rx_endpoint_intf1->bEndpointAddress),
			ictx->usb_rx_buf, sizeof(ictx->usb_rx_buf),
			usb_rx_callback_intf1, ictx,
			ictx->rx_endpoint_intf1->bInterval);

		rc = usb_submit_urb(ictx->rx_urb_intf1, GFP_NOIO);
	}

	return rc;
}

module_usb_driver(imon_driver);
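/*
 * Standalone userspace sketch (not part of the driver): formats the string
 * imon_clock_store() above expects -- "yy mm dd dow HH MM SS" -- from the
 * current time and writes it to the imon_clock sysfs attribute.  The sysfs
 * path is an assumption; substitute the real USB interface directory, and
 * note the driver refuses the write while the display device is open.
 */
#include <stdio.h>
#include <time.h>

int main(void)
{
	const char *attr = "/sys/bus/usb/devices/1-1:1.0/imon_clock"; /* hypothetical */
	time_t now = time(NULL);
	struct tm *tm = localtime(&now);
	FILE *f;

	f = fopen(attr, "w");
	if (!f) {
		perror("fopen");
		return 1;
	}
	/* last two digits of year, 1-based month, then dow with 0 = Sunday */
	fprintf(f, "%d %d %d %d %d %d %d\n",
		tm->tm_year % 100, tm->tm_mon + 1, tm->tm_mday, tm->tm_wday,
		tm->tm_hour, tm->tm_min, tm->tm_sec);
	fclose(f);
	return 0;
}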
// SPDX-License-Identifier: GPL-2.0
/*
 * linux/fs/hfsplus/bfind.c
 *
 * Copyright (C) 2001
 * Brad Boyer (flar@allandria.com)
 * (C) 2003 Ardis Technologies <roman@ardistech.com>
 *
 * Search routines for btrees
 */

#include <linux/slab.h>
#include "hfsplus_fs.h"

int hfs_find_init(struct hfs_btree *tree, struct hfs_find_data *fd)
{
	void *ptr;

	fd->tree = tree;
	fd->bnode = NULL;
	ptr = kmalloc(tree->max_key_len * 2 + 4, GFP_KERNEL);
	if (!ptr)
		return -ENOMEM;
	fd->search_key = ptr;
	fd->key = ptr + tree->max_key_len + 2;
	hfs_dbg(BNODE_REFS, "find_init: %d (%p)\n",
		tree->cnid, __builtin_return_address(0));
	mutex_lock_nested(&tree->tree_lock, hfsplus_btree_lock_class(tree));
	return 0;
}

void hfs_find_exit(struct hfs_find_data *fd)
{
	hfs_bnode_put(fd->bnode);
	kfree(fd->search_key);
	hfs_dbg(BNODE_REFS, "find_exit: %d (%p)\n",
		fd->tree->cnid, __builtin_return_address(0));
	mutex_unlock(&fd->tree->tree_lock);
	fd->tree = NULL;
}

int hfs_find_1st_rec_by_cnid(struct hfs_bnode *bnode, struct hfs_find_data *fd,
			     int *begin, int *end, int *cur_rec)
{
	__be32 cur_cnid;
	__be32 search_cnid;

	if (bnode->tree->cnid == HFSPLUS_EXT_CNID) {
		cur_cnid = fd->key->ext.cnid;
		search_cnid = fd->search_key->ext.cnid;
	} else if (bnode->tree->cnid == HFSPLUS_CAT_CNID) {
		cur_cnid = fd->key->cat.parent;
		search_cnid = fd->search_key->cat.parent;
	} else if (bnode->tree->cnid == HFSPLUS_ATTR_CNID) {
		cur_cnid = fd->key->attr.cnid;
		search_cnid = fd->search_key->attr.cnid;
	} else {
		cur_cnid = 0;	/* used-uninitialized warning */
		search_cnid = 0;
		BUG();
	}

	if (cur_cnid == search_cnid) {
		(*end) = (*cur_rec);
		if ((*begin) == (*end))
			return 1;
	} else {
		if (be32_to_cpu(cur_cnid) < be32_to_cpu(search_cnid))
			(*begin) = (*cur_rec) + 1;
		else
			(*end) = (*cur_rec) - 1;
	}

	return 0;
}

int hfs_find_rec_by_key(struct hfs_bnode *bnode, struct hfs_find_data *fd,
			int *begin, int *end, int *cur_rec)
{
	int cmpval;

	cmpval = bnode->tree->keycmp(fd->key, fd->search_key);
	if (!cmpval) {
		(*end) = (*cur_rec);
		return 1;
	}
	if (cmpval < 0)
		(*begin) = (*cur_rec) + 1;
	else
		*(end) = (*cur_rec) - 1;

	return 0;
}

/* Find the record in bnode that best matches key (not greater than...) */
int __hfs_brec_find(struct hfs_bnode *bnode, struct hfs_find_data *fd,
		    search_strategy_t rec_found)
{
	u16 off, len, keylen;
	int rec;
	int b, e;
	int res;

	BUG_ON(!rec_found);
	b = 0;
	e = bnode->num_recs - 1;
	res = -ENOENT;
	do {
		rec = (e + b) / 2;
		len = hfs_brec_lenoff(bnode, rec, &off);
		keylen = hfs_brec_keylen(bnode, rec);
		if (keylen == 0) {
			res = -EINVAL;
			goto fail;
		}
		hfs_bnode_read(bnode, fd->key, off, keylen);
		if (rec_found(bnode, fd, &b, &e, &rec)) {
			res = 0;
			goto done;
		}
	} while (b <= e);

	if (rec != e && e >= 0) {
		len = hfs_brec_lenoff(bnode, e, &off);
		keylen = hfs_brec_keylen(bnode, e);
		if (keylen == 0) {
			res = -EINVAL;
			goto fail;
		}
		hfs_bnode_read(bnode, fd->key, off, keylen);
	}

done:
	fd->record = e;
	fd->keyoffset = off;
	fd->keylength = keylen;
	fd->entryoffset = off + keylen;
	fd->entrylength = len - keylen;
fail:
	return res;
}

/* Traverse a B*Tree from the root to a leaf finding best fit to key */
/* Return allocated copy of node found, set recnum to best record */
int hfs_brec_find(struct hfs_find_data *fd, search_strategy_t do_key_compare)
{
	struct hfs_btree *tree;
	struct hfs_bnode *bnode;
	u32 nidx, parent;
	__be32 data;
	int height, res;

	tree = fd->tree;
	if (fd->bnode)
		hfs_bnode_put(fd->bnode);
	fd->bnode = NULL;
	nidx = tree->root;
	if (!nidx)
		return -ENOENT;
	height = tree->depth;
	res = 0;
	parent = 0;
	for (;;) {
		bnode = hfs_bnode_find(tree, nidx);
		if (IS_ERR(bnode)) {
			res = PTR_ERR(bnode);
			bnode = NULL;
			break;
		}
		if (bnode->height != height)
			goto invalid;
		if (bnode->type != (--height ? HFS_NODE_INDEX : HFS_NODE_LEAF))
			goto invalid;
		bnode->parent = parent;

		res = __hfs_brec_find(bnode, fd, do_key_compare);
		if (!height)
			break;
		if (fd->record < 0)
			goto release;

		parent = nidx;
		hfs_bnode_read(bnode, &data, fd->entryoffset, 4);
		nidx = be32_to_cpu(data);
		hfs_bnode_put(bnode);
	}
	fd->bnode = bnode;
	return res;

invalid:
	pr_err("inconsistency in B*Tree (%d,%d,%d,%u,%u)\n",
	       height, bnode->height, bnode->type, nidx, parent);
	res = -EIO;
release:
	hfs_bnode_put(bnode);
	return res;
}

int hfs_brec_read(struct hfs_find_data *fd, void *rec, int rec_len)
{
	int res;

	res = hfs_brec_find(fd, hfs_find_rec_by_key);
	if (res)
		return res;
	if (fd->entrylength > rec_len)
		return -EINVAL;
	hfs_bnode_read(fd->bnode, rec, fd->entryoffset, fd->entrylength);
	return 0;
}

int hfs_brec_goto(struct hfs_find_data *fd, int cnt)
{
	struct hfs_btree *tree;
	struct hfs_bnode *bnode;
	int idx, res = 0;
	u16 off, len, keylen;

	bnode = fd->bnode;
	tree = bnode->tree;

	if (cnt < 0) {
		cnt = -cnt;
		while (cnt > fd->record) {
			cnt -= fd->record + 1;
			fd->record = bnode->num_recs - 1;
			idx = bnode->prev;
			if (!idx) {
				res = -ENOENT;
				goto out;
			}
			hfs_bnode_put(bnode);
			bnode = hfs_bnode_find(tree, idx);
			if (IS_ERR(bnode)) {
				res = PTR_ERR(bnode);
				bnode = NULL;
				goto out;
			}
		}
		fd->record -= cnt;
	} else {
		while (cnt >= bnode->num_recs - fd->record) {
			cnt -= bnode->num_recs - fd->record;
			fd->record = 0;
			idx = bnode->next;
			if (!idx) {
				res = -ENOENT;
				goto out;
			}
			hfs_bnode_put(bnode);
			bnode = hfs_bnode_find(tree, idx);
			if (IS_ERR(bnode)) {
				res = PTR_ERR(bnode);
				bnode = NULL;
				goto out;
			}
		}
		fd->record += cnt;
	}

	len = hfs_brec_lenoff(bnode, fd->record, &off);
	keylen = hfs_brec_keylen(bnode, fd->record);
	if (keylen == 0) {
		res = -EINVAL;
		goto out;
	}
	fd->keyoffset = off;
	fd->keylength = keylen;
	fd->entryoffset = off + keylen;
	fd->entrylength = len - keylen;
	hfs_bnode_read(bnode, fd->key, off, keylen);
out:
	fd->bnode = bnode;
	return res;
}
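/*
 * Standalone illustration (not kernel code): the strategy-callback binary
 * search that __hfs_brec_find() above uses, reduced to a sorted int array.
 * The callback returns 1 when the probe record matches and otherwise
 * narrows [begin, end] around *cur_rec, exactly like hfs_find_rec_by_key().
 */
#include <stdio.h>

static const int keys[] = { 3, 9, 14, 27, 31, 58, 77 };

static int find_rec_by_key(int search, int *begin, int *end, int *cur_rec)
{
	int cmpval = keys[*cur_rec] - search;

	if (!cmpval) {
		*end = *cur_rec;
		return 1;			/* found: stop the loop */
	}
	if (cmpval < 0)
		*begin = *cur_rec + 1;		/* probe too small: go right */
	else
		*end = *cur_rec - 1;		/* probe too large: go left */
	return 0;
}

int main(void)
{
	int b = 0, e = (int)(sizeof(keys) / sizeof(keys[0])) - 1;
	int rec, found = 0;

	do {
		rec = (e + b) / 2;
		if (find_rec_by_key(27, &b, &e, &rec)) {
			found = 1;
			break;
		}
	} while (b <= e);

	/* as in hfs_brec_find(), "e" is the best record not greater than key */
	printf("found=%d rec=%d best<=key at %d\n", found, rec, e);
	return 0;
}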
/* mpicoder.c - Coder for the external representation of MPIs
 * Copyright (C) 1998, 1999 Free Software Foundation, Inc.
 *
 * This file is part of GnuPG.
 *
 * GnuPG is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * GnuPG is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
 */

#include <linux/bitops.h>
#include <linux/count_zeros.h>
#include <linux/byteorder/generic.h>
#include <linux/scatterlist.h>
#include <linux/string.h>
#include "mpi-internal.h"

#define MAX_EXTERN_MPI_BITS 16384

/**
 * mpi_read_raw_data - Read a raw byte stream as a positive integer
 * @xbuffer: The data to read
 * @nbytes: The amount of data to read
 */
MPI mpi_read_raw_data(const void *xbuffer, size_t nbytes)
{
	const uint8_t *buffer = xbuffer;
	int i, j;
	unsigned nbits, nlimbs;
	mpi_limb_t a;
	MPI val = NULL;

	while (nbytes > 0 && buffer[0] == 0) {
		buffer++;
		nbytes--;
	}

	nbits = nbytes * 8;
	if (nbits > MAX_EXTERN_MPI_BITS) {
		pr_info("MPI: mpi too large (%u bits)\n", nbits);
		return NULL;
	}
	if (nbytes > 0)
		nbits -= count_leading_zeros(buffer[0]) - (BITS_PER_LONG - 8);

	nlimbs = DIV_ROUND_UP(nbytes, BYTES_PER_MPI_LIMB);
	val = mpi_alloc(nlimbs);
	if (!val)
		return NULL;
	val->nbits = nbits;
	val->sign = 0;
	val->nlimbs = nlimbs;

	if (nbytes > 0) {
		i = BYTES_PER_MPI_LIMB - nbytes % BYTES_PER_MPI_LIMB;
		i %= BYTES_PER_MPI_LIMB;
		for (j = nlimbs; j > 0; j--) {
			a = 0;
			for (; i < BYTES_PER_MPI_LIMB; i++) {
				a <<= 8;
				a |= *buffer++;
			}
			i = 0;
			val->d[j - 1] = a;
		}
	}
	return val;
}
EXPORT_SYMBOL_GPL(mpi_read_raw_data);

MPI mpi_read_from_buffer(const void *xbuffer, unsigned *ret_nread)
{
	const uint8_t *buffer = xbuffer;
	unsigned int nbits, nbytes;
	MPI val;

	if (*ret_nread < 2)
		return ERR_PTR(-EINVAL);
	nbits = buffer[0] << 8 | buffer[1];

	if (nbits > MAX_EXTERN_MPI_BITS) {
		pr_info("MPI: mpi too large (%u bits)\n", nbits);
		return ERR_PTR(-EINVAL);
	}

	nbytes = DIV_ROUND_UP(nbits, 8);
	if (nbytes + 2 > *ret_nread) {
		pr_info("MPI: mpi larger than buffer nbytes=%u ret_nread=%u\n",
			nbytes, *ret_nread);
		return ERR_PTR(-EINVAL);
	}

	val = mpi_read_raw_data(buffer + 2, nbytes);
	if (!val)
		return ERR_PTR(-ENOMEM);

	*ret_nread = nbytes + 2;
	return val;
}
EXPORT_SYMBOL_GPL(mpi_read_from_buffer);

static int count_lzeros(MPI a)
{
	mpi_limb_t alimb;
	int i, lzeros = 0;

	for (i = a->nlimbs - 1; i >= 0; i--) {
		alimb = a->d[i];
		if (alimb == 0) {
			lzeros += sizeof(mpi_limb_t);
		} else {
			lzeros += count_leading_zeros(alimb) / 8;
			break;
		}
	}
	return lzeros;
}

/**
 * mpi_read_buffer() - read MPI to a buffer provided by user (msb first)
 *
 * @a:		a multi precision integer
 * @buf:	buffer to which the output will be written to. Needs to be at
 *		least mpi_get_size(a) long.
 * @buf_len:	size of the buf.
 * @nbytes:	receives the actual length of the data written on success and
 *		the data to-be-written on -EOVERFLOW in case buf_len was too
 *		small.
 * @sign:	if not NULL, it will be set to the sign of a.
 *
 * Return:	0 on success or error code in case of error
 */
int mpi_read_buffer(MPI a, uint8_t *buf, unsigned buf_len, unsigned *nbytes,
		    int *sign)
{
	uint8_t *p;
#if BYTES_PER_MPI_LIMB == 4
	__be32 alimb;
#elif BYTES_PER_MPI_LIMB == 8
	__be64 alimb;
#else
#error please implement for this limb size.
#endif
	unsigned int n = mpi_get_size(a);
	int i, lzeros;

	if (!buf || !nbytes)
		return -EINVAL;

	if (sign)
		*sign = a->sign;

	lzeros = count_lzeros(a);

	if (buf_len < n - lzeros) {
		*nbytes = n - lzeros;
		return -EOVERFLOW;
	}

	p = buf;
	*nbytes = n - lzeros;

	for (i = a->nlimbs - 1 - lzeros / BYTES_PER_MPI_LIMB,
			lzeros %= BYTES_PER_MPI_LIMB;
		i >= 0; i--) {
#if BYTES_PER_MPI_LIMB == 4
		alimb = cpu_to_be32(a->d[i]);
#elif BYTES_PER_MPI_LIMB == 8
		alimb = cpu_to_be64(a->d[i]);
#else
#error please implement for this limb size.
#endif
		memcpy(p, (u8 *)&alimb + lzeros, BYTES_PER_MPI_LIMB - lzeros);
		p += BYTES_PER_MPI_LIMB - lzeros;
		lzeros = 0;
	}
	return 0;
}
EXPORT_SYMBOL_GPL(mpi_read_buffer);

/*
 * mpi_get_buffer() - Returns an allocated buffer with the MPI (msb first).
 * Caller must free the return string.
 * This function does return a 0 byte buffer with nbytes set to zero if the
 * value of A is zero.
 *
 * @a:		a multi precision integer.
 * @nbytes:	receives the length of this buffer.
 * @sign:	if not NULL, it will be set to the sign of a.
 *
 * Return:	Pointer to MPI buffer or NULL on error
 */
void *mpi_get_buffer(MPI a, unsigned *nbytes, int *sign)
{
	uint8_t *buf;
	unsigned int n;
	int ret;

	if (!nbytes)
		return NULL;

	n = mpi_get_size(a);

	if (!n)
		n++;

	buf = kmalloc(n, GFP_KERNEL);
	if (!buf)
		return NULL;

	ret = mpi_read_buffer(a, buf, n, nbytes, sign);
	if (ret) {
		kfree(buf);
		return NULL;
	}
	return buf;
}
EXPORT_SYMBOL_GPL(mpi_get_buffer);

/**
 * mpi_write_to_sgl() - Function exports MPI to an sgl (msb first)
 *
 * This function works in the same way as the mpi_read_buffer, but it
 * takes an sgl instead of u8 * buf.
 *
 * @a:		a multi precision integer
 * @sgl:	scatterlist to write to. Needs to be at least
 *		mpi_get_size(a) long.
 * @nbytes:	the number of bytes to write. Leading bytes will be
 *		filled with zero.
 * @sign:	if not NULL, it will be set to the sign of a.
 *
 * Return:	0 on success or error code in case of error
 */
int mpi_write_to_sgl(MPI a, struct scatterlist *sgl, unsigned nbytes,
		     int *sign)
{
	u8 *p, *p2;
#if BYTES_PER_MPI_LIMB == 4
	__be32 alimb;
#elif BYTES_PER_MPI_LIMB == 8
	__be64 alimb;
#else
#error please implement for this limb size.
#endif
	unsigned int n = mpi_get_size(a);
	struct sg_mapping_iter miter;
	int i, x, buf_len;
	int nents;

	if (sign)
		*sign = a->sign;

	if (nbytes < n)
		return -EOVERFLOW;

	nents = sg_nents_for_len(sgl, nbytes);
	if (nents < 0)
		return -EINVAL;

	sg_miter_start(&miter, sgl, nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
	sg_miter_next(&miter);
	buf_len = miter.length;
	p2 = miter.addr;

	while (nbytes > n) {
		i = min_t(unsigned, nbytes - n, buf_len);
		memset(p2, 0, i);
		p2 += i;
		nbytes -= i;

		buf_len -= i;
		if (!buf_len) {
			sg_miter_next(&miter);
			buf_len = miter.length;
			p2 = miter.addr;
		}
	}

	for (i = a->nlimbs - 1; i >= 0; i--) {
#if BYTES_PER_MPI_LIMB == 4
		alimb = a->d[i] ? cpu_to_be32(a->d[i]) : 0;
#elif BYTES_PER_MPI_LIMB == 8
		alimb = a->d[i] ? cpu_to_be64(a->d[i]) : 0;
#else
#error please implement for this limb size.
#endif
		p = (u8 *)&alimb;

		for (x = 0; x < sizeof(alimb); x++) {
			*p2++ = *p++;
			if (!--buf_len) {
				sg_miter_next(&miter);
				buf_len = miter.length;
				p2 = miter.addr;
			}
		}
	}

	sg_miter_stop(&miter);
	return 0;
}
EXPORT_SYMBOL_GPL(mpi_write_to_sgl);

/*
 * mpi_read_raw_from_sgl() - Function allocates an MPI and populates it with
 *			     data from the sgl
 *
 * This function works in the same way as the mpi_read_raw_data, but it
 * takes an sgl instead of void * buffer. i.e. it allocates
 * a new MPI and reads the content of the sgl to the MPI.
* * @sgl: scatterlist to read from * @nbytes: number of bytes to read * * Return: Pointer to a new MPI or NULL on error */ MPI mpi_read_raw_from_sgl(struct scatterlist *sgl, unsigned int nbytes) { struct sg_mapping_iter miter; unsigned int nbits, nlimbs; int x, j, z, lzeros, ents; unsigned int len; const u8 *buff; mpi_limb_t a; MPI val = NULL; ents = sg_nents_for_len(sgl, nbytes); if (ents < 0) return NULL; sg_miter_start(&miter, sgl, ents, SG_MITER_ATOMIC | SG_MITER_FROM_SG); lzeros = 0; len = 0; while (nbytes > 0) { while (len && !*buff) { lzeros++; len--; buff++; } if (len && *buff) break; sg_miter_next(&miter); buff = miter.addr; len = miter.length; nbytes -= lzeros; lzeros = 0; } miter.consumed = lzeros; nbytes -= lzeros; nbits = nbytes * 8; if (nbits > MAX_EXTERN_MPI_BITS) { sg_miter_stop(&miter); pr_info("MPI: mpi too large (%u bits)\n", nbits); return NULL; } if (nbytes > 0) nbits -= count_leading_zeros(*buff) - (BITS_PER_LONG - 8); sg_miter_stop(&miter); nlimbs = DIV_ROUND_UP(nbytes, BYTES_PER_MPI_LIMB); val = mpi_alloc(nlimbs); if (!val) return NULL; val->nbits = nbits; val->sign = 0; val->nlimbs = nlimbs; if (nbytes == 0) return val; j = nlimbs - 1; a = 0; z = BYTES_PER_MPI_LIMB - nbytes % BYTES_PER_MPI_LIMB; z %= BYTES_PER_MPI_LIMB; while (sg_miter_next(&miter)) { buff = miter.addr; len = min_t(unsigned, miter.length, nbytes); nbytes -= len; for (x = 0; x < len; x++) { a <<= 8; a |= *buff++; if (((z + x + 1) % BYTES_PER_MPI_LIMB) == 0) { val->d[j--] = a; a = 0; } } z += x; } return val; } EXPORT_SYMBOL_GPL(mpi_read_raw_from_sgl);
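/*
 * Editor's note: a minimal usage sketch of the reader/writer pair above,
 * not part of mpicoder.c. It round-trips a raw big-endian byte string
 * through an MPI and back. The demo function name and byte values are
 * invented for illustration; error handling is trimmed to the essentials.
 */
#include <linux/mpi.h>
#include <linux/slab.h>

static int __maybe_unused mpi_roundtrip_demo(void)
{
	/* 0x123456 with a leading zero byte; mpi_read_raw_data() strips it. */
	static const u8 raw[] = { 0x00, 0x12, 0x34, 0x56 };
	unsigned int nbytes;
	int sign;
	void *buf;
	MPI m;

	m = mpi_read_raw_data(raw, sizeof(raw));
	if (!m)
		return -ENOMEM;

	/* mpi_get_buffer() kmallocs the result; the caller must kfree() it. */
	buf = mpi_get_buffer(m, &nbytes, &sign);
	mpi_free(m);
	if (!buf)
		return -ENOMEM;

	/* Expect nbytes == 3, sign == 0 and buf == { 0x12, 0x34, 0x56 }. */
	kfree(buf);
	return 0;
}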
// SPDX-License-Identifier: GPL-2.0-or-later /* L2TPv3 ethernet pseudowire driver * * Copyright (c) 2008,2009,2010 Katalix Systems Ltd */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/skbuff.h> #include <linux/socket.h> #include <linux/hash.h> #include <linux/l2tp.h> #include <linux/in.h> #include <linux/etherdevice.h> #include <linux/spinlock.h> #include <net/sock.h> #include <net/ip.h> #include <net/icmp.h> #include <net/udp.h> #include <net/inet_common.h> #include <net/inet_hashtables.h> #include <net/tcp_states.h> #include <net/protocol.h> #include <net/xfrm.h> #include <net/net_namespace.h> #include <net/netns/generic.h> #include <linux/ip.h> #include <linux/ipv6.h> #include <linux/udp.h> #include "l2tp_core.h" /* Default device name. May be overridden by name specified by user */ #define L2TP_ETH_DEV_NAME "l2tpeth%d" /* via netdev_priv() */ struct l2tp_eth { struct l2tp_session *session; }; /* via l2tp_session_priv() */ struct l2tp_eth_sess { struct net_device __rcu *dev; }; static int l2tp_eth_dev_init(struct net_device *dev) { eth_hw_addr_random(dev); eth_broadcast_addr(dev->broadcast); netdev_lockdep_set_classes(dev); return 0; } static void l2tp_eth_dev_uninit(struct net_device *dev) { struct l2tp_eth *priv = netdev_priv(dev); struct l2tp_eth_sess *spriv; spriv = l2tp_session_priv(priv->session); RCU_INIT_POINTER(spriv->dev, NULL); /* No need for synchronize_net() here. We're called by * unregister_netdev*(), which does the synchronisation for us.
*/ } static netdev_tx_t l2tp_eth_dev_xmit(struct sk_buff *skb, struct net_device *dev) { struct l2tp_eth *priv = netdev_priv(dev); struct l2tp_session *session = priv->session; unsigned int len = skb->len; int ret = l2tp_xmit_skb(session, skb); if (likely(ret == NET_XMIT_SUCCESS)) dev_dstats_tx_add(dev, len); else dev_dstats_tx_dropped(dev); return NETDEV_TX_OK; } static const struct net_device_ops l2tp_eth_netdev_ops = { .ndo_init = l2tp_eth_dev_init, .ndo_uninit = l2tp_eth_dev_uninit, .ndo_start_xmit = l2tp_eth_dev_xmit, .ndo_set_mac_address = eth_mac_addr, }; static const struct device_type l2tpeth_type = { .name = "l2tpeth", }; static void l2tp_eth_dev_setup(struct net_device *dev) { SET_NETDEV_DEVTYPE(dev, &l2tpeth_type); ether_setup(dev); dev->priv_flags &= ~IFF_TX_SKB_SHARING; dev->lltx = true; dev->netdev_ops = &l2tp_eth_netdev_ops; dev->needs_free_netdev = true; dev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS; } static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb, int data_len) { struct l2tp_eth_sess *spriv = l2tp_session_priv(session); struct net_device *dev; if (!pskb_may_pull(skb, ETH_HLEN)) goto error; secpath_reset(skb); /* checksums verified by L2TP */ skb->ip_summed = CHECKSUM_NONE; /* drop outer flow-hash */ skb_clear_hash(skb); skb_dst_drop(skb); nf_reset_ct(skb); rcu_read_lock(); dev = rcu_dereference(spriv->dev); if (!dev) goto error_rcu; if (dev_forward_skb(dev, skb) == NET_RX_SUCCESS) dev_dstats_rx_add(dev, data_len); else DEV_STATS_INC(dev, rx_errors); rcu_read_unlock(); return; error_rcu: rcu_read_unlock(); error: kfree_skb(skb); } static void l2tp_eth_delete(struct l2tp_session *session) { struct l2tp_eth_sess *spriv; struct net_device *dev; if (session) { spriv = l2tp_session_priv(session); rtnl_lock(); dev = rtnl_dereference(spriv->dev); if (dev) { unregister_netdevice(dev); rtnl_unlock(); module_put(THIS_MODULE); } else { rtnl_unlock(); } } } static void l2tp_eth_show(struct seq_file *m, void *arg) { struct l2tp_session *session = arg; struct l2tp_eth_sess *spriv = l2tp_session_priv(session); struct net_device *dev; rcu_read_lock(); dev = rcu_dereference(spriv->dev); if (!dev) { rcu_read_unlock(); return; } dev_hold(dev); rcu_read_unlock(); seq_printf(m, " interface %s\n", dev->name); dev_put(dev); } static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel, struct l2tp_session *session, struct net_device *dev) { unsigned int overhead = 0; u32 l3_overhead = 0; u32 mtu; /* if the encap is UDP, account for UDP header size */ if (tunnel->encap == L2TP_ENCAPTYPE_UDP) { overhead += sizeof(struct udphdr); dev->needed_headroom += sizeof(struct udphdr); } lock_sock(tunnel->sock); l3_overhead = kernel_sock_ip_overhead(tunnel->sock); release_sock(tunnel->sock); if (l3_overhead == 0) { /* The L3 overhead couldn't be identified; this can happen * because tunnel->sock was NULL or the socket's address * family was neither IPv4 nor IPv6. In that case the * device MTU stays at the default of 1500. */ return; } /* Adjust the MTU, factoring in the overhead of the underlay L3 and * the overlay L2 headers. UDP overhead, if any, was already factored in above.
*/ overhead += session->hdr_len + ETH_HLEN + l3_overhead; mtu = l2tp_tunnel_dst_mtu(tunnel) - overhead; if (mtu < dev->min_mtu || mtu > dev->max_mtu) dev->mtu = ETH_DATA_LEN - overhead; else dev->mtu = mtu; dev->needed_headroom += session->hdr_len; } static int l2tp_eth_create(struct net *net, struct l2tp_tunnel *tunnel, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg) { unsigned char name_assign_type; struct net_device *dev; char name[IFNAMSIZ]; struct l2tp_session *session; struct l2tp_eth *priv; struct l2tp_eth_sess *spriv; int rc; if (cfg->ifname) { strscpy(name, cfg->ifname, IFNAMSIZ); name_assign_type = NET_NAME_USER; } else { strcpy(name, L2TP_ETH_DEV_NAME); name_assign_type = NET_NAME_ENUM; } session = l2tp_session_create(sizeof(*spriv), tunnel, session_id, peer_session_id, cfg); if (IS_ERR(session)) { rc = PTR_ERR(session); goto err; } dev = alloc_netdev(sizeof(*priv), name, name_assign_type, l2tp_eth_dev_setup); if (!dev) { rc = -ENOMEM; goto err_sess; } dev_net_set(dev, net); dev->min_mtu = 0; dev->max_mtu = ETH_MAX_MTU; l2tp_eth_adjust_mtu(tunnel, session, dev); priv = netdev_priv(dev); priv->session = session; session->recv_skb = l2tp_eth_dev_recv; session->session_close = l2tp_eth_delete; if (IS_ENABLED(CONFIG_L2TP_DEBUGFS)) session->show = l2tp_eth_show; spriv = l2tp_session_priv(session); refcount_inc(&session->ref_count); rtnl_lock(); /* Register both device and session while holding the rtnl lock. This * ensures that l2tp_eth_delete() will see that there's a device to * unregister, even if it happened to run before we assign spriv->dev. */ rc = l2tp_session_register(session, tunnel); if (rc < 0) { rtnl_unlock(); goto err_sess_dev; } rc = register_netdevice(dev); if (rc < 0) { rtnl_unlock(); l2tp_session_delete(session); l2tp_session_put(session); free_netdev(dev); return rc; } strscpy(session->ifname, dev->name, IFNAMSIZ); rcu_assign_pointer(spriv->dev, dev); rtnl_unlock(); l2tp_session_put(session); __module_get(THIS_MODULE); return 0; err_sess_dev: l2tp_session_put(session); free_netdev(dev); err_sess: l2tp_session_put(session); err: return rc; } static const struct l2tp_nl_cmd_ops l2tp_eth_nl_cmd_ops = { .session_create = l2tp_eth_create, .session_delete = l2tp_session_delete, }; static int __init l2tp_eth_init(void) { int err = 0; err = l2tp_nl_register_ops(L2TP_PWTYPE_ETH, &l2tp_eth_nl_cmd_ops); if (err) goto err; pr_info("L2TP ethernet pseudowire support (L2TPv3)\n"); return 0; err: return err; } static void __exit l2tp_eth_exit(void) { l2tp_nl_unregister_ops(L2TP_PWTYPE_ETH); } module_init(l2tp_eth_init); module_exit(l2tp_eth_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("James Chapman <jchapman@katalix.com>"); MODULE_DESCRIPTION("L2TP ethernet pseudowire driver"); MODULE_VERSION("1.0"); MODULE_ALIAS_L2TP_PWTYPE(5);
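/*
 * Editor's note: a worked example of the MTU arithmetic performed by
 * l2tp_eth_adjust_mtu() above, assuming L2TPv3 over UDP/IPv4 with a
 * 1500-byte underlay MTU. The 8-byte session header size is an assumption
 * for illustration; session->hdr_len varies with the L2TP version and
 * cookie length. The function name is invented for this sketch.
 */
static unsigned int __maybe_unused l2tp_eth_overhead_example(void)
{
	unsigned int overhead = 0;

	overhead += 8;		/* sizeof(struct udphdr), UDP encap */
	overhead += 8;		/* session->hdr_len, assumed L2TPv3 header */
	overhead += ETH_HLEN;	/* 14-byte overlay Ethernet header */
	overhead += 20;		/* kernel_sock_ip_overhead(): IPv4 header */

	return 1500 - overhead;	/* dev->mtu becomes 1450 */
}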
/* SPDX-License-Identifier: GPL-2.0-only */ /* * Kernel-based Virtual Machine driver for Linux * * This module enables machines with Intel VT-x extensions to run virtual * machines without emulation or binary translation. * * MMU support * * Copyright (C) 2006 Qumranet, Inc. * Copyright 2010 Red Hat, Inc. and/or its affiliates. * * Authors: * Yaniv Kamay <yaniv@qumranet.com> * Avi Kivity <avi@qumranet.com> */ /* * The MMU needs to be able to access/walk 32-bit and 64-bit guest page tables, * as well as guest EPT tables, so the code in this file is compiled thrice, * once per guest PTE type. The per-type defines are #undef'd at the end. */ #if PTTYPE == 64 #define pt_element_t u64 #define guest_walker guest_walker64 #define FNAME(name) paging##64_##name #define PT_LEVEL_BITS 9 #define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT #define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT #define PT_HAVE_ACCESSED_DIRTY(mmu) true #ifdef CONFIG_X86_64 #define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL #else #define PT_MAX_FULL_LEVELS 2 #endif #elif PTTYPE == 32 #define pt_element_t u32 #define guest_walker guest_walker32 #define FNAME(name) paging##32_##name #define PT_LEVEL_BITS 10 #define PT_MAX_FULL_LEVELS 2 #define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT #define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT #define PT_HAVE_ACCESSED_DIRTY(mmu) true #define PT32_DIR_PSE36_SIZE 4 #define PT32_DIR_PSE36_SHIFT 13 #define PT32_DIR_PSE36_MASK \ (((1ULL << PT32_DIR_PSE36_SIZE) - 1) << PT32_DIR_PSE36_SHIFT) #elif PTTYPE == PTTYPE_EPT #define pt_element_t u64 #define guest_walker guest_walkerEPT #define FNAME(name) ept_##name #define PT_LEVEL_BITS 9 #define PT_GUEST_DIRTY_SHIFT 9 #define PT_GUEST_ACCESSED_SHIFT 8 #define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled) #define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL #else #error Invalid PTTYPE value #endif /* Common logic, but per-type values. These also need to be undefined. */ #define PT_BASE_ADDR_MASK ((pt_element_t)__PT_BASE_ADDR_MASK) #define PT_LVL_ADDR_MASK(lvl) __PT_LVL_ADDR_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS) #define PT_LVL_OFFSET_MASK(lvl) __PT_LVL_OFFSET_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS) #define PT_INDEX(addr, lvl) __PT_INDEX(addr, lvl, PT_LEVEL_BITS) #define PT_GUEST_DIRTY_MASK (1 << PT_GUEST_DIRTY_SHIFT) #define PT_GUEST_ACCESSED_MASK (1 << PT_GUEST_ACCESSED_SHIFT) #define gpte_to_gfn_lvl FNAME(gpte_to_gfn_lvl) #define gpte_to_gfn(pte) gpte_to_gfn_lvl((pte), PG_LEVEL_4K) /* * The guest_walker structure emulates the behavior of the hardware page * table walker.
*/ struct guest_walker { int level; unsigned max_level; gfn_t table_gfn[PT_MAX_FULL_LEVELS]; pt_element_t ptes[PT_MAX_FULL_LEVELS]; pt_element_t prefetch_ptes[PTE_PREFETCH_NUM]; gpa_t pte_gpa[PT_MAX_FULL_LEVELS]; pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS]; bool pte_writable[PT_MAX_FULL_LEVELS]; unsigned int pt_access[PT_MAX_FULL_LEVELS]; unsigned int pte_access; gfn_t gfn; struct x86_exception fault; }; #if PTTYPE == 32 static inline gfn_t pse36_gfn_delta(u32 gpte) { int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT; return (gpte & PT32_DIR_PSE36_MASK) << shift; } #endif static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl) { return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT; } static inline void FNAME(protect_clean_gpte)(struct kvm_mmu *mmu, unsigned *access, unsigned gpte) { unsigned mask; /* dirty bit is not supported, so no need to track it */ if (!PT_HAVE_ACCESSED_DIRTY(mmu)) return; BUILD_BUG_ON(PT_WRITABLE_MASK != ACC_WRITE_MASK); mask = (unsigned)~ACC_WRITE_MASK; /* Allow write access to dirty gptes */ mask |= (gpte >> (PT_GUEST_DIRTY_SHIFT - PT_WRITABLE_SHIFT)) & PT_WRITABLE_MASK; *access &= mask; } static inline int FNAME(is_present_gpte)(unsigned long pte) { #if PTTYPE != PTTYPE_EPT return pte & PT_PRESENT_MASK; #else return pte & 7; #endif } static bool FNAME(is_bad_mt_xwr)(struct rsvd_bits_validate *rsvd_check, u64 gpte) { #if PTTYPE != PTTYPE_EPT return false; #else return __is_bad_mt_xwr(rsvd_check, gpte); #endif } static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level) { return __is_rsvd_bits_set(&mmu->guest_rsvd_check, gpte, level) || FNAME(is_bad_mt_xwr)(&mmu->guest_rsvd_check, gpte); } static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, u64 *spte, u64 gpte) { if (!FNAME(is_present_gpte)(gpte)) goto no_present; /* Prefetch only accessed entries (unless A/D bits are disabled). */ if (PT_HAVE_ACCESSED_DIRTY(vcpu->arch.mmu) && !(gpte & PT_GUEST_ACCESSED_MASK)) goto no_present; if (FNAME(is_rsvd_bits_set)(vcpu->arch.mmu, gpte, PG_LEVEL_4K)) goto no_present; return false; no_present: drop_spte(vcpu->kvm, spte); return true; } /* * For PTTYPE_EPT, a page table can be executable but not readable * on supported processors. Therefore, set_spte does not automatically * set bit 0 if execute only is supported. Here, we repurpose ACC_USER_MASK * to signify readability since it isn't used in the EPT case */ static inline unsigned FNAME(gpte_access)(u64 gpte) { unsigned access; #if PTTYPE == PTTYPE_EPT access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) | ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) | ((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0); #else BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK); BUILD_BUG_ON(ACC_EXEC_MASK != 1); access = gpte & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK); /* Combine NX with P (which is set here) to get ACC_EXEC_MASK. 
*/ access ^= (gpte >> PT64_NX_SHIFT); #endif return access; } static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, struct guest_walker *walker, gpa_t addr, int write_fault) { unsigned level, index; pt_element_t pte, orig_pte; pt_element_t __user *ptep_user; gfn_t table_gfn; int ret; /* dirty/accessed bits are not supported, so no need to update them */ if (!PT_HAVE_ACCESSED_DIRTY(mmu)) return 0; for (level = walker->max_level; level >= walker->level; --level) { pte = orig_pte = walker->ptes[level - 1]; table_gfn = walker->table_gfn[level - 1]; ptep_user = walker->ptep_user[level - 1]; index = offset_in_page(ptep_user) / sizeof(pt_element_t); if (!(pte & PT_GUEST_ACCESSED_MASK)) { trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(pte)); pte |= PT_GUEST_ACCESSED_MASK; } if (level == walker->level && write_fault && !(pte & PT_GUEST_DIRTY_MASK)) { trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte)); #if PTTYPE == PTTYPE_EPT if (kvm_x86_ops.nested_ops->write_log_dirty(vcpu, addr)) return -EINVAL; #endif pte |= PT_GUEST_DIRTY_MASK; } if (pte == orig_pte) continue; /* * If the slot is read-only, simply do not process the accessed * and dirty bits. This is the correct thing to do if the slot * is ROM, and page tables in read-as-ROM/write-as-MMIO slots * are only supported if the accessed and dirty bits are already * set in the ROM (so that MMIO writes are never needed). * * Note that NPT does not allow this at all and faults, since * it always wants nested page table entries for the guest * page tables to be writable. And EPT works but will simply * overwrite the read-only memory to set the accessed and dirty * bits. */ if (unlikely(!walker->pte_writable[level - 1])) continue; ret = __try_cmpxchg_user(ptep_user, &orig_pte, pte, fault); if (ret) return ret; kvm_vcpu_mark_page_dirty(vcpu, table_gfn); walker->ptes[level - 1] = pte; } return 0; } static inline unsigned FNAME(gpte_pkeys)(struct kvm_vcpu *vcpu, u64 gpte) { unsigned pkeys = 0; #if PTTYPE == 64 pte_t pte = {.pte = gpte}; pkeys = pte_flags_pkey(pte_flags(pte)); #endif return pkeys; } static inline bool FNAME(is_last_gpte)(struct kvm_mmu *mmu, unsigned int level, unsigned int gpte) { /* * For EPT and PAE paging (both variants), bit 7 is either reserved at * all levels or indicates a huge page (ignoring CR3/EPTP). In either * case, bit 7 being set terminates the walk. */ #if PTTYPE == 32 /* * 32-bit paging requires special handling because bit 7 is ignored if * CR4.PSE=0, not reserved. Clear bit 7 in the gpte if the level is * greater than the last level for which bit 7 is the PAGE_SIZE bit. * * The RHS has bit 7 set iff level < (2 + PSE). If it is clear, bit 7 * is not reserved and does not indicate a large page at this level, * so clear PT_PAGE_SIZE_MASK in gpte if that is the case. */ gpte &= level - (PT32_ROOT_LEVEL + mmu->cpu_role.ext.cr4_pse); #endif /* * PG_LEVEL_4K always terminates. The RHS has bit 7 set * iff level <= PG_LEVEL_4K, which for our purpose means * level == PG_LEVEL_4K; set PT_PAGE_SIZE_MASK in gpte then. */ gpte |= level - PG_LEVEL_4K - 1; return gpte & PT_PAGE_SIZE_MASK; } /* * Fetch a guest pte for a guest virtual address, or for an L2's GPA.
*/ static int FNAME(walk_addr_generic)(struct guest_walker *walker, struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gpa_t addr, u64 access) { int ret; pt_element_t pte; pt_element_t __user *ptep_user; gfn_t table_gfn; u64 pt_access, pte_access; unsigned index, accessed_dirty, pte_pkey; u64 nested_access; gpa_t pte_gpa; bool have_ad; int offset; u64 walk_nx_mask = 0; const int write_fault = access & PFERR_WRITE_MASK; const int user_fault = access & PFERR_USER_MASK; const int fetch_fault = access & PFERR_FETCH_MASK; u16 errcode = 0; gpa_t real_gpa; gfn_t gfn; trace_kvm_mmu_pagetable_walk(addr, access); retry_walk: walker->level = mmu->cpu_role.base.level; pte = kvm_mmu_get_guest_pgd(vcpu, mmu); have_ad = PT_HAVE_ACCESSED_DIRTY(mmu); #if PTTYPE == 64 walk_nx_mask = 1ULL << PT64_NX_SHIFT; if (walker->level == PT32E_ROOT_LEVEL) { pte = mmu->get_pdptr(vcpu, (addr >> 30) & 3); trace_kvm_mmu_paging_element(pte, walker->level); if (!FNAME(is_present_gpte)(pte)) goto error; --walker->level; } #endif walker->max_level = walker->level; /* * FIXME: on Intel processors, loads of the PDPTE registers for PAE paging * by the MOV to CR instruction are treated as reads and do not cause the * processor to set the dirty flag in any EPT paging-structure entry. */ nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK; pte_access = ~0; /* * Queue a page fault for injection if this assertion fails, as callers * assume that walker.fault contains sane info on a walk failure. I.e. * avoid making the situation worse by inducing even worse badness * between when the assertion fails and when KVM kicks the vCPU out to * userspace (because the VM is bugged). */ if (KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm)) goto error; ++walker->level; do { struct kvm_memory_slot *slot; unsigned long host_addr; pt_access = pte_access; --walker->level; index = PT_INDEX(addr, walker->level); table_gfn = gpte_to_gfn(pte); offset = index * sizeof(pt_element_t); pte_gpa = gfn_to_gpa(table_gfn) + offset; BUG_ON(walker->level < 1); walker->table_gfn[walker->level - 1] = table_gfn; walker->pte_gpa[walker->level - 1] = pte_gpa; real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn), nested_access, &walker->fault); /* * FIXME: This can happen if emulation (e.g. of an INS/OUTS * instruction) triggers a nested page fault. The exit * qualification / exit info field will incorrectly have * "guest page access" as the nested page fault's cause, * instead of "guest page structure access". To fix this, * the x86_exception struct should be augmented with enough * information to fix the exit_qualification or exit_info_1 * fields. */ if (unlikely(real_gpa == INVALID_GPA)) return 0; slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(real_gpa)); if (!kvm_is_visible_memslot(slot)) goto error; host_addr = gfn_to_hva_memslot_prot(slot, gpa_to_gfn(real_gpa), &walker->pte_writable[walker->level - 1]); if (unlikely(kvm_is_error_hva(host_addr))) goto error; ptep_user = (pt_element_t __user *)((void *)host_addr + offset); if (unlikely(__get_user(pte, ptep_user))) goto error; walker->ptep_user[walker->level - 1] = ptep_user; trace_kvm_mmu_paging_element(pte, walker->level); /* * Inverting the NX bit lets us AND it like other * permission bits.
*/ pte_access = pt_access & (pte ^ walk_nx_mask); if (unlikely(!FNAME(is_present_gpte)(pte))) goto error; if (unlikely(FNAME(is_rsvd_bits_set)(mmu, pte, walker->level))) { errcode = PFERR_RSVD_MASK | PFERR_PRESENT_MASK; goto error; } walker->ptes[walker->level - 1] = pte; /* Convert to ACC_*_MASK flags for struct guest_walker. */ walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask); } while (!FNAME(is_last_gpte)(mmu, walker->level, pte)); pte_pkey = FNAME(gpte_pkeys)(vcpu, pte); accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0; /* Convert to ACC_*_MASK flags for struct guest_walker. */ walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask); errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access); if (unlikely(errcode)) goto error; gfn = gpte_to_gfn_lvl(pte, walker->level); gfn += (addr & PT_LVL_OFFSET_MASK(walker->level)) >> PAGE_SHIFT; #if PTTYPE == 32 if (walker->level > PG_LEVEL_4K && is_cpuid_PSE36()) gfn += pse36_gfn_delta(pte); #endif real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault); if (real_gpa == INVALID_GPA) return 0; walker->gfn = real_gpa >> PAGE_SHIFT; if (!write_fault) FNAME(protect_clean_gpte)(mmu, &walker->pte_access, pte); else /* * On a write fault, fold the dirty bit into accessed_dirty. * For modes without A/D bit support, accessed_dirty will * always be clear. */ accessed_dirty &= pte >> (PT_GUEST_DIRTY_SHIFT - PT_GUEST_ACCESSED_SHIFT); if (unlikely(!accessed_dirty)) { ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, addr, write_fault); if (unlikely(ret < 0)) goto error; else if (ret) goto retry_walk; } return 1; error: errcode |= write_fault | user_fault; if (fetch_fault && (is_efer_nx(mmu) || is_cr4_smep(mmu))) errcode |= PFERR_FETCH_MASK; walker->fault.vector = PF_VECTOR; walker->fault.error_code_valid = true; walker->fault.error_code = errcode; #if PTTYPE == PTTYPE_EPT /* * Use PFERR_RSVD_MASK in error_code to tell if an EPT * misconfiguration needs to be injected. The detection is * done by is_rsvd_bits_set() above. * * We set up the value of exit_qualification to inject: * [2:0] - Derive from the access bits. The exit_qualification might be * out of date if it is serving an EPT misconfiguration. * [5:3] - Calculated by the page walk of the guest EPT page tables * [7:8] - Derived from [7:8] of real exit_qualification * * The other bits are set to 0. */ if (!(errcode & PFERR_RSVD_MASK)) { walker->fault.exit_qualification = 0; if (write_fault) walker->fault.exit_qualification |= EPT_VIOLATION_ACC_WRITE; if (user_fault) walker->fault.exit_qualification |= EPT_VIOLATION_ACC_READ; if (fetch_fault) walker->fault.exit_qualification |= EPT_VIOLATION_ACC_INSTR; /* * Note, pte_access holds the raw RWX bits from the EPTE, not * ACC_*_MASK flags!
*/ walker->fault.exit_qualification |= (pte_access & VMX_EPT_RWX_MASK) << EPT_VIOLATION_RWX_SHIFT; } #endif walker->fault.address = addr; walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu; walker->fault.async_page_fault = false; trace_kvm_mmu_walker_error(walker->fault.error_code); return 0; } static int FNAME(walk_addr)(struct guest_walker *walker, struct kvm_vcpu *vcpu, gpa_t addr, u64 access) { return FNAME(walk_addr_generic)(walker, vcpu, vcpu->arch.mmu, addr, access); } static bool FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, u64 *spte, pt_element_t gpte) { unsigned pte_access; gfn_t gfn; if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte)) return false; gfn = gpte_to_gfn(gpte); pte_access = sp->role.access & FNAME(gpte_access)(gpte); FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte); return kvm_mmu_prefetch_sptes(vcpu, gfn, spte, 1, pte_access); } static bool FNAME(gpte_changed)(struct kvm_vcpu *vcpu, struct guest_walker *gw, int level) { pt_element_t curr_pte; gpa_t base_gpa, pte_gpa = gw->pte_gpa[level - 1]; u64 mask; int r, index; if (level == PG_LEVEL_4K) { mask = PTE_PREFETCH_NUM * sizeof(pt_element_t) - 1; base_gpa = pte_gpa & ~mask; index = (pte_gpa - base_gpa) / sizeof(pt_element_t); r = kvm_vcpu_read_guest_atomic(vcpu, base_gpa, gw->prefetch_ptes, sizeof(gw->prefetch_ptes)); curr_pte = gw->prefetch_ptes[index]; } else r = kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &curr_pte, sizeof(curr_pte)); return r || curr_pte != gw->ptes[level - 1]; } static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw, u64 *sptep) { struct kvm_mmu_page *sp; pt_element_t *gptep = gw->prefetch_ptes; u64 *spte; int i; sp = sptep_to_sp(sptep); if (sp->role.level > PG_LEVEL_4K) return; /* * If addresses are being invalidated, skip prefetching to avoid * accidentally prefetching those addresses. */ if (unlikely(vcpu->kvm->mmu_invalidate_in_progress)) return; if (sp->role.direct) return __direct_pte_prefetch(vcpu, sp, sptep); i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1); spte = sp->spt + i; for (i = 0; i < PTE_PREFETCH_NUM; i++, spte++) { if (spte == sptep) continue; if (is_shadow_present_pte(*spte)) continue; if (!FNAME(prefetch_gpte)(vcpu, sp, spte, gptep[i])) break; } } /* * Fetch a shadow pte for a specific level in the paging hierarchy. * If the guest tries to write a write-protected page, we need to * emulate this operation, return 1 to indicate this case. */ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, struct guest_walker *gw) { struct kvm_mmu_page *sp = NULL; struct kvm_shadow_walk_iterator it; unsigned int direct_access, access; int top_level, ret; gfn_t base_gfn = fault->gfn; WARN_ON_ONCE(gw->gfn != base_gfn); direct_access = gw->pte_access; top_level = vcpu->arch.mmu->cpu_role.base.level; if (top_level == PT32E_ROOT_LEVEL) top_level = PT32_ROOT_LEVEL; /* * Verify that the top-level gpte is still there. Since the page * is a root page, it is either write protected (and cannot be * changed from now on) or it is invalid (in which case, we don't * really care if it changes underneath us after this point). */ if (FNAME(gpte_changed)(vcpu, gw, top_level)) return RET_PF_RETRY; if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa))) return RET_PF_RETRY; /* * Load a new root and retry the faulting instruction in the extremely * unlikely scenario that the guest root gfn became visible between * loading a dummy root and handling the resulting page fault, e.g. 
if * userspace creates a memslot in the interim. */ if (unlikely(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa))) { kvm_make_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu); return RET_PF_RETRY; } for_each_shadow_entry(vcpu, fault->addr, it) { gfn_t table_gfn; clear_sp_write_flooding_count(it.sptep); if (it.level == gw->level) break; table_gfn = gw->table_gfn[it.level - 2]; access = gw->pt_access[it.level - 2]; sp = kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn, false, access); /* * Synchronize the new page before linking it, as the CPU (KVM) * is architecturally disallowed from inserting non-present * entries into the TLB, i.e. the guest isn't required to flush * the TLB when changing the gPTE from non-present to present. * * For PG_LEVEL_4K, kvm_mmu_find_shadow_page() has already * synchronized the page via kvm_sync_page(). * * For higher level pages, which cannot be unsync themselves * but can have unsync children, synchronize via the slower * mmu_sync_children(). If KVM needs to drop mmu_lock due to * contention or to reschedule, instruct the caller to retry * the #PF (mmu_sync_children() ensures forward progress will * be made). */ if (sp != ERR_PTR(-EEXIST) && sp->unsync_children && mmu_sync_children(vcpu, sp, false)) return RET_PF_RETRY; /* * Verify that the gpte in the page, which is now either * write-protected or unsync, wasn't modified between the fault * and acquiring mmu_lock. This needs to be done even when * reusing an existing shadow page to ensure the information * gathered by the walker matches the information stored in the * shadow page (which could have been modified by a different * vCPU even if the page was already linked). Holding mmu_lock * prevents the shadow page from changing after this point. */ if (FNAME(gpte_changed)(vcpu, gw, it.level - 1)) return RET_PF_RETRY; if (sp != ERR_PTR(-EEXIST)) link_shadow_page(vcpu, it.sptep, sp); if (fault->write && table_gfn == fault->gfn) fault->write_fault_to_shadow_pgtable = true; } /* * Adjust the hugepage size _after_ resolving indirect shadow pages. * KVM doesn't support mapping hugepages into the guest for gfns that * are being shadowed by KVM, i.e. allocating a new shadow page may * affect the allowed hugepage size. */ kvm_mmu_hugepage_adjust(vcpu, fault); trace_kvm_mmu_spte_requested(fault); for (; shadow_walk_okay(&it); shadow_walk_next(&it)) { /* * We cannot overwrite existing page tables with an NX * large page, as the leaf could be executable. */ if (fault->nx_huge_page_workaround_enabled) disallowed_hugepage_adjust(fault, *it.sptep, it.level); base_gfn = gfn_round_for_level(fault->gfn, it.level); if (it.level == fault->goal_level) break; validate_direct_spte(vcpu, it.sptep, direct_access); sp = kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, direct_access); if (sp == ERR_PTR(-EEXIST)) continue; link_shadow_page(vcpu, it.sptep, sp); if (fault->huge_page_disallowed) account_nx_huge_page(vcpu->kvm, sp, fault->req_level >= it.level); } if (WARN_ON_ONCE(it.level != fault->goal_level)) return -EFAULT; ret = mmu_set_spte(vcpu, fault->slot, it.sptep, gw->pte_access, base_gfn, fault->pfn, fault); if (ret == RET_PF_SPURIOUS) return ret; FNAME(pte_prefetch)(vcpu, gw, it.sptep); return ret; } /* * Page fault handler.
There are several causes for a page fault: * - there is no shadow pte for the guest pte * - write access through a shadow pte marked read only so that we can set * the dirty bit * - write access to a shadow pte marked read only so we can update the page * dirty bitmap, when userspace requests it * - mmio access; in this case we will never install a present shadow pte * - normal guest page fault due to the guest pte marked not present, not * writable, or not executable * * Returns: 1 if we need to emulate the instruction, 0 otherwise, or * a negative value on error. */ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct guest_walker walker; int r; WARN_ON_ONCE(fault->is_tdp); /* * Look up the guest pte for the faulting address. * If PFEC.RSVD is set, this is a shadow page fault. * The bit needs to be cleared before walking guest page tables. */ r = FNAME(walk_addr)(&walker, vcpu, fault->addr, fault->error_code & ~PFERR_RSVD_MASK); /* * The page is not mapped by the guest. Let the guest handle it. */ if (!r) { if (!fault->prefetch) kvm_inject_emulated_page_fault(vcpu, &walker.fault); return RET_PF_RETRY; } fault->gfn = walker.gfn; fault->max_level = walker.level; fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn); if (page_fault_handle_page_track(vcpu, fault)) { shadow_page_table_clear_flood(vcpu, fault->addr); return RET_PF_WRITE_PROTECTED; } r = mmu_topup_memory_caches(vcpu, true); if (r) return r; r = kvm_mmu_faultin_pfn(vcpu, fault, walker.pte_access); if (r != RET_PF_CONTINUE) return r; /* * Do not change pte_access if the pfn is a mmio page, otherwise * we will cache the incorrect access into mmio spte. */ if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) && !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) { walker.pte_access |= ACC_WRITE_MASK; walker.pte_access &= ~ACC_USER_MASK; /* * If we converted a user page to a kernel page, * so that the kernel can write to it when cr0.wp=0, * then we should prevent the kernel from executing it * if SMEP is enabled. */ if (is_cr4_smep(vcpu->arch.mmu)) walker.pte_access &= ~ACC_EXEC_MASK; } r = RET_PF_RETRY; write_lock(&vcpu->kvm->mmu_lock); if (is_page_fault_stale(vcpu, fault)) goto out_unlock; r = make_mmu_pages_available(vcpu); if (r) goto out_unlock; r = FNAME(fetch)(vcpu, fault, &walker); out_unlock: kvm_mmu_finish_page_fault(vcpu, fault, r); write_unlock(&vcpu->kvm->mmu_lock); return r; } static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp) { int offset = 0; WARN_ON_ONCE(sp->role.level != PG_LEVEL_4K); if (PTTYPE == 32) offset = sp->role.quadrant << SPTE_LEVEL_BITS; return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t); } /* Note, @addr is a GPA when gva_to_gpa() translates an L2 GPA to an L1 GPA. */ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gpa_t addr, u64 access, struct x86_exception *exception) { struct guest_walker walker; gpa_t gpa = INVALID_GPA; int r; #ifndef CONFIG_X86_64 /* A 64-bit GVA should be impossible on 32-bit KVM. 
*/ WARN_ON_ONCE((addr >> 32) && mmu == vcpu->arch.walk_mmu); #endif r = FNAME(walk_addr_generic)(&walker, vcpu, mmu, addr, access); if (r) { gpa = gfn_to_gpa(walker.gfn); gpa |= addr & ~PAGE_MASK; } else if (exception) *exception = walker.fault; return gpa; } /* * Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()) is * safe because SPTEs are protected by mmu_notifiers and memslot generations, so * the pfn for a given gfn can't change unless all SPTEs pointing to the gfn are * nuked first. * * Returns * < 0: failed to sync spte * 0: the spte is synced and no tlb flushing is required * > 0: the spte is synced and tlb flushing is required */ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i) { bool host_writable; gpa_t first_pte_gpa; u64 *sptep, spte; struct kvm_memory_slot *slot; unsigned pte_access; pt_element_t gpte; gpa_t pte_gpa; gfn_t gfn; if (WARN_ON_ONCE(sp->spt[i] == SHADOW_NONPRESENT_VALUE || !sp->shadowed_translation)) return 0; first_pte_gpa = FNAME(get_level1_sp_gpa)(sp); pte_gpa = first_pte_gpa + i * sizeof(pt_element_t); if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte, sizeof(pt_element_t))) return -1; if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) return 1; gfn = gpte_to_gfn(gpte); pte_access = sp->role.access; pte_access &= FNAME(gpte_access)(gpte); FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte); if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access)) return 0; /* * Drop the SPTE if the new protections result in no effective * "present" bit or if the gfn is changing. The former case * only affects EPT with execute-only support with pte_access==0; * all other paging modes will create a read-only SPTE if * pte_access is zero. */ if ((pte_access | shadow_present_mask) == SHADOW_NONPRESENT_VALUE || gfn != kvm_mmu_page_get_gfn(sp, i)) { drop_spte(vcpu->kvm, &sp->spt[i]); return 1; } /* * Do nothing if the permissions are unchanged. The existing SPTE is * still valid, and prefetch_invalid_gpte() has verified that the A/D bits * are set in the "new" gPTE, i.e. there is no danger of missing an A/D * update due to A/D bits being set in the SPTE but not the gPTE. */ if (kvm_mmu_page_get_access(sp, i) == pte_access) return 0; /* Update the shadowed access bits in case they changed. */ kvm_mmu_page_set_access(sp, i, pte_access); sptep = &sp->spt[i]; spte = *sptep; host_writable = spte & shadow_host_writable_mask; slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); make_spte(vcpu, sp, slot, pte_access, gfn, spte_to_pfn(spte), spte, true, true, host_writable, &spte); /* * There is no need to mark the pfn dirty, as the new protections must * be a subset of the old protections, i.e. synchronizing a SPTE cannot * change the SPTE from read-only to writable. */ return mmu_spte_update(sptep, spte); } #undef pt_element_t #undef guest_walker #undef FNAME #undef PT_BASE_ADDR_MASK #undef PT_INDEX #undef PT_LVL_ADDR_MASK #undef PT_LVL_OFFSET_MASK #undef PT_LEVEL_BITS #undef PT_MAX_FULL_LEVELS #undef gpte_to_gfn #undef gpte_to_gfn_lvl #undef PT_GUEST_ACCESSED_MASK #undef PT_GUEST_DIRTY_MASK #undef PT_GUEST_DIRTY_SHIFT #undef PT_GUEST_ACCESSED_SHIFT #undef PT_HAVE_ACCESSED_DIRTY
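/*
 * Editor's note: paging_tmpl.h is a template; it is never compiled on its
 * own. A sketch of how the including file instantiates it three times,
 * once per guest PTE type (arch/x86/kvm/mmu/mmu.c does essentially this,
 * with PTTYPE_EPT being an arbitrary value distinct from 32 and 64):
 */
#define PTTYPE_EPT 18
#define PTTYPE PTTYPE_EPT
#include "paging_tmpl.h"	/* generates ept_walk_addr(), ept_page_fault(), ... */
#undef PTTYPE

#define PTTYPE 64
#include "paging_tmpl.h"	/* generates paging64_walk_addr(), ... */
#undef PTTYPE

#define PTTYPE 32
#include "paging_tmpl.h"	/* generates paging32_walk_addr(), ... */
#undef PTTYPE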
/* SPDX-License-Identifier: GPL-2.0 */ /* * Copyright (C) 2007 Oracle. All rights reserved. */ #ifndef BTRFS_INODE_H #define BTRFS_INODE_H #include <linux/hash.h> #include <linux/refcount.h> #include <linux/spinlock.h> #include <linux/mutex.h> #include <linux/rwsem.h> #include <linux/fs.h> #include <linux/mm.h> #include <linux/compiler.h> #include <linux/fscrypt.h> #include <linux/lockdep.h> #include <uapi/linux/btrfs_tree.h> #include <trace/events/btrfs.h> #include "block-rsv.h" #include "extent_map.h" #include "extent_io.h" #include "extent-io-tree.h" #include "ordered-data.h" #include "delayed-inode.h" struct extent_state; struct posix_acl; struct iov_iter; struct writeback_control; struct btrfs_root; struct btrfs_fs_info; struct btrfs_trans_handle; /* * Since we search a directory based on f_pos (struct dir_context::pos) we have * to start at 2 since '.' and '..' have f_pos of 0 and 1 respectively, so * everybody else has to start at 2 (see btrfs_real_readdir() and dir_emit_dots()).
*/ #define BTRFS_DIR_START_INDEX 2 /* * ordered_data_close is set by truncate when a file that used * to have good data has been truncated to zero. When it is set * the btrfs file release call will add this inode to the * ordered operations list so that we make sure to flush out any * new data the application may have written before commit. */ enum { BTRFS_INODE_FLUSH_ON_CLOSE, BTRFS_INODE_DUMMY, BTRFS_INODE_IN_DEFRAG, BTRFS_INODE_HAS_ASYNC_EXTENT, /* * Always set under the VFS' inode lock, otherwise it can cause races * during fsync (we start as a fast fsync and then end up in a full * fsync racing with ordered extent completion). */ BTRFS_INODE_NEEDS_FULL_SYNC, BTRFS_INODE_COPY_EVERYTHING, BTRFS_INODE_HAS_PROPS, BTRFS_INODE_SNAPSHOT_FLUSH, /* * Set and used when logging an inode and it serves to signal that an * inode does not have xattrs, so subsequent fsyncs can avoid searching * for xattrs to log. This bit must be cleared whenever an xattr is added * to an inode. */ BTRFS_INODE_NO_XATTRS, /* * Set when we are in a context where we need to start a transaction and * have dirty pages with the respective file range locked. This is to * ensure that when reserving space for the transaction, if we are low * on available space and need to flush delalloc, we will not flush * delalloc for this inode, because that could result in a deadlock (on * the file range, inode's io_tree). */ BTRFS_INODE_NO_DELALLOC_FLUSH, /* * Set when we are working on enabling verity for a file. Computing and * writing the whole Merkle tree can take a while so we want to prevent * races where two separate tasks attempt to simultaneously start verity * on the same file. */ BTRFS_INODE_VERITY_IN_PROGRESS, /* Set when this inode is a free space inode. */ BTRFS_INODE_FREE_SPACE_INODE, /* Set when there are no capabilities in xattrs for the inode. */ BTRFS_INODE_NO_CAP_XATTR, /* * Set if an error happened when doing a COW write before submitting a * bio or during writeback. Used for both buffered writes and direct IO * writes. This is to signal a fast fsync that it has to wait for * ordered extents to complete and therefore not log extent maps that * point to unwritten extents (when an ordered extent completes and it * has the BTRFS_ORDERED_IOERR flag set, it drops extent maps in its * range). */ BTRFS_INODE_COW_WRITE_ERROR, /* * Indicate this is a directory that points to a subvolume for which * there is no root reference item. That's a case like the following: * * $ btrfs subvolume create /mnt/parent * $ btrfs subvolume create /mnt/parent/child * $ btrfs subvolume snapshot /mnt/parent /mnt/snap * * If subvolume "parent" is root 256, subvolume "child" is root 257 and * snapshot "snap" is root 258, then there's no root reference item (key * BTRFS_ROOT_REF_KEY in the root tree) for the subvolume "child" * associated to root 258 (the snapshot) - there's only one for the root * of the "parent" subvolume (root 256). In the root tree we have a * (256 BTRFS_ROOT_REF_KEY 257) key but we don't have a * (258 BTRFS_ROOT_REF_KEY 257) key - the same goes for backrefs, we * have a (257 BTRFS_ROOT_BACKREF_KEY 256) but we don't have a * (257 BTRFS_ROOT_BACKREF_KEY 258) key. * * So when opening the "child" dentry from the snapshot's directory, * we don't find a root ref item and we create a stub inode. This is * done at new_simple_dir(), called from btrfs_lookup_dentry().
*/ BTRFS_INODE_ROOT_STUB, }; /* in memory btrfs inode */ struct btrfs_inode { /* which subvolume this inode belongs to */ struct btrfs_root *root; #if BITS_PER_LONG == 32 /* * The objectid of the corresponding BTRFS_INODE_ITEM_KEY. * On 64-bit platforms we can get it from vfs_inode.i_ino, which is an * unsigned long and therefore 64 bits on such platforms. */ u64 objectid; #endif /* Cached value of inode property 'compression'. */ u8 prop_compress; /* * Force compression on the file using the defrag ioctl, could be * different from prop_compress and takes precedence if set. */ u8 defrag_compress; /* * Lock for counters and all fields used to determine if the inode is in * the log or not (last_trans, last_sub_trans, last_log_commit, * logged_trans), to access/update delalloc_bytes, new_delalloc_bytes, * defrag_bytes, disk_i_size, outstanding_extents, csum_bytes and to * update the VFS' inode number of bytes used. * Also protects setting struct file::private_data. */ spinlock_t lock; /* the extent_tree has caches of all the extent mappings to disk */ struct extent_map_tree extent_tree; /* the io_tree does range state (DIRTY, LOCKED etc) */ struct extent_io_tree io_tree; /* * Keep track of where the inode has extent items mapped in order to * make sure the i_size adjustments are accurate. Not required when the * filesystem is NO_HOLES; the status can't be set while mounted as * it's a mkfs-time feature. */ struct extent_io_tree *file_extent_tree; /* held while logging the inode in tree-log.c */ struct mutex log_mutex; /* * Counters to keep track of the number of extent items we may use due * to delalloc and such. outstanding_extents is the number of extent * items we think we'll end up using, and reserved_extents is the number * of extent items we've reserved metadata for. Protected by 'lock'. */ unsigned outstanding_extents; /* used to order data wrt metadata */ spinlock_t ordered_tree_lock; struct rb_root ordered_tree; struct rb_node *ordered_tree_last; /* list of all the delalloc inodes in the FS. There are times we need * to write all the delalloc pages to disk, and this list is used * to walk them all. */ struct list_head delalloc_inodes; unsigned long runtime_flags; /* full 64-bit generation number, struct vfs_inode doesn't have a big * enough field for this. */ u64 generation; /* * ID of the transaction handle that last modified this inode. * Protected by 'lock'. */ u64 last_trans; /* * ID of the transaction that last logged this inode. * Protected by 'lock'. */ u64 logged_trans; /* * Log transaction ID when this inode was last modified. * Protected by 'lock'. */ int last_sub_trans; /* A local copy of root's last_log_commit. Protected by 'lock'. */ int last_log_commit; union { /* * Total number of bytes pending delalloc, used by stat to * calculate the real block usage of the file. This is used * only for files. Protected by 'lock'. */ u64 delalloc_bytes; /* * The lowest possible index of the next dir index key which * points to an inode that needs to be logged. * This is used only for directories. * Use the helpers btrfs_get_first_dir_index_to_log() and * btrfs_set_first_dir_index_to_log() to access this field. */ u64 first_dir_index_to_log; }; union { /* * Total number of bytes pending delalloc that fall within a file * range that is either a hole or beyond EOF (and no prealloc extent * exists in the range). This is always <= delalloc_bytes and this * is used only for files. Protected by 'lock'. */ u64 new_delalloc_bytes; /* * The offset of the last dir index key that was logged.
* This is used only for directories. */ u64 last_dir_index_offset; }; union { /* * Total number of bytes pending defrag, used by stat to check whether * it needs COW. Protected by 'lock'. * Used by inodes other than the data relocation inode. */ u64 defrag_bytes; /* * Logical address of the block group being relocated. * Used only by the data relocation inode. */ u64 reloc_block_group_start; }; /* * The size of the file stored in the metadata on disk. data=ordered * means the in-memory i_size might be larger than the size on disk * because not all the blocks are written yet. Protected by 'lock'. */ u64 disk_i_size; union { /* * If this is a directory then index_cnt is the counter for the * index number for new files that are created. For an empty * directory, this must be initialized to BTRFS_DIR_START_INDEX. */ u64 index_cnt; /* * If this is not a directory, this is the number of bytes * outstanding that are going to need csums. This is used in * ENOSPC accounting. Protected by 'lock'. */ u64 csum_bytes; }; /* Cache the directory index number to speed the dir/file remove */ u64 dir_index; /* the fsync log has some corner cases that mean we have to check * directories to see if any unlinks have been done before * the directory was logged. See tree-log.c for all the * details */ u64 last_unlink_trans; union { /* * The id/generation of the last transaction where this inode * was either the source or the destination of a clone/dedupe * operation. Used when logging an inode to know if there are * shared extents that need special care when logging checksum * items, to avoid duplicate checksum items in a log (which can * lead to a corruption where we end up with missing checksum * ranges after log replay). Protected by the VFS inode lock. * Used for regular files only. */ u64 last_reflink_trans; /* * In case this is a root stub inode (BTRFS_INODE_ROOT_STUB flag set), * the ID of that root. */ u64 ref_root_id; }; /* Backwards incompatible flags, lower half of inode_item::flags */ u32 flags; /* Read-only compatibility flags, upper half of inode_item::flags */ u32 ro_flags; struct btrfs_block_rsv block_rsv; struct btrfs_delayed_node *delayed_node; /* File creation time. */ u64 i_otime_sec; u32 i_otime_nsec; /* Hook into fs_info->delayed_iputs */ struct list_head delayed_iput; struct rw_semaphore i_mmap_lock; struct inode vfs_inode; }; static inline u64 btrfs_get_first_dir_index_to_log(const struct btrfs_inode *inode) { return READ_ONCE(inode->first_dir_index_to_log); } static inline void btrfs_set_first_dir_index_to_log(struct btrfs_inode *inode, u64 index) { WRITE_ONCE(inode->first_dir_index_to_log, index); } /* Type checked and const-preserving VFS inode -> btrfs inode. */ #define BTRFS_I(_inode) \ _Generic(_inode, \ struct inode *: container_of(_inode, struct btrfs_inode, vfs_inode), \ const struct inode *: (const struct btrfs_inode *)container_of( \ _inode, const struct btrfs_inode, vfs_inode)) static inline unsigned long btrfs_inode_hash(u64 objectid, const struct btrfs_root *root) { u64 h = objectid ^ (root->root_key.objectid * GOLDEN_RATIO_PRIME); #if BITS_PER_LONG == 32 h = (h >> 32) ^ (h & 0xffffffff); #endif return (unsigned long)h; } #if BITS_PER_LONG == 32 /* * On 32-bit systems the i_ino of struct inode is 32 bits (unsigned long), so * we use the inode's location objectid which is a u64 to avoid truncation.
*/ static inline u64 btrfs_ino(const struct btrfs_inode *inode) { u64 ino = inode->objectid; if (test_bit(BTRFS_INODE_ROOT_STUB, &inode->runtime_flags)) ino = inode->vfs_inode.i_ino; return ino; } #else static inline u64 btrfs_ino(const struct btrfs_inode *inode) { return inode->vfs_inode.i_ino; } #endif static inline void btrfs_get_inode_key(const struct btrfs_inode *inode, struct btrfs_key *key) { key->objectid = btrfs_ino(inode); key->type = BTRFS_INODE_ITEM_KEY; key->offset = 0; } static inline void btrfs_set_inode_number(struct btrfs_inode *inode, u64 ino) { #if BITS_PER_LONG == 32 inode->objectid = ino; #endif inode->vfs_inode.i_ino = ino; } static inline void btrfs_i_size_write(struct btrfs_inode *inode, u64 size) { i_size_write(&inode->vfs_inode, size); inode->disk_i_size = size; } static inline bool btrfs_is_free_space_inode(const struct btrfs_inode *inode) { return test_bit(BTRFS_INODE_FREE_SPACE_INODE, &inode->runtime_flags); } static inline bool is_data_inode(const struct btrfs_inode *inode) { return btrfs_ino(inode) != BTRFS_BTREE_INODE_OBJECTID; } static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode, int mod) { lockdep_assert_held(&inode->lock); inode->outstanding_extents += mod; if (btrfs_is_free_space_inode(inode)) return; trace_btrfs_inode_mod_outstanding_extents(inode->root, btrfs_ino(inode), mod, inode->outstanding_extents); } /* * Called every time after doing a buffered, direct IO or memory mapped write. * * This is to ensure that if we write to a file that was previously fsynced in * the current transaction, then try to fsync it again in the same transaction, * we will know that there were changes in the file and that it needs to be * logged. */ static inline void btrfs_set_inode_last_sub_trans(struct btrfs_inode *inode) { spin_lock(&inode->lock); inode->last_sub_trans = inode->root->log_transid; spin_unlock(&inode->lock); } /* * Should be called while holding the inode's VFS lock in exclusive mode, or * while holding the inode's mmap lock (struct btrfs_inode::i_mmap_lock) in * either shared or exclusive mode, or in a context where no one else can access * the inode concurrently (during inode creation or when loading an inode from * disk). */ static inline void btrfs_set_inode_full_sync(struct btrfs_inode *inode) { set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags); /* * The inode may have been part of a reflink operation in the last * transaction that modified it, and then a fsync has reset the * last_reflink_trans to prevent subsequent fsyncs in the same * transaction from doing unnecessary work. So update last_reflink_trans * to the last_trans value (we have to be pessimistic and assume a * reflink happened). * * The ->last_trans is protected by the inode's spinlock and we can * have a concurrent ordered extent completion update it. Also set * last_reflink_trans to ->last_trans only if the former is less than * the latter, because we can be called in a context where * last_reflink_trans was set to the current transaction generation * while ->last_trans was not yet updated in the current transaction, * and therefore has a lower value.
*/ spin_lock(&inode->lock); if (inode->last_reflink_trans < inode->last_trans) inode->last_reflink_trans = inode->last_trans; spin_unlock(&inode->lock); } static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation) { bool ret = false; spin_lock(&inode->lock); if (inode->logged_trans == generation && inode->last_sub_trans <= inode->last_log_commit && inode->last_sub_trans <= btrfs_get_root_last_log_commit(inode->root)) ret = true; spin_unlock(&inode->lock); return ret; } /* * Check if the inode has flags compatible with compression */ static inline bool btrfs_inode_can_compress(const struct btrfs_inode *inode) { if (inode->flags & BTRFS_INODE_NODATACOW || inode->flags & BTRFS_INODE_NODATASUM) return false; return true; } static inline void btrfs_assert_inode_locked(struct btrfs_inode *inode) { /* Immediately trigger a crash if the inode is not locked. */ ASSERT(inode_is_locked(&inode->vfs_inode)); /* Trigger a splat in dmesg if this task is not holding the lock. */ lockdep_assert_held(&inode->vfs_inode.i_rwsem); } /* Array of bytes with variable length, hexadecimal format 0x1234 */ #define CSUM_FMT "0x%*phN" #define CSUM_FMT_VALUE(size, bytes) size, bytes int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page, u32 pgoff, u8 *csum, const u8 * const csum_expected); bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev, u32 bio_offset, struct bio_vec *bv); noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len, struct btrfs_file_extent *file_extent, bool nowait); void btrfs_del_delalloc_inode(struct btrfs_inode *inode); struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry); int btrfs_set_inode_index(struct btrfs_inode *dir, u64 *index); int btrfs_unlink_inode(struct btrfs_trans_handle *trans, struct btrfs_inode *dir, struct btrfs_inode *inode, const struct fscrypt_str *name); int btrfs_add_link(struct btrfs_trans_handle *trans, struct btrfs_inode *parent_inode, struct btrfs_inode *inode, const struct fscrypt_str *name, int add_backref, u64 index); int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry); int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len, int front); int btrfs_start_delalloc_snapshot(struct btrfs_root *root, bool in_reclaim_context); int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr, bool in_reclaim_context); int btrfs_set_extent_delalloc(struct btrfs_inode *inode, u64 start, u64 end, unsigned int extra_bits, struct extent_state **cached_state); struct btrfs_new_inode_args { /* Input */ struct inode *dir; struct dentry *dentry; struct inode *inode; bool orphan; bool subvol; /* Output from btrfs_new_inode_prepare(), input to btrfs_create_new_inode(). 
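 * These fields are filled in by btrfs_new_inode_prepare() and are released
 * again by btrfs_new_inode_args_destroy() once the new inode has been
 * created (or creation has been aborted).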
*/ struct posix_acl *default_acl; struct posix_acl *acl; struct fscrypt_name fname; }; int btrfs_new_inode_prepare(struct btrfs_new_inode_args *args, unsigned int *trans_num_items); int btrfs_create_new_inode(struct btrfs_trans_handle *trans, struct btrfs_new_inode_args *args); void btrfs_new_inode_args_destroy(struct btrfs_new_inode_args *args); struct inode *btrfs_new_subvol_inode(struct mnt_idmap *idmap, struct inode *dir); void btrfs_set_delalloc_extent(struct btrfs_inode *inode, struct extent_state *state, u32 bits); void btrfs_clear_delalloc_extent(struct btrfs_inode *inode, struct extent_state *state, u32 bits); void btrfs_merge_delalloc_extent(struct btrfs_inode *inode, struct extent_state *new, struct extent_state *other); void btrfs_split_delalloc_extent(struct btrfs_inode *inode, struct extent_state *orig, u64 split); void btrfs_evict_inode(struct inode *inode); struct inode *btrfs_alloc_inode(struct super_block *sb); void btrfs_destroy_inode(struct inode *inode); void btrfs_free_inode(struct inode *inode); int btrfs_drop_inode(struct inode *inode); int __init btrfs_init_cachep(void); void __cold btrfs_destroy_cachep(void); struct inode *btrfs_iget_path(u64 ino, struct btrfs_root *root, struct btrfs_path *path); struct inode *btrfs_iget(u64 ino, struct btrfs_root *root); struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, struct folio *folio, u64 start, u64 len); int btrfs_update_inode(struct btrfs_trans_handle *trans, struct btrfs_inode *inode); int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans, struct btrfs_inode *inode); int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct btrfs_inode *inode); int btrfs_orphan_cleanup(struct btrfs_root *root); int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size); void btrfs_add_delayed_iput(struct btrfs_inode *inode); void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info); int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info); int btrfs_prealloc_file_range(struct inode *inode, int mode, u64 start, u64 num_bytes, u64 min_size, loff_t actual_len, u64 *alloc_hint); int btrfs_prealloc_file_range_trans(struct inode *inode, struct btrfs_trans_handle *trans, int mode, u64 start, u64 num_bytes, u64 min_size, loff_t actual_len, u64 *alloc_hint); int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_folio, u64 start, u64 end, struct writeback_control *wbc); int btrfs_writepage_cow_fixup(struct folio *folio); int btrfs_encoded_io_compression_from_extent(struct btrfs_fs_info *fs_info, int compress_type); int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, u64 disk_bytenr, u64 disk_io_size, struct page **pages, void *uring_ctx); ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, struct btrfs_ioctl_encoded_io_args *encoded, struct extent_state **cached_state, u64 *disk_bytenr, u64 *disk_io_size); ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, struct iov_iter *iter, u64 start, u64 lockend, struct extent_state **cached_state, u64 disk_bytenr, u64 disk_io_size, size_t count, bool compressed, bool *unlocked); ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from, const struct btrfs_ioctl_encoded_io_args *encoded); struct btrfs_inode *btrfs_find_first_inode(struct btrfs_root *root, u64 min_ino); extern const struct dentry_operations btrfs_dentry_operations; /* Inode locking type flags, by default the exclusive lock is taken. 
*/ enum btrfs_ilock_type { ENUM_BIT(BTRFS_ILOCK_SHARED), ENUM_BIT(BTRFS_ILOCK_TRY), ENUM_BIT(BTRFS_ILOCK_MMAP), }; int btrfs_inode_lock(struct btrfs_inode *inode, unsigned int ilock_flags); void btrfs_inode_unlock(struct btrfs_inode *inode, unsigned int ilock_flags); void btrfs_update_inode_bytes(struct btrfs_inode *inode, const u64 add_bytes, const u64 del_bytes); void btrfs_assert_inode_range_clean(struct btrfs_inode *inode, u64 start, u64 end); u64 btrfs_get_extent_allocation_hint(struct btrfs_inode *inode, u64 start, u64 num_bytes); struct extent_map *btrfs_create_io_em(struct btrfs_inode *inode, u64 start, const struct btrfs_file_extent *file_extent, int type); #endif
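/*
 * Illustrative sketch (not part of the original header): how a caller might
 * combine the BTRFS_ILOCK_* flags above with btrfs_inode_lock() and
 * btrfs_inode_unlock(). The function name and body are hypothetical; only
 * the two locking helpers and the flag names come from the declarations
 * above.
 */
static int example_locked_read_path(struct btrfs_inode *inode)
{
	int ret;

	/* Take the inode lock shared; pass the same flags when unlocking. */
	ret = btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
	if (ret)
		return ret;

	/* ... read-side work that must exclude concurrent writers ... */

	btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
	return 0;
}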
// SPDX-License-Identifier: GPL-2.0 /* * Memory Migration functionality - linux/mm/migrate.c * * Copyright (C) 2006 Silicon Graphics, Inc., Christoph Lameter * * Page migration was first developed in the context of the memory hotplug * project.
The main authors of the migration code are: * * IWAMOTO Toshihiro <iwamoto@valinux.co.jp> * Hirokazu Takahashi <taka@valinux.co.jp> * Dave Hansen <haveblue@us.ibm.com> * Christoph Lameter */ #include <linux/migrate.h> #include <linux/export.h> #include <linux/swap.h> #include <linux/swapops.h> #include <linux/pagemap.h> #include <linux/buffer_head.h> #include <linux/mm_inline.h> #include <linux/ksm.h> #include <linux/rmap.h> #include <linux/topology.h> #include <linux/cpu.h> #include <linux/cpuset.h> #include <linux/writeback.h> #include <linux/mempolicy.h> #include <linux/vmalloc.h> #include <linux/security.h> #include <linux/backing-dev.h> #include <linux/compaction.h> #include <linux/syscalls.h> #include <linux/compat.h> #include <linux/hugetlb.h> #include <linux/gfp.h> #include <linux/pfn_t.h> #include <linux/page_idle.h> #include <linux/page_owner.h> #include <linux/sched/mm.h> #include <linux/ptrace.h> #include <linux/memory.h> #include <linux/sched/sysctl.h> #include <linux/memory-tiers.h> #include <linux/pagewalk.h> #include <asm/tlbflush.h> #include <trace/events/migrate.h> #include "internal.h" bool isolate_movable_page(struct page *page, isolate_mode_t mode) { struct folio *folio = folio_get_nontail_page(page); const struct movable_operations *mops; /* * Avoid burning cycles with pages that are still under __free_pages(), * or just got freed under us. * * In case we 'win' a race for a movable page being freed under us and * raise its refcount preventing __free_pages() from doing its job, * the put_page() at the end of this block will take care of releasing * this page, thus avoiding a nasty leakage. */ if (!folio) goto out; /* * Check the movable flag before taking the page lock because * we use non-atomic bitops on newly allocated page flags, so * unconditionally grabbing the lock ruins the page owner's side. */ if (unlikely(!__folio_test_movable(folio))) goto out_putfolio; /* * As movable pages are not isolated from LRU lists, concurrent * compaction threads can race against page migration functions * as well as race against the release of a page. * * In order to avoid having an already isolated movable page * being (wrongly) re-isolated while it is under migration, * or to avoid attempting to isolate pages being released, * let's be sure we have the page lock * before proceeding with the movable page isolation steps. */ if (unlikely(!folio_trylock(folio))) goto out_putfolio; if (!folio_test_movable(folio) || folio_test_isolated(folio)) goto out_no_isolated; mops = folio_movable_ops(folio); VM_BUG_ON_FOLIO(!mops, folio); if (!mops->isolate_page(&folio->page, mode)) goto out_no_isolated; /* Driver shouldn't use the isolated flag */ WARN_ON_ONCE(folio_test_isolated(folio)); folio_set_isolated(folio); folio_unlock(folio); return true; out_no_isolated: folio_unlock(folio); out_putfolio: folio_put(folio); out: return false; } static void putback_movable_folio(struct folio *folio) { const struct movable_operations *mops = folio_movable_ops(folio); mops->putback_page(&folio->page); folio_clear_isolated(folio); } /* * Put previously isolated pages back onto the appropriate lists * from where they were once taken off for compaction/migration. * * This function shall be used whenever the isolated pageset has been * built from LRU, balloon or hugetlbfs pages. See isolate_migratepages_range() * and folio_isolate_hugetlb().
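 * For example, a caller of migrate_pages() that gets a nonzero return value
 * is expected to hand the folios still sitting on its list back through this
 * function.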
*/ void putback_movable_pages(struct list_head *l) { struct folio *folio; struct folio *folio2; list_for_each_entry_safe(folio, folio2, l, lru) { if (unlikely(folio_test_hugetlb(folio))) { folio_putback_hugetlb(folio); continue; } list_del(&folio->lru); /* * We isolated a non-LRU movable folio, so we can use * __folio_test_movable() here because an LRU folio's mapping * cannot have PAGE_MAPPING_MOVABLE. */ if (unlikely(__folio_test_movable(folio))) { VM_BUG_ON_FOLIO(!folio_test_isolated(folio), folio); folio_lock(folio); if (folio_test_movable(folio)) putback_movable_folio(folio); else folio_clear_isolated(folio); folio_unlock(folio); folio_put(folio); } else { node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio), -folio_nr_pages(folio)); folio_putback_lru(folio); } } } /* Must be called with an elevated refcount on the non-hugetlb folio */ bool isolate_folio_to_list(struct folio *folio, struct list_head *list) { bool isolated, lru; if (folio_test_hugetlb(folio)) return folio_isolate_hugetlb(folio, list); lru = !__folio_test_movable(folio); if (lru) isolated = folio_isolate_lru(folio); else isolated = isolate_movable_page(&folio->page, ISOLATE_UNEVICTABLE); if (!isolated) return false; list_add(&folio->lru, list); if (lru) node_stat_add_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio)); return true; } static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw, struct folio *folio, unsigned long idx) { struct page *page = folio_page(folio, idx); bool contains_data; pte_t newpte; void *addr; if (PageCompound(page)) return false; VM_BUG_ON_PAGE(!PageAnon(page), page); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(pte_present(*pvmw->pte), page); if (folio_test_mlocked(folio) || (pvmw->vma->vm_flags & VM_LOCKED) || mm_forbids_zeropage(pvmw->vma->vm_mm)) return false; /* * The pmd entry mapping the old THP was flushed and the pte mapping * this subpage has become non-present. If the subpage is only * zero-filled, then map it to the shared zeropage.
*/ addr = kmap_local_page(page); contains_data = memchr_inv(addr, 0, PAGE_SIZE); kunmap_local(addr); if (contains_data) return false; newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), pvmw->vma->vm_page_prot)); set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); return true; } struct rmap_walk_arg { struct folio *folio; bool map_unused_to_zeropage; }; /* * Restore a potential migration pte to a working pte entry */ static bool remove_migration_pte(struct folio *folio, struct vm_area_struct *vma, unsigned long addr, void *arg) { struct rmap_walk_arg *rmap_walk_arg = arg; DEFINE_FOLIO_VMA_WALK(pvmw, rmap_walk_arg->folio, vma, addr, PVMW_SYNC | PVMW_MIGRATION); while (page_vma_mapped_walk(&pvmw)) { rmap_t rmap_flags = RMAP_NONE; pte_t old_pte; pte_t pte; swp_entry_t entry; struct page *new; unsigned long idx = 0; /* pgoff is invalid for ksm pages, but they are never large */ if (folio_test_large(folio) && !folio_test_hugetlb(folio)) idx = linear_page_index(vma, pvmw.address) - pvmw.pgoff; new = folio_page(folio, idx); #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION /* PMD-mapped THP migration entry */ if (!pvmw.pte) { VM_BUG_ON_FOLIO(folio_test_hugetlb(folio) || !folio_test_pmd_mappable(folio), folio); remove_migration_pmd(&pvmw, new); continue; } #endif if (rmap_walk_arg->map_unused_to_zeropage && try_to_map_unused_to_zeropage(&pvmw, folio, idx)) continue; folio_get(folio); pte = mk_pte(new, READ_ONCE(vma->vm_page_prot)); old_pte = ptep_get(pvmw.pte); entry = pte_to_swp_entry(old_pte); if (!is_migration_entry_young(entry)) pte = pte_mkold(pte); if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) pte = pte_mkdirty(pte); if (pte_swp_soft_dirty(old_pte)) pte = pte_mksoft_dirty(pte); else pte = pte_clear_soft_dirty(pte); if (is_writable_migration_entry(entry)) pte = pte_mkwrite(pte, vma); else if (pte_swp_uffd_wp(old_pte)) pte = pte_mkuffd_wp(pte); if (folio_test_anon(folio) && !is_readable_migration_entry(entry)) rmap_flags |= RMAP_EXCLUSIVE; if (unlikely(is_device_private_page(new))) { if (pte_write(pte)) entry = make_writable_device_private_entry( page_to_pfn(new)); else entry = make_readable_device_private_entry( page_to_pfn(new)); pte = swp_entry_to_pte(entry); if (pte_swp_soft_dirty(old_pte)) pte = pte_swp_mksoft_dirty(pte); if (pte_swp_uffd_wp(old_pte)) pte = pte_swp_mkuffd_wp(pte); } #ifdef CONFIG_HUGETLB_PAGE if (folio_test_hugetlb(folio)) { struct hstate *h = hstate_vma(vma); unsigned int shift = huge_page_shift(h); unsigned long psize = huge_page_size(h); pte = arch_make_huge_pte(pte, shift, vma->vm_flags); if (folio_test_anon(folio)) hugetlb_add_anon_rmap(folio, vma, pvmw.address, rmap_flags); else hugetlb_add_file_rmap(folio); set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte, psize); } else #endif { if (folio_test_anon(folio)) folio_add_anon_rmap_pte(folio, new, vma, pvmw.address, rmap_flags); else folio_add_file_rmap_pte(folio, new, vma); set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte); } if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); trace_remove_migration_pte(pvmw.address, pte_val(pte), compound_order(new)); /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, pvmw.address, pvmw.pte); } return true; } /* * Get rid of all migration entries and replace them by * references to the indicated page. 
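 * This walks the rmap of @dst and, at each migration entry still pointing
 * at @src, installs a present pte (or pmd) that references the
 * corresponding page of @dst.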
*/ void remove_migration_ptes(struct folio *src, struct folio *dst, int flags) { struct rmap_walk_arg rmap_walk_arg = { .folio = src, .map_unused_to_zeropage = flags & RMP_USE_SHARED_ZEROPAGE, }; struct rmap_walk_control rwc = { .rmap_one = remove_migration_pte, .arg = &rmap_walk_arg, }; VM_BUG_ON_FOLIO((flags & RMP_USE_SHARED_ZEROPAGE) && (src != dst), src); if (flags & RMP_LOCKED) rmap_walk_locked(dst, &rwc); else rmap_walk(dst, &rwc); } /* * Something used the pte of a page under migration. We need to * get to the page and wait until migration is finished. * When we return from this function the fault will be retried. */ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { spinlock_t *ptl; pte_t *ptep; pte_t pte; swp_entry_t entry; ptep = pte_offset_map_lock(mm, pmd, address, &ptl); if (!ptep) return; pte = ptep_get(ptep); pte_unmap(ptep); if (!is_swap_pte(pte)) goto out; entry = pte_to_swp_entry(pte); if (!is_migration_entry(entry)) goto out; migration_entry_wait_on_locked(entry, ptl); return; out: spin_unlock(ptl); } #ifdef CONFIG_HUGETLB_PAGE /* * The vma read lock must be held upon entry. Holding that lock prevents either * the pte or the ptl from being freed. * * This function will release the vma lock before returning. */ void migration_entry_wait_huge(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, ptep); pte_t pte; hugetlb_vma_assert_locked(vma); spin_lock(ptl); pte = huge_ptep_get(vma->vm_mm, addr, ptep); if (unlikely(!is_hugetlb_entry_migration(pte))) { spin_unlock(ptl); hugetlb_vma_unlock_read(vma); } else { /* * If migration entry existed, safe to release vma lock * here because the pgtable page won't be freed without the * pgtable lock released. See comment right above pgtable * lock release in migration_entry_wait_on_locked(). */ hugetlb_vma_unlock_read(vma); migration_entry_wait_on_locked(pte_to_swp_entry(pte), ptl); } } #endif #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd) { spinlock_t *ptl; ptl = pmd_lock(mm, pmd); if (!is_pmd_migration_entry(*pmd)) goto unlock; migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), ptl); return; unlock: spin_unlock(ptl); } #endif static int folio_expected_refs(struct address_space *mapping, struct folio *folio) { int refs = 1; if (!mapping) return refs; refs += folio_nr_pages(folio); if (folio_test_private(folio)) refs++; return refs; } /* * Replace the folio in the mapping. * * The number of remaining references must be: * 1 for anonymous folios without a mapping * 2 for folios with a mapping * 3 for folios with a mapping and the private flag set. 
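 * (These counts are for order-0 folios; for large folios the mapping holds
 * one reference per base page, which folio_expected_refs() above accounts
 * for by adding folio_nr_pages().)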
*/ static int __folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio, struct folio *folio, int expected_count) { XA_STATE(xas, &mapping->i_pages, folio_index(folio)); struct zone *oldzone, *newzone; int dirty; long nr = folio_nr_pages(folio); long entries, i; if (!mapping) { /* Take off deferred split queue while frozen and memcg set */ if (folio_test_large(folio) && folio_test_large_rmappable(folio)) { if (!folio_ref_freeze(folio, expected_count)) return -EAGAIN; folio_unqueue_deferred_split(folio); folio_ref_unfreeze(folio, expected_count); } /* No turning back from here */ newfolio->index = folio->index; newfolio->mapping = folio->mapping; if (folio_test_anon(folio) && folio_test_large(folio)) mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1); if (folio_test_swapbacked(folio)) __folio_set_swapbacked(newfolio); return MIGRATEPAGE_SUCCESS; } oldzone = folio_zone(folio); newzone = folio_zone(newfolio); xas_lock_irq(&xas); if (!folio_ref_freeze(folio, expected_count)) { xas_unlock_irq(&xas); return -EAGAIN; } /* Take off deferred split queue while frozen and memcg set */ folio_unqueue_deferred_split(folio); /* * Now we know that no one else is looking at the folio: * no turning back from here. */ newfolio->index = folio->index; newfolio->mapping = folio->mapping; if (folio_test_anon(folio) && folio_test_large(folio)) mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1); folio_ref_add(newfolio, nr); /* add cache reference */ if (folio_test_swapbacked(folio)) { __folio_set_swapbacked(newfolio); if (folio_test_swapcache(folio)) { folio_set_swapcache(newfolio); newfolio->private = folio_get_private(folio); } entries = nr; } else { VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio); entries = 1; } /* Move dirty while folio refs frozen and newfolio not yet exposed */ dirty = folio_test_dirty(folio); if (dirty) { folio_clear_dirty(folio); folio_set_dirty(newfolio); } /* Swap cache still stores N entries instead of a high-order entry */ for (i = 0; i < entries; i++) { xas_store(&xas, newfolio); xas_next(&xas); } /* * Drop cache reference from old folio by unfreezing * to one less reference. * We know this isn't the last reference. */ folio_ref_unfreeze(folio, expected_count - nr); xas_unlock(&xas); /* Leave irq disabled to prevent preemption while updating stats */ /* * If moved to a different zone then also account * the folio for that zone. Other VM counters will be * taken care of when we establish references to the * new folio and drop references to the old folio. * * Note that anonymous folios are accounted for * via NR_FILE_PAGES and NR_ANON_MAPPED if they * are mapped to swap space. 
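 * Interrupts are still disabled at this point (the xas_unlock() above, as
 * opposed to xas_unlock_irq(), left them off), which is what makes the
 * __mod_lruvec_state() updates below safe.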
*/ if (newzone != oldzone) { struct lruvec *old_lruvec, *new_lruvec; struct mem_cgroup *memcg; memcg = folio_memcg(folio); old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat); new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat); __mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr); __mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr); if (folio_test_swapbacked(folio) && !folio_test_swapcache(folio)) { __mod_lruvec_state(old_lruvec, NR_SHMEM, -nr); __mod_lruvec_state(new_lruvec, NR_SHMEM, nr); if (folio_test_pmd_mappable(folio)) { __mod_lruvec_state(old_lruvec, NR_SHMEM_THPS, -nr); __mod_lruvec_state(new_lruvec, NR_SHMEM_THPS, nr); } } #ifdef CONFIG_SWAP if (folio_test_swapcache(folio)) { __mod_lruvec_state(old_lruvec, NR_SWAPCACHE, -nr); __mod_lruvec_state(new_lruvec, NR_SWAPCACHE, nr); } #endif if (dirty && mapping_can_writeback(mapping)) { __mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr); __mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr); __mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr); __mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr); } } local_irq_enable(); return MIGRATEPAGE_SUCCESS; } int folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio, struct folio *folio, int extra_count) { int expected_count = folio_expected_refs(mapping, folio) + extra_count; if (folio_ref_count(folio) != expected_count) return -EAGAIN; return __folio_migrate_mapping(mapping, newfolio, folio, expected_count); } EXPORT_SYMBOL(folio_migrate_mapping); /* * The expected number of remaining references is the same as that * of folio_migrate_mapping(). */ int migrate_huge_page_move_mapping(struct address_space *mapping, struct folio *dst, struct folio *src) { XA_STATE(xas, &mapping->i_pages, folio_index(src)); int rc, expected_count = folio_expected_refs(mapping, src); if (folio_ref_count(src) != expected_count) return -EAGAIN; rc = folio_mc_copy(dst, src); if (unlikely(rc)) return rc; xas_lock_irq(&xas); if (!folio_ref_freeze(src, expected_count)) { xas_unlock_irq(&xas); return -EAGAIN; } dst->index = src->index; dst->mapping = src->mapping; folio_ref_add(dst, folio_nr_pages(dst)); xas_store(&xas, dst); folio_ref_unfreeze(src, expected_count - folio_nr_pages(src)); xas_unlock_irq(&xas); return MIGRATEPAGE_SUCCESS; } /* * Copy the flags and some other ancillary information */ void folio_migrate_flags(struct folio *newfolio, struct folio *folio) { int cpupid; if (folio_test_referenced(folio)) folio_set_referenced(newfolio); if (folio_test_uptodate(folio)) folio_mark_uptodate(newfolio); if (folio_test_clear_active(folio)) { VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio); folio_set_active(newfolio); } else if (folio_test_clear_unevictable(folio)) folio_set_unevictable(newfolio); if (folio_test_workingset(folio)) folio_set_workingset(newfolio); if (folio_test_checked(folio)) folio_set_checked(newfolio); /* * PG_anon_exclusive (-> PG_mappedtodisk) is always migrated via * migration entries. We can still have PG_anon_exclusive set on an * effectively unmapped and unreferenced first sub-page of an * anonymous THP: we can simply copy it here via PG_mappedtodisk.
*/ if (folio_test_mappedtodisk(folio)) folio_set_mappedtodisk(newfolio); /* Move dirty on pages not done by folio_migrate_mapping() */ if (folio_test_dirty(folio)) folio_set_dirty(newfolio); if (folio_test_young(folio)) folio_set_young(newfolio); if (folio_test_idle(folio)) folio_set_idle(newfolio); folio_migrate_refs(newfolio, folio); /* * Copy NUMA information to the new page, to prevent over-eager * future migrations of this same page. */ cpupid = folio_xchg_last_cpupid(folio, -1); /* * In memory tiering mode, when migrating between slow and fast * memory nodes, reset the cpupid, because it is used to record * page access time in the slow memory node. */ if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) { bool f_toptier = node_is_toptier(folio_nid(folio)); bool t_toptier = node_is_toptier(folio_nid(newfolio)); if (f_toptier != t_toptier) cpupid = -1; } folio_xchg_last_cpupid(newfolio, cpupid); folio_migrate_ksm(newfolio, folio); /* * Please do not reorder this without considering how mm/ksm.c's * ksm_get_folio() depends upon ksm_migrate_page() and the * swapcache flag. */ if (folio_test_swapcache(folio)) folio_clear_swapcache(folio); folio_clear_private(folio); /* page->private contains hugetlb specific flags */ if (!folio_test_hugetlb(folio)) folio->private = NULL; /* * If any waiters have accumulated on the new page then * wake them up. */ if (folio_test_writeback(newfolio)) folio_end_writeback(newfolio); /* * PG_readahead shares the same bit with PG_reclaim. The * folio_end_writeback() above may clear PG_readahead by mistake, * so set the bit after that. */ if (folio_test_readahead(folio)) folio_set_readahead(newfolio); folio_copy_owner(newfolio, folio); pgalloc_tag_swap(newfolio, folio); mem_cgroup_migrate(folio, newfolio); } EXPORT_SYMBOL(folio_migrate_flags); /************************************************************ * Migration functions ***********************************************************/ static int __migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, void *src_private, enum migrate_mode mode) { int rc, expected_count = folio_expected_refs(mapping, src); /* Check whether src does not have extra refs before we do more work */ if (folio_ref_count(src) != expected_count) return -EAGAIN; rc = folio_mc_copy(dst, src); if (unlikely(rc)) return rc; rc = __folio_migrate_mapping(mapping, dst, src, expected_count); if (rc != MIGRATEPAGE_SUCCESS) return rc; if (src_private) folio_attach_private(dst, folio_detach_private(src)); folio_migrate_flags(dst, src); return MIGRATEPAGE_SUCCESS; } /** * migrate_folio() - Simple folio migration. * @mapping: The address_space containing the folio. * @dst: The folio to migrate the data to. * @src: The folio containing the current data. * @mode: How to migrate the page. * * Common logic to directly migrate a single LRU folio suitable for * folios that do not have private data. * * Folios are locked upon entry and exit.
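 *
 * Return: MIGRATEPAGE_SUCCESS on success, otherwise a negative errno such
 * as -EAGAIN when @src still has unexpected extra references.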
*/ int migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { BUG_ON(folio_test_writeback(src)); /* Writeback must be complete */ return __migrate_folio(mapping, dst, src, NULL, mode); } EXPORT_SYMBOL(migrate_folio); #ifdef CONFIG_BUFFER_HEAD /* Returns true if all buffers are successfully locked */ static bool buffer_migrate_lock_buffers(struct buffer_head *head, enum migrate_mode mode) { struct buffer_head *bh = head; struct buffer_head *failed_bh; do { if (!trylock_buffer(bh)) { if (mode == MIGRATE_ASYNC) goto unlock; if (mode == MIGRATE_SYNC_LIGHT && !buffer_uptodate(bh)) goto unlock; lock_buffer(bh); } bh = bh->b_this_page; } while (bh != head); return true; unlock: /* We failed to lock the buffer and cannot stall. */ failed_bh = bh; bh = head; while (bh != failed_bh) { unlock_buffer(bh); bh = bh->b_this_page; } return false; } static int __buffer_migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode, bool check_refs) { struct buffer_head *bh, *head; int rc; int expected_count; head = folio_buffers(src); if (!head) return migrate_folio(mapping, dst, src, mode); /* Check whether page does not have extra refs before we do more work */ expected_count = folio_expected_refs(mapping, src); if (folio_ref_count(src) != expected_count) return -EAGAIN; if (!buffer_migrate_lock_buffers(head, mode)) return -EAGAIN; if (check_refs) { bool busy; bool invalidated = false; recheck_buffers: busy = false; spin_lock(&mapping->i_private_lock); bh = head; do { if (atomic_read(&bh->b_count)) { busy = true; break; } bh = bh->b_this_page; } while (bh != head); if (busy) { if (invalidated) { rc = -EAGAIN; goto unlock_buffers; } spin_unlock(&mapping->i_private_lock); invalidate_bh_lrus(); invalidated = true; goto recheck_buffers; } } rc = filemap_migrate_folio(mapping, dst, src, mode); if (rc != MIGRATEPAGE_SUCCESS) goto unlock_buffers; bh = head; do { folio_set_bh(bh, dst, bh_offset(bh)); bh = bh->b_this_page; } while (bh != head); unlock_buffers: if (check_refs) spin_unlock(&mapping->i_private_lock); bh = head; do { unlock_buffer(bh); bh = bh->b_this_page; } while (bh != head); return rc; } /** * buffer_migrate_folio() - Migration function for folios with buffers. * @mapping: The address space containing @src. * @dst: The folio to migrate to. * @src: The folio to migrate from. * @mode: How to migrate the folio. * * This function can only be used if the underlying filesystem guarantees * that no other references to @src exist. For example attached buffer * heads are accessed only under the folio lock. If your filesystem cannot * provide this guarantee, buffer_migrate_folio_norefs() may be more * appropriate. * * Return: 0 on success or a negative errno on failure. */ int buffer_migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { return __buffer_migrate_folio(mapping, dst, src, mode, false); } EXPORT_SYMBOL(buffer_migrate_folio); /** * buffer_migrate_folio_norefs() - Migration function for folios with buffers. * @mapping: The address space containing @src. * @dst: The folio to migrate to. * @src: The folio to migrate from. * @mode: How to migrate the folio. * * Like buffer_migrate_folio() except that this variant is more careful * and checks that there are also no buffer head references. This function * is the right one for mappings where buffer heads are directly looked * up and referenced (such as block device mappings). 
* * Return: 0 on success or a negative errno on failure. */ int buffer_migrate_folio_norefs(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { return __buffer_migrate_folio(mapping, dst, src, mode, true); } EXPORT_SYMBOL_GPL(buffer_migrate_folio_norefs); #endif /* CONFIG_BUFFER_HEAD */ int filemap_migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { return __migrate_folio(mapping, dst, src, folio_get_private(src), mode); } EXPORT_SYMBOL_GPL(filemap_migrate_folio); /* * Writeback a folio to clean the dirty state */ static int writeout(struct address_space *mapping, struct folio *folio) { struct writeback_control wbc = { .sync_mode = WB_SYNC_NONE, .nr_to_write = 1, .range_start = 0, .range_end = LLONG_MAX, .for_reclaim = 1 }; int rc; if (!mapping->a_ops->writepage) /* No write method for the address space */ return -EINVAL; if (!folio_clear_dirty_for_io(folio)) /* Someone else already triggered a write */ return -EAGAIN; /* * A dirty folio may imply that the underlying filesystem has * the folio on some queue. So the folio must be clean for * migration. Writeout may mean we lose the lock and the * folio state is no longer what we checked for earlier. * At this point we know that the migration attempt cannot * be successful. */ remove_migration_ptes(folio, folio, 0); rc = mapping->a_ops->writepage(&folio->page, &wbc); if (rc != AOP_WRITEPAGE_ACTIVATE) /* unlocked. Relock */ folio_lock(folio); return (rc < 0) ? -EIO : -EAGAIN; } /* * Default handling if a filesystem does not provide a migration function. */ static int fallback_migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) { if (folio_test_dirty(src)) { /* Only writeback folios in full synchronous migration */ switch (mode) { case MIGRATE_SYNC: break; default: return -EBUSY; } return writeout(mapping, src); } /* * Buffers may be managed in a filesystem specific way. * We must have no buffers or drop them. */ if (!filemap_release_folio(src, GFP_KERNEL)) return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY; return migrate_folio(mapping, dst, src, mode); } /* * Move a page to a newly allocated page * The page is locked and all ptes have been successfully removed. * * The new page will have replaced the old page if this function * is successful. * * Return value: * < 0 - error code * MIGRATEPAGE_SUCCESS - success */ static int move_to_new_folio(struct folio *dst, struct folio *src, enum migrate_mode mode) { int rc = -EAGAIN; bool is_lru = !__folio_test_movable(src); VM_BUG_ON_FOLIO(!folio_test_locked(src), src); VM_BUG_ON_FOLIO(!folio_test_locked(dst), dst); if (likely(is_lru)) { struct address_space *mapping = folio_mapping(src); if (!mapping) rc = migrate_folio(mapping, dst, src, mode); else if (mapping_inaccessible(mapping)) rc = -EOPNOTSUPP; else if (mapping->a_ops->migrate_folio) /* * Most folios have a mapping and most filesystems * provide a migrate_folio callback. Anonymous folios * are part of swap space which also has its own * migrate_folio callback. This is the most common path * for page migration. */ rc = mapping->a_ops->migrate_folio(mapping, dst, src, mode); else rc = fallback_migrate_folio(mapping, dst, src, mode); } else { const struct movable_operations *mops; /* * In case of non-lru page, it could be released after * isolation step. In that case, we shouldn't try migration. 
*/ VM_BUG_ON_FOLIO(!folio_test_isolated(src), src); if (!folio_test_movable(src)) { rc = MIGRATEPAGE_SUCCESS; folio_clear_isolated(src); goto out; } mops = folio_movable_ops(src); rc = mops->migrate_page(&dst->page, &src->page, mode); WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS && !folio_test_isolated(src)); } /* * When successful, old pagecache src->mapping must be cleared before * src is freed; but stats require that PageAnon be left as PageAnon. */ if (rc == MIGRATEPAGE_SUCCESS) { if (__folio_test_movable(src)) { VM_BUG_ON_FOLIO(!folio_test_isolated(src), src); /* * We clear PG_movable under page_lock so any compactor * cannot try to migrate this page. */ folio_clear_isolated(src); } /* * Anonymous and movable src->mapping will be cleared by * free_pages_prepare(), so don't reset it here; that keeps * type checks such as PageAnon working. */ if (!folio_mapping_flags(src)) src->mapping = NULL; if (likely(!folio_is_zone_device(dst))) flush_dcache_folio(dst); } out: return rc; } /* * To record some information during migration, we use the otherwise unused * private field of the newly allocated destination folio. This is safe * because nobody is using it except us. */ enum { PAGE_WAS_MAPPED = BIT(0), PAGE_WAS_MLOCKED = BIT(1), PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED, }; static void __migrate_folio_record(struct folio *dst, int old_page_state, struct anon_vma *anon_vma) { dst->private = (void *)anon_vma + old_page_state; } static void __migrate_folio_extract(struct folio *dst, int *old_page_state, struct anon_vma **anon_vmap) { unsigned long private = (unsigned long)dst->private; *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES); *old_page_state = private & PAGE_OLD_STATES; dst->private = NULL; } /* Restore the source folio to the original state upon failure */ static void migrate_folio_undo_src(struct folio *src, int page_was_mapped, struct anon_vma *anon_vma, bool locked, struct list_head *ret) { if (page_was_mapped) remove_migration_ptes(src, src, 0); /* Drop an anon_vma reference if we took one */ if (anon_vma) put_anon_vma(anon_vma); if (locked) folio_unlock(src); if (ret) list_move_tail(&src->lru, ret); } /* Restore the destination folio to the original state upon failure */ static void migrate_folio_undo_dst(struct folio *dst, bool locked, free_folio_t put_new_folio, unsigned long private) { if (locked) folio_unlock(dst); if (put_new_folio) put_new_folio(dst, private); else folio_put(dst); } /* Clean up the src folio upon migration success */ static void migrate_folio_done(struct folio *src, enum migrate_reason reason) { /* * Compaction can also migrate non-LRU pages, which are not * accounted in NR_ISOLATED_*. They can be recognized via * __folio_test_movable(). */ if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION) mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); if (reason != MR_MEMORY_FAILURE) /* We release the page in page_handle_poison. */ folio_put(src); } /* Obtain the lock on page, remove all ptes. */ static int migrate_folio_unmap(new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio **dstp, enum migrate_mode mode, enum migrate_reason reason, struct list_head *ret) { struct folio *dst; int rc = -EAGAIN; int old_page_state = 0; struct anon_vma *anon_vma = NULL; bool is_lru = data_race(!__folio_test_movable(src)); bool locked = false; bool dst_locked = false; if (folio_ref_count(src) == 1) { /* Folio was freed from under us. So we are done.
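 * The only remaining reference is the one taken when the folio was
 * isolated, so no new user can show up.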
*/ folio_clear_active(src); folio_clear_unevictable(src); /* free_pages_prepare() will clear PG_isolated. */ list_del(&src->lru); migrate_folio_done(src, reason); return MIGRATEPAGE_SUCCESS; } dst = get_new_folio(src, private); if (!dst) return -ENOMEM; *dstp = dst; dst->private = NULL; if (!folio_trylock(src)) { if (mode == MIGRATE_ASYNC) goto out; /* * It's not safe for direct compaction to call lock_page. * For example, during page readahead pages are added locked * to the LRU. Later, when the IO completes the pages are * marked uptodate and unlocked. However, the queueing * could be merging multiple pages for one bio (e.g. * mpage_readahead). If an allocation happens for the * second or third page, the process can end up locking * the same page twice and deadlocking. Rather than * trying to be clever about what pages can be locked, * avoid the use of lock_page for direct compaction * altogether. */ if (current->flags & PF_MEMALLOC) goto out; /* * In "light" mode, we can wait for transient locks (e.g. * inserting a page into the page table), but it's not * worth waiting for I/O. */ if (mode == MIGRATE_SYNC_LIGHT && !folio_test_uptodate(src)) goto out; folio_lock(src); } locked = true; if (folio_test_mlocked(src)) old_page_state |= PAGE_WAS_MLOCKED; if (folio_test_writeback(src)) { /* * Only in the case of a full synchronous migration is it * necessary to wait for PageWriteback. In the async case, * the retry loop is too short and in the sync-light case, * the overhead of stalling is too much. */ switch (mode) { case MIGRATE_SYNC: break; default: rc = -EBUSY; goto out; } folio_wait_writeback(src); } /* * By the time try_to_migrate() completes, src->mapcount goes down * to 0, and then we cannot notice if anon_vma is freed while we * migrate a page. This get_anon_vma() delays freeing the anon_vma * pointer until the end of migration. File cache pages are no * problem because of page_lock(): file caches may use write_page() * or lock_page() in migration, so we only need to care about * anonymous pages here. * * Only folio_get_anon_vma() understands the subtleties of * getting a hold on an anon_vma from outside one of its mms. * But if we cannot get anon_vma, then we won't need it anyway, * because that implies that the anon page is no longer mapped * (and cannot be remapped so long as we hold the page lock). */ if (folio_test_anon(src) && !folio_test_ksm(src)) anon_vma = folio_get_anon_vma(src); /* * Block others from accessing the new page when we get around to * establishing additional references. We are usually the only one * holding a reference to dst at this point. We used to have a BUG * here if folio_trylock(dst) fails, but would like to allow for * cases where there might be a race with the previous use of dst. * This is much like races on the refcount of oldpage: just don't BUG(). */ if (unlikely(!folio_trylock(dst))) goto out; dst_locked = true; if (unlikely(!is_lru)) { __migrate_folio_record(dst, old_page_state, anon_vma); return MIGRATEPAGE_UNMAP; } /* * Corner case handling: * 1. When a new swap-cache page is read in, it is added to the LRU * and treated as swapcache but it has no rmap yet. * Calling try_to_unmap() against a src->mapping==NULL page will * trigger a BUG. So handle it here. * 2. An orphaned page (see truncate_cleanup_page) might have * fs-private metadata. The page can be picked up due to memory * offlining. Everywhere else except page reclaim, the page is * invisible to the VM, so it cannot be migrated. So try to * free the metadata, so the page can be freed.
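 * Both corner cases show up below as a folio whose src->mapping is NULL.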
*/ if (!src->mapping) { if (folio_test_private(src)) { try_to_free_buffers(src); goto out; } } else if (folio_mapped(src)) { /* Establish migration ptes */ VM_BUG_ON_FOLIO(folio_test_anon(src) && !folio_test_ksm(src) && !anon_vma, src); try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0); old_page_state |= PAGE_WAS_MAPPED; } if (!folio_mapped(src)) { __migrate_folio_record(dst, old_page_state, anon_vma); return MIGRATEPAGE_UNMAP; } out: /* * A folio that has not been unmapped will be restored to the * right list unless we want to retry. */ if (rc == -EAGAIN) ret = NULL; migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED, anon_vma, locked, ret); migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private); return rc; } /* Migrate the folio to the newly allocated folio in dst. */ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio *dst, enum migrate_mode mode, enum migrate_reason reason, struct list_head *ret) { int rc; int old_page_state = 0; struct anon_vma *anon_vma = NULL; bool is_lru = !__folio_test_movable(src); struct list_head *prev; __migrate_folio_extract(dst, &old_page_state, &anon_vma); prev = dst->lru.prev; list_del(&dst->lru); rc = move_to_new_folio(dst, src, mode); if (rc) goto out; if (unlikely(!is_lru)) goto out_unlock_both; /* * When successful, push dst to LRU immediately: so that if it * turns out to be an mlocked page, remove_migration_ptes() will * automatically build up the correct dst->mlock_count for it. * * We would like to do something similar for the old page, when * unsuccessful, and other cases when a page has been temporarily * isolated from the unevictable LRU: but this case is the easiest. */ folio_add_lru(dst); if (old_page_state & PAGE_WAS_MLOCKED) lru_add_drain(); if (old_page_state & PAGE_WAS_MAPPED) remove_migration_ptes(src, dst, 0); out_unlock_both: folio_unlock(dst); set_page_owner_migrate_reason(&dst->page, reason); /* * If migration is successful, decrease the refcount of dst; * this will not free the page because the new page owner holds * a reference. */ folio_put(dst); /* * A folio that has been migrated has all references removed * and will be freed. */ list_del(&src->lru); /* Drop an anon_vma reference if we took one */ if (anon_vma) put_anon_vma(anon_vma); folio_unlock(src); migrate_folio_done(src, reason); return rc; out: /* * A folio that has not been migrated will be restored to the * right list unless we want to retry. */ if (rc == -EAGAIN) { list_add(&dst->lru, prev); __migrate_folio_record(dst, old_page_state, anon_vma); return rc; } migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED, anon_vma, true, ret); migrate_folio_undo_dst(dst, true, put_new_folio, private); return rc; } /* * Counterpart of migrate_folio_unmap()/migrate_folio_move() for hugepage * migration. * * This function doesn't wait for the completion of hugepage I/O * because there is no race between I/O and migration for hugepage. * Note that currently hugepage I/O occurs only in direct I/O * where no lock is held and PG_writeback is irrelevant, * and the writeback status of all subpages is counted in the reference * count of the head page (i.e. if all subpages of a 2MB hugepage are * under direct I/O, the reference of the head page is 512 and a bit more.) * This means that when we try to migrate a hugepage whose subpages are * doing direct I/O, some references remain after try_to_unmap() and * hugepage migration fails without data corruption.
* * There is also no race when direct I/O is issued on the page under migration, * because then pte is replaced with migration swap entry and direct I/O code * will wait in the page fault for migration to complete. */ static int unmap_and_move_huge_page(new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, struct folio *src, int force, enum migrate_mode mode, int reason, struct list_head *ret) { struct folio *dst; int rc = -EAGAIN; int page_was_mapped = 0; struct anon_vma *anon_vma = NULL; struct address_space *mapping = NULL; if (folio_ref_count(src) == 1) { /* page was freed from under us. So we are done. */ folio_putback_hugetlb(src); return MIGRATEPAGE_SUCCESS; } dst = get_new_folio(src, private); if (!dst) return -ENOMEM; if (!folio_trylock(src)) { if (!force) goto out; switch (mode) { case MIGRATE_SYNC: break; default: goto out; } folio_lock(src); } /* * Check for pages which are in the process of being freed. Without * folio_mapping() set, hugetlbfs specific move page routine will not * be called and we could leak usage counts for subpools. */ if (hugetlb_folio_subpool(src) && !folio_mapping(src)) { rc = -EBUSY; goto out_unlock; } if (folio_test_anon(src)) anon_vma = folio_get_anon_vma(src); if (unlikely(!folio_trylock(dst))) goto put_anon; if (folio_mapped(src)) { enum ttu_flags ttu = 0; if (!folio_test_anon(src)) { /* * In shared mappings, try_to_unmap could potentially * call huge_pmd_unshare. Because of this, take * semaphore in write mode here and set TTU_RMAP_LOCKED * to let lower levels know we have taken the lock. */ mapping = hugetlb_folio_mapping_lock_write(src); if (unlikely(!mapping)) goto unlock_put_anon; ttu = TTU_RMAP_LOCKED; } try_to_migrate(src, ttu); page_was_mapped = 1; if (ttu & TTU_RMAP_LOCKED) i_mmap_unlock_write(mapping); } if (!folio_mapped(src)) rc = move_to_new_folio(dst, src, mode); if (page_was_mapped) remove_migration_ptes(src, rc == MIGRATEPAGE_SUCCESS ? dst : src, 0); unlock_put_anon: folio_unlock(dst); put_anon: if (anon_vma) put_anon_vma(anon_vma); if (rc == MIGRATEPAGE_SUCCESS) { move_hugetlb_state(src, dst, reason); put_new_folio = NULL; } out_unlock: folio_unlock(src); out: if (rc == MIGRATEPAGE_SUCCESS) folio_putback_hugetlb(src); else if (rc != -EAGAIN) list_move_tail(&src->lru, ret); /* * If migration was not successful and there's a freeing callback, * return the folio to that special allocator. Otherwise, simply drop * our additional reference. */ if (put_new_folio) put_new_folio(dst, private); else folio_put(dst); return rc; } static inline int try_split_folio(struct folio *folio, struct list_head *split_folios, enum migrate_mode mode) { int rc; if (mode == MIGRATE_ASYNC) { if (!folio_trylock(folio)) return -EAGAIN; } else { folio_lock(folio); } rc = split_folio_to_list(folio, split_folios); folio_unlock(folio); if (!rc) list_move_tail(&folio->lru, split_folios); return rc; } #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define NR_MAX_BATCHED_MIGRATION HPAGE_PMD_NR #else #define NR_MAX_BATCHED_MIGRATION 512 #endif #define NR_MAX_MIGRATE_PAGES_RETRY 10 #define NR_MAX_MIGRATE_ASYNC_RETRY 3 #define NR_MAX_MIGRATE_SYNC_RETRY \ (NR_MAX_MIGRATE_PAGES_RETRY - NR_MAX_MIGRATE_ASYNC_RETRY) struct migrate_pages_stats { int nr_succeeded; /* Normal and large folios migrated successfully, in units of base pages */ int nr_failed_pages; /* Normal and large folios failed to be migrated, in units of base pages. 
Untried folios aren't counted */ int nr_thp_succeeded; /* THP migrated successfully */ int nr_thp_failed; /* THP failed to be migrated */ int nr_thp_split; /* THP split before migrating */ int nr_split; /* Large folio (include THP) split before migrating */ }; /* * Returns the number of hugetlb folios that were not migrated, or an error code * after NR_MAX_MIGRATE_PAGES_RETRY attempts or if no hugetlb folios are movable * any more because the list has become empty or no retryable hugetlb folios * exist any more. It is caller's responsibility to call putback_movable_pages() * only if ret != 0. */ static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, struct migrate_pages_stats *stats, struct list_head *ret_folios) { int retry = 1; int nr_failed = 0; int nr_retry_pages = 0; int pass = 0; struct folio *folio, *folio2; int rc, nr_pages; for (pass = 0; pass < NR_MAX_MIGRATE_PAGES_RETRY && retry; pass++) { retry = 0; nr_retry_pages = 0; list_for_each_entry_safe(folio, folio2, from, lru) { if (!folio_test_hugetlb(folio)) continue; nr_pages = folio_nr_pages(folio); cond_resched(); /* * Migratability of hugepages depends on architectures and * their size. This check is necessary because some callers * of hugepage migration like soft offline and memory * hotremove don't walk through page tables or check whether * the hugepage is pmd-based or not before kicking migration. */ if (!hugepage_migration_supported(folio_hstate(folio))) { nr_failed++; stats->nr_failed_pages += nr_pages; list_move_tail(&folio->lru, ret_folios); continue; } rc = unmap_and_move_huge_page(get_new_folio, put_new_folio, private, folio, pass > 2, mode, reason, ret_folios); /* * The rules are: * Success: hugetlb folio will be put back * -EAGAIN: stay on the from list * -ENOMEM: stay on the from list * Other errno: put on ret_folios list */ switch(rc) { case -ENOMEM: /* * When memory is low, don't bother to try to migrate * other folios, just exit. */ stats->nr_failed_pages += nr_pages + nr_retry_pages; return -ENOMEM; case -EAGAIN: retry++; nr_retry_pages += nr_pages; break; case MIGRATEPAGE_SUCCESS: stats->nr_succeeded += nr_pages; break; default: /* * Permanent failure (-EBUSY, etc.): * unlike -EAGAIN case, the failed folio is * removed from migration folio list and not * retried in the next outer loop. */ nr_failed++; stats->nr_failed_pages += nr_pages; break; } } } /* * nr_failed is number of hugetlb folios failed to be migrated. After * NR_MAX_MIGRATE_PAGES_RETRY attempts, give up and count retried hugetlb * folios as failed. 
*/ nr_failed += retry; stats->nr_failed_pages += nr_retry_pages; return nr_failed; } static void migrate_folios_move(struct list_head *src_folios, struct list_head *dst_folios, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, struct list_head *ret_folios, struct migrate_pages_stats *stats, int *retry, int *thp_retry, int *nr_failed, int *nr_retry_pages) { struct folio *folio, *folio2, *dst, *dst2; bool is_thp; int nr_pages; int rc; dst = list_first_entry(dst_folios, struct folio, lru); dst2 = list_next_entry(dst, lru); list_for_each_entry_safe(folio, folio2, src_folios, lru) { is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio); nr_pages = folio_nr_pages(folio); cond_resched(); rc = migrate_folio_move(put_new_folio, private, folio, dst, mode, reason, ret_folios); /* * The rules are: * Success: folio will be freed * -EAGAIN: stay on the unmap_folios list * Other errno: put on ret_folios list */ switch (rc) { case -EAGAIN: *retry += 1; *thp_retry += is_thp; *nr_retry_pages += nr_pages; break; case MIGRATEPAGE_SUCCESS: stats->nr_succeeded += nr_pages; stats->nr_thp_succeeded += is_thp; break; default: *nr_failed += 1; stats->nr_thp_failed += is_thp; stats->nr_failed_pages += nr_pages; break; } dst = dst2; dst2 = list_next_entry(dst, lru); } } static void migrate_folios_undo(struct list_head *src_folios, struct list_head *dst_folios, free_folio_t put_new_folio, unsigned long private, struct list_head *ret_folios) { struct folio *folio, *folio2, *dst, *dst2; dst = list_first_entry(dst_folios, struct folio, lru); dst2 = list_next_entry(dst, lru); list_for_each_entry_safe(folio, folio2, src_folios, lru) { int old_page_state = 0; struct anon_vma *anon_vma = NULL; __migrate_folio_extract(dst, &old_page_state, &anon_vma); migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED, anon_vma, true, ret_folios); list_del(&dst->lru); migrate_folio_undo_dst(dst, true, put_new_folio, private); dst = dst2; dst2 = list_next_entry(dst, lru); } } /* * migrate_pages_batch() first unmaps folios in the from list as many as * possible, then move the unmapped folios. * * We only batch migration if mode == MIGRATE_ASYNC to avoid to wait a * lock or bit when we have locked more than one folio. Which may cause * deadlock (e.g., for loop device). So, if mode != MIGRATE_ASYNC, the * length of the from list must be <= 1. */ static int migrate_pages_batch(struct list_head *from, new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, struct list_head *ret_folios, struct list_head *split_folios, struct migrate_pages_stats *stats, int nr_pass) { int retry = 1; int thp_retry = 1; int nr_failed = 0; int nr_retry_pages = 0; int pass = 0; bool is_thp = false; bool is_large = false; struct folio *folio, *folio2, *dst = NULL; int rc, rc_saved = 0, nr_pages; LIST_HEAD(unmap_folios); LIST_HEAD(dst_folios); bool nosplit = (reason == MR_NUMA_MISPLACED); VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC && !list_empty(from) && !list_is_singular(from)); for (pass = 0; pass < nr_pass && retry; pass++) { retry = 0; thp_retry = 0; nr_retry_pages = 0; list_for_each_entry_safe(folio, folio2, from, lru) { is_large = folio_test_large(folio); is_thp = folio_test_pmd_mappable(folio); nr_pages = folio_nr_pages(folio); cond_resched(); /* * The rare folio on the deferred split list should * be split now. 
It should not count as a failure: * but increment nr_failed because, without doing so, * migrate_pages() may report success with (split but * unmigrated) pages still on its fromlist; whereas it * always reports success when its fromlist is empty. * stats->nr_thp_failed should be increased too, * otherwise stats inconsistency will happen when * migrate_pages_batch is called via migrate_pages() * with MIGRATE_SYNC and MIGRATE_ASYNC. * * Only check it without removing it from the list. * Since the folio can be on deferred_split_scan() * local list and removing it can cause the local list * corruption. Folio split process below can handle it * with the help of folio_ref_freeze(). * * nr_pages > 2 is needed to avoid checking order-1 * page cache folios. They exist, in contrast to * non-existent order-1 anonymous folios, and do not * use _deferred_list. */ if (nr_pages > 2 && !list_empty(&folio->_deferred_list) && folio_test_partially_mapped(folio)) { if (!try_split_folio(folio, split_folios, mode)) { nr_failed++; stats->nr_thp_failed += is_thp; stats->nr_thp_split += is_thp; stats->nr_split++; continue; } } /* * Large folio migration might be unsupported or * the allocation might be failed so we should retry * on the same folio with the large folio split * to normal folios. * * Split folios are put in split_folios, and * we will migrate them after the rest of the * list is processed. */ if (!thp_migration_supported() && is_thp) { nr_failed++; stats->nr_thp_failed++; if (!try_split_folio(folio, split_folios, mode)) { stats->nr_thp_split++; stats->nr_split++; continue; } stats->nr_failed_pages += nr_pages; list_move_tail(&folio->lru, ret_folios); continue; } rc = migrate_folio_unmap(get_new_folio, put_new_folio, private, folio, &dst, mode, reason, ret_folios); /* * The rules are: * Success: folio will be freed * Unmap: folio will be put on unmap_folios list, * dst folio put on dst_folios list * -EAGAIN: stay on the from list * -ENOMEM: stay on the from list * Other errno: put on ret_folios list */ switch(rc) { case -ENOMEM: /* * When memory is low, don't bother to try to migrate * other folios, move unmapped folios, then exit. */ nr_failed++; stats->nr_thp_failed += is_thp; /* Large folio NUMA faulting doesn't split to retry. */ if (is_large && !nosplit) { int ret = try_split_folio(folio, split_folios, mode); if (!ret) { stats->nr_thp_split += is_thp; stats->nr_split++; break; } else if (reason == MR_LONGTERM_PIN && ret == -EAGAIN) { /* * Try again to split large folio to * mitigate the failure of longterm pinning. */ retry++; thp_retry += is_thp; nr_retry_pages += nr_pages; /* Undo duplicated failure counting. */ nr_failed--; stats->nr_thp_failed -= is_thp; break; } } stats->nr_failed_pages += nr_pages + nr_retry_pages; /* nr_failed isn't updated for not used */ stats->nr_thp_failed += thp_retry; rc_saved = rc; if (list_empty(&unmap_folios)) goto out; else goto move; case -EAGAIN: retry++; thp_retry += is_thp; nr_retry_pages += nr_pages; break; case MIGRATEPAGE_SUCCESS: stats->nr_succeeded += nr_pages; stats->nr_thp_succeeded += is_thp; break; case MIGRATEPAGE_UNMAP: list_move_tail(&folio->lru, &unmap_folios); list_add_tail(&dst->lru, &dst_folios); break; default: /* * Permanent failure (-EBUSY, etc.): * unlike -EAGAIN case, the failed folio is * removed from migration folio list and not * retried in the next outer loop. 
*/ nr_failed++; stats->nr_thp_failed += is_thp; stats->nr_failed_pages += nr_pages; break; } } } nr_failed += retry; stats->nr_thp_failed += thp_retry; stats->nr_failed_pages += nr_retry_pages; move: /* Flush TLBs for all unmapped folios */ try_to_unmap_flush(); retry = 1; for (pass = 0; pass < nr_pass && retry; pass++) { retry = 0; thp_retry = 0; nr_retry_pages = 0; /* Move the unmapped folios */ migrate_folios_move(&unmap_folios, &dst_folios, put_new_folio, private, mode, reason, ret_folios, stats, &retry, &thp_retry, &nr_failed, &nr_retry_pages); } nr_failed += retry; stats->nr_thp_failed += thp_retry; stats->nr_failed_pages += nr_retry_pages; rc = rc_saved ? : nr_failed; out: /* Cleanup remaining folios */ migrate_folios_undo(&unmap_folios, &dst_folios, put_new_folio, private, ret_folios); return rc; } static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, struct list_head *ret_folios, struct list_head *split_folios, struct migrate_pages_stats *stats) { int rc, nr_failed = 0; LIST_HEAD(folios); struct migrate_pages_stats astats; memset(&astats, 0, sizeof(astats)); /* Try to migrate in batch with MIGRATE_ASYNC mode firstly */ rc = migrate_pages_batch(from, get_new_folio, put_new_folio, private, MIGRATE_ASYNC, reason, &folios, split_folios, &astats, NR_MAX_MIGRATE_ASYNC_RETRY); stats->nr_succeeded += astats.nr_succeeded; stats->nr_thp_succeeded += astats.nr_thp_succeeded; stats->nr_thp_split += astats.nr_thp_split; stats->nr_split += astats.nr_split; if (rc < 0) { stats->nr_failed_pages += astats.nr_failed_pages; stats->nr_thp_failed += astats.nr_thp_failed; list_splice_tail(&folios, ret_folios); return rc; } stats->nr_thp_failed += astats.nr_thp_split; /* * Do not count rc, as pages will be retried below. * Count nr_split only, since it includes nr_thp_split. */ nr_failed += astats.nr_split; /* * Fall back to migrate all failed folios one by one synchronously. All * failed folios except split THPs will be retried, so their failure * isn't counted */ list_splice_tail_init(&folios, from); while (!list_empty(from)) { list_move(from->next, &folios); rc = migrate_pages_batch(&folios, get_new_folio, put_new_folio, private, mode, reason, ret_folios, split_folios, stats, NR_MAX_MIGRATE_SYNC_RETRY); list_splice_tail_init(&folios, ret_folios); if (rc < 0) return rc; nr_failed += rc; } return nr_failed; } /* * migrate_pages - migrate the folios specified in a list, to the free folios * supplied as the target for the page migration * * @from: The list of folios to be migrated. * @get_new_folio: The function used to allocate free folios to be used * as the target of the folio migration. * @put_new_folio: The function used to free target folios if migration * fails, or NULL if no special handling is necessary. * @private: Private data to be passed on to get_new_folio() * @mode: The migration mode that specifies the constraints for * folio migration, if any. * @reason: The reason for folio migration. * @ret_succeeded: Set to the number of folios migrated successfully if * the caller passes a non-NULL pointer. * * The function returns after NR_MAX_MIGRATE_PAGES_RETRY attempts or if no folios * are movable any more because the list has become empty or no retryable folios * exist any more. It is caller's responsibility to call putback_movable_pages() * only if ret != 0. * * Returns the number of {normal folio, large folio, hugetlb} that were not * migrated, or an error code. 
The number of large folio splits will be * considered as the number of non-migrated large folio, no matter how many * split folios of the large folio are migrated successfully. */ int migrate_pages(struct list_head *from, new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, enum migrate_mode mode, int reason, unsigned int *ret_succeeded) { int rc, rc_gather; int nr_pages; struct folio *folio, *folio2; LIST_HEAD(folios); LIST_HEAD(ret_folios); LIST_HEAD(split_folios); struct migrate_pages_stats stats; trace_mm_migrate_pages_start(mode, reason); memset(&stats, 0, sizeof(stats)); rc_gather = migrate_hugetlbs(from, get_new_folio, put_new_folio, private, mode, reason, &stats, &ret_folios); if (rc_gather < 0) goto out; again: nr_pages = 0; list_for_each_entry_safe(folio, folio2, from, lru) { /* Retried hugetlb folios will be kept in list */ if (folio_test_hugetlb(folio)) { list_move_tail(&folio->lru, &ret_folios); continue; } nr_pages += folio_nr_pages(folio); if (nr_pages >= NR_MAX_BATCHED_MIGRATION) break; } if (nr_pages >= NR_MAX_BATCHED_MIGRATION) list_cut_before(&folios, from, &folio2->lru); else list_splice_init(from, &folios); if (mode == MIGRATE_ASYNC) rc = migrate_pages_batch(&folios, get_new_folio, put_new_folio, private, mode, reason, &ret_folios, &split_folios, &stats, NR_MAX_MIGRATE_PAGES_RETRY); else rc = migrate_pages_sync(&folios, get_new_folio, put_new_folio, private, mode, reason, &ret_folios, &split_folios, &stats); list_splice_tail_init(&folios, &ret_folios); if (rc < 0) { rc_gather = rc; list_splice_tail(&split_folios, &ret_folios); goto out; } if (!list_empty(&split_folios)) { /* * Failure isn't counted since all split folios of a large folio * is counted as 1 failure already. And, we only try to migrate * with minimal effort, force MIGRATE_ASYNC mode and retry once. */ migrate_pages_batch(&split_folios, get_new_folio, put_new_folio, private, MIGRATE_ASYNC, reason, &ret_folios, NULL, &stats, 1); list_splice_tail_init(&split_folios, &ret_folios); } rc_gather += rc; if (!list_empty(from)) goto again; out: /* * Put the permanent failure folio back to migration list, they * will be put back to the right list by the caller. */ list_splice(&ret_folios, from); /* * Return 0 in case all split folios of fail-to-migrate large folios * are migrated successfully. 
*/ if (list_empty(from)) rc_gather = 0; count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded); count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages); count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded); count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed); count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split); trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages, stats.nr_thp_succeeded, stats.nr_thp_failed, stats.nr_thp_split, stats.nr_split, mode, reason); if (ret_succeeded) *ret_succeeded = stats.nr_succeeded; return rc_gather; } struct folio *alloc_migration_target(struct folio *src, unsigned long private) { struct migration_target_control *mtc; gfp_t gfp_mask; unsigned int order = 0; int nid; int zidx; mtc = (struct migration_target_control *)private; gfp_mask = mtc->gfp_mask; nid = mtc->nid; if (nid == NUMA_NO_NODE) nid = folio_nid(src); if (folio_test_hugetlb(src)) { struct hstate *h = folio_hstate(src); gfp_mask = htlb_modify_alloc_mask(h, gfp_mask); return alloc_hugetlb_folio_nodemask(h, nid, mtc->nmask, gfp_mask, htlb_allow_alloc_fallback(mtc->reason)); } if (folio_test_large(src)) { /* * clear __GFP_RECLAIM to make the migration callback * consistent with regular THP allocations. */ gfp_mask &= ~__GFP_RECLAIM; gfp_mask |= GFP_TRANSHUGE; order = folio_order(src); } zidx = zone_idx(folio_zone(src)); if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE) gfp_mask |= __GFP_HIGHMEM; return __folio_alloc(gfp_mask, order, nid, mtc->nmask); } #ifdef CONFIG_NUMA static int store_status(int __user *status, int start, int value, int nr) { while (nr-- > 0) { if (put_user(value, status + start)) return -EFAULT; start++; } return 0; } static int do_move_pages_to_node(struct list_head *pagelist, int node) { int err; struct migration_target_control mtc = { .nid = node, .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, .reason = MR_SYSCALL, }; err = migrate_pages(pagelist, alloc_migration_target, NULL, (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL, NULL); if (err) putback_movable_pages(pagelist); return err; } static int __add_folio_for_migration(struct folio *folio, int node, struct list_head *pagelist, bool migrate_all) { if (is_zero_folio(folio) || is_huge_zero_folio(folio)) return -EFAULT; if (folio_is_zone_device(folio)) return -ENOENT; if (folio_nid(folio) == node) return 0; if (folio_likely_mapped_shared(folio) && !migrate_all) return -EACCES; if (folio_test_hugetlb(folio)) { if (folio_isolate_hugetlb(folio, pagelist)) return 1; } else if (folio_isolate_lru(folio)) { list_add_tail(&folio->lru, pagelist); node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio), folio_nr_pages(folio)); return 1; } return -EBUSY; } /* * Resolves the given address to a struct folio, isolates it from the LRU and * puts it to the given pagelist. 
* Returns: * errno - if the folio cannot be found/isolated * 0 - when it doesn't have to be migrated because it is already on the * target node * 1 - when it has been queued */ static int add_folio_for_migration(struct mm_struct *mm, const void __user *p, int node, struct list_head *pagelist, bool migrate_all) { struct vm_area_struct *vma; struct folio_walk fw; struct folio *folio; unsigned long addr; int err = -EFAULT; mmap_read_lock(mm); addr = (unsigned long)untagged_addr_remote(mm, p); vma = vma_lookup(mm, addr); if (vma && vma_migratable(vma)) { folio = folio_walk_start(&fw, vma, addr, FW_ZEROPAGE); if (folio) { err = __add_folio_for_migration(folio, node, pagelist, migrate_all); folio_walk_end(&fw, vma); } else { err = -ENOENT; } } mmap_read_unlock(mm); return err; } static int move_pages_and_store_status(int node, struct list_head *pagelist, int __user *status, int start, int i, unsigned long nr_pages) { int err; if (list_empty(pagelist)) return 0; err = do_move_pages_to_node(pagelist, node); if (err) { /* * Positive err means the number of failed * pages to migrate. Since we are going to * abort and return the number of non-migrated * pages, so need to include the rest of the * nr_pages that have not been attempted as * well. */ if (err > 0) err += nr_pages - i; return err; } return store_status(status, start, node, i - start); } /* * Migrate an array of page address onto an array of nodes and fill * the corresponding array of status. */ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, unsigned long nr_pages, const void __user * __user *pages, const int __user *nodes, int __user *status, int flags) { compat_uptr_t __user *compat_pages = (void __user *)pages; int current_node = NUMA_NO_NODE; LIST_HEAD(pagelist); int start, i; int err = 0, err1; lru_cache_disable(); for (i = start = 0; i < nr_pages; i++) { const void __user *p; int node; err = -EFAULT; if (in_compat_syscall()) { compat_uptr_t cp; if (get_user(cp, compat_pages + i)) goto out_flush; p = compat_ptr(cp); } else { if (get_user(p, pages + i)) goto out_flush; } if (get_user(node, nodes + i)) goto out_flush; err = -ENODEV; if (node < 0 || node >= MAX_NUMNODES) goto out_flush; if (!node_state(node, N_MEMORY)) goto out_flush; err = -EACCES; if (!node_isset(node, task_nodes)) goto out_flush; if (current_node == NUMA_NO_NODE) { current_node = node; start = i; } else if (node != current_node) { err = move_pages_and_store_status(current_node, &pagelist, status, start, i, nr_pages); if (err) goto out; start = i; current_node = node; } /* * Errors in the page lookup or isolation are not fatal and we simply * report them via status */ err = add_folio_for_migration(mm, p, current_node, &pagelist, flags & MPOL_MF_MOVE_ALL); if (err > 0) { /* The page is successfully queued for migration */ continue; } /* * The move_pages() man page does not have an -EEXIST choice, so * use -EFAULT instead. */ if (err == -EEXIST) err = -EFAULT; /* * If the page is already on the target node (!err), store the * node, otherwise, store the err. */ err = store_status(status, i, err ? 
: current_node, 1); if (err) goto out_flush; err = move_pages_and_store_status(current_node, &pagelist, status, start, i, nr_pages); if (err) { /* We have accounted for page i */ if (err > 0) err--; goto out; } current_node = NUMA_NO_NODE; } out_flush: /* Make sure we do not overwrite the existing error */ err1 = move_pages_and_store_status(current_node, &pagelist, status, start, i, nr_pages); if (err >= 0) err = err1; out: lru_cache_enable(); return err; } /* * Determine the nodes of an array of pages and store it in an array of status. */ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages, const void __user **pages, int *status) { unsigned long i; mmap_read_lock(mm); for (i = 0; i < nr_pages; i++) { unsigned long addr = (unsigned long)(*pages); struct vm_area_struct *vma; struct folio_walk fw; struct folio *folio; int err = -EFAULT; vma = vma_lookup(mm, addr); if (!vma) goto set_status; folio = folio_walk_start(&fw, vma, addr, FW_ZEROPAGE); if (folio) { if (is_zero_folio(folio) || is_huge_zero_folio(folio)) err = -EFAULT; else if (folio_is_zone_device(folio)) err = -ENOENT; else err = folio_nid(folio); folio_walk_end(&fw, vma); } else { err = -ENOENT; } set_status: *status = err; pages++; status++; } mmap_read_unlock(mm); } static int get_compat_pages_array(const void __user *chunk_pages[], const void __user * __user *pages, unsigned long chunk_nr) { compat_uptr_t __user *pages32 = (compat_uptr_t __user *)pages; compat_uptr_t p; int i; for (i = 0; i < chunk_nr; i++) { if (get_user(p, pages32 + i)) return -EFAULT; chunk_pages[i] = compat_ptr(p); } return 0; } /* * Determine the nodes of a user array of pages and store it in * a user array of status. */ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages, const void __user * __user *pages, int __user *status) { #define DO_PAGES_STAT_CHUNK_NR 16UL const void __user *chunk_pages[DO_PAGES_STAT_CHUNK_NR]; int chunk_status[DO_PAGES_STAT_CHUNK_NR]; while (nr_pages) { unsigned long chunk_nr = min(nr_pages, DO_PAGES_STAT_CHUNK_NR); if (in_compat_syscall()) { if (get_compat_pages_array(chunk_pages, pages, chunk_nr)) break; } else { if (copy_from_user(chunk_pages, pages, chunk_nr * sizeof(*chunk_pages))) break; } do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status); if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status))) break; pages += chunk_nr; status += chunk_nr; nr_pages -= chunk_nr; } return nr_pages ? -EFAULT : 0; } static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes) { struct task_struct *task; struct mm_struct *mm; /* * There is no need to check if current process has the right to modify * the specified process when they are same. */ if (!pid) { mmget(current->mm); *mem_nodes = cpuset_mems_allowed(current); return current->mm; } task = find_get_task_by_vpid(pid); if (!task) { return ERR_PTR(-ESRCH); } /* * Check if this process has the right to modify the specified * process. Use the regular "ptrace_may_access()" checks. */ if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) { mm = ERR_PTR(-EPERM); goto out; } mm = ERR_PTR(security_task_movememory(task)); if (IS_ERR(mm)) goto out; *mem_nodes = cpuset_mems_allowed(task); mm = get_task_mm(task); out: put_task_struct(task); if (!mm) mm = ERR_PTR(-EINVAL); return mm; } /* * Move a list of pages in the address space of the currently executing * process. 
*/ static int kernel_move_pages(pid_t pid, unsigned long nr_pages, const void __user * __user *pages, const int __user *nodes, int __user *status, int flags) { struct mm_struct *mm; int err; nodemask_t task_nodes; /* Check flags */ if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL)) return -EINVAL; if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE)) return -EPERM; mm = find_mm_struct(pid, &task_nodes); if (IS_ERR(mm)) return PTR_ERR(mm); if (nodes) err = do_pages_move(mm, task_nodes, nr_pages, pages, nodes, status, flags); else err = do_pages_stat(mm, nr_pages, pages, status); mmput(mm); return err; } SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages, const void __user * __user *, pages, const int __user *, nodes, int __user *, status, int, flags) { return kernel_move_pages(pid, nr_pages, pages, nodes, status, flags); } #ifdef CONFIG_NUMA_BALANCING /* * Returns true if this is a safe migration target node for misplaced NUMA * pages. Currently it only checks the watermarks which is crude. */ static bool migrate_balanced_pgdat(struct pglist_data *pgdat, unsigned long nr_migrate_pages) { int z; for (z = pgdat->nr_zones - 1; z >= 0; z--) { struct zone *zone = pgdat->node_zones + z; if (!managed_zone(zone)) continue; /* Avoid waking kswapd by allocating pages_to_migrate pages. */ if (!zone_watermark_ok(zone, 0, high_wmark_pages(zone) + nr_migrate_pages, ZONE_MOVABLE, ALLOC_CMA)) continue; return true; } return false; } static struct folio *alloc_misplaced_dst_folio(struct folio *src, unsigned long data) { int nid = (int) data; int order = folio_order(src); gfp_t gfp = __GFP_THISNODE; if (order > 0) gfp |= GFP_TRANSHUGE_LIGHT; else { gfp |= GFP_HIGHUSER_MOVABLE | __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN; gfp &= ~__GFP_RECLAIM; } return __folio_alloc_node(gfp, order, nid); } /* * Prepare for calling migrate_misplaced_folio() by isolating the folio if * permitted. Must be called with the PTL still held. */ int migrate_misplaced_folio_prepare(struct folio *folio, struct vm_area_struct *vma, int node) { int nr_pages = folio_nr_pages(folio); pg_data_t *pgdat = NODE_DATA(node); if (folio_is_file_lru(folio)) { /* * Do not migrate file folios that are mapped in multiple * processes with execute permissions as they are probably * shared libraries. * * See folio_likely_mapped_shared() on possible imprecision * when we cannot easily detect if a folio is shared. */ if ((vma->vm_flags & VM_EXEC) && folio_likely_mapped_shared(folio)) return -EACCES; /* * Do not migrate dirty folios as not all filesystems can move * dirty folios in MIGRATE_ASYNC mode which is a waste of * cycles. */ if (folio_test_dirty(folio)) return -EAGAIN; } /* Avoid migrating to a node that is nearly full */ if (!migrate_balanced_pgdat(pgdat, nr_pages)) { int z; if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)) return -EAGAIN; for (z = pgdat->nr_zones - 1; z >= 0; z--) { if (managed_zone(pgdat->node_zones + z)) break; } /* * If there are no managed zones, it should not proceed * further. */ if (z < 0) return -EAGAIN; wakeup_kswapd(pgdat->node_zones + z, 0, folio_order(folio), ZONE_MOVABLE); return -EAGAIN; } if (!folio_isolate_lru(folio)) return -EAGAIN; node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio), nr_pages); return 0; } /* * Attempt to migrate a misplaced folio to the specified destination * node. Caller is expected to have isolated the folio by calling * migrate_misplaced_folio_prepare(), which will result in an * elevated reference count on the folio. 
This function will un-isolate the
 * folio, dereferencing the folio before returning.
 */
int migrate_misplaced_folio(struct folio *folio, int node)
{
	pg_data_t *pgdat = NODE_DATA(node);
	int nr_remaining;
	unsigned int nr_succeeded;
	LIST_HEAD(migratepages);
	struct mem_cgroup *memcg = get_mem_cgroup_from_folio(folio);
	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);

	list_add(&folio->lru, &migratepages);
	nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
				     NULL, node, MIGRATE_ASYNC,
				     MR_NUMA_MISPLACED, &nr_succeeded);
	if (nr_remaining && !list_empty(&migratepages))
		putback_movable_pages(&migratepages);
	if (nr_succeeded) {
		count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
		count_memcg_events(memcg, NUMA_PAGE_MIGRATE, nr_succeeded);
		if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
		    && !node_is_toptier(folio_nid(folio))
		    && node_is_toptier(node))
			mod_lruvec_state(lruvec, PGPROMOTE_SUCCESS,
					 nr_succeeded);
	}
	mem_cgroup_put(memcg);
	BUG_ON(!list_empty(&migratepages));
	return nr_remaining ? -EAGAIN : 0;
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_NUMA */
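/*
 * Hedged userspace sketch (not part of the kernel file above): exercising
 * the move_pages(2) syscall that kernel_move_pages() implements. It assumes
 * libnuma's <numaif.h> wrapper and that NUMA node 0 exists; build with
 * -lnuma. A non-NULL nodes array takes the do_pages_move() path above;
 * nodes == NULL only queries each page's current node via do_pages_stat().
 */
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *page = NULL;
	int node = 0;		/* destination node: an assumption */
	int status = -1;	/* gets the node number or a negative errno */

	if (posix_memalign(&page, page_size, page_size))
		return 1;
	memset(page, 0, page_size);	/* fault the page in first */

	/* pid 0 targets the calling process (see find_mm_struct() above) */
	if (move_pages(0, 1, &page, &node, &status, MPOL_MF_MOVE))
		perror("move_pages");
	else
		printf("page placed on node %d\n", status);

	/* nodes == NULL: just report where the page currently resides */
	if (move_pages(0, 1, &page, NULL, &status, 0) == 0)
		printf("queried node: %d\n", status);

	free(page);
	return 0;
}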
// SPDX-License-Identifier: GPL-2.0-only
/*
 * IPv6 library code, needed by static components when full IPv6 support is
 * not configured or static. These functions are needed by GSO/GRO implementation.
 */
#include <linux/export.h>
#include <net/ip.h>
#include <net/ipv6.h>
#include <net/ip6_fib.h>
#include <net/addrconf.h>
#include <net/secure_seq.h>
#include <linux/netfilter.h>

static u32 __ipv6_select_ident(struct net *net,
			       const struct in6_addr *dst,
			       const struct in6_addr *src)
{
	return get_random_u32_above(0);
}

/* This function exists only for tap drivers that must support broken
 * clients requesting UFO without specifying an IPv6 fragment ID.
 *
 * This is similar to ipv6_select_ident() but we use an independent hash
 * seed to limit information leakage.
 *
 * The network header must be set before calling this.
 */
__be32 ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb)
{
	struct in6_addr buf[2];
	struct in6_addr *addrs;
	u32 id;

	addrs = skb_header_pointer(skb,
				   skb_network_offset(skb) +
				   offsetof(struct ipv6hdr, saddr),
				   sizeof(buf), buf);
	if (!addrs)
		return 0;

	id = __ipv6_select_ident(net, &addrs[1], &addrs[0]);
	return htonl(id);
}
EXPORT_SYMBOL_GPL(ipv6_proxy_select_ident);

__be32 ipv6_select_ident(struct net *net,
			 const struct in6_addr *daddr,
			 const struct in6_addr *saddr)
{
	u32 id;

	id = __ipv6_select_ident(net, daddr, saddr);
	return htonl(id);
}
EXPORT_SYMBOL(ipv6_select_ident);

int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
{
	unsigned int offset = sizeof(struct ipv6hdr);
	unsigned int packet_len = skb_tail_pointer(skb) -
		skb_network_header(skb);
	int found_rhdr = 0;

	*nexthdr = &ipv6_hdr(skb)->nexthdr;

	while (offset <= packet_len) {
		struct ipv6_opt_hdr *exthdr;

		switch (**nexthdr) {
		case NEXTHDR_HOP:
			break;
		case NEXTHDR_ROUTING:
			found_rhdr = 1;
			break;
		case NEXTHDR_DEST:
#if IS_ENABLED(CONFIG_IPV6_MIP6)
			if (ipv6_find_tlv(skb, offset, IPV6_TLV_HAO) >= 0)
				break;
#endif
			if (found_rhdr)
				return offset;
			break;
		default:
			return offset;
		}

		if (offset + sizeof(struct ipv6_opt_hdr) > packet_len)
			return -EINVAL;

		exthdr = (struct ipv6_opt_hdr *)(skb_network_header(skb) +
						 offset);
		offset += ipv6_optlen(exthdr);
		if (offset > IPV6_MAXPLEN)
			return -EINVAL;
		*nexthdr = &exthdr->nexthdr;
	}

	return -EINVAL;
}
EXPORT_SYMBOL(ip6_find_1stfragopt);

#if IS_ENABLED(CONFIG_IPV6)
int ip6_dst_hoplimit(struct dst_entry *dst)
{
	int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);

	if (hoplimit == 0) {
		struct net_device *dev = dst->dev;
		struct inet6_dev *idev;

		rcu_read_lock();
		idev = __in6_dev_get(dev);
		if (idev)
			hoplimit = READ_ONCE(idev->cnf.hop_limit);
		else
			hoplimit = READ_ONCE(dev_net(dev)->ipv6.devconf_all->hop_limit);
		rcu_read_unlock();
	}
	return hoplimit;
}
EXPORT_SYMBOL(ip6_dst_hoplimit);
#endif

int __ip6_local_out(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	int len;

	len = skb->len - sizeof(struct ipv6hdr);
	if (len > IPV6_MAXPLEN)
		len = 0;
	ipv6_hdr(skb)->payload_len = htons(len);
	IP6CB(skb)->nhoff = offsetof(struct ipv6hdr, nexthdr);

	/* if egress device is enslaved to an L3 master device pass the
	 * skb to its handler for processing
	 */
	skb = l3mdev_ip6_out(sk, skb);
	if (unlikely(!skb))
		return 0;

	skb->protocol = htons(ETH_P_IPV6);

	return nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
		       net, sk, skb, NULL, skb_dst(skb)->dev,
		       dst_output);
}
EXPORT_SYMBOL_GPL(__ip6_local_out);

int ip6_local_out(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	int err;

	err = __ip6_local_out(net, sk, skb);
	if (likely(err == 1))
		err = dst_output(net, sk, skb);

	return err;
}
EXPORT_SYMBOL_GPL(ip6_local_out);
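/*
 * Standalone sketch (an illustration, not from the file above) of the
 * payload-length rule __ip6_local_out() applies: payload_len counts the
 * bytes after the fixed 40-byte IPv6 header, and a payload larger than
 * IPV6_MAXPLEN (65535) is stored as 0, the RFC 2675 jumbogram convention.
 * The EX_ macros are local stand-ins for the kernel definitions.
 */
#include <stdint.h>
#include <stdio.h>

#define EX_IPV6_HDRLEN	40	/* sizeof(struct ipv6hdr) */
#define EX_IPV6_MAXPLEN	65535

/* packet_len must include the fixed header, as skb->len does */
static uint16_t ex_payload_len(uint32_t packet_len)
{
	uint32_t len = packet_len - EX_IPV6_HDRLEN;

	return len > EX_IPV6_MAXPLEN ? 0 : (uint16_t)len;
}

int main(void)
{
	printf("%u\n", ex_payload_len(1540));	/* 1500 */
	printf("%u\n", ex_payload_len(90000));	/* 0: jumbogram case */
	return 0;
}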
// SPDX-License-Identifier: GPL-2.0-or-later
/* In-software asymmetric public-key crypto subtype
 *
 * See Documentation/crypto/asymmetric-keys.rst
 *
 * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
 * Written by David Howells (dhowells@redhat.com)
 */

#define pr_fmt(fmt) "PKEY: "fmt
#include <crypto/akcipher.h>
#include <crypto/public_key.h>
#include <crypto/sig.h>
#include <keys/asymmetric-subtype.h>
#include <linux/asn1.h>
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
#include <linux/string.h>

MODULE_DESCRIPTION("In-software asymmetric public-key subtype");
MODULE_AUTHOR("Red Hat, Inc.");
MODULE_LICENSE("GPL");

/*
 * Provide a part of a description of the key for /proc/keys.
 */
static void public_key_describe(const struct key *asymmetric_key,
				struct seq_file *m)
{
	struct public_key *key = asymmetric_key->payload.data[asym_crypto];

	if (key)
		seq_printf(m, "%s.%s", key->id_type, key->pkey_algo);
}

/*
 * Destroy a public key algorithm key.
 */
void public_key_free(struct public_key *key)
{
	if (key) {
		kfree_sensitive(key->key);
		kfree(key->params);
		kfree(key);
	}
}
EXPORT_SYMBOL_GPL(public_key_free);

/*
 * Destroy a public key algorithm key.
 */
static void public_key_destroy(void *payload0, void *payload3)
{
	public_key_free(payload0);
	public_key_signature_free(payload3);
}

/*
 * Given a public_key, and an encoding and hash_algo to be used for signing
 * and/or verification with that key, determine the name of the corresponding
 * akcipher algorithm. Also check that encoding and hash_algo are allowed.
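 *
 * On success this returns 0 and fills alg_name[]; *sig reports whether the
 * chosen algorithm is driven through the sig API (sign/verify) rather than
 * the akcipher API (encrypt/decrypt).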
*/ static int software_key_determine_akcipher(const struct public_key *pkey, const char *encoding, const char *hash_algo, char alg_name[CRYPTO_MAX_ALG_NAME], bool *sig, enum kernel_pkey_operation op) { int n; *sig = true; if (!encoding) return -EINVAL; if (strcmp(pkey->pkey_algo, "rsa") == 0) { /* * RSA signatures usually use EMSA-PKCS1-1_5 [RFC3447 sec 8.2]. */ if (strcmp(encoding, "pkcs1") == 0) { *sig = op == kernel_pkey_sign || op == kernel_pkey_verify; if (!*sig) { /* * For encrypt/decrypt, hash_algo is not used * but allowed to be set for historic reasons. */ n = snprintf(alg_name, CRYPTO_MAX_ALG_NAME, "pkcs1pad(%s)", pkey->pkey_algo); } else { if (!hash_algo) hash_algo = "none"; n = snprintf(alg_name, CRYPTO_MAX_ALG_NAME, "pkcs1(%s,%s)", pkey->pkey_algo, hash_algo); } return n >= CRYPTO_MAX_ALG_NAME ? -EINVAL : 0; } if (strcmp(encoding, "raw") != 0) return -EINVAL; /* * Raw RSA cannot differentiate between different hash * algorithms. */ if (hash_algo) return -EINVAL; *sig = false; } else if (strncmp(pkey->pkey_algo, "ecdsa", 5) == 0) { if (strcmp(encoding, "x962") != 0 && strcmp(encoding, "p1363") != 0) return -EINVAL; /* * ECDSA signatures are taken over a raw hash, so they don't * differentiate between different hash algorithms. That means * that the verifier should hard-code a specific hash algorithm. * Unfortunately, in practice ECDSA is used with multiple SHAs, * so we have to allow all of them and not just one. */ if (!hash_algo) return -EINVAL; if (strcmp(hash_algo, "sha1") != 0 && strcmp(hash_algo, "sha224") != 0 && strcmp(hash_algo, "sha256") != 0 && strcmp(hash_algo, "sha384") != 0 && strcmp(hash_algo, "sha512") != 0 && strcmp(hash_algo, "sha3-256") != 0 && strcmp(hash_algo, "sha3-384") != 0 && strcmp(hash_algo, "sha3-512") != 0) return -EINVAL; n = snprintf(alg_name, CRYPTO_MAX_ALG_NAME, "%s(%s)", encoding, pkey->pkey_algo); return n >= CRYPTO_MAX_ALG_NAME ? -EINVAL : 0; } else if (strcmp(pkey->pkey_algo, "ecrdsa") == 0) { if (strcmp(encoding, "raw") != 0) return -EINVAL; if (!hash_algo) return -EINVAL; if (strcmp(hash_algo, "streebog256") != 0 && strcmp(hash_algo, "streebog512") != 0) return -EINVAL; } else { /* Unknown public key algorithm */ return -ENOPKG; } if (strscpy(alg_name, pkey->pkey_algo, CRYPTO_MAX_ALG_NAME) < 0) return -EINVAL; return 0; } static u8 *pkey_pack_u32(u8 *dst, u32 val) { memcpy(dst, &val, sizeof(val)); return dst + sizeof(val); } /* * Query information about a key. 
*/ static int software_key_query(const struct kernel_pkey_params *params, struct kernel_pkey_query *info) { struct crypto_akcipher *tfm; struct public_key *pkey = params->key->payload.data[asym_crypto]; char alg_name[CRYPTO_MAX_ALG_NAME]; struct crypto_sig *sig; u8 *key, *ptr; int ret, len; bool issig; ret = software_key_determine_akcipher(pkey, params->encoding, params->hash_algo, alg_name, &issig, kernel_pkey_sign); if (ret < 0) return ret; key = kmalloc(pkey->keylen + sizeof(u32) * 2 + pkey->paramlen, GFP_KERNEL); if (!key) return -ENOMEM; memcpy(key, pkey->key, pkey->keylen); ptr = key + pkey->keylen; ptr = pkey_pack_u32(ptr, pkey->algo); ptr = pkey_pack_u32(ptr, pkey->paramlen); memcpy(ptr, pkey->params, pkey->paramlen); if (issig) { sig = crypto_alloc_sig(alg_name, 0, 0); if (IS_ERR(sig)) { ret = PTR_ERR(sig); goto error_free_key; } if (pkey->key_is_private) ret = crypto_sig_set_privkey(sig, key, pkey->keylen); else ret = crypto_sig_set_pubkey(sig, key, pkey->keylen); if (ret < 0) goto error_free_tfm; len = crypto_sig_keysize(sig); info->max_sig_size = crypto_sig_maxsize(sig); info->max_data_size = crypto_sig_digestsize(sig); info->supported_ops = KEYCTL_SUPPORTS_VERIFY; if (pkey->key_is_private) info->supported_ops |= KEYCTL_SUPPORTS_SIGN; if (strcmp(params->encoding, "pkcs1") == 0) { info->supported_ops |= KEYCTL_SUPPORTS_ENCRYPT; if (pkey->key_is_private) info->supported_ops |= KEYCTL_SUPPORTS_DECRYPT; } } else { tfm = crypto_alloc_akcipher(alg_name, 0, 0); if (IS_ERR(tfm)) { ret = PTR_ERR(tfm); goto error_free_key; } if (pkey->key_is_private) ret = crypto_akcipher_set_priv_key(tfm, key, pkey->keylen); else ret = crypto_akcipher_set_pub_key(tfm, key, pkey->keylen); if (ret < 0) goto error_free_tfm; len = crypto_akcipher_maxsize(tfm); info->max_sig_size = len; info->max_data_size = len; info->supported_ops = KEYCTL_SUPPORTS_ENCRYPT; if (pkey->key_is_private) info->supported_ops |= KEYCTL_SUPPORTS_DECRYPT; } info->key_size = len * 8; info->max_enc_size = len; info->max_dec_size = len; ret = 0; error_free_tfm: if (issig) crypto_free_sig(sig); else crypto_free_akcipher(tfm); error_free_key: kfree_sensitive(key); pr_devel("<==%s() = %d\n", __func__, ret); return ret; } /* * Do encryption, decryption and signing ops. 
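 *
 * params->op selects the path below: encrypt and decrypt go through the
 * akcipher API, sign goes through the sig API, and an op that does not
 * match the issig result from software_key_determine_akcipher() falls
 * through to -EINVAL.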
*/ static int software_key_eds_op(struct kernel_pkey_params *params, const void *in, void *out) { const struct public_key *pkey = params->key->payload.data[asym_crypto]; char alg_name[CRYPTO_MAX_ALG_NAME]; struct crypto_akcipher *tfm; struct crypto_sig *sig; char *key, *ptr; bool issig; int ksz; int ret; pr_devel("==>%s()\n", __func__); ret = software_key_determine_akcipher(pkey, params->encoding, params->hash_algo, alg_name, &issig, params->op); if (ret < 0) return ret; key = kmalloc(pkey->keylen + sizeof(u32) * 2 + pkey->paramlen, GFP_KERNEL); if (!key) return -ENOMEM; memcpy(key, pkey->key, pkey->keylen); ptr = key + pkey->keylen; ptr = pkey_pack_u32(ptr, pkey->algo); ptr = pkey_pack_u32(ptr, pkey->paramlen); memcpy(ptr, pkey->params, pkey->paramlen); if (issig) { sig = crypto_alloc_sig(alg_name, 0, 0); if (IS_ERR(sig)) { ret = PTR_ERR(sig); goto error_free_key; } if (pkey->key_is_private) ret = crypto_sig_set_privkey(sig, key, pkey->keylen); else ret = crypto_sig_set_pubkey(sig, key, pkey->keylen); if (ret) goto error_free_tfm; ksz = crypto_sig_keysize(sig); } else { tfm = crypto_alloc_akcipher(alg_name, 0, 0); if (IS_ERR(tfm)) { ret = PTR_ERR(tfm); goto error_free_key; } if (pkey->key_is_private) ret = crypto_akcipher_set_priv_key(tfm, key, pkey->keylen); else ret = crypto_akcipher_set_pub_key(tfm, key, pkey->keylen); if (ret) goto error_free_tfm; ksz = crypto_akcipher_maxsize(tfm); } ret = -EINVAL; /* Perform the encryption calculation. */ switch (params->op) { case kernel_pkey_encrypt: if (issig) break; ret = crypto_akcipher_sync_encrypt(tfm, in, params->in_len, out, params->out_len); break; case kernel_pkey_decrypt: if (issig) break; ret = crypto_akcipher_sync_decrypt(tfm, in, params->in_len, out, params->out_len); break; case kernel_pkey_sign: if (!issig) break; ret = crypto_sig_sign(sig, in, params->in_len, out, params->out_len); break; default: BUG(); } if (ret == 0) ret = ksz; error_free_tfm: if (issig) crypto_free_sig(sig); else crypto_free_akcipher(tfm); error_free_key: kfree_sensitive(key); pr_devel("<==%s() = %d\n", __func__, ret); return ret; } /* * Verify a signature using a public key. */ int public_key_verify_signature(const struct public_key *pkey, const struct public_key_signature *sig) { char alg_name[CRYPTO_MAX_ALG_NAME]; struct crypto_sig *tfm; char *key, *ptr; bool issig; int ret; pr_devel("==>%s()\n", __func__); BUG_ON(!pkey); BUG_ON(!sig); BUG_ON(!sig->s); /* * If the signature specifies a public key algorithm, it *must* match * the key's actual public key algorithm. * * Small exception: ECDSA signatures don't specify the curve, but ECDSA * keys do. So the strings can mismatch slightly in that case: * "ecdsa-nist-*" for the key, but "ecdsa" for the signature. 
 */
	if (sig->pkey_algo) {
		if (strcmp(pkey->pkey_algo, sig->pkey_algo) != 0 &&
		    (strncmp(pkey->pkey_algo, "ecdsa-", 6) != 0 ||
		     strcmp(sig->pkey_algo, "ecdsa") != 0))
			return -EKEYREJECTED;
	}

	ret = software_key_determine_akcipher(pkey, sig->encoding,
					      sig->hash_algo, alg_name,
					      &issig, kernel_pkey_verify);
	if (ret < 0)
		return ret;

	tfm = crypto_alloc_sig(alg_name, 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	key = kmalloc(pkey->keylen + sizeof(u32) * 2 + pkey->paramlen,
		      GFP_KERNEL);
	if (!key) {
		ret = -ENOMEM;
		goto error_free_tfm;
	}

	memcpy(key, pkey->key, pkey->keylen);
	ptr = key + pkey->keylen;
	ptr = pkey_pack_u32(ptr, pkey->algo);
	ptr = pkey_pack_u32(ptr, pkey->paramlen);
	memcpy(ptr, pkey->params, pkey->paramlen);

	if (pkey->key_is_private)
		ret = crypto_sig_set_privkey(tfm, key, pkey->keylen);
	else
		ret = crypto_sig_set_pubkey(tfm, key, pkey->keylen);
	if (ret)
		goto error_free_key;

	ret = crypto_sig_verify(tfm, sig->s, sig->s_size,
				sig->digest, sig->digest_size);

error_free_key:
	kfree_sensitive(key);
error_free_tfm:
	crypto_free_sig(tfm);
	pr_devel("<==%s() = %d\n", __func__, ret);
	if (WARN_ON_ONCE(ret > 0))
		ret = -EINVAL;
	return ret;
}
EXPORT_SYMBOL_GPL(public_key_verify_signature);

static int public_key_verify_signature_2(const struct key *key,
					 const struct public_key_signature *sig)
{
	const struct public_key *pk = key->payload.data[asym_crypto];
	return public_key_verify_signature(pk, sig);
}

/*
 * Public key algorithm asymmetric key subtype
 */
struct asymmetric_key_subtype public_key_subtype = {
	.owner			= THIS_MODULE,
	.name			= "public_key",
	.name_len		= sizeof("public_key") - 1,
	.describe		= public_key_describe,
	.destroy		= public_key_destroy,
	.query			= software_key_query,
	.eds_op			= software_key_eds_op,
	.verify_signature	= public_key_verify_signature_2,
};
EXPORT_SYMBOL_GPL(public_key_subtype);
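/*
 * Hedged usage sketch (not part of the file above): how a caller such as
 * the X.509 verifier might fill a struct public_key_signature and call
 * public_key_verify_signature(). example_verify() and its parameters are
 * hypothetical; real callers take these fields from a parsed certificate.
 * Assumes an RSA key with a SHA-256 EMSA-PKCS1-v1_5 signature.
 */
#include <crypto/public_key.h>
#include <linux/types.h>

static int example_verify(const struct public_key *pkey,
			  const u8 *raw_sig, u32 sig_len,
			  const u8 *digest, u8 digest_len)
{
	struct public_key_signature sig = {
		.s		= (u8 *)raw_sig,
		.s_size		= sig_len,
		.digest		= (u8 *)digest,
		.digest_size	= digest_len,
		.pkey_algo	= "rsa",	/* assumption: RSA key */
		.hash_algo	= "sha256",
		.encoding	= "pkcs1",	/* EMSA-PKCS1-v1_5 */
	};

	/* 0 on success; -EKEYREJECTED, -EINVAL, -ENOPKG etc. on failure */
	return public_key_verify_signature(pkey, &sig);
}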
// SPDX-License-Identifier: GPL-2.0
/*
 * Framework for userspace DMA-BUF allocations
 *
 * Copyright (C) 2011 Google, Inc.
 * Copyright (C) 2019 Linaro Ltd.
 */

#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/dma-buf.h>
#include <linux/dma-heap.h>
#include <linux/err.h>
#include <linux/list.h>
#include <linux/nospec.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
#include <linux/xarray.h>
#include <uapi/linux/dma-heap.h>

#define DEVNAME "dma_heap"

#define NUM_HEAP_MINORS 128

/**
 * struct dma_heap - represents a dmabuf heap in the system
 * @name: used for debugging/device-node name
 * @ops: ops struct for this heap
 * @priv: private data for this heap
 * @heap_devt: heap device node
 * @list: list head connecting to list of heaps
 * @heap_cdev: heap char device
 *
 * Represents a heap of memory from which buffers can be made.
 */
struct dma_heap {
	const char *name;
	const struct dma_heap_ops *ops;
	void *priv;
	dev_t heap_devt;
	struct list_head list;
	struct cdev heap_cdev;
};

static LIST_HEAD(heap_list);
static DEFINE_MUTEX(heap_list_lock);
static dev_t dma_heap_devt;
static struct class *dma_heap_class;
static DEFINE_XARRAY_ALLOC(dma_heap_minors);

static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
				 u32 fd_flags,
				 u64 heap_flags)
{
	struct dma_buf *dmabuf;
	int fd;

	/*
	 * Allocations from all heaps have to begin
	 * and end on page boundaries.
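	 *
	 * PAGE_ALIGN() rounds the request up, so a huge len whose rounding
	 * overflows comes back as 0 and is rejected below.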
*/ len = PAGE_ALIGN(len); if (!len) return -EINVAL; dmabuf = heap->ops->allocate(heap, len, fd_flags, heap_flags); if (IS_ERR(dmabuf)) return PTR_ERR(dmabuf); fd = dma_buf_fd(dmabuf, fd_flags); if (fd < 0) { dma_buf_put(dmabuf); /* just return, as put will call release and that will free */ } return fd; } static int dma_heap_open(struct inode *inode, struct file *file) { struct dma_heap *heap; heap = xa_load(&dma_heap_minors, iminor(inode)); if (!heap) { pr_err("dma_heap: minor %d unknown.\n", iminor(inode)); return -ENODEV; } /* instance data as context */ file->private_data = heap; nonseekable_open(inode, file); return 0; } static long dma_heap_ioctl_allocate(struct file *file, void *data) { struct dma_heap_allocation_data *heap_allocation = data; struct dma_heap *heap = file->private_data; int fd; if (heap_allocation->fd) return -EINVAL; if (heap_allocation->fd_flags & ~DMA_HEAP_VALID_FD_FLAGS) return -EINVAL; if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS) return -EINVAL; fd = dma_heap_buffer_alloc(heap, heap_allocation->len, heap_allocation->fd_flags, heap_allocation->heap_flags); if (fd < 0) return fd; heap_allocation->fd = fd; return 0; } static unsigned int dma_heap_ioctl_cmds[] = { DMA_HEAP_IOCTL_ALLOC, }; static long dma_heap_ioctl(struct file *file, unsigned int ucmd, unsigned long arg) { char stack_kdata[128]; char *kdata = stack_kdata; unsigned int kcmd; unsigned int in_size, out_size, drv_size, ksize; int nr = _IOC_NR(ucmd); int ret = 0; if (nr >= ARRAY_SIZE(dma_heap_ioctl_cmds)) return -EINVAL; nr = array_index_nospec(nr, ARRAY_SIZE(dma_heap_ioctl_cmds)); /* Get the kernel ioctl cmd that matches */ kcmd = dma_heap_ioctl_cmds[nr]; /* Figure out the delta between user cmd size and kernel cmd size */ drv_size = _IOC_SIZE(kcmd); out_size = _IOC_SIZE(ucmd); in_size = out_size; if ((ucmd & kcmd & IOC_IN) == 0) in_size = 0; if ((ucmd & kcmd & IOC_OUT) == 0) out_size = 0; ksize = max(max(in_size, out_size), drv_size); /* If necessary, allocate buffer for ioctl argument */ if (ksize > sizeof(stack_kdata)) { kdata = kmalloc(ksize, GFP_KERNEL); if (!kdata) return -ENOMEM; } if (copy_from_user(kdata, (void __user *)arg, in_size) != 0) { ret = -EFAULT; goto err; } /* zero out any difference between the kernel/user structure size */ if (ksize > in_size) memset(kdata + in_size, 0, ksize - in_size); switch (kcmd) { case DMA_HEAP_IOCTL_ALLOC: ret = dma_heap_ioctl_allocate(file, kdata); break; default: ret = -ENOTTY; goto err; } if (copy_to_user((void __user *)arg, kdata, out_size) != 0) ret = -EFAULT; err: if (kdata != stack_kdata) kfree(kdata); return ret; } static const struct file_operations dma_heap_fops = { .owner = THIS_MODULE, .open = dma_heap_open, .unlocked_ioctl = dma_heap_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = dma_heap_ioctl, #endif }; /** * dma_heap_get_drvdata - get per-heap driver data * @heap: DMA-Heap to retrieve private data for * * Returns: * The per-heap data for the heap. */ void *dma_heap_get_drvdata(struct dma_heap *heap) { return heap->priv; } /** * dma_heap_get_name - get heap name * @heap: DMA-Heap to retrieve the name of * * Returns: * The char* for the heap name. 
 */
const char *dma_heap_get_name(struct dma_heap *heap)
{
	return heap->name;
}

/**
 * dma_heap_add - adds a heap to dmabuf heaps
 * @exp_info: information needed to register this heap
 */
struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
{
	struct dma_heap *heap, *h, *err_ret;
	struct device *dev_ret;
	unsigned int minor;
	int ret;

	if (!exp_info->name || !strcmp(exp_info->name, "")) {
		pr_err("dma_heap: Cannot add heap without a name\n");
		return ERR_PTR(-EINVAL);
	}

	if (!exp_info->ops || !exp_info->ops->allocate) {
		pr_err("dma_heap: Cannot add heap with invalid ops struct\n");
		return ERR_PTR(-EINVAL);
	}

	heap = kzalloc(sizeof(*heap), GFP_KERNEL);
	if (!heap)
		return ERR_PTR(-ENOMEM);

	heap->name = exp_info->name;
	heap->ops = exp_info->ops;
	heap->priv = exp_info->priv;

	/* Find unused minor number */
	ret = xa_alloc(&dma_heap_minors, &minor, heap,
		       XA_LIMIT(0, NUM_HEAP_MINORS - 1), GFP_KERNEL);
	if (ret < 0) {
		pr_err("dma_heap: Unable to get minor number for heap\n");
		err_ret = ERR_PTR(ret);
		goto err0;
	}

	/* Create device */
	heap->heap_devt = MKDEV(MAJOR(dma_heap_devt), minor);

	cdev_init(&heap->heap_cdev, &dma_heap_fops);
	ret = cdev_add(&heap->heap_cdev, heap->heap_devt, 1);
	if (ret < 0) {
		pr_err("dma_heap: Unable to add char device\n");
		err_ret = ERR_PTR(ret);
		goto err1;
	}

	dev_ret = device_create(dma_heap_class,
				NULL,
				heap->heap_devt,
				NULL,
				heap->name);
	if (IS_ERR(dev_ret)) {
		pr_err("dma_heap: Unable to create device\n");
		err_ret = ERR_CAST(dev_ret);
		goto err2;
	}

	mutex_lock(&heap_list_lock);
	/* check the name is unique */
	list_for_each_entry(h, &heap_list, list) {
		if (!strcmp(h->name, exp_info->name)) {
			mutex_unlock(&heap_list_lock);
			pr_err("dma_heap: Already registered heap named %s\n",
			       exp_info->name);
			err_ret = ERR_PTR(-EINVAL);
			goto err3;
		}
	}

	/* Add heap to the list */
	list_add(&heap->list, &heap_list);
	mutex_unlock(&heap_list_lock);

	return heap;

err3:
	device_destroy(dma_heap_class, heap->heap_devt);
err2:
	cdev_del(&heap->heap_cdev);
err1:
	xa_erase(&dma_heap_minors, minor);
err0:
	kfree(heap);
	return err_ret;
}

static char *dma_heap_devnode(const struct device *dev, umode_t *mode)
{
	return kasprintf(GFP_KERNEL, "dma_heap/%s", dev_name(dev));
}

static int dma_heap_init(void)
{
	int ret;

	ret = alloc_chrdev_region(&dma_heap_devt, 0, NUM_HEAP_MINORS, DEVNAME);
	if (ret)
		return ret;

	dma_heap_class = class_create(DEVNAME);
	if (IS_ERR(dma_heap_class)) {
		unregister_chrdev_region(dma_heap_devt, NUM_HEAP_MINORS);
		return PTR_ERR(dma_heap_class);
	}
	dma_heap_class->devnode = dma_heap_devnode;

	return 0;
}
subsys_initcall(dma_heap_init);
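/*
 * Illustrative userspace sketch (not part of the driver above): allocating
 * a dma-buf through the ioctl served by dma_heap_ioctl_allocate(). Assumes
 * some heap named "system" has been registered, so dma_heap_devnode()
 * exposes /dev/dma_heap/system. alloc.fd must start out zero, which the
 * designated initializer guarantees.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

int main(void)
{
	struct dma_heap_allocation_data alloc = {
		.len = 4096,			/* PAGE_ALIGN()ed by the kernel */
		.fd_flags = O_RDWR | O_CLOEXEC,	/* within DMA_HEAP_VALID_FD_FLAGS */
	};
	int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);

	if (heap_fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc) < 0) {
		perror("DMA_HEAP_IOCTL_ALLOC");
		close(heap_fd);
		return 1;
	}
	printf("dma-buf fd %u, %llu bytes\n",
	       alloc.fd, (unsigned long long)alloc.len);
	close(alloc.fd);
	close(heap_fd);
	return 0;
}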
// SPDX-License-Identifier: GPL-2.0-or-later /* SCTP kernel implementation * (C) Copyright IBM Corp. 2001, 2004 * Copyright (c) 1999-2000 Cisco, Inc. * Copyright (c) 1999-2001 Motorola, Inc. * Copyright (c) 2001-2002 Intel Corp. * * This file is part of the SCTP kernel implementation * * These functions work with the state functions in sctp_sm_statefuns.c * to implement the state operations. These functions implement the * steps which require modifying existing data structures. * * Please send any bug reports or fixes you make to the * email address(es): * lksctp developers <linux-sctp@vger.kernel.org> * * Written or modified by: * La Monte H.P. Yarroll <piggy@acm.org> * Karl Knutson <karl@athena.chicago.il.us> * C.
Robin <chris@hundredacre.ac.uk> * Jon Grimm <jgrimm@us.ibm.com> * Xingang Guo <xingang.guo@intel.com> * Dajiang Zhang <dajiang.zhang@nokia.com> * Sridhar Samudrala <sri@us.ibm.com> * Daisy Chang <daisyc@us.ibm.com> * Ardelle Fan <ardelle.fan@intel.com> * Kevin Gao <kevin.gao@intel.com> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <crypto/hash.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/ip.h> #include <linux/ipv6.h> #include <linux/net.h> #include <linux/inet.h> #include <linux/scatterlist.h> #include <linux/slab.h> #include <net/sock.h> #include <linux/skbuff.h> #include <linux/random.h> /* for get_random_bytes */ #include <net/sctp/sctp.h> #include <net/sctp/sm.h> static struct sctp_chunk *sctp_make_control(const struct sctp_association *asoc, __u8 type, __u8 flags, int paylen, gfp_t gfp); static struct sctp_chunk *sctp_make_data(const struct sctp_association *asoc, __u8 flags, int paylen, gfp_t gfp); static struct sctp_chunk *_sctp_make_chunk(const struct sctp_association *asoc, __u8 type, __u8 flags, int paylen, gfp_t gfp); static struct sctp_cookie_param *sctp_pack_cookie( const struct sctp_endpoint *ep, const struct sctp_association *asoc, const struct sctp_chunk *init_chunk, int *cookie_len, const __u8 *raw_addrs, int addrs_len); static int sctp_process_param(struct sctp_association *asoc, union sctp_params param, const union sctp_addr *peer_addr, gfp_t gfp); static void *sctp_addto_param(struct sctp_chunk *chunk, int len, const void *data); /* Control chunk destructor */ static void sctp_control_release_owner(struct sk_buff *skb) { struct sctp_chunk *chunk = skb_shinfo(skb)->destructor_arg; if (chunk->shkey) { struct sctp_shared_key *shkey = chunk->shkey; struct sctp_association *asoc = chunk->asoc; /* refcnt == 2 and !list_empty mean after this release, it's * not being used anywhere, and it's time to notify userland * that this shkey can be freed if it's been deactivated. */ if (shkey->deactivated && !list_empty(&shkey->key_list) && refcount_read(&shkey->refcnt) == 2) { struct sctp_ulpevent *ev; ev = sctp_ulpevent_make_authkey(asoc, shkey->key_id, SCTP_AUTH_FREE_KEY, GFP_KERNEL); if (ev) asoc->stream.si->enqueue_event(&asoc->ulpq, ev); } sctp_auth_shkey_release(chunk->shkey); } } static void sctp_control_set_owner_w(struct sctp_chunk *chunk) { struct sctp_association *asoc = chunk->asoc; struct sk_buff *skb = chunk->skb; /* TODO: properly account for control chunks. * To do it right we'll need: * 1) endpoint if association isn't known. * 2) proper memory accounting. * * For now, don't do anything. */ if (chunk->auth) { chunk->shkey = asoc->shkey; sctp_auth_shkey_hold(chunk->shkey); } skb->sk = asoc ? asoc->base.sk : NULL; skb_shinfo(skb)->destructor_arg = chunk; skb->destructor = sctp_control_release_owner; } /* What was the inbound interface for this chunk? */ int sctp_chunk_iif(const struct sctp_chunk *chunk) { struct sk_buff *skb = chunk->skb; return SCTP_INPUT_CB(skb)->af->skb_iif(skb); } /* RFC 2960 3.3.2 Initiation (INIT) (1) * * Note 2: The ECN capable field is reserved for future use of * Explicit Congestion Notification. */ static const struct sctp_paramhdr ecap_param = { SCTP_PARAM_ECN_CAPABLE, cpu_to_be16(sizeof(struct sctp_paramhdr)), }; static const struct sctp_paramhdr prsctp_param = { SCTP_PARAM_FWD_TSN_SUPPORT, cpu_to_be16(sizeof(struct sctp_paramhdr)), }; /* A helper to initialize an op error inside a provided chunk, as most * cause codes will be embedded inside an abort chunk.
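 *
 * For illustration only (a sketch of the calling pattern used later in
 * this file, e.g. by sctp_make_abort_user()): the caller first reserves
 * room for the error header plus payload, then writes the cause code:
 *
 *	chunk = sctp_make_abort(asoc, NULL,
 *				sizeof(struct sctp_errhdr) + paylen);
 *	if (chunk)
 *		sctp_init_cause(chunk, SCTP_ERROR_USER_ABORT, paylen);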
*/ int sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code, size_t paylen) { struct sctp_errhdr err; __u16 len; /* Cause code constants are now defined in network order. */ err.cause = cause_code; len = sizeof(err) + paylen; err.length = htons(len); if (skb_tailroom(chunk->skb) < len) return -ENOSPC; chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(err), &err); return 0; } /* 3.3.2 Initiation (INIT) (1) * * This chunk is used to initiate a SCTP association between two * endpoints. The format of the INIT chunk is shown below: * * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type = 1 | Chunk Flags | Chunk Length | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Initiate Tag | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Advertised Receiver Window Credit (a_rwnd) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Number of Outbound Streams | Number of Inbound Streams | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Initial TSN | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * \ \ * / Optional/Variable-Length Parameters / * \ \ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * * * The INIT chunk contains the following parameters. Unless otherwise * noted, each parameter MUST only be included once in the INIT chunk. * * Fixed Parameters Status * ---------------------------------------------- * Initiate Tag Mandatory * Advertised Receiver Window Credit Mandatory * Number of Outbound Streams Mandatory * Number of Inbound Streams Mandatory * Initial TSN Mandatory * * Variable Parameters Status Type Value * ------------------------------------------------------------- * IPv4 Address (Note 1) Optional 5 * IPv6 Address (Note 1) Optional 6 * Cookie Preservative Optional 9 * Reserved for ECN Capable (Note 2) Optional 32768 (0x8000) * Host Name Address (Note 3) Optional 11 * Supported Address Types (Note 4) Optional 12 */ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc, const struct sctp_bind_addr *bp, gfp_t gfp, int vparam_len) { struct sctp_supported_ext_param ext_param; struct sctp_adaptation_ind_param aiparam; struct sctp_paramhdr *auth_chunks = NULL; struct sctp_paramhdr *auth_hmacs = NULL; struct sctp_supported_addrs_param sat; struct sctp_endpoint *ep = asoc->ep; struct sctp_chunk *retval = NULL; int num_types, addrs_len = 0; struct sctp_inithdr init; union sctp_params addrs; struct sctp_sock *sp; __u8 extensions[5]; size_t chunksize; __be16 types[2]; int num_ext = 0; /* RFC 2960 3.3.2 Initiation (INIT) (1) * * Note 1: The INIT chunks can contain multiple addresses that * can be IPv4 and/or IPv6 in any combination. */ /* Convert the provided bind address list to raw format. */ addrs = sctp_bind_addrs_to_raw(bp, &addrs_len, gfp); init.init_tag = htonl(asoc->c.my_vtag); init.a_rwnd = htonl(asoc->rwnd); init.num_outbound_streams = htons(asoc->c.sinit_num_ostreams); init.num_inbound_streams = htons(asoc->c.sinit_max_instreams); init.initial_tsn = htonl(asoc->c.initial_tsn); /* How many address types are needed? 
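 *
 * (Worked example, assuming the usual definition of SCTP_SAT_LEN() as a
 * 4-byte parameter header plus one 2-byte value per address type: an
 * IPv6 socket that also handles IPv4 reports two types, so
 * SCTP_SAT_LEN(2) is 8 bytes, already a multiple of 4, and SCTP_PAD4()
 * then adds nothing.)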
*/ sp = sctp_sk(asoc->base.sk); num_types = sp->pf->supported_addrs(sp, types); chunksize = sizeof(init) + addrs_len; chunksize += SCTP_PAD4(SCTP_SAT_LEN(num_types)); if (asoc->ep->ecn_enable) chunksize += sizeof(ecap_param); if (asoc->ep->prsctp_enable) chunksize += sizeof(prsctp_param); /* ADDIP: Section 4.2.7: * An implementation supporting this extension [ADDIP] MUST list * the ASCONF, the ASCONF-ACK, and the AUTH chunks in its INIT and * INIT-ACK parameters. */ if (asoc->ep->asconf_enable) { extensions[num_ext] = SCTP_CID_ASCONF; extensions[num_ext+1] = SCTP_CID_ASCONF_ACK; num_ext += 2; } if (asoc->ep->reconf_enable) { extensions[num_ext] = SCTP_CID_RECONF; num_ext += 1; } if (sp->adaptation_ind) chunksize += sizeof(aiparam); if (asoc->ep->intl_enable) { extensions[num_ext] = SCTP_CID_I_DATA; num_ext += 1; } chunksize += vparam_len; /* Account for AUTH related parameters */ if (ep->auth_enable) { /* Add random parameter length */ chunksize += sizeof(asoc->c.auth_random); /* Add HMACS parameter length if any were defined */ auth_hmacs = (struct sctp_paramhdr *)asoc->c.auth_hmacs; if (auth_hmacs->length) chunksize += SCTP_PAD4(ntohs(auth_hmacs->length)); else auth_hmacs = NULL; /* Add CHUNKS parameter length */ auth_chunks = (struct sctp_paramhdr *)asoc->c.auth_chunks; if (auth_chunks->length) chunksize += SCTP_PAD4(ntohs(auth_chunks->length)); else auth_chunks = NULL; extensions[num_ext] = SCTP_CID_AUTH; num_ext += 1; } /* If we have any extensions to report, account for that */ if (num_ext) chunksize += SCTP_PAD4(sizeof(ext_param) + num_ext); /* RFC 2960 3.3.2 Initiation (INIT) (1) * * Note 3: An INIT chunk MUST NOT contain more than one Host * Name address parameter. Moreover, the sender of the INIT * MUST NOT combine any other address types with the Host Name * address in the INIT. The receiver of INIT MUST ignore any * other address types if the Host Name address parameter is * present in the received INIT chunk. * * PLEASE DO NOT FIXME [This version does not support Host Name.] */ retval = sctp_make_control(asoc, SCTP_CID_INIT, 0, chunksize, gfp); if (!retval) goto nodata; retval->subh.init_hdr = sctp_addto_chunk(retval, sizeof(init), &init); retval->param_hdr.v = sctp_addto_chunk(retval, addrs_len, addrs.v); /* RFC 2960 3.3.2 Initiation (INIT) (1) * * Note 4: This parameter, when present, specifies all the * address types the sending endpoint can support. The absence * of this parameter indicates that the sending endpoint can * support any address type. */ sat.param_hdr.type = SCTP_PARAM_SUPPORTED_ADDRESS_TYPES; sat.param_hdr.length = htons(SCTP_SAT_LEN(num_types)); sctp_addto_chunk(retval, sizeof(sat), &sat); sctp_addto_chunk(retval, num_types * sizeof(__u16), &types); if (asoc->ep->ecn_enable) sctp_addto_chunk(retval, sizeof(ecap_param), &ecap_param); /* Add the supported extensions parameter.
Be nice and add this * first before adding the parameters for the extensions themselves */ if (num_ext) { ext_param.param_hdr.type = SCTP_PARAM_SUPPORTED_EXT; ext_param.param_hdr.length = htons(sizeof(ext_param) + num_ext); sctp_addto_chunk(retval, sizeof(ext_param), &ext_param); sctp_addto_param(retval, num_ext, extensions); } if (asoc->ep->prsctp_enable) sctp_addto_chunk(retval, sizeof(prsctp_param), &prsctp_param); if (sp->adaptation_ind) { aiparam.param_hdr.type = SCTP_PARAM_ADAPTATION_LAYER_IND; aiparam.param_hdr.length = htons(sizeof(aiparam)); aiparam.adaptation_ind = htonl(sp->adaptation_ind); sctp_addto_chunk(retval, sizeof(aiparam), &aiparam); } /* Add SCTP-AUTH chunks to the parameter list */ if (ep->auth_enable) { sctp_addto_chunk(retval, sizeof(asoc->c.auth_random), asoc->c.auth_random); if (auth_hmacs) sctp_addto_chunk(retval, ntohs(auth_hmacs->length), auth_hmacs); if (auth_chunks) sctp_addto_chunk(retval, ntohs(auth_chunks->length), auth_chunks); } nodata: kfree(addrs.v); return retval; } struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc, const struct sctp_chunk *chunk, gfp_t gfp, int unkparam_len) { struct sctp_supported_ext_param ext_param; struct sctp_adaptation_ind_param aiparam; struct sctp_paramhdr *auth_chunks = NULL; struct sctp_paramhdr *auth_random = NULL; struct sctp_paramhdr *auth_hmacs = NULL; struct sctp_chunk *retval = NULL; struct sctp_cookie_param *cookie; struct sctp_inithdr initack; union sctp_params addrs; struct sctp_sock *sp; __u8 extensions[5]; size_t chunksize; int num_ext = 0; int cookie_len; int addrs_len; /* Note: there may be no addresses to embed. */ addrs = sctp_bind_addrs_to_raw(&asoc->base.bind_addr, &addrs_len, gfp); initack.init_tag = htonl(asoc->c.my_vtag); initack.a_rwnd = htonl(asoc->rwnd); initack.num_outbound_streams = htons(asoc->c.sinit_num_ostreams); initack.num_inbound_streams = htons(asoc->c.sinit_max_instreams); initack.initial_tsn = htonl(asoc->c.initial_tsn); /* FIXME: We really ought to build the cookie right * into the packet instead of allocating more fresh memory. */ cookie = sctp_pack_cookie(asoc->ep, asoc, chunk, &cookie_len, addrs.v, addrs_len); if (!cookie) goto nomem_cookie; /* Calculate the total size of allocation, include the reserved * space for reporting unknown parameters if it is specified. */ sp = sctp_sk(asoc->base.sk); chunksize = sizeof(initack) + addrs_len + cookie_len + unkparam_len; /* Tell peer that we'll do ECN only if peer advertised such cap.
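 * (asoc->peer.ecn_capable is recorded while processing the parameters
 * of the peer's INIT, so the ECN parameter is echoed only when the peer
 * actually offered it.)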
*/ if (asoc->peer.ecn_capable) chunksize += sizeof(ecap_param); if (asoc->peer.prsctp_capable) chunksize += sizeof(prsctp_param); if (asoc->peer.asconf_capable) { extensions[num_ext] = SCTP_CID_ASCONF; extensions[num_ext+1] = SCTP_CID_ASCONF_ACK; num_ext += 2; } if (asoc->peer.reconf_capable) { extensions[num_ext] = SCTP_CID_RECONF; num_ext += 1; } if (sp->adaptation_ind) chunksize += sizeof(aiparam); if (asoc->peer.intl_capable) { extensions[num_ext] = SCTP_CID_I_DATA; num_ext += 1; } if (asoc->peer.auth_capable) { auth_random = (struct sctp_paramhdr *)asoc->c.auth_random; chunksize += ntohs(auth_random->length); auth_hmacs = (struct sctp_paramhdr *)asoc->c.auth_hmacs; if (auth_hmacs->length) chunksize += SCTP_PAD4(ntohs(auth_hmacs->length)); else auth_hmacs = NULL; auth_chunks = (struct sctp_paramhdr *)asoc->c.auth_chunks; if (auth_chunks->length) chunksize += SCTP_PAD4(ntohs(auth_chunks->length)); else auth_chunks = NULL; extensions[num_ext] = SCTP_CID_AUTH; num_ext += 1; } if (num_ext) chunksize += SCTP_PAD4(sizeof(ext_param) + num_ext); /* Now allocate and fill out the chunk. */ retval = sctp_make_control(asoc, SCTP_CID_INIT_ACK, 0, chunksize, gfp); if (!retval) goto nomem_chunk; /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, * etc.) to the same destination transport * address from which it received the DATA or control chunk * to which it is replying. * * [INIT ACK back to where the INIT came from.] */ if (chunk->transport) retval->transport = sctp_assoc_lookup_paddr(asoc, &chunk->transport->ipaddr); retval->subh.init_hdr = sctp_addto_chunk(retval, sizeof(initack), &initack); retval->param_hdr.v = sctp_addto_chunk(retval, addrs_len, addrs.v); sctp_addto_chunk(retval, cookie_len, cookie); if (asoc->peer.ecn_capable) sctp_addto_chunk(retval, sizeof(ecap_param), &ecap_param); if (num_ext) { ext_param.param_hdr.type = SCTP_PARAM_SUPPORTED_EXT; ext_param.param_hdr.length = htons(sizeof(ext_param) + num_ext); sctp_addto_chunk(retval, sizeof(ext_param), &ext_param); sctp_addto_param(retval, num_ext, extensions); } if (asoc->peer.prsctp_capable) sctp_addto_chunk(retval, sizeof(prsctp_param), &prsctp_param); if (sp->adaptation_ind) { aiparam.param_hdr.type = SCTP_PARAM_ADAPTATION_LAYER_IND; aiparam.param_hdr.length = htons(sizeof(aiparam)); aiparam.adaptation_ind = htonl(sp->adaptation_ind); sctp_addto_chunk(retval, sizeof(aiparam), &aiparam); } if (asoc->peer.auth_capable) { sctp_addto_chunk(retval, ntohs(auth_random->length), auth_random); if (auth_hmacs) sctp_addto_chunk(retval, ntohs(auth_hmacs->length), auth_hmacs); if (auth_chunks) sctp_addto_chunk(retval, ntohs(auth_chunks->length), auth_chunks); } /* We need to remove the const qualifier at this point. */ retval->asoc = (struct sctp_association *) asoc; nomem_chunk: kfree(cookie); nomem_cookie: kfree(addrs.v); return retval; } /* 3.3.11 Cookie Echo (COOKIE ECHO) (10): * * This chunk is used only during the initialization of an association. * It is sent by the initiator of an association to its peer to complete * the initialization process. This chunk MUST precede any DATA chunk * sent within the association, but MAY be bundled with one or more DATA * chunks in the same packet. 
* * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type = 10 |Chunk Flags | Length | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * / Cookie / * \ \ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * * Chunk Flags: 8 bit * * Set to zero on transmit and ignored on receipt. * * Length: 16 bits (unsigned integer) * * Set to the size of the chunk in bytes, including the 4 bytes of * the chunk header and the size of the Cookie. * * Cookie: variable size * * This field must contain the exact cookie received in the * State Cookie parameter from the previous INIT ACK. * * An implementation SHOULD make the cookie as small as possible * to insure interoperability. */ struct sctp_chunk *sctp_make_cookie_echo(const struct sctp_association *asoc, const struct sctp_chunk *chunk) { struct sctp_chunk *retval; int cookie_len; void *cookie; cookie = asoc->peer.cookie; cookie_len = asoc->peer.cookie_len; /* Build a cookie echo chunk. */ retval = sctp_make_control(asoc, SCTP_CID_COOKIE_ECHO, 0, cookie_len, GFP_ATOMIC); if (!retval) goto nodata; retval->subh.cookie_hdr = sctp_addto_chunk(retval, cookie_len, cookie); /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, * etc.) to the same destination transport * address from which it * received the DATA or control chunk * to which it is replying. * * [COOKIE ECHO back to where the INIT ACK came from.] */ if (chunk) retval->transport = chunk->transport; nodata: return retval; } /* 3.3.12 Cookie Acknowledgement (COOKIE ACK) (11): * * This chunk is used only during the initialization of an * association. It is used to acknowledge the receipt of a COOKIE * ECHO chunk. This chunk MUST precede any DATA or SACK chunk sent * within the association, but MAY be bundled with one or more DATA * chunks or SACK chunk in the same SCTP packet. * * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type = 11 |Chunk Flags | Length = 4 | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * * Chunk Flags: 8 bits * * Set to zero on transmit and ignored on receipt. */ struct sctp_chunk *sctp_make_cookie_ack(const struct sctp_association *asoc, const struct sctp_chunk *chunk) { struct sctp_chunk *retval; retval = sctp_make_control(asoc, SCTP_CID_COOKIE_ACK, 0, 0, GFP_ATOMIC); /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, * etc.) to the same destination transport * address from which it * received the DATA or control chunk * to which it is replying. * * [COOKIE ACK back to where the COOKIE ECHO came from.] */ if (retval && chunk && chunk->transport) retval->transport = sctp_assoc_lookup_paddr(asoc, &chunk->transport->ipaddr); return retval; } /* * Appendix A: Explicit Congestion Notification: * CWR: * * RFC 2481 details a specific bit for a sender to send in the header of * its next outbound TCP segment to indicate to its peer that it has * reduced its congestion window. This is termed the CWR bit. For * SCTP the same indication is made by including the CWR chunk. * This chunk contains one data element, i.e. the TSN number that * was sent in the ECNE chunk. This element represents the lowest * TSN number in the datagram that was originally marked with the * CE bit. 
* * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Chunk Type=13 | Flags=00000000| Chunk Length = 8 | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Lowest TSN Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * * Note: The CWR is considered a Control chunk. */ struct sctp_chunk *sctp_make_cwr(const struct sctp_association *asoc, const __u32 lowest_tsn, const struct sctp_chunk *chunk) { struct sctp_chunk *retval; struct sctp_cwrhdr cwr; cwr.lowest_tsn = htonl(lowest_tsn); retval = sctp_make_control(asoc, SCTP_CID_ECN_CWR, 0, sizeof(cwr), GFP_ATOMIC); if (!retval) goto nodata; retval->subh.ecn_cwr_hdr = sctp_addto_chunk(retval, sizeof(cwr), &cwr); /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, * etc.) to the same destination transport * address from which it * received the DATA or control chunk * to which it is replying. * * [Report a reduced congestion window back to where the ECNE * came from.] */ if (chunk) retval->transport = chunk->transport; nodata: return retval; } /* Make an ECNE chunk. This is a congestion experienced report. */ struct sctp_chunk *sctp_make_ecne(const struct sctp_association *asoc, const __u32 lowest_tsn) { struct sctp_chunk *retval; struct sctp_ecnehdr ecne; ecne.lowest_tsn = htonl(lowest_tsn); retval = sctp_make_control(asoc, SCTP_CID_ECN_ECNE, 0, sizeof(ecne), GFP_ATOMIC); if (!retval) goto nodata; retval->subh.ecne_hdr = sctp_addto_chunk(retval, sizeof(ecne), &ecne); nodata: return retval; } /* Make a DATA chunk for the given association from the provided * parameters. However, do not populate the data payload. */ struct sctp_chunk *sctp_make_datafrag_empty(const struct sctp_association *asoc, const struct sctp_sndrcvinfo *sinfo, int len, __u8 flags, gfp_t gfp) { struct sctp_chunk *retval; struct sctp_datahdr dp; /* We assign the TSN as LATE as possible, not here when * creating the chunk. */ memset(&dp, 0, sizeof(dp)); dp.ppid = sinfo->sinfo_ppid; dp.stream = htons(sinfo->sinfo_stream); /* Set the flags for an unordered send. */ if (sinfo->sinfo_flags & SCTP_UNORDERED) flags |= SCTP_DATA_UNORDERED; retval = sctp_make_data(asoc, flags, sizeof(dp) + len, gfp); if (!retval) return NULL; retval->subh.data_hdr = sctp_addto_chunk(retval, sizeof(dp), &dp); memcpy(&retval->sinfo, sinfo, sizeof(struct sctp_sndrcvinfo)); return retval; } /* Create a selective acknowledgement (SACK) for the given * association. This reports on which TSNs we've seen to date, * including duplicates and gaps. */ struct sctp_chunk *sctp_make_sack(struct sctp_association *asoc) { struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map; struct sctp_gap_ack_block gabs[SCTP_MAX_GABS]; __u16 num_gabs, num_dup_tsns; struct sctp_transport *trans; struct sctp_chunk *retval; struct sctp_sackhdr sack; __u32 ctsn; int len; memset(gabs, 0, sizeof(gabs)); ctsn = sctp_tsnmap_get_ctsn(map); pr_debug("%s: sackCTSNAck sent:0x%x\n", __func__, ctsn); /* How much room is needed in the chunk? */ num_gabs = sctp_tsnmap_num_gabs(map, gabs); num_dup_tsns = sctp_tsnmap_num_dups(map); /* Initialize the SACK header. */ sack.cum_tsn_ack = htonl(ctsn); sack.a_rwnd = htonl(asoc->a_rwnd); sack.num_gap_ack_blocks = htons(num_gabs); sack.num_dup_tsns = htons(num_dup_tsns); len = sizeof(sack) + sizeof(struct sctp_gap_ack_block) * num_gabs + sizeof(__u32) * num_dup_tsns; /* Create the chunk.
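 *
 * (Worked example, assuming the RFC 4960 wire layout where the
 * SACK-specific header is 12 bytes: two gap ack blocks and one
 * duplicate TSN give len = 12 + 2 * 4 + 1 * 4 = 24 bytes of payload,
 * and sctp_make_control() accounts for the 4-byte chunk header on top.)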
*/ retval = sctp_make_control(asoc, SCTP_CID_SACK, 0, len, GFP_ATOMIC); if (!retval) goto nodata; /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, etc.) to the same destination transport * address from which it received the DATA or control chunk to * which it is replying. This rule should also be followed if * the endpoint is bundling DATA chunks together with the * reply chunk. * * However, when acknowledging multiple DATA chunks received * in packets from different source addresses in a single * SACK, the SACK chunk may be transmitted to one of the * destination transport addresses from which the DATA or * control chunks being acknowledged were received. * * [BUG: We do not implement the following paragraph. * Perhaps we should remember the last transport we used for a * SACK and avoid that (if possible) if we have seen any * duplicates. --piggy] * * When a receiver of a duplicate DATA chunk sends a SACK to a * multi-homed endpoint it MAY be beneficial to vary the * destination address and not use the source address of the * DATA chunk. The reason being that receiving a duplicate * from a multi-homed endpoint might indicate that the return * path (as specified in the source address of the DATA chunk) * for the SACK is broken. * * [Send to the address from which we last received a DATA chunk.] */ retval->transport = asoc->peer.last_data_from; retval->subh.sack_hdr = sctp_addto_chunk(retval, sizeof(sack), &sack); /* Add the gap ack block information. */ if (num_gabs) sctp_addto_chunk(retval, sizeof(__u32) * num_gabs, gabs); /* Add the duplicate TSN information. */ if (num_dup_tsns) { asoc->stats.idupchunks += num_dup_tsns; sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns, sctp_tsnmap_get_dups(map)); } /* Once we have generated a SACK, check what our SACK generation is; * if it's 0, reset the transports' generations to 0 and reset the * association generation to 1. * * The idea is that zero is never used as a valid generation for the * association, so no transport will match after a wrap event like * this until the next SACK. */ if (++asoc->peer.sack_generation == 0) { list_for_each_entry(trans, &asoc->peer.transport_addr_list, transports) trans->sack_generation = 0; asoc->peer.sack_generation = 1; } nodata: return retval; } /* Make a SHUTDOWN chunk. */ struct sctp_chunk *sctp_make_shutdown(const struct sctp_association *asoc, const struct sctp_chunk *chunk) { struct sctp_shutdownhdr shut; struct sctp_chunk *retval; __u32 ctsn; ctsn = sctp_tsnmap_get_ctsn(&asoc->peer.tsn_map); shut.cum_tsn_ack = htonl(ctsn); retval = sctp_make_control(asoc, SCTP_CID_SHUTDOWN, 0, sizeof(shut), GFP_ATOMIC); if (!retval) goto nodata; retval->subh.shutdown_hdr = sctp_addto_chunk(retval, sizeof(shut), &shut); if (chunk) retval->transport = chunk->transport; nodata: return retval; } struct sctp_chunk *sctp_make_shutdown_ack(const struct sctp_association *asoc, const struct sctp_chunk *chunk) { struct sctp_chunk *retval; retval = sctp_make_control(asoc, SCTP_CID_SHUTDOWN_ACK, 0, 0, GFP_ATOMIC); /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, * etc.) to the same destination transport * address from which it * received the DATA or control chunk * to which it is replying. * * [ACK back to where the SHUTDOWN came from.]
*/ if (retval && chunk) retval->transport = chunk->transport; return retval; } struct sctp_chunk *sctp_make_shutdown_complete( const struct sctp_association *asoc, const struct sctp_chunk *chunk) { struct sctp_chunk *retval; __u8 flags = 0; /* Set the T-bit if we have no association (vtag will be * reflected) */ flags |= asoc ? 0 : SCTP_CHUNK_FLAG_T; retval = sctp_make_control(asoc, SCTP_CID_SHUTDOWN_COMPLETE, flags, 0, GFP_ATOMIC); /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, * etc.) to the same destination transport * address from which it * received the DATA or control chunk * to which it is replying. * * [Report SHUTDOWN COMPLETE back to where the SHUTDOWN ACK * came from.] */ if (retval && chunk) retval->transport = chunk->transport; return retval; } /* Create an ABORT. Note that we set the T bit if we have no * association, except when responding to an INIT (sctpimpguide 2.41). */ struct sctp_chunk *sctp_make_abort(const struct sctp_association *asoc, const struct sctp_chunk *chunk, const size_t hint) { struct sctp_chunk *retval; __u8 flags = 0; /* Set the T-bit if we have no association and 'chunk' is not * an INIT (vtag will be reflected). */ if (!asoc) { if (chunk && chunk->chunk_hdr && chunk->chunk_hdr->type == SCTP_CID_INIT) flags = 0; else flags = SCTP_CHUNK_FLAG_T; } retval = sctp_make_control(asoc, SCTP_CID_ABORT, flags, hint, GFP_ATOMIC); /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, * etc.) to the same destination transport * address from which it * received the DATA or control chunk * to which it is replying. * * [ABORT back to where the offender came from.] */ if (retval && chunk) retval->transport = chunk->transport; return retval; } /* Helper to create ABORT with a NO_USER_DATA error. */ struct sctp_chunk *sctp_make_abort_no_data( const struct sctp_association *asoc, const struct sctp_chunk *chunk, __u32 tsn) { struct sctp_chunk *retval; __be32 payload; retval = sctp_make_abort(asoc, chunk, sizeof(struct sctp_errhdr) + sizeof(tsn)); if (!retval) goto no_mem; /* Put the tsn back into network byte order. */ payload = htonl(tsn); sctp_init_cause(retval, SCTP_ERROR_NO_DATA, sizeof(payload)); sctp_addto_chunk(retval, sizeof(payload), (const void *)&payload); /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, * etc.) to the same destination transport * address from which it * received the DATA or control chunk * to which it is replying. * * [ABORT back to where the offender came from.] */ if (chunk) retval->transport = chunk->transport; no_mem: return retval; } /* Helper to create ABORT with a SCTP_ERROR_USER_ABORT error. */ struct sctp_chunk *sctp_make_abort_user(const struct sctp_association *asoc, struct msghdr *msg, size_t paylen) { struct sctp_chunk *retval; void *payload = NULL; int err; retval = sctp_make_abort(asoc, NULL, sizeof(struct sctp_errhdr) + paylen); if (!retval) goto err_chunk; if (paylen) { /* Put the msg_iov together into payload. 
*/ payload = kmalloc(paylen, GFP_KERNEL); if (!payload) goto err_payload; err = memcpy_from_msg(payload, msg, paylen); if (err < 0) goto err_copy; } sctp_init_cause(retval, SCTP_ERROR_USER_ABORT, paylen); sctp_addto_chunk(retval, paylen, payload); if (paylen) kfree(payload); return retval; err_copy: kfree(payload); err_payload: sctp_chunk_free(retval); retval = NULL; err_chunk: return retval; } /* Append bytes to the end of a parameter. Will panic if chunk is not big * enough. */ static void *sctp_addto_param(struct sctp_chunk *chunk, int len, const void *data) { int chunklen = ntohs(chunk->chunk_hdr->length); void *target; target = skb_put(chunk->skb, len); if (data) memcpy(target, data, len); else memset(target, 0, len); /* Adjust the chunk length field. */ chunk->chunk_hdr->length = htons(chunklen + len); chunk->chunk_end = skb_tail_pointer(chunk->skb); return target; } /* Make an ABORT chunk with a PROTOCOL VIOLATION cause code. */ struct sctp_chunk *sctp_make_abort_violation( const struct sctp_association *asoc, const struct sctp_chunk *chunk, const __u8 *payload, const size_t paylen) { struct sctp_chunk *retval; struct sctp_paramhdr phdr; retval = sctp_make_abort(asoc, chunk, sizeof(struct sctp_errhdr) + paylen + sizeof(phdr)); if (!retval) goto end; sctp_init_cause(retval, SCTP_ERROR_PROTO_VIOLATION, paylen + sizeof(phdr)); phdr.type = htons(chunk->chunk_hdr->type); phdr.length = chunk->chunk_hdr->length; sctp_addto_chunk(retval, paylen, payload); sctp_addto_param(retval, sizeof(phdr), &phdr); end: return retval; } struct sctp_chunk *sctp_make_violation_paramlen( const struct sctp_association *asoc, const struct sctp_chunk *chunk, struct sctp_paramhdr *param) { static const char error[] = "The following parameter had invalid length:"; size_t payload_len = sizeof(error) + sizeof(struct sctp_errhdr) + sizeof(*param); struct sctp_chunk *retval; retval = sctp_make_abort(asoc, chunk, payload_len); if (!retval) goto nodata; sctp_init_cause(retval, SCTP_ERROR_PROTO_VIOLATION, sizeof(error) + sizeof(*param)); sctp_addto_chunk(retval, sizeof(error), error); sctp_addto_param(retval, sizeof(*param), param); nodata: return retval; } struct sctp_chunk *sctp_make_violation_max_retrans( const struct sctp_association *asoc, const struct sctp_chunk *chunk) { static const char error[] = "Association exceeded its max_retrans count"; size_t payload_len = sizeof(error) + sizeof(struct sctp_errhdr); struct sctp_chunk *retval; retval = sctp_make_abort(asoc, chunk, payload_len); if (!retval) goto nodata; sctp_init_cause(retval, SCTP_ERROR_PROTO_VIOLATION, sizeof(error)); sctp_addto_chunk(retval, sizeof(error), error); nodata: return retval; } struct sctp_chunk *sctp_make_new_encap_port(const struct sctp_association *asoc, const struct sctp_chunk *chunk) { struct sctp_new_encap_port_hdr nep; struct sctp_chunk *retval; retval = sctp_make_abort(asoc, chunk, sizeof(struct sctp_errhdr) + sizeof(nep)); if (!retval) goto nodata; sctp_init_cause(retval, SCTP_ERROR_NEW_ENCAP_PORT, sizeof(nep)); nep.cur_port = SCTP_INPUT_CB(chunk->skb)->encap_port; nep.new_port = chunk->transport->encap_port; sctp_addto_chunk(retval, sizeof(nep), &nep); nodata: return retval; } /* Make a HEARTBEAT chunk. 
*/ struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc, const struct sctp_transport *transport, __u32 probe_size) { struct sctp_sender_hb_info hbinfo = {}; struct sctp_chunk *retval; retval = sctp_make_control(asoc, SCTP_CID_HEARTBEAT, 0, sizeof(hbinfo), GFP_ATOMIC); if (!retval) goto nodata; hbinfo.param_hdr.type = SCTP_PARAM_HEARTBEAT_INFO; hbinfo.param_hdr.length = htons(sizeof(hbinfo)); hbinfo.daddr = transport->ipaddr; hbinfo.sent_at = jiffies; hbinfo.hb_nonce = transport->hb_nonce; hbinfo.probe_size = probe_size; /* Cast away the 'const', as this is just telling the chunk * what transport it belongs to. */ retval->transport = (struct sctp_transport *) transport; retval->subh.hbs_hdr = sctp_addto_chunk(retval, sizeof(hbinfo), &hbinfo); retval->pmtu_probe = !!probe_size; nodata: return retval; } struct sctp_chunk *sctp_make_heartbeat_ack(const struct sctp_association *asoc, const struct sctp_chunk *chunk, const void *payload, const size_t paylen) { struct sctp_chunk *retval; retval = sctp_make_control(asoc, SCTP_CID_HEARTBEAT_ACK, 0, paylen, GFP_ATOMIC); if (!retval) goto nodata; retval->subh.hbs_hdr = sctp_addto_chunk(retval, paylen, payload); /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, * etc.) to the same destination transport * address from which it * received the DATA or control chunk * to which it is replying. * * [HBACK back to where the HEARTBEAT came from.] */ if (chunk) retval->transport = chunk->transport; nodata: return retval; } /* RFC4820 3. Padding Chunk (PAD) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type = 0x84 | Flags=0 | Length | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | | * \ Padding Data / * / \ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ struct sctp_chunk *sctp_make_pad(const struct sctp_association *asoc, int len) { struct sctp_chunk *retval; retval = sctp_make_control(asoc, SCTP_CID_PAD, 0, len, GFP_ATOMIC); if (!retval) return NULL; skb_put_zero(retval->skb, len); retval->chunk_hdr->length = htons(ntohs(retval->chunk_hdr->length) + len); retval->chunk_end = skb_tail_pointer(retval->skb); return retval; } /* Create an Operation Error chunk with the specified space reserved. * This routine can be used for containing multiple causes in the chunk. */ static struct sctp_chunk *sctp_make_op_error_space( const struct sctp_association *asoc, const struct sctp_chunk *chunk, size_t size) { struct sctp_chunk *retval; retval = sctp_make_control(asoc, SCTP_CID_ERROR, 0, sizeof(struct sctp_errhdr) + size, GFP_ATOMIC); if (!retval) goto nodata; /* RFC 2960 6.4 Multi-homed SCTP Endpoints * * An endpoint SHOULD transmit reply chunks (e.g., SACK, * HEARTBEAT ACK, etc.) to the same destination transport * address from which it received the DATA or control chunk * to which it is replying. * */ if (chunk) retval->transport = chunk->transport; nodata: return retval; } /* Create an Operation Error chunk of a fixed size, specifically, * min(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT) - overheads. * This is a helper function to allocate an error chunk for those * invalid parameter codes in which we may not want to report all the * errors, if the incoming chunk is large. If it can't fit in a single * packet, we ignore it. 
*/ static inline struct sctp_chunk *sctp_make_op_error_limited( const struct sctp_association *asoc, const struct sctp_chunk *chunk) { size_t size = SCTP_DEFAULT_MAXSEGMENT; struct sctp_sock *sp = NULL; if (asoc) { size = min_t(size_t, size, asoc->pathmtu); sp = sctp_sk(asoc->base.sk); } size = sctp_mtu_payload(sp, size, sizeof(struct sctp_errhdr)); return sctp_make_op_error_space(asoc, chunk, size); } /* Create an Operation Error chunk. */ struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc, const struct sctp_chunk *chunk, __be16 cause_code, const void *payload, size_t paylen, size_t reserve_tail) { struct sctp_chunk *retval; retval = sctp_make_op_error_space(asoc, chunk, paylen + reserve_tail); if (!retval) goto nodata; sctp_init_cause(retval, cause_code, paylen + reserve_tail); sctp_addto_chunk(retval, paylen, payload); if (reserve_tail) sctp_addto_param(retval, reserve_tail, NULL); nodata: return retval; } struct sctp_chunk *sctp_make_auth(const struct sctp_association *asoc, __u16 key_id) { struct sctp_authhdr auth_hdr; struct sctp_hmac *hmac_desc; struct sctp_chunk *retval; /* Get the first hmac that the peer told us to use */ hmac_desc = sctp_auth_asoc_get_hmac(asoc); if (unlikely(!hmac_desc)) return NULL; retval = sctp_make_control(asoc, SCTP_CID_AUTH, 0, hmac_desc->hmac_len + sizeof(auth_hdr), GFP_ATOMIC); if (!retval) return NULL; auth_hdr.hmac_id = htons(hmac_desc->hmac_id); auth_hdr.shkey_id = htons(key_id); retval->subh.auth_hdr = sctp_addto_chunk(retval, sizeof(auth_hdr), &auth_hdr); skb_put_zero(retval->skb, hmac_desc->hmac_len); /* Adjust the chunk header to include the empty MAC */ retval->chunk_hdr->length = htons(ntohs(retval->chunk_hdr->length) + hmac_desc->hmac_len); retval->chunk_end = skb_tail_pointer(retval->skb); return retval; } /******************************************************************** * 2nd Level Abstractions ********************************************************************/ /* Turn an skb into a chunk. * FIXME: Eventually move the structure directly inside the skb->cb[]. * * sctpimpguide-05.txt Section 2.8.2 * M1) Each time a new DATA chunk is transmitted * set the 'TSN.Missing.Report' count for that TSN to 0. The * 'TSN.Missing.Report' count will be used to determine missing chunks * and when to fast retransmit. * */ struct sctp_chunk *sctp_chunkify(struct sk_buff *skb, const struct sctp_association *asoc, struct sock *sk, gfp_t gfp) { struct sctp_chunk *retval; retval = kmem_cache_zalloc(sctp_chunk_cachep, gfp); if (!retval) goto nodata; if (!sk) pr_debug("%s: chunkifying skb:%p w/o an sk\n", __func__, skb); INIT_LIST_HEAD(&retval->list); retval->skb = skb; retval->asoc = (struct sctp_association *)asoc; retval->singleton = 1; retval->fast_retransmit = SCTP_CAN_FRTX; /* Polish the bead hole. */ INIT_LIST_HEAD(&retval->transmitted_list); INIT_LIST_HEAD(&retval->frag_list); SCTP_DBG_OBJCNT_INC(chunk); refcount_set(&retval->refcnt, 1); nodata: return retval; } /* Set chunk->source and dest based on the IP header in chunk->skb. */ void sctp_init_addrs(struct sctp_chunk *chunk, union sctp_addr *src, union sctp_addr *dest) { memcpy(&chunk->source, src, sizeof(union sctp_addr)); memcpy(&chunk->dest, dest, sizeof(union sctp_addr)); } /* Extract the source address from a chunk. */ const union sctp_addr *sctp_source(const struct sctp_chunk *chunk) { /* If we have a known transport, use that. */ if (chunk->transport) { return &chunk->transport->ipaddr; } else { /* Otherwise, extract it from the IP header. 
*/ return &chunk->source; } } /* Create a new chunk, setting the type and flags headers from the * arguments, reserving enough space for a 'paylen' byte payload. */ static struct sctp_chunk *_sctp_make_chunk(const struct sctp_association *asoc, __u8 type, __u8 flags, int paylen, gfp_t gfp) { struct sctp_chunkhdr *chunk_hdr; struct sctp_chunk *retval; struct sk_buff *skb; struct sock *sk; int chunklen; chunklen = SCTP_PAD4(sizeof(*chunk_hdr) + paylen); if (chunklen > SCTP_MAX_CHUNK_LEN) goto nodata; /* No need to allocate LL here, as this is only a chunk. */ skb = alloc_skb(chunklen, gfp); if (!skb) goto nodata; /* Make room for the chunk header. */ chunk_hdr = (struct sctp_chunkhdr *)skb_put(skb, sizeof(*chunk_hdr)); chunk_hdr->type = type; chunk_hdr->flags = flags; chunk_hdr->length = htons(sizeof(*chunk_hdr)); sk = asoc ? asoc->base.sk : NULL; retval = sctp_chunkify(skb, asoc, sk, gfp); if (!retval) { kfree_skb(skb); goto nodata; } retval->chunk_hdr = chunk_hdr; retval->chunk_end = ((__u8 *)chunk_hdr) + sizeof(*chunk_hdr); /* Determine if the chunk needs to be authenticated */ if (sctp_auth_send_cid(type, asoc)) retval->auth = 1; return retval; nodata: return NULL; } static struct sctp_chunk *sctp_make_data(const struct sctp_association *asoc, __u8 flags, int paylen, gfp_t gfp) { return _sctp_make_chunk(asoc, SCTP_CID_DATA, flags, paylen, gfp); } struct sctp_chunk *sctp_make_idata(const struct sctp_association *asoc, __u8 flags, int paylen, gfp_t gfp) { return _sctp_make_chunk(asoc, SCTP_CID_I_DATA, flags, paylen, gfp); } static struct sctp_chunk *sctp_make_control(const struct sctp_association *asoc, __u8 type, __u8 flags, int paylen, gfp_t gfp) { struct sctp_chunk *chunk; chunk = _sctp_make_chunk(asoc, type, flags, paylen, gfp); if (chunk) sctp_control_set_owner_w(chunk); return chunk; } /* Release the memory occupied by a chunk. */ static void sctp_chunk_destroy(struct sctp_chunk *chunk) { BUG_ON(!list_empty(&chunk->list)); list_del_init(&chunk->transmitted_list); consume_skb(chunk->skb); consume_skb(chunk->auth_chunk); SCTP_DBG_OBJCNT_DEC(chunk); kmem_cache_free(sctp_chunk_cachep, chunk); } /* Possibly, free the chunk. */ void sctp_chunk_free(struct sctp_chunk *chunk) { /* Release our reference on the message tracker. */ if (chunk->msg) sctp_datamsg_put(chunk->msg); sctp_chunk_put(chunk); } /* Grab a reference to the chunk. */ void sctp_chunk_hold(struct sctp_chunk *ch) { refcount_inc(&ch->refcnt); } /* Release a reference to the chunk. */ void sctp_chunk_put(struct sctp_chunk *ch) { if (refcount_dec_and_test(&ch->refcnt)) sctp_chunk_destroy(ch); } /* Append bytes to the end of a chunk. Will panic if chunk is not big * enough. */ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data) { int chunklen = ntohs(chunk->chunk_hdr->length); int padlen = SCTP_PAD4(chunklen) - chunklen; void *target; skb_put_zero(chunk->skb, padlen); target = skb_put_data(chunk->skb, data, len); /* Adjust the chunk length field. */ chunk->chunk_hdr->length = htons(chunklen + padlen + len); chunk->chunk_end = skb_tail_pointer(chunk->skb); return target; } /* Append bytes from user space to the end of a chunk. Will panic if * chunk is not big enough. * Returns a kernel err value. */ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int len, struct iov_iter *from) { void *target; /* Make room in chunk for data. */ target = skb_put(chunk->skb, len); /* Copy data (whole iovec) into chunk */ if (!copy_from_iter_full(target, len, from)) return -EFAULT; /* Adjust the chunk length field. 
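* As a worked example of the padding logic in sctp_addto_chunk() above: with chunk_hdr->length == 18, SCTP_PAD4(18) is 20, so padlen is 2; two * zero bytes are appended before the new data and the length field then grows by padlen + len. This iov_iter variant, by contrast, appends * len bytes without padding the existing tail first.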
*/ chunk->chunk_hdr->length = htons(ntohs(chunk->chunk_hdr->length) + len); chunk->chunk_end = skb_tail_pointer(chunk->skb); return 0; } /* Helper function to assign an SSN if needed. This assumes that both * the data_hdr and association have already been assigned. */ void sctp_chunk_assign_ssn(struct sctp_chunk *chunk) { struct sctp_stream *stream; struct sctp_chunk *lchunk; struct sctp_datamsg *msg; __u16 ssn, sid; if (chunk->has_ssn) return; /* All fragments will be on the same stream */ sid = ntohs(chunk->subh.data_hdr->stream); stream = &chunk->asoc->stream; /* Now assign the sequence number to the entire message. * All fragments must have the same stream sequence number. */ msg = chunk->msg; list_for_each_entry(lchunk, &msg->chunks, frag_list) { if (lchunk->chunk_hdr->flags & SCTP_DATA_UNORDERED) { ssn = 0; } else { if (lchunk->chunk_hdr->flags & SCTP_DATA_LAST_FRAG) ssn = sctp_ssn_next(stream, out, sid); else ssn = sctp_ssn_peek(stream, out, sid); } lchunk->subh.data_hdr->ssn = htons(ssn); lchunk->has_ssn = 1; } } /* Helper function to assign a TSN if needed. This assumes that both * the data_hdr and association have already been assigned. */ void sctp_chunk_assign_tsn(struct sctp_chunk *chunk) { if (!chunk->has_tsn) { /* This is the last possible instant to * assign a TSN. */ chunk->subh.data_hdr->tsn = htonl(sctp_association_get_next_tsn(chunk->asoc)); chunk->has_tsn = 1; } } /* Create a CLOSED association to use with an incoming packet. */ struct sctp_association *sctp_make_temp_asoc(const struct sctp_endpoint *ep, struct sctp_chunk *chunk, gfp_t gfp) { struct sctp_association *asoc; enum sctp_scope scope; struct sk_buff *skb; /* Create the bare association. */ scope = sctp_scope(sctp_source(chunk)); asoc = sctp_association_new(ep, ep->base.sk, scope, gfp); if (!asoc) goto nodata; asoc->temp = 1; skb = chunk->skb; /* Create an entry for the source address of the packet. */ SCTP_INPUT_CB(skb)->af->from_skb(&asoc->c.peer_addr, skb, 1); nodata: return asoc; } /* Build a cookie representing asoc. * This INCLUDES the param header needed to put the cookie in the INIT ACK. */ static struct sctp_cookie_param *sctp_pack_cookie( const struct sctp_endpoint *ep, const struct sctp_association *asoc, const struct sctp_chunk *init_chunk, int *cookie_len, const __u8 *raw_addrs, int addrs_len) { struct sctp_signed_cookie *cookie; struct sctp_cookie_param *retval; int headersize, bodysize; /* Header size is static data prior to the actual cookie, including * any padding. */ headersize = sizeof(struct sctp_paramhdr) + (sizeof(struct sctp_signed_cookie) - sizeof(struct sctp_cookie)); bodysize = sizeof(struct sctp_cookie) + ntohs(init_chunk->chunk_hdr->length) + addrs_len; /* Pad out the cookie to a multiple to make the signature * functions simpler to write. */ if (bodysize % SCTP_COOKIE_MULTIPLE) bodysize += SCTP_COOKIE_MULTIPLE - (bodysize % SCTP_COOKIE_MULTIPLE); *cookie_len = headersize + bodysize; /* Clear this memory since we are sending this data structure * out on the network. */ retval = kzalloc(*cookie_len, GFP_ATOMIC); if (!retval) goto nodata; cookie = (struct sctp_signed_cookie *) retval->body; /* Set up the parameter header. */ retval->p.type = SCTP_PARAM_STATE_COOKIE; retval->p.length = htons(*cookie_len); /* Copy the cookie part of the association itself. */ cookie->c = asoc->c; /* Save the raw address list length in the cookie. */ cookie->c.raw_addr_list_len = addrs_len; /* Remember PR-SCTP capability.
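* (sctp_unpack_cookie() copies this field, and the adaptation indication saved below, back into the peer state of the re-created association.)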
*/ cookie->c.prsctp_capable = asoc->peer.prsctp_capable; /* Save adaptation indication in the cookie. */ cookie->c.adaptation_ind = asoc->peer.adaptation_ind; /* Set an expiration time for the cookie. */ cookie->c.expiration = ktime_add(asoc->cookie_life, ktime_get_real()); /* Copy the peer's init packet. */ memcpy(cookie + 1, init_chunk->chunk_hdr, ntohs(init_chunk->chunk_hdr->length)); /* Copy the raw local address list of the association. */ memcpy((__u8 *)(cookie + 1) + ntohs(init_chunk->chunk_hdr->length), raw_addrs, addrs_len); if (sctp_sk(ep->base.sk)->hmac) { struct crypto_shash *tfm = sctp_sk(ep->base.sk)->hmac; int err; /* Sign the message. */ err = crypto_shash_setkey(tfm, ep->secret_key, sizeof(ep->secret_key)) ?: crypto_shash_tfm_digest(tfm, (u8 *)&cookie->c, bodysize, cookie->signature); if (err) goto free_cookie; } return retval; free_cookie: kfree(retval); nodata: *cookie_len = 0; return NULL; } /* Unpack the cookie from COOKIE ECHO chunk, recreating the association. */ struct sctp_association *sctp_unpack_cookie( const struct sctp_endpoint *ep, const struct sctp_association *asoc, struct sctp_chunk *chunk, gfp_t gfp, int *error, struct sctp_chunk **errp) { struct sctp_association *retval = NULL; int headersize, bodysize, fixed_size; struct sctp_signed_cookie *cookie; struct sk_buff *skb = chunk->skb; struct sctp_cookie *bear_cookie; __u8 *digest = ep->digest; enum sctp_scope scope; unsigned int len; ktime_t kt; /* Header size is static data prior to the actual cookie, including * any padding. */ headersize = sizeof(struct sctp_chunkhdr) + (sizeof(struct sctp_signed_cookie) - sizeof(struct sctp_cookie)); bodysize = ntohs(chunk->chunk_hdr->length) - headersize; fixed_size = headersize + sizeof(struct sctp_cookie); /* Verify that the chunk looks like it even has a cookie. * There must be enough room for our cookie and our peer's * INIT chunk. */ len = ntohs(chunk->chunk_hdr->length); if (len < fixed_size + sizeof(struct sctp_chunkhdr)) goto malformed; /* Verify that the cookie has been padded out. */ if (bodysize % SCTP_COOKIE_MULTIPLE) goto malformed; /* Process the cookie. */ cookie = chunk->subh.cookie_hdr; bear_cookie = &cookie->c; if (!sctp_sk(ep->base.sk)->hmac) goto no_hmac; /* Check the signature. */ { struct crypto_shash *tfm = sctp_sk(ep->base.sk)->hmac; int err; err = crypto_shash_setkey(tfm, ep->secret_key, sizeof(ep->secret_key)) ?: crypto_shash_tfm_digest(tfm, (u8 *)bear_cookie, bodysize, digest); if (err) { *error = -SCTP_IERROR_NOMEM; goto fail; } } if (memcmp(digest, cookie->signature, SCTP_SIGNATURE_SIZE)) { *error = -SCTP_IERROR_BAD_SIG; goto fail; } no_hmac: /* IG Section 2.35.2: * 3) Compare the port numbers and the verification tag contained * within the COOKIE ECHO chunk to the actual port numbers and the * verification tag within the SCTP common header of the received * packet. If these values do not match the packet MUST be silently * discarded, */ if (ntohl(chunk->sctp_hdr->vtag) != bear_cookie->my_vtag) { *error = -SCTP_IERROR_BAD_TAG; goto fail; } if (chunk->sctp_hdr->source != bear_cookie->peer_addr.v4.sin_port || ntohs(chunk->sctp_hdr->dest) != bear_cookie->my_port) { *error = -SCTP_IERROR_BAD_PORTS; goto fail; } /* Check to see if the cookie is stale. If there is already * an association, there is no need to check cookie's expiration * for init collision case of lost COOKIE ACK. * If skb has been timestamped, then use the stamp, otherwise * use current time. 
This introduces a small possibility that * a cookie may be considered expired, but this would only slow * down the new association establishment instead of every packet. */ if (sock_flag(ep->base.sk, SOCK_TIMESTAMP)) kt = skb_get_ktime(skb); else kt = ktime_get_real(); if (!asoc && ktime_before(bear_cookie->expiration, kt)) { suseconds_t usecs = ktime_to_us(ktime_sub(kt, bear_cookie->expiration)); __be32 n = htonl(usecs); /* * Section 3.3.10.3 Stale Cookie Error (3) * * Cause of error * --------------- * Stale Cookie Error: Indicates the receipt of a valid State * Cookie that has expired. */ *errp = sctp_make_op_error(asoc, chunk, SCTP_ERROR_STALE_COOKIE, &n, sizeof(n), 0); if (*errp) *error = -SCTP_IERROR_STALE_COOKIE; else *error = -SCTP_IERROR_NOMEM; goto fail; } /* Make a new base association. */ scope = sctp_scope(sctp_source(chunk)); retval = sctp_association_new(ep, ep->base.sk, scope, gfp); if (!retval) { *error = -SCTP_IERROR_NOMEM; goto fail; } /* Set up our peer's port number. */ retval->peer.port = ntohs(chunk->sctp_hdr->source); /* Populate the association from the cookie. */ memcpy(&retval->c, bear_cookie, sizeof(*bear_cookie)); if (sctp_assoc_set_bind_addr_from_cookie(retval, bear_cookie, GFP_ATOMIC) < 0) { *error = -SCTP_IERROR_NOMEM; goto fail; } /* Also, add the destination address. */ if (list_empty(&retval->base.bind_addr.address_list)) { sctp_add_bind_addr(&retval->base.bind_addr, &chunk->dest, sizeof(chunk->dest), SCTP_ADDR_SRC, GFP_ATOMIC); } retval->next_tsn = retval->c.initial_tsn; retval->ctsn_ack_point = retval->next_tsn - 1; retval->addip_serial = retval->c.initial_tsn; retval->strreset_outseq = retval->c.initial_tsn; retval->adv_peer_ack_point = retval->ctsn_ack_point; retval->peer.prsctp_capable = retval->c.prsctp_capable; retval->peer.adaptation_ind = retval->c.adaptation_ind; /* The INIT stuff will be done by the side effects. */ return retval; fail: if (retval) sctp_association_free(retval); return NULL; malformed: /* Yikes! The packet is either corrupt or deliberately * malformed. */ *error = -SCTP_IERROR_MALFORMED; goto fail; } /******************************************************************** * 3rd Level Abstractions ********************************************************************/ struct __sctp_missing { __be32 num_missing; __be16 type; } __packed; /* * Report a missing mandatory parameter. */ static int sctp_process_missing_param(const struct sctp_association *asoc, enum sctp_param paramtype, struct sctp_chunk *chunk, struct sctp_chunk **errp) { struct __sctp_missing report; __u16 len; len = SCTP_PAD4(sizeof(report)); /* Make an ERROR chunk, preparing enough room for * returning multiple unknown parameters. */ if (!*errp) *errp = sctp_make_op_error_space(asoc, chunk, len); if (*errp) { report.num_missing = htonl(1); report.type = paramtype; sctp_init_cause(*errp, SCTP_ERROR_MISS_PARAM, sizeof(report)); sctp_addto_chunk(*errp, sizeof(report), &report); } /* Stop processing this chunk. */ return 0; } /* Report an Invalid Mandatory Parameter. */ static int sctp_process_inv_mandatory(const struct sctp_association *asoc, struct sctp_chunk *chunk, struct sctp_chunk **errp) { /* Invalid Mandatory Parameter Error has no payload. */ if (!*errp) *errp = sctp_make_op_error_space(asoc, chunk, 0); if (*errp) sctp_init_cause(*errp, SCTP_ERROR_INV_PARAM, 0); /* Stop processing this chunk. 
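* (As with sctp_process_missing_param() above, the 0 return value propagates out of sctp_verify_init() and tells the caller to stop processing * this chunk and respond with the ERROR chunk accumulated in *errp.)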
*/ return 0; } static int sctp_process_inv_paramlength(const struct sctp_association *asoc, struct sctp_paramhdr *param, const struct sctp_chunk *chunk, struct sctp_chunk **errp) { /* This is a fatal error. Any accumulated non-fatal errors are * not reported. */ if (*errp) sctp_chunk_free(*errp); /* Create an error chunk and fill it in with our payload. */ *errp = sctp_make_violation_paramlen(asoc, chunk, param); return 0; } /* Do not attempt to handle the HOST_NAME parm. However, do * send back an indicator to the peer. */ static int sctp_process_hn_param(const struct sctp_association *asoc, union sctp_params param, struct sctp_chunk *chunk, struct sctp_chunk **errp) { __u16 len = ntohs(param.p->length); /* Processing of the HOST_NAME parameter will generate an * ABORT. If we've accumulated any non-fatal errors, they * would be unrecognized parameters and we should not include * them in the ABORT. */ if (*errp) sctp_chunk_free(*errp); *errp = sctp_make_op_error(asoc, chunk, SCTP_ERROR_DNS_FAILED, param.v, len, 0); /* Stop processing this chunk. */ return 0; } static int sctp_verify_ext_param(struct net *net, const struct sctp_endpoint *ep, union sctp_params param) { __u16 num_ext = ntohs(param.p->length) - sizeof(struct sctp_paramhdr); int have_asconf = 0; int have_auth = 0; int i; for (i = 0; i < num_ext; i++) { switch (param.ext->chunks[i]) { case SCTP_CID_AUTH: have_auth = 1; break; case SCTP_CID_ASCONF: case SCTP_CID_ASCONF_ACK: have_asconf = 1; break; } } /* ADD-IP Security: The draft requires us to ABORT or ignore the * INIT/INIT-ACK if ADD-IP is listed, but AUTH is not. Do this * only if ADD-IP is turned on and we are not in backward-compatible * mode. */ if (net->sctp.addip_noauth) return 1; if (ep->asconf_enable && !have_auth && have_asconf) return 0; return 1; } static void sctp_process_ext_param(struct sctp_association *asoc, union sctp_params param) { __u16 num_ext = ntohs(param.p->length) - sizeof(struct sctp_paramhdr); int i; for (i = 0; i < num_ext; i++) { switch (param.ext->chunks[i]) { case SCTP_CID_RECONF: if (asoc->ep->reconf_enable) asoc->peer.reconf_capable = 1; break; case SCTP_CID_FWD_TSN: if (asoc->ep->prsctp_enable) asoc->peer.prsctp_capable = 1; break; case SCTP_CID_AUTH: /* if the peer reports AUTH, assume that it * supports AUTH. */ if (asoc->ep->auth_enable) asoc->peer.auth_capable = 1; break; case SCTP_CID_ASCONF: case SCTP_CID_ASCONF_ACK: if (asoc->ep->asconf_enable) asoc->peer.asconf_capable = 1; break; case SCTP_CID_I_DATA: if (asoc->ep->intl_enable) asoc->peer.intl_capable = 1; break; default: break; } } } /* RFC 3.2.1 & the Implementers Guide 2.2. * * The Parameter Types are encoded such that the * highest-order two bits specify the action that must be * taken if the processing endpoint does not recognize the * Parameter Type. * * 00 - Stop processing this parameter; do not process any further * parameters within this chunk * * 01 - Stop processing this parameter, do not process any further * parameters within this chunk, and report the unrecognized * parameter in an 'Unrecognized Parameter' ERROR chunk. * * 10 - Skip this parameter and continue processing. * * 11 - Skip this parameter and continue processing but * report the unrecognized parameter in an * 'Unrecognized Parameter' ERROR chunk. * * Return value: * SCTP_IERROR_NO_ERROR - continue with the chunk * SCTP_IERROR_ERROR - stop and report an error. * SCTP_IERROR_NOMEM - out of memory.
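* * For example, SCTP_PARAM_SET_PRIMARY (0xc004) has both high-order bits set, so an endpoint without ADD-IP support skips it, keeps processing * the chunk, and reports it back in an 'Unrecognized Parameter' ERROR chunk.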
*/ static enum sctp_ierror sctp_process_unk_param( const struct sctp_association *asoc, union sctp_params param, struct sctp_chunk *chunk, struct sctp_chunk **errp) { int retval = SCTP_IERROR_NO_ERROR; switch (param.p->type & SCTP_PARAM_ACTION_MASK) { case SCTP_PARAM_ACTION_DISCARD: retval = SCTP_IERROR_ERROR; break; case SCTP_PARAM_ACTION_SKIP: break; case SCTP_PARAM_ACTION_DISCARD_ERR: retval = SCTP_IERROR_ERROR; fallthrough; case SCTP_PARAM_ACTION_SKIP_ERR: /* Make an ERROR chunk, preparing enough room for * returning multiple unknown parameters. */ if (!*errp) { *errp = sctp_make_op_error_limited(asoc, chunk); if (!*errp) { /* If there is no memory for generating the * ERROR report as specified, an ABORT will be * triggered to the peer and the association * won't be established. */ retval = SCTP_IERROR_NOMEM; break; } } if (!sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM, ntohs(param.p->length))) sctp_addto_chunk(*errp, ntohs(param.p->length), param.v); break; default: break; } return retval; } /* Verify variable length parameters * Return values: * SCTP_IERROR_ABORT - trigger an ABORT * SCTP_IERROR_NOMEM - out of memory (abort) * SCTP_IERROR_ERROR - stop processing, trigger an ERROR * SCTP_IERROR_NO_ERROR - continue with the chunk */ static enum sctp_ierror sctp_verify_param(struct net *net, const struct sctp_endpoint *ep, const struct sctp_association *asoc, union sctp_params param, enum sctp_cid cid, struct sctp_chunk *chunk, struct sctp_chunk **err_chunk) { struct sctp_hmac_algo_param *hmacs; int retval = SCTP_IERROR_NO_ERROR; __u16 n_elt, id = 0; int i; /* FIXME - This routine is not looking at each parameter per the * chunk type, i.e., unrecognized parameters should be further * identified based on the chunk id. */ switch (param.p->type) { case SCTP_PARAM_IPV4_ADDRESS: case SCTP_PARAM_IPV6_ADDRESS: case SCTP_PARAM_COOKIE_PRESERVATIVE: case SCTP_PARAM_SUPPORTED_ADDRESS_TYPES: case SCTP_PARAM_STATE_COOKIE: case SCTP_PARAM_HEARTBEAT_INFO: case SCTP_PARAM_UNRECOGNIZED_PARAMETERS: case SCTP_PARAM_ECN_CAPABLE: case SCTP_PARAM_ADAPTATION_LAYER_IND: break; case SCTP_PARAM_SUPPORTED_EXT: if (!sctp_verify_ext_param(net, ep, param)) return SCTP_IERROR_ABORT; break; case SCTP_PARAM_SET_PRIMARY: if (!ep->asconf_enable) goto unhandled; if (ntohs(param.p->length) < sizeof(struct sctp_addip_param) + sizeof(struct sctp_paramhdr)) { sctp_process_inv_paramlength(asoc, param.p, chunk, err_chunk); retval = SCTP_IERROR_ABORT; } break; case SCTP_PARAM_HOST_NAME_ADDRESS: /* This param has been Deprecated, send ABORT. */ sctp_process_hn_param(asoc, param, chunk, err_chunk); retval = SCTP_IERROR_ABORT; break; case SCTP_PARAM_FWD_TSN_SUPPORT: if (ep->prsctp_enable) break; goto unhandled; case SCTP_PARAM_RANDOM: if (!ep->auth_enable) goto unhandled; /* SCTP-AUTH: Section 6.1 * If the random number is not 32 bytes long the association * MUST be aborted. The ABORT chunk SHOULD contain the error * cause 'Protocol Violation'. */ if (SCTP_AUTH_RANDOM_LENGTH != ntohs(param.p->length) - sizeof(struct sctp_paramhdr)) { sctp_process_inv_paramlength(asoc, param.p, chunk, err_chunk); retval = SCTP_IERROR_ABORT; } break; case SCTP_PARAM_CHUNKS: if (!ep->auth_enable) goto unhandled; /* SCTP-AUTH: Section 3.2 * The CHUNKS parameter MUST be included once in the INIT or * INIT-ACK chunk if the sender wants to receive authenticated * chunks. Its maximum length is 260 bytes.
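* (260 bytes is a 4-byte parameter header plus at most 256 chunk types at one byte each, i.e. every possible chunk identifier.)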
*/ if (260 < ntohs(param.p->length)) { sctp_process_inv_paramlength(asoc, param.p, chunk, err_chunk); retval = SCTP_IERROR_ABORT; } break; case SCTP_PARAM_HMAC_ALGO: if (!ep->auth_enable) goto unhandled; hmacs = (struct sctp_hmac_algo_param *)param.p; n_elt = (ntohs(param.p->length) - sizeof(struct sctp_paramhdr)) >> 1; /* SCTP-AUTH: Section 6.1 * The HMAC algorithm based on SHA-1 MUST be supported and * included in the HMAC-ALGO parameter. */ for (i = 0; i < n_elt; i++) { id = ntohs(hmacs->hmac_ids[i]); if (id == SCTP_AUTH_HMAC_ID_SHA1) break; } if (id != SCTP_AUTH_HMAC_ID_SHA1) { sctp_process_inv_paramlength(asoc, param.p, chunk, err_chunk); retval = SCTP_IERROR_ABORT; } break; unhandled: default: pr_debug("%s: unrecognized param:%d for chunk:%d\n", __func__, ntohs(param.p->type), cid); retval = sctp_process_unk_param(asoc, param, chunk, err_chunk); break; } return retval; } /* Verify the INIT packet before we process it. */ int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep, const struct sctp_association *asoc, enum sctp_cid cid, struct sctp_init_chunk *peer_init, struct sctp_chunk *chunk, struct sctp_chunk **errp) { union sctp_params param; bool has_cookie = false; int result; /* Check for missing mandatory parameters. Note: Initial TSN is * also mandatory, but is not checked here since the valid range * is 0..2**32-1. RFC4960, section 3.3.3. */ if (peer_init->init_hdr.num_outbound_streams == 0 || peer_init->init_hdr.num_inbound_streams == 0 || peer_init->init_hdr.init_tag == 0 || ntohl(peer_init->init_hdr.a_rwnd) < SCTP_DEFAULT_MINWINDOW) return sctp_process_inv_mandatory(asoc, chunk, errp); sctp_walk_params(param, peer_init) { if (param.p->type == SCTP_PARAM_STATE_COOKIE) has_cookie = true; } /* There is a possibility that a parameter length was bad and * in that case we would have stopped walking the parameters. * The current param.p would point at the bad one. * Current consensus on the mailing list is to generate a PROTOCOL * VIOLATION error. We build the ERROR chunk here and let the normal * error handling code build and send the packet. */ if (param.v != (void *)chunk->chunk_end) return sctp_process_inv_paramlength(asoc, param.p, chunk, errp); /* The only missing mandatory param possible today is * the state cookie for an INIT-ACK chunk. */ if ((SCTP_CID_INIT_ACK == cid) && !has_cookie) return sctp_process_missing_param(asoc, SCTP_PARAM_STATE_COOKIE, chunk, errp); /* Verify all the variable length parameters */ sctp_walk_params(param, peer_init) { result = sctp_verify_param(net, ep, asoc, param, cid, chunk, errp); switch (result) { case SCTP_IERROR_ABORT: case SCTP_IERROR_NOMEM: return 0; case SCTP_IERROR_ERROR: return 1; case SCTP_IERROR_NO_ERROR: default: break; } } /* for (loop through all parameters) */ return 1; } /* Unpack the parameters in an INIT packet into an association. * Returns 0 on failure, else success. * FIXME: This is an association method. */ int sctp_process_init(struct sctp_association *asoc, struct sctp_chunk *chunk, const union sctp_addr *peer_addr, struct sctp_init_chunk *peer_init, gfp_t gfp) { struct sctp_transport *transport; struct list_head *pos, *temp; union sctp_params param; union sctp_addr addr; struct sctp_af *af; int src_match = 0; /* We must include the address that the INIT packet came from. * This is the only address that matters for an INIT packet. * When processing a COOKIE ECHO, we retrieve the source address * of the INIT from the cookie.
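* (Hence the code below adds peer_addr as the first peer transport before walking any of the addresses embedded in the INIT itself.)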
*/ /* This implementation defaults to making the first transport * added as the primary transport. The source address seems to * be a better choice than any of the embedded addresses. */ asoc->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port; if (!sctp_assoc_add_peer(asoc, peer_addr, gfp, SCTP_ACTIVE)) goto nomem; if (sctp_cmp_addr_exact(sctp_source(chunk), peer_addr)) src_match = 1; /* Process the initialization parameters. */ sctp_walk_params(param, peer_init) { if (!src_match && (param.p->type == SCTP_PARAM_IPV4_ADDRESS || param.p->type == SCTP_PARAM_IPV6_ADDRESS)) { af = sctp_get_af_specific(param_type2af(param.p->type)); if (!af->from_addr_param(&addr, param.addr, chunk->sctp_hdr->source, 0)) continue; if (sctp_cmp_addr_exact(sctp_source(chunk), &addr)) src_match = 1; } if (!sctp_process_param(asoc, param, peer_addr, gfp)) goto clean_up; } /* source address of chunk may not match any valid address */ if (!src_match) goto clean_up; /* AUTH: After processing the parameters, make sure that we * have all the required info to potentially do authentications. */ if (asoc->peer.auth_capable && (!asoc->peer.peer_random || !asoc->peer.peer_hmacs)) asoc->peer.auth_capable = 0; /* In a non-backward compatible mode, if the peer claims * support for ADD-IP but not AUTH, the ADD-IP spec states * that we MUST ABORT the association. Section 6. The section * also gives us an option to silently ignore the packet, which * is what we'll do here. */ if (!asoc->base.net->sctp.addip_noauth && (asoc->peer.asconf_capable && !asoc->peer.auth_capable)) { asoc->peer.addip_disabled_mask |= (SCTP_PARAM_ADD_IP | SCTP_PARAM_DEL_IP | SCTP_PARAM_SET_PRIMARY); asoc->peer.asconf_capable = 0; goto clean_up; } /* Walk list of transports, removing transports in the UNKNOWN state. */ list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) { transport = list_entry(pos, struct sctp_transport, transports); if (transport->state == SCTP_UNKNOWN) { sctp_assoc_rm_peer(asoc, transport); } } /* The fixed INIT headers are always in network byte * order. */ asoc->peer.i.init_tag = ntohl(peer_init->init_hdr.init_tag); asoc->peer.i.a_rwnd = ntohl(peer_init->init_hdr.a_rwnd); asoc->peer.i.num_outbound_streams = ntohs(peer_init->init_hdr.num_outbound_streams); asoc->peer.i.num_inbound_streams = ntohs(peer_init->init_hdr.num_inbound_streams); asoc->peer.i.initial_tsn = ntohl(peer_init->init_hdr.initial_tsn); asoc->strreset_inseq = asoc->peer.i.initial_tsn; /* Apply the upper bounds for output streams based on peer's * number of inbound streams. */ if (asoc->c.sinit_num_ostreams > ntohs(peer_init->init_hdr.num_inbound_streams)) { asoc->c.sinit_num_ostreams = ntohs(peer_init->init_hdr.num_inbound_streams); } if (asoc->c.sinit_max_instreams > ntohs(peer_init->init_hdr.num_outbound_streams)) { asoc->c.sinit_max_instreams = ntohs(peer_init->init_hdr.num_outbound_streams); } /* Copy Initiation tag from INIT to VT_peer in cookie. */ asoc->c.peer_vtag = asoc->peer.i.init_tag; /* Peer Rwnd : Current calculated value of the peer's rwnd. */ asoc->peer.rwnd = asoc->peer.i.a_rwnd; /* RFC 2960 7.2.1 The initial value of ssthresh MAY be arbitrarily * high (for example, implementations MAY use the size of the receiver * advertised window). */ list_for_each_entry(transport, &asoc->peer.transport_addr_list, transports) { transport->ssthresh = asoc->peer.i.a_rwnd; } /* Set up the TSN tracking pieces.
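* The map is seeded with the peer's initial TSN so the first received DATA chunk can be marked; SCTP_TSN_MAP_INITIAL only sets the starting * size of the map, which may grow later as needed.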
*/ if (!sctp_tsnmap_init(&asoc->peer.tsn_map, SCTP_TSN_MAP_INITIAL, asoc->peer.i.initial_tsn, gfp)) goto clean_up; /* RFC 2960 6.5 Stream Identifier and Stream Sequence Number * * The stream sequence number in all the streams shall start * from 0 when the association is established. Also, when the * stream sequence number reaches the value 65535 the next * stream sequence number shall be set to 0. */ if (sctp_stream_init(&asoc->stream, asoc->c.sinit_num_ostreams, asoc->c.sinit_max_instreams, gfp)) goto clean_up; /* Update frag_point when stream_interleave may get changed. */ sctp_assoc_update_frag_point(asoc); if (!asoc->temp && sctp_assoc_set_id(asoc, gfp)) goto clean_up; /* ADDIP Section 4.1 ASCONF Chunk Procedures * * When an endpoint has an ASCONF signaled change to be sent to the * remote endpoint it should do the following: * ... * A2) A serial number should be assigned to the Chunk. The serial * number should be a monotonically increasing number. All serial * numbers are defined to be initialized at the start of the * association to the same value as the Initial TSN. */ asoc->peer.addip_serial = asoc->peer.i.initial_tsn - 1; return 1; clean_up: /* Release the transport structures. */ list_for_each_safe(pos, temp, &asoc->peer.transport_addr_list) { transport = list_entry(pos, struct sctp_transport, transports); if (transport->state != SCTP_ACTIVE) sctp_assoc_rm_peer(asoc, transport); } nomem: return 0; } /* Update asoc with the option described in param. * * RFC2960 3.3.2.1 Optional/Variable Length Parameters in INIT * * asoc is the association to update. * param is the variable length parameter to use for update. * cid tells us if this is an INIT, INIT ACK or COOKIE ECHO. * If the current packet is an INIT we want to minimize the amount of * work we do. In particular, we should not build transport * structures for the addresses. */ static int sctp_process_param(struct sctp_association *asoc, union sctp_params param, const union sctp_addr *peer_addr, gfp_t gfp) { struct sctp_endpoint *ep = asoc->ep; union sctp_addr_param *addr_param; struct net *net = asoc->base.net; struct sctp_transport *t; enum sctp_scope scope; union sctp_addr addr; struct sctp_af *af; int retval = 1, i; u32 stale; __u16 sat; /* We maintain all INIT parameters in network byte order all the * time. This allows us to not worry about whether the parameters * came from a fresh INIT, an INIT ACK, or were stored in a cookie. */ switch (param.p->type) { case SCTP_PARAM_IPV6_ADDRESS: if (PF_INET6 != asoc->base.sk->sk_family) break; goto do_addr_param; case SCTP_PARAM_IPV4_ADDRESS: /* v4 addresses are not allowed on v6-only socket */ if (ipv6_only_sock(asoc->base.sk)) break; do_addr_param: af = sctp_get_af_specific(param_type2af(param.p->type)); if (!af->from_addr_param(&addr, param.addr, htons(asoc->peer.port), 0)) break; scope = sctp_scope(peer_addr); if (sctp_in_scope(net, &addr, scope)) if (!sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED)) return 0; break; case SCTP_PARAM_COOKIE_PRESERVATIVE: if (!net->sctp.cookie_preserve_enable) break; stale = ntohl(param.life->lifespan_increment); /* Suggested Cookie Life span increment's unit is msec, * (1/1000sec). */ asoc->cookie_life = ktime_add_ms(asoc->cookie_life, stale); break; case SCTP_PARAM_SUPPORTED_ADDRESS_TYPES: /* Turn off the default values first so we'll know which * ones are really set by the peer. */ asoc->peer.ipv4_address = 0; asoc->peer.ipv6_address = 0; /* Assume that peer supports the address family * by which it sends a packet.
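* E.g. an INIT arriving over IPv6 marks peer.ipv6_address even if the advertised address-type list examined below happens to omit IPv6.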
*/ if (peer_addr->sa.sa_family == AF_INET6) asoc->peer.ipv6_address = 1; else if (peer_addr->sa.sa_family == AF_INET) asoc->peer.ipv4_address = 1; /* Cycle through address types; avoid divide by 0. */ sat = ntohs(param.p->length) - sizeof(struct sctp_paramhdr); if (sat) sat /= sizeof(__u16); for (i = 0; i < sat; ++i) { switch (param.sat->types[i]) { case SCTP_PARAM_IPV4_ADDRESS: asoc->peer.ipv4_address = 1; break; case SCTP_PARAM_IPV6_ADDRESS: if (PF_INET6 == asoc->base.sk->sk_family) asoc->peer.ipv6_address = 1; break; default: /* Just ignore anything else. */ break; } } break; case SCTP_PARAM_STATE_COOKIE: asoc->peer.cookie_len = ntohs(param.p->length) - sizeof(struct sctp_paramhdr); kfree(asoc->peer.cookie); asoc->peer.cookie = kmemdup(param.cookie->body, asoc->peer.cookie_len, gfp); if (!asoc->peer.cookie) retval = 0; break; case SCTP_PARAM_HEARTBEAT_INFO: /* Would be odd to receive, but it causes no problems. */ break; case SCTP_PARAM_UNRECOGNIZED_PARAMETERS: /* Rejected during verify stage. */ break; case SCTP_PARAM_ECN_CAPABLE: if (asoc->ep->ecn_enable) { asoc->peer.ecn_capable = 1; break; } /* Fall Through */ goto fall_through; case SCTP_PARAM_ADAPTATION_LAYER_IND: asoc->peer.adaptation_ind = ntohl(param.aind->adaptation_ind); break; case SCTP_PARAM_SET_PRIMARY: if (!ep->asconf_enable) goto fall_through; addr_param = param.v + sizeof(struct sctp_addip_param); af = sctp_get_af_specific(param_type2af(addr_param->p.type)); if (!af) break; if (!af->from_addr_param(&addr, addr_param, htons(asoc->peer.port), 0)) break; if (!af->addr_valid(&addr, NULL, NULL)) break; t = sctp_assoc_lookup_paddr(asoc, &addr); if (!t) break; sctp_assoc_set_primary(asoc, t); break; case SCTP_PARAM_SUPPORTED_EXT: sctp_process_ext_param(asoc, param); break; case SCTP_PARAM_FWD_TSN_SUPPORT: if (asoc->ep->prsctp_enable) { asoc->peer.prsctp_capable = 1; break; } /* Fall Through */ goto fall_through; case SCTP_PARAM_RANDOM: if (!ep->auth_enable) goto fall_through; /* Save peer's random parameter */ kfree(asoc->peer.peer_random); asoc->peer.peer_random = kmemdup(param.p, ntohs(param.p->length), gfp); if (!asoc->peer.peer_random) { retval = 0; break; } break; case SCTP_PARAM_HMAC_ALGO: if (!ep->auth_enable) goto fall_through; /* Save peer's HMAC list */ kfree(asoc->peer.peer_hmacs); asoc->peer.peer_hmacs = kmemdup(param.p, ntohs(param.p->length), gfp); if (!asoc->peer.peer_hmacs) { retval = 0; break; } /* Set the default HMAC the peer requested*/ sctp_auth_asoc_set_default_hmac(asoc, param.hmac_algo); break; case SCTP_PARAM_CHUNKS: if (!ep->auth_enable) goto fall_through; kfree(asoc->peer.peer_chunks); asoc->peer.peer_chunks = kmemdup(param.p, ntohs(param.p->length), gfp); if (!asoc->peer.peer_chunks) retval = 0; break; fall_through: default: /* Any unrecognized parameters should have been caught * and handled by sctp_verify_param() which should be * called prior to this routine. Simply log the error * here. */ pr_debug("%s: ignoring param:%d for association:%p.\n", __func__, ntohs(param.p->type), asoc); break; } return retval; } /* Select a new verification tag. */ __u32 sctp_generate_tag(const struct sctp_endpoint *ep) { /* I believe that this random number generator complies with RFC1750. * A tag of 0 is reserved for special cases (e.g. INIT). */ __u32 x; do { get_random_bytes(&x, sizeof(__u32)); } while (x == 0); return x; } /* Select an initial TSN to send during startup. 
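* (Unlike sctp_generate_tag() above, no retry loop is needed: a verification tag must be nonzero, but any 32-bit value, including 0, is a * valid initial TSN.)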
*/ __u32 sctp_generate_tsn(const struct sctp_endpoint *ep) { __u32 retval; get_random_bytes(&retval, sizeof(__u32)); return retval; } /* * ADDIP 3.1.1 Address Configuration Change Chunk (ASCONF) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type = 0xC1 | Chunk Flags | Chunk Length | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Serial Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Address Parameter | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | ASCONF Parameter #1 | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * \ \ * / .... / * \ \ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | ASCONF Parameter #N | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * * Address Parameter and other parameter will not be wrapped in this function */ static struct sctp_chunk *sctp_make_asconf(struct sctp_association *asoc, union sctp_addr *addr, int vparam_len) { struct sctp_addiphdr asconf; struct sctp_chunk *retval; int length = sizeof(asconf) + vparam_len; union sctp_addr_param addrparam; int addrlen; struct sctp_af *af = sctp_get_af_specific(addr->v4.sin_family); addrlen = af->to_addr_param(addr, &addrparam); if (!addrlen) return NULL; length += addrlen; /* Create the chunk. */ retval = sctp_make_control(asoc, SCTP_CID_ASCONF, 0, length, GFP_ATOMIC); if (!retval) return NULL; asconf.serial = htonl(asoc->addip_serial++); retval->subh.addip_hdr = sctp_addto_chunk(retval, sizeof(asconf), &asconf); retval->param_hdr.v = sctp_addto_chunk(retval, addrlen, &addrparam); return retval; } /* ADDIP * 3.2.1 Add IP Address * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type = 0xC001 | Length = Variable | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | ASCONF-Request Correlation ID | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Address Parameter | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * * 3.2.2 Delete IP Address * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type = 0xC002 | Length = Variable | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | ASCONF-Request Correlation ID | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Address Parameter | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * */ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc, union sctp_addr *laddr, struct sockaddr *addrs, int addrcnt, __be16 flags) { union sctp_addr_param addr_param; struct sctp_addip_param param; int paramlen = sizeof(param); struct sctp_chunk *retval; int addr_param_len = 0; union sctp_addr *addr; int totallen = 0, i; int del_pickup = 0; struct sctp_af *af; void *addr_buf; /* Get total length of all the address parameters. 
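* As a sketch of the arithmetic below: each address costs one struct sctp_addip_param (an 8-byte wrapper of parameter header plus correlation * id) followed by the packed address parameter itself, so e.g. two IPv4 addresses consume 2 * (8 + 8) = 32 bytes of chunk space.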
*/ addr_buf = addrs; for (i = 0; i < addrcnt; i++) { addr = addr_buf; af = sctp_get_af_specific(addr->v4.sin_family); addr_param_len = af->to_addr_param(addr, &addr_param); totallen += paramlen; totallen += addr_param_len; addr_buf += af->sockaddr_len; if (asoc->asconf_addr_del_pending && !del_pickup) { /* reuse the parameter length from the same scope one */ totallen += paramlen; totallen += addr_param_len; del_pickup = 1; pr_debug("%s: picked same-scope del_pending addr, " "totallen for all addresses is %d\n", __func__, totallen); } } /* Create an asconf chunk with the required length. */ retval = sctp_make_asconf(asoc, laddr, totallen); if (!retval) return NULL; /* Add the address parameters to the asconf chunk. */ addr_buf = addrs; for (i = 0; i < addrcnt; i++) { addr = addr_buf; af = sctp_get_af_specific(addr->v4.sin_family); addr_param_len = af->to_addr_param(addr, &addr_param); param.param_hdr.type = flags; param.param_hdr.length = htons(paramlen + addr_param_len); param.crr_id = htonl(i); sctp_addto_chunk(retval, paramlen, ¶m); sctp_addto_chunk(retval, addr_param_len, &addr_param); addr_buf += af->sockaddr_len; } if (flags == SCTP_PARAM_ADD_IP && del_pickup) { addr = asoc->asconf_addr_del_pending; af = sctp_get_af_specific(addr->v4.sin_family); addr_param_len = af->to_addr_param(addr, &addr_param); param.param_hdr.type = SCTP_PARAM_DEL_IP; param.param_hdr.length = htons(paramlen + addr_param_len); param.crr_id = htonl(i); sctp_addto_chunk(retval, paramlen, ¶m); sctp_addto_chunk(retval, addr_param_len, &addr_param); } return retval; } /* ADDIP * 3.2.4 Set Primary IP Address * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type =0xC004 | Length = Variable | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | ASCONF-Request Correlation ID | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Address Parameter | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * * Create an ASCONF chunk with Set Primary IP address parameter. */ struct sctp_chunk *sctp_make_asconf_set_prim(struct sctp_association *asoc, union sctp_addr *addr) { struct sctp_af *af = sctp_get_af_specific(addr->v4.sin_family); union sctp_addr_param addrparam; struct sctp_addip_param param; struct sctp_chunk *retval; int len = sizeof(param); int addrlen; addrlen = af->to_addr_param(addr, &addrparam); if (!addrlen) return NULL; len += addrlen; /* Create the chunk and make asconf header. */ retval = sctp_make_asconf(asoc, addr, len); if (!retval) return NULL; param.param_hdr.type = SCTP_PARAM_SET_PRIMARY; param.param_hdr.length = htons(len); param.crr_id = 0; sctp_addto_chunk(retval, sizeof(param), ¶m); sctp_addto_chunk(retval, addrlen, &addrparam); return retval; } /* ADDIP 3.1.2 Address Configuration Acknowledgement Chunk (ASCONF-ACK) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type = 0x80 | Chunk Flags | Chunk Length | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Serial Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | ASCONF Parameter Response#1 | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * \ \ * / .... 
/ * \ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | ASCONF Parameter Response#N | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * * Create an ASCONF_ACK chunk with enough space for the parameter responses. */ static struct sctp_chunk *sctp_make_asconf_ack(const struct sctp_association *asoc, __u32 serial, int vparam_len) { struct sctp_addiphdr asconf; struct sctp_chunk *retval; int length = sizeof(asconf) + vparam_len; /* Create the chunk. */ retval = sctp_make_control(asoc, SCTP_CID_ASCONF_ACK, 0, length, GFP_ATOMIC); if (!retval) return NULL; asconf.serial = htonl(serial); retval->subh.addip_hdr = sctp_addto_chunk(retval, sizeof(asconf), &asconf); return retval; } /* Add response parameters to an ASCONF_ACK chunk. */ static void sctp_add_asconf_response(struct sctp_chunk *chunk, __be32 crr_id, __be16 err_code, struct sctp_addip_param *asconf_param) { struct sctp_addip_param ack_param; struct sctp_errhdr err_param; int asconf_param_len = 0; int err_param_len = 0; __be16 response_type; if (SCTP_ERROR_NO_ERROR == err_code) { response_type = SCTP_PARAM_SUCCESS_REPORT; } else { response_type = SCTP_PARAM_ERR_CAUSE; err_param_len = sizeof(err_param); if (asconf_param) asconf_param_len = ntohs(asconf_param->param_hdr.length); } /* Add Success Indication or Error Cause Indication parameter. */ ack_param.param_hdr.type = response_type; ack_param.param_hdr.length = htons(sizeof(ack_param) + err_param_len + asconf_param_len); ack_param.crr_id = crr_id; sctp_addto_chunk(chunk, sizeof(ack_param), &ack_param); if (SCTP_ERROR_NO_ERROR == err_code) return; /* Add Error Cause parameter. */ err_param.cause = err_code; err_param.length = htons(err_param_len + asconf_param_len); sctp_addto_chunk(chunk, err_param_len, &err_param); /* Add the failed TLV copied from ASCONF chunk. */ if (asconf_param) sctp_addto_chunk(chunk, asconf_param_len, asconf_param); } /* Process an asconf parameter. */ static __be16 sctp_process_asconf_param(struct sctp_association *asoc, struct sctp_chunk *asconf, struct sctp_addip_param *asconf_param) { union sctp_addr_param *addr_param; struct sctp_transport *peer; union sctp_addr addr; struct sctp_af *af; addr_param = (void *)asconf_param + sizeof(*asconf_param); if (asconf_param->param_hdr.type != SCTP_PARAM_ADD_IP && asconf_param->param_hdr.type != SCTP_PARAM_DEL_IP && asconf_param->param_hdr.type != SCTP_PARAM_SET_PRIMARY) return SCTP_ERROR_UNKNOWN_PARAM; switch (addr_param->p.type) { case SCTP_PARAM_IPV6_ADDRESS: if (!asoc->peer.ipv6_address) return SCTP_ERROR_DNS_FAILED; break; case SCTP_PARAM_IPV4_ADDRESS: if (!asoc->peer.ipv4_address) return SCTP_ERROR_DNS_FAILED; break; default: return SCTP_ERROR_DNS_FAILED; } af = sctp_get_af_specific(param_type2af(addr_param->p.type)); if (unlikely(!af)) return SCTP_ERROR_DNS_FAILED; if (!af->from_addr_param(&addr, addr_param, htons(asoc->peer.port), 0)) return SCTP_ERROR_DNS_FAILED; /* ADDIP 4.2.1 This parameter MUST NOT contain a broadcast * or multicast address. * (note: wildcard is permitted and requires special handling so * make sure we check for that) */ if (!af->is_any(&addr) && !af->addr_valid(&addr, NULL, asconf->skb)) return SCTP_ERROR_DNS_FAILED; switch (asconf_param->param_hdr.type) { case SCTP_PARAM_ADD_IP: /* Section 4.2.1: * If the address 0.0.0.0 or ::0 is provided, the source * address of the packet MUST be added.
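* E.g. an ASCONF arriving from 198.51.100.7 (an illustrative address) that carries an ADD IP parameter for 0.0.0.0 results in 198.51.100.7 * itself being added below as a new, unconfirmed peer transport.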
*/ if (af->is_any(&addr)) memcpy(&addr, &asconf->source, sizeof(addr)); if (security_sctp_bind_connect(asoc->ep->base.sk, SCTP_PARAM_ADD_IP, (struct sockaddr *)&addr, af->sockaddr_len)) return SCTP_ERROR_REQ_REFUSED; /* ADDIP 4.3 D9) If an endpoint receives an ADD IP address * request and does not have the local resources to add this * new address to the association, it MUST return an Error * Cause TLV set to the new error code 'Operation Refused * Due to Resource Shortage'. */ peer = sctp_assoc_add_peer(asoc, &addr, GFP_ATOMIC, SCTP_UNCONFIRMED); if (!peer) return SCTP_ERROR_RSRC_LOW; /* Start the heartbeat timer. */ sctp_transport_reset_hb_timer(peer); asoc->new_transport = peer; break; case SCTP_PARAM_DEL_IP: /* ADDIP 4.3 D7) If a request is received to delete the * last remaining IP address of a peer endpoint, the receiver * MUST send an Error Cause TLV with the error cause set to the * new error code 'Request to Delete Last Remaining IP Address'. */ if (asoc->peer.transport_count == 1) return SCTP_ERROR_DEL_LAST_IP; /* ADDIP 4.3 D8) If a request is received to delete an IP * address which is also the source address of the IP packet * which contained the ASCONF chunk, the receiver MUST reject * this request. To reject the request the receiver MUST send * an Error Cause TLV set to the new error code 'Request to * Delete Source IP Address' */ if (sctp_cmp_addr_exact(&asconf->source, &addr)) return SCTP_ERROR_DEL_SRC_IP; /* Section 4.2.2 * If the address 0.0.0.0 or ::0 is provided, all * addresses of the peer except the source address of the * packet MUST be deleted. */ if (af->is_any(&addr)) { sctp_assoc_set_primary(asoc, asconf->transport); sctp_assoc_del_nonprimary_peers(asoc, asconf->transport); return SCTP_ERROR_NO_ERROR; } /* If the address is not part of the association, an * ASCONF-ACK with an Error Cause Indication Parameter * which includes the cause 'Unresolvable Address' should * be sent. */ peer = sctp_assoc_lookup_paddr(asoc, &addr); if (!peer) return SCTP_ERROR_DNS_FAILED; sctp_assoc_rm_peer(asoc, peer); break; case SCTP_PARAM_SET_PRIMARY: /* ADDIP Section 4.2.4 * If the address 0.0.0.0 or ::0 is provided, the receiver * MAY mark the source address of the packet as its * primary. */ if (af->is_any(&addr)) memcpy(&addr, sctp_source(asconf), sizeof(addr)); if (security_sctp_bind_connect(asoc->ep->base.sk, SCTP_PARAM_SET_PRIMARY, (struct sockaddr *)&addr, af->sockaddr_len)) return SCTP_ERROR_REQ_REFUSED; peer = sctp_assoc_lookup_paddr(asoc, &addr); if (!peer) return SCTP_ERROR_DNS_FAILED; sctp_assoc_set_primary(asoc, peer); break; } return SCTP_ERROR_NO_ERROR; } /* Verify the ASCONF packet before we process it. */ bool sctp_verify_asconf(const struct sctp_association *asoc, struct sctp_chunk *chunk, bool addr_param_needed, struct sctp_paramhdr **errp) { struct sctp_addip_chunk *addip; bool addr_param_seen = false; union sctp_params param; addip = (struct sctp_addip_chunk *)chunk->chunk_hdr; sctp_walk_params(param, addip) { size_t length = ntohs(param.p->length); *errp = param.p; switch (param.p->type) { case SCTP_PARAM_ERR_CAUSE: break; case SCTP_PARAM_IPV4_ADDRESS: if (length != sizeof(struct sctp_ipv4addr_param)) return false; /* ensure there is only one addr param and it's in the * beginning of addip_hdr params, or we reject it.
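* The check below enforces this positionally: the only place a v4/v6 address parameter may sit is immediately after the addip header, i.e. * at (addip + 1).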
*/ if (param.v != (addip + 1)) return false; addr_param_seen = true; break; case SCTP_PARAM_IPV6_ADDRESS: if (length != sizeof(struct sctp_ipv6addr_param)) return false; if (param.v != (addip + 1)) return false; addr_param_seen = true; break; case SCTP_PARAM_ADD_IP: case SCTP_PARAM_DEL_IP: case SCTP_PARAM_SET_PRIMARY: /* In ASCONF chunks, these need to be first. */ if (addr_param_needed && !addr_param_seen) return false; length = ntohs(param.addip->param_hdr.length); if (length < sizeof(struct sctp_addip_param) + sizeof(**errp)) return false; break; case SCTP_PARAM_SUCCESS_REPORT: case SCTP_PARAM_ADAPTATION_LAYER_IND: if (length != sizeof(struct sctp_addip_param)) return false; break; default: /* This is unknown to us, reject! */ return false; } } /* Remaining sanity checks. */ if (addr_param_needed && !addr_param_seen) return false; if (!addr_param_needed && addr_param_seen) return false; if (param.v != chunk->chunk_end) return false; return true; } /* Process an incoming ASCONF chunk with the next expected serial no. and * return an ASCONF_ACK chunk to be sent in response. */ struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc, struct sctp_chunk *asconf) { union sctp_addr_param *addr_param; struct sctp_addip_chunk *addip; struct sctp_chunk *asconf_ack; bool all_param_pass = true; struct sctp_addiphdr *hdr; int length = 0, chunk_len; union sctp_params param; __be16 err_code; __u32 serial; addip = (struct sctp_addip_chunk *)asconf->chunk_hdr; chunk_len = ntohs(asconf->chunk_hdr->length) - sizeof(struct sctp_chunkhdr); hdr = (struct sctp_addiphdr *)asconf->skb->data; serial = ntohl(hdr->serial); /* Skip the addiphdr and store a pointer to address parameter. */ length = sizeof(*hdr); addr_param = (union sctp_addr_param *)(asconf->skb->data + length); chunk_len -= length; /* Skip the address parameter and store a pointer to the first * asconf parameter. */ length = ntohs(addr_param->p.length); chunk_len -= length; /* create an ASCONF_ACK chunk. * Based on the definitions of parameters, we know that the size of * ASCONF_ACK parameters are less than or equal to the fourfold of ASCONF * parameters. */ asconf_ack = sctp_make_asconf_ack(asoc, serial, chunk_len * 4); if (!asconf_ack) goto done; /* Process the TLVs contained within the ASCONF chunk. */ sctp_walk_params(param, addip) { /* Skip preceding address parameters. */ if (param.p->type == SCTP_PARAM_IPV4_ADDRESS || param.p->type == SCTP_PARAM_IPV6_ADDRESS) continue; err_code = sctp_process_asconf_param(asoc, asconf, param.addip); /* ADDIP 4.1 A7) * If an error response is received for a TLV parameter, * all TLVs with no response before the failed TLV are * considered successful if not reported. All TLVs after * the failed response are considered unsuccessful unless * a specific success indication is present for the parameter. */ if (err_code != SCTP_ERROR_NO_ERROR) all_param_pass = false; if (!all_param_pass) sctp_add_asconf_response(asconf_ack, param.addip->crr_id, err_code, param.addip); /* ADDIP 4.3 D11) When an endpoint receiving an ASCONF to add * an IP address sends an 'Out of Resource' in its response, it * MUST also fail any subsequent add or delete requests bundled * in the ASCONF. */ if (err_code == SCTP_ERROR_RSRC_LOW) goto done; } done: asoc->peer.addip_serial++; /* If we are sending a new ASCONF_ACK hold a reference to it in assoc * after freeing the reference to old asconf ack if any. 
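* Holding the extra reference keeps the ACK alive on asconf_ack_list, so the same response can be retransmitted if the peer resends an * ASCONF carrying an already-processed serial number.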
*/ if (asconf_ack) { sctp_chunk_hold(asconf_ack); list_add_tail(&asconf_ack->transmitted_list, &asoc->asconf_ack_list); } return asconf_ack; } /* Process an asconf parameter that is successfully acked. */ static void sctp_asconf_param_success(struct sctp_association *asoc, struct sctp_addip_param *asconf_param) { struct sctp_bind_addr *bp = &asoc->base.bind_addr; union sctp_addr_param *addr_param; struct sctp_sockaddr_entry *saddr; struct sctp_transport *transport; union sctp_addr addr; struct sctp_af *af; addr_param = (void *)asconf_param + sizeof(*asconf_param); /* We have checked the packet before, so we do not check again. */ af = sctp_get_af_specific(param_type2af(addr_param->p.type)); if (!af->from_addr_param(&addr, addr_param, htons(bp->port), 0)) return; switch (asconf_param->param_hdr.type) { case SCTP_PARAM_ADD_IP: /* This is always done in BH context with a socket lock * held, so the list can not change. */ local_bh_disable(); list_for_each_entry(saddr, &bp->address_list, list) { if (sctp_cmp_addr_exact(&saddr->a, &addr)) saddr->state = SCTP_ADDR_SRC; } local_bh_enable(); list_for_each_entry(transport, &asoc->peer.transport_addr_list, transports) { sctp_transport_dst_release(transport); } break; case SCTP_PARAM_DEL_IP: local_bh_disable(); sctp_del_bind_addr(bp, &addr); if (asoc->asconf_addr_del_pending != NULL && sctp_cmp_addr_exact(asoc->asconf_addr_del_pending, &addr)) { kfree(asoc->asconf_addr_del_pending); asoc->asconf_addr_del_pending = NULL; } local_bh_enable(); list_for_each_entry(transport, &asoc->peer.transport_addr_list, transports) { sctp_transport_dst_release(transport); } break; default: break; } } /* Get the corresponding ASCONF response error code from the ASCONF_ACK chunk * for the given asconf parameter. If there is no response for this parameter, * return the error code based on the third argument 'no_err'. * ADDIP 4.1 * A7) If an error response is received for a TLV parameter, all TLVs with no * response before the failed TLV are considered successful if not reported. * All TLVs after the failed response are considered unsuccessful unless a * specific success indication is present for the parameter. */ static __be16 sctp_get_asconf_response(struct sctp_chunk *asconf_ack, struct sctp_addip_param *asconf_param, int no_err) { struct sctp_addip_param *asconf_ack_param; struct sctp_errhdr *err_param; int asconf_ack_len; __be16 err_code; int length; if (no_err) err_code = SCTP_ERROR_NO_ERROR; else err_code = SCTP_ERROR_REQ_REFUSED; asconf_ack_len = ntohs(asconf_ack->chunk_hdr->length) - sizeof(struct sctp_chunkhdr); /* Skip the addiphdr from the asconf_ack chunk and store a pointer to * the first asconf_ack parameter. */ length = sizeof(struct sctp_addiphdr); asconf_ack_param = (struct sctp_addip_param *)(asconf_ack->skb->data + length); asconf_ack_len -= length; while (asconf_ack_len > 0) { if (asconf_ack_param->crr_id == asconf_param->crr_id) { switch (asconf_ack_param->param_hdr.type) { case SCTP_PARAM_SUCCESS_REPORT: return SCTP_ERROR_NO_ERROR; case SCTP_PARAM_ERR_CAUSE: length = sizeof(*asconf_ack_param); err_param = (void *)asconf_ack_param + length; asconf_ack_len -= length; if (asconf_ack_len > 0) return err_param->cause; else return SCTP_ERROR_INV_PARAM; break; default: return SCTP_ERROR_INV_PARAM; } } length = ntohs(asconf_ack_param->param_hdr.length); asconf_ack_param = (void *)asconf_ack_param + length; asconf_ack_len -= length; } return err_code; } /* Process an incoming ASCONF_ACK chunk against the cached last ASCONF chunk.
*/ int sctp_process_asconf_ack(struct sctp_association *asoc, struct sctp_chunk *asconf_ack) { struct sctp_chunk *asconf = asoc->addip_last_asconf; struct sctp_addip_param *asconf_param; __be16 err_code = SCTP_ERROR_NO_ERROR; union sctp_addr_param *addr_param; int asconf_len = asconf->skb->len; int all_param_pass = 0; int length = 0; int no_err = 1; int retval = 0; /* Skip the chunkhdr and addiphdr from the last asconf sent and store * a pointer to address parameter. */ length = sizeof(struct sctp_addip_chunk); addr_param = (union sctp_addr_param *)(asconf->skb->data + length); asconf_len -= length; /* Skip the address parameter in the last asconf sent and store a * pointer to the first asconf parameter. */ length = ntohs(addr_param->p.length); asconf_param = (void *)addr_param + length; asconf_len -= length; /* ADDIP 4.1 * A8) If there is no response(s) to specific TLV parameter(s), and no * failures are indicated, then all request(s) are considered * successful. */ if (asconf_ack->skb->len == sizeof(struct sctp_addiphdr)) all_param_pass = 1; /* Process the TLVs contained in the last sent ASCONF chunk. */ while (asconf_len > 0) { if (all_param_pass) err_code = SCTP_ERROR_NO_ERROR; else { err_code = sctp_get_asconf_response(asconf_ack, asconf_param, no_err); if (no_err && (SCTP_ERROR_NO_ERROR != err_code)) no_err = 0; } switch (err_code) { case SCTP_ERROR_NO_ERROR: sctp_asconf_param_success(asoc, asconf_param); break; case SCTP_ERROR_RSRC_LOW: retval = 1; break; case SCTP_ERROR_UNKNOWN_PARAM: /* Disable sending this type of asconf parameter in * future. */ asoc->peer.addip_disabled_mask |= asconf_param->param_hdr.type; break; case SCTP_ERROR_REQ_REFUSED: case SCTP_ERROR_DEL_LAST_IP: case SCTP_ERROR_DEL_SRC_IP: default: break; } /* Skip the processed asconf parameter and move to the next * one. */ length = ntohs(asconf_param->param_hdr.length); asconf_param = (void *)asconf_param + length; asconf_len -= length; } if (no_err && asoc->src_out_of_asoc_ok) { asoc->src_out_of_asoc_ok = 0; sctp_transport_immediate_rtx(asoc->peer.primary_path); } /* Free the cached last sent asconf chunk. */ list_del_init(&asconf->transmitted_list); sctp_chunk_free(asconf); asoc->addip_last_asconf = NULL; return retval; } /* Make a FWD TSN chunk. 
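* The size hint below, (nstreams + 1) * sizeof(__u32), covers the 4-byte new cumulative TSN header plus one 4-byte stream/SSN skip entry per * stream; e.g. three skipped streams need 16 bytes. A hypothetical caller (names here are illustrative) might build: struct sctp_fwdtsn_skip * skip[1] = { { .stream = htons(2), .ssn = htons(5) } }; and then call sctp_make_fwdtsn(asoc, tsn, 1, skip);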
*/ struct sctp_chunk *sctp_make_fwdtsn(const struct sctp_association *asoc, __u32 new_cum_tsn, size_t nstreams, struct sctp_fwdtsn_skip *skiplist) { struct sctp_chunk *retval = NULL; struct sctp_fwdtsn_hdr ftsn_hdr; struct sctp_fwdtsn_skip skip; size_t hint; int i; hint = (nstreams + 1) * sizeof(__u32); retval = sctp_make_control(asoc, SCTP_CID_FWD_TSN, 0, hint, GFP_ATOMIC); if (!retval) return NULL; ftsn_hdr.new_cum_tsn = htonl(new_cum_tsn); retval->subh.fwdtsn_hdr = sctp_addto_chunk(retval, sizeof(ftsn_hdr), &ftsn_hdr); for (i = 0; i < nstreams; i++) { skip.stream = skiplist[i].stream; skip.ssn = skiplist[i].ssn; sctp_addto_chunk(retval, sizeof(skip), &skip); } return retval; } struct sctp_chunk *sctp_make_ifwdtsn(const struct sctp_association *asoc, __u32 new_cum_tsn, size_t nstreams, struct sctp_ifwdtsn_skip *skiplist) { struct sctp_chunk *retval = NULL; struct sctp_ifwdtsn_hdr ftsn_hdr; size_t hint; hint = (nstreams + 1) * sizeof(__u32); retval = sctp_make_control(asoc, SCTP_CID_I_FWD_TSN, 0, hint, GFP_ATOMIC); if (!retval) return NULL; ftsn_hdr.new_cum_tsn = htonl(new_cum_tsn); retval->subh.ifwdtsn_hdr = sctp_addto_chunk(retval, sizeof(ftsn_hdr), &ftsn_hdr); sctp_addto_chunk(retval, nstreams * sizeof(skiplist[0]), skiplist); return retval; } /* RE-CONFIG 3.1 (RE-CONFIG chunk) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Type = 130 | Chunk Flags | Chunk Length | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * \ \ * / Re-configuration Parameter / * \ \ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * \ \ * / Re-configuration Parameter (optional) / * \ \ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ static struct sctp_chunk *sctp_make_reconf(const struct sctp_association *asoc, int length) { struct sctp_reconf_chunk *reconf; struct sctp_chunk *retval; retval = sctp_make_control(asoc, SCTP_CID_RECONF, 0, length, GFP_ATOMIC); if (!retval) return NULL; reconf = (struct sctp_reconf_chunk *)retval->chunk_hdr; retval->param_hdr.v = (u8 *)(reconf + 1); return retval; } /* RE-CONFIG 4.1 (STREAM OUT RESET) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Parameter Type = 13 | Parameter Length = 16 + 2 * N | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Re-configuration Request Sequence Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Re-configuration Response Sequence Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Sender's Last Assigned TSN | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Stream Number 1 (optional) | Stream Number 2 (optional) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * / ...... 
/ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Stream Number N-1 (optional) | Stream Number N (optional) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * * RE-CONFIG 4.2 (STREAM IN RESET) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Parameter Type = 14 | Parameter Length = 8 + 2 * N | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Re-configuration Request Sequence Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Stream Number 1 (optional) | Stream Number 2 (optional) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * / ...... / * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Stream Number N-1 (optional) | Stream Number N (optional) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ struct sctp_chunk *sctp_make_strreset_req( const struct sctp_association *asoc, __u16 stream_num, __be16 *stream_list, bool out, bool in) { __u16 stream_len = stream_num * sizeof(__u16); struct sctp_strreset_outreq outreq; struct sctp_strreset_inreq inreq; struct sctp_chunk *retval; __u16 outlen, inlen; outlen = (sizeof(outreq) + stream_len) * out; inlen = (sizeof(inreq) + stream_len) * in; retval = sctp_make_reconf(asoc, SCTP_PAD4(outlen) + SCTP_PAD4(inlen)); if (!retval) return NULL; if (outlen) { outreq.param_hdr.type = SCTP_PARAM_RESET_OUT_REQUEST; outreq.param_hdr.length = htons(outlen); outreq.request_seq = htonl(asoc->strreset_outseq); outreq.response_seq = htonl(asoc->strreset_inseq - 1); outreq.send_reset_at_tsn = htonl(asoc->next_tsn - 1); sctp_addto_chunk(retval, sizeof(outreq), &outreq); if (stream_len) sctp_addto_chunk(retval, stream_len, stream_list); } if (inlen) { inreq.param_hdr.type = SCTP_PARAM_RESET_IN_REQUEST; inreq.param_hdr.length = htons(inlen); inreq.request_seq = htonl(asoc->strreset_outseq + out); sctp_addto_chunk(retval, sizeof(inreq), &inreq); if (stream_len) sctp_addto_chunk(retval, stream_len, stream_list); } return retval; } /* RE-CONFIG 4.3 (SSN/TSN RESET ALL) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Parameter Type = 15 | Parameter Length = 8 | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Re-configuration Request Sequence Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ struct sctp_chunk *sctp_make_strreset_tsnreq( const struct sctp_association *asoc) { struct sctp_strreset_tsnreq tsnreq; __u16 length = sizeof(tsnreq); struct sctp_chunk *retval; retval = sctp_make_reconf(asoc, length); if (!retval) return NULL; tsnreq.param_hdr.type = SCTP_PARAM_RESET_TSN_REQUEST; tsnreq.param_hdr.length = htons(length); tsnreq.request_seq = htonl(asoc->strreset_outseq); sctp_addto_chunk(retval, sizeof(tsnreq), &tsnreq); return retval; } /* RE-CONFIG 4.5/4.6 (ADD STREAM) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Parameter Type = 17 | Parameter Length = 12 | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Re-configuration Request Sequence Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Number of new streams | Reserved | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ struct 
sctp_chunk *sctp_make_strreset_addstrm( const struct sctp_association *asoc, __u16 out, __u16 in) { struct sctp_strreset_addstrm addstrm; __u16 size = sizeof(addstrm); struct sctp_chunk *retval; retval = sctp_make_reconf(asoc, (!!out + !!in) * size); if (!retval) return NULL; if (out) { addstrm.param_hdr.type = SCTP_PARAM_RESET_ADD_OUT_STREAMS; addstrm.param_hdr.length = htons(size); addstrm.number_of_streams = htons(out); addstrm.request_seq = htonl(asoc->strreset_outseq); addstrm.reserved = 0; sctp_addto_chunk(retval, size, &addstrm); } if (in) { addstrm.param_hdr.type = SCTP_PARAM_RESET_ADD_IN_STREAMS; addstrm.param_hdr.length = htons(size); addstrm.number_of_streams = htons(in); addstrm.request_seq = htonl(asoc->strreset_outseq + !!out); addstrm.reserved = 0; sctp_addto_chunk(retval, size, &addstrm); } return retval; } /* RE-CONFIG 4.4 (RESP) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Parameter Type = 16 | Parameter Length | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Re-configuration Response Sequence Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Result | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ struct sctp_chunk *sctp_make_strreset_resp(const struct sctp_association *asoc, __u32 result, __u32 sn) { struct sctp_strreset_resp resp; __u16 length = sizeof(resp); struct sctp_chunk *retval; retval = sctp_make_reconf(asoc, length); if (!retval) return NULL; resp.param_hdr.type = SCTP_PARAM_RESET_RESPONSE; resp.param_hdr.length = htons(length); resp.response_seq = htonl(sn); resp.result = htonl(result); sctp_addto_chunk(retval, sizeof(resp), &resp); return retval; } /* RE-CONFIG 4.4 OPTIONAL (TSNRESP) * 0 1 2 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Parameter Type = 16 | Parameter Length | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Re-configuration Response Sequence Number | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Result | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Sender's Next TSN (optional) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Receiver's Next TSN (optional) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ struct sctp_chunk *sctp_make_strreset_tsnresp(struct sctp_association *asoc, __u32 result, __u32 sn, __u32 sender_tsn, __u32 receiver_tsn) { struct sctp_strreset_resptsn tsnresp; __u16 length = sizeof(tsnresp); struct sctp_chunk *retval; retval = sctp_make_reconf(asoc, length); if (!retval) return NULL; tsnresp.param_hdr.type = SCTP_PARAM_RESET_RESPONSE; tsnresp.param_hdr.length = htons(length); tsnresp.response_seq = htonl(sn); tsnresp.result = htonl(result); tsnresp.senders_next_tsn = htonl(sender_tsn); tsnresp.receivers_next_tsn = htonl(receiver_tsn); sctp_addto_chunk(retval, sizeof(tsnresp), &tsnresp); return retval; } bool sctp_verify_reconf(const struct sctp_association *asoc, struct sctp_chunk *chunk, struct sctp_paramhdr **errp) { struct sctp_reconf_chunk *hdr; union sctp_params param; __be16 last = 0; __u16 cnt = 0; hdr = (struct sctp_reconf_chunk *)chunk->chunk_hdr; sctp_walk_params(param, hdr) { __u16 length = ntohs(param.p->length); *errp = param.p; if (cnt++ > 2) return false; switch (param.p->type) { case SCTP_PARAM_RESET_OUT_REQUEST: 
if (length < sizeof(struct sctp_strreset_outreq) || (last && last != SCTP_PARAM_RESET_RESPONSE && last != SCTP_PARAM_RESET_IN_REQUEST)) return false; break; case SCTP_PARAM_RESET_IN_REQUEST: if (length < sizeof(struct sctp_strreset_inreq) || (last && last != SCTP_PARAM_RESET_OUT_REQUEST)) return false; break; case SCTP_PARAM_RESET_RESPONSE: if ((length != sizeof(struct sctp_strreset_resp) && length != sizeof(struct sctp_strreset_resptsn)) || (last && last != SCTP_PARAM_RESET_RESPONSE && last != SCTP_PARAM_RESET_OUT_REQUEST)) return false; break; case SCTP_PARAM_RESET_TSN_REQUEST: if (length != sizeof(struct sctp_strreset_tsnreq) || last) return false; break; case SCTP_PARAM_RESET_ADD_IN_STREAMS: if (length != sizeof(struct sctp_strreset_addstrm) || (last && last != SCTP_PARAM_RESET_ADD_OUT_STREAMS)) return false; break; case SCTP_PARAM_RESET_ADD_OUT_STREAMS: if (length != sizeof(struct sctp_strreset_addstrm) || (last && last != SCTP_PARAM_RESET_ADD_IN_STREAMS)) return false; break; default: return false; } last = param.p->type; } return true; }
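/*
 * A hedged usage sketch (not part of the original file): building one RECONF
 * chunk that resets two outgoing streams and the matching incoming streams.
 * sctp_make_strreset_req() numbers the outgoing-reset parameter with
 * asoc->strreset_outseq and, because "out" is true here, numbers the
 * incoming-reset parameter with strreset_outseq + 1, which matches the
 * ordering rule sctp_verify_reconf() enforces above (an in-request may only
 * follow an out-request).
 */
static struct sctp_chunk *example_reset_both(struct sctp_association *asoc)
{
	__be16 streams[2] = { htons(0), htons(1) };	/* wire byte order */

	return sctp_make_strreset_req(asoc, 2, streams, true, true);
}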
// SPDX-License-Identifier: GPL-2.0+ /* * (C) Copyright Linus Torvalds 1999 * (C) Copyright Johannes Erdfelt 1999-2001 * (C) Copyright Andreas Gal 1999 * (C) Copyright Gregory P. Smith 1999 * (C) Copyright Deti Fliegl 1999 * (C) Copyright Randy Dunlap 2000 * (C) Copyright David Brownell 2000-2002 */ #include <linux/bcd.h> #include <linux/module.h> #include <linux/version.h> #include <linux/kernel.h> #include <linux/sched/task_stack.h> #include <linux/slab.h> #include <linux/completion.h> #include <linux/utsname.h> #include <linux/mm.h> #include <asm/io.h> #include <linux/device.h> #include <linux/dma-mapping.h> #include <linux/mutex.h> #include <asm/irq.h> #include <asm/byteorder.h> #include <linux/unaligned.h> #include <linux/platform_device.h> #include <linux/workqueue.h> #include <linux/pm_runtime.h> #include <linux/types.h> #include <linux/genalloc.h> #include <linux/io.h> #include <linux/kcov.h> #include <linux/phy/phy.h> #include <linux/usb.h> #include <linux/usb/hcd.h> #include <linux/usb/otg.h> #include "usb.h" #include "phy.h" /*-------------------------------------------------------------------------*/ /* * USB Host Controller Driver framework * * Plugs into usbcore (usb_bus) and lets HCDs share code, minimizing * HCD-specific behaviors/bugs. * * This does error checks, tracks devices and urbs, and delegates to a * "hc_driver" only for code (and data) that really needs to know about * hardware differences. That includes root hub registers, i/o queues, * and so on ... but as little else as possible. * * Shared code includes most of the "root hub" code (these are emulated, * though each HC's hardware works differently) and PCI glue, plus request * tracking overhead. The HCD code should only block on spinlocks or on * hardware handshaking; blocking on software events (such as other kernel * threads releasing resources, or completing actions) is all generic. * * It happens that the USB 2.0 spec says this would be invisible inside the * "USBD", and includes mostly a "HCDI" (HCD Interface) along with some APIs * used only by the hub driver ... and that neither should be seen or used by * usb client device drivers. * * Contributors of ideas or unattributed patches include: David Brownell, * Roman Weissgaerber, Rory Bolt, Greg Kroah-Hartman, ... * * HISTORY: * 2002-02-21 Pull in most of the usb_bus support from usb.c; some * associated cleanup. "usb_hcd" still != "usb_bus". * 2001-12-12 Initial patch version for Linux 2.5.1 kernel.
*/ /*-------------------------------------------------------------------------*/ /* Keep track of which host controller drivers are loaded */ unsigned long usb_hcds_loaded; EXPORT_SYMBOL_GPL(usb_hcds_loaded); /* host controllers we manage */ DEFINE_IDR (usb_bus_idr); EXPORT_SYMBOL_GPL (usb_bus_idr); /* used when allocating bus numbers */ #define USB_MAXBUS 64 /* used when updating list of hcds */ DEFINE_MUTEX(usb_bus_idr_lock); /* exported only for usbfs */ EXPORT_SYMBOL_GPL (usb_bus_idr_lock); /* used for controlling access to virtual root hubs */ static DEFINE_SPINLOCK(hcd_root_hub_lock); /* used when updating an endpoint's URB list */ static DEFINE_SPINLOCK(hcd_urb_list_lock); /* used to protect against unlinking URBs after the device is gone */ static DEFINE_SPINLOCK(hcd_urb_unlink_lock); /* wait queue for synchronous unlinks */ DECLARE_WAIT_QUEUE_HEAD(usb_kill_urb_queue); /*-------------------------------------------------------------------------*/ /* * Sharable chunks of root hub code. */ /*-------------------------------------------------------------------------*/ #define KERNEL_REL bin2bcd(LINUX_VERSION_MAJOR) #define KERNEL_VER bin2bcd(LINUX_VERSION_PATCHLEVEL) /* usb 3.1 root hub device descriptor */ static const u8 usb31_rh_dev_descriptor[18] = { 0x12, /* __u8 bLength; */ USB_DT_DEVICE, /* __u8 bDescriptorType; Device */ 0x10, 0x03, /* __le16 bcdUSB; v3.1 */ 0x09, /* __u8 bDeviceClass; HUB_CLASSCODE */ 0x00, /* __u8 bDeviceSubClass; */ 0x03, /* __u8 bDeviceProtocol; USB 3 hub */ 0x09, /* __u8 bMaxPacketSize0; 2^9 = 512 Bytes */ 0x6b, 0x1d, /* __le16 idVendor; Linux Foundation 0x1d6b */ 0x03, 0x00, /* __le16 idProduct; device 0x0003 */ KERNEL_VER, KERNEL_REL, /* __le16 bcdDevice */ 0x03, /* __u8 iManufacturer; */ 0x02, /* __u8 iProduct; */ 0x01, /* __u8 iSerialNumber; */ 0x01 /* __u8 bNumConfigurations; */ }; /* usb 3.0 root hub device descriptor */ static const u8 usb3_rh_dev_descriptor[18] = { 0x12, /* __u8 bLength; */ USB_DT_DEVICE, /* __u8 bDescriptorType; Device */ 0x00, 0x03, /* __le16 bcdUSB; v3.0 */ 0x09, /* __u8 bDeviceClass; HUB_CLASSCODE */ 0x00, /* __u8 bDeviceSubClass; */ 0x03, /* __u8 bDeviceProtocol; USB 3.0 hub */ 0x09, /* __u8 bMaxPacketSize0; 2^9 = 512 Bytes */ 0x6b, 0x1d, /* __le16 idVendor; Linux Foundation 0x1d6b */ 0x03, 0x00, /* __le16 idProduct; device 0x0003 */ KERNEL_VER, KERNEL_REL, /* __le16 bcdDevice */ 0x03, /* __u8 iManufacturer; */ 0x02, /* __u8 iProduct; */ 0x01, /* __u8 iSerialNumber; */ 0x01 /* __u8 bNumConfigurations; */ }; /* usb 2.0 root hub device descriptor */ static const u8 usb2_rh_dev_descriptor[18] = { 0x12, /* __u8 bLength; */ USB_DT_DEVICE, /* __u8 bDescriptorType; Device */ 0x00, 0x02, /* __le16 bcdUSB; v2.0 */ 0x09, /* __u8 bDeviceClass; HUB_CLASSCODE */ 0x00, /* __u8 bDeviceSubClass; */ 0x00, /* __u8 bDeviceProtocol; [ usb 2.0 no TT ] */ 0x40, /* __u8 bMaxPacketSize0; 64 Bytes */ 0x6b, 0x1d, /* __le16 idVendor; Linux Foundation 0x1d6b */ 0x02, 0x00, /* __le16 idProduct; device 0x0002 */ KERNEL_VER, KERNEL_REL, /* __le16 bcdDevice */ 0x03, /* __u8 iManufacturer; */ 0x02, /* __u8 iProduct; */ 0x01, /* __u8 iSerialNumber; */ 0x01 /* __u8 bNumConfigurations; */ }; /* no usb 2.0 root hub "device qualifier" descriptor: one speed only */ /* usb 1.1 root hub device descriptor */ static const u8 usb11_rh_dev_descriptor[18] = { 0x12, /* __u8 bLength; */ USB_DT_DEVICE, /* __u8 bDescriptorType; Device */ 0x10, 0x01, /* __le16 bcdUSB; v1.1 */ 0x09, /* __u8 bDeviceClass; HUB_CLASSCODE */ 0x00, /* __u8 bDeviceSubClass; */ 0x00, /* __u8 
bDeviceProtocol; [ low/full speeds only ] */ 0x40, /* __u8 bMaxPacketSize0; 64 Bytes */ 0x6b, 0x1d, /* __le16 idVendor; Linux Foundation 0x1d6b */ 0x01, 0x00, /* __le16 idProduct; device 0x0001 */ KERNEL_VER, KERNEL_REL, /* __le16 bcdDevice */ 0x03, /* __u8 iManufacturer; */ 0x02, /* __u8 iProduct; */ 0x01, /* __u8 iSerialNumber; */ 0x01 /* __u8 bNumConfigurations; */ }; /*-------------------------------------------------------------------------*/ /* Configuration descriptors for our root hubs */ static const u8 fs_rh_config_descriptor[] = { /* one configuration */ 0x09, /* __u8 bLength; */ USB_DT_CONFIG, /* __u8 bDescriptorType; Configuration */ 0x19, 0x00, /* __le16 wTotalLength; */ 0x01, /* __u8 bNumInterfaces; (1) */ 0x01, /* __u8 bConfigurationValue; */ 0x00, /* __u8 iConfiguration; */ 0xc0, /* __u8 bmAttributes; Bit 7: must be set, 6: Self-powered, 5: Remote wakeup, 4..0: resvd */ 0x00, /* __u8 MaxPower; */ /* USB 1.1: * USB 2.0, single TT organization (mandatory): * one interface, protocol 0 * * USB 2.0, multiple TT organization (optional): * two interfaces, protocols 1 (like single TT) * and 2 (multiple TT mode) ... config is * sometimes settable * NOT IMPLEMENTED */ /* one interface */ 0x09, /* __u8 if_bLength; */ USB_DT_INTERFACE, /* __u8 if_bDescriptorType; Interface */ 0x00, /* __u8 if_bInterfaceNumber; */ 0x00, /* __u8 if_bAlternateSetting; */ 0x01, /* __u8 if_bNumEndpoints; */ 0x09, /* __u8 if_bInterfaceClass; HUB_CLASSCODE */ 0x00, /* __u8 if_bInterfaceSubClass; */ 0x00, /* __u8 if_bInterfaceProtocol; [usb1.1 or single tt] */ 0x00, /* __u8 if_iInterface; */ /* one endpoint (status change endpoint) */ 0x07, /* __u8 ep_bLength; */ USB_DT_ENDPOINT, /* __u8 ep_bDescriptorType; Endpoint */ 0x81, /* __u8 ep_bEndpointAddress; IN Endpoint 1 */ 0x03, /* __u8 ep_bmAttributes; Interrupt */ 0x02, 0x00, /* __le16 ep_wMaxPacketSize; 1 + (MAX_ROOT_PORTS / 8) */ 0xff /* __u8 ep_bInterval; (255ms -- usb 2.0 spec) */ }; static const u8 hs_rh_config_descriptor[] = { /* one configuration */ 0x09, /* __u8 bLength; */ USB_DT_CONFIG, /* __u8 bDescriptorType; Configuration */ 0x19, 0x00, /* __le16 wTotalLength; */ 0x01, /* __u8 bNumInterfaces; (1) */ 0x01, /* __u8 bConfigurationValue; */ 0x00, /* __u8 iConfiguration; */ 0xc0, /* __u8 bmAttributes; Bit 7: must be set, 6: Self-powered, 5: Remote wakeup, 4..0: resvd */ 0x00, /* __u8 MaxPower; */ /* USB 1.1: * USB 2.0, single TT organization (mandatory): * one interface, protocol 0 * * USB 2.0, multiple TT organization (optional): * two interfaces, protocols 1 (like single TT) * and 2 (multiple TT mode) ... config is * sometimes settable * NOT IMPLEMENTED */ /* one interface */ 0x09, /* __u8 if_bLength; */ USB_DT_INTERFACE, /* __u8 if_bDescriptorType; Interface */ 0x00, /* __u8 if_bInterfaceNumber; */ 0x00, /* __u8 if_bAlternateSetting; */ 0x01, /* __u8 if_bNumEndpoints; */ 0x09, /* __u8 if_bInterfaceClass; HUB_CLASSCODE */ 0x00, /* __u8 if_bInterfaceSubClass; */ 0x00, /* __u8 if_bInterfaceProtocol; [usb1.1 or single tt] */ 0x00, /* __u8 if_iInterface; */ /* one endpoint (status change endpoint) */ 0x07, /* __u8 ep_bLength; */ USB_DT_ENDPOINT, /* __u8 ep_bDescriptorType; Endpoint */ 0x81, /* __u8 ep_bEndpointAddress; IN Endpoint 1 */ 0x03, /* __u8 ep_bmAttributes; Interrupt */ /* __le16 ep_wMaxPacketSize; 1 + (MAX_ROOT_PORTS / 8) * see hub.c:hub_configure() for details. 
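 * With USB_MAXCHILDREN = 31, the expression below works out to
 * (31 + 1 + 7) / 8 = 4 bytes: one status-change bit per port plus
 * one for the hub itself, rounded up to whole bytes.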
*/ (USB_MAXCHILDREN + 1 + 7) / 8, 0x00, 0x0c /* __u8 ep_bInterval; (256ms -- usb 2.0 spec) */ }; static const u8 ss_rh_config_descriptor[] = { /* one configuration */ 0x09, /* __u8 bLength; */ USB_DT_CONFIG, /* __u8 bDescriptorType; Configuration */ 0x1f, 0x00, /* __le16 wTotalLength; */ 0x01, /* __u8 bNumInterfaces; (1) */ 0x01, /* __u8 bConfigurationValue; */ 0x00, /* __u8 iConfiguration; */ 0xc0, /* __u8 bmAttributes; Bit 7: must be set, 6: Self-powered, 5: Remote wakeup, 4..0: resvd */ 0x00, /* __u8 MaxPower; */ /* one interface */ 0x09, /* __u8 if_bLength; */ USB_DT_INTERFACE, /* __u8 if_bDescriptorType; Interface */ 0x00, /* __u8 if_bInterfaceNumber; */ 0x00, /* __u8 if_bAlternateSetting; */ 0x01, /* __u8 if_bNumEndpoints; */ 0x09, /* __u8 if_bInterfaceClass; HUB_CLASSCODE */ 0x00, /* __u8 if_bInterfaceSubClass; */ 0x00, /* __u8 if_bInterfaceProtocol; */ 0x00, /* __u8 if_iInterface; */ /* one endpoint (status change endpoint) */ 0x07, /* __u8 ep_bLength; */ USB_DT_ENDPOINT, /* __u8 ep_bDescriptorType; Endpoint */ 0x81, /* __u8 ep_bEndpointAddress; IN Endpoint 1 */ 0x03, /* __u8 ep_bmAttributes; Interrupt */ /* __le16 ep_wMaxPacketSize; 1 + (MAX_ROOT_PORTS / 8) * see hub.c:hub_configure() for details. */ (USB_MAXCHILDREN + 1 + 7) / 8, 0x00, 0x0c, /* __u8 ep_bInterval; (256ms -- usb 2.0 spec) */ /* one SuperSpeed endpoint companion descriptor */ 0x06, /* __u8 ss_bLength */ USB_DT_SS_ENDPOINT_COMP, /* __u8 ss_bDescriptorType; SuperSpeed EP */ /* Companion */ 0x00, /* __u8 ss_bMaxBurst; allows 1 TX between ACKs */ 0x00, /* __u8 ss_bmAttributes; 1 packet per service interval */ 0x02, 0x00 /* __le16 ss_wBytesPerInterval; 15 bits for max 15 ports */ }; /* authorized_default behaviour: * -1 is authorized for all devices (leftover from wireless USB) * 0 is unauthorized for all devices * 1 is authorized for all devices * 2 is authorized for internal devices */ #define USB_AUTHORIZE_WIRED -1 #define USB_AUTHORIZE_NONE 0 #define USB_AUTHORIZE_ALL 1 #define USB_AUTHORIZE_INTERNAL 2 static int authorized_default = CONFIG_USB_DEFAULT_AUTHORIZATION_MODE; module_param(authorized_default, int, S_IRUGO|S_IWUSR); MODULE_PARM_DESC(authorized_default, "Default USB device authorization: 0 is not authorized, 1 is authorized (default), 2 is authorized for internal devices, -1 is authorized (same as 1)"); /*-------------------------------------------------------------------------*/ /** * ascii2desc() - Helper routine for producing UTF-16LE string descriptors * @s: Null-terminated ASCII (actually ISO-8859-1) string * @buf: Buffer for USB string descriptor (header + UTF-16LE) * @len: Length (in bytes; may be odd) of descriptor buffer. * * Return: The number of bytes filled in: 2 + 2*strlen(s) or @len, * whichever is less. * * Note: * USB String descriptors can contain at most 126 characters; input * strings longer than that are truncated. 
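 *
 * For instance, ascii2desc("ab", buf, 64) fills buf with the six bytes
 * { 0x06, USB_DT_STRING, 'a', 0x00, 'b', 0x00 } (a 2-byte header followed
 * by UTF-16LE code units) and returns 6.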
*/ static unsigned ascii2desc(char const *s, u8 *buf, unsigned len) { unsigned n, t = 2 + 2*strlen(s); if (t > 254) t = 254; /* Longest possible UTF string descriptor */ if (len > t) len = t; t += USB_DT_STRING << 8; /* Now t is first 16 bits to store */ n = len; while (n--) { *buf++ = t; if (!n--) break; *buf++ = t >> 8; t = (unsigned char)*s++; } return len; } /** * rh_string() - provides string descriptors for root hub * @id: the string ID number (0: langids, 1: serial #, 2: product, 3: vendor) * @hcd: the host controller for this root hub * @data: buffer for output packet * @len: length of the provided buffer * * Produces either a manufacturer, product or serial number string for the * virtual root hub device. * * Return: The number of bytes filled in: the length of the descriptor or * of the provided buffer, whichever is less. */ static unsigned rh_string(int id, struct usb_hcd const *hcd, u8 *data, unsigned len) { char buf[160]; char const *s; static char const langids[4] = {4, USB_DT_STRING, 0x09, 0x04}; /* language ids */ switch (id) { case 0: /* Array of LANGID codes (0x0409 is MSFT-speak for "en-us") */ /* See http://www.usb.org/developers/docs/USB_LANGIDs.pdf */ if (len > 4) len = 4; memcpy(data, langids, len); return len; case 1: /* Serial number */ s = hcd->self.bus_name; break; case 2: /* Product name */ s = hcd->product_desc; break; case 3: /* Manufacturer */ snprintf (buf, sizeof buf, "%s %s %s", init_utsname()->sysname, init_utsname()->release, hcd->driver->description); s = buf; break; default: /* Can't happen; caller guarantees it */ return 0; } return ascii2desc(s, data, len); } /* Root hub control transfers execute synchronously */ static int rh_call_control (struct usb_hcd *hcd, struct urb *urb) { struct usb_ctrlrequest *cmd; u16 typeReq, wValue, wIndex, wLength; u8 *ubuf = urb->transfer_buffer; unsigned len = 0; int status; u8 patch_wakeup = 0; u8 patch_protocol = 0; u16 tbuf_size; u8 *tbuf = NULL; const u8 *bufp; might_sleep(); spin_lock_irq(&hcd_root_hub_lock); status = usb_hcd_link_urb_to_ep(hcd, urb); spin_unlock_irq(&hcd_root_hub_lock); if (status) return status; urb->hcpriv = hcd; /* Indicate it's queued */ cmd = (struct usb_ctrlrequest *) urb->setup_packet; typeReq = (cmd->bRequestType << 8) | cmd->bRequest; wValue = le16_to_cpu (cmd->wValue); wIndex = le16_to_cpu (cmd->wIndex); wLength = le16_to_cpu (cmd->wLength); if (wLength > urb->transfer_buffer_length) goto error; /* * tbuf should be at least as big as the * USB hub descriptor. */ tbuf_size = max_t(u16, sizeof(struct usb_hub_descriptor), wLength); tbuf = kzalloc(tbuf_size, GFP_KERNEL); if (!tbuf) { status = -ENOMEM; goto err_alloc; } bufp = tbuf; urb->actual_length = 0; switch (typeReq) { /* DEVICE REQUESTS */ /* The root hub's remote wakeup enable bit is implemented using * driver model wakeup flags. If this system supports wakeup * through USB, userspace may change the default "allow wakeup" * policy through sysfs or these calls. * * Most root hubs support wakeup from downstream devices, for * runtime power management (disabling USB clocks and reducing * VBUS power usage). However, not all of them do so; silicon, * board, and BIOS bugs here are not uncommon, so these can't * be treated quite like external hubs. * * Likewise, not all root hubs will pass wakeup events upstream, * to wake up the whole system. So don't assume root hub and * controller capabilities are identical. 
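 *
 * As a concrete reading of the GET_STATUS case below: a root hub whose
 * wakeup is enabled returns the status bytes 0x03 0x00 (self-powered
 * bit 0 plus remote-wakeup bit 1), while one with wakeup disabled
 * returns 0x01 0x00.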
*/ case DeviceRequest | USB_REQ_GET_STATUS: tbuf[0] = (device_may_wakeup(&hcd->self.root_hub->dev) << USB_DEVICE_REMOTE_WAKEUP) | (1 << USB_DEVICE_SELF_POWERED); tbuf[1] = 0; len = 2; break; case DeviceOutRequest | USB_REQ_CLEAR_FEATURE: if (wValue == USB_DEVICE_REMOTE_WAKEUP) device_set_wakeup_enable(&hcd->self.root_hub->dev, 0); else goto error; break; case DeviceOutRequest | USB_REQ_SET_FEATURE: if (device_can_wakeup(&hcd->self.root_hub->dev) && wValue == USB_DEVICE_REMOTE_WAKEUP) device_set_wakeup_enable(&hcd->self.root_hub->dev, 1); else goto error; break; case DeviceRequest | USB_REQ_GET_CONFIGURATION: tbuf[0] = 1; len = 1; fallthrough; case DeviceOutRequest | USB_REQ_SET_CONFIGURATION: break; case DeviceRequest | USB_REQ_GET_DESCRIPTOR: switch (wValue & 0xff00) { case USB_DT_DEVICE << 8: switch (hcd->speed) { case HCD_USB32: case HCD_USB31: bufp = usb31_rh_dev_descriptor; break; case HCD_USB3: bufp = usb3_rh_dev_descriptor; break; case HCD_USB2: bufp = usb2_rh_dev_descriptor; break; case HCD_USB11: bufp = usb11_rh_dev_descriptor; break; default: goto error; } len = 18; if (hcd->has_tt) patch_protocol = 1; break; case USB_DT_CONFIG << 8: switch (hcd->speed) { case HCD_USB32: case HCD_USB31: case HCD_USB3: bufp = ss_rh_config_descriptor; len = sizeof ss_rh_config_descriptor; break; case HCD_USB2: bufp = hs_rh_config_descriptor; len = sizeof hs_rh_config_descriptor; break; case HCD_USB11: bufp = fs_rh_config_descriptor; len = sizeof fs_rh_config_descriptor; break; default: goto error; } if (device_can_wakeup(&hcd->self.root_hub->dev)) patch_wakeup = 1; break; case USB_DT_STRING << 8: if ((wValue & 0xff) < 4) urb->actual_length = rh_string(wValue & 0xff, hcd, ubuf, wLength); else /* unsupported IDs --> "protocol stall" */ goto error; break; case USB_DT_BOS << 8: goto nongeneric; default: goto error; } break; case DeviceRequest | USB_REQ_GET_INTERFACE: tbuf[0] = 0; len = 1; fallthrough; case DeviceOutRequest | USB_REQ_SET_INTERFACE: break; case DeviceOutRequest | USB_REQ_SET_ADDRESS: /* wValue == urb->dev->devaddr */ dev_dbg (hcd->self.controller, "root hub device address %d\n", wValue); break; /* INTERFACE REQUESTS (no defined feature/status flags) */ /* ENDPOINT REQUESTS */ case EndpointRequest | USB_REQ_GET_STATUS: /* ENDPOINT_HALT flag */ tbuf[0] = 0; tbuf[1] = 0; len = 2; fallthrough; case EndpointOutRequest | USB_REQ_CLEAR_FEATURE: case EndpointOutRequest | USB_REQ_SET_FEATURE: dev_dbg (hcd->self.controller, "no endpoint features yet\n"); break; /* CLASS REQUESTS (and errors) */ default: nongeneric: /* non-generic request */ switch (typeReq) { case GetHubStatus: len = 4; break; case GetPortStatus: if (wValue == HUB_PORT_STATUS) len = 4; else /* other port status types return 8 bytes */ len = 8; break; case GetHubDescriptor: len = sizeof (struct usb_hub_descriptor); break; case DeviceRequest | USB_REQ_GET_DESCRIPTOR: /* len is returned by hub_control */ break; } status = hcd->driver->hub_control (hcd, typeReq, wValue, wIndex, tbuf, wLength); if (typeReq == GetHubDescriptor) usb_hub_adjust_deviceremovable(hcd->self.root_hub, (struct usb_hub_descriptor *)tbuf); break; error: /* "protocol stall" on error */ status = -EPIPE; } if (status < 0) { len = 0; if (status != -EPIPE) { dev_dbg (hcd->self.controller, "CTRL: TypeReq=0x%x val=0x%x " "idx=0x%x len=%d ==> %d\n", typeReq, wValue, wIndex, wLength, status); } } else if (status > 0) { /* hub_control may return the length of data copied. 
*/ len = status; status = 0; } if (len) { if (urb->transfer_buffer_length < len) len = urb->transfer_buffer_length; urb->actual_length = len; /* always USB_DIR_IN, toward host */ memcpy (ubuf, bufp, len); /* report whether RH hardware supports remote wakeup */ if (patch_wakeup && len > offsetof (struct usb_config_descriptor, bmAttributes)) ((struct usb_config_descriptor *)ubuf)->bmAttributes |= USB_CONFIG_ATT_WAKEUP; /* report whether RH hardware has an integrated TT */ if (patch_protocol && len > offsetof(struct usb_device_descriptor, bDeviceProtocol)) ((struct usb_device_descriptor *) ubuf)-> bDeviceProtocol = USB_HUB_PR_HS_SINGLE_TT; } kfree(tbuf); err_alloc: /* any errors get returned through the urb completion */ spin_lock_irq(&hcd_root_hub_lock); usb_hcd_unlink_urb_from_ep(hcd, urb); usb_hcd_giveback_urb(hcd, urb, status); spin_unlock_irq(&hcd_root_hub_lock); return 0; } /*-------------------------------------------------------------------------*/ /* * Root Hub interrupt transfers are polled using a timer if the * driver requests it; otherwise the driver is responsible for * calling usb_hcd_poll_rh_status() when an event occurs. * * Completion handler may not sleep. See usb_hcd_giveback_urb() for details. */ void usb_hcd_poll_rh_status(struct usb_hcd *hcd) { struct urb *urb; int length; int status; unsigned long flags; char buffer[6]; /* Any root hubs with > 31 ports? */ if (unlikely(!hcd->rh_pollable)) return; if (!hcd->uses_new_polling && !hcd->status_urb) return; length = hcd->driver->hub_status_data(hcd, buffer); if (length > 0) { /* try to complete the status urb */ spin_lock_irqsave(&hcd_root_hub_lock, flags); urb = hcd->status_urb; if (urb) { clear_bit(HCD_FLAG_POLL_PENDING, &hcd->flags); hcd->status_urb = NULL; if (urb->transfer_buffer_length >= length) { status = 0; } else { status = -EOVERFLOW; length = urb->transfer_buffer_length; } urb->actual_length = length; memcpy(urb->transfer_buffer, buffer, length); usb_hcd_unlink_urb_from_ep(hcd, urb); usb_hcd_giveback_urb(hcd, urb, status); } else { length = 0; set_bit(HCD_FLAG_POLL_PENDING, &hcd->flags); } spin_unlock_irqrestore(&hcd_root_hub_lock, flags); } /* The USB 2.0 spec says 256 ms. This is close enough and won't * exceed that limit if HZ is 100. The math is clunkier than one might * expect; it makes sure that all timers for USB devices fire at the * same time, giving the CPU a break in between. */ if (hcd->uses_new_polling ?
HCD_POLL_RH(hcd) : (length == 0 && hcd->status_urb != NULL)) mod_timer (&hcd->rh_timer, (jiffies/(HZ/4) + 1) * (HZ/4)); } EXPORT_SYMBOL_GPL(usb_hcd_poll_rh_status); /* timer callback */ static void rh_timer_func (struct timer_list *t) { struct usb_hcd *_hcd = from_timer(_hcd, t, rh_timer); usb_hcd_poll_rh_status(_hcd); } /*-------------------------------------------------------------------------*/ static int rh_queue_status (struct usb_hcd *hcd, struct urb *urb) { int retval; unsigned long flags; unsigned len = 1 + (urb->dev->maxchild / 8); spin_lock_irqsave (&hcd_root_hub_lock, flags); if (hcd->status_urb || urb->transfer_buffer_length < len) { dev_dbg (hcd->self.controller, "not queuing rh status urb\n"); retval = -EINVAL; goto done; } retval = usb_hcd_link_urb_to_ep(hcd, urb); if (retval) goto done; hcd->status_urb = urb; urb->hcpriv = hcd; /* indicate it's queued */ if (!hcd->uses_new_polling) mod_timer(&hcd->rh_timer, (jiffies/(HZ/4) + 1) * (HZ/4)); /* If a status change has already occurred, report it ASAP */ else if (HCD_POLL_PENDING(hcd)) mod_timer(&hcd->rh_timer, jiffies); retval = 0; done: spin_unlock_irqrestore (&hcd_root_hub_lock, flags); return retval; } static int rh_urb_enqueue (struct usb_hcd *hcd, struct urb *urb) { if (usb_endpoint_xfer_int(&urb->ep->desc)) return rh_queue_status (hcd, urb); if (usb_endpoint_xfer_control(&urb->ep->desc)) return rh_call_control (hcd, urb); return -EINVAL; } /*-------------------------------------------------------------------------*/ /* Unlinks of root-hub control URBs are legal, but they don't do anything * since these URBs always execute synchronously. */ static int usb_rh_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) { unsigned long flags; int rc; spin_lock_irqsave(&hcd_root_hub_lock, flags); rc = usb_hcd_check_unlink_urb(hcd, urb, status); if (rc) goto done; if (usb_endpoint_num(&urb->ep->desc) == 0) { /* Control URB */ ; /* Do nothing */ } else { /* Status URB */ if (!hcd->uses_new_polling) del_timer (&hcd->rh_timer); if (urb == hcd->status_urb) { hcd->status_urb = NULL; usb_hcd_unlink_urb_from_ep(hcd, urb); usb_hcd_giveback_urb(hcd, urb, status); } } done: spin_unlock_irqrestore(&hcd_root_hub_lock, flags); return rc; } /*-------------------------------------------------------------------------*/ /** * usb_bus_init - shared initialization code * @bus: the bus structure being initialized * * This code is used to initialize a usb_bus structure, memory for which is * separately managed. */ static void usb_bus_init (struct usb_bus *bus) { memset(&bus->devmap, 0, sizeof(bus->devmap)); bus->devnum_next = 1; bus->root_hub = NULL; bus->busnum = -1; bus->bandwidth_allocated = 0; bus->bandwidth_int_reqs = 0; bus->bandwidth_isoc_reqs = 0; mutex_init(&bus->devnum_next_mutex); } /*-------------------------------------------------------------------------*/ /** * usb_register_bus - registers the USB host controller with the usb core * @bus: pointer to the bus to register * * Context: task context, might sleep. * * Assigns a bus number, and links the controller into usbcore data * structures so that it can be seen by scanning the bus list. * * Return: 0 if successful. A negative error code otherwise. 
*/ static int usb_register_bus(struct usb_bus *bus) { int result = -E2BIG; int busnum; mutex_lock(&usb_bus_idr_lock); busnum = idr_alloc(&usb_bus_idr, bus, 1, USB_MAXBUS, GFP_KERNEL); if (busnum < 0) { pr_err("%s: failed to get bus number\n", usbcore_name); goto error_find_busnum; } bus->busnum = busnum; mutex_unlock(&usb_bus_idr_lock); usb_notify_add_bus(bus); dev_info (bus->controller, "new USB bus registered, assigned bus " "number %d\n", bus->busnum); return 0; error_find_busnum: mutex_unlock(&usb_bus_idr_lock); return result; } /** * usb_deregister_bus - deregisters the USB host controller * @bus: pointer to the bus to deregister * * Context: task context, might sleep. * * Recycles the bus number, and unlinks the controller from usbcore data * structures so that it won't be seen by scanning the bus list. */ static void usb_deregister_bus (struct usb_bus *bus) { dev_info (bus->controller, "USB bus %d deregistered\n", bus->busnum); /* * NOTE: make sure that all the devices are removed by the * controller code, as well as having it call this when cleaning * itself up */ mutex_lock(&usb_bus_idr_lock); idr_remove(&usb_bus_idr, bus->busnum); mutex_unlock(&usb_bus_idr_lock); usb_notify_remove_bus(bus); } /** * register_root_hub - called by usb_add_hcd() to register a root hub * @hcd: host controller for this root hub * * This function registers the root hub with the USB subsystem. It sets up * the device properly in the device tree and then calls usb_new_device() * to register the usb device. It also assigns the root hub's USB address * (always 1). * * Return: 0 if successful. A negative error code otherwise. */ static int register_root_hub(struct usb_hcd *hcd) { struct device *parent_dev = hcd->self.controller; struct usb_device *usb_dev = hcd->self.root_hub; struct usb_device_descriptor *descr; const int devnum = 1; int retval; usb_dev->devnum = devnum; usb_dev->bus->devnum_next = devnum + 1; set_bit(devnum, usb_dev->bus->devmap); usb_set_device_state(usb_dev, USB_STATE_ADDRESS); mutex_lock(&usb_bus_idr_lock); usb_dev->ep0.desc.wMaxPacketSize = cpu_to_le16(64); descr = usb_get_device_descriptor(usb_dev); if (IS_ERR(descr)) { retval = PTR_ERR(descr); mutex_unlock(&usb_bus_idr_lock); dev_dbg (parent_dev, "can't read %s device descriptor %d\n", dev_name(&usb_dev->dev), retval); return retval; } usb_dev->descriptor = *descr; kfree(descr); if (le16_to_cpu(usb_dev->descriptor.bcdUSB) >= 0x0201) { retval = usb_get_bos_descriptor(usb_dev); if (!retval) { usb_dev->lpm_capable = usb_device_supports_lpm(usb_dev); } else if (usb_dev->speed >= USB_SPEED_SUPER) { mutex_unlock(&usb_bus_idr_lock); dev_dbg(parent_dev, "can't read %s bos descriptor %d\n", dev_name(&usb_dev->dev), retval); return retval; } } retval = usb_new_device (usb_dev); if (retval) { dev_err (parent_dev, "can't register root hub for %s, %d\n", dev_name(&usb_dev->dev), retval); } else { spin_lock_irq (&hcd_root_hub_lock); hcd->rh_registered = 1; spin_unlock_irq (&hcd_root_hub_lock); /* Did the HC die before the root hub was registered? */ if (HCD_DEAD(hcd)) usb_hc_died (hcd); /* This time clean up */ } mutex_unlock(&usb_bus_idr_lock); return retval; } /* * usb_hcd_start_port_resume - a root-hub port is sending a resume signal * @bus: the bus which the root hub belongs to * @portnum: the port which is being resumed * * HCDs should call this function when they know that a resume signal is * being sent to a root-hub port. The root hub will be prevented from * going into autosuspend until usb_hcd_end_port_resume() is called. 
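 *
 * A typical (illustrative) HCD pattern pairs the two calls around the
 * resume signaling window:
 *
 *	usb_hcd_start_port_resume(&hcd->self, portnum);
 *	... drive resume signaling for at least 20 ms ...
 *	usb_hcd_end_port_resume(&hcd->self, portnum);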
* * The bus's private lock must be held by the caller. */ void usb_hcd_start_port_resume(struct usb_bus *bus, int portnum) { unsigned bit = 1 << portnum; if (!(bus->resuming_ports & bit)) { bus->resuming_ports |= bit; pm_runtime_get_noresume(&bus->root_hub->dev); } } EXPORT_SYMBOL_GPL(usb_hcd_start_port_resume); /* * usb_hcd_end_port_resume - a root-hub port has stopped sending a resume signal * @bus: the bus which the root hub belongs to * @portnum: the port which is being resumed * * HCDs should call this function when they know that a resume signal has * stopped being sent to a root-hub port. The root hub will be allowed to * autosuspend again. * * The bus's private lock must be held by the caller. */ void usb_hcd_end_port_resume(struct usb_bus *bus, int portnum) { unsigned bit = 1 << portnum; if (bus->resuming_ports & bit) { bus->resuming_ports &= ~bit; pm_runtime_put_noidle(&bus->root_hub->dev); } } EXPORT_SYMBOL_GPL(usb_hcd_end_port_resume); /*-------------------------------------------------------------------------*/ /** * usb_calc_bus_time - approximate periodic transaction time in nanoseconds * @speed: from dev->speed; USB_SPEED_{LOW,FULL,HIGH} * @is_input: true iff the transaction sends data to the host * @isoc: true for isochronous transactions, false for interrupt ones * @bytecount: how many bytes in the transaction. * * Return: Approximate bus time in nanoseconds for a periodic transaction. * * Note: * See USB 2.0 spec section 5.11.3; only periodic transfers need to be * scheduled in software, this function is only used for such scheduling. */ long usb_calc_bus_time (int speed, int is_input, int isoc, int bytecount) { unsigned long tmp; switch (speed) { case USB_SPEED_LOW: /* INTR only */ if (is_input) { tmp = (67667L * (31L + 10L * BitTime (bytecount))) / 1000L; return 64060L + (2 * BW_HUB_LS_SETUP) + BW_HOST_DELAY + tmp; } else { tmp = (66700L * (31L + 10L * BitTime (bytecount))) / 1000L; return 64107L + (2 * BW_HUB_LS_SETUP) + BW_HOST_DELAY + tmp; } case USB_SPEED_FULL: /* ISOC or INTR */ if (isoc) { tmp = (8354L * (31L + 10L * BitTime (bytecount))) / 1000L; return ((is_input) ? 7268L : 6265L) + BW_HOST_DELAY + tmp; } else { tmp = (8354L * (31L + 10L * BitTime (bytecount))) / 1000L; return 9107L + BW_HOST_DELAY + tmp; } case USB_SPEED_HIGH: /* ISOC or INTR */ /* FIXME adjust for input vs output */ if (isoc) tmp = HS_NSECS_ISO (bytecount); else tmp = HS_NSECS (bytecount); return tmp; default: pr_debug ("%s: bogus device speed!\n", usbcore_name); return -1; } } EXPORT_SYMBOL_GPL(usb_calc_bus_time); /*-------------------------------------------------------------------------*/ /* * Generic HC operations. */ /*-------------------------------------------------------------------------*/ /** * usb_hcd_link_urb_to_ep - add an URB to its endpoint queue * @hcd: host controller to which @urb was submitted * @urb: URB being submitted * * Host controller drivers should call this routine in their enqueue() * method. The HCD's private spinlock must be held and interrupts must * be disabled. The actions carried out here are required for URB * submission, as well as for endpoint shutdown and for usb_kill_urb. * * Return: 0 for no error, otherwise a negative error code (in which case * the enqueue() method must fail). If no error occurs but enqueue() fails * anyway, it must call usb_hcd_unlink_urb_from_ep() before releasing * the private spinlock and returning. 
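 *
 * A skeletal enqueue() following that contract might look like this (the
 * private lock and the queue_to_hardware() helper are hypothetical):
 *
 *	spin_lock_irqsave(&priv->lock, flags);
 *	rc = usb_hcd_link_urb_to_ep(hcd, urb);
 *	if (!rc) {
 *		rc = queue_to_hardware(priv, urb);
 *		if (rc)
 *			usb_hcd_unlink_urb_from_ep(hcd, urb);
 *	}
 *	spin_unlock_irqrestore(&priv->lock, flags);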
*/ int usb_hcd_link_urb_to_ep(struct usb_hcd *hcd, struct urb *urb) { int rc = 0; spin_lock(&hcd_urb_list_lock); /* Check that the URB isn't being killed */ if (unlikely(atomic_read(&urb->reject))) { rc = -EPERM; goto done; } if (unlikely(!urb->ep->enabled)) { rc = -ENOENT; goto done; } if (unlikely(!urb->dev->can_submit)) { rc = -EHOSTUNREACH; goto done; } /* * Check the host controller's state and add the URB to the * endpoint's queue. */ if (HCD_RH_RUNNING(hcd)) { urb->unlinked = 0; list_add_tail(&urb->urb_list, &urb->ep->urb_list); } else { rc = -ESHUTDOWN; goto done; } done: spin_unlock(&hcd_urb_list_lock); return rc; } EXPORT_SYMBOL_GPL(usb_hcd_link_urb_to_ep); /** * usb_hcd_check_unlink_urb - check whether an URB may be unlinked * @hcd: host controller to which @urb was submitted * @urb: URB being checked for unlinkability * @status: error code to store in @urb if the unlink succeeds * * Host controller drivers should call this routine in their dequeue() * method. The HCD's private spinlock must be held and interrupts must * be disabled. The actions carried out here are required for making * sure that an unlink is valid. * * Return: 0 for no error, otherwise a negative error code (in which case * the dequeue() method must fail). The possible error codes are: * * -EIDRM: @urb was not submitted or has already completed. * The completion function may not have been called yet. * * -EBUSY: @urb has already been unlinked. */ int usb_hcd_check_unlink_urb(struct usb_hcd *hcd, struct urb *urb, int status) { struct list_head *tmp; /* insist the urb is still queued */ list_for_each(tmp, &urb->ep->urb_list) { if (tmp == &urb->urb_list) break; } if (tmp != &urb->urb_list) return -EIDRM; /* Any status except -EINPROGRESS means something already started to * unlink this URB from the hardware. So there's no more work to do. */ if (urb->unlinked) return -EBUSY; urb->unlinked = status; return 0; } EXPORT_SYMBOL_GPL(usb_hcd_check_unlink_urb); /** * usb_hcd_unlink_urb_from_ep - remove an URB from its endpoint queue * @hcd: host controller to which @urb was submitted * @urb: URB being unlinked * * Host controller drivers should call this routine before calling * usb_hcd_giveback_urb(). The HCD's private spinlock must be held and * interrupts must be disabled. The actions carried out here are required * for URB completion. */ void usb_hcd_unlink_urb_from_ep(struct usb_hcd *hcd, struct urb *urb) { /* clear all state linking urb to this dev (and hcd) */ spin_lock(&hcd_urb_list_lock); list_del_init(&urb->urb_list); spin_unlock(&hcd_urb_list_lock); } EXPORT_SYMBOL_GPL(usb_hcd_unlink_urb_from_ep); /* * Some usb host controllers can only perform dma using a small SRAM area, * or have restrictions on addressable DRAM. * The usb core itself is however optimized for host controllers that can dma * using regular system memory - like pci devices doing bus mastering. * * To support host controllers with limited dma capabilities we provide dma * bounce buffers. This feature can be enabled by initializing * hcd->localmem_pool using usb_hcd_setup_local_mem(). * * The initialized hcd->localmem_pool then tells the usb code to allocate all * data for dma using the genalloc API. * * So, to summarize... * * - We need "local" memory, canonical example being * a small SRAM on a discrete controller being the * only memory that the controller can read ...
* (a) "normal" kernel memory is no good, and * (b) there's not enough to share * * - So we use that, even though the primary requirement * is that the memory be "local" (hence addressable * by that device), not "coherent". * */ static int hcd_alloc_coherent(struct usb_bus *bus, gfp_t mem_flags, dma_addr_t *dma_handle, void **vaddr_handle, size_t size, enum dma_data_direction dir) { unsigned char *vaddr; if (*vaddr_handle == NULL) { WARN_ON_ONCE(1); return -EFAULT; } vaddr = hcd_buffer_alloc(bus, size + sizeof(unsigned long), mem_flags, dma_handle); if (!vaddr) return -ENOMEM; /* * Store the virtual address of the buffer at the end * of the allocated dma buffer. The size of the buffer * may be uneven so use unaligned functions instead * of just rounding up. It makes sense to optimize for * memory footprint over access speed since the amount * of memory available for dma may be limited. */ put_unaligned((unsigned long)*vaddr_handle, (unsigned long *)(vaddr + size)); if (dir == DMA_TO_DEVICE) memcpy(vaddr, *vaddr_handle, size); *vaddr_handle = vaddr; return 0; } static void hcd_free_coherent(struct usb_bus *bus, dma_addr_t *dma_handle, void **vaddr_handle, size_t size, enum dma_data_direction dir) { unsigned char *vaddr = *vaddr_handle; vaddr = (void *)get_unaligned((unsigned long *)(vaddr + size)); if (dir == DMA_FROM_DEVICE) memcpy(vaddr, *vaddr_handle, size); hcd_buffer_free(bus, size + sizeof(vaddr), *vaddr_handle, *dma_handle); *vaddr_handle = vaddr; *dma_handle = 0; } void usb_hcd_unmap_urb_setup_for_dma(struct usb_hcd *hcd, struct urb *urb) { if (IS_ENABLED(CONFIG_HAS_DMA) && (urb->transfer_flags & URB_SETUP_MAP_SINGLE)) dma_unmap_single(hcd->self.sysdev, urb->setup_dma, sizeof(struct usb_ctrlrequest), DMA_TO_DEVICE); else if (urb->transfer_flags & URB_SETUP_MAP_LOCAL) hcd_free_coherent(urb->dev->bus, &urb->setup_dma, (void **) &urb->setup_packet, sizeof(struct usb_ctrlrequest), DMA_TO_DEVICE); /* Make it safe to call this routine more than once */ urb->transfer_flags &= ~(URB_SETUP_MAP_SINGLE | URB_SETUP_MAP_LOCAL); } EXPORT_SYMBOL_GPL(usb_hcd_unmap_urb_setup_for_dma); static void unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb) { if (hcd->driver->unmap_urb_for_dma) hcd->driver->unmap_urb_for_dma(hcd, urb); else usb_hcd_unmap_urb_for_dma(hcd, urb); } void usb_hcd_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb) { enum dma_data_direction dir; usb_hcd_unmap_urb_setup_for_dma(hcd, urb); dir = usb_urb_dir_in(urb) ? 
DMA_FROM_DEVICE : DMA_TO_DEVICE; if (IS_ENABLED(CONFIG_HAS_DMA) && (urb->transfer_flags & URB_DMA_MAP_SG)) dma_unmap_sg(hcd->self.sysdev, urb->sg, urb->num_sgs, dir); else if (IS_ENABLED(CONFIG_HAS_DMA) && (urb->transfer_flags & URB_DMA_MAP_PAGE)) dma_unmap_page(hcd->self.sysdev, urb->transfer_dma, urb->transfer_buffer_length, dir); else if (IS_ENABLED(CONFIG_HAS_DMA) && (urb->transfer_flags & URB_DMA_MAP_SINGLE)) dma_unmap_single(hcd->self.sysdev, urb->transfer_dma, urb->transfer_buffer_length, dir); else if (urb->transfer_flags & URB_MAP_LOCAL) hcd_free_coherent(urb->dev->bus, &urb->transfer_dma, &urb->transfer_buffer, urb->transfer_buffer_length, dir); /* Make it safe to call this routine more than once */ urb->transfer_flags &= ~(URB_DMA_MAP_SG | URB_DMA_MAP_PAGE | URB_DMA_MAP_SINGLE | URB_MAP_LOCAL); } EXPORT_SYMBOL_GPL(usb_hcd_unmap_urb_for_dma); static int map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flags) { if (hcd->driver->map_urb_for_dma) return hcd->driver->map_urb_for_dma(hcd, urb, mem_flags); else return usb_hcd_map_urb_for_dma(hcd, urb, mem_flags); } int usb_hcd_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flags) { enum dma_data_direction dir; int ret = 0; /* Map the URB's buffers for DMA access. * Lower level HCD code should use *_dma exclusively, * unless it uses pio or talks to another transport, * or uses the provided scatter gather list for bulk. */ if (usb_endpoint_xfer_control(&urb->ep->desc)) { if (hcd->self.uses_pio_for_control) return ret; if (hcd->localmem_pool) { ret = hcd_alloc_coherent( urb->dev->bus, mem_flags, &urb->setup_dma, (void **)&urb->setup_packet, sizeof(struct usb_ctrlrequest), DMA_TO_DEVICE); if (ret) return ret; urb->transfer_flags |= URB_SETUP_MAP_LOCAL; } else if (hcd_uses_dma(hcd)) { if (object_is_on_stack(urb->setup_packet)) { WARN_ONCE(1, "setup packet is on stack\n"); return -EAGAIN; } urb->setup_dma = dma_map_single( hcd->self.sysdev, urb->setup_packet, sizeof(struct usb_ctrlrequest), DMA_TO_DEVICE); if (dma_mapping_error(hcd->self.sysdev, urb->setup_dma)) return -EAGAIN; urb->transfer_flags |= URB_SETUP_MAP_SINGLE; } } dir = usb_urb_dir_in(urb) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; if (urb->transfer_buffer_length != 0 && !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP)) { if (hcd->localmem_pool) { ret = hcd_alloc_coherent( urb->dev->bus, mem_flags, &urb->transfer_dma, &urb->transfer_buffer, urb->transfer_buffer_length, dir); if (ret == 0) urb->transfer_flags |= URB_MAP_LOCAL; } else if (hcd_uses_dma(hcd)) { if (urb->num_sgs) { int n; /* We don't support sg for isoc transfers ! 
*/ if (usb_endpoint_xfer_isoc(&urb->ep->desc)) { WARN_ON(1); return -EINVAL; } n = dma_map_sg( hcd->self.sysdev, urb->sg, urb->num_sgs, dir); if (!n) ret = -EAGAIN; else urb->transfer_flags |= URB_DMA_MAP_SG; urb->num_mapped_sgs = n; if (n != urb->num_sgs) urb->transfer_flags |= URB_DMA_SG_COMBINED; } else if (urb->sg) { struct scatterlist *sg = urb->sg; urb->transfer_dma = dma_map_page( hcd->self.sysdev, sg_page(sg), sg->offset, urb->transfer_buffer_length, dir); if (dma_mapping_error(hcd->self.sysdev, urb->transfer_dma)) ret = -EAGAIN; else urb->transfer_flags |= URB_DMA_MAP_PAGE; } else if (object_is_on_stack(urb->transfer_buffer)) { WARN_ONCE(1, "transfer buffer is on stack\n"); ret = -EAGAIN; } else { urb->transfer_dma = dma_map_single( hcd->self.sysdev, urb->transfer_buffer, urb->transfer_buffer_length, dir); if (dma_mapping_error(hcd->self.sysdev, urb->transfer_dma)) ret = -EAGAIN; else urb->transfer_flags |= URB_DMA_MAP_SINGLE; } } if (ret && (urb->transfer_flags & (URB_SETUP_MAP_SINGLE | URB_SETUP_MAP_LOCAL))) usb_hcd_unmap_urb_for_dma(hcd, urb); } return ret; } EXPORT_SYMBOL_GPL(usb_hcd_map_urb_for_dma); /*-------------------------------------------------------------------------*/ /* may be called in any context with a valid urb->dev usecount * caller surrenders "ownership" of urb * expects usb_submit_urb() to have sanity checked and conditioned all * inputs in the urb */ int usb_hcd_submit_urb (struct urb *urb, gfp_t mem_flags) { int status; struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus); /* increment urb's reference count as part of giving it to the HCD * (which will control it). HCD guarantees that it either returns * an error or calls giveback(), but not both. */ usb_get_urb(urb); atomic_inc(&urb->use_count); atomic_inc(&urb->dev->urbnum); usbmon_urb_submit(&hcd->self, urb); /* NOTE requirements on root-hub callers (usbfs and the hub * driver, for now): URBs' urb->transfer_buffer must be * valid and usb_buffer_{sync,unmap}() not be needed, since * they could clobber root hub response data. Also, control * URBs must be submitted in process context with interrupts * enabled. */ if (is_root_hub(urb->dev)) { status = rh_urb_enqueue(hcd, urb); } else { status = map_urb_for_dma(hcd, urb, mem_flags); if (likely(status == 0)) { status = hcd->driver->urb_enqueue(hcd, urb, mem_flags); if (unlikely(status)) unmap_urb_for_dma(hcd, urb); } } if (unlikely(status)) { usbmon_urb_submit_error(&hcd->self, urb, status); urb->hcpriv = NULL; INIT_LIST_HEAD(&urb->urb_list); atomic_dec(&urb->use_count); /* * Order the write of urb->use_count above before the read * of urb->reject below. Pairs with the memory barriers in * usb_kill_urb() and usb_poison_urb(). */ smp_mb__after_atomic(); atomic_dec(&urb->dev->urbnum); if (atomic_read(&urb->reject)) wake_up(&usb_kill_urb_queue); usb_put_urb(urb); } return status; } /*-------------------------------------------------------------------------*/ /* this makes the hcd giveback() the urb more quickly, by kicking it * off hardware queues (which may take a while) and returning it as * soon as practical. we've already set up the urb's return status, * but we can't know if the callback completed already. */ static int unlink1(struct usb_hcd *hcd, struct urb *urb, int status) { int value; if (is_root_hub(urb->dev)) value = usb_rh_urb_dequeue(hcd, urb, status); else { /* The only reason an HCD might fail this call is if * it has not yet fully queued the urb to begin with. * Such failures should be harmless. 
*/ value = hcd->driver->urb_dequeue(hcd, urb, status); } return value; } /* * called in any context * * caller guarantees urb won't be recycled till both unlink() * and the urb's completion function return */ int usb_hcd_unlink_urb (struct urb *urb, int status) { struct usb_hcd *hcd; struct usb_device *udev = urb->dev; int retval = -EIDRM; unsigned long flags; /* Prevent the device and bus from going away while * the unlink is carried out. If they are already gone * then urb->use_count must be 0, since disconnected * devices can't have any active URBs. */ spin_lock_irqsave(&hcd_urb_unlink_lock, flags); if (atomic_read(&urb->use_count) > 0) { retval = 0; usb_get_dev(udev); } spin_unlock_irqrestore(&hcd_urb_unlink_lock, flags); if (retval == 0) { hcd = bus_to_hcd(urb->dev->bus); retval = unlink1(hcd, urb, status); if (retval == 0) retval = -EINPROGRESS; else if (retval != -EIDRM && retval != -EBUSY) dev_dbg(&udev->dev, "hcd_unlink_urb %pK fail %d\n", urb, retval); usb_put_dev(udev); } return retval; } /*-------------------------------------------------------------------------*/ static void __usb_hcd_giveback_urb(struct urb *urb) { struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus); struct usb_anchor *anchor = urb->anchor; int status = urb->unlinked; unsigned long flags; urb->hcpriv = NULL; if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) && urb->actual_length < urb->transfer_buffer_length && !status)) status = -EREMOTEIO; unmap_urb_for_dma(hcd, urb); usbmon_urb_complete(&hcd->self, urb, status); usb_anchor_suspend_wakeups(anchor); usb_unanchor_urb(urb); if (likely(status == 0)) usb_led_activity(USB_LED_EVENT_HOST); /* pass ownership to the completion handler */ urb->status = status; /* * Only collect coverage in the softirq context and disable interrupts * to avoid scenarios with nested remote coverage collection sections * that KCOV does not support. * See the comment next to kcov_remote_start_usb_softirq() for details. */ flags = kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum); urb->complete(urb); kcov_remote_stop_softirq(flags); usb_anchor_resume_wakeups(anchor); atomic_dec(&urb->use_count); /* * Order the write of urb->use_count above before the read * of urb->reject below. Pairs with the memory barriers in * usb_kill_urb() and usb_poison_urb(). */ smp_mb__after_atomic(); if (unlikely(atomic_read(&urb->reject))) wake_up(&usb_kill_urb_queue); usb_put_urb(urb); } static void usb_giveback_urb_bh(struct work_struct *work) { struct giveback_urb_bh *bh = container_of(work, struct giveback_urb_bh, bh); struct list_head local_list; spin_lock_irq(&bh->lock); bh->running = true; list_replace_init(&bh->head, &local_list); spin_unlock_irq(&bh->lock); while (!list_empty(&local_list)) { struct urb *urb; urb = list_entry(local_list.next, struct urb, urb_list); list_del_init(&urb->urb_list); bh->completing_ep = urb->ep; __usb_hcd_giveback_urb(urb); bh->completing_ep = NULL; } /* * giveback new URBs next time to prevent this function * from not exiting for a long time. */ spin_lock_irq(&bh->lock); if (!list_empty(&bh->head)) { if (bh->high_prio) queue_work(system_bh_highpri_wq, &bh->bh); else queue_work(system_bh_wq, &bh->bh); } bh->running = false; spin_unlock_irq(&bh->lock); } /** * usb_hcd_giveback_urb - return URB from HCD to device driver * @hcd: host controller returning the URB * @urb: urb being returned to the USB device driver. * @status: completion status code for the URB. * * Context: atomic. The completion callback is invoked in caller's context. 
* For HCDs with HCD_BH flag set, the completion callback is invoked in BH * context (except for URBs submitted to the root hub which always complete in * caller's context). * * This hands the URB from HCD to its USB device driver, using its * completion function. The HCD has freed all per-urb resources * (and is done using urb->hcpriv). It also released all HCD locks; * the device driver won't cause problems if it frees, modifies, * or resubmits this URB. * * If @urb was unlinked, the value of @status will be overridden by * @urb->unlinked. Erroneous short transfers are detected in case * the HCD hasn't checked for them. */ void usb_hcd_giveback_urb(struct usb_hcd *hcd, struct urb *urb, int status) { struct giveback_urb_bh *bh; bool running; /* pass status to BH via unlinked */ if (likely(!urb->unlinked)) urb->unlinked = status; if (!hcd_giveback_urb_in_bh(hcd) && !is_root_hub(urb->dev)) { __usb_hcd_giveback_urb(urb); return; } if (usb_pipeisoc(urb->pipe) || usb_pipeint(urb->pipe)) bh = &hcd->high_prio_bh; else bh = &hcd->low_prio_bh; spin_lock(&bh->lock); list_add_tail(&urb->urb_list, &bh->head); running = bh->running; spin_unlock(&bh->lock); if (running) ; else if (bh->high_prio) queue_work(system_bh_highpri_wq, &bh->bh); else queue_work(system_bh_wq, &bh->bh); } EXPORT_SYMBOL_GPL(usb_hcd_giveback_urb); /*-------------------------------------------------------------------------*/ /* Cancel all URBs pending on this endpoint and wait for the endpoint's * queue to drain completely. The caller must first ensure that no more * URBs can be submitted for this endpoint. */ void usb_hcd_flush_endpoint(struct usb_device *udev, struct usb_host_endpoint *ep) { struct usb_hcd *hcd; struct urb *urb; if (!ep) return; might_sleep(); hcd = bus_to_hcd(udev->bus); /* No more submits can occur */ spin_lock_irq(&hcd_urb_list_lock); rescan: list_for_each_entry_reverse(urb, &ep->urb_list, urb_list) { int is_in; if (urb->unlinked) continue; usb_get_urb (urb); is_in = usb_urb_dir_in(urb); spin_unlock(&hcd_urb_list_lock); /* kick hcd */ unlink1(hcd, urb, -ESHUTDOWN); dev_dbg (hcd->self.controller, "shutdown urb %pK ep%d%s-%s\n", urb, usb_endpoint_num(&ep->desc), is_in ? "in" : "out", usb_ep_type_string(usb_endpoint_type(&ep->desc))); usb_put_urb (urb); /* list contents may have changed */ spin_lock(&hcd_urb_list_lock); goto rescan; } spin_unlock_irq(&hcd_urb_list_lock); /* Wait until the endpoint queue is completely empty */ while (!list_empty (&ep->urb_list)) { spin_lock_irq(&hcd_urb_list_lock); /* The list may have changed while we acquired the spinlock */ urb = NULL; if (!list_empty (&ep->urb_list)) { urb = list_entry (ep->urb_list.prev, struct urb, urb_list); usb_get_urb (urb); } spin_unlock_irq(&hcd_urb_list_lock); if (urb) { usb_kill_urb (urb); usb_put_urb (urb); } } } /** * usb_hcd_alloc_bandwidth - check whether a new bandwidth setting exceeds * the bus bandwidth * @udev: target &usb_device * @new_config: new configuration to install * @cur_alt: the current alternate interface setting * @new_alt: alternate interface setting that is being installed * * To change configurations, pass in the new configuration in new_config, * and pass NULL for cur_alt and new_alt. * * To reset a device's configuration (put the device in the ADDRESSED state), * pass in NULL for new_config, cur_alt, and new_alt. * * To change alternate interface settings, pass in NULL for new_config, * pass in the current alternate interface setting in cur_alt, * and pass in the new alternate interface setting in new_alt.
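 *
 * Illustrative sketch (not part of the original file): the three calling
 * conventions described above look like this at a call site:
 *
 *	ret = usb_hcd_alloc_bandwidth(udev, config, NULL, NULL);
 *	(install a new configuration)
 *
 *	ret = usb_hcd_alloc_bandwidth(udev, NULL, NULL, NULL);
 *	(return the device to the ADDRESSED state)
 *
 *	ret = usb_hcd_alloc_bandwidth(udev, NULL, cur_alt, new_alt);
 *	(switch one interface's alternate setting)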
* * Return: An error if the requested bandwidth change exceeds the * bus bandwidth or host controller internal resources. */ int usb_hcd_alloc_bandwidth(struct usb_device *udev, struct usb_host_config *new_config, struct usb_host_interface *cur_alt, struct usb_host_interface *new_alt) { int num_intfs, i, j; struct usb_host_interface *alt = NULL; int ret = 0; struct usb_hcd *hcd; struct usb_host_endpoint *ep; hcd = bus_to_hcd(udev->bus); if (!hcd->driver->check_bandwidth) return 0; /* Configuration is being removed - set configuration 0 */ if (!new_config && !cur_alt) { for (i = 1; i < 16; ++i) { ep = udev->ep_out[i]; if (ep) hcd->driver->drop_endpoint(hcd, udev, ep); ep = udev->ep_in[i]; if (ep) hcd->driver->drop_endpoint(hcd, udev, ep); } hcd->driver->check_bandwidth(hcd, udev); return 0; } /* Check if the HCD says there's enough bandwidth. Enable all endpoints in * each interface's alt setting 0 and ask the HCD to check the bandwidth * of the bus. There will always be bandwidth for endpoint 0, so it's * ok to exclude it. */ if (new_config) { num_intfs = new_config->desc.bNumInterfaces; /* Remove endpoints (except endpoint 0, which is always on the * schedule) from the old config from the schedule */ for (i = 1; i < 16; ++i) { ep = udev->ep_out[i]; if (ep) { ret = hcd->driver->drop_endpoint(hcd, udev, ep); if (ret < 0) goto reset; } ep = udev->ep_in[i]; if (ep) { ret = hcd->driver->drop_endpoint(hcd, udev, ep); if (ret < 0) goto reset; } } for (i = 0; i < num_intfs; ++i) { struct usb_host_interface *first_alt; int iface_num; first_alt = &new_config->intf_cache[i]->altsetting[0]; iface_num = first_alt->desc.bInterfaceNumber; /* Set up endpoints for alternate interface setting 0 */ alt = usb_find_alt_setting(new_config, iface_num, 0); if (!alt) /* No alt setting 0? Pick the first setting. */ alt = first_alt; for (j = 0; j < alt->desc.bNumEndpoints; j++) { ret = hcd->driver->add_endpoint(hcd, udev, &alt->endpoint[j]); if (ret < 0) goto reset; } } } if (cur_alt && new_alt) { struct usb_interface *iface = usb_ifnum_to_if(udev, cur_alt->desc.bInterfaceNumber); if (!iface) return -EINVAL; if (iface->resetting_device) { /* * The USB core just reset the device, so the xHCI host * and the device will think alt setting 0 is installed. * However, the USB core will pass in the alternate * setting installed before the reset as cur_alt. Dig * out the alternate setting 0 structure, or the first * alternate setting if a broken device doesn't have alt * setting 0. */ cur_alt = usb_altnum_to_altsetting(iface, 0); if (!cur_alt) cur_alt = &iface->altsetting[0]; } /* Drop all the endpoints in the current alt setting */ for (i = 0; i < cur_alt->desc.bNumEndpoints; i++) { ret = hcd->driver->drop_endpoint(hcd, udev, &cur_alt->endpoint[i]); if (ret < 0) goto reset; } /* Add all the endpoints in the new alt setting */ for (i = 0; i < new_alt->desc.bNumEndpoints; i++) { ret = hcd->driver->add_endpoint(hcd, udev, &new_alt->endpoint[i]); if (ret < 0) goto reset; } } ret = hcd->driver->check_bandwidth(hcd, udev); reset: if (ret < 0) hcd->driver->reset_bandwidth(hcd, udev); return ret; } /* Disables the endpoint: synchronizes with the hcd to make sure all * endpoint state is gone from hardware. usb_hcd_flush_endpoint() must * have been called previously. Use for set_configuration, set_interface, * driver removal, physical disconnect. * * example: a qh stored in ep->hcpriv, holding state related to endpoint * type, maxpacket size, toggle, halt status, and scheduling.
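 *
 * Illustrative sketch (not part of the original file): the usual shutdown
 * order for one endpoint, e.g. on driver removal, is
 *
 *	usb_hcd_flush_endpoint(udev, ep);
 *	usb_hcd_disable_endpoint(udev, ep);
 *
 * so that no URBs remain queued when the hardware state is torn down.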
*/ void usb_hcd_disable_endpoint(struct usb_device *udev, struct usb_host_endpoint *ep) { struct usb_hcd *hcd; might_sleep(); hcd = bus_to_hcd(udev->bus); if (hcd->driver->endpoint_disable) hcd->driver->endpoint_disable(hcd, ep); } /** * usb_hcd_reset_endpoint - reset host endpoint state * @udev: USB device. * @ep: the endpoint to reset. * * Resets any host endpoint state such as the toggle bit, sequence * number and current window. */ void usb_hcd_reset_endpoint(struct usb_device *udev, struct usb_host_endpoint *ep) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); if (hcd->driver->endpoint_reset) hcd->driver->endpoint_reset(hcd, ep); else { int epnum = usb_endpoint_num(&ep->desc); int is_out = usb_endpoint_dir_out(&ep->desc); int is_control = usb_endpoint_xfer_control(&ep->desc); usb_settoggle(udev, epnum, is_out, 0); if (is_control) usb_settoggle(udev, epnum, !is_out, 0); } } /** * usb_alloc_streams - allocate bulk endpoint stream IDs. * @interface: alternate setting that includes all endpoints. * @eps: array of endpoints that need streams. * @num_eps: number of endpoints in the array. * @num_streams: number of streams to allocate. * @mem_flags: flags hcd should use to allocate memory. * * Sets up a group of bulk endpoints to have @num_streams stream IDs available. * Drivers may queue multiple transfers to different stream IDs, which may * complete in a different order than they were queued. * * Return: On success, the number of allocated streams. On failure, a negative * error code. */ int usb_alloc_streams(struct usb_interface *interface, struct usb_host_endpoint **eps, unsigned int num_eps, unsigned int num_streams, gfp_t mem_flags) { struct usb_hcd *hcd; struct usb_device *dev; int i, ret; dev = interface_to_usbdev(interface); hcd = bus_to_hcd(dev->bus); if (!hcd->driver->alloc_streams || !hcd->driver->free_streams) return -EINVAL; if (dev->speed < USB_SPEED_SUPER) return -EINVAL; if (dev->state < USB_STATE_CONFIGURED) return -ENODEV; for (i = 0; i < num_eps; i++) { /* Streams only apply to bulk endpoints. */ if (!usb_endpoint_xfer_bulk(&eps[i]->desc)) return -EINVAL; /* Re-alloc is not allowed */ if (eps[i]->streams) return -EINVAL; } ret = hcd->driver->alloc_streams(hcd, dev, eps, num_eps, num_streams, mem_flags); if (ret < 0) return ret; for (i = 0; i < num_eps; i++) eps[i]->streams = ret; return ret; } EXPORT_SYMBOL_GPL(usb_alloc_streams); /** * usb_free_streams - free bulk endpoint stream IDs. * @interface: alternate setting that includes all endpoints. * @eps: array of endpoints to remove streams from. * @num_eps: number of endpoints in the array. * @mem_flags: flags hcd should use to allocate memory. * * Reverts a group of bulk endpoints back to not using stream IDs. * Can fail if we are given bad arguments, or HCD is broken. * * Return: 0 on success. On failure, a negative error code. 
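 *
 * Illustrative sketch (not part of the original file; eps[] is assumed to
 * hold two of the interface's bulk endpoints): a driver pairing this with
 * usb_alloc_streams() might do
 *
 *	num = usb_alloc_streams(intf, eps, 2, 16, GFP_KERNEL);
 *	if (num < 0)
 *		return num;
 *	(queue URBs with urb->stream_id set to 1 .. num)
 *	ret = usb_free_streams(intf, eps, 2, GFP_KERNEL);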
*/ int usb_free_streams(struct usb_interface *interface, struct usb_host_endpoint **eps, unsigned int num_eps, gfp_t mem_flags) { struct usb_hcd *hcd; struct usb_device *dev; int i, ret; dev = interface_to_usbdev(interface); hcd = bus_to_hcd(dev->bus); if (dev->speed < USB_SPEED_SUPER) return -EINVAL; /* Double-free is not allowed */ for (i = 0; i < num_eps; i++) if (!eps[i] || !eps[i]->streams) return -EINVAL; ret = hcd->driver->free_streams(hcd, dev, eps, num_eps, mem_flags); if (ret < 0) return ret; for (i = 0; i < num_eps; i++) eps[i]->streams = 0; return ret; } EXPORT_SYMBOL_GPL(usb_free_streams); /* Protect against drivers that try to unlink URBs after the device * is gone, by waiting until all unlinks for @udev are finished. * Since we don't currently track URBs by device, simply wait until * nothing is running in the locked region of usb_hcd_unlink_urb(). */ void usb_hcd_synchronize_unlinks(struct usb_device *udev) { spin_lock_irq(&hcd_urb_unlink_lock); spin_unlock_irq(&hcd_urb_unlink_lock); } /*-------------------------------------------------------------------------*/ /* called in any context */ int usb_hcd_get_frame_number (struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); if (!HCD_RH_RUNNING(hcd)) return -ESHUTDOWN; return hcd->driver->get_frame_number (hcd); } /*-------------------------------------------------------------------------*/ #ifdef CONFIG_USB_HCD_TEST_MODE static void usb_ehset_completion(struct urb *urb) { struct completion *done = urb->context; complete(done); } /* * Allocate and initialize a control URB. This request will be used by the * EHSET SINGLE_STEP_SET_FEATURE test in which the DATA and STATUS stages * of the GetDescriptor request are sent 15 seconds after the SETUP stage. * Return NULL if failed. 
*/ static struct urb *request_single_step_set_feature_urb( struct usb_device *udev, void *dr, void *buf, struct completion *done) { struct urb *urb; struct usb_hcd *hcd = bus_to_hcd(udev->bus); urb = usb_alloc_urb(0, GFP_KERNEL); if (!urb) return NULL; urb->pipe = usb_rcvctrlpipe(udev, 0); urb->ep = &udev->ep0; urb->dev = udev; urb->setup_packet = (void *)dr; urb->transfer_buffer = buf; urb->transfer_buffer_length = USB_DT_DEVICE_SIZE; urb->complete = usb_ehset_completion; urb->status = -EINPROGRESS; urb->actual_length = 0; urb->transfer_flags = URB_DIR_IN; usb_get_urb(urb); atomic_inc(&urb->use_count); atomic_inc(&urb->dev->urbnum); if (map_urb_for_dma(hcd, urb, GFP_KERNEL)) { usb_put_urb(urb); usb_free_urb(urb); return NULL; } urb->context = done; return urb; } int ehset_single_step_set_feature(struct usb_hcd *hcd, int port) { int retval = -ENOMEM; struct usb_ctrlrequest *dr; struct urb *urb; struct usb_device *udev; struct usb_device_descriptor *buf; DECLARE_COMPLETION_ONSTACK(done); /* Obtain udev of the rhub's child port */ udev = usb_hub_find_child(hcd->self.root_hub, port); if (!udev) { dev_err(hcd->self.controller, "No device attached to the RootHub\n"); return -ENODEV; } buf = kmalloc(USB_DT_DEVICE_SIZE, GFP_KERNEL); if (!buf) return -ENOMEM; dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_KERNEL); if (!dr) { kfree(buf); return -ENOMEM; } /* Fill Setup packet for GetDescriptor */ dr->bRequestType = USB_DIR_IN; dr->bRequest = USB_REQ_GET_DESCRIPTOR; dr->wValue = cpu_to_le16(USB_DT_DEVICE << 8); dr->wIndex = 0; dr->wLength = cpu_to_le16(USB_DT_DEVICE_SIZE); urb = request_single_step_set_feature_urb(udev, dr, buf, &done); if (!urb) goto cleanup; /* Submit just the SETUP stage */ retval = hcd->driver->submit_single_step_set_feature(hcd, urb, 1); if (retval) goto out1; if (!wait_for_completion_timeout(&done, msecs_to_jiffies(2000))) { usb_kill_urb(urb); retval = -ETIMEDOUT; dev_err(hcd->self.controller, "%s SETUP stage timed out on ep0\n", __func__); goto out1; } msleep(15 * 1000); /* Complete remaining DATA and STATUS stages using the same URB */ urb->status = -EINPROGRESS; usb_get_urb(urb); atomic_inc(&urb->use_count); atomic_inc(&urb->dev->urbnum); retval = hcd->driver->submit_single_step_set_feature(hcd, urb, 0); if (!retval && !wait_for_completion_timeout(&done, msecs_to_jiffies(2000))) { usb_kill_urb(urb); retval = -ETIMEDOUT; dev_err(hcd->self.controller, "%s IN stage timed out on ep0\n", __func__); } out1: usb_free_urb(urb); cleanup: kfree(dr); kfree(buf); return retval; } EXPORT_SYMBOL_GPL(ehset_single_step_set_feature); #endif /* CONFIG_USB_HCD_TEST_MODE */ /*-------------------------------------------------------------------------*/ #ifdef CONFIG_PM int hcd_bus_suspend(struct usb_device *rhdev, pm_message_t msg) { struct usb_hcd *hcd = bus_to_hcd(rhdev->bus); int status; int old_state = hcd->state; dev_dbg(&rhdev->dev, "bus %ssuspend, wakeup %d\n", (PMSG_IS_AUTO(msg) ? "auto-" : ""), rhdev->do_remote_wakeup); if (HCD_DEAD(hcd)) { dev_dbg(&rhdev->dev, "skipped %s of dead bus\n", "suspend"); return 0; } if (!hcd->driver->bus_suspend) { status = -ENOENT; } else { clear_bit(HCD_FLAG_RH_RUNNING, &hcd->flags); hcd->state = HC_STATE_QUIESCING; status = hcd->driver->bus_suspend(hcd); } if (status == 0) { usb_set_device_state(rhdev, USB_STATE_SUSPENDED); hcd->state = HC_STATE_SUSPENDED; if (!PMSG_IS_AUTO(msg)) usb_phy_roothub_suspend(hcd->self.sysdev, hcd->phy_roothub); /* Did we race with a root-hub wakeup event? 
*/ if (rhdev->do_remote_wakeup) { char buffer[6]; status = hcd->driver->hub_status_data(hcd, buffer); if (status != 0) { dev_dbg(&rhdev->dev, "suspend raced with wakeup event\n"); hcd_bus_resume(rhdev, PMSG_AUTO_RESUME); status = -EBUSY; } } } else { spin_lock_irq(&hcd_root_hub_lock); if (!HCD_DEAD(hcd)) { set_bit(HCD_FLAG_RH_RUNNING, &hcd->flags); hcd->state = old_state; } spin_unlock_irq(&hcd_root_hub_lock); dev_dbg(&rhdev->dev, "bus %s fail, err %d\n", "suspend", status); } return status; } int hcd_bus_resume(struct usb_device *rhdev, pm_message_t msg) { struct usb_hcd *hcd = bus_to_hcd(rhdev->bus); int status; int old_state = hcd->state; dev_dbg(&rhdev->dev, "usb %sresume\n", (PMSG_IS_AUTO(msg) ? "auto-" : "")); if (HCD_DEAD(hcd)) { dev_dbg(&rhdev->dev, "skipped %s of dead bus\n", "resume"); return 0; } if (!PMSG_IS_AUTO(msg)) { status = usb_phy_roothub_resume(hcd->self.sysdev, hcd->phy_roothub); if (status) return status; } if (!hcd->driver->bus_resume) return -ENOENT; if (HCD_RH_RUNNING(hcd)) return 0; hcd->state = HC_STATE_RESUMING; status = hcd->driver->bus_resume(hcd); clear_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); if (status == 0) status = usb_phy_roothub_calibrate(hcd->phy_roothub); if (status == 0) { struct usb_device *udev; int port1; spin_lock_irq(&hcd_root_hub_lock); if (!HCD_DEAD(hcd)) { usb_set_device_state(rhdev, rhdev->actconfig ? USB_STATE_CONFIGURED : USB_STATE_ADDRESS); set_bit(HCD_FLAG_RH_RUNNING, &hcd->flags); hcd->state = HC_STATE_RUNNING; } spin_unlock_irq(&hcd_root_hub_lock); /* * Check whether any of the enabled ports on the root hub are * unsuspended. If they are then a TRSMRCY delay is needed * (this is what the USB-2 spec calls a "global resume"). * Otherwise we can skip the delay. */ usb_hub_for_each_child(rhdev, port1, udev) { if (udev->state != USB_STATE_NOTATTACHED && !udev->port_is_suspended) { usleep_range(10000, 11000); /* TRSMRCY */ break; } } } else { hcd->state = old_state; usb_phy_roothub_suspend(hcd->self.sysdev, hcd->phy_roothub); dev_dbg(&rhdev->dev, "bus %s fail, err %d\n", "resume", status); if (status != -ESHUTDOWN) usb_hc_died(hcd); } return status; } /* Workqueue routine for root-hub remote wakeup */ static void hcd_resume_work(struct work_struct *work) { struct usb_hcd *hcd = container_of(work, struct usb_hcd, wakeup_work); struct usb_device *udev = hcd->self.root_hub; usb_remote_wakeup(udev); } /** * usb_hcd_resume_root_hub - called by HCD to resume its root hub * @hcd: host controller for this root hub * * The USB host controller calls this function when its root hub is * suspended (with the remote wakeup feature enabled) and a remote * wakeup request is received. The routine submits a workqueue request * to resume the root hub (that is, manage its downstream ports again). 
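 *
 * Illustrative sketch (not part of the original file; the wakeup-status
 * test foo_port_wakeup_pending() is hypothetical, hardware-specific
 * logic): an HCD's interrupt handler would typically do
 *
 *	if (foo_port_wakeup_pending(foo))
 *		usb_hcd_resume_root_hub(hcd);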
*/ void usb_hcd_resume_root_hub (struct usb_hcd *hcd) { unsigned long flags; spin_lock_irqsave (&hcd_root_hub_lock, flags); if (hcd->rh_registered) { pm_wakeup_event(&hcd->self.root_hub->dev, 0); set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); queue_work(pm_wq, &hcd->wakeup_work); } spin_unlock_irqrestore (&hcd_root_hub_lock, flags); } EXPORT_SYMBOL_GPL(usb_hcd_resume_root_hub); #endif /* CONFIG_PM */ /*-------------------------------------------------------------------------*/ #ifdef CONFIG_USB_OTG /** * usb_bus_start_enum - start immediate enumeration (for OTG) * @bus: the bus (must use hcd framework) * @port_num: 1-based number of port; usually bus->otg_port * Context: atomic * * Starts enumeration, with an immediate reset followed later by * hub_wq identifying and possibly configuring the device. * This is needed by OTG controller drivers, where it helps meet * HNP protocol timing requirements for starting a port reset. * * Return: 0 if successful. */ int usb_bus_start_enum(struct usb_bus *bus, unsigned port_num) { struct usb_hcd *hcd; int status = -EOPNOTSUPP; /* NOTE: since HNP can't start by grabbing the bus's address0_sem, * boards with root hubs hooked up to internal devices (instead of * just the OTG port) may need more attention to resetting... */ hcd = bus_to_hcd(bus); if (port_num && hcd->driver->start_port_reset) status = hcd->driver->start_port_reset(hcd, port_num); /* allocate hub_wq shortly after (first) root port reset finishes; * it may issue others, until at least 50 msecs have passed. */ if (status == 0) mod_timer(&hcd->rh_timer, jiffies + msecs_to_jiffies(10)); return status; } EXPORT_SYMBOL_GPL(usb_bus_start_enum); #endif /*-------------------------------------------------------------------------*/ /** * usb_hcd_irq - hook IRQs to HCD framework (bus glue) * @irq: the IRQ being raised * @__hcd: pointer to the HCD whose IRQ is being signaled * * If the controller isn't HALTed, calls the driver's irq handler. * Checks whether the controller is now dead. * * Return: %IRQ_HANDLED if the IRQ was handled. %IRQ_NONE otherwise. */ irqreturn_t usb_hcd_irq (int irq, void *__hcd) { struct usb_hcd *hcd = __hcd; irqreturn_t rc; if (unlikely(HCD_DEAD(hcd) || !HCD_HW_ACCESSIBLE(hcd))) rc = IRQ_NONE; else if (hcd->driver->irq(hcd) == IRQ_NONE) rc = IRQ_NONE; else rc = IRQ_HANDLED; return rc; } EXPORT_SYMBOL_GPL(usb_hcd_irq); /*-------------------------------------------------------------------------*/ /* Workqueue routine for when the root-hub has died. */ static void hcd_died_work(struct work_struct *work) { struct usb_hcd *hcd = container_of(work, struct usb_hcd, died_work); static char *env[] = { "ERROR=DEAD", NULL }; /* Notify user space that the host controller has died */ kobject_uevent_env(&hcd->self.root_hub->dev.kobj, KOBJ_OFFLINE, env); } /** * usb_hc_died - report abnormal shutdown of a host controller (bus glue) * @hcd: pointer to the HCD representing the controller * * This is called by bus glue to report a USB host controller that died * while operations may still have been pending. It's called automatically * by the PCI glue, so only glue for non-PCI busses should need to call it. * * Only call this function with the primary HCD. 
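 *
 * Illustrative sketch (not part of the original file; FOO_FATAL_ERROR and
 * foo_halt() are hypothetical): non-PCI bus glue might report a dead
 * controller from its interrupt handler with
 *
 *	if (status & FOO_FATAL_ERROR) {
 *		foo_halt(foo);
 *		usb_hc_died(hcd);
 *	}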
*/ void usb_hc_died (struct usb_hcd *hcd) { unsigned long flags; dev_err (hcd->self.controller, "HC died; cleaning up\n"); spin_lock_irqsave (&hcd_root_hub_lock, flags); clear_bit(HCD_FLAG_RH_RUNNING, &hcd->flags); set_bit(HCD_FLAG_DEAD, &hcd->flags); if (hcd->rh_registered) { clear_bit(HCD_FLAG_POLL_RH, &hcd->flags); /* make hub_wq clean up old urbs and devices */ usb_set_device_state (hcd->self.root_hub, USB_STATE_NOTATTACHED); usb_kick_hub_wq(hcd->self.root_hub); } if (usb_hcd_is_primary_hcd(hcd) && hcd->shared_hcd) { hcd = hcd->shared_hcd; clear_bit(HCD_FLAG_RH_RUNNING, &hcd->flags); set_bit(HCD_FLAG_DEAD, &hcd->flags); if (hcd->rh_registered) { clear_bit(HCD_FLAG_POLL_RH, &hcd->flags); /* make hub_wq clean up old urbs and devices */ usb_set_device_state(hcd->self.root_hub, USB_STATE_NOTATTACHED); usb_kick_hub_wq(hcd->self.root_hub); } } /* Handle the case where this function gets called with a shared HCD */ if (usb_hcd_is_primary_hcd(hcd)) schedule_work(&hcd->died_work); else schedule_work(&hcd->primary_hcd->died_work); spin_unlock_irqrestore (&hcd_root_hub_lock, flags); /* Make sure that the other roothub is also deallocated. */ } EXPORT_SYMBOL_GPL (usb_hc_died); /*-------------------------------------------------------------------------*/ static void init_giveback_urb_bh(struct giveback_urb_bh *bh) { spin_lock_init(&bh->lock); INIT_LIST_HEAD(&bh->head); INIT_WORK(&bh->bh, usb_giveback_urb_bh); } struct usb_hcd *__usb_create_hcd(const struct hc_driver *driver, struct device *sysdev, struct device *dev, const char *bus_name, struct usb_hcd *primary_hcd) { struct usb_hcd *hcd; hcd = kzalloc(sizeof(*hcd) + driver->hcd_priv_size, GFP_KERNEL); if (!hcd) return NULL; if (primary_hcd == NULL) { hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex), GFP_KERNEL); if (!hcd->address0_mutex) { kfree(hcd); dev_dbg(dev, "hcd address0 mutex alloc failed\n"); return NULL; } mutex_init(hcd->address0_mutex); hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex), GFP_KERNEL); if (!hcd->bandwidth_mutex) { kfree(hcd->address0_mutex); kfree(hcd); dev_dbg(dev, "hcd bandwidth mutex alloc failed\n"); return NULL; } mutex_init(hcd->bandwidth_mutex); dev_set_drvdata(dev, hcd); } else { mutex_lock(&usb_port_peer_mutex); hcd->address0_mutex = primary_hcd->address0_mutex; hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex; hcd->primary_hcd = primary_hcd; primary_hcd->primary_hcd = primary_hcd; hcd->shared_hcd = primary_hcd; primary_hcd->shared_hcd = hcd; mutex_unlock(&usb_port_peer_mutex); } kref_init(&hcd->kref); usb_bus_init(&hcd->self); hcd->self.controller = dev; hcd->self.sysdev = sysdev; hcd->self.bus_name = bus_name; timer_setup(&hcd->rh_timer, rh_timer_func, 0); #ifdef CONFIG_PM INIT_WORK(&hcd->wakeup_work, hcd_resume_work); #endif INIT_WORK(&hcd->died_work, hcd_died_work); hcd->driver = driver; hcd->speed = driver->flags & HCD_MASK; hcd->product_desc = (driver->product_desc) ? driver->product_desc : "USB Host Controller"; return hcd; } EXPORT_SYMBOL_GPL(__usb_create_hcd); /** * usb_create_shared_hcd - create and initialize an HCD structure * @driver: HC driver that will use this hcd * @dev: device for this HC, stored in hcd->self.controller * @bus_name: value to store in hcd->self.bus_name * @primary_hcd: a pointer to the usb_hcd structure that is sharing the * PCI device. Only allocate certain resources for the primary HCD * * Context: task context, might sleep. * * Allocate a struct usb_hcd, with extra space at the end for the * HC driver's private data. 
Initialize the generic members of the * hcd structure. * * Return: On success, a pointer to the created and initialized HCD structure. * On failure (e.g. if memory is unavailable), %NULL. */ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, struct device *dev, const char *bus_name, struct usb_hcd *primary_hcd) { return __usb_create_hcd(driver, dev, dev, bus_name, primary_hcd); } EXPORT_SYMBOL_GPL(usb_create_shared_hcd); /** * usb_create_hcd - create and initialize an HCD structure * @driver: HC driver that will use this hcd * @dev: device for this HC, stored in hcd->self.controller * @bus_name: value to store in hcd->self.bus_name * * Context: task context, might sleep. * * Allocate a struct usb_hcd, with extra space at the end for the * HC driver's private data. Initialize the generic members of the * hcd structure. * * Return: On success, a pointer to the created and initialized HCD * structure. On failure (e.g. if memory is unavailable), %NULL. */ struct usb_hcd *usb_create_hcd(const struct hc_driver *driver, struct device *dev, const char *bus_name) { return __usb_create_hcd(driver, dev, dev, bus_name, NULL); } EXPORT_SYMBOL_GPL(usb_create_hcd); /* * Roothubs that share one PCI device must also share the bandwidth mutex. * Don't deallocate the bandwidth_mutex until the last shared usb_hcd is * deallocated. * * Make sure to deallocate the bandwidth_mutex only when the last HCD is * freed. When hcd_release() is called for either hcd in a peer set, * invalidate the peer's ->shared_hcd and ->primary_hcd pointers. */ static void hcd_release(struct kref *kref) { struct usb_hcd *hcd = container_of (kref, struct usb_hcd, kref); mutex_lock(&usb_port_peer_mutex); if (hcd->shared_hcd) { struct usb_hcd *peer = hcd->shared_hcd; peer->shared_hcd = NULL; peer->primary_hcd = NULL; } else { kfree(hcd->address0_mutex); kfree(hcd->bandwidth_mutex); } mutex_unlock(&usb_port_peer_mutex); kfree(hcd); } struct usb_hcd *usb_get_hcd (struct usb_hcd *hcd) { if (hcd) kref_get (&hcd->kref); return hcd; } EXPORT_SYMBOL_GPL(usb_get_hcd); void usb_put_hcd (struct usb_hcd *hcd) { if (hcd) kref_put (&hcd->kref, hcd_release); } EXPORT_SYMBOL_GPL(usb_put_hcd); int usb_hcd_is_primary_hcd(struct usb_hcd *hcd) { if (!hcd->primary_hcd) return 1; return hcd == hcd->primary_hcd; } EXPORT_SYMBOL_GPL(usb_hcd_is_primary_hcd); int usb_hcd_find_raw_port_number(struct usb_hcd *hcd, int port1) { if (!hcd->driver->find_raw_port_number) return port1; return hcd->driver->find_raw_port_number(hcd, port1); } static int usb_hcd_request_irqs(struct usb_hcd *hcd, unsigned int irqnum, unsigned long irqflags) { int retval; if (hcd->driver->irq) { snprintf(hcd->irq_descr, sizeof(hcd->irq_descr), "%s:usb%d", hcd->driver->description, hcd->self.busnum); retval = request_irq(irqnum, &usb_hcd_irq, irqflags, hcd->irq_descr, hcd); if (retval != 0) { dev_err(hcd->self.controller, "request interrupt %d failed\n", irqnum); return retval; } hcd->irq = irqnum; dev_info(hcd->self.controller, "irq %d, %s 0x%08llx\n", irqnum, (hcd->driver->flags & HCD_MEMORY) ? "io mem" : "io port", (unsigned long long)hcd->rsrc_start); } else { hcd->irq = 0; if (hcd->rsrc_start) dev_info(hcd->self.controller, "%s 0x%08llx\n", (hcd->driver->flags & HCD_MEMORY) ? 
"io mem" : "io port", (unsigned long long)hcd->rsrc_start); } return 0; } /* * Before we free this root hub, flush in-flight peering attempts * and disable peer lookups */ static void usb_put_invalidate_rhdev(struct usb_hcd *hcd) { struct usb_device *rhdev; mutex_lock(&usb_port_peer_mutex); rhdev = hcd->self.root_hub; hcd->self.root_hub = NULL; mutex_unlock(&usb_port_peer_mutex); usb_put_dev(rhdev); } /** * usb_stop_hcd - Halt the HCD * @hcd: the usb_hcd that has to be halted * * Stop the root-hub polling timer and invoke the HCD's ->stop callback. */ static void usb_stop_hcd(struct usb_hcd *hcd) { hcd->rh_pollable = 0; clear_bit(HCD_FLAG_POLL_RH, &hcd->flags); del_timer_sync(&hcd->rh_timer); hcd->driver->stop(hcd); hcd->state = HC_STATE_HALT; /* In case the HCD restarted the timer, stop it again. */ clear_bit(HCD_FLAG_POLL_RH, &hcd->flags); del_timer_sync(&hcd->rh_timer); } /** * usb_add_hcd - finish generic HCD structure initialization and register * @hcd: the usb_hcd structure to initialize * @irqnum: Interrupt line to allocate * @irqflags: Interrupt type flags * * Finish the remaining parts of generic HCD initialization: allocate the * buffers of consistent memory, register the bus, request the IRQ line, * and call the driver's reset() and start() routines. */ int usb_add_hcd(struct usb_hcd *hcd, unsigned int irqnum, unsigned long irqflags) { int retval; struct usb_device *rhdev; struct usb_hcd *shared_hcd; int skip_phy_initialization; if (usb_hcd_is_primary_hcd(hcd)) skip_phy_initialization = hcd->skip_phy_initialization; else skip_phy_initialization = hcd->primary_hcd->skip_phy_initialization; if (!skip_phy_initialization) { if (usb_hcd_is_primary_hcd(hcd)) { hcd->phy_roothub = usb_phy_roothub_alloc(hcd->self.sysdev); if (IS_ERR(hcd->phy_roothub)) return PTR_ERR(hcd->phy_roothub); } else { hcd->phy_roothub = usb_phy_roothub_alloc_usb3_phy(hcd->self.sysdev); if (IS_ERR(hcd->phy_roothub)) return PTR_ERR(hcd->phy_roothub); } retval = usb_phy_roothub_init(hcd->phy_roothub); if (retval) return retval; retval = usb_phy_roothub_set_mode(hcd->phy_roothub, PHY_MODE_USB_HOST_SS); if (retval) retval = usb_phy_roothub_set_mode(hcd->phy_roothub, PHY_MODE_USB_HOST); if (retval) goto err_usb_phy_roothub_power_on; retval = usb_phy_roothub_power_on(hcd->phy_roothub); if (retval) goto err_usb_phy_roothub_power_on; } dev_info(hcd->self.controller, "%s\n", hcd->product_desc); switch (authorized_default) { case USB_AUTHORIZE_NONE: hcd->dev_policy = USB_DEVICE_AUTHORIZE_NONE; break; case USB_AUTHORIZE_INTERNAL: hcd->dev_policy = USB_DEVICE_AUTHORIZE_INTERNAL; break; case USB_AUTHORIZE_ALL: case USB_AUTHORIZE_WIRED: default: hcd->dev_policy = USB_DEVICE_AUTHORIZE_ALL; break; } set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags); /* per default all interfaces are authorized */ set_bit(HCD_FLAG_INTF_AUTHORIZED, &hcd->flags); /* HC is in reset state, but accessible. Now do the one-time init, * bottom up so that hcds can customize the root hubs before hub_wq * starts talking to them. (Note, bus id is assigned early too.) 
*/ retval = hcd_buffer_create(hcd); if (retval != 0) { dev_dbg(hcd->self.sysdev, "pool alloc failed\n"); goto err_create_buf; } retval = usb_register_bus(&hcd->self); if (retval < 0) goto err_register_bus; rhdev = usb_alloc_dev(NULL, &hcd->self, 0); if (rhdev == NULL) { dev_err(hcd->self.sysdev, "unable to allocate root hub\n"); retval = -ENOMEM; goto err_allocate_root_hub; } mutex_lock(&usb_port_peer_mutex); hcd->self.root_hub = rhdev; mutex_unlock(&usb_port_peer_mutex); rhdev->rx_lanes = 1; rhdev->tx_lanes = 1; rhdev->ssp_rate = USB_SSP_GEN_UNKNOWN; switch (hcd->speed) { case HCD_USB11: rhdev->speed = USB_SPEED_FULL; break; case HCD_USB2: rhdev->speed = USB_SPEED_HIGH; break; case HCD_USB3: rhdev->speed = USB_SPEED_SUPER; break; case HCD_USB32: rhdev->rx_lanes = 2; rhdev->tx_lanes = 2; rhdev->ssp_rate = USB_SSP_GEN_2x2; rhdev->speed = USB_SPEED_SUPER_PLUS; break; case HCD_USB31: rhdev->ssp_rate = USB_SSP_GEN_2x1; rhdev->speed = USB_SPEED_SUPER_PLUS; break; default: retval = -EINVAL; goto err_set_rh_speed; } /* wakeup flag init defaults to "everything works" for root hubs, * but drivers can override it in reset() if needed, along with * recording the overall controller's system wakeup capability. */ device_set_wakeup_capable(&rhdev->dev, 1); /* HCD_FLAG_RH_RUNNING doesn't matter until the root hub is * registered. But since the controller can die at any time, * let's initialize the flag before touching the hardware. */ set_bit(HCD_FLAG_RH_RUNNING, &hcd->flags); /* "reset" is misnamed; its role is now one-time init. the controller * should already have been reset (and boot firmware kicked off etc). */ if (hcd->driver->reset) { retval = hcd->driver->reset(hcd); if (retval < 0) { dev_err(hcd->self.controller, "can't setup: %d\n", retval); goto err_hcd_driver_setup; } } hcd->rh_pollable = 1; retval = usb_phy_roothub_calibrate(hcd->phy_roothub); if (retval) goto err_hcd_driver_setup; /* NOTE: root hub and controller capabilities may not be the same */ if (device_can_wakeup(hcd->self.controller) && device_can_wakeup(&hcd->self.root_hub->dev)) dev_dbg(hcd->self.controller, "supports USB remote wakeup\n"); /* initialize BHs */ init_giveback_urb_bh(&hcd->high_prio_bh); hcd->high_prio_bh.high_prio = true; init_giveback_urb_bh(&hcd->low_prio_bh); /* enable irqs just before we start the controller, * if the BIOS provides legacy PCI irqs. 
*/ if (usb_hcd_is_primary_hcd(hcd) && irqnum) { retval = usb_hcd_request_irqs(hcd, irqnum, irqflags); if (retval) goto err_request_irq; } hcd->state = HC_STATE_RUNNING; retval = hcd->driver->start(hcd); if (retval < 0) { dev_err(hcd->self.controller, "startup error %d\n", retval); goto err_hcd_driver_start; } /* starting here, usbcore will pay attention to the shared HCD roothub */ shared_hcd = hcd->shared_hcd; if (!usb_hcd_is_primary_hcd(hcd) && shared_hcd && HCD_DEFER_RH_REGISTER(shared_hcd)) { retval = register_root_hub(shared_hcd); if (retval != 0) goto err_register_root_hub; if (shared_hcd->uses_new_polling && HCD_POLL_RH(shared_hcd)) usb_hcd_poll_rh_status(shared_hcd); } /* starting here, usbcore will pay attention to this root hub */ if (!HCD_DEFER_RH_REGISTER(hcd)) { retval = register_root_hub(hcd); if (retval != 0) goto err_register_root_hub; if (hcd->uses_new_polling && HCD_POLL_RH(hcd)) usb_hcd_poll_rh_status(hcd); } return retval; err_register_root_hub: usb_stop_hcd(hcd); err_hcd_driver_start: if (usb_hcd_is_primary_hcd(hcd) && hcd->irq > 0) free_irq(irqnum, hcd); err_request_irq: err_hcd_driver_setup: err_set_rh_speed: usb_put_invalidate_rhdev(hcd); err_allocate_root_hub: usb_deregister_bus(&hcd->self); err_register_bus: hcd_buffer_destroy(hcd); err_create_buf: usb_phy_roothub_power_off(hcd->phy_roothub); err_usb_phy_roothub_power_on: usb_phy_roothub_exit(hcd->phy_roothub); return retval; } EXPORT_SYMBOL_GPL(usb_add_hcd); /** * usb_remove_hcd - shutdown processing for generic HCDs * @hcd: the usb_hcd structure to remove * * Context: task context, might sleep. * * Disconnects the root hub, then reverses the effects of usb_add_hcd(), * invoking the HCD's stop() method. */ void usb_remove_hcd(struct usb_hcd *hcd) { struct usb_device *rhdev; bool rh_registered; if (!hcd) { pr_debug("%s: hcd is NULL\n", __func__); return; } rhdev = hcd->self.root_hub; dev_info(hcd->self.controller, "remove, state %x\n", hcd->state); usb_get_dev(rhdev); clear_bit(HCD_FLAG_RH_RUNNING, &hcd->flags); if (HC_IS_RUNNING (hcd->state)) hcd->state = HC_STATE_QUIESCING; dev_dbg(hcd->self.controller, "roothub graceful disconnect\n"); spin_lock_irq (&hcd_root_hub_lock); rh_registered = hcd->rh_registered; hcd->rh_registered = 0; spin_unlock_irq (&hcd_root_hub_lock); #ifdef CONFIG_PM cancel_work_sync(&hcd->wakeup_work); #endif cancel_work_sync(&hcd->died_work); mutex_lock(&usb_bus_idr_lock); if (rh_registered) usb_disconnect(&rhdev); /* Sets rhdev to NULL */ mutex_unlock(&usb_bus_idr_lock); /* * flush_work() isn't needed here because: * - driver's disconnect() called from usb_disconnect() should * make sure its URBs are completed during the disconnect() * callback * * - it is too late to run complete() here since driver may have * been removed already now */ /* Prevent any more root-hub status calls from the timer. * The HCD might still restart the timer (if a port status change * interrupt occurs), but usb_hcd_poll_rh_status() won't invoke * the hub_status_data() callback. 
*/ usb_stop_hcd(hcd); if (usb_hcd_is_primary_hcd(hcd)) { if (hcd->irq > 0) free_irq(hcd->irq, hcd); } usb_deregister_bus(&hcd->self); hcd_buffer_destroy(hcd); usb_phy_roothub_power_off(hcd->phy_roothub); usb_phy_roothub_exit(hcd->phy_roothub); usb_put_invalidate_rhdev(hcd); hcd->flags = 0; } EXPORT_SYMBOL_GPL(usb_remove_hcd); void usb_hcd_platform_shutdown(struct platform_device *dev) { struct usb_hcd *hcd = platform_get_drvdata(dev); /* No need for pm_runtime_put(), we're shutting down */ pm_runtime_get_sync(&dev->dev); if (hcd->driver->shutdown) hcd->driver->shutdown(hcd); } EXPORT_SYMBOL_GPL(usb_hcd_platform_shutdown); int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr, dma_addr_t dma, size_t size) { int err; void *local_mem; hcd->localmem_pool = devm_gen_pool_create(hcd->self.sysdev, 4, dev_to_node(hcd->self.sysdev), dev_name(hcd->self.sysdev)); if (IS_ERR(hcd->localmem_pool)) return PTR_ERR(hcd->localmem_pool); /* * if a physical SRAM address was passed, map it, otherwise * allocate system memory as a buffer. */ if (phys_addr) local_mem = devm_memremap(hcd->self.sysdev, phys_addr, size, MEMREMAP_WC); else local_mem = dmam_alloc_attrs(hcd->self.sysdev, size, &dma, GFP_KERNEL, DMA_ATTR_WRITE_COMBINE); if (IS_ERR_OR_NULL(local_mem)) { if (!local_mem) return -ENOMEM; return PTR_ERR(local_mem); } /* * Here we pass a dma_addr_t but the arg type is a phys_addr_t. * It's not backed by system memory and thus there's no kernel mapping * for it. */ err = gen_pool_add_virt(hcd->localmem_pool, (unsigned long)local_mem, dma, size, dev_to_node(hcd->self.sysdev)); if (err < 0) { dev_err(hcd->self.sysdev, "gen_pool_add_virt failed with %d\n", err); return err; } return 0; } EXPORT_SYMBOL_GPL(usb_hcd_setup_local_mem); /*-------------------------------------------------------------------------*/ #if IS_ENABLED(CONFIG_USB_MON) const struct usb_mon_operations *mon_ops; /* * The registration is unlocked. * We do it this way because we do not want to lock in hot paths. * * Notice that the code is minimally error-proof. Because usbmon needs * symbols from usbcore, usbcore gets referenced and cannot be unloaded first. */ int usb_mon_register(const struct usb_mon_operations *ops) { if (mon_ops) return -EBUSY; mon_ops = ops; mb(); return 0; } EXPORT_SYMBOL_GPL (usb_mon_register); void usb_mon_deregister (void) { if (mon_ops == NULL) { printk(KERN_ERR "USB: monitor was not registered\n"); return; } mon_ops = NULL; mb(); } EXPORT_SYMBOL_GPL (usb_mon_deregister); #endif /* CONFIG_USB_MON || CONFIG_USB_MON_MODULE */
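/*
 * Illustrative sketch (not part of either file; the FOO_SRAM_* constants
 * are hypothetical): a driver for a controller that can only DMA from a
 * dedicated 64 KiB SRAM would hand that region to usbcore before calling
 * usb_add_hcd(), so that every URB buffer is bounced through the local
 * pool set up by usb_hcd_setup_local_mem() above:
 *
 *	ret = usb_hcd_setup_local_mem(hcd, FOO_SRAM_PHYS_BASE,
 *				      FOO_SRAM_DMA_BASE, SZ_64K);
 *	if (ret)
 *		return ret;
 */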
// SPDX-License-Identifier: GPL-2.0+ /*****************************************************************************/ /* * uss720.c -- USS720 USB Parport Cable.
* * Copyright (C) 1999, 2005, 2010 * Thomas Sailer (t.sailer@alumni.ethz.ch) * * Based on parport_pc.c * * History: * 0.1 04.08.1999 Created * 0.2 07.08.1999 Some fixes mainly suggested by Tim Waugh * Interrupt handling currently disabled because * usb_request_irq crashes somewhere within ohci.c * for no apparent reason (that is for me, anyway) * ECP currently untested * 0.3 10.08.1999 fixing merge errors * 0.4 13.08.1999 Added Vendor/Product ID of Brad Hard's cable * 0.5 20.09.1999 usb_control_msg wrapper used * Nov01.2000 usb_device_table support by Adam J. Richter * 08.04.2001 Identify version on module load. gb * 0.6 02.09.2005 Fix "scheduling in interrupt" problem by making save/restore * context asynchronous * */ /*****************************************************************************/ #include <linux/module.h> #include <linux/socket.h> #include <linux/parport.h> #include <linux/init.h> #include <linux/usb.h> #include <linux/delay.h> #include <linux/completion.h> #include <linux/kref.h> #include <linux/slab.h> #include <linux/sched/signal.h> #define DRIVER_AUTHOR "Thomas M. Sailer, t.sailer@alumni.ethz.ch" #define DRIVER_DESC "USB Parport Cable driver for Cables using the Lucent Technologies USS720 Chip" /* --------------------------------------------------------------------- */ struct parport_uss720_private { struct usb_device *usbdev; struct parport *pp; struct kref ref_count; __u8 reg[7]; /* USB registers */ struct list_head asynclist; spinlock_t asynclock; }; struct uss720_async_request { struct parport_uss720_private *priv; struct kref ref_count; struct list_head asynclist; struct completion compl; struct urb *urb; struct usb_ctrlrequest *dr; __u8 reg[7]; }; /* --------------------------------------------------------------------- */ static void destroy_priv(struct kref *kref) { struct parport_uss720_private *priv = container_of(kref, struct parport_uss720_private, ref_count); dev_dbg(&priv->usbdev->dev, "destroying priv datastructure\n"); usb_put_dev(priv->usbdev); priv->usbdev = NULL; kfree(priv); } static void destroy_async(struct kref *kref) { struct uss720_async_request *rq = container_of(kref, struct uss720_async_request, ref_count); struct parport_uss720_private *priv = rq->priv; unsigned long flags; if (likely(rq->urb)) usb_free_urb(rq->urb); kfree(rq->dr); spin_lock_irqsave(&priv->asynclock, flags); list_del_init(&rq->asynclist); spin_unlock_irqrestore(&priv->asynclock, flags); kfree(rq); kref_put(&priv->ref_count, destroy_priv); } /* --------------------------------------------------------------------- */ static void async_complete(struct urb *urb) { struct uss720_async_request *rq; struct parport *pp; struct parport_uss720_private *priv; int status = urb->status; rq = urb->context; priv = rq->priv; pp = priv->pp; if (status) { dev_err(&urb->dev->dev, "async_complete: urb error %d\n", status); } else if (rq->dr->bRequest == 3) { memcpy(priv->reg, rq->reg, sizeof(priv->reg)); #if 0 dev_dbg(&priv->usbdev->dev, "async_complete regs %7ph\n", priv->reg); #endif /* if nAck interrupts are enabled and we have an interrupt, call the interrupt procedure */ if (rq->reg[2] & rq->reg[1] & 0x10 && pp) parport_generic_irq(pp); } complete(&rq->compl); kref_put(&rq->ref_count, destroy_async); } static struct uss720_async_request *submit_async_request(struct parport_uss720_private *priv, __u8 request, __u8 requesttype, __u16 value, __u16 index, gfp_t mem_flags) { struct usb_device *usbdev; struct uss720_async_request *rq; unsigned long flags; int ret; if (!priv) return NULL; usbdev 
= priv->usbdev; if (!usbdev) return NULL; rq = kzalloc(sizeof(struct uss720_async_request), mem_flags); if (!rq) return NULL; kref_init(&rq->ref_count); INIT_LIST_HEAD(&rq->asynclist); init_completion(&rq->compl); kref_get(&priv->ref_count); rq->priv = priv; rq->urb = usb_alloc_urb(0, mem_flags); if (!rq->urb) { kref_put(&rq->ref_count, destroy_async); return NULL; } rq->dr = kmalloc(sizeof(*rq->dr), mem_flags); if (!rq->dr) { kref_put(&rq->ref_count, destroy_async); return NULL; } rq->dr->bRequestType = requesttype; rq->dr->bRequest = request; rq->dr->wValue = cpu_to_le16(value); rq->dr->wIndex = cpu_to_le16(index); rq->dr->wLength = cpu_to_le16((request == 3) ? sizeof(rq->reg) : 0); usb_fill_control_urb(rq->urb, usbdev, (requesttype & 0x80) ? usb_rcvctrlpipe(usbdev, 0) : usb_sndctrlpipe(usbdev, 0), (unsigned char *)rq->dr, (request == 3) ? rq->reg : NULL, (request == 3) ? sizeof(rq->reg) : 0, async_complete, rq); /* rq->urb->transfer_flags |= URB_ASYNC_UNLINK; */ spin_lock_irqsave(&priv->asynclock, flags); list_add_tail(&rq->asynclist, &priv->asynclist); spin_unlock_irqrestore(&priv->asynclock, flags); kref_get(&rq->ref_count); ret = usb_submit_urb(rq->urb, mem_flags); if (!ret) return rq; destroy_async(&rq->ref_count); dev_err(&usbdev->dev, "submit_async_request submit_urb failed with %d\n", ret); return NULL; } static unsigned int kill_all_async_requests_priv(struct parport_uss720_private *priv) { struct uss720_async_request *rq; unsigned long flags; unsigned int ret = 0; spin_lock_irqsave(&priv->asynclock, flags); list_for_each_entry(rq, &priv->asynclist, asynclist) { usb_unlink_urb(rq->urb); ret++; } spin_unlock_irqrestore(&priv->asynclock, flags); return ret; } /* --------------------------------------------------------------------- */ static int get_1284_register(struct parport *pp, unsigned char reg, unsigned char *val, gfp_t mem_flags) { struct parport_uss720_private *priv; struct uss720_async_request *rq; static const unsigned char regindex[9] = { 4, 0, 1, 5, 5, 0, 2, 3, 6 }; int ret; if (!pp) return -EIO; priv = pp->private_data; rq = submit_async_request(priv, 3, 0xc0, ((unsigned int)reg) << 8, 0, mem_flags); if (!rq) { dev_err(&priv->usbdev->dev, "get_1284_register(%u) failed", (unsigned int)reg); return -EIO; } if (!val) { kref_put(&rq->ref_count, destroy_async); return 0; } if (wait_for_completion_timeout(&rq->compl, HZ)) { ret = rq->urb->status; *val = priv->reg[(reg >= 9) ? 
0 : regindex[reg]]; if (ret) printk(KERN_WARNING "get_1284_register: " "usb error %d\n", ret); kref_put(&rq->ref_count, destroy_async); return ret; } printk(KERN_WARNING "get_1284_register timeout\n"); kill_all_async_requests_priv(priv); return -EIO; } static int set_1284_register(struct parport *pp, unsigned char reg, unsigned char val, gfp_t mem_flags) { struct parport_uss720_private *priv; struct uss720_async_request *rq; if (!pp) return -EIO; priv = pp->private_data; rq = submit_async_request(priv, 4, 0x40, (((unsigned int)reg) << 8) | val, 0, mem_flags); if (!rq) { dev_err(&priv->usbdev->dev, "set_1284_register(%u,%u) failed", (unsigned int)reg, (unsigned int)val); return -EIO; } kref_put(&rq->ref_count, destroy_async); return 0; } /* --------------------------------------------------------------------- */ /* ECR modes */ #define ECR_SPP 00 #define ECR_PS2 01 #define ECR_PPF 02 #define ECR_ECP 03 #define ECR_EPP 04 /* Safely change the mode bits in the ECR */ static int change_mode(struct parport *pp, int m) { struct parport_uss720_private *priv = pp->private_data; int mode; __u8 reg; if (get_1284_register(pp, 6, &reg, GFP_KERNEL)) return -EIO; /* Bits <7:5> contain the mode. */ mode = (priv->reg[2] >> 5) & 0x7; if (mode == m) return 0; /* We have to go through mode 000 or 001 */ if (mode > ECR_PS2 && m > ECR_PS2) if (change_mode(pp, ECR_PS2)) return -EIO; if (m <= ECR_PS2 && !(priv->reg[1] & 0x20)) { /* This mode resets the FIFO, so we may * have to wait for it to drain first. */ unsigned long expire = jiffies + pp->physport->cad->timeout; switch (mode) { case ECR_PPF: /* Parallel Port FIFO mode */ case ECR_ECP: /* ECP Parallel Port mode */ /* Poll slowly. */ for (;;) { if (get_1284_register(pp, 6, &reg, GFP_KERNEL)) return -EIO; if (priv->reg[2] & 0x01) break; if (time_after_eq(jiffies, expire)) /* The FIFO is stuck. */ return -EBUSY; msleep_interruptible(10); if (signal_pending(current)) break; } } } /* Set the mode. */ if (set_1284_register(pp, 6, m << 5, GFP_KERNEL)) return -EIO; if (get_1284_register(pp, 6, &reg, GFP_KERNEL)) return -EIO; return 0; } /* * Clear TIMEOUT BIT in EPP MODE */ static int clear_epp_timeout(struct parport *pp) { unsigned char stat; if (get_1284_register(pp, 1, &stat, GFP_KERNEL)) return 1; return stat & 1; } /* * Access functions.
*/ #if 0 static int uss720_irq(int usbstatus, void *buffer, int len, void *dev_id) { struct parport *pp = (struct parport *)dev_id; struct parport_uss720_private *priv = pp->private_data; if (usbstatus != 0 || len < 4 || !buffer) return 1; memcpy(priv->reg, buffer, 4); /* if nAck interrupts are enabled and we have an interrupt, call the interrupt procedure */ if (priv->reg[2] & priv->reg[1] & 0x10) parport_generic_irq(pp); return 1; } #endif static void parport_uss720_write_data(struct parport *pp, unsigned char d) { set_1284_register(pp, 0, d, GFP_KERNEL); } static unsigned char parport_uss720_read_data(struct parport *pp) { unsigned char ret; if (get_1284_register(pp, 0, &ret, GFP_KERNEL)) return 0; return ret; } static void parport_uss720_write_control(struct parport *pp, unsigned char d) { struct parport_uss720_private *priv = pp->private_data; d = (d & 0xf) | (priv->reg[1] & 0xf0); if (set_1284_register(pp, 2, d, GFP_KERNEL)) return; priv->reg[1] = d; } static unsigned char parport_uss720_read_control(struct parport *pp) { struct parport_uss720_private *priv = pp->private_data; return priv->reg[1] & 0xf; /* Use soft copy */ } static unsigned char parport_uss720_frob_control(struct parport *pp, unsigned char mask, unsigned char val) { struct parport_uss720_private *priv = pp->private_data; unsigned char d; mask &= 0x0f; val &= 0x0f; d = (priv->reg[1] & (~mask)) ^ val; if (set_1284_register(pp, 2, d, GFP_ATOMIC)) return 0; priv->reg[1] = d; return d & 0xf; } static unsigned char parport_uss720_read_status(struct parport *pp) { unsigned char ret; if (get_1284_register(pp, 1, &ret, GFP_ATOMIC)) return 0; return ret & 0xf8; } static void parport_uss720_disable_irq(struct parport *pp) { struct parport_uss720_private *priv = pp->private_data; unsigned char d; d = priv->reg[1] & ~0x10; if (set_1284_register(pp, 2, d, GFP_KERNEL)) return; priv->reg[1] = d; } static void parport_uss720_enable_irq(struct parport *pp) { struct parport_uss720_private *priv = pp->private_data; unsigned char d; d = priv->reg[1] | 0x10; if (set_1284_register(pp, 2, d, GFP_KERNEL)) return; priv->reg[1] = d; } static void parport_uss720_data_forward (struct parport *pp) { struct parport_uss720_private *priv = pp->private_data; unsigned char d; d = priv->reg[1] & ~0x20; if (set_1284_register(pp, 2, d, GFP_KERNEL)) return; priv->reg[1] = d; } static void parport_uss720_data_reverse (struct parport *pp) { struct parport_uss720_private *priv = pp->private_data; unsigned char d; d = priv->reg[1] | 0x20; if (set_1284_register(pp, 2, d, GFP_KERNEL)) return; priv->reg[1] = d; } static void parport_uss720_init_state(struct pardevice *dev, struct parport_state *s) { s->u.pc.ctr = 0xc | (dev->irq_func ? 
0x10 : 0x0); s->u.pc.ecr = 0x24; } static void parport_uss720_save_state(struct parport *pp, struct parport_state *s) { struct parport_uss720_private *priv = pp->private_data; #if 0 if (get_1284_register(pp, 2, NULL, GFP_ATOMIC)) return; #endif s->u.pc.ctr = priv->reg[1]; s->u.pc.ecr = priv->reg[2]; } static void parport_uss720_restore_state(struct parport *pp, struct parport_state *s) { struct parport_uss720_private *priv = pp->private_data; set_1284_register(pp, 2, s->u.pc.ctr, GFP_ATOMIC); set_1284_register(pp, 6, s->u.pc.ecr, GFP_ATOMIC); get_1284_register(pp, 2, NULL, GFP_ATOMIC); priv->reg[1] = s->u.pc.ctr; priv->reg[2] = s->u.pc.ecr; } static size_t parport_uss720_epp_read_data(struct parport *pp, void *buf, size_t length, int flags) { struct parport_uss720_private *priv = pp->private_data; size_t got = 0; if (change_mode(pp, ECR_EPP)) return 0; for (; got < length; got++) { if (get_1284_register(pp, 4, (char *)buf, GFP_KERNEL)) break; buf++; if (priv->reg[0] & 0x01) { clear_epp_timeout(pp); break; } } change_mode(pp, ECR_PS2); return got; } static size_t parport_uss720_epp_write_data(struct parport *pp, const void *buf, size_t length, int flags) { #if 0 struct parport_uss720_private *priv = pp->private_data; size_t written = 0; if (change_mode(pp, ECR_EPP)) return 0; for (; written < length; written++) { if (set_1284_register(pp, 4, (char *)buf, GFP_KERNEL)) break; ((char*)buf)++; if (get_1284_register(pp, 1, NULL, GFP_KERNEL)) break; if (priv->reg[0] & 0x01) { clear_epp_timeout(pp); break; } } change_mode(pp, ECR_PS2); return written; #else struct parport_uss720_private *priv = pp->private_data; struct usb_device *usbdev = priv->usbdev; int rlen = 0; int i; if (!usbdev) return 0; if (change_mode(pp, ECR_EPP)) return 0; i = usb_bulk_msg(usbdev, usb_sndbulkpipe(usbdev, 1), (void *)buf, length, &rlen, 20000); if (i) printk(KERN_ERR "uss720: sendbulk ep 1 buf %p len %zu rlen %u\n", buf, length, rlen); change_mode(pp, ECR_PS2); return rlen; #endif } static size_t parport_uss720_epp_read_addr(struct parport *pp, void *buf, size_t length, int flags) { struct parport_uss720_private *priv = pp->private_data; size_t got = 0; if (change_mode(pp, ECR_EPP)) return 0; for (; got < length; got++) { if (get_1284_register(pp, 3, (char *)buf, GFP_KERNEL)) break; buf++; if (priv->reg[0] & 0x01) { clear_epp_timeout(pp); break; } } change_mode(pp, ECR_PS2); return got; } static size_t parport_uss720_epp_write_addr(struct parport *pp, const void *buf, size_t length, int flags) { struct parport_uss720_private *priv = pp->private_data; size_t written = 0; if (change_mode(pp, ECR_EPP)) return 0; for (; written < length; written++) { if (set_1284_register(pp, 3, *(char *)buf, GFP_KERNEL)) break; buf++; if (get_1284_register(pp, 1, NULL, GFP_KERNEL)) break; if (priv->reg[0] & 0x01) { clear_epp_timeout(pp); break; } } change_mode(pp, ECR_PS2); return written; } static size_t parport_uss720_ecp_write_data(struct parport *pp, const void *buffer, size_t len, int flags) { struct parport_uss720_private *priv = pp->private_data; struct usb_device *usbdev = priv->usbdev; int rlen = 0; int i; if (!usbdev) return 0; if (change_mode(pp, ECR_ECP)) return 0; i = usb_bulk_msg(usbdev, usb_sndbulkpipe(usbdev, 1), (void *)buffer, len, &rlen, 20000); if (i) printk(KERN_ERR "uss720: sendbulk ep 1 buf %p len %zu rlen %u\n", buffer, len, rlen); change_mode(pp, ECR_PS2); return rlen; } static size_t parport_uss720_ecp_read_data(struct parport *pp, void *buffer, size_t len, int flags) { struct parport_uss720_private *priv = 
pp->private_data; struct usb_device *usbdev = priv->usbdev; int rlen = 0; int i; if (!usbdev) return 0; if (change_mode(pp, ECR_ECP)) return 0; i = usb_bulk_msg(usbdev, usb_rcvbulkpipe(usbdev, 2), buffer, len, &rlen, 20000); if (i) printk(KERN_ERR "uss720: recvbulk ep 2 buf %p len %zu rlen %u\n", buffer, len, rlen); change_mode(pp, ECR_PS2); return rlen; } static size_t parport_uss720_ecp_write_addr(struct parport *pp, const void *buffer, size_t len, int flags) { size_t written = 0; if (change_mode(pp, ECR_ECP)) return 0; for (; written < len; written++) { if (set_1284_register(pp, 5, *(char *)buffer, GFP_KERNEL)) break; buffer++; } change_mode(pp, ECR_PS2); return written; } static size_t parport_uss720_write_compat(struct parport *pp, const void *buffer, size_t len, int flags) { struct parport_uss720_private *priv = pp->private_data; struct usb_device *usbdev = priv->usbdev; int rlen = 0; int i; if (!usbdev) return 0; if (change_mode(pp, ECR_PPF)) return 0; i = usb_bulk_msg(usbdev, usb_sndbulkpipe(usbdev, 1), (void *)buffer, len, &rlen, 20000); if (i) printk(KERN_ERR "uss720: sendbulk ep 1 buf %p len %zu rlen %u\n", buffer, len, rlen); change_mode(pp, ECR_PS2); return rlen; } /* --------------------------------------------------------------------- */ static struct parport_operations parport_uss720_ops = { .owner = THIS_MODULE, .write_data = parport_uss720_write_data, .read_data = parport_uss720_read_data, .write_control = parport_uss720_write_control, .read_control = parport_uss720_read_control, .frob_control = parport_uss720_frob_control, .read_status = parport_uss720_read_status, .enable_irq = parport_uss720_enable_irq, .disable_irq = parport_uss720_disable_irq, .data_forward = parport_uss720_data_forward, .data_reverse = parport_uss720_data_reverse, .init_state = parport_uss720_init_state, .save_state = parport_uss720_save_state, .restore_state = parport_uss720_restore_state, .epp_write_data = parport_uss720_epp_write_data, .epp_read_data = parport_uss720_epp_read_data, .epp_write_addr = parport_uss720_epp_write_addr, .epp_read_addr = parport_uss720_epp_read_addr, .ecp_write_data = parport_uss720_ecp_write_data, .ecp_read_data = parport_uss720_ecp_read_data, .ecp_write_addr = parport_uss720_ecp_write_addr, .compat_write_data = parport_uss720_write_compat, .nibble_read_data = parport_ieee1284_read_nibble, .byte_read_data = parport_ieee1284_read_byte, }; /* --------------------------------------------------------------------- */ static int uss720_probe(struct usb_interface *intf, const struct usb_device_id *id) { struct usb_device *usbdev = usb_get_dev(interface_to_usbdev(intf)); struct usb_host_interface *interface; struct usb_endpoint_descriptor *epd; struct parport_uss720_private *priv; struct parport *pp; unsigned char reg; int ret; dev_dbg(&intf->dev, "probe: vendor id 0x%x, device id 0x%x\n", le16_to_cpu(usbdev->descriptor.idVendor), le16_to_cpu(usbdev->descriptor.idProduct)); /* our known interfaces have 3 alternate settings */ if (intf->num_altsetting != 3) { usb_put_dev(usbdev); return -ENODEV; } ret = usb_set_interface(usbdev, intf->altsetting->desc.bInterfaceNumber, 2); dev_dbg(&intf->dev, "set interface result %d\n", ret); interface = intf->cur_altsetting; if (interface->desc.bNumEndpoints < 2) { usb_put_dev(usbdev); return -ENODEV; } /* * Allocate parport interface */ priv = kzalloc(sizeof(struct parport_uss720_private), GFP_KERNEL); if (!priv) { usb_put_dev(usbdev); return -ENOMEM; } priv->pp = NULL; priv->usbdev = usbdev; kref_init(&priv->ref_count); 
spin_lock_init(&priv->asynclock); INIT_LIST_HEAD(&priv->asynclist); pp = parport_register_port(0, PARPORT_IRQ_NONE, PARPORT_DMA_NONE, &parport_uss720_ops); if (!pp) { printk(KERN_WARNING "uss720: could not register parport\n"); goto probe_abort; } priv->pp = pp; pp->private_data = priv; pp->modes = PARPORT_MODE_PCSPP | PARPORT_MODE_TRISTATE | PARPORT_MODE_EPP | PARPORT_MODE_COMPAT; if (interface->desc.bNumEndpoints >= 3) pp->modes |= PARPORT_MODE_ECP; pp->dev = &usbdev->dev; /* set the USS720 control register to manual mode, no ECP compression, enable all ints */ set_1284_register(pp, 7, 0x00, GFP_KERNEL); set_1284_register(pp, 6, 0x30, GFP_KERNEL); /* PS/2 mode */ set_1284_register(pp, 2, 0x0c, GFP_KERNEL); /* The Belkin F5U002 Rev 2 P80453-B USB parallel port adapter shares the * device ID 050d:0002 with some other device that works with this * driver, but it itself does not. Detect and handle the bad cable * here. */ ret = get_1284_register(pp, 0, &reg, GFP_KERNEL); dev_dbg(&intf->dev, "reg: %7ph\n", priv->reg); if (ret < 0) return ret; ret = usb_find_last_int_in_endpoint(interface, &epd); if (!ret) { dev_dbg(&intf->dev, "epaddr %d interval %d\n", epd->bEndpointAddress, epd->bInterval); } parport_announce_port(pp); usb_set_intfdata(intf, pp); return 0; probe_abort: kill_all_async_requests_priv(priv); kref_put(&priv->ref_count, destroy_priv); return -ENODEV; } static void uss720_disconnect(struct usb_interface *intf) { struct parport *pp = usb_get_intfdata(intf); struct parport_uss720_private *priv; dev_dbg(&intf->dev, "disconnect\n"); usb_set_intfdata(intf, NULL); if (pp) { priv = pp->private_data; priv->pp = NULL; dev_dbg(&intf->dev, "parport_remove_port\n"); parport_remove_port(pp); parport_put_port(pp); kill_all_async_requests_priv(priv); kref_put(&priv->ref_count, destroy_priv); } dev_dbg(&intf->dev, "disconnect done\n"); } /* table of cables that work through this driver */ static const struct usb_device_id uss720_table[] = { { USB_DEVICE(0x047e, 0x1001) }, /* Infowave 901-0030 */ { USB_DEVICE(0x04b8, 0x0002) }, /* Epson CAEUL0002 ISD-103 */ { USB_DEVICE(0x04b8, 0x0003) }, /* Epson ISD-101 */ { USB_DEVICE(0x050d, 0x0002) }, { USB_DEVICE(0x050d, 0x1202) }, /* Belkin F5U120-PC */ { USB_DEVICE(0x0557, 0x2001) }, { USB_DEVICE(0x05ab, 0x0002) }, /* Belkin F5U002 ISD-101 */ { USB_DEVICE(0x05ab, 0x1001) }, /* Belkin F5U002 P80453-A */ { USB_DEVICE(0x06c6, 0x0100) }, /* Infowave ISD-103 */ { USB_DEVICE(0x0729, 0x1284) }, { USB_DEVICE(0x1293, 0x0002) }, { } /* Terminating entry */ }; MODULE_DEVICE_TABLE(usb, uss720_table); static struct usb_driver uss720_driver = { .name = "uss720", .probe = uss720_probe, .disconnect = uss720_disconnect, .id_table = uss720_table, }; /* --------------------------------------------------------------------- */ MODULE_AUTHOR(DRIVER_AUTHOR); MODULE_DESCRIPTION(DRIVER_DESC); MODULE_LICENSE("GPL"); static int __init uss720_init(void) { int retval; retval = usb_register(&uss720_driver); if (retval) goto out; printk(KERN_INFO KBUILD_MODNAME ": " DRIVER_DESC "\n"); printk(KERN_INFO KBUILD_MODNAME ": NOTE: this is a special purpose " "driver to allow nonstandard\n"); printk(KERN_INFO KBUILD_MODNAME ": protocols (eg.
bitbang) over " "USS720 usb to parallel cables\n"); printk(KERN_INFO KBUILD_MODNAME ": If you just want to connect to a " "printer, use usblp instead\n"); out: return retval; } static void __exit uss720_cleanup(void) { usb_deregister(&uss720_driver); } module_init(uss720_init); module_exit(uss720_cleanup); /* --------------------------------------------------------------------- */
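/*
 * Illustrative sketch, not part of the driver above: the asynchronous
 * machinery in this file ultimately issues one vendor control transfer
 * per register access. A rough synchronous equivalent of the 7-byte
 * register read that get_1284_register() performs (bRequest 3,
 * bRequestType 0xc0, wValue = reg << 8) might look like the code below;
 * the function name is hypothetical and the 1000 ms timeout is an
 * arbitrary choice. The driver uses async URBs instead so that the
 * accessors remain usable from atomic context.
 */
#if 0
static int example_sync_read_regs(struct usb_device *usbdev, __u8 reg)
{
	__u8 *buf;
	int ret;

	buf = kmalloc(7, GFP_KERNEL);	/* DMA-able buffer, never on stack */
	if (!buf)
		return -ENOMEM;
	ret = usb_control_msg(usbdev, usb_rcvctrlpipe(usbdev, 0), 3, 0xc0,
			      ((__u16)reg) << 8, 0, buf, 7, 1000);
	kfree(buf);
	return ret < 0 ? ret : 0;
}
#endif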
// SPDX-License-Identifier: GPL-2.0-only /* * H-TCP congestion control. The algorithm is detailed in: * R.N.Shorten, D.J.Leith: * "H-TCP: TCP for high-speed and long-distance networks" * Proc. PFLDnet, Argonne, 2004. * https://www.hamilton.ie/net/htcp3.pdf */ #include <linux/mm.h> #include <linux/module.h> #include <net/tcp.h> #define ALPHA_BASE (1<<7) /* 1.0 with shift << 7 */ #define BETA_MIN (1<<6) /* 0.5 with shift << 7 */ #define BETA_MAX 102 /* 0.8 with shift << 7 */ static int use_rtt_scaling __read_mostly = 1; module_param(use_rtt_scaling, int, 0644); MODULE_PARM_DESC(use_rtt_scaling, "turn on/off RTT scaling"); static int use_bandwidth_switch __read_mostly = 1; module_param(use_bandwidth_switch, int, 0644); MODULE_PARM_DESC(use_bandwidth_switch, "turn on/off bandwidth switcher"); struct htcp { u32 alpha; /* Fixed point arith, << 7 */ u8 beta; /* Fixed point arith, << 7 */ u8 modeswitch; /* Delay modeswitch until we had at least one congestion event */ u16 pkts_acked; u32 packetcount; u32 minRTT; u32 maxRTT; u32 last_cong; /* Time since last congestion event end */ u32 undo_last_cong; u32 undo_maxRTT; u32 undo_old_maxB; /* Bandwidth estimation */ u32 minB; u32 maxB; u32 old_maxB; u32 Bi; u32 lasttime; }; static inline u32 htcp_cong_time(const struct htcp *ca) { return jiffies - ca->last_cong; } static inline u32 htcp_ccount(const struct htcp *ca) { return htcp_cong_time(ca) / ca->minRTT; } static inline void htcp_reset(struct htcp *ca) { ca->undo_last_cong = ca->last_cong; ca->undo_maxRTT = ca->maxRTT; ca->undo_old_maxB = ca->old_maxB; ca->last_cong = jiffies; } static u32 htcp_cwnd_undo(struct sock *sk) { struct htcp *ca = inet_csk_ca(sk); if (ca->undo_last_cong) { ca->last_cong = ca->undo_last_cong; ca->maxRTT = ca->undo_maxRTT; ca->old_maxB = ca->undo_old_maxB; ca->undo_last_cong = 0; } return tcp_reno_undo_cwnd(sk); } static inline void measure_rtt(struct sock *sk, u32 srtt) { const struct inet_connection_sock *icsk = inet_csk(sk); struct htcp *ca = inet_csk_ca(sk); /* keep track of minimum RTT seen so far, minRTT is zero at first */ if (ca->minRTT > srtt || !ca->minRTT) ca->minRTT = srtt; /* max RTT */ if (icsk->icsk_ca_state == TCP_CA_Open) { if (ca->maxRTT < ca->minRTT) ca->maxRTT = ca->minRTT; if (ca->maxRTT < srtt && srtt <= ca->maxRTT + msecs_to_jiffies(20))
ca->maxRTT = srtt; } } static void measure_achieved_throughput(struct sock *sk, const struct ack_sample *sample) { const struct inet_connection_sock *icsk = inet_csk(sk); const struct tcp_sock *tp = tcp_sk(sk); struct htcp *ca = inet_csk_ca(sk); u32 now = tcp_jiffies32; if (icsk->icsk_ca_state == TCP_CA_Open) ca->pkts_acked = sample->pkts_acked; if (sample->rtt_us > 0) measure_rtt(sk, usecs_to_jiffies(sample->rtt_us)); if (!use_bandwidth_switch) return; /* achieved throughput calculations */ if (!((1 << icsk->icsk_ca_state) & (TCPF_CA_Open | TCPF_CA_Disorder))) { ca->packetcount = 0; ca->lasttime = now; return; } ca->packetcount += sample->pkts_acked; if (ca->packetcount >= tcp_snd_cwnd(tp) - (ca->alpha >> 7 ? : 1) && now - ca->lasttime >= ca->minRTT && ca->minRTT > 0) { __u32 cur_Bi = ca->packetcount * HZ / (now - ca->lasttime); if (htcp_ccount(ca) <= 3) { /* just after backoff */ ca->minB = ca->maxB = ca->Bi = cur_Bi; } else { ca->Bi = (3 * ca->Bi + cur_Bi) / 4; if (ca->Bi > ca->maxB) ca->maxB = ca->Bi; if (ca->minB > ca->maxB) ca->minB = ca->maxB; } ca->packetcount = 0; ca->lasttime = now; } } static inline void htcp_beta_update(struct htcp *ca, u32 minRTT, u32 maxRTT) { if (use_bandwidth_switch) { u32 maxB = ca->maxB; u32 old_maxB = ca->old_maxB; ca->old_maxB = ca->maxB; if (!between(5 * maxB, 4 * old_maxB, 6 * old_maxB)) { ca->beta = BETA_MIN; ca->modeswitch = 0; return; } } if (ca->modeswitch && minRTT > msecs_to_jiffies(10) && maxRTT) { ca->beta = (minRTT << 7) / maxRTT; if (ca->beta < BETA_MIN) ca->beta = BETA_MIN; else if (ca->beta > BETA_MAX) ca->beta = BETA_MAX; } else { ca->beta = BETA_MIN; ca->modeswitch = 1; } } static inline void htcp_alpha_update(struct htcp *ca) { u32 minRTT = ca->minRTT; u32 factor = 1; u32 diff = htcp_cong_time(ca); if (diff > HZ) { diff -= HZ; factor = 1 + (10 * diff + ((diff / 2) * (diff / 2) / HZ)) / HZ; } if (use_rtt_scaling && minRTT) { u32 scale = (HZ << 3) / (10 * minRTT); /* clamping ratio to interval [0.5,10]<<3 */ scale = clamp(scale, 1U << 2, 10U << 3); factor = (factor << 3) / scale; if (!factor) factor = 1; } ca->alpha = 2 * factor * ((1 << 7) - ca->beta); if (!ca->alpha) ca->alpha = ALPHA_BASE; } /* * After we have the rtt data to calculate beta, we'd still prefer to wait one * rtt before we adjust our beta to ensure we are working from consistent * data. * * This function should be called when we hit a congestion event, since only at * that point do we really have a real sense of maxRTT (the queues en route * were getting just too full). */ static void htcp_param_update(struct sock *sk) { struct htcp *ca = inet_csk_ca(sk); u32 minRTT = ca->minRTT; u32 maxRTT = ca->maxRTT; htcp_beta_update(ca, minRTT, maxRTT); htcp_alpha_update(ca); /* add slowly fading memory for maxRTT to accommodate routing changes */ if (minRTT > 0 && maxRTT > minRTT) ca->maxRTT = minRTT + ((maxRTT - minRTT) * 95) / 100; } static u32 htcp_recalc_ssthresh(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); const struct htcp *ca = inet_csk_ca(sk); htcp_param_update(sk); return max((tcp_snd_cwnd(tp) * ca->beta) >> 7, 2U); } static void htcp_cong_avoid(struct sock *sk, u32 ack, u32 acked) { struct tcp_sock *tp = tcp_sk(sk); struct htcp *ca = inet_csk_ca(sk); if (!tcp_is_cwnd_limited(sk)) return; if (tcp_in_slow_start(tp)) tcp_slow_start(tp, acked); else { /* In dangerous area, increase slowly.
* In theory this is tp->snd_cwnd += alpha / tp->snd_cwnd */ if ((tp->snd_cwnd_cnt * ca->alpha) >> 7 >= tcp_snd_cwnd(tp)) { if (tcp_snd_cwnd(tp) < tp->snd_cwnd_clamp) tcp_snd_cwnd_set(tp, tcp_snd_cwnd(tp) + 1); tp->snd_cwnd_cnt = 0; htcp_alpha_update(ca); } else tp->snd_cwnd_cnt += ca->pkts_acked; ca->pkts_acked = 1; } } static void htcp_init(struct sock *sk) { struct htcp *ca = inet_csk_ca(sk); memset(ca, 0, sizeof(struct htcp)); ca->alpha = ALPHA_BASE; ca->beta = BETA_MIN; ca->pkts_acked = 1; ca->last_cong = jiffies; } static void htcp_state(struct sock *sk, u8 new_state) { switch (new_state) { case TCP_CA_Open: { struct htcp *ca = inet_csk_ca(sk); if (ca->undo_last_cong) { ca->last_cong = jiffies; ca->undo_last_cong = 0; } } break; case TCP_CA_CWR: case TCP_CA_Recovery: case TCP_CA_Loss: htcp_reset(inet_csk_ca(sk)); break; } } static struct tcp_congestion_ops htcp __read_mostly = { .init = htcp_init, .ssthresh = htcp_recalc_ssthresh, .cong_avoid = htcp_cong_avoid, .set_state = htcp_state, .undo_cwnd = htcp_cwnd_undo, .pkts_acked = measure_achieved_throughput, .owner = THIS_MODULE, .name = "htcp", }; static int __init htcp_register(void) { BUILD_BUG_ON(sizeof(struct htcp) > ICSK_CA_PRIV_SIZE); BUILD_BUG_ON(BETA_MIN >= BETA_MAX); return tcp_register_congestion_control(&htcp); } static void __exit htcp_unregister(void) { tcp_unregister_congestion_control(&htcp); } module_init(htcp_register); module_exit(htcp_unregister); MODULE_AUTHOR("Baruch Even"); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("H-TCP");
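/*
 * Illustrative sketch, not part of the module above: a userspace-style
 * view of the <<7 fixed-point arithmetic used by htcp_alpha_update()
 * (helper name hypothetical). Worked numbers: with minRTT = 10 ms and
 * maxRTT = 100 ms, htcp_beta_update() computes (10 << 7) / 100 = 12
 * (~0.1), which is clamped up to BETA_MIN = 64 (0.5); with factor = 1
 * the alpha below is then 2 * (128 - 64) = 128, i.e. 1.0 in <<7 scale.
 */
#if 0
static unsigned int example_htcp_alpha(unsigned int beta,
				       unsigned int factor)
{
	unsigned int alpha = 2 * factor * ((1 << 7) - beta);

	return alpha ? alpha : (1 << 7);	/* never let alpha reach 0 */
}
#endif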
// SPDX-License-Identifier: GPL-2.0 #define CREATE_TRACE_POINTS #include <trace/events/mmap_lock.h> #include <linux/mm.h> #include <linux/cgroup.h> #include <linux/memcontrol.h> #include <linux/mmap_lock.h> #include <linux/mutex.h> #include <linux/percpu.h> #include <linux/rcupdate.h> #include <linux/smp.h> #include <linux/trace_events.h> #include <linux/local_lock.h> EXPORT_TRACEPOINT_SYMBOL(mmap_lock_start_locking); EXPORT_TRACEPOINT_SYMBOL(mmap_lock_acquire_returned); EXPORT_TRACEPOINT_SYMBOL(mmap_lock_released); #ifdef CONFIG_TRACING /* * Trace calls must be in a separate file, as otherwise there's a circular * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h. */ void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write) { trace_mmap_lock_start_locking(mm, write); } EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking); void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write, bool success) { trace_mmap_lock_acquire_returned(mm, write, success); } EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned); void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write) { trace_mmap_lock_released(mm, write); } EXPORT_SYMBOL(__mmap_lock_do_trace_released); #endif /* CONFIG_TRACING */
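/*
 * Illustrative sketch, not part of the file above: since the tracepoints
 * are exported, a module could hook them through the register_trace_*
 * helpers generated by the tracepoint machinery (the probe receives the
 * tracepoint arguments after a void *data cookie). All names below are
 * hypothetical, and this assumes the standard tracepoint probe pattern.
 */
#if 0
static void example_probe(void *data, struct mm_struct *mm, bool write)
{
	/* runs on every mmap_lock acquisition attempt */
	pr_debug("mmap_lock: start locking mm=%p write=%d\n", mm, write);
}

static int __init example_init(void)
{
	return register_trace_mmap_lock_start_locking(example_probe, NULL);
}
#endif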
/* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_IRQ_H #define _LINUX_IRQ_H /* * Please do not include this file in generic code. There is currently * no requirement for any architecture to implement anything held * within this file. * * Thanks. --rmk */ #include <linux/cache.h> #include <linux/spinlock.h> #include <linux/cpumask.h> #include <linux/irqhandler.h> #include <linux/irqreturn.h> #include <linux/irqnr.h> #include <linux/topology.h> #include <linux/io.h> #include <linux/slab.h> #include <asm/irq.h> #include <asm/ptrace.h> #include <asm/irq_regs.h> struct seq_file; struct module; struct msi_msg; struct irq_affinity_desc; enum irqchip_irq_state; /* * IRQ line status. * * Bits 0-7 are the same as the IRQF_* bits in linux/interrupt.h * * IRQ_TYPE_NONE - default, unspecified type * IRQ_TYPE_EDGE_RISING - rising edge triggered * IRQ_TYPE_EDGE_FALLING - falling edge triggered * IRQ_TYPE_EDGE_BOTH - rising and falling edge triggered * IRQ_TYPE_LEVEL_HIGH - high level triggered * IRQ_TYPE_LEVEL_LOW - low level triggered * IRQ_TYPE_LEVEL_MASK - Mask to filter out the level bits * IRQ_TYPE_SENSE_MASK - Mask for all the above bits * IRQ_TYPE_DEFAULT - For use by some PICs to ask irq_set_type * to setup the HW to a sane default (used * by irqdomain map() callbacks to synchronize * the HW state and SW flags for a newly * allocated descriptor). * * IRQ_TYPE_PROBE - Special flag for probing in progress * * Bits which can be modified via irq_set/clear/modify_status_flags() * IRQ_LEVEL - Interrupt is level type.
Will be also * updated in the code when the above trigger * bits are modified via irq_set_irq_type() * IRQ_PER_CPU - Mark an interrupt PER_CPU. Will protect * it from affinity setting * IRQ_NOPROBE - Interrupt cannot be probed by autoprobing * IRQ_NOREQUEST - Interrupt cannot be requested via * request_irq() * IRQ_NOTHREAD - Interrupt cannot be threaded * IRQ_NOAUTOEN - Interrupt is not automatically enabled in * request/setup_irq() * IRQ_NO_BALANCING - Interrupt cannot be balanced (affinity set) * IRQ_NESTED_THREAD - Interrupt nests into another thread * IRQ_PER_CPU_DEVID - Dev_id is a per-cpu variable * IRQ_IS_POLLED - Always polled by another interrupt. Exclude * it from the spurious interrupt detection * mechanism and from core side polling. * IRQ_DISABLE_UNLAZY - Disable lazy irq disable * IRQ_HIDDEN - Don't show up in /proc/interrupts * IRQ_NO_DEBUG - Exclude from note_interrupt() debugging */ enum { IRQ_TYPE_NONE = 0x00000000, IRQ_TYPE_EDGE_RISING = 0x00000001, IRQ_TYPE_EDGE_FALLING = 0x00000002, IRQ_TYPE_EDGE_BOTH = (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_EDGE_RISING), IRQ_TYPE_LEVEL_HIGH = 0x00000004, IRQ_TYPE_LEVEL_LOW = 0x00000008, IRQ_TYPE_LEVEL_MASK = (IRQ_TYPE_LEVEL_LOW | IRQ_TYPE_LEVEL_HIGH), IRQ_TYPE_SENSE_MASK = 0x0000000f, IRQ_TYPE_DEFAULT = IRQ_TYPE_SENSE_MASK, IRQ_TYPE_PROBE = 0x00000010, IRQ_LEVEL = (1 << 8), IRQ_PER_CPU = (1 << 9), IRQ_NOPROBE = (1 << 10), IRQ_NOREQUEST = (1 << 11), IRQ_NOAUTOEN = (1 << 12), IRQ_NO_BALANCING = (1 << 13), IRQ_NESTED_THREAD = (1 << 15), IRQ_NOTHREAD = (1 << 16), IRQ_PER_CPU_DEVID = (1 << 17), IRQ_IS_POLLED = (1 << 18), IRQ_DISABLE_UNLAZY = (1 << 19), IRQ_HIDDEN = (1 << 20), IRQ_NO_DEBUG = (1 << 21), }; #define IRQF_MODIFY_MASK \ (IRQ_TYPE_SENSE_MASK | IRQ_NOPROBE | IRQ_NOREQUEST | \ IRQ_NOAUTOEN | IRQ_LEVEL | IRQ_NO_BALANCING | \ IRQ_PER_CPU | IRQ_NESTED_THREAD | IRQ_NOTHREAD | IRQ_PER_CPU_DEVID | \ IRQ_IS_POLLED | IRQ_DISABLE_UNLAZY | IRQ_HIDDEN) #define IRQ_NO_BALANCING_MASK (IRQ_PER_CPU | IRQ_NO_BALANCING) /* * Return value for chip->irq_set_affinity() * * IRQ_SET_MASK_OK - OK, core updates irq_common_data.affinity * IRQ_SET_MASK_NOCOPY - OK, chip did update irq_common_data.affinity * IRQ_SET_MASK_OK_DONE - Same as IRQ_SET_MASK_OK for core. Special code to * support stacked irqchips, which indicates skipping * all descendant irqchips. */ enum { IRQ_SET_MASK_OK = 0, IRQ_SET_MASK_OK_NOCOPY, IRQ_SET_MASK_OK_DONE, }; struct msi_desc; struct irq_domain; /** * struct irq_common_data - per irq data shared by all irqchips * @state_use_accessors: status information for irq chip functions. * Use accessor functions to deal with it * @node: node index useful for balancing * @handler_data: per-IRQ data for the irq_chip methods * @affinity: IRQ affinity on SMP. If this is an IPI * related irq, then this is the mask of the * CPUs to which an IPI can be sent. * @effective_affinity: The effective IRQ affinity on SMP as some irq * chips do not allow multi CPU destinations. * A subset of @affinity. * @msi_desc: MSI descriptor * @ipi_offset: Offset of first IPI target cpu in @affinity. Optional. 
*/ struct irq_common_data { unsigned int __private state_use_accessors; #ifdef CONFIG_NUMA unsigned int node; #endif void *handler_data; struct msi_desc *msi_desc; #ifdef CONFIG_SMP cpumask_var_t affinity; #endif #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK cpumask_var_t effective_affinity; #endif #ifdef CONFIG_GENERIC_IRQ_IPI unsigned int ipi_offset; #endif }; /** * struct irq_data - per irq chip data passed down to chip functions * @mask: precomputed bitmask for accessing the chip registers * @irq: interrupt number * @hwirq: hardware interrupt number, local to the interrupt domain * @common: point to data shared by all irqchips * @chip: low level interrupt hardware access * @domain: Interrupt translation domain; responsible for mapping * between hwirq number and linux irq number. * @parent_data: pointer to parent struct irq_data to support hierarchy * irq_domain * @chip_data: platform-specific per-chip private data for the chip * methods, to allow shared chip implementations */ struct irq_data { u32 mask; unsigned int irq; irq_hw_number_t hwirq; struct irq_common_data *common; struct irq_chip *chip; struct irq_domain *domain; #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY struct irq_data *parent_data; #endif void *chip_data; }; /* * Bit masks for irq_common_data.state_use_accessors * * IRQD_TRIGGER_MASK - Mask for the trigger type bits * IRQD_SETAFFINITY_PENDING - Affinity setting is pending * IRQD_ACTIVATED - Interrupt has already been activated * IRQD_NO_BALANCING - Balancing disabled for this IRQ * IRQD_PER_CPU - Interrupt is per cpu * IRQD_AFFINITY_SET - Interrupt affinity was set * IRQD_LEVEL - Interrupt is level triggered * IRQD_WAKEUP_STATE - Interrupt is configured for wakeup * from suspend * IRQD_IRQ_DISABLED - Disabled state of the interrupt * IRQD_IRQ_MASKED - Masked state of the interrupt * IRQD_IRQ_INPROGRESS - In progress state of the interrupt * IRQD_WAKEUP_ARMED - Wakeup mode armed * IRQD_FORWARDED_TO_VCPU - The interrupt is forwarded to a VCPU * IRQD_AFFINITY_MANAGED - Affinity is auto-managed by the kernel * IRQD_IRQ_STARTED - Startup state of the interrupt * IRQD_MANAGED_SHUTDOWN - Interrupt was shutdown due to empty affinity * mask. Applies only to affinity managed irqs. * IRQD_SINGLE_TARGET - IRQ allows only a single affinity target * IRQD_DEFAULT_TRIGGER_SET - Expected trigger already been set * IRQD_CAN_RESERVE - Can use reservation mode * IRQD_HANDLE_ENFORCE_IRQCTX - Enforce that handle_irq_*() is only invoked * from actual interrupt context. * IRQD_AFFINITY_ON_ACTIVATE - Affinity is set on activation. Don't call * irq_chip::irq_set_affinity() when deactivated. * IRQD_IRQ_ENABLED_ON_SUSPEND - Interrupt is enabled on suspend by irq pm if * irqchip have flag IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set. * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in progress in which * case it must be resent at the next available opportunity. 
*/ enum { IRQD_TRIGGER_MASK = 0xf, IRQD_SETAFFINITY_PENDING = BIT(8), IRQD_ACTIVATED = BIT(9), IRQD_NO_BALANCING = BIT(10), IRQD_PER_CPU = BIT(11), IRQD_AFFINITY_SET = BIT(12), IRQD_LEVEL = BIT(13), IRQD_WAKEUP_STATE = BIT(14), IRQD_IRQ_DISABLED = BIT(16), IRQD_IRQ_MASKED = BIT(17), IRQD_IRQ_INPROGRESS = BIT(18), IRQD_WAKEUP_ARMED = BIT(19), IRQD_FORWARDED_TO_VCPU = BIT(20), IRQD_AFFINITY_MANAGED = BIT(21), IRQD_IRQ_STARTED = BIT(22), IRQD_MANAGED_SHUTDOWN = BIT(23), IRQD_SINGLE_TARGET = BIT(24), IRQD_DEFAULT_TRIGGER_SET = BIT(25), IRQD_CAN_RESERVE = BIT(26), IRQD_HANDLE_ENFORCE_IRQCTX = BIT(27), IRQD_AFFINITY_ON_ACTIVATE = BIT(28), IRQD_IRQ_ENABLED_ON_SUSPEND = BIT(29), IRQD_RESEND_WHEN_IN_PROGRESS = BIT(30), }; #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors) static inline bool irqd_is_setaffinity_pending(struct irq_data *d) { return __irqd_to_state(d) & IRQD_SETAFFINITY_PENDING; } static inline bool irqd_is_per_cpu(struct irq_data *d) { return __irqd_to_state(d) & IRQD_PER_CPU; } static inline bool irqd_can_balance(struct irq_data *d) { return !(__irqd_to_state(d) & (IRQD_PER_CPU | IRQD_NO_BALANCING)); } static inline bool irqd_affinity_was_set(struct irq_data *d) { return __irqd_to_state(d) & IRQD_AFFINITY_SET; } static inline void irqd_mark_affinity_was_set(struct irq_data *d) { __irqd_to_state(d) |= IRQD_AFFINITY_SET; } static inline bool irqd_trigger_type_was_set(struct irq_data *d) { return __irqd_to_state(d) & IRQD_DEFAULT_TRIGGER_SET; } static inline u32 irqd_get_trigger_type(struct irq_data *d) { return __irqd_to_state(d) & IRQD_TRIGGER_MASK; } /* * Must only be called inside irq_chip.irq_set_type() functions or * from the DT/ACPI setup code. */ static inline void irqd_set_trigger_type(struct irq_data *d, u32 type) { __irqd_to_state(d) &= ~IRQD_TRIGGER_MASK; __irqd_to_state(d) |= type & IRQD_TRIGGER_MASK; __irqd_to_state(d) |= IRQD_DEFAULT_TRIGGER_SET; } static inline bool irqd_is_level_type(struct irq_data *d) { return __irqd_to_state(d) & IRQD_LEVEL; } /* * Must only be called from irqchip.irq_set_affinity() or low level * hierarchy domain allocation functions.
*/ static inline void irqd_set_single_target(struct irq_data *d) { __irqd_to_state(d) |= IRQD_SINGLE_TARGET; } static inline bool irqd_is_single_target(struct irq_data *d) { return __irqd_to_state(d) & IRQD_SINGLE_TARGET; } static inline void irqd_set_handle_enforce_irqctx(struct irq_data *d) { __irqd_to_state(d) |= IRQD_HANDLE_ENFORCE_IRQCTX; } static inline bool irqd_is_handle_enforce_irqctx(struct irq_data *d) { return __irqd_to_state(d) & IRQD_HANDLE_ENFORCE_IRQCTX; } static inline bool irqd_is_enabled_on_suspend(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_ENABLED_ON_SUSPEND; } static inline bool irqd_is_wakeup_set(struct irq_data *d) { return __irqd_to_state(d) & IRQD_WAKEUP_STATE; } static inline bool irqd_irq_disabled(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_DISABLED; } static inline bool irqd_irq_masked(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_MASKED; } static inline bool irqd_irq_inprogress(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_INPROGRESS; } static inline bool irqd_is_wakeup_armed(struct irq_data *d) { return __irqd_to_state(d) & IRQD_WAKEUP_ARMED; } static inline bool irqd_is_forwarded_to_vcpu(struct irq_data *d) { return __irqd_to_state(d) & IRQD_FORWARDED_TO_VCPU; } static inline void irqd_set_forwarded_to_vcpu(struct irq_data *d) { __irqd_to_state(d) |= IRQD_FORWARDED_TO_VCPU; } static inline void irqd_clr_forwarded_to_vcpu(struct irq_data *d) { __irqd_to_state(d) &= ~IRQD_FORWARDED_TO_VCPU; } static inline bool irqd_affinity_is_managed(struct irq_data *d) { return __irqd_to_state(d) & IRQD_AFFINITY_MANAGED; } static inline bool irqd_is_activated(struct irq_data *d) { return __irqd_to_state(d) & IRQD_ACTIVATED; } static inline void irqd_set_activated(struct irq_data *d) { __irqd_to_state(d) |= IRQD_ACTIVATED; } static inline void irqd_clr_activated(struct irq_data *d) { __irqd_to_state(d) &= ~IRQD_ACTIVATED; } static inline bool irqd_is_started(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_STARTED; } static inline bool irqd_is_managed_and_shutdown(struct irq_data *d) { return __irqd_to_state(d) & IRQD_MANAGED_SHUTDOWN; } static inline void irqd_set_can_reserve(struct irq_data *d) { __irqd_to_state(d) |= IRQD_CAN_RESERVE; } static inline void irqd_clr_can_reserve(struct irq_data *d) { __irqd_to_state(d) &= ~IRQD_CAN_RESERVE; } static inline bool irqd_can_reserve(struct irq_data *d) { return __irqd_to_state(d) & IRQD_CAN_RESERVE; } static inline void irqd_set_affinity_on_activate(struct irq_data *d) { __irqd_to_state(d) |= IRQD_AFFINITY_ON_ACTIVATE; } static inline bool irqd_affinity_on_activate(struct irq_data *d) { return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE; } static inline void irqd_set_resend_when_in_progress(struct irq_data *d) { __irqd_to_state(d) |= IRQD_RESEND_WHEN_IN_PROGRESS; } static inline bool irqd_needs_resend_when_in_progress(struct irq_data *d) { return __irqd_to_state(d) & IRQD_RESEND_WHEN_IN_PROGRESS; } #undef __irqd_to_state static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d) { return d->hwirq; } /** * struct irq_chip - hardware interrupt chip descriptor * * @name: name for /proc/interrupts * @irq_startup: start up the interrupt (defaults to ->enable if NULL) * @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL) * @irq_enable: enable the interrupt (defaults to chip->unmask if NULL) * @irq_disable: disable the interrupt * @irq_ack: start of a new interrupt * @irq_mask: mask an interrupt source * @irq_mask_ack: ack and mask an interrupt 
source * @irq_unmask: unmask an interrupt source * @irq_eoi: end of interrupt * @irq_set_affinity: Set the CPU affinity on SMP machines. If the force * argument is true, it tells the driver to * unconditionally apply the affinity setting. Sanity * checks against the supplied affinity mask are not * required. This is used for CPU hotplug where the * target CPU is not yet set in the cpu_online_mask. * @irq_retrigger: resend an IRQ to the CPU * @irq_set_type: set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ * @irq_set_wake: enable/disable power-management wake-on of an IRQ * @irq_bus_lock: function to lock access to slow bus (i2c) chips * @irq_bus_sync_unlock: function to sync and unlock slow bus (i2c) chips * @irq_cpu_online: configure an interrupt source for a secondary CPU * @irq_cpu_offline: un-configure an interrupt source for a secondary CPU * @irq_suspend: function called from core code on suspend once per * chip, when one or more interrupts are installed * @irq_resume: function called from core code on resume once per chip, * when one or more interrupts are installed * @irq_pm_shutdown: function called from core code on shutdown once per chip * @irq_calc_mask: Optional function to set irq_data.mask for special cases * @irq_print_chip: optional to print special chip info in show_interrupts * @irq_request_resources: optional to request resources before calling * any other callback related to this irq * @irq_release_resources: optional to release resources acquired with * irq_request_resources * @irq_compose_msi_msg: optional to compose message content for MSI * @irq_write_msi_msg: optional to write message content for MSI * @irq_get_irqchip_state: return the internal state of an interrupt * @irq_set_irqchip_state: set the internal state of an interrupt * @irq_set_vcpu_affinity: optional to target a vCPU in a virtual machine * @ipi_send_single: send a single IPI to destination cpus * @ipi_send_mask: send an IPI to destination cpus in cpumask * @irq_nmi_setup: function called from core code before enabling an NMI * @irq_nmi_teardown: function called from core code after disabling an NMI * @flags: chip specific flags */ struct irq_chip { const char *name; unsigned int (*irq_startup)(struct irq_data *data); void (*irq_shutdown)(struct irq_data *data); void (*irq_enable)(struct irq_data *data); void (*irq_disable)(struct irq_data *data); void (*irq_ack)(struct irq_data *data); void (*irq_mask)(struct irq_data *data); void (*irq_mask_ack)(struct irq_data *data); void (*irq_unmask)(struct irq_data *data); void (*irq_eoi)(struct irq_data *data); int (*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force); int (*irq_retrigger)(struct irq_data *data); int (*irq_set_type)(struct irq_data *data, unsigned int flow_type); int (*irq_set_wake)(struct irq_data *data, unsigned int on); void (*irq_bus_lock)(struct irq_data *data); void (*irq_bus_sync_unlock)(struct irq_data *data); #ifdef CONFIG_DEPRECATED_IRQ_CPU_ONOFFLINE void (*irq_cpu_online)(struct irq_data *data); void (*irq_cpu_offline)(struct irq_data *data); #endif void (*irq_suspend)(struct irq_data *data); void (*irq_resume)(struct irq_data *data); void (*irq_pm_shutdown)(struct irq_data *data); void (*irq_calc_mask)(struct irq_data *data); void (*irq_print_chip)(struct irq_data *data, struct seq_file *p); int (*irq_request_resources)(struct irq_data *data); void (*irq_release_resources)(struct irq_data *data); void (*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg); void (*irq_write_msi_msg)(struct
irq_data *data, struct msi_msg *msg); int (*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state); int (*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state); int (*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info); void (*ipi_send_single)(struct irq_data *data, unsigned int cpu); void (*ipi_send_mask)(struct irq_data *data, const struct cpumask *dest); int (*irq_nmi_setup)(struct irq_data *data); void (*irq_nmi_teardown)(struct irq_data *data); unsigned long flags; }; /* * irq_chip specific flags * * IRQCHIP_SET_TYPE_MASKED: Mask before calling chip.irq_set_type() * IRQCHIP_EOI_IF_HANDLED: Only issue irq_eoi() when irq was handled * IRQCHIP_MASK_ON_SUSPEND: Mask non wake irqs in the suspend path * IRQCHIP_ONOFFLINE_ENABLED: Only call irq_on/off_line callbacks * when irq enabled * IRQCHIP_SKIP_SET_WAKE: Skip chip.irq_set_wake(), for this irq chip * IRQCHIP_ONESHOT_SAFE: One shot does not require mask/unmask * IRQCHIP_EOI_THREADED: Chip requires eoi() on unmask in threaded mode * IRQCHIP_SUPPORTS_LEVEL_MSI: Chip can provide two doorbells for Level MSIs * IRQCHIP_SUPPORTS_NMI: Chip can deliver NMIs, only for root irqchips * IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND: Invokes __enable_irq()/__disable_irq() for wake irqs * in the suspend path if they are in disabled state * IRQCHIP_AFFINITY_PRE_STARTUP: Default affinity update before startup * IRQCHIP_IMMUTABLE: Don't ever change anything in this chip * IRQCHIP_MOVE_DEFERRED: Move the interrupt in actual interrupt context */ enum { IRQCHIP_SET_TYPE_MASKED = (1 << 0), IRQCHIP_EOI_IF_HANDLED = (1 << 1), IRQCHIP_MASK_ON_SUSPEND = (1 << 2), IRQCHIP_ONOFFLINE_ENABLED = (1 << 3), IRQCHIP_SKIP_SET_WAKE = (1 << 4), IRQCHIP_ONESHOT_SAFE = (1 << 5), IRQCHIP_EOI_THREADED = (1 << 6), IRQCHIP_SUPPORTS_LEVEL_MSI = (1 << 7), IRQCHIP_SUPPORTS_NMI = (1 << 8), IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND = (1 << 9), IRQCHIP_AFFINITY_PRE_STARTUP = (1 << 10), IRQCHIP_IMMUTABLE = (1 << 11), IRQCHIP_MOVE_DEFERRED = (1 << 12), }; #include <linux/irqdesc.h> /* * Pick up the arch-dependent methods: */ #include <asm/hw_irq.h> #ifndef NR_IRQS_LEGACY # define NR_IRQS_LEGACY 0 #endif #ifndef ARCH_IRQ_INIT_FLAGS # define ARCH_IRQ_INIT_FLAGS 0 #endif #define IRQ_DEFAULT_INIT_FLAGS ARCH_IRQ_INIT_FLAGS struct irqaction; extern int setup_percpu_irq(unsigned int irq, struct irqaction *new); extern void remove_percpu_irq(unsigned int irq, struct irqaction *act); #ifdef CONFIG_DEPRECATED_IRQ_CPU_ONOFFLINE extern void irq_cpu_online(void); extern void irq_cpu_offline(void); #endif extern int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *cpumask, bool force); extern int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info); #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_IRQ_MIGRATION) extern void irq_migrate_all_off_this_cpu(void); extern int irq_affinity_online_cpu(unsigned int cpu); #else # define irq_affinity_online_cpu NULL #endif #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_PENDING_IRQ) void __irq_move_irq(struct irq_data *data); static inline void irq_move_irq(struct irq_data *data) { if (unlikely(irqd_is_setaffinity_pending(data))) __irq_move_irq(data); } void irq_move_masked_irq(struct irq_data *data); void irq_force_complete_move(struct irq_desc *desc); #else static inline void irq_move_irq(struct irq_data *data) { } static inline void irq_move_masked_irq(struct irq_data *data) { } static inline void irq_force_complete_move(struct irq_desc *desc) { } #endif extern int 
no_irq_affinity; #ifdef CONFIG_HARDIRQS_SW_RESEND int irq_set_parent(int irq, int parent_irq); #else static inline int irq_set_parent(int irq, int parent_irq) { return 0; } #endif /* * Built-in IRQ handlers for various IRQ types, * callable via desc->handle_irq() */ extern void handle_level_irq(struct irq_desc *desc); extern void handle_fasteoi_irq(struct irq_desc *desc); extern void handle_edge_irq(struct irq_desc *desc); extern void handle_edge_eoi_irq(struct irq_desc *desc); extern void handle_simple_irq(struct irq_desc *desc); extern void handle_untracked_irq(struct irq_desc *desc); extern void handle_percpu_irq(struct irq_desc *desc); extern void handle_percpu_devid_irq(struct irq_desc *desc); extern void handle_bad_irq(struct irq_desc *desc); extern void handle_nested_irq(unsigned int irq); extern void handle_fasteoi_nmi(struct irq_desc *desc); extern void handle_percpu_devid_fasteoi_nmi(struct irq_desc *desc); extern int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg); extern int irq_chip_pm_get(struct irq_data *data); extern int irq_chip_pm_put(struct irq_data *data); #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY extern void handle_fasteoi_ack_irq(struct irq_desc *desc); extern void handle_fasteoi_mask_irq(struct irq_desc *desc); extern int irq_chip_set_parent_state(struct irq_data *data, enum irqchip_irq_state which, bool val); extern int irq_chip_get_parent_state(struct irq_data *data, enum irqchip_irq_state which, bool *state); extern void irq_chip_enable_parent(struct irq_data *data); extern void irq_chip_disable_parent(struct irq_data *data); extern void irq_chip_ack_parent(struct irq_data *data); extern int irq_chip_retrigger_hierarchy(struct irq_data *data); extern void irq_chip_mask_parent(struct irq_data *data); extern void irq_chip_mask_ack_parent(struct irq_data *data); extern void irq_chip_unmask_parent(struct irq_data *data); extern void irq_chip_eoi_parent(struct irq_data *data); extern int irq_chip_set_affinity_parent(struct irq_data *data, const struct cpumask *dest, bool force); extern int irq_chip_set_wake_parent(struct irq_data *data, unsigned int on); extern int irq_chip_set_vcpu_affinity_parent(struct irq_data *data, void *vcpu_info); extern int irq_chip_set_type_parent(struct irq_data *data, unsigned int type); extern int irq_chip_request_resources_parent(struct irq_data *data); extern void irq_chip_release_resources_parent(struct irq_data *data); #endif /* Disable or mask interrupts during a kernel kexec */ extern void machine_kexec_mask_interrupts(void); /* Handling of unhandled and spurious interrupts: */ extern void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret); /* Enable/disable irq debugging output: */ extern int noirqdebug_setup(char *str); /* Checks whether the interrupt can be requested by request_irq(): */ extern int can_request_irq(unsigned int irq, unsigned long irqflags); /* Dummy irq-chip implementations: */ extern struct irq_chip no_irq_chip; extern struct irq_chip dummy_irq_chip; extern void irq_set_chip_and_handler_name(unsigned int irq, const struct irq_chip *chip, irq_flow_handler_t handle, const char *name); static inline void irq_set_chip_and_handler(unsigned int irq, const struct irq_chip *chip, irq_flow_handler_t handle) { irq_set_chip_and_handler_name(irq, chip, handle, NULL); } extern int irq_set_percpu_devid(unsigned int irq); extern int irq_set_percpu_devid_partition(unsigned int irq, const struct cpumask *affinity); extern int irq_get_percpu_devid_partition(unsigned int irq, struct cpumask *affinity); 
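/*
 * Illustrative sketch only (not part of the header): a minimal irq_chip for a
 * hypothetical memory-mapped controller, wired up with the helpers declared
 * above. FOO_MASK, struct foo_priv and foo_hwirq_setup() are made-up names
 * for the example; irq_set_chip_and_handler(), irq_set_chip_data(),
 * irq_data_get_irq_chip_data(), irqd_to_hwirq() and handle_level_irq() are
 * the real API.
 */
#if 0	/* example only */
struct foo_priv {
	void __iomem *base;
};

static void foo_irq_mask(struct irq_data *d)
{
	struct foo_priv *p = irq_data_get_irq_chip_data(d);

	/* Disable this source: set its bit in the (hypothetical) mask reg */
	writel(readl(p->base + FOO_MASK) | BIT(irqd_to_hwirq(d)),
	       p->base + FOO_MASK);
}

static void foo_irq_unmask(struct irq_data *d)
{
	struct foo_priv *p = irq_data_get_irq_chip_data(d);

	writel(readl(p->base + FOO_MASK) & ~BIT(irqd_to_hwirq(d)),
	       p->base + FOO_MASK);
}

static struct irq_chip foo_irq_chip = {
	.name		= "foo",
	.irq_mask	= foo_irq_mask,
	.irq_unmask	= foo_irq_unmask,
};

static void foo_hwirq_setup(unsigned int irq, struct foo_priv *p)
{
	/* Level-triggered source: chip, flow handler, per-irq chip data */
	irq_set_chip_and_handler(irq, &foo_irq_chip, handle_level_irq);
	irq_set_chip_data(irq, p);
}
#endif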
extern void __irq_set_handler(unsigned int irq, irq_flow_handler_t handle, int is_chained, const char *name); static inline void irq_set_handler(unsigned int irq, irq_flow_handler_t handle) { __irq_set_handler(irq, handle, 0, NULL); } /* * Set a highlevel chained flow handler for a given IRQ. * (a chained handler is automatically enabled and set to * IRQ_NOREQUEST, IRQ_NOPROBE, and IRQ_NOTHREAD) */ static inline void irq_set_chained_handler(unsigned int irq, irq_flow_handler_t handle) { __irq_set_handler(irq, handle, 1, NULL); } /* * Set a highlevel chained flow handler and its data for a given IRQ. * (a chained handler is automatically enabled and set to * IRQ_NOREQUEST, IRQ_NOPROBE, and IRQ_NOTHREAD) */ void irq_set_chained_handler_and_data(unsigned int irq, irq_flow_handler_t handle, void *data); void irq_modify_status(unsigned int irq, unsigned long clr, unsigned long set); static inline void irq_set_status_flags(unsigned int irq, unsigned long set) { irq_modify_status(irq, 0, set); } static inline void irq_clear_status_flags(unsigned int irq, unsigned long clr) { irq_modify_status(irq, clr, 0); } static inline void irq_set_noprobe(unsigned int irq) { irq_modify_status(irq, 0, IRQ_NOPROBE); } static inline void irq_set_probe(unsigned int irq) { irq_modify_status(irq, IRQ_NOPROBE, 0); } static inline void irq_set_nothread(unsigned int irq) { irq_modify_status(irq, 0, IRQ_NOTHREAD); } static inline void irq_set_thread(unsigned int irq) { irq_modify_status(irq, IRQ_NOTHREAD, 0); } static inline void irq_set_nested_thread(unsigned int irq, bool nest) { if (nest) irq_set_status_flags(irq, IRQ_NESTED_THREAD); else irq_clear_status_flags(irq, IRQ_NESTED_THREAD); } static inline void irq_set_percpu_devid_flags(unsigned int irq) { irq_set_status_flags(irq, IRQ_NOAUTOEN | IRQ_PER_CPU | IRQ_NOTHREAD | IRQ_NOPROBE | IRQ_PER_CPU_DEVID); } /* Set/get chip/data for an IRQ: */ extern int irq_set_chip(unsigned int irq, const struct irq_chip *chip); extern int irq_set_handler_data(unsigned int irq, void *data); extern int irq_set_chip_data(unsigned int irq, void *data); extern int irq_set_irq_type(unsigned int irq, unsigned int type); extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry); extern int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset, struct msi_desc *entry); extern struct irq_data *irq_get_irq_data(unsigned int irq); static inline struct irq_chip *irq_get_chip(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? d->chip : NULL; } static inline struct irq_chip *irq_data_get_irq_chip(struct irq_data *d) { return d->chip; } static inline void *irq_get_chip_data(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? d->chip_data : NULL; } static inline void *irq_data_get_irq_chip_data(struct irq_data *d) { return d->chip_data; } static inline void *irq_get_handler_data(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? d->common->handler_data : NULL; } static inline void *irq_data_get_irq_handler_data(struct irq_data *d) { return d->common->handler_data; } static inline struct msi_desc *irq_get_msi_desc(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? d->common->msi_desc : NULL; } static inline struct msi_desc *irq_data_get_msi_desc(struct irq_data *d) { return d->common->msi_desc; } static inline u32 irq_get_trigger_type(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? 
irqd_get_trigger_type(d) : 0; } static inline int irq_common_data_get_node(struct irq_common_data *d) { #ifdef CONFIG_NUMA return d->node; #else return 0; #endif } static inline int irq_data_get_node(struct irq_data *d) { return irq_common_data_get_node(d->common); } static inline const struct cpumask *irq_data_get_affinity_mask(struct irq_data *d) { #ifdef CONFIG_SMP return d->common->affinity; #else return cpumask_of(0); #endif } static inline void irq_data_update_affinity(struct irq_data *d, const struct cpumask *m) { #ifdef CONFIG_SMP cpumask_copy(d->common->affinity, m); #endif } static inline const struct cpumask *irq_get_affinity_mask(int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? irq_data_get_affinity_mask(d) : NULL; } #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK static inline const struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d) { return d->common->effective_affinity; } static inline void irq_data_update_effective_affinity(struct irq_data *d, const struct cpumask *m) { cpumask_copy(d->common->effective_affinity, m); } #else static inline void irq_data_update_effective_affinity(struct irq_data *d, const struct cpumask *m) { } static inline const struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d) { return irq_data_get_affinity_mask(d); } #endif static inline const struct cpumask *irq_get_effective_affinity_mask(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? irq_data_get_effective_affinity_mask(d) : NULL; } unsigned int arch_dynirq_lower_bound(unsigned int from); int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, struct module *owner, const struct irq_affinity_desc *affinity); int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from, unsigned int cnt, int node, struct module *owner, const struct irq_affinity_desc *affinity); /* use macros to avoid needing export.h for THIS_MODULE */ #define irq_alloc_descs(irq, from, cnt, node) \ __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE, NULL) #define irq_alloc_desc(node) \ irq_alloc_descs(-1, 1, 1, node) #define irq_alloc_desc_at(at, node) \ irq_alloc_descs(at, at, 1, node) #define irq_alloc_desc_from(from, node) \ irq_alloc_descs(-1, from, 1, node) #define irq_alloc_descs_from(from, cnt, node) \ irq_alloc_descs(-1, from, cnt, node) #define devm_irq_alloc_descs(dev, irq, from, cnt, node) \ __devm_irq_alloc_descs(dev, irq, from, cnt, node, THIS_MODULE, NULL) #define devm_irq_alloc_desc(dev, node) \ devm_irq_alloc_descs(dev, -1, 1, 1, node) #define devm_irq_alloc_desc_at(dev, at, node) \ devm_irq_alloc_descs(dev, at, at, 1, node) #define devm_irq_alloc_desc_from(dev, from, node) \ devm_irq_alloc_descs(dev, -1, from, 1, node) #define devm_irq_alloc_descs_from(dev, from, cnt, node) \ devm_irq_alloc_descs(dev, -1, from, cnt, node) void irq_free_descs(unsigned int irq, unsigned int cnt); static inline void irq_free_desc(unsigned int irq) { irq_free_descs(irq, 1); } #ifdef CONFIG_GENERIC_IRQ_LEGACY void irq_init_desc(unsigned int irq); #endif /** * struct irq_chip_regs - register offsets for struct irq_gci * @enable: Enable register offset to reg_base * @disable: Disable register offset to reg_base * @mask: Mask register offset to reg_base * @ack: Ack register offset to reg_base * @eoi: Eoi register offset to reg_base * @type: Type configuration register offset to reg_base */ struct irq_chip_regs { unsigned long enable; unsigned long disable; unsigned long mask; unsigned long ack; unsigned long eoi; unsigned 
long type; }; /** * struct irq_chip_type - Generic interrupt chip instance for a flow type * @chip: The real interrupt chip which provides the callbacks * @regs: Register offsets for this chip * @handler: Flow handler associated with this chip * @type: Chip can handle these flow types * @mask_cache_priv: Cached mask register private to the chip type * @mask_cache: Pointer to cached mask register * * An irq_generic_chip can have several instances of irq_chip_type when * it requires different functions and register offsets for different * flow types. */ struct irq_chip_type { struct irq_chip chip; struct irq_chip_regs regs; irq_flow_handler_t handler; u32 type; u32 mask_cache_priv; u32 *mask_cache; }; /** * struct irq_chip_generic - Generic irq chip data structure * @lock: Lock to protect register and cache data access * @reg_base: Register base address (virtual) * @reg_readl: Alternate I/O accessor (defaults to readl if NULL) * @reg_writel: Alternate I/O accessor (defaults to writel if NULL) * @suspend: Function called from core code on suspend once per * chip; can be useful instead of irq_chip::suspend to * handle chip details even when no interrupts are in use * @resume: Function called from core code on resume once per chip; * can be useful instead of irq_chip::resume to handle * chip details even when no interrupts are in use * @irq_base: Interrupt base nr for this chip * @irq_cnt: Number of interrupts handled by this chip * @mask_cache: Cached mask register shared between all chip types * @wake_enabled: Interrupt can wake the system up from suspend * @wake_active: Interrupt is marked as a wakeup-from-suspend source * @num_ct: Number of available irq_chip_type instances (usually 1) * @private: Private data for non-generic chip callbacks * @installed: bitfield to denote installed interrupts * @unused: bitfield to denote unused interrupts * @domain: irq domain pointer * @list: List head for keeping track of instances * @chip_types: Array of interrupt irq_chip_types * * Note that irq_chip_generic can have multiple irq_chip_type * implementations which can be associated with a particular irq line of * an irq_chip_generic instance. That allows state to be shared and * protected in an irq_chip_generic instance when we need to implement * different flow mechanisms (level/edge) for it. */ struct irq_chip_generic { raw_spinlock_t lock; void __iomem *reg_base; u32 (*reg_readl)(void __iomem *addr); void (*reg_writel)(u32 val, void __iomem *addr); void (*suspend)(struct irq_chip_generic *gc); void (*resume)(struct irq_chip_generic *gc); unsigned int irq_base; unsigned int irq_cnt; u32 mask_cache; u32 wake_enabled; u32 wake_active; unsigned int num_ct; void *private; unsigned long installed; unsigned long unused; struct irq_domain *domain; struct list_head list; struct irq_chip_type chip_types[]; }; /** * enum irq_gc_flags - Initialization flags for generic irq chips * @IRQ_GC_INIT_MASK_CACHE: Initialize the mask_cache by reading mask reg * @IRQ_GC_INIT_NESTED_LOCK: Set the lock class of the irqs to nested for * irq chips which need to call irq_set_wake() on * the parent irq.
Usually GPIO implementations * @IRQ_GC_MASK_CACHE_PER_TYPE: Mask cache is chip type private * @IRQ_GC_NO_MASK: Do not calculate irq_data->mask * @IRQ_GC_BE_IO: Use big-endian register accesses (default: LE) */ enum irq_gc_flags { IRQ_GC_INIT_MASK_CACHE = 1 << 0, IRQ_GC_INIT_NESTED_LOCK = 1 << 1, IRQ_GC_MASK_CACHE_PER_TYPE = 1 << 2, IRQ_GC_NO_MASK = 1 << 3, IRQ_GC_BE_IO = 1 << 4, }; /* * struct irq_domain_chip_generic - Generic irq chip data structure for irq domains * @irqs_per_chip: Number of interrupts per chip * @num_chips: Number of chips * @irq_flags_to_set: IRQ* flags to set on irq setup * @irq_flags_to_clear: IRQ* flags to clear on irq setup * @gc_flags: Generic chip specific setup flags * @exit: Function called on each chip when it is destroyed. * @gc: Array of pointers to generic interrupt chips */ struct irq_domain_chip_generic { unsigned int irqs_per_chip; unsigned int num_chips; unsigned int irq_flags_to_clear; unsigned int irq_flags_to_set; enum irq_gc_flags gc_flags; void (*exit)(struct irq_chip_generic *gc); struct irq_chip_generic *gc[]; }; /** * struct irq_domain_chip_generic_info - Generic chip information structure * @name: Name of the generic interrupt chip * @handler: Interrupt handler used by the generic interrupt chip * @irqs_per_chip: Number of interrupts each chip handles (max 32) * @num_ct: Number of irq_chip_type instances associated with each * chip * @irq_flags_to_clear: IRQ_* bits to clear in the mapping function * @irq_flags_to_set: IRQ_* bits to set in the mapping function * @gc_flags: Generic chip specific setup flags * @init: Function called on each chip when it is created. * Allows some additional chip initialisation. * @exit: Function called on each chip when it is destroyed. * Allows some chip cleanup.
*/ struct irq_domain_chip_generic_info { const char *name; irq_flow_handler_t handler; unsigned int irqs_per_chip; unsigned int num_ct; unsigned int irq_flags_to_clear; unsigned int irq_flags_to_set; enum irq_gc_flags gc_flags; int (*init)(struct irq_chip_generic *gc); void (*exit)(struct irq_chip_generic *gc); }; /* Generic chip callback functions */ void irq_gc_noop(struct irq_data *d); void irq_gc_mask_disable_reg(struct irq_data *d); void irq_gc_mask_set_bit(struct irq_data *d); void irq_gc_mask_clr_bit(struct irq_data *d); void irq_gc_unmask_enable_reg(struct irq_data *d); void irq_gc_ack_set_bit(struct irq_data *d); void irq_gc_ack_clr_bit(struct irq_data *d); void irq_gc_mask_disable_and_ack_set(struct irq_data *d); void irq_gc_eoi(struct irq_data *d); int irq_gc_set_wake(struct irq_data *d, unsigned int on); /* Setup functions for irq_chip_generic */ int irq_map_generic_chip(struct irq_domain *d, unsigned int virq, irq_hw_number_t hw_irq); void irq_unmap_generic_chip(struct irq_domain *d, unsigned int virq); struct irq_chip_generic * irq_alloc_generic_chip(const char *name, int nr_ct, unsigned int irq_base, void __iomem *reg_base, irq_flow_handler_t handler); void irq_setup_generic_chip(struct irq_chip_generic *gc, u32 msk, enum irq_gc_flags flags, unsigned int clr, unsigned int set); int irq_setup_alt_chip(struct irq_data *d, unsigned int type); void irq_remove_generic_chip(struct irq_chip_generic *gc, u32 msk, unsigned int clr, unsigned int set); struct irq_chip_generic * devm_irq_alloc_generic_chip(struct device *dev, const char *name, int num_ct, unsigned int irq_base, void __iomem *reg_base, irq_flow_handler_t handler); int devm_irq_setup_generic_chip(struct device *dev, struct irq_chip_generic *gc, u32 msk, enum irq_gc_flags flags, unsigned int clr, unsigned int set); struct irq_chip_generic *irq_get_domain_generic_chip(struct irq_domain *d, unsigned int hw_irq); #ifdef CONFIG_GENERIC_IRQ_CHIP int irq_domain_alloc_generic_chips(struct irq_domain *d, const struct irq_domain_chip_generic_info *info); void irq_domain_remove_generic_chips(struct irq_domain *d); #else static inline int irq_domain_alloc_generic_chips(struct irq_domain *d, const struct irq_domain_chip_generic_info *info) { return -EINVAL; } static inline void irq_domain_remove_generic_chips(struct irq_domain *d) { } #endif /* CONFIG_GENERIC_IRQ_CHIP */ int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip, int num_ct, const char *name, irq_flow_handler_t handler, unsigned int clr, unsigned int set, enum irq_gc_flags flags); #define irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, name, \ handler, clr, set, flags) \ ({ \ MAYBE_BUILD_BUG_ON(irqs_per_chip > 32); \ __irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, name,\ handler, clr, set, flags); \ }) static inline void irq_free_generic_chip(struct irq_chip_generic *gc) { kfree(gc); } static inline void irq_destroy_generic_chip(struct irq_chip_generic *gc, u32 msk, unsigned int clr, unsigned int set) { irq_remove_generic_chip(gc, msk, clr, set); irq_free_generic_chip(gc); } static inline struct irq_chip_type *irq_data_get_chip_type(struct irq_data *d) { return container_of(d->chip, struct irq_chip_type, chip); } #define IRQ_MSK(n) (u32)((n) < 32 ? 
((1 << (n)) - 1) : UINT_MAX) #ifdef CONFIG_SMP static inline void irq_gc_lock(struct irq_chip_generic *gc) { raw_spin_lock(&gc->lock); } static inline void irq_gc_unlock(struct irq_chip_generic *gc) { raw_spin_unlock(&gc->lock); } #else static inline void irq_gc_lock(struct irq_chip_generic *gc) { } static inline void irq_gc_unlock(struct irq_chip_generic *gc) { } #endif /* * The irqsave variants are for usage in non interrupt code. Do not use * them in irq_chip callbacks. Use irq_gc_lock() instead. */ #define irq_gc_lock_irqsave(gc, flags) \ raw_spin_lock_irqsave(&(gc)->lock, flags) #define irq_gc_unlock_irqrestore(gc, flags) \ raw_spin_unlock_irqrestore(&(gc)->lock, flags) static inline void irq_reg_writel(struct irq_chip_generic *gc, u32 val, int reg_offset) { if (gc->reg_writel) gc->reg_writel(val, gc->reg_base + reg_offset); else writel(val, gc->reg_base + reg_offset); } static inline u32 irq_reg_readl(struct irq_chip_generic *gc, int reg_offset) { if (gc->reg_readl) return gc->reg_readl(gc->reg_base + reg_offset); else return readl(gc->reg_base + reg_offset); } struct irq_matrix; struct irq_matrix *irq_alloc_matrix(unsigned int matrix_bits, unsigned int alloc_start, unsigned int alloc_end); void irq_matrix_online(struct irq_matrix *m); void irq_matrix_offline(struct irq_matrix *m); void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit, bool replace); int irq_matrix_reserve_managed(struct irq_matrix *m, const struct cpumask *msk); void irq_matrix_remove_managed(struct irq_matrix *m, const struct cpumask *msk); int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk, unsigned int *mapped_cpu); void irq_matrix_reserve(struct irq_matrix *m); void irq_matrix_remove_reserved(struct irq_matrix *m); int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk, bool reserved, unsigned int *mapped_cpu); void irq_matrix_free(struct irq_matrix *m, unsigned int cpu, unsigned int bit, bool managed); void irq_matrix_assign(struct irq_matrix *m, unsigned int bit); unsigned int irq_matrix_available(struct irq_matrix *m, bool cpudown); unsigned int irq_matrix_allocated(struct irq_matrix *m); unsigned int irq_matrix_reserved(struct irq_matrix *m); void irq_matrix_debug_show(struct seq_file *sf, struct irq_matrix *m, int ind); /* Contrary to Linux irqs, for hardware irqs the irq number 0 is valid */ #define INVALID_HWIRQ (~0UL) irq_hw_number_t ipi_get_hwirq(unsigned int irq, unsigned int cpu); int __ipi_send_single(struct irq_desc *desc, unsigned int cpu); int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest); int ipi_send_single(unsigned int virq, unsigned int cpu); int ipi_send_mask(unsigned int virq, const struct cpumask *dest); void ipi_mux_process(void); int ipi_mux_create(unsigned int nr_ipi, void (*mux_send)(unsigned int cpu)); #ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER /* * Registers a generic IRQ handling function as the top-level IRQ handler in * the system, which is generally the first C code called from an assembly * architecture-specific interrupt handler. * * Returns 0 on success, or -EBUSY if an IRQ handler has already been * registered. */ int __init set_handle_irq(void (*handle_irq)(struct pt_regs *)); /* * Allows interrupt handlers to find the irqchip that's been registered as the * top-level IRQ handler. 
*/ extern void (*handle_arch_irq)(struct pt_regs *) __ro_after_init; asmlinkage void generic_handle_arch_irq(struct pt_regs *regs); #else #ifndef set_handle_irq #define set_handle_irq(handle_irq) \ do { \ (void)handle_irq; \ WARN_ON(1); \ } while (0) #endif #endif #endif /* _LINUX_IRQ_H */
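The generic chip helpers above (irq_alloc_generic_chip(), irq_setup_generic_chip() and the irq_gc_* callbacks) exist so that simple memory-mapped controllers do not have to hand-roll mask bookkeeping. A minimal sketch of the usual setup sequence, assuming a made-up controller with 32 level-triggered interrupts, an active-high enable register at offset 0x00 and an ack register at offset 0x04 (the "foo" name, register offsets, and the irq_base/reg_base variables are hypothetical):

struct irq_chip_generic *gc;
struct irq_chip_type *ct;

/* One chip type covering 32 interrupts starting at irq_base */
gc = irq_alloc_generic_chip("foo", 1, irq_base, reg_base, handle_level_irq);
if (!gc)
	return -ENOMEM;

ct = gc->chip_types;
ct->chip.irq_mask = irq_gc_mask_clr_bit;	/* mask = clear enable bit */
ct->chip.irq_unmask = irq_gc_mask_set_bit;	/* unmask = set enable bit */
ct->chip.irq_ack = irq_gc_ack_set_bit;
ct->regs.mask = 0x00;				/* hypothetical enable register */
ct->regs.ack = 0x04;				/* hypothetical ack register */

/* Install all 32 irqs; prime mask_cache from the hardware register */
irq_setup_generic_chip(gc, IRQ_MSK(32), IRQ_GC_INIT_MASK_CACHE,
		       IRQ_NOREQUEST, 0);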
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* * include/net/dsa_stubs.h - Stubs for the Distributed Switch Architecture framework */ #include <linux/mutex.h> #include <linux/netdevice.h> #include <linux/net_tstamp.h> #include <net/dsa.h> #if IS_ENABLED(CONFIG_NET_DSA) extern const struct dsa_stubs *dsa_stubs; struct dsa_stubs { int (*conduit_hwtstamp_validate)(struct net_device *dev, const struct kernel_hwtstamp_config *config, struct netlink_ext_ack *extack); }; static inline int dsa_conduit_hwtstamp_validate(struct net_device *dev, const struct kernel_hwtstamp_config *config, struct netlink_ext_ack *extack) { if (!netdev_uses_dsa(dev)) return 0; /* rtnl_lock() is a sufficient guarantee, because as long as * netdev_uses_dsa() returns true, the dsa_core module is still * registered, and so, dsa_unregister_stubs() couldn't have run. * For netdev_uses_dsa() to start returning false, it would imply that * dsa_conduit_teardown() has executed, which requires rtnl_lock(). */ ASSERT_RTNL(); return dsa_stubs->conduit_hwtstamp_validate(dev, config, extack); } #else static inline int dsa_conduit_hwtstamp_validate(struct net_device *dev, const struct kernel_hwtstamp_config *config, struct netlink_ext_ack *extack) { return 0; } #endif
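The header above only declares the indirection; the other half lives in the dsa core, which points dsa_stubs at its real implementation on module init and clears it on exit. A sketch of that registration pattern (the __dsa_stubs instance and the validate body here are an assumption about the core side's naming, not quoted from it):

static int __dsa_conduit_hwtstamp_validate(struct net_device *dev,
					   const struct kernel_hwtstamp_config *config,
					   struct netlink_ext_ack *extack)
{
	/* ... real validation against the switch tree goes here ... */
	return 0;
}

static const struct dsa_stubs __dsa_stubs = {
	.conduit_hwtstamp_validate = __dsa_conduit_hwtstamp_validate,
};

static void dsa_register_stubs(void)
{
	dsa_stubs = &__dsa_stubs;	/* from here on, callers are routed to us */
}

static void dsa_unregister_stubs(void)
{
	dsa_stubs = NULL;		/* safe: see the rtnl_lock() argument above */
}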
// SPDX-License-Identifier: GPL-2.0-only /* * linux/lib/crc-ccitt.c */ #include <linux/types.h> #include <linux/module.h> #include <linux/crc-ccitt.h> /* * This mysterious table is just the CRC of each possible byte. It can be * computed using the standard bit-at-a-time methods. The polynomial can * be seen in entry 128, 0x8408. This corresponds to x^0 + x^5 + x^12. * Add the implicit x^16, and you have the standard CRC-CCITT. */ u16 const crc_ccitt_table[256] = { 0x0000, 0x1189, 0x2312, 0x329b, 0x4624, 0x57ad, 0x6536, 0x74bf, 0x8c48, 0x9dc1, 0xaf5a, 0xbed3, 0xca6c, 0xdbe5, 0xe97e, 0xf8f7, 0x1081, 0x0108, 0x3393, 0x221a, 0x56a5, 0x472c, 0x75b7, 0x643e, 0x9cc9, 0x8d40, 0xbfdb, 0xae52, 0xdaed, 0xcb64, 0xf9ff, 0xe876, 0x2102, 0x308b, 0x0210, 0x1399, 0x6726, 0x76af, 0x4434, 0x55bd, 0xad4a, 0xbcc3, 0x8e58, 0x9fd1, 0xeb6e, 0xfae7, 0xc87c, 0xd9f5, 0x3183, 0x200a, 0x1291, 0x0318, 0x77a7, 0x662e, 0x54b5, 0x453c, 0xbdcb, 0xac42, 0x9ed9, 0x8f50, 0xfbef, 0xea66, 0xd8fd, 0xc974, 0x4204, 0x538d, 0x6116, 0x709f, 0x0420, 0x15a9, 0x2732, 0x36bb, 0xce4c, 0xdfc5, 0xed5e, 0xfcd7, 0x8868, 0x99e1, 0xab7a, 0xbaf3, 0x5285, 0x430c, 0x7197, 0x601e, 0x14a1, 0x0528, 0x37b3, 0x263a, 0xdecd, 0xcf44, 0xfddf, 0xec56, 0x98e9, 0x8960, 0xbbfb, 0xaa72, 0x6306, 0x728f, 0x4014, 0x519d, 0x2522, 0x34ab, 0x0630, 0x17b9, 0xef4e, 0xfec7, 0xcc5c, 0xddd5, 0xa96a, 0xb8e3, 0x8a78, 0x9bf1, 0x7387, 0x620e, 0x5095, 0x411c, 0x35a3, 0x242a, 0x16b1, 0x0738, 0xffcf, 0xee46, 0xdcdd, 0xcd54, 0xb9eb, 0xa862, 0x9af9, 0x8b70, 0x8408, 0x9581, 0xa71a, 0xb693, 0xc22c, 0xd3a5, 0xe13e, 0xf0b7, 0x0840, 0x19c9, 0x2b52, 0x3adb, 0x4e64, 0x5fed, 0x6d76, 0x7cff, 0x9489, 0x8500, 0xb79b, 0xa612, 0xd2ad, 0xc324, 0xf1bf, 0xe036, 0x18c1, 0x0948, 0x3bd3, 0x2a5a, 0x5ee5, 0x4f6c, 0x7df7, 0x6c7e, 0xa50a, 0xb483, 0x8618, 0x9791, 0xe32e, 0xf2a7, 0xc03c, 0xd1b5, 0x2942, 0x38cb, 0x0a50, 0x1bd9, 0x6f66, 0x7eef, 0x4c74, 0x5dfd, 0xb58b, 0xa402, 0x9699, 0x8710, 0xf3af, 0xe226, 0xd0bd, 0xc134, 0x39c3, 0x284a, 0x1ad1, 0x0b58, 0x7fe7, 0x6e6e, 0x5cf5, 0x4d7c, 0xc60c, 0xd785, 0xe51e, 0xf497, 0x8028, 0x91a1, 0xa33a, 0xb2b3, 0x4a44, 0x5bcd, 0x6956, 0x78df, 0x0c60, 0x1de9, 0x2f72, 0x3efb, 0xd68d, 0xc704, 0xf59f, 0xe416, 0x90a9, 0x8120, 0xb3bb, 0xa232, 0x5ac5, 0x4b4c, 0x79d7, 0x685e, 0x1ce1, 0x0d68, 0x3ff3, 0x2e7a, 0xe70e, 0xf687, 0xc41c, 0xd595, 0xa12a, 0xb0a3, 0x8238, 0x93b1, 0x6b46, 0x7acf, 0x4854, 0x59dd, 0x2d62, 0x3ceb, 0x0e70, 0x1ff9, 0xf78f, 0xe606, 0xd49d, 0xc514, 0xb1ab, 0xa022, 0x92b9, 0x8330, 0x7bc7, 0x6a4e, 0x58d5, 0x495c, 0x3de3, 0x2c6a, 0x1ef1, 0x0f78 }; EXPORT_SYMBOL(crc_ccitt_table); /** * crc_ccitt - recompute the CRC (CRC-CCITT variant) for the data * buffer * @crc: previous CRC value * @buffer: data pointer * @len: number of bytes in the buffer */ u16 crc_ccitt(u16 crc, u8 const *buffer, size_t len) { while (len--) crc = crc_ccitt_byte(crc, *buffer++); return crc; } EXPORT_SYMBOL(crc_ccitt); MODULE_DESCRIPTION("CRC-CCITT calculations"); MODULE_LICENSE("GPL");
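Per the table comment above, each step folds one byte into the 16-bit register: crc_ccitt_byte() (an inline in crc-ccitt.h) computes (crc >> 8) ^ crc_ccitt_table[(crc ^ c) & 0xff], and crc_ccitt() simply iterates that over the buffer. A short usage sketch; foo_frame_fcs() is a hypothetical caller, and 0xffff is the customary seed used by HDLC/PPP-style framings:

#include <linux/crc-ccitt.h>

/* Compute a CRC-CCITT over a frame body, seeding the register with 0xffff */
static u16 foo_frame_fcs(const u8 *buf, size_t len)
{
	return crc_ccitt(0xffff, buf, len);
}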
// SPDX-License-Identifier: GPL-2.0-only /* * Copyright 2007-2012 Siemens AG * * Written by: * Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> * Sergey Lapin <slapin@ossfans.org> * Maxim Gorbachyov <maxim.gorbachev@siemens.com> * Alexander Smirnov <alex.bluesman.smirnov@gmail.com> */ #include <linux/if_arp.h> #include <net/mac802154.h> #include <net/ieee802154_netdev.h> #include <net/cfg802154.h> #include "ieee802154_i.h" #include "driver-ops.h" void mac802154_dev_set_page_channel(struct net_device *dev, u8 page, u8 chan) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); struct ieee802154_local *local = sdata->local; int res; ASSERT_RTNL(); BUG_ON(dev->type != ARPHRD_IEEE802154); res = drv_set_channel(local, page, chan); if (res) { pr_debug("set_channel failed\n"); } else { local->phy->current_channel = chan; local->phy->current_page = page; } } int mac802154_get_params(struct net_device *dev, struct ieee802154_llsec_params *params) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_get_params(&sdata->sec, params); mutex_unlock(&sdata->sec_mtx); return res; } int mac802154_set_params(struct net_device *dev, const struct ieee802154_llsec_params *params, int changed) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_set_params(&sdata->sec, params, changed); mutex_unlock(&sdata->sec_mtx); return res; } int mac802154_add_key(struct net_device *dev, const struct ieee802154_llsec_key_id *id, const struct ieee802154_llsec_key *key) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_key_add(&sdata->sec, id, key); mutex_unlock(&sdata->sec_mtx); return res; } int mac802154_del_key(struct net_device *dev, const struct ieee802154_llsec_key_id *id) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_key_del(&sdata->sec, id); mutex_unlock(&sdata->sec_mtx); return res; } int mac802154_add_dev(struct net_device *dev, const struct ieee802154_llsec_device *llsec_dev) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_dev_add(&sdata->sec, llsec_dev); mutex_unlock(&sdata->sec_mtx); return res; } int mac802154_del_dev(struct net_device *dev, __le64 dev_addr) { struct ieee802154_sub_if_data *sdata =
IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_dev_del(&sdata->sec, dev_addr); mutex_unlock(&sdata->sec_mtx); return res; } int mac802154_add_devkey(struct net_device *dev, __le64 device_addr, const struct ieee802154_llsec_device_key *key) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_devkey_add(&sdata->sec, device_addr, key); mutex_unlock(&sdata->sec_mtx); return res; } int mac802154_del_devkey(struct net_device *dev, __le64 device_addr, const struct ieee802154_llsec_device_key *key) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_devkey_del(&sdata->sec, device_addr, key); mutex_unlock(&sdata->sec_mtx); return res; } int mac802154_add_seclevel(struct net_device *dev, const struct ieee802154_llsec_seclevel *sl) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_seclevel_add(&sdata->sec, sl); mutex_unlock(&sdata->sec_mtx); return res; } int mac802154_del_seclevel(struct net_device *dev, const struct ieee802154_llsec_seclevel *sl) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); int res; BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); res = mac802154_llsec_seclevel_del(&sdata->sec, sl); mutex_unlock(&sdata->sec_mtx); return res; } void mac802154_lock_table(struct net_device *dev) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_lock(&sdata->sec_mtx); } void mac802154_get_table(struct net_device *dev, struct ieee802154_llsec_table **t) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); BUG_ON(dev->type != ARPHRD_IEEE802154); *t = &sdata->sec.table; } void mac802154_unlock_table(struct net_device *dev) { struct ieee802154_sub_if_data *sdata = IEEE802154_DEV_TO_SUB_IF(dev); BUG_ON(dev->type != ARPHRD_IEEE802154); mutex_unlock(&sdata->sec_mtx); }
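Every wrapper above follows the same shape: resolve the sub-interface, assert the device type, then delegate to the llsec implementation under sec_mtx. The lock/get/unlock trio at the end exists so a caller can make several reads of the security table appear atomic. A sketch of that usage (the iteration body is elided since the entry types live in ieee802154_netdev.h):

struct ieee802154_llsec_table *table;

mac802154_lock_table(dev);		/* takes sdata->sec_mtx */
mac802154_get_table(dev, &table);
/* ... walk table->keys / table->devices while the mutex is held ... */
mac802154_unlock_table(dev);		/* drops sdata->sec_mtx */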
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Linux INET6 implementation * * Authors: * Pedro Roque <roque@di.fc.ul.pt> */ #ifndef _IP6_FIB_H #define _IP6_FIB_H #include <linux/ipv6_route.h> #include <linux/rtnetlink.h> #include <linux/spinlock.h> #include <linux/notifier.h> #include <net/dst.h> #include <net/flow.h> #include <net/ip_fib.h> #include <net/netlink.h> #include <net/inetpeer.h> #include <net/fib_notifier.h> #include <linux/indirect_call_wrapper.h> #include <uapi/linux/bpf.h> #ifdef CONFIG_IPV6_MULTIPLE_TABLES #define FIB6_TABLE_HASHSZ 256 #else #define FIB6_TABLE_HASHSZ 1 #endif #define RT6_DEBUG 2 struct rt6_info; struct fib6_info; struct fib6_config { u32 fc_table; u32 fc_metric; int fc_dst_len; int fc_src_len; int fc_ifindex; u32 fc_flags; u32 fc_protocol; u16 fc_type; /* only 8 bits are used */ u16
fc_delete_all_nh : 1, fc_ignore_dev_down:1, __unused : 14; u32 fc_nh_id; struct in6_addr fc_dst; struct in6_addr fc_src; struct in6_addr fc_prefsrc; struct in6_addr fc_gateway; unsigned long fc_expires; struct nlattr *fc_mx; int fc_mx_len; int fc_mp_len; struct nlattr *fc_mp; struct nl_info fc_nlinfo; struct nlattr *fc_encap; u16 fc_encap_type; bool fc_is_fdb; }; struct fib6_node { struct fib6_node __rcu *parent; struct fib6_node __rcu *left; struct fib6_node __rcu *right; #ifdef CONFIG_IPV6_SUBTREES struct fib6_node __rcu *subtree; #endif struct fib6_info __rcu *leaf; __u16 fn_bit; /* bit key */ __u16 fn_flags; int fn_sernum; struct fib6_info __rcu *rr_ptr; struct rcu_head rcu; }; struct fib6_gc_args { int timeout; int more; }; #ifndef CONFIG_IPV6_SUBTREES #define FIB6_SUBTREE(fn) NULL static inline bool fib6_routes_require_src(const struct net *net) { return false; } static inline void fib6_routes_require_src_inc(struct net *net) {} static inline void fib6_routes_require_src_dec(struct net *net) {} #else static inline bool fib6_routes_require_src(const struct net *net) { return net->ipv6.fib6_routes_require_src > 0; } static inline void fib6_routes_require_src_inc(struct net *net) { net->ipv6.fib6_routes_require_src++; } static inline void fib6_routes_require_src_dec(struct net *net) { net->ipv6.fib6_routes_require_src--; } #define FIB6_SUBTREE(fn) (rcu_dereference_protected((fn)->subtree, 1)) #endif /* * routing information * */ struct rt6key { struct in6_addr addr; int plen; }; struct fib6_table; struct rt6_exception_bucket { struct hlist_head chain; int depth; }; struct rt6_exception { struct hlist_node hlist; struct rt6_info *rt6i; unsigned long stamp; struct rcu_head rcu; }; #define FIB6_EXCEPTION_BUCKET_SIZE_SHIFT 10 #define FIB6_EXCEPTION_BUCKET_SIZE (1 << FIB6_EXCEPTION_BUCKET_SIZE_SHIFT) #define FIB6_MAX_DEPTH 5 struct fib6_nh { struct fib_nh_common nh_common; #ifdef CONFIG_IPV6_ROUTER_PREF unsigned long last_probe; #endif struct rt6_info * __percpu *rt6i_pcpu; struct rt6_exception_bucket __rcu *rt6i_exception_bucket; }; struct fib6_info { struct fib6_table *fib6_table; struct fib6_info __rcu *fib6_next; struct fib6_node __rcu *fib6_node; /* Multipath routes: * siblings is a list of fib6_info that have the same metric/weight, * destination, but not the same gateway. nsiblings is just a cache * to speed up lookup. 
*/ union { struct list_head fib6_siblings; struct list_head nh_list; }; unsigned int fib6_nsiblings; refcount_t fib6_ref; unsigned long expires; struct hlist_node gc_link; struct dst_metrics *fib6_metrics; #define fib6_pmtu fib6_metrics->metrics[RTAX_MTU-1] struct rt6key fib6_dst; u32 fib6_flags; struct rt6key fib6_src; struct rt6key fib6_prefsrc; u32 fib6_metric; u8 fib6_protocol; u8 fib6_type; u8 offload; u8 trap; u8 offload_failed; u8 should_flush:1, dst_nocount:1, dst_nopolicy:1, fib6_destroying:1, unused:4; struct rcu_head rcu; struct nexthop *nh; struct fib6_nh fib6_nh[]; }; struct rt6_info { struct dst_entry dst; struct fib6_info __rcu *from; int sernum; struct rt6key rt6i_dst; struct rt6key rt6i_src; struct in6_addr rt6i_gateway; struct inet6_dev *rt6i_idev; u32 rt6i_flags; /* more non-fragment space at head required */ unsigned short rt6i_nfheader_len; }; struct fib6_result { struct fib6_nh *nh; struct fib6_info *f6i; u32 fib6_flags; u8 fib6_type; struct rt6_info *rt6; }; #define for_each_fib6_node_rt_rcu(fn) \ for (rt = rcu_dereference((fn)->leaf); rt; \ rt = rcu_dereference(rt->fib6_next)) #define for_each_fib6_walker_rt(w) \ for (rt = (w)->leaf; rt; \ rt = rcu_dereference_protected(rt->fib6_next, 1)) #define dst_rt6_info(_ptr) container_of_const(_ptr, struct rt6_info, dst) static inline struct inet6_dev *ip6_dst_idev(const struct dst_entry *dst) { return dst_rt6_info(dst)->rt6i_idev; } static inline bool fib6_requires_src(const struct fib6_info *rt) { return rt->fib6_src.plen > 0; } /* The callers should hold f6i->fib6_table->tb6_lock if a route has ever * been added to a table before. */ static inline void fib6_clean_expires(struct fib6_info *f6i) { f6i->fib6_flags &= ~RTF_EXPIRES; f6i->expires = 0; } /* The callers should hold f6i->fib6_table->tb6_lock if a route has ever * been added to a table before. */ static inline void fib6_set_expires(struct fib6_info *f6i, unsigned long expires) { f6i->expires = expires; f6i->fib6_flags |= RTF_EXPIRES; } static inline bool fib6_check_expired(const struct fib6_info *f6i) { if (f6i->fib6_flags & RTF_EXPIRES) return time_after(jiffies, f6i->expires); return false; } /* Function to safely get fn->fn_sernum for passed in rt * and store result in passed in cookie. * Return true if we can get cookie safely * Return false if not */ static inline bool fib6_get_cookie_safe(const struct fib6_info *f6i, u32 *cookie) { struct fib6_node *fn; bool status = false; fn = rcu_dereference(f6i->fib6_node); if (fn) { *cookie = READ_ONCE(fn->fn_sernum); /* pairs with smp_wmb() in __fib6_update_sernum_upto_root() */ smp_rmb(); status = true; } return status; } static inline u32 rt6_get_cookie(const struct rt6_info *rt) { struct fib6_info *from; u32 cookie = 0; if (rt->sernum) return rt->sernum; rcu_read_lock(); from = rcu_dereference(rt->from); if (from) fib6_get_cookie_safe(from, &cookie); rcu_read_unlock(); return cookie; } static inline void ip6_rt_put(struct rt6_info *rt) { /* dst_release() accepts a NULL parameter. 
* We rely on dst being first structure in struct rt6_info */ BUILD_BUG_ON(offsetof(struct rt6_info, dst) != 0); dst_release(&rt->dst); } struct fib6_info *fib6_info_alloc(gfp_t gfp_flags, bool with_fib6_nh); void fib6_info_destroy_rcu(struct rcu_head *head); static inline void fib6_info_hold(struct fib6_info *f6i) { refcount_inc(&f6i->fib6_ref); } static inline bool fib6_info_hold_safe(struct fib6_info *f6i) { return refcount_inc_not_zero(&f6i->fib6_ref); } static inline void fib6_info_release(struct fib6_info *f6i) { if (f6i && refcount_dec_and_test(&f6i->fib6_ref)) { DEBUG_NET_WARN_ON_ONCE(!hlist_unhashed(&f6i->gc_link)); call_rcu_hurry(&f6i->rcu, fib6_info_destroy_rcu); } } enum fib6_walk_state { #ifdef CONFIG_IPV6_SUBTREES FWS_S, #endif FWS_L, FWS_R, FWS_C, FWS_U }; struct fib6_walker { struct list_head lh; struct fib6_node *root, *node; struct fib6_info *leaf; enum fib6_walk_state state; unsigned int skip; unsigned int count; unsigned int skip_in_node; int (*func)(struct fib6_walker *); void *args; }; struct rt6_statistics { __u32 fib_nodes; /* all fib6 nodes */ __u32 fib_route_nodes; /* intermediate nodes */ __u32 fib_rt_entries; /* rt entries in fib table */ __u32 fib_rt_cache; /* cached rt entries in exception table */ __u32 fib_discarded_routes; /* total number of routes deleted */ /* The following stat is not protected by any lock */ atomic_t fib_rt_alloc; /* total number of routes alloced */ }; #define RTN_TL_ROOT 0x0001 #define RTN_ROOT 0x0002 /* tree root node */ #define RTN_RTINFO 0x0004 /* node with valid routing info */ /* * priority levels (or metrics) * */ struct fib6_table { struct hlist_node tb6_hlist; u32 tb6_id; spinlock_t tb6_lock; struct fib6_node tb6_root; struct inet_peer_base tb6_peers; unsigned int flags; unsigned int fib_seq; /* writes protected by rtnl_mutex */ struct hlist_head tb6_gc_hlist; /* GC candidates */ #define RT6_TABLE_HAS_DFLT_ROUTER BIT(0) }; #define RT6_TABLE_UNSPEC RT_TABLE_UNSPEC #define RT6_TABLE_MAIN RT_TABLE_MAIN #define RT6_TABLE_DFLT RT6_TABLE_MAIN #define RT6_TABLE_INFO RT6_TABLE_MAIN #define RT6_TABLE_PREFIX RT6_TABLE_MAIN #ifdef CONFIG_IPV6_MULTIPLE_TABLES #define FIB6_TABLE_MIN 1 #define FIB6_TABLE_MAX RT_TABLE_MAX #define RT6_TABLE_LOCAL RT_TABLE_LOCAL #else #define FIB6_TABLE_MIN RT_TABLE_MAIN #define FIB6_TABLE_MAX FIB6_TABLE_MIN #define RT6_TABLE_LOCAL RT6_TABLE_MAIN #endif typedef struct rt6_info *(*pol_lookup_t)(struct net *, struct fib6_table *, struct flowi6 *, const struct sk_buff *, int); struct fib6_entry_notifier_info { struct fib_notifier_info info; /* must be first */ struct fib6_info *rt; unsigned int nsiblings; }; /* * exported functions */ struct fib6_table *fib6_get_table(struct net *net, u32 id); struct fib6_table *fib6_new_table(struct net *net, u32 id); struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6, const struct sk_buff *skb, int flags, pol_lookup_t lookup); /* called with rcu lock held; can return error pointer * caller needs to select path */ int fib6_lookup(struct net *net, int oif, struct flowi6 *fl6, struct fib6_result *res, int flags); /* called with rcu lock held; caller needs to select path */ int fib6_table_lookup(struct net *net, struct fib6_table *table, int oif, struct flowi6 *fl6, struct fib6_result *res, int strict); void fib6_select_path(const struct net *net, struct fib6_result *res, struct flowi6 *fl6, int oif, bool have_oif_match, const struct sk_buff *skb, int strict); struct fib6_node *fib6_node_lookup(struct fib6_node *root, const struct in6_addr *daddr, const struct
in6_addr *saddr); struct fib6_node *fib6_locate(struct fib6_node *root, const struct in6_addr *daddr, int dst_len, const struct in6_addr *saddr, int src_len, bool exact_match); void fib6_clean_all(struct net *net, int (*func)(struct fib6_info *, void *arg), void *arg); void fib6_clean_all_skip_notify(struct net *net, int (*func)(struct fib6_info *, void *arg), void *arg); int fib6_add(struct fib6_node *root, struct fib6_info *rt, struct nl_info *info, struct netlink_ext_ack *extack); int fib6_del(struct fib6_info *rt, struct nl_info *info); static inline void rt6_get_prefsrc(const struct rt6_info *rt, struct in6_addr *addr) { const struct fib6_info *from; rcu_read_lock(); from = rcu_dereference(rt->from); if (from) *addr = from->fib6_prefsrc.addr; else *addr = in6addr_any; rcu_read_unlock(); } int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh, struct fib6_config *cfg, gfp_t gfp_flags, struct netlink_ext_ack *extack); void fib6_nh_release(struct fib6_nh *fib6_nh); void fib6_nh_release_dsts(struct fib6_nh *fib6_nh); int call_fib6_entry_notifiers(struct net *net, enum fib_event_type event_type, struct fib6_info *rt, struct netlink_ext_ack *extack); int call_fib6_multipath_entry_notifiers(struct net *net, enum fib_event_type event_type, struct fib6_info *rt, unsigned int nsiblings, struct netlink_ext_ack *extack); int call_fib6_entry_notifiers_replace(struct net *net, struct fib6_info *rt); void fib6_rt_update(struct net *net, struct fib6_info *rt, struct nl_info *info); void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info, unsigned int flags); void fib6_run_gc(unsigned long expires, struct net *net, bool force); void fib6_gc_cleanup(void); int fib6_init(void); /* Add the route to the gc list if it is not already there * * The callers should hold f6i->fib6_table->tb6_lock. */ static inline void fib6_add_gc_list(struct fib6_info *f6i) { /* If fib6_node is null, the f6i is not in (or removed from) the * table. * * There is a gap between finding the f6i from the table and * calling this function without the protection of the tb6_lock. * This check makes sure the f6i is not added to the gc list when * it is not on the table. */ if (!rcu_dereference_protected(f6i->fib6_node, lockdep_is_held(&f6i->fib6_table->tb6_lock))) return; if (hlist_unhashed(&f6i->gc_link)) hlist_add_head(&f6i->gc_link, &f6i->fib6_table->tb6_gc_hlist); } /* Remove the route from the gc list if it is on the list. * * The callers should hold f6i->fib6_table->tb6_lock. 
*/ static inline void fib6_remove_gc_list(struct fib6_info *f6i) { if (!hlist_unhashed(&f6i->gc_link)) hlist_del_init(&f6i->gc_link); } struct ipv6_route_iter { struct seq_net_private p; struct fib6_walker w; loff_t skip; struct fib6_table *tbl; int sernum; }; extern const struct seq_operations ipv6_route_seq_ops; int call_fib6_notifier(struct notifier_block *nb, enum fib_event_type event_type, struct fib_notifier_info *info); int call_fib6_notifiers(struct net *net, enum fib_event_type event_type, struct fib_notifier_info *info); int __net_init fib6_notifier_init(struct net *net); void __net_exit fib6_notifier_exit(struct net *net); unsigned int fib6_tables_seq_read(const struct net *net); int fib6_tables_dump(struct net *net, struct notifier_block *nb, struct netlink_ext_ack *extack); void fib6_update_sernum(struct net *net, struct fib6_info *rt); void fib6_update_sernum_upto_root(struct net *net, struct fib6_info *rt); void fib6_update_sernum_stub(struct net *net, struct fib6_info *f6i); void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val); static inline bool fib6_metric_locked(struct fib6_info *f6i, int metric) { return !!(f6i->fib6_metrics->metrics[RTAX_LOCK - 1] & (1 << metric)); } void fib6_info_hw_flags_set(struct net *net, struct fib6_info *f6i, bool offload, bool trap, bool offload_failed); #if IS_BUILTIN(CONFIG_IPV6) && defined(CONFIG_BPF_SYSCALL) struct bpf_iter__ipv6_route { __bpf_md_ptr(struct bpf_iter_meta *, meta); __bpf_md_ptr(struct fib6_info *, rt); }; #endif INDIRECT_CALLABLE_DECLARE(struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table *table, struct flowi6 *fl6, const struct sk_buff *skb, int flags)); INDIRECT_CALLABLE_DECLARE(struct rt6_info *ip6_pol_route_input(struct net *net, struct fib6_table *table, struct flowi6 *fl6, const struct sk_buff *skb, int flags)); INDIRECT_CALLABLE_DECLARE(struct rt6_info *__ip6_route_redirect(struct net *net, struct fib6_table *table, struct flowi6 *fl6, const struct sk_buff *skb, int flags)); INDIRECT_CALLABLE_DECLARE(struct rt6_info *ip6_pol_route_lookup(struct net *net, struct fib6_table *table, struct flowi6 *fl6, const struct sk_buff *skb, int flags)); static inline struct rt6_info *pol_lookup_func(pol_lookup_t lookup, struct net *net, struct fib6_table *table, struct flowi6 *fl6, const struct sk_buff *skb, int flags) { return INDIRECT_CALL_4(lookup, ip6_pol_route_output, ip6_pol_route_input, ip6_pol_route_lookup, __ip6_route_redirect, net, table, fl6, skb, flags); } #ifdef CONFIG_IPV6_MULTIPLE_TABLES static inline bool fib6_has_custom_rules(const struct net *net) { return net->ipv6.fib6_has_custom_rules; } int fib6_rules_init(void); void fib6_rules_cleanup(void); bool fib6_rule_default(const struct fib_rule *rule); int fib6_rules_dump(struct net *net, struct notifier_block *nb, struct netlink_ext_ack *extack); unsigned int fib6_rules_seq_read(const struct net *net); static inline bool fib6_rules_early_flow_dissect(struct net *net, struct sk_buff *skb, struct flowi6 *fl6, struct flow_keys *flkeys) { unsigned int flag = FLOW_DISSECTOR_F_STOP_AT_ENCAP; if (!net->ipv6.fib6_rules_require_fldissect) return false; memset(flkeys, 0, sizeof(*flkeys)); __skb_flow_dissect(net, skb, &flow_keys_dissector, flkeys, NULL, 0, 0, 0, flag); fl6->fl6_sport = flkeys->ports.src; fl6->fl6_dport = flkeys->ports.dst; fl6->flowi6_proto = flkeys->basic.ip_proto; return true; } #else static inline bool fib6_has_custom_rules(const struct net *net) { return false; } static inline int fib6_rules_init(void) { return 0; } static 
inline void fib6_rules_cleanup(void) { return; } static inline bool fib6_rule_default(const struct fib_rule *rule) { return true; } static inline int fib6_rules_dump(struct net *net, struct notifier_block *nb, struct netlink_ext_ack *extack) { return 0; } static inline unsigned int fib6_rules_seq_read(const struct net *net) { return 0; } static inline bool fib6_rules_early_flow_dissect(struct net *net, struct sk_buff *skb, struct flowi6 *fl6, struct flow_keys *flkeys) { return false; } #endif #endif
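The comments above are explicit that a route's expiry state and GC-list membership are only safe to touch under the owning table's tb6_lock once the route has been inserted. A sketch of arming a lifetime on an installed route while honouring that rule (foo_set_route_lifetime() is a hypothetical caller, not an exported helper):

static void foo_set_route_lifetime(struct fib6_info *f6i, unsigned long lifetime)
{
	struct fib6_table *table = f6i->fib6_table;

	spin_lock_bh(&table->tb6_lock);
	fib6_set_expires(f6i, jiffies + lifetime);	/* sets RTF_EXPIRES */
	fib6_add_gc_list(f6i);	/* make it a candidate for fib6_run_gc() */
	spin_unlock_bh(&table->tb6_lock);
}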
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Xbox gamepad driver
 *
 * Copyright (c) 2002 Marko Friedemann <mfr@bmx-chemnitz.de>
 *               2004 Oliver Schwartz <Oliver.Schwartz@gmx.de>,
 *                    Steven Toth <steve@toth.demon.co.uk>,
 *                    Franz Lehner <franz@caos.at>,
 *                    Ivan Hawkes <blackhawk@ivanhawkes.com>
 *               2005 Dominic Cerquetti <binary1230@yahoo.com>
 *               2006 Adam Buchbinder <adam.buchbinder@gmail.com>
 *               2007 Jan Kratochvil <honza@jikos.cz>
 *               2010 Christoph Fritz <chf.fritz@googlemail.com>
 *
 * This driver is based on:
 *  - information from http://euc.jp/periphs/xbox-controller.ja.html
 *  - the iForce driver drivers/char/joystick/iforce.c
 *  - the skeleton-driver drivers/usb/usb-skeleton.c
 *  - Xbox 360 information http://www.free60.org/wiki/Gamepad
 *  - Xbox One information https://github.com/quantus/xbox-one-controller-protocol
 *
 * Thanks to:
 *  - ITO Takayuki for providing essential xpad information on his website
 *  - Vojtech Pavlik     - iforce driver / input subsystem
 *  - Greg Kroah-Hartman - usb-skeleton driver
 *  - Xbox Linux project - extra USB IDs
 *  - Pekka Pöyry (quantus) - Xbox One controller reverse-engineering
 *
 * TODO:
 *  - fine tune axes (especially trigger axes)
 *  - fix "analog" buttons (reported as digital now)
 *  - get rumble working
 *  - need USB IDs for other dance pads
 *
 * History:
 *
 * 2002-06-27 - 0.0.1 : first version, just said "XBOX HID controller"
 *
 * 2002-07-02 - 0.0.2 : basic working version
 *  - all axes and 9 of the 10 buttons work (German InterAct device)
 *  - the black button does not work
 *
 * 2002-07-14 - 0.0.3 : rework by Vojtech Pavlik
 *  - indentation fixes
 *  - usb + input init sequence fixes
 *
 * 2002-07-16 - 0.0.4 : minor changes, merge with Vojtech's v0.0.3
 *  - verified the lack of HID and report descriptors
 *  - verified that ALL buttons WORK
 *  - fixed d-pad to axes mapping
 *
 * 2002-07-17 - 0.0.5 : simplified d-pad handling
 *
 * 2004-10-02 - 0.0.6 : DDR pad support
 *  - borrowed from the Xbox Linux kernel
 *  - USB IDs for commonly used dance pads are present
 *  - dance pads will map D-PAD to buttons, not axes
 *  - pass the module parameter 'dpad_to_buttons' to force
 *    the D-PAD to map to buttons if your pad is not detected
 *
 * Later changes can be tracked in SCM.
*/ #include <linux/bits.h> #include <linux/kernel.h> #include <linux/input.h> #include <linux/rcupdate.h> #include <linux/slab.h> #include <linux/stat.h> #include <linux/module.h> #include <linux/usb/input.h> #include <linux/usb/quirks.h> #define XPAD_PKT_LEN 64 /* * xbox d-pads should map to buttons, as is required for DDR pads * but we map them to axes when possible to simplify things */ #define MAP_DPAD_TO_BUTTONS (1 << 0) #define MAP_TRIGGERS_TO_BUTTONS (1 << 1) #define MAP_STICKS_TO_NULL (1 << 2) #define MAP_SELECT_BUTTON (1 << 3) #define MAP_PADDLES (1 << 4) #define MAP_PROFILE_BUTTON (1 << 5) #define DANCEPAD_MAP_CONFIG (MAP_DPAD_TO_BUTTONS | \ MAP_TRIGGERS_TO_BUTTONS | MAP_STICKS_TO_NULL) #define XTYPE_XBOX 0 #define XTYPE_XBOX360 1 #define XTYPE_XBOX360W 2 #define XTYPE_XBOXONE 3 #define XTYPE_UNKNOWN 4 /* Send power-off packet to xpad360w after holding the mode button for this many * seconds */ #define XPAD360W_POWEROFF_TIMEOUT 5 #define PKT_XB 0 #define PKT_XBE1 1 #define PKT_XBE2_FW_OLD 2 #define PKT_XBE2_FW_5_EARLY 3 #define PKT_XBE2_FW_5_11 4 static bool dpad_to_buttons; module_param(dpad_to_buttons, bool, S_IRUGO); MODULE_PARM_DESC(dpad_to_buttons, "Map D-PAD to buttons rather than axes for unknown pads"); static bool triggers_to_buttons; module_param(triggers_to_buttons, bool, S_IRUGO); MODULE_PARM_DESC(triggers_to_buttons, "Map triggers to buttons rather than axes for unknown pads"); static bool sticks_to_null; module_param(sticks_to_null, bool, S_IRUGO); MODULE_PARM_DESC(sticks_to_null, "Do not map sticks at all for unknown pads"); static bool auto_poweroff = true; module_param(auto_poweroff, bool, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(auto_poweroff, "Power off wireless controllers on suspend"); static const struct xpad_device { u16 idVendor; u16 idProduct; char *name; u8 mapping; u8 xtype; } xpad_device[] = { /* Please keep this list sorted by vendor and product ID. */ { 0x0079, 0x18d4, "GPD Win 2 X-Box Controller", 0, XTYPE_XBOX360 }, { 0x03eb, 0xff01, "Wooting One (Legacy)", 0, XTYPE_XBOX360 }, { 0x03eb, 0xff02, "Wooting Two (Legacy)", 0, XTYPE_XBOX360 }, { 0x03f0, 0x038D, "HyperX Clutch", 0, XTYPE_XBOX360 }, /* wired */ { 0x03f0, 0x048D, "HyperX Clutch", 0, XTYPE_XBOX360 }, /* wireless */ { 0x03f0, 0x0495, "HyperX Clutch Gladiate", 0, XTYPE_XBOXONE }, { 0x03f0, 0x07A0, "HyperX Clutch Gladiate RGB", 0, XTYPE_XBOXONE }, { 0x03f0, 0x08B6, "HyperX Clutch Gladiate", 0, XTYPE_XBOXONE }, /* v2 */ { 0x03f0, 0x09B4, "HyperX Clutch Tanto", 0, XTYPE_XBOXONE }, { 0x044f, 0x0f00, "Thrustmaster Wheel", 0, XTYPE_XBOX }, { 0x044f, 0x0f03, "Thrustmaster Wheel", 0, XTYPE_XBOX }, { 0x044f, 0x0f07, "Thrustmaster, Inc. 
Controller", 0, XTYPE_XBOX }, { 0x044f, 0x0f10, "Thrustmaster Modena GT Wheel", 0, XTYPE_XBOX }, { 0x044f, 0xb326, "Thrustmaster Gamepad GP XID", 0, XTYPE_XBOX360 }, { 0x045e, 0x0202, "Microsoft X-Box pad v1 (US)", 0, XTYPE_XBOX }, { 0x045e, 0x0285, "Microsoft X-Box pad (Japan)", 0, XTYPE_XBOX }, { 0x045e, 0x0287, "Microsoft Xbox Controller S", 0, XTYPE_XBOX }, { 0x045e, 0x0288, "Microsoft Xbox Controller S v2", 0, XTYPE_XBOX }, { 0x045e, 0x0289, "Microsoft X-Box pad v2 (US)", 0, XTYPE_XBOX }, { 0x045e, 0x028e, "Microsoft X-Box 360 pad", 0, XTYPE_XBOX360 }, { 0x045e, 0x028f, "Microsoft X-Box 360 pad v2", 0, XTYPE_XBOX360 }, { 0x045e, 0x0291, "Xbox 360 Wireless Receiver (XBOX)", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360W }, { 0x045e, 0x02a9, "Xbox 360 Wireless Receiver (Unofficial)", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360W }, { 0x045e, 0x02d1, "Microsoft X-Box One pad", 0, XTYPE_XBOXONE }, { 0x045e, 0x02dd, "Microsoft X-Box One pad (Firmware 2015)", 0, XTYPE_XBOXONE }, { 0x045e, 0x02e3, "Microsoft X-Box One Elite pad", MAP_PADDLES, XTYPE_XBOXONE }, { 0x045e, 0x02ea, "Microsoft X-Box One S pad", 0, XTYPE_XBOXONE }, { 0x045e, 0x0719, "Xbox 360 Wireless Receiver", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360W }, { 0x045e, 0x0b00, "Microsoft X-Box One Elite 2 pad", MAP_PADDLES, XTYPE_XBOXONE }, { 0x045e, 0x0b0a, "Microsoft X-Box Adaptive Controller", MAP_PROFILE_BUTTON, XTYPE_XBOXONE }, { 0x045e, 0x0b12, "Microsoft Xbox Series S|X Controller", MAP_SELECT_BUTTON, XTYPE_XBOXONE }, { 0x046d, 0xc21d, "Logitech Gamepad F310", 0, XTYPE_XBOX360 }, { 0x046d, 0xc21e, "Logitech Gamepad F510", 0, XTYPE_XBOX360 }, { 0x046d, 0xc21f, "Logitech Gamepad F710", 0, XTYPE_XBOX360 }, { 0x046d, 0xc242, "Logitech Chillstream Controller", 0, XTYPE_XBOX360 }, { 0x046d, 0xca84, "Logitech Xbox Cordless Controller", 0, XTYPE_XBOX }, { 0x046d, 0xca88, "Logitech Compact Controller for Xbox", 0, XTYPE_XBOX }, { 0x046d, 0xca8a, "Logitech Precision Vibration Feedback Wheel", 0, XTYPE_XBOX }, { 0x046d, 0xcaa3, "Logitech DriveFx Racing Wheel", 0, XTYPE_XBOX360 }, { 0x056e, 0x2004, "Elecom JC-U3613M", 0, XTYPE_XBOX360 }, { 0x05fd, 0x1007, "Mad Catz Controller (unverified)", 0, XTYPE_XBOX }, { 0x05fd, 0x107a, "InterAct 'PowerPad Pro' X-Box pad (Germany)", 0, XTYPE_XBOX }, { 0x05fe, 0x3030, "Chic Controller", 0, XTYPE_XBOX }, { 0x05fe, 0x3031, "Chic Controller", 0, XTYPE_XBOX }, { 0x062a, 0x0020, "Logic3 Xbox GamePad", 0, XTYPE_XBOX }, { 0x062a, 0x0033, "Competition Pro Steering Wheel", 0, XTYPE_XBOX }, { 0x06a3, 0x0200, "Saitek Racing Wheel", 0, XTYPE_XBOX }, { 0x06a3, 0x0201, "Saitek Adrenalin", 0, XTYPE_XBOX }, { 0x06a3, 0xf51a, "Saitek P3600", 0, XTYPE_XBOX360 }, { 0x0738, 0x4506, "Mad Catz 4506 Wireless Controller", 0, XTYPE_XBOX }, { 0x0738, 0x4516, "Mad Catz Control Pad", 0, XTYPE_XBOX }, { 0x0738, 0x4520, "Mad Catz Control Pad Pro", 0, XTYPE_XBOX }, { 0x0738, 0x4522, "Mad Catz LumiCON", 0, XTYPE_XBOX }, { 0x0738, 0x4526, "Mad Catz Control Pad Pro", 0, XTYPE_XBOX }, { 0x0738, 0x4530, "Mad Catz Universal MC2 Racing Wheel and Pedals", 0, XTYPE_XBOX }, { 0x0738, 0x4536, "Mad Catz MicroCON", 0, XTYPE_XBOX }, { 0x0738, 0x4540, "Mad Catz Beat Pad", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX }, { 0x0738, 0x4556, "Mad Catz Lynx Wireless Controller", 0, XTYPE_XBOX }, { 0x0738, 0x4586, "Mad Catz MicroCon Wireless Controller", 0, XTYPE_XBOX }, { 0x0738, 0x4588, "Mad Catz Blaster", 0, XTYPE_XBOX }, { 0x0738, 0x45ff, "Mad Catz Beat Pad (w/ Handle)", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX }, { 0x0738, 0x4716, "Mad Catz Wired Xbox 360 Controller", 0, XTYPE_XBOX360 }, { 0x0738, 
0x4718, "Mad Catz Street Fighter IV FightStick SE", 0, XTYPE_XBOX360 }, { 0x0738, 0x4726, "Mad Catz Xbox 360 Controller", 0, XTYPE_XBOX360 }, { 0x0738, 0x4728, "Mad Catz Street Fighter IV FightPad", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x0738, 0x4736, "Mad Catz MicroCon Gamepad", 0, XTYPE_XBOX360 }, { 0x0738, 0x4738, "Mad Catz Wired Xbox 360 Controller (SFIV)", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x0738, 0x4740, "Mad Catz Beat Pad", 0, XTYPE_XBOX360 }, { 0x0738, 0x4743, "Mad Catz Beat Pad Pro", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX }, { 0x0738, 0x4758, "Mad Catz Arcade Game Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x0738, 0x4a01, "Mad Catz FightStick TE 2", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOXONE }, { 0x0738, 0x6040, "Mad Catz Beat Pad Pro", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX }, { 0x0738, 0x9871, "Mad Catz Portable Drum", 0, XTYPE_XBOX360 }, { 0x0738, 0xb726, "Mad Catz Xbox controller - MW2", 0, XTYPE_XBOX360 }, { 0x0738, 0xb738, "Mad Catz MVC2TE Stick 2", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x0738, 0xbeef, "Mad Catz JOYTECH NEO SE Advanced GamePad", XTYPE_XBOX360 }, { 0x0738, 0xcb02, "Saitek Cyborg Rumble Pad - PC/Xbox 360", 0, XTYPE_XBOX360 }, { 0x0738, 0xcb03, "Saitek P3200 Rumble Pad - PC/Xbox 360", 0, XTYPE_XBOX360 }, { 0x0738, 0xcb29, "Saitek Aviator Stick AV8R02", 0, XTYPE_XBOX360 }, { 0x0738, 0xf738, "Super SFIV FightStick TE S", 0, XTYPE_XBOX360 }, { 0x07ff, 0xffff, "Mad Catz GamePad", 0, XTYPE_XBOX360 }, { 0x0b05, 0x1a38, "ASUS ROG RAIKIRI", 0, XTYPE_XBOXONE }, { 0x0b05, 0x1abb, "ASUS ROG RAIKIRI PRO", 0, XTYPE_XBOXONE }, { 0x0c12, 0x0005, "Intec wireless", 0, XTYPE_XBOX }, { 0x0c12, 0x8801, "Nyko Xbox Controller", 0, XTYPE_XBOX }, { 0x0c12, 0x8802, "Zeroplus Xbox Controller", 0, XTYPE_XBOX }, { 0x0c12, 0x8809, "RedOctane Xbox Dance Pad", DANCEPAD_MAP_CONFIG, XTYPE_XBOX }, { 0x0c12, 0x880a, "Pelican Eclipse PL-2023", 0, XTYPE_XBOX }, { 0x0c12, 0x8810, "Zeroplus Xbox Controller", 0, XTYPE_XBOX }, { 0x0c12, 0x9902, "HAMA VibraX - *FAULTY HARDWARE*", 0, XTYPE_XBOX }, { 0x0d2f, 0x0002, "Andamiro Pump It Up pad", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX }, { 0x0db0, 0x1901, "Micro Star International Xbox360 Controller for Windows", 0, XTYPE_XBOX360 }, { 0x0e4c, 0x1097, "Radica Gamester Controller", 0, XTYPE_XBOX }, { 0x0e4c, 0x1103, "Radica Gamester Reflex", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX }, { 0x0e4c, 0x2390, "Radica Games Jtech Controller", 0, XTYPE_XBOX }, { 0x0e4c, 0x3510, "Radica Gamester", 0, XTYPE_XBOX }, { 0x0e6f, 0x0003, "Logic3 Freebird wireless Controller", 0, XTYPE_XBOX }, { 0x0e6f, 0x0005, "Eclipse wireless Controller", 0, XTYPE_XBOX }, { 0x0e6f, 0x0006, "Edge wireless Controller", 0, XTYPE_XBOX }, { 0x0e6f, 0x0008, "After Glow Pro Controller", 0, XTYPE_XBOX }, { 0x0e6f, 0x0105, "HSM3 Xbox360 dancepad", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360 }, { 0x0e6f, 0x0113, "Afterglow AX.1 Gamepad for Xbox 360", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x011f, "Rock Candy Gamepad Wired Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0131, "PDP EA Sports Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0133, "Xbox 360 Wired Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0139, "Afterglow Prismatic Wired Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x013a, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0146, "Rock Candy Wired Controller for Xbox One", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0147, "PDP Marvel Xbox One Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x015c, "PDP Xbox One Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOXONE }, { 0x0e6f, 0x0161, "PDP Xbox One Controller", 0, 
XTYPE_XBOXONE }, { 0x0e6f, 0x0162, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0163, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0164, "PDP Battlefield One", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0165, "PDP Titanfall 2", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0201, "Pelican PL-3601 'TSZ' Wired Xbox 360 Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0213, "Afterglow Gamepad for Xbox 360", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x021f, "Rock Candy Gamepad for Xbox 360", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0246, "Rock Candy Gamepad for Xbox One 2015", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a0, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a1, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a2, "PDP Wired Controller for Xbox One - Crimson Red", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a4, "PDP Wired Controller for Xbox One - Stealth Series", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a6, "PDP Wired Controller for Xbox One - Camo Series", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a7, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a8, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02ab, "PDP Controller for Xbox One", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02ad, "PDP Wired Controller for Xbox One - Stealth Series", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02b3, "Afterglow Prismatic Wired Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02b8, "Afterglow Prismatic Wired Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0301, "Logic3 Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0346, "Rock Candy Gamepad for Xbox One 2016", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0401, "Logic3 Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0413, "Afterglow AX.1 Gamepad for Xbox 360", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0501, "PDP Xbox 360 Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0xf900, "PDP Afterglow AX.1", 0, XTYPE_XBOX360 }, { 0x0e8f, 0x0201, "SmartJoy Frag Xpad/PS2 adaptor", 0, XTYPE_XBOX }, { 0x0e8f, 0x3008, "Generic xbox control (dealextreme)", 0, XTYPE_XBOX }, { 0x0f0d, 0x000a, "Hori Co. 
DOA4 FightStick", 0, XTYPE_XBOX360 }, { 0x0f0d, 0x000c, "Hori PadEX Turbo", 0, XTYPE_XBOX360 }, { 0x0f0d, 0x000d, "Hori Fighting Stick EX2", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x0f0d, 0x0016, "Hori Real Arcade Pro.EX", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x0f0d, 0x001b, "Hori Real Arcade Pro VX", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x0f0d, 0x0063, "Hori Real Arcade Pro Hayabusa (USA) Xbox One", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOXONE }, { 0x0f0d, 0x0067, "HORIPAD ONE", 0, XTYPE_XBOXONE }, { 0x0f0d, 0x0078, "Hori Real Arcade Pro V Kai Xbox One", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOXONE }, { 0x0f0d, 0x00c5, "Hori Fighting Commander ONE", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOXONE }, { 0x0f0d, 0x00dc, "HORIPAD FPS for Nintendo Switch", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x0f30, 0x010b, "Philips Recoil", 0, XTYPE_XBOX }, { 0x0f30, 0x0202, "Joytech Advanced Controller", 0, XTYPE_XBOX }, { 0x0f30, 0x8888, "BigBen XBMiniPad Controller", 0, XTYPE_XBOX }, { 0x102c, 0xff0c, "Joytech Wireless Advanced Controller", 0, XTYPE_XBOX }, { 0x1038, 0x1430, "SteelSeries Stratus Duo", 0, XTYPE_XBOX360 }, { 0x1038, 0x1431, "SteelSeries Stratus Duo", 0, XTYPE_XBOX360 }, { 0x11c9, 0x55f0, "Nacon GC-100XF", 0, XTYPE_XBOX360 }, { 0x11ff, 0x0511, "PXN V900", 0, XTYPE_XBOX360 }, { 0x1209, 0x2882, "Ardwiino Controller", 0, XTYPE_XBOX360 }, { 0x12ab, 0x0004, "Honey Bee Xbox360 dancepad", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360 }, { 0x12ab, 0x0301, "PDP AFTERGLOW AX.1", 0, XTYPE_XBOX360 }, { 0x12ab, 0x0303, "Mortal Kombat Klassic FightStick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x12ab, 0x8809, "Xbox DDR dancepad", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX }, { 0x1430, 0x4748, "RedOctane Guitar Hero X-plorer", 0, XTYPE_XBOX360 }, { 0x1430, 0x8888, "TX6500+ Dance Pad (first generation)", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX }, { 0x1430, 0xf801, "RedOctane Controller", 0, XTYPE_XBOX360 }, { 0x146b, 0x0601, "BigBen Interactive XBOX 360 Controller", 0, XTYPE_XBOX360 }, { 0x146b, 0x0604, "Bigben Interactive DAIJA Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1532, 0x0a00, "Razer Atrox Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOXONE }, { 0x1532, 0x0a03, "Razer Wildcat", 0, XTYPE_XBOXONE }, { 0x1532, 0x0a29, "Razer Wolverine V2", 0, XTYPE_XBOXONE }, { 0x15e4, 0x3f00, "Power A Mini Pro Elite", 0, XTYPE_XBOX360 }, { 0x15e4, 0x3f0a, "Xbox Airflo wired controller", 0, XTYPE_XBOX360 }, { 0x15e4, 0x3f10, "Batarang Xbox 360 controller", 0, XTYPE_XBOX360 }, { 0x162e, 0xbeef, "Joytech Neo-Se Take2", 0, XTYPE_XBOX360 }, { 0x1689, 0xfd00, "Razer Onza Tournament Edition", 0, XTYPE_XBOX360 }, { 0x1689, 0xfd01, "Razer Onza Classic Edition", 0, XTYPE_XBOX360 }, { 0x1689, 0xfe00, "Razer Sabertooth", 0, XTYPE_XBOX360 }, { 0x17ef, 0x6182, "Lenovo Legion Controller for Windows", 0, XTYPE_XBOX360 }, { 0x1949, 0x041a, "Amazon Game Controller", 0, XTYPE_XBOX360 }, { 0x1a86, 0xe310, "QH Electronics Controller", 0, XTYPE_XBOX360 }, { 0x1bad, 0x0002, "Harmonix Rock Band Guitar", 0, XTYPE_XBOX360 }, { 0x1bad, 0x0003, "Harmonix Rock Band Drumkit", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0x0130, "Ion Drum Rocker", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf016, "Mad Catz Xbox 360 Controller", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf018, "Mad Catz Street Fighter IV SE Fighting Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf019, "Mad Catz Brawlstick for Xbox 360", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf021, "Mad Cats Ghost Recon FS GamePad", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf023, 
"MLG Pro Circuit Controller (Xbox)", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf025, "Mad Catz Call Of Duty", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf027, "Mad Catz FPS Pro", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf028, "Street Fighter IV FightPad", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf02e, "Mad Catz Fightpad", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf030, "Mad Catz Xbox 360 MC2 MicroCon Racing Wheel", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf036, "Mad Catz MicroCon GamePad Pro", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf038, "Street Fighter IV FightStick TE", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf039, "Mad Catz MvC2 TE", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf03a, "Mad Catz SFxT Fightstick Pro", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf03d, "Street Fighter IV Arcade Stick TE - Chun Li", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf03e, "Mad Catz MLG FightStick TE", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf03f, "Mad Catz FightStick SoulCaliber", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf042, "Mad Catz FightStick TES+", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf080, "Mad Catz FightStick TE2", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf501, "HoriPad EX2 Turbo", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf502, "Hori Real Arcade Pro.VX SA", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf503, "Hori Fighting Stick VX", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf504, "Hori Real Arcade Pro. EX", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf505, "Hori Fighting Stick EX2B", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xf506, "Hori Real Arcade Pro.EX Premium VLX", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf900, "Harmonix Xbox 360 Controller", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf901, "Gamestop Xbox 360 Controller", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf903, "Tron Xbox 360 controller", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf904, "PDP Versus Fighting Pad", 0, XTYPE_XBOX360 }, { 0x1bad, 0xf906, "MortalKombat FightStick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x1bad, 0xfa01, "MadCatz GamePad", 0, XTYPE_XBOX360 }, { 0x1bad, 0xfd00, "Razer Onza TE", 0, XTYPE_XBOX360 }, { 0x1bad, 0xfd01, "Razer Onza", 0, XTYPE_XBOX360 }, { 0x20d6, 0x2001, "BDA Xbox Series X Wired Controller", 0, XTYPE_XBOXONE }, { 0x20d6, 0x2009, "PowerA Enhanced Wired Controller for Xbox Series X|S", 0, XTYPE_XBOXONE }, { 0x20d6, 0x281f, "PowerA Wired Controller For Xbox 360", 0, XTYPE_XBOX360 }, { 0x2345, 0xe00b, "Machenike G5 Pro Controller", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5000, "Razer Atrox Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x24c6, 0x5300, "PowerA MINI PROEX Controller", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5303, "Xbox Airflo wired controller", 0, XTYPE_XBOX360 }, { 0x24c6, 0x530a, "Xbox 360 Pro EX Controller", 0, XTYPE_XBOX360 }, { 0x24c6, 0x531a, "PowerA Pro Ex", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5397, "FUS1ON Tournament Controller", 0, XTYPE_XBOX360 }, { 0x24c6, 0x541a, "PowerA Xbox One Mini Wired Controller", 0, XTYPE_XBOXONE }, { 0x24c6, 0x542a, "Xbox ONE spectra", 0, XTYPE_XBOXONE }, { 0x24c6, 0x543a, "PowerA Xbox One wired controller", 0, XTYPE_XBOXONE }, { 0x24c6, 0x5500, "Hori XBOX 360 EX 2 with Turbo", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5501, "Hori Real Arcade Pro VX-SA", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5502, "Hori Fighting Stick VX Alt", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x24c6, 0x5503, "Hori Fighting Edge", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x24c6, 0x5506, "Hori SOULCALIBUR V Stick", 0, XTYPE_XBOX360 }, { 0x24c6, 0x550d, "Hori 
GEM Xbox controller", 0, XTYPE_XBOX360 }, { 0x24c6, 0x550e, "Hori Real Arcade Pro V Kai 360", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x24c6, 0x5510, "Hori Fighting Commander ONE (Xbox 360/PC Mode)", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x24c6, 0x551a, "PowerA FUSION Pro Controller", 0, XTYPE_XBOXONE }, { 0x24c6, 0x561a, "PowerA FUSION Controller", 0, XTYPE_XBOXONE }, { 0x24c6, 0x5b00, "ThrustMaster Ferrari 458 Racing Wheel", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5b02, "Thrustmaster, Inc. GPX Controller", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5b03, "Thrustmaster Ferrari 458 Racing Wheel", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5d04, "Razer Sabertooth", 0, XTYPE_XBOX360 }, { 0x24c6, 0xfafe, "Rock Candy Gamepad for Xbox 360", 0, XTYPE_XBOX360 }, { 0x2563, 0x058d, "OneXPlayer Gamepad", 0, XTYPE_XBOX360 }, { 0x294b, 0x3303, "Snakebyte GAMEPAD BASE X", 0, XTYPE_XBOXONE }, { 0x294b, 0x3404, "Snakebyte GAMEPAD RGB X", 0, XTYPE_XBOXONE }, { 0x2dc8, 0x2000, "8BitDo Pro 2 Wired Controller fox Xbox", 0, XTYPE_XBOXONE }, { 0x2dc8, 0x3106, "8BitDo Ultimate Wireless / Pro 2 Wired Controller", 0, XTYPE_XBOX360 }, { 0x2dc8, 0x310a, "8BitDo Ultimate 2C Wireless Controller", 0, XTYPE_XBOX360 }, { 0x2e24, 0x0652, "Hyperkin Duke X-Box One pad", 0, XTYPE_XBOXONE }, { 0x31e3, 0x1100, "Wooting One", 0, XTYPE_XBOX360 }, { 0x31e3, 0x1200, "Wooting Two", 0, XTYPE_XBOX360 }, { 0x31e3, 0x1210, "Wooting Lekker", 0, XTYPE_XBOX360 }, { 0x31e3, 0x1220, "Wooting Two HE", 0, XTYPE_XBOX360 }, { 0x31e3, 0x1230, "Wooting Two HE (ARM)", 0, XTYPE_XBOX360 }, { 0x31e3, 0x1300, "Wooting 60HE (AVR)", 0, XTYPE_XBOX360 }, { 0x31e3, 0x1310, "Wooting 60HE (ARM)", 0, XTYPE_XBOX360 }, { 0x3285, 0x0607, "Nacon GC-100", 0, XTYPE_XBOX360 }, { 0x3285, 0x0646, "Nacon Pro Compact", 0, XTYPE_XBOXONE }, { 0x3285, 0x0663, "Nacon Evol-X", 0, XTYPE_XBOXONE }, { 0x3537, 0x1004, "GameSir T4 Kaleid", 0, XTYPE_XBOX360 }, { 0x3767, 0x0101, "Fanatec Speedster 3 Forceshock Wheel", 0, XTYPE_XBOX }, { 0xffff, 0xffff, "Chinese-made Xbox Controller", 0, XTYPE_XBOX }, { 0x0000, 0x0000, "Generic X-Box pad", 0, XTYPE_UNKNOWN } }; /* buttons shared with xbox and xbox360 */ static const signed short xpad_common_btn[] = { BTN_A, BTN_B, BTN_X, BTN_Y, /* "analog" buttons */ BTN_START, BTN_SELECT, BTN_THUMBL, BTN_THUMBR, /* start/back/sticks */ -1 /* terminating entry */ }; /* original xbox controllers only */ static const signed short xpad_btn[] = { BTN_C, BTN_Z, /* "analog" buttons */ -1 /* terminating entry */ }; /* used when dpad is mapped to buttons */ static const signed short xpad_btn_pad[] = { BTN_TRIGGER_HAPPY1, BTN_TRIGGER_HAPPY2, /* d-pad left, right */ BTN_TRIGGER_HAPPY3, BTN_TRIGGER_HAPPY4, /* d-pad up, down */ -1 /* terminating entry */ }; /* used when triggers are mapped to buttons */ static const signed short xpad_btn_triggers[] = { BTN_TL2, BTN_TR2, /* triggers left/right */ -1 }; static const signed short xpad360_btn[] = { /* buttons for x360 controller */ BTN_TL, BTN_TR, /* Button LB/RB */ BTN_MODE, /* The big X button */ -1 }; static const signed short xpad_abs[] = { ABS_X, ABS_Y, /* left stick */ ABS_RX, ABS_RY, /* right stick */ -1 /* terminating entry */ }; /* used when dpad is mapped to axes */ static const signed short xpad_abs_pad[] = { ABS_HAT0X, ABS_HAT0Y, /* d-pad axes */ -1 /* terminating entry */ }; /* used when triggers are mapped to axes */ static const signed short xpad_abs_triggers[] = { ABS_Z, ABS_RZ, /* triggers left/right */ -1 }; /* used when the controller has extra paddle buttons */ static const signed short xpad_btn_paddles[] = { 
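	/* consumed by xpadone_process_packet() when MAP_PADDLES is set,
	 * e.g. on the Elite pads listed in xpad_device[] above
	 */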
BTN_TRIGGER_HAPPY5, BTN_TRIGGER_HAPPY6, /* paddle upper right, lower right */ BTN_TRIGGER_HAPPY7, BTN_TRIGGER_HAPPY8, /* paddle upper left, lower left */ -1 /* terminating entry */ }; /* * Xbox 360 has a vendor-specific class, so we cannot match it with only * USB_INTERFACE_INFO (also specifically refused by USB subsystem), so we * match against vendor id as well. Wired Xbox 360 devices have protocol 1, * wireless controllers have protocol 129. */ #define XPAD_XBOX360_VENDOR_PROTOCOL(vend, pr) \ .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_INT_INFO, \ .idVendor = (vend), \ .bInterfaceClass = USB_CLASS_VENDOR_SPEC, \ .bInterfaceSubClass = 93, \ .bInterfaceProtocol = (pr) #define XPAD_XBOX360_VENDOR(vend) \ { XPAD_XBOX360_VENDOR_PROTOCOL((vend), 1) }, \ { XPAD_XBOX360_VENDOR_PROTOCOL((vend), 129) } /* The Xbox One controller uses subclass 71 and protocol 208. */ #define XPAD_XBOXONE_VENDOR_PROTOCOL(vend, pr) \ .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_INT_INFO, \ .idVendor = (vend), \ .bInterfaceClass = USB_CLASS_VENDOR_SPEC, \ .bInterfaceSubClass = 71, \ .bInterfaceProtocol = (pr) #define XPAD_XBOXONE_VENDOR(vend) \ { XPAD_XBOXONE_VENDOR_PROTOCOL((vend), 208) } static const struct usb_device_id xpad_table[] = { /* * Please keep this list sorted by vendor ID. Note that there are 2 * macros - XPAD_XBOX360_VENDOR and XPAD_XBOXONE_VENDOR. */ { USB_INTERFACE_INFO('X', 'B', 0) }, /* Xbox USB-IF not-approved class */ XPAD_XBOX360_VENDOR(0x0079), /* GPD Win 2 controller */ XPAD_XBOX360_VENDOR(0x03eb), /* Wooting Keyboards (Legacy) */ XPAD_XBOX360_VENDOR(0x03f0), /* HP HyperX Xbox 360 controllers */ XPAD_XBOXONE_VENDOR(0x03f0), /* HP HyperX Xbox One controllers */ XPAD_XBOX360_VENDOR(0x044f), /* Thrustmaster Xbox 360 controllers */ XPAD_XBOX360_VENDOR(0x045e), /* Microsoft Xbox 360 controllers */ XPAD_XBOXONE_VENDOR(0x045e), /* Microsoft Xbox One controllers */ XPAD_XBOX360_VENDOR(0x046d), /* Logitech Xbox 360-style controllers */ XPAD_XBOX360_VENDOR(0x056e), /* Elecom JC-U3613M */ XPAD_XBOX360_VENDOR(0x06a3), /* Saitek P3600 */ XPAD_XBOX360_VENDOR(0x0738), /* Mad Catz Xbox 360 controllers */ { USB_DEVICE(0x0738, 0x4540) }, /* Mad Catz Beat Pad */ XPAD_XBOXONE_VENDOR(0x0738), /* Mad Catz FightStick TE 2 */ XPAD_XBOX360_VENDOR(0x07ff), /* Mad Catz Gamepad */ XPAD_XBOXONE_VENDOR(0x0b05), /* ASUS controllers */ XPAD_XBOX360_VENDOR(0x0c12), /* Zeroplus X-Box 360 controllers */ XPAD_XBOX360_VENDOR(0x0db0), /* Micro Star International X-Box 360 controllers */ XPAD_XBOX360_VENDOR(0x0e6f), /* 0x0e6f Xbox 360 controllers */ XPAD_XBOXONE_VENDOR(0x0e6f), /* 0x0e6f Xbox One controllers */ XPAD_XBOX360_VENDOR(0x0f0d), /* Hori controllers */ XPAD_XBOXONE_VENDOR(0x0f0d), /* Hori controllers */ XPAD_XBOX360_VENDOR(0x1038), /* SteelSeries controllers */ XPAD_XBOXONE_VENDOR(0x10f5), /* Turtle Beach Controllers */ XPAD_XBOX360_VENDOR(0x11c9), /* Nacon GC100XF */ XPAD_XBOX360_VENDOR(0x11ff), /* PXN V900 */ XPAD_XBOX360_VENDOR(0x1209), /* Ardwiino Controllers */ XPAD_XBOX360_VENDOR(0x12ab), /* Xbox 360 dance pads */ XPAD_XBOX360_VENDOR(0x1430), /* RedOctane Xbox 360 controllers */ XPAD_XBOX360_VENDOR(0x146b), /* Bigben Interactive controllers */ XPAD_XBOX360_VENDOR(0x1532), /* Razer Sabertooth */ XPAD_XBOXONE_VENDOR(0x1532), /* Razer Wildcat */ XPAD_XBOX360_VENDOR(0x15e4), /* Numark Xbox 360 controllers */ XPAD_XBOX360_VENDOR(0x162e), /* Joytech Xbox 360 controllers */ XPAD_XBOX360_VENDOR(0x1689), /* Razer Onza */ XPAD_XBOX360_VENDOR(0x17ef), /* Lenovo */ 
XPAD_XBOX360_VENDOR(0x1949), /* Amazon controllers */ XPAD_XBOX360_VENDOR(0x1a86), /* QH Electronics */ XPAD_XBOX360_VENDOR(0x1bad), /* Harmonix Rock Band guitar and drums */ XPAD_XBOX360_VENDOR(0x20d6), /* PowerA controllers */ XPAD_XBOXONE_VENDOR(0x20d6), /* PowerA controllers */ XPAD_XBOX360_VENDOR(0x2345), /* Machenike Controllers */ XPAD_XBOX360_VENDOR(0x24c6), /* PowerA controllers */ XPAD_XBOXONE_VENDOR(0x24c6), /* PowerA controllers */ XPAD_XBOX360_VENDOR(0x2563), /* OneXPlayer Gamepad */ XPAD_XBOX360_VENDOR(0x260d), /* Dareu H101 */ XPAD_XBOXONE_VENDOR(0x294b), /* Snakebyte */ XPAD_XBOX360_VENDOR(0x2c22), /* Qanba Controllers */ XPAD_XBOX360_VENDOR(0x2dc8), /* 8BitDo Pro 2 Wired Controller */ XPAD_XBOXONE_VENDOR(0x2dc8), /* 8BitDo Pro 2 Wired Controller for Xbox */ XPAD_XBOXONE_VENDOR(0x2e24), /* Hyperkin Duke Xbox One pad */ XPAD_XBOX360_VENDOR(0x2f24), /* GameSir controllers */ XPAD_XBOX360_VENDOR(0x31e3), /* Wooting Keyboards */ XPAD_XBOX360_VENDOR(0x3285), /* Nacon GC-100 */ XPAD_XBOXONE_VENDOR(0x3285), /* Nacon Evol-X */ XPAD_XBOX360_VENDOR(0x3537), /* GameSir Controllers */ XPAD_XBOXONE_VENDOR(0x3537), /* GameSir Controllers */ { } }; MODULE_DEVICE_TABLE(usb, xpad_table); struct xboxone_init_packet { u16 idVendor; u16 idProduct; const u8 *data; u8 len; }; #define XBOXONE_INIT_PKT(_vid, _pid, _data) \ { \ .idVendor = (_vid), \ .idProduct = (_pid), \ .data = (_data), \ .len = ARRAY_SIZE(_data), \ } /* * starting with xbox one, the game input protocol is used * magic numbers are taken from * - https://github.com/xpadneo/gip-dissector/blob/main/src/gip-dissector.lua * - https://github.com/medusalix/xone/blob/master/bus/protocol.c */ #define GIP_CMD_ACK 0x01 #define GIP_CMD_IDENTIFY 0x04 #define GIP_CMD_POWER 0x05 #define GIP_CMD_AUTHENTICATE 0x06 #define GIP_CMD_VIRTUAL_KEY 0x07 #define GIP_CMD_RUMBLE 0x09 #define GIP_CMD_LED 0x0a #define GIP_CMD_FIRMWARE 0x0c #define GIP_CMD_INPUT 0x20 #define GIP_SEQ0 0x00 #define GIP_OPT_ACK 0x10 #define GIP_OPT_INTERNAL 0x20 /* * length of the command payload encoded with * https://en.wikipedia.org/wiki/LEB128 * which is a no-op for N < 128 */ #define GIP_PL_LEN(N) (N) /* * payload specific defines */ #define GIP_PWR_ON 0x00 #define GIP_LED_ON 0x01 #define GIP_MOTOR_R BIT(0) #define GIP_MOTOR_L BIT(1) #define GIP_MOTOR_RT BIT(2) #define GIP_MOTOR_LT BIT(3) #define GIP_MOTOR_ALL (GIP_MOTOR_R | GIP_MOTOR_L | GIP_MOTOR_RT | GIP_MOTOR_LT) #define GIP_WIRED_INTF_DATA 0 #define GIP_WIRED_INTF_AUDIO 1 /* * This packet is required for all Xbox One pads with 2015 * or later firmware installed (or present from the factory). */ static const u8 xboxone_power_on[] = { GIP_CMD_POWER, GIP_OPT_INTERNAL, GIP_SEQ0, GIP_PL_LEN(1), GIP_PWR_ON }; /* * This packet is required for Xbox One S (0x045e:0x02ea) * and Xbox One Elite Series 2 (0x045e:0x0b00) pads to * initialize the controller that was previously used in * Bluetooth mode. */ static const u8 xboxone_s_init[] = { GIP_CMD_POWER, GIP_OPT_INTERNAL, GIP_SEQ0, 0x0f, 0x06 }; /* * This packet is required to get additional input data * from Xbox One Elite Series 2 (0x045e:0x0b00) pads. * We mostly do this right now to get paddle data */ static const u8 extra_input_packet_init[] = { 0x4d, 0x10, 0x01, 0x02, 0x07, 0x00 }; /* * This packet is required for the Titanfall 2 Xbox One pads * (0x0e6f:0x0165) to finish initialization and for Hori pads * (0x0f0d:0x0067) to make the analog sticks work. 
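 * Like most GIP packets in this file, the ack below begins with the
 * four-byte header { command, options, sequence, LEB128 length }; for
 * example, xboxone_power_on above decodes as { GIP_CMD_POWER,
 * GIP_OPT_INTERNAL, GIP_SEQ0, GIP_PL_LEN(1), GIP_PWR_ON }. The sequence
 * byte at offset 2 is overwritten with the current serial number just
 * before transmission (see xpad_prepare_next_init_packet()).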
*/ static const u8 xboxone_hori_ack_id[] = { GIP_CMD_ACK, GIP_OPT_INTERNAL, GIP_SEQ0, GIP_PL_LEN(9), 0x00, GIP_CMD_IDENTIFY, GIP_OPT_INTERNAL, 0x3a, 0x00, 0x00, 0x00, 0x80, 0x00 }; /* * This packet is required for most (all?) of the PDP pads to start * sending input reports. These pads include: (0x0e6f:0x02ab), * (0x0e6f:0x02a4), (0x0e6f:0x02a6). */ static const u8 xboxone_pdp_led_on[] = { GIP_CMD_LED, GIP_OPT_INTERNAL, GIP_SEQ0, GIP_PL_LEN(3), 0x00, GIP_LED_ON, 0x14 }; /* * This packet is required for most (all?) of the PDP pads to start * sending input reports. These pads include: (0x0e6f:0x02ab), * (0x0e6f:0x02a4), (0x0e6f:0x02a6). */ static const u8 xboxone_pdp_auth[] = { GIP_CMD_AUTHENTICATE, GIP_OPT_INTERNAL, GIP_SEQ0, GIP_PL_LEN(2), 0x01, 0x00 }; /* * A specific rumble packet is required for some PowerA pads to start * sending input reports. One of those pads is (0x24c6:0x543a). */ static const u8 xboxone_rumblebegin_init[] = { GIP_CMD_RUMBLE, 0x00, GIP_SEQ0, GIP_PL_LEN(9), 0x00, GIP_MOTOR_ALL, 0x00, 0x00, 0x1D, 0x1D, 0xFF, 0x00, 0x00 }; /* * A rumble packet with zero FF intensity will immediately * terminate the rumbling required to init PowerA pads. * This should happen fast enough that the motors don't * spin up to enough speed to actually vibrate the gamepad. */ static const u8 xboxone_rumbleend_init[] = { GIP_CMD_RUMBLE, 0x00, GIP_SEQ0, GIP_PL_LEN(9), 0x00, GIP_MOTOR_ALL, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }; /* * This specifies the selection of init packets that a gamepad * will be sent on init *and* the order in which they will be * sent. The correct sequence number will be added when the * packet is going to be sent. */ static const struct xboxone_init_packet xboxone_init_packets[] = { XBOXONE_INIT_PKT(0x0e6f, 0x0165, xboxone_hori_ack_id), XBOXONE_INIT_PKT(0x0f0d, 0x0067, xboxone_hori_ack_id), XBOXONE_INIT_PKT(0x0000, 0x0000, xboxone_power_on), XBOXONE_INIT_PKT(0x045e, 0x02ea, xboxone_s_init), XBOXONE_INIT_PKT(0x045e, 0x0b00, xboxone_s_init), XBOXONE_INIT_PKT(0x045e, 0x0b00, extra_input_packet_init), XBOXONE_INIT_PKT(0x0e6f, 0x0000, xboxone_pdp_led_on), XBOXONE_INIT_PKT(0x0e6f, 0x0000, xboxone_pdp_auth), XBOXONE_INIT_PKT(0x24c6, 0x541a, xboxone_rumblebegin_init), XBOXONE_INIT_PKT(0x24c6, 0x542a, xboxone_rumblebegin_init), XBOXONE_INIT_PKT(0x24c6, 0x543a, xboxone_rumblebegin_init), XBOXONE_INIT_PKT(0x24c6, 0x541a, xboxone_rumbleend_init), XBOXONE_INIT_PKT(0x24c6, 0x542a, xboxone_rumbleend_init), XBOXONE_INIT_PKT(0x24c6, 0x543a, xboxone_rumbleend_init), }; struct xpad_output_packet { u8 data[XPAD_PKT_LEN]; u8 len; bool pending; }; #define XPAD_OUT_CMD_IDX 0 #define XPAD_OUT_FF_IDX 1 #define XPAD_OUT_LED_IDX (1 + IS_ENABLED(CONFIG_JOYSTICK_XPAD_FF)) #define XPAD_NUM_OUT_PACKETS (1 + \ IS_ENABLED(CONFIG_JOYSTICK_XPAD_FF) + \ IS_ENABLED(CONFIG_JOYSTICK_XPAD_LEDS)) struct usb_xpad { struct input_dev *dev; /* input device interface */ struct input_dev __rcu *x360w_dev; struct usb_device *udev; /* usb device */ struct usb_interface *intf; /* usb interface */ bool pad_present; bool input_created; struct urb *irq_in; /* urb for interrupt in report */ unsigned char *idata; /* input data */ dma_addr_t idata_dma; struct urb *irq_out; /* urb for interrupt out report */ struct usb_anchor irq_out_anchor; bool irq_out_active; /* we must not use an active URB */ u8 odata_serial; /* serial number for xbox one protocol */ unsigned char *odata; /* output data */ dma_addr_t odata_dma; spinlock_t odata_lock; struct xpad_output_packet out_packets[XPAD_NUM_OUT_PACKETS]; int last_out_packet; 
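	/*
	 * init_seq indexes xboxone_init_packets[]; xpad_start_xbox_one()
	 * resets it to 0 and xpad_prepare_next_init_packet() walks it
	 * forward until the table is exhausted.
	 */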
int init_seq; #if defined(CONFIG_JOYSTICK_XPAD_LEDS) struct xpad_led *led; #endif char phys[64]; /* physical device path */ int mapping; /* map d-pad to buttons or to axes */ int xtype; /* type of xbox device */ int packet_type; /* type of the extended packet */ int pad_nr; /* the order x360 pads were attached */ const char *name; /* name of the device */ struct work_struct work; /* init/remove device from callback */ time64_t mode_btn_down_ts; }; static int xpad_init_input(struct usb_xpad *xpad); static void xpad_deinit_input(struct usb_xpad *xpad); static void xpadone_ack_mode_report(struct usb_xpad *xpad, u8 seq_num); static void xpad360w_poweroff_controller(struct usb_xpad *xpad); /* * xpad_process_packet * * Completes a request by converting the data into events for the * input subsystem. * * The used report descriptor was taken from ITO Takayuki's website: * http://euc.jp/periphs/xbox-controller.ja.html */ static void xpad_process_packet(struct usb_xpad *xpad, u16 cmd, unsigned char *data) { struct input_dev *dev = xpad->dev; if (!(xpad->mapping & MAP_STICKS_TO_NULL)) { /* left stick */ input_report_abs(dev, ABS_X, (__s16) le16_to_cpup((__le16 *)(data + 12))); input_report_abs(dev, ABS_Y, ~(__s16) le16_to_cpup((__le16 *)(data + 14))); /* right stick */ input_report_abs(dev, ABS_RX, (__s16) le16_to_cpup((__le16 *)(data + 16))); input_report_abs(dev, ABS_RY, ~(__s16) le16_to_cpup((__le16 *)(data + 18))); } /* triggers left/right */ if (xpad->mapping & MAP_TRIGGERS_TO_BUTTONS) { input_report_key(dev, BTN_TL2, data[10]); input_report_key(dev, BTN_TR2, data[11]); } else { input_report_abs(dev, ABS_Z, data[10]); input_report_abs(dev, ABS_RZ, data[11]); } /* digital pad */ if (xpad->mapping & MAP_DPAD_TO_BUTTONS) { /* dpad as buttons (left, right, up, down) */ input_report_key(dev, BTN_TRIGGER_HAPPY1, data[2] & BIT(2)); input_report_key(dev, BTN_TRIGGER_HAPPY2, data[2] & BIT(3)); input_report_key(dev, BTN_TRIGGER_HAPPY3, data[2] & BIT(0)); input_report_key(dev, BTN_TRIGGER_HAPPY4, data[2] & BIT(1)); } else { input_report_abs(dev, ABS_HAT0X, !!(data[2] & 0x08) - !!(data[2] & 0x04)); input_report_abs(dev, ABS_HAT0Y, !!(data[2] & 0x02) - !!(data[2] & 0x01)); } /* start/back buttons and stick press left/right */ input_report_key(dev, BTN_START, data[2] & BIT(4)); input_report_key(dev, BTN_SELECT, data[2] & BIT(5)); input_report_key(dev, BTN_THUMBL, data[2] & BIT(6)); input_report_key(dev, BTN_THUMBR, data[2] & BIT(7)); /* "analog" buttons A, B, X, Y */ input_report_key(dev, BTN_A, data[4]); input_report_key(dev, BTN_B, data[5]); input_report_key(dev, BTN_X, data[6]); input_report_key(dev, BTN_Y, data[7]); /* "analog" buttons black, white */ input_report_key(dev, BTN_C, data[8]); input_report_key(dev, BTN_Z, data[9]); input_sync(dev); } /* * xpad360_process_packet * * Completes a request by converting the data into events for the * input subsystem. 
 * This version is for the Xbox 360 controller.
 *
 * The used report descriptor was taken from:
 * http://www.free60.org/wiki/Gamepad
 */
static void xpad360_process_packet(struct usb_xpad *xpad, struct input_dev *dev,
				   u16 cmd, unsigned char *data)
{
	/* valid pad data */
	if (data[0] != 0x00)
		return;

	/* digital pad */
	if (xpad->mapping & MAP_DPAD_TO_BUTTONS) {
		/* dpad as buttons (left, right, up, down) */
		input_report_key(dev, BTN_TRIGGER_HAPPY1, data[2] & BIT(2));
		input_report_key(dev, BTN_TRIGGER_HAPPY2, data[2] & BIT(3));
		input_report_key(dev, BTN_TRIGGER_HAPPY3, data[2] & BIT(0));
		input_report_key(dev, BTN_TRIGGER_HAPPY4, data[2] & BIT(1));
	}

	/*
	 * This should be a simple else block. However historically
	 * xbox360w has mapped DPAD to buttons while xbox360 did not. This
	 * made no sense, but now we cannot just switch back and have to
	 * support both behaviors.
	 */
	if (!(xpad->mapping & MAP_DPAD_TO_BUTTONS) ||
	    xpad->xtype == XTYPE_XBOX360W) {
		input_report_abs(dev, ABS_HAT0X,
				 !!(data[2] & 0x08) - !!(data[2] & 0x04));
		input_report_abs(dev, ABS_HAT0Y,
				 !!(data[2] & 0x02) - !!(data[2] & 0x01));
	}

	/* start/back buttons */
	input_report_key(dev, BTN_START, data[2] & BIT(4));
	input_report_key(dev, BTN_SELECT, data[2] & BIT(5));

	/* stick press left/right */
	input_report_key(dev, BTN_THUMBL, data[2] & BIT(6));
	input_report_key(dev, BTN_THUMBR, data[2] & BIT(7));

	/* buttons A,B,X,Y,TL,TR and MODE */
	input_report_key(dev, BTN_A, data[3] & BIT(4));
	input_report_key(dev, BTN_B, data[3] & BIT(5));
	input_report_key(dev, BTN_X, data[3] & BIT(6));
	input_report_key(dev, BTN_Y, data[3] & BIT(7));
	input_report_key(dev, BTN_TL, data[3] & BIT(0));
	input_report_key(dev, BTN_TR, data[3] & BIT(1));
	input_report_key(dev, BTN_MODE, data[3] & BIT(2));

	if (!(xpad->mapping & MAP_STICKS_TO_NULL)) {
		/* left stick */
		input_report_abs(dev, ABS_X,
				 (__s16) le16_to_cpup((__le16 *)(data + 6)));
		input_report_abs(dev, ABS_Y,
				 ~(__s16) le16_to_cpup((__le16 *)(data + 8)));

		/* right stick */
		input_report_abs(dev, ABS_RX,
				 (__s16) le16_to_cpup((__le16 *)(data + 10)));
		input_report_abs(dev, ABS_RY,
				 ~(__s16) le16_to_cpup((__le16 *)(data + 12)));
	}

	/* triggers left/right */
	if (xpad->mapping & MAP_TRIGGERS_TO_BUTTONS) {
		input_report_key(dev, BTN_TL2, data[4]);
		input_report_key(dev, BTN_TR2, data[5]);
	} else {
		input_report_abs(dev, ABS_Z, data[4]);
		input_report_abs(dev, ABS_RZ, data[5]);
	}

	input_sync(dev);

	/* XBOX360W controllers can't be turned off without driver assistance */
	if (xpad->xtype == XTYPE_XBOX360W) {
		if (xpad->mode_btn_down_ts > 0 && xpad->pad_present &&
		    ((ktime_get_seconds() - xpad->mode_btn_down_ts) >=
		     XPAD360W_POWEROFF_TIMEOUT)) {
			xpad360w_poweroff_controller(xpad);
			xpad->mode_btn_down_ts = 0;
			return;
		}

		/* mode button down/up */
		if (data[3] & BIT(2))
			xpad->mode_btn_down_ts = ktime_get_seconds();
		else
			xpad->mode_btn_down_ts = 0;
	}
}

static void xpad_presence_work(struct work_struct *work)
{
	struct usb_xpad *xpad = container_of(work, struct usb_xpad, work);
	int error;

	if (xpad->pad_present) {
		error = xpad_init_input(xpad);
		if (error) {
			/* complain only, not much else we can do here */
			dev_err(&xpad->dev->dev,
				"unable to init device: %d\n", error);
		} else {
			rcu_assign_pointer(xpad->x360w_dev, xpad->dev);
		}
	} else {
		RCU_INIT_POINTER(xpad->x360w_dev, NULL);
		synchronize_rcu();
		/*
		 * Now that we are sure xpad360w_process_packet is not
		 * using input device we can get rid of it.
		 */
		xpad_deinit_input(xpad);
	}
}
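/*
 * For example, a status frame announcing a newly connected pad arrives
 * with data[0] = 0x08 and bit 7 of data[1] set, which flips pad_present
 * and schedules xpad_presence_work() above; ordinary input frames
 * instead carry data[1] == 0x01 and a standard xpad360 payload starting
 * at data[4] (see xpad360w_process_packet() below).
 */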
/*
 * xpad360w_process_packet
 *
 * Completes a request by converting the data into events for the
 * input subsystem. This version is for the Xbox 360 wireless controller.
 *
 * Byte.Bit
 * 00.1 - Status change: The controller or headset has connected/disconnected
 *        Bits 01.7 and 01.6 are valid
 * 01.7 - Controller present
 * 01.6 - Headset present
 * 01.1 - Pad state (Bytes 4+) valid
 */
static void xpad360w_process_packet(struct usb_xpad *xpad, u16 cmd,
				    unsigned char *data)
{
	struct input_dev *dev;
	bool present;

	/* Presence change */
	if (data[0] & 0x08) {
		present = (data[1] & 0x80) != 0;

		if (xpad->pad_present != present) {
			xpad->pad_present = present;
			schedule_work(&xpad->work);
		}
	}

	/* Valid pad data */
	if (data[1] != 0x1)
		return;

	rcu_read_lock();
	dev = rcu_dereference(xpad->x360w_dev);
	if (dev)
		xpad360_process_packet(xpad, dev, cmd, &data[4]);
	rcu_read_unlock();
}

/*
 * xpadone_process_packet
 *
 * Completes a request by converting the data into events for the
 * input subsystem. This version is for the Xbox One controller.
 *
 * The report format was gleaned from
 * https://github.com/kylelemons/xbox/blob/master/xbox.go
 */
static void xpadone_process_packet(struct usb_xpad *xpad, u16 cmd,
				   unsigned char *data)
{
	struct input_dev *dev = xpad->dev;
	bool do_sync = false;

	/* the xbox button has its own special report */
	if (data[0] == GIP_CMD_VIRTUAL_KEY) {
		/*
		 * The Xbox One S controller requires these reports to be
		 * acked otherwise it continues sending them forever and
		 * won't report further mode button events.
		 */
		if (data[1] == (GIP_OPT_ACK | GIP_OPT_INTERNAL))
			xpadone_ack_mode_report(xpad, data[2]);

		input_report_key(dev, BTN_MODE, data[4] & GENMASK(1, 0));
		input_sync(dev);

		do_sync = true;
	} else if (data[0] == GIP_CMD_FIRMWARE) {
		/* Some packet formats force us to use this separate report
		 * to poll the paddle inputs */
		if (xpad->packet_type == PKT_XBE2_FW_5_11) {
			/* Mute paddles if controller is in a custom profile slot
			 * Checked by looking at the active profile slot to
			 * verify it's the default slot
			 */
			if (data[19] != 0)
				data[18] = 0;

			/* Elite Series 2 split packet paddle bits */
			input_report_key(dev, BTN_TRIGGER_HAPPY5, data[18] & BIT(0));
			input_report_key(dev, BTN_TRIGGER_HAPPY6, data[18] & BIT(1));
			input_report_key(dev, BTN_TRIGGER_HAPPY7, data[18] & BIT(2));
			input_report_key(dev, BTN_TRIGGER_HAPPY8, data[18] & BIT(3));

			do_sync = true;
		}
	} else if (data[0] == GIP_CMD_INPUT) { /* The main valid packet type for inputs */
		/* menu/view buttons */
		input_report_key(dev, BTN_START, data[4] & BIT(2));
		input_report_key(dev, BTN_SELECT, data[4] & BIT(3));
		if (xpad->mapping & MAP_SELECT_BUTTON)
			input_report_key(dev, KEY_RECORD, data[22] & BIT(0));

		/* buttons A,B,X,Y */
		input_report_key(dev, BTN_A, data[4] & BIT(4));
		input_report_key(dev, BTN_B, data[4] & BIT(5));
		input_report_key(dev, BTN_X, data[4] & BIT(6));
		input_report_key(dev, BTN_Y, data[4] & BIT(7));

		/* digital pad */
		if (xpad->mapping & MAP_DPAD_TO_BUTTONS) {
			/* dpad as buttons (left, right, up, down) */
			input_report_key(dev, BTN_TRIGGER_HAPPY1, data[5] & BIT(2));
			input_report_key(dev, BTN_TRIGGER_HAPPY2, data[5] & BIT(3));
			input_report_key(dev, BTN_TRIGGER_HAPPY3, data[5] & BIT(0));
			input_report_key(dev, BTN_TRIGGER_HAPPY4, data[5] & BIT(1));
		} else {
			input_report_abs(dev, ABS_HAT0X,
					 !!(data[5] & 0x08) - !!(data[5] & 0x04));
			input_report_abs(dev, ABS_HAT0Y,
					 !!(data[5] & 0x02) - !!(data[5] & 0x01));
		}

		/* TL/TR */
		input_report_key(dev, BTN_TL, data[5] & BIT(4));
		input_report_key(dev, BTN_TR, data[5] & BIT(5));

		/* stick press left/right */
		input_report_key(dev, BTN_THUMBL, data[5] & BIT(6));
		input_report_key(dev, BTN_THUMBR, data[5] & BIT(7));

		if (!(xpad->mapping &
MAP_STICKS_TO_NULL)) { /* left stick */ input_report_abs(dev, ABS_X, (__s16) le16_to_cpup((__le16 *)(data + 10))); input_report_abs(dev, ABS_Y, ~(__s16) le16_to_cpup((__le16 *)(data + 12))); /* right stick */ input_report_abs(dev, ABS_RX, (__s16) le16_to_cpup((__le16 *)(data + 14))); input_report_abs(dev, ABS_RY, ~(__s16) le16_to_cpup((__le16 *)(data + 16))); } /* triggers left/right */ if (xpad->mapping & MAP_TRIGGERS_TO_BUTTONS) { input_report_key(dev, BTN_TL2, (__u16) le16_to_cpup((__le16 *)(data + 6))); input_report_key(dev, BTN_TR2, (__u16) le16_to_cpup((__le16 *)(data + 8))); } else { input_report_abs(dev, ABS_Z, (__u16) le16_to_cpup((__le16 *)(data + 6))); input_report_abs(dev, ABS_RZ, (__u16) le16_to_cpup((__le16 *)(data + 8))); } /* Profile button has a value of 0-3, so it is reported as an axis */ if (xpad->mapping & MAP_PROFILE_BUTTON) input_report_abs(dev, ABS_PROFILE, data[34]); /* paddle handling */ /* based on SDL's SDL_hidapi_xboxone.c */ if (xpad->mapping & MAP_PADDLES) { if (xpad->packet_type == PKT_XBE1) { /* Mute paddles if controller has a custom mapping applied. * Checked by comparing the current mapping * config against the factory mapping config */ if (memcmp(&data[4], &data[18], 2) != 0) data[32] = 0; /* OG Elite Series Controller paddle bits */ input_report_key(dev, BTN_TRIGGER_HAPPY5, data[32] & BIT(1)); input_report_key(dev, BTN_TRIGGER_HAPPY6, data[32] & BIT(3)); input_report_key(dev, BTN_TRIGGER_HAPPY7, data[32] & BIT(0)); input_report_key(dev, BTN_TRIGGER_HAPPY8, data[32] & BIT(2)); } else if (xpad->packet_type == PKT_XBE2_FW_OLD) { /* Mute paddles if controller has a custom mapping applied. * Checked by comparing the current mapping * config against the factory mapping config */ if (data[19] != 0) data[18] = 0; /* Elite Series 2 4.x firmware paddle bits */ input_report_key(dev, BTN_TRIGGER_HAPPY5, data[18] & BIT(0)); input_report_key(dev, BTN_TRIGGER_HAPPY6, data[18] & BIT(1)); input_report_key(dev, BTN_TRIGGER_HAPPY7, data[18] & BIT(2)); input_report_key(dev, BTN_TRIGGER_HAPPY8, data[18] & BIT(3)); } else if (xpad->packet_type == PKT_XBE2_FW_5_EARLY) { /* Mute paddles if controller has a custom mapping applied. 
* Checked by comparing the current mapping * config against the factory mapping config */ if (data[23] != 0) data[22] = 0; /* Elite Series 2 5.x firmware paddle bits * (before the packet was split) */ input_report_key(dev, BTN_TRIGGER_HAPPY5, data[22] & BIT(0)); input_report_key(dev, BTN_TRIGGER_HAPPY6, data[22] & BIT(1)); input_report_key(dev, BTN_TRIGGER_HAPPY7, data[22] & BIT(2)); input_report_key(dev, BTN_TRIGGER_HAPPY8, data[22] & BIT(3)); } } do_sync = true; } if (do_sync) input_sync(dev); } static void xpad_irq_in(struct urb *urb) { struct usb_xpad *xpad = urb->context; struct device *dev = &xpad->intf->dev; int retval, status; status = urb->status; switch (status) { case 0: /* success */ break; case -ECONNRESET: case -ENOENT: case -ESHUTDOWN: /* this urb is terminated, clean up */ dev_dbg(dev, "%s - urb shutting down with status: %d\n", __func__, status); return; default: dev_dbg(dev, "%s - nonzero urb status received: %d\n", __func__, status); goto exit; } switch (xpad->xtype) { case XTYPE_XBOX360: xpad360_process_packet(xpad, xpad->dev, 0, xpad->idata); break; case XTYPE_XBOX360W: xpad360w_process_packet(xpad, 0, xpad->idata); break; case XTYPE_XBOXONE: xpadone_process_packet(xpad, 0, xpad->idata); break; default: xpad_process_packet(xpad, 0, xpad->idata); } exit: retval = usb_submit_urb(urb, GFP_ATOMIC); if (retval) dev_err(dev, "%s - usb_submit_urb failed with result %d\n", __func__, retval); } /* Callers must hold xpad->odata_lock spinlock */ static bool xpad_prepare_next_init_packet(struct usb_xpad *xpad) { const struct xboxone_init_packet *init_packet; if (xpad->xtype != XTYPE_XBOXONE) return false; /* Perform initialization sequence for Xbox One pads that require it */ while (xpad->init_seq < ARRAY_SIZE(xboxone_init_packets)) { init_packet = &xboxone_init_packets[xpad->init_seq++]; if (init_packet->idVendor != 0 && init_packet->idVendor != xpad->dev->id.vendor) continue; if (init_packet->idProduct != 0 && init_packet->idProduct != xpad->dev->id.product) continue; /* This packet applies to our device, so prepare to send it */ memcpy(xpad->odata, init_packet->data, init_packet->len); xpad->irq_out->transfer_buffer_length = init_packet->len; /* Update packet with current sequence number */ xpad->odata[2] = xpad->odata_serial++; return true; } return false; } /* Callers must hold xpad->odata_lock spinlock */ static bool xpad_prepare_next_out_packet(struct usb_xpad *xpad) { struct xpad_output_packet *pkt, *packet = NULL; int i; /* We may have init packets to send before we can send user commands */ if (xpad_prepare_next_init_packet(xpad)) return true; for (i = 0; i < XPAD_NUM_OUT_PACKETS; i++) { if (++xpad->last_out_packet >= XPAD_NUM_OUT_PACKETS) xpad->last_out_packet = 0; pkt = &xpad->out_packets[xpad->last_out_packet]; if (pkt->pending) { dev_dbg(&xpad->intf->dev, "%s - found pending output packet %d\n", __func__, xpad->last_out_packet); packet = pkt; break; } } if (packet) { memcpy(xpad->odata, packet->data, packet->len); xpad->irq_out->transfer_buffer_length = packet->len; packet->pending = false; return true; } return false; } /* Callers must hold xpad->odata_lock spinlock */ static int xpad_try_sending_next_out_packet(struct usb_xpad *xpad) { int error; if (!xpad->irq_out_active && xpad_prepare_next_out_packet(xpad)) { usb_anchor_urb(xpad->irq_out, &xpad->irq_out_anchor); error = usb_submit_urb(xpad->irq_out, GFP_ATOMIC); if (error) { dev_err(&xpad->intf->dev, "%s - usb_submit_urb failed with result %d\n", __func__, error); usb_unanchor_urb(xpad->irq_out); return -EIO; } 
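		/*
		 * The URB is now in flight; xpad_irq_out() either rearms
		 * irq_out_active with the next pending packet or clears it
		 * when the endpoint goes idle.
		 */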
xpad->irq_out_active = true; } return 0; } static void xpad_irq_out(struct urb *urb) { struct usb_xpad *xpad = urb->context; struct device *dev = &xpad->intf->dev; int status = urb->status; int error; guard(spinlock_irqsave)(&xpad->odata_lock); switch (status) { case 0: /* success */ xpad->irq_out_active = xpad_prepare_next_out_packet(xpad); break; case -ECONNRESET: case -ENOENT: case -ESHUTDOWN: /* this urb is terminated, clean up */ dev_dbg(dev, "%s - urb shutting down with status: %d\n", __func__, status); xpad->irq_out_active = false; break; default: dev_dbg(dev, "%s - nonzero urb status received: %d\n", __func__, status); break; } if (xpad->irq_out_active) { usb_anchor_urb(urb, &xpad->irq_out_anchor); error = usb_submit_urb(urb, GFP_ATOMIC); if (error) { dev_err(dev, "%s - usb_submit_urb failed with result %d\n", __func__, error); usb_unanchor_urb(urb); xpad->irq_out_active = false; } } } static int xpad_init_output(struct usb_interface *intf, struct usb_xpad *xpad, struct usb_endpoint_descriptor *ep_irq_out) { int error; if (xpad->xtype == XTYPE_UNKNOWN) return 0; init_usb_anchor(&xpad->irq_out_anchor); xpad->odata = usb_alloc_coherent(xpad->udev, XPAD_PKT_LEN, GFP_KERNEL, &xpad->odata_dma); if (!xpad->odata) return -ENOMEM; spin_lock_init(&xpad->odata_lock); xpad->irq_out = usb_alloc_urb(0, GFP_KERNEL); if (!xpad->irq_out) { error = -ENOMEM; goto err_free_coherent; } usb_fill_int_urb(xpad->irq_out, xpad->udev, usb_sndintpipe(xpad->udev, ep_irq_out->bEndpointAddress), xpad->odata, XPAD_PKT_LEN, xpad_irq_out, xpad, ep_irq_out->bInterval); xpad->irq_out->transfer_dma = xpad->odata_dma; xpad->irq_out->transfer_flags |= URB_NO_TRANSFER_DMA_MAP; return 0; err_free_coherent: usb_free_coherent(xpad->udev, XPAD_PKT_LEN, xpad->odata, xpad->odata_dma); return error; } static void xpad_stop_output(struct usb_xpad *xpad) { if (xpad->xtype != XTYPE_UNKNOWN) { if (!usb_wait_anchor_empty_timeout(&xpad->irq_out_anchor, 5000)) { dev_warn(&xpad->intf->dev, "timed out waiting for output URB to complete, killing\n"); usb_kill_anchored_urbs(&xpad->irq_out_anchor); } } } static void xpad_deinit_output(struct usb_xpad *xpad) { if (xpad->xtype != XTYPE_UNKNOWN) { usb_free_urb(xpad->irq_out); usb_free_coherent(xpad->udev, XPAD_PKT_LEN, xpad->odata, xpad->odata_dma); } } static int xpad_inquiry_pad_presence(struct usb_xpad *xpad) { struct xpad_output_packet *packet = &xpad->out_packets[XPAD_OUT_CMD_IDX]; guard(spinlock_irqsave)(&xpad->odata_lock); packet->data[0] = 0x08; packet->data[1] = 0x00; packet->data[2] = 0x0F; packet->data[3] = 0xC0; packet->data[4] = 0x00; packet->data[5] = 0x00; packet->data[6] = 0x00; packet->data[7] = 0x00; packet->data[8] = 0x00; packet->data[9] = 0x00; packet->data[10] = 0x00; packet->data[11] = 0x00; packet->len = 12; packet->pending = true; /* Reset the sequence so we send out presence first */ xpad->last_out_packet = -1; return xpad_try_sending_next_out_packet(xpad); } static int xpad_start_xbox_one(struct usb_xpad *xpad) { int error; if (usb_ifnum_to_if(xpad->udev, GIP_WIRED_INTF_AUDIO)) { /* * Explicitly disable the audio interface. This is needed * for some controllers, such as the PowerA Enhanced Wired * Controller for Series X|S (0x20d6:0x200e) to report the * guide button. */ error = usb_set_interface(xpad->udev, GIP_WIRED_INTF_AUDIO, 0); if (error) dev_warn(&xpad->dev->dev, "unable to disable audio interface: %d\n", error); } guard(spinlock_irqsave)(&xpad->odata_lock); /* * Begin the init sequence by attempting to send a packet. 
* We will cycle through the init packet sequence before * sending any packets from the output ring. */ xpad->init_seq = 0; return xpad_try_sending_next_out_packet(xpad); } static void xpadone_ack_mode_report(struct usb_xpad *xpad, u8 seq_num) { struct xpad_output_packet *packet = &xpad->out_packets[XPAD_OUT_CMD_IDX]; static const u8 mode_report_ack[] = { GIP_CMD_ACK, GIP_OPT_INTERNAL, GIP_SEQ0, GIP_PL_LEN(9), 0x00, GIP_CMD_VIRTUAL_KEY, GIP_OPT_INTERNAL, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00 }; guard(spinlock_irqsave)(&xpad->odata_lock); packet->len = sizeof(mode_report_ack); memcpy(packet->data, mode_report_ack, packet->len); packet->data[2] = seq_num; packet->pending = true; /* Reset the sequence so we send out the ack now */ xpad->last_out_packet = -1; xpad_try_sending_next_out_packet(xpad); } #ifdef CONFIG_JOYSTICK_XPAD_FF static int xpad_play_effect(struct input_dev *dev, void *data, struct ff_effect *effect) { struct usb_xpad *xpad = input_get_drvdata(dev); struct xpad_output_packet *packet = &xpad->out_packets[XPAD_OUT_FF_IDX]; __u16 strong; __u16 weak; if (effect->type != FF_RUMBLE) return 0; strong = effect->u.rumble.strong_magnitude; weak = effect->u.rumble.weak_magnitude; guard(spinlock_irqsave)(&xpad->odata_lock); switch (xpad->xtype) { case XTYPE_XBOX: packet->data[0] = 0x00; packet->data[1] = 0x06; packet->data[2] = 0x00; packet->data[3] = strong / 256; /* left actuator */ packet->data[4] = 0x00; packet->data[5] = weak / 256; /* right actuator */ packet->len = 6; packet->pending = true; break; case XTYPE_XBOX360: packet->data[0] = 0x00; packet->data[1] = 0x08; packet->data[2] = 0x00; packet->data[3] = strong / 256; /* left actuator? */ packet->data[4] = weak / 256; /* right actuator? */ packet->data[5] = 0x00; packet->data[6] = 0x00; packet->data[7] = 0x00; packet->len = 8; packet->pending = true; break; case XTYPE_XBOX360W: packet->data[0] = 0x00; packet->data[1] = 0x01; packet->data[2] = 0x0F; packet->data[3] = 0xC0; packet->data[4] = 0x00; packet->data[5] = strong / 256; packet->data[6] = weak / 256; packet->data[7] = 0x00; packet->data[8] = 0x00; packet->data[9] = 0x00; packet->data[10] = 0x00; packet->data[11] = 0x00; packet->len = 12; packet->pending = true; break; case XTYPE_XBOXONE: packet->data[0] = GIP_CMD_RUMBLE; /* activate rumble */ packet->data[1] = 0x00; packet->data[2] = xpad->odata_serial++; packet->data[3] = GIP_PL_LEN(9); packet->data[4] = 0x00; packet->data[5] = GIP_MOTOR_ALL; packet->data[6] = 0x00; /* left trigger */ packet->data[7] = 0x00; /* right trigger */ packet->data[8] = strong / 512; /* left actuator */ packet->data[9] = weak / 512; /* right actuator */ packet->data[10] = 0xFF; /* on period */ packet->data[11] = 0x00; /* off period */ packet->data[12] = 0xFF; /* repeat count */ packet->len = 13; packet->pending = true; break; default: dev_dbg(&xpad->dev->dev, "%s - rumble command sent to unsupported xpad type: %d\n", __func__, xpad->xtype); return -EINVAL; } return xpad_try_sending_next_out_packet(xpad); } static int xpad_init_ff(struct usb_xpad *xpad) { if (xpad->xtype == XTYPE_UNKNOWN) return 0; input_set_capability(xpad->dev, EV_FF, FF_RUMBLE); return input_ff_create_memless(xpad->dev, NULL, xpad_play_effect); } #else static int xpad_init_ff(struct usb_xpad *xpad) { return 0; } #endif #if defined(CONFIG_JOYSTICK_XPAD_LEDS) #include <linux/leds.h> #include <linux/idr.h> static DEFINE_IDA(xpad_pad_seq); struct xpad_led { char name[16]; struct led_classdev led_cdev; struct usb_xpad *xpad; }; /* * set the LEDs on Xbox 360 / Wireless Controllers * 
@param command * 0: off * 1: all blink, then previous setting * 2: 1/top-left blink, then on * 3: 2/top-right blink, then on * 4: 3/bottom-left blink, then on * 5: 4/bottom-right blink, then on * 6: 1/top-left on * 7: 2/top-right on * 8: 3/bottom-left on * 9: 4/bottom-right on * 10: rotate * 11: blink, based on previous setting * 12: slow blink, based on previous setting * 13: rotate with two lights * 14: persistent slow all blink * 15: blink once, then previous setting */ static void xpad_send_led_command(struct usb_xpad *xpad, int command) { struct xpad_output_packet *packet = &xpad->out_packets[XPAD_OUT_LED_IDX]; command %= 16; guard(spinlock_irqsave)(&xpad->odata_lock); switch (xpad->xtype) { case XTYPE_XBOX360: packet->data[0] = 0x01; packet->data[1] = 0x03; packet->data[2] = command; packet->len = 3; packet->pending = true; break; case XTYPE_XBOX360W: packet->data[0] = 0x00; packet->data[1] = 0x00; packet->data[2] = 0x08; packet->data[3] = 0x40 + command; packet->data[4] = 0x00; packet->data[5] = 0x00; packet->data[6] = 0x00; packet->data[7] = 0x00; packet->data[8] = 0x00; packet->data[9] = 0x00; packet->data[10] = 0x00; packet->data[11] = 0x00; packet->len = 12; packet->pending = true; break; } xpad_try_sending_next_out_packet(xpad); } /* * Light up the segment corresponding to the pad number on * Xbox 360 Controllers. */ static void xpad_identify_controller(struct usb_xpad *xpad) { led_set_brightness(&xpad->led->led_cdev, (xpad->pad_nr % 4) + 2); } static void xpad_led_set(struct led_classdev *led_cdev, enum led_brightness value) { struct xpad_led *xpad_led = container_of(led_cdev, struct xpad_led, led_cdev); xpad_send_led_command(xpad_led->xpad, value); } static int xpad_led_probe(struct usb_xpad *xpad) { struct xpad_led *led; struct led_classdev *led_cdev; int error; if (xpad->xtype != XTYPE_XBOX360 && xpad->xtype != XTYPE_XBOX360W) return 0; xpad->led = led = kzalloc(sizeof(*led), GFP_KERNEL); if (!led) return -ENOMEM; xpad->pad_nr = ida_alloc(&xpad_pad_seq, GFP_KERNEL); if (xpad->pad_nr < 0) { error = xpad->pad_nr; goto err_free_mem; } snprintf(led->name, sizeof(led->name), "xpad%d", xpad->pad_nr); led->xpad = xpad; led_cdev = &led->led_cdev; led_cdev->name = led->name; led_cdev->brightness_set = xpad_led_set; led_cdev->flags = LED_CORE_SUSPENDRESUME; error = led_classdev_register(&xpad->udev->dev, led_cdev); if (error) goto err_free_id; xpad_identify_controller(xpad); return 0; err_free_id: ida_free(&xpad_pad_seq, xpad->pad_nr); err_free_mem: kfree(led); xpad->led = NULL; return error; } static void xpad_led_disconnect(struct usb_xpad *xpad) { struct xpad_led *xpad_led = xpad->led; if (xpad_led) { led_classdev_unregister(&xpad_led->led_cdev); ida_free(&xpad_pad_seq, xpad->pad_nr); kfree(xpad_led); } } #else static int xpad_led_probe(struct usb_xpad *xpad) { return 0; } static void xpad_led_disconnect(struct usb_xpad *xpad) { } #endif static int xpad_start_input(struct usb_xpad *xpad) { int error; if (usb_submit_urb(xpad->irq_in, GFP_KERNEL)) return -EIO; if (xpad->xtype == XTYPE_XBOXONE) { error = xpad_start_xbox_one(xpad); if (error) { usb_kill_urb(xpad->irq_in); return error; } } if (xpad->xtype == XTYPE_XBOX360) { /* * Some third-party Xbox 360-style controllers * require this message to finish initialization.
*/ u8 dummy[20]; error = usb_control_msg_recv(xpad->udev, 0, /* bRequest */ 0x01, /* bmRequestType */ USB_TYPE_VENDOR | USB_DIR_IN | USB_RECIP_INTERFACE, /* wValue */ 0x100, /* wIndex */ 0x00, dummy, sizeof(dummy), 25, GFP_KERNEL); if (error) dev_warn(&xpad->dev->dev, "unable to receive magic message: %d\n", error); } return 0; } static void xpad_stop_input(struct usb_xpad *xpad) { usb_kill_urb(xpad->irq_in); } static void xpad360w_poweroff_controller(struct usb_xpad *xpad) { struct xpad_output_packet *packet = &xpad->out_packets[XPAD_OUT_CMD_IDX]; guard(spinlock_irqsave)(&xpad->odata_lock); packet->data[0] = 0x00; packet->data[1] = 0x00; packet->data[2] = 0x08; packet->data[3] = 0xC0; packet->data[4] = 0x00; packet->data[5] = 0x00; packet->data[6] = 0x00; packet->data[7] = 0x00; packet->data[8] = 0x00; packet->data[9] = 0x00; packet->data[10] = 0x00; packet->data[11] = 0x00; packet->len = 12; packet->pending = true; /* Reset the sequence so we send out poweroff now */ xpad->last_out_packet = -1; xpad_try_sending_next_out_packet(xpad); } static int xpad360w_start_input(struct usb_xpad *xpad) { int error; error = usb_submit_urb(xpad->irq_in, GFP_KERNEL); if (error) return -EIO; /* * Send presence packet. * This will force the controller to resend connection packets. * This is useful in the case we activate the module after the * adapter has been plugged in, as it won't automatically * send us info about the controllers. */ error = xpad_inquiry_pad_presence(xpad); if (error) { usb_kill_urb(xpad->irq_in); return error; } return 0; } static void xpad360w_stop_input(struct usb_xpad *xpad) { usb_kill_urb(xpad->irq_in); /* Make sure we are done with presence work if it was scheduled */ flush_work(&xpad->work); } static int xpad_open(struct input_dev *dev) { struct usb_xpad *xpad = input_get_drvdata(dev); return xpad_start_input(xpad); } static void xpad_close(struct input_dev *dev) { struct usb_xpad *xpad = input_get_drvdata(dev); xpad_stop_input(xpad); } static void xpad_set_up_abs(struct input_dev *input_dev, signed short abs) { struct usb_xpad *xpad = input_get_drvdata(input_dev); switch (abs) { case ABS_X: case ABS_Y: case ABS_RX: case ABS_RY: /* the two sticks */ input_set_abs_params(input_dev, abs, -32768, 32767, 16, 128); break; case ABS_Z: case ABS_RZ: /* the triggers (if mapped to axes) */ if (xpad->xtype == XTYPE_XBOXONE) input_set_abs_params(input_dev, abs, 0, 1023, 0, 0); else input_set_abs_params(input_dev, abs, 0, 255, 0, 0); break; case ABS_HAT0X: case ABS_HAT0Y: /* the d-pad (only if the d-pad is mapped to axes) */ input_set_abs_params(input_dev, abs, -1, 1, 0, 0); break; case ABS_PROFILE: /* 4 value profile button (such as on XAC) */ input_set_abs_params(input_dev, abs, 0, 4, 0, 0); break; default: input_set_abs_params(input_dev, abs, 0, 0, 0, 0); break; } } static void xpad_deinit_input(struct usb_xpad *xpad) { if (xpad->input_created) { xpad->input_created = false; xpad_led_disconnect(xpad); input_unregister_device(xpad->dev); } } static int xpad_init_input(struct usb_xpad *xpad) { struct input_dev *input_dev; int i, error; input_dev = input_allocate_device(); if (!input_dev) return -ENOMEM; xpad->dev = input_dev; input_dev->name = xpad->name; input_dev->phys = xpad->phys; usb_to_input_id(xpad->udev, &input_dev->id); if (xpad->xtype == XTYPE_XBOX360W) { /* x360w controllers and the receiver have different ids */ input_dev->id.product = 0x02a1; } input_dev->dev.parent = &xpad->intf->dev; input_set_drvdata(input_dev, xpad); if (xpad->xtype != XTYPE_XBOX360W) { input_dev->open =
xpad_open; input_dev->close = xpad_close; } if (!(xpad->mapping & MAP_STICKS_TO_NULL)) { /* set up axes */ for (i = 0; xpad_abs[i] >= 0; i++) xpad_set_up_abs(input_dev, xpad_abs[i]); } /* set up standard buttons */ for (i = 0; xpad_common_btn[i] >= 0; i++) input_set_capability(input_dev, EV_KEY, xpad_common_btn[i]); /* set up model-specific ones */ if (xpad->xtype == XTYPE_XBOX360 || xpad->xtype == XTYPE_XBOX360W || xpad->xtype == XTYPE_XBOXONE) { for (i = 0; xpad360_btn[i] >= 0; i++) input_set_capability(input_dev, EV_KEY, xpad360_btn[i]); if (xpad->mapping & MAP_SELECT_BUTTON) input_set_capability(input_dev, EV_KEY, KEY_RECORD); } else { for (i = 0; xpad_btn[i] >= 0; i++) input_set_capability(input_dev, EV_KEY, xpad_btn[i]); } if (xpad->mapping & MAP_DPAD_TO_BUTTONS) { for (i = 0; xpad_btn_pad[i] >= 0; i++) input_set_capability(input_dev, EV_KEY, xpad_btn_pad[i]); } /* set up paddles if the controller has them */ if (xpad->mapping & MAP_PADDLES) { for (i = 0; xpad_btn_paddles[i] >= 0; i++) input_set_capability(input_dev, EV_KEY, xpad_btn_paddles[i]); } /* * This should be a simple else block. However historically * xbox360w has mapped DPAD to buttons while xbox360 did not. This * made no sense, but now we can not just switch back and have to * support both behaviors. */ if (!(xpad->mapping & MAP_DPAD_TO_BUTTONS) || xpad->xtype == XTYPE_XBOX360W) { for (i = 0; xpad_abs_pad[i] >= 0; i++) xpad_set_up_abs(input_dev, xpad_abs_pad[i]); } if (xpad->mapping & MAP_TRIGGERS_TO_BUTTONS) { for (i = 0; xpad_btn_triggers[i] >= 0; i++) input_set_capability(input_dev, EV_KEY, xpad_btn_triggers[i]); } else { for (i = 0; xpad_abs_triggers[i] >= 0; i++) xpad_set_up_abs(input_dev, xpad_abs_triggers[i]); } /* setup profile button as an axis with 4 possible values */ if (xpad->mapping & MAP_PROFILE_BUTTON) xpad_set_up_abs(input_dev, ABS_PROFILE); error = xpad_init_ff(xpad); if (error) goto err_free_input; error = xpad_led_probe(xpad); if (error) goto err_destroy_ff; error = input_register_device(xpad->dev); if (error) goto err_disconnect_led; xpad->input_created = true; return 0; err_disconnect_led: xpad_led_disconnect(xpad); err_destroy_ff: input_ff_destroy(input_dev); err_free_input: input_free_device(input_dev); return error; } static int xpad_probe(struct usb_interface *intf, const struct usb_device_id *id) { struct usb_device *udev = interface_to_usbdev(intf); struct usb_xpad *xpad; struct usb_endpoint_descriptor *ep_irq_in, *ep_irq_out; int i, error; if (intf->cur_altsetting->desc.bNumEndpoints != 2) return -ENODEV; for (i = 0; xpad_device[i].idVendor; i++) { if ((le16_to_cpu(udev->descriptor.idVendor) == xpad_device[i].idVendor) && (le16_to_cpu(udev->descriptor.idProduct) == xpad_device[i].idProduct)) break; } xpad = kzalloc(sizeof(*xpad), GFP_KERNEL); if (!xpad) return -ENOMEM; usb_make_path(udev, xpad->phys, sizeof(xpad->phys)); strlcat(xpad->phys, "/input0", sizeof(xpad->phys)); xpad->idata = usb_alloc_coherent(udev, XPAD_PKT_LEN, GFP_KERNEL, &xpad->idata_dma); if (!xpad->idata) { error = -ENOMEM; goto err_free_mem; } xpad->irq_in = usb_alloc_urb(0, GFP_KERNEL); if (!xpad->irq_in) { error = -ENOMEM; goto err_free_idata; } xpad->udev = udev; xpad->intf = intf; xpad->mapping = xpad_device[i].mapping; xpad->xtype = xpad_device[i].xtype; xpad->name = xpad_device[i].name; xpad->packet_type = PKT_XB; INIT_WORK(&xpad->work, xpad_presence_work); if (xpad->xtype == XTYPE_UNKNOWN) { if (intf->cur_altsetting->desc.bInterfaceClass == USB_CLASS_VENDOR_SPEC) { if (intf->cur_altsetting->desc.bInterfaceProtocol == 
129) xpad->xtype = XTYPE_XBOX360W; else if (intf->cur_altsetting->desc.bInterfaceProtocol == 208) xpad->xtype = XTYPE_XBOXONE; else xpad->xtype = XTYPE_XBOX360; } else { xpad->xtype = XTYPE_XBOX; } if (dpad_to_buttons) xpad->mapping |= MAP_DPAD_TO_BUTTONS; if (triggers_to_buttons) xpad->mapping |= MAP_TRIGGERS_TO_BUTTONS; if (sticks_to_null) xpad->mapping |= MAP_STICKS_TO_NULL; } if (xpad->xtype == XTYPE_XBOXONE && intf->cur_altsetting->desc.bInterfaceNumber != GIP_WIRED_INTF_DATA) { /* * The Xbox One controller lists three interfaces all with the * same interface class, subclass and protocol. Differentiate by * interface number. */ error = -ENODEV; goto err_free_in_urb; } ep_irq_in = ep_irq_out = NULL; for (i = 0; i < 2; i++) { struct usb_endpoint_descriptor *ep = &intf->cur_altsetting->endpoint[i].desc; if (usb_endpoint_xfer_int(ep)) { if (usb_endpoint_dir_in(ep)) ep_irq_in = ep; else ep_irq_out = ep; } } if (!ep_irq_in || !ep_irq_out) { error = -ENODEV; goto err_free_in_urb; } error = xpad_init_output(intf, xpad, ep_irq_out); if (error) goto err_free_in_urb; usb_fill_int_urb(xpad->irq_in, udev, usb_rcvintpipe(udev, ep_irq_in->bEndpointAddress), xpad->idata, XPAD_PKT_LEN, xpad_irq_in, xpad, ep_irq_in->bInterval); xpad->irq_in->transfer_dma = xpad->idata_dma; xpad->irq_in->transfer_flags |= URB_NO_TRANSFER_DMA_MAP; usb_set_intfdata(intf, xpad); /* Packet type detection */ if (le16_to_cpu(udev->descriptor.idVendor) == 0x045e) { /* Microsoft controllers */ if (le16_to_cpu(udev->descriptor.idProduct) == 0x02e3) { /* The original elite controller always uses the oldest * type of extended packet */ xpad->packet_type = PKT_XBE1; } else if (le16_to_cpu(udev->descriptor.idProduct) == 0x0b00) { /* The elite 2 controller has seen multiple packet * revisions. These are tied to specific firmware * versions */ if (le16_to_cpu(udev->descriptor.bcdDevice) < 0x0500) { /* This is the format that the Elite 2 used * prior to the BLE update */ xpad->packet_type = PKT_XBE2_FW_OLD; } else if (le16_to_cpu(udev->descriptor.bcdDevice) < 0x050b) { /* This is the format that the Elite 2 used * prior to the update that split the packet */ xpad->packet_type = PKT_XBE2_FW_5_EARLY; } else { /* The split packet format that was introduced * in firmware v5.11 */ xpad->packet_type = PKT_XBE2_FW_5_11; } } } if (xpad->xtype == XTYPE_XBOX360W) { /* * Submit the int URB immediately rather than waiting for open * because we get status messages from the device whether * or not any controllers are attached. In fact, it's * exactly the message that a controller has arrived that * we're waiting for. */ error = xpad360w_start_input(xpad); if (error) goto err_deinit_output; /* * Wireless controllers require RESET_RESUME to work properly * after suspend. Ideally this quirk should be in usb core * quirk list, but we have too many vendors producing these * controllers and we'd need to maintain 2 identical lists * here in this driver and in usb core. 
*/ udev->quirks |= USB_QUIRK_RESET_RESUME; } else { error = xpad_init_input(xpad); if (error) goto err_deinit_output; } return 0; err_deinit_output: xpad_deinit_output(xpad); err_free_in_urb: usb_free_urb(xpad->irq_in); err_free_idata: usb_free_coherent(udev, XPAD_PKT_LEN, xpad->idata, xpad->idata_dma); err_free_mem: kfree(xpad); return error; } static void xpad_disconnect(struct usb_interface *intf) { struct usb_xpad *xpad = usb_get_intfdata(intf); if (xpad->xtype == XTYPE_XBOX360W) xpad360w_stop_input(xpad); xpad_deinit_input(xpad); /* * Now that both input device and LED device are gone we can * stop the output URB. */ xpad_stop_output(xpad); xpad_deinit_output(xpad); usb_free_urb(xpad->irq_in); usb_free_coherent(xpad->udev, XPAD_PKT_LEN, xpad->idata, xpad->idata_dma); kfree(xpad); usb_set_intfdata(intf, NULL); } static int xpad_suspend(struct usb_interface *intf, pm_message_t message) { struct usb_xpad *xpad = usb_get_intfdata(intf); struct input_dev *input = xpad->dev; if (xpad->xtype == XTYPE_XBOX360W) { /* * Wireless controllers always listen to input so * they are notified when a controller shows up * or goes away. */ xpad360w_stop_input(xpad); /* * The wireless adapter is going off now, so the * gamepads are going to become disconnected. * Unless explicitly disabled, power them down * so they don't just sit there flashing. */ if (auto_poweroff && xpad->pad_present) xpad360w_poweroff_controller(xpad); } else { guard(mutex)(&input->mutex); if (input_device_enabled(input)) xpad_stop_input(xpad); } xpad_stop_output(xpad); return 0; } static int xpad_resume(struct usb_interface *intf) { struct usb_xpad *xpad = usb_get_intfdata(intf); struct input_dev *input = xpad->dev; if (xpad->xtype == XTYPE_XBOX360W) return xpad360w_start_input(xpad); guard(mutex)(&input->mutex); if (input_device_enabled(input)) return xpad_start_input(xpad); if (xpad->xtype == XTYPE_XBOXONE) { /* * Even if there are no users, we'll send Xbox One pads * the startup sequence so they don't sit there and * blink until somebody opens the input device again. */ return xpad_start_xbox_one(xpad); } return 0; } static struct usb_driver xpad_driver = { .name = "xpad", .probe = xpad_probe, .disconnect = xpad_disconnect, .suspend = xpad_suspend, .resume = xpad_resume, .id_table = xpad_table, }; module_usb_driver(xpad_driver); MODULE_AUTHOR("Marko Friedemann <mfr@bmx-chemnitz.de>"); MODULE_DESCRIPTION("Xbox pad driver"); MODULE_LICENSE("GPL");
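/*
 * Illustrative sketch, not part of the driver above: how a new command
 * would typically be queued through the out_packets[] ring described
 * earlier. A slot is filled under odata_lock, marked pending, and the
 * URB chain is kicked via xpad_try_sending_next_out_packet(). The
 * helper name example_queue_command() and its payload are hypothetical.
 */
static int example_queue_command(struct usb_xpad *xpad,
				 const u8 *buf, size_t len)
{
	struct xpad_output_packet *packet =
			&xpad->out_packets[XPAD_OUT_CMD_IDX];

	if (len > XPAD_PKT_LEN)
		return -EINVAL;

	guard(spinlock_irqsave)(&xpad->odata_lock);

	memcpy(packet->data, buf, len);
	packet->len = len;
	packet->pending = true;

	/* Reset the ring scan so this command is picked up first */
	xpad->last_out_packet = -1;

	return xpad_try_sending_next_out_packet(xpad);
}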
/* SPDX-License-Identifier: GPL-2.0 */ /* Freezer declarations */ #ifndef FREEZER_H_INCLUDED #define FREEZER_H_INCLUDED #include <linux/debug_locks.h> #include <linux/sched.h> #include <linux/wait.h> #include <linux/atomic.h> #include <linux/jump_label.h> #ifdef CONFIG_FREEZER DECLARE_STATIC_KEY_FALSE(freezer_active); extern bool pm_freezing; /* PM freezing in effect */ extern bool pm_nosig_freezing; /* PM nosig freezing in effect */ /* * Timeout for stopping processes */ extern unsigned int freeze_timeout_msecs; /* * Check if a process has been frozen */ extern bool frozen(struct task_struct *p); extern bool freezing_slow_path(struct task_struct *p); /* * Check if there is a request to freeze a process */ static inline bool freezing(struct task_struct *p) { if (static_branch_unlikely(&freezer_active)) return freezing_slow_path(p); return false; } /* Takes and releases task alloc lock using task_lock() */ extern void __thaw_task(struct task_struct *t); extern bool __refrigerator(bool check_kthr_stop); extern int freeze_processes(void); extern int freeze_kernel_threads(void); extern void thaw_processes(void); extern void thaw_kernel_threads(void); static inline bool try_to_freeze(void) { might_sleep(); if (likely(!freezing(current))) return false; if (!(current->flags & PF_NOFREEZE)) debug_check_no_locks_held(); return __refrigerator(false); } extern bool freeze_task(struct task_struct *p); extern bool set_freezable(void); #ifdef CONFIG_CGROUP_FREEZER extern bool cgroup_freezing(struct task_struct *task); #else /* !CONFIG_CGROUP_FREEZER */ static inline bool cgroup_freezing(struct task_struct *task) { return false; } #endif /* !CONFIG_CGROUP_FREEZER */ #else /* !CONFIG_FREEZER */ static inline bool frozen(struct task_struct *p) { return false; } static inline bool freezing(struct task_struct *p) { return false; } static inline void __thaw_task(struct task_struct *t) {} static inline bool __refrigerator(bool check_kthr_stop) { return false; } static inline int freeze_processes(void) { return -ENOSYS; } static inline int freeze_kernel_threads(void) { return -ENOSYS; } static inline void thaw_processes(void) {} static inline void thaw_kernel_threads(void) {} static inline bool try_to_freeze(void) { return false; } static inline bool set_freezable(void) { return false; } #endif /* !CONFIG_FREEZER */ #endif /* FREEZER_H_INCLUDED */
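/*
 * Illustrative sketch, not part of the header above: the canonical
 * freezable-kthread loop built from this API. The thread opts in with
 * set_freezable() and parks in __refrigerator() via try_to_freeze()
 * whenever the freezer asks. example_thread() is a hypothetical name.
 */
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/delay.h>

static int example_thread(void *data)
{
	set_freezable();	/* clear PF_NOFREEZE for this kthread */

	while (!kthread_should_stop()) {
		try_to_freeze();	/* blocks here during suspend */

		/* ... one unit of work ... */
		msleep_interruptible(1000);
	}
	return 0;
}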
/* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (c) 2020 Christoph Hellwig. * * Support for "universal" pointers that can point to either kernel or userspace * memory. */ #ifndef _LINUX_SOCKPTR_H #define _LINUX_SOCKPTR_H #include <linux/slab.h> #include <linux/uaccess.h> typedef struct { union { void *kernel; void __user *user; }; bool is_kernel : 1; } sockptr_t; static inline bool sockptr_is_kernel(sockptr_t sockptr) { return sockptr.is_kernel; } static inline sockptr_t KERNEL_SOCKPTR(void *p) { return (sockptr_t) { .kernel = p, .is_kernel = true }; } static inline sockptr_t USER_SOCKPTR(void __user *p) { return (sockptr_t) { .user = p }; } static inline bool sockptr_is_null(sockptr_t sockptr) { if (sockptr_is_kernel(sockptr)) return !sockptr.kernel; return !sockptr.user; } static inline int copy_from_sockptr_offset(void *dst, sockptr_t src, size_t offset, size_t size) { if (!sockptr_is_kernel(src)) return copy_from_user(dst, src.user + offset, size); memcpy(dst, src.kernel + offset, size); return 0; } /* Deprecated. * This is unsafe, unless the caller has checked the user-provided optlen. * Prefer copy_safe_from_sockptr() instead. * * Returns 0 for success, or number of bytes not copied on error. */ static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size) { return copy_from_sockptr_offset(dst, src, 0, size); } /** * copy_safe_from_sockptr: copy a struct from sockptr * @dst: Destination address, in kernel space. This buffer must be @ksize * bytes long. * @ksize: Size of @dst struct. * @optval: Source address. (in user or kernel space) * @optlen: Size of @optval data. * * Returns: * * -EINVAL: @optlen < @ksize * * -EFAULT: access to userspace failed.
* * 0 : @ksize bytes were copied */ static inline int copy_safe_from_sockptr(void *dst, size_t ksize, sockptr_t optval, unsigned int optlen) { if (optlen < ksize) return -EINVAL; if (copy_from_sockptr(dst, optval, ksize)) return -EFAULT; return 0; } static inline int copy_struct_from_sockptr(void *dst, size_t ksize, sockptr_t src, size_t usize) { size_t size = min(ksize, usize); size_t rest = max(ksize, usize) - size; if (!sockptr_is_kernel(src)) return copy_struct_from_user(dst, ksize, src.user, size); if (usize < ksize) { memset(dst + size, 0, rest); } else if (usize > ksize) { char *p = src.kernel; while (rest--) { if (*p++) return -E2BIG; } } memcpy(dst, src.kernel, size); return 0; } static inline int copy_to_sockptr_offset(sockptr_t dst, size_t offset, const void *src, size_t size) { if (!sockptr_is_kernel(dst)) return copy_to_user(dst.user + offset, src, size); memcpy(dst.kernel + offset, src, size); return 0; } static inline int copy_to_sockptr(sockptr_t dst, const void *src, size_t size) { return copy_to_sockptr_offset(dst, 0, src, size); } static inline void *memdup_sockptr_noprof(sockptr_t src, size_t len) { void *p = kmalloc_track_caller_noprof(len, GFP_USER | __GFP_NOWARN); if (!p) return ERR_PTR(-ENOMEM); if (copy_from_sockptr(p, src, len)) { kfree(p); return ERR_PTR(-EFAULT); } return p; } #define memdup_sockptr(...) alloc_hooks(memdup_sockptr_noprof(__VA_ARGS__)) static inline void *memdup_sockptr_nul_noprof(sockptr_t src, size_t len) { char *p = kmalloc_track_caller_noprof(len + 1, GFP_KERNEL); if (!p) return ERR_PTR(-ENOMEM); if (copy_from_sockptr(p, src, len)) { kfree(p); return ERR_PTR(-EFAULT); } p[len] = '\0'; return p; } #define memdup_sockptr_nul(...) alloc_hooks(memdup_sockptr_nul_noprof(__VA_ARGS__)) static inline long strncpy_from_sockptr(char *dst, sockptr_t src, size_t count) { if (sockptr_is_kernel(src)) { size_t len = min(strnlen(src.kernel, count - 1) + 1, count); memcpy(dst, src.kernel, len); return len; } return strncpy_from_user(dst, src.user, count); } static inline int check_zeroed_sockptr(sockptr_t src, size_t offset, size_t size) { if (!sockptr_is_kernel(src)) return check_zeroed_user(src.user + offset, size); return memchr_inv(src.kernel + offset, 0, size) == NULL; } #endif /* _LINUX_SOCKPTR_H */
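/*
 * Illustrative sketch, not part of the header above: the intended use of
 * copy_safe_from_sockptr() from a setsockopt() handler, where optval may
 * point at either kernel or user memory. struct example_opt and
 * example_setsockopt() are hypothetical.
 */
struct example_opt {
	__u32 flags;
	__u32 timeout_ms;
};

static int example_setsockopt(sockptr_t optval, unsigned int optlen)
{
	struct example_opt opt;
	int err;

	/* Fails with -EINVAL if optlen < sizeof(opt), -EFAULT on bad uaccess */
	err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen);
	if (err)
		return err;

	/* ... validate and apply opt.flags / opt.timeout_ms ... */
	return 0;
}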
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc. * All Rights Reserved. */ #ifndef __XFS_LOG_FORMAT_H__ #define __XFS_LOG_FORMAT_H__ struct xfs_mount; struct xfs_trans_res; /* * On-disk Log Format definitions. * * This file contains all the on-disk format definitions used within the log. It * includes the physical log structure itself, as well as all the log item * format structures that are written into the log and interpreted by log * recovery. We start with the physical log format definitions, and then work * through all the log item definitions and everything they encode into the * log. */ typedef uint32_t xlog_tid_t; #define XLOG_MIN_ICLOGS 2 #define XLOG_MAX_ICLOGS 8 #define XLOG_HEADER_MAGIC_NUM 0xFEEDbabe /* Invalid cycle number */ #define XLOG_VERSION_1 1 #define XLOG_VERSION_2 2 /* Large IClogs, Log sunit */ #define XLOG_VERSION_OKBITS (XLOG_VERSION_1 | XLOG_VERSION_2) #define XLOG_MIN_RECORD_BSIZE (16*1024) /* eventually 32k */ #define XLOG_BIG_RECORD_BSIZE (32*1024) /* 32k buffers */ #define XLOG_MAX_RECORD_BSIZE (256*1024) #define XLOG_HEADER_CYCLE_SIZE (32*1024) /* cycle data in header */ #define XLOG_MIN_RECORD_BSHIFT 14 /* 16384 == 1 << 14 */ #define XLOG_BIG_RECORD_BSHIFT 15 /* 32k == 1 << 15 */ #define XLOG_MAX_RECORD_BSHIFT 18 /* 256k == 1 << 18 */ #define XLOG_HEADER_SIZE 512 /* Minimum number of transactions that must fit in the log (defined by mkfs) */ #define XFS_MIN_LOG_FACTOR 3 #define XLOG_REC_SHIFT(log) \ BTOBB(1 << (xfs_has_logv2(log->l_mp) ? \ XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT)) #define XLOG_TOTAL_REC_SHIFT(log) \ BTOBB(XLOG_MAX_ICLOGS << (xfs_has_logv2(log->l_mp) ? \ XLOG_MAX_RECORD_BSHIFT : XLOG_BIG_RECORD_BSHIFT)) /* get lsn fields */ #define CYCLE_LSN(lsn) ((uint)((lsn)>>32)) #define BLOCK_LSN(lsn) ((uint)(lsn)) /* this is used in a spot where we might otherwise double-endian-flip */ #define CYCLE_LSN_DISK(lsn) (((__be32 *)&(lsn))[0]) static inline xfs_lsn_t xlog_assign_lsn(uint cycle, uint block) { return ((xfs_lsn_t)cycle << 32) | block; } static inline uint xlog_get_cycle(char *ptr) { if (be32_to_cpu(*(__be32 *)ptr) == XLOG_HEADER_MAGIC_NUM) return be32_to_cpu(*((__be32 *)ptr + 1)); else return be32_to_cpu(*(__be32 *)ptr); } /* Log Clients */ #define XFS_TRANSACTION 0x69 #define XFS_LOG 0xaa #define XLOG_UNMOUNT_TYPE 0x556e /* Un for Unmount */ /* * Log item for unmount records. * * The unmount record used to have a string "Unmount filesystem--" in the * data section where the "Un" was really a magic number (XLOG_UNMOUNT_TYPE). * We just write the magic number now; see xfs_log_unmount_write.
*/ struct xfs_unmount_log_format { uint16_t magic; /* XLOG_UNMOUNT_TYPE */ uint16_t pad1; uint32_t pad2; /* may as well make it 64 bits */ }; /* Region types for iovec's i_type */ #define XLOG_REG_TYPE_BFORMAT 1 #define XLOG_REG_TYPE_BCHUNK 2 #define XLOG_REG_TYPE_EFI_FORMAT 3 #define XLOG_REG_TYPE_EFD_FORMAT 4 #define XLOG_REG_TYPE_IFORMAT 5 #define XLOG_REG_TYPE_ICORE 6 #define XLOG_REG_TYPE_IEXT 7 #define XLOG_REG_TYPE_IBROOT 8 #define XLOG_REG_TYPE_ILOCAL 9 #define XLOG_REG_TYPE_IATTR_EXT 10 #define XLOG_REG_TYPE_IATTR_BROOT 11 #define XLOG_REG_TYPE_IATTR_LOCAL 12 #define XLOG_REG_TYPE_QFORMAT 13 #define XLOG_REG_TYPE_DQUOT 14 #define XLOG_REG_TYPE_QUOTAOFF 15 #define XLOG_REG_TYPE_LRHEADER 16 #define XLOG_REG_TYPE_UNMOUNT 17 #define XLOG_REG_TYPE_COMMIT 18 #define XLOG_REG_TYPE_TRANSHDR 19 #define XLOG_REG_TYPE_ICREATE 20 #define XLOG_REG_TYPE_RUI_FORMAT 21 #define XLOG_REG_TYPE_RUD_FORMAT 22 #define XLOG_REG_TYPE_CUI_FORMAT 23 #define XLOG_REG_TYPE_CUD_FORMAT 24 #define XLOG_REG_TYPE_BUI_FORMAT 25 #define XLOG_REG_TYPE_BUD_FORMAT 26 #define XLOG_REG_TYPE_ATTRI_FORMAT 27 #define XLOG_REG_TYPE_ATTRD_FORMAT 28 #define XLOG_REG_TYPE_ATTR_NAME 29 #define XLOG_REG_TYPE_ATTR_VALUE 30 #define XLOG_REG_TYPE_XMI_FORMAT 31 #define XLOG_REG_TYPE_XMD_FORMAT 32 #define XLOG_REG_TYPE_ATTR_NEWNAME 33 #define XLOG_REG_TYPE_ATTR_NEWVALUE 34 #define XLOG_REG_TYPE_MAX 34 /* * Flags to log operation header * * The first write of a new transaction will be preceded with a start * record, XLOG_START_TRANS. Once a transaction is committed, a commit * record is written, XLOG_COMMIT_TRANS. If a single region can not fit into * the remainder of the current active in-core log, it is split up into * multiple regions. Each partial region will be marked with a * XLOG_CONTINUE_TRANS until the last one, which gets marked with XLOG_END_TRANS. 
* */ #define XLOG_START_TRANS 0x01 /* Start a new transaction */ #define XLOG_COMMIT_TRANS 0x02 /* Commit this transaction */ #define XLOG_CONTINUE_TRANS 0x04 /* Cont this trans into new region */ #define XLOG_WAS_CONT_TRANS 0x08 /* Cont this trans into new region */ #define XLOG_END_TRANS 0x10 /* End a continued transaction */ #define XLOG_UNMOUNT_TRANS 0x20 /* Unmount a filesystem transaction */ typedef struct xlog_op_header { __be32 oh_tid; /* transaction id of operation : 4 b */ __be32 oh_len; /* bytes in data region : 4 b */ __u8 oh_clientid; /* who sent me this : 1 b */ __u8 oh_flags; /* : 1 b */ __u16 oh_res2; /* 32 bit align : 2 b */ } xlog_op_header_t; /* valid values for h_fmt */ #define XLOG_FMT_UNKNOWN 0 #define XLOG_FMT_LINUX_LE 1 #define XLOG_FMT_LINUX_BE 2 #define XLOG_FMT_IRIX_BE 3 /* our fmt */ #ifdef XFS_NATIVE_HOST #define XLOG_FMT XLOG_FMT_LINUX_BE #else #define XLOG_FMT XLOG_FMT_LINUX_LE #endif typedef struct xlog_rec_header { __be32 h_magicno; /* log record (LR) identifier : 4 */ __be32 h_cycle; /* write cycle of log : 4 */ __be32 h_version; /* LR version : 4 */ __be32 h_len; /* len in bytes; should be 64-bit aligned: 4 */ __be64 h_lsn; /* lsn of this LR : 8 */ __be64 h_tail_lsn; /* lsn of 1st LR w/ buffers not committed: 8 */ __le32 h_crc; /* crc of log record : 4 */ __be32 h_prev_block; /* block number to previous LR : 4 */ __be32 h_num_logops; /* number of log operations in this LR : 4 */ __be32 h_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; /* new fields */ __be32 h_fmt; /* format of log record : 4 */ uuid_t h_fs_uuid; /* uuid of FS : 16 */ __be32 h_size; /* iclog size : 4 */ } xlog_rec_header_t; typedef struct xlog_rec_ext_header { __be32 xh_cycle; /* write cycle of log : 4 */ __be32 xh_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; /* : 256 */ } xlog_rec_ext_header_t; /* * Quite misnamed, because this union lays out the actual on-disk log buffer. */ typedef union xlog_in_core2 { xlog_rec_header_t hic_header; xlog_rec_ext_header_t hic_xheader; char hic_sector[XLOG_HEADER_SIZE]; } xlog_in_core_2_t; /* not an on-disk structure, but needed by log recovery in userspace */ typedef struct xfs_log_iovec { void *i_addr; /* beginning address of region */ int i_len; /* length in bytes of region */ uint i_type; /* type of region */ } xfs_log_iovec_t; /* * Transaction Header definitions. * * This is the structure written in the log at the head of every transaction. It * identifies the type and id of the transaction, and contains the number of * items logged by the transaction so we know how many to expect during * recovery. * * Do not change the below structure without redoing the code in * xlog_recover_add_to_trans() and xlog_recover_add_to_cont_trans(). */ typedef struct xfs_trans_header { uint th_magic; /* magic number */ uint th_type; /* transaction type */ int32_t th_tid; /* transaction id (unused) */ uint th_num_items; /* num items logged by trans */ } xfs_trans_header_t; #define XFS_TRANS_HEADER_MAGIC 0x5452414e /* TRAN */ /* * The only type valid for th_type in CIL-enabled file system logs: */ #define XFS_TRANS_CHECKPOINT 40 /* * Log item types. 
*/ #define XFS_LI_EFI 0x1236 #define XFS_LI_EFD 0x1237 #define XFS_LI_IUNLINK 0x1238 #define XFS_LI_INODE 0x123b /* aligned ino chunks, var-size ibufs */ #define XFS_LI_BUF 0x123c /* v2 bufs, variable sized inode bufs */ #define XFS_LI_DQUOT 0x123d #define XFS_LI_QUOTAOFF 0x123e #define XFS_LI_ICREATE 0x123f #define XFS_LI_RUI 0x1240 /* rmap update intent */ #define XFS_LI_RUD 0x1241 #define XFS_LI_CUI 0x1242 /* refcount update intent */ #define XFS_LI_CUD 0x1243 #define XFS_LI_BUI 0x1244 /* bmbt update intent */ #define XFS_LI_BUD 0x1245 #define XFS_LI_ATTRI 0x1246 /* attr set/remove intent*/ #define XFS_LI_ATTRD 0x1247 /* attr set/remove done */ #define XFS_LI_XMI 0x1248 /* mapping exchange intent */ #define XFS_LI_XMD 0x1249 /* mapping exchange done */ #define XFS_LI_EFI_RT 0x124a /* realtime extent free intent */ #define XFS_LI_EFD_RT 0x124b /* realtime extent free done */ #define XFS_LI_RUI_RT 0x124c /* realtime rmap update intent */ #define XFS_LI_RUD_RT 0x124d /* realtime rmap update done */ #define XFS_LI_CUI_RT 0x124e /* realtime refcount update intent */ #define XFS_LI_CUD_RT 0x124f /* realtime refcount update done */ #define XFS_LI_TYPE_DESC \ { XFS_LI_EFI, "XFS_LI_EFI" }, \ { XFS_LI_EFD, "XFS_LI_EFD" }, \ { XFS_LI_IUNLINK, "XFS_LI_IUNLINK" }, \ { XFS_LI_INODE, "XFS_LI_INODE" }, \ { XFS_LI_BUF, "XFS_LI_BUF" }, \ { XFS_LI_DQUOT, "XFS_LI_DQUOT" }, \ { XFS_LI_QUOTAOFF, "XFS_LI_QUOTAOFF" }, \ { XFS_LI_ICREATE, "XFS_LI_ICREATE" }, \ { XFS_LI_RUI, "XFS_LI_RUI" }, \ { XFS_LI_RUD, "XFS_LI_RUD" }, \ { XFS_LI_CUI, "XFS_LI_CUI" }, \ { XFS_LI_CUD, "XFS_LI_CUD" }, \ { XFS_LI_BUI, "XFS_LI_BUI" }, \ { XFS_LI_BUD, "XFS_LI_BUD" }, \ { XFS_LI_ATTRI, "XFS_LI_ATTRI" }, \ { XFS_LI_ATTRD, "XFS_LI_ATTRD" }, \ { XFS_LI_XMI, "XFS_LI_XMI" }, \ { XFS_LI_XMD, "XFS_LI_XMD" }, \ { XFS_LI_EFI_RT, "XFS_LI_EFI_RT" }, \ { XFS_LI_EFD_RT, "XFS_LI_EFD_RT" }, \ { XFS_LI_RUI_RT, "XFS_LI_RUI_RT" }, \ { XFS_LI_RUD_RT, "XFS_LI_RUD_RT" }, \ { XFS_LI_CUI_RT, "XFS_LI_CUI_RT" }, \ { XFS_LI_CUD_RT, "XFS_LI_CUD_RT" } /* * Inode Log Item Format definitions. * * This is the structure used to lay out an inode log item in the * log. The size of the inline data/extents/b-tree root to be logged * (if any) is indicated in the ilf_dsize field. Changes to this structure * must be added on to the end. */ struct xfs_inode_log_format { uint16_t ilf_type; /* inode log item type */ uint16_t ilf_size; /* size of this item */ uint32_t ilf_fields; /* flags for fields logged */ uint16_t ilf_asize; /* size of attr d/ext/root */ uint16_t ilf_dsize; /* size of data/ext/root */ uint32_t ilf_pad; /* pad for 64 bit boundary */ uint64_t ilf_ino; /* inode number */ union { uint32_t ilfu_rdev; /* rdev value for dev inode*/ uint8_t __pad[16]; /* unused */ } ilf_u; int64_t ilf_blkno; /* blkno of inode buffer */ int32_t ilf_len; /* len of inode buffer */ int32_t ilf_boffset; /* off of inode in buffer */ }; /* * Old 32 bit systems will log in this format without the 64 bit * alignment padding. Recovery will detect this and convert it to the * correct format. 
*/ struct xfs_inode_log_format_32 { uint16_t ilf_type; /* inode log item type */ uint16_t ilf_size; /* size of this item */ uint32_t ilf_fields; /* flags for fields logged */ uint16_t ilf_asize; /* size of attr d/ext/root */ uint16_t ilf_dsize; /* size of data/ext/root */ uint64_t ilf_ino; /* inode number */ union { uint32_t ilfu_rdev; /* rdev value for dev inode*/ uint8_t __pad[16]; /* unused */ } ilf_u; int64_t ilf_blkno; /* blkno of inode buffer */ int32_t ilf_len; /* len of inode buffer */ int32_t ilf_boffset; /* off of inode in buffer */ } __attribute__((packed)); /* * Flags for xfs_trans_log_inode flags field. */ #define XFS_ILOG_CORE 0x001 /* log standard inode fields */ #define XFS_ILOG_DDATA 0x002 /* log i_df.if_data */ #define XFS_ILOG_DEXT 0x004 /* log i_df.if_extents */ #define XFS_ILOG_DBROOT 0x008 /* log i_df.i_broot */ #define XFS_ILOG_DEV 0x010 /* log the dev field */ #define XFS_ILOG_UUID 0x020 /* added long ago, but never used */ #define XFS_ILOG_ADATA 0x040 /* log i_af.if_data */ #define XFS_ILOG_AEXT 0x080 /* log i_af.if_extents */ #define XFS_ILOG_ABROOT 0x100 /* log i_af.i_broot */ #define XFS_ILOG_DOWNER 0x200 /* change the data fork owner on replay */ #define XFS_ILOG_AOWNER 0x400 /* change the attr fork owner on replay */ /* * The timestamps are dirty, but not necessarily anything else in the inode * core. Unlike the other fields above this one must never make it to disk * in the ilf_fields of the inode_log_format, but is purely stored in-memory in * ili_fields in the inode_log_item. */ #define XFS_ILOG_TIMESTAMP 0x4000 /* * The version field has been changed, but not necessarily anything else of * interest. This must never make it to disk - it is used purely to ensure that * the inode item ->precommit operation can update the fsync flag triggers * in the inode item correctly. */ #define XFS_ILOG_IVERSION 0x8000 #define XFS_ILOG_DFORK (XFS_ILOG_DDATA | XFS_ILOG_DEXT | \ XFS_ILOG_DBROOT) #define XFS_ILOG_AFORK (XFS_ILOG_ADATA | XFS_ILOG_AEXT | \ XFS_ILOG_ABROOT) #define XFS_ILOG_ALL (XFS_ILOG_CORE | XFS_ILOG_DDATA | \ XFS_ILOG_DEXT | XFS_ILOG_DBROOT | \ XFS_ILOG_DEV | XFS_ILOG_ADATA | \ XFS_ILOG_AEXT | XFS_ILOG_ABROOT | \ XFS_ILOG_TIMESTAMP | XFS_ILOG_DOWNER | \ XFS_ILOG_AOWNER) static inline int xfs_ilog_fbroot(int w) { return (w == XFS_DATA_FORK ? XFS_ILOG_DBROOT : XFS_ILOG_ABROOT); } static inline int xfs_ilog_fext(int w) { return (w == XFS_DATA_FORK ? XFS_ILOG_DEXT : XFS_ILOG_AEXT); } static inline int xfs_ilog_fdata(int w) { return (w == XFS_DATA_FORK ? XFS_ILOG_DDATA : XFS_ILOG_ADATA); } /* * Incore version of the on-disk inode core structures. We log this directly * into the journal in host CPU format (for better or worse) and as such it * directly mirrors the xfs_dinode structure as it must contain all the same * information. */ typedef uint64_t xfs_log_timestamp_t; /* Legacy timestamp encoding format. */ struct xfs_log_legacy_timestamp { int32_t t_sec; /* timestamp seconds */ int32_t t_nsec; /* timestamp nanoseconds */ }; /* * Define the format of the inode core that is logged. This structure must be * kept identical to struct xfs_dinode except for the endianness annotations.
*/ struct xfs_log_dinode { uint16_t di_magic; /* inode magic # = XFS_DINODE_MAGIC */ uint16_t di_mode; /* mode and type of file */ int8_t di_version; /* inode version */ int8_t di_format; /* format of di_c data */ uint16_t di_metatype; /* metadata type, if DIFLAG2_METADATA */ uint32_t di_uid; /* owner's user id */ uint32_t di_gid; /* owner's group id */ uint32_t di_nlink; /* number of links to file */ uint16_t di_projid_lo; /* lower part of owner's project id */ uint16_t di_projid_hi; /* higher part of owner's project id */ union { /* Number of data fork extents if NREXT64 is set */ uint64_t di_big_nextents; /* Padding for V3 inodes without NREXT64 set. */ uint64_t di_v3_pad; /* Padding and inode flush counter for V2 inodes. */ struct { uint8_t di_v2_pad[6]; /* V2 inode zeroed space */ uint16_t di_flushiter; /* V2 inode incremented on flush */ }; }; xfs_log_timestamp_t di_atime; /* time last accessed */ xfs_log_timestamp_t di_mtime; /* time last modified */ xfs_log_timestamp_t di_ctime; /* time created/inode modified */ xfs_fsize_t di_size; /* number of bytes in file */ xfs_rfsblock_t di_nblocks; /* # of direct & btree blocks used */ xfs_extlen_t di_extsize; /* basic/minimum extent size for file */ union { /* * For V2 inodes and V3 inodes without NREXT64 set, this * is the number of data and attr fork extents. */ struct { uint32_t di_nextents; uint16_t di_anextents; } __packed; /* Number of attr fork extents if NREXT64 is set. */ struct { uint32_t di_big_anextents; uint16_t di_nrext64_pad; } __packed; } __packed; uint8_t di_forkoff; /* attr fork offs, <<3 for 64b align */ int8_t di_aformat; /* format of attr fork's data */ uint32_t di_dmevmask; /* DMIG event mask */ uint16_t di_dmstate; /* DMIG state info */ uint16_t di_flags; /* random flags, XFS_DIFLAG_... */ uint32_t di_gen; /* generation number */ /* di_next_unlinked is the only non-core field in the old dinode */ xfs_agino_t di_next_unlinked;/* agi unlinked list ptr */ /* start of the extended dinode, writable fields */ uint32_t di_crc; /* CRC of the inode */ uint64_t di_changecount; /* number of attribute changes */ /* * The LSN we write to this field during formatting is not a reflection * of the current on-disk LSN. It should never be used for recovery * sequencing, nor should it be recovered into the on-disk inode at all. * See xlog_recover_inode_commit_pass2() and xfs_log_dinode_to_disk() * for details. */ xfs_lsn_t di_lsn; uint64_t di_flags2; /* more random flags */ uint32_t di_cowextsize; /* basic cow extent size for file */ uint8_t di_pad2[12]; /* more padding for future expansion */ /* fields only written to during inode creation */ xfs_log_timestamp_t di_crtime; /* time created */ xfs_ino_t di_ino; /* inode number */ uuid_t di_uuid; /* UUID of the filesystem */ /* structure must be padded to 64 bit alignment */ }; #define xfs_log_dinode_size(mp) \ (xfs_has_v3inodes((mp)) ? \ sizeof(struct xfs_log_dinode) : \ offsetof(struct xfs_log_dinode, di_next_unlinked)) /* * Buffer Log Format definitions * * These are the physical dirty bitmap definitions for the log format structure. */ #define XFS_BLF_CHUNK 128 #define XFS_BLF_SHIFT 7 #define BIT_TO_WORD_SHIFT 5 #define NBWORD (NBBY * sizeof(unsigned int)) /* * This flag indicates that the buffer contains on disk inodes * and requires special recovery handling. */ #define XFS_BLF_INODE_BUF (1<<0) /* * This flag indicates that the buffer should not be replayed * during recovery because its blocks are being freed. 
*/ #define XFS_BLF_CANCEL (1<<1) /* * This flag indicates that the buffer contains on disk * user or group dquots and may require special recovery handling. */ #define XFS_BLF_UDQUOT_BUF (1<<2) #define XFS_BLF_PDQUOT_BUF (1<<3) #define XFS_BLF_GDQUOT_BUF (1<<4) /* * This is the structure used to lay out a buf log item in the log. The data * map describes which 128 byte chunks of the buffer have been logged. * * The placement of blf_map_size causes blf_data_map to start at an odd * multiple of sizeof(unsigned int) offset within the struct. Because the data * bitmap size will always be an even number, the end of the data_map (and * therefore the structure) will also be at an odd multiple of sizeof(unsigned * int). Some 64-bit compilers will insert padding at the end of the struct to * ensure 64-bit alignment of blf_blkno, but 32-bit ones will not. Therefore, * XFS_BLF_DATAMAP_SIZE must be an odd number to make the padding explicit and * keep the structure size consistent between 32-bit and 64-bit platforms. */ #define __XFS_BLF_DATAMAP_SIZE ((XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK) / NBWORD) #define XFS_BLF_DATAMAP_SIZE (__XFS_BLF_DATAMAP_SIZE + 1) typedef struct xfs_buf_log_format { unsigned short blf_type; /* buf log item type indicator */ unsigned short blf_size; /* size of this item */ unsigned short blf_flags; /* misc state */ unsigned short blf_len; /* number of blocks in this buf */ int64_t blf_blkno; /* starting blkno of this buf */ unsigned int blf_map_size; /* used size of data bitmap in words */ unsigned int blf_data_map[XFS_BLF_DATAMAP_SIZE]; /* dirty bitmap */ } xfs_buf_log_format_t; /* * All buffers now need to tell recovery where the magic number * is so that it can verify and calculate the CRCs on the buffer correctly * once the changes have been replayed into the buffer. * * The type value is held in the upper 5 bits of the blf_flags field, which is * an unsigned 16 bit field. Hence we need to shift it 11 bits up and down. */ #define XFS_BLFT_BITS 5 #define XFS_BLFT_SHIFT 11 #define XFS_BLFT_MASK (((1 << XFS_BLFT_BITS) - 1) << XFS_BLFT_SHIFT) enum xfs_blft { XFS_BLFT_UNKNOWN_BUF = 0, XFS_BLFT_UDQUOT_BUF, XFS_BLFT_PDQUOT_BUF, XFS_BLFT_GDQUOT_BUF, XFS_BLFT_BTREE_BUF, XFS_BLFT_AGF_BUF, XFS_BLFT_AGFL_BUF, XFS_BLFT_AGI_BUF, XFS_BLFT_DINO_BUF, XFS_BLFT_SYMLINK_BUF, XFS_BLFT_DIR_BLOCK_BUF, XFS_BLFT_DIR_DATA_BUF, XFS_BLFT_DIR_FREE_BUF, XFS_BLFT_DIR_LEAF1_BUF, XFS_BLFT_DIR_LEAFN_BUF, XFS_BLFT_DA_NODE_BUF, XFS_BLFT_ATTR_LEAF_BUF, XFS_BLFT_ATTR_RMT_BUF, XFS_BLFT_SB_BUF, XFS_BLFT_RTBITMAP_BUF, XFS_BLFT_RTSUMMARY_BUF, XFS_BLFT_MAX_BUF = (1 << XFS_BLFT_BITS), }; static inline void xfs_blft_to_flags(struct xfs_buf_log_format *blf, enum xfs_blft type) { ASSERT(type > XFS_BLFT_UNKNOWN_BUF && type < XFS_BLFT_MAX_BUF); blf->blf_flags &= ~XFS_BLFT_MASK; blf->blf_flags |= ((type << XFS_BLFT_SHIFT) & XFS_BLFT_MASK); } static inline uint16_t xfs_blft_from_flags(struct xfs_buf_log_format *blf) { return (blf->blf_flags & XFS_BLFT_MASK) >> XFS_BLFT_SHIFT; } /* * EFI/EFD log format definitions */ typedef struct xfs_extent { xfs_fsblock_t ext_start; xfs_extlen_t ext_len; } xfs_extent_t; /* * Since an xfs_extent_t has types (start:64, len: 32) * there are different alignments on 32 bit and 64 bit kernels. * So we provide the different variants for use by a * conversion routine. 
*/ typedef struct xfs_extent_32 { uint64_t ext_start; uint32_t ext_len; } __attribute__((packed)) xfs_extent_32_t; typedef struct xfs_extent_64 { uint64_t ext_start; uint32_t ext_len; uint32_t ext_pad; } xfs_extent_64_t; /* * This is the structure used to lay out an efi log item in the * log. The efi_extents field is a variable size array whose * size is given by efi_nextents. */ typedef struct xfs_efi_log_format { uint16_t efi_type; /* efi log item type */ uint16_t efi_size; /* size of this item */ uint32_t efi_nextents; /* # extents to free */ uint64_t efi_id; /* efi identifier */ xfs_extent_t efi_extents[]; /* array of extents to free */ } xfs_efi_log_format_t; static inline size_t xfs_efi_log_format_sizeof( unsigned int nr) { return sizeof(struct xfs_efi_log_format) + nr * sizeof(struct xfs_extent); } typedef struct xfs_efi_log_format_32 { uint16_t efi_type; /* efi log item type */ uint16_t efi_size; /* size of this item */ uint32_t efi_nextents; /* # extents to free */ uint64_t efi_id; /* efi identifier */ xfs_extent_32_t efi_extents[]; /* array of extents to free */ } __attribute__((packed)) xfs_efi_log_format_32_t; static inline size_t xfs_efi_log_format32_sizeof( unsigned int nr) { return sizeof(struct xfs_efi_log_format_32) + nr * sizeof(struct xfs_extent_32); } typedef struct xfs_efi_log_format_64 { uint16_t efi_type; /* efi log item type */ uint16_t efi_size; /* size of this item */ uint32_t efi_nextents; /* # extents to free */ uint64_t efi_id; /* efi identifier */ xfs_extent_64_t efi_extents[]; /* array of extents to free */ } xfs_efi_log_format_64_t; static inline size_t xfs_efi_log_format64_sizeof( unsigned int nr) { return sizeof(struct xfs_efi_log_format_64) + nr * sizeof(struct xfs_extent_64); } /* * This is the structure used to lay out an efd log item in the * log. 
The efd_extents array is a variable size array whose * size is given by efd_nextents; */ typedef struct xfs_efd_log_format { uint16_t efd_type; /* efd log item type */ uint16_t efd_size; /* size of this item */ uint32_t efd_nextents; /* # of extents freed */ uint64_t efd_efi_id; /* id of corresponding efi */ xfs_extent_t efd_extents[]; /* array of extents freed */ } xfs_efd_log_format_t; static inline size_t xfs_efd_log_format_sizeof( unsigned int nr) { return sizeof(struct xfs_efd_log_format) + nr * sizeof(struct xfs_extent); } typedef struct xfs_efd_log_format_32 { uint16_t efd_type; /* efd log item type */ uint16_t efd_size; /* size of this item */ uint32_t efd_nextents; /* # of extents freed */ uint64_t efd_efi_id; /* id of corresponding efi */ xfs_extent_32_t efd_extents[]; /* array of extents freed */ } __attribute__((packed)) xfs_efd_log_format_32_t; static inline size_t xfs_efd_log_format32_sizeof( unsigned int nr) { return sizeof(struct xfs_efd_log_format_32) + nr * sizeof(struct xfs_extent_32); } typedef struct xfs_efd_log_format_64 { uint16_t efd_type; /* efd log item type */ uint16_t efd_size; /* size of this item */ uint32_t efd_nextents; /* # of extents freed */ uint64_t efd_efi_id; /* id of corresponding efi */ xfs_extent_64_t efd_extents[]; /* array of extents freed */ } xfs_efd_log_format_64_t; static inline size_t xfs_efd_log_format64_sizeof( unsigned int nr) { return sizeof(struct xfs_efd_log_format_64) + nr * sizeof(struct xfs_extent_64); } /* * RUI/RUD (reverse mapping) log format definitions */ struct xfs_map_extent { uint64_t me_owner; uint64_t me_startblock; uint64_t me_startoff; uint32_t me_len; uint32_t me_flags; }; /* rmap me_flags: upper bits are flags, lower byte is type code */ #define XFS_RMAP_EXTENT_MAP 1 #define XFS_RMAP_EXTENT_MAP_SHARED 2 #define XFS_RMAP_EXTENT_UNMAP 3 #define XFS_RMAP_EXTENT_UNMAP_SHARED 4 #define XFS_RMAP_EXTENT_CONVERT 5 #define XFS_RMAP_EXTENT_CONVERT_SHARED 6 #define XFS_RMAP_EXTENT_ALLOC 7 #define XFS_RMAP_EXTENT_FREE 8 #define XFS_RMAP_EXTENT_TYPE_MASK 0xFF #define XFS_RMAP_EXTENT_ATTR_FORK (1U << 31) #define XFS_RMAP_EXTENT_BMBT_BLOCK (1U << 30) #define XFS_RMAP_EXTENT_UNWRITTEN (1U << 29) #define XFS_RMAP_EXTENT_FLAGS (XFS_RMAP_EXTENT_TYPE_MASK | \ XFS_RMAP_EXTENT_ATTR_FORK | \ XFS_RMAP_EXTENT_BMBT_BLOCK | \ XFS_RMAP_EXTENT_UNWRITTEN) /* * This is the structure used to lay out an rui log item in the * log. The rui_extents field is a variable size array whose * size is given by rui_nextents. */ struct xfs_rui_log_format { uint16_t rui_type; /* rui log item type */ uint16_t rui_size; /* size of this item */ uint32_t rui_nextents; /* # extents to free */ uint64_t rui_id; /* rui identifier */ struct xfs_map_extent rui_extents[]; /* array of extents to rmap */ }; static inline size_t xfs_rui_log_format_sizeof( unsigned int nr) { return sizeof(struct xfs_rui_log_format) + nr * sizeof(struct xfs_map_extent); } /* * This is the structure used to lay out an rud log item in the * log. The rud_extents array is a variable size array whose * size is given by rud_nextents; */ struct xfs_rud_log_format { uint16_t rud_type; /* rud log item type */ uint16_t rud_size; /* size of this item */ uint32_t __pad; uint64_t rud_rui_id; /* id of corresponding rui */ }; /* * CUI/CUD (refcount update) log format definitions */ struct xfs_phys_extent { uint64_t pe_startblock; uint32_t pe_len; uint32_t pe_flags; }; /* refcount pe_flags: upper bits are flags, lower byte is type code */ /* Type codes are taken directly from enum xfs_refcount_intent_type. 
*/ #define XFS_REFCOUNT_EXTENT_TYPE_MASK 0xFF #define XFS_REFCOUNT_EXTENT_FLAGS (XFS_REFCOUNT_EXTENT_TYPE_MASK) /* * This is the structure used to lay out a cui log item in the * log. The cui_extents field is a variable size array whose * size is given by cui_nextents. */ struct xfs_cui_log_format { uint16_t cui_type; /* cui log item type */ uint16_t cui_size; /* size of this item */ uint32_t cui_nextents; /* # extents to free */ uint64_t cui_id; /* cui identifier */ struct xfs_phys_extent cui_extents[]; /* array of extents */ }; static inline size_t xfs_cui_log_format_sizeof( unsigned int nr) { return sizeof(struct xfs_cui_log_format) + nr * sizeof(struct xfs_phys_extent); } /* * This is the structure used to lay out a cud log item in the * log. The cud log item carries no extent array; it only records * the id of the corresponding cui. */ struct xfs_cud_log_format { uint16_t cud_type; /* cud log item type */ uint16_t cud_size; /* size of this item */ uint32_t __pad; uint64_t cud_cui_id; /* id of corresponding cui */ }; /* * BUI/BUD (inode block mapping) log format definitions */ /* bmbt me_flags: upper bits are flags, lower byte is type code */ /* Type codes are taken directly from enum xfs_bmap_intent_type. */ #define XFS_BMAP_EXTENT_TYPE_MASK 0xFF #define XFS_BMAP_EXTENT_ATTR_FORK (1U << 31) #define XFS_BMAP_EXTENT_UNWRITTEN (1U << 30) #define XFS_BMAP_EXTENT_REALTIME (1U << 29) #define XFS_BMAP_EXTENT_FLAGS (XFS_BMAP_EXTENT_TYPE_MASK | \ XFS_BMAP_EXTENT_ATTR_FORK | \ XFS_BMAP_EXTENT_UNWRITTEN | \ XFS_BMAP_EXTENT_REALTIME) /* * This is the structure used to lay out a bui log item in the * log. The bui_extents field is a variable size array whose * size is given by bui_nextents. */ struct xfs_bui_log_format { uint16_t bui_type; /* bui log item type */ uint16_t bui_size; /* size of this item */ uint32_t bui_nextents; /* # extents to free */ uint64_t bui_id; /* bui identifier */ struct xfs_map_extent bui_extents[]; /* array of extents to bmap */ }; static inline size_t xfs_bui_log_format_sizeof( unsigned int nr) { return sizeof(struct xfs_bui_log_format) + nr * sizeof(struct xfs_map_extent); } /* * This is the structure used to lay out a bud log item in the * log. The bud log item carries no extent array; it only records * the id of the corresponding bui. */ struct xfs_bud_log_format { uint16_t bud_type; /* bud log item type */ uint16_t bud_size; /* size of this item */ uint32_t __pad; uint64_t bud_bui_id; /* id of corresponding bui */ }; /* * XMI/XMD (file mapping exchange) log format definitions */ /* This is the structure used to lay out a mapping exchange log item. */ struct xfs_xmi_log_format { uint16_t xmi_type; /* xmi log item type */ uint16_t xmi_size; /* size of this item */ uint32_t __pad; /* must be zero */ uint64_t xmi_id; /* xmi identifier */ uint64_t xmi_inode1; /* inumber of first file */ uint64_t xmi_inode2; /* inumber of second file */ uint32_t xmi_igen1; /* generation of first file */ uint32_t xmi_igen2; /* generation of second file */ uint64_t xmi_startoff1; /* block offset into file1 */ uint64_t xmi_startoff2; /* block offset into file2 */ uint64_t xmi_blockcount; /* number of blocks */ uint64_t xmi_flags; /* XFS_EXCHMAPS_* */ uint64_t xmi_isize1; /* intended file1 size */ uint64_t xmi_isize2; /* intended file2 size */ }; /* Exchange mappings between extended attribute forks instead of data forks. */ #define XFS_EXCHMAPS_ATTR_FORK (1ULL << 0) /* Set the file sizes when finished. 
*/ #define XFS_EXCHMAPS_SET_SIZES (1ULL << 1) /* * Exchange the mappings of the two files only if the file allocation units * mapped to file1's range have been written. */ #define XFS_EXCHMAPS_INO1_WRITTEN (1ULL << 2) /* Clear the reflink flag from inode1 after the operation. */ #define XFS_EXCHMAPS_CLEAR_INO1_REFLINK (1ULL << 3) /* Clear the reflink flag from inode2 after the operation. */ #define XFS_EXCHMAPS_CLEAR_INO2_REFLINK (1ULL << 4) #define XFS_EXCHMAPS_LOGGED_FLAGS (XFS_EXCHMAPS_ATTR_FORK | \ XFS_EXCHMAPS_SET_SIZES | \ XFS_EXCHMAPS_INO1_WRITTEN | \ XFS_EXCHMAPS_CLEAR_INO1_REFLINK | \ XFS_EXCHMAPS_CLEAR_INO2_REFLINK) /* This is the structure used to lay out a mapping exchange done log item. */ struct xfs_xmd_log_format { uint16_t xmd_type; /* xmd log item type */ uint16_t xmd_size; /* size of this item */ uint32_t __pad; uint64_t xmd_xmi_id; /* id of corresponding xmi */ }; /* * Dquot log format definitions. * * The first two fields must be the type and size fitting into * 32 bits: log recovery code assumes that. */ typedef struct xfs_dq_logformat { uint16_t qlf_type; /* dquot log item type */ uint16_t qlf_size; /* size of this item */ xfs_dqid_t qlf_id; /* usr/grp/proj id: 32 bits */ int64_t qlf_blkno; /* blkno of dquot buffer */ int32_t qlf_len; /* len of dquot buffer */ uint32_t qlf_boffset; /* off of dquot in buffer */ } xfs_dq_logformat_t; /* * log format struct for QUOTAOFF records. * The first two fields must be the type and size fitting into * 32 bits: log recovery code assumes that. * We write two LI_QUOTAOFF logitems per quotaoff; the last one keeps a pointer * to the first and ensures that the first logitem is taken out of the AIL * only when the last one is securely committed. */ typedef struct xfs_qoff_logformat { unsigned short qf_type; /* quotaoff log item type */ unsigned short qf_size; /* size of this item */ unsigned int qf_flags; /* USR and/or GRP */ char qf_pad[12]; /* padding for future */ } xfs_qoff_logformat_t; /* * Disk quotas status in m_qflags, and also sb_qflags. 16 bits. */ #define XFS_UQUOTA_ACCT 0x0001 /* user quota accounting ON */ #define XFS_UQUOTA_ENFD 0x0002 /* user quota limits enforced */ #define XFS_UQUOTA_CHKD 0x0004 /* quotacheck run on usr quotas */ #define XFS_PQUOTA_ACCT 0x0008 /* project quota accounting ON */ #define XFS_OQUOTA_ENFD 0x0010 /* other (grp/prj) quota limits enforced */ #define XFS_OQUOTA_CHKD 0x0020 /* quotacheck run on other (grp/prj) quotas */ #define XFS_GQUOTA_ACCT 0x0040 /* group quota accounting ON */ /* * Conversion to and from the combined OQUOTA flag (if necessary) * is done only in xfs_sb_qflags_to_disk() and xfs_sb_qflags_from_disk() */ #define XFS_GQUOTA_ENFD 0x0080 /* group quota limits enforced */ #define XFS_GQUOTA_CHKD 0x0100 /* quotacheck run on group quotas */ #define XFS_PQUOTA_ENFD 0x0200 /* project quota limits enforced */ #define XFS_PQUOTA_CHKD 0x0400 /* quotacheck run on project quotas */ #define XFS_ALL_QUOTA_ACCT \ (XFS_UQUOTA_ACCT | XFS_GQUOTA_ACCT | XFS_PQUOTA_ACCT) #define XFS_ALL_QUOTA_ENFD \ (XFS_UQUOTA_ENFD | XFS_GQUOTA_ENFD | XFS_PQUOTA_ENFD) #define XFS_ALL_QUOTA_CHKD \ (XFS_UQUOTA_CHKD | XFS_GQUOTA_CHKD | XFS_PQUOTA_CHKD) #define XFS_MOUNT_QUOTA_ALL (XFS_UQUOTA_ACCT|XFS_UQUOTA_ENFD|\ XFS_UQUOTA_CHKD|XFS_GQUOTA_ACCT|\ XFS_GQUOTA_ENFD|XFS_GQUOTA_CHKD|\ XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD|\ XFS_PQUOTA_CHKD) /* * Inode create log item structure * * Log recovery assumes the first two entries are the type and size, and that * they fit in 32 bits. 
They are also in host order (ugh), so they have to be 32 bit aligned for * decoding to be done correctly. */ struct xfs_icreate_log { uint16_t icl_type; /* type of log format structure */ uint16_t icl_size; /* size of log format structure */ __be32 icl_ag; /* ag being allocated in */ __be32 icl_agbno; /* start block of inode range */ __be32 icl_count; /* number of inodes to initialise */ __be32 icl_isize; /* size of inodes */ __be32 icl_length; /* length of extent to initialise */ __be32 icl_gen; /* inode generation number to use */ }; /* * Flags for deferred attribute operations. * Upper bits are flags, lower byte is type code */ #define XFS_ATTRI_OP_FLAGS_SET 1 /* Set the attribute */ #define XFS_ATTRI_OP_FLAGS_REMOVE 2 /* Remove the attribute */ #define XFS_ATTRI_OP_FLAGS_REPLACE 3 /* Replace the attribute */ #define XFS_ATTRI_OP_FLAGS_PPTR_SET 4 /* Set parent pointer */ #define XFS_ATTRI_OP_FLAGS_PPTR_REMOVE 5 /* Remove parent pointer */ #define XFS_ATTRI_OP_FLAGS_PPTR_REPLACE 6 /* Replace parent pointer */ #define XFS_ATTRI_OP_FLAGS_TYPE_MASK 0xFF /* Flags type mask */ /* * alfi_attr_filter captures the state of xfs_da_args.attr_filter, so it should * never have any other bits set. */ #define XFS_ATTRI_FILTER_MASK (XFS_ATTR_ROOT | \ XFS_ATTR_SECURE | \ XFS_ATTR_PARENT | \ XFS_ATTR_INCOMPLETE) /* * This is the structure used to lay out an attr log item in the * log. */ struct xfs_attri_log_format { uint16_t alfi_type; /* attri log item type */ uint16_t alfi_size; /* size of this item */ uint32_t alfi_igen; /* generation of alfi_ino for pptr ops */ uint64_t alfi_id; /* attri identifier */ uint64_t alfi_ino; /* the inode for this attr operation */ uint32_t alfi_op_flags; /* marks the op as a set or remove */ union { uint32_t alfi_name_len; /* attr name length */ struct { /* * For PPTR_REPLACE, these are the lengths of the old * and new attr names. The new and old values must * have the same length. */ uint16_t alfi_old_name_len; uint16_t alfi_new_name_len; }; }; uint32_t alfi_value_len; /* attr value length */ uint32_t alfi_attr_filter; /* attr filter flags */ }; struct xfs_attrd_log_format { uint16_t alfd_type; /* attrd log item type */ uint16_t alfd_size; /* size of this item */ uint32_t __pad; /* pad to 64 bit alignment */ uint64_t alfd_alf_id; /* id of corresponding attri */ }; #endif /* __XFS_LOG_FORMAT_H__ */
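/*
 * A minimal, standalone userspace sketch (not part of the header above) of
 * the sizing pattern used by the *_sizeof() helpers: the byte size of a
 * variable-length log item is the base struct plus nr trailing array
 * elements hanging off a flexible array member. All "demo_" names here are
 * hypothetical stand-ins, not XFS identifiers.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct demo_extent {
	uint64_t ext_start;
	uint32_t ext_len;
	uint32_t ext_pad;
};

struct demo_log_format {
	uint16_t type;			/* log item type */
	uint16_t size;			/* size of this item */
	uint32_t nextents;		/* # trailing extents */
	uint64_t id;			/* item identifier */
	struct demo_extent extents[];	/* flexible array member */
};

static inline size_t demo_log_format_sizeof(unsigned int nr)
{
	return sizeof(struct demo_log_format) +
			nr * sizeof(struct demo_extent);
}

int main(void)
{
	/* 16-byte header plus four 16-byte extents: prints 80 */
	printf("%zu\n", demo_log_format_sizeof(4));
	return 0;
}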
// SPDX-License-Identifier: GPL-2.0-or-later /* Volume-level cache cookie handling. * * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) */ #define FSCACHE_DEBUG_LEVEL COOKIE #include <linux/export.h> #include <linux/slab.h> #include "internal.h" #define fscache_volume_hash_shift 10 static struct hlist_bl_head fscache_volume_hash[1 << fscache_volume_hash_shift]; static atomic_t fscache_volume_debug_id; static LIST_HEAD(fscache_volumes); static void fscache_create_volume_work(struct work_struct *work); struct fscache_volume *fscache_get_volume(struct fscache_volume *volume, enum fscache_volume_trace where) { int ref; __refcount_inc(&volume->ref, &ref); trace_fscache_volume(volume->debug_id, ref + 1, where); return volume; } struct fscache_volume *fscache_try_get_volume(struct fscache_volume *volume, enum fscache_volume_trace where) { int ref; if (!__refcount_inc_not_zero(&volume->ref, &ref)) return NULL; trace_fscache_volume(volume->debug_id, ref + 1, where); return volume; } EXPORT_SYMBOL(fscache_try_get_volume); static void fscache_see_volume(struct fscache_volume *volume, enum fscache_volume_trace where) { int ref = refcount_read(&volume->ref); trace_fscache_volume(volume->debug_id, ref, where); } /* * Pin the cache behind a volume so that we can access it. 
*/ static void __fscache_begin_volume_access(struct fscache_volume *volume, struct fscache_cookie *cookie, enum fscache_access_trace why) { int n_accesses; n_accesses = atomic_inc_return(&volume->n_accesses); smp_mb__after_atomic(); trace_fscache_access_volume(volume->debug_id, cookie ? cookie->debug_id : 0, refcount_read(&volume->ref), n_accesses, why); } /** * fscache_begin_volume_access - Pin a cache so a volume can be accessed * @volume: The volume cookie * @cookie: A datafile cookie for a tracing reference (or NULL) * @why: An indication of the circumstances of the access for tracing * * Attempt to pin the cache to prevent it from going away whilst we're * accessing a volume and return true if successful. This works as follows: * * (1) If the cache tests as not live (state is not FSCACHE_CACHE_IS_ACTIVE), * then we return false to indicate access was not permitted. * * (2) If the cache tests as live, then we increment the volume's n_accesses * count and then recheck the cache liveness, ending the access if it * ceased to be live. * * (3) When we end the access, we decrement the volume's n_accesses and wake * up any waiters if it reaches 0. * * (4) Whilst the cache is caching, the volume's n_accesses is kept * artificially incremented to prevent wakeups from happening. * * (5) When the cache is taken offline, the state is changed to prevent new * accesses, the volume's n_accesses is decremented and we wait for it to * become 0. * * The datafile @cookie and the @why indicator are merely provided for tracing * purposes. */ bool fscache_begin_volume_access(struct fscache_volume *volume, struct fscache_cookie *cookie, enum fscache_access_trace why) { if (!fscache_cache_is_live(volume->cache)) return false; __fscache_begin_volume_access(volume, cookie, why); if (!fscache_cache_is_live(volume->cache)) { fscache_end_volume_access(volume, cookie, fscache_access_unlive); return false; } return true; } /** * fscache_end_volume_access - Unpin a cache at the end of an access. * @volume: The volume cookie * @cookie: A datafile cookie for a tracing reference (or NULL) * @why: An indication of the circumstances of the access for tracing * * Unpin a cache volume after we've accessed it. The datafile @cookie and the * @why indicator are merely provided for tracing purposes. */ void fscache_end_volume_access(struct fscache_volume *volume, struct fscache_cookie *cookie, enum fscache_access_trace why) { int n_accesses; smp_mb__before_atomic(); n_accesses = atomic_dec_return(&volume->n_accesses); trace_fscache_access_volume(volume->debug_id, cookie ? 
cookie->debug_id : 0, refcount_read(&volume->ref), n_accesses, why); if (n_accesses == 0) wake_up_var(&volume->n_accesses); } EXPORT_SYMBOL(fscache_end_volume_access); static bool fscache_volume_same(const struct fscache_volume *a, const struct fscache_volume *b) { size_t klen; if (a->key_hash != b->key_hash || a->cache != b->cache || a->key[0] != b->key[0]) return false; klen = round_up(a->key[0] + 1, sizeof(__le32)); return memcmp(a->key, b->key, klen) == 0; } static bool fscache_is_acquire_pending(struct fscache_volume *volume) { return test_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &volume->flags); } static void fscache_wait_on_volume_collision(struct fscache_volume *candidate, unsigned int collidee_debug_id) { wait_on_bit_timeout(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING, TASK_UNINTERRUPTIBLE, 20 * HZ); if (fscache_is_acquire_pending(candidate)) { pr_notice("Potential volume collision new=%08x old=%08x", candidate->debug_id, collidee_debug_id); fscache_stat(&fscache_n_volumes_collision); wait_on_bit(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING, TASK_UNINTERRUPTIBLE); } } /* * Attempt to insert the new volume into the hash. If there's a collision, we * wait for the old volume to complete if it's being relinquished, and return * an error otherwise. */ static bool fscache_hash_volume(struct fscache_volume *candidate) { struct fscache_volume *cursor; struct hlist_bl_head *h; struct hlist_bl_node *p; unsigned int bucket, collidee_debug_id = 0; bucket = candidate->key_hash & (ARRAY_SIZE(fscache_volume_hash) - 1); h = &fscache_volume_hash[bucket]; hlist_bl_lock(h); hlist_bl_for_each_entry(cursor, p, h, hash_link) { if (fscache_volume_same(candidate, cursor)) { if (!test_bit(FSCACHE_VOLUME_RELINQUISHED, &cursor->flags)) goto collision; fscache_see_volume(cursor, fscache_volume_get_hash_collision); set_bit(FSCACHE_VOLUME_COLLIDED_WITH, &cursor->flags); set_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &candidate->flags); collidee_debug_id = cursor->debug_id; break; } } hlist_bl_add_head(&candidate->hash_link, h); hlist_bl_unlock(h); if (fscache_is_acquire_pending(candidate)) fscache_wait_on_volume_collision(candidate, collidee_debug_id); return true; collision: fscache_see_volume(cursor, fscache_volume_collision); hlist_bl_unlock(h); return false; } /* * Allocate and initialise a volume representation cookie. */ static struct fscache_volume *fscache_alloc_volume(const char *volume_key, const char *cache_name, const void *coherency_data, size_t coherency_len) { struct fscache_volume *volume; struct fscache_cache *cache; size_t klen, hlen; u8 *key; klen = strlen(volume_key); if (klen > NAME_MAX) return NULL; if (!coherency_data) coherency_len = 0; cache = fscache_lookup_cache(cache_name, false); if (IS_ERR(cache)) return NULL; volume = kzalloc(struct_size(volume, coherency, coherency_len), GFP_KERNEL); if (!volume) goto err_cache; volume->cache = cache; volume->coherency_len = coherency_len; if (coherency_data) memcpy(volume->coherency, coherency_data, coherency_len); INIT_LIST_HEAD(&volume->proc_link); INIT_WORK(&volume->work, fscache_create_volume_work); refcount_set(&volume->ref, 1); spin_lock_init(&volume->lock); /* Stick the length on the front of the key and pad it out to make * hashing easier. 
*/ hlen = round_up(1 + klen + 1, sizeof(__le32)); key = kzalloc(hlen, GFP_KERNEL); if (!key) goto err_vol; key[0] = klen; memcpy(key + 1, volume_key, klen); volume->key = key; volume->key_hash = fscache_hash(0, key, hlen); volume->debug_id = atomic_inc_return(&fscache_volume_debug_id); down_write(&fscache_addremove_sem); atomic_inc(&cache->n_volumes); list_add_tail(&volume->proc_link, &fscache_volumes); fscache_see_volume(volume, fscache_volume_new_acquire); fscache_stat(&fscache_n_volumes); up_write(&fscache_addremove_sem); _leave(" = v=%x", volume->debug_id); return volume; err_vol: kfree(volume); err_cache: fscache_put_cache(cache, fscache_cache_put_alloc_volume); fscache_stat(&fscache_n_volumes_nomem); return NULL; } /* * Create a volume's representation on disk. We have a volume ref and a cache * access that we must release. */ static void fscache_create_volume_work(struct work_struct *work) { const struct fscache_cache_ops *ops; struct fscache_volume *volume = container_of(work, struct fscache_volume, work); fscache_see_volume(volume, fscache_volume_see_create_work); ops = volume->cache->ops; if (ops->acquire_volume) ops->acquire_volume(volume); fscache_end_cache_access(volume->cache, fscache_access_acquire_volume_end); clear_and_wake_up_bit(FSCACHE_VOLUME_CREATING, &volume->flags); fscache_put_volume(volume, fscache_volume_put_create_work); } /* * Dispatch a worker thread to create a volume's representation on disk. */ void fscache_create_volume(struct fscache_volume *volume, bool wait) { if (test_and_set_bit(FSCACHE_VOLUME_CREATING, &volume->flags)) goto maybe_wait; if (volume->cache_priv) goto no_wait; /* We raced */ if (!fscache_begin_cache_access(volume->cache, fscache_access_acquire_volume)) goto no_wait; fscache_get_volume(volume, fscache_volume_get_create_work); if (!schedule_work(&volume->work)) fscache_put_volume(volume, fscache_volume_put_create_work); maybe_wait: if (wait) { fscache_see_volume(volume, fscache_volume_wait_create_work); wait_on_bit(&volume->flags, FSCACHE_VOLUME_CREATING, TASK_UNINTERRUPTIBLE); } return; no_wait: clear_and_wake_up_bit(FSCACHE_VOLUME_CREATING, &volume->flags); } /* * Acquire a volume representation cookie and link it to a (proposed) cache. */ struct fscache_volume *__fscache_acquire_volume(const char *volume_key, const char *cache_name, const void *coherency_data, size_t coherency_len) { struct fscache_volume *volume; volume = fscache_alloc_volume(volume_key, cache_name, coherency_data, coherency_len); if (!volume) return ERR_PTR(-ENOMEM); if (!fscache_hash_volume(volume)) { fscache_put_volume(volume, fscache_volume_put_hash_collision); return ERR_PTR(-EBUSY); } fscache_create_volume(volume, false); return volume; } EXPORT_SYMBOL(__fscache_acquire_volume); static void fscache_wake_pending_volume(struct fscache_volume *volume, struct hlist_bl_head *h) { struct fscache_volume *cursor; struct hlist_bl_node *p; hlist_bl_for_each_entry(cursor, p, h, hash_link) { if (fscache_volume_same(cursor, volume)) { fscache_see_volume(cursor, fscache_volume_see_hash_wake); clear_and_wake_up_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags); return; } } } /* * Remove a volume cookie from the hash table. 
*/ static void fscache_unhash_volume(struct fscache_volume *volume) { struct hlist_bl_head *h; unsigned int bucket; bucket = volume->key_hash & (ARRAY_SIZE(fscache_volume_hash) - 1); h = &fscache_volume_hash[bucket]; hlist_bl_lock(h); hlist_bl_del(&volume->hash_link); if (test_bit(FSCACHE_VOLUME_COLLIDED_WITH, &volume->flags)) fscache_wake_pending_volume(volume, h); hlist_bl_unlock(h); } /* * Drop a cache's volume attachments. */ static void fscache_free_volume(struct fscache_volume *volume) { struct fscache_cache *cache = volume->cache; if (volume->cache_priv) { __fscache_begin_volume_access(volume, NULL, fscache_access_relinquish_volume); if (volume->cache_priv) cache->ops->free_volume(volume); fscache_end_volume_access(volume, NULL, fscache_access_relinquish_volume_end); } down_write(&fscache_addremove_sem); list_del_init(&volume->proc_link); atomic_dec(&volume->cache->n_volumes); up_write(&fscache_addremove_sem); if (!hlist_bl_unhashed(&volume->hash_link)) fscache_unhash_volume(volume); trace_fscache_volume(volume->debug_id, 0, fscache_volume_free); kfree(volume->key); kfree(volume); fscache_stat_d(&fscache_n_volumes); fscache_put_cache(cache, fscache_cache_put_volume); } /* * Drop a reference to a volume cookie. */ void fscache_put_volume(struct fscache_volume *volume, enum fscache_volume_trace where) { if (volume) { unsigned int debug_id = volume->debug_id; bool zero; int ref; zero = __refcount_dec_and_test(&volume->ref, &ref); trace_fscache_volume(debug_id, ref - 1, where); if (zero) fscache_free_volume(volume); } } EXPORT_SYMBOL(fscache_put_volume); /* * Relinquish a volume representation cookie. */ void __fscache_relinquish_volume(struct fscache_volume *volume, const void *coherency_data, bool invalidate) { if (WARN_ON(test_and_set_bit(FSCACHE_VOLUME_RELINQUISHED, &volume->flags))) return; if (invalidate) { set_bit(FSCACHE_VOLUME_INVALIDATE, &volume->flags); } else if (coherency_data) { memcpy(volume->coherency, coherency_data, volume->coherency_len); } fscache_put_volume(volume, fscache_volume_put_relinquish); } EXPORT_SYMBOL(__fscache_relinquish_volume); /** * fscache_withdraw_volume - Withdraw a volume from being cached * @volume: Volume cookie * * Withdraw a cache volume from service, waiting for all accesses to complete * before returning. 
*/ void fscache_withdraw_volume(struct fscache_volume *volume) { int n_accesses; _debug("withdraw V=%x", volume->debug_id); /* Allow wakeups on dec-to-0 */ n_accesses = atomic_dec_return(&volume->n_accesses); trace_fscache_access_volume(volume->debug_id, 0, refcount_read(&volume->ref), n_accesses, fscache_access_cache_unpin); wait_var_event(&volume->n_accesses, atomic_read(&volume->n_accesses) == 0); } EXPORT_SYMBOL(fscache_withdraw_volume); #ifdef CONFIG_PROC_FS /* * Generate a list of volumes in /proc/fs/fscache/volumes */ static int fscache_volumes_seq_show(struct seq_file *m, void *v) { struct fscache_volume *volume; if (v == &fscache_volumes) { seq_puts(m, "VOLUME REF nCOOK ACC FL CACHE KEY\n" "======== ===== ===== === == =============== ================\n"); return 0; } volume = list_entry(v, struct fscache_volume, proc_link); seq_printf(m, "%08x %5d %5d %3d %02lx %-15.15s %s\n", volume->debug_id, refcount_read(&volume->ref), atomic_read(&volume->n_cookies), atomic_read(&volume->n_accesses), volume->flags, volume->cache->name ?: "-", volume->key + 1); return 0; } static void *fscache_volumes_seq_start(struct seq_file *m, loff_t *_pos) __acquires(&fscache_addremove_sem) { down_read(&fscache_addremove_sem); return seq_list_start_head(&fscache_volumes, *_pos); } static void *fscache_volumes_seq_next(struct seq_file *m, void *v, loff_t *_pos) { return seq_list_next(v, &fscache_volumes, _pos); } static void fscache_volumes_seq_stop(struct seq_file *m, void *v) __releases(&fscache_addremove_sem) { up_read(&fscache_addremove_sem); } const struct seq_operations fscache_volumes_seq_ops = { .start = fscache_volumes_seq_start, .next = fscache_volumes_seq_next, .stop = fscache_volumes_seq_stop, .show = fscache_volumes_seq_show, }; #endif /* CONFIG_PROC_FS */
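/*
 * A minimal userspace sketch (not kernel code) of the access-counting
 * scheme documented for fscache_begin_volume_access() above: pin by
 * bumping the access count, then re-check liveness so that a concurrent
 * teardown either observes our pin or we observe its dead flag. The
 * default seq_cst atomics stand in for the kernel's explicit memory
 * barriers; all "demo_" names are hypothetical.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct demo_volume {
	atomic_bool live;
	atomic_int n_accesses;
};

static bool demo_begin_access(struct demo_volume *v)
{
	if (!atomic_load(&v->live))
		return false;
	atomic_fetch_add(&v->n_accesses, 1);	/* pin the cache */
	if (!atomic_load(&v->live)) {		/* re-check after pinning */
		atomic_fetch_sub(&v->n_accesses, 1);
		return false;
	}
	return true;
}

static void demo_end_access(struct demo_volume *v)
{
	/* the real code wakes up waiters when the count drops to zero */
	if (atomic_fetch_sub(&v->n_accesses, 1) == 1)
		printf("last access ended; waiters would be woken\n");
}

int main(void)
{
	struct demo_volume v = { .live = true };

	if (demo_begin_access(&v)) {
		printf("pinned, n_accesses=%d\n",
		       atomic_load(&v.n_accesses));
		demo_end_access(&v);
	}
	return 0;
}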
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc. * Copyright (C) 2010 Red Hat, Inc. * All Rights Reserved. */ #include "xfs.h" #include "xfs_fs.h" #include "xfs_shared.h" #include "xfs_format.h" #include "xfs_log_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" #include "xfs_da_format.h" #include "xfs_da_btree.h" #include "xfs_inode.h" #include "xfs_bmap_btree.h" #include "xfs_quota.h" #include "xfs_trans.h" #include "xfs_qm.h" #include "xfs_trans_space.h" #include "xfs_rtbitmap.h" #include "xfs_attr_item.h" #include "xfs_log.h" #define _ALLOC true #define _FREE false /* * A buffer has a format structure overhead in the log in addition * to the data, so we need to take this into account when reserving * space in a transaction for a buffer. Round the space required up * to a multiple of 128 bytes so that we don't change the historical * reservation that has been used for this overhead. */ STATIC uint xfs_buf_log_overhead(void) { return round_up(sizeof(struct xlog_op_header) + sizeof(struct xfs_buf_log_format), 128); } /* * Calculate the transaction log reservation per item in bytes. * * The nbufs argument indicates the number of items that will be * changed in a transaction; size tells how many bytes should be * reserved per item. */ STATIC uint xfs_calc_buf_res( uint nbufs, uint size) { return nbufs * (size + xfs_buf_log_overhead()); } /* * Per-extent log reservation for the btree changes involved in freeing or * allocating an extent. In classic XFS there are two trees that may be * modified (bnobt + cntbt). 
With rmap enabled, a third tree * (the rmapbt) may also be modified. The number of blocks reserved is based on the formula: * * num trees * ((2 blocks/level * max depth) - 1) * * Keep in mind that max depth is calculated separately for each type of tree. */ uint xfs_allocfree_block_count( struct xfs_mount *mp, uint num_ops) { uint blocks; blocks = num_ops * 2 * (2 * mp->m_alloc_maxlevels - 1); if (xfs_has_rmapbt(mp)) blocks += num_ops * (2 * mp->m_rmap_maxlevels - 1); return blocks; } /* * Per-extent log reservation for refcount btree changes. These are never done * in the same transaction as an allocation or a free, so we compute them * separately. */ static unsigned int xfs_refcountbt_block_count( struct xfs_mount *mp, unsigned int num_ops) { return num_ops * (2 * mp->m_refc_maxlevels - 1); } static unsigned int xfs_rtrefcountbt_block_count( struct xfs_mount *mp, unsigned int num_ops) { return num_ops * (2 * mp->m_rtrefc_maxlevels - 1); } /* * Logging inodes is really tricksy. They are logged in memory format, * which means that what we write into the log doesn't directly translate into * the amount of space they use on disk. * * Case in point - btree format forks in memory format use more space than the * on-disk format. In memory, the buffer contains a normal btree block header so * the btree code can treat it as though it is just another generic buffer. * However, when we write it to the inode fork, we don't write all of this * header as it isn't needed. e.g. the root is only ever in the inode, so * there's no need for sibling pointers which would waste 16 bytes of space. * * Hence when we have an inode with a maximally sized btree format fork, then the * amount of information we actually log is greater than the size of the inode * on disk. Hence we need an inode reservation function that calculates all this * correctly. So, we log: * * - 4 log op headers for object * - for the ilf, the inode core and 2 forks * - inode log format object * - the inode core * - two inode forks containing bmap btree root blocks. * - the btree data contained by both forks will fit into the inode size, * hence when combined with the inode core above, we have a total of the * actual inode size. * - the BMBT headers need to be accounted separately, as they are * additional to the records and pointers that fit inside the inode * forks. */ STATIC uint xfs_calc_inode_res( struct xfs_mount *mp, uint ninodes) { return ninodes * (4 * sizeof(struct xlog_op_header) + sizeof(struct xfs_inode_log_format) + mp->m_sb.sb_inodesize + 2 * xfs_bmbt_block_len(mp)); } /* * Inode btree record insertion/removal modifies the inode btree and free space * btrees (since the inobt does not use the agfl). This requires the following * reservation: * * the inode btree: max depth * blocksize * the allocation btrees: 2 trees * (max depth - 1) * block size * * The caller must account for SB and AG header modifications, etc. */ STATIC uint xfs_calc_inobt_res( struct xfs_mount *mp) { return xfs_calc_buf_res(M_IGEO(mp)->inobt_maxlevels, XFS_FSB_TO_B(mp, 1)) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1), XFS_FSB_TO_B(mp, 1)); } /* * The free inode btree is a conditional feature. The behavior differs slightly * from that of the traditional inode btree in that the finobt tracks records * for inode chunks with at least one free inode. A record can be removed from * the tree during individual inode allocation. Therefore the finobt * reservation is unconditional for both the inode chunk allocation and * individual inode allocation (modify) cases. 
* * Behavior aside, the reservation for finobt modification is equivalent to the * traditional inobt: cover a full finobt shape change plus block allocation. */ STATIC uint xfs_calc_finobt_res( struct xfs_mount *mp) { if (!xfs_has_finobt(mp)) return 0; return xfs_calc_inobt_res(mp); } /* * Calculate the reservation required to allocate or free an inode chunk. This * includes: * * the allocation btrees: 2 trees * (max depth - 1) * block size * the inode chunk: m_ino_geo.ialloc_blks * N * * The size N of the inode chunk reservation depends on whether it is for * allocation or free and which type of create transaction is in use. An inode * chunk free always invalidates the buffers and only requires reservation for * headers (N == 0). An inode chunk allocation requires a chunk sized * reservation on v4 and older superblocks to initialize the chunk. No chunk * reservation is required for allocation on v5 supers, which use ordered * buffers to initialize. */ STATIC uint xfs_calc_inode_chunk_res( struct xfs_mount *mp, bool alloc) { uint res, size = 0; res = xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1), XFS_FSB_TO_B(mp, 1)); if (alloc) { /* icreate tx uses ordered buffers */ if (xfs_has_v3inodes(mp)) return res; size = XFS_FSB_TO_B(mp, 1); } res += xfs_calc_buf_res(M_IGEO(mp)->ialloc_blks, size); return res; } /* * Per-extent log reservation for the btree changes involved in freeing or * allocating a realtime extent. We have to be able to log as many rtbitmap * blocks as needed to mark in use XFS_MAX_BMBT_EXTLEN blocks' worth of realtime * extents, as well as the realtime summary block (t1). Realtime rmap btree * operations happen in a second transaction, so factor in a couple of rtrmapbt * splits (t2). */ static unsigned int xfs_rtalloc_block_count( struct xfs_mount *mp, unsigned int num_ops) { unsigned int rtbmp_blocks; xfs_rtxlen_t rtxlen; unsigned int t1, t2 = 0; rtxlen = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN); rtbmp_blocks = xfs_rtbitmap_blockcount_len(mp, rtxlen); t1 = (rtbmp_blocks + 1) * num_ops; if (xfs_has_rmapbt(mp)) t2 = num_ops * (2 * mp->m_rtrmap_maxlevels - 1); return max(t1, t2); } /* * Various log reservation values. * * These are based on the size of the file system block because that is what * most transactions manipulate. Each adds in an additional 128 bytes per * item logged to try to account for the overhead of the transaction mechanism. * * Note: Most of the reservations underestimate the number of allocation * groups into which they could free extents in the xfs_defer_finish() call. * This is because the number in the worst case is quite high and quite * unusual. In order to fix this we need to change xfs_defer_finish() to free * extents in only a single AG at a time. This will require changes to the * EFI code as well, however, so that the EFI for the extents not freed is * logged again in each transaction. See SGI PV #261917. * * Reservation functions here avoid a huge stack in xfs_trans_init due to * register overflow from temporaries in the calculations. */ /* * Compute the log reservation required to handle the refcount update * transaction. Refcount updates are always done via deferred log items. 
* * This is calculated as the max of: * Data device refcount updates (t1): * the agfs of the ags containing the blocks: nr_ops * sector size * the refcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size * Realtime refcount updates (t2): * the rt refcount inode * the rtrefcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size */ static unsigned int xfs_calc_refcountbt_reservation( struct xfs_mount *mp, unsigned int nr_ops) { unsigned int blksz = XFS_FSB_TO_B(mp, 1); unsigned int t1, t2 = 0; if (!xfs_has_reflink(mp)) return 0; t1 = xfs_calc_buf_res(nr_ops, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_refcountbt_block_count(mp, nr_ops), blksz); if (xfs_has_realtime(mp)) t2 = xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(xfs_rtrefcountbt_block_count(mp, nr_ops), blksz); return max(t1, t2); } /* * In a write transaction we can allocate a maximum of 2 * extents. This gives (t1): * the inode getting the new extents: inode size * the inode's bmap btree: max depth * block size * the agfs of the ags from which the extents are allocated: 2 * sector * the superblock free block counter: sector size * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size * Or, if we're writing to a realtime file (t2): * the inode getting the new extents: inode size * the inode's bmap btree: max depth * block size * the agfs of the ags from which the extents are allocated: 2 * sector * the superblock free block counter: sector size * the realtime bitmap: ((XFS_MAX_BMBT_EXTLEN / rtextsize) / NBBY) bytes * the realtime summary: 1 block * the allocation btrees: 2 trees * (2 * max depth - 1) * block size * And the bmap_finish transaction can free bmap blocks in a join (t3): * the agfs of the ags containing the blocks: 2 * sector size * the agfls of the ags containing the blocks: 2 * sector size * the super block free block counter: sector size * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size * And any refcount updates that happen in a separate transaction (t4). */ STATIC uint xfs_calc_write_reservation( struct xfs_mount *mp, bool for_minlogsize) { unsigned int t1, t2, t3, t4; unsigned int blksz = XFS_FSB_TO_B(mp, 1); t1 = xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), blksz) + xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2), blksz); if (xfs_has_realtime(mp)) { t2 = xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), blksz) + xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_rtalloc_block_count(mp, 1), blksz) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1), blksz); } else { t2 = 0; } t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2), blksz); /* * In the early days of reflink, we included enough reservation to log * two refcountbt splits for each transaction. The codebase runs * refcountbt updates in separate transactions now, so to compute the * minimum log size, add the refcountbtree splits back to t1 and t3 and * do not account them separately as t4. Reflink did not support * realtime when the reservations were established, so no adjustment to * t2 is needed. 
*/ if (for_minlogsize) { unsigned int adj = 0; if (xfs_has_reflink(mp)) adj = xfs_calc_buf_res( xfs_refcountbt_block_count(mp, 2), blksz); t1 += adj; t3 += adj; return XFS_DQUOT_LOGRES + max3(t1, t2, t3); } t4 = xfs_calc_refcountbt_reservation(mp, 1); return XFS_DQUOT_LOGRES + max(t4, max3(t1, t2, t3)); } unsigned int xfs_calc_write_reservation_minlogsize( struct xfs_mount *mp) { return xfs_calc_write_reservation(mp, true); } /* * In truncating a file we free up to two extents at once. We can modify (t1): * the inode being truncated: inode size * the inode's bmap btree: (max depth + 1) * block size * And the bmap_finish transaction can free the blocks and bmap blocks (t2): * the agf for each of the ags: 4 * sector size * the agfl for each of the ags: 4 * sector size * the super block to reflect the freed blocks: sector size * worst case split in allocation btrees per extent assuming 4 extents: * 4 exts * 2 trees * (2 * max depth - 1) * block size * Or, if it's a realtime file (t3): * the agf for each of the ags: 2 * sector size * the agfl for each of the ags: 2 * sector size * the super block to reflect the freed blocks: sector size * the realtime bitmap: * 2 exts * ((XFS_MAX_BMBT_EXTLEN / rtextsize) / NBBY) bytes * the realtime summary: 2 exts * 1 block * worst case split in allocation btrees per extent assuming 2 extents: * 2 exts * 2 trees * (2 * max depth - 1) * block size * And any refcount updates that happen in a separate transaction (t4). */ STATIC uint xfs_calc_itruncate_reservation( struct xfs_mount *mp, bool for_minlogsize) { unsigned int t1, t2, t3, t4; unsigned int blksz = XFS_FSB_TO_B(mp, 1); t1 = xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1, blksz); t2 = xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 4), blksz); if (xfs_has_realtime(mp)) { t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_rtalloc_block_count(mp, 2), blksz) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2), blksz); } else { t3 = 0; } /* * In the early days of reflink, we included enough reservation to log * four refcountbt splits in the same transaction as bnobt/cntbt * updates. The codebase runs refcountbt updates in separate * transactions now, so to compute the minimum log size, add the * refcount btree splits back here and do not compute them separately * as t4. Reflink did not support realtime when the reservations were * established, so do not adjust t3. 
*/ if (for_minlogsize) { if (xfs_has_reflink(mp)) t2 += xfs_calc_buf_res( xfs_refcountbt_block_count(mp, 4), blksz); return XFS_DQUOT_LOGRES + max3(t1, t2, t3); } t4 = xfs_calc_refcountbt_reservation(mp, 2); return XFS_DQUOT_LOGRES + max(t4, max3(t1, t2, t3)); } unsigned int xfs_calc_itruncate_reservation_minlogsize( struct xfs_mount *mp) { return xfs_calc_itruncate_reservation(mp, true); } static inline unsigned int xfs_calc_pptr_link_overhead(void) { return sizeof(struct xfs_attri_log_format) + xlog_calc_iovec_len(sizeof(struct xfs_parent_rec)) + xlog_calc_iovec_len(MAXNAMELEN - 1); } static inline unsigned int xfs_calc_pptr_unlink_overhead(void) { return sizeof(struct xfs_attri_log_format) + xlog_calc_iovec_len(sizeof(struct xfs_parent_rec)) + xlog_calc_iovec_len(MAXNAMELEN - 1); } static inline unsigned int xfs_calc_pptr_replace_overhead(void) { return sizeof(struct xfs_attri_log_format) + xlog_calc_iovec_len(sizeof(struct xfs_parent_rec)) + xlog_calc_iovec_len(MAXNAMELEN - 1) + xlog_calc_iovec_len(sizeof(struct xfs_parent_rec)) + xlog_calc_iovec_len(MAXNAMELEN - 1); } /* * In renaming files we can modify: * the five inodes involved: 5 * inode size * the two directory btrees: 2 * (max depth + v2) * dir block size * the two directory bmap btrees: 2 * max depth * block size * And the bmap_finish transaction can free dir and bmap blocks (two sets * of bmap blocks) giving (t2): * the agf for the ags in which the blocks live: 3 * sector size * the agfl for the ags in which the blocks live: 3 * sector size * the superblock for the free block count: sector size * the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size * If parent pointers are enabled (t3), then each transaction in the chain * must be capable of setting or removing the extended attribute * containing the parent information. It must also be able to handle * the three xattr intent items that track the progress of the parent * pointer update. */ STATIC uint xfs_calc_rename_reservation( struct xfs_mount *mp) { unsigned int overhead = XFS_DQUOT_LOGRES; struct xfs_trans_resv *resp = M_RES(mp); unsigned int t1, t2, t3 = 0; t1 = xfs_calc_inode_res(mp, 5) + xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1)); t2 = xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3), XFS_FSB_TO_B(mp, 1)); if (xfs_has_parent(mp)) { unsigned int rename_overhead, exchange_overhead; t3 = max(resp->tr_attrsetm.tr_logres, resp->tr_attrrm.tr_logres); /* * For a standard rename, the three xattr intent log items * are (1) replacing the pptr for the source file; (2) * removing the pptr on the dest file; and (3) adding a * pptr for the whiteout file in the src dir. * * For a RENAME_EXCHANGE, there are two xattr intent * items to replace the pptr for both src and dest * files. Link counts don't change and there is no * whiteout. * * In the worst case we can end up relogging all log * intent items to allow the log tail to move ahead, so * they become overhead added to each transaction in a * processing chain. 
*/ rename_overhead = xfs_calc_pptr_replace_overhead() + xfs_calc_pptr_unlink_overhead() + xfs_calc_pptr_link_overhead(); exchange_overhead = 2 * xfs_calc_pptr_replace_overhead(); overhead += max(rename_overhead, exchange_overhead); } return overhead + max3(t1, t2, t3); } static inline unsigned int xfs_rename_log_count( struct xfs_mount *mp, struct xfs_trans_resv *resp) { /* One for the rename, one more for freeing blocks */ unsigned int ret = XFS_RENAME_LOG_COUNT; /* * Pre-reserve enough log reservation to handle the transaction * rolling needed to remove or add one parent pointer. */ if (xfs_has_parent(mp)) ret += max(resp->tr_attrsetm.tr_logcount, resp->tr_attrrm.tr_logcount); return ret; } /* * For removing an inode from the unlinked list, we can modify: * the agi hash list and counters: sector size * the on disk inode before ours in the agi hash list: inode cluster size * the on disk inode in the agi hash list: inode cluster size */ STATIC uint xfs_calc_iunlink_remove_reservation( struct xfs_mount *mp) { return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + 2 * M_IGEO(mp)->inode_cluster_size; } static inline unsigned int xfs_link_log_count( struct xfs_mount *mp, struct xfs_trans_resv *resp) { unsigned int ret = XFS_LINK_LOG_COUNT; /* * Pre-reserve enough log reservation to handle the transaction * rolling needed to add one parent pointer. */ if (xfs_has_parent(mp)) ret += resp->tr_attrsetm.tr_logcount; return ret; } /* * For creating a link to an inode: * the parent directory inode: inode size * the linked inode: inode size * the directory btree could split: (max depth + v2) * dir block size * the directory bmap btree could join or split: (max depth + v2) * blocksize * And the bmap_finish transaction can free some bmap blocks giving: * the agf for the ag in which the blocks live: sector size * the agfl for the ag in which the blocks live: sector size * the superblock for the free block count: sector size * the allocation btrees: 2 trees * (2 * max depth - 1) * block size */ STATIC uint xfs_calc_link_reservation( struct xfs_mount *mp) { unsigned int overhead = XFS_DQUOT_LOGRES; struct xfs_trans_resv *resp = M_RES(mp); unsigned int t1, t2, t3 = 0; overhead += xfs_calc_iunlink_remove_reservation(mp); t1 = xfs_calc_inode_res(mp, 2) + xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1)); t2 = xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1), XFS_FSB_TO_B(mp, 1)); if (xfs_has_parent(mp)) { t3 = resp->tr_attrsetm.tr_logres; overhead += xfs_calc_pptr_link_overhead(); } return overhead + max3(t1, t2, t3); } /* * For adding an inode to the unlinked list we can modify: * the agi hash list: sector size * the on disk inode: inode cluster size */ STATIC uint xfs_calc_iunlink_add_reservation(xfs_mount_t *mp) { return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + M_IGEO(mp)->inode_cluster_size; } static inline unsigned int xfs_remove_log_count( struct xfs_mount *mp, struct xfs_trans_resv *resp) { unsigned int ret = XFS_REMOVE_LOG_COUNT; /* * Pre-reserve enough log reservation to handle the transaction * rolling needed to add one parent pointer. 
*/ if (xfs_has_parent(mp)) ret += resp->tr_attrrm.tr_logcount; return ret; } /* * For removing a directory entry we can modify: * the parent directory inode: inode size * the removed inode: inode size * the directory btree could join: (max depth + v2) * dir block size * the directory bmap btree could join or split: (max depth + v2) * blocksize * And the bmap_finish transaction can free the dir and bmap blocks giving: * the agf for the ag in which the blocks live: 2 * sector size * the agfl for the ag in which the blocks live: 2 * sector size * the superblock for the free block count: sector size * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size */ STATIC uint xfs_calc_remove_reservation( struct xfs_mount *mp) { unsigned int overhead = XFS_DQUOT_LOGRES; struct xfs_trans_resv *resp = M_RES(mp); unsigned int t1, t2, t3 = 0; overhead += xfs_calc_iunlink_add_reservation(mp); t1 = xfs_calc_inode_res(mp, 2) + xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1)); t2 = xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2), XFS_FSB_TO_B(mp, 1)); if (xfs_has_parent(mp)) { t3 = resp->tr_attrrm.tr_logres; overhead += xfs_calc_pptr_unlink_overhead(); } return overhead + max3(t1, t2, t3); } /* * For create, break it into the two cases that the transaction * covers. We start with the modify case - allocation done by modification * of the state of existing inodes - and then the allocation case. */ /* * For create we can modify: * the parent directory inode: inode size * the new inode: inode size * the inode btree entry: block size * the superblock for the nlink flag: sector size * the directory btree: (max depth + v2) * dir block size * the directory inode's bmap btree: (max depth + v2) * block size * the finobt (record modification and allocation btrees) */ STATIC uint xfs_calc_create_resv_modify( struct xfs_mount *mp) { return xfs_calc_inode_res(mp, 2) + xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + (uint)XFS_FSB_TO_B(mp, 1) + xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1)) + xfs_calc_finobt_res(mp); } /* * For icreate we can allocate some inodes giving: * the agi and agf of the ag getting the new inodes: 2 * sectorsize * the superblock for the nlink flag: sector size * the inode chunk (allocation, optional init) * the inobt (record insertion) * the finobt (optional, record insertion) */ STATIC uint xfs_calc_icreate_resv_alloc( struct xfs_mount *mp) { return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) + mp->m_sb.sb_sectsize + xfs_calc_inode_chunk_res(mp, _ALLOC) + xfs_calc_inobt_res(mp) + xfs_calc_finobt_res(mp); } static inline unsigned int xfs_icreate_log_count( struct xfs_mount *mp, struct xfs_trans_resv *resp) { unsigned int ret = XFS_CREATE_LOG_COUNT; /* * Pre-reserve enough log reservation to handle the transaction * rolling needed to add one parent pointer. 
*/ if (xfs_has_parent(mp)) ret += resp->tr_attrsetm.tr_logcount; return ret; } STATIC uint xfs_calc_icreate_reservation( struct xfs_mount *mp) { struct xfs_trans_resv *resp = M_RES(mp); unsigned int overhead = XFS_DQUOT_LOGRES; unsigned int t1, t2, t3 = 0; t1 = xfs_calc_icreate_resv_alloc(mp); t2 = xfs_calc_create_resv_modify(mp); if (xfs_has_parent(mp)) { t3 = resp->tr_attrsetm.tr_logres; overhead += xfs_calc_pptr_link_overhead(); } return overhead + max3(t1, t2, t3); } STATIC uint xfs_calc_create_tmpfile_reservation( struct xfs_mount *mp) { uint res = XFS_DQUOT_LOGRES; res += xfs_calc_icreate_resv_alloc(mp); return res + xfs_calc_iunlink_add_reservation(mp); } static inline unsigned int xfs_mkdir_log_count( struct xfs_mount *mp, struct xfs_trans_resv *resp) { unsigned int ret = XFS_MKDIR_LOG_COUNT; /* * Pre-reserve enough log reservation to handle the transaction * rolling needed to add one parent pointer. */ if (xfs_has_parent(mp)) ret += resp->tr_attrsetm.tr_logcount; return ret; } /* * Making a new directory is the same as creating a new file. */ STATIC uint xfs_calc_mkdir_reservation( struct xfs_mount *mp) { return xfs_calc_icreate_reservation(mp); } static inline unsigned int xfs_symlink_log_count( struct xfs_mount *mp, struct xfs_trans_resv *resp) { unsigned int ret = XFS_SYMLINK_LOG_COUNT; /* * Pre-reserve enough log reservation to handle the transaction * rolling needed to add one parent pointer. */ if (xfs_has_parent(mp)) ret += resp->tr_attrsetm.tr_logcount; return ret; } /* * Making a new symlink is the same as creating a new file, but * with the added blocks for remote symlink data which can be up to 1kB in * length (XFS_SYMLINK_MAXLEN). */ STATIC uint xfs_calc_symlink_reservation( struct xfs_mount *mp) { return xfs_calc_icreate_reservation(mp) + xfs_calc_buf_res(1, XFS_SYMLINK_MAXLEN); } /* * In freeing an inode we can modify: * the inode being freed: inode size * the super block free inode counter, AGF and AGFL: sector size * the on disk inode (agi unlinked list removal) * the inode chunk (invalidated, headers only) * the inode btree * the finobt (record insertion, removal or modification) * * Note that the inode chunk res. includes an allocfree res. for freeing of the * inode chunk. This is technically extraneous because the inode chunk free is * deferred (it occurs after a transaction roll). Include the extra reservation * anyway since we've had reports of ifree transaction overruns due to too many * agfl fixups during inode chunk frees. */ STATIC uint xfs_calc_ifree_reservation( struct xfs_mount *mp) { return XFS_DQUOT_LOGRES + xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) + xfs_calc_iunlink_remove_reservation(mp) + xfs_calc_inode_chunk_res(mp, _FREE) + xfs_calc_inobt_res(mp) + xfs_calc_finobt_res(mp); } /* * When only changing the inode we log the inode and possibly the superblock. * We also add a bit of slop for the transaction stuff. */ STATIC uint xfs_calc_ichange_reservation( struct xfs_mount *mp) { return XFS_DQUOT_LOGRES + xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); } /* * Growing the data section of the filesystem. * superblock * agi and agf * allocation btrees */ STATIC uint xfs_calc_growdata_reservation( struct xfs_mount *mp) { return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1), XFS_FSB_TO_B(mp, 1)); } /* * Growing the rt section of the filesystem. * In the first set of transactions (ALLOC) we allocate space to the * bitmap or summary files. 
/* * Growing the rt section of the filesystem. * In the first set of transactions (ALLOC) we allocate space to the * bitmap or summary files. * superblock: sector size * agf of the ag from which the extent is allocated: sector size * bmap btree for bitmap/summary inode: max depth * blocksize * bitmap/summary inode: inode size * allocation btrees for 1 block alloc: 2 * (2 * maxdepth - 1) * blocksize */ STATIC uint xfs_calc_growrtalloc_reservation( struct xfs_mount *mp) { return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), XFS_FSB_TO_B(mp, 1)) + xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1), XFS_FSB_TO_B(mp, 1)); } /* * Growing the rt section of the filesystem. * In the second set of transactions (ZERO) we zero the new metadata blocks. * one bitmap/summary block: blocksize */ STATIC uint xfs_calc_growrtzero_reservation( struct xfs_mount *mp) { return xfs_calc_buf_res(1, mp->m_sb.sb_blocksize); } /* * Growing the rt section of the filesystem. * In the third set of transactions (FREE) we update metadata without * allocating any new blocks. * superblock: sector size * bitmap inode: inode size * summary inode: inode size * one bitmap block: blocksize * summary blocks: new summary size */ STATIC uint xfs_calc_growrtfree_reservation( struct xfs_mount *mp) { return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + xfs_calc_inode_res(mp, 2) + xfs_calc_buf_res(1, mp->m_sb.sb_blocksize) + xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, mp->m_rsumblocks)); } /* * Logging the inode modification timestamp on a synchronous write. * inode */ STATIC uint xfs_calc_swrite_reservation( struct xfs_mount *mp) { return xfs_calc_inode_res(mp, 1); } /* * Logging the inode mode bits when writing a setuid/setgid file * inode */ STATIC uint xfs_calc_writeid_reservation( struct xfs_mount *mp) { return xfs_calc_inode_res(mp, 1); } /* * Converting the inode from non-attributed to attributed. * the inode being converted: inode size * agf block and superblock (for block allocation) * the new block (directory sized) * bmap blocks for the new directory block * allocation btrees */ STATIC uint xfs_calc_addafork_reservation( struct xfs_mount *mp) { return XFS_DQUOT_LOGRES + xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(1, mp->m_dir_geo->blksize) + xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1, XFS_FSB_TO_B(mp, 1)) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1), XFS_FSB_TO_B(mp, 1)); } /* * Removing the attribute fork of a file * the inode being truncated: inode size * the inode's bmap btree: max depth * block size * And the bmap_finish transaction can free the blocks and bmap blocks: * the agf for each of the ags: 4 * sector size * the agfl for each of the ags: 4 * sector size * the super block to reflect the freed blocks: sector size * worst case split in allocation btrees per extent assuming 4 extents: * 4 exts * 2 trees * (2 * max depth - 1) * block size */ STATIC uint xfs_calc_attrinval_reservation( struct xfs_mount *mp) { return max((xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK), XFS_FSB_TO_B(mp, 1))), (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 4), XFS_FSB_TO_B(mp, 1)))); }
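/*
 * Illustrative sketch, not from the original file. The "N exts * 2 trees *
 * (2 * max depth - 1) * block size" terms quoted in the comments above and
 * below model a worst-case split in both free space btrees for each extent
 * worked on: a split can dirty two blocks at every level but one, hence
 * 2 * max depth - 1 blocks per tree. A standalone model of the block count
 * that gets fed to xfs_calc_buf_res():
 */
static unsigned int example_allocfree_blocks(unsigned int num_exts,
					     unsigned int max_depth)
{
	return num_exts * 2 * (2 * max_depth - 1);
}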
/* * Setting an attribute at mount time. * the inode getting the attribute * the superblock for allocations * the agfs extents are allocated from * the attribute btree * max depth * the inode allocation btree * Since attribute transaction space is dependent on the size of the attribute, * the calculation is done partially at mount time and partially at runtime (see * below). */ STATIC uint xfs_calc_attrsetm_reservation( struct xfs_mount *mp) { return XFS_DQUOT_LOGRES + xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH, XFS_FSB_TO_B(mp, 1)); } /* * Setting an attribute at runtime, transaction space unit per block. * the superblock for allocations: sector size * the inode bmap btree could join or split: max depth * block size * Since the runtime attribute transaction space is dependent on the total * blocks needed for the 1st bmap, here we calculate out the space unit for * one block so that the caller could figure out the total space according * to the attribute extent length in blocks by: * ext * M_RES(mp)->tr_attrsetrt.tr_logres */ STATIC uint xfs_calc_attrsetrt_reservation( struct xfs_mount *mp) { return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK), XFS_FSB_TO_B(mp, 1)); } /* * Removing an attribute. * the inode: inode size * the attribute btree could join: max depth * block size * the inode bmap btree could join or split: max depth * block size * And the bmap_finish transaction can free the attr blocks freed giving: * the agf for the ag in which the blocks live: 2 * sector size * the agfl for the ag in which the blocks live: 2 * sector size * the superblock for the free block count: sector size * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size */ STATIC uint xfs_calc_attrrm_reservation( struct xfs_mount *mp) { return XFS_DQUOT_LOGRES + max((xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH, XFS_FSB_TO_B(mp, 1)) + (uint)XFS_FSB_TO_B(mp, XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) + xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)), (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2), XFS_FSB_TO_B(mp, 1)))); } /* * Clearing a bad agino number in an agi hash bucket. */ STATIC uint xfs_calc_clear_agi_bucket_reservation( struct xfs_mount *mp) { return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); } /* * Adjusting quota limits. * the disk quota buffer: sizeof(struct xfs_disk_dquot) */ STATIC uint xfs_calc_qm_setqlim_reservation(void) { return xfs_calc_buf_res(1, sizeof(struct xfs_disk_dquot)); } /* * Allocating quota on disk if needed. * the write transaction log space for quota file extent allocation * the unit of quota allocation: one system block size */ STATIC uint xfs_calc_qm_dqalloc_reservation( struct xfs_mount *mp, bool for_minlogsize) { return xfs_calc_write_reservation(mp, for_minlogsize) + xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, XFS_DQUOT_CLUSTER_SIZE_FSB) - 1); } unsigned int xfs_calc_qm_dqalloc_reservation_minlogsize( struct xfs_mount *mp) { return xfs_calc_qm_dqalloc_reservation(mp, true); } /* * Syncing the incore super block changes to disk. * the super block to reflect the changes: sector size */ STATIC uint xfs_calc_sb_reservation( struct xfs_mount *mp) { return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); } /* * Namespace reservations. * * These get tricky when parent pointers are enabled as we have attribute * modifications occurring from within these transactions. Rather than confuse * each of these reservation calculations with the conditional attribute * reservations, add them here in a clear and concise manner. This requires that * the attribute reservations have already been calculated.
* * Note that we only include the static attribute reservation here; the runtime * reservation will have to be modified by the size of the attributes being * added/removed/modified. See the comments on the attribute reservation * calculations for more details. */ STATIC void xfs_calc_namespace_reservations( struct xfs_mount *mp, struct xfs_trans_resv *resp) { ASSERT(resp->tr_attrsetm.tr_logres > 0); resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp); resp->tr_rename.tr_logcount = xfs_rename_log_count(mp, resp); resp->tr_rename.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_link.tr_logres = xfs_calc_link_reservation(mp); resp->tr_link.tr_logcount = xfs_link_log_count(mp, resp); resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp); resp->tr_remove.tr_logcount = xfs_remove_log_count(mp, resp); resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_symlink.tr_logres = xfs_calc_symlink_reservation(mp); resp->tr_symlink.tr_logcount = xfs_symlink_log_count(mp, resp); resp->tr_symlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_create.tr_logres = xfs_calc_icreate_reservation(mp); resp->tr_create.tr_logcount = xfs_icreate_log_count(mp, resp); resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp); resp->tr_mkdir.tr_logcount = xfs_mkdir_log_count(mp, resp); resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES; } void xfs_trans_resv_calc( struct xfs_mount *mp, struct xfs_trans_resv *resp) { int logcount_adj = 0; /* * The following transactions are logged in physical format and * require a permanent reservation on space. */ resp->tr_write.tr_logres = xfs_calc_write_reservation(mp, false); resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT; resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp, false); resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT; resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_create_tmpfile.tr_logres = xfs_calc_create_tmpfile_reservation(mp); resp->tr_create_tmpfile.tr_logcount = XFS_CREATE_TMPFILE_LOG_COUNT; resp->tr_create_tmpfile.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_ifree.tr_logres = xfs_calc_ifree_reservation(mp); resp->tr_ifree.tr_logcount = XFS_INACTIVE_LOG_COUNT; resp->tr_ifree.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_addafork.tr_logres = xfs_calc_addafork_reservation(mp); resp->tr_addafork.tr_logcount = XFS_ADDAFORK_LOG_COUNT; resp->tr_addafork.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_attrinval.tr_logres = xfs_calc_attrinval_reservation(mp); resp->tr_attrinval.tr_logcount = XFS_ATTRINVAL_LOG_COUNT; resp->tr_attrinval.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_attrsetm.tr_logres = xfs_calc_attrsetm_reservation(mp); resp->tr_attrsetm.tr_logcount = XFS_ATTRSET_LOG_COUNT; resp->tr_attrsetm.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_attrrm.tr_logres = xfs_calc_attrrm_reservation(mp); resp->tr_attrrm.tr_logcount = XFS_ATTRRM_LOG_COUNT; resp->tr_attrrm.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_growrtalloc.tr_logres = xfs_calc_growrtalloc_reservation(mp); resp->tr_growrtalloc.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT; resp->tr_growrtalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES; resp->tr_qm_dqalloc.tr_logres = xfs_calc_qm_dqalloc_reservation(mp, false); resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT; resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES; xfs_calc_namespace_reservations(mp, resp); /* * The 
following transactions are logged in logical format with * a default log count. */ resp->tr_qm_setqlim.tr_logres = xfs_calc_qm_setqlim_reservation(); resp->tr_qm_setqlim.tr_logcount = XFS_DEFAULT_LOG_COUNT; resp->tr_sb.tr_logres = xfs_calc_sb_reservation(mp); resp->tr_sb.tr_logcount = XFS_DEFAULT_LOG_COUNT; /* growdata requires permanent res; it can free space to the last AG */ resp->tr_growdata.tr_logres = xfs_calc_growdata_reservation(mp); resp->tr_growdata.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT; resp->tr_growdata.tr_logflags |= XFS_TRANS_PERM_LOG_RES; /* The following transactions are logged in logical format */ resp->tr_ichange.tr_logres = xfs_calc_ichange_reservation(mp); resp->tr_fsyncts.tr_logres = xfs_calc_swrite_reservation(mp); resp->tr_writeid.tr_logres = xfs_calc_writeid_reservation(mp); resp->tr_attrsetrt.tr_logres = xfs_calc_attrsetrt_reservation(mp); resp->tr_clearagi.tr_logres = xfs_calc_clear_agi_bucket_reservation(mp); resp->tr_growrtzero.tr_logres = xfs_calc_growrtzero_reservation(mp); resp->tr_growrtfree.tr_logres = xfs_calc_growrtfree_reservation(mp); /* * Add one logcount for BUI items that appear with rmap or reflink, * one logcount for refcount intent items, and one logcount for rmap * intent items. */ if (xfs_has_reflink(mp) || xfs_has_rmapbt(mp)) logcount_adj++; if (xfs_has_reflink(mp)) logcount_adj++; if (xfs_has_rmapbt(mp)) logcount_adj++; resp->tr_itruncate.tr_logcount += logcount_adj; resp->tr_write.tr_logcount += logcount_adj; resp->tr_qm_dqalloc.tr_logcount += logcount_adj; }
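/*
 * Illustrative sketch, not from the original file. For permanent
 * reservations the log grants tr_logres bytes for each of tr_logcount
 * expected transaction rolls, which is why the intent item adjustment
 * above bumps tr_logcount rather than inflating tr_logres. A standalone
 * model of the total grant:
 */
struct example_trans_res {
	unsigned int	tr_logres;	/* bytes reserved per roll */
	int		tr_logcount;	/* expected number of rolls */
};

static unsigned int example_total_log_grant(const struct example_trans_res *res)
{
	return res->tr_logres * res->tr_logcount;
}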
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2007 Oracle. All rights reserved. */ #include <linux/fs.h> #include <linux/slab.h> #include <linux/sched.h> #include <linux/sched/mm.h> #include <linux/writeback.h> #include <linux/pagemap.h> #include <linux/blkdev.h> #include <linux/uuid.h> #include <linux/timekeeping.h> #include "misc.h" #include "ctree.h" #include "disk-io.h" #include "transaction.h" #include "locking.h" #include "tree-log.h" #include "volumes.h" #include "dev-replace.h" #include "qgroup.h" #include "block-group.h" #include "space-info.h" #include "fs.h" #include "accessors.h" #include "extent-tree.h" #include "root-tree.h" #include "dir-item.h" #include "uuid-tree.h" #include "ioctl.h" #include "relocation.h" #include "scrub.h" static struct kmem_cache *btrfs_trans_handle_cachep; /* * Transaction states and transitions * * No running transaction (fs tree blocks are not modified) * | * | To next stage: * | Call start_transaction() variants. Except btrfs_join_transaction_nostart(). * V * Transaction N [[TRANS_STATE_RUNNING]] * | * | New trans handles can be attached to transaction N by calling all * | start_transaction() variants. * | * | To next stage: * | Call btrfs_commit_transaction() on any trans handle attached to * | transaction N * V * Transaction N [[TRANS_STATE_COMMIT_PREP]] * | * | If there are simultaneous calls to btrfs_commit_transaction() one will win * | the race and the rest will wait for the winner to commit the transaction. * | * | The winner will wait for previous running transaction to completely finish * | if there is one. * | * Transaction N [[TRANS_STATE_COMMIT_START]] * | * | Then one of the following happens: * | - Wait for all other trans handle holders to release. * | The btrfs_commit_transaction() caller will do the commit work. * | - Wait for current transaction to be committed by others. * | Other btrfs_commit_transaction() caller will do the commit work. * | * | At this stage, only btrfs_join_transaction*() variants can attach * | to this running transaction. * | All other variants will wait for current one to finish and attach to * | transaction N+1. * | * | To next stage: * | Caller is chosen to commit transaction N, and all other trans handles * | have been released. * V * Transaction N [[TRANS_STATE_COMMIT_DOING]] * | * | The heavy lifting transaction work is started. * | From running delayed refs (modifying extent tree) to creating pending * | snapshots, running qgroups. * | In short, modify supporting trees to reflect modifications of subvolume * | trees. * | * | At this stage, all start_transaction() calls will wait for this * | transaction to finish and attach to transaction N+1. * | * | To next stage: * | Until all supporting trees are updated. * V * Transaction N [[TRANS_STATE_UNBLOCKED]] * | Transaction N+1 * | All needed trees are modified, thus we only [[TRANS_STATE_RUNNING]] * | need to write them back to disk and update | * | super blocks. | * | | * | At this stage, new transaction is allowed to | * | start. | * | All new start_transaction() calls will be | * | attached to transid N+1. | * | | * | To next stage: | * | Until all tree blocks and super blocks are | * | written to block devices | * V | * Transaction N [[TRANS_STATE_COMPLETED]] V * All tree blocks and super blocks are written. Transaction N+1 * This transaction is finished and all its [[TRANS_STATE_COMMIT_START]] * data structures will be cleaned up. | Life goes on */
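/*
 * Illustrative sketch, not from the original file. The table below maps
 * each state from the diagram above to the join types that may no longer
 * attach once that state has been reached; join_transaction() rejects such
 * callers with -EBUSY by testing the caller's type against the mask. A
 * standalone model of that test:
 */
static int example_can_attach(unsigned int blocked_mask, unsigned int type)
{
	/* a set bit in common means this join type must wait */
	return (blocked_mask & type) == 0;
}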
static const unsigned int btrfs_blocked_trans_types[TRANS_STATE_MAX] = { [TRANS_STATE_RUNNING] = 0U, [TRANS_STATE_COMMIT_PREP] = 0U, [TRANS_STATE_COMMIT_START] = (__TRANS_START | __TRANS_ATTACH), [TRANS_STATE_COMMIT_DOING] = (__TRANS_START | __TRANS_ATTACH | __TRANS_JOIN | __TRANS_JOIN_NOSTART), [TRANS_STATE_UNBLOCKED] = (__TRANS_START | __TRANS_ATTACH | __TRANS_JOIN | __TRANS_JOIN_NOLOCK | __TRANS_JOIN_NOSTART), [TRANS_STATE_SUPER_COMMITTED] = (__TRANS_START | __TRANS_ATTACH | __TRANS_JOIN | __TRANS_JOIN_NOLOCK | __TRANS_JOIN_NOSTART), [TRANS_STATE_COMPLETED] = (__TRANS_START | __TRANS_ATTACH | __TRANS_JOIN | __TRANS_JOIN_NOLOCK | __TRANS_JOIN_NOSTART), }; void btrfs_put_transaction(struct btrfs_transaction *transaction) { WARN_ON(refcount_read(&transaction->use_count) == 0); if (refcount_dec_and_test(&transaction->use_count)) { BUG_ON(!list_empty(&transaction->list)); WARN_ON(!xa_empty(&transaction->delayed_refs.head_refs)); WARN_ON(!xa_empty(&transaction->delayed_refs.dirty_extents)); if (transaction->delayed_refs.pending_csums) btrfs_err(transaction->fs_info, "pending csums is %llu", transaction->delayed_refs.pending_csums); /* * If any block groups are found in ->deleted_bgs then it's * because the transaction was aborted and a commit did not * happen (things failed before writing the new superblock * and calling btrfs_finish_extent_commit()), so we cannot * discard the physical locations of the block groups.
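* Instead, the loop below just unfreezes and releases each leftover * block group, so its on-disk space is forgotten rather than discarded.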
*/ while (!list_empty(&transaction->deleted_bgs)) { struct btrfs_block_group *cache; cache = list_first_entry(&transaction->deleted_bgs, struct btrfs_block_group, bg_list); list_del_init(&cache->bg_list); btrfs_unfreeze_block_group(cache); btrfs_put_block_group(cache); } WARN_ON(!list_empty(&transaction->dev_update_list)); kfree(transaction); } } static noinline void switch_commit_roots(struct btrfs_trans_handle *trans) { struct btrfs_transaction *cur_trans = trans->transaction; struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_root *root, *tmp; /* * At this point no one can be using this transaction to modify any tree * and no one can start another transaction to modify any tree either. */ ASSERT(cur_trans->state == TRANS_STATE_COMMIT_DOING); down_write(&fs_info->commit_root_sem); if (test_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags)) fs_info->last_reloc_trans = trans->transid; list_for_each_entry_safe(root, tmp, &cur_trans->switch_commits, dirty_list) { list_del_init(&root->dirty_list); free_extent_buffer(root->commit_root); root->commit_root = btrfs_root_node(root); extent_io_tree_release(&root->dirty_log_pages); btrfs_qgroup_clean_swapped_blocks(root); } /* We can free old roots now. */ spin_lock(&cur_trans->dropped_roots_lock); while (!list_empty(&cur_trans->dropped_roots)) { root = list_first_entry(&cur_trans->dropped_roots, struct btrfs_root, root_list); list_del_init(&root->root_list); spin_unlock(&cur_trans->dropped_roots_lock); btrfs_free_log(trans, root); btrfs_drop_and_free_fs_root(fs_info, root); spin_lock(&cur_trans->dropped_roots_lock); } spin_unlock(&cur_trans->dropped_roots_lock); up_write(&fs_info->commit_root_sem); } static inline void extwriter_counter_inc(struct btrfs_transaction *trans, unsigned int type) { if (type & TRANS_EXTWRITERS) atomic_inc(&trans->num_extwriters); } static inline void extwriter_counter_dec(struct btrfs_transaction *trans, unsigned int type) { if (type & TRANS_EXTWRITERS) atomic_dec(&trans->num_extwriters); } static inline void extwriter_counter_init(struct btrfs_transaction *trans, unsigned int type) { atomic_set(&trans->num_extwriters, ((type & TRANS_EXTWRITERS) ? 1 : 0)); } static inline int extwriter_counter_read(struct btrfs_transaction *trans) { return atomic_read(&trans->num_extwriters); } /* * To be called after doing the chunk btree updates right after allocating a new * chunk (after btrfs_chunk_alloc_add_chunk_item() is called), when removing a * chunk after all chunk btree updates and after finishing the second phase of * chunk allocation (btrfs_create_pending_block_groups()) in case some block * group had its chunk item insertion delayed to the second phase. */ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; if (!trans->chunk_bytes_reserved) return; btrfs_block_rsv_release(fs_info, &fs_info->chunk_block_rsv, trans->chunk_bytes_reserved, NULL); trans->chunk_bytes_reserved = 0; } /* * either allocate a new transaction or hop into the existing one */ static noinline int join_transaction(struct btrfs_fs_info *fs_info, unsigned int type) { struct btrfs_transaction *cur_trans; spin_lock(&fs_info->trans_lock); loop: /* The file system has been taken offline. No new transactions. 
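* Fail any join attempt with -EROFS so callers bail out instead of spinning.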
*/ if (BTRFS_FS_ERROR(fs_info)) { spin_unlock(&fs_info->trans_lock); return -EROFS; } cur_trans = fs_info->running_transaction; if (cur_trans) { if (TRANS_ABORTED(cur_trans)) { const int abort_error = cur_trans->aborted; spin_unlock(&fs_info->trans_lock); return abort_error; } if (btrfs_blocked_trans_types[cur_trans->state] & type) { spin_unlock(&fs_info->trans_lock); return -EBUSY; } refcount_inc(&cur_trans->use_count); atomic_inc(&cur_trans->num_writers); extwriter_counter_inc(cur_trans, type); spin_unlock(&fs_info->trans_lock); btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers); btrfs_lockdep_acquire(fs_info, btrfs_trans_num_extwriters); return 0; } spin_unlock(&fs_info->trans_lock); /* * If we are ATTACH or TRANS_JOIN_NOSTART, we just want to catch the * current transaction, and commit it. If there is no transaction, just * return ENOENT. */ if (type == TRANS_ATTACH || type == TRANS_JOIN_NOSTART) return -ENOENT; /* * JOIN_NOLOCK only happens during the transaction commit, so * it is impossible that ->running_transaction is NULL */ BUG_ON(type == TRANS_JOIN_NOLOCK); cur_trans = kmalloc(sizeof(*cur_trans), GFP_NOFS); if (!cur_trans) return -ENOMEM; btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers); btrfs_lockdep_acquire(fs_info, btrfs_trans_num_extwriters); spin_lock(&fs_info->trans_lock); if (fs_info->running_transaction) { /* * someone started a transaction after we unlocked. Make sure * to redo the checks above */ btrfs_lockdep_release(fs_info, btrfs_trans_num_extwriters); btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); kfree(cur_trans); goto loop; } else if (BTRFS_FS_ERROR(fs_info)) { spin_unlock(&fs_info->trans_lock); btrfs_lockdep_release(fs_info, btrfs_trans_num_extwriters); btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); kfree(cur_trans); return -EROFS; } cur_trans->fs_info = fs_info; atomic_set(&cur_trans->pending_ordered, 0); init_waitqueue_head(&cur_trans->pending_wait); atomic_set(&cur_trans->num_writers, 1); extwriter_counter_init(cur_trans, type); init_waitqueue_head(&cur_trans->writer_wait); init_waitqueue_head(&cur_trans->commit_wait); cur_trans->state = TRANS_STATE_RUNNING; /* * One for this trans handle, one so it will live on until we * commit the transaction. */ refcount_set(&cur_trans->use_count, 2); cur_trans->flags = 0; cur_trans->start_time = ktime_get_seconds(); memset(&cur_trans->delayed_refs, 0, sizeof(cur_trans->delayed_refs)); xa_init(&cur_trans->delayed_refs.head_refs); xa_init(&cur_trans->delayed_refs.dirty_extents); /* * although the tree mod log is per file system and not per transaction, * the log must never go across transaction boundaries. 
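* The checks below warn if any tree mod log users or entries are left * over at this point, since those would be spanning two transactions.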
*/ smp_mb(); if (!list_empty(&fs_info->tree_mod_seq_list)) WARN(1, KERN_ERR "BTRFS: tree_mod_seq_list not empty when creating a fresh transaction\n"); if (!RB_EMPTY_ROOT(&fs_info->tree_mod_log)) WARN(1, KERN_ERR "BTRFS: tree_mod_log rb tree not empty when creating a fresh transaction\n"); atomic64_set(&fs_info->tree_mod_seq, 0); spin_lock_init(&cur_trans->delayed_refs.lock); INIT_LIST_HEAD(&cur_trans->pending_snapshots); INIT_LIST_HEAD(&cur_trans->dev_update_list); INIT_LIST_HEAD(&cur_trans->switch_commits); INIT_LIST_HEAD(&cur_trans->dirty_bgs); INIT_LIST_HEAD(&cur_trans->io_bgs); INIT_LIST_HEAD(&cur_trans->dropped_roots); mutex_init(&cur_trans->cache_write_mutex); spin_lock_init(&cur_trans->dirty_bgs_lock); INIT_LIST_HEAD(&cur_trans->deleted_bgs); spin_lock_init(&cur_trans->dropped_roots_lock); list_add_tail(&cur_trans->list, &fs_info->trans_list); extent_io_tree_init(fs_info, &cur_trans->dirty_pages, IO_TREE_TRANS_DIRTY_PAGES); extent_io_tree_init(fs_info, &cur_trans->pinned_extents, IO_TREE_FS_PINNED_EXTENTS); btrfs_set_fs_generation(fs_info, fs_info->generation + 1); cur_trans->transid = fs_info->generation; fs_info->running_transaction = cur_trans; cur_trans->aborted = 0; spin_unlock(&fs_info->trans_lock); return 0; } /* * This does all the record keeping required to make sure that a shareable root * is properly recorded in a given transaction. This is required to make sure * the old root from before we joined the transaction is deleted when the * transaction commits. */ static int record_root_in_trans(struct btrfs_trans_handle *trans, struct btrfs_root *root, int force) { struct btrfs_fs_info *fs_info = root->fs_info; int ret = 0; if ((test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && btrfs_get_root_last_trans(root) < trans->transid) || force) { WARN_ON(!force && root->commit_root != root->node); /* * see below for IN_TRANS_SETUP usage rules * we have the reloc mutex held now, so there * is only one writer in this function */ set_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state); /* make sure readers find IN_TRANS_SETUP before * they find our root->last_trans update */ smp_wmb(); spin_lock(&fs_info->fs_roots_radix_lock); if (btrfs_get_root_last_trans(root) == trans->transid && !force) { spin_unlock(&fs_info->fs_roots_radix_lock); return 0; } radix_tree_tag_set(&fs_info->fs_roots_radix, (unsigned long)btrfs_root_id(root), BTRFS_ROOT_TRANS_TAG); spin_unlock(&fs_info->fs_roots_radix_lock); btrfs_set_root_last_trans(root, trans->transid); /* this is pretty tricky. We don't want to * take the relocation lock in btrfs_record_root_in_trans * unless we're really doing the first setup for this root in * this transaction. * * Normally we'd use root->last_trans as a flag to decide * if we want to take the expensive mutex. * * But, we have to set root->last_trans before we * init the relocation root, otherwise, we trip over warnings * in ctree.c. The solution used here is to flag ourselves * with root IN_TRANS_SETUP. When this is 1, we're still * fixing up the reloc trees and everyone must wait. * * When this is zero, they can trust root->last_trans and fly * through btrfs_record_root_in_trans without having to take the * lock. 
smp_wmb() makes sure that all the writes above are * done before we pop in the zero below */ ret = btrfs_init_reloc_root(trans, root); smp_mb__before_atomic(); clear_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state); } return ret; } void btrfs_add_dropped_root(struct btrfs_trans_handle *trans, struct btrfs_root *root) { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_transaction *cur_trans = trans->transaction; /* Add ourselves to the transaction dropped list */ spin_lock(&cur_trans->dropped_roots_lock); list_add_tail(&root->root_list, &cur_trans->dropped_roots); spin_unlock(&cur_trans->dropped_roots_lock); /* Make sure we don't try to update the root at commit time */ spin_lock(&fs_info->fs_roots_radix_lock); radix_tree_tag_clear(&fs_info->fs_roots_radix, (unsigned long)btrfs_root_id(root), BTRFS_ROOT_TRANS_TAG); spin_unlock(&fs_info->fs_roots_radix_lock); } int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans, struct btrfs_root *root) { struct btrfs_fs_info *fs_info = root->fs_info; int ret; if (!test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) return 0; /* * see record_root_in_trans for comments about IN_TRANS_SETUP usage * and barriers */ smp_rmb(); if (btrfs_get_root_last_trans(root) == trans->transid && !test_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state)) return 0; mutex_lock(&fs_info->reloc_mutex); ret = record_root_in_trans(trans, root, 0); mutex_unlock(&fs_info->reloc_mutex); return ret; } static inline int is_transaction_blocked(struct btrfs_transaction *trans) { return (trans->state >= TRANS_STATE_COMMIT_START && trans->state < TRANS_STATE_UNBLOCKED && !TRANS_ABORTED(trans)); } /* wait for commit against the current transaction to become unblocked * when this is done, it is safe to start a new transaction, but the current * transaction might not be fully on disk. */ static void wait_current_trans(struct btrfs_fs_info *fs_info) { struct btrfs_transaction *cur_trans; spin_lock(&fs_info->trans_lock); cur_trans = fs_info->running_transaction; if (cur_trans && is_transaction_blocked(cur_trans)) { refcount_inc(&cur_trans->use_count); spin_unlock(&fs_info->trans_lock); btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED); wait_event(fs_info->transaction_wait, cur_trans->state >= TRANS_STATE_UNBLOCKED || TRANS_ABORTED(cur_trans)); btrfs_put_transaction(cur_trans); } else { spin_unlock(&fs_info->trans_lock); } } static int may_wait_transaction(struct btrfs_fs_info *fs_info, int type) { if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) return 0; if (type == TRANS_START) return 1; return 0; } static inline bool need_reserve_reloc_root(struct btrfs_root *root) { struct btrfs_fs_info *fs_info = root->fs_info; if (!fs_info->reloc_ctl || !test_bit(BTRFS_ROOT_SHAREABLE, &root->state) || btrfs_root_id(root) == BTRFS_TREE_RELOC_OBJECTID || root->reloc_root) return false; return true; } static int btrfs_reserve_trans_metadata(struct btrfs_fs_info *fs_info, enum btrfs_reserve_flush_enum flush, u64 num_bytes, u64 *delayed_refs_bytes) { struct btrfs_space_info *si = fs_info->trans_block_rsv.space_info; u64 bytes = num_bytes + *delayed_refs_bytes; int ret; /* * We want to reserve all the bytes we may need all at once, so we only * do 1 enospc flushing cycle per transaction start. */ ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush); /* * If we are an emergency flush, which can steal from the global block * reserve, then attempt to not reserve space for the delayed refs, as * we will consume space for them from the global block reserve. 
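* The retry below therefore zeroes *delayed_refs_bytes first, so the * caller can tell that nothing was set aside for the delayed refs.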
*/ if (ret && flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) { bytes -= *delayed_refs_bytes; *delayed_refs_bytes = 0; ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush); } return ret; }
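/*
 * Illustrative sketch, not from the original file. start_transaction()
 * below sizes its reservation per btree item as a worst-case CoW of a
 * full tree path, roughly nodesize * max tree height * 2 bytes per item
 * (see btrfs_calc_insert_metadata_size()). A simplified standalone model
 * of that estimate:
 */
#define EXAMPLE_BTRFS_MAX_LEVEL	8	/* btrfs trees have at most 8 levels */

static unsigned long long example_insert_metadata_size(unsigned int nodesize,
							unsigned int num_items)
{
	/* two full paths' worth of nodes per item, to cover splits */
	return (unsigned long long)nodesize * 2 * EXAMPLE_BTRFS_MAX_LEVEL * num_items;
}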
static struct btrfs_trans_handle * start_transaction(struct btrfs_root *root, unsigned int num_items, unsigned int type, enum btrfs_reserve_flush_enum flush, bool enforce_qgroups) { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv; struct btrfs_block_rsv *trans_rsv = &fs_info->trans_block_rsv; struct btrfs_trans_handle *h; struct btrfs_transaction *cur_trans; u64 num_bytes = 0; u64 qgroup_reserved = 0; u64 delayed_refs_bytes = 0; bool reloc_reserved = false; bool do_chunk_alloc = false; int ret; if (BTRFS_FS_ERROR(fs_info)) return ERR_PTR(-EROFS); if (current->journal_info) { WARN_ON(type & TRANS_EXTWRITERS); h = current->journal_info; refcount_inc(&h->use_count); WARN_ON(refcount_read(&h->use_count) > 2); h->orig_rsv = h->block_rsv; h->block_rsv = NULL; goto got_it; } /* * Do the reservation before we join the transaction so we can do all * the appropriate flushing if need be. */ if (num_items && root != fs_info->chunk_root) { qgroup_reserved = num_items * fs_info->nodesize; /* * Use prealloc for now, as there might be a currently running * transaction that could free this reserved space prematurely * by committing. */ ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserved, enforce_qgroups, false); if (ret) return ERR_PTR(ret); num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_items); /* * If we plan to insert/update/delete "num_items" from a btree, * we will also generate delayed refs for extent buffers in the * respective btree paths, so reserve space for the delayed refs * that will be generated by the caller as it modifies btrees. * Try to reserve them to avoid excessive use of the global * block reserve. */ delayed_refs_bytes = btrfs_calc_delayed_ref_bytes(fs_info, num_items); /* * Do the reservation for the relocation root creation */ if (need_reserve_reloc_root(root)) { num_bytes += fs_info->nodesize; reloc_reserved = true; } ret = btrfs_reserve_trans_metadata(fs_info, flush, num_bytes, &delayed_refs_bytes); if (ret) goto reserve_fail; btrfs_block_rsv_add_bytes(trans_rsv, num_bytes, true); if (trans_rsv->space_info->force_alloc) do_chunk_alloc = true; } else if (num_items == 0 && flush == BTRFS_RESERVE_FLUSH_ALL && !btrfs_block_rsv_full(delayed_refs_rsv)) { /* * Some people call with btrfs_start_transaction(root, 0) * because they can be throttled, but have some other mechanism * for reserving space. We still want these guys to refill the * delayed block_rsv so just add one item's worth of reservation * here. */ ret = btrfs_delayed_refs_rsv_refill(fs_info, flush); if (ret) goto reserve_fail; } again: h = kmem_cache_zalloc(btrfs_trans_handle_cachep, GFP_NOFS); if (!h) { ret = -ENOMEM; goto alloc_fail; } /* * If we are JOIN_NOLOCK we're already committing a transaction and * waiting on this guy, so we don't need to do the sb_start_intwrite * because we're already holding a ref. We need this because we could * have raced in and did an fsync() on a file which can kick a commit * and then we deadlock with somebody doing a freeze. * * If we are ATTACH, it means we just want to catch the current * transaction and commit it, so we needn't do sb_start_intwrite(). */ if (type & __TRANS_FREEZABLE) sb_start_intwrite(fs_info->sb); if (may_wait_transaction(fs_info, type)) wait_current_trans(fs_info); do { ret = join_transaction(fs_info, type); if (ret == -EBUSY) { wait_current_trans(fs_info); if (unlikely(type == TRANS_ATTACH || type == TRANS_JOIN_NOSTART)) ret = -ENOENT; } } while (ret == -EBUSY); if (ret < 0) goto join_fail; cur_trans = fs_info->running_transaction; h->transid = cur_trans->transid; h->transaction = cur_trans; refcount_set(&h->use_count, 1); h->fs_info = root->fs_info; h->type = type; INIT_LIST_HEAD(&h->new_bgs); btrfs_init_metadata_block_rsv(fs_info, &h->delayed_rsv, BTRFS_BLOCK_RSV_DELOPS); smp_mb(); if (cur_trans->state >= TRANS_STATE_COMMIT_START && may_wait_transaction(fs_info, type)) { current->journal_info = h; btrfs_commit_transaction(h); goto again; } if (num_bytes) { trace_btrfs_space_reservation(fs_info, "transaction", h->transid, num_bytes, 1); h->block_rsv = trans_rsv; h->bytes_reserved = num_bytes; if (delayed_refs_bytes > 0) { trace_btrfs_space_reservation(fs_info, "local_delayed_refs_rsv", h->transid, delayed_refs_bytes, 1); h->delayed_refs_bytes_reserved = delayed_refs_bytes; btrfs_block_rsv_add_bytes(&h->delayed_rsv, delayed_refs_bytes, true); delayed_refs_bytes = 0; } h->reloc_reserved = reloc_reserved; } got_it: if (!current->journal_info) current->journal_info = h; /* * If the space_info is marked ALLOC_FORCE then we'll get upgraded to * ALLOC_FORCE the first run through, and then we won't allocate for * anybody else who races in later. We don't care about the return * value here. */ if (do_chunk_alloc && num_bytes) { u64 flags = h->block_rsv->space_info->flags; btrfs_chunk_alloc(h, btrfs_get_alloc_profile(fs_info, flags), CHUNK_ALLOC_NO_FORCE); } /* * btrfs_record_root_in_trans() needs to alloc new extents, and may * call btrfs_join_transaction() while we're also starting a * transaction. * * Thus it needs to be called after current->journal_info is initialized, * or we can deadlock. */ ret = btrfs_record_root_in_trans(h, root); if (ret) { /* * The transaction handle is fully initialized and linked with * other structures so it needs to be ended in case of errors, * not just freed. */ btrfs_end_transaction(h); goto reserve_fail; } /* * Now that we have found a transaction to be a part of, convert the * qgroup reservation from prealloc to pertrans. A different transaction * can't race in and free our pertrans out from under us. */ if (qgroup_reserved) btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved); return h; join_fail: if (type & __TRANS_FREEZABLE) sb_end_intwrite(fs_info->sb); kmem_cache_free(btrfs_trans_handle_cachep, h); alloc_fail: if (num_bytes) btrfs_block_rsv_release(fs_info, trans_rsv, num_bytes, NULL); if (delayed_refs_bytes) btrfs_space_info_free_bytes_may_use(trans_rsv->space_info, delayed_refs_bytes); reserve_fail: btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved); return ERR_PTR(ret); }
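/*
 * Illustrative usage sketch, not from the original file. A typical caller
 * of the wrappers below reserves space for the number of btree items it
 * intends to touch, makes its modifications, and always ends the handle;
 * the two-item count is a hypothetical figure standing in for e.g. one
 * inode update plus one dir item:
 */
static int example_transaction_user(struct btrfs_root *root)
{
	struct btrfs_trans_handle *trans;

	trans = btrfs_start_transaction(root, 2);
	if (IS_ERR(trans))
		return PTR_ERR(trans);

	/* ... modify up to two btree items here ... */

	return btrfs_end_transaction(trans);
}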
struct btrfs_trans_handle *btrfs_start_transaction(struct btrfs_root *root, unsigned int num_items) { return start_transaction(root, num_items, TRANS_START, BTRFS_RESERVE_FLUSH_ALL, true); } struct btrfs_trans_handle *btrfs_start_transaction_fallback_global_rsv( struct btrfs_root *root, unsigned int num_items) { return start_transaction(root, num_items, TRANS_START, BTRFS_RESERVE_FLUSH_ALL_STEAL, false); } struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root) { return start_transaction(root, 0, TRANS_JOIN, BTRFS_RESERVE_NO_FLUSH, true); } struct btrfs_trans_handle *btrfs_join_transaction_spacecache(struct btrfs_root *root) { return start_transaction(root, 0, TRANS_JOIN_NOLOCK, BTRFS_RESERVE_NO_FLUSH, true); } /* * Similar to regular join but it never starts a transaction when none is * running or when there's a running one at a state >= TRANS_STATE_UNBLOCKED. * This is similar to btrfs_attach_transaction() but it allows the join to * happen if the transaction commit already started but it's not yet in the * "doing" phase (the state is < TRANS_STATE_COMMIT_DOING). */ struct btrfs_trans_handle *btrfs_join_transaction_nostart(struct btrfs_root *root) { return start_transaction(root, 0, TRANS_JOIN_NOSTART, BTRFS_RESERVE_NO_FLUSH, true); } /* * Catch the running transaction. * * It is used when we want to commit the current transaction, but * don't want to start a new one. * * Note: If this function returns -ENOENT, it just means there is no * running transaction. But it is possible that the inactive transaction * is still in the memory, not fully on disk. If you hope there is no * inactive transaction in the fs when -ENOENT is returned, you should * invoke btrfs_attach_transaction_barrier(). */ struct btrfs_trans_handle *btrfs_attach_transaction(struct btrfs_root *root) { return start_transaction(root, 0, TRANS_ATTACH, BTRFS_RESERVE_NO_FLUSH, true); } /* * Catch the running transaction. * * It is similar to the above function, the difference is this one * will wait for all the inactive transactions until they fully * complete. */ struct btrfs_trans_handle * btrfs_attach_transaction_barrier(struct btrfs_root *root) { struct btrfs_trans_handle *trans; trans = start_transaction(root, 0, TRANS_ATTACH, BTRFS_RESERVE_NO_FLUSH, true); if (trans == ERR_PTR(-ENOENT)) { int ret; ret = btrfs_wait_for_commit(root->fs_info, 0); if (ret) return ERR_PTR(ret); } return trans; } /* Wait for a transaction commit to reach at least the given state. */ static noinline void wait_for_commit(struct btrfs_transaction *commit, const enum btrfs_trans_state min_state) { struct btrfs_fs_info *fs_info = commit->fs_info; u64 transid = commit->transid; bool put = false; /* * At the moment this function is called with min_state either being * TRANS_STATE_COMPLETED or TRANS_STATE_SUPER_COMMITTED.
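* Each of those states has its own lockdep map, so the matching * might-wait annotation is chosen below before blocking.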
*/ if (min_state == TRANS_STATE_COMPLETED) btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED); else btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED); while (1) { wait_event(commit->commit_wait, commit->state >= min_state); if (put) btrfs_put_transaction(commit); if (min_state < TRANS_STATE_COMPLETED) break; /* * A transaction isn't really completed until all of the * previous transactions are completed, but with fsync we can * end up with SUPER_COMMITTED transactions before a COMPLETED * transaction. Wait for those. */ spin_lock(&fs_info->trans_lock); commit = list_first_entry_or_null(&fs_info->trans_list, struct btrfs_transaction, list); if (!commit || commit->transid > transid) { spin_unlock(&fs_info->trans_lock); break; } refcount_inc(&commit->use_count); put = true; spin_unlock(&fs_info->trans_lock); } } int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid) { struct btrfs_transaction *cur_trans = NULL, *t; int ret = 0; if (transid) { if (transid <= btrfs_get_last_trans_committed(fs_info)) goto out; /* find specified transaction */ spin_lock(&fs_info->trans_lock); list_for_each_entry(t, &fs_info->trans_list, list) { if (t->transid == transid) { cur_trans = t; refcount_inc(&cur_trans->use_count); ret = 0; break; } if (t->transid > transid) { ret = 0; break; } } spin_unlock(&fs_info->trans_lock); /* * The specified transaction doesn't exist, or we * raced with btrfs_commit_transaction */ if (!cur_trans) { if (transid > btrfs_get_last_trans_committed(fs_info)) ret = -EINVAL; goto out; } } else { /* find newest transaction that is committing | committed */ spin_lock(&fs_info->trans_lock); list_for_each_entry_reverse(t, &fs_info->trans_list, list) { if (t->state >= TRANS_STATE_COMMIT_START) { if (t->state == TRANS_STATE_COMPLETED) break; cur_trans = t; refcount_inc(&cur_trans->use_count); break; } } spin_unlock(&fs_info->trans_lock); if (!cur_trans) goto out; /* nothing committing|committed */ } wait_for_commit(cur_trans, TRANS_STATE_COMPLETED); ret = cur_trans->aborted; btrfs_put_transaction(cur_trans); out: return ret; } void btrfs_throttle(struct btrfs_fs_info *fs_info) { wait_current_trans(fs_info); } bool btrfs_should_end_transaction(struct btrfs_trans_handle *trans) { struct btrfs_transaction *cur_trans = trans->transaction; if (cur_trans->state >= TRANS_STATE_COMMIT_START || test_bit(BTRFS_DELAYED_REFS_FLUSHING, &cur_trans->delayed_refs.flags)) return true; if (btrfs_check_space_for_delayed_refs(trans->fs_info)) return true; return !!btrfs_block_rsv_check(&trans->fs_info->global_block_rsv, 50); } static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; if (!trans->block_rsv) { ASSERT(!trans->bytes_reserved); ASSERT(!trans->delayed_refs_bytes_reserved); return; } if (!trans->bytes_reserved) { ASSERT(!trans->delayed_refs_bytes_reserved); return; } ASSERT(trans->block_rsv == &fs_info->trans_block_rsv); trace_btrfs_space_reservation(fs_info, "transaction", trans->transid, trans->bytes_reserved, 0); btrfs_block_rsv_release(fs_info, trans->block_rsv, trans->bytes_reserved, NULL); trans->bytes_reserved = 0; if (!trans->delayed_refs_bytes_reserved) return; trace_btrfs_space_reservation(fs_info, "local_delayed_refs_rsv", trans->transid, trans->delayed_refs_bytes_reserved, 0); btrfs_block_rsv_release(fs_info, &trans->delayed_rsv, trans->delayed_refs_bytes_reserved, NULL); trans->delayed_refs_bytes_reserved = 0; } static int __btrfs_end_transaction(struct btrfs_trans_handle 
*trans, int throttle) { struct btrfs_fs_info *info = trans->fs_info; struct btrfs_transaction *cur_trans = trans->transaction; int ret = 0; if (refcount_read(&trans->use_count) > 1) { refcount_dec(&trans->use_count); trans->block_rsv = trans->orig_rsv; return 0; } btrfs_trans_release_metadata(trans); trans->block_rsv = NULL; btrfs_create_pending_block_groups(trans); btrfs_trans_release_chunk_metadata(trans); if (trans->type & __TRANS_FREEZABLE) sb_end_intwrite(info->sb); WARN_ON(cur_trans != info->running_transaction); WARN_ON(atomic_read(&cur_trans->num_writers) < 1); atomic_dec(&cur_trans->num_writers); extwriter_counter_dec(cur_trans, trans->type); cond_wake_up(&cur_trans->writer_wait); btrfs_lockdep_release(info, btrfs_trans_num_extwriters); btrfs_lockdep_release(info, btrfs_trans_num_writers); btrfs_put_transaction(cur_trans); if (current->journal_info == trans) current->journal_info = NULL; if (throttle) btrfs_run_delayed_iputs(info); if (TRANS_ABORTED(trans) || BTRFS_FS_ERROR(info)) { wake_up_process(info->transaction_kthread); if (TRANS_ABORTED(trans)) ret = trans->aborted; else ret = -EROFS; } kmem_cache_free(btrfs_trans_handle_cachep, trans); return ret; } int btrfs_end_transaction(struct btrfs_trans_handle *trans) { return __btrfs_end_transaction(trans, 0); } int btrfs_end_transaction_throttle(struct btrfs_trans_handle *trans) { return __btrfs_end_transaction(trans, 1); } /* * when btree blocks are allocated, they have some corresponding bits set for * them in one of two extent_io trees. This is used to make sure all of * those extents are sent to disk but does not wait on them */ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info, struct extent_io_tree *dirty_pages, int mark) { int ret = 0; struct address_space *mapping = fs_info->btree_inode->i_mapping; struct extent_state *cached_state = NULL; u64 start = 0; u64 end; while (find_first_extent_bit(dirty_pages, start, &start, &end, mark, &cached_state)) { bool wait_writeback = false; ret = convert_extent_bit(dirty_pages, start, end, EXTENT_NEED_WAIT, mark, &cached_state); /* * convert_extent_bit can return -ENOMEM, which is most of the * time a temporary error. So when it happens, ignore the error * and wait for writeback of this range to finish - because we * failed to set the bit EXTENT_NEED_WAIT for the range, a call * to __btrfs_wait_marked_extents() would not know that * writeback for this range started and therefore wouldn't * wait for it to finish - we don't want to commit a * superblock that points to btree nodes/leaves for which * writeback hasn't finished yet (and without errors). * We clean up any entries left in the io tree when committing * the transaction (through extent_io_tree_release()). */ if (ret == -ENOMEM) { ret = 0; wait_writeback = true; } if (!ret) ret = filemap_fdatawrite_range(mapping, start, end); if (!ret && wait_writeback) ret = filemap_fdatawait_range(mapping, start, end); free_extent_state(cached_state); if (ret) break; cached_state = NULL; cond_resched(); start = end + 1; } return ret; } /* * when btree blocks are allocated, they have some corresponding bits set for * them in one of two extent_io trees. This is used to make sure all of * those extents are on disk for transaction or log commit.
We wait * on all the pages and clear them from the dirty pages state tree */ static int __btrfs_wait_marked_extents(struct btrfs_fs_info *fs_info, struct extent_io_tree *dirty_pages) { struct address_space *mapping = fs_info->btree_inode->i_mapping; struct extent_state *cached_state = NULL; u64 start = 0; u64 end; int ret = 0; while (find_first_extent_bit(dirty_pages, start, &start, &end, EXTENT_NEED_WAIT, &cached_state)) { /* * Ignore -ENOMEM errors returned by clear_extent_bit(). * When committing the transaction, we'll remove any entries * left in the io tree. For a log commit, we don't remove them * after committing the log because the tree can be accessed * concurrently - we do it only at transaction commit time when * it's safe to do it (through extent_io_tree_release()). */ ret = clear_extent_bit(dirty_pages, start, end, EXTENT_NEED_WAIT, &cached_state); if (ret == -ENOMEM) ret = 0; if (!ret) ret = filemap_fdatawait_range(mapping, start, end); free_extent_state(cached_state); if (ret) break; cached_state = NULL; cond_resched(); start = end + 1; } return ret; } static int btrfs_wait_extents(struct btrfs_fs_info *fs_info, struct extent_io_tree *dirty_pages) { bool errors = false; int err; err = __btrfs_wait_marked_extents(fs_info, dirty_pages); if (test_and_clear_bit(BTRFS_FS_BTREE_ERR, &fs_info->flags)) errors = true; if (errors && !err) err = -EIO; return err; } int btrfs_wait_tree_log_extents(struct btrfs_root *log_root, int mark) { struct btrfs_fs_info *fs_info = log_root->fs_info; struct extent_io_tree *dirty_pages = &log_root->dirty_log_pages; bool errors = false; int err; ASSERT(btrfs_root_id(log_root) == BTRFS_TREE_LOG_OBJECTID); err = __btrfs_wait_marked_extents(fs_info, dirty_pages); if ((mark & EXTENT_DIRTY) && test_and_clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags)) errors = true; if ((mark & EXTENT_NEW) && test_and_clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags)) errors = true; if (errors && !err) err = -EIO; return err; } /* * When btree blocks are allocated the corresponding extents are marked dirty. * This function ensures such extents are persisted on disk for transaction or * log commit. * * @trans: transaction whose dirty pages we'd like to write */ static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans) { int ret; int ret2; struct extent_io_tree *dirty_pages = &trans->transaction->dirty_pages; struct btrfs_fs_info *fs_info = trans->fs_info; struct blk_plug plug; blk_start_plug(&plug); ret = btrfs_write_marked_extents(fs_info, dirty_pages, EXTENT_DIRTY); blk_finish_plug(&plug); ret2 = btrfs_wait_extents(fs_info, dirty_pages); extent_io_tree_release(&trans->transaction->dirty_pages); if (ret) return ret; else if (ret2) return ret2; else return 0; } /* * this is used to update the root pointer in the tree of tree roots. * * But, in the case of the extent allocation tree, updating the root * pointer may allocate blocks which may change the root of the extent * allocation tree. * * So, this loops and repeats and makes sure the cowonly root didn't * change while the root pointer was being updated in the metadata. 
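 * Concretely, the loop below re-reads the root item after each btrfs_update_root() call and exits only once the item's bytenr matches the current node and the used count did not change across the update.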
*/ static int update_cowonly_root(struct btrfs_trans_handle *trans, struct btrfs_root *root) { int ret; u64 old_root_bytenr; u64 old_root_used; struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *tree_root = fs_info->tree_root; old_root_used = btrfs_root_used(&root->root_item); while (1) { old_root_bytenr = btrfs_root_bytenr(&root->root_item); if (old_root_bytenr == root->node->start && old_root_used == btrfs_root_used(&root->root_item)) break; btrfs_set_root_node(&root->root_item, root->node); ret = btrfs_update_root(trans, tree_root, &root->root_key, &root->root_item); if (ret) return ret; old_root_used = btrfs_root_used(&root->root_item); } return 0; } /* * update all the cowonly tree roots on disk * * The error handling in this function may not be obvious. Any of the * failures will cause the file system to go offline. We still need * to clean up the delayed refs. */ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; struct list_head *dirty_bgs = &trans->transaction->dirty_bgs; struct list_head *io_bgs = &trans->transaction->io_bgs; struct list_head *next; struct extent_buffer *eb; int ret; /* * At this point no one can be using this transaction to modify any tree * and no one can start another transaction to modify any tree either. */ ASSERT(trans->transaction->state == TRANS_STATE_COMMIT_DOING); eb = btrfs_lock_root_node(fs_info->tree_root); ret = btrfs_cow_block(trans, fs_info->tree_root, eb, NULL, 0, &eb, BTRFS_NESTING_COW); btrfs_tree_unlock(eb); free_extent_buffer(eb); if (ret) return ret; ret = btrfs_run_dev_stats(trans); if (ret) return ret; ret = btrfs_run_dev_replace(trans); if (ret) return ret; ret = btrfs_run_qgroups(trans); if (ret) return ret; ret = btrfs_setup_space_cache(trans); if (ret) return ret; again: while (!list_empty(&fs_info->dirty_cowonly_roots)) { struct btrfs_root *root; next = fs_info->dirty_cowonly_roots.next; list_del_init(next); root = list_entry(next, struct btrfs_root, dirty_list); clear_bit(BTRFS_ROOT_DIRTY, &root->state); list_add_tail(&root->dirty_list, &trans->transaction->switch_commits); ret = update_cowonly_root(trans, root); if (ret) return ret; } /* Now flush any delayed refs generated by updating all of the roots */ ret = btrfs_run_delayed_refs(trans, U64_MAX); if (ret) return ret; while (!list_empty(dirty_bgs) || !list_empty(io_bgs)) { ret = btrfs_write_dirty_block_groups(trans); if (ret) return ret; /* * We're writing the dirty block groups, which could generate * delayed refs, which could generate more dirty block groups, * so we want to keep this flushing in this loop to make sure * everything gets run. */ ret = btrfs_run_delayed_refs(trans, U64_MAX); if (ret) return ret; } if (!list_empty(&fs_info->dirty_cowonly_roots)) goto again; /* Update dev-replace pointer once everything is committed */ fs_info->dev_replace.committed_cursor_left = fs_info->dev_replace.cursor_left_last_write_of_item; return 0; } /* * If we had a pending drop we need to see if there are any others left in our * dead roots list, and if not clear our bit and wake any waiters. */ void btrfs_maybe_wake_unfinished_drop(struct btrfs_fs_info *fs_info) { /* * We put the drop in progress roots at the front of the list, so if the * first entry doesn't have UNFINISHED_DROP set we can wake everybody * up. 
*/ spin_lock(&fs_info->trans_lock); if (!list_empty(&fs_info->dead_roots)) { struct btrfs_root *root = list_first_entry(&fs_info->dead_roots, struct btrfs_root, root_list); if (test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state)) { spin_unlock(&fs_info->trans_lock); return; } } spin_unlock(&fs_info->trans_lock); btrfs_wake_unfinished_drop(fs_info); } /* * dead roots are old snapshots that need to be deleted. This allocates * a dirty root struct and adds it into the list of dead roots that need to * be deleted */ void btrfs_add_dead_root(struct btrfs_root *root) { struct btrfs_fs_info *fs_info = root->fs_info; spin_lock(&fs_info->trans_lock); if (list_empty(&root->root_list)) { btrfs_grab_root(root); /* We want to process the partially complete drops first. */ if (test_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state)) list_add(&root->root_list, &fs_info->dead_roots); else list_add_tail(&root->root_list, &fs_info->dead_roots); } spin_unlock(&fs_info->trans_lock); } /* * Update each subvolume root and its relocation root, if it exists, in the tree * of tree roots. Also free log roots if they exist. */ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_root *gang[8]; int i; int ret; /* * At this point no one can be using this transaction to modify any tree * and no one can start another transaction to modify any tree either. */ ASSERT(trans->transaction->state == TRANS_STATE_COMMIT_DOING); spin_lock(&fs_info->fs_roots_radix_lock); while (1) { ret = radix_tree_gang_lookup_tag(&fs_info->fs_roots_radix, (void **)gang, 0, ARRAY_SIZE(gang), BTRFS_ROOT_TRANS_TAG); if (ret == 0) break; for (i = 0; i < ret; i++) { struct btrfs_root *root = gang[i]; int ret2; /* * At this point we can neither have tasks logging inodes * from a root nor trying to commit a log tree. */ ASSERT(atomic_read(&root->log_writers) == 0); ASSERT(atomic_read(&root->log_commit[0]) == 0); ASSERT(atomic_read(&root->log_commit[1]) == 0); radix_tree_tag_clear(&fs_info->fs_roots_radix, (unsigned long)btrfs_root_id(root), BTRFS_ROOT_TRANS_TAG); btrfs_qgroup_free_meta_all_pertrans(root); spin_unlock(&fs_info->fs_roots_radix_lock); btrfs_free_log(trans, root); ret2 = btrfs_update_reloc_root(trans, root); if (ret2) return ret2; /* see comments in should_cow_block() */ clear_bit(BTRFS_ROOT_FORCE_COW, &root->state); smp_mb__after_atomic(); if (root->commit_root != root->node) { list_add_tail(&root->dirty_list, &trans->transaction->switch_commits); btrfs_set_root_node(&root->root_item, root->node); } ret2 = btrfs_update_root(trans, fs_info->tree_root, &root->root_key, &root->root_item); if (ret2) return ret2; spin_lock(&fs_info->fs_roots_radix_lock); } } spin_unlock(&fs_info->fs_roots_radix_lock); return 0; } /* * Do all special snapshot related qgroup dirty hack. * * Will do all needed qgroup inherit and dirty hack like switch commit * roots inside one transaction and write all btree into disk, to make * qgroup works. */ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, struct btrfs_root *src, struct btrfs_root *parent, struct btrfs_qgroup_inherit *inherit, u64 dst_objectid) { struct btrfs_fs_info *fs_info = src->fs_info; int ret; /* * Save some performance in the case that qgroups are not enabled. If * this check races with the ioctl, rescan will kick in anyway. */ if (!btrfs_qgroup_full_accounting(fs_info)) return 0; /* * Ensure dirty @src will be committed. 
Or, after coming * commit_fs_roots() and switch_commit_roots(), any dirty but not * recorded root will never be updated again, causing an outdated root * item. */ ret = record_root_in_trans(trans, src, 1); if (ret) return ret; /* * btrfs_qgroup_inherit relies on a consistent view of the usage for the * src root, so we must run the delayed refs here. * * However this isn't particularly fool proof, because there's no * synchronization keeping us from changing the tree after this point * before we do the qgroup_inherit, or even from making changes while * we're doing the qgroup_inherit. But that's a problem for the future, * for now flush the delayed refs to narrow the race window where the * qgroup counters could end up wrong. */ ret = btrfs_run_delayed_refs(trans, U64_MAX); if (ret) { btrfs_abort_transaction(trans, ret); return ret; } ret = commit_fs_roots(trans); if (ret) goto out; ret = btrfs_qgroup_account_extents(trans); if (ret < 0) goto out; /* Now qgroup are all updated, we can inherit it to new qgroups */ ret = btrfs_qgroup_inherit(trans, btrfs_root_id(src), dst_objectid, btrfs_root_id(parent), inherit); if (ret < 0) goto out; /* * Now we do a simplified commit transaction, which will: * 1) commit all subvolume and extent tree * To ensure all subvolume and extent tree have a valid * commit_root to accounting later insert_dir_item() * 2) write all btree blocks onto disk * This is to make sure later btree modification will be cowed * Or commit_root can be populated and cause wrong qgroup numbers * In this simplified commit, we don't really care about other trees * like chunk and root tree, as they won't affect qgroup. * And we don't write super to avoid half committed status. */ ret = commit_cowonly_roots(trans); if (ret) goto out; switch_commit_roots(trans); ret = btrfs_write_and_wait_transaction(trans); if (ret) btrfs_handle_fs_error(fs_info, ret, "Error while writing out transaction for qgroup"); out: /* * Force parent root to be updated, as we recorded it before so its * last_trans == cur_transid. * Or it won't be committed again onto disk after later * insert_dir_item() */ if (!ret) ret = record_root_in_trans(trans, parent, 1); return ret; } /* * new snapshots need to be created at a very specific time in the * transaction commit. This does the actual creation. * * Note: * If the error which may affect the commitment of the current transaction * happens, we should return the error number. If the error which just affect * the creation of the pending snapshots, just return 0. */ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, struct btrfs_pending_snapshot *pending) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_key key; struct btrfs_root_item *new_root_item; struct btrfs_root *tree_root = fs_info->tree_root; struct btrfs_root *root = pending->root; struct btrfs_root *parent_root; struct btrfs_block_rsv *rsv; struct inode *parent_inode = &pending->dir->vfs_inode; struct btrfs_path *path; struct btrfs_dir_item *dir_item; struct extent_buffer *tmp; struct extent_buffer *old; struct timespec64 cur_time; int ret = 0; u64 to_reserve = 0; u64 index = 0; u64 objectid; u64 root_flags; unsigned int nofs_flags; struct fscrypt_name fname; ASSERT(pending->path); path = pending->path; ASSERT(pending->root_item); new_root_item = pending->root_item; /* * We're inside a transaction and must make sure that any potential * allocations with GFP_KERNEL in fscrypt won't recurse back to * filesystem. 
*/ nofs_flags = memalloc_nofs_save(); pending->error = fscrypt_setup_filename(parent_inode, &pending->dentry->d_name, 0, &fname); memalloc_nofs_restore(nofs_flags); if (pending->error) goto free_pending; pending->error = btrfs_get_free_objectid(tree_root, &objectid); if (pending->error) goto free_fname; /* * Make qgroup to skip current new snapshot's qgroupid, as it is * accounted by later btrfs_qgroup_inherit(). */ btrfs_set_skip_qgroup(trans, objectid); btrfs_reloc_pre_snapshot(pending, &to_reserve); if (to_reserve > 0) { pending->error = btrfs_block_rsv_add(fs_info, &pending->block_rsv, to_reserve, BTRFS_RESERVE_NO_FLUSH); if (pending->error) goto clear_skip_qgroup; } key.objectid = objectid; key.offset = (u64)-1; key.type = BTRFS_ROOT_ITEM_KEY; rsv = trans->block_rsv; trans->block_rsv = &pending->block_rsv; trans->bytes_reserved = trans->block_rsv->reserved; trace_btrfs_space_reservation(fs_info, "transaction", trans->transid, trans->bytes_reserved, 1); parent_root = BTRFS_I(parent_inode)->root; ret = record_root_in_trans(trans, parent_root, 0); if (ret) goto fail; cur_time = current_time(parent_inode); /* * insert the directory item */ ret = btrfs_set_inode_index(BTRFS_I(parent_inode), &index); if (ret) { btrfs_abort_transaction(trans, ret); goto fail; } /* check if there is a file/dir which has the same name. */ dir_item = btrfs_lookup_dir_item(NULL, parent_root, path, btrfs_ino(BTRFS_I(parent_inode)), &fname.disk_name, 0); if (dir_item != NULL && !IS_ERR(dir_item)) { pending->error = -EEXIST; goto dir_item_existed; } else if (IS_ERR(dir_item)) { ret = PTR_ERR(dir_item); btrfs_abort_transaction(trans, ret); goto fail; } btrfs_release_path(path); ret = btrfs_create_qgroup(trans, objectid); if (ret && ret != -EEXIST) { btrfs_abort_transaction(trans, ret); goto fail; } /* * pull in the delayed directory update * and the delayed inode item * otherwise we corrupt the FS during * snapshot */ ret = btrfs_run_delayed_items(trans); if (ret) { /* Transaction aborted */ btrfs_abort_transaction(trans, ret); goto fail; } ret = record_root_in_trans(trans, root, 0); if (ret) { btrfs_abort_transaction(trans, ret); goto fail; } btrfs_set_root_last_snapshot(&root->root_item, trans->transid); memcpy(new_root_item, &root->root_item, sizeof(*new_root_item)); btrfs_check_and_init_root_item(new_root_item); root_flags = btrfs_root_flags(new_root_item); if (pending->readonly) root_flags |= BTRFS_ROOT_SUBVOL_RDONLY; else root_flags &= ~BTRFS_ROOT_SUBVOL_RDONLY; btrfs_set_root_flags(new_root_item, root_flags); btrfs_set_root_generation_v2(new_root_item, trans->transid); generate_random_guid(new_root_item->uuid); memcpy(new_root_item->parent_uuid, root->root_item.uuid, BTRFS_UUID_SIZE); if (!(root_flags & BTRFS_ROOT_SUBVOL_RDONLY)) { memset(new_root_item->received_uuid, 0, sizeof(new_root_item->received_uuid)); memset(&new_root_item->stime, 0, sizeof(new_root_item->stime)); memset(&new_root_item->rtime, 0, sizeof(new_root_item->rtime)); btrfs_set_root_stransid(new_root_item, 0); btrfs_set_root_rtransid(new_root_item, 0); } btrfs_set_stack_timespec_sec(&new_root_item->otime, cur_time.tv_sec); btrfs_set_stack_timespec_nsec(&new_root_item->otime, cur_time.tv_nsec); btrfs_set_root_otransid(new_root_item, trans->transid); old = btrfs_lock_root_node(root); ret = btrfs_cow_block(trans, root, old, NULL, 0, &old, BTRFS_NESTING_COW); if (ret) { btrfs_tree_unlock(old); free_extent_buffer(old); btrfs_abort_transaction(trans, ret); goto fail; } ret = btrfs_copy_root(trans, root, old, &tmp, objectid); /* clean up in any case 
*/ btrfs_tree_unlock(old); free_extent_buffer(old); if (ret) { btrfs_abort_transaction(trans, ret); goto fail; } /* see comments in should_cow_block() */ set_bit(BTRFS_ROOT_FORCE_COW, &root->state); smp_wmb(); btrfs_set_root_node(new_root_item, tmp); /* record when the snapshot was created in key.offset */ key.offset = trans->transid; ret = btrfs_insert_root(trans, tree_root, &key, new_root_item); btrfs_tree_unlock(tmp); free_extent_buffer(tmp); if (ret) { btrfs_abort_transaction(trans, ret); goto fail; } /* * insert root back/forward references */ ret = btrfs_add_root_ref(trans, objectid, btrfs_root_id(parent_root), btrfs_ino(BTRFS_I(parent_inode)), index, &fname.disk_name); if (ret) { btrfs_abort_transaction(trans, ret); goto fail; } key.offset = (u64)-1; pending->snap = btrfs_get_new_fs_root(fs_info, objectid, &pending->anon_dev); if (IS_ERR(pending->snap)) { ret = PTR_ERR(pending->snap); pending->snap = NULL; btrfs_abort_transaction(trans, ret); goto fail; } ret = btrfs_reloc_post_snapshot(trans, pending); if (ret) { btrfs_abort_transaction(trans, ret); goto fail; } /* * Do special qgroup accounting for snapshot, as we do some qgroup * snapshot hack to do fast snapshot. * To co-operate with that hack, we do hack again. * Or snapshot will be greatly slowed down by a subtree qgroup rescan */ if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL) ret = qgroup_account_snapshot(trans, root, parent_root, pending->inherit, objectid); else if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) ret = btrfs_qgroup_inherit(trans, btrfs_root_id(root), objectid, btrfs_root_id(parent_root), pending->inherit); if (ret < 0) goto fail; ret = btrfs_insert_dir_item(trans, &fname.disk_name, BTRFS_I(parent_inode), &key, BTRFS_FT_DIR, index); if (ret) { btrfs_abort_transaction(trans, ret); goto fail; } btrfs_i_size_write(BTRFS_I(parent_inode), parent_inode->i_size + fname.disk_name.len * 2); inode_set_mtime_to_ts(parent_inode, inode_set_ctime_current(parent_inode)); ret = btrfs_update_inode_fallback(trans, BTRFS_I(parent_inode)); if (ret) { btrfs_abort_transaction(trans, ret); goto fail; } ret = btrfs_uuid_tree_add(trans, new_root_item->uuid, BTRFS_UUID_KEY_SUBVOL, objectid); if (ret) { btrfs_abort_transaction(trans, ret); goto fail; } if (!btrfs_is_empty_uuid(new_root_item->received_uuid)) { ret = btrfs_uuid_tree_add(trans, new_root_item->received_uuid, BTRFS_UUID_KEY_RECEIVED_SUBVOL, objectid); if (ret && ret != -EEXIST) { btrfs_abort_transaction(trans, ret); goto fail; } } fail: pending->error = ret; dir_item_existed: trans->block_rsv = rsv; trans->bytes_reserved = 0; clear_skip_qgroup: btrfs_clear_skip_qgroup(trans); free_fname: fscrypt_free_filename(&fname); free_pending: kfree(new_root_item); pending->root_item = NULL; btrfs_free_path(path); pending->path = NULL; return ret; } /* * create all the snapshots we've scheduled for creation */ static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans) { struct btrfs_pending_snapshot *pending, *next; struct list_head *head = &trans->transaction->pending_snapshots; int ret = 0; list_for_each_entry_safe(pending, next, head, list) { list_del(&pending->list); ret = create_pending_snapshot(trans, pending); if (ret) break; } return ret; } static void update_super_roots(struct btrfs_fs_info *fs_info) { struct btrfs_root_item *root_item; struct btrfs_super_block *super; super = fs_info->super_copy; root_item = &fs_info->chunk_root->root_item; super->chunk_root = root_item->bytenr; super->chunk_root_generation = root_item->generation; 
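/* chunk_root_level and the tree root fields follow; the tree root's generation also seeds cache_generation and uuid_tree_generation below. */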
super->chunk_root_level = root_item->level; root_item = &fs_info->tree_root->root_item; super->root = root_item->bytenr; super->generation = root_item->generation; super->root_level = root_item->level; if (btrfs_test_opt(fs_info, SPACE_CACHE)) super->cache_generation = root_item->generation; else if (test_bit(BTRFS_FS_CLEANUP_SPACE_CACHE_V1, &fs_info->flags)) super->cache_generation = 0; if (test_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags)) super->uuid_tree_generation = root_item->generation; } int btrfs_transaction_blocked(struct btrfs_fs_info *info) { struct btrfs_transaction *trans; int ret = 0; spin_lock(&info->trans_lock); trans = info->running_transaction; if (trans) ret = is_transaction_blocked(trans); spin_unlock(&info->trans_lock); return ret; } void btrfs_commit_transaction_async(struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_transaction *cur_trans; /* Kick the transaction kthread. */ set_bit(BTRFS_FS_COMMIT_TRANS, &fs_info->flags); wake_up_process(fs_info->transaction_kthread); /* take transaction reference */ cur_trans = trans->transaction; refcount_inc(&cur_trans->use_count); btrfs_end_transaction(trans); /* * Wait for the current transaction commit to start and block * subsequent transaction joins */ btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMMIT_PREP); wait_event(fs_info->transaction_blocked_wait, cur_trans->state >= TRANS_STATE_COMMIT_START || TRANS_ABORTED(cur_trans)); btrfs_put_transaction(cur_trans); } /* * If there is a running transaction commit it or if it's already committing, * wait for its commit to complete. Does not start and commit a new transaction * if there isn't any running. */ int btrfs_commit_current_transaction(struct btrfs_root *root) { struct btrfs_trans_handle *trans; trans = btrfs_attach_transaction_barrier(root); if (IS_ERR(trans)) { int ret = PTR_ERR(trans); return (ret == -ENOENT) ? 0 : ret; } return btrfs_commit_transaction(trans); } static void cleanup_transaction(struct btrfs_trans_handle *trans, int err) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_transaction *cur_trans = trans->transaction; WARN_ON(refcount_read(&trans->use_count) > 1); btrfs_abort_transaction(trans, err); spin_lock(&fs_info->trans_lock); /* * If the transaction is removed from the list, it means this * transaction has been committed successfully, so it is impossible * to call the cleanup function. */ BUG_ON(list_empty(&cur_trans->list)); if (cur_trans == fs_info->running_transaction) { cur_trans->state = TRANS_STATE_COMMIT_DOING; spin_unlock(&fs_info->trans_lock); /* * The thread has already released the lockdep map as reader * already in btrfs_commit_transaction(). */ btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers); wait_event(cur_trans->writer_wait, atomic_read(&cur_trans->num_writers) == 1); spin_lock(&fs_info->trans_lock); } /* * Now that we know no one else is still using the transaction we can * remove the transaction from the list of transactions. This avoids * the transaction kthread from cleaning up the transaction while some * other task is still using it, which could result in a use-after-free * on things like log trees, as it forces the transaction kthread to * wait for this transaction to be cleaned up by us. 
*/ list_del_init(&cur_trans->list); spin_unlock(&fs_info->trans_lock); btrfs_cleanup_one_transaction(trans->transaction); spin_lock(&fs_info->trans_lock); if (cur_trans == fs_info->running_transaction) fs_info->running_transaction = NULL; spin_unlock(&fs_info->trans_lock); if (trans->type & __TRANS_FREEZABLE) sb_end_intwrite(fs_info->sb); btrfs_put_transaction(cur_trans); btrfs_put_transaction(cur_trans); trace_btrfs_transaction_commit(fs_info); if (current->journal_info == trans) current->journal_info = NULL; /* * If relocation is running, we can't cancel scrub because that will * result in a deadlock. Before relocating a block group, relocation * pauses scrub, then starts and commits a transaction before unpausing * scrub. If the transaction commit is being done by the relocation * task or triggered by another task and the relocation task is waiting * for the commit, and we end up here due to an error in the commit * path, then calling btrfs_scrub_cancel() will deadlock, as we are * asking for scrub to stop while having it asked to be paused higher * above in relocation code. */ if (!test_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags)) btrfs_scrub_cancel(fs_info); kmem_cache_free(btrfs_trans_handle_cachep, trans); } /* * Release reserved delayed ref space of all pending block groups of the * transaction and remove them from the list */ static void btrfs_cleanup_pending_block_groups(struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_block_group *block_group, *tmp; list_for_each_entry_safe(block_group, tmp, &trans->new_bgs, bg_list) { btrfs_dec_delayed_refs_rsv_bg_inserts(fs_info); list_del_init(&block_group->bg_list); } } static inline int btrfs_start_delalloc_flush(struct btrfs_fs_info *fs_info) { /* * We use try_to_writeback_inodes_sb() here because if we used * btrfs_start_delalloc_roots we would deadlock with fs freeze. * Currently are holding the fs freeze lock, if we do an async flush * we'll do btrfs_join_transaction() and deadlock because we need to * wait for the fs freeze lock. Using the direct flushing we benefit * from already being in a transaction and our join_transaction doesn't * have to re-take the fs freeze lock. * * Note that try_to_writeback_inodes_sb() will only trigger writeback * if it can read lock sb->s_umount. It will always be able to lock it, * except when the filesystem is being unmounted or being frozen, but in * those cases sync_filesystem() is called, which results in calling * writeback_inodes_sb() while holding a write lock on sb->s_umount. * Note that we don't call writeback_inodes_sb() directly, because it * will emit a warning if sb->s_umount is not locked. */ if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) try_to_writeback_inodes_sb(fs_info->sb, WB_REASON_SYNC); return 0; } static inline void btrfs_wait_delalloc_flush(struct btrfs_fs_info *fs_info) { if (btrfs_test_opt(fs_info, FLUSHONCOMMIT)) btrfs_wait_ordered_roots(fs_info, U64_MAX, NULL); } /* * Add a pending snapshot associated with the given transaction handle to the * respective handle. This must be called after the transaction commit started * and while holding fs_info->trans_lock. * This serves to guarantee a caller of btrfs_commit_transaction() that it can * safely free the pending snapshot pointer in case btrfs_commit_transaction() * returns an error. 
*/ static void add_pending_snapshot(struct btrfs_trans_handle *trans) { struct btrfs_transaction *cur_trans = trans->transaction; if (!trans->pending_snapshot) return; lockdep_assert_held(&trans->fs_info->trans_lock); ASSERT(cur_trans->state >= TRANS_STATE_COMMIT_PREP); list_add(&trans->pending_snapshot->list, &cur_trans->pending_snapshots); } static void update_commit_stats(struct btrfs_fs_info *fs_info, ktime_t interval) { fs_info->commit_stats.commit_count++; fs_info->commit_stats.last_commit_dur = interval; fs_info->commit_stats.max_commit_dur = max_t(u64, fs_info->commit_stats.max_commit_dur, interval); fs_info->commit_stats.total_commit_dur += interval; } int btrfs_commit_transaction(struct btrfs_trans_handle *trans) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_transaction *cur_trans = trans->transaction; struct btrfs_transaction *prev_trans = NULL; int ret; ktime_t start_time; ktime_t interval; ASSERT(refcount_read(&trans->use_count) == 1); btrfs_trans_state_lockdep_acquire(fs_info, BTRFS_LOCKDEP_TRANS_COMMIT_PREP); clear_bit(BTRFS_FS_NEED_TRANS_COMMIT, &fs_info->flags); /* Stop the commit early if ->aborted is set */ if (TRANS_ABORTED(cur_trans)) { ret = cur_trans->aborted; goto lockdep_trans_commit_start_release; } btrfs_trans_release_metadata(trans); trans->block_rsv = NULL; /* * We only want one transaction commit doing the flushing so we do not * waste a bunch of time on lock contention on the extent root node. */ if (!test_and_set_bit(BTRFS_DELAYED_REFS_FLUSHING, &cur_trans->delayed_refs.flags)) { /* * Make a pass through all the delayed refs we have so far. * Any running threads may add more while we are here. */ ret = btrfs_run_delayed_refs(trans, 0); if (ret) goto lockdep_trans_commit_start_release; } btrfs_create_pending_block_groups(trans); if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) { int run_it = 0; /* this mutex is also taken before trying to set * block groups readonly. We need to make sure * that nobody has set a block group readonly * after a extents from that block group have been * allocated for cache files. btrfs_set_block_group_ro * will wait for the transaction to commit if it * finds BTRFS_TRANS_DIRTY_BG_RUN set. * * The BTRFS_TRANS_DIRTY_BG_RUN flag is also used to make sure * only one process starts all the block group IO. It wouldn't * hurt to have more than one go through, but there's no * real advantage to it either. 
*/ mutex_lock(&fs_info->ro_block_group_mutex); if (!test_and_set_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) run_it = 1; mutex_unlock(&fs_info->ro_block_group_mutex); if (run_it) { ret = btrfs_start_dirty_block_groups(trans); if (ret) goto lockdep_trans_commit_start_release; } } spin_lock(&fs_info->trans_lock); if (cur_trans->state >= TRANS_STATE_COMMIT_PREP) { enum btrfs_trans_state want_state = TRANS_STATE_COMPLETED; add_pending_snapshot(trans); spin_unlock(&fs_info->trans_lock); refcount_inc(&cur_trans->use_count); if (trans->in_fsync) want_state = TRANS_STATE_SUPER_COMMITTED; btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_COMMIT_PREP); ret = btrfs_end_transaction(trans); wait_for_commit(cur_trans, want_state); if (TRANS_ABORTED(cur_trans)) ret = cur_trans->aborted; btrfs_put_transaction(cur_trans); return ret; } cur_trans->state = TRANS_STATE_COMMIT_PREP; wake_up(&fs_info->transaction_blocked_wait); btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_COMMIT_PREP); if (cur_trans->list.prev != &fs_info->trans_list) { enum btrfs_trans_state want_state = TRANS_STATE_COMPLETED; if (trans->in_fsync) want_state = TRANS_STATE_SUPER_COMMITTED; prev_trans = list_entry(cur_trans->list.prev, struct btrfs_transaction, list); if (prev_trans->state < want_state) { refcount_inc(&prev_trans->use_count); spin_unlock(&fs_info->trans_lock); wait_for_commit(prev_trans, want_state); ret = READ_ONCE(prev_trans->aborted); btrfs_put_transaction(prev_trans); if (ret) goto lockdep_release; spin_lock(&fs_info->trans_lock); } } else { /* * The previous transaction was aborted and was already removed * from the list of transactions at fs_info->trans_list. So we * abort to prevent writing a new superblock that reflects a * corrupt state (pointing to trees with unwritten nodes/leafs). */ if (BTRFS_FS_ERROR(fs_info)) { spin_unlock(&fs_info->trans_lock); ret = -EROFS; goto lockdep_release; } } cur_trans->state = TRANS_STATE_COMMIT_START; wake_up(&fs_info->transaction_blocked_wait); spin_unlock(&fs_info->trans_lock); /* * Get the time spent on the work done by the commit thread and not * the time spent waiting on a previous commit */ start_time = ktime_get_ns(); extwriter_counter_dec(cur_trans, trans->type); ret = btrfs_start_delalloc_flush(fs_info); if (ret) goto lockdep_release; ret = btrfs_run_delayed_items(trans); if (ret) goto lockdep_release; /* * The thread has started/joined the transaction thus it holds the * lockdep map as a reader. It has to release it before acquiring the * lockdep map as a writer. */ btrfs_lockdep_release(fs_info, btrfs_trans_num_extwriters); btrfs_might_wait_for_event(fs_info, btrfs_trans_num_extwriters); wait_event(cur_trans->writer_wait, extwriter_counter_read(cur_trans) == 0); /* some pending stuffs might be added after the previous flush. */ ret = btrfs_run_delayed_items(trans); if (ret) { btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); goto cleanup_transaction; } btrfs_wait_delalloc_flush(fs_info); /* * Wait for all ordered extents started by a fast fsync that joined this * transaction. Otherwise if this transaction commits before the ordered * extents complete we lose logged data after a power failure. */ btrfs_might_wait_for_event(fs_info, btrfs_trans_pending_ordered); wait_event(cur_trans->pending_wait, atomic_read(&cur_trans->pending_ordered) == 0); btrfs_scrub_pause(fs_info); /* * Ok now we need to make sure to block out any other joins while we * commit the transaction. 
We could have started a join before setting * COMMIT_DOING so make sure to wait for num_writers to == 1 again. */ spin_lock(&fs_info->trans_lock); add_pending_snapshot(trans); cur_trans->state = TRANS_STATE_COMMIT_DOING; spin_unlock(&fs_info->trans_lock); /* * The thread has started/joined the transaction thus it holds the * lockdep map as a reader. It has to release it before acquiring the * lockdep map as a writer. */ btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers); wait_event(cur_trans->writer_wait, atomic_read(&cur_trans->num_writers) == 1); /* * Make lockdep happy by acquiring the state locks after * btrfs_trans_num_writers is released. If we acquired the state locks * before releasing the btrfs_trans_num_writers lock then lockdep would * complain because we did not follow the reverse order unlocking rule. */ btrfs_trans_state_lockdep_acquire(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED); btrfs_trans_state_lockdep_acquire(fs_info, BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED); btrfs_trans_state_lockdep_acquire(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED); /* * We've started the commit, clear the flag in case we were triggered to * do an async commit but somebody else started before the transaction * kthread could do the work. */ clear_bit(BTRFS_FS_COMMIT_TRANS, &fs_info->flags); if (TRANS_ABORTED(cur_trans)) { ret = cur_trans->aborted; btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED); goto scrub_continue; } /* * the reloc mutex makes sure that we stop * the balancing code from coming in and moving * extents around in the middle of the commit */ mutex_lock(&fs_info->reloc_mutex); /* * We needn't worry about the delayed items because we will * deal with them in create_pending_snapshot(), which is the * core function of the snapshot creation. */ ret = create_pending_snapshots(trans); if (ret) goto unlock_reloc; /* * We insert the dir indexes of the snapshots and update the inode * of the snapshots' parents after the snapshot creation, so there * are some delayed items which are not dealt with. Now deal with * them. * * We needn't worry that this operation will corrupt the snapshots, * because all the tree which are snapshoted will be forced to COW * the nodes and leaves. */ ret = btrfs_run_delayed_items(trans); if (ret) goto unlock_reloc; ret = btrfs_run_delayed_refs(trans, U64_MAX); if (ret) goto unlock_reloc; /* * make sure none of the code above managed to slip in a * delayed item */ btrfs_assert_delayed_root_empty(fs_info); WARN_ON(cur_trans != trans->transaction); ret = commit_fs_roots(trans); if (ret) goto unlock_reloc; /* commit_fs_roots gets rid of all the tree log roots, it is now * safe to free the root of tree log roots */ btrfs_free_log_root_tree(trans, fs_info); /* * Since fs roots are all committed, we can get a quite accurate * new_roots. So let's do quota accounting. */ ret = btrfs_qgroup_account_extents(trans); if (ret < 0) goto unlock_reloc; ret = commit_cowonly_roots(trans); if (ret) goto unlock_reloc; /* * The tasks which save the space cache and inode cache may also * update ->aborted, check it. 
*/ if (TRANS_ABORTED(cur_trans)) { ret = cur_trans->aborted; goto unlock_reloc; } cur_trans = fs_info->running_transaction; btrfs_set_root_node(&fs_info->tree_root->root_item, fs_info->tree_root->node); list_add_tail(&fs_info->tree_root->dirty_list, &cur_trans->switch_commits); btrfs_set_root_node(&fs_info->chunk_root->root_item, fs_info->chunk_root->node); list_add_tail(&fs_info->chunk_root->dirty_list, &cur_trans->switch_commits); if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) { btrfs_set_root_node(&fs_info->block_group_root->root_item, fs_info->block_group_root->node); list_add_tail(&fs_info->block_group_root->dirty_list, &cur_trans->switch_commits); } switch_commit_roots(trans); ASSERT(list_empty(&cur_trans->dirty_bgs)); ASSERT(list_empty(&cur_trans->io_bgs)); update_super_roots(fs_info); btrfs_set_super_log_root(fs_info->super_copy, 0); btrfs_set_super_log_root_level(fs_info->super_copy, 0); memcpy(fs_info->super_for_commit, fs_info->super_copy, sizeof(*fs_info->super_copy)); btrfs_commit_device_sizes(cur_trans); clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags); clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags); btrfs_trans_release_chunk_metadata(trans); /* * Before changing the transaction state to TRANS_STATE_UNBLOCKED and * setting fs_info->running_transaction to NULL, lock tree_log_mutex to * make sure that before we commit our superblock, no other task can * start a new transaction and commit a log tree before we commit our * superblock. Anyone trying to commit a log tree locks this mutex before * writing its superblock. */ mutex_lock(&fs_info->tree_log_mutex); spin_lock(&fs_info->trans_lock); cur_trans->state = TRANS_STATE_UNBLOCKED; fs_info->running_transaction = NULL; spin_unlock(&fs_info->trans_lock); mutex_unlock(&fs_info->reloc_mutex); wake_up(&fs_info->transaction_wait); btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED); /* If we have features changed, wake up the cleaner to update sysfs. */ if (test_bit(BTRFS_FS_FEATURE_CHANGED, &fs_info->flags) && fs_info->cleaner_kthread) wake_up_process(fs_info->cleaner_kthread); ret = btrfs_write_and_wait_transaction(trans); if (ret) { btrfs_handle_fs_error(fs_info, ret, "Error while writing out transaction"); mutex_unlock(&fs_info->tree_log_mutex); goto scrub_continue; } ret = write_all_supers(fs_info, 0); /* * the super is written, we can safely allow the tree-loggers * to go about their business */ mutex_unlock(&fs_info->tree_log_mutex); if (ret) goto scrub_continue; /* * We needn't acquire the lock here because there is no other task * which can change it. */ cur_trans->state = TRANS_STATE_SUPER_COMMITTED; wake_up(&cur_trans->commit_wait); btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED); btrfs_finish_extent_commit(trans); if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &cur_trans->flags)) btrfs_clear_space_info_full(fs_info); btrfs_set_last_trans_committed(fs_info, cur_trans->transid); /* * We needn't acquire the lock here because there is no other task * which can change it. 
*/ cur_trans->state = TRANS_STATE_COMPLETED; wake_up(&cur_trans->commit_wait); btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED); spin_lock(&fs_info->trans_lock); list_del_init(&cur_trans->list); spin_unlock(&fs_info->trans_lock); btrfs_put_transaction(cur_trans); btrfs_put_transaction(cur_trans); if (trans->type & __TRANS_FREEZABLE) sb_end_intwrite(fs_info->sb); trace_btrfs_transaction_commit(fs_info); interval = ktime_get_ns() - start_time; btrfs_scrub_continue(fs_info); if (current->journal_info == trans) current->journal_info = NULL; kmem_cache_free(btrfs_trans_handle_cachep, trans); update_commit_stats(fs_info, interval); return ret; unlock_reloc: mutex_unlock(&fs_info->reloc_mutex); btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED); scrub_continue: btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED); btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED); btrfs_scrub_continue(fs_info); cleanup_transaction: btrfs_trans_release_metadata(trans); btrfs_cleanup_pending_block_groups(trans); btrfs_trans_release_chunk_metadata(trans); trans->block_rsv = NULL; btrfs_warn(fs_info, "Skipping commit of aborted transaction."); if (current->journal_info == trans) current->journal_info = NULL; cleanup_transaction(trans, ret); return ret; lockdep_release: btrfs_lockdep_release(fs_info, btrfs_trans_num_extwriters); btrfs_lockdep_release(fs_info, btrfs_trans_num_writers); goto cleanup_transaction; lockdep_trans_commit_start_release: btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_COMMIT_PREP); btrfs_end_transaction(trans); return ret; } /* * return < 0 if error * 0 if there are no more dead_roots at the time of call * 1 there are more to be processed, call me again * * The return value indicates there are certainly more snapshots to delete, but * if there comes a new one during processing, it may return 0. We don't mind, * because btrfs_commit_super will poke cleaner thread and it will process it a * few seconds later. */ int btrfs_clean_one_deleted_snapshot(struct btrfs_fs_info *fs_info) { struct btrfs_root *root; int ret; spin_lock(&fs_info->trans_lock); if (list_empty(&fs_info->dead_roots)) { spin_unlock(&fs_info->trans_lock); return 0; } root = list_first_entry(&fs_info->dead_roots, struct btrfs_root, root_list); list_del_init(&root->root_list); spin_unlock(&fs_info->trans_lock); btrfs_debug(fs_info, "cleaner removing %llu", btrfs_root_id(root)); btrfs_kill_all_delayed_nodes(root); if (btrfs_header_backref_rev(root->node) < BTRFS_MIXED_BACKREF_REV) ret = btrfs_drop_snapshot(root, 0, 0); else ret = btrfs_drop_snapshot(root, 1, 0); btrfs_put_root(root); return (ret < 0) ? 0 : 1; } /* * We only mark the transaction aborted and then set the file system read-only. * This will prevent new transactions from starting or trying to join this * one. * * This means that error recovery at the call site is limited to freeing * any local memory allocations and passing the error code up without * further cleanup. The transaction should complete as it normally would * in the call path but will return -EIO. * * We'll complete the cleanup in btrfs_end_transaction and * btrfs_commit_transaction. 
 */
void __cold __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
				      const char *function,
				      unsigned int line, int error, bool first_hit)
{
	struct btrfs_fs_info *fs_info = trans->fs_info;

	WRITE_ONCE(trans->aborted, error);
	WRITE_ONCE(trans->transaction->aborted, error);
	if (first_hit && error == -ENOSPC)
		btrfs_dump_space_info_for_trans_abort(fs_info);
	/* Wake up anybody who may be waiting on this transaction */
	wake_up(&fs_info->transaction_wait);
	wake_up(&fs_info->transaction_blocked_wait);
	__btrfs_handle_fs_error(fs_info, function, line, error, NULL);
}

int __init btrfs_transaction_init(void)
{
	btrfs_trans_handle_cachep = KMEM_CACHE(btrfs_trans_handle, SLAB_TEMPORARY);
	if (!btrfs_trans_handle_cachep)
		return -ENOMEM;
	return 0;
}

void __cold btrfs_transaction_exit(void)
{
	kmem_cache_destroy(btrfs_trans_handle_cachep);
}
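/*
 * Illustrative sketch (not kernel code): the commit path above drives each
 * transaction through a monotonically increasing state machine
 * (COMMIT_PREP -> COMMIT_START -> COMMIT_DOING -> UNBLOCKED ->
 * SUPER_COMMITTED -> COMPLETED), and waiters such as wait_for_commit()
 * simply sleep until the state reaches their threshold. A userspace
 * analogue of that "advance and broadcast / wait for threshold" pattern,
 * using POSIX threads, might look like the following. All names here are
 * invented for the illustration.
 */
#include <pthread.h>

enum txn_state { TXN_RUNNING, TXN_COMMIT_START, TXN_SUPER_COMMITTED, TXN_COMPLETED };

struct txn {
	pthread_mutex_t lock;
	pthread_cond_t commit_wait;
	enum txn_state state;
};

/*
 * Committer side: the state only ever moves forward, so one broadcast per
 * transition is enough to release every waiter whose threshold is reached.
 */
static void txn_set_state(struct txn *t, enum txn_state state)
{
	pthread_mutex_lock(&t->lock);
	if (state > t->state) {
		t->state = state;
		pthread_cond_broadcast(&t->commit_wait);
	}
	pthread_mutex_unlock(&t->lock);
}

/*
 * Waiter side: the analogue of
 * wait_event(commit->commit_wait, commit->state >= min_state).
 */
static void txn_wait_for_state(struct txn *t, enum txn_state min_state)
{
	pthread_mutex_lock(&t->lock);
	while (t->state < min_state)
		pthread_cond_wait(&t->commit_wait, &t->lock);
	pthread_mutex_unlock(&t->lock);
}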
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Lock-less NULL terminated single linked list
 *
 * The basic atomic operation of this list is cmpxchg on long. On
 * architectures that don't have NMI-safe cmpxchg implementation, the
 * list can NOT be used in NMI handlers. So code that uses the list in
 * an NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
 *
 * Copyright 2010,2011 Intel Corp.
 * Author: Huang Ying <ying.huang@intel.com>
 */
#include <linux/kernel.h>
#include <linux/export.h>
#include <linux/llist.h>

/**
 * llist_add_batch - add several linked entries in batch
 * @new_first: first entry in batch to be added
 * @new_last: last entry in batch to be added
 * @head: the head for your lock-less list
 *
 * Return whether the list was empty before adding.
 */
bool llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
		     struct llist_head *head)
{
	struct llist_node *first = READ_ONCE(head->first);

	do {
		new_last->next = first;
	} while (!try_cmpxchg(&head->first, &first, new_first));

	return !first;
}
EXPORT_SYMBOL_GPL(llist_add_batch);

/**
 * llist_del_first - delete the first entry of lock-less list
 * @head: the head for your lock-less list
 *
 * If the list is empty, return NULL; otherwise, return the first entry
 * deleted, which is the newest added one.
 *
 * Only one llist_del_first user can run simultaneously with multiple
 * llist_add users without a lock, because otherwise a llist_del_first,
 * llist_add, llist_add (or llist_del_all, llist_add, llist_add) sequence
 * in another user may change @head->first->next while keeping
 * @head->first. If multiple consumers are needed, please use
 * llist_del_all or use a lock between consumers.
 */
struct llist_node *llist_del_first(struct llist_head *head)
{
	struct llist_node *entry, *next;

	entry = smp_load_acquire(&head->first);
	do {
		if (entry == NULL)
			return NULL;
		next = READ_ONCE(entry->next);
	} while (!try_cmpxchg(&head->first, &entry, next));

	return entry;
}
EXPORT_SYMBOL_GPL(llist_del_first);

/**
 * llist_del_first_this - delete given entry of lock-less list if it is first
 * @head: the head for your lock-less list
 * @this: a list entry.
 *
 * If the head of the list is the given entry, delete it and return %true,
 * else return %false.
 *
 * Multiple callers can safely call this concurrently with multiple
 * llist_add() callers, provided all the callers offer a different @this.
 */
bool llist_del_first_this(struct llist_head *head,
			  struct llist_node *this)
{
	struct llist_node *entry, *next;

	/* acquire ensures ordering wrt try_cmpxchg(), as in llist_del_first() */
	entry = smp_load_acquire(&head->first);
	do {
		if (entry != this)
			return false;
		next = READ_ONCE(entry->next);
	} while (!try_cmpxchg(&head->first, &entry, next));

	return true;
}
EXPORT_SYMBOL_GPL(llist_del_first_this);

/**
 * llist_reverse_order - reverse order of a llist chain
 * @head: first item of the list to be reversed
 *
 * Reverse the order of a chain of llist entries and return the
 * new first entry.
 */
struct llist_node *llist_reverse_order(struct llist_node *head)
{
	struct llist_node *new_head = NULL;

	while (head) {
		struct llist_node *tmp = head;

		head = head->next;
		tmp->next = new_head;
		new_head = tmp;
	}

	return new_head;
}
EXPORT_SYMBOL_GPL(llist_reverse_order);
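/*
 * Illustrative sketch (not kernel code): the same cmpxchg loop that
 * llist_add_batch() and llist_del_first() use above, written with C11
 * atomics. Entries are pushed LIFO, a consumer detaches the whole chain
 * with one exchange (the analogue of llist_del_all()), and reversing the
 * chain as llist_reverse_order() does recovers FIFO order. All names are
 * invented for the illustration.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct lockless_stack {
	_Atomic(struct node *) first;
};

/* Push one node; returns true if the list was empty beforehand,
 * mirroring llist_add(). */
static bool push(struct lockless_stack *s, struct node *n)
{
	struct node *first = atomic_load_explicit(&s->first, memory_order_relaxed);

	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak_explicit(&s->first, &first, n,
							memory_order_release,
							memory_order_relaxed));
	return first == NULL;
}

/* Detach the whole chain at once (llist_del_all() analogue); safe with any
 * number of concurrent pushers because it never dereferences ->next. */
static struct node *pop_all(struct lockless_stack *s)
{
	return atomic_exchange_explicit(&s->first, NULL, memory_order_acquire);
}

/* Same in-place reversal as llist_reverse_order(): turns the LIFO chain
 * returned by pop_all() into FIFO processing order. */
static struct node *reverse(struct node *head)
{
	struct node *new_head = NULL;

	while (head) {
		struct node *tmp = head;

		head = head->next;
		tmp->next = new_head;
		new_head = tmp;
	}
	return new_head;
}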
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Sony NFC Port-100 Series driver
 * Copyright (c) 2013, Intel Corporation.
 *
 * Partly based/Inspired by Stephen Tiedemann's nfcpy
 */
#include <linux/module.h>
#include <linux/usb.h>
#include <net/nfc/digital.h>

#define VERSION "0.1"

#define SONY_VENDOR_ID		0x054c
#define RCS380S_PRODUCT_ID	0x06c1
#define RCS380P_PRODUCT_ID	0x06c3

#define PORT100_PROTOCOLS (NFC_PROTO_JEWEL_MASK    | \
			   NFC_PROTO_MIFARE_MASK   | \
			   NFC_PROTO_FELICA_MASK   | \
			   NFC_PROTO_NFC_DEP_MASK  | \
			   NFC_PROTO_ISO14443_MASK | \
			   NFC_PROTO_ISO14443_B_MASK)

#define PORT100_CAPABILITIES (NFC_DIGITAL_DRV_CAPS_IN_CRC | \
			      NFC_DIGITAL_DRV_CAPS_TG_CRC)

/* Standard port100 frame definitions */
#define PORT100_FRAME_HEADER_LEN (sizeof(struct port100_frame) \
				  + 2) /* data[0] CC, data[1] SCC */
#define PORT100_FRAME_TAIL_LEN 2 /* data[len] DCS, data[len + 1] postamble */

#define PORT100_COMM_RF_HEAD_MAX_LEN (sizeof(struct port100_tg_comm_rf_cmd))

/*
 * Max extended frame payload len, excluding CC and SCC
 * which are already in PORT100_FRAME_HEADER_LEN.
 */
#define PORT100_FRAME_MAX_PAYLOAD_LEN 1001

#define PORT100_FRAME_ACK_SIZE 6 /* Preamble (1), SoPC (2), ACK Code (2),
				    Postamble (1) */
static u8 ack_frame[PORT100_FRAME_ACK_SIZE] = {
	0x00, 0x00, 0xff, 0x00, 0xff, 0x00
};

#define PORT100_FRAME_CHECKSUM(f) (f->data[le16_to_cpu(f->datalen)])
#define PORT100_FRAME_POSTAMBLE(f) (f->data[le16_to_cpu(f->datalen) + 1])

/* start of frame */
#define PORT100_FRAME_SOF	0x00FF
#define PORT100_FRAME_EXT	0xFFFF
#define PORT100_FRAME_ACK	0x00FF

/* Port-100 command: in or out */
#define PORT100_FRAME_DIRECTION(f) (f->data[0]) /* CC */
#define PORT100_FRAME_DIR_OUT 0xD6
#define PORT100_FRAME_DIR_IN  0xD7

/* Port-100 sub-command */
#define PORT100_FRAME_CMD(f) (f->data[1]) /* SCC */

#define PORT100_CMD_GET_FIRMWARE_VERSION 0x20
#define PORT100_CMD_GET_COMMAND_TYPE     0x28
#define PORT100_CMD_SET_COMMAND_TYPE     0x2A

#define PORT100_CMD_IN_SET_RF       0x00
#define PORT100_CMD_IN_SET_PROTOCOL 0x02
#define PORT100_CMD_IN_COMM_RF      0x04

#define PORT100_CMD_TG_SET_RF       0x40
#define PORT100_CMD_TG_SET_PROTOCOL 0x42
#define PORT100_CMD_TG_SET_RF_OFF   0x46
#define PORT100_CMD_TG_COMM_RF      0x48

#define PORT100_CMD_SWITCH_RF       0x06

#define PORT100_CMD_RESPONSE(cmd) (cmd + 1)

#define PORT100_CMD_TYPE_IS_SUPPORTED(mask, cmd_type) \
	((mask) & (0x01 << (cmd_type)))
#define PORT100_CMD_TYPE_0	0
#define PORT100_CMD_TYPE_1	1

#define PORT100_CMD_STATUS_OK      0x00
#define PORT100_CMD_STATUS_TIMEOUT 0x80

#define PORT100_MDAA_TGT_HAS_BEEN_ACTIVATED_MASK 0x01
#define PORT100_MDAA_TGT_WAS_ACTIVATED_MASK      0x02

struct port100;

typedef void (*port100_send_async_complete_t)(struct port100 *dev, void *arg,
					      struct sk_buff *resp);

/*
 * Setting sets structure for in_set_rf command
 *
 * @in_*_set_number: Represent the entry indexes in the port-100 RF Base Table.
 *		     This table contains multiple RF setting sets required for
 *		     RF communication.
 *
 * @in_*_comm_type: These fields set the communication type to be used.
*/ struct port100_in_rf_setting { u8 in_send_set_number; u8 in_send_comm_type; u8 in_recv_set_number; u8 in_recv_comm_type; } __packed; #define PORT100_COMM_TYPE_IN_212F 0x01 #define PORT100_COMM_TYPE_IN_424F 0x02 #define PORT100_COMM_TYPE_IN_106A 0x03 #define PORT100_COMM_TYPE_IN_106B 0x07 static const struct port100_in_rf_setting in_rf_settings[] = { [NFC_DIGITAL_RF_TECH_212F] = { .in_send_set_number = 1, .in_send_comm_type = PORT100_COMM_TYPE_IN_212F, .in_recv_set_number = 15, .in_recv_comm_type = PORT100_COMM_TYPE_IN_212F, }, [NFC_DIGITAL_RF_TECH_424F] = { .in_send_set_number = 1, .in_send_comm_type = PORT100_COMM_TYPE_IN_424F, .in_recv_set_number = 15, .in_recv_comm_type = PORT100_COMM_TYPE_IN_424F, }, [NFC_DIGITAL_RF_TECH_106A] = { .in_send_set_number = 2, .in_send_comm_type = PORT100_COMM_TYPE_IN_106A, .in_recv_set_number = 15, .in_recv_comm_type = PORT100_COMM_TYPE_IN_106A, }, [NFC_DIGITAL_RF_TECH_106B] = { .in_send_set_number = 3, .in_send_comm_type = PORT100_COMM_TYPE_IN_106B, .in_recv_set_number = 15, .in_recv_comm_type = PORT100_COMM_TYPE_IN_106B, }, /* Ensures the array has NFC_DIGITAL_RF_TECH_LAST elements */ [NFC_DIGITAL_RF_TECH_LAST] = { 0 }, }; /** * struct port100_tg_rf_setting - Setting sets structure for tg_set_rf command * * @tg_set_number: Represents the entry index in the port-100 RF Base Table. * This table contains multiple RF setting sets required for RF * communication. This field is used for both send and receive * settings. * * @tg_comm_type: Sets the communication type to be used to send and receive * data. */ struct port100_tg_rf_setting { u8 tg_set_number; u8 tg_comm_type; } __packed; #define PORT100_COMM_TYPE_TG_106A 0x0B #define PORT100_COMM_TYPE_TG_212F 0x0C #define PORT100_COMM_TYPE_TG_424F 0x0D static const struct port100_tg_rf_setting tg_rf_settings[] = { [NFC_DIGITAL_RF_TECH_106A] = { .tg_set_number = 8, .tg_comm_type = PORT100_COMM_TYPE_TG_106A, }, [NFC_DIGITAL_RF_TECH_212F] = { .tg_set_number = 8, .tg_comm_type = PORT100_COMM_TYPE_TG_212F, }, [NFC_DIGITAL_RF_TECH_424F] = { .tg_set_number = 8, .tg_comm_type = PORT100_COMM_TYPE_TG_424F, }, /* Ensures the array has NFC_DIGITAL_RF_TECH_LAST elements */ [NFC_DIGITAL_RF_TECH_LAST] = { 0 }, }; #define PORT100_IN_PROT_INITIAL_GUARD_TIME 0x00 #define PORT100_IN_PROT_ADD_CRC 0x01 #define PORT100_IN_PROT_CHECK_CRC 0x02 #define PORT100_IN_PROT_MULTI_CARD 0x03 #define PORT100_IN_PROT_ADD_PARITY 0x04 #define PORT100_IN_PROT_CHECK_PARITY 0x05 #define PORT100_IN_PROT_BITWISE_AC_RECV_MODE 0x06 #define PORT100_IN_PROT_VALID_BIT_NUMBER 0x07 #define PORT100_IN_PROT_CRYPTO1 0x08 #define PORT100_IN_PROT_ADD_SOF 0x09 #define PORT100_IN_PROT_CHECK_SOF 0x0A #define PORT100_IN_PROT_ADD_EOF 0x0B #define PORT100_IN_PROT_CHECK_EOF 0x0C #define PORT100_IN_PROT_DEAF_TIME 0x0E #define PORT100_IN_PROT_CRM 0x0F #define PORT100_IN_PROT_CRM_MIN_LEN 0x10 #define PORT100_IN_PROT_T1_TAG_FRAME 0x11 #define PORT100_IN_PROT_RFCA 0x12 #define PORT100_IN_PROT_GUARD_TIME_AT_INITIATOR 0x13 #define PORT100_IN_PROT_END 0x14 #define PORT100_IN_MAX_NUM_PROTOCOLS 19 #define PORT100_TG_PROT_TU 0x00 #define PORT100_TG_PROT_RF_OFF 0x01 #define PORT100_TG_PROT_CRM 0x02 #define PORT100_TG_PROT_END 0x03 #define PORT100_TG_MAX_NUM_PROTOCOLS 3 struct port100_protocol { u8 number; u8 value; } __packed; static const struct port100_protocol in_protocols[][PORT100_IN_MAX_NUM_PROTOCOLS + 1] = { [NFC_DIGITAL_FRAMING_NFCA_SHORT] = { { PORT100_IN_PROT_INITIAL_GUARD_TIME, 6 }, { PORT100_IN_PROT_ADD_CRC, 0 }, { PORT100_IN_PROT_CHECK_CRC, 0 }, {
PORT100_IN_PROT_MULTI_CARD, 0 }, { PORT100_IN_PROT_ADD_PARITY, 0 }, { PORT100_IN_PROT_CHECK_PARITY, 1 }, { PORT100_IN_PROT_BITWISE_AC_RECV_MODE, 0 }, { PORT100_IN_PROT_VALID_BIT_NUMBER, 7 }, { PORT100_IN_PROT_CRYPTO1, 0 }, { PORT100_IN_PROT_ADD_SOF, 0 }, { PORT100_IN_PROT_CHECK_SOF, 0 }, { PORT100_IN_PROT_ADD_EOF, 0 }, { PORT100_IN_PROT_CHECK_EOF, 0 }, { PORT100_IN_PROT_DEAF_TIME, 4 }, { PORT100_IN_PROT_CRM, 0 }, { PORT100_IN_PROT_CRM_MIN_LEN, 0 }, { PORT100_IN_PROT_T1_TAG_FRAME, 0 }, { PORT100_IN_PROT_RFCA, 0 }, { PORT100_IN_PROT_GUARD_TIME_AT_INITIATOR, 6 }, { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_STANDARD] = { { PORT100_IN_PROT_INITIAL_GUARD_TIME, 6 }, { PORT100_IN_PROT_ADD_CRC, 0 }, { PORT100_IN_PROT_CHECK_CRC, 0 }, { PORT100_IN_PROT_MULTI_CARD, 0 }, { PORT100_IN_PROT_ADD_PARITY, 1 }, { PORT100_IN_PROT_CHECK_PARITY, 1 }, { PORT100_IN_PROT_BITWISE_AC_RECV_MODE, 0 }, { PORT100_IN_PROT_VALID_BIT_NUMBER, 8 }, { PORT100_IN_PROT_CRYPTO1, 0 }, { PORT100_IN_PROT_ADD_SOF, 0 }, { PORT100_IN_PROT_CHECK_SOF, 0 }, { PORT100_IN_PROT_ADD_EOF, 0 }, { PORT100_IN_PROT_CHECK_EOF, 0 }, { PORT100_IN_PROT_DEAF_TIME, 4 }, { PORT100_IN_PROT_CRM, 0 }, { PORT100_IN_PROT_CRM_MIN_LEN, 0 }, { PORT100_IN_PROT_T1_TAG_FRAME, 0 }, { PORT100_IN_PROT_RFCA, 0 }, { PORT100_IN_PROT_GUARD_TIME_AT_INITIATOR, 6 }, { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_STANDARD_WITH_CRC_A] = { { PORT100_IN_PROT_INITIAL_GUARD_TIME, 6 }, { PORT100_IN_PROT_ADD_CRC, 1 }, { PORT100_IN_PROT_CHECK_CRC, 1 }, { PORT100_IN_PROT_MULTI_CARD, 0 }, { PORT100_IN_PROT_ADD_PARITY, 1 }, { PORT100_IN_PROT_CHECK_PARITY, 1 }, { PORT100_IN_PROT_BITWISE_AC_RECV_MODE, 0 }, { PORT100_IN_PROT_VALID_BIT_NUMBER, 8 }, { PORT100_IN_PROT_CRYPTO1, 0 }, { PORT100_IN_PROT_ADD_SOF, 0 }, { PORT100_IN_PROT_CHECK_SOF, 0 }, { PORT100_IN_PROT_ADD_EOF, 0 }, { PORT100_IN_PROT_CHECK_EOF, 0 }, { PORT100_IN_PROT_DEAF_TIME, 4 }, { PORT100_IN_PROT_CRM, 0 }, { PORT100_IN_PROT_CRM_MIN_LEN, 0 }, { PORT100_IN_PROT_T1_TAG_FRAME, 0 }, { PORT100_IN_PROT_RFCA, 0 }, { PORT100_IN_PROT_GUARD_TIME_AT_INITIATOR, 6 }, { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_T1T] = { /* nfc_digital_framing_nfca_short */ { PORT100_IN_PROT_ADD_CRC, 2 }, { PORT100_IN_PROT_CHECK_CRC, 2 }, { PORT100_IN_PROT_VALID_BIT_NUMBER, 8 }, { PORT100_IN_PROT_T1_TAG_FRAME, 2 }, { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_T2T] = { /* nfc_digital_framing_nfca_standard */ { PORT100_IN_PROT_ADD_CRC, 1 }, { PORT100_IN_PROT_CHECK_CRC, 0 }, { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_T4T] = { /* nfc_digital_framing_nfca_standard_with_crc_a */ { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_NFC_DEP] = { /* nfc_digital_framing_nfca_standard */ { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCF] = { { PORT100_IN_PROT_INITIAL_GUARD_TIME, 18 }, { PORT100_IN_PROT_ADD_CRC, 1 }, { PORT100_IN_PROT_CHECK_CRC, 1 }, { PORT100_IN_PROT_MULTI_CARD, 0 }, { PORT100_IN_PROT_ADD_PARITY, 0 }, { PORT100_IN_PROT_CHECK_PARITY, 0 }, { PORT100_IN_PROT_BITWISE_AC_RECV_MODE, 0 }, { PORT100_IN_PROT_VALID_BIT_NUMBER, 8 }, { PORT100_IN_PROT_CRYPTO1, 0 }, { PORT100_IN_PROT_ADD_SOF, 0 }, { PORT100_IN_PROT_CHECK_SOF, 0 }, { PORT100_IN_PROT_ADD_EOF, 0 }, { PORT100_IN_PROT_CHECK_EOF, 0 }, { PORT100_IN_PROT_DEAF_TIME, 4 }, { PORT100_IN_PROT_CRM, 0 }, { PORT100_IN_PROT_CRM_MIN_LEN, 0 }, { PORT100_IN_PROT_T1_TAG_FRAME, 0 }, { PORT100_IN_PROT_RFCA, 0 }, { PORT100_IN_PROT_GUARD_TIME_AT_INITIATOR, 6 }, { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCF_T3T] = { /* 
nfc_digital_framing_nfcf */ { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCF_NFC_DEP] = { /* nfc_digital_framing_nfcf */ { PORT100_IN_PROT_INITIAL_GUARD_TIME, 18 }, { PORT100_IN_PROT_ADD_CRC, 1 }, { PORT100_IN_PROT_CHECK_CRC, 1 }, { PORT100_IN_PROT_MULTI_CARD, 0 }, { PORT100_IN_PROT_ADD_PARITY, 0 }, { PORT100_IN_PROT_CHECK_PARITY, 0 }, { PORT100_IN_PROT_BITWISE_AC_RECV_MODE, 0 }, { PORT100_IN_PROT_VALID_BIT_NUMBER, 8 }, { PORT100_IN_PROT_CRYPTO1, 0 }, { PORT100_IN_PROT_ADD_SOF, 0 }, { PORT100_IN_PROT_CHECK_SOF, 0 }, { PORT100_IN_PROT_ADD_EOF, 0 }, { PORT100_IN_PROT_CHECK_EOF, 0 }, { PORT100_IN_PROT_DEAF_TIME, 4 }, { PORT100_IN_PROT_CRM, 0 }, { PORT100_IN_PROT_CRM_MIN_LEN, 0 }, { PORT100_IN_PROT_T1_TAG_FRAME, 0 }, { PORT100_IN_PROT_RFCA, 0 }, { PORT100_IN_PROT_GUARD_TIME_AT_INITIATOR, 6 }, { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFC_DEP_ACTIVATED] = { { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCB] = { { PORT100_IN_PROT_INITIAL_GUARD_TIME, 20 }, { PORT100_IN_PROT_ADD_CRC, 1 }, { PORT100_IN_PROT_CHECK_CRC, 1 }, { PORT100_IN_PROT_MULTI_CARD, 0 }, { PORT100_IN_PROT_ADD_PARITY, 0 }, { PORT100_IN_PROT_CHECK_PARITY, 0 }, { PORT100_IN_PROT_BITWISE_AC_RECV_MODE, 0 }, { PORT100_IN_PROT_VALID_BIT_NUMBER, 8 }, { PORT100_IN_PROT_CRYPTO1, 0 }, { PORT100_IN_PROT_ADD_SOF, 1 }, { PORT100_IN_PROT_CHECK_SOF, 1 }, { PORT100_IN_PROT_ADD_EOF, 1 }, { PORT100_IN_PROT_CHECK_EOF, 1 }, { PORT100_IN_PROT_DEAF_TIME, 4 }, { PORT100_IN_PROT_CRM, 0 }, { PORT100_IN_PROT_CRM_MIN_LEN, 0 }, { PORT100_IN_PROT_T1_TAG_FRAME, 0 }, { PORT100_IN_PROT_RFCA, 0 }, { PORT100_IN_PROT_GUARD_TIME_AT_INITIATOR, 6 }, { PORT100_IN_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCB_T4T] = { /* nfc_digital_framing_nfcb */ { PORT100_IN_PROT_END, 0 }, }, /* Ensures the array has NFC_DIGITAL_FRAMING_LAST elements */ [NFC_DIGITAL_FRAMING_LAST] = { { PORT100_IN_PROT_END, 0 }, }, }; static const struct port100_protocol tg_protocols[][PORT100_TG_MAX_NUM_PROTOCOLS + 1] = { [NFC_DIGITAL_FRAMING_NFCA_SHORT] = { { PORT100_TG_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_STANDARD] = { { PORT100_TG_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_STANDARD_WITH_CRC_A] = { { PORT100_TG_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_T1T] = { { PORT100_TG_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_T2T] = { { PORT100_TG_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCA_NFC_DEP] = { { PORT100_TG_PROT_TU, 1 }, { PORT100_TG_PROT_RF_OFF, 0 }, { PORT100_TG_PROT_CRM, 7 }, { PORT100_TG_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCF] = { { PORT100_TG_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCF_T3T] = { { PORT100_TG_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFCF_NFC_DEP] = { { PORT100_TG_PROT_TU, 1 }, { PORT100_TG_PROT_RF_OFF, 0 }, { PORT100_TG_PROT_CRM, 7 }, { PORT100_TG_PROT_END, 0 }, }, [NFC_DIGITAL_FRAMING_NFC_DEP_ACTIVATED] = { { PORT100_TG_PROT_RF_OFF, 1 }, { PORT100_TG_PROT_END, 0 }, }, /* Ensures the array has NFC_DIGITAL_FRAMING_LAST elements */ [NFC_DIGITAL_FRAMING_LAST] = { { PORT100_TG_PROT_END, 0 }, }, }; struct port100 { struct nfc_digital_dev *nfc_digital_dev; int skb_headroom; int skb_tailroom; struct usb_device *udev; struct usb_interface *interface; struct urb *out_urb; struct urb *in_urb; /* This mutex protects the out_urb and prevents submitting a new command * through port100_send_frame_async() while the previous one is being * canceled through port100_abort_cmd(). */ struct mutex out_urb_lock; struct work_struct cmd_complete_work; u8 cmd_type; /* The digital stack serializes commands to be sent.
There is no need * for any queuing/locking mechanism at driver level. */ struct port100_cmd *cmd; bool cmd_cancel; struct completion cmd_cancel_done; }; struct port100_cmd { u8 code; int status; struct sk_buff *req; struct sk_buff *resp; int resp_len; port100_send_async_complete_t complete_cb; void *complete_cb_context; }; struct port100_frame { u8 preamble; __be16 start_frame; __be16 extended_frame; __le16 datalen; u8 datalen_checksum; u8 data[]; } __packed; struct port100_ack_frame { u8 preamble; __be16 start_frame; __be16 ack_frame; u8 postamble; } __packed; struct port100_cb_arg { nfc_digital_cmd_complete_t complete_cb; void *complete_arg; u8 mdaa; }; struct port100_tg_comm_rf_cmd { __le16 guard_time; __le16 send_timeout; u8 mdaa; u8 nfca_param[6]; u8 nfcf_param[18]; u8 mf_halted; u8 arae_flag; __le16 recv_timeout; u8 data[]; } __packed; struct port100_tg_comm_rf_res { u8 comm_type; u8 ar_status; u8 target_activated; __le32 status; u8 data[]; } __packed; /* The rule: value + checksum = 0 */ static inline u8 port100_checksum(u16 value) { return ~(((u8 *)&value)[0] + ((u8 *)&value)[1]) + 1; } /* The rule: sum(data elements) + checksum = 0 */ static u8 port100_data_checksum(const u8 *data, int datalen) { u8 sum = 0; int i; for (i = 0; i < datalen; i++) sum += data[i]; return port100_checksum(sum); } static void port100_tx_frame_init(void *_frame, u8 cmd_code) { struct port100_frame *frame = _frame; frame->preamble = 0; frame->start_frame = cpu_to_be16(PORT100_FRAME_SOF); frame->extended_frame = cpu_to_be16(PORT100_FRAME_EXT); PORT100_FRAME_DIRECTION(frame) = PORT100_FRAME_DIR_OUT; PORT100_FRAME_CMD(frame) = cmd_code; frame->datalen = cpu_to_le16(2); } static void port100_tx_frame_finish(void *_frame) { struct port100_frame *frame = _frame; frame->datalen_checksum = port100_checksum(le16_to_cpu(frame->datalen)); PORT100_FRAME_CHECKSUM(frame) = port100_data_checksum(frame->data, le16_to_cpu(frame->datalen)); PORT100_FRAME_POSTAMBLE(frame) = 0; } static void port100_tx_update_payload_len(void *_frame, int len) { struct port100_frame *frame = _frame; le16_add_cpu(&frame->datalen, len); } static bool port100_rx_frame_is_valid(const void *_frame) { u8 checksum; const struct port100_frame *frame = _frame; if (frame->start_frame != cpu_to_be16(PORT100_FRAME_SOF) || frame->extended_frame != cpu_to_be16(PORT100_FRAME_EXT)) return false; checksum = port100_checksum(le16_to_cpu(frame->datalen)); if (checksum != frame->datalen_checksum) return false; checksum = port100_data_checksum(frame->data, le16_to_cpu(frame->datalen)); if (checksum != PORT100_FRAME_CHECKSUM(frame)) return false; return true; } static bool port100_rx_frame_is_ack(const struct port100_ack_frame *frame) { return (frame->start_frame == cpu_to_be16(PORT100_FRAME_SOF) && frame->ack_frame == cpu_to_be16(PORT100_FRAME_ACK)); } static inline int port100_rx_frame_size(const void *frame) { const struct port100_frame *f = frame; return sizeof(struct port100_frame) + le16_to_cpu(f->datalen) + PORT100_FRAME_TAIL_LEN; } static bool port100_rx_frame_is_cmd_response(const struct port100 *dev, const void *frame) { const struct port100_frame *f = frame; return (PORT100_FRAME_CMD(f) == PORT100_CMD_RESPONSE(dev->cmd->code)); } static void port100_recv_response(struct urb *urb) { struct port100 *dev = urb->context; struct port100_cmd *cmd = dev->cmd; u8 *in_frame; cmd->status = urb->status; switch (urb->status) { case 0: break; /* success */ case -ECONNRESET: case -ENOENT: nfc_dbg(&dev->interface->dev, "The urb has been canceled (status %d)\n",
urb->status); goto sched_wq; case -ESHUTDOWN: default: nfc_err(&dev->interface->dev, "Urb failure (status %d)\n", urb->status); goto sched_wq; } in_frame = dev->in_urb->transfer_buffer; if (!port100_rx_frame_is_valid(in_frame)) { nfc_err(&dev->interface->dev, "Received an invalid frame\n"); cmd->status = -EIO; goto sched_wq; } print_hex_dump_debug("PORT100 RX: ", DUMP_PREFIX_NONE, 16, 1, in_frame, port100_rx_frame_size(in_frame), false); if (!port100_rx_frame_is_cmd_response(dev, in_frame)) { nfc_err(&dev->interface->dev, "It's not the response to the last command\n"); cmd->status = -EIO; goto sched_wq; } sched_wq: schedule_work(&dev->cmd_complete_work); } static int port100_submit_urb_for_response(const struct port100 *dev, gfp_t flags) { dev->in_urb->complete = port100_recv_response; return usb_submit_urb(dev->in_urb, flags); } static void port100_recv_ack(struct urb *urb) { struct port100 *dev = urb->context; struct port100_cmd *cmd = dev->cmd; const struct port100_ack_frame *in_frame; int rc; cmd->status = urb->status; switch (urb->status) { case 0: break; /* success */ case -ECONNRESET: case -ENOENT: nfc_dbg(&dev->interface->dev, "The urb has been stopped (status %d)\n", urb->status); goto sched_wq; case -ESHUTDOWN: default: nfc_err(&dev->interface->dev, "Urb failure (status %d)\n", urb->status); goto sched_wq; } in_frame = dev->in_urb->transfer_buffer; if (!port100_rx_frame_is_ack(in_frame)) { nfc_err(&dev->interface->dev, "Received an invalid ack\n"); cmd->status = -EIO; goto sched_wq; } rc = port100_submit_urb_for_response(dev, GFP_ATOMIC); if (rc) { nfc_err(&dev->interface->dev, "usb_submit_urb failed with result %d\n", rc); cmd->status = rc; goto sched_wq; } return; sched_wq: schedule_work(&dev->cmd_complete_work); } static int port100_submit_urb_for_ack(const struct port100 *dev, gfp_t flags) { dev->in_urb->complete = port100_recv_ack; return usb_submit_urb(dev->in_urb, flags); } static int port100_send_ack(struct port100 *dev) { int rc = 0; mutex_lock(&dev->out_urb_lock); /* * If a prior cancel is already in flight (dev->cmd_cancel == true), we * can skip sending another cancel frame and just wait for that one to * finish. In rare cases the wait may instead be satisfied by a later * cancel submitted before this wait completed; either way, the waiter * is woken up sooner or later. */ if (!dev->cmd_cancel) { reinit_completion(&dev->cmd_cancel_done); usb_kill_urb(dev->out_urb); dev->out_urb->transfer_buffer = ack_frame; dev->out_urb->transfer_buffer_length = sizeof(ack_frame); rc = usb_submit_urb(dev->out_urb, GFP_KERNEL); /* * Set the cmd_cancel flag only if the URB has been * successfully submitted. It will be reset by the out * URB completion callback port100_send_complete(). */ dev->cmd_cancel = !rc; } mutex_unlock(&dev->out_urb_lock); if (!rc) wait_for_completion(&dev->cmd_cancel_done); return rc; } static int port100_send_frame_async(struct port100 *dev, const struct sk_buff *out, const struct sk_buff *in, int in_len) { int rc; mutex_lock(&dev->out_urb_lock); /* A command cancel frame has been sent through dev->out_urb. Don't try * to submit a new one.
*/ if (dev->cmd_cancel) { rc = -EAGAIN; goto exit; } dev->out_urb->transfer_buffer = out->data; dev->out_urb->transfer_buffer_length = out->len; dev->in_urb->transfer_buffer = in->data; dev->in_urb->transfer_buffer_length = in_len; print_hex_dump_debug("PORT100 TX: ", DUMP_PREFIX_NONE, 16, 1, out->data, out->len, false); rc = usb_submit_urb(dev->out_urb, GFP_KERNEL); if (rc) goto exit; rc = port100_submit_urb_for_ack(dev, GFP_KERNEL); if (rc) usb_kill_urb(dev->out_urb); exit: mutex_unlock(&dev->out_urb_lock); return rc; } static void port100_build_cmd_frame(struct port100 *dev, u8 cmd_code, struct sk_buff *skb) { /* payload is already there, just update datalen */ int payload_len = skb->len; skb_push(skb, PORT100_FRAME_HEADER_LEN); skb_put(skb, PORT100_FRAME_TAIL_LEN); port100_tx_frame_init(skb->data, cmd_code); port100_tx_update_payload_len(skb->data, payload_len); port100_tx_frame_finish(skb->data); } static void port100_send_async_complete(struct port100 *dev) { struct port100_cmd *cmd = dev->cmd; int status = cmd->status; struct sk_buff *req = cmd->req; struct sk_buff *resp = cmd->resp; dev_kfree_skb(req); dev->cmd = NULL; if (status < 0) { cmd->complete_cb(dev, cmd->complete_cb_context, ERR_PTR(status)); dev_kfree_skb(resp); goto done; } skb_put(resp, port100_rx_frame_size(resp->data)); skb_pull(resp, PORT100_FRAME_HEADER_LEN); skb_trim(resp, resp->len - PORT100_FRAME_TAIL_LEN); cmd->complete_cb(dev, cmd->complete_cb_context, resp); done: kfree(cmd); } static int port100_send_cmd_async(struct port100 *dev, u8 cmd_code, struct sk_buff *req, port100_send_async_complete_t complete_cb, void *complete_cb_context) { struct port100_cmd *cmd; struct sk_buff *resp; int rc; int resp_len = PORT100_FRAME_HEADER_LEN + PORT100_FRAME_MAX_PAYLOAD_LEN + PORT100_FRAME_TAIL_LEN; if (dev->cmd) { nfc_err(&dev->interface->dev, "A command is still in process\n"); return -EBUSY; } resp = alloc_skb(resp_len, GFP_KERNEL); if (!resp) return -ENOMEM; cmd = kzalloc(sizeof(*cmd), GFP_KERNEL); if (!cmd) { dev_kfree_skb(resp); return -ENOMEM; } cmd->code = cmd_code; cmd->req = req; cmd->resp = resp; cmd->resp_len = resp_len; cmd->complete_cb = complete_cb; cmd->complete_cb_context = complete_cb_context; port100_build_cmd_frame(dev, cmd_code, req); dev->cmd = cmd; rc = port100_send_frame_async(dev, req, resp, resp_len); if (rc) { kfree(cmd); dev_kfree_skb(resp); dev->cmd = NULL; } return rc; } struct port100_sync_cmd_response { struct sk_buff *resp; struct completion done; }; static void port100_wq_cmd_complete(struct work_struct *work) { struct port100 *dev = container_of(work, struct port100, cmd_complete_work); port100_send_async_complete(dev); } static void port100_send_sync_complete(struct port100 *dev, void *_arg, struct sk_buff *resp) { struct port100_sync_cmd_response *arg = _arg; arg->resp = resp; complete(&arg->done); } static struct sk_buff *port100_send_cmd_sync(struct port100 *dev, u8 cmd_code, struct sk_buff *req) { int rc; struct port100_sync_cmd_response arg; init_completion(&arg.done); rc = port100_send_cmd_async(dev, cmd_code, req, port100_send_sync_complete, &arg); if (rc) { dev_kfree_skb(req); return ERR_PTR(rc); } wait_for_completion(&arg.done); return arg.resp; } static void port100_send_complete(struct urb *urb) { struct port100 *dev = urb->context; if (dev->cmd_cancel) { complete_all(&dev->cmd_cancel_done); dev->cmd_cancel = false; } switch (urb->status) { case 0: break; /* success */ case -ECONNRESET: case -ENOENT: nfc_dbg(&dev->interface->dev, "The urb has been stopped (status %d)\n", 
urb->status); break; case -ESHUTDOWN: default: nfc_err(&dev->interface->dev, "Urb failure (status %d)\n", urb->status); } } static void port100_abort_cmd(struct nfc_digital_dev *ddev) { struct port100 *dev = nfc_digital_get_drvdata(ddev); /* An ack will cancel the last issued command */ port100_send_ack(dev); /* cancel the urb request */ usb_kill_urb(dev->in_urb); } static struct sk_buff *port100_alloc_skb(const struct port100 *dev, unsigned int size) { struct sk_buff *skb; skb = alloc_skb(dev->skb_headroom + dev->skb_tailroom + size, GFP_KERNEL); if (skb) skb_reserve(skb, dev->skb_headroom); return skb; } static int port100_set_command_type(struct port100 *dev, u8 command_type) { struct sk_buff *skb; struct sk_buff *resp; int rc; skb = port100_alloc_skb(dev, 1); if (!skb) return -ENOMEM; skb_put_u8(skb, command_type); resp = port100_send_cmd_sync(dev, PORT100_CMD_SET_COMMAND_TYPE, skb); if (IS_ERR(resp)) return PTR_ERR(resp); rc = resp->data[0]; dev_kfree_skb(resp); return rc; } static u64 port100_get_command_type_mask(struct port100 *dev) { struct sk_buff *skb; struct sk_buff *resp; u64 mask; skb = port100_alloc_skb(dev, 0); if (!skb) return 0; resp = port100_send_cmd_sync(dev, PORT100_CMD_GET_COMMAND_TYPE, skb); if (IS_ERR(resp)) return 0; if (resp->len < 8) mask = 0; else mask = be64_to_cpu(*(__be64 *)resp->data); dev_kfree_skb(resp); return mask; } static u16 port100_get_firmware_version(struct port100 *dev) { struct sk_buff *skb; struct sk_buff *resp; u16 fw_ver; skb = port100_alloc_skb(dev, 0); if (!skb) return 0; resp = port100_send_cmd_sync(dev, PORT100_CMD_GET_FIRMWARE_VERSION, skb); if (IS_ERR(resp)) return 0; fw_ver = le16_to_cpu(*(__le16 *)resp->data); dev_kfree_skb(resp); return fw_ver; } static int port100_switch_rf(struct nfc_digital_dev *ddev, bool on) { struct port100 *dev = nfc_digital_get_drvdata(ddev); struct sk_buff *skb, *resp; skb = port100_alloc_skb(dev, 1); if (!skb) return -ENOMEM; skb_put_u8(skb, on ? 
1 : 0); /* Cancel the last command if the device is being switched off */ if (!on) port100_abort_cmd(ddev); resp = port100_send_cmd_sync(dev, PORT100_CMD_SWITCH_RF, skb); if (IS_ERR(resp)) return PTR_ERR(resp); dev_kfree_skb(resp); return 0; } static int port100_in_set_rf(struct nfc_digital_dev *ddev, u8 rf) { struct port100 *dev = nfc_digital_get_drvdata(ddev); struct sk_buff *skb; struct sk_buff *resp; int rc; if (rf >= NFC_DIGITAL_RF_TECH_LAST) return -EINVAL; skb = port100_alloc_skb(dev, sizeof(struct port100_in_rf_setting)); if (!skb) return -ENOMEM; skb_put_data(skb, &in_rf_settings[rf], sizeof(struct port100_in_rf_setting)); resp = port100_send_cmd_sync(dev, PORT100_CMD_IN_SET_RF, skb); if (IS_ERR(resp)) return PTR_ERR(resp); rc = resp->data[0]; dev_kfree_skb(resp); return rc; } static int port100_in_set_framing(struct nfc_digital_dev *ddev, int param) { struct port100 *dev = nfc_digital_get_drvdata(ddev); const struct port100_protocol *protocols; struct sk_buff *skb; struct sk_buff *resp; int num_protocols; size_t size; int rc; if (param >= NFC_DIGITAL_FRAMING_LAST) return -EINVAL; protocols = in_protocols[param]; num_protocols = 0; while (protocols[num_protocols].number != PORT100_IN_PROT_END) num_protocols++; if (!num_protocols) return 0; size = sizeof(struct port100_protocol) * num_protocols; skb = port100_alloc_skb(dev, size); if (!skb) return -ENOMEM; skb_put_data(skb, protocols, size); resp = port100_send_cmd_sync(dev, PORT100_CMD_IN_SET_PROTOCOL, skb); if (IS_ERR(resp)) return PTR_ERR(resp); rc = resp->data[0]; dev_kfree_skb(resp); return rc; } static int port100_in_configure_hw(struct nfc_digital_dev *ddev, int type, int param) { if (type == NFC_DIGITAL_CONFIG_RF_TECH) return port100_in_set_rf(ddev, param); if (type == NFC_DIGITAL_CONFIG_FRAMING) return port100_in_set_framing(ddev, param); return -EINVAL; } static void port100_in_comm_rf_complete(struct port100 *dev, void *arg, struct sk_buff *resp) { const struct port100_cb_arg *cb_arg = arg; nfc_digital_cmd_complete_t cb = cb_arg->complete_cb; u32 status; int rc; if (IS_ERR(resp)) { rc = PTR_ERR(resp); goto exit; } if (resp->len < 4) { nfc_err(&dev->interface->dev, "Invalid packet length received\n"); rc = -EIO; goto error; } status = le32_to_cpu(*(__le32 *)resp->data); skb_pull(resp, sizeof(u32)); if (status == PORT100_CMD_STATUS_TIMEOUT) { rc = -ETIMEDOUT; goto error; } if (status != PORT100_CMD_STATUS_OK) { nfc_err(&dev->interface->dev, "in_comm_rf failed with status 0x%08x\n", status); rc = -EIO; goto error; } /* Remove collision bits byte */ skb_pull(resp, 1); goto exit; error: kfree_skb(resp); resp = ERR_PTR(rc); exit: cb(dev->nfc_digital_dev, cb_arg->complete_arg, resp); kfree(cb_arg); } static int port100_in_send_cmd(struct nfc_digital_dev *ddev, struct sk_buff *skb, u16 _timeout, nfc_digital_cmd_complete_t cb, void *arg) { struct port100 *dev = nfc_digital_get_drvdata(ddev); struct port100_cb_arg *cb_arg; __le16 timeout; cb_arg = kzalloc(sizeof(struct port100_cb_arg), GFP_KERNEL); if (!cb_arg) return -ENOMEM; cb_arg->complete_cb = cb; cb_arg->complete_arg = arg; timeout = cpu_to_le16(_timeout * 10); memcpy(skb_push(skb, sizeof(__le16)), &timeout, sizeof(__le16)); return port100_send_cmd_async(dev, PORT100_CMD_IN_COMM_RF, skb, port100_in_comm_rf_complete, cb_arg); } static int port100_tg_set_rf(struct nfc_digital_dev *ddev, u8 rf) { struct port100 *dev = nfc_digital_get_drvdata(ddev); struct sk_buff *skb; struct sk_buff *resp; int rc; if (rf >= NFC_DIGITAL_RF_TECH_LAST) return -EINVAL; skb = port100_alloc_skb(dev, 
sizeof(struct port100_tg_rf_setting)); if (!skb) return -ENOMEM; skb_put_data(skb, &tg_rf_settings[rf], sizeof(struct port100_tg_rf_setting)); resp = port100_send_cmd_sync(dev, PORT100_CMD_TG_SET_RF, skb); if (IS_ERR(resp)) return PTR_ERR(resp); rc = resp->data[0]; dev_kfree_skb(resp); return rc; } static int port100_tg_set_framing(struct nfc_digital_dev *ddev, int param) { struct port100 *dev = nfc_digital_get_drvdata(ddev); const struct port100_protocol *protocols; struct sk_buff *skb; struct sk_buff *resp; int rc; int num_protocols; size_t size; if (param >= NFC_DIGITAL_FRAMING_LAST) return -EINVAL; protocols = tg_protocols[param]; num_protocols = 0; while (protocols[num_protocols].number != PORT100_TG_PROT_END) num_protocols++; if (!num_protocols) return 0; size = sizeof(struct port100_protocol) * num_protocols; skb = port100_alloc_skb(dev, size); if (!skb) return -ENOMEM; skb_put_data(skb, protocols, size); resp = port100_send_cmd_sync(dev, PORT100_CMD_TG_SET_PROTOCOL, skb); if (IS_ERR(resp)) return PTR_ERR(resp); rc = resp->data[0]; dev_kfree_skb(resp); return rc; } static int port100_tg_configure_hw(struct nfc_digital_dev *ddev, int type, int param) { if (type == NFC_DIGITAL_CONFIG_RF_TECH) return port100_tg_set_rf(ddev, param); if (type == NFC_DIGITAL_CONFIG_FRAMING) return port100_tg_set_framing(ddev, param); return -EINVAL; } static bool port100_tg_target_activated(struct port100 *dev, u8 tgt_activated) { u8 mask; switch (dev->cmd_type) { case PORT100_CMD_TYPE_0: mask = PORT100_MDAA_TGT_HAS_BEEN_ACTIVATED_MASK; break; case PORT100_CMD_TYPE_1: mask = PORT100_MDAA_TGT_HAS_BEEN_ACTIVATED_MASK | PORT100_MDAA_TGT_WAS_ACTIVATED_MASK; break; default: nfc_err(&dev->interface->dev, "Unknown command type\n"); return false; } return ((tgt_activated & mask) == mask); } static void port100_tg_comm_rf_complete(struct port100 *dev, void *arg, struct sk_buff *resp) { u32 status; const struct port100_cb_arg *cb_arg = arg; nfc_digital_cmd_complete_t cb = cb_arg->complete_cb; struct port100_tg_comm_rf_res *hdr; if (IS_ERR(resp)) goto exit; hdr = (struct port100_tg_comm_rf_res *)resp->data; status = le32_to_cpu(hdr->status); if (cb_arg->mdaa && !port100_tg_target_activated(dev, hdr->target_activated)) { kfree_skb(resp); resp = ERR_PTR(-ETIMEDOUT); goto exit; } skb_pull(resp, sizeof(struct port100_tg_comm_rf_res)); if (status != PORT100_CMD_STATUS_OK) { kfree_skb(resp); if (status == PORT100_CMD_STATUS_TIMEOUT) resp = ERR_PTR(-ETIMEDOUT); else resp = ERR_PTR(-EIO); } exit: cb(dev->nfc_digital_dev, cb_arg->complete_arg, resp); kfree(cb_arg); } static int port100_tg_send_cmd(struct nfc_digital_dev *ddev, struct sk_buff *skb, u16 timeout, nfc_digital_cmd_complete_t cb, void *arg) { struct port100 *dev = nfc_digital_get_drvdata(ddev); struct port100_tg_comm_rf_cmd *hdr; struct port100_cb_arg *cb_arg; cb_arg = kzalloc(sizeof(struct port100_cb_arg), GFP_KERNEL); if (!cb_arg) return -ENOMEM; cb_arg->complete_cb = cb; cb_arg->complete_arg = arg; skb_push(skb, sizeof(struct port100_tg_comm_rf_cmd)); hdr = (struct port100_tg_comm_rf_cmd *)skb->data; memset(hdr, 0, sizeof(struct port100_tg_comm_rf_cmd)); hdr->guard_time = cpu_to_le16(500); hdr->send_timeout = cpu_to_le16(0xFFFF); hdr->recv_timeout = cpu_to_le16(timeout); return port100_send_cmd_async(dev, PORT100_CMD_TG_COMM_RF, skb, port100_tg_comm_rf_complete, cb_arg); } static int port100_listen_mdaa(struct nfc_digital_dev *ddev, struct digital_tg_mdaa_params *params, u16 timeout, nfc_digital_cmd_complete_t cb, void *arg) { struct port100 *dev = 
nfc_digital_get_drvdata(ddev); struct port100_tg_comm_rf_cmd *hdr; struct port100_cb_arg *cb_arg; struct sk_buff *skb; int rc; rc = port100_tg_configure_hw(ddev, NFC_DIGITAL_CONFIG_RF_TECH, NFC_DIGITAL_RF_TECH_106A); if (rc) return rc; rc = port100_tg_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING, NFC_DIGITAL_FRAMING_NFCA_NFC_DEP); if (rc) return rc; cb_arg = kzalloc(sizeof(struct port100_cb_arg), GFP_KERNEL); if (!cb_arg) return -ENOMEM; cb_arg->complete_cb = cb; cb_arg->complete_arg = arg; cb_arg->mdaa = 1; skb = port100_alloc_skb(dev, 0); if (!skb) { kfree(cb_arg); return -ENOMEM; } skb_push(skb, sizeof(struct port100_tg_comm_rf_cmd)); hdr = (struct port100_tg_comm_rf_cmd *)skb->data; memset(hdr, 0, sizeof(struct port100_tg_comm_rf_cmd)); hdr->guard_time = 0; hdr->send_timeout = cpu_to_le16(0xFFFF); hdr->mdaa = 1; hdr->nfca_param[0] = (params->sens_res >> 8) & 0xFF; hdr->nfca_param[1] = params->sens_res & 0xFF; memcpy(hdr->nfca_param + 2, params->nfcid1, 3); hdr->nfca_param[5] = params->sel_res; memcpy(hdr->nfcf_param, params->nfcid2, 8); hdr->nfcf_param[16] = (params->sc >> 8) & 0xFF; hdr->nfcf_param[17] = params->sc & 0xFF; hdr->recv_timeout = cpu_to_le16(timeout); return port100_send_cmd_async(dev, PORT100_CMD_TG_COMM_RF, skb, port100_tg_comm_rf_complete, cb_arg); } static int port100_listen(struct nfc_digital_dev *ddev, u16 timeout, nfc_digital_cmd_complete_t cb, void *arg) { const struct port100 *dev = nfc_digital_get_drvdata(ddev); struct sk_buff *skb; skb = port100_alloc_skb(dev, 0); if (!skb) return -ENOMEM; return port100_tg_send_cmd(ddev, skb, timeout, cb, arg); } static const struct nfc_digital_ops port100_digital_ops = { .in_configure_hw = port100_in_configure_hw, .in_send_cmd = port100_in_send_cmd, .tg_listen_mdaa = port100_listen_mdaa, .tg_listen = port100_listen, .tg_configure_hw = port100_tg_configure_hw, .tg_send_cmd = port100_tg_send_cmd, .switch_rf = port100_switch_rf, .abort_cmd = port100_abort_cmd, }; static const struct usb_device_id port100_table[] = { { USB_DEVICE(SONY_VENDOR_ID, RCS380S_PRODUCT_ID), }, { USB_DEVICE(SONY_VENDOR_ID, RCS380P_PRODUCT_ID), }, { } }; MODULE_DEVICE_TABLE(usb, port100_table); static int port100_probe(struct usb_interface *interface, const struct usb_device_id *id) { struct port100 *dev; int rc; struct usb_host_interface *iface_desc; struct usb_endpoint_descriptor *endpoint; int in_endpoint; int out_endpoint; u16 fw_version; u64 cmd_type_mask; int i; dev = devm_kzalloc(&interface->dev, sizeof(struct port100), GFP_KERNEL); if (!dev) return -ENOMEM; mutex_init(&dev->out_urb_lock); dev->udev = usb_get_dev(interface_to_usbdev(interface)); dev->interface = interface; usb_set_intfdata(interface, dev); in_endpoint = out_endpoint = 0; iface_desc = interface->cur_altsetting; for (i = 0; i < iface_desc->desc.bNumEndpoints; ++i) { endpoint = &iface_desc->endpoint[i].desc; if (!in_endpoint && usb_endpoint_is_bulk_in(endpoint)) in_endpoint = endpoint->bEndpointAddress; if (!out_endpoint && usb_endpoint_is_bulk_out(endpoint)) out_endpoint = endpoint->bEndpointAddress; } if (!in_endpoint || !out_endpoint) { nfc_err(&interface->dev, "Could not find bulk-in or bulk-out endpoint\n"); rc = -ENODEV; goto error; } dev->in_urb = usb_alloc_urb(0, GFP_KERNEL); dev->out_urb = usb_alloc_urb(0, GFP_KERNEL); if (!dev->in_urb || !dev->out_urb) { nfc_err(&interface->dev, "Could not allocate USB URBs\n"); rc = -ENOMEM; goto error; } usb_fill_bulk_urb(dev->in_urb, dev->udev, usb_rcvbulkpipe(dev->udev, in_endpoint), NULL, 0, NULL, dev); usb_fill_bulk_urb(dev->out_urb, 
dev->udev, usb_sndbulkpipe(dev->udev, out_endpoint), NULL, 0, port100_send_complete, dev); dev->out_urb->transfer_flags = URB_ZERO_PACKET; dev->skb_headroom = PORT100_FRAME_HEADER_LEN + PORT100_COMM_RF_HEAD_MAX_LEN; dev->skb_tailroom = PORT100_FRAME_TAIL_LEN; init_completion(&dev->cmd_cancel_done); INIT_WORK(&dev->cmd_complete_work, port100_wq_cmd_complete); /* The first thing to do with the Port-100 is to set the command type * to be used. If supported, we use command type 1, and fall back to * type 0 otherwise. */ cmd_type_mask = port100_get_command_type_mask(dev); if (!cmd_type_mask) { nfc_err(&interface->dev, "Could not get supported command types\n"); rc = -ENODEV; goto error; } if (PORT100_CMD_TYPE_IS_SUPPORTED(cmd_type_mask, PORT100_CMD_TYPE_1)) dev->cmd_type = PORT100_CMD_TYPE_1; else dev->cmd_type = PORT100_CMD_TYPE_0; rc = port100_set_command_type(dev, dev->cmd_type); if (rc) { nfc_err(&interface->dev, "The device does not support command type %u\n", dev->cmd_type); goto error; } fw_version = port100_get_firmware_version(dev); if (!fw_version) nfc_err(&interface->dev, "Could not get device firmware version\n"); nfc_info(&interface->dev, "Sony NFC Port-100 Series attached (firmware v%x.%02x)\n", (fw_version & 0xFF00) >> 8, fw_version & 0xFF); dev->nfc_digital_dev = nfc_digital_allocate_device(&port100_digital_ops, PORT100_PROTOCOLS, PORT100_CAPABILITIES, dev->skb_headroom, dev->skb_tailroom); if (!dev->nfc_digital_dev) { nfc_err(&interface->dev, "Could not allocate nfc_digital_dev\n"); rc = -ENOMEM; goto error; } nfc_digital_set_parent_dev(dev->nfc_digital_dev, &interface->dev); nfc_digital_set_drvdata(dev->nfc_digital_dev, dev); rc = nfc_digital_register_device(dev->nfc_digital_dev); if (rc) { nfc_err(&interface->dev, "Could not register digital device\n"); goto free_nfc_dev; } return 0; free_nfc_dev: nfc_digital_free_device(dev->nfc_digital_dev); error: usb_kill_urb(dev->in_urb); usb_free_urb(dev->in_urb); usb_kill_urb(dev->out_urb); usb_free_urb(dev->out_urb); usb_put_dev(dev->udev); return rc; } static void port100_disconnect(struct usb_interface *interface) { struct port100 *dev; dev = usb_get_intfdata(interface); usb_set_intfdata(interface, NULL); nfc_digital_unregister_device(dev->nfc_digital_dev); nfc_digital_free_device(dev->nfc_digital_dev); usb_kill_urb(dev->in_urb); usb_kill_urb(dev->out_urb); usb_free_urb(dev->in_urb); usb_free_urb(dev->out_urb); usb_put_dev(dev->udev); kfree(dev->cmd); nfc_info(&interface->dev, "Sony Port-100 NFC device disconnected\n"); } static struct usb_driver port100_driver = { .name = "port100", .probe = port100_probe, .disconnect = port100_disconnect, .id_table = port100_table, }; module_usb_driver(port100_driver); MODULE_DESCRIPTION("NFC Port-100 series usb driver ver " VERSION); MODULE_VERSION(VERSION); MODULE_LICENSE("GPL");
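/*
 * A minimal, standalone sketch of the Port-100 checksum rule implemented
 * by port100_checksum() and port100_data_checksum() above: every checksum
 * byte is chosen so that the bytes it covers, plus the checksum itself,
 * sum to zero modulo 256. The demo_* names are illustrative only and are
 * not part of the driver; this compiles as plain userspace C.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint8_t demo_checksum(uint16_t value)
{
	/* Same arithmetic as port100_checksum(): negate the byte sum. */
	return (uint8_t)(~(uint8_t)((value & 0xff) + (value >> 8)) + 1);
}

static uint8_t demo_data_checksum(const uint8_t *data, int datalen)
{
	uint8_t sum = 0;
	int i;

	for (i = 0; i < datalen; i++)
		sum += data[i];
	return demo_checksum(sum);
}

int main(void)
{
	uint8_t payload[] = { 0xd6, 0x20 };	/* CC + SCC of a command */
	uint8_t dcs = demo_data_checksum(payload, sizeof(payload));
	uint8_t sum = dcs;
	size_t i;

	for (i = 0; i < sizeof(payload); i++)
		sum += payload[i];
	assert(sum == 0);	/* payload bytes + DCS wrap to zero */
	printf("DCS = 0x%02x\n", (unsigned int)dcs);
	return 0;
}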
// SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2022 Pablo Neira Ayuso <pablo@netfilter.org> */ #include <linux/kernel.h> #include <linux/if_vlan.h> #include <linux/init.h> #include <linux/module.h> #include <linux/netlink.h> #include <linux/netfilter.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_tables_core.h> #include <net/netfilter/nf_tables.h> #include <net/netfilter/nft_meta.h> #include <net/netfilter/nf_tables_offload.h> #include <linux/tcp.h> #include <linux/udp.h> #include <net/gre.h> #include <net/geneve.h> #include <net/ip.h> #include <linux/icmpv6.h> #include <linux/ip.h> #include <linux/ipv6.h> static DEFINE_PER_CPU(struct nft_inner_tun_ctx, nft_pcpu_tun_ctx); /* Same layout as nft_expr but it embeds the private expression data area.
*/ struct __nft_expr { const struct nft_expr_ops *ops; union { struct nft_payload payload; struct nft_meta meta; } __attribute__((aligned(__alignof__(u64)))); }; enum { NFT_INNER_EXPR_PAYLOAD, NFT_INNER_EXPR_META, }; struct nft_inner { u8 flags; u8 hdrsize; u8 type; u8 expr_type; struct __nft_expr expr; }; static int nft_inner_parse_l2l3(const struct nft_inner *priv, const struct nft_pktinfo *pkt, struct nft_inner_tun_ctx *ctx, u32 off) { __be16 llproto, outer_llproto; u32 nhoff, thoff; if (priv->flags & NFT_INNER_LL) { struct vlan_ethhdr *veth, _veth; struct ethhdr *eth, _eth; u32 hdrsize; eth = skb_header_pointer(pkt->skb, off, sizeof(_eth), &_eth); if (!eth) return -1; switch (eth->h_proto) { case htons(ETH_P_IP): case htons(ETH_P_IPV6): llproto = eth->h_proto; hdrsize = sizeof(_eth); break; case htons(ETH_P_8021Q): veth = skb_header_pointer(pkt->skb, off, sizeof(_veth), &_veth); if (!veth) return -1; outer_llproto = veth->h_vlan_encapsulated_proto; llproto = veth->h_vlan_proto; hdrsize = sizeof(_veth); break; default: return -1; } ctx->inner_lloff = off; ctx->flags |= NFT_PAYLOAD_CTX_INNER_LL; off += hdrsize; } else { struct iphdr *iph; u32 _version; iph = skb_header_pointer(pkt->skb, off, sizeof(_version), &_version); if (!iph) return -1; switch (iph->version) { case 4: llproto = htons(ETH_P_IP); break; case 6: llproto = htons(ETH_P_IPV6); break; default: return -1; } } ctx->llproto = llproto; if (llproto == htons(ETH_P_8021Q)) llproto = outer_llproto; nhoff = off; switch (llproto) { case htons(ETH_P_IP): { struct iphdr *iph, _iph; iph = skb_header_pointer(pkt->skb, nhoff, sizeof(_iph), &_iph); if (!iph) return -1; if (iph->ihl < 5 || iph->version != 4) return -1; ctx->inner_nhoff = nhoff; ctx->flags |= NFT_PAYLOAD_CTX_INNER_NH; thoff = nhoff + (iph->ihl * 4); if ((ntohs(iph->frag_off) & IP_OFFSET) == 0) { ctx->flags |= NFT_PAYLOAD_CTX_INNER_TH; ctx->inner_thoff = thoff; ctx->l4proto = iph->protocol; } } break; case htons(ETH_P_IPV6): { struct ipv6hdr *ip6h, _ip6h; int fh_flags = IP6_FH_F_AUTH; unsigned short fragoff; int l4proto; ip6h = skb_header_pointer(pkt->skb, nhoff, sizeof(_ip6h), &_ip6h); if (!ip6h) return -1; if (ip6h->version != 6) return -1; ctx->inner_nhoff = nhoff; ctx->flags |= NFT_PAYLOAD_CTX_INNER_NH; thoff = nhoff; l4proto = ipv6_find_hdr(pkt->skb, &thoff, -1, &fragoff, &fh_flags); if (l4proto < 0 || thoff > U16_MAX) return -1; if (fragoff == 0) { thoff = nhoff + sizeof(_ip6h); ctx->flags |= NFT_PAYLOAD_CTX_INNER_TH; ctx->inner_thoff = thoff; ctx->l4proto = l4proto; } } break; default: return -1; } return 0; } static int nft_inner_parse_tunhdr(const struct nft_inner *priv, const struct nft_pktinfo *pkt, struct nft_inner_tun_ctx *ctx, u32 *off) { if (pkt->tprot == IPPROTO_GRE) { ctx->inner_tunoff = pkt->thoff; ctx->flags |= NFT_PAYLOAD_CTX_INNER_TUN; return 0; } if (pkt->tprot != IPPROTO_UDP) return -1; ctx->inner_tunoff = *off; ctx->flags |= NFT_PAYLOAD_CTX_INNER_TUN; *off += priv->hdrsize; switch (priv->type) { case NFT_INNER_GENEVE: { struct genevehdr *gnvh, _gnvh; gnvh = skb_header_pointer(pkt->skb, pkt->inneroff, sizeof(_gnvh), &_gnvh); if (!gnvh) return -1; *off += gnvh->opt_len * 4; } break; default: break; } return 0; } static int nft_inner_parse(const struct nft_inner *priv, struct nft_pktinfo *pkt, struct nft_inner_tun_ctx *tun_ctx) { u32 off = pkt->inneroff; if (priv->flags & NFT_INNER_HDRSIZE && nft_inner_parse_tunhdr(priv, pkt, tun_ctx, &off) < 0) return -1; if (priv->flags & (NFT_INNER_LL | NFT_INNER_NH)) { if (nft_inner_parse_l2l3(priv, pkt, tun_ctx, 
off) < 0) return -1; } else if (priv->flags & NFT_INNER_TH) { tun_ctx->inner_thoff = off; tun_ctx->flags |= NFT_PAYLOAD_CTX_INNER_TH; } tun_ctx->type = priv->type; tun_ctx->cookie = (unsigned long)pkt->skb; pkt->flags |= NFT_PKTINFO_INNER_FULL; return 0; } static bool nft_inner_restore_tun_ctx(const struct nft_pktinfo *pkt, struct nft_inner_tun_ctx *tun_ctx) { struct nft_inner_tun_ctx *this_cpu_tun_ctx; local_bh_disable(); this_cpu_tun_ctx = this_cpu_ptr(&nft_pcpu_tun_ctx); if (this_cpu_tun_ctx->cookie != (unsigned long)pkt->skb) { local_bh_enable(); return false; } *tun_ctx = *this_cpu_tun_ctx; local_bh_enable(); return true; } static void nft_inner_save_tun_ctx(const struct nft_pktinfo *pkt, const struct nft_inner_tun_ctx *tun_ctx) { struct nft_inner_tun_ctx *this_cpu_tun_ctx; local_bh_disable(); this_cpu_tun_ctx = this_cpu_ptr(&nft_pcpu_tun_ctx); if (this_cpu_tun_ctx->cookie != tun_ctx->cookie) *this_cpu_tun_ctx = *tun_ctx; local_bh_enable(); } static bool nft_inner_parse_needed(const struct nft_inner *priv, const struct nft_pktinfo *pkt, struct nft_inner_tun_ctx *tun_ctx) { if (!(pkt->flags & NFT_PKTINFO_INNER_FULL)) return true; if (!nft_inner_restore_tun_ctx(pkt, tun_ctx)) return true; if (priv->type != tun_ctx->type) return true; return false; } static void nft_inner_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) { const struct nft_inner *priv = nft_expr_priv(expr); struct nft_inner_tun_ctx tun_ctx = {}; if (nft_payload_inner_offset(pkt) < 0) goto err; if (nft_inner_parse_needed(priv, pkt, &tun_ctx) && nft_inner_parse(priv, (struct nft_pktinfo *)pkt, &tun_ctx) < 0) goto err; switch (priv->expr_type) { case NFT_INNER_EXPR_PAYLOAD: nft_payload_inner_eval((struct nft_expr *)&priv->expr, regs, pkt, &tun_ctx); break; case NFT_INNER_EXPR_META: nft_meta_inner_eval((struct nft_expr *)&priv->expr, regs, pkt, &tun_ctx); break; default: WARN_ON_ONCE(1); goto err; } nft_inner_save_tun_ctx(pkt, &tun_ctx); return; err: regs->verdict.code = NFT_BREAK; } static const struct nla_policy nft_inner_policy[NFTA_INNER_MAX + 1] = { [NFTA_INNER_NUM] = { .type = NLA_U32 }, [NFTA_INNER_FLAGS] = { .type = NLA_U32 }, [NFTA_INNER_HDRSIZE] = { .type = NLA_U32 }, [NFTA_INNER_TYPE] = { .type = NLA_U32 }, [NFTA_INNER_EXPR] = { .type = NLA_NESTED }, }; struct nft_expr_info { const struct nft_expr_ops *ops; const struct nlattr *attr; struct nlattr *tb[NFT_EXPR_MAXATTR + 1]; }; static int nft_inner_init(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nlattr * const tb[]) { struct nft_inner *priv = nft_expr_priv(expr); u32 flags, hdrsize, type, num; struct nft_expr_info expr_info; int err; if (!tb[NFTA_INNER_FLAGS] || !tb[NFTA_INNER_NUM] || !tb[NFTA_INNER_HDRSIZE] || !tb[NFTA_INNER_TYPE] || !tb[NFTA_INNER_EXPR]) return -EINVAL; flags = ntohl(nla_get_be32(tb[NFTA_INNER_FLAGS])); if (flags & ~NFT_INNER_MASK) return -EOPNOTSUPP; num = ntohl(nla_get_be32(tb[NFTA_INNER_NUM])); if (num != 0) return -EOPNOTSUPP; hdrsize = ntohl(nla_get_be32(tb[NFTA_INNER_HDRSIZE])); type = ntohl(nla_get_be32(tb[NFTA_INNER_TYPE])); if (type > U8_MAX) return -EINVAL; if (flags & NFT_INNER_HDRSIZE) { if (hdrsize == 0 || hdrsize > 64) return -EOPNOTSUPP; } priv->flags = flags; priv->hdrsize = hdrsize; priv->type = type; err = nft_expr_inner_parse(ctx, tb[NFTA_INNER_EXPR], &expr_info); if (err < 0) return err; priv->expr.ops = expr_info.ops; if (!strcmp(expr_info.ops->type->name, "payload")) priv->expr_type = NFT_INNER_EXPR_PAYLOAD; else if (!strcmp(expr_info.ops->type->name, "meta")) 
priv->expr_type = NFT_INNER_EXPR_META; else return -EINVAL; err = expr_info.ops->init(ctx, (struct nft_expr *)&priv->expr, (const struct nlattr * const*)expr_info.tb); if (err < 0) return err; return 0; } static int nft_inner_dump(struct sk_buff *skb, const struct nft_expr *expr, bool reset) { const struct nft_inner *priv = nft_expr_priv(expr); if (nla_put_be32(skb, NFTA_INNER_NUM, htonl(0)) || nla_put_be32(skb, NFTA_INNER_TYPE, htonl(priv->type)) || nla_put_be32(skb, NFTA_INNER_FLAGS, htonl(priv->flags)) || nla_put_be32(skb, NFTA_INNER_HDRSIZE, htonl(priv->hdrsize))) goto nla_put_failure; if (nft_expr_dump(skb, NFTA_INNER_EXPR, (struct nft_expr *)&priv->expr, reset) < 0) goto nla_put_failure; return 0; nla_put_failure: return -1; } static const struct nft_expr_ops nft_inner_ops = { .type = &nft_inner_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_inner)), .eval = nft_inner_eval, .init = nft_inner_init, .dump = nft_inner_dump, }; struct nft_expr_type nft_inner_type __read_mostly = { .name = "inner", .ops = &nft_inner_ops, .policy = nft_inner_policy, .maxattr = NFTA_INNER_MAX, .owner = THIS_MODULE, };
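/*
 * A single-threaded userspace sketch of the cookie-keyed cache pattern
 * behind nft_inner_save_tun_ctx()/nft_inner_restore_tun_ctx() above: a
 * parse result is reused only while the cached cookie still identifies
 * the packet it was derived from; otherwise the caller must reparse.
 * Names are illustrative; the kernel version uses a per-CPU slot and
 * disables bottom halves around the access, both omitted here.
 */
#include <stdbool.h>
#include <stdio.h>

struct demo_tun_ctx {
	unsigned long cookie;		/* identity of the parsed packet */
	unsigned int inner_thoff;	/* cached parse result */
};

static struct demo_tun_ctx demo_cache;	/* stands in for the per-CPU slot */

static bool demo_restore(unsigned long cookie, struct demo_tun_ctx *out)
{
	if (demo_cache.cookie != cookie)
		return false;	/* cache belongs to another packet */
	*out = demo_cache;
	return true;
}

static void demo_save(const struct demo_tun_ctx *ctx)
{
	if (demo_cache.cookie != ctx->cookie)
		demo_cache = *ctx;	/* only copy when stale, as above */
}

int main(void)
{
	struct demo_tun_ctx ctx = { .cookie = 0x1234, .inner_thoff = 50 };
	struct demo_tun_ctx hit;

	demo_save(&ctx);
	printf("same packet: %s\n", demo_restore(0x1234, &hit) ? "hit" : "miss");
	printf("new packet:  %s\n", demo_restore(0x5678, &hit) ? "hit" : "miss");
	return 0;
}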
// SPDX-License-Identifier: GPL-2.0-or-later /* * HID driver for some chicony "special" devices * * Copyright (c) 1999 Andreas Gal * Copyright (c) 2000-2005 Vojtech Pavlik <vojtech@suse.cz> * Copyright (c) 2005 Michael Haboustak <mike-@cinci.rr.com> for Concept2, Inc * Copyright (c) 2006-2007 Jiri Kosina * Copyright (c) 2007 Paul Walmsley * Copyright (c) 2008 Jiri Slaby */ /* */ #include <linux/device.h> #include <linux/input.h> #include <linux/hid.h> #include <linux/module.h> #include <linux/usb.h> #include "hid-ids.h" #define CH_WIRELESS_CTL_REPORT_ID 0x11 static int ch_report_wireless(struct hid_report *report, u8 *data, int size) { struct hid_device *hdev = report->device; struct input_dev *input; if (report->id != CH_WIRELESS_CTL_REPORT_ID || report->maxfield != 1) return 0; input = report->field[0]->hidinput->input; if (!input) { hid_warn(hdev, "can't find wireless radio control's input"); return 0; } input_report_key(input, KEY_RFKILL, 1); input_sync(input); input_report_key(input, KEY_RFKILL, 0); input_sync(input); return 1; } static int ch_raw_event(struct hid_device *hdev, struct hid_report *report, u8 *data, int size) { if (report->application == HID_GD_WIRELESS_RADIO_CTLS) return ch_report_wireless(report, data, size); return 0; } #define ch_map_key_clear(c) hid_map_usage_clear(hi, usage, bit, max, \ EV_KEY, (c)) static int ch_input_mapping(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) { if ((usage->hid & HID_USAGE_PAGE) != HID_UP_MSVENDOR) return 0; set_bit(EV_REP, hi->input->evbit); switch (usage->hid & HID_USAGE) { case 0xff01: ch_map_key_clear(BTN_1); break; case 0xff02: ch_map_key_clear(BTN_2); break; case 0xff03: ch_map_key_clear(BTN_3); break; case 0xff04: ch_map_key_clear(BTN_4); break; case 0xff05: ch_map_key_clear(BTN_5); break; case 0xff06: ch_map_key_clear(BTN_6); break; case 0xff07: ch_map_key_clear(BTN_7); break; case 0xff08: ch_map_key_clear(BTN_8); break; case 0xff09: ch_map_key_clear(BTN_9); break; case 0xff0a: ch_map_key_clear(BTN_A); break; case 0xff0b: ch_map_key_clear(BTN_B); break; case 0x00f1: ch_map_key_clear(KEY_WLAN); break; case 0x00f2: ch_map_key_clear(KEY_BRIGHTNESSDOWN); break; case 0x00f3: ch_map_key_clear(KEY_BRIGHTNESSUP); break; case 0x00f4: ch_map_key_clear(KEY_DISPLAY_OFF); break; case 0x00f7: ch_map_key_clear(KEY_CAMERA); break; case 0x00f8: ch_map_key_clear(KEY_PROG1); break; default: return 0; } return 1; } static const __u8 *ch_switch12_report_fixup(struct hid_device *hdev, __u8 *rdesc, unsigned int *rsize) { struct usb_interface *intf = to_usb_interface(hdev->dev.parent); if (intf->cur_altsetting->desc.bInterfaceNumber == 1) { /* Change usage maximum and logical maximum from 0x7fff to * 0x2fff, so they don't exceed HID_MAX_USAGES */ switch (hdev->product) { case USB_DEVICE_ID_CHICONY_ACER_SWITCH12: if (*rsize >= 128 && rdesc[64] == 0xff && rdesc[65] == 0x7f && rdesc[69] == 0xff && rdesc[70] == 0x7f) { hid_info(hdev, "Fixing up report descriptor\n"); rdesc[65] = rdesc[70] = 0x2f; } break; } } return rdesc; } static int ch_probe(struct hid_device *hdev, const struct hid_device_id *id) { int ret; if (!hid_is_usb(hdev)) return -EINVAL; hdev->quirks |= HID_QUIRK_INPUT_PER_APP; ret = hid_parse(hdev); if (ret) { hid_err(hdev, "Chicony hid parse failed: %d\n", ret); return ret; } ret = hid_hw_start(hdev, HID_CONNECT_DEFAULT); if (ret) { hid_err(hdev, "Chicony hw start failed: %d\n", ret); return ret; } return 0; } static const struct hid_device_id ch_devices[] = { { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_TACTICAL_PAD) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS2) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS3) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_ACER_SWITCH12) }, { } }; MODULE_DEVICE_TABLE(hid, ch_devices); static struct hid_driver ch_driver = { .name = "chicony", .id_table = ch_devices, .report_fixup = ch_switch12_report_fixup, .input_mapping = ch_input_mapping, .probe = ch_probe, .raw_event = ch_raw_event, }; module_hid_driver(ch_driver); MODULE_DESCRIPTION("HID driver for some chicony \"special\" devices"); MODULE_LICENSE("GPL");
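/*
 * A userspace sketch of the byte patch done by ch_switch12_report_fixup()
 * above: if the descriptor bytes at the expected offsets still encode a
 * usage/logical maximum of 0x7fff, the high bytes are rewritten so the
 * maximum becomes 0x2fff and stays below HID_MAX_USAGES. Offsets mirror
 * the driver's check; the demo_* names are illustrative.
 */
#include <stdint.h>
#include <stdio.h>

static void demo_fixup(uint8_t *rdesc, unsigned int rsize)
{
	if (rsize >= 128 && rdesc[64] == 0xff && rdesc[65] == 0x7f &&
	    rdesc[69] == 0xff && rdesc[70] == 0x7f)
		rdesc[65] = rdesc[70] = 0x2f;	/* 0x7fff -> 0x2fff */
}

int main(void)
{
	uint8_t rdesc[128] = { 0 };

	rdesc[64] = rdesc[69] = 0xff;	/* low bytes of both maxima */
	rdesc[65] = rdesc[70] = 0x7f;	/* high bytes: 0x7fff */
	demo_fixup(rdesc, sizeof(rdesc));
	printf("maximum now 0x%02x%02x\n",
	       (unsigned int)rdesc[65], (unsigned int)rdesc[64]);
	return 0;
}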
// SPDX-License-Identifier: GPL-2.0 /* * Device physical location support * * Author: Won Chung <wonchung@google.com> */ #include <linux/acpi.h> #include <linux/sysfs.h> #include "physical_location.h" bool dev_add_physical_location(struct device *dev) { struct acpi_pld_info *pld; if (!has_acpi_companion(dev)) return false; if (!acpi_get_physical_device_location(ACPI_HANDLE(dev), &pld)) return false; dev->physical_location = kzalloc(sizeof(*dev->physical_location), GFP_KERNEL); if (!dev->physical_location) { ACPI_FREE(pld); return false; } dev->physical_location->panel = pld->panel; dev->physical_location->vertical_position = pld->vertical_position; dev->physical_location->horizontal_position = pld->horizontal_position; dev->physical_location->dock = pld->dock; dev->physical_location->lid = pld->lid; ACPI_FREE(pld); return true; } static ssize_t panel_show(struct device *dev, struct device_attribute *attr, char *buf) { const char *panel; switch (dev->physical_location->panel) { case DEVICE_PANEL_TOP: panel = "top"; break; case DEVICE_PANEL_BOTTOM: panel = "bottom"; break; case DEVICE_PANEL_LEFT: panel = "left"; break; case DEVICE_PANEL_RIGHT: panel = "right"; break; case DEVICE_PANEL_FRONT: panel = "front"; break; case DEVICE_PANEL_BACK: panel = "back"; break; default: panel = "unknown"; } return sysfs_emit(buf, "%s\n", panel); } static DEVICE_ATTR_RO(panel); static ssize_t vertical_position_show(struct device *dev, struct device_attribute *attr, char *buf) { const char *vertical_position; switch (dev->physical_location->vertical_position) { case DEVICE_VERT_POS_UPPER: vertical_position = "upper"; break; case DEVICE_VERT_POS_CENTER: vertical_position = "center"; break; case DEVICE_VERT_POS_LOWER: vertical_position = "lower"; break; default: vertical_position = "unknown"; } return sysfs_emit(buf, "%s\n", vertical_position); } static DEVICE_ATTR_RO(vertical_position); static ssize_t horizontal_position_show(struct device *dev, struct device_attribute *attr, char *buf) { const char *horizontal_position; switch (dev->physical_location->horizontal_position) { case DEVICE_HORI_POS_LEFT: horizontal_position = "left"; break; case DEVICE_HORI_POS_CENTER: horizontal_position = "center"; break; case DEVICE_HORI_POS_RIGHT: horizontal_position = "right"; break; default: horizontal_position = "unknown"; } return sysfs_emit(buf, "%s\n", horizontal_position); } static DEVICE_ATTR_RO(horizontal_position); static ssize_t dock_show(struct device *dev, struct device_attribute *attr, char *buf) { return sysfs_emit(buf, "%s\n", dev->physical_location->dock ? "yes" : "no"); } static DEVICE_ATTR_RO(dock); static ssize_t lid_show(struct device *dev, struct device_attribute *attr, char *buf) { return sysfs_emit(buf, "%s\n", dev->physical_location->lid ? "yes" : "no"); } static DEVICE_ATTR_RO(lid); static struct attribute *dev_attr_physical_location[] = { &dev_attr_panel.attr, &dev_attr_vertical_position.attr, &dev_attr_horizontal_position.attr, &dev_attr_dock.attr, &dev_attr_lid.attr, NULL, }; const struct attribute_group dev_attr_physical_location_group = { .name = "physical_location", .attrs = dev_attr_physical_location, };
// SPDX-License-Identifier: GPL-2.0-or-later /* * Squashfs - a compressed read only filesystem for Linux * * Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, 2008 * Phillip Lougher <phillip@squashfs.org.uk> * * fragment.c */ /* * This file implements code to handle compressed fragments (tail-end packed * datablocks). * * Regular files contain a fragment index which is mapped to a fragment * location on disk and compressed size using a fragment lookup table. * Like everything in Squashfs this fragment lookup table is itself stored * compressed into metadata blocks. A second index table is used to locate * these. For speed of access (and because it is small), this second index * table is read at mount time and cached in memory. */ #include <linux/fs.h> #include <linux/vfs.h> #include <linux/slab.h> #include "squashfs_fs.h" #include "squashfs_fs_sb.h" #include "squashfs.h" /* * Look-up fragment using the fragment index table. Return the on-disk * location of the fragment and its compressed size */ int squashfs_frag_lookup(struct super_block *sb, unsigned int fragment, u64 *fragment_block) { struct squashfs_sb_info *msblk = sb->s_fs_info; int block, offset, size; struct squashfs_fragment_entry fragment_entry; u64 start_block; if (fragment >= msblk->fragments) return -EIO; block = SQUASHFS_FRAGMENT_INDEX(fragment); offset = SQUASHFS_FRAGMENT_INDEX_OFFSET(fragment); start_block = le64_to_cpu(msblk->fragment_index[block]); size = squashfs_read_metadata(sb, &fragment_entry, &start_block, &offset, sizeof(fragment_entry)); if (size < 0) return size; *fragment_block = le64_to_cpu(fragment_entry.start_block); return squashfs_block_size(fragment_entry.size); } /* * Read the uncompressed fragment lookup table indexes off disk into memory */ __le64 *squashfs_read_fragment_index_table(struct super_block *sb, u64 fragment_table_start, u64 next_table, unsigned int fragments) { unsigned int length = SQUASHFS_FRAGMENT_INDEX_BYTES(fragments); __le64 *table; /* * Sanity check, length bytes should not extend into the next table - * this check also traps instances where fragment_table_start is * incorrectly larger than the next table start */ if (fragment_table_start + length > next_table) return ERR_PTR(-EINVAL); table = squashfs_read_table(sb, fragment_table_start, length); /* * table[0] points to the first fragment table metadata block, this * should be less than fragment_table_start */ if (!IS_ERR(table) && le64_to_cpu(table[0]) >= fragment_table_start) { kfree(table); return ERR_PTR(-EINVAL); } return table; }
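/*
 * A sketch of the index/offset split behind SQUASHFS_FRAGMENT_INDEX()
 * and SQUASHFS_FRAGMENT_INDEX_OFFSET() used in squashfs_frag_lookup()
 * above, assuming the usual on-disk sizes: 8192-byte metadata blocks
 * and 16-byte fragment entries, i.e. 512 entries per metadata block.
 * Check squashfs_fs.h before relying on the exact constants.
 */
#include <stdio.h>

#define DEMO_METADATA_SIZE	8192
#define DEMO_ENTRY_SIZE		16
#define DEMO_ENTRIES_PER_BLOCK	(DEMO_METADATA_SIZE / DEMO_ENTRY_SIZE)

int main(void)
{
	unsigned int fragment = 1000;	/* arbitrary fragment number */
	unsigned int block = fragment / DEMO_ENTRIES_PER_BLOCK;
	unsigned int offset = (fragment % DEMO_ENTRIES_PER_BLOCK) *
			      DEMO_ENTRY_SIZE;

	/* fragment 1000 -> index block 1, byte offset 7808 within it */
	printf("block %u, offset %u\n", block, offset);
	return 0;
}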
// SPDX-License-Identifier: GPL-2.0 /* * linux/fs/hfsplus/wrapper.c * * Copyright (C) 2001 * Brad Boyer (flar@allandria.com) * (C) 2003 Ardis Technologies <roman@ardistech.com> * * Handling of HFS wrappers around HFS+ volumes */ #include <linux/fs.h> #include <linux/blkdev.h> #include <linux/cdrom.h> #include <linux/unaligned.h> #include "hfsplus_fs.h" #include "hfsplus_raw.h" struct hfsplus_wd { u32 ablk_size; u16 ablk_start; u16 embed_start; u16 embed_count; }; /** * hfsplus_submit_bio - Perform block I/O * @sb: super block of volume for I/O * @sector: block to read or write, for blocks of HFSPLUS_SECTOR_SIZE bytes * @buf: buffer for I/O * @data: output pointer for location of requested data * @opf: I/O operation type and flags * * The unit of I/O is hfsplus_min_io_size(sb), which may be bigger than * HFSPLUS_SECTOR_SIZE, and @buf must be sized accordingly. On reads * @data will return a pointer to the start of the requested sector, * which may not be the same location as @buf. * * If @sector is not aligned to the bdev logical block size it will * be rounded down. For writes this means that @buf should contain data * that starts at the rounded-down address. As long as the data was * read using hfsplus_submit_bio() and the same buffer is used things * will work correctly. * * Returns: %0 on success else -errno code */ int hfsplus_submit_bio(struct super_block *sb, sector_t sector, void *buf, void **data, blk_opf_t opf) { const enum req_op op = opf & REQ_OP_MASK; struct bio *bio; int ret = 0; u64 io_size; loff_t start; int offset; /* * Align sector to hardware sector size and find offset. We * assume that io_size is a power of two, which _should_ * be true. */ io_size = hfsplus_min_io_size(sb); start = (loff_t)sector << HFSPLUS_SECTOR_SHIFT; offset = start & (io_size - 1); sector &= ~((io_size >> HFSPLUS_SECTOR_SHIFT) - 1); bio = bio_alloc(sb->s_bdev, 1, opf, GFP_NOIO); bio->bi_iter.bi_sector = sector; if (op != REQ_OP_WRITE && data) *data = (u8 *)buf + offset; while (io_size > 0) { unsigned int page_offset = offset_in_page(buf); unsigned int len = min_t(unsigned int, PAGE_SIZE - page_offset, io_size); ret = bio_add_page(bio, virt_to_page(buf), len, page_offset); if (ret != len) { ret = -EIO; goto out; } io_size -= len; buf = (u8 *)buf + len; } ret = submit_bio_wait(bio); out: bio_put(bio); return ret < 0 ?
ret : 0; } static int hfsplus_read_mdb(void *bufptr, struct hfsplus_wd *wd) { u32 extent; u16 attrib; __be16 sig; sig = *(__be16 *)(bufptr + HFSP_WRAPOFF_EMBEDSIG); if (sig != cpu_to_be16(HFSPLUS_VOLHEAD_SIG) && sig != cpu_to_be16(HFSPLUS_VOLHEAD_SIGX)) return 0; attrib = be16_to_cpu(*(__be16 *)(bufptr + HFSP_WRAPOFF_ATTRIB)); if (!(attrib & HFSP_WRAP_ATTRIB_SLOCK) || !(attrib & HFSP_WRAP_ATTRIB_SPARED)) return 0; wd->ablk_size = be32_to_cpu(*(__be32 *)(bufptr + HFSP_WRAPOFF_ABLKSIZE)); if (wd->ablk_size < HFSPLUS_SECTOR_SIZE) return 0; if (wd->ablk_size % HFSPLUS_SECTOR_SIZE) return 0; wd->ablk_start = be16_to_cpu(*(__be16 *)(bufptr + HFSP_WRAPOFF_ABLKSTART)); extent = get_unaligned_be32(bufptr + HFSP_WRAPOFF_EMBEDEXT); wd->embed_start = (extent >> 16) & 0xFFFF; wd->embed_count = extent & 0xFFFF; return 1; } static int hfsplus_get_last_session(struct super_block *sb, sector_t *start, sector_t *size) { struct cdrom_device_info *cdi = disk_to_cdi(sb->s_bdev->bd_disk); /* default values */ *start = 0; *size = bdev_nr_sectors(sb->s_bdev); if (HFSPLUS_SB(sb)->session >= 0) { struct cdrom_tocentry te; if (!cdi) return -EINVAL; te.cdte_track = HFSPLUS_SB(sb)->session; te.cdte_format = CDROM_LBA; if (cdrom_read_tocentry(cdi, &te) || (te.cdte_ctrl & CDROM_DATA_TRACK) != 4) { pr_err("invalid session number or type of track\n"); return -EINVAL; } *start = (sector_t)te.cdte_addr.lba << 2; } else if (cdi) { struct cdrom_multisession ms_info; ms_info.addr_format = CDROM_LBA; if (cdrom_multisession(cdi, &ms_info) == 0 && ms_info.xa_flag) *start = (sector_t)ms_info.addr.lba << 2; } return 0; } /* Find the volume header and fill in some minimum bits in superblock */ /* Takes in super block, returns true if good data read */ int hfsplus_read_wrapper(struct super_block *sb) { struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb); struct hfsplus_wd wd; sector_t part_start, part_size; u32 blocksize; int error = 0; error = -EINVAL; blocksize = sb_min_blocksize(sb, HFSPLUS_SECTOR_SIZE); if (!blocksize) goto out; sbi->min_io_size = blocksize; if (hfsplus_get_last_session(sb, &part_start, &part_size)) goto out; error = -ENOMEM; sbi->s_vhdr_buf = kmalloc(hfsplus_min_io_size(sb), GFP_KERNEL); if (!sbi->s_vhdr_buf) goto out; sbi->s_backup_vhdr_buf = kmalloc(hfsplus_min_io_size(sb), GFP_KERNEL); if (!sbi->s_backup_vhdr_buf) goto out_free_vhdr; reread: error = hfsplus_submit_bio(sb, part_start + HFSPLUS_VOLHEAD_SECTOR, sbi->s_vhdr_buf, (void **)&sbi->s_vhdr, REQ_OP_READ); if (error) goto out_free_backup_vhdr; error = -EINVAL; switch (sbi->s_vhdr->signature) { case cpu_to_be16(HFSPLUS_VOLHEAD_SIGX): set_bit(HFSPLUS_SB_HFSX, &sbi->flags); fallthrough; case cpu_to_be16(HFSPLUS_VOLHEAD_SIG): break; case cpu_to_be16(HFSP_WRAP_MAGIC): if (!hfsplus_read_mdb(sbi->s_vhdr, &wd)) goto out_free_backup_vhdr; wd.ablk_size >>= HFSPLUS_SECTOR_SHIFT; part_start += (sector_t)wd.ablk_start + (sector_t)wd.embed_start * wd.ablk_size; part_size = (sector_t)wd.embed_count * wd.ablk_size; goto reread; default: /* * Check for a partition block. 
* * (should do this only for cdrom/loop though) */ if (hfs_part_find(sb, &part_start, &part_size)) goto out_free_backup_vhdr; goto reread; } error = hfsplus_submit_bio(sb, part_start + part_size - 2, sbi->s_backup_vhdr_buf, (void **)&sbi->s_backup_vhdr, REQ_OP_READ); if (error) goto out_free_backup_vhdr; error = -EINVAL; if (sbi->s_backup_vhdr->signature != sbi->s_vhdr->signature) { pr_warn("invalid secondary volume header\n"); goto out_free_backup_vhdr; } blocksize = be32_to_cpu(sbi->s_vhdr->blocksize); /* * Block size must be at least as large as a sector and a power of 2. */ if (blocksize < HFSPLUS_SECTOR_SIZE || ((blocksize - 1) & blocksize)) goto out_free_backup_vhdr; sbi->alloc_blksz = blocksize; sbi->alloc_blksz_shift = ilog2(blocksize); blocksize = min_t(u32, sbi->alloc_blksz, PAGE_SIZE); /* * Align block size to block offset. */ while (part_start & ((blocksize >> HFSPLUS_SECTOR_SHIFT) - 1)) blocksize >>= 1; if (sb_set_blocksize(sb, blocksize) != blocksize) { pr_err("unable to set blocksize to %u!\n", blocksize); goto out_free_backup_vhdr; } sbi->blockoffset = part_start >> (sb->s_blocksize_bits - HFSPLUS_SECTOR_SHIFT); sbi->part_start = part_start; sbi->sect_count = part_size; sbi->fs_shift = sbi->alloc_blksz_shift - sb->s_blocksize_bits; return 0; out_free_backup_vhdr: kfree(sbi->s_backup_vhdr_buf); out_free_vhdr: kfree(sbi->s_vhdr_buf); out: return error; }
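/*
 * Standalone sketch of the alignment math at the top of
 * hfsplus_submit_bio(): round the requested sector down to a
 * hardware-I/O-size boundary and remember the byte offset of the
 * requested sector within the enlarged buffer. As the driver itself
 * assumes, io_size must be a power of two; the 512-byte sector size
 * (HFSPLUS_SECTOR_SHIFT == 9) and the 4096-byte io_size are example
 * values only.
 */
#include <stdio.h>
#include <stdint.h>

#define SECTOR_SHIFT	9	/* assuming HFSPLUS_SECTOR_SIZE == 512 */

int main(void)
{
	uint64_t io_size = 4096;	/* example hardware block size */
	uint64_t sector = 21;		/* requested 512-byte sector */
	uint64_t start = sector << SECTOR_SHIFT;
	uint64_t offset = start & (io_size - 1);

	/* Round down to the io_size boundary, as the driver does. */
	sector &= ~((io_size >> SECTOR_SHIFT) - 1);

	/* sector 21 -> aligned sector 16, data at byte offset 2560 */
	printf("aligned sector=%llu offset=%llu\n",
	       (unsigned long long)sector, (unsigned long long)offset);
	return 0;
}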
// SPDX-License-Identifier: GPL-2.0-only /* * Overlayfs NFS export support.
* * Amir Goldstein <amir73il@gmail.com> * * Copyright (C) 2017-2018 CTERA Networks. All Rights Reserved. */ #include <linux/fs.h> #include <linux/cred.h> #include <linux/mount.h> #include <linux/namei.h> #include <linux/xattr.h> #include <linux/exportfs.h> #include <linux/ratelimit.h> #include "overlayfs.h" static int ovl_encode_maybe_copy_up(struct dentry *dentry) { int err; if (ovl_dentry_upper(dentry)) return 0; err = ovl_copy_up(dentry); if (err) { pr_warn_ratelimited("failed to copy up on encode (%pd2, err=%i)\n", dentry, err); } return err; } /* * Before encoding a non-upper directory file handle from real layer N, we need * to check if it will be possible to reconnect an overlay dentry from the real * lower decoded dentry. This is done by following the overlay ancestry up to a * "layer N connected" ancestor and verifying that all parents along the way are * "layer N connectable". If an ancestor that is NOT "layer N connectable" is * found, we need to copy up an ancestor, which is "layer N connectable", thus * making that ancestor "layer N connected". For example: * * layer 1: /a * layer 2: /a/b/c * * The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is * copied up and renamed, upper dir /a will be indexed by lower dir /a from * layer 1. The dir /a from layer 2 will never be indexed, so the algorithm (*) * in ovl_lookup_real_ancestor() will not be able to lookup a connected overlay * dentry from the connected lower dentry /a/b/c. * * To avoid this problem on decode time, we need to copy up an ancestor of * /a/b/c, which is "layer 2 connectable", on encode time. That ancestor is * /a/b. After copy up (and index) of /a/b, it will become "layer 2 connected" * and when the time comes to decode the file handle from lower dentry /a/b/c, * ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding * a connected overlay dentry will be accomplished. * * (*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup an * entry /a in the lower layers above layer N and find the indexed dir /a from * layer 1. If that improvement is made, then the check for "layer N connected" * will need to verify there are no redirects in lower layers above N. In the * example above, /a will be "layer 2 connectable". However, if layer 2 dir /a * is a target of a layer 1 redirect, then /a will NOT be "layer 2 connectable": * * layer 1: /A (redirect = /a) * layer 2: /a/b/c */ /* Return the lowest layer for encoding a connectable file handle */ static int ovl_connectable_layer(struct dentry *dentry) { struct ovl_entry *oe = OVL_E(dentry); /* We can get overlay root from root of any layer */ if (dentry == dentry->d_sb->s_root) return ovl_numlower(oe); /* * If it's an unindexed merge dir, then it's not connectable with any * lower layer */ if (ovl_dentry_upper(dentry) && !ovl_test_flag(OVL_INDEX, d_inode(dentry))) return 0; /* We can get upper/overlay path from indexed/lower dentry */ return ovl_lowerstack(oe)->layer->idx; } /* * @dentry is "connected" if all ancestors up to root or a "connected" ancestor * have the same uppermost lower layer as the origin's layer. We may need to * copy up a "connectable" ancestor to make it "connected". A "connected" dentry * cannot become non "connected", so cache positive result in dentry flags. * * Return the connected origin layer or < 0 on error. 
*/ static int ovl_connect_layer(struct dentry *dentry) { struct dentry *next, *parent = NULL; struct ovl_entry *oe = OVL_E(dentry); int origin_layer; int err = 0; if (WARN_ON(dentry == dentry->d_sb->s_root) || WARN_ON(!ovl_dentry_lower(dentry))) return -EIO; origin_layer = ovl_lowerstack(oe)->layer->idx; if (ovl_dentry_test_flag(OVL_E_CONNECTED, dentry)) return origin_layer; /* Find the topmost origin layer connectable ancestor of @dentry */ next = dget(dentry); for (;;) { parent = dget_parent(next); if (WARN_ON(parent == next)) { err = -EIO; break; } /* * If @parent is not origin layer connectable, then copy up * @next which is origin layer connectable and we are done. */ if (ovl_connectable_layer(parent) < origin_layer) { err = ovl_encode_maybe_copy_up(next); break; } /* If @parent is connected or indexed we are done */ if (ovl_dentry_test_flag(OVL_E_CONNECTED, parent) || ovl_test_flag(OVL_INDEX, d_inode(parent))) break; dput(next); next = parent; } dput(parent); dput(next); if (!err) ovl_dentry_set_flag(OVL_E_CONNECTED, dentry); return err ?: origin_layer; } /* * We only need to encode origin if there is a chance that the same object was * encoded pre copy up and then we need to stay consistent with the same * encoding also after copy up. If non-pure upper is not indexed, then it was * copied up before NFS export was enabled. In that case we don't need to worry * about staying consistent with pre copy up encoding and we encode an upper * file handle. Overlay root dentry is a special case of non-indexed upper. * * The following table summarizes the different file handle encodings used for * different overlay object types: * * Object type | Encoding * -------------------------------- * Pure upper | U * Non-indexed upper | U * Indexed upper | L (*) * Non-upper | L (*) * * U = upper file handle * L = lower file handle * * (*) Decoding a connected overlay dir from real lower dentry is not always * possible when there are redirects in lower layers and non-indexed merge dirs. * To mitigate those cases, we may copy up the lower dir ancestor before * encoding a decodable file handle for a non-upper dir. * * Return 0 for upper file handle, > 0 for lower file handle or < 0 on error. */ static int ovl_check_encode_origin(struct inode *inode) { struct ovl_fs *ofs = OVL_FS(inode->i_sb); bool decodable = ofs->config.nfs_export; struct dentry *dentry; int err; /* No upper layer? */ if (!ovl_upper_mnt(ofs)) return 1; /* Lower file handle for non-upper non-decodable */ if (!ovl_inode_upper(inode) && !decodable) return 1; /* Upper file handle for pure upper */ if (!ovl_inode_lower(inode)) return 0; /* * Root is never indexed, so if there's an upper layer, encode upper for * root. */ if (inode == d_inode(inode->i_sb->s_root)) return 0; /* * Upper decodable file handle for non-indexed upper. */ if (ovl_inode_upper(inode) && decodable && !ovl_test_flag(OVL_INDEX, inode)) return 0; /* * Decoding a merge dir, whose origin's ancestor is under a redirected * lower dir or under a non-indexed upper is not always possible. * ovl_connect_layer() will try to make origin's layer "connected" by * copying up a "connectable" ancestor.
*/ if (!decodable || !S_ISDIR(inode->i_mode)) return 1; dentry = d_find_any_alias(inode); if (!dentry) return -ENOENT; err = ovl_connect_layer(dentry); dput(dentry); if (err < 0) return err; /* Lower file handle for indexed and non-upper dir/non-dir */ return 1; } static int ovl_dentry_to_fid(struct ovl_fs *ofs, struct inode *inode, u32 *fid, int buflen) { struct ovl_fh *fh = NULL; int err, enc_lower; int len; /* * Check if we should encode a lower or upper file handle and maybe * copy up an ancestor to make lower file handle connectable. */ err = enc_lower = ovl_check_encode_origin(inode); if (enc_lower < 0) goto fail; /* Encode an upper or lower file handle */ fh = ovl_encode_real_fh(ofs, enc_lower ? ovl_inode_lower(inode) : ovl_inode_upper(inode), !enc_lower); if (IS_ERR(fh)) return PTR_ERR(fh); len = OVL_FH_LEN(fh); if (len <= buflen) memcpy(fid, fh, len); err = len; out: kfree(fh); return err; fail: pr_warn_ratelimited("failed to encode file handle (ino=%lu, err=%i)\n", inode->i_ino, err); goto out; } static int ovl_encode_fh(struct inode *inode, u32 *fid, int *max_len, struct inode *parent) { struct ovl_fs *ofs = OVL_FS(inode->i_sb); int bytes, buflen = *max_len << 2; /* TODO: encode connectable file handles */ if (parent) return FILEID_INVALID; bytes = ovl_dentry_to_fid(ofs, inode, fid, buflen); if (bytes <= 0) return FILEID_INVALID; *max_len = bytes >> 2; if (bytes > buflen) return FILEID_INVALID; return OVL_FILEID_V1; } /* * Find or instantiate an overlay dentry from real dentries and index. */ static struct dentry *ovl_obtain_alias(struct super_block *sb, struct dentry *upper_alias, struct ovl_path *lowerpath, struct dentry *index) { struct dentry *lower = lowerpath ? lowerpath->dentry : NULL; struct dentry *upper = upper_alias ?: index; struct inode *inode = NULL; struct ovl_entry *oe; struct ovl_inode_params oip = { .index = index, }; /* We get overlay directory dentries with ovl_lookup_real() */ if (d_is_dir(upper ?: lower)) return ERR_PTR(-EIO); oe = ovl_alloc_entry(!!lower); if (!oe) return ERR_PTR(-ENOMEM); oip.upperdentry = dget(upper); if (lower) { ovl_lowerstack(oe)->dentry = dget(lower); ovl_lowerstack(oe)->layer = lowerpath->layer; } oip.oe = oe; inode = ovl_get_inode(sb, &oip); if (IS_ERR(inode)) { ovl_free_entry(oe); dput(upper); return ERR_CAST(inode); } if (upper) ovl_set_flag(OVL_UPPERDATA, inode); return d_obtain_alias(inode); } /* Get the upper or lower dentry in stack which is on layer @idx */ static struct dentry *ovl_dentry_real_at(struct dentry *dentry, int idx) { struct ovl_entry *oe = OVL_E(dentry); struct ovl_path *lowerstack = ovl_lowerstack(oe); int i; if (!idx) return ovl_dentry_upper(dentry); for (i = 0; i < ovl_numlower(oe); i++) { if (lowerstack[i].layer->idx == idx) return lowerstack[i].dentry; } return NULL; } /* * Lookup a child overlay dentry to get a connected overlay dentry whose real * dentry is @real. If @real is on upper layer, we lookup a child overlay * dentry with the same name as the real dentry. Otherwise, we need to consult * index for lookup. */ static struct dentry *ovl_lookup_real_one(struct dentry *connected, struct dentry *real, const struct ovl_layer *layer) { struct inode *dir = d_inode(connected); struct dentry *this, *parent = NULL; struct name_snapshot name; int err; /* * Lookup child overlay dentry by real name. The dir mutex protects us * from racing with overlay rename.
If the overlay dentry that is above * real has already been moved to a parent that is not under the * connected overlay dir, we return -ECHILD and restart the lookup of * connected real path from the top. */ inode_lock_nested(dir, I_MUTEX_PARENT); err = -ECHILD; parent = dget_parent(real); if (ovl_dentry_real_at(connected, layer->idx) != parent) goto fail; /* * We also need to take a snapshot of real dentry name to protect us * from racing with underlying layer rename. In this case, we don't * care about returning ESTALE, only about dereferencing a freed name * pointer, because we hold no lock on the real dentry. */ take_dentry_name_snapshot(&name, real); /* * No idmap handling here: it's an internal lookup. Could skip * permission checking altogether, but for now just use non-idmap * transformed ids. */ this = lookup_one_len(name.name.name, connected, name.name.len); release_dentry_name_snapshot(&name); err = PTR_ERR(this); if (IS_ERR(this)) { goto fail; } else if (!this || !this->d_inode) { dput(this); err = -ENOENT; goto fail; } else if (ovl_dentry_real_at(this, layer->idx) != real) { dput(this); err = -ESTALE; goto fail; } out: dput(parent); inode_unlock(dir); return this; fail: pr_warn_ratelimited("failed to lookup one by real (%pd2, layer=%d, connected=%pd2, err=%i)\n", real, layer->idx, connected, err); this = ERR_PTR(err); goto out; } static struct dentry *ovl_lookup_real(struct super_block *sb, struct dentry *real, const struct ovl_layer *layer); /* * Lookup an indexed or hashed overlay dentry by real inode. */ static struct dentry *ovl_lookup_real_inode(struct super_block *sb, struct dentry *real, const struct ovl_layer *layer) { struct ovl_fs *ofs = OVL_FS(sb); struct dentry *index = NULL; struct dentry *this = NULL; struct inode *inode; /* * Decoding upper dir from index is expensive, so first try to lookup * overlay dentry in inode/dcache. */ inode = ovl_lookup_inode(sb, real, !layer->idx); if (IS_ERR(inode)) return ERR_CAST(inode); if (inode) { this = d_find_any_alias(inode); iput(inode); } /* * For decoded lower dir file handle, lookup index by origin to check * if lower dir was copied up and/or removed. */ if (!this && layer->idx && ovl_indexdir(sb) && !WARN_ON(!d_is_dir(real))) { index = ovl_lookup_index(ofs, NULL, real, false); if (IS_ERR(index)) return index; } /* Get connected upper overlay dir from index */ if (index) { struct dentry *upper = ovl_index_upper(ofs, index, true); dput(index); if (IS_ERR_OR_NULL(upper)) return upper; /* * ovl_lookup_real() in lower layer may call recursively once to * ovl_lookup_real() in upper layer. The first level call walks * back lower parents to the topmost indexed parent. The second * recursive call walks back from indexed upper to the topmost * connected/hashed upper parent (or up to root). */ this = ovl_lookup_real(sb, upper, &ofs->layers[0]); dput(upper); } if (IS_ERR_OR_NULL(this)) return this; if (ovl_dentry_real_at(this, layer->idx) != real) { dput(this); this = ERR_PTR(-EIO); } return this; } /* * Lookup an indexed or hashed overlay dentry, whose real dentry is an * ancestor of @real.
*/ static struct dentry *ovl_lookup_real_ancestor(struct super_block *sb, struct dentry *real, const struct ovl_layer *layer) { struct dentry *next, *parent = NULL; struct dentry *ancestor = ERR_PTR(-EIO); if (real == layer->mnt->mnt_root) return dget(sb->s_root); /* Find the topmost indexed or hashed ancestor */ next = dget(real); for (;;) { parent = dget_parent(next); /* * Lookup a matching overlay dentry in inode/dentry * cache or in index by real inode. */ ancestor = ovl_lookup_real_inode(sb, next, layer); if (ancestor) break; if (parent == layer->mnt->mnt_root) { ancestor = dget(sb->s_root); break; } /* * If @real has been moved out of the layer root directory, * we will eventually hit the real fs root. This cannot happen * by legit overlay rename, so we return error in that case. */ if (parent == next) { ancestor = ERR_PTR(-EXDEV); break; } dput(next); next = parent; } dput(parent); dput(next); return ancestor; } /* * Lookup a connected overlay dentry whose real dentry is @real. * If @real is on upper layer, we lookup a child overlay dentry with the same * path as the real dentry. Otherwise, we need to consult index for lookup. */ static struct dentry *ovl_lookup_real(struct super_block *sb, struct dentry *real, const struct ovl_layer *layer) { struct dentry *connected; int err = 0; connected = ovl_lookup_real_ancestor(sb, real, layer); if (IS_ERR(connected)) return connected; while (!err) { struct dentry *next, *this; struct dentry *parent = NULL; struct dentry *real_connected = ovl_dentry_real_at(connected, layer->idx); if (real_connected == real) break; /* Find the topmost dentry not yet connected */ next = dget(real); for (;;) { parent = dget_parent(next); if (parent == real_connected) break; /* * If real has been moved out of 'real_connected', * we will not find 'real_connected' and hit the layer * root. In that case, we need to restart connecting. * This game can go on forever in the worst case. We * may want to consider taking s_vfs_rename_mutex if * this happens more than once. */ if (parent == layer->mnt->mnt_root) { dput(connected); connected = dget(sb->s_root); break; } /* * If real file has been moved out of the layer root * directory, we will eventually hit the real fs root. * This cannot happen by legit overlay rename, so we * return error in that case. */ if (parent == next) { err = -EXDEV; break; } dput(next); next = parent; } if (!err) { this = ovl_lookup_real_one(connected, next, layer); if (IS_ERR(this)) err = PTR_ERR(this); /* * Lookup of child in overlay can fail when racing with * overlay rename of child away from 'connected' parent. * In this case, we need to restart the lookup from the * top, because we cannot trust that 'real_connected' is * still an ancestor of 'real'. There is a good chance * that the renamed overlay ancestor is now in cache, so * ovl_lookup_real_ancestor() will find it and we can * continue to connect exactly from where lookup failed. */ if (err == -ECHILD) { this = ovl_lookup_real_ancestor(sb, real, layer); err = PTR_ERR_OR_ZERO(this); } if (!err) { dput(connected); connected = this; } } dput(parent); dput(next); } if (err) goto fail; return connected; fail: pr_warn_ratelimited("failed to lookup by real (%pd2, layer=%d, connected=%pd2, err=%i)\n", real, layer->idx, connected, err); dput(connected); return ERR_PTR(err); } /* * Get an overlay dentry from upper/lower real dentries and index.
*/ static struct dentry *ovl_get_dentry(struct super_block *sb, struct dentry *upper, struct ovl_path *lowerpath, struct dentry *index) { struct ovl_fs *ofs = OVL_FS(sb); const struct ovl_layer *layer = upper ? &ofs->layers[0] : lowerpath->layer; struct dentry *real = upper ?: (index ?: lowerpath->dentry); /* * Obtain a disconnected overlay dentry from a non-dir real dentry * and index. */ if (!d_is_dir(real)) return ovl_obtain_alias(sb, upper, lowerpath, index); /* Removed empty directory? */ if ((real->d_flags & DCACHE_DISCONNECTED) || d_unhashed(real)) return ERR_PTR(-ENOENT); /* * If real dentry is connected and hashed, get a connected overlay * dentry whose real dentry is @real. */ return ovl_lookup_real(sb, real, layer); } static struct dentry *ovl_upper_fh_to_d(struct super_block *sb, struct ovl_fh *fh) { struct ovl_fs *ofs = OVL_FS(sb); struct dentry *dentry; struct dentry *upper; if (!ovl_upper_mnt(ofs)) return ERR_PTR(-EACCES); upper = ovl_decode_real_fh(ofs, fh, ovl_upper_mnt(ofs), true); if (IS_ERR_OR_NULL(upper)) return upper; dentry = ovl_get_dentry(sb, upper, NULL, NULL); dput(upper); return dentry; } static struct dentry *ovl_lower_fh_to_d(struct super_block *sb, struct ovl_fh *fh) { struct ovl_fs *ofs = OVL_FS(sb); struct ovl_path origin = { }; struct ovl_path *stack = &origin; struct dentry *dentry = NULL; struct dentry *index = NULL; struct inode *inode; int err; /* First lookup overlay inode in inode cache by origin fh */ err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack); if (err) return ERR_PTR(err); if (!d_is_dir(origin.dentry) || !(origin.dentry->d_flags & DCACHE_DISCONNECTED)) { inode = ovl_lookup_inode(sb, origin.dentry, false); err = PTR_ERR(inode); if (IS_ERR(inode)) goto out_err; if (inode) { dentry = d_find_any_alias(inode); iput(inode); if (dentry) goto out; } } /* Then lookup indexed upper/whiteout by origin fh */ if (ovl_indexdir(sb)) { index = ovl_get_index_fh(ofs, fh); err = PTR_ERR(index); if (IS_ERR(index)) { index = NULL; goto out_err; } } /* Then try to get a connected upper dir by index */ if (index && d_is_dir(index)) { struct dentry *upper = ovl_index_upper(ofs, index, true); err = PTR_ERR(upper); if (IS_ERR_OR_NULL(upper)) goto out_err; dentry = ovl_get_dentry(sb, upper, NULL, NULL); dput(upper); goto out; } /* Find origin.dentry again with ovl_acceptable() layer check */ if (d_is_dir(origin.dentry)) { dput(origin.dentry); origin.dentry = NULL; err = ovl_check_origin_fh(ofs, fh, true, NULL, &stack); if (err) goto out_err; } if (index) { err = ovl_verify_origin(ofs, index, origin.dentry, false); if (err) goto out_err; } /* Get a connected non-upper dir or disconnected non-dir */ dentry = ovl_get_dentry(sb, NULL, &origin, index); out: dput(origin.dentry); dput(index); return dentry; out_err: dentry = ERR_PTR(err); goto out; } static struct ovl_fh *ovl_fid_to_fh(struct fid *fid, int buflen, int fh_type) { struct ovl_fh *fh; /* If on-wire inner fid is aligned - nothing to do */ if (fh_type == OVL_FILEID_V1) return (struct ovl_fh *)fid; if (fh_type != OVL_FILEID_V0) return ERR_PTR(-EINVAL); if (buflen <= OVL_FH_WIRE_OFFSET) return ERR_PTR(-EINVAL); fh = kzalloc(buflen, GFP_KERNEL); if (!fh) return ERR_PTR(-ENOMEM); /* Copy unaligned inner fh into aligned buffer */ memcpy(fh->buf, fid, buflen - OVL_FH_WIRE_OFFSET); return fh; } static struct dentry *ovl_fh_to_dentry(struct super_block *sb, struct fid *fid, int fh_len, int fh_type) { struct dentry *dentry = NULL; struct ovl_fh *fh = NULL; int len = fh_len << 2; unsigned int flags = 0; int err; fh = 
ovl_fid_to_fh(fid, len, fh_type); err = PTR_ERR(fh); if (IS_ERR(fh)) goto out_err; err = ovl_check_fh_len(fh, len); if (err) goto out_err; flags = fh->fb.flags; dentry = (flags & OVL_FH_FLAG_PATH_UPPER) ? ovl_upper_fh_to_d(sb, fh) : ovl_lower_fh_to_d(sb, fh); err = PTR_ERR(dentry); if (IS_ERR(dentry) && err != -ESTALE) goto out_err; out: /* We may have needed to re-align OVL_FILEID_V0 */ if (!IS_ERR_OR_NULL(fh) && fh != (void *)fid) kfree(fh); return dentry; out_err: pr_warn_ratelimited("failed to decode file handle (len=%d, type=%d, flags=%x, err=%i)\n", fh_len, fh_type, flags, err); dentry = ERR_PTR(err); goto out; } static struct dentry *ovl_fh_to_parent(struct super_block *sb, struct fid *fid, int fh_len, int fh_type) { pr_warn_ratelimited("connectable file handles not supported; use 'no_subtree_check' exportfs option.\n"); return ERR_PTR(-EACCES); } static int ovl_get_name(struct dentry *parent, char *name, struct dentry *child) { /* * ovl_fh_to_dentry() returns connected dir overlay dentries and * ovl_fh_to_parent() is not implemented, so we should not get here. */ WARN_ON_ONCE(1); return -EIO; } static struct dentry *ovl_get_parent(struct dentry *dentry) { /* * ovl_fh_to_dentry() returns connected dir overlay dentries, so we * should not get here. */ WARN_ON_ONCE(1); return ERR_PTR(-EIO); } const struct export_operations ovl_export_operations = { .encode_fh = ovl_encode_fh, .fh_to_dentry = ovl_fh_to_dentry, .fh_to_parent = ovl_fh_to_parent, .get_name = ovl_get_name, .get_parent = ovl_get_parent, }; /* encode_fh() encodes non-decodable file handles with nfs_export=off */ const struct export_operations ovl_export_fid_operations = { .encode_fh = ovl_encode_fh, };
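/*
 * Userspace sketch of the generic syscalls that exercise the
 * export_operations above: name_to_handle_at() reaches the filesystem's
 * ->encode_fh() (ovl_encode_fh() on an overlay mount) and
 * open_by_handle_at() reaches ->fh_to_dentry() (ovl_fh_to_dentry()).
 * The path is a placeholder, and open_by_handle_at() requires
 * CAP_DAC_READ_SEARCH.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/mnt/overlay/file";
	struct file_handle *fh;
	int mount_id, mount_fd, fd;

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh)
		return 1;
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* Encode: the kernel calls the filesystem's ->encode_fh(). */
	if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0)) {
		perror("name_to_handle_at");
		return 1;
	}
	printf("handle: %u bytes, type %d\n", fh->handle_bytes,
	       fh->handle_type);

	/* Decode: the kernel calls ->fh_to_dentry(). */
	mount_fd = open(path, O_RDONLY);	/* any fd on the same mount */
	if (mount_fd < 0)
		return 1;
	fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
	if (fd < 0)
		perror("open_by_handle_at");
	else
		close(fd);
	close(mount_fd);
	free(fh);
	return 0;
}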
// SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. * * Kernel side components to support tools/testing/selftests/iommu */ #include <linux/anon_inodes.h> #include <linux/debugfs.h> #include <linux/fault-inject.h> #include <linux/file.h> #include <linux/iommu.h> #include <linux/platform_device.h> #include <linux/slab.h> #include <linux/xarray.h> #include <uapi/linux/iommufd.h> #include "../iommu-priv.h" #include "io_pagetable.h" #include "iommufd_private.h" #include "iommufd_test.h" static DECLARE_FAULT_ATTR(fail_iommufd); static struct dentry *dbgfs_root; static struct platform_device *selftest_iommu_dev; static const struct iommu_ops mock_ops; static struct iommu_domain_ops domain_nested_ops; size_t iommufd_test_memory_limit = 65536; struct mock_bus_type { struct bus_type bus; struct notifier_block nb; }; static struct mock_bus_type iommufd_mock_bus_type = { .bus = { .name = "iommufd_mock", }, }; static DEFINE_IDA(mock_dev_ida); enum { MOCK_DIRTY_TRACK = 1, MOCK_IO_PAGE_SIZE = PAGE_SIZE / 2, MOCK_HUGE_PAGE_SIZE = 512 * MOCK_IO_PAGE_SIZE, /* * Like a real page table, alignment requires the low bits of the address * to be zero. xarray also requires the high bit to be zero, so we store * the pfns shifted. The upper bits are used for metadata. */ MOCK_PFN_MASK = ULONG_MAX / MOCK_IO_PAGE_SIZE, _MOCK_PFN_START = MOCK_PFN_MASK + 1, MOCK_PFN_START_IOVA = _MOCK_PFN_START, MOCK_PFN_LAST_IOVA = _MOCK_PFN_START, MOCK_PFN_DIRTY_IOVA = _MOCK_PFN_START << 1, MOCK_PFN_HUGE_IOVA = _MOCK_PFN_START << 2, }; /* * Syzkaller has trouble randomizing the correct iova to use since it is linked * to the map ioctl's output, and it has no idea about that. So, simplify things. * In syzkaller mode the 64 bit IOVA is converted into an nth area and offset * value. This has a much smaller randomization space and syzkaller can hit it. */ static unsigned long __iommufd_test_syz_conv_iova(struct io_pagetable *iopt, u64 *iova) { struct syz_layout { __u32 nth_area; __u32 offset; }; struct syz_layout *syz = (void *)iova; unsigned int nth = syz->nth_area; struct iopt_area *area; down_read(&iopt->iova_rwsem); for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area; area = iopt_area_iter_next(area, 0, ULONG_MAX)) { if (nth == 0) { up_read(&iopt->iova_rwsem); return iopt_area_iova(area) + syz->offset; } nth--; } up_read(&iopt->iova_rwsem); return 0; } static unsigned long iommufd_test_syz_conv_iova(struct iommufd_access *access, u64 *iova) { unsigned long ret; mutex_lock(&access->ioas_lock); if (!access->ioas) {