359 358 360 360 359 30875 30849 30861 30851 50 50 50 8 8 220 13 220 220 206 220 220 1 1 145 50 142 142 287 145 10 137 142 141 142 141 142 40 40 40 40 1025 1026 1024 1026 1025 1027 1025 32285 54178 32288 45059 706 1240 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 
861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 1994 Linus Torvalds * * Pentium III FXSR, SSE support * General FPU state handling cleanups * Gareth Hughes <gareth@valinux.com>, May 2000 */ #include <asm/fpu/api.h> #include <asm/fpu/regset.h> #include <asm/fpu/sched.h> #include <asm/fpu/signal.h> #include <asm/fpu/types.h> #include <asm/traps.h> #include <asm/irq_regs.h> #include <uapi/asm/kvm.h> #include <linux/hardirq.h> #include <linux/pkeys.h> #include <linux/vmalloc.h> #include "context.h" #include "internal.h" #include "legacy.h" #include "xstate.h" #define CREATE_TRACE_POINTS #include <asm/trace/fpu.h> #ifdef CONFIG_X86_64 DEFINE_STATIC_KEY_FALSE(__fpu_state_size_dynamic); DEFINE_PER_CPU(u64, xfd_state); #endif /* The FPU state configuration data for kernel and user space */ struct fpu_state_config fpu_kernel_cfg __ro_after_init; struct fpu_state_config fpu_user_cfg __ro_after_init; /* * Represents the initial FPU state. It's mostly (but not completely) zeroes, * depending on the FPU hardware format: */ struct fpstate init_fpstate __ro_after_init; /* Track in-kernel FPU usage */ static DEFINE_PER_CPU(bool, in_kernel_fpu); /* * Track which context is using the FPU on the CPU: */ DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx); /* * Can we use the FPU in kernel mode with the * whole "kernel_fpu_begin/end()" sequence? */ bool irq_fpu_usable(void) { if (WARN_ON_ONCE(in_nmi())) return false; /* In kernel FPU usage already active? */ if (this_cpu_read(in_kernel_fpu)) return false; /* * When not in NMI or hard interrupt context, FPU can be used in: * * - Task context except from within fpregs_lock()'ed critical * regions. * * - Soft interrupt processing context which cannot happen * while in a fpregs_lock()'ed critical region. */ if (!in_hardirq()) return true; /* * In hard interrupt context it's safe when soft interrupts * are enabled, which means the interrupt did not hit in * a fpregs_lock()'ed critical region. */ return !softirq_count(); } EXPORT_SYMBOL(irq_fpu_usable); /* * Track AVX512 state use because it is known to slow the max clock * speed of the core. */ static void update_avx_timestamp(struct fpu *fpu) { #define AVX512_TRACKING_MASK (XFEATURE_MASK_ZMM_Hi256 | XFEATURE_MASK_Hi16_ZMM) if (fpu->fpstate->regs.xsave.header.xfeatures & AVX512_TRACKING_MASK) fpu->avx512_timestamp = jiffies; } /* * Save the FPU register state in fpu->fpstate->regs. The register state is * preserved. * * Must be called with fpregs_lock() held. * * The legacy FNSAVE instruction clears all FPU state unconditionally, so * register state has to be reloaded. That might be a pointless exercise * when the FPU is going to be used by another task right after that. But * this only affects 20+ years old 32bit systems and avoids conditionals all * over the place. * * FXSAVE and all XSAVE variants preserve the FPU register state. */ void save_fpregs_to_fpstate(struct fpu *fpu) { if (likely(use_xsave())) { os_xsave(fpu->fpstate); update_avx_timestamp(fpu); return; } if (likely(use_fxsr())) { fxsave(&fpu->fpstate->regs.fxsave); return; } /* * Legacy FPU register saving, FNSAVE always clears FPU registers, * so we have to reload them from the memory state. 
*/ asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->fpstate->regs.fsave)); frstor(&fpu->fpstate->regs.fsave); } void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask) { /* * AMD K7/K8 and later CPUs up to Zen don't save/restore * FDP/FIP/FOP unless an exception is pending. Clear the x87 state * here by setting it to fixed values. "m" is a random variable * that should be in L1. */ if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) { asm volatile( "fnclex\n\t" "emms\n\t" "fildl %[addr]" /* set F?P to defined value */ : : [addr] "m" (*fpstate)); } if (use_xsave()) { /* * Dynamically enabled features are enabled in XCR0, but * usage requires also that the corresponding bits in XFD * are cleared. If the bits are set then using a related * instruction will raise #NM. This allows to do the * allocation of the larger FPU buffer lazy from #NM or if * the task has no permission to kill it which would happen * via #UD if the feature is disabled in XCR0. * * XFD state is following the same life time rules as * XSTATE and to restore state correctly XFD has to be * updated before XRSTORS otherwise the component would * stay in or go into init state even if the bits are set * in fpstate::regs::xsave::xfeatures. */ xfd_update_state(fpstate); /* * Restoring state always needs to modify all features * which are in @mask even if the current task cannot use * extended features. * * So fpstate->xfeatures cannot be used here, because then * a feature for which the task has no permission but was * used by the previous task would not go into init state. */ mask = fpu_kernel_cfg.max_features & mask; os_xrstor(fpstate, mask); } else { if (use_fxsr()) fxrstor(&fpstate->regs.fxsave); else frstor(&fpstate->regs.fsave); } } void fpu_reset_from_exception_fixup(void) { restore_fpregs_from_fpstate(&init_fpstate, XFEATURE_MASK_FPSTATE); } #if IS_ENABLED(CONFIG_KVM) static void __fpstate_reset(struct fpstate *fpstate, u64 xfd); static void fpu_init_guest_permissions(struct fpu_guest *gfpu) { struct fpu_state_perm *fpuperm; u64 perm; if (!IS_ENABLED(CONFIG_X86_64)) return; spin_lock_irq(¤t->sighand->siglock); fpuperm = ¤t->group_leader->thread.fpu.guest_perm; perm = fpuperm->__state_perm; /* First fpstate allocation locks down permissions. */ WRITE_ONCE(fpuperm->__state_perm, perm | FPU_GUEST_PERM_LOCKED); spin_unlock_irq(¤t->sighand->siglock); gfpu->perm = perm & ~FPU_GUEST_PERM_LOCKED; } bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu) { struct fpstate *fpstate; unsigned int size; size = fpu_user_cfg.default_size + ALIGN(offsetof(struct fpstate, regs), 64); fpstate = vzalloc(size); if (!fpstate) return false; /* Leave xfd to 0 (the reset value defined by spec) */ __fpstate_reset(fpstate, 0); fpstate_init_user(fpstate); fpstate->is_valloc = true; fpstate->is_guest = true; gfpu->fpstate = fpstate; gfpu->xfeatures = fpu_user_cfg.default_features; gfpu->perm = fpu_user_cfg.default_features; /* * KVM sets the FP+SSE bits in the XSAVE header when copying FPU state * to userspace, even when XSAVE is unsupported, so that restoring FPU * state on a different CPU that does support XSAVE can cleanly load * the incoming state using its natural XSAVE. In other words, KVM's * uABI size may be larger than this host's default size. Conversely, * the default size should never be larger than KVM's base uABI size; * all features that can expand the uABI size must be opt-in. 
*/ gfpu->uabi_size = sizeof(struct kvm_xsave); if (WARN_ON_ONCE(fpu_user_cfg.default_size > gfpu->uabi_size)) gfpu->uabi_size = fpu_user_cfg.default_size; fpu_init_guest_permissions(gfpu); return true; } EXPORT_SYMBOL_GPL(fpu_alloc_guest_fpstate); void fpu_free_guest_fpstate(struct fpu_guest *gfpu) { struct fpstate *fps = gfpu->fpstate; if (!fps) return; if (WARN_ON_ONCE(!fps->is_valloc || !fps->is_guest || fps->in_use)) return; gfpu->fpstate = NULL; vfree(fps); } EXPORT_SYMBOL_GPL(fpu_free_guest_fpstate); /* * fpu_enable_guest_xfd_features - Check xfeatures against guest perm and enable * @guest_fpu: Pointer to the guest FPU container * @xfeatures: Features requested by guest CPUID * * Enable all dynamic xfeatures according to guest perm and requested CPUID. * * Return: 0 on success, error code otherwise */ int fpu_enable_guest_xfd_features(struct fpu_guest *guest_fpu, u64 xfeatures) { lockdep_assert_preemption_enabled(); /* Nothing to do if all requested features are already enabled. */ xfeatures &= ~guest_fpu->xfeatures; if (!xfeatures) return 0; return __xfd_enable_feature(xfeatures, guest_fpu); } EXPORT_SYMBOL_GPL(fpu_enable_guest_xfd_features); #ifdef CONFIG_X86_64 void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd) { fpregs_lock(); guest_fpu->fpstate->xfd = xfd; if (guest_fpu->fpstate->in_use) xfd_update_state(guest_fpu->fpstate); fpregs_unlock(); } EXPORT_SYMBOL_GPL(fpu_update_guest_xfd); /** * fpu_sync_guest_vmexit_xfd_state - Synchronize XFD MSR and software state * * Must be invoked from KVM after a VMEXIT before enabling interrupts when * XFD write emulation is disabled. This is required because the guest can * freely modify XFD and the state at VMEXIT is not guaranteed to be the * same as the state on VMENTER. So software state has to be updated before * any operation which depends on it can take place. * * Note: It can be invoked unconditionally even when write emulation is * enabled for the price of a then pointless MSR read. */ void fpu_sync_guest_vmexit_xfd_state(void) { struct fpstate *fps = current->thread.fpu.fpstate; lockdep_assert_irqs_disabled(); if (fpu_state_size_dynamic()) { rdmsrl(MSR_IA32_XFD, fps->xfd); __this_cpu_write(xfd_state, fps->xfd); } } EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state); #endif /* CONFIG_X86_64 */ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest) { struct fpstate *guest_fps = guest_fpu->fpstate; struct fpu *fpu = ¤t->thread.fpu; struct fpstate *cur_fps = fpu->fpstate; fpregs_lock(); if (!cur_fps->is_confidential && !test_thread_flag(TIF_NEED_FPU_LOAD)) save_fpregs_to_fpstate(fpu); /* Swap fpstate */ if (enter_guest) { fpu->__task_fpstate = cur_fps; fpu->fpstate = guest_fps; guest_fps->in_use = true; } else { guest_fps->in_use = false; fpu->fpstate = fpu->__task_fpstate; fpu->__task_fpstate = NULL; } cur_fps = fpu->fpstate; if (!cur_fps->is_confidential) { /* Includes XFD update */ restore_fpregs_from_fpstate(cur_fps, XFEATURE_MASK_FPSTATE); } else { /* * XSTATE is restored by firmware from encrypted * memory. 
Make sure XFD state is correct while * running with guest fpstate */ xfd_update_state(cur_fps); } fpregs_mark_activate(); fpregs_unlock(); return 0; } EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpstate); void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf, unsigned int size, u64 xfeatures, u32 pkru) { struct fpstate *kstate = gfpu->fpstate; union fpregs_state *ustate = buf; struct membuf mb = { .p = buf, .left = size }; if (cpu_feature_enabled(X86_FEATURE_XSAVE)) { __copy_xstate_to_uabi_buf(mb, kstate, xfeatures, pkru, XSTATE_COPY_XSAVE); } else { memcpy(&ustate->fxsave, &kstate->regs.fxsave, sizeof(ustate->fxsave)); /* Make it restorable on a XSAVE enabled host */ ustate->xsave.header.xfeatures = XFEATURE_MASK_FPSSE; } } EXPORT_SYMBOL_GPL(fpu_copy_guest_fpstate_to_uabi); int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, u64 xcr0, u32 *vpkru) { struct fpstate *kstate = gfpu->fpstate; const union fpregs_state *ustate = buf; if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) { if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE) return -EINVAL; if (ustate->fxsave.mxcsr & ~mxcsr_feature_mask) return -EINVAL; memcpy(&kstate->regs.fxsave, &ustate->fxsave, sizeof(ustate->fxsave)); return 0; } if (ustate->xsave.header.xfeatures & ~xcr0) return -EINVAL; /* * Nullify @vpkru to preserve its current value if PKRU's bit isn't set * in the header. KVM's odd ABI is to leave PKRU untouched in this * case (all other components are eventually re-initialized). */ if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU)) vpkru = NULL; return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru); } EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate); #endif /* CONFIG_KVM */ void kernel_fpu_begin_mask(unsigned int kfpu_mask) { preempt_disable(); WARN_ON_FPU(!irq_fpu_usable()); WARN_ON_FPU(this_cpu_read(in_kernel_fpu)); this_cpu_write(in_kernel_fpu, true); if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) && !test_thread_flag(TIF_NEED_FPU_LOAD)) { set_thread_flag(TIF_NEED_FPU_LOAD); save_fpregs_to_fpstate(¤t->thread.fpu); } __cpu_invalidate_fpregs_state(); /* Put sane initial values into the control registers. */ if (likely(kfpu_mask & KFPU_MXCSR) && boot_cpu_has(X86_FEATURE_XMM)) ldmxcsr(MXCSR_DEFAULT); if (unlikely(kfpu_mask & KFPU_387) && boot_cpu_has(X86_FEATURE_FPU)) asm volatile ("fninit"); } EXPORT_SYMBOL_GPL(kernel_fpu_begin_mask); void kernel_fpu_end(void) { WARN_ON_FPU(!this_cpu_read(in_kernel_fpu)); this_cpu_write(in_kernel_fpu, false); preempt_enable(); } EXPORT_SYMBOL_GPL(kernel_fpu_end); /* * Sync the FPU register state to current's memory register state when the * current task owns the FPU. The hardware register state is preserved. 
*/ void fpu_sync_fpstate(struct fpu *fpu) { WARN_ON_FPU(fpu != ¤t->thread.fpu); fpregs_lock(); trace_x86_fpu_before_save(fpu); if (!test_thread_flag(TIF_NEED_FPU_LOAD)) save_fpregs_to_fpstate(fpu); trace_x86_fpu_after_save(fpu); fpregs_unlock(); } static inline unsigned int init_fpstate_copy_size(void) { if (!use_xsave()) return fpu_kernel_cfg.default_size; /* XSAVE(S) just needs the legacy and the xstate header part */ return sizeof(init_fpstate.regs.xsave); } static inline void fpstate_init_fxstate(struct fpstate *fpstate) { fpstate->regs.fxsave.cwd = 0x37f; fpstate->regs.fxsave.mxcsr = MXCSR_DEFAULT; } /* * Legacy x87 fpstate state init: */ static inline void fpstate_init_fstate(struct fpstate *fpstate) { fpstate->regs.fsave.cwd = 0xffff037fu; fpstate->regs.fsave.swd = 0xffff0000u; fpstate->regs.fsave.twd = 0xffffffffu; fpstate->regs.fsave.fos = 0xffff0000u; } /* * Used in two places: * 1) Early boot to setup init_fpstate for non XSAVE systems * 2) fpu_init_fpstate_user() which is invoked from KVM */ void fpstate_init_user(struct fpstate *fpstate) { if (!cpu_feature_enabled(X86_FEATURE_FPU)) { fpstate_init_soft(&fpstate->regs.soft); return; } xstate_init_xcomp_bv(&fpstate->regs.xsave, fpstate->xfeatures); if (cpu_feature_enabled(X86_FEATURE_FXSR)) fpstate_init_fxstate(fpstate); else fpstate_init_fstate(fpstate); } static void __fpstate_reset(struct fpstate *fpstate, u64 xfd) { /* Initialize sizes and feature masks */ fpstate->size = fpu_kernel_cfg.default_size; fpstate->user_size = fpu_user_cfg.default_size; fpstate->xfeatures = fpu_kernel_cfg.default_features; fpstate->user_xfeatures = fpu_user_cfg.default_features; fpstate->xfd = xfd; } void fpstate_reset(struct fpu *fpu) { /* Set the fpstate pointer to the default fpstate */ fpu->fpstate = &fpu->__fpstate; __fpstate_reset(fpu->fpstate, init_fpstate.xfd); /* Initialize the permission related info in fpu */ fpu->perm.__state_perm = fpu_kernel_cfg.default_features; fpu->perm.__state_size = fpu_kernel_cfg.default_size; fpu->perm.__user_state_size = fpu_user_cfg.default_size; /* Same defaults for guests */ fpu->guest_perm = fpu->perm; } static inline void fpu_inherit_perms(struct fpu *dst_fpu) { if (fpu_state_size_dynamic()) { struct fpu *src_fpu = ¤t->group_leader->thread.fpu; spin_lock_irq(¤t->sighand->siglock); /* Fork also inherits the permissions of the parent */ dst_fpu->perm = src_fpu->perm; dst_fpu->guest_perm = src_fpu->guest_perm; spin_unlock_irq(¤t->sighand->siglock); } } /* A passed ssp of zero will not cause any update */ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp) { #ifdef CONFIG_X86_USER_SHADOW_STACK struct cet_user_state *xstate; /* If ssp update is not needed. */ if (!ssp) return 0; xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave, XFEATURE_CET_USER); /* * If there is a non-zero ssp, then 'dst' must be configured with a shadow * stack and the fpu state should be up to date since it was just copied * from the parent in fpu_clone(). So there must be a valid non-init CET * state location in the buffer. */ if (WARN_ON_ONCE(!xstate)) return 1; xstate->user_ssp = (u64)ssp; #endif return 0; } /* Clone current's FPU state on fork */ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, unsigned long ssp) { struct fpu *src_fpu = ¤t->thread.fpu; struct fpu *dst_fpu = &dst->thread.fpu; /* The new task's FPU state cannot be valid in the hardware. 
*/ dst_fpu->last_cpu = -1; fpstate_reset(dst_fpu); if (!cpu_feature_enabled(X86_FEATURE_FPU)) return 0; /* * Enforce reload for user space tasks and prevent kernel threads * from trying to save the FPU registers on context switch. */ set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD); /* * No FPU state inheritance for kernel threads and IO * worker threads. */ if (minimal) { /* Clear out the minimal state */ memcpy(&dst_fpu->fpstate->regs, &init_fpstate.regs, init_fpstate_copy_size()); return 0; } /* * If a new feature is added, ensure all dynamic features are * caller-saved from here! */ BUILD_BUG_ON(XFEATURE_MASK_USER_DYNAMIC != XFEATURE_MASK_XTILE_DATA); /* * Save the default portion of the current FPU state into the * clone. Assume all dynamic features to be defined as caller- * saved, which enables skipping both the expansion of fpstate * and the copying of any dynamic state. * * Do not use memcpy() when TIF_NEED_FPU_LOAD is set because * copying is not valid when current uses non-default states. */ fpregs_lock(); if (test_thread_flag(TIF_NEED_FPU_LOAD)) fpregs_restore_userregs(); save_fpregs_to_fpstate(dst_fpu); fpregs_unlock(); if (!(clone_flags & CLONE_THREAD)) fpu_inherit_perms(dst_fpu); /* * Children never inherit PASID state. * Force it to have its init value: */ if (use_xsave()) dst_fpu->fpstate->regs.xsave.header.xfeatures &= ~XFEATURE_MASK_PASID; /* * Update shadow stack pointer, in case it changed during clone. */ if (update_fpu_shstk(dst, ssp)) return 1; trace_x86_fpu_copy_src(src_fpu); trace_x86_fpu_copy_dst(dst_fpu); return 0; } /* * Whitelist the FPU register state embedded into task_struct for hardened * usercopy. */ void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size) { *offset = offsetof(struct thread_struct, fpu.__fpstate.regs); *size = fpu_kernel_cfg.default_size; } /* * Drops current FPU state: deactivates the fpregs and * the fpstate. NOTE: it still leaves previous contents * in the fpregs in the eager-FPU case. * * This function can be used in cases where we know that * a state-restore is coming: either an explicit one, * or a reschedule. */ void fpu__drop(struct fpu *fpu) { preempt_disable(); if (fpu == ¤t->thread.fpu) { /* Ignore delayed exceptions from user space */ asm volatile("1: fwait\n" "2:\n" _ASM_EXTABLE(1b, 2b)); fpregs_deactivate(fpu); } trace_x86_fpu_dropped(fpu); preempt_enable(); } /* * Clear FPU registers by setting them up from the init fpstate. * Caller must do fpregs_[un]lock() around it. */ static inline void restore_fpregs_from_init_fpstate(u64 features_mask) { if (use_xsave()) os_xrstor(&init_fpstate, features_mask); else if (use_fxsr()) fxrstor(&init_fpstate.regs.fxsave); else frstor(&init_fpstate.regs.fsave); pkru_write_default(); } /* * Reset current->fpu memory state to the init values. */ static void fpu_reset_fpregs(void) { struct fpu *fpu = ¤t->thread.fpu; fpregs_lock(); __fpu_invalidate_fpregs_state(fpu); /* * This does not change the actual hardware registers. It just * resets the memory image and sets TIF_NEED_FPU_LOAD so a * subsequent return to usermode will reload the registers from the * task's memory image. * * Do not use fpstate_init() here. Just copy init_fpstate which has * the correct content already except for PKRU. * * PKRU handling does not rely on the xstate when restoring for * user space as PKRU is eagerly written in switch_to() and * flush_thread(). 
*/ memcpy(&fpu->fpstate->regs, &init_fpstate.regs, init_fpstate_copy_size()); set_thread_flag(TIF_NEED_FPU_LOAD); fpregs_unlock(); } /* * Reset current's user FPU states to the init states. current's * supervisor states, if any, are not modified by this function. The * caller guarantees that the XSTATE header in memory is intact. */ void fpu__clear_user_states(struct fpu *fpu) { WARN_ON_FPU(fpu != ¤t->thread.fpu); fpregs_lock(); if (!cpu_feature_enabled(X86_FEATURE_FPU)) { fpu_reset_fpregs(); fpregs_unlock(); return; } /* * Ensure that current's supervisor states are loaded into their * corresponding registers. */ if (xfeatures_mask_supervisor() && !fpregs_state_valid(fpu, smp_processor_id())) os_xrstor_supervisor(fpu->fpstate); /* Reset user states in registers. */ restore_fpregs_from_init_fpstate(XFEATURE_MASK_USER_RESTORE); /* * Now all FPU registers have their desired values. Inform the FPU * state machine that current's FPU registers are in the hardware * registers. The memory image does not need to be updated because * any operation relying on it has to save the registers first when * current's FPU is marked active. */ fpregs_mark_activate(); fpregs_unlock(); } void fpu_flush_thread(void) { fpstate_reset(¤t->thread.fpu); fpu_reset_fpregs(); } /* * Load FPU context before returning to userspace. */ void switch_fpu_return(void) { if (!static_cpu_has(X86_FEATURE_FPU)) return; fpregs_restore_userregs(); } EXPORT_SYMBOL_GPL(switch_fpu_return); void fpregs_lock_and_load(void) { /* * fpregs_lock() only disables preemption (mostly). So modifying state * in an interrupt could screw up some in progress fpregs operation. * Warn about it. */ WARN_ON_ONCE(!irq_fpu_usable()); WARN_ON_ONCE(current->flags & PF_KTHREAD); fpregs_lock(); fpregs_assert_state_consistent(); if (test_thread_flag(TIF_NEED_FPU_LOAD)) fpregs_restore_userregs(); } #ifdef CONFIG_X86_DEBUG_FPU /* * If current FPU state according to its tracking (loaded FPU context on this * CPU) is not valid then we must have TIF_NEED_FPU_LOAD set so the context is * loaded on return to userland. */ void fpregs_assert_state_consistent(void) { struct fpu *fpu = ¤t->thread.fpu; if (test_thread_flag(TIF_NEED_FPU_LOAD)) return; WARN_ON_FPU(!fpregs_state_valid(fpu, smp_processor_id())); } EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent); #endif void fpregs_mark_activate(void) { struct fpu *fpu = ¤t->thread.fpu; fpregs_activate(fpu); fpu->last_cpu = smp_processor_id(); clear_thread_flag(TIF_NEED_FPU_LOAD); } /* * x87 math exception handling: */ int fpu__exception_code(struct fpu *fpu, int trap_nr) { int err; if (trap_nr == X86_TRAP_MF) { unsigned short cwd, swd; /* * (~cwd & swd) will mask out exceptions that are not set to unmasked * status. 0x3f is the exception bits in these regs, 0x200 is the * C1 reg you need in case of a stack fault, 0x040 is the stack * fault bit. We should only be taking one exception at a time, * so if this combination doesn't produce any single exception, * then we have a bad program that isn't synchronizing its FPU usage * and it will suffer the consequences since we won't be able to * fully reproduce the context of the exception. */ if (boot_cpu_has(X86_FEATURE_FXSR)) { cwd = fpu->fpstate->regs.fxsave.cwd; swd = fpu->fpstate->regs.fxsave.swd; } else { cwd = (unsigned short)fpu->fpstate->regs.fsave.cwd; swd = (unsigned short)fpu->fpstate->regs.fsave.swd; } err = swd & ~cwd; } else { /* * The SIMD FPU exceptions are handled a little differently, as there * is only a single status/control register. 
Thus, to determine which * unmasked exception was caught we must mask the exception mask bits * at 0x1f80, and then use these to mask the exception bits at 0x3f. */ unsigned short mxcsr = MXCSR_DEFAULT; if (boot_cpu_has(X86_FEATURE_XMM)) mxcsr = fpu->fpstate->regs.fxsave.mxcsr; err = ~(mxcsr >> 7) & mxcsr; } if (err & 0x001) { /* Invalid op */ /* * swd & 0x240 == 0x040: Stack Underflow * swd & 0x240 == 0x240: Stack Overflow * User must clear the SF bit (0x40) if set */ return FPE_FLTINV; } else if (err & 0x004) { /* Divide by Zero */ return FPE_FLTDIV; } else if (err & 0x008) { /* Overflow */ return FPE_FLTOVF; } else if (err & 0x012) { /* Denormal, Underflow */ return FPE_FLTUND; } else if (err & 0x020) { /* Precision */ return FPE_FLTRES; } /* * If we're using IRQ 13, or supposedly even some trap * X86_TRAP_MF implementations, it's possible * we get a spurious trap, which is not an error. */ return 0; } /* * Initialize register state that may prevent from entering low-power idle. * This function will be invoked from the cpuidle driver only when needed. */ noinstr void fpu_idle_fpregs(void) { /* Note: AMX_TILE being enabled implies XGETBV1 support */ if (cpu_feature_enabled(X86_FEATURE_AMX_TILE) && (xfeatures_in_use() & XFEATURE_MASK_XTILE)) { tile_release(); __this_cpu_write(fpu_fpregs_owner_ctx, NULL); } } |
18 16 17 18 16 16 21 15 18 491 188 459 457 485 144 288 285 460 284 123 35 30 35 34 147 123 58 102 101 15 2 6 6 5 5 5 5 8 5 4 105 8 102 104 105 104 97 6 7 100 54 59 32 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 
874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 
1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 | // SPDX-License-Identifier: GPL-2.0-or-later /* Generic associative array implementation. * * See Documentation/core-api/assoc_array.rst for information. * * Copyright (C) 2013 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) */ //#define DEBUG #include <linux/rcupdate.h> #include <linux/slab.h> #include <linux/err.h> #include <linux/assoc_array_priv.h> /* * Iterate over an associative array. The caller must hold the RCU read lock * or better. */ static int assoc_array_subtree_iterate(const struct assoc_array_ptr *root, const struct assoc_array_ptr *stop, int (*iterator)(const void *leaf, void *iterator_data), void *iterator_data) { const struct assoc_array_shortcut *shortcut; const struct assoc_array_node *node; const struct assoc_array_ptr *cursor, *ptr, *parent; unsigned long has_meta; int slot, ret; cursor = root; begin_node: if (assoc_array_ptr_is_shortcut(cursor)) { /* Descend through a shortcut */ shortcut = assoc_array_ptr_to_shortcut(cursor); cursor = READ_ONCE(shortcut->next_node); /* Address dependency. */ } node = assoc_array_ptr_to_node(cursor); slot = 0; /* We perform two passes of each node. * * The first pass does all the leaves in this node. This means we * don't miss any leaves if the node is split up by insertion whilst * we're iterating over the branches rooted here (we may, however, see * some leaves twice). */ has_meta = 0; for (; slot < ASSOC_ARRAY_FAN_OUT; slot++) { ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */ has_meta |= (unsigned long)ptr; if (ptr && assoc_array_ptr_is_leaf(ptr)) { /* We need a barrier between the read of the pointer, * which is supplied by the above READ_ONCE(). */ /* Invoke the callback */ ret = iterator(assoc_array_ptr_to_leaf(ptr), iterator_data); if (ret) return ret; } } /* The second pass attends to all the metadata pointers. If we follow * one of these we may find that we don't come back here, but rather go * back to a replacement node with the leaves in a different layout. * * We are guaranteed to make progress, however, as the slot number for * a particular portion of the key space cannot change - and we * continue at the back pointer + 1. */ if (!(has_meta & ASSOC_ARRAY_PTR_META_TYPE)) goto finished_node; slot = 0; continue_node: node = assoc_array_ptr_to_node(cursor); for (; slot < ASSOC_ARRAY_FAN_OUT; slot++) { ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */ if (assoc_array_ptr_is_meta(ptr)) { cursor = ptr; goto begin_node; } } finished_node: /* Move up to the parent (may need to skip back over a shortcut) */ parent = READ_ONCE(node->back_pointer); /* Address dependency. */ slot = node->parent_slot; if (parent == stop) return 0; if (assoc_array_ptr_is_shortcut(parent)) { shortcut = assoc_array_ptr_to_shortcut(parent); cursor = parent; parent = READ_ONCE(shortcut->back_pointer); /* Address dependency. 
*/ slot = shortcut->parent_slot; if (parent == stop) return 0; } /* Ascend to next slot in parent node */ cursor = parent; slot++; goto continue_node; } /** * assoc_array_iterate - Pass all objects in the array to a callback * @array: The array to iterate over. * @iterator: The callback function. * @iterator_data: Private data for the callback function. * * Iterate over all the objects in an associative array. Each one will be * presented to the iterator function. * * If the array is being modified concurrently with the iteration then it is * possible that some objects in the array will be passed to the iterator * callback more than once - though every object should be passed at least * once. If this is undesirable then the caller must lock against modification * for the duration of this function. * * The function will return 0 if no objects were in the array or else it will * return the result of the last iterator function called. Iteration stops * immediately if any call to the iteration function results in a non-zero * return. * * The caller should hold the RCU read lock or better if concurrent * modification is possible. */ int assoc_array_iterate(const struct assoc_array *array, int (*iterator)(const void *object, void *iterator_data), void *iterator_data) { struct assoc_array_ptr *root = READ_ONCE(array->root); /* Address dependency. */ if (!root) return 0; return assoc_array_subtree_iterate(root, NULL, iterator, iterator_data); } enum assoc_array_walk_status { assoc_array_walk_tree_empty, assoc_array_walk_found_terminal_node, assoc_array_walk_found_wrong_shortcut, }; struct assoc_array_walk_result { struct { struct assoc_array_node *node; /* Node in which leaf might be found */ int level; int slot; } terminal_node; struct { struct assoc_array_shortcut *shortcut; int level; int sc_level; unsigned long sc_segments; unsigned long dissimilarity; } wrong_shortcut; }; /* * Navigate through the internal tree looking for the closest node to the key. */ static enum assoc_array_walk_status assoc_array_walk(const struct assoc_array *array, const struct assoc_array_ops *ops, const void *index_key, struct assoc_array_walk_result *result) { struct assoc_array_shortcut *shortcut; struct assoc_array_node *node; struct assoc_array_ptr *cursor, *ptr; unsigned long sc_segments, dissimilarity; unsigned long segments; int level, sc_level, next_sc_level; int slot; pr_devel("-->%s()\n", __func__); cursor = READ_ONCE(array->root); /* Address dependency. */ if (!cursor) return assoc_array_walk_tree_empty; level = 0; /* Use segments from the key for the new leaf to navigate through the * internal tree, skipping through nodes and shortcuts that are on * route to the destination. Eventually we'll come to a slot that is * either empty or contains a leaf at which point we've found a node in * which the leaf we're looking for might be found or into which it * should be inserted. */ jumped: segments = ops->get_key_chunk(index_key, level); pr_devel("segments[%d]: %lx\n", level, segments); if (assoc_array_ptr_is_shortcut(cursor)) goto follow_shortcut; consider_node: node = assoc_array_ptr_to_node(cursor); slot = segments >> (level & ASSOC_ARRAY_KEY_CHUNK_MASK); slot &= ASSOC_ARRAY_FAN_MASK; ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */ pr_devel("consider slot %x [ix=%d type=%lu]\n", slot, level, (unsigned long)ptr & 3); if (!assoc_array_ptr_is_meta(ptr)) { /* The node doesn't have a node/shortcut pointer in the slot * corresponding to the index key that we have to follow. 
*/ result->terminal_node.node = node; result->terminal_node.level = level; result->terminal_node.slot = slot; pr_devel("<--%s() = terminal_node\n", __func__); return assoc_array_walk_found_terminal_node; } if (assoc_array_ptr_is_node(ptr)) { /* There is a pointer to a node in the slot corresponding to * this index key segment, so we need to follow it. */ cursor = ptr; level += ASSOC_ARRAY_LEVEL_STEP; if ((level & ASSOC_ARRAY_KEY_CHUNK_MASK) != 0) goto consider_node; goto jumped; } /* There is a shortcut in the slot corresponding to the index key * segment. We follow the shortcut if its partial index key matches * this leaf's. Otherwise we need to split the shortcut. */ cursor = ptr; follow_shortcut: shortcut = assoc_array_ptr_to_shortcut(cursor); pr_devel("shortcut to %d\n", shortcut->skip_to_level); sc_level = level + ASSOC_ARRAY_LEVEL_STEP; BUG_ON(sc_level > shortcut->skip_to_level); do { /* Check the leaf against the shortcut's index key a word at a * time, trimming the final word (the shortcut stores the index * key completely from the root to the shortcut's target). */ if ((sc_level & ASSOC_ARRAY_KEY_CHUNK_MASK) == 0) segments = ops->get_key_chunk(index_key, sc_level); sc_segments = shortcut->index_key[sc_level >> ASSOC_ARRAY_KEY_CHUNK_SHIFT]; dissimilarity = segments ^ sc_segments; if (round_up(sc_level, ASSOC_ARRAY_KEY_CHUNK_SIZE) > shortcut->skip_to_level) { /* Trim segments that are beyond the shortcut */ int shift = shortcut->skip_to_level & ASSOC_ARRAY_KEY_CHUNK_MASK; dissimilarity &= ~(ULONG_MAX << shift); next_sc_level = shortcut->skip_to_level; } else { next_sc_level = sc_level + ASSOC_ARRAY_KEY_CHUNK_SIZE; next_sc_level = round_down(next_sc_level, ASSOC_ARRAY_KEY_CHUNK_SIZE); } if (dissimilarity != 0) { /* This shortcut points elsewhere */ result->wrong_shortcut.shortcut = shortcut; result->wrong_shortcut.level = level; result->wrong_shortcut.sc_level = sc_level; result->wrong_shortcut.sc_segments = sc_segments; result->wrong_shortcut.dissimilarity = dissimilarity; return assoc_array_walk_found_wrong_shortcut; } sc_level = next_sc_level; } while (sc_level < shortcut->skip_to_level); /* The shortcut matches the leaf's index to this point. */ cursor = READ_ONCE(shortcut->next_node); /* Address dependency. */ if (((level ^ sc_level) & ~ASSOC_ARRAY_KEY_CHUNK_MASK) != 0) { level = sc_level; goto jumped; } else { level = sc_level; goto consider_node; } } /** * assoc_array_find - Find an object by index key * @array: The associative array to search. * @ops: The operations to use. * @index_key: The key to the object. * * Find an object in an associative array by walking through the internal tree * to the node that should contain the object and then searching the leaves * there. NULL is returned if the requested object was not found in the array. * * The caller must hold the RCU read lock or better. */ void *assoc_array_find(const struct assoc_array *array, const struct assoc_array_ops *ops, const void *index_key) { struct assoc_array_walk_result result; const struct assoc_array_node *node; const struct assoc_array_ptr *ptr; const void *leaf; int slot; if (assoc_array_walk(array, ops, index_key, &result) != assoc_array_walk_found_terminal_node) return NULL; node = result.terminal_node.node; /* If the target key is available to us, it's has to be pointed to by * the terminal node. */ for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) { ptr = READ_ONCE(node->slots[slot]); /* Address dependency. 
*/ if (ptr && assoc_array_ptr_is_leaf(ptr)) { /* We need a barrier between the read of the pointer * and dereferencing the pointer - but only if we are * actually going to dereference it. */ leaf = assoc_array_ptr_to_leaf(ptr); if (ops->compare_object(leaf, index_key)) return (void *)leaf; } } return NULL; } /* * Destructively iterate over an associative array. The caller must prevent * other simultaneous accesses. */ static void assoc_array_destroy_subtree(struct assoc_array_ptr *root, const struct assoc_array_ops *ops) { struct assoc_array_shortcut *shortcut; struct assoc_array_node *node; struct assoc_array_ptr *cursor, *parent = NULL; int slot = -1; pr_devel("-->%s()\n", __func__); cursor = root; if (!cursor) { pr_devel("empty\n"); return; } move_to_meta: if (assoc_array_ptr_is_shortcut(cursor)) { /* Descend through a shortcut */ pr_devel("[%d] shortcut\n", slot); BUG_ON(!assoc_array_ptr_is_shortcut(cursor)); shortcut = assoc_array_ptr_to_shortcut(cursor); BUG_ON(shortcut->back_pointer != parent); BUG_ON(slot != -1 && shortcut->parent_slot != slot); parent = cursor; cursor = shortcut->next_node; slot = -1; BUG_ON(!assoc_array_ptr_is_node(cursor)); } pr_devel("[%d] node\n", slot); node = assoc_array_ptr_to_node(cursor); BUG_ON(node->back_pointer != parent); BUG_ON(slot != -1 && node->parent_slot != slot); slot = 0; continue_node: pr_devel("Node %p [back=%p]\n", node, node->back_pointer); for (; slot < ASSOC_ARRAY_FAN_OUT; slot++) { struct assoc_array_ptr *ptr = node->slots[slot]; if (!ptr) continue; if (assoc_array_ptr_is_meta(ptr)) { parent = cursor; cursor = ptr; goto move_to_meta; } if (ops) { pr_devel("[%d] free leaf\n", slot); ops->free_object(assoc_array_ptr_to_leaf(ptr)); } } parent = node->back_pointer; slot = node->parent_slot; pr_devel("free node\n"); kfree(node); if (!parent) return; /* Done */ /* Move back up to the parent (may need to free a shortcut on * the way up) */ if (assoc_array_ptr_is_shortcut(parent)) { shortcut = assoc_array_ptr_to_shortcut(parent); BUG_ON(shortcut->next_node != cursor); cursor = parent; parent = shortcut->back_pointer; slot = shortcut->parent_slot; pr_devel("free shortcut\n"); kfree(shortcut); if (!parent) return; BUG_ON(!assoc_array_ptr_is_node(parent)); } /* Ascend to next slot in parent node */ pr_devel("ascend to %p[%d]\n", parent, slot); cursor = parent; node = assoc_array_ptr_to_node(cursor); slot++; goto continue_node; } /** * assoc_array_destroy - Destroy an associative array * @array: The array to destroy. * @ops: The operations to use. * * Discard all metadata and free all objects in an associative array. The * array will be empty and ready to use again upon completion. This function * cannot fail. * * The caller must prevent all other accesses whilst this takes place as no * attempt is made to adjust pointers gracefully to permit RCU readlock-holding * accesses to continue. On the other hand, no memory allocation is required. */ void assoc_array_destroy(struct assoc_array *array, const struct assoc_array_ops *ops) { assoc_array_destroy_subtree(array->root, ops); array->root = NULL; } /* * Handle insertion into an empty tree. 
*/ static bool assoc_array_insert_in_empty_tree(struct assoc_array_edit *edit) { struct assoc_array_node *new_n0; pr_devel("-->%s()\n", __func__); new_n0 = kzalloc(sizeof(struct assoc_array_node), GFP_KERNEL); if (!new_n0) return false; edit->new_meta[0] = assoc_array_node_to_ptr(new_n0); edit->leaf_p = &new_n0->slots[0]; edit->adjust_count_on = new_n0; edit->set[0].ptr = &edit->array->root; edit->set[0].to = assoc_array_node_to_ptr(new_n0); pr_devel("<--%s() = ok [no root]\n", __func__); return true; } /* * Handle insertion into a terminal node. */ static bool assoc_array_insert_into_terminal_node(struct assoc_array_edit *edit, const struct assoc_array_ops *ops, const void *index_key, struct assoc_array_walk_result *result) { struct assoc_array_shortcut *shortcut, *new_s0; struct assoc_array_node *node, *new_n0, *new_n1, *side; struct assoc_array_ptr *ptr; unsigned long dissimilarity, base_seg, blank; size_t keylen; bool have_meta; int level, diff; int slot, next_slot, free_slot, i, j; node = result->terminal_node.node; level = result->terminal_node.level; edit->segment_cache[ASSOC_ARRAY_FAN_OUT] = result->terminal_node.slot; pr_devel("-->%s()\n", __func__); /* We arrived at a node which doesn't have an onward node or shortcut * pointer that we have to follow. This means that (a) the leaf we * want must go here (either by insertion or replacement) or (b) we * need to split this node and insert in one of the fragments. */ free_slot = -1; /* Firstly, we have to check the leaves in this node to see if there's * a matching one we should replace in place. */ for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) { ptr = node->slots[i]; if (!ptr) { free_slot = i; continue; } if (assoc_array_ptr_is_leaf(ptr) && ops->compare_object(assoc_array_ptr_to_leaf(ptr), index_key)) { pr_devel("replace in slot %d\n", i); edit->leaf_p = &node->slots[i]; edit->dead_leaf = node->slots[i]; pr_devel("<--%s() = ok [replace]\n", __func__); return true; } } /* If there is a free slot in this node then we can just insert the * leaf here. */ if (free_slot >= 0) { pr_devel("insert in free slot %d\n", free_slot); edit->leaf_p = &node->slots[free_slot]; edit->adjust_count_on = node; pr_devel("<--%s() = ok [insert]\n", __func__); return true; } /* The node has no spare slots - so we're either going to have to split * it or insert another node before it. * * Whatever, we're going to need at least two new nodes - so allocate * those now. We may also need a new shortcut, but we deal with that * when we need it. */ new_n0 = kzalloc(sizeof(struct assoc_array_node), GFP_KERNEL); if (!new_n0) return false; edit->new_meta[0] = assoc_array_node_to_ptr(new_n0); new_n1 = kzalloc(sizeof(struct assoc_array_node), GFP_KERNEL); if (!new_n1) return false; edit->new_meta[1] = assoc_array_node_to_ptr(new_n1); /* We need to find out how similar the leaves are. 
*/ pr_devel("no spare slots\n"); have_meta = false; for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) { ptr = node->slots[i]; if (assoc_array_ptr_is_meta(ptr)) { edit->segment_cache[i] = 0xff; have_meta = true; continue; } base_seg = ops->get_object_key_chunk( assoc_array_ptr_to_leaf(ptr), level); base_seg >>= level & ASSOC_ARRAY_KEY_CHUNK_MASK; edit->segment_cache[i] = base_seg & ASSOC_ARRAY_FAN_MASK; } if (have_meta) { pr_devel("have meta\n"); goto split_node; } /* The node contains only leaves */ dissimilarity = 0; base_seg = edit->segment_cache[0]; for (i = 1; i < ASSOC_ARRAY_FAN_OUT; i++) dissimilarity |= edit->segment_cache[i] ^ base_seg; pr_devel("only leaves; dissimilarity=%lx\n", dissimilarity); if ((dissimilarity & ASSOC_ARRAY_FAN_MASK) == 0) { /* The old leaves all cluster in the same slot. We will need * to insert a shortcut if the new node wants to cluster with them. */ if ((edit->segment_cache[ASSOC_ARRAY_FAN_OUT] ^ base_seg) == 0) goto all_leaves_cluster_together; /* Otherwise all the old leaves cluster in the same slot, but * the new leaf wants to go into a different slot - so we * create a new node (n0) to hold the new leaf and a pointer to * a new node (n1) holding all the old leaves. * * This can be done by falling through to the node splitting * path. */ pr_devel("present leaves cluster but not new leaf\n"); } split_node: pr_devel("split node\n"); /* We need to split the current node. The node must contain anything * from a single leaf (in the one leaf case, this leaf will cluster * with the new leaf) and the rest meta-pointers, to all leaves, some * of which may cluster. * * It won't contain the case in which all the current leaves plus the * new leaves want to cluster in the same slot. * * We need to expel at least two leaves out of a set consisting of the * leaves in the node and the new leaf. The current meta pointers can * just be copied as they shouldn't cluster with any of the leaves. * * We need a new node (n0) to replace the current one and a new node to * take the expelled nodes (n1). */ edit->set[0].to = assoc_array_node_to_ptr(new_n0); new_n0->back_pointer = node->back_pointer; new_n0->parent_slot = node->parent_slot; new_n1->back_pointer = assoc_array_node_to_ptr(new_n0); new_n1->parent_slot = -1; /* Need to calculate this */ do_split_node: pr_devel("do_split_node\n"); new_n0->nr_leaves_on_branch = node->nr_leaves_on_branch; new_n1->nr_leaves_on_branch = 0; /* Begin by finding two matching leaves. There have to be at least two * that match - even if there are meta pointers - because any leaf that * would match a slot with a meta pointer in it must be somewhere * behind that meta pointer and cannot be here. Further, given N * remaining leaf slots, we now have N+1 leaves to go in them. 
*/ for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) { slot = edit->segment_cache[i]; if (slot != 0xff) for (j = i + 1; j < ASSOC_ARRAY_FAN_OUT + 1; j++) if (edit->segment_cache[j] == slot) goto found_slot_for_multiple_occupancy; } found_slot_for_multiple_occupancy: pr_devel("same slot: %x %x [%02x]\n", i, j, slot); BUG_ON(i >= ASSOC_ARRAY_FAN_OUT); BUG_ON(j >= ASSOC_ARRAY_FAN_OUT + 1); BUG_ON(slot >= ASSOC_ARRAY_FAN_OUT); new_n1->parent_slot = slot; /* Metadata pointers cannot change slot */ for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) if (assoc_array_ptr_is_meta(node->slots[i])) new_n0->slots[i] = node->slots[i]; else new_n0->slots[i] = NULL; BUG_ON(new_n0->slots[slot] != NULL); new_n0->slots[slot] = assoc_array_node_to_ptr(new_n1); /* Filter the leaf pointers between the new nodes */ free_slot = -1; next_slot = 0; for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) { if (assoc_array_ptr_is_meta(node->slots[i])) continue; if (edit->segment_cache[i] == slot) { new_n1->slots[next_slot++] = node->slots[i]; new_n1->nr_leaves_on_branch++; } else { do { free_slot++; } while (new_n0->slots[free_slot] != NULL); new_n0->slots[free_slot] = node->slots[i]; } } pr_devel("filtered: f=%x n=%x\n", free_slot, next_slot); if (edit->segment_cache[ASSOC_ARRAY_FAN_OUT] != slot) { do { free_slot++; } while (new_n0->slots[free_slot] != NULL); edit->leaf_p = &new_n0->slots[free_slot]; edit->adjust_count_on = new_n0; } else { edit->leaf_p = &new_n1->slots[next_slot++]; edit->adjust_count_on = new_n1; } BUG_ON(next_slot <= 1); edit->set_backpointers_to = assoc_array_node_to_ptr(new_n0); for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) { if (edit->segment_cache[i] == 0xff) { ptr = node->slots[i]; BUG_ON(assoc_array_ptr_is_leaf(ptr)); if (assoc_array_ptr_is_node(ptr)) { side = assoc_array_ptr_to_node(ptr); edit->set_backpointers[i] = &side->back_pointer; } else { shortcut = assoc_array_ptr_to_shortcut(ptr); edit->set_backpointers[i] = &shortcut->back_pointer; } } } ptr = node->back_pointer; if (!ptr) edit->set[0].ptr = &edit->array->root; else if (assoc_array_ptr_is_node(ptr)) edit->set[0].ptr = &assoc_array_ptr_to_node(ptr)->slots[node->parent_slot]; else edit->set[0].ptr = &assoc_array_ptr_to_shortcut(ptr)->next_node; edit->excised_meta[0] = assoc_array_node_to_ptr(node); pr_devel("<--%s() = ok [split node]\n", __func__); return true; all_leaves_cluster_together: /* All the leaves, new and old, want to cluster together in this node * in the same slot, so we have to replace this node with a shortcut to * skip over the identical parts of the key and then place a pair of * nodes, one inside the other, at the end of the shortcut and * distribute the keys between them. * * Firstly we need to work out where the leaves start diverging as a * bit position into their keys so that we know how big the shortcut * needs to be. * * We only need to make a single pass of N of the N+1 leaves because if * any keys differ between themselves at bit X then at least one of * them must also differ with the base key at bit X or before. 
*/ pr_devel("all leaves cluster together\n"); diff = INT_MAX; for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) { int x = ops->diff_objects(assoc_array_ptr_to_leaf(node->slots[i]), index_key); if (x < diff) { BUG_ON(x < 0); diff = x; } } BUG_ON(diff == INT_MAX); BUG_ON(diff < level + ASSOC_ARRAY_LEVEL_STEP); keylen = round_up(diff, ASSOC_ARRAY_KEY_CHUNK_SIZE); keylen >>= ASSOC_ARRAY_KEY_CHUNK_SHIFT; new_s0 = kzalloc(struct_size(new_s0, index_key, keylen), GFP_KERNEL); if (!new_s0) return false; edit->new_meta[2] = assoc_array_shortcut_to_ptr(new_s0); edit->set[0].to = assoc_array_shortcut_to_ptr(new_s0); new_s0->back_pointer = node->back_pointer; new_s0->parent_slot = node->parent_slot; new_s0->next_node = assoc_array_node_to_ptr(new_n0); new_n0->back_pointer = assoc_array_shortcut_to_ptr(new_s0); new_n0->parent_slot = 0; new_n1->back_pointer = assoc_array_node_to_ptr(new_n0); new_n1->parent_slot = -1; /* Need to calculate this */ new_s0->skip_to_level = level = diff & ~ASSOC_ARRAY_LEVEL_STEP_MASK; pr_devel("skip_to_level = %d [diff %d]\n", level, diff); BUG_ON(level <= 0); for (i = 0; i < keylen; i++) new_s0->index_key[i] = ops->get_key_chunk(index_key, i * ASSOC_ARRAY_KEY_CHUNK_SIZE); if (level & ASSOC_ARRAY_KEY_CHUNK_MASK) { blank = ULONG_MAX << (level & ASSOC_ARRAY_KEY_CHUNK_MASK); pr_devel("blank off [%zu] %d: %lx\n", keylen - 1, level, blank); new_s0->index_key[keylen - 1] &= ~blank; } /* This now reduces to a node splitting exercise for which we'll need * to regenerate the disparity table. */ for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) { ptr = node->slots[i]; base_seg = ops->get_object_key_chunk(assoc_array_ptr_to_leaf(ptr), level); base_seg >>= level & ASSOC_ARRAY_KEY_CHUNK_MASK; edit->segment_cache[i] = base_seg & ASSOC_ARRAY_FAN_MASK; } base_seg = ops->get_key_chunk(index_key, level); base_seg >>= level & ASSOC_ARRAY_KEY_CHUNK_MASK; edit->segment_cache[ASSOC_ARRAY_FAN_OUT] = base_seg & ASSOC_ARRAY_FAN_MASK; goto do_split_node; } /* * Handle insertion into the middle of a shortcut. */ static bool assoc_array_insert_mid_shortcut(struct assoc_array_edit *edit, const struct assoc_array_ops *ops, struct assoc_array_walk_result *result) { struct assoc_array_shortcut *shortcut, *new_s0, *new_s1; struct assoc_array_node *node, *new_n0, *side; unsigned long sc_segments, dissimilarity, blank; size_t keylen; int level, sc_level, diff; int sc_slot; shortcut = result->wrong_shortcut.shortcut; level = result->wrong_shortcut.level; sc_level = result->wrong_shortcut.sc_level; sc_segments = result->wrong_shortcut.sc_segments; dissimilarity = result->wrong_shortcut.dissimilarity; pr_devel("-->%s(ix=%d dis=%lx scix=%d)\n", __func__, level, dissimilarity, sc_level); /* We need to split a shortcut and insert a node between the two * pieces. Zero-length pieces will be dispensed with entirely. * * First of all, we need to find out in which level the first * difference was. 
*/ diff = __ffs(dissimilarity); diff &= ~ASSOC_ARRAY_LEVEL_STEP_MASK; diff += sc_level & ~ASSOC_ARRAY_KEY_CHUNK_MASK; pr_devel("diff=%d\n", diff); if (!shortcut->back_pointer) { edit->set[0].ptr = &edit->array->root; } else if (assoc_array_ptr_is_node(shortcut->back_pointer)) { node = assoc_array_ptr_to_node(shortcut->back_pointer); edit->set[0].ptr = &node->slots[shortcut->parent_slot]; } else { BUG(); } edit->excised_meta[0] = assoc_array_shortcut_to_ptr(shortcut); /* Create a new node now since we're going to need it anyway */ new_n0 = kzalloc(sizeof(struct assoc_array_node), GFP_KERNEL); if (!new_n0) return false; edit->new_meta[0] = assoc_array_node_to_ptr(new_n0); edit->adjust_count_on = new_n0; /* Insert a new shortcut before the new node if this segment isn't of * zero length - otherwise we just connect the new node directly to the * parent. */ level += ASSOC_ARRAY_LEVEL_STEP; if (diff > level) { pr_devel("pre-shortcut %d...%d\n", level, diff); keylen = round_up(diff, ASSOC_ARRAY_KEY_CHUNK_SIZE); keylen >>= ASSOC_ARRAY_KEY_CHUNK_SHIFT; new_s0 = kzalloc(struct_size(new_s0, index_key, keylen), GFP_KERNEL); if (!new_s0) return false; edit->new_meta[1] = assoc_array_shortcut_to_ptr(new_s0); edit->set[0].to = assoc_array_shortcut_to_ptr(new_s0); new_s0->back_pointer = shortcut->back_pointer; new_s0->parent_slot = shortcut->parent_slot; new_s0->next_node = assoc_array_node_to_ptr(new_n0); new_s0->skip_to_level = diff; new_n0->back_pointer = assoc_array_shortcut_to_ptr(new_s0); new_n0->parent_slot = 0; memcpy(new_s0->index_key, shortcut->index_key, flex_array_size(new_s0, index_key, keylen)); blank = ULONG_MAX << (diff & ASSOC_ARRAY_KEY_CHUNK_MASK); pr_devel("blank off [%zu] %d: %lx\n", keylen - 1, diff, blank); new_s0->index_key[keylen - 1] &= ~blank; } else { pr_devel("no pre-shortcut\n"); edit->set[0].to = assoc_array_node_to_ptr(new_n0); new_n0->back_pointer = shortcut->back_pointer; new_n0->parent_slot = shortcut->parent_slot; } side = assoc_array_ptr_to_node(shortcut->next_node); new_n0->nr_leaves_on_branch = side->nr_leaves_on_branch; /* We need to know which slot in the new node is going to take a * metadata pointer. */ sc_slot = sc_segments >> (diff & ASSOC_ARRAY_KEY_CHUNK_MASK); sc_slot &= ASSOC_ARRAY_FAN_MASK; pr_devel("new slot %lx >> %d -> %d\n", sc_segments, diff & ASSOC_ARRAY_KEY_CHUNK_MASK, sc_slot); /* Determine whether we need to follow the new node with a replacement * for the current shortcut. We could in theory reuse the current * shortcut if its parent slot number doesn't change - but that's a * 1-in-16 chance so not worth expending the code upon. 
*/ level = diff + ASSOC_ARRAY_LEVEL_STEP; if (level < shortcut->skip_to_level) { pr_devel("post-shortcut %d...%d\n", level, shortcut->skip_to_level); keylen = round_up(shortcut->skip_to_level, ASSOC_ARRAY_KEY_CHUNK_SIZE); keylen >>= ASSOC_ARRAY_KEY_CHUNK_SHIFT; new_s1 = kzalloc(struct_size(new_s1, index_key, keylen), GFP_KERNEL); if (!new_s1) return false; edit->new_meta[2] = assoc_array_shortcut_to_ptr(new_s1); new_s1->back_pointer = assoc_array_node_to_ptr(new_n0); new_s1->parent_slot = sc_slot; new_s1->next_node = shortcut->next_node; new_s1->skip_to_level = shortcut->skip_to_level; new_n0->slots[sc_slot] = assoc_array_shortcut_to_ptr(new_s1); memcpy(new_s1->index_key, shortcut->index_key, flex_array_size(new_s1, index_key, keylen)); edit->set[1].ptr = &side->back_pointer; edit->set[1].to = assoc_array_shortcut_to_ptr(new_s1); } else { pr_devel("no post-shortcut\n"); /* We don't have to replace the pointed-to node as long as we * use memory barriers to make sure the parent slot number is * changed before the back pointer (the parent slot number is * irrelevant to the old parent shortcut). */ new_n0->slots[sc_slot] = shortcut->next_node; edit->set_parent_slot[0].p = &side->parent_slot; edit->set_parent_slot[0].to = sc_slot; edit->set[1].ptr = &side->back_pointer; edit->set[1].to = assoc_array_node_to_ptr(new_n0); } /* Install the new leaf in a spare slot in the new node. */ if (sc_slot == 0) edit->leaf_p = &new_n0->slots[1]; else edit->leaf_p = &new_n0->slots[0]; pr_devel("<--%s() = ok [split shortcut]\n", __func__); return true; } /** * assoc_array_insert - Script insertion of an object into an associative array * @array: The array to insert into. * @ops: The operations to use. * @index_key: The key to insert at. * @object: The object to insert. * * Precalculate and preallocate a script for the insertion or replacement of an * object in an associative array. This results in an edit script that can * either be applied or cancelled. * * The function returns a pointer to an edit script or -ENOMEM. * * The caller should lock against other modifications and must continue to hold * the lock until assoc_array_apply_edit() has been called. * * Accesses to the tree may take place concurrently with this function, * provided they hold the RCU read lock. */ struct assoc_array_edit *assoc_array_insert(struct assoc_array *array, const struct assoc_array_ops *ops, const void *index_key, void *object) { struct assoc_array_walk_result result; struct assoc_array_edit *edit; pr_devel("-->%s()\n", __func__); /* The leaf pointer we're given must not have the bottom bit set as we * use those for type-marking the pointer. NULL pointers are also not * allowed as they indicate an empty slot but we have to allow them * here as they can be updated later. */ BUG_ON(assoc_array_ptr_is_meta(object)); edit = kzalloc(sizeof(struct assoc_array_edit), GFP_KERNEL); if (!edit) return ERR_PTR(-ENOMEM); edit->array = array; edit->ops = ops; edit->leaf = assoc_array_leaf_to_ptr(object); edit->adjust_count_by = 1; switch (assoc_array_walk(array, ops, index_key, &result)) { case assoc_array_walk_tree_empty: /* Allocate a root node if there isn't one yet */ if (!assoc_array_insert_in_empty_tree(edit)) goto enomem; return edit; case assoc_array_walk_found_terminal_node: /* We found a node that doesn't have a node/shortcut pointer in * the slot corresponding to the index key that we have to * follow. 
*/ if (!assoc_array_insert_into_terminal_node(edit, ops, index_key, &result)) goto enomem; return edit; case assoc_array_walk_found_wrong_shortcut: /* We found a shortcut that didn't match our key in a slot we * needed to follow. */ if (!assoc_array_insert_mid_shortcut(edit, ops, &result)) goto enomem; return edit; } enomem: /* Clean up after an out of memory error */ pr_devel("enomem\n"); assoc_array_cancel_edit(edit); return ERR_PTR(-ENOMEM); } /** * assoc_array_insert_set_object - Set the new object pointer in an edit script * @edit: The edit script to modify. * @object: The object pointer to set. * * Change the object to be inserted in an edit script. The object pointed to * by the old object is not freed. This must be done prior to applying the * script. */ void assoc_array_insert_set_object(struct assoc_array_edit *edit, void *object) { BUG_ON(!object); edit->leaf = assoc_array_leaf_to_ptr(object); } struct assoc_array_delete_collapse_context { struct assoc_array_node *node; const void *skip_leaf; int slot; }; /* * Subtree collapse to node iterator. */ static int assoc_array_delete_collapse_iterator(const void *leaf, void *iterator_data) { struct assoc_array_delete_collapse_context *collapse = iterator_data; if (leaf == collapse->skip_leaf) return 0; BUG_ON(collapse->slot >= ASSOC_ARRAY_FAN_OUT); collapse->node->slots[collapse->slot++] = assoc_array_leaf_to_ptr(leaf); return 0; } /** * assoc_array_delete - Script deletion of an object from an associative array * @array: The array to search. * @ops: The operations to use. * @index_key: The key to the object. * * Precalculate and preallocate a script for the deletion of an object from an * associative array. This results in an edit script that can either be * applied or cancelled. * * The function returns a pointer to an edit script if the object was found, * NULL if the object was not found or -ENOMEM. * * The caller should lock against other modifications and must continue to hold * the lock until assoc_array_apply_edit() has been called. * * Accesses to the tree may take place concurrently with this function, * provided they hold the RCU read lock. */ struct assoc_array_edit *assoc_array_delete(struct assoc_array *array, const struct assoc_array_ops *ops, const void *index_key) { struct assoc_array_delete_collapse_context collapse; struct assoc_array_walk_result result; struct assoc_array_node *node, *new_n0; struct assoc_array_edit *edit; struct assoc_array_ptr *ptr; bool has_meta; int slot, i; pr_devel("-->%s()\n", __func__); edit = kzalloc(sizeof(struct assoc_array_edit), GFP_KERNEL); if (!edit) return ERR_PTR(-ENOMEM); edit->array = array; edit->ops = ops; edit->adjust_count_by = -1; switch (assoc_array_walk(array, ops, index_key, &result)) { case assoc_array_walk_found_terminal_node: /* We found a node that should contain the leaf we've been * asked to remove - *if* it's in the tree. */ pr_devel("terminal_node\n"); node = result.terminal_node.node; for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) { ptr = node->slots[slot]; if (ptr && assoc_array_ptr_is_leaf(ptr) && ops->compare_object(assoc_array_ptr_to_leaf(ptr), index_key)) goto found_leaf; } fallthrough; case assoc_array_walk_tree_empty: case assoc_array_walk_found_wrong_shortcut: default: assoc_array_cancel_edit(edit); pr_devel("not found\n"); return NULL; } found_leaf: BUG_ON(array->nr_leaves_on_tree <= 0); /* In the simplest form of deletion we just clear the slot and release * the leaf after a suitable interval. 
*/ edit->dead_leaf = node->slots[slot]; edit->set[0].ptr = &node->slots[slot]; edit->set[0].to = NULL; edit->adjust_count_on = node; /* If that concludes erasure of the last leaf, then delete the entire * internal array. */ if (array->nr_leaves_on_tree == 1) { edit->set[1].ptr = &array->root; edit->set[1].to = NULL; edit->adjust_count_on = NULL; edit->excised_subtree = array->root; pr_devel("all gone\n"); return edit; } /* However, we'd also like to clear up some metadata blocks if we * possibly can. * * We go for a simple algorithm of: if this node has FAN_OUT or fewer * leaves in it, then attempt to collapse it - and attempt to * recursively collapse up the tree. * * We could also try and collapse in partially filled subtrees to take * up space in this node. */ if (node->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT + 1) { struct assoc_array_node *parent, *grandparent; struct assoc_array_ptr *ptr; /* First of all, we need to know if this node has metadata so * that we don't try collapsing if all the leaves are already * here. */ has_meta = false; for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) { ptr = node->slots[i]; if (assoc_array_ptr_is_meta(ptr)) { has_meta = true; break; } } pr_devel("leaves: %ld [m=%d]\n", node->nr_leaves_on_branch - 1, has_meta); /* Look further up the tree to see if we can collapse this node * into a more proximal node too. */ parent = node; collapse_up: pr_devel("collapse subtree: %ld\n", parent->nr_leaves_on_branch); ptr = parent->back_pointer; if (!ptr) goto do_collapse; if (assoc_array_ptr_is_shortcut(ptr)) { struct assoc_array_shortcut *s = assoc_array_ptr_to_shortcut(ptr); ptr = s->back_pointer; if (!ptr) goto do_collapse; } grandparent = assoc_array_ptr_to_node(ptr); if (grandparent->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT + 1) { parent = grandparent; goto collapse_up; } do_collapse: /* There's no point collapsing if the original node has no meta * pointers to discard and if we didn't merge into one of that * node's ancestry. 
*/ if (has_meta || parent != node) { node = parent; /* Create a new node to collapse into */ new_n0 = kzalloc(sizeof(struct assoc_array_node), GFP_KERNEL); if (!new_n0) goto enomem; edit->new_meta[0] = assoc_array_node_to_ptr(new_n0); new_n0->back_pointer = node->back_pointer; new_n0->parent_slot = node->parent_slot; new_n0->nr_leaves_on_branch = node->nr_leaves_on_branch; edit->adjust_count_on = new_n0; collapse.node = new_n0; collapse.skip_leaf = assoc_array_ptr_to_leaf(edit->dead_leaf); collapse.slot = 0; assoc_array_subtree_iterate(assoc_array_node_to_ptr(node), node->back_pointer, assoc_array_delete_collapse_iterator, &collapse); pr_devel("collapsed %d,%lu\n", collapse.slot, new_n0->nr_leaves_on_branch); BUG_ON(collapse.slot != new_n0->nr_leaves_on_branch - 1); if (!node->back_pointer) { edit->set[1].ptr = &array->root; } else if (assoc_array_ptr_is_leaf(node->back_pointer)) { BUG(); } else if (assoc_array_ptr_is_node(node->back_pointer)) { struct assoc_array_node *p = assoc_array_ptr_to_node(node->back_pointer); edit->set[1].ptr = &p->slots[node->parent_slot]; } else if (assoc_array_ptr_is_shortcut(node->back_pointer)) { struct assoc_array_shortcut *s = assoc_array_ptr_to_shortcut(node->back_pointer); edit->set[1].ptr = &s->next_node; } edit->set[1].to = assoc_array_node_to_ptr(new_n0); edit->excised_subtree = assoc_array_node_to_ptr(node); } } return edit; enomem: /* Clean up after an out of memory error */ pr_devel("enomem\n"); assoc_array_cancel_edit(edit); return ERR_PTR(-ENOMEM); } /** * assoc_array_clear - Script deletion of all objects from an associative array * @array: The array to clear. * @ops: The operations to use. * * Precalculate and preallocate a script for the deletion of all the objects * from an associative array. This results in an edit script that can either * be applied or cancelled. * * The function returns a pointer to an edit script if there are objects to be * deleted, NULL if there are no objects in the array or -ENOMEM. * * The caller should lock against other modifications and must continue to hold * the lock until assoc_array_apply_edit() has been called. * * Accesses to the tree may take place concurrently with this function, * provided they hold the RCU read lock. */ struct assoc_array_edit *assoc_array_clear(struct assoc_array *array, const struct assoc_array_ops *ops) { struct assoc_array_edit *edit; pr_devel("-->%s()\n", __func__); if (!array->root) return NULL; edit = kzalloc(sizeof(struct assoc_array_edit), GFP_KERNEL); if (!edit) return ERR_PTR(-ENOMEM); edit->array = array; edit->ops = ops; edit->set[1].ptr = &array->root; edit->set[1].to = NULL; edit->excised_subtree = array->root; edit->ops_for_excised_subtree = ops; pr_devel("all gone\n"); return edit; } /* * Handle the deferred destruction after an applied edit. 
*/ static void assoc_array_rcu_cleanup(struct rcu_head *head) { struct assoc_array_edit *edit = container_of(head, struct assoc_array_edit, rcu); int i; pr_devel("-->%s()\n", __func__); if (edit->dead_leaf) edit->ops->free_object(assoc_array_ptr_to_leaf(edit->dead_leaf)); for (i = 0; i < ARRAY_SIZE(edit->excised_meta); i++) if (edit->excised_meta[i]) kfree(assoc_array_ptr_to_node(edit->excised_meta[i])); if (edit->excised_subtree) { BUG_ON(assoc_array_ptr_is_leaf(edit->excised_subtree)); if (assoc_array_ptr_is_node(edit->excised_subtree)) { struct assoc_array_node *n = assoc_array_ptr_to_node(edit->excised_subtree); n->back_pointer = NULL; } else { struct assoc_array_shortcut *s = assoc_array_ptr_to_shortcut(edit->excised_subtree); s->back_pointer = NULL; } assoc_array_destroy_subtree(edit->excised_subtree, edit->ops_for_excised_subtree); } kfree(edit); } /** * assoc_array_apply_edit - Apply an edit script to an associative array * @edit: The script to apply. * * Apply an edit script to an associative array to effect an insertion, * deletion or clearance. As the edit script includes preallocated memory, * this is guaranteed not to fail. * * The edit script, dead objects and dead metadata will be scheduled for * destruction after an RCU grace period to permit those doing read-only * accesses on the array to continue to do so under the RCU read lock whilst * the edit is taking place. */ void assoc_array_apply_edit(struct assoc_array_edit *edit) { struct assoc_array_shortcut *shortcut; struct assoc_array_node *node; struct assoc_array_ptr *ptr; int i; pr_devel("-->%s()\n", __func__); smp_wmb(); if (edit->leaf_p) *edit->leaf_p = edit->leaf; smp_wmb(); for (i = 0; i < ARRAY_SIZE(edit->set_parent_slot); i++) if (edit->set_parent_slot[i].p) *edit->set_parent_slot[i].p = edit->set_parent_slot[i].to; smp_wmb(); for (i = 0; i < ARRAY_SIZE(edit->set_backpointers); i++) if (edit->set_backpointers[i]) *edit->set_backpointers[i] = edit->set_backpointers_to; smp_wmb(); for (i = 0; i < ARRAY_SIZE(edit->set); i++) if (edit->set[i].ptr) *edit->set[i].ptr = edit->set[i].to; if (edit->array->root == NULL) { edit->array->nr_leaves_on_tree = 0; } else if (edit->adjust_count_on) { node = edit->adjust_count_on; for (;;) { node->nr_leaves_on_branch += edit->adjust_count_by; ptr = node->back_pointer; if (!ptr) break; if (assoc_array_ptr_is_shortcut(ptr)) { shortcut = assoc_array_ptr_to_shortcut(ptr); ptr = shortcut->back_pointer; if (!ptr) break; } BUG_ON(!assoc_array_ptr_is_node(ptr)); node = assoc_array_ptr_to_node(ptr); } edit->array->nr_leaves_on_tree += edit->adjust_count_by; } call_rcu(&edit->rcu, assoc_array_rcu_cleanup); } /** * assoc_array_cancel_edit - Discard an edit script. * @edit: The script to discard. * * Free an edit script and all the preallocated data it holds without making * any changes to the associative array it was intended for. * * NOTE! In the case of an insertion script, this does _not_ release the leaf * that was to be inserted. That is left to the caller. */ void assoc_array_cancel_edit(struct assoc_array_edit *edit) { struct assoc_array_ptr *ptr; int i; pr_devel("-->%s()\n", __func__); /* Clean up after an out of memory error */ for (i = 0; i < ARRAY_SIZE(edit->new_meta); i++) { ptr = edit->new_meta[i]; if (ptr) { if (assoc_array_ptr_is_node(ptr)) kfree(assoc_array_ptr_to_node(ptr)); else kfree(assoc_array_ptr_to_shortcut(ptr)); } } kfree(edit); } /** * assoc_array_gc - Garbage collect an associative array. * @array: The array to clean. * @ops: The operations to use. 
* @iterator: A callback function to pass judgement on each object. * @iterator_data: Private data for the callback function. * * Collect garbage from an associative array and pack down the internal tree to * save memory. * * The iterator function is asked to pass judgement upon each object in the * array. If it returns false, the object is discard and if it returns true, * the object is kept. If it returns true, it must increment the object's * usage count (or whatever it needs to do to retain it) before returning. * * This function returns 0 if successful or -ENOMEM if out of memory. In the * latter case, the array is not changed. * * The caller should lock against other modifications and must continue to hold * the lock until assoc_array_apply_edit() has been called. * * Accesses to the tree may take place concurrently with this function, * provided they hold the RCU read lock. */ int assoc_array_gc(struct assoc_array *array, const struct assoc_array_ops *ops, bool (*iterator)(void *object, void *iterator_data), void *iterator_data) { struct assoc_array_shortcut *shortcut, *new_s; struct assoc_array_node *node, *new_n; struct assoc_array_edit *edit; struct assoc_array_ptr *cursor, *ptr; struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp; unsigned long nr_leaves_on_tree; bool retained; int keylen, slot, nr_free, next_slot, i; pr_devel("-->%s()\n", __func__); if (!array->root) return 0; edit = kzalloc(sizeof(struct assoc_array_edit), GFP_KERNEL); if (!edit) return -ENOMEM; edit->array = array; edit->ops = ops; edit->ops_for_excised_subtree = ops; edit->set[0].ptr = &array->root; edit->excised_subtree = array->root; new_root = new_parent = NULL; new_ptr_pp = &new_root; cursor = array->root; descend: /* If this point is a shortcut, then we need to duplicate it and * advance the target cursor. */ if (assoc_array_ptr_is_shortcut(cursor)) { shortcut = assoc_array_ptr_to_shortcut(cursor); keylen = round_up(shortcut->skip_to_level, ASSOC_ARRAY_KEY_CHUNK_SIZE); keylen >>= ASSOC_ARRAY_KEY_CHUNK_SHIFT; new_s = kmalloc(struct_size(new_s, index_key, keylen), GFP_KERNEL); if (!new_s) goto enomem; pr_devel("dup shortcut %p -> %p\n", shortcut, new_s); memcpy(new_s, shortcut, struct_size(new_s, index_key, keylen)); new_s->back_pointer = new_parent; new_s->parent_slot = shortcut->parent_slot; *new_ptr_pp = new_parent = assoc_array_shortcut_to_ptr(new_s); new_ptr_pp = &new_s->next_node; cursor = shortcut->next_node; } /* Duplicate the node at this position */ node = assoc_array_ptr_to_node(cursor); new_n = kzalloc(sizeof(struct assoc_array_node), GFP_KERNEL); if (!new_n) goto enomem; pr_devel("dup node %p -> %p\n", node, new_n); new_n->back_pointer = new_parent; new_n->parent_slot = node->parent_slot; *new_ptr_pp = new_parent = assoc_array_node_to_ptr(new_n); new_ptr_pp = NULL; slot = 0; continue_node: /* Filter across any leaves and gc any subtrees */ for (; slot < ASSOC_ARRAY_FAN_OUT; slot++) { ptr = node->slots[slot]; if (!ptr) continue; if (assoc_array_ptr_is_leaf(ptr)) { if (iterator(assoc_array_ptr_to_leaf(ptr), iterator_data)) /* The iterator will have done any reference * counting on the object for us. */ new_n->slots[slot] = ptr; continue; } new_ptr_pp = &new_n->slots[slot]; cursor = ptr; goto descend; } retry_compress: pr_devel("-- compress node %p --\n", new_n); /* Count up the number of empty slots in this node and work out the * subtree leaf count. 
*/ new_n->nr_leaves_on_branch = 0; nr_free = 0; for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) { ptr = new_n->slots[slot]; if (!ptr) nr_free++; else if (assoc_array_ptr_is_leaf(ptr)) new_n->nr_leaves_on_branch++; } pr_devel("free=%d, leaves=%lu\n", nr_free, new_n->nr_leaves_on_branch); /* See what we can fold in */ retained = false; next_slot = 0; for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) { struct assoc_array_shortcut *s; struct assoc_array_node *child; ptr = new_n->slots[slot]; if (!ptr || assoc_array_ptr_is_leaf(ptr)) continue; s = NULL; if (assoc_array_ptr_is_shortcut(ptr)) { s = assoc_array_ptr_to_shortcut(ptr); ptr = s->next_node; } child = assoc_array_ptr_to_node(ptr); new_n->nr_leaves_on_branch += child->nr_leaves_on_branch; if (child->nr_leaves_on_branch <= nr_free + 1) { /* Fold the child node into this one */ pr_devel("[%d] fold node %lu/%d [nx %d]\n", slot, child->nr_leaves_on_branch, nr_free + 1, next_slot); /* We would already have reaped an intervening shortcut * on the way back up the tree. */ BUG_ON(s); new_n->slots[slot] = NULL; nr_free++; if (slot < next_slot) next_slot = slot; for (i = 0; i < ASSOC_ARRAY_FAN_OUT; i++) { struct assoc_array_ptr *p = child->slots[i]; if (!p) continue; BUG_ON(assoc_array_ptr_is_meta(p)); while (new_n->slots[next_slot]) next_slot++; BUG_ON(next_slot >= ASSOC_ARRAY_FAN_OUT); new_n->slots[next_slot++] = p; nr_free--; } kfree(child); } else { pr_devel("[%d] retain node %lu/%d [nx %d]\n", slot, child->nr_leaves_on_branch, nr_free + 1, next_slot); retained = true; } } if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) { pr_devel("internal nodes remain despite enough space, retrying\n"); goto retry_compress; } pr_devel("after: %lu\n", new_n->nr_leaves_on_branch); nr_leaves_on_tree = new_n->nr_leaves_on_branch; /* Excise this node if it is singly occupied by a shortcut */ if (nr_free == ASSOC_ARRAY_FAN_OUT - 1) { for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) if ((ptr = new_n->slots[slot])) break; if (assoc_array_ptr_is_meta(ptr) && assoc_array_ptr_is_shortcut(ptr)) { pr_devel("excise node %p with 1 shortcut\n", new_n); new_s = assoc_array_ptr_to_shortcut(ptr); new_parent = new_n->back_pointer; slot = new_n->parent_slot; kfree(new_n); if (!new_parent) { new_s->back_pointer = NULL; new_s->parent_slot = 0; new_root = ptr; goto gc_complete; } if (assoc_array_ptr_is_shortcut(new_parent)) { /* We can discard any preceding shortcut also */ struct assoc_array_shortcut *s = assoc_array_ptr_to_shortcut(new_parent); pr_devel("excise preceding shortcut\n"); new_parent = new_s->back_pointer = s->back_pointer; slot = new_s->parent_slot = s->parent_slot; kfree(s); if (!new_parent) { new_s->back_pointer = NULL; new_s->parent_slot = 0; new_root = ptr; goto gc_complete; } } new_s->back_pointer = new_parent; new_s->parent_slot = slot; new_n = assoc_array_ptr_to_node(new_parent); new_n->slots[slot] = ptr; goto ascend_old_tree; } } /* Excise any shortcuts we might encounter that point to nodes that * only contain leaves. 
*/ ptr = new_n->back_pointer; if (!ptr) goto gc_complete; if (assoc_array_ptr_is_shortcut(ptr)) { new_s = assoc_array_ptr_to_shortcut(ptr); new_parent = new_s->back_pointer; slot = new_s->parent_slot; if (new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) { struct assoc_array_node *n; pr_devel("excise shortcut\n"); new_n->back_pointer = new_parent; new_n->parent_slot = slot; kfree(new_s); if (!new_parent) { new_root = assoc_array_node_to_ptr(new_n); goto gc_complete; } n = assoc_array_ptr_to_node(new_parent); n->slots[slot] = assoc_array_node_to_ptr(new_n); } } else { new_parent = ptr; } new_n = assoc_array_ptr_to_node(new_parent); ascend_old_tree: ptr = node->back_pointer; if (assoc_array_ptr_is_shortcut(ptr)) { shortcut = assoc_array_ptr_to_shortcut(ptr); slot = shortcut->parent_slot; cursor = shortcut->back_pointer; if (!cursor) goto gc_complete; } else { slot = node->parent_slot; cursor = ptr; } BUG_ON(!cursor); node = assoc_array_ptr_to_node(cursor); slot++; goto continue_node; gc_complete: edit->set[0].to = new_root; assoc_array_apply_edit(edit); array->nr_leaves_on_tree = nr_leaves_on_tree; return 0; enomem: pr_devel("enomem\n"); assoc_array_destroy_subtree(new_root, edit->ops); kfree(edit); return -ENOMEM; } |
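/*
 * Editor's illustrative sketch, not part of the kernel sources: the functions
 * above form an edit-script API, where a caller precalculates an edit under
 * its own modification lock, applies it, and lets RCU handle the deferred
 * destruction.  The demo_* names and the single-word key layout below are
 * assumptions made for the example; only the assoc_array_* entry points and
 * the assoc_array_ops callback names are taken from the code above.  The
 * sketch assumes the whole key fits in one key chunk.
 */
#include <linux/assoc_array.h>
#include <linux/bitops.h>
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct demo_object {
	unsigned long key;	/* hypothetical single-word index key */
};

/* Hand back the key material starting at the requested bit level. */
static unsigned long demo_get_key_chunk(const void *index_key, int level)
{
	return *(const unsigned long *)index_key >> level;
}

static unsigned long demo_get_object_key_chunk(const void *object, int level)
{
	return ((const struct demo_object *)object)->key >> level;
}

static bool demo_compare_object(const void *object, const void *index_key)
{
	return ((const struct demo_object *)object)->key ==
		*(const unsigned long *)index_key;
}

/* Return the position of the first differing bit, or -1 if the keys match. */
static int demo_diff_objects(const void *object, const void *index_key)
{
	unsigned long diff = ((const struct demo_object *)object)->key ^
		*(const unsigned long *)index_key;

	return diff ? __ffs(diff) : -1;
}

static void demo_free_object(void *object)
{
	kfree(object);
}

static const struct assoc_array_ops demo_ops = {
	.get_key_chunk		= demo_get_key_chunk,
	.get_object_key_chunk	= demo_get_object_key_chunk,
	.compare_object		= demo_compare_object,
	.diff_objects		= demo_diff_objects,
	.free_object		= demo_free_object,
};

static DEFINE_SPINLOCK(demo_lock);	/* serialises modifications only */
static struct assoc_array demo_array;	/* assoc_array_init() run at startup */

/* Insert @obj, holding the modification lock until the edit is applied. */
static int demo_insert(struct demo_object *obj)
{
	struct assoc_array_edit *edit;

	spin_lock(&demo_lock);
	edit = assoc_array_insert(&demo_array, &demo_ops, &obj->key, obj);
	if (IS_ERR(edit)) {
		spin_unlock(&demo_lock);
		return PTR_ERR(edit);
	}
	assoc_array_apply_edit(edit);	/* cannot fail; memory preallocated */
	spin_unlock(&demo_lock);
	return 0;
}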
/* SPDX-License-Identifier: GPL-2.0 */

#undef TRACE_SYSTEM
#define TRACE_SYSTEM nbd

#if !defined(_TRACE_NBD_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_NBD_H

#include <linux/tracepoint.h>

DECLARE_EVENT_CLASS(nbd_transport_event,
	TP_PROTO(struct request *req, u64 handle),
	TP_ARGS(req, handle),
	TP_STRUCT__entry(
		__field(struct request *, req)
		__field(u64, handle)
	),
	TP_fast_assign(
		__entry->req = req;
		__entry->handle = handle;
	),
	TP_printk(
		"nbd transport event: request %p, handle 0x%016llx",
		__entry->req,
		__entry->handle
	)
);

DEFINE_EVENT(nbd_transport_event, nbd_header_sent,
	TP_PROTO(struct request *req, u64 handle),
	TP_ARGS(req, handle)
);

DEFINE_EVENT(nbd_transport_event, nbd_payload_sent,
	TP_PROTO(struct request *req, u64 handle),
	TP_ARGS(req, handle)
);

DEFINE_EVENT(nbd_transport_event, nbd_header_received,
	TP_PROTO(struct request *req, u64 handle),
	TP_ARGS(req, handle)
);

DEFINE_EVENT(nbd_transport_event, nbd_payload_received,
	TP_PROTO(struct request *req, u64 handle),
	TP_ARGS(req, handle)
);

DECLARE_EVENT_CLASS(nbd_send_request,
	TP_PROTO(struct nbd_request *nbd_request, int index,
		 struct request *rq),
	TP_ARGS(nbd_request, index, rq),
	TP_STRUCT__entry(
		__field(struct nbd_request *, nbd_request)
		__field(u64, dev_index)
		__field(struct request *, request)
	),
	TP_fast_assign(
		__entry->nbd_request = NULL;
		__entry->dev_index = index;
		__entry->request = rq;
	),
	TP_printk("nbd%lld: request %p", __entry->dev_index, __entry->request)
);

#ifdef DEFINE_EVENT_WRITABLE
#undef NBD_DEFINE_EVENT
#define NBD_DEFINE_EVENT(template, call, proto, args, size)	\
	DEFINE_EVENT_WRITABLE(template, call, PARAMS(proto),	\
			      PARAMS(args), size)
#else
#undef NBD_DEFINE_EVENT
#define NBD_DEFINE_EVENT(template, call, proto, args, size)	\
	DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args))
#endif

NBD_DEFINE_EVENT(nbd_send_request, nbd_send_request,
	TP_PROTO(struct nbd_request *nbd_request, int index,
		 struct request *rq),
	TP_ARGS(nbd_request, index, rq),
	sizeof(struct nbd_request)
);

#endif

/* This part must be outside protection */
#include <trace/define_trace.h>
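/*
 * Editor's illustrative sketch, not part of the kernel sources: each
 * DEFINE_EVENT above generates a trace_<name>() helper once exactly one .c
 * file defines CREATE_TRACE_POINTS before including this header.  The
 * function and variable names below are assumptions for the example; only
 * the trace_nbd_*() helper names follow from the event definitions above.
 */
#define CREATE_TRACE_POINTS
#include <trace/events/nbd.h>

#include <linux/blk-mq.h>

/* Hypothetical transmit path emitting the transport events in order. */
static void demo_emit_nbd_events(struct request *req, u64 handle)
{
	trace_nbd_header_sent(req, handle);	/* header pushed to the socket */
	trace_nbd_payload_sent(req, handle);	/* payload followed the header */
}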
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (C) Qu Wenruo 2017.  All rights reserved.
 */

/*
 * The module is used to catch unexpected/corrupted tree block data.
 * Such behavior can be caused either by a fuzzed image or bugs.
 *
 * The objective is to do leaf/node validation checks when a tree block is
 * read from disk, and to check *every* possible member, so other code won't
 * need to check them again.
 *
 * Due to the potential for unwanted damage, every checker needs to be
 * carefully reviewed so that it does not prevent the mount of valid images.
*/ #include <linux/types.h> #include <linux/stddef.h> #include <linux/error-injection.h> #include "messages.h" #include "ctree.h" #include "tree-checker.h" #include "compression.h" #include "volumes.h" #include "misc.h" #include "fs.h" #include "accessors.h" #include "file-item.h" #include "inode-item.h" #include "dir-item.h" #include "extent-tree.h" /* * Error message should follow the following format: * corrupt <type>: <identifier>, <reason>[, <bad_value>] * * @type: leaf or node * @identifier: the necessary info to locate the leaf/node. * It's recommended to decode key.objecitd/offset if it's * meaningful. * @reason: describe the error * @bad_value: optional, it's recommended to output bad value and its * expected value (range). * * Since comma is used to separate the components, only space is allowed * inside each component. */ /* * Append generic "corrupt leaf/node root=%llu block=%llu slot=%d: " to @fmt. * Allows callers to customize the output. */ __printf(3, 4) __cold static void generic_err(const struct extent_buffer *eb, int slot, const char *fmt, ...) { const struct btrfs_fs_info *fs_info = eb->fs_info; struct va_format vaf; va_list args; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; dump_page(folio_page(eb->folios[0], 0), "eb page dump"); btrfs_crit(fs_info, "corrupt %s: root=%llu block=%llu slot=%d, %pV", btrfs_header_level(eb) == 0 ? "leaf" : "node", btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, &vaf); va_end(args); } /* * Customized reporter for extent data item, since its key objectid and * offset has its own meaning. */ __printf(3, 4) __cold static void file_extent_err(const struct extent_buffer *eb, int slot, const char *fmt, ...) { const struct btrfs_fs_info *fs_info = eb->fs_info; struct btrfs_key key; struct va_format vaf; va_list args; btrfs_item_key_to_cpu(eb, &key, slot); va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; dump_page(folio_page(eb->folios[0], 0), "eb page dump"); btrfs_crit(fs_info, "corrupt %s: root=%llu block=%llu slot=%d ino=%llu file_offset=%llu, %pV", btrfs_header_level(eb) == 0 ? "leaf" : "node", btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, key.objectid, key.offset, &vaf); va_end(args); } /* * Return 0 if the btrfs_file_extent_##name is aligned to @alignment * Else return 1 */ #define CHECK_FE_ALIGNED(leaf, slot, fi, name, alignment) \ ({ \ if (unlikely(!IS_ALIGNED(btrfs_file_extent_##name((leaf), (fi)), \ (alignment)))) \ file_extent_err((leaf), (slot), \ "invalid %s for file extent, have %llu, should be aligned to %u", \ (#name), btrfs_file_extent_##name((leaf), (fi)), \ (alignment)); \ (!IS_ALIGNED(btrfs_file_extent_##name((leaf), (fi)), (alignment))); \ }) static u64 file_extent_end(struct extent_buffer *leaf, struct btrfs_key *key, struct btrfs_file_extent_item *extent) { u64 end; u64 len; if (btrfs_file_extent_type(leaf, extent) == BTRFS_FILE_EXTENT_INLINE) { len = btrfs_file_extent_ram_bytes(leaf, extent); end = ALIGN(key->offset + len, leaf->fs_info->sectorsize); } else { len = btrfs_file_extent_num_bytes(leaf, extent); end = key->offset + len; } return end; } /* * Customized report for dir_item, the only new important information is * key->objectid, which represents inode number */ __printf(3, 4) __cold static void dir_item_err(const struct extent_buffer *eb, int slot, const char *fmt, ...) 
{ const struct btrfs_fs_info *fs_info = eb->fs_info; struct btrfs_key key; struct va_format vaf; va_list args; btrfs_item_key_to_cpu(eb, &key, slot); va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; dump_page(folio_page(eb->folios[0], 0), "eb page dump"); btrfs_crit(fs_info, "corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV", btrfs_header_level(eb) == 0 ? "leaf" : "node", btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, key.objectid, &vaf); va_end(args); } /* * This functions checks prev_key->objectid, to ensure current key and prev_key * share the same objectid as inode number. * * This is to detect missing INODE_ITEM in subvolume trees. * * Return true if everything is OK or we don't need to check. * Return false if anything is wrong. */ static bool check_prev_ino(struct extent_buffer *leaf, struct btrfs_key *key, int slot, struct btrfs_key *prev_key) { /* No prev key, skip check */ if (slot == 0) return true; /* Only these key->types needs to be checked */ ASSERT(key->type == BTRFS_XATTR_ITEM_KEY || key->type == BTRFS_INODE_REF_KEY || key->type == BTRFS_DIR_INDEX_KEY || key->type == BTRFS_DIR_ITEM_KEY || key->type == BTRFS_EXTENT_DATA_KEY); /* * Only subvolume trees along with their reloc trees need this check. * Things like log tree doesn't follow this ino requirement. */ if (!is_fstree(btrfs_header_owner(leaf))) return true; if (key->objectid == prev_key->objectid) return true; /* Error found */ dir_item_err(leaf, slot, "invalid previous key objectid, have %llu expect %llu", prev_key->objectid, key->objectid); return false; } static int check_extent_data_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot, struct btrfs_key *prev_key) { struct btrfs_fs_info *fs_info = leaf->fs_info; struct btrfs_file_extent_item *fi; u32 sectorsize = fs_info->sectorsize; u32 item_size = btrfs_item_size(leaf, slot); u64 extent_end; if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) { file_extent_err(leaf, slot, "unaligned file_offset for file extent, have %llu should be aligned to %u", key->offset, sectorsize); return -EUCLEAN; } /* * Previous key must have the same key->objectid (ino). * It can be XATTR_ITEM, INODE_ITEM or just another EXTENT_DATA. * But if objectids mismatch, it means we have a missing * INODE_ITEM. */ if (unlikely(!check_prev_ino(leaf, key, slot, prev_key))) return -EUCLEAN; fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item); /* * Make sure the item contains at least inline header, so the file * extent type is not some garbage. */ if (unlikely(item_size < BTRFS_FILE_EXTENT_INLINE_DATA_START)) { file_extent_err(leaf, slot, "invalid item size, have %u expect [%zu, %u)", item_size, BTRFS_FILE_EXTENT_INLINE_DATA_START, SZ_4K); return -EUCLEAN; } if (unlikely(btrfs_file_extent_type(leaf, fi) >= BTRFS_NR_FILE_EXTENT_TYPES)) { file_extent_err(leaf, slot, "invalid type for file extent, have %u expect range [0, %u]", btrfs_file_extent_type(leaf, fi), BTRFS_NR_FILE_EXTENT_TYPES - 1); return -EUCLEAN; } /* * Support for new compression/encryption must introduce incompat flag, * and must be caught in open_ctree(). 
*/ if (unlikely(btrfs_file_extent_compression(leaf, fi) >= BTRFS_NR_COMPRESS_TYPES)) { file_extent_err(leaf, slot, "invalid compression for file extent, have %u expect range [0, %u]", btrfs_file_extent_compression(leaf, fi), BTRFS_NR_COMPRESS_TYPES - 1); return -EUCLEAN; } if (unlikely(btrfs_file_extent_encryption(leaf, fi))) { file_extent_err(leaf, slot, "invalid encryption for file extent, have %u expect 0", btrfs_file_extent_encryption(leaf, fi)); return -EUCLEAN; } if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) { /* Inline extent must have 0 as key offset */ if (unlikely(key->offset)) { file_extent_err(leaf, slot, "invalid file_offset for inline file extent, have %llu expect 0", key->offset); return -EUCLEAN; } /* Compressed inline extent has no on-disk size, skip it */ if (btrfs_file_extent_compression(leaf, fi) != BTRFS_COMPRESS_NONE) return 0; /* Uncompressed inline extent size must match item size */ if (unlikely(item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START + btrfs_file_extent_ram_bytes(leaf, fi))) { file_extent_err(leaf, slot, "invalid ram_bytes for uncompressed inline extent, have %u expect %llu", item_size, BTRFS_FILE_EXTENT_INLINE_DATA_START + btrfs_file_extent_ram_bytes(leaf, fi)); return -EUCLEAN; } return 0; } /* Regular or preallocated extent has fixed item size */ if (unlikely(item_size != sizeof(*fi))) { file_extent_err(leaf, slot, "invalid item size for reg/prealloc file extent, have %u expect %zu", item_size, sizeof(*fi)); return -EUCLEAN; } if (unlikely(CHECK_FE_ALIGNED(leaf, slot, fi, ram_bytes, sectorsize) || CHECK_FE_ALIGNED(leaf, slot, fi, disk_bytenr, sectorsize) || CHECK_FE_ALIGNED(leaf, slot, fi, disk_num_bytes, sectorsize) || CHECK_FE_ALIGNED(leaf, slot, fi, offset, sectorsize) || CHECK_FE_ALIGNED(leaf, slot, fi, num_bytes, sectorsize))) return -EUCLEAN; /* Catch extent end overflow */ if (unlikely(check_add_overflow(btrfs_file_extent_num_bytes(leaf, fi), key->offset, &extent_end))) { file_extent_err(leaf, slot, "extent end overflow, have file offset %llu extent num bytes %llu", key->offset, btrfs_file_extent_num_bytes(leaf, fi)); return -EUCLEAN; } /* * Check that no two consecutive file extent items, in the same leaf, * present ranges that overlap each other. */ if (slot > 0 && prev_key->objectid == key->objectid && prev_key->type == BTRFS_EXTENT_DATA_KEY) { struct btrfs_file_extent_item *prev_fi; u64 prev_end; prev_fi = btrfs_item_ptr(leaf, slot - 1, struct btrfs_file_extent_item); prev_end = file_extent_end(leaf, prev_key, prev_fi); if (unlikely(prev_end > key->offset)) { file_extent_err(leaf, slot - 1, "file extent end range (%llu) goes beyond start offset (%llu) of the next file extent", prev_end, key->offset); return -EUCLEAN; } } /* * For non-compressed data extents, ram_bytes should match its * disk_num_bytes. * However we do not really utilize ram_bytes in this case, so this check * is only optional for DEBUG builds for developers to catch the * unexpected behaviors. 
*/ if (IS_ENABLED(CONFIG_BTRFS_DEBUG) && btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE && btrfs_file_extent_disk_bytenr(leaf, fi)) { if (WARN_ON(btrfs_file_extent_ram_bytes(leaf, fi) != btrfs_file_extent_disk_num_bytes(leaf, fi))) file_extent_err(leaf, slot, "mismatch ram_bytes (%llu) and disk_num_bytes (%llu) for non-compressed extent", btrfs_file_extent_ram_bytes(leaf, fi), btrfs_file_extent_disk_num_bytes(leaf, fi)); } return 0; } static int check_csum_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot, struct btrfs_key *prev_key) { struct btrfs_fs_info *fs_info = leaf->fs_info; u32 sectorsize = fs_info->sectorsize; const u32 csumsize = fs_info->csum_size; if (unlikely(key->objectid != BTRFS_EXTENT_CSUM_OBJECTID)) { generic_err(leaf, slot, "invalid key objectid for csum item, have %llu expect %llu", key->objectid, BTRFS_EXTENT_CSUM_OBJECTID); return -EUCLEAN; } if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) { generic_err(leaf, slot, "unaligned key offset for csum item, have %llu should be aligned to %u", key->offset, sectorsize); return -EUCLEAN; } if (unlikely(!IS_ALIGNED(btrfs_item_size(leaf, slot), csumsize))) { generic_err(leaf, slot, "unaligned item size for csum item, have %u should be aligned to %u", btrfs_item_size(leaf, slot), csumsize); return -EUCLEAN; } if (slot > 0 && prev_key->type == BTRFS_EXTENT_CSUM_KEY) { u64 prev_csum_end; u32 prev_item_size; prev_item_size = btrfs_item_size(leaf, slot - 1); prev_csum_end = (prev_item_size / csumsize) * sectorsize; prev_csum_end += prev_key->offset; if (unlikely(prev_csum_end > key->offset)) { generic_err(leaf, slot - 1, "csum end range (%llu) goes beyond the start range (%llu) of the next csum item", prev_csum_end, key->offset); return -EUCLEAN; } } return 0; } /* Inode item error output has the same format as dir_item_err() */ #define inode_item_err(eb, slot, fmt, ...) 
\ dir_item_err(eb, slot, fmt, __VA_ARGS__) static int check_inode_key(struct extent_buffer *leaf, struct btrfs_key *key, int slot) { struct btrfs_key item_key; bool is_inode_item; btrfs_item_key_to_cpu(leaf, &item_key, slot); is_inode_item = (item_key.type == BTRFS_INODE_ITEM_KEY); /* For XATTR_ITEM, location key should be all 0 */ if (item_key.type == BTRFS_XATTR_ITEM_KEY) { if (unlikely(key->objectid != 0 || key->type != 0 || key->offset != 0)) return -EUCLEAN; return 0; } if (unlikely((key->objectid < BTRFS_FIRST_FREE_OBJECTID || key->objectid > BTRFS_LAST_FREE_OBJECTID) && key->objectid != BTRFS_ROOT_TREE_DIR_OBJECTID && key->objectid != BTRFS_FREE_INO_OBJECTID)) { if (is_inode_item) { generic_err(leaf, slot, "invalid key objectid: has %llu expect %llu or [%llu, %llu] or %llu", key->objectid, BTRFS_ROOT_TREE_DIR_OBJECTID, BTRFS_FIRST_FREE_OBJECTID, BTRFS_LAST_FREE_OBJECTID, BTRFS_FREE_INO_OBJECTID); } else { dir_item_err(leaf, slot, "invalid location key objectid: has %llu expect %llu or [%llu, %llu] or %llu", key->objectid, BTRFS_ROOT_TREE_DIR_OBJECTID, BTRFS_FIRST_FREE_OBJECTID, BTRFS_LAST_FREE_OBJECTID, BTRFS_FREE_INO_OBJECTID); } return -EUCLEAN; } if (unlikely(key->offset != 0)) { if (is_inode_item) inode_item_err(leaf, slot, "invalid key offset: has %llu expect 0", key->offset); else dir_item_err(leaf, slot, "invalid location key offset:has %llu expect 0", key->offset); return -EUCLEAN; } return 0; } static int check_root_key(struct extent_buffer *leaf, struct btrfs_key *key, int slot) { struct btrfs_key item_key; bool is_root_item; btrfs_item_key_to_cpu(leaf, &item_key, slot); is_root_item = (item_key.type == BTRFS_ROOT_ITEM_KEY); /* * Bad rootid for reloc trees. * * Reloc trees are only for subvolume trees, other trees only need * to be COWed to be relocated. */ if (unlikely(is_root_item && key->objectid == BTRFS_TREE_RELOC_OBJECTID && !is_fstree(key->offset))) { generic_err(leaf, slot, "invalid reloc tree for root %lld, root id is not a subvolume tree", key->offset); return -EUCLEAN; } /* No such tree id */ if (unlikely(key->objectid == 0)) { if (is_root_item) generic_err(leaf, slot, "invalid root id 0"); else dir_item_err(leaf, slot, "invalid location key root id 0"); return -EUCLEAN; } /* DIR_ITEM/INDEX/INODE_REF is not allowed to point to non-fs trees */ if (unlikely(!is_fstree(key->objectid) && !is_root_item)) { dir_item_err(leaf, slot, "invalid location key objectid, have %llu expect [%llu, %llu]", key->objectid, BTRFS_FIRST_FREE_OBJECTID, BTRFS_LAST_FREE_OBJECTID); return -EUCLEAN; } /* * ROOT_ITEM with non-zero offset means this is a snapshot, created at * @offset transid. * Furthermore, for location key in DIR_ITEM, its offset is always -1. * * So here we only check offset for reloc tree whose key->offset must * be a valid tree. 
*/ if (unlikely(key->objectid == BTRFS_TREE_RELOC_OBJECTID && key->offset == 0)) { generic_err(leaf, slot, "invalid root id 0 for reloc tree"); return -EUCLEAN; } return 0; } static int check_dir_item(struct extent_buffer *leaf, struct btrfs_key *key, struct btrfs_key *prev_key, int slot) { struct btrfs_fs_info *fs_info = leaf->fs_info; struct btrfs_dir_item *di; u32 item_size = btrfs_item_size(leaf, slot); u32 cur = 0; if (unlikely(!check_prev_ino(leaf, key, slot, prev_key))) return -EUCLEAN; di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item); while (cur < item_size) { struct btrfs_key location_key; u32 name_len; u32 data_len; u32 max_name_len; u32 total_size; u32 name_hash; u8 dir_type; int ret; /* header itself should not cross item boundary */ if (unlikely(cur + sizeof(*di) > item_size)) { dir_item_err(leaf, slot, "dir item header crosses item boundary, have %zu boundary %u", cur + sizeof(*di), item_size); return -EUCLEAN; } /* Location key check */ btrfs_dir_item_key_to_cpu(leaf, di, &location_key); if (location_key.type == BTRFS_ROOT_ITEM_KEY) { ret = check_root_key(leaf, &location_key, slot); if (unlikely(ret < 0)) return ret; } else if (location_key.type == BTRFS_INODE_ITEM_KEY || location_key.type == 0) { ret = check_inode_key(leaf, &location_key, slot); if (unlikely(ret < 0)) return ret; } else { dir_item_err(leaf, slot, "invalid location key type, have %u, expect %u or %u", location_key.type, BTRFS_ROOT_ITEM_KEY, BTRFS_INODE_ITEM_KEY); return -EUCLEAN; } /* dir type check */ dir_type = btrfs_dir_ftype(leaf, di); if (unlikely(dir_type >= BTRFS_FT_MAX)) { dir_item_err(leaf, slot, "invalid dir item type, have %u expect [0, %u)", dir_type, BTRFS_FT_MAX); return -EUCLEAN; } if (unlikely(key->type == BTRFS_XATTR_ITEM_KEY && dir_type != BTRFS_FT_XATTR)) { dir_item_err(leaf, slot, "invalid dir item type for XATTR key, have %u expect %u", dir_type, BTRFS_FT_XATTR); return -EUCLEAN; } if (unlikely(dir_type == BTRFS_FT_XATTR && key->type != BTRFS_XATTR_ITEM_KEY)) { dir_item_err(leaf, slot, "xattr dir type found for non-XATTR key"); return -EUCLEAN; } if (dir_type == BTRFS_FT_XATTR) max_name_len = XATTR_NAME_MAX; else max_name_len = BTRFS_NAME_LEN; /* Name/data length check */ name_len = btrfs_dir_name_len(leaf, di); data_len = btrfs_dir_data_len(leaf, di); if (unlikely(name_len > max_name_len)) { dir_item_err(leaf, slot, "dir item name len too long, have %u max %u", name_len, max_name_len); return -EUCLEAN; } if (unlikely(name_len + data_len > BTRFS_MAX_XATTR_SIZE(fs_info))) { dir_item_err(leaf, slot, "dir item name and data len too long, have %u max %u", name_len + data_len, BTRFS_MAX_XATTR_SIZE(fs_info)); return -EUCLEAN; } if (unlikely(data_len && dir_type != BTRFS_FT_XATTR)) { dir_item_err(leaf, slot, "dir item with invalid data len, have %u expect 0", data_len); return -EUCLEAN; } total_size = sizeof(*di) + name_len + data_len; /* header and name/data should not cross item boundary */ if (unlikely(cur + total_size > item_size)) { dir_item_err(leaf, slot, "dir item data crosses item boundary, have %u boundary %u", cur + total_size, item_size); return -EUCLEAN; } /* * Special check for XATTR/DIR_ITEM, as key->offset is name * hash, should match its name */ if (key->type == BTRFS_DIR_ITEM_KEY || key->type == BTRFS_XATTR_ITEM_KEY) { char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)]; read_extent_buffer(leaf, namebuf, (unsigned long)(di + 1), name_len); name_hash = btrfs_name_hash(namebuf, name_len); if (unlikely(key->offset != name_hash)) { dir_item_err(leaf, slot, "name hash mismatch 
with key, have 0x%016x expect 0x%016llx", name_hash, key->offset); return -EUCLEAN; } } cur += total_size; di = (struct btrfs_dir_item *)((void *)di + total_size); } return 0; } __printf(3, 4) __cold static void block_group_err(const struct extent_buffer *eb, int slot, const char *fmt, ...) { const struct btrfs_fs_info *fs_info = eb->fs_info; struct btrfs_key key; struct va_format vaf; va_list args; btrfs_item_key_to_cpu(eb, &key, slot); va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; dump_page(folio_page(eb->folios[0], 0), "eb page dump"); btrfs_crit(fs_info, "corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu bg_len=%llu, %pV", btrfs_header_level(eb) == 0 ? "leaf" : "node", btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, key.objectid, key.offset, &vaf); va_end(args); } static int check_block_group_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot) { struct btrfs_fs_info *fs_info = leaf->fs_info; struct btrfs_block_group_item bgi; u32 item_size = btrfs_item_size(leaf, slot); u64 chunk_objectid; u64 flags; u64 type; /* * Here we don't really care about alignment since extent allocator can * handle it. We care more about the size. */ if (unlikely(key->offset == 0)) { block_group_err(leaf, slot, "invalid block group size 0"); return -EUCLEAN; } if (unlikely(item_size != sizeof(bgi))) { block_group_err(leaf, slot, "invalid item size, have %u expect %zu", item_size, sizeof(bgi)); return -EUCLEAN; } read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot), sizeof(bgi)); chunk_objectid = btrfs_stack_block_group_chunk_objectid(&bgi); if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) { /* * We don't init the nr_global_roots until we load the global * roots, so this could be 0 at mount time. If it's 0 we'll * just assume we're fine, and later we'll check against our * actual value. 
*/ if (unlikely(fs_info->nr_global_roots && chunk_objectid >= fs_info->nr_global_roots)) { block_group_err(leaf, slot, "invalid block group global root id, have %llu, needs to be <= %llu", chunk_objectid, fs_info->nr_global_roots); return -EUCLEAN; } } else if (unlikely(chunk_objectid != BTRFS_FIRST_CHUNK_TREE_OBJECTID)) { block_group_err(leaf, slot, "invalid block group chunk objectid, have %llu expect %llu", btrfs_stack_block_group_chunk_objectid(&bgi), BTRFS_FIRST_CHUNK_TREE_OBJECTID); return -EUCLEAN; } if (unlikely(btrfs_stack_block_group_used(&bgi) > key->offset)) { block_group_err(leaf, slot, "invalid block group used, have %llu expect [0, %llu)", btrfs_stack_block_group_used(&bgi), key->offset); return -EUCLEAN; } flags = btrfs_stack_block_group_flags(&bgi); if (unlikely(hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) > 1)) { block_group_err(leaf, slot, "invalid profile flags, have 0x%llx (%lu bits set) expect no more than 1 bit set", flags & BTRFS_BLOCK_GROUP_PROFILE_MASK, hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK)); return -EUCLEAN; } type = flags & BTRFS_BLOCK_GROUP_TYPE_MASK; if (unlikely(type != BTRFS_BLOCK_GROUP_DATA && type != BTRFS_BLOCK_GROUP_METADATA && type != BTRFS_BLOCK_GROUP_SYSTEM && type != (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA))) { block_group_err(leaf, slot, "invalid type, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llx or 0x%llx", type, hweight64(type), BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_METADATA, BTRFS_BLOCK_GROUP_SYSTEM, BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA); return -EUCLEAN; } return 0; } __printf(4, 5) __cold static void chunk_err(const struct extent_buffer *leaf, const struct btrfs_chunk *chunk, u64 logical, const char *fmt, ...) { const struct btrfs_fs_info *fs_info = leaf->fs_info; bool is_sb; struct va_format vaf; va_list args; int i; int slot = -1; /* Only superblock eb is able to have such small offset */ is_sb = (leaf->start == BTRFS_SUPER_INFO_OFFSET); if (!is_sb) { /* * Get the slot number by iterating through all slots, this * would provide better readability. */ for (i = 0; i < btrfs_header_nritems(leaf); i++) { if (btrfs_item_ptr_offset(leaf, i) == (unsigned long)chunk) { slot = i; break; } } } va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; if (is_sb) btrfs_crit(fs_info, "corrupt superblock syschunk array: chunk_start=%llu, %pV", logical, &vaf); else btrfs_crit(fs_info, "corrupt leaf: root=%llu block=%llu slot=%d chunk_start=%llu, %pV", BTRFS_CHUNK_TREE_OBJECTID, leaf->start, slot, logical, &vaf); va_end(args); } /* * The common chunk check which could also work on super block sys chunk array. * * Return -EUCLEAN if anything is corrupted. * Return 0 if everything is OK. 
*/ int btrfs_check_chunk_valid(struct extent_buffer *leaf, struct btrfs_chunk *chunk, u64 logical) { struct btrfs_fs_info *fs_info = leaf->fs_info; u64 length; u64 chunk_end; u64 stripe_len; u16 num_stripes; u16 sub_stripes; u64 type; u64 features; bool mixed = false; int raid_index; int nparity; int ncopies; length = btrfs_chunk_length(leaf, chunk); stripe_len = btrfs_chunk_stripe_len(leaf, chunk); num_stripes = btrfs_chunk_num_stripes(leaf, chunk); sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk); type = btrfs_chunk_type(leaf, chunk); raid_index = btrfs_bg_flags_to_raid_index(type); ncopies = btrfs_raid_array[raid_index].ncopies; nparity = btrfs_raid_array[raid_index].nparity; if (unlikely(!num_stripes)) { chunk_err(leaf, chunk, logical, "invalid chunk num_stripes, have %u", num_stripes); return -EUCLEAN; } if (unlikely(num_stripes < ncopies)) { chunk_err(leaf, chunk, logical, "invalid chunk num_stripes < ncopies, have %u < %d", num_stripes, ncopies); return -EUCLEAN; } if (unlikely(nparity && num_stripes == nparity)) { chunk_err(leaf, chunk, logical, "invalid chunk num_stripes == nparity, have %u == %d", num_stripes, nparity); return -EUCLEAN; } if (unlikely(!IS_ALIGNED(logical, fs_info->sectorsize))) { chunk_err(leaf, chunk, logical, "invalid chunk logical, have %llu should aligned to %u", logical, fs_info->sectorsize); return -EUCLEAN; } if (unlikely(btrfs_chunk_sector_size(leaf, chunk) != fs_info->sectorsize)) { chunk_err(leaf, chunk, logical, "invalid chunk sectorsize, have %u expect %u", btrfs_chunk_sector_size(leaf, chunk), fs_info->sectorsize); return -EUCLEAN; } if (unlikely(!length || !IS_ALIGNED(length, fs_info->sectorsize))) { chunk_err(leaf, chunk, logical, "invalid chunk length, have %llu", length); return -EUCLEAN; } if (unlikely(check_add_overflow(logical, length, &chunk_end))) { chunk_err(leaf, chunk, logical, "invalid chunk logical start and length, have logical start %llu length %llu", logical, length); return -EUCLEAN; } if (unlikely(!is_power_of_2(stripe_len) || stripe_len != BTRFS_STRIPE_LEN)) { chunk_err(leaf, chunk, logical, "invalid chunk stripe length: %llu", stripe_len); return -EUCLEAN; } /* * We artificially limit the chunk size, so that the number of stripes * inside a chunk can be fit into a U32. The current limit (256G) is * way too large for real world usage anyway, and it's also much larger * than our existing limit (10G). * * Thus it should be a good way to catch obvious bitflips. 
*/ if (unlikely(length >= btrfs_stripe_nr_to_offset(U32_MAX))) { chunk_err(leaf, chunk, logical, "chunk length too large: have %llu limit %llu", length, btrfs_stripe_nr_to_offset(U32_MAX)); return -EUCLEAN; } if (unlikely(type & ~(BTRFS_BLOCK_GROUP_TYPE_MASK | BTRFS_BLOCK_GROUP_PROFILE_MASK))) { chunk_err(leaf, chunk, logical, "unrecognized chunk type: 0x%llx", ~(BTRFS_BLOCK_GROUP_TYPE_MASK | BTRFS_BLOCK_GROUP_PROFILE_MASK) & btrfs_chunk_type(leaf, chunk)); return -EUCLEAN; } if (unlikely(!has_single_bit_set(type & BTRFS_BLOCK_GROUP_PROFILE_MASK) && (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) != 0)) { chunk_err(leaf, chunk, logical, "invalid chunk profile flag: 0x%llx, expect 0 or 1 bit set", type & BTRFS_BLOCK_GROUP_PROFILE_MASK); return -EUCLEAN; } if (unlikely((type & BTRFS_BLOCK_GROUP_TYPE_MASK) == 0)) { chunk_err(leaf, chunk, logical, "missing chunk type flag, have 0x%llx one bit must be set in 0x%llx", type, BTRFS_BLOCK_GROUP_TYPE_MASK); return -EUCLEAN; } if (unlikely((type & BTRFS_BLOCK_GROUP_SYSTEM) && (type & (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA)))) { chunk_err(leaf, chunk, logical, "system chunk with data or metadata type: 0x%llx", type); return -EUCLEAN; } features = btrfs_super_incompat_flags(fs_info->super_copy); if (features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) mixed = true; if (!mixed) { if (unlikely((type & BTRFS_BLOCK_GROUP_METADATA) && (type & BTRFS_BLOCK_GROUP_DATA))) { chunk_err(leaf, chunk, logical, "mixed chunk type in non-mixed mode: 0x%llx", type); return -EUCLEAN; } } if (unlikely((type & BTRFS_BLOCK_GROUP_RAID10 && sub_stripes != btrfs_raid_array[BTRFS_RAID_RAID10].sub_stripes) || (type & BTRFS_BLOCK_GROUP_RAID1 && num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1].devs_min) || (type & BTRFS_BLOCK_GROUP_RAID1C3 && num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1C3].devs_min) || (type & BTRFS_BLOCK_GROUP_RAID1C4 && num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1C4].devs_min) || (type & BTRFS_BLOCK_GROUP_RAID5 && num_stripes < btrfs_raid_array[BTRFS_RAID_RAID5].devs_min) || (type & BTRFS_BLOCK_GROUP_RAID6 && num_stripes < btrfs_raid_array[BTRFS_RAID_RAID6].devs_min) || (type & BTRFS_BLOCK_GROUP_DUP && num_stripes != btrfs_raid_array[BTRFS_RAID_DUP].dev_stripes) || ((type & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0 && num_stripes != btrfs_raid_array[BTRFS_RAID_SINGLE].dev_stripes))) { chunk_err(leaf, chunk, logical, "invalid num_stripes:sub_stripes %u:%u for profile %llu", num_stripes, sub_stripes, type & BTRFS_BLOCK_GROUP_PROFILE_MASK); return -EUCLEAN; } return 0; } /* * Enhanced version of chunk item checker. * * The common btrfs_check_chunk_valid() doesn't check item size since it needs * to work on super block sys_chunk_array which doesn't have full item ptr. 
*/ static int check_leaf_chunk_item(struct extent_buffer *leaf, struct btrfs_chunk *chunk, struct btrfs_key *key, int slot) { int num_stripes; if (unlikely(btrfs_item_size(leaf, slot) < sizeof(struct btrfs_chunk))) { chunk_err(leaf, chunk, key->offset, "invalid chunk item size: have %u expect [%zu, %u)", btrfs_item_size(leaf, slot), sizeof(struct btrfs_chunk), BTRFS_LEAF_DATA_SIZE(leaf->fs_info)); return -EUCLEAN; } num_stripes = btrfs_chunk_num_stripes(leaf, chunk); /* Let btrfs_check_chunk_valid() handle this error type */ if (num_stripes == 0) goto out; if (unlikely(btrfs_chunk_item_size(num_stripes) != btrfs_item_size(leaf, slot))) { chunk_err(leaf, chunk, key->offset, "invalid chunk item size: have %u expect %lu", btrfs_item_size(leaf, slot), btrfs_chunk_item_size(num_stripes)); return -EUCLEAN; } out: return btrfs_check_chunk_valid(leaf, chunk, key->offset); } __printf(3, 4) __cold static void dev_item_err(const struct extent_buffer *eb, int slot, const char *fmt, ...) { struct btrfs_key key; struct va_format vaf; va_list args; btrfs_item_key_to_cpu(eb, &key, slot); va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; dump_page(folio_page(eb->folios[0], 0), "eb page dump"); btrfs_crit(eb->fs_info, "corrupt %s: root=%llu block=%llu slot=%d devid=%llu %pV", btrfs_header_level(eb) == 0 ? "leaf" : "node", btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, key.objectid, &vaf); va_end(args); } static int check_dev_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot) { struct btrfs_dev_item *ditem; const u32 item_size = btrfs_item_size(leaf, slot); if (unlikely(key->objectid != BTRFS_DEV_ITEMS_OBJECTID)) { dev_item_err(leaf, slot, "invalid objectid: has=%llu expect=%llu", key->objectid, BTRFS_DEV_ITEMS_OBJECTID); return -EUCLEAN; } if (unlikely(item_size != sizeof(*ditem))) { dev_item_err(leaf, slot, "invalid item size: has %u expect %zu", item_size, sizeof(*ditem)); return -EUCLEAN; } ditem = btrfs_item_ptr(leaf, slot, struct btrfs_dev_item); if (unlikely(btrfs_device_id(leaf, ditem) != key->offset)) { dev_item_err(leaf, slot, "devid mismatch: key has=%llu item has=%llu", key->offset, btrfs_device_id(leaf, ditem)); return -EUCLEAN; } /* * For device total_bytes, we don't have reliable way to check it, as * it can be 0 for device removal. Device size check can only be done * by dev extents check. */ if (unlikely(btrfs_device_bytes_used(leaf, ditem) > btrfs_device_total_bytes(leaf, ditem))) { dev_item_err(leaf, slot, "invalid bytes used: have %llu expect [0, %llu]", btrfs_device_bytes_used(leaf, ditem), btrfs_device_total_bytes(leaf, ditem)); return -EUCLEAN; } /* * Remaining members like io_align/type/gen/dev_group aren't really * utilized. Skip them to make later usage of them easier. 
*/ return 0; } static int check_inode_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot) { struct btrfs_fs_info *fs_info = leaf->fs_info; struct btrfs_inode_item *iitem; u64 super_gen = btrfs_super_generation(fs_info->super_copy); u32 valid_mask = (S_IFMT | S_ISUID | S_ISGID | S_ISVTX | 0777); const u32 item_size = btrfs_item_size(leaf, slot); u32 mode; int ret; u32 flags; u32 ro_flags; ret = check_inode_key(leaf, key, slot); if (unlikely(ret < 0)) return ret; if (unlikely(item_size != sizeof(*iitem))) { generic_err(leaf, slot, "invalid item size: has %u expect %zu", item_size, sizeof(*iitem)); return -EUCLEAN; } iitem = btrfs_item_ptr(leaf, slot, struct btrfs_inode_item); /* Here we use super block generation + 1 to handle log tree */ if (unlikely(btrfs_inode_generation(leaf, iitem) > super_gen + 1)) { inode_item_err(leaf, slot, "invalid inode generation: has %llu expect (0, %llu]", btrfs_inode_generation(leaf, iitem), super_gen + 1); return -EUCLEAN; } /* Note for ROOT_TREE_DIR_ITEM, mkfs could set its transid 0 */ if (unlikely(btrfs_inode_transid(leaf, iitem) > super_gen + 1)) { inode_item_err(leaf, slot, "invalid inode transid: has %llu expect [0, %llu]", btrfs_inode_transid(leaf, iitem), super_gen + 1); return -EUCLEAN; } /* * For size and nbytes it's better not to be too strict, as for dir * item its size/nbytes can easily get wrong, but doesn't affect * anything in the fs. So here we skip the check. */ mode = btrfs_inode_mode(leaf, iitem); if (unlikely(mode & ~valid_mask)) { inode_item_err(leaf, slot, "unknown mode bit detected: 0x%x", mode & ~valid_mask); return -EUCLEAN; } /* * S_IFMT is not bit mapped so we can't completely rely on * is_power_of_2/has_single_bit_set, but it can save us from checking * FIFO/CHR/DIR/REG. Only needs to check BLK, LNK and SOCKS */ if (!has_single_bit_set(mode & S_IFMT)) { if (unlikely(!S_ISLNK(mode) && !S_ISBLK(mode) && !S_ISSOCK(mode))) { inode_item_err(leaf, slot, "invalid mode: has 0%o expect valid S_IF* bit(s)", mode & S_IFMT); return -EUCLEAN; } } if (unlikely(S_ISDIR(mode) && btrfs_inode_nlink(leaf, iitem) > 1)) { inode_item_err(leaf, slot, "invalid nlink: has %u expect no more than 1 for dir", btrfs_inode_nlink(leaf, iitem)); return -EUCLEAN; } btrfs_inode_split_flags(btrfs_inode_flags(leaf, iitem), &flags, &ro_flags); if (unlikely(flags & ~BTRFS_INODE_FLAG_MASK)) { inode_item_err(leaf, slot, "unknown incompat flags detected: 0x%x", flags); return -EUCLEAN; } if (unlikely(!sb_rdonly(fs_info->sb) && (ro_flags & ~BTRFS_INODE_RO_FLAG_MASK))) { inode_item_err(leaf, slot, "unknown ro-compat flags detected on writeable mount: 0x%x", ro_flags); return -EUCLEAN; } return 0; } static int check_root_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot) { struct btrfs_fs_info *fs_info = leaf->fs_info; struct btrfs_root_item ri = { 0 }; const u64 valid_root_flags = BTRFS_ROOT_SUBVOL_RDONLY | BTRFS_ROOT_SUBVOL_DEAD; int ret; ret = check_root_key(leaf, key, slot); if (unlikely(ret < 0)) return ret; if (unlikely(btrfs_item_size(leaf, slot) != sizeof(ri) && btrfs_item_size(leaf, slot) != btrfs_legacy_root_item_size())) { generic_err(leaf, slot, "invalid root item size, have %u expect %zu or %u", btrfs_item_size(leaf, slot), sizeof(ri), btrfs_legacy_root_item_size()); return -EUCLEAN; } /* * For legacy root item, the members starting at generation_v2 will be * all filled with 0. * And since we allow geneartion_v2 as 0, it will still pass the check. 
*/ read_extent_buffer(leaf, &ri, btrfs_item_ptr_offset(leaf, slot), btrfs_item_size(leaf, slot)); /* Generation related */ if (unlikely(btrfs_root_generation(&ri) > btrfs_super_generation(fs_info->super_copy) + 1)) { generic_err(leaf, slot, "invalid root generation, have %llu expect (0, %llu]", btrfs_root_generation(&ri), btrfs_super_generation(fs_info->super_copy) + 1); return -EUCLEAN; } if (unlikely(btrfs_root_generation_v2(&ri) > btrfs_super_generation(fs_info->super_copy) + 1)) { generic_err(leaf, slot, "invalid root v2 generation, have %llu expect (0, %llu]", btrfs_root_generation_v2(&ri), btrfs_super_generation(fs_info->super_copy) + 1); return -EUCLEAN; } if (unlikely(btrfs_root_last_snapshot(&ri) > btrfs_super_generation(fs_info->super_copy) + 1)) { generic_err(leaf, slot, "invalid root last_snapshot, have %llu expect (0, %llu]", btrfs_root_last_snapshot(&ri), btrfs_super_generation(fs_info->super_copy) + 1); return -EUCLEAN; } /* Alignment and level check */ if (unlikely(!IS_ALIGNED(btrfs_root_bytenr(&ri), fs_info->sectorsize))) { generic_err(leaf, slot, "invalid root bytenr, have %llu expect to be aligned to %u", btrfs_root_bytenr(&ri), fs_info->sectorsize); return -EUCLEAN; } if (unlikely(btrfs_root_level(&ri) >= BTRFS_MAX_LEVEL)) { generic_err(leaf, slot, "invalid root level, have %u expect [0, %u]", btrfs_root_level(&ri), BTRFS_MAX_LEVEL - 1); return -EUCLEAN; } if (unlikely(btrfs_root_drop_level(&ri) >= BTRFS_MAX_LEVEL)) { generic_err(leaf, slot, "invalid root level, have %u expect [0, %u]", btrfs_root_drop_level(&ri), BTRFS_MAX_LEVEL - 1); return -EUCLEAN; } /* Flags check */ if (unlikely(btrfs_root_flags(&ri) & ~valid_root_flags)) { generic_err(leaf, slot, "invalid root flags, have 0x%llx expect mask 0x%llx", btrfs_root_flags(&ri), valid_root_flags); return -EUCLEAN; } return 0; } __printf(3,4) __cold static void extent_err(const struct extent_buffer *eb, int slot, const char *fmt, ...) { struct btrfs_key key; struct va_format vaf; va_list args; u64 bytenr; u64 len; btrfs_item_key_to_cpu(eb, &key, slot); bytenr = key.objectid; if (key.type == BTRFS_METADATA_ITEM_KEY || key.type == BTRFS_TREE_BLOCK_REF_KEY || key.type == BTRFS_SHARED_BLOCK_REF_KEY) len = eb->fs_info->nodesize; else len = key.offset; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; dump_page(folio_page(eb->folios[0], 0), "eb page dump"); btrfs_crit(eb->fs_info, "corrupt %s: block=%llu slot=%d extent bytenr=%llu len=%llu %pV", btrfs_header_level(eb) == 0 ? 
"leaf" : "node", eb->start, slot, bytenr, len, &vaf); va_end(args); } static int check_extent_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot, struct btrfs_key *prev_key) { struct btrfs_fs_info *fs_info = leaf->fs_info; struct btrfs_extent_item *ei; bool is_tree_block = false; unsigned long ptr; /* Current pointer inside inline refs */ unsigned long end; /* Extent item end */ const u32 item_size = btrfs_item_size(leaf, slot); u8 last_type = 0; u64 last_seq = U64_MAX; u64 flags; u64 generation; u64 total_refs; /* Total refs in btrfs_extent_item */ u64 inline_refs = 0; /* found total inline refs */ if (unlikely(key->type == BTRFS_METADATA_ITEM_KEY && !btrfs_fs_incompat(fs_info, SKINNY_METADATA))) { generic_err(leaf, slot, "invalid key type, METADATA_ITEM type invalid when SKINNY_METADATA feature disabled"); return -EUCLEAN; } /* key->objectid is the bytenr for both key types */ if (unlikely(!IS_ALIGNED(key->objectid, fs_info->sectorsize))) { generic_err(leaf, slot, "invalid key objectid, have %llu expect to be aligned to %u", key->objectid, fs_info->sectorsize); return -EUCLEAN; } /* key->offset is tree level for METADATA_ITEM_KEY */ if (unlikely(key->type == BTRFS_METADATA_ITEM_KEY && key->offset >= BTRFS_MAX_LEVEL)) { extent_err(leaf, slot, "invalid tree level, have %llu expect [0, %u]", key->offset, BTRFS_MAX_LEVEL - 1); return -EUCLEAN; } /* * EXTENT/METADATA_ITEM consists of: * 1) One btrfs_extent_item * Records the total refs, type and generation of the extent. * * 2) One btrfs_tree_block_info (for EXTENT_ITEM and tree backref only) * Records the first key and level of the tree block. * * 2) Zero or more btrfs_extent_inline_ref(s) * Each inline ref has one btrfs_extent_inline_ref shows: * 2.1) The ref type, one of the 4 * TREE_BLOCK_REF Tree block only * SHARED_BLOCK_REF Tree block only * EXTENT_DATA_REF Data only * SHARED_DATA_REF Data only * 2.2) Ref type specific data * Either using btrfs_extent_inline_ref::offset, or specific * data structure. * * All above inline items should follow the order: * * - All btrfs_extent_inline_ref::type should be in an ascending * order * * - Within the same type, the items should follow a descending * order by their sequence number. 
The sequence number is * determined by: * * btrfs_extent_inline_ref::offset for all types other than * EXTENT_DATA_REF * * hash_extent_data_ref() for EXTENT_DATA_REF */ if (unlikely(item_size < sizeof(*ei))) { extent_err(leaf, slot, "invalid item size, have %u expect [%zu, %u)", item_size, sizeof(*ei), BTRFS_LEAF_DATA_SIZE(fs_info)); return -EUCLEAN; } end = item_size + btrfs_item_ptr_offset(leaf, slot); /* Checks against extent_item */ ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item); flags = btrfs_extent_flags(leaf, ei); total_refs = btrfs_extent_refs(leaf, ei); generation = btrfs_extent_generation(leaf, ei); if (unlikely(generation > btrfs_super_generation(fs_info->super_copy) + 1)) { extent_err(leaf, slot, "invalid generation, have %llu expect (0, %llu]", generation, btrfs_super_generation(fs_info->super_copy) + 1); return -EUCLEAN; } if (unlikely(!has_single_bit_set(flags & (BTRFS_EXTENT_FLAG_DATA | BTRFS_EXTENT_FLAG_TREE_BLOCK)))) { extent_err(leaf, slot, "invalid extent flag, have 0x%llx expect 1 bit set in 0x%llx", flags, BTRFS_EXTENT_FLAG_DATA | BTRFS_EXTENT_FLAG_TREE_BLOCK); return -EUCLEAN; } is_tree_block = !!(flags & BTRFS_EXTENT_FLAG_TREE_BLOCK); if (is_tree_block) { if (unlikely(key->type == BTRFS_EXTENT_ITEM_KEY && key->offset != fs_info->nodesize)) { extent_err(leaf, slot, "invalid extent length, have %llu expect %u", key->offset, fs_info->nodesize); return -EUCLEAN; } } else { if (unlikely(key->type != BTRFS_EXTENT_ITEM_KEY)) { extent_err(leaf, slot, "invalid key type, have %u expect %u for data backref", key->type, BTRFS_EXTENT_ITEM_KEY); return -EUCLEAN; } if (unlikely(!IS_ALIGNED(key->offset, fs_info->sectorsize))) { extent_err(leaf, slot, "invalid extent length, have %llu expect aligned to %u", key->offset, fs_info->sectorsize); return -EUCLEAN; } if (unlikely(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)) { extent_err(leaf, slot, "invalid extent flag, data has full backref set"); return -EUCLEAN; } } ptr = (unsigned long)(struct btrfs_extent_item *)(ei + 1); /* Check the special case of btrfs_tree_block_info */ if (is_tree_block && key->type != BTRFS_METADATA_ITEM_KEY) { struct btrfs_tree_block_info *info; info = (struct btrfs_tree_block_info *)ptr; if (unlikely(btrfs_tree_block_level(leaf, info) >= BTRFS_MAX_LEVEL)) { extent_err(leaf, slot, "invalid tree block info level, have %u expect [0, %u]", btrfs_tree_block_level(leaf, info), BTRFS_MAX_LEVEL - 1); return -EUCLEAN; } ptr = (unsigned long)(struct btrfs_tree_block_info *)(info + 1); } /* Check inline refs */ while (ptr < end) { struct btrfs_extent_inline_ref *iref; struct btrfs_extent_data_ref *dref; struct btrfs_shared_data_ref *sref; u64 seq; u64 dref_offset; u64 inline_offset; u8 inline_type; if (unlikely(ptr + sizeof(*iref) > end)) { extent_err(leaf, slot, "inline ref item overflows extent item, ptr %lu iref size %zu end %lu", ptr, sizeof(*iref), end); return -EUCLEAN; } iref = (struct btrfs_extent_inline_ref *)ptr; inline_type = btrfs_extent_inline_ref_type(leaf, iref); inline_offset = btrfs_extent_inline_ref_offset(leaf, iref); seq = inline_offset; if (unlikely(ptr + btrfs_extent_inline_ref_size(inline_type) > end)) { extent_err(leaf, slot, "inline ref item overflows extent item, ptr %lu iref size %u end %lu", ptr, btrfs_extent_inline_ref_size(inline_type), end); return -EUCLEAN; } switch (inline_type) { /* inline_offset is subvolid of the owner, no need to check */ case BTRFS_TREE_BLOCK_REF_KEY: inline_refs++; break; /* Contains parent bytenr */ case BTRFS_SHARED_BLOCK_REF_KEY: if 
(unlikely(!IS_ALIGNED(inline_offset, fs_info->sectorsize))) { extent_err(leaf, slot, "invalid tree parent bytenr, have %llu expect aligned to %u", inline_offset, fs_info->sectorsize); return -EUCLEAN; } inline_refs++; break; /* * Contains owner subvolid, owner key objectid, adjusted offset. * The only obvious corruption can happen in that offset. */ case BTRFS_EXTENT_DATA_REF_KEY: dref = (struct btrfs_extent_data_ref *)(&iref->offset); dref_offset = btrfs_extent_data_ref_offset(leaf, dref); seq = hash_extent_data_ref( btrfs_extent_data_ref_root(leaf, dref), btrfs_extent_data_ref_objectid(leaf, dref), btrfs_extent_data_ref_offset(leaf, dref)); if (unlikely(!IS_ALIGNED(dref_offset, fs_info->sectorsize))) { extent_err(leaf, slot, "invalid data ref offset, have %llu expect aligned to %u", dref_offset, fs_info->sectorsize); return -EUCLEAN; } inline_refs += btrfs_extent_data_ref_count(leaf, dref); break; /* Contains parent bytenr and ref count */ case BTRFS_SHARED_DATA_REF_KEY: sref = (struct btrfs_shared_data_ref *)(iref + 1); if (unlikely(!IS_ALIGNED(inline_offset, fs_info->sectorsize))) { extent_err(leaf, slot, "invalid data parent bytenr, have %llu expect aligned to %u", inline_offset, fs_info->sectorsize); return -EUCLEAN; } inline_refs += btrfs_shared_data_ref_count(leaf, sref); break; case BTRFS_EXTENT_OWNER_REF_KEY: WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); break; default: extent_err(leaf, slot, "unknown inline ref type: %u", inline_type); return -EUCLEAN; } if (inline_type < last_type) { extent_err(leaf, slot, "inline ref out-of-order: has type %u, prev type %u", inline_type, last_type); return -EUCLEAN; } /* Type changed, allow the sequence starts from U64_MAX again. */ if (inline_type > last_type) last_seq = U64_MAX; if (seq > last_seq) { extent_err(leaf, slot, "inline ref out-of-order: has type %u offset %llu seq 0x%llx, prev type %u seq 0x%llx", inline_type, inline_offset, seq, last_type, last_seq); return -EUCLEAN; } last_type = inline_type; last_seq = seq; ptr += btrfs_extent_inline_ref_size(inline_type); } /* No padding is allowed */ if (unlikely(ptr != end)) { extent_err(leaf, slot, "invalid extent item size, padding bytes found"); return -EUCLEAN; } /* Finally, check the inline refs against total refs */ if (unlikely(inline_refs > total_refs)) { extent_err(leaf, slot, "invalid extent refs, have %llu expect >= inline %llu", total_refs, inline_refs); return -EUCLEAN; } if ((prev_key->type == BTRFS_EXTENT_ITEM_KEY) || (prev_key->type == BTRFS_METADATA_ITEM_KEY)) { u64 prev_end = prev_key->objectid; if (prev_key->type == BTRFS_METADATA_ITEM_KEY) prev_end += fs_info->nodesize; else prev_end += prev_key->offset; if (unlikely(prev_end > key->objectid)) { extent_err(leaf, slot, "previous extent [%llu %u %llu] overlaps current extent [%llu %u %llu]", prev_key->objectid, prev_key->type, prev_key->offset, key->objectid, key->type, key->offset); return -EUCLEAN; } } return 0; } static int check_simple_keyed_refs(struct extent_buffer *leaf, struct btrfs_key *key, int slot) { u32 expect_item_size = 0; if (key->type == BTRFS_SHARED_DATA_REF_KEY) expect_item_size = sizeof(struct btrfs_shared_data_ref); if (unlikely(btrfs_item_size(leaf, slot) != expect_item_size)) { generic_err(leaf, slot, "invalid item size, have %u expect %u for key type %u", btrfs_item_size(leaf, slot), expect_item_size, key->type); return -EUCLEAN; } if (unlikely(!IS_ALIGNED(key->objectid, leaf->fs_info->sectorsize))) { generic_err(leaf, slot, "invalid key objectid for shared block ref, have %llu expect aligned 
to %u", key->objectid, leaf->fs_info->sectorsize); return -EUCLEAN; } if (unlikely(key->type != BTRFS_TREE_BLOCK_REF_KEY && !IS_ALIGNED(key->offset, leaf->fs_info->sectorsize))) { extent_err(leaf, slot, "invalid tree parent bytenr, have %llu expect aligned to %u", key->offset, leaf->fs_info->sectorsize); return -EUCLEAN; } return 0; } static int check_extent_data_ref(struct extent_buffer *leaf, struct btrfs_key *key, int slot) { struct btrfs_extent_data_ref *dref; unsigned long ptr = btrfs_item_ptr_offset(leaf, slot); const unsigned long end = ptr + btrfs_item_size(leaf, slot); if (unlikely(btrfs_item_size(leaf, slot) % sizeof(*dref) != 0)) { generic_err(leaf, slot, "invalid item size, have %u expect aligned to %zu for key type %u", btrfs_item_size(leaf, slot), sizeof(*dref), key->type); return -EUCLEAN; } if (unlikely(!IS_ALIGNED(key->objectid, leaf->fs_info->sectorsize))) { generic_err(leaf, slot, "invalid key objectid for shared block ref, have %llu expect aligned to %u", key->objectid, leaf->fs_info->sectorsize); return -EUCLEAN; } for (; ptr < end; ptr += sizeof(*dref)) { u64 offset; /* * We cannot check the extent_data_ref hash due to possible * overflow from the leaf due to hash collisions. */ dref = (struct btrfs_extent_data_ref *)ptr; offset = btrfs_extent_data_ref_offset(leaf, dref); if (unlikely(!IS_ALIGNED(offset, leaf->fs_info->sectorsize))) { extent_err(leaf, slot, "invalid extent data backref offset, have %llu expect aligned to %u", offset, leaf->fs_info->sectorsize); return -EUCLEAN; } } return 0; } #define inode_ref_err(eb, slot, fmt, args...) \ inode_item_err(eb, slot, fmt, ##args) static int check_inode_ref(struct extent_buffer *leaf, struct btrfs_key *key, struct btrfs_key *prev_key, int slot) { struct btrfs_inode_ref *iref; unsigned long ptr; unsigned long end; if (unlikely(!check_prev_ino(leaf, key, slot, prev_key))) return -EUCLEAN; /* namelen can't be 0, so item_size == sizeof() is also invalid */ if (unlikely(btrfs_item_size(leaf, slot) <= sizeof(*iref))) { inode_ref_err(leaf, slot, "invalid item size, have %u expect (%zu, %u)", btrfs_item_size(leaf, slot), sizeof(*iref), BTRFS_LEAF_DATA_SIZE(leaf->fs_info)); return -EUCLEAN; } ptr = btrfs_item_ptr_offset(leaf, slot); end = ptr + btrfs_item_size(leaf, slot); while (ptr < end) { u16 namelen; if (unlikely(ptr + sizeof(iref) > end)) { inode_ref_err(leaf, slot, "inode ref overflow, ptr %lu end %lu inode_ref_size %zu", ptr, end, sizeof(iref)); return -EUCLEAN; } iref = (struct btrfs_inode_ref *)ptr; namelen = btrfs_inode_ref_name_len(leaf, iref); if (unlikely(ptr + sizeof(*iref) + namelen > end)) { inode_ref_err(leaf, slot, "inode ref overflow, ptr %lu end %lu namelen %u", ptr, end, namelen); return -EUCLEAN; } /* * NOTE: In theory we should record all found index numbers * to find any duplicated indexes, but that will be too time * consuming for inodes with too many hard links. */ ptr += sizeof(*iref) + namelen; } return 0; } static int check_raid_stripe_extent(const struct extent_buffer *leaf, const struct btrfs_key *key, int slot) { if (unlikely(!IS_ALIGNED(key->objectid, leaf->fs_info->sectorsize))) { generic_err(leaf, slot, "invalid key objectid for raid stripe extent, have %llu expect aligned to %u", key->objectid, leaf->fs_info->sectorsize); return -EUCLEAN; } if (unlikely(!btrfs_fs_incompat(leaf->fs_info, RAID_STRIPE_TREE))) { generic_err(leaf, slot, "RAID_STRIPE_EXTENT present but RAID_STRIPE_TREE incompat bit unset"); return -EUCLEAN; } return 0; } /* * Common point to switch the item-specific validation. 
*/ static enum btrfs_tree_block_status check_leaf_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot, struct btrfs_key *prev_key) { int ret = 0; struct btrfs_chunk *chunk; switch (key->type) { case BTRFS_EXTENT_DATA_KEY: ret = check_extent_data_item(leaf, key, slot, prev_key); break; case BTRFS_EXTENT_CSUM_KEY: ret = check_csum_item(leaf, key, slot, prev_key); break; case BTRFS_DIR_ITEM_KEY: case BTRFS_DIR_INDEX_KEY: case BTRFS_XATTR_ITEM_KEY: ret = check_dir_item(leaf, key, prev_key, slot); break; case BTRFS_INODE_REF_KEY: ret = check_inode_ref(leaf, key, prev_key, slot); break; case BTRFS_BLOCK_GROUP_ITEM_KEY: ret = check_block_group_item(leaf, key, slot); break; case BTRFS_CHUNK_ITEM_KEY: chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk); ret = check_leaf_chunk_item(leaf, chunk, key, slot); break; case BTRFS_DEV_ITEM_KEY: ret = check_dev_item(leaf, key, slot); break; case BTRFS_INODE_ITEM_KEY: ret = check_inode_item(leaf, key, slot); break; case BTRFS_ROOT_ITEM_KEY: ret = check_root_item(leaf, key, slot); break; case BTRFS_EXTENT_ITEM_KEY: case BTRFS_METADATA_ITEM_KEY: ret = check_extent_item(leaf, key, slot, prev_key); break; case BTRFS_TREE_BLOCK_REF_KEY: case BTRFS_SHARED_DATA_REF_KEY: case BTRFS_SHARED_BLOCK_REF_KEY: ret = check_simple_keyed_refs(leaf, key, slot); break; case BTRFS_EXTENT_DATA_REF_KEY: ret = check_extent_data_ref(leaf, key, slot); break; case BTRFS_RAID_STRIPE_KEY: ret = check_raid_stripe_extent(leaf, key, slot); break; } if (ret) return BTRFS_TREE_BLOCK_INVALID_ITEM; return BTRFS_TREE_BLOCK_CLEAN; } enum btrfs_tree_block_status __btrfs_check_leaf(struct extent_buffer *leaf) { struct btrfs_fs_info *fs_info = leaf->fs_info; /* No valid key type is 0, so all key should be larger than this key */ struct btrfs_key prev_key = {0, 0, 0}; struct btrfs_key key; u32 nritems = btrfs_header_nritems(leaf); int slot; if (unlikely(btrfs_header_level(leaf) != 0)) { generic_err(leaf, 0, "invalid level for leaf, have %d expect 0", btrfs_header_level(leaf)); return BTRFS_TREE_BLOCK_INVALID_LEVEL; } if (unlikely(!btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_WRITTEN))) { generic_err(leaf, 0, "invalid flag for leaf, WRITTEN not set"); return BTRFS_TREE_BLOCK_WRITTEN_NOT_SET; } /* * Extent buffers from a relocation tree have a owner field that * corresponds to the subvolume tree they are based on. So just from an * extent buffer alone we can not find out what is the id of the * corresponding subvolume tree, so we can not figure out if the extent * buffer corresponds to the root of the relocation tree or not. So * skip this check for relocation trees. */ if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) { u64 owner = btrfs_header_owner(leaf); /* These trees must never be empty */ if (unlikely(owner == BTRFS_ROOT_TREE_OBJECTID || owner == BTRFS_CHUNK_TREE_OBJECTID || owner == BTRFS_DEV_TREE_OBJECTID || owner == BTRFS_FS_TREE_OBJECTID || owner == BTRFS_DATA_RELOC_TREE_OBJECTID)) { generic_err(leaf, 0, "invalid root, root %llu must never be empty", owner); return BTRFS_TREE_BLOCK_INVALID_NRITEMS; } /* Unknown tree */ if (unlikely(owner == 0)) { generic_err(leaf, 0, "invalid owner, root 0 is not defined"); return BTRFS_TREE_BLOCK_INVALID_OWNER; } /* EXTENT_TREE_V2 can have empty extent trees. 
*/ if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) return BTRFS_TREE_BLOCK_CLEAN; if (unlikely(owner == BTRFS_EXTENT_TREE_OBJECTID)) { generic_err(leaf, 0, "invalid root, root %llu must never be empty", owner); return BTRFS_TREE_BLOCK_INVALID_NRITEMS; } return BTRFS_TREE_BLOCK_CLEAN; } if (unlikely(nritems == 0)) return BTRFS_TREE_BLOCK_CLEAN; /* * Check the following things to make sure this is a good leaf, and * leaf users won't need to bother with similar sanity checks: * * 1) key ordering * 2) item offset and size * No overlap, no hole, all inside the leaf. * 3) item content * If possible, do comprehensive sanity check. * NOTE: All checks must only rely on the item data itself. */ for (slot = 0; slot < nritems; slot++) { u32 item_end_expected; u64 item_data_end; enum btrfs_tree_block_status ret; btrfs_item_key_to_cpu(leaf, &key, slot); /* Make sure the keys are in the right order */ if (unlikely(btrfs_comp_cpu_keys(&prev_key, &key) >= 0)) { generic_err(leaf, slot, "bad key order, prev (%llu %u %llu) current (%llu %u %llu)", prev_key.objectid, prev_key.type, prev_key.offset, key.objectid, key.type, key.offset); return BTRFS_TREE_BLOCK_BAD_KEY_ORDER; } item_data_end = (u64)btrfs_item_offset(leaf, slot) + btrfs_item_size(leaf, slot); /* * Make sure the offset and ends are right, remember that the * item data starts at the end of the leaf and grows towards the * front. */ if (slot == 0) item_end_expected = BTRFS_LEAF_DATA_SIZE(fs_info); else item_end_expected = btrfs_item_offset(leaf, slot - 1); if (unlikely(item_data_end != item_end_expected)) { generic_err(leaf, slot, "unexpected item end, have %llu expect %u", item_data_end, item_end_expected); return BTRFS_TREE_BLOCK_INVALID_OFFSETS; } /* * Check to make sure that we don't point outside of the leaf, * just in case all the items are consistent to each other, but * all point outside of the leaf. */ if (unlikely(item_data_end > BTRFS_LEAF_DATA_SIZE(fs_info))) { generic_err(leaf, slot, "slot end outside of leaf, have %llu expect range [0, %u]", item_data_end, BTRFS_LEAF_DATA_SIZE(fs_info)); return BTRFS_TREE_BLOCK_INVALID_OFFSETS; } /* Also check if the item pointer overlaps with btrfs item. */ if (unlikely(btrfs_item_ptr_offset(leaf, slot) < btrfs_item_nr_offset(leaf, slot) + sizeof(struct btrfs_item))) { generic_err(leaf, slot, "slot overlaps with its data, item end %lu data start %lu", btrfs_item_nr_offset(leaf, slot) + sizeof(struct btrfs_item), btrfs_item_ptr_offset(leaf, slot)); return BTRFS_TREE_BLOCK_INVALID_OFFSETS; } /* Check if the item size and content meet other criteria. 
*/ ret = check_leaf_item(leaf, &key, slot, &prev_key); if (unlikely(ret != BTRFS_TREE_BLOCK_CLEAN)) return ret; prev_key.objectid = key.objectid; prev_key.type = key.type; prev_key.offset = key.offset; } return BTRFS_TREE_BLOCK_CLEAN; } int btrfs_check_leaf(struct extent_buffer *leaf) { enum btrfs_tree_block_status ret; ret = __btrfs_check_leaf(leaf); if (unlikely(ret != BTRFS_TREE_BLOCK_CLEAN)) return -EUCLEAN; return 0; } ALLOW_ERROR_INJECTION(btrfs_check_leaf, ERRNO); enum btrfs_tree_block_status __btrfs_check_node(struct extent_buffer *node) { struct btrfs_fs_info *fs_info = node->fs_info; unsigned long nr = btrfs_header_nritems(node); struct btrfs_key key, next_key; int slot; int level = btrfs_header_level(node); u64 bytenr; if (unlikely(!btrfs_header_flag(node, BTRFS_HEADER_FLAG_WRITTEN))) { generic_err(node, 0, "invalid flag for node, WRITTEN not set"); return BTRFS_TREE_BLOCK_WRITTEN_NOT_SET; } if (unlikely(level <= 0 || level >= BTRFS_MAX_LEVEL)) { generic_err(node, 0, "invalid level for node, have %d expect [1, %d]", level, BTRFS_MAX_LEVEL - 1); return BTRFS_TREE_BLOCK_INVALID_LEVEL; } if (unlikely(nr == 0 || nr > BTRFS_NODEPTRS_PER_BLOCK(fs_info))) { btrfs_crit(fs_info, "corrupt node: root=%llu block=%llu, nritems too %s, have %lu expect range [1,%u]", btrfs_header_owner(node), node->start, nr == 0 ? "small" : "large", nr, BTRFS_NODEPTRS_PER_BLOCK(fs_info)); return BTRFS_TREE_BLOCK_INVALID_NRITEMS; } for (slot = 0; slot < nr - 1; slot++) { bytenr = btrfs_node_blockptr(node, slot); btrfs_node_key_to_cpu(node, &key, slot); btrfs_node_key_to_cpu(node, &next_key, slot + 1); if (unlikely(!bytenr)) { generic_err(node, slot, "invalid NULL node pointer"); return BTRFS_TREE_BLOCK_INVALID_BLOCKPTR; } if (unlikely(!IS_ALIGNED(bytenr, fs_info->sectorsize))) { generic_err(node, slot, "unaligned pointer, have %llu should be aligned to %u", bytenr, fs_info->sectorsize); return BTRFS_TREE_BLOCK_INVALID_BLOCKPTR; } if (unlikely(btrfs_comp_cpu_keys(&key, &next_key) >= 0)) { generic_err(node, slot, "bad key order, current (%llu %u %llu) next (%llu %u %llu)", key.objectid, key.type, key.offset, next_key.objectid, next_key.type, next_key.offset); return BTRFS_TREE_BLOCK_BAD_KEY_ORDER; } } return BTRFS_TREE_BLOCK_CLEAN; } int btrfs_check_node(struct extent_buffer *node) { enum btrfs_tree_block_status ret; ret = __btrfs_check_node(node); if (unlikely(ret != BTRFS_TREE_BLOCK_CLEAN)) return -EUCLEAN; return 0; } ALLOW_ERROR_INJECTION(btrfs_check_node, ERRNO); int btrfs_check_eb_owner(const struct extent_buffer *eb, u64 root_owner) { const bool is_subvol = is_fstree(root_owner); const u64 eb_owner = btrfs_header_owner(eb); /* * Skip dummy fs, as selftests don't create unique ebs for each dummy * root. */ if (btrfs_is_testing(eb->fs_info)) return 0; /* * There are several call sites (backref walking, qgroup, and data * reloc) passing 0 as @root_owner, as they are not holding the * tree root. In that case, we can not do a reliable ownership check, * so just exit. */ if (root_owner == 0) return 0; /* * These trees use key.offset as their owner, our callers don't have * the extra capacity to pass key.offset here. So we just skip them. */ if (root_owner == BTRFS_TREE_LOG_OBJECTID || root_owner == BTRFS_TREE_RELOC_OBJECTID) return 0; if (!is_subvol) { /* For non-subvolume trees, the eb owner should match root owner */ if (unlikely(root_owner != eb_owner)) { btrfs_crit(eb->fs_info, "corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect %llu", btrfs_header_level(eb) == 0 ? 
"leaf" : "node", root_owner, btrfs_header_bytenr(eb), eb_owner, root_owner); return -EUCLEAN; } return 0; } /* * For subvolume trees, owners can mismatch, but they should all belong * to subvolume trees. */ if (unlikely(is_subvol != is_fstree(eb_owner))) { btrfs_crit(eb->fs_info, "corrupted %s, root=%llu block=%llu owner mismatch, have %llu expect [%llu, %llu]", btrfs_header_level(eb) == 0 ? "leaf" : "node", root_owner, btrfs_header_bytenr(eb), eb_owner, BTRFS_FIRST_FREE_OBJECTID, BTRFS_LAST_FREE_OBJECTID); return -EUCLEAN; } return 0; } int btrfs_verify_level_key(struct extent_buffer *eb, int level, struct btrfs_key *first_key, u64 parent_transid) { struct btrfs_fs_info *fs_info = eb->fs_info; int found_level; struct btrfs_key found_key; int ret; found_level = btrfs_header_level(eb); if (found_level != level) { WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR "BTRFS: tree level check failed\n"); btrfs_err(fs_info, "tree level mismatch detected, bytenr=%llu level expected=%u has=%u", eb->start, level, found_level); return -EIO; } if (!first_key) return 0; /* * For live tree block (new tree blocks in current transaction), * we need proper lock context to avoid race, which is impossible here. * So we only checks tree blocks which is read from disk, whose * generation <= fs_info->last_trans_committed. */ if (btrfs_header_generation(eb) > btrfs_get_last_trans_committed(fs_info)) return 0; /* We have @first_key, so this @eb must have at least one item */ if (btrfs_header_nritems(eb) == 0) { btrfs_err(fs_info, "invalid tree nritems, bytenr=%llu nritems=0 expect >0", eb->start); WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); return -EUCLEAN; } if (found_level) btrfs_node_key_to_cpu(eb, &found_key, 0); else btrfs_item_key_to_cpu(eb, &found_key, 0); ret = btrfs_comp_cpu_keys(first_key, &found_key); if (ret) { WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR "BTRFS: tree first key check failed\n"); btrfs_err(fs_info, "tree first key mismatch detected, bytenr=%llu parent_transid=%llu key expected=(%llu,%u,%llu) has=(%llu,%u,%llu)", eb->start, parent_transid, first_key->objectid, first_key->type, first_key->offset, found_key.objectid, found_key.type, found_key.offset); } return ret; } |
// SPDX-License-Identifier: GPL-2.0
/*
 * Simple file system for zoned block devices exposing zones as files.
 *
 * Copyright (C) 2019 Western Digital Corporation or its affiliates.
*/ #include <linux/module.h> #include <linux/pagemap.h> #include <linux/magic.h> #include <linux/iomap.h> #include <linux/init.h> #include <linux/slab.h> #include <linux/blkdev.h> #include <linux/statfs.h> #include <linux/writeback.h> #include <linux/quotaops.h> #include <linux/seq_file.h> #include <linux/uio.h> #include <linux/mman.h> #include <linux/sched/mm.h> #include <linux/crc32.h> #include <linux/task_io_accounting_ops.h> #include <linux/fs_parser.h> #include <linux/fs_context.h> #include "zonefs.h" #define CREATE_TRACE_POINTS #include "trace.h" /* * Get the name of a zone group directory. */ static const char *zonefs_zgroup_name(enum zonefs_ztype ztype) { switch (ztype) { case ZONEFS_ZTYPE_CNV: return "cnv"; case ZONEFS_ZTYPE_SEQ: return "seq"; default: WARN_ON_ONCE(1); return "???"; } } /* * Manage the active zone count. */ static void zonefs_account_active(struct super_block *sb, struct zonefs_zone *z) { struct zonefs_sb_info *sbi = ZONEFS_SB(sb); if (zonefs_zone_is_cnv(z)) return; /* * For zones that transitioned to the offline or readonly condition, * we only need to clear the active state. */ if (z->z_flags & (ZONEFS_ZONE_OFFLINE | ZONEFS_ZONE_READONLY)) goto out; /* * If the zone is active, that is, if it is explicitly open or * partially written, check if it was already accounted as active. */ if ((z->z_flags & ZONEFS_ZONE_OPEN) || (z->z_wpoffset > 0 && z->z_wpoffset < z->z_capacity)) { if (!(z->z_flags & ZONEFS_ZONE_ACTIVE)) { z->z_flags |= ZONEFS_ZONE_ACTIVE; atomic_inc(&sbi->s_active_seq_files); } return; } out: /* The zone is not active. If it was, update the active count */ if (z->z_flags & ZONEFS_ZONE_ACTIVE) { z->z_flags &= ~ZONEFS_ZONE_ACTIVE; atomic_dec(&sbi->s_active_seq_files); } } /* * Manage the active zone count. Called with zi->i_truncate_mutex held. */ void zonefs_inode_account_active(struct inode *inode) { lockdep_assert_held(&ZONEFS_I(inode)->i_truncate_mutex); return zonefs_account_active(inode->i_sb, zonefs_inode_zone(inode)); } /* * Execute a zone management operation. */ static int zonefs_zone_mgmt(struct super_block *sb, struct zonefs_zone *z, enum req_op op) { int ret; /* * With ZNS drives, closing an explicitly open zone that has not been * written will change the zone state to "closed", that is, the zone * will remain active. Since this can then cause failure of explicit * open operation on other zones if the drive active zone resources * are exceeded, make sure that the zone does not remain active by * resetting it. */ if (op == REQ_OP_ZONE_CLOSE && !z->z_wpoffset) op = REQ_OP_ZONE_RESET; trace_zonefs_zone_mgmt(sb, z, op); ret = blkdev_zone_mgmt(sb->s_bdev, op, z->z_sector, z->z_size >> SECTOR_SHIFT); if (ret) { zonefs_err(sb, "Zone management operation %s at %llu failed %d\n", blk_op_str(op), z->z_sector, ret); return ret; } return 0; } int zonefs_inode_zone_mgmt(struct inode *inode, enum req_op op) { lockdep_assert_held(&ZONEFS_I(inode)->i_truncate_mutex); return zonefs_zone_mgmt(inode->i_sb, zonefs_inode_zone(inode), op); } void zonefs_i_size_write(struct inode *inode, loff_t isize) { struct zonefs_zone *z = zonefs_inode_zone(inode); i_size_write(inode, isize); /* * A full zone is no longer open/active and does not need * explicit closing. 
*/ if (isize >= z->z_capacity) { struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb); if (z->z_flags & ZONEFS_ZONE_ACTIVE) atomic_dec(&sbi->s_active_seq_files); z->z_flags &= ~(ZONEFS_ZONE_OPEN | ZONEFS_ZONE_ACTIVE); } } void zonefs_update_stats(struct inode *inode, loff_t new_isize) { struct super_block *sb = inode->i_sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); loff_t old_isize = i_size_read(inode); loff_t nr_blocks; if (new_isize == old_isize) return; spin_lock(&sbi->s_lock); /* * This may be called for an update after an IO error. * So beware of the values seen. */ if (new_isize < old_isize) { nr_blocks = (old_isize - new_isize) >> sb->s_blocksize_bits; if (sbi->s_used_blocks > nr_blocks) sbi->s_used_blocks -= nr_blocks; else sbi->s_used_blocks = 0; } else { sbi->s_used_blocks += (new_isize - old_isize) >> sb->s_blocksize_bits; if (sbi->s_used_blocks > sbi->s_blocks) sbi->s_used_blocks = sbi->s_blocks; } spin_unlock(&sbi->s_lock); } /* * Check a zone condition. Return the amount of written (and still readable) * data in the zone. */ static loff_t zonefs_check_zone_condition(struct super_block *sb, struct zonefs_zone *z, struct blk_zone *zone) { switch (zone->cond) { case BLK_ZONE_COND_OFFLINE: zonefs_warn(sb, "Zone %llu: offline zone\n", z->z_sector); z->z_flags |= ZONEFS_ZONE_OFFLINE; return 0; case BLK_ZONE_COND_READONLY: /* * The write pointer of read-only zones is invalid, so we cannot * determine the zone wpoffset (inode size). We thus keep the * zone wpoffset as is, which leads to an empty file * (wpoffset == 0) on mount. For a runtime error, this keeps * the inode size as it was when last updated so that the user * can recover data. */ zonefs_warn(sb, "Zone %llu: read-only zone\n", z->z_sector); z->z_flags |= ZONEFS_ZONE_READONLY; if (zonefs_zone_is_cnv(z)) return z->z_capacity; return z->z_wpoffset; case BLK_ZONE_COND_FULL: /* The write pointer of full zones is invalid. */ return z->z_capacity; default: if (zonefs_zone_is_cnv(z)) return z->z_capacity; return (zone->wp - zone->start) << SECTOR_SHIFT; } } /* * Check a zone condition and adjust its inode access permissions for * offline and readonly zones. */ static void zonefs_inode_update_mode(struct inode *inode) { struct zonefs_zone *z = zonefs_inode_zone(inode); if (z->z_flags & ZONEFS_ZONE_OFFLINE) { /* Offline zones cannot be read nor written */ inode->i_flags |= S_IMMUTABLE; inode->i_mode &= ~0777; } else if (z->z_flags & ZONEFS_ZONE_READONLY) { /* Readonly zones cannot be written */ inode->i_flags |= S_IMMUTABLE; if (z->z_flags & ZONEFS_ZONE_INIT_MODE) inode->i_mode &= ~0777; else inode->i_mode &= ~0222; } z->z_flags &= ~ZONEFS_ZONE_INIT_MODE; z->z_mode = inode->i_mode; } static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx, void *data) { struct blk_zone *z = data; *z = *zone; return 0; } static void zonefs_handle_io_error(struct inode *inode, struct blk_zone *zone, bool write) { struct zonefs_zone *z = zonefs_inode_zone(inode); struct super_block *sb = inode->i_sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); loff_t isize, data_size; /* * Check the zone condition: if the zone is not "bad" (offline or * read-only), read errors are simply signaled to the IO issuer as long * as there is no inconsistency between the inode size and the amount of * data writen in the zone (data_size). 
	 */
	data_size = zonefs_check_zone_condition(sb, z, zone);
	isize = i_size_read(inode);
	if (!(z->z_flags & (ZONEFS_ZONE_READONLY | ZONEFS_ZONE_OFFLINE)) &&
	    !write && isize == data_size)
		return;

	/*
	 * At this point, we detected either a bad zone or an inconsistency
	 * between the inode size and the amount of data written in the zone.
	 * For the latter case, the cause may be a write IO error or an external
	 * action on the device. Two error patterns exist:
	 * 1) The inode size is lower than the amount of data in the zone:
	 *    a write operation partially failed and data was written at the end
	 *    of the file. This can happen in the case of a large direct IO
	 *    needing several BIOs and/or write requests to be processed.
	 * 2) The inode size is larger than the amount of data in the zone:
	 *    this can happen with a deferred write error with the use of the
	 *    device side write cache after getting successful write IO
	 *    completions. Other possibilities are (a) an external corruption,
	 *    e.g. an application reset the zone directly, or (b) the device
	 *    has a serious problem (e.g. firmware bug).
	 *
	 * In all cases, warn about inode size inconsistency and handle the
	 * IO error according to the zone condition and to the mount options.
	 */
	if (isize != data_size)
		zonefs_warn(sb,
			    "inode %lu: invalid size %lld (should be %lld)\n",
			    inode->i_ino, isize, data_size);

	/*
	 * First handle bad zones signaled by hardware. The mount options
	 * errors=zone-ro and errors=zone-offline result in changing the
	 * zone condition to read-only and offline respectively, as if the
	 * condition was signaled by the hardware.
	 */
	if ((z->z_flags & ZONEFS_ZONE_OFFLINE) ||
	    (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_ZOL)) {
		zonefs_warn(sb, "inode %lu: read/write access disabled\n",
			    inode->i_ino);
		if (!(z->z_flags & ZONEFS_ZONE_OFFLINE))
			z->z_flags |= ZONEFS_ZONE_OFFLINE;
		zonefs_inode_update_mode(inode);
		data_size = 0;
	} else if ((z->z_flags & ZONEFS_ZONE_READONLY) ||
		   (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_ZRO)) {
		zonefs_warn(sb, "inode %lu: write access disabled\n",
			    inode->i_ino);
		if (!(z->z_flags & ZONEFS_ZONE_READONLY))
			z->z_flags |= ZONEFS_ZONE_READONLY;
		zonefs_inode_update_mode(inode);
		data_size = isize;
	} else if (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_RO &&
		   data_size > isize) {
		/* Do not expose garbage data */
		data_size = isize;
	}

	/*
	 * If the filesystem is mounted with the explicit-open mount option, we
	 * need to clear the ZONEFS_ZONE_OPEN flag if the zone transitioned to
	 * the read-only or offline condition, to avoid attempting an explicit
	 * close of the zone when the inode file is closed.
	 */
	if ((sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN) &&
	    (z->z_flags & (ZONEFS_ZONE_READONLY | ZONEFS_ZONE_OFFLINE)))
		z->z_flags &= ~ZONEFS_ZONE_OPEN;

	/*
	 * If errors=remount-ro was specified, any error results in remounting
	 * the volume as read-only.
	 */
	if ((sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_RO) && !sb_rdonly(sb)) {
		zonefs_warn(sb, "remounting filesystem read-only\n");
		sb->s_flags |= SB_RDONLY;
	}

	/*
	 * Update block usage stats and the inode size to prevent access to
	 * invalid data.
	 */
	zonefs_update_stats(inode, data_size);
	zonefs_i_size_write(inode, data_size);
	z->z_wpoffset = data_size;
	zonefs_inode_account_active(inode);
}

/*
 * When a file IO error occurs, check the file zone to see if there is a change
 * in the zone condition (e.g. offline or read-only).
For a failed write to a * sequential zone, the zone write pointer position must also be checked to * eventually correct the file size and zonefs inode write pointer offset * (which can be out of sync with the drive due to partial write failures). */ void __zonefs_io_error(struct inode *inode, bool write) { struct zonefs_zone *z = zonefs_inode_zone(inode); struct super_block *sb = inode->i_sb; unsigned int noio_flag; struct blk_zone zone; int ret; /* * Conventional zones have no write pointer and cannot become read-only * or offline. So simply fake a report for a single or aggregated zone * and let zonefs_handle_io_error() correct the zone inode information * according to the mount options. */ if (!zonefs_zone_is_seq(z)) { zone.start = z->z_sector; zone.len = z->z_size >> SECTOR_SHIFT; zone.wp = zone.start + zone.len; zone.type = BLK_ZONE_TYPE_CONVENTIONAL; zone.cond = BLK_ZONE_COND_NOT_WP; zone.capacity = zone.len; goto handle_io_error; } /* * Memory allocations in blkdev_report_zones() can trigger a memory * reclaim which may in turn cause a recursion into zonefs as well as * struct request allocations for the same device. The former case may * end up in a deadlock on the inode truncate mutex, while the latter * may prevent IO forward progress. Executing the report zones under * the GFP_NOIO context avoids both problems. */ noio_flag = memalloc_noio_save(); ret = blkdev_report_zones(sb->s_bdev, z->z_sector, 1, zonefs_io_error_cb, &zone); memalloc_noio_restore(noio_flag); if (ret != 1) { zonefs_err(sb, "Get inode %lu zone information failed %d\n", inode->i_ino, ret); zonefs_warn(sb, "remounting filesystem read-only\n"); sb->s_flags |= SB_RDONLY; return; } handle_io_error: zonefs_handle_io_error(inode, &zone, write); } static struct kmem_cache *zonefs_inode_cachep; static struct inode *zonefs_alloc_inode(struct super_block *sb) { struct zonefs_inode_info *zi; zi = alloc_inode_sb(sb, zonefs_inode_cachep, GFP_KERNEL); if (!zi) return NULL; inode_init_once(&zi->i_vnode); mutex_init(&zi->i_truncate_mutex); zi->i_wr_refcnt = 0; return &zi->i_vnode; } static void zonefs_free_inode(struct inode *inode) { kmem_cache_free(zonefs_inode_cachep, ZONEFS_I(inode)); } /* * File system stat.
*/ static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); enum zonefs_ztype t; buf->f_type = ZONEFS_MAGIC; buf->f_bsize = sb->s_blocksize; buf->f_namelen = ZONEFS_NAME_MAX; spin_lock(&sbi->s_lock); buf->f_blocks = sbi->s_blocks; if (WARN_ON(sbi->s_used_blocks > sbi->s_blocks)) buf->f_bfree = 0; else buf->f_bfree = buf->f_blocks - sbi->s_used_blocks; buf->f_bavail = buf->f_bfree; for (t = 0; t < ZONEFS_ZTYPE_MAX; t++) { if (sbi->s_zgroup[t].g_nr_zones) buf->f_files += sbi->s_zgroup[t].g_nr_zones + 1; } buf->f_ffree = 0; spin_unlock(&sbi->s_lock); buf->f_fsid = uuid_to_fsid(sbi->s_uuid.b); return 0; } enum { Opt_errors, Opt_explicit_open, }; struct zonefs_context { unsigned long s_mount_opts; }; static const struct constant_table zonefs_param_errors[] = { {"remount-ro", ZONEFS_MNTOPT_ERRORS_RO}, {"zone-ro", ZONEFS_MNTOPT_ERRORS_ZRO}, {"zone-offline", ZONEFS_MNTOPT_ERRORS_ZOL}, {"repair", ZONEFS_MNTOPT_ERRORS_REPAIR}, {} }; static const struct fs_parameter_spec zonefs_param_spec[] = { fsparam_enum ("errors", Opt_errors, zonefs_param_errors), fsparam_flag ("explicit-open", Opt_explicit_open), {} }; static int zonefs_parse_param(struct fs_context *fc, struct fs_parameter *param) { struct zonefs_context *ctx = fc->fs_private; struct fs_parse_result result; int opt; opt = fs_parse(fc, zonefs_param_spec, param, &result); if (opt < 0) return opt; switch (opt) { case Opt_errors: ctx->s_mount_opts &= ~ZONEFS_MNTOPT_ERRORS_MASK; ctx->s_mount_opts |= result.uint_32; break; case Opt_explicit_open: ctx->s_mount_opts |= ZONEFS_MNTOPT_EXPLICIT_OPEN; break; default: return -EINVAL; } return 0; } static int zonefs_show_options(struct seq_file *seq, struct dentry *root) { struct zonefs_sb_info *sbi = ZONEFS_SB(root->d_sb); if (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_RO) seq_puts(seq, ",errors=remount-ro"); if (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_ZRO) seq_puts(seq, ",errors=zone-ro"); if (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_ZOL) seq_puts(seq, ",errors=zone-offline"); if (sbi->s_mount_opts & ZONEFS_MNTOPT_ERRORS_REPAIR) seq_puts(seq, ",errors=repair"); return 0; } static int zonefs_inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *iattr) { struct inode *inode = d_inode(dentry); int ret; if (unlikely(IS_IMMUTABLE(inode))) return -EPERM; ret = setattr_prepare(&nop_mnt_idmap, dentry, iattr); if (ret) return ret; /* * Since files and directories cannot be created nor deleted, do not * allow setting any write attributes on the sub-directories grouping * files by zone type. 
*/ if ((iattr->ia_valid & ATTR_MODE) && S_ISDIR(inode->i_mode) && (iattr->ia_mode & 0222)) return -EPERM; if (((iattr->ia_valid & ATTR_UID) && !uid_eq(iattr->ia_uid, inode->i_uid)) || ((iattr->ia_valid & ATTR_GID) && !gid_eq(iattr->ia_gid, inode->i_gid))) { ret = dquot_transfer(&nop_mnt_idmap, inode, iattr); if (ret) return ret; } if (iattr->ia_valid & ATTR_SIZE) { ret = zonefs_file_truncate(inode, iattr->ia_size); if (ret) return ret; } setattr_copy(&nop_mnt_idmap, inode, iattr); if (S_ISREG(inode->i_mode)) { struct zonefs_zone *z = zonefs_inode_zone(inode); z->z_mode = inode->i_mode; z->z_uid = inode->i_uid; z->z_gid = inode->i_gid; } return 0; } static const struct inode_operations zonefs_file_inode_operations = { .setattr = zonefs_inode_setattr, }; static long zonefs_fname_to_fno(const struct qstr *fname) { const char *name = fname->name; unsigned int len = fname->len; long fno = 0, shift = 1; const char *rname; char c = *name; unsigned int i; /* * File names are always a base-10 number string without any * leading 0s. */ if (!isdigit(c)) return -ENOENT; if (len > 1 && c == '0') return -ENOENT; if (len == 1) return c - '0'; for (i = 0, rname = name + len - 1; i < len; i++, rname--) { c = *rname; if (!isdigit(c)) return -ENOENT; fno += (c - '0') * shift; shift *= 10; } return fno; } static struct inode *zonefs_get_file_inode(struct inode *dir, struct dentry *dentry) { struct zonefs_zone_group *zgroup = dir->i_private; struct super_block *sb = dir->i_sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); struct zonefs_zone *z; struct inode *inode; ino_t ino; long fno; /* Get the file number from the file name */ fno = zonefs_fname_to_fno(&dentry->d_name); if (fno < 0) return ERR_PTR(fno); if (!zgroup->g_nr_zones || fno >= zgroup->g_nr_zones) return ERR_PTR(-ENOENT); z = &zgroup->g_zones[fno]; ino = z->z_sector >> sbi->s_zone_sectors_shift; inode = iget_locked(sb, ino); if (!inode) return ERR_PTR(-ENOMEM); if (!(inode->i_state & I_NEW)) { WARN_ON_ONCE(inode->i_private != z); return inode; } inode->i_ino = ino; inode->i_mode = z->z_mode; inode_set_mtime_to_ts(inode, inode_set_atime_to_ts(inode, inode_set_ctime_to_ts(inode, inode_get_ctime(dir)))); inode->i_uid = z->z_uid; inode->i_gid = z->z_gid; inode->i_size = z->z_wpoffset; inode->i_blocks = z->z_capacity >> SECTOR_SHIFT; inode->i_private = z; inode->i_op = &zonefs_file_inode_operations; inode->i_fop = &zonefs_file_operations; inode->i_mapping->a_ops = &zonefs_file_aops; mapping_set_large_folios(inode->i_mapping); /* Update the inode access rights depending on the zone condition */ zonefs_inode_update_mode(inode); unlock_new_inode(inode); return inode; } static struct inode *zonefs_get_zgroup_inode(struct super_block *sb, enum zonefs_ztype ztype) { struct inode *root = d_inode(sb->s_root); struct zonefs_sb_info *sbi = ZONEFS_SB(sb); struct inode *inode; ino_t ino = bdev_nr_zones(sb->s_bdev) + ztype + 1; inode = iget_locked(sb, ino); if (!inode) return ERR_PTR(-ENOMEM); if (!(inode->i_state & I_NEW)) return inode; inode->i_ino = ino; inode_init_owner(&nop_mnt_idmap, inode, root, S_IFDIR | 0555); inode->i_size = sbi->s_zgroup[ztype].g_nr_zones; inode_set_mtime_to_ts(inode, inode_set_atime_to_ts(inode, inode_set_ctime_to_ts(inode, inode_get_ctime(root)))); inode->i_private = &sbi->s_zgroup[ztype]; set_nlink(inode, 2); inode->i_op = &zonefs_dir_inode_operations; inode->i_fop = &zonefs_dir_operations; unlock_new_inode(inode); return inode; } static struct inode *zonefs_get_dir_inode(struct inode *dir, struct dentry *dentry) { struct super_block *sb 
= dir->i_sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); const char *name = dentry->d_name.name; enum zonefs_ztype ztype; /* * We only need to check for the "seq" directory and * the "cnv" directory if we have conventional zones. */ if (dentry->d_name.len != 3) return ERR_PTR(-ENOENT); for (ztype = 0; ztype < ZONEFS_ZTYPE_MAX; ztype++) { if (sbi->s_zgroup[ztype].g_nr_zones && memcmp(name, zonefs_zgroup_name(ztype), 3) == 0) break; } if (ztype == ZONEFS_ZTYPE_MAX) return ERR_PTR(-ENOENT); return zonefs_get_zgroup_inode(sb, ztype); } static struct dentry *zonefs_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) { struct inode *inode; if (dentry->d_name.len > ZONEFS_NAME_MAX) return ERR_PTR(-ENAMETOOLONG); if (dir == d_inode(dir->i_sb->s_root)) inode = zonefs_get_dir_inode(dir, dentry); else inode = zonefs_get_file_inode(dir, dentry); return d_splice_alias(inode, dentry); } static int zonefs_readdir_root(struct file *file, struct dir_context *ctx) { struct inode *inode = file_inode(file); struct super_block *sb = inode->i_sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); enum zonefs_ztype ztype = ZONEFS_ZTYPE_CNV; ino_t base_ino = bdev_nr_zones(sb->s_bdev) + 1; if (ctx->pos >= inode->i_size) return 0; if (!dir_emit_dots(file, ctx)) return 0; if (ctx->pos == 2) { if (!sbi->s_zgroup[ZONEFS_ZTYPE_CNV].g_nr_zones) ztype = ZONEFS_ZTYPE_SEQ; if (!dir_emit(ctx, zonefs_zgroup_name(ztype), 3, base_ino + ztype, DT_DIR)) return 0; ctx->pos++; } if (ctx->pos == 3 && ztype != ZONEFS_ZTYPE_SEQ) { ztype = ZONEFS_ZTYPE_SEQ; if (!dir_emit(ctx, zonefs_zgroup_name(ztype), 3, base_ino + ztype, DT_DIR)) return 0; ctx->pos++; } return 0; } static int zonefs_readdir_zgroup(struct file *file, struct dir_context *ctx) { struct inode *inode = file_inode(file); struct zonefs_zone_group *zgroup = inode->i_private; struct super_block *sb = inode->i_sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); struct zonefs_zone *z; int fname_len; char *fname; ino_t ino; int f; /* * The size of zone group directories is equal to the number * of zone files in the group and does not include the "." and * ".." entries. Hence the "+ 2" here. */ if (ctx->pos >= inode->i_size + 2) return 0; if (!dir_emit_dots(file, ctx)) return 0; fname = kmalloc(ZONEFS_NAME_MAX, GFP_KERNEL); if (!fname) return -ENOMEM; for (f = ctx->pos - 2; f < zgroup->g_nr_zones; f++) { z = &zgroup->g_zones[f]; ino = z->z_sector >> sbi->s_zone_sectors_shift; fname_len = snprintf(fname, ZONEFS_NAME_MAX - 1, "%u", f); if (!dir_emit(ctx, fname, fname_len, ino, DT_REG)) break; ctx->pos++; } kfree(fname); return 0; } static int zonefs_readdir(struct file *file, struct dir_context *ctx) { struct inode *inode = file_inode(file); if (inode == d_inode(inode->i_sb->s_root)) return zonefs_readdir_root(file, ctx); return zonefs_readdir_zgroup(file, ctx); } const struct inode_operations zonefs_dir_inode_operations = { .lookup = zonefs_lookup, .setattr = zonefs_inode_setattr, }; const struct file_operations zonefs_dir_operations = { .llseek = generic_file_llseek, .read = generic_read_dir, .iterate_shared = zonefs_readdir, }; struct zonefs_zone_data { struct super_block *sb; unsigned int nr_zones[ZONEFS_ZTYPE_MAX]; sector_t cnv_zone_start; struct blk_zone *zones; }; static int zonefs_get_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data) { struct zonefs_zone_data *zd = data; struct super_block *sb = zd->sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); /* * We do not care about the first zone: it contains the super block * and is not exposed as a file.
*/ if (!idx) return 0; /* * Count the number of zones that will be exposed as files. * For sequential zones, we always have as many files as zones. * For conventional zones, the number of files depends on if we have * conventional zones aggregation enabled. */ switch (zone->type) { case BLK_ZONE_TYPE_CONVENTIONAL: if (sbi->s_features & ZONEFS_F_AGGRCNV) { /* One file per set of contiguous conventional zones */ if (!(sbi->s_zgroup[ZONEFS_ZTYPE_CNV].g_nr_zones) || zone->start != zd->cnv_zone_start) sbi->s_zgroup[ZONEFS_ZTYPE_CNV].g_nr_zones++; zd->cnv_zone_start = zone->start + zone->len; } else { /* One file per zone */ sbi->s_zgroup[ZONEFS_ZTYPE_CNV].g_nr_zones++; } break; case BLK_ZONE_TYPE_SEQWRITE_REQ: case BLK_ZONE_TYPE_SEQWRITE_PREF: sbi->s_zgroup[ZONEFS_ZTYPE_SEQ].g_nr_zones++; break; default: zonefs_err(zd->sb, "Unsupported zone type 0x%x\n", zone->type); return -EIO; } memcpy(&zd->zones[idx], zone, sizeof(struct blk_zone)); return 0; } static int zonefs_get_zone_info(struct zonefs_zone_data *zd) { struct block_device *bdev = zd->sb->s_bdev; int ret; zd->zones = kvcalloc(bdev_nr_zones(bdev), sizeof(struct blk_zone), GFP_KERNEL); if (!zd->zones) return -ENOMEM; /* Get zones information from the device */ ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, zonefs_get_zone_info_cb, zd); if (ret < 0) { zonefs_err(zd->sb, "Zone report failed %d\n", ret); return ret; } if (ret != bdev_nr_zones(bdev)) { zonefs_err(zd->sb, "Invalid zone report (%d/%u zones)\n", ret, bdev_nr_zones(bdev)); return -EIO; } return 0; } static inline void zonefs_free_zone_info(struct zonefs_zone_data *zd) { kvfree(zd->zones); } /* * Create a zone group and populate it with zone files. */ static int zonefs_init_zgroup(struct super_block *sb, struct zonefs_zone_data *zd, enum zonefs_ztype ztype) { struct zonefs_sb_info *sbi = ZONEFS_SB(sb); struct zonefs_zone_group *zgroup = &sbi->s_zgroup[ztype]; struct blk_zone *zone, *next, *end; struct zonefs_zone *z; unsigned int n = 0; int ret; /* Allocate the zone group. If it is empty, we have nothing to do. */ if (!zgroup->g_nr_zones) return 0; zgroup->g_zones = kvcalloc(zgroup->g_nr_zones, sizeof(struct zonefs_zone), GFP_KERNEL); if (!zgroup->g_zones) return -ENOMEM; /* * Initialize the zone groups using the device zone information. * We always skip the first zone as it contains the super block * and is not used to back a file. */ end = zd->zones + bdev_nr_zones(sb->s_bdev); for (zone = &zd->zones[1]; zone < end; zone = next) { next = zone + 1; if (zonefs_zone_type(zone) != ztype) continue; if (WARN_ON_ONCE(n >= zgroup->g_nr_zones)) return -EINVAL; /* * For conventional zones, contiguous zones can be aggregated * together to form larger files. Note that this overwrites the * length of the first zone of the set of contiguous zones * aggregated together. If one offline or read-only zone is * found, assume that all zones aggregated have the same * condition.
*/ if (ztype == ZONEFS_ZTYPE_CNV && (sbi->s_features & ZONEFS_F_AGGRCNV)) { for (; next < end; next++) { if (zonefs_zone_type(next) != ztype) break; zone->len += next->len; zone->capacity += next->capacity; if (next->cond == BLK_ZONE_COND_READONLY && zone->cond != BLK_ZONE_COND_OFFLINE) zone->cond = BLK_ZONE_COND_READONLY; else if (next->cond == BLK_ZONE_COND_OFFLINE) zone->cond = BLK_ZONE_COND_OFFLINE; } } z = &zgroup->g_zones[n]; if (ztype == ZONEFS_ZTYPE_CNV) z->z_flags |= ZONEFS_ZONE_CNV; z->z_sector = zone->start; z->z_size = zone->len << SECTOR_SHIFT; if (z->z_size > bdev_zone_sectors(sb->s_bdev) << SECTOR_SHIFT && !(sbi->s_features & ZONEFS_F_AGGRCNV)) { zonefs_err(sb, "Invalid zone size %llu (device zone sectors %llu)\n", z->z_size, bdev_zone_sectors(sb->s_bdev) << SECTOR_SHIFT); return -EINVAL; } z->z_capacity = min_t(loff_t, MAX_LFS_FILESIZE, zone->capacity << SECTOR_SHIFT); z->z_wpoffset = zonefs_check_zone_condition(sb, z, zone); z->z_mode = S_IFREG | sbi->s_perm; z->z_uid = sbi->s_uid; z->z_gid = sbi->s_gid; /* * Let zonefs_inode_update_mode() know that we will need * special initialization of the inode mode the first time * it is accessed. */ z->z_flags |= ZONEFS_ZONE_INIT_MODE; sb->s_maxbytes = max(z->z_capacity, sb->s_maxbytes); sbi->s_blocks += z->z_capacity >> sb->s_blocksize_bits; sbi->s_used_blocks += z->z_wpoffset >> sb->s_blocksize_bits; /* * For sequential zones, make sure that any open zone is closed * first to ensure that the initial number of open zones is 0, * in sync with the open zone accounting done when the mount * option ZONEFS_MNTOPT_EXPLICIT_OPEN is used. */ if (ztype == ZONEFS_ZTYPE_SEQ && (zone->cond == BLK_ZONE_COND_IMP_OPEN || zone->cond == BLK_ZONE_COND_EXP_OPEN)) { ret = zonefs_zone_mgmt(sb, z, REQ_OP_ZONE_CLOSE); if (ret) return ret; } zonefs_account_active(sb, z); n++; } if (WARN_ON_ONCE(n != zgroup->g_nr_zones)) return -EINVAL; zonefs_info(sb, "Zone group \"%s\" has %u file%s\n", zonefs_zgroup_name(ztype), zgroup->g_nr_zones, str_plural(zgroup->g_nr_zones)); return 0; } static void zonefs_free_zgroups(struct super_block *sb) { struct zonefs_sb_info *sbi = ZONEFS_SB(sb); enum zonefs_ztype ztype; if (!sbi) return; for (ztype = 0; ztype < ZONEFS_ZTYPE_MAX; ztype++) { kvfree(sbi->s_zgroup[ztype].g_zones); sbi->s_zgroup[ztype].g_zones = NULL; } } /* * Create a zone group and populate it with zone files. */ static int zonefs_init_zgroups(struct super_block *sb) { struct zonefs_zone_data zd; enum zonefs_ztype ztype; int ret; /* First get the device zone information */ memset(&zd, 0, sizeof(struct zonefs_zone_data)); zd.sb = sb; ret = zonefs_get_zone_info(&zd); if (ret) goto cleanup; /* Allocate and initialize the zone groups */ for (ztype = 0; ztype < ZONEFS_ZTYPE_MAX; ztype++) { ret = zonefs_init_zgroup(sb, &zd, ztype); if (ret) { zonefs_info(sb, "Zone group \"%s\" initialization failed\n", zonefs_zgroup_name(ztype)); break; } } cleanup: zonefs_free_zone_info(&zd); if (ret) zonefs_free_zgroups(sb); return ret; } /* * Read super block information from the device. 
*/ static int zonefs_read_super(struct super_block *sb) { struct zonefs_sb_info *sbi = ZONEFS_SB(sb); struct zonefs_super *super; u32 crc, stored_crc; struct page *page; struct bio_vec bio_vec; struct bio bio; int ret; page = alloc_page(GFP_KERNEL); if (!page) return -ENOMEM; bio_init(&bio, sb->s_bdev, &bio_vec, 1, REQ_OP_READ); bio.bi_iter.bi_sector = 0; __bio_add_page(&bio, page, PAGE_SIZE, 0); ret = submit_bio_wait(&bio); if (ret) goto free_page; super = page_address(page); ret = -EINVAL; if (le32_to_cpu(super->s_magic) != ZONEFS_MAGIC) goto free_page; stored_crc = le32_to_cpu(super->s_crc); super->s_crc = 0; crc = crc32(~0U, (unsigned char *)super, sizeof(struct zonefs_super)); if (crc != stored_crc) { zonefs_err(sb, "Invalid checksum (Expected 0x%08x, got 0x%08x)", crc, stored_crc); goto free_page; } sbi->s_features = le64_to_cpu(super->s_features); if (sbi->s_features & ~ZONEFS_F_DEFINED_FEATURES) { zonefs_err(sb, "Unknown features set 0x%llx\n", sbi->s_features); goto free_page; } if (sbi->s_features & ZONEFS_F_UID) { sbi->s_uid = make_kuid(current_user_ns(), le32_to_cpu(super->s_uid)); if (!uid_valid(sbi->s_uid)) { zonefs_err(sb, "Invalid UID feature\n"); goto free_page; } } if (sbi->s_features & ZONEFS_F_GID) { sbi->s_gid = make_kgid(current_user_ns(), le32_to_cpu(super->s_gid)); if (!gid_valid(sbi->s_gid)) { zonefs_err(sb, "Invalid GID feature\n"); goto free_page; } } if (sbi->s_features & ZONEFS_F_PERM) sbi->s_perm = le32_to_cpu(super->s_perm); if (memchr_inv(super->s_reserved, 0, sizeof(super->s_reserved))) { zonefs_err(sb, "Reserved area is being used\n"); goto free_page; } import_uuid(&sbi->s_uuid, super->s_uuid); ret = 0; free_page: __free_page(page); return ret; } static const struct super_operations zonefs_sops = { .alloc_inode = zonefs_alloc_inode, .free_inode = zonefs_free_inode, .statfs = zonefs_statfs, .show_options = zonefs_show_options, }; static int zonefs_get_zgroup_inodes(struct super_block *sb) { struct zonefs_sb_info *sbi = ZONEFS_SB(sb); struct inode *dir_inode; enum zonefs_ztype ztype; for (ztype = 0; ztype < ZONEFS_ZTYPE_MAX; ztype++) { if (!sbi->s_zgroup[ztype].g_nr_zones) continue; dir_inode = zonefs_get_zgroup_inode(sb, ztype); if (IS_ERR(dir_inode)) return PTR_ERR(dir_inode); sbi->s_zgroup[ztype].g_inode = dir_inode; } return 0; } static void zonefs_release_zgroup_inodes(struct super_block *sb) { struct zonefs_sb_info *sbi = ZONEFS_SB(sb); enum zonefs_ztype ztype; if (!sbi) return; for (ztype = 0; ztype < ZONEFS_ZTYPE_MAX; ztype++) { if (sbi->s_zgroup[ztype].g_inode) { iput(sbi->s_zgroup[ztype].g_inode); sbi->s_zgroup[ztype].g_inode = NULL; } } } /* * Check that the device is zoned. If it is, get the list of zones and create * sub-directories and files according to the device zone configuration and * format options. */ static int zonefs_fill_super(struct super_block *sb, struct fs_context *fc) { struct zonefs_sb_info *sbi; struct zonefs_context *ctx = fc->fs_private; struct inode *inode; enum zonefs_ztype ztype; int ret; if (!bdev_is_zoned(sb->s_bdev)) { zonefs_err(sb, "Not a zoned block device\n"); return -EINVAL; } /* * Initialize super block information: the maximum file size is updated * when the zone files are created so that the format option * ZONEFS_F_AGGRCNV which increases the maximum file size of a file * beyond the zone size is taken into account. 
*/ sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) return -ENOMEM; spin_lock_init(&sbi->s_lock); sb->s_fs_info = sbi; sb->s_magic = ZONEFS_MAGIC; sb->s_maxbytes = 0; sb->s_op = &zonefs_sops; sb->s_time_gran = 1; /* * The block size is set to the device zone write granularity to ensure * that write operations are always aligned according to the device * interface constraints. */ sb_set_blocksize(sb, bdev_zone_write_granularity(sb->s_bdev)); sbi->s_zone_sectors_shift = ilog2(bdev_zone_sectors(sb->s_bdev)); sbi->s_uid = GLOBAL_ROOT_UID; sbi->s_gid = GLOBAL_ROOT_GID; sbi->s_perm = 0640; sbi->s_mount_opts = ctx->s_mount_opts; atomic_set(&sbi->s_wro_seq_files, 0); sbi->s_max_wro_seq_files = bdev_max_open_zones(sb->s_bdev); atomic_set(&sbi->s_active_seq_files, 0); sbi->s_max_active_seq_files = bdev_max_active_zones(sb->s_bdev); ret = zonefs_read_super(sb); if (ret) return ret; zonefs_info(sb, "Mounting %u zones", bdev_nr_zones(sb->s_bdev)); if (!sbi->s_max_wro_seq_files && !sbi->s_max_active_seq_files && sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN) { zonefs_info(sb, "No open and active zone limits. Ignoring explicit_open mount option\n"); sbi->s_mount_opts &= ~ZONEFS_MNTOPT_EXPLICIT_OPEN; } /* Initialize the zone groups */ ret = zonefs_init_zgroups(sb); if (ret) goto cleanup; /* Create the root directory inode */ ret = -ENOMEM; inode = new_inode(sb); if (!inode) goto cleanup; inode->i_ino = bdev_nr_zones(sb->s_bdev); inode->i_mode = S_IFDIR | 0555; simple_inode_init_ts(inode); inode->i_op = &zonefs_dir_inode_operations; inode->i_fop = &zonefs_dir_operations; inode->i_size = 2; set_nlink(inode, 2); for (ztype = 0; ztype < ZONEFS_ZTYPE_MAX; ztype++) { if (sbi->s_zgroup[ztype].g_nr_zones) { inc_nlink(inode); inode->i_size++; } } sb->s_root = d_make_root(inode); if (!sb->s_root) goto cleanup; /* * Take a reference on the zone groups directory inodes * to keep them in the inode cache. */ ret = zonefs_get_zgroup_inodes(sb); if (ret) goto cleanup; ret = zonefs_sysfs_register(sb); if (ret) goto cleanup; return 0; cleanup: zonefs_release_zgroup_inodes(sb); zonefs_free_zgroups(sb); return ret; } static void zonefs_kill_super(struct super_block *sb) { struct zonefs_sb_info *sbi = ZONEFS_SB(sb); /* Release the reference on the zone group directory inodes */ zonefs_release_zgroup_inodes(sb); kill_block_super(sb); zonefs_sysfs_unregister(sb); zonefs_free_zgroups(sb); kfree(sbi); } static void zonefs_free_fc(struct fs_context *fc) { struct zonefs_context *ctx = fc->fs_private; kfree(ctx); } static int zonefs_get_tree(struct fs_context *fc) { return get_tree_bdev(fc, zonefs_fill_super); } static int zonefs_reconfigure(struct fs_context *fc) { struct zonefs_context *ctx = fc->fs_private; struct super_block *sb = fc->root->d_sb; struct zonefs_sb_info *sbi = sb->s_fs_info; sync_filesystem(fc->root->d_sb); /* Copy new options from ctx into sbi. */ sbi->s_mount_opts = ctx->s_mount_opts; return 0; } static const struct fs_context_operations zonefs_context_ops = { .parse_param = zonefs_parse_param, .get_tree = zonefs_get_tree, .reconfigure = zonefs_reconfigure, .free = zonefs_free_fc, }; /* * Set up the filesystem mount context. */ static int zonefs_init_fs_context(struct fs_context *fc) { struct zonefs_context *ctx; ctx = kzalloc(sizeof(struct zonefs_context), GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->s_mount_opts = ZONEFS_MNTOPT_ERRORS_RO; fc->ops = &zonefs_context_ops; fc->fs_private = ctx; return 0; } /* * File system definition and registration. 
*/ static struct file_system_type zonefs_type = { .owner = THIS_MODULE, .name = "zonefs", .kill_sb = zonefs_kill_super, .fs_flags = FS_REQUIRES_DEV, .init_fs_context = zonefs_init_fs_context, .parameters = zonefs_param_spec, }; static int __init zonefs_init_inodecache(void) { zonefs_inode_cachep = kmem_cache_create("zonefs_inode_cache", sizeof(struct zonefs_inode_info), 0, SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT, NULL); if (zonefs_inode_cachep == NULL) return -ENOMEM; return 0; } static void zonefs_destroy_inodecache(void) { /* * Make sure all delayed rcu free inodes are flushed before we * destroy the inode cache. */ rcu_barrier(); kmem_cache_destroy(zonefs_inode_cachep); } static int __init zonefs_init(void) { int ret; BUILD_BUG_ON(sizeof(struct zonefs_super) != ZONEFS_SUPER_SIZE); ret = zonefs_init_inodecache(); if (ret) return ret; ret = zonefs_sysfs_init(); if (ret) goto destroy_inodecache; ret = register_filesystem(&zonefs_type); if (ret) goto sysfs_exit; return 0; sysfs_exit: zonefs_sysfs_exit(); destroy_inodecache: zonefs_destroy_inodecache(); return ret; } static void __exit zonefs_exit(void) { unregister_filesystem(&zonefs_type); zonefs_sysfs_exit(); zonefs_destroy_inodecache(); } MODULE_AUTHOR("Damien Le Moal"); MODULE_DESCRIPTION("Zone file system for zoned block devices"); MODULE_LICENSE("GPL"); MODULE_ALIAS_FS("zonefs"); module_init(zonefs_init); module_exit(zonefs_exit); |
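/*
 * Illustrative sketch (not part of zonefs): a small user-space program that
 * mirrors the inode numbering used by zonefs_get_file_inode(),
 * zonefs_get_zgroup_inode() and zonefs_fill_super() above -- a zone file's
 * inode number is its zone index, the root directory uses the device zone
 * count, and the zone group directories follow it. The device geometry and
 * the zone type values (CNV = 0, SEQ = 1) are assumptions made for the
 * example, not values read from a real device.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t zone_sectors = 524288;		/* hypothetical 256 MiB zones (512 B sectors) */
	unsigned int zone_sectors_shift = 19;	/* ilog2(zone_sectors), as computed in zonefs_fill_super() */
	unsigned int nr_zones = 1000;		/* hypothetical number of zones on the device */

	/* Zone file inode number: zone start sector shifted down to a zone index. */
	uint64_t zone_start_sector = 42 * zone_sectors;
	uint64_t file_ino = zone_start_sector >> zone_sectors_shift;

	/* Root directory and zone group directory inode numbers. */
	uint64_t root_ino = nr_zones;
	uint64_t cnv_dir_ino = (uint64_t)nr_zones + 0 + 1;
	uint64_t seq_dir_ino = (uint64_t)nr_zones + 1 + 1;

	printf("zone 42 file ino=%llu root ino=%llu cnv dir ino=%llu seq dir ino=%llu\n",
	       (unsigned long long)file_ino, (unsigned long long)root_ino,
	       (unsigned long long)cnv_dir_ino, (unsigned long long)seq_dir_ino);
	return 0;
}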
// SPDX-License-Identifier: GPL-2.0-or-later /* * Generic parts * Linux ethernet bridge * * Authors: * Lennert Buytenhek <buytenh@gnu.org> */ #include <linux/module.h> #include <linux/kernel.h> #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/init.h> #include <linux/llc.h> #include <net/llc.h> #include <net/stp.h> #include <net/switchdev.h> #include "br_private.h" /* * Handle changes in state of network devices enslaved to a bridge. * * Note: don't care about up/down if bridge itself is down, because * port state is checked when bridge is brought up.
*/ static int br_device_event(struct notifier_block *unused, unsigned long event, void *ptr) { struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr); struct netdev_notifier_pre_changeaddr_info *prechaddr_info; struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct net_bridge_port *p; struct net_bridge *br; bool notified = false; bool changed_addr; int err; if (netif_is_bridge_master(dev)) { err = br_vlan_bridge_event(dev, event, ptr); if (err) return notifier_from_errno(err); if (event == NETDEV_REGISTER) { /* register of bridge completed, add sysfs entries */ err = br_sysfs_addbr(dev); if (err) return notifier_from_errno(err); return NOTIFY_DONE; } } /* not a port of a bridge */ p = br_port_get_rtnl(dev); if (!p) return NOTIFY_DONE; br = p->br; switch (event) { case NETDEV_CHANGEMTU: br_mtu_auto_adjust(br); break; case NETDEV_PRE_CHANGEADDR: if (br->dev->addr_assign_type == NET_ADDR_SET) break; prechaddr_info = ptr; err = dev_pre_changeaddr_notify(br->dev, prechaddr_info->dev_addr, extack); if (err) return notifier_from_errno(err); break; case NETDEV_CHANGEADDR: spin_lock_bh(&br->lock); br_fdb_changeaddr(p, dev->dev_addr); changed_addr = br_stp_recalculate_bridge_id(br); spin_unlock_bh(&br->lock); if (changed_addr) call_netdevice_notifiers(NETDEV_CHANGEADDR, br->dev); break; case NETDEV_CHANGE: br_port_carrier_check(p, &notified); break; case NETDEV_FEAT_CHANGE: netdev_update_features(br->dev); break; case NETDEV_DOWN: spin_lock_bh(&br->lock); if (br->dev->flags & IFF_UP) { br_stp_disable_port(p); notified = true; } spin_unlock_bh(&br->lock); break; case NETDEV_UP: if (netif_running(br->dev) && netif_oper_up(dev)) { spin_lock_bh(&br->lock); br_stp_enable_port(p); notified = true; spin_unlock_bh(&br->lock); } break; case NETDEV_UNREGISTER: br_del_if(br, dev); break; case NETDEV_CHANGENAME: err = br_sysfs_renameif(p); if (err) return notifier_from_errno(err); break; case NETDEV_PRE_TYPE_CHANGE: /* Forbid underlying device to change its type.
*/ return NOTIFY_BAD; case NETDEV_RESEND_IGMP: /* Propagate to master device */ call_netdevice_notifiers(event, br->dev); break; } if (event != NETDEV_UNREGISTER) br_vlan_port_event(p, event); /* Events that may cause spanning tree to refresh */ if (!notified && (event == NETDEV_CHANGEADDR || event == NETDEV_UP || event == NETDEV_CHANGE || event == NETDEV_DOWN)) br_ifinfo_notify(RTM_NEWLINK, NULL, p); return NOTIFY_DONE; } static struct notifier_block br_device_notifier = { .notifier_call = br_device_event }; /* called with RTNL or RCU */ static int br_switchdev_event(struct notifier_block *unused, unsigned long event, void *ptr) { struct net_device *dev = switchdev_notifier_info_to_dev(ptr); struct net_bridge_port *p; struct net_bridge *br; struct switchdev_notifier_fdb_info *fdb_info; int err = NOTIFY_DONE; p = br_port_get_rtnl_rcu(dev); if (!p) goto out; br = p->br; switch (event) { case SWITCHDEV_FDB_ADD_TO_BRIDGE: fdb_info = ptr; err = br_fdb_external_learn_add(br, p, fdb_info->addr, fdb_info->vid, fdb_info->locked, false); if (err) { err = notifier_from_errno(err); break; } br_fdb_offloaded_set(br, p, fdb_info->addr, fdb_info->vid, fdb_info->offloaded); break; case SWITCHDEV_FDB_DEL_TO_BRIDGE: fdb_info = ptr; err = br_fdb_external_learn_del(br, p, fdb_info->addr, fdb_info->vid, false); if (err) err = notifier_from_errno(err); break; case SWITCHDEV_FDB_OFFLOADED: fdb_info = ptr; br_fdb_offloaded_set(br, p, fdb_info->addr, fdb_info->vid, fdb_info->offloaded); break; case SWITCHDEV_FDB_FLUSH_TO_BRIDGE: fdb_info = ptr; /* Don't delete static entries */ br_fdb_delete_by_port(br, p, fdb_info->vid, 0); break; } out: return err; } static struct notifier_block br_switchdev_notifier = { .notifier_call = br_switchdev_event, }; /* called under rtnl_mutex */ static int br_switchdev_blocking_event(struct notifier_block *nb, unsigned long event, void *ptr) { struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr); struct net_device *dev = switchdev_notifier_info_to_dev(ptr); struct switchdev_notifier_brport_info *brport_info; const struct switchdev_brport *b; struct net_bridge_port *p; int err = NOTIFY_DONE; p = br_port_get_rtnl(dev); if (!p) goto out; switch (event) { case SWITCHDEV_BRPORT_OFFLOADED: brport_info = ptr; b = &brport_info->brport; err = br_switchdev_port_offload(p, b->dev, b->ctx, b->atomic_nb, b->blocking_nb, b->tx_fwd_offload, extack); err = notifier_from_errno(err); break; case SWITCHDEV_BRPORT_UNOFFLOADED: brport_info = ptr; b = &brport_info->brport; br_switchdev_port_unoffload(p, b->ctx, b->atomic_nb, b->blocking_nb); break; case SWITCHDEV_BRPORT_REPLAY: brport_info = ptr; b = &brport_info->brport; err = br_switchdev_port_replay(p, b->dev, b->ctx, b->atomic_nb, b->blocking_nb, extack); err = notifier_from_errno(err); break; } out: return err; } static struct notifier_block br_switchdev_blocking_notifier = { .notifier_call = br_switchdev_blocking_event, }; /* br_boolopt_toggle - change user-controlled boolean option * * @br: bridge device * @opt: id of the option to change * @on: new option value * @extack: extack for error messages * * Changes the value of the respective boolean option to @on taking care of * any internal option value mapping and configuration. 
*/ int br_boolopt_toggle(struct net_bridge *br, enum br_boolopt_id opt, bool on, struct netlink_ext_ack *extack) { int err = 0; switch (opt) { case BR_BOOLOPT_NO_LL_LEARN: br_opt_toggle(br, BROPT_NO_LL_LEARN, on); break; case BR_BOOLOPT_MCAST_VLAN_SNOOPING: err = br_multicast_toggle_vlan_snooping(br, on, extack); break; case BR_BOOLOPT_MST_ENABLE: err = br_mst_set_enabled(br, on, extack); break; default: /* shouldn't be called with unsupported options */ WARN_ON(1); break; } return err; } int br_boolopt_get(const struct net_bridge *br, enum br_boolopt_id opt) { switch (opt) { case BR_BOOLOPT_NO_LL_LEARN: return br_opt_get(br, BROPT_NO_LL_LEARN); case BR_BOOLOPT_MCAST_VLAN_SNOOPING: return br_opt_get(br, BROPT_MCAST_VLAN_SNOOPING_ENABLED); case BR_BOOLOPT_MST_ENABLE: return br_opt_get(br, BROPT_MST_ENABLED); default: /* shouldn't be called with unsupported options */ WARN_ON(1); break; } return 0; } int br_boolopt_multi_toggle(struct net_bridge *br, struct br_boolopt_multi *bm, struct netlink_ext_ack *extack) { unsigned long bitmap = bm->optmask; int err = 0; int opt_id; for_each_set_bit(opt_id, &bitmap, BR_BOOLOPT_MAX) { bool on = !!(bm->optval & BIT(opt_id)); err = br_boolopt_toggle(br, opt_id, on, extack); if (err) { br_debug(br, "boolopt multi-toggle error: option: %d current: %d new: %d error: %d\n", opt_id, br_boolopt_get(br, opt_id), on, err); break; } } return err; } void br_boolopt_multi_get(const struct net_bridge *br, struct br_boolopt_multi *bm) { u32 optval = 0; int opt_id; for (opt_id = 0; opt_id < BR_BOOLOPT_MAX; opt_id++) optval |= (br_boolopt_get(br, opt_id) << opt_id); bm->optval = optval; bm->optmask = GENMASK((BR_BOOLOPT_MAX - 1), 0); } /* private bridge options, controlled by the kernel */ void br_opt_toggle(struct net_bridge *br, enum net_bridge_opts opt, bool on) { bool cur = !!br_opt_get(br, opt); br_debug(br, "toggle option: %d state: %d -> %d\n", opt, cur, on); if (cur == on) return; if (on) set_bit(opt, &br->options); else clear_bit(opt, &br->options); } static void __net_exit br_net_exit_batch_rtnl(struct list_head *net_list, struct list_head *dev_to_kill) { struct net_device *dev; struct net *net; ASSERT_RTNL(); list_for_each_entry(net, net_list, exit_list) for_each_netdev(net, dev) if (netif_is_bridge_master(dev)) br_dev_delete(dev, dev_to_kill); } static struct pernet_operations br_net_ops = { .exit_batch_rtnl = br_net_exit_batch_rtnl, }; static const struct stp_proto br_stp_proto = { .rcv = br_stp_rcv, }; static int __init br_init(void) { int err; BUILD_BUG_ON(sizeof(struct br_input_skb_cb) > sizeof_field(struct sk_buff, cb)); err = stp_proto_register(&br_stp_proto); if (err < 0) { pr_err("bridge: can't register sap for STP\n"); return err; } err = br_fdb_init(); if (err) goto err_out; err = register_pernet_subsys(&br_net_ops); if (err) goto err_out1; err = br_nf_core_init(); if (err) goto err_out2; err = register_netdevice_notifier(&br_device_notifier); if (err) goto err_out3; err = register_switchdev_notifier(&br_switchdev_notifier); if (err) goto err_out4; err = register_switchdev_blocking_notifier(&br_switchdev_blocking_notifier); if (err) goto err_out5; err = br_netlink_init(); if (err) goto err_out6; brioctl_set(br_ioctl_stub); #if IS_ENABLED(CONFIG_ATM_LANE) br_fdb_test_addr_hook = br_fdb_test_addr; #endif #if IS_MODULE(CONFIG_BRIDGE_NETFILTER) pr_info("bridge: filtering via arp/ip/ip6tables is no longer available " "by default. 
Update your scripts to load br_netfilter if you " "need this.\n"); #endif return 0; err_out6: unregister_switchdev_blocking_notifier(&br_switchdev_blocking_notifier); err_out5: unregister_switchdev_notifier(&br_switchdev_notifier); err_out4: unregister_netdevice_notifier(&br_device_notifier); err_out3: br_nf_core_fini(); err_out2: unregister_pernet_subsys(&br_net_ops); err_out1: br_fdb_fini(); err_out: stp_proto_unregister(&br_stp_proto); return err; } static void __exit br_deinit(void) { stp_proto_unregister(&br_stp_proto); br_netlink_fini(); unregister_switchdev_blocking_notifier(&br_switchdev_blocking_notifier); unregister_switchdev_notifier(&br_switchdev_notifier); unregister_netdevice_notifier(&br_device_notifier); brioctl_set(NULL); unregister_pernet_subsys(&br_net_ops); rcu_barrier(); /* Wait for completion of call_rcu()'s */ br_nf_core_fini(); #if IS_ENABLED(CONFIG_ATM_LANE) br_fdb_test_addr_hook = NULL; #endif br_fdb_fini(); } module_init(br_init) module_exit(br_deinit) MODULE_LICENSE("GPL"); MODULE_VERSION(BR_VERSION); MODULE_ALIAS_RTNL_LINK("bridge"); MODULE_DESCRIPTION("Ethernet bridge driver"); |
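/*
 * Illustrative sketch (not part of the bridge module): a user-space view of
 * the optmask/optval encoding consumed by br_boolopt_multi_toggle() above.
 * optmask selects which boolean options a request touches and optval carries
 * the new values at the same bit positions; options whose mask bit is clear
 * are left untouched. The option IDs 0..2 mirror enum br_boolopt_id only as
 * an assumption for the example.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t optmask = 0, optval = 0;

	/* Request: switch option 0 off and option 2 on, do not touch option 1. */
	optmask |= (1u << 0) | (1u << 2);
	optval  |= (1u << 2);

	for (int opt = 0; opt < 3; opt++) {
		if (!(optmask & (1u << opt)))
			continue;	/* not selected, like for_each_set_bit() skipping clear bits */
		printf("option %d -> %s\n", opt, (optval & (1u << opt)) ? "on" : "off");
	}
	return 0;
}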
// SPDX-License-Identifier: GPL-2.0 /* * Parts of this file are * Copyright (C) 2022-2023 Intel Corporation */ #include <linux/ieee80211.h> #include <linux/export.h> #include <net/cfg80211.h> #include "nl80211.h" #include "core.h" #include "rdev-ops.h" static int ___cfg80211_stop_ap(struct cfg80211_registered_device *rdev, struct net_device *dev, unsigned int link_id, bool notify) { struct wireless_dev *wdev = dev->ieee80211_ptr; int err; lockdep_assert_wiphy(wdev->wiphy); if (!rdev->ops->stop_ap) return -EOPNOTSUPP; if (dev->ieee80211_ptr->iftype != NL80211_IFTYPE_AP && dev->ieee80211_ptr->iftype != NL80211_IFTYPE_P2P_GO) return -EOPNOTSUPP; if (!wdev->links[link_id].ap.beacon_interval) return -ENOENT; err = rdev_stop_ap(rdev, dev, link_id); if (!err) { wdev->conn_owner_nlportid = 0; wdev->links[link_id].ap.beacon_interval = 0; memset(&wdev->links[link_id].ap.chandef, 0, sizeof(wdev->links[link_id].ap.chandef)); wdev->u.ap.ssid_len = 0; rdev_set_qos_map(rdev, dev, NULL); if (notify) nl80211_send_ap_stopped(wdev, link_id); /* Should we apply the grace period during beaconing interface * shutdown also? */ cfg80211_sched_dfs_chan_update(rdev); } schedule_work(&cfg80211_disconnect_work); return err; } int cfg80211_stop_ap(struct cfg80211_registered_device *rdev, struct net_device *dev, int link_id, bool notify) { unsigned int link; int ret = 0; if (link_id >= 0) return ___cfg80211_stop_ap(rdev, dev, link_id, notify); for_each_valid_link(dev->ieee80211_ptr, link) { int ret1 = ___cfg80211_stop_ap(rdev, dev, link, notify); if (ret1) ret = ret1; /* try the next one also if one errored */ } return ret; } |
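/*
 * Illustrative sketch (not part of cfg80211): the error-aggregation pattern
 * used by cfg80211_stop_ap() above when called with a negative link_id --
 * every link is attempted and the last non-zero return value is kept, so one
 * failing link does not stop the remaining links from being processed.
 * stop_one_link() and the link list are hypothetical stand-ins.
 */
#include <stdio.h>

static int stop_one_link(unsigned int link)
{
	return link == 1 ? -5 : 0;	/* pretend link 1 fails with -EIO */
}

int main(void)
{
	unsigned int links[] = { 0, 1, 2 };
	int ret = 0;

	for (unsigned int i = 0; i < sizeof(links) / sizeof(links[0]); i++) {
		int ret1 = stop_one_link(links[i]);

		if (ret1)
			ret = ret1;	/* remember the failure, but keep going */
	}
	printf("aggregate return value: %d\n", ret);
	return 0;
}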
/* SPDX-License-Identifier: GPL-2.0 */ #ifndef _ASM_X86_CPUFEATURE_H #define _ASM_X86_CPUFEATURE_H #include <asm/processor.h> #if defined(__KERNEL__) && !defined(__ASSEMBLY__) #include <asm/asm.h> #include <linux/bitops.h> #include <asm/alternative.h> enum cpuid_leafs { CPUID_1_EDX = 0, CPUID_8000_0001_EDX, CPUID_8086_0001_EDX, CPUID_LNX_1, CPUID_1_ECX, CPUID_C000_0001_EDX, CPUID_8000_0001_ECX, CPUID_LNX_2, CPUID_LNX_3, CPUID_7_0_EBX, CPUID_D_1_EAX, CPUID_LNX_4, CPUID_7_1_EAX, CPUID_8000_0008_EBX, CPUID_6_EAX, CPUID_8000_000A_EDX, CPUID_7_ECX, CPUID_8000_0007_EBX, CPUID_7_EDX, CPUID_8000_001F_EAX, CPUID_8000_0021_EAX, CPUID_LNX_5, NR_CPUID_WORDS, }; #define X86_CAP_FMT_NUM "%d:%d" #define x86_cap_flag_num(flag) ((flag) >> 5), ((flag) & 31) extern const char * const x86_cap_flags[NCAPINTS*32]; extern const char * const x86_power_flags[32]; #define X86_CAP_FMT "%s" #define x86_cap_flag(flag) x86_cap_flags[flag] /* * In order to save room, we index into this array by doing * X86_BUG_<name> - NCAPINTS*32. */ extern const char * const x86_bug_flags[NBUGINTS*32]; #define test_cpu_cap(c, bit) \ arch_test_bit(bit, (unsigned long *)((c)->x86_capability)) /* * There are 32 bits/features in each mask word. The high bits * (selected with (bit>>5)) give us the word number and the low 5 * bits give us the bit/feature number inside the word. * (1UL<<((bit)&31)) gives us a mask for the feature_bit so we can * see if it is set in the mask word. */ #define CHECK_BIT_IN_MASK_WORD(maskname, word, bit) \ (((bit)>>5)==(word) && (1UL<<((bit)&31) & maskname##word )) /* * {REQUIRED,DISABLED}_MASK_CHECK below may seem duplicated with the * following BUILD_BUG_ON_ZERO() check but when NCAPINTS gets changed, all * header macros which use NCAPINTS need to be changed. The duplicated macro * use causes the compiler to issue errors for all headers so that all usage * sites can be corrected.
*/ #define REQUIRED_MASK_BIT_SET(feature_bit) \ ( CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 0, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 1, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 2, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 3, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 4, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 5, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 6, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 7, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 8, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 9, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 10, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 11, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 12, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 13, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 14, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 15, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 19, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 20, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 21, feature_bit) || \ REQUIRED_MASK_CHECK || \ BUILD_BUG_ON_ZERO(NCAPINTS != 22)) #define DISABLED_MASK_BIT_SET(feature_bit) \ ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 0, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 1, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 2, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 3, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 4, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 5, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 6, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 7, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 8, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 9, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 10, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 11, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 12, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 13, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 14, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 15, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 19, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 20, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 21, feature_bit) || \ DISABLED_MASK_CHECK || \ BUILD_BUG_ON_ZERO(NCAPINTS != 22)) #define cpu_has(c, bit) \ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \ test_cpu_cap(c, bit)) #define this_cpu_has(bit) \ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \ x86_this_cpu_test_bit(bit, cpu_info.x86_capability)) /* * This macro is for detection of features which need kernel * infrastructure to be used. It may *not* directly test the CPU * itself. Use the cpu_has() family if you want true runtime * testing of CPU features, like in hypervisor code where you are * supporting a possible guest feature where host support for it * is not relevant. */ #define cpu_feature_enabled(bit) \ (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 
0 : static_cpu_has(bit)) #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit) #define set_cpu_cap(c, bit) set_bit(bit, (unsigned long *)((c)->x86_capability)) extern void setup_clear_cpu_cap(unsigned int bit); extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit); #define setup_force_cpu_cap(bit) do { \ \ if (!boot_cpu_has(bit)) \ WARN_ON(alternatives_patched); \ \ set_cpu_cap(&boot_cpu_data, bit); \ set_bit(bit, (unsigned long *)cpu_caps_set); \ } while (0) #define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit) /* * Static testing of CPU features. Used the same as boot_cpu_has(). It * statically patches the target code for additional performance. Use * static_cpu_has() only in fast paths, where every cycle counts. Which * means that the boot_cpu_has() variant is already fast enough for the * majority of cases and you should stick to using it as it is generally * only two instructions: a RIP-relative MOV and a TEST. * * Do not use an "m" constraint for [cap_byte] here: gcc doesn't know * that this is only used on a fallback path and will sometimes cause * it to manifest the address of boot_cpu_data in a register, fouling * the mainline (post-initialization) code. */ static __always_inline bool _static_cpu_has(u16 bit) { asm goto(ALTERNATIVE_TERNARY("jmp 6f", %c[feature], "", "jmp %l[t_no]") ".pushsection .altinstr_aux,\"ax\"\n" "6:\n" " testb %[bitnum], %a[cap_byte]\n" " jnz %l[t_yes]\n" " jmp %l[t_no]\n" ".popsection\n" : : [feature] "i" (bit), [bitnum] "i" (1 << (bit & 7)), [cap_byte] "i" (&((const char *)boot_cpu_data.x86_capability)[bit >> 3]) : : t_yes, t_no); t_yes: return true; t_no: return false; } #define static_cpu_has(bit) \ ( \ __builtin_constant_p(boot_cpu_has(bit)) ? \ boot_cpu_has(bit) : \ _static_cpu_has(bit) \ ) #define cpu_has_bug(c, bit) cpu_has(c, (bit)) #define set_cpu_bug(c, bit) set_cpu_cap(c, (bit)) #define clear_cpu_bug(c, bit) clear_cpu_cap(c, (bit)) #define static_cpu_has_bug(bit) static_cpu_has((bit)) #define boot_cpu_has_bug(bit) cpu_has_bug(&boot_cpu_data, (bit)) #define boot_cpu_set_bug(bit) set_cpu_cap(&boot_cpu_data, (bit)) #define MAX_CPU_FEATURES (NCAPINTS * 32) #define cpu_have_feature boot_cpu_has #define CPU_FEATURE_TYPEFMT "x86,ven%04Xfam%04Xmod%04X" #define CPU_FEATURE_TYPEVAL boot_cpu_data.x86_vendor, boot_cpu_data.x86, \ boot_cpu_data.x86_model #endif /* defined(__KERNEL__) && !defined(__ASSEMBLY__) */ #endif /* _ASM_X86_CPUFEATURE_H */ |
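/*
 * Illustrative sketch (not part of the header above): the 32-bit word/bit
 * split that x86_cap_flag_num() and CHECK_BIT_IN_MASK_WORD() rely on -- a
 * feature number is its capability word index times 32 plus the bit position
 * inside that word. The feature value below is made up for the example, not
 * a real X86_FEATURE_* constant.
 */
#include <stdio.h>

int main(void)
{
	unsigned int feature = 4 * 32 + 7;	/* hypothetical: bit 7 of capability word 4 */
	unsigned int word = feature >> 5;	/* which x86_capability[] word */
	unsigned int bit = feature & 31;	/* which bit inside that word */
	unsigned long mask = 1UL << bit;	/* the mask CHECK_BIT_IN_MASK_WORD() tests */

	printf("feature %u -> word %u, bit %u, mask 0x%08lx\n", feature, word, bit, mask);
	return 0;
}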
// SPDX-License-Identifier: GPL-2.0-only /* * kernel/freezer.c - Function to freeze a process * * Originally from kernel/power/process.c */ #include <linux/interrupt.h> #include <linux/suspend.h> #include <linux/export.h> #include <linux/syscalls.h> #include <linux/freezer.h> #include <linux/kthread.h> /* total number of freezing conditions in effect */ DEFINE_STATIC_KEY_FALSE(freezer_active); EXPORT_SYMBOL(freezer_active); /* * indicate whether PM freezing is in effect, protected by * system_transition_mutex */ bool pm_freezing; bool pm_nosig_freezing; /* protects freezing and frozen transitions */ static DEFINE_SPINLOCK(freezer_lock); /** * freezing_slow_path - slow path for testing whether a task needs to be frozen * @p: task to be tested * * This function is called by freezing() if freezer_active isn't zero * and tests whether @p needs to enter and stay in frozen state. Can be * called under any context. The freezers are responsible for ensuring the * target tasks see the updated state. */ bool freezing_slow_path(struct task_struct *p) { if (p->flags & (PF_NOFREEZE | PF_SUSPEND_TASK)) return false; if (test_tsk_thread_flag(p, TIF_MEMDIE)) return false; if (pm_nosig_freezing || cgroup_freezing(p)) return true; if (pm_freezing && !(p->flags & PF_KTHREAD)) return true; return false; } EXPORT_SYMBOL(freezing_slow_path); bool frozen(struct task_struct *p) { return READ_ONCE(p->__state) & TASK_FROZEN; } /* Refrigerator is the place where frozen processes are stored :-).
*/ bool __refrigerator(bool check_kthr_stop) { unsigned int state = get_current_state(); bool was_frozen = false; pr_debug("%s entered refrigerator\n", current->comm); WARN_ON_ONCE(state && !(state & TASK_NORMAL)); for (;;) { bool freeze; raw_spin_lock_irq(&current->pi_lock); set_current_state(TASK_FROZEN); /* unstale saved_state so that __thaw_task() will wake us up */ current->saved_state = TASK_RUNNING; raw_spin_unlock_irq(&current->pi_lock); spin_lock_irq(&freezer_lock); freeze = freezing(current) && !(check_kthr_stop && kthread_should_stop()); spin_unlock_irq(&freezer_lock); if (!freeze) break; was_frozen = true; schedule(); } __set_current_state(TASK_RUNNING); pr_debug("%s left refrigerator\n", current->comm); return was_frozen; } EXPORT_SYMBOL(__refrigerator); static void fake_signal_wake_up(struct task_struct *p) { unsigned long flags; if (lock_task_sighand(p, &flags)) { signal_wake_up(p, 0); unlock_task_sighand(p, &flags); } } static int __set_task_frozen(struct task_struct *p, void *arg) { unsigned int state = READ_ONCE(p->__state); if (p->on_rq) return 0; if (p != current && task_curr(p)) return 0; if (!(state & (TASK_FREEZABLE | __TASK_STOPPED | __TASK_TRACED))) return 0; /* * Only TASK_NORMAL can be augmented with TASK_FREEZABLE, since they * can suffer spurious wakeups. */ if (state & TASK_FREEZABLE) WARN_ON_ONCE(!(state & TASK_NORMAL)); #ifdef CONFIG_LOCKDEP /* * It's dangerous to freeze with locks held; there be dragons there. */ if (!(state & __TASK_FREEZABLE_UNSAFE)) WARN_ON_ONCE(debug_locks && p->lockdep_depth); #endif p->saved_state = p->__state; WRITE_ONCE(p->__state, TASK_FROZEN); return TASK_FROZEN; } static bool __freeze_task(struct task_struct *p) { /* TASK_FREEZABLE|TASK_STOPPED|TASK_TRACED -> TASK_FROZEN */ return task_call_func(p, __set_task_frozen, NULL); } /** * freeze_task - send a freeze request to given task * @p: task to send the request to * * If @p is freezing, the freeze request is sent either by sending a fake * signal (if it's not a kernel thread) or waking it up (if it's a kernel * thread). * * RETURNS: * %false, if @p is not freezing or already frozen; %true, otherwise */ bool freeze_task(struct task_struct *p) { unsigned long flags; spin_lock_irqsave(&freezer_lock, flags); if (!freezing(p) || frozen(p) || __freeze_task(p)) { spin_unlock_irqrestore(&freezer_lock, flags); return false; } if (!(p->flags & PF_KTHREAD)) fake_signal_wake_up(p); else wake_up_state(p, TASK_NORMAL); spin_unlock_irqrestore(&freezer_lock, flags); return true; } /* * Restore the saved_state before the task entered freezer. For typical task * in the __refrigerator(), saved_state == TASK_RUNNING so nothing happens * here. For tasks which were TASK_NORMAL | TASK_FREEZABLE, their initial state * is restored unless they got an expected wakeup (see ttwu_state_match()). * Returns 1 if the task state was restored. */ static int __restore_freezer_state(struct task_struct *p, void *arg) { unsigned int state = p->saved_state; if (state != TASK_RUNNING) { WRITE_ONCE(p->__state, state); p->saved_state = TASK_RUNNING; return 1; } return 0; } void __thaw_task(struct task_struct *p) { unsigned long flags; spin_lock_irqsave(&freezer_lock, flags); if (WARN_ON_ONCE(freezing(p))) goto unlock; if (!frozen(p) || task_call_func(p, __restore_freezer_state, NULL)) goto unlock; wake_up_state(p, TASK_FROZEN); unlock: spin_unlock_irqrestore(&freezer_lock, flags); } /** * set_freezable - make %current freezable * * Mark %current freezable and enter refrigerator if necessary.
*/ bool set_freezable(void) { might_sleep(); /* * Modify flags while holding freezer_lock. This ensures the * freezer notices that we aren't frozen yet or the freezing * condition is visible to try_to_freeze() below. */ spin_lock_irq(&freezer_lock); current->flags &= ~PF_NOFREEZE; spin_unlock_irq(&freezer_lock); return try_to_freeze(); } EXPORT_SYMBOL(set_freezable); |
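/*
 * Illustrative sketch (not part of kernel/freezer.c): how a kernel thread
 * typically consumes the API above -- it marks itself freezable once with
 * set_freezable() and then calls try_to_freeze() in its main loop so that it
 * parks in __refrigerator() whenever a freeze is in progress. This is a
 * pattern sketch under the assumption of a trivial periodic worker; the
 * module and thread names are made up.
 */
#include <linux/delay.h>
#include <linux/err.h>
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/module.h>

static struct task_struct *example_task;

static int example_thread_fn(void *data)
{
	set_freezable();	/* clear PF_NOFREEZE so the freezer considers this thread */

	while (!kthread_should_stop()) {
		try_to_freeze();	/* enters __refrigerator() while freezing() is true */

		/* ... one unit of work per iteration ... */
		msleep_interruptible(1000);
	}
	return 0;
}

static int __init example_init(void)
{
	example_task = kthread_run(example_thread_fn, NULL, "example_freezable");
	return PTR_ERR_OR_ZERO(example_task);
}

static void __exit example_exit(void)
{
	kthread_stop(example_task);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Freezable kthread usage sketch");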
786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 
1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 
2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 
// SPDX-License-Identifier: GPL-2.0-only /* * Shared Memory
Communications over RDMA (SMC-R) and RoCE * * AF_SMC protocol family socket handler keeping the AF_INET sock address type * applies to SOCK_STREAM sockets only * offers an alternative communication option for TCP-protocol sockets * applicable with RoCE-cards only * * Initial restrictions: * - support for alternate links postponed * * Copyright IBM Corp. 2016, 2018 * * Author(s): Ursula Braun <ubraun@linux.vnet.ibm.com> * based on prototype from Frank Blaschka */ #define KMSG_COMPONENT "smc" #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt #include <linux/module.h> #include <linux/socket.h> #include <linux/workqueue.h> #include <linux/in.h> #include <linux/sched/signal.h> #include <linux/if_vlan.h> #include <linux/rcupdate_wait.h> #include <linux/ctype.h> #include <linux/splice.h> #include <net/sock.h> #include <net/tcp.h> #include <net/smc.h> #include <asm/ioctls.h> #include <net/net_namespace.h> #include <net/netns/generic.h> #include "smc_netns.h" #include "smc.h" #include "smc_clc.h" #include "smc_llc.h" #include "smc_cdc.h" #include "smc_core.h" #include "smc_ib.h" #include "smc_ism.h" #include "smc_pnet.h" #include "smc_netlink.h" #include "smc_tx.h" #include "smc_rx.h" #include "smc_close.h" #include "smc_stats.h" #include "smc_tracepoint.h" #include "smc_sysctl.h" #include "smc_loopback.h" #include "smc_inet.h" static DEFINE_MUTEX(smc_server_lgr_pending); /* serialize link group * creation on server */ static DEFINE_MUTEX(smc_client_lgr_pending); /* serialize link group * creation on client */ static struct workqueue_struct *smc_tcp_ls_wq; /* wq for tcp listen work */ struct workqueue_struct *smc_hs_wq; /* wq for handshake work */ struct workqueue_struct *smc_close_wq; /* wq for close work */ static void smc_tcp_listen_work(struct work_struct *); static void smc_connect_work(struct work_struct *); int smc_nl_dump_hs_limitation(struct sk_buff *skb, struct netlink_callback *cb) { struct smc_nl_dmp_ctx *cb_ctx = smc_nl_dmp_ctx(cb); void *hdr; if (cb_ctx->pos[0]) goto out; hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, &smc_gen_nl_family, NLM_F_MULTI, SMC_NETLINK_DUMP_HS_LIMITATION); if (!hdr) return -ENOMEM; if (nla_put_u8(skb, SMC_NLA_HS_LIMITATION_ENABLED, sock_net(skb->sk)->smc.limit_smc_hs)) goto err; genlmsg_end(skb, hdr); cb_ctx->pos[0] = 1; out: return skb->len; err: genlmsg_cancel(skb, hdr); return -EMSGSIZE; } int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info) { sock_net(skb->sk)->smc.limit_smc_hs = true; return 0; } int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info) { sock_net(skb->sk)->smc.limit_smc_hs = false; return 0; } static void smc_set_keepalive(struct sock *sk, int val) { struct smc_sock *smc = smc_sk(sk); smc->clcsock->sk->sk_prot->keepalive(smc->clcsock->sk, val); } static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst, struct request_sock *req_unhash, bool *own_req) { struct smc_sock *smc; struct sock *child; smc = smc_clcsock_user_data(sk); if (READ_ONCE(sk->sk_ack_backlog) + atomic_read(&smc->queued_smc_hs) > sk->sk_max_ack_backlog) goto drop; if (sk_acceptq_is_full(&smc->sk)) { NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS); goto drop; } /* passthrough to original syn recv sock fct */ child = smc->ori_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req); /* child must not inherit smc or its ops */ if (child) { rcu_assign_sk_user_data(child, NULL); /* v4-mapped sockets don't inherit parent ops. 
Don't restore. */ if (inet_csk(child)->icsk_af_ops == inet_csk(sk)->icsk_af_ops) inet_csk(child)->icsk_af_ops = smc->ori_af_ops; } return child; drop: dst_release(dst); tcp_listendrop(sk); return NULL; } static bool smc_hs_congested(const struct sock *sk) { const struct smc_sock *smc; smc = smc_clcsock_user_data(sk); if (!smc) return true; if (workqueue_congested(WORK_CPU_UNBOUND, smc_hs_wq)) return true; return false; } struct smc_hashinfo smc_v4_hashinfo = { .lock = __RW_LOCK_UNLOCKED(smc_v4_hashinfo.lock), }; struct smc_hashinfo smc_v6_hashinfo = { .lock = __RW_LOCK_UNLOCKED(smc_v6_hashinfo.lock), }; int smc_hash_sk(struct sock *sk) { struct smc_hashinfo *h = sk->sk_prot->h.smc_hash; struct hlist_head *head; head = &h->ht; write_lock_bh(&h->lock); sk_add_node(sk, head); write_unlock_bh(&h->lock); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); return 0; } void smc_unhash_sk(struct sock *sk) { struct smc_hashinfo *h = sk->sk_prot->h.smc_hash; write_lock_bh(&h->lock); if (sk_del_node_init(sk)) sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); write_unlock_bh(&h->lock); } /* This will be called before user really release sock_lock. So do the * work which we didn't do because of user hold the sock_lock in the * BH context */ void smc_release_cb(struct sock *sk) { struct smc_sock *smc = smc_sk(sk); if (smc->conn.tx_in_release_sock) { smc_tx_pending(&smc->conn); smc->conn.tx_in_release_sock = false; } } struct proto smc_proto = { .name = "SMC", .owner = THIS_MODULE, .keepalive = smc_set_keepalive, .hash = smc_hash_sk, .unhash = smc_unhash_sk, .release_cb = smc_release_cb, .obj_size = sizeof(struct smc_sock), .h.smc_hash = &smc_v4_hashinfo, .slab_flags = SLAB_TYPESAFE_BY_RCU, }; EXPORT_SYMBOL_GPL(smc_proto); struct proto smc_proto6 = { .name = "SMC6", .owner = THIS_MODULE, .keepalive = smc_set_keepalive, .hash = smc_hash_sk, .unhash = smc_unhash_sk, .release_cb = smc_release_cb, .obj_size = sizeof(struct smc_sock), .h.smc_hash = &smc_v6_hashinfo, .slab_flags = SLAB_TYPESAFE_BY_RCU, }; EXPORT_SYMBOL_GPL(smc_proto6); static void smc_fback_restore_callbacks(struct smc_sock *smc) { struct sock *clcsk = smc->clcsock->sk; write_lock_bh(&clcsk->sk_callback_lock); clcsk->sk_user_data = NULL; smc_clcsock_restore_cb(&clcsk->sk_state_change, &smc->clcsk_state_change); smc_clcsock_restore_cb(&clcsk->sk_data_ready, &smc->clcsk_data_ready); smc_clcsock_restore_cb(&clcsk->sk_write_space, &smc->clcsk_write_space); smc_clcsock_restore_cb(&clcsk->sk_error_report, &smc->clcsk_error_report); write_unlock_bh(&clcsk->sk_callback_lock); } static void smc_restore_fallback_changes(struct smc_sock *smc) { if (smc->clcsock->file) { /* non-accepted sockets have no file yet */ smc->clcsock->file->private_data = smc->sk.sk_socket; smc->clcsock->file = NULL; smc_fback_restore_callbacks(smc); } } static int __smc_release(struct smc_sock *smc) { struct sock *sk = &smc->sk; int rc = 0; if (!smc->use_fallback) { rc = smc_close_active(smc); smc_sock_set_flag(sk, SOCK_DEAD); sk->sk_shutdown |= SHUTDOWN_MASK; } else { if (sk->sk_state != SMC_CLOSED) { if (sk->sk_state != SMC_LISTEN && sk->sk_state != SMC_INIT) sock_put(sk); /* passive closing */ if (sk->sk_state == SMC_LISTEN) { /* wake up clcsock accept */ rc = kernel_sock_shutdown(smc->clcsock, SHUT_RDWR); } sk->sk_state = SMC_CLOSED; sk->sk_state_change(sk); } smc_restore_fallback_changes(smc); } sk->sk_prot->unhash(sk); if (sk->sk_state == SMC_CLOSED) { if (smc->clcsock) { release_sock(sk); smc_clcsock_release(smc); lock_sock(sk); } if (!smc->use_fallback) 
smc_conn_free(&smc->conn); } return rc; } int smc_release(struct socket *sock) { struct sock *sk = sock->sk; struct smc_sock *smc; int old_state, rc = 0; if (!sk) goto out; sock_hold(sk); /* sock_put below */ smc = smc_sk(sk); old_state = sk->sk_state; /* cleanup for a dangling non-blocking connect */ if (smc->connect_nonblock && old_state == SMC_INIT) tcp_abort(smc->clcsock->sk, ECONNABORTED); if (cancel_work_sync(&smc->connect_work)) sock_put(&smc->sk); /* sock_hold in smc_connect for passive closing */ if (sk->sk_state == SMC_LISTEN) /* smc_close_non_accepted() is called and acquires * sock lock for child sockets again */ lock_sock_nested(sk, SINGLE_DEPTH_NESTING); else lock_sock(sk); if (old_state == SMC_INIT && sk->sk_state == SMC_ACTIVE && !smc->use_fallback) smc_close_active_abort(smc); rc = __smc_release(smc); /* detach socket */ sock_orphan(sk); sock->sk = NULL; release_sock(sk); sock_put(sk); /* sock_hold above */ sock_put(sk); /* final sock_put */ out: return rc; } static void smc_destruct(struct sock *sk) { if (sk->sk_state != SMC_CLOSED) return; if (!sock_flag(sk, SOCK_DEAD)) return; } void smc_sk_init(struct net *net, struct sock *sk, int protocol) { struct smc_sock *smc = smc_sk(sk); sk->sk_state = SMC_INIT; sk->sk_destruct = smc_destruct; sk->sk_protocol = protocol; WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem)); WRITE_ONCE(sk->sk_rcvbuf, 2 * READ_ONCE(net->smc.sysctl_rmem)); INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work); INIT_WORK(&smc->connect_work, smc_connect_work); INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work); INIT_LIST_HEAD(&smc->accept_q); spin_lock_init(&smc->accept_q_lock); spin_lock_init(&smc->conn.send_lock); sk->sk_prot->hash(sk); mutex_init(&smc->clcsock_release_lock); smc_init_saved_callbacks(smc); smc->limit_smc_hs = net->smc.limit_smc_hs; smc->use_fallback = false; /* assume rdma capability first */ smc->fallback_rsn = 0; } static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, int protocol) { struct proto *prot; struct sock *sk; prot = (protocol == SMCPROTO_SMC6) ? &smc_proto6 : &smc_proto; sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, 0); if (!sk) return NULL; sock_init_data(sock, sk); /* sets sk_refcnt to 1 */ smc_sk_init(net, sk, protocol); return sk; } int smc_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct sockaddr_in *addr = (struct sockaddr_in *)uaddr; struct sock *sk = sock->sk; struct smc_sock *smc; int rc; smc = smc_sk(sk); /* replicate tests from inet_bind(), to be safe wrt. 
future changes */ rc = -EINVAL; if (addr_len < sizeof(struct sockaddr_in)) goto out; rc = -EAFNOSUPPORT; if (addr->sin_family != AF_INET && addr->sin_family != AF_INET6 && addr->sin_family != AF_UNSPEC) goto out; /* accept AF_UNSPEC (mapped to AF_INET) only if s_addr is INADDR_ANY */ if (addr->sin_family == AF_UNSPEC && addr->sin_addr.s_addr != htonl(INADDR_ANY)) goto out; lock_sock(sk); /* Check if socket is already active */ rc = -EINVAL; if (sk->sk_state != SMC_INIT || smc->connect_nonblock) goto out_rel; smc->clcsock->sk->sk_reuse = sk->sk_reuse; smc->clcsock->sk->sk_reuseport = sk->sk_reuseport; rc = kernel_bind(smc->clcsock, uaddr, addr_len); out_rel: release_sock(sk); out: return rc; } /* copy only relevant settings and flags of SOL_SOCKET level from smc to * clc socket (since smc is not called for these options from net/core) */ #define SK_FLAGS_SMC_TO_CLC ((1UL << SOCK_URGINLINE) | \ (1UL << SOCK_KEEPOPEN) | \ (1UL << SOCK_LINGER) | \ (1UL << SOCK_BROADCAST) | \ (1UL << SOCK_TIMESTAMP) | \ (1UL << SOCK_DBG) | \ (1UL << SOCK_RCVTSTAMP) | \ (1UL << SOCK_RCVTSTAMPNS) | \ (1UL << SOCK_LOCALROUTE) | \ (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE) | \ (1UL << SOCK_RXQ_OVFL) | \ (1UL << SOCK_WIFI_STATUS) | \ (1UL << SOCK_NOFCS) | \ (1UL << SOCK_FILTER_LOCKED) | \ (1UL << SOCK_TSTAMP_NEW)) /* if set, use value set by setsockopt() - else use IPv4 or SMC sysctl value */ static void smc_adjust_sock_bufsizes(struct sock *nsk, struct sock *osk, unsigned long mask) { nsk->sk_userlocks = osk->sk_userlocks; if (osk->sk_userlocks & SOCK_SNDBUF_LOCK) nsk->sk_sndbuf = osk->sk_sndbuf; if (osk->sk_userlocks & SOCK_RCVBUF_LOCK) nsk->sk_rcvbuf = osk->sk_rcvbuf; } static void smc_copy_sock_settings(struct sock *nsk, struct sock *osk, unsigned long mask) { /* options we don't get control via setsockopt for */ nsk->sk_type = osk->sk_type; nsk->sk_sndtimeo = osk->sk_sndtimeo; nsk->sk_rcvtimeo = osk->sk_rcvtimeo; nsk->sk_mark = READ_ONCE(osk->sk_mark); nsk->sk_priority = READ_ONCE(osk->sk_priority); nsk->sk_rcvlowat = osk->sk_rcvlowat; nsk->sk_bound_dev_if = osk->sk_bound_dev_if; nsk->sk_err = osk->sk_err; nsk->sk_flags &= ~mask; nsk->sk_flags |= osk->sk_flags & mask; smc_adjust_sock_bufsizes(nsk, osk, mask); } static void smc_copy_sock_settings_to_clc(struct smc_sock *smc) { smc_copy_sock_settings(smc->clcsock->sk, &smc->sk, SK_FLAGS_SMC_TO_CLC); } #define SK_FLAGS_CLC_TO_SMC ((1UL << SOCK_URGINLINE) | \ (1UL << SOCK_KEEPOPEN) | \ (1UL << SOCK_LINGER) | \ (1UL << SOCK_DBG)) /* copy only settings and flags relevant for smc from clc to smc socket */ static void smc_copy_sock_settings_to_smc(struct smc_sock *smc) { smc_copy_sock_settings(&smc->sk, smc->clcsock->sk, SK_FLAGS_CLC_TO_SMC); } /* register the new vzalloced sndbuf on all links */ static int smcr_lgr_reg_sndbufs(struct smc_link *link, struct smc_buf_desc *snd_desc) { struct smc_link_group *lgr = link->lgr; int i, rc = 0; if (!snd_desc->is_vm) return -EINVAL; /* protect against parallel smcr_link_reg_buf() */ down_write(&lgr->llc_conf_mutex); for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { if (!smc_link_active(&lgr->lnk[i])) continue; rc = smcr_link_reg_buf(&lgr->lnk[i], snd_desc); if (rc) break; } up_write(&lgr->llc_conf_mutex); return rc; } /* register the new rmb on all links */ static int smcr_lgr_reg_rmbs(struct smc_link *link, struct smc_buf_desc *rmb_desc) { struct smc_link_group *lgr = link->lgr; bool do_slow = false; int i, rc = 0; rc = smc_llc_flow_initiate(lgr, SMC_LLC_FLOW_RKEY); if (rc) return rc; down_read(&lgr->llc_conf_mutex); for (i = 0; i < 
SMC_LINKS_PER_LGR_MAX; i++) { if (!smc_link_active(&lgr->lnk[i])) continue; if (!rmb_desc->is_reg_mr[link->link_idx]) { up_read(&lgr->llc_conf_mutex); goto slow_path; } } /* mr register already */ goto fast_path; slow_path: do_slow = true; /* protect against parallel smc_llc_cli_rkey_exchange() and * parallel smcr_link_reg_buf() */ down_write(&lgr->llc_conf_mutex); for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { if (!smc_link_active(&lgr->lnk[i])) continue; rc = smcr_link_reg_buf(&lgr->lnk[i], rmb_desc); if (rc) goto out; } fast_path: /* exchange confirm_rkey msg with peer */ rc = smc_llc_do_confirm_rkey(link, rmb_desc); if (rc) { rc = -EFAULT; goto out; } rmb_desc->is_conf_rkey = true; out: do_slow ? up_write(&lgr->llc_conf_mutex) : up_read(&lgr->llc_conf_mutex); smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl); return rc; } static int smcr_clnt_conf_first_link(struct smc_sock *smc) { struct smc_link *link = smc->conn.lnk; struct smc_llc_qentry *qentry; int rc; /* Receive CONFIRM LINK request from server over RoCE fabric. * Increasing the client's timeout by twice as much as the server's * timeout by default can temporarily avoid decline messages of * both sides crossing or colliding */ qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, SMC_LLC_CONFIRM_LINK); if (!qentry) { struct smc_clc_msg_decline dclc; rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT); return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc; } smc_llc_save_peer_uid(qentry); rc = smc_llc_eval_conf_link(qentry, SMC_LLC_REQ); smc_llc_flow_qentry_del(&link->lgr->llc_flow_lcl); if (rc) return SMC_CLC_DECL_RMBE_EC; rc = smc_ib_modify_qp_rts(link); if (rc) return SMC_CLC_DECL_ERR_RDYLNK; smc_wr_remember_qp_attr(link); /* reg the sndbuf if it was vzalloced */ if (smc->conn.sndbuf_desc->is_vm) { if (smcr_link_reg_buf(link, smc->conn.sndbuf_desc)) return SMC_CLC_DECL_ERR_REGBUF; } /* reg the rmb */ if (smcr_link_reg_buf(link, smc->conn.rmb_desc)) return SMC_CLC_DECL_ERR_REGBUF; /* confirm_rkey is implicit on 1st contact */ smc->conn.rmb_desc->is_conf_rkey = true; /* send CONFIRM LINK response over RoCE fabric */ rc = smc_llc_send_confirm_link(link, SMC_LLC_RESP); if (rc < 0) return SMC_CLC_DECL_TIMEOUT_CL; smc_llc_link_active(link); smcr_lgr_set_type(link->lgr, SMC_LGR_SINGLE); if (link->lgr->max_links > 1) { /* optional 2nd link, receive ADD LINK request from server */ qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, SMC_LLC_ADD_LINK); if (!qentry) { struct smc_clc_msg_decline dclc; rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT); if (rc == -EAGAIN) rc = 0; /* no DECLINE received, go with one link */ return rc; } smc_llc_flow_qentry_clr(&link->lgr->llc_flow_lcl); smc_llc_cli_add_link(link, qentry); } return 0; } static bool smc_isascii(char *hostname) { int i; for (i = 0; i < SMC_MAX_HOSTNAME_LEN; i++) if (!isascii(hostname[i])) return false; return true; } static void smc_conn_save_peer_info_fce(struct smc_sock *smc, struct smc_clc_msg_accept_confirm *clc) { struct smc_clc_first_contact_ext *fce; int clc_v2_len; if (clc->hdr.version == SMC_V1 || !(clc->hdr.typev2 & SMC_FIRST_CONTACT_MASK)) return; if (smc->conn.lgr->is_smcd) { memcpy(smc->conn.lgr->negotiated_eid, clc->d1.eid, SMC_MAX_EID_LEN); clc_v2_len = offsetofend(struct smc_clc_msg_accept_confirm, d1); } else { memcpy(smc->conn.lgr->negotiated_eid, clc->r1.eid, SMC_MAX_EID_LEN); clc_v2_len = offsetofend(struct smc_clc_msg_accept_confirm, r1); } fce = (struct smc_clc_first_contact_ext 
*)(((u8 *)clc) + clc_v2_len); smc->conn.lgr->peer_os = fce->os_type; smc->conn.lgr->peer_smc_release = fce->release; if (smc_isascii(fce->hostname)) memcpy(smc->conn.lgr->peer_hostname, fce->hostname, SMC_MAX_HOSTNAME_LEN); } static void smcr_conn_save_peer_info(struct smc_sock *smc, struct smc_clc_msg_accept_confirm *clc) { int bufsize = smc_uncompress_bufsize(clc->r0.rmbe_size); smc->conn.peer_rmbe_idx = clc->r0.rmbe_idx; smc->conn.local_tx_ctrl.token = ntohl(clc->r0.rmbe_alert_token); smc->conn.peer_rmbe_size = bufsize; atomic_set(&smc->conn.peer_rmbe_space, smc->conn.peer_rmbe_size); smc->conn.tx_off = bufsize * (smc->conn.peer_rmbe_idx - 1); } static void smcd_conn_save_peer_info(struct smc_sock *smc, struct smc_clc_msg_accept_confirm *clc) { int bufsize = smc_uncompress_bufsize(clc->d0.dmbe_size); smc->conn.peer_rmbe_idx = clc->d0.dmbe_idx; smc->conn.peer_token = ntohll(clc->d0.token); /* msg header takes up space in the buffer */ smc->conn.peer_rmbe_size = bufsize - sizeof(struct smcd_cdc_msg); atomic_set(&smc->conn.peer_rmbe_space, smc->conn.peer_rmbe_size); smc->conn.tx_off = bufsize * smc->conn.peer_rmbe_idx; } static void smc_conn_save_peer_info(struct smc_sock *smc, struct smc_clc_msg_accept_confirm *clc) { if (smc->conn.lgr->is_smcd) smcd_conn_save_peer_info(smc, clc); else smcr_conn_save_peer_info(smc, clc); smc_conn_save_peer_info_fce(smc, clc); } static void smc_link_save_peer_info(struct smc_link *link, struct smc_clc_msg_accept_confirm *clc, struct smc_init_info *ini) { link->peer_qpn = ntoh24(clc->r0.qpn); memcpy(link->peer_gid, ini->peer_gid, SMC_GID_SIZE); memcpy(link->peer_mac, ini->peer_mac, sizeof(link->peer_mac)); link->peer_psn = ntoh24(clc->r0.psn); link->peer_mtu = clc->r0.qp_mtu; } static void smc_stat_inc_fback_rsn_cnt(struct smc_sock *smc, struct smc_stats_fback *fback_arr) { int cnt; for (cnt = 0; cnt < SMC_MAX_FBACK_RSN_CNT; cnt++) { if (fback_arr[cnt].fback_code == smc->fallback_rsn) { fback_arr[cnt].count++; break; } if (!fback_arr[cnt].fback_code) { fback_arr[cnt].fback_code = smc->fallback_rsn; fback_arr[cnt].count++; break; } } } static void smc_stat_fallback(struct smc_sock *smc) { struct net *net = sock_net(&smc->sk); mutex_lock(&net->smc.mutex_fback_rsn); if (smc->listen_smc) { smc_stat_inc_fback_rsn_cnt(smc, net->smc.fback_rsn->srv); net->smc.fback_rsn->srv_fback_cnt++; } else { smc_stat_inc_fback_rsn_cnt(smc, net->smc.fback_rsn->clnt); net->smc.fback_rsn->clnt_fback_cnt++; } mutex_unlock(&net->smc.mutex_fback_rsn); } /* must be called under rcu read lock */ static void smc_fback_wakeup_waitqueue(struct smc_sock *smc, void *key) { struct socket_wq *wq; __poll_t flags; wq = rcu_dereference(smc->sk.sk_wq); if (!skwq_has_sleeper(wq)) return; /* wake up smc sk->sk_wq */ if (!key) { /* sk_state_change */ wake_up_interruptible_all(&wq->wait); } else { flags = key_to_poll(key); if (flags & (EPOLLIN | EPOLLOUT)) /* sk_data_ready or sk_write_space */ wake_up_interruptible_sync_poll(&wq->wait, flags); else if (flags & EPOLLERR) /* sk_error_report */ wake_up_interruptible_poll(&wq->wait, flags); } } static int smc_fback_mark_woken(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key) { struct smc_mark_woken *mark = container_of(wait, struct smc_mark_woken, wait_entry); mark->woken = true; mark->key = key; return 0; } static void smc_fback_forward_wakeup(struct smc_sock *smc, struct sock *clcsk, void (*clcsock_callback)(struct sock *sk)) { struct smc_mark_woken mark = { .woken = false }; struct socket_wq *wq; 
init_waitqueue_func_entry(&mark.wait_entry, smc_fback_mark_woken); rcu_read_lock(); wq = rcu_dereference(clcsk->sk_wq); if (!wq) goto out; add_wait_queue(sk_sleep(clcsk), &mark.wait_entry); clcsock_callback(clcsk); remove_wait_queue(sk_sleep(clcsk), &mark.wait_entry); if (mark.woken) smc_fback_wakeup_waitqueue(smc, mark.key); out: rcu_read_unlock(); } static void smc_fback_state_change(struct sock *clcsk) { struct smc_sock *smc; read_lock_bh(&clcsk->sk_callback_lock); smc = smc_clcsock_user_data(clcsk); if (smc) smc_fback_forward_wakeup(smc, clcsk, smc->clcsk_state_change); read_unlock_bh(&clcsk->sk_callback_lock); } static void smc_fback_data_ready(struct sock *clcsk) { struct smc_sock *smc; read_lock_bh(&clcsk->sk_callback_lock); smc = smc_clcsock_user_data(clcsk); if (smc) smc_fback_forward_wakeup(smc, clcsk, smc->clcsk_data_ready); read_unlock_bh(&clcsk->sk_callback_lock); } static void smc_fback_write_space(struct sock *clcsk) { struct smc_sock *smc; read_lock_bh(&clcsk->sk_callback_lock); smc = smc_clcsock_user_data(clcsk); if (smc) smc_fback_forward_wakeup(smc, clcsk, smc->clcsk_write_space); read_unlock_bh(&clcsk->sk_callback_lock); } static void smc_fback_error_report(struct sock *clcsk) { struct smc_sock *smc; read_lock_bh(&clcsk->sk_callback_lock); smc = smc_clcsock_user_data(clcsk); if (smc) smc_fback_forward_wakeup(smc, clcsk, smc->clcsk_error_report); read_unlock_bh(&clcsk->sk_callback_lock); } static void smc_fback_replace_callbacks(struct smc_sock *smc) { struct sock *clcsk = smc->clcsock->sk; write_lock_bh(&clcsk->sk_callback_lock); clcsk->sk_user_data = (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY); smc_clcsock_replace_cb(&clcsk->sk_state_change, smc_fback_state_change, &smc->clcsk_state_change); smc_clcsock_replace_cb(&clcsk->sk_data_ready, smc_fback_data_ready, &smc->clcsk_data_ready); smc_clcsock_replace_cb(&clcsk->sk_write_space, smc_fback_write_space, &smc->clcsk_write_space); smc_clcsock_replace_cb(&clcsk->sk_error_report, smc_fback_error_report, &smc->clcsk_error_report); write_unlock_bh(&clcsk->sk_callback_lock); } static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code) { int rc = 0; mutex_lock(&smc->clcsock_release_lock); if (!smc->clcsock) { rc = -EBADF; goto out; } smc->use_fallback = true; smc->fallback_rsn = reason_code; smc_stat_fallback(smc); trace_smc_switch_to_fallback(smc, reason_code); if (smc->sk.sk_socket && smc->sk.sk_socket->file) { smc->clcsock->file = smc->sk.sk_socket->file; smc->clcsock->file->private_data = smc->clcsock; smc->clcsock->wq.fasync_list = smc->sk.sk_socket->wq.fasync_list; smc->sk.sk_socket->wq.fasync_list = NULL; /* There might be some wait entries remaining * in smc sk->sk_wq and they should be woken up * as clcsock's wait queue is woken up. 
*/ smc_fback_replace_callbacks(smc); } out: mutex_unlock(&smc->clcsock_release_lock); return rc; } /* fall back during connect */ static int smc_connect_fallback(struct smc_sock *smc, int reason_code) { struct net *net = sock_net(&smc->sk); int rc = 0; rc = smc_switch_to_fallback(smc, reason_code); if (rc) { /* fallback fails */ this_cpu_inc(net->smc.smc_stats->clnt_hshake_err_cnt); if (smc->sk.sk_state == SMC_INIT) sock_put(&smc->sk); /* passive closing */ return rc; } smc_copy_sock_settings_to_clc(smc); smc->connect_nonblock = 0; if (smc->sk.sk_state == SMC_INIT) smc->sk.sk_state = SMC_ACTIVE; return 0; } /* decline and fall back during connect */ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code, u8 version) { struct net *net = sock_net(&smc->sk); int rc; if (reason_code < 0) { /* error, fallback is not possible */ this_cpu_inc(net->smc.smc_stats->clnt_hshake_err_cnt); if (smc->sk.sk_state == SMC_INIT) sock_put(&smc->sk); /* passive closing */ return reason_code; } if (reason_code != SMC_CLC_DECL_PEERDECL) { rc = smc_clc_send_decline(smc, reason_code, version); if (rc < 0) { this_cpu_inc(net->smc.smc_stats->clnt_hshake_err_cnt); if (smc->sk.sk_state == SMC_INIT) sock_put(&smc->sk); /* passive closing */ return rc; } } return smc_connect_fallback(smc, reason_code); } static void smc_conn_abort(struct smc_sock *smc, int local_first) { struct smc_connection *conn = &smc->conn; struct smc_link_group *lgr = conn->lgr; bool lgr_valid = false; if (smc_conn_lgr_valid(conn)) lgr_valid = true; smc_conn_free(conn); if (local_first && lgr_valid) smc_lgr_cleanup_early(lgr); } /* check if there is a rdma device available for this connection. */ /* called for connect and listen */ static int smc_find_rdma_device(struct smc_sock *smc, struct smc_init_info *ini) { /* PNET table look up: search active ib_device and port * within same PNETID that also contains the ethernet device * used for the internal TCP socket */ smc_pnet_find_roce_resource(smc->clcsock->sk, ini); if (!ini->check_smcrv2 && !ini->ib_dev) return SMC_CLC_DECL_NOSMCRDEV; if (ini->check_smcrv2 && !ini->smcrv2.ib_dev_v2) return SMC_CLC_DECL_NOSMCRDEV; return 0; } /* check if there is an ISM device available for this connection. */ /* called for connect and listen */ static int smc_find_ism_device(struct smc_sock *smc, struct smc_init_info *ini) { /* Find ISM device with same PNETID as connecting interface */ smc_pnet_find_ism_resource(smc->clcsock->sk, ini); if (!ini->ism_dev[0]) return SMC_CLC_DECL_NOSMCDDEV; else ini->ism_chid[0] = smc_ism_get_chid(ini->ism_dev[0]); return 0; } /* is chid unique for the ism devices that are already determined? */ static bool smc_find_ism_v2_is_unique_chid(u16 chid, struct smc_init_info *ini, int cnt) { int i = (!ini->ism_dev[0]) ? 
1 : 0; for (; i < cnt; i++) if (ini->ism_chid[i] == chid) return false; return true; } /* determine possible V2 ISM devices (either without PNETID or with PNETID plus * PNETID matching net_device) */ static int smc_find_ism_v2_device_clnt(struct smc_sock *smc, struct smc_init_info *ini) { int rc = SMC_CLC_DECL_NOSMCDDEV; struct smcd_dev *smcd; int i = 1, entry = 1; bool is_emulated; u16 chid; if (smcd_indicated(ini->smc_type_v1)) rc = 0; /* already initialized for V1 */ mutex_lock(&smcd_dev_list.mutex); list_for_each_entry(smcd, &smcd_dev_list.list, list) { if (smcd->going_away || smcd == ini->ism_dev[0]) continue; chid = smc_ism_get_chid(smcd); if (!smc_find_ism_v2_is_unique_chid(chid, ini, i)) continue; is_emulated = __smc_ism_is_emulated(chid); if (!smc_pnet_is_pnetid_set(smcd->pnetid) || smc_pnet_is_ndev_pnetid(sock_net(&smc->sk), smcd->pnetid)) { if (is_emulated && entry == SMCD_CLC_MAX_V2_GID_ENTRIES) /* It's the last GID-CHID entry left in CLC * Proposal SMC-Dv2 extension, but an Emulated- * ISM device will take two entries. So give * up it and try the next potential ISM device. */ continue; ini->ism_dev[i] = smcd; ini->ism_chid[i] = chid; ini->is_smcd = true; rc = 0; i++; entry = is_emulated ? entry + 2 : entry + 1; if (entry > SMCD_CLC_MAX_V2_GID_ENTRIES) break; } } mutex_unlock(&smcd_dev_list.mutex); ini->ism_offered_cnt = i - 1; if (!ini->ism_dev[0] && !ini->ism_dev[1]) ini->smcd_version = 0; return rc; } /* Check for VLAN ID and register it on ISM device just for CLC handshake */ static int smc_connect_ism_vlan_setup(struct smc_sock *smc, struct smc_init_info *ini) { if (ini->vlan_id && smc_ism_get_vlan(ini->ism_dev[0], ini->vlan_id)) return SMC_CLC_DECL_ISMVLANERR; return 0; } static int smc_find_proposal_devices(struct smc_sock *smc, struct smc_init_info *ini) { int rc = 0; /* check if there is an ism device available */ if (!(ini->smcd_version & SMC_V1) || smc_find_ism_device(smc, ini) || smc_connect_ism_vlan_setup(smc, ini)) ini->smcd_version &= ~SMC_V1; /* else ISM V1 is supported for this connection */ /* check if there is an rdma device available */ if (!(ini->smcr_version & SMC_V1) || smc_find_rdma_device(smc, ini)) ini->smcr_version &= ~SMC_V1; /* else RDMA is supported for this connection */ ini->smc_type_v1 = smc_indicated_type(ini->smcd_version & SMC_V1, ini->smcr_version & SMC_V1); /* check if there is an ism v2 device available */ if (!(ini->smcd_version & SMC_V2) || !smc_ism_is_v2_capable() || smc_find_ism_v2_device_clnt(smc, ini)) ini->smcd_version &= ~SMC_V2; /* check if there is an rdma v2 device available */ ini->check_smcrv2 = true; ini->smcrv2.saddr = smc->clcsock->sk->sk_rcv_saddr; if (!(ini->smcr_version & SMC_V2) || smc->clcsock->sk->sk_family != AF_INET || !smc_clc_ueid_count() || smc_find_rdma_device(smc, ini)) ini->smcr_version &= ~SMC_V2; ini->check_smcrv2 = false; ini->smc_type_v2 = smc_indicated_type(ini->smcd_version & SMC_V2, ini->smcr_version & SMC_V2); /* if neither ISM nor RDMA are supported, fallback */ if (ini->smc_type_v1 == SMC_TYPE_N && ini->smc_type_v2 == SMC_TYPE_N) rc = SMC_CLC_DECL_NOSMCDEV; return rc; } /* cleanup temporary VLAN ID registration used for CLC handshake. If ISM is * used, the VLAN ID will be registered again during the connection setup. 
*/ static int smc_connect_ism_vlan_cleanup(struct smc_sock *smc, struct smc_init_info *ini) { if (!smcd_indicated(ini->smc_type_v1)) return 0; if (ini->vlan_id && smc_ism_put_vlan(ini->ism_dev[0], ini->vlan_id)) return SMC_CLC_DECL_CNFERR; return 0; } #define SMC_CLC_MAX_ACCEPT_LEN \ (sizeof(struct smc_clc_msg_accept_confirm) + \ sizeof(struct smc_clc_first_contact_ext_v2x) + \ sizeof(struct smc_clc_msg_trail)) /* CLC handshake during connect */ static int smc_connect_clc(struct smc_sock *smc, struct smc_clc_msg_accept_confirm *aclc, struct smc_init_info *ini) { int rc = 0; /* do inband token exchange */ rc = smc_clc_send_proposal(smc, ini); if (rc) return rc; /* receive SMC Accept CLC message */ return smc_clc_wait_msg(smc, aclc, SMC_CLC_MAX_ACCEPT_LEN, SMC_CLC_ACCEPT, CLC_WAIT_TIME); } void smc_fill_gid_list(struct smc_link_group *lgr, struct smc_gidlist *gidlist, struct smc_ib_device *known_dev, u8 *known_gid) { struct smc_init_info *alt_ini = NULL; memset(gidlist, 0, sizeof(*gidlist)); memcpy(gidlist->list[gidlist->len++], known_gid, SMC_GID_SIZE); alt_ini = kzalloc(sizeof(*alt_ini), GFP_KERNEL); if (!alt_ini) goto out; alt_ini->vlan_id = lgr->vlan_id; alt_ini->check_smcrv2 = true; alt_ini->smcrv2.saddr = lgr->saddr; smc_pnet_find_alt_roce(lgr, alt_ini, known_dev); if (!alt_ini->smcrv2.ib_dev_v2) goto out; memcpy(gidlist->list[gidlist->len++], alt_ini->smcrv2.ib_gid_v2, SMC_GID_SIZE); out: kfree(alt_ini); } static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, struct smc_clc_msg_accept_confirm *aclc, struct smc_init_info *ini) { struct smc_clc_first_contact_ext *fce = smc_get_clc_first_contact_ext(aclc, false); struct net *net = sock_net(&smc->sk); int rc; if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1) return 0; if (fce->v2_direct) { memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN); ini->smcrv2.uses_gateway = false; } else { if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr, smc_ib_gid_to_ipv4(aclc->r0.lcl.gid), ini->smcrv2.nexthop_mac, &ini->smcrv2.uses_gateway)) return SMC_CLC_DECL_NOROUTE; if (!ini->smcrv2.uses_gateway) { /* mismatch: peer claims indirect, but its direct */ return SMC_CLC_DECL_NOINDIRECT; } } ini->release_nr = fce->release; rc = smc_clc_clnt_v2x_features_validate(fce, ini); if (rc) return rc; return 0; } /* setup for RDMA connection of client */ static int smc_connect_rdma(struct smc_sock *smc, struct smc_clc_msg_accept_confirm *aclc, struct smc_init_info *ini) { int i, reason_code = 0; struct smc_link *link; u8 *eid = NULL; ini->is_smcd = false; ini->ib_clcqpn = ntoh24(aclc->r0.qpn); ini->first_contact_peer = aclc->hdr.typev2 & SMC_FIRST_CONTACT_MASK; memcpy(ini->peer_systemid, aclc->r0.lcl.id_for_peer, SMC_SYSTEMID_LEN); memcpy(ini->peer_gid, aclc->r0.lcl.gid, SMC_GID_SIZE); memcpy(ini->peer_mac, aclc->r0.lcl.mac, ETH_ALEN); ini->max_conns = SMC_CONN_PER_LGR_MAX; ini->max_links = SMC_LINKS_ADD_LNK_MAX; reason_code = smc_connect_rdma_v2_prepare(smc, aclc, ini); if (reason_code) return reason_code; mutex_lock(&smc_client_lgr_pending); reason_code = smc_conn_create(smc, ini); if (reason_code) { mutex_unlock(&smc_client_lgr_pending); return reason_code; } smc_conn_save_peer_info(smc, aclc); if (ini->first_contact_local) { link = smc->conn.lnk; } else { /* set link that was assigned by server */ link = NULL; for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { struct smc_link *l = &smc->conn.lgr->lnk[i]; if (l->peer_qpn == ntoh24(aclc->r0.qpn) && !memcmp(l->peer_gid, &aclc->r0.lcl.gid, SMC_GID_SIZE) && (aclc->hdr.version > SMC_V1 || 
!memcmp(l->peer_mac, &aclc->r0.lcl.mac, sizeof(l->peer_mac)))) { link = l; break; } } if (!link) { reason_code = SMC_CLC_DECL_NOSRVLINK; goto connect_abort; } smc_switch_link_and_count(&smc->conn, link); } /* create send buffer and rmb */ if (smc_buf_create(smc, false)) { reason_code = SMC_CLC_DECL_MEM; goto connect_abort; } if (ini->first_contact_local) smc_link_save_peer_info(link, aclc, ini); if (smc_rmb_rtoken_handling(&smc->conn, link, aclc)) { reason_code = SMC_CLC_DECL_ERR_RTOK; goto connect_abort; } smc_close_init(smc); smc_rx_init(smc); if (ini->first_contact_local) { if (smc_ib_ready_link(link)) { reason_code = SMC_CLC_DECL_ERR_RDYLNK; goto connect_abort; } } else { /* reg sendbufs if they were vzalloced */ if (smc->conn.sndbuf_desc->is_vm) { if (smcr_lgr_reg_sndbufs(link, smc->conn.sndbuf_desc)) { reason_code = SMC_CLC_DECL_ERR_REGBUF; goto connect_abort; } } if (smcr_lgr_reg_rmbs(link, smc->conn.rmb_desc)) { reason_code = SMC_CLC_DECL_ERR_REGBUF; goto connect_abort; } } if (aclc->hdr.version > SMC_V1) { eid = aclc->r1.eid; if (ini->first_contact_local) smc_fill_gid_list(link->lgr, &ini->smcrv2.gidlist, link->smcibdev, link->gid); } reason_code = smc_clc_send_confirm(smc, ini->first_contact_local, aclc->hdr.version, eid, ini); if (reason_code) goto connect_abort; smc_tx_init(smc); if (ini->first_contact_local) { /* QP confirmation over RoCE fabric */ smc_llc_flow_initiate(link->lgr, SMC_LLC_FLOW_ADD_LINK); reason_code = smcr_clnt_conf_first_link(smc); smc_llc_flow_stop(link->lgr, &link->lgr->llc_flow_lcl); if (reason_code) goto connect_abort; } mutex_unlock(&smc_client_lgr_pending); smc_copy_sock_settings_to_clc(smc); smc->connect_nonblock = 0; if (smc->sk.sk_state == SMC_INIT) smc->sk.sk_state = SMC_ACTIVE; return 0; connect_abort: smc_conn_abort(smc, ini->first_contact_local); mutex_unlock(&smc_client_lgr_pending); smc->connect_nonblock = 0; return reason_code; } /* The server has chosen one of the proposed ISM devices for the communication. * Determine from the CHID of the received CLC ACCEPT the ISM device chosen. */ static int smc_v2_determine_accepted_chid(struct smc_clc_msg_accept_confirm *aclc, struct smc_init_info *ini) { int i; for (i = 0; i < ini->ism_offered_cnt + 1; i++) { if (ini->ism_chid[i] == ntohs(aclc->d1.chid)) { ini->ism_selected = i; return 0; } } return -EPROTO; } /* setup for ISM connection of client */ static int smc_connect_ism(struct smc_sock *smc, struct smc_clc_msg_accept_confirm *aclc, struct smc_init_info *ini) { u8 *eid = NULL; int rc = 0; ini->is_smcd = true; ini->first_contact_peer = aclc->hdr.typev2 & SMC_FIRST_CONTACT_MASK; if (aclc->hdr.version == SMC_V2) { if (ini->first_contact_peer) { struct smc_clc_first_contact_ext *fce = smc_get_clc_first_contact_ext(aclc, true); ini->release_nr = fce->release; rc = smc_clc_clnt_v2x_features_validate(fce, ini); if (rc) return rc; } rc = smc_v2_determine_accepted_chid(aclc, ini); if (rc) return rc; if (__smc_ism_is_emulated(ini->ism_chid[ini->ism_selected])) ini->ism_peer_gid[ini->ism_selected].gid_ext = ntohll(aclc->d1.gid_ext); /* for non-Emulated-ISM devices, peer gid_ext remains 0. */ } ini->ism_peer_gid[ini->ism_selected].gid = ntohll(aclc->d0.gid); /* there is only one lgr role for SMC-D; use server lock */ mutex_lock(&smc_server_lgr_pending); rc = smc_conn_create(smc, ini); if (rc) { mutex_unlock(&smc_server_lgr_pending); return rc; } /* Create send and receive buffers */ rc = smc_buf_create(smc, true); if (rc) { rc = (rc == -ENOSPC) ? 
SMC_CLC_DECL_MAX_DMB : SMC_CLC_DECL_MEM; goto connect_abort; } smc_conn_save_peer_info(smc, aclc); if (smc_ism_support_dmb_nocopy(smc->conn.lgr->smcd)) { rc = smcd_buf_attach(smc); if (rc) { rc = SMC_CLC_DECL_MEM; /* try to fallback */ goto connect_abort; } } smc_close_init(smc); smc_rx_init(smc); smc_tx_init(smc); if (aclc->hdr.version > SMC_V1) eid = aclc->d1.eid; rc = smc_clc_send_confirm(smc, ini->first_contact_local, aclc->hdr.version, eid, ini); if (rc) goto connect_abort; mutex_unlock(&smc_server_lgr_pending); smc_copy_sock_settings_to_clc(smc); smc->connect_nonblock = 0; if (smc->sk.sk_state == SMC_INIT) smc->sk.sk_state = SMC_ACTIVE; return 0; connect_abort: smc_conn_abort(smc, ini->first_contact_local); mutex_unlock(&smc_server_lgr_pending); smc->connect_nonblock = 0; return rc; } /* check if received accept type and version matches a proposed one */ static int smc_connect_check_aclc(struct smc_init_info *ini, struct smc_clc_msg_accept_confirm *aclc) { if (aclc->hdr.typev1 != SMC_TYPE_R && aclc->hdr.typev1 != SMC_TYPE_D) return SMC_CLC_DECL_MODEUNSUPP; if (aclc->hdr.version >= SMC_V2) { if ((aclc->hdr.typev1 == SMC_TYPE_R && !smcr_indicated(ini->smc_type_v2)) || (aclc->hdr.typev1 == SMC_TYPE_D && !smcd_indicated(ini->smc_type_v2))) return SMC_CLC_DECL_MODEUNSUPP; } else { if ((aclc->hdr.typev1 == SMC_TYPE_R && !smcr_indicated(ini->smc_type_v1)) || (aclc->hdr.typev1 == SMC_TYPE_D && !smcd_indicated(ini->smc_type_v1))) return SMC_CLC_DECL_MODEUNSUPP; } return 0; } /* perform steps before actually connecting */ static int __smc_connect(struct smc_sock *smc) { u8 version = smc_ism_is_v2_capable() ? SMC_V2 : SMC_V1; struct smc_clc_msg_accept_confirm *aclc; struct smc_init_info *ini = NULL; u8 *buf = NULL; int rc = 0; if (smc->use_fallback) return smc_connect_fallback(smc, smc->fallback_rsn); /* if peer has not signalled SMC-capability, fall back */ if (!tcp_sk(smc->clcsock->sk)->syn_smc) return smc_connect_fallback(smc, SMC_CLC_DECL_PEERNOSMC); /* IPSec connections opt out of SMC optimizations */ if (using_ipsec(smc)) return smc_connect_decline_fallback(smc, SMC_CLC_DECL_IPSEC, version); ini = kzalloc(sizeof(*ini), GFP_KERNEL); if (!ini) return smc_connect_decline_fallback(smc, SMC_CLC_DECL_MEM, version); ini->smcd_version = SMC_V1 | SMC_V2; ini->smcr_version = SMC_V1 | SMC_V2; ini->smc_type_v1 = SMC_TYPE_B; ini->smc_type_v2 = SMC_TYPE_B; /* get vlan id from IP device */ if (smc_vlan_by_tcpsk(smc->clcsock, ini)) { ini->smcd_version &= ~SMC_V1; ini->smcr_version = 0; ini->smc_type_v1 = SMC_TYPE_N; if (!ini->smcd_version) { rc = SMC_CLC_DECL_GETVLANERR; goto fallback; } } rc = smc_find_proposal_devices(smc, ini); if (rc) goto fallback; buf = kzalloc(SMC_CLC_MAX_ACCEPT_LEN, GFP_KERNEL); if (!buf) { rc = SMC_CLC_DECL_MEM; goto fallback; } aclc = (struct smc_clc_msg_accept_confirm *)buf; /* perform CLC handshake */ rc = smc_connect_clc(smc, aclc, ini); if (rc) { /* -EAGAIN on timeout, see tcp_recvmsg() */ if (rc == -EAGAIN) { rc = -ETIMEDOUT; smc->sk.sk_err = ETIMEDOUT; } goto vlan_cleanup; } /* check if smc modes and versions of CLC proposal and accept match */ rc = smc_connect_check_aclc(ini, aclc); version = aclc->hdr.version == SMC_V1 ? 
SMC_V1 : SMC_V2; if (rc) goto vlan_cleanup; /* depending on previous steps, connect using rdma or ism */ if (aclc->hdr.typev1 == SMC_TYPE_R) { ini->smcr_version = version; rc = smc_connect_rdma(smc, aclc, ini); } else if (aclc->hdr.typev1 == SMC_TYPE_D) { ini->smcd_version = version; rc = smc_connect_ism(smc, aclc, ini); } if (rc) goto vlan_cleanup; SMC_STAT_CLNT_SUCC_INC(sock_net(smc->clcsock->sk), aclc); smc_connect_ism_vlan_cleanup(smc, ini); kfree(buf); kfree(ini); return 0; vlan_cleanup: smc_connect_ism_vlan_cleanup(smc, ini); kfree(buf); fallback: kfree(ini); return smc_connect_decline_fallback(smc, rc, version); } static void smc_connect_work(struct work_struct *work) { struct smc_sock *smc = container_of(work, struct smc_sock, connect_work); long timeo = smc->sk.sk_sndtimeo; int rc = 0; if (!timeo) timeo = MAX_SCHEDULE_TIMEOUT; lock_sock(smc->clcsock->sk); if (smc->clcsock->sk->sk_err) { smc->sk.sk_err = smc->clcsock->sk->sk_err; } else if ((1 << smc->clcsock->sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { rc = sk_stream_wait_connect(smc->clcsock->sk, &timeo); if ((rc == -EPIPE) && ((1 << smc->clcsock->sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))) rc = 0; } release_sock(smc->clcsock->sk); lock_sock(&smc->sk); if (rc != 0 || smc->sk.sk_err) { smc->sk.sk_state = SMC_CLOSED; if (rc == -EPIPE || rc == -EAGAIN) smc->sk.sk_err = EPIPE; else if (rc == -ECONNREFUSED) smc->sk.sk_err = ECONNREFUSED; else if (signal_pending(current)) smc->sk.sk_err = -sock_intr_errno(timeo); sock_put(&smc->sk); /* passive closing */ goto out; } rc = __smc_connect(smc); if (rc < 0) smc->sk.sk_err = -rc; out: if (!sock_flag(&smc->sk, SOCK_DEAD)) { if (smc->sk.sk_err) { smc->sk.sk_state_change(&smc->sk); } else { /* allow polling before and after fallback decision */ smc->clcsock->sk->sk_write_space(smc->clcsock->sk); smc->sk.sk_write_space(&smc->sk); } } release_sock(&smc->sk); } int smc_connect(struct socket *sock, struct sockaddr *addr, int alen, int flags) { struct sock *sk = sock->sk; struct smc_sock *smc; int rc = -EINVAL; smc = smc_sk(sk); /* separate smc parameter checking to be safe */ if (alen < sizeof(addr->sa_family)) goto out_err; if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6) goto out_err; lock_sock(sk); switch (sock->state) { default: rc = -EINVAL; goto out; case SS_CONNECTED: rc = sk->sk_state == SMC_ACTIVE ? -EISCONN : -EINVAL; goto out; case SS_CONNECTING: if (sk->sk_state == SMC_ACTIVE) goto connected; break; case SS_UNCONNECTED: sock->state = SS_CONNECTING; break; } switch (sk->sk_state) { default: goto out; case SMC_CLOSED: rc = sock_error(sk) ? : -ECONNABORTED; sock->state = SS_UNCONNECTED; goto out; case SMC_ACTIVE: rc = -EISCONN; goto out; case SMC_INIT: break; } smc_copy_sock_settings_to_clc(smc); tcp_sk(smc->clcsock->sk)->syn_smc = 1; if (smc->connect_nonblock) { rc = -EALREADY; goto out; } rc = kernel_connect(smc->clcsock, addr, alen, flags); if (rc && rc != -EINPROGRESS) goto out; if (smc->use_fallback) { sock->state = rc ? 
SS_CONNECTING : SS_CONNECTED; goto out; } sock_hold(&smc->sk); /* sock put in passive closing */ if (flags & O_NONBLOCK) { if (queue_work(smc_hs_wq, &smc->connect_work)) smc->connect_nonblock = 1; rc = -EINPROGRESS; goto out; } else { rc = __smc_connect(smc); if (rc < 0) goto out; } connected: rc = 0; sock->state = SS_CONNECTED; out: release_sock(sk); out_err: return rc; } static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc) { struct socket *new_clcsock = NULL; struct sock *lsk = &lsmc->sk; struct sock *new_sk; int rc = -EINVAL; release_sock(lsk); new_sk = smc_sock_alloc(sock_net(lsk), NULL, lsk->sk_protocol); if (!new_sk) { rc = -ENOMEM; lsk->sk_err = ENOMEM; *new_smc = NULL; lock_sock(lsk); goto out; } *new_smc = smc_sk(new_sk); mutex_lock(&lsmc->clcsock_release_lock); if (lsmc->clcsock) rc = kernel_accept(lsmc->clcsock, &new_clcsock, SOCK_NONBLOCK); mutex_unlock(&lsmc->clcsock_release_lock); lock_sock(lsk); if (rc < 0 && rc != -EAGAIN) lsk->sk_err = -rc; if (rc < 0 || lsk->sk_state == SMC_CLOSED) { new_sk->sk_prot->unhash(new_sk); if (new_clcsock) sock_release(new_clcsock); new_sk->sk_state = SMC_CLOSED; smc_sock_set_flag(new_sk, SOCK_DEAD); sock_put(new_sk); /* final */ *new_smc = NULL; goto out; } /* new clcsock has inherited the smc listen-specific sk_data_ready * function; switch it back to the original sk_data_ready function */ new_clcsock->sk->sk_data_ready = lsmc->clcsk_data_ready; /* if new clcsock has also inherited the fallback-specific callback * functions, switch them back to the original ones. */ if (lsmc->use_fallback) { if (lsmc->clcsk_state_change) new_clcsock->sk->sk_state_change = lsmc->clcsk_state_change; if (lsmc->clcsk_write_space) new_clcsock->sk->sk_write_space = lsmc->clcsk_write_space; if (lsmc->clcsk_error_report) new_clcsock->sk->sk_error_report = lsmc->clcsk_error_report; } (*new_smc)->clcsock = new_clcsock; out: return rc; } /* add a just created sock to the accept queue of the listen sock as * candidate for a following socket accept call from user space */ static void smc_accept_enqueue(struct sock *parent, struct sock *sk) { struct smc_sock *par = smc_sk(parent); sock_hold(sk); /* sock_put in smc_accept_unlink () */ spin_lock(&par->accept_q_lock); list_add_tail(&smc_sk(sk)->accept_q, &par->accept_q); spin_unlock(&par->accept_q_lock); sk_acceptq_added(parent); } /* remove a socket from the accept queue of its parental listening socket */ static void smc_accept_unlink(struct sock *sk) { struct smc_sock *par = smc_sk(sk)->listen_smc; spin_lock(&par->accept_q_lock); list_del_init(&smc_sk(sk)->accept_q); spin_unlock(&par->accept_q_lock); sk_acceptq_removed(&smc_sk(sk)->listen_smc->sk); sock_put(sk); /* sock_hold in smc_accept_enqueue */ } /* remove a sock from the accept queue to bind it to a new socket created * for a socket accept call from user space */ struct sock *smc_accept_dequeue(struct sock *parent, struct socket *new_sock) { struct smc_sock *isk, *n; struct sock *new_sk; list_for_each_entry_safe(isk, n, &smc_sk(parent)->accept_q, accept_q) { new_sk = (struct sock *)isk; smc_accept_unlink(new_sk); if (new_sk->sk_state == SMC_CLOSED) { new_sk->sk_prot->unhash(new_sk); if (isk->clcsock) { sock_release(isk->clcsock); isk->clcsock = NULL; } sock_put(new_sk); /* final */ continue; } if (new_sock) { sock_graft(new_sk, new_sock); new_sock->state = SS_CONNECTED; if (isk->use_fallback) { smc_sk(new_sk)->clcsock->file = new_sock->file; isk->clcsock->file->private_data = isk->clcsock; } } return new_sk; } return NULL; } /* clean up for 
a created but never accepted sock */ void smc_close_non_accepted(struct sock *sk) { struct smc_sock *smc = smc_sk(sk); sock_hold(sk); /* sock_put below */ lock_sock(sk); if (!sk->sk_lingertime) /* wait for peer closing */ WRITE_ONCE(sk->sk_lingertime, SMC_MAX_STREAM_WAIT_TIMEOUT); __smc_release(smc); release_sock(sk); sock_put(sk); /* sock_hold above */ sock_put(sk); /* final sock_put */ } static int smcr_serv_conf_first_link(struct smc_sock *smc) { struct smc_link *link = smc->conn.lnk; struct smc_llc_qentry *qentry; int rc; /* reg the sndbuf if it was vzalloced*/ if (smc->conn.sndbuf_desc->is_vm) { if (smcr_link_reg_buf(link, smc->conn.sndbuf_desc)) return SMC_CLC_DECL_ERR_REGBUF; } /* reg the rmb */ if (smcr_link_reg_buf(link, smc->conn.rmb_desc)) return SMC_CLC_DECL_ERR_REGBUF; /* send CONFIRM LINK request to client over the RoCE fabric */ rc = smc_llc_send_confirm_link(link, SMC_LLC_REQ); if (rc < 0) return SMC_CLC_DECL_TIMEOUT_CL; /* receive CONFIRM LINK response from client over the RoCE fabric */ qentry = smc_llc_wait(link->lgr, link, SMC_LLC_WAIT_TIME, SMC_LLC_CONFIRM_LINK); if (!qentry) { struct smc_clc_msg_decline dclc; rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc), SMC_CLC_DECLINE, CLC_WAIT_TIME_SHORT); return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc; } smc_llc_save_peer_uid(qentry); rc = smc_llc_eval_conf_link(qentry, SMC_LLC_RESP); smc_llc_flow_qentry_del(&link->lgr->llc_flow_lcl); if (rc) return SMC_CLC_DECL_RMBE_EC; /* confirm_rkey is implicit on 1st contact */ smc->conn.rmb_desc->is_conf_rkey = true; smc_llc_link_active(link); smcr_lgr_set_type(link->lgr, SMC_LGR_SINGLE); if (link->lgr->max_links > 1) { down_write(&link->lgr->llc_conf_mutex); /* initial contact - try to establish second link */ smc_llc_srv_add_link(link, NULL); up_write(&link->lgr->llc_conf_mutex); } return 0; } /* listen worker: finish */ static void smc_listen_out(struct smc_sock *new_smc) { struct smc_sock *lsmc = new_smc->listen_smc; struct sock *newsmcsk = &new_smc->sk; if (tcp_sk(new_smc->clcsock->sk)->syn_smc) atomic_dec(&lsmc->queued_smc_hs); if (lsmc->sk.sk_state == SMC_LISTEN) { lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING); smc_accept_enqueue(&lsmc->sk, newsmcsk); release_sock(&lsmc->sk); } else { /* no longer listening */ smc_close_non_accepted(newsmcsk); } /* Wake up accept */ lsmc->sk.sk_data_ready(&lsmc->sk); sock_put(&lsmc->sk); /* sock_hold in smc_tcp_listen_work */ } /* listen worker: finish in state connected */ static void smc_listen_out_connected(struct smc_sock *new_smc) { struct sock *newsmcsk = &new_smc->sk; if (newsmcsk->sk_state == SMC_INIT) newsmcsk->sk_state = SMC_ACTIVE; smc_listen_out(new_smc); } /* listen worker: finish in error state */ static void smc_listen_out_err(struct smc_sock *new_smc) { struct sock *newsmcsk = &new_smc->sk; struct net *net = sock_net(newsmcsk); this_cpu_inc(net->smc.smc_stats->srv_hshake_err_cnt); if (newsmcsk->sk_state == SMC_INIT) sock_put(&new_smc->sk); /* passive closing */ newsmcsk->sk_state = SMC_CLOSED; smc_listen_out(new_smc); } /* listen worker: decline and fall back if possible */ static void smc_listen_decline(struct smc_sock *new_smc, int reason_code, int local_first, u8 version) { /* RDMA setup failed, switch back to TCP */ smc_conn_abort(new_smc, local_first); if (reason_code < 0 || smc_switch_to_fallback(new_smc, reason_code)) { /* error, no fallback possible */ smc_listen_out_err(new_smc); return; } if (reason_code && reason_code != SMC_CLC_DECL_PEERDECL) { if (smc_clc_send_decline(new_smc, reason_code, version) < 0) { 
smc_listen_out_err(new_smc); return; } } smc_listen_out_connected(new_smc); } /* listen worker: version checking */ static int smc_listen_v2_check(struct smc_sock *new_smc, struct smc_clc_msg_proposal *pclc, struct smc_init_info *ini) { struct smc_clc_smcd_v2_extension *pclc_smcd_v2_ext; struct smc_clc_v2_extension *pclc_v2_ext; int rc = SMC_CLC_DECL_PEERNOSMC; ini->smc_type_v1 = pclc->hdr.typev1; ini->smc_type_v2 = pclc->hdr.typev2; ini->smcd_version = smcd_indicated(ini->smc_type_v1) ? SMC_V1 : 0; ini->smcr_version = smcr_indicated(ini->smc_type_v1) ? SMC_V1 : 0; if (pclc->hdr.version > SMC_V1) { if (smcd_indicated(ini->smc_type_v2)) ini->smcd_version |= SMC_V2; if (smcr_indicated(ini->smc_type_v2)) ini->smcr_version |= SMC_V2; } if (!(ini->smcd_version & SMC_V2) && !(ini->smcr_version & SMC_V2)) { rc = SMC_CLC_DECL_PEERNOSMC; goto out; } pclc_v2_ext = smc_get_clc_v2_ext(pclc); if (!pclc_v2_ext) { ini->smcd_version &= ~SMC_V2; ini->smcr_version &= ~SMC_V2; rc = SMC_CLC_DECL_NOV2EXT; goto out; } pclc_smcd_v2_ext = smc_get_clc_smcd_v2_ext(pclc_v2_ext); if (ini->smcd_version & SMC_V2) { if (!smc_ism_is_v2_capable()) { ini->smcd_version &= ~SMC_V2; rc = SMC_CLC_DECL_NOISM2SUPP; } else if (!pclc_smcd_v2_ext) { ini->smcd_version &= ~SMC_V2; rc = SMC_CLC_DECL_NOV2DEXT; } else if (!pclc_v2_ext->hdr.eid_cnt && !pclc_v2_ext->hdr.flag.seid) { ini->smcd_version &= ~SMC_V2; rc = SMC_CLC_DECL_NOUEID; } } if (ini->smcr_version & SMC_V2) { if (!pclc_v2_ext->hdr.eid_cnt) { ini->smcr_version &= ~SMC_V2; rc = SMC_CLC_DECL_NOUEID; } } ini->release_nr = pclc_v2_ext->hdr.flag.release; if (pclc_v2_ext->hdr.flag.release > SMC_RELEASE) ini->release_nr = SMC_RELEASE; out: if (!ini->smcd_version && !ini->smcr_version) return rc; return 0; } /* listen worker: check prefixes */ static int smc_listen_prfx_check(struct smc_sock *new_smc, struct smc_clc_msg_proposal *pclc) { struct smc_clc_msg_proposal_prefix *pclc_prfx; struct socket *newclcsock = new_smc->clcsock; if (pclc->hdr.typev1 == SMC_TYPE_N) return 0; pclc_prfx = smc_clc_proposal_get_prefix(pclc); if (smc_clc_prfx_match(newclcsock, pclc_prfx)) return SMC_CLC_DECL_DIFFPREFIX; return 0; } /* listen worker: initialize connection and buffers */ static int smc_listen_rdma_init(struct smc_sock *new_smc, struct smc_init_info *ini) { int rc; /* allocate connection / link group */ rc = smc_conn_create(new_smc, ini); if (rc) return rc; /* create send buffer and rmb */ if (smc_buf_create(new_smc, false)) { smc_conn_abort(new_smc, ini->first_contact_local); return SMC_CLC_DECL_MEM; } return 0; } /* listen worker: initialize connection and buffers for SMC-D */ static int smc_listen_ism_init(struct smc_sock *new_smc, struct smc_init_info *ini) { int rc; rc = smc_conn_create(new_smc, ini); if (rc) return rc; /* Create send and receive buffers */ rc = smc_buf_create(new_smc, true); if (rc) { smc_conn_abort(new_smc, ini->first_contact_local); return (rc == -ENOSPC) ? 
SMC_CLC_DECL_MAX_DMB : SMC_CLC_DECL_MEM; } return 0; } static bool smc_is_already_selected(struct smcd_dev *smcd, struct smc_init_info *ini, int matches) { int i; for (i = 0; i < matches; i++) if (smcd == ini->ism_dev[i]) return true; return false; } /* check for ISM devices matching proposed ISM devices */ static void smc_check_ism_v2_match(struct smc_init_info *ini, u16 proposed_chid, struct smcd_gid *proposed_gid, unsigned int *matches) { struct smcd_dev *smcd; list_for_each_entry(smcd, &smcd_dev_list.list, list) { if (smcd->going_away) continue; if (smc_is_already_selected(smcd, ini, *matches)) continue; if (smc_ism_get_chid(smcd) == proposed_chid && !smc_ism_cantalk(proposed_gid, ISM_RESERVED_VLANID, smcd)) { ini->ism_peer_gid[*matches].gid = proposed_gid->gid; if (__smc_ism_is_emulated(proposed_chid)) ini->ism_peer_gid[*matches].gid_ext = proposed_gid->gid_ext; /* non-Emulated-ISM's peer gid_ext remains 0. */ ini->ism_dev[*matches] = smcd; (*matches)++; break; } } } static void smc_find_ism_store_rc(u32 rc, struct smc_init_info *ini) { if (!ini->rc) ini->rc = rc; } static void smc_find_ism_v2_device_serv(struct smc_sock *new_smc, struct smc_clc_msg_proposal *pclc, struct smc_init_info *ini) { struct smc_clc_smcd_v2_extension *smcd_v2_ext; struct smc_clc_v2_extension *smc_v2_ext; struct smc_clc_msg_smcd *pclc_smcd; unsigned int matches = 0; struct smcd_gid smcd_gid; u8 smcd_version; u8 *eid = NULL; int i, rc; u16 chid; if (!(ini->smcd_version & SMC_V2) || !smcd_indicated(ini->smc_type_v2)) goto not_found; pclc_smcd = smc_get_clc_msg_smcd(pclc); smc_v2_ext = smc_get_clc_v2_ext(pclc); smcd_v2_ext = smc_get_clc_smcd_v2_ext(smc_v2_ext); mutex_lock(&smcd_dev_list.mutex); if (pclc_smcd->ism.chid) { /* check for ISM device matching proposed native ISM device */ smcd_gid.gid = ntohll(pclc_smcd->ism.gid); smcd_gid.gid_ext = 0; smc_check_ism_v2_match(ini, ntohs(pclc_smcd->ism.chid), &smcd_gid, &matches); } for (i = 0; i < smc_v2_ext->hdr.ism_gid_cnt; i++) { /* check for ISM devices matching proposed non-native ISM * devices */ smcd_gid.gid = ntohll(smcd_v2_ext->gidchid[i].gid); smcd_gid.gid_ext = 0; chid = ntohs(smcd_v2_ext->gidchid[i].chid); if (__smc_ism_is_emulated(chid)) { if ((i + 1) == smc_v2_ext->hdr.ism_gid_cnt || chid != ntohs(smcd_v2_ext->gidchid[i + 1].chid)) /* each Emulated-ISM device takes two GID-CHID * entries and CHID of the second entry repeats * that of the first entry. * * So check if the next GID-CHID entry exists * and both two entries' CHIDs are the same. 
*/ continue; smcd_gid.gid_ext = ntohll(smcd_v2_ext->gidchid[++i].gid); } smc_check_ism_v2_match(ini, chid, &smcd_gid, &matches); } mutex_unlock(&smcd_dev_list.mutex); if (!ini->ism_dev[0]) { smc_find_ism_store_rc(SMC_CLC_DECL_NOSMCD2DEV, ini); goto not_found; } smc_ism_get_system_eid(&eid); if (!smc_clc_match_eid(ini->negotiated_eid, smc_v2_ext, smcd_v2_ext->system_eid, eid)) goto not_found; /* separate - outside the smcd_dev_list.lock */ smcd_version = ini->smcd_version; for (i = 0; i < matches; i++) { ini->smcd_version = SMC_V2; ini->is_smcd = true; ini->ism_selected = i; rc = smc_listen_ism_init(new_smc, ini); if (rc) { smc_find_ism_store_rc(rc, ini); /* try next active ISM device */ continue; } return; /* matching and usable V2 ISM device found */ } /* no V2 ISM device could be initialized */ ini->smcd_version = smcd_version; /* restore original value */ ini->negotiated_eid[0] = 0; not_found: ini->smcd_version &= ~SMC_V2; ini->ism_dev[0] = NULL; ini->is_smcd = false; } static void smc_find_ism_v1_device_serv(struct smc_sock *new_smc, struct smc_clc_msg_proposal *pclc, struct smc_init_info *ini) { struct smc_clc_msg_smcd *pclc_smcd = smc_get_clc_msg_smcd(pclc); int rc = 0; /* check if ISM V1 is available */ if (!(ini->smcd_version & SMC_V1) || !smcd_indicated(ini->smc_type_v1)) goto not_found; ini->is_smcd = true; /* prepare ISM check */ ini->ism_peer_gid[0].gid = ntohll(pclc_smcd->ism.gid); ini->ism_peer_gid[0].gid_ext = 0; rc = smc_find_ism_device(new_smc, ini); if (rc) goto not_found; ini->ism_selected = 0; rc = smc_listen_ism_init(new_smc, ini); if (!rc) return; /* V1 ISM device found */ not_found: smc_find_ism_store_rc(rc, ini); ini->smcd_version &= ~SMC_V1; ini->ism_dev[0] = NULL; ini->is_smcd = false; } /* listen worker: register buffers */ static int smc_listen_rdma_reg(struct smc_sock *new_smc, bool local_first) { struct smc_connection *conn = &new_smc->conn; if (!local_first) { /* reg sendbufs if they were vzalloced */ if (conn->sndbuf_desc->is_vm) { if (smcr_lgr_reg_sndbufs(conn->lnk, conn->sndbuf_desc)) return SMC_CLC_DECL_ERR_REGBUF; } if (smcr_lgr_reg_rmbs(conn->lnk, conn->rmb_desc)) return SMC_CLC_DECL_ERR_REGBUF; } return 0; } static void smc_find_rdma_v2_device_serv(struct smc_sock *new_smc, struct smc_clc_msg_proposal *pclc, struct smc_init_info *ini) { struct smc_clc_v2_extension *smc_v2_ext; u8 smcr_version; int rc; if (!(ini->smcr_version & SMC_V2) || !smcr_indicated(ini->smc_type_v2)) goto not_found; smc_v2_ext = smc_get_clc_v2_ext(pclc); if (!smc_clc_match_eid(ini->negotiated_eid, smc_v2_ext, NULL, NULL)) goto not_found; /* prepare RDMA check */ memcpy(ini->peer_systemid, pclc->lcl.id_for_peer, SMC_SYSTEMID_LEN); memcpy(ini->peer_gid, smc_v2_ext->roce, SMC_GID_SIZE); memcpy(ini->peer_mac, pclc->lcl.mac, ETH_ALEN); ini->check_smcrv2 = true; ini->smcrv2.clc_sk = new_smc->clcsock->sk; ini->smcrv2.saddr = new_smc->clcsock->sk->sk_rcv_saddr; ini->smcrv2.daddr = smc_ib_gid_to_ipv4(smc_v2_ext->roce); rc = smc_find_rdma_device(new_smc, ini); if (rc) { smc_find_ism_store_rc(rc, ini); goto not_found; } if (!ini->smcrv2.uses_gateway) memcpy(ini->smcrv2.nexthop_mac, pclc->lcl.mac, ETH_ALEN); smcr_version = ini->smcr_version; ini->smcr_version = SMC_V2; rc = smc_listen_rdma_init(new_smc, ini); if (!rc) { rc = smc_listen_rdma_reg(new_smc, ini->first_contact_local); if (rc) smc_conn_abort(new_smc, ini->first_contact_local); } if (!rc) return; ini->smcr_version = smcr_version; smc_find_ism_store_rc(rc, ini); not_found: ini->smcr_version &= ~SMC_V2; ini->smcrv2.ib_dev_v2 = NULL; 
ini->check_smcrv2 = false; } static int smc_find_rdma_v1_device_serv(struct smc_sock *new_smc, struct smc_clc_msg_proposal *pclc, struct smc_init_info *ini) { int rc; if (!(ini->smcr_version & SMC_V1) || !smcr_indicated(ini->smc_type_v1)) return SMC_CLC_DECL_NOSMCDEV; /* prepare RDMA check */ memcpy(ini->peer_systemid, pclc->lcl.id_for_peer, SMC_SYSTEMID_LEN); memcpy(ini->peer_gid, pclc->lcl.gid, SMC_GID_SIZE); memcpy(ini->peer_mac, pclc->lcl.mac, ETH_ALEN); rc = smc_find_rdma_device(new_smc, ini); if (rc) { /* no RDMA device found */ return SMC_CLC_DECL_NOSMCDEV; } rc = smc_listen_rdma_init(new_smc, ini); if (rc) return rc; return smc_listen_rdma_reg(new_smc, ini->first_contact_local); } /* determine the local device matching to proposal */ static int smc_listen_find_device(struct smc_sock *new_smc, struct smc_clc_msg_proposal *pclc, struct smc_init_info *ini) { int prfx_rc; /* check for ISM device matching V2 proposed device */ smc_find_ism_v2_device_serv(new_smc, pclc, ini); if (ini->ism_dev[0]) return 0; /* check for matching IP prefix and subnet length (V1) */ prfx_rc = smc_listen_prfx_check(new_smc, pclc); if (prfx_rc) smc_find_ism_store_rc(prfx_rc, ini); /* get vlan id from IP device */ if (smc_vlan_by_tcpsk(new_smc->clcsock, ini)) return ini->rc ?: SMC_CLC_DECL_GETVLANERR; /* check for ISM device matching V1 proposed device */ if (!prfx_rc) smc_find_ism_v1_device_serv(new_smc, pclc, ini); if (ini->ism_dev[0]) return 0; if (!smcr_indicated(pclc->hdr.typev1) && !smcr_indicated(pclc->hdr.typev2)) /* skip RDMA and decline */ return ini->rc ?: SMC_CLC_DECL_NOSMCDDEV; /* check if RDMA V2 is available */ smc_find_rdma_v2_device_serv(new_smc, pclc, ini); if (ini->smcrv2.ib_dev_v2) return 0; /* check if RDMA V1 is available */ if (!prfx_rc) { int rc; rc = smc_find_rdma_v1_device_serv(new_smc, pclc, ini); smc_find_ism_store_rc(rc, ini); return (!rc) ? 
0 : ini->rc; } return prfx_rc; } /* listen worker: finish RDMA setup */ static int smc_listen_rdma_finish(struct smc_sock *new_smc, struct smc_clc_msg_accept_confirm *cclc, bool local_first, struct smc_init_info *ini) { struct smc_link *link = new_smc->conn.lnk; int reason_code = 0; if (local_first) smc_link_save_peer_info(link, cclc, ini); if (smc_rmb_rtoken_handling(&new_smc->conn, link, cclc)) return SMC_CLC_DECL_ERR_RTOK; if (local_first) { if (smc_ib_ready_link(link)) return SMC_CLC_DECL_ERR_RDYLNK; /* QP confirmation over RoCE fabric */ smc_llc_flow_initiate(link->lgr, SMC_LLC_FLOW_ADD_LINK); reason_code = smcr_serv_conf_first_link(new_smc); smc_llc_flow_stop(link->lgr, &link->lgr->llc_flow_lcl); } return reason_code; } /* setup for connection of server */ static void smc_listen_work(struct work_struct *work) { struct smc_sock *new_smc = container_of(work, struct smc_sock, smc_listen_work); struct socket *newclcsock = new_smc->clcsock; struct smc_clc_msg_accept_confirm *cclc; struct smc_clc_msg_proposal_area *buf; struct smc_clc_msg_proposal *pclc; struct smc_init_info *ini = NULL; u8 proposal_version = SMC_V1; u8 accept_version; int rc = 0; if (new_smc->listen_smc->sk.sk_state != SMC_LISTEN) return smc_listen_out_err(new_smc); if (new_smc->use_fallback) { smc_listen_out_connected(new_smc); return; } /* check if peer is smc capable */ if (!tcp_sk(newclcsock->sk)->syn_smc) { rc = smc_switch_to_fallback(new_smc, SMC_CLC_DECL_PEERNOSMC); if (rc) smc_listen_out_err(new_smc); else smc_listen_out_connected(new_smc); return; } /* do inband token exchange - * wait for and receive SMC Proposal CLC message */ buf = kzalloc(sizeof(*buf), GFP_KERNEL); if (!buf) { rc = SMC_CLC_DECL_MEM; goto out_decl; } pclc = (struct smc_clc_msg_proposal *)buf; rc = smc_clc_wait_msg(new_smc, pclc, sizeof(*buf), SMC_CLC_PROPOSAL, CLC_WAIT_TIME); if (rc) goto out_decl; if (pclc->hdr.version > SMC_V1) proposal_version = SMC_V2; /* IPSec connections opt out of SMC optimizations */ if (using_ipsec(new_smc)) { rc = SMC_CLC_DECL_IPSEC; goto out_decl; } ini = kzalloc(sizeof(*ini), GFP_KERNEL); if (!ini) { rc = SMC_CLC_DECL_MEM; goto out_decl; } /* initial version checking */ rc = smc_listen_v2_check(new_smc, pclc, ini); if (rc) goto out_decl; rc = smc_clc_srv_v2x_features_validate(new_smc, pclc, ini); if (rc) goto out_decl; mutex_lock(&smc_server_lgr_pending); smc_close_init(new_smc); smc_rx_init(new_smc); smc_tx_init(new_smc); /* determine ISM or RoCE device used for connection */ rc = smc_listen_find_device(new_smc, pclc, ini); if (rc) goto out_unlock; /* send SMC Accept CLC message */ accept_version = ini->is_smcd ? ini->smcd_version : ini->smcr_version; rc = smc_clc_send_accept(new_smc, ini->first_contact_local, accept_version, ini->negotiated_eid, ini); if (rc) goto out_unlock; /* SMC-D does not need this lock any more */ if (ini->is_smcd) mutex_unlock(&smc_server_lgr_pending); /* receive SMC Confirm CLC message */ memset(buf, 0, sizeof(*buf)); cclc = (struct smc_clc_msg_accept_confirm *)buf; rc = smc_clc_wait_msg(new_smc, cclc, sizeof(*buf), SMC_CLC_CONFIRM, CLC_WAIT_TIME); if (rc) { if (!ini->is_smcd) goto out_unlock; goto out_decl; } rc = smc_clc_v2x_features_confirm_check(cclc, ini); if (rc) { if (!ini->is_smcd) goto out_unlock; goto out_decl; } /* fce smc release version is needed in smc_listen_rdma_finish, * so save fce info here. 
*/ smc_conn_save_peer_info_fce(new_smc, cclc); /* finish worker */ if (!ini->is_smcd) { rc = smc_listen_rdma_finish(new_smc, cclc, ini->first_contact_local, ini); if (rc) goto out_unlock; mutex_unlock(&smc_server_lgr_pending); } smc_conn_save_peer_info(new_smc, cclc); if (ini->is_smcd && smc_ism_support_dmb_nocopy(new_smc->conn.lgr->smcd)) { rc = smcd_buf_attach(new_smc); if (rc) goto out_decl; } smc_listen_out_connected(new_smc); SMC_STAT_SERV_SUCC_INC(sock_net(newclcsock->sk), ini); goto out_free; out_unlock: mutex_unlock(&smc_server_lgr_pending); out_decl: smc_listen_decline(new_smc, rc, ini ? ini->first_contact_local : 0, proposal_version); out_free: kfree(ini); kfree(buf); } static void smc_tcp_listen_work(struct work_struct *work) { struct smc_sock *lsmc = container_of(work, struct smc_sock, tcp_listen_work); struct sock *lsk = &lsmc->sk; struct smc_sock *new_smc; int rc = 0; lock_sock(lsk); while (lsk->sk_state == SMC_LISTEN) { rc = smc_clcsock_accept(lsmc, &new_smc); if (rc) /* clcsock accept queue empty or error */ goto out; if (!new_smc) continue; if (tcp_sk(new_smc->clcsock->sk)->syn_smc) atomic_inc(&lsmc->queued_smc_hs); new_smc->listen_smc = lsmc; new_smc->use_fallback = lsmc->use_fallback; new_smc->fallback_rsn = lsmc->fallback_rsn; sock_hold(lsk); /* sock_put in smc_listen_work */ INIT_WORK(&new_smc->smc_listen_work, smc_listen_work); smc_copy_sock_settings_to_smc(new_smc); sock_hold(&new_smc->sk); /* sock_put in passive closing */ if (!queue_work(smc_hs_wq, &new_smc->smc_listen_work)) sock_put(&new_smc->sk); } out: release_sock(lsk); sock_put(&lsmc->sk); /* sock_hold in smc_clcsock_data_ready() */ } static void smc_clcsock_data_ready(struct sock *listen_clcsock) { struct smc_sock *lsmc; read_lock_bh(&listen_clcsock->sk_callback_lock); lsmc = smc_clcsock_user_data(listen_clcsock); if (!lsmc) goto out; lsmc->clcsk_data_ready(listen_clcsock); if (lsmc->sk.sk_state == SMC_LISTEN) { sock_hold(&lsmc->sk); /* sock_put in smc_tcp_listen_work() */ if (!queue_work(smc_tcp_ls_wq, &lsmc->tcp_listen_work)) sock_put(&lsmc->sk); } out: read_unlock_bh(&listen_clcsock->sk_callback_lock); } int smc_listen(struct socket *sock, int backlog) { struct sock *sk = sock->sk; struct smc_sock *smc; int rc; smc = smc_sk(sk); lock_sock(sk); rc = -EINVAL; if ((sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) || smc->connect_nonblock || sock->state != SS_UNCONNECTED) goto out; rc = 0; if (sk->sk_state == SMC_LISTEN) { sk->sk_max_ack_backlog = backlog; goto out; } /* some socket options are handled in core, so we could not apply * them to the clc socket -- copy smc socket options to clc socket */ smc_copy_sock_settings_to_clc(smc); if (!smc->use_fallback) tcp_sk(smc->clcsock->sk)->syn_smc = 1; /* save original sk_data_ready function and establish * smc-specific sk_data_ready function */ write_lock_bh(&smc->clcsock->sk->sk_callback_lock); smc->clcsock->sk->sk_user_data = (void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY); smc_clcsock_replace_cb(&smc->clcsock->sk->sk_data_ready, smc_clcsock_data_ready, &smc->clcsk_data_ready); write_unlock_bh(&smc->clcsock->sk->sk_callback_lock); /* save original ops */ smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops; smc->af_ops = *smc->ori_af_ops; smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock; inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops; if (smc->limit_smc_hs) tcp_sk(smc->clcsock->sk)->smc_hs_congested = smc_hs_congested; rc = kernel_listen(smc->clcsock, backlog); if (rc) { write_lock_bh(&smc->clcsock->sk->sk_callback_lock); 
smc_clcsock_restore_cb(&smc->clcsock->sk->sk_data_ready, &smc->clcsk_data_ready); smc->clcsock->sk->sk_user_data = NULL; write_unlock_bh(&smc->clcsock->sk->sk_callback_lock); goto out; } sk->sk_max_ack_backlog = backlog; sk->sk_ack_backlog = 0; sk->sk_state = SMC_LISTEN; out: release_sock(sk); return rc; } int smc_accept(struct socket *sock, struct socket *new_sock, struct proto_accept_arg *arg) { struct sock *sk = sock->sk, *nsk; DECLARE_WAITQUEUE(wait, current); struct smc_sock *lsmc; long timeo; int rc = 0; lsmc = smc_sk(sk); sock_hold(sk); /* sock_put below */ lock_sock(sk); if (lsmc->sk.sk_state != SMC_LISTEN) { rc = -EINVAL; release_sock(sk); goto out; } /* Wait for an incoming connection */ timeo = sock_rcvtimeo(sk, arg->flags & O_NONBLOCK); add_wait_queue_exclusive(sk_sleep(sk), &wait); while (!(nsk = smc_accept_dequeue(sk, new_sock))) { set_current_state(TASK_INTERRUPTIBLE); if (!timeo) { rc = -EAGAIN; break; } release_sock(sk); timeo = schedule_timeout(timeo); /* wakeup by sk_data_ready in smc_listen_work() */ sched_annotate_sleep(); lock_sock(sk); if (signal_pending(current)) { rc = sock_intr_errno(timeo); break; } } set_current_state(TASK_RUNNING); remove_wait_queue(sk_sleep(sk), &wait); if (!rc) rc = sock_error(nsk); release_sock(sk); if (rc) goto out; if (lsmc->sockopt_defer_accept && !(arg->flags & O_NONBLOCK)) { /* wait till data arrives on the socket */ timeo = msecs_to_jiffies(lsmc->sockopt_defer_accept * MSEC_PER_SEC); if (smc_sk(nsk)->use_fallback) { struct sock *clcsk = smc_sk(nsk)->clcsock->sk; lock_sock(clcsk); if (skb_queue_empty(&clcsk->sk_receive_queue)) sk_wait_data(clcsk, &timeo, NULL); release_sock(clcsk); } else if (!atomic_read(&smc_sk(nsk)->conn.bytes_to_rcv)) { lock_sock(nsk); smc_rx_wait(smc_sk(nsk), &timeo, smc_rx_data_available); release_sock(nsk); } } out: sock_put(sk); /* sock_hold above */ return rc; } int smc_getname(struct socket *sock, struct sockaddr *addr, int peer) { struct smc_sock *smc; if (peer && (sock->sk->sk_state != SMC_ACTIVE) && (sock->sk->sk_state != SMC_APPCLOSEWAIT1)) return -ENOTCONN; smc = smc_sk(sock->sk); return smc->clcsock->ops->getname(smc->clcsock, addr, peer); } int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { struct sock *sk = sock->sk; struct smc_sock *smc; int rc; smc = smc_sk(sk); lock_sock(sk); /* SMC does not support connect with fastopen */ if (msg->msg_flags & MSG_FASTOPEN) { /* not connected yet, fallback */ if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) { rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); if (rc) goto out; } else { rc = -EINVAL; goto out; } } else if ((sk->sk_state != SMC_ACTIVE) && (sk->sk_state != SMC_APPCLOSEWAIT1) && (sk->sk_state != SMC_INIT)) { rc = -EPIPE; goto out; } if (smc->use_fallback) { rc = smc->clcsock->ops->sendmsg(smc->clcsock, msg, len); } else { rc = smc_tx_sendmsg(smc, msg, len); SMC_STAT_TX_PAYLOAD(smc, len, rc); } out: release_sock(sk); return rc; } int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags) { struct sock *sk = sock->sk; struct smc_sock *smc; int rc = -ENOTCONN; smc = smc_sk(sk); lock_sock(sk); if (sk->sk_state == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) { /* socket was connected before, no more data to read */ rc = 0; goto out; } if ((sk->sk_state == SMC_INIT) || (sk->sk_state == SMC_LISTEN) || (sk->sk_state == SMC_CLOSED)) goto out; if (sk->sk_state == SMC_PEERFINCLOSEWAIT) { rc = 0; goto out; } if (smc->use_fallback) { rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags); } else { 
msg->msg_namelen = 0; rc = smc_rx_recvmsg(smc, msg, NULL, len, flags); SMC_STAT_RX_PAYLOAD(smc, rc, rc); } out: release_sock(sk); return rc; } static __poll_t smc_accept_poll(struct sock *parent) { struct smc_sock *isk = smc_sk(parent); __poll_t mask = 0; spin_lock(&isk->accept_q_lock); if (!list_empty(&isk->accept_q)) mask = EPOLLIN | EPOLLRDNORM; spin_unlock(&isk->accept_q_lock); return mask; } __poll_t smc_poll(struct file *file, struct socket *sock, poll_table *wait) { struct sock *sk = sock->sk; struct smc_sock *smc; __poll_t mask = 0; if (!sk) return EPOLLNVAL; smc = smc_sk(sock->sk); if (smc->use_fallback) { /* delegate to CLC child sock */ mask = smc->clcsock->ops->poll(file, smc->clcsock, wait); sk->sk_err = smc->clcsock->sk->sk_err; } else { if (sk->sk_state != SMC_CLOSED) sock_poll_wait(file, sock, wait); if (sk->sk_err) mask |= EPOLLERR; if ((sk->sk_shutdown == SHUTDOWN_MASK) || (sk->sk_state == SMC_CLOSED)) mask |= EPOLLHUP; if (sk->sk_state == SMC_LISTEN) { /* woken up by sk_data_ready in smc_listen_work() */ mask |= smc_accept_poll(sk); } else if (smc->use_fallback) { /* as result of connect_work()*/ mask |= smc->clcsock->ops->poll(file, smc->clcsock, wait); sk->sk_err = smc->clcsock->sk->sk_err; } else { if ((sk->sk_state != SMC_INIT && atomic_read(&smc->conn.sndbuf_space)) || sk->sk_shutdown & SEND_SHUTDOWN) { mask |= EPOLLOUT | EPOLLWRNORM; } else { sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk); set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); } if (atomic_read(&smc->conn.bytes_to_rcv)) mask |= EPOLLIN | EPOLLRDNORM; if (sk->sk_shutdown & RCV_SHUTDOWN) mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; if (sk->sk_state == SMC_APPCLOSEWAIT1) mask |= EPOLLIN; if (smc->conn.urg_state == SMC_URG_VALID) mask |= EPOLLPRI; } } return mask; } int smc_shutdown(struct socket *sock, int how) { struct sock *sk = sock->sk; bool do_shutdown = true; struct smc_sock *smc; int rc = -EINVAL; int old_state; int rc1 = 0; smc = smc_sk(sk); if ((how < SHUT_RD) || (how > SHUT_RDWR)) return rc; lock_sock(sk); if (sock->state == SS_CONNECTING) { if (sk->sk_state == SMC_ACTIVE) sock->state = SS_CONNECTED; else if (sk->sk_state == SMC_PEERCLOSEWAIT1 || sk->sk_state == SMC_PEERCLOSEWAIT2 || sk->sk_state == SMC_APPCLOSEWAIT1 || sk->sk_state == SMC_APPCLOSEWAIT2 || sk->sk_state == SMC_APPFINCLOSEWAIT) sock->state = SS_DISCONNECTING; } rc = -ENOTCONN; if ((sk->sk_state != SMC_ACTIVE) && (sk->sk_state != SMC_PEERCLOSEWAIT1) && (sk->sk_state != SMC_PEERCLOSEWAIT2) && (sk->sk_state != SMC_APPCLOSEWAIT1) && (sk->sk_state != SMC_APPCLOSEWAIT2) && (sk->sk_state != SMC_APPFINCLOSEWAIT)) goto out; if (smc->use_fallback) { rc = kernel_sock_shutdown(smc->clcsock, how); sk->sk_shutdown = smc->clcsock->sk->sk_shutdown; if (sk->sk_shutdown == SHUTDOWN_MASK) { sk->sk_state = SMC_CLOSED; sk->sk_socket->state = SS_UNCONNECTED; sock_put(sk); } goto out; } switch (how) { case SHUT_RDWR: /* shutdown in both directions */ old_state = sk->sk_state; rc = smc_close_active(smc); if (old_state == SMC_ACTIVE && sk->sk_state == SMC_PEERCLOSEWAIT1) do_shutdown = false; break; case SHUT_WR: rc = smc_close_shutdown_write(smc); break; case SHUT_RD: rc = 0; /* nothing more to do because peer is not involved */ break; } if (do_shutdown && smc->clcsock) rc1 = kernel_sock_shutdown(smc->clcsock, how); /* map sock_shutdown_cmd constants to sk_shutdown value range */ sk->sk_shutdown |= how + 1; if (sk->sk_state == SMC_CLOSED) sock->state = SS_UNCONNECTED; else sock->state = SS_DISCONNECTING; out: release_sock(sk); return rc ? 
rc : rc1; } static int __smc_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { struct smc_sock *smc; int val, len; smc = smc_sk(sock->sk); if (get_user(len, optlen)) return -EFAULT; len = min_t(int, len, sizeof(int)); if (len < 0) return -EINVAL; switch (optname) { case SMC_LIMIT_HS: val = smc->limit_smc_hs; break; default: return -EOPNOTSUPP; } if (put_user(len, optlen)) return -EFAULT; if (copy_to_user(optval, &val, len)) return -EFAULT; return 0; } static int __smc_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen) { struct sock *sk = sock->sk; struct smc_sock *smc; int val, rc; smc = smc_sk(sk); lock_sock(sk); switch (optname) { case SMC_LIMIT_HS: if (optlen < sizeof(int)) { rc = -EINVAL; break; } if (copy_from_sockptr(&val, optval, sizeof(int))) { rc = -EFAULT; break; } smc->limit_smc_hs = !!val; rc = 0; break; default: rc = -EOPNOTSUPP; break; } release_sock(sk); return rc; } int smc_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen) { struct sock *sk = sock->sk; struct smc_sock *smc; int val, rc; if (level == SOL_TCP && optname == TCP_ULP) return -EOPNOTSUPP; else if (level == SOL_SMC) return __smc_setsockopt(sock, level, optname, optval, optlen); smc = smc_sk(sk); /* generic setsockopts reaching us here always apply to the * CLC socket */ mutex_lock(&smc->clcsock_release_lock); if (!smc->clcsock) { mutex_unlock(&smc->clcsock_release_lock); return -EBADF; } if (unlikely(!smc->clcsock->ops->setsockopt)) rc = -EOPNOTSUPP; else rc = smc->clcsock->ops->setsockopt(smc->clcsock, level, optname, optval, optlen); if (smc->clcsock->sk->sk_err) { sk->sk_err = smc->clcsock->sk->sk_err; sk_error_report(sk); } mutex_unlock(&smc->clcsock_release_lock); if (optlen < sizeof(int)) return -EINVAL; if (copy_from_sockptr(&val, optval, sizeof(int))) return -EFAULT; lock_sock(sk); if (rc || smc->use_fallback) goto out; switch (optname) { case TCP_FASTOPEN: case TCP_FASTOPEN_CONNECT: case TCP_FASTOPEN_KEY: case TCP_FASTOPEN_NO_COOKIE: /* option not supported by SMC */ if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) { rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); } else { rc = -EINVAL; } break; case TCP_NODELAY: if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN && sk->sk_state != SMC_CLOSED) { if (val) { SMC_STAT_INC(smc, ndly_cnt); smc_tx_pending(&smc->conn); cancel_delayed_work(&smc->conn.tx_work); } } break; case TCP_CORK: if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN && sk->sk_state != SMC_CLOSED) { if (!val) { SMC_STAT_INC(smc, cork_cnt); smc_tx_pending(&smc->conn); cancel_delayed_work(&smc->conn.tx_work); } } break; case TCP_DEFER_ACCEPT: smc->sockopt_defer_accept = val; break; default: break; } out: release_sock(sk); return rc; } int smc_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { struct smc_sock *smc; int rc; if (level == SOL_SMC) return __smc_getsockopt(sock, level, optname, optval, optlen); smc = smc_sk(sock->sk); mutex_lock(&smc->clcsock_release_lock); if (!smc->clcsock) { mutex_unlock(&smc->clcsock_release_lock); return -EBADF; } /* socket options apply to the CLC socket */ if (unlikely(!smc->clcsock->ops->getsockopt)) { mutex_unlock(&smc->clcsock_release_lock); return -EOPNOTSUPP; } rc = smc->clcsock->ops->getsockopt(smc->clcsock, level, optname, optval, optlen); mutex_unlock(&smc->clcsock_release_lock); return rc; } int smc_ioctl(struct socket *sock, unsigned int 
cmd, unsigned long arg) { union smc_host_cursor cons, urg; struct smc_connection *conn; struct smc_sock *smc; int answ; smc = smc_sk(sock->sk); conn = &smc->conn; lock_sock(&smc->sk); if (smc->use_fallback) { if (!smc->clcsock) { release_sock(&smc->sk); return -EBADF; } answ = smc->clcsock->ops->ioctl(smc->clcsock, cmd, arg); release_sock(&smc->sk); return answ; } switch (cmd) { case SIOCINQ: /* same as FIONREAD */ if (smc->sk.sk_state == SMC_LISTEN) { release_sock(&smc->sk); return -EINVAL; } if (smc->sk.sk_state == SMC_INIT || smc->sk.sk_state == SMC_CLOSED) answ = 0; else answ = atomic_read(&smc->conn.bytes_to_rcv); break; case SIOCOUTQ: /* output queue size (not send + not acked) */ if (smc->sk.sk_state == SMC_LISTEN) { release_sock(&smc->sk); return -EINVAL; } if (smc->sk.sk_state == SMC_INIT || smc->sk.sk_state == SMC_CLOSED) answ = 0; else answ = smc->conn.sndbuf_desc->len - atomic_read(&smc->conn.sndbuf_space); break; case SIOCOUTQNSD: /* output queue size (not send only) */ if (smc->sk.sk_state == SMC_LISTEN) { release_sock(&smc->sk); return -EINVAL; } if (smc->sk.sk_state == SMC_INIT || smc->sk.sk_state == SMC_CLOSED) answ = 0; else answ = smc_tx_prepared_sends(&smc->conn); break; case SIOCATMARK: if (smc->sk.sk_state == SMC_LISTEN) { release_sock(&smc->sk); return -EINVAL; } if (smc->sk.sk_state == SMC_INIT || smc->sk.sk_state == SMC_CLOSED) { answ = 0; } else { smc_curs_copy(&cons, &conn->local_tx_ctrl.cons, conn); smc_curs_copy(&urg, &conn->urg_curs, conn); answ = smc_curs_diff(conn->rmb_desc->len, &cons, &urg) == 1; } break; default: release_sock(&smc->sk); return -ENOIOCTLCMD; } release_sock(&smc->sk); return put_user(answ, (int __user *)arg); } /* Map the affected portions of the rmbe into an spd, note the number of bytes * to splice in conn->splice_pending, and press 'go'. Delays consumer cursor * updates till whenever a respective page has been fully processed. * Note that subsequent recv() calls have to wait till all splice() processing * completed. 
*/ ssize_t smc_splice_read(struct socket *sock, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { struct sock *sk = sock->sk; struct smc_sock *smc; int rc = -ENOTCONN; smc = smc_sk(sk); lock_sock(sk); if (sk->sk_state == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) { /* socket was connected before, no more data to read */ rc = 0; goto out; } if (sk->sk_state == SMC_INIT || sk->sk_state == SMC_LISTEN || sk->sk_state == SMC_CLOSED) goto out; if (sk->sk_state == SMC_PEERFINCLOSEWAIT) { rc = 0; goto out; } if (smc->use_fallback) { rc = smc->clcsock->ops->splice_read(smc->clcsock, ppos, pipe, len, flags); } else { if (*ppos) { rc = -ESPIPE; goto out; } if (flags & SPLICE_F_NONBLOCK) flags = MSG_DONTWAIT; else flags = 0; SMC_STAT_INC(smc, splice_cnt); rc = smc_rx_recvmsg(smc, NULL, pipe, len, flags); } out: release_sock(sk); return rc; } /* must look like tcp */ static const struct proto_ops smc_sock_ops = { .family = PF_SMC, .owner = THIS_MODULE, .release = smc_release, .bind = smc_bind, .connect = smc_connect, .socketpair = sock_no_socketpair, .accept = smc_accept, .getname = smc_getname, .poll = smc_poll, .ioctl = smc_ioctl, .listen = smc_listen, .shutdown = smc_shutdown, .setsockopt = smc_setsockopt, .getsockopt = smc_getsockopt, .sendmsg = smc_sendmsg, .recvmsg = smc_recvmsg, .mmap = sock_no_mmap, .splice_read = smc_splice_read, }; int smc_create_clcsk(struct net *net, struct sock *sk, int family) { struct smc_sock *smc = smc_sk(sk); int rc; rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &smc->clcsock); if (rc) { sk_common_release(sk); return rc; } /* smc_clcsock_release() does not wait smc->clcsock->sk's * destruction; its sk_state might not be TCP_CLOSE after * smc->sk is close()d, and TCP timers can be fired later, * which need net ref. */ sk = smc->clcsock->sk; __netns_tracker_free(net, &sk->ns_tracker, false); sk->sk_net_refcnt = 1; get_net_track(net, &sk->ns_tracker, GFP_KERNEL); sock_inuse_add(net, 1); return 0; } static int __smc_create(struct net *net, struct socket *sock, int protocol, int kern, struct socket *clcsock) { int family = (protocol == SMCPROTO_SMC6) ? 
PF_INET6 : PF_INET; struct smc_sock *smc; struct sock *sk; int rc; rc = -ESOCKTNOSUPPORT; if (sock->type != SOCK_STREAM) goto out; rc = -EPROTONOSUPPORT; if (protocol != SMCPROTO_SMC && protocol != SMCPROTO_SMC6) goto out; rc = -ENOBUFS; sock->ops = &smc_sock_ops; sock->state = SS_UNCONNECTED; sk = smc_sock_alloc(net, sock, protocol); if (!sk) goto out; /* create internal TCP socket for CLC handshake and fallback */ smc = smc_sk(sk); rc = 0; if (clcsock) smc->clcsock = clcsock; else rc = smc_create_clcsk(net, sk, family); out: return rc; } static int smc_create(struct net *net, struct socket *sock, int protocol, int kern) { return __smc_create(net, sock, protocol, kern, NULL); } static const struct net_proto_family smc_sock_family_ops = { .family = PF_SMC, .owner = THIS_MODULE, .create = smc_create, }; static int smc_ulp_init(struct sock *sk) { struct socket *tcp = sk->sk_socket; struct net *net = sock_net(sk); struct socket *smcsock; int protocol, ret; /* only TCP can be replaced */ if (tcp->type != SOCK_STREAM || sk->sk_protocol != IPPROTO_TCP || (sk->sk_family != AF_INET && sk->sk_family != AF_INET6)) return -ESOCKTNOSUPPORT; /* don't handle wq now */ if (tcp->state != SS_UNCONNECTED || !tcp->file || tcp->wq.fasync_list) return -ENOTCONN; if (sk->sk_family == AF_INET) protocol = SMCPROTO_SMC; else protocol = SMCPROTO_SMC6; smcsock = sock_alloc(); if (!smcsock) return -ENFILE; smcsock->type = SOCK_STREAM; __module_get(THIS_MODULE); /* tried in __tcp_ulp_find_autoload */ ret = __smc_create(net, smcsock, protocol, 1, tcp); if (ret) { sock_release(smcsock); /* module_put() which ops won't be NULL */ return ret; } /* replace tcp socket to smc */ smcsock->file = tcp->file; smcsock->file->private_data = smcsock; smcsock->file->f_inode = SOCK_INODE(smcsock); /* replace inode when sock_close */ smcsock->file->f_path.dentry->d_inode = SOCK_INODE(smcsock); /* dput() in __fput */ tcp->file = NULL; return ret; } static void smc_ulp_clone(const struct request_sock *req, struct sock *newsk, const gfp_t priority) { struct inet_connection_sock *icsk = inet_csk(newsk); /* don't inherit ulp ops to child when listen */ icsk->icsk_ulp_ops = NULL; } static struct tcp_ulp_ops smc_ulp_ops __read_mostly = { .name = "smc", .owner = THIS_MODULE, .init = smc_ulp_init, .clone = smc_ulp_clone, }; unsigned int smc_net_id; static __net_init int smc_net_init(struct net *net) { int rc; rc = smc_sysctl_net_init(net); if (rc) return rc; return smc_pnet_net_init(net); } static void __net_exit smc_net_exit(struct net *net) { smc_sysctl_net_exit(net); smc_pnet_net_exit(net); } static __net_init int smc_net_stat_init(struct net *net) { return smc_stats_init(net); } static void __net_exit smc_net_stat_exit(struct net *net) { smc_stats_exit(net); } static struct pernet_operations smc_net_ops = { .init = smc_net_init, .exit = smc_net_exit, .id = &smc_net_id, .size = sizeof(struct smc_net), }; static struct pernet_operations smc_net_stat_ops = { .init = smc_net_stat_init, .exit = smc_net_stat_exit, }; static int __init smc_init(void) { int rc; rc = register_pernet_subsys(&smc_net_ops); if (rc) return rc; rc = register_pernet_subsys(&smc_net_stat_ops); if (rc) goto out_pernet_subsys; rc = smc_ism_init(); if (rc) goto out_pernet_subsys_stat; smc_clc_init(); rc = smc_nl_init(); if (rc) goto out_ism; rc = smc_pnet_init(); if (rc) goto out_nl; rc = -ENOMEM; smc_tcp_ls_wq = alloc_workqueue("smc_tcp_ls_wq", 0, 0); if (!smc_tcp_ls_wq) goto out_pnet; smc_hs_wq = alloc_workqueue("smc_hs_wq", 0, 0); if (!smc_hs_wq) goto out_alloc_tcp_ls_wq; 
smc_close_wq = alloc_workqueue("smc_close_wq", 0, 0); if (!smc_close_wq) goto out_alloc_hs_wq; rc = smc_core_init(); if (rc) { pr_err("%s: smc_core_init fails with %d\n", __func__, rc); goto out_alloc_wqs; } rc = smc_llc_init(); if (rc) { pr_err("%s: smc_llc_init fails with %d\n", __func__, rc); goto out_core; } rc = smc_cdc_init(); if (rc) { pr_err("%s: smc_cdc_init fails with %d\n", __func__, rc); goto out_core; } rc = proto_register(&smc_proto, 1); if (rc) { pr_err("%s: proto_register(v4) fails with %d\n", __func__, rc); goto out_core; } rc = proto_register(&smc_proto6, 1); if (rc) { pr_err("%s: proto_register(v6) fails with %d\n", __func__, rc); goto out_proto; } rc = sock_register(&smc_sock_family_ops); if (rc) { pr_err("%s: sock_register fails with %d\n", __func__, rc); goto out_proto6; } INIT_HLIST_HEAD(&smc_v4_hashinfo.ht); INIT_HLIST_HEAD(&smc_v6_hashinfo.ht); rc = smc_ib_register_client(); if (rc) { pr_err("%s: ib_register fails with %d\n", __func__, rc); goto out_sock; } rc = smc_loopback_init(); if (rc) { pr_err("%s: smc_loopback_init fails with %d\n", __func__, rc); goto out_ib; } rc = tcp_register_ulp(&smc_ulp_ops); if (rc) { pr_err("%s: tcp_ulp_register fails with %d\n", __func__, rc); goto out_lo; } rc = smc_inet_init(); if (rc) { pr_err("%s: smc_inet_init fails with %d\n", __func__, rc); goto out_ulp; } static_branch_enable(&tcp_have_smc); return 0; out_ulp: tcp_unregister_ulp(&smc_ulp_ops); out_lo: smc_loopback_exit(); out_ib: smc_ib_unregister_client(); out_sock: sock_unregister(PF_SMC); out_proto6: proto_unregister(&smc_proto6); out_proto: proto_unregister(&smc_proto); out_core: smc_core_exit(); out_alloc_wqs: destroy_workqueue(smc_close_wq); out_alloc_hs_wq: destroy_workqueue(smc_hs_wq); out_alloc_tcp_ls_wq: destroy_workqueue(smc_tcp_ls_wq); out_pnet: smc_pnet_exit(); out_nl: smc_nl_exit(); out_ism: smc_clc_exit(); smc_ism_exit(); out_pernet_subsys_stat: unregister_pernet_subsys(&smc_net_stat_ops); out_pernet_subsys: unregister_pernet_subsys(&smc_net_ops); return rc; } static void __exit smc_exit(void) { static_branch_disable(&tcp_have_smc); smc_inet_exit(); tcp_unregister_ulp(&smc_ulp_ops); sock_unregister(PF_SMC); smc_core_exit(); smc_loopback_exit(); smc_ib_unregister_client(); smc_ism_exit(); destroy_workqueue(smc_close_wq); destroy_workqueue(smc_tcp_ls_wq); destroy_workqueue(smc_hs_wq); proto_unregister(&smc_proto6); proto_unregister(&smc_proto); smc_pnet_exit(); smc_nl_exit(); smc_clc_exit(); unregister_pernet_subsys(&smc_net_stat_ops); unregister_pernet_subsys(&smc_net_ops); rcu_barrier(); } module_init(smc_init); module_exit(smc_exit); MODULE_AUTHOR("Ursula Braun <ubraun@linux.vnet.ibm.com>"); MODULE_DESCRIPTION("smc socket address family"); MODULE_LICENSE("GPL"); MODULE_ALIAS_NETPROTO(PF_SMC); MODULE_ALIAS_TCP_ULP("smc"); /* 256 for IPPROTO_SMC and 1 for SOCK_STREAM */ MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET, 256, 1); #if IS_ENABLED(CONFIG_IPV6) MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET6, 256, 1); #endif /* CONFIG_IPV6 */ MODULE_ALIAS_GENL_FAMILY(SMC_GENL_FAMILY_NAME); |
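The registrations above (proto_register(), sock_register(&smc_sock_family_ops), tcp_register_ulp()) are what expose the PF_SMC address family and the "smc" TCP ULP to user space. As a minimal illustrative sketch only, and not part of this source file: a client can open an SMC stream socket and connect it exactly like a TCP socket, and the kernel transparently falls back to plain TCP when SMC cannot be used. The AF_SMC and SMCPROTO_SMC fallback defines below are assumptions for toolchains whose headers do not provide them, and the peer address and port are placeholders.

/* Illustrative user-space sketch, not part of the kernel source above. */
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef AF_SMC
#define AF_SMC 43		/* assumption: value from include/linux/socket.h */
#endif
#ifndef SMCPROTO_SMC
#define SMCPROTO_SMC 0		/* assumption: SMC over IPv4; SMCPROTO_SMC6 (1) for IPv6 */
#endif

int main(void)
{
	struct sockaddr_in peer = { 0 };
	int fd = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC);

	if (fd < 0) {
		perror("socket(AF_SMC)");
		return 1;
	}

	/* SMC sockets are addressed with ordinary IP addresses. */
	peer.sin_family = AF_INET;
	peer.sin_port = htons(12345);				/* placeholder port */
	inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);	/* placeholder address */

	if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0)
		perror("connect");
	close(fd);
	return 0;
}

If the peer is not SMC-capable, the connection still completes, because smc_switch_to_fallback() hands the traffic to the underlying CLC/TCP socket.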
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 1991, 1992 Linus Torvalds * * Added support for a Unix98-style ptmx device. * -- C. Scott Ananian <cananian@alumni.princeton.edu>, 14-Jan-1998 * */ #include <linux/module.h> #include <linux/errno.h> #include <linux/interrupt.h> #include <linux/tty.h> #include <linux/tty_flip.h> #include <linux/fcntl.h> #include <linux/sched/signal.h> #include <linux/string.h> #include <linux/major.h> #include <linux/mm.h> #include <linux/init.h> #include <linux/device.h> #include <linux/uaccess.h> #include <linux/bitops.h> #include <linux/devpts_fs.h> #include <linux/slab.h> #include <linux/mutex.h> #include <linux/poll.h> #include <linux/mount.h> #include <linux/file.h> #include <linux/ioctl.h> #include <linux/compat.h> #include "tty.h" #undef TTY_DEBUG_HANGUP #ifdef TTY_DEBUG_HANGUP # define tty_debug_hangup(tty, f, args...) tty_debug(tty, f, ##args) #else # define tty_debug_hangup(tty, f, args...) do {} while (0) #endif #ifdef CONFIG_UNIX98_PTYS static struct tty_driver *ptm_driver; static struct tty_driver *pts_driver; static DEFINE_MUTEX(devpts_mutex); #endif static void pty_close(struct tty_struct *tty, struct file *filp) { if (tty->driver->subtype == PTY_TYPE_MASTER) WARN_ON(tty->count > 1); else { if (tty_io_error(tty)) return; if (tty->count > 2) return; } set_bit(TTY_IO_ERROR, &tty->flags); wake_up_interruptible(&tty->read_wait); wake_up_interruptible(&tty->write_wait); spin_lock_irq(&tty->ctrl.lock); tty->ctrl.packet = false; spin_unlock_irq(&tty->ctrl.lock); /* Review - krefs on tty_link ?? */ if (!tty->link) return; set_bit(TTY_OTHER_CLOSED, &tty->link->flags); wake_up_interruptible(&tty->link->read_wait); wake_up_interruptible(&tty->link->write_wait); if (tty->driver->subtype == PTY_TYPE_MASTER) { set_bit(TTY_OTHER_CLOSED, &tty->flags); #ifdef CONFIG_UNIX98_PTYS if (tty->driver == ptm_driver) { mutex_lock(&devpts_mutex); if (tty->link->driver_data) devpts_pty_kill(tty->link->driver_data); mutex_unlock(&devpts_mutex); } #endif tty_vhangup(tty->link); } } /* * The unthrottle routine is called by the line discipline to signal * that it can receive more characters. For PTY's, the TTY_THROTTLED * flag is always set, to force the line discipline to always call the * unthrottle routine when there are fewer than TTY_THRESHOLD_UNTHROTTLE * characters in the queue. This is necessary since each time this * happens, we need to wake up any sleeping processes that could be * (1) trying to send data to the pty, or (2) waiting in wait_until_sent() * for the pty buffer to be drained. */ static void pty_unthrottle(struct tty_struct *tty) { tty_wakeup(tty->link); set_bit(TTY_THROTTLED, &tty->flags); } /** * pty_write - write to a pty * @tty: the tty we write from * @buf: kernel buffer of data * @c: bytes to write * * Our "hardware" write method. Data is coming from the ldisc which * may be in a non sleeping state. We simply throw this at the other * end of the link as if we were an IRQ handler receiving stuff for * the other side of the pty/tty pair.
*/ static ssize_t pty_write(struct tty_struct *tty, const u8 *buf, size_t c) { struct tty_struct *to = tty->link; if (tty->flow.stopped || !c) return 0; return tty_insert_flip_string_and_push_buffer(to->port, buf, c); } /** * pty_write_room - write space * @tty: tty we are writing from * * Report how many bytes the ldisc can send into the queue for * the other device. */ static unsigned int pty_write_room(struct tty_struct *tty) { if (tty->flow.stopped) return 0; return tty_buffer_space_avail(tty->link->port); } /* Set the lock flag on a pty */ static int pty_set_lock(struct tty_struct *tty, int __user *arg) { int val; if (get_user(val, arg)) return -EFAULT; if (val) set_bit(TTY_PTY_LOCK, &tty->flags); else clear_bit(TTY_PTY_LOCK, &tty->flags); return 0; } static int pty_get_lock(struct tty_struct *tty, int __user *arg) { int locked = test_bit(TTY_PTY_LOCK, &tty->flags); return put_user(locked, arg); } /* Set the packet mode on a pty */ static int pty_set_pktmode(struct tty_struct *tty, int __user *arg) { int pktmode; if (get_user(pktmode, arg)) return -EFAULT; spin_lock_irq(&tty->ctrl.lock); if (pktmode) { if (!tty->ctrl.packet) { tty->link->ctrl.pktstatus = 0; smp_mb(); tty->ctrl.packet = true; } } else tty->ctrl.packet = false; spin_unlock_irq(&tty->ctrl.lock); return 0; } /* Get the packet mode of a pty */ static int pty_get_pktmode(struct tty_struct *tty, int __user *arg) { int pktmode = tty->ctrl.packet; return put_user(pktmode, arg); } /* Send a signal to the slave */ static int pty_signal(struct tty_struct *tty, int sig) { struct pid *pgrp; if (sig != SIGINT && sig != SIGQUIT && sig != SIGTSTP) return -EINVAL; if (tty->link) { pgrp = tty_get_pgrp(tty->link); if (pgrp) kill_pgrp(pgrp, sig, 1); put_pid(pgrp); } return 0; } static void pty_flush_buffer(struct tty_struct *tty) { struct tty_struct *to = tty->link; if (!to) return; tty_buffer_flush(to, NULL); if (to->ctrl.packet) { spin_lock_irq(&tty->ctrl.lock); tty->ctrl.pktstatus |= TIOCPKT_FLUSHWRITE; wake_up_interruptible(&to->read_wait); spin_unlock_irq(&tty->ctrl.lock); } } static int pty_open(struct tty_struct *tty, struct file *filp) { if (!tty || !tty->link) return -ENODEV; if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) goto out; if (test_bit(TTY_PTY_LOCK, &tty->link->flags)) goto out; if (tty->driver->subtype == PTY_TYPE_SLAVE && tty->link->count != 1) goto out; clear_bit(TTY_IO_ERROR, &tty->flags); clear_bit(TTY_OTHER_CLOSED, &tty->link->flags); set_bit(TTY_THROTTLED, &tty->flags); return 0; out: set_bit(TTY_IO_ERROR, &tty->flags); return -EIO; } static void pty_set_termios(struct tty_struct *tty, const struct ktermios *old_termios) { /* See if packet mode change of state. 
*/ if (tty->link && tty->link->ctrl.packet) { int extproc = (old_termios->c_lflag & EXTPROC) | L_EXTPROC(tty); int old_flow = ((old_termios->c_iflag & IXON) && (old_termios->c_cc[VSTOP] == '\023') && (old_termios->c_cc[VSTART] == '\021')); int new_flow = (I_IXON(tty) && STOP_CHAR(tty) == '\023' && START_CHAR(tty) == '\021'); if ((old_flow != new_flow) || extproc) { spin_lock_irq(&tty->ctrl.lock); if (old_flow != new_flow) { tty->ctrl.pktstatus &= ~(TIOCPKT_DOSTOP | TIOCPKT_NOSTOP); if (new_flow) tty->ctrl.pktstatus |= TIOCPKT_DOSTOP; else tty->ctrl.pktstatus |= TIOCPKT_NOSTOP; } if (extproc) tty->ctrl.pktstatus |= TIOCPKT_IOCTL; spin_unlock_irq(&tty->ctrl.lock); wake_up_interruptible(&tty->link->read_wait); } } tty->termios.c_cflag &= ~(CSIZE | PARENB); tty->termios.c_cflag |= (CS8 | CREAD); } /** * pty_resize - resize event * @tty: tty being resized * @ws: window size being set. * * Update the termios variables and send the necessary signals to * peform a terminal resize correctly */ static int pty_resize(struct tty_struct *tty, struct winsize *ws) { struct pid *pgrp, *rpgrp; struct tty_struct *pty = tty->link; /* For a PTY we need to lock the tty side */ mutex_lock(&tty->winsize_mutex); if (!memcmp(ws, &tty->winsize, sizeof(*ws))) goto done; /* Signal the foreground process group of both ptys */ pgrp = tty_get_pgrp(tty); rpgrp = tty_get_pgrp(pty); if (pgrp) kill_pgrp(pgrp, SIGWINCH, 1); if (rpgrp != pgrp && rpgrp) kill_pgrp(rpgrp, SIGWINCH, 1); put_pid(pgrp); put_pid(rpgrp); tty->winsize = *ws; pty->winsize = *ws; /* Never used so will go away soon */ done: mutex_unlock(&tty->winsize_mutex); return 0; } /** * pty_start - start() handler * pty_stop - stop() handler * @tty: tty being flow-controlled * * Propagates the TIOCPKT status to the master pty. * * NB: only the master pty can be in packet mode so only the slave * needs start()/stop() handlers */ static void pty_start(struct tty_struct *tty) { unsigned long flags; if (tty->link && tty->link->ctrl.packet) { spin_lock_irqsave(&tty->ctrl.lock, flags); tty->ctrl.pktstatus &= ~TIOCPKT_STOP; tty->ctrl.pktstatus |= TIOCPKT_START; spin_unlock_irqrestore(&tty->ctrl.lock, flags); wake_up_interruptible_poll(&tty->link->read_wait, EPOLLIN); } } static void pty_stop(struct tty_struct *tty) { unsigned long flags; if (tty->link && tty->link->ctrl.packet) { spin_lock_irqsave(&tty->ctrl.lock, flags); tty->ctrl.pktstatus &= ~TIOCPKT_START; tty->ctrl.pktstatus |= TIOCPKT_STOP; spin_unlock_irqrestore(&tty->ctrl.lock, flags); wake_up_interruptible_poll(&tty->link->read_wait, EPOLLIN); } } /** * pty_common_install - set up the pty pair * @driver: the pty driver * @tty: the tty being instantiated * @legacy: true if this is BSD style * * Perform the initial set up for the tty/pty pair. Called from the * tty layer when the port is first opened. 
* * Locking: the caller must hold the tty_mutex */ static int pty_common_install(struct tty_driver *driver, struct tty_struct *tty, bool legacy) { struct tty_struct *o_tty; struct tty_port *ports[2]; int idx = tty->index; int retval = -ENOMEM; /* Opening the slave first has always returned -EIO */ if (driver->subtype != PTY_TYPE_MASTER) return -EIO; ports[0] = kmalloc(sizeof **ports, GFP_KERNEL); ports[1] = kmalloc(sizeof **ports, GFP_KERNEL); if (!ports[0] || !ports[1]) goto err; if (!try_module_get(driver->other->owner)) { /* This cannot in fact currently happen */ goto err; } o_tty = alloc_tty_struct(driver->other, idx); if (!o_tty) goto err_put_module; tty_set_lock_subclass(o_tty); lockdep_set_subclass(&o_tty->termios_rwsem, TTY_LOCK_SLAVE); if (legacy) { /* We always use new tty termios data so we can do this the easy way .. */ tty_init_termios(tty); tty_init_termios(o_tty); driver->other->ttys[idx] = o_tty; driver->ttys[idx] = tty; } else { memset(&tty->termios_locked, 0, sizeof(tty->termios_locked)); tty->termios = driver->init_termios; memset(&o_tty->termios_locked, 0, sizeof(tty->termios_locked)); o_tty->termios = driver->other->init_termios; } /* * Everything allocated ... set up the o_tty structure. */ tty_driver_kref_get(driver->other); /* Establish the links in both directions */ tty->link = o_tty; o_tty->link = tty; tty_port_init(ports[0]); tty_port_init(ports[1]); tty_buffer_set_limit(ports[0], 8192); tty_buffer_set_limit(ports[1], 8192); o_tty->port = ports[0]; tty->port = ports[1]; o_tty->port->itty = o_tty; tty_buffer_set_lock_subclass(o_tty->port); tty_driver_kref_get(driver); tty->count++; o_tty->count++; return 0; err_put_module: module_put(driver->other->owner); err: kfree(ports[0]); kfree(ports[1]); return retval; } static void pty_cleanup(struct tty_struct *tty) { tty_port_put(tty->port); } /* Traditional BSD devices */ #ifdef CONFIG_LEGACY_PTYS static int pty_install(struct tty_driver *driver, struct tty_struct *tty) { return pty_common_install(driver, tty, true); } static void pty_remove(struct tty_driver *driver, struct tty_struct *tty) { struct tty_struct *pair = tty->link; driver->ttys[tty->index] = NULL; if (pair) pair->driver->ttys[pair->index] = NULL; } static int pty_bsd_ioctl(struct tty_struct *tty, unsigned int cmd, unsigned long arg) { switch (cmd) { case TIOCSPTLCK: /* Set PT Lock (disallow slave open) */ return pty_set_lock(tty, (int __user *) arg); case TIOCGPTLCK: /* Get PT Lock status */ return pty_get_lock(tty, (int __user *)arg); case TIOCPKT: /* Set PT packet mode */ return pty_set_pktmode(tty, (int __user *)arg); case TIOCGPKT: /* Get PT packet mode */ return pty_get_pktmode(tty, (int __user *)arg); case TIOCSIG: /* Send signal to other side of pty */ return pty_signal(tty, (int) arg); case TIOCGPTN: /* TTY returns ENOTTY, but glibc expects EINVAL here */ return -EINVAL; } return -ENOIOCTLCMD; } #ifdef CONFIG_COMPAT static long pty_bsd_compat_ioctl(struct tty_struct *tty, unsigned int cmd, unsigned long arg) { /* * PTY ioctls don't require any special translation between 32-bit and * 64-bit userspace, they are already compatible. */ return pty_bsd_ioctl(tty, cmd, (unsigned long)compat_ptr(arg)); } #else #define pty_bsd_compat_ioctl NULL #endif static int legacy_count = CONFIG_LEGACY_PTY_COUNT; /* * not really modular, but the easiest way to keep compat with existing * bootargs behaviour is to continue using module_param here. */ module_param(legacy_count, int, 0); /* * The master side of a pty can do TIOCSPTLCK and thus * has pty_bsd_ioctl. 
*/ static const struct tty_operations master_pty_ops_bsd = { .install = pty_install, .open = pty_open, .close = pty_close, .write = pty_write, .write_room = pty_write_room, .flush_buffer = pty_flush_buffer, .unthrottle = pty_unthrottle, .ioctl = pty_bsd_ioctl, .compat_ioctl = pty_bsd_compat_ioctl, .cleanup = pty_cleanup, .resize = pty_resize, .remove = pty_remove }; static const struct tty_operations slave_pty_ops_bsd = { .install = pty_install, .open = pty_open, .close = pty_close, .write = pty_write, .write_room = pty_write_room, .flush_buffer = pty_flush_buffer, .unthrottle = pty_unthrottle, .set_termios = pty_set_termios, .cleanup = pty_cleanup, .resize = pty_resize, .start = pty_start, .stop = pty_stop, .remove = pty_remove }; static void __init legacy_pty_init(void) { struct tty_driver *pty_driver, *pty_slave_driver; if (legacy_count <= 0) return; pty_driver = tty_alloc_driver(legacy_count, TTY_DRIVER_RESET_TERMIOS | TTY_DRIVER_REAL_RAW | TTY_DRIVER_DYNAMIC_ALLOC); if (IS_ERR(pty_driver)) panic("Couldn't allocate pty driver"); pty_slave_driver = tty_alloc_driver(legacy_count, TTY_DRIVER_RESET_TERMIOS | TTY_DRIVER_REAL_RAW | TTY_DRIVER_DYNAMIC_ALLOC); if (IS_ERR(pty_slave_driver)) panic("Couldn't allocate pty slave driver"); pty_driver->driver_name = "pty_master"; pty_driver->name = "pty"; pty_driver->major = PTY_MASTER_MAJOR; pty_driver->minor_start = 0; pty_driver->type = TTY_DRIVER_TYPE_PTY; pty_driver->subtype = PTY_TYPE_MASTER; pty_driver->init_termios = tty_std_termios; pty_driver->init_termios.c_iflag = 0; pty_driver->init_termios.c_oflag = 0; pty_driver->init_termios.c_cflag = B38400 | CS8 | CREAD; pty_driver->init_termios.c_lflag = 0; pty_driver->init_termios.c_ispeed = 38400; pty_driver->init_termios.c_ospeed = 38400; pty_driver->other = pty_slave_driver; tty_set_operations(pty_driver, &master_pty_ops_bsd); pty_slave_driver->driver_name = "pty_slave"; pty_slave_driver->name = "ttyp"; pty_slave_driver->major = PTY_SLAVE_MAJOR; pty_slave_driver->minor_start = 0; pty_slave_driver->type = TTY_DRIVER_TYPE_PTY; pty_slave_driver->subtype = PTY_TYPE_SLAVE; pty_slave_driver->init_termios = tty_std_termios; pty_slave_driver->init_termios.c_cflag = B38400 | CS8 | CREAD; pty_slave_driver->init_termios.c_ispeed = 38400; pty_slave_driver->init_termios.c_ospeed = 38400; pty_slave_driver->other = pty_driver; tty_set_operations(pty_slave_driver, &slave_pty_ops_bsd); if (tty_register_driver(pty_driver)) panic("Couldn't register pty driver"); if (tty_register_driver(pty_slave_driver)) panic("Couldn't register pty slave driver"); } #else static inline void legacy_pty_init(void) { } #endif /* Unix98 devices */ #ifdef CONFIG_UNIX98_PTYS static struct cdev ptmx_cdev; /** * ptm_open_peer - open the peer of a pty * @master: the open struct file of the ptmx device node * @tty: the master of the pty being opened * @flags: the flags for open * * Provide a race free way for userspace to open the slave end of a pty * (where they have the master fd and cannot access or trust the mount * namespace /dev/pts was mounted inside). 
*/ int ptm_open_peer(struct file *master, struct tty_struct *tty, int flags) { int fd; struct file *filp; int retval = -EINVAL; struct path path; if (tty->driver != ptm_driver) return -EIO; fd = get_unused_fd_flags(flags); if (fd < 0) { retval = fd; goto err; } /* Compute the slave's path */ path.mnt = devpts_mntget(master, tty->driver_data); if (IS_ERR(path.mnt)) { retval = PTR_ERR(path.mnt); goto err_put; } path.dentry = tty->link->driver_data; filp = dentry_open(&path, flags, current_cred()); mntput(path.mnt); if (IS_ERR(filp)) { retval = PTR_ERR(filp); goto err_put; } fd_install(fd, filp); return fd; err_put: put_unused_fd(fd); err: return retval; } static int pty_unix98_ioctl(struct tty_struct *tty, unsigned int cmd, unsigned long arg) { switch (cmd) { case TIOCSPTLCK: /* Set PT Lock (disallow slave open) */ return pty_set_lock(tty, (int __user *)arg); case TIOCGPTLCK: /* Get PT Lock status */ return pty_get_lock(tty, (int __user *)arg); case TIOCPKT: /* Set PT packet mode */ return pty_set_pktmode(tty, (int __user *)arg); case TIOCGPKT: /* Get PT packet mode */ return pty_get_pktmode(tty, (int __user *)arg); case TIOCGPTN: /* Get PT Number */ return put_user(tty->index, (unsigned int __user *)arg); case TIOCSIG: /* Send signal to other side of pty */ return pty_signal(tty, (int) arg); } return -ENOIOCTLCMD; } #ifdef CONFIG_COMPAT static long pty_unix98_compat_ioctl(struct tty_struct *tty, unsigned int cmd, unsigned long arg) { /* * PTY ioctls don't require any special translation between 32-bit and * 64-bit userspace, they are already compatible. */ return pty_unix98_ioctl(tty, cmd, cmd == TIOCSIG ? arg : (unsigned long)compat_ptr(arg)); } #else #define pty_unix98_compat_ioctl NULL #endif /** * ptm_unix98_lookup - find a pty master * @driver: ptm driver * @file: unused * @idx: tty index * * Look up a pty master device. Called under the tty_mutex for now. * This provides our locking. */ static struct tty_struct *ptm_unix98_lookup(struct tty_driver *driver, struct file *file, int idx) { /* Master must be open via /dev/ptmx */ return ERR_PTR(-EIO); } /** * pts_unix98_lookup - find a pty slave * @driver: pts driver * @file: file pointer to tty * @idx: tty index * * Look up a pty master device. Called under the tty_mutex for now. * This provides our locking for the tty pointer. 
*/ static struct tty_struct *pts_unix98_lookup(struct tty_driver *driver, struct file *file, int idx) { struct tty_struct *tty; mutex_lock(&devpts_mutex); tty = devpts_get_priv(file->f_path.dentry); mutex_unlock(&devpts_mutex); /* Master must be open before slave */ if (!tty) return ERR_PTR(-EIO); return tty; } static int pty_unix98_install(struct tty_driver *driver, struct tty_struct *tty) { return pty_common_install(driver, tty, false); } /* this is called once with whichever end is closed last */ static void pty_unix98_remove(struct tty_driver *driver, struct tty_struct *tty) { struct pts_fs_info *fsi; if (tty->driver->subtype == PTY_TYPE_MASTER) fsi = tty->driver_data; else fsi = tty->link->driver_data; if (fsi) { devpts_kill_index(fsi, tty->index); devpts_release(fsi); } } static void pty_show_fdinfo(struct tty_struct *tty, struct seq_file *m) { seq_printf(m, "tty-index:\t%d\n", tty->index); } static const struct tty_operations ptm_unix98_ops = { .lookup = ptm_unix98_lookup, .install = pty_unix98_install, .remove = pty_unix98_remove, .open = pty_open, .close = pty_close, .write = pty_write, .write_room = pty_write_room, .flush_buffer = pty_flush_buffer, .unthrottle = pty_unthrottle, .ioctl = pty_unix98_ioctl, .compat_ioctl = pty_unix98_compat_ioctl, .resize = pty_resize, .cleanup = pty_cleanup, .show_fdinfo = pty_show_fdinfo, }; static const struct tty_operations pty_unix98_ops = { .lookup = pts_unix98_lookup, .install = pty_unix98_install, .remove = pty_unix98_remove, .open = pty_open, .close = pty_close, .write = pty_write, .write_room = pty_write_room, .flush_buffer = pty_flush_buffer, .unthrottle = pty_unthrottle, .set_termios = pty_set_termios, .start = pty_start, .stop = pty_stop, .cleanup = pty_cleanup, }; /** * ptmx_open - open a unix 98 pty master * @inode: inode of device file * @filp: file pointer to tty * * Allocate a unix98 pty master device from the ptmx driver. * * Locking: tty_mutex protects the init_dev work. tty->count should * protect the rest. * allocated_ptys_lock handles the list of free pty numbers */ static int ptmx_open(struct inode *inode, struct file *filp) { struct pts_fs_info *fsi; struct tty_struct *tty; struct dentry *dentry; int retval; int index; nonseekable_open(inode, filp); /* We refuse fsnotify events on ptmx, since it's a shared resource */ filp->f_mode |= FMODE_NONOTIFY; retval = tty_alloc_file(filp); if (retval) return retval; fsi = devpts_acquire(filp); if (IS_ERR(fsi)) { retval = PTR_ERR(fsi); goto out_free_file; } /* find a device that is not in use. 
*/ mutex_lock(&devpts_mutex); index = devpts_new_index(fsi); mutex_unlock(&devpts_mutex); retval = index; if (index < 0) goto out_put_fsi; mutex_lock(&tty_mutex); tty = tty_init_dev(ptm_driver, index); /* The tty returned here is locked so we can safely drop the mutex */ mutex_unlock(&tty_mutex); retval = PTR_ERR(tty); if (IS_ERR(tty)) goto out; /* * From here on out, the tty is "live", and the index and * fsi will be killed/put by the tty_release() */ set_bit(TTY_PTY_LOCK, &tty->flags); /* LOCK THE SLAVE */ tty->driver_data = fsi; tty_add_file(tty, filp); dentry = devpts_pty_new(fsi, index, tty->link); if (IS_ERR(dentry)) { retval = PTR_ERR(dentry); goto err_release; } tty->link->driver_data = dentry; retval = ptm_driver->ops->open(tty, filp); if (retval) goto err_release; tty_debug_hangup(tty, "opening (count=%d)\n", tty->count); tty_unlock(tty); return 0; err_release: tty_unlock(tty); // This will also put-ref the fsi tty_release(inode, filp); return retval; out: devpts_kill_index(fsi, index); out_put_fsi: devpts_release(fsi); out_free_file: tty_free_file(filp); return retval; } static struct file_operations ptmx_fops __ro_after_init; static void __init unix98_pty_init(void) { ptm_driver = tty_alloc_driver(NR_UNIX98_PTY_MAX, TTY_DRIVER_RESET_TERMIOS | TTY_DRIVER_REAL_RAW | TTY_DRIVER_DYNAMIC_DEV | TTY_DRIVER_DEVPTS_MEM | TTY_DRIVER_DYNAMIC_ALLOC); if (IS_ERR(ptm_driver)) panic("Couldn't allocate Unix98 ptm driver"); pts_driver = tty_alloc_driver(NR_UNIX98_PTY_MAX, TTY_DRIVER_RESET_TERMIOS | TTY_DRIVER_REAL_RAW | TTY_DRIVER_DYNAMIC_DEV | TTY_DRIVER_DEVPTS_MEM | TTY_DRIVER_DYNAMIC_ALLOC); if (IS_ERR(pts_driver)) panic("Couldn't allocate Unix98 pts driver"); ptm_driver->driver_name = "pty_master"; ptm_driver->name = "ptm"; ptm_driver->major = UNIX98_PTY_MASTER_MAJOR; ptm_driver->minor_start = 0; ptm_driver->type = TTY_DRIVER_TYPE_PTY; ptm_driver->subtype = PTY_TYPE_MASTER; ptm_driver->init_termios = tty_std_termios; ptm_driver->init_termios.c_iflag = 0; ptm_driver->init_termios.c_oflag = 0; ptm_driver->init_termios.c_cflag = B38400 | CS8 | CREAD; ptm_driver->init_termios.c_lflag = 0; ptm_driver->init_termios.c_ispeed = 38400; ptm_driver->init_termios.c_ospeed = 38400; ptm_driver->other = pts_driver; tty_set_operations(ptm_driver, &ptm_unix98_ops); pts_driver->driver_name = "pty_slave"; pts_driver->name = "pts"; pts_driver->major = UNIX98_PTY_SLAVE_MAJOR; pts_driver->minor_start = 0; pts_driver->type = TTY_DRIVER_TYPE_PTY; pts_driver->subtype = PTY_TYPE_SLAVE; pts_driver->init_termios = tty_std_termios; pts_driver->init_termios.c_cflag = B38400 | CS8 | CREAD; pts_driver->init_termios.c_ispeed = 38400; pts_driver->init_termios.c_ospeed = 38400; pts_driver->other = ptm_driver; tty_set_operations(pts_driver, &pty_unix98_ops); if (tty_register_driver(ptm_driver)) panic("Couldn't register Unix98 ptm driver"); if (tty_register_driver(pts_driver)) panic("Couldn't register Unix98 pts driver"); /* Now create the /dev/ptmx special device */ tty_default_fops(&ptmx_fops); ptmx_fops.open = ptmx_open; cdev_init(&ptmx_cdev, &ptmx_fops); if (cdev_add(&ptmx_cdev, MKDEV(TTYAUX_MAJOR, 2), 1) || register_chrdev_region(MKDEV(TTYAUX_MAJOR, 2), 1, "/dev/ptmx") < 0) panic("Couldn't register /dev/ptmx driver"); device_create(&tty_class, NULL, MKDEV(TTYAUX_MAJOR, 2), NULL, "ptmx"); } #else static inline void unix98_pty_init(void) { } #endif static int __init pty_init(void) { legacy_pty_init(); unix98_pty_init(); return 0; } device_initcall(pty_init); |
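/*
 * Illustrative userspace sketch (an editorial addition, not part of the
 * driver above): it shows how the Unix98 path implemented by ptmx_open(),
 * pty_unix98_ioctl() and devpts is normally exercised from user space.
 * The device nodes and the TIOCSPTLCK/TIOCGPTN/TIOCPKT ioctls are the ones
 * handled above (glibc wraps the first two as posix_openpt()/unlockpt()/
 * ptsname()); the surrounding error handling is only a sketch.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

static int open_unix98_pty_pair(int *master_out)
{
	int unlock = 0, pktmode = 1, master, slave;
	unsigned int idx;
	char spath[32];

	/* open("/dev/ptmx") reaches ptmx_open() and allocates the pty pair,
	 * with TTY_PTY_LOCK set so the slave cannot be opened yet. */
	master = open("/dev/ptmx", O_RDWR | O_NOCTTY);
	if (master < 0)
		return -1;

	/* TIOCSPTLCK with 0 clears TTY_PTY_LOCK (pty_set_lock). */
	if (ioctl(master, TIOCSPTLCK, &unlock) < 0)
		goto fail;

	/* TIOCGPTN returns tty->index, i.e. the N in /dev/pts/N. */
	if (ioctl(master, TIOCGPTN, &idx) < 0)
		goto fail;
	snprintf(spath, sizeof(spath), "/dev/pts/%u", idx);

	/* Optional: packet mode. With TIOCPKT enabled, reads on the master
	 * are prefixed with a status byte (TIOCPKT_DATA, TIOCPKT_START,
	 * TIOCPKT_STOP, ...), which is what pty_start()/pty_stop() and
	 * pty_set_termios() above report. */
	if (ioctl(master, TIOCPKT, &pktmode) < 0)
		goto fail;

	/* Opening the slave goes through pts_unix98_lookup()/devpts. On
	 * kernels providing TIOCGPTPEER, ioctl(master, TIOCGPTPEER,
	 * O_RDWR | O_NOCTTY) is the race-free alternative backed by
	 * ptm_open_peer() above. */
	slave = open(spath, O_RDWR | O_NOCTTY);
	if (slave < 0)
		goto fail;

	*master_out = master;
	return slave;
fail:
	close(master);
	return -1;
}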
// SPDX-License-Identifier: GPL-2.0 /* * IPVS An implementation of the IP virtual server support for the * LINUX operating system. IPVS is now implemented as a module * over the NetFilter framework. IPVS can be used to build a * high-performance and highly available server based on a * cluster of servers. * * Version 1 is capable of handling both version 0 and 1 messages. * Version 0 is the plain old format. * Note: Version 0 receivers will just drop Ver 1 messages. * Version 1 is capable of handling IPv6, Persistence data, * time-outs, and firewall marks. * In ver.1 "ip_vs_sync_conn_options" will be sent in network order. * Ver. 0 can be turned on by sysctl -w net.ipv4.vs.sync_version=0 * * Definitions Message: is a complete datagram * Sync_conn: is a part of a Message * Param Data is an option to a Sync_conn. * * Authors: Wensong Zhang <wensong@linuxvirtualserver.org> * * ip_vs_sync: sync connection info from master load balancer to backups * through multicast * * Changes: * Alexandre Cassen : Added master & backup support at a time. * Alexandre Cassen : Added SyncID support for incoming sync * messages filtering. * Justin Ossevoort : Fix endian problem on sync message size. * Hans Schillstrom : Added Version 1: i.e. IPv6, * Persistence support, fwmark and time-out.
*/ #define KMSG_COMPONENT "IPVS" #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt #include <linux/module.h> #include <linux/slab.h> #include <linux/inetdevice.h> #include <linux/net.h> #include <linux/completion.h> #include <linux/delay.h> #include <linux/skbuff.h> #include <linux/in.h> #include <linux/igmp.h> /* for ip_mc_join_group */ #include <linux/udp.h> #include <linux/err.h> #include <linux/kthread.h> #include <linux/wait.h> #include <linux/kernel.h> #include <linux/sched/signal.h> #include <asm/unaligned.h> /* Used for ntoh_seq and hton_seq */ #include <net/ip.h> #include <net/sock.h> #include <net/ip_vs.h> #define IP_VS_SYNC_GROUP 0xe0000051 /* multicast addr - 224.0.0.81 */ #define IP_VS_SYNC_PORT 8848 /* multicast port */ #define SYNC_PROTO_VER 1 /* Protocol version in header */ static struct lock_class_key __ipvs_sync_key; /* * IPVS sync connection entry * Version 0, i.e. original version. */ struct ip_vs_sync_conn_v0 { __u8 reserved; /* Protocol, addresses and port numbers */ __u8 protocol; /* Which protocol (TCP/UDP) */ __be16 cport; __be16 vport; __be16 dport; __be32 caddr; /* client address */ __be32 vaddr; /* virtual address */ __be32 daddr; /* destination address */ /* Flags and state transition */ __be16 flags; /* status flags */ __be16 state; /* state info */ /* The sequence options start here */ }; struct ip_vs_sync_conn_options { struct ip_vs_seq in_seq; /* incoming seq. struct */ struct ip_vs_seq out_seq; /* outgoing seq. struct */ }; /* Sync Connection format (sync_conn) 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type | Protocol | Ver. | Size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Flags | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | State | cport | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | vport | dport | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | fwmark | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timeout (in sec.) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ... | | IP-Addresses (v4 or v6) | | ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Optional Parameters. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Param. Type | Param. Length | Param. data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | ... | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | Param Type | Param. Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Param data | | Last Param data should be padded for 32 bit alignment | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ /* * Type 0, IPv4 sync connection format */ struct ip_vs_sync_v4 { __u8 type; __u8 protocol; /* Which protocol (TCP/UDP) */ __be16 ver_size; /* Version msb 4 bits */ /* Flags and state transition */ __be32 flags; /* status flags */ __be16 state; /* state info */ /* Protocol, addresses and port numbers */ __be16 cport; __be16 vport; __be16 dport; __be32 fwmark; /* Firewall mark from skb */ __be32 timeout; /* cp timeout */ __be32 caddr; /* client address */ __be32 vaddr; /* virtual address */ __be32 daddr; /* destination address */ /* The sequence options start here */ /* PE data padded to 32bit alignment after seq. 
options */ }; /* * Type 2 messages IPv6 */ struct ip_vs_sync_v6 { __u8 type; __u8 protocol; /* Which protocol (TCP/UDP) */ __be16 ver_size; /* Version msb 4 bits */ /* Flags and state transition */ __be32 flags; /* status flags */ __be16 state; /* state info */ /* Protocol, addresses and port numbers */ __be16 cport; __be16 vport; __be16 dport; __be32 fwmark; /* Firewall mark from skb */ __be32 timeout; /* cp timeout */ struct in6_addr caddr; /* client address */ struct in6_addr vaddr; /* virtual address */ struct in6_addr daddr; /* destination address */ /* The sequence options start here */ /* PE data padded to 32bit alignment after seq. options */ }; union ip_vs_sync_conn { struct ip_vs_sync_v4 v4; struct ip_vs_sync_v6 v6; }; /* Bits in Type field in above */ #define STYPE_INET6 0 #define STYPE_F_INET6 (1 << STYPE_INET6) #define SVER_SHIFT 12 /* Shift to get version */ #define SVER_MASK 0x0fff /* Mask to strip version */ #define IPVS_OPT_SEQ_DATA 1 #define IPVS_OPT_PE_DATA 2 #define IPVS_OPT_PE_NAME 3 #define IPVS_OPT_PARAM 7 #define IPVS_OPT_F_SEQ_DATA (1 << (IPVS_OPT_SEQ_DATA-1)) #define IPVS_OPT_F_PE_DATA (1 << (IPVS_OPT_PE_DATA-1)) #define IPVS_OPT_F_PE_NAME (1 << (IPVS_OPT_PE_NAME-1)) #define IPVS_OPT_F_PARAM (1 << (IPVS_OPT_PARAM-1)) struct ip_vs_sync_thread_data { struct task_struct *task; struct netns_ipvs *ipvs; struct socket *sock; char *buf; int id; }; /* Version 0 definition of packet sizes */ #define SIMPLE_CONN_SIZE (sizeof(struct ip_vs_sync_conn_v0)) #define FULL_CONN_SIZE \ (sizeof(struct ip_vs_sync_conn_v0) + sizeof(struct ip_vs_sync_conn_options)) /* The master mulitcasts messages (Datagrams) to the backup load balancers in the following format. Version 1: Note, first byte should be Zero, so ver 0 receivers will drop the packet. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0 | SyncID | Size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Count Conns | Version | Reserved, set to Zero | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | IPVS Sync Connection (1) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | . | ~ . ~ | . 
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | IPVS Sync Connection (n) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Version 0 Header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Count Conns | SyncID | Size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPVS Sync Connection (1) | */ /* Version 0 header */ struct ip_vs_sync_mesg_v0 { __u8 nr_conns; __u8 syncid; __be16 size; /* ip_vs_sync_conn entries start here */ }; /* Version 1 header */ struct ip_vs_sync_mesg { __u8 reserved; /* must be zero */ __u8 syncid; __be16 size; __u8 nr_conns; __s8 version; /* SYNC_PROTO_VER */ __u16 spare; /* ip_vs_sync_conn entries start here */ }; union ipvs_sockaddr { struct sockaddr_in in; struct sockaddr_in6 in6; }; struct ip_vs_sync_buff { struct list_head list; unsigned long firstuse; /* pointers for the message data */ struct ip_vs_sync_mesg *mesg; unsigned char *head; unsigned char *end; }; /* * Copy of struct ip_vs_seq * From unaligned network order to aligned host order */ static void ntoh_seq(struct ip_vs_seq *no, struct ip_vs_seq *ho) { memset(ho, 0, sizeof(*ho)); ho->init_seq = get_unaligned_be32(&no->init_seq); ho->delta = get_unaligned_be32(&no->delta); ho->previous_delta = get_unaligned_be32(&no->previous_delta); } /* * Copy of struct ip_vs_seq * From Aligned host order to unaligned network order */ static void hton_seq(struct ip_vs_seq *ho, struct ip_vs_seq *no) { put_unaligned_be32(ho->init_seq, &no->init_seq); put_unaligned_be32(ho->delta, &no->delta); put_unaligned_be32(ho->previous_delta, &no->previous_delta); } static inline struct ip_vs_sync_buff * sb_dequeue(struct netns_ipvs *ipvs, struct ipvs_master_sync_state *ms) { struct ip_vs_sync_buff *sb; spin_lock_bh(&ipvs->sync_lock); if (list_empty(&ms->sync_queue)) { sb = NULL; __set_current_state(TASK_INTERRUPTIBLE); } else { sb = list_entry(ms->sync_queue.next, struct ip_vs_sync_buff, list); list_del(&sb->list); ms->sync_queue_len--; if (!ms->sync_queue_len) ms->sync_queue_delay = 0; } spin_unlock_bh(&ipvs->sync_lock); return sb; } /* * Create a new sync buffer for Version 1 proto. */ static inline struct ip_vs_sync_buff * ip_vs_sync_buff_create(struct netns_ipvs *ipvs, unsigned int len) { struct ip_vs_sync_buff *sb; if (!(sb=kmalloc(sizeof(struct ip_vs_sync_buff), GFP_ATOMIC))) return NULL; len = max_t(unsigned int, len + sizeof(struct ip_vs_sync_mesg), ipvs->mcfg.sync_maxlen); sb->mesg = kmalloc(len, GFP_ATOMIC); if (!sb->mesg) { kfree(sb); return NULL; } sb->mesg->reserved = 0; /* old nr_conns i.e. 
must be zero now */ sb->mesg->version = SYNC_PROTO_VER; sb->mesg->syncid = ipvs->mcfg.syncid; sb->mesg->size = htons(sizeof(struct ip_vs_sync_mesg)); sb->mesg->nr_conns = 0; sb->mesg->spare = 0; sb->head = (unsigned char *)sb->mesg + sizeof(struct ip_vs_sync_mesg); sb->end = (unsigned char *)sb->mesg + len; sb->firstuse = jiffies; return sb; } static inline void ip_vs_sync_buff_release(struct ip_vs_sync_buff *sb) { kfree(sb->mesg); kfree(sb); } static inline void sb_queue_tail(struct netns_ipvs *ipvs, struct ipvs_master_sync_state *ms) { struct ip_vs_sync_buff *sb = ms->sync_buff; spin_lock(&ipvs->sync_lock); if (ipvs->sync_state & IP_VS_STATE_MASTER && ms->sync_queue_len < sysctl_sync_qlen_max(ipvs)) { if (!ms->sync_queue_len) schedule_delayed_work(&ms->master_wakeup_work, max(IPVS_SYNC_SEND_DELAY, 1)); ms->sync_queue_len++; list_add_tail(&sb->list, &ms->sync_queue); if ((++ms->sync_queue_delay) == IPVS_SYNC_WAKEUP_RATE) { int id = (int)(ms - ipvs->ms); wake_up_process(ipvs->master_tinfo[id].task); } } else ip_vs_sync_buff_release(sb); spin_unlock(&ipvs->sync_lock); } /* * Get the current sync buffer if it has been created for more * than the specified time or the specified time is zero. */ static inline struct ip_vs_sync_buff * get_curr_sync_buff(struct netns_ipvs *ipvs, struct ipvs_master_sync_state *ms, unsigned long time) { struct ip_vs_sync_buff *sb; spin_lock_bh(&ipvs->sync_buff_lock); sb = ms->sync_buff; if (sb && time_after_eq(jiffies - sb->firstuse, time)) { ms->sync_buff = NULL; __set_current_state(TASK_RUNNING); } else sb = NULL; spin_unlock_bh(&ipvs->sync_buff_lock); return sb; } static inline int select_master_thread_id(struct netns_ipvs *ipvs, struct ip_vs_conn *cp) { return ((long) cp >> (1 + ilog2(sizeof(*cp)))) & ipvs->threads_mask; } /* * Create a new sync buffer for Version 0 proto. */ static inline struct ip_vs_sync_buff * ip_vs_sync_buff_create_v0(struct netns_ipvs *ipvs, unsigned int len) { struct ip_vs_sync_buff *sb; struct ip_vs_sync_mesg_v0 *mesg; if (!(sb=kmalloc(sizeof(struct ip_vs_sync_buff), GFP_ATOMIC))) return NULL; len = max_t(unsigned int, len + sizeof(struct ip_vs_sync_mesg_v0), ipvs->mcfg.sync_maxlen); sb->mesg = kmalloc(len, GFP_ATOMIC); if (!sb->mesg) { kfree(sb); return NULL; } mesg = (struct ip_vs_sync_mesg_v0 *)sb->mesg; mesg->nr_conns = 0; mesg->syncid = ipvs->mcfg.syncid; mesg->size = htons(sizeof(struct ip_vs_sync_mesg_v0)); sb->head = (unsigned char *)mesg + sizeof(struct ip_vs_sync_mesg_v0); sb->end = (unsigned char *)mesg + len; sb->firstuse = jiffies; return sb; } /* Check if connection is controlled by persistence */ static inline bool in_persistence(struct ip_vs_conn *cp) { for (cp = cp->control; cp; cp = cp->control) { if (cp->flags & IP_VS_CONN_F_TEMPLATE) return true; } return false; } /* Check if conn should be synced. * pkts: conn packets, use sysctl_sync_threshold to avoid packet check * - (1) sync_refresh_period: reduce sync rate. 
Additionally, retry * sync_retries times with period of sync_refresh_period/8 * - (2) if both sync_refresh_period and sync_period are 0 send sync only * for state changes or only once when pkts matches sync_threshold * - (3) templates: rate can be reduced only with sync_refresh_period or * with (2) */ static int ip_vs_sync_conn_needed(struct netns_ipvs *ipvs, struct ip_vs_conn *cp, int pkts) { unsigned long orig = READ_ONCE(cp->sync_endtime); unsigned long now = jiffies; unsigned long n = (now + cp->timeout) & ~3UL; unsigned int sync_refresh_period; int sync_period; int force; /* Check if we sync in current state */ if (unlikely(cp->flags & IP_VS_CONN_F_TEMPLATE)) force = 0; else if (unlikely(sysctl_sync_persist_mode(ipvs) && in_persistence(cp))) return 0; else if (likely(cp->protocol == IPPROTO_TCP)) { if (!((1 << cp->state) & ((1 << IP_VS_TCP_S_ESTABLISHED) | (1 << IP_VS_TCP_S_FIN_WAIT) | (1 << IP_VS_TCP_S_CLOSE) | (1 << IP_VS_TCP_S_CLOSE_WAIT) | (1 << IP_VS_TCP_S_TIME_WAIT)))) return 0; force = cp->state != cp->old_state; if (force && cp->state != IP_VS_TCP_S_ESTABLISHED) goto set; } else if (unlikely(cp->protocol == IPPROTO_SCTP)) { if (!((1 << cp->state) & ((1 << IP_VS_SCTP_S_ESTABLISHED) | (1 << IP_VS_SCTP_S_SHUTDOWN_SENT) | (1 << IP_VS_SCTP_S_SHUTDOWN_RECEIVED) | (1 << IP_VS_SCTP_S_SHUTDOWN_ACK_SENT) | (1 << IP_VS_SCTP_S_CLOSED)))) return 0; force = cp->state != cp->old_state; if (force && cp->state != IP_VS_SCTP_S_ESTABLISHED) goto set; } else { /* UDP or another protocol with single state */ force = 0; } sync_refresh_period = sysctl_sync_refresh_period(ipvs); if (sync_refresh_period > 0) { long diff = n - orig; long min_diff = max(cp->timeout >> 1, 10UL * HZ); /* Avoid sync if difference is below sync_refresh_period * and below the half timeout. */ if (abs(diff) < min_t(long, sync_refresh_period, min_diff)) { int retries = orig & 3; if (retries >= sysctl_sync_retries(ipvs)) return 0; if (time_before(now, orig - cp->timeout + (sync_refresh_period >> 3))) return 0; n |= retries + 1; } } sync_period = sysctl_sync_period(ipvs); if (sync_period > 0) { if (!(cp->flags & IP_VS_CONN_F_TEMPLATE) && pkts % sync_period != sysctl_sync_threshold(ipvs)) return 0; } else if (!sync_refresh_period && pkts != sysctl_sync_threshold(ipvs)) return 0; set: cp->old_state = cp->state; n = cmpxchg(&cp->sync_endtime, orig, n); return n == orig || force; } /* * Version 0 , could be switched in by sys_ctl. * Add an ip_vs_conn information into the current sync_buff. */ static void ip_vs_sync_conn_v0(struct netns_ipvs *ipvs, struct ip_vs_conn *cp, int pkts) { struct ip_vs_sync_mesg_v0 *m; struct ip_vs_sync_conn_v0 *s; struct ip_vs_sync_buff *buff; struct ipvs_master_sync_state *ms; int id; unsigned int len; if (unlikely(cp->af != AF_INET)) return; /* Do not sync ONE PACKET */ if (cp->flags & IP_VS_CONN_F_ONE_PACKET) return; if (!ip_vs_sync_conn_needed(ipvs, cp, pkts)) return; spin_lock_bh(&ipvs->sync_buff_lock); if (!(ipvs->sync_state & IP_VS_STATE_MASTER)) { spin_unlock_bh(&ipvs->sync_buff_lock); return; } id = select_master_thread_id(ipvs, cp); ms = &ipvs->ms[id]; buff = ms->sync_buff; len = (cp->flags & IP_VS_CONN_F_SEQ_MASK) ? 
FULL_CONN_SIZE : SIMPLE_CONN_SIZE; if (buff) { m = (struct ip_vs_sync_mesg_v0 *) buff->mesg; /* Send buffer if it is for v1 */ if (buff->head + len > buff->end || !m->nr_conns) { sb_queue_tail(ipvs, ms); ms->sync_buff = NULL; buff = NULL; } } if (!buff) { buff = ip_vs_sync_buff_create_v0(ipvs, len); if (!buff) { spin_unlock_bh(&ipvs->sync_buff_lock); pr_err("ip_vs_sync_buff_create failed.\n"); return; } ms->sync_buff = buff; } m = (struct ip_vs_sync_mesg_v0 *) buff->mesg; s = (struct ip_vs_sync_conn_v0 *) buff->head; /* copy members */ s->reserved = 0; s->protocol = cp->protocol; s->cport = cp->cport; s->vport = cp->vport; s->dport = cp->dport; s->caddr = cp->caddr.ip; s->vaddr = cp->vaddr.ip; s->daddr = cp->daddr.ip; s->flags = htons(cp->flags & ~IP_VS_CONN_F_HASHED); s->state = htons(cp->state); if (cp->flags & IP_VS_CONN_F_SEQ_MASK) { struct ip_vs_sync_conn_options *opt = (struct ip_vs_sync_conn_options *)&s[1]; memcpy(opt, &cp->sync_conn_opt, sizeof(*opt)); } m->nr_conns++; m->size = htons(ntohs(m->size) + len); buff->head += len; spin_unlock_bh(&ipvs->sync_buff_lock); /* synchronize its controller if it has */ cp = cp->control; if (cp) { if (cp->flags & IP_VS_CONN_F_TEMPLATE) pkts = atomic_inc_return(&cp->in_pkts); else pkts = sysctl_sync_threshold(ipvs); ip_vs_sync_conn(ipvs, cp, pkts); } } /* * Add an ip_vs_conn information into the current sync_buff. * Called by ip_vs_in. * Sending Version 1 messages */ void ip_vs_sync_conn(struct netns_ipvs *ipvs, struct ip_vs_conn *cp, int pkts) { struct ip_vs_sync_mesg *m; union ip_vs_sync_conn *s; struct ip_vs_sync_buff *buff; struct ipvs_master_sync_state *ms; int id; __u8 *p; unsigned int len, pe_name_len, pad; /* Handle old version of the protocol */ if (sysctl_sync_ver(ipvs) == 0) { ip_vs_sync_conn_v0(ipvs, cp, pkts); return; } /* Do not sync ONE PACKET */ if (cp->flags & IP_VS_CONN_F_ONE_PACKET) goto control; sloop: if (!ip_vs_sync_conn_needed(ipvs, cp, pkts)) goto control; /* Sanity checks */ pe_name_len = 0; if (cp->pe_data_len) { if (!cp->pe_data || !cp->dest) { IP_VS_ERR_RL("SYNC, connection pe_data invalid\n"); return; } pe_name_len = strnlen(cp->pe->name, IP_VS_PENAME_MAXLEN); } spin_lock_bh(&ipvs->sync_buff_lock); if (!(ipvs->sync_state & IP_VS_STATE_MASTER)) { spin_unlock_bh(&ipvs->sync_buff_lock); return; } id = select_master_thread_id(ipvs, cp); ms = &ipvs->ms[id]; #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6) len = sizeof(struct ip_vs_sync_v6); else #endif len = sizeof(struct ip_vs_sync_v4); if (cp->flags & IP_VS_CONN_F_SEQ_MASK) len += sizeof(struct ip_vs_sync_conn_options) + 2; if (cp->pe_data_len) len += cp->pe_data_len + 2; /* + Param hdr field */ if (pe_name_len) len += pe_name_len + 2; /* check if there is a space for this one */ pad = 0; buff = ms->sync_buff; if (buff) { m = buff->mesg; pad = (4 - (size_t) buff->head) & 3; /* Send buffer if it is for v0 */ if (buff->head + len + pad > buff->end || m->reserved) { sb_queue_tail(ipvs, ms); ms->sync_buff = NULL; buff = NULL; pad = 0; } } if (!buff) { buff = ip_vs_sync_buff_create(ipvs, len); if (!buff) { spin_unlock_bh(&ipvs->sync_buff_lock); pr_err("ip_vs_sync_buff_create failed.\n"); return; } ms->sync_buff = buff; m = buff->mesg; } p = buff->head; buff->head += pad + len; m->size = htons(ntohs(m->size) + pad + len); /* Add ev. padding from prev. sync_conn */ while (pad--) *(p++) = 0; s = (union ip_vs_sync_conn *)p; /* Set message type & copy members */ s->v4.type = (cp->af == AF_INET6 ? 
STYPE_F_INET6 : 0); s->v4.ver_size = htons(len & SVER_MASK); /* Version 0 */ s->v4.flags = htonl(cp->flags & ~IP_VS_CONN_F_HASHED); s->v4.state = htons(cp->state); s->v4.protocol = cp->protocol; s->v4.cport = cp->cport; s->v4.vport = cp->vport; s->v4.dport = cp->dport; s->v4.fwmark = htonl(cp->fwmark); s->v4.timeout = htonl(cp->timeout / HZ); m->nr_conns++; #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6) { p += sizeof(struct ip_vs_sync_v6); s->v6.caddr = cp->caddr.in6; s->v6.vaddr = cp->vaddr.in6; s->v6.daddr = cp->daddr.in6; } else #endif { p += sizeof(struct ip_vs_sync_v4); /* options ptr */ s->v4.caddr = cp->caddr.ip; s->v4.vaddr = cp->vaddr.ip; s->v4.daddr = cp->daddr.ip; } if (cp->flags & IP_VS_CONN_F_SEQ_MASK) { *(p++) = IPVS_OPT_SEQ_DATA; *(p++) = sizeof(struct ip_vs_sync_conn_options); hton_seq((struct ip_vs_seq *)p, &cp->in_seq); p += sizeof(struct ip_vs_seq); hton_seq((struct ip_vs_seq *)p, &cp->out_seq); p += sizeof(struct ip_vs_seq); } /* Handle pe data */ if (cp->pe_data_len && cp->pe_data) { *(p++) = IPVS_OPT_PE_DATA; *(p++) = cp->pe_data_len; memcpy(p, cp->pe_data, cp->pe_data_len); p += cp->pe_data_len; if (pe_name_len) { /* Add PE_NAME */ *(p++) = IPVS_OPT_PE_NAME; *(p++) = pe_name_len; memcpy(p, cp->pe->name, pe_name_len); p += pe_name_len; } } spin_unlock_bh(&ipvs->sync_buff_lock); control: /* synchronize its controller if it has */ cp = cp->control; if (!cp) return; if (cp->flags & IP_VS_CONN_F_TEMPLATE) pkts = atomic_inc_return(&cp->in_pkts); else pkts = sysctl_sync_threshold(ipvs); goto sloop; } /* * fill_param used by version 1 */ static inline int ip_vs_conn_fill_param_sync(struct netns_ipvs *ipvs, int af, union ip_vs_sync_conn *sc, struct ip_vs_conn_param *p, __u8 *pe_data, unsigned int pe_data_len, __u8 *pe_name, unsigned int pe_name_len) { #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) ip_vs_conn_fill_param(ipvs, af, sc->v6.protocol, (const union nf_inet_addr *)&sc->v6.caddr, sc->v6.cport, (const union nf_inet_addr *)&sc->v6.vaddr, sc->v6.vport, p); else #endif ip_vs_conn_fill_param(ipvs, af, sc->v4.protocol, (const union nf_inet_addr *)&sc->v4.caddr, sc->v4.cport, (const union nf_inet_addr *)&sc->v4.vaddr, sc->v4.vport, p); /* Handle pe data */ if (pe_data_len) { if (pe_name_len) { char buff[IP_VS_PENAME_MAXLEN+1]; memcpy(buff, pe_name, pe_name_len); buff[pe_name_len]=0; p->pe = __ip_vs_pe_getbyname(buff); if (!p->pe) { IP_VS_DBG(3, "BACKUP, no %s engine found/loaded\n", buff); return 1; } } else { IP_VS_ERR_RL("BACKUP, Invalid PE parameters\n"); return 1; } p->pe_data = kmemdup(pe_data, pe_data_len, GFP_ATOMIC); if (!p->pe_data) { module_put(p->pe->module); return -ENOMEM; } p->pe_data_len = pe_data_len; } return 0; } /* * Connection Add / Update. * Common for version 0 and 1 reception of backup sync_conns. * Param: ... * timeout is in sec. */ static void ip_vs_proc_conn(struct netns_ipvs *ipvs, struct ip_vs_conn_param *param, unsigned int flags, unsigned int state, unsigned int protocol, unsigned int type, const union nf_inet_addr *daddr, __be16 dport, unsigned long timeout, __u32 fwmark, struct ip_vs_sync_conn_options *opt) { struct ip_vs_dest *dest; struct ip_vs_conn *cp; if (!(flags & IP_VS_CONN_F_TEMPLATE)) { cp = ip_vs_conn_in_get(param); if (cp && ((cp->dport != dport) || !ip_vs_addr_equal(cp->daf, &cp->daddr, daddr))) { if (!(flags & IP_VS_CONN_F_INACTIVE)) { ip_vs_conn_expire_now(cp); __ip_vs_conn_put(cp); cp = NULL; } else { /* This is the expiration message for the * connection that was already replaced, so we * just ignore it. 
*/ __ip_vs_conn_put(cp); kfree(param->pe_data); return; } } } else { cp = ip_vs_ct_in_get(param); } if (cp) { /* Free pe_data */ kfree(param->pe_data); dest = cp->dest; spin_lock_bh(&cp->lock); if ((cp->flags ^ flags) & IP_VS_CONN_F_INACTIVE && !(flags & IP_VS_CONN_F_TEMPLATE) && dest) { if (flags & IP_VS_CONN_F_INACTIVE) { atomic_dec(&dest->activeconns); atomic_inc(&dest->inactconns); } else { atomic_inc(&dest->activeconns); atomic_dec(&dest->inactconns); } } flags &= IP_VS_CONN_F_BACKUP_UPD_MASK; flags |= cp->flags & ~IP_VS_CONN_F_BACKUP_UPD_MASK; cp->flags = flags; spin_unlock_bh(&cp->lock); if (!dest) ip_vs_try_bind_dest(cp); } else { /* * Find the appropriate destination for the connection. * If it is not found the connection will remain unbound * but still handled. */ rcu_read_lock(); /* This function is only invoked by the synchronization * code. We do not currently support heterogeneous pools * with synchronization, so we can make the assumption that * the svc_af is the same as the dest_af */ dest = ip_vs_find_dest(ipvs, type, type, daddr, dport, param->vaddr, param->vport, protocol, fwmark, flags); cp = ip_vs_conn_new(param, type, daddr, dport, flags, dest, fwmark); rcu_read_unlock(); if (!cp) { kfree(param->pe_data); IP_VS_DBG(2, "BACKUP, add new conn. failed\n"); return; } if (!(flags & IP_VS_CONN_F_TEMPLATE)) kfree(param->pe_data); } if (opt) { cp->in_seq = opt->in_seq; cp->out_seq = opt->out_seq; } atomic_set(&cp->in_pkts, sysctl_sync_threshold(ipvs)); cp->state = state; cp->old_state = cp->state; /* * For Ver 0 messages style * - Not possible to recover the right timeout for templates * - can not find the right fwmark * virtual service. If needed, we can do it for * non-fwmark persistent services. * Ver 1 messages style. * - No problem. 
*/ if (timeout) { if (timeout > MAX_SCHEDULE_TIMEOUT / HZ) timeout = MAX_SCHEDULE_TIMEOUT / HZ; cp->timeout = timeout*HZ; } else { struct ip_vs_proto_data *pd; pd = ip_vs_proto_data_get(ipvs, protocol); if (!(flags & IP_VS_CONN_F_TEMPLATE) && pd && pd->timeout_table) cp->timeout = pd->timeout_table[state]; else cp->timeout = (3*60*HZ); } ip_vs_conn_put(cp); } /* * Process received multicast message for Version 0 */ static void ip_vs_process_message_v0(struct netns_ipvs *ipvs, const char *buffer, const size_t buflen) { struct ip_vs_sync_mesg_v0 *m = (struct ip_vs_sync_mesg_v0 *)buffer; struct ip_vs_sync_conn_v0 *s; struct ip_vs_sync_conn_options *opt; struct ip_vs_protocol *pp; struct ip_vs_conn_param param; char *p; int i; p = (char *)buffer + sizeof(struct ip_vs_sync_mesg_v0); for (i=0; i<m->nr_conns; i++) { unsigned int flags, state; if (p + SIMPLE_CONN_SIZE > buffer+buflen) { IP_VS_ERR_RL("BACKUP v0, bogus conn\n"); return; } s = (struct ip_vs_sync_conn_v0 *) p; flags = ntohs(s->flags) | IP_VS_CONN_F_SYNC; flags &= ~IP_VS_CONN_F_HASHED; if (flags & IP_VS_CONN_F_SEQ_MASK) { opt = (struct ip_vs_sync_conn_options *)&s[1]; p += FULL_CONN_SIZE; if (p > buffer+buflen) { IP_VS_ERR_RL("BACKUP v0, Dropping buffer bogus conn options\n"); return; } } else { opt = NULL; p += SIMPLE_CONN_SIZE; } state = ntohs(s->state); if (!(flags & IP_VS_CONN_F_TEMPLATE)) { pp = ip_vs_proto_get(s->protocol); if (!pp) { IP_VS_DBG(2, "BACKUP v0, Unsupported protocol %u\n", s->protocol); continue; } if (state >= pp->num_states) { IP_VS_DBG(2, "BACKUP v0, Invalid %s state %u\n", pp->name, state); continue; } } else { if (state >= IP_VS_CTPL_S_LAST) IP_VS_DBG(7, "BACKUP v0, Invalid tpl state %u\n", state); } ip_vs_conn_fill_param(ipvs, AF_INET, s->protocol, (const union nf_inet_addr *)&s->caddr, s->cport, (const union nf_inet_addr *)&s->vaddr, s->vport, &param); /* Send timeout as Zero */ ip_vs_proc_conn(ipvs, &param, flags, state, s->protocol, AF_INET, (union nf_inet_addr *)&s->daddr, s->dport, 0, 0, opt); } } /* * Handle options */ static inline int ip_vs_proc_seqopt(__u8 *p, unsigned int plen, __u32 *opt_flags, struct ip_vs_sync_conn_options *opt) { struct ip_vs_sync_conn_options *topt; topt = (struct ip_vs_sync_conn_options *)p; if (plen != sizeof(struct ip_vs_sync_conn_options)) { IP_VS_DBG(2, "BACKUP, bogus conn options length\n"); return -EINVAL; } if (*opt_flags & IPVS_OPT_F_SEQ_DATA) { IP_VS_DBG(2, "BACKUP, conn options found twice\n"); return -EINVAL; } ntoh_seq(&topt->in_seq, &opt->in_seq); ntoh_seq(&topt->out_seq, &opt->out_seq); *opt_flags |= IPVS_OPT_F_SEQ_DATA; return 0; } static int ip_vs_proc_str(__u8 *p, unsigned int plen, unsigned int *data_len, __u8 **data, unsigned int maxlen, __u32 *opt_flags, __u32 flag) { if (plen > maxlen) { IP_VS_DBG(2, "BACKUP, bogus par.data len > %d\n", maxlen); return -EINVAL; } if (*opt_flags & flag) { IP_VS_DBG(2, "BACKUP, Par.data found twice 0x%x\n", flag); return -EINVAL; } *data_len = plen; *data = p; *opt_flags |= flag; return 0; } /* * Process a Version 1 sync.
connection */ static inline int ip_vs_proc_sync_conn(struct netns_ipvs *ipvs, __u8 *p, __u8 *msg_end) { struct ip_vs_sync_conn_options opt; union ip_vs_sync_conn *s; struct ip_vs_protocol *pp; struct ip_vs_conn_param param; __u32 flags; unsigned int af, state, pe_data_len=0, pe_name_len=0; __u8 *pe_data=NULL, *pe_name=NULL; __u32 opt_flags=0; int retc=0; s = (union ip_vs_sync_conn *) p; if (s->v6.type & STYPE_F_INET6) { #ifdef CONFIG_IP_VS_IPV6 af = AF_INET6; p += sizeof(struct ip_vs_sync_v6); #else IP_VS_DBG(3,"BACKUP, IPv6 msg received, and IPVS is not compiled for IPv6\n"); retc = 10; goto out; #endif } else if (!s->v4.type) { af = AF_INET; p += sizeof(struct ip_vs_sync_v4); } else { return -10; } if (p > msg_end) return -20; /* Process optional params check Type & Len. */ while (p < msg_end) { int ptype; int plen; if (p+2 > msg_end) return -30; ptype = *(p++); plen = *(p++); if (!plen || ((p + plen) > msg_end)) return -40; /* Handle seq option p = param data */ switch (ptype & ~IPVS_OPT_F_PARAM) { case IPVS_OPT_SEQ_DATA: if (ip_vs_proc_seqopt(p, plen, &opt_flags, &opt)) return -50; break; case IPVS_OPT_PE_DATA: if (ip_vs_proc_str(p, plen, &pe_data_len, &pe_data, IP_VS_PEDATA_MAXLEN, &opt_flags, IPVS_OPT_F_PE_DATA)) return -60; break; case IPVS_OPT_PE_NAME: if (ip_vs_proc_str(p, plen,&pe_name_len, &pe_name, IP_VS_PENAME_MAXLEN, &opt_flags, IPVS_OPT_F_PE_NAME)) return -70; break; default: /* Param data mandatory ? */ if (!(ptype & IPVS_OPT_F_PARAM)) { IP_VS_DBG(3, "BACKUP, Unknown mandatory param %d found\n", ptype & ~IPVS_OPT_F_PARAM); retc = 20; goto out; } } p += plen; /* Next option */ } /* Get flags and Mask off unsupported */ flags = ntohl(s->v4.flags) & IP_VS_CONN_F_BACKUP_MASK; flags |= IP_VS_CONN_F_SYNC; state = ntohs(s->v4.state); if (!(flags & IP_VS_CONN_F_TEMPLATE)) { pp = ip_vs_proto_get(s->v4.protocol); if (!pp) { IP_VS_DBG(3,"BACKUP, Unsupported protocol %u\n", s->v4.protocol); retc = 30; goto out; } if (state >= pp->num_states) { IP_VS_DBG(3, "BACKUP, Invalid %s state %u\n", pp->name, state); retc = 40; goto out; } } else { if (state >= IP_VS_CTPL_S_LAST) IP_VS_DBG(7, "BACKUP, Invalid tpl state %u\n", state); } if (ip_vs_conn_fill_param_sync(ipvs, af, s, &param, pe_data, pe_data_len, pe_name, pe_name_len)) { retc = 50; goto out; } /* If only IPv4, just silent skip IPv6 */ if (af == AF_INET) ip_vs_proc_conn(ipvs, &param, flags, state, s->v4.protocol, af, (union nf_inet_addr *)&s->v4.daddr, s->v4.dport, ntohl(s->v4.timeout), ntohl(s->v4.fwmark), (opt_flags & IPVS_OPT_F_SEQ_DATA ? &opt : NULL) ); #ifdef CONFIG_IP_VS_IPV6 else ip_vs_proc_conn(ipvs, &param, flags, state, s->v6.protocol, af, (union nf_inet_addr *)&s->v6.daddr, s->v6.dport, ntohl(s->v6.timeout), ntohl(s->v6.fwmark), (opt_flags & IPVS_OPT_F_SEQ_DATA ? &opt : NULL) ); #endif ip_vs_pe_put(param.pe); return 0; /* Error exit */ out: IP_VS_DBG(2, "BACKUP, Single msg dropped err:%d\n", retc); return retc; } /* * Process received multicast message and create the corresponding * ip_vs_conn entries.
* Handles Version 0 & 1 */ static void ip_vs_process_message(struct netns_ipvs *ipvs, __u8 *buffer, const size_t buflen) { struct ip_vs_sync_mesg *m2 = (struct ip_vs_sync_mesg *)buffer; __u8 *p, *msg_end; int i, nr_conns; if (buflen < sizeof(struct ip_vs_sync_mesg_v0)) { IP_VS_DBG(2, "BACKUP, message header too short\n"); return; } if (buflen != ntohs(m2->size)) { IP_VS_DBG(2, "BACKUP, bogus message size\n"); return; } /* SyncID sanity check */ if (ipvs->bcfg.syncid != 0 && m2->syncid != ipvs->bcfg.syncid) { IP_VS_DBG(7, "BACKUP, Ignoring syncid = %d\n", m2->syncid); return; } /* Handle version 1 message */ if ((m2->version == SYNC_PROTO_VER) && (m2->reserved == 0) && (m2->spare == 0)) { msg_end = buffer + sizeof(struct ip_vs_sync_mesg); nr_conns = m2->nr_conns; for (i=0; i<nr_conns; i++) { union ip_vs_sync_conn *s; unsigned int size; int retc; p = msg_end; if (p + sizeof(s->v4) > buffer+buflen) { IP_VS_ERR_RL("BACKUP, Dropping buffer, too small\n"); return; } s = (union ip_vs_sync_conn *)p; size = ntohs(s->v4.ver_size) & SVER_MASK; msg_end = p + size; /* Basic sanity checks */ if (msg_end > buffer+buflen) { IP_VS_ERR_RL("BACKUP, Dropping buffer, msg > buffer\n"); return; } if (ntohs(s->v4.ver_size) >> SVER_SHIFT) { IP_VS_ERR_RL("BACKUP, Dropping buffer, Unknown version %d\n", ntohs(s->v4.ver_size) >> SVER_SHIFT); return; } /* Process a single sync_conn */ retc = ip_vs_proc_sync_conn(ipvs, p, msg_end); if (retc < 0) { IP_VS_ERR_RL("BACKUP, Dropping buffer, Err: %d in decoding\n", retc); return; } /* Make sure we have 32 bit alignment */ msg_end = p + ((size + 3) & ~3); } } else { /* Old type of message */ ip_vs_process_message_v0(ipvs, buffer, buflen); return; } } /* * Setup sndbuf (mode=1) or rcvbuf (mode=0) */ static void set_sock_size(struct sock *sk, int mode, int val) { /* setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val)); */ /* setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val)); */ lock_sock(sk); if (mode) { val = clamp_t(int, val, (SOCK_MIN_SNDBUF + 1) / 2, READ_ONCE(sysctl_wmem_max)); sk->sk_sndbuf = val * 2; sk->sk_userlocks |= SOCK_SNDBUF_LOCK; } else { val = clamp_t(int, val, (SOCK_MIN_RCVBUF + 1) / 2, READ_ONCE(sysctl_rmem_max)); sk->sk_rcvbuf = val * 2; sk->sk_userlocks |= SOCK_RCVBUF_LOCK; } release_sock(sk); } /* * Setup loopback of outgoing multicasts on a sending socket */ static void set_mcast_loop(struct sock *sk, u_char loop) { /* setsockopt(sock, SOL_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)); */ inet_assign_bit(MC_LOOP, sk, loop); #ifdef CONFIG_IP_VS_IPV6 if (READ_ONCE(sk->sk_family) == AF_INET6) { /* IPV6_MULTICAST_LOOP */ inet6_assign_bit(MC6_LOOP, sk, loop); } #endif } /* * Specify TTL for outgoing multicasts on a sending socket */ static void set_mcast_ttl(struct sock *sk, u_char ttl) { struct inet_sock *inet = inet_sk(sk); /* setsockopt(sock, SOL_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)); */ lock_sock(sk); WRITE_ONCE(inet->mc_ttl, ttl); #ifdef CONFIG_IP_VS_IPV6 if (sk->sk_family == AF_INET6) { struct ipv6_pinfo *np = inet6_sk(sk); /* IPV6_MULTICAST_HOPS */ WRITE_ONCE(np->mcast_hops, ttl); } #endif release_sock(sk); } /* Control fragmentation of messages */ static void set_mcast_pmtudisc(struct sock *sk, int val) { struct inet_sock *inet = inet_sk(sk); /* setsockopt(sock, SOL_IP, IP_MTU_DISCOVER, &val, sizeof(val)); */ lock_sock(sk); WRITE_ONCE(inet->pmtudisc, val); #ifdef CONFIG_IP_VS_IPV6 if (sk->sk_family == AF_INET6) { struct ipv6_pinfo *np = inet6_sk(sk); /* IPV6_MTU_DISCOVER */ WRITE_ONCE(np->pmtudisc, val); } #endif release_sock(sk); } /* * 
Specify default interface for outgoing multicasts */ static int set_mcast_if(struct sock *sk, struct net_device *dev) { struct inet_sock *inet = inet_sk(sk); if (sk->sk_bound_dev_if && dev->ifindex != sk->sk_bound_dev_if) return -EINVAL; lock_sock(sk); inet->mc_index = dev->ifindex; /* inet->mc_addr = 0; */ #ifdef CONFIG_IP_VS_IPV6 if (sk->sk_family == AF_INET6) { struct ipv6_pinfo *np = inet6_sk(sk); /* IPV6_MULTICAST_IF */ WRITE_ONCE(np->mcast_oif, dev->ifindex); } #endif release_sock(sk); return 0; } /* * Join a multicast group. * the group is specified by a class D multicast address 224.0.0.0/8 * in the in_addr structure passed in as a parameter. */ static int join_mcast_group(struct sock *sk, struct in_addr *addr, struct net_device *dev) { struct ip_mreqn mreq; int ret; memset(&mreq, 0, sizeof(mreq)); memcpy(&mreq.imr_multiaddr, addr, sizeof(struct in_addr)); if (sk->sk_bound_dev_if && dev->ifindex != sk->sk_bound_dev_if) return -EINVAL; mreq.imr_ifindex = dev->ifindex; lock_sock(sk); ret = ip_mc_join_group(sk, &mreq); release_sock(sk); return ret; } #ifdef CONFIG_IP_VS_IPV6 static int join_mcast_group6(struct sock *sk, struct in6_addr *addr, struct net_device *dev) { int ret; if (sk->sk_bound_dev_if && dev->ifindex != sk->sk_bound_dev_if) return -EINVAL; lock_sock(sk); ret = ipv6_sock_mc_join(sk, dev->ifindex, addr); release_sock(sk); return ret; } #endif static int bind_mcastif_addr(struct socket *sock, struct net_device *dev) { __be32 addr; struct sockaddr_in sin; addr = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE); if (!addr) pr_err("You probably need to specify IP address on " "multicast interface.\n"); IP_VS_DBG(7, "binding socket with (%s) %pI4\n", dev->name, &addr); /* Now bind the socket with the address of multicast interface */ sin.sin_family = AF_INET; sin.sin_addr.s_addr = addr; sin.sin_port = 0; return kernel_bind(sock, (struct sockaddr *)&sin, sizeof(sin)); } static void get_mcast_sockaddr(union ipvs_sockaddr *sa, int *salen, struct ipvs_sync_daemon_cfg *c, int id) { if (AF_INET6 == c->mcast_af) { sa->in6 = (struct sockaddr_in6) { .sin6_family = AF_INET6, .sin6_port = htons(c->mcast_port + id), }; sa->in6.sin6_addr = c->mcast_group.in6; *salen = sizeof(sa->in6); } else { sa->in = (struct sockaddr_in) { .sin_family = AF_INET, .sin_port = htons(c->mcast_port + id), }; sa->in.sin_addr = c->mcast_group.in; *salen = sizeof(sa->in); } } /* * Set up sending multicast socket over UDP */ static int make_send_sock(struct netns_ipvs *ipvs, int id, struct net_device *dev, struct socket **sock_ret) { /* multicast addr */ union ipvs_sockaddr mcast_addr; struct socket *sock; int result, salen; /* First create a socket */ result = sock_create_kern(ipvs->net, ipvs->mcfg.mcast_af, SOCK_DGRAM, IPPROTO_UDP, &sock); if (result < 0) { pr_err("Error during creation of socket; terminating\n"); goto error; } *sock_ret = sock; result = set_mcast_if(sock->sk, dev); if (result < 0) { pr_err("Error setting outbound mcast interface\n"); goto error; } set_mcast_loop(sock->sk, 0); set_mcast_ttl(sock->sk, ipvs->mcfg.mcast_ttl); /* Allow fragmentation if MTU changes */ set_mcast_pmtudisc(sock->sk, IP_PMTUDISC_DONT); result = sysctl_sync_sock_size(ipvs); if (result > 0) set_sock_size(sock->sk, 1, result); if (AF_INET == ipvs->mcfg.mcast_af) result = bind_mcastif_addr(sock, dev); else result = 0; if (result < 0) { pr_err("Error binding address of the mcast interface\n"); goto error; } get_mcast_sockaddr(&mcast_addr, &salen, &ipvs->mcfg, id); result = kernel_connect(sock, (struct sockaddr *)&mcast_addr,
salen, 0); if (result < 0) { pr_err("Error connecting to the multicast addr\n"); goto error; } return 0; error: return result; } /* * Set up receiving multicast socket over UDP */ static int make_receive_sock(struct netns_ipvs *ipvs, int id, struct net_device *dev, struct socket **sock_ret) { /* multicast addr */ union ipvs_sockaddr mcast_addr; struct socket *sock; int result, salen; /* First create a socket */ result = sock_create_kern(ipvs->net, ipvs->bcfg.mcast_af, SOCK_DGRAM, IPPROTO_UDP, &sock); if (result < 0) { pr_err("Error during creation of socket; terminating\n"); goto error; } *sock_ret = sock; /* it is equivalent to the REUSEADDR option in user-space */ sock->sk->sk_reuse = SK_CAN_REUSE; result = sysctl_sync_sock_size(ipvs); if (result > 0) set_sock_size(sock->sk, 0, result); get_mcast_sockaddr(&mcast_addr, &salen, &ipvs->bcfg, id); sock->sk->sk_bound_dev_if = dev->ifindex; result = kernel_bind(sock, (struct sockaddr *)&mcast_addr, salen); if (result < 0) { pr_err("Error binding to the multicast addr\n"); goto error; } /* join the multicast group */ #ifdef CONFIG_IP_VS_IPV6 if (ipvs->bcfg.mcast_af == AF_INET6) result = join_mcast_group6(sock->sk, &mcast_addr.in6.sin6_addr, dev); else #endif result = join_mcast_group(sock->sk, &mcast_addr.in.sin_addr, dev); if (result < 0) { pr_err("Error joining to the multicast group\n"); goto error; } return 0; error: return result; } static int ip_vs_send_async(struct socket *sock, const char *buffer, const size_t length) { struct msghdr msg = {.msg_flags = MSG_DONTWAIT|MSG_NOSIGNAL}; struct kvec iov; int len; iov.iov_base = (void *)buffer; iov.iov_len = length; len = kernel_sendmsg(sock, &msg, &iov, 1, (size_t)(length)); return len; } static int ip_vs_send_sync_msg(struct socket *sock, struct ip_vs_sync_mesg *msg) { int msize; int ret; msize = ntohs(msg->size); ret = ip_vs_send_async(sock, (char *)msg, msize); if (ret >= 0 || ret == -EAGAIN) return ret; pr_err("ip_vs_send_async error %d\n", ret); return 0; } static int ip_vs_receive(struct socket *sock, char *buffer, const size_t buflen) { struct msghdr msg = {NULL,}; struct kvec iov = {buffer, buflen}; int len; /* Receive a packet */ iov_iter_kvec(&msg.msg_iter, ITER_DEST, &iov, 1, buflen); len = sock_recvmsg(sock, &msg, MSG_DONTWAIT); if (len < 0) return len; return len; } /* Wakeup the master thread for sending */ static void master_wakeup_work_handler(struct work_struct *work) { struct ipvs_master_sync_state *ms = container_of(work, struct ipvs_master_sync_state, master_wakeup_work.work); struct netns_ipvs *ipvs = ms->ipvs; spin_lock_bh(&ipvs->sync_lock); if (ms->sync_queue_len && ms->sync_queue_delay < IPVS_SYNC_WAKEUP_RATE) { int id = (int)(ms - ipvs->ms); ms->sync_queue_delay = IPVS_SYNC_WAKEUP_RATE; wake_up_process(ipvs->master_tinfo[id].task); } spin_unlock_bh(&ipvs->sync_lock); } /* Get next buffer to send */ static inline struct ip_vs_sync_buff * next_sync_buff(struct netns_ipvs *ipvs, struct ipvs_master_sync_state *ms) { struct ip_vs_sync_buff *sb; sb = sb_dequeue(ipvs, ms); if (sb) return sb; /* Do not delay entries in buffer for more than 2 seconds */ return get_curr_sync_buff(ipvs, ms, IPVS_SYNC_FLUSH_TIME); } static int sync_thread_master(void *data) { struct ip_vs_sync_thread_data *tinfo = data; struct netns_ipvs *ipvs = tinfo->ipvs; struct ipvs_master_sync_state *ms = &ipvs->ms[tinfo->id]; struct sock *sk = tinfo->sock->sk; struct ip_vs_sync_buff *sb; pr_info("sync thread started: state = MASTER, mcast_ifn = %s, " "syncid = %d, id = %d\n", ipvs->mcfg.mcast_ifn, 
ipvs->mcfg.syncid, tinfo->id); for (;;) { sb = next_sync_buff(ipvs, ms); if (unlikely(kthread_should_stop())) break; if (!sb) { schedule_timeout(IPVS_SYNC_CHECK_PERIOD); continue; } while (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) < 0) { /* (Ab)use interruptible sleep to avoid increasing * the load avg. */ __wait_event_interruptible(*sk_sleep(sk), sock_writeable(sk) || kthread_should_stop()); if (unlikely(kthread_should_stop())) goto done; } ip_vs_sync_buff_release(sb); } done: __set_current_state(TASK_RUNNING); if (sb) ip_vs_sync_buff_release(sb); /* clean up the sync_buff queue */ while ((sb = sb_dequeue(ipvs, ms))) ip_vs_sync_buff_release(sb); __set_current_state(TASK_RUNNING); /* clean up the current sync_buff */ sb = get_curr_sync_buff(ipvs, ms, 0); if (sb) ip_vs_sync_buff_release(sb); return 0; } static int sync_thread_backup(void *data) { struct ip_vs_sync_thread_data *tinfo = data; struct netns_ipvs *ipvs = tinfo->ipvs; struct sock *sk = tinfo->sock->sk; struct udp_sock *up = udp_sk(sk); int len; pr_info("sync thread started: state = BACKUP, mcast_ifn = %s, " "syncid = %d, id = %d\n", ipvs->bcfg.mcast_ifn, ipvs->bcfg.syncid, tinfo->id); while (!kthread_should_stop()) { wait_event_interruptible(*sk_sleep(sk), !skb_queue_empty_lockless(&sk->sk_receive_queue) || !skb_queue_empty_lockless(&up->reader_queue) || kthread_should_stop()); /* do we have data now? */ while (!skb_queue_empty_lockless(&sk->sk_receive_queue) || !skb_queue_empty_lockless(&up->reader_queue)) { len = ip_vs_receive(tinfo->sock, tinfo->buf, ipvs->bcfg.sync_maxlen); if (len <= 0) { if (len != -EAGAIN) pr_err("receiving message error\n"); break; } ip_vs_process_message(ipvs, tinfo->buf, len); } } return 0; } int start_sync_thread(struct netns_ipvs *ipvs, struct ipvs_sync_daemon_cfg *c, int state) { struct ip_vs_sync_thread_data *ti = NULL, *tinfo; struct task_struct *task; struct net_device *dev; char *name; int (*threadfn)(void *data); int id = 0, count, hlen; int result = -ENOMEM; u16 mtu, min_mtu; IP_VS_DBG(7, "%s(): pid %d\n", __func__, task_pid_nr(current)); IP_VS_DBG(7, "Each ip_vs_sync_conn entry needs %zd bytes\n", sizeof(struct ip_vs_sync_conn_v0)); /* increase the module use count */ if (!ip_vs_use_count_inc()) return -ENOPROTOOPT; /* Do not hold one mutex and then to block on another */ for (;;) { rtnl_lock(); if (mutex_trylock(&ipvs->sync_mutex)) break; rtnl_unlock(); mutex_lock(&ipvs->sync_mutex); if (rtnl_trylock()) break; mutex_unlock(&ipvs->sync_mutex); } if (!ipvs->sync_state) { count = clamp(sysctl_sync_ports(ipvs), 1, IPVS_SYNC_PORTS_MAX); ipvs->threads_mask = count - 1; } else count = ipvs->threads_mask + 1; if (c->mcast_af == AF_UNSPEC) { c->mcast_af = AF_INET; c->mcast_group.ip = cpu_to_be32(IP_VS_SYNC_GROUP); } if (!c->mcast_port) c->mcast_port = IP_VS_SYNC_PORT; if (!c->mcast_ttl) c->mcast_ttl = 1; dev = __dev_get_by_name(ipvs->net, c->mcast_ifn); if (!dev) { pr_err("Unknown mcast interface: %s\n", c->mcast_ifn); result = -ENODEV; goto out_early; } hlen = (AF_INET6 == c->mcast_af) ? sizeof(struct ipv6hdr) + sizeof(struct udphdr) : sizeof(struct iphdr) + sizeof(struct udphdr); mtu = (state == IP_VS_STATE_BACKUP) ? clamp(dev->mtu, 1500U, 65535U) : 1500U; min_mtu = (state == IP_VS_STATE_BACKUP) ? 
1024 : 1; if (c->sync_maxlen) c->sync_maxlen = clamp_t(unsigned int, c->sync_maxlen, min_mtu, 65535 - hlen); else c->sync_maxlen = mtu - hlen; if (state == IP_VS_STATE_MASTER) { result = -EEXIST; if (ipvs->ms) goto out_early; ipvs->mcfg = *c; name = "ipvs-m:%d:%d"; threadfn = sync_thread_master; } else if (state == IP_VS_STATE_BACKUP) { result = -EEXIST; if (ipvs->backup_tinfo) goto out_early; ipvs->bcfg = *c; name = "ipvs-b:%d:%d"; threadfn = sync_thread_backup; } else { result = -EINVAL; goto out_early; } if (state == IP_VS_STATE_MASTER) { struct ipvs_master_sync_state *ms; result = -ENOMEM; ipvs->ms = kcalloc(count, sizeof(ipvs->ms[0]), GFP_KERNEL); if (!ipvs->ms) goto out; ms = ipvs->ms; for (id = 0; id < count; id++, ms++) { INIT_LIST_HEAD(&ms->sync_queue); ms->sync_queue_len = 0; ms->sync_queue_delay = 0; INIT_DELAYED_WORK(&ms->master_wakeup_work, master_wakeup_work_handler); ms->ipvs = ipvs; } } result = -ENOMEM; ti = kcalloc(count, sizeof(struct ip_vs_sync_thread_data), GFP_KERNEL); if (!ti) goto out; for (id = 0; id < count; id++) { tinfo = &ti[id]; tinfo->ipvs = ipvs; if (state == IP_VS_STATE_BACKUP) { result = -ENOMEM; tinfo->buf = kmalloc(ipvs->bcfg.sync_maxlen, GFP_KERNEL); if (!tinfo->buf) goto out; } tinfo->id = id; if (state == IP_VS_STATE_MASTER) result = make_send_sock(ipvs, id, dev, &tinfo->sock); else result = make_receive_sock(ipvs, id, dev, &tinfo->sock); if (result < 0) goto out; task = kthread_run(threadfn, tinfo, name, ipvs->gen, id); if (IS_ERR(task)) { result = PTR_ERR(task); goto out; } tinfo->task = task; } /* mark as active */ if (state == IP_VS_STATE_MASTER) ipvs->master_tinfo = ti; else ipvs->backup_tinfo = ti; spin_lock_bh(&ipvs->sync_buff_lock); ipvs->sync_state |= state; spin_unlock_bh(&ipvs->sync_buff_lock); mutex_unlock(&ipvs->sync_mutex); rtnl_unlock(); return 0; out: /* We do not need RTNL lock anymore, release it here so that * sock_release below can use rtnl_lock to leave the mcast group. */ rtnl_unlock(); id = min(id, count - 1); if (ti) { for (tinfo = ti + id; tinfo >= ti; tinfo--) { if (tinfo->task) kthread_stop(tinfo->task); } } if (!(ipvs->sync_state & IP_VS_STATE_MASTER)) { kfree(ipvs->ms); ipvs->ms = NULL; } mutex_unlock(&ipvs->sync_mutex); /* No more mutexes, release socks */ if (ti) { for (tinfo = ti + id; tinfo >= ti; tinfo--) { if (tinfo->sock) sock_release(tinfo->sock); kfree(tinfo->buf); } kfree(ti); } /* decrease the module use count */ ip_vs_use_count_dec(); return result; out_early: mutex_unlock(&ipvs->sync_mutex); rtnl_unlock(); /* decrease the module use count */ ip_vs_use_count_dec(); return result; } int stop_sync_thread(struct netns_ipvs *ipvs, int state) { struct ip_vs_sync_thread_data *ti, *tinfo; int id; int retc = -EINVAL; IP_VS_DBG(7, "%s(): pid %d\n", __func__, task_pid_nr(current)); mutex_lock(&ipvs->sync_mutex); if (state == IP_VS_STATE_MASTER) { retc = -ESRCH; if (!ipvs->ms) goto err; ti = ipvs->master_tinfo; /* * The lock synchronizes with sb_queue_tail(), so that we don't * add sync buffers to the queue, when we are already in * progress of stopping the master sync daemon. 
*/ spin_lock_bh(&ipvs->sync_buff_lock); spin_lock(&ipvs->sync_lock); ipvs->sync_state &= ~IP_VS_STATE_MASTER; spin_unlock(&ipvs->sync_lock); spin_unlock_bh(&ipvs->sync_buff_lock); retc = 0; for (id = ipvs->threads_mask; id >= 0; id--) { struct ipvs_master_sync_state *ms = &ipvs->ms[id]; int ret; tinfo = &ti[id]; pr_info("stopping master sync thread %d ...\n", task_pid_nr(tinfo->task)); cancel_delayed_work_sync(&ms->master_wakeup_work); ret = kthread_stop(tinfo->task); if (retc >= 0) retc = ret; } kfree(ipvs->ms); ipvs->ms = NULL; ipvs->master_tinfo = NULL; } else if (state == IP_VS_STATE_BACKUP) { retc = -ESRCH; if (!ipvs->backup_tinfo) goto err; ti = ipvs->backup_tinfo; ipvs->sync_state &= ~IP_VS_STATE_BACKUP; retc = 0; for (id = ipvs->threads_mask; id >= 0; id--) { int ret; tinfo = &ti[id]; pr_info("stopping backup sync thread %d ...\n", task_pid_nr(tinfo->task)); ret = kthread_stop(tinfo->task); if (retc >= 0) retc = ret; } ipvs->backup_tinfo = NULL; } else { goto err; } id = ipvs->threads_mask; mutex_unlock(&ipvs->sync_mutex); /* No more mutexes, release socks */ for (tinfo = ti + id; tinfo >= ti; tinfo--) { if (tinfo->sock) sock_release(tinfo->sock); kfree(tinfo->buf); } kfree(ti); /* decrease the module use count */ ip_vs_use_count_dec(); return retc; err: mutex_unlock(&ipvs->sync_mutex); return retc; } /* * Initialize data struct for each netns */ int __net_init ip_vs_sync_net_init(struct netns_ipvs *ipvs) { __mutex_init(&ipvs->sync_mutex, "ipvs->sync_mutex", &__ipvs_sync_key); spin_lock_init(&ipvs->sync_lock); spin_lock_init(&ipvs->sync_buff_lock); return 0; } void ip_vs_sync_net_cleanup(struct netns_ipvs *ipvs) { int retc; retc = stop_sync_thread(ipvs, IP_VS_STATE_MASTER); if (retc && retc != -ESRCH) pr_err("Failed to stop Master Daemon\n"); retc = stop_sync_thread(ipvs, IP_VS_STATE_BACKUP); if (retc && retc != -ESRCH) pr_err("Failed to stop Backup Daemon\n"); } |
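/*
 * Editor's note: the following is an illustrative user-space sketch, not part
 * of ip_vs_sync.c. The set_mcast_loop(), set_mcast_ttl(), set_mcast_pmtudisc(),
 * set_mcast_if() and set_sock_size() helpers above each note the setsockopt()
 * call they correspond to; this sketch shows those same options applied from
 * user space when opening a multicast UDP sending socket. The function name
 * example_mcast_send_sock() and all numeric values (TTL, buffer size, group,
 * port) are made-up examples, not values mandated by IPVS.
 */
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int example_mcast_send_sock(const char *group, unsigned short port,
				   const char *ifname)
{
	struct ip_mreqn mreq;
	struct sockaddr_in dst;
	unsigned char ttl = 1;		/* like set_mcast_ttl(): stay on the local segment */
	unsigned char loop = 0;		/* like set_mcast_loop(sk, 0): no local loopback */
	int pmtud = IP_PMTUDISC_DONT;	/* like set_mcast_pmtudisc(): allow fragmentation */
	int sndbuf = 212992;		/* example value; analogous to set_sock_size(sk, 1, val) */
	int fd;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&mreq, 0, sizeof(mreq));
	mreq.imr_ifindex = if_nametoindex(ifname);
	if (!mreq.imr_ifindex) {
		close(fd);
		return -1;
	}

	/* user-space equivalents of the kernel-side option helpers above */
	setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &mreq, sizeof(mreq));
	setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
	setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop));
	setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtud, sizeof(pmtud));
	setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	inet_pton(AF_INET, group, &dst.sin_addr);

	/* connect() so plain send() works later, mirroring kernel_connect() in make_send_sock() */
	if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
/*
 * A receive-side daemon would instead bind() to the group address/port and
 * join the group with IP_ADD_MEMBERSHIP, mirroring make_receive_sock() and
 * join_mcast_group() above.
 */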
/* * Copyright © 1997-2003 by The XFree86 Project, Inc. * Copyright © 2007 Dave Airlie * Copyright © 2007-2008 Intel Corporation * Jesse Barnes <jesse.barnes@intel.com> * Copyright 2005-2006 Luc Verhaegen * Copyright (c) 2001, Andy Ritger aritger@nvidia.com * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR * OTHER DEALINGS IN THE SOFTWARE. * * Except as contained in this notice, the name of the copyright holder(s) * and author(s) shall not be used in advertising or otherwise to promote * the sale, use or other dealings in this Software without prior written * authorization from the copyright holder(s) and author(s). */ #include <linux/ctype.h> #include <linux/export.h> #include <linux/fb.h> /* for KHZ2PICOS() */ #include <linux/list.h> #include <linux/list_sort.h> #include <linux/of.h> #include <video/of_display_timing.h> #include <video/of_videomode.h> #include <video/videomode.h> #include <drm/drm_crtc.h> #include <drm/drm_device.h> #include <drm/drm_edid.h> #include <drm/drm_modes.h> #include <drm/drm_print.h> #include "drm_crtc_internal.h" /** * drm_mode_debug_printmodeline - print a mode to dmesg * @mode: mode to print * * Describe @mode using DRM_DEBUG. */ void drm_mode_debug_printmodeline(const struct drm_display_mode *mode) { DRM_DEBUG_KMS("Modeline " DRM_MODE_FMT "\n", DRM_MODE_ARG(mode)); } EXPORT_SYMBOL(drm_mode_debug_printmodeline); /** * drm_mode_create - create a new display mode * @dev: DRM device * * Create a new, cleared drm_display_mode with kzalloc and return it. * * Returns: * Pointer to new mode on success, NULL on error. */ struct drm_display_mode *drm_mode_create(struct drm_device *dev) { struct drm_display_mode *nmode; nmode = kzalloc(sizeof(struct drm_display_mode), GFP_KERNEL); if (!nmode) return NULL; return nmode; } EXPORT_SYMBOL(drm_mode_create); /** * drm_mode_destroy - remove a mode * @dev: DRM device * @mode: mode to remove * * Free the @mode structure itself using kfree. */ void drm_mode_destroy(struct drm_device *dev, struct drm_display_mode *mode) { if (!mode) return; kfree(mode); } EXPORT_SYMBOL(drm_mode_destroy); /** * drm_mode_probed_add - add a mode to a connector's probed_mode list * @connector: connector the new mode is added to * @mode: mode data * * Add @mode to @connector's probed_mode list for later use. This list should * then in a second step get filtered and all the modes actually supported by * the hardware moved to the @connector's modes list.
*/ void drm_mode_probed_add(struct drm_connector *connector, struct drm_display_mode *mode) { WARN_ON(!mutex_is_locked(&connector->dev->mode_config.mutex)); list_add_tail(&mode->head, &connector->probed_modes); } EXPORT_SYMBOL(drm_mode_probed_add); enum drm_mode_analog { DRM_MODE_ANALOG_NTSC, /* 525 lines, 60Hz */ DRM_MODE_ANALOG_PAL, /* 625 lines, 50Hz */ }; /* * The timings come from: * - https://web.archive.org/web/20220406232708/http://www.kolumbus.fi/pami1/video/pal_ntsc.html * - https://web.archive.org/web/20220406124914/http://martin.hinner.info/vga/pal.html * - https://web.archive.org/web/20220609202433/http://www.batsocks.co.uk/readme/video_timing.htm */ #define NTSC_LINE_DURATION_NS 63556U #define NTSC_LINES_NUMBER 525 #define NTSC_HBLK_DURATION_TYP_NS 10900U #define NTSC_HBLK_DURATION_MIN_NS (NTSC_HBLK_DURATION_TYP_NS - 200) #define NTSC_HBLK_DURATION_MAX_NS (NTSC_HBLK_DURATION_TYP_NS + 200) #define NTSC_HACT_DURATION_TYP_NS (NTSC_LINE_DURATION_NS - NTSC_HBLK_DURATION_TYP_NS) #define NTSC_HACT_DURATION_MIN_NS (NTSC_LINE_DURATION_NS - NTSC_HBLK_DURATION_MAX_NS) #define NTSC_HACT_DURATION_MAX_NS (NTSC_LINE_DURATION_NS - NTSC_HBLK_DURATION_MIN_NS) #define NTSC_HFP_DURATION_TYP_NS 1500 #define NTSC_HFP_DURATION_MIN_NS 1270 #define NTSC_HFP_DURATION_MAX_NS 2220 #define NTSC_HSLEN_DURATION_TYP_NS 4700 #define NTSC_HSLEN_DURATION_MIN_NS (NTSC_HSLEN_DURATION_TYP_NS - 100) #define NTSC_HSLEN_DURATION_MAX_NS (NTSC_HSLEN_DURATION_TYP_NS + 100) #define NTSC_HBP_DURATION_TYP_NS 4700 /* * I couldn't find the actual tolerance for the back porch, so let's * just reuse the sync length ones. */ #define NTSC_HBP_DURATION_MIN_NS (NTSC_HBP_DURATION_TYP_NS - 100) #define NTSC_HBP_DURATION_MAX_NS (NTSC_HBP_DURATION_TYP_NS + 100) #define PAL_LINE_DURATION_NS 64000U #define PAL_LINES_NUMBER 625 #define PAL_HACT_DURATION_TYP_NS 51950U #define PAL_HACT_DURATION_MIN_NS (PAL_HACT_DURATION_TYP_NS - 100) #define PAL_HACT_DURATION_MAX_NS (PAL_HACT_DURATION_TYP_NS + 400) #define PAL_HBLK_DURATION_TYP_NS (PAL_LINE_DURATION_NS - PAL_HACT_DURATION_TYP_NS) #define PAL_HBLK_DURATION_MIN_NS (PAL_LINE_DURATION_NS - PAL_HACT_DURATION_MAX_NS) #define PAL_HBLK_DURATION_MAX_NS (PAL_LINE_DURATION_NS - PAL_HACT_DURATION_MIN_NS) #define PAL_HFP_DURATION_TYP_NS 1650 #define PAL_HFP_DURATION_MIN_NS (PAL_HFP_DURATION_TYP_NS - 100) #define PAL_HFP_DURATION_MAX_NS (PAL_HFP_DURATION_TYP_NS + 400) #define PAL_HSLEN_DURATION_TYP_NS 4700 #define PAL_HSLEN_DURATION_MIN_NS (PAL_HSLEN_DURATION_TYP_NS - 200) #define PAL_HSLEN_DURATION_MAX_NS (PAL_HSLEN_DURATION_TYP_NS + 200) #define PAL_HBP_DURATION_TYP_NS 5700 #define PAL_HBP_DURATION_MIN_NS (PAL_HBP_DURATION_TYP_NS - 200) #define PAL_HBP_DURATION_MAX_NS (PAL_HBP_DURATION_TYP_NS + 200) struct analog_param_field { unsigned int even, odd; }; #define PARAM_FIELD(_odd, _even) \ { .even = _even, .odd = _odd } struct analog_param_range { unsigned int min, typ, max; }; #define PARAM_RANGE(_min, _typ, _max) \ { .min = _min, .typ = _typ, .max = _max } struct analog_parameters { unsigned int num_lines; unsigned int line_duration_ns; struct analog_param_range hact_ns; struct analog_param_range hfp_ns; struct analog_param_range hslen_ns; struct analog_param_range hbp_ns; struct analog_param_range hblk_ns; unsigned int bt601_hfp; struct analog_param_field vfp_lines; struct analog_param_field vslen_lines; struct analog_param_field vbp_lines; }; #define TV_MODE_PARAMETER(_mode, _lines, _line_dur, _hact, _hfp, \ _hslen, _hbp, _hblk, _bt601_hfp, _vfp, \ _vslen, _vbp) \ [_mode] = { \ .num_lines = _lines, 
\ .line_duration_ns = _line_dur, \ .hact_ns = _hact, \ .hfp_ns = _hfp, \ .hslen_ns = _hslen, \ .hbp_ns = _hbp, \ .hblk_ns = _hblk, \ .bt601_hfp = _bt601_hfp, \ .vfp_lines = _vfp, \ .vslen_lines = _vslen, \ .vbp_lines = _vbp, \ } static const struct analog_parameters tv_modes_parameters[] = { TV_MODE_PARAMETER(DRM_MODE_ANALOG_NTSC, NTSC_LINES_NUMBER, NTSC_LINE_DURATION_NS, PARAM_RANGE(NTSC_HACT_DURATION_MIN_NS, NTSC_HACT_DURATION_TYP_NS, NTSC_HACT_DURATION_MAX_NS), PARAM_RANGE(NTSC_HFP_DURATION_MIN_NS, NTSC_HFP_DURATION_TYP_NS, NTSC_HFP_DURATION_MAX_NS), PARAM_RANGE(NTSC_HSLEN_DURATION_MIN_NS, NTSC_HSLEN_DURATION_TYP_NS, NTSC_HSLEN_DURATION_MAX_NS), PARAM_RANGE(NTSC_HBP_DURATION_MIN_NS, NTSC_HBP_DURATION_TYP_NS, NTSC_HBP_DURATION_MAX_NS), PARAM_RANGE(NTSC_HBLK_DURATION_MIN_NS, NTSC_HBLK_DURATION_TYP_NS, NTSC_HBLK_DURATION_MAX_NS), 16, PARAM_FIELD(3, 3), PARAM_FIELD(3, 3), PARAM_FIELD(16, 17)), TV_MODE_PARAMETER(DRM_MODE_ANALOG_PAL, PAL_LINES_NUMBER, PAL_LINE_DURATION_NS, PARAM_RANGE(PAL_HACT_DURATION_MIN_NS, PAL_HACT_DURATION_TYP_NS, PAL_HACT_DURATION_MAX_NS), PARAM_RANGE(PAL_HFP_DURATION_MIN_NS, PAL_HFP_DURATION_TYP_NS, PAL_HFP_DURATION_MAX_NS), PARAM_RANGE(PAL_HSLEN_DURATION_MIN_NS, PAL_HSLEN_DURATION_TYP_NS, PAL_HSLEN_DURATION_MAX_NS), PARAM_RANGE(PAL_HBP_DURATION_MIN_NS, PAL_HBP_DURATION_TYP_NS, PAL_HBP_DURATION_MAX_NS), PARAM_RANGE(PAL_HBLK_DURATION_MIN_NS, PAL_HBLK_DURATION_TYP_NS, PAL_HBLK_DURATION_MAX_NS), 12, /* * The front porch is actually 6 short sync * pulses for the even field, and 5 for the * odd field. Each sync takes half a life so * the odd field front porch is shorter by * half a line. * * In progressive, we're supposed to use 6 * pulses, so we're fine there */ PARAM_FIELD(3, 2), /* * The vsync length is 5 long sync pulses, * each field taking half a line. We're * shorter for both fields by half a line. * * In progressive, we're supposed to use 5 * pulses, so we're off by half * a line. * * In interlace, we're now off by half a line * for the even field and one line for the odd * field. */ PARAM_FIELD(3, 3), /* * The back porch starts with post-equalizing * pulses, consisting in 5 short sync pulses * for the even field, 4 for the odd field. In * progressive, it's 5 short syncs. * * In progressive, we thus have 2.5 lines, * plus the 0.5 line we were missing * previously, so we should use 3 lines. * * In interlace, the even field is in the * exact same case than progressive. For the * odd field, we should be using 2 lines but * we're one line short, so we'll make up for * it here by using 3. * * The entire blanking area is supposed to * take 25 lines, so we also need to account * for the rest of the blanking area that * can't be in either the front porch or sync * period. */ PARAM_FIELD(19, 20)), }; static int fill_analog_mode(struct drm_device *dev, struct drm_display_mode *mode, const struct analog_parameters *params, unsigned long pixel_clock_hz, unsigned int hactive, unsigned int vactive, bool interlace) { unsigned long pixel_duration_ns = NSEC_PER_SEC / pixel_clock_hz; unsigned int htotal, vtotal; unsigned int max_hact, hact_duration_ns; unsigned int hblk, hblk_duration_ns; unsigned int hfp, hfp_duration_ns; unsigned int hslen, hslen_duration_ns; unsigned int hbp, hbp_duration_ns; unsigned int porches, porches_duration_ns; unsigned int vfp, vfp_min; unsigned int vbp, vbp_min; unsigned int vslen; bool bt601 = false; int porches_rem; u64 result; drm_dbg_kms(dev, "Generating a %ux%u%c, %u-line mode with a %lu kHz clock\n", hactive, vactive, interlace ? 
'i' : 'p', params->num_lines, pixel_clock_hz / 1000); max_hact = params->hact_ns.max / pixel_duration_ns; if (pixel_clock_hz == 13500000 && hactive > max_hact && hactive <= 720) { drm_dbg_kms(dev, "Trying to generate a BT.601 mode. Disabling checks.\n"); bt601 = true; } /* * Our pixel duration is going to be round down by the division, * so rounding up is probably going to introduce even more * deviation. */ result = (u64)params->line_duration_ns * pixel_clock_hz; do_div(result, NSEC_PER_SEC); htotal = result; drm_dbg_kms(dev, "Total Horizontal Number of Pixels: %u\n", htotal); hact_duration_ns = hactive * pixel_duration_ns; if (!bt601 && (hact_duration_ns < params->hact_ns.min || hact_duration_ns > params->hact_ns.max)) { drm_err(dev, "Invalid horizontal active area duration: %uns (min: %u, max %u)\n", hact_duration_ns, params->hact_ns.min, params->hact_ns.max); return -EINVAL; } hblk = htotal - hactive; drm_dbg_kms(dev, "Horizontal Blanking Period: %u\n", hblk); hblk_duration_ns = hblk * pixel_duration_ns; if (!bt601 && (hblk_duration_ns < params->hblk_ns.min || hblk_duration_ns > params->hblk_ns.max)) { drm_err(dev, "Invalid horizontal blanking duration: %uns (min: %u, max %u)\n", hblk_duration_ns, params->hblk_ns.min, params->hblk_ns.max); return -EINVAL; } hslen = DIV_ROUND_UP(params->hslen_ns.typ, pixel_duration_ns); drm_dbg_kms(dev, "Horizontal Sync Period: %u\n", hslen); hslen_duration_ns = hslen * pixel_duration_ns; if (!bt601 && (hslen_duration_ns < params->hslen_ns.min || hslen_duration_ns > params->hslen_ns.max)) { drm_err(dev, "Invalid horizontal sync duration: %uns (min: %u, max %u)\n", hslen_duration_ns, params->hslen_ns.min, params->hslen_ns.max); return -EINVAL; } porches = hblk - hslen; drm_dbg_kms(dev, "Remaining horizontal pixels for both porches: %u\n", porches); porches_duration_ns = porches * pixel_duration_ns; if (!bt601 && (porches_duration_ns > (params->hfp_ns.max + params->hbp_ns.max) || porches_duration_ns < (params->hfp_ns.min + params->hbp_ns.min))) { drm_err(dev, "Invalid horizontal porches duration: %uns\n", porches_duration_ns); return -EINVAL; } if (bt601) { hfp = params->bt601_hfp; } else { unsigned int hfp_min = DIV_ROUND_UP(params->hfp_ns.min, pixel_duration_ns); unsigned int hbp_min = DIV_ROUND_UP(params->hbp_ns.min, pixel_duration_ns); int porches_rem = porches - hfp_min - hbp_min; hfp = hfp_min + DIV_ROUND_UP(porches_rem, 2); } drm_dbg_kms(dev, "Horizontal Front Porch: %u\n", hfp); hfp_duration_ns = hfp * pixel_duration_ns; if (!bt601 && (hfp_duration_ns < params->hfp_ns.min || hfp_duration_ns > params->hfp_ns.max)) { drm_err(dev, "Invalid horizontal front porch duration: %uns (min: %u, max %u)\n", hfp_duration_ns, params->hfp_ns.min, params->hfp_ns.max); return -EINVAL; } hbp = porches - hfp; drm_dbg_kms(dev, "Horizontal Back Porch: %u\n", hbp); hbp_duration_ns = hbp * pixel_duration_ns; if (!bt601 && (hbp_duration_ns < params->hbp_ns.min || hbp_duration_ns > params->hbp_ns.max)) { drm_err(dev, "Invalid horizontal back porch duration: %uns (min: %u, max %u)\n", hbp_duration_ns, params->hbp_ns.min, params->hbp_ns.max); return -EINVAL; } if (htotal != (hactive + hfp + hslen + hbp)) return -EINVAL; mode->clock = pixel_clock_hz / 1000; mode->hdisplay = hactive; mode->hsync_start = mode->hdisplay + hfp; mode->hsync_end = mode->hsync_start + hslen; mode->htotal = mode->hsync_end + hbp; if (interlace) { vfp_min = params->vfp_lines.even + params->vfp_lines.odd; vbp_min = params->vbp_lines.even + params->vbp_lines.odd; vslen = params->vslen_lines.even + 
params->vslen_lines.odd; } else { /* * By convention, NTSC (aka 525/60) systems start with * the even field, but PAL (aka 625/50) systems start * with the odd one. * * PAL systems also have asymmetric timings between the * even and odd field, while NTSC is symmetric. * * Moreover, if we want to create a progressive mode for * PAL, we need to use the odd field timings. * * Since odd == even for NTSC, we can just use the odd * one all the time to simplify the code a bit. */ vfp_min = params->vfp_lines.odd; vbp_min = params->vbp_lines.odd; vslen = params->vslen_lines.odd; } drm_dbg_kms(dev, "Vertical Sync Period: %u\n", vslen); porches = params->num_lines - vactive - vslen; drm_dbg_kms(dev, "Remaining vertical pixels for both porches: %u\n", porches); porches_rem = porches - vfp_min - vbp_min; vfp = vfp_min + (porches_rem / 2); drm_dbg_kms(dev, "Vertical Front Porch: %u\n", vfp); vbp = porches - vfp; drm_dbg_kms(dev, "Vertical Back Porch: %u\n", vbp); vtotal = vactive + vfp + vslen + vbp; if (params->num_lines != vtotal) { drm_err(dev, "Invalid vertical total: %upx (expected %upx)\n", vtotal, params->num_lines); return -EINVAL; } mode->vdisplay = vactive; mode->vsync_start = mode->vdisplay + vfp; mode->vsync_end = mode->vsync_start + vslen; mode->vtotal = mode->vsync_end + vbp; if (mode->vtotal != params->num_lines) return -EINVAL; mode->type = DRM_MODE_TYPE_DRIVER; mode->flags = DRM_MODE_FLAG_NVSYNC | DRM_MODE_FLAG_NHSYNC; if (interlace) mode->flags |= DRM_MODE_FLAG_INTERLACE; drm_mode_set_name(mode); drm_dbg_kms(dev, "Generated mode " DRM_MODE_FMT "\n", DRM_MODE_ARG(mode)); return 0; } /** * drm_analog_tv_mode - create a display mode for an analog TV * @dev: drm device * @tv_mode: TV Mode standard to create a mode for. See DRM_MODE_TV_MODE_*. * @pixel_clock_hz: Pixel Clock Frequency, in Hertz * @hdisplay: hdisplay size * @vdisplay: vdisplay size * @interlace: whether to compute an interlaced mode * * This function creates a struct drm_display_mode instance suited for * an analog TV output, for one of the usual analog TV modes. Where * this is DRM_MODE_TV_MODE_MONOCHROME, a 625-line mode will be created. * * Note that @hdisplay is larger than the usual constraints for the PAL * and NTSC timings, and we'll choose to ignore most timings constraints * to reach those resolutions. * * Returns: * * A pointer to the mode, allocated with drm_mode_create(). Returns NULL * on error. 
*/ struct drm_display_mode *drm_analog_tv_mode(struct drm_device *dev, enum drm_connector_tv_mode tv_mode, unsigned long pixel_clock_hz, unsigned int hdisplay, unsigned int vdisplay, bool interlace) { struct drm_display_mode *mode; enum drm_mode_analog analog; int ret; switch (tv_mode) { case DRM_MODE_TV_MODE_NTSC: fallthrough; case DRM_MODE_TV_MODE_NTSC_443: fallthrough; case DRM_MODE_TV_MODE_NTSC_J: fallthrough; case DRM_MODE_TV_MODE_PAL_M: analog = DRM_MODE_ANALOG_NTSC; break; case DRM_MODE_TV_MODE_PAL: fallthrough; case DRM_MODE_TV_MODE_PAL_N: fallthrough; case DRM_MODE_TV_MODE_SECAM: fallthrough; case DRM_MODE_TV_MODE_MONOCHROME: analog = DRM_MODE_ANALOG_PAL; break; default: return NULL; } mode = drm_mode_create(dev); if (!mode) return NULL; ret = fill_analog_mode(dev, mode, &tv_modes_parameters[analog], pixel_clock_hz, hdisplay, vdisplay, interlace); if (ret) goto err_free_mode; return mode; err_free_mode: drm_mode_destroy(dev, mode); return NULL; } EXPORT_SYMBOL(drm_analog_tv_mode); /** * drm_cvt_mode -create a modeline based on the CVT algorithm * @dev: drm device * @hdisplay: hdisplay size * @vdisplay: vdisplay size * @vrefresh: vrefresh rate * @reduced: whether to use reduced blanking * @interlaced: whether to compute an interlaced mode * @margins: whether to add margins (borders) * * This function is called to generate the modeline based on CVT algorithm * according to the hdisplay, vdisplay, vrefresh. * It is based from the VESA(TM) Coordinated Video Timing Generator by * Graham Loveridge April 9, 2003 available at * http://www.elo.utfsm.cl/~elo212/docs/CVTd6r1.xls * * And it is copied from xf86CVTmode in xserver/hw/xfree86/modes/xf86cvt.c. * What I have done is to translate it by using integer calculation. * * Returns: * The modeline based on the CVT algorithm stored in a drm_display_mode object. * The display mode object is allocated with drm_mode_create(). Returns NULL * when no mode could be allocated. */ struct drm_display_mode *drm_cvt_mode(struct drm_device *dev, int hdisplay, int vdisplay, int vrefresh, bool reduced, bool interlaced, bool margins) { #define HV_FACTOR 1000 /* 1) top/bottom margin size (% of height) - default: 1.8, */ #define CVT_MARGIN_PERCENTAGE 18 /* 2) character cell horizontal granularity (pixels) - default 8 */ #define CVT_H_GRANULARITY 8 /* 3) Minimum vertical porch (lines) - default 3 */ #define CVT_MIN_V_PORCH 3 /* 4) Minimum number of vertical back porch lines - default 6 */ #define CVT_MIN_V_BPORCH 6 /* Pixel Clock step (kHz) */ #define CVT_CLOCK_STEP 250 struct drm_display_mode *drm_mode; unsigned int vfieldrate, hperiod; int hdisplay_rnd, hmargin, vdisplay_rnd, vmargin, vsync; int interlace; u64 tmp; if (!hdisplay || !vdisplay) return NULL; /* allocate the drm_display_mode structure. 
If failure, we will * return directly */ drm_mode = drm_mode_create(dev); if (!drm_mode) return NULL; /* the CVT default refresh rate is 60Hz */ if (!vrefresh) vrefresh = 60; /* the required field fresh rate */ if (interlaced) vfieldrate = vrefresh * 2; else vfieldrate = vrefresh; /* horizontal pixels */ hdisplay_rnd = hdisplay - (hdisplay % CVT_H_GRANULARITY); /* determine the left&right borders */ hmargin = 0; if (margins) { hmargin = hdisplay_rnd * CVT_MARGIN_PERCENTAGE / 1000; hmargin -= hmargin % CVT_H_GRANULARITY; } /* find the total active pixels */ drm_mode->hdisplay = hdisplay_rnd + 2 * hmargin; /* find the number of lines per field */ if (interlaced) vdisplay_rnd = vdisplay / 2; else vdisplay_rnd = vdisplay; /* find the top & bottom borders */ vmargin = 0; if (margins) vmargin = vdisplay_rnd * CVT_MARGIN_PERCENTAGE / 1000; drm_mode->vdisplay = vdisplay + 2 * vmargin; /* Interlaced */ if (interlaced) interlace = 1; else interlace = 0; /* Determine VSync Width from aspect ratio */ if (!(vdisplay % 3) && ((vdisplay * 4 / 3) == hdisplay)) vsync = 4; else if (!(vdisplay % 9) && ((vdisplay * 16 / 9) == hdisplay)) vsync = 5; else if (!(vdisplay % 10) && ((vdisplay * 16 / 10) == hdisplay)) vsync = 6; else if (!(vdisplay % 4) && ((vdisplay * 5 / 4) == hdisplay)) vsync = 7; else if (!(vdisplay % 9) && ((vdisplay * 15 / 9) == hdisplay)) vsync = 7; else /* custom */ vsync = 10; if (!reduced) { /* simplify the GTF calculation */ /* 4) Minimum time of vertical sync + back porch interval (µs) * default 550.0 */ int tmp1, tmp2; #define CVT_MIN_VSYNC_BP 550 /* 3) Nominal HSync width (% of line period) - default 8 */ #define CVT_HSYNC_PERCENTAGE 8 unsigned int hblank_percentage; int vsyncandback_porch, __maybe_unused vback_porch, hblank; /* estimated the horizontal period */ tmp1 = HV_FACTOR * 1000000 - CVT_MIN_VSYNC_BP * HV_FACTOR * vfieldrate; tmp2 = (vdisplay_rnd + 2 * vmargin + CVT_MIN_V_PORCH) * 2 + interlace; hperiod = tmp1 * 2 / (tmp2 * vfieldrate); tmp1 = CVT_MIN_VSYNC_BP * HV_FACTOR / hperiod + 1; /* 9. Find number of lines in sync + backporch */ if (tmp1 < (vsync + CVT_MIN_V_PORCH)) vsyncandback_porch = vsync + CVT_MIN_V_PORCH; else vsyncandback_porch = tmp1; /* 10. Find number of lines in back porch */ vback_porch = vsyncandback_porch - vsync; drm_mode->vtotal = vdisplay_rnd + 2 * vmargin + vsyncandback_porch + CVT_MIN_V_PORCH; /* 5) Definition of Horizontal blanking time limitation */ /* Gradient (%/kHz) - default 600 */ #define CVT_M_FACTOR 600 /* Offset (%) - default 40 */ #define CVT_C_FACTOR 40 /* Blanking time scaling factor - default 128 */ #define CVT_K_FACTOR 128 /* Scaling factor weighting - default 20 */ #define CVT_J_FACTOR 20 #define CVT_M_PRIME (CVT_M_FACTOR * CVT_K_FACTOR / 256) #define CVT_C_PRIME ((CVT_C_FACTOR - CVT_J_FACTOR) * CVT_K_FACTOR / 256 + \ CVT_J_FACTOR) /* 12. Find ideal blanking duty cycle from formula */ hblank_percentage = CVT_C_PRIME * HV_FACTOR - CVT_M_PRIME * hperiod / 1000; /* 13. Blanking time */ if (hblank_percentage < 20 * HV_FACTOR) hblank_percentage = 20 * HV_FACTOR; hblank = drm_mode->hdisplay * hblank_percentage / (100 * HV_FACTOR - hblank_percentage); hblank -= hblank % (2 * CVT_H_GRANULARITY); /* 14. 
find the total pixels per line */ drm_mode->htotal = drm_mode->hdisplay + hblank; drm_mode->hsync_end = drm_mode->hdisplay + hblank / 2; drm_mode->hsync_start = drm_mode->hsync_end - (drm_mode->htotal * CVT_HSYNC_PERCENTAGE) / 100; drm_mode->hsync_start += CVT_H_GRANULARITY - drm_mode->hsync_start % CVT_H_GRANULARITY; /* fill the Vsync values */ drm_mode->vsync_start = drm_mode->vdisplay + CVT_MIN_V_PORCH; drm_mode->vsync_end = drm_mode->vsync_start + vsync; } else { /* Reduced blanking */ /* Minimum vertical blanking interval time (µs)- default 460 */ #define CVT_RB_MIN_VBLANK 460 /* Fixed number of clocks for horizontal sync */ #define CVT_RB_H_SYNC 32 /* Fixed number of clocks for horizontal blanking */ #define CVT_RB_H_BLANK 160 /* Fixed number of lines for vertical front porch - default 3*/ #define CVT_RB_VFPORCH 3 int vbilines; int tmp1, tmp2; /* 8. Estimate Horizontal period. */ tmp1 = HV_FACTOR * 1000000 - CVT_RB_MIN_VBLANK * HV_FACTOR * vfieldrate; tmp2 = vdisplay_rnd + 2 * vmargin; hperiod = tmp1 / (tmp2 * vfieldrate); /* 9. Find number of lines in vertical blanking */ vbilines = CVT_RB_MIN_VBLANK * HV_FACTOR / hperiod + 1; /* 10. Check if vertical blanking is sufficient */ if (vbilines < (CVT_RB_VFPORCH + vsync + CVT_MIN_V_BPORCH)) vbilines = CVT_RB_VFPORCH + vsync + CVT_MIN_V_BPORCH; /* 11. Find total number of lines in vertical field */ drm_mode->vtotal = vdisplay_rnd + 2 * vmargin + vbilines; /* 12. Find total number of pixels in a line */ drm_mode->htotal = drm_mode->hdisplay + CVT_RB_H_BLANK; /* Fill in HSync values */ drm_mode->hsync_end = drm_mode->hdisplay + CVT_RB_H_BLANK / 2; drm_mode->hsync_start = drm_mode->hsync_end - CVT_RB_H_SYNC; /* Fill in VSync values */ drm_mode->vsync_start = drm_mode->vdisplay + CVT_RB_VFPORCH; drm_mode->vsync_end = drm_mode->vsync_start + vsync; } /* 15/13. Find pixel clock frequency (kHz for xf86) */ tmp = drm_mode->htotal; /* perform intermediate calcs in u64 */ tmp *= HV_FACTOR * 1000; do_div(tmp, hperiod); tmp -= drm_mode->clock % CVT_CLOCK_STEP; drm_mode->clock = tmp; /* 18/16. Find actual vertical frame frequency */ /* ignore - just set the mode flag for interlaced */ if (interlaced) { drm_mode->vtotal *= 2; drm_mode->flags |= DRM_MODE_FLAG_INTERLACE; } /* Fill the mode line name */ drm_mode_set_name(drm_mode); if (reduced) drm_mode->flags |= (DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC); else drm_mode->flags |= (DRM_MODE_FLAG_PVSYNC | DRM_MODE_FLAG_NHSYNC); return drm_mode; } EXPORT_SYMBOL(drm_cvt_mode); /** * drm_gtf_mode_complex - create the modeline based on the full GTF algorithm * @dev: drm device * @hdisplay: hdisplay size * @vdisplay: vdisplay size * @vrefresh: vrefresh rate. * @interlaced: whether to compute an interlaced mode * @margins: desired margin (borders) size * @GTF_M: extended GTF formula parameters * @GTF_2C: extended GTF formula parameters * @GTF_K: extended GTF formula parameters * @GTF_2J: extended GTF formula parameters * * GTF feature blocks specify C and J in multiples of 0.5, so we pass them * in here multiplied by two. For a C of 40, pass in 80. * * Returns: * The modeline based on the full GTF algorithm stored in a drm_display_mode object. * The display mode object is allocated with drm_mode_create(). Returns NULL * when no mode could be allocated. 
*/ struct drm_display_mode * drm_gtf_mode_complex(struct drm_device *dev, int hdisplay, int vdisplay, int vrefresh, bool interlaced, int margins, int GTF_M, int GTF_2C, int GTF_K, int GTF_2J) { /* 1) top/bottom margin size (% of height) - default: 1.8, */ #define GTF_MARGIN_PERCENTAGE 18 /* 2) character cell horizontal granularity (pixels) - default 8 */ #define GTF_CELL_GRAN 8 /* 3) Minimum vertical porch (lines) - default 3 */ #define GTF_MIN_V_PORCH 1 /* width of vsync in lines */ #define V_SYNC_RQD 3 /* width of hsync as % of total line */ #define H_SYNC_PERCENT 8 /* min time of vsync + back porch (microsec) */ #define MIN_VSYNC_PLUS_BP 550 /* C' and M' are part of the Blanking Duty Cycle computation */ #define GTF_C_PRIME ((((GTF_2C - GTF_2J) * GTF_K / 256) + GTF_2J) / 2) #define GTF_M_PRIME (GTF_K * GTF_M / 256) struct drm_display_mode *drm_mode; unsigned int hdisplay_rnd, vdisplay_rnd, vfieldrate_rqd; int top_margin, bottom_margin; int interlace; unsigned int hfreq_est; int vsync_plus_bp, __maybe_unused vback_porch; unsigned int vtotal_lines, __maybe_unused vfieldrate_est; unsigned int __maybe_unused hperiod; unsigned int vfield_rate, __maybe_unused vframe_rate; int left_margin, right_margin; unsigned int total_active_pixels, ideal_duty_cycle; unsigned int hblank, total_pixels, pixel_freq; int hsync, hfront_porch, vodd_front_porch_lines; unsigned int tmp1, tmp2; if (!hdisplay || !vdisplay) return NULL; drm_mode = drm_mode_create(dev); if (!drm_mode) return NULL; /* 1. In order to give correct results, the number of horizontal * pixels requested is first processed to ensure that it is divisible * by the character size, by rounding it to the nearest character * cell boundary: */ hdisplay_rnd = (hdisplay + GTF_CELL_GRAN / 2) / GTF_CELL_GRAN; hdisplay_rnd = hdisplay_rnd * GTF_CELL_GRAN; /* 2. If interlace is requested, the number of vertical lines assumed * by the calculation must be halved, as the computation calculates * the number of vertical lines per field. */ if (interlaced) vdisplay_rnd = vdisplay / 2; else vdisplay_rnd = vdisplay; /* 3. Find the frame rate required: */ if (interlaced) vfieldrate_rqd = vrefresh * 2; else vfieldrate_rqd = vrefresh; /* 4. Find number of lines in Top margin: */ top_margin = 0; if (margins) top_margin = (vdisplay_rnd * GTF_MARGIN_PERCENTAGE + 500) / 1000; /* 5. Find number of lines in bottom margin: */ bottom_margin = top_margin; /* 6. If interlace is required, then set variable interlace: */ if (interlaced) interlace = 1; else interlace = 0; /* 7. Estimate the Horizontal frequency */ { tmp1 = (1000000 - MIN_VSYNC_PLUS_BP * vfieldrate_rqd) / 500; tmp2 = (vdisplay_rnd + 2 * top_margin + GTF_MIN_V_PORCH) * 2 + interlace; hfreq_est = (tmp2 * 1000 * vfieldrate_rqd) / tmp1; } /* 8. Find the number of lines in V sync + back porch */ /* [V SYNC+BP] = RINT(([MIN VSYNC+BP] * hfreq_est / 1000000)) */ vsync_plus_bp = MIN_VSYNC_PLUS_BP * hfreq_est / 1000; vsync_plus_bp = (vsync_plus_bp + 500) / 1000; /* 9. Find the number of lines in V back porch alone: */ vback_porch = vsync_plus_bp - V_SYNC_RQD; /* 10. Find the total number of lines in Vertical field period: */ vtotal_lines = vdisplay_rnd + top_margin + bottom_margin + vsync_plus_bp + GTF_MIN_V_PORCH; /* 11. Estimate the Vertical field frequency: */ vfieldrate_est = hfreq_est / vtotal_lines; /* 12. Find the actual horizontal period: */ hperiod = 1000000 / (vfieldrate_rqd * vtotal_lines); /* 13. Find the actual Vertical field frequency: */ vfield_rate = hfreq_est / vtotal_lines; /* 14. 
Find the Vertical frame frequency: */ if (interlaced) vframe_rate = vfield_rate / 2; else vframe_rate = vfield_rate; /* 15. Find number of pixels in left margin: */ if (margins) left_margin = (hdisplay_rnd * GTF_MARGIN_PERCENTAGE + 500) / 1000; else left_margin = 0; /* 16.Find number of pixels in right margin: */ right_margin = left_margin; /* 17.Find total number of active pixels in image and left and right */ total_active_pixels = hdisplay_rnd + left_margin + right_margin; /* 18.Find the ideal blanking duty cycle from blanking duty cycle */ ideal_duty_cycle = GTF_C_PRIME * 1000 - (GTF_M_PRIME * 1000000 / hfreq_est); /* 19.Find the number of pixels in the blanking time to the nearest * double character cell: */ hblank = total_active_pixels * ideal_duty_cycle / (100000 - ideal_duty_cycle); hblank = (hblank + GTF_CELL_GRAN) / (2 * GTF_CELL_GRAN); hblank = hblank * 2 * GTF_CELL_GRAN; /* 20.Find total number of pixels: */ total_pixels = total_active_pixels + hblank; /* 21.Find pixel clock frequency: */ pixel_freq = total_pixels * hfreq_est / 1000; /* Stage 1 computations are now complete; I should really pass * the results to another function and do the Stage 2 computations, * but I only need a few more values so I'll just append the * computations here for now */ /* 17. Find the number of pixels in the horizontal sync period: */ hsync = H_SYNC_PERCENT * total_pixels / 100; hsync = (hsync + GTF_CELL_GRAN / 2) / GTF_CELL_GRAN; hsync = hsync * GTF_CELL_GRAN; /* 18. Find the number of pixels in horizontal front porch period */ hfront_porch = hblank / 2 - hsync; /* 36. Find the number of lines in the odd front porch period: */ vodd_front_porch_lines = GTF_MIN_V_PORCH ; /* finally, pack the results in the mode struct */ drm_mode->hdisplay = hdisplay_rnd; drm_mode->hsync_start = hdisplay_rnd + hfront_porch; drm_mode->hsync_end = drm_mode->hsync_start + hsync; drm_mode->htotal = total_pixels; drm_mode->vdisplay = vdisplay_rnd; drm_mode->vsync_start = vdisplay_rnd + vodd_front_porch_lines; drm_mode->vsync_end = drm_mode->vsync_start + V_SYNC_RQD; drm_mode->vtotal = vtotal_lines; drm_mode->clock = pixel_freq; if (interlaced) { drm_mode->vtotal *= 2; drm_mode->flags |= DRM_MODE_FLAG_INTERLACE; } drm_mode_set_name(drm_mode); if (GTF_M == 600 && GTF_2C == 80 && GTF_K == 128 && GTF_2J == 40) drm_mode->flags = DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_PVSYNC; else drm_mode->flags = DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC; return drm_mode; } EXPORT_SYMBOL(drm_gtf_mode_complex); /** * drm_gtf_mode - create the modeline based on the GTF algorithm * @dev: drm device * @hdisplay: hdisplay size * @vdisplay: vdisplay size * @vrefresh: vrefresh rate. * @interlaced: whether to compute an interlaced mode * @margins: desired margin (borders) size * * return the modeline based on GTF algorithm * * This function is to create the modeline based on the GTF algorithm. * Generalized Timing Formula is derived from: * * GTF Spreadsheet by Andy Morrish (1/5/97) * available at https://www.vesa.org * * And it is copied from the file of xserver/hw/xfree86/modes/xf86gtf.c. * What I have done is to translate it by using integer calculation. * I also refer to the function of fb_get_mode in the file of * drivers/video/fbmon.c * * Standard GTF parameters:: * * M = 600 * C = 40 * K = 128 * J = 20 * * Returns: * The modeline based on the GTF algorithm stored in a drm_display_mode object. * The display mode object is allocated with drm_mode_create(). Returns NULL * when no mode could be allocated. 
*/ struct drm_display_mode * drm_gtf_mode(struct drm_device *dev, int hdisplay, int vdisplay, int vrefresh, bool interlaced, int margins) { return drm_gtf_mode_complex(dev, hdisplay, vdisplay, vrefresh, interlaced, margins, 600, 40 * 2, 128, 20 * 2); } EXPORT_SYMBOL(drm_gtf_mode); #ifdef CONFIG_VIDEOMODE_HELPERS /** * drm_display_mode_from_videomode - fill in @dmode using @vm, * @vm: videomode structure to use as source * @dmode: drm_display_mode structure to use as destination * * Fills out @dmode using the display mode specified in @vm. */ void drm_display_mode_from_videomode(const struct videomode *vm, struct drm_display_mode *dmode) { dmode->hdisplay = vm->hactive; dmode->hsync_start = dmode->hdisplay + vm->hfront_porch; dmode->hsync_end = dmode->hsync_start + vm->hsync_len; dmode->htotal = dmode->hsync_end + vm->hback_porch; dmode->vdisplay = vm->vactive; dmode->vsync_start = dmode->vdisplay + vm->vfront_porch; dmode->vsync_end = dmode->vsync_start + vm->vsync_len; dmode->vtotal = dmode->vsync_end + vm->vback_porch; dmode->clock = vm->pixelclock / 1000; dmode->flags = 0; if (vm->flags & DISPLAY_FLAGS_HSYNC_HIGH) dmode->flags |= DRM_MODE_FLAG_PHSYNC; else if (vm->flags & DISPLAY_FLAGS_HSYNC_LOW) dmode->flags |= DRM_MODE_FLAG_NHSYNC; if (vm->flags & DISPLAY_FLAGS_VSYNC_HIGH) dmode->flags |= DRM_MODE_FLAG_PVSYNC; else if (vm->flags & DISPLAY_FLAGS_VSYNC_LOW) dmode->flags |= DRM_MODE_FLAG_NVSYNC; if (vm->flags & DISPLAY_FLAGS_INTERLACED) dmode->flags |= DRM_MODE_FLAG_INTERLACE; if (vm->flags & DISPLAY_FLAGS_DOUBLESCAN) dmode->flags |= DRM_MODE_FLAG_DBLSCAN; if (vm->flags & DISPLAY_FLAGS_DOUBLECLK) dmode->flags |= DRM_MODE_FLAG_DBLCLK; drm_mode_set_name(dmode); } EXPORT_SYMBOL_GPL(drm_display_mode_from_videomode); /** * drm_display_mode_to_videomode - fill in @vm using @dmode, * @dmode: drm_display_mode structure to use as source * @vm: videomode structure to use as destination * * Fills out @vm using the display mode specified in @dmode. 
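 *
 * Example (illustrative sketch; @dmode is a mode the caller already holds)::
 *
 *     struct videomode vm;
 *
 *     drm_display_mode_to_videomode(dmode, &vm);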
*/ void drm_display_mode_to_videomode(const struct drm_display_mode *dmode, struct videomode *vm) { vm->hactive = dmode->hdisplay; vm->hfront_porch = dmode->hsync_start - dmode->hdisplay; vm->hsync_len = dmode->hsync_end - dmode->hsync_start; vm->hback_porch = dmode->htotal - dmode->hsync_end; vm->vactive = dmode->vdisplay; vm->vfront_porch = dmode->vsync_start - dmode->vdisplay; vm->vsync_len = dmode->vsync_end - dmode->vsync_start; vm->vback_porch = dmode->vtotal - dmode->vsync_end; vm->pixelclock = dmode->clock * 1000; vm->flags = 0; if (dmode->flags & DRM_MODE_FLAG_PHSYNC) vm->flags |= DISPLAY_FLAGS_HSYNC_HIGH; else if (dmode->flags & DRM_MODE_FLAG_NHSYNC) vm->flags |= DISPLAY_FLAGS_HSYNC_LOW; if (dmode->flags & DRM_MODE_FLAG_PVSYNC) vm->flags |= DISPLAY_FLAGS_VSYNC_HIGH; else if (dmode->flags & DRM_MODE_FLAG_NVSYNC) vm->flags |= DISPLAY_FLAGS_VSYNC_LOW; if (dmode->flags & DRM_MODE_FLAG_INTERLACE) vm->flags |= DISPLAY_FLAGS_INTERLACED; if (dmode->flags & DRM_MODE_FLAG_DBLSCAN) vm->flags |= DISPLAY_FLAGS_DOUBLESCAN; if (dmode->flags & DRM_MODE_FLAG_DBLCLK) vm->flags |= DISPLAY_FLAGS_DOUBLECLK; } EXPORT_SYMBOL_GPL(drm_display_mode_to_videomode); /** * drm_bus_flags_from_videomode - extract information about pixelclk and * DE polarity from videomode and store it in a separate variable * @vm: videomode structure to use * @bus_flags: information about pixelclk, sync and DE polarity will be stored * here * * Sets DRM_BUS_FLAG_DE_(LOW|HIGH), DRM_BUS_FLAG_PIXDATA_DRIVE_(POS|NEG)EDGE * and DISPLAY_FLAGS_SYNC_(POS|NEG)EDGE in @bus_flags according to DISPLAY_FLAGS * found in @vm */ void drm_bus_flags_from_videomode(const struct videomode *vm, u32 *bus_flags) { *bus_flags = 0; if (vm->flags & DISPLAY_FLAGS_PIXDATA_POSEDGE) *bus_flags |= DRM_BUS_FLAG_PIXDATA_DRIVE_POSEDGE; if (vm->flags & DISPLAY_FLAGS_PIXDATA_NEGEDGE) *bus_flags |= DRM_BUS_FLAG_PIXDATA_DRIVE_NEGEDGE; if (vm->flags & DISPLAY_FLAGS_SYNC_POSEDGE) *bus_flags |= DRM_BUS_FLAG_SYNC_DRIVE_POSEDGE; if (vm->flags & DISPLAY_FLAGS_SYNC_NEGEDGE) *bus_flags |= DRM_BUS_FLAG_SYNC_DRIVE_NEGEDGE; if (vm->flags & DISPLAY_FLAGS_DE_LOW) *bus_flags |= DRM_BUS_FLAG_DE_LOW; if (vm->flags & DISPLAY_FLAGS_DE_HIGH) *bus_flags |= DRM_BUS_FLAG_DE_HIGH; } EXPORT_SYMBOL_GPL(drm_bus_flags_from_videomode); #ifdef CONFIG_OF /** * of_get_drm_display_mode - get a drm_display_mode from devicetree * @np: device_node with the timing specification * @dmode: will be set to the return value * @bus_flags: information about pixelclk, sync and DE polarity * @index: index into the list of display timings in devicetree * * This function is expensive and should only be used, if only one mode is to be * read from DT. To get multiple modes start with of_get_display_timings and * work with that instead. * * Returns: * 0 on success, a negative errno code when no of videomode node was found. 
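 *
 * Example (illustrative sketch; "np" is a hypothetical device_node carrying
 * a display-timings node, read at index 0)::
 *
 *     struct drm_display_mode mode;
 *     u32 bus_flags;
 *     int ret;
 *
 *     ret = of_get_drm_display_mode(np, &mode, &bus_flags, 0);
 *     if (ret)
 *             return ret;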
*/ int of_get_drm_display_mode(struct device_node *np, struct drm_display_mode *dmode, u32 *bus_flags, int index) { struct videomode vm; int ret; ret = of_get_videomode(np, &vm, index); if (ret) return ret; drm_display_mode_from_videomode(&vm, dmode); if (bus_flags) drm_bus_flags_from_videomode(&vm, bus_flags); pr_debug("%pOF: got %dx%d display mode: " DRM_MODE_FMT "\n", np, vm.hactive, vm.vactive, DRM_MODE_ARG(dmode)); return 0; } EXPORT_SYMBOL_GPL(of_get_drm_display_mode); /** * of_get_drm_panel_display_mode - get a panel-timing drm_display_mode from devicetree * @np: device_node with the panel-timing specification * @dmode: will be set to the return value * @bus_flags: information about pixelclk, sync and DE polarity * * The mandatory Device Tree properties width-mm and height-mm * are read and set on the display mode. * * Returns: * Zero on success, negative error code on failure. */ int of_get_drm_panel_display_mode(struct device_node *np, struct drm_display_mode *dmode, u32 *bus_flags) { u32 width_mm = 0, height_mm = 0; struct display_timing timing; struct videomode vm; int ret; ret = of_get_display_timing(np, "panel-timing", &timing); if (ret) return ret; videomode_from_timing(&timing, &vm); memset(dmode, 0, sizeof(*dmode)); drm_display_mode_from_videomode(&vm, dmode); if (bus_flags) drm_bus_flags_from_videomode(&vm, bus_flags); ret = of_property_read_u32(np, "width-mm", &width_mm); if (ret) return ret; ret = of_property_read_u32(np, "height-mm", &height_mm); if (ret) return ret; dmode->width_mm = width_mm; dmode->height_mm = height_mm; pr_debug(DRM_MODE_FMT "\n", DRM_MODE_ARG(dmode)); return 0; } EXPORT_SYMBOL_GPL(of_get_drm_panel_display_mode); #endif /* CONFIG_OF */ #endif /* CONFIG_VIDEOMODE_HELPERS */ /** * drm_mode_set_name - set the name on a mode * @mode: name will be set in this mode * * Set the name of @mode to a standard format which is <hdisplay>x<vdisplay> * with an optional 'i' suffix for interlaced modes. */ void drm_mode_set_name(struct drm_display_mode *mode) { bool interlaced = !!(mode->flags & DRM_MODE_FLAG_INTERLACE); snprintf(mode->name, DRM_DISPLAY_MODE_LEN, "%dx%d%s", mode->hdisplay, mode->vdisplay, interlaced ? "i" : ""); } EXPORT_SYMBOL(drm_mode_set_name); /** * drm_mode_vrefresh - get the vrefresh of a mode * @mode: mode * * Returns: * @mode's vrefresh rate in Hz, rounded to the nearest integer. Calculates the * value first if it is not yet set. */ int drm_mode_vrefresh(const struct drm_display_mode *mode) { unsigned int num, den; if (mode->htotal == 0 || mode->vtotal == 0) return 0; num = mode->clock; den = mode->htotal * mode->vtotal; if (mode->flags & DRM_MODE_FLAG_INTERLACE) num *= 2; if (mode->flags & DRM_MODE_FLAG_DBLSCAN) den *= 2; if (mode->vscan > 1) den *= mode->vscan; return DIV_ROUND_CLOSEST_ULL(mul_u32_u32(num, 1000), den); } EXPORT_SYMBOL(drm_mode_vrefresh); /** * drm_mode_get_hv_timing - Fetches hdisplay/vdisplay for given mode * @mode: mode to query * @hdisplay: hdisplay value to fill in * @vdisplay: vdisplay value to fill in * * The vdisplay value will be doubled if the specified mode is a stereo mode of * the appropriate layout.
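 *
 * Example (illustrative sketch; @mode is the mode being checked against
 * framebuffer or plane dimensions)::
 *
 *     int hdisplay, vdisplay;
 *
 *     drm_mode_get_hv_timing(mode, &hdisplay, &vdisplay);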
*/ void drm_mode_get_hv_timing(const struct drm_display_mode *mode, int *hdisplay, int *vdisplay) { struct drm_display_mode adjusted; drm_mode_init(&adjusted, mode); drm_mode_set_crtcinfo(&adjusted, CRTC_STEREO_DOUBLE_ONLY); *hdisplay = adjusted.crtc_hdisplay; *vdisplay = adjusted.crtc_vdisplay; } EXPORT_SYMBOL(drm_mode_get_hv_timing); /** * drm_mode_set_crtcinfo - set CRTC modesetting timing parameters * @p: mode * @adjust_flags: a combination of adjustment flags * * Setup the CRTC modesetting timing parameters for @p, adjusting if necessary. * * - The CRTC_INTERLACE_HALVE_V flag can be used to halve vertical timings of * interlaced modes. * - The CRTC_STEREO_DOUBLE flag can be used to compute the timings for * buffers containing two eyes (only adjust the timings when needed, eg. for * "frame packing" or "side by side full"). * - The CRTC_NO_DBLSCAN and CRTC_NO_VSCAN flags request that adjustment *not* * be performed for doublescan and vscan > 1 modes respectively. */ void drm_mode_set_crtcinfo(struct drm_display_mode *p, int adjust_flags) { if (!p) return; p->crtc_clock = p->clock; p->crtc_hdisplay = p->hdisplay; p->crtc_hsync_start = p->hsync_start; p->crtc_hsync_end = p->hsync_end; p->crtc_htotal = p->htotal; p->crtc_hskew = p->hskew; p->crtc_vdisplay = p->vdisplay; p->crtc_vsync_start = p->vsync_start; p->crtc_vsync_end = p->vsync_end; p->crtc_vtotal = p->vtotal; if (p->flags & DRM_MODE_FLAG_INTERLACE) { if (adjust_flags & CRTC_INTERLACE_HALVE_V) { p->crtc_vdisplay /= 2; p->crtc_vsync_start /= 2; p->crtc_vsync_end /= 2; p->crtc_vtotal /= 2; } } if (!(adjust_flags & CRTC_NO_DBLSCAN)) { if (p->flags & DRM_MODE_FLAG_DBLSCAN) { p->crtc_vdisplay *= 2; p->crtc_vsync_start *= 2; p->crtc_vsync_end *= 2; p->crtc_vtotal *= 2; } } if (!(adjust_flags & CRTC_NO_VSCAN)) { if (p->vscan > 1) { p->crtc_vdisplay *= p->vscan; p->crtc_vsync_start *= p->vscan; p->crtc_vsync_end *= p->vscan; p->crtc_vtotal *= p->vscan; } } if (adjust_flags & CRTC_STEREO_DOUBLE) { unsigned int layout = p->flags & DRM_MODE_FLAG_3D_MASK; switch (layout) { case DRM_MODE_FLAG_3D_FRAME_PACKING: p->crtc_clock *= 2; p->crtc_vdisplay += p->crtc_vtotal; p->crtc_vsync_start += p->crtc_vtotal; p->crtc_vsync_end += p->crtc_vtotal; p->crtc_vtotal += p->crtc_vtotal; break; } } p->crtc_vblank_start = min(p->crtc_vsync_start, p->crtc_vdisplay); p->crtc_vblank_end = max(p->crtc_vsync_end, p->crtc_vtotal); p->crtc_hblank_start = min(p->crtc_hsync_start, p->crtc_hdisplay); p->crtc_hblank_end = max(p->crtc_hsync_end, p->crtc_htotal); } EXPORT_SYMBOL(drm_mode_set_crtcinfo); /** * drm_mode_copy - copy the mode * @dst: mode to overwrite * @src: mode to copy * * Copy an existing mode into another mode, preserving the * list head of the destination mode. */ void drm_mode_copy(struct drm_display_mode *dst, const struct drm_display_mode *src) { struct list_head head = dst->head; *dst = *src; dst->head = head; } EXPORT_SYMBOL(drm_mode_copy); /** * drm_mode_init - initialize the mode from another mode * @dst: mode to overwrite * @src: mode to copy * * Copy an existing mode into another mode, zeroing the * list head of the destination mode. Typically used * to guarantee the list head is not left with stack * garbage in on-stack modes. 
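 *
 * Example (illustrative sketch; @src is an existing mode copied into an
 * on-stack temporary)::
 *
 *     struct drm_display_mode tmp;
 *
 *     drm_mode_init(&tmp, src);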
*/ void drm_mode_init(struct drm_display_mode *dst, const struct drm_display_mode *src) { memset(dst, 0, sizeof(*dst)); drm_mode_copy(dst, src); } EXPORT_SYMBOL(drm_mode_init); /** * drm_mode_duplicate - allocate and duplicate an existing mode * @dev: drm_device to allocate the duplicated mode for * @mode: mode to duplicate * * Just allocate a new mode, copy the existing mode into it, and return * a pointer to it. Used to create new instances of established modes. * * Returns: * Pointer to duplicated mode on success, NULL on error. */ struct drm_display_mode *drm_mode_duplicate(struct drm_device *dev, const struct drm_display_mode *mode) { struct drm_display_mode *nmode; nmode = drm_mode_create(dev); if (!nmode) return NULL; drm_mode_copy(nmode, mode); return nmode; } EXPORT_SYMBOL(drm_mode_duplicate); static bool drm_mode_match_timings(const struct drm_display_mode *mode1, const struct drm_display_mode *mode2) { return mode1->hdisplay == mode2->hdisplay && mode1->hsync_start == mode2->hsync_start && mode1->hsync_end == mode2->hsync_end && mode1->htotal == mode2->htotal && mode1->hskew == mode2->hskew && mode1->vdisplay == mode2->vdisplay && mode1->vsync_start == mode2->vsync_start && mode1->vsync_end == mode2->vsync_end && mode1->vtotal == mode2->vtotal && mode1->vscan == mode2->vscan; } static bool drm_mode_match_clock(const struct drm_display_mode *mode1, const struct drm_display_mode *mode2) { /* * do clock check convert to PICOS * so fb modes get matched the same */ if (mode1->clock && mode2->clock) return KHZ2PICOS(mode1->clock) == KHZ2PICOS(mode2->clock); else return mode1->clock == mode2->clock; } static bool drm_mode_match_flags(const struct drm_display_mode *mode1, const struct drm_display_mode *mode2) { return (mode1->flags & ~DRM_MODE_FLAG_3D_MASK) == (mode2->flags & ~DRM_MODE_FLAG_3D_MASK); } static bool drm_mode_match_3d_flags(const struct drm_display_mode *mode1, const struct drm_display_mode *mode2) { return (mode1->flags & DRM_MODE_FLAG_3D_MASK) == (mode2->flags & DRM_MODE_FLAG_3D_MASK); } static bool drm_mode_match_aspect_ratio(const struct drm_display_mode *mode1, const struct drm_display_mode *mode2) { return mode1->picture_aspect_ratio == mode2->picture_aspect_ratio; } /** * drm_mode_match - test modes for (partial) equality * @mode1: first mode * @mode2: second mode * @match_flags: which parts need to match (DRM_MODE_MATCH_*) * * Check to see if @mode1 and @mode2 are equivalent. * * Returns: * True if the modes are (partially) equal, false otherwise. */ bool drm_mode_match(const struct drm_display_mode *mode1, const struct drm_display_mode *mode2, unsigned int match_flags) { if (!mode1 && !mode2) return true; if (!mode1 || !mode2) return false; if (match_flags & DRM_MODE_MATCH_TIMINGS && !drm_mode_match_timings(mode1, mode2)) return false; if (match_flags & DRM_MODE_MATCH_CLOCK && !drm_mode_match_clock(mode1, mode2)) return false; if (match_flags & DRM_MODE_MATCH_FLAGS && !drm_mode_match_flags(mode1, mode2)) return false; if (match_flags & DRM_MODE_MATCH_3D_FLAGS && !drm_mode_match_3d_flags(mode1, mode2)) return false; if (match_flags & DRM_MODE_MATCH_ASPECT_RATIO && !drm_mode_match_aspect_ratio(mode1, mode2)) return false; return true; } EXPORT_SYMBOL(drm_mode_match); /** * drm_mode_equal - test modes for equality * @mode1: first mode * @mode2: second mode * * Check to see if @mode1 and @mode2 are equivalent. * * Returns: * True if the modes are equal, false otherwise. 
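 *
 * Example (illustrative sketch; @a and @b are two modes the caller already
 * has)::
 *
 *     if (drm_mode_equal(a, b))
 *             return true;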
*/ bool drm_mode_equal(const struct drm_display_mode *mode1, const struct drm_display_mode *mode2) { return drm_mode_match(mode1, mode2, DRM_MODE_MATCH_TIMINGS | DRM_MODE_MATCH_CLOCK | DRM_MODE_MATCH_FLAGS | DRM_MODE_MATCH_3D_FLAGS| DRM_MODE_MATCH_ASPECT_RATIO); } EXPORT_SYMBOL(drm_mode_equal); /** * drm_mode_equal_no_clocks - test modes for equality * @mode1: first mode * @mode2: second mode * * Check to see if @mode1 and @mode2 are equivalent, but * don't check the pixel clocks. * * Returns: * True if the modes are equal, false otherwise. */ bool drm_mode_equal_no_clocks(const struct drm_display_mode *mode1, const struct drm_display_mode *mode2) { return drm_mode_match(mode1, mode2, DRM_MODE_MATCH_TIMINGS | DRM_MODE_MATCH_FLAGS | DRM_MODE_MATCH_3D_FLAGS); } EXPORT_SYMBOL(drm_mode_equal_no_clocks); /** * drm_mode_equal_no_clocks_no_stereo - test modes for equality * @mode1: first mode * @mode2: second mode * * Check to see if @mode1 and @mode2 are equivalent, but * don't check the pixel clocks nor the stereo layout. * * Returns: * True if the modes are equal, false otherwise. */ bool drm_mode_equal_no_clocks_no_stereo(const struct drm_display_mode *mode1, const struct drm_display_mode *mode2) { return drm_mode_match(mode1, mode2, DRM_MODE_MATCH_TIMINGS | DRM_MODE_MATCH_FLAGS); } EXPORT_SYMBOL(drm_mode_equal_no_clocks_no_stereo); static enum drm_mode_status drm_mode_validate_basic(const struct drm_display_mode *mode) { if (mode->type & ~DRM_MODE_TYPE_ALL) return MODE_BAD; if (mode->flags & ~DRM_MODE_FLAG_ALL) return MODE_BAD; if ((mode->flags & DRM_MODE_FLAG_3D_MASK) > DRM_MODE_FLAG_3D_MAX) return MODE_BAD; if (mode->clock == 0) return MODE_CLOCK_LOW; if (mode->hdisplay == 0 || mode->hsync_start < mode->hdisplay || mode->hsync_end < mode->hsync_start || mode->htotal < mode->hsync_end) return MODE_H_ILLEGAL; if (mode->vdisplay == 0 || mode->vsync_start < mode->vdisplay || mode->vsync_end < mode->vsync_start || mode->vtotal < mode->vsync_end) return MODE_V_ILLEGAL; return MODE_OK; } /** * drm_mode_validate_driver - make sure the mode is somewhat sane * @dev: drm device * @mode: mode to check * * First do basic validation on the mode, and then allow the driver * to check for device/driver specific limitations via the optional * &drm_mode_config_helper_funcs.mode_valid hook. * * Returns: * The mode status */ enum drm_mode_status drm_mode_validate_driver(struct drm_device *dev, const struct drm_display_mode *mode) { enum drm_mode_status status; status = drm_mode_validate_basic(mode); if (status != MODE_OK) return status; if (dev->mode_config.funcs->mode_valid) return dev->mode_config.funcs->mode_valid(dev, mode); else return MODE_OK; } EXPORT_SYMBOL(drm_mode_validate_driver); /** * drm_mode_validate_size - make sure modes adhere to size constraints * @mode: mode to check * @maxX: maximum width * @maxY: maximum height * * This function is a helper which can be used to validate modes against size * limitations of the DRM device/connector. If a mode is too big its status * member is updated with the appropriate validation failure code. The list * itself is not changed. 
* * Returns: * The mode status */ enum drm_mode_status drm_mode_validate_size(const struct drm_display_mode *mode, int maxX, int maxY) { if (maxX > 0 && mode->hdisplay > maxX) return MODE_VIRTUAL_X; if (maxY > 0 && mode->vdisplay > maxY) return MODE_VIRTUAL_Y; return MODE_OK; } EXPORT_SYMBOL(drm_mode_validate_size); /** * drm_mode_validate_ycbcr420 - add 'ycbcr420-only' modes only when allowed * @mode: mode to check * @connector: drm connector under action * * This function is a helper which can be used to filter out any YCBCR420 * only mode, when the source doesn't support it. * * Returns: * The mode status */ enum drm_mode_status drm_mode_validate_ycbcr420(const struct drm_display_mode *mode, struct drm_connector *connector) { if (!connector->ycbcr_420_allowed && drm_mode_is_420_only(&connector->display_info, mode)) return MODE_NO_420; return MODE_OK; } EXPORT_SYMBOL(drm_mode_validate_ycbcr420); #define MODE_STATUS(status) [MODE_ ## status + 3] = #status static const char * const drm_mode_status_names[] = { MODE_STATUS(OK), MODE_STATUS(HSYNC), MODE_STATUS(VSYNC), MODE_STATUS(H_ILLEGAL), MODE_STATUS(V_ILLEGAL), MODE_STATUS(BAD_WIDTH), MODE_STATUS(NOMODE), MODE_STATUS(NO_INTERLACE), MODE_STATUS(NO_DBLESCAN), MODE_STATUS(NO_VSCAN), MODE_STATUS(MEM), MODE_STATUS(VIRTUAL_X), MODE_STATUS(VIRTUAL_Y), MODE_STATUS(MEM_VIRT), MODE_STATUS(NOCLOCK), MODE_STATUS(CLOCK_HIGH), MODE_STATUS(CLOCK_LOW), MODE_STATUS(CLOCK_RANGE), MODE_STATUS(BAD_HVALUE), MODE_STATUS(BAD_VVALUE), MODE_STATUS(BAD_VSCAN), MODE_STATUS(HSYNC_NARROW), MODE_STATUS(HSYNC_WIDE), MODE_STATUS(HBLANK_NARROW), MODE_STATUS(HBLANK_WIDE), MODE_STATUS(VSYNC_NARROW), MODE_STATUS(VSYNC_WIDE), MODE_STATUS(VBLANK_NARROW), MODE_STATUS(VBLANK_WIDE), MODE_STATUS(PANEL), MODE_STATUS(INTERLACE_WIDTH), MODE_STATUS(ONE_WIDTH), MODE_STATUS(ONE_HEIGHT), MODE_STATUS(ONE_SIZE), MODE_STATUS(NO_REDUCED), MODE_STATUS(NO_STEREO), MODE_STATUS(NO_420), MODE_STATUS(STALE), MODE_STATUS(BAD), MODE_STATUS(ERROR), }; #undef MODE_STATUS const char *drm_get_mode_status_name(enum drm_mode_status status) { int index = status + 3; if (WARN_ON(index < 0 || index >= ARRAY_SIZE(drm_mode_status_names))) return ""; return drm_mode_status_names[index]; } /** * drm_mode_prune_invalid - remove invalid modes from mode list * @dev: DRM device * @mode_list: list of modes to check * @verbose: be verbose about it * * This helper function can be used to prune a display mode list after * validation has been completed. All modes whose status is not MODE_OK will be * removed from the list, and if @verbose the status code and mode name is also * printed to dmesg. */ void drm_mode_prune_invalid(struct drm_device *dev, struct list_head *mode_list, bool verbose) { struct drm_display_mode *mode, *t; list_for_each_entry_safe(mode, t, mode_list, head) { if (mode->status != MODE_OK) { list_del(&mode->head); if (mode->type & DRM_MODE_TYPE_USERDEF) { drm_warn(dev, "User-defined mode not supported: " DRM_MODE_FMT "\n", DRM_MODE_ARG(mode)); } if (verbose) { drm_dbg_kms(dev, "Rejected mode: " DRM_MODE_FMT " (%s)\n", DRM_MODE_ARG(mode), drm_get_mode_status_name(mode->status)); } drm_mode_destroy(dev, mode); } } } EXPORT_SYMBOL(drm_mode_prune_invalid); /** * drm_mode_compare - compare modes for favorability * @priv: unused * @lh_a: list_head for first mode * @lh_b: list_head for second mode * * Compare two modes, given by @lh_a and @lh_b, returning a value indicating * which is better. 
* * Returns: * Negative if @lh_a is better than @lh_b, zero if they're equivalent, or * positive if @lh_b is better than @lh_a. */ static int drm_mode_compare(void *priv, const struct list_head *lh_a, const struct list_head *lh_b) { struct drm_display_mode *a = list_entry(lh_a, struct drm_display_mode, head); struct drm_display_mode *b = list_entry(lh_b, struct drm_display_mode, head); int diff; diff = ((b->type & DRM_MODE_TYPE_PREFERRED) != 0) - ((a->type & DRM_MODE_TYPE_PREFERRED) != 0); if (diff) return diff; diff = b->hdisplay * b->vdisplay - a->hdisplay * a->vdisplay; if (diff) return diff; diff = drm_mode_vrefresh(b) - drm_mode_vrefresh(a); if (diff) return diff; diff = b->clock - a->clock; return diff; } /** * drm_mode_sort - sort mode list * @mode_list: list of drm_display_mode structures to sort * * Sort @mode_list by favorability, moving good modes to the head of the list. */ void drm_mode_sort(struct list_head *mode_list) { list_sort(NULL, mode_list, drm_mode_compare); } EXPORT_SYMBOL(drm_mode_sort); /** * drm_connector_list_update - update the mode list for the connector * @connector: the connector to update * * This moves the modes from the @connector probed_modes list * to the actual mode list. It compares the probed mode against the current * list and only adds different/new modes. * * This is just a helper function; it doesn't validate any modes itself and also * doesn't prune any invalid modes. Callers need to do that themselves. */ void drm_connector_list_update(struct drm_connector *connector) { struct drm_display_mode *pmode, *pt; WARN_ON(!mutex_is_locked(&connector->dev->mode_config.mutex)); list_for_each_entry_safe(pmode, pt, &connector->probed_modes, head) { struct drm_display_mode *mode; bool found_it = false; /* go through current modes checking for the new probed mode */ list_for_each_entry(mode, &connector->modes, head) { if (!drm_mode_equal(pmode, mode)) continue; found_it = true; /* * If the old matching mode is stale (ie. left over * from a previous probe) just replace it outright. * Otherwise just merge the type bits between all * equal probed modes. * * If two probed modes are considered equal, pick the * actual timings from the one that's marked as * preferred (in case the match isn't 100%). If * multiple or zero preferred modes are present, favor * the mode added to the probed_modes list first.
*/ if (mode->status == MODE_STALE) { drm_mode_copy(mode, pmode); } else if ((mode->type & DRM_MODE_TYPE_PREFERRED) == 0 && (pmode->type & DRM_MODE_TYPE_PREFERRED) != 0) { pmode->type |= mode->type; drm_mode_copy(mode, pmode); } else { mode->type |= pmode->type; } list_del(&pmode->head); drm_mode_destroy(connector->dev, pmode); break; } if (!found_it) { list_move_tail(&pmode->head, &connector->modes); } } } EXPORT_SYMBOL(drm_connector_list_update); static int drm_mode_parse_cmdline_bpp(const char *str, char **end_ptr, struct drm_cmdline_mode *mode) { unsigned int bpp; if (str[0] != '-') return -EINVAL; str++; bpp = simple_strtol(str, end_ptr, 10); if (*end_ptr == str) return -EINVAL; mode->bpp = bpp; mode->bpp_specified = true; return 0; } static int drm_mode_parse_cmdline_refresh(const char *str, char **end_ptr, struct drm_cmdline_mode *mode) { unsigned int refresh; if (str[0] != '@') return -EINVAL; str++; refresh = simple_strtol(str, end_ptr, 10); if (*end_ptr == str) return -EINVAL; mode->refresh = refresh; mode->refresh_specified = true; return 0; } static int drm_mode_parse_cmdline_extra(const char *str, int length, bool freestanding, const struct drm_connector *connector, struct drm_cmdline_mode *mode) { int i; for (i = 0; i < length; i++) { switch (str[i]) { case 'i': if (freestanding) return -EINVAL; mode->interlace = true; break; case 'm': if (freestanding) return -EINVAL; mode->margins = true; break; case 'D': if (mode->force != DRM_FORCE_UNSPECIFIED) return -EINVAL; if ((connector->connector_type != DRM_MODE_CONNECTOR_DVII) && (connector->connector_type != DRM_MODE_CONNECTOR_HDMIB)) mode->force = DRM_FORCE_ON; else mode->force = DRM_FORCE_ON_DIGITAL; break; case 'd': if (mode->force != DRM_FORCE_UNSPECIFIED) return -EINVAL; mode->force = DRM_FORCE_OFF; break; case 'e': if (mode->force != DRM_FORCE_UNSPECIFIED) return -EINVAL; mode->force = DRM_FORCE_ON; break; default: return -EINVAL; } } return 0; } static int drm_mode_parse_cmdline_res_mode(const char *str, unsigned int length, bool extras, const struct drm_connector *connector, struct drm_cmdline_mode *mode) { const char *str_start = str; bool rb = false, cvt = false; int xres = 0, yres = 0; int remaining, i; char *end_ptr; xres = simple_strtol(str, &end_ptr, 10); if (end_ptr == str) return -EINVAL; if (end_ptr[0] != 'x') return -EINVAL; end_ptr++; str = end_ptr; yres = simple_strtol(str, &end_ptr, 10); if (end_ptr == str) return -EINVAL; remaining = length - (end_ptr - str_start); if (remaining < 0) return -EINVAL; for (i = 0; i < remaining; i++) { switch (end_ptr[i]) { case 'M': cvt = true; break; case 'R': rb = true; break; default: /* * Try to pass that to our extras parsing * function to handle the case where the * extras are directly after the resolution */ if (extras) { int ret = drm_mode_parse_cmdline_extra(end_ptr + i, 1, false, connector, mode); if (ret) return ret; } else { return -EINVAL; } } } mode->xres = xres; mode->yres = yres; mode->cvt = cvt; mode->rb = rb; return 0; } static int drm_mode_parse_cmdline_int(const char *delim, unsigned int *int_ret) { const char *value; char *endp; /* * delim must point to the '=', otherwise it is a syntax error and * if delim points to the terminating zero, then delim + 1 will point * past the end of the string. 
*/ if (*delim != '=') return -EINVAL; value = delim + 1; *int_ret = simple_strtol(value, &endp, 10); /* Make sure we have parsed something */ if (endp == value) return -EINVAL; return 0; } static int drm_mode_parse_panel_orientation(const char *delim, struct drm_cmdline_mode *mode) { const char *value; if (*delim != '=') return -EINVAL; value = delim + 1; delim = strchr(value, ','); if (!delim) delim = value + strlen(value); if (!strncmp(value, "normal", delim - value)) mode->panel_orientation = DRM_MODE_PANEL_ORIENTATION_NORMAL; else if (!strncmp(value, "upside_down", delim - value)) mode->panel_orientation = DRM_MODE_PANEL_ORIENTATION_BOTTOM_UP; else if (!strncmp(value, "left_side_up", delim - value)) mode->panel_orientation = DRM_MODE_PANEL_ORIENTATION_LEFT_UP; else if (!strncmp(value, "right_side_up", delim - value)) mode->panel_orientation = DRM_MODE_PANEL_ORIENTATION_RIGHT_UP; else return -EINVAL; return 0; } static int drm_mode_parse_tv_mode(const char *delim, struct drm_cmdline_mode *mode) { const char *value; int ret; if (*delim != '=') return -EINVAL; value = delim + 1; delim = strchr(value, ','); if (!delim) delim = value + strlen(value); ret = drm_get_tv_mode_from_name(value, delim - value); if (ret < 0) return ret; mode->tv_mode_specified = true; mode->tv_mode = ret; return 0; } static int drm_mode_parse_cmdline_options(const char *str, bool freestanding, const struct drm_connector *connector, struct drm_cmdline_mode *mode) { unsigned int deg, margin, rotation = 0; const char *delim, *option, *sep; option = str; do { delim = strchr(option, '='); if (!delim) { delim = strchr(option, ','); if (!delim) delim = option + strlen(option); } if (!strncmp(option, "rotate", delim - option)) { if (drm_mode_parse_cmdline_int(delim, &deg)) return -EINVAL; switch (deg) { case 0: rotation |= DRM_MODE_ROTATE_0; break; case 90: rotation |= DRM_MODE_ROTATE_90; break; case 180: rotation |= DRM_MODE_ROTATE_180; break; case 270: rotation |= DRM_MODE_ROTATE_270; break; default: return -EINVAL; } } else if (!strncmp(option, "reflect_x", delim - option)) { rotation |= DRM_MODE_REFLECT_X; } else if (!strncmp(option, "reflect_y", delim - option)) { rotation |= DRM_MODE_REFLECT_Y; } else if (!strncmp(option, "margin_right", delim - option)) { if (drm_mode_parse_cmdline_int(delim, &margin)) return -EINVAL; mode->tv_margins.right = margin; } else if (!strncmp(option, "margin_left", delim - option)) { if (drm_mode_parse_cmdline_int(delim, &margin)) return -EINVAL; mode->tv_margins.left = margin; } else if (!strncmp(option, "margin_top", delim - option)) { if (drm_mode_parse_cmdline_int(delim, &margin)) return -EINVAL; mode->tv_margins.top = margin; } else if (!strncmp(option, "margin_bottom", delim - option)) { if (drm_mode_parse_cmdline_int(delim, &margin)) return -EINVAL; mode->tv_margins.bottom = margin; } else if (!strncmp(option, "panel_orientation", delim - option)) { if (drm_mode_parse_panel_orientation(delim, mode)) return -EINVAL; } else if (!strncmp(option, "tv_mode", delim - option)) { if (drm_mode_parse_tv_mode(delim, mode)) return -EINVAL; } else { return -EINVAL; } sep = strchr(delim, ','); option = sep + 1; } while (sep); if (rotation && freestanding) return -EINVAL; if (!(rotation & DRM_MODE_ROTATE_MASK)) rotation |= DRM_MODE_ROTATE_0; /* Make sure there is exactly one rotation defined */ if (!is_power_of_2(rotation & DRM_MODE_ROTATE_MASK)) return -EINVAL; mode->rotation_reflection = rotation; return 0; } struct drm_named_mode { const char *name; unsigned int pixel_clock_khz; unsigned int
xres; unsigned int yres; unsigned int flags; unsigned int tv_mode; }; #define NAMED_MODE(_name, _pclk, _x, _y, _flags, _mode) \ { \ .name = _name, \ .pixel_clock_khz = _pclk, \ .xres = _x, \ .yres = _y, \ .flags = _flags, \ .tv_mode = _mode, \ } static const struct drm_named_mode drm_named_modes[] = { NAMED_MODE("NTSC", 13500, 720, 480, DRM_MODE_FLAG_INTERLACE, DRM_MODE_TV_MODE_NTSC), NAMED_MODE("NTSC-J", 13500, 720, 480, DRM_MODE_FLAG_INTERLACE, DRM_MODE_TV_MODE_NTSC_J), NAMED_MODE("PAL", 13500, 720, 576, DRM_MODE_FLAG_INTERLACE, DRM_MODE_TV_MODE_PAL), NAMED_MODE("PAL-M", 13500, 720, 480, DRM_MODE_FLAG_INTERLACE, DRM_MODE_TV_MODE_PAL_M), }; static int drm_mode_parse_cmdline_named_mode(const char *name, unsigned int name_end, struct drm_cmdline_mode *cmdline_mode) { unsigned int i; if (!name_end) return 0; /* If the name starts with a digit, it's not a named mode */ if (isdigit(name[0])) return 0; /* * If there's an equal sign in the name, the command-line * contains only an option and no mode. */ if (strnchr(name, name_end, '=')) return 0; /* The connection status extras can be set without a mode. */ if (name_end == 1 && (name[0] == 'd' || name[0] == 'D' || name[0] == 'e')) return 0; /* * We're sure we're a named mode at this point, iterate over the * list of modes we're aware of. */ for (i = 0; i < ARRAY_SIZE(drm_named_modes); i++) { const struct drm_named_mode *mode = &drm_named_modes[i]; int ret; ret = str_has_prefix(name, mode->name); if (ret != name_end) continue; strscpy(cmdline_mode->name, mode->name, sizeof(cmdline_mode->name)); cmdline_mode->pixel_clock = mode->pixel_clock_khz; cmdline_mode->xres = mode->xres; cmdline_mode->yres = mode->yres; cmdline_mode->interlace = !!(mode->flags & DRM_MODE_FLAG_INTERLACE); cmdline_mode->tv_mode = mode->tv_mode; cmdline_mode->tv_mode_specified = true; cmdline_mode->specified = true; return 1; } return -EINVAL; } /** * drm_mode_parse_command_line_for_connector - parse command line modeline for connector * @mode_option: optional per connector mode option * @connector: connector to parse modeline for * @mode: preallocated drm_cmdline_mode structure to fill out * * This parses @mode_option command line modeline for modes and options to * configure the connector. * * This uses the same parameters as the fb modedb.c, except for an extra * force-enable, force-enable-digital and force-disable bit at the end:: * * <xres>x<yres>[M][R][-<bpp>][@<refresh>][i][m][eDd] * * Additional options can be provided following the mode, using a comma to * separate each option. Valid options can be found in * Documentation/fb/modedb.rst. * * The intermediate drm_cmdline_mode structure is required to store additional * options from the command line modeline like the force-enable/disable flag. * * Returns: * True if a valid modeline has been parsed, false otherwise.
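 *
 * Example (illustrative; the connector name and timings are arbitrary). A
 * kernel command line entry such as::
 *
 *     video=HDMI-A-1:1920x1080@60me
 *
 * reaches this parser as the string "1920x1080@60me" and yields a 1920x1080
 * mode at 60 Hz with margins and force-enable set::
 *
 *     struct drm_cmdline_mode cmdline;
 *
 *     if (drm_mode_parse_command_line_for_connector("1920x1080@60me",
 *                                                    connector, &cmdline))
 *             pr_debug("parsed %dx%d@%d\n", cmdline.xres, cmdline.yres,
 *                      cmdline.refresh);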
*/ bool drm_mode_parse_command_line_for_connector(const char *mode_option, const struct drm_connector *connector, struct drm_cmdline_mode *mode) { const char *name; bool freestanding = false, parse_extras = false; unsigned int bpp_off = 0, refresh_off = 0, options_off = 0; unsigned int mode_end = 0; const char *bpp_ptr = NULL, *refresh_ptr = NULL, *extra_ptr = NULL; const char *options_ptr = NULL; char *bpp_end_ptr = NULL, *refresh_end_ptr = NULL; int len, ret; memset(mode, 0, sizeof(*mode)); mode->panel_orientation = DRM_MODE_PANEL_ORIENTATION_UNKNOWN; if (!mode_option) return false; name = mode_option; /* Locate the start of named options */ options_ptr = strchr(name, ','); if (options_ptr) options_off = options_ptr - name; else options_off = strlen(name); /* Try to locate the bpp and refresh specifiers, if any */ bpp_ptr = strnchr(name, options_off, '-'); while (bpp_ptr && !isdigit(bpp_ptr[1])) bpp_ptr = strnchr(bpp_ptr + 1, options_off, '-'); if (bpp_ptr) bpp_off = bpp_ptr - name; refresh_ptr = strnchr(name, options_off, '@'); if (refresh_ptr) refresh_off = refresh_ptr - name; /* Locate the end of the name / resolution, and parse it */ if (bpp_ptr) { mode_end = bpp_off; } else if (refresh_ptr) { mode_end = refresh_off; } else if (options_ptr) { mode_end = options_off; parse_extras = true; } else { mode_end = strlen(name); parse_extras = true; } if (!mode_end) return false; ret = drm_mode_parse_cmdline_named_mode(name, mode_end, mode); if (ret < 0) return false; /* * Having a mode that starts by a letter (and thus is named) and * an at-sign (used to specify a refresh rate) is disallowed. */ if (ret && refresh_ptr) return false; /* No named mode? Check for a normal mode argument, e.g. 1024x768 */ if (!mode->specified && isdigit(name[0])) { ret = drm_mode_parse_cmdline_res_mode(name, mode_end, parse_extras, connector, mode); if (ret) return false; mode->specified = true; } /* No mode? 
Check for freestanding extras and/or options */ if (!mode->specified) { unsigned int len = strlen(mode_option); if (bpp_ptr || refresh_ptr) return false; /* syntax error */ if (len == 1 || (len >= 2 && mode_option[1] == ',')) extra_ptr = mode_option; else options_ptr = mode_option - 1; freestanding = true; } if (bpp_ptr) { ret = drm_mode_parse_cmdline_bpp(bpp_ptr, &bpp_end_ptr, mode); if (ret) return false; mode->bpp_specified = true; } if (refresh_ptr) { ret = drm_mode_parse_cmdline_refresh(refresh_ptr, &refresh_end_ptr, mode); if (ret) return false; mode->refresh_specified = true; } /* * Locate the end of the bpp / refresh, and parse the extras * if relevant */ if (bpp_ptr && refresh_ptr) extra_ptr = max(bpp_end_ptr, refresh_end_ptr); else if (bpp_ptr) extra_ptr = bpp_end_ptr; else if (refresh_ptr) extra_ptr = refresh_end_ptr; if (extra_ptr) { if (options_ptr) len = options_ptr - extra_ptr; else len = strlen(extra_ptr); ret = drm_mode_parse_cmdline_extra(extra_ptr, len, freestanding, connector, mode); if (ret) return false; } if (options_ptr) { ret = drm_mode_parse_cmdline_options(options_ptr + 1, freestanding, connector, mode); if (ret) return false; } return true; } EXPORT_SYMBOL(drm_mode_parse_command_line_for_connector); static struct drm_display_mode *drm_named_mode(struct drm_device *dev, struct drm_cmdline_mode *cmd) { unsigned int i; for (i = 0; i < ARRAY_SIZE(drm_named_modes); i++) { const struct drm_named_mode *named_mode = &drm_named_modes[i]; if (strcmp(cmd->name, named_mode->name)) continue; if (!cmd->tv_mode_specified) continue; return drm_analog_tv_mode(dev, named_mode->tv_mode, named_mode->pixel_clock_khz * 1000, named_mode->xres, named_mode->yres, named_mode->flags & DRM_MODE_FLAG_INTERLACE); } return NULL; } /** * drm_mode_create_from_cmdline_mode - convert a command line modeline into a DRM display mode * @dev: DRM device to create the new mode for * @cmd: input command line modeline * * Returns: * Pointer to converted mode on success, NULL on error. */ struct drm_display_mode * drm_mode_create_from_cmdline_mode(struct drm_device *dev, struct drm_cmdline_mode *cmd) { struct drm_display_mode *mode; if (cmd->xres == 0 || cmd->yres == 0) return NULL; if (strlen(cmd->name)) mode = drm_named_mode(dev, cmd); else if (cmd->cvt) mode = drm_cvt_mode(dev, cmd->xres, cmd->yres, cmd->refresh_specified ? cmd->refresh : 60, cmd->rb, cmd->interlace, cmd->margins); else mode = drm_gtf_mode(dev, cmd->xres, cmd->yres, cmd->refresh_specified ? cmd->refresh : 60, cmd->interlace, cmd->margins); if (!mode) return NULL; mode->type |= DRM_MODE_TYPE_USERDEF; /* fix up 1368x768: GTF/CVT can't express 1366 width due to alignment */ if (cmd->xres == 1366) drm_mode_fixup_1366x768(mode); drm_mode_set_crtcinfo(mode, CRTC_INTERLACE_HALVE_V); return mode; } EXPORT_SYMBOL(drm_mode_create_from_cmdline_mode); /** * drm_mode_convert_to_umode - convert a drm_display_mode into a modeinfo * @out: drm_mode_modeinfo struct to return to the user * @in: drm_display_mode to use * * Convert a drm_display_mode into a drm_mode_modeinfo structure to return to * the user.
*/ void drm_mode_convert_to_umode(struct drm_mode_modeinfo *out, const struct drm_display_mode *in) { out->clock = in->clock; out->hdisplay = in->hdisplay; out->hsync_start = in->hsync_start; out->hsync_end = in->hsync_end; out->htotal = in->htotal; out->hskew = in->hskew; out->vdisplay = in->vdisplay; out->vsync_start = in->vsync_start; out->vsync_end = in->vsync_end; out->vtotal = in->vtotal; out->vscan = in->vscan; out->vrefresh = drm_mode_vrefresh(in); out->flags = in->flags; out->type = in->type; switch (in->picture_aspect_ratio) { case HDMI_PICTURE_ASPECT_4_3: out->flags |= DRM_MODE_FLAG_PIC_AR_4_3; break; case HDMI_PICTURE_ASPECT_16_9: out->flags |= DRM_MODE_FLAG_PIC_AR_16_9; break; case HDMI_PICTURE_ASPECT_64_27: out->flags |= DRM_MODE_FLAG_PIC_AR_64_27; break; case HDMI_PICTURE_ASPECT_256_135: out->flags |= DRM_MODE_FLAG_PIC_AR_256_135; break; default: WARN(1, "Invalid aspect ratio (0%x) on mode\n", in->picture_aspect_ratio); fallthrough; case HDMI_PICTURE_ASPECT_NONE: out->flags |= DRM_MODE_FLAG_PIC_AR_NONE; break; } strscpy_pad(out->name, in->name, sizeof(out->name)); } /** * drm_mode_convert_umode - convert a modeinfo into a drm_display_mode * @dev: drm device * @out: drm_display_mode to return to the user * @in: drm_mode_modeinfo to use * * Convert a drm_mode_modeinfo into a drm_display_mode structure to return to * the caller. * * Returns: * Zero on success, negative errno on failure. */ int drm_mode_convert_umode(struct drm_device *dev, struct drm_display_mode *out, const struct drm_mode_modeinfo *in) { if (in->clock > INT_MAX || in->vrefresh > INT_MAX) return -ERANGE; out->clock = in->clock; out->hdisplay = in->hdisplay; out->hsync_start = in->hsync_start; out->hsync_end = in->hsync_end; out->htotal = in->htotal; out->hskew = in->hskew; out->vdisplay = in->vdisplay; out->vsync_start = in->vsync_start; out->vsync_end = in->vsync_end; out->vtotal = in->vtotal; out->vscan = in->vscan; out->flags = in->flags; /* * Old xf86-video-vmware (possibly others too) used to * leave 'type' uninitialized. Just ignore any bits we * don't like. It's a just hint after all, and more * useful for the kernel->userspace direction anyway. */ out->type = in->type & DRM_MODE_TYPE_ALL; strscpy_pad(out->name, in->name, sizeof(out->name)); /* Clearing picture aspect ratio bits from out flags, * as the aspect-ratio information is not stored in * flags for kernel-mode, but in picture_aspect_ratio. */ out->flags &= ~DRM_MODE_FLAG_PIC_AR_MASK; switch (in->flags & DRM_MODE_FLAG_PIC_AR_MASK) { case DRM_MODE_FLAG_PIC_AR_4_3: out->picture_aspect_ratio = HDMI_PICTURE_ASPECT_4_3; break; case DRM_MODE_FLAG_PIC_AR_16_9: out->picture_aspect_ratio = HDMI_PICTURE_ASPECT_16_9; break; case DRM_MODE_FLAG_PIC_AR_64_27: out->picture_aspect_ratio = HDMI_PICTURE_ASPECT_64_27; break; case DRM_MODE_FLAG_PIC_AR_256_135: out->picture_aspect_ratio = HDMI_PICTURE_ASPECT_256_135; break; case DRM_MODE_FLAG_PIC_AR_NONE: out->picture_aspect_ratio = HDMI_PICTURE_ASPECT_NONE; break; default: return -EINVAL; } out->status = drm_mode_validate_driver(dev, out); if (out->status != MODE_OK) return -EINVAL; drm_mode_set_crtcinfo(out, CRTC_INTERLACE_HALVE_V); return 0; } /** * drm_mode_is_420_only - if a given videomode can be only supported in YCBCR420 * output format * * @display: display under action * @mode: video mode to be tested. * * Returns: * true if the mode can be supported in YCBCR420 format * false if not. 
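 *
 * Example (illustrative sketch; @connector is the connector whose EDID was
 * parsed into its display_info, and "dev" is its drm_device)::
 *
 *     if (drm_mode_is_420_only(&connector->display_info, mode))
 *             drm_dbg_kms(dev, "mode requires YCBCR420 output\n");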
*/ bool drm_mode_is_420_only(const struct drm_display_info *display, const struct drm_display_mode *mode) { u8 vic = drm_match_cea_mode(mode); return test_bit(vic, display->hdmi.y420_vdb_modes); } EXPORT_SYMBOL(drm_mode_is_420_only); /** * drm_mode_is_420_also - if a given videomode can be supported in YCBCR420 * output format also (along with RGB/YCBCR444/422) * * @display: display under action. * @mode: video mode to be tested. * * Returns: * true if the mode can be supported in YCBCR420 format * false if not. */ bool drm_mode_is_420_also(const struct drm_display_info *display, const struct drm_display_mode *mode) { u8 vic = drm_match_cea_mode(mode); return test_bit(vic, display->hdmi.y420_cmdb_modes); } EXPORT_SYMBOL(drm_mode_is_420_also); /** * drm_mode_is_420 - if a given videomode can be supported in YCBCR420 * output format * * @display: display under action. * @mode: video mode to be tested. * * Returns: * true if the mode can be supported in YCBCR420 format * false if not. */ bool drm_mode_is_420(const struct drm_display_info *display, const struct drm_display_mode *mode) { return drm_mode_is_420_only(display, mode) || drm_mode_is_420_also(display, mode); } EXPORT_SYMBOL(drm_mode_is_420); /** * drm_set_preferred_mode - Sets the preferred mode of a connector * @connector: connector whose mode list should be processed * @hpref: horizontal resolution of preferred mode * @vpref: vertical resolution of preferred mode * * Marks a mode as preferred if it matches the resolution specified by @hpref * and @vpref. */ void drm_set_preferred_mode(struct drm_connector *connector, int hpref, int vpref) { struct drm_display_mode *mode; list_for_each_entry(mode, &connector->probed_modes, head) { if (mode->hdisplay == hpref && mode->vdisplay == vpref) mode->type |= DRM_MODE_TYPE_PREFERRED; } } EXPORT_SYMBOL(drm_set_preferred_mode);
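/*
 * Example use of drm_set_preferred_mode() (illustrative sketch; typically
 * called from a connector's .get_modes() hook once the probed_modes list has
 * been filled, here preferring an arbitrary 1920x1080 mode):
 *
 *     count = drm_add_modes_noedid(connector, 4096, 4096);
 *     drm_set_preferred_mode(connector, 1920, 1080);
 */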
// SPDX-License-Identifier: GPL-2.0 // rc-ir-raw.c - handle IR pulse/space events // // Copyright (C) 2010 by Mauro Carvalho Chehab #include <linux/export.h> #include <linux/kthread.h> #include <linux/mutex.h> #include <linux/kmod.h> #include <linux/sched.h> #include "rc-core-priv.h" /* Used to keep track of IR raw clients, protected by ir_raw_handler_lock */ static LIST_HEAD(ir_raw_client_list); /* Used to handle IR raw handler extensions */ DEFINE_MUTEX(ir_raw_handler_lock); static LIST_HEAD(ir_raw_handler_list); static atomic64_t available_protocols = ATOMIC64_INIT(0); static int ir_raw_event_thread(void *data) { struct ir_raw_event ev; struct ir_raw_handler *handler; struct ir_raw_event_ctrl *raw = data; struct rc_dev *dev =
raw->dev; while (1) { mutex_lock(&ir_raw_handler_lock); while (kfifo_out(&raw->kfifo, &ev, 1)) { if (is_timing_event(ev)) { if (ev.duration == 0) dev_warn_once(&dev->dev, "nonsensical timing event of duration 0"); if (is_timing_event(raw->prev_ev) && !is_transition(&ev, &raw->prev_ev)) dev_warn_once(&dev->dev, "two consecutive events of type %s", TO_STR(ev.pulse)); } list_for_each_entry(handler, &ir_raw_handler_list, list) if (dev->enabled_protocols & handler->protocols || !handler->protocols) handler->decode(dev, ev); lirc_raw_event(dev, ev); raw->prev_ev = ev; } mutex_unlock(&ir_raw_handler_lock); set_current_state(TASK_INTERRUPTIBLE); if (kthread_should_stop()) { __set_current_state(TASK_RUNNING); break; } else if (!kfifo_is_empty(&raw->kfifo)) set_current_state(TASK_RUNNING); schedule(); } return 0; } /** * ir_raw_event_store() - pass a pulse/space duration to the raw ir decoders * @dev: the struct rc_dev device descriptor * @ev: the struct ir_raw_event descriptor of the pulse/space * * This routine (which may be called from an interrupt context) stores a * pulse/space duration for the raw ir decoding state machines. Pulses are * signalled as positive values and spaces as negative values. A zero value * will reset the decoding state machines. */ int ir_raw_event_store(struct rc_dev *dev, struct ir_raw_event *ev) { if (!dev->raw) return -EINVAL; dev_dbg(&dev->dev, "sample: (%05dus %s)\n", ev->duration, TO_STR(ev->pulse)); if (!kfifo_put(&dev->raw->kfifo, *ev)) { dev_err(&dev->dev, "IR event FIFO is full!\n"); return -ENOSPC; } return 0; } EXPORT_SYMBOL_GPL(ir_raw_event_store); /** * ir_raw_event_store_edge() - notify raw ir decoders of the start of a pulse/space * @dev: the struct rc_dev device descriptor * @pulse: true for pulse, false for space * * This routine (which may be called from an interrupt context) is used to * store the beginning of an ir pulse or space (or the start/end of ir * reception) for the raw ir decoding state machines. This is used by * hardware which does not provide durations directly but only interrupts * (or similar events) on state change. */ int ir_raw_event_store_edge(struct rc_dev *dev, bool pulse) { ktime_t now; struct ir_raw_event ev = {}; if (!dev->raw) return -EINVAL; now = ktime_get(); ev.duration = ktime_to_us(ktime_sub(now, dev->raw->last_event)); ev.pulse = !pulse; return ir_raw_event_store_with_timeout(dev, &ev); } EXPORT_SYMBOL_GPL(ir_raw_event_store_edge); /* * ir_raw_event_store_with_timeout() - pass a pulse/space duration to the raw * ir decoders, schedule decoding and * timeout * @dev: the struct rc_dev device descriptor * @ev: the struct ir_raw_event descriptor of the pulse/space * * This routine (which may be called from an interrupt context) stores a * pulse/space duration for the raw ir decoding state machines, schedules * decoding and generates a timeout. 
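 *
 * Example (illustrative sketch of a hypothetical edge-interrupt driver;
 * hardware that only reports edges normally goes through
 * ir_raw_event_store_edge(), which feeds this helper, and
 * my_hw_level_is_pulse() is a made-up level read):
 *
 *     static irqreturn_t my_ir_isr(int irq, void *data)
 *     {
 *             struct rc_dev *rcdev = data;
 *
 *             ir_raw_event_store_edge(rcdev, my_hw_level_is_pulse());
 *             return IRQ_HANDLED;
 *     }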
*/ int ir_raw_event_store_with_timeout(struct rc_dev *dev, struct ir_raw_event *ev) { ktime_t now; int rc = 0; if (!dev->raw) return -EINVAL; now = ktime_get(); spin_lock(&dev->raw->edge_spinlock); rc = ir_raw_event_store(dev, ev); dev->raw->last_event = now; /* timer could be set to timeout (125ms by default) */ if (!timer_pending(&dev->raw->edge_handle) || time_after(dev->raw->edge_handle.expires, jiffies + msecs_to_jiffies(15))) { mod_timer(&dev->raw->edge_handle, jiffies + msecs_to_jiffies(15)); } spin_unlock(&dev->raw->edge_spinlock); return rc; } EXPORT_SYMBOL_GPL(ir_raw_event_store_with_timeout); /** * ir_raw_event_store_with_filter() - pass next pulse/space to decoders with some processing * @dev: the struct rc_dev device descriptor * @ev: the event that has occurred * * This routine (which may be called from an interrupt context) works * in similar manner to ir_raw_event_store_edge. * This routine is intended for devices with limited internal buffer * It automerges samples of same type, and handles timeouts. Returns non-zero * if the event was added, and zero if the event was ignored due to idle * processing. */ int ir_raw_event_store_with_filter(struct rc_dev *dev, struct ir_raw_event *ev) { if (!dev->raw) return -EINVAL; /* Ignore spaces in idle mode */ if (dev->idle && !ev->pulse) return 0; else if (dev->idle) ir_raw_event_set_idle(dev, false); if (!dev->raw->this_ev.duration) dev->raw->this_ev = *ev; else if (ev->pulse == dev->raw->this_ev.pulse) dev->raw->this_ev.duration += ev->duration; else { ir_raw_event_store(dev, &dev->raw->this_ev); dev->raw->this_ev = *ev; } /* Enter idle mode if necessary */ if (!ev->pulse && dev->timeout && dev->raw->this_ev.duration >= dev->timeout) ir_raw_event_set_idle(dev, true); return 1; } EXPORT_SYMBOL_GPL(ir_raw_event_store_with_filter); /** * ir_raw_event_set_idle() - provide hint to rc-core when the device is idle or not * @dev: the struct rc_dev device descriptor * @idle: whether the device is idle or not */ void ir_raw_event_set_idle(struct rc_dev *dev, bool idle) { if (!dev->raw) return; dev_dbg(&dev->dev, "%s idle mode\n", idle ? "enter" : "leave"); if (idle) { dev->raw->this_ev.timeout = true; ir_raw_event_store(dev, &dev->raw->this_ev); dev->raw->this_ev = (struct ir_raw_event) {}; } if (dev->s_idle) dev->s_idle(dev, idle); dev->idle = idle; } EXPORT_SYMBOL_GPL(ir_raw_event_set_idle); /** * ir_raw_event_handle() - schedules the decoding of stored ir data * @dev: the struct rc_dev device descriptor * * This routine will tell rc-core to start decoding stored ir data. 
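 *
 * Example (illustrative sketch; a driver draining a hardware sample FIFO
 * typically stores every sample and then kicks decoding once):
 *
 *     for (i = 0; i < count; i++)
 *             ir_raw_event_store(rcdev, &samples[i]);
 *     ir_raw_event_handle(rcdev);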
*/ void ir_raw_event_handle(struct rc_dev *dev) { if (!dev->raw || !dev->raw->thread) return; wake_up_process(dev->raw->thread); } EXPORT_SYMBOL_GPL(ir_raw_event_handle); /* used internally by the sysfs interface */ u64 ir_raw_get_allowed_protocols(void) { return atomic64_read(&available_protocols); } static int change_protocol(struct rc_dev *dev, u64 *rc_proto) { struct ir_raw_handler *handler; u32 timeout = 0; mutex_lock(&ir_raw_handler_lock); list_for_each_entry(handler, &ir_raw_handler_list, list) { if (!(dev->enabled_protocols & handler->protocols) && (*rc_proto & handler->protocols) && handler->raw_register) handler->raw_register(dev); if ((dev->enabled_protocols & handler->protocols) && !(*rc_proto & handler->protocols) && handler->raw_unregister) handler->raw_unregister(dev); } mutex_unlock(&ir_raw_handler_lock); if (!dev->max_timeout) return 0; mutex_lock(&ir_raw_handler_lock); list_for_each_entry(handler, &ir_raw_handler_list, list) { if (handler->protocols & *rc_proto) { if (timeout < handler->min_timeout) timeout = handler->min_timeout; } } mutex_unlock(&ir_raw_handler_lock); if (timeout == 0) timeout = IR_DEFAULT_TIMEOUT; else timeout += MS_TO_US(10); if (timeout < dev->min_timeout) timeout = dev->min_timeout; else if (timeout > dev->max_timeout) timeout = dev->max_timeout; if (dev->s_timeout) dev->s_timeout(dev, timeout); else dev->timeout = timeout; return 0; } static void ir_raw_disable_protocols(struct rc_dev *dev, u64 protocols) { mutex_lock(&dev->lock); dev->enabled_protocols &= ~protocols; mutex_unlock(&dev->lock); } /** * ir_raw_gen_manchester() - Encode data with Manchester (bi-phase) modulation. * @ev: Pointer to pointer to next free event. *@ev is incremented for * each raw event filled. * @max: Maximum number of raw events to fill. * @timings: Manchester modulation timings. * @n: Number of bits of data. * @data: Data bits to encode. * * Encodes the @n least significant bits of @data using Manchester (bi-phase) * modulation with the timing characteristics described by @timings, writing up * to @max raw IR events using the *@ev pointer. * * Returns: 0 on success. * -ENOBUFS if there isn't enough space in the array to fit the * full encoded data. In this case all @max events will have been * written. 
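 *
 * Example (illustrative sketch; the 889us unit resembles RC-5 but is only a
 * placeholder here, and "events"/"command" are caller-provided):
 *
 *     static const struct ir_raw_timings_manchester timings = {
 *             .leader_pulse  = 889,
 *             .clock         = 889,
 *             .trailer_space = 889,
 *     };
 *     struct ir_raw_event *e = events;
 *     int ret = ir_raw_gen_manchester(&e, max, &timings, 14, command);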
*/ int ir_raw_gen_manchester(struct ir_raw_event **ev, unsigned int max, const struct ir_raw_timings_manchester *timings, unsigned int n, u64 data) { bool need_pulse; u64 i; int ret = -ENOBUFS; i = BIT_ULL(n - 1); if (timings->leader_pulse) { if (!max--) return ret; init_ir_raw_event_duration((*ev), 1, timings->leader_pulse); if (timings->leader_space) { if (!max--) return ret; init_ir_raw_event_duration(++(*ev), 0, timings->leader_space); } } else { /* continue existing signal */ --(*ev); } /* from here on *ev will point to the last event rather than the next */ while (n && i > 0) { need_pulse = !(data & i); if (timings->invert) need_pulse = !need_pulse; if (need_pulse == !!(*ev)->pulse) { (*ev)->duration += timings->clock; } else { if (!max--) goto nobufs; init_ir_raw_event_duration(++(*ev), need_pulse, timings->clock); } if (!max--) goto nobufs; init_ir_raw_event_duration(++(*ev), !need_pulse, timings->clock); i >>= 1; } if (timings->trailer_space) { if (!(*ev)->pulse) (*ev)->duration += timings->trailer_space; else if (!max--) goto nobufs; else init_ir_raw_event_duration(++(*ev), 0, timings->trailer_space); } ret = 0; nobufs: /* point to the next event rather than last event before returning */ ++(*ev); return ret; } EXPORT_SYMBOL(ir_raw_gen_manchester); /** * ir_raw_gen_pd() - Encode data to raw events with pulse-distance modulation. * @ev: Pointer to pointer to next free event. *@ev is incremented for * each raw event filled. * @max: Maximum number of raw events to fill. * @timings: Pulse distance modulation timings. * @n: Number of bits of data. * @data: Data bits to encode. * * Encodes the @n least significant bits of @data using pulse-distance * modulation with the timing characteristics described by @timings, writing up * to @max raw IR events using the *@ev pointer. * * Returns: 0 on success. * -ENOBUFS if there isn't enough space in the array to fit the * full encoded data. In this case all @max events will have been * written. */ int ir_raw_gen_pd(struct ir_raw_event **ev, unsigned int max, const struct ir_raw_timings_pd *timings, unsigned int n, u64 data) { int i; int ret; unsigned int space; if (timings->header_pulse) { ret = ir_raw_gen_pulse_space(ev, &max, timings->header_pulse, timings->header_space); if (ret) return ret; } if (timings->msb_first) { for (i = n - 1; i >= 0; --i) { space = timings->bit_space[(data >> i) & 1]; ret = ir_raw_gen_pulse_space(ev, &max, timings->bit_pulse, space); if (ret) return ret; } } else { for (i = 0; i < n; ++i, data >>= 1) { space = timings->bit_space[data & 1]; ret = ir_raw_gen_pulse_space(ev, &max, timings->bit_pulse, space); if (ret) return ret; } } ret = ir_raw_gen_pulse_space(ev, &max, timings->trailer_pulse, timings->trailer_space); return ret; } EXPORT_SYMBOL(ir_raw_gen_pd); /** * ir_raw_gen_pl() - Encode data to raw events with pulse-length modulation. * @ev: Pointer to pointer to next free event. *@ev is incremented for * each raw event filled. * @max: Maximum number of raw events to fill. * @timings: Pulse distance modulation timings. * @n: Number of bits of data. * @data: Data bits to encode. * * Encodes the @n least significant bits of @data using space-distance * modulation with the timing characteristics described by @timings, writing up * to @max raw IR events using the *@ev pointer. * * Returns: 0 on success. * -ENOBUFS if there isn't enough space in the array to fit the * full encoded data. In this case all @max events will have been * written. 
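 *
 * Example (illustrative sketch; the timing values are arbitrary placeholders,
 * not a real protocol definition):
 *
 *     static const struct ir_raw_timings_pl timings = {
 *             .header_pulse  = 400,
 *             .bit_space     = 400,
 *             .bit_pulse     = { 400, 800 },
 *             .trailer_space = 4000,
 *             .msb_first     = true,
 *     };
 *     struct ir_raw_event *e = events;
 *     int ret = ir_raw_gen_pl(&e, max, &timings, 12, data);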
*/ int ir_raw_gen_pl(struct ir_raw_event **ev, unsigned int max, const struct ir_raw_timings_pl *timings, unsigned int n, u64 data) { int i; int ret = -ENOBUFS; unsigned int pulse; if (!max--) return ret; init_ir_raw_event_duration((*ev)++, 1, timings->header_pulse); if (timings->msb_first) { for (i = n - 1; i >= 0; --i) { if (!max--) return ret; init_ir_raw_event_duration((*ev)++, 0, timings->bit_space); if (!max--) return ret; pulse = timings->bit_pulse[(data >> i) & 1]; init_ir_raw_event_duration((*ev)++, 1, pulse); } } else { for (i = 0; i < n; ++i, data >>= 1) { if (!max--) return ret; init_ir_raw_event_duration((*ev)++, 0, timings->bit_space); if (!max--) return ret; pulse = timings->bit_pulse[data & 1]; init_ir_raw_event_duration((*ev)++, 1, pulse); } } if (!max--) return ret; init_ir_raw_event_duration((*ev)++, 0, timings->trailer_space); return 0; } EXPORT_SYMBOL(ir_raw_gen_pl); /** * ir_raw_encode_scancode() - Encode a scancode as raw events * * @protocol: protocol * @scancode: scancode filter describing a single scancode * @events: array of raw events to write into * @max: max number of raw events * * Attempts to encode the scancode as raw events. * * Returns: The number of events written. * -ENOBUFS if there isn't enough space in the array to fit the * encoding. In this case all @max events will have been written. * -EINVAL if the scancode is ambiguous or invalid, or if no * compatible encoder was found. */ int ir_raw_encode_scancode(enum rc_proto protocol, u32 scancode, struct ir_raw_event *events, unsigned int max) { struct ir_raw_handler *handler; int ret = -EINVAL; u64 mask = 1ULL << protocol; ir_raw_load_modules(&mask); mutex_lock(&ir_raw_handler_lock); list_for_each_entry(handler, &ir_raw_handler_list, list) { if (handler->protocols & mask && handler->encode) { ret = handler->encode(protocol, scancode, events, max); if (ret >= 0 || ret == -ENOBUFS) break; } } mutex_unlock(&ir_raw_handler_lock); return ret; } EXPORT_SYMBOL(ir_raw_encode_scancode); /** * ir_raw_edge_handle() - Handle ir_raw_event_store_edge() processing * * @t: timer_list * * This callback is armed by ir_raw_event_store_edge(). It does two things: * first of all, rather than calling ir_raw_event_handle() for each * edge and waking up the rc thread, 15 ms after the first edge * ir_raw_event_handle() is called. Secondly, generate a timeout event * no more IR is received after the rc_dev timeout. */ static void ir_raw_edge_handle(struct timer_list *t) { struct ir_raw_event_ctrl *raw = from_timer(raw, t, edge_handle); struct rc_dev *dev = raw->dev; unsigned long flags; ktime_t interval; spin_lock_irqsave(&dev->raw->edge_spinlock, flags); interval = ktime_sub(ktime_get(), dev->raw->last_event); if (ktime_to_us(interval) >= dev->timeout) { struct ir_raw_event ev = { .timeout = true, .duration = ktime_to_us(interval) }; ir_raw_event_store(dev, &ev); } else { mod_timer(&dev->raw->edge_handle, jiffies + usecs_to_jiffies(dev->timeout - ktime_to_us(interval))); } spin_unlock_irqrestore(&dev->raw->edge_spinlock, flags); ir_raw_event_handle(dev); } /** * ir_raw_encode_carrier() - Get carrier used for protocol * * @protocol: protocol * * Attempts to find the carrier for the specified protocol * * Returns: The carrier in Hz * -EINVAL if the protocol is invalid, or if no * compatible encoder was found. 
*/ int ir_raw_encode_carrier(enum rc_proto protocol) { struct ir_raw_handler *handler; int ret = -EINVAL; u64 mask = BIT_ULL(protocol); mutex_lock(&ir_raw_handler_lock); list_for_each_entry(handler, &ir_raw_handler_list, list) { if (handler->protocols & mask && handler->encode) { ret = handler->carrier; break; } } mutex_unlock(&ir_raw_handler_lock); return ret; } EXPORT_SYMBOL(ir_raw_encode_carrier); /* * Used to (un)register raw event clients */ int ir_raw_event_prepare(struct rc_dev *dev) { if (!dev) return -EINVAL; dev->raw = kzalloc(sizeof(*dev->raw), GFP_KERNEL); if (!dev->raw) return -ENOMEM; dev->raw->dev = dev; dev->change_protocol = change_protocol; dev->idle = true; spin_lock_init(&dev->raw->edge_spinlock); timer_setup(&dev->raw->edge_handle, ir_raw_edge_handle, 0); INIT_KFIFO(dev->raw->kfifo); return 0; } int ir_raw_event_register(struct rc_dev *dev) { struct task_struct *thread; thread = kthread_run(ir_raw_event_thread, dev->raw, "rc%u", dev->minor); if (IS_ERR(thread)) return PTR_ERR(thread); dev->raw->thread = thread; mutex_lock(&ir_raw_handler_lock); list_add_tail(&dev->raw->list, &ir_raw_client_list); mutex_unlock(&ir_raw_handler_lock); return 0; } void ir_raw_event_free(struct rc_dev *dev) { if (!dev) return; kfree(dev->raw); dev->raw = NULL; } void ir_raw_event_unregister(struct rc_dev *dev) { struct ir_raw_handler *handler; if (!dev || !dev->raw) return; kthread_stop(dev->raw->thread); del_timer_sync(&dev->raw->edge_handle); mutex_lock(&ir_raw_handler_lock); list_del(&dev->raw->list); list_for_each_entry(handler, &ir_raw_handler_list, list) if (handler->raw_unregister && (handler->protocols & dev->enabled_protocols)) handler->raw_unregister(dev); lirc_bpf_free(dev); ir_raw_event_free(dev); /* * A user can be calling bpf(BPF_PROG_{QUERY|ATTACH|DETACH}), so * ensure that the raw member is null on unlock; this is how * "device gone" is checked. */ mutex_unlock(&ir_raw_handler_lock); } /* * Extension interface - used to register the IR decoders */ int ir_raw_handler_register(struct ir_raw_handler *ir_raw_handler) { mutex_lock(&ir_raw_handler_lock); list_add_tail(&ir_raw_handler->list, &ir_raw_handler_list); atomic64_or(ir_raw_handler->protocols, &available_protocols); mutex_unlock(&ir_raw_handler_lock); return 0; } EXPORT_SYMBOL(ir_raw_handler_register); void ir_raw_handler_unregister(struct ir_raw_handler *ir_raw_handler) { struct ir_raw_event_ctrl *raw; u64 protocols = ir_raw_handler->protocols; mutex_lock(&ir_raw_handler_lock); list_del(&ir_raw_handler->list); list_for_each_entry(raw, &ir_raw_client_list, list) { if (ir_raw_handler->raw_unregister && (raw->dev->enabled_protocols & protocols)) ir_raw_handler->raw_unregister(raw->dev); ir_raw_disable_protocols(raw->dev, protocols); } atomic64_andnot(protocols, &available_protocols); mutex_unlock(&ir_raw_handler_lock); } EXPORT_SYMBOL(ir_raw_handler_unregister); |
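/*
 * Standalone, userspace-style sketch (not rc-core code) of the bi-phase rule
 * implemented by ir_raw_gen_manchester() above: each data bit occupies two
 * half-clock slots, and a slot that repeats the level of the previous event
 * is merged into that event instead of emitting a new one.  Leader and
 * trailer handling from the kernel encoder is omitted, and the names below
 * (sample_event, manchester_encode) are illustrative only.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct sample_event {
	bool pulse;		/* true = carrier on (pulse), false = space */
	unsigned int duration;	/* microseconds */
};

static size_t manchester_encode(struct sample_event *ev, size_t max,
				unsigned int clock_us, bool invert,
				unsigned int nbits, uint64_t data)
{
	size_t n = 0;

	if (!nbits)		/* like the kernel encoder, expects n >= 1 */
		return 0;

	for (uint64_t bit = 1ULL << (nbits - 1); bit; bit >>= 1) {
		/* as in ir_raw_gen_manchester(), a '1' bit starts with a space */
		bool first_half = !(data & bit);

		if (invert)
			first_half = !first_half;

		/* first half-slot: extend the previous event if same level */
		if (n && ev[n - 1].pulse == first_half) {
			ev[n - 1].duration += clock_us;
		} else {
			if (n == max)
				return n;
			ev[n].pulse = first_half;
			ev[n].duration = clock_us;
			n++;
		}

		/* second half-slot is always the opposite level */
		if (n == max)
			return n;
		ev[n].pulse = !first_half;
		ev[n].duration = clock_us;
		n++;
	}
	return n;
}

int main(void)
{
	struct sample_event ev[32];
	size_t n = manchester_encode(ev, 32, 444, false, 8, 0xA5);

	for (size_t i = 0; i < n; i++)
		printf("%s %u\n", ev[i].pulse ? "pulse" : "space", ev[i].duration);
	return 0;
}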
// SPDX-License-Identifier: GPL-2.0+
/*
 * comedi/drivers/ni_usb6501.c
 * Comedi driver for National Instruments USB-6501
 *
 * COMEDI - Linux Control and Measurement Device Interface
 * Copyright (C) 2014 Luca Ellero <luca.ellero@brickedbrain.com>
 */

/*
 * Driver: ni_usb6501
 * Description: National Instruments USB-6501 module
 * Devices: [National Instruments] USB-6501 (ni_usb6501)
 * Author: Luca Ellero <luca.ellero@brickedbrain.com>
 * Updated: 8 Sep 2014
 * Status: works
 *
 * Configuration Options:
 * none
 */

/*
 * NI-6501 - USB PROTOCOL DESCRIPTION
 *
 * Every command is composed of two USB packets:
 * - request (out)
 * - response (in)
 *
 * Every packet is at least 12 bytes long; here is the meaning of
 * every field (all values are hex):
 *
 * byte 0 is always 00
 * byte 1 is always 01
 * byte 2 is always 00
 * byte 3 is the total packet length
 *
 * byte 4 is always 00
 * byte 5 is the total packet length - 4
 * byte 6 is always 01
 * byte 7 is the command
 *
 * byte 8 is 02 (request) or 00 (response)
 * byte 9 is 00 (response) or 10 (port request) or 20 (counter request)
 * byte 10 is always 00
 * byte 11 is 00 (request) or 02 (response)
 *
 * PORT PACKETS
 *
 * CMD: 0xE READ_PORT
 * REQ: 00 01 00 10 00 0C 01
0E 02 10 00 00 00 03 <PORT> 00 * RES: 00 01 00 10 00 0C 01 00 00 00 00 02 00 03 <BMAP> 00 * * CMD: 0xF WRITE_PORT * REQ: 00 01 00 14 00 10 01 0F 02 10 00 00 00 03 <PORT> 00 03 <BMAP> 00 00 * RES: 00 01 00 0C 00 08 01 00 00 00 00 02 * * CMD: 0x12 SET_PORT_DIR (0 = input, 1 = output) * REQ: 00 01 00 18 00 14 01 12 02 10 00 00 * 00 05 <PORT 0> <PORT 1> <PORT 2> 00 05 00 00 00 00 00 * RES: 00 01 00 0C 00 08 01 00 00 00 00 02 * * COUNTER PACKETS * * CMD 0x9: START_COUNTER * REQ: 00 01 00 0C 00 08 01 09 02 20 00 00 * RES: 00 01 00 0C 00 08 01 00 00 00 00 02 * * CMD 0xC: STOP_COUNTER * REQ: 00 01 00 0C 00 08 01 0C 02 20 00 00 * RES: 00 01 00 0C 00 08 01 00 00 00 00 02 * * CMD 0xE: READ_COUNTER * REQ: 00 01 00 0C 00 08 01 0E 02 20 00 00 * RES: 00 01 00 10 00 0C 01 00 00 00 00 02 <u32 counter value, Big Endian> * * CMD 0xF: WRITE_COUNTER * REQ: 00 01 00 10 00 0C 01 0F 02 20 00 00 <u32 counter value, Big Endian> * RES: 00 01 00 0C 00 08 01 00 00 00 00 02 * * * Please visit https://www.brickedbrain.com if you need * additional information or have any questions. * */ #include <linux/kernel.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/comedi/comedi_usb.h> #define NI6501_TIMEOUT 1000 /* Port request packets */ static const u8 READ_PORT_REQUEST[] = {0x00, 0x01, 0x00, 0x10, 0x00, 0x0C, 0x01, 0x0E, 0x02, 0x10, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00}; static const u8 WRITE_PORT_REQUEST[] = {0x00, 0x01, 0x00, 0x14, 0x00, 0x10, 0x01, 0x0F, 0x02, 0x10, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00}; static const u8 SET_PORT_DIR_REQUEST[] = {0x00, 0x01, 0x00, 0x18, 0x00, 0x14, 0x01, 0x12, 0x02, 0x10, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00}; /* Counter request packets */ static const u8 START_COUNTER_REQUEST[] = {0x00, 0x01, 0x00, 0x0C, 0x00, 0x08, 0x01, 0x09, 0x02, 0x20, 0x00, 0x00}; static const u8 STOP_COUNTER_REQUEST[] = {0x00, 0x01, 0x00, 0x0C, 0x00, 0x08, 0x01, 0x0C, 0x02, 0x20, 0x00, 0x00}; static const u8 READ_COUNTER_REQUEST[] = {0x00, 0x01, 0x00, 0x0C, 0x00, 0x08, 0x01, 0x0E, 0x02, 0x20, 0x00, 0x00}; static const u8 WRITE_COUNTER_REQUEST[] = {0x00, 0x01, 0x00, 0x10, 0x00, 0x0C, 0x01, 0x0F, 0x02, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; /* Response packets */ static const u8 GENERIC_RESPONSE[] = {0x00, 0x01, 0x00, 0x0C, 0x00, 0x08, 0x01, 0x00, 0x00, 0x00, 0x00, 0x02}; static const u8 READ_PORT_RESPONSE[] = {0x00, 0x01, 0x00, 0x10, 0x00, 0x0C, 0x01, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x03, 0x00, 0x00}; static const u8 READ_COUNTER_RESPONSE[] = {0x00, 0x01, 0x00, 0x10, 0x00, 0x0C, 0x01, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00}; /* Largest supported packets */ static const size_t TX_MAX_SIZE = sizeof(SET_PORT_DIR_REQUEST); static const size_t RX_MAX_SIZE = sizeof(READ_PORT_RESPONSE); enum commands { READ_PORT, WRITE_PORT, SET_PORT_DIR, START_COUNTER, STOP_COUNTER, READ_COUNTER, WRITE_COUNTER }; struct ni6501_private { struct usb_endpoint_descriptor *ep_rx; struct usb_endpoint_descriptor *ep_tx; struct mutex mut; u8 *usb_rx_buf; u8 *usb_tx_buf; }; static int ni6501_port_command(struct comedi_device *dev, int command, unsigned int val, u8 *bitmap) { struct usb_device *usb = comedi_to_usb_dev(dev); struct ni6501_private *devpriv = dev->private; int request_size, response_size; u8 *tx = devpriv->usb_tx_buf; int ret; if (command != SET_PORT_DIR && !bitmap) return -EINVAL; mutex_lock(&devpriv->mut); switch (command) { case READ_PORT: request_size = sizeof(READ_PORT_REQUEST); response_size = sizeof(READ_PORT_RESPONSE); 
memcpy(tx, READ_PORT_REQUEST, request_size); tx[14] = val & 0xff; break; case WRITE_PORT: request_size = sizeof(WRITE_PORT_REQUEST); response_size = sizeof(GENERIC_RESPONSE); memcpy(tx, WRITE_PORT_REQUEST, request_size); tx[14] = val & 0xff; tx[17] = *bitmap; break; case SET_PORT_DIR: request_size = sizeof(SET_PORT_DIR_REQUEST); response_size = sizeof(GENERIC_RESPONSE); memcpy(tx, SET_PORT_DIR_REQUEST, request_size); tx[14] = val & 0xff; tx[15] = (val >> 8) & 0xff; tx[16] = (val >> 16) & 0xff; break; default: ret = -EINVAL; goto end; } ret = usb_bulk_msg(usb, usb_sndbulkpipe(usb, devpriv->ep_tx->bEndpointAddress), devpriv->usb_tx_buf, request_size, NULL, NI6501_TIMEOUT); if (ret) goto end; ret = usb_bulk_msg(usb, usb_rcvbulkpipe(usb, devpriv->ep_rx->bEndpointAddress), devpriv->usb_rx_buf, response_size, NULL, NI6501_TIMEOUT); if (ret) goto end; /* Check if results are valid */ if (command == READ_PORT) { *bitmap = devpriv->usb_rx_buf[14]; /* mask bitmap for comparing */ devpriv->usb_rx_buf[14] = 0x00; if (memcmp(devpriv->usb_rx_buf, READ_PORT_RESPONSE, sizeof(READ_PORT_RESPONSE))) { ret = -EINVAL; } } else if (memcmp(devpriv->usb_rx_buf, GENERIC_RESPONSE, sizeof(GENERIC_RESPONSE))) { ret = -EINVAL; } end: mutex_unlock(&devpriv->mut); return ret; } static int ni6501_counter_command(struct comedi_device *dev, int command, u32 *val) { struct usb_device *usb = comedi_to_usb_dev(dev); struct ni6501_private *devpriv = dev->private; int request_size, response_size; u8 *tx = devpriv->usb_tx_buf; int ret; if ((command == READ_COUNTER || command == WRITE_COUNTER) && !val) return -EINVAL; mutex_lock(&devpriv->mut); switch (command) { case START_COUNTER: request_size = sizeof(START_COUNTER_REQUEST); response_size = sizeof(GENERIC_RESPONSE); memcpy(tx, START_COUNTER_REQUEST, request_size); break; case STOP_COUNTER: request_size = sizeof(STOP_COUNTER_REQUEST); response_size = sizeof(GENERIC_RESPONSE); memcpy(tx, STOP_COUNTER_REQUEST, request_size); break; case READ_COUNTER: request_size = sizeof(READ_COUNTER_REQUEST); response_size = sizeof(READ_COUNTER_RESPONSE); memcpy(tx, READ_COUNTER_REQUEST, request_size); break; case WRITE_COUNTER: request_size = sizeof(WRITE_COUNTER_REQUEST); response_size = sizeof(GENERIC_RESPONSE); memcpy(tx, WRITE_COUNTER_REQUEST, request_size); /* Setup tx packet: bytes 12,13,14,15 hold the */ /* u32 counter value (Big Endian) */ *((__be32 *)&tx[12]) = cpu_to_be32(*val); break; default: ret = -EINVAL; goto end; } ret = usb_bulk_msg(usb, usb_sndbulkpipe(usb, devpriv->ep_tx->bEndpointAddress), devpriv->usb_tx_buf, request_size, NULL, NI6501_TIMEOUT); if (ret) goto end; ret = usb_bulk_msg(usb, usb_rcvbulkpipe(usb, devpriv->ep_rx->bEndpointAddress), devpriv->usb_rx_buf, response_size, NULL, NI6501_TIMEOUT); if (ret) goto end; /* Check if results are valid */ if (command == READ_COUNTER) { int i; /* Read counter value: bytes 12,13,14,15 of rx packet */ /* hold the u32 counter value (Big Endian) */ *val = be32_to_cpu(*((__be32 *)&devpriv->usb_rx_buf[12])); /* mask counter value for comparing */ for (i = 12; i < sizeof(READ_COUNTER_RESPONSE); ++i) devpriv->usb_rx_buf[i] = 0x00; if (memcmp(devpriv->usb_rx_buf, READ_COUNTER_RESPONSE, sizeof(READ_COUNTER_RESPONSE))) { ret = -EINVAL; } } else if (memcmp(devpriv->usb_rx_buf, GENERIC_RESPONSE, sizeof(GENERIC_RESPONSE))) { ret = -EINVAL; } end: mutex_unlock(&devpriv->mut); return ret; } static int ni6501_dio_insn_config(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { int ret; ret = 
comedi_dio_insn_config(dev, s, insn, data, 0); if (ret) return ret; ret = ni6501_port_command(dev, SET_PORT_DIR, s->io_bits, NULL); if (ret) return ret; return insn->n; } static int ni6501_dio_insn_bits(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { unsigned int mask; int ret; u8 port; u8 bitmap; mask = comedi_dio_update_state(s, data); for (port = 0; port < 3; port++) { if (mask & (0xFF << port * 8)) { bitmap = (s->state >> port * 8) & 0xFF; ret = ni6501_port_command(dev, WRITE_PORT, port, &bitmap); if (ret) return ret; } } data[1] = 0; for (port = 0; port < 3; port++) { ret = ni6501_port_command(dev, READ_PORT, port, &bitmap); if (ret) return ret; data[1] |= bitmap << port * 8; } return insn->n; } static int ni6501_cnt_insn_config(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { int ret; u32 val = 0; switch (data[0]) { case INSN_CONFIG_ARM: ret = ni6501_counter_command(dev, START_COUNTER, NULL); break; case INSN_CONFIG_DISARM: ret = ni6501_counter_command(dev, STOP_COUNTER, NULL); break; case INSN_CONFIG_RESET: ret = ni6501_counter_command(dev, STOP_COUNTER, NULL); if (ret) break; ret = ni6501_counter_command(dev, WRITE_COUNTER, &val); break; default: return -EINVAL; } return ret ? ret : insn->n; } static int ni6501_cnt_insn_read(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { int ret; u32 val; unsigned int i; for (i = 0; i < insn->n; i++) { ret = ni6501_counter_command(dev, READ_COUNTER, &val); if (ret) return ret; data[i] = val; } return insn->n; } static int ni6501_cnt_insn_write(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { int ret; if (insn->n) { u32 val = data[insn->n - 1]; ret = ni6501_counter_command(dev, WRITE_COUNTER, &val); if (ret) return ret; } return insn->n; } static int ni6501_alloc_usb_buffers(struct comedi_device *dev) { struct ni6501_private *devpriv = dev->private; size_t size; size = usb_endpoint_maxp(devpriv->ep_rx); devpriv->usb_rx_buf = kzalloc(size, GFP_KERNEL); if (!devpriv->usb_rx_buf) return -ENOMEM; size = usb_endpoint_maxp(devpriv->ep_tx); devpriv->usb_tx_buf = kzalloc(size, GFP_KERNEL); if (!devpriv->usb_tx_buf) return -ENOMEM; return 0; } static int ni6501_find_endpoints(struct comedi_device *dev) { struct usb_interface *intf = comedi_to_usb_interface(dev); struct ni6501_private *devpriv = dev->private; struct usb_host_interface *iface_desc = intf->cur_altsetting; struct usb_endpoint_descriptor *ep_desc; int i; if (iface_desc->desc.bNumEndpoints != 2) { dev_err(dev->class_dev, "Wrong number of endpoints\n"); return -ENODEV; } for (i = 0; i < iface_desc->desc.bNumEndpoints; i++) { ep_desc = &iface_desc->endpoint[i].desc; if (usb_endpoint_is_bulk_in(ep_desc)) { if (!devpriv->ep_rx) devpriv->ep_rx = ep_desc; continue; } if (usb_endpoint_is_bulk_out(ep_desc)) { if (!devpriv->ep_tx) devpriv->ep_tx = ep_desc; continue; } } if (!devpriv->ep_rx || !devpriv->ep_tx) return -ENODEV; if (usb_endpoint_maxp(devpriv->ep_rx) < RX_MAX_SIZE) return -ENODEV; if (usb_endpoint_maxp(devpriv->ep_tx) < TX_MAX_SIZE) return -ENODEV; return 0; } static int ni6501_auto_attach(struct comedi_device *dev, unsigned long context) { struct usb_interface *intf = comedi_to_usb_interface(dev); struct ni6501_private *devpriv; struct comedi_subdevice *s; int ret; devpriv = comedi_alloc_devpriv(dev, sizeof(*devpriv)); if (!devpriv) return -ENOMEM; mutex_init(&devpriv->mut); 
usb_set_intfdata(intf, devpriv); ret = ni6501_find_endpoints(dev); if (ret) return ret; ret = ni6501_alloc_usb_buffers(dev); if (ret) return ret; ret = comedi_alloc_subdevices(dev, 2); if (ret) return ret; /* Digital Input/Output subdevice */ s = &dev->subdevices[0]; s->type = COMEDI_SUBD_DIO; s->subdev_flags = SDF_READABLE | SDF_WRITABLE; s->n_chan = 24; s->maxdata = 1; s->range_table = &range_digital; s->insn_bits = ni6501_dio_insn_bits; s->insn_config = ni6501_dio_insn_config; /* Counter subdevice */ s = &dev->subdevices[1]; s->type = COMEDI_SUBD_COUNTER; s->subdev_flags = SDF_READABLE | SDF_WRITABLE | SDF_LSAMPL; s->n_chan = 1; s->maxdata = 0xffffffff; s->insn_read = ni6501_cnt_insn_read; s->insn_write = ni6501_cnt_insn_write; s->insn_config = ni6501_cnt_insn_config; return 0; } static void ni6501_detach(struct comedi_device *dev) { struct usb_interface *intf = comedi_to_usb_interface(dev); struct ni6501_private *devpriv = dev->private; if (!devpriv) return; mutex_destroy(&devpriv->mut); usb_set_intfdata(intf, NULL); kfree(devpriv->usb_rx_buf); kfree(devpriv->usb_tx_buf); } static struct comedi_driver ni6501_driver = { .module = THIS_MODULE, .driver_name = "ni6501", .auto_attach = ni6501_auto_attach, .detach = ni6501_detach, }; static int ni6501_usb_probe(struct usb_interface *intf, const struct usb_device_id *id) { return comedi_usb_auto_config(intf, &ni6501_driver, id->driver_info); } static const struct usb_device_id ni6501_usb_table[] = { { USB_DEVICE(0x3923, 0x718a) }, { } }; MODULE_DEVICE_TABLE(usb, ni6501_usb_table); static struct usb_driver ni6501_usb_driver = { .name = "ni6501", .id_table = ni6501_usb_table, .probe = ni6501_usb_probe, .disconnect = comedi_usb_auto_unconfig, }; module_comedi_usb_driver(ni6501_driver, ni6501_usb_driver); MODULE_AUTHOR("Luca Ellero"); MODULE_DESCRIPTION("Comedi driver for National Instruments USB-6501"); MODULE_LICENSE("GPL"); |
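/*
 * Userspace illustration (not driver code) of the request layout documented
 * above: building the 16-byte READ_PORT request, where byte 3 is the total
 * length, byte 5 is length - 4, byte 7 the command and byte 14 the port
 * number (the slot the driver patches with tx[14] = val & 0xff).  The
 * function name build_read_port_request() is made up for this example.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const uint8_t read_port_template[16] = {
	0x00, 0x01, 0x00, 0x10,	/* header, total length 0x10 */
	0x00, 0x0C, 0x01, 0x0E,	/* length - 4, command 0x0E (READ_PORT) */
	0x02, 0x10, 0x00, 0x00,	/* 02 = request, 10 = port request */
	0x00, 0x03, 0x00, 0x00	/* byte 14 will carry the port number */
};

static void build_read_port_request(uint8_t *buf, uint8_t port)
{
	memcpy(buf, read_port_template, sizeof(read_port_template));
	buf[14] = port;
}

int main(void)
{
	uint8_t pkt[16];

	build_read_port_request(pkt, 2);	/* ask for port 2 */
	for (size_t i = 0; i < sizeof(pkt); i++)
		printf("%02X ", pkt[i]);
	printf("\n");
	return 0;
}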
// SPDX-License-Identifier: GPL-2.0
/*
 * linux/fs/fat/cache.c
 *
 * Written 1992,1993 by Werner Almesberger
 *
 * Mar 1999. AV. Changed cache, so that it uses the starting cluster instead
 * of inode number.
 * May 1999. AV. Fixed the bogosity with FAT32 (read "FAT28"). Fscking lusers.
 */

#include <linux/slab.h>
#include "fat.h"

/* this must be > 0. */
#define FAT_MAX_CACHE	8

struct fat_cache {
	struct list_head cache_list;
	int nr_contig;	/* number of contiguous clusters */
	int fcluster;	/* cluster number in the file. */
	int dcluster;	/* cluster number on disk.
*/ }; struct fat_cache_id { unsigned int id; int nr_contig; int fcluster; int dcluster; }; static inline int fat_max_cache(struct inode *inode) { return FAT_MAX_CACHE; } static struct kmem_cache *fat_cache_cachep; static void init_once(void *foo) { struct fat_cache *cache = (struct fat_cache *)foo; INIT_LIST_HEAD(&cache->cache_list); } int __init fat_cache_init(void) { fat_cache_cachep = kmem_cache_create("fat_cache", sizeof(struct fat_cache), 0, SLAB_RECLAIM_ACCOUNT, init_once); if (fat_cache_cachep == NULL) return -ENOMEM; return 0; } void fat_cache_destroy(void) { kmem_cache_destroy(fat_cache_cachep); } static inline struct fat_cache *fat_cache_alloc(struct inode *inode) { return kmem_cache_alloc(fat_cache_cachep, GFP_NOFS); } static inline void fat_cache_free(struct fat_cache *cache) { BUG_ON(!list_empty(&cache->cache_list)); kmem_cache_free(fat_cache_cachep, cache); } static inline void fat_cache_update_lru(struct inode *inode, struct fat_cache *cache) { if (MSDOS_I(inode)->cache_lru.next != &cache->cache_list) list_move(&cache->cache_list, &MSDOS_I(inode)->cache_lru); } static int fat_cache_lookup(struct inode *inode, int fclus, struct fat_cache_id *cid, int *cached_fclus, int *cached_dclus) { static struct fat_cache nohit = { .fcluster = 0, }; struct fat_cache *hit = &nohit, *p; int offset = -1; spin_lock(&MSDOS_I(inode)->cache_lru_lock); list_for_each_entry(p, &MSDOS_I(inode)->cache_lru, cache_list) { /* Find the cache of "fclus" or nearest cache. */ if (p->fcluster <= fclus && hit->fcluster < p->fcluster) { hit = p; if ((hit->fcluster + hit->nr_contig) < fclus) { offset = hit->nr_contig; } else { offset = fclus - hit->fcluster; break; } } } if (hit != &nohit) { fat_cache_update_lru(inode, hit); cid->id = MSDOS_I(inode)->cache_valid_id; cid->nr_contig = hit->nr_contig; cid->fcluster = hit->fcluster; cid->dcluster = hit->dcluster; *cached_fclus = cid->fcluster + offset; *cached_dclus = cid->dcluster + offset; } spin_unlock(&MSDOS_I(inode)->cache_lru_lock); return offset; } static struct fat_cache *fat_cache_merge(struct inode *inode, struct fat_cache_id *new) { struct fat_cache *p; list_for_each_entry(p, &MSDOS_I(inode)->cache_lru, cache_list) { /* Find the same part as "new" in cluster-chain. 
*/ if (p->fcluster == new->fcluster) { BUG_ON(p->dcluster != new->dcluster); if (new->nr_contig > p->nr_contig) p->nr_contig = new->nr_contig; return p; } } return NULL; } static void fat_cache_add(struct inode *inode, struct fat_cache_id *new) { struct fat_cache *cache, *tmp; if (new->fcluster == -1) /* dummy cache */ return; spin_lock(&MSDOS_I(inode)->cache_lru_lock); if (new->id != FAT_CACHE_VALID && new->id != MSDOS_I(inode)->cache_valid_id) goto out; /* this cache was invalidated */ cache = fat_cache_merge(inode, new); if (cache == NULL) { if (MSDOS_I(inode)->nr_caches < fat_max_cache(inode)) { MSDOS_I(inode)->nr_caches++; spin_unlock(&MSDOS_I(inode)->cache_lru_lock); tmp = fat_cache_alloc(inode); if (!tmp) { spin_lock(&MSDOS_I(inode)->cache_lru_lock); MSDOS_I(inode)->nr_caches--; spin_unlock(&MSDOS_I(inode)->cache_lru_lock); return; } spin_lock(&MSDOS_I(inode)->cache_lru_lock); cache = fat_cache_merge(inode, new); if (cache != NULL) { MSDOS_I(inode)->nr_caches--; fat_cache_free(tmp); goto out_update_lru; } cache = tmp; } else { struct list_head *p = MSDOS_I(inode)->cache_lru.prev; cache = list_entry(p, struct fat_cache, cache_list); } cache->fcluster = new->fcluster; cache->dcluster = new->dcluster; cache->nr_contig = new->nr_contig; } out_update_lru: fat_cache_update_lru(inode, cache); out: spin_unlock(&MSDOS_I(inode)->cache_lru_lock); } /* * Cache invalidation occurs rarely, thus the LRU chain is not updated. It * fixes itself after a while. */ static void __fat_cache_inval_inode(struct inode *inode) { struct msdos_inode_info *i = MSDOS_I(inode); struct fat_cache *cache; while (!list_empty(&i->cache_lru)) { cache = list_entry(i->cache_lru.next, struct fat_cache, cache_list); list_del_init(&cache->cache_list); i->nr_caches--; fat_cache_free(cache); } /* Update. The copy of caches before this id is discarded. */ i->cache_valid_id++; if (i->cache_valid_id == FAT_CACHE_VALID) i->cache_valid_id++; } void fat_cache_inval_inode(struct inode *inode) { spin_lock(&MSDOS_I(inode)->cache_lru_lock); __fat_cache_inval_inode(inode); spin_unlock(&MSDOS_I(inode)->cache_lru_lock); } static inline int cache_contiguous(struct fat_cache_id *cid, int dclus) { cid->nr_contig++; return ((cid->dcluster + cid->nr_contig) == dclus); } static inline void cache_init(struct fat_cache_id *cid, int fclus, int dclus) { cid->id = FAT_CACHE_VALID; cid->fcluster = fclus; cid->dcluster = dclus; cid->nr_contig = 0; } int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus) { struct super_block *sb = inode->i_sb; struct msdos_sb_info *sbi = MSDOS_SB(sb); const int limit = sb->s_maxbytes >> sbi->cluster_bits; struct fat_entry fatent; struct fat_cache_id cid; int nr; BUG_ON(MSDOS_I(inode)->i_start == 0); *fclus = 0; *dclus = MSDOS_I(inode)->i_start; if (!fat_valid_entry(sbi, *dclus)) { fat_fs_error_ratelimit(sb, "%s: invalid start cluster (i_pos %lld, start %08x)", __func__, MSDOS_I(inode)->i_pos, *dclus); return -EIO; } if (cluster == 0) return 0; if (fat_cache_lookup(inode, cluster, &cid, fclus, dclus) < 0) { /* * dummy, always not contiguous * This is reinitialized by cache_init(), later. 
*/ cache_init(&cid, -1, -1); } fatent_init(&fatent); while (*fclus < cluster) { /* prevent the infinite loop of cluster chain */ if (*fclus > limit) { fat_fs_error_ratelimit(sb, "%s: detected the cluster chain loop (i_pos %lld)", __func__, MSDOS_I(inode)->i_pos); nr = -EIO; goto out; } nr = fat_ent_read(inode, &fatent, *dclus); if (nr < 0) goto out; else if (nr == FAT_ENT_FREE) { fat_fs_error_ratelimit(sb, "%s: invalid cluster chain (i_pos %lld)", __func__, MSDOS_I(inode)->i_pos); nr = -EIO; goto out; } else if (nr == FAT_ENT_EOF) { fat_cache_add(inode, &cid); goto out; } (*fclus)++; *dclus = nr; if (!cache_contiguous(&cid, *dclus)) cache_init(&cid, *fclus, *dclus); } nr = 0; fat_cache_add(inode, &cid); out: fatent_brelse(&fatent); return nr; } static int fat_bmap_cluster(struct inode *inode, int cluster) { struct super_block *sb = inode->i_sb; int ret, fclus, dclus; if (MSDOS_I(inode)->i_start == 0) return 0; ret = fat_get_cluster(inode, cluster, &fclus, &dclus); if (ret < 0) return ret; else if (ret == FAT_ENT_EOF) { fat_fs_error(sb, "%s: request beyond EOF (i_pos %lld)", __func__, MSDOS_I(inode)->i_pos); return -EIO; } return dclus; } int fat_get_mapped_cluster(struct inode *inode, sector_t sector, sector_t last_block, unsigned long *mapped_blocks, sector_t *bmap) { struct super_block *sb = inode->i_sb; struct msdos_sb_info *sbi = MSDOS_SB(sb); int cluster, offset; cluster = sector >> (sbi->cluster_bits - sb->s_blocksize_bits); offset = sector & (sbi->sec_per_clus - 1); cluster = fat_bmap_cluster(inode, cluster); if (cluster < 0) return cluster; else if (cluster) { *bmap = fat_clus_to_blknr(sbi, cluster) + offset; *mapped_blocks = sbi->sec_per_clus - offset; if (*mapped_blocks > last_block - sector) *mapped_blocks = last_block - sector; } return 0; } static int is_exceed_eof(struct inode *inode, sector_t sector, sector_t *last_block, int create) { struct super_block *sb = inode->i_sb; const unsigned long blocksize = sb->s_blocksize; const unsigned char blocksize_bits = sb->s_blocksize_bits; *last_block = (i_size_read(inode) + (blocksize - 1)) >> blocksize_bits; if (sector >= *last_block) { if (!create) return 1; /* * ->mmu_private can access on only allocation path. * (caller must hold ->i_mutex) */ *last_block = (MSDOS_I(inode)->mmu_private + (blocksize - 1)) >> blocksize_bits; if (sector >= *last_block) return 1; } return 0; } int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys, unsigned long *mapped_blocks, int create, bool from_bmap) { struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb); sector_t last_block; *phys = 0; *mapped_blocks = 0; if (!is_fat32(sbi) && (inode->i_ino == MSDOS_ROOT_INO)) { if (sector < (sbi->dir_entries >> sbi->dir_per_block_bits)) { *phys = sector + sbi->dir_start; *mapped_blocks = 1; } return 0; } if (!from_bmap) { if (is_exceed_eof(inode, sector, &last_block, create)) return 0; } else { last_block = inode->i_blocks >> (inode->i_sb->s_blocksize_bits - 9); if (sector >= last_block) return 0; } return fat_get_mapped_cluster(inode, sector, last_block, mapped_blocks, phys); } |
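/*
 * Minimal, userspace-only sketch of the chain walk done by fat_get_cluster()
 * above: starting from the file's first on-disk cluster, follow a "next
 * cluster" table until the requested file-relative cluster is reached.  The
 * real code additionally caches (fcluster, dcluster, nr_contig) runs in the
 * per-inode fat_cache LRU so later lookups can skip most of the walk; that
 * cache is omitted here, and the table and names (fat[], get_cluster) are
 * hypothetical.
 */
#include <stdio.h>

#define ENT_EOF		-1
#define ENT_FREE	0

/* hypothetical FAT: fat[c] is the cluster that follows cluster c */
static const int fat[] = { 0, 0, 3, 4, 9, 0, 0, 0, 0, ENT_EOF };

static int get_cluster(int start, int want_fclus, int *fclus, int *dclus)
{
	*fclus = 0;
	*dclus = start;

	while (*fclus < want_fclus) {
		int next = fat[*dclus];

		if (next == ENT_FREE)
			return -1;	/* corrupted chain */
		if (next == ENT_EOF)
			return 1;	/* chain ends before want_fclus */
		(*fclus)++;
		*dclus = next;
	}
	return 0;
}

int main(void)
{
	int fclus, dclus;

	/* file starts at disk cluster 2; ask for its 4th cluster (index 3) */
	if (get_cluster(2, 3, &fclus, &dclus) == 0)
		printf("file cluster %d lives in disk cluster %d\n", fclus, dclus);
	return 0;
}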
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * lib/textsearch.c	Generic text search interface
 *
 * Authors:	Thomas Graf <tgraf@suug.ch>
 *		Pablo Neira Ayuso <pablo@netfilter.org>
 *
 * ==========================================================================
 */

/**
 * DOC: ts_intro
 * INTRODUCTION
 *
 *   The textsearch infrastructure provides text searching facilities for
 *   both linear and non-linear data. Individual search algorithms are
 *   implemented in modules and chosen by the user.
 *
 * ARCHITECTURE
 *
 * .. code-block:: none
 *
 *     User
 *     +----------------+
 *     |        finish()|<--------------(6)-----------------+
 *     |get_next_block()|<--------------(5)---------------+ |
 *     |                |                 Algorithm       | |
 *     |                |    +------------------------------+
 *     |                |    |  init()   find()   destroy() |
 *     |                |    +------------------------------+
 *     |                |  Core API          ^       ^     ^
 *     |                |  +---------------+(2)     (4)   (8)
 *     |             (1)|----->| prepare()     |---+  |     |
 *     |             (3)|----->| find()/next() |------+     |
 *     |             (7)|----->| destroy()     |------------+
 *     +----------------+  +---------------+
 *
 *   (1) User configures a search by calling textsearch_prepare() specifying
 *       the search parameters such as the pattern and algorithm name.
 *   (2) Core requests the algorithm to allocate and initialize a search
 *       configuration according to the specified parameters.
 *   (3) User starts the search(es) by calling textsearch_find() or
 *       textsearch_next() to fetch subsequent occurrences. A state variable
 *       is provided to the algorithm to store persistent variables.
 *   (4) Core eventually resets the search offset and forwards the find()
 *       request to the algorithm.
 *   (5) Algorithm calls get_next_block() provided by the user continuously
 *       to fetch the data to be searched, block by block.
 *   (6) Algorithm invokes finish() after the last call to get_next_block()
 *       to clean up any leftovers from get_next_block(). (Optional)
 *   (7) User destroys the configuration by calling textsearch_destroy().
 *   (8) Core notifies the algorithm to destroy algorithm specific
 *       allocations. (Optional)
 *
 * USAGE
 *
 *   Before a search can be performed, a configuration must be created
 *   by calling textsearch_prepare() specifying the searching algorithm,
 *   the pattern to look for and flags. As a flag, you can set TS_IGNORECASE
 *   to perform case insensitive matching.
But it might slow down * performance of algorithm, so you should use it at own your risk. * The returned configuration may then be used for an arbitrary * amount of times and even in parallel as long as a separate struct * ts_state variable is provided to every instance. * * The actual search is performed by either calling * textsearch_find_continuous() for linear data or by providing * an own get_next_block() implementation and * calling textsearch_find(). Both functions return * the position of the first occurrence of the pattern or UINT_MAX if * no match was found. Subsequent occurrences can be found by calling * textsearch_next() regardless of the linearity of the data. * * Once you're done using a configuration it must be given back via * textsearch_destroy. * * EXAMPLE:: * * int pos; * struct ts_config *conf; * struct ts_state state; * const char *pattern = "chicken"; * const char *example = "We dance the funky chicken"; * * conf = textsearch_prepare("kmp", pattern, strlen(pattern), * GFP_KERNEL, TS_AUTOLOAD); * if (IS_ERR(conf)) { * err = PTR_ERR(conf); * goto errout; * } * * pos = textsearch_find_continuous(conf, &state, example, strlen(example)); * if (pos != UINT_MAX) * panic("Oh my god, dancing chickens at %d\n", pos); * * textsearch_destroy(conf); */ /* ========================================================================== */ #include <linux/module.h> #include <linux/types.h> #include <linux/string.h> #include <linux/init.h> #include <linux/rculist.h> #include <linux/rcupdate.h> #include <linux/err.h> #include <linux/textsearch.h> #include <linux/slab.h> static LIST_HEAD(ts_ops); static DEFINE_SPINLOCK(ts_mod_lock); static inline struct ts_ops *lookup_ts_algo(const char *name) { struct ts_ops *o; rcu_read_lock(); list_for_each_entry_rcu(o, &ts_ops, list) { if (!strcmp(name, o->name)) { if (!try_module_get(o->owner)) o = NULL; rcu_read_unlock(); return o; } } rcu_read_unlock(); return NULL; } /** * textsearch_register - register a textsearch module * @ops: operations lookup table * * This function must be called by textsearch modules to announce * their presence. The specified &@ops must have %name set to a * unique identifier and the callbacks find(), init(), get_pattern(), * and get_pattern_len() must be implemented. * * Returns 0 or -EEXISTS if another module has already registered * with same name. */ int textsearch_register(struct ts_ops *ops) { int err = -EEXIST; struct ts_ops *o; if (ops->name == NULL || ops->find == NULL || ops->init == NULL || ops->get_pattern == NULL || ops->get_pattern_len == NULL) return -EINVAL; spin_lock(&ts_mod_lock); list_for_each_entry(o, &ts_ops, list) { if (!strcmp(ops->name, o->name)) goto errout; } list_add_tail_rcu(&ops->list, &ts_ops); err = 0; errout: spin_unlock(&ts_mod_lock); return err; } EXPORT_SYMBOL(textsearch_register); /** * textsearch_unregister - unregister a textsearch module * @ops: operations lookup table * * This function must be called by textsearch modules to announce * their disappearance for examples when the module gets unloaded. * The &ops parameter must be the same as the one during the * registration. * * Returns 0 on success or -ENOENT if no matching textsearch * registration was found. 
*/ int textsearch_unregister(struct ts_ops *ops) { int err = 0; struct ts_ops *o; spin_lock(&ts_mod_lock); list_for_each_entry(o, &ts_ops, list) { if (o == ops) { list_del_rcu(&o->list); goto out; } } err = -ENOENT; out: spin_unlock(&ts_mod_lock); return err; } EXPORT_SYMBOL(textsearch_unregister); struct ts_linear_state { unsigned int len; const void *data; }; static unsigned int get_linear_data(unsigned int consumed, const u8 **dst, struct ts_config *conf, struct ts_state *state) { struct ts_linear_state *st = (struct ts_linear_state *) state->cb; if (likely(consumed < st->len)) { *dst = st->data + consumed; return st->len - consumed; } return 0; } /** * textsearch_find_continuous - search a pattern in continuous/linear data * @conf: search configuration * @state: search state * @data: data to search in * @len: length of data * * A simplified version of textsearch_find() for continuous/linear data. * Call textsearch_next() to retrieve subsequent matches. * * Returns the position of first occurrence of the pattern or * %UINT_MAX if no occurrence was found. */ unsigned int textsearch_find_continuous(struct ts_config *conf, struct ts_state *state, const void *data, unsigned int len) { struct ts_linear_state *st = (struct ts_linear_state *) state->cb; conf->get_next_block = get_linear_data; st->data = data; st->len = len; return textsearch_find(conf, state); } EXPORT_SYMBOL(textsearch_find_continuous); /** * textsearch_prepare - Prepare a search * @algo: name of search algorithm * @pattern: pattern data * @len: length of pattern * @gfp_mask: allocation mask * @flags: search flags * * Looks up the search algorithm module and creates a new textsearch * configuration for the specified pattern. * * Note: The format of the pattern may not be compatible between * the various search algorithms. * * Returns a new textsearch configuration according to the specified * parameters or a ERR_PTR(). If a zero length pattern is passed, this * function returns EINVAL. */ struct ts_config *textsearch_prepare(const char *algo, const void *pattern, unsigned int len, gfp_t gfp_mask, int flags) { int err = -ENOENT; struct ts_config *conf; struct ts_ops *ops; if (len == 0) return ERR_PTR(-EINVAL); ops = lookup_ts_algo(algo); #ifdef CONFIG_MODULES /* * Why not always autoload you may ask. Some users are * in a situation where requesting a module may deadlock, * especially when the module is located on a NFS mount. */ if (ops == NULL && flags & TS_AUTOLOAD) { request_module("ts_%s", algo); ops = lookup_ts_algo(algo); } #endif if (ops == NULL) goto errout; conf = ops->init(pattern, len, gfp_mask, flags); if (IS_ERR(conf)) { err = PTR_ERR(conf); goto errout; } conf->ops = ops; return conf; errout: if (ops) module_put(ops->owner); return ERR_PTR(err); } EXPORT_SYMBOL(textsearch_prepare); /** * textsearch_destroy - destroy a search configuration * @conf: search configuration * * Releases all references of the configuration and frees * up the memory. */ void textsearch_destroy(struct ts_config *conf) { if (conf->ops) { if (conf->ops->destroy) conf->ops->destroy(conf); module_put(conf->ops->owner); } kfree(conf); } EXPORT_SYMBOL(textsearch_destroy); |
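/*
 * Sketch of a custom get_next_block() callback, modelled on get_linear_data()
 * above but feeding the search from two separate buffers as a stand-in for
 * genuinely non-linear data.  The struct and function names are made up for
 * this example and it is not compilable outside a kernel build; a real user
 * would point conf->get_next_block at the callback before calling
 * textsearch_find(), as textsearch_find_continuous() does above.
 */
#include <linux/textsearch.h>

struct two_chunk_state {
	const u8 *chunk[2];
	unsigned int len[2];
};

static unsigned int get_two_chunks(unsigned int consumed, const u8 **dst,
				   struct ts_config *conf,
				   struct ts_state *state)
{
	struct two_chunk_state *st = (struct two_chunk_state *)state->cb;

	if (consumed < st->len[0]) {
		*dst = st->chunk[0] + consumed;
		return st->len[0] - consumed;
	}
	consumed -= st->len[0];
	if (consumed < st->len[1]) {
		*dst = st->chunk[1] + consumed;
		return st->len[1] - consumed;
	}
	return 0;	/* no more data to search */
}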
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright(c) 2017 Intel Corporation. All rights reserved.
*/ #include <linux/pagemap.h> #include <linux/module.h> #include <linux/mount.h> #include <linux/pseudo_fs.h> #include <linux/magic.h> #include <linux/pfn_t.h> #include <linux/cdev.h> #include <linux/slab.h> #include <linux/uio.h> #include <linux/dax.h> #include <linux/fs.h> #include <linux/cacheinfo.h> #include "dax-private.h" /** * struct dax_device - anchor object for dax services * @inode: core vfs * @cdev: optional character interface for "device dax" * @private: dax driver private data * @flags: state and boolean properties * @ops: operations for this device * @holder_data: holder of a dax_device: could be filesystem or mapped device * @holder_ops: operations for the inner holder */ struct dax_device { struct inode inode; struct cdev cdev; void *private; unsigned long flags; const struct dax_operations *ops; void *holder_data; const struct dax_holder_operations *holder_ops; }; static dev_t dax_devt; DEFINE_STATIC_SRCU(dax_srcu); static struct vfsmount *dax_mnt; static DEFINE_IDA(dax_minor_ida); static struct kmem_cache *dax_cache __read_mostly; static struct super_block *dax_superblock __read_mostly; int dax_read_lock(void) { return srcu_read_lock(&dax_srcu); } EXPORT_SYMBOL_GPL(dax_read_lock); void dax_read_unlock(int id) { srcu_read_unlock(&dax_srcu, id); } EXPORT_SYMBOL_GPL(dax_read_unlock); #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX) #include <linux/blkdev.h> static DEFINE_XARRAY(dax_hosts); int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk) { return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev, GFP_KERNEL); } EXPORT_SYMBOL_GPL(dax_add_host); void dax_remove_host(struct gendisk *disk) { xa_erase(&dax_hosts, (unsigned long)disk); } EXPORT_SYMBOL_GPL(dax_remove_host); /** * fs_dax_get_by_bdev() - temporary lookup mechanism for filesystem-dax * @bdev: block device to find a dax_device for * @start_off: returns the byte offset into the dax_device that @bdev starts * @holder: filesystem or mapped device inside the dax_device * @ops: operations for the inner holder */ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off, void *holder, const struct dax_holder_operations *ops) { struct dax_device *dax_dev; u64 part_size; int id; if (!blk_queue_dax(bdev->bd_disk->queue)) return NULL; *start_off = get_start_sect(bdev) * SECTOR_SIZE; part_size = bdev_nr_sectors(bdev) * SECTOR_SIZE; if (*start_off % PAGE_SIZE || part_size % PAGE_SIZE) { pr_info("%pg: error: unaligned partition for dax\n", bdev); return NULL; } id = dax_read_lock(); dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk); if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode)) dax_dev = NULL; else if (holder) { if (!cmpxchg(&dax_dev->holder_data, NULL, holder)) dax_dev->holder_ops = ops; else dax_dev = NULL; } dax_read_unlock(id); return dax_dev; } EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev); void fs_put_dax(struct dax_device *dax_dev, void *holder) { if (dax_dev && holder && cmpxchg(&dax_dev->holder_data, holder, NULL) == holder) dax_dev->holder_ops = NULL; put_dax(dax_dev); } EXPORT_SYMBOL_GPL(fs_put_dax); #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */ enum dax_device_flags { /* !alive + rcu grace period == no new operations / mappings */ DAXDEV_ALIVE, /* gate whether dax_flush() calls the low level flush routine */ DAXDEV_WRITE_CACHE, /* flag to check if device supports synchronous flush */ DAXDEV_SYNC, /* do not leave the caches dirty after writes */ DAXDEV_NOCACHE, /* handle CPU fetch exceptions during reads */ DAXDEV_NOMC, }; /** * dax_direct_access() - 
translate a device pgoff to an absolute pfn * @dax_dev: a dax_device instance representing the logical memory range * @pgoff: offset in pages from the start of the device to translate * @nr_pages: number of consecutive pages caller can handle relative to @pfn * @mode: indicator on normal access or recovery write * @kaddr: output parameter that returns a virtual address mapping of pfn * @pfn: output parameter that returns an absolute pfn translation of @pgoff * * Return: negative errno if an error occurs, otherwise the number of * pages accessible at the device relative @pgoff. */ long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages, enum dax_access_mode mode, void **kaddr, pfn_t *pfn) { long avail; if (!dax_dev) return -EOPNOTSUPP; if (!dax_alive(dax_dev)) return -ENXIO; if (nr_pages < 0) return -EINVAL; avail = dax_dev->ops->direct_access(dax_dev, pgoff, nr_pages, mode, kaddr, pfn); if (!avail) return -ERANGE; return min(avail, nr_pages); } EXPORT_SYMBOL_GPL(dax_direct_access); size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i) { if (!dax_alive(dax_dev)) return 0; /* * The userspace address for the memory copy has already been validated * via access_ok() in vfs_write, so use the 'no check' version to bypass * the HARDENED_USERCOPY overhead. */ if (test_bit(DAXDEV_NOCACHE, &dax_dev->flags)) return _copy_from_iter_flushcache(addr, bytes, i); return _copy_from_iter(addr, bytes, i); } size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i) { if (!dax_alive(dax_dev)) return 0; /* * The userspace address for the memory copy has already been validated * via access_ok() in vfs_red, so use the 'no check' version to bypass * the HARDENED_USERCOPY overhead. */ if (test_bit(DAXDEV_NOMC, &dax_dev->flags)) return _copy_mc_to_iter(addr, bytes, i); return _copy_to_iter(addr, bytes, i); } int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff, size_t nr_pages) { int ret; if (!dax_alive(dax_dev)) return -ENXIO; /* * There are no callers that want to zero more than one page as of now. * Once users are there, this check can be removed after the * device mapper code has been updated to split ranges across targets. 
*/ if (nr_pages != 1) return -EIO; ret = dax_dev->ops->zero_page_range(dax_dev, pgoff, nr_pages); return dax_mem2blk_err(ret); } EXPORT_SYMBOL_GPL(dax_zero_page_range); size_t dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *iter) { if (!dax_dev->ops->recovery_write) return 0; return dax_dev->ops->recovery_write(dax_dev, pgoff, addr, bytes, iter); } EXPORT_SYMBOL_GPL(dax_recovery_write); int dax_holder_notify_failure(struct dax_device *dax_dev, u64 off, u64 len, int mf_flags) { int rc, id; id = dax_read_lock(); if (!dax_alive(dax_dev)) { rc = -ENXIO; goto out; } if (!dax_dev->holder_ops) { rc = -EOPNOTSUPP; goto out; } rc = dax_dev->holder_ops->notify_failure(dax_dev, off, len, mf_flags); out: dax_read_unlock(id); return rc; } EXPORT_SYMBOL_GPL(dax_holder_notify_failure); #ifdef CONFIG_ARCH_HAS_PMEM_API void arch_wb_cache_pmem(void *addr, size_t size); void dax_flush(struct dax_device *dax_dev, void *addr, size_t size) { if (unlikely(!dax_write_cache_enabled(dax_dev))) return; arch_wb_cache_pmem(addr, size); } #else void dax_flush(struct dax_device *dax_dev, void *addr, size_t size) { } #endif EXPORT_SYMBOL_GPL(dax_flush); void dax_write_cache(struct dax_device *dax_dev, bool wc) { if (wc) set_bit(DAXDEV_WRITE_CACHE, &dax_dev->flags); else clear_bit(DAXDEV_WRITE_CACHE, &dax_dev->flags); } EXPORT_SYMBOL_GPL(dax_write_cache); bool dax_write_cache_enabled(struct dax_device *dax_dev) { return test_bit(DAXDEV_WRITE_CACHE, &dax_dev->flags); } EXPORT_SYMBOL_GPL(dax_write_cache_enabled); bool dax_synchronous(struct dax_device *dax_dev) { return test_bit(DAXDEV_SYNC, &dax_dev->flags); } EXPORT_SYMBOL_GPL(dax_synchronous); void set_dax_synchronous(struct dax_device *dax_dev) { set_bit(DAXDEV_SYNC, &dax_dev->flags); } EXPORT_SYMBOL_GPL(set_dax_synchronous); void set_dax_nocache(struct dax_device *dax_dev) { set_bit(DAXDEV_NOCACHE, &dax_dev->flags); } EXPORT_SYMBOL_GPL(set_dax_nocache); void set_dax_nomc(struct dax_device *dax_dev) { set_bit(DAXDEV_NOMC, &dax_dev->flags); } EXPORT_SYMBOL_GPL(set_dax_nomc); bool dax_alive(struct dax_device *dax_dev) { lockdep_assert_held(&dax_srcu); return test_bit(DAXDEV_ALIVE, &dax_dev->flags); } EXPORT_SYMBOL_GPL(dax_alive); /* * Note, rcu is not protecting the liveness of dax_dev, rcu is ensuring * that any fault handlers or operations that might have seen * dax_alive(), have completed. Any operations that start after * synchronize_srcu() has run will abort upon seeing !dax_alive(). * * Note, because alloc_dax() returns an ERR_PTR() on error, callers * typically store its result into a local variable in order to check * the result. Therefore, care must be taken to populate the struct * device dax_dev field make sure the dax_dev is not leaked. 
*/ void kill_dax(struct dax_device *dax_dev) { if (!dax_dev) return; if (dax_dev->holder_data != NULL) dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE); clear_bit(DAXDEV_ALIVE, &dax_dev->flags); synchronize_srcu(&dax_srcu); /* clear holder data */ dax_dev->holder_ops = NULL; dax_dev->holder_data = NULL; } EXPORT_SYMBOL_GPL(kill_dax); void run_dax(struct dax_device *dax_dev) { set_bit(DAXDEV_ALIVE, &dax_dev->flags); } EXPORT_SYMBOL_GPL(run_dax); static struct inode *dax_alloc_inode(struct super_block *sb) { struct dax_device *dax_dev; struct inode *inode; dax_dev = alloc_inode_sb(sb, dax_cache, GFP_KERNEL); if (!dax_dev) return NULL; inode = &dax_dev->inode; inode->i_rdev = 0; return inode; } static struct dax_device *to_dax_dev(struct inode *inode) { return container_of(inode, struct dax_device, inode); } static void dax_free_inode(struct inode *inode) { struct dax_device *dax_dev = to_dax_dev(inode); if (inode->i_rdev) ida_free(&dax_minor_ida, iminor(inode)); kmem_cache_free(dax_cache, dax_dev); } static void dax_destroy_inode(struct inode *inode) { struct dax_device *dax_dev = to_dax_dev(inode); WARN_ONCE(test_bit(DAXDEV_ALIVE, &dax_dev->flags), "kill_dax() must be called before final iput()\n"); } static const struct super_operations dax_sops = { .statfs = simple_statfs, .alloc_inode = dax_alloc_inode, .destroy_inode = dax_destroy_inode, .free_inode = dax_free_inode, .drop_inode = generic_delete_inode, }; static int dax_init_fs_context(struct fs_context *fc) { struct pseudo_fs_context *ctx = init_pseudo(fc, DAXFS_MAGIC); if (!ctx) return -ENOMEM; ctx->ops = &dax_sops; return 0; } static struct file_system_type dax_fs_type = { .name = "dax", .init_fs_context = dax_init_fs_context, .kill_sb = kill_anon_super, }; static int dax_test(struct inode *inode, void *data) { dev_t devt = *(dev_t *) data; return inode->i_rdev == devt; } static int dax_set(struct inode *inode, void *data) { dev_t devt = *(dev_t *) data; inode->i_rdev = devt; return 0; } static struct dax_device *dax_dev_get(dev_t devt) { struct dax_device *dax_dev; struct inode *inode; inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31), dax_test, dax_set, &devt); if (!inode) return NULL; dax_dev = to_dax_dev(inode); if (inode->i_state & I_NEW) { set_bit(DAXDEV_ALIVE, &dax_dev->flags); inode->i_cdev = &dax_dev->cdev; inode->i_mode = S_IFCHR; inode->i_flags = S_DAX; mapping_set_gfp_mask(&inode->i_data, GFP_USER); unlock_new_inode(inode); } return dax_dev; } struct dax_device *alloc_dax(void *private, const struct dax_operations *ops) { struct dax_device *dax_dev; dev_t devt; int minor; /* * Unavailable on architectures with virtually aliased data caches, * except for device-dax (NULL operations pointer), which does * not use aliased mappings from the kernel. 
*/ if (ops && cpu_dcache_is_aliasing()) return ERR_PTR(-EOPNOTSUPP); if (WARN_ON_ONCE(ops && !ops->zero_page_range)) return ERR_PTR(-EINVAL); minor = ida_alloc_max(&dax_minor_ida, MINORMASK, GFP_KERNEL); if (minor < 0) return ERR_PTR(-ENOMEM); devt = MKDEV(MAJOR(dax_devt), minor); dax_dev = dax_dev_get(devt); if (!dax_dev) goto err_dev; dax_dev->ops = ops; dax_dev->private = private; return dax_dev; err_dev: ida_free(&dax_minor_ida, minor); return ERR_PTR(-ENOMEM); } EXPORT_SYMBOL_GPL(alloc_dax); void put_dax(struct dax_device *dax_dev) { if (!dax_dev) return; iput(&dax_dev->inode); } EXPORT_SYMBOL_GPL(put_dax); /** * dax_holder() - obtain the holder of a dax device * @dax_dev: a dax_device instance * * Return: the holder's data which represents the holder if registered, * otherwize NULL. */ void *dax_holder(struct dax_device *dax_dev) { return dax_dev->holder_data; } EXPORT_SYMBOL_GPL(dax_holder); /** * inode_dax: convert a public inode into its dax_dev * @inode: An inode with i_cdev pointing to a dax_dev * * Note this is not equivalent to to_dax_dev() which is for private * internal use where we know the inode filesystem type == dax_fs_type. */ struct dax_device *inode_dax(struct inode *inode) { struct cdev *cdev = inode->i_cdev; return container_of(cdev, struct dax_device, cdev); } EXPORT_SYMBOL_GPL(inode_dax); struct inode *dax_inode(struct dax_device *dax_dev) { return &dax_dev->inode; } EXPORT_SYMBOL_GPL(dax_inode); void *dax_get_private(struct dax_device *dax_dev) { if (!test_bit(DAXDEV_ALIVE, &dax_dev->flags)) return NULL; return dax_dev->private; } EXPORT_SYMBOL_GPL(dax_get_private); static void init_once(void *_dax_dev) { struct dax_device *dax_dev = _dax_dev; struct inode *inode = &dax_dev->inode; memset(dax_dev, 0, sizeof(*dax_dev)); inode_init_once(inode); } static int dax_fs_init(void) { int rc; dax_cache = kmem_cache_create("dax_cache", sizeof(struct dax_device), 0, SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT, init_once); if (!dax_cache) return -ENOMEM; dax_mnt = kern_mount(&dax_fs_type); if (IS_ERR(dax_mnt)) { rc = PTR_ERR(dax_mnt); goto err_mount; } dax_superblock = dax_mnt->mnt_sb; return 0; err_mount: kmem_cache_destroy(dax_cache); return rc; } static void dax_fs_exit(void) { kern_unmount(dax_mnt); rcu_barrier(); kmem_cache_destroy(dax_cache); } static int __init dax_core_init(void) { int rc; rc = dax_fs_init(); if (rc) return rc; rc = alloc_chrdev_region(&dax_devt, 0, MINORMASK+1, "dax"); if (rc) goto err_chrdev; rc = dax_bus_init(); if (rc) goto err_bus; return 0; err_bus: unregister_chrdev_region(dax_devt, MINORMASK+1); err_chrdev: dax_fs_exit(); return 0; } static void __exit dax_core_exit(void) { dax_bus_exit(); unregister_chrdev_region(dax_devt, MINORMASK+1); ida_destroy(&dax_minor_ida); dax_fs_exit(); } MODULE_AUTHOR("Intel Corporation"); MODULE_DESCRIPTION("DAX: direct access to differentiated memory"); MODULE_LICENSE("GPL v2"); subsys_initcall(dax_core_init); module_exit(dax_core_exit); |
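/*
 * Hedged sketch of a dax_direct_access() caller (not taken from any in-tree
 * user): walk a page range of a dax_device under dax_read_lock(), asking for
 * as many contiguous pages as possible per call.  walk_dax_range() is an
 * illustrative name, and DAX_ACCESS is assumed to be the normal-access value
 * of enum dax_access_mode from <linux/dax.h>.
 */
#include <linux/dax.h>

static int walk_dax_range(struct dax_device *dax_dev, pgoff_t pgoff,
			  long nr_pages)
{
	int id = dax_read_lock();
	long rc = 0;

	while (nr_pages > 0) {
		void *kaddr;

		/*
		 * Returns how many pages are accessible at @pgoff (at least
		 * one), or a negative errno; see dax_direct_access() above.
		 */
		rc = dax_direct_access(dax_dev, pgoff, nr_pages,
				       DAX_ACCESS, &kaddr, NULL);
		if (rc < 0)
			break;

		/* ... use kaddr for rc * PAGE_SIZE bytes ... */

		pgoff += rc;
		nr_pages -= rc;
	}

	dax_read_unlock(id);
	return rc < 0 ? rc : 0;
}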
// SPDX-License-Identifier: GPL-2.0-only /* * IBSS mode implementation * Copyright 2003-2008, Jouni Malinen <j@w1.fi> * Copyright 2004, Instant802 Networks, Inc. * Copyright 2005, Devicescape Software, Inc. * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> * Copyright 2007, Michael Wu <flamingice@sourmilk.net> * Copyright 2009, Johannes Berg <johannes@sipsolutions.net> * Copyright 2013-2014 Intel Mobile Communications GmbH * Copyright(c) 2016 Intel Deutschland GmbH * Copyright(c) 2018-2024 Intel Corporation */ #include <linux/delay.h> #include <linux/slab.h> #include <linux/if_ether.h> #include <linux/skbuff.h> #include <linux/if_arp.h> #include <linux/etherdevice.h> #include <linux/rtnetlink.h> #include <net/mac80211.h> #include "ieee80211_i.h" #include "driver-ops.h" #include "rate.h" #define IEEE80211_SCAN_INTERVAL (2 * HZ) #define IEEE80211_IBSS_JOIN_TIMEOUT (7 * HZ) #define IEEE80211_IBSS_MERGE_INTERVAL (30 * HZ) #define IEEE80211_IBSS_INACTIVITY_LIMIT (60 * HZ) #define IEEE80211_IBSS_RSN_INACTIVITY_LIMIT (10 * HZ) #define IEEE80211_IBSS_MAX_STA_ENTRIES 128 static struct beacon_data * ieee80211_ibss_build_presp(struct ieee80211_sub_if_data *sdata, const int beacon_int, const u32 basic_rates, const u16 capability, u64 tsf, struct cfg80211_chan_def *chandef, bool *have_higher_than_11mbit, struct cfg80211_csa_settings *csa_settings) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_local *local = sdata->local; int rates_n = 0, i, ri; struct ieee80211_mgmt *mgmt; u8 *pos; struct ieee80211_supported_band *sband; u32 rate_flags, rates = 0, rates_added = 0; struct beacon_data *presp; int frame_len; /* Build IBSS probe response */ frame_len = sizeof(struct ieee80211_hdr_3addr) + 12 /* struct ieee80211_mgmt.u.beacon */ + 2 + IEEE80211_MAX_SSID_LEN /* max SSID */ + 2 + 8 /* max Supported Rates */ + 3 /* max DS params */ + 4 /* IBSS params */ + 5 /* Channel Switch Announcement */ + 2 + (IEEE80211_MAX_SUPP_RATES - 8) + 2 + sizeof(struct ieee80211_ht_cap) + 2 + sizeof(struct ieee80211_ht_operation) + 2 + sizeof(struct ieee80211_vht_cap) + 2 + sizeof(struct ieee80211_vht_operation) + ifibss->ie_len; presp = kzalloc(sizeof(*presp) + frame_len, GFP_KERNEL); if (!presp) return NULL; presp->head = (void *)(presp + 1); mgmt = (void *) presp->head;
mgmt->frame_control = cpu_to_le16(IEEE80211_FTYPE_MGMT | IEEE80211_STYPE_PROBE_RESP); eth_broadcast_addr(mgmt->da); memcpy(mgmt->sa, sdata->vif.addr, ETH_ALEN); memcpy(mgmt->bssid, ifibss->bssid, ETH_ALEN); mgmt->u.beacon.beacon_int = cpu_to_le16(beacon_int); mgmt->u.beacon.timestamp = cpu_to_le64(tsf); mgmt->u.beacon.capab_info = cpu_to_le16(capability); pos = (u8 *)mgmt + offsetof(struct ieee80211_mgmt, u.beacon.variable); *pos++ = WLAN_EID_SSID; *pos++ = ifibss->ssid_len; memcpy(pos, ifibss->ssid, ifibss->ssid_len); pos += ifibss->ssid_len; sband = local->hw.wiphy->bands[chandef->chan->band]; rate_flags = ieee80211_chandef_rate_flags(chandef); rates_n = 0; if (have_higher_than_11mbit) *have_higher_than_11mbit = false; for (i = 0; i < sband->n_bitrates; i++) { if ((rate_flags & sband->bitrates[i].flags) != rate_flags) continue; if (sband->bitrates[i].bitrate > 110 && have_higher_than_11mbit) *have_higher_than_11mbit = true; rates |= BIT(i); rates_n++; } *pos++ = WLAN_EID_SUPP_RATES; *pos++ = min_t(int, 8, rates_n); for (ri = 0; ri < sband->n_bitrates; ri++) { int rate = DIV_ROUND_UP(sband->bitrates[ri].bitrate, 5); u8 basic = 0; if (!(rates & BIT(ri))) continue; if (basic_rates & BIT(ri)) basic = 0x80; *pos++ = basic | (u8) rate; if (++rates_added == 8) { ri++; /* continue at next rate for EXT_SUPP_RATES */ break; } } if (sband->band == NL80211_BAND_2GHZ) { *pos++ = WLAN_EID_DS_PARAMS; *pos++ = 1; *pos++ = ieee80211_frequency_to_channel( chandef->chan->center_freq); } *pos++ = WLAN_EID_IBSS_PARAMS; *pos++ = 2; /* FIX: set ATIM window based on scan results */ *pos++ = 0; *pos++ = 0; if (csa_settings) { *pos++ = WLAN_EID_CHANNEL_SWITCH; *pos++ = 3; *pos++ = csa_settings->block_tx ? 1 : 0; *pos++ = ieee80211_frequency_to_channel( csa_settings->chandef.chan->center_freq); presp->cntdwn_counter_offsets[0] = (pos - presp->head); *pos++ = csa_settings->count; presp->cntdwn_current_counter = csa_settings->count; } /* put the remaining rates in WLAN_EID_EXT_SUPP_RATES */ if (rates_n > 8) { *pos++ = WLAN_EID_EXT_SUPP_RATES; *pos++ = rates_n - 8; for (; ri < sband->n_bitrates; ri++) { int rate = DIV_ROUND_UP(sband->bitrates[ri].bitrate, 5); u8 basic = 0; if (!(rates & BIT(ri))) continue; if (basic_rates & BIT(ri)) basic = 0x80; *pos++ = basic | (u8) rate; } } if (ifibss->ie_len) { memcpy(pos, ifibss->ie, ifibss->ie_len); pos += ifibss->ie_len; } /* add HT capability and information IEs */ if (chandef->width != NL80211_CHAN_WIDTH_20_NOHT && chandef->width != NL80211_CHAN_WIDTH_5 && chandef->width != NL80211_CHAN_WIDTH_10 && sband->ht_cap.ht_supported) { struct ieee80211_sta_ht_cap ht_cap; memcpy(&ht_cap, &sband->ht_cap, sizeof(ht_cap)); ieee80211_apply_htcap_overrides(sdata, &ht_cap); pos = ieee80211_ie_build_ht_cap(pos, &ht_cap, ht_cap.cap); /* * Note: According to 802.11n-2009 9.13.3.1, HT Protection * field and RIFS Mode are reserved in IBSS mode, therefore * keep them at 0 */ pos = ieee80211_ie_build_ht_oper(pos, &sband->ht_cap, chandef, 0, false); /* add VHT capability and information IEs */ if (chandef->width != NL80211_CHAN_WIDTH_20 && chandef->width != NL80211_CHAN_WIDTH_40 && sband->vht_cap.vht_supported) { pos = ieee80211_ie_build_vht_cap(pos, &sband->vht_cap, sband->vht_cap.cap); pos = ieee80211_ie_build_vht_oper(pos, &sband->vht_cap, chandef); } } if (local->hw.queues >= IEEE80211_NUM_ACS) pos = ieee80211_add_wmm_info_ie(pos, 0); /* U-APSD not in use */ presp->head_len = pos - presp->head; if (WARN_ON(presp->head_len > frame_len)) goto error; return presp; error: kfree(presp); return 
NULL; } static void __ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata, const u8 *bssid, const int beacon_int, struct cfg80211_chan_def *req_chandef, const u32 basic_rates, const u16 capability, u64 tsf, bool creator) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_local *local = sdata->local; struct ieee80211_mgmt *mgmt; struct cfg80211_bss *bss; u64 bss_change; struct ieee80211_chan_req chanreq = {}; struct ieee80211_channel *chan; struct beacon_data *presp; struct cfg80211_inform_bss bss_meta = {}; bool have_higher_than_11mbit; bool radar_required; int err; lockdep_assert_wiphy(local->hw.wiphy); /* Reset own TSF to allow time synchronization work. */ drv_reset_tsf(local, sdata); if (!ether_addr_equal(ifibss->bssid, bssid)) sta_info_flush(sdata, -1); /* if merging, indicate to driver that we leave the old IBSS */ if (sdata->vif.cfg.ibss_joined) { sdata->vif.cfg.ibss_joined = false; sdata->vif.cfg.ibss_creator = false; sdata->vif.bss_conf.enable_beacon = false; netif_carrier_off(sdata->dev); ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_IBSS | BSS_CHANGED_BEACON_ENABLED); drv_leave_ibss(local, sdata); } presp = sdata_dereference(ifibss->presp, sdata); RCU_INIT_POINTER(ifibss->presp, NULL); if (presp) kfree_rcu(presp, rcu_head); /* make a copy of the chandef, it could be modified below. */ chanreq.oper = *req_chandef; chan = chanreq.oper.chan; if (!cfg80211_reg_can_beacon(local->hw.wiphy, &chanreq.oper, NL80211_IFTYPE_ADHOC)) { if (chanreq.oper.width == NL80211_CHAN_WIDTH_5 || chanreq.oper.width == NL80211_CHAN_WIDTH_10 || chanreq.oper.width == NL80211_CHAN_WIDTH_20_NOHT || chanreq.oper.width == NL80211_CHAN_WIDTH_20) { sdata_info(sdata, "Failed to join IBSS, beacons forbidden\n"); return; } chanreq.oper.width = NL80211_CHAN_WIDTH_20; chanreq.oper.center_freq1 = chan->center_freq; /* check again for downgraded chandef */ if (!cfg80211_reg_can_beacon(local->hw.wiphy, &chanreq.oper, NL80211_IFTYPE_ADHOC)) { sdata_info(sdata, "Failed to join IBSS, beacons forbidden\n"); return; } } err = cfg80211_chandef_dfs_required(sdata->local->hw.wiphy, &chanreq.oper, NL80211_IFTYPE_ADHOC); if (err < 0) { sdata_info(sdata, "Failed to join IBSS, invalid chandef\n"); return; } if (err > 0 && !ifibss->userspace_handles_dfs) { sdata_info(sdata, "Failed to join IBSS, DFS channel without control program\n"); return; } radar_required = err; if (ieee80211_link_use_channel(&sdata->deflink, &chanreq, ifibss->fixed_channel ? IEEE80211_CHANCTX_SHARED : IEEE80211_CHANCTX_EXCLUSIVE)) { sdata_info(sdata, "Failed to join IBSS, no channel context\n"); return; } sdata->deflink.radar_required = radar_required; memcpy(ifibss->bssid, bssid, ETH_ALEN); presp = ieee80211_ibss_build_presp(sdata, beacon_int, basic_rates, capability, tsf, &chanreq.oper, &have_higher_than_11mbit, NULL); if (!presp) return; rcu_assign_pointer(ifibss->presp, presp); mgmt = (void *)presp->head; sdata->vif.bss_conf.enable_beacon = true; sdata->vif.bss_conf.beacon_int = beacon_int; sdata->vif.bss_conf.basic_rates = basic_rates; sdata->vif.cfg.ssid_len = ifibss->ssid_len; memcpy(sdata->vif.cfg.ssid, ifibss->ssid, ifibss->ssid_len); bss_change = BSS_CHANGED_BEACON_INT; bss_change |= ieee80211_reset_erp_info(sdata); bss_change |= BSS_CHANGED_BSSID; bss_change |= BSS_CHANGED_BEACON; bss_change |= BSS_CHANGED_BEACON_ENABLED; bss_change |= BSS_CHANGED_BASIC_RATES; bss_change |= BSS_CHANGED_HT; bss_change |= BSS_CHANGED_IBSS; bss_change |= BSS_CHANGED_SSID; /* * In 5 GHz/802.11a, we can always use short slot time. 
* (IEEE 802.11-2012 18.3.8.7) * * In 2.4GHz, we must always use long slots in IBSS for compatibility * reasons. * (IEEE 802.11-2012 19.4.5) * * HT follows these specifications (IEEE 802.11-2012 20.3.18) */ sdata->vif.bss_conf.use_short_slot = chan->band == NL80211_BAND_5GHZ; bss_change |= BSS_CHANGED_ERP_SLOT; /* cf. IEEE 802.11 9.2.12 */ sdata->deflink.operating_11g_mode = chan->band == NL80211_BAND_2GHZ && have_higher_than_11mbit; ieee80211_set_wmm_default(&sdata->deflink, true, false); sdata->vif.cfg.ibss_joined = true; sdata->vif.cfg.ibss_creator = creator; err = drv_join_ibss(local, sdata); if (err) { sdata->vif.cfg.ibss_joined = false; sdata->vif.cfg.ibss_creator = false; sdata->vif.bss_conf.enable_beacon = false; sdata->vif.cfg.ssid_len = 0; RCU_INIT_POINTER(ifibss->presp, NULL); kfree_rcu(presp, rcu_head); ieee80211_link_release_channel(&sdata->deflink); sdata_info(sdata, "Failed to join IBSS, driver failure: %d\n", err); return; } ieee80211_bss_info_change_notify(sdata, bss_change); ifibss->state = IEEE80211_IBSS_MLME_JOINED; mod_timer(&ifibss->timer, round_jiffies(jiffies + IEEE80211_IBSS_MERGE_INTERVAL)); bss_meta.chan = chan; bss = cfg80211_inform_bss_frame_data(local->hw.wiphy, &bss_meta, mgmt, presp->head_len, GFP_KERNEL); cfg80211_put_bss(local->hw.wiphy, bss); netif_carrier_on(sdata->dev); cfg80211_ibss_joined(sdata->dev, ifibss->bssid, chan, GFP_KERNEL); } static void ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata, struct ieee80211_bss *bss) { struct cfg80211_bss *cbss = container_of((void *)bss, struct cfg80211_bss, priv); struct ieee80211_supported_band *sband; struct cfg80211_chan_def chandef; u32 basic_rates; int i, j; u16 beacon_int = cbss->beacon_interval; const struct cfg80211_bss_ies *ies; enum nl80211_channel_type chan_type; u64 tsf; u32 rate_flags; lockdep_assert_wiphy(sdata->local->hw.wiphy); if (beacon_int < 10) beacon_int = 10; switch (sdata->u.ibss.chandef.width) { case NL80211_CHAN_WIDTH_20_NOHT: case NL80211_CHAN_WIDTH_20: case NL80211_CHAN_WIDTH_40: chan_type = cfg80211_get_chandef_type(&sdata->u.ibss.chandef); cfg80211_chandef_create(&chandef, cbss->channel, chan_type); break; case NL80211_CHAN_WIDTH_5: case NL80211_CHAN_WIDTH_10: cfg80211_chandef_create(&chandef, cbss->channel, NL80211_CHAN_NO_HT); chandef.width = sdata->u.ibss.chandef.width; break; case NL80211_CHAN_WIDTH_80: case NL80211_CHAN_WIDTH_80P80: case NL80211_CHAN_WIDTH_160: chandef = sdata->u.ibss.chandef; chandef.chan = cbss->channel; break; default: /* fall back to 20 MHz for unsupported modes */ cfg80211_chandef_create(&chandef, cbss->channel, NL80211_CHAN_NO_HT); break; } sband = sdata->local->hw.wiphy->bands[cbss->channel->band]; rate_flags = ieee80211_chandef_rate_flags(&sdata->u.ibss.chandef); basic_rates = 0; for (i = 0; i < bss->supp_rates_len; i++) { int rate = bss->supp_rates[i] & 0x7f; bool is_basic = !!(bss->supp_rates[i] & 0x80); for (j = 0; j < sband->n_bitrates; j++) { int brate; if ((rate_flags & sband->bitrates[j].flags) != rate_flags) continue; brate = DIV_ROUND_UP(sband->bitrates[j].bitrate, 5); if (brate == rate) { if (is_basic) basic_rates |= BIT(j); break; } } } rcu_read_lock(); ies = rcu_dereference(cbss->ies); tsf = ies->tsf; rcu_read_unlock(); __ieee80211_sta_join_ibss(sdata, cbss->bssid, beacon_int, &chandef, basic_rates, cbss->capability, tsf, false); } int ieee80211_ibss_csa_beacon(struct ieee80211_sub_if_data *sdata, struct cfg80211_csa_settings *csa_settings, u64 *changed) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct beacon_data 
*presp, *old_presp; struct cfg80211_bss *cbss; const struct cfg80211_bss_ies *ies; u16 capability = WLAN_CAPABILITY_IBSS; u64 tsf; lockdep_assert_wiphy(sdata->local->hw.wiphy); if (ifibss->privacy) capability |= WLAN_CAPABILITY_PRIVACY; cbss = cfg80211_get_bss(sdata->local->hw.wiphy, ifibss->chandef.chan, ifibss->bssid, ifibss->ssid, ifibss->ssid_len, IEEE80211_BSS_TYPE_IBSS, IEEE80211_PRIVACY(ifibss->privacy)); if (unlikely(!cbss)) return -EINVAL; rcu_read_lock(); ies = rcu_dereference(cbss->ies); tsf = ies->tsf; rcu_read_unlock(); cfg80211_put_bss(sdata->local->hw.wiphy, cbss); old_presp = sdata_dereference(ifibss->presp, sdata); presp = ieee80211_ibss_build_presp(sdata, sdata->vif.bss_conf.beacon_int, sdata->vif.bss_conf.basic_rates, capability, tsf, &ifibss->chandef, NULL, csa_settings); if (!presp) return -ENOMEM; rcu_assign_pointer(ifibss->presp, presp); if (old_presp) kfree_rcu(old_presp, rcu_head); *changed |= BSS_CHANGED_BEACON; return 0; } int ieee80211_ibss_finish_csa(struct ieee80211_sub_if_data *sdata, u64 *changed) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct cfg80211_bss *cbss; lockdep_assert_wiphy(sdata->local->hw.wiphy); /* When not connected/joined, sending CSA doesn't make sense. */ if (ifibss->state != IEEE80211_IBSS_MLME_JOINED) return -ENOLINK; /* update cfg80211 bss information with the new channel */ if (!is_zero_ether_addr(ifibss->bssid)) { cbss = cfg80211_get_bss(sdata->local->hw.wiphy, ifibss->chandef.chan, ifibss->bssid, ifibss->ssid, ifibss->ssid_len, IEEE80211_BSS_TYPE_IBSS, IEEE80211_PRIVACY(ifibss->privacy)); /* XXX: should not really modify cfg80211 data */ if (cbss) { cbss->channel = sdata->deflink.csa.chanreq.oper.chan; cfg80211_put_bss(sdata->local->hw.wiphy, cbss); } } ifibss->chandef = sdata->deflink.csa.chanreq.oper; /* generate the beacon */ return ieee80211_ibss_csa_beacon(sdata, NULL, changed); } void ieee80211_ibss_stop(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; wiphy_work_cancel(sdata->local->hw.wiphy, &ifibss->csa_connection_drop_work); } static struct sta_info *ieee80211_ibss_finish_sta(struct sta_info *sta) __acquires(RCU) { struct ieee80211_sub_if_data *sdata = sta->sdata; u8 addr[ETH_ALEN]; memcpy(addr, sta->sta.addr, ETH_ALEN); ibss_dbg(sdata, "Adding new IBSS station %pM\n", addr); sta_info_pre_move_state(sta, IEEE80211_STA_AUTH); sta_info_pre_move_state(sta, IEEE80211_STA_ASSOC); /* authorize the station only if the network is not RSN protected. If * not wait for the userspace to authorize it */ if (!sta->sdata->u.ibss.control_port) sta_info_pre_move_state(sta, IEEE80211_STA_AUTHORIZED); rate_control_rate_init(sta); /* If it fails, maybe we raced another insertion? */ if (sta_info_insert_rcu(sta)) return sta_info_get(sdata, addr); return sta; } static struct sta_info * ieee80211_ibss_add_sta(struct ieee80211_sub_if_data *sdata, const u8 *bssid, const u8 *addr, u32 supp_rates) __acquires(RCU) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_local *local = sdata->local; struct sta_info *sta; struct ieee80211_chanctx_conf *chanctx_conf; struct ieee80211_supported_band *sband; int band; /* * XXX: Consider removing the least recently used entry and * allow new one to be added. 
*/ if (local->num_sta >= IEEE80211_IBSS_MAX_STA_ENTRIES) { net_info_ratelimited("%s: No room for a new IBSS STA entry %pM\n", sdata->name, addr); rcu_read_lock(); return NULL; } if (ifibss->state == IEEE80211_IBSS_MLME_SEARCH) { rcu_read_lock(); return NULL; } if (!ether_addr_equal(bssid, sdata->u.ibss.bssid)) { rcu_read_lock(); return NULL; } rcu_read_lock(); chanctx_conf = rcu_dereference(sdata->vif.bss_conf.chanctx_conf); if (WARN_ON_ONCE(!chanctx_conf)) return NULL; band = chanctx_conf->def.chan->band; rcu_read_unlock(); sta = sta_info_alloc(sdata, addr, GFP_KERNEL); if (!sta) { rcu_read_lock(); return NULL; } /* make sure mandatory rates are always added */ sband = local->hw.wiphy->bands[band]; sta->sta.deflink.supp_rates[band] = supp_rates | ieee80211_mandatory_rates(sband); return ieee80211_ibss_finish_sta(sta); } static int ieee80211_sta_active_ibss(struct ieee80211_sub_if_data *sdata) { struct ieee80211_local *local = sdata->local; int active = 0; struct sta_info *sta; lockdep_assert_wiphy(sdata->local->hw.wiphy); rcu_read_lock(); list_for_each_entry_rcu(sta, &local->sta_list, list) { unsigned long last_active = ieee80211_sta_last_active(sta); if (sta->sdata == sdata && time_is_after_jiffies(last_active + IEEE80211_IBSS_MERGE_INTERVAL)) { active++; break; } } rcu_read_unlock(); return active; } static void ieee80211_ibss_disconnect(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_local *local = sdata->local; struct cfg80211_bss *cbss; struct beacon_data *presp; struct sta_info *sta; lockdep_assert_wiphy(local->hw.wiphy); if (!is_zero_ether_addr(ifibss->bssid)) { cbss = cfg80211_get_bss(local->hw.wiphy, ifibss->chandef.chan, ifibss->bssid, ifibss->ssid, ifibss->ssid_len, IEEE80211_BSS_TYPE_IBSS, IEEE80211_PRIVACY(ifibss->privacy)); if (cbss) { cfg80211_unlink_bss(local->hw.wiphy, cbss); cfg80211_put_bss(sdata->local->hw.wiphy, cbss); } } ifibss->state = IEEE80211_IBSS_MLME_SEARCH; sta_info_flush(sdata, -1); spin_lock_bh(&ifibss->incomplete_lock); while (!list_empty(&ifibss->incomplete_stations)) { sta = list_first_entry(&ifibss->incomplete_stations, struct sta_info, list); list_del(&sta->list); spin_unlock_bh(&ifibss->incomplete_lock); sta_info_free(local, sta); spin_lock_bh(&ifibss->incomplete_lock); } spin_unlock_bh(&ifibss->incomplete_lock); netif_carrier_off(sdata->dev); sdata->vif.cfg.ibss_joined = false; sdata->vif.cfg.ibss_creator = false; sdata->vif.bss_conf.enable_beacon = false; sdata->vif.cfg.ssid_len = 0; /* remove beacon */ presp = sdata_dereference(ifibss->presp, sdata); RCU_INIT_POINTER(sdata->u.ibss.presp, NULL); if (presp) kfree_rcu(presp, rcu_head); clear_bit(SDATA_STATE_OFFCHANNEL_BEACON_STOPPED, &sdata->state); ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_BEACON_ENABLED | BSS_CHANGED_IBSS); drv_leave_ibss(local, sdata); ieee80211_link_release_channel(&sdata->deflink); } static void ieee80211_csa_connection_drop_work(struct wiphy *wiphy, struct wiphy_work *work) { struct ieee80211_sub_if_data *sdata = container_of(work, struct ieee80211_sub_if_data, u.ibss.csa_connection_drop_work); ieee80211_ibss_disconnect(sdata); synchronize_rcu(); skb_queue_purge(&sdata->skb_queue); /* trigger a scan to find another IBSS network to join */ wiphy_work_queue(sdata->local->hw.wiphy, &sdata->work); } static void ieee80211_ibss_csa_mark_radar(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; int err; /* if the current channel is a DFS channel, mark the channel as * unavailable. 
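 * Reporting it via cfg80211_radar_event() below is handled like a radar
 * detection, so cfg80211 marks the channel unusable for the usual
 * non-occupancy period.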
*/ err = cfg80211_chandef_dfs_required(sdata->local->hw.wiphy, &ifibss->chandef, NL80211_IFTYPE_ADHOC); if (err > 0) cfg80211_radar_event(sdata->local->hw.wiphy, &ifibss->chandef, GFP_ATOMIC); } static bool ieee80211_ibss_process_chanswitch(struct ieee80211_sub_if_data *sdata, struct ieee802_11_elems *elems, bool beacon) { struct cfg80211_csa_settings params; struct ieee80211_csa_ie csa_ie; struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; enum nl80211_channel_type ch_type; int err; struct ieee80211_conn_settings conn = { .mode = IEEE80211_CONN_MODE_HT, .bw_limit = IEEE80211_CONN_BW_LIMIT_40, }; u32 vht_cap_info = 0; lockdep_assert_wiphy(sdata->local->hw.wiphy); switch (ifibss->chandef.width) { case NL80211_CHAN_WIDTH_5: case NL80211_CHAN_WIDTH_10: case NL80211_CHAN_WIDTH_20_NOHT: conn.mode = IEEE80211_CONN_MODE_LEGACY; fallthrough; case NL80211_CHAN_WIDTH_20: conn.bw_limit = IEEE80211_CONN_BW_LIMIT_20; break; default: break; } if (elems->vht_cap_elem) vht_cap_info = le32_to_cpu(elems->vht_cap_elem->vht_cap_info); memset(&params, 0, sizeof(params)); err = ieee80211_parse_ch_switch_ie(sdata, elems, ifibss->chandef.chan->band, vht_cap_info, &conn, ifibss->bssid, false, &csa_ie); /* can't switch to destination channel, fail */ if (err < 0) goto disconnect; /* did not contain a CSA */ if (err) return false; /* channel switch is not supported, disconnect */ if (!(sdata->local->hw.wiphy->flags & WIPHY_FLAG_HAS_CHANNEL_SWITCH)) goto disconnect; params.count = csa_ie.count; params.chandef = csa_ie.chanreq.oper; switch (ifibss->chandef.width) { case NL80211_CHAN_WIDTH_20_NOHT: case NL80211_CHAN_WIDTH_20: case NL80211_CHAN_WIDTH_40: /* keep our current HT mode (HT20/HT40+/HT40-), even if * another mode has been announced. The mode is not adopted * within the beacon while doing CSA and we should therefore * keep the mode which we announce. */ ch_type = cfg80211_get_chandef_type(&ifibss->chandef); cfg80211_chandef_create(&params.chandef, params.chandef.chan, ch_type); break; case NL80211_CHAN_WIDTH_5: case NL80211_CHAN_WIDTH_10: if (params.chandef.width != ifibss->chandef.width) { sdata_info(sdata, "IBSS %pM received channel switch from incompatible channel width (%d MHz, width:%d, CF1/2: %d/%d MHz), disconnecting\n", ifibss->bssid, params.chandef.chan->center_freq, params.chandef.width, params.chandef.center_freq1, params.chandef.center_freq2); goto disconnect; } break; default: /* should not happen, conn_flags should prevent VHT modes. */ WARN_ON(1); goto disconnect; } if (!cfg80211_reg_can_beacon(sdata->local->hw.wiphy, &params.chandef, NL80211_IFTYPE_ADHOC)) { sdata_info(sdata, "IBSS %pM switches to unsupported channel (%d MHz, width:%d, CF1/2: %d/%d MHz), disconnecting\n", ifibss->bssid, params.chandef.chan->center_freq, params.chandef.width, params.chandef.center_freq1, params.chandef.center_freq2); goto disconnect; } err = cfg80211_chandef_dfs_required(sdata->local->hw.wiphy, &params.chandef, NL80211_IFTYPE_ADHOC); if (err < 0) goto disconnect; if (err > 0 && !ifibss->userspace_handles_dfs) { /* IBSS-DFS only allowed with a control program */ goto disconnect; } params.radar_required = err; if (cfg80211_chandef_identical(&params.chandef, &sdata->vif.bss_conf.chanreq.oper)) { ibss_dbg(sdata, "received csa with an identical chandef, ignoring\n"); return true; } /* all checks done, now perform the channel switch.
*/ ibss_dbg(sdata, "received channel switch announcement to go to channel %d MHz\n", params.chandef.chan->center_freq); params.block_tx = !!csa_ie.mode; if (ieee80211_channel_switch(sdata->local->hw.wiphy, sdata->dev, &params)) goto disconnect; ieee80211_ibss_csa_mark_radar(sdata); return true; disconnect: ibss_dbg(sdata, "Can't handle channel switch, disconnect\n"); wiphy_work_queue(sdata->local->hw.wiphy, &ifibss->csa_connection_drop_work); ieee80211_ibss_csa_mark_radar(sdata); return true; } static void ieee80211_rx_mgmt_spectrum_mgmt(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, size_t len, struct ieee80211_rx_status *rx_status, struct ieee802_11_elems *elems) { int required_len; if (len < IEEE80211_MIN_ACTION_SIZE + 1) return; /* CSA is the only action we handle for now */ if (mgmt->u.action.u.measurement.action_code != WLAN_ACTION_SPCT_CHL_SWITCH) return; required_len = IEEE80211_MIN_ACTION_SIZE + sizeof(mgmt->u.action.u.chan_switch); if (len < required_len) return; if (!sdata->vif.bss_conf.csa_active) ieee80211_ibss_process_chanswitch(sdata, elems, false); } static void ieee80211_rx_mgmt_deauth_ibss(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, size_t len) { u16 reason = le16_to_cpu(mgmt->u.deauth.reason_code); if (len < IEEE80211_DEAUTH_FRAME_LEN) return; ibss_dbg(sdata, "RX DeAuth SA=%pM DA=%pM\n", mgmt->sa, mgmt->da); ibss_dbg(sdata, "\tBSSID=%pM (reason: %d)\n", mgmt->bssid, reason); sta_info_destroy_addr(sdata, mgmt->sa); } static void ieee80211_rx_mgmt_auth_ibss(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, size_t len) { u16 auth_alg, auth_transaction; lockdep_assert_wiphy(sdata->local->hw.wiphy); if (len < 24 + 6) return; auth_alg = le16_to_cpu(mgmt->u.auth.auth_alg); auth_transaction = le16_to_cpu(mgmt->u.auth.auth_transaction); ibss_dbg(sdata, "RX Auth SA=%pM DA=%pM\n", mgmt->sa, mgmt->da); ibss_dbg(sdata, "\tBSSID=%pM (auth_transaction=%d)\n", mgmt->bssid, auth_transaction); if (auth_alg != WLAN_AUTH_OPEN || auth_transaction != 1) return; /* * IEEE 802.11 standard does not require authentication in IBSS * networks and most implementations do not seem to use it. * However, try to reply to authentication attempts if someone * has actually implemented this.
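 * The reply sent below is the second frame of the Open System exchange:
 * algorithm WLAN_AUTH_OPEN, transaction sequence number 2, status code 0,
 * addressed back to the requesting station.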
*/ ieee80211_send_auth(sdata, 2, WLAN_AUTH_OPEN, 0, NULL, 0, mgmt->sa, sdata->u.ibss.bssid, NULL, 0, 0, 0); } static void ieee80211_update_sta_info(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, size_t len, struct ieee80211_rx_status *rx_status, struct ieee802_11_elems *elems, struct ieee80211_channel *channel) { struct sta_info *sta; enum nl80211_band band = rx_status->band; struct ieee80211_local *local = sdata->local; struct ieee80211_supported_band *sband; bool rates_updated = false; u32 supp_rates = 0; if (sdata->vif.type != NL80211_IFTYPE_ADHOC) return; if (!ether_addr_equal(mgmt->bssid, sdata->u.ibss.bssid)) return; sband = local->hw.wiphy->bands[band]; if (WARN_ON(!sband)) return; rcu_read_lock(); sta = sta_info_get(sdata, mgmt->sa); if (elems->supp_rates) { supp_rates = ieee80211_sta_get_rates(sdata, elems, band, NULL); if (sta) { u32 prev_rates; prev_rates = sta->sta.deflink.supp_rates[band]; sta->sta.deflink.supp_rates[band] = supp_rates | ieee80211_mandatory_rates(sband); if (sta->sta.deflink.supp_rates[band] != prev_rates) { ibss_dbg(sdata, "updated supp_rates set for %pM based on beacon/probe_resp (0x%x -> 0x%x)\n", sta->sta.addr, prev_rates, sta->sta.deflink.supp_rates[band]); rates_updated = true; } } else { rcu_read_unlock(); sta = ieee80211_ibss_add_sta(sdata, mgmt->bssid, mgmt->sa, supp_rates); } } if (sta && !sta->sta.wme && (elems->wmm_info || elems->s1g_capab) && local->hw.queues >= IEEE80211_NUM_ACS) { sta->sta.wme = true; ieee80211_check_fast_xmit(sta); } if (sta && elems->ht_operation && elems->ht_cap_elem && sdata->u.ibss.chandef.width != NL80211_CHAN_WIDTH_20_NOHT && sdata->u.ibss.chandef.width != NL80211_CHAN_WIDTH_5 && sdata->u.ibss.chandef.width != NL80211_CHAN_WIDTH_10) { /* we both use HT */ struct ieee80211_ht_cap htcap_ie; struct cfg80211_chan_def chandef; enum ieee80211_sta_rx_bandwidth bw = sta->sta.deflink.bandwidth; cfg80211_chandef_create(&chandef, channel, NL80211_CHAN_NO_HT); ieee80211_chandef_ht_oper(elems->ht_operation, &chandef); memcpy(&htcap_ie, elems->ht_cap_elem, sizeof(htcap_ie)); rates_updated |= ieee80211_ht_cap_ie_to_sta_ht_cap(sdata, sband, &htcap_ie, &sta->deflink); if (elems->vht_operation && elems->vht_cap_elem && sdata->u.ibss.chandef.width != NL80211_CHAN_WIDTH_20 && sdata->u.ibss.chandef.width != NL80211_CHAN_WIDTH_40) { /* we both use VHT */ struct ieee80211_vht_cap cap_ie; struct ieee80211_sta_vht_cap cap = sta->sta.deflink.vht_cap; u32 vht_cap_info = le32_to_cpu(elems->vht_cap_elem->vht_cap_info); ieee80211_chandef_vht_oper(&local->hw, vht_cap_info, elems->vht_operation, elems->ht_operation, &chandef); memcpy(&cap_ie, elems->vht_cap_elem, sizeof(cap_ie)); ieee80211_vht_cap_ie_to_sta_vht_cap(sdata, sband, &cap_ie, NULL, &sta->deflink); if (memcmp(&cap, &sta->sta.deflink.vht_cap, sizeof(cap))) rates_updated |= true; } if (bw != sta->sta.deflink.bandwidth) rates_updated |= true; if (!cfg80211_chandef_compatible(&sdata->u.ibss.chandef, &chandef)) WARN_ON_ONCE(1); } if (sta && rates_updated) { u32 changed = IEEE80211_RC_SUPP_RATES_CHANGED; u8 rx_nss = sta->sta.deflink.rx_nss; /* Force rx_nss recalculation */ sta->sta.deflink.rx_nss = 0; rate_control_rate_init(sta); if (sta->sta.deflink.rx_nss != rx_nss) changed |= IEEE80211_RC_NSS_CHANGED; drv_sta_rc_update(local, sdata, &sta->sta, changed); } rcu_read_unlock(); } static void ieee80211_rx_bss_info(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, size_t len, struct ieee80211_rx_status *rx_status, struct ieee802_11_elems *elems) { struct 
ieee80211_local *local = sdata->local; struct cfg80211_bss *cbss; struct ieee80211_bss *bss; struct ieee80211_channel *channel; u64 beacon_timestamp, rx_timestamp; u32 supp_rates = 0; enum nl80211_band band = rx_status->band; channel = ieee80211_get_channel(local->hw.wiphy, rx_status->freq); if (!channel) return; ieee80211_update_sta_info(sdata, mgmt, len, rx_status, elems, channel); bss = ieee80211_bss_info_update(local, rx_status, mgmt, len, channel); if (!bss) return; cbss = container_of((void *)bss, struct cfg80211_bss, priv); /* same for beacon and probe response */ beacon_timestamp = le64_to_cpu(mgmt->u.beacon.timestamp); /* check if we need to merge IBSS */ /* not an IBSS */ if (!(cbss->capability & WLAN_CAPABILITY_IBSS)) goto put_bss; /* different channel */ if (sdata->u.ibss.fixed_channel && sdata->u.ibss.chandef.chan != cbss->channel) goto put_bss; /* different SSID */ if (elems->ssid_len != sdata->u.ibss.ssid_len || memcmp(elems->ssid, sdata->u.ibss.ssid, sdata->u.ibss.ssid_len)) goto put_bss; /* process channel switch */ if (sdata->vif.bss_conf.csa_active || ieee80211_ibss_process_chanswitch(sdata, elems, true)) goto put_bss; /* same BSSID */ if (ether_addr_equal(cbss->bssid, sdata->u.ibss.bssid)) goto put_bss; /* we use a fixed BSSID */ if (sdata->u.ibss.fixed_bssid) goto put_bss; if (ieee80211_have_rx_timestamp(rx_status)) { /* time when timestamp field was received */ rx_timestamp = ieee80211_calculate_rx_timestamp(local, rx_status, len + FCS_LEN, 24); } else { /* * second best option: get current TSF * (will return -1 if not supported) */ rx_timestamp = drv_get_tsf(local, sdata); } ibss_dbg(sdata, "RX beacon SA=%pM BSSID=%pM TSF=0x%llx\n", mgmt->sa, mgmt->bssid, (unsigned long long)rx_timestamp); ibss_dbg(sdata, "\tBCN=0x%llx diff=%lld @%lu\n", (unsigned long long)beacon_timestamp, (unsigned long long)(rx_timestamp - beacon_timestamp), jiffies); if (beacon_timestamp > rx_timestamp) { ibss_dbg(sdata, "beacon TSF higher than local TSF - IBSS merge with BSSID %pM\n", mgmt->bssid); ieee80211_sta_join_ibss(sdata, bss); supp_rates = ieee80211_sta_get_rates(sdata, elems, band, NULL); ieee80211_ibss_add_sta(sdata, mgmt->bssid, mgmt->sa, supp_rates); rcu_read_unlock(); } put_bss: ieee80211_rx_bss_put(local, bss); } void ieee80211_ibss_rx_no_sta(struct ieee80211_sub_if_data *sdata, const u8 *bssid, const u8 *addr, u32 supp_rates) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_local *local = sdata->local; struct sta_info *sta; struct ieee80211_chanctx_conf *chanctx_conf; struct ieee80211_supported_band *sband; int band; /* * XXX: Consider removing the least recently used entry and * allow new one to be added. 
*/ if (local->num_sta >= IEEE80211_IBSS_MAX_STA_ENTRIES) { net_info_ratelimited("%s: No room for a new IBSS STA entry %pM\n", sdata->name, addr); return; } if (ifibss->state == IEEE80211_IBSS_MLME_SEARCH) return; if (!ether_addr_equal(bssid, sdata->u.ibss.bssid)) return; rcu_read_lock(); chanctx_conf = rcu_dereference(sdata->vif.bss_conf.chanctx_conf); if (WARN_ON_ONCE(!chanctx_conf)) { rcu_read_unlock(); return; } band = chanctx_conf->def.chan->band; rcu_read_unlock(); sta = sta_info_alloc(sdata, addr, GFP_ATOMIC); if (!sta) return; /* make sure mandatory rates are always added */ sband = local->hw.wiphy->bands[band]; sta->sta.deflink.supp_rates[band] = supp_rates | ieee80211_mandatory_rates(sband); spin_lock(&ifibss->incomplete_lock); list_add(&sta->list, &ifibss->incomplete_stations); spin_unlock(&ifibss->incomplete_lock); wiphy_work_queue(local->hw.wiphy, &sdata->work); } static void ieee80211_ibss_sta_expire(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_local *local = sdata->local; struct sta_info *sta, *tmp; unsigned long exp_time = IEEE80211_IBSS_INACTIVITY_LIMIT; unsigned long exp_rsn = IEEE80211_IBSS_RSN_INACTIVITY_LIMIT; lockdep_assert_wiphy(local->hw.wiphy); list_for_each_entry_safe(sta, tmp, &local->sta_list, list) { unsigned long last_active = ieee80211_sta_last_active(sta); if (sdata != sta->sdata) continue; if (time_is_before_jiffies(last_active + exp_time) || (time_is_before_jiffies(last_active + exp_rsn) && sta->sta_state != IEEE80211_STA_AUTHORIZED)) { u8 frame_buf[IEEE80211_DEAUTH_FRAME_LEN]; sta_dbg(sta->sdata, "expiring inactive %sSTA %pM\n", sta->sta_state != IEEE80211_STA_AUTHORIZED ? "not authorized " : "", sta->sta.addr); ieee80211_send_deauth_disassoc(sdata, sta->sta.addr, ifibss->bssid, IEEE80211_STYPE_DEAUTH, WLAN_REASON_DEAUTH_LEAVING, true, frame_buf); WARN_ON(__sta_info_destroy(sta)); } } } /* * This function is called with state == IEEE80211_IBSS_MLME_JOINED */ static void ieee80211_sta_merge_ibss(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; lockdep_assert_wiphy(sdata->local->hw.wiphy); mod_timer(&ifibss->timer, round_jiffies(jiffies + IEEE80211_IBSS_MERGE_INTERVAL)); ieee80211_ibss_sta_expire(sdata); if (time_before(jiffies, ifibss->last_scan_completed + IEEE80211_IBSS_MERGE_INTERVAL)) return; if (ieee80211_sta_active_ibss(sdata)) return; if (ifibss->fixed_channel) return; sdata_info(sdata, "No active IBSS STAs - trying to scan for other IBSS networks with same SSID (merge)\n"); ieee80211_request_ibss_scan(sdata, ifibss->ssid, ifibss->ssid_len, NULL, 0); } static void ieee80211_sta_create_ibss(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; u8 bssid[ETH_ALEN]; u16 capability; int i; lockdep_assert_wiphy(sdata->local->hw.wiphy); if (ifibss->fixed_bssid) { memcpy(bssid, ifibss->bssid, ETH_ALEN); } else { /* Generate random, not broadcast, locally administered BSSID. Mix in * own MAC address to make sure that devices that do not have proper * random number generator get different BSSID. 
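 * The first octet is then fixed up below: clearing bit 0 (the group/multicast
 * bit) and setting bit 1 (the locally administered bit) yields a valid,
 * locally administered unicast BSSID.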
*/ get_random_bytes(bssid, ETH_ALEN); for (i = 0; i < ETH_ALEN; i++) bssid[i] ^= sdata->vif.addr[i]; bssid[0] &= ~0x01; bssid[0] |= 0x02; } sdata_info(sdata, "Creating new IBSS network, BSSID %pM\n", bssid); capability = WLAN_CAPABILITY_IBSS; if (ifibss->privacy) capability |= WLAN_CAPABILITY_PRIVACY; __ieee80211_sta_join_ibss(sdata, bssid, sdata->vif.bss_conf.beacon_int, &ifibss->chandef, ifibss->basic_rates, capability, 0, true); } static unsigned int ibss_setup_channels(struct wiphy *wiphy, struct ieee80211_channel **channels, unsigned int channels_max, u32 center_freq, u32 width) { struct ieee80211_channel *chan = NULL; unsigned int n_chan = 0; u32 start_freq, end_freq, freq; if (width <= 20) { start_freq = center_freq; end_freq = center_freq; } else { start_freq = center_freq - width / 2 + 10; end_freq = center_freq + width / 2 - 10; } for (freq = start_freq; freq <= end_freq; freq += 20) { chan = ieee80211_get_channel(wiphy, freq); if (!chan) continue; if (n_chan >= channels_max) return n_chan; channels[n_chan] = chan; n_chan++; } return n_chan; } static unsigned int ieee80211_ibss_setup_scan_channels(struct wiphy *wiphy, const struct cfg80211_chan_def *chandef, struct ieee80211_channel **channels, unsigned int channels_max) { unsigned int n_chan = 0; u32 width, cf1, cf2 = 0; switch (chandef->width) { case NL80211_CHAN_WIDTH_40: width = 40; break; case NL80211_CHAN_WIDTH_80P80: cf2 = chandef->center_freq2; fallthrough; case NL80211_CHAN_WIDTH_80: width = 80; break; case NL80211_CHAN_WIDTH_160: width = 160; break; default: width = 20; break; } cf1 = chandef->center_freq1; n_chan = ibss_setup_channels(wiphy, channels, channels_max, cf1, width); if (cf2) n_chan += ibss_setup_channels(wiphy, &channels[n_chan], channels_max - n_chan, cf2, width); return n_chan; } /* * This function is called with state == IEEE80211_IBSS_MLME_SEARCH */ static void ieee80211_sta_find_ibss(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_local *local = sdata->local; struct cfg80211_bss *cbss; struct ieee80211_channel *chan = NULL; const u8 *bssid = NULL; int active_ibss; lockdep_assert_wiphy(sdata->local->hw.wiphy); active_ibss = ieee80211_sta_active_ibss(sdata); ibss_dbg(sdata, "sta_find_ibss (active_ibss=%d)\n", active_ibss); if (active_ibss) return; if (ifibss->fixed_bssid) bssid = ifibss->bssid; if (ifibss->fixed_channel) chan = ifibss->chandef.chan; if (!is_zero_ether_addr(ifibss->bssid)) bssid = ifibss->bssid; cbss = cfg80211_get_bss(local->hw.wiphy, chan, bssid, ifibss->ssid, ifibss->ssid_len, IEEE80211_BSS_TYPE_IBSS, IEEE80211_PRIVACY(ifibss->privacy)); if (cbss) { struct ieee80211_bss *bss; bss = (void *)cbss->priv; ibss_dbg(sdata, "sta_find_ibss: selected %pM current %pM\n", cbss->bssid, ifibss->bssid); sdata_info(sdata, "Selected IBSS BSSID %pM based on configured SSID\n", cbss->bssid); ieee80211_sta_join_ibss(sdata, bss); ieee80211_rx_bss_put(local, bss); return; } /* if a fixed bssid and a fixed freq have been provided create the IBSS * directly and do not waste time scanning */ if (ifibss->fixed_bssid && ifibss->fixed_channel) { sdata_info(sdata, "Created IBSS using preconfigured BSSID %pM\n", bssid); ieee80211_sta_create_ibss(sdata); return; } ibss_dbg(sdata, "sta_find_ibss: did not try to join ibss\n"); /* Selected IBSS not found in current scan results - try to scan */ if (time_after(jiffies, ifibss->last_scan_completed + IEEE80211_SCAN_INTERVAL)) { struct ieee80211_channel *channels[8]; unsigned int num; sdata_info(sdata, "Trigger 
new scan to find an IBSS to join\n"); if (ifibss->fixed_channel) { num = ieee80211_ibss_setup_scan_channels(local->hw.wiphy, &ifibss->chandef, channels, ARRAY_SIZE(channels)); ieee80211_request_ibss_scan(sdata, ifibss->ssid, ifibss->ssid_len, channels, num); } else { ieee80211_request_ibss_scan(sdata, ifibss->ssid, ifibss->ssid_len, NULL, 0); } } else { int interval = IEEE80211_SCAN_INTERVAL; if (time_after(jiffies, ifibss->ibss_join_req + IEEE80211_IBSS_JOIN_TIMEOUT)) ieee80211_sta_create_ibss(sdata); mod_timer(&ifibss->timer, round_jiffies(jiffies + interval)); } } static void ieee80211_rx_mgmt_probe_req(struct ieee80211_sub_if_data *sdata, struct sk_buff *req) { struct ieee80211_mgmt *mgmt = (void *)req->data; struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_local *local = sdata->local; int tx_last_beacon, len = req->len; struct sk_buff *skb; struct beacon_data *presp; u8 *pos, *end; lockdep_assert_wiphy(sdata->local->hw.wiphy); presp = sdata_dereference(ifibss->presp, sdata); if (ifibss->state != IEEE80211_IBSS_MLME_JOINED || len < 24 + 2 || !presp) return; tx_last_beacon = drv_tx_last_beacon(local); ibss_dbg(sdata, "RX ProbeReq SA=%pM DA=%pM\n", mgmt->sa, mgmt->da); ibss_dbg(sdata, "\tBSSID=%pM (tx_last_beacon=%d)\n", mgmt->bssid, tx_last_beacon); if (!tx_last_beacon && is_multicast_ether_addr(mgmt->da)) return; if (!ether_addr_equal(mgmt->bssid, ifibss->bssid) && !is_broadcast_ether_addr(mgmt->bssid)) return; end = ((u8 *) mgmt) + len; pos = mgmt->u.probe_req.variable; if (pos[0] != WLAN_EID_SSID || pos + 2 + pos[1] > end) { ibss_dbg(sdata, "Invalid SSID IE in ProbeReq from %pM\n", mgmt->sa); return; } if (pos[1] != 0 && (pos[1] != ifibss->ssid_len || memcmp(pos + 2, ifibss->ssid, ifibss->ssid_len))) { /* Ignore ProbeReq for foreign SSID */ return; } /* Reply with ProbeResp */ skb = dev_alloc_skb(local->tx_headroom + presp->head_len); if (!skb) return; skb_reserve(skb, local->tx_headroom); skb_put_data(skb, presp->head, presp->head_len); memcpy(((struct ieee80211_mgmt *) skb->data)->da, mgmt->sa, ETH_ALEN); ibss_dbg(sdata, "Sending ProbeResp to %pM\n", mgmt->sa); IEEE80211_SKB_CB(skb)->flags |= IEEE80211_TX_INTFL_DONT_ENCRYPT; /* avoid excessive retries for probe request to wildcard SSIDs */ if (pos[1] == 0) IEEE80211_SKB_CB(skb)->flags |= IEEE80211_TX_CTL_NO_ACK; ieee80211_tx_skb(sdata, skb); } static void ieee80211_rx_mgmt_probe_beacon(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, size_t len, struct ieee80211_rx_status *rx_status) { size_t baselen; struct ieee802_11_elems *elems; BUILD_BUG_ON(offsetof(typeof(mgmt->u.probe_resp), variable) != offsetof(typeof(mgmt->u.beacon), variable)); /* * either beacon or probe_resp but the variable field is at the * same offset */ baselen = (u8 *) mgmt->u.probe_resp.variable - (u8 *) mgmt; if (baselen > len) return; elems = ieee802_11_parse_elems(mgmt->u.probe_resp.variable, len - baselen, false, NULL); if (elems) { ieee80211_rx_bss_info(sdata, mgmt, len, rx_status, elems); kfree(elems); } } void ieee80211_ibss_rx_queued_mgmt(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb) { struct ieee80211_rx_status *rx_status; struct ieee80211_mgmt *mgmt; u16 fc; struct ieee802_11_elems *elems; int ies_len; rx_status = IEEE80211_SKB_RXCB(skb); mgmt = (struct ieee80211_mgmt *) skb->data; fc = le16_to_cpu(mgmt->frame_control); if (!sdata->u.ibss.ssid_len) return; /* not ready to merge yet */ switch (fc & IEEE80211_FCTL_STYPE) { case IEEE80211_STYPE_PROBE_REQ: ieee80211_rx_mgmt_probe_req(sdata, skb); break; case 
IEEE80211_STYPE_PROBE_RESP: case IEEE80211_STYPE_BEACON: ieee80211_rx_mgmt_probe_beacon(sdata, mgmt, skb->len, rx_status); break; case IEEE80211_STYPE_AUTH: ieee80211_rx_mgmt_auth_ibss(sdata, mgmt, skb->len); break; case IEEE80211_STYPE_DEAUTH: ieee80211_rx_mgmt_deauth_ibss(sdata, mgmt, skb->len); break; case IEEE80211_STYPE_ACTION: switch (mgmt->u.action.category) { case WLAN_CATEGORY_SPECTRUM_MGMT: ies_len = skb->len - offsetof(struct ieee80211_mgmt, u.action.u.chan_switch.variable); if (ies_len < 0) break; elems = ieee802_11_parse_elems( mgmt->u.action.u.chan_switch.variable, ies_len, true, NULL); if (elems && !elems->parse_error) ieee80211_rx_mgmt_spectrum_mgmt(sdata, mgmt, skb->len, rx_status, elems); kfree(elems); break; } } } void ieee80211_ibss_work(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct sta_info *sta; /* * Work could be scheduled after scan or similar * when we aren't even joined (or trying) with a * network. */ if (!ifibss->ssid_len) return; spin_lock_bh(&ifibss->incomplete_lock); while (!list_empty(&ifibss->incomplete_stations)) { sta = list_first_entry(&ifibss->incomplete_stations, struct sta_info, list); list_del(&sta->list); spin_unlock_bh(&ifibss->incomplete_lock); ieee80211_ibss_finish_sta(sta); rcu_read_unlock(); spin_lock_bh(&ifibss->incomplete_lock); } spin_unlock_bh(&ifibss->incomplete_lock); switch (ifibss->state) { case IEEE80211_IBSS_MLME_SEARCH: ieee80211_sta_find_ibss(sdata); break; case IEEE80211_IBSS_MLME_JOINED: ieee80211_sta_merge_ibss(sdata); break; default: WARN_ON(1); break; } } static void ieee80211_ibss_timer(struct timer_list *t) { struct ieee80211_sub_if_data *sdata = from_timer(sdata, t, u.ibss.timer); wiphy_work_queue(sdata->local->hw.wiphy, &sdata->work); } void ieee80211_ibss_setup_sdata(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; timer_setup(&ifibss->timer, ieee80211_ibss_timer, 0); INIT_LIST_HEAD(&ifibss->incomplete_stations); spin_lock_init(&ifibss->incomplete_lock); wiphy_work_init(&ifibss->csa_connection_drop_work, ieee80211_csa_connection_drop_work); } /* scan finished notification */ void ieee80211_ibss_notify_scan_completed(struct ieee80211_local *local) { struct ieee80211_sub_if_data *sdata; lockdep_assert_wiphy(local->hw.wiphy); list_for_each_entry(sdata, &local->interfaces, list) { if (!ieee80211_sdata_running(sdata)) continue; if (sdata->vif.type != NL80211_IFTYPE_ADHOC) continue; sdata->u.ibss.last_scan_completed = jiffies; } } int ieee80211_ibss_join(struct ieee80211_sub_if_data *sdata, struct cfg80211_ibss_params *params) { u64 changed = 0; u32 rate_flags; struct ieee80211_supported_band *sband; enum ieee80211_chanctx_mode chanmode; struct ieee80211_local *local = sdata->local; int radar_detect_width = 0; int i; int ret; lockdep_assert_wiphy(local->hw.wiphy); if (params->chandef.chan->freq_offset) { /* this may work, but is untested */ return -EOPNOTSUPP; } ret = cfg80211_chandef_dfs_required(local->hw.wiphy, &params->chandef, sdata->wdev.iftype); if (ret < 0) return ret; if (ret > 0) { if (!params->userspace_handles_dfs) return -EINVAL; radar_detect_width = BIT(params->chandef.width); } chanmode = (params->channel_fixed && !ret) ?
IEEE80211_CHANCTX_SHARED : IEEE80211_CHANCTX_EXCLUSIVE; ret = ieee80211_check_combinations(sdata, &params->chandef, chanmode, radar_detect_width, -1); if (ret < 0) return ret; if (params->bssid) { memcpy(sdata->u.ibss.bssid, params->bssid, ETH_ALEN); sdata->u.ibss.fixed_bssid = true; } else sdata->u.ibss.fixed_bssid = false; sdata->u.ibss.privacy = params->privacy; sdata->u.ibss.control_port = params->control_port; sdata->u.ibss.userspace_handles_dfs = params->userspace_handles_dfs; sdata->u.ibss.basic_rates = params->basic_rates; sdata->u.ibss.last_scan_completed = jiffies; /* fix basic_rates if channel does not support these rates */ rate_flags = ieee80211_chandef_rate_flags(&params->chandef); sband = local->hw.wiphy->bands[params->chandef.chan->band]; for (i = 0; i < sband->n_bitrates; i++) { if ((rate_flags & sband->bitrates[i].flags) != rate_flags) sdata->u.ibss.basic_rates &= ~BIT(i); } memcpy(sdata->vif.bss_conf.mcast_rate, params->mcast_rate, sizeof(params->mcast_rate)); sdata->vif.bss_conf.beacon_int = params->beacon_interval; sdata->u.ibss.chandef = params->chandef; sdata->u.ibss.fixed_channel = params->channel_fixed; if (params->ie) { sdata->u.ibss.ie = kmemdup(params->ie, params->ie_len, GFP_KERNEL); if (sdata->u.ibss.ie) sdata->u.ibss.ie_len = params->ie_len; } sdata->u.ibss.state = IEEE80211_IBSS_MLME_SEARCH; sdata->u.ibss.ibss_join_req = jiffies; memcpy(sdata->u.ibss.ssid, params->ssid, params->ssid_len); sdata->u.ibss.ssid_len = params->ssid_len; memcpy(&sdata->u.ibss.ht_capa, &params->ht_capa, sizeof(sdata->u.ibss.ht_capa)); memcpy(&sdata->u.ibss.ht_capa_mask, &params->ht_capa_mask, sizeof(sdata->u.ibss.ht_capa_mask)); /* * 802.11n-2009 9.13.3.1: In an IBSS, the HT Protection field is * reserved, but an HT STA shall protect HT transmissions as though * the HT Protection field were set to non-HT mixed mode. * * In an IBSS, the RIFS Mode field of the HT Operation element is * also reserved, but an HT STA shall operate as though this field * were set to 1. */ sdata->vif.bss_conf.ht_operation_mode |= IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED | IEEE80211_HT_PARAM_RIFS_MODE; changed |= BSS_CHANGED_HT | BSS_CHANGED_MCAST_RATE; ieee80211_link_info_change_notify(sdata, &sdata->deflink, changed); sdata->deflink.smps_mode = IEEE80211_SMPS_OFF; sdata->deflink.needed_rx_chains = local->rx_chains; sdata->control_port_over_nl80211 = params->control_port_over_nl80211; wiphy_work_queue(local->hw.wiphy, &sdata->work); return 0; } int ieee80211_ibss_leave(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; ieee80211_ibss_disconnect(sdata); ifibss->ssid_len = 0; eth_zero_addr(ifibss->bssid); /* remove beacon */ kfree(sdata->u.ibss.ie); sdata->u.ibss.ie = NULL; sdata->u.ibss.ie_len = 0; /* on the next join, re-program HT parameters */ memset(&ifibss->ht_capa, 0, sizeof(ifibss->ht_capa)); memset(&ifibss->ht_capa_mask, 0, sizeof(ifibss->ht_capa_mask)); synchronize_rcu(); skb_queue_purge(&sdata->skb_queue); del_timer_sync(&sdata->u.ibss.timer); return 0; }
// SPDX-License-Identifier: GPL-2.0-or-later #include <linux/syscalls.h> #include <linux/time_namespace.h> #include "futex.h" /* * Support for robust futexes: the kernel cleans up held futexes at * thread exit time. * * Implementation: user-space maintains a per-thread list of locks it * is holding. Upon do_exit(), the kernel carefully walks this list, * and marks all locks that are owned by this thread with the * FUTEX_OWNER_DIED bit, and wakes up a waiter (if any). The list is * always manipulated with the lock held, so the list is private and * per-thread. Userspace also maintains a per-thread 'list_op_pending' * field, to allow the kernel to clean up if the thread dies after * acquiring the lock, but just before it could have added itself to * the list. There can only be one such pending lock.
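 *
 * A rough sketch of the userspace side (struct layout from the futex uapi
 * header; the futex_offset value is only a placeholder for whatever offset
 * the locking library really uses between a list entry and its futex word):
 *
 *	static __thread struct robust_list_head head;
 *
 *	head.list.next       = &head.list;	// empty list points at itself
 *	head.futex_offset    = 0;		// entry address -> futex word
 *	head.list_op_pending = NULL;
 *	syscall(SYS_set_robust_list, &head, sizeof(head));
 *
 * Each acquisition then links the lock's robust_list entry into head.list and
 * each release unlinks it, with list_op_pending set across the window where
 * the futex word is already owned but the entry is not yet (or no longer)
 * linked.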
*/ /** * sys_set_robust_list() - Set the robust-futex list head of a task * @head: pointer to the list-head * @len: length of the list-head, as userspace expects */ SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head, size_t, len) { /* * The kernel knows only one size for now: */ if (unlikely(len != sizeof(*head))) return -EINVAL; current->robust_list = head; return 0; } /** * sys_get_robust_list() - Get the robust-futex list head of a task * @pid: pid of the process [zero for current task] * @head_ptr: pointer to a list-head pointer, the kernel fills it in * @len_ptr: pointer to a length field, the kernel fills in the header size */ SYSCALL_DEFINE3(get_robust_list, int, pid, struct robust_list_head __user * __user *, head_ptr, size_t __user *, len_ptr) { struct robust_list_head __user *head; unsigned long ret; struct task_struct *p; rcu_read_lock(); ret = -ESRCH; if (!pid) p = current; else { p = find_task_by_vpid(pid); if (!p) goto err_unlock; } ret = -EPERM; if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS)) goto err_unlock; head = p->robust_list; rcu_read_unlock(); if (put_user(sizeof(*head), len_ptr)) return -EFAULT; return put_user(head, head_ptr); err_unlock: rcu_read_unlock(); return ret; } long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout, u32 __user *uaddr2, u32 val2, u32 val3) { unsigned int flags = futex_to_flags(op); int cmd = op & FUTEX_CMD_MASK; if (flags & FLAGS_CLOCKRT) { if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI && cmd != FUTEX_LOCK_PI2) return -ENOSYS; } switch (cmd) { case FUTEX_WAIT: val3 = FUTEX_BITSET_MATCH_ANY; fallthrough; case FUTEX_WAIT_BITSET: return futex_wait(uaddr, flags, val, timeout, val3); case FUTEX_WAKE: val3 = FUTEX_BITSET_MATCH_ANY; fallthrough; case FUTEX_WAKE_BITSET: return futex_wake(uaddr, flags, val, val3); case FUTEX_REQUEUE: return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, NULL, 0); case FUTEX_CMP_REQUEUE: return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 0); case FUTEX_WAKE_OP: return futex_wake_op(uaddr, flags, uaddr2, val, val2, val3); case FUTEX_LOCK_PI: flags |= FLAGS_CLOCKRT; fallthrough; case FUTEX_LOCK_PI2: return futex_lock_pi(uaddr, flags, timeout, 0); case FUTEX_UNLOCK_PI: return futex_unlock_pi(uaddr, flags); case FUTEX_TRYLOCK_PI: return futex_lock_pi(uaddr, flags, NULL, 1); case FUTEX_WAIT_REQUEUE_PI: val3 = FUTEX_BITSET_MATCH_ANY; return futex_wait_requeue_pi(uaddr, flags, val, timeout, val3, uaddr2); case FUTEX_CMP_REQUEUE_PI: return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 1); } return -ENOSYS; } static __always_inline bool futex_cmd_has_timeout(u32 cmd) { switch (cmd) { case FUTEX_WAIT: case FUTEX_LOCK_PI: case FUTEX_LOCK_PI2: case FUTEX_WAIT_BITSET: case FUTEX_WAIT_REQUEUE_PI: return true; } return false; } static __always_inline int futex_init_timeout(u32 cmd, u32 op, struct timespec64 *ts, ktime_t *t) { if (!timespec64_valid(ts)) return -EINVAL; *t = timespec64_to_ktime(*ts); if (cmd == FUTEX_WAIT) *t = ktime_add_safe(ktime_get(), *t); else if (cmd != FUTEX_LOCK_PI && !(op & FUTEX_CLOCK_REALTIME)) *t = timens_ktime_to_host(CLOCK_MONOTONIC, *t); return 0; } SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val, const struct __kernel_timespec __user *, utime, u32 __user *, uaddr2, u32, val3) { int ret, cmd = op & FUTEX_CMD_MASK; ktime_t t, *tp = NULL; struct timespec64 ts; if (utime && futex_cmd_has_timeout(cmd)) { if (unlikely(should_fail_futex(!(op & FUTEX_PRIVATE_FLAG)))) return -EFAULT; if 
(get_timespec64(&ts, utime)) return -EFAULT; ret = futex_init_timeout(cmd, op, &ts, &t); if (ret) return ret; tp = &t; } return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3); } /** * futex_parse_waitv - Parse a waitv array from userspace * @futexv: Kernel side list of waiters to be filled * @uwaitv: Userspace list to be parsed * @nr_futexes: Length of futexv * @wake: Wake to call when futex is woken * @wake_data: Data for the wake handler * * Return: Error code on failure, 0 on success */ int futex_parse_waitv(struct futex_vector *futexv, struct futex_waitv __user *uwaitv, unsigned int nr_futexes, futex_wake_fn *wake, void *wake_data) { struct futex_waitv aux; unsigned int i; for (i = 0; i < nr_futexes; i++) { unsigned int flags; if (copy_from_user(&aux, &uwaitv[i], sizeof(aux))) return -EFAULT; if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved) return -EINVAL; flags = futex2_to_flags(aux.flags); if (!futex_flags_valid(flags)) return -EINVAL; if (!futex_validate_input(flags, aux.val)) return -EINVAL; futexv[i].w.flags = flags; futexv[i].w.val = aux.val; futexv[i].w.uaddr = aux.uaddr; futexv[i].q = futex_q_init; futexv[i].q.wake = wake; futexv[i].q.wake_data = wake_data; } return 0; } static int futex2_setup_timeout(struct __kernel_timespec __user *timeout, clockid_t clockid, struct hrtimer_sleeper *to) { int flag_clkid = 0, flag_init = 0; struct timespec64 ts; ktime_t time; int ret; if (!timeout) return 0; if (clockid == CLOCK_REALTIME) { flag_clkid = FLAGS_CLOCKRT; flag_init = FUTEX_CLOCK_REALTIME; } if (clockid != CLOCK_REALTIME && clockid != CLOCK_MONOTONIC) return -EINVAL; if (get_timespec64(&ts, timeout)) return -EFAULT; /* * Since there's no opcode for futex_waitv, use * FUTEX_WAIT_BITSET that uses absolute timeout as well */ ret = futex_init_timeout(FUTEX_WAIT_BITSET, flag_init, &ts, &time); if (ret) return ret; futex_setup_timer(&time, to, flag_clkid, 0); return 0; } static inline void futex2_destroy_timeout(struct hrtimer_sleeper *to) { hrtimer_cancel(&to->timer); destroy_hrtimer_on_stack(&to->timer); } /** * sys_futex_waitv - Wait on a list of futexes * @waiters: List of futexes to wait on * @nr_futexes: Length of futexv * @flags: Flag for timeout (monotonic/realtime) * @timeout: Optional absolute timeout. * @clockid: Clock to be used for the timeout, realtime or monotonic. * * Given an array of `struct futex_waitv`, wait on each uaddr. The thread wakes * if a futex_wake() is performed at any uaddr. The syscall returns immediately * if any waiter has *uaddr != val. *timeout is an optional timeout value for * the operation. Each waiter has individual flags. The `flags` argument for * the syscall should be used solely for specifying the timeout as realtime, if * needed. Flags for private futexes, sizes, etc. should be used on the * individual flags of each waiter. * * Returns the array index of one of the woken futexes. No further information * is provided: any number of other futexes may also have been woken by the * same event, and if more than one futex was woken, the returned index may * refer to any one of them. (It is not necessarily the futex with the * smallest index, nor the one most recently woken, nor...)
*/ SYSCALL_DEFINE5(futex_waitv, struct futex_waitv __user *, waiters, unsigned int, nr_futexes, unsigned int, flags, struct __kernel_timespec __user *, timeout, clockid_t, clockid) { struct hrtimer_sleeper to; struct futex_vector *futexv; int ret; /* This syscall supports no flags for now */ if (flags) return -EINVAL; if (!nr_futexes || nr_futexes > FUTEX_WAITV_MAX || !waiters) return -EINVAL; if (timeout && (ret = futex2_setup_timeout(timeout, clockid, &to))) return ret; futexv = kcalloc(nr_futexes, sizeof(*futexv), GFP_KERNEL); if (!futexv) { ret = -ENOMEM; goto destroy_timer; } ret = futex_parse_waitv(futexv, waiters, nr_futexes, futex_wake_mark, NULL); if (!ret) ret = futex_wait_multiple(futexv, nr_futexes, timeout ? &to : NULL); kfree(futexv); destroy_timer: if (timeout) futex2_destroy_timeout(&to); return ret; } /* * sys_futex_wake - Wake a number of futexes * @uaddr: Address of the futex(es) to wake * @mask: bitmask * @nr: Number of the futexes to wake * @flags: FUTEX2 flags * * Identical to the traditional FUTEX_WAKE_BITSET op, except it is part of the * futex2 family of calls. */ SYSCALL_DEFINE4(futex_wake, void __user *, uaddr, unsigned long, mask, int, nr, unsigned int, flags) { if (flags & ~FUTEX2_VALID_MASK) return -EINVAL; flags = futex2_to_flags(flags); if (!futex_flags_valid(flags)) return -EINVAL; if (!futex_validate_input(flags, mask)) return -EINVAL; return futex_wake(uaddr, FLAGS_STRICT | flags, nr, mask); } /* * sys_futex_wait - Wait on a futex * @uaddr: Address of the futex to wait on * @val: Value of @uaddr * @mask: bitmask * @flags: FUTEX2 flags * @timeout: Optional absolute timeout * @clockid: Clock to be used for the timeout, realtime or monotonic * * Identical to the traditional FUTEX_WAIT_BITSET op, except it is part of the * futex2 family of calls. */ SYSCALL_DEFINE6(futex_wait, void __user *, uaddr, unsigned long, val, unsigned long, mask, unsigned int, flags, struct __kernel_timespec __user *, timeout, clockid_t, clockid) { struct hrtimer_sleeper to; int ret; if (flags & ~FUTEX2_VALID_MASK) return -EINVAL; flags = futex2_to_flags(flags); if (!futex_flags_valid(flags)) return -EINVAL; if (!futex_validate_input(flags, val) || !futex_validate_input(flags, mask)) return -EINVAL; if (timeout && (ret = futex2_setup_timeout(timeout, clockid, &to))) return ret; ret = __futex_wait(uaddr, flags, val, timeout ? &to : NULL, mask); if (timeout) futex2_destroy_timeout(&to); return ret; } /* * sys_futex_requeue - Requeue a waiter from one futex to another * @waiters: array describing the source and destination futex * @flags: unused * @nr_wake: number of futexes to wake * @nr_requeue: number of futexes to requeue * * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of the * futex2 family of calls.
*/ SYSCALL_DEFINE4(futex_requeue, struct futex_waitv __user *, waiters, unsigned int, flags, int, nr_wake, int, nr_requeue) { struct futex_vector futexes[2]; u32 cmpval; int ret; if (flags) return -EINVAL; if (!waiters) return -EINVAL; ret = futex_parse_waitv(futexes, waiters, 2, futex_wake_mark, NULL); if (ret) return ret; cmpval = futexes[0].w.val; return futex_requeue(u64_to_user_ptr(futexes[0].w.uaddr), futexes[0].w.flags, u64_to_user_ptr(futexes[1].w.uaddr), futexes[1].w.flags, nr_wake, nr_requeue, &cmpval, 0); } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(set_robust_list, struct compat_robust_list_head __user *, head, compat_size_t, len) { if (unlikely(len != sizeof(*head))) return -EINVAL; current->compat_robust_list = head; return 0; } COMPAT_SYSCALL_DEFINE3(get_robust_list, int, pid, compat_uptr_t __user *, head_ptr, compat_size_t __user *, len_ptr) { struct compat_robust_list_head __user *head; unsigned long ret; struct task_struct *p; rcu_read_lock(); ret = -ESRCH; if (!pid) p = current; else { p = find_task_by_vpid(pid); if (!p) goto err_unlock; } ret = -EPERM; if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS)) goto err_unlock; head = p->compat_robust_list; rcu_read_unlock(); if (put_user(sizeof(*head), len_ptr)) return -EFAULT; return put_user(ptr_to_compat(head), head_ptr); err_unlock: rcu_read_unlock(); return ret; } #endif /* CONFIG_COMPAT */ #ifdef CONFIG_COMPAT_32BIT_TIME SYSCALL_DEFINE6(futex_time32, u32 __user *, uaddr, int, op, u32, val, const struct old_timespec32 __user *, utime, u32 __user *, uaddr2, u32, val3) { int ret, cmd = op & FUTEX_CMD_MASK; ktime_t t, *tp = NULL; struct timespec64 ts; if (utime && futex_cmd_has_timeout(cmd)) { if (get_old_timespec32(&ts, utime)) return -EFAULT; ret = futex_init_timeout(cmd, op, &ts, &t); if (ret) return ret; tp = &t; } return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3); } #endif /* CONFIG_COMPAT_32BIT_TIME */ |
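/*
 * Illustrative userspace sketch (not kernel code): the classic wait/wake
 * pairing served by do_futex() above, assuming the glibc syscall(2) wrapper
 * and the constants from <linux/futex.h>. Real callers must loop on the
 * return value and check errno (e.g. EAGAIN when the futex word no longer
 * holds the expected value); that handling is omitted here for brevity.
 */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <stddef.h>

static long example_futex(uint32_t *uaddr, int op, uint32_t val)
{
	/* timeout, uaddr2 and val3 are unused for plain WAIT/WAKE */
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void example_wait(uint32_t *futex_word, uint32_t expected)
{
	/* Sleeps only while *futex_word == expected (FUTEX_WAIT semantics). */
	example_futex(futex_word, FUTEX_WAIT_PRIVATE, expected);
}

static void example_wake_one(uint32_t *futex_word)
{
	/* Wakes at most one waiter blocked on futex_word. */
	example_futex(futex_word, FUTEX_WAKE_PRIVATE, 1);
}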
// SPDX-License-Identifier: GPL-2.0-only
/* * proc/fs/generic.c --- generic routines for the proc-fs * * This file contains generic proc-fs routines for handling * directories and files. * * Copyright (C) 1991, 1992 Linus Torvalds. * Copyright (C) 1997 Theodore Ts'o */ #include <linux/cache.h> #include <linux/errno.h> #include <linux/time.h> #include <linux/proc_fs.h> #include <linux/stat.h> #include <linux/mm.h> #include <linux/module.h> #include <linux/namei.h> #include <linux/slab.h> #include <linux/printk.h> #include <linux/mount.h> #include <linux/init.h> #include <linux/idr.h> #include <linux/bitops.h> #include <linux/spinlock.h> #include <linux/completion.h> #include <linux/uaccess.h> #include <linux/seq_file.h> #include "internal.h" static DEFINE_RWLOCK(proc_subdir_lock); struct kmem_cache *proc_dir_entry_cache __ro_after_init; void pde_free(struct proc_dir_entry *pde) { if (S_ISLNK(pde->mode)) kfree(pde->data); if (pde->name != pde->inline_name) kfree(pde->name); kmem_cache_free(proc_dir_entry_cache, pde); } static int proc_match(const char *name, struct proc_dir_entry *de, unsigned int len) { if (len < de->namelen) return -1; if (len > de->namelen) return 1; return memcmp(name, de->name, len); } static struct proc_dir_entry *pde_subdir_first(struct proc_dir_entry *dir) { return rb_entry_safe(rb_first(&dir->subdir), struct proc_dir_entry, subdir_node); } static struct proc_dir_entry *pde_subdir_next(struct proc_dir_entry *dir) { return rb_entry_safe(rb_next(&dir->subdir_node), struct proc_dir_entry, subdir_node); } static struct proc_dir_entry *pde_subdir_find(struct proc_dir_entry *dir, const char *name, unsigned int len) { struct rb_node *node = dir->subdir.rb_node; while (node) { struct proc_dir_entry *de = rb_entry(node, struct proc_dir_entry, subdir_node); int result = proc_match(name, de, len); if (result < 0) node = node->rb_left; else if (result > 0) node = node->rb_right; else return de; } return NULL; } static bool pde_subdir_insert(struct proc_dir_entry *dir, struct proc_dir_entry *de) { struct rb_root *root = &dir->subdir; struct rb_node **new = &root->rb_node, *parent = NULL; /* Figure out where to put new node */ while (*new) { struct proc_dir_entry *this = rb_entry(*new, struct proc_dir_entry, subdir_node); int result = proc_match(de->name, this, de->namelen); parent = *new; if (result < 0) new = &(*new)->rb_left; else if (result > 0) new = &(*new)->rb_right; else return false; } /* Add new node and rebalance tree. 
*/ rb_link_node(&de->subdir_node, parent, new); rb_insert_color(&de->subdir_node, root); return true; } static int proc_notify_change(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *iattr) { struct inode *inode = d_inode(dentry); struct proc_dir_entry *de = PDE(inode); int error; error = setattr_prepare(&nop_mnt_idmap, dentry, iattr); if (error) return error; setattr_copy(&nop_mnt_idmap, inode, iattr); proc_set_user(de, inode->i_uid, inode->i_gid); de->mode = inode->i_mode; return 0; } static int proc_getattr(struct mnt_idmap *idmap, const struct path *path, struct kstat *stat, u32 request_mask, unsigned int query_flags) { struct inode *inode = d_inode(path->dentry); struct proc_dir_entry *de = PDE(inode); if (de) { nlink_t nlink = READ_ONCE(de->nlink); if (nlink > 0) { set_nlink(inode, nlink); } } generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat); return 0; } static const struct inode_operations proc_file_inode_operations = { .setattr = proc_notify_change, }; /* * This function parses a name such as "tty/driver/serial", and * returns the struct proc_dir_entry for "/proc/tty/driver", and * returns "serial" in residual. */ static int __xlate_proc_name(const char *name, struct proc_dir_entry **ret, const char **residual) { const char *cp = name, *next; struct proc_dir_entry *de; de = *ret ?: &proc_root; while ((next = strchr(cp, '/')) != NULL) { de = pde_subdir_find(de, cp, next - cp); if (!de) { WARN(1, "name '%s'\n", name); return -ENOENT; } cp = next + 1; } *residual = cp; *ret = de; return 0; } static int xlate_proc_name(const char *name, struct proc_dir_entry **ret, const char **residual) { int rv; read_lock(&proc_subdir_lock); rv = __xlate_proc_name(name, ret, residual); read_unlock(&proc_subdir_lock); return rv; } static DEFINE_IDA(proc_inum_ida); #define PROC_DYNAMIC_FIRST 0xF0000000U /* * Return an inode number between PROC_DYNAMIC_FIRST and * 0xffffffff, or zero on failure. */ int proc_alloc_inum(unsigned int *inum) { int i; i = ida_alloc_max(&proc_inum_ida, UINT_MAX - PROC_DYNAMIC_FIRST, GFP_KERNEL); if (i < 0) return i; *inum = PROC_DYNAMIC_FIRST + (unsigned int)i; return 0; } void proc_free_inum(unsigned int inum) { ida_free(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST); } static int proc_misc_d_revalidate(struct dentry *dentry, unsigned int flags) { if (flags & LOOKUP_RCU) return -ECHILD; if (atomic_read(&PDE(d_inode(dentry))->in_use) < 0) return 0; /* revalidate */ return 1; } static int proc_misc_d_delete(const struct dentry *dentry) { return atomic_read(&PDE(d_inode(dentry))->in_use) < 0; } static const struct dentry_operations proc_misc_dentry_ops = { .d_revalidate = proc_misc_d_revalidate, .d_delete = proc_misc_d_delete, }; /* * Don't create negative dentries here, return -ENOENT by hand * instead. 
*/ struct dentry *proc_lookup_de(struct inode *dir, struct dentry *dentry, struct proc_dir_entry *de) { struct inode *inode; read_lock(&proc_subdir_lock); de = pde_subdir_find(de, dentry->d_name.name, dentry->d_name.len); if (de) { pde_get(de); read_unlock(&proc_subdir_lock); inode = proc_get_inode(dir->i_sb, de); if (!inode) return ERR_PTR(-ENOMEM); d_set_d_op(dentry, de->proc_dops); return d_splice_alias(inode, dentry); } read_unlock(&proc_subdir_lock); return ERR_PTR(-ENOENT); } struct dentry *proc_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) { struct proc_fs_info *fs_info = proc_sb_info(dir->i_sb); if (fs_info->pidonly == PROC_PIDONLY_ON) return ERR_PTR(-ENOENT); return proc_lookup_de(dir, dentry, PDE(dir)); } /* * This returns non-zero if at EOF, so that the /proc * root directory can use this and check if it should * continue with the <pid> entries.. * * Note that the VFS-layer doesn't care about the return * value of the readdir() call, as long as it's non-negative * for success.. */ int proc_readdir_de(struct file *file, struct dir_context *ctx, struct proc_dir_entry *de) { int i; if (!dir_emit_dots(file, ctx)) return 0; i = ctx->pos - 2; read_lock(&proc_subdir_lock); de = pde_subdir_first(de); for (;;) { if (!de) { read_unlock(&proc_subdir_lock); return 0; } if (!i) break; de = pde_subdir_next(de); i--; } do { struct proc_dir_entry *next; pde_get(de); read_unlock(&proc_subdir_lock); if (!dir_emit(ctx, de->name, de->namelen, de->low_ino, de->mode >> 12)) { pde_put(de); return 0; } ctx->pos++; read_lock(&proc_subdir_lock); next = pde_subdir_next(de); pde_put(de); de = next; } while (de); read_unlock(&proc_subdir_lock); return 1; } int proc_readdir(struct file *file, struct dir_context *ctx) { struct inode *inode = file_inode(file); struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb); if (fs_info->pidonly == PROC_PIDONLY_ON) return 1; return proc_readdir_de(file, ctx, PDE(inode)); } /* * These are the generic /proc directory operations. They * use the in-memory "struct proc_dir_entry" tree to parse * the /proc directory. */ static const struct file_operations proc_dir_operations = { .llseek = generic_file_llseek, .read = generic_read_dir, .iterate_shared = proc_readdir, }; static int proc_net_d_revalidate(struct dentry *dentry, unsigned int flags) { return 0; } const struct dentry_operations proc_net_dentry_ops = { .d_revalidate = proc_net_d_revalidate, .d_delete = always_delete_dentry, }; /* * proc directories can do almost nothing.. 
*/ static const struct inode_operations proc_dir_inode_operations = { .lookup = proc_lookup, .getattr = proc_getattr, .setattr = proc_notify_change, }; /* returns the registered entry, or frees dp and returns NULL on failure */ struct proc_dir_entry *proc_register(struct proc_dir_entry *dir, struct proc_dir_entry *dp) { if (proc_alloc_inum(&dp->low_ino)) goto out_free_entry; write_lock(&proc_subdir_lock); dp->parent = dir; if (pde_subdir_insert(dir, dp) == false) { WARN(1, "proc_dir_entry '%s/%s' already registered\n", dir->name, dp->name); write_unlock(&proc_subdir_lock); goto out_free_inum; } dir->nlink++; write_unlock(&proc_subdir_lock); return dp; out_free_inum: proc_free_inum(dp->low_ino); out_free_entry: pde_free(dp); return NULL; } static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent, const char *name, umode_t mode, nlink_t nlink) { struct proc_dir_entry *ent = NULL; const char *fn; struct qstr qstr; if (xlate_proc_name(name, parent, &fn) != 0) goto out; qstr.name = fn; qstr.len = strlen(fn); if (qstr.len == 0 || qstr.len >= 256) { WARN(1, "name len %u\n", qstr.len); return NULL; } if (qstr.len == 1 && fn[0] == '.') { WARN(1, "name '.'\n"); return NULL; } if (qstr.len == 2 && fn[0] == '.' && fn[1] == '.') { WARN(1, "name '..'\n"); return NULL; } if (*parent == &proc_root && name_to_int(&qstr) != ~0U) { WARN(1, "create '/proc/%s' by hand\n", qstr.name); return NULL; } if (is_empty_pde(*parent)) { WARN(1, "attempt to add to permanently empty directory"); return NULL; } ent = kmem_cache_zalloc(proc_dir_entry_cache, GFP_KERNEL); if (!ent) goto out; if (qstr.len + 1 <= SIZEOF_PDE_INLINE_NAME) { ent->name = ent->inline_name; } else { ent->name = kmalloc(qstr.len + 1, GFP_KERNEL); if (!ent->name) { pde_free(ent); return NULL; } } memcpy(ent->name, fn, qstr.len + 1); ent->namelen = qstr.len; ent->mode = mode; ent->nlink = nlink; ent->subdir = RB_ROOT; refcount_set(&ent->refcnt, 1); spin_lock_init(&ent->pde_unload_lock); INIT_LIST_HEAD(&ent->pde_openers); proc_set_user(ent, (*parent)->uid, (*parent)->gid); ent->proc_dops = &proc_misc_dentry_ops; /* Revalidate everything under /proc/${pid}/net */ if ((*parent)->proc_dops == &proc_net_dentry_ops) pde_force_lookup(ent); out: return ent; } struct proc_dir_entry *proc_symlink(const char *name, struct proc_dir_entry *parent, const char *dest) { struct proc_dir_entry *ent; ent = __proc_create(&parent, name, (S_IFLNK | S_IRUGO | S_IWUGO | S_IXUGO),1); if (ent) { ent->data = kmalloc((ent->size=strlen(dest))+1, GFP_KERNEL); if (ent->data) { strcpy((char*)ent->data,dest); ent->proc_iops = &proc_link_inode_operations; ent = proc_register(parent, ent); } else { pde_free(ent); ent = NULL; } } return ent; } EXPORT_SYMBOL(proc_symlink); struct proc_dir_entry *_proc_mkdir(const char *name, umode_t mode, struct proc_dir_entry *parent, void *data, bool force_lookup) { struct proc_dir_entry *ent; if (mode == 0) mode = S_IRUGO | S_IXUGO; ent = __proc_create(&parent, name, S_IFDIR | mode, 2); if (ent) { ent->data = data; ent->proc_dir_ops = &proc_dir_operations; ent->proc_iops = &proc_dir_inode_operations; if (force_lookup) { pde_force_lookup(ent); } ent = proc_register(parent, ent); } return ent; } EXPORT_SYMBOL_GPL(_proc_mkdir); struct proc_dir_entry *proc_mkdir_data(const char *name, umode_t mode, struct proc_dir_entry *parent, void *data) { return _proc_mkdir(name, mode, parent, data, false); } EXPORT_SYMBOL_GPL(proc_mkdir_data); struct proc_dir_entry *proc_mkdir_mode(const char *name, umode_t mode, struct proc_dir_entry *parent) { 
return proc_mkdir_data(name, mode, parent, NULL); } EXPORT_SYMBOL(proc_mkdir_mode); struct proc_dir_entry *proc_mkdir(const char *name, struct proc_dir_entry *parent) { return proc_mkdir_data(name, 0, parent, NULL); } EXPORT_SYMBOL(proc_mkdir); struct proc_dir_entry *proc_create_mount_point(const char *name) { umode_t mode = S_IFDIR | S_IRUGO | S_IXUGO; struct proc_dir_entry *ent, *parent = NULL; ent = __proc_create(&parent, name, mode, 2); if (ent) { ent->data = NULL; ent->proc_dir_ops = NULL; ent->proc_iops = NULL; ent = proc_register(parent, ent); } return ent; } EXPORT_SYMBOL(proc_create_mount_point); struct proc_dir_entry *proc_create_reg(const char *name, umode_t mode, struct proc_dir_entry **parent, void *data) { struct proc_dir_entry *p; if ((mode & S_IFMT) == 0) mode |= S_IFREG; if ((mode & S_IALLUGO) == 0) mode |= S_IRUGO; if (WARN_ON_ONCE(!S_ISREG(mode))) return NULL; p = __proc_create(parent, name, mode, 1); if (p) { p->proc_iops = &proc_file_inode_operations; p->data = data; } return p; } static inline void pde_set_flags(struct proc_dir_entry *pde) { if (pde->proc_ops->proc_flags & PROC_ENTRY_PERMANENT) pde->flags |= PROC_ENTRY_PERMANENT; } struct proc_dir_entry *proc_create_data(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops, void *data) { struct proc_dir_entry *p; p = proc_create_reg(name, mode, &parent, data); if (!p) return NULL; p->proc_ops = proc_ops; pde_set_flags(p); return proc_register(parent, p); } EXPORT_SYMBOL(proc_create_data); struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops) { return proc_create_data(name, mode, parent, proc_ops, NULL); } EXPORT_SYMBOL(proc_create); static int proc_seq_open(struct inode *inode, struct file *file) { struct proc_dir_entry *de = PDE(inode); if (de->state_size) return seq_open_private(file, de->seq_ops, de->state_size); return seq_open(file, de->seq_ops); } static int proc_seq_release(struct inode *inode, struct file *file) { struct proc_dir_entry *de = PDE(inode); if (de->state_size) return seq_release_private(inode, file); return seq_release(inode, file); } static const struct proc_ops proc_seq_ops = { /* not permanent -- can call into arbitrary seq_operations */ .proc_open = proc_seq_open, .proc_read_iter = seq_read_iter, .proc_lseek = seq_lseek, .proc_release = proc_seq_release, }; struct proc_dir_entry *proc_create_seq_private(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct seq_operations *ops, unsigned int state_size, void *data) { struct proc_dir_entry *p; p = proc_create_reg(name, mode, &parent, data); if (!p) return NULL; p->proc_ops = &proc_seq_ops; p->seq_ops = ops; p->state_size = state_size; return proc_register(parent, p); } EXPORT_SYMBOL(proc_create_seq_private); static int proc_single_open(struct inode *inode, struct file *file) { struct proc_dir_entry *de = PDE(inode); return single_open(file, de->single_show, de->data); } static const struct proc_ops proc_single_ops = { /* not permanent -- can call into arbitrary ->single_show */ .proc_open = proc_single_open, .proc_read_iter = seq_read_iter, .proc_lseek = seq_lseek, .proc_release = single_release, }; struct proc_dir_entry *proc_create_single_data(const char *name, umode_t mode, struct proc_dir_entry *parent, int (*show)(struct seq_file *, void *), void *data) { struct proc_dir_entry *p; p = proc_create_reg(name, mode, &parent, data); if (!p) return NULL; p->proc_ops = &proc_single_ops; p->single_show = show; 
return proc_register(parent, p); } EXPORT_SYMBOL(proc_create_single_data); void proc_set_size(struct proc_dir_entry *de, loff_t size) { de->size = size; } EXPORT_SYMBOL(proc_set_size); void proc_set_user(struct proc_dir_entry *de, kuid_t uid, kgid_t gid) { de->uid = uid; de->gid = gid; } EXPORT_SYMBOL(proc_set_user); void pde_put(struct proc_dir_entry *pde) { if (refcount_dec_and_test(&pde->refcnt)) { proc_free_inum(pde->low_ino); pde_free(pde); } } /* * Remove a /proc entry and free it if it's not currently in use. */ void remove_proc_entry(const char *name, struct proc_dir_entry *parent) { struct proc_dir_entry *de = NULL; const char *fn = name; unsigned int len; write_lock(&proc_subdir_lock); if (__xlate_proc_name(name, &parent, &fn) != 0) { write_unlock(&proc_subdir_lock); return; } len = strlen(fn); de = pde_subdir_find(parent, fn, len); if (de) { if (unlikely(pde_is_permanent(de))) { WARN(1, "removing permanent /proc entry '%s'", de->name); de = NULL; } else { rb_erase(&de->subdir_node, &parent->subdir); if (S_ISDIR(de->mode)) parent->nlink--; } } write_unlock(&proc_subdir_lock); if (!de) { WARN(1, "name '%s'\n", name); return; } proc_entry_rundown(de); WARN(pde_subdir_first(de), "%s: removing non-empty directory '%s/%s', leaking at least '%s'\n", __func__, de->parent->name, de->name, pde_subdir_first(de)->name); pde_put(de); } EXPORT_SYMBOL(remove_proc_entry); int remove_proc_subtree(const char *name, struct proc_dir_entry *parent) { struct proc_dir_entry *root = NULL, *de, *next; const char *fn = name; unsigned int len; write_lock(&proc_subdir_lock); if (__xlate_proc_name(name, &parent, &fn) != 0) { write_unlock(&proc_subdir_lock); return -ENOENT; } len = strlen(fn); root = pde_subdir_find(parent, fn, len); if (!root) { write_unlock(&proc_subdir_lock); return -ENOENT; } if (unlikely(pde_is_permanent(root))) { write_unlock(&proc_subdir_lock); WARN(1, "removing permanent /proc entry '%s/%s'", root->parent->name, root->name); return -EINVAL; } rb_erase(&root->subdir_node, &parent->subdir); de = root; while (1) { next = pde_subdir_first(de); if (next) { if (unlikely(pde_is_permanent(next))) { write_unlock(&proc_subdir_lock); WARN(1, "removing permanent /proc entry '%s/%s'", next->parent->name, next->name); return -EINVAL; } rb_erase(&next->subdir_node, &de->subdir); de = next; continue; } next = de->parent; if (S_ISDIR(de->mode)) next->nlink--; write_unlock(&proc_subdir_lock); proc_entry_rundown(de); if (de == root) break; pde_put(de); write_lock(&proc_subdir_lock); de = next; } pde_put(root); return 0; } EXPORT_SYMBOL(remove_proc_subtree); void *proc_get_parent_data(const struct inode *inode) { struct proc_dir_entry *de = PDE(inode); return de->parent->data; } EXPORT_SYMBOL_GPL(proc_get_parent_data); void proc_remove(struct proc_dir_entry *de) { if (de) remove_proc_subtree(de->name, de->parent); } EXPORT_SYMBOL(proc_remove); /* * Pull a user buffer into memory and pass it to the file's write handler if * one is supplied. The ->write() method is permitted to modify the * kernel-side buffer. */ ssize_t proc_simple_write(struct file *f, const char __user *ubuf, size_t size, loff_t *_pos) { struct proc_dir_entry *pde = PDE(file_inode(f)); char *buf; int ret; if (!pde->write) return -EACCES; if (size == 0 || size > PAGE_SIZE - 1) return -EINVAL; buf = memdup_user_nul(ubuf, size); if (IS_ERR(buf)) return PTR_ERR(buf); ret = pde->write(f, buf, size); kfree(buf); return ret == 0 ? size : ret; } |
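/*
 * Illustrative module sketch (not part of fs/proc) showing how the
 * interfaces implemented above are typically consumed: create a directory,
 * put a read-only single-show file in it, and tear the subtree down on
 * exit. The directory name "example_dir", the file name "status" and
 * example_show() are assumptions made purely for illustration.
 */
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

static struct proc_dir_entry *example_dir;

static int example_show(struct seq_file *m, void *v)
{
	seq_puts(m, "hello from /proc\n");
	return 0;
}

static int __init example_proc_init(void)
{
	example_dir = proc_mkdir("example_dir", NULL);
	if (!example_dir)
		return -ENOMEM;

	/* proc_create_single_data() wires example_show() up via proc_single_ops */
	if (!proc_create_single_data("status", 0444, example_dir,
				     example_show, NULL)) {
		remove_proc_entry("example_dir", NULL);
		return -ENOMEM;
	}
	return 0;
}

static void __exit example_proc_exit(void)
{
	remove_proc_subtree("example_dir", NULL);
}

module_init(example_proc_init);
module_exit(example_proc_exit);
MODULE_LICENSE("GPL");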
#ifndef _NF_TPROXY_H_ #define _NF_TPROXY_H_ #include <net/tcp.h> enum nf_tproxy_lookup_t { NF_TPROXY_LOOKUP_LISTENER, NF_TPROXY_LOOKUP_ESTABLISHED, }; static inline bool nf_tproxy_sk_is_transparent(struct sock *sk) { if (inet_sk_transparent(sk)) return true; sock_gen_put(sk); return false; } static inline void nf_tproxy_twsk_deschedule_put(struct inet_timewait_sock *tw) { local_bh_disable(); inet_twsk_deschedule_put(tw); local_bh_enable(); } /* assign a socket to the skb -- consumes sk */ static inline void nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk) { skb_orphan(skb); skb->sk = sk; skb->destructor = sock_edemux; } __be32 nf_tproxy_laddr4(struct sk_buff *skb, __be32 user_laddr, __be32 daddr); /** * nf_tproxy_handle_time_wait4 - handle IPv4 TCP TIME_WAIT reopen redirections * @skb: The skb being processed. * @laddr: IPv4 address to redirect to or zero. * @lport: TCP port to redirect to or zero. * @sk: The TIME_WAIT TCP socket found by the lookup. * * We have to handle SYN packets arriving to TIME_WAIT sockets * differently: instead of reopening the connection we should rather * redirect the new connection to the proxy if there's a listener * socket present. * * nf_tproxy_handle_time_wait4() consumes the socket reference passed in. * * Returns the listener socket if there's one, the TIME_WAIT socket if * no such listener is found, or NULL if the TCP header is incomplete. */ struct sock * nf_tproxy_handle_time_wait4(struct net *net, struct sk_buff *skb, __be32 laddr, __be16 lport, struct sock *sk); /* * This is used when the user wants to intercept a connection matching * an explicit iptables rule. In this case the sockets are assumed * matching in preference order: * * - match: if there's a fully established connection matching the * _packet_ tuple, it is returned, assuming the redirection * already took place and we process a packet belonging to an * established connection * * - match: if there's a listening socket matching the redirection * (e.g. on-port & on-ip of the connection), it is returned, * regardless if it was bound to 0.0.0.0 or an explicit * address. The reasoning is that if there's an explicit rule, it * does not really matter if the listener is bound to an interface * or to 0. The user already stated that he wants redirection * (since he added the rule). * * Please note that there's an overlap between what a TPROXY target * and a socket match will match. Normally if you have both rules the * "socket" match will be the first one, effectively all packets * belonging to established connections going through that one. */ struct sock * nf_tproxy_get_sock_v4(struct net *net, struct sk_buff *skb, const u8 protocol, const __be32 saddr, const __be32 daddr, const __be16 sport, const __be16 dport, const struct net_device *in, const enum nf_tproxy_lookup_t lookup_type); const struct in6_addr * nf_tproxy_laddr6(struct sk_buff *skb, const struct in6_addr *user_laddr, const struct in6_addr *daddr); /** * nf_tproxy_handle_time_wait6 - handle IPv6 TCP TIME_WAIT reopen redirections * @skb: The skb being processed. * @tproto: Transport protocol.
* @thoff: Transport protocol header offset. * @net: Network namespace. * @laddr: IPv6 address to redirect to. * @lport: TCP port to redirect to or zero. * @sk: The TIME_WAIT TCP socket found by the lookup. * * We have to handle SYN packets arriving to TIME_WAIT sockets * differently: instead of reopening the connection we should rather * redirect the new connection to the proxy if there's a listener * socket present. * * nf_tproxy_handle_time_wait6() consumes the socket reference passed in. * * Returns the listener socket if there's one, the TIME_WAIT socket if * no such listener is found, or NULL if the TCP header is incomplete. */ struct sock * nf_tproxy_handle_time_wait6(struct sk_buff *skb, int tproto, int thoff, struct net *net, const struct in6_addr *laddr, const __be16 lport, struct sock *sk); struct sock * nf_tproxy_get_sock_v6(struct net *net, struct sk_buff *skb, int thoff, const u8 protocol, const struct in6_addr *saddr, const struct in6_addr *daddr, const __be16 sport, const __be16 dport, const struct net_device *in, const enum nf_tproxy_lookup_t lookup_type); #endif /* _NF_TPROXY_H_ */ |
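/*
 * Illustrative caller sketch (not part of the kernel), modelled loosely on
 * the xt_TPROXY target, showing how the helpers declared in this header
 * combine for IPv4/TCP: try the established lookup first, divert TIME_WAIT
 * reopens through nf_tproxy_handle_time_wait4(), fall back to a listener
 * lookup, and only then assign the socket to the skb.
 * example_tproxy_lookup_v4(), tproxy_laddr and tproxy_lport are assumptions
 * made purely for illustration.
 */
#include <linux/ip.h>
#include <linux/tcp.h>
#include <net/netfilter/nf_tproxy.h>

static struct sock *example_tproxy_lookup_v4(struct net *net,
					     struct sk_buff *skb,
					     const struct iphdr *iph,
					     const struct tcphdr *tcph,
					     __be32 tproxy_laddr,
					     __be16 tproxy_lport,
					     const struct net_device *in)
{
	__be32 laddr = nf_tproxy_laddr4(skb, tproxy_laddr, iph->daddr);
	__be16 lport = tproxy_lport ? tproxy_lport : tcph->dest;
	struct sock *sk;

	/* Prefer a fully established connection matching the packet tuple. */
	sk = nf_tproxy_get_sock_v4(net, skb, IPPROTO_TCP,
				   iph->saddr, iph->daddr,
				   tcph->source, tcph->dest,
				   in, NF_TPROXY_LOOKUP_ESTABLISHED);

	if (sk && sk->sk_state == TCP_TIME_WAIT)
		/* A SYN reopening a TIME_WAIT connection goes to the listener. */
		sk = nf_tproxy_handle_time_wait4(net, skb, laddr, lport, sk);
	else if (!sk)
		/* No established connection: check for a listener on the redirect address. */
		sk = nf_tproxy_get_sock_v4(net, skb, IPPROTO_TCP,
					   iph->saddr, laddr,
					   tcph->source, lport,
					   in, NF_TPROXY_LOOKUP_LISTENER);

	/* nf_tproxy_sk_is_transparent() drops the reference when it returns false. */
	if (sk && !nf_tproxy_sk_is_transparent(sk))
		sk = NULL;

	if (sk)
		nf_tproxy_assign_sock(skb, sk);

	return sk;
}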
// SPDX-License-Identifier: GPL-2.0+ /* * usbduxsigma.c * Copyright (C) 2011-2015
Bernd Porr, mail@berndporr.me.uk */ /* * Driver: usbduxsigma * Description: University of Stirling USB DAQ & INCITE Technology Limited * Devices: [ITL] USB-DUX-SIGMA (usbduxsigma) * Author: Bernd Porr <mail@berndporr.me.uk> * Updated: 20 July 2015 * Status: stable */ /* * I must give credit here to Chris Baugher who * wrote the driver for AT-MIO-16d. I used some parts of this * driver. I also must give credits to David Brownell * who supported me with the USB development. * * Note: the raw data from the A/D converter is 24 bit big endian * anything else is little endian to/from the dux board * * * Revision history: * 0.1: initial version * 0.2: all basic functions implemented, digital I/O only for one port * 0.3: proper vendor ID and driver name * 0.4: fixed D/A voltage range * 0.5: various bug fixes, health check at startup * 0.6: corrected wrong input range * 0.7: rewrite code that urb->interval is always 1 */ #include <linux/kernel.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/input.h> #include <linux/fcntl.h> #include <linux/compiler.h> #include <asm/unaligned.h> #include <linux/comedi/comedi_usb.h> /* timeout for the USB-transfer in ms */ #define BULK_TIMEOUT 1000 /* constants for "firmware" upload and download */ #define FIRMWARE "usbduxsigma_firmware.bin" #define FIRMWARE_MAX_LEN 0x4000 #define USBDUXSUB_FIRMWARE 0xa0 #define VENDOR_DIR_IN 0xc0 #define VENDOR_DIR_OUT 0x40 /* internal addresses of the 8051 processor */ #define USBDUXSUB_CPUCS 0xE600 /* 300Hz max frequency under PWM */ #define MIN_PWM_PERIOD ((long)(1E9 / 300)) /* Default PWM frequency */ #define PWM_DEFAULT_PERIOD ((long)(1E9 / 100)) /* Number of channels (16 AD and offset) */ #define NUMCHANNELS 16 /* Size of one A/D value */ #define SIZEADIN ((sizeof(u32))) /* * Size of the async input-buffer IN BYTES, the DIO state is transmitted * as the first byte. */ #define SIZEINBUF (((NUMCHANNELS + 1) * SIZEADIN)) /* 16 bytes. */ #define SIZEINSNBUF 16 /* Number of DA channels */ #define NUMOUTCHANNELS 8 /* size of one value for the D/A converter: channel and value */ #define SIZEDAOUT ((sizeof(u8) + sizeof(uint16_t))) /* * Size of the output-buffer in bytes * Actually only the first 4 triplets are used but for the * high speed mode we need to pad it to 8 (microframes). */ #define SIZEOUTBUF ((8 * SIZEDAOUT)) /* * Size of the buffer for the dux commands: just now max size is determined * by the analogue out + command byte + panic bytes...
*/ #define SIZEOFDUXBUFFER ((8 * SIZEDAOUT + 2)) /* Number of in-URBs which receive the data: min=2 */ #define NUMOFINBUFFERSFULL 5 /* Number of out-URBs which send the data: min=2 */ #define NUMOFOUTBUFFERSFULL 5 /* Number of in-URBs which receive the data: min=5 */ /* must have more buffers due to buggy USB ctr */ #define NUMOFINBUFFERSHIGH 10 /* Number of out-URBs which send the data: min=5 */ /* must have more buffers due to buggy USB ctr */ #define NUMOFOUTBUFFERSHIGH 10 /* number of retries to get the right dux command */ #define RETRIES 10 /* bulk transfer commands to usbduxsigma */ #define USBBUXSIGMA_AD_CMD 9 #define USBDUXSIGMA_DA_CMD 1 #define USBDUXSIGMA_DIO_CFG_CMD 2 #define USBDUXSIGMA_DIO_BITS_CMD 3 #define USBDUXSIGMA_SINGLE_AD_CMD 4 #define USBDUXSIGMA_PWM_ON_CMD 7 #define USBDUXSIGMA_PWM_OFF_CMD 8 static const struct comedi_lrange usbduxsigma_ai_range = { 1, { BIP_RANGE(2.5 * 0x800000 / 0x780000 / 2.0) } }; struct usbduxsigma_private { /* actual number of in-buffers */ int n_ai_urbs; /* actual number of out-buffers */ int n_ao_urbs; /* ISO-transfer handling: buffers */ struct urb **ai_urbs; struct urb **ao_urbs; /* pwm-transfer handling */ struct urb *pwm_urb; /* PWM period */ unsigned int pwm_period; /* PWM internal delay for the GPIF in the FX2 */ u8 pwm_delay; /* size of the PWM buffer which holds the bit pattern */ int pwm_buf_sz; /* input buffer for the ISO-transfer */ __be32 *in_buf; /* input buffer for single insn */ u8 *insn_buf; unsigned high_speed:1; unsigned ai_cmd_running:1; unsigned ao_cmd_running:1; unsigned pwm_cmd_running:1; /* time between samples in units of the timer */ unsigned int ai_timer; unsigned int ao_timer; /* counter between acquisitions */ unsigned int ai_counter; unsigned int ao_counter; /* interval in frames/uframes */ unsigned int ai_interval; /* commands */ u8 *dux_commands; struct mutex mut; }; static void usbduxsigma_unlink_urbs(struct urb **urbs, int num_urbs) { int i; for (i = 0; i < num_urbs; i++) usb_kill_urb(urbs[i]); } static void usbduxsigma_ai_stop(struct comedi_device *dev, int do_unlink) { struct usbduxsigma_private *devpriv = dev->private; if (do_unlink && devpriv->ai_urbs) usbduxsigma_unlink_urbs(devpriv->ai_urbs, devpriv->n_ai_urbs); devpriv->ai_cmd_running = 0; } static int usbduxsigma_ai_cancel(struct comedi_device *dev, struct comedi_subdevice *s) { struct usbduxsigma_private *devpriv = dev->private; mutex_lock(&devpriv->mut); /* unlink only if it is really running */ usbduxsigma_ai_stop(dev, devpriv->ai_cmd_running); mutex_unlock(&devpriv->mut); return 0; } static void usbduxsigma_ai_handle_urb(struct comedi_device *dev, struct comedi_subdevice *s, struct urb *urb) { struct usbduxsigma_private *devpriv = dev->private; struct comedi_async *async = s->async; struct comedi_cmd *cmd = &async->cmd; u32 val; int ret; int i; if ((urb->actual_length > 0) && (urb->status != -EXDEV)) { devpriv->ai_counter--; if (devpriv->ai_counter == 0) { devpriv->ai_counter = devpriv->ai_timer; /* * Get the data from the USB bus and hand it over * to comedi. Note, first byte is the DIO state. 
*/ for (i = 0; i < cmd->chanlist_len; i++) { val = be32_to_cpu(devpriv->in_buf[i + 1]); val &= 0x00ffffff; /* strip status byte */ val = comedi_offset_munge(s, val); if (!comedi_buf_write_samples(s, &val, 1)) return; } if (cmd->stop_src == TRIG_COUNT && async->scans_done >= cmd->stop_arg) async->events |= COMEDI_CB_EOA; } } /* if command is still running, resubmit urb */ if (!(async->events & COMEDI_CB_CANCEL_MASK)) { urb->dev = comedi_to_usb_dev(dev); ret = usb_submit_urb(urb, GFP_ATOMIC); if (ret < 0) { dev_err(dev->class_dev, "urb resubmit failed (%d)\n", ret); if (ret == -EL2NSYNC) dev_err(dev->class_dev, "buggy USB host controller or bug in IRQ handler\n"); async->events |= COMEDI_CB_ERROR; } } } static void usbduxsigma_ai_urb_complete(struct urb *urb) { struct comedi_device *dev = urb->context; struct usbduxsigma_private *devpriv = dev->private; struct comedi_subdevice *s = dev->read_subdev; struct comedi_async *async = s->async; /* exit if not running a command, do not resubmit urb */ if (!devpriv->ai_cmd_running) return; switch (urb->status) { case 0: /* copy the result in the transfer buffer */ memcpy(devpriv->in_buf, urb->transfer_buffer, SIZEINBUF); usbduxsigma_ai_handle_urb(dev, s, urb); break; case -EILSEQ: /* * error in the ISOchronous data * we don't copy the data into the transfer buffer * and recycle the last data byte */ dev_dbg(dev->class_dev, "CRC error in ISO IN stream\n"); usbduxsigma_ai_handle_urb(dev, s, urb); break; case -ECONNRESET: case -ENOENT: case -ESHUTDOWN: case -ECONNABORTED: /* happens after an unlink command */ async->events |= COMEDI_CB_ERROR; break; default: /* a real error */ dev_err(dev->class_dev, "non-zero urb status (%d)\n", urb->status); async->events |= COMEDI_CB_ERROR; break; } /* * comedi_handle_events() cannot be used in this driver. The (*cancel) * operation would unlink the urb. 
*/ if (async->events & COMEDI_CB_CANCEL_MASK) usbduxsigma_ai_stop(dev, 0); comedi_event(dev, s); } static void usbduxsigma_ao_stop(struct comedi_device *dev, int do_unlink) { struct usbduxsigma_private *devpriv = dev->private; if (do_unlink && devpriv->ao_urbs) usbduxsigma_unlink_urbs(devpriv->ao_urbs, devpriv->n_ao_urbs); devpriv->ao_cmd_running = 0; } static int usbduxsigma_ao_cancel(struct comedi_device *dev, struct comedi_subdevice *s) { struct usbduxsigma_private *devpriv = dev->private; mutex_lock(&devpriv->mut); /* unlink only if it is really running */ usbduxsigma_ao_stop(dev, devpriv->ao_cmd_running); mutex_unlock(&devpriv->mut); return 0; } static void usbduxsigma_ao_handle_urb(struct comedi_device *dev, struct comedi_subdevice *s, struct urb *urb) { struct usbduxsigma_private *devpriv = dev->private; struct comedi_async *async = s->async; struct comedi_cmd *cmd = &async->cmd; u8 *datap; int ret; int i; devpriv->ao_counter--; if (devpriv->ao_counter == 0) { devpriv->ao_counter = devpriv->ao_timer; if (cmd->stop_src == TRIG_COUNT && async->scans_done >= cmd->stop_arg) { async->events |= COMEDI_CB_EOA; return; } /* transmit data to the USB bus */ datap = urb->transfer_buffer; *datap++ = cmd->chanlist_len; for (i = 0; i < cmd->chanlist_len; i++) { unsigned int chan = CR_CHAN(cmd->chanlist[i]); unsigned short val; if (!comedi_buf_read_samples(s, &val, 1)) { dev_err(dev->class_dev, "buffer underflow\n"); async->events |= COMEDI_CB_OVERFLOW; return; } *datap++ = val; *datap++ = chan; s->readback[chan] = val; } } /* if command is still running, resubmit urb */ if (!(async->events & COMEDI_CB_CANCEL_MASK)) { urb->transfer_buffer_length = SIZEOUTBUF; urb->dev = comedi_to_usb_dev(dev); urb->status = 0; urb->interval = 1; /* (u)frames */ urb->number_of_packets = 1; urb->iso_frame_desc[0].offset = 0; urb->iso_frame_desc[0].length = SIZEOUTBUF; urb->iso_frame_desc[0].status = 0; ret = usb_submit_urb(urb, GFP_ATOMIC); if (ret < 0) { dev_err(dev->class_dev, "urb resubmit failed (%d)\n", ret); if (ret == -EL2NSYNC) dev_err(dev->class_dev, "buggy USB host controller or bug in IRQ handler\n"); async->events |= COMEDI_CB_ERROR; } } } static void usbduxsigma_ao_urb_complete(struct urb *urb) { struct comedi_device *dev = urb->context; struct usbduxsigma_private *devpriv = dev->private; struct comedi_subdevice *s = dev->write_subdev; struct comedi_async *async = s->async; /* exit if not running a command, do not resubmit urb */ if (!devpriv->ao_cmd_running) return; switch (urb->status) { case 0: usbduxsigma_ao_handle_urb(dev, s, urb); break; case -ECONNRESET: case -ENOENT: case -ESHUTDOWN: case -ECONNABORTED: /* happens after an unlink command */ async->events |= COMEDI_CB_ERROR; break; default: /* a real error */ dev_err(dev->class_dev, "non-zero urb status (%d)\n", urb->status); async->events |= COMEDI_CB_ERROR; break; } /* * comedi_handle_events() cannot be used in this driver. The (*cancel) * operation would unlink the urb. */ if (async->events & COMEDI_CB_CANCEL_MASK) usbduxsigma_ao_stop(dev, 0); comedi_event(dev, s); } static int usbduxsigma_submit_urbs(struct comedi_device *dev, struct urb **urbs, int num_urbs, int input_urb) { struct usb_device *usb = comedi_to_usb_dev(dev); struct urb *urb; int ret; int i; /* Submit all URBs and start the transfer on the bus */ for (i = 0; i < num_urbs; i++) { urb = urbs[i]; /* in case of a resubmission after an unlink... 
*/ if (input_urb) urb->interval = 1; urb->context = dev; urb->dev = usb; urb->status = 0; urb->transfer_flags = URB_ISO_ASAP; ret = usb_submit_urb(urb, GFP_ATOMIC); if (ret) return ret; } return 0; } static int usbduxsigma_chans_to_interval(int num_chan) { if (num_chan <= 2) return 2; /* 4kHz */ if (num_chan <= 8) return 4; /* 2kHz */ return 8; /* 1kHz */ } static int usbduxsigma_ai_cmdtest(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_cmd *cmd) { struct usbduxsigma_private *devpriv = dev->private; int high_speed = devpriv->high_speed; int interval = usbduxsigma_chans_to_interval(cmd->chanlist_len); unsigned int tmp; int err = 0; /* Step 1 : check if triggers are trivially valid */ err |= comedi_check_trigger_src(&cmd->start_src, TRIG_NOW | TRIG_INT); err |= comedi_check_trigger_src(&cmd->scan_begin_src, TRIG_TIMER); err |= comedi_check_trigger_src(&cmd->convert_src, TRIG_NOW); err |= comedi_check_trigger_src(&cmd->scan_end_src, TRIG_COUNT); err |= comedi_check_trigger_src(&cmd->stop_src, TRIG_COUNT | TRIG_NONE); if (err) return 1; /* Step 2a : make sure trigger sources are unique */ err |= comedi_check_trigger_is_unique(cmd->start_src); err |= comedi_check_trigger_is_unique(cmd->stop_src); /* Step 2b : and mutually compatible */ if (err) return 2; /* Step 3: check if arguments are trivially valid */ err |= comedi_check_trigger_arg_is(&cmd->start_arg, 0); if (high_speed) { /* * In high speed mode microframes are possible. * However, during one microframe we can roughly * sample two channels. Thus, the more channels * are in the channel list the more time we need. */ err |= comedi_check_trigger_arg_min(&cmd->scan_begin_arg, (125000 * interval)); } else { /* full speed */ /* 1kHz scans every USB frame */ err |= comedi_check_trigger_arg_min(&cmd->scan_begin_arg, 1000000); } err |= comedi_check_trigger_arg_is(&cmd->scan_end_arg, cmd->chanlist_len); if (cmd->stop_src == TRIG_COUNT) err |= comedi_check_trigger_arg_min(&cmd->stop_arg, 1); else /* TRIG_NONE */ err |= comedi_check_trigger_arg_is(&cmd->stop_arg, 0); if (err) return 3; /* Step 4: fix up any arguments */ tmp = rounddown(cmd->scan_begin_arg, high_speed ? 125000 : 1000000); err |= comedi_check_trigger_arg_is(&cmd->scan_begin_arg, tmp); if (err) return 4; return 0; } /* * creates the ADC command for the MAX1271 * range is the range value from comedi */ static void create_adc_command(unsigned int chan, u8 *muxsg0, u8 *muxsg1) { if (chan < 8) (*muxsg0) = (*muxsg0) | (1 << chan); else if (chan < 16) (*muxsg1) = (*muxsg1) | (1 << (chan - 8)); } static int usbbuxsigma_send_cmd(struct comedi_device *dev, int cmd_type) { struct usb_device *usb = comedi_to_usb_dev(dev); struct usbduxsigma_private *devpriv = dev->private; int nsent; devpriv->dux_commands[0] = cmd_type; return usb_bulk_msg(usb, usb_sndbulkpipe(usb, 1), devpriv->dux_commands, SIZEOFDUXBUFFER, &nsent, BULK_TIMEOUT); } static int usbduxsigma_receive_cmd(struct comedi_device *dev, int command) { struct usb_device *usb = comedi_to_usb_dev(dev); struct usbduxsigma_private *devpriv = dev->private; int nrec; int ret; int i; for (i = 0; i < RETRIES; i++) { ret = usb_bulk_msg(usb, usb_rcvbulkpipe(usb, 8), devpriv->insn_buf, SIZEINSNBUF, &nrec, BULK_TIMEOUT); if (ret < 0) return ret; if (devpriv->insn_buf[0] == command) return 0; } /* * This is only reached if the data has been requested a * couple of times and the command was not received. 
*/ return -EFAULT; } static int usbduxsigma_ai_inttrig(struct comedi_device *dev, struct comedi_subdevice *s, unsigned int trig_num) { struct usbduxsigma_private *devpriv = dev->private; struct comedi_cmd *cmd = &s->async->cmd; int ret; if (trig_num != cmd->start_arg) return -EINVAL; mutex_lock(&devpriv->mut); if (!devpriv->ai_cmd_running) { devpriv->ai_cmd_running = 1; ret = usbduxsigma_submit_urbs(dev, devpriv->ai_urbs, devpriv->n_ai_urbs, 1); if (ret < 0) { devpriv->ai_cmd_running = 0; mutex_unlock(&devpriv->mut); return ret; } s->async->inttrig = NULL; } mutex_unlock(&devpriv->mut); return 1; } static int usbduxsigma_ai_cmd(struct comedi_device *dev, struct comedi_subdevice *s) { struct usbduxsigma_private *devpriv = dev->private; struct comedi_cmd *cmd = &s->async->cmd; unsigned int len = cmd->chanlist_len; u8 muxsg0 = 0; u8 muxsg1 = 0; u8 sysred = 0; int ret; int i; mutex_lock(&devpriv->mut); if (devpriv->high_speed) { /* * every 2 channels get a time window of 125us. Thus, if we * sample all 16 channels we need 1ms. If we sample only one * channel we need only 125us */ unsigned int interval = usbduxsigma_chans_to_interval(len); devpriv->ai_interval = interval; devpriv->ai_timer = cmd->scan_begin_arg / (125000 * interval); } else { /* interval always 1ms */ devpriv->ai_interval = 1; devpriv->ai_timer = cmd->scan_begin_arg / 1000000; } for (i = 0; i < len; i++) { unsigned int chan = CR_CHAN(cmd->chanlist[i]); create_adc_command(chan, &muxsg0, &muxsg1); } devpriv->dux_commands[1] = devpriv->ai_interval; devpriv->dux_commands[2] = len; /* num channels per time step */ devpriv->dux_commands[3] = 0x12; /* CONFIG0 */ devpriv->dux_commands[4] = 0x03; /* CONFIG1: 23kHz sample, delay 0us */ devpriv->dux_commands[5] = 0x00; /* CONFIG3: diff. channels off */ devpriv->dux_commands[6] = muxsg0; devpriv->dux_commands[7] = muxsg1; devpriv->dux_commands[8] = sysred; ret = usbbuxsigma_send_cmd(dev, USBBUXSIGMA_AD_CMD); if (ret < 0) { mutex_unlock(&devpriv->mut); return ret; } devpriv->ai_counter = devpriv->ai_timer; if (cmd->start_src == TRIG_NOW) { /* enable this acquisition operation */ devpriv->ai_cmd_running = 1; ret = usbduxsigma_submit_urbs(dev, devpriv->ai_urbs, devpriv->n_ai_urbs, 1); if (ret < 0) { devpriv->ai_cmd_running = 0; mutex_unlock(&devpriv->mut); return ret; } s->async->inttrig = NULL; } else { /* TRIG_INT */ s->async->inttrig = usbduxsigma_ai_inttrig; } mutex_unlock(&devpriv->mut); return 0; } static int usbduxsigma_ai_insn_read(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { struct usbduxsigma_private *devpriv = dev->private; unsigned int chan = CR_CHAN(insn->chanspec); u8 muxsg0 = 0; u8 muxsg1 = 0; u8 sysred = 0; int ret; int i; mutex_lock(&devpriv->mut); if (devpriv->ai_cmd_running) { mutex_unlock(&devpriv->mut); return -EBUSY; } create_adc_command(chan, &muxsg0, &muxsg1); /* Mode 0 is used to get a single conversion on demand */ devpriv->dux_commands[1] = 0x16; /* CONFIG0: chopper on */ devpriv->dux_commands[2] = 0x80; /* CONFIG1: 2kHz sampling rate */ devpriv->dux_commands[3] = 0x00; /* CONFIG3: diff. 
channels off */ devpriv->dux_commands[4] = muxsg0; devpriv->dux_commands[5] = muxsg1; devpriv->dux_commands[6] = sysred; /* adc commands */ ret = usbbuxsigma_send_cmd(dev, USBDUXSIGMA_SINGLE_AD_CMD); if (ret < 0) { mutex_unlock(&devpriv->mut); return ret; } for (i = 0; i < insn->n; i++) { u32 val; ret = usbduxsigma_receive_cmd(dev, USBDUXSIGMA_SINGLE_AD_CMD); if (ret < 0) { mutex_unlock(&devpriv->mut); return ret; } /* 32 bits big endian from the A/D converter */ val = be32_to_cpu(get_unaligned((__be32 *)(devpriv->insn_buf + 1))); val &= 0x00ffffff; /* strip status byte */ data[i] = comedi_offset_munge(s, val); } mutex_unlock(&devpriv->mut); return insn->n; } static int usbduxsigma_ao_insn_read(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { struct usbduxsigma_private *devpriv = dev->private; int ret; mutex_lock(&devpriv->mut); ret = comedi_readback_insn_read(dev, s, insn, data); mutex_unlock(&devpriv->mut); return ret; } static int usbduxsigma_ao_insn_write(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { struct usbduxsigma_private *devpriv = dev->private; unsigned int chan = CR_CHAN(insn->chanspec); int ret; int i; mutex_lock(&devpriv->mut); if (devpriv->ao_cmd_running) { mutex_unlock(&devpriv->mut); return -EBUSY; } for (i = 0; i < insn->n; i++) { devpriv->dux_commands[1] = 1; /* num channels */ devpriv->dux_commands[2] = data[i]; /* value */ devpriv->dux_commands[3] = chan; /* channel number */ ret = usbbuxsigma_send_cmd(dev, USBDUXSIGMA_DA_CMD); if (ret < 0) { mutex_unlock(&devpriv->mut); return ret; } s->readback[chan] = data[i]; } mutex_unlock(&devpriv->mut); return insn->n; } static int usbduxsigma_ao_inttrig(struct comedi_device *dev, struct comedi_subdevice *s, unsigned int trig_num) { struct usbduxsigma_private *devpriv = dev->private; struct comedi_cmd *cmd = &s->async->cmd; int ret; if (trig_num != cmd->start_arg) return -EINVAL; mutex_lock(&devpriv->mut); if (!devpriv->ao_cmd_running) { devpriv->ao_cmd_running = 1; ret = usbduxsigma_submit_urbs(dev, devpriv->ao_urbs, devpriv->n_ao_urbs, 0); if (ret < 0) { devpriv->ao_cmd_running = 0; mutex_unlock(&devpriv->mut); return ret; } s->async->inttrig = NULL; } mutex_unlock(&devpriv->mut); return 1; } static int usbduxsigma_ao_cmdtest(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_cmd *cmd) { unsigned int tmp; int err = 0; /* Step 1 : check if triggers are trivially valid */ err |= comedi_check_trigger_src(&cmd->start_src, TRIG_NOW | TRIG_INT); /* * For now, always use "scan" timing with all channels updated at once * (cmd->scan_begin_src == TRIG_TIMER, cmd->convert_src == TRIG_NOW). * * In a future version, "convert" timing with channels updated * individually may be supported in high speed mode * (cmd->scan_begin_src == TRIG_FOLLOW, cmd->convert_src == TRIG_TIMER).
 */ err |= comedi_check_trigger_src(&cmd->scan_begin_src, TRIG_TIMER); err |= comedi_check_trigger_src(&cmd->convert_src, TRIG_NOW); err |= comedi_check_trigger_src(&cmd->scan_end_src, TRIG_COUNT); err |= comedi_check_trigger_src(&cmd->stop_src, TRIG_COUNT | TRIG_NONE); if (err) return 1; /* Step 2a : make sure trigger sources are unique */ err |= comedi_check_trigger_is_unique(cmd->start_src); err |= comedi_check_trigger_is_unique(cmd->stop_src); /* Step 2b : and mutually compatible */ if (err) return 2; /* Step 3: check if arguments are trivially valid */ err |= comedi_check_trigger_arg_is(&cmd->start_arg, 0); err |= comedi_check_trigger_arg_min(&cmd->scan_begin_arg, 1000000); err |= comedi_check_trigger_arg_is(&cmd->scan_end_arg, cmd->chanlist_len); if (cmd->stop_src == TRIG_COUNT) err |= comedi_check_trigger_arg_min(&cmd->stop_arg, 1); else /* TRIG_NONE */ err |= comedi_check_trigger_arg_is(&cmd->stop_arg, 0); if (err) return 3; /* Step 4: fix up any arguments */ tmp = rounddown(cmd->scan_begin_arg, 1000000); err |= comedi_check_trigger_arg_is(&cmd->scan_begin_arg, tmp); if (err) return 4; return 0; } static int usbduxsigma_ao_cmd(struct comedi_device *dev, struct comedi_subdevice *s) { struct usbduxsigma_private *devpriv = dev->private; struct comedi_cmd *cmd = &s->async->cmd; int ret; mutex_lock(&devpriv->mut); /* * For now, only "scan" timing is supported. A future version may * support "convert" timing in high speed mode. * * Timing of the scan: every 1ms all channels updated at once. */ devpriv->ao_timer = cmd->scan_begin_arg / 1000000; devpriv->ao_counter = devpriv->ao_timer; if (cmd->start_src == TRIG_NOW) { /* enable this acquisition operation */ devpriv->ao_cmd_running = 1; ret = usbduxsigma_submit_urbs(dev, devpriv->ao_urbs, devpriv->n_ao_urbs, 0); if (ret < 0) { devpriv->ao_cmd_running = 0; mutex_unlock(&devpriv->mut); return ret; } s->async->inttrig = NULL; } else { /* TRIG_INT */ s->async->inttrig = usbduxsigma_ao_inttrig; } mutex_unlock(&devpriv->mut); return 0; } static int usbduxsigma_dio_insn_config(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { int ret; ret = comedi_dio_insn_config(dev, s, insn, data, 0); if (ret) return ret; /* * We don't tell the firmware here as it would take 8 frames * to submit the information. We do it in the (*insn_bits). */ return insn->n; } static int usbduxsigma_dio_insn_bits(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { struct usbduxsigma_private *devpriv = dev->private; int ret; mutex_lock(&devpriv->mut); comedi_dio_update_state(s, data); /* Always update the hardware. See the (*insn_config).
*/ devpriv->dux_commands[1] = s->io_bits & 0xff; devpriv->dux_commands[4] = s->state & 0xff; devpriv->dux_commands[2] = (s->io_bits >> 8) & 0xff; devpriv->dux_commands[5] = (s->state >> 8) & 0xff; devpriv->dux_commands[3] = (s->io_bits >> 16) & 0xff; devpriv->dux_commands[6] = (s->state >> 16) & 0xff; ret = usbbuxsigma_send_cmd(dev, USBDUXSIGMA_DIO_BITS_CMD); if (ret < 0) goto done; ret = usbduxsigma_receive_cmd(dev, USBDUXSIGMA_DIO_BITS_CMD); if (ret < 0) goto done; s->state = devpriv->insn_buf[1] | (devpriv->insn_buf[2] << 8) | (devpriv->insn_buf[3] << 16); data[1] = s->state; ret = insn->n; done: mutex_unlock(&devpriv->mut); return ret; } static void usbduxsigma_pwm_stop(struct comedi_device *dev, int do_unlink) { struct usbduxsigma_private *devpriv = dev->private; if (do_unlink) { if (devpriv->pwm_urb) usb_kill_urb(devpriv->pwm_urb); } devpriv->pwm_cmd_running = 0; } static int usbduxsigma_pwm_cancel(struct comedi_device *dev, struct comedi_subdevice *s) { struct usbduxsigma_private *devpriv = dev->private; /* unlink only if it is really running */ usbduxsigma_pwm_stop(dev, devpriv->pwm_cmd_running); return usbbuxsigma_send_cmd(dev, USBDUXSIGMA_PWM_OFF_CMD); } static void usbduxsigma_pwm_urb_complete(struct urb *urb) { struct comedi_device *dev = urb->context; struct usbduxsigma_private *devpriv = dev->private; int ret; switch (urb->status) { case 0: /* success */ break; case -ECONNRESET: case -ENOENT: case -ESHUTDOWN: case -ECONNABORTED: /* happens after an unlink command */ if (devpriv->pwm_cmd_running) usbduxsigma_pwm_stop(dev, 0); /* w/o unlink */ return; default: /* a real error */ if (devpriv->pwm_cmd_running) { dev_err(dev->class_dev, "non-zero urb status (%d)\n", urb->status); usbduxsigma_pwm_stop(dev, 0); /* w/o unlink */ } return; } if (!devpriv->pwm_cmd_running) return; urb->transfer_buffer_length = devpriv->pwm_buf_sz; urb->dev = comedi_to_usb_dev(dev); urb->status = 0; ret = usb_submit_urb(urb, GFP_ATOMIC); if (ret < 0) { dev_err(dev->class_dev, "urb resubmit failed (%d)\n", ret); if (ret == -EL2NSYNC) dev_err(dev->class_dev, "buggy USB host controller or bug in IRQ handler\n"); usbduxsigma_pwm_stop(dev, 0); /* w/o unlink */ } } static int usbduxsigma_submit_pwm_urb(struct comedi_device *dev) { struct usb_device *usb = comedi_to_usb_dev(dev); struct usbduxsigma_private *devpriv = dev->private; struct urb *urb = devpriv->pwm_urb; /* in case of a resubmission after an unlink... 
*/ usb_fill_bulk_urb(urb, usb, usb_sndbulkpipe(usb, 4), urb->transfer_buffer, devpriv->pwm_buf_sz, usbduxsigma_pwm_urb_complete, dev); return usb_submit_urb(urb, GFP_ATOMIC); } static int usbduxsigma_pwm_period(struct comedi_device *dev, struct comedi_subdevice *s, unsigned int period) { struct usbduxsigma_private *devpriv = dev->private; int fx2delay; if (period < MIN_PWM_PERIOD) return -EAGAIN; fx2delay = (period / (6 * 512 * 1000 / 33)) - 6; if (fx2delay > 255) return -EAGAIN; devpriv->pwm_delay = fx2delay; devpriv->pwm_period = period; return 0; } static int usbduxsigma_pwm_start(struct comedi_device *dev, struct comedi_subdevice *s) { struct usbduxsigma_private *devpriv = dev->private; int ret; if (devpriv->pwm_cmd_running) return 0; devpriv->dux_commands[1] = devpriv->pwm_delay; ret = usbbuxsigma_send_cmd(dev, USBDUXSIGMA_PWM_ON_CMD); if (ret < 0) return ret; memset(devpriv->pwm_urb->transfer_buffer, 0, devpriv->pwm_buf_sz); devpriv->pwm_cmd_running = 1; ret = usbduxsigma_submit_pwm_urb(dev); if (ret < 0) { devpriv->pwm_cmd_running = 0; return ret; } return 0; } static void usbduxsigma_pwm_pattern(struct comedi_device *dev, struct comedi_subdevice *s, unsigned int chan, unsigned int value, unsigned int sign) { struct usbduxsigma_private *devpriv = dev->private; char pwm_mask = (1 << chan); /* DIO bit for the PWM data */ char sgn_mask = (16 << chan); /* DIO bit for the sign */ char *buf = (char *)(devpriv->pwm_urb->transfer_buffer); int szbuf = devpriv->pwm_buf_sz; int i; for (i = 0; i < szbuf; i++) { char c = *buf; c &= ~pwm_mask; if (i < value) c |= pwm_mask; if (!sign) c &= ~sgn_mask; else c |= sgn_mask; *buf++ = c; } } static int usbduxsigma_pwm_write(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { unsigned int chan = CR_CHAN(insn->chanspec); /* * It doesn't make sense to support more than one value here * because it would just overwrite the PWM buffer. */ if (insn->n != 1) return -EINVAL; /* * The sign is set via a special INSN only, this gives us 8 bits * for normal operation, sign is 0 by default. 
*/ usbduxsigma_pwm_pattern(dev, s, chan, data[0], 0); return insn->n; } static int usbduxsigma_pwm_config(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { struct usbduxsigma_private *devpriv = dev->private; unsigned int chan = CR_CHAN(insn->chanspec); switch (data[0]) { case INSN_CONFIG_ARM: /* * if not zero the PWM is limited to a certain time which is * not supported here */ if (data[1] != 0) return -EINVAL; return usbduxsigma_pwm_start(dev, s); case INSN_CONFIG_DISARM: return usbduxsigma_pwm_cancel(dev, s); case INSN_CONFIG_GET_PWM_STATUS: data[1] = devpriv->pwm_cmd_running; return 0; case INSN_CONFIG_PWM_SET_PERIOD: return usbduxsigma_pwm_period(dev, s, data[1]); case INSN_CONFIG_PWM_GET_PERIOD: data[1] = devpriv->pwm_period; return 0; case INSN_CONFIG_PWM_SET_H_BRIDGE: /* * data[1] = value * data[2] = sign (for a relay) */ usbduxsigma_pwm_pattern(dev, s, chan, data[1], (data[2] != 0)); return 0; case INSN_CONFIG_PWM_GET_H_BRIDGE: /* values are not kept in this driver, nothing to return */ return -EINVAL; } return -EINVAL; } static int usbduxsigma_getstatusinfo(struct comedi_device *dev, int chan) { struct comedi_subdevice *s = dev->read_subdev; struct usbduxsigma_private *devpriv = dev->private; u8 sysred; u32 val; int ret; switch (chan) { default: case 0: sysred = 0; /* ADC zero */ break; case 1: sysred = 1; /* ADC offset */ break; case 2: sysred = 4; /* VCC */ break; case 3: sysred = 8; /* temperature */ break; case 4: sysred = 16; /* gain */ break; case 5: sysred = 32; /* ref */ break; } devpriv->dux_commands[1] = 0x12; /* CONFIG0 */ devpriv->dux_commands[2] = 0x80; /* CONFIG1: 2kHz sampling rate */ devpriv->dux_commands[3] = 0x00; /* CONFIG3: diff. channels off */ devpriv->dux_commands[4] = 0; devpriv->dux_commands[5] = 0; devpriv->dux_commands[6] = sysred; ret = usbbuxsigma_send_cmd(dev, USBDUXSIGMA_SINGLE_AD_CMD); if (ret < 0) return ret; ret = usbduxsigma_receive_cmd(dev, USBDUXSIGMA_SINGLE_AD_CMD); if (ret < 0) return ret; /* 32 bits big endian from the A/D converter */ val = be32_to_cpu(get_unaligned((__be32 *)(devpriv->insn_buf + 1))); val &= 0x00ffffff; /* strip status byte */ return (int)comedi_offset_munge(s, val); } static int usbduxsigma_firmware_upload(struct comedi_device *dev, const u8 *data, size_t size, unsigned long context) { struct usb_device *usb = comedi_to_usb_dev(dev); u8 *buf; u8 *tmp; int ret; if (!data) return 0; if (size > FIRMWARE_MAX_LEN) { dev_err(dev->class_dev, "firmware binary too large for FX2\n"); return -ENOMEM; } /* we generate a local buffer for the firmware */ buf = kmemdup(data, size, GFP_KERNEL); if (!buf) return -ENOMEM; /* we need a malloc'ed buffer for usb_control_msg() */ tmp = kmalloc(1, GFP_KERNEL); if (!tmp) { kfree(buf); return -ENOMEM; } /* stop the current firmware on the device */ *tmp = 1; /* 7f92 to one */ ret = usb_control_msg(usb, usb_sndctrlpipe(usb, 0), USBDUXSUB_FIRMWARE, VENDOR_DIR_OUT, USBDUXSUB_CPUCS, 0x0000, tmp, 1, BULK_TIMEOUT); if (ret < 0) { dev_err(dev->class_dev, "can not stop firmware\n"); goto done; } /* upload the new firmware to the device */ ret = usb_control_msg(usb, usb_sndctrlpipe(usb, 0), USBDUXSUB_FIRMWARE, VENDOR_DIR_OUT, 0, 0x0000, buf, size, BULK_TIMEOUT); if (ret < 0) { dev_err(dev->class_dev, "firmware upload failed\n"); goto done; } /* start the new firmware on the device */ *tmp = 0; /* 7f92 to zero */ ret = usb_control_msg(usb, usb_sndctrlpipe(usb, 0), USBDUXSUB_FIRMWARE, VENDOR_DIR_OUT, USBDUXSUB_CPUCS, 0x0000, tmp, 1, BULK_TIMEOUT); if (ret 
< 0) dev_err(dev->class_dev, "can not start firmware\n"); done: kfree(tmp); kfree(buf); return ret; } static int usbduxsigma_alloc_usb_buffers(struct comedi_device *dev) { struct usb_device *usb = comedi_to_usb_dev(dev); struct usbduxsigma_private *devpriv = dev->private; struct urb *urb; int i; devpriv->dux_commands = kzalloc(SIZEOFDUXBUFFER, GFP_KERNEL); devpriv->in_buf = kzalloc(SIZEINBUF, GFP_KERNEL); devpriv->insn_buf = kzalloc(SIZEINSNBUF, GFP_KERNEL); devpriv->ai_urbs = kcalloc(devpriv->n_ai_urbs, sizeof(urb), GFP_KERNEL); devpriv->ao_urbs = kcalloc(devpriv->n_ao_urbs, sizeof(urb), GFP_KERNEL); if (!devpriv->dux_commands || !devpriv->in_buf || !devpriv->insn_buf || !devpriv->ai_urbs || !devpriv->ao_urbs) return -ENOMEM; for (i = 0; i < devpriv->n_ai_urbs; i++) { /* one frame: 1ms */ urb = usb_alloc_urb(1, GFP_KERNEL); if (!urb) return -ENOMEM; devpriv->ai_urbs[i] = urb; urb->dev = usb; /* will be filled later with a pointer to the comedi-device */ /* and ONLY then the urb should be submitted */ urb->context = NULL; urb->pipe = usb_rcvisocpipe(usb, 6); urb->transfer_flags = URB_ISO_ASAP; urb->transfer_buffer = kzalloc(SIZEINBUF, GFP_KERNEL); if (!urb->transfer_buffer) return -ENOMEM; urb->complete = usbduxsigma_ai_urb_complete; urb->number_of_packets = 1; urb->transfer_buffer_length = SIZEINBUF; urb->iso_frame_desc[0].offset = 0; urb->iso_frame_desc[0].length = SIZEINBUF; } for (i = 0; i < devpriv->n_ao_urbs; i++) { /* one frame: 1ms */ urb = usb_alloc_urb(1, GFP_KERNEL); if (!urb) return -ENOMEM; devpriv->ao_urbs[i] = urb; urb->dev = usb; /* will be filled later with a pointer to the comedi-device */ /* and ONLY then the urb should be submitted */ urb->context = NULL; urb->pipe = usb_sndisocpipe(usb, 2); urb->transfer_flags = URB_ISO_ASAP; urb->transfer_buffer = kzalloc(SIZEOUTBUF, GFP_KERNEL); if (!urb->transfer_buffer) return -ENOMEM; urb->complete = usbduxsigma_ao_urb_complete; urb->number_of_packets = 1; urb->transfer_buffer_length = SIZEOUTBUF; urb->iso_frame_desc[0].offset = 0; urb->iso_frame_desc[0].length = SIZEOUTBUF; urb->interval = 1; /* (u)frames */ } if (devpriv->pwm_buf_sz) { urb = usb_alloc_urb(0, GFP_KERNEL); if (!urb) return -ENOMEM; devpriv->pwm_urb = urb; urb->transfer_buffer = kzalloc(devpriv->pwm_buf_sz, GFP_KERNEL); if (!urb->transfer_buffer) return -ENOMEM; } return 0; } static void usbduxsigma_free_usb_buffers(struct comedi_device *dev) { struct usbduxsigma_private *devpriv = dev->private; struct urb *urb; int i; urb = devpriv->pwm_urb; if (urb) { kfree(urb->transfer_buffer); usb_free_urb(urb); } if (devpriv->ao_urbs) { for (i = 0; i < devpriv->n_ao_urbs; i++) { urb = devpriv->ao_urbs[i]; if (urb) { kfree(urb->transfer_buffer); usb_free_urb(urb); } } kfree(devpriv->ao_urbs); } if (devpriv->ai_urbs) { for (i = 0; i < devpriv->n_ai_urbs; i++) { urb = devpriv->ai_urbs[i]; if (urb) { kfree(urb->transfer_buffer); usb_free_urb(urb); } } kfree(devpriv->ai_urbs); } kfree(devpriv->insn_buf); kfree(devpriv->in_buf); kfree(devpriv->dux_commands); } static int usbduxsigma_auto_attach(struct comedi_device *dev, unsigned long context_unused) { struct usb_interface *intf = comedi_to_usb_interface(dev); struct usb_device *usb = comedi_to_usb_dev(dev); struct usbduxsigma_private *devpriv; struct comedi_subdevice *s; int offset; int ret; devpriv = comedi_alloc_devpriv(dev, sizeof(*devpriv)); if (!devpriv) return -ENOMEM; mutex_init(&devpriv->mut); usb_set_intfdata(intf, devpriv); devpriv->high_speed = (usb->speed == USB_SPEED_HIGH); if (devpriv->high_speed) { 
devpriv->n_ai_urbs = NUMOFINBUFFERSHIGH; devpriv->n_ao_urbs = NUMOFOUTBUFFERSHIGH; devpriv->pwm_buf_sz = 512; } else { devpriv->n_ai_urbs = NUMOFINBUFFERSFULL; devpriv->n_ao_urbs = NUMOFOUTBUFFERSFULL; } ret = usbduxsigma_alloc_usb_buffers(dev); if (ret) return ret; /* setting to alternate setting 3: enabling iso ep and bulk ep. */ ret = usb_set_interface(usb, intf->altsetting->desc.bInterfaceNumber, 3); if (ret < 0) { dev_err(dev->class_dev, "could not set alternate setting 3 in high speed\n"); return ret; } ret = comedi_load_firmware(dev, &usb->dev, FIRMWARE, usbduxsigma_firmware_upload, 0); if (ret) return ret; ret = comedi_alloc_subdevices(dev, (devpriv->high_speed) ? 4 : 3); if (ret) return ret; /* Analog Input subdevice */ s = &dev->subdevices[0]; dev->read_subdev = s; s->type = COMEDI_SUBD_AI; s->subdev_flags = SDF_READABLE | SDF_GROUND | SDF_CMD_READ | SDF_LSAMPL; s->n_chan = NUMCHANNELS; s->len_chanlist = NUMCHANNELS; s->maxdata = 0x00ffffff; s->range_table = &usbduxsigma_ai_range; s->insn_read = usbduxsigma_ai_insn_read; s->do_cmdtest = usbduxsigma_ai_cmdtest; s->do_cmd = usbduxsigma_ai_cmd; s->cancel = usbduxsigma_ai_cancel; /* Analog Output subdevice */ s = &dev->subdevices[1]; dev->write_subdev = s; s->type = COMEDI_SUBD_AO; s->subdev_flags = SDF_WRITABLE | SDF_GROUND | SDF_CMD_WRITE; s->n_chan = 4; s->len_chanlist = s->n_chan; s->maxdata = 0x00ff; s->range_table = &range_unipolar2_5; s->insn_write = usbduxsigma_ao_insn_write; s->insn_read = usbduxsigma_ao_insn_read; s->do_cmdtest = usbduxsigma_ao_cmdtest; s->do_cmd = usbduxsigma_ao_cmd; s->cancel = usbduxsigma_ao_cancel; ret = comedi_alloc_subdev_readback(s); if (ret) return ret; /* Digital I/O subdevice */ s = &dev->subdevices[2]; s->type = COMEDI_SUBD_DIO; s->subdev_flags = SDF_READABLE | SDF_WRITABLE; s->n_chan = 24; s->maxdata = 1; s->range_table = &range_digital; s->insn_bits = usbduxsigma_dio_insn_bits; s->insn_config = usbduxsigma_dio_insn_config; if (devpriv->high_speed) { /* Timer / pwm subdevice */ s = &dev->subdevices[3]; s->type = COMEDI_SUBD_PWM; s->subdev_flags = SDF_WRITABLE | SDF_PWM_HBRIDGE; s->n_chan = 8; s->maxdata = devpriv->pwm_buf_sz; s->insn_write = usbduxsigma_pwm_write; s->insn_config = usbduxsigma_pwm_config; usbduxsigma_pwm_period(dev, s, PWM_DEFAULT_PERIOD); } offset = usbduxsigma_getstatusinfo(dev, 0); if (offset < 0) { dev_err(dev->class_dev, "Communication to USBDUXSIGMA failed! 
Check firmware and cabling.\n"); return offset; } dev_info(dev->class_dev, "ADC_zero = %x\n", offset); return 0; } static void usbduxsigma_detach(struct comedi_device *dev) { struct usb_interface *intf = comedi_to_usb_interface(dev); struct usbduxsigma_private *devpriv = dev->private; usb_set_intfdata(intf, NULL); if (!devpriv) return; mutex_lock(&devpriv->mut); /* force unlink all urbs */ usbduxsigma_ai_stop(dev, 1); usbduxsigma_ao_stop(dev, 1); usbduxsigma_pwm_stop(dev, 1); usbduxsigma_free_usb_buffers(dev); mutex_unlock(&devpriv->mut); mutex_destroy(&devpriv->mut); } static struct comedi_driver usbduxsigma_driver = { .driver_name = "usbduxsigma", .module = THIS_MODULE, .auto_attach = usbduxsigma_auto_attach, .detach = usbduxsigma_detach, }; static int usbduxsigma_usb_probe(struct usb_interface *intf, const struct usb_device_id *id) { return comedi_usb_auto_config(intf, &usbduxsigma_driver, 0); } static const struct usb_device_id usbduxsigma_usb_table[] = { { USB_DEVICE(0x13d8, 0x0020) }, { USB_DEVICE(0x13d8, 0x0021) }, { USB_DEVICE(0x13d8, 0x0022) }, { } }; MODULE_DEVICE_TABLE(usb, usbduxsigma_usb_table); static struct usb_driver usbduxsigma_usb_driver = { .name = "usbduxsigma", .probe = usbduxsigma_usb_probe, .disconnect = comedi_usb_auto_unconfig, .id_table = usbduxsigma_usb_table, }; module_comedi_usb_driver(usbduxsigma_driver, usbduxsigma_usb_driver); MODULE_AUTHOR("Bernd Porr, mail@berndporr.me.uk"); MODULE_DESCRIPTION("Stirling/ITL USB-DUX SIGMA -- mail@berndporr.me.uk"); MODULE_LICENSE("GPL"); MODULE_FIRMWARE(FIRMWARE); |
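/*
 * Illustrative sketch (not part of the driver): the high-speed scan-timing
 * arithmetic applied by usbduxsigma_ai_cmdtest() and usbduxsigma_ai_cmd(),
 * reproduced as a stand-alone user-space program. Roughly two channels fit
 * into one 125 us microframe, so the minimum scan period in nanoseconds is
 * 125000 * interval, and ai_timer counts how many URB intervals make up one
 * scan. The local helper chans_to_interval() copies
 * usbduxsigma_chans_to_interval(); the 1 ms example scan period is an
 * arbitrary assumption, and nothing here touches hardware.
 */
#include <stdio.h>

static int chans_to_interval(int num_chan)
{
	if (num_chan <= 2)
		return 2;	/* 4 kHz */
	if (num_chan <= 8)
		return 4;	/* 2 kHz */
	return 8;		/* 1 kHz */
}

int main(void)
{
	unsigned int scan_begin_ns = 1000000;	/* example: 1 ms scans */
	int num_chan;

	for (num_chan = 1; num_chan <= 16; num_chan++) {
		int interval = chans_to_interval(num_chan);
		unsigned int min_scan_ns = 125000 * interval;
		unsigned int ai_timer = scan_begin_ns / min_scan_ns;

		printf("%2d chans: interval %d, min scan %u ns, ai_timer %u\n",
		       num_chan, interval, min_scan_ns, ai_timer);
	}
	return 0;
}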
/* SPDX-License-Identifier: GPL-2.0-or-later */ #ifndef _ASM_X86_INSN_H #define _ASM_X86_INSN_H /* * x86 instruction analysis * * Copyright (C) IBM Corporation, 2009 */ #include <asm/byteorder.h> /* insn_attr_t is defined in inat.h */ #include <asm/inat.h> /* __ignore_sync_check__ */ #if defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN) struct insn_field { union { insn_value_t value; insn_byte_t bytes[4]; }; /* !0 if we've run insn_get_xxx() for this field */ unsigned char got; unsigned char nbytes; }; static inline void insn_field_set(struct insn_field *p, insn_value_t v, unsigned char n) { p->value = v; p->nbytes = n; } static inline void insn_set_byte(struct insn_field *p, unsigned char n, insn_byte_t v) { p->bytes[n] = v; } #else struct insn_field { insn_value_t value; union { insn_value_t little; insn_byte_t bytes[4]; }; /* !0 if we've run insn_get_xxx() for this field */ unsigned char got; unsigned char nbytes; }; static inline void insn_field_set(struct insn_field *p, insn_value_t v, unsigned char n) { p->value = v; p->little = __cpu_to_le32(v); p->nbytes = n; } static inline void insn_set_byte(struct insn_field *p, unsigned char n, insn_byte_t v) { p->bytes[n] = v; p->value = __le32_to_cpu(p->little); } #endif struct insn { struct insn_field prefixes; /* * Prefixes * prefixes.bytes[3]: last prefix */ struct insn_field rex_prefix; /* REX prefix */ struct insn_field vex_prefix; /* VEX prefix */ struct insn_field opcode; /* * opcode.bytes[0]: opcode1 * opcode.bytes[1]: opcode2 * opcode.bytes[2]: opcode3 */ struct insn_field modrm; struct insn_field sib; struct insn_field displacement; union { struct insn_field immediate; struct insn_field moffset1; /* for 64bit MOV */ struct insn_field immediate1; /* for 64bit imm or off16/32 */ }; union { struct insn_field moffset2; /* for 64bit MOV */ struct insn_field immediate2; /* for 64bit imm or seg16 */ }; int emulate_prefix_size; insn_attr_t attr; unsigned char opnd_bytes; unsigned char addr_bytes; unsigned char length; unsigned char x86_64; const insn_byte_t *kaddr; /* kernel address of insn to analyze */ const insn_byte_t *end_kaddr; /* kernel address of last insn in buffer */ const insn_byte_t *next_byte; }; #define MAX_INSN_SIZE 15 #define X86_MODRM_MOD(modrm) (((modrm) & 0xc0) >> 6) #define X86_MODRM_REG(modrm) (((modrm) & 0x38) >> 3) #define X86_MODRM_RM(modrm) ((modrm) & 0x07) #define X86_SIB_SCALE(sib) (((sib) &
0xc0) >> 6) #define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3) #define X86_SIB_BASE(sib) ((sib) & 0x07) #define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */ #define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */ #define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */ #define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */ #define X86_REX_W(rex) ((rex) & 8) /* REX or REX2 W */ #define X86_REX_R(rex) ((rex) & 4) /* REX or REX2 R3 */ #define X86_REX_X(rex) ((rex) & 2) /* REX or REX2 X3 */ #define X86_REX_B(rex) ((rex) & 1) /* REX or REX2 B3 */ /* VEX bit flags */ #define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */ #define X86_VEX_R(vex) ((vex) & 0x80) /* VEX2/3 Byte1 */ #define X86_VEX_X(vex) ((vex) & 0x40) /* VEX3 Byte1 */ #define X86_VEX_B(vex) ((vex) & 0x20) /* VEX3 Byte1 */ #define X86_VEX_L(vex) ((vex) & 0x04) /* VEX3 Byte2, VEX2 Byte1 */ /* VEX bit fields */ #define X86_EVEX_M(vex) ((vex) & 0x07) /* EVEX Byte1 */ #define X86_VEX3_M(vex) ((vex) & 0x1f) /* VEX3 Byte1 */ #define X86_VEX2_M 1 /* VEX2.M always 1 */ #define X86_VEX_V(vex) (((vex) & 0x78) >> 3) /* VEX3 Byte2, VEX2 Byte1 */ #define X86_VEX_P(vex) ((vex) & 0x03) /* VEX3 Byte2, VEX2 Byte1 */ #define X86_VEX_M_MAX 0x1f /* VEX3.M Maximum value */ extern void insn_init(struct insn *insn, const void *kaddr, int buf_len, int x86_64); extern int insn_get_prefixes(struct insn *insn); extern int insn_get_opcode(struct insn *insn); extern int insn_get_modrm(struct insn *insn); extern int insn_get_sib(struct insn *insn); extern int insn_get_displacement(struct insn *insn); extern int insn_get_immediate(struct insn *insn); extern int insn_get_length(struct insn *insn); enum insn_mode { INSN_MODE_32, INSN_MODE_64, /* Mode is determined by the current kernel build. */ INSN_MODE_KERN, INSN_NUM_MODES, }; extern int insn_decode(struct insn *insn, const void *kaddr, int buf_len, enum insn_mode m); #define insn_decode_kernel(_insn, _ptr) insn_decode((_insn), (_ptr), MAX_INSN_SIZE, INSN_MODE_KERN) /* Attribute will be determined after getting ModRM (for opcode groups) */ static inline void insn_get_attribute(struct insn *insn) { insn_get_modrm(insn); } /* Instruction uses RIP-relative addressing */ extern int insn_rip_relative(struct insn *insn); static inline int insn_is_rex2(struct insn *insn) { if (!insn->prefixes.got) insn_get_prefixes(insn); return insn->rex_prefix.nbytes == 2; } static inline insn_byte_t insn_rex2_m_bit(struct insn *insn) { return X86_REX2_M(insn->rex_prefix.bytes[1]); } static inline int insn_is_avx(struct insn *insn) { if (!insn->prefixes.got) insn_get_prefixes(insn); return (insn->vex_prefix.value != 0); } static inline int insn_is_evex(struct insn *insn) { if (!insn->prefixes.got) insn_get_prefixes(insn); return (insn->vex_prefix.nbytes == 4); } static inline int insn_has_emulate_prefix(struct insn *insn) { return !!insn->emulate_prefix_size; } static inline insn_byte_t insn_vex_m_bits(struct insn *insn) { if (insn->vex_prefix.nbytes == 2) /* 2 bytes VEX */ return X86_VEX2_M; else if (insn->vex_prefix.nbytes == 3) /* 3 bytes VEX */ return X86_VEX3_M(insn->vex_prefix.bytes[1]); else /* EVEX */ return X86_EVEX_M(insn->vex_prefix.bytes[1]); } static inline insn_byte_t insn_vex_p_bits(struct insn *insn) { if (insn->vex_prefix.nbytes == 2) /* 2 bytes VEX */ return X86_VEX_P(insn->vex_prefix.bytes[1]); else return X86_VEX_P(insn->vex_prefix.bytes[2]); } static inline insn_byte_t insn_vex_w_bit(struct insn *insn) { if (insn->vex_prefix.nbytes < 3) return 0; return X86_VEX_W(insn->vex_prefix.bytes[2]); } /* Get the last prefix id from 
last prefix or VEX prefix */ static inline int insn_last_prefix_id(struct insn *insn) { if (insn_is_avx(insn)) return insn_vex_p_bits(insn); /* VEX_p is a SIMD prefix id */ if (insn->prefixes.bytes[3]) return inat_get_last_prefix_id(insn->prefixes.bytes[3]); return 0; } /* Offset of each field from kaddr */ static inline int insn_offset_rex_prefix(struct insn *insn) { return insn->prefixes.nbytes; } static inline int insn_offset_vex_prefix(struct insn *insn) { return insn_offset_rex_prefix(insn) + insn->rex_prefix.nbytes; } static inline int insn_offset_opcode(struct insn *insn) { return insn_offset_vex_prefix(insn) + insn->vex_prefix.nbytes; } static inline int insn_offset_modrm(struct insn *insn) { return insn_offset_opcode(insn) + insn->opcode.nbytes; } static inline int insn_offset_sib(struct insn *insn) { return insn_offset_modrm(insn) + insn->modrm.nbytes; } static inline int insn_offset_displacement(struct insn *insn) { return insn_offset_sib(insn) + insn->sib.nbytes; } static inline int insn_offset_immediate(struct insn *insn) { return insn_offset_displacement(insn) + insn->displacement.nbytes; } /** * for_each_insn_prefix() -- Iterate prefixes in the instruction * @insn: Pointer to struct insn. * @idx: Index storage. * @prefix: Prefix byte. * * Iterate prefix bytes of given @insn. Each prefix byte is stored in @prefix * and the index is stored in @idx (note that this @idx is just for a cursor, * do not change it.) * Since prefixes.nbytes can be bigger than 4 if some prefixes * are repeated, it cannot be used for looping over the prefixes. */ #define for_each_insn_prefix(insn, idx, prefix) \ for (idx = 0; idx < ARRAY_SIZE(insn->prefixes.bytes) && (prefix = insn->prefixes.bytes[idx]) != 0; idx++) #define POP_SS_OPCODE 0x1f #define MOV_SREG_OPCODE 0x8e /* * Intel SDM Vol.3A 6.8.3 states; * "Any single-step trap that would be delivered following the MOV to SS * instruction or POP to SS instruction (because EFLAGS.TF is 1) is * suppressed." * This function returns true if @insn is MOV SS or POP SS. On these * instructions, single stepping is suppressed. */ static inline int insn_masking_exception(struct insn *insn) { return insn->opcode.bytes[0] == POP_SS_OPCODE || (insn->opcode.bytes[0] == MOV_SREG_OPCODE && X86_MODRM_REG(insn->modrm.bytes[0]) == 2); } #endif /* _ASM_X86_INSN_H */ |
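/*
 * Illustrative sketch (not part of the header): a tiny user-space program
 * exercising the ModRM/SIB bit-field helpers declared above. The macros are
 * repeated locally only so the example compiles stand-alone; they mirror
 * X86_MODRM_*() and X86_SIB_*() from this header. The example byte values
 * are arbitrary and chosen just to show how mod/reg/rm and scale/index/base
 * are unpacked.
 */
#include <stdio.h>

#define MODRM_MOD(modrm)	(((modrm) & 0xc0) >> 6)
#define MODRM_REG(modrm)	(((modrm) & 0x38) >> 3)
#define MODRM_RM(modrm)		((modrm) & 0x07)
#define SIB_SCALE(sib)		(((sib) & 0xc0) >> 6)
#define SIB_INDEX(sib)		(((sib) & 0x38) >> 3)
#define SIB_BASE(sib)		((sib) & 0x07)

int main(void)
{
	unsigned int modrm = 0x44;	/* mod=01, reg=000, rm=100 -> SIB byte follows */
	unsigned int sib = 0x8d;	/* scale=10, index=001, base=101 */

	printf("modrm %#x: mod=%u reg=%u rm=%u\n",
	       modrm, MODRM_MOD(modrm), MODRM_REG(modrm), MODRM_RM(modrm));
	printf("sib   %#x: scale=%u index=%u base=%u\n",
	       sib, SIB_SCALE(sib), SIB_INDEX(sib), SIB_BASE(sib));
	return 0;
}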
// SPDX-License-Identifier: GPL-2.0 /* Copyright 2011-2014 Autronica Fire and Security AS * * Author(s): * 2011-2014 Arvid Brodin, arvid.brodin@alten.se * This file contains device methods for creating, using and destroying * virtual HSR or PRP devices.
*/ #include <linux/netdevice.h> #include <linux/skbuff.h> #include <linux/etherdevice.h> #include <linux/rtnetlink.h> #include <linux/pkt_sched.h> #include "hsr_device.h" #include "hsr_slave.h" #include "hsr_framereg.h" #include "hsr_main.h" #include "hsr_forward.h" static bool is_admin_up(struct net_device *dev) { return dev && (dev->flags & IFF_UP); } static bool is_slave_up(struct net_device *dev) { return dev && is_admin_up(dev) && netif_oper_up(dev); } static void hsr_set_operstate(struct hsr_port *master, bool has_carrier) { struct net_device *dev = master->dev; if (!is_admin_up(dev)) { netdev_set_operstate(dev, IF_OPER_DOWN); return; } if (has_carrier) netdev_set_operstate(dev, IF_OPER_UP); else netdev_set_operstate(dev, IF_OPER_LOWERLAYERDOWN); } static bool hsr_check_carrier(struct hsr_port *master) { struct hsr_port *port; ASSERT_RTNL(); hsr_for_each_port(master->hsr, port) { if (port->type != HSR_PT_MASTER && is_slave_up(port->dev)) { netif_carrier_on(master->dev); return true; } } netif_carrier_off(master->dev); return false; } static void hsr_check_announce(struct net_device *hsr_dev) { struct hsr_priv *hsr; hsr = netdev_priv(hsr_dev); if (netif_running(hsr_dev) && netif_oper_up(hsr_dev)) { /* Enable announce timer and start sending supervisory frames */ if (!timer_pending(&hsr->announce_timer)) { hsr->announce_count = 0; mod_timer(&hsr->announce_timer, jiffies + msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL)); } if (hsr->redbox && !timer_pending(&hsr->announce_proxy_timer)) mod_timer(&hsr->announce_proxy_timer, jiffies + msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL) / 2); } else { /* Deactivate the announce timer */ timer_delete(&hsr->announce_timer); if (hsr->redbox) timer_delete(&hsr->announce_proxy_timer); } } void hsr_check_carrier_and_operstate(struct hsr_priv *hsr) { struct hsr_port *master; bool has_carrier; master = hsr_port_get_hsr(hsr, HSR_PT_MASTER); /* netif_stacked_transfer_operstate() cannot be used here since * it doesn't set IF_OPER_LOWERLAYERDOWN (?) 
*/ has_carrier = hsr_check_carrier(master); hsr_set_operstate(master, has_carrier); hsr_check_announce(master->dev); } int hsr_get_max_mtu(struct hsr_priv *hsr) { unsigned int mtu_max; struct hsr_port *port; mtu_max = ETH_DATA_LEN; hsr_for_each_port(hsr, port) if (port->type != HSR_PT_MASTER) mtu_max = min(port->dev->mtu, mtu_max); if (mtu_max < HSR_HLEN) return 0; return mtu_max - HSR_HLEN; } static int hsr_dev_change_mtu(struct net_device *dev, int new_mtu) { struct hsr_priv *hsr; hsr = netdev_priv(dev); if (new_mtu > hsr_get_max_mtu(hsr)) { netdev_info(dev, "A HSR master's MTU cannot be greater than the smallest MTU of its slaves minus the HSR Tag length (%d octets).\n", HSR_HLEN); return -EINVAL; } WRITE_ONCE(dev->mtu, new_mtu); return 0; } static int hsr_dev_open(struct net_device *dev) { struct hsr_priv *hsr; struct hsr_port *port; const char *designation = NULL; hsr = netdev_priv(dev); hsr_for_each_port(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { case HSR_PT_SLAVE_A: designation = "Slave A"; break; case HSR_PT_SLAVE_B: designation = "Slave B"; break; case HSR_PT_INTERLINK: designation = "Interlink"; break; default: designation = "Unknown"; } if (!is_slave_up(port->dev)) netdev_warn(dev, "%s (%s) is not up; please bring it up to get a fully working HSR network\n", designation, port->dev->name); } if (!designation) netdev_warn(dev, "No slave devices configured\n"); return 0; } static int hsr_dev_close(struct net_device *dev) { struct hsr_port *port; struct hsr_priv *hsr; hsr = netdev_priv(dev); hsr_for_each_port(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { case HSR_PT_SLAVE_A: case HSR_PT_SLAVE_B: dev_uc_unsync(port->dev, dev); dev_mc_unsync(port->dev, dev); break; default: break; } } return 0; } static netdev_features_t hsr_features_recompute(struct hsr_priv *hsr, netdev_features_t features) { netdev_features_t mask; struct hsr_port *port; mask = features; /* Mask out all features that, if supported by one device, should be * enabled for all devices (see NETIF_F_ONE_FOR_ALL). * * Anything that's off in mask will not be enabled - so only things * that were in features originally, and also is in NETIF_F_ONE_FOR_ALL, * may become enabled. 
*/ features &= ~NETIF_F_ONE_FOR_ALL; hsr_for_each_port(hsr, port) features = netdev_increment_features(features, port->dev->features, mask); return features; } static netdev_features_t hsr_fix_features(struct net_device *dev, netdev_features_t features) { struct hsr_priv *hsr = netdev_priv(dev); return hsr_features_recompute(hsr, features); } static netdev_tx_t hsr_dev_xmit(struct sk_buff *skb, struct net_device *dev) { struct hsr_priv *hsr = netdev_priv(dev); struct hsr_port *master; master = hsr_port_get_hsr(hsr, HSR_PT_MASTER); if (master) { skb->dev = master->dev; skb_reset_mac_header(skb); skb_reset_mac_len(skb); spin_lock_bh(&hsr->seqnr_lock); hsr_forward_skb(skb, master); spin_unlock_bh(&hsr->seqnr_lock); } else { dev_core_stats_tx_dropped_inc(dev); dev_kfree_skb_any(skb); } return NETDEV_TX_OK; } static const struct header_ops hsr_header_ops = { .create = eth_header, .parse = eth_header_parse, }; static struct sk_buff *hsr_init_skb(struct hsr_port *master) { struct hsr_priv *hsr = master->hsr; struct sk_buff *skb; int hlen, tlen; hlen = LL_RESERVED_SPACE(master->dev); tlen = master->dev->needed_tailroom; /* skb size is same for PRP/HSR frames, only difference * being, for PRP it is a trailer and for HSR it is a * header */ skb = dev_alloc_skb(sizeof(struct hsr_sup_tag) + sizeof(struct hsr_sup_payload) + hlen + tlen); if (!skb) return skb; skb_reserve(skb, hlen); skb->dev = master->dev; skb->priority = TC_PRIO_CONTROL; if (dev_hard_header(skb, skb->dev, ETH_P_PRP, hsr->sup_multicast_addr, skb->dev->dev_addr, skb->len) <= 0) goto out; skb_reset_mac_header(skb); skb_reset_mac_len(skb); skb_reset_network_header(skb); skb_reset_transport_header(skb); return skb; out: kfree_skb(skb); return NULL; } static void send_hsr_supervision_frame(struct hsr_port *port, unsigned long *interval, const unsigned char *addr) { struct hsr_priv *hsr = port->hsr; __u8 type = HSR_TLV_LIFE_CHECK; struct hsr_sup_payload *hsr_sp; struct hsr_sup_tlv *hsr_stlv; struct hsr_sup_tag *hsr_stag; struct sk_buff *skb; *interval = msecs_to_jiffies(HSR_LIFE_CHECK_INTERVAL); if (hsr->announce_count < 3 && hsr->prot_version == 0) { type = HSR_TLV_ANNOUNCE; *interval = msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL); hsr->announce_count++; } skb = hsr_init_skb(port); if (!skb) { netdev_warn_once(port->dev, "HSR: Could not send supervision frame\n"); return; } hsr_stag = skb_put(skb, sizeof(struct hsr_sup_tag)); set_hsr_stag_path(hsr_stag, (hsr->prot_version ? 0x0 : 0xf)); set_hsr_stag_HSR_ver(hsr_stag, hsr->prot_version); /* From HSRv1 on we have separate supervision sequence numbers. */ spin_lock_bh(&hsr->seqnr_lock); if (hsr->prot_version > 0) { hsr_stag->sequence_nr = htons(hsr->sup_sequence_nr); hsr->sup_sequence_nr++; } else { hsr_stag->sequence_nr = htons(hsr->sequence_nr); hsr->sequence_nr++; } hsr_stag->tlv.HSR_TLV_type = type; /* TODO: Why 12 in HSRv0? */ hsr_stag->tlv.HSR_TLV_length = hsr->prot_version ? 
sizeof(struct hsr_sup_payload) : 12; /* Payload: MacAddressA / SAN MAC from ProxyNodeTable */ hsr_sp = skb_put(skb, sizeof(struct hsr_sup_payload)); ether_addr_copy(hsr_sp->macaddress_A, addr); if (hsr->redbox && hsr_is_node_in_db(&hsr->proxy_node_db, addr)) { hsr_stlv = skb_put(skb, sizeof(struct hsr_sup_tlv)); hsr_stlv->HSR_TLV_type = PRP_TLV_REDBOX_MAC; hsr_stlv->HSR_TLV_length = sizeof(struct hsr_sup_payload); /* Payload: MacAddressRedBox */ hsr_sp = skb_put(skb, sizeof(struct hsr_sup_payload)); ether_addr_copy(hsr_sp->macaddress_A, hsr->macaddress_redbox); } if (skb_put_padto(skb, ETH_ZLEN)) { spin_unlock_bh(&hsr->seqnr_lock); return; } hsr_forward_skb(skb, port); spin_unlock_bh(&hsr->seqnr_lock); return; } static void send_prp_supervision_frame(struct hsr_port *master, unsigned long *interval, const unsigned char *addr) { struct hsr_priv *hsr = master->hsr; struct hsr_sup_payload *hsr_sp; struct hsr_sup_tag *hsr_stag; struct sk_buff *skb; skb = hsr_init_skb(master); if (!skb) { netdev_warn_once(master->dev, "PRP: Could not send supervision frame\n"); return; } *interval = msecs_to_jiffies(HSR_LIFE_CHECK_INTERVAL); hsr_stag = skb_put(skb, sizeof(struct hsr_sup_tag)); set_hsr_stag_path(hsr_stag, (hsr->prot_version ? 0x0 : 0xf)); set_hsr_stag_HSR_ver(hsr_stag, (hsr->prot_version ? 1 : 0)); /* From HSRv1 on we have separate supervision sequence numbers. */ spin_lock_bh(&hsr->seqnr_lock); hsr_stag->sequence_nr = htons(hsr->sup_sequence_nr); hsr->sup_sequence_nr++; hsr_stag->tlv.HSR_TLV_type = PRP_TLV_LIFE_CHECK_DD; hsr_stag->tlv.HSR_TLV_length = sizeof(struct hsr_sup_payload); /* Payload: MacAddressA */ hsr_sp = skb_put(skb, sizeof(struct hsr_sup_payload)); ether_addr_copy(hsr_sp->macaddress_A, master->dev->dev_addr); if (skb_put_padto(skb, ETH_ZLEN)) { spin_unlock_bh(&hsr->seqnr_lock); return; } hsr_forward_skb(skb, master); spin_unlock_bh(&hsr->seqnr_lock); } /* Announce (supervision frame) timer function */ static void hsr_announce(struct timer_list *t) { struct hsr_priv *hsr; struct hsr_port *master; unsigned long interval; hsr = from_timer(hsr, t, announce_timer); rcu_read_lock(); master = hsr_port_get_hsr(hsr, HSR_PT_MASTER); hsr->proto_ops->send_sv_frame(master, &interval, master->dev->dev_addr); if (is_admin_up(master->dev)) mod_timer(&hsr->announce_timer, jiffies + interval); rcu_read_unlock(); } /* Announce (supervision frame) timer function for RedBox */ static void hsr_proxy_announce(struct timer_list *t) { struct hsr_priv *hsr = from_timer(hsr, t, announce_proxy_timer); struct hsr_port *interlink; unsigned long interval = 0; struct hsr_node *node; rcu_read_lock(); /* RedBOX sends supervisory frames to HSR network with MAC addresses * of SAN nodes stored in ProxyNodeTable. 
*/ interlink = hsr_port_get_hsr(hsr, HSR_PT_INTERLINK); list_for_each_entry_rcu(node, &hsr->proxy_node_db, mac_list) { if (hsr_addr_is_redbox(hsr, node->macaddress_A)) continue; hsr->proto_ops->send_sv_frame(interlink, &interval, node->macaddress_A); } if (is_admin_up(interlink->dev)) { if (!interval) interval = msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL); mod_timer(&hsr->announce_proxy_timer, jiffies + interval); } rcu_read_unlock(); } void hsr_del_ports(struct hsr_priv *hsr) { struct hsr_port *port; port = hsr_port_get_hsr(hsr, HSR_PT_SLAVE_A); if (port) hsr_del_port(port); port = hsr_port_get_hsr(hsr, HSR_PT_SLAVE_B); if (port) hsr_del_port(port); port = hsr_port_get_hsr(hsr, HSR_PT_INTERLINK); if (port) hsr_del_port(port); port = hsr_port_get_hsr(hsr, HSR_PT_MASTER); if (port) hsr_del_port(port); } static void hsr_set_rx_mode(struct net_device *dev) { struct hsr_port *port; struct hsr_priv *hsr; hsr = netdev_priv(dev); hsr_for_each_port(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { case HSR_PT_SLAVE_A: case HSR_PT_SLAVE_B: dev_mc_sync_multiple(port->dev, dev); dev_uc_sync_multiple(port->dev, dev); break; default: break; } } } static void hsr_change_rx_flags(struct net_device *dev, int change) { struct hsr_port *port; struct hsr_priv *hsr; hsr = netdev_priv(dev); hsr_for_each_port(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { case HSR_PT_SLAVE_A: case HSR_PT_SLAVE_B: if (change & IFF_ALLMULTI) dev_set_allmulti(port->dev, dev->flags & IFF_ALLMULTI ? 1 : -1); break; default: break; } } } static const struct net_device_ops hsr_device_ops = { .ndo_change_mtu = hsr_dev_change_mtu, .ndo_open = hsr_dev_open, .ndo_stop = hsr_dev_close, .ndo_start_xmit = hsr_dev_xmit, .ndo_change_rx_flags = hsr_change_rx_flags, .ndo_fix_features = hsr_fix_features, .ndo_set_rx_mode = hsr_set_rx_mode, }; static const struct device_type hsr_type = { .name = "hsr", }; static struct hsr_proto_ops hsr_ops = { .send_sv_frame = send_hsr_supervision_frame, .create_tagged_frame = hsr_create_tagged_frame, .get_untagged_frame = hsr_get_untagged_frame, .drop_frame = hsr_drop_frame, .fill_frame_info = hsr_fill_frame_info, .invalid_dan_ingress_frame = hsr_invalid_dan_ingress_frame, }; static struct hsr_proto_ops prp_ops = { .send_sv_frame = send_prp_supervision_frame, .create_tagged_frame = prp_create_tagged_frame, .get_untagged_frame = prp_get_untagged_frame, .drop_frame = prp_drop_frame, .fill_frame_info = prp_fill_frame_info, .handle_san_frame = prp_handle_san_frame, .update_san_info = prp_update_san_info, }; void hsr_dev_setup(struct net_device *dev) { eth_hw_addr_random(dev); ether_setup(dev); dev->min_mtu = 0; dev->header_ops = &hsr_header_ops; dev->netdev_ops = &hsr_device_ops; SET_NETDEV_DEVTYPE(dev, &hsr_type); dev->priv_flags |= IFF_NO_QUEUE | IFF_DISABLE_NETPOLL; dev->needs_free_netdev = true; dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA | NETIF_F_GSO_MASK | NETIF_F_HW_CSUM | NETIF_F_HW_VLAN_CTAG_TX; dev->features = dev->hw_features; /* Prevent recursive tx locking */ dev->features |= NETIF_F_LLTX; /* VLAN on top of HSR needs testing and probably some work on * hsr_header_create() etc. */ dev->features |= NETIF_F_VLAN_CHALLENGED; /* Not sure about this. Taken from bridge code. netdev_features.h says * it means "Does not change network namespaces". */ dev->features |= NETIF_F_NETNS_LOCAL; } /* Return true if dev is a HSR master; return false otherwise. 
*/ bool is_hsr_master(struct net_device *dev) { return (dev->netdev_ops->ndo_start_xmit == hsr_dev_xmit); } EXPORT_SYMBOL(is_hsr_master); /* Default multicast address for HSR Supervision frames */ static const unsigned char def_multicast_addr[ETH_ALEN] __aligned(2) = { 0x01, 0x15, 0x4e, 0x00, 0x01, 0x00 }; int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2], struct net_device *interlink, unsigned char multicast_spec, u8 protocol_version, struct netlink_ext_ack *extack) { bool unregister = false; struct hsr_priv *hsr; int res; hsr = netdev_priv(hsr_dev); INIT_LIST_HEAD(&hsr->ports); INIT_LIST_HEAD(&hsr->node_db); INIT_LIST_HEAD(&hsr->proxy_node_db); spin_lock_init(&hsr->list_lock); eth_hw_addr_set(hsr_dev, slave[0]->dev_addr); /* initialize protocol specific functions */ if (protocol_version == PRP_V1) { /* For PRP, lan_id has most significant 3 bits holding * the net_id of PRP_LAN_ID */ hsr->net_id = PRP_LAN_ID << 1; hsr->proto_ops = &prp_ops; } else { hsr->proto_ops = &hsr_ops; } /* Make sure we recognize frames from ourselves in hsr_rcv() */ res = hsr_create_self_node(hsr, hsr_dev->dev_addr, slave[1]->dev_addr); if (res < 0) return res; spin_lock_init(&hsr->seqnr_lock); /* Overflow soon to find bugs easier: */ hsr->sequence_nr = HSR_SEQNR_START; hsr->sup_sequence_nr = HSR_SUP_SEQNR_START; hsr->interlink_sequence_nr = HSR_SEQNR_START; timer_setup(&hsr->announce_timer, hsr_announce, 0); timer_setup(&hsr->prune_timer, hsr_prune_nodes, 0); timer_setup(&hsr->prune_proxy_timer, hsr_prune_proxy_nodes, 0); timer_setup(&hsr->announce_proxy_timer, hsr_proxy_announce, 0); ether_addr_copy(hsr->sup_multicast_addr, def_multicast_addr); hsr->sup_multicast_addr[ETH_ALEN - 1] = multicast_spec; hsr->prot_version = protocol_version; /* Make sure the 1st call to netif_carrier_on() gets through */ netif_carrier_off(hsr_dev); res = hsr_add_port(hsr, hsr_dev, HSR_PT_MASTER, extack); if (res) goto err_add_master; /* HSR forwarding offload supported in lower device? */ if ((slave[0]->features & NETIF_F_HW_HSR_FWD) && (slave[1]->features & NETIF_F_HW_HSR_FWD)) hsr->fwd_offloaded = true; res = register_netdevice(hsr_dev); if (res) goto err_unregister; unregister = true; res = hsr_add_port(hsr, slave[0], HSR_PT_SLAVE_A, extack); if (res) goto err_unregister; res = hsr_add_port(hsr, slave[1], HSR_PT_SLAVE_B, extack); if (res) goto err_unregister; if (interlink) { res = hsr_add_port(hsr, interlink, HSR_PT_INTERLINK, extack); if (res) goto err_unregister; hsr->redbox = true; ether_addr_copy(hsr->macaddress_redbox, interlink->dev_addr); mod_timer(&hsr->prune_proxy_timer, jiffies + msecs_to_jiffies(PRUNE_PROXY_PERIOD)); } hsr_debugfs_init(hsr, hsr_dev); mod_timer(&hsr->prune_timer, jiffies + msecs_to_jiffies(PRUNE_PERIOD)); return 0; err_unregister: hsr_del_ports(hsr); err_add_master: hsr_del_self_node(hsr); if (unregister) unregister_netdevice(hsr_dev); return res; } |
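/*
 * Illustrative sketch (not part of this file): the MTU clamping performed by
 * hsr_get_max_mtu() and enforced in hsr_dev_change_mtu(), reproduced as a
 * stand-alone user-space program. The master starts from ETH_DATA_LEN (1500),
 * takes the smallest slave MTU, and subtracts the HSR tag length. HSR_HLEN is
 * assumed to be 6 bytes here (the value defined in hsr_main.h); the
 * EXAMPLE_* names and the slave MTU inputs are hypothetical.
 */
#include <stdio.h>

#define EXAMPLE_ETH_DATA_LEN	1500
#define EXAMPLE_HSR_HLEN	6	/* assumed HSR tag length */

static unsigned int example_hsr_max_mtu(const unsigned int *slave_mtu, int n)
{
	unsigned int mtu_max = EXAMPLE_ETH_DATA_LEN;
	int i;

	/* take the smallest slave MTU, capped at the normal Ethernet payload */
	for (i = 0; i < n; i++)
		if (slave_mtu[i] < mtu_max)
			mtu_max = slave_mtu[i];

	if (mtu_max < EXAMPLE_HSR_HLEN)
		return 0;
	return mtu_max - EXAMPLE_HSR_HLEN;
}

int main(void)
{
	unsigned int slaves[] = { 1500, 1400 };	/* example slave MTUs */
	unsigned int max_mtu = example_hsr_max_mtu(slaves, 2);

	/* A requested master MTU above this limit would be rejected (-EINVAL). */
	printf("max HSR master MTU: %u\n", max_mtu);
	return 0;
}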
1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 
2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 
3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 
4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 4727 4728 4729 4730 4731 4732 4733 4734 4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 
4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797 4798 4799 4800 4801 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 4819 4820 4821 4822 4823 4824 4825 4826 4827 4828 4829 4830 4831 4832 4833 4834 4835 4836 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 4847 4848 4849 4850 4851 4852 4853 4854 4855 4856 4857 4858 4859 4860 4861 4862 4863 4864 4865 4866 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919 4920 4921 4922 4923 4924 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 4935 4936 4937 4938 4939 4940 4941 4942 4943 4944 4945 4946 4947 4948 4949 4950 4951 4952 4953 4954 4955 4956 4957 4958 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 4975 4976 4977 4978 4979 4980 4981 4982 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010 5011 5012 5013 5014 5015 5016 5017 5018 5019 5020 5021 5022 5023 5024 5025 5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 5112 5113 5114 5115 5116 5117 5118 5119 5120 5121 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5303 5304 5305 5306 5307 5308 5309 5310 5311 5312 5313 5314 5315 5316 5317 5318 5319 5320 5321 5322 5323 5324 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378 5379 5380 5381 5382 5383 5384 5385 5386 5387 5388 5389 5390 5391 5392 5393 5394 5395 5396 5397 5398 5399 5400 5401 5402 5403 5404 5405 5406 5407 5408 5409 5410 5411 5412 5413 5414 5415 5416 5417 5418 5419 5420 5421 5422 5423 5424 5425 5426 5427 5428 5429 5430 5431 5432 5433 5434 5435 5436 5437 5438 5439 5440 5441 5442 5443 5444 5445 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 5457 5458 5459 5460 5461 
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com */
#include <linux/bpf.h>
#include <linux/bpf-cgroup.h>
#include <linux/bpf_trace.h>
#include <linux/bpf_lirc.h>
#include <linux/bpf_verifier.h>
#include <linux/bsearch.h>
#include <linux/btf.h>
#include <linux/syscalls.h>
#include <linux/slab.h>
#include <linux/sched/signal.h>
#include <linux/vmalloc.h>
#include <linux/mmzone.h>
#include <linux/anon_inodes.h>
#include <linux/fdtable.h>
#include <linux/file.h>
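/*
 * Illustrative sketch (editor's addition, not part of the kernel build):
 * the map commands implemented in this file are reached from user space
 * through the bpf(2) syscall with a zero-padded union bpf_attr.  The
 * minimal example below assumes a Linux user-space environment with the
 * UAPI headers installed; sys_bpf() and create_example_map() are ad-hoc
 * helper names (not kernel or libbpf APIs), and the map name, key/value
 * sizes and max_entries are arbitrary placeholders.
 *
 *	#include <linux/bpf.h>
 *	#include <string.h>
 *	#include <sys/syscall.h>
 *	#include <unistd.h>
 *
 *	static long sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr)
 *	{
 *		// Dispatches to the bpf() syscall entry point defined
 *		// later in this file.
 *		return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
 *	}
 *
 *	int create_example_map(void)
 *	{
 *		union bpf_attr attr;
 *
 *		// Unused fields must stay zero: see CHECK_ATTR() and
 *		// bpf_check_uarg_tail_zero() in this file.
 *		memset(&attr, 0, sizeof(attr));
 *		attr.map_type    = BPF_MAP_TYPE_HASH;
 *		attr.key_size    = sizeof(__u32);
 *		attr.value_size  = sizeof(__u64);
 *		attr.max_entries = 1024;
 *		strncpy(attr.map_name, "example_map",
 *			sizeof(attr.map_name) - 1);
 *
 *		// Returns a new map fd on success, -1 with errno set on error.
 *		return sys_bpf(BPF_MAP_CREATE, &attr);
 *	}
 */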
#include <linux/fs.h>
#include <linux/license.h>
#include <linux/filter.h>
#include <linux/kernel.h>
#include <linux/idr.h>
#include <linux/cred.h>
#include <linux/timekeeping.h>
#include <linux/ctype.h>
#include <linux/nospec.h>
#include <linux/audit.h>
#include <uapi/linux/btf.h>
#include <linux/pgtable.h>
#include <linux/bpf_lsm.h>
#include <linux/poll.h>
#include <linux/sort.h>
#include <linux/bpf-netns.h>
#include <linux/rcupdate_trace.h>
#include <linux/memcontrol.h>
#include <linux/trace_events.h>
#include <net/netfilter/nf_bpf_link.h>
#include <net/netkit.h>
#include <net/tcx.h>

#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
			  (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
#define IS_FD_PROG_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY)
#define IS_FD_HASH(map) ((map)->map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
#define IS_FD_MAP(map) (IS_FD_ARRAY(map) || IS_FD_PROG_ARRAY(map) || \
			IS_FD_HASH(map))

#define BPF_OBJ_FLAG_MASK   (BPF_F_RDONLY | BPF_F_WRONLY)

DEFINE_PER_CPU(int, bpf_prog_active);
static DEFINE_IDR(prog_idr);
static DEFINE_SPINLOCK(prog_idr_lock);
static DEFINE_IDR(map_idr);
static DEFINE_SPINLOCK(map_idr_lock);
static DEFINE_IDR(link_idr);
static DEFINE_SPINLOCK(link_idr_lock);

int sysctl_unprivileged_bpf_disabled __read_mostly =
	IS_BUILTIN(CONFIG_BPF_UNPRIV_DEFAULT_OFF) ? 2 : 0;

static const struct bpf_map_ops * const bpf_map_types[] = {
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
#define BPF_MAP_TYPE(_id, _ops) \
	[_id] = &_ops,
#define BPF_LINK_TYPE(_id, _name)
#include <linux/bpf_types.h>
#undef BPF_PROG_TYPE
#undef BPF_MAP_TYPE
#undef BPF_LINK_TYPE
};

/*
 * If we're handed a bigger struct than we know of, ensure all the unknown bits
 * are 0 - i.e. new user-space does not rely on any kernel feature extensions
 * we don't know about yet.
 *
 * There is a ToCToU between this function call and the following
 * copy_from_user() call. However, this is not a concern since this function is
 * meant to be a future-proofing of bits.
 */
int bpf_check_uarg_tail_zero(bpfptr_t uaddr,
			     size_t expected_size,
			     size_t actual_size)
{
	int res;

	if (unlikely(actual_size > PAGE_SIZE))	/* silly large */
		return -E2BIG;

	if (actual_size <= expected_size)
		return 0;

	if (uaddr.is_kernel)
		res = memchr_inv(uaddr.kernel + expected_size, 0,
				 actual_size - expected_size) == NULL;
	else
		res = check_zeroed_user(uaddr.user + expected_size,
					actual_size - expected_size);
	if (res < 0)
		return res;
	return res ?
0 : -E2BIG; } const struct bpf_map_ops bpf_map_offload_ops = { .map_meta_equal = bpf_map_meta_equal, .map_alloc = bpf_map_offload_map_alloc, .map_free = bpf_map_offload_map_free, .map_check_btf = map_check_no_btf, .map_mem_usage = bpf_map_offload_map_mem_usage, }; static void bpf_map_write_active_inc(struct bpf_map *map) { atomic64_inc(&map->writecnt); } static void bpf_map_write_active_dec(struct bpf_map *map) { atomic64_dec(&map->writecnt); } bool bpf_map_write_active(const struct bpf_map *map) { return atomic64_read(&map->writecnt) != 0; } static u32 bpf_map_value_size(const struct bpf_map *map) { if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY || map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) return round_up(map->value_size, 8) * num_possible_cpus(); else if (IS_FD_MAP(map)) return sizeof(u32); else return map->value_size; } static void maybe_wait_bpf_programs(struct bpf_map *map) { /* Wait for any running non-sleepable BPF programs to complete so that * userspace, when we return to it, knows that all non-sleepable * programs that could be running use the new map value. For sleepable * BPF programs, synchronize_rcu_tasks_trace() should be used to wait * for the completions of these programs, but considering the waiting * time can be very long and userspace may think it will hang forever, * so don't handle sleepable BPF programs now. */ if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS || map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) synchronize_rcu(); } static int bpf_map_update_value(struct bpf_map *map, struct file *map_file, void *key, void *value, __u64 flags) { int err; /* Need to create a kthread, thus must support schedule */ if (bpf_map_is_offloaded(map)) { return bpf_map_offload_update_elem(map, key, value, flags); } else if (map->map_type == BPF_MAP_TYPE_CPUMAP || map->map_type == BPF_MAP_TYPE_ARENA || map->map_type == BPF_MAP_TYPE_STRUCT_OPS) { return map->ops->map_update_elem(map, key, value, flags); } else if (map->map_type == BPF_MAP_TYPE_SOCKHASH || map->map_type == BPF_MAP_TYPE_SOCKMAP) { return sock_map_update_elem_sys(map, key, value, flags); } else if (IS_FD_PROG_ARRAY(map)) { return bpf_fd_array_map_update_elem(map, map_file, key, value, flags); } bpf_disable_instrumentation(); if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) { err = bpf_percpu_hash_update(map, key, value, flags); } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { err = bpf_percpu_array_update(map, key, value, flags); } else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) { err = bpf_percpu_cgroup_storage_update(map, key, value, flags); } else if (IS_FD_ARRAY(map)) { err = bpf_fd_array_map_update_elem(map, map_file, key, value, flags); } else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) { err = bpf_fd_htab_map_update_elem(map, map_file, key, value, flags); } else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) { /* rcu_read_lock() is not needed */ err = bpf_fd_reuseport_array_update_elem(map, key, value, flags); } else if (map->map_type == BPF_MAP_TYPE_QUEUE || map->map_type == BPF_MAP_TYPE_STACK || map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) { err = map->ops->map_push_elem(map, value, flags); } else { rcu_read_lock(); err = map->ops->map_update_elem(map, key, value, flags); rcu_read_unlock(); } bpf_enable_instrumentation(); return err; } static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value, __u64 flags) { void 
*ptr; int err; if (bpf_map_is_offloaded(map)) return bpf_map_offload_lookup_elem(map, key, value); bpf_disable_instrumentation(); if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) { err = bpf_percpu_hash_copy(map, key, value); } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { err = bpf_percpu_array_copy(map, key, value); } else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) { err = bpf_percpu_cgroup_storage_copy(map, key, value); } else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) { err = bpf_stackmap_copy(map, key, value); } else if (IS_FD_ARRAY(map) || IS_FD_PROG_ARRAY(map)) { err = bpf_fd_array_map_lookup_elem(map, key, value); } else if (IS_FD_HASH(map)) { err = bpf_fd_htab_map_lookup_elem(map, key, value); } else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) { err = bpf_fd_reuseport_array_lookup_elem(map, key, value); } else if (map->map_type == BPF_MAP_TYPE_QUEUE || map->map_type == BPF_MAP_TYPE_STACK || map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) { err = map->ops->map_peek_elem(map, value); } else if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS) { /* struct_ops map requires directly updating "value" */ err = bpf_struct_ops_map_sys_lookup_elem(map, key, value); } else { rcu_read_lock(); if (map->ops->map_lookup_elem_sys_only) ptr = map->ops->map_lookup_elem_sys_only(map, key); else ptr = map->ops->map_lookup_elem(map, key); if (IS_ERR(ptr)) { err = PTR_ERR(ptr); } else if (!ptr) { err = -ENOENT; } else { err = 0; if (flags & BPF_F_LOCK) /* lock 'ptr' and copy everything but lock */ copy_map_value_locked(map, value, ptr, true); else copy_map_value(map, value, ptr); /* mask lock and timer, since value wasn't zero inited */ check_and_init_map_value(map, value); } rcu_read_unlock(); } bpf_enable_instrumentation(); return err; } /* Please, do not use this function outside from the map creation path * (e.g. in map update path) without taking care of setting the active * memory cgroup (see at bpf_map_kmalloc_node() for example). */ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable) { /* We really just want to fail instead of triggering OOM killer * under memory pressure, therefore we set __GFP_NORETRY to kmalloc, * which is used for lower order allocation requests. * * It has been observed that higher order allocation requests done by * vmalloc with __GFP_NORETRY being set might fail due to not trying * to reclaim memory from the page cache, thus we set * __GFP_RETRY_MAYFAIL to avoid such situations. 
*/ gfp_t gfp = bpf_memcg_flags(__GFP_NOWARN | __GFP_ZERO); unsigned int flags = 0; unsigned long align = 1; void *area; if (size >= SIZE_MAX) return NULL; /* kmalloc()'ed memory can't be mmap()'ed */ if (mmapable) { BUG_ON(!PAGE_ALIGNED(size)); align = SHMLBA; flags = VM_USERMAP; } else if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { area = kmalloc_node(size, gfp | GFP_USER | __GFP_NORETRY, numa_node); if (area != NULL) return area; } return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END, gfp | GFP_KERNEL | __GFP_RETRY_MAYFAIL, PAGE_KERNEL, flags, numa_node, __builtin_return_address(0)); } void *bpf_map_area_alloc(u64 size, int numa_node) { return __bpf_map_area_alloc(size, numa_node, false); } void *bpf_map_area_mmapable_alloc(u64 size, int numa_node) { return __bpf_map_area_alloc(size, numa_node, true); } void bpf_map_area_free(void *area) { kvfree(area); } static u32 bpf_map_flags_retain_permanent(u32 flags) { /* Some map creation flags are not tied to the map object but * rather to the map fd instead, so they have no meaning upon * map object inspection since multiple file descriptors with * different (access) properties can exist here. Thus, given * this has zero meaning for the map itself, lets clear these * from here. */ return flags & ~(BPF_F_RDONLY | BPF_F_WRONLY); } void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr) { map->map_type = attr->map_type; map->key_size = attr->key_size; map->value_size = attr->value_size; map->max_entries = attr->max_entries; map->map_flags = bpf_map_flags_retain_permanent(attr->map_flags); map->numa_node = bpf_map_attr_numa_node(attr); map->map_extra = attr->map_extra; } static int bpf_map_alloc_id(struct bpf_map *map) { int id; idr_preload(GFP_KERNEL); spin_lock_bh(&map_idr_lock); id = idr_alloc_cyclic(&map_idr, map, 1, INT_MAX, GFP_ATOMIC); if (id > 0) map->id = id; spin_unlock_bh(&map_idr_lock); idr_preload_end(); if (WARN_ON_ONCE(!id)) return -ENOSPC; return id > 0 ? 0 : id; } void bpf_map_free_id(struct bpf_map *map) { unsigned long flags; /* Offloaded maps are removed from the IDR store when their device * disappears - even if someone holds an fd to them they are unusable, * the memory is gone, all ops will fail; they are simply waiting for * refcnt to drop to be freed. */ if (!map->id) return; spin_lock_irqsave(&map_idr_lock, flags); idr_remove(&map_idr, map->id); map->id = 0; spin_unlock_irqrestore(&map_idr_lock, flags); } #ifdef CONFIG_MEMCG static void bpf_map_save_memcg(struct bpf_map *map) { /* Currently if a map is created by a process belonging to the root * memory cgroup, get_obj_cgroup_from_current() will return NULL. * So we have to check map->objcg for being NULL each time it's * being used. 
*/ if (memcg_bpf_enabled()) map->objcg = get_obj_cgroup_from_current(); } static void bpf_map_release_memcg(struct bpf_map *map) { if (map->objcg) obj_cgroup_put(map->objcg); } static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map) { if (map->objcg) return get_mem_cgroup_from_objcg(map->objcg); return root_mem_cgroup; } void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags, int node) { struct mem_cgroup *memcg, *old_memcg; void *ptr; memcg = bpf_map_get_memcg(map); old_memcg = set_active_memcg(memcg); ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node); set_active_memcg(old_memcg); mem_cgroup_put(memcg); return ptr; } void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags) { struct mem_cgroup *memcg, *old_memcg; void *ptr; memcg = bpf_map_get_memcg(map); old_memcg = set_active_memcg(memcg); ptr = kzalloc(size, flags | __GFP_ACCOUNT); set_active_memcg(old_memcg); mem_cgroup_put(memcg); return ptr; } void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size, gfp_t flags) { struct mem_cgroup *memcg, *old_memcg; void *ptr; memcg = bpf_map_get_memcg(map); old_memcg = set_active_memcg(memcg); ptr = kvcalloc(n, size, flags | __GFP_ACCOUNT); set_active_memcg(old_memcg); mem_cgroup_put(memcg); return ptr; } void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align, gfp_t flags) { struct mem_cgroup *memcg, *old_memcg; void __percpu *ptr; memcg = bpf_map_get_memcg(map); old_memcg = set_active_memcg(memcg); ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT); set_active_memcg(old_memcg); mem_cgroup_put(memcg); return ptr; } #else static void bpf_map_save_memcg(struct bpf_map *map) { } static void bpf_map_release_memcg(struct bpf_map *map) { } #endif int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid, unsigned long nr_pages, struct page **pages) { unsigned long i, j; struct page *pg; int ret = 0; #ifdef CONFIG_MEMCG struct mem_cgroup *memcg, *old_memcg; memcg = bpf_map_get_memcg(map); old_memcg = set_active_memcg(memcg); #endif for (i = 0; i < nr_pages; i++) { pg = alloc_pages_node(nid, gfp | __GFP_ACCOUNT, 0); if (pg) { pages[i] = pg; continue; } for (j = 0; j < i; j++) __free_page(pages[j]); ret = -ENOMEM; break; } #ifdef CONFIG_MEMCG set_active_memcg(old_memcg); mem_cgroup_put(memcg); #endif return ret; } static int btf_field_cmp(const void *a, const void *b) { const struct btf_field *f1 = a, *f2 = b; if (f1->offset < f2->offset) return -1; else if (f1->offset > f2->offset) return 1; return 0; } struct btf_field *btf_record_find(const struct btf_record *rec, u32 offset, u32 field_mask) { struct btf_field *field; if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & field_mask)) return NULL; field = bsearch(&offset, rec->fields, rec->cnt, sizeof(rec->fields[0]), btf_field_cmp); if (!field || !(field->type & field_mask)) return NULL; return field; } void btf_record_free(struct btf_record *rec) { int i; if (IS_ERR_OR_NULL(rec)) return; for (i = 0; i < rec->cnt; i++) { switch (rec->fields[i].type) { case BPF_KPTR_UNREF: case BPF_KPTR_REF: case BPF_KPTR_PERCPU: if (rec->fields[i].kptr.module) module_put(rec->fields[i].kptr.module); btf_put(rec->fields[i].kptr.btf); break; case BPF_LIST_HEAD: case BPF_LIST_NODE: case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: /* Nothing to release */ break; default: WARN_ON_ONCE(1); continue; } } kfree(rec); } void bpf_map_free_record(struct bpf_map *map) { btf_record_free(map->record); 
map->record = NULL; } struct btf_record *btf_record_dup(const struct btf_record *rec) { const struct btf_field *fields; struct btf_record *new_rec; int ret, size, i; if (IS_ERR_OR_NULL(rec)) return NULL; size = offsetof(struct btf_record, fields[rec->cnt]); new_rec = kmemdup(rec, size, GFP_KERNEL | __GFP_NOWARN); if (!new_rec) return ERR_PTR(-ENOMEM); /* Do a deep copy of the btf_record */ fields = rec->fields; new_rec->cnt = 0; for (i = 0; i < rec->cnt; i++) { switch (fields[i].type) { case BPF_KPTR_UNREF: case BPF_KPTR_REF: case BPF_KPTR_PERCPU: btf_get(fields[i].kptr.btf); if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) { ret = -ENXIO; goto free; } break; case BPF_LIST_HEAD: case BPF_LIST_NODE: case BPF_RB_ROOT: case BPF_RB_NODE: case BPF_SPIN_LOCK: case BPF_TIMER: case BPF_REFCOUNT: case BPF_WORKQUEUE: /* Nothing to acquire */ break; default: ret = -EFAULT; WARN_ON_ONCE(1); goto free; } new_rec->cnt++; } return new_rec; free: btf_record_free(new_rec); return ERR_PTR(ret); } bool btf_record_equal(const struct btf_record *rec_a, const struct btf_record *rec_b) { bool a_has_fields = !IS_ERR_OR_NULL(rec_a), b_has_fields = !IS_ERR_OR_NULL(rec_b); int size; if (!a_has_fields && !b_has_fields) return true; if (a_has_fields != b_has_fields) return false; if (rec_a->cnt != rec_b->cnt) return false; size = offsetof(struct btf_record, fields[rec_a->cnt]); /* btf_parse_fields uses kzalloc to allocate a btf_record, so unused * members are zeroed out. So memcmp is safe to do without worrying * about padding/unused fields. * * While spin_lock, timer, and kptr have no relation to map BTF, * list_head metadata is specific to map BTF, the btf and value_rec * members in particular. btf is the map BTF, while value_rec points to * btf_record in that map BTF. * * So while by default, we don't rely on the map BTF (which the records * were parsed from) matching for both records, which is not backwards * compatible, in case list_head is part of it, we implicitly rely on * that by way of depending on memcmp succeeding for it. */ return !memcmp(rec_a, rec_b, size); } void bpf_obj_free_timer(const struct btf_record *rec, void *obj) { if (WARN_ON_ONCE(!btf_record_has_field(rec, BPF_TIMER))) return; bpf_timer_cancel_and_free(obj + rec->timer_off); } void bpf_obj_free_workqueue(const struct btf_record *rec, void *obj) { if (WARN_ON_ONCE(!btf_record_has_field(rec, BPF_WORKQUEUE))) return; bpf_wq_cancel_and_free(obj + rec->wq_off); } void bpf_obj_free_fields(const struct btf_record *rec, void *obj) { const struct btf_field *fields; int i; if (IS_ERR_OR_NULL(rec)) return; fields = rec->fields; for (i = 0; i < rec->cnt; i++) { struct btf_struct_meta *pointee_struct_meta; const struct btf_field *field = &fields[i]; void *field_ptr = obj + field->offset; void *xchgd_field; switch (fields[i].type) { case BPF_SPIN_LOCK: break; case BPF_TIMER: bpf_timer_cancel_and_free(field_ptr); break; case BPF_WORKQUEUE: bpf_wq_cancel_and_free(field_ptr); break; case BPF_KPTR_UNREF: WRITE_ONCE(*(u64 *)field_ptr, 0); break; case BPF_KPTR_REF: case BPF_KPTR_PERCPU: xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0); if (!xchgd_field) break; if (!btf_is_kernel(field->kptr.btf)) { pointee_struct_meta = btf_find_struct_meta(field->kptr.btf, field->kptr.btf_id); migrate_disable(); __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ? 
pointee_struct_meta->record : NULL, fields[i].type == BPF_KPTR_PERCPU); migrate_enable(); } else { field->kptr.dtor(xchgd_field); } break; case BPF_LIST_HEAD: if (WARN_ON_ONCE(rec->spin_lock_off < 0)) continue; bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off); break; case BPF_RB_ROOT: if (WARN_ON_ONCE(rec->spin_lock_off < 0)) continue; bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off); break; case BPF_LIST_NODE: case BPF_RB_NODE: case BPF_REFCOUNT: break; default: WARN_ON_ONCE(1); continue; } } } /* called from workqueue */ static void bpf_map_free_deferred(struct work_struct *work) { struct bpf_map *map = container_of(work, struct bpf_map, work); struct btf_record *rec = map->record; struct btf *btf = map->btf; security_bpf_map_free(map); bpf_map_release_memcg(map); /* implementation dependent freeing */ map->ops->map_free(map); /* Delay freeing of btf_record for maps, as map_free * callback usually needs access to them. It is better to do it here * than require each callback to do the free itself manually. * * Note that the btf_record stashed in map->inner_map_meta->record was * already freed using the map_free callback for map in map case which * eventually calls bpf_map_free_meta, since inner_map_meta is only a * template bpf_map struct used during verification. */ btf_record_free(rec); /* Delay freeing of btf for maps, as map_free callback may need * struct_meta info which will be freed with btf_put(). */ btf_put(btf); } static void bpf_map_put_uref(struct bpf_map *map) { if (atomic64_dec_and_test(&map->usercnt)) { if (map->ops->map_release_uref) map->ops->map_release_uref(map); } } static void bpf_map_free_in_work(struct bpf_map *map) { INIT_WORK(&map->work, bpf_map_free_deferred); /* Avoid spawning kworkers, since they all might contend * for the same mutex like slab_mutex. */ queue_work(system_unbound_wq, &map->work); } static void bpf_map_free_rcu_gp(struct rcu_head *rcu) { bpf_map_free_in_work(container_of(rcu, struct bpf_map, rcu)); } static void bpf_map_free_mult_rcu_gp(struct rcu_head *rcu) { if (rcu_trace_implies_rcu_gp()) bpf_map_free_rcu_gp(rcu); else call_rcu(rcu, bpf_map_free_rcu_gp); } /* decrement map refcnt and schedule it for freeing via workqueue * (underlying map implementation ops->map_free() might sleep) */ void bpf_map_put(struct bpf_map *map) { if (atomic64_dec_and_test(&map->refcnt)) { /* bpf_map_free_id() must be called first */ bpf_map_free_id(map); WARN_ON_ONCE(atomic64_read(&map->sleepable_refcnt)); if (READ_ONCE(map->free_after_mult_rcu_gp)) call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp); else if (READ_ONCE(map->free_after_rcu_gp)) call_rcu(&map->rcu, bpf_map_free_rcu_gp); else bpf_map_free_in_work(map); } } EXPORT_SYMBOL_GPL(bpf_map_put); void bpf_map_put_with_uref(struct bpf_map *map) { bpf_map_put_uref(map); bpf_map_put(map); } static int bpf_map_release(struct inode *inode, struct file *filp) { struct bpf_map *map = filp->private_data; if (map->ops->map_release) map->ops->map_release(map, filp); bpf_map_put_with_uref(map); return 0; } static fmode_t map_get_sys_perms(struct bpf_map *map, struct fd f) { fmode_t mode = f.file->f_mode; /* Our file permissions may have been overridden by global * map permissions facing syscall side. 
*/ if (READ_ONCE(map->frozen)) mode &= ~FMODE_CAN_WRITE; return mode; } #ifdef CONFIG_PROC_FS /* Show the memory usage of a bpf map */ static u64 bpf_map_memory_usage(const struct bpf_map *map) { return map->ops->map_mem_usage(map); } static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp) { struct bpf_map *map = filp->private_data; u32 type = 0, jited = 0; if (map_type_contains_progs(map)) { spin_lock(&map->owner.lock); type = map->owner.type; jited = map->owner.jited; spin_unlock(&map->owner.lock); } seq_printf(m, "map_type:\t%u\n" "key_size:\t%u\n" "value_size:\t%u\n" "max_entries:\t%u\n" "map_flags:\t%#x\n" "map_extra:\t%#llx\n" "memlock:\t%llu\n" "map_id:\t%u\n" "frozen:\t%u\n", map->map_type, map->key_size, map->value_size, map->max_entries, map->map_flags, (unsigned long long)map->map_extra, bpf_map_memory_usage(map), map->id, READ_ONCE(map->frozen)); if (type) { seq_printf(m, "owner_prog_type:\t%u\n", type); seq_printf(m, "owner_jited:\t%u\n", jited); } } #endif static ssize_t bpf_dummy_read(struct file *filp, char __user *buf, size_t siz, loff_t *ppos) { /* We need this handler such that alloc_file() enables * f_mode with FMODE_CAN_READ. */ return -EINVAL; } static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf, size_t siz, loff_t *ppos) { /* We need this handler such that alloc_file() enables * f_mode with FMODE_CAN_WRITE. */ return -EINVAL; } /* called for any extra memory-mapped regions (except initial) */ static void bpf_map_mmap_open(struct vm_area_struct *vma) { struct bpf_map *map = vma->vm_file->private_data; if (vma->vm_flags & VM_MAYWRITE) bpf_map_write_active_inc(map); } /* called for all unmapped memory region (including initial) */ static void bpf_map_mmap_close(struct vm_area_struct *vma) { struct bpf_map *map = vma->vm_file->private_data; if (vma->vm_flags & VM_MAYWRITE) bpf_map_write_active_dec(map); } static const struct vm_operations_struct bpf_map_default_vmops = { .open = bpf_map_mmap_open, .close = bpf_map_mmap_close, }; static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma) { struct bpf_map *map = filp->private_data; int err; if (!map->ops->map_mmap || !IS_ERR_OR_NULL(map->record)) return -ENOTSUPP; if (!(vma->vm_flags & VM_SHARED)) return -EINVAL; mutex_lock(&map->freeze_mutex); if (vma->vm_flags & VM_WRITE) { if (map->frozen) { err = -EPERM; goto out; } /* map is meant to be read-only, so do not allow mapping as * writable, because it's possible to leak a writable page * reference and allows user-space to still modify it after * freezing, while verifier will assume contents do not change */ if (map->map_flags & BPF_F_RDONLY_PROG) { err = -EACCES; goto out; } } /* set default open/close callbacks */ vma->vm_ops = &bpf_map_default_vmops; vma->vm_private_data = map; vm_flags_clear(vma, VM_MAYEXEC); if (!(vma->vm_flags & VM_WRITE)) /* disallow re-mapping with PROT_WRITE */ vm_flags_clear(vma, VM_MAYWRITE); err = map->ops->map_mmap(map, vma); if (err) goto out; if (vma->vm_flags & VM_MAYWRITE) bpf_map_write_active_inc(map); out: mutex_unlock(&map->freeze_mutex); return err; } static __poll_t bpf_map_poll(struct file *filp, struct poll_table_struct *pts) { struct bpf_map *map = filp->private_data; if (map->ops->map_poll) return map->ops->map_poll(map, filp, pts); return EPOLLERR; } static unsigned long bpf_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { struct bpf_map *map = filp->private_data; if (map->ops->map_get_unmapped_area) return 
map->ops->map_get_unmapped_area(filp, addr, len, pgoff, flags); #ifdef CONFIG_MMU return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags); #else return addr; #endif } const struct file_operations bpf_map_fops = { #ifdef CONFIG_PROC_FS .show_fdinfo = bpf_map_show_fdinfo, #endif .release = bpf_map_release, .read = bpf_dummy_read, .write = bpf_dummy_write, .mmap = bpf_map_mmap, .poll = bpf_map_poll, .get_unmapped_area = bpf_get_unmapped_area, }; int bpf_map_new_fd(struct bpf_map *map, int flags) { int ret; ret = security_bpf_map(map, OPEN_FMODE(flags)); if (ret < 0) return ret; return anon_inode_getfd("bpf-map", &bpf_map_fops, map, flags | O_CLOEXEC); } int bpf_get_file_flag(int flags) { if ((flags & BPF_F_RDONLY) && (flags & BPF_F_WRONLY)) return -EINVAL; if (flags & BPF_F_RDONLY) return O_RDONLY; if (flags & BPF_F_WRONLY) return O_WRONLY; return O_RDWR; } /* helper macro to check that unused fields 'union bpf_attr' are zero */ #define CHECK_ATTR(CMD) \ memchr_inv((void *) &attr->CMD##_LAST_FIELD + \ sizeof(attr->CMD##_LAST_FIELD), 0, \ sizeof(*attr) - \ offsetof(union bpf_attr, CMD##_LAST_FIELD) - \ sizeof(attr->CMD##_LAST_FIELD)) != NULL /* dst and src must have at least "size" number of bytes. * Return strlen on success and < 0 on error. */ int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size) { const char *end = src + size; const char *orig_src = src; memset(dst, 0, size); /* Copy all isalnum(), '_' and '.' chars. */ while (src < end && *src) { if (!isalnum(*src) && *src != '_' && *src != '.') return -EINVAL; *dst++ = *src++; } /* No '\0' found in "size" number of bytes */ if (src == end) return -EINVAL; return src - orig_src; } int map_check_no_btf(const struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) { return -ENOTSUPP; } static int map_check_btf(struct bpf_map *map, struct bpf_token *token, const struct btf *btf, u32 btf_key_id, u32 btf_value_id) { const struct btf_type *key_type, *value_type; u32 key_size, value_size; int ret = 0; /* Some maps allow key to be unspecified. 
*/ if (btf_key_id) { key_type = btf_type_id_size(btf, &btf_key_id, &key_size); if (!key_type || key_size != map->key_size) return -EINVAL; } else { key_type = btf_type_by_id(btf, 0); if (!map->ops->map_check_btf) return -EINVAL; } value_type = btf_type_id_size(btf, &btf_value_id, &value_size); if (!value_type || value_size != map->value_size) return -EINVAL; map->record = btf_parse_fields(btf, value_type, BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE, map->value_size); if (!IS_ERR_OR_NULL(map->record)) { int i; if (!bpf_token_capable(token, CAP_BPF)) { ret = -EPERM; goto free_map_tab; } if (map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) { ret = -EACCES; goto free_map_tab; } for (i = 0; i < sizeof(map->record->field_mask) * 8; i++) { switch (map->record->field_mask & (1 << i)) { case 0: continue; case BPF_SPIN_LOCK: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && map->map_type != BPF_MAP_TYPE_SK_STORAGE && map->map_type != BPF_MAP_TYPE_INODE_STORAGE && map->map_type != BPF_MAP_TYPE_TASK_STORAGE && map->map_type != BPF_MAP_TYPE_CGRP_STORAGE) { ret = -EOPNOTSUPP; goto free_map_tab; } break; case BPF_TIMER: case BPF_WORKQUEUE: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_LRU_HASH && map->map_type != BPF_MAP_TYPE_ARRAY) { ret = -EOPNOTSUPP; goto free_map_tab; } break; case BPF_KPTR_UNREF: case BPF_KPTR_REF: case BPF_KPTR_PERCPU: case BPF_REFCOUNT: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_PERCPU_HASH && map->map_type != BPF_MAP_TYPE_LRU_HASH && map->map_type != BPF_MAP_TYPE_LRU_PERCPU_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY && map->map_type != BPF_MAP_TYPE_SK_STORAGE && map->map_type != BPF_MAP_TYPE_INODE_STORAGE && map->map_type != BPF_MAP_TYPE_TASK_STORAGE && map->map_type != BPF_MAP_TYPE_CGRP_STORAGE) { ret = -EOPNOTSUPP; goto free_map_tab; } break; case BPF_LIST_HEAD: case BPF_RB_ROOT: if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_LRU_HASH && map->map_type != BPF_MAP_TYPE_ARRAY) { ret = -EOPNOTSUPP; goto free_map_tab; } break; default: /* Fail if map_type checks are missing for a field type */ ret = -EOPNOTSUPP; goto free_map_tab; } } } ret = btf_check_and_fixup_fields(btf, map->record); if (ret < 0) goto free_map_tab; if (map->ops->map_check_btf) { ret = map->ops->map_check_btf(map, btf, key_type, value_type); if (ret < 0) goto free_map_tab; } return ret; free_map_tab: bpf_map_free_record(map); return ret; } static bool bpf_net_capable(void) { return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN); } #define BPF_MAP_CREATE_LAST_FIELD map_token_fd /* called via syscall */ static int map_create(union bpf_attr *attr) { const struct bpf_map_ops *ops; struct bpf_token *token = NULL; int numa_node = bpf_map_attr_numa_node(attr); u32 map_type = attr->map_type; struct bpf_map *map; bool token_flag; int f_flags; int err; err = CHECK_ATTR(BPF_MAP_CREATE); if (err) return -EINVAL; /* check BPF_F_TOKEN_FD flag, remember if it's set, and then clear it * to avoid per-map type checks tripping on unknown flag */ token_flag = attr->map_flags & BPF_F_TOKEN_FD; attr->map_flags &= ~BPF_F_TOKEN_FD; if (attr->btf_vmlinux_value_type_id) { if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS || attr->btf_key_type_id || attr->btf_value_type_id) return -EINVAL; } else if (attr->btf_key_type_id && !attr->btf_value_type_id) { return -EINVAL; } if 
(attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER && attr->map_type != BPF_MAP_TYPE_ARENA && attr->map_extra != 0) return -EINVAL; f_flags = bpf_get_file_flag(attr->map_flags); if (f_flags < 0) return f_flags; if (numa_node != NUMA_NO_NODE && ((unsigned int)numa_node >= nr_node_ids || !node_online(numa_node))) return -EINVAL; /* find map type and init map: hashtable vs rbtree vs bloom vs ... */ map_type = attr->map_type; if (map_type >= ARRAY_SIZE(bpf_map_types)) return -EINVAL; map_type = array_index_nospec(map_type, ARRAY_SIZE(bpf_map_types)); ops = bpf_map_types[map_type]; if (!ops) return -EINVAL; if (ops->map_alloc_check) { err = ops->map_alloc_check(attr); if (err) return err; } if (attr->map_ifindex) ops = &bpf_map_offload_ops; if (!ops->map_mem_usage) return -EINVAL; if (token_flag) { token = bpf_token_get_from_fd(attr->map_token_fd); if (IS_ERR(token)) return PTR_ERR(token); /* if current token doesn't grant map creation permissions, * then we can't use this token, so ignore it and rely on * system-wide capabilities checks */ if (!bpf_token_allow_cmd(token, BPF_MAP_CREATE) || !bpf_token_allow_map_type(token, attr->map_type)) { bpf_token_put(token); token = NULL; } } err = -EPERM; /* Intent here is for unprivileged_bpf_disabled to block BPF map * creation for unprivileged users; other actions depend * on fd availability and access to bpffs, so are dependent on * object creation success. Even with unprivileged BPF disabled, * capability checks are still carried out. */ if (sysctl_unprivileged_bpf_disabled && !bpf_token_capable(token, CAP_BPF)) goto put_token; /* check privileged map type permissions */ switch (map_type) { case BPF_MAP_TYPE_ARRAY: case BPF_MAP_TYPE_PERCPU_ARRAY: case BPF_MAP_TYPE_PROG_ARRAY: case BPF_MAP_TYPE_PERF_EVENT_ARRAY: case BPF_MAP_TYPE_CGROUP_ARRAY: case BPF_MAP_TYPE_ARRAY_OF_MAPS: case BPF_MAP_TYPE_HASH: case BPF_MAP_TYPE_PERCPU_HASH: case BPF_MAP_TYPE_HASH_OF_MAPS: case BPF_MAP_TYPE_RINGBUF: case BPF_MAP_TYPE_USER_RINGBUF: case BPF_MAP_TYPE_CGROUP_STORAGE: case BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE: /* unprivileged */ break; case BPF_MAP_TYPE_SK_STORAGE: case BPF_MAP_TYPE_INODE_STORAGE: case BPF_MAP_TYPE_TASK_STORAGE: case BPF_MAP_TYPE_CGRP_STORAGE: case BPF_MAP_TYPE_BLOOM_FILTER: case BPF_MAP_TYPE_LPM_TRIE: case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY: case BPF_MAP_TYPE_STACK_TRACE: case BPF_MAP_TYPE_QUEUE: case BPF_MAP_TYPE_STACK: case BPF_MAP_TYPE_LRU_HASH: case BPF_MAP_TYPE_LRU_PERCPU_HASH: case BPF_MAP_TYPE_STRUCT_OPS: case BPF_MAP_TYPE_CPUMAP: case BPF_MAP_TYPE_ARENA: if (!bpf_token_capable(token, CAP_BPF)) goto put_token; break; case BPF_MAP_TYPE_SOCKMAP: case BPF_MAP_TYPE_SOCKHASH: case BPF_MAP_TYPE_DEVMAP: case BPF_MAP_TYPE_DEVMAP_HASH: case BPF_MAP_TYPE_XSKMAP: if (!bpf_token_capable(token, CAP_NET_ADMIN)) goto put_token; break; default: WARN(1, "unsupported map type %d", map_type); goto put_token; } map = ops->map_alloc(attr); if (IS_ERR(map)) { err = PTR_ERR(map); goto put_token; } map->ops = ops; map->map_type = map_type; err = bpf_obj_name_cpy(map->name, attr->map_name, sizeof(attr->map_name)); if (err < 0) goto free_map; atomic64_set(&map->refcnt, 1); atomic64_set(&map->usercnt, 1); mutex_init(&map->freeze_mutex); spin_lock_init(&map->owner.lock); if (attr->btf_key_type_id || attr->btf_value_type_id || /* Even the map's value is a kernel's struct, * the bpf_prog.o must have BTF to begin with * to figure out the corresponding kernel's * counter part. Thus, attr->btf_fd has * to be valid also. 
*/ attr->btf_vmlinux_value_type_id) { struct btf *btf; btf = btf_get_by_fd(attr->btf_fd); if (IS_ERR(btf)) { err = PTR_ERR(btf); goto free_map; } if (btf_is_kernel(btf)) { btf_put(btf); err = -EACCES; goto free_map; } map->btf = btf; if (attr->btf_value_type_id) { err = map_check_btf(map, token, btf, attr->btf_key_type_id, attr->btf_value_type_id); if (err) goto free_map; } map->btf_key_type_id = attr->btf_key_type_id; map->btf_value_type_id = attr->btf_value_type_id; map->btf_vmlinux_value_type_id = attr->btf_vmlinux_value_type_id; } err = security_bpf_map_create(map, attr, token); if (err) goto free_map_sec; err = bpf_map_alloc_id(map); if (err) goto free_map_sec; bpf_map_save_memcg(map); bpf_token_put(token); err = bpf_map_new_fd(map, f_flags); if (err < 0) { /* failed to allocate fd. * bpf_map_put_with_uref() is needed because the above * bpf_map_alloc_id() has published the map * to the userspace and the userspace may * have refcnt-ed it through BPF_MAP_GET_FD_BY_ID. */ bpf_map_put_with_uref(map); return err; } return err; free_map_sec: security_bpf_map_free(map); free_map: btf_put(map->btf); map->ops->map_free(map); put_token: bpf_token_put(token); return err; } /* if error is returned, fd is released. * On success caller should complete fd access with matching fdput() */ struct bpf_map *__bpf_map_get(struct fd f) { if (!f.file) return ERR_PTR(-EBADF); if (f.file->f_op != &bpf_map_fops) { fdput(f); return ERR_PTR(-EINVAL); } return f.file->private_data; } void bpf_map_inc(struct bpf_map *map) { atomic64_inc(&map->refcnt); } EXPORT_SYMBOL_GPL(bpf_map_inc); void bpf_map_inc_with_uref(struct bpf_map *map) { atomic64_inc(&map->refcnt); atomic64_inc(&map->usercnt); } EXPORT_SYMBOL_GPL(bpf_map_inc_with_uref); struct bpf_map *bpf_map_get(u32 ufd) { struct fd f = fdget(ufd); struct bpf_map *map; map = __bpf_map_get(f); if (IS_ERR(map)) return map; bpf_map_inc(map); fdput(f); return map; } EXPORT_SYMBOL(bpf_map_get); struct bpf_map *bpf_map_get_with_uref(u32 ufd) { struct fd f = fdget(ufd); struct bpf_map *map; map = __bpf_map_get(f); if (IS_ERR(map)) return map; bpf_map_inc_with_uref(map); fdput(f); return map; } /* map_idr_lock should have been held or the map should have been * protected by rcu read lock. 
*/ struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref) { int refold; refold = atomic64_fetch_add_unless(&map->refcnt, 1, 0); if (!refold) return ERR_PTR(-ENOENT); if (uref) atomic64_inc(&map->usercnt); return map; } struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map) { spin_lock_bh(&map_idr_lock); map = __bpf_map_inc_not_zero(map, false); spin_unlock_bh(&map_idr_lock); return map; } EXPORT_SYMBOL_GPL(bpf_map_inc_not_zero); int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value) { return -ENOTSUPP; } static void *__bpf_copy_key(void __user *ukey, u64 key_size) { if (key_size) return vmemdup_user(ukey, key_size); if (ukey) return ERR_PTR(-EINVAL); return NULL; } static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size) { if (key_size) return kvmemdup_bpfptr(ukey, key_size); if (!bpfptr_is_null(ukey)) return ERR_PTR(-EINVAL); return NULL; } /* last field in 'union bpf_attr' used by this command */ #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD flags static int map_lookup_elem(union bpf_attr *attr) { void __user *ukey = u64_to_user_ptr(attr->key); void __user *uvalue = u64_to_user_ptr(attr->value); int ufd = attr->map_fd; struct bpf_map *map; void *key, *value; u32 value_size; struct fd f; int err; if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM)) return -EINVAL; if (attr->flags & ~BPF_F_LOCK) return -EINVAL; f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) { err = -EPERM; goto err_put; } if ((attr->flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)) { err = -EINVAL; goto err_put; } key = __bpf_copy_key(ukey, map->key_size); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; } value_size = bpf_map_value_size(map); err = -ENOMEM; value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN); if (!value) goto free_key; if (map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) { if (copy_from_user(value, uvalue, value_size)) err = -EFAULT; else err = bpf_map_copy_value(map, key, value, attr->flags); goto free_value; } err = bpf_map_copy_value(map, key, value, attr->flags); if (err) goto free_value; err = -EFAULT; if (copy_to_user(uvalue, value, value_size) != 0) goto free_value; err = 0; free_value: kvfree(value); free_key: kvfree(key); err_put: fdput(f); return err; } #define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr) { bpfptr_t ukey = make_bpfptr(attr->key, uattr.is_kernel); bpfptr_t uvalue = make_bpfptr(attr->value, uattr.is_kernel); int ufd = attr->map_fd; struct bpf_map *map; void *key, *value; u32 value_size; struct fd f; int err; if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM)) return -EINVAL; f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); bpf_map_write_active_inc(map); if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { err = -EPERM; goto err_put; } if ((attr->flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)) { err = -EINVAL; goto err_put; } key = ___bpf_copy_key(ukey, map->key_size); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; } value_size = bpf_map_value_size(map); value = kvmemdup_bpfptr(uvalue, value_size); if (IS_ERR(value)) { err = PTR_ERR(value); goto free_key; } err = bpf_map_update_value(map, f.file, key, value, attr->flags); if (!err) maybe_wait_bpf_programs(map); kvfree(value); free_key: kvfree(key); err_put: bpf_map_write_active_dec(map); fdput(f); return err; } #define BPF_MAP_DELETE_ELEM_LAST_FIELD key static int map_delete_elem(union bpf_attr *attr, 
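/*
 * Illustrative sketch, not part of this file: map_lookup_elem() and
 * map_update_elem() above are driven from user space through the bpf(2)
 * syscall, with the key/value pointers packed into union bpf_attr.  Roughly
 * (error handling trimmed; ptr_to_u64() is a local helper, not a kernel API):
 */
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

static __u64 ptr_to_u64(const void *p)
{
	return (__u64)(unsigned long)p;
}

static int map_update(int map_fd, const void *key, const void *value)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = ptr_to_u64(key);
	attr.value = ptr_to_u64(value);
	attr.flags = BPF_ANY;		/* create the element or update it in place */

	return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

static int map_lookup(int map_fd, const void *key, void *value)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = ptr_to_u64(key);
	attr.value = ptr_to_u64(value);	/* kernel copies value_size bytes back here */

	return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}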
bpfptr_t uattr) { bpfptr_t ukey = make_bpfptr(attr->key, uattr.is_kernel); int ufd = attr->map_fd; struct bpf_map *map; struct fd f; void *key; int err; if (CHECK_ATTR(BPF_MAP_DELETE_ELEM)) return -EINVAL; f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); bpf_map_write_active_inc(map); if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { err = -EPERM; goto err_put; } key = ___bpf_copy_key(ukey, map->key_size); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; } if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_delete_elem(map, key); goto out; } else if (IS_FD_PROG_ARRAY(map) || map->map_type == BPF_MAP_TYPE_STRUCT_OPS) { /* These maps require sleepable context */ err = map->ops->map_delete_elem(map, key); goto out; } bpf_disable_instrumentation(); rcu_read_lock(); err = map->ops->map_delete_elem(map, key); rcu_read_unlock(); bpf_enable_instrumentation(); if (!err) maybe_wait_bpf_programs(map); out: kvfree(key); err_put: bpf_map_write_active_dec(map); fdput(f); return err; } /* last field in 'union bpf_attr' used by this command */ #define BPF_MAP_GET_NEXT_KEY_LAST_FIELD next_key static int map_get_next_key(union bpf_attr *attr) { void __user *ukey = u64_to_user_ptr(attr->key); void __user *unext_key = u64_to_user_ptr(attr->next_key); int ufd = attr->map_fd; struct bpf_map *map; void *key, *next_key; struct fd f; int err; if (CHECK_ATTR(BPF_MAP_GET_NEXT_KEY)) return -EINVAL; f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) { err = -EPERM; goto err_put; } if (ukey) { key = __bpf_copy_key(ukey, map->key_size); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; } } else { key = NULL; } err = -ENOMEM; next_key = kvmalloc(map->key_size, GFP_USER); if (!next_key) goto free_key; if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_get_next_key(map, key, next_key); goto out; } rcu_read_lock(); err = map->ops->map_get_next_key(map, key, next_key); rcu_read_unlock(); out: if (err) goto free_next_key; err = -EFAULT; if (copy_to_user(unext_key, next_key, map->key_size) != 0) goto free_next_key; err = 0; free_next_key: kvfree(next_key); free_key: kvfree(key); err_put: fdput(f); return err; } int generic_map_delete_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { void __user *keys = u64_to_user_ptr(attr->batch.keys); u32 cp, max_count; int err = 0; void *key; if (attr->batch.elem_flags & ~BPF_F_LOCK) return -EINVAL; if ((attr->batch.elem_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)) { return -EINVAL; } max_count = attr->batch.count; if (!max_count) return 0; if (put_user(0, &uattr->batch.count)) return -EFAULT; key = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN); if (!key) return -ENOMEM; for (cp = 0; cp < max_count; cp++) { err = -EFAULT; if (copy_from_user(key, keys + cp * map->key_size, map->key_size)) break; if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_delete_elem(map, key); break; } bpf_disable_instrumentation(); rcu_read_lock(); err = map->ops->map_delete_elem(map, key); rcu_read_unlock(); bpf_enable_instrumentation(); if (err) break; cond_resched(); } if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp))) err = -EFAULT; kvfree(key); return err; } int generic_map_update_batch(struct bpf_map *map, struct file *map_file, const union bpf_attr *attr, union bpf_attr __user *uattr) { void __user *values = u64_to_user_ptr(attr->batch.values); void __user *keys = u64_to_user_ptr(attr->batch.keys); u32 
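/*
 * Illustrative sketch, not part of this file: map_get_next_key() above is what
 * makes whole-map iteration from user space possible.  A NULL key yields the
 * first element and ENOENT signals the end.  Rough caller (reuses the
 * ptr_to_u64()/bpf(2) boilerplate from the previous sketch; map_for_each_key
 * is a made-up name):
 */
#include <errno.h>

static int map_for_each_key(int map_fd, __u32 key_size,
			    void (*cb)(const void *key))
{
	union bpf_attr attr;
	char cur[key_size], next[key_size];	/* VLAs, for illustration only */

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = 0;				/* NULL key: start from the first element */
	attr.next_key = ptr_to_u64(next);

	while (!syscall(__NR_bpf, BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr))) {
		cb(next);
		memcpy(cur, next, key_size);
		attr.key = ptr_to_u64(cur);	/* resume from the key just returned */
	}
	return errno == ENOENT ? 0 : -1;	/* ENOENT is the clean end of iteration */
}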
value_size, cp, max_count; void *key, *value; int err = 0; if (attr->batch.elem_flags & ~BPF_F_LOCK) return -EINVAL; if ((attr->batch.elem_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)) { return -EINVAL; } value_size = bpf_map_value_size(map); max_count = attr->batch.count; if (!max_count) return 0; if (put_user(0, &uattr->batch.count)) return -EFAULT; key = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN); if (!key) return -ENOMEM; value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN); if (!value) { kvfree(key); return -ENOMEM; } for (cp = 0; cp < max_count; cp++) { err = -EFAULT; if (copy_from_user(key, keys + cp * map->key_size, map->key_size) || copy_from_user(value, values + cp * value_size, value_size)) break; err = bpf_map_update_value(map, map_file, key, value, attr->batch.elem_flags); if (err) break; cond_resched(); } if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp))) err = -EFAULT; kvfree(value); kvfree(key); return err; } #define MAP_LOOKUP_RETRIES 3 int generic_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch); void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch); void __user *values = u64_to_user_ptr(attr->batch.values); void __user *keys = u64_to_user_ptr(attr->batch.keys); void *buf, *buf_prevkey, *prev_key, *key, *value; int err, retry = MAP_LOOKUP_RETRIES; u32 value_size, cp, max_count; if (attr->batch.elem_flags & ~BPF_F_LOCK) return -EINVAL; if ((attr->batch.elem_flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)) return -EINVAL; value_size = bpf_map_value_size(map); max_count = attr->batch.count; if (!max_count) return 0; if (put_user(0, &uattr->batch.count)) return -EFAULT; buf_prevkey = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN); if (!buf_prevkey) return -ENOMEM; buf = kvmalloc(map->key_size + value_size, GFP_USER | __GFP_NOWARN); if (!buf) { kvfree(buf_prevkey); return -ENOMEM; } err = -EFAULT; prev_key = NULL; if (ubatch && copy_from_user(buf_prevkey, ubatch, map->key_size)) goto free_buf; key = buf; value = key + map->key_size; if (ubatch) prev_key = buf_prevkey; for (cp = 0; cp < max_count;) { rcu_read_lock(); err = map->ops->map_get_next_key(map, prev_key, key); rcu_read_unlock(); if (err) break; err = bpf_map_copy_value(map, key, value, attr->batch.elem_flags); if (err == -ENOENT) { if (retry) { retry--; continue; } err = -EINTR; break; } if (err) goto free_buf; if (copy_to_user(keys + cp * map->key_size, key, map->key_size)) { err = -EFAULT; goto free_buf; } if (copy_to_user(values + cp * value_size, value, value_size)) { err = -EFAULT; goto free_buf; } if (!prev_key) prev_key = buf_prevkey; swap(prev_key, key); retry = MAP_LOOKUP_RETRIES; cp++; cond_resched(); } if (err == -EFAULT) goto free_buf; if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) || (cp && copy_to_user(uobatch, prev_key, map->key_size)))) err = -EFAULT; free_buf: kvfree(buf_prevkey); kvfree(buf); return err; } #define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD flags static int map_lookup_and_delete_elem(union bpf_attr *attr) { void __user *ukey = u64_to_user_ptr(attr->key); void __user *uvalue = u64_to_user_ptr(attr->value); int ufd = attr->map_fd; struct bpf_map *map; void *key, *value; u32 value_size; struct fd f; int err; if (CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM)) return -EINVAL; if (attr->flags & ~BPF_F_LOCK) return -EINVAL; f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); 
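/*
 * Illustrative sketch, not part of this file: generic_map_lookup_batch() above
 * is reached via BPF_MAP_LOOKUP_BATCH.  User space supplies key/value arrays
 * plus an opaque in_batch/out_batch cursor and calls again with the returned
 * cursor until the syscall fails with ENOENT.  One round, roughly (helper
 * names and the ptr_to_u64()/bpf(2) boilerplate as in the earlier sketches):
 */
static int map_lookup_batch_once(int map_fd, void *in_batch, void *out_batch,
				 void *keys, void *values, __u32 *count)
{
	union bpf_attr attr;
	int err;

	memset(&attr, 0, sizeof(attr));
	attr.batch.map_fd = map_fd;
	attr.batch.in_batch = ptr_to_u64(in_batch);	/* NULL on the first call */
	attr.batch.out_batch = ptr_to_u64(out_batch);	/* cursor for the next call */
	attr.batch.keys = ptr_to_u64(keys);
	attr.batch.values = ptr_to_u64(values);
	attr.batch.count = *count;	/* in: array capacity, out: elements copied */

	err = syscall(__NR_bpf, BPF_MAP_LOOKUP_BATCH, &attr, sizeof(attr));
	*count = attr.batch.count;
	return err;
}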
bpf_map_write_active_inc(map); if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ) || !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { err = -EPERM; goto err_put; } if (attr->flags && (map->map_type == BPF_MAP_TYPE_QUEUE || map->map_type == BPF_MAP_TYPE_STACK)) { err = -EINVAL; goto err_put; } if ((attr->flags & BPF_F_LOCK) && !btf_record_has_field(map->record, BPF_SPIN_LOCK)) { err = -EINVAL; goto err_put; } key = __bpf_copy_key(ukey, map->key_size); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; } value_size = bpf_map_value_size(map); err = -ENOMEM; value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN); if (!value) goto free_key; err = -ENOTSUPP; if (map->map_type == BPF_MAP_TYPE_QUEUE || map->map_type == BPF_MAP_TYPE_STACK) { err = map->ops->map_pop_elem(map, value); } else if (map->map_type == BPF_MAP_TYPE_HASH || map->map_type == BPF_MAP_TYPE_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) { if (!bpf_map_is_offloaded(map)) { bpf_disable_instrumentation(); rcu_read_lock(); err = map->ops->map_lookup_and_delete_elem(map, key, value, attr->flags); rcu_read_unlock(); bpf_enable_instrumentation(); } } if (err) goto free_value; if (copy_to_user(uvalue, value, value_size) != 0) { err = -EFAULT; goto free_value; } err = 0; free_value: kvfree(value); free_key: kvfree(key); err_put: bpf_map_write_active_dec(map); fdput(f); return err; } #define BPF_MAP_FREEZE_LAST_FIELD map_fd static int map_freeze(const union bpf_attr *attr) { int err = 0, ufd = attr->map_fd; struct bpf_map *map; struct fd f; if (CHECK_ATTR(BPF_MAP_FREEZE)) return -EINVAL; f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS || !IS_ERR_OR_NULL(map->record)) { fdput(f); return -ENOTSUPP; } if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { fdput(f); return -EPERM; } mutex_lock(&map->freeze_mutex); if (bpf_map_write_active(map)) { err = -EBUSY; goto err_put; } if (READ_ONCE(map->frozen)) { err = -EBUSY; goto err_put; } WRITE_ONCE(map->frozen, true); err_put: mutex_unlock(&map->freeze_mutex); fdput(f); return err; } static const struct bpf_prog_ops * const bpf_prog_types[] = { #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \ [_id] = & _name ## _prog_ops, #define BPF_MAP_TYPE(_id, _ops) #define BPF_LINK_TYPE(_id, _name) #include <linux/bpf_types.h> #undef BPF_PROG_TYPE #undef BPF_MAP_TYPE #undef BPF_LINK_TYPE }; static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog) { const struct bpf_prog_ops *ops; if (type >= ARRAY_SIZE(bpf_prog_types)) return -EINVAL; type = array_index_nospec(type, ARRAY_SIZE(bpf_prog_types)); ops = bpf_prog_types[type]; if (!ops) return -EINVAL; if (!bpf_prog_is_offloaded(prog->aux)) prog->aux->ops = ops; else prog->aux->ops = &bpf_offload_prog_ops; prog->type = type; return 0; } enum bpf_audit { BPF_AUDIT_LOAD, BPF_AUDIT_UNLOAD, BPF_AUDIT_MAX, }; static const char * const bpf_audit_str[BPF_AUDIT_MAX] = { [BPF_AUDIT_LOAD] = "LOAD", [BPF_AUDIT_UNLOAD] = "UNLOAD", }; static void bpf_audit_prog(const struct bpf_prog *prog, unsigned int op) { struct audit_context *ctx = NULL; struct audit_buffer *ab; if (WARN_ON_ONCE(op >= BPF_AUDIT_MAX)) return; if (audit_enabled == AUDIT_OFF) return; if (!in_irq() && !irqs_disabled()) ctx = audit_context(); ab = audit_log_start(ctx, GFP_ATOMIC, AUDIT_BPF); if (unlikely(!ab)) return; audit_log_format(ab, "prog-id=%u op=%s", prog->aux->id, bpf_audit_str[op]); audit_log_end(ab); } static int 
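/*
 * Illustrative sketch, not part of this file: map_freeze() above backs
 * BPF_MAP_FREEZE, which makes a map read-only from the syscall side (it is
 * refused for maps with special BTF fields and while syscall writes are in
 * flight).  Minimal caller, reusing the bpf(2) boilerplate from the sketches
 * above:
 */
static int map_freeze_fd(int map_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;	/* map_fd is the last attribute BPF_MAP_FREEZE looks at */

	return syscall(__NR_bpf, BPF_MAP_FREEZE, &attr, sizeof(attr));
}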
bpf_prog_alloc_id(struct bpf_prog *prog) { int id; idr_preload(GFP_KERNEL); spin_lock_bh(&prog_idr_lock); id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC); if (id > 0) prog->aux->id = id; spin_unlock_bh(&prog_idr_lock); idr_preload_end(); /* id is in [1, INT_MAX) */ if (WARN_ON_ONCE(!id)) return -ENOSPC; return id > 0 ? 0 : id; } void bpf_prog_free_id(struct bpf_prog *prog) { unsigned long flags; /* cBPF to eBPF migrations are currently not in the idr store. * Offloaded programs are removed from the store when their device * disappears - even if someone grabs an fd to them they are unusable, * simply waiting for refcnt to drop to be freed. */ if (!prog->aux->id) return; spin_lock_irqsave(&prog_idr_lock, flags); idr_remove(&prog_idr, prog->aux->id); prog->aux->id = 0; spin_unlock_irqrestore(&prog_idr_lock, flags); } static void __bpf_prog_put_rcu(struct rcu_head *rcu) { struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu); kvfree(aux->func_info); kfree(aux->func_info_aux); free_uid(aux->user); security_bpf_prog_free(aux->prog); bpf_prog_free(aux->prog); } static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred) { bpf_prog_kallsyms_del_all(prog); btf_put(prog->aux->btf); module_put(prog->aux->mod); kvfree(prog->aux->jited_linfo); kvfree(prog->aux->linfo); kfree(prog->aux->kfunc_tab); if (prog->aux->attach_btf) btf_put(prog->aux->attach_btf); if (deferred) { if (prog->sleepable) call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_rcu); else call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); } else { __bpf_prog_put_rcu(&prog->aux->rcu); } } static void bpf_prog_put_deferred(struct work_struct *work) { struct bpf_prog_aux *aux; struct bpf_prog *prog; aux = container_of(work, struct bpf_prog_aux, work); prog = aux->prog; perf_event_bpf_event(prog, PERF_BPF_EVENT_PROG_UNLOAD, 0); bpf_audit_prog(prog, BPF_AUDIT_UNLOAD); bpf_prog_free_id(prog); __bpf_prog_put_noref(prog, true); } static void __bpf_prog_put(struct bpf_prog *prog) { struct bpf_prog_aux *aux = prog->aux; if (atomic64_dec_and_test(&aux->refcnt)) { if (in_irq() || irqs_disabled()) { INIT_WORK(&aux->work, bpf_prog_put_deferred); schedule_work(&aux->work); } else { bpf_prog_put_deferred(&aux->work); } } } void bpf_prog_put(struct bpf_prog *prog) { __bpf_prog_put(prog); } EXPORT_SYMBOL_GPL(bpf_prog_put); static int bpf_prog_release(struct inode *inode, struct file *filp) { struct bpf_prog *prog = filp->private_data; bpf_prog_put(prog); return 0; } struct bpf_prog_kstats { u64 nsecs; u64 cnt; u64 misses; }; void notrace bpf_prog_inc_misses_counter(struct bpf_prog *prog) { struct bpf_prog_stats *stats; unsigned int flags; stats = this_cpu_ptr(prog->stats); flags = u64_stats_update_begin_irqsave(&stats->syncp); u64_stats_inc(&stats->misses); u64_stats_update_end_irqrestore(&stats->syncp, flags); } static void bpf_prog_get_stats(const struct bpf_prog *prog, struct bpf_prog_kstats *stats) { u64 nsecs = 0, cnt = 0, misses = 0; int cpu; for_each_possible_cpu(cpu) { const struct bpf_prog_stats *st; unsigned int start; u64 tnsecs, tcnt, tmisses; st = per_cpu_ptr(prog->stats, cpu); do { start = u64_stats_fetch_begin(&st->syncp); tnsecs = u64_stats_read(&st->nsecs); tcnt = u64_stats_read(&st->cnt); tmisses = u64_stats_read(&st->misses); } while (u64_stats_fetch_retry(&st->syncp, start)); nsecs += tnsecs; cnt += tcnt; misses += tmisses; } stats->nsecs = nsecs; stats->cnt = cnt; stats->misses = misses; } #ifdef CONFIG_PROC_FS static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp) { const 
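/*
 * Illustrative sketch, not part of this file: bpf_prog_get_stats() above
 * aggregates per-CPU counters with the u64_stats sequence-counter protocol --
 * read the sequence, copy the counters, retry if a writer bumped the sequence
 * in the meantime.  The same shape in self-contained C11 (a simplified
 * stand-in for the kernel's u64_stats_* API, not a drop-in replacement):
 */
#include <stdatomic.h>
#include <stdint.h>

struct stats_sketch {
	atomic_uint seq;		/* odd while a writer is mid-update */
	_Atomic uint64_t nsecs;
	_Atomic uint64_t cnt;
};

static void stats_add(struct stats_sketch *s, uint64_t delta_ns)
{
	atomic_fetch_add(&s->seq, 1);	/* begin write: seq becomes odd */
	atomic_fetch_add_explicit(&s->nsecs, delta_ns, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->cnt, 1, memory_order_relaxed);
	atomic_fetch_add(&s->seq, 1);	/* end write: seq becomes even again */
}

static void stats_read(const struct stats_sketch *s, uint64_t *nsecs, uint64_t *cnt)
{
	unsigned int start;

	do {
		start = atomic_load(&s->seq);
		*nsecs = atomic_load_explicit(&s->nsecs, memory_order_relaxed);
		*cnt = atomic_load_explicit(&s->cnt, memory_order_relaxed);
	} while ((start & 1) || atomic_load(&s->seq) != start);
}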
struct bpf_prog *prog = filp->private_data; char prog_tag[sizeof(prog->tag) * 2 + 1] = { }; struct bpf_prog_kstats stats; bpf_prog_get_stats(prog, &stats); bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); seq_printf(m, "prog_type:\t%u\n" "prog_jited:\t%u\n" "prog_tag:\t%s\n" "memlock:\t%llu\n" "prog_id:\t%u\n" "run_time_ns:\t%llu\n" "run_cnt:\t%llu\n" "recursion_misses:\t%llu\n" "verified_insns:\t%u\n", prog->type, prog->jited, prog_tag, prog->pages * 1ULL << PAGE_SHIFT, prog->aux->id, stats.nsecs, stats.cnt, stats.misses, prog->aux->verified_insns); } #endif const struct file_operations bpf_prog_fops = { #ifdef CONFIG_PROC_FS .show_fdinfo = bpf_prog_show_fdinfo, #endif .release = bpf_prog_release, .read = bpf_dummy_read, .write = bpf_dummy_write, }; int bpf_prog_new_fd(struct bpf_prog *prog) { int ret; ret = security_bpf_prog(prog); if (ret < 0) return ret; return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, O_RDWR | O_CLOEXEC); } static struct bpf_prog *____bpf_prog_get(struct fd f) { if (!f.file) return ERR_PTR(-EBADF); if (f.file->f_op != &bpf_prog_fops) { fdput(f); return ERR_PTR(-EINVAL); } return f.file->private_data; } void bpf_prog_add(struct bpf_prog *prog, int i) { atomic64_add(i, &prog->aux->refcnt); } EXPORT_SYMBOL_GPL(bpf_prog_add); void bpf_prog_sub(struct bpf_prog *prog, int i) { /* Only to be used for undoing previous bpf_prog_add() in some * error path. We still know that another entity in our call * path holds a reference to the program, thus atomic_sub() can * be safely used in such cases! */ WARN_ON(atomic64_sub_return(i, &prog->aux->refcnt) == 0); } EXPORT_SYMBOL_GPL(bpf_prog_sub); void bpf_prog_inc(struct bpf_prog *prog) { atomic64_inc(&prog->aux->refcnt); } EXPORT_SYMBOL_GPL(bpf_prog_inc); /* prog_idr_lock should have been held */ struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog) { int refold; refold = atomic64_fetch_add_unless(&prog->aux->refcnt, 1, 0); if (!refold) return ERR_PTR(-ENOENT); return prog; } EXPORT_SYMBOL_GPL(bpf_prog_inc_not_zero); bool bpf_prog_get_ok(struct bpf_prog *prog, enum bpf_prog_type *attach_type, bool attach_drv) { /* not an attachment, just a refcount inc, always allow */ if (!attach_type) return true; if (prog->type != *attach_type) return false; if (bpf_prog_is_offloaded(prog->aux) && !attach_drv) return false; return true; } static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type, bool attach_drv) { struct fd f = fdget(ufd); struct bpf_prog *prog; prog = ____bpf_prog_get(f); if (IS_ERR(prog)) return prog; if (!bpf_prog_get_ok(prog, attach_type, attach_drv)) { prog = ERR_PTR(-EINVAL); goto out; } bpf_prog_inc(prog); out: fdput(f); return prog; } struct bpf_prog *bpf_prog_get(u32 ufd) { return __bpf_prog_get(ufd, NULL, false); } struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type, bool attach_drv) { return __bpf_prog_get(ufd, &type, attach_drv); } EXPORT_SYMBOL_GPL(bpf_prog_get_type_dev); /* Initially all BPF programs could be loaded w/o specifying * expected_attach_type. Later for some of them specifying expected_attach_type * at load time became required so that program could be validated properly. * Programs of types that are allowed to be loaded both w/ and w/o (for * backward compatibility) expected_attach_type, should have the default attach * type assigned to expected_attach_type for the latter case, so that it can be * validated later at attach time. 
* * bpf_prog_load_fixup_attach_type() sets expected_attach_type in @attr if * prog type requires it but has some attach types that have to be backward * compatible. */ static void bpf_prog_load_fixup_attach_type(union bpf_attr *attr) { switch (attr->prog_type) { case BPF_PROG_TYPE_CGROUP_SOCK: /* Unfortunately BPF_ATTACH_TYPE_UNSPEC enumeration doesn't * exist so checking for non-zero is the way to go here. */ if (!attr->expected_attach_type) attr->expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE; break; case BPF_PROG_TYPE_SK_REUSEPORT: if (!attr->expected_attach_type) attr->expected_attach_type = BPF_SK_REUSEPORT_SELECT; break; } } static int bpf_prog_load_check_attach(enum bpf_prog_type prog_type, enum bpf_attach_type expected_attach_type, struct btf *attach_btf, u32 btf_id, struct bpf_prog *dst_prog) { if (btf_id) { if (btf_id > BTF_MAX_TYPE) return -EINVAL; if (!attach_btf && !dst_prog) return -EINVAL; switch (prog_type) { case BPF_PROG_TYPE_TRACING: case BPF_PROG_TYPE_LSM: case BPF_PROG_TYPE_STRUCT_OPS: case BPF_PROG_TYPE_EXT: break; default: return -EINVAL; } } if (attach_btf && (!btf_id || dst_prog)) return -EINVAL; if (dst_prog && prog_type != BPF_PROG_TYPE_TRACING && prog_type != BPF_PROG_TYPE_EXT) return -EINVAL; switch (prog_type) { case BPF_PROG_TYPE_CGROUP_SOCK: switch (expected_attach_type) { case BPF_CGROUP_INET_SOCK_CREATE: case BPF_CGROUP_INET_SOCK_RELEASE: case BPF_CGROUP_INET4_POST_BIND: case BPF_CGROUP_INET6_POST_BIND: return 0; default: return -EINVAL; } case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: switch (expected_attach_type) { case BPF_CGROUP_INET4_BIND: case BPF_CGROUP_INET6_BIND: case BPF_CGROUP_INET4_CONNECT: case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_UNIX_CONNECT: case BPF_CGROUP_INET4_GETPEERNAME: case BPF_CGROUP_INET6_GETPEERNAME: case BPF_CGROUP_UNIX_GETPEERNAME: case BPF_CGROUP_INET4_GETSOCKNAME: case BPF_CGROUP_INET6_GETSOCKNAME: case BPF_CGROUP_UNIX_GETSOCKNAME: case BPF_CGROUP_UDP4_SENDMSG: case BPF_CGROUP_UDP6_SENDMSG: case BPF_CGROUP_UNIX_SENDMSG: case BPF_CGROUP_UDP4_RECVMSG: case BPF_CGROUP_UDP6_RECVMSG: case BPF_CGROUP_UNIX_RECVMSG: return 0; default: return -EINVAL; } case BPF_PROG_TYPE_CGROUP_SKB: switch (expected_attach_type) { case BPF_CGROUP_INET_INGRESS: case BPF_CGROUP_INET_EGRESS: return 0; default: return -EINVAL; } case BPF_PROG_TYPE_CGROUP_SOCKOPT: switch (expected_attach_type) { case BPF_CGROUP_SETSOCKOPT: case BPF_CGROUP_GETSOCKOPT: return 0; default: return -EINVAL; } case BPF_PROG_TYPE_SK_LOOKUP: if (expected_attach_type == BPF_SK_LOOKUP) return 0; return -EINVAL; case BPF_PROG_TYPE_SK_REUSEPORT: switch (expected_attach_type) { case BPF_SK_REUSEPORT_SELECT: case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: return 0; default: return -EINVAL; } case BPF_PROG_TYPE_NETFILTER: if (expected_attach_type == BPF_NETFILTER) return 0; return -EINVAL; case BPF_PROG_TYPE_SYSCALL: case BPF_PROG_TYPE_EXT: if (expected_attach_type) return -EINVAL; fallthrough; default: return 0; } } static bool is_net_admin_prog_type(enum bpf_prog_type prog_type) { switch (prog_type) { case BPF_PROG_TYPE_SCHED_CLS: case BPF_PROG_TYPE_SCHED_ACT: case BPF_PROG_TYPE_XDP: case BPF_PROG_TYPE_LWT_IN: case BPF_PROG_TYPE_LWT_OUT: case BPF_PROG_TYPE_LWT_XMIT: case BPF_PROG_TYPE_LWT_SEG6LOCAL: case BPF_PROG_TYPE_SK_SKB: case BPF_PROG_TYPE_SK_MSG: case BPF_PROG_TYPE_FLOW_DISSECTOR: case BPF_PROG_TYPE_CGROUP_DEVICE: case BPF_PROG_TYPE_CGROUP_SOCK: case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: case BPF_PROG_TYPE_CGROUP_SOCKOPT: case BPF_PROG_TYPE_CGROUP_SYSCTL: case BPF_PROG_TYPE_SOCK_OPS: case 
BPF_PROG_TYPE_EXT: /* extends any prog */ case BPF_PROG_TYPE_NETFILTER: return true; case BPF_PROG_TYPE_CGROUP_SKB: /* always unpriv */ case BPF_PROG_TYPE_SK_REUSEPORT: /* equivalent to SOCKET_FILTER. need CAP_BPF only */ default: return false; } } static bool is_perfmon_prog_type(enum bpf_prog_type prog_type) { switch (prog_type) { case BPF_PROG_TYPE_KPROBE: case BPF_PROG_TYPE_TRACEPOINT: case BPF_PROG_TYPE_PERF_EVENT: case BPF_PROG_TYPE_RAW_TRACEPOINT: case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE: case BPF_PROG_TYPE_TRACING: case BPF_PROG_TYPE_LSM: case BPF_PROG_TYPE_STRUCT_OPS: /* has access to struct sock */ case BPF_PROG_TYPE_EXT: /* extends any prog */ return true; default: return false; } } /* last field in 'union bpf_attr' used by this command */ #define BPF_PROG_LOAD_LAST_FIELD prog_token_fd static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size) { enum bpf_prog_type type = attr->prog_type; struct bpf_prog *prog, *dst_prog = NULL; struct btf *attach_btf = NULL; struct bpf_token *token = NULL; bool bpf_cap; int err; char license[128]; if (CHECK_ATTR(BPF_PROG_LOAD)) return -EINVAL; if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT | BPF_F_TEST_STATE_FREQ | BPF_F_SLEEPABLE | BPF_F_TEST_RND_HI32 | BPF_F_XDP_HAS_FRAGS | BPF_F_XDP_DEV_BOUND_ONLY | BPF_F_TEST_REG_INVARIANTS | BPF_F_TOKEN_FD)) return -EINVAL; bpf_prog_load_fixup_attach_type(attr); if (attr->prog_flags & BPF_F_TOKEN_FD) { token = bpf_token_get_from_fd(attr->prog_token_fd); if (IS_ERR(token)) return PTR_ERR(token); /* if current token doesn't grant prog loading permissions, * then we can't use this token, so ignore it and rely on * system-wide capabilities checks */ if (!bpf_token_allow_cmd(token, BPF_PROG_LOAD) || !bpf_token_allow_prog_type(token, attr->prog_type, attr->expected_attach_type)) { bpf_token_put(token); token = NULL; } } bpf_cap = bpf_token_capable(token, CAP_BPF); err = -EPERM; if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && (attr->prog_flags & BPF_F_ANY_ALIGNMENT) && !bpf_cap) goto put_token; /* Intent here is for unprivileged_bpf_disabled to block BPF program * creation for unprivileged users; other actions depend * on fd availability and access to bpffs, so are dependent on * object creation success. Even with unprivileged BPF disabled, * capability checks are still carried out for these * and other operations. */ if (sysctl_unprivileged_bpf_disabled && !bpf_cap) goto put_token; if (attr->insn_cnt == 0 || attr->insn_cnt > (bpf_cap ? 
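/*
 * Illustrative sketch, not part of this file: the gating above (and the
 * SOCKET_FILTER/CGROUP_SKB exception just below) guards BPF_PROG_LOAD.  The
 * attribute layout the rest of bpf_prog_load() consumes is filled by user
 * space roughly like this -- a trivial "return 0" socket filter; names are
 * made up and error handling is trimmed:
 */
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

static int load_trivial_prog(void)
{
	/* r0 = 0; exit; -- the smallest program the verifier will accept */
	struct bpf_insn insns[] = {
		{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insns = (__u64)(unsigned long)insns;
	attr.insn_cnt = sizeof(insns) / sizeof(insns[0]);
	attr.license = (__u64)(unsigned long)"GPL";	/* feeds license_is_gpl_compatible() */

	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));	/* returns a prog fd */
}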
BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS)) { err = -E2BIG; goto put_token; } if (type != BPF_PROG_TYPE_SOCKET_FILTER && type != BPF_PROG_TYPE_CGROUP_SKB && !bpf_cap) goto put_token; if (is_net_admin_prog_type(type) && !bpf_token_capable(token, CAP_NET_ADMIN)) goto put_token; if (is_perfmon_prog_type(type) && !bpf_token_capable(token, CAP_PERFMON)) goto put_token; /* attach_prog_fd/attach_btf_obj_fd can specify fd of either bpf_prog * or btf, we need to check which one it is */ if (attr->attach_prog_fd) { dst_prog = bpf_prog_get(attr->attach_prog_fd); if (IS_ERR(dst_prog)) { dst_prog = NULL; attach_btf = btf_get_by_fd(attr->attach_btf_obj_fd); if (IS_ERR(attach_btf)) { err = -EINVAL; goto put_token; } if (!btf_is_kernel(attach_btf)) { /* attaching through specifying bpf_prog's BTF * objects directly might be supported eventually */ btf_put(attach_btf); err = -ENOTSUPP; goto put_token; } } } else if (attr->attach_btf_id) { /* fall back to vmlinux BTF, if BTF type ID is specified */ attach_btf = bpf_get_btf_vmlinux(); if (IS_ERR(attach_btf)) { err = PTR_ERR(attach_btf); goto put_token; } if (!attach_btf) { err = -EINVAL; goto put_token; } btf_get(attach_btf); } if (bpf_prog_load_check_attach(type, attr->expected_attach_type, attach_btf, attr->attach_btf_id, dst_prog)) { if (dst_prog) bpf_prog_put(dst_prog); if (attach_btf) btf_put(attach_btf); err = -EINVAL; goto put_token; } /* plain bpf_prog allocation */ prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER); if (!prog) { if (dst_prog) bpf_prog_put(dst_prog); if (attach_btf) btf_put(attach_btf); err = -EINVAL; goto put_token; } prog->expected_attach_type = attr->expected_attach_type; prog->sleepable = !!(attr->prog_flags & BPF_F_SLEEPABLE); prog->aux->attach_btf = attach_btf; prog->aux->attach_btf_id = attr->attach_btf_id; prog->aux->dst_prog = dst_prog; prog->aux->dev_bound = !!attr->prog_ifindex; prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS; /* move token into prog->aux, reuse taken refcnt */ prog->aux->token = token; token = NULL; prog->aux->user = get_current_user(); prog->len = attr->insn_cnt; err = -EFAULT; if (copy_from_bpfptr(prog->insns, make_bpfptr(attr->insns, uattr.is_kernel), bpf_prog_insn_size(prog)) != 0) goto free_prog; /* copy eBPF program license from user space */ if (strncpy_from_bpfptr(license, make_bpfptr(attr->license, uattr.is_kernel), sizeof(license) - 1) < 0) goto free_prog; license[sizeof(license) - 1] = 0; /* eBPF programs must be GPL compatible to use GPL-ed functions */ prog->gpl_compatible = license_is_gpl_compatible(license) ? 1 : 0; prog->orig_prog = NULL; prog->jited = 0; atomic64_set(&prog->aux->refcnt, 1); if (bpf_prog_is_dev_bound(prog->aux)) { err = bpf_prog_dev_bound_init(prog, attr); if (err) goto free_prog; } if (type == BPF_PROG_TYPE_EXT && dst_prog && bpf_prog_is_dev_bound(dst_prog->aux)) { err = bpf_prog_dev_bound_inherit(prog, dst_prog); if (err) goto free_prog; } /* * Bookkeeping for managing the program attachment chain. * * It might be tempting to set attach_tracing_prog flag at the attachment * time, but this will not prevent from loading bunch of tracing prog * first, then attach them one to another. * * The flag attach_tracing_prog is set for the whole program lifecycle, and * doesn't have to be cleared in bpf_tracing_link_release, since tracing * programs cannot change attachment target. 
*/ if (type == BPF_PROG_TYPE_TRACING && dst_prog && dst_prog->type == BPF_PROG_TYPE_TRACING) { prog->aux->attach_tracing_prog = true; } /* find program type: socket_filter vs tracing_filter */ err = find_prog_type(type, prog); if (err < 0) goto free_prog; prog->aux->load_time = ktime_get_boottime_ns(); err = bpf_obj_name_cpy(prog->aux->name, attr->prog_name, sizeof(attr->prog_name)); if (err < 0) goto free_prog; err = security_bpf_prog_load(prog, attr, token); if (err) goto free_prog_sec; /* run eBPF verifier */ err = bpf_check(&prog, attr, uattr, uattr_size); if (err < 0) goto free_used_maps; prog = bpf_prog_select_runtime(prog, &err); if (err < 0) goto free_used_maps; err = bpf_prog_alloc_id(prog); if (err) goto free_used_maps; /* Upon success of bpf_prog_alloc_id(), the BPF prog is * effectively publicly exposed. However, retrieving via * bpf_prog_get_fd_by_id() will take another reference, * therefore it cannot be gone underneath us. * * Only for the time /after/ successful bpf_prog_new_fd() * and before returning to userspace, we might just hold * one reference and any parallel close on that fd could * rip everything out. Hence, below notifications must * happen before bpf_prog_new_fd(). * * Also, any failure handling from this point onwards must * be using bpf_prog_put() given the program is exposed. */ bpf_prog_kallsyms_add(prog); perf_event_bpf_event(prog, PERF_BPF_EVENT_PROG_LOAD, 0); bpf_audit_prog(prog, BPF_AUDIT_LOAD); err = bpf_prog_new_fd(prog); if (err < 0) bpf_prog_put(prog); return err; free_used_maps: /* In case we have subprogs, we need to wait for a grace * period before we can tear down JIT memory since symbols * are already exposed under kallsyms. */ __bpf_prog_put_noref(prog, prog->aux->real_func_cnt); return err; free_prog_sec: security_bpf_prog_free(prog); free_prog: free_uid(prog->aux->user); if (prog->aux->attach_btf) btf_put(prog->aux->attach_btf); bpf_prog_free(prog); put_token: bpf_token_put(token); return err; } #define BPF_OBJ_LAST_FIELD path_fd static int bpf_obj_pin(const union bpf_attr *attr) { int path_fd; if (CHECK_ATTR(BPF_OBJ) || attr->file_flags & ~BPF_F_PATH_FD) return -EINVAL; /* path_fd has to be accompanied by BPF_F_PATH_FD flag */ if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd) return -EINVAL; path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD; return bpf_obj_pin_user(attr->bpf_fd, path_fd, u64_to_user_ptr(attr->pathname)); } static int bpf_obj_get(const union bpf_attr *attr) { int path_fd; if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 || attr->file_flags & ~(BPF_OBJ_FLAG_MASK | BPF_F_PATH_FD)) return -EINVAL; /* path_fd has to be accompanied by BPF_F_PATH_FD flag */ if (!(attr->file_flags & BPF_F_PATH_FD) && attr->path_fd) return -EINVAL; path_fd = attr->file_flags & BPF_F_PATH_FD ? attr->path_fd : AT_FDCWD; return bpf_obj_get_user(path_fd, u64_to_user_ptr(attr->pathname), attr->file_flags); } void bpf_link_init(struct bpf_link *link, enum bpf_link_type type, const struct bpf_link_ops *ops, struct bpf_prog *prog) { WARN_ON(ops->dealloc && ops->dealloc_deferred); atomic64_set(&link->refcnt, 1); link->type = type; link->id = 0; link->ops = ops; link->prog = prog; } static void bpf_link_free_id(int id) { if (!id) return; spin_lock_bh(&link_idr_lock); idr_remove(&link_idr, id); spin_unlock_bh(&link_idr_lock); } /* Clean up bpf_link and corresponding anon_inode file and FD. After * anon_inode is created, bpf_link can't be just kfree()'d due to deferred * anon_inode's release() call. 
This helper marks bpf_link as
 * defunct, releases anon_inode file and puts reserved FD. bpf_prog's refcnt
 * is not decremented, it's the responsibility of a calling code that failed
 * to complete bpf_link initialization.
 * This helper eventually calls link's dealloc callback, but does not call
 * link's release callback.
 */
void bpf_link_cleanup(struct bpf_link_primer *primer)
{
	primer->link->prog = NULL;
	bpf_link_free_id(primer->id);
	fput(primer->file);
	put_unused_fd(primer->fd);
}

void bpf_link_inc(struct bpf_link *link)
{
	atomic64_inc(&link->refcnt);
}

static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
{
	struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);

	/* free bpf_link and its containing memory */
	link->ops->dealloc_deferred(link);
}

static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
{
	if (rcu_trace_implies_rcu_gp())
		bpf_link_defer_dealloc_rcu_gp(rcu);
	else
		call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
}

/* bpf_link_free is guaranteed to be called from process context */
static void bpf_link_free(struct bpf_link *link)
{
	const struct bpf_link_ops *ops = link->ops;
	bool sleepable = false;

	bpf_link_free_id(link->id);
	if (link->prog) {
		sleepable = link->prog->sleepable;
		/* detach BPF program, clean up used resources */
		ops->release(link);
		bpf_prog_put(link->prog);
	}
	if (ops->dealloc_deferred) {
		/* schedule BPF link deallocation; if underlying BPF program
		 * is sleepable, we need to first wait for RCU tasks trace
		 * sync, then go through "classic" RCU grace period
		 */
		if (sleepable)
			call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
		else
			call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
	} else if (ops->dealloc)
		ops->dealloc(link);
}

static void bpf_link_put_deferred(struct work_struct *work)
{
	struct bpf_link *link = container_of(work, struct bpf_link, work);

	bpf_link_free(link);
}

/* bpf_link_put might be called from atomic context. It needs to be called
 * from sleepable context in order to acquire sleeping locks during the process.
*/ void bpf_link_put(struct bpf_link *link) { if (!atomic64_dec_and_test(&link->refcnt)) return; INIT_WORK(&link->work, bpf_link_put_deferred); schedule_work(&link->work); } EXPORT_SYMBOL(bpf_link_put); static void bpf_link_put_direct(struct bpf_link *link) { if (!atomic64_dec_and_test(&link->refcnt)) return; bpf_link_free(link); } static int bpf_link_release(struct inode *inode, struct file *filp) { struct bpf_link *link = filp->private_data; bpf_link_put_direct(link); return 0; } #ifdef CONFIG_PROC_FS #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) #define BPF_MAP_TYPE(_id, _ops) #define BPF_LINK_TYPE(_id, _name) [_id] = #_name, static const char *bpf_link_type_strs[] = { [BPF_LINK_TYPE_UNSPEC] = "<invalid>", #include <linux/bpf_types.h> }; #undef BPF_PROG_TYPE #undef BPF_MAP_TYPE #undef BPF_LINK_TYPE static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp) { const struct bpf_link *link = filp->private_data; const struct bpf_prog *prog = link->prog; char prog_tag[sizeof(prog->tag) * 2 + 1] = { }; seq_printf(m, "link_type:\t%s\n" "link_id:\t%u\n", bpf_link_type_strs[link->type], link->id); if (prog) { bin2hex(prog_tag, prog->tag, sizeof(prog->tag)); seq_printf(m, "prog_tag:\t%s\n" "prog_id:\t%u\n", prog_tag, prog->aux->id); } if (link->ops->show_fdinfo) link->ops->show_fdinfo(link, m); } #endif static __poll_t bpf_link_poll(struct file *file, struct poll_table_struct *pts) { struct bpf_link *link = file->private_data; return link->ops->poll(file, pts); } static const struct file_operations bpf_link_fops = { #ifdef CONFIG_PROC_FS .show_fdinfo = bpf_link_show_fdinfo, #endif .release = bpf_link_release, .read = bpf_dummy_read, .write = bpf_dummy_write, }; static const struct file_operations bpf_link_fops_poll = { #ifdef CONFIG_PROC_FS .show_fdinfo = bpf_link_show_fdinfo, #endif .release = bpf_link_release, .read = bpf_dummy_read, .write = bpf_dummy_write, .poll = bpf_link_poll, }; static int bpf_link_alloc_id(struct bpf_link *link) { int id; idr_preload(GFP_KERNEL); spin_lock_bh(&link_idr_lock); id = idr_alloc_cyclic(&link_idr, link, 1, INT_MAX, GFP_ATOMIC); spin_unlock_bh(&link_idr_lock); idr_preload_end(); return id; } /* Prepare bpf_link to be exposed to user-space by allocating anon_inode file, * reserving unused FD and allocating ID from link_idr. This is to be paired * with bpf_link_settle() to install FD and ID and expose bpf_link to * user-space, if bpf_link is successfully attached. If not, bpf_link and * pre-allocated resources are to be freed with bpf_cleanup() call. All the * transient state is passed around in struct bpf_link_primer. * This is preferred way to create and initialize bpf_link, especially when * there are complicated and expensive operations in between creating bpf_link * itself and attaching it to BPF hook. By using bpf_link_prime() and * bpf_link_settle() kernel code using bpf_link doesn't have to perform * expensive (and potentially failing) roll back operations in a rare case * that file, FD, or ID can't be allocated. */ int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer) { struct file *file; int fd, id; fd = get_unused_fd_flags(O_CLOEXEC); if (fd < 0) return fd; id = bpf_link_alloc_id(link); if (id < 0) { put_unused_fd(fd); return id; } file = anon_inode_getfile("bpf_link", link->ops->poll ? 
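/*
 * Illustrative sketch, not part of this file: the comment above describes the
 * intended calling convention for bpf_link_prime()/bpf_link_settle().  A
 * hypothetical attach path (modeled on bpf_raw_tp_link_attach() further down;
 * every foo_* name is invented, and foo_link_lops/foo_hook_register are
 * assumed to exist elsewhere) would use it like this:
 */
struct foo_link {			/* hypothetical link type */
	struct bpf_link link;
	/* hook-specific state would go here */
};

static int foo_link_attach(struct bpf_prog *prog)
{
	struct bpf_link_primer link_primer;
	struct foo_link *link;
	int err;

	link = kzalloc(sizeof(*link), GFP_USER);
	if (!link)
		return -ENOMEM;
	bpf_link_init(&link->link, BPF_LINK_TYPE_UNSPEC /* a real type in real code */,
		      &foo_link_lops, prog);

	err = bpf_link_prime(&link->link, &link_primer);
	if (err) {
		kfree(link);		/* not yet primed: a plain kfree() is still fine */
		return err;
	}

	err = foo_hook_register(link);	/* the expensive, failure-prone attach step */
	if (err) {
		bpf_link_cleanup(&link_primer);	/* releases fd/id/file; dealloc runs later */
		return err;
	}

	return bpf_link_settle(&link_primer);	/* publish: install fd, expose id */
}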
&bpf_link_fops_poll : &bpf_link_fops, link, O_CLOEXEC); if (IS_ERR(file)) { bpf_link_free_id(id); put_unused_fd(fd); return PTR_ERR(file); } primer->link = link; primer->file = file; primer->fd = fd; primer->id = id; return 0; } int bpf_link_settle(struct bpf_link_primer *primer) { /* make bpf_link fetchable by ID */ spin_lock_bh(&link_idr_lock); primer->link->id = primer->id; spin_unlock_bh(&link_idr_lock); /* make bpf_link fetchable by FD */ fd_install(primer->fd, primer->file); /* pass through installed FD */ return primer->fd; } int bpf_link_new_fd(struct bpf_link *link) { return anon_inode_getfd("bpf-link", link->ops->poll ? &bpf_link_fops_poll : &bpf_link_fops, link, O_CLOEXEC); } struct bpf_link *bpf_link_get_from_fd(u32 ufd) { struct fd f = fdget(ufd); struct bpf_link *link; if (!f.file) return ERR_PTR(-EBADF); if (f.file->f_op != &bpf_link_fops && f.file->f_op != &bpf_link_fops_poll) { fdput(f); return ERR_PTR(-EINVAL); } link = f.file->private_data; bpf_link_inc(link); fdput(f); return link; } EXPORT_SYMBOL(bpf_link_get_from_fd); static void bpf_tracing_link_release(struct bpf_link *link) { struct bpf_tracing_link *tr_link = container_of(link, struct bpf_tracing_link, link.link); WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link, tr_link->trampoline)); bpf_trampoline_put(tr_link->trampoline); /* tgt_prog is NULL if target is a kernel function */ if (tr_link->tgt_prog) bpf_prog_put(tr_link->tgt_prog); } static void bpf_tracing_link_dealloc(struct bpf_link *link) { struct bpf_tracing_link *tr_link = container_of(link, struct bpf_tracing_link, link.link); kfree(tr_link); } static void bpf_tracing_link_show_fdinfo(const struct bpf_link *link, struct seq_file *seq) { struct bpf_tracing_link *tr_link = container_of(link, struct bpf_tracing_link, link.link); u32 target_btf_id, target_obj_id; bpf_trampoline_unpack_key(tr_link->trampoline->key, &target_obj_id, &target_btf_id); seq_printf(seq, "attach_type:\t%d\n" "target_obj_id:\t%u\n" "target_btf_id:\t%u\n", tr_link->attach_type, target_obj_id, target_btf_id); } static int bpf_tracing_link_fill_link_info(const struct bpf_link *link, struct bpf_link_info *info) { struct bpf_tracing_link *tr_link = container_of(link, struct bpf_tracing_link, link.link); info->tracing.attach_type = tr_link->attach_type; bpf_trampoline_unpack_key(tr_link->trampoline->key, &info->tracing.target_obj_id, &info->tracing.target_btf_id); return 0; } static const struct bpf_link_ops bpf_tracing_link_lops = { .release = bpf_tracing_link_release, .dealloc = bpf_tracing_link_dealloc, .show_fdinfo = bpf_tracing_link_show_fdinfo, .fill_link_info = bpf_tracing_link_fill_link_info, }; static int bpf_tracing_prog_attach(struct bpf_prog *prog, int tgt_prog_fd, u32 btf_id, u64 bpf_cookie) { struct bpf_link_primer link_primer; struct bpf_prog *tgt_prog = NULL; struct bpf_trampoline *tr = NULL; struct bpf_tracing_link *link; u64 key = 0; int err; switch (prog->type) { case BPF_PROG_TYPE_TRACING: if (prog->expected_attach_type != BPF_TRACE_FENTRY && prog->expected_attach_type != BPF_TRACE_FEXIT && prog->expected_attach_type != BPF_MODIFY_RETURN) { err = -EINVAL; goto out_put_prog; } break; case BPF_PROG_TYPE_EXT: if (prog->expected_attach_type != 0) { err = -EINVAL; goto out_put_prog; } break; case BPF_PROG_TYPE_LSM: if (prog->expected_attach_type != BPF_LSM_MAC) { err = -EINVAL; goto out_put_prog; } break; default: err = -EINVAL; goto out_put_prog; } if (!!tgt_prog_fd != !!btf_id) { err = -EINVAL; goto out_put_prog; } if (tgt_prog_fd) { /* * For now we only allow new 
targets for BPF_PROG_TYPE_EXT. If this * part would be changed to implement the same for * BPF_PROG_TYPE_TRACING, do not forget to update the way how * attach_tracing_prog flag is set. */ if (prog->type != BPF_PROG_TYPE_EXT) { err = -EINVAL; goto out_put_prog; } tgt_prog = bpf_prog_get(tgt_prog_fd); if (IS_ERR(tgt_prog)) { err = PTR_ERR(tgt_prog); tgt_prog = NULL; goto out_put_prog; } key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id); } link = kzalloc(sizeof(*link), GFP_USER); if (!link) { err = -ENOMEM; goto out_put_prog; } bpf_link_init(&link->link.link, BPF_LINK_TYPE_TRACING, &bpf_tracing_link_lops, prog); link->attach_type = prog->expected_attach_type; link->link.cookie = bpf_cookie; mutex_lock(&prog->aux->dst_mutex); /* There are a few possible cases here: * * - if prog->aux->dst_trampoline is set, the program was just loaded * and not yet attached to anything, so we can use the values stored * in prog->aux * * - if prog->aux->dst_trampoline is NULL, the program has already been * attached to a target and its initial target was cleared (below) * * - if tgt_prog != NULL, the caller specified tgt_prog_fd + * target_btf_id using the link_create API. * * - if tgt_prog == NULL when this function was called using the old * raw_tracepoint_open API, and we need a target from prog->aux * * - if prog->aux->dst_trampoline and tgt_prog is NULL, the program * was detached and is going for re-attachment. * * - if prog->aux->dst_trampoline is NULL and tgt_prog and prog->aux->attach_btf * are NULL, then program was already attached and user did not provide * tgt_prog_fd so we have no way to find out or create trampoline */ if (!prog->aux->dst_trampoline && !tgt_prog) { /* * Allow re-attach for TRACING and LSM programs. If it's * currently linked, bpf_trampoline_link_prog will fail. * EXT programs need to specify tgt_prog_fd, so they * re-attach in separate code path. */ if (prog->type != BPF_PROG_TYPE_TRACING && prog->type != BPF_PROG_TYPE_LSM) { err = -EINVAL; goto out_unlock; } /* We can allow re-attach only if we have valid attach_btf. */ if (!prog->aux->attach_btf) { err = -EINVAL; goto out_unlock; } btf_id = prog->aux->attach_btf_id; key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf, btf_id); } if (!prog->aux->dst_trampoline || (key && key != prog->aux->dst_trampoline->key)) { /* If there is no saved target, or the specified target is * different from the destination specified at load time, we * need a new trampoline and a check for compatibility */ struct bpf_attach_target_info tgt_info = {}; err = bpf_check_attach_target(NULL, prog, tgt_prog, btf_id, &tgt_info); if (err) goto out_unlock; if (tgt_info.tgt_mod) { module_put(prog->aux->mod); prog->aux->mod = tgt_info.tgt_mod; } tr = bpf_trampoline_get(key, &tgt_info); if (!tr) { err = -ENOMEM; goto out_unlock; } } else { /* The caller didn't specify a target, or the target was the * same as the destination supplied during program load. This * means we can reuse the trampoline and reference from program * load time, and there is no need to allocate a new one. This * can only happen once for any program, as the saved values in * prog->aux are cleared below. 
*/ tr = prog->aux->dst_trampoline; tgt_prog = prog->aux->dst_prog; } err = bpf_link_prime(&link->link.link, &link_primer); if (err) goto out_unlock; err = bpf_trampoline_link_prog(&link->link, tr); if (err) { bpf_link_cleanup(&link_primer); link = NULL; goto out_unlock; } link->tgt_prog = tgt_prog; link->trampoline = tr; /* Always clear the trampoline and target prog from prog->aux to make * sure the original attach destination is not kept alive after a * program is (re-)attached to another target. */ if (prog->aux->dst_prog && (tgt_prog_fd || tr != prog->aux->dst_trampoline)) /* got extra prog ref from syscall, or attaching to different prog */ bpf_prog_put(prog->aux->dst_prog); if (prog->aux->dst_trampoline && tr != prog->aux->dst_trampoline) /* we allocated a new trampoline, so free the old one */ bpf_trampoline_put(prog->aux->dst_trampoline); prog->aux->dst_prog = NULL; prog->aux->dst_trampoline = NULL; mutex_unlock(&prog->aux->dst_mutex); return bpf_link_settle(&link_primer); out_unlock: if (tr && tr != prog->aux->dst_trampoline) bpf_trampoline_put(tr); mutex_unlock(&prog->aux->dst_mutex); kfree(link); out_put_prog: if (tgt_prog_fd && tgt_prog) bpf_prog_put(tgt_prog); return err; } static void bpf_raw_tp_link_release(struct bpf_link *link) { struct bpf_raw_tp_link *raw_tp = container_of(link, struct bpf_raw_tp_link, link); bpf_probe_unregister(raw_tp->btp, raw_tp); bpf_put_raw_tracepoint(raw_tp->btp); } static void bpf_raw_tp_link_dealloc(struct bpf_link *link) { struct bpf_raw_tp_link *raw_tp = container_of(link, struct bpf_raw_tp_link, link); kfree(raw_tp); } static void bpf_raw_tp_link_show_fdinfo(const struct bpf_link *link, struct seq_file *seq) { struct bpf_raw_tp_link *raw_tp_link = container_of(link, struct bpf_raw_tp_link, link); seq_printf(seq, "tp_name:\t%s\n", raw_tp_link->btp->tp->name); } static int bpf_copy_to_user(char __user *ubuf, const char *buf, u32 ulen, u32 len) { if (ulen >= len + 1) { if (copy_to_user(ubuf, buf, len + 1)) return -EFAULT; } else { char zero = '\0'; if (copy_to_user(ubuf, buf, ulen - 1)) return -EFAULT; if (put_user(zero, ubuf + ulen - 1)) return -EFAULT; return -ENOSPC; } return 0; } static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link, struct bpf_link_info *info) { struct bpf_raw_tp_link *raw_tp_link = container_of(link, struct bpf_raw_tp_link, link); char __user *ubuf = u64_to_user_ptr(info->raw_tracepoint.tp_name); const char *tp_name = raw_tp_link->btp->tp->name; u32 ulen = info->raw_tracepoint.tp_name_len; size_t tp_len = strlen(tp_name); if (!ulen ^ !ubuf) return -EINVAL; info->raw_tracepoint.tp_name_len = tp_len + 1; if (!ubuf) return 0; return bpf_copy_to_user(ubuf, tp_name, ulen, tp_len); } static const struct bpf_link_ops bpf_raw_tp_link_lops = { .release = bpf_raw_tp_link_release, .dealloc_deferred = bpf_raw_tp_link_dealloc, .show_fdinfo = bpf_raw_tp_link_show_fdinfo, .fill_link_info = bpf_raw_tp_link_fill_link_info, }; #ifdef CONFIG_PERF_EVENTS struct bpf_perf_link { struct bpf_link link; struct file *perf_file; }; static void bpf_perf_link_release(struct bpf_link *link) { struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link); struct perf_event *event = perf_link->perf_file->private_data; perf_event_free_bpf_prog(event); fput(perf_link->perf_file); } static void bpf_perf_link_dealloc(struct bpf_link *link) { struct bpf_perf_link *perf_link = container_of(link, struct bpf_perf_link, link); kfree(perf_link); } static int bpf_perf_link_fill_common(const struct perf_event *event, char __user 
*uname, u32 ulen, u64 *probe_offset, u64 *probe_addr, u32 *fd_type, unsigned long *missed) { const char *buf; u32 prog_id; size_t len; int err; if (!ulen ^ !uname) return -EINVAL; err = bpf_get_perf_event_info(event, &prog_id, fd_type, &buf, probe_offset, probe_addr, missed); if (err) return err; if (!uname) return 0; if (buf) { len = strlen(buf); err = bpf_copy_to_user(uname, buf, ulen, len); if (err) return err; } else { char zero = '\0'; if (put_user(zero, uname)) return -EFAULT; } return 0; } #ifdef CONFIG_KPROBE_EVENTS static int bpf_perf_link_fill_kprobe(const struct perf_event *event, struct bpf_link_info *info) { unsigned long missed; char __user *uname; u64 addr, offset; u32 ulen, type; int err; uname = u64_to_user_ptr(info->perf_event.kprobe.func_name); ulen = info->perf_event.kprobe.name_len; err = bpf_perf_link_fill_common(event, uname, ulen, &offset, &addr, &type, &missed); if (err) return err; if (type == BPF_FD_TYPE_KRETPROBE) info->perf_event.type = BPF_PERF_EVENT_KRETPROBE; else info->perf_event.type = BPF_PERF_EVENT_KPROBE; info->perf_event.kprobe.offset = offset; info->perf_event.kprobe.missed = missed; if (!kallsyms_show_value(current_cred())) addr = 0; info->perf_event.kprobe.addr = addr; info->perf_event.kprobe.cookie = event->bpf_cookie; return 0; } #endif #ifdef CONFIG_UPROBE_EVENTS static int bpf_perf_link_fill_uprobe(const struct perf_event *event, struct bpf_link_info *info) { char __user *uname; u64 addr, offset; u32 ulen, type; int err; uname = u64_to_user_ptr(info->perf_event.uprobe.file_name); ulen = info->perf_event.uprobe.name_len; err = bpf_perf_link_fill_common(event, uname, ulen, &offset, &addr, &type, NULL); if (err) return err; if (type == BPF_FD_TYPE_URETPROBE) info->perf_event.type = BPF_PERF_EVENT_URETPROBE; else info->perf_event.type = BPF_PERF_EVENT_UPROBE; info->perf_event.uprobe.offset = offset; info->perf_event.uprobe.cookie = event->bpf_cookie; return 0; } #endif static int bpf_perf_link_fill_probe(const struct perf_event *event, struct bpf_link_info *info) { #ifdef CONFIG_KPROBE_EVENTS if (event->tp_event->flags & TRACE_EVENT_FL_KPROBE) return bpf_perf_link_fill_kprobe(event, info); #endif #ifdef CONFIG_UPROBE_EVENTS if (event->tp_event->flags & TRACE_EVENT_FL_UPROBE) return bpf_perf_link_fill_uprobe(event, info); #endif return -EOPNOTSUPP; } static int bpf_perf_link_fill_tracepoint(const struct perf_event *event, struct bpf_link_info *info) { char __user *uname; u32 ulen; uname = u64_to_user_ptr(info->perf_event.tracepoint.tp_name); ulen = info->perf_event.tracepoint.name_len; info->perf_event.type = BPF_PERF_EVENT_TRACEPOINT; info->perf_event.tracepoint.cookie = event->bpf_cookie; return bpf_perf_link_fill_common(event, uname, ulen, NULL, NULL, NULL, NULL); } static int bpf_perf_link_fill_perf_event(const struct perf_event *event, struct bpf_link_info *info) { info->perf_event.event.type = event->attr.type; info->perf_event.event.config = event->attr.config; info->perf_event.event.cookie = event->bpf_cookie; info->perf_event.type = BPF_PERF_EVENT_EVENT; return 0; } static int bpf_perf_link_fill_link_info(const struct bpf_link *link, struct bpf_link_info *info) { struct bpf_perf_link *perf_link; const struct perf_event *event; perf_link = container_of(link, struct bpf_perf_link, link); event = perf_get_event(perf_link->perf_file); if (IS_ERR(event)) return PTR_ERR(event); switch (event->prog->type) { case BPF_PROG_TYPE_PERF_EVENT: return bpf_perf_link_fill_perf_event(event, info); case BPF_PROG_TYPE_TRACEPOINT: return 
bpf_perf_link_fill_tracepoint(event, info); case BPF_PROG_TYPE_KPROBE: return bpf_perf_link_fill_probe(event, info); default: return -EOPNOTSUPP; } } static const struct bpf_link_ops bpf_perf_link_lops = { .release = bpf_perf_link_release, .dealloc = bpf_perf_link_dealloc, .fill_link_info = bpf_perf_link_fill_link_info, }; static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { struct bpf_link_primer link_primer; struct bpf_perf_link *link; struct perf_event *event; struct file *perf_file; int err; if (attr->link_create.flags) return -EINVAL; perf_file = perf_event_get(attr->link_create.target_fd); if (IS_ERR(perf_file)) return PTR_ERR(perf_file); link = kzalloc(sizeof(*link), GFP_USER); if (!link) { err = -ENOMEM; goto out_put_file; } bpf_link_init(&link->link, BPF_LINK_TYPE_PERF_EVENT, &bpf_perf_link_lops, prog); link->perf_file = perf_file; err = bpf_link_prime(&link->link, &link_primer); if (err) { kfree(link); goto out_put_file; } event = perf_file->private_data; err = perf_event_set_bpf_prog(event, prog, attr->link_create.perf_event.bpf_cookie); if (err) { bpf_link_cleanup(&link_primer); goto out_put_file; } /* perf_event_set_bpf_prog() doesn't take its own refcnt on prog */ bpf_prog_inc(prog); return bpf_link_settle(&link_primer); out_put_file: fput(perf_file); return err; } #else static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { return -EOPNOTSUPP; } #endif /* CONFIG_PERF_EVENTS */ static int bpf_raw_tp_link_attach(struct bpf_prog *prog, const char __user *user_tp_name, u64 cookie) { struct bpf_link_primer link_primer; struct bpf_raw_tp_link *link; struct bpf_raw_event_map *btp; const char *tp_name; char buf[128]; int err; switch (prog->type) { case BPF_PROG_TYPE_TRACING: case BPF_PROG_TYPE_EXT: case BPF_PROG_TYPE_LSM: if (user_tp_name) /* The attach point for this category of programs * should be specified via btf_id during program load. 
*/ return -EINVAL; if (prog->type == BPF_PROG_TYPE_TRACING && prog->expected_attach_type == BPF_TRACE_RAW_TP) { tp_name = prog->aux->attach_func_name; break; } return bpf_tracing_prog_attach(prog, 0, 0, 0); case BPF_PROG_TYPE_RAW_TRACEPOINT: case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE: if (strncpy_from_user(buf, user_tp_name, sizeof(buf) - 1) < 0) return -EFAULT; buf[sizeof(buf) - 1] = 0; tp_name = buf; break; default: return -EINVAL; } btp = bpf_get_raw_tracepoint(tp_name); if (!btp) return -ENOENT; link = kzalloc(sizeof(*link), GFP_USER); if (!link) { err = -ENOMEM; goto out_put_btp; } bpf_link_init(&link->link, BPF_LINK_TYPE_RAW_TRACEPOINT, &bpf_raw_tp_link_lops, prog); link->btp = btp; link->cookie = cookie; err = bpf_link_prime(&link->link, &link_primer); if (err) { kfree(link); goto out_put_btp; } err = bpf_probe_register(link->btp, link); if (err) { bpf_link_cleanup(&link_primer); goto out_put_btp; } return bpf_link_settle(&link_primer); out_put_btp: bpf_put_raw_tracepoint(btp); return err; } #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.cookie static int bpf_raw_tracepoint_open(const union bpf_attr *attr) { struct bpf_prog *prog; void __user *tp_name; __u64 cookie; int fd; if (CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN)) return -EINVAL; prog = bpf_prog_get(attr->raw_tracepoint.prog_fd); if (IS_ERR(prog)) return PTR_ERR(prog); tp_name = u64_to_user_ptr(attr->raw_tracepoint.name); cookie = attr->raw_tracepoint.cookie; fd = bpf_raw_tp_link_attach(prog, tp_name, cookie); if (fd < 0) bpf_prog_put(prog); return fd; } static enum bpf_prog_type attach_type_to_prog_type(enum bpf_attach_type attach_type) { switch (attach_type) { case BPF_CGROUP_INET_INGRESS: case BPF_CGROUP_INET_EGRESS: return BPF_PROG_TYPE_CGROUP_SKB; case BPF_CGROUP_INET_SOCK_CREATE: case BPF_CGROUP_INET_SOCK_RELEASE: case BPF_CGROUP_INET4_POST_BIND: case BPF_CGROUP_INET6_POST_BIND: return BPF_PROG_TYPE_CGROUP_SOCK; case BPF_CGROUP_INET4_BIND: case BPF_CGROUP_INET6_BIND: case BPF_CGROUP_INET4_CONNECT: case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_UNIX_CONNECT: case BPF_CGROUP_INET4_GETPEERNAME: case BPF_CGROUP_INET6_GETPEERNAME: case BPF_CGROUP_UNIX_GETPEERNAME: case BPF_CGROUP_INET4_GETSOCKNAME: case BPF_CGROUP_INET6_GETSOCKNAME: case BPF_CGROUP_UNIX_GETSOCKNAME: case BPF_CGROUP_UDP4_SENDMSG: case BPF_CGROUP_UDP6_SENDMSG: case BPF_CGROUP_UNIX_SENDMSG: case BPF_CGROUP_UDP4_RECVMSG: case BPF_CGROUP_UDP6_RECVMSG: case BPF_CGROUP_UNIX_RECVMSG: return BPF_PROG_TYPE_CGROUP_SOCK_ADDR; case BPF_CGROUP_SOCK_OPS: return BPF_PROG_TYPE_SOCK_OPS; case BPF_CGROUP_DEVICE: return BPF_PROG_TYPE_CGROUP_DEVICE; case BPF_SK_MSG_VERDICT: return BPF_PROG_TYPE_SK_MSG; case BPF_SK_SKB_STREAM_PARSER: case BPF_SK_SKB_STREAM_VERDICT: case BPF_SK_SKB_VERDICT: return BPF_PROG_TYPE_SK_SKB; case BPF_LIRC_MODE2: return BPF_PROG_TYPE_LIRC_MODE2; case BPF_FLOW_DISSECTOR: return BPF_PROG_TYPE_FLOW_DISSECTOR; case BPF_CGROUP_SYSCTL: return BPF_PROG_TYPE_CGROUP_SYSCTL; case BPF_CGROUP_GETSOCKOPT: case BPF_CGROUP_SETSOCKOPT: return BPF_PROG_TYPE_CGROUP_SOCKOPT; case BPF_TRACE_ITER: case BPF_TRACE_RAW_TP: case BPF_TRACE_FENTRY: case BPF_TRACE_FEXIT: case BPF_MODIFY_RETURN: return BPF_PROG_TYPE_TRACING; case BPF_LSM_MAC: return BPF_PROG_TYPE_LSM; case BPF_SK_LOOKUP: return BPF_PROG_TYPE_SK_LOOKUP; case BPF_XDP: return BPF_PROG_TYPE_XDP; case BPF_LSM_CGROUP: return BPF_PROG_TYPE_LSM; case BPF_TCX_INGRESS: case BPF_TCX_EGRESS: case BPF_NETKIT_PRIMARY: case BPF_NETKIT_PEER: return BPF_PROG_TYPE_SCHED_CLS; default: return BPF_PROG_TYPE_UNSPEC; } } static int 
bpf_prog_attach_check_attach_type(const struct bpf_prog *prog, enum bpf_attach_type attach_type) { enum bpf_prog_type ptype; switch (prog->type) { case BPF_PROG_TYPE_CGROUP_SOCK: case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: case BPF_PROG_TYPE_CGROUP_SOCKOPT: case BPF_PROG_TYPE_SK_LOOKUP: return attach_type == prog->expected_attach_type ? 0 : -EINVAL; case BPF_PROG_TYPE_CGROUP_SKB: if (!bpf_token_capable(prog->aux->token, CAP_NET_ADMIN)) /* cg-skb progs can be loaded by unpriv user. * check permissions at attach time. */ return -EPERM; ptype = attach_type_to_prog_type(attach_type); if (prog->type != ptype) return -EINVAL; return prog->enforce_expected_attach_type && prog->expected_attach_type != attach_type ? -EINVAL : 0; case BPF_PROG_TYPE_EXT: return 0; case BPF_PROG_TYPE_NETFILTER: if (attach_type != BPF_NETFILTER) return -EINVAL; return 0; case BPF_PROG_TYPE_PERF_EVENT: case BPF_PROG_TYPE_TRACEPOINT: if (attach_type != BPF_PERF_EVENT) return -EINVAL; return 0; case BPF_PROG_TYPE_KPROBE: if (prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI && attach_type != BPF_TRACE_KPROBE_MULTI) return -EINVAL; if (prog->expected_attach_type == BPF_TRACE_KPROBE_SESSION && attach_type != BPF_TRACE_KPROBE_SESSION) return -EINVAL; if (prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI && attach_type != BPF_TRACE_UPROBE_MULTI) return -EINVAL; if (attach_type != BPF_PERF_EVENT && attach_type != BPF_TRACE_KPROBE_MULTI && attach_type != BPF_TRACE_KPROBE_SESSION && attach_type != BPF_TRACE_UPROBE_MULTI) return -EINVAL; return 0; case BPF_PROG_TYPE_SCHED_CLS: if (attach_type != BPF_TCX_INGRESS && attach_type != BPF_TCX_EGRESS && attach_type != BPF_NETKIT_PRIMARY && attach_type != BPF_NETKIT_PEER) return -EINVAL; return 0; default: ptype = attach_type_to_prog_type(attach_type); if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) return -EINVAL; return 0; } } #define BPF_PROG_ATTACH_LAST_FIELD expected_revision #define BPF_F_ATTACH_MASK_BASE \ (BPF_F_ALLOW_OVERRIDE | \ BPF_F_ALLOW_MULTI | \ BPF_F_REPLACE) #define BPF_F_ATTACH_MASK_MPROG \ (BPF_F_REPLACE | \ BPF_F_BEFORE | \ BPF_F_AFTER | \ BPF_F_ID | \ BPF_F_LINK) static int bpf_prog_attach(const union bpf_attr *attr) { enum bpf_prog_type ptype; struct bpf_prog *prog; int ret; if (CHECK_ATTR(BPF_PROG_ATTACH)) return -EINVAL; ptype = attach_type_to_prog_type(attr->attach_type); if (ptype == BPF_PROG_TYPE_UNSPEC) return -EINVAL; if (bpf_mprog_supported(ptype)) { if (attr->attach_flags & ~BPF_F_ATTACH_MASK_MPROG) return -EINVAL; } else { if (attr->attach_flags & ~BPF_F_ATTACH_MASK_BASE) return -EINVAL; if (attr->relative_fd || attr->expected_revision) return -EINVAL; } prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype); if (IS_ERR(prog)) return PTR_ERR(prog); if (bpf_prog_attach_check_attach_type(prog, attr->attach_type)) { bpf_prog_put(prog); return -EINVAL; } switch (ptype) { case BPF_PROG_TYPE_SK_SKB: case BPF_PROG_TYPE_SK_MSG: ret = sock_map_get_from_fd(attr, prog); break; case BPF_PROG_TYPE_LIRC_MODE2: ret = lirc_prog_attach(attr, prog); break; case BPF_PROG_TYPE_FLOW_DISSECTOR: ret = netns_bpf_prog_attach(attr, prog); break; case BPF_PROG_TYPE_CGROUP_DEVICE: case BPF_PROG_TYPE_CGROUP_SKB: case BPF_PROG_TYPE_CGROUP_SOCK: case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: case BPF_PROG_TYPE_CGROUP_SOCKOPT: case BPF_PROG_TYPE_CGROUP_SYSCTL: case BPF_PROG_TYPE_SOCK_OPS: case BPF_PROG_TYPE_LSM: if (ptype == BPF_PROG_TYPE_LSM && prog->expected_attach_type != BPF_LSM_CGROUP) ret = -EINVAL; else ret = cgroup_bpf_prog_attach(attr, ptype, prog); break; case 
BPF_PROG_TYPE_SCHED_CLS: if (attr->attach_type == BPF_TCX_INGRESS || attr->attach_type == BPF_TCX_EGRESS) ret = tcx_prog_attach(attr, prog); else ret = netkit_prog_attach(attr, prog); break; default: ret = -EINVAL; } if (ret) bpf_prog_put(prog); return ret; } #define BPF_PROG_DETACH_LAST_FIELD expected_revision static int bpf_prog_detach(const union bpf_attr *attr) { struct bpf_prog *prog = NULL; enum bpf_prog_type ptype; int ret; if (CHECK_ATTR(BPF_PROG_DETACH)) return -EINVAL; ptype = attach_type_to_prog_type(attr->attach_type); if (bpf_mprog_supported(ptype)) { if (ptype == BPF_PROG_TYPE_UNSPEC) return -EINVAL; if (attr->attach_flags & ~BPF_F_ATTACH_MASK_MPROG) return -EINVAL; if (attr->attach_bpf_fd) { prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype); if (IS_ERR(prog)) return PTR_ERR(prog); } } else if (attr->attach_flags || attr->relative_fd || attr->expected_revision) { return -EINVAL; } switch (ptype) { case BPF_PROG_TYPE_SK_MSG: case BPF_PROG_TYPE_SK_SKB: ret = sock_map_prog_detach(attr, ptype); break; case BPF_PROG_TYPE_LIRC_MODE2: ret = lirc_prog_detach(attr); break; case BPF_PROG_TYPE_FLOW_DISSECTOR: ret = netns_bpf_prog_detach(attr, ptype); break; case BPF_PROG_TYPE_CGROUP_DEVICE: case BPF_PROG_TYPE_CGROUP_SKB: case BPF_PROG_TYPE_CGROUP_SOCK: case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: case BPF_PROG_TYPE_CGROUP_SOCKOPT: case BPF_PROG_TYPE_CGROUP_SYSCTL: case BPF_PROG_TYPE_SOCK_OPS: case BPF_PROG_TYPE_LSM: ret = cgroup_bpf_prog_detach(attr, ptype); break; case BPF_PROG_TYPE_SCHED_CLS: if (attr->attach_type == BPF_TCX_INGRESS || attr->attach_type == BPF_TCX_EGRESS) ret = tcx_prog_detach(attr, prog); else ret = netkit_prog_detach(attr, prog); break; default: ret = -EINVAL; } if (prog) bpf_prog_put(prog); return ret; } #define BPF_PROG_QUERY_LAST_FIELD query.revision static int bpf_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) { if (!bpf_net_capable()) return -EPERM; if (CHECK_ATTR(BPF_PROG_QUERY)) return -EINVAL; if (attr->query.query_flags & ~BPF_F_QUERY_EFFECTIVE) return -EINVAL; switch (attr->query.attach_type) { case BPF_CGROUP_INET_INGRESS: case BPF_CGROUP_INET_EGRESS: case BPF_CGROUP_INET_SOCK_CREATE: case BPF_CGROUP_INET_SOCK_RELEASE: case BPF_CGROUP_INET4_BIND: case BPF_CGROUP_INET6_BIND: case BPF_CGROUP_INET4_POST_BIND: case BPF_CGROUP_INET6_POST_BIND: case BPF_CGROUP_INET4_CONNECT: case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_UNIX_CONNECT: case BPF_CGROUP_INET4_GETPEERNAME: case BPF_CGROUP_INET6_GETPEERNAME: case BPF_CGROUP_UNIX_GETPEERNAME: case BPF_CGROUP_INET4_GETSOCKNAME: case BPF_CGROUP_INET6_GETSOCKNAME: case BPF_CGROUP_UNIX_GETSOCKNAME: case BPF_CGROUP_UDP4_SENDMSG: case BPF_CGROUP_UDP6_SENDMSG: case BPF_CGROUP_UNIX_SENDMSG: case BPF_CGROUP_UDP4_RECVMSG: case BPF_CGROUP_UDP6_RECVMSG: case BPF_CGROUP_UNIX_RECVMSG: case BPF_CGROUP_SOCK_OPS: case BPF_CGROUP_DEVICE: case BPF_CGROUP_SYSCTL: case BPF_CGROUP_GETSOCKOPT: case BPF_CGROUP_SETSOCKOPT: case BPF_LSM_CGROUP: return cgroup_bpf_prog_query(attr, uattr); case BPF_LIRC_MODE2: return lirc_prog_query(attr, uattr); case BPF_FLOW_DISSECTOR: case BPF_SK_LOOKUP: return netns_bpf_prog_query(attr, uattr); case BPF_SK_SKB_STREAM_PARSER: case BPF_SK_SKB_STREAM_VERDICT: case BPF_SK_MSG_VERDICT: case BPF_SK_SKB_VERDICT: return sock_map_bpf_prog_query(attr, uattr); case BPF_TCX_INGRESS: case BPF_TCX_EGRESS: return tcx_prog_query(attr, uattr); case BPF_NETKIT_PRIMARY: case BPF_NETKIT_PEER: return netkit_prog_query(attr, uattr); default: return -EINVAL; } } #define BPF_PROG_TEST_RUN_LAST_FIELD test.batch_size 
static int bpf_prog_test_run(const union bpf_attr *attr, union bpf_attr __user *uattr) { struct bpf_prog *prog; int ret = -ENOTSUPP; if (CHECK_ATTR(BPF_PROG_TEST_RUN)) return -EINVAL; if ((attr->test.ctx_size_in && !attr->test.ctx_in) || (!attr->test.ctx_size_in && attr->test.ctx_in)) return -EINVAL; if ((attr->test.ctx_size_out && !attr->test.ctx_out) || (!attr->test.ctx_size_out && attr->test.ctx_out)) return -EINVAL; prog = bpf_prog_get(attr->test.prog_fd); if (IS_ERR(prog)) return PTR_ERR(prog); if (prog->aux->ops->test_run) ret = prog->aux->ops->test_run(prog, attr, uattr); bpf_prog_put(prog); return ret; } #define BPF_OBJ_GET_NEXT_ID_LAST_FIELD next_id static int bpf_obj_get_next_id(const union bpf_attr *attr, union bpf_attr __user *uattr, struct idr *idr, spinlock_t *lock) { u32 next_id = attr->start_id; int err = 0; if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX) return -EINVAL; if (!capable(CAP_SYS_ADMIN)) return -EPERM; next_id++; spin_lock_bh(lock); if (!idr_get_next(idr, &next_id)) err = -ENOENT; spin_unlock_bh(lock); if (!err) err = put_user(next_id, &uattr->next_id); return err; } struct bpf_map *bpf_map_get_curr_or_next(u32 *id) { struct bpf_map *map; spin_lock_bh(&map_idr_lock); again: map = idr_get_next(&map_idr, id); if (map) { map = __bpf_map_inc_not_zero(map, false); if (IS_ERR(map)) { (*id)++; goto again; } } spin_unlock_bh(&map_idr_lock); return map; } struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id) { struct bpf_prog *prog; spin_lock_bh(&prog_idr_lock); again: prog = idr_get_next(&prog_idr, id); if (prog) { prog = bpf_prog_inc_not_zero(prog); if (IS_ERR(prog)) { (*id)++; goto again; } } spin_unlock_bh(&prog_idr_lock); return prog; } #define BPF_PROG_GET_FD_BY_ID_LAST_FIELD prog_id struct bpf_prog *bpf_prog_by_id(u32 id) { struct bpf_prog *prog; if (!id) return ERR_PTR(-ENOENT); spin_lock_bh(&prog_idr_lock); prog = idr_find(&prog_idr, id); if (prog) prog = bpf_prog_inc_not_zero(prog); else prog = ERR_PTR(-ENOENT); spin_unlock_bh(&prog_idr_lock); return prog; } static int bpf_prog_get_fd_by_id(const union bpf_attr *attr) { struct bpf_prog *prog; u32 id = attr->prog_id; int fd; if (CHECK_ATTR(BPF_PROG_GET_FD_BY_ID)) return -EINVAL; if (!capable(CAP_SYS_ADMIN)) return -EPERM; prog = bpf_prog_by_id(id); if (IS_ERR(prog)) return PTR_ERR(prog); fd = bpf_prog_new_fd(prog); if (fd < 0) bpf_prog_put(prog); return fd; } #define BPF_MAP_GET_FD_BY_ID_LAST_FIELD open_flags static int bpf_map_get_fd_by_id(const union bpf_attr *attr) { struct bpf_map *map; u32 id = attr->map_id; int f_flags; int fd; if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID) || attr->open_flags & ~BPF_OBJ_FLAG_MASK) return -EINVAL; if (!capable(CAP_SYS_ADMIN)) return -EPERM; f_flags = bpf_get_file_flag(attr->open_flags); if (f_flags < 0) return f_flags; spin_lock_bh(&map_idr_lock); map = idr_find(&map_idr, id); if (map) map = __bpf_map_inc_not_zero(map, true); else map = ERR_PTR(-ENOENT); spin_unlock_bh(&map_idr_lock); if (IS_ERR(map)) return PTR_ERR(map); fd = bpf_map_new_fd(map, f_flags); if (fd < 0) bpf_map_put_with_uref(map); return fd; } static const struct bpf_map *bpf_map_from_imm(const struct bpf_prog *prog, unsigned long addr, u32 *off, u32 *type) { const struct bpf_map *map; int i; mutex_lock(&prog->aux->used_maps_mutex); for (i = 0, *off = 0; i < prog->aux->used_map_cnt; i++) { map = prog->aux->used_maps[i]; if (map == (void *)addr) { *type = BPF_PSEUDO_MAP_FD; goto out; } if (!map->ops->map_direct_value_meta) continue; if (!map->ops->map_direct_value_meta(map, addr, off)) { *type = 
BPF_PSEUDO_MAP_VALUE; goto out; } } map = NULL; out: mutex_unlock(&prog->aux->used_maps_mutex); return map; } static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog, const struct cred *f_cred) { const struct bpf_map *map; struct bpf_insn *insns; u32 off, type; u64 imm; u8 code; int i; insns = kmemdup(prog->insnsi, bpf_prog_insn_size(prog), GFP_USER); if (!insns) return insns; for (i = 0; i < prog->len; i++) { code = insns[i].code; if (code == (BPF_JMP | BPF_TAIL_CALL)) { insns[i].code = BPF_JMP | BPF_CALL; insns[i].imm = BPF_FUNC_tail_call; /* fall-through */ } if (code == (BPF_JMP | BPF_CALL) || code == (BPF_JMP | BPF_CALL_ARGS)) { if (code == (BPF_JMP | BPF_CALL_ARGS)) insns[i].code = BPF_JMP | BPF_CALL; if (!bpf_dump_raw_ok(f_cred)) insns[i].imm = 0; continue; } if (BPF_CLASS(code) == BPF_LDX && BPF_MODE(code) == BPF_PROBE_MEM) { insns[i].code = BPF_LDX | BPF_SIZE(code) | BPF_MEM; continue; } if ((BPF_CLASS(code) == BPF_LDX || BPF_CLASS(code) == BPF_STX || BPF_CLASS(code) == BPF_ST) && BPF_MODE(code) == BPF_PROBE_MEM32) { insns[i].code = BPF_CLASS(code) | BPF_SIZE(code) | BPF_MEM; continue; } if (code != (BPF_LD | BPF_IMM | BPF_DW)) continue; imm = ((u64)insns[i + 1].imm << 32) | (u32)insns[i].imm; map = bpf_map_from_imm(prog, imm, &off, &type); if (map) { insns[i].src_reg = type; insns[i].imm = map->id; insns[i + 1].imm = off; continue; } } return insns; } static int set_info_rec_size(struct bpf_prog_info *info) { /* * Ensure info.*_rec_size is the same as kernel expected size * * or * * Only allow zero *_rec_size if both _rec_size and _cnt are * zero. In this case, the kernel will set the expected * _rec_size back to the info. */ if ((info->nr_func_info || info->func_info_rec_size) && info->func_info_rec_size != sizeof(struct bpf_func_info)) return -EINVAL; if ((info->nr_line_info || info->line_info_rec_size) && info->line_info_rec_size != sizeof(struct bpf_line_info)) return -EINVAL; if ((info->nr_jited_line_info || info->jited_line_info_rec_size) && info->jited_line_info_rec_size != sizeof(__u64)) return -EINVAL; info->func_info_rec_size = sizeof(struct bpf_func_info); info->line_info_rec_size = sizeof(struct bpf_line_info); info->jited_line_info_rec_size = sizeof(__u64); return 0; } static int bpf_prog_get_info_by_fd(struct file *file, struct bpf_prog *prog, const union bpf_attr *attr, union bpf_attr __user *uattr) { struct bpf_prog_info __user *uinfo = u64_to_user_ptr(attr->info.info); struct btf *attach_btf = bpf_prog_get_target_btf(prog); struct bpf_prog_info info; u32 info_len = attr->info.info_len; struct bpf_prog_kstats stats; char __user *uinsns; u32 ulen; int err; err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(info), info_len); if (err) return err; info_len = min_t(u32, sizeof(info), info_len); memset(&info, 0, sizeof(info)); if (copy_from_user(&info, uinfo, info_len)) return -EFAULT; info.type = prog->type; info.id = prog->aux->id; info.load_time = prog->aux->load_time; info.created_by_uid = from_kuid_munged(current_user_ns(), prog->aux->user->uid); info.gpl_compatible = prog->gpl_compatible; memcpy(info.tag, prog->tag, sizeof(prog->tag)); memcpy(info.name, prog->aux->name, sizeof(prog->aux->name)); mutex_lock(&prog->aux->used_maps_mutex); ulen = info.nr_map_ids; info.nr_map_ids = prog->aux->used_map_cnt; ulen = min_t(u32, info.nr_map_ids, ulen); if (ulen) { u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids); u32 i; for (i = 0; i < ulen; i++) if (put_user(prog->aux->used_maps[i]->id, &user_map_ids[i])) { 
mutex_unlock(&prog->aux->used_maps_mutex); return -EFAULT; } } mutex_unlock(&prog->aux->used_maps_mutex); err = set_info_rec_size(&info); if (err) return err; bpf_prog_get_stats(prog, &stats); info.run_time_ns = stats.nsecs; info.run_cnt = stats.cnt; info.recursion_misses = stats.misses; info.verified_insns = prog->aux->verified_insns; if (!bpf_capable()) { info.jited_prog_len = 0; info.xlated_prog_len = 0; info.nr_jited_ksyms = 0; info.nr_jited_func_lens = 0; info.nr_func_info = 0; info.nr_line_info = 0; info.nr_jited_line_info = 0; goto done; } ulen = info.xlated_prog_len; info.xlated_prog_len = bpf_prog_insn_size(prog); if (info.xlated_prog_len && ulen) { struct bpf_insn *insns_sanitized; bool fault; if (prog->blinded && !bpf_dump_raw_ok(file->f_cred)) { info.xlated_prog_insns = 0; goto done; } insns_sanitized = bpf_insn_prepare_dump(prog, file->f_cred); if (!insns_sanitized) return -ENOMEM; uinsns = u64_to_user_ptr(info.xlated_prog_insns); ulen = min_t(u32, info.xlated_prog_len, ulen); fault = copy_to_user(uinsns, insns_sanitized, ulen); kfree(insns_sanitized); if (fault) return -EFAULT; } if (bpf_prog_is_offloaded(prog->aux)) { err = bpf_prog_offload_info_fill(&info, prog); if (err) return err; goto done; } /* NOTE: the following code is supposed to be skipped for offload. * bpf_prog_offload_info_fill() is the place to fill similar fields * for offload. */ ulen = info.jited_prog_len; if (prog->aux->func_cnt) { u32 i; info.jited_prog_len = 0; for (i = 0; i < prog->aux->func_cnt; i++) info.jited_prog_len += prog->aux->func[i]->jited_len; } else { info.jited_prog_len = prog->jited_len; } if (info.jited_prog_len && ulen) { if (bpf_dump_raw_ok(file->f_cred)) { uinsns = u64_to_user_ptr(info.jited_prog_insns); ulen = min_t(u32, info.jited_prog_len, ulen); /* for multi-function programs, copy the JITed * instructions for all the functions */ if (prog->aux->func_cnt) { u32 len, free, i; u8 *img; free = ulen; for (i = 0; i < prog->aux->func_cnt; i++) { len = prog->aux->func[i]->jited_len; len = min_t(u32, len, free); img = (u8 *) prog->aux->func[i]->bpf_func; if (copy_to_user(uinsns, img, len)) return -EFAULT; uinsns += len; free -= len; if (!free) break; } } else { if (copy_to_user(uinsns, prog->bpf_func, ulen)) return -EFAULT; } } else { info.jited_prog_insns = 0; } } ulen = info.nr_jited_ksyms; info.nr_jited_ksyms = prog->aux->func_cnt ? : 1; if (ulen) { if (bpf_dump_raw_ok(file->f_cred)) { unsigned long ksym_addr; u64 __user *user_ksyms; u32 i; /* copy the address of the kernel symbol * corresponding to each function */ ulen = min_t(u32, info.nr_jited_ksyms, ulen); user_ksyms = u64_to_user_ptr(info.jited_ksyms); if (prog->aux->func_cnt) { for (i = 0; i < ulen; i++) { ksym_addr = (unsigned long) prog->aux->func[i]->bpf_func; if (put_user((u64) ksym_addr, &user_ksyms[i])) return -EFAULT; } } else { ksym_addr = (unsigned long) prog->bpf_func; if (put_user((u64) ksym_addr, &user_ksyms[0])) return -EFAULT; } } else { info.jited_ksyms = 0; } } ulen = info.nr_jited_func_lens; info.nr_jited_func_lens = prog->aux->func_cnt ? 
: 1; if (ulen) { if (bpf_dump_raw_ok(file->f_cred)) { u32 __user *user_lens; u32 func_len, i; /* copy the JITed image lengths for each function */ ulen = min_t(u32, info.nr_jited_func_lens, ulen); user_lens = u64_to_user_ptr(info.jited_func_lens); if (prog->aux->func_cnt) { for (i = 0; i < ulen; i++) { func_len = prog->aux->func[i]->jited_len; if (put_user(func_len, &user_lens[i])) return -EFAULT; } } else { func_len = prog->jited_len; if (put_user(func_len, &user_lens[0])) return -EFAULT; } } else { info.jited_func_lens = 0; } } if (prog->aux->btf) info.btf_id = btf_obj_id(prog->aux->btf); info.attach_btf_id = prog->aux->attach_btf_id; if (attach_btf) info.attach_btf_obj_id = btf_obj_id(attach_btf); ulen = info.nr_func_info; info.nr_func_info = prog->aux->func_info_cnt; if (info.nr_func_info && ulen) { char __user *user_finfo; user_finfo = u64_to_user_ptr(info.func_info); ulen = min_t(u32, info.nr_func_info, ulen); if (copy_to_user(user_finfo, prog->aux->func_info, info.func_info_rec_size * ulen)) return -EFAULT; } ulen = info.nr_line_info; info.nr_line_info = prog->aux->nr_linfo; if (info.nr_line_info && ulen) { __u8 __user *user_linfo; user_linfo = u64_to_user_ptr(info.line_info); ulen = min_t(u32, info.nr_line_info, ulen); if (copy_to_user(user_linfo, prog->aux->linfo, info.line_info_rec_size * ulen)) return -EFAULT; } ulen = info.nr_jited_line_info; if (prog->aux->jited_linfo) info.nr_jited_line_info = prog->aux->nr_linfo; else info.nr_jited_line_info = 0; if (info.nr_jited_line_info && ulen) { if (bpf_dump_raw_ok(file->f_cred)) { unsigned long line_addr; __u64 __user *user_linfo; u32 i; user_linfo = u64_to_user_ptr(info.jited_line_info); ulen = min_t(u32, info.nr_jited_line_info, ulen); for (i = 0; i < ulen; i++) { line_addr = (unsigned long)prog->aux->jited_linfo[i]; if (put_user((__u64)line_addr, &user_linfo[i])) return -EFAULT; } } else { info.jited_line_info = 0; } } ulen = info.nr_prog_tags; info.nr_prog_tags = prog->aux->func_cnt ? 
: 1; if (ulen) { __u8 __user (*user_prog_tags)[BPF_TAG_SIZE]; u32 i; user_prog_tags = u64_to_user_ptr(info.prog_tags); ulen = min_t(u32, info.nr_prog_tags, ulen); if (prog->aux->func_cnt) { for (i = 0; i < ulen; i++) { if (copy_to_user(user_prog_tags[i], prog->aux->func[i]->tag, BPF_TAG_SIZE)) return -EFAULT; } } else { if (copy_to_user(user_prog_tags[0], prog->tag, BPF_TAG_SIZE)) return -EFAULT; } } done: if (copy_to_user(uinfo, &info, info_len) || put_user(info_len, &uattr->info.info_len)) return -EFAULT; return 0; } static int bpf_map_get_info_by_fd(struct file *file, struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { struct bpf_map_info __user *uinfo = u64_to_user_ptr(attr->info.info); struct bpf_map_info info; u32 info_len = attr->info.info_len; int err; err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(info), info_len); if (err) return err; info_len = min_t(u32, sizeof(info), info_len); memset(&info, 0, sizeof(info)); info.type = map->map_type; info.id = map->id; info.key_size = map->key_size; info.value_size = map->value_size; info.max_entries = map->max_entries; info.map_flags = map->map_flags; info.map_extra = map->map_extra; memcpy(info.name, map->name, sizeof(map->name)); if (map->btf) { info.btf_id = btf_obj_id(map->btf); info.btf_key_type_id = map->btf_key_type_id; info.btf_value_type_id = map->btf_value_type_id; } info.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id; if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS) bpf_map_struct_ops_info_fill(&info, map); if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_info_fill(&info, map); if (err) return err; } if (copy_to_user(uinfo, &info, info_len) || put_user(info_len, &uattr->info.info_len)) return -EFAULT; return 0; } static int bpf_btf_get_info_by_fd(struct file *file, struct btf *btf, const union bpf_attr *attr, union bpf_attr __user *uattr) { struct bpf_btf_info __user *uinfo = u64_to_user_ptr(attr->info.info); u32 info_len = attr->info.info_len; int err; err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(*uinfo), info_len); if (err) return err; return btf_get_info_by_fd(btf, attr, uattr); } static int bpf_link_get_info_by_fd(struct file *file, struct bpf_link *link, const union bpf_attr *attr, union bpf_attr __user *uattr) { struct bpf_link_info __user *uinfo = u64_to_user_ptr(attr->info.info); struct bpf_link_info info; u32 info_len = attr->info.info_len; int err; err = bpf_check_uarg_tail_zero(USER_BPFPTR(uinfo), sizeof(info), info_len); if (err) return err; info_len = min_t(u32, sizeof(info), info_len); memset(&info, 0, sizeof(info)); if (copy_from_user(&info, uinfo, info_len)) return -EFAULT; info.type = link->type; info.id = link->id; if (link->prog) info.prog_id = link->prog->aux->id; if (link->ops->fill_link_info) { err = link->ops->fill_link_info(link, &info); if (err) return err; } if (copy_to_user(uinfo, &info, info_len) || put_user(info_len, &uattr->info.info_len)) return -EFAULT; return 0; } #define BPF_OBJ_GET_INFO_BY_FD_LAST_FIELD info.info static int bpf_obj_get_info_by_fd(const union bpf_attr *attr, union bpf_attr __user *uattr) { int ufd = attr->info.bpf_fd; struct fd f; int err; if (CHECK_ATTR(BPF_OBJ_GET_INFO_BY_FD)) return -EINVAL; f = fdget(ufd); if (!f.file) return -EBADFD; if (f.file->f_op == &bpf_prog_fops) err = bpf_prog_get_info_by_fd(f.file, f.file->private_data, attr, uattr); else if (f.file->f_op == &bpf_map_fops) err = bpf_map_get_info_by_fd(f.file, f.file->private_data, attr, uattr); else if (f.file->f_op == &btf_fops) err = 
bpf_btf_get_info_by_fd(f.file, f.file->private_data, attr, uattr); else if (f.file->f_op == &bpf_link_fops || f.file->f_op == &bpf_link_fops_poll) err = bpf_link_get_info_by_fd(f.file, f.file->private_data, attr, uattr); else err = -EINVAL; fdput(f); return err; } #define BPF_BTF_LOAD_LAST_FIELD btf_token_fd static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size) { struct bpf_token *token = NULL; if (CHECK_ATTR(BPF_BTF_LOAD)) return -EINVAL; if (attr->btf_flags & ~BPF_F_TOKEN_FD) return -EINVAL; if (attr->btf_flags & BPF_F_TOKEN_FD) { token = bpf_token_get_from_fd(attr->btf_token_fd); if (IS_ERR(token)) return PTR_ERR(token); if (!bpf_token_allow_cmd(token, BPF_BTF_LOAD)) { bpf_token_put(token); token = NULL; } } if (!bpf_token_capable(token, CAP_BPF)) { bpf_token_put(token); return -EPERM; } bpf_token_put(token); return btf_new_fd(attr, uattr, uattr_size); } #define BPF_BTF_GET_FD_BY_ID_LAST_FIELD btf_id static int bpf_btf_get_fd_by_id(const union bpf_attr *attr) { if (CHECK_ATTR(BPF_BTF_GET_FD_BY_ID)) return -EINVAL; if (!capable(CAP_SYS_ADMIN)) return -EPERM; return btf_get_fd_by_id(attr->btf_id); } static int bpf_task_fd_query_copy(const union bpf_attr *attr, union bpf_attr __user *uattr, u32 prog_id, u32 fd_type, const char *buf, u64 probe_offset, u64 probe_addr) { char __user *ubuf = u64_to_user_ptr(attr->task_fd_query.buf); u32 len = buf ? strlen(buf) : 0, input_len; int err = 0; if (put_user(len, &uattr->task_fd_query.buf_len)) return -EFAULT; input_len = attr->task_fd_query.buf_len; if (input_len && ubuf) { if (!len) { /* nothing to copy, just make ubuf NULL terminated */ char zero = '\0'; if (put_user(zero, ubuf)) return -EFAULT; } else if (input_len >= len + 1) { /* ubuf can hold the string with NULL terminator */ if (copy_to_user(ubuf, buf, len + 1)) return -EFAULT; } else { /* ubuf cannot hold the string with NULL terminator, * do a partial copy with NULL terminator. 
*/ char zero = '\0'; err = -ENOSPC; if (copy_to_user(ubuf, buf, input_len - 1)) return -EFAULT; if (put_user(zero, ubuf + input_len - 1)) return -EFAULT; } } if (put_user(prog_id, &uattr->task_fd_query.prog_id) || put_user(fd_type, &uattr->task_fd_query.fd_type) || put_user(probe_offset, &uattr->task_fd_query.probe_offset) || put_user(probe_addr, &uattr->task_fd_query.probe_addr)) return -EFAULT; return err; } #define BPF_TASK_FD_QUERY_LAST_FIELD task_fd_query.probe_addr static int bpf_task_fd_query(const union bpf_attr *attr, union bpf_attr __user *uattr) { pid_t pid = attr->task_fd_query.pid; u32 fd = attr->task_fd_query.fd; const struct perf_event *event; struct task_struct *task; struct file *file; int err; if (CHECK_ATTR(BPF_TASK_FD_QUERY)) return -EINVAL; if (!capable(CAP_SYS_ADMIN)) return -EPERM; if (attr->task_fd_query.flags != 0) return -EINVAL; rcu_read_lock(); task = get_pid_task(find_vpid(pid), PIDTYPE_PID); rcu_read_unlock(); if (!task) return -ENOENT; err = 0; file = fget_task(task, fd); put_task_struct(task); if (!file) return -EBADF; if (file->f_op == &bpf_link_fops || file->f_op == &bpf_link_fops_poll) { struct bpf_link *link = file->private_data; if (link->ops == &bpf_raw_tp_link_lops) { struct bpf_raw_tp_link *raw_tp = container_of(link, struct bpf_raw_tp_link, link); struct bpf_raw_event_map *btp = raw_tp->btp; err = bpf_task_fd_query_copy(attr, uattr, raw_tp->link.prog->aux->id, BPF_FD_TYPE_RAW_TRACEPOINT, btp->tp->name, 0, 0); goto put_file; } goto out_not_supp; } event = perf_get_event(file); if (!IS_ERR(event)) { u64 probe_offset, probe_addr; u32 prog_id, fd_type; const char *buf; err = bpf_get_perf_event_info(event, &prog_id, &fd_type, &buf, &probe_offset, &probe_addr, NULL); if (!err) err = bpf_task_fd_query_copy(attr, uattr, prog_id, fd_type, buf, probe_offset, probe_addr); goto put_file; } out_not_supp: err = -ENOTSUPP; put_file: fput(file); return err; } #define BPF_MAP_BATCH_LAST_FIELD batch.flags #define BPF_DO_BATCH(fn, ...) 
\ do { \ if (!fn) { \ err = -ENOTSUPP; \ goto err_put; \ } \ err = fn(__VA_ARGS__); \ } while (0) static int bpf_map_do_batch(const union bpf_attr *attr, union bpf_attr __user *uattr, int cmd) { bool has_read = cmd == BPF_MAP_LOOKUP_BATCH || cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH; bool has_write = cmd != BPF_MAP_LOOKUP_BATCH; struct bpf_map *map; int err, ufd; struct fd f; if (CHECK_ATTR(BPF_MAP_BATCH)) return -EINVAL; ufd = attr->batch.map_fd; f = fdget(ufd); map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); if (has_write) bpf_map_write_active_inc(map); if (has_read && !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) { err = -EPERM; goto err_put; } if (has_write && !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { err = -EPERM; goto err_put; } if (cmd == BPF_MAP_LOOKUP_BATCH) BPF_DO_BATCH(map->ops->map_lookup_batch, map, attr, uattr); else if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH) BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch, map, attr, uattr); else if (cmd == BPF_MAP_UPDATE_BATCH) BPF_DO_BATCH(map->ops->map_update_batch, map, f.file, attr, uattr); else BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr); err_put: if (has_write) { maybe_wait_bpf_programs(map); bpf_map_write_active_dec(map); } fdput(f); return err; } #define BPF_LINK_CREATE_LAST_FIELD link_create.uprobe_multi.pid static int link_create(union bpf_attr *attr, bpfptr_t uattr) { struct bpf_prog *prog; int ret; if (CHECK_ATTR(BPF_LINK_CREATE)) return -EINVAL; if (attr->link_create.attach_type == BPF_STRUCT_OPS) return bpf_struct_ops_link_create(attr); prog = bpf_prog_get(attr->link_create.prog_fd); if (IS_ERR(prog)) return PTR_ERR(prog); ret = bpf_prog_attach_check_attach_type(prog, attr->link_create.attach_type); if (ret) goto out; switch (prog->type) { case BPF_PROG_TYPE_CGROUP_SKB: case BPF_PROG_TYPE_CGROUP_SOCK: case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: case BPF_PROG_TYPE_SOCK_OPS: case BPF_PROG_TYPE_CGROUP_DEVICE: case BPF_PROG_TYPE_CGROUP_SYSCTL: case BPF_PROG_TYPE_CGROUP_SOCKOPT: ret = cgroup_bpf_link_attach(attr, prog); break; case BPF_PROG_TYPE_EXT: ret = bpf_tracing_prog_attach(prog, attr->link_create.target_fd, attr->link_create.target_btf_id, attr->link_create.tracing.cookie); break; case BPF_PROG_TYPE_LSM: case BPF_PROG_TYPE_TRACING: if (attr->link_create.attach_type != prog->expected_attach_type) { ret = -EINVAL; goto out; } if (prog->expected_attach_type == BPF_TRACE_RAW_TP) ret = bpf_raw_tp_link_attach(prog, NULL, attr->link_create.tracing.cookie); else if (prog->expected_attach_type == BPF_TRACE_ITER) ret = bpf_iter_link_attach(attr, uattr, prog); else if (prog->expected_attach_type == BPF_LSM_CGROUP) ret = cgroup_bpf_link_attach(attr, prog); else ret = bpf_tracing_prog_attach(prog, attr->link_create.target_fd, attr->link_create.target_btf_id, attr->link_create.tracing.cookie); break; case BPF_PROG_TYPE_FLOW_DISSECTOR: case BPF_PROG_TYPE_SK_LOOKUP: ret = netns_bpf_link_create(attr, prog); break; case BPF_PROG_TYPE_SK_MSG: case BPF_PROG_TYPE_SK_SKB: ret = sock_map_link_create(attr, prog); break; #ifdef CONFIG_NET case BPF_PROG_TYPE_XDP: ret = bpf_xdp_link_attach(attr, prog); break; case BPF_PROG_TYPE_SCHED_CLS: if (attr->link_create.attach_type == BPF_TCX_INGRESS || attr->link_create.attach_type == BPF_TCX_EGRESS) ret = tcx_link_attach(attr, prog); else ret = netkit_link_attach(attr, prog); break; case BPF_PROG_TYPE_NETFILTER: ret = bpf_nf_link_attach(attr, prog); break; #endif case BPF_PROG_TYPE_PERF_EVENT: case BPF_PROG_TYPE_TRACEPOINT: ret = bpf_perf_link_attach(attr, prog); break; case 
BPF_PROG_TYPE_KPROBE: if (attr->link_create.attach_type == BPF_PERF_EVENT) ret = bpf_perf_link_attach(attr, prog); else if (attr->link_create.attach_type == BPF_TRACE_KPROBE_MULTI || attr->link_create.attach_type == BPF_TRACE_KPROBE_SESSION) ret = bpf_kprobe_multi_link_attach(attr, prog); else if (attr->link_create.attach_type == BPF_TRACE_UPROBE_MULTI) ret = bpf_uprobe_multi_link_attach(attr, prog); break; default: ret = -EINVAL; } out: if (ret < 0) bpf_prog_put(prog); return ret; } static int link_update_map(struct bpf_link *link, union bpf_attr *attr) { struct bpf_map *new_map, *old_map = NULL; int ret; new_map = bpf_map_get(attr->link_update.new_map_fd); if (IS_ERR(new_map)) return PTR_ERR(new_map); if (attr->link_update.flags & BPF_F_REPLACE) { old_map = bpf_map_get(attr->link_update.old_map_fd); if (IS_ERR(old_map)) { ret = PTR_ERR(old_map); goto out_put; } } else if (attr->link_update.old_map_fd) { ret = -EINVAL; goto out_put; } ret = link->ops->update_map(link, new_map, old_map); if (old_map) bpf_map_put(old_map); out_put: bpf_map_put(new_map); return ret; } #define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd static int link_update(union bpf_attr *attr) { struct bpf_prog *old_prog = NULL, *new_prog; struct bpf_link *link; u32 flags; int ret; if (CHECK_ATTR(BPF_LINK_UPDATE)) return -EINVAL; flags = attr->link_update.flags; if (flags & ~BPF_F_REPLACE) return -EINVAL; link = bpf_link_get_from_fd(attr->link_update.link_fd); if (IS_ERR(link)) return PTR_ERR(link); if (link->ops->update_map) { ret = link_update_map(link, attr); goto out_put_link; } new_prog = bpf_prog_get(attr->link_update.new_prog_fd); if (IS_ERR(new_prog)) { ret = PTR_ERR(new_prog); goto out_put_link; } if (flags & BPF_F_REPLACE) { old_prog = bpf_prog_get(attr->link_update.old_prog_fd); if (IS_ERR(old_prog)) { ret = PTR_ERR(old_prog); old_prog = NULL; goto out_put_progs; } } else if (attr->link_update.old_prog_fd) { ret = -EINVAL; goto out_put_progs; } if (link->ops->update_prog) ret = link->ops->update_prog(link, new_prog, old_prog); else ret = -EINVAL; out_put_progs: if (old_prog) bpf_prog_put(old_prog); if (ret) bpf_prog_put(new_prog); out_put_link: bpf_link_put_direct(link); return ret; } #define BPF_LINK_DETACH_LAST_FIELD link_detach.link_fd static int link_detach(union bpf_attr *attr) { struct bpf_link *link; int ret; if (CHECK_ATTR(BPF_LINK_DETACH)) return -EINVAL; link = bpf_link_get_from_fd(attr->link_detach.link_fd); if (IS_ERR(link)) return PTR_ERR(link); if (link->ops->detach) ret = link->ops->detach(link); else ret = -EOPNOTSUPP; bpf_link_put_direct(link); return ret; } struct bpf_link *bpf_link_inc_not_zero(struct bpf_link *link) { return atomic64_fetch_add_unless(&link->refcnt, 1, 0) ? 
link : ERR_PTR(-ENOENT); } EXPORT_SYMBOL(bpf_link_inc_not_zero); struct bpf_link *bpf_link_by_id(u32 id) { struct bpf_link *link; if (!id) return ERR_PTR(-ENOENT); spin_lock_bh(&link_idr_lock); /* before link is "settled", ID is 0, pretend it doesn't exist yet */ link = idr_find(&link_idr, id); if (link) { if (link->id) link = bpf_link_inc_not_zero(link); else link = ERR_PTR(-EAGAIN); } else { link = ERR_PTR(-ENOENT); } spin_unlock_bh(&link_idr_lock); return link; } struct bpf_link *bpf_link_get_curr_or_next(u32 *id) { struct bpf_link *link; spin_lock_bh(&link_idr_lock); again: link = idr_get_next(&link_idr, id); if (link) { link = bpf_link_inc_not_zero(link); if (IS_ERR(link)) { (*id)++; goto again; } } spin_unlock_bh(&link_idr_lock); return link; } #define BPF_LINK_GET_FD_BY_ID_LAST_FIELD link_id static int bpf_link_get_fd_by_id(const union bpf_attr *attr) { struct bpf_link *link; u32 id = attr->link_id; int fd; if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID)) return -EINVAL; if (!capable(CAP_SYS_ADMIN)) return -EPERM; link = bpf_link_by_id(id); if (IS_ERR(link)) return PTR_ERR(link); fd = bpf_link_new_fd(link); if (fd < 0) bpf_link_put_direct(link); return fd; } DEFINE_MUTEX(bpf_stats_enabled_mutex); static int bpf_stats_release(struct inode *inode, struct file *file) { mutex_lock(&bpf_stats_enabled_mutex); static_key_slow_dec(&bpf_stats_enabled_key.key); mutex_unlock(&bpf_stats_enabled_mutex); return 0; } static const struct file_operations bpf_stats_fops = { .release = bpf_stats_release, }; static int bpf_enable_runtime_stats(void) { int fd; mutex_lock(&bpf_stats_enabled_mutex); /* Set a very high limit to avoid overflow */ if (static_key_count(&bpf_stats_enabled_key.key) > INT_MAX / 2) { mutex_unlock(&bpf_stats_enabled_mutex); return -EBUSY; } fd = anon_inode_getfd("bpf-stats", &bpf_stats_fops, NULL, O_CLOEXEC); if (fd >= 0) static_key_slow_inc(&bpf_stats_enabled_key.key); mutex_unlock(&bpf_stats_enabled_mutex); return fd; } #define BPF_ENABLE_STATS_LAST_FIELD enable_stats.type static int bpf_enable_stats(union bpf_attr *attr) { if (CHECK_ATTR(BPF_ENABLE_STATS)) return -EINVAL; if (!capable(CAP_SYS_ADMIN)) return -EPERM; switch (attr->enable_stats.type) { case BPF_STATS_RUN_TIME: return bpf_enable_runtime_stats(); default: break; } return -EINVAL; } #define BPF_ITER_CREATE_LAST_FIELD iter_create.flags static int bpf_iter_create(union bpf_attr *attr) { struct bpf_link *link; int err; if (CHECK_ATTR(BPF_ITER_CREATE)) return -EINVAL; if (attr->iter_create.flags) return -EINVAL; link = bpf_link_get_from_fd(attr->iter_create.link_fd); if (IS_ERR(link)) return PTR_ERR(link); err = bpf_iter_new_fd(link); bpf_link_put_direct(link); return err; } #define BPF_PROG_BIND_MAP_LAST_FIELD prog_bind_map.flags static int bpf_prog_bind_map(union bpf_attr *attr) { struct bpf_prog *prog; struct bpf_map *map; struct bpf_map **used_maps_old, **used_maps_new; int i, ret = 0; if (CHECK_ATTR(BPF_PROG_BIND_MAP)) return -EINVAL; if (attr->prog_bind_map.flags) return -EINVAL; prog = bpf_prog_get(attr->prog_bind_map.prog_fd); if (IS_ERR(prog)) return PTR_ERR(prog); map = bpf_map_get(attr->prog_bind_map.map_fd); if (IS_ERR(map)) { ret = PTR_ERR(map); goto out_prog_put; } mutex_lock(&prog->aux->used_maps_mutex); used_maps_old = prog->aux->used_maps; for (i = 0; i < prog->aux->used_map_cnt; i++) if (used_maps_old[i] == map) { bpf_map_put(map); goto out_unlock; } used_maps_new = kmalloc_array(prog->aux->used_map_cnt + 1, sizeof(used_maps_new[0]), GFP_KERNEL); if (!used_maps_new) { ret = -ENOMEM; goto out_unlock; } /* The bpf 
program will not access the bpf map, but for the sake of * simplicity, increase sleepable_refcnt for sleepable program as well. */ if (prog->sleepable) atomic64_inc(&map->sleepable_refcnt); memcpy(used_maps_new, used_maps_old, sizeof(used_maps_old[0]) * prog->aux->used_map_cnt); used_maps_new[prog->aux->used_map_cnt] = map; prog->aux->used_map_cnt++; prog->aux->used_maps = used_maps_new; kfree(used_maps_old); out_unlock: mutex_unlock(&prog->aux->used_maps_mutex); if (ret) bpf_map_put(map); out_prog_put: bpf_prog_put(prog); return ret; } #define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_fd static int token_create(union bpf_attr *attr) { if (CHECK_ATTR(BPF_TOKEN_CREATE)) return -EINVAL; /* no flags are supported yet */ if (attr->token_create.flags) return -EINVAL; return bpf_token_create(attr); } static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) { union bpf_attr attr; int err; err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size); if (err) return err; size = min_t(u32, size, sizeof(attr)); /* copy attributes from user space, may be less than sizeof(bpf_attr) */ memset(&attr, 0, sizeof(attr)); if (copy_from_bpfptr(&attr, uattr, size) != 0) return -EFAULT; err = security_bpf(cmd, &attr, size); if (err < 0) return err; switch (cmd) { case BPF_MAP_CREATE: err = map_create(&attr); break; case BPF_MAP_LOOKUP_ELEM: err = map_lookup_elem(&attr); break; case BPF_MAP_UPDATE_ELEM: err = map_update_elem(&attr, uattr); break; case BPF_MAP_DELETE_ELEM: err = map_delete_elem(&attr, uattr); break; case BPF_MAP_GET_NEXT_KEY: err = map_get_next_key(&attr); break; case BPF_MAP_FREEZE: err = map_freeze(&attr); break; case BPF_PROG_LOAD: err = bpf_prog_load(&attr, uattr, size); break; case BPF_OBJ_PIN: err = bpf_obj_pin(&attr); break; case BPF_OBJ_GET: err = bpf_obj_get(&attr); break; case BPF_PROG_ATTACH: err = bpf_prog_attach(&attr); break; case BPF_PROG_DETACH: err = bpf_prog_detach(&attr); break; case BPF_PROG_QUERY: err = bpf_prog_query(&attr, uattr.user); break; case BPF_PROG_TEST_RUN: err = bpf_prog_test_run(&attr, uattr.user); break; case BPF_PROG_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, &prog_idr, &prog_idr_lock); break; case BPF_MAP_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, &map_idr, &map_idr_lock); break; case BPF_BTF_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, &btf_idr, &btf_idr_lock); break; case BPF_PROG_GET_FD_BY_ID: err = bpf_prog_get_fd_by_id(&attr); break; case BPF_MAP_GET_FD_BY_ID: err = bpf_map_get_fd_by_id(&attr); break; case BPF_OBJ_GET_INFO_BY_FD: err = bpf_obj_get_info_by_fd(&attr, uattr.user); break; case BPF_RAW_TRACEPOINT_OPEN: err = bpf_raw_tracepoint_open(&attr); break; case BPF_BTF_LOAD: err = bpf_btf_load(&attr, uattr, size); break; case BPF_BTF_GET_FD_BY_ID: err = bpf_btf_get_fd_by_id(&attr); break; case BPF_TASK_FD_QUERY: err = bpf_task_fd_query(&attr, uattr.user); break; case BPF_MAP_LOOKUP_AND_DELETE_ELEM: err = map_lookup_and_delete_elem(&attr); break; case BPF_MAP_LOOKUP_BATCH: err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_LOOKUP_BATCH); break; case BPF_MAP_LOOKUP_AND_DELETE_BATCH: err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_LOOKUP_AND_DELETE_BATCH); break; case BPF_MAP_UPDATE_BATCH: err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_UPDATE_BATCH); break; case BPF_MAP_DELETE_BATCH: err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_DELETE_BATCH); break; case BPF_LINK_CREATE: err = link_create(&attr, uattr); break; case BPF_LINK_UPDATE: err = link_update(&attr); break; case BPF_LINK_GET_FD_BY_ID: 
err = bpf_link_get_fd_by_id(&attr); break; case BPF_LINK_GET_NEXT_ID: err = bpf_obj_get_next_id(&attr, uattr.user, &link_idr, &link_idr_lock); break; case BPF_ENABLE_STATS: err = bpf_enable_stats(&attr); break; case BPF_ITER_CREATE: err = bpf_iter_create(&attr); break; case BPF_LINK_DETACH: err = link_detach(&attr); break; case BPF_PROG_BIND_MAP: err = bpf_prog_bind_map(&attr); break; case BPF_TOKEN_CREATE: err = token_create(&attr); break; default: err = -EINVAL; break; } return err; } SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) { return __sys_bpf(cmd, USER_BPFPTR(uattr), size); } static bool syscall_prog_is_valid_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, struct bpf_insn_access_aux *info) { if (off < 0 || off >= U16_MAX) return false; if (off % size != 0) return false; return true; } BPF_CALL_3(bpf_sys_bpf, int, cmd, union bpf_attr *, attr, u32, attr_size) { switch (cmd) { case BPF_MAP_CREATE: case BPF_MAP_DELETE_ELEM: case BPF_MAP_UPDATE_ELEM: case BPF_MAP_FREEZE: case BPF_MAP_GET_FD_BY_ID: case BPF_PROG_LOAD: case BPF_BTF_LOAD: case BPF_LINK_CREATE: case BPF_RAW_TRACEPOINT_OPEN: break; default: return -EINVAL; } return __sys_bpf(cmd, KERNEL_BPFPTR(attr), attr_size); } /* To shut up -Wmissing-prototypes. * This function is used by the kernel light skeleton * to load bpf programs when modules are loaded or during kernel boot. * See tools/lib/bpf/skel_internal.h */ int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size); int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size) { struct bpf_prog * __maybe_unused prog; struct bpf_tramp_run_ctx __maybe_unused run_ctx; switch (cmd) { #ifdef CONFIG_BPF_JIT /* __bpf_prog_enter_sleepable used by trampoline and JIT */ case BPF_PROG_TEST_RUN: if (attr->test.data_in || attr->test.data_out || attr->test.ctx_out || attr->test.duration || attr->test.repeat || attr->test.flags) return -EINVAL; prog = bpf_prog_get_type(attr->test.prog_fd, BPF_PROG_TYPE_SYSCALL); if (IS_ERR(prog)) return PTR_ERR(prog); if (attr->test.ctx_size_in < prog->aux->max_ctx_offset || attr->test.ctx_size_in > U16_MAX) { bpf_prog_put(prog); return -EINVAL; } run_ctx.bpf_cookie = 0; if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) { /* recursion detected */ __bpf_prog_exit_sleepable_recur(prog, 0, &run_ctx); bpf_prog_put(prog); return -EBUSY; } attr->test.retval = bpf_prog_run(prog, (void *) (long) attr->test.ctx_in); __bpf_prog_exit_sleepable_recur(prog, 0 /* bpf_prog_run does runtime stats */, &run_ctx); bpf_prog_put(prog); return 0; #endif default: return ____bpf_sys_bpf(cmd, attr, size); } } EXPORT_SYMBOL(kern_sys_bpf); static const struct bpf_func_proto bpf_sys_bpf_proto = { .func = bpf_sys_bpf, .gpl_only = false, .ret_type = RET_INTEGER, .arg1_type = ARG_ANYTHING, .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY, .arg3_type = ARG_CONST_SIZE, }; const struct bpf_func_proto * __weak tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) { return bpf_base_func_proto(func_id, prog); } BPF_CALL_1(bpf_sys_close, u32, fd) { /* When bpf program calls this helper there should not be * an fdget() without matching completed fdput(). 
* This helper is allowed in the following callchain only: * sys_bpf->prog_test_run->bpf_prog->bpf_sys_close */ return close_fd(fd); } static const struct bpf_func_proto bpf_sys_close_proto = { .func = bpf_sys_close, .gpl_only = false, .ret_type = RET_INTEGER, .arg1_type = ARG_ANYTHING, }; BPF_CALL_4(bpf_kallsyms_lookup_name, const char *, name, int, name_sz, int, flags, u64 *, res) { if (flags) return -EINVAL; if (name_sz <= 1 || name[name_sz - 1]) return -EINVAL; if (!bpf_dump_raw_ok(current_cred())) return -EPERM; *res = kallsyms_lookup_name(name); return *res ? 0 : -ENOENT; } static const struct bpf_func_proto bpf_kallsyms_lookup_name_proto = { .func = bpf_kallsyms_lookup_name, .gpl_only = false, .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_MEM, .arg2_type = ARG_CONST_SIZE_OR_ZERO, .arg3_type = ARG_ANYTHING, .arg4_type = ARG_PTR_TO_LONG, }; static const struct bpf_func_proto * syscall_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) { switch (func_id) { case BPF_FUNC_sys_bpf: return !bpf_token_capable(prog->aux->token, CAP_PERFMON) ? NULL : &bpf_sys_bpf_proto; case BPF_FUNC_btf_find_by_name_kind: return &bpf_btf_find_by_name_kind_proto; case BPF_FUNC_sys_close: return &bpf_sys_close_proto; case BPF_FUNC_kallsyms_lookup_name: return &bpf_kallsyms_lookup_name_proto; default: return tracing_prog_func_proto(func_id, prog); } } const struct bpf_verifier_ops bpf_syscall_verifier_ops = { .get_func_proto = syscall_prog_func_proto, .is_valid_access = syscall_prog_is_valid_access, }; const struct bpf_prog_ops bpf_syscall_prog_ops = { .test_run = bpf_prog_test_run_syscall, }; #ifdef CONFIG_SYSCTL static int bpf_stats_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct static_key *key = (struct static_key *)table->data; static int saved_val; int val, ret; struct ctl_table tmp = { .data = &val, .maxlen = sizeof(val), .mode = table->mode, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }; if (write && !capable(CAP_SYS_ADMIN)) return -EPERM; mutex_lock(&bpf_stats_enabled_mutex); val = saved_val; ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); if (write && !ret && val != saved_val) { if (val) static_key_slow_inc(key); else static_key_slow_dec(key); saved_val = val; } mutex_unlock(&bpf_stats_enabled_mutex); return ret; } void __weak unpriv_ebpf_notify(int new_state) { } static int bpf_unpriv_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret, unpriv_enable = *(int *)table->data; bool locked_state = unpriv_enable == 1; struct ctl_table tmp = *table; if (write && !capable(CAP_SYS_ADMIN)) return -EPERM; tmp.data = &unpriv_enable; ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); if (write && !ret) { if (locked_state && unpriv_enable != 1) return -EPERM; *(int *)table->data = unpriv_enable; } if (write) unpriv_ebpf_notify(unpriv_enable); return ret; } static struct ctl_table bpf_syscall_table[] = { { .procname = "unprivileged_bpf_disabled", .data = &sysctl_unprivileged_bpf_disabled, .maxlen = sizeof(sysctl_unprivileged_bpf_disabled), .mode = 0644, .proc_handler = bpf_unpriv_handler, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_TWO, }, { .procname = "bpf_stats_enabled", .data = &bpf_stats_enabled_key.key, .mode = 0644, .proc_handler = bpf_stats_handler, }, }; static int __init bpf_syscall_sysctl_init(void) { register_sysctl_init("kernel", bpf_syscall_table); return 0; } late_initcall(bpf_syscall_sysctl_init); #endif /* CONFIG_SYSCTL */ |
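The syscall dispatch above (__sys_bpf() reached via SYSCALL_DEFINE3(bpf, ...)) is driven from user space through the multiplexed bpf(2) system call: the caller fills only the relevant members of union bpf_attr, zeroes the rest, and passes the command number together with the attribute size. The following is a minimal, hypothetical user-space sketch (not part of the kernel sources) exercising the BPF_PROG_GET_FD_BY_ID path handled by bpf_prog_get_fd_by_id() earlier; it assumes a valid program id and, as that handler requires, CAP_SYS_ADMIN.

/* Hypothetical user-space example: fetch a file descriptor for an existing
 * BPF program by id via the raw bpf(2) syscall. Error handling is minimal.
 */
#include <linux/bpf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int bpf_prog_fd_from_id(__u32 prog_id)
{
	union bpf_attr attr;

	/* Fields beyond the ones used must be zero, mirroring CHECK_ATTR(). */
	memset(&attr, 0, sizeof(attr));
	attr.prog_id = prog_id;

	return (int)syscall(__NR_bpf, BPF_PROG_GET_FD_BY_ID, &attr, sizeof(attr));
}

int main(int argc, char **argv)
{
	__u32 id = argc > 1 ? (__u32)atoi(argv[1]) : 1;
	int fd = bpf_prog_fd_from_id(id);

	if (fd < 0)
		perror("BPF_PROG_GET_FD_BY_ID");
	else
		printf("got prog fd %d for id %u\n", fd, id);
	return fd < 0 ? 1 : 0;
}
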
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/namei.h>
#include <linux/io_uring.h>
#include <linux/fsnotify.h>

#include <uapi/linux/io_uring.h>

#include "io_uring.h"
#include "sync.h"

struct io_sync {
	struct file			*file;
	loff_t				len;
	loff_t				off;
	int				flags;
	int				mode;
};

int io_sfr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
	struct io_sync *sync = io_kiocb_to_cmd(req, struct io_sync);

	if (unlikely(sqe->addr || sqe->buf_index || sqe->splice_fd_in))
		return -EINVAL;

	sync->off = READ_ONCE(sqe->off);
	sync->len = READ_ONCE(sqe->len);
	sync->flags = READ_ONCE(sqe->sync_range_flags);
	req->flags |= REQ_F_FORCE_ASYNC;
	return 0;
}

int io_sync_file_range(struct io_kiocb *req, unsigned int issue_flags)
{
	struct io_sync *sync = io_kiocb_to_cmd(req, struct io_sync);
	int ret;

	/* sync_file_range always requires a blocking context */
	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);

	ret = sync_file_range(req->file, sync->off, sync->len, sync->flags);
	io_req_set_res(req, ret, 0);
	return IOU_OK;
}

int io_fsync_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
	struct io_sync *sync = io_kiocb_to_cmd(req, struct io_sync);

	if (unlikely(sqe->addr || sqe->buf_index || sqe->splice_fd_in))
		return -EINVAL;

	sync->flags = READ_ONCE(sqe->fsync_flags);
	if (unlikely(sync->flags & ~IORING_FSYNC_DATASYNC))
		return -EINVAL;

	sync->off = READ_ONCE(sqe->off);
	sync->len = READ_ONCE(sqe->len);
	req->flags |= REQ_F_FORCE_ASYNC;
	return 0;
}

int io_fsync(struct io_kiocb *req, unsigned int issue_flags)
{
	struct io_sync *sync = io_kiocb_to_cmd(req, struct io_sync);
	loff_t end = sync->off + sync->len;
	int ret;

	/* fsync always requires a blocking context */
	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);

	ret = vfs_fsync_range(req->file, sync->off, end > 0 ? end : LLONG_MAX,
			      sync->flags & IORING_FSYNC_DATASYNC);
	io_req_set_res(req, ret, 0);
	return IOU_OK;
}

int io_fallocate_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
	struct io_sync *sync = io_kiocb_to_cmd(req, struct io_sync);

	if (sqe->buf_index || sqe->rw_flags || sqe->splice_fd_in)
		return -EINVAL;

	sync->off = READ_ONCE(sqe->off);
	sync->len = READ_ONCE(sqe->addr);
	sync->mode = READ_ONCE(sqe->len);
	req->flags |= REQ_F_FORCE_ASYNC;
	return 0;
}

int io_fallocate(struct io_kiocb *req, unsigned int issue_flags)
{
	struct io_sync *sync = io_kiocb_to_cmd(req, struct io_sync);
	int ret;

	/* fallocate always requires a blocking context */
	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);

	ret = vfs_fallocate(req->file, sync->mode, sync->off, sync->len);
	if (ret >= 0)
		fsnotify_modify(req->file);
	io_req_set_res(req, ret, 0);
	return IOU_OK;
}
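io_fsync_prep() and io_fsync() above service IORING_OP_FSYNC submissions: the only accepted flag is IORING_FSYNC_DATASYNC, execution is forced to an async (blocking-capable) context via REQ_F_FORCE_ASYNC, and the result is reported in the request's CQE. A minimal user-space sketch follows; it assumes the separate liburing library (not part of this file) and an already-creatable test file, and simply queues one fsync and waits for its completion.

/* Hypothetical user-space example using liburing: submit a single
 * IORING_OP_FSYNC with IORING_FSYNC_DATASYNC and wait for the completion.
 */
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int fd, ret;

	fd = open("testfile", O_WRONLY | O_CREAT, 0644);
	if (fd < 0)
		return 1;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_fsync(sqe, fd, IORING_FSYNC_DATASYNC);

	io_uring_submit(&ring);
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret) {
		/* cqe->res is 0 on success or a negative errno, as set by io_fsync() */
		printf("fsync result: %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}
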
// SPDX-License-Identifier: GPL-2.0
/* Copyright (C) B.A.T.M.A.N.
contributors: * * Matthias Schiffer */ #include "netlink.h" #include "main.h" #include <linux/array_size.h> #include <linux/atomic.h> #include <linux/bitops.h> #include <linux/bug.h> #include <linux/byteorder/generic.h> #include <linux/cache.h> #include <linux/err.h> #include <linux/errno.h> #include <linux/gfp.h> #include <linux/if_ether.h> #include <linux/if_vlan.h> #include <linux/init.h> #include <linux/limits.h> #include <linux/list.h> #include <linux/minmax.h> #include <linux/netdevice.h> #include <linux/netlink.h> #include <linux/printk.h> #include <linux/rtnetlink.h> #include <linux/skbuff.h> #include <linux/stddef.h> #include <linux/types.h> #include <net/genetlink.h> #include <net/net_namespace.h> #include <net/netlink.h> #include <net/sock.h> #include <uapi/linux/batadv_packet.h> #include <uapi/linux/batman_adv.h> #include "bat_algo.h" #include "bridge_loop_avoidance.h" #include "distributed-arp-table.h" #include "gateway_client.h" #include "gateway_common.h" #include "hard-interface.h" #include "log.h" #include "multicast.h" #include "network-coding.h" #include "originator.h" #include "soft-interface.h" #include "tp_meter.h" #include "translation-table.h" struct genl_family batadv_netlink_family; /* multicast groups */ enum batadv_netlink_multicast_groups { BATADV_NL_MCGRP_CONFIG, BATADV_NL_MCGRP_TPMETER, }; /** * enum batadv_genl_ops_flags - flags for genl_ops's internal_flags */ enum batadv_genl_ops_flags { /** * @BATADV_FLAG_NEED_MESH: request requires valid soft interface in * attribute BATADV_ATTR_MESH_IFINDEX and expects a pointer to it to be * saved in info->user_ptr[0] */ BATADV_FLAG_NEED_MESH = BIT(0), /** * @BATADV_FLAG_NEED_HARDIF: request requires valid hard interface in * attribute BATADV_ATTR_HARD_IFINDEX and expects a pointer to it to be * saved in info->user_ptr[1] */ BATADV_FLAG_NEED_HARDIF = BIT(1), /** * @BATADV_FLAG_NEED_VLAN: request requires valid vlan in * attribute BATADV_ATTR_VLANID and expects a pointer to it to be * saved in info->user_ptr[1] */ BATADV_FLAG_NEED_VLAN = BIT(2), }; static const struct genl_multicast_group batadv_netlink_mcgrps[] = { [BATADV_NL_MCGRP_CONFIG] = { .name = BATADV_NL_MCAST_GROUP_CONFIG }, [BATADV_NL_MCGRP_TPMETER] = { .name = BATADV_NL_MCAST_GROUP_TPMETER }, }; static const struct nla_policy batadv_netlink_policy[NUM_BATADV_ATTR] = { [BATADV_ATTR_VERSION] = { .type = NLA_STRING }, [BATADV_ATTR_ALGO_NAME] = { .type = NLA_STRING }, [BATADV_ATTR_MESH_IFINDEX] = { .type = NLA_U32 }, [BATADV_ATTR_MESH_IFNAME] = { .type = NLA_STRING }, [BATADV_ATTR_MESH_ADDRESS] = { .len = ETH_ALEN }, [BATADV_ATTR_HARD_IFINDEX] = { .type = NLA_U32 }, [BATADV_ATTR_HARD_IFNAME] = { .type = NLA_STRING }, [BATADV_ATTR_HARD_ADDRESS] = { .len = ETH_ALEN }, [BATADV_ATTR_ORIG_ADDRESS] = { .len = ETH_ALEN }, [BATADV_ATTR_TPMETER_RESULT] = { .type = NLA_U8 }, [BATADV_ATTR_TPMETER_TEST_TIME] = { .type = NLA_U32 }, [BATADV_ATTR_TPMETER_BYTES] = { .type = NLA_U64 }, [BATADV_ATTR_TPMETER_COOKIE] = { .type = NLA_U32 }, [BATADV_ATTR_ACTIVE] = { .type = NLA_FLAG }, [BATADV_ATTR_TT_ADDRESS] = { .len = ETH_ALEN }, [BATADV_ATTR_TT_TTVN] = { .type = NLA_U8 }, [BATADV_ATTR_TT_LAST_TTVN] = { .type = NLA_U8 }, [BATADV_ATTR_TT_CRC32] = { .type = NLA_U32 }, [BATADV_ATTR_TT_VID] = { .type = NLA_U16 }, [BATADV_ATTR_TT_FLAGS] = { .type = NLA_U32 }, [BATADV_ATTR_FLAG_BEST] = { .type = NLA_FLAG }, [BATADV_ATTR_LAST_SEEN_MSECS] = { .type = NLA_U32 }, [BATADV_ATTR_NEIGH_ADDRESS] = { .len = ETH_ALEN }, [BATADV_ATTR_TQ] = { .type = NLA_U8 }, [BATADV_ATTR_THROUGHPUT] = { .type = 
NLA_U32 }, [BATADV_ATTR_BANDWIDTH_UP] = { .type = NLA_U32 }, [BATADV_ATTR_BANDWIDTH_DOWN] = { .type = NLA_U32 }, [BATADV_ATTR_ROUTER] = { .len = ETH_ALEN }, [BATADV_ATTR_BLA_OWN] = { .type = NLA_FLAG }, [BATADV_ATTR_BLA_ADDRESS] = { .len = ETH_ALEN }, [BATADV_ATTR_BLA_VID] = { .type = NLA_U16 }, [BATADV_ATTR_BLA_BACKBONE] = { .len = ETH_ALEN }, [BATADV_ATTR_BLA_CRC] = { .type = NLA_U16 }, [BATADV_ATTR_DAT_CACHE_IP4ADDRESS] = { .type = NLA_U32 }, [BATADV_ATTR_DAT_CACHE_HWADDRESS] = { .len = ETH_ALEN }, [BATADV_ATTR_DAT_CACHE_VID] = { .type = NLA_U16 }, [BATADV_ATTR_MCAST_FLAGS] = { .type = NLA_U32 }, [BATADV_ATTR_MCAST_FLAGS_PRIV] = { .type = NLA_U32 }, [BATADV_ATTR_VLANID] = { .type = NLA_U16 }, [BATADV_ATTR_AGGREGATED_OGMS_ENABLED] = { .type = NLA_U8 }, [BATADV_ATTR_AP_ISOLATION_ENABLED] = { .type = NLA_U8 }, [BATADV_ATTR_ISOLATION_MARK] = { .type = NLA_U32 }, [BATADV_ATTR_ISOLATION_MASK] = { .type = NLA_U32 }, [BATADV_ATTR_BONDING_ENABLED] = { .type = NLA_U8 }, [BATADV_ATTR_BRIDGE_LOOP_AVOIDANCE_ENABLED] = { .type = NLA_U8 }, [BATADV_ATTR_DISTRIBUTED_ARP_TABLE_ENABLED] = { .type = NLA_U8 }, [BATADV_ATTR_FRAGMENTATION_ENABLED] = { .type = NLA_U8 }, [BATADV_ATTR_GW_BANDWIDTH_DOWN] = { .type = NLA_U32 }, [BATADV_ATTR_GW_BANDWIDTH_UP] = { .type = NLA_U32 }, [BATADV_ATTR_GW_MODE] = { .type = NLA_U8 }, [BATADV_ATTR_GW_SEL_CLASS] = { .type = NLA_U32 }, [BATADV_ATTR_HOP_PENALTY] = { .type = NLA_U8 }, [BATADV_ATTR_LOG_LEVEL] = { .type = NLA_U32 }, [BATADV_ATTR_MULTICAST_FORCEFLOOD_ENABLED] = { .type = NLA_U8 }, [BATADV_ATTR_MULTICAST_FANOUT] = { .type = NLA_U32 }, [BATADV_ATTR_NETWORK_CODING_ENABLED] = { .type = NLA_U8 }, [BATADV_ATTR_ORIG_INTERVAL] = { .type = NLA_U32 }, [BATADV_ATTR_ELP_INTERVAL] = { .type = NLA_U32 }, [BATADV_ATTR_THROUGHPUT_OVERRIDE] = { .type = NLA_U32 }, }; /** * batadv_netlink_get_ifindex() - Extract an interface index from a message * @nlh: Message header * @attrtype: Attribute which holds an interface index * * Return: interface index, or 0. */ int batadv_netlink_get_ifindex(const struct nlmsghdr *nlh, int attrtype) { struct nlattr *attr = nlmsg_find_attr(nlh, GENL_HDRLEN, attrtype); return (attr && nla_len(attr) == sizeof(u32)) ? 
nla_get_u32(attr) : 0; } /** * batadv_netlink_mesh_fill_ap_isolation() - Add ap_isolation softif attribute * @msg: Netlink message to dump into * @bat_priv: the bat priv with all the soft interface information * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_mesh_fill_ap_isolation(struct sk_buff *msg, struct batadv_priv *bat_priv) { struct batadv_softif_vlan *vlan; u8 ap_isolation; vlan = batadv_softif_vlan_get(bat_priv, BATADV_NO_FLAGS); if (!vlan) return 0; ap_isolation = atomic_read(&vlan->ap_isolation); batadv_softif_vlan_put(vlan); return nla_put_u8(msg, BATADV_ATTR_AP_ISOLATION_ENABLED, !!ap_isolation); } /** * batadv_netlink_set_mesh_ap_isolation() - Set ap_isolation from genl msg * @attr: parsed BATADV_ATTR_AP_ISOLATION_ENABLED attribute * @bat_priv: the bat priv with all the soft interface information * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_set_mesh_ap_isolation(struct nlattr *attr, struct batadv_priv *bat_priv) { struct batadv_softif_vlan *vlan; vlan = batadv_softif_vlan_get(bat_priv, BATADV_NO_FLAGS); if (!vlan) return -ENOENT; atomic_set(&vlan->ap_isolation, !!nla_get_u8(attr)); batadv_softif_vlan_put(vlan); return 0; } /** * batadv_netlink_mesh_fill() - Fill message with mesh attributes * @msg: Netlink message to dump into * @bat_priv: the bat priv with all the soft interface information * @cmd: type of message to generate * @portid: Port making netlink request * @seq: sequence number for message * @flags: Additional flags for message * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_mesh_fill(struct sk_buff *msg, struct batadv_priv *bat_priv, enum batadv_nl_commands cmd, u32 portid, u32 seq, int flags) { struct net_device *soft_iface = bat_priv->soft_iface; struct batadv_hard_iface *primary_if = NULL; struct net_device *hard_iface; void *hdr; hdr = genlmsg_put(msg, portid, seq, &batadv_netlink_family, flags, cmd); if (!hdr) return -ENOBUFS; if (nla_put_string(msg, BATADV_ATTR_VERSION, BATADV_SOURCE_VERSION) || nla_put_string(msg, BATADV_ATTR_ALGO_NAME, bat_priv->algo_ops->name) || nla_put_u32(msg, BATADV_ATTR_MESH_IFINDEX, soft_iface->ifindex) || nla_put_string(msg, BATADV_ATTR_MESH_IFNAME, soft_iface->name) || nla_put(msg, BATADV_ATTR_MESH_ADDRESS, ETH_ALEN, soft_iface->dev_addr) || nla_put_u8(msg, BATADV_ATTR_TT_TTVN, (u8)atomic_read(&bat_priv->tt.vn))) goto nla_put_failure; #ifdef CONFIG_BATMAN_ADV_BLA if (nla_put_u16(msg, BATADV_ATTR_BLA_CRC, ntohs(bat_priv->bla.claim_dest.group))) goto nla_put_failure; #endif if (batadv_mcast_mesh_info_put(msg, bat_priv)) goto nla_put_failure; primary_if = batadv_primary_if_get_selected(bat_priv); if (primary_if && primary_if->if_status == BATADV_IF_ACTIVE) { hard_iface = primary_if->net_dev; if (nla_put_u32(msg, BATADV_ATTR_HARD_IFINDEX, hard_iface->ifindex) || nla_put_string(msg, BATADV_ATTR_HARD_IFNAME, hard_iface->name) || nla_put(msg, BATADV_ATTR_HARD_ADDRESS, ETH_ALEN, hard_iface->dev_addr)) goto nla_put_failure; } if (nla_put_u8(msg, BATADV_ATTR_AGGREGATED_OGMS_ENABLED, !!atomic_read(&bat_priv->aggregated_ogms))) goto nla_put_failure; if (batadv_netlink_mesh_fill_ap_isolation(msg, bat_priv)) goto nla_put_failure; if (nla_put_u32(msg, BATADV_ATTR_ISOLATION_MARK, bat_priv->isolation_mark)) goto nla_put_failure; if (nla_put_u32(msg, BATADV_ATTR_ISOLATION_MASK, bat_priv->isolation_mark_mask)) goto nla_put_failure; if (nla_put_u8(msg, BATADV_ATTR_BONDING_ENABLED, 
!!atomic_read(&bat_priv->bonding))) goto nla_put_failure; #ifdef CONFIG_BATMAN_ADV_BLA if (nla_put_u8(msg, BATADV_ATTR_BRIDGE_LOOP_AVOIDANCE_ENABLED, !!atomic_read(&bat_priv->bridge_loop_avoidance))) goto nla_put_failure; #endif /* CONFIG_BATMAN_ADV_BLA */ #ifdef CONFIG_BATMAN_ADV_DAT if (nla_put_u8(msg, BATADV_ATTR_DISTRIBUTED_ARP_TABLE_ENABLED, !!atomic_read(&bat_priv->distributed_arp_table))) goto nla_put_failure; #endif /* CONFIG_BATMAN_ADV_DAT */ if (nla_put_u8(msg, BATADV_ATTR_FRAGMENTATION_ENABLED, !!atomic_read(&bat_priv->fragmentation))) goto nla_put_failure; if (nla_put_u32(msg, BATADV_ATTR_GW_BANDWIDTH_DOWN, atomic_read(&bat_priv->gw.bandwidth_down))) goto nla_put_failure; if (nla_put_u32(msg, BATADV_ATTR_GW_BANDWIDTH_UP, atomic_read(&bat_priv->gw.bandwidth_up))) goto nla_put_failure; if (nla_put_u8(msg, BATADV_ATTR_GW_MODE, atomic_read(&bat_priv->gw.mode))) goto nla_put_failure; if (bat_priv->algo_ops->gw.get_best_gw_node && bat_priv->algo_ops->gw.is_eligible) { /* GW selection class is not available if the routing algorithm * in use does not implement the GW API */ if (nla_put_u32(msg, BATADV_ATTR_GW_SEL_CLASS, atomic_read(&bat_priv->gw.sel_class))) goto nla_put_failure; } if (nla_put_u8(msg, BATADV_ATTR_HOP_PENALTY, atomic_read(&bat_priv->hop_penalty))) goto nla_put_failure; #ifdef CONFIG_BATMAN_ADV_DEBUG if (nla_put_u32(msg, BATADV_ATTR_LOG_LEVEL, atomic_read(&bat_priv->log_level))) goto nla_put_failure; #endif /* CONFIG_BATMAN_ADV_DEBUG */ #ifdef CONFIG_BATMAN_ADV_MCAST if (nla_put_u8(msg, BATADV_ATTR_MULTICAST_FORCEFLOOD_ENABLED, !atomic_read(&bat_priv->multicast_mode))) goto nla_put_failure; if (nla_put_u32(msg, BATADV_ATTR_MULTICAST_FANOUT, atomic_read(&bat_priv->multicast_fanout))) goto nla_put_failure; #endif /* CONFIG_BATMAN_ADV_MCAST */ #ifdef CONFIG_BATMAN_ADV_NC if (nla_put_u8(msg, BATADV_ATTR_NETWORK_CODING_ENABLED, !!atomic_read(&bat_priv->network_coding))) goto nla_put_failure; #endif /* CONFIG_BATMAN_ADV_NC */ if (nla_put_u32(msg, BATADV_ATTR_ORIG_INTERVAL, atomic_read(&bat_priv->orig_interval))) goto nla_put_failure; batadv_hardif_put(primary_if); genlmsg_end(msg, hdr); return 0; nla_put_failure: batadv_hardif_put(primary_if); genlmsg_cancel(msg, hdr); return -EMSGSIZE; } /** * batadv_netlink_notify_mesh() - send softif attributes to listener * @bat_priv: the bat priv with all the soft interface information * * Return: 0 on success, < 0 on error */ static int batadv_netlink_notify_mesh(struct batadv_priv *bat_priv) { struct sk_buff *msg; int ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; ret = batadv_netlink_mesh_fill(msg, bat_priv, BATADV_CMD_SET_MESH, 0, 0, 0); if (ret < 0) { nlmsg_free(msg); return ret; } genlmsg_multicast_netns(&batadv_netlink_family, dev_net(bat_priv->soft_iface), msg, 0, BATADV_NL_MCGRP_CONFIG, GFP_KERNEL); return 0; } /** * batadv_netlink_get_mesh() - Get softif attributes * @skb: Netlink message with request data * @info: receiver information * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_get_mesh(struct sk_buff *skb, struct genl_info *info) { struct batadv_priv *bat_priv = info->user_ptr[0]; struct sk_buff *msg; int ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; ret = batadv_netlink_mesh_fill(msg, bat_priv, BATADV_CMD_GET_MESH, info->snd_portid, info->snd_seq, 0); if (ret < 0) { nlmsg_free(msg); return ret; } ret = genlmsg_reply(msg, info); return ret; } /** * batadv_netlink_set_mesh() - Set softif attributes * @skb: 
Netlink message with request data * @info: receiver information * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_set_mesh(struct sk_buff *skb, struct genl_info *info) { struct batadv_priv *bat_priv = info->user_ptr[0]; struct nlattr *attr; if (info->attrs[BATADV_ATTR_AGGREGATED_OGMS_ENABLED]) { attr = info->attrs[BATADV_ATTR_AGGREGATED_OGMS_ENABLED]; atomic_set(&bat_priv->aggregated_ogms, !!nla_get_u8(attr)); } if (info->attrs[BATADV_ATTR_AP_ISOLATION_ENABLED]) { attr = info->attrs[BATADV_ATTR_AP_ISOLATION_ENABLED]; batadv_netlink_set_mesh_ap_isolation(attr, bat_priv); } if (info->attrs[BATADV_ATTR_ISOLATION_MARK]) { attr = info->attrs[BATADV_ATTR_ISOLATION_MARK]; bat_priv->isolation_mark = nla_get_u32(attr); } if (info->attrs[BATADV_ATTR_ISOLATION_MASK]) { attr = info->attrs[BATADV_ATTR_ISOLATION_MASK]; bat_priv->isolation_mark_mask = nla_get_u32(attr); } if (info->attrs[BATADV_ATTR_BONDING_ENABLED]) { attr = info->attrs[BATADV_ATTR_BONDING_ENABLED]; atomic_set(&bat_priv->bonding, !!nla_get_u8(attr)); } #ifdef CONFIG_BATMAN_ADV_BLA if (info->attrs[BATADV_ATTR_BRIDGE_LOOP_AVOIDANCE_ENABLED]) { attr = info->attrs[BATADV_ATTR_BRIDGE_LOOP_AVOIDANCE_ENABLED]; atomic_set(&bat_priv->bridge_loop_avoidance, !!nla_get_u8(attr)); batadv_bla_status_update(bat_priv->soft_iface); } #endif /* CONFIG_BATMAN_ADV_BLA */ #ifdef CONFIG_BATMAN_ADV_DAT if (info->attrs[BATADV_ATTR_DISTRIBUTED_ARP_TABLE_ENABLED]) { attr = info->attrs[BATADV_ATTR_DISTRIBUTED_ARP_TABLE_ENABLED]; atomic_set(&bat_priv->distributed_arp_table, !!nla_get_u8(attr)); batadv_dat_status_update(bat_priv->soft_iface); } #endif /* CONFIG_BATMAN_ADV_DAT */ if (info->attrs[BATADV_ATTR_FRAGMENTATION_ENABLED]) { attr = info->attrs[BATADV_ATTR_FRAGMENTATION_ENABLED]; atomic_set(&bat_priv->fragmentation, !!nla_get_u8(attr)); rtnl_lock(); batadv_update_min_mtu(bat_priv->soft_iface); rtnl_unlock(); } if (info->attrs[BATADV_ATTR_GW_BANDWIDTH_DOWN]) { attr = info->attrs[BATADV_ATTR_GW_BANDWIDTH_DOWN]; atomic_set(&bat_priv->gw.bandwidth_down, nla_get_u32(attr)); batadv_gw_tvlv_container_update(bat_priv); } if (info->attrs[BATADV_ATTR_GW_BANDWIDTH_UP]) { attr = info->attrs[BATADV_ATTR_GW_BANDWIDTH_UP]; atomic_set(&bat_priv->gw.bandwidth_up, nla_get_u32(attr)); batadv_gw_tvlv_container_update(bat_priv); } if (info->attrs[BATADV_ATTR_GW_MODE]) { u8 gw_mode; attr = info->attrs[BATADV_ATTR_GW_MODE]; gw_mode = nla_get_u8(attr); if (gw_mode <= BATADV_GW_MODE_SERVER) { /* Invoking batadv_gw_reselect() is not enough to really * de-select the current GW. It will only instruct the * gateway client code to perform a re-election the next * time that this is needed. * * When gw client mode is being switched off the current * GW must be de-selected explicitly otherwise no GW_ADD * uevent is thrown on client mode re-activation. This * is operation is performed in * batadv_gw_check_client_stop(). 
*/ batadv_gw_reselect(bat_priv); /* always call batadv_gw_check_client_stop() before * changing the gateway state */ batadv_gw_check_client_stop(bat_priv); atomic_set(&bat_priv->gw.mode, gw_mode); batadv_gw_tvlv_container_update(bat_priv); } } if (info->attrs[BATADV_ATTR_GW_SEL_CLASS] && bat_priv->algo_ops->gw.get_best_gw_node && bat_priv->algo_ops->gw.is_eligible) { /* setting the GW selection class is allowed only if the routing * algorithm in use implements the GW API */ u32 sel_class_max = bat_priv->algo_ops->gw.sel_class_max; u32 sel_class; attr = info->attrs[BATADV_ATTR_GW_SEL_CLASS]; sel_class = nla_get_u32(attr); if (sel_class >= 1 && sel_class <= sel_class_max) { atomic_set(&bat_priv->gw.sel_class, sel_class); batadv_gw_reselect(bat_priv); } } if (info->attrs[BATADV_ATTR_HOP_PENALTY]) { attr = info->attrs[BATADV_ATTR_HOP_PENALTY]; atomic_set(&bat_priv->hop_penalty, nla_get_u8(attr)); } #ifdef CONFIG_BATMAN_ADV_DEBUG if (info->attrs[BATADV_ATTR_LOG_LEVEL]) { attr = info->attrs[BATADV_ATTR_LOG_LEVEL]; atomic_set(&bat_priv->log_level, nla_get_u32(attr) & BATADV_DBG_ALL); } #endif /* CONFIG_BATMAN_ADV_DEBUG */ #ifdef CONFIG_BATMAN_ADV_MCAST if (info->attrs[BATADV_ATTR_MULTICAST_FORCEFLOOD_ENABLED]) { attr = info->attrs[BATADV_ATTR_MULTICAST_FORCEFLOOD_ENABLED]; atomic_set(&bat_priv->multicast_mode, !nla_get_u8(attr)); } if (info->attrs[BATADV_ATTR_MULTICAST_FANOUT]) { attr = info->attrs[BATADV_ATTR_MULTICAST_FANOUT]; atomic_set(&bat_priv->multicast_fanout, nla_get_u32(attr)); } #endif /* CONFIG_BATMAN_ADV_MCAST */ #ifdef CONFIG_BATMAN_ADV_NC if (info->attrs[BATADV_ATTR_NETWORK_CODING_ENABLED]) { attr = info->attrs[BATADV_ATTR_NETWORK_CODING_ENABLED]; atomic_set(&bat_priv->network_coding, !!nla_get_u8(attr)); batadv_nc_status_update(bat_priv->soft_iface); } #endif /* CONFIG_BATMAN_ADV_NC */ if (info->attrs[BATADV_ATTR_ORIG_INTERVAL]) { u32 orig_interval; attr = info->attrs[BATADV_ATTR_ORIG_INTERVAL]; orig_interval = nla_get_u32(attr); orig_interval = min_t(u32, orig_interval, INT_MAX); orig_interval = max_t(u32, orig_interval, 2 * BATADV_JITTER); atomic_set(&bat_priv->orig_interval, orig_interval); } batadv_netlink_notify_mesh(bat_priv); return 0; } /** * batadv_netlink_tp_meter_put() - Fill information of started tp_meter session * @msg: netlink message to be sent back * @cookie: tp meter session cookie * * Return: 0 on success, < 0 on error */ static int batadv_netlink_tp_meter_put(struct sk_buff *msg, u32 cookie) { if (nla_put_u32(msg, BATADV_ATTR_TPMETER_COOKIE, cookie)) return -ENOBUFS; return 0; } /** * batadv_netlink_tpmeter_notify() - send tp_meter result via netlink to client * @bat_priv: the bat priv with all the soft interface information * @dst: destination of tp_meter session * @result: reason for tp meter session stop * @test_time: total time of the tp_meter session * @total_bytes: bytes acked to the receiver * @cookie: cookie of tp_meter session * * Return: 0 on success, < 0 on error */ int batadv_netlink_tpmeter_notify(struct batadv_priv *bat_priv, const u8 *dst, u8 result, u32 test_time, u64 total_bytes, u32 cookie) { struct sk_buff *msg; void *hdr; int ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; hdr = genlmsg_put(msg, 0, 0, &batadv_netlink_family, 0, BATADV_CMD_TP_METER); if (!hdr) { ret = -ENOBUFS; goto err_genlmsg; } if (nla_put_u32(msg, BATADV_ATTR_TPMETER_COOKIE, cookie)) goto nla_put_failure; if (nla_put_u32(msg, BATADV_ATTR_TPMETER_TEST_TIME, test_time)) goto nla_put_failure; if (nla_put_u64_64bit(msg, 
BATADV_ATTR_TPMETER_BYTES, total_bytes, BATADV_ATTR_PAD)) goto nla_put_failure; if (nla_put_u8(msg, BATADV_ATTR_TPMETER_RESULT, result)) goto nla_put_failure; if (nla_put(msg, BATADV_ATTR_ORIG_ADDRESS, ETH_ALEN, dst)) goto nla_put_failure; genlmsg_end(msg, hdr); genlmsg_multicast_netns(&batadv_netlink_family, dev_net(bat_priv->soft_iface), msg, 0, BATADV_NL_MCGRP_TPMETER, GFP_KERNEL); return 0; nla_put_failure: genlmsg_cancel(msg, hdr); ret = -EMSGSIZE; err_genlmsg: nlmsg_free(msg); return ret; } /** * batadv_netlink_tp_meter_start() - Start a new tp_meter session * @skb: received netlink message * @info: receiver information * * Return: 0 on success, < 0 on error */ static int batadv_netlink_tp_meter_start(struct sk_buff *skb, struct genl_info *info) { struct batadv_priv *bat_priv = info->user_ptr[0]; struct sk_buff *msg = NULL; u32 test_length; void *msg_head; u32 cookie; u8 *dst; int ret; if (!info->attrs[BATADV_ATTR_ORIG_ADDRESS]) return -EINVAL; if (!info->attrs[BATADV_ATTR_TPMETER_TEST_TIME]) return -EINVAL; dst = nla_data(info->attrs[BATADV_ATTR_ORIG_ADDRESS]); test_length = nla_get_u32(info->attrs[BATADV_ATTR_TPMETER_TEST_TIME]); msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) { ret = -ENOMEM; goto out; } msg_head = genlmsg_put(msg, info->snd_portid, info->snd_seq, &batadv_netlink_family, 0, BATADV_CMD_TP_METER); if (!msg_head) { ret = -ENOBUFS; goto out; } batadv_tp_start(bat_priv, dst, test_length, &cookie); ret = batadv_netlink_tp_meter_put(msg, cookie); out: if (ret) { if (msg) nlmsg_free(msg); return ret; } genlmsg_end(msg, msg_head); return genlmsg_reply(msg, info); } /** * batadv_netlink_tp_meter_cancel() - Cancel a running tp_meter session * @skb: received netlink message * @info: receiver information * * Return: 0 on success, < 0 on error */ static int batadv_netlink_tp_meter_cancel(struct sk_buff *skb, struct genl_info *info) { struct batadv_priv *bat_priv = info->user_ptr[0]; u8 *dst; int ret = 0; if (!info->attrs[BATADV_ATTR_ORIG_ADDRESS]) return -EINVAL; dst = nla_data(info->attrs[BATADV_ATTR_ORIG_ADDRESS]); batadv_tp_stop(bat_priv, dst, BATADV_TP_REASON_CANCEL); return ret; } /** * batadv_netlink_hardif_fill() - Fill message with hardif attributes * @msg: Netlink message to dump into * @bat_priv: the bat priv with all the soft interface information * @hard_iface: hard interface which was modified * @cmd: type of message to generate * @portid: Port making netlink request * @seq: sequence number for message * @flags: Additional flags for message * @cb: Control block containing additional options * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_hardif_fill(struct sk_buff *msg, struct batadv_priv *bat_priv, struct batadv_hard_iface *hard_iface, enum batadv_nl_commands cmd, u32 portid, u32 seq, int flags, struct netlink_callback *cb) { struct net_device *net_dev = hard_iface->net_dev; void *hdr; hdr = genlmsg_put(msg, portid, seq, &batadv_netlink_family, flags, cmd); if (!hdr) return -ENOBUFS; if (cb) genl_dump_check_consistent(cb, hdr); if (nla_put_u32(msg, BATADV_ATTR_MESH_IFINDEX, bat_priv->soft_iface->ifindex)) goto nla_put_failure; if (nla_put_string(msg, BATADV_ATTR_MESH_IFNAME, bat_priv->soft_iface->name)) goto nla_put_failure; if (nla_put_u32(msg, BATADV_ATTR_HARD_IFINDEX, net_dev->ifindex) || nla_put_string(msg, BATADV_ATTR_HARD_IFNAME, net_dev->name) || nla_put(msg, BATADV_ATTR_HARD_ADDRESS, ETH_ALEN, net_dev->dev_addr)) goto nla_put_failure; if (hard_iface->if_status == BATADV_IF_ACTIVE) { if 
(nla_put_flag(msg, BATADV_ATTR_ACTIVE)) goto nla_put_failure; } if (nla_put_u8(msg, BATADV_ATTR_HOP_PENALTY, atomic_read(&hard_iface->hop_penalty))) goto nla_put_failure; #ifdef CONFIG_BATMAN_ADV_BATMAN_V if (nla_put_u32(msg, BATADV_ATTR_ELP_INTERVAL, atomic_read(&hard_iface->bat_v.elp_interval))) goto nla_put_failure; if (nla_put_u32(msg, BATADV_ATTR_THROUGHPUT_OVERRIDE, atomic_read(&hard_iface->bat_v.throughput_override))) goto nla_put_failure; #endif /* CONFIG_BATMAN_ADV_BATMAN_V */ genlmsg_end(msg, hdr); return 0; nla_put_failure: genlmsg_cancel(msg, hdr); return -EMSGSIZE; } /** * batadv_netlink_notify_hardif() - send hardif attributes to listener * @bat_priv: the bat priv with all the soft interface information * @hard_iface: hard interface which was modified * * Return: 0 on success, < 0 on error */ static int batadv_netlink_notify_hardif(struct batadv_priv *bat_priv, struct batadv_hard_iface *hard_iface) { struct sk_buff *msg; int ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; ret = batadv_netlink_hardif_fill(msg, bat_priv, hard_iface, BATADV_CMD_SET_HARDIF, 0, 0, 0, NULL); if (ret < 0) { nlmsg_free(msg); return ret; } genlmsg_multicast_netns(&batadv_netlink_family, dev_net(bat_priv->soft_iface), msg, 0, BATADV_NL_MCGRP_CONFIG, GFP_KERNEL); return 0; } /** * batadv_netlink_get_hardif() - Get hardif attributes * @skb: Netlink message with request data * @info: receiver information * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_get_hardif(struct sk_buff *skb, struct genl_info *info) { struct batadv_hard_iface *hard_iface = info->user_ptr[1]; struct batadv_priv *bat_priv = info->user_ptr[0]; struct sk_buff *msg; int ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; ret = batadv_netlink_hardif_fill(msg, bat_priv, hard_iface, BATADV_CMD_GET_HARDIF, info->snd_portid, info->snd_seq, 0, NULL); if (ret < 0) { nlmsg_free(msg); return ret; } ret = genlmsg_reply(msg, info); return ret; } /** * batadv_netlink_set_hardif() - Set hardif attributes * @skb: Netlink message with request data * @info: receiver information * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_set_hardif(struct sk_buff *skb, struct genl_info *info) { struct batadv_hard_iface *hard_iface = info->user_ptr[1]; struct batadv_priv *bat_priv = info->user_ptr[0]; struct nlattr *attr; if (info->attrs[BATADV_ATTR_HOP_PENALTY]) { attr = info->attrs[BATADV_ATTR_HOP_PENALTY]; atomic_set(&hard_iface->hop_penalty, nla_get_u8(attr)); } #ifdef CONFIG_BATMAN_ADV_BATMAN_V if (info->attrs[BATADV_ATTR_ELP_INTERVAL]) { attr = info->attrs[BATADV_ATTR_ELP_INTERVAL]; atomic_set(&hard_iface->bat_v.elp_interval, nla_get_u32(attr)); } if (info->attrs[BATADV_ATTR_THROUGHPUT_OVERRIDE]) { attr = info->attrs[BATADV_ATTR_THROUGHPUT_OVERRIDE]; atomic_set(&hard_iface->bat_v.throughput_override, nla_get_u32(attr)); } #endif /* CONFIG_BATMAN_ADV_BATMAN_V */ batadv_netlink_notify_hardif(bat_priv, hard_iface); return 0; } /** * batadv_netlink_dump_hardif() - Dump all hard interface into a messages * @msg: Netlink message to dump into * @cb: Parameters from query * * Return: error code, or length of reply message on success */ static int batadv_netlink_dump_hardif(struct sk_buff *msg, struct netlink_callback *cb) { struct net *net = sock_net(cb->skb->sk); struct net_device *soft_iface; struct batadv_hard_iface *hard_iface; struct batadv_priv *bat_priv; int ifindex; int portid = 
NETLINK_CB(cb->skb).portid; int skip = cb->args[0]; int i = 0; ifindex = batadv_netlink_get_ifindex(cb->nlh, BATADV_ATTR_MESH_IFINDEX); if (!ifindex) return -EINVAL; soft_iface = dev_get_by_index(net, ifindex); if (!soft_iface) return -ENODEV; if (!batadv_softif_is_valid(soft_iface)) { dev_put(soft_iface); return -ENODEV; } bat_priv = netdev_priv(soft_iface); rtnl_lock(); cb->seq = batadv_hardif_generation << 1 | 1; list_for_each_entry(hard_iface, &batadv_hardif_list, list) { if (hard_iface->soft_iface != soft_iface) continue; if (i++ < skip) continue; if (batadv_netlink_hardif_fill(msg, bat_priv, hard_iface, BATADV_CMD_GET_HARDIF, portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, cb)) { i--; break; } } rtnl_unlock(); dev_put(soft_iface); cb->args[0] = i; return msg->len; } /** * batadv_netlink_vlan_fill() - Fill message with vlan attributes * @msg: Netlink message to dump into * @bat_priv: the bat priv with all the soft interface information * @vlan: vlan which was modified * @cmd: type of message to generate * @portid: Port making netlink request * @seq: sequence number for message * @flags: Additional flags for message * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_vlan_fill(struct sk_buff *msg, struct batadv_priv *bat_priv, struct batadv_softif_vlan *vlan, enum batadv_nl_commands cmd, u32 portid, u32 seq, int flags) { void *hdr; hdr = genlmsg_put(msg, portid, seq, &batadv_netlink_family, flags, cmd); if (!hdr) return -ENOBUFS; if (nla_put_u32(msg, BATADV_ATTR_MESH_IFINDEX, bat_priv->soft_iface->ifindex)) goto nla_put_failure; if (nla_put_string(msg, BATADV_ATTR_MESH_IFNAME, bat_priv->soft_iface->name)) goto nla_put_failure; if (nla_put_u32(msg, BATADV_ATTR_VLANID, vlan->vid & VLAN_VID_MASK)) goto nla_put_failure; if (nla_put_u8(msg, BATADV_ATTR_AP_ISOLATION_ENABLED, !!atomic_read(&vlan->ap_isolation))) goto nla_put_failure; genlmsg_end(msg, hdr); return 0; nla_put_failure: genlmsg_cancel(msg, hdr); return -EMSGSIZE; } /** * batadv_netlink_notify_vlan() - send vlan attributes to listener * @bat_priv: the bat priv with all the soft interface information * @vlan: vlan which was modified * * Return: 0 on success, < 0 on error */ static int batadv_netlink_notify_vlan(struct batadv_priv *bat_priv, struct batadv_softif_vlan *vlan) { struct sk_buff *msg; int ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; ret = batadv_netlink_vlan_fill(msg, bat_priv, vlan, BATADV_CMD_SET_VLAN, 0, 0, 0); if (ret < 0) { nlmsg_free(msg); return ret; } genlmsg_multicast_netns(&batadv_netlink_family, dev_net(bat_priv->soft_iface), msg, 0, BATADV_NL_MCGRP_CONFIG, GFP_KERNEL); return 0; } /** * batadv_netlink_get_vlan() - Get vlan attributes * @skb: Netlink message with request data * @info: receiver information * * Return: 0 on success or negative error number in case of failure */ static int batadv_netlink_get_vlan(struct sk_buff *skb, struct genl_info *info) { struct batadv_softif_vlan *vlan = info->user_ptr[1]; struct batadv_priv *bat_priv = info->user_ptr[0]; struct sk_buff *msg; int ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; ret = batadv_netlink_vlan_fill(msg, bat_priv, vlan, BATADV_CMD_GET_VLAN, info->snd_portid, info->snd_seq, 0); if (ret < 0) { nlmsg_free(msg); return ret; } ret = genlmsg_reply(msg, info); return ret; } /** * batadv_netlink_set_vlan() - Get vlan attributes * @skb: Netlink message with request data * @info: receiver information * * Return: 0 on success or negative error number in 
case of failure */ static int batadv_netlink_set_vlan(struct sk_buff *skb, struct genl_info *info) { struct batadv_softif_vlan *vlan = info->user_ptr[1]; struct batadv_priv *bat_priv = info->user_ptr[0]; struct nlattr *attr; if (info->attrs[BATADV_ATTR_AP_ISOLATION_ENABLED]) { attr = info->attrs[BATADV_ATTR_AP_ISOLATION_ENABLED]; atomic_set(&vlan->ap_isolation, !!nla_get_u8(attr)); } batadv_netlink_notify_vlan(bat_priv, vlan); return 0; } /** * batadv_get_softif_from_info() - Retrieve soft interface from genl attributes * @net: the applicable net namespace * @info: receiver information * * Return: Pointer to soft interface (with increased refcnt) on success, error * pointer on error */ static struct net_device * batadv_get_softif_from_info(struct net *net, struct genl_info *info) { struct net_device *soft_iface; int ifindex; if (!info->attrs[BATADV_ATTR_MESH_IFINDEX]) return ERR_PTR(-EINVAL); ifindex = nla_get_u32(info->attrs[BATADV_ATTR_MESH_IFINDEX]); soft_iface = dev_get_by_index(net, ifindex); if (!soft_iface) return ERR_PTR(-ENODEV); if (!batadv_softif_is_valid(soft_iface)) goto err_put_softif; return soft_iface; err_put_softif: dev_put(soft_iface); return ERR_PTR(-EINVAL); } /** * batadv_get_hardif_from_info() - Retrieve hardif from genl attributes * @bat_priv: the bat priv with all the soft interface information * @net: the applicable net namespace * @info: receiver information * * Return: Pointer to hard interface (with increased refcnt) on success, error * pointer on error */ static struct batadv_hard_iface * batadv_get_hardif_from_info(struct batadv_priv *bat_priv, struct net *net, struct genl_info *info) { struct batadv_hard_iface *hard_iface; struct net_device *hard_dev; unsigned int hardif_index; if (!info->attrs[BATADV_ATTR_HARD_IFINDEX]) return ERR_PTR(-EINVAL); hardif_index = nla_get_u32(info->attrs[BATADV_ATTR_HARD_IFINDEX]); hard_dev = dev_get_by_index(net, hardif_index); if (!hard_dev) return ERR_PTR(-ENODEV); hard_iface = batadv_hardif_get_by_netdev(hard_dev); if (!hard_iface) goto err_put_harddev; if (hard_iface->soft_iface != bat_priv->soft_iface) goto err_put_hardif; /* hard_dev is referenced by hard_iface and not needed here */ dev_put(hard_dev); return hard_iface; err_put_hardif: batadv_hardif_put(hard_iface); err_put_harddev: dev_put(hard_dev); return ERR_PTR(-EINVAL); } /** * batadv_get_vlan_from_info() - Retrieve vlan from genl attributes * @bat_priv: the bat priv with all the soft interface information * @net: the applicable net namespace * @info: receiver information * * Return: Pointer to vlan on success (with increased refcnt), error pointer * on error */ static struct batadv_softif_vlan * batadv_get_vlan_from_info(struct batadv_priv *bat_priv, struct net *net, struct genl_info *info) { struct batadv_softif_vlan *vlan; u16 vid; if (!info->attrs[BATADV_ATTR_VLANID]) return ERR_PTR(-EINVAL); vid = nla_get_u16(info->attrs[BATADV_ATTR_VLANID]); vlan = batadv_softif_vlan_get(bat_priv, vid | BATADV_VLAN_HAS_TAG); if (!vlan) return ERR_PTR(-ENOENT); return vlan; } /** * batadv_pre_doit() - Prepare batman-adv genl doit request * @ops: requested netlink operation * @skb: Netlink message with request data * @info: receiver information * * Return: 0 on success or negative error number in case of failure */ static int batadv_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info) { struct net *net = genl_info_net(info); struct batadv_hard_iface *hard_iface; struct batadv_priv *bat_priv = NULL; struct batadv_softif_vlan *vlan; struct 
net_device *soft_iface; u8 user_ptr1_flags; u8 mesh_dep_flags; int ret; user_ptr1_flags = BATADV_FLAG_NEED_HARDIF | BATADV_FLAG_NEED_VLAN; if (WARN_ON(hweight8(ops->internal_flags & user_ptr1_flags) > 1)) return -EINVAL; mesh_dep_flags = BATADV_FLAG_NEED_HARDIF | BATADV_FLAG_NEED_VLAN; if (WARN_ON((ops->internal_flags & mesh_dep_flags) && (~ops->internal_flags & BATADV_FLAG_NEED_MESH))) return -EINVAL; if (ops->internal_flags & BATADV_FLAG_NEED_MESH) { soft_iface = batadv_get_softif_from_info(net, info); if (IS_ERR(soft_iface)) return PTR_ERR(soft_iface); bat_priv = netdev_priv(soft_iface); info->user_ptr[0] = bat_priv; } if (ops->internal_flags & BATADV_FLAG_NEED_HARDIF) { hard_iface = batadv_get_hardif_from_info(bat_priv, net, info); if (IS_ERR(hard_iface)) { ret = PTR_ERR(hard_iface); goto err_put_softif; } info->user_ptr[1] = hard_iface; } if (ops->internal_flags & BATADV_FLAG_NEED_VLAN) { vlan = batadv_get_vlan_from_info(bat_priv, net, info); if (IS_ERR(vlan)) { ret = PTR_ERR(vlan); goto err_put_softif; } info->user_ptr[1] = vlan; } return 0; err_put_softif: if (bat_priv) dev_put(bat_priv->soft_iface); return ret; } /** * batadv_post_doit() - End batman-adv genl doit request * @ops: requested netlink operation * @skb: Netlink message with request data * @info: receiver information */ static void batadv_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info) { struct batadv_hard_iface *hard_iface; struct batadv_softif_vlan *vlan; struct batadv_priv *bat_priv; if (ops->internal_flags & BATADV_FLAG_NEED_HARDIF && info->user_ptr[1]) { hard_iface = info->user_ptr[1]; batadv_hardif_put(hard_iface); } if (ops->internal_flags & BATADV_FLAG_NEED_VLAN && info->user_ptr[1]) { vlan = info->user_ptr[1]; batadv_softif_vlan_put(vlan); } if (ops->internal_flags & BATADV_FLAG_NEED_MESH && info->user_ptr[0]) { bat_priv = info->user_ptr[0]; dev_put(bat_priv->soft_iface); } } static const struct genl_small_ops batadv_netlink_ops[] = { { .cmd = BATADV_CMD_GET_MESH, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, /* can be retrieved by unprivileged users */ .doit = batadv_netlink_get_mesh, .internal_flags = BATADV_FLAG_NEED_MESH, }, { .cmd = BATADV_CMD_TP_METER, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .doit = batadv_netlink_tp_meter_start, .internal_flags = BATADV_FLAG_NEED_MESH, }, { .cmd = BATADV_CMD_TP_METER_CANCEL, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .doit = batadv_netlink_tp_meter_cancel, .internal_flags = BATADV_FLAG_NEED_MESH, }, { .cmd = BATADV_CMD_GET_ROUTING_ALGOS, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .dumpit = batadv_algo_dump, }, { .cmd = BATADV_CMD_GET_HARDIF, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, /* can be retrieved by unprivileged users */ .dumpit = batadv_netlink_dump_hardif, .doit = batadv_netlink_get_hardif, .internal_flags = BATADV_FLAG_NEED_MESH | BATADV_FLAG_NEED_HARDIF, }, { .cmd = BATADV_CMD_GET_TRANSTABLE_LOCAL, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .dumpit = batadv_tt_local_dump, }, { .cmd = BATADV_CMD_GET_TRANSTABLE_GLOBAL, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .dumpit = batadv_tt_global_dump, }, { .cmd = BATADV_CMD_GET_ORIGINATORS, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = 
GENL_UNS_ADMIN_PERM, .dumpit = batadv_orig_dump, }, { .cmd = BATADV_CMD_GET_NEIGHBORS, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .dumpit = batadv_hardif_neigh_dump, }, { .cmd = BATADV_CMD_GET_GATEWAYS, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .dumpit = batadv_gw_dump, }, { .cmd = BATADV_CMD_GET_BLA_CLAIM, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .dumpit = batadv_bla_claim_dump, }, { .cmd = BATADV_CMD_GET_BLA_BACKBONE, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .dumpit = batadv_bla_backbone_dump, }, { .cmd = BATADV_CMD_GET_DAT_CACHE, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .dumpit = batadv_dat_cache_dump, }, { .cmd = BATADV_CMD_GET_MCAST_FLAGS, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .dumpit = batadv_mcast_flags_dump, }, { .cmd = BATADV_CMD_SET_MESH, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .doit = batadv_netlink_set_mesh, .internal_flags = BATADV_FLAG_NEED_MESH, }, { .cmd = BATADV_CMD_SET_HARDIF, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .doit = batadv_netlink_set_hardif, .internal_flags = BATADV_FLAG_NEED_MESH | BATADV_FLAG_NEED_HARDIF, }, { .cmd = BATADV_CMD_GET_VLAN, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, /* can be retrieved by unprivileged users */ .doit = batadv_netlink_get_vlan, .internal_flags = BATADV_FLAG_NEED_MESH | BATADV_FLAG_NEED_VLAN, }, { .cmd = BATADV_CMD_SET_VLAN, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .flags = GENL_UNS_ADMIN_PERM, .doit = batadv_netlink_set_vlan, .internal_flags = BATADV_FLAG_NEED_MESH | BATADV_FLAG_NEED_VLAN, }, }; struct genl_family batadv_netlink_family __ro_after_init = { .hdrsize = 0, .name = BATADV_NL_NAME, .version = 1, .maxattr = BATADV_ATTR_MAX, .policy = batadv_netlink_policy, .netnsok = true, .pre_doit = batadv_pre_doit, .post_doit = batadv_post_doit, .module = THIS_MODULE, .small_ops = batadv_netlink_ops, .n_small_ops = ARRAY_SIZE(batadv_netlink_ops), .resv_start_op = BATADV_CMD_SET_VLAN + 1, .mcgrps = batadv_netlink_mcgrps, .n_mcgrps = ARRAY_SIZE(batadv_netlink_mcgrps), }; /** * batadv_netlink_register() - register batadv genl netlink family */ void __init batadv_netlink_register(void) { int ret; ret = genl_register_family(&batadv_netlink_family); if (ret) pr_warn("unable to register netlink family"); } /** * batadv_netlink_unregister() - unregister batadv genl netlink family */ void batadv_netlink_unregister(void) { genl_unregister_family(&batadv_netlink_family); } |
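/*
 * Illustrative userspace sketch (not part of the kernel sources above): the
 * "batadv" generic netlink family registered by batadv_netlink_register()
 * can be queried with a doit request such as BATADV_CMD_GET_MESH, which
 * batadv_pre_doit() resolves through the mandatory BATADV_ATTR_MESH_IFINDEX
 * attribute. This sketch assumes libnl-3/libnl-genl-3 are available and that
 * a soft interface named "bat0" exists; error handling is kept minimal and
 * the build command below is only a hypothetical example:
 *
 *   cc batadv_get_mesh.c $(pkg-config --cflags --libs libnl-genl-3.0)
 */
#include <stdio.h>
#include <net/if.h>
#include <linux/batman_adv.h>
#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>

/* Print a few of the attributes filled in by batadv_netlink_mesh_fill() */
static int print_mesh_cb(struct nl_msg *msg, void *arg)
{
	struct nlattr *attrs[BATADV_ATTR_MAX + 1];

	/* family uses hdrsize 0, so no family-specific header to skip */
	if (genlmsg_parse(nlmsg_hdr(msg), 0, attrs, BATADV_ATTR_MAX, NULL) < 0)
		return NL_SKIP;

	if (attrs[BATADV_ATTR_MESH_IFNAME] && attrs[BATADV_ATTR_ALGO_NAME])
		printf("%s uses routing algorithm %s\n",
		       nla_get_string(attrs[BATADV_ATTR_MESH_IFNAME]),
		       nla_get_string(attrs[BATADV_ATTR_ALGO_NAME]));

	return NL_OK;
}

int main(void)
{
	struct nl_sock *sock = nl_socket_alloc();
	struct nl_msg *msg;
	int family;

	if (!sock || genl_connect(sock) < 0)
		return 1;

	/* BATADV_NL_NAME expands to "batadv" */
	family = genl_ctrl_resolve(sock, BATADV_NL_NAME);
	if (family < 0)
		return 1;

	nl_socket_modify_cb(sock, NL_CB_VALID, NL_CB_CUSTOM, print_mesh_cb, NULL);

	/* BATADV_CMD_GET_MESH is readable by unprivileged users */
	msg = nlmsg_alloc();
	genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0, 0,
		    BATADV_CMD_GET_MESH, 1);
	nla_put_u32(msg, BATADV_ATTR_MESH_IFINDEX, if_nametoindex("bat0"));

	nl_send_auto(sock, msg);
	nl_recvmsgs_default(sock);

	nlmsg_free(msg);
	nl_socket_free(sock);
	return 0;
}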
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Universal power supply monitor class
 *
 * Copyright © 2007 Anton Vorontsov <cbou@mail.ru>
 * Copyright © 2004 Szabolcs Gyurko
 * Copyright © 2003 Ian Molton <spyro@f2s.com>
 *
 * Modified: 2004, Oct Szabolcs Gyurko
 */

#include <linux/module.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/delay.h>
#include <linux/device.h>
#include <linux/notifier.h>
#include <linux/err.h>
#include <linux/of.h>
#include <linux/power_supply.h>
#include <linux/property.h>
#include <linux/thermal.h>
#include <linux/fixp-arith.h>

#include "power_supply.h"
#include "samsung-sdi-battery.h"

static const struct class power_supply_class = {
	.name = "power_supply",
	.dev_uevent = power_supply_uevent,
};

static BLOCKING_NOTIFIER_HEAD(power_supply_notifier);

static const struct device_type power_supply_dev_type = {
	.name = "power_supply",
	.groups = power_supply_attr_groups,
};

#define POWER_SUPPLY_DEFERRED_REGISTER_TIME	msecs_to_jiffies(10)

static bool __power_supply_is_supplied_by(struct power_supply *supplier,
					  struct power_supply *supply)
{
	int i;

	if (!supply->supplied_from && !supplier->supplied_to)
		return false;

	/* Support both supplied_to and supplied_from modes */
	if (supply->supplied_from) {
		if (!supplier->desc->name)
			return false;
		for (i = 0; i < supply->num_supplies; i++)
			if (!strcmp(supplier->desc->name, supply->supplied_from[i]))
				return true;
	} else {
		if (!supply->desc->name)
			return false;
		for (i = 0; i < supplier->num_supplicants; i++)
			if (!strcmp(supplier->supplied_to[i], supply->desc->name))
				return true;
	}

	return false;
}

static int __power_supply_changed_work(struct device *dev, void *data)
{
	struct power_supply *psy = data;
	struct power_supply *pst = dev_get_drvdata(dev);

	if (__power_supply_is_supplied_by(psy, pst)) {
		if (pst->desc->external_power_changed)
			pst->desc->external_power_changed(pst);
	}

	return 0;
}

static void power_supply_changed_work(struct work_struct *work)
{
	unsigned long flags;
	struct power_supply *psy = container_of(work, struct power_supply,
						changed_work);

	dev_dbg(&psy->dev, "%s\n", __func__);

	spin_lock_irqsave(&psy->changed_lock, flags);
	/*
	 * Check 'changed' here to avoid issues due to race between
	 * power_supply_changed() and this routine. In worst case
	 * power_supply_changed() can be called again just before we take above
	 * lock. During the first call of this routine we will mark 'changed' as
	 * false and it will stay false for the next call as well.
	 */
	if (likely(psy->changed)) {
		psy->changed = false;
		spin_unlock_irqrestore(&psy->changed_lock, flags);
		power_supply_for_each_device(psy, __power_supply_changed_work);
		power_supply_update_leds(psy);
		blocking_notifier_call_chain(&power_supply_notifier,
					     PSY_EVENT_PROP_CHANGED, psy);
		kobject_uevent(&psy->dev.kobj, KOBJ_CHANGE);
		spin_lock_irqsave(&psy->changed_lock, flags);
	}

	/*
	 * Hold the wakeup_source until all events are processed.
	 * power_supply_changed() might have called again and have set 'changed'
	 * to true.
*/ if (likely(!psy->changed)) pm_relax(&psy->dev); spin_unlock_irqrestore(&psy->changed_lock, flags); } int power_supply_for_each_device(void *data, int (*fn)(struct device *dev, void *data)) { return class_for_each_device(&power_supply_class, NULL, data, fn); } EXPORT_SYMBOL_GPL(power_supply_for_each_device); void power_supply_changed(struct power_supply *psy) { unsigned long flags; dev_dbg(&psy->dev, "%s\n", __func__); spin_lock_irqsave(&psy->changed_lock, flags); psy->changed = true; pm_stay_awake(&psy->dev); spin_unlock_irqrestore(&psy->changed_lock, flags); schedule_work(&psy->changed_work); } EXPORT_SYMBOL_GPL(power_supply_changed); /* * Notify that power supply was registered after parent finished the probing. * * Often power supply is registered from driver's probe function. However * calling power_supply_changed() directly from power_supply_register() * would lead to execution of get_property() function provided by the driver * too early - before the probe ends. * * Avoid that by waiting on parent's mutex. */ static void power_supply_deferred_register_work(struct work_struct *work) { struct power_supply *psy = container_of(work, struct power_supply, deferred_register_work.work); if (psy->dev.parent) { while (!mutex_trylock(&psy->dev.parent->mutex)) { if (psy->removing) return; msleep(10); } } power_supply_changed(psy); if (psy->dev.parent) mutex_unlock(&psy->dev.parent->mutex); } #ifdef CONFIG_OF static int __power_supply_populate_supplied_from(struct device *dev, void *data) { struct power_supply *psy = data; struct power_supply *epsy = dev_get_drvdata(dev); struct device_node *np; int i = 0; do { np = of_parse_phandle(psy->of_node, "power-supplies", i++); if (!np) break; if (np == epsy->of_node) { dev_dbg(&psy->dev, "%s: Found supply : %s\n", psy->desc->name, epsy->desc->name); psy->supplied_from[i-1] = (char *)epsy->desc->name; psy->num_supplies++; of_node_put(np); break; } of_node_put(np); } while (np); return 0; } static int power_supply_populate_supplied_from(struct power_supply *psy) { int error; error = power_supply_for_each_device(psy, __power_supply_populate_supplied_from); dev_dbg(&psy->dev, "%s %d\n", __func__, error); return error; } static int __power_supply_find_supply_from_node(struct device *dev, void *data) { struct device_node *np = data; struct power_supply *epsy = dev_get_drvdata(dev); /* returning non-zero breaks out of power_supply_for_each_device loop */ if (epsy->of_node == np) return 1; return 0; } static int power_supply_find_supply_from_node(struct device_node *supply_node) { int error; /* * power_supply_for_each_device() either returns its own errors or values * returned by __power_supply_find_supply_from_node(). * * __power_supply_find_supply_from_node() will return 0 (no match) * or 1 (match). * * We return 0 if power_supply_for_each_device() returned 1, -EPROBE_DEFER if * it returned 0, or error as returned by it. */ error = power_supply_for_each_device(supply_node, __power_supply_find_supply_from_node); return error ? (error == 1 ? 
0 : error) : -EPROBE_DEFER; } static int power_supply_check_supplies(struct power_supply *psy) { struct device_node *np; int cnt = 0; /* If there is already a list honor it */ if (psy->supplied_from && psy->num_supplies > 0) return 0; /* No device node found, nothing to do */ if (!psy->of_node) return 0; do { int ret; np = of_parse_phandle(psy->of_node, "power-supplies", cnt++); if (!np) break; ret = power_supply_find_supply_from_node(np); of_node_put(np); if (ret) { dev_dbg(&psy->dev, "Failed to find supply!\n"); return ret; } } while (np); /* Missing valid "power-supplies" entries */ if (cnt == 1) return 0; /* All supplies found, allocate char ** array for filling */ psy->supplied_from = devm_kzalloc(&psy->dev, sizeof(*psy->supplied_from), GFP_KERNEL); if (!psy->supplied_from) return -ENOMEM; *psy->supplied_from = devm_kcalloc(&psy->dev, cnt - 1, sizeof(**psy->supplied_from), GFP_KERNEL); if (!*psy->supplied_from) return -ENOMEM; return power_supply_populate_supplied_from(psy); } #else static int power_supply_check_supplies(struct power_supply *psy) { int nval, ret; if (!psy->dev.parent) return 0; nval = device_property_string_array_count(psy->dev.parent, "supplied-from"); if (nval <= 0) return 0; psy->supplied_from = devm_kmalloc_array(&psy->dev, nval, sizeof(char *), GFP_KERNEL); if (!psy->supplied_from) return -ENOMEM; ret = device_property_read_string_array(psy->dev.parent, "supplied-from", (const char **)psy->supplied_from, nval); if (ret < 0) return ret; psy->num_supplies = nval; return 0; } #endif struct psy_am_i_supplied_data { struct power_supply *psy; unsigned int count; }; static int __power_supply_am_i_supplied(struct device *dev, void *_data) { union power_supply_propval ret = {0,}; struct power_supply *epsy = dev_get_drvdata(dev); struct psy_am_i_supplied_data *data = _data; if (__power_supply_is_supplied_by(epsy, data->psy)) { data->count++; if (!epsy->desc->get_property(epsy, POWER_SUPPLY_PROP_ONLINE, &ret)) return ret.intval; } return 0; } int power_supply_am_i_supplied(struct power_supply *psy) { struct psy_am_i_supplied_data data = { psy, 0 }; int error; error = power_supply_for_each_device(&data, __power_supply_am_i_supplied); dev_dbg(&psy->dev, "%s count %u err %d\n", __func__, data.count, error); if (data.count == 0) return -ENODEV; return error; } EXPORT_SYMBOL_GPL(power_supply_am_i_supplied); static int __power_supply_is_system_supplied(struct device *dev, void *data) { union power_supply_propval ret = {0,}; struct power_supply *psy = dev_get_drvdata(dev); unsigned int *count = data; if (!psy->desc->get_property(psy, POWER_SUPPLY_PROP_SCOPE, &ret)) if (ret.intval == POWER_SUPPLY_SCOPE_DEVICE) return 0; (*count)++; if (psy->desc->type != POWER_SUPPLY_TYPE_BATTERY) if (!psy->desc->get_property(psy, POWER_SUPPLY_PROP_ONLINE, &ret)) return ret.intval; return 0; } int power_supply_is_system_supplied(void) { int error; unsigned int count = 0; error = power_supply_for_each_device(&count, __power_supply_is_system_supplied); /* * If no system scope power class device was found at all, most probably we * are running on a desktop system, so assume we are on mains power. 
*/ if (count == 0) return 1; return error; } EXPORT_SYMBOL_GPL(power_supply_is_system_supplied); struct psy_get_supplier_prop_data { struct power_supply *psy; enum power_supply_property psp; union power_supply_propval *val; }; static int __power_supply_get_supplier_property(struct device *dev, void *_data) { struct power_supply *epsy = dev_get_drvdata(dev); struct psy_get_supplier_prop_data *data = _data; if (__power_supply_is_supplied_by(epsy, data->psy)) if (!power_supply_get_property(epsy, data->psp, data->val)) return 1; /* Success */ return 0; /* Continue iterating */ } int power_supply_get_property_from_supplier(struct power_supply *psy, enum power_supply_property psp, union power_supply_propval *val) { struct psy_get_supplier_prop_data data = { .psy = psy, .psp = psp, .val = val, }; int ret; /* * This function is not intended for use with a supply with multiple * suppliers, we simply pick the first supply to report the psp. */ ret = power_supply_for_each_device(&data, __power_supply_get_supplier_property); if (ret < 0) return ret; if (ret == 0) return -ENODEV; return 0; } EXPORT_SYMBOL_GPL(power_supply_get_property_from_supplier); int power_supply_set_battery_charged(struct power_supply *psy) { if (atomic_read(&psy->use_cnt) >= 0 && psy->desc->type == POWER_SUPPLY_TYPE_BATTERY && psy->desc->set_charged) { psy->desc->set_charged(psy); return 0; } return -EINVAL; } EXPORT_SYMBOL_GPL(power_supply_set_battery_charged); static int power_supply_match_device_by_name(struct device *dev, const void *data) { const char *name = data; struct power_supply *psy = dev_get_drvdata(dev); return strcmp(psy->desc->name, name) == 0; } /** * power_supply_get_by_name() - Search for a power supply and returns its ref * @name: Power supply name to fetch * * If power supply was found, it increases reference count for the * internal power supply's device. The user should power_supply_put() * after usage. * * Return: On success returns a reference to a power supply with * matching name equals to @name, a NULL otherwise. */ struct power_supply *power_supply_get_by_name(const char *name) { struct power_supply *psy = NULL; struct device *dev = class_find_device(&power_supply_class, NULL, name, power_supply_match_device_by_name); if (dev) { psy = dev_get_drvdata(dev); atomic_inc(&psy->use_cnt); } return psy; } EXPORT_SYMBOL_GPL(power_supply_get_by_name); /** * power_supply_put() - Drop reference obtained with power_supply_get_by_name * @psy: Reference to put * * The reference to power supply should be put before unregistering * the power supply. */ void power_supply_put(struct power_supply *psy) { might_sleep(); atomic_dec(&psy->use_cnt); put_device(&psy->dev); } EXPORT_SYMBOL_GPL(power_supply_put); #ifdef CONFIG_OF static int power_supply_match_device_node(struct device *dev, const void *data) { return dev->parent && dev->parent->of_node == data; } /** * power_supply_get_by_phandle() - Search for a power supply and returns its ref * @np: Pointer to device node holding phandle property * @property: Name of property holding a power supply name * * If power supply was found, it increases reference count for the * internal power supply's device. The user should power_supply_put() * after usage. * * Return: On success returns a reference to a power supply with * matching name equals to value under @property, NULL or ERR_PTR otherwise. 
*/ struct power_supply *power_supply_get_by_phandle(struct device_node *np, const char *property) { struct device_node *power_supply_np; struct power_supply *psy = NULL; struct device *dev; power_supply_np = of_parse_phandle(np, property, 0); if (!power_supply_np) return ERR_PTR(-ENODEV); dev = class_find_device(&power_supply_class, NULL, power_supply_np, power_supply_match_device_node); of_node_put(power_supply_np); if (dev) { psy = dev_get_drvdata(dev); atomic_inc(&psy->use_cnt); } return psy; } EXPORT_SYMBOL_GPL(power_supply_get_by_phandle); static void devm_power_supply_put(struct device *dev, void *res) { struct power_supply **psy = res; power_supply_put(*psy); } /** * devm_power_supply_get_by_phandle() - Resource managed version of * power_supply_get_by_phandle() * @dev: Pointer to device holding phandle property * @property: Name of property holding a power supply phandle * * Return: On success returns a reference to a power supply with * matching name equals to value under @property, NULL or ERR_PTR otherwise. */ struct power_supply *devm_power_supply_get_by_phandle(struct device *dev, const char *property) { struct power_supply **ptr, *psy; if (!dev->of_node) return ERR_PTR(-ENODEV); ptr = devres_alloc(devm_power_supply_put, sizeof(*ptr), GFP_KERNEL); if (!ptr) return ERR_PTR(-ENOMEM); psy = power_supply_get_by_phandle(dev->of_node, property); if (IS_ERR_OR_NULL(psy)) { devres_free(ptr); } else { *ptr = psy; devres_add(dev, ptr); } return psy; } EXPORT_SYMBOL_GPL(devm_power_supply_get_by_phandle); #endif /* CONFIG_OF */ int power_supply_get_battery_info(struct power_supply *psy, struct power_supply_battery_info **info_out) { struct power_supply_resistance_temp_table *resist_table; struct power_supply_battery_info *info; struct device_node *battery_np = NULL; struct fwnode_reference_args args; struct fwnode_handle *fwnode = NULL; const char *value; int err, len, index; const __be32 *list; u32 min_max[2]; if (psy->of_node) { battery_np = of_parse_phandle(psy->of_node, "monitored-battery", 0); if (!battery_np) return -ENODEV; fwnode = fwnode_handle_get(of_fwnode_handle(battery_np)); } else if (psy->dev.parent) { err = fwnode_property_get_reference_args( dev_fwnode(psy->dev.parent), "monitored-battery", NULL, 0, 0, &args); if (err) return err; fwnode = args.fwnode; } if (!fwnode) return -ENOENT; err = fwnode_property_read_string(fwnode, "compatible", &value); if (err) goto out_put_node; /* Try static batteries first */ err = samsung_sdi_battery_get_info(&psy->dev, value, &info); if (!err) goto out_ret_pointer; else if (err == -ENODEV) /* * Device does not have a static battery. * Proceed to look for a simple battery. 
*/ err = 0; if (strcmp("simple-battery", value)) { err = -ENODEV; goto out_put_node; } info = devm_kzalloc(&psy->dev, sizeof(*info), GFP_KERNEL); if (!info) { err = -ENOMEM; goto out_put_node; } info->technology = POWER_SUPPLY_TECHNOLOGY_UNKNOWN; info->energy_full_design_uwh = -EINVAL; info->charge_full_design_uah = -EINVAL; info->voltage_min_design_uv = -EINVAL; info->voltage_max_design_uv = -EINVAL; info->precharge_current_ua = -EINVAL; info->charge_term_current_ua = -EINVAL; info->constant_charge_current_max_ua = -EINVAL; info->constant_charge_voltage_max_uv = -EINVAL; info->tricklecharge_current_ua = -EINVAL; info->precharge_voltage_max_uv = -EINVAL; info->charge_restart_voltage_uv = -EINVAL; info->overvoltage_limit_uv = -EINVAL; info->maintenance_charge = NULL; info->alert_low_temp_charge_current_ua = -EINVAL; info->alert_low_temp_charge_voltage_uv = -EINVAL; info->alert_high_temp_charge_current_ua = -EINVAL; info->alert_high_temp_charge_voltage_uv = -EINVAL; info->temp_ambient_alert_min = INT_MIN; info->temp_ambient_alert_max = INT_MAX; info->temp_alert_min = INT_MIN; info->temp_alert_max = INT_MAX; info->temp_min = INT_MIN; info->temp_max = INT_MAX; info->factory_internal_resistance_uohm = -EINVAL; info->resist_table = NULL; info->bti_resistance_ohm = -EINVAL; info->bti_resistance_tolerance = -EINVAL; for (index = 0; index < POWER_SUPPLY_OCV_TEMP_MAX; index++) { info->ocv_table[index] = NULL; info->ocv_temp[index] = -EINVAL; info->ocv_table_size[index] = -EINVAL; } /* The property and field names below must correspond to elements * in enum power_supply_property. For reasoning, see * Documentation/power/power_supply_class.rst. */ if (!fwnode_property_read_string(fwnode, "device-chemistry", &value)) { if (!strcmp("nickel-cadmium", value)) info->technology = POWER_SUPPLY_TECHNOLOGY_NiCd; else if (!strcmp("nickel-metal-hydride", value)) info->technology = POWER_SUPPLY_TECHNOLOGY_NiMH; else if (!strcmp("lithium-ion", value)) /* Imprecise lithium-ion type */ info->technology = POWER_SUPPLY_TECHNOLOGY_LION; else if (!strcmp("lithium-ion-polymer", value)) info->technology = POWER_SUPPLY_TECHNOLOGY_LIPO; else if (!strcmp("lithium-ion-iron-phosphate", value)) info->technology = POWER_SUPPLY_TECHNOLOGY_LiFe; else if (!strcmp("lithium-ion-manganese-oxide", value)) info->technology = POWER_SUPPLY_TECHNOLOGY_LiMn; else dev_warn(&psy->dev, "%s unknown battery type\n", value); } fwnode_property_read_u32(fwnode, "energy-full-design-microwatt-hours", &info->energy_full_design_uwh); fwnode_property_read_u32(fwnode, "charge-full-design-microamp-hours", &info->charge_full_design_uah); fwnode_property_read_u32(fwnode, "voltage-min-design-microvolt", &info->voltage_min_design_uv); fwnode_property_read_u32(fwnode, "voltage-max-design-microvolt", &info->voltage_max_design_uv); fwnode_property_read_u32(fwnode, "trickle-charge-current-microamp", &info->tricklecharge_current_ua); fwnode_property_read_u32(fwnode, "precharge-current-microamp", &info->precharge_current_ua); fwnode_property_read_u32(fwnode, "precharge-upper-limit-microvolt", &info->precharge_voltage_max_uv); fwnode_property_read_u32(fwnode, "charge-term-current-microamp", &info->charge_term_current_ua); fwnode_property_read_u32(fwnode, "re-charge-voltage-microvolt", &info->charge_restart_voltage_uv); fwnode_property_read_u32(fwnode, "over-voltage-threshold-microvolt", &info->overvoltage_limit_uv); fwnode_property_read_u32(fwnode, "constant-charge-current-max-microamp", &info->constant_charge_current_max_ua); fwnode_property_read_u32(fwnode, 
"constant-charge-voltage-max-microvolt", &info->constant_charge_voltage_max_uv); fwnode_property_read_u32(fwnode, "factory-internal-resistance-micro-ohms", &info->factory_internal_resistance_uohm); if (!fwnode_property_read_u32_array(fwnode, "ambient-celsius", min_max, ARRAY_SIZE(min_max))) { info->temp_ambient_alert_min = min_max[0]; info->temp_ambient_alert_max = min_max[1]; } if (!fwnode_property_read_u32_array(fwnode, "alert-celsius", min_max, ARRAY_SIZE(min_max))) { info->temp_alert_min = min_max[0]; info->temp_alert_max = min_max[1]; } if (!fwnode_property_read_u32_array(fwnode, "operating-range-celsius", min_max, ARRAY_SIZE(min_max))) { info->temp_min = min_max[0]; info->temp_max = min_max[1]; } /* * The below code uses raw of-data parsing to parse * /schemas/types.yaml#/definitions/uint32-matrix * data, so for now this is only support with of. */ if (!battery_np) goto out_ret_pointer; len = of_property_count_u32_elems(battery_np, "ocv-capacity-celsius"); if (len < 0 && len != -EINVAL) { err = len; goto out_put_node; } else if (len > POWER_SUPPLY_OCV_TEMP_MAX) { dev_err(&psy->dev, "Too many temperature values\n"); err = -EINVAL; goto out_put_node; } else if (len > 0) { of_property_read_u32_array(battery_np, "ocv-capacity-celsius", info->ocv_temp, len); } for (index = 0; index < len; index++) { struct power_supply_battery_ocv_table *table; char *propname; int i, tab_len, size; propname = kasprintf(GFP_KERNEL, "ocv-capacity-table-%d", index); if (!propname) { power_supply_put_battery_info(psy, info); err = -ENOMEM; goto out_put_node; } list = of_get_property(battery_np, propname, &size); if (!list || !size) { dev_err(&psy->dev, "failed to get %s\n", propname); kfree(propname); power_supply_put_battery_info(psy, info); err = -EINVAL; goto out_put_node; } kfree(propname); tab_len = size / (2 * sizeof(__be32)); info->ocv_table_size[index] = tab_len; table = info->ocv_table[index] = devm_kcalloc(&psy->dev, tab_len, sizeof(*table), GFP_KERNEL); if (!info->ocv_table[index]) { power_supply_put_battery_info(psy, info); err = -ENOMEM; goto out_put_node; } for (i = 0; i < tab_len; i++) { table[i].ocv = be32_to_cpu(*list); list++; table[i].capacity = be32_to_cpu(*list); list++; } } list = of_get_property(battery_np, "resistance-temp-table", &len); if (!list || !len) goto out_ret_pointer; info->resist_table_size = len / (2 * sizeof(__be32)); resist_table = info->resist_table = devm_kcalloc(&psy->dev, info->resist_table_size, sizeof(*resist_table), GFP_KERNEL); if (!info->resist_table) { power_supply_put_battery_info(psy, info); err = -ENOMEM; goto out_put_node; } for (index = 0; index < info->resist_table_size; index++) { resist_table[index].temp = be32_to_cpu(*list++); resist_table[index].resistance = be32_to_cpu(*list++); } out_ret_pointer: /* Finally return the whole thing */ *info_out = info; out_put_node: fwnode_handle_put(fwnode); of_node_put(battery_np); return err; } EXPORT_SYMBOL_GPL(power_supply_get_battery_info); void power_supply_put_battery_info(struct power_supply *psy, struct power_supply_battery_info *info) { int i; for (i = 0; i < POWER_SUPPLY_OCV_TEMP_MAX; i++) { if (info->ocv_table[i]) devm_kfree(&psy->dev, info->ocv_table[i]); } if (info->resist_table) devm_kfree(&psy->dev, info->resist_table); devm_kfree(&psy->dev, info); } EXPORT_SYMBOL_GPL(power_supply_put_battery_info); const enum power_supply_property power_supply_battery_info_properties[] = { POWER_SUPPLY_PROP_TECHNOLOGY, POWER_SUPPLY_PROP_ENERGY_FULL_DESIGN, POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN, 
POWER_SUPPLY_PROP_VOLTAGE_MIN_DESIGN, POWER_SUPPLY_PROP_VOLTAGE_MAX_DESIGN, POWER_SUPPLY_PROP_PRECHARGE_CURRENT, POWER_SUPPLY_PROP_CHARGE_TERM_CURRENT, POWER_SUPPLY_PROP_CONSTANT_CHARGE_CURRENT_MAX, POWER_SUPPLY_PROP_CONSTANT_CHARGE_VOLTAGE_MAX, POWER_SUPPLY_PROP_TEMP_AMBIENT_ALERT_MIN, POWER_SUPPLY_PROP_TEMP_AMBIENT_ALERT_MAX, POWER_SUPPLY_PROP_TEMP_ALERT_MIN, POWER_SUPPLY_PROP_TEMP_ALERT_MAX, POWER_SUPPLY_PROP_TEMP_MIN, POWER_SUPPLY_PROP_TEMP_MAX, }; EXPORT_SYMBOL_GPL(power_supply_battery_info_properties); const size_t power_supply_battery_info_properties_size = ARRAY_SIZE(power_supply_battery_info_properties); EXPORT_SYMBOL_GPL(power_supply_battery_info_properties_size); bool power_supply_battery_info_has_prop(struct power_supply_battery_info *info, enum power_supply_property psp) { if (!info) return false; switch (psp) { case POWER_SUPPLY_PROP_TECHNOLOGY: return info->technology != POWER_SUPPLY_TECHNOLOGY_UNKNOWN; case POWER_SUPPLY_PROP_ENERGY_FULL_DESIGN: return info->energy_full_design_uwh >= 0; case POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN: return info->charge_full_design_uah >= 0; case POWER_SUPPLY_PROP_VOLTAGE_MIN_DESIGN: return info->voltage_min_design_uv >= 0; case POWER_SUPPLY_PROP_VOLTAGE_MAX_DESIGN: return info->voltage_max_design_uv >= 0; case POWER_SUPPLY_PROP_PRECHARGE_CURRENT: return info->precharge_current_ua >= 0; case POWER_SUPPLY_PROP_CHARGE_TERM_CURRENT: return info->charge_term_current_ua >= 0; case POWER_SUPPLY_PROP_CONSTANT_CHARGE_CURRENT_MAX: return info->constant_charge_current_max_ua >= 0; case POWER_SUPPLY_PROP_CONSTANT_CHARGE_VOLTAGE_MAX: return info->constant_charge_voltage_max_uv >= 0; case POWER_SUPPLY_PROP_TEMP_AMBIENT_ALERT_MIN: return info->temp_ambient_alert_min > INT_MIN; case POWER_SUPPLY_PROP_TEMP_AMBIENT_ALERT_MAX: return info->temp_ambient_alert_max < INT_MAX; case POWER_SUPPLY_PROP_TEMP_ALERT_MIN: return info->temp_alert_min > INT_MIN; case POWER_SUPPLY_PROP_TEMP_ALERT_MAX: return info->temp_alert_max < INT_MAX; case POWER_SUPPLY_PROP_TEMP_MIN: return info->temp_min > INT_MIN; case POWER_SUPPLY_PROP_TEMP_MAX: return info->temp_max < INT_MAX; default: return false; } } EXPORT_SYMBOL_GPL(power_supply_battery_info_has_prop); int power_supply_battery_info_get_prop(struct power_supply_battery_info *info, enum power_supply_property psp, union power_supply_propval *val) { if (!info) return -EINVAL; if (!power_supply_battery_info_has_prop(info, psp)) return -EINVAL; switch (psp) { case POWER_SUPPLY_PROP_TECHNOLOGY: val->intval = info->technology; return 0; case POWER_SUPPLY_PROP_ENERGY_FULL_DESIGN: val->intval = info->energy_full_design_uwh; return 0; case POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN: val->intval = info->charge_full_design_uah; return 0; case POWER_SUPPLY_PROP_VOLTAGE_MIN_DESIGN: val->intval = info->voltage_min_design_uv; return 0; case POWER_SUPPLY_PROP_VOLTAGE_MAX_DESIGN: val->intval = info->voltage_max_design_uv; return 0; case POWER_SUPPLY_PROP_PRECHARGE_CURRENT: val->intval = info->precharge_current_ua; return 0; case POWER_SUPPLY_PROP_CHARGE_TERM_CURRENT: val->intval = info->charge_term_current_ua; return 0; case POWER_SUPPLY_PROP_CONSTANT_CHARGE_CURRENT_MAX: val->intval = info->constant_charge_current_max_ua; return 0; case POWER_SUPPLY_PROP_CONSTANT_CHARGE_VOLTAGE_MAX: val->intval = info->constant_charge_voltage_max_uv; return 0; case POWER_SUPPLY_PROP_TEMP_AMBIENT_ALERT_MIN: val->intval = info->temp_ambient_alert_min; return 0; case POWER_SUPPLY_PROP_TEMP_AMBIENT_ALERT_MAX: val->intval = info->temp_ambient_alert_max; return 0; case 
POWER_SUPPLY_PROP_TEMP_ALERT_MIN: val->intval = info->temp_alert_min; return 0; case POWER_SUPPLY_PROP_TEMP_ALERT_MAX: val->intval = info->temp_alert_max; return 0; case POWER_SUPPLY_PROP_TEMP_MIN: val->intval = info->temp_min; return 0; case POWER_SUPPLY_PROP_TEMP_MAX: val->intval = info->temp_max; return 0; default: return -EINVAL; } } EXPORT_SYMBOL_GPL(power_supply_battery_info_get_prop); /** * power_supply_temp2resist_simple() - find the battery internal resistance * percent from temperature * @table: Pointer to battery resistance temperature table * @table_len: The table length * @temp: Current temperature * * This helper function is used to look up battery internal resistance percent * according to current temperature value from the resistance temperature table, * and the table must be ordered descending. Then the actual battery internal * resistance = the ideal battery internal resistance * percent / 100. * * Return: the battery internal resistance percent */ int power_supply_temp2resist_simple(struct power_supply_resistance_temp_table *table, int table_len, int temp) { int i, high, low; for (i = 0; i < table_len; i++) if (temp > table[i].temp) break; /* The library function will deal with high == low */ if (i == 0) high = low = i; else if (i == table_len) high = low = i - 1; else high = (low = i) - 1; return fixp_linear_interpolate(table[low].temp, table[low].resistance, table[high].temp, table[high].resistance, temp); } EXPORT_SYMBOL_GPL(power_supply_temp2resist_simple); /** * power_supply_vbat2ri() - find the battery internal resistance * from the battery voltage * @info: The battery information container * @vbat_uv: The battery voltage in microvolt * @charging: If we are charging (true) or not (false) * * This helper function is used to look up battery internal resistance * according to current battery voltage. Depending on whether the battery * is currently charging or not, different resistance will be returned. * * Returns the internal resistance in microohm or negative error code. */ int power_supply_vbat2ri(struct power_supply_battery_info *info, int vbat_uv, bool charging) { const struct power_supply_vbat_ri_table *vbat2ri; int table_len; int i, high, low; /* * If we are charging, and the battery supplies a separate table * for this state, we use that in order to compensate for the * charging voltage. Otherwise we use the main table. */ if (charging && info->vbat2ri_charging) { vbat2ri = info->vbat2ri_charging; table_len = info->vbat2ri_charging_size; } else { vbat2ri = info->vbat2ri_discharging; table_len = info->vbat2ri_discharging_size; } /* * If no tables are specified, or if we are above the highest voltage in * the voltage table, just return the factory specified internal resistance. 
*/ if (!vbat2ri || (table_len <= 0) || (vbat_uv > vbat2ri[0].vbat_uv)) { if (charging && (info->factory_internal_resistance_charging_uohm > 0)) return info->factory_internal_resistance_charging_uohm; else return info->factory_internal_resistance_uohm; } /* Break loop at table_len - 1 because that is the highest index */ for (i = 0; i < table_len - 1; i++) if (vbat_uv > vbat2ri[i].vbat_uv) break; /* The library function will deal with high == low */ if ((i == 0) || (i == (table_len - 1))) high = i; else high = i - 1; low = i; return fixp_linear_interpolate(vbat2ri[low].vbat_uv, vbat2ri[low].ri_uohm, vbat2ri[high].vbat_uv, vbat2ri[high].ri_uohm, vbat_uv); } EXPORT_SYMBOL_GPL(power_supply_vbat2ri); const struct power_supply_maintenance_charge_table * power_supply_get_maintenance_charging_setting(struct power_supply_battery_info *info, int index) { if (index >= info->maintenance_charge_size) return NULL; return &info->maintenance_charge[index]; } EXPORT_SYMBOL_GPL(power_supply_get_maintenance_charging_setting); /** * power_supply_ocv2cap_simple() - find the battery capacity * @table: Pointer to battery OCV lookup table * @table_len: OCV table length * @ocv: Current OCV value * * This helper function is used to look up battery capacity according to * current OCV value from one OCV table, and the OCV table must be ordered * descending. * * Return: the battery capacity. */ int power_supply_ocv2cap_simple(struct power_supply_battery_ocv_table *table, int table_len, int ocv) { int i, high, low; for (i = 0; i < table_len; i++) if (ocv > table[i].ocv) break; /* The library function will deal with high == low */ if (i == 0) high = low = i; else if (i == table_len) high = low = i - 1; else high = (low = i) - 1; return fixp_linear_interpolate(table[low].ocv, table[low].capacity, table[high].ocv, table[high].capacity, ocv); } EXPORT_SYMBOL_GPL(power_supply_ocv2cap_simple); struct power_supply_battery_ocv_table * power_supply_find_ocv2cap_table(struct power_supply_battery_info *info, int temp, int *table_len) { int best_temp_diff = INT_MAX, temp_diff; u8 i, best_index = 0; if (!info->ocv_table[0]) return NULL; for (i = 0; i < POWER_SUPPLY_OCV_TEMP_MAX; i++) { /* Out of capacity tables */ if (!info->ocv_table[i]) break; temp_diff = abs(info->ocv_temp[i] - temp); if (temp_diff < best_temp_diff) { best_temp_diff = temp_diff; best_index = i; } } *table_len = info->ocv_table_size[best_index]; return info->ocv_table[best_index]; } EXPORT_SYMBOL_GPL(power_supply_find_ocv2cap_table); int power_supply_batinfo_ocv2cap(struct power_supply_battery_info *info, int ocv, int temp) { struct power_supply_battery_ocv_table *table; int table_len; table = power_supply_find_ocv2cap_table(info, temp, &table_len); if (!table) return -EINVAL; return power_supply_ocv2cap_simple(table, table_len, ocv); } EXPORT_SYMBOL_GPL(power_supply_batinfo_ocv2cap); bool power_supply_battery_bti_in_range(struct power_supply_battery_info *info, int resistance) { int low, high; /* Nothing like this can be checked */ if (info->bti_resistance_ohm <= 0) return false; /* This will be extremely strict and unlikely to work */ if (info->bti_resistance_tolerance <= 0) return (info->bti_resistance_ohm == resistance); low = info->bti_resistance_ohm - (info->bti_resistance_ohm * info->bti_resistance_tolerance) / 100; high = info->bti_resistance_ohm + (info->bti_resistance_ohm * info->bti_resistance_tolerance) / 100; return ((resistance >= low) && (resistance <= high)); } EXPORT_SYMBOL_GPL(power_supply_battery_bti_in_range); static bool 
psy_has_property(const struct power_supply_desc *psy_desc, enum power_supply_property psp) { bool found = false; int i; for (i = 0; i < psy_desc->num_properties; i++) { if (psy_desc->properties[i] == psp) { found = true; break; } } return found; } int power_supply_get_property(struct power_supply *psy, enum power_supply_property psp, union power_supply_propval *val) { if (atomic_read(&psy->use_cnt) <= 0) { if (!psy->initialized) return -EAGAIN; return -ENODEV; } if (psy_has_property(psy->desc, psp)) return psy->desc->get_property(psy, psp, val); else if (power_supply_battery_info_has_prop(psy->battery_info, psp)) return power_supply_battery_info_get_prop(psy->battery_info, psp, val); else return -EINVAL; } EXPORT_SYMBOL_GPL(power_supply_get_property); int power_supply_set_property(struct power_supply *psy, enum power_supply_property psp, const union power_supply_propval *val) { if (atomic_read(&psy->use_cnt) <= 0 || !psy->desc->set_property) return -ENODEV; return psy->desc->set_property(psy, psp, val); } EXPORT_SYMBOL_GPL(power_supply_set_property); int power_supply_property_is_writeable(struct power_supply *psy, enum power_supply_property psp) { if (atomic_read(&psy->use_cnt) <= 0 || !psy->desc->property_is_writeable) return -ENODEV; return psy->desc->property_is_writeable(psy, psp); } EXPORT_SYMBOL_GPL(power_supply_property_is_writeable); void power_supply_external_power_changed(struct power_supply *psy) { if (atomic_read(&psy->use_cnt) <= 0 || !psy->desc->external_power_changed) return; psy->desc->external_power_changed(psy); } EXPORT_SYMBOL_GPL(power_supply_external_power_changed); int power_supply_powers(struct power_supply *psy, struct device *dev) { return sysfs_create_link(&psy->dev.kobj, &dev->kobj, "powers"); } EXPORT_SYMBOL_GPL(power_supply_powers); static void power_supply_dev_release(struct device *dev) { struct power_supply *psy = to_power_supply(dev); dev_dbg(dev, "%s\n", __func__); kfree(psy); } int power_supply_reg_notifier(struct notifier_block *nb) { return blocking_notifier_chain_register(&power_supply_notifier, nb); } EXPORT_SYMBOL_GPL(power_supply_reg_notifier); void power_supply_unreg_notifier(struct notifier_block *nb) { blocking_notifier_chain_unregister(&power_supply_notifier, nb); } EXPORT_SYMBOL_GPL(power_supply_unreg_notifier); #ifdef CONFIG_THERMAL static int power_supply_read_temp(struct thermal_zone_device *tzd, int *temp) { struct power_supply *psy; union power_supply_propval val; int ret; WARN_ON(tzd == NULL); psy = thermal_zone_device_priv(tzd); ret = power_supply_get_property(psy, POWER_SUPPLY_PROP_TEMP, &val); if (ret) return ret; /* Convert tenths of degree Celsius to milli degree Celsius. 
*/ *temp = val.intval * 100; return ret; } static struct thermal_zone_device_ops psy_tzd_ops = { .get_temp = power_supply_read_temp, }; static int psy_register_thermal(struct power_supply *psy) { int ret; if (psy->desc->no_thermal) return 0; /* Register a battery thermal zone if the psy reports temperature */ if (psy_has_property(psy->desc, POWER_SUPPLY_PROP_TEMP)) { /* Prefer our hwmon device and avoid duplicates */ struct thermal_zone_params tzp = { .no_hwmon = IS_ENABLED(CONFIG_POWER_SUPPLY_HWMON) }; psy->tzd = thermal_tripless_zone_device_register(psy->desc->name, psy, &psy_tzd_ops, &tzp); if (IS_ERR(psy->tzd)) return PTR_ERR(psy->tzd); ret = thermal_zone_device_enable(psy->tzd); if (ret) thermal_zone_device_unregister(psy->tzd); return ret; } return 0; } static void psy_unregister_thermal(struct power_supply *psy) { if (IS_ERR_OR_NULL(psy->tzd)) return; thermal_zone_device_unregister(psy->tzd); } #else static int psy_register_thermal(struct power_supply *psy) { return 0; } static void psy_unregister_thermal(struct power_supply *psy) { } #endif static struct power_supply *__must_check __power_supply_register(struct device *parent, const struct power_supply_desc *desc, const struct power_supply_config *cfg, bool ws) { struct device *dev; struct power_supply *psy; int rc; if (!desc || !desc->name || !desc->properties || !desc->num_properties) return ERR_PTR(-EINVAL); if (!parent) pr_warn("%s: Expected proper parent device for '%s'\n", __func__, desc->name); if (psy_has_property(desc, POWER_SUPPLY_PROP_USB_TYPE) && (!desc->usb_types || !desc->num_usb_types)) return ERR_PTR(-EINVAL); psy = kzalloc(sizeof(*psy), GFP_KERNEL); if (!psy) return ERR_PTR(-ENOMEM); dev = &psy->dev; device_initialize(dev); dev->class = &power_supply_class; dev->type = &power_supply_dev_type; dev->parent = parent; dev->release = power_supply_dev_release; dev_set_drvdata(dev, psy); psy->desc = desc; if (cfg) { dev->groups = cfg->attr_grp; psy->drv_data = cfg->drv_data; psy->of_node = cfg->fwnode ? to_of_node(cfg->fwnode) : cfg->of_node; dev->of_node = psy->of_node; psy->supplied_to = cfg->supplied_to; psy->num_supplicants = cfg->num_supplicants; } rc = dev_set_name(dev, "%s", desc->name); if (rc) goto dev_set_name_failed; INIT_WORK(&psy->changed_work, power_supply_changed_work); INIT_DELAYED_WORK(&psy->deferred_register_work, power_supply_deferred_register_work); rc = power_supply_check_supplies(psy); if (rc) { dev_dbg(dev, "Not all required supplies found, defer probe\n"); goto check_supplies_failed; } /* * Expose constant battery info, if it is available. While there are * some chargers accessing constant battery data, we only want to * expose battery data to userspace for battery devices. */ if (desc->type == POWER_SUPPLY_TYPE_BATTERY) { rc = power_supply_get_battery_info(psy, &psy->battery_info); if (rc && rc != -ENODEV && rc != -ENOENT) goto check_supplies_failed; } spin_lock_init(&psy->changed_lock); rc = device_add(dev); if (rc) goto device_add_failed; rc = device_init_wakeup(dev, ws); if (rc) goto wakeup_init_failed; rc = psy_register_thermal(psy); if (rc) goto register_thermal_failed; rc = power_supply_create_triggers(psy); if (rc) goto create_triggers_failed; rc = power_supply_add_hwmon_sysfs(psy); if (rc) goto add_hwmon_sysfs_failed; /* * Update use_cnt after any uevents (most notably from device_add()). * We are still in the driver's probe here, but * power_supply_uevent() calls back into the driver's get_property * method, so: * 1. Driver has not yet assigned the returned struct power_supply, * 2. 
Driver could not finish initialization (anything in its probe * after calling power_supply_register()). */ atomic_inc(&psy->use_cnt); psy->initialized = true; queue_delayed_work(system_power_efficient_wq, &psy->deferred_register_work, POWER_SUPPLY_DEFERRED_REGISTER_TIME); return psy; add_hwmon_sysfs_failed: power_supply_remove_triggers(psy); create_triggers_failed: psy_unregister_thermal(psy); register_thermal_failed: wakeup_init_failed: device_del(dev); device_add_failed: check_supplies_failed: dev_set_name_failed: put_device(dev); return ERR_PTR(rc); } /** * power_supply_register() - Register new power supply * @parent: Device to be a parent of power supply's device, usually * the device which probe function calls this * @desc: Description of power supply, must be valid through whole * lifetime of this power supply * @cfg: Run-time specific configuration accessed during registering, * may be NULL * * Return: A pointer to newly allocated power_supply on success * or ERR_PTR otherwise. * Use power_supply_unregister() on returned power_supply pointer to release * resources. */ struct power_supply *__must_check power_supply_register(struct device *parent, const struct power_supply_desc *desc, const struct power_supply_config *cfg) { return __power_supply_register(parent, desc, cfg, true); } EXPORT_SYMBOL_GPL(power_supply_register); /** * power_supply_register_no_ws() - Register new non-waking-source power supply * @parent: Device to be a parent of power supply's device, usually * the device which probe function calls this * @desc: Description of power supply, must be valid through whole * lifetime of this power supply * @cfg: Run-time specific configuration accessed during registering, * may be NULL * * Return: A pointer to newly allocated power_supply on success * or ERR_PTR otherwise. * Use power_supply_unregister() on returned power_supply pointer to release * resources. */ struct power_supply *__must_check power_supply_register_no_ws(struct device *parent, const struct power_supply_desc *desc, const struct power_supply_config *cfg) { return __power_supply_register(parent, desc, cfg, false); } EXPORT_SYMBOL_GPL(power_supply_register_no_ws); static void devm_power_supply_release(struct device *dev, void *res) { struct power_supply **psy = res; power_supply_unregister(*psy); } /** * devm_power_supply_register() - Register managed power supply * @parent: Device to be a parent of power supply's device, usually * the device which probe function calls this * @desc: Description of power supply, must be valid through whole * lifetime of this power supply * @cfg: Run-time specific configuration accessed during registering, * may be NULL * * Return: A pointer to newly allocated power_supply on success * or ERR_PTR otherwise. * The returned power_supply pointer will be automatically unregistered * on driver detach. 
*/ struct power_supply *__must_check devm_power_supply_register(struct device *parent, const struct power_supply_desc *desc, const struct power_supply_config *cfg) { struct power_supply **ptr, *psy; ptr = devres_alloc(devm_power_supply_release, sizeof(*ptr), GFP_KERNEL); if (!ptr) return ERR_PTR(-ENOMEM); psy = __power_supply_register(parent, desc, cfg, true); if (IS_ERR(psy)) { devres_free(ptr); } else { *ptr = psy; devres_add(parent, ptr); } return psy; } EXPORT_SYMBOL_GPL(devm_power_supply_register); /** * devm_power_supply_register_no_ws() - Register managed non-waking-source power supply * @parent: Device to be a parent of power supply's device, usually * the device which probe function calls this * @desc: Description of power supply, must be valid through whole * lifetime of this power supply * @cfg: Run-time specific configuration accessed during registering, * may be NULL * * Return: A pointer to newly allocated power_supply on success * or ERR_PTR otherwise. * The returned power_supply pointer will be automatically unregistered * on driver detach. */ struct power_supply *__must_check devm_power_supply_register_no_ws(struct device *parent, const struct power_supply_desc *desc, const struct power_supply_config *cfg) { struct power_supply **ptr, *psy; ptr = devres_alloc(devm_power_supply_release, sizeof(*ptr), GFP_KERNEL); if (!ptr) return ERR_PTR(-ENOMEM); psy = __power_supply_register(parent, desc, cfg, false); if (IS_ERR(psy)) { devres_free(ptr); } else { *ptr = psy; devres_add(parent, ptr); } return psy; } EXPORT_SYMBOL_GPL(devm_power_supply_register_no_ws); /** * power_supply_unregister() - Remove this power supply from system * @psy: Pointer to power supply to unregister * * Remove this power supply from the system. The resources of power supply * will be freed here or on last power_supply_put() call. */ void power_supply_unregister(struct power_supply *psy) { WARN_ON(atomic_dec_return(&psy->use_cnt)); psy->removing = true; cancel_work_sync(&psy->changed_work); cancel_delayed_work_sync(&psy->deferred_register_work); sysfs_remove_link(&psy->dev.kobj, "powers"); power_supply_remove_hwmon_sysfs(psy); power_supply_remove_triggers(psy); psy_unregister_thermal(psy); device_init_wakeup(&psy->dev, false); device_unregister(&psy->dev); } EXPORT_SYMBOL_GPL(power_supply_unregister); void *power_supply_get_drvdata(struct power_supply *psy) { return psy->drv_data; } EXPORT_SYMBOL_GPL(power_supply_get_drvdata); static int __init power_supply_class_init(void) { power_supply_init_attrs(); return class_register(&power_supply_class); } static void __exit power_supply_class_exit(void) { class_unregister(&power_supply_class); } subsys_initcall(power_supply_class_init); module_exit(power_supply_class_exit); MODULE_DESCRIPTION("Universal power supply monitor class"); MODULE_AUTHOR("Ian Molton <spyro@f2s.com>"); MODULE_AUTHOR("Szabolcs Gyurko"); MODULE_AUTHOR("Anton Vorontsov <cbou@mail.ru>"); |
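/*
 * Illustrative sketch (not part of power_supply_core.c): a minimal standalone
 * client of the class implemented above. It shows how a driver would describe
 * a supply with a power_supply_desc, register it with
 * devm_power_supply_register(), and notify the core with
 * power_supply_changed(). All "foo_" names, the platform device match and the
 * hard-coded readings are hypothetical and exist only for this example.
 */
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/power_supply.h>

struct foo_battery {
	bool online;		/* example state, normally read from hardware */
};

static enum power_supply_property foo_battery_props[] = {
	POWER_SUPPLY_PROP_STATUS,
	POWER_SUPPLY_PROP_ONLINE,
};

static int foo_battery_get_property(struct power_supply *psy,
				    enum power_supply_property psp,
				    union power_supply_propval *val)
{
	struct foo_battery *bat = power_supply_get_drvdata(psy);

	switch (psp) {
	case POWER_SUPPLY_PROP_STATUS:
		val->intval = bat->online ? POWER_SUPPLY_STATUS_CHARGING :
					    POWER_SUPPLY_STATUS_DISCHARGING;
		return 0;
	case POWER_SUPPLY_PROP_ONLINE:
		val->intval = bat->online;
		return 0;
	default:
		return -EINVAL;
	}
}

static const struct power_supply_desc foo_battery_desc = {
	.name		= "foo-battery",
	.type		= POWER_SUPPLY_TYPE_BATTERY,
	.properties	= foo_battery_props,
	.num_properties	= ARRAY_SIZE(foo_battery_props),
	.get_property	= foo_battery_get_property,
};

static int foo_battery_probe(struct platform_device *pdev)
{
	struct power_supply_config cfg = {};
	struct foo_battery *bat;
	struct power_supply *psy;

	bat = devm_kzalloc(&pdev->dev, sizeof(*bat), GFP_KERNEL);
	if (!bat)
		return -ENOMEM;

	bat->online = true;
	cfg.drv_data = bat;

	/* May fail with -EPROBE_DEFER while listed supplies are missing */
	psy = devm_power_supply_register(&pdev->dev, &foo_battery_desc, &cfg);
	if (IS_ERR(psy))
		return PTR_ERR(psy);

	/* Tell user space and supplicants that readings are now valid */
	power_supply_changed(psy);
	return 0;
}

static struct platform_driver foo_battery_driver = {
	.probe	= foo_battery_probe,
	.driver	= { .name = "foo-battery" },
};
module_platform_driver(foo_battery_driver);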
/* * Copyright (c) 2004 Topspin Communications. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #include <linux/module.h> #include <linux/string.h> #include <linux/errno.h> #include <linux/kernel.h> #include <linux/slab.h> #include <linux/init.h> #include <linux/netdevice.h> #include <net/net_namespace.h> #include <linux/security.h> #include <linux/notifier.h> #include <linux/hashtable.h> #include <rdma/rdma_netlink.h> #include <rdma/ib_addr.h> #include <rdma/ib_cache.h> #include <rdma/rdma_counter.h> #include "core_priv.h" #include "restrack.h" MODULE_AUTHOR("Roland Dreier"); MODULE_DESCRIPTION("core kernel InfiniBand API"); MODULE_LICENSE("Dual BSD/GPL"); struct workqueue_struct *ib_comp_wq; struct workqueue_struct *ib_comp_unbound_wq; struct workqueue_struct *ib_wq; EXPORT_SYMBOL_GPL(ib_wq); static struct workqueue_struct *ib_unreg_wq; /* * Each of the three rwsem locks (devices, clients, client_data) protects the * xarray of the same name. Specifically it allows the caller to assert that * the MARK will/will not be changing under the lock, and for devices and * clients, that the value in the xarray is still a valid pointer. Change of * the MARK is linked to the object state, so holding the lock and testing the * MARK also asserts that the contained object is in a certain state. * * This is used to build a two stage register/unregister flow where objects * can continue to be in the xarray even though they are still in progress to * register/unregister. * * The xarray itself provides additional locking, and restartable iteration, * which is also relied on. * * Locks should not be nested, with the exception of client_data, which is * allowed to nest under the read side of the other two locks. * * The devices_rwsem also protects the device name list, any change or * assignment of device name must also hold the write side to guarantee unique * names. */ /* * devices contains devices that have had their names assigned. The * devices may not be registered. 
Users that care about the registration * status need to call ib_device_try_get() on the device to ensure it is * registered, and keep it registered, for the required duration. * */ static DEFINE_XARRAY_FLAGS(devices, XA_FLAGS_ALLOC); static DECLARE_RWSEM(devices_rwsem); #define DEVICE_REGISTERED XA_MARK_1 static u32 highest_client_id; #define CLIENT_REGISTERED XA_MARK_1 static DEFINE_XARRAY_FLAGS(clients, XA_FLAGS_ALLOC); static DECLARE_RWSEM(clients_rwsem); static void ib_client_put(struct ib_client *client) { if (refcount_dec_and_test(&client->uses)) complete(&client->uses_zero); } /* * If client_data is registered then the corresponding client must also still * be registered. */ #define CLIENT_DATA_REGISTERED XA_MARK_1 unsigned int rdma_dev_net_id; /* * A list of net namespaces is maintained in an xarray. This is necessary * because we can't get the locking right using the existing net ns list. We * would require an init_net callback after the list is updated. */ static DEFINE_XARRAY_FLAGS(rdma_nets, XA_FLAGS_ALLOC); /* * rwsem to protect accessing the rdma_nets xarray entries. */ static DECLARE_RWSEM(rdma_nets_rwsem); bool ib_devices_shared_netns = true; module_param_named(netns_mode, ib_devices_shared_netns, bool, 0444); MODULE_PARM_DESC(netns_mode, "Share device among net namespaces; default=1 (shared)"); /** * rdma_dev_access_netns() - Return whether an rdma device can be accessed * from a specified net namespace or not. * @dev: Pointer to rdma device which needs to be checked * @net: Pointer to the net namespace for which access is to be checked * * When the rdma device is in shared mode, it ignores the net namespace. * When the rdma device is exclusive to a net namespace, the rdma device's net * namespace is checked against the specified one. */ bool rdma_dev_access_netns(const struct ib_device *dev, const struct net *net) { return (ib_devices_shared_netns || net_eq(read_pnet(&dev->coredev.rdma_net), net)); } EXPORT_SYMBOL(rdma_dev_access_netns); /* * xarray has this behavior where it won't iterate over NULL values stored in * allocated arrays. So we need our own iterator to see all values stored in * the array. This does the same thing as xa_for_each except that it also * returns NULL valued entries if the array is allocating. Simplified to only * work on simple xarrays. 
*/ static void *xan_find_marked(struct xarray *xa, unsigned long *indexp, xa_mark_t filter) { XA_STATE(xas, xa, *indexp); void *entry; rcu_read_lock(); do { entry = xas_find_marked(&xas, ULONG_MAX, filter); if (xa_is_zero(entry)) break; } while (xas_retry(&xas, entry)); rcu_read_unlock(); if (entry) { *indexp = xas.xa_index; if (xa_is_zero(entry)) return NULL; return entry; } return XA_ERROR(-ENOENT); } #define xan_for_each_marked(xa, index, entry, filter) \ for (index = 0, entry = xan_find_marked(xa, &(index), filter); \ !xa_is_err(entry); \ (index)++, entry = xan_find_marked(xa, &(index), filter)) /* RCU hash table mapping netdevice pointers to struct ib_port_data */ static DEFINE_SPINLOCK(ndev_hash_lock); static DECLARE_HASHTABLE(ndev_hash, 5); static void free_netdevs(struct ib_device *ib_dev); static void ib_unregister_work(struct work_struct *work); static void __ib_unregister_device(struct ib_device *device); static int ib_security_change(struct notifier_block *nb, unsigned long event, void *lsm_data); static void ib_policy_change_task(struct work_struct *work); static DECLARE_WORK(ib_policy_change_work, ib_policy_change_task); static void __ibdev_printk(const char *level, const struct ib_device *ibdev, struct va_format *vaf) { if (ibdev && ibdev->dev.parent) dev_printk_emit(level[1] - '0', ibdev->dev.parent, "%s %s %s: %pV", dev_driver_string(ibdev->dev.parent), dev_name(ibdev->dev.parent), dev_name(&ibdev->dev), vaf); else if (ibdev) printk("%s%s: %pV", level, dev_name(&ibdev->dev), vaf); else printk("%s(NULL ib_device): %pV", level, vaf); } void ibdev_printk(const char *level, const struct ib_device *ibdev, const char *format, ...) { struct va_format vaf; va_list args; va_start(args, format); vaf.fmt = format; vaf.va = &args; __ibdev_printk(level, ibdev, &vaf); va_end(args); } EXPORT_SYMBOL(ibdev_printk); #define define_ibdev_printk_level(func, level) \ void func(const struct ib_device *ibdev, const char *fmt, ...) 
\ { \ struct va_format vaf; \ va_list args; \ \ va_start(args, fmt); \ \ vaf.fmt = fmt; \ vaf.va = &args; \ \ __ibdev_printk(level, ibdev, &vaf); \ \ va_end(args); \ } \ EXPORT_SYMBOL(func); define_ibdev_printk_level(ibdev_emerg, KERN_EMERG); define_ibdev_printk_level(ibdev_alert, KERN_ALERT); define_ibdev_printk_level(ibdev_crit, KERN_CRIT); define_ibdev_printk_level(ibdev_err, KERN_ERR); define_ibdev_printk_level(ibdev_warn, KERN_WARNING); define_ibdev_printk_level(ibdev_notice, KERN_NOTICE); define_ibdev_printk_level(ibdev_info, KERN_INFO); static struct notifier_block ibdev_lsm_nb = { .notifier_call = ib_security_change, }; static int rdma_dev_change_netns(struct ib_device *device, struct net *cur_net, struct net *net); /* Pointer to the RCU head at the start of the ib_port_data array */ struct ib_port_data_rcu { struct rcu_head rcu_head; struct ib_port_data pdata[]; }; static void ib_device_check_mandatory(struct ib_device *device) { #define IB_MANDATORY_FUNC(x) { offsetof(struct ib_device_ops, x), #x } static const struct { size_t offset; char *name; } mandatory_table[] = { IB_MANDATORY_FUNC(query_device), IB_MANDATORY_FUNC(query_port), IB_MANDATORY_FUNC(alloc_pd), IB_MANDATORY_FUNC(dealloc_pd), IB_MANDATORY_FUNC(create_qp), IB_MANDATORY_FUNC(modify_qp), IB_MANDATORY_FUNC(destroy_qp), IB_MANDATORY_FUNC(post_send), IB_MANDATORY_FUNC(post_recv), IB_MANDATORY_FUNC(create_cq), IB_MANDATORY_FUNC(destroy_cq), IB_MANDATORY_FUNC(poll_cq), IB_MANDATORY_FUNC(req_notify_cq), IB_MANDATORY_FUNC(get_dma_mr), IB_MANDATORY_FUNC(reg_user_mr), IB_MANDATORY_FUNC(dereg_mr), IB_MANDATORY_FUNC(get_port_immutable) }; int i; device->kverbs_provider = true; for (i = 0; i < ARRAY_SIZE(mandatory_table); ++i) { if (!*(void **) ((void *) &device->ops + mandatory_table[i].offset)) { device->kverbs_provider = false; break; } } } /* * Caller must perform ib_device_put() to return the device reference count * when ib_device_get_by_index() returns valid device pointer. */ struct ib_device *ib_device_get_by_index(const struct net *net, u32 index) { struct ib_device *device; down_read(&devices_rwsem); device = xa_load(&devices, index); if (device) { if (!rdma_dev_access_netns(device, net)) { device = NULL; goto out; } if (!ib_device_try_get(device)) device = NULL; } out: up_read(&devices_rwsem); return device; } /** * ib_device_put - Release IB device reference * @device: device whose reference to be released * * ib_device_put() releases reference to the IB device to allow it to be * unregistered and eventually free. */ void ib_device_put(struct ib_device *device) { if (refcount_dec_and_test(&device->refcount)) complete(&device->unreg_completion); } EXPORT_SYMBOL(ib_device_put); static struct ib_device *__ib_device_get_by_name(const char *name) { struct ib_device *device; unsigned long index; xa_for_each (&devices, index, device) if (!strcmp(name, dev_name(&device->dev))) return device; return NULL; } /** * ib_device_get_by_name - Find an IB device by name * @name: The name to look for * @driver_id: The driver ID that must match (RDMA_DRIVER_UNKNOWN matches all) * * Find and hold an ib_device by its name. The caller must call * ib_device_put() on the returned pointer. 
*/ struct ib_device *ib_device_get_by_name(const char *name, enum rdma_driver_id driver_id) { struct ib_device *device; down_read(&devices_rwsem); device = __ib_device_get_by_name(name); if (device && driver_id != RDMA_DRIVER_UNKNOWN && device->ops.driver_id != driver_id) device = NULL; if (device) { if (!ib_device_try_get(device)) device = NULL; } up_read(&devices_rwsem); return device; } EXPORT_SYMBOL(ib_device_get_by_name); static int rename_compat_devs(struct ib_device *device) { struct ib_core_device *cdev; unsigned long index; int ret = 0; mutex_lock(&device->compat_devs_mutex); xa_for_each (&device->compat_devs, index, cdev) { ret = device_rename(&cdev->dev, dev_name(&device->dev)); if (ret) { dev_warn(&cdev->dev, "Fail to rename compatdev to new name %s\n", dev_name(&device->dev)); break; } } mutex_unlock(&device->compat_devs_mutex); return ret; } int ib_device_rename(struct ib_device *ibdev, const char *name) { unsigned long index; void *client_data; int ret; down_write(&devices_rwsem); if (!strcmp(name, dev_name(&ibdev->dev))) { up_write(&devices_rwsem); return 0; } if (__ib_device_get_by_name(name)) { up_write(&devices_rwsem); return -EEXIST; } ret = device_rename(&ibdev->dev, name); if (ret) { up_write(&devices_rwsem); return ret; } strscpy(ibdev->name, name, IB_DEVICE_NAME_MAX); ret = rename_compat_devs(ibdev); downgrade_write(&devices_rwsem); down_read(&ibdev->client_data_rwsem); xan_for_each_marked(&ibdev->client_data, index, client_data, CLIENT_DATA_REGISTERED) { struct ib_client *client = xa_load(&clients, index); if (!client || !client->rename) continue; client->rename(ibdev, client_data); } up_read(&ibdev->client_data_rwsem); up_read(&devices_rwsem); return 0; } int ib_device_set_dim(struct ib_device *ibdev, u8 use_dim) { if (use_dim > 1) return -EINVAL; ibdev->use_cq_dim = use_dim; return 0; } static int alloc_name(struct ib_device *ibdev, const char *name) { struct ib_device *device; unsigned long index; struct ida inuse; int rc; int i; lockdep_assert_held_write(&devices_rwsem); ida_init(&inuse); xa_for_each (&devices, index, device) { char buf[IB_DEVICE_NAME_MAX]; if (sscanf(dev_name(&device->dev), name, &i) != 1) continue; if (i < 0 || i >= INT_MAX) continue; snprintf(buf, sizeof buf, name, i); if (strcmp(buf, dev_name(&device->dev)) != 0) continue; rc = ida_alloc_range(&inuse, i, i, GFP_KERNEL); if (rc < 0) goto out; } rc = ida_alloc(&inuse, GFP_KERNEL); if (rc < 0) goto out; rc = dev_set_name(&ibdev->dev, name, rc); out: ida_destroy(&inuse); return rc; } static void ib_device_release(struct device *device) { struct ib_device *dev = container_of(device, struct ib_device, dev); free_netdevs(dev); WARN_ON(refcount_read(&dev->refcount)); if (dev->hw_stats_data) ib_device_release_hw_stats(dev->hw_stats_data); if (dev->port_data) { ib_cache_release_one(dev); ib_security_release_port_pkey_list(dev); rdma_counter_release(dev); kfree_rcu(container_of(dev->port_data, struct ib_port_data_rcu, pdata[0]), rcu_head); } mutex_destroy(&dev->subdev_lock); mutex_destroy(&dev->unregistration_lock); mutex_destroy(&dev->compat_devs_mutex); xa_destroy(&dev->compat_devs); xa_destroy(&dev->client_data); kfree_rcu(dev, rcu_head); } static int ib_device_uevent(const struct device *device, struct kobj_uevent_env *env) { if (add_uevent_var(env, "NAME=%s", dev_name(device))) return -ENOMEM; /* * It would be nice to pass the node GUID with the event... 
*/ return 0; } static const void *net_namespace(const struct device *d) { const struct ib_core_device *coredev = container_of(d, struct ib_core_device, dev); return read_pnet(&coredev->rdma_net); } static struct class ib_class = { .name = "infiniband", .dev_release = ib_device_release, .dev_uevent = ib_device_uevent, .ns_type = &net_ns_type_operations, .namespace = net_namespace, }; static void rdma_init_coredev(struct ib_core_device *coredev, struct ib_device *dev, struct net *net) { /* This BUILD_BUG_ON is intended to catch layout change * of union of ib_core_device and device. * dev must be the first element as ib_core and providers * driver uses it. Adding anything in ib_core_device before * device will break this assumption. */ BUILD_BUG_ON(offsetof(struct ib_device, coredev.dev) != offsetof(struct ib_device, dev)); coredev->dev.class = &ib_class; coredev->dev.groups = dev->groups; device_initialize(&coredev->dev); coredev->owner = dev; INIT_LIST_HEAD(&coredev->port_list); write_pnet(&coredev->rdma_net, net); } /** * _ib_alloc_device - allocate an IB device struct * @size:size of structure to allocate * * Low-level drivers should use ib_alloc_device() to allocate &struct * ib_device. @size is the size of the structure to be allocated, * including any private data used by the low-level driver. * ib_dealloc_device() must be used to free structures allocated with * ib_alloc_device(). */ struct ib_device *_ib_alloc_device(size_t size) { struct ib_device *device; unsigned int i; if (WARN_ON(size < sizeof(struct ib_device))) return NULL; device = kzalloc(size, GFP_KERNEL); if (!device) return NULL; if (rdma_restrack_init(device)) { kfree(device); return NULL; } rdma_init_coredev(&device->coredev, device, &init_net); INIT_LIST_HEAD(&device->event_handler_list); spin_lock_init(&device->qp_open_list_lock); init_rwsem(&device->event_handler_rwsem); mutex_init(&device->unregistration_lock); /* * client_data needs to be alloc because we don't want our mark to be * destroyed if the user stores NULL in the client data. 
*/ xa_init_flags(&device->client_data, XA_FLAGS_ALLOC); init_rwsem(&device->client_data_rwsem); xa_init_flags(&device->compat_devs, XA_FLAGS_ALLOC); mutex_init(&device->compat_devs_mutex); init_completion(&device->unreg_completion); INIT_WORK(&device->unregistration_work, ib_unregister_work); spin_lock_init(&device->cq_pools_lock); for (i = 0; i < ARRAY_SIZE(device->cq_pools); i++) INIT_LIST_HEAD(&device->cq_pools[i]); rwlock_init(&device->cache_lock); device->uverbs_cmd_mask = BIT_ULL(IB_USER_VERBS_CMD_ALLOC_MW) | BIT_ULL(IB_USER_VERBS_CMD_ALLOC_PD) | BIT_ULL(IB_USER_VERBS_CMD_ATTACH_MCAST) | BIT_ULL(IB_USER_VERBS_CMD_CLOSE_XRCD) | BIT_ULL(IB_USER_VERBS_CMD_CREATE_AH) | BIT_ULL(IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | BIT_ULL(IB_USER_VERBS_CMD_CREATE_CQ) | BIT_ULL(IB_USER_VERBS_CMD_CREATE_QP) | BIT_ULL(IB_USER_VERBS_CMD_CREATE_SRQ) | BIT_ULL(IB_USER_VERBS_CMD_CREATE_XSRQ) | BIT_ULL(IB_USER_VERBS_CMD_DEALLOC_MW) | BIT_ULL(IB_USER_VERBS_CMD_DEALLOC_PD) | BIT_ULL(IB_USER_VERBS_CMD_DEREG_MR) | BIT_ULL(IB_USER_VERBS_CMD_DESTROY_AH) | BIT_ULL(IB_USER_VERBS_CMD_DESTROY_CQ) | BIT_ULL(IB_USER_VERBS_CMD_DESTROY_QP) | BIT_ULL(IB_USER_VERBS_CMD_DESTROY_SRQ) | BIT_ULL(IB_USER_VERBS_CMD_DETACH_MCAST) | BIT_ULL(IB_USER_VERBS_CMD_GET_CONTEXT) | BIT_ULL(IB_USER_VERBS_CMD_MODIFY_QP) | BIT_ULL(IB_USER_VERBS_CMD_MODIFY_SRQ) | BIT_ULL(IB_USER_VERBS_CMD_OPEN_QP) | BIT_ULL(IB_USER_VERBS_CMD_OPEN_XRCD) | BIT_ULL(IB_USER_VERBS_CMD_QUERY_DEVICE) | BIT_ULL(IB_USER_VERBS_CMD_QUERY_PORT) | BIT_ULL(IB_USER_VERBS_CMD_QUERY_QP) | BIT_ULL(IB_USER_VERBS_CMD_QUERY_SRQ) | BIT_ULL(IB_USER_VERBS_CMD_REG_MR) | BIT_ULL(IB_USER_VERBS_CMD_REREG_MR) | BIT_ULL(IB_USER_VERBS_CMD_RESIZE_CQ); mutex_init(&device->subdev_lock); INIT_LIST_HEAD(&device->subdev_list_head); INIT_LIST_HEAD(&device->subdev_list); return device; } EXPORT_SYMBOL(_ib_alloc_device); /** * ib_dealloc_device - free an IB device struct * @device:structure to free * * Free a structure allocated with ib_alloc_device(). */ void ib_dealloc_device(struct ib_device *device) { if (device->ops.dealloc_driver) device->ops.dealloc_driver(device); /* * ib_unregister_driver() requires all devices to remain in the xarray * while their ops are callable. The last op we call is dealloc_driver * above. This is needed to create a fence on op callbacks prior to * allowing the driver module to unload. */ down_write(&devices_rwsem); if (xa_load(&devices, device->index) == device) xa_erase(&devices, device->index); up_write(&devices_rwsem); /* Expedite releasing netdev references */ free_netdevs(device); WARN_ON(!xa_empty(&device->compat_devs)); WARN_ON(!xa_empty(&device->client_data)); WARN_ON(refcount_read(&device->refcount)); rdma_restrack_clean(device); /* Balances with device_initialize */ put_device(&device->dev); } EXPORT_SYMBOL(ib_dealloc_device); /* * add_client_context() and remove_client_context() must be safe against * parallel calls on the same device - registration/unregistration of both the * device and client can be occurring in parallel. * * The routines need to be a fence, any caller must not return until the add * or remove is fully completed. */ static int add_client_context(struct ib_device *device, struct ib_client *client) { int ret = 0; if (!device->kverbs_provider && !client->no_kverbs_req) return 0; down_write(&device->client_data_rwsem); /* * So long as the client is registered hold both the client and device * unregistration locks. 
*/ if (!refcount_inc_not_zero(&client->uses)) goto out_unlock; refcount_inc(&device->refcount); /* * Another caller to add_client_context got here first and has already * completely initialized context. */ if (xa_get_mark(&device->client_data, client->client_id, CLIENT_DATA_REGISTERED)) goto out; ret = xa_err(xa_store(&device->client_data, client->client_id, NULL, GFP_KERNEL)); if (ret) goto out; downgrade_write(&device->client_data_rwsem); if (client->add) { if (client->add(device)) { /* * If a client fails to add then the error code is * ignored, but we won't call any more ops on this * client. */ xa_erase(&device->client_data, client->client_id); up_read(&device->client_data_rwsem); ib_device_put(device); ib_client_put(client); return 0; } } /* Readers shall not see a client until add has been completed */ xa_set_mark(&device->client_data, client->client_id, CLIENT_DATA_REGISTERED); up_read(&device->client_data_rwsem); return 0; out: ib_device_put(device); ib_client_put(client); out_unlock: up_write(&device->client_data_rwsem); return ret; } static void remove_client_context(struct ib_device *device, unsigned int client_id) { struct ib_client *client; void *client_data; down_write(&device->client_data_rwsem); if (!xa_get_mark(&device->client_data, client_id, CLIENT_DATA_REGISTERED)) { up_write(&device->client_data_rwsem); return; } client_data = xa_load(&device->client_data, client_id); xa_clear_mark(&device->client_data, client_id, CLIENT_DATA_REGISTERED); client = xa_load(&clients, client_id); up_write(&device->client_data_rwsem); /* * Notice we cannot be holding any exclusive locks when calling the * remove callback as the remove callback can recurse back into any * public functions in this module and thus try for any locks those * functions take. * * For this reason clients and drivers should not call the * unregistration functions while holding any locks. */ if (client->remove) client->remove(device, client_data); xa_erase(&device->client_data, client_id); ib_device_put(device); ib_client_put(client); } static int alloc_port_data(struct ib_device *device) { struct ib_port_data_rcu *pdata_rcu; u32 port; if (device->port_data) return 0; /* This can only be called once the physical port range is defined */ if (WARN_ON(!device->phys_port_cnt)) return -EINVAL; /* Reserve U32_MAX so the logic to go over all the ports is sane */ if (WARN_ON(device->phys_port_cnt == U32_MAX)) return -EINVAL; /* * device->port_data is indexed directly by the port number to make * access to this data as efficient as possible. * * Therefore port_data is declared as a 1-based array with potential * empty slots at the beginning. */ pdata_rcu = kzalloc(struct_size(pdata_rcu, pdata, size_add(rdma_end_port(device), 1)), GFP_KERNEL); if (!pdata_rcu) return -ENOMEM; /* * The rcu_head is put in front of the port data array and the stored * pointer is adjusted since we never need to see that member until * kfree_rcu.
*/ device->port_data = pdata_rcu->pdata; rdma_for_each_port (device, port) { struct ib_port_data *pdata = &device->port_data[port]; pdata->ib_dev = device; spin_lock_init(&pdata->pkey_list_lock); INIT_LIST_HEAD(&pdata->pkey_list); spin_lock_init(&pdata->netdev_lock); INIT_HLIST_NODE(&pdata->ndev_hash_link); } return 0; } static int verify_immutable(const struct ib_device *dev, u32 port) { return WARN_ON(!rdma_cap_ib_mad(dev, port) && rdma_max_mad_size(dev, port) != 0); } static int setup_port_data(struct ib_device *device) { u32 port; int ret; ret = alloc_port_data(device); if (ret) return ret; rdma_for_each_port (device, port) { struct ib_port_data *pdata = &device->port_data[port]; ret = device->ops.get_port_immutable(device, port, &pdata->immutable); if (ret) return ret; if (verify_immutable(device, port)) return -EINVAL; } return 0; } /** * ib_port_immutable_read() - Read rdma port's immutable data * @dev: IB device * @port: port number whose immutable data to read. It starts at index 1 and * is valid up to and including rdma_end_port(). */ const struct ib_port_immutable* ib_port_immutable_read(struct ib_device *dev, unsigned int port) { WARN_ON(!rdma_is_port_valid(dev, port)); return &dev->port_data[port].immutable; } EXPORT_SYMBOL(ib_port_immutable_read); void ib_get_device_fw_str(struct ib_device *dev, char *str) { if (dev->ops.get_dev_fw_str) dev->ops.get_dev_fw_str(dev, str); else str[0] = '\0'; } EXPORT_SYMBOL(ib_get_device_fw_str); static void ib_policy_change_task(struct work_struct *work) { struct ib_device *dev; unsigned long index; down_read(&devices_rwsem); xa_for_each_marked (&devices, index, dev, DEVICE_REGISTERED) { unsigned int i; rdma_for_each_port (dev, i) { u64 sp; ib_get_cached_subnet_prefix(dev, i, &sp); ib_security_cache_change(dev, i, sp); } } up_read(&devices_rwsem); } static int ib_security_change(struct notifier_block *nb, unsigned long event, void *lsm_data) { if (event != LSM_POLICY_CHANGE) return NOTIFY_DONE; schedule_work(&ib_policy_change_work); ib_mad_agent_security_change(); return NOTIFY_OK; } static void compatdev_release(struct device *dev) { struct ib_core_device *cdev = container_of(dev, struct ib_core_device, dev); kfree(cdev); } static int add_one_compat_dev(struct ib_device *device, struct rdma_dev_net *rnet) { struct ib_core_device *cdev; int ret; lockdep_assert_held(&rdma_nets_rwsem); if (!ib_devices_shared_netns) return 0; /* * Create and add compat device in all namespaces other than the one it * is currently bound to. */ if (net_eq(read_pnet(&rnet->net), read_pnet(&device->coredev.rdma_net))) return 0; /* * The first of init_net() or ib_register_device() to take the * compat_devs_mutex wins and gets to add the device. Others will wait * for completion here.
*/ mutex_lock(&device->compat_devs_mutex); cdev = xa_load(&device->compat_devs, rnet->id); if (cdev) { ret = 0; goto done; } ret = xa_reserve(&device->compat_devs, rnet->id, GFP_KERNEL); if (ret) goto done; cdev = kzalloc(sizeof(*cdev), GFP_KERNEL); if (!cdev) { ret = -ENOMEM; goto cdev_err; } cdev->dev.parent = device->dev.parent; rdma_init_coredev(cdev, device, read_pnet(&rnet->net)); cdev->dev.release = compatdev_release; ret = dev_set_name(&cdev->dev, "%s", dev_name(&device->dev)); if (ret) goto add_err; ret = device_add(&cdev->dev); if (ret) goto add_err; ret = ib_setup_port_attrs(cdev); if (ret) goto port_err; ret = xa_err(xa_store(&device->compat_devs, rnet->id, cdev, GFP_KERNEL)); if (ret) goto insert_err; mutex_unlock(&device->compat_devs_mutex); return 0; insert_err: ib_free_port_attrs(cdev); port_err: device_del(&cdev->dev); add_err: put_device(&cdev->dev); cdev_err: xa_release(&device->compat_devs, rnet->id); done: mutex_unlock(&device->compat_devs_mutex); return ret; } static void remove_one_compat_dev(struct ib_device *device, u32 id) { struct ib_core_device *cdev; mutex_lock(&device->compat_devs_mutex); cdev = xa_erase(&device->compat_devs, id); mutex_unlock(&device->compat_devs_mutex); if (cdev) { ib_free_port_attrs(cdev); device_del(&cdev->dev); put_device(&cdev->dev); } } static void remove_compat_devs(struct ib_device *device) { struct ib_core_device *cdev; unsigned long index; xa_for_each (&device->compat_devs, index, cdev) remove_one_compat_dev(device, index); } static int add_compat_devs(struct ib_device *device) { struct rdma_dev_net *rnet; unsigned long index; int ret = 0; lockdep_assert_held(&devices_rwsem); down_read(&rdma_nets_rwsem); xa_for_each (&rdma_nets, index, rnet) { ret = add_one_compat_dev(device, rnet); if (ret) break; } up_read(&rdma_nets_rwsem); return ret; } static void remove_all_compat_devs(void) { struct ib_compat_device *cdev; struct ib_device *dev; unsigned long index; down_read(&devices_rwsem); xa_for_each (&devices, index, dev) { unsigned long c_index = 0; /* Hold nets_rwsem so that any other thread modifying this * system param can sync with this thread. */ down_read(&rdma_nets_rwsem); xa_for_each (&dev->compat_devs, c_index, cdev) remove_one_compat_dev(dev, c_index); up_read(&rdma_nets_rwsem); } up_read(&devices_rwsem); } static int add_all_compat_devs(void) { struct rdma_dev_net *rnet; struct ib_device *dev; unsigned long index; int ret = 0; down_read(&devices_rwsem); xa_for_each_marked (&devices, index, dev, DEVICE_REGISTERED) { unsigned long net_index = 0; /* Hold nets_rwsem so that any other thread modifying this * system param can sync with this thread. */ down_read(&rdma_nets_rwsem); xa_for_each (&rdma_nets, net_index, rnet) { ret = add_one_compat_dev(dev, rnet); if (ret) break; } up_read(&rdma_nets_rwsem); } up_read(&devices_rwsem); if (ret) remove_all_compat_devs(); return ret; } int rdma_compatdev_set(u8 enable) { struct rdma_dev_net *rnet; unsigned long index; int ret = 0; down_write(&rdma_nets_rwsem); if (ib_devices_shared_netns == enable) { up_write(&rdma_nets_rwsem); return 0; } /* enable/disable of compat devices is not supported * when more than default init_net exists. 
*/ xa_for_each (&rdma_nets, index, rnet) { ret++; break; } if (!ret) ib_devices_shared_netns = enable; up_write(&rdma_nets_rwsem); if (ret) return -EBUSY; if (enable) ret = add_all_compat_devs(); else remove_all_compat_devs(); return ret; } static void rdma_dev_exit_net(struct net *net) { struct rdma_dev_net *rnet = rdma_net_to_dev_net(net); struct ib_device *dev; unsigned long index; int ret; down_write(&rdma_nets_rwsem); /* * Prevent the ID from being re-used and hide the id from xa_for_each. */ ret = xa_err(xa_store(&rdma_nets, rnet->id, NULL, GFP_KERNEL)); WARN_ON(ret); up_write(&rdma_nets_rwsem); down_read(&devices_rwsem); xa_for_each (&devices, index, dev) { get_device(&dev->dev); /* * Release the devices_rwsem so that the potentially blocking * device_del doesn't hold the devices_rwsem for too long. */ up_read(&devices_rwsem); remove_one_compat_dev(dev, rnet->id); /* * If the real device is in the NS then move it back to init. */ rdma_dev_change_netns(dev, net, &init_net); put_device(&dev->dev); down_read(&devices_rwsem); } up_read(&devices_rwsem); rdma_nl_net_exit(rnet); xa_erase(&rdma_nets, rnet->id); } static __net_init int rdma_dev_init_net(struct net *net) { struct rdma_dev_net *rnet = rdma_net_to_dev_net(net); unsigned long index; struct ib_device *dev; int ret; write_pnet(&rnet->net, net); ret = rdma_nl_net_init(rnet); if (ret) return ret; /* No need to create any compat devices in default init_net. */ if (net_eq(net, &init_net)) return 0; ret = xa_alloc(&rdma_nets, &rnet->id, rnet, xa_limit_32b, GFP_KERNEL); if (ret) { rdma_nl_net_exit(rnet); return ret; } down_read(&devices_rwsem); xa_for_each_marked (&devices, index, dev, DEVICE_REGISTERED) { /* Hold nets_rwsem so that netlink command cannot change * system configuration for device sharing mode. */ down_read(&rdma_nets_rwsem); ret = add_one_compat_dev(dev, rnet); up_read(&rdma_nets_rwsem); if (ret) break; } up_read(&devices_rwsem); if (ret) rdma_dev_exit_net(net); return ret; } /* * Assign the unique string device name and the unique device index. This is * undone by ib_dealloc_device. */ static int assign_name(struct ib_device *device, const char *name) { static u32 last_id; int ret; down_write(&devices_rwsem); /* Assign a unique name to the device */ if (strchr(name, '%')) ret = alloc_name(device, name); else ret = dev_set_name(&device->dev, name); if (ret) goto out; if (__ib_device_get_by_name(dev_name(&device->dev))) { ret = -ENFILE; goto out; } strscpy(device->name, dev_name(&device->dev), IB_DEVICE_NAME_MAX); ret = xa_alloc_cyclic(&devices, &device->index, device, xa_limit_31b, &last_id, GFP_KERNEL); if (ret > 0) ret = 0; out: up_write(&devices_rwsem); return ret; } /* * setup_device() allocates memory and sets up data that requires calling the * device ops; this is the only reason these actions are not done during * ib_alloc_device. It is undone by ib_dealloc_device().
*/ static int setup_device(struct ib_device *device) { struct ib_udata uhw = {.outlen = 0, .inlen = 0}; int ret; ib_device_check_mandatory(device); ret = setup_port_data(device); if (ret) { dev_warn(&device->dev, "Couldn't create per-port data\n"); return ret; } memset(&device->attrs, 0, sizeof(device->attrs)); ret = device->ops.query_device(device, &device->attrs, &uhw); if (ret) { dev_warn(&device->dev, "Couldn't query the device attributes\n"); return ret; } return 0; } static void disable_device(struct ib_device *device) { u32 cid; WARN_ON(!refcount_read(&device->refcount)); down_write(&devices_rwsem); xa_clear_mark(&devices, device->index, DEVICE_REGISTERED); up_write(&devices_rwsem); /* * Remove clients in LIFO order, see assign_client_id. This could be * more efficient if xarray learns to reverse iterate. Since no new * clients can be added to this ib_device past this point we only need * the maximum possible client_id value here. */ down_read(&clients_rwsem); cid = highest_client_id; up_read(&clients_rwsem); while (cid) { cid--; remove_client_context(device, cid); } ib_cq_pool_cleanup(device); /* Pairs with refcount_set in enable_device */ ib_device_put(device); wait_for_completion(&device->unreg_completion); /* * compat devices must be removed after device refcount drops to zero. * Otherwise init_net() may add more compatdevs after removing compat * devices and before device is disabled. */ remove_compat_devs(device); } /* * An enabled device is visible to all clients and to all the public facing * APIs that return a device pointer. This always returns with a new get, even * if it fails. */ static int enable_device_and_get(struct ib_device *device) { struct ib_client *client; unsigned long index; int ret = 0; /* * One ref belongs to the xa and the other belongs to this * thread. This is needed to guard against parallel unregistration. */ refcount_set(&device->refcount, 2); down_write(&devices_rwsem); xa_set_mark(&devices, device->index, DEVICE_REGISTERED); /* * By using downgrade_write() we ensure that no other thread can clear * DEVICE_REGISTERED while we are completing the client setup. */ downgrade_write(&devices_rwsem); if (device->ops.enable_driver) { ret = device->ops.enable_driver(device); if (ret) goto out; } down_read(&clients_rwsem); xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) { ret = add_client_context(device, client); if (ret) break; } up_read(&clients_rwsem); if (!ret) ret = add_compat_devs(device); out: up_read(&devices_rwsem); return ret; } static void prevent_dealloc_device(struct ib_device *ib_dev) { } /** * ib_register_device - Register an IB device with IB core * @device: Device to register * @name: unique string device name. This may include a '%' which will * cause a unique index to be added to the passed device name. * @dma_device: pointer to a DMA-capable device. If %NULL, then the IB * device will be used. In this case the caller should fully * setup the ibdev for DMA. This usually means using dma_virt_ops. * * Low-level drivers use ib_register_device() to register their * devices with the IB core. All registered clients will receive a * callback for each device that is added. @device must be allocated * with ib_alloc_device(). * * If the driver uses ops.dealloc_driver and calls any ib_unregister_device() * asynchronously then the device pointer may become freed as soon as this * function returns. 
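 *
 * A hedged sketch of a typical probe-time caller (struct my_dev, its ibdev
 * member, my_dev_ops and pdev are hypothetical stand-ins for a real driver's
 * types, not taken from this file):
 *
 *	struct my_dev *mdev;
 *	int ret;
 *
 *	mdev = ib_alloc_device(my_dev, ibdev);
 *	if (!mdev)
 *		return -ENOMEM;
 *	ib_set_device_ops(&mdev->ibdev, &my_dev_ops);
 *	ret = ib_register_device(&mdev->ibdev, "my%d", &pdev->dev);
 *	if (ret)
 *		ib_dealloc_device(&mdev->ibdev);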
*/ int ib_register_device(struct ib_device *device, const char *name, struct device *dma_device) { int ret; ret = assign_name(device, name); if (ret) return ret; /* * If the caller does not provide a DMA capable device then the IB core * will set up ib_sge and scatterlist structures that stash the kernel * virtual address into the address field. */ WARN_ON(dma_device && !dma_device->dma_parms); device->dma_device = dma_device; ret = setup_device(device); if (ret) return ret; ret = ib_cache_setup_one(device); if (ret) { dev_warn(&device->dev, "Couldn't set up InfiniBand P_Key/GID cache\n"); return ret; } device->groups[0] = &ib_dev_attr_group; device->groups[1] = device->ops.device_group; ret = ib_setup_device_attrs(device); if (ret) goto cache_cleanup; ib_device_register_rdmacg(device); rdma_counter_init(device); /* * Ensure that ADD uevent is not fired because it * is too early and the device is not initialized yet. */ dev_set_uevent_suppress(&device->dev, true); ret = device_add(&device->dev); if (ret) goto cg_cleanup; ret = ib_setup_port_attrs(&device->coredev); if (ret) { dev_warn(&device->dev, "Couldn't register device with driver model\n"); goto dev_cleanup; } ret = enable_device_and_get(device); if (ret) { void (*dealloc_fn)(struct ib_device *); /* * If we hit this error flow then we don't want to * automatically dealloc the device since the caller is * expected to call ib_dealloc_device() after * ib_register_device() fails. This is tricky due to the * possibility for a parallel unregistration along with this * error flow. Since we have a refcount here we know any * parallel flow is stopped in disable_device and will see the * special dealloc_driver pointer, causing responsibility for * ib_dealloc_device() to revert back to this thread. */ dealloc_fn = device->ops.dealloc_driver; device->ops.dealloc_driver = prevent_dealloc_device; ib_device_put(device); __ib_unregister_device(device); device->ops.dealloc_driver = dealloc_fn; dev_set_uevent_suppress(&device->dev, false); return ret; } dev_set_uevent_suppress(&device->dev, false); /* Mark for userspace that device is ready */ kobject_uevent(&device->dev.kobj, KOBJ_ADD); ib_device_put(device); return 0; dev_cleanup: device_del(&device->dev); cg_cleanup: dev_set_uevent_suppress(&device->dev, false); ib_device_unregister_rdmacg(device); cache_cleanup: ib_cache_cleanup_one(device); return ret; } EXPORT_SYMBOL(ib_register_device); /* Callers must hold a get on the device. */ static void __ib_unregister_device(struct ib_device *ib_dev) { struct ib_device *sub, *tmp; mutex_lock(&ib_dev->subdev_lock); list_for_each_entry_safe_reverse(sub, tmp, &ib_dev->subdev_list_head, subdev_list) { list_del(&sub->subdev_list); ib_dev->ops.del_sub_dev(sub); ib_device_put(ib_dev); } mutex_unlock(&ib_dev->subdev_lock); /* * We have a registration lock so that all the calls to unregister are * fully fenced; once any unregister returns the device is truly * unregistered even if multiple callers are unregistering it at the * same time. This also interacts with the registration flow and * provides sane semantics if register and unregister are racing.
*/ mutex_lock(&ib_dev->unregistration_lock); if (!refcount_read(&ib_dev->refcount)) goto out; disable_device(ib_dev); /* Expedite removing unregistered pointers from the hash table */ free_netdevs(ib_dev); ib_free_port_attrs(&ib_dev->coredev); device_del(&ib_dev->dev); ib_device_unregister_rdmacg(ib_dev); ib_cache_cleanup_one(ib_dev); /* * Drivers using the new flow may not call ib_dealloc_device except * in error unwind prior to registration success. */ if (ib_dev->ops.dealloc_driver && ib_dev->ops.dealloc_driver != prevent_dealloc_device) { WARN_ON(kref_read(&ib_dev->dev.kobj.kref) <= 1); ib_dealloc_device(ib_dev); } out: mutex_unlock(&ib_dev->unregistration_lock); } /** * ib_unregister_device - Unregister an IB device * @ib_dev: The device to unregister * * Unregister an IB device. All clients will receive a remove callback. * * Callers should call this routine only once, and protect against races with * registration. Typically it should only be called as part of a remove * callback in an implementation of driver core's struct device_driver and * related. * * If ops.dealloc_driver is used then ib_dev will be freed upon return from * this function. */ void ib_unregister_device(struct ib_device *ib_dev) { get_device(&ib_dev->dev); __ib_unregister_device(ib_dev); put_device(&ib_dev->dev); } EXPORT_SYMBOL(ib_unregister_device); /** * ib_unregister_device_and_put - Unregister a device while holding a 'get' * @ib_dev: The device to unregister * * This is the same as ib_unregister_device(), except it includes an internal * ib_device_put() that should match a 'get' obtained by the caller. * * It is safe to call this routine concurrently from multiple threads while * holding the 'get'. When the function returns the device is fully * unregistered. * * Drivers using this flow MUST use the driver_unregister callback to clean up * their resources associated with the device and dealloc it. */ void ib_unregister_device_and_put(struct ib_device *ib_dev) { WARN_ON(!ib_dev->ops.dealloc_driver); get_device(&ib_dev->dev); ib_device_put(ib_dev); __ib_unregister_device(ib_dev); put_device(&ib_dev->dev); } EXPORT_SYMBOL(ib_unregister_device_and_put); /** * ib_unregister_driver - Unregister all IB devices for a driver * @driver_id: The driver to unregister * * This implements a fence for device unregistration. It only returns once all * devices associated with the driver_id have fully completed their * unregistration and returned from ib_unregister_device*(). * * If devices are not yet unregistered it goes ahead and starts unregistering * them. * * This does not block creation of new devices with the given driver_id; that * is the responsibility of the caller. */ void ib_unregister_driver(enum rdma_driver_id driver_id) { struct ib_device *ib_dev; unsigned long index; down_read(&devices_rwsem); xa_for_each (&devices, index, ib_dev) { if (ib_dev->ops.driver_id != driver_id) continue; get_device(&ib_dev->dev); up_read(&devices_rwsem); WARN_ON(!ib_dev->ops.dealloc_driver); __ib_unregister_device(ib_dev); put_device(&ib_dev->dev); down_read(&devices_rwsem); } up_read(&devices_rwsem); } EXPORT_SYMBOL(ib_unregister_driver); static void ib_unregister_work(struct work_struct *work) { struct ib_device *ib_dev = container_of(work, struct ib_device, unregistration_work); __ib_unregister_device(ib_dev); put_device(&ib_dev->dev); } /** * ib_unregister_device_queued - Unregister a device using a work queue * @ib_dev: The device to unregister * * This schedules an asynchronous unregistration using a WQ for the device.
A * driver should use this to avoid holding locks while doing unregistration, * such as holding the RTNL lock. * * Drivers using this API must use ib_unregister_driver before module unload * to ensure that all scheduled unregistrations have completed. */ void ib_unregister_device_queued(struct ib_device *ib_dev) { WARN_ON(!refcount_read(&ib_dev->refcount)); WARN_ON(!ib_dev->ops.dealloc_driver); get_device(&ib_dev->dev); if (!queue_work(ib_unreg_wq, &ib_dev->unregistration_work)) put_device(&ib_dev->dev); } EXPORT_SYMBOL(ib_unregister_device_queued); /* * The caller must pass in a device that has the kref held and the refcount * released. If the device is in cur_net and still registered then it is moved * into net. */ static int rdma_dev_change_netns(struct ib_device *device, struct net *cur_net, struct net *net) { int ret2 = -EINVAL; int ret; mutex_lock(&device->unregistration_lock); /* * If a device not under ib_device_get() or if the unregistration_lock * is not held, the namespace can be changed, or it can be unregistered. * Check again under the lock. */ if (refcount_read(&device->refcount) == 0 || !net_eq(cur_net, read_pnet(&device->coredev.rdma_net))) { ret = -ENODEV; goto out; } kobject_uevent(&device->dev.kobj, KOBJ_REMOVE); disable_device(device); /* * At this point no one can be using the device, so it is safe to * change the namespace. */ write_pnet(&device->coredev.rdma_net, net); down_read(&devices_rwsem); /* * Currently rdma devices are system wide unique. So the device name * is guaranteed free in the new namespace. Publish the new namespace * at the sysfs level. */ ret = device_rename(&device->dev, dev_name(&device->dev)); up_read(&devices_rwsem); if (ret) { dev_warn(&device->dev, "%s: Couldn't rename device after namespace change\n", __func__); /* Try and put things back and re-enable the device */ write_pnet(&device->coredev.rdma_net, cur_net); } ret2 = enable_device_and_get(device); if (ret2) { /* * This shouldn't really happen, but if it does, let the user * retry at later point. So don't disable the device. */ dev_warn(&device->dev, "%s: Couldn't re-enable device after namespace change\n", __func__); } kobject_uevent(&device->dev.kobj, KOBJ_ADD); ib_device_put(device); out: mutex_unlock(&device->unregistration_lock); if (ret) return ret; return ret2; } int ib_device_set_netns_put(struct sk_buff *skb, struct ib_device *dev, u32 ns_fd) { struct net *net; int ret; net = get_net_ns_by_fd(ns_fd); if (IS_ERR(net)) { ret = PTR_ERR(net); goto net_err; } if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) { ret = -EPERM; goto ns_err; } /* * All the ib_clients, including uverbs, are reset when the namespace is * changed and this cannot be blocked waiting for userspace to do * something, so disassociation is mandatory. */ if (!dev->ops.disassociate_ucontext || ib_devices_shared_netns) { ret = -EOPNOTSUPP; goto ns_err; } get_device(&dev->dev); ib_device_put(dev); ret = rdma_dev_change_netns(dev, current->nsproxy->net_ns, net); put_device(&dev->dev); put_net(net); return ret; ns_err: put_net(net); net_err: ib_device_put(dev); return ret; } static struct pernet_operations rdma_dev_net_ops = { .init = rdma_dev_init_net, .exit = rdma_dev_exit_net, .id = &rdma_dev_net_id, .size = sizeof(struct rdma_dev_net), }; static int assign_client_id(struct ib_client *client) { int ret; lockdep_assert_held(&clients_rwsem); /* * The add/remove callbacks must be called in FIFO/LIFO order. To * achieve this we assign client_ids so they are sorted in * registration order. 
*/ client->client_id = highest_client_id; ret = xa_insert(&clients, client->client_id, client, GFP_KERNEL); if (ret) return ret; highest_client_id++; xa_set_mark(&clients, client->client_id, CLIENT_REGISTERED); return 0; } static void remove_client_id(struct ib_client *client) { down_write(&clients_rwsem); xa_erase(&clients, client->client_id); for (; highest_client_id; highest_client_id--) if (xa_load(&clients, highest_client_id - 1)) break; up_write(&clients_rwsem); } /** * ib_register_client - Register an IB client * @client:Client to register * * Upper level users of the IB drivers can use ib_register_client() to * register callbacks for IB device addition and removal. When an IB * device is added, each registered client's add method will be called * (in the order the clients were registered), and when a device is * removed, each client's remove method will be called (in the reverse * order that clients were registered). In addition, when * ib_register_client() is called, the client will receive an add * callback for all devices already registered. */ int ib_register_client(struct ib_client *client) { struct ib_device *device; unsigned long index; bool need_unreg = false; int ret; refcount_set(&client->uses, 1); init_completion(&client->uses_zero); /* * The devices_rwsem is held in write mode to ensure that a racing * ib_register_device() sees a consistent view of clients and devices. */ down_write(&devices_rwsem); down_write(&clients_rwsem); ret = assign_client_id(client); if (ret) goto out; need_unreg = true; xa_for_each_marked (&devices, index, device, DEVICE_REGISTERED) { ret = add_client_context(device, client); if (ret) goto out; } ret = 0; out: up_write(&clients_rwsem); up_write(&devices_rwsem); if (need_unreg && ret) ib_unregister_client(client); return ret; } EXPORT_SYMBOL(ib_register_client); /** * ib_unregister_client - Unregister an IB client * @client:Client to unregister * * Upper level users use ib_unregister_client() to remove their client * registration. When ib_unregister_client() is called, the client * will receive a remove callback for each IB device still registered. * * This is a full fence; once it returns no client callbacks will be called, * or are running in another thread. */ void ib_unregister_client(struct ib_client *client) { struct ib_device *device; unsigned long index; down_write(&clients_rwsem); ib_client_put(client); xa_clear_mark(&clients, client->client_id, CLIENT_REGISTERED); up_write(&clients_rwsem); /* We do not want to have locks while calling client->remove() */ rcu_read_lock(); xa_for_each (&devices, index, device) { if (!ib_device_try_get(device)) continue; rcu_read_unlock(); remove_client_context(device, client->client_id); ib_device_put(device); rcu_read_lock(); } rcu_read_unlock(); /* * remove_client_context() is not a fence, it can return even though a * removal is ongoing. Wait until all removals are completed.
*/ wait_for_completion(&client->uses_zero); remove_client_id(client); } EXPORT_SYMBOL(ib_unregister_client); static int __ib_get_global_client_nl_info(const char *client_name, struct ib_client_nl_info *res) { struct ib_client *client; unsigned long index; int ret = -ENOENT; down_read(&clients_rwsem); xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) { if (strcmp(client->name, client_name) != 0) continue; if (!client->get_global_nl_info) { ret = -EOPNOTSUPP; break; } ret = client->get_global_nl_info(res); if (WARN_ON(ret == -ENOENT)) ret = -EINVAL; if (!ret && res->cdev) get_device(res->cdev); break; } up_read(&clients_rwsem); return ret; } static int __ib_get_client_nl_info(struct ib_device *ibdev, const char *client_name, struct ib_client_nl_info *res) { unsigned long index; void *client_data; int ret = -ENOENT; down_read(&ibdev->client_data_rwsem); xan_for_each_marked (&ibdev->client_data, index, client_data, CLIENT_DATA_REGISTERED) { struct ib_client *client = xa_load(&clients, index); if (!client || strcmp(client->name, client_name) != 0) continue; if (!client->get_nl_info) { ret = -EOPNOTSUPP; break; } ret = client->get_nl_info(ibdev, client_data, res); if (WARN_ON(ret == -ENOENT)) ret = -EINVAL; /* * The cdev is guaranteed valid as long as we are inside the * client_data_rwsem as remove_one can't be called. Keep it * valid for the caller. */ if (!ret && res->cdev) get_device(res->cdev); break; } up_read(&ibdev->client_data_rwsem); return ret; } /** * ib_get_client_nl_info - Fetch the nl_info from a client * @ibdev: IB device * @client_name: Name of the client * @res: Result of the query */ int ib_get_client_nl_info(struct ib_device *ibdev, const char *client_name, struct ib_client_nl_info *res) { int ret; if (ibdev) ret = __ib_get_client_nl_info(ibdev, client_name, res); else ret = __ib_get_global_client_nl_info(client_name, res); #ifdef CONFIG_MODULES if (ret == -ENOENT) { request_module("rdma-client-%s", client_name); if (ibdev) ret = __ib_get_client_nl_info(ibdev, client_name, res); else ret = __ib_get_global_client_nl_info(client_name, res); } #endif if (ret) { if (ret == -ENOENT) return -EOPNOTSUPP; return ret; } if (WARN_ON(!res->cdev)) return -EINVAL; return 0; } /** * ib_set_client_data - Set IB client context * @device:Device to set context for * @client:Client to set context for * @data:Context to set * * ib_set_client_data() sets client context data that can be retrieved with * ib_get_client_data(). This can only be called while the client is * registered to the device, once the ib_client remove() callback returns this * cannot be called. */ void ib_set_client_data(struct ib_device *device, struct ib_client *client, void *data) { void *rc; if (WARN_ON(IS_ERR(data))) data = NULL; rc = xa_store(&device->client_data, client->client_id, data, GFP_KERNEL); WARN_ON(xa_is_err(rc)); } EXPORT_SYMBOL(ib_set_client_data); /** * ib_register_event_handler - Register an IB event handler * @event_handler:Handler to register * * ib_register_event_handler() registers an event handler that will be * called back when asynchronous IB events occur (as defined in * chapter 11 of the InfiniBand Architecture Specification). This * callback occurs in workqueue context. 
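 *
 * A minimal sketch of registering a handler (my_event_handler, my_handler
 * and ibdev are hypothetical, not taken from this file):
 *
 *	static struct ib_event_handler my_event_handler;
 *
 *	static void my_handler(struct ib_event_handler *handler,
 *			       struct ib_event *event)
 *	{
 *		pr_info("async event %d on %s\n", event->event,
 *			dev_name(&event->device->dev));
 *	}
 *
 *	INIT_IB_EVENT_HANDLER(&my_event_handler, ibdev, my_handler);
 *	ib_register_event_handler(&my_event_handler);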
*/ void ib_register_event_handler(struct ib_event_handler *event_handler) { down_write(&event_handler->device->event_handler_rwsem); list_add_tail(&event_handler->list, &event_handler->device->event_handler_list); up_write(&event_handler->device->event_handler_rwsem); } EXPORT_SYMBOL(ib_register_event_handler); /** * ib_unregister_event_handler - Unregister an event handler * @event_handler:Handler to unregister * * Unregister an event handler registered with * ib_register_event_handler(). */ void ib_unregister_event_handler(struct ib_event_handler *event_handler) { down_write(&event_handler->device->event_handler_rwsem); list_del(&event_handler->list); up_write(&event_handler->device->event_handler_rwsem); } EXPORT_SYMBOL(ib_unregister_event_handler); void ib_dispatch_event_clients(struct ib_event *event) { struct ib_event_handler *handler; down_read(&event->device->event_handler_rwsem); list_for_each_entry(handler, &event->device->event_handler_list, list) handler->handler(handler, event); up_read(&event->device->event_handler_rwsem); } static int iw_query_port(struct ib_device *device, u32 port_num, struct ib_port_attr *port_attr) { struct in_device *inetdev; struct net_device *netdev; memset(port_attr, 0, sizeof(*port_attr)); netdev = ib_device_get_netdev(device, port_num); if (!netdev) return -ENODEV; port_attr->max_mtu = IB_MTU_4096; port_attr->active_mtu = ib_mtu_int_to_enum(netdev->mtu); if (!netif_carrier_ok(netdev)) { port_attr->state = IB_PORT_DOWN; port_attr->phys_state = IB_PORT_PHYS_STATE_DISABLED; } else { rcu_read_lock(); inetdev = __in_dev_get_rcu(netdev); if (inetdev && inetdev->ifa_list) { port_attr->state = IB_PORT_ACTIVE; port_attr->phys_state = IB_PORT_PHYS_STATE_LINK_UP; } else { port_attr->state = IB_PORT_INIT; port_attr->phys_state = IB_PORT_PHYS_STATE_PORT_CONFIGURATION_TRAINING; } rcu_read_unlock(); } dev_put(netdev); return device->ops.query_port(device, port_num, port_attr); } static int __ib_query_port(struct ib_device *device, u32 port_num, struct ib_port_attr *port_attr) { int err; memset(port_attr, 0, sizeof(*port_attr)); err = device->ops.query_port(device, port_num, port_attr); if (err || port_attr->subnet_prefix) return err; if (rdma_port_get_link_layer(device, port_num) != IB_LINK_LAYER_INFINIBAND) return 0; ib_get_cached_subnet_prefix(device, port_num, &port_attr->subnet_prefix); return 0; } /** * ib_query_port - Query IB port attributes * @device:Device to query * @port_num:Port number to query * @port_attr:Port attributes * * ib_query_port() returns the attributes of a port through the * @port_attr pointer. 
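 *
 * A minimal sketch of querying port 1 (the caller and its error handling are
 * hypothetical):
 *
 *	struct ib_port_attr attr;
 *
 *	if (!ib_query_port(ibdev, 1, &attr))
 *		pr_info("port 1: state %d, active MTU enum %d\n",
 *			attr.state, attr.active_mtu);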
*/ int ib_query_port(struct ib_device *device, u32 port_num, struct ib_port_attr *port_attr) { if (!rdma_is_port_valid(device, port_num)) return -EINVAL; if (rdma_protocol_iwarp(device, port_num)) return iw_query_port(device, port_num, port_attr); else return __ib_query_port(device, port_num, port_attr); } EXPORT_SYMBOL(ib_query_port); static void add_ndev_hash(struct ib_port_data *pdata) { unsigned long flags; might_sleep(); spin_lock_irqsave(&ndev_hash_lock, flags); if (hash_hashed(&pdata->ndev_hash_link)) { hash_del_rcu(&pdata->ndev_hash_link); spin_unlock_irqrestore(&ndev_hash_lock, flags); /* * We cannot do hash_add_rcu after a hash_del_rcu until the * grace period */ synchronize_rcu(); spin_lock_irqsave(&ndev_hash_lock, flags); } if (pdata->netdev) hash_add_rcu(ndev_hash, &pdata->ndev_hash_link, (uintptr_t)pdata->netdev); spin_unlock_irqrestore(&ndev_hash_lock, flags); } /** * ib_device_set_netdev - Associate the ib_dev with an underlying net_device * @ib_dev: Device to modify * @ndev: net_device to affiliate, may be NULL * @port: IB port the net_device is connected to * * Drivers should use this to link the ib_device to a netdev so the netdev * shows up in interfaces like ib_enum_roce_netdev. Only one netdev may be * affiliated with any port. * * The caller must ensure that the given ndev is not unregistered or * unregistering, and that either the ib_device is unregistered or * ib_device_set_netdev() is called with NULL when the ndev sends a * NETDEV_UNREGISTER event. */ int ib_device_set_netdev(struct ib_device *ib_dev, struct net_device *ndev, u32 port) { struct net_device *old_ndev; struct ib_port_data *pdata; unsigned long flags; int ret; if (!rdma_is_port_valid(ib_dev, port)) return -EINVAL; /* * Drivers wish to call this before ib_register_driver, so we have to * setup the port data early. */ ret = alloc_port_data(ib_dev); if (ret) return ret; pdata = &ib_dev->port_data[port]; spin_lock_irqsave(&pdata->netdev_lock, flags); old_ndev = rcu_dereference_protected( pdata->netdev, lockdep_is_held(&pdata->netdev_lock)); if (old_ndev == ndev) { spin_unlock_irqrestore(&pdata->netdev_lock, flags); return 0; } rcu_assign_pointer(pdata->netdev, ndev); netdev_put(old_ndev, &pdata->netdev_tracker); netdev_hold(ndev, &pdata->netdev_tracker, GFP_ATOMIC); spin_unlock_irqrestore(&pdata->netdev_lock, flags); add_ndev_hash(pdata); return 0; } EXPORT_SYMBOL(ib_device_set_netdev); static void free_netdevs(struct ib_device *ib_dev) { unsigned long flags; u32 port; if (!ib_dev->port_data) return; rdma_for_each_port (ib_dev, port) { struct ib_port_data *pdata = &ib_dev->port_data[port]; struct net_device *ndev; spin_lock_irqsave(&pdata->netdev_lock, flags); ndev = rcu_dereference_protected( pdata->netdev, lockdep_is_held(&pdata->netdev_lock)); if (ndev) { spin_lock(&ndev_hash_lock); hash_del_rcu(&pdata->ndev_hash_link); spin_unlock(&ndev_hash_lock); /* * If this is the last dev_put there is still a * synchronize_rcu before the netdev is kfreed, so we * can continue to rely on unlocked pointer * comparisons after the put */ rcu_assign_pointer(pdata->netdev, NULL); netdev_put(ndev, &pdata->netdev_tracker); } spin_unlock_irqrestore(&pdata->netdev_lock, flags); } } struct net_device *ib_device_get_netdev(struct ib_device *ib_dev, u32 port) { struct ib_port_data *pdata; struct net_device *res; if (!rdma_is_port_valid(ib_dev, port)) return NULL; pdata = &ib_dev->port_data[port]; /* * New drivers should use ib_device_set_netdev() not the legacy * get_netdev(). 
*/ if (ib_dev->ops.get_netdev) res = ib_dev->ops.get_netdev(ib_dev, port); else { spin_lock(&pdata->netdev_lock); res = rcu_dereference_protected( pdata->netdev, lockdep_is_held(&pdata->netdev_lock)); dev_hold(res); spin_unlock(&pdata->netdev_lock); } /* * If we are starting to unregister expedite things by preventing * propagation of an unregistering netdev. */ if (res && res->reg_state != NETREG_REGISTERED) { dev_put(res); return NULL; } return res; } /** * ib_device_get_by_netdev - Find an IB device associated with a netdev * @ndev: netdev to locate * @driver_id: The driver ID that must match (RDMA_DRIVER_UNKNOWN matches all) * * Find and hold an ib_device that is associated with a netdev via * ib_device_set_netdev(). The caller must call ib_device_put() on the * returned pointer. */ struct ib_device *ib_device_get_by_netdev(struct net_device *ndev, enum rdma_driver_id driver_id) { struct ib_device *res = NULL; struct ib_port_data *cur; rcu_read_lock(); hash_for_each_possible_rcu (ndev_hash, cur, ndev_hash_link, (uintptr_t)ndev) { if (rcu_access_pointer(cur->netdev) == ndev && (driver_id == RDMA_DRIVER_UNKNOWN || cur->ib_dev->ops.driver_id == driver_id) && ib_device_try_get(cur->ib_dev)) { res = cur->ib_dev; break; } } rcu_read_unlock(); return res; } EXPORT_SYMBOL(ib_device_get_by_netdev); /** * ib_enum_roce_netdev - enumerate all RoCE ports * @ib_dev : IB device we want to query * @filter: Should we call the callback? * @filter_cookie: Cookie passed to filter * @cb: Callback to call for each found RoCE ports * @cookie: Cookie passed back to the callback * * Enumerates all of the physical RoCE ports of ib_dev * which are related to netdevice and calls callback() on each * device for which filter() function returns non zero. */ void ib_enum_roce_netdev(struct ib_device *ib_dev, roce_netdev_filter filter, void *filter_cookie, roce_netdev_callback cb, void *cookie) { u32 port; rdma_for_each_port (ib_dev, port) if (rdma_protocol_roce(ib_dev, port)) { struct net_device *idev = ib_device_get_netdev(ib_dev, port); if (filter(ib_dev, port, idev, filter_cookie)) cb(ib_dev, port, idev, cookie); dev_put(idev); } } /** * ib_enum_all_roce_netdevs - enumerate all RoCE devices * @filter: Should we call the callback? * @filter_cookie: Cookie passed to filter * @cb: Callback to call for each found RoCE ports * @cookie: Cookie passed back to the callback * * Enumerates all RoCE devices' physical ports which are related * to netdevices and calls callback() on each device for which * filter() function returns non zero. */ void ib_enum_all_roce_netdevs(roce_netdev_filter filter, void *filter_cookie, roce_netdev_callback cb, void *cookie) { struct ib_device *dev; unsigned long index; down_read(&devices_rwsem); xa_for_each_marked (&devices, index, dev, DEVICE_REGISTERED) ib_enum_roce_netdev(dev, filter, filter_cookie, cb, cookie); up_read(&devices_rwsem); } /* * ib_enum_all_devs - enumerate all ib_devices * @cb: Callback to call for each found ib_device * * Enumerates all ib_devices and calls callback() on each device. 
*/ int ib_enum_all_devs(nldev_callback nldev_cb, struct sk_buff *skb, struct netlink_callback *cb) { unsigned long index; struct ib_device *dev; unsigned int idx = 0; int ret = 0; down_read(&devices_rwsem); xa_for_each_marked (&devices, index, dev, DEVICE_REGISTERED) { if (!rdma_dev_access_netns(dev, sock_net(skb->sk))) continue; ret = nldev_cb(dev, skb, cb, idx); if (ret) break; idx++; } up_read(&devices_rwsem); return ret; } /** * ib_query_pkey - Get P_Key table entry * @device:Device to query * @port_num:Port number to query * @index:P_Key table index to query * @pkey:Returned P_Key * * ib_query_pkey() fetches the specified P_Key table entry. */ int ib_query_pkey(struct ib_device *device, u32 port_num, u16 index, u16 *pkey) { if (!rdma_is_port_valid(device, port_num)) return -EINVAL; if (!device->ops.query_pkey) return -EOPNOTSUPP; return device->ops.query_pkey(device, port_num, index, pkey); } EXPORT_SYMBOL(ib_query_pkey); /** * ib_modify_device - Change IB device attributes * @device:Device to modify * @device_modify_mask:Mask of attributes to change * @device_modify:New attribute values * * ib_modify_device() changes a device's attributes as specified by * the @device_modify_mask and @device_modify structure. */ int ib_modify_device(struct ib_device *device, int device_modify_mask, struct ib_device_modify *device_modify) { if (!device->ops.modify_device) return -EOPNOTSUPP; return device->ops.modify_device(device, device_modify_mask, device_modify); } EXPORT_SYMBOL(ib_modify_device); /** * ib_modify_port - Modifies the attributes for the specified port. * @device: The device to modify. * @port_num: The number of the port to modify. * @port_modify_mask: Mask used to specify which attributes of the port * to change. * @port_modify: New attribute values for the port. * * ib_modify_port() changes a port's attributes as specified by the * @port_modify_mask and @port_modify structure. */ int ib_modify_port(struct ib_device *device, u32 port_num, int port_modify_mask, struct ib_port_modify *port_modify) { int rc; if (!rdma_is_port_valid(device, port_num)) return -EINVAL; if (device->ops.modify_port) rc = device->ops.modify_port(device, port_num, port_modify_mask, port_modify); else if (rdma_protocol_roce(device, port_num) && ((port_modify->set_port_cap_mask & ~IB_PORT_CM_SUP) == 0 || (port_modify->clr_port_cap_mask & ~IB_PORT_CM_SUP) == 0)) rc = 0; else rc = -EOPNOTSUPP; return rc; } EXPORT_SYMBOL(ib_modify_port); /** * ib_find_gid - Returns the port number and GID table index where * a specified GID value occurs. It searches only the IB link layer. * @device: The device to query. * @gid: The GID value to search for. * @port_num: The port number of the device where the GID value was found. * @index: The index into the GID table where the GID was found. This * parameter may be NULL. */ int ib_find_gid(struct ib_device *device, union ib_gid *gid, u32 *port_num, u16 *index) { union ib_gid tmp_gid; u32 port; int ret, i; rdma_for_each_port (device, port) { if (!rdma_protocol_ib(device, port)) continue; for (i = 0; i < device->port_data[port].immutable.gid_tbl_len; ++i) { ret = rdma_query_gid(device, port, i, &tmp_gid); if (ret) continue; if (!memcmp(&tmp_gid, gid, sizeof *gid)) { *port_num = port; if (index) *index = i; return 0; } } } return -ENOENT; } EXPORT_SYMBOL(ib_find_gid); /** * ib_find_pkey - Returns the PKey table index where a specified * PKey value occurs. * @device: The device to query. * @port_num: The port number of the device to search for the PKey.
* @pkey: The PKey value to search for. * @index: The index into the PKey table where the PKey was found. */ int ib_find_pkey(struct ib_device *device, u32 port_num, u16 pkey, u16 *index) { int ret, i; u16 tmp_pkey; int partial_ix = -1; for (i = 0; i < device->port_data[port_num].immutable.pkey_tbl_len; ++i) { ret = ib_query_pkey(device, port_num, i, &tmp_pkey); if (ret) return ret; if ((pkey & 0x7fff) == (tmp_pkey & 0x7fff)) { /* if there is full-member pkey take it.*/ if (tmp_pkey & 0x8000) { *index = i; return 0; } if (partial_ix < 0) partial_ix = i; } } /*no full-member, if exists take the limited*/ if (partial_ix >= 0) { *index = partial_ix; return 0; } return -ENOENT; } EXPORT_SYMBOL(ib_find_pkey); /** * ib_get_net_dev_by_params() - Return the appropriate net_dev * for a received CM request * @dev: An RDMA device on which the request has been received. * @port: Port number on the RDMA device. * @pkey: The Pkey the request came on. * @gid: A GID that the net_dev uses to communicate. * @addr: Contains the IP address that the request specified as its * destination. * */ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u32 port, u16 pkey, const union ib_gid *gid, const struct sockaddr *addr) { struct net_device *net_dev = NULL; unsigned long index; void *client_data; if (!rdma_protocol_ib(dev, port)) return NULL; /* * Holding the read side guarantees that the client will not become * unregistered while we are calling get_net_dev_by_params() */ down_read(&dev->client_data_rwsem); xan_for_each_marked (&dev->client_data, index, client_data, CLIENT_DATA_REGISTERED) { struct ib_client *client = xa_load(&clients, index); if (!client || !client->get_net_dev_by_params) continue; net_dev = client->get_net_dev_by_params(dev, port, pkey, gid, addr, client_data); if (net_dev) break; } up_read(&dev->client_data_rwsem); return net_dev; } EXPORT_SYMBOL(ib_get_net_dev_by_params); void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops) { struct ib_device_ops *dev_ops = &dev->ops; #define SET_DEVICE_OP(ptr, name) \ do { \ if (ops->name) \ if (!((ptr)->name)) \ (ptr)->name = ops->name; \ } while (0) #define SET_OBJ_SIZE(ptr, name) SET_DEVICE_OP(ptr, size_##name) if (ops->driver_id != RDMA_DRIVER_UNKNOWN) { WARN_ON(dev_ops->driver_id != RDMA_DRIVER_UNKNOWN && dev_ops->driver_id != ops->driver_id); dev_ops->driver_id = ops->driver_id; } if (ops->owner) { WARN_ON(dev_ops->owner && dev_ops->owner != ops->owner); dev_ops->owner = ops->owner; } if (ops->uverbs_abi_ver) dev_ops->uverbs_abi_ver = ops->uverbs_abi_ver; dev_ops->uverbs_no_driver_id_binding |= ops->uverbs_no_driver_id_binding; SET_DEVICE_OP(dev_ops, add_gid); SET_DEVICE_OP(dev_ops, add_sub_dev); SET_DEVICE_OP(dev_ops, advise_mr); SET_DEVICE_OP(dev_ops, alloc_dm); SET_DEVICE_OP(dev_ops, alloc_hw_device_stats); SET_DEVICE_OP(dev_ops, alloc_hw_port_stats); SET_DEVICE_OP(dev_ops, alloc_mr); SET_DEVICE_OP(dev_ops, alloc_mr_integrity); SET_DEVICE_OP(dev_ops, alloc_mw); SET_DEVICE_OP(dev_ops, alloc_pd); SET_DEVICE_OP(dev_ops, alloc_rdma_netdev); SET_DEVICE_OP(dev_ops, alloc_ucontext); SET_DEVICE_OP(dev_ops, alloc_xrcd); SET_DEVICE_OP(dev_ops, attach_mcast); SET_DEVICE_OP(dev_ops, check_mr_status); SET_DEVICE_OP(dev_ops, counter_alloc_stats); SET_DEVICE_OP(dev_ops, counter_bind_qp); SET_DEVICE_OP(dev_ops, counter_dealloc); SET_DEVICE_OP(dev_ops, counter_unbind_qp); SET_DEVICE_OP(dev_ops, counter_update_stats); SET_DEVICE_OP(dev_ops, create_ah); SET_DEVICE_OP(dev_ops, create_counters); SET_DEVICE_OP(dev_ops, create_cq); 
SET_DEVICE_OP(dev_ops, create_flow); SET_DEVICE_OP(dev_ops, create_qp); SET_DEVICE_OP(dev_ops, create_rwq_ind_table); SET_DEVICE_OP(dev_ops, create_srq); SET_DEVICE_OP(dev_ops, create_user_ah); SET_DEVICE_OP(dev_ops, create_wq); SET_DEVICE_OP(dev_ops, dealloc_dm); SET_DEVICE_OP(dev_ops, dealloc_driver); SET_DEVICE_OP(dev_ops, dealloc_mw); SET_DEVICE_OP(dev_ops, dealloc_pd); SET_DEVICE_OP(dev_ops, dealloc_ucontext); SET_DEVICE_OP(dev_ops, dealloc_xrcd); SET_DEVICE_OP(dev_ops, del_gid); SET_DEVICE_OP(dev_ops, del_sub_dev); SET_DEVICE_OP(dev_ops, dereg_mr); SET_DEVICE_OP(dev_ops, destroy_ah); SET_DEVICE_OP(dev_ops, destroy_counters); SET_DEVICE_OP(dev_ops, destroy_cq); SET_DEVICE_OP(dev_ops, destroy_flow); SET_DEVICE_OP(dev_ops, destroy_flow_action); SET_DEVICE_OP(dev_ops, destroy_qp); SET_DEVICE_OP(dev_ops, destroy_rwq_ind_table); SET_DEVICE_OP(dev_ops, destroy_srq); SET_DEVICE_OP(dev_ops, destroy_wq); SET_DEVICE_OP(dev_ops, device_group); SET_DEVICE_OP(dev_ops, detach_mcast); SET_DEVICE_OP(dev_ops, disassociate_ucontext); SET_DEVICE_OP(dev_ops, drain_rq); SET_DEVICE_OP(dev_ops, drain_sq); SET_DEVICE_OP(dev_ops, enable_driver); SET_DEVICE_OP(dev_ops, fill_res_cm_id_entry); SET_DEVICE_OP(dev_ops, fill_res_cq_entry); SET_DEVICE_OP(dev_ops, fill_res_cq_entry_raw); SET_DEVICE_OP(dev_ops, fill_res_mr_entry); SET_DEVICE_OP(dev_ops, fill_res_mr_entry_raw); SET_DEVICE_OP(dev_ops, fill_res_qp_entry); SET_DEVICE_OP(dev_ops, fill_res_qp_entry_raw); SET_DEVICE_OP(dev_ops, fill_res_srq_entry); SET_DEVICE_OP(dev_ops, fill_res_srq_entry_raw); SET_DEVICE_OP(dev_ops, fill_stat_mr_entry); SET_DEVICE_OP(dev_ops, get_dev_fw_str); SET_DEVICE_OP(dev_ops, get_dma_mr); SET_DEVICE_OP(dev_ops, get_hw_stats); SET_DEVICE_OP(dev_ops, get_link_layer); SET_DEVICE_OP(dev_ops, get_netdev); SET_DEVICE_OP(dev_ops, get_numa_node); SET_DEVICE_OP(dev_ops, get_port_immutable); SET_DEVICE_OP(dev_ops, get_vector_affinity); SET_DEVICE_OP(dev_ops, get_vf_config); SET_DEVICE_OP(dev_ops, get_vf_guid); SET_DEVICE_OP(dev_ops, get_vf_stats); SET_DEVICE_OP(dev_ops, iw_accept); SET_DEVICE_OP(dev_ops, iw_add_ref); SET_DEVICE_OP(dev_ops, iw_connect); SET_DEVICE_OP(dev_ops, iw_create_listen); SET_DEVICE_OP(dev_ops, iw_destroy_listen); SET_DEVICE_OP(dev_ops, iw_get_qp); SET_DEVICE_OP(dev_ops, iw_reject); SET_DEVICE_OP(dev_ops, iw_rem_ref); SET_DEVICE_OP(dev_ops, map_mr_sg); SET_DEVICE_OP(dev_ops, map_mr_sg_pi); SET_DEVICE_OP(dev_ops, mmap); SET_DEVICE_OP(dev_ops, mmap_free); SET_DEVICE_OP(dev_ops, modify_ah); SET_DEVICE_OP(dev_ops, modify_cq); SET_DEVICE_OP(dev_ops, modify_device); SET_DEVICE_OP(dev_ops, modify_hw_stat); SET_DEVICE_OP(dev_ops, modify_port); SET_DEVICE_OP(dev_ops, modify_qp); SET_DEVICE_OP(dev_ops, modify_srq); SET_DEVICE_OP(dev_ops, modify_wq); SET_DEVICE_OP(dev_ops, peek_cq); SET_DEVICE_OP(dev_ops, poll_cq); SET_DEVICE_OP(dev_ops, port_groups); SET_DEVICE_OP(dev_ops, post_recv); SET_DEVICE_OP(dev_ops, post_send); SET_DEVICE_OP(dev_ops, post_srq_recv); SET_DEVICE_OP(dev_ops, process_mad); SET_DEVICE_OP(dev_ops, query_ah); SET_DEVICE_OP(dev_ops, query_device); SET_DEVICE_OP(dev_ops, query_gid); SET_DEVICE_OP(dev_ops, query_pkey); SET_DEVICE_OP(dev_ops, query_port); SET_DEVICE_OP(dev_ops, query_qp); SET_DEVICE_OP(dev_ops, query_srq); SET_DEVICE_OP(dev_ops, query_ucontext); SET_DEVICE_OP(dev_ops, rdma_netdev_get_params); SET_DEVICE_OP(dev_ops, read_counters); SET_DEVICE_OP(dev_ops, reg_dm_mr); SET_DEVICE_OP(dev_ops, reg_user_mr); SET_DEVICE_OP(dev_ops, reg_user_mr_dmabuf); SET_DEVICE_OP(dev_ops, req_notify_cq); 
SET_DEVICE_OP(dev_ops, rereg_user_mr); SET_DEVICE_OP(dev_ops, resize_cq); SET_DEVICE_OP(dev_ops, set_vf_guid); SET_DEVICE_OP(dev_ops, set_vf_link_state); SET_OBJ_SIZE(dev_ops, ib_ah); SET_OBJ_SIZE(dev_ops, ib_counters); SET_OBJ_SIZE(dev_ops, ib_cq); SET_OBJ_SIZE(dev_ops, ib_mw); SET_OBJ_SIZE(dev_ops, ib_pd); SET_OBJ_SIZE(dev_ops, ib_qp); SET_OBJ_SIZE(dev_ops, ib_rwq_ind_table); SET_OBJ_SIZE(dev_ops, ib_srq); SET_OBJ_SIZE(dev_ops, ib_ucontext); SET_OBJ_SIZE(dev_ops, ib_xrcd); } EXPORT_SYMBOL(ib_set_device_ops); int ib_add_sub_device(struct ib_device *parent, enum rdma_nl_dev_type type, const char *name) { struct ib_device *sub; int ret = 0; if (!parent->ops.add_sub_dev || !parent->ops.del_sub_dev) return -EOPNOTSUPP; if (!ib_device_try_get(parent)) return -EINVAL; sub = parent->ops.add_sub_dev(parent, type, name); if (IS_ERR(sub)) { ib_device_put(parent); return PTR_ERR(sub); } sub->type = type; sub->parent = parent; mutex_lock(&parent->subdev_lock); list_add_tail(&parent->subdev_list_head, &sub->subdev_list); mutex_unlock(&parent->subdev_lock); return ret; } EXPORT_SYMBOL(ib_add_sub_device); int ib_del_sub_device_and_put(struct ib_device *sub) { struct ib_device *parent = sub->parent; if (!parent) return -EOPNOTSUPP; mutex_lock(&parent->subdev_lock); list_del(&sub->subdev_list); mutex_unlock(&parent->subdev_lock); ib_device_put(sub); parent->ops.del_sub_dev(sub); ib_device_put(parent); return 0; } EXPORT_SYMBOL(ib_del_sub_device_and_put); #ifdef CONFIG_INFINIBAND_VIRT_DMA int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents) { struct scatterlist *s; int i; for_each_sg(sg, s, nents, i) { sg_dma_address(s) = (uintptr_t)sg_virt(s); sg_dma_len(s) = s->length; } return nents; } EXPORT_SYMBOL(ib_dma_virt_map_sg); #endif /* CONFIG_INFINIBAND_VIRT_DMA */ static const struct rdma_nl_cbs ibnl_ls_cb_table[RDMA_NL_LS_NUM_OPS] = { [RDMA_NL_LS_OP_RESOLVE] = { .doit = ib_nl_handle_resolve_resp, .flags = RDMA_NL_ADMIN_PERM, }, [RDMA_NL_LS_OP_SET_TIMEOUT] = { .doit = ib_nl_handle_set_timeout, .flags = RDMA_NL_ADMIN_PERM, }, [RDMA_NL_LS_OP_IP_RESOLVE] = { .doit = ib_nl_handle_ip_res_resp, .flags = RDMA_NL_ADMIN_PERM, }, }; static int __init ib_core_init(void) { int ret = -ENOMEM; ib_wq = alloc_workqueue("infiniband", 0, 0); if (!ib_wq) return -ENOMEM; ib_unreg_wq = alloc_workqueue("ib-unreg-wq", WQ_UNBOUND, WQ_UNBOUND_MAX_ACTIVE); if (!ib_unreg_wq) goto err; ib_comp_wq = alloc_workqueue("ib-comp-wq", WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_SYSFS, 0); if (!ib_comp_wq) goto err_unbound; ib_comp_unbound_wq = alloc_workqueue("ib-comp-unb-wq", WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_SYSFS, WQ_UNBOUND_MAX_ACTIVE); if (!ib_comp_unbound_wq) goto err_comp; ret = class_register(&ib_class); if (ret) { pr_warn("Couldn't create InfiniBand device class\n"); goto err_comp_unbound; } rdma_nl_init(); ret = addr_init(); if (ret) { pr_warn("Couldn't init IB address resolution\n"); goto err_ibnl; } ret = ib_mad_init(); if (ret) { pr_warn("Couldn't init IB MAD\n"); goto err_addr; } ret = ib_sa_init(); if (ret) { pr_warn("Couldn't init SA\n"); goto err_mad; } ret = register_blocking_lsm_notifier(&ibdev_lsm_nb); if (ret) { pr_warn("Couldn't register LSM notifier. ret %d\n", ret); goto err_sa; } ret = register_pernet_device(&rdma_dev_net_ops); if (ret) { pr_warn("Couldn't init compat dev. 
ret %d\n", ret);
		goto err_compat;
	}

	nldev_init();
	rdma_nl_register(RDMA_NL_LS, ibnl_ls_cb_table);
	ret = roce_gid_mgmt_init();
	if (ret) {
		pr_warn("Couldn't init RoCE GID management\n");
		goto err_parent;
	}

	return 0;

err_parent:
	rdma_nl_unregister(RDMA_NL_LS);
	nldev_exit();
	unregister_pernet_device(&rdma_dev_net_ops);
err_compat:
	unregister_blocking_lsm_notifier(&ibdev_lsm_nb);
err_sa:
	ib_sa_cleanup();
err_mad:
	ib_mad_cleanup();
err_addr:
	addr_cleanup();
err_ibnl:
	class_unregister(&ib_class);
err_comp_unbound:
	destroy_workqueue(ib_comp_unbound_wq);
err_comp:
	destroy_workqueue(ib_comp_wq);
err_unbound:
	destroy_workqueue(ib_unreg_wq);
err:
	destroy_workqueue(ib_wq);
	return ret;
}

static void __exit ib_core_cleanup(void)
{
	roce_gid_mgmt_cleanup();
	rdma_nl_unregister(RDMA_NL_LS);
	nldev_exit();
	unregister_pernet_device(&rdma_dev_net_ops);
	unregister_blocking_lsm_notifier(&ibdev_lsm_nb);
	ib_sa_cleanup();
	ib_mad_cleanup();
	addr_cleanup();
	rdma_nl_exit();
	class_unregister(&ib_class);
	destroy_workqueue(ib_comp_unbound_wq);
	destroy_workqueue(ib_comp_wq);
	/* Make sure that any pending umem accounting work is done. */
	destroy_workqueue(ib_wq);
	destroy_workqueue(ib_unreg_wq);
	WARN_ON(!xa_empty(&clients));
	WARN_ON(!xa_empty(&devices));
}

MODULE_ALIAS_RDMA_NETLINK(RDMA_NL_LS, 4);

/* ib core relies on netdev stack to first register net_ns_type_operations
 * ns kobject type before ib_core initialization.
 */
fs_initcall(ib_core_init);
module_exit(ib_core_cleanup);
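/*
 * Editorial sketch (not part of the original source): how a hypothetical
 * RDMA driver would typically consume ib_set_device_ops() above. All
 * "exdrv_*" names are invented for illustration; only fields that appear
 * in the SET_DEVICE_OP() list above are assumed to exist in
 * struct ib_device_ops.
 */
#if 0	/* illustration only, never built */
static const struct ib_device_ops exdrv_dev_ops = {
	.owner		= THIS_MODULE,
	.driver_id	= RDMA_DRIVER_UNKNOWN,	/* a real driver uses its own id */
	.uverbs_abi_ver	= 1,
	.alloc_pd	= exdrv_alloc_pd,
	.dealloc_pd	= exdrv_dealloc_pd,
	.query_port	= exdrv_query_port,
	.get_port_immutable = exdrv_get_port_immutable,
};

static void exdrv_init_ops(struct ib_device *ibdev)
{
	/* Fills only the ops that are still unset, per SET_DEVICE_OP() */
	ib_set_device_ops(ibdev, &exdrv_dev_ops);
}
#endif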
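/*
 * Editorial sketch: typical use of ib_find_pkey() above. Even when the
 * caller holds a limited-membership P_Key (top bit clear), the returned
 * index points at the full-membership entry when one exists, because full
 * members are preferred over partial matches. This is a code fragment with
 * invented variable names, shown for illustration only.
 */
#if 0	/* illustration only, never built */
	u16 pkey_index;
	int ret;

	ret = ib_find_pkey(device, port_num, wc_pkey, &pkey_index);
	if (ret)
		return ret;	/* -ENOENT: no matching P_Key in the table */
#endif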
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * NET		An implementation of the SOCKET network access protocol.
 *
 * Version:	@(#)socket.c	1.1.93	18/02/95
 *
 * Authors:	Orest Zborowski, <obz@Kodak.COM>
 *		Ross Biro
 *		Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
 *
 * Fixes:
 *		Anonymous	:	NOTSOCK/BADF cleanup. Error fix in
 *					shutdown()
 *		Alan Cox	:	verify_area() fixes
 *		Alan Cox	:	Removed DDI
 *		Jonathan Kamens	:	SOCK_DGRAM reconnect bug
 *		Alan Cox	:	Moved a load of checks to the very
 *					top level.
 *		Alan Cox	:	Move address structures to/from user
 *					mode above the protocol layers.
 *		Rob Janssen	:	Allow 0 length sends.
 *		Alan Cox	:	Asynchronous I/O support (cribbed from the
 *					tty drivers).
 *		Niibe Yutaka	:	Asynchronous I/O for writes (4.4BSD style)
 *		Jeff Uphoff	:	Made max number of sockets command-line
 *					configurable.
 *		Matti Aarnio	:	Made the number of sockets dynamic,
 *					to be allocated when needed, and mr.
 *					Uphoff's max is used as max to be
 *					allowed to allocate.
 *		Linus		:	Argh. removed all the socket allocation
 *					altogether: it's in the inode now.
 *		Alan Cox	:	Made sock_alloc()/sock_release() public
 *					for NetROM and future kernel nfsd type
 *					stuff.
 *		Alan Cox	:	sendmsg/recvmsg basics.
 *		Tom Dyas	:	Export net symbols.
 *		Marcin Dalecki	:	Fixed problems with CONFIG_NET="n".
 *		Alan Cox	:	Added thread locking to sys_* calls
 *					for sockets. May have errors at the
 *					moment.
 *		Kevin Buhr	:	Fixed the dumb errors in the above.
 *		Andi Kleen	:	Some small cleanups, optimizations,
 *					and fixed a copy_from_user() bug.
 *		Tigran Aivazian	:	sys_send(args) calls sys_sendto(args, NULL, 0)
 *		Tigran Aivazian	:	Made listen(2) backlog sanity checks
 *					protocol-independent
 *
 *	This module is effectively the top level interface to the BSD socket
 *	paradigm.
* * Based upon Swansea University Computer Society NET3.039 */ #include <linux/bpf-cgroup.h> #include <linux/ethtool.h> #include <linux/mm.h> #include <linux/socket.h> #include <linux/file.h> #include <linux/splice.h> #include <linux/net.h> #include <linux/interrupt.h> #include <linux/thread_info.h> #include <linux/rcupdate.h> #include <linux/netdevice.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> #include <linux/mutex.h> #include <linux/if_bridge.h> #include <linux/if_vlan.h> #include <linux/ptp_classify.h> #include <linux/init.h> #include <linux/poll.h> #include <linux/cache.h> #include <linux/module.h> #include <linux/highmem.h> #include <linux/mount.h> #include <linux/pseudo_fs.h> #include <linux/security.h> #include <linux/syscalls.h> #include <linux/compat.h> #include <linux/kmod.h> #include <linux/audit.h> #include <linux/wireless.h> #include <linux/nsproxy.h> #include <linux/magic.h> #include <linux/slab.h> #include <linux/xattr.h> #include <linux/nospec.h> #include <linux/indirect_call_wrapper.h> #include <linux/io_uring/net.h> #include <linux/uaccess.h> #include <asm/unistd.h> #include <net/compat.h> #include <net/wext.h> #include <net/cls_cgroup.h> #include <net/sock.h> #include <linux/netfilter.h> #include <linux/if_tun.h> #include <linux/ipv6_route.h> #include <linux/route.h> #include <linux/termios.h> #include <linux/sockios.h> #include <net/busy_poll.h> #include <linux/errqueue.h> #include <linux/ptp_clock_kernel.h> #include <trace/events/sock.h> #ifdef CONFIG_NET_RX_BUSY_POLL unsigned int sysctl_net_busy_read __read_mostly; unsigned int sysctl_net_busy_poll __read_mostly; #endif static ssize_t sock_read_iter(struct kiocb *iocb, struct iov_iter *to); static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from); static int sock_mmap(struct file *file, struct vm_area_struct *vma); static int sock_close(struct inode *inode, struct file *file); static __poll_t sock_poll(struct file *file, struct poll_table_struct *wait); static long sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg); #ifdef CONFIG_COMPAT static long compat_sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg); #endif static int sock_fasync(int fd, struct file *filp, int on); static ssize_t sock_splice_read(struct file *file, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); static void sock_splice_eof(struct file *file); #ifdef CONFIG_PROC_FS static void sock_show_fdinfo(struct seq_file *m, struct file *f) { struct socket *sock = f->private_data; const struct proto_ops *ops = READ_ONCE(sock->ops); if (ops->show_fdinfo) ops->show_fdinfo(m, sock); } #else #define sock_show_fdinfo NULL #endif /* * Socket files have a set of 'special' operations as well as the generic file ones. These don't appear * in the operation structures but are done directly via the socketcall() multiplexor. 
*/ static const struct file_operations socket_file_ops = { .owner = THIS_MODULE, .llseek = no_llseek, .read_iter = sock_read_iter, .write_iter = sock_write_iter, .poll = sock_poll, .unlocked_ioctl = sock_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = compat_sock_ioctl, #endif .uring_cmd = io_uring_cmd_sock, .mmap = sock_mmap, .release = sock_close, .fasync = sock_fasync, .splice_write = splice_to_socket, .splice_read = sock_splice_read, .splice_eof = sock_splice_eof, .show_fdinfo = sock_show_fdinfo, }; static const char * const pf_family_names[] = { [PF_UNSPEC] = "PF_UNSPEC", [PF_UNIX] = "PF_UNIX/PF_LOCAL", [PF_INET] = "PF_INET", [PF_AX25] = "PF_AX25", [PF_IPX] = "PF_IPX", [PF_APPLETALK] = "PF_APPLETALK", [PF_NETROM] = "PF_NETROM", [PF_BRIDGE] = "PF_BRIDGE", [PF_ATMPVC] = "PF_ATMPVC", [PF_X25] = "PF_X25", [PF_INET6] = "PF_INET6", [PF_ROSE] = "PF_ROSE", [PF_DECnet] = "PF_DECnet", [PF_NETBEUI] = "PF_NETBEUI", [PF_SECURITY] = "PF_SECURITY", [PF_KEY] = "PF_KEY", [PF_NETLINK] = "PF_NETLINK/PF_ROUTE", [PF_PACKET] = "PF_PACKET", [PF_ASH] = "PF_ASH", [PF_ECONET] = "PF_ECONET", [PF_ATMSVC] = "PF_ATMSVC", [PF_RDS] = "PF_RDS", [PF_SNA] = "PF_SNA", [PF_IRDA] = "PF_IRDA", [PF_PPPOX] = "PF_PPPOX", [PF_WANPIPE] = "PF_WANPIPE", [PF_LLC] = "PF_LLC", [PF_IB] = "PF_IB", [PF_MPLS] = "PF_MPLS", [PF_CAN] = "PF_CAN", [PF_TIPC] = "PF_TIPC", [PF_BLUETOOTH] = "PF_BLUETOOTH", [PF_IUCV] = "PF_IUCV", [PF_RXRPC] = "PF_RXRPC", [PF_ISDN] = "PF_ISDN", [PF_PHONET] = "PF_PHONET", [PF_IEEE802154] = "PF_IEEE802154", [PF_CAIF] = "PF_CAIF", [PF_ALG] = "PF_ALG", [PF_NFC] = "PF_NFC", [PF_VSOCK] = "PF_VSOCK", [PF_KCM] = "PF_KCM", [PF_QIPCRTR] = "PF_QIPCRTR", [PF_SMC] = "PF_SMC", [PF_XDP] = "PF_XDP", [PF_MCTP] = "PF_MCTP", }; /* * The protocol list. Each protocol is registered in here. */ static DEFINE_SPINLOCK(net_family_lock); static const struct net_proto_family __rcu *net_families[NPROTO] __read_mostly; /* * Support routines. * Move socket addresses back and forth across the kernel/user * divide and look after the messy bits. */ /** * move_addr_to_kernel - copy a socket address into kernel space * @uaddr: Address in user space * @kaddr: Address in kernel space * @ulen: Length in user space * * The address is copied into kernel space. If the provided address is * too long an error code of -EINVAL is returned. If the copy gives * invalid addresses -EFAULT is returned. On a success 0 is returned. */ int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *kaddr) { if (ulen < 0 || ulen > sizeof(struct sockaddr_storage)) return -EINVAL; if (ulen == 0) return 0; if (copy_from_user(kaddr, uaddr, ulen)) return -EFAULT; return audit_sockaddr(ulen, kaddr); } /** * move_addr_to_user - copy an address to user space * @kaddr: kernel space address * @klen: length of address in kernel * @uaddr: user space address * @ulen: pointer to user length field * * The value pointed to by ulen on entry is the buffer length available. * This is overwritten with the buffer space used. -EINVAL is returned * if an overlong buffer is specified or a negative buffer size. -EFAULT * is returned if either the buffer or the length field are not * accessible. * After copying the data up to the limit the user specifies, the true * length of the data is written over the length limit the user * specified. Zero is returned for a success. 
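 *
 * Worked example (editorial): if the kernel address is 110 bytes and the
 * user supplied a 128-byte buffer, 110 bytes are copied and *ulen is set
 * to 110; if the user supplied only 16 bytes, 16 bytes are copied but
 * *ulen is still set to 110, per the POSIX rule quoted in the code below.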
*/ static int move_addr_to_user(struct sockaddr_storage *kaddr, int klen, void __user *uaddr, int __user *ulen) { int err; int len; BUG_ON(klen > sizeof(struct sockaddr_storage)); err = get_user(len, ulen); if (err) return err; if (len > klen) len = klen; if (len < 0) return -EINVAL; if (len) { if (audit_sockaddr(klen, kaddr)) return -ENOMEM; if (copy_to_user(uaddr, kaddr, len)) return -EFAULT; } /* * "fromlen shall refer to the value before truncation.." * 1003.1g */ return __put_user(klen, ulen); } static struct kmem_cache *sock_inode_cachep __ro_after_init; static struct inode *sock_alloc_inode(struct super_block *sb) { struct socket_alloc *ei; ei = alloc_inode_sb(sb, sock_inode_cachep, GFP_KERNEL); if (!ei) return NULL; init_waitqueue_head(&ei->socket.wq.wait); ei->socket.wq.fasync_list = NULL; ei->socket.wq.flags = 0; ei->socket.state = SS_UNCONNECTED; ei->socket.flags = 0; ei->socket.ops = NULL; ei->socket.sk = NULL; ei->socket.file = NULL; return &ei->vfs_inode; } static void sock_free_inode(struct inode *inode) { struct socket_alloc *ei; ei = container_of(inode, struct socket_alloc, vfs_inode); kmem_cache_free(sock_inode_cachep, ei); } static void init_once(void *foo) { struct socket_alloc *ei = (struct socket_alloc *)foo; inode_init_once(&ei->vfs_inode); } static void init_inodecache(void) { sock_inode_cachep = kmem_cache_create("sock_inode_cache", sizeof(struct socket_alloc), 0, (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT), init_once); BUG_ON(sock_inode_cachep == NULL); } static const struct super_operations sockfs_ops = { .alloc_inode = sock_alloc_inode, .free_inode = sock_free_inode, .statfs = simple_statfs, }; /* * sockfs_dname() is called from d_path(). */ static char *sockfs_dname(struct dentry *dentry, char *buffer, int buflen) { return dynamic_dname(buffer, buflen, "socket:[%lu]", d_inode(dentry)->i_ino); } static const struct dentry_operations sockfs_dentry_operations = { .d_dname = sockfs_dname, }; static int sockfs_xattr_get(const struct xattr_handler *handler, struct dentry *dentry, struct inode *inode, const char *suffix, void *value, size_t size) { if (value) { if (dentry->d_name.len + 1 > size) return -ERANGE; memcpy(value, dentry->d_name.name, dentry->d_name.len + 1); } return dentry->d_name.len + 1; } #define XATTR_SOCKPROTONAME_SUFFIX "sockprotoname" #define XATTR_NAME_SOCKPROTONAME (XATTR_SYSTEM_PREFIX XATTR_SOCKPROTONAME_SUFFIX) #define XATTR_NAME_SOCKPROTONAME_LEN (sizeof(XATTR_NAME_SOCKPROTONAME)-1) static const struct xattr_handler sockfs_xattr_handler = { .name = XATTR_NAME_SOCKPROTONAME, .get = sockfs_xattr_get, }; static int sockfs_security_xattr_set(const struct xattr_handler *handler, struct mnt_idmap *idmap, struct dentry *dentry, struct inode *inode, const char *suffix, const void *value, size_t size, int flags) { /* Handled by LSM. 
*/ return -EAGAIN; } static const struct xattr_handler sockfs_security_xattr_handler = { .prefix = XATTR_SECURITY_PREFIX, .set = sockfs_security_xattr_set, }; static const struct xattr_handler * const sockfs_xattr_handlers[] = { &sockfs_xattr_handler, &sockfs_security_xattr_handler, NULL }; static int sockfs_init_fs_context(struct fs_context *fc) { struct pseudo_fs_context *ctx = init_pseudo(fc, SOCKFS_MAGIC); if (!ctx) return -ENOMEM; ctx->ops = &sockfs_ops; ctx->dops = &sockfs_dentry_operations; ctx->xattr = sockfs_xattr_handlers; return 0; } static struct vfsmount *sock_mnt __read_mostly; static struct file_system_type sock_fs_type = { .name = "sockfs", .init_fs_context = sockfs_init_fs_context, .kill_sb = kill_anon_super, }; /* * Obtains the first available file descriptor and sets it up for use. * * These functions create file structures and maps them to fd space * of the current process. On success it returns file descriptor * and file struct implicitly stored in sock->file. * Note that another thread may close file descriptor before we return * from this function. We use the fact that now we do not refer * to socket after mapping. If one day we will need it, this * function will increment ref. count on file by 1. * * In any case returned fd MAY BE not valid! * This race condition is unavoidable * with shared fd spaces, we cannot solve it inside kernel, * but we take care of internal coherence yet. */ /** * sock_alloc_file - Bind a &socket to a &file * @sock: socket * @flags: file status flags * @dname: protocol name * * Returns the &file bound with @sock, implicitly storing it * in sock->file. If dname is %NULL, sets to "". * * On failure @sock is released, and an ERR pointer is returned. * * This function uses GFP_KERNEL internally. */ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname) { struct file *file; if (!dname) dname = sock->sk ? sock->sk->sk_prot_creator->name : ""; file = alloc_file_pseudo(SOCK_INODE(sock), sock_mnt, dname, O_RDWR | (flags & O_NONBLOCK), &socket_file_ops); if (IS_ERR(file)) { sock_release(sock); return file; } file->f_mode |= FMODE_NOWAIT; sock->file = file; file->private_data = sock; stream_open(SOCK_INODE(sock), file); return file; } EXPORT_SYMBOL(sock_alloc_file); static int sock_map_fd(struct socket *sock, int flags) { struct file *newfile; int fd = get_unused_fd_flags(flags); if (unlikely(fd < 0)) { sock_release(sock); return fd; } newfile = sock_alloc_file(sock, flags, NULL); if (!IS_ERR(newfile)) { fd_install(fd, newfile); return fd; } put_unused_fd(fd); return PTR_ERR(newfile); } /** * sock_from_file - Return the &socket bounded to @file. * @file: file * * On failure returns %NULL. */ struct socket *sock_from_file(struct file *file) { if (file->f_op == &socket_file_ops) return file->private_data; /* set in sock_alloc_file */ return NULL; } EXPORT_SYMBOL(sock_from_file); /** * sockfd_lookup - Go from a file number to its socket slot * @fd: file handle * @err: pointer to an error code return * * The file handle passed in is locked and the socket it is bound * to is returned. If an error occurs the err pointer is overwritten * with a negative errno code and NULL is returned. The function checks * for both invalid handles and passing a handle which is not a socket. * * On a success the socket object pointer is returned. 
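 *
 * A minimal usage sketch (editorial; assumes the sockfd_put() helper from
 * <linux/net.h> to drop the file reference taken here):
 *
 *	int err;
 *	struct socket *sock = sockfd_lookup(fd, &err);
 *
 *	if (!sock)
 *		return err;
 *	// ... use sock ...
 *	sockfd_put(sock);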
*/ struct socket *sockfd_lookup(int fd, int *err) { struct file *file; struct socket *sock; file = fget(fd); if (!file) { *err = -EBADF; return NULL; } sock = sock_from_file(file); if (!sock) { *err = -ENOTSOCK; fput(file); } return sock; } EXPORT_SYMBOL(sockfd_lookup); static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed) { struct fd f = fdget(fd); struct socket *sock; *err = -EBADF; if (f.file) { sock = sock_from_file(f.file); if (likely(sock)) { *fput_needed = f.flags & FDPUT_FPUT; return sock; } *err = -ENOTSOCK; fdput(f); } return NULL; } static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer, size_t size) { ssize_t len; ssize_t used = 0; len = security_inode_listsecurity(d_inode(dentry), buffer, size); if (len < 0) return len; used += len; if (buffer) { if (size < used) return -ERANGE; buffer += len; } len = (XATTR_NAME_SOCKPROTONAME_LEN + 1); used += len; if (buffer) { if (size < used) return -ERANGE; memcpy(buffer, XATTR_NAME_SOCKPROTONAME, len); buffer += len; } return used; } static int sockfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *iattr) { int err = simple_setattr(&nop_mnt_idmap, dentry, iattr); if (!err && (iattr->ia_valid & ATTR_UID)) { struct socket *sock = SOCKET_I(d_inode(dentry)); if (sock->sk) sock->sk->sk_uid = iattr->ia_uid; else err = -ENOENT; } return err; } static const struct inode_operations sockfs_inode_ops = { .listxattr = sockfs_listxattr, .setattr = sockfs_setattr, }; /** * sock_alloc - allocate a socket * * Allocate a new inode and socket object. The two are bound together * and initialised. The socket is then returned. If we are out of inodes * NULL is returned. This functions uses GFP_KERNEL internally. */ struct socket *sock_alloc(void) { struct inode *inode; struct socket *sock; inode = new_inode_pseudo(sock_mnt->mnt_sb); if (!inode) return NULL; sock = SOCKET_I(inode); inode->i_ino = get_next_ino(); inode->i_mode = S_IFSOCK | S_IRWXUGO; inode->i_uid = current_fsuid(); inode->i_gid = current_fsgid(); inode->i_op = &sockfs_inode_ops; return sock; } EXPORT_SYMBOL(sock_alloc); static void __sock_release(struct socket *sock, struct inode *inode) { const struct proto_ops *ops = READ_ONCE(sock->ops); if (ops) { struct module *owner = ops->owner; if (inode) inode_lock(inode); ops->release(sock); sock->sk = NULL; if (inode) inode_unlock(inode); sock->ops = NULL; module_put(owner); } if (sock->wq.fasync_list) pr_err("%s: fasync list not empty!\n", __func__); if (!sock->file) { iput(SOCK_INODE(sock)); return; } sock->file = NULL; } /** * sock_release - close a socket * @sock: socket to close * * The socket is released from the protocol stack if it has a release * callback, and the inode is then released if the socket is bound to * an inode not a file. */ void sock_release(struct socket *sock) { __sock_release(sock, NULL); } EXPORT_SYMBOL(sock_release); void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags) { u8 flags = *tx_flags; if (tsflags & SOF_TIMESTAMPING_TX_HARDWARE) { flags |= SKBTX_HW_TSTAMP; /* PTP hardware clocks can provide a free running cycle counter * as a time base for virtual clocks. Tell driver to use the * free running cycle counter for timestamp if socket is bound * to virtual clock. 
*/ if (tsflags & SOF_TIMESTAMPING_BIND_PHC) flags |= SKBTX_HW_TSTAMP_USE_CYCLES; } if (tsflags & SOF_TIMESTAMPING_TX_SOFTWARE) flags |= SKBTX_SW_TSTAMP; if (tsflags & SOF_TIMESTAMPING_TX_SCHED) flags |= SKBTX_SCHED_TSTAMP; *tx_flags = flags; } EXPORT_SYMBOL(__sock_tx_timestamp); INDIRECT_CALLABLE_DECLARE(int inet_sendmsg(struct socket *, struct msghdr *, size_t)); INDIRECT_CALLABLE_DECLARE(int inet6_sendmsg(struct socket *, struct msghdr *, size_t)); static noinline void call_trace_sock_send_length(struct sock *sk, int ret, int flags) { trace_sock_send_length(sk, ret, 0); } static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg) { int ret = INDIRECT_CALL_INET(READ_ONCE(sock->ops)->sendmsg, inet6_sendmsg, inet_sendmsg, sock, msg, msg_data_left(msg)); BUG_ON(ret == -EIOCBQUEUED); if (trace_sock_send_length_enabled()) call_trace_sock_send_length(sock->sk, ret, 0); return ret; } static int __sock_sendmsg(struct socket *sock, struct msghdr *msg) { int err = security_socket_sendmsg(sock, msg, msg_data_left(msg)); return err ?: sock_sendmsg_nosec(sock, msg); } /** * sock_sendmsg - send a message through @sock * @sock: socket * @msg: message to send * * Sends @msg through @sock, passing through LSM. * Returns the number of bytes sent, or an error code. */ int sock_sendmsg(struct socket *sock, struct msghdr *msg) { struct sockaddr_storage *save_addr = (struct sockaddr_storage *)msg->msg_name; struct sockaddr_storage address; int save_len = msg->msg_namelen; int ret; if (msg->msg_name) { memcpy(&address, msg->msg_name, msg->msg_namelen); msg->msg_name = &address; } ret = __sock_sendmsg(sock, msg); msg->msg_name = save_addr; msg->msg_namelen = save_len; return ret; } EXPORT_SYMBOL(sock_sendmsg); /** * kernel_sendmsg - send a message through @sock (kernel-space) * @sock: socket * @msg: message header * @vec: kernel vec * @num: vec array length * @size: total message data size * * Builds the message data with @vec and sends it through @sock. * Returns the number of bytes sent, or an error code. */ int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec, size_t num, size_t size) { iov_iter_kvec(&msg->msg_iter, ITER_SOURCE, vec, num, size); return sock_sendmsg(sock, msg); } EXPORT_SYMBOL(kernel_sendmsg); /** * kernel_sendmsg_locked - send a message through @sock (kernel-space) * @sk: sock * @msg: message header * @vec: output s/g array * @num: output s/g array length * @size: total message data size * * Builds the message data with @vec and sends it through @sock. * Returns the number of bytes sent, or an error code. * Caller must hold @sk. */ int kernel_sendmsg_locked(struct sock *sk, struct msghdr *msg, struct kvec *vec, size_t num, size_t size) { struct socket *sock = sk->sk_socket; const struct proto_ops *ops = READ_ONCE(sock->ops); if (!ops->sendmsg_locked) return sock_no_sendmsg_locked(sk, msg, size); iov_iter_kvec(&msg->msg_iter, ITER_SOURCE, vec, num, size); return ops->sendmsg_locked(sk, msg, msg_data_left(msg)); } EXPORT_SYMBOL(kernel_sendmsg_locked); static bool skb_is_err_queue(const struct sk_buff *skb) { /* pkt_type of skbs enqueued on the error queue are set to * PACKET_OUTGOING in skb_set_err_queue(). This is only safe to do * in recvmsg, since skbs received on a local socket will never * have a pkt_type of PACKET_OUTGOING. */ return skb->pkt_type == PACKET_OUTGOING; } /* On transmit, software and hardware timestamps are returned independently. 
* As the two skb clones share the hardware timestamp, which may be updated * before the software timestamp is received, a hardware TX timestamp may be * returned only if there is no software TX timestamp. Ignore false software * timestamps, which may be made in the __sock_recv_timestamp() call when the * option SO_TIMESTAMP_OLD(NS) is enabled on the socket, even when the skb has a * hardware timestamp. */ static bool skb_is_swtx_tstamp(const struct sk_buff *skb, int false_tstamp) { return skb->tstamp && !false_tstamp && skb_is_err_queue(skb); } static ktime_t get_timestamp(struct sock *sk, struct sk_buff *skb, int *if_index) { bool cycles = READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_BIND_PHC; struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); struct net_device *orig_dev; ktime_t hwtstamp; rcu_read_lock(); orig_dev = dev_get_by_napi_id(skb_napi_id(skb)); if (orig_dev) { *if_index = orig_dev->ifindex; hwtstamp = netdev_get_tstamp(orig_dev, shhwtstamps, cycles); } else { hwtstamp = shhwtstamps->hwtstamp; } rcu_read_unlock(); return hwtstamp; } static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb, int if_index) { struct scm_ts_pktinfo ts_pktinfo; struct net_device *orig_dev; if (!skb_mac_header_was_set(skb)) return; memset(&ts_pktinfo, 0, sizeof(ts_pktinfo)); if (!if_index) { rcu_read_lock(); orig_dev = dev_get_by_napi_id(skb_napi_id(skb)); if (orig_dev) if_index = orig_dev->ifindex; rcu_read_unlock(); } ts_pktinfo.if_index = if_index; ts_pktinfo.pkt_length = skb->len - skb_mac_offset(skb); put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO, sizeof(ts_pktinfo), &ts_pktinfo); } /* * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP) */ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP); int new_tstamp = sock_flag(sk, SOCK_TSTAMP_NEW); struct scm_timestamping_internal tss; int empty = 1, false_tstamp = 0; struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); int if_index; ktime_t hwtstamp; u32 tsflags; /* Race occurred between timestamp enabling and packet receiving. Fill in the current time for now. 
*/ if (need_software_tstamp && skb->tstamp == 0) { __net_timestamp(skb); false_tstamp = 1; } if (need_software_tstamp) { if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) { if (new_tstamp) { struct __kernel_sock_timeval tv; skb_get_new_timestamp(skb, &tv); put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMP_NEW, sizeof(tv), &tv); } else { struct __kernel_old_timeval tv; skb_get_timestamp(skb, &tv); put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMP_OLD, sizeof(tv), &tv); } } else { if (new_tstamp) { struct __kernel_timespec ts; skb_get_new_timestampns(skb, &ts); put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMPNS_NEW, sizeof(ts), &ts); } else { struct __kernel_old_timespec ts; skb_get_timestampns(skb, &ts); put_cmsg(msg, SOL_SOCKET, SO_TIMESTAMPNS_OLD, sizeof(ts), &ts); } } } memset(&tss, 0, sizeof(tss)); tsflags = READ_ONCE(sk->sk_tsflags); if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0)) empty = 0; if (shhwtstamps && (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) && !skb_is_swtx_tstamp(skb, false_tstamp)) { if_index = 0; if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV) hwtstamp = get_timestamp(sk, skb, &if_index); else hwtstamp = shhwtstamps->hwtstamp; if (tsflags & SOF_TIMESTAMPING_BIND_PHC) hwtstamp = ptp_convert_timestamp(&hwtstamp, READ_ONCE(sk->sk_bind_phc)); if (ktime_to_timespec64_cond(hwtstamp, tss.ts + 2)) { empty = 0; if ((tsflags & SOF_TIMESTAMPING_OPT_PKTINFO) && !skb_is_err_queue(skb)) put_ts_pktinfo(msg, skb, if_index); } } if (!empty) { if (sock_flag(sk, SOCK_TSTAMP_NEW)) put_cmsg_scm_timestamping64(msg, &tss); else put_cmsg_scm_timestamping(msg, &tss); if (skb_is_err_queue(skb) && skb->len && SKB_EXT_ERR(skb)->opt_stats) put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_OPT_STATS, skb->len, skb->data); } } EXPORT_SYMBOL_GPL(__sock_recv_timestamp); #ifdef CONFIG_WIRELESS void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { int ack; if (!sock_flag(sk, SOCK_WIFI_STATUS)) return; if (!skb->wifi_acked_valid) return; ack = skb->wifi_acked; put_cmsg(msg, SOL_SOCKET, SCM_WIFI_STATUS, sizeof(ack), &ack); } EXPORT_SYMBOL_GPL(__sock_recv_wifi_status); #endif static inline void sock_recv_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { if (sock_flag(sk, SOCK_RXQ_OVFL) && skb && SOCK_SKB_CB(skb)->dropcount) put_cmsg(msg, SOL_SOCKET, SO_RXQ_OVFL, sizeof(__u32), &SOCK_SKB_CB(skb)->dropcount); } static void sock_recv_mark(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { if (sock_flag(sk, SOCK_RCVMARK) && skb) { /* We must use a bounce buffer for CONFIG_HARDENED_USERCOPY=y */ __u32 mark = skb->mark; put_cmsg(msg, SOL_SOCKET, SO_MARK, sizeof(__u32), &mark); } } void __sock_recv_cmsgs(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { sock_recv_timestamp(msg, sk, skb); sock_recv_drops(msg, sk, skb); sock_recv_mark(msg, sk, skb); } EXPORT_SYMBOL_GPL(__sock_recv_cmsgs); INDIRECT_CALLABLE_DECLARE(int inet_recvmsg(struct socket *, struct msghdr *, size_t, int)); INDIRECT_CALLABLE_DECLARE(int inet6_recvmsg(struct socket *, struct msghdr *, size_t, int)); static noinline void call_trace_sock_recv_length(struct sock *sk, int ret, int flags) { trace_sock_recv_length(sk, ret, flags); } static inline int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg, int flags) { int ret = INDIRECT_CALL_INET(READ_ONCE(sock->ops)->recvmsg, inet6_recvmsg, inet_recvmsg, sock, msg, msg_data_left(msg), flags); if (trace_sock_recv_length_enabled()) call_trace_sock_recv_length(sock->sk, ret, flags); return ret; } /** * sock_recvmsg - receive a 
message from @sock * @sock: socket * @msg: message to receive * @flags: message flags * * Receives @msg from @sock, passing through LSM. Returns the total number * of bytes received, or an error. */ int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags) { int err = security_socket_recvmsg(sock, msg, msg_data_left(msg), flags); return err ?: sock_recvmsg_nosec(sock, msg, flags); } EXPORT_SYMBOL(sock_recvmsg); /** * kernel_recvmsg - Receive a message from a socket (kernel space) * @sock: The socket to receive the message from * @msg: Received message * @vec: Input s/g array for message data * @num: Size of input s/g array * @size: Number of bytes to read * @flags: Message flags (MSG_DONTWAIT, etc...) * * On return the msg structure contains the scatter/gather array passed in the * vec argument. The array is modified so that it consists of the unfilled * portion of the original array. * * The returned value is the total number of bytes received, or an error. */ int kernel_recvmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec, size_t num, size_t size, int flags) { msg->msg_control_is_user = false; iov_iter_kvec(&msg->msg_iter, ITER_DEST, vec, num, size); return sock_recvmsg(sock, msg, flags); } EXPORT_SYMBOL(kernel_recvmsg); static ssize_t sock_splice_read(struct file *file, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { struct socket *sock = file->private_data; const struct proto_ops *ops; ops = READ_ONCE(sock->ops); if (unlikely(!ops->splice_read)) return copy_splice_read(file, ppos, pipe, len, flags); return ops->splice_read(sock, ppos, pipe, len, flags); } static void sock_splice_eof(struct file *file) { struct socket *sock = file->private_data; const struct proto_ops *ops; ops = READ_ONCE(sock->ops); if (ops->splice_eof) ops->splice_eof(sock); } static ssize_t sock_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct file *file = iocb->ki_filp; struct socket *sock = file->private_data; struct msghdr msg = {.msg_iter = *to, .msg_iocb = iocb}; ssize_t res; if (file->f_flags & O_NONBLOCK || (iocb->ki_flags & IOCB_NOWAIT)) msg.msg_flags = MSG_DONTWAIT; if (iocb->ki_pos != 0) return -ESPIPE; if (!iov_iter_count(to)) /* Match SYS5 behaviour */ return 0; res = sock_recvmsg(sock, &msg, msg.msg_flags); *to = msg.msg_iter; return res; } static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct socket *sock = file->private_data; struct msghdr msg = {.msg_iter = *from, .msg_iocb = iocb}; ssize_t res; if (iocb->ki_pos != 0) return -ESPIPE; if (file->f_flags & O_NONBLOCK || (iocb->ki_flags & IOCB_NOWAIT)) msg.msg_flags = MSG_DONTWAIT; if (sock->type == SOCK_SEQPACKET) msg.msg_flags |= MSG_EOR; res = __sock_sendmsg(sock, &msg); *from = msg.msg_iter; return res; } /* * Atomic setting of ioctl hooks to avoid race * with module unload. 
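 *
 * A bridge-style module is expected to install its hook at init time and
 * clear it again on unload (editorial sketch; the hook name is invented):
 *
 *	brioctl_set(example_br_ioctl_hook);	// on module init
 *	...
 *	brioctl_set(NULL);			// on module exit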
*/ static DEFINE_MUTEX(br_ioctl_mutex); static int (*br_ioctl_hook)(struct net *net, struct net_bridge *br, unsigned int cmd, struct ifreq *ifr, void __user *uarg); void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br, unsigned int cmd, struct ifreq *ifr, void __user *uarg)) { mutex_lock(&br_ioctl_mutex); br_ioctl_hook = hook; mutex_unlock(&br_ioctl_mutex); } EXPORT_SYMBOL(brioctl_set); int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd, struct ifreq *ifr, void __user *uarg) { int err = -ENOPKG; if (!br_ioctl_hook) request_module("bridge"); mutex_lock(&br_ioctl_mutex); if (br_ioctl_hook) err = br_ioctl_hook(net, br, cmd, ifr, uarg); mutex_unlock(&br_ioctl_mutex); return err; } static DEFINE_MUTEX(vlan_ioctl_mutex); static int (*vlan_ioctl_hook) (struct net *, void __user *arg); void vlan_ioctl_set(int (*hook) (struct net *, void __user *)) { mutex_lock(&vlan_ioctl_mutex); vlan_ioctl_hook = hook; mutex_unlock(&vlan_ioctl_mutex); } EXPORT_SYMBOL(vlan_ioctl_set); static long sock_do_ioctl(struct net *net, struct socket *sock, unsigned int cmd, unsigned long arg) { const struct proto_ops *ops = READ_ONCE(sock->ops); struct ifreq ifr; bool need_copyout; int err; void __user *argp = (void __user *)arg; void __user *data; err = ops->ioctl(sock, cmd, arg); /* * If this ioctl is unknown try to hand it down * to the NIC driver. */ if (err != -ENOIOCTLCMD) return err; if (!is_socket_ioctl_cmd(cmd)) return -ENOTTY; if (get_user_ifreq(&ifr, &data, argp)) return -EFAULT; err = dev_ioctl(net, cmd, &ifr, data, &need_copyout); if (!err && need_copyout) if (put_user_ifreq(&ifr, argp)) return -EFAULT; return err; } /* * With an ioctl, arg may well be a user mode pointer, but we don't know * what to do with it - that's up to the protocol still. 
*/ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg) { const struct proto_ops *ops; struct socket *sock; struct sock *sk; void __user *argp = (void __user *)arg; int pid, err; struct net *net; sock = file->private_data; ops = READ_ONCE(sock->ops); sk = sock->sk; net = sock_net(sk); if (unlikely(cmd >= SIOCDEVPRIVATE && cmd <= (SIOCDEVPRIVATE + 15))) { struct ifreq ifr; void __user *data; bool need_copyout; if (get_user_ifreq(&ifr, &data, argp)) return -EFAULT; err = dev_ioctl(net, cmd, &ifr, data, &need_copyout); if (!err && need_copyout) if (put_user_ifreq(&ifr, argp)) return -EFAULT; } else #ifdef CONFIG_WEXT_CORE if (cmd >= SIOCIWFIRST && cmd <= SIOCIWLAST) { err = wext_handle_ioctl(net, cmd, argp); } else #endif switch (cmd) { case FIOSETOWN: case SIOCSPGRP: err = -EFAULT; if (get_user(pid, (int __user *)argp)) break; err = f_setown(sock->file, pid, 1); break; case FIOGETOWN: case SIOCGPGRP: err = put_user(f_getown(sock->file), (int __user *)argp); break; case SIOCGIFBR: case SIOCSIFBR: case SIOCBRADDBR: case SIOCBRDELBR: err = br_ioctl_call(net, NULL, cmd, NULL, argp); break; case SIOCGIFVLAN: case SIOCSIFVLAN: err = -ENOPKG; if (!vlan_ioctl_hook) request_module("8021q"); mutex_lock(&vlan_ioctl_mutex); if (vlan_ioctl_hook) err = vlan_ioctl_hook(net, argp); mutex_unlock(&vlan_ioctl_mutex); break; case SIOCGSKNS: err = -EPERM; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) break; err = open_related_ns(&net->ns, get_net_ns); break; case SIOCGSTAMP_OLD: case SIOCGSTAMPNS_OLD: if (!ops->gettstamp) { err = -ENOIOCTLCMD; break; } err = ops->gettstamp(sock, argp, cmd == SIOCGSTAMP_OLD, !IS_ENABLED(CONFIG_64BIT)); break; case SIOCGSTAMP_NEW: case SIOCGSTAMPNS_NEW: if (!ops->gettstamp) { err = -ENOIOCTLCMD; break; } err = ops->gettstamp(sock, argp, cmd == SIOCGSTAMP_NEW, false); break; case SIOCGIFCONF: err = dev_ifconf(net, argp); break; default: err = sock_do_ioctl(net, sock, cmd, arg); break; } return err; } /** * sock_create_lite - creates a socket * @family: protocol family (AF_INET, ...) * @type: communication type (SOCK_STREAM, ...) * @protocol: protocol (0, ...) * @res: new socket * * Creates a new socket and assigns it to @res, passing through LSM. * The new socket initialization is not complete, see kernel_accept(). * Returns 0 or an error. On failure @res is set to %NULL. * This function internally uses GFP_KERNEL. 
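 *
 * Rough usage sketch (editorial, modelled on the kernel_accept() pattern;
 * completing the new socket is left to the caller):
 *
 *	struct socket *newsock;
 *	int err = sock_create_lite(sk->sk_family, sk->sk_type,
 *				   sk->sk_protocol, &newsock);
 *	if (err)
 *		return err;
 *	newsock->ops = oldsock->ops;	// finish initialisation by hand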
*/ int sock_create_lite(int family, int type, int protocol, struct socket **res) { int err; struct socket *sock = NULL; err = security_socket_create(family, type, protocol, 1); if (err) goto out; sock = sock_alloc(); if (!sock) { err = -ENOMEM; goto out; } sock->type = type; err = security_socket_post_create(sock, family, type, protocol, 1); if (err) goto out_release; out: *res = sock; return err; out_release: sock_release(sock); sock = NULL; goto out; } EXPORT_SYMBOL(sock_create_lite); /* No kernel lock held - perfect */ static __poll_t sock_poll(struct file *file, poll_table *wait) { struct socket *sock = file->private_data; const struct proto_ops *ops = READ_ONCE(sock->ops); __poll_t events = poll_requested_events(wait), flag = 0; if (!ops->poll) return 0; if (sk_can_busy_loop(sock->sk)) { /* poll once if requested by the syscall */ if (events & POLL_BUSY_LOOP) sk_busy_loop(sock->sk, 1); /* if this socket can poll_ll, tell the system call */ flag = POLL_BUSY_LOOP; } return ops->poll(file, sock, wait) | flag; } static int sock_mmap(struct file *file, struct vm_area_struct *vma) { struct socket *sock = file->private_data; return READ_ONCE(sock->ops)->mmap(file, sock, vma); } static int sock_close(struct inode *inode, struct file *filp) { __sock_release(SOCKET_I(inode), inode); return 0; } /* * Update the socket async list * * Fasync_list locking strategy. * * 1. fasync_list is modified only under process context socket lock * i.e. under semaphore. * 2. fasync_list is used under read_lock(&sk->sk_callback_lock) * or under socket lock */ static int sock_fasync(int fd, struct file *filp, int on) { struct socket *sock = filp->private_data; struct sock *sk = sock->sk; struct socket_wq *wq = &sock->wq; if (sk == NULL) return -EINVAL; lock_sock(sk); fasync_helper(fd, filp, on, &wq->fasync_list); if (!wq->fasync_list) sock_reset_flag(sk, SOCK_FASYNC); else sock_set_flag(sk, SOCK_FASYNC); release_sock(sk); return 0; } /* This function may be called only under rcu_lock */ int sock_wake_async(struct socket_wq *wq, int how, int band) { if (!wq || !wq->fasync_list) return -1; switch (how) { case SOCK_WAKE_WAITD: if (test_bit(SOCKWQ_ASYNC_WAITDATA, &wq->flags)) break; goto call_kill; case SOCK_WAKE_SPACE: if (!test_and_clear_bit(SOCKWQ_ASYNC_NOSPACE, &wq->flags)) break; fallthrough; case SOCK_WAKE_IO: call_kill: kill_fasync(&wq->fasync_list, SIGIO, band); break; case SOCK_WAKE_URG: kill_fasync(&wq->fasync_list, SIGURG, band); } return 0; } EXPORT_SYMBOL(sock_wake_async); /** * __sock_create - creates a socket * @net: net namespace * @family: protocol family (AF_INET, ...) * @type: communication type (SOCK_STREAM, ...) * @protocol: protocol (0, ...) * @res: new socket * @kern: boolean for kernel space sockets * * Creates a new socket and assigns it to @res, passing through LSM. * Returns 0 or an error. On failure @res is set to %NULL. @kern must * be set to true if the socket resides in kernel space. * This function internally uses GFP_KERNEL. */ int __sock_create(struct net *net, int family, int type, int protocol, struct socket **res, int kern) { int err; struct socket *sock; const struct net_proto_family *pf; /* * Check protocol is in range */ if (family < 0 || family >= NPROTO) return -EAFNOSUPPORT; if (type < 0 || type >= SOCK_MAX) return -EINVAL; /* Compatibility. This uglymoron is moved from INET layer to here to avoid deadlock in module load. 
*/ if (family == PF_INET && type == SOCK_PACKET) { pr_info_once("%s uses obsolete (PF_INET,SOCK_PACKET)\n", current->comm); family = PF_PACKET; } err = security_socket_create(family, type, protocol, kern); if (err) return err; /* * Allocate the socket and allow the family to set things up. if * the protocol is 0, the family is instructed to select an appropriate * default. */ sock = sock_alloc(); if (!sock) { net_warn_ratelimited("socket: no more sockets\n"); return -ENFILE; /* Not exactly a match, but its the closest posix thing */ } sock->type = type; #ifdef CONFIG_MODULES /* Attempt to load a protocol module if the find failed. * * 12/09/1996 Marcin: But! this makes REALLY only sense, if the user * requested real, full-featured networking support upon configuration. * Otherwise module support will break! */ if (rcu_access_pointer(net_families[family]) == NULL) request_module("net-pf-%d", family); #endif rcu_read_lock(); pf = rcu_dereference(net_families[family]); err = -EAFNOSUPPORT; if (!pf) goto out_release; /* * We will call the ->create function, that possibly is in a loadable * module, so we have to bump that loadable module refcnt first. */ if (!try_module_get(pf->owner)) goto out_release; /* Now protected by module ref count */ rcu_read_unlock(); err = pf->create(net, sock, protocol, kern); if (err < 0) goto out_module_put; /* * Now to bump the refcnt of the [loadable] module that owns this * socket at sock_release time we decrement its refcnt. */ if (!try_module_get(sock->ops->owner)) goto out_module_busy; /* * Now that we're done with the ->create function, the [loadable] * module can have its refcnt decremented */ module_put(pf->owner); err = security_socket_post_create(sock, family, type, protocol, kern); if (err) goto out_sock_release; *res = sock; return 0; out_module_busy: err = -EAFNOSUPPORT; out_module_put: sock->ops = NULL; module_put(pf->owner); out_sock_release: sock_release(sock); return err; out_release: rcu_read_unlock(); goto out_sock_release; } EXPORT_SYMBOL(__sock_create); /** * sock_create - creates a socket * @family: protocol family (AF_INET, ...) * @type: communication type (SOCK_STREAM, ...) * @protocol: protocol (0, ...) * @res: new socket * * A wrapper around __sock_create(). * Returns 0 or an error. This function internally uses GFP_KERNEL. */ int sock_create(int family, int type, int protocol, struct socket **res) { return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0); } EXPORT_SYMBOL(sock_create); /** * sock_create_kern - creates a socket (kernel space) * @net: net namespace * @family: protocol family (AF_INET, ...) * @type: communication type (SOCK_STREAM, ...) * @protocol: protocol (0, ...) * @res: new socket * * A wrapper around __sock_create(). * Returns 0 or an error. This function internally uses GFP_KERNEL. */ int sock_create_kern(struct net *net, int family, int type, int protocol, struct socket **res) { return __sock_create(net, family, type, protocol, res, 1); } EXPORT_SYMBOL(sock_create_kern); static struct socket *__sys_socket_create(int family, int type, int protocol) { struct socket *sock; int retval; /* Check the SOCK_* constants for consistency. 
*/ BUILD_BUG_ON(SOCK_CLOEXEC != O_CLOEXEC); BUILD_BUG_ON((SOCK_MAX | SOCK_TYPE_MASK) != SOCK_TYPE_MASK); BUILD_BUG_ON(SOCK_CLOEXEC & SOCK_TYPE_MASK); BUILD_BUG_ON(SOCK_NONBLOCK & SOCK_TYPE_MASK); if ((type & ~SOCK_TYPE_MASK) & ~(SOCK_CLOEXEC | SOCK_NONBLOCK)) return ERR_PTR(-EINVAL); type &= SOCK_TYPE_MASK; retval = sock_create(family, type, protocol, &sock); if (retval < 0) return ERR_PTR(retval); return sock; } struct file *__sys_socket_file(int family, int type, int protocol) { struct socket *sock; int flags; sock = __sys_socket_create(family, type, protocol); if (IS_ERR(sock)) return ERR_CAST(sock); flags = type & ~SOCK_TYPE_MASK; if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK)) flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK; return sock_alloc_file(sock, flags, NULL); } /* A hook for bpf progs to attach to and update socket protocol. * * A static noinline declaration here could cause the compiler to * optimize away the function. A global noinline declaration will * keep the definition, but may optimize away the callsite. * Therefore, __weak is needed to ensure that the call is still * emitted, by telling the compiler that we don't know what the * function might eventually be. */ __bpf_hook_start(); __weak noinline int update_socket_protocol(int family, int type, int protocol) { return protocol; } __bpf_hook_end(); int __sys_socket(int family, int type, int protocol) { struct socket *sock; int flags; sock = __sys_socket_create(family, type, update_socket_protocol(family, type, protocol)); if (IS_ERR(sock)) return PTR_ERR(sock); flags = type & ~SOCK_TYPE_MASK; if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK)) flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK; return sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK)); } SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol) { return __sys_socket(family, type, protocol); } /* * Create a pair of connected sockets. */ int __sys_socketpair(int family, int type, int protocol, int __user *usockvec) { struct socket *sock1, *sock2; int fd1, fd2, err; struct file *newfile1, *newfile2; int flags; flags = type & ~SOCK_TYPE_MASK; if (flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK)) return -EINVAL; type &= SOCK_TYPE_MASK; if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK)) flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK; /* * reserve descriptors and make sure we won't fail * to return them to userland. */ fd1 = get_unused_fd_flags(flags); if (unlikely(fd1 < 0)) return fd1; fd2 = get_unused_fd_flags(flags); if (unlikely(fd2 < 0)) { put_unused_fd(fd1); return fd2; } err = put_user(fd1, &usockvec[0]); if (err) goto out; err = put_user(fd2, &usockvec[1]); if (err) goto out; /* * Obtain the first socket and check if the underlying protocol * supports the socketpair call. 
*/ err = sock_create(family, type, protocol, &sock1); if (unlikely(err < 0)) goto out; err = sock_create(family, type, protocol, &sock2); if (unlikely(err < 0)) { sock_release(sock1); goto out; } err = security_socket_socketpair(sock1, sock2); if (unlikely(err)) { sock_release(sock2); sock_release(sock1); goto out; } err = READ_ONCE(sock1->ops)->socketpair(sock1, sock2); if (unlikely(err < 0)) { sock_release(sock2); sock_release(sock1); goto out; } newfile1 = sock_alloc_file(sock1, flags, NULL); if (IS_ERR(newfile1)) { err = PTR_ERR(newfile1); sock_release(sock2); goto out; } newfile2 = sock_alloc_file(sock2, flags, NULL); if (IS_ERR(newfile2)) { err = PTR_ERR(newfile2); fput(newfile1); goto out; } audit_fd_pair(fd1, fd2); fd_install(fd1, newfile1); fd_install(fd2, newfile2); return 0; out: put_unused_fd(fd2); put_unused_fd(fd1); return err; } SYSCALL_DEFINE4(socketpair, int, family, int, type, int, protocol, int __user *, usockvec) { return __sys_socketpair(family, type, protocol, usockvec); } int __sys_bind_socket(struct socket *sock, struct sockaddr_storage *address, int addrlen) { int err; err = security_socket_bind(sock, (struct sockaddr *)address, addrlen); if (!err) err = READ_ONCE(sock->ops)->bind(sock, (struct sockaddr *)address, addrlen); return err; } /* * Bind a name to a socket. Nothing much to do here since it's * the protocol's responsibility to handle the local address. * * We move the socket address to kernel space before we call * the protocol layer (having also checked the address is ok). */ int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen) { struct socket *sock; struct sockaddr_storage address; int err, fput_needed; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (sock) { err = move_addr_to_kernel(umyaddr, addrlen, &address); if (!err) err = __sys_bind_socket(sock, &address, addrlen); fput_light(sock->file, fput_needed); } return err; } SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen) { return __sys_bind(fd, umyaddr, addrlen); } /* * Perform a listen. Basically, we allow the protocol to do anything * necessary for a listen, and if that works, we mark the socket as * ready for listening. */ int __sys_listen_socket(struct socket *sock, int backlog) { int somaxconn, err; somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn); if ((unsigned int)backlog > somaxconn) backlog = somaxconn; err = security_socket_listen(sock, backlog); if (!err) err = READ_ONCE(sock->ops)->listen(sock, backlog); return err; } int __sys_listen(int fd, int backlog) { struct socket *sock; int err, fput_needed; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (sock) { err = __sys_listen_socket(sock, backlog); fput_light(sock->file, fput_needed); } return err; } SYSCALL_DEFINE2(listen, int, fd, int, backlog) { return __sys_listen(fd, backlog); } struct file *do_accept(struct file *file, struct proto_accept_arg *arg, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen, int flags) { struct socket *sock, *newsock; struct file *newfile; int err, len; struct sockaddr_storage address; const struct proto_ops *ops; sock = sock_from_file(file); if (!sock) return ERR_PTR(-ENOTSOCK); newsock = sock_alloc(); if (!newsock) return ERR_PTR(-ENFILE); ops = READ_ONCE(sock->ops); newsock->type = sock->type; newsock->ops = ops; /* * We don't need try_module_get here, as the listening socket (sock) * has the protocol module (sock->ops->owner) held. 
*/ __module_get(ops->owner); newfile = sock_alloc_file(newsock, flags, sock->sk->sk_prot_creator->name); if (IS_ERR(newfile)) return newfile; err = security_socket_accept(sock, newsock); if (err) goto out_fd; arg->flags |= sock->file->f_flags; err = ops->accept(sock, newsock, arg); if (err < 0) goto out_fd; if (upeer_sockaddr) { len = ops->getname(newsock, (struct sockaddr *)&address, 2); if (len < 0) { err = -ECONNABORTED; goto out_fd; } err = move_addr_to_user(&address, len, upeer_sockaddr, upeer_addrlen); if (err < 0) goto out_fd; } /* File flags are not inherited via accept() unlike another OSes. */ return newfile; out_fd: fput(newfile); return ERR_PTR(err); } static int __sys_accept4_file(struct file *file, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen, int flags) { struct proto_accept_arg arg = { }; struct file *newfile; int newfd; if (flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK)) return -EINVAL; if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK)) flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK; newfd = get_unused_fd_flags(flags); if (unlikely(newfd < 0)) return newfd; newfile = do_accept(file, &arg, upeer_sockaddr, upeer_addrlen, flags); if (IS_ERR(newfile)) { put_unused_fd(newfd); return PTR_ERR(newfile); } fd_install(newfd, newfile); return newfd; } /* * For accept, we attempt to create a new socket, set up the link * with the client, wake up the client, then return the new * connected fd. We collect the address of the connector in kernel * space and move it to user at the very end. This is unclean because * we open the socket then return an error. * * 1003.1g adds the ability to recvmsg() to query connection pending * status to recvmsg. We need to add that support in a way thats * clean when we restructure accept also. */ int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen, int flags) { int ret = -EBADF; struct fd f; f = fdget(fd); if (f.file) { ret = __sys_accept4_file(f.file, upeer_sockaddr, upeer_addrlen, flags); fdput(f); } return ret; } SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr, int __user *, upeer_addrlen, int, flags) { return __sys_accept4(fd, upeer_sockaddr, upeer_addrlen, flags); } SYSCALL_DEFINE3(accept, int, fd, struct sockaddr __user *, upeer_sockaddr, int __user *, upeer_addrlen) { return __sys_accept4(fd, upeer_sockaddr, upeer_addrlen, 0); } /* * Attempt to connect to a socket with the server address. The address * is in user space so we verify it is OK and move it to kernel space. * * For 1003.1g we need to add clean support for a bind to AF_UNSPEC to * break bindings * * NOTE: 1003.1g draft 6.3 is broken with respect to AX.25/NetROM and * other SEQPACKET protocols that take time to connect() as it doesn't * include the -EINPROGRESS status for such sockets. 
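 *
 * For illustration only (a hypothetical user-space sketch, not part of this
 * file): with O_NONBLOCK set, connect() on a stream socket typically fails
 * with EINPROGRESS and the result is collected later via poll() and
 * getsockopt(SO_ERROR):
 *
 *	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
 *	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 &&
 *	    errno == EINPROGRESS) {
 *		struct pollfd pfd = { .fd = fd, .events = POLLOUT };
 *		int soerr;
 *		socklen_t len = sizeof(soerr);
 *
 *		poll(&pfd, 1, -1);
 *		getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len);
 *	}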
*/ int __sys_connect_file(struct file *file, struct sockaddr_storage *address, int addrlen, int file_flags) { struct socket *sock; int err; sock = sock_from_file(file); if (!sock) { err = -ENOTSOCK; goto out; } err = security_socket_connect(sock, (struct sockaddr *)address, addrlen); if (err) goto out; err = READ_ONCE(sock->ops)->connect(sock, (struct sockaddr *)address, addrlen, sock->file->f_flags | file_flags); out: return err; } int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen) { int ret = -EBADF; struct fd f; f = fdget(fd); if (f.file) { struct sockaddr_storage address; ret = move_addr_to_kernel(uservaddr, addrlen, &address); if (!ret) ret = __sys_connect_file(f.file, &address, addrlen, 0); fdput(f); } return ret; } SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr, int, addrlen) { return __sys_connect(fd, uservaddr, addrlen); } /* * Get the local address ('name') of a socket object. Move the obtained * name to user space. */ int __sys_getsockname(int fd, struct sockaddr __user *usockaddr, int __user *usockaddr_len) { struct socket *sock; struct sockaddr_storage address; int err, fput_needed; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; err = security_socket_getsockname(sock); if (err) goto out_put; err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 0); if (err < 0) goto out_put; /* "err" is actually length in this case */ err = move_addr_to_user(&address, err, usockaddr, usockaddr_len); out_put: fput_light(sock->file, fput_needed); out: return err; } SYSCALL_DEFINE3(getsockname, int, fd, struct sockaddr __user *, usockaddr, int __user *, usockaddr_len) { return __sys_getsockname(fd, usockaddr, usockaddr_len); } /* * Get the remote address ('name') of a socket object. Move the obtained * name to user space. */ int __sys_getpeername(int fd, struct sockaddr __user *usockaddr, int __user *usockaddr_len) { struct socket *sock; struct sockaddr_storage address; int err, fput_needed; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (sock != NULL) { const struct proto_ops *ops = READ_ONCE(sock->ops); err = security_socket_getpeername(sock); if (err) { fput_light(sock->file, fput_needed); return err; } err = ops->getname(sock, (struct sockaddr *)&address, 1); if (err >= 0) /* "err" is actually length in this case */ err = move_addr_to_user(&address, err, usockaddr, usockaddr_len); fput_light(sock->file, fput_needed); } return err; } SYSCALL_DEFINE3(getpeername, int, fd, struct sockaddr __user *, usockaddr, int __user *, usockaddr_len) { return __sys_getpeername(fd, usockaddr, usockaddr_len); } /* * Send a datagram to a given address. We move the address into kernel * space and check the user space data area is readable before invoking * the protocol. 
*/ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags, struct sockaddr __user *addr, int addr_len) { struct socket *sock; struct sockaddr_storage address; int err; struct msghdr msg; int fput_needed; err = import_ubuf(ITER_SOURCE, buff, len, &msg.msg_iter); if (unlikely(err)) return err; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; msg.msg_name = NULL; msg.msg_control = NULL; msg.msg_controllen = 0; msg.msg_namelen = 0; msg.msg_ubuf = NULL; if (addr) { err = move_addr_to_kernel(addr, addr_len, &address); if (err < 0) goto out_put; msg.msg_name = (struct sockaddr *)&address; msg.msg_namelen = addr_len; } flags &= ~MSG_INTERNAL_SENDMSG_FLAGS; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; msg.msg_flags = flags; err = __sock_sendmsg(sock, &msg); out_put: fput_light(sock->file, fput_needed); out: return err; } SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len, unsigned int, flags, struct sockaddr __user *, addr, int, addr_len) { return __sys_sendto(fd, buff, len, flags, addr, addr_len); } /* * Send a datagram down a socket. */ SYSCALL_DEFINE4(send, int, fd, void __user *, buff, size_t, len, unsigned int, flags) { return __sys_sendto(fd, buff, len, flags, NULL, 0); } /* * Receive a frame from the socket and optionally record the address of the * sender. We verify the buffers are writable and if needed move the * sender address from kernel to user space. */ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags, struct sockaddr __user *addr, int __user *addr_len) { struct sockaddr_storage address; struct msghdr msg = { /* Save some cycles and don't copy the address if not needed */ .msg_name = addr ? (struct sockaddr *)&address : NULL, }; struct socket *sock; int err, err2; int fput_needed; err = import_ubuf(ITER_DEST, ubuf, size, &msg.msg_iter); if (unlikely(err)) return err; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; err = sock_recvmsg(sock, &msg, flags); if (err >= 0 && addr != NULL) { err2 = move_addr_to_user(&address, msg.msg_namelen, addr, addr_len); if (err2 < 0) err = err2; } fput_light(sock->file, fput_needed); out: return err; } SYSCALL_DEFINE6(recvfrom, int, fd, void __user *, ubuf, size_t, size, unsigned int, flags, struct sockaddr __user *, addr, int __user *, addr_len) { return __sys_recvfrom(fd, ubuf, size, flags, addr, addr_len); } /* * Receive a datagram from a socket. 
*/ SYSCALL_DEFINE4(recv, int, fd, void __user *, ubuf, size_t, size, unsigned int, flags) { return __sys_recvfrom(fd, ubuf, size, flags, NULL, NULL); } static bool sock_use_custom_sol_socket(const struct socket *sock) { return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags); } int do_sock_setsockopt(struct socket *sock, bool compat, int level, int optname, sockptr_t optval, int optlen) { const struct proto_ops *ops; char *kernel_optval = NULL; int err; if (optlen < 0) return -EINVAL; err = security_socket_setsockopt(sock, level, optname); if (err) goto out_put; if (!compat) err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname, optval, &optlen, &kernel_optval); if (err < 0) goto out_put; if (err > 0) { err = 0; goto out_put; } if (kernel_optval) optval = KERNEL_SOCKPTR(kernel_optval); ops = READ_ONCE(sock->ops); if (level == SOL_SOCKET && !sock_use_custom_sol_socket(sock)) err = sock_setsockopt(sock, level, optname, optval, optlen); else if (unlikely(!ops->setsockopt)) err = -EOPNOTSUPP; else err = ops->setsockopt(sock, level, optname, optval, optlen); kfree(kernel_optval); out_put: return err; } EXPORT_SYMBOL(do_sock_setsockopt); /* Set a socket option. Because we don't know the option lengths we have * to pass the user mode parameter for the protocols to sort out. */ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, int optlen) { sockptr_t optval = USER_SOCKPTR(user_optval); bool compat = in_compat_syscall(); int err, fput_needed; struct socket *sock; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) return err; err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen); fput_light(sock->file, fput_needed); return err; } SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname, char __user *, optval, int, optlen) { return __sys_setsockopt(fd, level, optname, optval, optlen); } INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level, int optname)); int do_sock_getsockopt(struct socket *sock, bool compat, int level, int optname, sockptr_t optval, sockptr_t optlen) { int max_optlen __maybe_unused; const struct proto_ops *ops; int err; err = security_socket_getsockopt(sock, level, optname); if (err) return err; if (!compat) max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen); ops = READ_ONCE(sock->ops); if (level == SOL_SOCKET) { err = sk_getsockopt(sock->sk, level, optname, optval, optlen); } else if (unlikely(!ops->getsockopt)) { err = -EOPNOTSUPP; } else { if (WARN_ONCE(optval.is_kernel || optlen.is_kernel, "Invalid argument type")) return -EOPNOTSUPP; err = ops->getsockopt(sock, level, optname, optval.user, optlen.user); } if (!compat) err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, optval, optlen, max_optlen, err); return err; } EXPORT_SYMBOL(do_sock_getsockopt); /* * Get a socket option. Because we don't know the option lengths we have * to pass a user mode parameter for the protocols to sort out. */ int __sys_getsockopt(int fd, int level, int optname, char __user *optval, int __user *optlen) { int err, fput_needed; struct socket *sock; bool compat; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) return err; compat = in_compat_syscall(); err = do_sock_getsockopt(sock, compat, level, optname, USER_SOCKPTR(optval), USER_SOCKPTR(optlen)); fput_light(sock->file, fput_needed); return err; } SYSCALL_DEFINE5(getsockopt, int, fd, int, level, int, optname, char __user *, optval, int __user *, optlen) { return __sys_getsockopt(fd, level, optname, optval, optlen); } /* * Shutdown a socket. 
*/ int __sys_shutdown_sock(struct socket *sock, int how) { int err; err = security_socket_shutdown(sock, how); if (!err) err = READ_ONCE(sock->ops)->shutdown(sock, how); return err; } int __sys_shutdown(int fd, int how) { int err, fput_needed; struct socket *sock; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (sock != NULL) { err = __sys_shutdown_sock(sock, how); fput_light(sock->file, fput_needed); } return err; } SYSCALL_DEFINE2(shutdown, int, fd, int, how) { return __sys_shutdown(fd, how); } /* A couple of helpful macros for getting the address of the 32/64 bit * fields which are the same type (int / unsigned) on our platforms. */ #define COMPAT_MSG(msg, member) ((MSG_CMSG_COMPAT & flags) ? &msg##_compat->member : &msg->member) #define COMPAT_NAMELEN(msg) COMPAT_MSG(msg, msg_namelen) #define COMPAT_FLAGS(msg) COMPAT_MSG(msg, msg_flags) struct used_address { struct sockaddr_storage name; unsigned int name_len; }; int __copy_msghdr(struct msghdr *kmsg, struct user_msghdr *msg, struct sockaddr __user **save_addr) { ssize_t err; kmsg->msg_control_is_user = true; kmsg->msg_get_inq = 0; kmsg->msg_control_user = msg->msg_control; kmsg->msg_controllen = msg->msg_controllen; kmsg->msg_flags = msg->msg_flags; kmsg->msg_namelen = msg->msg_namelen; if (!msg->msg_name) kmsg->msg_namelen = 0; if (kmsg->msg_namelen < 0) return -EINVAL; if (kmsg->msg_namelen > sizeof(struct sockaddr_storage)) kmsg->msg_namelen = sizeof(struct sockaddr_storage); if (save_addr) *save_addr = msg->msg_name; if (msg->msg_name && kmsg->msg_namelen) { if (!save_addr) { err = move_addr_to_kernel(msg->msg_name, kmsg->msg_namelen, kmsg->msg_name); if (err < 0) return err; } } else { kmsg->msg_name = NULL; kmsg->msg_namelen = 0; } if (msg->msg_iovlen > UIO_MAXIOV) return -EMSGSIZE; kmsg->msg_iocb = NULL; kmsg->msg_ubuf = NULL; return 0; } static int copy_msghdr_from_user(struct msghdr *kmsg, struct user_msghdr __user *umsg, struct sockaddr __user **save_addr, struct iovec **iov) { struct user_msghdr msg; ssize_t err; if (copy_from_user(&msg, umsg, sizeof(*umsg))) return -EFAULT; err = __copy_msghdr(kmsg, &msg, save_addr); if (err) return err; err = import_iovec(save_addr ? ITER_DEST : ITER_SOURCE, msg.msg_iov, msg.msg_iovlen, UIO_FASTIOV, iov, &kmsg->msg_iter); return err < 0 ? 
err : 0; } static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys, unsigned int flags, struct used_address *used_address, unsigned int allowed_msghdr_flags) { unsigned char ctl[sizeof(struct cmsghdr) + 20] __aligned(sizeof(__kernel_size_t)); /* 20 is size of ipv6_pktinfo */ unsigned char *ctl_buf = ctl; int ctl_len; ssize_t err; err = -ENOBUFS; if (msg_sys->msg_controllen > INT_MAX) goto out; flags |= (msg_sys->msg_flags & allowed_msghdr_flags); ctl_len = msg_sys->msg_controllen; if ((MSG_CMSG_COMPAT & flags) && ctl_len) { err = cmsghdr_from_user_compat_to_kern(msg_sys, sock->sk, ctl, sizeof(ctl)); if (err) goto out; ctl_buf = msg_sys->msg_control; ctl_len = msg_sys->msg_controllen; } else if (ctl_len) { BUILD_BUG_ON(sizeof(struct cmsghdr) != CMSG_ALIGN(sizeof(struct cmsghdr))); if (ctl_len > sizeof(ctl)) { ctl_buf = sock_kmalloc(sock->sk, ctl_len, GFP_KERNEL); if (ctl_buf == NULL) goto out; } err = -EFAULT; if (copy_from_user(ctl_buf, msg_sys->msg_control_user, ctl_len)) goto out_freectl; msg_sys->msg_control = ctl_buf; msg_sys->msg_control_is_user = false; } flags &= ~MSG_INTERNAL_SENDMSG_FLAGS; msg_sys->msg_flags = flags; if (sock->file->f_flags & O_NONBLOCK) msg_sys->msg_flags |= MSG_DONTWAIT; /* * If this is sendmmsg() and current destination address is same as * previously succeeded address, omit asking LSM's decision. * used_address->name_len is initialized to UINT_MAX so that the first * destination address never matches. */ if (used_address && msg_sys->msg_name && used_address->name_len == msg_sys->msg_namelen && !memcmp(&used_address->name, msg_sys->msg_name, used_address->name_len)) { err = sock_sendmsg_nosec(sock, msg_sys); goto out_freectl; } err = __sock_sendmsg(sock, msg_sys); /* * If this is sendmmsg() and sending to current destination address was * successful, remember it. 
*/ if (used_address && err >= 0) { used_address->name_len = msg_sys->msg_namelen; if (msg_sys->msg_name) memcpy(&used_address->name, msg_sys->msg_name, used_address->name_len); } out_freectl: if (ctl_buf != ctl) sock_kfree_s(sock->sk, ctl_buf, ctl_len); out: return err; } static int sendmsg_copy_msghdr(struct msghdr *msg, struct user_msghdr __user *umsg, unsigned flags, struct iovec **iov) { int err; if (flags & MSG_CMSG_COMPAT) { struct compat_msghdr __user *msg_compat; msg_compat = (struct compat_msghdr __user *) umsg; err = get_compat_msghdr(msg, msg_compat, NULL, iov); } else { err = copy_msghdr_from_user(msg, umsg, NULL, iov); } if (err < 0) return err; return 0; } static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg, struct msghdr *msg_sys, unsigned int flags, struct used_address *used_address, unsigned int allowed_msghdr_flags) { struct sockaddr_storage address; struct iovec iovstack[UIO_FASTIOV], *iov = iovstack; ssize_t err; msg_sys->msg_name = &address; err = sendmsg_copy_msghdr(msg_sys, msg, flags, &iov); if (err < 0) return err; err = ____sys_sendmsg(sock, msg_sys, flags, used_address, allowed_msghdr_flags); kfree(iov); return err; } /* * BSD sendmsg interface */ long __sys_sendmsg_sock(struct socket *sock, struct msghdr *msg, unsigned int flags) { return ____sys_sendmsg(sock, msg, flags, NULL, 0); } long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags, bool forbid_cmsg_compat) { int fput_needed, err; struct msghdr msg_sys; struct socket *sock; if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0); fput_light(sock->file, fput_needed); out: return err; } SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags) { return __sys_sendmsg(fd, msg, flags, true); } /* * Linux sendmmsg interface */ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, bool forbid_cmsg_compat) { int fput_needed, err, datagrams; struct socket *sock; struct mmsghdr __user *entry; struct compat_mmsghdr __user *compat_entry; struct msghdr msg_sys; struct used_address used_address; unsigned int oflags = flags; if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; if (vlen > UIO_MAXIOV) vlen = UIO_MAXIOV; datagrams = 0; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) return err; used_address.name_len = UINT_MAX; entry = mmsg; compat_entry = (struct compat_mmsghdr __user *)mmsg; err = 0; flags |= MSG_BATCH; while (datagrams < vlen) { if (datagrams == vlen - 1) flags = oflags; if (MSG_CMSG_COMPAT & flags) { err = ___sys_sendmsg(sock, (struct user_msghdr __user *)compat_entry, &msg_sys, flags, &used_address, MSG_EOR); if (err < 0) break; err = __put_user(err, &compat_entry->msg_len); ++compat_entry; } else { err = ___sys_sendmsg(sock, (struct user_msghdr __user *)entry, &msg_sys, flags, &used_address, MSG_EOR); if (err < 0) break; err = put_user(err, &entry->msg_len); ++entry; } if (err) break; ++datagrams; if (msg_data_left(&msg_sys)) break; cond_resched(); } fput_light(sock->file, fput_needed); /* We only return an error if no datagrams were able to be sent */ if (datagrams != 0) return datagrams; return err; } SYSCALL_DEFINE4(sendmmsg, int, fd, struct mmsghdr __user *, mmsg, unsigned int, vlen, unsigned int, flags) { return __sys_sendmmsg(fd, mmsg, vlen, flags, true); } static int recvmsg_copy_msghdr(struct msghdr 
*msg, struct user_msghdr __user *umsg, unsigned flags, struct sockaddr __user **uaddr, struct iovec **iov) { ssize_t err; if (MSG_CMSG_COMPAT & flags) { struct compat_msghdr __user *msg_compat; msg_compat = (struct compat_msghdr __user *) umsg; err = get_compat_msghdr(msg, msg_compat, uaddr, iov); } else { err = copy_msghdr_from_user(msg, umsg, uaddr, iov); } if (err < 0) return err; return 0; } static int ____sys_recvmsg(struct socket *sock, struct msghdr *msg_sys, struct user_msghdr __user *msg, struct sockaddr __user *uaddr, unsigned int flags, int nosec) { struct compat_msghdr __user *msg_compat = (struct compat_msghdr __user *) msg; int __user *uaddr_len = COMPAT_NAMELEN(msg); struct sockaddr_storage addr; unsigned long cmsg_ptr; int len; ssize_t err; msg_sys->msg_name = &addr; cmsg_ptr = (unsigned long)msg_sys->msg_control; msg_sys->msg_flags = flags & (MSG_CMSG_CLOEXEC|MSG_CMSG_COMPAT); /* We assume all kernel code knows the size of sockaddr_storage */ msg_sys->msg_namelen = 0; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; if (unlikely(nosec)) err = sock_recvmsg_nosec(sock, msg_sys, flags); else err = sock_recvmsg(sock, msg_sys, flags); if (err < 0) goto out; len = err; if (uaddr != NULL) { err = move_addr_to_user(&addr, msg_sys->msg_namelen, uaddr, uaddr_len); if (err < 0) goto out; } err = __put_user((msg_sys->msg_flags & ~MSG_CMSG_COMPAT), COMPAT_FLAGS(msg)); if (err) goto out; if (MSG_CMSG_COMPAT & flags) err = __put_user((unsigned long)msg_sys->msg_control - cmsg_ptr, &msg_compat->msg_controllen); else err = __put_user((unsigned long)msg_sys->msg_control - cmsg_ptr, &msg->msg_controllen); if (err) goto out; err = len; out: return err; } static int ___sys_recvmsg(struct socket *sock, struct user_msghdr __user *msg, struct msghdr *msg_sys, unsigned int flags, int nosec) { struct iovec iovstack[UIO_FASTIOV], *iov = iovstack; /* user mode address pointers */ struct sockaddr __user *uaddr; ssize_t err; err = recvmsg_copy_msghdr(msg_sys, msg, flags, &uaddr, &iov); if (err < 0) return err; err = ____sys_recvmsg(sock, msg_sys, msg, uaddr, flags, nosec); kfree(iov); return err; } /* * BSD recvmsg interface */ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg, struct user_msghdr __user *umsg, struct sockaddr __user *uaddr, unsigned int flags) { return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0); } long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags, bool forbid_cmsg_compat) { int fput_needed, err; struct msghdr msg_sys; struct socket *sock; if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) goto out; err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0); fput_light(sock->file, fput_needed); out: return err; } SYSCALL_DEFINE3(recvmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags) { return __sys_recvmsg(fd, msg, flags, true); } /* * Linux recvmmsg interface */ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, struct timespec64 *timeout) { int fput_needed, err, datagrams; struct socket *sock; struct mmsghdr __user *entry; struct compat_mmsghdr __user *compat_entry; struct msghdr msg_sys; struct timespec64 end_time; struct timespec64 timeout64; if (timeout && poll_select_set_timeout(&end_time, timeout->tv_sec, timeout->tv_nsec)) return -EINVAL; datagrams = 0; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) return err; if (likely(!(flags & MSG_ERRQUEUE))) { err = 
sock_error(sock->sk); if (err) { datagrams = err; goto out_put; } } entry = mmsg; compat_entry = (struct compat_mmsghdr __user *)mmsg; while (datagrams < vlen) { /* * No need to ask LSM for more than the first datagram. */ if (MSG_CMSG_COMPAT & flags) { err = ___sys_recvmsg(sock, (struct user_msghdr __user *)compat_entry, &msg_sys, flags & ~MSG_WAITFORONE, datagrams); if (err < 0) break; err = __put_user(err, &compat_entry->msg_len); ++compat_entry; } else { err = ___sys_recvmsg(sock, (struct user_msghdr __user *)entry, &msg_sys, flags & ~MSG_WAITFORONE, datagrams); if (err < 0) break; err = put_user(err, &entry->msg_len); ++entry; } if (err) break; ++datagrams; /* MSG_WAITFORONE turns on MSG_DONTWAIT after one packet */ if (flags & MSG_WAITFORONE) flags |= MSG_DONTWAIT; if (timeout) { ktime_get_ts64(&timeout64); *timeout = timespec64_sub(end_time, timeout64); if (timeout->tv_sec < 0) { timeout->tv_sec = timeout->tv_nsec = 0; break; } /* Timeout, return less than vlen datagrams */ if (timeout->tv_nsec == 0 && timeout->tv_sec == 0) break; } /* Out of band data, return right away */ if (msg_sys.msg_flags & MSG_OOB) break; cond_resched(); } if (err == 0) goto out_put; if (datagrams == 0) { datagrams = err; goto out_put; } /* * We may return less entries than requested (vlen) if the * sock is non block and there aren't enough datagrams... */ if (err != -EAGAIN) { /* * ... or if recvmsg returns an error after we * received some datagrams, where we record the * error to return on the next call or if the * app asks about it using getsockopt(SO_ERROR). */ WRITE_ONCE(sock->sk->sk_err, -err); } out_put: fput_light(sock->file, fput_needed); return datagrams; } int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, struct __kernel_timespec __user *timeout, struct old_timespec32 __user *timeout32) { int datagrams; struct timespec64 timeout_sys; if (timeout && get_timespec64(&timeout_sys, timeout)) return -EFAULT; if (timeout32 && get_old_timespec32(&timeout_sys, timeout32)) return -EFAULT; if (!timeout && !timeout32) return do_recvmmsg(fd, mmsg, vlen, flags, NULL); datagrams = do_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys); if (datagrams <= 0) return datagrams; if (timeout && put_timespec64(&timeout_sys, timeout)) datagrams = -EFAULT; if (timeout32 && put_old_timespec32(&timeout_sys, timeout32)) datagrams = -EFAULT; return datagrams; } SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg, unsigned int, vlen, unsigned int, flags, struct __kernel_timespec __user *, timeout) { if (flags & MSG_CMSG_COMPAT) return -EINVAL; return __sys_recvmmsg(fd, mmsg, vlen, flags, timeout, NULL); } #ifdef CONFIG_COMPAT_32BIT_TIME SYSCALL_DEFINE5(recvmmsg_time32, int, fd, struct mmsghdr __user *, mmsg, unsigned int, vlen, unsigned int, flags, struct old_timespec32 __user *, timeout) { if (flags & MSG_CMSG_COMPAT) return -EINVAL; return __sys_recvmmsg(fd, mmsg, vlen, flags, NULL, timeout); } #endif #ifdef __ARCH_WANT_SYS_SOCKETCALL /* Argument list sizes for sys_socketcall */ #define AL(x) ((x) * sizeof(unsigned long)) static const unsigned char nargs[21] = { AL(0), AL(3), AL(3), AL(3), AL(2), AL(3), AL(3), AL(3), AL(4), AL(4), AL(4), AL(6), AL(6), AL(2), AL(5), AL(5), AL(3), AL(3), AL(4), AL(5), AL(4) }; #undef AL /* * System call vectors. * * Argument checking cleaned up. Saved 20% in size. * This function doesn't need to set the kernel lock because * it is set by the callees. 
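 *
 * For illustration only (a hypothetical user-space sketch, not part of this
 * file): on architectures that define __ARCH_WANT_SYS_SOCKETCALL (e.g.
 * 32-bit x86), libc historically funnelled the socket API through this one
 * entry point, passing the call number and a pointer to an argument array
 * whose length in bytes is given by nargs[] above:
 *
 *	unsigned long args[3] = { AF_INET, SOCK_DGRAM, 0 };
 *	int fd = syscall(__NR_socketcall, SYS_SOCKET, args);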
*/ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args) { unsigned long a[AUDITSC_ARGS]; unsigned long a0, a1; int err; unsigned int len; if (call < 1 || call > SYS_SENDMMSG) return -EINVAL; call = array_index_nospec(call, SYS_SENDMMSG + 1); len = nargs[call]; if (len > sizeof(a)) return -EINVAL; /* copy_from_user should be SMP safe. */ if (copy_from_user(a, args, len)) return -EFAULT; err = audit_socketcall(nargs[call] / sizeof(unsigned long), a); if (err) return err; a0 = a[0]; a1 = a[1]; switch (call) { case SYS_SOCKET: err = __sys_socket(a0, a1, a[2]); break; case SYS_BIND: err = __sys_bind(a0, (struct sockaddr __user *)a1, a[2]); break; case SYS_CONNECT: err = __sys_connect(a0, (struct sockaddr __user *)a1, a[2]); break; case SYS_LISTEN: err = __sys_listen(a0, a1); break; case SYS_ACCEPT: err = __sys_accept4(a0, (struct sockaddr __user *)a1, (int __user *)a[2], 0); break; case SYS_GETSOCKNAME: err = __sys_getsockname(a0, (struct sockaddr __user *)a1, (int __user *)a[2]); break; case SYS_GETPEERNAME: err = __sys_getpeername(a0, (struct sockaddr __user *)a1, (int __user *)a[2]); break; case SYS_SOCKETPAIR: err = __sys_socketpair(a0, a1, a[2], (int __user *)a[3]); break; case SYS_SEND: err = __sys_sendto(a0, (void __user *)a1, a[2], a[3], NULL, 0); break; case SYS_SENDTO: err = __sys_sendto(a0, (void __user *)a1, a[2], a[3], (struct sockaddr __user *)a[4], a[5]); break; case SYS_RECV: err = __sys_recvfrom(a0, (void __user *)a1, a[2], a[3], NULL, NULL); break; case SYS_RECVFROM: err = __sys_recvfrom(a0, (void __user *)a1, a[2], a[3], (struct sockaddr __user *)a[4], (int __user *)a[5]); break; case SYS_SHUTDOWN: err = __sys_shutdown(a0, a1); break; case SYS_SETSOCKOPT: err = __sys_setsockopt(a0, a1, a[2], (char __user *)a[3], a[4]); break; case SYS_GETSOCKOPT: err = __sys_getsockopt(a0, a1, a[2], (char __user *)a[3], (int __user *)a[4]); break; case SYS_SENDMSG: err = __sys_sendmsg(a0, (struct user_msghdr __user *)a1, a[2], true); break; case SYS_SENDMMSG: err = __sys_sendmmsg(a0, (struct mmsghdr __user *)a1, a[2], a[3], true); break; case SYS_RECVMSG: err = __sys_recvmsg(a0, (struct user_msghdr __user *)a1, a[2], true); break; case SYS_RECVMMSG: if (IS_ENABLED(CONFIG_64BIT)) err = __sys_recvmmsg(a0, (struct mmsghdr __user *)a1, a[2], a[3], (struct __kernel_timespec __user *)a[4], NULL); else err = __sys_recvmmsg(a0, (struct mmsghdr __user *)a1, a[2], a[3], NULL, (struct old_timespec32 __user *)a[4]); break; case SYS_ACCEPT4: err = __sys_accept4(a0, (struct sockaddr __user *)a1, (int __user *)a[2], a[3]); break; default: err = -EINVAL; break; } return err; } #endif /* __ARCH_WANT_SYS_SOCKETCALL */ /** * sock_register - add a socket protocol handler * @ops: description of protocol * * This function is called by a protocol handler that wants to * advertise its address family, and have it linked into the * socket interface. The value ops->family corresponds to the * socket system call protocol family. 
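 *
 * For illustration only (a hypothetical protocol module, not part of this
 * file; PF_MYPROTO and my_proto_create are placeholder names), registration
 * is typically a statically defined net_proto_family handed to
 * sock_register() from the module's init function:
 *
 *	static const struct net_proto_family my_family_ops = {
 *		.family	= PF_MYPROTO,
 *		.create	= my_proto_create,
 *		.owner	= THIS_MODULE,
 *	};
 *
 *	err = sock_register(&my_family_ops);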
*/ int sock_register(const struct net_proto_family *ops) { int err; if (ops->family >= NPROTO) { pr_crit("protocol %d >= NPROTO(%d)\n", ops->family, NPROTO); return -ENOBUFS; } spin_lock(&net_family_lock); if (rcu_dereference_protected(net_families[ops->family], lockdep_is_held(&net_family_lock))) err = -EEXIST; else { rcu_assign_pointer(net_families[ops->family], ops); err = 0; } spin_unlock(&net_family_lock); pr_info("NET: Registered %s protocol family\n", pf_family_names[ops->family]); return err; } EXPORT_SYMBOL(sock_register); /** * sock_unregister - remove a protocol handler * @family: protocol family to remove * * This function is called by a protocol handler that wants to * remove its address family, and have it unlinked from the * new socket creation. * * If protocol handler is a module, then it can use module reference * counts to protect against new references. If protocol handler is not * a module then it needs to provide its own protection in * the ops->create routine. */ void sock_unregister(int family) { BUG_ON(family < 0 || family >= NPROTO); spin_lock(&net_family_lock); RCU_INIT_POINTER(net_families[family], NULL); spin_unlock(&net_family_lock); synchronize_rcu(); pr_info("NET: Unregistered %s protocol family\n", pf_family_names[family]); } EXPORT_SYMBOL(sock_unregister); bool sock_is_registered(int family) { return family < NPROTO && rcu_access_pointer(net_families[family]); } static int __init sock_init(void) { int err; /* * Initialize the network sysctl infrastructure. */ err = net_sysctl_init(); if (err) goto out; /* * Initialize skbuff SLAB cache */ skb_init(); /* * Initialize the protocols module. */ init_inodecache(); err = register_filesystem(&sock_fs_type); if (err) goto out; sock_mnt = kern_mount(&sock_fs_type); if (IS_ERR(sock_mnt)) { err = PTR_ERR(sock_mnt); goto out_mount; } /* The real protocol initialization is performed in later initcalls. */ #ifdef CONFIG_NETFILTER err = netfilter_init(); if (err) goto out; #endif ptp_classifier_init(); out: return err; out_mount: unregister_filesystem(&sock_fs_type); goto out; } core_initcall(sock_init); /* early initcall */ #ifdef CONFIG_PROC_FS void socket_seq_show(struct seq_file *seq) { seq_printf(seq, "sockets: used %d\n", sock_inuse_get(seq->private)); } #endif /* CONFIG_PROC_FS */ /* Handle the fact that while struct ifreq has the same *layout* on * 32/64 for everything but ifreq::ifru_ifmap and ifreq::ifru_data, * which are handled elsewhere, it still has different *size* due to * ifreq::ifru_ifmap (which is 16 bytes on 32 bit, 24 bytes on 64-bit, * resulting in struct ifreq being 32 and 40 bytes respectively). * As a result, if the struct happens to be at the end of a page and * the next page isn't readable/writable, we get a fault. To prevent * that, copy back and forth to the full size. 
*/ int get_user_ifreq(struct ifreq *ifr, void __user **ifrdata, void __user *arg) { if (in_compat_syscall()) { struct compat_ifreq *ifr32 = (struct compat_ifreq *)ifr; memset(ifr, 0, sizeof(*ifr)); if (copy_from_user(ifr32, arg, sizeof(*ifr32))) return -EFAULT; if (ifrdata) *ifrdata = compat_ptr(ifr32->ifr_data); return 0; } if (copy_from_user(ifr, arg, sizeof(*ifr))) return -EFAULT; if (ifrdata) *ifrdata = ifr->ifr_data; return 0; } EXPORT_SYMBOL(get_user_ifreq); int put_user_ifreq(struct ifreq *ifr, void __user *arg) { size_t size = sizeof(*ifr); if (in_compat_syscall()) size = sizeof(struct compat_ifreq); if (copy_to_user(arg, ifr, size)) return -EFAULT; return 0; } EXPORT_SYMBOL(put_user_ifreq); #ifdef CONFIG_COMPAT static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32) { compat_uptr_t uptr32; struct ifreq ifr; void __user *saved; int err; if (get_user_ifreq(&ifr, NULL, uifr32)) return -EFAULT; if (get_user(uptr32, &uifr32->ifr_settings.ifs_ifsu)) return -EFAULT; saved = ifr.ifr_settings.ifs_ifsu.raw_hdlc; ifr.ifr_settings.ifs_ifsu.raw_hdlc = compat_ptr(uptr32); err = dev_ioctl(net, SIOCWANDEV, &ifr, NULL, NULL); if (!err) { ifr.ifr_settings.ifs_ifsu.raw_hdlc = saved; if (put_user_ifreq(&ifr, uifr32)) err = -EFAULT; } return err; } /* Handle ioctls that use ifreq::ifr_data and just need struct ifreq converted */ static int compat_ifr_data_ioctl(struct net *net, unsigned int cmd, struct compat_ifreq __user *u_ifreq32) { struct ifreq ifreq; void __user *data; if (!is_socket_ioctl_cmd(cmd)) return -ENOTTY; if (get_user_ifreq(&ifreq, &data, u_ifreq32)) return -EFAULT; ifreq.ifr_data = data; return dev_ioctl(net, cmd, &ifreq, data, NULL); } static int compat_sock_ioctl_trans(struct file *file, struct socket *sock, unsigned int cmd, unsigned long arg) { void __user *argp = compat_ptr(arg); struct sock *sk = sock->sk; struct net *net = sock_net(sk); const struct proto_ops *ops; if (cmd >= SIOCDEVPRIVATE && cmd <= (SIOCDEVPRIVATE + 15)) return sock_ioctl(file, cmd, (unsigned long)argp); switch (cmd) { case SIOCWANDEV: return compat_siocwandev(net, argp); case SIOCGSTAMP_OLD: case SIOCGSTAMPNS_OLD: ops = READ_ONCE(sock->ops); if (!ops->gettstamp) return -ENOIOCTLCMD; return ops->gettstamp(sock, argp, cmd == SIOCGSTAMP_OLD, !COMPAT_USE_64BIT_TIME); case SIOCETHTOOL: case SIOCBONDSLAVEINFOQUERY: case SIOCBONDINFOQUERY: case SIOCSHWTSTAMP: case SIOCGHWTSTAMP: return compat_ifr_data_ioctl(net, cmd, argp); case FIOSETOWN: case SIOCSPGRP: case FIOGETOWN: case SIOCGPGRP: case SIOCBRADDBR: case SIOCBRDELBR: case SIOCGIFVLAN: case SIOCSIFVLAN: case SIOCGSKNS: case SIOCGSTAMP_NEW: case SIOCGSTAMPNS_NEW: case SIOCGIFCONF: case SIOCSIFBR: case SIOCGIFBR: return sock_ioctl(file, cmd, arg); case SIOCGIFFLAGS: case SIOCSIFFLAGS: case SIOCGIFMAP: case SIOCSIFMAP: case SIOCGIFMETRIC: case SIOCSIFMETRIC: case SIOCGIFMTU: case SIOCSIFMTU: case SIOCGIFMEM: case SIOCSIFMEM: case SIOCGIFHWADDR: case SIOCSIFHWADDR: case SIOCADDMULTI: case SIOCDELMULTI: case SIOCGIFINDEX: case SIOCGIFADDR: case SIOCSIFADDR: case SIOCSIFHWBROADCAST: case SIOCDIFADDR: case SIOCGIFBRDADDR: case SIOCSIFBRDADDR: case SIOCGIFDSTADDR: case SIOCSIFDSTADDR: case SIOCGIFNETMASK: case SIOCSIFNETMASK: case SIOCSIFPFLAGS: case SIOCGIFPFLAGS: case SIOCGIFTXQLEN: case SIOCSIFTXQLEN: case SIOCBRADDIF: case SIOCBRDELIF: case SIOCGIFNAME: case SIOCSIFNAME: case SIOCGMIIPHY: case SIOCGMIIREG: case SIOCSMIIREG: case SIOCBONDENSLAVE: case SIOCBONDRELEASE: case SIOCBONDSETHWADDR: case SIOCBONDCHANGEACTIVE: case SIOCSARP: case 
SIOCGARP: case SIOCDARP: case SIOCOUTQ: case SIOCOUTQNSD: case SIOCATMARK: return sock_do_ioctl(net, sock, cmd, arg); } return -ENOIOCTLCMD; } static long compat_sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct socket *sock = file->private_data; const struct proto_ops *ops = READ_ONCE(sock->ops); int ret = -ENOIOCTLCMD; struct sock *sk; struct net *net; sk = sock->sk; net = sock_net(sk); if (ops->compat_ioctl) ret = ops->compat_ioctl(sock, cmd, arg); if (ret == -ENOIOCTLCMD && (cmd >= SIOCIWFIRST && cmd <= SIOCIWLAST)) ret = compat_wext_handle_ioctl(net, cmd, arg); if (ret == -ENOIOCTLCMD) ret = compat_sock_ioctl_trans(file, sock, cmd, arg); return ret; } #endif /** * kernel_bind - bind an address to a socket (kernel space) * @sock: socket * @addr: address * @addrlen: length of address * * Returns 0 or an error. */ int kernel_bind(struct socket *sock, struct sockaddr *addr, int addrlen) { struct sockaddr_storage address; memcpy(&address, addr, addrlen); return READ_ONCE(sock->ops)->bind(sock, (struct sockaddr *)&address, addrlen); } EXPORT_SYMBOL(kernel_bind); /** * kernel_listen - move socket to listening state (kernel space) * @sock: socket * @backlog: pending connections queue size * * Returns 0 or an error. */ int kernel_listen(struct socket *sock, int backlog) { return READ_ONCE(sock->ops)->listen(sock, backlog); } EXPORT_SYMBOL(kernel_listen); /** * kernel_accept - accept a connection (kernel space) * @sock: listening socket * @newsock: new connected socket * @flags: flags * * @flags must be SOCK_CLOEXEC, SOCK_NONBLOCK or 0. * If it fails, @newsock is guaranteed to be %NULL. * Returns 0 or an error. */ int kernel_accept(struct socket *sock, struct socket **newsock, int flags) { struct sock *sk = sock->sk; const struct proto_ops *ops = READ_ONCE(sock->ops); struct proto_accept_arg arg = { .flags = flags, .kern = true, }; int err; err = sock_create_lite(sk->sk_family, sk->sk_type, sk->sk_protocol, newsock); if (err < 0) goto done; err = ops->accept(sock, *newsock, &arg); if (err < 0) { sock_release(*newsock); *newsock = NULL; goto done; } (*newsock)->ops = ops; __module_get(ops->owner); done: return err; } EXPORT_SYMBOL(kernel_accept); /** * kernel_connect - connect a socket (kernel space) * @sock: socket * @addr: address * @addrlen: address length * @flags: flags (O_NONBLOCK, ...) * * For datagram sockets, @addr is the address to which datagrams are sent * by default, and the only address from which datagrams are received. * For stream sockets, attempts to connect to @addr. * Returns 0 or an error code. */ int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, int flags) { struct sockaddr_storage address; memcpy(&address, addr, addrlen); return READ_ONCE(sock->ops)->connect(sock, (struct sockaddr *)&address, addrlen, flags); } EXPORT_SYMBOL(kernel_connect); /** * kernel_getsockname - get the address which the socket is bound (kernel space) * @sock: socket * @addr: address holder * * Fills the @addr pointer with the address which the socket is bound. * Returns the length of the address in bytes or an error code. */ int kernel_getsockname(struct socket *sock, struct sockaddr *addr) { return READ_ONCE(sock->ops)->getname(sock, addr, 0); } EXPORT_SYMBOL(kernel_getsockname); /** * kernel_getpeername - get the address which the socket is connected (kernel space) * @sock: socket * @addr: address holder * * Fills the @addr pointer with the address which the socket is connected. * Returns the length of the address in bytes or an error code. 
*/ int kernel_getpeername(struct socket *sock, struct sockaddr *addr) { return READ_ONCE(sock->ops)->getname(sock, addr, 1); } EXPORT_SYMBOL(kernel_getpeername); /** * kernel_sock_shutdown - shut down part of a full-duplex connection (kernel space) * @sock: socket * @how: connection part * * Returns 0 or an error. */ int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how) { return READ_ONCE(sock->ops)->shutdown(sock, how); } EXPORT_SYMBOL(kernel_sock_shutdown); /** * kernel_sock_ip_overhead - returns the IP overhead imposed by a socket * @sk: socket * * This routine returns the IP overhead imposed by a socket i.e. * the length of the underlying IP header, depending on whether * this is an IPv4 or IPv6 socket and the length from IP options turned * on at the socket. Assumes that the caller has a lock on the socket. */ u32 kernel_sock_ip_overhead(struct sock *sk) { struct inet_sock *inet; struct ip_options_rcu *opt; u32 overhead = 0; #if IS_ENABLED(CONFIG_IPV6) struct ipv6_pinfo *np; struct ipv6_txoptions *optv6 = NULL; #endif /* IS_ENABLED(CONFIG_IPV6) */ if (!sk) return overhead; switch (sk->sk_family) { case AF_INET: inet = inet_sk(sk); overhead += sizeof(struct iphdr); opt = rcu_dereference_protected(inet->inet_opt, sock_owned_by_user(sk)); if (opt) overhead += opt->opt.optlen; return overhead; #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: np = inet6_sk(sk); overhead += sizeof(struct ipv6hdr); if (np) optv6 = rcu_dereference_protected(np->opt, sock_owned_by_user(sk)); if (optv6) overhead += (optv6->opt_flen + optv6->opt_nflen); return overhead; #endif /* IS_ENABLED(CONFIG_IPV6) */ default: /* Returns 0 overhead if the socket is not ipv4 or ipv6 */ return overhead; } } EXPORT_SYMBOL(kernel_sock_ip_overhead); |
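/*
 * For illustration only: a minimal, hypothetical user-space program (not part
 * of the kernel sources above or below) exercising the syscall paths
 * implemented in this file -- socket(2) lands in __sys_socket(), bind(2) in
 * __sys_bind(), listen(2) in __sys_listen() and accept4(2) in __sys_accept4()
 * -- via the ordinary libc wrappers.
 */
#define _GNU_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in sa;
	int lfd, cfd;

	/* socket(2) -> __sys_socket() */
	lfd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
	if (lfd < 0)
		return 1;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(8080);
	sa.sin_addr.s_addr = htonl(INADDR_ANY);

	/* bind(2) -> __sys_bind(), listen(2) -> __sys_listen() */
	if (bind(lfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    listen(lfd, SOMAXCONN) < 0) {
		close(lfd);
		return 1;
	}

	/* accept4(2) -> __sys_accept4() */
	cfd = accept4(lfd, NULL, NULL, SOCK_CLOEXEC);
	if (cfd >= 0)
		close(cfd);
	close(lfd);
	return 0;
}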
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2010 Red Hat, Inc. All Rights Reserved.
 */

#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_shared.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_extent_busy.h"
#include "xfs_trans.h"
#include "xfs_trans_priv.h"
#include "xfs_log.h"
#include "xfs_log_priv.h"
#include "xfs_trace.h"
#include "xfs_discard.h"

/*
 * Allocate a new ticket. Failing to get a new ticket makes it really hard to
 * recover, so we don't allow failure here. Also, we allocate in a context that
 * we don't want to be issuing transactions from, so we need to tell the
 * allocation code this as well.
 *
 * We don't reserve any space for the ticket - we are going to steal whatever
 * space we require from transactions as they commit. To ensure we reserve all
 * the space required, we need to set the current reservation of the ticket to
 * zero so that we know to steal the initial transaction overhead from the
 * first transaction commit.
 */
static struct xlog_ticket *
xlog_cil_ticket_alloc(
	struct xlog	*log)
{
	struct xlog_ticket *tic;

	tic = xlog_ticket_alloc(log, 0, 1, 0);

	/*
	 * set the current reservation to zero so we know to steal the basic
	 * transaction overhead reservation from the first transaction commit.
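	 *
	 * (The (0, 1, 0) arguments to xlog_ticket_alloc() above ask for a
	 * ticket with no up-front reservation and a single use count, assuming
	 * the usual unit_bytes/count/permanent argument order of that helper.)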
*/ tic->t_curr_res = 0; tic->t_iclog_hdrs = 0; return tic; } static inline void xlog_cil_set_iclog_hdr_count(struct xfs_cil *cil) { struct xlog *log = cil->xc_log; atomic_set(&cil->xc_iclog_hdrs, (XLOG_CIL_BLOCKING_SPACE_LIMIT(log) / (log->l_iclog_size - log->l_iclog_hsize))); } /* * Check if the current log item was first committed in this sequence. * We can't rely on just the log item being in the CIL, we have to check * the recorded commit sequence number. * * Note: for this to be used in a non-racy manner, it has to be called with * CIL flushing locked out. As a result, it should only be used during the * transaction commit process when deciding what to format into the item. */ static bool xlog_item_in_current_chkpt( struct xfs_cil *cil, struct xfs_log_item *lip) { if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) return false; /* * li_seq is written on the first commit of a log item to record the * first checkpoint it is written to. Hence if it is different to the * current sequence, we're in a new checkpoint. */ return lip->li_seq == READ_ONCE(cil->xc_current_sequence); } bool xfs_log_item_in_current_chkpt( struct xfs_log_item *lip) { return xlog_item_in_current_chkpt(lip->li_log->l_cilp, lip); } /* * Unavoidable forward declaration - xlog_cil_push_work() calls * xlog_cil_ctx_alloc() itself. */ static void xlog_cil_push_work(struct work_struct *work); static struct xfs_cil_ctx * xlog_cil_ctx_alloc(void) { struct xfs_cil_ctx *ctx; ctx = kzalloc(sizeof(*ctx), GFP_KERNEL | __GFP_NOFAIL); INIT_LIST_HEAD(&ctx->committing); INIT_LIST_HEAD(&ctx->busy_extents.extent_list); INIT_LIST_HEAD(&ctx->log_items); INIT_LIST_HEAD(&ctx->lv_chain); INIT_WORK(&ctx->push_work, xlog_cil_push_work); return ctx; } /* * Aggregate the CIL per cpu structures into global counts, lists, etc and * clear the percpu state ready for the next context to use. This is called * from the push code with the context lock held exclusively, hence nothing else * will be accessing or modifying the per-cpu counters. */ static void xlog_cil_push_pcp_aggregate( struct xfs_cil *cil, struct xfs_cil_ctx *ctx) { struct xlog_cil_pcp *cilpcp; int cpu; for_each_cpu(cpu, &ctx->cil_pcpmask) { cilpcp = per_cpu_ptr(cil->xc_pcp, cpu); ctx->ticket->t_curr_res += cilpcp->space_reserved; cilpcp->space_reserved = 0; if (!list_empty(&cilpcp->busy_extents)) { list_splice_init(&cilpcp->busy_extents, &ctx->busy_extents.extent_list); } if (!list_empty(&cilpcp->log_items)) list_splice_init(&cilpcp->log_items, &ctx->log_items); /* * We're in the middle of switching cil contexts. Reset the * counter we use to detect when the current context is nearing * full. */ cilpcp->space_used = 0; } } /* * Aggregate the CIL per-cpu space used counters into the global atomic value. * This is called when the per-cpu counter aggregation will first pass the soft * limit threshold so we can switch to atomic counter aggregation for accurate * detection of hard limit traversal. */ static void xlog_cil_insert_pcp_aggregate( struct xfs_cil *cil, struct xfs_cil_ctx *ctx) { struct xlog_cil_pcp *cilpcp; int cpu; int count = 0; /* Trigger atomic updates then aggregate only for the first caller */ if (!test_and_clear_bit(XLOG_CIL_PCP_SPACE, &cil->xc_flags)) return; /* * We can race with other cpus setting cil_pcpmask. However, we've * atomically cleared PCP_SPACE which forces other threads to add to * the global space used count. cil_pcpmask is a superset of cilpcp * structures that could have a nonzero space_used. 
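	 *
	 * The cmpxchg() loop below transfers each CPU's space_used into the
	 * local count by atomically exchanging it with zero, retrying if a
	 * racing commit bumped the counter between the read and the exchange.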
*/ for_each_cpu(cpu, &ctx->cil_pcpmask) { int old, prev; cilpcp = per_cpu_ptr(cil->xc_pcp, cpu); do { old = cilpcp->space_used; prev = cmpxchg(&cilpcp->space_used, old, 0); } while (old != prev); count += old; } atomic_add(count, &ctx->space_used); } static void xlog_cil_ctx_switch( struct xfs_cil *cil, struct xfs_cil_ctx *ctx) { xlog_cil_set_iclog_hdr_count(cil); set_bit(XLOG_CIL_EMPTY, &cil->xc_flags); set_bit(XLOG_CIL_PCP_SPACE, &cil->xc_flags); ctx->sequence = ++cil->xc_current_sequence; ctx->cil = cil; cil->xc_ctx = ctx; } /* * After the first stage of log recovery is done, we know where the head and * tail of the log are. We need this log initialisation done before we can * initialise the first CIL checkpoint context. * * Here we allocate a log ticket to track space usage during a CIL push. This * ticket is passed to xlog_write() directly so that we don't slowly leak log * space by failing to account for space used by log headers and additional * region headers for split regions. */ void xlog_cil_init_post_recovery( struct xlog *log) { log->l_cilp->xc_ctx->ticket = xlog_cil_ticket_alloc(log); log->l_cilp->xc_ctx->sequence = 1; xlog_cil_set_iclog_hdr_count(log->l_cilp); } static inline int xlog_cil_iovec_space( uint niovecs) { return round_up((sizeof(struct xfs_log_vec) + niovecs * sizeof(struct xfs_log_iovec)), sizeof(uint64_t)); } /* * Allocate or pin log vector buffers for CIL insertion. * * The CIL currently uses disposable buffers for copying a snapshot of the * modified items into the log during a push. The biggest problem with this is * the requirement to allocate the disposable buffer during the commit if: * a) does not exist; or * b) it is too small * * If we do this allocation within xlog_cil_insert_format_items(), it is done * under the xc_ctx_lock, which means that a CIL push cannot occur during * the memory allocation. This means that we have a potential deadlock situation * under low memory conditions when we have lots of dirty metadata pinned in * the CIL and we need a CIL commit to occur to free memory. * * To avoid this, we need to move the memory allocation outside the * xc_ctx_lock, but because the log vector buffers are disposable, that opens * up a TOCTOU race condition w.r.t. the CIL committing and removing the log * vector buffers between the check and the formatting of the item into the * log vector buffer within the xc_ctx_lock. * * Because the log vector buffer needs to be unchanged during the CIL push * process, we cannot share the buffer between the transaction commit (which * modifies the buffer) and the CIL push context that is writing the changes * into the log. This means skipping preallocation of buffer space is * unreliable, but we most definitely do not want to be allocating and freeing * buffers unnecessarily during commits when overwrites can be done safely. * * The simplest solution to this problem is to allocate a shadow buffer when a * log item is committed for the second time, and then to only use this buffer * if necessary. The buffer can remain attached to the log item until such time * it is needed, and this is the buffer that is reallocated to match the size of * the incoming modification. Then during the formatting of the item we can swap * the active buffer with the new one if we can't reuse the existing buffer. We * don't free the old buffer as it may be reused on the next modification if * it's size is right, otherwise we'll free and reallocate it at that point. 
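 *
 * In effect each dirty log item can own two formatting buffers: li_lv, the
 * vector currently attached to the CIL, and li_lv_shadow, the preallocated
 * buffer the next commit may switch to. When the shadow is taken over,
 * xfs_cil_prepare_item() turns the old vector into the new shadow so a later
 * modification can reuse or reallocate it.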
* * This function builds a vector for the changes in each log item in the * transaction. It then works out the length of the buffer needed for each log * item, allocates them and attaches the vector to the log item in preparation * for the formatting step which occurs under the xc_ctx_lock. * * While this means the memory footprint goes up, it avoids the repeated * alloc/free pattern that repeated modifications of an item would otherwise * cause, and hence minimises the CPU overhead of such behaviour. */ static void xlog_cil_alloc_shadow_bufs( struct xlog *log, struct xfs_trans *tp) { struct xfs_log_item *lip; list_for_each_entry(lip, &tp->t_items, li_trans) { struct xfs_log_vec *lv; int niovecs = 0; int nbytes = 0; int buf_size; bool ordered = false; /* Skip items which aren't dirty in this transaction. */ if (!test_bit(XFS_LI_DIRTY, &lip->li_flags)) continue; /* get number of vecs and size of data to be stored */ lip->li_ops->iop_size(lip, &niovecs, &nbytes); /* * Ordered items need to be tracked but we do not wish to write * them. We need a logvec to track the object, but we do not * need an iovec or buffer to be allocated for copying data. */ if (niovecs == XFS_LOG_VEC_ORDERED) { ordered = true; niovecs = 0; nbytes = 0; } /* * We 64-bit align the length of each iovec so that the start of * the next one is naturally aligned. We'll need to account for * that slack space here. * * We also add the xlog_op_header to each region when * formatting, but that's not accounted to the size of the item * at this point. Hence we'll need an addition number of bytes * for each vector to hold an opheader. * * Then round nbytes up to 64-bit alignment so that the initial * buffer alignment is easy to calculate and verify. */ nbytes += niovecs * (sizeof(uint64_t) + sizeof(struct xlog_op_header)); nbytes = round_up(nbytes, sizeof(uint64_t)); /* * The data buffer needs to start 64-bit aligned, so round up * that space to ensure we can align it appropriately and not * overrun the buffer. */ buf_size = nbytes + xlog_cil_iovec_space(niovecs); /* * if we have no shadow buffer, or it is too small, we need to * reallocate it. */ if (!lip->li_lv_shadow || buf_size > lip->li_lv_shadow->lv_size) { /* * We free and allocate here as a realloc would copy * unnecessary data. We don't use kvzalloc() for the * same reason - we don't need to zero the data area in * the buffer, only the log vector header and the iovec * storage. */ kvfree(lip->li_lv_shadow); lv = xlog_kvmalloc(buf_size); memset(lv, 0, xlog_cil_iovec_space(niovecs)); INIT_LIST_HEAD(&lv->lv_list); lv->lv_item = lip; lv->lv_size = buf_size; if (ordered) lv->lv_buf_len = XFS_LOG_VEC_ORDERED; else lv->lv_iovecp = (struct xfs_log_iovec *)&lv[1]; lip->li_lv_shadow = lv; } else { /* same or smaller, optimise common overwrite case */ lv = lip->li_lv_shadow; if (ordered) lv->lv_buf_len = XFS_LOG_VEC_ORDERED; else lv->lv_buf_len = 0; lv->lv_bytes = 0; } /* Ensure the lv is set up according to ->iop_size */ lv->lv_niovecs = niovecs; /* The allocated data region lies beyond the iovec region */ lv->lv_buf = (char *)lv + xlog_cil_iovec_space(niovecs); } } /* * Prepare the log item for insertion into the CIL. Calculate the difference in * log space it will consume, and if it is a new item pin it as well. 
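 *
 * As an illustrative example (sizes hypothetical): if the item was previously
 * formatted into a 256 byte vector and this modification formats it into a
 * fresh 384 byte shadow vector, diff_len grows by 384 - 256 = 128 bytes,
 * which the caller later folds into the CIL context space accounting.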
 */
STATIC void
xfs_cil_prepare_item(
	struct xlog		*log,
	struct xfs_log_vec	*lv,
	struct xfs_log_vec	*old_lv,
	int			*diff_len)
{
	/* Account for the new LV being passed in */
	if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
		*diff_len += lv->lv_bytes;

	/*
	 * If there is no old LV, this is the first time we've seen the item in
	 * this CIL context and so we need to pin it. If we are replacing the
	 * old_lv, then remove the space it accounts for and make it the shadow
	 * buffer for later freeing. In both cases we are now switching to the
	 * shadow buffer, so update the pointer to it appropriately.
	 */
	if (!old_lv) {
		if (lv->lv_item->li_ops->iop_pin)
			lv->lv_item->li_ops->iop_pin(lv->lv_item);
		lv->lv_item->li_lv_shadow = NULL;
	} else if (old_lv != lv) {
		ASSERT(lv->lv_buf_len != XFS_LOG_VEC_ORDERED);

		*diff_len -= old_lv->lv_bytes;
		lv->lv_item->li_lv_shadow = old_lv;
	}

	/* attach new log vector to log item */
	lv->lv_item->li_lv = lv;

	/*
	 * If this is the first time the item is being committed to the
	 * CIL, store the sequence number on the log item so we can
	 * tell in future commits whether this is the first checkpoint
	 * the item is being committed into.
	 */
	if (!lv->lv_item->li_seq)
		lv->lv_item->li_seq = log->l_cilp->xc_ctx->sequence;
}

/*
 * Format log items into a flat buffer
 *
 * For delayed logging, we need to hold a formatted buffer containing all the
 * changes on the log item. This enables us to relog the item in memory and
 * write it out asynchronously without needing to relock the object that was
 * modified at the time it gets written into the iclog.
 *
 * This function takes the prepared log vectors attached to each log item, and
 * formats the changes into the log vector buffer. The buffer it uses is
 * dependent on the current state of the vector in the CIL - the shadow lv is
 * guaranteed to be large enough for the current modification, but we will only
 * use that if we can't reuse the existing lv. If we can't reuse the existing
 * lv, then simply swap it out for the shadow lv. We don't free it - that is
 * done lazily either by the next modification or the freeing of the log item.
 *
 * We don't set up region headers during this process; we simply copy the
 * regions into the flat buffer. We can do this because we still have to do a
 * formatting step to write the regions into the iclog buffer. Writing the
 * ophdrs during the iclog write means that we can support splitting large
 * regions across iclog boundaries without needing a change in the format of the
 * item/region encapsulation.
 *
 * Hence what we need to do now is rewrite the vector array to point to the
 * copied region inside the buffer we just allocated. This allows us to format
 * the regions into the iclog as though they are being formatted directly out
 * of the objects themselves.
 */
static void
xlog_cil_insert_format_items(
	struct xlog		*log,
	struct xfs_trans	*tp,
	int			*diff_len)
{
	struct xfs_log_item	*lip;

	/* Bail out if we didn't find a log item.  */
	if (list_empty(&tp->t_items)) {
		ASSERT(0);
		return;
	}

	list_for_each_entry(lip, &tp->t_items, li_trans) {
		struct xfs_log_vec *lv;
		struct xfs_log_vec *old_lv = NULL;
		struct xfs_log_vec *shadow;
		bool	ordered = false;

		/* Skip items which aren't dirty in this transaction. */
		if (!test_bit(XFS_LI_DIRTY, &lip->li_flags))
			continue;

		/*
		 * The formatting size information is already attached to
		 * the shadow lv on the log item.
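		 * (->iop_size() ran earlier in xlog_cil_alloc_shadow_bufs(),
		 * which recorded lv_niovecs on the shadow and set lv_buf_len
		 * to XFS_LOG_VEC_ORDERED for ordered items - those are the
		 * two fields inspected below.)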
*/ shadow = lip->li_lv_shadow; if (shadow->lv_buf_len == XFS_LOG_VEC_ORDERED) ordered = true; /* Skip items that do not have any vectors for writing */ if (!shadow->lv_niovecs && !ordered) continue; /* compare to existing item size */ old_lv = lip->li_lv; if (lip->li_lv && shadow->lv_size <= lip->li_lv->lv_size) { /* same or smaller, optimise common overwrite case */ lv = lip->li_lv; if (ordered) goto insert; /* * set the item up as though it is a new insertion so * that the space reservation accounting is correct. */ *diff_len -= lv->lv_bytes; /* Ensure the lv is set up according to ->iop_size */ lv->lv_niovecs = shadow->lv_niovecs; /* reset the lv buffer information for new formatting */ lv->lv_buf_len = 0; lv->lv_bytes = 0; lv->lv_buf = (char *)lv + xlog_cil_iovec_space(lv->lv_niovecs); } else { /* switch to shadow buffer! */ lv = shadow; lv->lv_item = lip; if (ordered) { /* track as an ordered logvec */ ASSERT(lip->li_lv == NULL); goto insert; } } ASSERT(IS_ALIGNED((unsigned long)lv->lv_buf, sizeof(uint64_t))); lip->li_ops->iop_format(lip, lv); insert: xfs_cil_prepare_item(log, lv, old_lv, diff_len); } } /* * The use of lockless waitqueue_active() requires that the caller has * serialised itself against the wakeup call in xlog_cil_push_work(). That * can be done by either holding the push lock or the context lock. */ static inline bool xlog_cil_over_hard_limit( struct xlog *log, int32_t space_used) { if (waitqueue_active(&log->l_cilp->xc_push_wait)) return true; if (space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) return true; return false; } /* * Insert the log items into the CIL and calculate the difference in space * consumed by the item. Add the space to the checkpoint ticket and calculate * if the change requires additional log metadata. If it does, take that space * as well. Remove the amount of space we added to the checkpoint ticket from * the current transaction ticket so that the accounting works out correctly. */ static void xlog_cil_insert_items( struct xlog *log, struct xfs_trans *tp, uint32_t released_space) { struct xfs_cil *cil = log->l_cilp; struct xfs_cil_ctx *ctx = cil->xc_ctx; struct xfs_log_item *lip; int len = 0; int iovhdr_res = 0, split_res = 0, ctx_res = 0; int space_used; int order; unsigned int cpu_nr; struct xlog_cil_pcp *cilpcp; ASSERT(tp); /* * We can do this safely because the context can't checkpoint until we * are done so it doesn't matter exactly how we update the CIL. */ xlog_cil_insert_format_items(log, tp, &len); /* * Subtract the space released by intent cancelation from the space we * consumed so that we remove it from the CIL space and add it back to * the current transaction reservation context. */ len -= released_space; /* * Grab the per-cpu pointer for the CIL before we start any accounting. * That ensures that we are running with pre-emption disabled and so we * can't be scheduled away between split sample/update operations that * are done without outside locking to serialise them. */ cpu_nr = get_cpu(); cilpcp = this_cpu_ptr(cil->xc_pcp); /* Tell the future push that there was work added by this CPU. */ if (!cpumask_test_cpu(cpu_nr, &ctx->cil_pcpmask)) cpumask_test_and_set_cpu(cpu_nr, &ctx->cil_pcpmask); /* * We need to take the CIL checkpoint unit reservation on the first * commit into the CIL. Test the XLOG_CIL_EMPTY bit first so we don't * unnecessarily do an atomic op in the fast path here. We can clear the * XLOG_CIL_EMPTY bit as we are under the xc_ctx_lock here and that * needs to be held exclusively to reset the XLOG_CIL_EMPTY bit. 
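	 *
	 * Only the first committer after a context switch sees both tests
	 * succeed, so only that thread pays for the locked
	 * test_and_clear_bit() and takes the checkpoint unit reservation here.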
*/ if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags) && test_and_clear_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) ctx_res = ctx->ticket->t_unit_res; /* * Check if we need to steal iclog headers. atomic_read() is not a * locked atomic operation, so we can check the value before we do any * real atomic ops in the fast path. If we've already taken the CIL unit * reservation from this commit, we've already got one iclog header * space reserved so we have to account for that otherwise we risk * overrunning the reservation on this ticket. * * If the CIL is already at the hard limit, we might need more header * space that originally reserved. So steal more header space from every * commit that occurs once we are over the hard limit to ensure the CIL * push won't run out of reservation space. * * This can steal more than we need, but that's OK. * * The cil->xc_ctx_lock provides the serialisation necessary for safely * calling xlog_cil_over_hard_limit() in this context. */ space_used = atomic_read(&ctx->space_used) + cilpcp->space_used + len; if (atomic_read(&cil->xc_iclog_hdrs) > 0 || xlog_cil_over_hard_limit(log, space_used)) { split_res = log->l_iclog_hsize + sizeof(struct xlog_op_header); if (ctx_res) ctx_res += split_res * (tp->t_ticket->t_iclog_hdrs - 1); else ctx_res = split_res * tp->t_ticket->t_iclog_hdrs; atomic_sub(tp->t_ticket->t_iclog_hdrs, &cil->xc_iclog_hdrs); } cilpcp->space_reserved += ctx_res; /* * Accurately account when over the soft limit, otherwise fold the * percpu count into the global count if over the per-cpu threshold. */ if (!test_bit(XLOG_CIL_PCP_SPACE, &cil->xc_flags)) { atomic_add(len, &ctx->space_used); } else if (cilpcp->space_used + len > (XLOG_CIL_SPACE_LIMIT(log) / num_online_cpus())) { space_used = atomic_add_return(cilpcp->space_used + len, &ctx->space_used); cilpcp->space_used = 0; /* * If we just transitioned over the soft limit, we need to * transition to the global atomic counter. */ if (space_used >= XLOG_CIL_SPACE_LIMIT(log)) xlog_cil_insert_pcp_aggregate(cil, ctx); } else { cilpcp->space_used += len; } /* attach the transaction to the CIL if it has any busy extents */ if (!list_empty(&tp->t_busy)) list_splice_init(&tp->t_busy, &cilpcp->busy_extents); /* * Now update the order of everything modified in the transaction * and insert items into the CIL if they aren't already there. * We do this here so we only need to take the CIL lock once during * the transaction commit. */ order = atomic_inc_return(&ctx->order_id); list_for_each_entry(lip, &tp->t_items, li_trans) { /* Skip items which aren't dirty in this transaction. */ if (!test_bit(XFS_LI_DIRTY, &lip->li_flags)) continue; lip->li_order_id = order; if (!list_empty(&lip->li_cil)) continue; list_add_tail(&lip->li_cil, &cilpcp->log_items); } put_cpu(); /* * If we've overrun the reservation, dump the tx details before we move * the log items. Shutdown is imminent... 
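	 *
	 * The warning below itemises where the reservation went: formatted
	 * log item bytes (len), stolen per-iclog split headers (split_res)
	 * and the checkpoint ticket steal (ctx_res).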
	 */
	tp->t_ticket->t_curr_res -= ctx_res + len;
	if (WARN_ON(tp->t_ticket->t_curr_res < 0)) {
		xfs_warn(log->l_mp, "Transaction log reservation overrun:");
		xfs_warn(log->l_mp,
			 "  log items: %d bytes (iov hdrs: %d bytes)",
			 len, iovhdr_res);
		xfs_warn(log->l_mp, "  split region headers: %d bytes",
			 split_res);
		xfs_warn(log->l_mp, "  ctx ticket: %d bytes", ctx_res);
		xlog_print_trans(tp);
		xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
	}
}

static inline void
xlog_cil_ail_insert_batch(
	struct xfs_ail		*ailp,
	struct xfs_ail_cursor	*cur,
	struct xfs_log_item	**log_items,
	int			nr_items,
	xfs_lsn_t		commit_lsn)
{
	int	i;

	spin_lock(&ailp->ail_lock);
	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);

	for (i = 0; i < nr_items; i++) {
		struct xfs_log_item *lip = log_items[i];

		if (lip->li_ops->iop_unpin)
			lip->li_ops->iop_unpin(lip, 0);
	}
}

/*
 * Take the checkpoint's log vector chain of items and insert the attached log
 * items into the AIL. This uses bulk insertion techniques to minimise AIL lock
 * traffic.
 *
 * The AIL tracks log items via the start record LSN of the checkpoint,
 * not the commit record LSN. This is because we can pipeline multiple
 * checkpoints, and so the start record of checkpoint N+1 can be
 * written before the commit record of checkpoint N. i.e:
 *
 *	start N			commit N
 *	+-------------+------------+----------------+
 *		      start N+1			commit N+1
 *
 * The tail of the log cannot be moved to the LSN of commit N when all
 * the items of that checkpoint are written back, because then the
 * start record for N+1 is no longer in the active portion of the log
 * and recovery will fail/corrupt the filesystem.
 *
 * Hence when all the log items in checkpoint N are written back, the
 * tail of the log must now only move as far forwards as the start LSN
 * of checkpoint N+1.
 *
 * If we are called with the aborted flag set, it is because a log write during
 * a CIL checkpoint commit has failed. In this case, all the items in the
 * checkpoint have already gone through iop_committed and iop_committing, which
 * means that checkpoint commit abort handling is treated exactly the same as an
 * iclog write error even though we haven't started any IO yet. Hence in this
 * case all we need to do is iop_committed processing, followed by an
 * iop_unpin(aborted) call.
 *
 * The AIL cursor is used to optimise the insert process. If commit_lsn is not
 * at the end of the AIL, the insert cursor avoids the need to walk the AIL to
 * find the insertion point on every xfs_log_item_batch_insert() call. This
 * saves a lot of needless list walking and is a net win, even though it
 * slightly increases the amount of AIL lock traffic to set it up and tear it
 * down.
 */
static void
xlog_cil_ail_insert(
	struct xfs_cil_ctx	*ctx,
	bool			aborted)
{
#define LOG_ITEM_BATCH_SIZE	32
	struct xfs_ail		*ailp = ctx->cil->xc_log->l_ailp;
	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
	struct xfs_log_vec	*lv;
	struct xfs_ail_cursor	cur;
	xfs_lsn_t		old_head;
	int			i = 0;

	/*
	 * Update the AIL head LSN with the commit record LSN of this
	 * checkpoint. As iclogs are always completed in order, this should
	 * always be the same (as iclogs can contain multiple commit records) or
	 * higher LSN than the current head. We do this before insertion of the
	 * items so that log space checks during insertion will reflect the
	 * space that this checkpoint has already consumed. We call
	 * xfs_ail_update_finish() so that tail space and space-based wakeups
	 * will be recalculated appropriately.
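	 *
	 * The ASSERT below encodes that expectation: the checkpoint's
	 * commit_lsn must not be behind the current ail_head_lsn unless the
	 * checkpoint is being aborted.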
*/ ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 || aborted); spin_lock(&ailp->ail_lock); xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn); old_head = ailp->ail_head_lsn; ailp->ail_head_lsn = ctx->commit_lsn; /* xfs_ail_update_finish() drops the ail_lock */ xfs_ail_update_finish(ailp, NULLCOMMITLSN); /* * We move the AIL head forwards to account for the space used in the * log before we remove that space from the grant heads. This prevents a * transient condition where reservation space appears to become * available on return, only for it to disappear again immediately as * the AIL head update accounts in the log tail space. */ smp_wmb(); /* paired with smp_rmb in xlog_grant_space_left */ xlog_grant_return_space(ailp->ail_log, old_head, ailp->ail_head_lsn); /* unpin all the log items */ list_for_each_entry(lv, &ctx->lv_chain, lv_list) { struct xfs_log_item *lip = lv->lv_item; xfs_lsn_t item_lsn; if (aborted) set_bit(XFS_LI_ABORTED, &lip->li_flags); if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) { lip->li_ops->iop_release(lip); continue; } if (lip->li_ops->iop_committed) item_lsn = lip->li_ops->iop_committed(lip, ctx->start_lsn); else item_lsn = ctx->start_lsn; /* item_lsn of -1 means the item needs no further processing */ if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0) continue; /* * if we are aborting the operation, no point in inserting the * object into the AIL as we are in a shutdown situation. */ if (aborted) { ASSERT(xlog_is_shutdown(ailp->ail_log)); if (lip->li_ops->iop_unpin) lip->li_ops->iop_unpin(lip, 1); continue; } if (item_lsn != ctx->start_lsn) { /* * Not a bulk update option due to unusual item_lsn. * Push into AIL immediately, rechecking the lsn once * we have the ail lock. Then unpin the item. This does * not affect the AIL cursor the bulk insert path is * using. */ spin_lock(&ailp->ail_lock); if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0) xfs_trans_ail_update(ailp, lip, item_lsn); else spin_unlock(&ailp->ail_lock); if (lip->li_ops->iop_unpin) lip->li_ops->iop_unpin(lip, 0); continue; } /* Item is a candidate for bulk AIL insert. */ log_items[i++] = lv->lv_item; if (i >= LOG_ITEM_BATCH_SIZE) { xlog_cil_ail_insert_batch(ailp, &cur, log_items, LOG_ITEM_BATCH_SIZE, ctx->start_lsn); i = 0; } } /* make sure we insert the remainder! */ if (i) xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, ctx->start_lsn); spin_lock(&ailp->ail_lock); xfs_trans_ail_cursor_done(&cur); spin_unlock(&ailp->ail_lock); } static void xlog_cil_free_logvec( struct list_head *lv_chain) { struct xfs_log_vec *lv; while (!list_empty(lv_chain)) { lv = list_first_entry(lv_chain, struct xfs_log_vec, lv_list); list_del_init(&lv->lv_list); kvfree(lv); } } /* * Mark all items committed and clear busy extents. We free the log vector * chains in a separate pass so that we unpin the log items as quickly as * possible. */ static void xlog_cil_committed( struct xfs_cil_ctx *ctx) { struct xfs_mount *mp = ctx->cil->xc_log->l_mp; bool abort = xlog_is_shutdown(ctx->cil->xc_log); /* * If the I/O failed, we're aborting the commit and already shutdown. * Wake any commit waiters before aborting the log items so we don't * block async log pushers on callbacks. Async log pushers explicitly do * not wait on log force completion because they may be holding locks * required to unpin items. 
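	 *
	 * Note that the context is not always freed at the bottom of this
	 * function: if busy extents remain to be discarded, the ctx is handed
	 * to xfs_discard_extents() below and freeing is left to the discard
	 * completion path instead.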
*/ if (abort) { spin_lock(&ctx->cil->xc_push_lock); wake_up_all(&ctx->cil->xc_start_wait); wake_up_all(&ctx->cil->xc_commit_wait); spin_unlock(&ctx->cil->xc_push_lock); } xlog_cil_ail_insert(ctx, abort); xfs_extent_busy_sort(&ctx->busy_extents.extent_list); xfs_extent_busy_clear(mp, &ctx->busy_extents.extent_list, xfs_has_discard(mp) && !abort); spin_lock(&ctx->cil->xc_push_lock); list_del(&ctx->committing); spin_unlock(&ctx->cil->xc_push_lock); xlog_cil_free_logvec(&ctx->lv_chain); if (!list_empty(&ctx->busy_extents.extent_list)) { ctx->busy_extents.mount = mp; ctx->busy_extents.owner = ctx; xfs_discard_extents(mp, &ctx->busy_extents); return; } kfree(ctx); } void xlog_cil_process_committed( struct list_head *list) { struct xfs_cil_ctx *ctx; while ((ctx = list_first_entry_or_null(list, struct xfs_cil_ctx, iclog_entry))) { list_del(&ctx->iclog_entry); xlog_cil_committed(ctx); } } /* * Record the LSN of the iclog we were just granted space to start writing into. * If the context doesn't have a start_lsn recorded, then this iclog will * contain the start record for the checkpoint. Otherwise this write contains * the commit record for the checkpoint. */ void xlog_cil_set_ctx_write_state( struct xfs_cil_ctx *ctx, struct xlog_in_core *iclog) { struct xfs_cil *cil = ctx->cil; xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn); ASSERT(!ctx->commit_lsn); if (!ctx->start_lsn) { spin_lock(&cil->xc_push_lock); /* * The LSN we need to pass to the log items on transaction * commit is the LSN reported by the first log vector write, not * the commit lsn. If we use the commit record lsn then we can * move the grant write head beyond the tail LSN and overwrite * it. */ ctx->start_lsn = lsn; wake_up_all(&cil->xc_start_wait); spin_unlock(&cil->xc_push_lock); /* * Make sure the metadata we are about to overwrite in the log * has been flushed to stable storage before this iclog is * issued. */ spin_lock(&cil->xc_log->l_icloglock); iclog->ic_flags |= XLOG_ICL_NEED_FLUSH; spin_unlock(&cil->xc_log->l_icloglock); return; } /* * Take a reference to the iclog for the context so that we still hold * it when xlog_write is done and has released it. This means the * context controls when the iclog is released for IO. */ atomic_inc(&iclog->ic_refcnt); /* * xlog_state_get_iclog_space() guarantees there is enough space in the * iclog for an entire commit record, so we can attach the context * callbacks now. This needs to be done before we make the commit_lsn * visible to waiters so that checkpoints with commit records in the * same iclog order their IO completion callbacks in the same order that * the commit records appear in the iclog. */ spin_lock(&cil->xc_log->l_icloglock); list_add_tail(&ctx->iclog_entry, &iclog->ic_callbacks); spin_unlock(&cil->xc_log->l_icloglock); /* * Now we can record the commit LSN and wake anyone waiting for this * sequence to have the ordered commit record assigned to a physical * location in the log. */ spin_lock(&cil->xc_push_lock); ctx->commit_iclog = iclog; ctx->commit_lsn = lsn; wake_up_all(&cil->xc_commit_wait); spin_unlock(&cil->xc_push_lock); } /* * Ensure that the order of log writes follows checkpoint sequence order. This * relies on the context LSN being zero until the log write has guaranteed the * LSN that the log write will start at via xlog_state_get_iclog_space(). 
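 *
 * There are two such ordering points per checkpoint: the start record
 * (enforced via xlog_cil_write_chain()) and the commit record (enforced via
 * xlog_cil_write_commit_record()). Both walk the xc_committing list and wait
 * for every lower sequence to record the corresponding LSN first.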
*/ enum _record_type { _START_RECORD, _COMMIT_RECORD, }; static int xlog_cil_order_write( struct xfs_cil *cil, xfs_csn_t sequence, enum _record_type record) { struct xfs_cil_ctx *ctx; restart: spin_lock(&cil->xc_push_lock); list_for_each_entry(ctx, &cil->xc_committing, committing) { /* * Avoid getting stuck in this loop because we were woken by the * shutdown, but then went back to sleep once already in the * shutdown state. */ if (xlog_is_shutdown(cil->xc_log)) { spin_unlock(&cil->xc_push_lock); return -EIO; } /* * Higher sequences will wait for this one so skip them. * Don't wait for our own sequence, either. */ if (ctx->sequence >= sequence) continue; /* Wait until the LSN for the record has been recorded. */ switch (record) { case _START_RECORD: if (!ctx->start_lsn) { xlog_wait(&cil->xc_start_wait, &cil->xc_push_lock); goto restart; } break; case _COMMIT_RECORD: if (!ctx->commit_lsn) { xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock); goto restart; } break; } } spin_unlock(&cil->xc_push_lock); return 0; } /* * Write out the log vector change now attached to the CIL context. This will * write a start record that needs to be strictly ordered in ascending CIL * sequence order so that log recovery will always use in-order start LSNs when * replaying checkpoints. */ static int xlog_cil_write_chain( struct xfs_cil_ctx *ctx, uint32_t chain_len) { struct xlog *log = ctx->cil->xc_log; int error; error = xlog_cil_order_write(ctx->cil, ctx->sequence, _START_RECORD); if (error) return error; return xlog_write(log, ctx, &ctx->lv_chain, ctx->ticket, chain_len); } /* * Write out the commit record of a checkpoint transaction to close off a * running log write. These commit records are strictly ordered in ascending CIL * sequence order so that log recovery will always replay the checkpoints in the * correct order. */ static int xlog_cil_write_commit_record( struct xfs_cil_ctx *ctx) { struct xlog *log = ctx->cil->xc_log; struct xlog_op_header ophdr = { .oh_clientid = XFS_TRANSACTION, .oh_tid = cpu_to_be32(ctx->ticket->t_tid), .oh_flags = XLOG_COMMIT_TRANS, }; struct xfs_log_iovec reg = { .i_addr = &ophdr, .i_len = sizeof(struct xlog_op_header), .i_type = XLOG_REG_TYPE_COMMIT, }; struct xfs_log_vec vec = { .lv_niovecs = 1, .lv_iovecp = ®, }; int error; LIST_HEAD(lv_chain); list_add(&vec.lv_list, &lv_chain); if (xlog_is_shutdown(log)) return -EIO; error = xlog_cil_order_write(ctx->cil, ctx->sequence, _COMMIT_RECORD); if (error) return error; /* account for space used by record data */ ctx->ticket->t_curr_res -= reg.i_len; error = xlog_write(log, ctx, &lv_chain, ctx->ticket, reg.i_len); if (error) xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); return error; } struct xlog_cil_trans_hdr { struct xlog_op_header oph[2]; struct xfs_trans_header thdr; struct xfs_log_iovec lhdr[2]; }; /* * Build a checkpoint transaction header to begin the journal transaction. We * need to account for the space used by the transaction header here as it is * not accounted for in xlog_write(). * * This is the only place we write a transaction header, so we also build the * log opheaders that indicate the start of a log transaction and wrap the * transaction header. We keep the start record in it's own log vector rather * than compacting them into a single region as this ends up making the logic * in xlog_write() for handling empty opheaders for start, commit and unmount * records much simpler. 
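 *
 * The header log vector built below therefore carries exactly two iovecs:
 * lhdr[0] wrapping the XLOG_START_TRANS opheader, and lhdr[1] wrapping the
 * opheader plus xfs_trans_header pair. Their combined length is subtracted
 * from the CIL ticket's current reservation.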
*/ static void xlog_cil_build_trans_hdr( struct xfs_cil_ctx *ctx, struct xlog_cil_trans_hdr *hdr, struct xfs_log_vec *lvhdr, int num_iovecs) { struct xlog_ticket *tic = ctx->ticket; __be32 tid = cpu_to_be32(tic->t_tid); memset(hdr, 0, sizeof(*hdr)); /* Log start record */ hdr->oph[0].oh_tid = tid; hdr->oph[0].oh_clientid = XFS_TRANSACTION; hdr->oph[0].oh_flags = XLOG_START_TRANS; /* log iovec region pointer */ hdr->lhdr[0].i_addr = &hdr->oph[0]; hdr->lhdr[0].i_len = sizeof(struct xlog_op_header); hdr->lhdr[0].i_type = XLOG_REG_TYPE_LRHEADER; /* log opheader */ hdr->oph[1].oh_tid = tid; hdr->oph[1].oh_clientid = XFS_TRANSACTION; hdr->oph[1].oh_len = cpu_to_be32(sizeof(struct xfs_trans_header)); /* transaction header in host byte order format */ hdr->thdr.th_magic = XFS_TRANS_HEADER_MAGIC; hdr->thdr.th_type = XFS_TRANS_CHECKPOINT; hdr->thdr.th_tid = tic->t_tid; hdr->thdr.th_num_items = num_iovecs; /* log iovec region pointer */ hdr->lhdr[1].i_addr = &hdr->oph[1]; hdr->lhdr[1].i_len = sizeof(struct xlog_op_header) + sizeof(struct xfs_trans_header); hdr->lhdr[1].i_type = XLOG_REG_TYPE_TRANSHDR; lvhdr->lv_niovecs = 2; lvhdr->lv_iovecp = &hdr->lhdr[0]; lvhdr->lv_bytes = hdr->lhdr[0].i_len + hdr->lhdr[1].i_len; tic->t_curr_res -= lvhdr->lv_bytes; } /* * CIL item reordering compare function. We want to order in ascending ID order, * but we want to leave items with the same ID in the order they were added to * the list. This is important for operations like reflink where we log 4 order * dependent intents in a single transaction when we overwrite an existing * shared extent with a new shared extent. i.e. BUI(unmap), CUI(drop), * CUI (inc), BUI(remap)... */ static int xlog_cil_order_cmp( void *priv, const struct list_head *a, const struct list_head *b) { struct xfs_log_vec *l1 = container_of(a, struct xfs_log_vec, lv_list); struct xfs_log_vec *l2 = container_of(b, struct xfs_log_vec, lv_list); return l1->lv_order_id > l2->lv_order_id; } /* * Pull all the log vectors off the items in the CIL, and remove the items from * the CIL. We don't need the CIL lock here because it's only needed on the * transaction commit side which is currently locked out by the flush lock. * * If a log item is marked with a whiteout, we do not need to write it to the * journal and so we just move them to the whiteout list for the caller to * dispose of appropriately. */ static void xlog_cil_build_lv_chain( struct xfs_cil_ctx *ctx, struct list_head *whiteouts, uint32_t *num_iovecs, uint32_t *num_bytes) { while (!list_empty(&ctx->log_items)) { struct xfs_log_item *item; struct xfs_log_vec *lv; item = list_first_entry(&ctx->log_items, struct xfs_log_item, li_cil); if (test_bit(XFS_LI_WHITEOUT, &item->li_flags)) { list_move(&item->li_cil, whiteouts); trace_xfs_cil_whiteout_skip(item); continue; } lv = item->li_lv; lv->lv_order_id = item->li_order_id; /* we don't write ordered log vectors */ if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED) *num_bytes += lv->lv_bytes; *num_iovecs += lv->lv_niovecs; list_add_tail(&lv->lv_list, &ctx->lv_chain); list_del_init(&item->li_cil); item->li_order_id = 0; item->li_lv = NULL; } } static void xlog_cil_cleanup_whiteouts( struct list_head *whiteouts) { while (!list_empty(whiteouts)) { struct xfs_log_item *item = list_first_entry(whiteouts, struct xfs_log_item, li_cil); list_del_init(&item->li_cil); trace_xfs_cil_whiteout_unpin(item); item->li_ops->iop_unpin(item, 1); } } /* * Push the Committed Item List to the log. * * If the current sequence is the same as xc_push_seq we need to do a flush. 
If
 * xc_push_seq is less than the current sequence, then it has already been
 * flushed and we don't need to do anything - the caller will wait for it to
 * complete if necessary.
 *
 * xc_push_seq is checked unlocked against the sequence number for a match.
 * Hence we can allow log forces to run racily and not issue pushes for the
 * same sequence twice. If we get a race between multiple pushes for the same
 * sequence they will block on the first one and then abort, hence avoiding
 * needless pushes.
 *
 * This runs from a workqueue so it does not inherit any specific memory
 * allocation context. However, we do not want to block on memory reclaim
 * recursing back into the filesystem because this push may have been triggered
 * by memory reclaim itself. Hence we really need to run under full GFP_NOFS
 * constraints here.
 */
static void
xlog_cil_push_work(
	struct work_struct	*work)
{
	unsigned int		nofs_flags = memalloc_nofs_save();
	struct xfs_cil_ctx	*ctx =
		container_of(work, struct xfs_cil_ctx, push_work);
	struct xfs_cil		*cil = ctx->cil;
	struct xlog		*log = cil->xc_log;
	struct xfs_cil_ctx	*new_ctx;
	int			num_iovecs = 0;
	int			num_bytes = 0;
	int			error = 0;
	struct xlog_cil_trans_hdr thdr;
	struct xfs_log_vec	lvhdr = {};
	xfs_csn_t		push_seq;
	bool			push_commit_stable;
	LIST_HEAD		(whiteouts);
	struct xlog_ticket	*ticket;

	new_ctx = xlog_cil_ctx_alloc();
	new_ctx->ticket = xlog_cil_ticket_alloc(log);

	down_write(&cil->xc_ctx_lock);

	spin_lock(&cil->xc_push_lock);
	push_seq = cil->xc_push_seq;
	ASSERT(push_seq <= ctx->sequence);
	push_commit_stable = cil->xc_push_commit_stable;
	cil->xc_push_commit_stable = false;

	/*
	 * As we are about to switch to a new, empty CIL context, we no longer
	 * need to throttle tasks on CIL space overruns. Wake any waiters that
	 * the hard push throttle may have caught so they can start committing
	 * to the new context. The ctx->xc_push_lock provides the serialisation
	 * necessary for safely using the lockless waitqueue_active() check in
	 * this context.
	 */
	if (waitqueue_active(&cil->xc_push_wait))
		wake_up_all(&cil->xc_push_wait);

	xlog_cil_push_pcp_aggregate(cil, ctx);

	/*
	 * Check if we've anything to push. If there is nothing, then we don't
	 * move on to a new sequence number and so we have to be able to push
	 * this sequence again later.
	 */
	if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) {
		cil->xc_push_seq = 0;
		spin_unlock(&cil->xc_push_lock);
		goto out_skip;
	}

	/* check for a previously pushed sequence */
	if (push_seq < ctx->sequence) {
		spin_unlock(&cil->xc_push_lock);
		goto out_skip;
	}

	/*
	 * We are now going to push this context, so add it to the committing
	 * list before we do anything else. This ensures that anyone waiting on
	 * this push can easily detect the difference between a "push in
	 * progress" and "CIL is empty, nothing to do".
	 *
	 * IOWs, a wait loop can now check for:
	 *	the current sequence not being found on the committing list;
	 *	an empty CIL; and
	 *	an unchanged sequence number
	 * to detect a push that had nothing to do and therefore does not need
	 * waiting on. If the CIL is not empty, we get put on the committing
	 * list before emptying the CIL and bumping the sequence number. Hence
	 * an empty CIL and an unchanged sequence number means we jumped out
	 * above after doing nothing.
	 *
	 * Hence the waiter will either find the commit sequence on the
	 * committing list or the sequence number will be unchanged and the CIL
	 * still dirty. In that latter case, the push has not yet started, and
	 * so the waiter will have to continue trying to check the CIL
	 * committing list until it is found.
	 * In extreme cases of delay, the sequence may fully commit between
	 * the attempts the waiter makes to wait on the commit sequence.
	 */
	list_add(&ctx->committing, &cil->xc_committing);
	spin_unlock(&cil->xc_push_lock);

	xlog_cil_build_lv_chain(ctx, &whiteouts, &num_iovecs, &num_bytes);

	/*
	 * Switch the contexts so we can drop the context lock and move out
	 * of a shared context. We can't just go straight to the commit record,
	 * though - we need to synchronise with previous and future commits so
	 * that the commit records are correctly ordered in the log to ensure
	 * that we process items during log IO completion in the correct order.
	 *
	 * For example, if we get an EFI in one checkpoint and the EFD in the
	 * next (e.g. due to log forces), we do not want the checkpoint with
	 * the EFD to be committed before the checkpoint with the EFI. Hence
	 * we must strictly order the commit records of the checkpoints so
	 * that: a) the checkpoint callbacks are attached to the iclogs in the
	 * correct order; and b) the checkpoints are replayed in correct order
	 * in log recovery.
	 *
	 * Hence we need to add this context to the committing context list so
	 * that higher sequences will wait for us to write out a commit record
	 * before they do.
	 *
	 * xfs_log_force_seq requires us to mirror the new sequence into the cil
	 * structure atomically with the addition of this sequence to the
	 * committing list. This also ensures that we can do unlocked checks
	 * against the current sequence in log forces without risking
	 * dereferencing a freed context pointer.
	 */
	spin_lock(&cil->xc_push_lock);
	xlog_cil_ctx_switch(cil, new_ctx);
	spin_unlock(&cil->xc_push_lock);
	up_write(&cil->xc_ctx_lock);

	/*
	 * Sort the log vector chain before we add the transaction headers.
	 * This ensures we always have the transaction headers at the start
	 * of the chain.
	 */
	list_sort(NULL, &ctx->lv_chain, xlog_cil_order_cmp);

	/*
	 * Build a checkpoint transaction header and write it to the log to
	 * begin the transaction. We need to account for the space used by the
	 * transaction header here as it is not accounted for in xlog_write().
	 * Add the lvhdr to the head of the lv chain we pass to xlog_write() so
	 * it gets written into the iclog first.
	 */
	xlog_cil_build_trans_hdr(ctx, &thdr, &lvhdr, num_iovecs);
	num_bytes += lvhdr.lv_bytes;
	list_add(&lvhdr.lv_list, &ctx->lv_chain);

	/*
	 * Take the lvhdr back off the lv_chain immediately after calling
	 * xlog_cil_write_chain() as it should not be passed to log IO
	 * completion.
	 */
	error = xlog_cil_write_chain(ctx, num_bytes);
	list_del(&lvhdr.lv_list);
	if (error)
		goto out_abort_free_ticket;

	error = xlog_cil_write_commit_record(ctx);
	if (error)
		goto out_abort_free_ticket;

	/*
	 * Grab the ticket from the ctx so we can ungrant it after releasing the
	 * commit_iclog. The ctx may be freed by the time we return from
	 * releasing the commit_iclog (i.e. checkpoint has been completed and
	 * callback run) so we can't reference the ctx after the call to
	 * xlog_state_release_iclog().
	 */
	ticket = ctx->ticket;

	/*
	 * If the checkpoint spans multiple iclogs, wait for all previous iclogs
	 * to complete before we submit the commit_iclog. We can't use state
	 * checks for this - ACTIVE can be either a past completed iclog or a
	 * future iclog being filled, while WANT_SYNC through SYNC_DONE can be a
	 * past or future iclog awaiting IO or ordered IO completion to be run.
	 * In the latter case, if it's a future iclog and we wait on it, then we
	 * will hang because it won't get processed through to ic_force_wait
	 * wakeup until this commit_iclog is written to disk.
Hence we use the * iclog header lsn and compare it to the commit lsn to determine if we * need to wait on iclogs or not. */ spin_lock(&log->l_icloglock); if (ctx->start_lsn != ctx->commit_lsn) { xfs_lsn_t plsn; plsn = be64_to_cpu(ctx->commit_iclog->ic_prev->ic_header.h_lsn); if (plsn && XFS_LSN_CMP(plsn, ctx->commit_lsn) < 0) { /* * Waiting on ic_force_wait orders the completion of * iclogs older than ic_prev. Hence we only need to wait * on the most recent older iclog here. */ xlog_wait_on_iclog(ctx->commit_iclog->ic_prev); spin_lock(&log->l_icloglock); } /* * We need to issue a pre-flush so that the ordering for this * checkpoint is correctly preserved down to stable storage. */ ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH; } /* * The commit iclog must be written to stable storage to guarantee * journal IO vs metadata writeback IO is correctly ordered on stable * storage. * * If the push caller needs the commit to be immediately stable and the * commit_iclog is not yet marked as XLOG_STATE_WANT_SYNC to indicate it * will be written when released, switch it's state to WANT_SYNC right * now. */ ctx->commit_iclog->ic_flags |= XLOG_ICL_NEED_FUA; if (push_commit_stable && ctx->commit_iclog->ic_state == XLOG_STATE_ACTIVE) xlog_state_switch_iclogs(log, ctx->commit_iclog, 0); ticket = ctx->ticket; xlog_state_release_iclog(log, ctx->commit_iclog, ticket); /* Not safe to reference ctx now! */ spin_unlock(&log->l_icloglock); xlog_cil_cleanup_whiteouts(&whiteouts); xfs_log_ticket_ungrant(log, ticket); memalloc_nofs_restore(nofs_flags); return; out_skip: up_write(&cil->xc_ctx_lock); xfs_log_ticket_put(new_ctx->ticket); kfree(new_ctx); memalloc_nofs_restore(nofs_flags); return; out_abort_free_ticket: ASSERT(xlog_is_shutdown(log)); xlog_cil_cleanup_whiteouts(&whiteouts); if (!ctx->commit_iclog) { xfs_log_ticket_ungrant(log, ctx->ticket); xlog_cil_committed(ctx); memalloc_nofs_restore(nofs_flags); return; } spin_lock(&log->l_icloglock); ticket = ctx->ticket; xlog_state_release_iclog(log, ctx->commit_iclog, ticket); /* Not safe to reference ctx now! */ spin_unlock(&log->l_icloglock); xfs_log_ticket_ungrant(log, ticket); memalloc_nofs_restore(nofs_flags); } /* * We need to push CIL every so often so we don't cache more than we can fit in * the log. The limit really is that a checkpoint can't be more than half the * log (the current checkpoint is not allowed to overwrite the previous * checkpoint), but commit latency and memory usage limit this to a smaller * size. */ static void xlog_cil_push_background( struct xlog *log) { struct xfs_cil *cil = log->l_cilp; int space_used = atomic_read(&cil->xc_ctx->space_used); /* * The cil won't be empty because we are called while holding the * context lock so whatever we added to the CIL will still be there. */ ASSERT(!test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)); /* * We are done if: * - we haven't used up all the space available yet; or * - we've already queued up a push; and * - we're not over the hard limit; and * - nothing has been over the hard limit. * * If so, we don't need to take the push lock as there's nothing to do. 
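	 *
	 * Note that space_used above was read from the global atomic counter
	 * only; amounts still sitting in the per-cpu counters below the
	 * per-cpu fold threshold are not included, which is fine for what is
	 * a heuristic threshold rather than hard accounting.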
*/ if (space_used < XLOG_CIL_SPACE_LIMIT(log) || (cil->xc_push_seq == cil->xc_current_sequence && space_used < XLOG_CIL_BLOCKING_SPACE_LIMIT(log) && !waitqueue_active(&cil->xc_push_wait))) { up_read(&cil->xc_ctx_lock); return; } spin_lock(&cil->xc_push_lock); if (cil->xc_push_seq < cil->xc_current_sequence) { cil->xc_push_seq = cil->xc_current_sequence; queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work); } /* * Drop the context lock now, we can't hold that if we need to sleep * because we are over the blocking threshold. The push_lock is still * held, so blocking threshold sleep/wakeup is still correctly * serialised here. */ up_read(&cil->xc_ctx_lock); /* * If we are well over the space limit, throttle the work that is being * done until the push work on this context has begun. Enforce the hard * throttle on all transaction commits once it has been activated, even * if the committing transactions have resulted in the space usage * dipping back down under the hard limit. * * The ctx->xc_push_lock provides the serialisation necessary for safely * calling xlog_cil_over_hard_limit() in this context. */ if (xlog_cil_over_hard_limit(log, space_used)) { trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket); ASSERT(space_used < log->l_logsize); xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock); return; } spin_unlock(&cil->xc_push_lock); } /* * xlog_cil_push_now() is used to trigger an immediate CIL push to the sequence * number that is passed. When it returns, the work will be queued for * @push_seq, but it won't be completed. * * If the caller is performing a synchronous force, we will flush the workqueue * to get previously queued work moving to minimise the wait time they will * undergo waiting for all outstanding pushes to complete. The caller is * expected to do the required waiting for push_seq to complete. * * If the caller is performing an async push, we need to ensure that the * checkpoint is fully flushed out of the iclogs when we finish the push. If we * don't do this, then the commit record may remain sitting in memory in an * ACTIVE iclog. This then requires another full log force to push to disk, * which defeats the purpose of having an async, non-blocking CIL force * mechanism. Hence in this case we need to pass a flag to the push work to * indicate it needs to flush the commit record itself. */ static void xlog_cil_push_now( struct xlog *log, xfs_lsn_t push_seq, bool async) { struct xfs_cil *cil = log->l_cilp; if (!cil) return; ASSERT(push_seq && push_seq <= cil->xc_current_sequence); /* start on any pending background push to minimise wait time on it */ if (!async) flush_workqueue(cil->xc_push_wq); spin_lock(&cil->xc_push_lock); /* * If this is an async flush request, we always need to set the * xc_push_commit_stable flag even if something else has already queued * a push. The flush caller is asking for the CIL to be on stable * storage when the next push completes, so regardless of who has queued * the push, the flush requires stable semantics from it. */ cil->xc_push_commit_stable = async; /* * If the CIL is empty or we've already pushed the sequence then * there's no more work that we need to do. 
*/ if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags) || push_seq <= cil->xc_push_seq) { spin_unlock(&cil->xc_push_lock); return; } cil->xc_push_seq = push_seq; queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work); spin_unlock(&cil->xc_push_lock); } bool xlog_cil_empty( struct xlog *log) { struct xfs_cil *cil = log->l_cilp; bool empty = false; spin_lock(&cil->xc_push_lock); if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) empty = true; spin_unlock(&cil->xc_push_lock); return empty; } /* * If there are intent done items in this transaction and the related intent was * committed in the current (same) CIL checkpoint, we don't need to write either * the intent or intent done item to the journal as the change will be * journalled atomically within this checkpoint. As we cannot remove items from * the CIL here, mark the related intent with a whiteout so that the CIL push * can remove it rather than writing it to the journal. Then remove the intent * done item from the current transaction and release it so it doesn't get put * into the CIL at all. */ static uint32_t xlog_cil_process_intents( struct xfs_cil *cil, struct xfs_trans *tp) { struct xfs_log_item *lip, *ilip, *next; uint32_t len = 0; list_for_each_entry_safe(lip, next, &tp->t_items, li_trans) { if (!(lip->li_ops->flags & XFS_ITEM_INTENT_DONE)) continue; ilip = lip->li_ops->iop_intent(lip); if (!ilip || !xlog_item_in_current_chkpt(cil, ilip)) continue; set_bit(XFS_LI_WHITEOUT, &ilip->li_flags); trace_xfs_cil_whiteout_mark(ilip); len += ilip->li_lv->lv_bytes; kvfree(ilip->li_lv); ilip->li_lv = NULL; xfs_trans_del_item(lip); lip->li_ops->iop_release(lip); } return len; } /* * Commit a transaction with the given vector to the Committed Item List. * * To do this, we need to format the item, pin it in memory if required and * account for the space used by the transaction. Once we have done that we * need to release the unused reservation for the transaction, attach the * transaction to the checkpoint context so we carry the busy extents through * to checkpoint completion, and then unlock all the items in the transaction. * * Called with the context lock already held in read mode to lock out * background commit, returns without it held once background commits are * allowed again. */ void xlog_cil_commit( struct xlog *log, struct xfs_trans *tp, xfs_csn_t *commit_seq, bool regrant) { struct xfs_cil *cil = log->l_cilp; struct xfs_log_item *lip, *next; uint32_t released_space = 0; /* * Do all necessary memory allocation before we lock the CIL. * This ensures the allocation does not deadlock with a CIL * push in memory reclaim (e.g. from kswapd). */ xlog_cil_alloc_shadow_bufs(log, tp); /* lock out background commit */ down_read(&cil->xc_ctx_lock); if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE) released_space = xlog_cil_process_intents(cil, tp); xlog_cil_insert_items(log, tp, released_space); if (regrant && !xlog_is_shutdown(log)) xfs_log_ticket_regrant(log, tp->t_ticket); else xfs_log_ticket_ungrant(log, tp->t_ticket); tp->t_ticket = NULL; xfs_trans_unreserve_and_mod_sb(tp); /* * Once all the items of the transaction have been copied to the CIL, * the items can be unlocked and possibly freed. * * This needs to be done before we drop the CIL context lock because we * have to update state in the log items and unlock them before they go * to disk. If we don't, then the CIL checkpoint can race with us and * we can run checkpoint completion before we've updated and unlocked * the log items. This affects (at least) processing of stale buffers, * inodes and EFIs. 
*/ trace_xfs_trans_commit_items(tp, _RET_IP_); list_for_each_entry_safe(lip, next, &tp->t_items, li_trans) { xfs_trans_del_item(lip); if (lip->li_ops->iop_committing) lip->li_ops->iop_committing(lip, cil->xc_ctx->sequence); } if (commit_seq) *commit_seq = cil->xc_ctx->sequence; /* xlog_cil_push_background() releases cil->xc_ctx_lock */ xlog_cil_push_background(log); } /* * Flush the CIL to stable storage but don't wait for it to complete. This * requires the CIL push to ensure the commit record for the push hits the disk, * but otherwise is no different to a push done from a log force. */ void xlog_cil_flush( struct xlog *log) { xfs_csn_t seq = log->l_cilp->xc_current_sequence; trace_xfs_log_force(log->l_mp, seq, _RET_IP_); xlog_cil_push_now(log, seq, true); /* * If the CIL is empty, make sure that any previous checkpoint that may * still be in an active iclog is pushed to stable storage. */ if (test_bit(XLOG_CIL_EMPTY, &log->l_cilp->xc_flags)) xfs_log_force(log->l_mp, 0); } /* * Conditionally push the CIL based on the sequence passed in. * * We only need to push if we haven't already pushed the sequence number given. * Hence the only time we will trigger a push here is if the push sequence is * the same as the current context. * * We return the current commit lsn to allow the callers to determine if an * iclog flush is necessary following this call. */ xfs_lsn_t xlog_cil_force_seq( struct xlog *log, xfs_csn_t sequence) { struct xfs_cil *cil = log->l_cilp; struct xfs_cil_ctx *ctx; xfs_lsn_t commit_lsn = NULLCOMMITLSN; ASSERT(sequence <= cil->xc_current_sequence); if (!sequence) sequence = cil->xc_current_sequence; trace_xfs_log_force(log->l_mp, sequence, _RET_IP_); /* * check to see if we need to force out the current context. * xlog_cil_push() handles racing pushes for the same sequence, * so no need to deal with it here. */ restart: xlog_cil_push_now(log, sequence, false); /* * See if we can find a previous sequence still committing. * We need to wait for all previous sequence commits to complete * before allowing the force of push_seq to go ahead. Hence block * on commits for those as well. */ spin_lock(&cil->xc_push_lock); list_for_each_entry(ctx, &cil->xc_committing, committing) { /* * Avoid getting stuck in this loop because we were woken by the * shutdown, but then went back to sleep once already in the * shutdown state. */ if (xlog_is_shutdown(log)) goto out_shutdown; if (ctx->sequence > sequence) continue; if (!ctx->commit_lsn) { /* * It is still being pushed! Wait for the push to * complete, then start again from the beginning. */ XFS_STATS_INC(log->l_mp, xs_log_force_sleep); xlog_wait(&cil->xc_commit_wait, &cil->xc_push_lock); goto restart; } if (ctx->sequence != sequence) continue; /* found it! */ commit_lsn = ctx->commit_lsn; } /* * The call to xlog_cil_push_now() executes the push in the background. * Hence by the time we have got here our sequence may not have been * pushed yet. This is true if the current sequence still matches the * push sequence after the above wait loop and the CIL still contains * dirty objects. This is guaranteed by the push code first adding the * context to the committing list before emptying the CIL. * * Hence if we don't find the context in the committing list and the * current sequence number is unchanged then the CIL contents are * significant. If the CIL is empty, it means there was nothing to push * and that means there is nothing to wait for.
If the CIL is not empty, * it means we haven't yet started the push, because if it had started * we would have found the context on the committing list. */ if (sequence == cil->xc_current_sequence && !test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)) { spin_unlock(&cil->xc_push_lock); goto restart; } spin_unlock(&cil->xc_push_lock); return commit_lsn; /* * We detected a shutdown in progress. We need to trigger the log force * to pass through its iclog state machine error handling, even though * we are already in a shutdown state. Hence we can't return * NULLCOMMITLSN here as that has special meaning to log forces (i.e. * LSN is already stable), so we return a zero LSN instead. */ out_shutdown: spin_unlock(&cil->xc_push_lock); return 0; } /* * Perform initial CIL structure initialisation. */ int xlog_cil_init( struct xlog *log) { struct xfs_cil *cil; struct xfs_cil_ctx *ctx; struct xlog_cil_pcp *cilpcp; int cpu; cil = kzalloc(sizeof(*cil), GFP_KERNEL | __GFP_RETRY_MAYFAIL); if (!cil) return -ENOMEM; /* * Limit the CIL pipeline depth to 4 concurrent works to bound the * concurrency the log spinlocks will be exposed to. */ cil->xc_push_wq = alloc_workqueue("xfs-cil/%s", XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND), 4, log->l_mp->m_super->s_id); if (!cil->xc_push_wq) goto out_destroy_cil; cil->xc_log = log; cil->xc_pcp = alloc_percpu(struct xlog_cil_pcp); if (!cil->xc_pcp) goto out_destroy_wq; for_each_possible_cpu(cpu) { cilpcp = per_cpu_ptr(cil->xc_pcp, cpu); INIT_LIST_HEAD(&cilpcp->busy_extents); INIT_LIST_HEAD(&cilpcp->log_items); } INIT_LIST_HEAD(&cil->xc_committing); spin_lock_init(&cil->xc_push_lock); init_waitqueue_head(&cil->xc_push_wait); init_rwsem(&cil->xc_ctx_lock); init_waitqueue_head(&cil->xc_start_wait); init_waitqueue_head(&cil->xc_commit_wait); log->l_cilp = cil; ctx = xlog_cil_ctx_alloc(); xlog_cil_ctx_switch(cil, ctx); return 0; out_destroy_wq: destroy_workqueue(cil->xc_push_wq); out_destroy_cil: kfree(cil); return -ENOMEM; } void xlog_cil_destroy( struct xlog *log) { struct xfs_cil *cil = log->l_cilp; if (cil->xc_ctx) { if (cil->xc_ctx->ticket) xfs_log_ticket_put(cil->xc_ctx->ticket); kfree(cil->xc_ctx); } ASSERT(test_bit(XLOG_CIL_EMPTY, &cil->xc_flags)); free_percpu(cil->xc_pcp); destroy_workqueue(cil->xc_push_wq); kfree(cil); }
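/*
 * Illustrative sketch only - not part of xfs_log_cil.c. It models, in plain
 * userspace C, the two-threshold policy that xlog_cil_push_background()
 * above implements: once the CIL crosses the soft space limit a background
 * push is queued, and once it crosses the hard limit committers are
 * throttled until that push starts. Every name and number below is made up
 * for the example; only the decision structure mirrors the real code.
 */
#include <stdbool.h>
#include <stdio.h>

#define EXAMPLE_LOG_SIZE	(64 * 1024 * 1024)
#define EXAMPLE_SOFT_LIMIT	(EXAMPLE_LOG_SIZE / 8)	/* queue a background push */
#define EXAMPLE_HARD_LIMIT	(EXAMPLE_LOG_SIZE / 4)	/* block the committer */

enum example_action {
	EXAMPLE_NOTHING,	/* plenty of room left */
	EXAMPLE_QUEUE_PUSH,	/* kick the background push worker */
	EXAMPLE_THROTTLE,	/* sleep until the queued push begins */
};

static enum example_action
example_cil_background_policy(unsigned int space_used, bool push_queued)
{
	if (space_used >= EXAMPLE_HARD_LIMIT)
		return EXAMPLE_THROTTLE;
	if (space_used >= EXAMPLE_SOFT_LIMIT && !push_queued)
		return EXAMPLE_QUEUE_PUSH;
	return EXAMPLE_NOTHING;
}

int main(void)
{
	/* just under the soft limit: nothing to do */
	printf("%d\n", example_cil_background_policy(EXAMPLE_SOFT_LIMIT - 1, false));
	/* over the soft limit with no push queued yet: queue one */
	printf("%d\n", example_cil_background_policy(EXAMPLE_SOFT_LIMIT + 1, false));
	/* over the hard limit: throttle even though a push is already queued */
	printf("%d\n", example_cil_background_policy(EXAMPLE_HARD_LIMIT + 1, true));
	return 0;
}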
// SPDX-License-Identifier: GPL-2.0-or-later /* * Driver for NXP PN533 NFC Chip - core functions * * Copyright (C) 2011 Instituto Nokia de Tecnologia * Copyright (C) 2012-2013 Tieto Poland */ #include <linux/device.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/nfc.h> #include <linux/netdevice.h> #include <net/nfc/nfc.h> #include "pn533.h" #define VERSION "0.3" /* How much time we spend listening for initiators */ #define PN533_LISTEN_TIME 2 /* Delay between each poll frame (ms) */ #define PN533_POLL_INTERVAL 10 /* structs for pn533 commands */ /* PN533_CMD_GET_FIRMWARE_VERSION */ struct pn533_fw_version { u8 ic; u8 ver; u8 rev; u8 support; }; /* PN533_CMD_RF_CONFIGURATION */ #define PN533_CFGITEM_RF_FIELD 0x01 #define PN533_CFGITEM_TIMING 0x02 #define PN533_CFGITEM_MAX_RETRIES 0x05 #define PN533_CFGITEM_PASORI 0x82 #define PN533_CFGITEM_RF_FIELD_AUTO_RFCA 0x2 #define PN533_CFGITEM_RF_FIELD_ON 0x1 #define PN533_CFGITEM_RF_FIELD_OFF 0x0
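/*
 * Illustrative sketch only - not part of pn533.c. It shows the payload
 * layout implied by the CFGITEM_RF_FIELD defines above: an RF_CONFIGURATION
 * payload is the config item byte followed by a value byte, and the _ON and
 * _AUTO_RFCA values look like independent bit flags. The helper name and the
 * assumption that the two bits may be ORed together are mine; the driver
 * itself builds such payloads byte by byte with skb_put_u8() (see
 * pn533_wq_rf() further down).
 */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define EXAMPLE_CFGITEM_RF_FIELD	0x01
#define EXAMPLE_RF_FIELD_AUTO_RFCA	0x2
#define EXAMPLE_RF_FIELD_ON		0x1

/* Fill a two-byte RF_CONFIGURATION payload; returns the payload length. */
static size_t example_build_rf_field_payload(uint8_t *buf, int field_on,
					     int auto_rfca)
{
	buf[0] = EXAMPLE_CFGITEM_RF_FIELD;
	buf[1] = (field_on ? EXAMPLE_RF_FIELD_ON : 0) |
		 (auto_rfca ? EXAMPLE_RF_FIELD_AUTO_RFCA : 0);
	return 2;
}

int main(void)
{
	uint8_t payload[2];
	size_t len = example_build_rf_field_payload(payload, 1, 1);

	printf("len=%zu item=0x%02x value=0x%02x\n", len, payload[0], payload[1]);
	return 0;
}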
#define PN533_CONFIG_TIMING_102 0xb #define PN533_CONFIG_TIMING_204 0xc #define PN533_CONFIG_TIMING_409 0xd #define PN533_CONFIG_TIMING_819 0xe #define PN533_CONFIG_MAX_RETRIES_NO_RETRY 0x00 #define PN533_CONFIG_MAX_RETRIES_ENDLESS 0xFF struct pn533_config_max_retries { u8 mx_rty_atr; u8 mx_rty_psl; u8 mx_rty_passive_act; } __packed; struct pn533_config_timing { u8 rfu; u8 atr_res_timeout; u8 dep_timeout; } __packed; /* PN533_CMD_IN_LIST_PASSIVE_TARGET */ /* felica commands opcode */ #define PN533_FELICA_OPC_SENSF_REQ 0 #define PN533_FELICA_OPC_SENSF_RES 1 /* felica SENSF_REQ parameters */ #define PN533_FELICA_SENSF_SC_ALL 0xFFFF #define PN533_FELICA_SENSF_RC_NO_SYSTEM_CODE 0 #define PN533_FELICA_SENSF_RC_SYSTEM_CODE 1 #define PN533_FELICA_SENSF_RC_ADVANCED_PROTOCOL 2 /* type B initiator_data values */ #define PN533_TYPE_B_AFI_ALL_FAMILIES 0 #define PN533_TYPE_B_POLL_METHOD_TIMESLOT 0 #define PN533_TYPE_B_POLL_METHOD_PROBABILISTIC 1 union pn533_cmd_poll_initdata { struct { u8 afi; u8 polling_method; } __packed type_b; struct { u8 opcode; __be16 sc; u8 rc; u8 tsn; } __packed felica; }; struct pn533_poll_modulations { struct { u8 maxtg; u8 brty; union pn533_cmd_poll_initdata initiator_data; } __packed data; u8 len; }; static const struct pn533_poll_modulations poll_mod[] = { [PN533_POLL_MOD_106KBPS_A] = { .data = { .maxtg = 1, .brty = 0, }, .len = 2, }, [PN533_POLL_MOD_212KBPS_FELICA] = { .data = { .maxtg = 1, .brty = 1, .initiator_data.felica = { .opcode = PN533_FELICA_OPC_SENSF_REQ, .sc = PN533_FELICA_SENSF_SC_ALL, .rc = PN533_FELICA_SENSF_RC_SYSTEM_CODE, .tsn = 0x03, }, }, .len = 7, }, [PN533_POLL_MOD_424KBPS_FELICA] = { .data = { .maxtg = 1, .brty = 2, .initiator_data.felica = { .opcode = PN533_FELICA_OPC_SENSF_REQ, .sc = PN533_FELICA_SENSF_SC_ALL, .rc = PN533_FELICA_SENSF_RC_SYSTEM_CODE, .tsn = 0x03, }, }, .len = 7, }, [PN533_POLL_MOD_106KBPS_JEWEL] = { .data = { .maxtg = 1, .brty = 4, }, .len = 2, }, [PN533_POLL_MOD_847KBPS_B] = { .data = { .maxtg = 1, .brty = 8, .initiator_data.type_b = { .afi = PN533_TYPE_B_AFI_ALL_FAMILIES, .polling_method = PN533_TYPE_B_POLL_METHOD_TIMESLOT, }, }, .len = 3, }, [PN533_LISTEN_MOD] = { .len = 0, }, }; /* PN533_CMD_IN_ATR */ struct pn533_cmd_activate_response { u8 status; u8 nfcid3t[10]; u8 didt; u8 bst; u8 brt; u8 to; u8 ppt; /* optional */ u8 gt[]; } __packed; struct pn533_cmd_jump_dep_response { u8 status; u8 tg; u8 nfcid3t[10]; u8 didt; u8 bst; u8 brt; u8 to; u8 ppt; /* optional */ u8 gt[]; } __packed; struct pn532_autopoll_resp { u8 type; u8 ln; u8 tg; u8 tgdata[]; }; /* PN532_CMD_IN_AUTOPOLL */ #define PN532_AUTOPOLL_POLLNR_INFINITE 0xff #define PN532_AUTOPOLL_PERIOD 0x03 /* in units of 150 ms */ #define PN532_AUTOPOLL_TYPE_GENERIC_106 0x00 #define PN532_AUTOPOLL_TYPE_GENERIC_212 0x01 #define PN532_AUTOPOLL_TYPE_GENERIC_424 0x02 #define PN532_AUTOPOLL_TYPE_JEWEL 0x04 #define PN532_AUTOPOLL_TYPE_MIFARE 0x10 #define PN532_AUTOPOLL_TYPE_FELICA212 0x11 #define PN532_AUTOPOLL_TYPE_FELICA424 0x12 #define PN532_AUTOPOLL_TYPE_ISOA 0x20 #define PN532_AUTOPOLL_TYPE_ISOB 0x23 #define PN532_AUTOPOLL_TYPE_DEP_PASSIVE_106 0x40 #define PN532_AUTOPOLL_TYPE_DEP_PASSIVE_212 0x41 #define PN532_AUTOPOLL_TYPE_DEP_PASSIVE_424 0x42 #define PN532_AUTOPOLL_TYPE_DEP_ACTIVE_106 0x80 #define PN532_AUTOPOLL_TYPE_DEP_ACTIVE_212 0x81 #define PN532_AUTOPOLL_TYPE_DEP_ACTIVE_424 0x82 /* PN533_TG_INIT_AS_TARGET */ #define PN533_INIT_TARGET_PASSIVE 0x1 #define PN533_INIT_TARGET_DEP 0x2 #define PN533_INIT_TARGET_RESP_FRAME_MASK 0x3 #define PN533_INIT_TARGET_RESP_ACTIVE 0x1 #define 
PN533_INIT_TARGET_RESP_DEP 0x4 /* The rule: value(high byte) + value(low byte) + checksum = 0 */ static inline u8 pn533_ext_checksum(u16 value) { return ~(u8)(((value & 0xFF00) >> 8) + (u8)(value & 0xFF)) + 1; } /* The rule: value + checksum = 0 */ static inline u8 pn533_std_checksum(u8 value) { return ~value + 1; } /* The rule: sum(data elements) + checksum = 0 */ static u8 pn533_std_data_checksum(u8 *data, int datalen) { u8 sum = 0; int i; for (i = 0; i < datalen; i++) sum += data[i]; return pn533_std_checksum(sum); } static void pn533_std_tx_frame_init(void *_frame, u8 cmd_code) { struct pn533_std_frame *frame = _frame; frame->preamble = 0; frame->start_frame = cpu_to_be16(PN533_STD_FRAME_SOF); PN533_STD_FRAME_IDENTIFIER(frame) = PN533_STD_FRAME_DIR_OUT; PN533_FRAME_CMD(frame) = cmd_code; frame->datalen = 2; } static void pn533_std_tx_frame_finish(void *_frame) { struct pn533_std_frame *frame = _frame; frame->datalen_checksum = pn533_std_checksum(frame->datalen); PN533_STD_FRAME_CHECKSUM(frame) = pn533_std_data_checksum(frame->data, frame->datalen); PN533_STD_FRAME_POSTAMBLE(frame) = 0; } static void pn533_std_tx_update_payload_len(void *_frame, int len) { struct pn533_std_frame *frame = _frame; frame->datalen += len; } static bool pn533_std_rx_frame_is_valid(void *_frame, struct pn533 *dev) { u8 checksum; struct pn533_std_frame *stdf = _frame; if (stdf->start_frame != cpu_to_be16(PN533_STD_FRAME_SOF)) return false; if (likely(!PN533_STD_IS_EXTENDED(stdf))) { /* Standard frame code */ dev->ops->rx_header_len = PN533_STD_FRAME_HEADER_LEN; checksum = pn533_std_checksum(stdf->datalen); if (checksum != stdf->datalen_checksum) return false; checksum = pn533_std_data_checksum(stdf->data, stdf->datalen); if (checksum != PN533_STD_FRAME_CHECKSUM(stdf)) return false; } else { /* Extended */ struct pn533_ext_frame *eif = _frame; dev->ops->rx_header_len = PN533_EXT_FRAME_HEADER_LEN; checksum = pn533_ext_checksum(be16_to_cpu(eif->datalen)); if (checksum != eif->datalen_checksum) return false; /* check data checksum */ checksum = pn533_std_data_checksum(eif->data, be16_to_cpu(eif->datalen)); if (checksum != PN533_EXT_FRAME_CHECKSUM(eif)) return false; } return true; } bool pn533_rx_frame_is_ack(void *_frame) { struct pn533_std_frame *frame = _frame; if (frame->start_frame != cpu_to_be16(PN533_STD_FRAME_SOF)) return false; if (frame->datalen != 0 || frame->datalen_checksum != 0xFF) return false; return true; } EXPORT_SYMBOL_GPL(pn533_rx_frame_is_ack); static inline int pn533_std_rx_frame_size(void *frame) { struct pn533_std_frame *f = frame; /* check for Extended Information frame */ if (PN533_STD_IS_EXTENDED(f)) { struct pn533_ext_frame *eif = frame; return sizeof(struct pn533_ext_frame) + be16_to_cpu(eif->datalen) + PN533_STD_FRAME_TAIL_LEN; } return sizeof(struct pn533_std_frame) + f->datalen + PN533_STD_FRAME_TAIL_LEN; } static u8 pn533_std_get_cmd_code(void *frame) { struct pn533_std_frame *f = frame; struct pn533_ext_frame *eif = frame; if (PN533_STD_IS_EXTENDED(f)) return PN533_FRAME_CMD(eif); else return PN533_FRAME_CMD(f); } bool pn533_rx_frame_is_cmd_response(struct pn533 *dev, void *frame) { return (dev->ops->get_cmd_code(frame) == PN533_CMD_RESPONSE(dev->cmd->code)); } EXPORT_SYMBOL_GPL(pn533_rx_frame_is_cmd_response); static struct pn533_frame_ops pn533_std_frame_ops = { .tx_frame_init = pn533_std_tx_frame_init, .tx_frame_finish = pn533_std_tx_frame_finish, .tx_update_payload_len = pn533_std_tx_update_payload_len, .tx_header_len = PN533_STD_FRAME_HEADER_LEN, .tx_tail_len = 
PN533_STD_FRAME_TAIL_LEN, .rx_is_frame_valid = pn533_std_rx_frame_is_valid, .rx_frame_size = pn533_std_rx_frame_size, .rx_header_len = PN533_STD_FRAME_HEADER_LEN, .rx_tail_len = PN533_STD_FRAME_TAIL_LEN, .max_payload_len = PN533_STD_FRAME_MAX_PAYLOAD_LEN, .get_cmd_code = pn533_std_get_cmd_code, }; static void pn533_build_cmd_frame(struct pn533 *dev, u8 cmd_code, struct sk_buff *skb) { /* payload is already there, just update datalen */ int payload_len = skb->len; struct pn533_frame_ops *ops = dev->ops; skb_push(skb, ops->tx_header_len); skb_put(skb, ops->tx_tail_len); ops->tx_frame_init(skb->data, cmd_code); ops->tx_update_payload_len(skb->data, payload_len); ops->tx_frame_finish(skb->data); } static int pn533_send_async_complete(struct pn533 *dev) { struct pn533_cmd *cmd = dev->cmd; struct sk_buff *resp; int status, rc = 0; if (!cmd) { dev_dbg(dev->dev, "%s: cmd not set\n", __func__); goto done; } dev_kfree_skb(cmd->req); status = cmd->status; resp = cmd->resp; if (status < 0) { rc = cmd->complete_cb(dev, cmd->complete_cb_context, ERR_PTR(status)); dev_kfree_skb(resp); goto done; } /* when no response is set we got interrupted */ if (!resp) resp = ERR_PTR(-EINTR); if (!IS_ERR(resp)) { skb_pull(resp, dev->ops->rx_header_len); skb_trim(resp, resp->len - dev->ops->rx_tail_len); } rc = cmd->complete_cb(dev, cmd->complete_cb_context, resp); done: kfree(cmd); dev->cmd = NULL; return rc; } static int __pn533_send_async(struct pn533 *dev, u8 cmd_code, struct sk_buff *req, pn533_send_async_complete_t complete_cb, void *complete_cb_context) { struct pn533_cmd *cmd; int rc = 0; dev_dbg(dev->dev, "Sending command 0x%x\n", cmd_code); cmd = kzalloc(sizeof(*cmd), GFP_KERNEL); if (!cmd) return -ENOMEM; cmd->code = cmd_code; cmd->req = req; cmd->complete_cb = complete_cb; cmd->complete_cb_context = complete_cb_context; pn533_build_cmd_frame(dev, cmd_code, req); mutex_lock(&dev->cmd_lock); if (!dev->cmd_pending) { dev->cmd = cmd; rc = dev->phy_ops->send_frame(dev, req); if (rc) { dev->cmd = NULL; goto error; } dev->cmd_pending = 1; goto unlock; } dev_dbg(dev->dev, "%s Queueing command 0x%x\n", __func__, cmd_code); INIT_LIST_HEAD(&cmd->queue); list_add_tail(&cmd->queue, &dev->cmd_queue); goto unlock; error: kfree(cmd); unlock: mutex_unlock(&dev->cmd_lock); return rc; } static int pn533_send_data_async(struct pn533 *dev, u8 cmd_code, struct sk_buff *req, pn533_send_async_complete_t complete_cb, void *complete_cb_context) { return __pn533_send_async(dev, cmd_code, req, complete_cb, complete_cb_context); } static int pn533_send_cmd_async(struct pn533 *dev, u8 cmd_code, struct sk_buff *req, pn533_send_async_complete_t complete_cb, void *complete_cb_context) { return __pn533_send_async(dev, cmd_code, req, complete_cb, complete_cb_context); } /* * pn533_send_cmd_direct_async * * The function sends a priority cmd directly to the chip omitting the cmd * queue. It's intended to be used by chaining mechanism of received responses * where the host has to request every single chunk of data before scheduling * next cmd from the queue. 
*/ static int pn533_send_cmd_direct_async(struct pn533 *dev, u8 cmd_code, struct sk_buff *req, pn533_send_async_complete_t complete_cb, void *complete_cb_context) { struct pn533_cmd *cmd; int rc; cmd = kzalloc(sizeof(*cmd), GFP_KERNEL); if (!cmd) return -ENOMEM; cmd->code = cmd_code; cmd->req = req; cmd->complete_cb = complete_cb; cmd->complete_cb_context = complete_cb_context; pn533_build_cmd_frame(dev, cmd_code, req); dev->cmd = cmd; rc = dev->phy_ops->send_frame(dev, req); if (rc < 0) { dev->cmd = NULL; kfree(cmd); } return rc; } static void pn533_wq_cmd_complete(struct work_struct *work) { struct pn533 *dev = container_of(work, struct pn533, cmd_complete_work); int rc; rc = pn533_send_async_complete(dev); if (rc != -EINPROGRESS) queue_work(dev->wq, &dev->cmd_work); } static void pn533_wq_cmd(struct work_struct *work) { struct pn533 *dev = container_of(work, struct pn533, cmd_work); struct pn533_cmd *cmd; int rc; mutex_lock(&dev->cmd_lock); if (list_empty(&dev->cmd_queue)) { dev->cmd_pending = 0; mutex_unlock(&dev->cmd_lock); return; } cmd = list_first_entry(&dev->cmd_queue, struct pn533_cmd, queue); list_del(&cmd->queue); mutex_unlock(&dev->cmd_lock); dev->cmd = cmd; rc = dev->phy_ops->send_frame(dev, cmd->req); if (rc < 0) { dev->cmd = NULL; dev_kfree_skb(cmd->req); kfree(cmd); return; } } struct pn533_sync_cmd_response { struct sk_buff *resp; struct completion done; }; static int pn533_send_sync_complete(struct pn533 *dev, void *_arg, struct sk_buff *resp) { struct pn533_sync_cmd_response *arg = _arg; arg->resp = resp; complete(&arg->done); return 0; } /* pn533_send_cmd_sync * * Please note the req parameter is freed inside the function to * limit the number of return value interpretations by the caller. * * 1. negative in case of error during TX path -> req should be freed * * 2. negative in case of error during RX path -> req should not be freed * as it's been already freed at the beginning of RX path by * async_complete_cb. * * 3. valid pointer in case of successful RX path * * A caller has to check a return value with IS_ERR macro. If the test passes, * the returned pointer is valid.
* */ static struct sk_buff *pn533_send_cmd_sync(struct pn533 *dev, u8 cmd_code, struct sk_buff *req) { int rc; struct pn533_sync_cmd_response arg; init_completion(&arg.done); rc = pn533_send_cmd_async(dev, cmd_code, req, pn533_send_sync_complete, &arg); if (rc) { dev_kfree_skb(req); return ERR_PTR(rc); } wait_for_completion(&arg.done); return arg.resp; } static struct sk_buff *pn533_alloc_skb(struct pn533 *dev, unsigned int size) { struct sk_buff *skb; skb = alloc_skb(dev->ops->tx_header_len + size + dev->ops->tx_tail_len, GFP_KERNEL); if (skb) skb_reserve(skb, dev->ops->tx_header_len); return skb; } struct pn533_target_type_a { __be16 sens_res; u8 sel_res; u8 nfcid_len; u8 nfcid_data[]; } __packed; #define PN533_TYPE_A_SENS_RES_NFCID1(x) ((u8)((be16_to_cpu(x) & 0x00C0) >> 6)) #define PN533_TYPE_A_SENS_RES_SSD(x) ((u8)((be16_to_cpu(x) & 0x001F) >> 0)) #define PN533_TYPE_A_SENS_RES_PLATCONF(x) ((u8)((be16_to_cpu(x) & 0x0F00) >> 8)) #define PN533_TYPE_A_SENS_RES_SSD_JEWEL 0x00 #define PN533_TYPE_A_SENS_RES_PLATCONF_JEWEL 0x0C #define PN533_TYPE_A_SEL_PROT(x) (((x) & 0x60) >> 5) #define PN533_TYPE_A_SEL_CASCADE(x) (((x) & 0x04) >> 2) #define PN533_TYPE_A_SEL_PROT_MIFARE 0 #define PN533_TYPE_A_SEL_PROT_ISO14443 1 #define PN533_TYPE_A_SEL_PROT_DEP 2 #define PN533_TYPE_A_SEL_PROT_ISO14443_DEP 3 static bool pn533_target_type_a_is_valid(struct pn533_target_type_a *type_a, int target_data_len) { u8 ssd; u8 platconf; if (target_data_len < sizeof(struct pn533_target_type_a)) return false; /* * The length check of nfcid[] and ats[] are not being performed because * the values are not being used */ /* Requirement 4.6.3.3 from NFC Forum Digital Spec */ ssd = PN533_TYPE_A_SENS_RES_SSD(type_a->sens_res); platconf = PN533_TYPE_A_SENS_RES_PLATCONF(type_a->sens_res); if ((ssd == PN533_TYPE_A_SENS_RES_SSD_JEWEL && platconf != PN533_TYPE_A_SENS_RES_PLATCONF_JEWEL) || (ssd != PN533_TYPE_A_SENS_RES_SSD_JEWEL && platconf == PN533_TYPE_A_SENS_RES_PLATCONF_JEWEL)) return false; /* Requirements 4.8.2.1, 4.8.2.3, 4.8.2.5 and 4.8.2.7 from NFC Forum */ if (PN533_TYPE_A_SEL_CASCADE(type_a->sel_res) != 0) return false; if (type_a->nfcid_len > NFC_NFCID1_MAXSIZE) return false; return true; } static int pn533_target_found_type_a(struct nfc_target *nfc_tgt, u8 *tgt_data, int tgt_data_len) { struct pn533_target_type_a *tgt_type_a; tgt_type_a = (struct pn533_target_type_a *)tgt_data; if (!pn533_target_type_a_is_valid(tgt_type_a, tgt_data_len)) return -EPROTO; switch (PN533_TYPE_A_SEL_PROT(tgt_type_a->sel_res)) { case PN533_TYPE_A_SEL_PROT_MIFARE: nfc_tgt->supported_protocols = NFC_PROTO_MIFARE_MASK; break; case PN533_TYPE_A_SEL_PROT_ISO14443: nfc_tgt->supported_protocols = NFC_PROTO_ISO14443_MASK; break; case PN533_TYPE_A_SEL_PROT_DEP: nfc_tgt->supported_protocols = NFC_PROTO_NFC_DEP_MASK; break; case PN533_TYPE_A_SEL_PROT_ISO14443_DEP: nfc_tgt->supported_protocols = NFC_PROTO_ISO14443_MASK | NFC_PROTO_NFC_DEP_MASK; break; } nfc_tgt->sens_res = be16_to_cpu(tgt_type_a->sens_res); nfc_tgt->sel_res = tgt_type_a->sel_res; nfc_tgt->nfcid1_len = tgt_type_a->nfcid_len; memcpy(nfc_tgt->nfcid1, tgt_type_a->nfcid_data, nfc_tgt->nfcid1_len); return 0; } struct pn533_target_felica { u8 pol_res; u8 opcode; u8 nfcid2[NFC_NFCID2_MAXSIZE]; u8 pad[8]; /* optional */ u8 syst_code[]; } __packed; #define PN533_FELICA_SENSF_NFCID2_DEP_B1 0x01 #define PN533_FELICA_SENSF_NFCID2_DEP_B2 0xFE static bool pn533_target_felica_is_valid(struct pn533_target_felica *felica, int target_data_len) { if (target_data_len < sizeof(struct pn533_target_felica)) 
return false; if (felica->opcode != PN533_FELICA_OPC_SENSF_RES) return false; return true; } static int pn533_target_found_felica(struct nfc_target *nfc_tgt, u8 *tgt_data, int tgt_data_len) { struct pn533_target_felica *tgt_felica; tgt_felica = (struct pn533_target_felica *)tgt_data; if (!pn533_target_felica_is_valid(tgt_felica, tgt_data_len)) return -EPROTO; if ((tgt_felica->nfcid2[0] == PN533_FELICA_SENSF_NFCID2_DEP_B1) && (tgt_felica->nfcid2[1] == PN533_FELICA_SENSF_NFCID2_DEP_B2)) nfc_tgt->supported_protocols = NFC_PROTO_NFC_DEP_MASK; else nfc_tgt->supported_protocols = NFC_PROTO_FELICA_MASK; memcpy(nfc_tgt->sensf_res, &tgt_felica->opcode, 9); nfc_tgt->sensf_res_len = 9; memcpy(nfc_tgt->nfcid2, tgt_felica->nfcid2, NFC_NFCID2_MAXSIZE); nfc_tgt->nfcid2_len = NFC_NFCID2_MAXSIZE; return 0; } struct pn533_target_jewel { __be16 sens_res; u8 jewelid[4]; } __packed; static bool pn533_target_jewel_is_valid(struct pn533_target_jewel *jewel, int target_data_len) { u8 ssd; u8 platconf; if (target_data_len < sizeof(struct pn533_target_jewel)) return false; /* Requirement 4.6.3.3 from NFC Forum Digital Spec */ ssd = PN533_TYPE_A_SENS_RES_SSD(jewel->sens_res); platconf = PN533_TYPE_A_SENS_RES_PLATCONF(jewel->sens_res); if ((ssd == PN533_TYPE_A_SENS_RES_SSD_JEWEL && platconf != PN533_TYPE_A_SENS_RES_PLATCONF_JEWEL) || (ssd != PN533_TYPE_A_SENS_RES_SSD_JEWEL && platconf == PN533_TYPE_A_SENS_RES_PLATCONF_JEWEL)) return false; return true; } static int pn533_target_found_jewel(struct nfc_target *nfc_tgt, u8 *tgt_data, int tgt_data_len) { struct pn533_target_jewel *tgt_jewel; tgt_jewel = (struct pn533_target_jewel *)tgt_data; if (!pn533_target_jewel_is_valid(tgt_jewel, tgt_data_len)) return -EPROTO; nfc_tgt->supported_protocols = NFC_PROTO_JEWEL_MASK; nfc_tgt->sens_res = be16_to_cpu(tgt_jewel->sens_res); nfc_tgt->nfcid1_len = 4; memcpy(nfc_tgt->nfcid1, tgt_jewel->jewelid, nfc_tgt->nfcid1_len); return 0; } struct pn533_type_b_prot_info { u8 bitrate; u8 fsci_type; u8 fwi_adc_fo; } __packed; #define PN533_TYPE_B_PROT_FCSI(x) (((x) & 0xF0) >> 4) #define PN533_TYPE_B_PROT_TYPE(x) (((x) & 0x0F) >> 0) #define PN533_TYPE_B_PROT_TYPE_RFU_MASK 0x8 struct pn533_type_b_sens_res { u8 opcode; u8 nfcid[4]; u8 appdata[4]; struct pn533_type_b_prot_info prot_info; } __packed; #define PN533_TYPE_B_OPC_SENSB_RES 0x50 struct pn533_target_type_b { struct pn533_type_b_sens_res sensb_res; u8 attrib_res_len; u8 attrib_res[]; } __packed; static bool pn533_target_type_b_is_valid(struct pn533_target_type_b *type_b, int target_data_len) { if (target_data_len < sizeof(struct pn533_target_type_b)) return false; if (type_b->sensb_res.opcode != PN533_TYPE_B_OPC_SENSB_RES) return false; if (PN533_TYPE_B_PROT_TYPE(type_b->sensb_res.prot_info.fsci_type) & PN533_TYPE_B_PROT_TYPE_RFU_MASK) return false; return true; } static int pn533_target_found_type_b(struct nfc_target *nfc_tgt, u8 *tgt_data, int tgt_data_len) { struct pn533_target_type_b *tgt_type_b; tgt_type_b = (struct pn533_target_type_b *)tgt_data; if (!pn533_target_type_b_is_valid(tgt_type_b, tgt_data_len)) return -EPROTO; nfc_tgt->supported_protocols = NFC_PROTO_ISO14443_B_MASK; return 0; } static void pn533_poll_reset_mod_list(struct pn533 *dev); static int pn533_target_found(struct pn533 *dev, u8 tg, u8 *tgdata, int tgdata_len) { struct nfc_target nfc_tgt; int rc; dev_dbg(dev->dev, "%s: modulation=%d\n", __func__, dev->poll_mod_curr); if (tg != 1) return -EPROTO; memset(&nfc_tgt, 0, sizeof(struct nfc_target)); switch (dev->poll_mod_curr) { case PN533_POLL_MOD_106KBPS_A: rc = 
pn533_target_found_type_a(&nfc_tgt, tgdata, tgdata_len); break; case PN533_POLL_MOD_212KBPS_FELICA: case PN533_POLL_MOD_424KBPS_FELICA: rc = pn533_target_found_felica(&nfc_tgt, tgdata, tgdata_len); break; case PN533_POLL_MOD_106KBPS_JEWEL: rc = pn533_target_found_jewel(&nfc_tgt, tgdata, tgdata_len); break; case PN533_POLL_MOD_847KBPS_B: rc = pn533_target_found_type_b(&nfc_tgt, tgdata, tgdata_len); break; default: nfc_err(dev->dev, "Unknown current poll modulation\n"); return -EPROTO; } if (rc) return rc; if (!(nfc_tgt.supported_protocols & dev->poll_protocols)) { dev_dbg(dev->dev, "The Tg found doesn't have the desired protocol\n"); return -EAGAIN; } dev_dbg(dev->dev, "Target found - supported protocols: 0x%x\n", nfc_tgt.supported_protocols); dev->tgt_available_prots = nfc_tgt.supported_protocols; pn533_poll_reset_mod_list(dev); nfc_targets_found(dev->nfc_dev, &nfc_tgt, 1); return 0; } static inline void pn533_poll_next_mod(struct pn533 *dev) { dev->poll_mod_curr = (dev->poll_mod_curr + 1) % dev->poll_mod_count; } static void pn533_poll_reset_mod_list(struct pn533 *dev) { dev->poll_mod_count = 0; } static void pn533_poll_add_mod(struct pn533 *dev, u8 mod_index) { dev->poll_mod_active[dev->poll_mod_count] = (struct pn533_poll_modulations *)&poll_mod[mod_index]; dev->poll_mod_count++; } static void pn533_poll_create_mod_list(struct pn533 *dev, u32 im_protocols, u32 tm_protocols) { pn533_poll_reset_mod_list(dev); if ((im_protocols & NFC_PROTO_MIFARE_MASK) || (im_protocols & NFC_PROTO_ISO14443_MASK) || (im_protocols & NFC_PROTO_NFC_DEP_MASK)) pn533_poll_add_mod(dev, PN533_POLL_MOD_106KBPS_A); if (im_protocols & NFC_PROTO_FELICA_MASK || im_protocols & NFC_PROTO_NFC_DEP_MASK) { pn533_poll_add_mod(dev, PN533_POLL_MOD_212KBPS_FELICA); pn533_poll_add_mod(dev, PN533_POLL_MOD_424KBPS_FELICA); } if (im_protocols & NFC_PROTO_JEWEL_MASK) pn533_poll_add_mod(dev, PN533_POLL_MOD_106KBPS_JEWEL); if (im_protocols & NFC_PROTO_ISO14443_B_MASK) pn533_poll_add_mod(dev, PN533_POLL_MOD_847KBPS_B); if (tm_protocols) pn533_poll_add_mod(dev, PN533_LISTEN_MOD); } static int pn533_start_poll_complete(struct pn533 *dev, struct sk_buff *resp) { u8 nbtg, tg, *tgdata; int rc, tgdata_len; /* Toggle the DEP polling */ if (dev->poll_protocols & NFC_PROTO_NFC_DEP_MASK) dev->poll_dep = 1; nbtg = resp->data[0]; tg = resp->data[1]; tgdata = &resp->data[2]; tgdata_len = resp->len - 2; /* nbtg + tg */ if (nbtg) { rc = pn533_target_found(dev, tg, tgdata, tgdata_len); /* We must stop the poll after a valid target found */ if (rc == 0) return 0; } return -EAGAIN; } static struct sk_buff *pn533_alloc_poll_tg_frame(struct pn533 *dev) { struct sk_buff *skb; u8 *felica, *nfcid3; u8 *gbytes = dev->gb; size_t gbytes_len = dev->gb_len; u8 felica_params[18] = {0x1, 0xfe, /* DEP */ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, /* random */ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff}; /* System code */ u8 mifare_params[6] = {0x1, 0x1, /* SENS_RES */ 0x0, 0x0, 0x0, 0x40}; /* SEL_RES for DEP */ unsigned int skb_len = 36 + /* * mode (1), mifare (6), * felica (18), nfcid3 (10), gb_len (1) */ gbytes_len + 1; /* len Tk*/ skb = pn533_alloc_skb(dev, skb_len); if (!skb) return NULL; /* DEP support only */ skb_put_u8(skb, PN533_INIT_TARGET_DEP); /* MIFARE params */ skb_put_data(skb, mifare_params, 6); /* Felica params */ felica = skb_put_data(skb, felica_params, 18); get_random_bytes(felica + 2, 6); /* NFCID3 */ nfcid3 = skb_put_zero(skb, 10); memcpy(nfcid3, felica, 8); /* General bytes */ skb_put_u8(skb, gbytes_len); skb_put_data(skb, gbytes, gbytes_len); /* 
Len Tk */ skb_put_u8(skb, 0); return skb; } static void pn533_wq_tm_mi_recv(struct work_struct *work); static struct sk_buff *pn533_build_response(struct pn533 *dev); static int pn533_tm_get_data_complete(struct pn533 *dev, void *arg, struct sk_buff *resp) { struct sk_buff *skb; u8 status, ret, mi; int rc; if (IS_ERR(resp)) { skb_queue_purge(&dev->resp_q); return PTR_ERR(resp); } status = resp->data[0]; ret = status & PN533_CMD_RET_MASK; mi = status & PN533_CMD_MI_MASK; skb_pull(resp, sizeof(status)); if (ret != PN533_CMD_RET_SUCCESS) { rc = -EIO; goto error; } skb_queue_tail(&dev->resp_q, resp); if (mi) { queue_work(dev->wq, &dev->mi_tm_rx_work); return -EINPROGRESS; } skb = pn533_build_response(dev); if (!skb) { rc = -EIO; goto error; } return nfc_tm_data_received(dev->nfc_dev, skb); error: nfc_tm_deactivated(dev->nfc_dev); dev->tgt_mode = 0; skb_queue_purge(&dev->resp_q); dev_kfree_skb(resp); return rc; } static void pn533_wq_tm_mi_recv(struct work_struct *work) { struct pn533 *dev = container_of(work, struct pn533, mi_tm_rx_work); struct sk_buff *skb; int rc; skb = pn533_alloc_skb(dev, 0); if (!skb) return; rc = pn533_send_cmd_direct_async(dev, PN533_CMD_TG_GET_DATA, skb, pn533_tm_get_data_complete, NULL); if (rc < 0) dev_kfree_skb(skb); } static int pn533_tm_send_complete(struct pn533 *dev, void *arg, struct sk_buff *resp); static void pn533_wq_tm_mi_send(struct work_struct *work) { struct pn533 *dev = container_of(work, struct pn533, mi_tm_tx_work); struct sk_buff *skb; int rc; /* Grab the first skb in the queue */ skb = skb_dequeue(&dev->fragment_skb); if (skb == NULL) { /* No more data */ /* Reset the queue for future use */ skb_queue_head_init(&dev->fragment_skb); goto error; } /* last entry - remove MI bit */ if (skb_queue_len(&dev->fragment_skb) == 0) { rc = pn533_send_cmd_direct_async(dev, PN533_CMD_TG_SET_DATA, skb, pn533_tm_send_complete, NULL); } else rc = pn533_send_cmd_direct_async(dev, PN533_CMD_TG_SET_META_DATA, skb, pn533_tm_send_complete, NULL); if (rc == 0) /* success */ return; dev_err(dev->dev, "Error %d when trying to perform set meta data_exchange", rc); dev_kfree_skb(skb); error: dev->phy_ops->send_ack(dev, GFP_KERNEL); queue_work(dev->wq, &dev->cmd_work); } static void pn533_wq_tg_get_data(struct work_struct *work) { struct pn533 *dev = container_of(work, struct pn533, tg_work); struct sk_buff *skb; int rc; skb = pn533_alloc_skb(dev, 0); if (!skb) return; rc = pn533_send_data_async(dev, PN533_CMD_TG_GET_DATA, skb, pn533_tm_get_data_complete, NULL); if (rc < 0) dev_kfree_skb(skb); } #define ATR_REQ_GB_OFFSET 17 static int pn533_init_target_complete(struct pn533 *dev, struct sk_buff *resp) { u8 mode, *cmd, comm_mode = NFC_COMM_PASSIVE, *gb; size_t gb_len; int rc; if (resp->len < ATR_REQ_GB_OFFSET + 1) return -EINVAL; mode = resp->data[0]; cmd = &resp->data[1]; dev_dbg(dev->dev, "Target mode 0x%x len %d\n", mode, resp->len); if ((mode & PN533_INIT_TARGET_RESP_FRAME_MASK) == PN533_INIT_TARGET_RESP_ACTIVE) comm_mode = NFC_COMM_ACTIVE; if ((mode & PN533_INIT_TARGET_RESP_DEP) == 0) /* Only DEP supported */ return -EOPNOTSUPP; gb = cmd + ATR_REQ_GB_OFFSET; gb_len = resp->len - (ATR_REQ_GB_OFFSET + 1); rc = nfc_tm_activated(dev->nfc_dev, NFC_PROTO_NFC_DEP_MASK, comm_mode, gb, gb_len); if (rc < 0) { nfc_err(dev->dev, "Error when signaling target activation\n"); return rc; } dev->tgt_mode = 1; queue_work(dev->wq, &dev->tg_work); return 0; } static void pn533_listen_mode_timer(struct timer_list *t) { struct pn533 *dev = from_timer(dev, t, listen_timer); dev->cancel_listen = 
1; pn533_poll_next_mod(dev); queue_delayed_work(dev->wq, &dev->poll_work, msecs_to_jiffies(PN533_POLL_INTERVAL)); } static int pn533_rf_complete(struct pn533 *dev, void *arg, struct sk_buff *resp) { int rc = 0; if (IS_ERR(resp)) { rc = PTR_ERR(resp); nfc_err(dev->dev, "RF setting error %d\n", rc); return rc; } queue_delayed_work(dev->wq, &dev->poll_work, msecs_to_jiffies(PN533_POLL_INTERVAL)); dev_kfree_skb(resp); return rc; } static void pn533_wq_rf(struct work_struct *work) { struct pn533 *dev = container_of(work, struct pn533, rf_work); struct sk_buff *skb; int rc; skb = pn533_alloc_skb(dev, 2); if (!skb) return; skb_put_u8(skb, PN533_CFGITEM_RF_FIELD); skb_put_u8(skb, PN533_CFGITEM_RF_FIELD_AUTO_RFCA); rc = pn533_send_cmd_async(dev, PN533_CMD_RF_CONFIGURATION, skb, pn533_rf_complete, NULL); if (rc < 0) { dev_kfree_skb(skb); nfc_err(dev->dev, "RF setting error %d\n", rc); } } static int pn533_poll_dep_complete(struct pn533 *dev, void *arg, struct sk_buff *resp) { struct pn533_cmd_jump_dep_response *rsp; struct nfc_target nfc_target; u8 target_gt_len; int rc; if (IS_ERR(resp)) return PTR_ERR(resp); memset(&nfc_target, 0, sizeof(struct nfc_target)); rsp = (struct pn533_cmd_jump_dep_response *)resp->data; rc = rsp->status & PN533_CMD_RET_MASK; if (rc != PN533_CMD_RET_SUCCESS) { /* Not target found, turn radio off */ queue_work(dev->wq, &dev->rf_work); dev_kfree_skb(resp); return 0; } dev_dbg(dev->dev, "Creating new target"); nfc_target.supported_protocols = NFC_PROTO_NFC_DEP_MASK; nfc_target.nfcid1_len = 10; memcpy(nfc_target.nfcid1, rsp->nfcid3t, nfc_target.nfcid1_len); rc = nfc_targets_found(dev->nfc_dev, &nfc_target, 1); if (rc) goto error; dev->tgt_available_prots = 0; dev->tgt_active_prot = NFC_PROTO_NFC_DEP; /* ATR_RES general bytes are located at offset 17 */ target_gt_len = resp->len - 17; rc = nfc_set_remote_general_bytes(dev->nfc_dev, rsp->gt, target_gt_len); if (!rc) { rc = nfc_dep_link_is_up(dev->nfc_dev, dev->nfc_dev->targets[0].idx, 0, NFC_RF_INITIATOR); if (!rc) pn533_poll_reset_mod_list(dev); } error: dev_kfree_skb(resp); return rc; } #define PASSIVE_DATA_LEN 5 static int pn533_poll_dep(struct nfc_dev *nfc_dev) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); struct sk_buff *skb; int rc, skb_len; u8 *next, nfcid3[NFC_NFCID3_MAXSIZE]; u8 passive_data[PASSIVE_DATA_LEN] = {0x00, 0xff, 0xff, 0x00, 0x3}; if (!dev->gb) { dev->gb = nfc_get_local_general_bytes(nfc_dev, &dev->gb_len); if (!dev->gb || !dev->gb_len) { dev->poll_dep = 0; queue_work(dev->wq, &dev->rf_work); } } skb_len = 3 + dev->gb_len; /* ActPass + BR + Next */ skb_len += PASSIVE_DATA_LEN; /* NFCID3 */ skb_len += NFC_NFCID3_MAXSIZE; nfcid3[0] = 0x1; nfcid3[1] = 0xfe; get_random_bytes(nfcid3 + 2, 6); skb = pn533_alloc_skb(dev, skb_len); if (!skb) return -ENOMEM; skb_put_u8(skb, 0x01); /* Active */ skb_put_u8(skb, 0x02); /* 424 kbps */ next = skb_put(skb, 1); /* Next */ *next = 0; /* Copy passive data */ skb_put_data(skb, passive_data, PASSIVE_DATA_LEN); *next |= 1; /* Copy NFCID3 (which is NFCID2 from SENSF_RES) */ skb_put_data(skb, nfcid3, NFC_NFCID3_MAXSIZE); *next |= 2; skb_put_data(skb, dev->gb, dev->gb_len); *next |= 4; /* We have some Gi */ rc = pn533_send_cmd_async(dev, PN533_CMD_IN_JUMP_FOR_DEP, skb, pn533_poll_dep_complete, NULL); if (rc < 0) dev_kfree_skb(skb); return rc; } static int pn533_autopoll_complete(struct pn533 *dev, void *arg, struct sk_buff *resp) { struct pn532_autopoll_resp *apr; struct nfc_target nfc_tgt; u8 nbtg; int rc; if (IS_ERR(resp)) { rc = PTR_ERR(resp); nfc_err(dev->dev, "%s autopoll 
complete error %d\n", __func__, rc); if (rc == -ENOENT) { if (dev->poll_mod_count != 0) return rc; goto stop_poll; } else if (rc < 0) { nfc_err(dev->dev, "Error %d when running autopoll\n", rc); goto stop_poll; } } nbtg = resp->data[0]; if ((nbtg > 2) || (nbtg <= 0)) return -EAGAIN; apr = (struct pn532_autopoll_resp *)&resp->data[1]; while (nbtg--) { memset(&nfc_tgt, 0, sizeof(struct nfc_target)); switch (apr->type) { case PN532_AUTOPOLL_TYPE_ISOA: dev_dbg(dev->dev, "ISOA\n"); rc = pn533_target_found_type_a(&nfc_tgt, apr->tgdata, apr->ln - 1); break; case PN532_AUTOPOLL_TYPE_FELICA212: case PN532_AUTOPOLL_TYPE_FELICA424: dev_dbg(dev->dev, "FELICA\n"); rc = pn533_target_found_felica(&nfc_tgt, apr->tgdata, apr->ln - 1); break; case PN532_AUTOPOLL_TYPE_JEWEL: dev_dbg(dev->dev, "JEWEL\n"); rc = pn533_target_found_jewel(&nfc_tgt, apr->tgdata, apr->ln - 1); break; case PN532_AUTOPOLL_TYPE_ISOB: dev_dbg(dev->dev, "ISOB\n"); rc = pn533_target_found_type_b(&nfc_tgt, apr->tgdata, apr->ln - 1); break; case PN532_AUTOPOLL_TYPE_MIFARE: dev_dbg(dev->dev, "Mifare\n"); rc = pn533_target_found_type_a(&nfc_tgt, apr->tgdata, apr->ln - 1); break; default: nfc_err(dev->dev, "Unknown current poll modulation\n"); rc = -EPROTO; } if (rc) goto done; if (!(nfc_tgt.supported_protocols & dev->poll_protocols)) { nfc_err(dev->dev, "The Tg found doesn't have the desired protocol\n"); rc = -EAGAIN; goto done; } dev->tgt_available_prots = nfc_tgt.supported_protocols; apr = (struct pn532_autopoll_resp *) (apr->tgdata + (apr->ln - 1)); } pn533_poll_reset_mod_list(dev); nfc_targets_found(dev->nfc_dev, &nfc_tgt, 1); done: dev_kfree_skb(resp); return rc; stop_poll: nfc_err(dev->dev, "autopoll operation has been stopped\n"); pn533_poll_reset_mod_list(dev); dev->poll_protocols = 0; return rc; } static int pn533_poll_complete(struct pn533 *dev, void *arg, struct sk_buff *resp) { struct pn533_poll_modulations *cur_mod; int rc; if (IS_ERR(resp)) { rc = PTR_ERR(resp); nfc_err(dev->dev, "%s Poll complete error %d\n", __func__, rc); if (rc == -ENOENT) { if (dev->poll_mod_count != 0) return rc; goto stop_poll; } else if (rc < 0) { nfc_err(dev->dev, "Error %d when running poll\n", rc); goto stop_poll; } } cur_mod = dev->poll_mod_active[dev->poll_mod_curr]; if (cur_mod->len == 0) { /* Target mode */ del_timer(&dev->listen_timer); rc = pn533_init_target_complete(dev, resp); goto done; } /* Initiator mode */ rc = pn533_start_poll_complete(dev, resp); if (!rc) goto done; if (!dev->poll_mod_count) { dev_dbg(dev->dev, "Polling has been stopped\n"); goto done; } pn533_poll_next_mod(dev); /* Not target found, turn radio off */ queue_work(dev->wq, &dev->rf_work); done: dev_kfree_skb(resp); return rc; stop_poll: nfc_err(dev->dev, "Polling operation has been stopped\n"); pn533_poll_reset_mod_list(dev); dev->poll_protocols = 0; return rc; } static struct sk_buff *pn533_alloc_poll_in_frame(struct pn533 *dev, struct pn533_poll_modulations *mod) { struct sk_buff *skb; skb = pn533_alloc_skb(dev, mod->len); if (!skb) return NULL; skb_put_data(skb, &mod->data, mod->len); return skb; } static int pn533_send_poll_frame(struct pn533 *dev) { struct pn533_poll_modulations *mod; struct sk_buff *skb; int rc; u8 cmd_code; mod = dev->poll_mod_active[dev->poll_mod_curr]; dev_dbg(dev->dev, "%s mod len %d\n", __func__, mod->len); if ((dev->poll_protocols & NFC_PROTO_NFC_DEP_MASK) && dev->poll_dep) { dev->poll_dep = 0; return pn533_poll_dep(dev->nfc_dev); } if (mod->len == 0) { /* Listen mode */ cmd_code = PN533_CMD_TG_INIT_AS_TARGET; skb = 
pn533_alloc_poll_tg_frame(dev); } else { /* Polling mode */ cmd_code = PN533_CMD_IN_LIST_PASSIVE_TARGET; skb = pn533_alloc_poll_in_frame(dev, mod); } if (!skb) { nfc_err(dev->dev, "Failed to allocate skb\n"); return -ENOMEM; } rc = pn533_send_cmd_async(dev, cmd_code, skb, pn533_poll_complete, NULL); if (rc < 0) { dev_kfree_skb(skb); nfc_err(dev->dev, "Polling loop error %d\n", rc); } return rc; } static void pn533_wq_poll(struct work_struct *work) { struct pn533 *dev = container_of(work, struct pn533, poll_work.work); struct pn533_poll_modulations *cur_mod; int rc; cur_mod = dev->poll_mod_active[dev->poll_mod_curr]; dev_dbg(dev->dev, "%s cancel_listen %d modulation len %d\n", __func__, dev->cancel_listen, cur_mod->len); if (dev->cancel_listen == 1) { dev->cancel_listen = 0; dev->phy_ops->abort_cmd(dev, GFP_ATOMIC); } rc = pn533_send_poll_frame(dev); if (rc) return; if (cur_mod->len == 0 && dev->poll_mod_count > 1) mod_timer(&dev->listen_timer, jiffies + PN533_LISTEN_TIME * HZ); } static int pn533_start_poll(struct nfc_dev *nfc_dev, u32 im_protocols, u32 tm_protocols) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); struct pn533_poll_modulations *cur_mod; struct sk_buff *skb; u8 rand_mod; int rc; dev_dbg(dev->dev, "%s: im protocols 0x%x tm protocols 0x%x\n", __func__, im_protocols, tm_protocols); if (dev->tgt_active_prot) { nfc_err(dev->dev, "Cannot poll with a target already activated\n"); return -EBUSY; } if (dev->tgt_mode) { nfc_err(dev->dev, "Cannot poll while already being activated\n"); return -EBUSY; } if (tm_protocols) { dev->gb = nfc_get_local_general_bytes(nfc_dev, &dev->gb_len); if (dev->gb == NULL) tm_protocols = 0; } dev->poll_protocols = im_protocols; dev->listen_protocols = tm_protocols; if (dev->device_type == PN533_DEVICE_PN532_AUTOPOLL) { skb = pn533_alloc_skb(dev, 4 + 6); if (!skb) return -ENOMEM; *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_POLLNR_INFINITE; *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_PERIOD; if ((im_protocols & NFC_PROTO_MIFARE_MASK) && (im_protocols & NFC_PROTO_ISO14443_MASK) && (im_protocols & NFC_PROTO_NFC_DEP_MASK)) *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_GENERIC_106; else { if (im_protocols & NFC_PROTO_MIFARE_MASK) *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_MIFARE; if (im_protocols & NFC_PROTO_ISO14443_MASK) *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_ISOA; if (im_protocols & NFC_PROTO_NFC_DEP_MASK) { *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_DEP_PASSIVE_106; *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_DEP_PASSIVE_212; *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_DEP_PASSIVE_424; } } if (im_protocols & NFC_PROTO_FELICA_MASK || im_protocols & NFC_PROTO_NFC_DEP_MASK) { *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_FELICA212; *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_FELICA424; } if (im_protocols & NFC_PROTO_JEWEL_MASK) *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_JEWEL; if (im_protocols & NFC_PROTO_ISO14443_B_MASK) *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_ISOB; if (tm_protocols) *((u8 *)skb_put(skb, sizeof(u8))) = PN532_AUTOPOLL_TYPE_DEP_ACTIVE_106; rc = pn533_send_cmd_async(dev, PN533_CMD_IN_AUTOPOLL, skb, pn533_autopoll_complete, NULL); if (rc < 0) dev_kfree_skb(skb); else dev->poll_mod_count++; return rc; } pn533_poll_create_mod_list(dev, im_protocols, tm_protocols); /* Do not always start polling from the same modulation */ get_random_bytes(&rand_mod, sizeof(rand_mod)); rand_mod %= dev->poll_mod_count; 
dev->poll_mod_curr = rand_mod; cur_mod = dev->poll_mod_active[dev->poll_mod_curr]; rc = pn533_send_poll_frame(dev); /* Start listen timer */ if (!rc && cur_mod->len == 0 && dev->poll_mod_count > 1) mod_timer(&dev->listen_timer, jiffies + PN533_LISTEN_TIME * HZ); return rc; } static void pn533_stop_poll(struct nfc_dev *nfc_dev) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); del_timer(&dev->listen_timer); if (!dev->poll_mod_count) { dev_dbg(dev->dev, "Polling operation was not running\n"); return; } dev->phy_ops->abort_cmd(dev, GFP_KERNEL); flush_delayed_work(&dev->poll_work); pn533_poll_reset_mod_list(dev); } static int pn533_activate_target_nfcdep(struct pn533 *dev) { struct pn533_cmd_activate_response *rsp; u16 gt_len; int rc; struct sk_buff *skb; struct sk_buff *resp; skb = pn533_alloc_skb(dev, sizeof(u8) * 2); /*TG + Next*/ if (!skb) return -ENOMEM; skb_put_u8(skb, 1); /* TG */ skb_put_u8(skb, 0); /* Next */ resp = pn533_send_cmd_sync(dev, PN533_CMD_IN_ATR, skb); if (IS_ERR(resp)) return PTR_ERR(resp); rsp = (struct pn533_cmd_activate_response *)resp->data; rc = rsp->status & PN533_CMD_RET_MASK; if (rc != PN533_CMD_RET_SUCCESS) { nfc_err(dev->dev, "Target activation failed (error 0x%x)\n", rc); dev_kfree_skb(resp); return -EIO; } /* ATR_RES general bytes are located at offset 16 */ gt_len = resp->len - 16; rc = nfc_set_remote_general_bytes(dev->nfc_dev, rsp->gt, gt_len); dev_kfree_skb(resp); return rc; } static int pn533_activate_target(struct nfc_dev *nfc_dev, struct nfc_target *target, u32 protocol) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); int rc; dev_dbg(dev->dev, "%s: protocol=%u\n", __func__, protocol); if (dev->poll_mod_count) { nfc_err(dev->dev, "Cannot activate while polling\n"); return -EBUSY; } if (dev->tgt_active_prot) { nfc_err(dev->dev, "There is already an active target\n"); return -EBUSY; } if (!dev->tgt_available_prots) { nfc_err(dev->dev, "There is no available target to activate\n"); return -EINVAL; } if (!(dev->tgt_available_prots & (1 << protocol))) { nfc_err(dev->dev, "Target doesn't support requested proto %u\n", protocol); return -EINVAL; } if (protocol == NFC_PROTO_NFC_DEP) { rc = pn533_activate_target_nfcdep(dev); if (rc) { nfc_err(dev->dev, "Activating target with DEP failed %d\n", rc); return rc; } } dev->tgt_active_prot = protocol; dev->tgt_available_prots = 0; return 0; } static int pn533_deactivate_target_complete(struct pn533 *dev, void *arg, struct sk_buff *resp) { int rc = 0; if (IS_ERR(resp)) { rc = PTR_ERR(resp); nfc_err(dev->dev, "Target release error %d\n", rc); return rc; } rc = resp->data[0] & PN533_CMD_RET_MASK; if (rc != PN533_CMD_RET_SUCCESS) nfc_err(dev->dev, "Error 0x%x when releasing the target\n", rc); dev_kfree_skb(resp); return rc; } static void pn533_deactivate_target(struct nfc_dev *nfc_dev, struct nfc_target *target, u8 mode) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); struct sk_buff *skb; int rc; if (!dev->tgt_active_prot) { nfc_err(dev->dev, "There is no active target\n"); return; } dev->tgt_active_prot = 0; skb_queue_purge(&dev->resp_q); skb = pn533_alloc_skb(dev, sizeof(u8)); if (!skb) return; skb_put_u8(skb, 1); /* TG*/ rc = pn533_send_cmd_async(dev, PN533_CMD_IN_RELEASE, skb, pn533_deactivate_target_complete, NULL); if (rc < 0) { dev_kfree_skb(skb); nfc_err(dev->dev, "Target release error %d\n", rc); } } static int pn533_in_dep_link_up_complete(struct pn533 *dev, void *arg, struct sk_buff *resp) { struct pn533_cmd_jump_dep_response *rsp; u8 target_gt_len; int rc; u8 active = *(u8 *)arg; kfree(arg); if (IS_ERR(resp)) 
return PTR_ERR(resp); if (dev->tgt_available_prots && !(dev->tgt_available_prots & (1 << NFC_PROTO_NFC_DEP))) { nfc_err(dev->dev, "The target does not support DEP\n"); rc = -EINVAL; goto error; } rsp = (struct pn533_cmd_jump_dep_response *)resp->data; rc = rsp->status & PN533_CMD_RET_MASK; if (rc != PN533_CMD_RET_SUCCESS) { nfc_err(dev->dev, "Bringing DEP link up failed (error 0x%x)\n", rc); goto error; } if (!dev->tgt_available_prots) { struct nfc_target nfc_target; dev_dbg(dev->dev, "Creating new target\n"); memset(&nfc_target, 0, sizeof(struct nfc_target)); nfc_target.supported_protocols = NFC_PROTO_NFC_DEP_MASK; nfc_target.nfcid1_len = 10; memcpy(nfc_target.nfcid1, rsp->nfcid3t, nfc_target.nfcid1_len); rc = nfc_targets_found(dev->nfc_dev, &nfc_target, 1); if (rc) goto error; dev->tgt_available_prots = 0; } dev->tgt_active_prot = NFC_PROTO_NFC_DEP; /* ATR_RES general bytes are located at offset 17 */ target_gt_len = resp->len - 17; rc = nfc_set_remote_general_bytes(dev->nfc_dev, rsp->gt, target_gt_len); if (rc == 0) rc = nfc_dep_link_is_up(dev->nfc_dev, dev->nfc_dev->targets[0].idx, !active, NFC_RF_INITIATOR); error: dev_kfree_skb(resp); return rc; } static int pn533_rf_field(struct nfc_dev *nfc_dev, u8 rf); static int pn533_dep_link_up(struct nfc_dev *nfc_dev, struct nfc_target *target, u8 comm_mode, u8 *gb, size_t gb_len) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); struct sk_buff *skb; int rc, skb_len; u8 *next, *arg, nfcid3[NFC_NFCID3_MAXSIZE]; u8 passive_data[PASSIVE_DATA_LEN] = {0x00, 0xff, 0xff, 0x00, 0x3}; if (dev->poll_mod_count) { nfc_err(dev->dev, "Cannot bring the DEP link up while polling\n"); return -EBUSY; } if (dev->tgt_active_prot) { nfc_err(dev->dev, "There is already an active target\n"); return -EBUSY; } skb_len = 3 + gb_len; /* ActPass + BR + Next */ skb_len += PASSIVE_DATA_LEN; /* NFCID3 */ skb_len += NFC_NFCID3_MAXSIZE; if (target && !target->nfcid2_len) { nfcid3[0] = 0x1; nfcid3[1] = 0xfe; get_random_bytes(nfcid3 + 2, 6); } skb = pn533_alloc_skb(dev, skb_len); if (!skb) return -ENOMEM; skb_put_u8(skb, !comm_mode); /* ActPass */ skb_put_u8(skb, 0x02); /* 424 kbps */ next = skb_put(skb, 1); /* Next */ *next = 0; /* Copy passive data */ skb_put_data(skb, passive_data, PASSIVE_DATA_LEN); *next |= 1; /* Copy NFCID3 (which is NFCID2 from SENSF_RES) */ if (target && target->nfcid2_len) memcpy(skb_put(skb, NFC_NFCID3_MAXSIZE), target->nfcid2, target->nfcid2_len); else skb_put_data(skb, nfcid3, NFC_NFCID3_MAXSIZE); *next |= 2; if (gb != NULL && gb_len > 0) { skb_put_data(skb, gb, gb_len); *next |= 4; /* We have some Gi */ } else { *next = 0; } arg = kmalloc(sizeof(*arg), GFP_KERNEL); if (!arg) { dev_kfree_skb(skb); return -ENOMEM; } *arg = !comm_mode; pn533_rf_field(dev->nfc_dev, 0); rc = pn533_send_cmd_async(dev, PN533_CMD_IN_JUMP_FOR_DEP, skb, pn533_in_dep_link_up_complete, arg); if (rc < 0) { dev_kfree_skb(skb); kfree(arg); } return rc; } static int pn533_dep_link_down(struct nfc_dev *nfc_dev) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); pn533_poll_reset_mod_list(dev); if (dev->tgt_mode || dev->tgt_active_prot) dev->phy_ops->abort_cmd(dev, GFP_KERNEL); dev->tgt_active_prot = 0; dev->tgt_mode = 0; skb_queue_purge(&dev->resp_q); return 0; } struct pn533_data_exchange_arg { data_exchange_cb_t cb; void *cb_context; }; static struct sk_buff *pn533_build_response(struct pn533 *dev) { struct sk_buff *skb, *tmp, *t; unsigned int skb_len = 0, tmp_len = 0; if (skb_queue_empty(&dev->resp_q)) return NULL; if (skb_queue_len(&dev->resp_q) == 1) { skb = 
skb_dequeue(&dev->resp_q); goto out; } skb_queue_walk_safe(&dev->resp_q, tmp, t) skb_len += tmp->len; dev_dbg(dev->dev, "%s total length %d\n", __func__, skb_len); skb = alloc_skb(skb_len, GFP_KERNEL); if (skb == NULL) goto out; skb_put(skb, skb_len); skb_queue_walk_safe(&dev->resp_q, tmp, t) { memcpy(skb->data + tmp_len, tmp->data, tmp->len); tmp_len += tmp->len; } out: skb_queue_purge(&dev->resp_q); return skb; } static int pn533_data_exchange_complete(struct pn533 *dev, void *_arg, struct sk_buff *resp) { struct pn533_data_exchange_arg *arg = _arg; struct sk_buff *skb; int rc = 0; u8 status, ret, mi; if (IS_ERR(resp)) { rc = PTR_ERR(resp); goto _error; } status = resp->data[0]; ret = status & PN533_CMD_RET_MASK; mi = status & PN533_CMD_MI_MASK; skb_pull(resp, sizeof(status)); if (ret != PN533_CMD_RET_SUCCESS) { nfc_err(dev->dev, "Exchanging data failed (error 0x%x)\n", ret); rc = -EIO; goto error; } skb_queue_tail(&dev->resp_q, resp); if (mi) { dev->cmd_complete_mi_arg = arg; queue_work(dev->wq, &dev->mi_rx_work); return -EINPROGRESS; } /* Prepare for the next round */ if (skb_queue_len(&dev->fragment_skb) > 0) { dev->cmd_complete_dep_arg = arg; queue_work(dev->wq, &dev->mi_tx_work); return -EINPROGRESS; } skb = pn533_build_response(dev); if (!skb) { rc = -ENOMEM; goto error; } arg->cb(arg->cb_context, skb, 0); kfree(arg); return 0; error: dev_kfree_skb(resp); _error: skb_queue_purge(&dev->resp_q); arg->cb(arg->cb_context, NULL, rc); kfree(arg); return rc; } /* * Receive an incoming pn533 frame. skb contains only header and payload. * If skb == NULL, it is a notification that the link below is dead. */ void pn533_recv_frame(struct pn533 *dev, struct sk_buff *skb, int status) { if (!dev->cmd) goto sched_wq; dev->cmd->status = status; if (status != 0) { dev_dbg(dev->dev, "%s: Error received: %d\n", __func__, status); goto sched_wq; } if (skb == NULL) { dev_err(dev->dev, "NULL Frame -> link is dead\n"); goto sched_wq; } if (pn533_rx_frame_is_ack(skb->data)) { dev_dbg(dev->dev, "%s: Received ACK frame\n", __func__); dev_kfree_skb(skb); return; } print_hex_dump_debug("PN533 RX: ", DUMP_PREFIX_NONE, 16, 1, skb->data, dev->ops->rx_frame_size(skb->data), false); if (!dev->ops->rx_is_frame_valid(skb->data, dev)) { nfc_err(dev->dev, "Received an invalid frame\n"); dev->cmd->status = -EIO; } else if (!pn533_rx_frame_is_cmd_response(dev, skb->data)) { nfc_err(dev->dev, "It it not the response to the last command\n"); dev->cmd->status = -EIO; } dev->cmd->resp = skb; sched_wq: queue_work(dev->wq, &dev->cmd_complete_work); } EXPORT_SYMBOL(pn533_recv_frame); /* Split the Tx skb into small chunks */ static int pn533_fill_fragment_skbs(struct pn533 *dev, struct sk_buff *skb) { struct sk_buff *frag; int frag_size; do { /* Remaining size */ if (skb->len > PN533_CMD_DATAFRAME_MAXLEN) frag_size = PN533_CMD_DATAFRAME_MAXLEN; else frag_size = skb->len; /* Allocate and reserve */ frag = pn533_alloc_skb(dev, frag_size); if (!frag) { skb_queue_purge(&dev->fragment_skb); return -ENOMEM; } if (!dev->tgt_mode) { /* Reserve the TG/MI byte */ skb_reserve(frag, 1); /* MI + TG */ if (frag_size == PN533_CMD_DATAFRAME_MAXLEN) *(u8 *)skb_push(frag, sizeof(u8)) = (PN533_CMD_MI_MASK | 1); else *(u8 *)skb_push(frag, sizeof(u8)) = 1; /* TG */ } skb_put_data(frag, skb->data, frag_size); /* Reduce the size of incoming buffer */ skb_pull(skb, frag_size); /* Add this to skb_queue */ skb_queue_tail(&dev->fragment_skb, frag); } while (skb->len > 0); dev_kfree_skb(skb); return skb_queue_len(&dev->fragment_skb); } static int 
pn533_transceive(struct nfc_dev *nfc_dev, struct nfc_target *target, struct sk_buff *skb, data_exchange_cb_t cb, void *cb_context) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); struct pn533_data_exchange_arg *arg = NULL; int rc; if (!dev->tgt_active_prot) { nfc_err(dev->dev, "Can't exchange data if there is no active target\n"); rc = -EINVAL; goto error; } arg = kmalloc(sizeof(*arg), GFP_KERNEL); if (!arg) { rc = -ENOMEM; goto error; } arg->cb = cb; arg->cb_context = cb_context; switch (dev->device_type) { case PN533_DEVICE_PASORI: if (dev->tgt_active_prot == NFC_PROTO_FELICA) { rc = pn533_send_data_async(dev, PN533_CMD_IN_COMM_THRU, skb, pn533_data_exchange_complete, arg); break; } fallthrough; default: /* jumbo frame ? */ if (skb->len > PN533_CMD_DATAEXCH_DATA_MAXLEN) { rc = pn533_fill_fragment_skbs(dev, skb); if (rc < 0) goto error; skb = skb_dequeue(&dev->fragment_skb); if (!skb) { rc = -EIO; goto error; } } else { *(u8 *)skb_push(skb, sizeof(u8)) = 1; /* TG */ } rc = pn533_send_data_async(dev, PN533_CMD_IN_DATA_EXCHANGE, skb, pn533_data_exchange_complete, arg); break; } if (rc < 0) /* rc from send_async */ goto error; return 0; error: kfree(arg); dev_kfree_skb(skb); return rc; } static int pn533_tm_send_complete(struct pn533 *dev, void *arg, struct sk_buff *resp) { u8 status; if (IS_ERR(resp)) return PTR_ERR(resp); status = resp->data[0]; /* Prepare for the next round */ if (skb_queue_len(&dev->fragment_skb) > 0) { queue_work(dev->wq, &dev->mi_tm_tx_work); return -EINPROGRESS; } dev_kfree_skb(resp); if (status != 0) { nfc_tm_deactivated(dev->nfc_dev); dev->tgt_mode = 0; return 0; } queue_work(dev->wq, &dev->tg_work); return 0; } static int pn533_tm_send(struct nfc_dev *nfc_dev, struct sk_buff *skb) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); int rc; /* let's split in multiple chunks if size's too big */ if (skb->len > PN533_CMD_DATAEXCH_DATA_MAXLEN) { rc = pn533_fill_fragment_skbs(dev, skb); if (rc < 0) goto error; /* get the first skb */ skb = skb_dequeue(&dev->fragment_skb); if (!skb) { rc = -EIO; goto error; } rc = pn533_send_data_async(dev, PN533_CMD_TG_SET_META_DATA, skb, pn533_tm_send_complete, NULL); } else { /* Send th skb */ rc = pn533_send_data_async(dev, PN533_CMD_TG_SET_DATA, skb, pn533_tm_send_complete, NULL); } error: if (rc < 0) { dev_kfree_skb(skb); skb_queue_purge(&dev->fragment_skb); } return rc; } static void pn533_wq_mi_recv(struct work_struct *work) { struct pn533 *dev = container_of(work, struct pn533, mi_rx_work); struct sk_buff *skb; int rc; skb = pn533_alloc_skb(dev, PN533_CMD_DATAEXCH_HEAD_LEN); if (!skb) goto error; switch (dev->device_type) { case PN533_DEVICE_PASORI: if (dev->tgt_active_prot == NFC_PROTO_FELICA) { rc = pn533_send_cmd_direct_async(dev, PN533_CMD_IN_COMM_THRU, skb, pn533_data_exchange_complete, dev->cmd_complete_mi_arg); break; } fallthrough; default: skb_put_u8(skb, 1); /*TG*/ rc = pn533_send_cmd_direct_async(dev, PN533_CMD_IN_DATA_EXCHANGE, skb, pn533_data_exchange_complete, dev->cmd_complete_mi_arg); break; } if (rc == 0) /* success */ return; nfc_err(dev->dev, "Error %d when trying to perform data_exchange\n", rc); dev_kfree_skb(skb); kfree(dev->cmd_complete_mi_arg); error: dev->phy_ops->send_ack(dev, GFP_KERNEL); queue_work(dev->wq, &dev->cmd_work); } static void pn533_wq_mi_send(struct work_struct *work) { struct pn533 *dev = container_of(work, struct pn533, mi_tx_work); struct sk_buff *skb; int rc; /* Grab the first skb in the queue */ skb = skb_dequeue(&dev->fragment_skb); if (skb == NULL) { /* No more data */ /* Reset the 
queue for future use */ skb_queue_head_init(&dev->fragment_skb); goto error; } switch (dev->device_type) { case PN533_DEVICE_PASORI: if (dev->tgt_active_prot != NFC_PROTO_FELICA) { rc = -EIO; break; } rc = pn533_send_cmd_direct_async(dev, PN533_CMD_IN_COMM_THRU, skb, pn533_data_exchange_complete, dev->cmd_complete_dep_arg); break; default: /* Still some fragments? */ rc = pn533_send_cmd_direct_async(dev, PN533_CMD_IN_DATA_EXCHANGE, skb, pn533_data_exchange_complete, dev->cmd_complete_dep_arg); break; } if (rc == 0) /* success */ return; nfc_err(dev->dev, "Error %d when trying to perform data_exchange\n", rc); dev_kfree_skb(skb); kfree(dev->cmd_complete_dep_arg); error: dev->phy_ops->send_ack(dev, GFP_KERNEL); queue_work(dev->wq, &dev->cmd_work); } static int pn533_set_configuration(struct pn533 *dev, u8 cfgitem, u8 *cfgdata, u8 cfgdata_len) { struct sk_buff *skb; struct sk_buff *resp; int skb_len; skb_len = sizeof(cfgitem) + cfgdata_len; /* cfgitem + cfgdata */ skb = pn533_alloc_skb(dev, skb_len); if (!skb) return -ENOMEM; skb_put_u8(skb, cfgitem); skb_put_data(skb, cfgdata, cfgdata_len); resp = pn533_send_cmd_sync(dev, PN533_CMD_RF_CONFIGURATION, skb); if (IS_ERR(resp)) return PTR_ERR(resp); dev_kfree_skb(resp); return 0; } static int pn533_get_firmware_version(struct pn533 *dev, struct pn533_fw_version *fv) { struct sk_buff *skb; struct sk_buff *resp; skb = pn533_alloc_skb(dev, 0); if (!skb) return -ENOMEM; resp = pn533_send_cmd_sync(dev, PN533_CMD_GET_FIRMWARE_VERSION, skb); if (IS_ERR(resp)) return PTR_ERR(resp); fv->ic = resp->data[0]; fv->ver = resp->data[1]; fv->rev = resp->data[2]; fv->support = resp->data[3]; dev_kfree_skb(resp); return 0; } static int pn533_pasori_fw_reset(struct pn533 *dev) { struct sk_buff *skb; struct sk_buff *resp; skb = pn533_alloc_skb(dev, sizeof(u8)); if (!skb) return -ENOMEM; skb_put_u8(skb, 0x1); resp = pn533_send_cmd_sync(dev, 0x18, skb); if (IS_ERR(resp)) return PTR_ERR(resp); dev_kfree_skb(resp); return 0; } static int pn533_rf_field(struct nfc_dev *nfc_dev, u8 rf) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); u8 rf_field = !!rf; int rc; rf_field |= PN533_CFGITEM_RF_FIELD_AUTO_RFCA; rc = pn533_set_configuration(dev, PN533_CFGITEM_RF_FIELD, (u8 *)&rf_field, 1); if (rc) { nfc_err(dev->dev, "Error on setting RF field\n"); return rc; } return 0; } static int pn532_sam_configuration(struct nfc_dev *nfc_dev) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); struct sk_buff *skb; struct sk_buff *resp; skb = pn533_alloc_skb(dev, 1); if (!skb) return -ENOMEM; skb_put_u8(skb, 0x01); resp = pn533_send_cmd_sync(dev, PN533_CMD_SAM_CONFIGURATION, skb); if (IS_ERR(resp)) return PTR_ERR(resp); dev_kfree_skb(resp); return 0; } static int pn533_dev_up(struct nfc_dev *nfc_dev) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); int rc; if (dev->phy_ops->dev_up) { rc = dev->phy_ops->dev_up(dev); if (rc) return rc; } if ((dev->device_type == PN533_DEVICE_PN532) || (dev->device_type == PN533_DEVICE_PN532_AUTOPOLL)) { rc = pn532_sam_configuration(nfc_dev); if (rc) return rc; } return pn533_rf_field(nfc_dev, 1); } static int pn533_dev_down(struct nfc_dev *nfc_dev) { struct pn533 *dev = nfc_get_drvdata(nfc_dev); int ret; ret = pn533_rf_field(nfc_dev, 0); if (dev->phy_ops->dev_down && !ret) ret = dev->phy_ops->dev_down(dev); return ret; } static const struct nfc_ops pn533_nfc_ops = { .dev_up = pn533_dev_up, .dev_down = pn533_dev_down, .dep_link_up = pn533_dep_link_up, .dep_link_down = pn533_dep_link_down, .start_poll = pn533_start_poll, .stop_poll = pn533_stop_poll, 
.activate_target = pn533_activate_target, .deactivate_target = pn533_deactivate_target, .im_transceive = pn533_transceive, .tm_send = pn533_tm_send, }; static int pn533_setup(struct pn533 *dev) { struct pn533_config_max_retries max_retries; struct pn533_config_timing timing; u8 pasori_cfg[3] = {0x08, 0x01, 0x08}; int rc; switch (dev->device_type) { case PN533_DEVICE_STD: case PN533_DEVICE_PASORI: case PN533_DEVICE_ACR122U: case PN533_DEVICE_PN532: case PN533_DEVICE_PN532_AUTOPOLL: max_retries.mx_rty_atr = 0x2; max_retries.mx_rty_psl = 0x1; max_retries.mx_rty_passive_act = PN533_CONFIG_MAX_RETRIES_NO_RETRY; timing.rfu = PN533_CONFIG_TIMING_102; timing.atr_res_timeout = PN533_CONFIG_TIMING_102; timing.dep_timeout = PN533_CONFIG_TIMING_204; break; default: nfc_err(dev->dev, "Unknown device type %d\n", dev->device_type); return -EINVAL; } rc = pn533_set_configuration(dev, PN533_CFGITEM_MAX_RETRIES, (u8 *)&max_retries, sizeof(max_retries)); if (rc) { nfc_err(dev->dev, "Error on setting MAX_RETRIES config\n"); return rc; } rc = pn533_set_configuration(dev, PN533_CFGITEM_TIMING, (u8 *)&timing, sizeof(timing)); if (rc) { nfc_err(dev->dev, "Error on setting RF timings\n"); return rc; } switch (dev->device_type) { case PN533_DEVICE_STD: case PN533_DEVICE_PN532: case PN533_DEVICE_PN532_AUTOPOLL: break; case PN533_DEVICE_PASORI: pn533_pasori_fw_reset(dev); rc = pn533_set_configuration(dev, PN533_CFGITEM_PASORI, pasori_cfg, 3); if (rc) { nfc_err(dev->dev, "Error while settings PASORI config\n"); return rc; } pn533_pasori_fw_reset(dev); break; } return 0; } int pn533_finalize_setup(struct pn533 *dev) { struct pn533_fw_version fw_ver; int rc; memset(&fw_ver, 0, sizeof(fw_ver)); rc = pn533_get_firmware_version(dev, &fw_ver); if (rc) { nfc_err(dev->dev, "Unable to get FW version\n"); return rc; } nfc_info(dev->dev, "NXP PN5%02X firmware ver %d.%d now attached\n", fw_ver.ic, fw_ver.ver, fw_ver.rev); rc = pn533_setup(dev); if (rc) return rc; return 0; } EXPORT_SYMBOL_GPL(pn533_finalize_setup); struct pn533 *pn53x_common_init(u32 device_type, enum pn533_protocol_type protocol_type, void *phy, const struct pn533_phy_ops *phy_ops, struct pn533_frame_ops *fops, struct device *dev) { struct pn533 *priv; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) return ERR_PTR(-ENOMEM); priv->phy = phy; priv->phy_ops = phy_ops; priv->dev = dev; if (fops != NULL) priv->ops = fops; else priv->ops = &pn533_std_frame_ops; priv->protocol_type = protocol_type; priv->device_type = device_type; mutex_init(&priv->cmd_lock); INIT_WORK(&priv->cmd_work, pn533_wq_cmd); INIT_WORK(&priv->cmd_complete_work, pn533_wq_cmd_complete); INIT_WORK(&priv->mi_rx_work, pn533_wq_mi_recv); INIT_WORK(&priv->mi_tx_work, pn533_wq_mi_send); INIT_WORK(&priv->tg_work, pn533_wq_tg_get_data); INIT_WORK(&priv->mi_tm_rx_work, pn533_wq_tm_mi_recv); INIT_WORK(&priv->mi_tm_tx_work, pn533_wq_tm_mi_send); INIT_DELAYED_WORK(&priv->poll_work, pn533_wq_poll); INIT_WORK(&priv->rf_work, pn533_wq_rf); priv->wq = alloc_ordered_workqueue("pn533", 0); if (priv->wq == NULL) goto error; timer_setup(&priv->listen_timer, pn533_listen_mode_timer, 0); skb_queue_head_init(&priv->resp_q); skb_queue_head_init(&priv->fragment_skb); INIT_LIST_HEAD(&priv->cmd_queue); return priv; error: kfree(priv); return ERR_PTR(-ENOMEM); } EXPORT_SYMBOL_GPL(pn53x_common_init); void pn53x_common_clean(struct pn533 *priv) { struct pn533_cmd *cmd, *n; /* delete the timer before cleanup the worker */ timer_shutdown_sync(&priv->listen_timer); flush_delayed_work(&priv->poll_work); 
destroy_workqueue(priv->wq); skb_queue_purge(&priv->resp_q); list_for_each_entry_safe(cmd, n, &priv->cmd_queue, queue) { list_del(&cmd->queue); kfree(cmd); } kfree(priv); } EXPORT_SYMBOL_GPL(pn53x_common_clean); int pn532_i2c_nfc_alloc(struct pn533 *priv, u32 protocols, struct device *parent) { priv->nfc_dev = nfc_allocate_device(&pn533_nfc_ops, protocols, priv->ops->tx_header_len + PN533_CMD_DATAEXCH_HEAD_LEN, priv->ops->tx_tail_len); if (!priv->nfc_dev) return -ENOMEM; nfc_set_parent_dev(priv->nfc_dev, parent); nfc_set_drvdata(priv->nfc_dev, priv); return 0; } EXPORT_SYMBOL_GPL(pn532_i2c_nfc_alloc); int pn53x_register_nfc(struct pn533 *priv, u32 protocols, struct device *parent) { int rc; rc = pn532_i2c_nfc_alloc(priv, protocols, parent); if (rc) return rc; rc = nfc_register_device(priv->nfc_dev); if (rc) nfc_free_device(priv->nfc_dev); return rc; } EXPORT_SYMBOL_GPL(pn53x_register_nfc); void pn53x_unregister_nfc(struct pn533 *priv) { nfc_unregister_device(priv->nfc_dev); nfc_free_device(priv->nfc_dev); } EXPORT_SYMBOL_GPL(pn53x_unregister_nfc); MODULE_AUTHOR("Lauro Ramos Venancio <lauro.venancio@openbossa.org>"); MODULE_AUTHOR("Aloisio Almeida Jr <aloisio.almeida@openbossa.org>"); MODULE_AUTHOR("Waldemar Rymarkiewicz <waldemar.rymarkiewicz@tieto.com>"); MODULE_DESCRIPTION("PN533 driver ver " VERSION); MODULE_VERSION(VERSION); MODULE_LICENSE("GPL"); |
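/*
 * Editor's note -- illustrative sketch, not part of the driver above: a
 * minimal outline of how a transport ("phy") frontend is expected to use the
 * exports from this file (pn53x_common_init(), pn53x_register_nfc(),
 * pn533_finalize_setup(), pn53x_unregister_nfc(), pn53x_common_clean(), and
 * pn533_recv_frame() from its RX path). The names my_phy_ops and
 * my_xfer_probe() and the protocol mask are hypothetical; the in-tree
 * i2c/uart/usb frontends differ in detail.
 */
#include "pn533.h"	/* assumed frontend include, as in the in-tree frontends */

static const struct pn533_phy_ops my_phy_ops = {
	/*
	 * Filled in by the transport: the core calls (at least) ->send_ack(),
	 * ->abort_cmd() and the optional ->dev_up()/->dev_down() hooks seen
	 * in this file; frame TX callbacks are transport specific.
	 */
};

static int my_xfer_probe(struct device *parent, void *xfer_handle,
			 enum pn533_protocol_type proto)
{
	struct pn533 *priv;
	int rc;

	/* Common state; passing NULL for fops selects pn533_std_frame_ops. */
	priv = pn53x_common_init(PN533_DEVICE_PN532, proto, xfer_handle,
				 &my_phy_ops, NULL, parent);
	if (IS_ERR(priv))
		return PTR_ERR(priv);

	/* Expose an NFC device for whatever this transport can drive. */
	rc = pn53x_register_nfc(priv, NFC_PROTO_MIFARE_MASK |
				      NFC_PROTO_ISO14443_MASK |
				      NFC_PROTO_NFC_DEP_MASK, parent);
	if (rc)
		goto err_clean;

	/* Read the firmware version and apply the static RF configuration. */
	rc = pn533_finalize_setup(priv);
	if (rc)
		goto err_unregister;

	return 0;

err_unregister:
	pn53x_unregister_nfc(priv);
err_clean:
	pn53x_common_clean(priv);
	return rc;
}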
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system. INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * Definitions for the TCP protocol. * * Version: @(#)tcp.h 1.0.2 04/28/93 * * Author: Fred N.
van Kempen, <waltje@uWalt.NL.Mugnet.ORG> */ #ifndef _LINUX_TCP_H #define _LINUX_TCP_H #include <linux/skbuff.h> #include <linux/win_minmax.h> #include <net/sock.h> #include <net/inet_connection_sock.h> #include <net/inet_timewait_sock.h> #include <uapi/linux/tcp.h> static inline struct tcphdr *tcp_hdr(const struct sk_buff *skb) { return (struct tcphdr *)skb_transport_header(skb); } static inline unsigned int __tcp_hdrlen(const struct tcphdr *th) { return th->doff * 4; } static inline unsigned int tcp_hdrlen(const struct sk_buff *skb) { return __tcp_hdrlen(tcp_hdr(skb)); } static inline struct tcphdr *inner_tcp_hdr(const struct sk_buff *skb) { return (struct tcphdr *)skb_inner_transport_header(skb); } static inline unsigned int inner_tcp_hdrlen(const struct sk_buff *skb) { return inner_tcp_hdr(skb)->doff * 4; } /** * skb_tcp_all_headers - Returns size of all headers for a TCP packet * @skb: buffer * * Used in TX path, for a packet known to be a TCP one. * * if (skb_is_gso(skb)) { * int hlen = skb_tcp_all_headers(skb); * ... */ static inline int skb_tcp_all_headers(const struct sk_buff *skb) { return skb_transport_offset(skb) + tcp_hdrlen(skb); } /** * skb_inner_tcp_all_headers - Returns size of all headers for an encap TCP packet * @skb: buffer * * Used in TX path, for a packet known to be a TCP one. * * if (skb_is_gso(skb) && skb->encapsulation) { * int hlen = skb_inner_tcp_all_headers(skb); * ... */ static inline int skb_inner_tcp_all_headers(const struct sk_buff *skb) { return skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb); } static inline unsigned int tcp_optlen(const struct sk_buff *skb) { return (tcp_hdr(skb)->doff - 5) * 4; } /* TCP Fast Open */ #define TCP_FASTOPEN_COOKIE_MIN 4 /* Min Fast Open Cookie size in bytes */ #define TCP_FASTOPEN_COOKIE_MAX 16 /* Max Fast Open Cookie size in bytes */ #define TCP_FASTOPEN_COOKIE_SIZE 8 /* the size employed by this impl. */ /* TCP Fast Open Cookie as stored in memory */ struct tcp_fastopen_cookie { __le64 val[DIV_ROUND_UP(TCP_FASTOPEN_COOKIE_MAX, sizeof(u64))]; s8 len; bool exp; /* In RFC6994 experimental option format */ }; /* This defines a selective acknowledgement block. 
*/ struct tcp_sack_block_wire { __be32 start_seq; __be32 end_seq; }; struct tcp_sack_block { u32 start_seq; u32 end_seq; }; /*These are used to set the sack_ok field in struct tcp_options_received */ #define TCP_SACK_SEEN (1 << 0) /*1 = peer is SACK capable, */ #define TCP_DSACK_SEEN (1 << 2) /*1 = DSACK was received from peer*/ struct tcp_options_received { /* PAWS/RTTM data */ int ts_recent_stamp;/* Time we stored ts_recent (for aging) */ u32 ts_recent; /* Time stamp to echo next */ u32 rcv_tsval; /* Time stamp value */ u32 rcv_tsecr; /* Time stamp echo reply */ u16 saw_tstamp : 1, /* Saw TIMESTAMP on last packet */ tstamp_ok : 1, /* TIMESTAMP seen on SYN packet */ dsack : 1, /* D-SACK is scheduled */ wscale_ok : 1, /* Wscale seen on SYN packet */ sack_ok : 3, /* SACK seen on SYN packet */ smc_ok : 1, /* SMC seen on SYN packet */ snd_wscale : 4, /* Window scaling received from sender */ rcv_wscale : 4; /* Window scaling to send to receiver */ u8 saw_unknown:1, /* Received unknown option */ unused:7; u8 num_sacks; /* Number of SACK blocks */ u16 user_mss; /* mss requested by user in ioctl */ u16 mss_clamp; /* Maximal mss, negotiated at connection setup */ }; static inline void tcp_clear_options(struct tcp_options_received *rx_opt) { rx_opt->tstamp_ok = rx_opt->sack_ok = 0; rx_opt->wscale_ok = rx_opt->snd_wscale = 0; #if IS_ENABLED(CONFIG_SMC) rx_opt->smc_ok = 0; #endif } /* This is the max number of SACKS that we'll generate and process. It's safe * to increase this, although since: * size = TCPOLEN_SACK_BASE_ALIGNED (4) + n * TCPOLEN_SACK_PERBLOCK (8) * only four options will fit in a standard TCP header */ #define TCP_NUM_SACKS 4 struct tcp_request_sock_ops; struct tcp_request_sock { struct inet_request_sock req; const struct tcp_request_sock_ops *af_specific; u64 snt_synack; /* first SYNACK sent time */ bool tfo_listener; bool is_mptcp; bool req_usec_ts; #if IS_ENABLED(CONFIG_MPTCP) bool drop_req; #endif u32 txhash; u32 rcv_isn; u32 snt_isn; u32 ts_off; u32 last_oow_ack_time; /* last SYNACK */ u32 rcv_nxt; /* the ack # by SYNACK. For * FastOpen it's the seq# * after data-in-SYN. */ u8 syn_tos; #ifdef CONFIG_TCP_AO u8 ao_keyid; u8 ao_rcv_next; bool used_tcp_ao; #endif }; static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) { return (struct tcp_request_sock *)req; } static inline bool tcp_rsk_used_ao(const struct request_sock *req) { #ifndef CONFIG_TCP_AO return false; #else return tcp_rsk(req)->used_tcp_ao; #endif } #define TCP_RMEM_TO_WIN_SCALE 8 struct tcp_sock { /* Cacheline organization can be found documented in * Documentation/networking/net_cachelines/tcp_sock.rst. * Please update the document when adding new fields. */ /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; /* TX read-mostly hotpath cache lines */ __cacheline_group_begin(tcp_sock_read_tx); /* timestamp of last sent data packet (for restart window) */ u32 max_window; /* Maximal window ever seen from peer */ u32 rcv_ssthresh; /* Current window clamp */ u32 reordering; /* Packet reordering metric. 
*/ u32 notsent_lowat; /* TCP_NOTSENT_LOWAT */ u16 gso_segs; /* Max number of segs per GSO packet */ /* from STCP, retrans queue hinting */ struct sk_buff *lost_skb_hint; struct sk_buff *retransmit_skb_hint; __cacheline_group_end(tcp_sock_read_tx); /* TXRX read-mostly hotpath cache lines */ __cacheline_group_begin(tcp_sock_read_txrx); u32 tsoffset; /* timestamp offset */ u32 snd_wnd; /* The window we expect to receive */ u32 mss_cache; /* Cached effective mss, not including SACKS */ u32 snd_cwnd; /* Sending congestion window */ u32 prr_out; /* Total number of pkts sent during Recovery. */ u32 lost_out; /* Lost packets */ u32 sacked_out; /* SACK'd packets */ u16 tcp_header_len; /* Bytes of tcp header to send */ u8 scaling_ratio; /* see tcp_win_from_space() */ u8 chrono_type : 2, /* current chronograph type */ repair : 1, tcp_usec_ts : 1, /* TSval values in usec */ is_sack_reneg:1, /* in recovery from loss with SACK reneg? */ is_cwnd_limited:1;/* forward progress limited by snd_cwnd? */ __cacheline_group_end(tcp_sock_read_txrx); /* RX read-mostly hotpath cache lines */ __cacheline_group_begin(tcp_sock_read_rx); u32 copied_seq; /* Head of yet unread data */ u32 rcv_tstamp; /* timestamp of last received ACK (for keepalives) */ u32 snd_wl1; /* Sequence for window update */ u32 tlp_high_seq; /* snd_nxt at the time of TLP */ u32 rttvar_us; /* smoothed mdev_max */ u32 retrans_out; /* Retransmitted packets out */ u16 advmss; /* Advertised MSS */ u16 urg_data; /* Saved octet of OOB data and control flags */ u32 lost; /* Total data packets lost incl. rexmits */ struct minmax rtt_min; /* OOO segments go in this rbtree. Socket lock must be held. */ struct rb_root out_of_order_queue; u32 snd_ssthresh; /* Slow start size threshold */ u8 recvmsg_inq : 1;/* Indicate # of bytes in queue upon recvmsg */ __cacheline_group_end(tcp_sock_read_rx); /* TX read-write hotpath cache lines */ __cacheline_group_begin(tcp_sock_write_tx) ____cacheline_aligned; u32 segs_out; /* RFC4898 tcpEStatsPerfSegsOut * The total number of segments sent. */ u32 data_segs_out; /* RFC4898 tcpEStatsPerfDataSegsOut * total number of data segments sent. */ u64 bytes_sent; /* RFC4898 tcpEStatsPerfHCDataOctetsOut * total number of data bytes sent. */ u32 snd_sml; /* Last byte of the most recently transmitted small packet */ u32 chrono_start; /* Start time in jiffies of a TCP chrono */ u32 chrono_stat[3]; /* Time in jiffies for chrono_stat stats */ u32 write_seq; /* Tail(+1) of data held in tcp send buffer */ u32 pushed_seq; /* Last pushed seq, required to talk to windows */ u32 lsndtime; u32 mdev_us; /* medium deviation */ u32 rtt_seq; /* sequence number to update rttvar */ u64 tcp_wstamp_ns; /* departure time for next sent data packet */ struct list_head tsorted_sent_queue; /* time-sorted sent but un-SACKed skbs */ struct sk_buff *highest_sack; /* skb just after the highest * skb with SACKed bit set * (validity guaranteed only if * sacked_out > 0) */ u8 ecn_flags; /* ECN status bits. 
*/ __cacheline_group_end(tcp_sock_write_tx); /* TXRX read-write hotpath cache lines */ __cacheline_group_begin(tcp_sock_write_txrx); /* * Header prediction flags * 0x5?10 << 16 + snd_wnd in net byte order */ __be32 pred_flags; u64 tcp_clock_cache; /* cache last tcp_clock_ns() (see tcp_mstamp_refresh()) */ u64 tcp_mstamp; /* most recent packet received/sent */ u32 rcv_nxt; /* What we want to receive next */ u32 snd_nxt; /* Next sequence we send */ u32 snd_una; /* First byte we want an ack for */ u32 window_clamp; /* Maximal window to advertise */ u32 srtt_us; /* smoothed round trip time << 3 in usecs */ u32 packets_out; /* Packets which are "in flight" */ u32 snd_up; /* Urgent pointer */ u32 delivered; /* Total data packets delivered incl. rexmits */ u32 delivered_ce; /* Like the above but only ECE marked packets */ u32 app_limited; /* limited until "delivered" reaches this val */ u32 rcv_wnd; /* Current receiver window */ /* * Options received (usually on last packet, some only on SYN packets). */ struct tcp_options_received rx_opt; u8 nonagle : 4,/* Disable Nagle algorithm? */ rate_app_limited:1; /* rate_{delivered,interval_us} limited? */ __cacheline_group_end(tcp_sock_write_txrx); /* RX read-write hotpath cache lines */ __cacheline_group_begin(tcp_sock_write_rx) __aligned(8); u64 bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived * sum(delta(rcv_nxt)), or how many bytes * were acked. */ u32 segs_in; /* RFC4898 tcpEStatsPerfSegsIn * total number of segments in. */ u32 data_segs_in; /* RFC4898 tcpEStatsPerfDataSegsIn * total number of data segments in. */ u32 rcv_wup; /* rcv_nxt on last window update sent */ u32 max_packets_out; /* max packets_out in last window */ u32 cwnd_usage_seq; /* right edge of cwnd usage tracking flight */ u32 rate_delivered; /* saved rate sample: packets delivered */ u32 rate_interval_us; /* saved rate sample: time elapsed */ u32 rcv_rtt_last_tsecr; u64 first_tx_mstamp; /* start of window send phase */ u64 delivered_mstamp; /* time we reached "delivered" */ u64 bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked * sum(delta(snd_una)), or how many bytes * were acked. */ struct { u32 rtt_us; u32 seq; u64 time; } rcv_rtt_est; /* Receiver queue space */ struct { u32 space; u32 seq; u64 time; } rcvq_space; __cacheline_group_end(tcp_sock_write_rx); /* End of Hot Path */ /* * RFC793 variables by their proper names. This means you can * read the code and the spec side by side (and laugh ...) * See RFC793 and RFC1122. The RFC writes these in capitals. */ u32 dsack_dups; /* RFC4898 tcpEStatsStackDSACKDups * total number of DSACK blocks received */ u32 compressed_ack_rcv_nxt; struct list_head tsq_node; /* anchor in tsq_tasklet.head list */ /* Information of the most recently (s)acked skb */ struct tcp_rack { u64 mstamp; /* (Re)sent time of the skb */ u32 rtt_us; /* Associated RTT */ u32 end_seq; /* Ending TCP sequence of the skb */ u32 last_delivered; /* tp->delivered at last reo_wnd adj */ u8 reo_wnd_steps; /* Allowed reordering window */ #define TCP_RACK_RECOVERY_THRESH 16 u8 reo_wnd_persist:5, /* No. 
of recovery since last adj */ dsack_seen:1, /* Whether DSACK seen after last adj */ advanced:1; /* mstamp advanced since last lost marking */ } rack; u8 compressed_ack; u8 dup_ack_counter:2, tlp_retrans:1, /* TLP is a retransmission */ unused:5; u8 thin_lto : 1,/* Use linear timeouts for thin streams */ fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */ fastopen_no_cookie:1, /* Allow send/recv SYN+data without a cookie */ fastopen_client_fail:2, /* reason why fastopen failed */ frto : 1;/* F-RTO (RFC5682) activated in CA_Loss */ u8 repair_queue; u8 save_syn:2, /* Save headers of SYN packet */ syn_data:1, /* SYN includes data */ syn_fastopen:1, /* SYN includes Fast Open option */ syn_fastopen_exp:1,/* SYN includes Fast Open exp. option */ syn_fastopen_ch:1, /* Active TFO re-enabling probe */ syn_data_acked:1;/* data in SYN is acked by SYN-ACK */ u8 keepalive_probes; /* num of allowed keep alive probes */ u32 tcp_tx_delay; /* delay (in usec) added to TX packets */ /* RTT measurement */ u32 mdev_max_us; /* maximal mdev for the last rtt period */ u32 reord_seen; /* number of data packet reordering events */ /* * Slow start and congestion control (see also Nagle, and Karn & Partridge) */ u32 snd_cwnd_cnt; /* Linear increase counter */ u32 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */ u32 snd_cwnd_used; u32 snd_cwnd_stamp; u32 prior_cwnd; /* cwnd right before starting loss recovery */ u32 prr_delivered; /* Number of newly delivered packets to * receiver in Recovery. */ u32 last_oow_ack_time; /* timestamp of last out-of-window ACK */ struct hrtimer pacing_timer; struct hrtimer compressed_ack_timer; struct sk_buff *ooo_last_skb; /* cache rb_last(out_of_order_queue) */ /* SACKs data, these 2 need to be together (see tcp_options_write) */ struct tcp_sack_block duplicate_sack[1]; /* D-SACK block */ struct tcp_sack_block selective_acks[4]; /* The SACKS themselves*/ struct tcp_sack_block recv_sack_cache[4]; int lost_cnt_hint; u32 prior_ssthresh; /* ssthresh saved at recovery start */ u32 high_seq; /* snd_nxt at onset of congestion */ u32 retrans_stamp; /* Timestamp of the last retransmit, * also used in SYN-SENT to remember stamp of * the first SYN. */ u32 undo_marker; /* snd_una upon a new recovery episode. */ int undo_retrans; /* number of undoable retransmissions. */ u64 bytes_retrans; /* RFC4898 tcpEStatsPerfOctetsRetrans * Total data bytes retransmitted */ u32 total_retrans; /* Total retransmits for entire connection */ u32 rto_stamp; /* Start time (ms) of last CA_Loss recovery */ u16 total_rto; /* Total number of RTO timeouts, including * SYN/SYN-ACK and recurring timeouts. */ u16 total_rto_recoveries; /* Total number of RTO recoveries, * including any unfinished recovery. */ u32 total_rto_time; /* ms spent in (completed) RTO recoveries. */ u32 urg_seq; /* Seq of received urgent pointer */ unsigned int keepalive_time; /* time before keep alive takes place */ unsigned int keepalive_intvl; /* time interval between keep alive probes */ int linger2; /* Sock_ops bpf program related variables */ #ifdef CONFIG_BPF u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs * values defined in uapi/linux/tcp.h */ u8 bpf_chg_cc_inprogress:1; /* In the middle of * bpf_setsockopt(TCP_CONGESTION), * it is to avoid the bpf_tcp_cc->init() * to recur itself by calling * bpf_setsockopt(TCP_CONGESTION, "itself"). 
*/ #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG) #else #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0 #endif u16 timeout_rehash; /* Timeout-triggered rehash attempts */ u32 rcv_ooopack; /* Received out-of-order packets, for tcpinfo */ /* TCP-specific MTU probe information. */ struct { u32 probe_seq_start; u32 probe_seq_end; } mtu_probe; u32 plb_rehash; /* PLB-triggered rehash attempts */ u32 mtu_info; /* We received an ICMP_FRAG_NEEDED / ICMPV6_PKT_TOOBIG * while socket was owned by user. */ #if IS_ENABLED(CONFIG_MPTCP) bool is_mptcp; #endif #if IS_ENABLED(CONFIG_SMC) bool syn_smc; /* SYN includes SMC */ bool (*smc_hs_congested)(const struct sock *sk); #endif #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) /* TCP AF-Specific parts; only used by TCP-AO/MD5 Signature support so far */ const struct tcp_sock_af_ops *af_specific; #ifdef CONFIG_TCP_MD5SIG /* TCP MD5 Signature Option information */ struct tcp_md5sig_info __rcu *md5sig_info; #endif #ifdef CONFIG_TCP_AO struct tcp_ao_info __rcu *ao_info; #endif #endif /* TCP fastopen related information */ struct tcp_fastopen_request *fastopen_req; /* fastopen_rsk points to request_sock that resulted in this big * socket. Used to retransmit SYNACKs etc. */ struct request_sock __rcu *fastopen_rsk; struct saved_syn *saved_syn; }; enum tsq_enum { TSQ_THROTTLED, TSQ_QUEUED, TCP_TSQ_DEFERRED, /* tcp_tasklet_func() found socket was owned */ TCP_WRITE_TIMER_DEFERRED, /* tcp_write_timer() found socket was owned */ TCP_DELACK_TIMER_DEFERRED, /* tcp_delack_timer() found socket was owned */ TCP_MTU_REDUCED_DEFERRED, /* tcp_v{4|6}_err() could not call * tcp_v{4|6}_mtu_reduced() */ TCP_ACK_DEFERRED, /* TX pure ack is deferred */ }; enum tsq_flags { TSQF_THROTTLED = BIT(TSQ_THROTTLED), TSQF_QUEUED = BIT(TSQ_QUEUED), TCPF_TSQ_DEFERRED = BIT(TCP_TSQ_DEFERRED), TCPF_WRITE_TIMER_DEFERRED = BIT(TCP_WRITE_TIMER_DEFERRED), TCPF_DELACK_TIMER_DEFERRED = BIT(TCP_DELACK_TIMER_DEFERRED), TCPF_MTU_REDUCED_DEFERRED = BIT(TCP_MTU_REDUCED_DEFERRED), TCPF_ACK_DEFERRED = BIT(TCP_ACK_DEFERRED), }; #define tcp_sk(ptr) container_of_const(ptr, struct tcp_sock, inet_conn.icsk_inet.sk) /* Variant of tcp_sk() upgrading a const sock to a read/write tcp socket. * Used in context of (lockless) tcp listeners. 
*/ #define tcp_sk_rw(ptr) container_of(ptr, struct tcp_sock, inet_conn.icsk_inet.sk) struct tcp_timewait_sock { struct inet_timewait_sock tw_sk; #define tw_rcv_nxt tw_sk.__tw_common.skc_tw_rcv_nxt #define tw_snd_nxt tw_sk.__tw_common.skc_tw_snd_nxt u32 tw_rcv_wnd; u32 tw_ts_offset; u32 tw_ts_recent; /* The time we sent the last out-of-window ACK: */ u32 tw_last_oow_ack_time; int tw_ts_recent_stamp; u32 tw_tx_delay; #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *tw_md5_key; #endif #ifdef CONFIG_TCP_AO struct tcp_ao_info __rcu *ao_info; #endif }; static inline struct tcp_timewait_sock *tcp_twsk(const struct sock *sk) { return (struct tcp_timewait_sock *)sk; } static inline bool tcp_passive_fastopen(const struct sock *sk) { return sk->sk_state == TCP_SYN_RECV && rcu_access_pointer(tcp_sk(sk)->fastopen_rsk) != NULL; } static inline void fastopen_queue_tune(struct sock *sk, int backlog) { struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue; int somaxconn = READ_ONCE(sock_net(sk)->core.sysctl_somaxconn); WRITE_ONCE(queue->fastopenq.max_qlen, min_t(unsigned int, backlog, somaxconn)); } static inline void tcp_move_syn(struct tcp_sock *tp, struct request_sock *req) { tp->saved_syn = req->saved_syn; req->saved_syn = NULL; } static inline void tcp_saved_syn_free(struct tcp_sock *tp) { kfree(tp->saved_syn); tp->saved_syn = NULL; } static inline u32 tcp_saved_syn_len(const struct saved_syn *saved_syn) { return saved_syn->mac_hdrlen + saved_syn->network_hdrlen + saved_syn->tcp_hdrlen; } struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk, const struct sk_buff *orig_skb, const struct sk_buff *ack_skb); static inline u16 tcp_mss_clamp(const struct tcp_sock *tp, u16 mss) { /* We use READ_ONCE() here because socket might not be locked. * This happens for listeners. */ u16 user_mss = READ_ONCE(tp->rx_opt.user_mss); return (user_mss && user_mss < mss) ? user_mss : mss; } int tcp_skb_shift(struct sk_buff *to, struct sk_buff *from, int pcount, int shiftlen); void __tcp_sock_set_cork(struct sock *sk, bool on); void tcp_sock_set_cork(struct sock *sk, bool on); int tcp_sock_set_keepcnt(struct sock *sk, int val); int tcp_sock_set_keepidle_locked(struct sock *sk, int val); int tcp_sock_set_keepidle(struct sock *sk, int val); int tcp_sock_set_keepintvl(struct sock *sk, int val); void __tcp_sock_set_nodelay(struct sock *sk, bool on); void tcp_sock_set_nodelay(struct sock *sk); void tcp_sock_set_quickack(struct sock *sk, int val); int tcp_sock_set_syncnt(struct sock *sk, int val); int tcp_sock_set_user_timeout(struct sock *sk, int val); static inline bool dst_tcp_usec_ts(const struct dst_entry *dst) { return dst_feature(dst, RTAX_FEATURE_TCP_USEC_TS); } #endif /* _LINUX_TCP_H */ |
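/*
 * Editor's note -- illustrative sketch, not part of this header: a typical
 * TX-path consumer of the helpers above, following the usage pattern shown in
 * the skb_tcp_all_headers()/skb_inner_tcp_all_headers() kernel-doc. The
 * function name example_tso_header_len() is hypothetical.
 */
#include <linux/tcp.h>

static inline int example_tso_header_len(const struct sk_buff *skb)
{
	/* GSO TCP packet: length of all headers up to the end of TCP options. */
	if (skb_is_gso(skb)) {
		if (skb->encapsulation)
			return skb_inner_tcp_all_headers(skb);
		return skb_tcp_all_headers(skb);
	}

	/* Otherwise just the TCP header itself, options included:
	 * tcp_hdrlen() == sizeof(struct tcphdr) + tcp_optlen().
	 */
	return tcp_hdrlen(skb);
}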
// SPDX-License-Identifier: GPL-2.0-or-later /* * Squashfs - a compressed read only filesystem for Linux * * Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, 2008 * Phillip Lougher <phillip@squashfs.org.uk> * * cache.c */ /* * Blocks in Squashfs are compressed. To avoid repeatedly decompressing * recently accessed data Squashfs uses two small metadata and fragment caches. * * This file implements a generic cache implementation used for both caches, * plus functions layered on top of the generic cache implementation to * access the metadata and fragment caches. * * To avoid out of memory and fragmentation issues with vmalloc the cache * uses sequences of kmalloced PAGE_SIZE buffers. * * It should be noted that the cache is not used for file datablocks, these * are decompressed and cached in the page-cache in the normal way. The * cache is only used to temporarily cache fragment and metadata blocks * which have been read as a result of a metadata (i.e. inode or * directory) or fragment access. Because metadata and fragments are packed * together into blocks (to gain greater compression) the read of a particular * piece of metadata or fragment will retrieve other metadata/fragments which * have been packed with it; these, because of locality-of-reference, may be read * in the near future. Temporarily caching them ensures they are available for * near future access without requiring an additional read and decompress.
*/ #include <linux/fs.h> #include <linux/vfs.h> #include <linux/slab.h> #include <linux/vmalloc.h> #include <linux/sched.h> #include <linux/spinlock.h> #include <linux/wait.h> #include <linux/pagemap.h> #include "squashfs_fs.h" #include "squashfs_fs_sb.h" #include "squashfs.h" #include "page_actor.h" /* * Look-up block in cache, and increment usage count. If not in cache, read * and decompress it from disk. */ struct squashfs_cache_entry *squashfs_cache_get(struct super_block *sb, struct squashfs_cache *cache, u64 block, int length) { int i, n; struct squashfs_cache_entry *entry; spin_lock(&cache->lock); while (1) { for (i = cache->curr_blk, n = 0; n < cache->entries; n++) { if (cache->entry[i].block == block) { cache->curr_blk = i; break; } i = (i + 1) % cache->entries; } if (n == cache->entries) { /* * Block not in cache, if all cache entries are used * go to sleep waiting for one to become available. */ if (cache->unused == 0) { cache->num_waiters++; spin_unlock(&cache->lock); wait_event(cache->wait_queue, cache->unused); spin_lock(&cache->lock); cache->num_waiters--; continue; } /* * At least one unused cache entry. A simple * round-robin strategy is used to choose the entry to * be evicted from the cache. */ i = cache->next_blk; for (n = 0; n < cache->entries; n++) { if (cache->entry[i].refcount == 0) break; i = (i + 1) % cache->entries; } cache->next_blk = (i + 1) % cache->entries; entry = &cache->entry[i]; /* * Initialise chosen cache entry, and fill it in from * disk. */ cache->unused--; entry->block = block; entry->refcount = 1; entry->pending = 1; entry->num_waiters = 0; entry->error = 0; spin_unlock(&cache->lock); entry->length = squashfs_read_data(sb, block, length, &entry->next_index, entry->actor); spin_lock(&cache->lock); if (entry->length < 0) entry->error = entry->length; entry->pending = 0; /* * While filling this entry one or more other processes * have looked it up in the cache, and have slept * waiting for it to become available. */ if (entry->num_waiters) { spin_unlock(&cache->lock); wake_up_all(&entry->wait_queue); } else spin_unlock(&cache->lock); goto out; } /* * Block already in cache. Increment refcount so it doesn't * get reused until we're finished with it, if it was * previously unused there's one less cache entry available * for reuse. */ entry = &cache->entry[i]; if (entry->refcount == 0) cache->unused--; entry->refcount++; /* * If the entry is currently being filled in by another process * go to sleep waiting for it to become available. */ if (entry->pending) { entry->num_waiters++; spin_unlock(&cache->lock); wait_event(entry->wait_queue, !entry->pending); } else spin_unlock(&cache->lock); goto out; } out: TRACE("Got %s %d, start block %lld, refcount %d, error %d\n", cache->name, i, entry->block, entry->refcount, entry->error); if (entry->error) ERROR("Unable to read %s cache entry [%llx]\n", cache->name, block); return entry; } /* * Release cache entry, once usage count is zero it can be reused. */ void squashfs_cache_put(struct squashfs_cache_entry *entry) { struct squashfs_cache *cache = entry->cache; spin_lock(&cache->lock); entry->refcount--; if (entry->refcount == 0) { cache->unused++; /* * If there's any processes waiting for a block to become * available, wake one up. */ if (cache->num_waiters) { spin_unlock(&cache->lock); wake_up(&cache->wait_queue); return; } } spin_unlock(&cache->lock); } /* * Delete cache reclaiming all kmalloced buffers. 
*/ void squashfs_cache_delete(struct squashfs_cache *cache) { int i, j; if (cache == NULL) return; for (i = 0; i < cache->entries; i++) { if (cache->entry[i].data) { for (j = 0; j < cache->pages; j++) kfree(cache->entry[i].data[j]); kfree(cache->entry[i].data); } kfree(cache->entry[i].actor); } kfree(cache->entry); kfree(cache); } /* * Initialise cache allocating the specified number of entries, each of * size block_size. To avoid vmalloc fragmentation issues each entry * is allocated as a sequence of kmalloced PAGE_SIZE buffers. */ struct squashfs_cache *squashfs_cache_init(char *name, int entries, int block_size) { int i, j; struct squashfs_cache *cache = kzalloc(sizeof(*cache), GFP_KERNEL); if (cache == NULL) { ERROR("Failed to allocate %s cache\n", name); return NULL; } cache->entry = kcalloc(entries, sizeof(*(cache->entry)), GFP_KERNEL); if (cache->entry == NULL) { ERROR("Failed to allocate %s cache\n", name); goto cleanup; } cache->curr_blk = 0; cache->next_blk = 0; cache->unused = entries; cache->entries = entries; cache->block_size = block_size; cache->pages = block_size >> PAGE_SHIFT; cache->pages = cache->pages ? cache->pages : 1; cache->name = name; cache->num_waiters = 0; spin_lock_init(&cache->lock); init_waitqueue_head(&cache->wait_queue); for (i = 0; i < entries; i++) { struct squashfs_cache_entry *entry = &cache->entry[i]; init_waitqueue_head(&cache->entry[i].wait_queue); entry->cache = cache; entry->block = SQUASHFS_INVALID_BLK; entry->data = kcalloc(cache->pages, sizeof(void *), GFP_KERNEL); if (entry->data == NULL) { ERROR("Failed to allocate %s cache entry\n", name); goto cleanup; } for (j = 0; j < cache->pages; j++) { entry->data[j] = kmalloc(PAGE_SIZE, GFP_KERNEL); if (entry->data[j] == NULL) { ERROR("Failed to allocate %s buffer\n", name); goto cleanup; } } entry->actor = squashfs_page_actor_init(entry->data, cache->pages, 0); if (entry->actor == NULL) { ERROR("Failed to allocate %s cache entry\n", name); goto cleanup; } } return cache; cleanup: squashfs_cache_delete(cache); return NULL; } /* * Copy up to length bytes from cache entry to buffer starting at offset bytes * into the cache entry. If there's not length bytes then copy the number of * bytes available. In all cases return the number of bytes copied. */ int squashfs_copy_data(void *buffer, struct squashfs_cache_entry *entry, int offset, int length) { int remaining = length; if (length == 0) return 0; else if (buffer == NULL) return min(length, entry->length - offset); while (offset < entry->length) { void *buff = entry->data[offset / PAGE_SIZE] + (offset % PAGE_SIZE); int bytes = min_t(int, entry->length - offset, PAGE_SIZE - (offset % PAGE_SIZE)); if (bytes >= remaining) { memcpy(buffer, buff, remaining); remaining = 0; break; } memcpy(buffer, buff, bytes); buffer += bytes; remaining -= bytes; offset += bytes; } return length - remaining; } /* * Read length bytes from metadata position <block, offset> (block is the * start of the compressed block on disk, and offset is the offset into * the block once decompressed). Data is packed into consecutive blocks, * and length bytes may require reading more than one block. 
*/ int squashfs_read_metadata(struct super_block *sb, void *buffer, u64 *block, int *offset, int length) { struct squashfs_sb_info *msblk = sb->s_fs_info; int bytes, res = length; struct squashfs_cache_entry *entry; TRACE("Entered squashfs_read_metadata [%llx:%x]\n", *block, *offset); if (unlikely(length < 0)) return -EIO; while (length) { entry = squashfs_cache_get(sb, msblk->block_cache, *block, 0); if (entry->error) { res = entry->error; goto error; } else if (*offset >= entry->length) { res = -EIO; goto error; } bytes = squashfs_copy_data(buffer, entry, *offset, length); if (buffer) buffer += bytes; length -= bytes; *offset += bytes; if (*offset == entry->length) { *block = entry->next_index; *offset = 0; } squashfs_cache_put(entry); } return res; error: squashfs_cache_put(entry); return res; } /* * Look-up in the fragmment cache the fragment located at <start_block> in the * filesystem. If necessary read and decompress it from disk. */ struct squashfs_cache_entry *squashfs_get_fragment(struct super_block *sb, u64 start_block, int length) { struct squashfs_sb_info *msblk = sb->s_fs_info; return squashfs_cache_get(sb, msblk->fragment_cache, start_block, length); } /* * Read and decompress the datablock located at <start_block> in the * filesystem. The cache is used here to avoid duplicating locking and * read/decompress code. */ struct squashfs_cache_entry *squashfs_get_datablock(struct super_block *sb, u64 start_block, int length) { struct squashfs_sb_info *msblk = sb->s_fs_info; return squashfs_cache_get(sb, msblk->read_page, start_block, length); } /* * Read a filesystem table (uncompressed sequence of bytes) from disk */ void *squashfs_read_table(struct super_block *sb, u64 block, int length) { int pages = (length + PAGE_SIZE - 1) >> PAGE_SHIFT; int i, res; void *table, *buffer, **data; struct squashfs_page_actor *actor; table = buffer = kmalloc(length, GFP_KERNEL); if (table == NULL) return ERR_PTR(-ENOMEM); data = kcalloc(pages, sizeof(void *), GFP_KERNEL); if (data == NULL) { res = -ENOMEM; goto failed; } actor = squashfs_page_actor_init(data, pages, length); if (actor == NULL) { res = -ENOMEM; goto failed2; } for (i = 0; i < pages; i++, buffer += PAGE_SIZE) data[i] = buffer; res = squashfs_read_data(sb, block, length | SQUASHFS_COMPRESSED_BIT_BLOCK, NULL, actor); kfree(data); kfree(actor); if (res < 0) goto failed; return table; failed2: kfree(data); failed: kfree(table); return ERR_PTR(res); } |
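The copy loop in squashfs_copy_data() above is the core pattern for serving a byte range out of a cache entry whose payload is scattered across PAGE_SIZE buffers. Below is a minimal standalone sketch of that pattern; it is illustrative only (the 4096-byte page size, the function name and the plain-libc setting are assumptions, not part of the squashfs sources).

#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* stand-in for PAGE_SIZE, assumed for the sketch */

/*
 * Copy up to @length bytes, starting @offset bytes into an entry whose
 * @entry_length byte payload is split across SKETCH_PAGE_SIZE buffers.
 * Returns the number of bytes actually copied, as squashfs_copy_data() does.
 */
static int copy_from_paged_entry(char *dst, char **pages, int entry_length,
				 int offset, int length)
{
	int remaining = length;

	while (offset < entry_length && remaining > 0) {
		char *src = pages[offset / SKETCH_PAGE_SIZE] +
			    (offset % SKETCH_PAGE_SIZE);
		/* bytes = min(space left in this page, payload left, request left) */
		int bytes = SKETCH_PAGE_SIZE - (offset % SKETCH_PAGE_SIZE);

		if (bytes > entry_length - offset)
			bytes = entry_length - offset;
		if (bytes > remaining)
			bytes = remaining;

		memcpy(dst, src, bytes);
		dst += bytes;
		offset += bytes;
		remaining -= bytes;
	}

	return length - remaining;
}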
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Cryptographic API
 *
 * ARC4 Cipher Algorithm
 *
 * Jon Oberheide <jon@oberheide.org>
 */

#include <crypto/arc4.h>
#include <linux/module.h>

int arc4_setkey(struct arc4_ctx *ctx, const u8 *in_key, unsigned int key_len)
{
	int i, j = 0, k = 0;

	ctx->x = 1;
	ctx->y = 0;

	for (i = 0; i < 256; i++)
		ctx->S[i] = i;

	for (i = 0; i < 256; i++) {
		u32 a = ctx->S[i];

		j = (j + in_key[k] + a) & 0xff;
		ctx->S[i] = ctx->S[j];
		ctx->S[j] = a;
		if (++k >= key_len)
			k = 0;
	}

	return 0;
}
EXPORT_SYMBOL(arc4_setkey);

void arc4_crypt(struct arc4_ctx *ctx, u8 *out, const u8 *in, unsigned int len)
{
	u32 *const S = ctx->S;
	u32 x, y, a, b;
	u32 ty, ta, tb;

	if (len == 0)
		return;

	x = ctx->x;
	y = ctx->y;

	a = S[x];
	y = (y + a) & 0xff;
	b = S[y];

	do {
		S[y] = a;
		a = (a + b) & 0xff;
		S[x] = b;
		x = (x + 1) & 0xff;
		ta = S[x];
		ty = (y + ta) & 0xff;
		tb = S[ty];
		*out++ = *in++ ^ S[a];
		if (--len == 0)
			break;
		y = ty;
		a = ta;
		b = tb;
	} while (true);

	ctx->x = x;
	ctx->y = y;
}
EXPORT_SYMBOL(arc4_crypt);

MODULE_DESCRIPTION("ARC4 Cipher Algorithm");
MODULE_LICENSE("GPL");
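As a quick, hedged illustration of how the two exported helpers above are meant to be driven (this caller is made up and is not part of arc4.c): key the context with arc4_setkey(), then stream data through arc4_crypt(); because ARC4 is a symmetric stream cipher, running the ciphertext through a freshly keyed context recovers the plaintext.

#include <crypto/arc4.h>
#include <linux/errno.h>
#include <linux/string.h>

/* Hypothetical self-test style caller; key and message are arbitrary. */
static int arc4_roundtrip_demo(void)
{
	static const u8 key[] = { 0x01, 0x02, 0x03, 0x04, 0x05 };
	u8 msg[] = "plaintext";
	u8 enc[sizeof(msg)], dec[sizeof(msg)];
	struct arc4_ctx ctx;

	arc4_setkey(&ctx, key, sizeof(key));
	arc4_crypt(&ctx, enc, msg, sizeof(msg));	/* encrypt */

	arc4_setkey(&ctx, key, sizeof(key));		/* keystream is stateful, so re-key */
	arc4_crypt(&ctx, dec, enc, sizeof(enc));	/* decrypt == encrypt again */

	return memcmp(dec, msg, sizeof(msg)) ? -EINVAL : 0;
}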
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * TerraTec Cinergy T2/qanu USB2 DVB-T adapter.
 *
 * Copyright (C) 2007 Tomi Orava (tomimo@ncircle.nullnet.fi)
 *
 * Based on the dvb-usb-framework code and the
 * original Terratec Cinergy T2 driver by:
 *
 * Copyright (C) 2004 Daniel Mack <daniel@qanu.de> and
 * Holger Waechtler <holger@qanu.de>
 *
 * Protocol Spec published on http://qanu.de/specs/terratec_cinergyT2.pdf
 */

#include "cinergyT2.h"

/* debug */
int dvb_usb_cinergyt2_debug;

module_param_named(debug, dvb_usb_cinergyt2_debug, int, 0644);
MODULE_PARM_DESC(debug, "set debugging level (1=info, xfer=2, rc=4 (or-able)).");

DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nr);

struct cinergyt2_state {
	u8 rc_counter;
	unsigned char data[64];
};

/* Forward declaration */
static const struct dvb_usb_device_properties cinergyt2_properties;

static int cinergyt2_streaming_ctrl(struct dvb_usb_adapter *adap, int enable)
{
	struct dvb_usb_device *d = adap->dev;
	struct cinergyt2_state *st = d->priv;
	int ret;

	mutex_lock(&d->data_mutex);
	st->data[0] = CINERGYT2_EP1_CONTROL_STREAM_TRANSFER;
	st->data[1] = enable ? 1 : 0;

	ret = dvb_usb_generic_rw(d, st->data, 2, st->data, 64, 0);
	mutex_unlock(&d->data_mutex);

	return ret;
}

static int cinergyt2_power_ctrl(struct dvb_usb_device *d, int enable)
{
	struct cinergyt2_state *st = d->priv;
	int ret;

	mutex_lock(&d->data_mutex);
	st->data[0] = CINERGYT2_EP1_SLEEP_MODE;
	st->data[1] = enable ?
0 : 1; ret = dvb_usb_generic_rw(d, st->data, 2, st->data, 3, 0); mutex_unlock(&d->data_mutex); return ret; } static int cinergyt2_frontend_attach(struct dvb_usb_adapter *adap) { struct dvb_usb_device *d = adap->dev; struct cinergyt2_state *st = d->priv; int ret; adap->fe_adap[0].fe = cinergyt2_fe_attach(adap->dev); mutex_lock(&d->data_mutex); st->data[0] = CINERGYT2_EP1_GET_FIRMWARE_VERSION; ret = dvb_usb_generic_rw(d, st->data, 1, st->data, 3, 0); if (ret < 0) { if (adap->fe_adap[0].fe) adap->fe_adap[0].fe->ops.release(adap->fe_adap[0].fe); deb_rc("cinergyt2_power_ctrl() Failed to retrieve sleep state info\n"); } mutex_unlock(&d->data_mutex); return ret; } static struct rc_map_table rc_map_cinergyt2_table[] = { { 0x0401, KEY_POWER }, { 0x0402, KEY_1 }, { 0x0403, KEY_2 }, { 0x0404, KEY_3 }, { 0x0405, KEY_4 }, { 0x0406, KEY_5 }, { 0x0407, KEY_6 }, { 0x0408, KEY_7 }, { 0x0409, KEY_8 }, { 0x040a, KEY_9 }, { 0x040c, KEY_0 }, { 0x040b, KEY_VIDEO }, { 0x040d, KEY_REFRESH }, { 0x040e, KEY_SELECT }, { 0x040f, KEY_EPG }, { 0x0410, KEY_UP }, { 0x0414, KEY_DOWN }, { 0x0411, KEY_LEFT }, { 0x0413, KEY_RIGHT }, { 0x0412, KEY_OK }, { 0x0415, KEY_TEXT }, { 0x0416, KEY_INFO }, { 0x0417, KEY_RED }, { 0x0418, KEY_GREEN }, { 0x0419, KEY_YELLOW }, { 0x041a, KEY_BLUE }, { 0x041c, KEY_VOLUMEUP }, { 0x041e, KEY_VOLUMEDOWN }, { 0x041d, KEY_MUTE }, { 0x041b, KEY_CHANNELUP }, { 0x041f, KEY_CHANNELDOWN }, { 0x0440, KEY_PAUSE }, { 0x044c, KEY_PLAY }, { 0x0458, KEY_RECORD }, { 0x0454, KEY_PREVIOUS }, { 0x0448, KEY_STOP }, { 0x045c, KEY_NEXT } }; /* Number of keypresses to ignore before detect repeating */ #define RC_REPEAT_DELAY 3 static int repeatable_keys[] = { KEY_UP, KEY_DOWN, KEY_LEFT, KEY_RIGHT, KEY_VOLUMEUP, KEY_VOLUMEDOWN, KEY_CHANNELUP, KEY_CHANNELDOWN }; static int cinergyt2_rc_query(struct dvb_usb_device *d, u32 *event, int *state) { struct cinergyt2_state *st = d->priv; int i, ret; *state = REMOTE_NO_KEY_PRESSED; mutex_lock(&d->data_mutex); st->data[0] = CINERGYT2_EP1_GET_RC_EVENTS; ret = dvb_usb_generic_rw(d, st->data, 1, st->data, 5, 0); if (ret < 0) goto ret; if (st->data[4] == 0xff) { /* key repeat */ st->rc_counter++; if (st->rc_counter > RC_REPEAT_DELAY) { for (i = 0; i < ARRAY_SIZE(repeatable_keys); i++) { if (d->last_event == repeatable_keys[i]) { *state = REMOTE_KEY_REPEAT; *event = d->last_event; deb_rc("repeat key, event %x\n", *event); goto ret; } } deb_rc("repeated key (non repeatable)\n"); } goto ret; } /* hack to pass checksum on the custom field */ st->data[2] = ~st->data[1]; dvb_usb_nec_rc_key_to_event(d, st->data, event, state); if (st->data[0] != 0) { if (*event != d->last_event) st->rc_counter = 0; deb_rc("key: %*ph\n", 5, st->data); } ret: mutex_unlock(&d->data_mutex); return ret; } static int cinergyt2_usb_probe(struct usb_interface *intf, const struct usb_device_id *id) { return dvb_usb_device_init(intf, &cinergyt2_properties, THIS_MODULE, NULL, adapter_nr); } enum { TERRATEC_CINERGY_T2, }; static struct usb_device_id cinergyt2_usb_table[] = { DVB_USB_DEV(TERRATEC, TERRATEC_CINERGY_T2), { } }; MODULE_DEVICE_TABLE(usb, cinergyt2_usb_table); static const struct dvb_usb_device_properties cinergyt2_properties = { .size_of_priv = sizeof(struct cinergyt2_state), .num_adapters = 1, .adapter = { { .num_frontends = 1, .fe = {{ .streaming_ctrl = cinergyt2_streaming_ctrl, .frontend_attach = cinergyt2_frontend_attach, /* parameter for the MPEG2-data transfer */ .stream = { .type = USB_BULK, .count = 5, .endpoint = 0x02, .u = { .bulk = { .buffersize = 512, } } }, }}, } }, .power_ctrl = 
cinergyt2_power_ctrl, .rc.legacy = { .rc_interval = 50, .rc_map_table = rc_map_cinergyt2_table, .rc_map_size = ARRAY_SIZE(rc_map_cinergyt2_table), .rc_query = cinergyt2_rc_query, }, .generic_bulk_ctrl_endpoint = 1, .num_device_descs = 1, .devices = { { .name = "TerraTec/qanu USB2.0 Highspeed DVB-T Receiver", .cold_ids = {NULL}, .warm_ids = { &cinergyt2_usb_table[TERRATEC_CINERGY_T2], NULL }, }, { NULL }, } }; static struct usb_driver cinergyt2_driver = { .name = "cinergyT2", .probe = cinergyt2_usb_probe, .disconnect = dvb_usb_device_exit, .id_table = cinergyt2_usb_table }; module_usb_driver(cinergyt2_driver); MODULE_DESCRIPTION("Terratec Cinergy T2 DVB-T driver"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Tomi Orava"); |
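The repeat handling inside cinergyt2_rc_query() is easy to miss in the listing above: a raw 0xff report means "same key still held", and only after RC_REPEAT_DELAY consecutive reports, and only for keys on the repeatable_keys list, is a REMOTE_KEY_REPEAT event emitted. A simplified standalone sketch of that policy follows; the names, types and plain-C setting are illustrative, not the driver's own code.

#include <stdbool.h>
#include <stddef.h>

#define RC_REPEAT_DELAY 3	/* reports to swallow before repeating */

/* Decide whether a "key still held" report should become a repeat event. */
static bool should_emit_repeat(unsigned int *counter, int last_key,
			       const int *repeatable, size_t n_repeatable)
{
	size_t i;

	if (++(*counter) <= RC_REPEAT_DELAY)
		return false;	/* still within the ignore window */

	for (i = 0; i < n_repeatable; i++)
		if (repeatable[i] == last_key)
			return true;

	return false;	/* held key is not on the repeatable list */
}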
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * INET		An implementation of the TCP/IP protocol suite for the LINUX
 *		operating system. INET is implemented using the BSD Socket
 *		interface as the means of communication with the user level.
 *
 * Authors:	Lotsa people, from code originally in tcp
 */

#ifndef _INET6_HASHTABLES_H
#define _INET6_HASHTABLES_H

#if IS_ENABLED(CONFIG_IPV6)
#include <linux/in6.h>
#include <linux/ipv6.h>
#include <linux/types.h>
#include <linux/jhash.h>

#include <net/inet_sock.h>

#include <net/ipv6.h>
#include <net/netns/hash.h>

struct inet_hashinfo;

static inline unsigned int __inet6_ehashfn(const u32 lhash,
					   const u16 lport,
					   const u32 fhash,
					   const __be16 fport,
					   const u32 initval)
{
	const u32 ports = (((u32)lport) << 16) | (__force u32)fport;

	return jhash_3words(lhash, fhash, ports, initval);
}

/*
 * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so
 * we need not check it for TCP lookups anymore, thanks Alexey. -DaveM
 *
 * The sockhash lock must be held as a reader here.
 */
struct sock *__inet6_lookup_established(struct net *net,
					struct inet_hashinfo *hashinfo,
					const struct in6_addr *saddr,
					const __be16 sport,
					const struct in6_addr *daddr,
					const u16 hnum, const int dif,
					const int sdif);

typedef u32 (inet6_ehashfn_t)(const struct net *net,
			      const struct in6_addr *laddr, const u16 lport,
			      const struct in6_addr *faddr, const __be16 fport);

inet6_ehashfn_t inet6_ehashfn;

INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn);

struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk,
				    struct sk_buff *skb, int doff,
				    const struct in6_addr *saddr, __be16 sport,
				    const struct in6_addr *daddr,
				    unsigned short hnum,
				    inet6_ehashfn_t *ehashfn);

struct sock *inet6_lookup_listener(struct net *net,
				   struct inet_hashinfo *hashinfo,
				   struct sk_buff *skb, int doff,
				   const struct in6_addr *saddr,
				   const __be16 sport,
				   const struct in6_addr *daddr,
				   const unsigned short hnum,
				   const int dif, const int sdif);

struct sock *inet6_lookup_run_sk_lookup(struct net *net, int protocol,
					struct sk_buff *skb, int doff,
					const struct in6_addr *saddr,
					const __be16 sport,
					const struct in6_addr *daddr,
					const u16 hnum, const int dif,
					inet6_ehashfn_t *ehashfn);

static inline struct sock *__inet6_lookup(struct net *net,
					  struct inet_hashinfo *hashinfo,
					  struct sk_buff *skb, int doff,
					  const struct in6_addr *saddr,
					  const __be16 sport,
					  const struct in6_addr *daddr,
					  const u16 hnum,
					  const int dif, const int sdif,
					  bool *refcounted)
{
	struct sock *sk = __inet6_lookup_established(net, hashinfo, saddr,
						     sport, daddr, hnum,
						     dif, sdif);
	*refcounted = true;
	if (sk)
		return sk;
	*refcounted = false;
	return inet6_lookup_listener(net, hashinfo, skb, doff, saddr, sport,
				     daddr, hnum, dif, sdif);
}

static inline struct sock *inet6_steal_sock(struct net *net,
					    struct sk_buff *skb, int doff,
					    const struct in6_addr *saddr, const
__be16 sport, const struct in6_addr *daddr, const __be16 dport, bool *refcounted, inet6_ehashfn_t *ehashfn) { struct sock *sk, *reuse_sk; bool prefetched; sk = skb_steal_sock(skb, refcounted, &prefetched); if (!sk) return NULL; if (!prefetched || !sk_fullsock(sk)) return sk; if (sk->sk_protocol == IPPROTO_TCP) { if (sk->sk_state != TCP_LISTEN) return sk; } else if (sk->sk_protocol == IPPROTO_UDP) { if (sk->sk_state != TCP_CLOSE) return sk; } else { return sk; } reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, ntohs(dport), ehashfn); if (!reuse_sk) return sk; /* We've chosen a new reuseport sock which is never refcounted. This * implies that sk also isn't refcounted. */ WARN_ON_ONCE(*refcounted); return reuse_sk; } static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, const __be16 sport, const __be16 dport, int iif, int sdif, bool *refcounted) { struct net *net = dev_net(skb_dst(skb)->dev); const struct ipv6hdr *ip6h = ipv6_hdr(skb); struct sock *sk; sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport, &ip6h->daddr, dport, refcounted, inet6_ehashfn); if (IS_ERR(sk)) return NULL; if (sk) return sk; return __inet6_lookup(net, hashinfo, skb, doff, &ip6h->saddr, sport, &ip6h->daddr, ntohs(dport), iif, sdif, refcounted); } struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, const struct in6_addr *saddr, const __be16 sport, const struct in6_addr *daddr, const __be16 dport, const int dif); int inet6_hash(struct sock *sk); static inline bool inet6_match(struct net *net, const struct sock *sk, const struct in6_addr *saddr, const struct in6_addr *daddr, const __portpair ports, const int dif, const int sdif) { if (!net_eq(sock_net(sk), net) || sk->sk_family != AF_INET6 || sk->sk_portpair != ports || !ipv6_addr_equal(&sk->sk_v6_daddr, saddr) || !ipv6_addr_equal(&sk->sk_v6_rcv_saddr, daddr)) return false; /* READ_ONCE() paired with WRITE_ONCE() in sock_bindtoindex_locked() */ return inet_sk_bound_dev_eq(net, READ_ONCE(sk->sk_bound_dev_if), dif, sdif); } #endif /* IS_ENABLED(CONFIG_IPV6) */ #endif /* _INET6_HASHTABLES_H */ |
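One detail of __inet6_ehashfn() worth calling out: the two 16-bit port numbers are folded into a single 32-bit word so that the pre-hashed addresses and the ports fit a single jhash_3words() call. A tiny standalone sketch of just that packing step follows; it uses plain C and made-up values and is not kernel code (in the kernel the foreign port stays in network byte order, which the __force cast reflects).

#include <stdint.h>
#include <stdio.h>

/* Fold the local and foreign ports into the single 32-bit "ports" word
 * that __inet6_ehashfn() feeds to the hash function. */
static uint32_t pack_ports(uint16_t lport, uint16_t fport)
{
	return ((uint32_t)lport << 16) | (uint32_t)fport;
}

int main(void)
{
	uint32_t ports = pack_ports(8080, 443);	/* 0x1f90 and 0x01bb */

	printf("ports word: 0x%08x\n", ports);	/* prints 0x1f9001bb */
	return 0;
}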
// SPDX-License-Identifier: GPL-2.0-only
/*
 * linux/mm/memory_hotplug.c
 *
 * Copyright (C)
 */

#include <linux/stddef.h>
#include <linux/mm.h>
#include <linux/sched/signal.h>
#include <linux/swap.h>
#include <linux/interrupt.h>
#include <linux/pagemap.h>
#include <linux/compiler.h>
#include <linux/export.h>
#include <linux/writeback.h>
#include <linux/slab.h>
#include <linux/sysctl.h>
#include <linux/cpu.h>
#include <linux/memory.h>
#include <linux/memremap.h>
#include <linux/memory_hotplug.h>
#include <linux/vmalloc.h>
#include <linux/ioport.h>
#include <linux/delay.h>
#include <linux/migrate.h>
#include <linux/page-isolation.h>
#include <linux/pfn.h>
#include <linux/suspend.h>
#include <linux/mm_inline.h>
#include <linux/firmware-map.h>
#include <linux/stop_machine.h>
#include <linux/hugetlb.h>
#include <linux/memblock.h>
#include <linux/compaction.h>
#include <linux/rmap.h>
#include <linux/module.h>

#include <asm/tlbflush.h>

#include "internal.h"
#include "shuffle.h"

enum {
	MEMMAP_ON_MEMORY_DISABLE = 0,
	MEMMAP_ON_MEMORY_ENABLE,
	MEMMAP_ON_MEMORY_FORCE,
};

static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;

static inline unsigned long memory_block_memmap_size(void)
{
	return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
}

static inline unsigned long memory_block_memmap_on_memory_pages(void)
{
	unsigned long nr_pages = PFN_UP(memory_block_memmap_size());

	/*
	 * In "forced" memmap_on_memory mode, we add extra pages to align the
	 * vmemmap size to cover full pageblocks. That way, we can add memory
	 * even if the vmemmap size is not properly aligned, however, we might
	 * waste memory.
	 */
	if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
		return pageblock_align(nr_pages);
	return nr_pages;
}

#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
/*
 * memory_hotplug.memmap_on_memory parameter
 */
static int set_memmap_mode(const char *val, const struct kernel_param *kp)
{
	int ret, mode;
	bool enabled;

	if (sysfs_streq(val, "force") || sysfs_streq(val, "FORCE")) {
		mode = MEMMAP_ON_MEMORY_FORCE;
	} else {
		ret = kstrtobool(val, &enabled);
		if (ret < 0)
			return ret;
		if (enabled)
			mode = MEMMAP_ON_MEMORY_ENABLE;
		else
			mode = MEMMAP_ON_MEMORY_DISABLE;
	}
	*((int *)kp->arg) = mode;
	if (mode == MEMMAP_ON_MEMORY_FORCE) {
		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();

		pr_info_once("Memory hotplug will waste %ld pages in each memory block\n",
			     memmap_pages - PFN_UP(memory_block_memmap_size()));
	}
	return 0;
}

static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
{
	int mode = *((int *)kp->arg);

	if (mode == MEMMAP_ON_MEMORY_FORCE)
		return sprintf(buffer, "force\n");
	return sprintf(buffer, "%c\n", mode ?
'Y' : 'N'); } static const struct kernel_param_ops memmap_mode_ops = { .set = set_memmap_mode, .get = get_memmap_mode, }; module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444); MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n" "With value \"force\" it could result in memory wastage due " "to memmap size limitations (Y/N/force)"); static inline bool mhp_memmap_on_memory(void) { return memmap_mode != MEMMAP_ON_MEMORY_DISABLE; } #else static inline bool mhp_memmap_on_memory(void) { return false; } #endif enum { ONLINE_POLICY_CONTIG_ZONES = 0, ONLINE_POLICY_AUTO_MOVABLE, }; static const char * const online_policy_to_str[] = { [ONLINE_POLICY_CONTIG_ZONES] = "contig-zones", [ONLINE_POLICY_AUTO_MOVABLE] = "auto-movable", }; static int set_online_policy(const char *val, const struct kernel_param *kp) { int ret = sysfs_match_string(online_policy_to_str, val); if (ret < 0) return ret; *((int *)kp->arg) = ret; return 0; } static int get_online_policy(char *buffer, const struct kernel_param *kp) { return sprintf(buffer, "%s\n", online_policy_to_str[*((int *)kp->arg)]); } /* * memory_hotplug.online_policy: configure online behavior when onlining without * specifying a zone (MMOP_ONLINE) * * "contig-zones": keep zone contiguous * "auto-movable": online memory to ZONE_MOVABLE if the configuration * (auto_movable_ratio, auto_movable_numa_aware) allows for it */ static int online_policy __read_mostly = ONLINE_POLICY_CONTIG_ZONES; static const struct kernel_param_ops online_policy_ops = { .set = set_online_policy, .get = get_online_policy, }; module_param_cb(online_policy, &online_policy_ops, &online_policy, 0644); MODULE_PARM_DESC(online_policy, "Set the online policy (\"contig-zones\", \"auto-movable\") " "Default: \"contig-zones\""); /* * memory_hotplug.auto_movable_ratio: specify maximum MOVABLE:KERNEL ratio * * The ratio represent an upper limit and the kernel might decide to not * online some memory to ZONE_MOVABLE -- e.g., because hotplugged KERNEL memory * doesn't allow for more MOVABLE memory. */ static unsigned int auto_movable_ratio __read_mostly = 301; module_param(auto_movable_ratio, uint, 0644); MODULE_PARM_DESC(auto_movable_ratio, "Set the maximum ratio of MOVABLE:KERNEL memory in the system " "in percent for \"auto-movable\" online policy. Default: 301"); /* * memory_hotplug.auto_movable_numa_aware: consider numa node stats */ #ifdef CONFIG_NUMA static bool auto_movable_numa_aware __read_mostly = true; module_param(auto_movable_numa_aware, bool, 0644); MODULE_PARM_DESC(auto_movable_numa_aware, "Consider numa node stats in addition to global stats in " "\"auto-movable\" online policy. Default: true"); #endif /* CONFIG_NUMA */ /* * online_page_callback contains pointer to current page onlining function. * Initially it is generic_online_page(). If it is required it could be * changed by calling set_online_page_callback() for callback registration * and restore_online_page_callback() for generic callback restore. 
*/ static online_page_callback_t online_page_callback = generic_online_page; static DEFINE_MUTEX(online_page_callback_lock); DEFINE_STATIC_PERCPU_RWSEM(mem_hotplug_lock); void get_online_mems(void) { percpu_down_read(&mem_hotplug_lock); } void put_online_mems(void) { percpu_up_read(&mem_hotplug_lock); } bool movable_node_enabled = false; #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE int mhp_default_online_type = MMOP_OFFLINE; #else int mhp_default_online_type = MMOP_ONLINE; #endif static int __init setup_memhp_default_state(char *str) { const int online_type = mhp_online_type_from_str(str); if (online_type >= 0) mhp_default_online_type = online_type; return 1; } __setup("memhp_default_state=", setup_memhp_default_state); void mem_hotplug_begin(void) { cpus_read_lock(); percpu_down_write(&mem_hotplug_lock); } void mem_hotplug_done(void) { percpu_up_write(&mem_hotplug_lock); cpus_read_unlock(); } u64 max_mem_size = U64_MAX; /* add this memory to iomem resource */ static struct resource *register_memory_resource(u64 start, u64 size, const char *resource_name) { struct resource *res; unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; if (strcmp(resource_name, "System RAM")) flags |= IORESOURCE_SYSRAM_DRIVER_MANAGED; if (!mhp_range_allowed(start, size, true)) return ERR_PTR(-E2BIG); /* * Make sure value parsed from 'mem=' only restricts memory adding * while booting, so that memory hotplug won't be impacted. Please * refer to document of 'mem=' in kernel-parameters.txt for more * details. */ if (start + size > max_mem_size && system_state < SYSTEM_RUNNING) return ERR_PTR(-E2BIG); /* * Request ownership of the new memory range. This might be * a child of an existing resource that was present but * not marked as busy. */ res = __request_region(&iomem_resource, start, size, resource_name, flags); if (!res) { pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n", start, start + size); return ERR_PTR(-EEXIST); } return res; } static void release_memory_resource(struct resource *res) { if (!res) return; release_resource(res); kfree(res); } static int check_pfn_span(unsigned long pfn, unsigned long nr_pages) { /* * Disallow all operations smaller than a sub-section and only * allow operations smaller than a section for * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range() * enforces a larger memory_block_size_bytes() granularity for * memory that will be marked online, so this check should only * fire for direct arch_{add,remove}_memory() users outside of * add_memory_resource(). */ unsigned long min_align; if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)) min_align = PAGES_PER_SUBSECTION; else min_align = PAGES_PER_SECTION; if (!IS_ALIGNED(pfn | nr_pages, min_align)) return -EINVAL; return 0; } /* * Return page for the valid pfn only if the page is online. All pfn * walkers which rely on the fully initialized page->flags and others * should use this rather than pfn_valid && pfn_to_page */ struct page *pfn_to_online_page(unsigned long pfn) { unsigned long nr = pfn_to_section_nr(pfn); struct dev_pagemap *pgmap; struct mem_section *ms; if (nr >= NR_MEM_SECTIONS) return NULL; ms = __nr_to_section(nr); if (!online_section(ms)) return NULL; /* * Save some code text when online_section() + * pfn_section_valid() are sufficient. 
*/ if (IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) && !pfn_valid(pfn)) return NULL; if (!pfn_section_valid(ms, pfn)) return NULL; if (!online_device_section(ms)) return pfn_to_page(pfn); /* * Slowpath: when ZONE_DEVICE collides with * ZONE_{NORMAL,MOVABLE} within the same section some pfns in * the section may be 'offline' but 'valid'. Only * get_dev_pagemap() can determine sub-section online status. */ pgmap = get_dev_pagemap(pfn, NULL); put_dev_pagemap(pgmap); /* The presence of a pgmap indicates ZONE_DEVICE offline pfn */ if (pgmap) return NULL; return pfn_to_page(pfn); } EXPORT_SYMBOL_GPL(pfn_to_online_page); int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, struct mhp_params *params) { const unsigned long end_pfn = pfn + nr_pages; unsigned long cur_nr_pages; int err; struct vmem_altmap *altmap = params->altmap; if (WARN_ON_ONCE(!pgprot_val(params->pgprot))) return -EINVAL; VM_BUG_ON(!mhp_range_allowed(PFN_PHYS(pfn), nr_pages * PAGE_SIZE, false)); if (altmap) { /* * Validate altmap is within bounds of the total request */ if (altmap->base_pfn != pfn || vmem_altmap_offset(altmap) > nr_pages) { pr_warn_once("memory add fail, invalid altmap\n"); return -EINVAL; } altmap->alloc = 0; } if (check_pfn_span(pfn, nr_pages)) { WARN(1, "Misaligned %s start: %#lx end: %#lx\n", __func__, pfn, pfn + nr_pages - 1); return -EINVAL; } for (; pfn < end_pfn; pfn += cur_nr_pages) { /* Select all remaining pages up to the next section boundary */ cur_nr_pages = min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn); err = sparse_add_section(nid, pfn, cur_nr_pages, altmap, params->pgmap); if (err) break; cond_resched(); } vmemmap_populate_print_last(); return err; } /* find the smallest valid pfn in the range [start_pfn, end_pfn) */ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION) { if (unlikely(!pfn_to_online_page(start_pfn))) continue; if (unlikely(pfn_to_nid(start_pfn) != nid)) continue; if (zone != page_zone(pfn_to_page(start_pfn))) continue; return start_pfn; } return 0; } /* find the biggest valid pfn in the range [start_pfn, end_pfn). */ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { unsigned long pfn; /* pfn is the end pfn of a memory section. */ pfn = end_pfn - 1; for (; pfn >= start_pfn; pfn -= PAGES_PER_SUBSECTION) { if (unlikely(!pfn_to_online_page(pfn))) continue; if (unlikely(pfn_to_nid(pfn) != nid)) continue; if (zone != page_zone(pfn_to_page(pfn))) continue; return pfn; } return 0; } static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { unsigned long pfn; int nid = zone_to_nid(zone); if (zone->zone_start_pfn == start_pfn) { /* * If the section is smallest section in the zone, it need * shrink zone->zone_start_pfn and zone->zone_spanned_pages. * In this case, we find second smallest valid mem_section * for shrinking zone. */ pfn = find_smallest_section_pfn(nid, zone, end_pfn, zone_end_pfn(zone)); if (pfn) { zone->spanned_pages = zone_end_pfn(zone) - pfn; zone->zone_start_pfn = pfn; } else { zone->zone_start_pfn = 0; zone->spanned_pages = 0; } } else if (zone_end_pfn(zone) == end_pfn) { /* * If the section is biggest section in the zone, it need * shrink zone->spanned_pages. * In this case, we find second biggest valid mem_section for * shrinking zone. 
*/ pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn, start_pfn); if (pfn) zone->spanned_pages = pfn - zone->zone_start_pfn + 1; else { zone->zone_start_pfn = 0; zone->spanned_pages = 0; } } } static void update_pgdat_span(struct pglist_data *pgdat) { unsigned long node_start_pfn = 0, node_end_pfn = 0; struct zone *zone; for (zone = pgdat->node_zones; zone < pgdat->node_zones + MAX_NR_ZONES; zone++) { unsigned long end_pfn = zone_end_pfn(zone); /* No need to lock the zones, they can't change. */ if (!zone->spanned_pages) continue; if (!node_end_pfn) { node_start_pfn = zone->zone_start_pfn; node_end_pfn = end_pfn; continue; } if (end_pfn > node_end_pfn) node_end_pfn = end_pfn; if (zone->zone_start_pfn < node_start_pfn) node_start_pfn = zone->zone_start_pfn; } pgdat->node_start_pfn = node_start_pfn; pgdat->node_spanned_pages = node_end_pfn - node_start_pfn; } void __ref remove_pfn_range_from_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages) { const unsigned long end_pfn = start_pfn + nr_pages; struct pglist_data *pgdat = zone->zone_pgdat; unsigned long pfn, cur_nr_pages; /* Poison struct pages because they are now uninitialized again. */ for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) { cond_resched(); /* Select all remaining pages up to the next section boundary */ cur_nr_pages = min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn); page_init_poison(pfn_to_page(pfn), sizeof(struct page) * cur_nr_pages); } /* * Zone shrinking code cannot properly deal with ZONE_DEVICE. So * we will not try to shrink the zones - which is okay as * set_zone_contiguous() cannot deal with ZONE_DEVICE either way. */ if (zone_is_zone_device(zone)) return; clear_zone_contiguous(zone); shrink_zone_span(zone, start_pfn, start_pfn + nr_pages); update_pgdat_span(pgdat); set_zone_contiguous(zone); } /** * __remove_pages() - remove sections of pages * @pfn: starting pageframe (must be aligned to start of a section) * @nr_pages: number of pages to remove (must be multiple of section size) * @altmap: alternative device page map or %NULL if default memmap is used * * Generic helper function to remove section mappings and sysfs entries * for the section of the memory we are removing. Caller needs to make * sure that pages are marked reserved and zones are adjust properly by * calling offline_pages(). 
*/ void __remove_pages(unsigned long pfn, unsigned long nr_pages, struct vmem_altmap *altmap) { const unsigned long end_pfn = pfn + nr_pages; unsigned long cur_nr_pages; if (check_pfn_span(pfn, nr_pages)) { WARN(1, "Misaligned %s start: %#lx end: %#lx\n", __func__, pfn, pfn + nr_pages - 1); return; } for (; pfn < end_pfn; pfn += cur_nr_pages) { cond_resched(); /* Select all remaining pages up to the next section boundary */ cur_nr_pages = min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn); sparse_remove_section(pfn, cur_nr_pages, altmap); } } int set_online_page_callback(online_page_callback_t callback) { int rc = -EINVAL; get_online_mems(); mutex_lock(&online_page_callback_lock); if (online_page_callback == generic_online_page) { online_page_callback = callback; rc = 0; } mutex_unlock(&online_page_callback_lock); put_online_mems(); return rc; } EXPORT_SYMBOL_GPL(set_online_page_callback); int restore_online_page_callback(online_page_callback_t callback) { int rc = -EINVAL; get_online_mems(); mutex_lock(&online_page_callback_lock); if (online_page_callback == callback) { online_page_callback = generic_online_page; rc = 0; } mutex_unlock(&online_page_callback_lock); put_online_mems(); return rc; } EXPORT_SYMBOL_GPL(restore_online_page_callback); /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ void __ref generic_online_page(struct page *page, unsigned int order) { __free_pages_core(page, order, MEMINIT_HOTPLUG); } EXPORT_SYMBOL_GPL(generic_online_page); static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages) { const unsigned long end_pfn = start_pfn + nr_pages; unsigned long pfn; /* * Online the pages in MAX_PAGE_ORDER aligned chunks. The callback might * decide to not expose all pages to the buddy (e.g., expose them * later). We account all pages as being online and belonging to this * zone ("present"). * When using memmap_on_memory, the range might not be aligned to * MAX_ORDER_NR_PAGES - 1, but pageblock aligned. __ffs() will detect * this and the first chunk to online will be pageblock_nr_pages. */ for (pfn = start_pfn; pfn < end_pfn;) { int order; /* * Free to online pages in the largest chunks alignment allows. * * __ffs() behaviour is undefined for 0. start == 0 is * MAX_PAGE_ORDER-aligned, Set order to MAX_PAGE_ORDER for * the case. 
*/ if (pfn) order = min_t(int, MAX_PAGE_ORDER, __ffs(pfn)); else order = MAX_PAGE_ORDER; (*online_page_callback)(pfn_to_page(pfn), order); pfn += (1UL << order); } /* mark all involved sections as online */ online_mem_sections(start_pfn, end_pfn); } /* check which state of node_states will be changed when online memory */ static void node_states_check_changes_online(unsigned long nr_pages, struct zone *zone, struct memory_notify *arg) { int nid = zone_to_nid(zone); arg->status_change_nid = NUMA_NO_NODE; arg->status_change_nid_normal = NUMA_NO_NODE; if (!node_state(nid, N_MEMORY)) arg->status_change_nid = nid; if (zone_idx(zone) <= ZONE_NORMAL && !node_state(nid, N_NORMAL_MEMORY)) arg->status_change_nid_normal = nid; } static void node_states_set_node(int node, struct memory_notify *arg) { if (arg->status_change_nid_normal >= 0) node_set_state(node, N_NORMAL_MEMORY); if (arg->status_change_nid >= 0) node_set_state(node, N_MEMORY); } static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages) { unsigned long old_end_pfn = zone_end_pfn(zone); if (zone_is_empty(zone) || start_pfn < zone->zone_start_pfn) zone->zone_start_pfn = start_pfn; zone->spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - zone->zone_start_pfn; } static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned long start_pfn, unsigned long nr_pages) { unsigned long old_end_pfn = pgdat_end_pfn(pgdat); if (!pgdat->node_spanned_pages || start_pfn < pgdat->node_start_pfn) pgdat->node_start_pfn = start_pfn; pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - pgdat->node_start_pfn; } #ifdef CONFIG_ZONE_DEVICE static void section_taint_zone_device(unsigned long pfn) { struct mem_section *ms = __pfn_to_section(pfn); ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE; } #else static inline void section_taint_zone_device(unsigned long pfn) { } #endif /* * Associate the pfn range with the given zone, initializing the memmaps * and resizing the pgdat/zone data to span the added pages. After this * call, all affected pages are PageOffline(). * * All aligned pageblocks are initialized to the specified migratetype * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related * zone stats (e.g., nr_isolate_pageblock) are touched. */ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages, struct vmem_altmap *altmap, int migratetype) { struct pglist_data *pgdat = zone->zone_pgdat; int nid = pgdat->node_id; clear_zone_contiguous(zone); if (zone_is_empty(zone)) init_currently_empty_zone(zone, start_pfn, nr_pages); resize_zone_range(zone, start_pfn, nr_pages); resize_pgdat_range(pgdat, start_pfn, nr_pages); /* * Subsection population requires care in pfn_to_online_page(). * Set the taint to enable the slow path detection of * ZONE_DEVICE pages in an otherwise ZONE_{NORMAL,MOVABLE} * section. */ if (zone_is_zone_device(zone)) { if (!IS_ALIGNED(start_pfn, PAGES_PER_SECTION)) section_taint_zone_device(start_pfn); if (!IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION)) section_taint_zone_device(start_pfn + nr_pages); } /* * TODO now we have a visible range of pages which are not associated * with their zone properly. Not nice but set_pfnblock_flags_mask * expects the zone spans the pfn range. 
All the pages in the range * are reserved so nobody should be touching them so we should be safe */ memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0, MEMINIT_HOTPLUG, altmap, migratetype); set_zone_contiguous(zone); } struct auto_movable_stats { unsigned long kernel_early_pages; unsigned long movable_pages; }; static void auto_movable_stats_account_zone(struct auto_movable_stats *stats, struct zone *zone) { if (zone_idx(zone) == ZONE_MOVABLE) { stats->movable_pages += zone->present_pages; } else { stats->kernel_early_pages += zone->present_early_pages; #ifdef CONFIG_CMA /* * CMA pages (never on hotplugged memory) behave like * ZONE_MOVABLE. */ stats->movable_pages += zone->cma_pages; stats->kernel_early_pages -= zone->cma_pages; #endif /* CONFIG_CMA */ } } struct auto_movable_group_stats { unsigned long movable_pages; unsigned long req_kernel_early_pages; }; static int auto_movable_stats_account_group(struct memory_group *group, void *arg) { const int ratio = READ_ONCE(auto_movable_ratio); struct auto_movable_group_stats *stats = arg; long pages; /* * We don't support modifying the config while the auto-movable online * policy is already enabled. Just avoid the division by zero below. */ if (!ratio) return 0; /* * Calculate how many early kernel pages this group requires to * satisfy the configured zone ratio. */ pages = group->present_movable_pages * 100 / ratio; pages -= group->present_kernel_pages; if (pages > 0) stats->req_kernel_early_pages += pages; stats->movable_pages += group->present_movable_pages; return 0; } static bool auto_movable_can_online_movable(int nid, struct memory_group *group, unsigned long nr_pages) { unsigned long kernel_early_pages, movable_pages; struct auto_movable_group_stats group_stats = {}; struct auto_movable_stats stats = {}; struct zone *zone; int i; /* Walk all relevant zones and collect MOVABLE vs. KERNEL stats. */ if (nid == NUMA_NO_NODE) { /* TODO: cache values */ for_each_populated_zone(zone) auto_movable_stats_account_zone(&stats, zone); } else { for (i = 0; i < MAX_NR_ZONES; i++) { pg_data_t *pgdat = NODE_DATA(nid); zone = pgdat->node_zones + i; if (populated_zone(zone)) auto_movable_stats_account_zone(&stats, zone); } } kernel_early_pages = stats.kernel_early_pages; movable_pages = stats.movable_pages; /* * Kernel memory inside dynamic memory group allows for more MOVABLE * memory within the same group. Remove the effect of all but the * current group from the stats. */ walk_dynamic_memory_groups(nid, auto_movable_stats_account_group, group, &group_stats); if (kernel_early_pages <= group_stats.req_kernel_early_pages) return false; kernel_early_pages -= group_stats.req_kernel_early_pages; movable_pages -= group_stats.movable_pages; if (group && group->is_dynamic) kernel_early_pages += group->present_kernel_pages; /* * Test if we could online the given number of pages to ZONE_MOVABLE * and still stay in the configured ratio. */ movable_pages += nr_pages; return movable_pages <= (auto_movable_ratio * kernel_early_pages) / 100; } /* * Returns a default kernel memory zone for the given pfn range. * If no kernel zone covers this pfn range it will automatically go * to the ZONE_NORMAL. 
*/ static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn, unsigned long nr_pages) { struct pglist_data *pgdat = NODE_DATA(nid); int zid; for (zid = 0; zid < ZONE_NORMAL; zid++) { struct zone *zone = &pgdat->node_zones[zid]; if (zone_intersects(zone, start_pfn, nr_pages)) return zone; } return &pgdat->node_zones[ZONE_NORMAL]; } /* * Determine to which zone to online memory dynamically based on user * configuration and system stats. We care about the following ratio: * * MOVABLE : KERNEL * * Whereby MOVABLE is memory in ZONE_MOVABLE and KERNEL is memory in * one of the kernel zones. CMA pages inside one of the kernel zones really * behaves like ZONE_MOVABLE, so we treat them accordingly. * * We don't allow for hotplugged memory in a KERNEL zone to increase the * amount of MOVABLE memory we can have, so we end up with: * * MOVABLE : KERNEL_EARLY * * Whereby KERNEL_EARLY is memory in one of the kernel zones, available sinze * boot. We base our calculation on KERNEL_EARLY internally, because: * * a) Hotplugged memory in one of the kernel zones can sometimes still get * hotunplugged, especially when hot(un)plugging individual memory blocks. * There is no coordination across memory devices, therefore "automatic" * hotunplugging, as implemented in hypervisors, could result in zone * imbalances. * b) Early/boot memory in one of the kernel zones can usually not get * hotunplugged again (e.g., no firmware interface to unplug, fragmented * with unmovable allocations). While there are corner cases where it might * still work, it is barely relevant in practice. * * Exceptions are dynamic memory groups, which allow for more MOVABLE * memory within the same memory group -- because in that case, there is * coordination within the single memory device managed by a single driver. * * We rely on "present pages" instead of "managed pages", as the latter is * highly unreliable and dynamic in virtualized environments, and does not * consider boot time allocations. For example, memory ballooning adjusts the * managed pages when inflating/deflating the balloon, and balloon compaction * can even migrate inflated pages between zones. * * Using "present pages" is better but some things to keep in mind are: * * a) Some memblock allocations, such as for the crashkernel area, are * effectively unused by the kernel, yet they account to "present pages". * Fortunately, these allocations are comparatively small in relevant setups * (e.g., fraction of system memory). * b) Some hotplugged memory blocks in virtualized environments, esecially * hotplugged by virtio-mem, look like they are completely present, however, * only parts of the memory block are actually currently usable. * "present pages" is an upper limit that can get reached at runtime. As * we base our calculations on KERNEL_EARLY, this is not an issue. */ static struct zone *auto_movable_zone_for_pfn(int nid, struct memory_group *group, unsigned long pfn, unsigned long nr_pages) { unsigned long online_pages = 0, max_pages, end_pfn; struct page *page; if (!auto_movable_ratio) goto kernel_zone; if (group && !group->is_dynamic) { max_pages = group->s.max_pages; online_pages = group->present_movable_pages; /* If anything is !MOVABLE online the rest !MOVABLE. */ if (group->present_kernel_pages) goto kernel_zone; } else if (!group || group->d.unit_pages == nr_pages) { max_pages = nr_pages; } else { max_pages = group->d.unit_pages; /* * Take a look at all online sections in the current unit. 
* We can safely assume that all pages within a section belong * to the same zone, because dynamic memory groups only deal * with hotplugged memory. */ pfn = ALIGN_DOWN(pfn, group->d.unit_pages); end_pfn = pfn + group->d.unit_pages; for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) { page = pfn_to_online_page(pfn); if (!page) continue; /* If anything is !MOVABLE online the rest !MOVABLE. */ if (!is_zone_movable_page(page)) goto kernel_zone; online_pages += PAGES_PER_SECTION; } } /* * Online MOVABLE if we could *currently* online all remaining parts * MOVABLE. We expect to (add+) online them immediately next, so if * nobody interferes, all will be MOVABLE if possible. */ nr_pages = max_pages - online_pages; if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages)) goto kernel_zone; #ifdef CONFIG_NUMA if (auto_movable_numa_aware && !auto_movable_can_online_movable(nid, group, nr_pages)) goto kernel_zone; #endif /* CONFIG_NUMA */ return &NODE_DATA(nid)->node_zones[ZONE_MOVABLE]; kernel_zone: return default_kernel_zone_for_pfn(nid, pfn, nr_pages); } static inline struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn, unsigned long nr_pages) { struct zone *kernel_zone = default_kernel_zone_for_pfn(nid, start_pfn, nr_pages); struct zone *movable_zone = &NODE_DATA(nid)->node_zones[ZONE_MOVABLE]; bool in_kernel = zone_intersects(kernel_zone, start_pfn, nr_pages); bool in_movable = zone_intersects(movable_zone, start_pfn, nr_pages); /* * We inherit the existing zone in a simple case where zones do not * overlap in the given range */ if (in_kernel ^ in_movable) return (in_kernel) ? kernel_zone : movable_zone; /* * If the range doesn't belong to any zone or two zones overlap in the * given range then we use movable zone only if movable_node is * enabled because we always online to a kernel zone by default. */ return movable_node_enabled ? movable_zone : kernel_zone; } struct zone *zone_for_pfn_range(int online_type, int nid, struct memory_group *group, unsigned long start_pfn, unsigned long nr_pages) { if (online_type == MMOP_ONLINE_KERNEL) return default_kernel_zone_for_pfn(nid, start_pfn, nr_pages); if (online_type == MMOP_ONLINE_MOVABLE) return &NODE_DATA(nid)->node_zones[ZONE_MOVABLE]; if (online_policy == ONLINE_POLICY_AUTO_MOVABLE) return auto_movable_zone_for_pfn(nid, group, start_pfn, nr_pages); return default_zone_for_pfn(nid, start_pfn, nr_pages); } /* * This function should only be called by memory_block_{online,offline}, * and {online,offline}_pages. */ void adjust_present_page_count(struct page *page, struct memory_group *group, long nr_pages) { struct zone *zone = page_zone(page); const bool movable = zone_idx(zone) == ZONE_MOVABLE; /* * We only support onlining/offlining/adding/removing of complete * memory blocks; therefore, either all is either early or hotplugged. */ if (early_section(__pfn_to_section(page_to_pfn(page)))) zone->present_early_pages += nr_pages; zone->present_pages += nr_pages; zone->zone_pgdat->node_present_pages += nr_pages; if (group && movable) group->present_movable_pages += nr_pages; else if (group && !movable) group->present_kernel_pages += nr_pages; } int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, struct zone *zone, bool mhp_off_inaccessible) { unsigned long end_pfn = pfn + nr_pages; int ret, i; ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages)); if (ret) return ret; /* * Memory block is accessible at this stage and hence poison the struct * pages now. 
If the memory block is accessible during memory hotplug * addition phase, then page poisining is already performed in * sparse_add_section(). */ if (mhp_off_inaccessible) page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages); move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE); for (i = 0; i < nr_pages; i++) { struct page *page = pfn_to_page(pfn + i); __ClearPageOffline(page); SetPageVmemmapSelfHosted(page); } /* * It might be that the vmemmap_pages fully span sections. If that is * the case, mark those sections online here as otherwise they will be * left offline. */ if (nr_pages >= PAGES_PER_SECTION) online_mem_sections(pfn, ALIGN_DOWN(end_pfn, PAGES_PER_SECTION)); return ret; } void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages) { unsigned long end_pfn = pfn + nr_pages; /* * It might be that the vmemmap_pages fully span sections. If that is * the case, mark those sections offline here as otherwise they will be * left online. */ if (nr_pages >= PAGES_PER_SECTION) offline_mem_sections(pfn, ALIGN_DOWN(end_pfn, PAGES_PER_SECTION)); /* * The pages associated with this vmemmap have been offlined, so * we can reset its state here. */ remove_pfn_range_from_zone(page_zone(pfn_to_page(pfn)), pfn, nr_pages); kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages)); } /* * Must be called with mem_hotplug_lock in write mode. */ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, struct zone *zone, struct memory_group *group) { unsigned long flags; int need_zonelists_rebuild = 0; const int nid = zone_to_nid(zone); int ret; struct memory_notify arg; /* * {on,off}lining is constrained to full memory sections (or more * precisely to memory blocks from the user space POV). * memmap_on_memory is an exception because it reserves initial part * of the physical memory space for vmemmaps. That space is pageblock * aligned. */ if (WARN_ON_ONCE(!nr_pages || !pageblock_aligned(pfn) || !IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION))) return -EINVAL; /* associate pfn range with the zone */ move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE); arg.start_pfn = pfn; arg.nr_pages = nr_pages; node_states_check_changes_online(nr_pages, zone, &arg); ret = memory_notify(MEM_GOING_ONLINE, &arg); ret = notifier_to_errno(ret); if (ret) goto failed_addition; /* * Fixup the number of isolated pageblocks before marking the sections * onlining, such that undo_isolate_page_range() works correctly. */ spin_lock_irqsave(&zone->lock, flags); zone->nr_isolate_pageblock += nr_pages / pageblock_nr_pages; spin_unlock_irqrestore(&zone->lock, flags); /* * If this zone is not populated, then it is not in zonelist. * This means the page allocator ignores this zone. * So, zonelist must be updated after online. */ if (!populated_zone(zone)) { need_zonelists_rebuild = 1; setup_zone_pageset(zone); } online_pages_range(pfn, nr_pages); adjust_present_page_count(pfn_to_page(pfn), group, nr_pages); node_states_set_node(nid, &arg); if (need_zonelists_rebuild) build_all_zonelists(NULL); /* Basic onlining is complete, allow allocation of onlined pages. */ undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE); /* * Freshly onlined pages aren't shuffled (e.g., all pages are placed to * the tail of the freelist when undoing isolation). Shuffle the whole * zone to make sure the just onlined pages are properly distributed * across the whole freelist - to create an initial shuffle. 
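	 * (Editor's note: shuffle_zone() only has an effect when page
	 * allocator shuffling is enabled, e.g. via
	 * CONFIG_SHUFFLE_PAGE_ALLOCATOR; otherwise it is a no-op.)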
*/ shuffle_zone(zone); /* reinitialise watermarks and update pcp limits */ init_per_zone_wmark_min(); kswapd_run(nid); kcompactd_run(nid); writeback_set_ratelimit(); memory_notify(MEM_ONLINE, &arg); return 0; failed_addition: pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", (unsigned long long) pfn << PAGE_SHIFT, (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); memory_notify(MEM_CANCEL_ONLINE, &arg); remove_pfn_range_from_zone(zone, pfn, nr_pages); return ret; } /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ static pg_data_t __ref *hotadd_init_pgdat(int nid) { struct pglist_data *pgdat; /* * NODE_DATA is preallocated (free_area_init) but its internal * state is not allocated completely. Add missing pieces. * Completely offline nodes stay around and they just need * reintialization. */ pgdat = NODE_DATA(nid); /* init node's zones as empty zones, we don't have any present pages.*/ free_area_init_core_hotplug(pgdat); /* * The node we allocated has no zone fallback lists. For avoiding * to access not-initialized zonelist, build here. */ build_all_zonelists(pgdat); return pgdat; } /* * __try_online_node - online a node if offlined * @nid: the node ID * @set_node_online: Whether we want to online the node * called by cpu_up() to online a node without onlined memory. * * Returns: * 1 -> a new node has been allocated * 0 -> the node is already online * -ENOMEM -> the node could not be allocated */ static int __try_online_node(int nid, bool set_node_online) { pg_data_t *pgdat; int ret = 1; if (node_online(nid)) return 0; pgdat = hotadd_init_pgdat(nid); if (!pgdat) { pr_err("Cannot online node %d due to NULL pgdat\n", nid); ret = -ENOMEM; goto out; } if (set_node_online) { node_set_online(nid); ret = register_one_node(nid); BUG_ON(ret); } out: return ret; } /* * Users of this function always want to online/register the node */ int try_online_node(int nid) { int ret; mem_hotplug_begin(); ret = __try_online_node(nid, true); mem_hotplug_done(); return ret; } static int check_hotplug_memory_range(u64 start, u64 size) { /* memory range must be block size aligned */ if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) || !IS_ALIGNED(size, memory_block_size_bytes())) { pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx", memory_block_size_bytes(), start, size); return -EINVAL; } return 0; } static int online_memory_block(struct memory_block *mem, void *arg) { mem->online_type = mhp_default_online_type; return device_online(&mem->dev); } #ifndef arch_supports_memmap_on_memory static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size) { /* * As default, we want the vmemmap to span a complete PMD such that we * can map the vmemmap using a single PMD if supported by the * architecture. 
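	 *
	 * Worked example (editor's note; numbers assume x86-64 defaults of
	 * 4 KiB pages, a 64 byte struct page and a 128 MiB memory block):
	 * the block covers 32768 pages, so its vmemmap needs
	 * 32768 * 64 bytes = 2 MiB, i.e. exactly one 2 MiB PMD, and the
	 * alignment check below succeeds.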
*/ return IS_ALIGNED(vmemmap_size, PMD_SIZE); } #endif bool mhp_supports_memmap_on_memory(void) { unsigned long vmemmap_size = memory_block_memmap_size(); unsigned long memmap_pages = memory_block_memmap_on_memory_pages(); /* * Besides having arch support and the feature enabled at runtime, we * need a few more assumptions to hold true: * * a) The vmemmap pages span complete PMDs: We don't want vmemmap code * to populate memory from the altmap for unrelated parts (i.e., * other memory blocks) * * b) The vmemmap pages (and thereby the pages that will be exposed to * the buddy) have to cover full pageblocks: memory onlining/offlining * code requires applicable ranges to be page-aligned, for example, to * set the migratetypes properly. * * TODO: Although we have a check here to make sure that vmemmap pages * fully populate a PMD, it is not the right place to check for * this. A much better solution involves improving vmemmap code * to fallback to base pages when trying to populate vmemmap using * altmap as an alternative source of memory, and we do not exactly * populate a single PMD. */ if (!mhp_memmap_on_memory()) return false; /* * Make sure the vmemmap allocation is fully contained * so that we always allocate vmemmap memory from altmap area. */ if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE)) return false; /* * start pfn should be pageblock_nr_pages aligned for correctly * setting migrate types */ if (!pageblock_aligned(memmap_pages)) return false; if (memmap_pages == PHYS_PFN(memory_block_size_bytes())) /* No effective hotplugged memory doesn't make sense. */ return false; return arch_supports_memmap_on_memory(vmemmap_size); } EXPORT_SYMBOL_GPL(mhp_supports_memmap_on_memory); static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size) { unsigned long memblock_size = memory_block_size_bytes(); u64 cur_start; /* * For memmap_on_memory, the altmaps were added on a per-memblock * basis; we have to process each individual memory block. */ for (cur_start = start; cur_start < start + size; cur_start += memblock_size) { struct vmem_altmap *altmap = NULL; struct memory_block *mem; mem = find_memory_block(pfn_to_section_nr(PFN_DOWN(cur_start))); if (WARN_ON_ONCE(!mem)) continue; altmap = mem->altmap; mem->altmap = NULL; remove_memory_block_devices(cur_start, memblock_size); arch_remove_memory(cur_start, memblock_size, altmap); /* Verify that all vmemmap pages have actually been freed. 
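		 * (Editor's note: a non-zero altmap->alloc at this point means
		 * some vmemmap pages allocated from this altmap were never
		 * handed back when the memmap was torn down.)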
		 */
		WARN(altmap->alloc, "Altmap not fully unmapped");
		kfree(altmap);
	}
}

static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
					    u64 start, u64 size, mhp_t mhp_flags)
{
	unsigned long memblock_size = memory_block_size_bytes();
	u64 cur_start;
	int ret;

	for (cur_start = start; cur_start < start + size;
	     cur_start += memblock_size) {
		struct mhp_params params = { .pgprot =
						pgprot_mhp(PAGE_KERNEL) };
		struct vmem_altmap mhp_altmap = {
			.base_pfn = PHYS_PFN(cur_start),
			.end_pfn = PHYS_PFN(cur_start + memblock_size - 1),
		};

		mhp_altmap.free = memory_block_memmap_on_memory_pages();
		if (mhp_flags & MHP_OFFLINE_INACCESSIBLE)
			mhp_altmap.inaccessible = true;
		params.altmap = kmemdup(&mhp_altmap, sizeof(struct vmem_altmap),
					GFP_KERNEL);
		if (!params.altmap) {
			ret = -ENOMEM;
			goto out;
		}

		/* call arch's memory hotadd */
		ret = arch_add_memory(nid, cur_start, memblock_size, &params);
		if (ret < 0) {
			kfree(params.altmap);
			goto out;
		}

		/* create memory block devices after memory was added */
		ret = create_memory_block_devices(cur_start, memblock_size,
						  params.altmap, group);
		if (ret) {
			arch_remove_memory(cur_start, memblock_size, NULL);
			kfree(params.altmap);
			goto out;
		}
	}

	return 0;
out:
	if (ret && cur_start != start)
		remove_memory_blocks_and_altmaps(start, cur_start - start);
	return ret;
}

/*
 * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
 * and online/offline operations (triggered e.g. by sysfs).
 *
 * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
 */
int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
{
	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
	struct memory_group *group = NULL;
	u64 start, size;
	bool new_node = false;
	int ret;

	start = res->start;
	size = resource_size(res);

	ret = check_hotplug_memory_range(start, size);
	if (ret)
		return ret;

	if (mhp_flags & MHP_NID_IS_MGID) {
		group = memory_group_find_by_id(nid);
		if (!group)
			return -EINVAL;
		nid = group->nid;
	}

	if (!node_possible(nid)) {
		WARN(1, "node %d was absent from the node_possible_map\n", nid);
		return -EINVAL;
	}

	mem_hotplug_begin();

	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
		if (res->flags & IORESOURCE_SYSRAM_DRIVER_MANAGED)
			memblock_flags = MEMBLOCK_DRIVER_MANAGED;
		ret = memblock_add_node(start, size, nid, memblock_flags);
		if (ret)
			goto error_mem_hotplug_end;
	}

	ret = __try_online_node(nid, false);
	if (ret < 0)
		goto error;
	new_node = ret;

	/*
	 * Self hosted memmap array
	 */
	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
	    mhp_supports_memmap_on_memory()) {
		ret = create_altmaps_and_memory_blocks(nid, group, start, size, mhp_flags);
		if (ret)
			goto error;
	} else {
		ret = arch_add_memory(nid, start, size, &params);
		if (ret < 0)
			goto error;

		/* create memory block devices after memory was added */
		ret = create_memory_block_devices(start, size, NULL, group);
		if (ret) {
			arch_remove_memory(start, size, params.altmap);
			goto error;
		}
	}

	if (new_node) {
		/* If sysfs file of new node can't be created, cpu on the node
		 * can't be hot-added. There is no rollback way now.
		 * So, check by BUG_ON() to catch it reluctantly..
		 * We online node here. We can't roll back from here.
*/ node_set_online(nid); ret = __register_one_node(nid); BUG_ON(ret); } register_memory_blocks_under_node(nid, PFN_DOWN(start), PFN_UP(start + size - 1), MEMINIT_HOTPLUG); /* create new memmap entry */ if (!strcmp(res->name, "System RAM")) firmware_map_add_hotplug(start, start + size, "System RAM"); /* device_online() will take the lock when calling online_pages() */ mem_hotplug_done(); /* * In case we're allowed to merge the resource, flag it and trigger * merging now that adding succeeded. */ if (mhp_flags & MHP_MERGE_RESOURCE) merge_system_ram_resource(res); /* online pages if requested */ if (mhp_default_online_type != MMOP_OFFLINE) walk_memory_blocks(start, size, NULL, online_memory_block); return ret; error: if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) memblock_remove(start, size); error_mem_hotplug_end: mem_hotplug_done(); return ret; } /* requires device_hotplug_lock, see add_memory_resource() */ int __ref __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags) { struct resource *res; int ret; res = register_memory_resource(start, size, "System RAM"); if (IS_ERR(res)) return PTR_ERR(res); ret = add_memory_resource(nid, res, mhp_flags); if (ret < 0) release_memory_resource(res); return ret; } int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags) { int rc; lock_device_hotplug(); rc = __add_memory(nid, start, size, mhp_flags); unlock_device_hotplug(); return rc; } EXPORT_SYMBOL_GPL(add_memory); /* * Add special, driver-managed memory to the system as system RAM. Such * memory is not exposed via the raw firmware-provided memmap as system * RAM, instead, it is detected and added by a driver - during cold boot, * after a reboot, and after kexec. * * Reasons why this memory should not be used for the initial memmap of a * kexec kernel or for placing kexec images: * - The booting kernel is in charge of determining how this memory will be * used (e.g., use persistent memory as system RAM) * - Coordination with a hypervisor is required before this memory * can be used (e.g., inaccessible parts). * * For this memory, no entries in /sys/firmware/memmap ("raw firmware-provided * memory map") are created. Also, the created memory resource is flagged * with IORESOURCE_SYSRAM_DRIVER_MANAGED, so in-kernel users can special-case * this memory as well (esp., not place kexec images onto it). * * The resource_name (visible via /proc/iomem) has to have the format * "System RAM ($DRIVER)". */ int add_memory_driver_managed(int nid, u64 start, u64 size, const char *resource_name, mhp_t mhp_flags) { struct resource *res; int rc; if (!resource_name || strstr(resource_name, "System RAM (") != resource_name || resource_name[strlen(resource_name) - 1] != ')') return -EINVAL; lock_device_hotplug(); res = register_memory_resource(start, size, resource_name); if (IS_ERR(res)) { rc = PTR_ERR(res); goto out_unlock; } rc = add_memory_resource(nid, res, mhp_flags); if (rc < 0) release_memory_resource(res); out_unlock: unlock_device_hotplug(); return rc; } EXPORT_SYMBOL_GPL(add_memory_driver_managed); /* * Platforms should define arch_get_mappable_range() that provides * maximum possible addressable physical memory range for which the * linear mapping could be created. The platform returned address * range must adhere to these following semantics. * * - range.start <= range.end * - Range includes both end points [range.start..range.end] * * There is also a fallback definition provided here, allowing the * entire possible physical address range in case any platform does * not define arch_get_mappable_range(). 
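 *
 * As an illustration only (editor's sketch, not taken from a particular
 * architecture; MY_PLAT_MAX_PHYS is a made-up limit), a platform whose
 * linear mapping ends at some physical address would override the weak
 * default roughly like this:
 *
 *	struct range arch_get_mappable_range(void)
 *	{
 *		return (struct range) {
 *			.start	= 0,
 *			.end	= MY_PLAT_MAX_PHYS - 1,
 *		};
 *	}
 *
 * mhp_range_allowed() below then rejects hot(un)plug requests that need a
 * linear mapping but fall outside of that range.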
*/ struct range __weak arch_get_mappable_range(void) { struct range mhp_range = { .start = 0UL, .end = -1ULL, }; return mhp_range; } struct range mhp_get_pluggable_range(bool need_mapping) { const u64 max_phys = (1ULL << MAX_PHYSMEM_BITS) - 1; struct range mhp_range; if (need_mapping) { mhp_range = arch_get_mappable_range(); if (mhp_range.start > max_phys) { mhp_range.start = 0; mhp_range.end = 0; } mhp_range.end = min_t(u64, mhp_range.end, max_phys); } else { mhp_range.start = 0; mhp_range.end = max_phys; } return mhp_range; } EXPORT_SYMBOL_GPL(mhp_get_pluggable_range); bool mhp_range_allowed(u64 start, u64 size, bool need_mapping) { struct range mhp_range = mhp_get_pluggable_range(need_mapping); u64 end = start + size; if (start < end && start >= mhp_range.start && (end - 1) <= mhp_range.end) return true; pr_warn("Hotplug memory [%#llx-%#llx] exceeds maximum addressable range [%#llx-%#llx]\n", start, end, mhp_range.start, mhp_range.end); return false; } #ifdef CONFIG_MEMORY_HOTREMOVE /* * Scan pfn range [start,end) to find movable/migratable pages (LRU pages, * non-lru movable pages and hugepages). Will skip over most unmovable * pages (esp., pages that can be skipped when offlining), but bail out on * definitely unmovable pages. * * Returns: * 0 in case a movable page is found and movable_pfn was updated. * -ENOENT in case no movable page was found. * -EBUSY in case a definitely unmovable page was found. */ static int scan_movable_pages(unsigned long start, unsigned long end, unsigned long *movable_pfn) { unsigned long pfn; for (pfn = start; pfn < end; pfn++) { struct page *page; struct folio *folio; if (!pfn_valid(pfn)) continue; page = pfn_to_page(pfn); if (PageLRU(page)) goto found; if (__PageMovable(page)) goto found; /* * PageOffline() pages that are not marked __PageMovable() and * have a reference count > 0 (after MEM_GOING_OFFLINE) are * definitely unmovable. If their reference count would be 0, * they could at least be skipped when offlining memory. */ if (PageOffline(page) && page_count(page)) return -EBUSY; if (!PageHuge(page)) continue; folio = page_folio(page); /* * This test is racy as we hold no reference or lock. The * hugetlb page could have been free'ed and head is no longer * a hugetlb page before the following check. In such unlikely * cases false positives and negatives are possible. Calling * code must deal with these scenarios. */ if (folio_test_hugetlb_migratable(folio)) goto found; pfn |= folio_nr_pages(folio) - 1; } return -ENOENT; found: *movable_pfn = pfn; return 0; } static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) { unsigned long pfn; struct page *page, *head; LIST_HEAD(source); static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); for (pfn = start_pfn; pfn < end_pfn; pfn++) { struct folio *folio; bool isolated; if (!pfn_valid(pfn)) continue; page = pfn_to_page(pfn); folio = page_folio(page); head = &folio->page; if (PageHuge(page)) { pfn = page_to_pfn(head) + compound_nr(head) - 1; isolate_hugetlb(folio, &source); continue; } else if (PageTransHuge(page)) pfn = page_to_pfn(head) + thp_nr_pages(page) - 1; /* * HWPoison pages have elevated reference counts so the migration would * fail on them. It also doesn't make any sense to migrate them in the * first place. Still try to unmap such a page in case it is still mapped * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep * the unmap as the catch all safety net). 
*/ if (PageHWPoison(page)) { if (WARN_ON(folio_test_lru(folio))) folio_isolate_lru(folio); if (folio_mapped(folio)) try_to_unmap(folio, TTU_IGNORE_MLOCK); continue; } if (!get_page_unless_zero(page)) continue; /* * We can skip free pages. And we can deal with pages on * LRU and non-lru movable pages. */ if (PageLRU(page)) isolated = isolate_lru_page(page); else isolated = isolate_movable_page(page, ISOLATE_UNEVICTABLE); if (isolated) { list_add_tail(&page->lru, &source); if (!__PageMovable(page)) inc_node_page_state(page, NR_ISOLATED_ANON + page_is_file_lru(page)); } else { if (__ratelimit(&migrate_rs)) { pr_warn("failed to isolate pfn %lx\n", pfn); dump_page(page, "isolation failed"); } } put_page(page); } if (!list_empty(&source)) { nodemask_t nmask = node_states[N_MEMORY]; struct migration_target_control mtc = { .nmask = &nmask, .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL, .reason = MR_MEMORY_HOTPLUG, }; int ret; /* * We have checked that migration range is on a single zone so * we can use the nid of the first page to all the others. */ mtc.nid = page_to_nid(list_first_entry(&source, struct page, lru)); /* * try to allocate from a different node but reuse this node * if there are no other online nodes to be used (e.g. we are * offlining a part of the only existing node) */ node_clear(mtc.nid, nmask); if (nodes_empty(nmask)) node_set(mtc.nid, nmask); ret = migrate_pages(&source, alloc_migration_target, NULL, (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_HOTPLUG, NULL); if (ret) { list_for_each_entry(page, &source, lru) { if (__ratelimit(&migrate_rs)) { pr_warn("migrating pfn %lx failed ret:%d\n", page_to_pfn(page), ret); dump_page(page, "migration failure"); } } putback_movable_pages(&source); } } } static int __init cmdline_parse_movable_node(char *p) { movable_node_enabled = true; return 0; } early_param("movable_node", cmdline_parse_movable_node); /* check which state of node_states will be changed when offline memory */ static void node_states_check_changes_offline(unsigned long nr_pages, struct zone *zone, struct memory_notify *arg) { struct pglist_data *pgdat = zone->zone_pgdat; unsigned long present_pages = 0; enum zone_type zt; arg->status_change_nid = NUMA_NO_NODE; arg->status_change_nid_normal = NUMA_NO_NODE; /* * Check whether node_states[N_NORMAL_MEMORY] will be changed. * If the memory to be offline is within the range * [0..ZONE_NORMAL], and it is the last present memory there, * the zones in that range will become empty after the offlining, * thus we can determine that we need to clear the node from * node_states[N_NORMAL_MEMORY]. */ for (zt = 0; zt <= ZONE_NORMAL; zt++) present_pages += pgdat->node_zones[zt].present_pages; if (zone_idx(zone) <= ZONE_NORMAL && nr_pages >= present_pages) arg->status_change_nid_normal = zone_to_nid(zone); /* * We have accounted the pages from [0..ZONE_NORMAL); ZONE_HIGHMEM * does not apply as we don't support 32bit. * Here we count the possible pages from ZONE_MOVABLE. * If after having accounted all the pages, we see that the nr_pages * to be offlined is over or equal to the accounted pages, * we know that the node will become empty, and so, we can clear * it for N_MEMORY as well. 
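	 *
	 * Worked example (editor's note): on a node with empty kernel zones
	 * and 1 GiB in ZONE_MOVABLE, offlining that whole 1 GiB makes
	 * nr_pages equal to the accounted pages, so status_change_nid is set
	 * and the node is cleared from N_MEMORY later on;
	 * status_change_nid_normal stays NUMA_NO_NODE because the zone being
	 * offlined is not a kernel zone.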
*/ present_pages += pgdat->node_zones[ZONE_MOVABLE].present_pages; if (nr_pages >= present_pages) arg->status_change_nid = zone_to_nid(zone); } static void node_states_clear_node(int node, struct memory_notify *arg) { if (arg->status_change_nid_normal >= 0) node_clear_state(node, N_NORMAL_MEMORY); if (arg->status_change_nid >= 0) node_clear_state(node, N_MEMORY); } static int count_system_ram_pages_cb(unsigned long start_pfn, unsigned long nr_pages, void *data) { unsigned long *nr_system_ram_pages = data; *nr_system_ram_pages += nr_pages; return 0; } /* * Must be called with mem_hotplug_lock in write mode. */ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, struct zone *zone, struct memory_group *group) { const unsigned long end_pfn = start_pfn + nr_pages; unsigned long pfn, managed_pages, system_ram_pages = 0; const int node = zone_to_nid(zone); unsigned long flags; struct memory_notify arg; char *reason; int ret; /* * {on,off}lining is constrained to full memory sections (or more * precisely to memory blocks from the user space POV). * memmap_on_memory is an exception because it reserves initial part * of the physical memory space for vmemmaps. That space is pageblock * aligned. */ if (WARN_ON_ONCE(!nr_pages || !pageblock_aligned(start_pfn) || !IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION))) return -EINVAL; /* * Don't allow to offline memory blocks that contain holes. * Consequently, memory blocks with holes can never get onlined * via the hotplug path - online_pages() - as hotplugged memory has * no holes. This way, we don't have to worry about memory holes, * don't need pfn_valid() checks, and can avoid using * walk_system_ram_range() later. */ walk_system_ram_range(start_pfn, nr_pages, &system_ram_pages, count_system_ram_pages_cb); if (system_ram_pages != nr_pages) { ret = -EINVAL; reason = "memory holes"; goto failed_removal; } /* * We only support offlining of memory blocks managed by a single zone, * checked by calling code. This is just a sanity check that we might * want to remove in the future. */ if (WARN_ON_ONCE(page_zone(pfn_to_page(start_pfn)) != zone || page_zone(pfn_to_page(end_pfn - 1)) != zone)) { ret = -EINVAL; reason = "multizone range"; goto failed_removal; } /* * Disable pcplists so that page isolation cannot race with freeing * in a way that pages from isolated pageblock are left on pcplists. */ zone_pcp_disable(zone); lru_cache_disable(); /* set above range as isolated */ ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, MEMORY_OFFLINE | REPORT_FAILURE, GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL); if (ret) { reason = "failure to isolate range"; goto failed_removal_pcplists_disabled; } arg.start_pfn = start_pfn; arg.nr_pages = nr_pages; node_states_check_changes_offline(nr_pages, zone, &arg); ret = memory_notify(MEM_GOING_OFFLINE, &arg); ret = notifier_to_errno(ret); if (ret) { reason = "notifier failure"; goto failed_removal_isolated; } do { pfn = start_pfn; do { /* * Historically we always checked for any signal and * can't limit it to fatal signals without eventually * breaking user space. 
*/ if (signal_pending(current)) { ret = -EINTR; reason = "signal backoff"; goto failed_removal_isolated; } cond_resched(); ret = scan_movable_pages(pfn, end_pfn, &pfn); if (!ret) { /* * TODO: fatal migration failures should bail * out */ do_migrate_range(pfn, end_pfn); } } while (!ret); if (ret != -ENOENT) { reason = "unmovable page"; goto failed_removal_isolated; } /* * Dissolve free hugetlb folios in the memory block before doing * offlining actually in order to make hugetlbfs's object * counting consistent. */ ret = dissolve_free_hugetlb_folios(start_pfn, end_pfn); if (ret) { reason = "failure to dissolve huge pages"; goto failed_removal_isolated; } ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE); } while (ret); /* Mark all sections offline and remove free pages from the buddy. */ managed_pages = __offline_isolated_pages(start_pfn, end_pfn); pr_debug("Offlined Pages %ld\n", nr_pages); /* * The memory sections are marked offline, and the pageblock flags * effectively stale; nobody should be touching them. Fixup the number * of isolated pageblocks, memory onlining will properly revert this. */ spin_lock_irqsave(&zone->lock, flags); zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages; spin_unlock_irqrestore(&zone->lock, flags); lru_cache_enable(); zone_pcp_enable(zone); /* removal success */ adjust_managed_page_count(pfn_to_page(start_pfn), -managed_pages); adjust_present_page_count(pfn_to_page(start_pfn), group, -nr_pages); /* reinitialise watermarks and update pcp limits */ init_per_zone_wmark_min(); /* * Make sure to mark the node as memory-less before rebuilding the zone * list. Otherwise this node would still appear in the fallback lists. */ node_states_clear_node(node, &arg); if (!populated_zone(zone)) { zone_pcp_reset(zone); build_all_zonelists(NULL); } if (arg.status_change_nid >= 0) { kcompactd_stop(node); kswapd_stop(node); } writeback_set_ratelimit(); memory_notify(MEM_OFFLINE, &arg); remove_pfn_range_from_zone(zone, start_pfn, nr_pages); return 0; failed_removal_isolated: /* pushback to free area */ undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); memory_notify(MEM_CANCEL_OFFLINE, &arg); failed_removal_pcplists_disabled: lru_cache_enable(); zone_pcp_enable(zone); failed_removal: pr_debug("memory offlining [mem %#010llx-%#010llx] failed due to %s\n", (unsigned long long) start_pfn << PAGE_SHIFT, ((unsigned long long) end_pfn << PAGE_SHIFT) - 1, reason); return ret; } static int check_memblock_offlined_cb(struct memory_block *mem, void *arg) { int *nid = arg; *nid = mem->nid; if (unlikely(mem->state != MEM_OFFLINE)) { phys_addr_t beginpa, endpa; beginpa = PFN_PHYS(section_nr_to_pfn(mem->start_section_nr)); endpa = beginpa + memory_block_size_bytes() - 1; pr_warn("removing memory fails, because memory [%pa-%pa] is onlined\n", &beginpa, &endpa); return -EBUSY; } return 0; } static int count_memory_range_altmaps_cb(struct memory_block *mem, void *arg) { u64 *num_altmaps = (u64 *)arg; if (mem->altmap) *num_altmaps += 1; return 0; } static int check_cpu_on_node(int nid) { int cpu; for_each_present_cpu(cpu) { if (cpu_to_node(cpu) == nid) /* * the cpu on this node isn't removed, and we can't * offline this node. */ return -EBUSY; } return 0; } static int check_no_memblock_for_node_cb(struct memory_block *mem, void *arg) { int nid = *(int *)arg; /* * If a memory block belongs to multiple nodes, the stored nid is not * reliable. However, such blocks are always online (e.g., cannot get * offlined) and, therefore, are still spanned by the node. 
*/ return mem->nid == nid ? -EEXIST : 0; } /** * try_offline_node * @nid: the node ID * * Offline a node if all memory sections and cpus of the node are removed. * * NOTE: The caller must call lock_device_hotplug() to serialize hotplug * and online/offline operations before this call. */ void try_offline_node(int nid) { int rc; /* * If the node still spans pages (especially ZONE_DEVICE), don't * offline it. A node spans memory after move_pfn_range_to_zone(), * e.g., after the memory block was onlined. */ if (node_spanned_pages(nid)) return; /* * Especially offline memory blocks might not be spanned by the * node. They will get spanned by the node once they get onlined. * However, they link to the node in sysfs and can get onlined later. */ rc = for_each_memory_block(&nid, check_no_memblock_for_node_cb); if (rc) return; if (check_cpu_on_node(nid)) return; /* * all memory/cpu of this node are removed, we can offline this * node now. */ node_set_offline(nid); unregister_one_node(nid); } EXPORT_SYMBOL(try_offline_node); static int memory_blocks_have_altmaps(u64 start, u64 size) { u64 num_memblocks = size / memory_block_size_bytes(); u64 num_altmaps = 0; if (!mhp_memmap_on_memory()) return 0; walk_memory_blocks(start, size, &num_altmaps, count_memory_range_altmaps_cb); if (num_altmaps == 0) return 0; if (WARN_ON_ONCE(num_memblocks != num_altmaps)) return -EINVAL; return 1; } static int __ref try_remove_memory(u64 start, u64 size) { int rc, nid = NUMA_NO_NODE; BUG_ON(check_hotplug_memory_range(start, size)); /* * All memory blocks must be offlined before removing memory. Check * whether all memory blocks in question are offline and return error * if this is not the case. * * While at it, determine the nid. Note that if we'd have mixed nodes, * we'd only try to offline the last determined one -- which is good * enough for the cases we care about. */ rc = walk_memory_blocks(start, size, &nid, check_memblock_offlined_cb); if (rc) return rc; /* remove memmap entry */ firmware_map_remove(start, start + size, "System RAM"); mem_hotplug_begin(); rc = memory_blocks_have_altmaps(start, size); if (rc < 0) { mem_hotplug_done(); return rc; } else if (!rc) { /* * Memory block device removal under the device_hotplug_lock is * a barrier against racing online attempts. * No altmaps present, do the removal directly */ remove_memory_block_devices(start, size); arch_remove_memory(start, size, NULL); } else { /* all memblocks in the range have altmaps */ remove_memory_blocks_and_altmaps(start, size); } if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) memblock_remove(start, size); release_mem_region_adjustable(start, size); if (nid != NUMA_NO_NODE) try_offline_node(nid); mem_hotplug_done(); return 0; } /** * __remove_memory - Remove memory if every memory block is offline * @start: physical address of the region to remove * @size: size of the region to remove * * NOTE: The caller must call lock_device_hotplug() to serialize hotplug * and online/offline operations before this call, as required by * try_offline_node(). 
*/ void __remove_memory(u64 start, u64 size) { /* * trigger BUG() if some memory is not offlined prior to calling this * function */ if (try_remove_memory(start, size)) BUG(); } /* * Remove memory if every memory block is offline, otherwise return -EBUSY is * some memory is not offline */ int remove_memory(u64 start, u64 size) { int rc; lock_device_hotplug(); rc = try_remove_memory(start, size); unlock_device_hotplug(); return rc; } EXPORT_SYMBOL_GPL(remove_memory); static int try_offline_memory_block(struct memory_block *mem, void *arg) { uint8_t online_type = MMOP_ONLINE_KERNEL; uint8_t **online_types = arg; struct page *page; int rc; /* * Sense the online_type via the zone of the memory block. Offlining * with multiple zones within one memory block will be rejected * by offlining code ... so we don't care about that. */ page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr)); if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE) online_type = MMOP_ONLINE_MOVABLE; rc = device_offline(&mem->dev); /* * Default is MMOP_OFFLINE - change it only if offlining succeeded, * so try_reonline_memory_block() can do the right thing. */ if (!rc) **online_types = online_type; (*online_types)++; /* Ignore if already offline. */ return rc < 0 ? rc : 0; } static int try_reonline_memory_block(struct memory_block *mem, void *arg) { uint8_t **online_types = arg; int rc; if (**online_types != MMOP_OFFLINE) { mem->online_type = **online_types; rc = device_online(&mem->dev); if (rc < 0) pr_warn("%s: Failed to re-online memory: %d", __func__, rc); } /* Continue processing all remaining memory blocks. */ (*online_types)++; return 0; } /* * Try to offline and remove memory. Might take a long time to finish in case * memory is still in use. Primarily useful for memory devices that logically * unplugged all memory (so it's no longer in use) and want to offline + remove * that memory. */ int offline_and_remove_memory(u64 start, u64 size) { const unsigned long mb_count = size / memory_block_size_bytes(); uint8_t *online_types, *tmp; int rc; if (!IS_ALIGNED(start, memory_block_size_bytes()) || !IS_ALIGNED(size, memory_block_size_bytes()) || !size) return -EINVAL; /* * We'll remember the old online type of each memory block, so we can * try to revert whatever we did when offlining one memory block fails * after offlining some others succeeded. */ online_types = kmalloc_array(mb_count, sizeof(*online_types), GFP_KERNEL); if (!online_types) return -ENOMEM; /* * Initialize all states to MMOP_OFFLINE, so when we abort processing in * try_offline_memory_block(), we'll skip all unprocessed blocks in * try_reonline_memory_block(). */ memset(online_types, MMOP_OFFLINE, mb_count); lock_device_hotplug(); tmp = online_types; rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block); /* * In case we succeeded to offline all memory, remove it. * This cannot fail as it cannot get onlined in the meantime. */ if (!rc) { rc = try_remove_memory(start, size); if (rc) pr_err("%s: Failed to remove memory: %d", __func__, rc); } /* * Rollback what we did. While memory onlining might theoretically fail * (nacked by a notifier), it barely ever happens. */ if (rc) { tmp = online_types; walk_memory_blocks(start, size, &tmp, try_reonline_memory_block); } unlock_device_hotplug(); kfree(online_types); return rc; } EXPORT_SYMBOL_GPL(offline_and_remove_memory); #endif /* CONFIG_MEMORY_HOTREMOVE */ |
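/*
 * Editor's illustration (not part of memory_hotplug.c): a minimal sketch of
 * how a hypothetical driver would consume the interfaces above to add
 * driver-detected memory and remove it again later. The mydrv_* names are
 * made up; add_memory_driver_managed() and offline_and_remove_memory() are
 * used with the signatures defined in this file, and both take the device
 * hotplug lock internally.
 */
#include <linux/memory_hotplug.h>
#include <linux/printk.h>

static int mydrv_plug_region(int nid, u64 start, u64 size)
{
	/*
	 * Expose the region as driver-managed system RAM; the resource name
	 * must have the form "System RAM ($DRIVER)", as checked by
	 * add_memory_driver_managed().
	 */
	return add_memory_driver_managed(nid, start, size,
					 "System RAM (mydrv)",
					 MHP_MERGE_RESOURCE);
}

static int mydrv_unplug_region(u64 start, u64 size)
{
	int rc;

	/* Offline every memory block in the range, then remove it. */
	rc = offline_and_remove_memory(start, size);
	if (rc)
		pr_warn("mydrv: memory %#llx-%#llx still in use (%d)\n",
			(unsigned long long)start,
			(unsigned long long)(start + size - 1), rc);
	return rc;
}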
// SPDX-License-Identifier: GPL-2.0+
/*
 * NILFS disk address translation.
 *
 * Copyright (C) 2006-2008 Nippon Telegraph and Telephone Corporation.
 *
 * Written by Koji Sato.
 */

#include <linux/types.h>
#include <linux/buffer_head.h>
#include <linux/string.h>
#include <linux/errno.h>
#include "nilfs.h"
#include "mdt.h"
#include "alloc.h"
#include "dat.h"


#define NILFS_CNO_MIN	((__u64)1)
#define NILFS_CNO_MAX	(~(__u64)0)

/**
 * struct nilfs_dat_info - on-memory private data of DAT file
 * @mi: on-memory private data of metadata file
 * @palloc_cache: persistent object allocator cache of DAT file
 * @shadow: shadow map of DAT file
 */
struct nilfs_dat_info {
	struct nilfs_mdt_info mi;
	struct nilfs_palloc_cache palloc_cache;
	struct nilfs_shadow_map shadow;
};

static inline struct nilfs_dat_info *NILFS_DAT_I(struct inode *dat)
{
	return (struct nilfs_dat_info *)NILFS_MDT(dat);
}

static int nilfs_dat_prepare_entry(struct inode *dat,
				   struct nilfs_palloc_req *req, int create)
{
	int ret;

	ret = nilfs_palloc_get_entry_block(dat, req->pr_entry_nr,
					   create, &req->pr_entry_bh);
	if (unlikely(ret == -ENOENT)) {
		nilfs_err(dat->i_sb,
			  "DAT doesn't have a block to manage vblocknr = %llu",
			  (unsigned long long)req->pr_entry_nr);
		/*
		 * Return internal code -EINVAL to notify bmap layer of
		 * metadata corruption.
*/ ret = -EINVAL; } return ret; } static void nilfs_dat_commit_entry(struct inode *dat, struct nilfs_palloc_req *req) { mark_buffer_dirty(req->pr_entry_bh); nilfs_mdt_mark_dirty(dat); brelse(req->pr_entry_bh); } static void nilfs_dat_abort_entry(struct inode *dat, struct nilfs_palloc_req *req) { brelse(req->pr_entry_bh); } int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req) { int ret; ret = nilfs_palloc_prepare_alloc_entry(dat, req, true); if (ret < 0) return ret; ret = nilfs_dat_prepare_entry(dat, req, 1); if (ret < 0) nilfs_palloc_abort_alloc_entry(dat, req); return ret; } void nilfs_dat_commit_alloc(struct inode *dat, struct nilfs_palloc_req *req) { struct nilfs_dat_entry *entry; void *kaddr; kaddr = kmap_local_page(req->pr_entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, req->pr_entry_bh, kaddr); entry->de_start = cpu_to_le64(NILFS_CNO_MIN); entry->de_end = cpu_to_le64(NILFS_CNO_MAX); entry->de_blocknr = cpu_to_le64(0); kunmap_local(kaddr); nilfs_palloc_commit_alloc_entry(dat, req); nilfs_dat_commit_entry(dat, req); } void nilfs_dat_abort_alloc(struct inode *dat, struct nilfs_palloc_req *req) { nilfs_dat_abort_entry(dat, req); nilfs_palloc_abort_alloc_entry(dat, req); } static void nilfs_dat_commit_free(struct inode *dat, struct nilfs_palloc_req *req) { struct nilfs_dat_entry *entry; void *kaddr; kaddr = kmap_local_page(req->pr_entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, req->pr_entry_bh, kaddr); entry->de_start = cpu_to_le64(NILFS_CNO_MIN); entry->de_end = cpu_to_le64(NILFS_CNO_MIN); entry->de_blocknr = cpu_to_le64(0); kunmap_local(kaddr); nilfs_dat_commit_entry(dat, req); if (unlikely(req->pr_desc_bh == NULL || req->pr_bitmap_bh == NULL)) { nilfs_error(dat->i_sb, "state inconsistency probably due to duplicate use of vblocknr = %llu", (unsigned long long)req->pr_entry_nr); return; } nilfs_palloc_commit_free_entry(dat, req); } int nilfs_dat_prepare_start(struct inode *dat, struct nilfs_palloc_req *req) { return nilfs_dat_prepare_entry(dat, req, 0); } void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req, sector_t blocknr) { struct nilfs_dat_entry *entry; void *kaddr; kaddr = kmap_local_page(req->pr_entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, req->pr_entry_bh, kaddr); entry->de_start = cpu_to_le64(nilfs_mdt_cno(dat)); entry->de_blocknr = cpu_to_le64(blocknr); kunmap_local(kaddr); nilfs_dat_commit_entry(dat, req); } int nilfs_dat_prepare_end(struct inode *dat, struct nilfs_palloc_req *req) { struct nilfs_dat_entry *entry; __u64 start; sector_t blocknr; void *kaddr; int ret; ret = nilfs_dat_prepare_entry(dat, req, 0); if (ret < 0) return ret; kaddr = kmap_local_page(req->pr_entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, req->pr_entry_bh, kaddr); start = le64_to_cpu(entry->de_start); blocknr = le64_to_cpu(entry->de_blocknr); kunmap_local(kaddr); if (blocknr == 0) { ret = nilfs_palloc_prepare_free_entry(dat, req); if (ret < 0) { nilfs_dat_abort_entry(dat, req); return ret; } } if (unlikely(start > nilfs_mdt_cno(dat))) { nilfs_err(dat->i_sb, "vblocknr = %llu has abnormal lifetime: start cno (= %llu) > current cno (= %llu)", (unsigned long long)req->pr_entry_nr, (unsigned long long)start, (unsigned long long)nilfs_mdt_cno(dat)); nilfs_dat_abort_entry(dat, req); return -EINVAL; } return 0; } void nilfs_dat_commit_end(struct inode *dat, struct nilfs_palloc_req *req, int dead) { struct nilfs_dat_entry *entry; __u64 
start, end; sector_t blocknr; void *kaddr; kaddr = kmap_local_page(req->pr_entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, req->pr_entry_bh, kaddr); end = start = le64_to_cpu(entry->de_start); if (!dead) { end = nilfs_mdt_cno(dat); WARN_ON(start > end); } entry->de_end = cpu_to_le64(end); blocknr = le64_to_cpu(entry->de_blocknr); kunmap_local(kaddr); if (blocknr == 0) nilfs_dat_commit_free(dat, req); else nilfs_dat_commit_entry(dat, req); } void nilfs_dat_abort_end(struct inode *dat, struct nilfs_palloc_req *req) { struct nilfs_dat_entry *entry; __u64 start; sector_t blocknr; void *kaddr; kaddr = kmap_local_page(req->pr_entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, req->pr_entry_bh, kaddr); start = le64_to_cpu(entry->de_start); blocknr = le64_to_cpu(entry->de_blocknr); kunmap_local(kaddr); if (start == nilfs_mdt_cno(dat) && blocknr == 0) nilfs_palloc_abort_free_entry(dat, req); nilfs_dat_abort_entry(dat, req); } int nilfs_dat_prepare_update(struct inode *dat, struct nilfs_palloc_req *oldreq, struct nilfs_palloc_req *newreq) { int ret; ret = nilfs_dat_prepare_end(dat, oldreq); if (!ret) { ret = nilfs_dat_prepare_alloc(dat, newreq); if (ret < 0) nilfs_dat_abort_end(dat, oldreq); } return ret; } void nilfs_dat_commit_update(struct inode *dat, struct nilfs_palloc_req *oldreq, struct nilfs_palloc_req *newreq, int dead) { nilfs_dat_commit_end(dat, oldreq, dead); nilfs_dat_commit_alloc(dat, newreq); } void nilfs_dat_abort_update(struct inode *dat, struct nilfs_palloc_req *oldreq, struct nilfs_palloc_req *newreq) { nilfs_dat_abort_end(dat, oldreq); nilfs_dat_abort_alloc(dat, newreq); } /** * nilfs_dat_mark_dirty - * @dat: DAT file inode * @vblocknr: virtual block number * * Description: * * Return Value: On success, 0 is returned. On error, one of the following * negative error codes is returned. * * %-EIO - I/O error. * * %-ENOMEM - Insufficient amount of memory available. */ int nilfs_dat_mark_dirty(struct inode *dat, __u64 vblocknr) { struct nilfs_palloc_req req; int ret; req.pr_entry_nr = vblocknr; ret = nilfs_dat_prepare_entry(dat, &req, 0); if (ret == 0) nilfs_dat_commit_entry(dat, &req); return ret; } /** * nilfs_dat_freev - free virtual block numbers * @dat: DAT file inode * @vblocknrs: array of virtual block numbers * @nitems: number of virtual block numbers * * Description: nilfs_dat_freev() frees the virtual block numbers specified by * @vblocknrs and @nitems. * * Return Value: On success, 0 is returned. On error, one of the following * negative error codes is returned. * * %-EIO - I/O error. * * %-ENOMEM - Insufficient amount of memory available. * * %-ENOENT - The virtual block number have not been allocated. */ int nilfs_dat_freev(struct inode *dat, __u64 *vblocknrs, size_t nitems) { return nilfs_palloc_freev(dat, vblocknrs, nitems); } /** * nilfs_dat_move - change a block number * @dat: DAT file inode * @vblocknr: virtual block number * @blocknr: block number * * Description: nilfs_dat_move() changes the block number associated with * @vblocknr to @blocknr. * * Return Value: On success, 0 is returned. On error, one of the following * negative error codes is returned. * * %-EIO - I/O error. * * %-ENOMEM - Insufficient amount of memory available. 
*/ int nilfs_dat_move(struct inode *dat, __u64 vblocknr, sector_t blocknr) { struct buffer_head *entry_bh; struct nilfs_dat_entry *entry; void *kaddr; int ret; ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh); if (ret < 0) return ret; /* * The given disk block number (blocknr) is not yet written to * the device at this point. * * To prevent nilfs_dat_translate() from returning the * uncommitted block number, this makes a copy of the entry * buffer and redirects nilfs_dat_translate() to the copy. */ if (!buffer_nilfs_redirected(entry_bh)) { ret = nilfs_mdt_freeze_buffer(dat, entry_bh); if (ret) { brelse(entry_bh); return ret; } } kaddr = kmap_local_page(entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, vblocknr, entry_bh, kaddr); if (unlikely(entry->de_blocknr == cpu_to_le64(0))) { nilfs_crit(dat->i_sb, "%s: invalid vblocknr = %llu, [%llu, %llu)", __func__, (unsigned long long)vblocknr, (unsigned long long)le64_to_cpu(entry->de_start), (unsigned long long)le64_to_cpu(entry->de_end)); kunmap_local(kaddr); brelse(entry_bh); return -EINVAL; } WARN_ON(blocknr == 0); entry->de_blocknr = cpu_to_le64(blocknr); kunmap_local(kaddr); mark_buffer_dirty(entry_bh); nilfs_mdt_mark_dirty(dat); brelse(entry_bh); return 0; } /** * nilfs_dat_translate - translate a virtual block number to a block number * @dat: DAT file inode * @vblocknr: virtual block number * @blocknrp: pointer to a block number * * Description: nilfs_dat_translate() maps the virtual block number @vblocknr * to the corresponding block number. * * Return Value: On success, 0 is returned and the block number associated * with @vblocknr is stored in the place pointed by @blocknrp. On error, one * of the following negative error codes is returned. * * %-EIO - I/O error. * * %-ENOMEM - Insufficient amount of memory available. * * %-ENOENT - A block number associated with @vblocknr does not exist. 
*/ int nilfs_dat_translate(struct inode *dat, __u64 vblocknr, sector_t *blocknrp) { struct buffer_head *entry_bh, *bh; struct nilfs_dat_entry *entry; sector_t blocknr; void *kaddr; int ret; ret = nilfs_palloc_get_entry_block(dat, vblocknr, 0, &entry_bh); if (ret < 0) return ret; if (!nilfs_doing_gc() && buffer_nilfs_redirected(entry_bh)) { bh = nilfs_mdt_get_frozen_buffer(dat, entry_bh); if (bh) { WARN_ON(!buffer_uptodate(bh)); brelse(entry_bh); entry_bh = bh; } } kaddr = kmap_local_page(entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, vblocknr, entry_bh, kaddr); blocknr = le64_to_cpu(entry->de_blocknr); if (blocknr == 0) { ret = -ENOENT; goto out; } *blocknrp = blocknr; out: kunmap_local(kaddr); brelse(entry_bh); return ret; } ssize_t nilfs_dat_get_vinfo(struct inode *dat, void *buf, unsigned int visz, size_t nvi) { struct buffer_head *entry_bh; struct nilfs_dat_entry *entry; struct nilfs_vinfo *vinfo = buf; __u64 first, last; void *kaddr; unsigned long entries_per_block = NILFS_MDT(dat)->mi_entries_per_block; int i, j, n, ret; for (i = 0; i < nvi; i += n) { ret = nilfs_palloc_get_entry_block(dat, vinfo->vi_vblocknr, 0, &entry_bh); if (ret < 0) return ret; kaddr = kmap_local_page(entry_bh->b_page); /* last virtual block number in this block */ first = vinfo->vi_vblocknr; first = div64_ul(first, entries_per_block); first *= entries_per_block; last = first + entries_per_block - 1; for (j = i, n = 0; j < nvi && vinfo->vi_vblocknr >= first && vinfo->vi_vblocknr <= last; j++, n++, vinfo = (void *)vinfo + visz) { entry = nilfs_palloc_block_get_entry( dat, vinfo->vi_vblocknr, entry_bh, kaddr); vinfo->vi_start = le64_to_cpu(entry->de_start); vinfo->vi_end = le64_to_cpu(entry->de_end); vinfo->vi_blocknr = le64_to_cpu(entry->de_blocknr); } kunmap_local(kaddr); brelse(entry_bh); } return nvi; } /** * nilfs_dat_read - read or get dat inode * @sb: super block instance * @entry_size: size of a dat entry * @raw_inode: on-disk dat inode * @inodep: buffer to store the inode */ int nilfs_dat_read(struct super_block *sb, size_t entry_size, struct nilfs_inode *raw_inode, struct inode **inodep) { static struct lock_class_key dat_lock_key; struct inode *dat; struct nilfs_dat_info *di; int err; if (entry_size > sb->s_blocksize) { nilfs_err(sb, "too large DAT entry size: %zu bytes", entry_size); return -EINVAL; } else if (entry_size < NILFS_MIN_DAT_ENTRY_SIZE) { nilfs_err(sb, "too small DAT entry size: %zu bytes", entry_size); return -EINVAL; } dat = nilfs_iget_locked(sb, NULL, NILFS_DAT_INO); if (unlikely(!dat)) return -ENOMEM; if (!(dat->i_state & I_NEW)) goto out; err = nilfs_mdt_init(dat, NILFS_MDT_GFP, sizeof(*di)); if (err) goto failed; err = nilfs_palloc_init_blockgroup(dat, entry_size); if (err) goto failed; di = NILFS_DAT_I(dat); lockdep_set_class(&di->mi.mi_sem, &dat_lock_key); nilfs_palloc_setup_cache(dat, &di->palloc_cache); err = nilfs_mdt_setup_shadow_map(dat, &di->shadow); if (err) goto failed; err = nilfs_read_inode_common(dat, raw_inode); if (err) goto failed; unlock_new_inode(dat); out: *inodep = dat; return 0; failed: iget_failed(dat); return err; } |
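/*
 * Editor's illustration (not part of dat.c): a minimal sketch of how the DAT
 * API above fits together when a virtual block number is looked up and then
 * repointed at a newly written disk block. The wrapper name is hypothetical;
 * nilfs_dat_translate() and nilfs_dat_move() are used as defined above.
 */
static int example_repoint_vblock(struct inode *dat, __u64 vblocknr,
				  sector_t new_blocknr)
{
	sector_t old_blocknr;
	int err;

	/* Find the block currently backing @vblocknr (-ENOENT if none). */
	err = nilfs_dat_translate(dat, vblocknr, &old_blocknr);
	if (err < 0)
		return err;

	/* Redirect the DAT entry to the freshly written block. */
	err = nilfs_dat_move(dat, vblocknr, new_blocknr);
	if (err < 0)
		return err;

	pr_debug("vblocknr %llu: blocknr %llu -> %llu\n",
		 (unsigned long long)vblocknr,
		 (unsigned long long)old_blocknr,
		 (unsigned long long)new_blocknr);
	return 0;
}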
/*
 * net/tipc/msg.h: Include file for TIPC message header routines
 *
 * Copyright (c) 2000-2007, 2014-2017 Ericsson AB
 * Copyright (c) 2005-2008, 2010-2011, Wind River Systems
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the names of the copyright holders nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
 *
 * Alternatively, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") version 2 as published by the Free
 * Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ #ifndef _TIPC_MSG_H #define _TIPC_MSG_H #include <linux/tipc.h> #include "core.h" /* * Constants and routines used to read and write TIPC payload message headers * * Note: Some items are also used with TIPC internal message headers */ #define TIPC_VERSION 2 struct plist; /* * Payload message users are defined in TIPC's public API: * - TIPC_LOW_IMPORTANCE * - TIPC_MEDIUM_IMPORTANCE * - TIPC_HIGH_IMPORTANCE * - TIPC_CRITICAL_IMPORTANCE */ #define TIPC_SYSTEM_IMPORTANCE 4 /* * Payload message types */ #define TIPC_CONN_MSG 0 #define TIPC_MCAST_MSG 1 #define TIPC_NAMED_MSG 2 #define TIPC_DIRECT_MSG 3 #define TIPC_GRP_MEMBER_EVT 4 #define TIPC_GRP_BCAST_MSG 5 #define TIPC_GRP_MCAST_MSG 6 #define TIPC_GRP_UCAST_MSG 7 /* * Internal message users */ #define BCAST_PROTOCOL 5 #define MSG_BUNDLER 6 #define LINK_PROTOCOL 7 #define CONN_MANAGER 8 #define GROUP_PROTOCOL 9 #define TUNNEL_PROTOCOL 10 #define NAME_DISTRIBUTOR 11 #define MSG_FRAGMENTER 12 #define LINK_CONFIG 13 #define MSG_CRYPTO 14 #define SOCK_WAKEUP 14 /* pseudo user */ #define TOP_SRV 15 /* pseudo user */ /* * Message header sizes */ #define SHORT_H_SIZE 24 /* In-cluster basic payload message */ #define BASIC_H_SIZE 32 /* Basic payload message */ #define NAMED_H_SIZE 40 /* Named payload message */ #define MCAST_H_SIZE 44 /* Multicast payload message */ #define GROUP_H_SIZE 44 /* Group payload message */ #define INT_H_SIZE 40 /* Internal messages */ #define MIN_H_SIZE 24 /* Smallest legal TIPC header size */ #define MAX_H_SIZE 60 /* Largest possible TIPC header size */ #define MAX_MSG_SIZE (MAX_H_SIZE + TIPC_MAX_USER_MSG_SIZE) #define TIPC_MEDIA_INFO_OFFSET 5 extern const int one_page_mtu; struct tipc_skb_cb { union { struct { struct sk_buff *tail; unsigned long nxt_retr; unsigned long retr_stamp; u32 bytes_read; u32 orig_member; u16 chain_imp; u16 ackers; u16 retr_cnt; } __packed; #ifdef CONFIG_TIPC_CRYPTO struct { struct tipc_crypto *rx; struct tipc_aead *last; u8 recurs; } tx_clone_ctx __packed; #endif } __packed; union { struct { u8 validated:1; #ifdef CONFIG_TIPC_CRYPTO u8 encrypted:1; u8 decrypted:1; #define SKB_PROBING 1 #define SKB_GRACING 2 u8 xmit_type:2; u8 tx_clone_deferred:1; #endif }; u8 flags; }; u8 reserved; #ifdef CONFIG_TIPC_CRYPTO void *crypto_ctx; #endif } __packed; #define TIPC_SKB_CB(__skb) ((struct tipc_skb_cb *)&((__skb)->cb[0])) struct tipc_msg { __be32 hdr[15]; }; /* struct tipc_gap_ack - TIPC Gap ACK block * @ack: seqno of the last consecutive packet in link deferdq * @gap: number of gap packets since the last ack * * E.g: * link deferdq: 1 2 3 4 10 11 13 14 15 20 * --> Gap ACK blocks: <4, 5>, <11, 1>, <15, 4>, <20, 0> */ struct tipc_gap_ack { __be16 ack; __be16 gap; }; /* struct tipc_gap_ack_blks * @len: actual length of the record * @ugack_cnt: number of Gap ACK blocks for unicast (following the broadcast * ones) * @start_index: starting index for "valid" broadcast Gap ACK blocks * @bgack_cnt: number of Gap ACK blocks for broadcast in the record * @gacks: array of Gap ACK blocks * * 31 16 
15 0 * +-------------+-------------+-------------+-------------+ * | bgack_cnt | ugack_cnt | len | * +-------------+-------------+-------------+-------------+ - * | gap | ack | | * +-------------+-------------+-------------+-------------+ > bc gacks * : : : | * +-------------+-------------+-------------+-------------+ - * | gap | ack | | * +-------------+-------------+-------------+-------------+ > uc gacks * : : : | * +-------------+-------------+-------------+-------------+ - */ struct tipc_gap_ack_blks { __be16 len; union { u8 ugack_cnt; u8 start_index; }; u8 bgack_cnt; struct tipc_gap_ack gacks[]; }; #define MAX_GAP_ACK_BLKS 128 #define MAX_GAP_ACK_BLKS_SZ (sizeof(struct tipc_gap_ack_blks) + \ sizeof(struct tipc_gap_ack) * MAX_GAP_ACK_BLKS) static inline struct tipc_msg *buf_msg(struct sk_buff *skb) { return (struct tipc_msg *)skb->data; } static inline u32 msg_word(struct tipc_msg *m, u32 pos) { return ntohl(m->hdr[pos]); } static inline void msg_set_word(struct tipc_msg *m, u32 w, u32 val) { m->hdr[w] = htonl(val); } static inline u32 msg_bits(struct tipc_msg *m, u32 w, u32 pos, u32 mask) { return (msg_word(m, w) >> pos) & mask; } static inline void msg_set_bits(struct tipc_msg *m, u32 w, u32 pos, u32 mask, u32 val) { val = (val & mask) << pos; mask = mask << pos; m->hdr[w] &= ~htonl(mask); m->hdr[w] |= htonl(val); } /* * Word 0 */ static inline u32 msg_version(struct tipc_msg *m) { return msg_bits(m, 0, 29, 7); } static inline void msg_set_version(struct tipc_msg *m) { msg_set_bits(m, 0, 29, 7, TIPC_VERSION); } static inline u32 msg_user(struct tipc_msg *m) { return msg_bits(m, 0, 25, 0xf); } static inline u32 msg_isdata(struct tipc_msg *m) { return msg_user(m) <= TIPC_CRITICAL_IMPORTANCE; } static inline void msg_set_user(struct tipc_msg *m, u32 n) { msg_set_bits(m, 0, 25, 0xf, n); } static inline u32 msg_hdr_sz(struct tipc_msg *m) { return msg_bits(m, 0, 21, 0xf) << 2; } static inline void msg_set_hdr_sz(struct tipc_msg *m, u32 n) { msg_set_bits(m, 0, 21, 0xf, n>>2); } static inline u32 msg_size(struct tipc_msg *m) { return msg_bits(m, 0, 0, 0x1ffff); } static inline u32 msg_blocks(struct tipc_msg *m) { return (msg_size(m) / 1024) + 1; } static inline u32 msg_data_sz(struct tipc_msg *m) { return msg_size(m) - msg_hdr_sz(m); } static inline int msg_non_seq(struct tipc_msg *m) { return msg_bits(m, 0, 20, 1); } static inline void msg_set_non_seq(struct tipc_msg *m, u32 n) { msg_set_bits(m, 0, 20, 1, n); } static inline int msg_is_syn(struct tipc_msg *m) { return msg_bits(m, 0, 17, 1); } static inline void msg_set_syn(struct tipc_msg *m, u32 d) { msg_set_bits(m, 0, 17, 1, d); } static inline int msg_dest_droppable(struct tipc_msg *m) { return msg_bits(m, 0, 19, 1); } static inline void msg_set_dest_droppable(struct tipc_msg *m, u32 d) { msg_set_bits(m, 0, 19, 1, d); } static inline int msg_is_keepalive(struct tipc_msg *m) { return msg_bits(m, 0, 19, 1); } static inline void msg_set_is_keepalive(struct tipc_msg *m, u32 d) { msg_set_bits(m, 0, 19, 1, d); } static inline int msg_src_droppable(struct tipc_msg *m) { return msg_bits(m, 0, 18, 1); } static inline void msg_set_src_droppable(struct tipc_msg *m, u32 d) { msg_set_bits(m, 0, 18, 1, d); } static inline int msg_ack_required(struct tipc_msg *m) { return msg_bits(m, 0, 18, 1); } static inline void msg_set_ack_required(struct tipc_msg *m) { msg_set_bits(m, 0, 18, 1, 1); } static inline int msg_nagle_ack(struct tipc_msg *m) { return msg_bits(m, 0, 18, 1); } static inline void msg_set_nagle_ack(struct tipc_msg *m) { msg_set_bits(m, 0, 
18, 1, 1); } static inline bool msg_is_rcast(struct tipc_msg *m) { return msg_bits(m, 0, 18, 0x1); } static inline void msg_set_is_rcast(struct tipc_msg *m, bool d) { msg_set_bits(m, 0, 18, 0x1, d); } static inline void msg_set_size(struct tipc_msg *m, u32 sz) { m->hdr[0] = htonl((msg_word(m, 0) & ~0x1ffff) | sz); } static inline unchar *msg_data(struct tipc_msg *m) { return ((unchar *)m) + msg_hdr_sz(m); } static inline struct tipc_msg *msg_inner_hdr(struct tipc_msg *m) { return (struct tipc_msg *)msg_data(m); } /* * Word 1 */ static inline u32 msg_type(struct tipc_msg *m) { return msg_bits(m, 1, 29, 0x7); } static inline void msg_set_type(struct tipc_msg *m, u32 n) { msg_set_bits(m, 1, 29, 0x7, n); } static inline int msg_in_group(struct tipc_msg *m) { int mtyp = msg_type(m); return mtyp >= TIPC_GRP_MEMBER_EVT && mtyp <= TIPC_GRP_UCAST_MSG; } static inline bool msg_is_grp_evt(struct tipc_msg *m) { return msg_type(m) == TIPC_GRP_MEMBER_EVT; } static inline u32 msg_named(struct tipc_msg *m) { return msg_type(m) == TIPC_NAMED_MSG; } static inline u32 msg_mcast(struct tipc_msg *m) { int mtyp = msg_type(m); return ((mtyp == TIPC_MCAST_MSG) || (mtyp == TIPC_GRP_BCAST_MSG) || (mtyp == TIPC_GRP_MCAST_MSG)); } static inline u32 msg_connected(struct tipc_msg *m) { return msg_type(m) == TIPC_CONN_MSG; } static inline u32 msg_direct(struct tipc_msg *m) { return msg_type(m) == TIPC_DIRECT_MSG; } static inline u32 msg_errcode(struct tipc_msg *m) { return msg_bits(m, 1, 25, 0xf); } static inline void msg_set_errcode(struct tipc_msg *m, u32 err) { msg_set_bits(m, 1, 25, 0xf, err); } static inline void msg_set_bulk(struct tipc_msg *m) { msg_set_bits(m, 1, 28, 0x1, 1); } static inline u32 msg_is_bulk(struct tipc_msg *m) { return msg_bits(m, 1, 28, 0x1); } static inline void msg_set_last_bulk(struct tipc_msg *m) { msg_set_bits(m, 1, 27, 0x1, 1); } static inline u32 msg_is_last_bulk(struct tipc_msg *m) { return msg_bits(m, 1, 27, 0x1); } static inline void msg_set_non_legacy(struct tipc_msg *m) { msg_set_bits(m, 1, 26, 0x1, 1); } static inline u32 msg_is_legacy(struct tipc_msg *m) { return !msg_bits(m, 1, 26, 0x1); } static inline u32 msg_reroute_cnt(struct tipc_msg *m) { return msg_bits(m, 1, 21, 0xf); } static inline void msg_incr_reroute_cnt(struct tipc_msg *m) { msg_set_bits(m, 1, 21, 0xf, msg_reroute_cnt(m) + 1); } static inline u32 msg_lookup_scope(struct tipc_msg *m) { return msg_bits(m, 1, 19, 0x3); } static inline void msg_set_lookup_scope(struct tipc_msg *m, u32 n) { msg_set_bits(m, 1, 19, 0x3, n); } static inline u16 msg_bcast_ack(struct tipc_msg *m) { return msg_bits(m, 1, 0, 0xffff); } static inline void msg_set_bcast_ack(struct tipc_msg *m, u16 n) { msg_set_bits(m, 1, 0, 0xffff, n); } /* Note: reusing bits in word 1 for ACTIVATE_MSG only, to re-synch * link peer session number */ static inline bool msg_dest_session_valid(struct tipc_msg *m) { return msg_bits(m, 1, 16, 0x1); } static inline void msg_set_dest_session_valid(struct tipc_msg *m, bool valid) { msg_set_bits(m, 1, 16, 0x1, valid); } static inline u16 msg_dest_session(struct tipc_msg *m) { return msg_bits(m, 1, 0, 0xffff); } static inline void msg_set_dest_session(struct tipc_msg *m, u16 n) { msg_set_bits(m, 1, 0, 0xffff, n); } /* * Word 2 */ static inline u16 msg_ack(struct tipc_msg *m) { return msg_bits(m, 2, 16, 0xffff); } static inline void msg_set_ack(struct tipc_msg *m, u16 n) { msg_set_bits(m, 2, 16, 0xffff, n); } static inline u16 msg_seqno(struct tipc_msg *m) { return msg_bits(m, 2, 0, 0xffff); } static inline void 
msg_set_seqno(struct tipc_msg *m, u16 n) { msg_set_bits(m, 2, 0, 0xffff, n); } /* * Words 3-10 */ static inline u32 msg_importance(struct tipc_msg *m) { int usr = msg_user(m); if (likely((usr <= TIPC_CRITICAL_IMPORTANCE) && !msg_errcode(m))) return usr; if ((usr == MSG_FRAGMENTER) || (usr == MSG_BUNDLER)) return msg_bits(m, 9, 0, 0x7); return TIPC_SYSTEM_IMPORTANCE; } static inline void msg_set_importance(struct tipc_msg *m, u32 i) { int usr = msg_user(m); if (likely((usr == MSG_FRAGMENTER) || (usr == MSG_BUNDLER))) msg_set_bits(m, 9, 0, 0x7, i); else if (i < TIPC_SYSTEM_IMPORTANCE) msg_set_user(m, i); else pr_warn("Trying to set illegal importance in message\n"); } static inline u32 msg_prevnode(struct tipc_msg *m) { return msg_word(m, 3); } static inline void msg_set_prevnode(struct tipc_msg *m, u32 a) { msg_set_word(m, 3, a); } static inline u32 msg_origport(struct tipc_msg *m) { if (msg_user(m) == MSG_FRAGMENTER) m = msg_inner_hdr(m); return msg_word(m, 4); } static inline void msg_set_origport(struct tipc_msg *m, u32 p) { msg_set_word(m, 4, p); } static inline u16 msg_named_seqno(struct tipc_msg *m) { return msg_bits(m, 4, 0, 0xffff); } static inline void msg_set_named_seqno(struct tipc_msg *m, u16 n) { msg_set_bits(m, 4, 0, 0xffff, n); } static inline u32 msg_destport(struct tipc_msg *m) { return msg_word(m, 5); } static inline void msg_set_destport(struct tipc_msg *m, u32 p) { msg_set_word(m, 5, p); } static inline u32 msg_mc_netid(struct tipc_msg *m) { return msg_word(m, 5); } static inline void msg_set_mc_netid(struct tipc_msg *m, u32 p) { msg_set_word(m, 5, p); } static inline int msg_short(struct tipc_msg *m) { return msg_hdr_sz(m) == SHORT_H_SIZE; } static inline u32 msg_orignode(struct tipc_msg *m) { if (likely(msg_short(m))) return msg_prevnode(m); return msg_word(m, 6); } static inline void msg_set_orignode(struct tipc_msg *m, u32 a) { msg_set_word(m, 6, a); } static inline u32 msg_destnode(struct tipc_msg *m) { return msg_word(m, 7); } static inline void msg_set_destnode(struct tipc_msg *m, u32 a) { msg_set_word(m, 7, a); } static inline u32 msg_nametype(struct tipc_msg *m) { return msg_word(m, 8); } static inline void msg_set_nametype(struct tipc_msg *m, u32 n) { msg_set_word(m, 8, n); } static inline u32 msg_nameinst(struct tipc_msg *m) { return msg_word(m, 9); } static inline u32 msg_namelower(struct tipc_msg *m) { return msg_nameinst(m); } static inline void msg_set_namelower(struct tipc_msg *m, u32 n) { msg_set_word(m, 9, n); } static inline void msg_set_nameinst(struct tipc_msg *m, u32 n) { msg_set_namelower(m, n); } static inline u32 msg_nameupper(struct tipc_msg *m) { return msg_word(m, 10); } static inline void msg_set_nameupper(struct tipc_msg *m, u32 n) { msg_set_word(m, 10, n); } /* * Constants and routines used to read and write TIPC internal message headers */ /* * Connection management protocol message types */ #define CONN_PROBE 0 #define CONN_PROBE_REPLY 1 #define CONN_ACK 2 /* * Name distributor message types */ #define PUBLICATION 0 #define WITHDRAWAL 1 /* * Segmentation message types */ #define FIRST_FRAGMENT 0 #define FRAGMENT 1 #define LAST_FRAGMENT 2 /* * Link management protocol message types */ #define STATE_MSG 0 #define RESET_MSG 1 #define ACTIVATE_MSG 2 /* * Changeover tunnel message types */ #define SYNCH_MSG 0 #define FAILOVER_MSG 1 /* * Config protocol message types */ #define DSC_REQ_MSG 0 #define DSC_RESP_MSG 1 #define DSC_TRIAL_MSG 2 #define DSC_TRIAL_FAIL_MSG 3 /* * Group protocol message types */ #define GRP_JOIN_MSG 0 #define 
GRP_LEAVE_MSG 1 #define GRP_ADV_MSG 2 #define GRP_ACK_MSG 3 #define GRP_RECLAIM_MSG 4 #define GRP_REMIT_MSG 5 /* Crypto message types */ #define KEY_DISTR_MSG 0 /* * Word 1 */ static inline u32 msg_seq_gap(struct tipc_msg *m) { return msg_bits(m, 1, 16, 0x1fff); } static inline void msg_set_seq_gap(struct tipc_msg *m, u32 n) { msg_set_bits(m, 1, 16, 0x1fff, n); } static inline u32 msg_node_sig(struct tipc_msg *m) { return msg_bits(m, 1, 0, 0xffff); } static inline void msg_set_node_sig(struct tipc_msg *m, u32 n) { msg_set_bits(m, 1, 0, 0xffff, n); } static inline u32 msg_node_capabilities(struct tipc_msg *m) { return msg_bits(m, 1, 15, 0x1fff); } static inline void msg_set_node_capabilities(struct tipc_msg *m, u32 n) { msg_set_bits(m, 1, 15, 0x1fff, n); } /* * Word 2 */ static inline u32 msg_dest_domain(struct tipc_msg *m) { return msg_word(m, 2); } static inline void msg_set_dest_domain(struct tipc_msg *m, u32 n) { msg_set_word(m, 2, n); } static inline void msg_set_bcgap_after(struct tipc_msg *m, u32 n) { msg_set_bits(m, 2, 16, 0xffff, n); } static inline u32 msg_bcgap_to(struct tipc_msg *m) { return msg_bits(m, 2, 0, 0xffff); } static inline void msg_set_bcgap_to(struct tipc_msg *m, u32 n) { msg_set_bits(m, 2, 0, 0xffff, n); } /* * Word 4 */ static inline u32 msg_last_bcast(struct tipc_msg *m) { return msg_bits(m, 4, 16, 0xffff); } static inline u32 msg_bc_snd_nxt(struct tipc_msg *m) { return msg_last_bcast(m) + 1; } static inline void msg_set_last_bcast(struct tipc_msg *m, u32 n) { msg_set_bits(m, 4, 16, 0xffff, n); } static inline u32 msg_nof_fragms(struct tipc_msg *m) { return msg_bits(m, 4, 0, 0xffff); } static inline void msg_set_nof_fragms(struct tipc_msg *m, u32 n) { msg_set_bits(m, 4, 0, 0xffff, n); } static inline u32 msg_fragm_no(struct tipc_msg *m) { return msg_bits(m, 4, 16, 0xffff); } static inline void msg_set_fragm_no(struct tipc_msg *m, u32 n) { msg_set_bits(m, 4, 16, 0xffff, n); } static inline u16 msg_next_sent(struct tipc_msg *m) { return msg_bits(m, 4, 0, 0xffff); } static inline void msg_set_next_sent(struct tipc_msg *m, u16 n) { msg_set_bits(m, 4, 0, 0xffff, n); } static inline u32 msg_bc_netid(struct tipc_msg *m) { return msg_word(m, 4); } static inline void msg_set_bc_netid(struct tipc_msg *m, u32 id) { msg_set_word(m, 4, id); } static inline u32 msg_link_selector(struct tipc_msg *m) { if (msg_user(m) == MSG_FRAGMENTER) m = (void *)msg_data(m); return msg_bits(m, 4, 0, 1); } /* * Word 5 */ static inline u16 msg_session(struct tipc_msg *m) { return msg_bits(m, 5, 16, 0xffff); } static inline void msg_set_session(struct tipc_msg *m, u16 n) { msg_set_bits(m, 5, 16, 0xffff, n); } static inline u32 msg_probe(struct tipc_msg *m) { return msg_bits(m, 5, 0, 1); } static inline void msg_set_probe(struct tipc_msg *m, u32 val) { msg_set_bits(m, 5, 0, 1, val); } static inline char msg_net_plane(struct tipc_msg *m) { return msg_bits(m, 5, 1, 7) + 'A'; } static inline void msg_set_net_plane(struct tipc_msg *m, char n) { msg_set_bits(m, 5, 1, 7, (n - 'A')); } static inline u32 msg_linkprio(struct tipc_msg *m) { return msg_bits(m, 5, 4, 0x1f); } static inline void msg_set_linkprio(struct tipc_msg *m, u32 n) { msg_set_bits(m, 5, 4, 0x1f, n); } static inline u32 msg_bearer_id(struct tipc_msg *m) { return msg_bits(m, 5, 9, 0x7); } static inline void msg_set_bearer_id(struct tipc_msg *m, u32 n) { msg_set_bits(m, 5, 9, 0x7, n); } static inline u32 msg_redundant_link(struct tipc_msg *m) { return msg_bits(m, 5, 12, 0x1); } static inline void msg_set_redundant_link(struct tipc_msg *m, 
u32 r) { msg_set_bits(m, 5, 12, 0x1, r); } static inline u32 msg_peer_stopping(struct tipc_msg *m) { return msg_bits(m, 5, 13, 0x1); } static inline void msg_set_peer_stopping(struct tipc_msg *m, u32 s) { msg_set_bits(m, 5, 13, 0x1, s); } static inline bool msg_bc_ack_invalid(struct tipc_msg *m) { switch (msg_user(m)) { case BCAST_PROTOCOL: case NAME_DISTRIBUTOR: case LINK_PROTOCOL: return msg_bits(m, 5, 14, 0x1); default: return false; } } static inline void msg_set_bc_ack_invalid(struct tipc_msg *m, bool invalid) { msg_set_bits(m, 5, 14, 0x1, invalid); } static inline char *msg_media_addr(struct tipc_msg *m) { return (char *)&m->hdr[TIPC_MEDIA_INFO_OFFSET]; } static inline u32 msg_bc_gap(struct tipc_msg *m) { return msg_bits(m, 8, 0, 0x3ff); } static inline void msg_set_bc_gap(struct tipc_msg *m, u32 n) { msg_set_bits(m, 8, 0, 0x3ff, n); } /* * Word 9 */ static inline u16 msg_msgcnt(struct tipc_msg *m) { return msg_bits(m, 9, 16, 0xffff); } static inline void msg_set_msgcnt(struct tipc_msg *m, u16 n) { msg_set_bits(m, 9, 16, 0xffff, n); } static inline u16 msg_syncpt(struct tipc_msg *m) { return msg_bits(m, 9, 16, 0xffff); } static inline void msg_set_syncpt(struct tipc_msg *m, u16 n) { msg_set_bits(m, 9, 16, 0xffff, n); } static inline u32 msg_conn_ack(struct tipc_msg *m) { return msg_bits(m, 9, 16, 0xffff); } static inline void msg_set_conn_ack(struct tipc_msg *m, u32 n) { msg_set_bits(m, 9, 16, 0xffff, n); } static inline u16 msg_adv_win(struct tipc_msg *m) { return msg_bits(m, 9, 0, 0xffff); } static inline void msg_set_adv_win(struct tipc_msg *m, u16 n) { msg_set_bits(m, 9, 0, 0xffff, n); } static inline u32 msg_max_pkt(struct tipc_msg *m) { return msg_bits(m, 9, 16, 0xffff) * 4; } static inline void msg_set_max_pkt(struct tipc_msg *m, u32 n) { msg_set_bits(m, 9, 16, 0xffff, (n / 4)); } static inline u32 msg_link_tolerance(struct tipc_msg *m) { return msg_bits(m, 9, 0, 0xffff); } static inline void msg_set_link_tolerance(struct tipc_msg *m, u32 n) { msg_set_bits(m, 9, 0, 0xffff, n); } static inline u16 msg_grp_bc_syncpt(struct tipc_msg *m) { return msg_bits(m, 9, 16, 0xffff); } static inline void msg_set_grp_bc_syncpt(struct tipc_msg *m, u16 n) { msg_set_bits(m, 9, 16, 0xffff, n); } static inline u16 msg_grp_bc_acked(struct tipc_msg *m) { return msg_bits(m, 9, 16, 0xffff); } static inline void msg_set_grp_bc_acked(struct tipc_msg *m, u16 n) { msg_set_bits(m, 9, 16, 0xffff, n); } static inline u16 msg_grp_remitted(struct tipc_msg *m) { return msg_bits(m, 9, 16, 0xffff); } static inline void msg_set_grp_remitted(struct tipc_msg *m, u16 n) { msg_set_bits(m, 9, 16, 0xffff, n); } /* Word 10 */ static inline u16 msg_grp_evt(struct tipc_msg *m) { return msg_bits(m, 10, 0, 0x3); } static inline void msg_set_grp_evt(struct tipc_msg *m, int n) { msg_set_bits(m, 10, 0, 0x3, n); } static inline u16 msg_grp_bc_ack_req(struct tipc_msg *m) { return msg_bits(m, 10, 0, 0x1); } static inline void msg_set_grp_bc_ack_req(struct tipc_msg *m, bool n) { msg_set_bits(m, 10, 0, 0x1, n); } static inline u16 msg_grp_bc_seqno(struct tipc_msg *m) { return msg_bits(m, 10, 16, 0xffff); } static inline void msg_set_grp_bc_seqno(struct tipc_msg *m, u32 n) { msg_set_bits(m, 10, 16, 0xffff, n); } static inline bool msg_peer_link_is_up(struct tipc_msg *m) { if (likely(msg_user(m) != LINK_PROTOCOL)) return true; if (msg_type(m) == STATE_MSG) return true; return false; } static inline bool msg_peer_node_is_up(struct tipc_msg *m) { if (msg_peer_link_is_up(m)) return true; return msg_redundant_link(m); } static inline 
bool msg_is_reset(struct tipc_msg *hdr) { return (msg_user(hdr) == LINK_PROTOCOL) && (msg_type(hdr) == RESET_MSG); } /* Word 13 */ static inline void msg_set_peer_net_hash(struct tipc_msg *m, u32 n) { msg_set_word(m, 13, n); } static inline u32 msg_peer_net_hash(struct tipc_msg *m) { return msg_word(m, 13); } /* Word 14 */ static inline u32 msg_sugg_node_addr(struct tipc_msg *m) { return msg_word(m, 14); } static inline void msg_set_sugg_node_addr(struct tipc_msg *m, u32 n) { msg_set_word(m, 14, n); } static inline void msg_set_node_id(struct tipc_msg *hdr, u8 *id) { memcpy(msg_data(hdr), id, 16); } static inline u8 *msg_node_id(struct tipc_msg *hdr) { return (u8 *)msg_data(hdr); } struct sk_buff *tipc_buf_acquire(u32 size, gfp_t gfp); bool tipc_msg_validate(struct sk_buff **_skb); bool tipc_msg_reverse(u32 own_addr, struct sk_buff **skb, int err); void tipc_skb_reject(struct net *net, int err, struct sk_buff *skb, struct sk_buff_head *xmitq); void tipc_msg_init(u32 own_addr, struct tipc_msg *m, u32 user, u32 type, u32 hsize, u32 destnode); struct sk_buff *tipc_msg_create(uint user, uint type, uint hdr_sz, uint data_sz, u32 dnode, u32 onode, u32 dport, u32 oport, int errcode); int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf); bool tipc_msg_try_bundle(struct sk_buff *tskb, struct sk_buff **skb, u32 mss, u32 dnode, bool *new_bundle); bool tipc_msg_extract(struct sk_buff *skb, struct sk_buff **iskb, int *pos); int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr, int pktmax, struct sk_buff_head *frags); int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m, int offset, int dsz, int mtu, struct sk_buff_head *list); int tipc_msg_append(struct tipc_msg *hdr, struct msghdr *m, int dlen, int mss, struct sk_buff_head *txq); bool tipc_msg_lookup_dest(struct net *net, struct sk_buff *skb, int *err); bool tipc_msg_assemble(struct sk_buff_head *list); bool tipc_msg_reassemble(struct sk_buff_head *list, struct sk_buff_head *rcvq); bool tipc_msg_pskb_copy(u32 dst, struct sk_buff_head *msg, struct sk_buff_head *cpy); bool __tipc_skb_queue_sorted(struct sk_buff_head *list, u16 seqno, struct sk_buff *skb); bool tipc_msg_skb_clone(struct sk_buff_head *msg, struct sk_buff_head *cpy); static inline u16 buf_seqno(struct sk_buff *skb) { return msg_seqno(buf_msg(skb)); } static inline int buf_roundup_len(struct sk_buff *skb) { return (skb->len / 1024 + 1) * 1024; } /* tipc_skb_peek(): peek and reserve first buffer in list * @list: list to be peeked in * Returns pointer to first buffer in list, if any */ static inline struct sk_buff *tipc_skb_peek(struct sk_buff_head *list, spinlock_t *lock) { struct sk_buff *skb; spin_lock_bh(lock); skb = skb_peek(list); if (skb) skb_get(skb); spin_unlock_bh(lock); return skb; } /* tipc_skb_peek_port(): find a destination port, ignoring all destinations * up to and including 'filter'. * Note: ignoring previously tried destinations minimizes the risk of * contention on the socket lock * @list: list to be peeked in * @filter: last destination to be ignored from search * Returns a destination port number, if applicable.
*/ static inline u32 tipc_skb_peek_port(struct sk_buff_head *list, u32 filter) { struct sk_buff *skb; u32 dport = 0; bool ignore = true; spin_lock_bh(&list->lock); skb_queue_walk(list, skb) { dport = msg_destport(buf_msg(skb)); if (!filter || skb_queue_is_last(list, skb)) break; if (dport == filter) ignore = false; else if (!ignore) break; } spin_unlock_bh(&list->lock); return dport; } /* tipc_skb_dequeue(): unlink first buffer with dest 'dport' from list * @list: list to be unlinked from * @dport: selection criteria for buffer to unlink */ static inline struct sk_buff *tipc_skb_dequeue(struct sk_buff_head *list, u32 dport) { struct sk_buff *_skb, *tmp, *skb = NULL; spin_lock_bh(&list->lock); skb_queue_walk_safe(list, _skb, tmp) { if (msg_destport(buf_msg(_skb)) == dport) { __skb_unlink(_skb, list); skb = _skb; break; } } spin_unlock_bh(&list->lock); return skb; } /* tipc_skb_queue_splice_tail - append an skb list to lock protected list * @list: the new list to append. Not lock protected * @head: target list. Lock protected. */ static inline void tipc_skb_queue_splice_tail(struct sk_buff_head *list, struct sk_buff_head *head) { spin_lock_bh(&head->lock); skb_queue_splice_tail(list, head); spin_unlock_bh(&head->lock); } /* tipc_skb_queue_splice_tail_init - merge two lock protected skb lists * @list: the new list to add. Lock protected. Will be reinitialized * @head: target list. Lock protected. */ static inline void tipc_skb_queue_splice_tail_init(struct sk_buff_head *list, struct sk_buff_head *head) { struct sk_buff_head tmp; __skb_queue_head_init(&tmp); spin_lock_bh(&list->lock); skb_queue_splice_tail_init(list, &tmp); spin_unlock_bh(&list->lock); tipc_skb_queue_splice_tail(&tmp, head); } /* __tipc_skb_dequeue() - dequeue the head skb according to expected seqno * @list: list to be dequeued from * @seqno: seqno of the expected msg * * returns skb dequeued from the list if its seqno is less than or equal to * the expected one, otherwise the skb is still held * * Note: must be used with appropriate locks held only */ static inline struct sk_buff *__tipc_skb_dequeue(struct sk_buff_head *list, u16 seqno) { struct sk_buff *skb = skb_peek(list); if (skb && less_eq(buf_seqno(skb), seqno)) { __skb_unlink(skb, list); return skb; } return NULL; } #endif
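/*
 * Illustrative only -- not part of net/tipc/msg.h. A self-contained userspace
 * sketch of the word/bit accessor pattern used throughout the header above:
 * every field lives at a (word, bit position, mask) triple inside big-endian
 * header words, and the "set" helper clears the field before OR-ing in the
 * new value. The mini_msg type and the mword/mbits/mset_bits helper names are
 * assumptions made for this sketch; the field positions (version in word 0,
 * bits 29..31, user in bits 25..28) and the values TIPC_VERSION == 2 and
 * MSG_BUNDLER == 6 come from the definitions above.
 */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

struct mini_msg { uint32_t hdr[15]; };	/* mirrors struct tipc_msg */

static uint32_t mword(struct mini_msg *m, int w)
{
	return ntohl(m->hdr[w]);
}

static uint32_t mbits(struct mini_msg *m, int w, int pos, uint32_t mask)
{
	return (mword(m, w) >> pos) & mask;
}

static void mset_bits(struct mini_msg *m, int w, int pos, uint32_t mask,
		      uint32_t val)
{
	val = (val & mask) << pos;
	mask <<= pos;
	m->hdr[w] &= ~htonl(mask);	/* clear the field ... */
	m->hdr[w] |= htonl(val);	/* ... then install the new value */
}

int main(void)
{
	struct mini_msg m = { { 0 } };

	mset_bits(&m, 0, 29, 7, 2);	/* like msg_set_version(): TIPC_VERSION */
	mset_bits(&m, 0, 25, 0xf, 6);	/* like msg_set_user(): e.g. MSG_BUNDLER */
	printf("version %u, user %u\n",
	       (unsigned)mbits(&m, 0, 29, 7),	/* -> 2 */
	       (unsigned)mbits(&m, 0, 25, 0xf));	/* -> 6 */
	return 0;
}
/*
 * Because both the set and get paths go through htonl()/ntohl(), the
 * accessors behave identically on little- and big-endian hosts.
 */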
/* SPDX-License-Identifier: GPL-2.0-only */ /* * pcm_local.h - a local header file for snd-pcm module. * * Copyright (c) Takashi Sakamoto <o-takashi@sakamocchi.jp> */ #ifndef __SOUND_CORE_PCM_LOCAL_H #define __SOUND_CORE_PCM_LOCAL_H extern const struct snd_pcm_hw_constraint_list snd_pcm_known_rates; void snd_interval_mul(const struct snd_interval *a, const struct snd_interval *b, struct snd_interval *c); void snd_interval_div(const struct snd_interval *a, const struct snd_interval *b, struct snd_interval *c); void snd_interval_muldivk(const struct snd_interval *a, const struct snd_interval *b, unsigned int k, struct snd_interval *c); void snd_interval_mulkdiv(const struct snd_interval *a, unsigned int k, const struct snd_interval *b, struct snd_interval *c); int snd_pcm_hw_constraint_mask(struct snd_pcm_runtime *runtime, snd_pcm_hw_param_t var, u_int32_t mask); int pcm_lib_apply_appl_ptr(struct snd_pcm_substream *substream, snd_pcm_uframes_t appl_ptr); int snd_pcm_update_state(struct snd_pcm_substream *substream, struct snd_pcm_runtime *runtime); int snd_pcm_update_hw_ptr(struct snd_pcm_substream *substream); void snd_pcm_playback_silence(struct snd_pcm_substream *substream, snd_pcm_uframes_t new_hw_ptr); static inline snd_pcm_uframes_t snd_pcm_avail(struct snd_pcm_substream *substream) { if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) return snd_pcm_playback_avail(substream->runtime); else return snd_pcm_capture_avail(substream->runtime); } static inline snd_pcm_uframes_t snd_pcm_hw_avail(struct snd_pcm_substream *substream) { if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) return snd_pcm_playback_hw_avail(substream->runtime); else return snd_pcm_capture_hw_avail(substream->runtime); } #ifdef CONFIG_SND_PCM_TIMER void snd_pcm_timer_resolution_change(struct snd_pcm_substream *substream); void snd_pcm_timer_init(struct snd_pcm_substream *substream); void snd_pcm_timer_done(struct snd_pcm_substream *substream); #else static inline void snd_pcm_timer_resolution_change(struct snd_pcm_substream *substream) {} static inline void snd_pcm_timer_init(struct snd_pcm_substream *substream) {} static inline void snd_pcm_timer_done(struct snd_pcm_substream *substream) {} #endif void __snd_pcm_xrun(struct snd_pcm_substream *substream); void snd_pcm_group_init(struct snd_pcm_group *group); void snd_pcm_sync_stop(struct snd_pcm_substream *substream, bool sync_irq); #define PCM_RUNTIME_CHECK(sub) snd_BUG_ON(!(sub) || !(sub)->runtime) /* loop over all PCM substreams */ #define for_each_pcm_substream(pcm, str, subs) \ for ((str) = 0; (str) < 2; (str)++) \ for ((subs) = (pcm)->streams[str].substream; (subs); \ (subs) = (subs)->next) static inline void snd_pcm_dma_buffer_sync(struct snd_pcm_substream *substream, enum snd_dma_sync_mode mode) { if (substream->runtime->info & SNDRV_PCM_INFO_EXPLICIT_SYNC) snd_dma_buffer_sync(snd_pcm_get_dma_buf(substream), mode); } #endif /* __SOUND_CORE_PCM_LOCAL_H */
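/*
 * Illustrative only -- not part of pcm_local.h. snd_pcm_avail() above
 * dispatches to the playback or capture "avail" helper declared in the
 * public PCM headers. The simplified userspace model below shows the
 * ring-buffer arithmetic those helpers express: for playback, "avail" is
 * the free space the application may still fill; for capture, it is the
 * number of captured frames waiting to be read. The toy_* names are
 * assumptions made for this sketch, and the real kernel helpers also wrap
 * the pointers at the runtime boundary, which this model omits.
 */
#include <stdio.h>

struct toy_runtime {
	unsigned long buffer_size;	/* ring size in frames */
	unsigned long hw_ptr;		/* frames played (playback) or captured (capture) by hardware */
	unsigned long appl_ptr;		/* frames queued (playback) or read (capture) by the application */
};

static unsigned long toy_playback_avail(const struct toy_runtime *rt)
{
	/* free space = ring size minus what the app queued but hw has not yet played */
	return rt->buffer_size - (rt->appl_ptr - rt->hw_ptr);
}

static unsigned long toy_capture_avail(const struct toy_runtime *rt)
{
	/* filled space = what hw captured but the app has not yet read */
	return rt->hw_ptr - rt->appl_ptr;
}

int main(void)
{
	struct toy_runtime play = { .buffer_size = 1024, .hw_ptr = 300, .appl_ptr = 800 };
	struct toy_runtime capt = { .buffer_size = 1024, .hw_ptr = 800, .appl_ptr = 300 };

	printf("playback avail: %lu frames\n", toy_playback_avail(&play));	/* 1024 - (800 - 300) = 524 */
	printf("capture  avail: %lu frames\n", toy_capture_avail(&capt));	/* 800 - 300 = 500 */
	return 0;
}
/*
 * snd_pcm_hw_avail() above is the complement of snd_pcm_avail() with
 * respect to the buffer size: the frames still owned by the hardware.
 */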
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 1992, 1998-2004 Linus Torvalds, Ingo Molnar * * This file contains the /proc/irq/ handling code. */ #include <linux/irq.h> #include <linux/gfp.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> #include <linux/interrupt.h> #include <linux/kernel_stat.h> #include <linux/mutex.h> #include "internals.h" /* * Access rules: * * procfs protects read/write of /proc/irq/N/ files against a * concurrent free of the interrupt descriptor. remove_proc_entry() * immediately prevents new read/writes to happen and waits for * already running read/write functions to complete. * * We remove the proc entries first and then delete the interrupt * descriptor from the radix tree and free it. So it is guaranteed * that irq_to_desc(N) is valid as long as the read/writes are * permitted by procfs. * * The read from /proc/interrupts is a different problem because there * is no protection. So the lookup and the access to irqdesc * information must be protected by sparse_irq_lock.
*/ static struct proc_dir_entry *root_irq_dir; #ifdef CONFIG_SMP enum { AFFINITY, AFFINITY_LIST, EFFECTIVE, EFFECTIVE_LIST, }; static int show_irq_affinity(int type, struct seq_file *m) { struct irq_desc *desc = irq_to_desc((long)m->private); const struct cpumask *mask; switch (type) { case AFFINITY: case AFFINITY_LIST: mask = desc->irq_common_data.affinity; #ifdef CONFIG_GENERIC_PENDING_IRQ if (irqd_is_setaffinity_pending(&desc->irq_data)) mask = desc->pending_mask; #endif break; case EFFECTIVE: case EFFECTIVE_LIST: #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK mask = irq_data_get_effective_affinity_mask(&desc->irq_data); break; #endif default: return -EINVAL; } switch (type) { case AFFINITY_LIST: case EFFECTIVE_LIST: seq_printf(m, "%*pbl\n", cpumask_pr_args(mask)); break; case AFFINITY: case EFFECTIVE: seq_printf(m, "%*pb\n", cpumask_pr_args(mask)); break; } return 0; } static int irq_affinity_hint_proc_show(struct seq_file *m, void *v) { struct irq_desc *desc = irq_to_desc((long)m->private); unsigned long flags; cpumask_var_t mask; if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) return -ENOMEM; raw_spin_lock_irqsave(&desc->lock, flags); if (desc->affinity_hint) cpumask_copy(mask, desc->affinity_hint); raw_spin_unlock_irqrestore(&desc->lock, flags); seq_printf(m, "%*pb\n", cpumask_pr_args(mask)); free_cpumask_var(mask); return 0; } int no_irq_affinity; static int irq_affinity_proc_show(struct seq_file *m, void *v) { return show_irq_affinity(AFFINITY, m); } static int irq_affinity_list_proc_show(struct seq_file *m, void *v) { return show_irq_affinity(AFFINITY_LIST, m); } #ifndef CONFIG_AUTO_IRQ_AFFINITY static inline int irq_select_affinity_usr(unsigned int irq) { /* * If the interrupt is started up already then this fails. The * interrupt is assigned to an online CPU already. There is no * point to move it around randomly. Tell user space that the * selected mask is bogus. * * If not then any change to the affinity is pointless because the * startup code invokes irq_setup_affinity() which will select * an online CPU anyway. */ return -EINVAL; } #else /* ALPHA magic affinity auto selector. Keep it for historical reasons. */ static inline int irq_select_affinity_usr(unsigned int irq) { return irq_select_affinity(irq); } #endif static ssize_t write_irq_affinity(int type, struct file *file, const char __user *buffer, size_t count, loff_t *pos) { unsigned int irq = (int)(long)pde_data(file_inode(file)); cpumask_var_t new_value; int err; if (!irq_can_set_affinity_usr(irq) || no_irq_affinity) return -EIO; if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM; if (type) err = cpumask_parselist_user(buffer, count, new_value); else err = cpumask_parse_user(buffer, count, new_value); if (err) goto free_cpumask; /* * Do not allow disabling IRQs completely - it's a too easy * way to make the system unusable accidentally :-) At least * one online CPU still has to be targeted. */ if (!cpumask_intersects(new_value, cpu_online_mask)) { /* * Special case for empty set - allow the architecture code * to set default SMP affinity. */ err = irq_select_affinity_usr(irq) ?
-EINVAL : count; } else { err = irq_set_affinity(irq, new_value); if (!err) err = count; } free_cpumask: free_cpumask_var(new_value); return err; } static ssize_t irq_affinity_proc_write(struct file *file, const char __user *buffer, size_t count, loff_t *pos) { return write_irq_affinity(0, file, buffer, count, pos); } static ssize_t irq_affinity_list_proc_write(struct file *file, const char __user *buffer, size_t count, loff_t *pos) { return write_irq_affinity(1, file, buffer, count, pos); } static int irq_affinity_proc_open(struct inode *inode, struct file *file) { return single_open(file, irq_affinity_proc_show, pde_data(inode)); } static int irq_affinity_list_proc_open(struct inode *inode, struct file *file) { return single_open(file, irq_affinity_list_proc_show, pde_data(inode)); } static const struct proc_ops irq_affinity_proc_ops = { .proc_open = irq_affinity_proc_open, .proc_read = seq_read, .proc_lseek = seq_lseek, .proc_release = single_release, .proc_write = irq_affinity_proc_write, }; static const struct proc_ops irq_affinity_list_proc_ops = { .proc_open = irq_affinity_list_proc_open, .proc_read = seq_read, .proc_lseek = seq_lseek, .proc_release = single_release, .proc_write = irq_affinity_list_proc_write, }; #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK static int irq_effective_aff_proc_show(struct seq_file *m, void *v) { return show_irq_affinity(EFFECTIVE, m); } static int irq_effective_aff_list_proc_show(struct seq_file *m, void *v) { return show_irq_affinity(EFFECTIVE_LIST, m); } #endif static int default_affinity_show(struct seq_file *m, void *v) { seq_printf(m, "%*pb\n", cpumask_pr_args(irq_default_affinity)); return 0; } static ssize_t default_affinity_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos) { cpumask_var_t new_value; int err; if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM; err = cpumask_parse_user(buffer, count, new_value); if (err) goto out; /* * Do not allow disabling IRQs completely - it's a too easy * way to make the system unusable accidentally :-) At least * one online CPU still has to be targeted. 
*/ if (!cpumask_intersects(new_value, cpu_online_mask)) { err = -EINVAL; goto out; } cpumask_copy(irq_default_affinity, new_value); err = count; out: free_cpumask_var(new_value); return err; } static int default_affinity_open(struct inode *inode, struct file *file) { return single_open(file, default_affinity_show, pde_data(inode)); } static const struct proc_ops default_affinity_proc_ops = { .proc_open = default_affinity_open, .proc_read = seq_read, .proc_lseek = seq_lseek, .proc_release = single_release, .proc_write = default_affinity_write, }; static int irq_node_proc_show(struct seq_file *m, void *v) { struct irq_desc *desc = irq_to_desc((long) m->private); seq_printf(m, "%d\n", irq_desc_get_node(desc)); return 0; } #endif static int irq_spurious_proc_show(struct seq_file *m, void *v) { struct irq_desc *desc = irq_to_desc((long) m->private); seq_printf(m, "count %u\n" "unhandled %u\n" "last_unhandled %u ms\n", desc->irq_count, desc->irqs_unhandled, jiffies_to_msecs(desc->last_unhandled)); return 0; } #define MAX_NAMELEN 128 static int name_unique(unsigned int irq, struct irqaction *new_action) { struct irq_desc *desc = irq_to_desc(irq); struct irqaction *action; unsigned long flags; int ret = 1; raw_spin_lock_irqsave(&desc->lock, flags); for_each_action_of_desc(desc, action) { if ((action != new_action) && action->name && !strcmp(new_action->name, action->name)) { ret = 0; break; } } raw_spin_unlock_irqrestore(&desc->lock, flags); return ret; } void register_handler_proc(unsigned int irq, struct irqaction *action) { char name [MAX_NAMELEN]; struct irq_desc *desc = irq_to_desc(irq); if (!desc->dir || action->dir || !action->name || !name_unique(irq, action)) return; snprintf(name, MAX_NAMELEN, "%s", action->name); /* create /proc/irq/1234/handler/ */ action->dir = proc_mkdir(name, desc->dir); } #undef MAX_NAMELEN #define MAX_NAMELEN 10 void register_irq_proc(unsigned int irq, struct irq_desc *desc) { static DEFINE_MUTEX(register_lock); void __maybe_unused *irqp = (void *)(unsigned long) irq; char name [MAX_NAMELEN]; if (!root_irq_dir || (desc->irq_data.chip == &no_irq_chip)) return; /* * irq directories are registered only when a handler is * added, not when the descriptor is created, so multiple * tasks might try to register at the same time. */ mutex_lock(&register_lock);