// SPDX-License-Identifier: GPL-2.0+
//
// em28xx-core.c - driver for Empia EM2800/EM2820/2840 USB video capture devices
//
// Copyright (C) 2005 Ludovico Cavedon <cavedon@sssup.it>
//		      Markus Rechberger <mrechberger@gmail.com>
//		      Mauro Carvalho Chehab <mchehab@kernel.org>
//		      Sascha Sommer <saschasommer@freenet.de>
// Copyright (C) 2012 Frank Schäfer <fschaefer.oss@googlemail.com>

#include "em28xx.h"

#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/list.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/usb.h>
#include <linux/vmalloc.h>
#include <sound/ac97_codec.h>
#include <media/v4l2-common.h>

#define DRIVER_AUTHOR "Ludovico Cavedon <cavedon@sssup.it>, " \
		      "Markus Rechberger <mrechberger@gmail.com>, " \
		      "Mauro Carvalho Chehab <mchehab@kernel.org>, " \
		      "Sascha Sommer <saschasommer@freenet.de>"

MODULE_AUTHOR(DRIVER_AUTHOR);
MODULE_DESCRIPTION(DRIVER_DESC);
MODULE_LICENSE("GPL v2");
MODULE_VERSION(EM28XX_VERSION);

/* #define ENABLE_DEBUG_ISOC_FRAMES */

static unsigned int core_debug;
module_param(core_debug, int, 0644);
MODULE_PARM_DESC(core_debug, "enable debug messages [core and isoc]");

#define em28xx_coredbg(fmt, arg...) do {				\
	if (core_debug)							\
		dev_printk(KERN_DEBUG, &dev->intf->dev,			\
			   "core: %s: " fmt, __func__, ## arg);		\
} while (0)

static unsigned int reg_debug;
module_param(reg_debug, int, 0644);
MODULE_PARM_DESC(reg_debug, "enable debug messages [URB reg]");

#define em28xx_regdbg(fmt, arg...) do {					\
	if (reg_debug)							\
		dev_printk(KERN_DEBUG, &dev->intf->dev,			\
			   "reg: %s: " fmt, __func__, ## arg);		\
} while (0)

/* FIXME: don't abuse core_debug */
#define em28xx_isocdbg(fmt, arg...) \
do { \ if (core_debug) \ dev_printk(KERN_DEBUG, &dev->intf->dev, \ "core: %s: " fmt, __func__, ## arg); \ } while (0) /* * em28xx_read_reg_req() * reads data from the usb device specifying bRequest */ int em28xx_read_reg_req_len(struct em28xx *dev, u8 req, u16 reg, char *buf, int len) { int ret; struct usb_device *udev = interface_to_usbdev(dev->intf); int pipe = usb_rcvctrlpipe(udev, 0); if (dev->disconnected) return -ENODEV; if (len > URB_MAX_CTRL_SIZE) return -EINVAL; mutex_lock(&dev->ctrl_urb_lock); ret = usb_control_msg(udev, pipe, req, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0x0000, reg, dev->urb_buf, len, 1000); if (ret < 0) { em28xx_regdbg("(pipe 0x%08x): IN: %02x %02x %02x %02x %02x %02x %02x %02x failed with error %i\n", pipe, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, req, 0, 0, reg & 0xff, reg >> 8, len & 0xff, len >> 8, ret); mutex_unlock(&dev->ctrl_urb_lock); return usb_translate_errors(ret); } if (len) memcpy(buf, dev->urb_buf, len); mutex_unlock(&dev->ctrl_urb_lock); em28xx_regdbg("(pipe 0x%08x): IN: %02x %02x %02x %02x %02x %02x %02x %02x <<< %*ph\n", pipe, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, req, 0, 0, reg & 0xff, reg >> 8, len & 0xff, len >> 8, len, buf); return ret; } /* * em28xx_read_reg_req() * reads data from the usb device specifying bRequest */ int em28xx_read_reg_req(struct em28xx *dev, u8 req, u16 reg) { int ret; u8 val; ret = em28xx_read_reg_req_len(dev, req, reg, &val, 1); if (ret < 0) return ret; return val; } int em28xx_read_reg(struct em28xx *dev, u16 reg) { return em28xx_read_reg_req(dev, USB_REQ_GET_STATUS, reg); } EXPORT_SYMBOL_GPL(em28xx_read_reg); /* * em28xx_write_regs_req() * sends data to the usb device, specifying bRequest */ int em28xx_write_regs_req(struct em28xx *dev, u8 req, u16 reg, char *buf, int len) { int ret; struct usb_device *udev = interface_to_usbdev(dev->intf); int pipe = usb_sndctrlpipe(udev, 0); if (dev->disconnected) return -ENODEV; if (len < 1 || len > URB_MAX_CTRL_SIZE) return -EINVAL; mutex_lock(&dev->ctrl_urb_lock); memcpy(dev->urb_buf, buf, len); ret = usb_control_msg(udev, pipe, req, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0x0000, reg, dev->urb_buf, len, 1000); mutex_unlock(&dev->ctrl_urb_lock); if (ret < 0) { em28xx_regdbg("(pipe 0x%08x): OUT: %02x %02x %02x %02x %02x %02x %02x %02x >>> %*ph failed with error %i\n", pipe, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, req, 0, 0, reg & 0xff, reg >> 8, len & 0xff, len >> 8, len, buf, ret); return usb_translate_errors(ret); } em28xx_regdbg("(pipe 0x%08x): OUT: %02x %02x %02x %02x %02x %02x %02x %02x >>> %*ph\n", pipe, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, req, 0, 0, reg & 0xff, reg >> 8, len & 0xff, len >> 8, len, buf); if (dev->wait_after_write) msleep(dev->wait_after_write); return ret; } int em28xx_write_regs(struct em28xx *dev, u16 reg, char *buf, int len) { return em28xx_write_regs_req(dev, USB_REQ_GET_STATUS, reg, buf, len); } EXPORT_SYMBOL_GPL(em28xx_write_regs); /* Write a single register */ int em28xx_write_reg(struct em28xx *dev, u16 reg, u8 val) { return em28xx_write_regs(dev, reg, &val, 1); } EXPORT_SYMBOL_GPL(em28xx_write_reg); /* * em28xx_write_reg_bits() * sets only some bits (specified by bitmask) of a register, by first reading * the actual value */ int em28xx_write_reg_bits(struct em28xx *dev, u16 reg, u8 val, u8 bitmask) { int oldval; u8 newval; oldval = em28xx_read_reg(dev, reg); if (oldval < 0) return oldval; newval = (((u8)oldval) & ~bitmask) | (val & bitmask); return em28xx_write_regs(dev, reg, &newval, 
1); } EXPORT_SYMBOL_GPL(em28xx_write_reg_bits); /* * em28xx_toggle_reg_bits() * toggles/inverts the bits (specified by bitmask) of a register */ int em28xx_toggle_reg_bits(struct em28xx *dev, u16 reg, u8 bitmask) { int oldval; u8 newval; oldval = em28xx_read_reg(dev, reg); if (oldval < 0) return oldval; newval = (~oldval & bitmask) | (oldval & ~bitmask); return em28xx_write_reg(dev, reg, newval); } EXPORT_SYMBOL_GPL(em28xx_toggle_reg_bits); /* * em28xx_is_ac97_ready() * Checks if ac97 is ready */ static int em28xx_is_ac97_ready(struct em28xx *dev) { unsigned long timeout = jiffies + msecs_to_jiffies(EM28XX_AC97_XFER_TIMEOUT); int ret; /* Wait up to 50 ms for AC97 command to complete */ while (time_is_after_jiffies(timeout)) { ret = em28xx_read_reg(dev, EM28XX_R43_AC97BUSY); if (ret < 0) return ret; if (!(ret & 0x01)) return 0; msleep(5); } dev_warn(&dev->intf->dev, "AC97 command still being executed: not handled properly!\n"); return -EBUSY; } /* * em28xx_read_ac97() * write a 16 bit value to the specified AC97 address (LSB first!) */ int em28xx_read_ac97(struct em28xx *dev, u8 reg) { int ret; u8 addr = (reg & 0x7f) | 0x80; __le16 val; ret = em28xx_is_ac97_ready(dev); if (ret < 0) return ret; ret = em28xx_write_regs(dev, EM28XX_R42_AC97ADDR, &addr, 1); if (ret < 0) return ret; ret = dev->em28xx_read_reg_req_len(dev, 0, EM28XX_R40_AC97LSB, (u8 *)&val, sizeof(val)); if (ret < 0) return ret; return le16_to_cpu(val); } EXPORT_SYMBOL_GPL(em28xx_read_ac97); /* * em28xx_write_ac97() * write a 16 bit value to the specified AC97 address (LSB first!) */ int em28xx_write_ac97(struct em28xx *dev, u8 reg, u16 val) { int ret; u8 addr = reg & 0x7f; __le16 value; value = cpu_to_le16(val); ret = em28xx_is_ac97_ready(dev); if (ret < 0) return ret; ret = em28xx_write_regs(dev, EM28XX_R40_AC97LSB, (u8 *)&value, 2); if (ret < 0) return ret; ret = em28xx_write_regs(dev, EM28XX_R42_AC97ADDR, &addr, 1); if (ret < 0) return ret; return 0; } EXPORT_SYMBOL_GPL(em28xx_write_ac97); struct em28xx_vol_itable { enum em28xx_amux mux; u8 reg; }; static struct em28xx_vol_itable inputs[] = { { EM28XX_AMUX_VIDEO, AC97_VIDEO }, { EM28XX_AMUX_LINE_IN, AC97_LINE }, { EM28XX_AMUX_PHONE, AC97_PHONE }, { EM28XX_AMUX_MIC, AC97_MIC }, { EM28XX_AMUX_CD, AC97_CD }, { EM28XX_AMUX_AUX, AC97_AUX }, { EM28XX_AMUX_PCM_OUT, AC97_PCM }, }; static int set_ac97_input(struct em28xx *dev) { int ret, i; enum em28xx_amux amux = dev->ctl_ainput; /* * EM28XX_AMUX_VIDEO2 is a special case used to indicate that * em28xx should point to LINE IN, while AC97 should use VIDEO */ if (amux == EM28XX_AMUX_VIDEO2) amux = EM28XX_AMUX_VIDEO; /* Mute all entres but the one that were selected */ for (i = 0; i < ARRAY_SIZE(inputs); i++) { if (amux == inputs[i].mux) ret = em28xx_write_ac97(dev, inputs[i].reg, 0x0808); else ret = em28xx_write_ac97(dev, inputs[i].reg, 0x8000); if (ret < 0) dev_warn(&dev->intf->dev, "couldn't setup AC97 register %d\n", inputs[i].reg); } return 0; } static int em28xx_set_audio_source(struct em28xx *dev) { int ret; u8 input; if (dev->board.is_em2800) { if (dev->ctl_ainput == EM28XX_AMUX_VIDEO) input = EM2800_AUDIO_SRC_TUNER; else input = EM2800_AUDIO_SRC_LINE; ret = em28xx_write_regs(dev, EM2800_R08_AUDIOSRC, &input, 1); if (ret < 0) return ret; } if (dev->has_msp34xx) { input = EM28XX_AUDIO_SRC_TUNER; } else { switch (dev->ctl_ainput) { case EM28XX_AMUX_VIDEO: input = EM28XX_AUDIO_SRC_TUNER; break; default: input = EM28XX_AUDIO_SRC_LINE; break; } } if (dev->board.mute_gpio && dev->mute) em28xx_gpio_set(dev, dev->board.mute_gpio); else 
em28xx_gpio_set(dev, INPUT(dev->ctl_input)->gpio); ret = em28xx_write_reg_bits(dev, EM28XX_R0E_AUDIOSRC, input, 0xc0); if (ret < 0) return ret; usleep_range(10000, 11000); switch (dev->audio_mode.ac97) { case EM28XX_NO_AC97: break; default: ret = set_ac97_input(dev); } return ret; } struct em28xx_vol_otable { enum em28xx_aout mux; u8 reg; }; static const struct em28xx_vol_otable outputs[] = { { EM28XX_AOUT_MASTER, AC97_MASTER }, { EM28XX_AOUT_LINE, AC97_HEADPHONE }, { EM28XX_AOUT_MONO, AC97_MASTER_MONO }, { EM28XX_AOUT_LFE, AC97_CENTER_LFE_MASTER }, { EM28XX_AOUT_SURR, AC97_SURROUND_MASTER }, }; int em28xx_audio_analog_set(struct em28xx *dev) { int ret, i; u8 xclk; /* Set GPIOs here for boards without audio */ if (dev->int_audio_type == EM28XX_INT_AUDIO_NONE) return em28xx_gpio_set(dev, INPUT(dev->ctl_input)->gpio); /* * It is assumed that all devices use master volume for output. * It would be possible to use also line output. */ if (dev->audio_mode.ac97 != EM28XX_NO_AC97) { /* Mute all outputs */ for (i = 0; i < ARRAY_SIZE(outputs); i++) { ret = em28xx_write_ac97(dev, outputs[i].reg, 0x8000); if (ret < 0) dev_warn(&dev->intf->dev, "couldn't setup AC97 register %d\n", outputs[i].reg); } } xclk = dev->board.xclk & 0x7f; if (!dev->mute) xclk |= EM28XX_XCLK_AUDIO_UNMUTE; ret = em28xx_write_reg(dev, EM28XX_R0F_XCLK, xclk); if (ret < 0) return ret; usleep_range(10000, 11000); /* Selects the proper audio input */ ret = em28xx_set_audio_source(dev); /* Sets volume */ if (dev->audio_mode.ac97 != EM28XX_NO_AC97) { int vol; em28xx_write_ac97(dev, AC97_POWERDOWN, 0x4200); em28xx_write_ac97(dev, AC97_EXTENDED_STATUS, 0x0031); em28xx_write_ac97(dev, AC97_PCM_LR_ADC_RATE, 0xbb80); /* LSB: left channel - both channels with the same level */ vol = (0x1f - dev->volume) | ((0x1f - dev->volume) << 8); /* Mute device, if needed */ if (dev->mute) vol |= 0x8000; /* Sets volume */ for (i = 0; i < ARRAY_SIZE(outputs); i++) { if (dev->ctl_aoutput & outputs[i].mux) ret = em28xx_write_ac97(dev, outputs[i].reg, vol); if (ret < 0) dev_warn(&dev->intf->dev, "couldn't setup AC97 register %d\n", outputs[i].reg); } if (dev->ctl_aoutput & EM28XX_AOUT_PCM_IN) { int sel = ac97_return_record_select(dev->ctl_aoutput); /* * Use the same input for both left and right * channels */ sel |= (sel << 8); em28xx_write_ac97(dev, AC97_REC_SEL, sel); } } return ret; } EXPORT_SYMBOL_GPL(em28xx_audio_analog_set); int em28xx_audio_setup(struct em28xx *dev) { int vid1, vid2, feat, cfg; u32 vid = 0; u8 i2s_samplerates; if (dev->chip_id == CHIP_ID_EM2870 || dev->chip_id == CHIP_ID_EM2874 || dev->chip_id == CHIP_ID_EM28174 || dev->chip_id == CHIP_ID_EM28178) { /* Digital only device - don't load any alsa module */ dev->int_audio_type = EM28XX_INT_AUDIO_NONE; dev->usb_audio_type = EM28XX_USB_AUDIO_NONE; return 0; } /* See how this device is configured */ cfg = em28xx_read_reg(dev, EM28XX_R00_CHIPCFG); dev_info(&dev->intf->dev, "Config register raw data: 0x%02x\n", cfg); if (cfg < 0) { /* Register read error */ /* Be conservative */ dev->int_audio_type = EM28XX_INT_AUDIO_AC97; } else if ((cfg & EM28XX_CHIPCFG_AUDIOMASK) == 0x00) { /* The device doesn't have vendor audio at all */ dev->int_audio_type = EM28XX_INT_AUDIO_NONE; dev->usb_audio_type = EM28XX_USB_AUDIO_NONE; return 0; } else if ((cfg & EM28XX_CHIPCFG_AUDIOMASK) != EM28XX_CHIPCFG_AC97) { dev->int_audio_type = EM28XX_INT_AUDIO_I2S; if (dev->chip_id < CHIP_ID_EM2860 && (cfg & EM28XX_CHIPCFG_AUDIOMASK) == EM2820_CHIPCFG_I2S_1_SAMPRATE) i2s_samplerates = 1; else if (dev->chip_id >= 
CHIP_ID_EM2860 && (cfg & EM28XX_CHIPCFG_AUDIOMASK) == EM2860_CHIPCFG_I2S_5_SAMPRATES) i2s_samplerates = 5; else i2s_samplerates = 3; dev_info(&dev->intf->dev, "I2S Audio (%d sample rate(s))\n", i2s_samplerates); /* Skip the code that does AC97 vendor detection */ dev->audio_mode.ac97 = EM28XX_NO_AC97; goto init_audio; } else { dev->int_audio_type = EM28XX_INT_AUDIO_AC97; } dev->audio_mode.ac97 = EM28XX_AC97_OTHER; vid1 = em28xx_read_ac97(dev, AC97_VENDOR_ID1); if (vid1 < 0) { /* * Device likely doesn't support AC97 * Note: (some) em2800 devices without eeprom reports 0x91 on * CHIPCFG register, even not having an AC97 chip */ dev_warn(&dev->intf->dev, "AC97 chip type couldn't be determined\n"); dev->audio_mode.ac97 = EM28XX_NO_AC97; if (dev->usb_audio_type == EM28XX_USB_AUDIO_VENDOR) dev->usb_audio_type = EM28XX_USB_AUDIO_NONE; dev->int_audio_type = EM28XX_INT_AUDIO_NONE; goto init_audio; } vid2 = em28xx_read_ac97(dev, AC97_VENDOR_ID2); if (vid2 < 0) goto init_audio; vid = vid1 << 16 | vid2; dev_warn(&dev->intf->dev, "AC97 vendor ID = 0x%08x\n", vid); feat = em28xx_read_ac97(dev, AC97_RESET); if (feat < 0) goto init_audio; dev_warn(&dev->intf->dev, "AC97 features = 0x%04x\n", feat); /* Try to identify what audio processor we have */ if ((vid == 0xffffffff || vid == 0x83847650) && feat == 0x6a90) dev->audio_mode.ac97 = EM28XX_AC97_EM202; else if ((vid >> 8) == 0x838476) dev->audio_mode.ac97 = EM28XX_AC97_SIGMATEL; init_audio: /* Reports detected AC97 processor */ switch (dev->audio_mode.ac97) { case EM28XX_NO_AC97: dev_info(&dev->intf->dev, "No AC97 audio processor\n"); break; case EM28XX_AC97_EM202: dev_info(&dev->intf->dev, "Empia 202 AC97 audio processor detected\n"); break; case EM28XX_AC97_SIGMATEL: dev_info(&dev->intf->dev, "Sigmatel audio processor detected (stac 97%02x)\n", vid & 0xff); break; case EM28XX_AC97_OTHER: dev_warn(&dev->intf->dev, "Unknown AC97 audio processor detected!\n"); break; default: break; } return em28xx_audio_analog_set(dev); } EXPORT_SYMBOL_GPL(em28xx_audio_setup); const struct em28xx_led *em28xx_find_led(struct em28xx *dev, enum em28xx_led_role role) { if (dev->board.leds) { u8 k = 0; while (dev->board.leds[k].role >= 0 && dev->board.leds[k].role < EM28XX_NUM_LED_ROLES) { if (dev->board.leds[k].role == role) return &dev->board.leds[k]; k++; } } return NULL; } EXPORT_SYMBOL_GPL(em28xx_find_led); int em28xx_capture_start(struct em28xx *dev, int start) { int rc; const struct em28xx_led *led = NULL; if (dev->chip_id == CHIP_ID_EM2874 || dev->chip_id == CHIP_ID_EM2884 || dev->chip_id == CHIP_ID_EM28174 || dev->chip_id == CHIP_ID_EM28178) { /* The Transport Stream Enable Register moved in em2874 */ if (dev->dvb_xfer_bulk) { /* Max Tx Size = 188 * 256 = 48128 - LCM(188,512) * 2 */ em28xx_write_reg(dev, (dev->ts == PRIMARY_TS) ? EM2874_R5D_TS1_PKT_SIZE : EM2874_R5E_TS2_PKT_SIZE, 0xff); } else { /* ISOC Maximum Transfer Size = 188 * 5 */ em28xx_write_reg(dev, (dev->ts == PRIMARY_TS) ? EM2874_R5D_TS1_PKT_SIZE : EM2874_R5E_TS2_PKT_SIZE, dev->dvb_max_pkt_size_isoc / 188); } if (dev->ts == PRIMARY_TS) rc = em28xx_write_reg_bits(dev, EM2874_R5F_TS_ENABLE, start ? EM2874_TS1_CAPTURE_ENABLE : 0x00, EM2874_TS1_CAPTURE_ENABLE | EM2874_TS1_FILTER_ENABLE | EM2874_TS1_NULL_DISCARD); else rc = em28xx_write_reg_bits(dev, EM2874_R5F_TS_ENABLE, start ? EM2874_TS2_CAPTURE_ENABLE : 0x00, EM2874_TS2_CAPTURE_ENABLE | EM2874_TS2_FILTER_ENABLE | EM2874_TS2_NULL_DISCARD); } else { /* FIXME: which is the best order? 
*/ /* video registers are sampled by VREF */ rc = em28xx_write_reg_bits(dev, EM28XX_R0C_USBSUSP, start ? 0x10 : 0x00, 0x10); if (rc < 0) return rc; if (start) { if (dev->is_webcam) rc = em28xx_write_reg(dev, 0x13, 0x0c); /* Enable video capture */ rc = em28xx_write_reg(dev, 0x48, 0x00); if (rc < 0) return rc; if (dev->mode == EM28XX_ANALOG_MODE) rc = em28xx_write_reg(dev, EM28XX_R12_VINENABLE, 0x67); else rc = em28xx_write_reg(dev, EM28XX_R12_VINENABLE, 0x37); if (rc < 0) return rc; usleep_range(10000, 11000); } else { /* disable video capture */ rc = em28xx_write_reg(dev, EM28XX_R12_VINENABLE, 0x27); } } if (dev->mode == EM28XX_ANALOG_MODE) led = em28xx_find_led(dev, EM28XX_LED_ANALOG_CAPTURING); else if (dev->ts == PRIMARY_TS) led = em28xx_find_led(dev, EM28XX_LED_DIGITAL_CAPTURING); else led = em28xx_find_led(dev, EM28XX_LED_DIGITAL_CAPTURING_TS2); if (led) em28xx_write_reg_bits(dev, led->gpio_reg, (!start ^ led->inverted) ? ~led->gpio_mask : led->gpio_mask, led->gpio_mask); return rc; } int em28xx_gpio_set(struct em28xx *dev, const struct em28xx_reg_seq *gpio) { int rc = 0; if (!gpio) return rc; if (dev->mode != EM28XX_SUSPEND) { em28xx_write_reg(dev, 0x48, 0x00); if (dev->mode == EM28XX_ANALOG_MODE) em28xx_write_reg(dev, EM28XX_R12_VINENABLE, 0x67); else em28xx_write_reg(dev, EM28XX_R12_VINENABLE, 0x37); usleep_range(10000, 11000); } /* Send GPIO reset sequences specified at board entry */ while (gpio->sleep >= 0) { if (gpio->reg >= 0) { rc = em28xx_write_reg_bits(dev, gpio->reg, gpio->val, gpio->mask); if (rc < 0) return rc; } if (gpio->sleep > 0) msleep(gpio->sleep); gpio++; } return rc; } EXPORT_SYMBOL_GPL(em28xx_gpio_set); int em28xx_set_mode(struct em28xx *dev, enum em28xx_mode set_mode) { if (dev->mode == set_mode) return 0; if (set_mode == EM28XX_SUSPEND) { dev->mode = set_mode; /* FIXME: add suspend support for ac97 */ return em28xx_gpio_set(dev, dev->board.suspend_gpio); } dev->mode = set_mode; if (dev->mode == EM28XX_DIGITAL_MODE) return em28xx_gpio_set(dev, dev->board.dvb_gpio); else return em28xx_gpio_set(dev, INPUT(dev->ctl_input)->gpio); } EXPORT_SYMBOL_GPL(em28xx_set_mode); /* *URB control */ /* * URB completion handler for isoc/bulk transfers */ static void em28xx_irq_callback(struct urb *urb) { struct em28xx *dev = urb->context; unsigned long flags; int i; switch (urb->status) { case 0: /* success */ case -ETIMEDOUT: /* NAK */ break; case -ECONNRESET: /* kill */ case -ENOENT: case -ESHUTDOWN: return; default: /* error */ em28xx_isocdbg("urb completion error %d.\n", urb->status); break; } /* Copy data from URB */ spin_lock_irqsave(&dev->slock, flags); dev->usb_ctl.urb_data_copy(dev, urb); spin_unlock_irqrestore(&dev->slock, flags); /* Reset urb buffers */ for (i = 0; i < urb->number_of_packets; i++) { /* isoc only (bulk: number_of_packets = 0) */ urb->iso_frame_desc[i].status = 0; urb->iso_frame_desc[i].actual_length = 0; } urb->status = 0; urb->status = usb_submit_urb(urb, GFP_ATOMIC); if (urb->status) { em28xx_isocdbg("urb resubmit failed (error=%i)\n", urb->status); } } /* * Stop and Deallocate URBs */ void em28xx_uninit_usb_xfer(struct em28xx *dev, enum em28xx_mode mode) { struct urb *urb; struct em28xx_usb_bufs *usb_bufs; int i; em28xx_isocdbg("called %s in mode %d\n", __func__, mode); if (mode == EM28XX_DIGITAL_MODE) usb_bufs = &dev->usb_ctl.digital_bufs; else usb_bufs = &dev->usb_ctl.analog_bufs; for (i = 0; i < usb_bufs->num_bufs; i++) { urb = usb_bufs->urb[i]; if (urb) { if (!irqs_disabled()) usb_kill_urb(urb); else usb_unlink_urb(urb); usb_free_urb(urb); 
usb_bufs->urb[i] = NULL; } } kfree(usb_bufs->urb); kfree(usb_bufs->buf); usb_bufs->urb = NULL; usb_bufs->buf = NULL; usb_bufs->num_bufs = 0; em28xx_capture_start(dev, 0); } EXPORT_SYMBOL_GPL(em28xx_uninit_usb_xfer); /* * Stop URBs */ void em28xx_stop_urbs(struct em28xx *dev) { int i; struct urb *urb; struct em28xx_usb_bufs *isoc_bufs = &dev->usb_ctl.digital_bufs; em28xx_isocdbg("called %s\n", __func__); for (i = 0; i < isoc_bufs->num_bufs; i++) { urb = isoc_bufs->urb[i]; if (urb) { if (!irqs_disabled()) usb_kill_urb(urb); else usb_unlink_urb(urb); } } em28xx_capture_start(dev, 0); } EXPORT_SYMBOL_GPL(em28xx_stop_urbs); /* * Allocate URBs */ int em28xx_alloc_urbs(struct em28xx *dev, enum em28xx_mode mode, int xfer_bulk, int num_bufs, int max_pkt_size, int packet_multiplier) { struct em28xx_usb_bufs *usb_bufs; struct urb *urb; struct usb_device *udev = interface_to_usbdev(dev->intf); int i; int sb_size, pipe; int j, k; em28xx_isocdbg("em28xx: called %s in mode %d\n", __func__, mode); /* * Check mode and if we have an endpoint for the selected * transfer type, select buffer */ if (mode == EM28XX_DIGITAL_MODE) { if ((xfer_bulk && !dev->dvb_ep_bulk) || (!xfer_bulk && !dev->dvb_ep_isoc)) { dev_err(&dev->intf->dev, "no endpoint for DVB mode and transfer type %d\n", xfer_bulk > 0); return -EINVAL; } usb_bufs = &dev->usb_ctl.digital_bufs; } else if (mode == EM28XX_ANALOG_MODE) { if ((xfer_bulk && !dev->analog_ep_bulk) || (!xfer_bulk && !dev->analog_ep_isoc)) { dev_err(&dev->intf->dev, "no endpoint for analog mode and transfer type %d\n", xfer_bulk > 0); return -EINVAL; } usb_bufs = &dev->usb_ctl.analog_bufs; } else { dev_err(&dev->intf->dev, "invalid mode selected\n"); return -EINVAL; } /* De-allocates all pending stuff */ em28xx_uninit_usb_xfer(dev, mode); usb_bufs->num_bufs = num_bufs; usb_bufs->urb = kcalloc(num_bufs, sizeof(void *), GFP_KERNEL); if (!usb_bufs->urb) return -ENOMEM; usb_bufs->buf = kcalloc(num_bufs, sizeof(void *), GFP_KERNEL); if (!usb_bufs->buf) { kfree(usb_bufs->urb); return -ENOMEM; } usb_bufs->max_pkt_size = max_pkt_size; if (xfer_bulk) usb_bufs->num_packets = 0; else usb_bufs->num_packets = packet_multiplier; dev->usb_ctl.vid_buf = NULL; dev->usb_ctl.vbi_buf = NULL; sb_size = packet_multiplier * usb_bufs->max_pkt_size; /* allocate urbs and transfer buffers */ for (i = 0; i < usb_bufs->num_bufs; i++) { urb = usb_alloc_urb(usb_bufs->num_packets, GFP_KERNEL); if (!urb) { em28xx_uninit_usb_xfer(dev, mode); return -ENOMEM; } usb_bufs->urb[i] = urb; usb_bufs->buf[i] = kzalloc(sb_size, GFP_KERNEL); if (!usb_bufs->buf[i]) { for (i--; i >= 0; i--) kfree(usb_bufs->buf[i]); em28xx_uninit_usb_xfer(dev, mode); return -ENOMEM; } urb->transfer_flags = URB_FREE_BUFFER; if (xfer_bulk) { /* bulk */ pipe = usb_rcvbulkpipe(udev, mode == EM28XX_ANALOG_MODE ? dev->analog_ep_bulk : dev->dvb_ep_bulk); usb_fill_bulk_urb(urb, udev, pipe, usb_bufs->buf[i], sb_size, em28xx_irq_callback, dev); } else { /* isoc */ pipe = usb_rcvisocpipe(udev, mode == EM28XX_ANALOG_MODE ? 
dev->analog_ep_isoc : dev->dvb_ep_isoc); usb_fill_int_urb(urb, udev, pipe, usb_bufs->buf[i], sb_size, em28xx_irq_callback, dev, 1); urb->transfer_flags |= URB_ISO_ASAP; k = 0; for (j = 0; j < usb_bufs->num_packets; j++) { urb->iso_frame_desc[j].offset = k; urb->iso_frame_desc[j].length = usb_bufs->max_pkt_size; k += usb_bufs->max_pkt_size; } } urb->number_of_packets = usb_bufs->num_packets; } return 0; } EXPORT_SYMBOL_GPL(em28xx_alloc_urbs); /* * Allocate URBs and start IRQ */ int em28xx_init_usb_xfer(struct em28xx *dev, enum em28xx_mode mode, int xfer_bulk, int num_bufs, int max_pkt_size, int packet_multiplier, int (*urb_data_copy)(struct em28xx *dev, struct urb *urb)) { struct em28xx_dmaqueue *dma_q = &dev->vidq; struct em28xx_dmaqueue *vbi_dma_q = &dev->vbiq; struct em28xx_usb_bufs *usb_bufs; struct usb_device *udev = interface_to_usbdev(dev->intf); int i; int rc; int alloc; em28xx_isocdbg("em28xx: called %s in mode %d\n", __func__, mode); dev->usb_ctl.urb_data_copy = urb_data_copy; if (mode == EM28XX_DIGITAL_MODE) { usb_bufs = &dev->usb_ctl.digital_bufs; /* no need to free/alloc usb buffers in digital mode */ alloc = 0; } else { usb_bufs = &dev->usb_ctl.analog_bufs; alloc = 1; } if (alloc) { rc = em28xx_alloc_urbs(dev, mode, xfer_bulk, num_bufs, max_pkt_size, packet_multiplier); if (rc) return rc; } if (xfer_bulk) { rc = usb_clear_halt(udev, usb_bufs->urb[0]->pipe); if (rc < 0) { dev_err(&dev->intf->dev, "failed to clear USB bulk endpoint stall/halt condition (error=%i)\n", rc); em28xx_uninit_usb_xfer(dev, mode); return rc; } } init_waitqueue_head(&dma_q->wq); init_waitqueue_head(&vbi_dma_q->wq); em28xx_capture_start(dev, 1); /* submit urbs and enables IRQ */ for (i = 0; i < usb_bufs->num_bufs; i++) { rc = usb_submit_urb(usb_bufs->urb[i], GFP_KERNEL); if (rc) { dev_err(&dev->intf->dev, "submit of urb %i failed (error=%i)\n", i, rc); em28xx_uninit_usb_xfer(dev, mode); return rc; } } return 0; } EXPORT_SYMBOL_GPL(em28xx_init_usb_xfer); /* * Device control list */ static LIST_HEAD(em28xx_devlist); static DEFINE_MUTEX(em28xx_devlist_mutex); /* * Extension interface */ static LIST_HEAD(em28xx_extension_devlist); int em28xx_register_extension(struct em28xx_ops *ops) { struct em28xx *dev = NULL; mutex_lock(&em28xx_devlist_mutex); list_add_tail(&ops->next, &em28xx_extension_devlist); list_for_each_entry(dev, &em28xx_devlist, devlist) { if (ops->init) { ops->init(dev); if (dev->dev_next) ops->init(dev->dev_next); } } mutex_unlock(&em28xx_devlist_mutex); pr_info("em28xx: Registered (%s) extension\n", ops->name); return 0; } EXPORT_SYMBOL(em28xx_register_extension); void em28xx_unregister_extension(struct em28xx_ops *ops) { struct em28xx *dev = NULL; mutex_lock(&em28xx_devlist_mutex); list_for_each_entry(dev, &em28xx_devlist, devlist) { if (ops->fini) { if (dev->dev_next) ops->fini(dev->dev_next); ops->fini(dev); } } list_del(&ops->next); mutex_unlock(&em28xx_devlist_mutex); pr_info("em28xx: Removed (%s) extension\n", ops->name); } EXPORT_SYMBOL(em28xx_unregister_extension); void em28xx_init_extension(struct em28xx *dev) { const struct em28xx_ops *ops = NULL; mutex_lock(&em28xx_devlist_mutex); list_add_tail(&dev->devlist, &em28xx_devlist); list_for_each_entry(ops, &em28xx_extension_devlist, next) { if (ops->init) { ops->init(dev); if (dev->dev_next) ops->init(dev->dev_next); } } mutex_unlock(&em28xx_devlist_mutex); } void em28xx_close_extension(struct em28xx *dev) { const struct em28xx_ops *ops = NULL; mutex_lock(&em28xx_devlist_mutex); list_for_each_entry(ops, &em28xx_extension_devlist, next) { 
		if (ops->fini) {
			if (dev->dev_next)
				ops->fini(dev->dev_next);
			ops->fini(dev);
		}
	}
	list_del(&dev->devlist);
	mutex_unlock(&em28xx_devlist_mutex);
}

int em28xx_suspend_extension(struct em28xx *dev)
{
	const struct em28xx_ops *ops = NULL;

	dev_info(&dev->intf->dev, "Suspending extensions\n");
	mutex_lock(&em28xx_devlist_mutex);
	list_for_each_entry(ops, &em28xx_extension_devlist, next) {
		if (!ops->suspend)
			continue;
		ops->suspend(dev);
		if (dev->dev_next)
			ops->suspend(dev->dev_next);
	}
	mutex_unlock(&em28xx_devlist_mutex);
	return 0;
}

int em28xx_resume_extension(struct em28xx *dev)
{
	const struct em28xx_ops *ops = NULL;

	dev_info(&dev->intf->dev, "Resuming extensions\n");
	mutex_lock(&em28xx_devlist_mutex);
	list_for_each_entry(ops, &em28xx_extension_devlist, next) {
		if (!ops->resume)
			continue;
		ops->resume(dev);
		if (dev->dev_next)
			ops->resume(dev->dev_next);
	}
	mutex_unlock(&em28xx_devlist_mutex);
	return 0;
}
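/*
 * Usage sketch (not part of em28xx-core.c): how a caller might use the
 * register helpers exported above for a read-modify-write access. The
 * function name is hypothetical; EM28XX_R0F_XCLK and EM28XX_XCLK_AUDIO_UNMUTE
 * are the same constants used by em28xx_audio_analog_set() above.
 */
static int example_em28xx_set_audio_mute(struct em28xx *dev, bool mute)
{
	/*
	 * em28xx_write_reg_bits() reads the register, clears the bits in the
	 * mask and ORs in the new value, so only the unmute bit is touched.
	 */
	return em28xx_write_reg_bits(dev, EM28XX_R0F_XCLK,
				     mute ? 0 : EM28XX_XCLK_AUDIO_UNMUTE,
				     EM28XX_XCLK_AUDIO_UNMUTE);
}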
// SPDX-License-Identifier: GPL-2.0-only
/*
 *	IEEE 802.1Q Multiple Registration Protocol (MRP)
 *
 *	Copyright (c) 2012 Massachusetts Institute of Technology
 *
 *	Adapted from code in net/802/garp.c
 *	Copyright (c) 2008 Patrick McHardy <kaber@trash.net>
 */
#include <linux/kernel.h>
#include <linux/timer.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>
#include <linux/module.h>
#include <net/mrp.h>
#include <linux/unaligned.h>

static unsigned int mrp_join_time __read_mostly = 200;
module_param(mrp_join_time, uint, 0644);
MODULE_PARM_DESC(mrp_join_time, "Join time in ms (default 200ms)");

static unsigned int mrp_periodic_time __read_mostly = 1000;
module_param(mrp_periodic_time, uint, 0644);
MODULE_PARM_DESC(mrp_periodic_time, "Periodic time in ms (default 1s)");

MODULE_DESCRIPTION("IEEE 802.1Q Multiple Registration Protocol (MRP)");
MODULE_LICENSE("GPL");

static const u8
mrp_applicant_state_table[MRP_APPLICANT_MAX + 1][MRP_EVENT_MAX + 1] = {
	[MRP_APPLICANT_VO] = {
		[MRP_EVENT_NEW]		= MRP_APPLICANT_VN,
		[MRP_EVENT_JOIN]	= MRP_APPLICANT_VP,
		[MRP_EVENT_LV]		= MRP_APPLICANT_VO,
		[MRP_EVENT_TX]		= MRP_APPLICANT_VO,
		[MRP_EVENT_R_NEW]	= MRP_APPLICANT_VO,
		[MRP_EVENT_R_JOIN_IN]	= MRP_APPLICANT_AO,
		[MRP_EVENT_R_IN]	= MRP_APPLICANT_VO,
		[MRP_EVENT_R_JOIN_MT]	= MRP_APPLICANT_VO,
		[MRP_EVENT_R_MT]	= MRP_APPLICANT_VO,
		[MRP_EVENT_R_LV]	= MRP_APPLICANT_VO,
		[MRP_EVENT_R_LA]	= MRP_APPLICANT_VO,
		[MRP_EVENT_REDECLARE]	= MRP_APPLICANT_VO,
		[MRP_EVENT_PERIODIC]	= MRP_APPLICANT_VO,
	},
	[MRP_APPLICANT_VP] = {
		[MRP_EVENT_NEW]		= MRP_APPLICANT_VN,
		[MRP_EVENT_JOIN]	= MRP_APPLICANT_VP,
		[MRP_EVENT_LV]		= MRP_APPLICANT_VO,
		[MRP_EVENT_TX]		= MRP_APPLICANT_AA,
		[MRP_EVENT_R_NEW]	= MRP_APPLICANT_VP,
		[MRP_EVENT_R_JOIN_IN]	= MRP_APPLICANT_AP,
		[MRP_EVENT_R_IN]	= MRP_APPLICANT_VP,
		[MRP_EVENT_R_JOIN_MT]	= MRP_APPLICANT_VP,
		[MRP_EVENT_R_MT]	= MRP_APPLICANT_VP,
		[MRP_EVENT_R_LV]	= MRP_APPLICANT_VP,
		[MRP_EVENT_R_LA]	= MRP_APPLICANT_VP,
		[MRP_EVENT_REDECLARE]	= MRP_APPLICANT_VP,
		[MRP_EVENT_PERIODIC]	= MRP_APPLICANT_VP,
	},
	[MRP_APPLICANT_VN] = {
		[MRP_EVENT_NEW]		= MRP_APPLICANT_VN,
		[MRP_EVENT_JOIN]	= MRP_APPLICANT_VN,
		[MRP_EVENT_LV]		= MRP_APPLICANT_LA,
		[MRP_EVENT_TX]		= MRP_APPLICANT_AN,
		[MRP_EVENT_R_NEW]	= MRP_APPLICANT_VN,
		[MRP_EVENT_R_JOIN_IN]	= MRP_APPLICANT_VN,
		[MRP_EVENT_R_IN]	= MRP_APPLICANT_VN,
		[MRP_EVENT_R_JOIN_MT]	= MRP_APPLICANT_VN,
		[MRP_EVENT_R_MT]	= MRP_APPLICANT_VN,
		[MRP_EVENT_R_LV]	= MRP_APPLICANT_VN,
		[MRP_EVENT_R_LA]	= MRP_APPLICANT_VN,
		[MRP_EVENT_REDECLARE]	= MRP_APPLICANT_VN,
		[MRP_EVENT_PERIODIC]	= MRP_APPLICANT_VN,
	},
	[MRP_APPLICANT_AN] = {
		[MRP_EVENT_NEW]		= MRP_APPLICANT_AN,
		[MRP_EVENT_JOIN]	= MRP_APPLICANT_AN,
		[MRP_EVENT_LV]		= MRP_APPLICANT_LA,
		[MRP_EVENT_TX]		= MRP_APPLICANT_QA,
		[MRP_EVENT_R_NEW]	= MRP_APPLICANT_AN,
		[MRP_EVENT_R_JOIN_IN]	= MRP_APPLICANT_AN,
		[MRP_EVENT_R_IN]	= MRP_APPLICANT_AN,
		[MRP_EVENT_R_JOIN_MT]	= MRP_APPLICANT_AN,
		[MRP_EVENT_R_MT]	= MRP_APPLICANT_AN,
		[MRP_EVENT_R_LV]	= MRP_APPLICANT_VN,
		[MRP_EVENT_R_LA]	= MRP_APPLICANT_VN,
		[MRP_EVENT_REDECLARE]	= MRP_APPLICANT_VN,
		[MRP_EVENT_PERIODIC]	=
MRP_APPLICANT_AN, }, [MRP_APPLICANT_AA] = { [MRP_EVENT_NEW] = MRP_APPLICANT_VN, [MRP_EVENT_JOIN] = MRP_APPLICANT_AA, [MRP_EVENT_LV] = MRP_APPLICANT_LA, [MRP_EVENT_TX] = MRP_APPLICANT_QA, [MRP_EVENT_R_NEW] = MRP_APPLICANT_AA, [MRP_EVENT_R_JOIN_IN] = MRP_APPLICANT_QA, [MRP_EVENT_R_IN] = MRP_APPLICANT_AA, [MRP_EVENT_R_JOIN_MT] = MRP_APPLICANT_AA, [MRP_EVENT_R_MT] = MRP_APPLICANT_AA, [MRP_EVENT_R_LV] = MRP_APPLICANT_VP, [MRP_EVENT_R_LA] = MRP_APPLICANT_VP, [MRP_EVENT_REDECLARE] = MRP_APPLICANT_VP, [MRP_EVENT_PERIODIC] = MRP_APPLICANT_AA, }, [MRP_APPLICANT_QA] = { [MRP_EVENT_NEW] = MRP_APPLICANT_VN, [MRP_EVENT_JOIN] = MRP_APPLICANT_QA, [MRP_EVENT_LV] = MRP_APPLICANT_LA, [MRP_EVENT_TX] = MRP_APPLICANT_QA, [MRP_EVENT_R_NEW] = MRP_APPLICANT_QA, [MRP_EVENT_R_JOIN_IN] = MRP_APPLICANT_QA, [MRP_EVENT_R_IN] = MRP_APPLICANT_QA, [MRP_EVENT_R_JOIN_MT] = MRP_APPLICANT_AA, [MRP_EVENT_R_MT] = MRP_APPLICANT_AA, [MRP_EVENT_R_LV] = MRP_APPLICANT_VP, [MRP_EVENT_R_LA] = MRP_APPLICANT_VP, [MRP_EVENT_REDECLARE] = MRP_APPLICANT_VP, [MRP_EVENT_PERIODIC] = MRP_APPLICANT_AA, }, [MRP_APPLICANT_LA] = { [MRP_EVENT_NEW] = MRP_APPLICANT_VN, [MRP_EVENT_JOIN] = MRP_APPLICANT_AA, [MRP_EVENT_LV] = MRP_APPLICANT_LA, [MRP_EVENT_TX] = MRP_APPLICANT_VO, [MRP_EVENT_R_NEW] = MRP_APPLICANT_LA, [MRP_EVENT_R_JOIN_IN] = MRP_APPLICANT_LA, [MRP_EVENT_R_IN] = MRP_APPLICANT_LA, [MRP_EVENT_R_JOIN_MT] = MRP_APPLICANT_LA, [MRP_EVENT_R_MT] = MRP_APPLICANT_LA, [MRP_EVENT_R_LV] = MRP_APPLICANT_LA, [MRP_EVENT_R_LA] = MRP_APPLICANT_LA, [MRP_EVENT_REDECLARE] = MRP_APPLICANT_LA, [MRP_EVENT_PERIODIC] = MRP_APPLICANT_LA, }, [MRP_APPLICANT_AO] = { [MRP_EVENT_NEW] = MRP_APPLICANT_VN, [MRP_EVENT_JOIN] = MRP_APPLICANT_AP, [MRP_EVENT_LV] = MRP_APPLICANT_AO, [MRP_EVENT_TX] = MRP_APPLICANT_AO, [MRP_EVENT_R_NEW] = MRP_APPLICANT_AO, [MRP_EVENT_R_JOIN_IN] = MRP_APPLICANT_QO, [MRP_EVENT_R_IN] = MRP_APPLICANT_AO, [MRP_EVENT_R_JOIN_MT] = MRP_APPLICANT_AO, [MRP_EVENT_R_MT] = MRP_APPLICANT_AO, [MRP_EVENT_R_LV] = MRP_APPLICANT_VO, [MRP_EVENT_R_LA] = MRP_APPLICANT_VO, [MRP_EVENT_REDECLARE] = MRP_APPLICANT_VO, [MRP_EVENT_PERIODIC] = MRP_APPLICANT_AO, }, [MRP_APPLICANT_QO] = { [MRP_EVENT_NEW] = MRP_APPLICANT_VN, [MRP_EVENT_JOIN] = MRP_APPLICANT_QP, [MRP_EVENT_LV] = MRP_APPLICANT_QO, [MRP_EVENT_TX] = MRP_APPLICANT_QO, [MRP_EVENT_R_NEW] = MRP_APPLICANT_QO, [MRP_EVENT_R_JOIN_IN] = MRP_APPLICANT_QO, [MRP_EVENT_R_IN] = MRP_APPLICANT_QO, [MRP_EVENT_R_JOIN_MT] = MRP_APPLICANT_AO, [MRP_EVENT_R_MT] = MRP_APPLICANT_AO, [MRP_EVENT_R_LV] = MRP_APPLICANT_VO, [MRP_EVENT_R_LA] = MRP_APPLICANT_VO, [MRP_EVENT_REDECLARE] = MRP_APPLICANT_VO, [MRP_EVENT_PERIODIC] = MRP_APPLICANT_QO, }, [MRP_APPLICANT_AP] = { [MRP_EVENT_NEW] = MRP_APPLICANT_VN, [MRP_EVENT_JOIN] = MRP_APPLICANT_AP, [MRP_EVENT_LV] = MRP_APPLICANT_AO, [MRP_EVENT_TX] = MRP_APPLICANT_QA, [MRP_EVENT_R_NEW] = MRP_APPLICANT_AP, [MRP_EVENT_R_JOIN_IN] = MRP_APPLICANT_QP, [MRP_EVENT_R_IN] = MRP_APPLICANT_AP, [MRP_EVENT_R_JOIN_MT] = MRP_APPLICANT_AP, [MRP_EVENT_R_MT] = MRP_APPLICANT_AP, [MRP_EVENT_R_LV] = MRP_APPLICANT_VP, [MRP_EVENT_R_LA] = MRP_APPLICANT_VP, [MRP_EVENT_REDECLARE] = MRP_APPLICANT_VP, [MRP_EVENT_PERIODIC] = MRP_APPLICANT_AP, }, [MRP_APPLICANT_QP] = { [MRP_EVENT_NEW] = MRP_APPLICANT_VN, [MRP_EVENT_JOIN] = MRP_APPLICANT_QP, [MRP_EVENT_LV] = MRP_APPLICANT_QO, [MRP_EVENT_TX] = MRP_APPLICANT_QP, [MRP_EVENT_R_NEW] = MRP_APPLICANT_QP, [MRP_EVENT_R_JOIN_IN] = MRP_APPLICANT_QP, [MRP_EVENT_R_IN] = MRP_APPLICANT_QP, [MRP_EVENT_R_JOIN_MT] = MRP_APPLICANT_AP, [MRP_EVENT_R_MT] = MRP_APPLICANT_AP, [MRP_EVENT_R_LV] = MRP_APPLICANT_VP, 
[MRP_EVENT_R_LA] = MRP_APPLICANT_VP, [MRP_EVENT_REDECLARE] = MRP_APPLICANT_VP, [MRP_EVENT_PERIODIC] = MRP_APPLICANT_AP, }, }; static const u8 mrp_tx_action_table[MRP_APPLICANT_MAX + 1] = { [MRP_APPLICANT_VO] = MRP_TX_ACTION_S_IN_OPTIONAL, [MRP_APPLICANT_VP] = MRP_TX_ACTION_S_JOIN_IN, [MRP_APPLICANT_VN] = MRP_TX_ACTION_S_NEW, [MRP_APPLICANT_AN] = MRP_TX_ACTION_S_NEW, [MRP_APPLICANT_AA] = MRP_TX_ACTION_S_JOIN_IN, [MRP_APPLICANT_QA] = MRP_TX_ACTION_S_JOIN_IN_OPTIONAL, [MRP_APPLICANT_LA] = MRP_TX_ACTION_S_LV, [MRP_APPLICANT_AO] = MRP_TX_ACTION_S_IN_OPTIONAL, [MRP_APPLICANT_QO] = MRP_TX_ACTION_S_IN_OPTIONAL, [MRP_APPLICANT_AP] = MRP_TX_ACTION_S_JOIN_IN, [MRP_APPLICANT_QP] = MRP_TX_ACTION_S_IN_OPTIONAL, }; static void mrp_attrvalue_inc(void *value, u8 len) { u8 *v = (u8 *)value; /* Add 1 to the last byte. If it becomes zero, * go to the previous byte and repeat. */ while (len > 0 && !++v[--len]) ; } static int mrp_attr_cmp(const struct mrp_attr *attr, const void *value, u8 len, u8 type) { if (attr->type != type) return attr->type - type; if (attr->len != len) return attr->len - len; return memcmp(attr->value, value, len); } static struct mrp_attr *mrp_attr_lookup(const struct mrp_applicant *app, const void *value, u8 len, u8 type) { struct rb_node *parent = app->mad.rb_node; struct mrp_attr *attr; int d; while (parent) { attr = rb_entry(parent, struct mrp_attr, node); d = mrp_attr_cmp(attr, value, len, type); if (d > 0) parent = parent->rb_left; else if (d < 0) parent = parent->rb_right; else return attr; } return NULL; } static struct mrp_attr *mrp_attr_create(struct mrp_applicant *app, const void *value, u8 len, u8 type) { struct rb_node *parent = NULL, **p = &app->mad.rb_node; struct mrp_attr *attr; int d; while (*p) { parent = *p; attr = rb_entry(parent, struct mrp_attr, node); d = mrp_attr_cmp(attr, value, len, type); if (d > 0) p = &parent->rb_left; else if (d < 0) p = &parent->rb_right; else { /* The attribute already exists; re-use it. */ return attr; } } attr = kmalloc(sizeof(*attr) + len, GFP_ATOMIC); if (!attr) return attr; attr->state = MRP_APPLICANT_VO; attr->type = type; attr->len = len; memcpy(attr->value, value, len); rb_link_node(&attr->node, parent, p); rb_insert_color(&attr->node, &app->mad); return attr; } static void mrp_attr_destroy(struct mrp_applicant *app, struct mrp_attr *attr) { rb_erase(&attr->node, &app->mad); kfree(attr); } static void mrp_attr_destroy_all(struct mrp_applicant *app) { struct rb_node *node, *next; struct mrp_attr *attr; for (node = rb_first(&app->mad); next = node ? 
rb_next(node) : NULL, node != NULL; node = next) { attr = rb_entry(node, struct mrp_attr, node); mrp_attr_destroy(app, attr); } } static int mrp_pdu_init(struct mrp_applicant *app) { struct sk_buff *skb; struct mrp_pdu_hdr *ph; skb = alloc_skb(app->dev->mtu + LL_RESERVED_SPACE(app->dev), GFP_ATOMIC); if (!skb) return -ENOMEM; skb->dev = app->dev; skb->protocol = app->app->pkttype.type; skb_reserve(skb, LL_RESERVED_SPACE(app->dev)); skb_reset_network_header(skb); skb_reset_transport_header(skb); ph = __skb_put(skb, sizeof(*ph)); ph->version = app->app->version; app->pdu = skb; return 0; } static int mrp_pdu_append_end_mark(struct mrp_applicant *app) { __be16 *endmark; if (skb_tailroom(app->pdu) < sizeof(*endmark)) return -1; endmark = __skb_put(app->pdu, sizeof(*endmark)); put_unaligned(MRP_END_MARK, endmark); return 0; } static void mrp_pdu_queue(struct mrp_applicant *app) { if (!app->pdu) return; if (mrp_cb(app->pdu)->mh) mrp_pdu_append_end_mark(app); mrp_pdu_append_end_mark(app); dev_hard_header(app->pdu, app->dev, ntohs(app->app->pkttype.type), app->app->group_address, app->dev->dev_addr, app->pdu->len); skb_queue_tail(&app->queue, app->pdu); app->pdu = NULL; } static void mrp_queue_xmit(struct mrp_applicant *app) { struct sk_buff *skb; while ((skb = skb_dequeue(&app->queue))) dev_queue_xmit(skb); } static int mrp_pdu_append_msg_hdr(struct mrp_applicant *app, u8 attrtype, u8 attrlen) { struct mrp_msg_hdr *mh; if (mrp_cb(app->pdu)->mh) { if (mrp_pdu_append_end_mark(app) < 0) return -1; mrp_cb(app->pdu)->mh = NULL; mrp_cb(app->pdu)->vah = NULL; } if (skb_tailroom(app->pdu) < sizeof(*mh)) return -1; mh = __skb_put(app->pdu, sizeof(*mh)); mh->attrtype = attrtype; mh->attrlen = attrlen; mrp_cb(app->pdu)->mh = mh; return 0; } static int mrp_pdu_append_vecattr_hdr(struct mrp_applicant *app, const void *firstattrvalue, u8 attrlen) { struct mrp_vecattr_hdr *vah; if (skb_tailroom(app->pdu) < sizeof(*vah) + attrlen) return -1; vah = __skb_put(app->pdu, sizeof(*vah) + attrlen); put_unaligned(0, &vah->lenflags); memcpy(vah->firstattrvalue, firstattrvalue, attrlen); mrp_cb(app->pdu)->vah = vah; memcpy(mrp_cb(app->pdu)->attrvalue, firstattrvalue, attrlen); return 0; } static int mrp_pdu_append_vecattr_event(struct mrp_applicant *app, const struct mrp_attr *attr, enum mrp_vecattr_event vaevent) { u16 len, pos; u8 *vaevents; int err; again: if (!app->pdu) { err = mrp_pdu_init(app); if (err < 0) return err; } /* If there is no Message header in the PDU, or the Message header is * for a different attribute type, add an EndMark (if necessary) and a * new Message header to the PDU. */ if (!mrp_cb(app->pdu)->mh || mrp_cb(app->pdu)->mh->attrtype != attr->type || mrp_cb(app->pdu)->mh->attrlen != attr->len) { if (mrp_pdu_append_msg_hdr(app, attr->type, attr->len) < 0) goto queue; } /* If there is no VectorAttribute header for this Message in the PDU, * or this attribute's value does not sequentially follow the previous * attribute's value, add a new VectorAttribute header to the PDU. */ if (!mrp_cb(app->pdu)->vah || memcmp(mrp_cb(app->pdu)->attrvalue, attr->value, attr->len)) { if (mrp_pdu_append_vecattr_hdr(app, attr->value, attr->len) < 0) goto queue; } len = be16_to_cpu(get_unaligned(&mrp_cb(app->pdu)->vah->lenflags)); pos = len % 3; /* Events are packed into Vectors in the PDU, three to a byte. Add a * byte to the end of the Vector if necessary. 
*/ if (!pos) { if (skb_tailroom(app->pdu) < sizeof(u8)) goto queue; vaevents = __skb_put(app->pdu, sizeof(u8)); } else { vaevents = (u8 *)(skb_tail_pointer(app->pdu) - sizeof(u8)); } switch (pos) { case 0: *vaevents = vaevent * (__MRP_VECATTR_EVENT_MAX * __MRP_VECATTR_EVENT_MAX); break; case 1: *vaevents += vaevent * __MRP_VECATTR_EVENT_MAX; break; case 2: *vaevents += vaevent; break; default: WARN_ON(1); } /* Increment the length of the VectorAttribute in the PDU, as well as * the value of the next attribute that would continue its Vector. */ put_unaligned(cpu_to_be16(++len), &mrp_cb(app->pdu)->vah->lenflags); mrp_attrvalue_inc(mrp_cb(app->pdu)->attrvalue, attr->len); return 0; queue: mrp_pdu_queue(app); goto again; } static void mrp_attr_event(struct mrp_applicant *app, struct mrp_attr *attr, enum mrp_event event) { enum mrp_applicant_state state; state = mrp_applicant_state_table[attr->state][event]; if (state == MRP_APPLICANT_INVALID) { WARN_ON(1); return; } if (event == MRP_EVENT_TX) { /* When appending the attribute fails, don't update its state * in order to retry at the next TX event. */ switch (mrp_tx_action_table[attr->state]) { case MRP_TX_ACTION_NONE: case MRP_TX_ACTION_S_JOIN_IN_OPTIONAL: case MRP_TX_ACTION_S_IN_OPTIONAL: break; case MRP_TX_ACTION_S_NEW: if (mrp_pdu_append_vecattr_event( app, attr, MRP_VECATTR_EVENT_NEW) < 0) return; break; case MRP_TX_ACTION_S_JOIN_IN: if (mrp_pdu_append_vecattr_event( app, attr, MRP_VECATTR_EVENT_JOIN_IN) < 0) return; break; case MRP_TX_ACTION_S_LV: if (mrp_pdu_append_vecattr_event( app, attr, MRP_VECATTR_EVENT_LV) < 0) return; /* As a pure applicant, sending a leave message * implies that the attribute was unregistered and * can be destroyed. */ mrp_attr_destroy(app, attr); return; default: WARN_ON(1); } } attr->state = state; } int mrp_request_join(const struct net_device *dev, const struct mrp_application *appl, const void *value, u8 len, u8 type) { struct mrp_port *port = rtnl_dereference(dev->mrp_port); struct mrp_applicant *app = rtnl_dereference( port->applicants[appl->type]); struct mrp_attr *attr; if (sizeof(struct mrp_skb_cb) + len > sizeof_field(struct sk_buff, cb)) return -ENOMEM; spin_lock_bh(&app->lock); attr = mrp_attr_create(app, value, len, type); if (!attr) { spin_unlock_bh(&app->lock); return -ENOMEM; } mrp_attr_event(app, attr, MRP_EVENT_JOIN); spin_unlock_bh(&app->lock); return 0; } EXPORT_SYMBOL_GPL(mrp_request_join); void mrp_request_leave(const struct net_device *dev, const struct mrp_application *appl, const void *value, u8 len, u8 type) { struct mrp_port *port = rtnl_dereference(dev->mrp_port); struct mrp_applicant *app = rtnl_dereference( port->applicants[appl->type]); struct mrp_attr *attr; if (sizeof(struct mrp_skb_cb) + len > sizeof_field(struct sk_buff, cb)) return; spin_lock_bh(&app->lock); attr = mrp_attr_lookup(app, value, len, type); if (!attr) { spin_unlock_bh(&app->lock); return; } mrp_attr_event(app, attr, MRP_EVENT_LV); spin_unlock_bh(&app->lock); } EXPORT_SYMBOL_GPL(mrp_request_leave); static void mrp_mad_event(struct mrp_applicant *app, enum mrp_event event) { struct rb_node *node, *next; struct mrp_attr *attr; for (node = rb_first(&app->mad); next = node ? 
rb_next(node) : NULL, node != NULL; node = next) { attr = rb_entry(node, struct mrp_attr, node); mrp_attr_event(app, attr, event); } } static void mrp_join_timer_arm(struct mrp_applicant *app) { unsigned long delay; delay = get_random_u32_below(msecs_to_jiffies(mrp_join_time)); mod_timer(&app->join_timer, jiffies + delay); } static void mrp_join_timer(struct timer_list *t) { struct mrp_applicant *app = timer_container_of(app, t, join_timer); spin_lock(&app->lock); mrp_mad_event(app, MRP_EVENT_TX); mrp_pdu_queue(app); spin_unlock(&app->lock); mrp_queue_xmit(app); spin_lock(&app->lock); if (likely(app->active)) mrp_join_timer_arm(app); spin_unlock(&app->lock); } static void mrp_periodic_timer_arm(struct mrp_applicant *app) { mod_timer(&app->periodic_timer, jiffies + msecs_to_jiffies(mrp_periodic_time)); } static void mrp_periodic_timer(struct timer_list *t) { struct mrp_applicant *app = timer_container_of(app, t, periodic_timer); spin_lock(&app->lock); if (likely(app->active)) { mrp_mad_event(app, MRP_EVENT_PERIODIC); mrp_pdu_queue(app); mrp_periodic_timer_arm(app); } spin_unlock(&app->lock); } static int mrp_pdu_parse_end_mark(struct sk_buff *skb, int *offset) { __be16 endmark; if (skb_copy_bits(skb, *offset, &endmark, sizeof(endmark)) < 0) return -1; if (endmark == MRP_END_MARK) { *offset += sizeof(endmark); return -1; } return 0; } static void mrp_pdu_parse_vecattr_event(struct mrp_applicant *app, struct sk_buff *skb, enum mrp_vecattr_event vaevent) { struct mrp_attr *attr; enum mrp_event event; attr = mrp_attr_lookup(app, mrp_cb(skb)->attrvalue, mrp_cb(skb)->mh->attrlen, mrp_cb(skb)->mh->attrtype); if (attr == NULL) return; switch (vaevent) { case MRP_VECATTR_EVENT_NEW: event = MRP_EVENT_R_NEW; break; case MRP_VECATTR_EVENT_JOIN_IN: event = MRP_EVENT_R_JOIN_IN; break; case MRP_VECATTR_EVENT_IN: event = MRP_EVENT_R_IN; break; case MRP_VECATTR_EVENT_JOIN_MT: event = MRP_EVENT_R_JOIN_MT; break; case MRP_VECATTR_EVENT_MT: event = MRP_EVENT_R_MT; break; case MRP_VECATTR_EVENT_LV: event = MRP_EVENT_R_LV; break; default: return; } mrp_attr_event(app, attr, event); } static int mrp_pdu_parse_vecattr(struct mrp_applicant *app, struct sk_buff *skb, int *offset) { struct mrp_vecattr_hdr _vah; u16 valen; u8 vaevents, vaevent; mrp_cb(skb)->vah = skb_header_pointer(skb, *offset, sizeof(_vah), &_vah); if (!mrp_cb(skb)->vah) return -1; *offset += sizeof(_vah); if (get_unaligned(&mrp_cb(skb)->vah->lenflags) & MRP_VECATTR_HDR_FLAG_LA) mrp_mad_event(app, MRP_EVENT_R_LA); valen = be16_to_cpu(get_unaligned(&mrp_cb(skb)->vah->lenflags) & MRP_VECATTR_HDR_LEN_MASK); /* The VectorAttribute structure in a PDU carries event information * about one or more attributes having consecutive values. Only the * value for the first attribute is contained in the structure. So * we make a copy of that value, and then increment it each time we * advance to the next event in its Vector. */ if (sizeof(struct mrp_skb_cb) + mrp_cb(skb)->mh->attrlen > sizeof_field(struct sk_buff, cb)) return -1; if (skb_copy_bits(skb, *offset, mrp_cb(skb)->attrvalue, mrp_cb(skb)->mh->attrlen) < 0) return -1; *offset += mrp_cb(skb)->mh->attrlen; /* In a VectorAttribute, the Vector contains events which are packed * three to a byte. We process one byte of the Vector at a time. */ while (valen > 0) { if (skb_copy_bits(skb, *offset, &vaevents, sizeof(vaevents)) < 0) return -1; *offset += sizeof(vaevents); /* Extract and process the first event. 
*/ vaevent = vaevents / (__MRP_VECATTR_EVENT_MAX * __MRP_VECATTR_EVENT_MAX); if (vaevent >= __MRP_VECATTR_EVENT_MAX) { /* The byte is malformed; stop processing. */ return -1; } mrp_pdu_parse_vecattr_event(app, skb, vaevent); /* If present, extract and process the second event. */ if (!--valen) break; mrp_attrvalue_inc(mrp_cb(skb)->attrvalue, mrp_cb(skb)->mh->attrlen); vaevents %= (__MRP_VECATTR_EVENT_MAX * __MRP_VECATTR_EVENT_MAX); vaevent = vaevents / __MRP_VECATTR_EVENT_MAX; mrp_pdu_parse_vecattr_event(app, skb, vaevent); /* If present, extract and process the third event. */ if (!--valen) break; mrp_attrvalue_inc(mrp_cb(skb)->attrvalue, mrp_cb(skb)->mh->attrlen); vaevents %= __MRP_VECATTR_EVENT_MAX; vaevent = vaevents; mrp_pdu_parse_vecattr_event(app, skb, vaevent); } return 0; } static int mrp_pdu_parse_msg(struct mrp_applicant *app, struct sk_buff *skb, int *offset) { struct mrp_msg_hdr _mh; mrp_cb(skb)->mh = skb_header_pointer(skb, *offset, sizeof(_mh), &_mh); if (!mrp_cb(skb)->mh) return -1; *offset += sizeof(_mh); if (mrp_cb(skb)->mh->attrtype == 0 || mrp_cb(skb)->mh->attrtype > app->app->maxattr || mrp_cb(skb)->mh->attrlen == 0) return -1; while (skb->len > *offset) { if (mrp_pdu_parse_end_mark(skb, offset) < 0) break; if (mrp_pdu_parse_vecattr(app, skb, offset) < 0) return -1; } return 0; } static int mrp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) { struct mrp_application *appl = container_of(pt, struct mrp_application, pkttype); struct mrp_port *port; struct mrp_applicant *app; struct mrp_pdu_hdr _ph; const struct mrp_pdu_hdr *ph; int offset = skb_network_offset(skb); /* If the interface is in promiscuous mode, drop the packet if * it was unicast to another host. */ if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) goto out; skb = skb_share_check(skb, GFP_ATOMIC); if (unlikely(!skb)) goto out; port = rcu_dereference(dev->mrp_port); if (unlikely(!port)) goto out; app = rcu_dereference(port->applicants[appl->type]); if (unlikely(!app)) goto out; ph = skb_header_pointer(skb, offset, sizeof(_ph), &_ph); if (!ph) goto out; offset += sizeof(_ph); if (ph->version != app->app->version) goto out; spin_lock(&app->lock); while (skb->len > offset) { if (mrp_pdu_parse_end_mark(skb, &offset) < 0) break; if (mrp_pdu_parse_msg(app, skb, &offset) < 0) break; } spin_unlock(&app->lock); out: kfree_skb(skb); return 0; } static int mrp_init_port(struct net_device *dev) { struct mrp_port *port; port = kzalloc(sizeof(*port), GFP_KERNEL); if (!port) return -ENOMEM; rcu_assign_pointer(dev->mrp_port, port); return 0; } static void mrp_release_port(struct net_device *dev) { struct mrp_port *port = rtnl_dereference(dev->mrp_port); unsigned int i; for (i = 0; i <= MRP_APPLICATION_MAX; i++) { if (rtnl_dereference(port->applicants[i])) return; } RCU_INIT_POINTER(dev->mrp_port, NULL); kfree_rcu(port, rcu); } int mrp_init_applicant(struct net_device *dev, struct mrp_application *appl) { struct mrp_applicant *app; int err; ASSERT_RTNL(); if (!rtnl_dereference(dev->mrp_port)) { err = mrp_init_port(dev); if (err < 0) goto err1; } err = -ENOMEM; app = kzalloc(sizeof(*app), GFP_KERNEL); if (!app) goto err2; err = dev_mc_add(dev, appl->group_address); if (err < 0) goto err3; app->dev = dev; app->app = appl; app->mad = RB_ROOT; app->active = true; spin_lock_init(&app->lock); skb_queue_head_init(&app->queue); rcu_assign_pointer(dev->mrp_port->applicants[appl->type], app); timer_setup(&app->join_timer, mrp_join_timer, 0); mrp_join_timer_arm(app); 
static int mrp_pdu_parse_msg(struct mrp_applicant *app, struct sk_buff *skb,
			     int *offset)
{
	struct mrp_msg_hdr _mh;

	mrp_cb(skb)->mh = skb_header_pointer(skb, *offset, sizeof(_mh), &_mh);
	if (!mrp_cb(skb)->mh)
		return -1;
	*offset += sizeof(_mh);

	if (mrp_cb(skb)->mh->attrtype == 0 ||
	    mrp_cb(skb)->mh->attrtype > app->app->maxattr ||
	    mrp_cb(skb)->mh->attrlen == 0)
		return -1;

	while (skb->len > *offset) {
		if (mrp_pdu_parse_end_mark(skb, offset) < 0)
			break;
		if (mrp_pdu_parse_vecattr(app, skb, offset) < 0)
			return -1;
	}
	return 0;
}

static int mrp_rcv(struct sk_buff *skb, struct net_device *dev,
		   struct packet_type *pt, struct net_device *orig_dev)
{
	struct mrp_application *appl = container_of(pt, struct mrp_application,
						    pkttype);
	struct mrp_port *port;
	struct mrp_applicant *app;
	struct mrp_pdu_hdr _ph;
	const struct mrp_pdu_hdr *ph;
	int offset = skb_network_offset(skb);

	/* If the interface is in promiscuous mode, drop the packet if
	 * it was unicast to another host.
	 */
	if (unlikely(skb->pkt_type == PACKET_OTHERHOST))
		goto out;
	skb = skb_share_check(skb, GFP_ATOMIC);
	if (unlikely(!skb))
		goto out;
	port = rcu_dereference(dev->mrp_port);
	if (unlikely(!port))
		goto out;
	app = rcu_dereference(port->applicants[appl->type]);
	if (unlikely(!app))
		goto out;

	ph = skb_header_pointer(skb, offset, sizeof(_ph), &_ph);
	if (!ph)
		goto out;
	offset += sizeof(_ph);

	if (ph->version != app->app->version)
		goto out;

	spin_lock(&app->lock);
	while (skb->len > offset) {
		if (mrp_pdu_parse_end_mark(skb, &offset) < 0)
			break;
		if (mrp_pdu_parse_msg(app, skb, &offset) < 0)
			break;
	}
	spin_unlock(&app->lock);
out:
	kfree_skb(skb);
	return 0;
}

static int mrp_init_port(struct net_device *dev)
{
	struct mrp_port *port;

	port = kzalloc(sizeof(*port), GFP_KERNEL);
	if (!port)
		return -ENOMEM;
	rcu_assign_pointer(dev->mrp_port, port);
	return 0;
}

static void mrp_release_port(struct net_device *dev)
{
	struct mrp_port *port = rtnl_dereference(dev->mrp_port);
	unsigned int i;

	for (i = 0; i <= MRP_APPLICATION_MAX; i++) {
		if (rtnl_dereference(port->applicants[i]))
			return;
	}
	RCU_INIT_POINTER(dev->mrp_port, NULL);
	kfree_rcu(port, rcu);
}

int mrp_init_applicant(struct net_device *dev, struct mrp_application *appl)
{
	struct mrp_applicant *app;
	int err;

	ASSERT_RTNL();

	if (!rtnl_dereference(dev->mrp_port)) {
		err = mrp_init_port(dev);
		if (err < 0)
			goto err1;
	}

	err = -ENOMEM;
	app = kzalloc(sizeof(*app), GFP_KERNEL);
	if (!app)
		goto err2;

	err = dev_mc_add(dev, appl->group_address);
	if (err < 0)
		goto err3;

	app->dev = dev;
	app->app = appl;
	app->mad = RB_ROOT;
	app->active = true;
	spin_lock_init(&app->lock);
	skb_queue_head_init(&app->queue);
	rcu_assign_pointer(dev->mrp_port->applicants[appl->type], app);

	timer_setup(&app->join_timer, mrp_join_timer, 0);
	mrp_join_timer_arm(app);

	timer_setup(&app->periodic_timer, mrp_periodic_timer, 0);
	mrp_periodic_timer_arm(app);

	return 0;

err3:
	kfree(app);
err2:
	mrp_release_port(dev);
err1:
	return err;
}
EXPORT_SYMBOL_GPL(mrp_init_applicant);

void mrp_uninit_applicant(struct net_device *dev, struct mrp_application *appl)
{
	struct mrp_port *port = rtnl_dereference(dev->mrp_port);
	struct mrp_applicant *app = rtnl_dereference(
		port->applicants[appl->type]);

	ASSERT_RTNL();

	RCU_INIT_POINTER(port->applicants[appl->type], NULL);

	spin_lock_bh(&app->lock);
	app->active = false;
	spin_unlock_bh(&app->lock);

	/* Delete timer and generate a final TX event to flush out
	 * all pending messages before the applicant is gone.
	 */
	timer_shutdown_sync(&app->join_timer);
	timer_shutdown_sync(&app->periodic_timer);

	spin_lock_bh(&app->lock);
	mrp_mad_event(app, MRP_EVENT_TX);
	mrp_attr_destroy_all(app);
	mrp_pdu_queue(app);
	spin_unlock_bh(&app->lock);

	mrp_queue_xmit(app);

	dev_mc_del(dev, appl->group_address);
	kfree_rcu(app, rcu);
	mrp_release_port(dev);
}
EXPORT_SYMBOL_GPL(mrp_uninit_applicant);

int mrp_register_application(struct mrp_application *appl)
{
	appl->pkttype.func = mrp_rcv;
	dev_add_pack(&appl->pkttype);
	return 0;
}
EXPORT_SYMBOL_GPL(mrp_register_application);

void mrp_unregister_application(struct mrp_application *appl)
{
	dev_remove_pack(&appl->pkttype);
}
EXPORT_SYMBOL_GPL(mrp_unregister_application);
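/*
 * Illustrative sketch, not part of the original file: a protocol module built
 * on this API would typically declare one struct mrp_application, register it
 * at module init, and attach an applicant to each participating device under
 * RTNL. The application type, EtherType and group address below are
 * placeholders chosen for the example, not values taken from this file.
 */
#if 0	/* example only, not built */
static struct mrp_application example_mrp_app __read_mostly = {
	.type		= MRP_APPLICATION_MVRP,		/* placeholder */
	.maxattr	= 1,
	.pkttype.type	= cpu_to_be16(0x88f5),		/* placeholder */
	.group_address	= { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x21 },
	.version	= 0,
};

static int example_attach(struct net_device *dev)
{
	/* Caller must hold RTNL, see ASSERT_RTNL() in mrp_init_applicant(). */
	return mrp_init_applicant(dev, &example_mrp_app);
}

static int __init example_module_init(void)
{
	return mrp_register_application(&example_mrp_app);
}
#endif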
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include "mmu.h"
#include "mmu_internal.h"
#include "mmutrace.h"
#include "tdp_iter.h"
#include "tdp_mmu.h"
#include "spte.h"

#include <asm/cmpxchg.h>
#include <trace/events/kvm.h>

/* Initializes the TDP MMU for the VM, if enabled. */
void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
{
	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
	spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
}

/* Arbitrarily returns true so that this may be used in if statements. */
static __always_inline bool kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
							     bool shared)
{
	if (shared)
		lockdep_assert_held_read(&kvm->mmu_lock);
	else
		lockdep_assert_held_write(&kvm->mmu_lock);

	return true;
}

void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
{
	/*
	 * Invalidate all roots, which besides the obvious, schedules all roots
	 * for zapping and thus puts the TDP MMU's reference to each root, i.e.
	 * ultimately frees all roots.
*/ kvm_tdp_mmu_invalidate_roots(kvm, KVM_VALID_ROOTS); kvm_tdp_mmu_zap_invalidated_roots(kvm, false); #ifdef CONFIG_KVM_PROVE_MMU KVM_MMU_WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages)); #endif WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots)); /* * Ensure that all the outstanding RCU callbacks to free shadow pages * can run before the VM is torn down. Putting the last reference to * zapped roots will create new callbacks. */ rcu_barrier(); } static void tdp_mmu_free_sp(struct kvm_mmu_page *sp) { free_page((unsigned long)sp->external_spt); free_page((unsigned long)sp->spt); kmem_cache_free(mmu_page_header_cache, sp); } /* * This is called through call_rcu in order to free TDP page table memory * safely with respect to other kernel threads that may be operating on * the memory. * By only accessing TDP MMU page table memory in an RCU read critical * section, and freeing it after a grace period, lockless access to that * memory won't use it after it is freed. */ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head) { struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page, rcu_head); tdp_mmu_free_sp(sp); } void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root) { if (!refcount_dec_and_test(&root->tdp_mmu_root_count)) return; /* * The TDP MMU itself holds a reference to each root until the root is * explicitly invalidated, i.e. the final reference should be never be * put for a valid root. */ KVM_BUG_ON(!is_tdp_mmu_page(root) || !root->role.invalid, kvm); spin_lock(&kvm->arch.tdp_mmu_pages_lock); list_del_rcu(&root->link); spin_unlock(&kvm->arch.tdp_mmu_pages_lock); call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback); } static bool tdp_mmu_root_match(struct kvm_mmu_page *root, enum kvm_tdp_mmu_root_types types) { if (WARN_ON_ONCE(!(types & KVM_VALID_ROOTS))) return false; if (root->role.invalid && !(types & KVM_INVALID_ROOTS)) return false; if (likely(!is_mirror_sp(root))) return types & KVM_DIRECT_ROOTS; return types & KVM_MIRROR_ROOTS; } /* * Returns the next root after @prev_root (or the first root if @prev_root is * NULL) that matches with @types. A reference to the returned root is * acquired, and the reference to @prev_root is released (the caller obviously * must hold a reference to @prev_root if it's non-NULL). * * Roots that doesn't match with @types are skipped. * * Returns NULL if the end of tdp_mmu_roots was reached. */ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, struct kvm_mmu_page *prev_root, enum kvm_tdp_mmu_root_types types) { struct kvm_mmu_page *next_root; /* * While the roots themselves are RCU-protected, fields such as * role.invalid are protected by mmu_lock. */ lockdep_assert_held(&kvm->mmu_lock); rcu_read_lock(); if (prev_root) next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots, &prev_root->link, typeof(*prev_root), link); else next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots, typeof(*next_root), link); while (next_root) { if (tdp_mmu_root_match(next_root, types) && kvm_tdp_mmu_get_root(next_root)) break; next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots, &next_root->link, typeof(*next_root), link); } rcu_read_unlock(); if (prev_root) kvm_tdp_mmu_put_root(kvm, prev_root); return next_root; } /* * Note: this iterator gets and puts references to the roots it iterates over. * This makes it safe to release the MMU lock and yield within the loop, but * if exiting the loop early, the caller must drop the reference to the most * recent root. (Unless keeping a live reference is desirable.) 
* * If shared is set, this function is operating under the MMU lock in read * mode. */ #define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _types) \ for (_root = tdp_mmu_next_root(_kvm, NULL, _types); \ ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root; \ _root = tdp_mmu_next_root(_kvm, _root, _types)) \ if (_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) { \ } else #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id) \ __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, KVM_VALID_ROOTS) #define for_each_tdp_mmu_root_yield_safe(_kvm, _root) \ for (_root = tdp_mmu_next_root(_kvm, NULL, KVM_ALL_ROOTS); \ ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root; \ _root = tdp_mmu_next_root(_kvm, _root, KVM_ALL_ROOTS)) /* * Iterate over all TDP MMU roots. Requires that mmu_lock be held for write, * the implication being that any flow that holds mmu_lock for read is * inherently yield-friendly and should use the yield-safe variant above. * Holding mmu_lock for write obviates the need for RCU protection as the list * is guaranteed to be stable. */ #define __for_each_tdp_mmu_root(_kvm, _root, _as_id, _types) \ list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link) \ if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) && \ ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) || \ !tdp_mmu_root_match((_root), (_types)))) { \ } else /* * Iterate over all TDP MMU roots in an RCU read-side critical section. * It is safe to iterate over the SPTEs under the root, but their values will * be unstable, so all writes must be atomic. As this routine is meant to be * used without holding the mmu_lock at all, any bits that are flipped must * be reflected in kvm_tdp_mmu_spte_need_atomic_write(). */ #define for_each_tdp_mmu_root_rcu(_kvm, _root, _as_id, _types) \ list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link) \ if ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) || \ !tdp_mmu_root_match((_root), (_types))) { \ } else #define for_each_valid_tdp_mmu_root(_kvm, _root, _as_id) \ __for_each_tdp_mmu_root(_kvm, _root, _as_id, KVM_VALID_ROOTS) static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu) { struct kvm_mmu_page *sp; sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache); return sp; } static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep, gfn_t gfn, union kvm_mmu_page_role role) { INIT_LIST_HEAD(&sp->possible_nx_huge_page_link); set_page_private(virt_to_page(sp->spt), (unsigned long)sp); sp->role = role; sp->gfn = gfn; sp->ptep = sptep; sp->tdp_mmu_page = true; trace_kvm_mmu_get_page(sp, true); } static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp, struct tdp_iter *iter) { struct kvm_mmu_page *parent_sp; union kvm_mmu_page_role role; parent_sp = sptep_to_sp(rcu_dereference(iter->sptep)); role = parent_sp->role; role.level--; tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role); } void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool mirror) { struct kvm_mmu *mmu = vcpu->arch.mmu; union kvm_mmu_page_role role = mmu->root_role; int as_id = kvm_mmu_role_as_id(role); struct kvm *kvm = vcpu->kvm; struct kvm_mmu_page *root; if (mirror) role.is_mirror = true; /* * Check for an existing root before acquiring the pages lock to avoid * unnecessary serialization if multiple vCPUs are loading a new root. * E.g. when bringing up secondary vCPUs, KVM will already have created * a valid root on behalf of the primary vCPU. 
*/ read_lock(&kvm->mmu_lock); for_each_valid_tdp_mmu_root_yield_safe(kvm, root, as_id) { if (root->role.word == role.word) goto out_read_unlock; } spin_lock(&kvm->arch.tdp_mmu_pages_lock); /* * Recheck for an existing root after acquiring the pages lock, another * vCPU may have raced ahead and created a new usable root. Manually * walk the list of roots as the standard macros assume that the pages * lock is *not* held. WARN if grabbing a reference to a usable root * fails, as the last reference to a root can only be put *after* the * root has been invalidated, which requires holding mmu_lock for write. */ list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { if (root->role.word == role.word && !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) goto out_spin_unlock; } root = tdp_mmu_alloc_sp(vcpu); tdp_mmu_init_sp(root, NULL, 0, role); /* * TDP MMU roots are kept until they are explicitly invalidated, either * by a memslot update or by the destruction of the VM. Initialize the * refcount to two; one reference for the vCPU, and one reference for * the TDP MMU itself, which is held until the root is invalidated and * is ultimately put by kvm_tdp_mmu_zap_invalidated_roots(). */ refcount_set(&root->tdp_mmu_root_count, 2); list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots); out_spin_unlock: spin_unlock(&kvm->arch.tdp_mmu_pages_lock); out_read_unlock: read_unlock(&kvm->mmu_lock); /* * Note, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS will prevent entering the guest * and actually consuming the root if it's invalidated after dropping * mmu_lock, and the root can't be freed as this vCPU holds a reference. */ if (mirror) { mmu->mirror_root_hpa = __pa(root->spt); } else { mmu->root.hpa = __pa(root->spt); mmu->root.pgd = 0; } } static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, u64 old_spte, u64 new_spte, int level, bool shared); static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp) { kvm_account_pgtable_pages((void *)sp->spt, +1); #ifdef CONFIG_KVM_PROVE_MMU atomic64_inc(&kvm->arch.tdp_mmu_pages); #endif } static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp) { kvm_account_pgtable_pages((void *)sp->spt, -1); #ifdef CONFIG_KVM_PROVE_MMU atomic64_dec(&kvm->arch.tdp_mmu_pages); #endif } /** * tdp_mmu_unlink_sp() - Remove a shadow page from the list of used pages * * @kvm: kvm instance * @sp: the page to be removed */ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp) { tdp_unaccount_mmu_page(kvm, sp); if (!sp->nx_huge_page_disallowed) return; spin_lock(&kvm->arch.tdp_mmu_pages_lock); sp->nx_huge_page_disallowed = false; untrack_possible_nx_huge_page(kvm, sp); spin_unlock(&kvm->arch.tdp_mmu_pages_lock); } static void remove_external_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte, int level) { kvm_pfn_t old_pfn = spte_to_pfn(old_spte); int ret; /* * External (TDX) SPTEs are limited to PG_LEVEL_4K, and external * PTs are removed in a special order, involving free_external_spt(). * But remove_external_spte() will be called on non-leaf PTEs via * __tdp_mmu_zap_root(), so avoid the error the former would return * in this case. */ if (!is_last_spte(old_spte, level)) return; /* Zapping leaf spte is allowed only when write lock is held. */ lockdep_assert_held_write(&kvm->mmu_lock); /* Because write lock is held, operation should success. 
*/ ret = kvm_x86_call(remove_external_spte)(kvm, gfn, level, old_pfn); KVM_BUG_ON(ret, kvm); } /** * handle_removed_pt() - handle a page table removed from the TDP structure * * @kvm: kvm instance * @pt: the page removed from the paging structure * @shared: This operation may not be running under the exclusive use * of the MMU lock and the operation must synchronize with other * threads that might be modifying SPTEs. * * Given a page table that has been removed from the TDP paging structure, * iterates through the page table to clear SPTEs and free child page tables. * * Note that pt is passed in as a tdp_ptep_t, but it does not need RCU * protection. Since this thread removed it from the paging structure, * this thread will be responsible for ensuring the page is freed. Hence the * early rcu_dereferences in the function. */ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared) { struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(pt)); int level = sp->role.level; gfn_t base_gfn = sp->gfn; int i; trace_kvm_mmu_prepare_zap_page(sp); tdp_mmu_unlink_sp(kvm, sp); for (i = 0; i < SPTE_ENT_PER_PAGE; i++) { tdp_ptep_t sptep = pt + i; gfn_t gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level); u64 old_spte; if (shared) { /* * Set the SPTE to a nonpresent value that other * threads will not overwrite. If the SPTE was * already marked as frozen then another thread * handling a page fault could overwrite it, so * set the SPTE until it is set from some other * value to the frozen SPTE value. */ for (;;) { old_spte = kvm_tdp_mmu_write_spte_atomic(sptep, FROZEN_SPTE); if (!is_frozen_spte(old_spte)) break; cpu_relax(); } } else { /* * If the SPTE is not MMU-present, there is no backing * page associated with the SPTE and so no side effects * that need to be recorded, and exclusive ownership of * mmu_lock ensures the SPTE can't be made present. * Note, zapping MMIO SPTEs is also unnecessary as they * are guarded by the memslots generation, not by being * unreachable. */ old_spte = kvm_tdp_mmu_read_spte(sptep); if (!is_shadow_present_pte(old_spte)) continue; /* * Use the common helper instead of a raw WRITE_ONCE as * the SPTE needs to be updated atomically if it can be * modified by a different vCPU outside of mmu_lock. * Even though the parent SPTE is !PRESENT, the TLB * hasn't yet been flushed, and both Intel and AMD * document that A/D assists can use upper-level PxE * entries that are cached in the TLB, i.e. the CPU can * still access the page and mark it dirty. * * No retry is needed in the atomic update path as the * sole concern is dropping a Dirty bit, i.e. no other * task can zap/remove the SPTE as mmu_lock is held for * write. Marking the SPTE as a frozen SPTE is not * strictly necessary for the same reason, but using * the frozen SPTE value keeps the shared/exclusive * paths consistent and allows the handle_changed_spte() * call below to hardcode the new value to FROZEN_SPTE. * * Note, even though dropping a Dirty bit is the only * scenario where a non-atomic update could result in a * functional bug, simply checking the Dirty bit isn't * sufficient as a fast page fault could read the upper * level SPTE before it is zapped, and then make this * target SPTE writable, resume the guest, and set the * Dirty bit between reading the SPTE above and writing * it here. 
*/ old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, FROZEN_SPTE, level); } handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn, old_spte, FROZEN_SPTE, level, shared); if (is_mirror_sp(sp)) { KVM_BUG_ON(shared, kvm); remove_external_spte(kvm, gfn, old_spte, level); } } if (is_mirror_sp(sp) && WARN_ON(kvm_x86_call(free_external_spt)(kvm, base_gfn, sp->role.level, sp->external_spt))) { /* * Failed to free page table page in mirror page table and * there is nothing to do further. * Intentionally leak the page to prevent the kernel from * accessing the encrypted page. */ sp->external_spt = NULL; } call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); } static void *get_external_spt(gfn_t gfn, u64 new_spte, int level) { if (is_shadow_present_pte(new_spte) && !is_last_spte(new_spte, level)) { struct kvm_mmu_page *sp = spte_to_child_sp(new_spte); WARN_ON_ONCE(sp->role.level + 1 != level); WARN_ON_ONCE(sp->gfn != gfn); return sp->external_spt; } return NULL; } static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep, gfn_t gfn, u64 old_spte, u64 new_spte, int level) { bool was_present = is_shadow_present_pte(old_spte); bool is_present = is_shadow_present_pte(new_spte); bool is_leaf = is_present && is_last_spte(new_spte, level); kvm_pfn_t new_pfn = spte_to_pfn(new_spte); int ret = 0; KVM_BUG_ON(was_present, kvm); lockdep_assert_held(&kvm->mmu_lock); /* * We need to lock out other updates to the SPTE until the external * page table has been modified. Use FROZEN_SPTE similar to * the zapping case. */ if (!try_cmpxchg64(rcu_dereference(sptep), &old_spte, FROZEN_SPTE)) return -EBUSY; /* * Use different call to either set up middle level * external page table, or leaf. */ if (is_leaf) { ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_pfn); } else { void *external_spt = get_external_spt(gfn, new_spte, level); KVM_BUG_ON(!external_spt, kvm); ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt); } if (ret) __kvm_tdp_mmu_write_spte(sptep, old_spte); else __kvm_tdp_mmu_write_spte(sptep, new_spte); return ret; } /** * handle_changed_spte - handle bookkeeping associated with an SPTE change * @kvm: kvm instance * @as_id: the address space of the paging structure the SPTE was a part of * @gfn: the base GFN that was mapped by the SPTE * @old_spte: The value of the SPTE before the change * @new_spte: The value of the SPTE after the change * @level: the level of the PT the SPTE is part of in the paging structure * @shared: This operation may not be running under the exclusive use of * the MMU lock and the operation must synchronize with other * threads that might be modifying SPTEs. * * Handle bookkeeping that might result from the modification of a SPTE. Note, * dirty logging updates are handled in common code, not here (see make_spte() * and fast_pf_fix_direct_spte()). */ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, u64 old_spte, u64 new_spte, int level, bool shared) { bool was_present = is_shadow_present_pte(old_spte); bool is_present = is_shadow_present_pte(new_spte); bool was_leaf = was_present && is_last_spte(old_spte, level); bool is_leaf = is_present && is_last_spte(new_spte, level); bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte); WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL); WARN_ON_ONCE(level < PG_LEVEL_4K); WARN_ON_ONCE(gfn & (KVM_PAGES_PER_HPAGE(level) - 1)); /* * If this warning were to trigger it would indicate that there was a * missing MMU notifier or a race with some notifier handler. 
* A present, leaf SPTE should never be directly replaced with another * present leaf SPTE pointing to a different PFN. A notifier handler * should be zapping the SPTE before the main MM's page table is * changed, or the SPTE should be zeroed, and the TLBs flushed by the * thread before replacement. */ if (was_leaf && is_leaf && pfn_changed) { pr_err("Invalid SPTE change: cannot replace a present leaf\n" "SPTE with another present leaf SPTE mapping a\n" "different PFN!\n" "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d", as_id, gfn, old_spte, new_spte, level); /* * Crash the host to prevent error propagation and guest data * corruption. */ BUG(); } if (old_spte == new_spte) return; trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte); if (is_leaf) check_spte_writable_invariants(new_spte); /* * The only times a SPTE should be changed from a non-present to * non-present state is when an MMIO entry is installed/modified/ * removed. In that case, there is nothing to do here. */ if (!was_present && !is_present) { /* * If this change does not involve a MMIO SPTE or frozen SPTE, * it is unexpected. Log the change, though it should not * impact the guest since both the former and current SPTEs * are nonpresent. */ if (WARN_ON_ONCE(!is_mmio_spte(kvm, old_spte) && !is_mmio_spte(kvm, new_spte) && !is_frozen_spte(new_spte))) pr_err("Unexpected SPTE change! Nonpresent SPTEs\n" "should not be replaced with another,\n" "different nonpresent SPTE, unless one or both\n" "are MMIO SPTEs, or the new SPTE is\n" "a temporary frozen SPTE.\n" "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d", as_id, gfn, old_spte, new_spte, level); return; } if (is_leaf != was_leaf) kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1); /* * Recursively handle child PTs if the change removed a subtree from * the paging structure. Note the WARN on the PFN changing without the * SPTE being converted to a hugepage (leaf) or being zapped. Shadow * pages are kernel allocations and should never be migrated. */ if (was_present && !was_leaf && (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); } static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm, struct tdp_iter *iter, u64 new_spte) { /* * The caller is responsible for ensuring the old SPTE is not a FROZEN * SPTE. KVM should never attempt to zap or manipulate a FROZEN SPTE, * and pre-checking before inserting a new SPTE is advantageous as it * avoids unnecessary work. */ WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte)); if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) { int ret; /* * Users of atomic zapping don't operate on mirror roots, * so don't handle it and bug the VM if it's seen. */ if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) return -EBUSY; ret = set_external_spte_present(kvm, iter->sptep, iter->gfn, iter->old_spte, new_spte, iter->level); if (ret) return ret; } else { u64 *sptep = rcu_dereference(iter->sptep); /* * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs * and does not hold the mmu_lock. On failure, i.e. if a * different logical CPU modified the SPTE, try_cmpxchg64() * updates iter->old_spte with the current value, so the caller * operates on fresh data, e.g. if it retries * tdp_mmu_set_spte_atomic() */ if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte)) return -EBUSY; } return 0; } /* * tdp_mmu_set_spte_atomic - Set a TDP MMU SPTE atomically * and handle the associated bookkeeping. 
Do not mark the page dirty * in KVM's dirty bitmaps. * * If setting the SPTE fails because it has changed, iter->old_spte will be * refreshed to the current value of the spte. * * @kvm: kvm instance * @iter: a tdp_iter instance currently on the SPTE that should be set * @new_spte: The value the SPTE should be set to * Return: * * 0 - If the SPTE was set. * * -EBUSY - If the SPTE cannot be set. In this case this function will have * no side-effects other than setting iter->old_spte to the last * known value of the spte. */ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm, struct tdp_iter *iter, u64 new_spte) { int ret; lockdep_assert_held_read(&kvm->mmu_lock); ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte); if (ret) return ret; handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte, new_spte, iter->level, true); return 0; } /* * tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping * @kvm: KVM instance * @as_id: Address space ID, i.e. regular vs. SMM * @sptep: Pointer to the SPTE * @old_spte: The current value of the SPTE * @new_spte: The new value that will be set for the SPTE * @gfn: The base GFN that was (or will be) mapped by the SPTE * @level: The level _containing_ the SPTE (its parent PT's level) * * Returns the old SPTE value, which _may_ be different than @old_spte if the * SPTE had voldatile bits. */ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep, u64 old_spte, u64 new_spte, gfn_t gfn, int level) { lockdep_assert_held_write(&kvm->mmu_lock); /* * No thread should be using this function to set SPTEs to or from the * temporary frozen SPTE value. * If operating under the MMU lock in read mode, tdp_mmu_set_spte_atomic * should be used. If operating under the MMU lock in write mode, the * use of the frozen SPTE should not be necessary. */ WARN_ON_ONCE(is_frozen_spte(old_spte) || is_frozen_spte(new_spte)); old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level); handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false); /* * Users that do non-atomic setting of PTEs don't operate on mirror * roots, so don't handle it and bug the VM if it's seen. */ if (is_mirror_sptep(sptep)) { KVM_BUG_ON(is_shadow_present_pte(new_spte), kvm); remove_external_spte(kvm, gfn, old_spte, level); } return old_spte; } static inline void tdp_mmu_iter_set_spte(struct kvm *kvm, struct tdp_iter *iter, u64 new_spte) { WARN_ON_ONCE(iter->yielded); iter->old_spte = tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep, iter->old_spte, new_spte, iter->gfn, iter->level); } #define tdp_root_for_each_pte(_iter, _kvm, _root, _start, _end) \ for_each_tdp_pte(_iter, _kvm, _root, _start, _end) #define tdp_root_for_each_leaf_pte(_iter, _kvm, _root, _start, _end) \ tdp_root_for_each_pte(_iter, _kvm, _root, _start, _end) \ if (!is_shadow_present_pte(_iter.old_spte) || \ !is_last_spte(_iter.old_spte, _iter.level)) \ continue; \ else static inline bool __must_check tdp_mmu_iter_need_resched(struct kvm *kvm, struct tdp_iter *iter) { if (!need_resched() && !rwlock_needbreak(&kvm->mmu_lock)) return false; /* Ensure forward progress has been made before yielding. */ return iter->next_last_level_gfn != iter->yielded_gfn; } /* * Yield if the MMU lock is contended or this thread needs to return control * to the scheduler. * * If this function should yield and flush is set, it will perform a remote * TLB flush before yielding. 
* * If this function yields, iter->yielded is set and the caller must skip to * the next iteration, where tdp_iter_next() will reset the tdp_iter's walk * over the paging structures to allow the iterator to continue its traversal * from the paging structure root. * * Returns true if this function yielded. */ static inline bool __must_check tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter, bool flush, bool shared) { KVM_MMU_WARN_ON(iter->yielded); if (!tdp_mmu_iter_need_resched(kvm, iter)) return false; if (flush) kvm_flush_remote_tlbs(kvm); rcu_read_unlock(); if (shared) cond_resched_rwlock_read(&kvm->mmu_lock); else cond_resched_rwlock_write(&kvm->mmu_lock); rcu_read_lock(); WARN_ON_ONCE(iter->gfn > iter->next_last_level_gfn); iter->yielded = true; return true; } static inline gfn_t tdp_mmu_max_gfn_exclusive(void) { /* * Bound TDP MMU walks at host.MAXPHYADDR. KVM disallows memslots with * a gpa range that would exceed the max gfn, and KVM does not create * MMIO SPTEs for "impossible" gfns, instead sending such accesses down * the slow emulation path every time. */ return kvm_mmu_max_gfn() + 1; } static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root, bool shared, int zap_level) { struct tdp_iter iter; for_each_tdp_pte_min_level_all(iter, root, zap_level) { retry: if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared)) continue; if (!is_shadow_present_pte(iter.old_spte)) continue; if (iter.level > zap_level) continue; if (!shared) tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); else if (tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE)) goto retry; } } static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root, bool shared) { /* * The root must have an elevated refcount so that it's reachable via * mmu_notifier callbacks, which allows this path to yield and drop * mmu_lock. When handling an unmap/release mmu_notifier command, KVM * must drop all references to relevant pages prior to completing the * callback. Dropping mmu_lock with an unreachable root would result * in zapping SPTEs after a relevant mmu_notifier callback completes * and lead to use-after-free as zapping a SPTE triggers "writeback" of * dirty accessed bits to the SPTE's associated struct page. */ WARN_ON_ONCE(!refcount_read(&root->tdp_mmu_root_count)); kvm_lockdep_assert_mmu_lock_held(kvm, shared); rcu_read_lock(); /* * Zap roots in multiple passes of decreasing granularity, i.e. zap at * 4KiB=>2MiB=>1GiB=>root, in order to better honor need_resched() (all * preempt models) or mmu_lock contention (full or real-time models). * Zapping at finer granularity marginally increases the total time of * the zap, but in most cases the zap itself isn't latency sensitive. * * If KVM is configured to prove the MMU, skip the 4KiB and 2MiB zaps * in order to mimic the page fault path, which can replace a 1GiB page * table with an equivalent 1GiB hugepage, i.e. can get saddled with * zapping a 1GiB region that's fully populated with 4KiB SPTEs. This * allows verifying that KVM can safely zap 1GiB regions, e.g. without * inducing RCU stalls, without relying on a relatively rare event * (zapping roots is orders of magnitude more common). Note, because * zapping a SP recurses on its children, stepping down to PG_LEVEL_4K * in the iterator itself is unnecessary. 
*/ if (!IS_ENABLED(CONFIG_KVM_PROVE_MMU)) { __tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_4K); __tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_2M); } __tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_1G); __tdp_mmu_zap_root(kvm, root, shared, root->role.level); rcu_read_unlock(); } bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp) { u64 old_spte; /* * This helper intentionally doesn't allow zapping a root shadow page, * which doesn't have a parent page table and thus no associated entry. */ if (WARN_ON_ONCE(!sp->ptep)) return false; old_spte = kvm_tdp_mmu_read_spte(sp->ptep); if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte))) return false; tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, SHADOW_NONPRESENT_VALUE, sp->gfn, sp->role.level + 1); return true; } /* * If can_yield is true, will release the MMU lock and reschedule if the * scheduler needs the CPU or there is contention on the MMU lock. If this * function cannot yield, it will not release the MMU lock or reschedule and * the caller must ensure it does not supply too large a GFN range, or the * operation can cause a soft lockup. */ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root, gfn_t start, gfn_t end, bool can_yield, bool flush) { struct tdp_iter iter; end = min(end, tdp_mmu_max_gfn_exclusive()); lockdep_assert_held_write(&kvm->mmu_lock); rcu_read_lock(); for_each_tdp_pte_min_level(iter, kvm, root, PG_LEVEL_4K, start, end) { if (can_yield && tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) { flush = false; continue; } if (!is_shadow_present_pte(iter.old_spte) || !is_last_spte(iter.old_spte, iter.level)) continue; tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); /* * Zappings SPTEs in invalid roots doesn't require a TLB flush, * see kvm_tdp_mmu_zap_invalidated_roots() for details. */ if (!root->role.invalid) flush = true; } rcu_read_unlock(); /* * Because this flow zaps _only_ leaf SPTEs, the caller doesn't need * to provide RCU protection as no 'struct kvm_mmu_page' will be freed. */ return flush; } /* * Zap leaf SPTEs for the range of gfns, [start, end), for all *VALID** roots. * Returns true if a TLB flush is needed before releasing the MMU lock, i.e. if * one or more SPTEs were zapped since the MMU lock was last acquired. */ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush) { struct kvm_mmu_page *root; lockdep_assert_held_write(&kvm->mmu_lock); for_each_valid_tdp_mmu_root_yield_safe(kvm, root, -1) flush = tdp_mmu_zap_leafs(kvm, root, start, end, true, flush); return flush; } void kvm_tdp_mmu_zap_all(struct kvm *kvm) { struct kvm_mmu_page *root; /* * Zap all direct roots, including invalid direct roots, as all direct * SPTEs must be dropped before returning to the caller. For TDX, mirror * roots don't need handling in response to the mmu notifier (the caller). * * Zap directly even if the root is also being zapped by a concurrent * "fast zap". Walking zapped top-level SPTEs isn't all that expensive * and mmu_lock is already held, which means the other thread has yielded. * * A TLB flush is unnecessary, KVM zaps everything if and only the VM * is being destroyed or the userspace VMM has exited. In both cases, * KVM_RUN is unreachable, i.e. no vCPUs will ever service the request. 
*/ lockdep_assert_held_write(&kvm->mmu_lock); __for_each_tdp_mmu_root_yield_safe(kvm, root, -1, KVM_DIRECT_ROOTS | KVM_INVALID_ROOTS) tdp_mmu_zap_root(kvm, root, false); } /* * Zap all invalidated roots to ensure all SPTEs are dropped before the "fast * zap" completes. */ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm, bool shared) { struct kvm_mmu_page *root; if (shared) read_lock(&kvm->mmu_lock); else write_lock(&kvm->mmu_lock); for_each_tdp_mmu_root_yield_safe(kvm, root) { if (!root->tdp_mmu_scheduled_root_to_zap) continue; root->tdp_mmu_scheduled_root_to_zap = false; KVM_BUG_ON(!root->role.invalid, kvm); /* * A TLB flush is not necessary as KVM performs a local TLB * flush when allocating a new root (see kvm_mmu_load()), and * when migrating a vCPU to a different pCPU. Note, the local * TLB flush on reuse also invalidates paging-structure-cache * entries, i.e. TLB entries for intermediate paging structures, * that may be zapped, as such entries are associated with the * ASID on both VMX and SVM. */ tdp_mmu_zap_root(kvm, root, shared); /* * The referenced needs to be put *after* zapping the root, as * the root must be reachable by mmu_notifiers while it's being * zapped */ kvm_tdp_mmu_put_root(kvm, root); } if (shared) read_unlock(&kvm->mmu_lock); else write_unlock(&kvm->mmu_lock); } /* * Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that * is about to be zapped, e.g. in response to a memslots update. The actual * zapping is done separately so that it happens with mmu_lock with read, * whereas invalidating roots must be done with mmu_lock held for write (unless * the VM is being destroyed). * * Note, kvm_tdp_mmu_zap_invalidated_roots() is gifted the TDP MMU's reference. * See kvm_tdp_mmu_alloc_root(). */ void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm, enum kvm_tdp_mmu_root_types root_types) { struct kvm_mmu_page *root; /* * Invalidating invalid roots doesn't make sense, prevent developers from * having to think about it. */ if (WARN_ON_ONCE(root_types & KVM_INVALID_ROOTS)) root_types &= ~KVM_INVALID_ROOTS; /* * mmu_lock must be held for write to ensure that a root doesn't become * invalid while there are active readers (invalidating a root while * there are active readers may or may not be problematic in practice, * but it's uncharted territory and not supported). * * Waive the assertion if there are no users of @kvm, i.e. the VM is * being destroyed after all references have been put, or if no vCPUs * have been created (which means there are no roots), i.e. the VM is * being destroyed in an error path of KVM_CREATE_VM. */ if (IS_ENABLED(CONFIG_PROVE_LOCKING) && refcount_read(&kvm->users_count) && kvm->created_vcpus) lockdep_assert_held_write(&kvm->mmu_lock); /* * As above, mmu_lock isn't held when destroying the VM! There can't * be other references to @kvm, i.e. nothing else can invalidate roots * or get/put references to roots. */ list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { if (!tdp_mmu_root_match(root, root_types)) continue; /* * Note, invalid roots can outlive a memslot update! Invalid * roots must be *zapped* before the memslot update completes, * but a different task can acquire a reference and keep the * root alive after its been zapped. */ if (!root->role.invalid) { root->tdp_mmu_scheduled_root_to_zap = true; root->role.invalid = true; } } } /* * Installs a last-level SPTE to handle a TDP page fault. 
* (NPT/EPT violation/misconfiguration) */ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, struct tdp_iter *iter) { struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep)); u64 new_spte; int ret = RET_PF_FIXED; bool wrprot = false; if (WARN_ON_ONCE(sp->role.level != fault->goal_level)) return RET_PF_RETRY; if (is_shadow_present_pte(iter->old_spte) && (fault->prefetch || is_access_allowed(fault, iter->old_spte)) && is_last_spte(iter->old_spte, iter->level)) { WARN_ON_ONCE(fault->pfn != spte_to_pfn(iter->old_spte)); return RET_PF_SPURIOUS; } if (unlikely(!fault->slot)) new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL); else wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn, fault->pfn, iter->old_spte, fault->prefetch, false, fault->map_writable, &new_spte); if (new_spte == iter->old_spte) ret = RET_PF_SPURIOUS; else if (tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte)) return RET_PF_RETRY; else if (is_shadow_present_pte(iter->old_spte) && (!is_last_spte(iter->old_spte, iter->level) || WARN_ON_ONCE(leaf_spte_change_needs_tlb_flush(iter->old_spte, new_spte)))) kvm_flush_remote_tlbs_gfn(vcpu->kvm, iter->gfn, iter->level); /* * If the page fault was caused by a write but the page is write * protected, emulation is needed. If the emulation was skipped, * the vCPU would have the same fault again. */ if (wrprot && fault->write) ret = RET_PF_WRITE_PROTECTED; /* If a MMIO SPTE is installed, the MMIO will need to be emulated. */ if (unlikely(is_mmio_spte(vcpu->kvm, new_spte))) { vcpu->stat.pf_mmio_spte_created++; trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, new_spte); ret = RET_PF_EMULATE; } else { trace_kvm_mmu_set_spte(iter->level, iter->gfn, rcu_dereference(iter->sptep)); } return ret; } /* * tdp_mmu_link_sp - Replace the given spte with an spte pointing to the * provided page table. * * @kvm: kvm instance * @iter: a tdp_iter instance currently on the SPTE that should be set * @sp: The new TDP page table to install. * @shared: This operation is running under the MMU lock in read mode. * * Returns: 0 if the new page table was installed. Non-0 if the page table * could not be installed (e.g. the atomic compare-exchange failed). */ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter, struct kvm_mmu_page *sp, bool shared) { u64 spte = make_nonleaf_spte(sp->spt, !kvm_ad_enabled); int ret = 0; if (shared) { ret = tdp_mmu_set_spte_atomic(kvm, iter, spte); if (ret) return ret; } else { tdp_mmu_iter_set_spte(kvm, iter, spte); } tdp_account_mmu_page(kvm, sp); return 0; } static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter, struct kvm_mmu_page *sp, bool shared); /* * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing * page tables and SPTEs to translate the faulting guest physical address. */ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_mmu_page *root = tdp_mmu_get_root_for_fault(vcpu, fault); struct kvm *kvm = vcpu->kvm; struct tdp_iter iter; struct kvm_mmu_page *sp; int ret = RET_PF_RETRY; kvm_mmu_hugepage_adjust(vcpu, fault); trace_kvm_mmu_spte_requested(fault); rcu_read_lock(); for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) { int r; if (fault->nx_huge_page_workaround_enabled) disallowed_hugepage_adjust(fault, iter.old_spte, iter.level); /* * If SPTE has been frozen by another thread, just give up and * retry, avoiding unnecessary page table allocation and free. 
*/ if (is_frozen_spte(iter.old_spte)) goto retry; if (iter.level == fault->goal_level) goto map_target_level; /* Step down into the lower level page table if it exists. */ if (is_shadow_present_pte(iter.old_spte) && !is_large_pte(iter.old_spte)) continue; /* * The SPTE is either non-present or points to a huge page that * needs to be split. */ sp = tdp_mmu_alloc_sp(vcpu); tdp_mmu_init_child_sp(sp, &iter); if (is_mirror_sp(sp)) kvm_mmu_alloc_external_spt(vcpu, sp); sp->nx_huge_page_disallowed = fault->huge_page_disallowed; if (is_shadow_present_pte(iter.old_spte)) { /* Don't support large page for mirrored roots (TDX) */ KVM_BUG_ON(is_mirror_sptep(iter.sptep), vcpu->kvm); r = tdp_mmu_split_huge_page(kvm, &iter, sp, true); } else { r = tdp_mmu_link_sp(kvm, &iter, sp, true); } /* * Force the guest to retry if installing an upper level SPTE * failed, e.g. because a different task modified the SPTE. */ if (r) { tdp_mmu_free_sp(sp); goto retry; } if (fault->huge_page_disallowed && fault->req_level >= iter.level) { spin_lock(&kvm->arch.tdp_mmu_pages_lock); if (sp->nx_huge_page_disallowed) track_possible_nx_huge_page(kvm, sp); spin_unlock(&kvm->arch.tdp_mmu_pages_lock); } } /* * The walk aborted before reaching the target level, e.g. because the * iterator detected an upper level SPTE was frozen during traversal. */ WARN_ON_ONCE(iter.level == fault->goal_level); goto retry; map_target_level: ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter); retry: rcu_read_unlock(); return ret; } /* Used by mmu notifier via kvm_unmap_gfn_range() */ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, bool flush) { enum kvm_tdp_mmu_root_types types; struct kvm_mmu_page *root; types = kvm_gfn_range_filter_to_root_types(kvm, range->attr_filter) | KVM_INVALID_ROOTS; __for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id, types) flush = tdp_mmu_zap_leafs(kvm, root, range->start, range->end, range->may_block, flush); return flush; } /* * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero * if any of the GFNs in the range have been accessed. * * No need to mark the corresponding PFN as accessed as this call is coming * from the clear_young() or clear_flush_young() notifier, which uses the * return value to determine if the page has been accessed. */ static void kvm_tdp_mmu_age_spte(struct kvm *kvm, struct tdp_iter *iter) { u64 new_spte; if (spte_ad_enabled(iter->old_spte)) { iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep, shadow_accessed_mask); new_spte = iter->old_spte & ~shadow_accessed_mask; } else { new_spte = mark_spte_for_access_track(iter->old_spte); /* * It is safe for the following cmpxchg to fail. Leave the * Accessed bit set, as the spte is most likely young anyway. */ if (__tdp_mmu_set_spte_atomic(kvm, iter, new_spte)) return; } trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level, iter->old_spte, new_spte); } static bool __kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, bool test_only) { enum kvm_tdp_mmu_root_types types; struct kvm_mmu_page *root; struct tdp_iter iter; bool ret = false; types = kvm_gfn_range_filter_to_root_types(kvm, range->attr_filter); /* * Don't support rescheduling, none of the MMU notifiers that funnel * into this helper allow blocking; it'd be dead, wasteful code. Note, * this helper must NOT be used to unmap GFNs, as it processes only * valid roots! 
*/ WARN_ON(types & ~KVM_VALID_ROOTS); guard(rcu)(); for_each_tdp_mmu_root_rcu(kvm, root, range->slot->as_id, types) { tdp_root_for_each_leaf_pte(iter, kvm, root, range->start, range->end) { if (!is_accessed_spte(iter.old_spte)) continue; if (test_only) return true; ret = true; kvm_tdp_mmu_age_spte(kvm, &iter); } } return ret; } bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) { return __kvm_tdp_mmu_age_gfn_range(kvm, range, false); } bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { return __kvm_tdp_mmu_age_gfn_range(kvm, range, true); } /* * Remove write access from all SPTEs at or above min_level that map GFNs * [start, end). Returns true if an SPTE has been changed and the TLBs need to * be flushed. */ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, gfn_t start, gfn_t end, int min_level) { struct tdp_iter iter; u64 new_spte; bool spte_set = false; rcu_read_lock(); BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL); for_each_tdp_pte_min_level(iter, kvm, root, min_level, start, end) { retry: if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true)) continue; if (!is_shadow_present_pte(iter.old_spte) || !is_last_spte(iter.old_spte, iter.level) || !(iter.old_spte & PT_WRITABLE_MASK)) continue; new_spte = iter.old_spte & ~PT_WRITABLE_MASK; if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte)) goto retry; spte_set = true; } rcu_read_unlock(); return spte_set; } /* * Remove write access from all the SPTEs mapping GFNs in the memslot. Will * only affect leaf SPTEs down to min_level. * Returns true if an SPTE has been changed and the TLBs need to be flushed. */ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, const struct kvm_memory_slot *slot, int min_level) { struct kvm_mmu_page *root; bool spte_set = false; lockdep_assert_held_read(&kvm->mmu_lock); for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id) spte_set |= wrprot_gfn_range(kvm, root, slot->base_gfn, slot->base_gfn + slot->npages, min_level); return spte_set; } static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(void) { struct kvm_mmu_page *sp; sp = kmem_cache_zalloc(mmu_page_header_cache, GFP_KERNEL_ACCOUNT); if (!sp) return NULL; sp->spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT); if (!sp->spt) { kmem_cache_free(mmu_page_header_cache, sp); return NULL; } return sp; } /* Note, the caller is responsible for initializing @sp. */ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter, struct kvm_mmu_page *sp, bool shared) { const u64 huge_spte = iter->old_spte; const int level = iter->level; int ret, i; /* * No need for atomics when writing to sp->spt since the page table has * not been linked in yet and thus is not reachable from any other CPU. */ for (i = 0; i < SPTE_ENT_PER_PAGE; i++) sp->spt[i] = make_small_spte(kvm, huge_spte, sp->role, i); /* * Replace the huge spte with a pointer to the populated lower level * page table. Since we are making this change without a TLB flush vCPUs * will see a mix of the split mappings and the original huge mapping, * depending on what's currently in their TLB. This is fine from a * correctness standpoint since the translation will be the same either * way. */ ret = tdp_mmu_link_sp(kvm, iter, sp, shared); if (ret) goto out; /* * tdp_mmu_link_sp_atomic() will handle subtracting the huge page we * are overwriting from the page stats. But we have to manually update * the page stats with the new present child pages. 
*/ kvm_update_page_stats(kvm, level - 1, SPTE_ENT_PER_PAGE); out: trace_kvm_mmu_split_huge_page(iter->gfn, huge_spte, level, ret); return ret; } static int tdp_mmu_split_huge_pages_root(struct kvm *kvm, struct kvm_mmu_page *root, gfn_t start, gfn_t end, int target_level, bool shared) { struct kvm_mmu_page *sp = NULL; struct tdp_iter iter; rcu_read_lock(); /* * Traverse the page table splitting all huge pages above the target * level into one lower level. For example, if we encounter a 1GB page * we split it into 512 2MB pages. * * Since the TDP iterator uses a pre-order traversal, we are guaranteed * to visit an SPTE before ever visiting its children, which means we * will correctly recursively split huge pages that are more than one * level above the target level (e.g. splitting a 1GB to 512 2MB pages, * and then splitting each of those to 512 4KB pages). */ for_each_tdp_pte_min_level(iter, kvm, root, target_level + 1, start, end) { retry: if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared)) continue; if (!is_shadow_present_pte(iter.old_spte) || !is_large_pte(iter.old_spte)) continue; if (!sp) { rcu_read_unlock(); if (shared) read_unlock(&kvm->mmu_lock); else write_unlock(&kvm->mmu_lock); sp = tdp_mmu_alloc_sp_for_split(); if (shared) read_lock(&kvm->mmu_lock); else write_lock(&kvm->mmu_lock); if (!sp) { trace_kvm_mmu_split_huge_page(iter.gfn, iter.old_spte, iter.level, -ENOMEM); return -ENOMEM; } rcu_read_lock(); iter.yielded = true; continue; } tdp_mmu_init_child_sp(sp, &iter); if (tdp_mmu_split_huge_page(kvm, &iter, sp, shared)) goto retry; sp = NULL; } rcu_read_unlock(); /* * It's possible to exit the loop having never used the last sp if, for * example, a vCPU doing HugePage NX splitting wins the race and * installs its own sp in place of the last sp we tried to split. */ if (sp) tdp_mmu_free_sp(sp); return 0; } /* * Try to split all huge pages mapped by the TDP MMU down to the target level. */ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm, const struct kvm_memory_slot *slot, gfn_t start, gfn_t end, int target_level, bool shared) { struct kvm_mmu_page *root; int r = 0; kvm_lockdep_assert_mmu_lock_held(kvm, shared); for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id) { r = tdp_mmu_split_huge_pages_root(kvm, root, start, end, target_level, shared); if (r) { kvm_tdp_mmu_put_root(kvm, root); break; } } } static bool tdp_mmu_need_write_protect(struct kvm *kvm, struct kvm_mmu_page *sp) { /* * All TDP MMU shadow pages share the same role as their root, aside * from level, so it is valid to key off any shadow page to determine if * write protection is needed for an entire tree. */ return kvm_mmu_page_ad_need_write_protect(kvm, sp) || !kvm_ad_enabled; } static void clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, gfn_t start, gfn_t end) { const u64 dbit = tdp_mmu_need_write_protect(kvm, root) ? PT_WRITABLE_MASK : shadow_dirty_mask; struct tdp_iter iter; rcu_read_lock(); tdp_root_for_each_pte(iter, kvm, root, start, end) { retry: if (!is_shadow_present_pte(iter.old_spte) || !is_last_spte(iter.old_spte, iter.level)) continue; if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true)) continue; KVM_MMU_WARN_ON(dbit == shadow_dirty_mask && spte_ad_need_write_protect(iter.old_spte)); if (!(iter.old_spte & dbit)) continue; if (tdp_mmu_set_spte_atomic(kvm, &iter, iter.old_spte & ~dbit)) goto retry; } rcu_read_unlock(); } /* * Clear the dirty status (D-bit or W-bit) of all the SPTEs mapping GFNs in the * memslot. 
*/ void kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, const struct kvm_memory_slot *slot) { struct kvm_mmu_page *root; lockdep_assert_held_read(&kvm->mmu_lock); for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id) clear_dirty_gfn_range(kvm, root, slot->base_gfn, slot->base_gfn + slot->npages); } static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root, gfn_t gfn, unsigned long mask, bool wrprot) { const u64 dbit = (wrprot || tdp_mmu_need_write_protect(kvm, root)) ? PT_WRITABLE_MASK : shadow_dirty_mask; struct tdp_iter iter; lockdep_assert_held_write(&kvm->mmu_lock); rcu_read_lock(); tdp_root_for_each_leaf_pte(iter, kvm, root, gfn + __ffs(mask), gfn + BITS_PER_LONG) { if (!mask) break; KVM_MMU_WARN_ON(dbit == shadow_dirty_mask && spte_ad_need_write_protect(iter.old_spte)); if (iter.level > PG_LEVEL_4K || !(mask & (1UL << (iter.gfn - gfn)))) continue; mask &= ~(1UL << (iter.gfn - gfn)); if (!(iter.old_spte & dbit)) continue; iter.old_spte = tdp_mmu_clear_spte_bits(iter.sptep, iter.old_spte, dbit, iter.level); trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level, iter.old_spte, iter.old_spte & ~dbit); } rcu_read_unlock(); } /* * Clear the dirty status (D-bit or W-bit) of all the 4k SPTEs mapping GFNs for * which a bit is set in mask, starting at gfn. The given memslot is expected to * contain all the GFNs represented by set bits in the mask. */ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, unsigned long mask, bool wrprot) { struct kvm_mmu_page *root; for_each_valid_tdp_mmu_root(kvm, root, slot->as_id) clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot); } static int tdp_mmu_make_huge_spte(struct kvm *kvm, struct tdp_iter *parent, u64 *huge_spte) { struct kvm_mmu_page *root = spte_to_child_sp(parent->old_spte); gfn_t start = parent->gfn; gfn_t end = start + KVM_PAGES_PER_HPAGE(parent->level); struct tdp_iter iter; tdp_root_for_each_leaf_pte(iter, kvm, root, start, end) { /* * Use the parent iterator when checking for forward progress so * that KVM doesn't get stuck continuously trying to yield (i.e. * returning -EAGAIN here and then failing the forward progress * check in the caller ad nauseam). */ if (tdp_mmu_iter_need_resched(kvm, parent)) return -EAGAIN; *huge_spte = make_huge_spte(kvm, iter.old_spte, parent->level); return 0; } return -ENOENT; } static void recover_huge_pages_range(struct kvm *kvm, struct kvm_mmu_page *root, const struct kvm_memory_slot *slot) { gfn_t start = slot->base_gfn; gfn_t end = start + slot->npages; struct tdp_iter iter; int max_mapping_level; bool flush = false; u64 huge_spte; int r; if (WARN_ON_ONCE(kvm_slot_dirty_track_enabled(slot))) return; rcu_read_lock(); for_each_tdp_pte_min_level(iter, kvm, root, PG_LEVEL_2M, start, end) { retry: if (tdp_mmu_iter_cond_resched(kvm, &iter, flush, true)) { flush = false; continue; } if (iter.level > KVM_MAX_HUGEPAGE_LEVEL || !is_shadow_present_pte(iter.old_spte)) continue; /* * Don't zap leaf SPTEs, if a leaf SPTE could be replaced with * a large page size, then its parent would have been zapped * instead of stepping down. */ if (is_last_spte(iter.old_spte, iter.level)) continue; /* * If iter.gfn resides outside of the slot, i.e. the page for * the current level overlaps but is not contained by the slot, * then the SPTE can't be made huge. More importantly, trying * to query that info from slot->arch.lpage_info will cause an * out-of-bounds access. 
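 * For example, a memslot that starts partway into a 1G region is covered
 * by a 1G-level entry whose gfn is aligned below slot->base_gfn, so that
 * entry's gfn falls outside [start, end) even though its range overlaps
 * the slot.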
*/ if (iter.gfn < start || iter.gfn >= end) continue; max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot, iter.gfn); if (max_mapping_level < iter.level) continue; r = tdp_mmu_make_huge_spte(kvm, &iter, &huge_spte); if (r == -EAGAIN) goto retry; else if (r) continue; if (tdp_mmu_set_spte_atomic(kvm, &iter, huge_spte)) goto retry; flush = true; } if (flush) kvm_flush_remote_tlbs_memslot(kvm, slot); rcu_read_unlock(); } /* * Recover huge page mappings within the slot by replacing non-leaf SPTEs with * huge SPTEs where possible. */ void kvm_tdp_mmu_recover_huge_pages(struct kvm *kvm, const struct kvm_memory_slot *slot) { struct kvm_mmu_page *root; lockdep_assert_held_read(&kvm->mmu_lock); for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id) recover_huge_pages_range(kvm, root, slot); } /* * Removes write access on the last level SPTE mapping this GFN and unsets the * MMU-writable bit to ensure future writes continue to be intercepted. * Returns true if an SPTE was set and a TLB flush is needed. */ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root, gfn_t gfn, int min_level) { struct tdp_iter iter; u64 new_spte; bool spte_set = false; BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL); rcu_read_lock(); for_each_tdp_pte_min_level(iter, kvm, root, min_level, gfn, gfn + 1) { if (!is_shadow_present_pte(iter.old_spte) || !is_last_spte(iter.old_spte, iter.level)) continue; new_spte = iter.old_spte & ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask); if (new_spte == iter.old_spte) break; tdp_mmu_iter_set_spte(kvm, &iter, new_spte); spte_set = true; } rcu_read_unlock(); return spte_set; } /* * Removes write access on the last level SPTE mapping this GFN and unsets the * MMU-writable bit to ensure future writes continue to be intercepted. * Returns true if an SPTE was set and a TLB flush is needed. */ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, int min_level) { struct kvm_mmu_page *root; bool spte_set = false; lockdep_assert_held_write(&kvm->mmu_lock); for_each_valid_tdp_mmu_root(kvm, root, slot->as_id) spte_set |= write_protect_gfn(kvm, root, gfn, min_level); return spte_set; } /* * Return the level of the lowest level SPTE added to sptes. * That SPTE may be non-present. * * Must be called between kvm_tdp_mmu_walk_lockless_{begin,end}. */ static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, struct kvm_mmu_page *root) { struct tdp_iter iter; gfn_t gfn = addr >> PAGE_SHIFT; int leaf = -1; for_each_tdp_pte(iter, vcpu->kvm, root, gfn, gfn + 1) { leaf = iter.level; sptes[leaf] = iter.old_spte; } return leaf; } int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level) { struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa); *root_level = vcpu->arch.mmu->root_role.level; return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, root); } bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa) { struct kvm *kvm = vcpu->kvm; bool is_direct = kvm_is_addr_direct(kvm, gpa); hpa_t root = is_direct ? 
vcpu->arch.mmu->root.hpa : vcpu->arch.mmu->mirror_root_hpa; u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte; int leaf; lockdep_assert_held(&kvm->mmu_lock); rcu_read_lock(); leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root)); rcu_read_unlock(); if (leaf < 0) return false; spte = sptes[leaf]; return is_shadow_present_pte(spte) && is_last_spte(spte, leaf); } EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped); /* * Returns the last level spte pointer of the shadow page walk for the given * gpa, and sets *spte to the spte value. This spte may be non-present. If no * walk could be performed, returns NULL and *spte does not contain valid data. * * Contract: * - Must be called between kvm_tdp_mmu_walk_lockless_{begin,end}. * - The returned sptep must not be used after kvm_tdp_mmu_walk_lockless_end. * * WARNING: This function is only intended to be called during fast_page_fault. */ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gfn_t gfn, u64 *spte) { /* Fast pf is not supported for mirrored roots */ struct kvm_mmu_page *root = tdp_mmu_get_root(vcpu, KVM_DIRECT_ROOTS); struct tdp_iter iter; tdp_ptep_t sptep = NULL; for_each_tdp_pte(iter, vcpu->kvm, root, gfn, gfn + 1) { *spte = iter.old_spte; sptep = iter.sptep; } /* * Perform the rcu_dereference to get the raw spte pointer value since * we are passing it up to fast_page_fault, which is shared with the * legacy MMU and thus does not retain the TDP MMU-specific __rcu * annotation. * * This is safe since fast_page_fault obeys the contracts of this * function as well as all TDP MMU contracts around modifying SPTEs * outside of mmu_lock. */ return rcu_dereference(sptep); }
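The gfn/mask arithmetic used by clear_dirty_pt_masked() above treats bit i of @mask as gfn + i and stops as soon as every set bit has been consumed. A minimal user-space sketch of the same walk (illustrative only: walk_dirty_mask() and the sample values are made up, and __builtin_ctzll() stands in for the kernel's __ffs()):

#include <stdint.h>
#include <stdio.h>

/* Bit i of @mask stands for gfn_base + i, mirroring how
 * clear_dirty_pt_masked() consumes the mask handed in by the dirty-log code. */
static void walk_dirty_mask(uint64_t gfn_base, uint64_t mask)
{
	while (mask) {
		unsigned int bit = __builtin_ctzll(mask);	/* ~ __ffs(mask) */
		uint64_t gfn = gfn_base + bit;

		mask &= ~(1ULL << bit);	/* same as mask &= ~(1UL << (iter.gfn - gfn)) */
		printf("would clear the D/W bit for gfn 0x%llx\n",
		       (unsigned long long)gfn);
	}
}

int main(void)
{
	/* Bits 0, 3 and 63 set: gfns 0x1000, 0x1003 and 0x103f are dirty. */
	walk_dirty_mask(0x1000, (1ULL << 0) | (1ULL << 3) | (1ULL << 63));
	return 0;
}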
| 1969 1927 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 | // SPDX-License-Identifier: GPL-2.0 /* * All the USB notify logic * * (C) Copyright 2005 Greg Kroah-Hartman <gregkh@suse.de> * * notifier functions originally based on those in kernel/sys.c * but fixed up to not be so broken. * * Released under the GPLv2 only. */ #include <linux/kernel.h> #include <linux/export.h> #include <linux/notifier.h> #include <linux/usb.h> #include <linux/mutex.h> #include "usb.h" static BLOCKING_NOTIFIER_HEAD(usb_notifier_list); /** * usb_register_notify - register a notifier callback whenever a usb change happens * @nb: pointer to the notifier block for the callback events. * * These changes are either USB devices or busses being added or removed. */ void usb_register_notify(struct notifier_block *nb) { blocking_notifier_chain_register(&usb_notifier_list, nb); } EXPORT_SYMBOL_GPL(usb_register_notify); /** * usb_unregister_notify - unregister a notifier callback * @nb: pointer to the notifier block for the callback events. * * usb_register_notify() must have been previously called for this function * to work properly. */ void usb_unregister_notify(struct notifier_block *nb) { blocking_notifier_chain_unregister(&usb_notifier_list, nb); } EXPORT_SYMBOL_GPL(usb_unregister_notify); void usb_notify_add_device(struct usb_device *udev) { blocking_notifier_call_chain(&usb_notifier_list, USB_DEVICE_ADD, udev); } void usb_notify_remove_device(struct usb_device *udev) { blocking_notifier_call_chain(&usb_notifier_list, USB_DEVICE_REMOVE, udev); } void usb_notify_add_bus(struct usb_bus *ubus) { blocking_notifier_call_chain(&usb_notifier_list, USB_BUS_ADD, ubus); } void usb_notify_remove_bus(struct usb_bus *ubus) { blocking_notifier_call_chain(&usb_notifier_list, USB_BUS_REMOVE, ubus); } |
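A client of this chain only needs a struct notifier_block whose callback switches on the USB_DEVICE_*/USB_BUS_* actions; @data is the struct usb_device or struct usb_bus being added or removed. A minimal, hypothetical sketch (the module and function names are invented for illustration):

#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/usb.h>

static int example_usb_notify(struct notifier_block *nb, unsigned long action,
			      void *data)
{
	struct usb_device *udev = data;

	switch (action) {
	case USB_DEVICE_ADD:
		dev_info(&udev->dev, "USB device added\n");
		break;
	case USB_DEVICE_REMOVE:
		dev_info(&udev->dev, "USB device removed\n");
		break;
	default:
		/* USB_BUS_ADD/REMOVE carry a struct usb_bus instead. */
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block example_usb_nb = {
	.notifier_call = example_usb_notify,
};

static int __init example_init(void)
{
	usb_register_notify(&example_usb_nb);
	return 0;
}

static void __exit example_exit(void)
{
	usb_unregister_notify(&example_usb_nb);
}

module_init(example_init);
module_exit(example_exit);
MODULE_DESCRIPTION("Example consumer of the USB notifier chain");
MODULE_LICENSE("GPL");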
| 28 15 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 | // SPDX-License-Identifier: GPL-2.0 /* * The Virtual DTV test driver serves as a reference DVB driver and helps * validate the existing APIs in the media subsystem. It can also aid * developers working on userspace applications. * * When this module is loaded, it will attempt to modprobe 'dvb_vidtv_tuner' * and 'dvb_vidtv_demod'. * * Copyright (C) 2020 Daniel W. S. 
Almeida */ #include <linux/dev_printk.h> #include <linux/moduleparam.h> #include <linux/mutex.h> #include <linux/platform_device.h> #include <linux/time.h> #include <linux/types.h> #include <linux/workqueue.h> #include <media/dvbdev.h> #include <media/media-device.h> #include "vidtv_bridge.h" #include "vidtv_common.h" #include "vidtv_demod.h" #include "vidtv_mux.h" #include "vidtv_ts.h" #include "vidtv_tuner.h" #define MUX_BUF_MIN_SZ 90164 #define MUX_BUF_MAX_SZ (MUX_BUF_MIN_SZ * 10) #define TUNER_DEFAULT_ADDR 0x68 #define DEMOD_DEFAULT_ADDR 0x60 #define VIDTV_DEFAULT_NETWORK_ID 0xff44 #define VIDTV_DEFAULT_NETWORK_NAME "LinuxTV.org" #define VIDTV_DEFAULT_TS_ID 0x4081 /* * The LNBf fake parameters here are the ranges used by an * Universal (extended) European LNBf, which is likely the most common LNBf * found on Satellite digital TV system nowadays. */ #define LNB_CUT_FREQUENCY 11700000 /* high IF frequency */ #define LNB_LOW_FREQ 9750000 /* low IF frequency */ #define LNB_HIGH_FREQ 10600000 /* transition frequency */ static unsigned int drop_tslock_prob_on_low_snr; module_param(drop_tslock_prob_on_low_snr, uint, 0444); MODULE_PARM_DESC(drop_tslock_prob_on_low_snr, "Probability of losing the TS lock if the signal quality is bad"); static unsigned int recover_tslock_prob_on_good_snr; module_param(recover_tslock_prob_on_good_snr, uint, 0444); MODULE_PARM_DESC(recover_tslock_prob_on_good_snr, "Probability recovering the TS lock when the signal improves"); static unsigned int mock_power_up_delay_msec; module_param(mock_power_up_delay_msec, uint, 0444); MODULE_PARM_DESC(mock_power_up_delay_msec, "Simulate a power up delay"); static unsigned int mock_tune_delay_msec; module_param(mock_tune_delay_msec, uint, 0444); MODULE_PARM_DESC(mock_tune_delay_msec, "Simulate a tune delay"); static unsigned int vidtv_valid_dvb_t_freqs[NUM_VALID_TUNER_FREQS] = { 474000000 }; module_param_array(vidtv_valid_dvb_t_freqs, uint, NULL, 0444); MODULE_PARM_DESC(vidtv_valid_dvb_t_freqs, "Valid DVB-T frequencies to simulate, in Hz"); static unsigned int vidtv_valid_dvb_c_freqs[NUM_VALID_TUNER_FREQS] = { 474000000 }; module_param_array(vidtv_valid_dvb_c_freqs, uint, NULL, 0444); MODULE_PARM_DESC(vidtv_valid_dvb_c_freqs, "Valid DVB-C frequencies to simulate, in Hz"); static unsigned int vidtv_valid_dvb_s_freqs[NUM_VALID_TUNER_FREQS] = { 11362000 }; module_param_array(vidtv_valid_dvb_s_freqs, uint, NULL, 0444); MODULE_PARM_DESC(vidtv_valid_dvb_s_freqs, "Valid DVB-S/S2 frequencies to simulate at Ku-Band, in kHz"); static unsigned int max_frequency_shift_hz; module_param(max_frequency_shift_hz, uint, 0444); MODULE_PARM_DESC(max_frequency_shift_hz, "Maximum shift in HZ allowed when tuning in a channel"); DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nums); /* * Influences the signal acquisition time. See ISO/IEC 13818-1 : 2000. p. 113. */ static unsigned int si_period_msec = 40; module_param(si_period_msec, uint, 0444); MODULE_PARM_DESC(si_period_msec, "How often to send SI packets. Default: 40ms"); static unsigned int pcr_period_msec = 40; module_param(pcr_period_msec, uint, 0444); MODULE_PARM_DESC(pcr_period_msec, "How often to send PCR packets. 
Default: 40ms"); static unsigned int mux_rate_kbytes_sec = 4096; module_param(mux_rate_kbytes_sec, uint, 0444); MODULE_PARM_DESC(mux_rate_kbytes_sec, "Mux rate: will pad stream if below"); static unsigned int pcr_pid = 0x200; module_param(pcr_pid, uint, 0444); MODULE_PARM_DESC(pcr_pid, "PCR PID for all channels: defaults to 0x200"); static unsigned int mux_buf_sz_pkts; module_param(mux_buf_sz_pkts, uint, 0444); MODULE_PARM_DESC(mux_buf_sz_pkts, "Size for the internal mux buffer in multiples of 188 bytes"); static u32 vidtv_bridge_mux_buf_sz_for_mux_rate(void) { u32 max_elapsed_time_msecs = VIDTV_MAX_SLEEP_USECS / USEC_PER_MSEC; u32 mux_buf_sz = mux_buf_sz_pkts * TS_PACKET_LEN; u32 nbytes_expected; nbytes_expected = mux_rate_kbytes_sec; nbytes_expected *= max_elapsed_time_msecs; mux_buf_sz = roundup(nbytes_expected, TS_PACKET_LEN); mux_buf_sz += mux_buf_sz / 10; if (mux_buf_sz < MUX_BUF_MIN_SZ) mux_buf_sz = MUX_BUF_MIN_SZ; if (mux_buf_sz > MUX_BUF_MAX_SZ) mux_buf_sz = MUX_BUF_MAX_SZ; return mux_buf_sz; } static bool vidtv_bridge_check_demod_lock(struct vidtv_dvb *dvb, u32 n) { enum fe_status status; dvb->fe[n]->ops.read_status(dvb->fe[n], &status); return status == (FE_HAS_SIGNAL | FE_HAS_CARRIER | FE_HAS_VITERBI | FE_HAS_SYNC | FE_HAS_LOCK); } /* * called on a separate thread by the mux when new packets become available */ static void vidtv_bridge_on_new_pkts_avail(void *priv, u8 *buf, u32 npkts) { struct vidtv_dvb *dvb = priv; /* drop packets if we lose the lock */ if (vidtv_bridge_check_demod_lock(dvb, 0)) dvb_dmx_swfilter_packets(&dvb->demux, buf, npkts); } static int vidtv_start_streaming(struct vidtv_dvb *dvb) { struct vidtv_mux_init_args mux_args = { .mux_rate_kbytes_sec = mux_rate_kbytes_sec, .on_new_packets_available_cb = vidtv_bridge_on_new_pkts_avail, .pcr_period_usecs = pcr_period_msec * USEC_PER_MSEC, .si_period_usecs = si_period_msec * USEC_PER_MSEC, .pcr_pid = pcr_pid, .transport_stream_id = VIDTV_DEFAULT_TS_ID, .network_id = VIDTV_DEFAULT_NETWORK_ID, .network_name = VIDTV_DEFAULT_NETWORK_NAME, .priv = dvb, }; struct device *dev = &dvb->pdev->dev; u32 mux_buf_sz; if (dvb->streaming) { dev_warn_ratelimited(dev, "Already streaming. Skipping.\n"); return 0; } if (mux_buf_sz_pkts) mux_buf_sz = mux_buf_sz_pkts; else mux_buf_sz = vidtv_bridge_mux_buf_sz_for_mux_rate(); mux_args.mux_buf_sz = mux_buf_sz; dvb->mux = vidtv_mux_init(dvb->fe[0], dev, &mux_args); if (!dvb->mux) return -ENOMEM; dvb->streaming = true; vidtv_mux_start_thread(dvb->mux); dev_dbg_ratelimited(dev, "Started streaming\n"); return 0; } static int vidtv_stop_streaming(struct vidtv_dvb *dvb) { struct device *dev = &dvb->pdev->dev; if (!dvb->streaming) { dev_warn_ratelimited(dev, "No streaming. 
Skipping.\n"); return 0; } dvb->streaming = false; vidtv_mux_stop_thread(dvb->mux); vidtv_mux_destroy(dvb->mux); dvb->mux = NULL; dev_dbg_ratelimited(dev, "Stopped streaming\n"); return 0; } static int vidtv_start_feed(struct dvb_demux_feed *feed) { struct dvb_demux *demux = feed->demux; struct vidtv_dvb *dvb = demux->priv; int ret; int rc; if (!demux->dmx.frontend) return -EINVAL; mutex_lock(&dvb->feed_lock); dvb->nfeeds++; rc = dvb->nfeeds; if (dvb->nfeeds == 1) { ret = vidtv_start_streaming(dvb); if (ret < 0) rc = ret; } mutex_unlock(&dvb->feed_lock); return rc; } static int vidtv_stop_feed(struct dvb_demux_feed *feed) { struct dvb_demux *demux = feed->demux; struct vidtv_dvb *dvb = demux->priv; int err = 0; mutex_lock(&dvb->feed_lock); dvb->nfeeds--; if (!dvb->nfeeds) err = vidtv_stop_streaming(dvb); mutex_unlock(&dvb->feed_lock); return err; } static struct dvb_frontend *vidtv_get_frontend_ptr(struct i2c_client *c) { struct vidtv_demod_state *state = i2c_get_clientdata(c); /* the demod will set this when its probe function runs */ return &state->frontend; } static int vidtv_master_xfer(struct i2c_adapter *i2c_adap, struct i2c_msg msgs[], int num) { /* * Right now, this virtual driver doesn't really send or receive * messages from I2C. A real driver will require an implementation * here. */ return 0; } static u32 vidtv_i2c_func(struct i2c_adapter *adapter) { return I2C_FUNC_I2C; } static const struct i2c_algorithm vidtv_i2c_algorithm = { .master_xfer = vidtv_master_xfer, .functionality = vidtv_i2c_func, }; static int vidtv_bridge_i2c_register_adap(struct vidtv_dvb *dvb) { struct i2c_adapter *i2c_adapter = &dvb->i2c_adapter; strscpy(i2c_adapter->name, "vidtv_i2c", sizeof(i2c_adapter->name)); i2c_adapter->owner = THIS_MODULE; i2c_adapter->algo = &vidtv_i2c_algorithm; i2c_adapter->algo_data = NULL; i2c_adapter->timeout = 500; i2c_adapter->retries = 3; i2c_adapter->dev.parent = &dvb->pdev->dev; i2c_set_adapdata(i2c_adapter, dvb); return i2c_add_adapter(&dvb->i2c_adapter); } static int vidtv_bridge_register_adap(struct vidtv_dvb *dvb) { int ret = 0; ret = dvb_register_adapter(&dvb->adapter, KBUILD_MODNAME, THIS_MODULE, &dvb->i2c_adapter.dev, adapter_nums); return ret; } static int vidtv_bridge_dmx_init(struct vidtv_dvb *dvb) { dvb->demux.dmx.capabilities = DMX_TS_FILTERING | DMX_SECTION_FILTERING; dvb->demux.priv = dvb; dvb->demux.filternum = 256; dvb->demux.feednum = 256; dvb->demux.start_feed = vidtv_start_feed; dvb->demux.stop_feed = vidtv_stop_feed; return dvb_dmx_init(&dvb->demux); } static int vidtv_bridge_dmxdev_init(struct vidtv_dvb *dvb) { dvb->dmx_dev.filternum = 256; dvb->dmx_dev.demux = &dvb->demux.dmx; dvb->dmx_dev.capabilities = 0; return dvb_dmxdev_init(&dvb->dmx_dev, &dvb->adapter); } static int vidtv_bridge_probe_demod(struct vidtv_dvb *dvb, u32 n) { struct vidtv_demod_config cfg = { .drop_tslock_prob_on_low_snr = drop_tslock_prob_on_low_snr, .recover_tslock_prob_on_good_snr = recover_tslock_prob_on_good_snr, }; dvb->i2c_client_demod[n] = dvb_module_probe("dvb_vidtv_demod", NULL, &dvb->i2c_adapter, DEMOD_DEFAULT_ADDR, &cfg); /* driver will not work anyways so bail out */ if (!dvb->i2c_client_demod[n]) return -ENODEV; /* retrieve a ptr to the frontend state */ dvb->fe[n] = vidtv_get_frontend_ptr(dvb->i2c_client_demod[n]); return 0; } static int vidtv_bridge_probe_tuner(struct vidtv_dvb *dvb, u32 n) { struct vidtv_tuner_config cfg = { .fe = dvb->fe[n], .mock_power_up_delay_msec = mock_power_up_delay_msec, .mock_tune_delay_msec = mock_tune_delay_msec, }; u32 freq; int i; /* 
TODO: check if the frequencies are at a valid range */ memcpy(cfg.vidtv_valid_dvb_t_freqs, vidtv_valid_dvb_t_freqs, sizeof(vidtv_valid_dvb_t_freqs)); memcpy(cfg.vidtv_valid_dvb_c_freqs, vidtv_valid_dvb_c_freqs, sizeof(vidtv_valid_dvb_c_freqs)); /* * Convert Satellite frequencies from Ku-band in kHZ into S-band * frequencies in Hz. */ for (i = 0; i < ARRAY_SIZE(vidtv_valid_dvb_s_freqs); i++) { freq = vidtv_valid_dvb_s_freqs[i]; if (freq) { if (freq < LNB_CUT_FREQUENCY) freq = abs(freq - LNB_LOW_FREQ); else freq = abs(freq - LNB_HIGH_FREQ); } cfg.vidtv_valid_dvb_s_freqs[i] = freq; } cfg.max_frequency_shift_hz = max_frequency_shift_hz; dvb->i2c_client_tuner[n] = dvb_module_probe("dvb_vidtv_tuner", NULL, &dvb->i2c_adapter, TUNER_DEFAULT_ADDR, &cfg); return (dvb->i2c_client_tuner[n]) ? 0 : -ENODEV; } static int vidtv_bridge_dvb_init(struct vidtv_dvb *dvb) { int ret, i, j; ret = vidtv_bridge_i2c_register_adap(dvb); if (ret < 0) goto fail_i2c; ret = vidtv_bridge_register_adap(dvb); if (ret < 0) goto fail_adapter; dvb_register_media_controller(&dvb->adapter, &dvb->mdev); for (i = 0; i < NUM_FE; ++i) { ret = vidtv_bridge_probe_demod(dvb, i); if (ret < 0) goto fail_demod_probe; ret = vidtv_bridge_probe_tuner(dvb, i); if (ret < 0) goto fail_tuner_probe; ret = dvb_register_frontend(&dvb->adapter, dvb->fe[i]); if (ret < 0) goto fail_fe; } ret = vidtv_bridge_dmx_init(dvb); if (ret < 0) goto fail_dmx; ret = vidtv_bridge_dmxdev_init(dvb); if (ret < 0) goto fail_dmx_dev; for (j = 0; j < NUM_FE; ++j) { ret = dvb->demux.dmx.connect_frontend(&dvb->demux.dmx, &dvb->dmx_fe[j]); if (ret < 0) goto fail_dmx_conn; /* * The source of the demux is a frontend connected * to the demux. */ dvb->dmx_fe[j].source = DMX_FRONTEND_0; } return ret; fail_dmx_conn: for (j = j - 1; j >= 0; --j) dvb->demux.dmx.remove_frontend(&dvb->demux.dmx, &dvb->dmx_fe[j]); dvb_dmxdev_release(&dvb->dmx_dev); fail_dmx_dev: dvb_dmx_release(&dvb->demux); fail_dmx: fail_demod_probe: for (i = i - 1; i >= 0; --i) { dvb_unregister_frontend(dvb->fe[i]); fail_fe: dvb_module_release(dvb->i2c_client_tuner[i]); fail_tuner_probe: dvb_module_release(dvb->i2c_client_demod[i]); } fail_adapter: dvb_unregister_adapter(&dvb->adapter); fail_i2c: i2c_del_adapter(&dvb->i2c_adapter); return ret; } static int vidtv_bridge_probe(struct platform_device *pdev) { struct vidtv_dvb *dvb; int ret; dvb = kzalloc(sizeof(*dvb), GFP_KERNEL); if (!dvb) return -ENOMEM; dvb->pdev = pdev; #ifdef CONFIG_MEDIA_CONTROLLER_DVB dvb->mdev.dev = &pdev->dev; strscpy(dvb->mdev.model, "vidtv", sizeof(dvb->mdev.model)); strscpy(dvb->mdev.bus_info, "platform:vidtv", sizeof(dvb->mdev.bus_info)); media_device_init(&dvb->mdev); #endif ret = vidtv_bridge_dvb_init(dvb); if (ret < 0) goto err_dvb; mutex_init(&dvb->feed_lock); platform_set_drvdata(pdev, dvb); #ifdef CONFIG_MEDIA_CONTROLLER_DVB ret = media_device_register(&dvb->mdev); if (ret) { dev_err(dvb->mdev.dev, "media device register failed (err=%d)\n", ret); goto err_media_device_register; } #endif /* CONFIG_MEDIA_CONTROLLER_DVB */ dev_info(&pdev->dev, "Successfully initialized vidtv!\n"); return ret; #ifdef CONFIG_MEDIA_CONTROLLER_DVB err_media_device_register: media_device_cleanup(&dvb->mdev); #endif /* CONFIG_MEDIA_CONTROLLER_DVB */ err_dvb: kfree(dvb); return ret; } static void vidtv_bridge_remove(struct platform_device *pdev) { struct vidtv_dvb *dvb; u32 i; dvb = platform_get_drvdata(pdev); #ifdef CONFIG_MEDIA_CONTROLLER_DVB media_device_unregister(&dvb->mdev); media_device_cleanup(&dvb->mdev); #endif /* CONFIG_MEDIA_CONTROLLER_DVB */ 
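/* * Undo vidtv_bridge_dvb_init(): unregister the frontends and release the * tuner/demod modules, then the demux, the demux device and the DVB adapter. */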
mutex_destroy(&dvb->feed_lock); for (i = 0; i < NUM_FE; ++i) { dvb_unregister_frontend(dvb->fe[i]); dvb_module_release(dvb->i2c_client_tuner[i]); dvb_module_release(dvb->i2c_client_demod[i]); } dvb_dmxdev_release(&dvb->dmx_dev); dvb_dmx_release(&dvb->demux); dvb_unregister_adapter(&dvb->adapter); dev_info(&pdev->dev, "Successfully removed vidtv\n"); } static void vidtv_bridge_dev_release(struct device *dev) { struct vidtv_dvb *dvb; dvb = dev_get_drvdata(dev); kfree(dvb); } static struct platform_device vidtv_bridge_dev = { .name = VIDTV_PDEV_NAME, .dev.release = vidtv_bridge_dev_release, }; static struct platform_driver vidtv_bridge_driver = { .driver = { .name = VIDTV_PDEV_NAME, }, .probe = vidtv_bridge_probe, .remove = vidtv_bridge_remove, }; static void __exit vidtv_bridge_exit(void) { platform_driver_unregister(&vidtv_bridge_driver); platform_device_unregister(&vidtv_bridge_dev); } static int __init vidtv_bridge_init(void) { int ret; ret = platform_device_register(&vidtv_bridge_dev); if (ret) return ret; ret = platform_driver_register(&vidtv_bridge_driver); if (ret) platform_device_unregister(&vidtv_bridge_dev); return ret; } module_init(vidtv_bridge_init); module_exit(vidtv_bridge_exit); MODULE_DESCRIPTION("Virtual Digital TV Test Driver"); MODULE_AUTHOR("Daniel W. S. Almeida"); MODULE_LICENSE("GPL"); MODULE_ALIAS("vidtv"); MODULE_ALIAS("dvb_vidtv"); |
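The Ku-band-to-IF conversion applied to the DVB-S frequencies in vidtv_bridge_probe_tuner() follows the Universal LNBf model: frequencies below LNB_CUT_FREQUENCY are mixed against the 9.75 GHz oscillator, higher ones against 10.6 GHz. A small stand-alone sketch of the same arithmetic (user-space C, values in kHz as in the module parameters; ku_band_to_if() is a made-up name):

#include <stdio.h>
#include <stdlib.h>

#define LNB_CUT_FREQUENCY	11700000	/* kHz, low/high band transition */
#define LNB_LOW_FREQ		9750000		/* kHz, low band oscillator */
#define LNB_HIGH_FREQ		10600000	/* kHz, high band oscillator */

/* Mirrors the conversion loop in vidtv_bridge_probe_tuner(). */
static unsigned int ku_band_to_if(unsigned int freq)
{
	long lo;

	if (!freq)
		return 0;
	lo = (freq < LNB_CUT_FREQUENCY) ? LNB_LOW_FREQ : LNB_HIGH_FREQ;
	return (unsigned int)labs((long)freq - lo);
}

int main(void)
{
	/* Default vidtv_valid_dvb_s_freqs entry: 11362000 kHz -> 1612000 kHz IF. */
	printf("IF = %u kHz\n", ku_band_to_if(11362000));
	return 0;
}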
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system.
INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * Definitions for the TCP module. * * Version: @(#)tcp.h 1.0.5 05/23/93 * * Authors: Ross Biro * Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG> */ #ifndef _TCP_H #define _TCP_H #define FASTRETRANS_DEBUG 1 #include <linux/list.h> #include <linux/tcp.h> #include <linux/bug.h> #include <linux/slab.h> #include <linux/cache.h> #include <linux/percpu.h> #include <linux/skbuff.h> #include <linux/kref.h> #include <linux/ktime.h> #include <linux/indirect_call_wrapper.h> #include <linux/bits.h> #include <net/inet_connection_sock.h> #include <net/inet_timewait_sock.h> #include <net/inet_hashtables.h> #include <net/checksum.h> #include <net/request_sock.h> #include <net/sock_reuseport.h> #include <net/sock.h> #include <net/snmp.h> #include <net/ip.h> #include <net/tcp_states.h> #include <net/tcp_ao.h> #include <net/inet_ecn.h> #include <net/dst.h> #include <net/mptcp.h> #include <net/xfrm.h> #include <linux/seq_file.h> #include <linux/memcontrol.h> #include <linux/bpf-cgroup.h> #include <linux/siphash.h> extern struct inet_hashinfo tcp_hashinfo; DECLARE_PER_CPU(unsigned int, tcp_orphan_count); int tcp_orphan_count_sum(void); DECLARE_PER_CPU(u32, tcp_tw_isn); void tcp_time_wait(struct sock *sk, int state, int timeo); #define MAX_TCP_HEADER L1_CACHE_ALIGN(128 + MAX_HEADER) #define MAX_TCP_OPTION_SPACE 40 #define TCP_MIN_SND_MSS 48 #define TCP_MIN_GSO_SIZE (TCP_MIN_SND_MSS - MAX_TCP_OPTION_SPACE) /* * Never offer a window over 32767 without using window scaling. Some * poor stacks do signed 16bit maths! */ #define MAX_TCP_WINDOW 32767U /* Minimal accepted MSS. It is (60+60+8) - (20+20). */ #define TCP_MIN_MSS 88U /* The initial MTU to use for probing */ #define TCP_BASE_MSS 1024 /* probing interval, default to 10 minutes as per RFC4821 */ #define TCP_PROBE_INTERVAL 600 /* Specify interval when tcp mtu probing will stop */ #define TCP_PROBE_THRESHOLD 8 /* After receiving this amount of duplicate ACKs fast retransmit starts. */ #define TCP_FASTRETRANS_THRESH 3 /* Maximal number of ACKs sent quickly to accelerate slow-start. */ #define TCP_MAX_QUICKACKS 16U /* Maximal number of window scale according to RFC1323 */ #define TCP_MAX_WSCALE 14U /* urg_data states */ #define TCP_URG_VALID 0x0100 #define TCP_URG_NOTYET 0x0200 #define TCP_URG_READ 0x0400 #define TCP_RETR1 3 /* * This is how many retries it does before it * tries to figure out if the gateway is * down. Minimal RFC value is 3; it corresponds * to ~3sec-8min depending on RTO. */ #define TCP_RETR2 15 /* * This should take at least * 90 minutes to time out. * RFC1122 says that the limit is 100 sec. * 15 is ~13-30min depending on RTO. */ #define TCP_SYN_RETRIES 6 /* This is how many retries are done * when active opening a connection. * RFC1122 says the minimum retry MUST * be at least 180secs. Nevertheless * this value is corresponding to * 63secs of retransmission with the * current initial RTO. */ #define TCP_SYNACK_RETRIES 5 /* This is how may retries are done * when passive opening a connection. * This is corresponding to 31secs of * retransmission with the current * initial RTO. */ #define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT * state, about 60 seconds */ #define TCP_FIN_TIMEOUT TCP_TIMEWAIT_LEN /* BSD style FIN_WAIT2 deadlock breaker. * It used to be 3min, new value is 60sec, * to combine FIN-WAIT-2 timeout with * TIME-WAIT timer. 
*/ #define TCP_FIN_TIMEOUT_MAX (120 * HZ) /* max TCP_LINGER2 value (two minutes) */ #define TCP_DELACK_MAX ((unsigned)(HZ/5)) /* maximal time to delay before sending an ACK */ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX); #if HZ >= 100 #define TCP_DELACK_MIN ((unsigned)(HZ/25)) /* minimal time to delay before sending an ACK */ #define TCP_ATO_MIN ((unsigned)(HZ/25)) #else #define TCP_DELACK_MIN 4U #define TCP_ATO_MIN 4U #endif #define TCP_RTO_MAX_SEC 120 #define TCP_RTO_MAX ((unsigned)(TCP_RTO_MAX_SEC * HZ)) #define TCP_RTO_MIN ((unsigned)(HZ / 5)) #define TCP_TIMEOUT_MIN (2U) /* Min timeout for TCP timers in jiffies */ #define TCP_TIMEOUT_MIN_US (2*USEC_PER_MSEC) /* Min TCP timeout in microsecs */ #define TCP_TIMEOUT_INIT ((unsigned)(1*HZ)) /* RFC6298 2.1 initial RTO value */ #define TCP_TIMEOUT_FALLBACK ((unsigned)(3*HZ)) /* RFC 1122 initial RTO value, now * used as a fallback RTO for the * initial data transmission if no * valid RTT sample has been acquired, * most likely due to retrans in 3WHS. */ #define TCP_RESOURCE_PROBE_INTERVAL ((unsigned)(HZ/2U)) /* Maximal interval between probes * for local resources. */ #define TCP_KEEPALIVE_TIME (120*60*HZ) /* two hours */ #define TCP_KEEPALIVE_PROBES 9 /* Max of 9 keepalive probes */ #define TCP_KEEPALIVE_INTVL (75*HZ) #define MAX_TCP_KEEPIDLE 32767 #define MAX_TCP_KEEPINTVL 32767 #define MAX_TCP_KEEPCNT 127 #define MAX_TCP_SYNCNT 127 /* Ensure that TCP PAWS checks are relaxed after ~2147 seconds * to avoid overflows. This assumes a clock smaller than 1 Mhz. * Default clock is 1 Khz, tcp_usec_ts uses 1 Mhz. */ #define TCP_PAWS_WRAP (INT_MAX / USEC_PER_SEC) #define TCP_PAWS_MSL 60 /* Per-host timestamps are invalidated * after this time. It should be equal * (or greater than) TCP_TIMEWAIT_LEN * to provide reliability equal to one * provided by timewait state. */ #define TCP_PAWS_WINDOW 1 /* Replay window for per-host * timestamps. It must be less than * minimal timewait lifetime. */ /* * TCP option */ #define TCPOPT_NOP 1 /* Padding */ #define TCPOPT_EOL 0 /* End of options */ #define TCPOPT_MSS 2 /* Segment size negotiating */ #define TCPOPT_WINDOW 3 /* Window scaling */ #define TCPOPT_SACK_PERM 4 /* SACK Permitted */ #define TCPOPT_SACK 5 /* SACK Block */ #define TCPOPT_TIMESTAMP 8 /* Better RTT estimations/PAWS */ #define TCPOPT_MD5SIG 19 /* MD5 Signature (RFC2385) */ #define TCPOPT_AO 29 /* Authentication Option (RFC5925) */ #define TCPOPT_MPTCP 30 /* Multipath TCP (RFC6824) */ #define TCPOPT_FASTOPEN 34 /* Fast open (RFC7413) */ #define TCPOPT_EXP 254 /* Experimental */ /* Magic number to be after the option value for sharing TCP * experimental options. See draft-ietf-tcpm-experimental-options-00.txt */ #define TCPOPT_FASTOPEN_MAGIC 0xF989 #define TCPOPT_SMC_MAGIC 0xE2D4C3D9 /* * TCP option lengths */ #define TCPOLEN_MSS 4 #define TCPOLEN_WINDOW 3 #define TCPOLEN_SACK_PERM 2 #define TCPOLEN_TIMESTAMP 10 #define TCPOLEN_MD5SIG 18 #define TCPOLEN_FASTOPEN_BASE 2 #define TCPOLEN_EXP_FASTOPEN_BASE 4 #define TCPOLEN_EXP_SMC_BASE 6 /* But this is what stacks really send out. 
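 * The *_ALIGNED lengths below include the NOP padding used to keep options
 * on 32-bit boundaries, e.g. the 10-byte timestamp option goes out as
 * NOP, NOP, timestamp, hence TCPOLEN_TSTAMP_ALIGNED == 12.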
*/ #define TCPOLEN_TSTAMP_ALIGNED 12 #define TCPOLEN_WSCALE_ALIGNED 4 #define TCPOLEN_SACKPERM_ALIGNED 4 #define TCPOLEN_SACK_BASE 2 #define TCPOLEN_SACK_BASE_ALIGNED 4 #define TCPOLEN_SACK_PERBLOCK 8 #define TCPOLEN_MD5SIG_ALIGNED 20 #define TCPOLEN_MSS_ALIGNED 4 #define TCPOLEN_EXP_SMC_BASE_ALIGNED 8 /* Flags in tp->nonagle */ #define TCP_NAGLE_OFF 1 /* Nagle's algo is disabled */ #define TCP_NAGLE_CORK 2 /* Socket is corked */ #define TCP_NAGLE_PUSH 4 /* Cork is overridden for already queued data */ /* TCP thin-stream limits */ #define TCP_THIN_LINEAR_RETRIES 6 /* After 6 linear retries, do exp. backoff */ /* TCP initial congestion window as per rfc6928 */ #define TCP_INIT_CWND 10 /* Bit Flags for sysctl_tcp_fastopen */ #define TFO_CLIENT_ENABLE 1 #define TFO_SERVER_ENABLE 2 #define TFO_CLIENT_NO_COOKIE 4 /* Data in SYN w/o cookie option */ /* Accept SYN data w/o any cookie option */ #define TFO_SERVER_COOKIE_NOT_REQD 0x200 /* Force enable TFO on all listeners, i.e., not requiring the * TCP_FASTOPEN socket option. */ #define TFO_SERVER_WO_SOCKOPT1 0x400 /* sysctl variables for tcp */ extern int sysctl_tcp_max_orphans; extern long sysctl_tcp_mem[3]; #define TCP_RACK_LOSS_DETECTION 0x1 /* Use RACK to detect losses */ #define TCP_RACK_STATIC_REO_WND 0x2 /* Use static RACK reo wnd */ #define TCP_RACK_NO_DUPTHRESH 0x4 /* Do not use DUPACK threshold in RACK */ DECLARE_PER_CPU(int, tcp_memory_per_cpu_fw_alloc); extern struct percpu_counter tcp_sockets_allocated; extern unsigned long tcp_memory_pressure; /* optimized version of sk_under_memory_pressure() for TCP sockets */ static inline bool tcp_under_memory_pressure(const struct sock *sk) { if (mem_cgroup_sockets_enabled && sk->sk_memcg && mem_cgroup_under_socket_pressure(sk->sk_memcg)) return true; return READ_ONCE(tcp_memory_pressure); } /* * The next routines deal with comparing 32 bit unsigned ints * and worry about wraparound (automatic with unsigned arithmetic). */ static inline bool before(__u32 seq1, __u32 seq2) { return (__s32)(seq1-seq2) < 0; } #define after(seq2, seq1) before(seq1, seq2) /* is s2<=s1<=s3 ? 
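 * Both distances are taken modulo 2^32 relative to seq2, so the check
 * survives sequence-number wraparound: e.g. between(5, 0xfffffffe, 10) is
 * true because (u32)(10 - 0xfffffffe) == 12 >= (u32)(5 - 0xfffffffe) == 7.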
*/ static inline bool between(__u32 seq1, __u32 seq2, __u32 seq3) { return seq3 - seq2 >= seq1 - seq2; } static inline void tcp_wmem_free_skb(struct sock *sk, struct sk_buff *skb) { sk_wmem_queued_add(sk, -skb->truesize); if (!skb_zcopy_pure(skb)) sk_mem_uncharge(sk, skb->truesize); else sk_mem_uncharge(sk, SKB_TRUESIZE(skb_end_offset(skb))); __kfree_skb(skb); } void sk_forced_mem_schedule(struct sock *sk, int size); bool tcp_check_oom(const struct sock *sk, int shift); extern struct proto tcp_prot; #define TCP_INC_STATS(net, field) SNMP_INC_STATS((net)->mib.tcp_statistics, field) #define __TCP_INC_STATS(net, field) __SNMP_INC_STATS((net)->mib.tcp_statistics, field) #define TCP_DEC_STATS(net, field) SNMP_DEC_STATS((net)->mib.tcp_statistics, field) #define TCP_ADD_STATS(net, field, val) SNMP_ADD_STATS((net)->mib.tcp_statistics, field, val) void tcp_tsq_work_init(void); int tcp_v4_err(struct sk_buff *skb, u32); void tcp_shutdown(struct sock *sk, int how); int tcp_v4_early_demux(struct sk_buff *skb); int tcp_v4_rcv(struct sk_buff *skb); void tcp_remove_empty_skb(struct sock *sk); int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size); int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size); int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied, size_t size, struct ubuf_info *uarg); void tcp_splice_eof(struct socket *sock); int tcp_send_mss(struct sock *sk, int *size_goal, int flags); int tcp_wmem_schedule(struct sock *sk, int copy); void tcp_push(struct sock *sk, int flags, int mss_now, int nonagle, int size_goal); void tcp_release_cb(struct sock *sk); void tcp_wfree(struct sk_buff *skb); void tcp_write_timer_handler(struct sock *sk); void tcp_delack_timer_handler(struct sock *sk); int tcp_ioctl(struct sock *sk, int cmd, int *karg); enum skb_drop_reason tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb); void tcp_rcv_established(struct sock *sk, struct sk_buff *skb); void tcp_rcv_space_adjust(struct sock *sk); int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp); void tcp_twsk_destructor(struct sock *sk); void tcp_twsk_purge(struct list_head *net_exit_list); ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); struct sk_buff *tcp_stream_alloc_skb(struct sock *sk, gfp_t gfp, bool force_schedule); static inline void tcp_dec_quickack_mode(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); if (icsk->icsk_ack.quick) { /* How many ACKs S/ACKing new data have we sent? */ const unsigned int pkts = inet_csk_ack_scheduled(sk) ? 1 : 0; if (pkts >= icsk->icsk_ack.quick) { icsk->icsk_ack.quick = 0; /* Leaving quickack mode we deflate ATO. 
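 * That is, icsk_ack.ato is reset to TCP_ATO_MIN, the shortest delayed-ACK
 * timeout.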
*/ icsk->icsk_ack.ato = TCP_ATO_MIN; } else icsk->icsk_ack.quick -= pkts; } } #define TCP_ECN_MODE_RFC3168 BIT(0) #define TCP_ECN_QUEUE_CWR BIT(1) #define TCP_ECN_DEMAND_CWR BIT(2) #define TCP_ECN_SEEN BIT(3) #define TCP_ECN_MODE_ACCECN BIT(4) #define TCP_ECN_DISABLED 0 #define TCP_ECN_MODE_PENDING (TCP_ECN_MODE_RFC3168 | TCP_ECN_MODE_ACCECN) #define TCP_ECN_MODE_ANY (TCP_ECN_MODE_RFC3168 | TCP_ECN_MODE_ACCECN) static inline bool tcp_ecn_mode_any(const struct tcp_sock *tp) { return tp->ecn_flags & TCP_ECN_MODE_ANY; } static inline bool tcp_ecn_mode_rfc3168(const struct tcp_sock *tp) { return (tp->ecn_flags & TCP_ECN_MODE_ANY) == TCP_ECN_MODE_RFC3168; } static inline bool tcp_ecn_mode_accecn(const struct tcp_sock *tp) { return (tp->ecn_flags & TCP_ECN_MODE_ANY) == TCP_ECN_MODE_ACCECN; } static inline bool tcp_ecn_disabled(const struct tcp_sock *tp) { return !tcp_ecn_mode_any(tp); } static inline bool tcp_ecn_mode_pending(const struct tcp_sock *tp) { return (tp->ecn_flags & TCP_ECN_MODE_PENDING) == TCP_ECN_MODE_PENDING; } static inline void tcp_ecn_mode_set(struct tcp_sock *tp, u8 mode) { tp->ecn_flags &= ~TCP_ECN_MODE_ANY; tp->ecn_flags |= mode; } enum tcp_tw_status { TCP_TW_SUCCESS = 0, TCP_TW_RST = 1, TCP_TW_ACK = 2, TCP_TW_SYN = 3, TCP_TW_ACK_OOW = 4 }; enum tcp_tw_status tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, const struct tcphdr *th, u32 *tw_isn, enum skb_drop_reason *drop_reason); struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, struct request_sock *req, bool fastopen, bool *lost_race, enum skb_drop_reason *drop_reason); enum skb_drop_reason tcp_child_process(struct sock *parent, struct sock *child, struct sk_buff *skb); void tcp_enter_loss(struct sock *sk); void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked, int newly_lost, int flag); void tcp_clear_retrans(struct tcp_sock *tp); void tcp_update_metrics(struct sock *sk); void tcp_init_metrics(struct sock *sk); void tcp_metrics_init(void); bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst); void __tcp_close(struct sock *sk, long timeout); void tcp_close(struct sock *sk, long timeout); void tcp_init_sock(struct sock *sk); void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb); __poll_t tcp_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait); int do_tcp_getsockopt(struct sock *sk, int level, int optname, sockptr_t optval, sockptr_t optlen); int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); bool tcp_bpf_bypass_getsockopt(int level, int optname); int do_tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); void tcp_reset_keepalive_timer(struct sock *sk, unsigned long timeout); void tcp_set_keepalive(struct sock *sk, int val); void tcp_syn_ack_timeout(const struct request_sock *req); int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags, int *addr_len); int tcp_set_rcvlowat(struct sock *sk, int val); int tcp_set_window_clamp(struct sock *sk, int val); void tcp_update_recv_tstamps(struct sk_buff *skb, struct scm_timestamping_internal *tss); void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk, struct scm_timestamping_internal *tss); void tcp_data_ready(struct sock *sk); #ifdef CONFIG_MMU int tcp_mmap(struct file *file, struct socket *sock, struct vm_area_struct *vma); #endif void 
tcp_parse_options(const struct net *net, const struct sk_buff *skb, struct tcp_options_received *opt_rx, int estab, struct tcp_fastopen_cookie *foc); /* * BPF SKB-less helpers */ u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph, struct tcphdr *th, u32 *cookie); u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph, struct tcphdr *th, u32 *cookie); u16 tcp_parse_mss_option(const struct tcphdr *th, u16 user_mss); u16 tcp_get_syncookie_mss(struct request_sock_ops *rsk_ops, const struct tcp_request_sock_ops *af_ops, struct sock *sk, struct tcphdr *th); /* * TCP v4 functions exported for the inet6 API */ void tcp_v4_send_check(struct sock *sk, struct sk_buff *skb); void tcp_v4_mtu_reduced(struct sock *sk); void tcp_req_err(struct sock *sk, u32 seq, bool abort); void tcp_ld_RTO_revert(struct sock *sk, u32 seq); int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb); struct sock *tcp_create_openreq_child(const struct sock *sk, struct request_sock *req, struct sk_buff *skb); void tcp_ca_openreq_child(struct sock *sk, const struct dst_entry *dst); struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst, struct request_sock *req_unhash, bool *own_req); int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb); int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len); int tcp_connect(struct sock *sk); enum tcp_synack_type { TCP_SYNACK_NORMAL, TCP_SYNACK_FASTOPEN, TCP_SYNACK_COOKIE, }; struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, struct request_sock *req, struct tcp_fastopen_cookie *foc, enum tcp_synack_type synack_type, struct sk_buff *syn_skb); int tcp_disconnect(struct sock *sk, int flags); void tcp_finish_connect(struct sock *sk, struct sk_buff *skb); int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size); void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb); /* From syncookies.c */ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst); int __cookie_v4_check(const struct iphdr *iph, const struct tcphdr *th); struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb); struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk, struct sk_buff *skb, struct tcp_options_received *tcp_opt, int mss, u32 tsoff); #if IS_ENABLED(CONFIG_BPF) struct bpf_tcp_req_attrs { u32 rcv_tsval; u32 rcv_tsecr; u16 mss; u8 rcv_wscale; u8 snd_wscale; u8 ecn_ok; u8 wscale_ok; u8 sack_ok; u8 tstamp_ok; u8 usec_ts_ok; u8 reserved[3]; }; #endif #ifdef CONFIG_SYN_COOKIES /* Syncookies use a monotonic timer which increments every 60 seconds. * This counter is used both as a hash input and partially encoded into * the cookie value. A cookie is only validated further if the delta * between the current counter value and the encoded one is less than this, * i.e. a sent cookie is valid only at most for 2*60 seconds (or less if * the counter advances immediately after a cookie is generated). */ #define MAX_SYNCOOKIE_AGE 2 #define TCP_SYNCOOKIE_PERIOD (60 * HZ) #define TCP_SYNCOOKIE_VALID (MAX_SYNCOOKIE_AGE * TCP_SYNCOOKIE_PERIOD) /* syncookies: remember time of last synqueue overflow * But do not dirty this field too often (once per second is enough) * It is racy as we do not hold a lock, but race is very minor. 
*/ static inline void tcp_synq_overflow(const struct sock *sk) { unsigned int last_overflow; unsigned int now = jiffies; if (sk->sk_reuseport) { struct sock_reuseport *reuse; reuse = rcu_dereference(sk->sk_reuseport_cb); if (likely(reuse)) { last_overflow = READ_ONCE(reuse->synq_overflow_ts); if (!time_between32(now, last_overflow, last_overflow + HZ)) WRITE_ONCE(reuse->synq_overflow_ts, now); return; } } last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp); if (!time_between32(now, last_overflow, last_overflow + HZ)) WRITE_ONCE(tcp_sk_rw(sk)->rx_opt.ts_recent_stamp, now); } /* syncookies: no recent synqueue overflow on this listening socket? */ static inline bool tcp_synq_no_recent_overflow(const struct sock *sk) { unsigned int last_overflow; unsigned int now = jiffies; if (sk->sk_reuseport) { struct sock_reuseport *reuse; reuse = rcu_dereference(sk->sk_reuseport_cb); if (likely(reuse)) { last_overflow = READ_ONCE(reuse->synq_overflow_ts); return !time_between32(now, last_overflow - HZ, last_overflow + TCP_SYNCOOKIE_VALID); } } last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp); /* If last_overflow <= jiffies <= last_overflow + TCP_SYNCOOKIE_VALID, * then we're under synflood. However, we have to use * 'last_overflow - HZ' as lower bound. That's because a concurrent * tcp_synq_overflow() could update .ts_recent_stamp after we read * jiffies but before we store .ts_recent_stamp into last_overflow, * which could lead to rejecting a valid syncookie. */ return !time_between32(now, last_overflow - HZ, last_overflow + TCP_SYNCOOKIE_VALID); } static inline u32 tcp_cookie_time(void) { u64 val = get_jiffies_64(); do_div(val, TCP_SYNCOOKIE_PERIOD); return val; } /* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */ static inline u64 tcp_ns_to_ts(bool usec_ts, u64 val) { if (usec_ts) return div_u64(val, NSEC_PER_USEC); return div_u64(val, NSEC_PER_MSEC); } u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th, u16 *mssp); __u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mss); u64 cookie_init_timestamp(struct request_sock *req, u64 now); bool cookie_timestamp_decode(const struct net *net, struct tcp_options_received *opt); static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *dst) { return READ_ONCE(net->ipv4.sysctl_tcp_ecn) || dst_feature(dst, RTAX_FEATURE_ECN); } #if IS_ENABLED(CONFIG_BPF) static inline bool cookie_bpf_ok(struct sk_buff *skb) { return skb->sk; } struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb); #else static inline bool cookie_bpf_ok(struct sk_buff *skb) { return false; } static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk, struct sk_buff *skb) { return NULL; } #endif /* From net/ipv6/syncookies.c */ int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th); struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb); u32 __cookie_v6_init_sequence(const struct ipv6hdr *iph, const struct tcphdr *th, u16 *mssp); __u32 cookie_v6_init_sequence(const struct sk_buff *skb, __u16 *mss); #endif /* tcp_output.c */ void tcp_skb_entail(struct sock *sk, struct sk_buff *skb); void tcp_mark_push(struct tcp_sock *tp, struct sk_buff *skb); void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss, int nonagle); int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs); int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs); void tcp_retransmit_timer(struct sock *sk); void 
tcp_xmit_retransmit_queue(struct sock *); void tcp_simple_retransmit(struct sock *); void tcp_enter_recovery(struct sock *sk, bool ece_ack); int tcp_trim_head(struct sock *, struct sk_buff *, u32); enum tcp_queue { TCP_FRAG_IN_WRITE_QUEUE, TCP_FRAG_IN_RTX_QUEUE, }; int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue, struct sk_buff *skb, u32 len, unsigned int mss_now, gfp_t gfp); void tcp_send_probe0(struct sock *); int tcp_write_wakeup(struct sock *, int mib); void tcp_send_fin(struct sock *sk); void tcp_send_active_reset(struct sock *sk, gfp_t priority, enum sk_rst_reason reason); int tcp_send_synack(struct sock *); void tcp_push_one(struct sock *, unsigned int mss_now); void __tcp_send_ack(struct sock *sk, u32 rcv_nxt, u16 flags); void tcp_send_ack(struct sock *sk); void tcp_send_delayed_ack(struct sock *sk); void tcp_send_loss_probe(struct sock *sk); bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto); void tcp_skb_collapse_tstamp(struct sk_buff *skb, const struct sk_buff *next_skb); /* tcp_input.c */ void tcp_rearm_rto(struct sock *sk); void tcp_synack_rtt_meas(struct sock *sk, struct request_sock *req); void tcp_done_with_error(struct sock *sk, int err); void tcp_reset(struct sock *sk, struct sk_buff *skb); void tcp_fin(struct sock *sk); void tcp_check_space(struct sock *sk); void tcp_sack_compress_send_ack(struct sock *sk); static inline void tcp_cleanup_skb(struct sk_buff *skb) { skb_dst_drop(skb); secpath_reset(skb); } static inline void tcp_add_receive_queue(struct sock *sk, struct sk_buff *skb) { DEBUG_NET_WARN_ON_ONCE(skb_dst(skb)); DEBUG_NET_WARN_ON_ONCE(secpath_exists(skb)); __skb_queue_tail(&sk->sk_receive_queue, skb); } /* tcp_timer.c */ void tcp_init_xmit_timers(struct sock *); static inline void tcp_clear_xmit_timers(struct sock *sk) { if (hrtimer_try_to_cancel(&tcp_sk(sk)->pacing_timer) == 1) __sock_put(sk); if (hrtimer_try_to_cancel(&tcp_sk(sk)->compressed_ack_timer) == 1) __sock_put(sk); inet_csk_clear_xmit_timers(sk); } unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu); unsigned int tcp_current_mss(struct sock *sk); u32 tcp_clamp_probe0_to_user_timeout(const struct sock *sk, u32 when); /* Bound MSS / TSO packet size with the half of the window */ static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize) { int cutoff; /* When peer uses tiny windows, there is no use in packetizing * to sub-MSS pieces for the sake of SWS or making sure there * are enough packets in the pipe for fast recovery. * * On the other hand, for extremely large MSS devices, handling * smaller than MSS windows in this way does make sense. 
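 *
 * Worked example (illustrative numbers only): with max_window = 65535 a
 * 64000 byte pktsize is bounded to 32767 (half the window) below, while
 * with a tiny max_window of 512 (<= TCP_MSS_DEFAULT) the same pktsize is
 * bounded to the full 512 byte window rather than to half of it.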
*/ if (tp->max_window > TCP_MSS_DEFAULT) cutoff = (tp->max_window >> 1); else cutoff = tp->max_window; if (cutoff && pktsize > cutoff) return max_t(int, cutoff, 68U - tp->tcp_header_len); else return pktsize; } /* tcp.c */ void tcp_get_info(struct sock *, struct tcp_info *); /* Read 'sendfile()'-style from a TCP socket */ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, sk_read_actor_t recv_actor); int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc, sk_read_actor_t recv_actor, bool noack, u32 *copied_seq); int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor); struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off); void tcp_read_done(struct sock *sk, size_t len); void tcp_initialize_rcv_mss(struct sock *sk); int tcp_mtu_to_mss(struct sock *sk, int pmtu); int tcp_mss_to_mtu(struct sock *sk, int mss); void tcp_mtup_init(struct sock *sk); static inline unsigned int tcp_rto_max(const struct sock *sk) { return READ_ONCE(inet_csk(sk)->icsk_rto_max); } static inline void tcp_bound_rto(struct sock *sk) { inet_csk(sk)->icsk_rto = min(inet_csk(sk)->icsk_rto, tcp_rto_max(sk)); } static inline u32 __tcp_set_rto(const struct tcp_sock *tp) { return usecs_to_jiffies((tp->srtt_us >> 3) + tp->rttvar_us); } static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd) { /* mptcp hooks are only on the slow path */ if (sk_is_mptcp((struct sock *)tp)) return; tp->pred_flags = htonl((tp->tcp_header_len << 26) | ntohl(TCP_FLAG_ACK) | snd_wnd); } static inline void tcp_fast_path_on(struct tcp_sock *tp) { __tcp_fast_path_on(tp, tp->snd_wnd >> tp->rx_opt.snd_wscale); } static inline void tcp_fast_path_check(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); if (RB_EMPTY_ROOT(&tp->out_of_order_queue) && tp->rcv_wnd && atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf && !tp->urg_data) tcp_fast_path_on(tp); } u32 tcp_delack_max(const struct sock *sk); /* Compute the actual rto_min value */ static inline u32 tcp_rto_min(const struct sock *sk) { const struct dst_entry *dst = __sk_dst_get(sk); u32 rto_min = READ_ONCE(inet_csk(sk)->icsk_rto_min); if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN); return rto_min; } static inline u32 tcp_rto_min_us(const struct sock *sk) { return jiffies_to_usecs(tcp_rto_min(sk)); } static inline bool tcp_ca_dst_locked(const struct dst_entry *dst) { return dst_metric_locked(dst, RTAX_CC_ALGO); } /* Minimum RTT in usec. ~0 means not available. */ static inline u32 tcp_min_rtt(const struct tcp_sock *tp) { return minmax_get(&tp->rtt_min); } /* Compute the actual receive window we are currently advertising. * Rcv_nxt can be after the window if our peer push more data * than the offered window. */ static inline u32 tcp_receive_window(const struct tcp_sock *tp) { s32 win = tp->rcv_wup + tp->rcv_wnd - tp->rcv_nxt; if (win < 0) win = 0; return (u32) win; } /* Choose a new window, without checks for shrinking, and without * scaling applied to the result. The caller does these things * if necessary. This is a "raw" window selection. */ u32 __tcp_select_window(struct sock *sk); void tcp_send_window_probe(struct sock *sk); /* TCP uses 32bit jiffies to save some space. * Note that this is different from tcp_time_stamp, which * historically has been the same until linux-4.13. */ #define tcp_jiffies32 ((u32)jiffies) /* * Deliver a 32bit value for TCP timestamp option (RFC 7323) * It is no longer tied to jiffies, but to 1 ms clock. * Note: double check if you want to use tcp_jiffies32 instead of this. 
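 *
 * Illustrative sketch (the helpers are defined right below): a TSval is the
 * chosen clock plus the per-connection offset, e.g. for a timewait socket
 *
 *	tsval = tcp_clock_ts(tcptw->tw_sk.tw_usec_ts) + tcptw->tw_ts_offset;
 *
 * which is exactly what tcp_tw_tsval() does.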
*/ #define TCP_TS_HZ 1000 static inline u64 tcp_clock_ns(void) { return ktime_get_ns(); } static inline u64 tcp_clock_us(void) { return div_u64(tcp_clock_ns(), NSEC_PER_USEC); } static inline u64 tcp_clock_ms(void) { return div_u64(tcp_clock_ns(), NSEC_PER_MSEC); } /* TCP Timestamp included in TS option (RFC 1323) can either use ms * or usec resolution. Each socket carries a flag to select one or other * resolution, as the route attribute could change anytime. * Each flow must stick to initial resolution. */ static inline u32 tcp_clock_ts(bool usec_ts) { return usec_ts ? tcp_clock_us() : tcp_clock_ms(); } static inline u32 tcp_time_stamp_ms(const struct tcp_sock *tp) { return div_u64(tp->tcp_mstamp, USEC_PER_MSEC); } static inline u32 tcp_time_stamp_ts(const struct tcp_sock *tp) { if (tp->tcp_usec_ts) return tp->tcp_mstamp; return tcp_time_stamp_ms(tp); } void tcp_mstamp_refresh(struct tcp_sock *tp); static inline u32 tcp_stamp_us_delta(u64 t1, u64 t0) { return max_t(s64, t1 - t0, 0); } /* provide the departure time in us unit */ static inline u64 tcp_skb_timestamp_us(const struct sk_buff *skb) { return div_u64(skb->skb_mstamp_ns, NSEC_PER_USEC); } /* Provide skb TSval in usec or ms unit */ static inline u32 tcp_skb_timestamp_ts(bool usec_ts, const struct sk_buff *skb) { if (usec_ts) return tcp_skb_timestamp_us(skb); return div_u64(skb->skb_mstamp_ns, NSEC_PER_MSEC); } static inline u32 tcp_tw_tsval(const struct tcp_timewait_sock *tcptw) { return tcp_clock_ts(tcptw->tw_sk.tw_usec_ts) + tcptw->tw_ts_offset; } static inline u32 tcp_rsk_tsval(const struct tcp_request_sock *treq) { return tcp_clock_ts(treq->req_usec_ts) + treq->ts_off; } #define tcp_flag_byte(th) (((u_int8_t *)th)[13]) #define TCPHDR_FIN BIT(0) #define TCPHDR_SYN BIT(1) #define TCPHDR_RST BIT(2) #define TCPHDR_PSH BIT(3) #define TCPHDR_ACK BIT(4) #define TCPHDR_URG BIT(5) #define TCPHDR_ECE BIT(6) #define TCPHDR_CWR BIT(7) #define TCPHDR_AE BIT(8) #define TCPHDR_FLAGS_MASK (TCPHDR_FIN | TCPHDR_SYN | TCPHDR_RST | \ TCPHDR_PSH | TCPHDR_ACK | TCPHDR_URG | \ TCPHDR_ECE | TCPHDR_CWR | TCPHDR_AE) #define tcp_flags_ntohs(th) (ntohs(*(__be16 *)&tcp_flag_word(th)) & \ TCPHDR_FLAGS_MASK) #define TCPHDR_ACE (TCPHDR_ECE | TCPHDR_CWR | TCPHDR_AE) #define TCPHDR_SYN_ECN (TCPHDR_SYN | TCPHDR_ECE | TCPHDR_CWR) /* State flags for sacked in struct tcp_skb_cb */ enum tcp_skb_cb_sacked_flags { TCPCB_SACKED_ACKED = (1 << 0), /* SKB ACK'd by a SACK block */ TCPCB_SACKED_RETRANS = (1 << 1), /* SKB retransmitted */ TCPCB_LOST = (1 << 2), /* SKB is lost */ TCPCB_TAGBITS = (TCPCB_SACKED_ACKED | TCPCB_SACKED_RETRANS | TCPCB_LOST), /* All tag bits */ TCPCB_REPAIRED = (1 << 4), /* SKB repaired (no skb_mstamp_ns) */ TCPCB_EVER_RETRANS = (1 << 7), /* Ever retransmitted frame */ TCPCB_RETRANS = (TCPCB_SACKED_RETRANS | TCPCB_EVER_RETRANS | TCPCB_REPAIRED), }; /* This is what the send packet queuing engine uses to pass * TCP per-packet control information to the transmission code. * We also store the host-order sequence numbers in here too. * This is 44 bytes if IPV6 is enabled. * If this grows please adjust skbuff.h:skbuff->cb[xxx] size appropriately. */ struct tcp_skb_cb { __u32 seq; /* Starting sequence number */ __u32 end_seq; /* SEQ + FIN + SYN + datalen */ union { /* Note : * tcp_gso_segs/size are used in write queue only, * cf tcp_skb_pcount()/tcp_skb_mss() */ struct { u16 tcp_gso_segs; u16 tcp_gso_size; }; }; __u16 tcp_flags; /* TCP header flags (tcp[12-13])*/ __u8 sacked; /* State flags for SACK. 
*/ __u8 ip_dsfield; /* IPv4 tos or IPv6 dsfield */ #define TSTAMP_ACK_SK 0x1 #define TSTAMP_ACK_BPF 0x2 __u8 txstamp_ack:2, /* Record TX timestamp for ack? */ eor:1, /* Is skb MSG_EOR marked? */ has_rxtstamp:1, /* SKB has a RX timestamp */ unused:4; __u32 ack_seq; /* Sequence number ACK'd */ union { struct { #define TCPCB_DELIVERED_CE_MASK ((1U<<20) - 1) /* There is space for up to 24 bytes */ __u32 is_app_limited:1, /* cwnd not fully used? */ delivered_ce:20, unused:11; /* pkts S/ACKed so far upon tx of skb, incl retrans: */ __u32 delivered; /* start of send pipeline phase */ u64 first_tx_mstamp; /* when we reached the "delivered" count */ u64 delivered_mstamp; } tx; /* only used for outgoing skbs */ union { struct inet_skb_parm h4; #if IS_ENABLED(CONFIG_IPV6) struct inet6_skb_parm h6; #endif } header; /* For incoming skbs */ }; }; #define TCP_SKB_CB(__skb) ((struct tcp_skb_cb *)&((__skb)->cb[0])) extern const struct inet_connection_sock_af_ops ipv4_specific; #if IS_ENABLED(CONFIG_IPV6) /* This is the variant of inet6_iif() that must be used by TCP, * as TCP moves IP6CB into a different location in skb->cb[] */ static inline int tcp_v6_iif(const struct sk_buff *skb) { return TCP_SKB_CB(skb)->header.h6.iif; } static inline int tcp_v6_iif_l3_slave(const struct sk_buff *skb) { bool l3_slave = ipv6_l3mdev_skb(TCP_SKB_CB(skb)->header.h6.flags); return l3_slave ? skb->skb_iif : TCP_SKB_CB(skb)->header.h6.iif; } /* TCP_SKB_CB reference means this can not be used from early demux */ static inline int tcp_v6_sdif(const struct sk_buff *skb) { #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV) if (skb && ipv6_l3mdev_skb(TCP_SKB_CB(skb)->header.h6.flags)) return TCP_SKB_CB(skb)->header.h6.iif; #endif return 0; } extern const struct inet_connection_sock_af_ops ipv6_specific; INDIRECT_CALLABLE_DECLARE(void tcp_v6_send_check(struct sock *sk, struct sk_buff *skb)); INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *skb)); void tcp_v6_early_demux(struct sk_buff *skb); #endif /* TCP_SKB_CB reference means this can not be used from early demux */ static inline int tcp_v4_sdif(struct sk_buff *skb) { #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV) if (skb && ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags)) return TCP_SKB_CB(skb)->header.h4.iif; #endif return 0; } /* Due to TSO, an SKB can be composed of multiple actual * packets. To keep these tracked properly, we use this. */ static inline int tcp_skb_pcount(const struct sk_buff *skb) { return TCP_SKB_CB(skb)->tcp_gso_segs; } static inline void tcp_skb_pcount_set(struct sk_buff *skb, int segs) { TCP_SKB_CB(skb)->tcp_gso_segs = segs; } static inline void tcp_skb_pcount_add(struct sk_buff *skb, int segs) { TCP_SKB_CB(skb)->tcp_gso_segs += segs; } /* This is valid iff skb is in write queue and tcp_skb_pcount() > 1. 
*/ static inline int tcp_skb_mss(const struct sk_buff *skb) { return TCP_SKB_CB(skb)->tcp_gso_size; } static inline bool tcp_skb_can_collapse_to(const struct sk_buff *skb) { return likely(!TCP_SKB_CB(skb)->eor); } static inline bool tcp_skb_can_collapse(const struct sk_buff *to, const struct sk_buff *from) { /* skb_cmp_decrypted() not needed, use tcp_write_collapse_fence() */ return likely(tcp_skb_can_collapse_to(to) && mptcp_skb_can_collapse(to, from) && skb_pure_zcopy_same(to, from) && skb_frags_readable(to) == skb_frags_readable(from)); } static inline bool tcp_skb_can_collapse_rx(const struct sk_buff *to, const struct sk_buff *from) { return likely(mptcp_skb_can_collapse(to, from) && !skb_cmp_decrypted(to, from)); } /* Events passed to congestion control interface */ enum tcp_ca_event { CA_EVENT_TX_START, /* first transmit when no packets in flight */ CA_EVENT_CWND_RESTART, /* congestion window restart */ CA_EVENT_COMPLETE_CWR, /* end of congestion recovery */ CA_EVENT_LOSS, /* loss timeout */ CA_EVENT_ECN_NO_CE, /* ECT set, but not CE marked */ CA_EVENT_ECN_IS_CE, /* received CE marked IP packet */ }; /* Information about inbound ACK, passed to cong_ops->in_ack_event() */ enum tcp_ca_ack_event_flags { CA_ACK_SLOWPATH = (1 << 0), /* In slow path processing */ CA_ACK_WIN_UPDATE = (1 << 1), /* ACK updated window */ CA_ACK_ECE = (1 << 2), /* ECE bit is set on ack */ }; /* * Interface for adding new TCP congestion control handlers */ #define TCP_CA_NAME_MAX 16 #define TCP_CA_MAX 128 #define TCP_CA_BUF_MAX (TCP_CA_NAME_MAX*TCP_CA_MAX) #define TCP_CA_UNSPEC 0 /* Algorithm can be set on socket without CAP_NET_ADMIN privileges */ #define TCP_CONG_NON_RESTRICTED BIT(0) /* Requires ECN/ECT set on all packets */ #define TCP_CONG_NEEDS_ECN BIT(1) #define TCP_CONG_MASK (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN) union tcp_cc_info; struct ack_sample { u32 pkts_acked; s32 rtt_us; u32 in_flight; }; /* A rate sample measures the number of (original/retransmitted) data * packets delivered "delivered" over an interval of time "interval_us". * The tcp_rate.c code fills in the rate sample, and congestion * control modules that define a cong_control function to run at the end * of ACK processing can optionally chose to consult this sample when * setting cwnd and pacing rate. * A sample is invalid if "delivered" or "interval_us" is negative. */ struct rate_sample { u64 prior_mstamp; /* starting timestamp for interval */ u32 prior_delivered; /* tp->delivered at "prior_mstamp" */ u32 prior_delivered_ce;/* tp->delivered_ce at "prior_mstamp" */ s32 delivered; /* number of packets delivered over interval */ s32 delivered_ce; /* number of packets delivered w/ CE marks*/ long interval_us; /* time for tp->delivered to incr "delivered" */ u32 snd_interval_us; /* snd interval for delivered packets */ u32 rcv_interval_us; /* rcv interval for delivered packets */ long rtt_us; /* RTT of last (S)ACKed packet (or -1) */ int losses; /* number of packets marked lost upon ACK */ u32 acked_sacked; /* number of packets newly (S)ACKed upon ACK */ u32 prior_in_flight; /* in flight before this ACK */ u32 last_end_seq; /* end_seq of most recently ACKed packet */ bool is_app_limited; /* is sample from packet with bubble in pipe? */ bool is_retrans; /* is sample from retransmission? */ bool is_ack_delayed; /* is this (likely) a delayed ACK? 
*/ }; struct tcp_congestion_ops { /* fast path fields are put first to fill one cache line */ /* return slow start threshold (required) */ u32 (*ssthresh)(struct sock *sk); /* do new cwnd calculation (required) */ void (*cong_avoid)(struct sock *sk, u32 ack, u32 acked); /* call before changing ca_state (optional) */ void (*set_state)(struct sock *sk, u8 new_state); /* call when cwnd event occurs (optional) */ void (*cwnd_event)(struct sock *sk, enum tcp_ca_event ev); /* call when ack arrives (optional) */ void (*in_ack_event)(struct sock *sk, u32 flags); /* hook for packet ack accounting (optional) */ void (*pkts_acked)(struct sock *sk, const struct ack_sample *sample); /* override sysctl_tcp_min_tso_segs */ u32 (*min_tso_segs)(struct sock *sk); /* call when packets are delivered to update cwnd and pacing rate, * after all the ca_state processing. (optional) */ void (*cong_control)(struct sock *sk, u32 ack, int flag, const struct rate_sample *rs); /* new value of cwnd after loss (required) */ u32 (*undo_cwnd)(struct sock *sk); /* returns the multiplier used in tcp_sndbuf_expand (optional) */ u32 (*sndbuf_expand)(struct sock *sk); /* control/slow paths put last */ /* get info for inet_diag (optional) */ size_t (*get_info)(struct sock *sk, u32 ext, int *attr, union tcp_cc_info *info); char name[TCP_CA_NAME_MAX]; struct module *owner; struct list_head list; u32 key; u32 flags; /* initialize private data (optional) */ void (*init)(struct sock *sk); /* cleanup private data (optional) */ void (*release)(struct sock *sk); } ____cacheline_aligned_in_smp; int tcp_register_congestion_control(struct tcp_congestion_ops *type); void tcp_unregister_congestion_control(struct tcp_congestion_ops *type); int tcp_update_congestion_control(struct tcp_congestion_ops *type, struct tcp_congestion_ops *old_type); int tcp_validate_congestion_control(struct tcp_congestion_ops *ca); void tcp_assign_congestion_control(struct sock *sk); void tcp_init_congestion_control(struct sock *sk); void tcp_cleanup_congestion_control(struct sock *sk); int tcp_set_default_congestion_control(struct net *net, const char *name); void tcp_get_default_congestion_control(struct net *net, char *name); void tcp_get_available_congestion_control(char *buf, size_t len); void tcp_get_allowed_congestion_control(char *buf, size_t len); int tcp_set_allowed_congestion_control(char *allowed); int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, bool cap_net_admin); u32 tcp_slow_start(struct tcp_sock *tp, u32 acked); void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked); u32 tcp_reno_ssthresh(struct sock *sk); u32 tcp_reno_undo_cwnd(struct sock *sk); void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 acked); extern struct tcp_congestion_ops tcp_reno; struct tcp_congestion_ops *tcp_ca_find(const char *name); struct tcp_congestion_ops *tcp_ca_find_key(u32 key); u32 tcp_ca_get_key_by_name(const char *name, bool *ecn_ca); #ifdef CONFIG_INET char *tcp_ca_get_name_by_key(u32 key, char *buffer); #else static inline char *tcp_ca_get_name_by_key(u32 key, char *buffer) { return NULL; } #endif static inline bool tcp_ca_needs_ecn(const struct sock *sk) { const struct inet_connection_sock *icsk = inet_csk(sk); return icsk->icsk_ca_ops->flags & TCP_CONG_NEEDS_ECN; } static inline void tcp_ca_event(struct sock *sk, const enum tcp_ca_event event) { const struct inet_connection_sock *icsk = inet_csk(sk); if (icsk->icsk_ca_ops->cwnd_event) icsk->icsk_ca_ops->cwnd_event(sk, event); } /* From tcp_cong.c */ void 
tcp_set_ca_state(struct sock *sk, const u8 ca_state); /* From tcp_rate.c */ void tcp_rate_skb_sent(struct sock *sk, struct sk_buff *skb); void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb, struct rate_sample *rs); void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost, bool is_sack_reneg, struct rate_sample *rs); void tcp_rate_check_app_limited(struct sock *sk); static inline bool tcp_skb_sent_after(u64 t1, u64 t2, u32 seq1, u32 seq2) { return t1 > t2 || (t1 == t2 && after(seq1, seq2)); } /* These functions determine how the current flow behaves in respect of SACK * handling. SACK is negotiated with the peer, and therefore it can vary * between different flows. * * tcp_is_sack - SACK enabled * tcp_is_reno - No SACK */ static inline int tcp_is_sack(const struct tcp_sock *tp) { return likely(tp->rx_opt.sack_ok); } static inline bool tcp_is_reno(const struct tcp_sock *tp) { return !tcp_is_sack(tp); } static inline unsigned int tcp_left_out(const struct tcp_sock *tp) { return tp->sacked_out + tp->lost_out; } /* This determines how many packets are "in the network" to the best * of our knowledge. In many cases it is conservative, but where * detailed information is available from the receiver (via SACK * blocks etc.) we can make more aggressive calculations. * * Use this for decisions involving congestion control, use just * tp->packets_out to determine if the send queue is empty or not. * * Read this equation as: * * "Packets sent once on transmission queue" MINUS * "Packets left network, but not honestly ACKed yet" PLUS * "Packets fast retransmitted" */ static inline unsigned int tcp_packets_in_flight(const struct tcp_sock *tp) { return tp->packets_out - tcp_left_out(tp) + tp->retrans_out; } #define TCP_INFINITE_SSTHRESH 0x7fffffff static inline u32 tcp_snd_cwnd(const struct tcp_sock *tp) { return tp->snd_cwnd; } static inline void tcp_snd_cwnd_set(struct tcp_sock *tp, u32 val) { WARN_ON_ONCE((int)val <= 0); tp->snd_cwnd = val; } static inline bool tcp_in_slow_start(const struct tcp_sock *tp) { return tcp_snd_cwnd(tp) < tp->snd_ssthresh; } static inline bool tcp_in_initial_slowstart(const struct tcp_sock *tp) { return tp->snd_ssthresh >= TCP_INFINITE_SSTHRESH; } static inline bool tcp_in_cwnd_reduction(const struct sock *sk) { return (TCPF_CA_CWR | TCPF_CA_Recovery) & (1 << inet_csk(sk)->icsk_ca_state); } /* If cwnd > ssthresh, we may raise ssthresh to be half-way to cwnd. * The exception is cwnd reduction phase, when cwnd is decreasing towards * ssthresh. */ static inline __u32 tcp_current_ssthresh(const struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); if (tcp_in_cwnd_reduction(sk)) return tp->snd_ssthresh; else return max(tp->snd_ssthresh, ((tcp_snd_cwnd(tp) >> 1) + (tcp_snd_cwnd(tp) >> 2))); } /* Use define here intentionally to get WARN_ON location shown at the caller */ #define tcp_verify_left_out(tp) WARN_ON(tcp_left_out(tp) > tp->packets_out) void tcp_enter_cwr(struct sock *sk); __u32 tcp_init_cwnd(const struct tcp_sock *tp, const struct dst_entry *dst); /* The maximum number of MSS of available cwnd for which TSO defers * sending if not using sysctl_tcp_tso_win_divisor. */ static inline __u32 tcp_max_tso_deferred_mss(const struct tcp_sock *tp) { return 3; } /* Returns end sequence number of the receiver's advertised window */ static inline u32 tcp_wnd_end(const struct tcp_sock *tp) { return tp->snd_una + tp->snd_wnd; } /* We follow the spirit of RFC2861 to validate cwnd but implement a more * flexible approach. 
The RFC suggests cwnd should not be raised unless * it was fully used previously. And that's exactly what we do in * congestion avoidance mode. But in slow start we allow cwnd to grow * as long as the application has used half the cwnd. * Example : * cwnd is 10 (IW10), but application sends 9 frames. * We allow cwnd to reach 18 when all frames are ACKed. * This check is safe because it's as aggressive as slow start which already * risks 100% overshoot. The advantage is that we discourage application to * either send more filler packets or data to artificially blow up the cwnd * usage, and allow application-limited process to probe bw more aggressively. */ static inline bool tcp_is_cwnd_limited(const struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); if (tp->is_cwnd_limited) return true; /* If in slow start, ensure cwnd grows to twice what was ACKed. */ if (tcp_in_slow_start(tp)) return tcp_snd_cwnd(tp) < 2 * tp->max_packets_out; return false; } /* BBR congestion control needs pacing. * Same remark for SO_MAX_PACING_RATE. * sch_fq packet scheduler is efficiently handling pacing, * but is not always installed/used. * Return true if TCP stack should pace packets itself. */ static inline bool tcp_needs_internal_pacing(const struct sock *sk) { return smp_load_acquire(&sk->sk_pacing_status) == SK_PACING_NEEDED; } /* Estimates in how many jiffies next packet for this flow can be sent. * Scheduling a retransmit timer too early would be silly. */ static inline unsigned long tcp_pacing_delay(const struct sock *sk) { s64 delay = tcp_sk(sk)->tcp_wstamp_ns - tcp_sk(sk)->tcp_clock_cache; return delay > 0 ? nsecs_to_jiffies(delay) : 0; } static inline void tcp_reset_xmit_timer(struct sock *sk, const int what, unsigned long when, bool pace_delay) { if (pace_delay) when += tcp_pacing_delay(sk); inet_csk_reset_xmit_timer(sk, what, when, tcp_rto_max(sk)); } /* Something is really bad, we could not queue an additional packet, * because qdisc is full or receiver sent a 0 window, or we are paced. 
* We do not want to add fuel to the fire, or abort too early, * so make sure the timer we arm now is at least 200ms in the future, * regardless of current icsk_rto value (as it could be ~2ms) */ static inline unsigned long tcp_probe0_base(const struct sock *sk) { return max_t(unsigned long, inet_csk(sk)->icsk_rto, TCP_RTO_MIN); } /* Variant of inet_csk_rto_backoff() used for zero window probes */ static inline unsigned long tcp_probe0_when(const struct sock *sk, unsigned long max_when) { u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1, inet_csk(sk)->icsk_backoff); u64 when = (u64)tcp_probe0_base(sk) << backoff; return (unsigned long)min_t(u64, when, max_when); } static inline void tcp_check_probe_timer(struct sock *sk) { if (!tcp_sk(sk)->packets_out && !inet_csk(sk)->icsk_pending) tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0, tcp_probe0_base(sk), true); } static inline void tcp_init_wl(struct tcp_sock *tp, u32 seq) { tp->snd_wl1 = seq; } static inline void tcp_update_wl(struct tcp_sock *tp, u32 seq) { tp->snd_wl1 = seq; } /* * Calculate(/check) TCP checksum */ static inline __sum16 tcp_v4_check(int len, __be32 saddr, __be32 daddr, __wsum base) { return csum_tcpudp_magic(saddr, daddr, len, IPPROTO_TCP, base); } static inline bool tcp_checksum_complete(struct sk_buff *skb) { return !skb_csum_unnecessary(skb) && __skb_checksum_complete(skb); } bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb, enum skb_drop_reason *reason); int tcp_filter(struct sock *sk, struct sk_buff *skb, enum skb_drop_reason *reason); void tcp_set_state(struct sock *sk, int state); void tcp_done(struct sock *sk); int tcp_abort(struct sock *sk, int err); static inline void tcp_sack_reset(struct tcp_options_received *rx_opt) { rx_opt->dsack = 0; rx_opt->num_sacks = 0; } void tcp_cwnd_restart(struct sock *sk, s32 delta); static inline void tcp_slow_start_after_idle_check(struct sock *sk) { const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops; struct tcp_sock *tp = tcp_sk(sk); s32 delta; if (!READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_slow_start_after_idle) || tp->packets_out || ca_ops->cong_control) return; delta = tcp_jiffies32 - tp->lsndtime; if (delta > inet_csk(sk)->icsk_rto) tcp_cwnd_restart(sk, delta); } /* Determine a window scaling and initial window to offer. */ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss, __u32 *rcv_wnd, __u32 *window_clamp, int wscale_ok, __u8 *rcv_wscale, __u32 init_rcv_wnd); static inline int __tcp_win_from_space(u8 scaling_ratio, int space) { s64 scaled_space = (s64)space * scaling_ratio; return scaled_space >> TCP_RMEM_TO_WIN_SCALE; } static inline int tcp_win_from_space(const struct sock *sk, int space) { return __tcp_win_from_space(tcp_sk(sk)->scaling_ratio, space); } /* inverse of __tcp_win_from_space() */ static inline int __tcp_space_from_win(u8 scaling_ratio, int win) { u64 val = (u64)win << TCP_RMEM_TO_WIN_SCALE; do_div(val, scaling_ratio); return val; } static inline int tcp_space_from_win(const struct sock *sk, int win) { return __tcp_space_from_win(tcp_sk(sk)->scaling_ratio, win); } /* Assume a 50% default for skb->len/skb->truesize ratio. * This may be adjusted later in tcp_measure_rcv_mss(). 
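 *
 * Illustrative numbers only: with the 50% default, tcp_win_from_space()
 * turns a 128 KB sk_rcvbuf into roughly a 64 KB window budget; if observed
 * skbs have a better len/truesize ratio, scaling_ratio may grow and more of
 * the buffer becomes usable receive window.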
*/ #define TCP_DEFAULT_SCALING_RATIO (1 << (TCP_RMEM_TO_WIN_SCALE - 1)) static inline void tcp_scaling_ratio_init(struct sock *sk) { tcp_sk(sk)->scaling_ratio = TCP_DEFAULT_SCALING_RATIO; } /* Note: caller must be prepared to deal with negative returns */ static inline int tcp_space(const struct sock *sk) { return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - READ_ONCE(sk->sk_backlog.len) - atomic_read(&sk->sk_rmem_alloc)); } static inline int tcp_full_space(const struct sock *sk) { return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf)); } static inline void __tcp_adjust_rcv_ssthresh(struct sock *sk, u32 new_ssthresh) { int unused_mem = sk_unused_reserved_mem(sk); struct tcp_sock *tp = tcp_sk(sk); tp->rcv_ssthresh = min(tp->rcv_ssthresh, new_ssthresh); if (unused_mem) tp->rcv_ssthresh = max_t(u32, tp->rcv_ssthresh, tcp_win_from_space(sk, unused_mem)); } static inline void tcp_adjust_rcv_ssthresh(struct sock *sk) { __tcp_adjust_rcv_ssthresh(sk, 4U * tcp_sk(sk)->advmss); } void tcp_cleanup_rbuf(struct sock *sk, int copied); void __tcp_cleanup_rbuf(struct sock *sk, int copied); /* We provision sk_rcvbuf around 200% of sk_rcvlowat. * If 87.5 % (7/8) of the space has been consumed, we want to override * SO_RCVLOWAT constraint, since we are receiving skbs with too small * len/truesize ratio. */ static inline bool tcp_rmem_pressure(const struct sock *sk) { int rcvbuf, threshold; if (tcp_under_memory_pressure(sk)) return true; rcvbuf = READ_ONCE(sk->sk_rcvbuf); threshold = rcvbuf - (rcvbuf >> 3); return atomic_read(&sk->sk_rmem_alloc) > threshold; } static inline bool tcp_epollin_ready(const struct sock *sk, int target) { const struct tcp_sock *tp = tcp_sk(sk); int avail = READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq); if (avail <= 0) return false; return (avail >= target) || tcp_rmem_pressure(sk) || (tcp_receive_window(tp) <= inet_csk(sk)->icsk_ack.rcv_mss); } extern void tcp_openreq_init_rwin(struct request_sock *req, const struct sock *sk_listener, const struct dst_entry *dst); void tcp_enter_memory_pressure(struct sock *sk); void tcp_leave_memory_pressure(struct sock *sk); static inline int keepalive_intvl_when(const struct tcp_sock *tp) { struct net *net = sock_net((struct sock *)tp); int val; /* Paired with WRITE_ONCE() in tcp_sock_set_keepintvl() * and do_tcp_setsockopt(). */ val = READ_ONCE(tp->keepalive_intvl); return val ? : READ_ONCE(net->ipv4.sysctl_tcp_keepalive_intvl); } static inline int keepalive_time_when(const struct tcp_sock *tp) { struct net *net = sock_net((struct sock *)tp); int val; /* Paired with WRITE_ONCE() in tcp_sock_set_keepidle_locked() */ val = READ_ONCE(tp->keepalive_time); return val ? : READ_ONCE(net->ipv4.sysctl_tcp_keepalive_time); } static inline int keepalive_probes(const struct tcp_sock *tp) { struct net *net = sock_net((struct sock *)tp); int val; /* Paired with WRITE_ONCE() in tcp_sock_set_keepcnt() * and do_tcp_setsockopt(). */ val = READ_ONCE(tp->keepalive_probes); return val ? : READ_ONCE(net->ipv4.sysctl_tcp_keepalive_probes); } static inline u32 keepalive_time_elapsed(const struct tcp_sock *tp) { const struct inet_connection_sock *icsk = &tp->inet_conn; return min_t(u32, tcp_jiffies32 - icsk->icsk_ack.lrcvtime, tcp_jiffies32 - tp->rcv_tstamp); } static inline int tcp_fin_time(const struct sock *sk) { int fin_timeout = tcp_sk(sk)->linger2 ? 
: READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fin_timeout); const int rto = inet_csk(sk)->icsk_rto; if (fin_timeout < (rto << 2) - (rto >> 1)) fin_timeout = (rto << 2) - (rto >> 1); return fin_timeout; } static inline bool tcp_paws_check(const struct tcp_options_received *rx_opt, int paws_win) { if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win) return true; if (unlikely(!time_before32(ktime_get_seconds(), rx_opt->ts_recent_stamp + TCP_PAWS_WRAP))) return true; /* * Some OSes send SYN and SYNACK messages with tsval=0 tsecr=0, * then following tcp messages have valid values. Ignore 0 value, * or else 'negative' tsval might forbid us to accept their packets. */ if (!rx_opt->ts_recent) return true; return false; } static inline bool tcp_paws_reject(const struct tcp_options_received *rx_opt, int rst) { if (tcp_paws_check(rx_opt, 0)) return false; /* RST segments are not recommended to carry timestamp, and, if they do, it is recommended to ignore PAWS because "their cleanup function should take precedence over timestamps." Certainly, it is mistake. It is necessary to understand the reasons of this constraint to relax it: if peer reboots, clock may go out-of-sync and half-open connections will not be reset. Actually, the problem would be not existing if all the implementations followed draft about maintaining clock via reboots. Linux-2.2 DOES NOT! However, we can relax time bounds for RST segments to MSL. */ if (rst && !time_before32(ktime_get_seconds(), rx_opt->ts_recent_stamp + TCP_PAWS_MSL)) return false; return true; } bool tcp_oow_rate_limited(struct net *net, const struct sk_buff *skb, int mib_idx, u32 *last_oow_ack_time); static inline void tcp_mib_init(struct net *net) { /* See RFC 2012 */ TCP_ADD_STATS(net, TCP_MIB_RTOALGORITHM, 1); TCP_ADD_STATS(net, TCP_MIB_RTOMIN, TCP_RTO_MIN*1000/HZ); TCP_ADD_STATS(net, TCP_MIB_RTOMAX, TCP_RTO_MAX*1000/HZ); TCP_ADD_STATS(net, TCP_MIB_MAXCONN, -1); } /* from STCP */ static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) { tp->retransmit_skb_hint = NULL; } #define tcp_md5_addr tcp_ao_addr /* - key database */ struct tcp_md5sig_key { struct hlist_node node; u8 keylen; u8 family; /* AF_INET or AF_INET6 */ u8 prefixlen; u8 flags; union tcp_md5_addr addr; int l3index; /* set if key added with L3 scope */ u8 key[TCP_MD5SIG_MAXKEYLEN]; struct rcu_head rcu; }; /* - sock block */ struct tcp_md5sig_info { struct hlist_head head; struct rcu_head rcu; }; /* - pseudo header */ struct tcp4_pseudohdr { __be32 saddr; __be32 daddr; __u8 pad; __u8 protocol; __be16 len; }; struct tcp6_pseudohdr { struct in6_addr saddr; struct in6_addr daddr; __be32 len; __be32 protocol; /* including padding */ }; union tcp_md5sum_block { struct tcp4_pseudohdr ip4; #if IS_ENABLED(CONFIG_IPV6) struct tcp6_pseudohdr ip6; #endif }; /* * struct tcp_sigpool - per-CPU pool of ahash_requests * @scratch: per-CPU temporary area, that can be used between * tcp_sigpool_start() and tcp_sigpool_end() to perform * crypto request * @req: pre-allocated ahash request */ struct tcp_sigpool { void *scratch; struct ahash_request *req; }; int tcp_sigpool_alloc_ahash(const char *alg, size_t scratch_size); void tcp_sigpool_get(unsigned int id); void tcp_sigpool_release(unsigned int id); int tcp_sigpool_hash_skb_data(struct tcp_sigpool *hp, const struct sk_buff *skb, unsigned int header_len); /** * tcp_sigpool_start - disable bh and start using tcp_sigpool_ahash * @id: tcp_sigpool that was previously allocated by tcp_sigpool_alloc_ahash() * @c: returned tcp_sigpool for usage (uninitialized 
on failure) * * Returns: 0 on success, error otherwise. */ int tcp_sigpool_start(unsigned int id, struct tcp_sigpool *c); /** * tcp_sigpool_end - enable bh and stop using tcp_sigpool * @c: tcp_sigpool context that was returned by tcp_sigpool_start() */ void tcp_sigpool_end(struct tcp_sigpool *c); size_t tcp_sigpool_algo(unsigned int id, char *buf, size_t buf_len); /* - functions */ int tcp_v4_md5_hash_skb(char *md5_hash, const struct tcp_md5sig_key *key, const struct sock *sk, const struct sk_buff *skb); int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr, int family, u8 prefixlen, int l3index, u8 flags, const u8 *newkey, u8 newkeylen); int tcp_md5_key_copy(struct sock *sk, const union tcp_md5_addr *addr, int family, u8 prefixlen, int l3index, struct tcp_md5sig_key *key); int tcp_md5_do_del(struct sock *sk, const union tcp_md5_addr *addr, int family, u8 prefixlen, int l3index, u8 flags); void tcp_clear_md5_list(struct sock *sk); struct tcp_md5sig_key *tcp_v4_md5_lookup(const struct sock *sk, const struct sock *addr_sk); #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *__tcp_md5_do_lookup(const struct sock *sk, int l3index, const union tcp_md5_addr *addr, int family, bool any_l3index); static inline struct tcp_md5sig_key * tcp_md5_do_lookup(const struct sock *sk, int l3index, const union tcp_md5_addr *addr, int family) { if (!static_branch_unlikely(&tcp_md5_needed.key)) return NULL; return __tcp_md5_do_lookup(sk, l3index, addr, family, false); } static inline struct tcp_md5sig_key * tcp_md5_do_lookup_any_l3index(const struct sock *sk, const union tcp_md5_addr *addr, int family) { if (!static_branch_unlikely(&tcp_md5_needed.key)) return NULL; return __tcp_md5_do_lookup(sk, 0, addr, family, true); } #define tcp_twsk_md5_key(twsk) ((twsk)->tw_md5_key) #else static inline struct tcp_md5sig_key * tcp_md5_do_lookup(const struct sock *sk, int l3index, const union tcp_md5_addr *addr, int family) { return NULL; } static inline struct tcp_md5sig_key * tcp_md5_do_lookup_any_l3index(const struct sock *sk, const union tcp_md5_addr *addr, int family) { return NULL; } #define tcp_twsk_md5_key(twsk) NULL #endif int tcp_md5_alloc_sigpool(void); void tcp_md5_release_sigpool(void); void tcp_md5_add_sigpool(void); extern int tcp_md5_sigpool_id; int tcp_md5_hash_key(struct tcp_sigpool *hp, const struct tcp_md5sig_key *key); /* From tcp_fastopen.c */ void tcp_fastopen_cache_get(struct sock *sk, u16 *mss, struct tcp_fastopen_cookie *cookie); void tcp_fastopen_cache_set(struct sock *sk, u16 mss, struct tcp_fastopen_cookie *cookie, bool syn_lost, u16 try_exp); struct tcp_fastopen_request { /* Fast Open cookie. 
Size 0 means a cookie request */ struct tcp_fastopen_cookie cookie; struct msghdr *data; /* data in MSG_FASTOPEN */ size_t size; int copied; /* queued in tcp_connect() */ struct ubuf_info *uarg; }; void tcp_free_fastopen_req(struct tcp_sock *tp); void tcp_fastopen_destroy_cipher(struct sock *sk); void tcp_fastopen_ctx_destroy(struct net *net); int tcp_fastopen_reset_cipher(struct net *net, struct sock *sk, void *primary_key, void *backup_key); int tcp_fastopen_get_cipher(struct net *net, struct inet_connection_sock *icsk, u64 *key); void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb); struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct tcp_fastopen_cookie *foc, const struct dst_entry *dst); void tcp_fastopen_init_key_once(struct net *net); bool tcp_fastopen_cookie_check(struct sock *sk, u16 *mss, struct tcp_fastopen_cookie *cookie); bool tcp_fastopen_defer_connect(struct sock *sk, int *err); #define TCP_FASTOPEN_KEY_LENGTH sizeof(siphash_key_t) #define TCP_FASTOPEN_KEY_MAX 2 #define TCP_FASTOPEN_KEY_BUF_LENGTH \ (TCP_FASTOPEN_KEY_LENGTH * TCP_FASTOPEN_KEY_MAX) /* Fastopen key context */ struct tcp_fastopen_context { siphash_key_t key[TCP_FASTOPEN_KEY_MAX]; int num; struct rcu_head rcu; }; void tcp_fastopen_active_disable(struct sock *sk); bool tcp_fastopen_active_should_disable(struct sock *sk); void tcp_fastopen_active_disable_ofo_check(struct sock *sk); void tcp_fastopen_active_detect_blackhole(struct sock *sk, bool expired); /* Caller needs to wrap with rcu_read_(un)lock() */ static inline struct tcp_fastopen_context *tcp_fastopen_get_ctx(const struct sock *sk) { struct tcp_fastopen_context *ctx; ctx = rcu_dereference(inet_csk(sk)->icsk_accept_queue.fastopenq.ctx); if (!ctx) ctx = rcu_dereference(sock_net(sk)->ipv4.tcp_fastopen_ctx); return ctx; } static inline bool tcp_fastopen_cookie_match(const struct tcp_fastopen_cookie *foc, const struct tcp_fastopen_cookie *orig) { if (orig->len == TCP_FASTOPEN_COOKIE_SIZE && orig->len == foc->len && !memcmp(orig->val, foc->val, foc->len)) return true; return false; } static inline int tcp_fastopen_context_len(const struct tcp_fastopen_context *ctx) { return ctx->num; } /* Latencies incurred by various limits for a sender. They are * chronograph-like stats that are mutually exclusive. 
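 *
 * Minimal usage sketch (illustrative, mirroring tcp_add_write_queue_tail()
 * further below): start the BUSY chronograph when the first skb is queued,
 * stop it once the write queue drains:
 *
 *	tcp_chrono_start(sk, TCP_CHRONO_BUSY);
 *	...
 *	tcp_chrono_stop(sk, TCP_CHRONO_BUSY);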
*/ enum tcp_chrono { TCP_CHRONO_UNSPEC, TCP_CHRONO_BUSY, /* Actively sending data (non-empty write queue) */ TCP_CHRONO_RWND_LIMITED, /* Stalled by insufficient receive window */ TCP_CHRONO_SNDBUF_LIMITED, /* Stalled by insufficient send buffer */ __TCP_CHRONO_MAX, }; void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type); void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type); /* This helper is needed, because skb->tcp_tsorted_anchor uses * the same memory storage than skb->destructor/_skb_refdst */ static inline void tcp_skb_tsorted_anchor_cleanup(struct sk_buff *skb) { skb->destructor = NULL; skb->_skb_refdst = 0UL; } #define tcp_skb_tsorted_save(skb) { \ unsigned long _save = skb->_skb_refdst; \ skb->_skb_refdst = 0UL; #define tcp_skb_tsorted_restore(skb) \ skb->_skb_refdst = _save; \ } void tcp_write_queue_purge(struct sock *sk); static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk) { return skb_rb_first(&sk->tcp_rtx_queue); } static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { return skb_rb_last(&sk->tcp_rtx_queue); } static inline struct sk_buff *tcp_write_queue_tail(const struct sock *sk) { return skb_peek_tail(&sk->sk_write_queue); } #define tcp_for_write_queue_from_safe(skb, tmp, sk) \ skb_queue_walk_from_safe(&(sk)->sk_write_queue, skb, tmp) static inline struct sk_buff *tcp_send_head(const struct sock *sk) { return skb_peek(&sk->sk_write_queue); } static inline bool tcp_skb_is_last(const struct sock *sk, const struct sk_buff *skb) { return skb_queue_is_last(&sk->sk_write_queue, skb); } /** * tcp_write_queue_empty - test if any payload (or FIN) is available in write queue * @sk: socket * * Since the write queue can have a temporary empty skb in it, * we must not use "return skb_queue_empty(&sk->sk_write_queue)" */ static inline bool tcp_write_queue_empty(const struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); return tp->write_seq == tp->snd_nxt; } static inline bool tcp_rtx_queue_empty(const struct sock *sk) { return RB_EMPTY_ROOT(&sk->tcp_rtx_queue); } static inline bool tcp_rtx_and_write_queues_empty(const struct sock *sk) { return tcp_rtx_queue_empty(sk) && tcp_write_queue_empty(sk); } static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb) { __skb_queue_tail(&sk->sk_write_queue, skb); /* Queue it, remembering where we must start sending. */ if (sk->sk_write_queue.next == skb) tcp_chrono_start(sk, TCP_CHRONO_BUSY); } /* Insert new before skb on the write queue of sk. 
*/ static inline void tcp_insert_write_queue_before(struct sk_buff *new, struct sk_buff *skb, struct sock *sk) { __skb_queue_before(&sk->sk_write_queue, skb, new); } static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk) { tcp_skb_tsorted_anchor_cleanup(skb); __skb_unlink(skb, &sk->sk_write_queue); } void tcp_rbtree_insert(struct rb_root *root, struct sk_buff *skb); static inline void tcp_rtx_queue_unlink(struct sk_buff *skb, struct sock *sk) { tcp_skb_tsorted_anchor_cleanup(skb); rb_erase(&skb->rbnode, &sk->tcp_rtx_queue); } static inline void tcp_rtx_queue_unlink_and_free(struct sk_buff *skb, struct sock *sk) { list_del(&skb->tcp_tsorted_anchor); tcp_rtx_queue_unlink(skb, sk); tcp_wmem_free_skb(sk, skb); } static inline void tcp_write_collapse_fence(struct sock *sk) { struct sk_buff *skb = tcp_write_queue_tail(sk); if (skb) TCP_SKB_CB(skb)->eor = 1; } static inline void tcp_push_pending_frames(struct sock *sk) { if (tcp_send_head(sk)) { struct tcp_sock *tp = tcp_sk(sk); __tcp_push_pending_frames(sk, tcp_current_mss(sk), tp->nonagle); } } /* Start sequence of the skb just after the highest skb with SACKed * bit, valid only if sacked_out > 0 or when the caller has ensured * validity by itself. */ static inline u32 tcp_highest_sack_seq(struct tcp_sock *tp) { if (!tp->sacked_out) return tp->snd_una; if (tp->highest_sack == NULL) return tp->snd_nxt; return TCP_SKB_CB(tp->highest_sack)->seq; } static inline void tcp_advance_highest_sack(struct sock *sk, struct sk_buff *skb) { tcp_sk(sk)->highest_sack = skb_rb_next(skb); } static inline struct sk_buff *tcp_highest_sack(struct sock *sk) { return tcp_sk(sk)->highest_sack; } static inline void tcp_highest_sack_reset(struct sock *sk) { tcp_sk(sk)->highest_sack = tcp_rtx_queue_head(sk); } /* Called when old skb is about to be deleted and replaced by new skb */ static inline void tcp_highest_sack_replace(struct sock *sk, struct sk_buff *old, struct sk_buff *new) { if (old == tcp_highest_sack(sk)) tcp_sk(sk)->highest_sack = new; } /* This helper checks if socket has IP_TRANSPARENT set */ static inline bool inet_sk_transparent(const struct sock *sk) { switch (sk->sk_state) { case TCP_TIME_WAIT: return inet_twsk(sk)->tw_transparent; case TCP_NEW_SYN_RECV: return inet_rsk(inet_reqsk(sk))->no_srccheck; } return inet_test_bit(TRANSPARENT, sk); } /* Determines whether this is a thin stream (which may suffer from * increased latency). Used to trigger latency-reducing mechanisms. 
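 *
 * Illustrative example of a consumer (not part of this header): the
 * retransmission path may skip exponential RTO backoff for such flows,
 * roughly
 *
 *	if (tcp_stream_is_thin(tp) && thin linear timeouts are enabled)
 *		keep the RTO linear instead of doubling it
 *
 * which is the idea behind the tcp_thin_linear_timeouts sysctl.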
*/ static inline bool tcp_stream_is_thin(struct tcp_sock *tp) { return tp->packets_out < 4 && !tcp_in_initial_slowstart(tp); } /* /proc */ enum tcp_seq_states { TCP_SEQ_STATE_LISTENING, TCP_SEQ_STATE_ESTABLISHED, }; void *tcp_seq_start(struct seq_file *seq, loff_t *pos); void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos); void tcp_seq_stop(struct seq_file *seq, void *v); struct tcp_seq_afinfo { sa_family_t family; }; struct tcp_iter_state { struct seq_net_private p; enum tcp_seq_states state; struct sock *syn_wait_sk; int bucket, offset, sbucket, num; loff_t last_pos; }; extern struct request_sock_ops tcp_request_sock_ops; extern struct request_sock_ops tcp6_request_sock_ops; void tcp_v4_destroy_sock(struct sock *sk); struct sk_buff *tcp_gso_segment(struct sk_buff *skb, netdev_features_t features); struct tcphdr *tcp_gro_pull_header(struct sk_buff *skb); struct sk_buff *tcp_gro_lookup(struct list_head *head, struct tcphdr *th); struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb, struct tcphdr *th); INDIRECT_CALLABLE_DECLARE(int tcp4_gro_complete(struct sk_buff *skb, int thoff)); INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb)); INDIRECT_CALLABLE_DECLARE(int tcp6_gro_complete(struct sk_buff *skb, int thoff)); INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb)); #ifdef CONFIG_INET void tcp_gro_complete(struct sk_buff *skb); #else static inline void tcp_gro_complete(struct sk_buff *skb) { } #endif void __tcp_v4_send_check(struct sk_buff *skb, __be32 saddr, __be32 daddr); static inline u32 tcp_notsent_lowat(const struct tcp_sock *tp) { struct net *net = sock_net((struct sock *)tp); u32 val; val = READ_ONCE(tp->notsent_lowat); return val ?: READ_ONCE(net->ipv4.sysctl_tcp_notsent_lowat); } bool tcp_stream_memory_free(const struct sock *sk, int wake); #ifdef CONFIG_PROC_FS int tcp4_proc_init(void); void tcp4_proc_exit(void); #endif int tcp_rtx_synack(const struct sock *sk, struct request_sock *req); int tcp_conn_request(struct request_sock_ops *rsk_ops, const struct tcp_request_sock_ops *af_ops, struct sock *sk, struct sk_buff *skb); /* TCP af-specific functions */ struct tcp_sock_af_ops { #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *(*md5_lookup) (const struct sock *sk, const struct sock *addr_sk); int (*calc_md5_hash)(char *location, const struct tcp_md5sig_key *md5, const struct sock *sk, const struct sk_buff *skb); int (*md5_parse)(struct sock *sk, int optname, sockptr_t optval, int optlen); #endif #ifdef CONFIG_TCP_AO int (*ao_parse)(struct sock *sk, int optname, sockptr_t optval, int optlen); struct tcp_ao_key *(*ao_lookup)(const struct sock *sk, struct sock *addr_sk, int sndid, int rcvid); int (*ao_calc_key_sk)(struct tcp_ao_key *mkt, u8 *key, const struct sock *sk, __be32 sisn, __be32 disn, bool send); int (*calc_ao_hash)(char *location, struct tcp_ao_key *ao, const struct sock *sk, const struct sk_buff *skb, const u8 *tkey, int hash_offset, u32 sne); #endif }; struct tcp_request_sock_ops { u16 mss_clamp; #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *(*req_md5_lookup)(const struct sock *sk, const struct sock *addr_sk); int (*calc_md5_hash) (char *location, const struct tcp_md5sig_key *md5, const struct sock *sk, const struct sk_buff *skb); #endif #ifdef CONFIG_TCP_AO struct tcp_ao_key *(*ao_lookup)(const struct sock *sk, struct request_sock *req, int sndid, int rcvid); int (*ao_calc_key)(struct tcp_ao_key *mkt, u8 *key, struct request_sock 
*sk); int (*ao_synack_hash)(char *ao_hash, struct tcp_ao_key *mkt, struct request_sock *req, const struct sk_buff *skb, int hash_offset, u32 sne); #endif #ifdef CONFIG_SYN_COOKIES __u32 (*cookie_init_seq)(const struct sk_buff *skb, __u16 *mss); #endif struct dst_entry *(*route_req)(const struct sock *sk, struct sk_buff *skb, struct flowi *fl, struct request_sock *req, u32 tw_isn); u32 (*init_seq)(const struct sk_buff *skb); u32 (*init_ts_off)(const struct net *net, const struct sk_buff *skb); int (*send_synack)(const struct sock *sk, struct dst_entry *dst, struct flowi *fl, struct request_sock *req, struct tcp_fastopen_cookie *foc, enum tcp_synack_type synack_type, struct sk_buff *syn_skb); }; extern const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops; #if IS_ENABLED(CONFIG_IPV6) extern const struct tcp_request_sock_ops tcp_request_sock_ipv6_ops; #endif #ifdef CONFIG_SYN_COOKIES static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops, const struct sock *sk, struct sk_buff *skb, __u16 *mss) { tcp_synq_overflow(sk); __NET_INC_STATS(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT); return ops->cookie_init_seq(skb, mss); } #else static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops, const struct sock *sk, struct sk_buff *skb, __u16 *mss) { return 0; } #endif struct tcp_key { union { struct { struct tcp_ao_key *ao_key; char *traffic_key; u32 sne; u8 rcv_next; }; struct tcp_md5sig_key *md5_key; }; enum { TCP_KEY_NONE = 0, TCP_KEY_MD5, TCP_KEY_AO, } type; }; static inline void tcp_get_current_key(const struct sock *sk, struct tcp_key *out) { #if defined(CONFIG_TCP_AO) || defined(CONFIG_TCP_MD5SIG) const struct tcp_sock *tp = tcp_sk(sk); #endif #ifdef CONFIG_TCP_AO if (static_branch_unlikely(&tcp_ao_needed.key)) { struct tcp_ao_info *ao; ao = rcu_dereference_protected(tp->ao_info, lockdep_sock_is_held(sk)); if (ao) { out->ao_key = READ_ONCE(ao->current_key); out->type = TCP_KEY_AO; return; } } #endif #ifdef CONFIG_TCP_MD5SIG if (static_branch_unlikely(&tcp_md5_needed.key) && rcu_access_pointer(tp->md5sig_info)) { out->md5_key = tp->af_specific->md5_lookup(sk, sk); if (out->md5_key) { out->type = TCP_KEY_MD5; return; } } #endif out->type = TCP_KEY_NONE; } static inline bool tcp_key_is_md5(const struct tcp_key *key) { if (static_branch_tcp_md5()) return key->type == TCP_KEY_MD5; return false; } static inline bool tcp_key_is_ao(const struct tcp_key *key) { if (static_branch_tcp_ao()) return key->type == TCP_KEY_AO; return false; } int tcpv4_offload_init(void); void tcp_v4_init(void); void tcp_init(void); /* tcp_recovery.c */ void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb); void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced); extern s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb, u32 reo_wnd); extern bool tcp_rack_mark_lost(struct sock *sk); extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq, u64 xmit_time); extern void tcp_rack_reo_timeout(struct sock *sk); extern void tcp_rack_update_reo_wnd(struct sock *sk, struct rate_sample *rs); /* tcp_plb.c */ /* * Scaling factor for fractions in PLB. For example, tcp_plb_update_state * expects cong_ratio which represents fraction of traffic that experienced * congestion over a single RTT. In order to avoid floating point operations, * this fraction should be mapped to (1 << TCP_PLB_SCALE) and passed in. */ #define TCP_PLB_SCALE 8 /* State for PLB (Protective Load Balancing) for a single TCP connection. 
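 *
 * Worked example (illustrative): a congestion control module that saw CE
 * marks on 25% of the packets delivered during the last round would pass
 * cong_ratio = (1 << TCP_PLB_SCALE) / 4 = 64, roughly:
 *
 *	tcp_plb_update_state(sk, &plb, (delivered_ce << TCP_PLB_SCALE) / delivered);
 *	tcp_plb_check_rehash(sk, &plb);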
*/ struct tcp_plb_state { u8 consec_cong_rounds:5, /* consecutive congested rounds */ unused:3; u32 pause_until; /* jiffies32 when PLB can resume rerouting */ }; static inline void tcp_plb_init(const struct sock *sk, struct tcp_plb_state *plb) { plb->consec_cong_rounds = 0; plb->pause_until = 0; } void tcp_plb_update_state(const struct sock *sk, struct tcp_plb_state *plb, const int cong_ratio); void tcp_plb_check_rehash(struct sock *sk, struct tcp_plb_state *plb); void tcp_plb_update_state_upon_rto(struct sock *sk, struct tcp_plb_state *plb); static inline void tcp_warn_once(const struct sock *sk, bool cond, const char *str) { WARN_ONCE(cond, "%scwn:%u out:%u sacked:%u lost:%u retrans:%u tlp_high_seq:%u sk_state:%u ca_state:%u advmss:%u mss_cache:%u pmtu:%u\n", str, tcp_snd_cwnd(tcp_sk(sk)), tcp_sk(sk)->packets_out, tcp_sk(sk)->sacked_out, tcp_sk(sk)->lost_out, tcp_sk(sk)->retrans_out, tcp_sk(sk)->tlp_high_seq, sk->sk_state, inet_csk(sk)->icsk_ca_state, tcp_sk(sk)->advmss, tcp_sk(sk)->mss_cache, inet_csk(sk)->icsk_pmtu_cookie); } /* At how many usecs into the future should the RTO fire? */ static inline s64 tcp_rto_delta_us(const struct sock *sk) { const struct sk_buff *skb = tcp_rtx_queue_head(sk); u32 rto = inet_csk(sk)->icsk_rto; if (likely(skb)) { u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto); return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp; } else { tcp_warn_once(sk, 1, "rtx queue empty: "); return jiffies_to_usecs(rto); } } /* * Save and compile IPv4 options, return a pointer to it */ static inline struct ip_options_rcu *tcp_v4_save_options(struct net *net, struct sk_buff *skb) { const struct ip_options *opt = &TCP_SKB_CB(skb)->header.h4.opt; struct ip_options_rcu *dopt = NULL; if (opt->optlen) { int opt_size = sizeof(*dopt) + opt->optlen; dopt = kmalloc(opt_size, GFP_ATOMIC); if (dopt && __ip_options_echo(net, &dopt->opt, skb, opt)) { kfree(dopt); dopt = NULL; } } return dopt; } /* locally generated TCP pure ACKs have skb->truesize == 2 * (check tcp_send_ack() in net/ipv4/tcp_output.c ) * This is much faster than dissecting the packet to find out. * (Think of GRE encapsulations, IPv4, IPv6, ...) */ static inline bool skb_is_tcp_pure_ack(const struct sk_buff *skb) { return skb->truesize == 2; } static inline void skb_set_tcp_pure_ack(struct sk_buff *skb) { skb->truesize = 2; } static inline int tcp_inq(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); int answ; if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { answ = 0; } else if (sock_flag(sk, SOCK_URGINLINE) || !tp->urg_data || before(tp->urg_seq, tp->copied_seq) || !before(tp->urg_seq, tp->rcv_nxt)) { answ = tp->rcv_nxt - tp->copied_seq; /* Subtract 1, if FIN was received */ if (answ && sock_flag(sk, SOCK_DONE)) answ--; } else { answ = tp->urg_seq - tp->copied_seq; } return answ; } int tcp_peek_len(struct socket *sock); static inline void tcp_segs_in(struct tcp_sock *tp, const struct sk_buff *skb) { u16 segs_in; segs_in = max_t(u16, 1, skb_shinfo(skb)->gso_segs); /* We update these fields while other threads might * read them from tcp_get_info() */ WRITE_ONCE(tp->segs_in, tp->segs_in + segs_in); if (skb->len > tcp_hdrlen(skb)) WRITE_ONCE(tp->data_segs_in, tp->data_segs_in + segs_in); } /* * TCP listen path runs lockless. * We forced "struct sock" to be const qualified to make sure * we don't modify one of its field by mistake. * Here, we increment sk_drops which is an atomic_t, so we can safely * make sock writable again. 
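 *
 * Typical (illustrative) call site in a listener's drop path when a SYN
 * cannot be serviced:
 *
 *	drop:
 *		tcp_listendrop(sk);
 *		return 0;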
*/ static inline void tcp_listendrop(const struct sock *sk) { atomic_inc(&((struct sock *)sk)->sk_drops); __NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENDROPS); } enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer); /* * Interface for adding Upper Level Protocols over TCP */ #define TCP_ULP_NAME_MAX 16 #define TCP_ULP_MAX 128 #define TCP_ULP_BUF_MAX (TCP_ULP_NAME_MAX*TCP_ULP_MAX) struct tcp_ulp_ops { struct list_head list; /* initialize ulp */ int (*init)(struct sock *sk); /* update ulp */ void (*update)(struct sock *sk, struct proto *p, void (*write_space)(struct sock *sk)); /* cleanup ulp */ void (*release)(struct sock *sk); /* diagnostic */ int (*get_info)(struct sock *sk, struct sk_buff *skb, bool net_admin); size_t (*get_info_size)(const struct sock *sk, bool net_admin); /* clone ulp */ void (*clone)(const struct request_sock *req, struct sock *newsk, const gfp_t priority); char name[TCP_ULP_NAME_MAX]; struct module *owner; }; int tcp_register_ulp(struct tcp_ulp_ops *type); void tcp_unregister_ulp(struct tcp_ulp_ops *type); int tcp_set_ulp(struct sock *sk, const char *name); void tcp_get_available_ulp(char *buf, size_t len); void tcp_cleanup_ulp(struct sock *sk); void tcp_update_ulp(struct sock *sk, struct proto *p, void (*write_space)(struct sock *sk)); #define MODULE_ALIAS_TCP_ULP(name) \ MODULE_INFO(alias, name); \ MODULE_INFO(alias, "tcp-ulp-" name) #ifdef CONFIG_NET_SOCK_MSG struct sk_msg; struct sk_psock; #ifdef CONFIG_BPF_SYSCALL int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); #ifdef CONFIG_BPF_STREAM_PARSER struct strparser; int tcp_bpf_strp_read_sock(struct strparser *strp, read_descriptor_t *desc, sk_read_actor_t recv_actor); #endif /* CONFIG_BPF_STREAM_PARSER */ #endif /* CONFIG_BPF_SYSCALL */ #ifdef CONFIG_INET void tcp_eat_skb(struct sock *sk, struct sk_buff *skb); #else static inline void tcp_eat_skb(struct sock *sk, struct sk_buff *skb) { } #endif int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress, struct sk_msg *msg, u32 bytes, int flags); #endif /* CONFIG_NET_SOCK_MSG */ #if !defined(CONFIG_BPF_SYSCALL) || !defined(CONFIG_NET_SOCK_MSG) static inline void tcp_bpf_clone(const struct sock *sk, struct sock *newsk) { } #endif #ifdef CONFIG_CGROUP_BPF static inline void bpf_skops_init_skb(struct bpf_sock_ops_kern *skops, struct sk_buff *skb, unsigned int end_offset) { skops->skb = skb; skops->skb_data_end = skb->data + end_offset; } #else static inline void bpf_skops_init_skb(struct bpf_sock_ops_kern *skops, struct sk_buff *skb, unsigned int end_offset) { } #endif /* Call BPF_SOCK_OPS program that returns an int. If the return value * is < 0, then the BPF op failed (for example if the loaded BPF * program does not support the chosen operation or there is no BPF * program loaded). 
*/ #ifdef CONFIG_BPF static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args) { struct bpf_sock_ops_kern sock_ops; int ret; memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp)); if (sk_fullsock(sk)) { sock_ops.is_fullsock = 1; sock_ops.is_locked_tcp_sock = 1; sock_owned_by_me(sk); } sock_ops.sk = sk; sock_ops.op = op; if (nargs > 0) memcpy(sock_ops.args, args, nargs * sizeof(*args)); ret = BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops); if (ret == 0) ret = sock_ops.reply; else ret = -1; return ret; } static inline int tcp_call_bpf_2arg(struct sock *sk, int op, u32 arg1, u32 arg2) { u32 args[2] = {arg1, arg2}; return tcp_call_bpf(sk, op, 2, args); } static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2, u32 arg3) { u32 args[3] = {arg1, arg2, arg3}; return tcp_call_bpf(sk, op, 3, args); } #else static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args) { return -EPERM; } static inline int tcp_call_bpf_2arg(struct sock *sk, int op, u32 arg1, u32 arg2) { return -EPERM; } static inline int tcp_call_bpf_3arg(struct sock *sk, int op, u32 arg1, u32 arg2, u32 arg3) { return -EPERM; } #endif static inline u32 tcp_timeout_init(struct sock *sk) { int timeout; timeout = tcp_call_bpf(sk, BPF_SOCK_OPS_TIMEOUT_INIT, 0, NULL); if (timeout <= 0) timeout = TCP_TIMEOUT_INIT; return min_t(int, timeout, TCP_RTO_MAX); } static inline u32 tcp_rwnd_init_bpf(struct sock *sk) { int rwnd; rwnd = tcp_call_bpf(sk, BPF_SOCK_OPS_RWND_INIT, 0, NULL); if (rwnd < 0) rwnd = 0; return rwnd; } static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk) { return (tcp_call_bpf(sk, BPF_SOCK_OPS_NEEDS_ECN, 0, NULL) == 1); } static inline void tcp_bpf_rtt(struct sock *sk, long mrtt, u32 srtt) { if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_RTT_CB_FLAG)) tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_RTT_CB, mrtt, srtt); } #if IS_ENABLED(CONFIG_SMC) extern struct static_key_false tcp_have_smc; #endif #if IS_ENABLED(CONFIG_TLS_DEVICE) void clean_acked_data_enable(struct tcp_sock *tp, void (*cad)(struct sock *sk, u32 ack_seq)); void clean_acked_data_disable(struct tcp_sock *tp); void clean_acked_data_flush(void); #endif DECLARE_STATIC_KEY_FALSE(tcp_tx_delay_enabled); static inline void tcp_add_tx_delay(struct sk_buff *skb, const struct tcp_sock *tp) { if (static_branch_unlikely(&tcp_tx_delay_enabled)) skb->skb_mstamp_ns += (u64)tp->tcp_tx_delay * NSEC_PER_USEC; } /* Compute Earliest Departure Time for some control packets * like ACK or RST for TIME_WAIT or non ESTABLISHED sockets. */ static inline u64 tcp_transmit_time(const struct sock *sk) { if (static_branch_unlikely(&tcp_tx_delay_enabled)) { u32 delay = (sk->sk_state == TCP_TIME_WAIT) ? 
tcp_twsk(sk)->tw_tx_delay : tcp_sk(sk)->tcp_tx_delay; return tcp_clock_ns() + (u64)delay * NSEC_PER_USEC; } return 0; } static inline int tcp_parse_auth_options(const struct tcphdr *th, const u8 **md5_hash, const struct tcp_ao_hdr **aoh) { const u8 *md5_tmp, *ao_tmp; int ret; ret = tcp_do_parse_auth_options(th, &md5_tmp, &ao_tmp); if (ret) return ret; if (md5_hash) *md5_hash = md5_tmp; if (aoh) { if (!ao_tmp) *aoh = NULL; else *aoh = (struct tcp_ao_hdr *)(ao_tmp - 2); } return 0; } static inline bool tcp_ao_required(struct sock *sk, const void *saddr, int family, int l3index, bool stat_inc) { #ifdef CONFIG_TCP_AO struct tcp_ao_info *ao_info; struct tcp_ao_key *ao_key; if (!static_branch_unlikely(&tcp_ao_needed.key)) return false; ao_info = rcu_dereference_check(tcp_sk(sk)->ao_info, lockdep_sock_is_held(sk)); if (!ao_info) return false; ao_key = tcp_ao_do_lookup(sk, l3index, saddr, family, -1, -1); if (ao_info->ao_required || ao_key) { if (stat_inc) { NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAOREQUIRED); atomic64_inc(&ao_info->counters.ao_required); } return true; } #endif return false; } enum skb_drop_reason tcp_inbound_hash(struct sock *sk, const struct request_sock *req, const struct sk_buff *skb, const void *saddr, const void *daddr, int family, int dif, int sdif); #endif /* _TCP_H */
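/*
 * Illustrative sketch, not part of the header above: one way a caller could
 * use tcp_parse_auth_options() to spot a segment that carries both a TCP-MD5
 * and a TCP-AO option, a combination RFC 5925 forbids on a single segment.
 * The helper name below is hypothetical and only shows the calling convention.
 */
static inline bool tcp_auth_options_conflict(const struct tcphdr *th)
{
	const struct tcp_ao_hdr *aoh;
	const u8 *md5_loc;

	/* A non-zero return means the option area itself was malformed. */
	if (tcp_parse_auth_options(th, &md5_loc, &aoh))
		return true;

	/* Both signature options present on one segment is invalid. */
	return md5_loc && aoh;
}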
/* FUSE: Filesystem in Userspace Copyright (C) 2001-2008 Miklos Szeredi <miklos@szeredi.hu> This program can be distributed under the terms of the GNU GPL. See the file COPYING. */ #include "fuse_i.h" #include "dev_uring_i.h" #include <linux/dax.h> #include <linux/pagemap.h> #include <linux/slab.h> #include <linux/file.h> #include <linux/seq_file.h> #include <linux/init.h> #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/fs_context.h> #include <linux/fs_parser.h> #include <linux/statfs.h> #include <linux/random.h> #include <linux/sched.h> #include <linux/exportfs.h> #include <linux/posix_acl.h> #include <linux/pid_namespace.h> #include <uapi/linux/magic.h> MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>"); MODULE_DESCRIPTION("Filesystem in Userspace"); MODULE_LICENSE("GPL"); static struct kmem_cache *fuse_inode_cachep; struct list_head fuse_conn_list; DEFINE_MUTEX(fuse_mutex); static int set_global_limit(const char *val, const struct kernel_param *kp); unsigned int fuse_max_pages_limit = 256; /* default is no timeout */ unsigned int fuse_default_req_timeout; unsigned int fuse_max_req_timeout; unsigned int max_user_bgreq; module_param_call(max_user_bgreq, set_global_limit, param_get_uint, &max_user_bgreq, 0644); __MODULE_PARM_TYPE(max_user_bgreq, "uint"); MODULE_PARM_DESC(max_user_bgreq, "Global limit for the maximum number of backgrounded requests an " "unprivileged user can set"); unsigned int max_user_congthresh; module_param_call(max_user_congthresh, set_global_limit, param_get_uint, &max_user_congthresh, 0644); __MODULE_PARM_TYPE(max_user_congthresh, "uint"); MODULE_PARM_DESC(max_user_congthresh, "Global limit for the maximum congestion threshold an " "unprivileged user can set"); #define FUSE_DEFAULT_BLKSIZE 512 /** Maximum number of outstanding background requests */ #define FUSE_DEFAULT_MAX_BACKGROUND 12 /** Congestion starts at 75% of maximum */ #define FUSE_DEFAULT_CONGESTION_THRESHOLD (FUSE_DEFAULT_MAX_BACKGROUND * 3 / 4) #ifdef CONFIG_BLOCK static struct file_system_type fuseblk_fs_type; #endif struct fuse_forget_link *fuse_alloc_forget(void) { return kzalloc(sizeof(struct fuse_forget_link), GFP_KERNEL_ACCOUNT); } static struct fuse_submount_lookup *fuse_alloc_submount_lookup(void) { struct fuse_submount_lookup *sl; sl = kzalloc(sizeof(struct fuse_submount_lookup), GFP_KERNEL_ACCOUNT); if (!sl) return NULL; sl->forget = fuse_alloc_forget(); if (!sl->forget) goto out_free; return sl; out_free: kfree(sl); return NULL; } static struct inode *fuse_alloc_inode(struct super_block *sb) { struct fuse_inode *fi; fi = alloc_inode_sb(sb, fuse_inode_cachep, GFP_KERNEL); if (!fi) return NULL; fi->i_time = 0; fi->inval_mask = ~0; fi->nodeid = 0; fi->nlookup = 0; fi->attr_version = 0; fi->orig_ino = 0; fi->state = 0; fi->submount_lookup = NULL; mutex_init(&fi->mutex);
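/*
 * Note: the forget link is allocated up front, while the inode is being set
 * up, so that fuse_evict_inode() below can always queue a FUSE_FORGET for
 * this nodeid without having to allocate memory on the eviction path.
 */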
spin_lock_init(&fi->lock); fi->forget = fuse_alloc_forget(); if (!fi->forget) goto out_free; if (IS_ENABLED(CONFIG_FUSE_DAX) && !fuse_dax_inode_alloc(sb, fi)) goto out_free_forget; if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) fuse_inode_backing_set(fi, NULL); return &fi->inode; out_free_forget: kfree(fi->forget); out_free: kmem_cache_free(fuse_inode_cachep, fi); return NULL; } static void fuse_free_inode(struct inode *inode) { struct fuse_inode *fi = get_fuse_inode(inode); mutex_destroy(&fi->mutex); kfree(fi->forget); #ifdef CONFIG_FUSE_DAX kfree(fi->dax); #endif if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) fuse_backing_put(fuse_inode_backing(fi)); kmem_cache_free(fuse_inode_cachep, fi); } static void fuse_cleanup_submount_lookup(struct fuse_conn *fc, struct fuse_submount_lookup *sl) { if (!refcount_dec_and_test(&sl->count)) return; fuse_queue_forget(fc, sl->forget, sl->nodeid, 1); sl->forget = NULL; kfree(sl); } static void fuse_evict_inode(struct inode *inode) { struct fuse_inode *fi = get_fuse_inode(inode); /* Will write inode on close/munmap and in all other dirtiers */ WARN_ON(inode->i_state & I_DIRTY_INODE); if (FUSE_IS_DAX(inode)) dax_break_layout_final(inode); truncate_inode_pages_final(&inode->i_data); clear_inode(inode); if (inode->i_sb->s_flags & SB_ACTIVE) { struct fuse_conn *fc = get_fuse_conn(inode); if (FUSE_IS_DAX(inode)) fuse_dax_inode_cleanup(inode); if (fi->nlookup) { fuse_queue_forget(fc, fi->forget, fi->nodeid, fi->nlookup); fi->forget = NULL; } if (fi->submount_lookup) { fuse_cleanup_submount_lookup(fc, fi->submount_lookup); fi->submount_lookup = NULL; } /* * Evict of non-deleted inode may race with outstanding * LOOKUP/READDIRPLUS requests and result in inconsistency when * the request finishes. Deal with that here by bumping a * counter that can be compared to the starting value. */ if (inode->i_nlink > 0) atomic64_inc(&fc->evict_ctr); } if (S_ISREG(inode->i_mode) && !fuse_is_bad(inode)) { WARN_ON(fi->iocachectr != 0); WARN_ON(!list_empty(&fi->write_files)); WARN_ON(!list_empty(&fi->queued_writes)); } } static int fuse_reconfigure(struct fs_context *fsc) { struct super_block *sb = fsc->root->d_sb; sync_filesystem(sb); if (fsc->sb_flags & SB_MANDLOCK) return -EINVAL; return 0; } /* * ino_t is 32-bits on 32-bit arch. We have to squash the 64-bit value down * so that it will fit. */ static ino_t fuse_squash_ino(u64 ino64) { ino_t ino = (ino_t) ino64; if (sizeof(ino_t) < sizeof(u64)) ino ^= ino64 >> (sizeof(u64) - sizeof(ino_t)) * 8; return ino; } void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, struct fuse_statx *sx, u64 attr_valid, u32 cache_mask, u64 evict_ctr) { struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_inode *fi = get_fuse_inode(inode); lockdep_assert_held(&fi->lock); /* * Clear basic stats from invalid mask. * * Don't do this if this is coming from a fuse_iget() call and there * might have been a racing evict which would've invalidated the result * if the attr_version would've been preserved. 
* * !evict_ctr -> this is a create * fi->attr_version != 0 -> this is not a new inode * evict_ctr == fuse_get_evict_ctr() -> no evicts during the request */ if (!evict_ctr || fi->attr_version || evict_ctr == fuse_get_evict_ctr(fc)) set_mask_bits(&fi->inval_mask, STATX_BASIC_STATS, 0); fi->attr_version = atomic64_inc_return(&fc->attr_version); fi->i_time = attr_valid; inode->i_ino = fuse_squash_ino(attr->ino); inode->i_mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); set_nlink(inode, attr->nlink); inode->i_uid = make_kuid(fc->user_ns, attr->uid); inode->i_gid = make_kgid(fc->user_ns, attr->gid); inode->i_blocks = attr->blocks; /* Sanitize nsecs */ attr->atimensec = min_t(u32, attr->atimensec, NSEC_PER_SEC - 1); attr->mtimensec = min_t(u32, attr->mtimensec, NSEC_PER_SEC - 1); attr->ctimensec = min_t(u32, attr->ctimensec, NSEC_PER_SEC - 1); inode_set_atime(inode, attr->atime, attr->atimensec); /* mtime from server may be stale due to local buffered write */ if (!(cache_mask & STATX_MTIME)) { inode_set_mtime(inode, attr->mtime, attr->mtimensec); } if (!(cache_mask & STATX_CTIME)) { inode_set_ctime(inode, attr->ctime, attr->ctimensec); } if (sx) { /* Sanitize nsecs */ sx->btime.tv_nsec = min_t(u32, sx->btime.tv_nsec, NSEC_PER_SEC - 1); /* * Btime has been queried, so the cache is valid (whether or not btime * is available) and STATX_BTIME can be cleared from inval_mask. * * Availability of the btime attribute is indicated in * FUSE_I_BTIME */ set_mask_bits(&fi->inval_mask, STATX_BTIME, 0); if (sx->mask & STATX_BTIME) { set_bit(FUSE_I_BTIME, &fi->state); fi->i_btime.tv_sec = sx->btime.tv_sec; fi->i_btime.tv_nsec = sx->btime.tv_nsec; } } /* * Don't set the sticky bit in i_mode, unless we want the VFS * to check permissions. This prevents failures due to the * check in may_delete(). */ fi->orig_i_mode = inode->i_mode; if (!fc->default_permissions) inode->i_mode &= ~S_ISVTX; fi->orig_ino = attr->ino; /* * We are refreshing inode data and it is possible that another * client set suid/sgid or security.capability xattr. So clear * S_NOSEC. Ideally, we could have cleared it only if suid/sgid * was set or if security.capability xattr was set. But we don't * know if security.capability has been set or not. So clear it * anyway. It's less efficient but should be safe. */ inode->i_flags &= ~S_NOSEC; } u32 fuse_get_cache_mask(struct inode *inode) { struct fuse_conn *fc = get_fuse_conn(inode); if (!fc->writeback_cache || !S_ISREG(inode->i_mode)) return 0; return STATX_MTIME | STATX_CTIME | STATX_SIZE; } static void fuse_change_attributes_i(struct inode *inode, struct fuse_attr *attr, struct fuse_statx *sx, u64 attr_valid, u64 attr_version, u64 evict_ctr) { struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_inode *fi = get_fuse_inode(inode); u32 cache_mask; loff_t oldsize; struct timespec64 old_mtime; spin_lock(&fi->lock); /* * In case of writeback_cache enabled, writes update mtime, ctime and * may update i_size. In these cases trust the cached value in the * inode.
*/ cache_mask = fuse_get_cache_mask(inode); if (cache_mask & STATX_SIZE) attr->size = i_size_read(inode); if (cache_mask & STATX_MTIME) { attr->mtime = inode_get_mtime_sec(inode); attr->mtimensec = inode_get_mtime_nsec(inode); } if (cache_mask & STATX_CTIME) { attr->ctime = inode_get_ctime_sec(inode); attr->ctimensec = inode_get_ctime_nsec(inode); } if ((attr_version != 0 && fi->attr_version > attr_version) || test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state)) { spin_unlock(&fi->lock); return; } old_mtime = inode_get_mtime(inode); fuse_change_attributes_common(inode, attr, sx, attr_valid, cache_mask, evict_ctr); oldsize = inode->i_size; /* * In case of writeback_cache enabled, the cached writes beyond EOF * extend local i_size without keeping userspace server in sync. So, * attr->size coming from server can be stale. We cannot trust it. */ if (!(cache_mask & STATX_SIZE)) i_size_write(inode, attr->size); spin_unlock(&fi->lock); if (!cache_mask && S_ISREG(inode->i_mode)) { bool inval = false; if (oldsize != attr->size) { truncate_pagecache(inode, attr->size); if (!fc->explicit_inval_data) inval = true; } else if (fc->auto_inval_data) { struct timespec64 new_mtime = { .tv_sec = attr->mtime, .tv_nsec = attr->mtimensec, }; /* * Auto inval mode also checks and invalidates if mtime * has changed. */ if (!timespec64_equal(&old_mtime, &new_mtime)) inval = true; } if (inval) invalidate_inode_pages2(inode->i_mapping); } if (IS_ENABLED(CONFIG_FUSE_DAX)) fuse_dax_dontcache(inode, attr->flags); } void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, struct fuse_statx *sx, u64 attr_valid, u64 attr_version) { fuse_change_attributes_i(inode, attr, sx, attr_valid, attr_version, 0); } static void fuse_init_submount_lookup(struct fuse_submount_lookup *sl, u64 nodeid) { sl->nodeid = nodeid; refcount_set(&sl->count, 1); } static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr, struct fuse_conn *fc) { inode->i_mode = attr->mode & S_IFMT; inode->i_size = attr->size; inode_set_mtime(inode, attr->mtime, attr->mtimensec); inode_set_ctime(inode, attr->ctime, attr->ctimensec); if (S_ISREG(inode->i_mode)) { fuse_init_common(inode); fuse_init_file_inode(inode, attr->flags); } else if (S_ISDIR(inode->i_mode)) fuse_init_dir(inode); else if (S_ISLNK(inode->i_mode)) fuse_init_symlink(inode); else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) || S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) { fuse_init_common(inode); init_special_inode(inode, inode->i_mode, new_decode_dev(attr->rdev)); } else BUG(); /* * Ensure that we don't cache acls for daemons without FUSE_POSIX_ACL * so they see the exact same behavior as before. */ if (!fc->posix_acl) inode->i_acl = inode->i_default_acl = ACL_DONT_CACHE; } static int fuse_inode_eq(struct inode *inode, void *_nodeidp) { u64 nodeid = *(u64 *) _nodeidp; if (get_node_id(inode) == nodeid) return 1; else return 0; } static int fuse_inode_set(struct inode *inode, void *_nodeidp) { u64 nodeid = *(u64 *) _nodeidp; get_fuse_inode(inode)->nodeid = nodeid; return 0; } struct inode *fuse_iget(struct super_block *sb, u64 nodeid, int generation, struct fuse_attr *attr, u64 attr_valid, u64 attr_version, u64 evict_ctr) { struct inode *inode; struct fuse_inode *fi; struct fuse_conn *fc = get_fuse_conn_super(sb); /* * Auto mount points get their node id from the submount root, which is * not a unique identifier within this filesystem. * * To avoid conflicts, do not place submount points into the inode hash * table. 
*/ if (fc->auto_submounts && (attr->flags & FUSE_ATTR_SUBMOUNT) && S_ISDIR(attr->mode)) { struct fuse_inode *fi; inode = new_inode(sb); if (!inode) return NULL; fuse_init_inode(inode, attr, fc); fi = get_fuse_inode(inode); fi->nodeid = nodeid; fi->submount_lookup = fuse_alloc_submount_lookup(); if (!fi->submount_lookup) { iput(inode); return NULL; } /* Sets nlookup = 1 on fi->submount_lookup->nlookup */ fuse_init_submount_lookup(fi->submount_lookup, nodeid); inode->i_flags |= S_AUTOMOUNT; goto done; } retry: inode = iget5_locked(sb, nodeid, fuse_inode_eq, fuse_inode_set, &nodeid); if (!inode) return NULL; if ((inode->i_state & I_NEW)) { inode->i_flags |= S_NOATIME; if (!fc->writeback_cache || !S_ISREG(attr->mode)) inode->i_flags |= S_NOCMTIME; inode->i_generation = generation; fuse_init_inode(inode, attr, fc); unlock_new_inode(inode); } else if (fuse_stale_inode(inode, generation, attr)) { /* nodeid was reused, any I/O on the old inode should fail */ fuse_make_bad(inode); if (inode != d_inode(sb->s_root)) { remove_inode_hash(inode); iput(inode); goto retry; } } fi = get_fuse_inode(inode); spin_lock(&fi->lock); fi->nlookup++; spin_unlock(&fi->lock); done: fuse_change_attributes_i(inode, attr, NULL, attr_valid, attr_version, evict_ctr); return inode; } struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, struct fuse_mount **fm) { struct fuse_mount *fm_iter; struct inode *inode; WARN_ON(!rwsem_is_locked(&fc->killsb)); list_for_each_entry(fm_iter, &fc->mounts, fc_entry) { if (!fm_iter->sb) continue; inode = ilookup5(fm_iter->sb, nodeid, fuse_inode_eq, &nodeid); if (inode) { if (fm) *fm = fm_iter; return inode; } } return NULL; } int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, loff_t offset, loff_t len) { struct fuse_inode *fi; struct inode *inode; pgoff_t pg_start; pgoff_t pg_end; inode = fuse_ilookup(fc, nodeid, NULL); if (!inode) return -ENOENT; fi = get_fuse_inode(inode); spin_lock(&fi->lock); fi->attr_version = atomic64_inc_return(&fc->attr_version); spin_unlock(&fi->lock); fuse_invalidate_attr(inode); forget_all_cached_acls(inode); if (offset >= 0) { pg_start = offset >> PAGE_SHIFT; if (len <= 0) pg_end = -1; else pg_end = (offset + len - 1) >> PAGE_SHIFT; invalidate_inode_pages2_range(inode->i_mapping, pg_start, pg_end); } iput(inode); return 0; } bool fuse_lock_inode(struct inode *inode) { bool locked = false; if (!get_fuse_conn(inode)->parallel_dirops) { mutex_lock(&get_fuse_inode(inode)->mutex); locked = true; } return locked; } void fuse_unlock_inode(struct inode *inode, bool locked) { if (locked) mutex_unlock(&get_fuse_inode(inode)->mutex); } static void fuse_umount_begin(struct super_block *sb) { struct fuse_conn *fc = get_fuse_conn_super(sb); if (fc->no_force_umount) return; fuse_abort_conn(fc); // Only retire block-device-based superblocks. 
if (sb->s_bdev != NULL) retire_super(sb); } static void fuse_send_destroy(struct fuse_mount *fm) { if (fm->fc->conn_init) { FUSE_ARGS(args); args.opcode = FUSE_DESTROY; args.force = true; args.nocreds = true; fuse_simple_request(fm, &args); } } static void convert_fuse_statfs(struct kstatfs *stbuf, struct fuse_kstatfs *attr) { stbuf->f_type = FUSE_SUPER_MAGIC; stbuf->f_bsize = attr->bsize; stbuf->f_frsize = attr->frsize; stbuf->f_blocks = attr->blocks; stbuf->f_bfree = attr->bfree; stbuf->f_bavail = attr->bavail; stbuf->f_files = attr->files; stbuf->f_ffree = attr->ffree; stbuf->f_namelen = attr->namelen; /* fsid is left zero */ } static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; struct fuse_mount *fm = get_fuse_mount_super(sb); FUSE_ARGS(args); struct fuse_statfs_out outarg; int err; if (!fuse_allow_current_process(fm->fc)) { buf->f_type = FUSE_SUPER_MAGIC; return 0; } memset(&outarg, 0, sizeof(outarg)); args.in_numargs = 0; args.opcode = FUSE_STATFS; args.nodeid = get_node_id(d_inode(dentry)); args.out_numargs = 1; args.out_args[0].size = sizeof(outarg); args.out_args[0].value = &outarg; err = fuse_simple_request(fm, &args); if (!err) convert_fuse_statfs(buf, &outarg.st); return err; } static struct fuse_sync_bucket *fuse_sync_bucket_alloc(void) { struct fuse_sync_bucket *bucket; bucket = kzalloc(sizeof(*bucket), GFP_KERNEL | __GFP_NOFAIL); if (bucket) { init_waitqueue_head(&bucket->waitq); /* Initial active count */ atomic_set(&bucket->count, 1); } return bucket; } static void fuse_sync_fs_writes(struct fuse_conn *fc) { struct fuse_sync_bucket *bucket, *new_bucket; int count; new_bucket = fuse_sync_bucket_alloc(); spin_lock(&fc->lock); bucket = rcu_dereference_protected(fc->curr_bucket, 1); count = atomic_read(&bucket->count); WARN_ON(count < 1); /* No outstanding writes? */ if (count == 1) { spin_unlock(&fc->lock); kfree(new_bucket); return; } /* * Completion of new bucket depends on completion of this bucket, so add * one more count. */ atomic_inc(&new_bucket->count); rcu_assign_pointer(fc->curr_bucket, new_bucket); spin_unlock(&fc->lock); /* * Drop initial active count. At this point if all writes in this and * ancestor buckets complete, the count will go to zero and this task * will be woken up. */ atomic_dec(&bucket->count); wait_event(bucket->waitq, atomic_read(&bucket->count) == 0); /* Drop temp count on descendant bucket */ fuse_sync_bucket_dec(new_bucket); kfree_rcu(bucket, rcu); } static int fuse_sync_fs(struct super_block *sb, int wait) { struct fuse_mount *fm = get_fuse_mount_super(sb); struct fuse_conn *fc = fm->fc; struct fuse_syncfs_in inarg; FUSE_ARGS(args); int err; /* * Userspace cannot handle the wait == 0 case. Avoid a * gratuitous roundtrip. */ if (!wait) return 0; /* The filesystem is being unmounted. Nothing to do. 
*/ if (!sb->s_root) return 0; if (!fc->sync_fs) return 0; fuse_sync_fs_writes(fc); memset(&inarg, 0, sizeof(inarg)); args.in_numargs = 1; args.in_args[0].size = sizeof(inarg); args.in_args[0].value = &inarg; args.opcode = FUSE_SYNCFS; args.nodeid = get_node_id(sb->s_root->d_inode); args.out_numargs = 0; err = fuse_simple_request(fm, &args); if (err == -ENOSYS) { fc->sync_fs = 0; err = 0; } return err; } enum { OPT_SOURCE, OPT_SUBTYPE, OPT_FD, OPT_ROOTMODE, OPT_USER_ID, OPT_GROUP_ID, OPT_DEFAULT_PERMISSIONS, OPT_ALLOW_OTHER, OPT_MAX_READ, OPT_BLKSIZE, OPT_ERR }; static const struct fs_parameter_spec fuse_fs_parameters[] = { fsparam_string ("source", OPT_SOURCE), fsparam_u32 ("fd", OPT_FD), fsparam_u32oct ("rootmode", OPT_ROOTMODE), fsparam_uid ("user_id", OPT_USER_ID), fsparam_gid ("group_id", OPT_GROUP_ID), fsparam_flag ("default_permissions", OPT_DEFAULT_PERMISSIONS), fsparam_flag ("allow_other", OPT_ALLOW_OTHER), fsparam_u32 ("max_read", OPT_MAX_READ), fsparam_u32 ("blksize", OPT_BLKSIZE), fsparam_string ("subtype", OPT_SUBTYPE), {} }; static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param) { struct fs_parse_result result; struct fuse_fs_context *ctx = fsc->fs_private; int opt; kuid_t kuid; kgid_t kgid; if (fsc->purpose == FS_CONTEXT_FOR_RECONFIGURE) { /* * Ignore options coming from mount(MS_REMOUNT) for backward * compatibility. */ if (fsc->oldapi) return 0; return invalfc(fsc, "No changes allowed in reconfigure"); } opt = fs_parse(fsc, fuse_fs_parameters, param, &result); if (opt < 0) return opt; switch (opt) { case OPT_SOURCE: if (fsc->source) return invalfc(fsc, "Multiple sources specified"); fsc->source = param->string; param->string = NULL; break; case OPT_SUBTYPE: if (ctx->subtype) return invalfc(fsc, "Multiple subtypes specified"); ctx->subtype = param->string; param->string = NULL; return 0; case OPT_FD: ctx->fd = result.uint_32; ctx->fd_present = true; break; case OPT_ROOTMODE: if (!fuse_valid_type(result.uint_32)) return invalfc(fsc, "Invalid rootmode"); ctx->rootmode = result.uint_32; ctx->rootmode_present = true; break; case OPT_USER_ID: kuid = result.uid; /* * The requested uid must be representable in the * filesystem's idmapping. */ if (!kuid_has_mapping(fsc->user_ns, kuid)) return invalfc(fsc, "Invalid user_id"); ctx->user_id = kuid; ctx->user_id_present = true; break; case OPT_GROUP_ID: kgid = result.gid; /* * The requested gid must be representable in the * filesystem's idmapping. 
*/ if (!kgid_has_mapping(fsc->user_ns, kgid)) return invalfc(fsc, "Invalid group_id"); ctx->group_id = kgid; ctx->group_id_present = true; break; case OPT_DEFAULT_PERMISSIONS: ctx->default_permissions = true; break; case OPT_ALLOW_OTHER: ctx->allow_other = true; break; case OPT_MAX_READ: ctx->max_read = result.uint_32; break; case OPT_BLKSIZE: if (!ctx->is_bdev) return invalfc(fsc, "blksize only supported for fuseblk"); ctx->blksize = result.uint_32; break; default: return -EINVAL; } return 0; } static void fuse_free_fsc(struct fs_context *fsc) { struct fuse_fs_context *ctx = fsc->fs_private; if (ctx) { kfree(ctx->subtype); kfree(ctx); } } static int fuse_show_options(struct seq_file *m, struct dentry *root) { struct super_block *sb = root->d_sb; struct fuse_conn *fc = get_fuse_conn_super(sb); if (fc->legacy_opts_show) { seq_printf(m, ",user_id=%u", from_kuid_munged(fc->user_ns, fc->user_id)); seq_printf(m, ",group_id=%u", from_kgid_munged(fc->user_ns, fc->group_id)); if (fc->default_permissions) seq_puts(m, ",default_permissions"); if (fc->allow_other) seq_puts(m, ",allow_other"); if (fc->max_read != ~0) seq_printf(m, ",max_read=%u", fc->max_read); if (sb->s_bdev && sb->s_blocksize != FUSE_DEFAULT_BLKSIZE) seq_printf(m, ",blksize=%lu", sb->s_blocksize); } #ifdef CONFIG_FUSE_DAX if (fc->dax_mode == FUSE_DAX_ALWAYS) seq_puts(m, ",dax=always"); else if (fc->dax_mode == FUSE_DAX_NEVER) seq_puts(m, ",dax=never"); else if (fc->dax_mode == FUSE_DAX_INODE_USER) seq_puts(m, ",dax=inode"); #endif return 0; } static void fuse_iqueue_init(struct fuse_iqueue *fiq, const struct fuse_iqueue_ops *ops, void *priv) { memset(fiq, 0, sizeof(struct fuse_iqueue)); spin_lock_init(&fiq->lock); init_waitqueue_head(&fiq->waitq); INIT_LIST_HEAD(&fiq->pending); INIT_LIST_HEAD(&fiq->interrupts); fiq->forget_list_tail = &fiq->forget_list_head; fiq->connected = 1; fiq->ops = ops; fiq->priv = priv; } void fuse_pqueue_init(struct fuse_pqueue *fpq) { unsigned int i; spin_lock_init(&fpq->lock); for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) INIT_LIST_HEAD(&fpq->processing[i]); INIT_LIST_HEAD(&fpq->io); fpq->connected = 1; } void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm, struct user_namespace *user_ns, const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv) { memset(fc, 0, sizeof(*fc)); spin_lock_init(&fc->lock); spin_lock_init(&fc->bg_lock); init_rwsem(&fc->killsb); refcount_set(&fc->count, 1); atomic_set(&fc->dev_count, 1); atomic_set(&fc->epoch, 1); init_waitqueue_head(&fc->blocked_waitq); fuse_iqueue_init(&fc->iq, fiq_ops, fiq_priv); INIT_LIST_HEAD(&fc->bg_queue); INIT_LIST_HEAD(&fc->entry); INIT_LIST_HEAD(&fc->devices); atomic_set(&fc->num_waiting, 0); fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND; fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD; atomic64_set(&fc->khctr, 0); fc->polled_files = RB_ROOT; fc->blocked = 0; fc->initialized = 0; fc->connected = 1; atomic64_set(&fc->attr_version, 1); atomic64_set(&fc->evict_ctr, 1); get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); fc->user_ns = get_user_ns(user_ns); fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ; fc->max_pages_limit = fuse_max_pages_limit; fc->name_max = FUSE_NAME_LOW_MAX; fc->timeout.req_timeout = 0; if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) fuse_backing_files_init(fc); INIT_LIST_HEAD(&fc->mounts); list_add(&fm->fc_entry, &fc->mounts); fm->fc = fc; } EXPORT_SYMBOL_GPL(fuse_conn_init); static void delayed_release(struct rcu_head *p) { struct fuse_conn *fc = 
container_of(p, struct fuse_conn, rcu); fuse_uring_destruct(fc); put_user_ns(fc->user_ns); fc->release(fc); } void fuse_conn_put(struct fuse_conn *fc) { if (refcount_dec_and_test(&fc->count)) { struct fuse_iqueue *fiq = &fc->iq; struct fuse_sync_bucket *bucket; if (IS_ENABLED(CONFIG_FUSE_DAX)) fuse_dax_conn_free(fc); if (fc->timeout.req_timeout) cancel_delayed_work_sync(&fc->timeout.work); if (fiq->ops->release) fiq->ops->release(fiq); put_pid_ns(fc->pid_ns); bucket = rcu_dereference_protected(fc->curr_bucket, 1); if (bucket) { WARN_ON(atomic_read(&bucket->count) != 1); kfree(bucket); } if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) fuse_backing_files_free(fc); call_rcu(&fc->rcu, delayed_release); } } EXPORT_SYMBOL_GPL(fuse_conn_put); struct fuse_conn *fuse_conn_get(struct fuse_conn *fc) { refcount_inc(&fc->count); return fc; } EXPORT_SYMBOL_GPL(fuse_conn_get); static struct inode *fuse_get_root_inode(struct super_block *sb, unsigned int mode) { struct fuse_attr attr; memset(&attr, 0, sizeof(attr)); attr.mode = mode; attr.ino = FUSE_ROOT_ID; attr.nlink = 1; return fuse_iget(sb, FUSE_ROOT_ID, 0, &attr, 0, 0, 0); } struct fuse_inode_handle { u64 nodeid; u32 generation; }; static struct dentry *fuse_get_dentry(struct super_block *sb, struct fuse_inode_handle *handle) { struct fuse_conn *fc = get_fuse_conn_super(sb); struct inode *inode; struct dentry *entry; int err = -ESTALE; if (handle->nodeid == 0) goto out_err; inode = ilookup5(sb, handle->nodeid, fuse_inode_eq, &handle->nodeid); if (!inode) { struct fuse_entry_out outarg; const struct qstr name = QSTR_INIT(".", 1); if (!fc->export_support) goto out_err; err = fuse_lookup_name(sb, handle->nodeid, &name, &outarg, &inode); if (err && err != -ENOENT) goto out_err; if (err || !inode) { err = -ESTALE; goto out_err; } err = -EIO; if (get_node_id(inode) != handle->nodeid) goto out_iput; } err = -ESTALE; if (inode->i_generation != handle->generation) goto out_iput; entry = d_obtain_alias(inode); if (!IS_ERR(entry) && get_node_id(inode) != FUSE_ROOT_ID) fuse_invalidate_entry_cache(entry); return entry; out_iput: iput(inode); out_err: return ERR_PTR(err); } static int fuse_encode_fh(struct inode *inode, u32 *fh, int *max_len, struct inode *parent) { int len = parent ? 6 : 3; u64 nodeid; u32 generation; if (*max_len < len) { *max_len = len; return FILEID_INVALID; } nodeid = get_fuse_inode(inode)->nodeid; generation = inode->i_generation; fh[0] = (u32)(nodeid >> 32); fh[1] = (u32)(nodeid & 0xffffffff); fh[2] = generation; if (parent) { nodeid = get_fuse_inode(parent)->nodeid; generation = parent->i_generation; fh[3] = (u32)(nodeid >> 32); fh[4] = (u32)(nodeid & 0xffffffff); fh[5] = generation; } *max_len = len; return parent ? 
FILEID_INO64_GEN_PARENT : FILEID_INO64_GEN; } static struct dentry *fuse_fh_to_dentry(struct super_block *sb, struct fid *fid, int fh_len, int fh_type) { struct fuse_inode_handle handle; if ((fh_type != FILEID_INO64_GEN && fh_type != FILEID_INO64_GEN_PARENT) || fh_len < 3) return NULL; handle.nodeid = (u64) fid->raw[0] << 32; handle.nodeid |= (u64) fid->raw[1]; handle.generation = fid->raw[2]; return fuse_get_dentry(sb, &handle); } static struct dentry *fuse_fh_to_parent(struct super_block *sb, struct fid *fid, int fh_len, int fh_type) { struct fuse_inode_handle parent; if (fh_type != FILEID_INO64_GEN_PARENT || fh_len < 6) return NULL; parent.nodeid = (u64) fid->raw[3] << 32; parent.nodeid |= (u64) fid->raw[4]; parent.generation = fid->raw[5]; return fuse_get_dentry(sb, &parent); } static struct dentry *fuse_get_parent(struct dentry *child) { struct inode *child_inode = d_inode(child); struct fuse_conn *fc = get_fuse_conn(child_inode); struct inode *inode; struct dentry *parent; struct fuse_entry_out outarg; int err; if (!fc->export_support) return ERR_PTR(-ESTALE); err = fuse_lookup_name(child_inode->i_sb, get_node_id(child_inode), &dotdot_name, &outarg, &inode); if (err) { if (err == -ENOENT) return ERR_PTR(-ESTALE); return ERR_PTR(err); } parent = d_obtain_alias(inode); if (!IS_ERR(parent) && get_node_id(inode) != FUSE_ROOT_ID) fuse_invalidate_entry_cache(parent); return parent; } /* only for fid encoding; no support for file handle */ static const struct export_operations fuse_export_fid_operations = { .encode_fh = fuse_encode_fh, }; static const struct export_operations fuse_export_operations = { .fh_to_dentry = fuse_fh_to_dentry, .fh_to_parent = fuse_fh_to_parent, .encode_fh = fuse_encode_fh, .get_parent = fuse_get_parent, }; static const struct super_operations fuse_super_operations = { .alloc_inode = fuse_alloc_inode, .free_inode = fuse_free_inode, .evict_inode = fuse_evict_inode, .write_inode = fuse_write_inode, .drop_inode = generic_delete_inode, .umount_begin = fuse_umount_begin, .statfs = fuse_statfs, .sync_fs = fuse_sync_fs, .show_options = fuse_show_options, }; static void sanitize_global_limit(unsigned int *limit) { /* * The default maximum number of async requests is calculated to consume * 1/2^13 of the total memory, assuming 392 bytes per request. 
*/ if (*limit == 0) *limit = ((totalram_pages() << PAGE_SHIFT) >> 13) / 392; if (*limit >= 1 << 16) *limit = (1 << 16) - 1; } static int set_global_limit(const char *val, const struct kernel_param *kp) { int rv; rv = param_set_uint(val, kp); if (rv) return rv; sanitize_global_limit((unsigned int *)kp->arg); return 0; } static void process_init_limits(struct fuse_conn *fc, struct fuse_init_out *arg) { int cap_sys_admin = capable(CAP_SYS_ADMIN); if (arg->minor < 13) return; sanitize_global_limit(&max_user_bgreq); sanitize_global_limit(&max_user_congthresh); spin_lock(&fc->bg_lock); if (arg->max_background) { fc->max_background = arg->max_background; if (!cap_sys_admin && fc->max_background > max_user_bgreq) fc->max_background = max_user_bgreq; } if (arg->congestion_threshold) { fc->congestion_threshold = arg->congestion_threshold; if (!cap_sys_admin && fc->congestion_threshold > max_user_congthresh) fc->congestion_threshold = max_user_congthresh; } spin_unlock(&fc->bg_lock); } static void set_request_timeout(struct fuse_conn *fc, unsigned int timeout) { fc->timeout.req_timeout = secs_to_jiffies(timeout); INIT_DELAYED_WORK(&fc->timeout.work, fuse_check_timeout); queue_delayed_work(system_wq, &fc->timeout.work, fuse_timeout_timer_freq); } static void init_server_timeout(struct fuse_conn *fc, unsigned int timeout) { if (!timeout && !fuse_max_req_timeout && !fuse_default_req_timeout) return; if (!timeout) timeout = fuse_default_req_timeout; if (fuse_max_req_timeout) { if (timeout) timeout = min(fuse_max_req_timeout, timeout); else timeout = fuse_max_req_timeout; } timeout = max(FUSE_TIMEOUT_TIMER_FREQ, timeout); set_request_timeout(fc, timeout); } struct fuse_init_args { struct fuse_args args; struct fuse_init_in in; struct fuse_init_out out; }; static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args, int error) { struct fuse_conn *fc = fm->fc; struct fuse_init_args *ia = container_of(args, typeof(*ia), args); struct fuse_init_out *arg = &ia->out; bool ok = true; if (error || arg->major != FUSE_KERNEL_VERSION) ok = false; else { unsigned long ra_pages; unsigned int timeout = 0; process_init_limits(fc, arg); if (arg->minor >= 6) { u64 flags = arg->flags; if (flags & FUSE_INIT_EXT) flags |= (u64) arg->flags2 << 32; ra_pages = arg->max_readahead / PAGE_SIZE; if (flags & FUSE_ASYNC_READ) fc->async_read = 1; if (!(flags & FUSE_POSIX_LOCKS)) fc->no_lock = 1; if (arg->minor >= 17) { if (!(flags & FUSE_FLOCK_LOCKS)) fc->no_flock = 1; } else { if (!(flags & FUSE_POSIX_LOCKS)) fc->no_flock = 1; } if (flags & FUSE_ATOMIC_O_TRUNC) fc->atomic_o_trunc = 1; if (arg->minor >= 9) { /* LOOKUP has dependency on proto version */ if (flags & FUSE_EXPORT_SUPPORT) fc->export_support = 1; } if (flags & FUSE_BIG_WRITES) fc->big_writes = 1; if (flags & FUSE_DONT_MASK) fc->dont_mask = 1; if (flags & FUSE_AUTO_INVAL_DATA) fc->auto_inval_data = 1; else if (flags & FUSE_EXPLICIT_INVAL_DATA) fc->explicit_inval_data = 1; if (flags & FUSE_DO_READDIRPLUS) { fc->do_readdirplus = 1; if (flags & FUSE_READDIRPLUS_AUTO) fc->readdirplus_auto = 1; } if (flags & FUSE_ASYNC_DIO) fc->async_dio = 1; if (flags & FUSE_WRITEBACK_CACHE) fc->writeback_cache = 1; if (flags & FUSE_PARALLEL_DIROPS) fc->parallel_dirops = 1; if (flags & FUSE_HANDLE_KILLPRIV) fc->handle_killpriv = 1; if (arg->time_gran && arg->time_gran <= 1000000000) fm->sb->s_time_gran = arg->time_gran; if ((flags & FUSE_POSIX_ACL)) { fc->default_permissions = 1; fc->posix_acl = 1; } if (flags & FUSE_CACHE_SYMLINKS) fc->cache_symlinks = 1; if (flags & 
FUSE_ABORT_ERROR) fc->abort_err = 1; if (flags & FUSE_MAX_PAGES) { fc->max_pages = min_t(unsigned int, fc->max_pages_limit, max_t(unsigned int, arg->max_pages, 1)); /* * PATH_MAX file names might need two pages for * ops like rename */ if (fc->max_pages > 1) fc->name_max = FUSE_NAME_MAX; } if (IS_ENABLED(CONFIG_FUSE_DAX)) { if (flags & FUSE_MAP_ALIGNMENT && !fuse_dax_check_alignment(fc, arg->map_alignment)) { ok = false; } if (flags & FUSE_HAS_INODE_DAX) fc->inode_dax = 1; } if (flags & FUSE_HANDLE_KILLPRIV_V2) { fc->handle_killpriv_v2 = 1; fm->sb->s_flags |= SB_NOSEC; } if (flags & FUSE_SETXATTR_EXT) fc->setxattr_ext = 1; if (flags & FUSE_SECURITY_CTX) fc->init_security = 1; if (flags & FUSE_CREATE_SUPP_GROUP) fc->create_supp_group = 1; if (flags & FUSE_DIRECT_IO_ALLOW_MMAP) fc->direct_io_allow_mmap = 1; /* * max_stack_depth is the max stack depth of FUSE fs, * so it has to be at least 1 to support passthrough * to backing files. * * with max_stack_depth > 1, the backing files can be * on a stacked fs (e.g. overlayfs) themselves and with * max_stack_depth == 1, FUSE fs can be stacked as the * underlying fs of a stacked fs (e.g. overlayfs). * * Also don't allow the combination of FUSE_PASSTHROUGH * and FUSE_WRITEBACK_CACHE, current design doesn't handle * them together. */ if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH) && (flags & FUSE_PASSTHROUGH) && arg->max_stack_depth > 0 && arg->max_stack_depth <= FILESYSTEM_MAX_STACK_DEPTH && !(flags & FUSE_WRITEBACK_CACHE)) { fc->passthrough = 1; fc->max_stack_depth = arg->max_stack_depth; fm->sb->s_stack_depth = arg->max_stack_depth; } if (flags & FUSE_NO_EXPORT_SUPPORT) fm->sb->s_export_op = &fuse_export_fid_operations; if (flags & FUSE_ALLOW_IDMAP) { if (fc->default_permissions) fm->sb->s_iflags &= ~SB_I_NOIDMAP; else ok = false; } if (flags & FUSE_OVER_IO_URING && fuse_uring_enabled()) fc->io_uring = 1; if (flags & FUSE_REQUEST_TIMEOUT) timeout = arg->request_timeout; } else { ra_pages = fc->max_read / PAGE_SIZE; fc->no_lock = 1; fc->no_flock = 1; } init_server_timeout(fc, timeout); fm->sb->s_bdi->ra_pages = min(fm->sb->s_bdi->ra_pages, ra_pages); fc->minor = arg->minor; fc->max_write = arg->minor < 5 ? 
4096 : arg->max_write; fc->max_write = max_t(unsigned, 4096, fc->max_write); fc->conn_init = 1; } kfree(ia); if (!ok) { fc->conn_init = 0; fc->conn_error = 1; } fuse_set_initialized(fc); wake_up_all(&fc->blocked_waitq); } void fuse_send_init(struct fuse_mount *fm) { struct fuse_init_args *ia; u64 flags; ia = kzalloc(sizeof(*ia), GFP_KERNEL | __GFP_NOFAIL); ia->in.major = FUSE_KERNEL_VERSION; ia->in.minor = FUSE_KERNEL_MINOR_VERSION; ia->in.max_readahead = fm->sb->s_bdi->ra_pages * PAGE_SIZE; flags = FUSE_ASYNC_READ | FUSE_POSIX_LOCKS | FUSE_ATOMIC_O_TRUNC | FUSE_EXPORT_SUPPORT | FUSE_BIG_WRITES | FUSE_DONT_MASK | FUSE_SPLICE_WRITE | FUSE_SPLICE_MOVE | FUSE_SPLICE_READ | FUSE_FLOCK_LOCKS | FUSE_HAS_IOCTL_DIR | FUSE_AUTO_INVAL_DATA | FUSE_DO_READDIRPLUS | FUSE_READDIRPLUS_AUTO | FUSE_ASYNC_DIO | FUSE_WRITEBACK_CACHE | FUSE_NO_OPEN_SUPPORT | FUSE_PARALLEL_DIROPS | FUSE_HANDLE_KILLPRIV | FUSE_POSIX_ACL | FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS | FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA | FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT | FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP | FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP | FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND | FUSE_ALLOW_IDMAP | FUSE_REQUEST_TIMEOUT; #ifdef CONFIG_FUSE_DAX if (fm->fc->dax) flags |= FUSE_MAP_ALIGNMENT; if (fuse_is_inode_dax_mode(fm->fc->dax_mode)) flags |= FUSE_HAS_INODE_DAX; #endif if (fm->fc->auto_submounts) flags |= FUSE_SUBMOUNTS; if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) flags |= FUSE_PASSTHROUGH; /* * This is just an information flag for fuse server. No need to check * the reply - server is either sending IORING_OP_URING_CMD or not. */ if (fuse_uring_enabled()) flags |= FUSE_OVER_IO_URING; ia->in.flags = flags; ia->in.flags2 = flags >> 32; ia->args.opcode = FUSE_INIT; ia->args.in_numargs = 1; ia->args.in_args[0].size = sizeof(ia->in); ia->args.in_args[0].value = &ia->in; ia->args.out_numargs = 1; /* Variable length argument used for backward compatibility with interface version < 7.5. Rest of init_out is zeroed by do_get_request(), so a short reply is not a problem */ ia->args.out_argvar = true; ia->args.out_args[0].size = sizeof(ia->out); ia->args.out_args[0].value = &ia->out; ia->args.force = true; ia->args.nocreds = true; ia->args.end = process_init_reply; if (fuse_simple_background(fm, &ia->args, GFP_KERNEL) != 0) process_init_reply(fm, &ia->args, -ENOTCONN); } EXPORT_SYMBOL_GPL(fuse_send_init); void fuse_free_conn(struct fuse_conn *fc) { WARN_ON(!list_empty(&fc->devices)); kfree(fc); } EXPORT_SYMBOL_GPL(fuse_free_conn); static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb) { int err; char *suffix = ""; if (sb->s_bdev) { suffix = "-fuseblk"; /* * sb->s_bdi points to blkdev's bdi however we want to redirect * it to our private bdi... */ bdi_put(sb->s_bdi); sb->s_bdi = &noop_backing_dev_info; } err = super_setup_bdi_name(sb, "%u:%u%s", MAJOR(fc->dev), MINOR(fc->dev), suffix); if (err) return err; /* fuse does it's own writeback accounting */ sb->s_bdi->capabilities &= ~BDI_CAP_WRITEBACK_ACCT; sb->s_bdi->capabilities |= BDI_CAP_STRICTLIMIT; /* * For a single fuse filesystem use max 1% of dirty + * writeback threshold. * * This gives about 1M of write buffer for memory maps on a * machine with 1G and 10% dirty_ratio, which should be more * than enough. 
* * Privileged users can raise it by writing to * * /sys/class/bdi/<bdi>/max_ratio */ bdi_set_max_ratio(sb->s_bdi, 1); return 0; } struct fuse_dev *fuse_dev_alloc(void) { struct fuse_dev *fud; struct list_head *pq; fud = kzalloc(sizeof(struct fuse_dev), GFP_KERNEL); if (!fud) return NULL; pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL); if (!pq) { kfree(fud); return NULL; } fud->pq.processing = pq; fuse_pqueue_init(&fud->pq); return fud; } EXPORT_SYMBOL_GPL(fuse_dev_alloc); void fuse_dev_install(struct fuse_dev *fud, struct fuse_conn *fc) { fud->fc = fuse_conn_get(fc); spin_lock(&fc->lock); list_add_tail(&fud->entry, &fc->devices); spin_unlock(&fc->lock); } EXPORT_SYMBOL_GPL(fuse_dev_install); struct fuse_dev *fuse_dev_alloc_install(struct fuse_conn *fc) { struct fuse_dev *fud; fud = fuse_dev_alloc(); if (!fud) return NULL; fuse_dev_install(fud, fc); return fud; } EXPORT_SYMBOL_GPL(fuse_dev_alloc_install); void fuse_dev_free(struct fuse_dev *fud) { struct fuse_conn *fc = fud->fc; if (fc) { spin_lock(&fc->lock); list_del(&fud->entry); spin_unlock(&fc->lock); fuse_conn_put(fc); } kfree(fud->pq.processing); kfree(fud); } EXPORT_SYMBOL_GPL(fuse_dev_free); static void fuse_fill_attr_from_inode(struct fuse_attr *attr, const struct fuse_inode *fi) { struct timespec64 atime = inode_get_atime(&fi->inode); struct timespec64 mtime = inode_get_mtime(&fi->inode); struct timespec64 ctime = inode_get_ctime(&fi->inode); *attr = (struct fuse_attr){ .ino = fi->inode.i_ino, .size = fi->inode.i_size, .blocks = fi->inode.i_blocks, .atime = atime.tv_sec, .mtime = mtime.tv_sec, .ctime = ctime.tv_sec, .atimensec = atime.tv_nsec, .mtimensec = mtime.tv_nsec, .ctimensec = ctime.tv_nsec, .mode = fi->inode.i_mode, .nlink = fi->inode.i_nlink, .uid = __kuid_val(fi->inode.i_uid), .gid = __kgid_val(fi->inode.i_gid), .rdev = fi->inode.i_rdev, .blksize = 1u << fi->inode.i_blkbits, }; } static void fuse_sb_defaults(struct super_block *sb) { sb->s_magic = FUSE_SUPER_MAGIC; sb->s_op = &fuse_super_operations; sb->s_xattr = fuse_xattr_handlers; sb->s_maxbytes = MAX_LFS_FILESIZE; sb->s_time_gran = 1; sb->s_export_op = &fuse_export_operations; sb->s_iflags |= SB_I_IMA_UNVERIFIABLE_SIGNATURE; sb->s_iflags |= SB_I_NOIDMAP; if (sb->s_user_ns != &init_user_ns) sb->s_iflags |= SB_I_UNTRUSTED_MOUNTER; sb->s_flags &= ~(SB_NOSEC | SB_I_VERSION); } static int fuse_fill_super_submount(struct super_block *sb, struct fuse_inode *parent_fi) { struct fuse_mount *fm = get_fuse_mount_super(sb); struct super_block *parent_sb = parent_fi->inode.i_sb; struct fuse_attr root_attr; struct inode *root; struct fuse_submount_lookup *sl; struct fuse_inode *fi; fuse_sb_defaults(sb); fm->sb = sb; WARN_ON(sb->s_bdi != &noop_backing_dev_info); sb->s_bdi = bdi_get(parent_sb->s_bdi); sb->s_xattr = parent_sb->s_xattr; sb->s_export_op = parent_sb->s_export_op; sb->s_time_gran = parent_sb->s_time_gran; sb->s_blocksize = parent_sb->s_blocksize; sb->s_blocksize_bits = parent_sb->s_blocksize_bits; sb->s_subtype = kstrdup(parent_sb->s_subtype, GFP_KERNEL); if (parent_sb->s_subtype && !sb->s_subtype) return -ENOMEM; fuse_fill_attr_from_inode(&root_attr, parent_fi); root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0, fuse_get_evict_ctr(fm->fc)); /* * This inode is just a duplicate, so it is not looked up and * its nlookup should not be incremented. fuse_iget() does * that, though, so undo it here. 
*/ fi = get_fuse_inode(root); fi->nlookup--; set_default_d_op(sb, &fuse_dentry_operations); sb->s_root = d_make_root(root); if (!sb->s_root) return -ENOMEM; /* * Grab the parent's submount_lookup pointer and take a * reference on the shared nlookup from the parent. This is to * prevent the last forget for this nodeid from getting * triggered until all users have finished with it. */ sl = parent_fi->submount_lookup; WARN_ON(!sl); if (sl) { refcount_inc(&sl->count); fi->submount_lookup = sl; } return 0; } /* Filesystem context private data holds the FUSE inode of the mount point */ static int fuse_get_tree_submount(struct fs_context *fsc) { struct fuse_mount *fm; struct fuse_inode *mp_fi = fsc->fs_private; struct fuse_conn *fc = get_fuse_conn(&mp_fi->inode); struct super_block *sb; int err; fm = kzalloc(sizeof(struct fuse_mount), GFP_KERNEL); if (!fm) return -ENOMEM; fm->fc = fuse_conn_get(fc); fsc->s_fs_info = fm; sb = sget_fc(fsc, NULL, set_anon_super_fc); if (fsc->s_fs_info) fuse_mount_destroy(fm); if (IS_ERR(sb)) return PTR_ERR(sb); /* Initialize superblock, making @mp_fi its root */ err = fuse_fill_super_submount(sb, mp_fi); if (err) { deactivate_locked_super(sb); return err; } down_write(&fc->killsb); list_add_tail(&fm->fc_entry, &fc->mounts); up_write(&fc->killsb); sb->s_flags |= SB_ACTIVE; fsc->root = dget(sb->s_root); return 0; } static const struct fs_context_operations fuse_context_submount_ops = { .get_tree = fuse_get_tree_submount, }; int fuse_init_fs_context_submount(struct fs_context *fsc) { fsc->ops = &fuse_context_submount_ops; return 0; } EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount); int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx) { struct fuse_dev *fud = NULL; struct fuse_mount *fm = get_fuse_mount_super(sb); struct fuse_conn *fc = fm->fc; struct inode *root; struct dentry *root_dentry; int err; err = -EINVAL; if (sb->s_flags & SB_MANDLOCK) goto err; rcu_assign_pointer(fc->curr_bucket, fuse_sync_bucket_alloc()); fuse_sb_defaults(sb); if (ctx->is_bdev) { #ifdef CONFIG_BLOCK err = -EINVAL; if (!sb_set_blocksize(sb, ctx->blksize)) goto err; #endif } else { sb->s_blocksize = PAGE_SIZE; sb->s_blocksize_bits = PAGE_SHIFT; } sb->s_subtype = ctx->subtype; ctx->subtype = NULL; if (IS_ENABLED(CONFIG_FUSE_DAX)) { err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev); if (err) goto err; } if (ctx->fudptr) { err = -ENOMEM; fud = fuse_dev_alloc_install(fc); if (!fud) goto err_free_dax; } fc->dev = sb->s_dev; fm->sb = sb; err = fuse_bdi_init(fc, sb); if (err) goto err_dev_free; /* Handle umasking inside the fuse code */ if (sb->s_flags & SB_POSIXACL) fc->dont_mask = 1; sb->s_flags |= SB_POSIXACL; fc->default_permissions = ctx->default_permissions; fc->allow_other = ctx->allow_other; fc->user_id = ctx->user_id; fc->group_id = ctx->group_id; fc->legacy_opts_show = ctx->legacy_opts_show; fc->max_read = max_t(unsigned int, 4096, ctx->max_read); fc->destroy = ctx->destroy; fc->no_control = ctx->no_control; fc->no_force_umount = ctx->no_force_umount; err = -ENOMEM; root = fuse_get_root_inode(sb, ctx->rootmode); set_default_d_op(sb, &fuse_dentry_operations); root_dentry = d_make_root(root); if (!root_dentry) goto err_dev_free; mutex_lock(&fuse_mutex); err = -EINVAL; if (ctx->fudptr && *ctx->fudptr) goto err_unlock; err = fuse_ctl_add_conn(fc); if (err) goto err_unlock; list_add_tail(&fc->entry, &fuse_conn_list); sb->s_root = root_dentry; if (ctx->fudptr) *ctx->fudptr = fud; mutex_unlock(&fuse_mutex); return 0; err_unlock: mutex_unlock(&fuse_mutex); 
dput(root_dentry); err_dev_free: if (fud) fuse_dev_free(fud); err_free_dax: if (IS_ENABLED(CONFIG_FUSE_DAX)) fuse_dax_conn_free(fc); err: return err; } EXPORT_SYMBOL_GPL(fuse_fill_super_common); static int fuse_fill_super(struct super_block *sb, struct fs_context *fsc) { struct fuse_fs_context *ctx = fsc->fs_private; int err; if (!ctx->file || !ctx->rootmode_present || !ctx->user_id_present || !ctx->group_id_present) return -EINVAL; /* * Require mount to happen from the same user namespace which * opened /dev/fuse to prevent potential attacks. */ if ((ctx->file->f_op != &fuse_dev_operations) || (ctx->file->f_cred->user_ns != sb->s_user_ns)) return -EINVAL; ctx->fudptr = &ctx->file->private_data; err = fuse_fill_super_common(sb, ctx); if (err) return err; /* file->private_data shall be visible on all CPUs after this */ smp_mb(); fuse_send_init(get_fuse_mount_super(sb)); return 0; } /* * This is the path where user supplied an already initialized fuse dev. In * this case never create a new super if the old one is gone. */ static int fuse_set_no_super(struct super_block *sb, struct fs_context *fsc) { return -ENOTCONN; } static int fuse_test_super(struct super_block *sb, struct fs_context *fsc) { return fsc->sget_key == get_fuse_conn_super(sb); } static int fuse_get_tree(struct fs_context *fsc) { struct fuse_fs_context *ctx = fsc->fs_private; struct fuse_dev *fud; struct fuse_conn *fc; struct fuse_mount *fm; struct super_block *sb; int err; fc = kmalloc(sizeof(*fc), GFP_KERNEL); if (!fc) return -ENOMEM; fm = kzalloc(sizeof(*fm), GFP_KERNEL); if (!fm) { kfree(fc); return -ENOMEM; } fuse_conn_init(fc, fm, fsc->user_ns, &fuse_dev_fiq_ops, NULL); fc->release = fuse_free_conn; fsc->s_fs_info = fm; if (ctx->fd_present) ctx->file = fget(ctx->fd); if (IS_ENABLED(CONFIG_BLOCK) && ctx->is_bdev) { err = get_tree_bdev(fsc, fuse_fill_super); goto out; } /* * While block dev mount can be initialized with a dummy device fd * (found by device name), normal fuse mounts can't */ err = -EINVAL; if (!ctx->file) goto out; /* * Allow creating a fuse mount with an already initialized fuse * connection */ fud = READ_ONCE(ctx->file->private_data); if (ctx->file->f_op == &fuse_dev_operations && fud) { fsc->sget_key = fud->fc; sb = sget_fc(fsc, fuse_test_super, fuse_set_no_super); err = PTR_ERR_OR_ZERO(sb); if (!IS_ERR(sb)) fsc->root = dget(sb->s_root); } else { err = get_tree_nodev(fsc, fuse_fill_super); } out: if (fsc->s_fs_info) fuse_mount_destroy(fm); if (ctx->file) fput(ctx->file); return err; } static const struct fs_context_operations fuse_context_ops = { .free = fuse_free_fsc, .parse_param = fuse_parse_param, .reconfigure = fuse_reconfigure, .get_tree = fuse_get_tree, }; /* * Set up the filesystem mount context. 
*/ static int fuse_init_fs_context(struct fs_context *fsc) { struct fuse_fs_context *ctx; ctx = kzalloc(sizeof(struct fuse_fs_context), GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->max_read = ~0; ctx->blksize = FUSE_DEFAULT_BLKSIZE; ctx->legacy_opts_show = true; #ifdef CONFIG_BLOCK if (fsc->fs_type == &fuseblk_fs_type) { ctx->is_bdev = true; ctx->destroy = true; } #endif fsc->fs_private = ctx; fsc->ops = &fuse_context_ops; return 0; } bool fuse_mount_remove(struct fuse_mount *fm) { struct fuse_conn *fc = fm->fc; bool last = false; down_write(&fc->killsb); list_del_init(&fm->fc_entry); if (list_empty(&fc->mounts)) last = true; up_write(&fc->killsb); return last; } EXPORT_SYMBOL_GPL(fuse_mount_remove); void fuse_conn_destroy(struct fuse_mount *fm) { struct fuse_conn *fc = fm->fc; if (fc->destroy) fuse_send_destroy(fm); fuse_abort_conn(fc); fuse_wait_aborted(fc); if (!list_empty(&fc->entry)) { mutex_lock(&fuse_mutex); list_del(&fc->entry); fuse_ctl_remove_conn(fc); mutex_unlock(&fuse_mutex); } } EXPORT_SYMBOL_GPL(fuse_conn_destroy); static void fuse_sb_destroy(struct super_block *sb) { struct fuse_mount *fm = get_fuse_mount_super(sb); bool last; if (sb->s_root) { last = fuse_mount_remove(fm); if (last) fuse_conn_destroy(fm); } } void fuse_mount_destroy(struct fuse_mount *fm) { fuse_conn_put(fm->fc); kfree_rcu(fm, rcu); } EXPORT_SYMBOL(fuse_mount_destroy); static void fuse_kill_sb_anon(struct super_block *sb) { fuse_sb_destroy(sb); kill_anon_super(sb); fuse_mount_destroy(get_fuse_mount_super(sb)); } static struct file_system_type fuse_fs_type = { .owner = THIS_MODULE, .name = "fuse", .fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT | FS_ALLOW_IDMAP, .init_fs_context = fuse_init_fs_context, .parameters = fuse_fs_parameters, .kill_sb = fuse_kill_sb_anon, }; MODULE_ALIAS_FS("fuse"); #ifdef CONFIG_BLOCK static void fuse_kill_sb_blk(struct super_block *sb) { fuse_sb_destroy(sb); kill_block_super(sb); fuse_mount_destroy(get_fuse_mount_super(sb)); } static struct file_system_type fuseblk_fs_type = { .owner = THIS_MODULE, .name = "fuseblk", .init_fs_context = fuse_init_fs_context, .parameters = fuse_fs_parameters, .kill_sb = fuse_kill_sb_blk, .fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_ALLOW_IDMAP, }; MODULE_ALIAS_FS("fuseblk"); static inline int register_fuseblk(void) { return register_filesystem(&fuseblk_fs_type); } static inline void unregister_fuseblk(void) { unregister_filesystem(&fuseblk_fs_type); } #else static inline int register_fuseblk(void) { return 0; } static inline void unregister_fuseblk(void) { } #endif static void fuse_inode_init_once(void *foo) { struct inode *inode = foo; inode_init_once(inode); } static int __init fuse_fs_init(void) { int err; fuse_inode_cachep = kmem_cache_create("fuse_inode", sizeof(struct fuse_inode), 0, SLAB_HWCACHE_ALIGN|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT, fuse_inode_init_once); err = -ENOMEM; if (!fuse_inode_cachep) goto out; err = register_fuseblk(); if (err) goto out2; err = register_filesystem(&fuse_fs_type); if (err) goto out3; err = fuse_sysctl_register(); if (err) goto out4; return 0; out4: unregister_filesystem(&fuse_fs_type); out3: unregister_fuseblk(); out2: kmem_cache_destroy(fuse_inode_cachep); out: return err; } static void fuse_fs_cleanup(void) { fuse_sysctl_unregister(); unregister_filesystem(&fuse_fs_type); unregister_fuseblk(); /* * Make sure all delayed rcu free inodes are flushed before we * destroy cache. 
	 */
	rcu_barrier();
	kmem_cache_destroy(fuse_inode_cachep);
}

static struct kobject *fuse_kobj;

static int fuse_sysfs_init(void)
{
	int err;

	fuse_kobj = kobject_create_and_add("fuse", fs_kobj);
	if (!fuse_kobj) {
		err = -ENOMEM;
		goto out_err;
	}

	err = sysfs_create_mount_point(fuse_kobj, "connections");
	if (err)
		goto out_fuse_unregister;

	return 0;

 out_fuse_unregister:
	kobject_put(fuse_kobj);
 out_err:
	return err;
}

static void fuse_sysfs_cleanup(void)
{
	sysfs_remove_mount_point(fuse_kobj, "connections");
	kobject_put(fuse_kobj);
}

static int __init fuse_init(void)
{
	int res;

	pr_info("init (API version %i.%i)\n",
		FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION);

	INIT_LIST_HEAD(&fuse_conn_list);
	res = fuse_fs_init();
	if (res)
		goto err;

	res = fuse_dev_init();
	if (res)
		goto err_fs_cleanup;

	res = fuse_sysfs_init();
	if (res)
		goto err_dev_cleanup;

	res = fuse_ctl_init();
	if (res)
		goto err_sysfs_cleanup;

	sanitize_global_limit(&max_user_bgreq);
	sanitize_global_limit(&max_user_congthresh);

	return 0;

 err_sysfs_cleanup:
	fuse_sysfs_cleanup();
 err_dev_cleanup:
	fuse_dev_cleanup();
 err_fs_cleanup:
	fuse_fs_cleanup();
 err:
	return res;
}

static void __exit fuse_exit(void)
{
	pr_debug("exit\n");

	fuse_ctl_cleanup();
	fuse_sysfs_cleanup();
	fuse_fs_cleanup();
	fuse_dev_cleanup();
}

module_init(fuse_init);
module_exit(fuse_exit);
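The mount path above (fuse_init_fs_context() setting fuse_context_ops, fuse_get_tree() calling a fill_super helper, and fuse_fs_init()/fuse_init() registering the type with goto-based unwinding) is the generic fs_context pattern. The following is a minimal sketch of that pattern for an imaginary "examplefs"; it is not taken from the FUSE sources, and all examplefs_* identifiers and the magic number are hypothetical.

/*
 * Hypothetical "examplefs": a minimal sketch of the fs_context pattern used
 * by the FUSE code above. Not part of the kernel tree.
 */
#include <linux/fs.h>
#include <linux/fs_context.h>
#include <linux/init.h>
#include <linux/module.h>

#define EXAMPLEFS_MAGIC	0x4558414d	/* made-up magic number */

static int examplefs_fill_super(struct super_block *sb, struct fs_context *fc)
{
	struct inode *root;

	sb->s_magic = EXAMPLEFS_MAGIC;
	sb->s_op = &simple_super_operations;
	sb->s_blocksize = PAGE_SIZE;
	sb->s_blocksize_bits = PAGE_SHIFT;

	root = new_inode(sb);
	if (!root)
		return -ENOMEM;
	root->i_ino = 1;
	root->i_mode = S_IFDIR | 0755;
	root->i_op = &simple_dir_inode_operations;
	root->i_fop = &simple_dir_operations;

	/* d_make_root() consumes the inode even on failure */
	sb->s_root = d_make_root(root);
	return sb->s_root ? 0 : -ENOMEM;
}

static int examplefs_get_tree(struct fs_context *fc)
{
	return get_tree_nodev(fc, examplefs_fill_super);
}

static const struct fs_context_operations examplefs_context_ops = {
	.get_tree	= examplefs_get_tree,
};

static int examplefs_init_fs_context(struct fs_context *fc)
{
	fc->ops = &examplefs_context_ops;
	return 0;
}

static struct file_system_type examplefs_fs_type = {
	.owner		= THIS_MODULE,
	.name		= "examplefs",
	.init_fs_context = examplefs_init_fs_context,
	.kill_sb	= kill_anon_super,
};

static int __init examplefs_init(void)
{
	return register_filesystem(&examplefs_fs_type);
}

static void __exit examplefs_exit(void)
{
	unregister_filesystem(&examplefs_fs_type);
}

module_init(examplefs_init);
module_exit(examplefs_exit);
MODULE_LICENSE("GPL");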
/*
 * Compressed rom filesystem for Linux.
 *
 * Copyright (C) 1999 Linus Torvalds.
 *
 * This file is released under the GPL.
 */

/*
 * These are the VFS interfaces to the compressed rom filesystem.
 * The actual compression is based on zlib, see the other files.
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/pagemap.h>
#include <linux/ramfs.h>
#include <linux/init.h>
#include <linux/string.h>
#include <linux/blkdev.h>
#include <linux/mtd/mtd.h>
#include <linux/mtd/super.h>
#include <linux/fs_context.h>
#include <linux/slab.h>
#include <linux/vfs.h>
#include <linux/mutex.h>
#include <uapi/linux/cramfs_fs.h>
#include <linux/uaccess.h>

#include "internal.h"

/*
 * cramfs super-block data in memory
 */
struct cramfs_sb_info {
	unsigned long magic;
	unsigned long size;
	unsigned long blocks;
	unsigned long files;
	unsigned long flags;
	void *linear_virt_addr;
	resource_size_t linear_phys_addr;
	size_t mtd_point_size;
};

static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
{
	return sb->s_fs_info;
}

static const struct super_operations cramfs_ops;
static const struct inode_operations cramfs_dir_inode_operations;
static const struct file_operations cramfs_directory_operations;
static const struct file_operations cramfs_physmem_fops;
static const struct address_space_operations cramfs_aops;

static DEFINE_MUTEX(read_mutex);

/* These macros may change in future, to provide better st_ino semantics. */
#define OFFSET(x)	((x)->i_ino)

static unsigned long cramino(const struct cramfs_inode *cino, unsigned int offset)
{
	if (!cino->offset)
		return offset + 1;
	if (!cino->size)
		return offset + 1;

	/*
	 * The file mode test fixes buggy mkcramfs implementations where
	 * cramfs_inode->offset is set to a non zero value for entries
	 * which did not contain data, like devices node and fifos.
*/ switch (cino->mode & S_IFMT) { case S_IFREG: case S_IFDIR: case S_IFLNK: return cino->offset << 2; default: break; } return offset + 1; } static struct inode *get_cramfs_inode(struct super_block *sb, const struct cramfs_inode *cramfs_inode, unsigned int offset) { struct inode *inode; static struct timespec64 zerotime; inode = iget_locked(sb, cramino(cramfs_inode, offset)); if (!inode) return ERR_PTR(-ENOMEM); if (!(inode->i_state & I_NEW)) return inode; switch (cramfs_inode->mode & S_IFMT) { case S_IFREG: inode->i_fop = &generic_ro_fops; inode->i_data.a_ops = &cramfs_aops; if (IS_ENABLED(CONFIG_CRAMFS_MTD) && CRAMFS_SB(sb)->flags & CRAMFS_FLAG_EXT_BLOCK_POINTERS && CRAMFS_SB(sb)->linear_phys_addr) inode->i_fop = &cramfs_physmem_fops; break; case S_IFDIR: inode->i_op = &cramfs_dir_inode_operations; inode->i_fop = &cramfs_directory_operations; break; case S_IFLNK: inode->i_op = &page_symlink_inode_operations; inode_nohighmem(inode); inode->i_data.a_ops = &cramfs_aops; break; default: init_special_inode(inode, cramfs_inode->mode, old_decode_dev(cramfs_inode->size)); } inode->i_mode = cramfs_inode->mode; i_uid_write(inode, cramfs_inode->uid); i_gid_write(inode, cramfs_inode->gid); /* if the lower 2 bits are zero, the inode contains data */ if (!(inode->i_ino & 3)) { inode->i_size = cramfs_inode->size; inode->i_blocks = (cramfs_inode->size - 1) / 512 + 1; } /* Struct copy intentional */ inode_set_mtime_to_ts(inode, inode_set_atime_to_ts(inode, inode_set_ctime_to_ts(inode, zerotime))); /* inode->i_nlink is left 1 - arguably wrong for directories, but it's the best we can do without reading the directory contents. 1 yields the right result in GNU find, even without -noleaf option. */ unlock_new_inode(inode); return inode; } /* * We have our own block cache: don't fill up the buffer cache * with the rom-image, because the way the filesystem is set * up the accesses should be fairly regular and cached in the * page cache and dentry tree anyway.. * * This also acts as a way to guarantee contiguous areas of up to * BLKS_PER_BUF*PAGE_SIZE, so that the caller doesn't need to * worry about end-of-buffer issues even when decompressing a full * page cache. * * Note: This is all optimized away at compile time when * CONFIG_CRAMFS_BLOCKDEV=n. */ #define READ_BUFFERS (2) /* NEXT_BUFFER(): Loop over [0..(READ_BUFFERS-1)]. */ #define NEXT_BUFFER(_ix) ((_ix) ^ 1) /* * BLKS_PER_BUF_SHIFT should be at least 2 to allow for "compressed" * data that takes up more space than the original and with unlucky * alignment. */ #define BLKS_PER_BUF_SHIFT (2) #define BLKS_PER_BUF (1 << BLKS_PER_BUF_SHIFT) #define BUFFER_SIZE (BLKS_PER_BUF*PAGE_SIZE) static unsigned char read_buffers[READ_BUFFERS][BUFFER_SIZE]; static unsigned buffer_blocknr[READ_BUFFERS]; static struct super_block *buffer_dev[READ_BUFFERS]; static int next_buffer; /* * Populate our block cache and return a pointer to it. */ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset, unsigned int len) { struct address_space *mapping = sb->s_bdev->bd_mapping; struct file_ra_state ra = {}; struct page *pages[BLKS_PER_BUF]; unsigned i, blocknr, buffer; unsigned long devsize; char *data; if (!len) return NULL; blocknr = offset >> PAGE_SHIFT; offset &= PAGE_SIZE - 1; /* Check if an existing buffer already has the data.. 
*/ for (i = 0; i < READ_BUFFERS; i++) { unsigned int blk_offset; if (buffer_dev[i] != sb) continue; if (blocknr < buffer_blocknr[i]) continue; blk_offset = (blocknr - buffer_blocknr[i]) << PAGE_SHIFT; blk_offset += offset; if (blk_offset > BUFFER_SIZE || blk_offset + len > BUFFER_SIZE) continue; return read_buffers[i] + blk_offset; } devsize = bdev_nr_bytes(sb->s_bdev) >> PAGE_SHIFT; /* Ok, read in BLKS_PER_BUF pages completely first. */ file_ra_state_init(&ra, mapping); page_cache_sync_readahead(mapping, &ra, NULL, blocknr, BLKS_PER_BUF); for (i = 0; i < BLKS_PER_BUF; i++) { struct page *page = NULL; if (blocknr + i < devsize) { page = read_mapping_page(mapping, blocknr + i, NULL); /* synchronous error? */ if (IS_ERR(page)) page = NULL; } pages[i] = page; } buffer = next_buffer; next_buffer = NEXT_BUFFER(buffer); buffer_blocknr[buffer] = blocknr; buffer_dev[buffer] = sb; data = read_buffers[buffer]; for (i = 0; i < BLKS_PER_BUF; i++) { struct page *page = pages[i]; if (page) { memcpy_from_page(data, page, 0, PAGE_SIZE); put_page(page); } else memset(data, 0, PAGE_SIZE); data += PAGE_SIZE; } return read_buffers[buffer] + offset; } /* * Return a pointer to the linearly addressed cramfs image in memory. */ static void *cramfs_direct_read(struct super_block *sb, unsigned int offset, unsigned int len) { struct cramfs_sb_info *sbi = CRAMFS_SB(sb); if (!len) return NULL; if (len > sbi->size || offset > sbi->size - len) return page_address(ZERO_PAGE(0)); return sbi->linear_virt_addr + offset; } /* * Returns a pointer to a buffer containing at least LEN bytes of * filesystem starting at byte offset OFFSET into the filesystem. */ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len) { struct cramfs_sb_info *sbi = CRAMFS_SB(sb); if (IS_ENABLED(CONFIG_CRAMFS_MTD) && sbi->linear_virt_addr) return cramfs_direct_read(sb, offset, len); else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV)) return cramfs_blkdev_read(sb, offset, len); else return NULL; } /* * For a mapping to be possible, we need a range of uncompressed and * contiguous blocks. Return the offset for the first block and number of * valid blocks for which that is true, or zero otherwise. */ static u32 cramfs_get_block_range(struct inode *inode, u32 pgoff, u32 *pages) { struct cramfs_sb_info *sbi = CRAMFS_SB(inode->i_sb); int i; u32 *blockptrs, first_block_addr; /* * We can dereference memory directly here as this code may be * reached only when there is a direct filesystem image mapping * available in memory. */ blockptrs = (u32 *)(sbi->linear_virt_addr + OFFSET(inode) + pgoff * 4); first_block_addr = blockptrs[0] & ~CRAMFS_BLK_FLAGS; i = 0; do { u32 block_off = i * (PAGE_SIZE >> CRAMFS_BLK_DIRECT_PTR_SHIFT); u32 expect = (first_block_addr + block_off) | CRAMFS_BLK_FLAG_DIRECT_PTR | CRAMFS_BLK_FLAG_UNCOMPRESSED; if (blockptrs[i] != expect) { pr_debug("range: block %d/%d got %#x expects %#x\n", pgoff+i, pgoff + *pages - 1, blockptrs[i], expect); if (i == 0) return 0; break; } } while (++i < *pages); *pages = i; return first_block_addr << CRAMFS_BLK_DIRECT_PTR_SHIFT; } #ifdef CONFIG_MMU /* * Return true if the last page of a file in the filesystem image contains * some other data that doesn't belong to that file. It is assumed that the * last block is CRAMFS_BLK_FLAG_DIRECT_PTR | CRAMFS_BLK_FLAG_UNCOMPRESSED * (verified by cramfs_get_block_range() and directly accessible in memory. 
*/ static bool cramfs_last_page_is_shared(struct inode *inode) { struct cramfs_sb_info *sbi = CRAMFS_SB(inode->i_sb); u32 partial, last_page, blockaddr, *blockptrs; char *tail_data; partial = offset_in_page(inode->i_size); if (!partial) return false; last_page = inode->i_size >> PAGE_SHIFT; blockptrs = (u32 *)(sbi->linear_virt_addr + OFFSET(inode)); blockaddr = blockptrs[last_page] & ~CRAMFS_BLK_FLAGS; blockaddr <<= CRAMFS_BLK_DIRECT_PTR_SHIFT; tail_data = sbi->linear_virt_addr + blockaddr + partial; return memchr_inv(tail_data, 0, PAGE_SIZE - partial) ? true : false; } static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma) { struct inode *inode = file_inode(file); struct cramfs_sb_info *sbi = CRAMFS_SB(inode->i_sb); unsigned int pages, max_pages, offset; unsigned long address, pgoff = vma->vm_pgoff; char *bailout_reason; int ret; ret = generic_file_readonly_mmap(file, vma); if (ret) return ret; /* * Now try to pre-populate ptes for this vma with a direct * mapping avoiding memory allocation when possible. */ /* Could COW work here? */ bailout_reason = "vma is writable"; if (vma->vm_flags & VM_WRITE) goto bailout; max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT; bailout_reason = "beyond file limit"; if (pgoff >= max_pages) goto bailout; pages = min(vma_pages(vma), max_pages - pgoff); offset = cramfs_get_block_range(inode, pgoff, &pages); bailout_reason = "unsuitable block layout"; if (!offset) goto bailout; address = sbi->linear_phys_addr + offset; bailout_reason = "data is not page aligned"; if (!PAGE_ALIGNED(address)) goto bailout; /* Don't map the last page if it contains some other data */ if (pgoff + pages == max_pages && cramfs_last_page_is_shared(inode)) { pr_debug("mmap: %pD: last page is shared\n", file); pages--; } if (!pages) { bailout_reason = "no suitable block remaining"; goto bailout; } if (pages == vma_pages(vma)) { /* * The entire vma is mappable. remap_pfn_range() will * make it distinguishable from a non-direct mapping * in /proc/<pid>/maps by substituting the file offset * with the actual physical address. */ ret = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT, pages * PAGE_SIZE, vma->vm_page_prot); } else { /* * Let's create a mixed map if we can't map it all. * The normal paging machinery will take care of the * unpopulated ptes via cramfs_read_folio(). */ int i; vm_flags_set(vma, VM_MIXEDMAP); for (i = 0; i < pages && !ret; i++) { vm_fault_t vmf; unsigned long off = i * PAGE_SIZE; vmf = vmf_insert_mixed(vma, vma->vm_start + off, address + off); if (vmf & VM_FAULT_ERROR) ret = vm_fault_to_errno(vmf, 0); } } if (!ret) pr_debug("mapped %pD[%lu] at 0x%08lx (%u/%lu pages) " "to vma 0x%08lx, page_prot 0x%llx\n", file, pgoff, address, pages, vma_pages(vma), vma->vm_start, (unsigned long long)pgprot_val(vma->vm_page_prot)); return ret; bailout: pr_debug("%pD[%lu]: direct mmap impossible: %s\n", file, pgoff, bailout_reason); /* Didn't manage any direct map, but normal paging is still possible */ return 0; } #else /* CONFIG_MMU */ static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma) { return is_nommu_shared_mapping(vma->vm_flags) ? 
0 : -ENOSYS; } static unsigned long cramfs_physmem_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { struct inode *inode = file_inode(file); struct super_block *sb = inode->i_sb; struct cramfs_sb_info *sbi = CRAMFS_SB(sb); unsigned int pages, block_pages, max_pages, offset; pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT; max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT; if (pgoff >= max_pages || pages > max_pages - pgoff) return -EINVAL; block_pages = pages; offset = cramfs_get_block_range(inode, pgoff, &block_pages); if (!offset || block_pages != pages) return -ENOSYS; addr = sbi->linear_phys_addr + offset; pr_debug("get_unmapped for %pD ofs %#lx siz %lu at 0x%08lx\n", file, pgoff*PAGE_SIZE, len, addr); return addr; } static unsigned int cramfs_physmem_mmap_capabilities(struct file *file) { return NOMMU_MAP_COPY | NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_EXEC; } #endif /* CONFIG_MMU */ static const struct file_operations cramfs_physmem_fops = { .llseek = generic_file_llseek, .read_iter = generic_file_read_iter, .splice_read = filemap_splice_read, .mmap = cramfs_physmem_mmap, #ifndef CONFIG_MMU .get_unmapped_area = cramfs_physmem_get_unmapped_area, .mmap_capabilities = cramfs_physmem_mmap_capabilities, #endif }; static void cramfs_kill_sb(struct super_block *sb) { struct cramfs_sb_info *sbi = CRAMFS_SB(sb); generic_shutdown_super(sb); if (IS_ENABLED(CONFIG_CRAMFS_MTD) && sb->s_mtd) { if (sbi && sbi->mtd_point_size) mtd_unpoint(sb->s_mtd, 0, sbi->mtd_point_size); put_mtd_device(sb->s_mtd); sb->s_mtd = NULL; } else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) && sb->s_bdev) { sync_blockdev(sb->s_bdev); bdev_fput(sb->s_bdev_file); } kfree(sbi); } static int cramfs_reconfigure(struct fs_context *fc) { sync_filesystem(fc->root->d_sb); fc->sb_flags |= SB_RDONLY; return 0; } static int cramfs_read_super(struct super_block *sb, struct fs_context *fc, struct cramfs_super *super) { struct cramfs_sb_info *sbi = CRAMFS_SB(sb); unsigned long root_offset; bool silent = fc->sb_flags & SB_SILENT; /* We don't know the real size yet */ sbi->size = PAGE_SIZE; /* Read the first block and get the superblock from it */ mutex_lock(&read_mutex); memcpy(super, cramfs_read(sb, 0, sizeof(*super)), sizeof(*super)); mutex_unlock(&read_mutex); /* Do sanity checks on the superblock */ if (super->magic != CRAMFS_MAGIC) { /* check for wrong endianness */ if (super->magic == CRAMFS_MAGIC_WEND) { if (!silent) errorfc(fc, "wrong endianness"); return -EINVAL; } /* check at 512 byte offset */ mutex_lock(&read_mutex); memcpy(super, cramfs_read(sb, 512, sizeof(*super)), sizeof(*super)); mutex_unlock(&read_mutex); if (super->magic != CRAMFS_MAGIC) { if (super->magic == CRAMFS_MAGIC_WEND && !silent) errorfc(fc, "wrong endianness"); else if (!silent) errorfc(fc, "wrong magic"); return -EINVAL; } } /* get feature flags first */ if (super->flags & ~CRAMFS_SUPPORTED_FLAGS) { errorfc(fc, "unsupported filesystem features"); return -EINVAL; } /* Check that the root inode is in a sane state */ if (!S_ISDIR(super->root.mode)) { errorfc(fc, "root is not a directory"); return -EINVAL; } /* correct strange, hard-coded permissions of mkcramfs */ super->root.mode |= 0555; root_offset = super->root.offset << 2; if (super->flags & CRAMFS_FLAG_FSID_VERSION_2) { sbi->size = super->size; sbi->blocks = super->fsid.blocks; sbi->files = super->fsid.files; } else { sbi->size = 1<<28; sbi->blocks = 0; sbi->files = 0; } sbi->magic = super->magic; sbi->flags = super->flags; if 
(root_offset == 0) infofc(fc, "empty filesystem"); else if (!(super->flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) && ((root_offset != sizeof(struct cramfs_super)) && (root_offset != 512 + sizeof(struct cramfs_super)))) { errorfc(fc, "bad root offset %lu", root_offset); return -EINVAL; } return 0; } static int cramfs_finalize_super(struct super_block *sb, struct cramfs_inode *cramfs_root) { struct inode *root; /* Set it all up.. */ sb->s_flags |= SB_RDONLY; sb->s_time_min = 0; sb->s_time_max = 0; sb->s_op = &cramfs_ops; root = get_cramfs_inode(sb, cramfs_root, 0); if (IS_ERR(root)) return PTR_ERR(root); sb->s_root = d_make_root(root); if (!sb->s_root) return -ENOMEM; return 0; } static int cramfs_blkdev_fill_super(struct super_block *sb, struct fs_context *fc) { struct cramfs_sb_info *sbi; struct cramfs_super super; int i, err; sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL); if (!sbi) return -ENOMEM; sb->s_fs_info = sbi; /* Invalidate the read buffers on mount: think disk change.. */ for (i = 0; i < READ_BUFFERS; i++) buffer_blocknr[i] = -1; err = cramfs_read_super(sb, fc, &super); if (err) return err; return cramfs_finalize_super(sb, &super.root); } static int cramfs_mtd_fill_super(struct super_block *sb, struct fs_context *fc) { struct cramfs_sb_info *sbi; struct cramfs_super super; int err; sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL); if (!sbi) return -ENOMEM; sb->s_fs_info = sbi; /* Map only one page for now. Will remap it when fs size is known. */ err = mtd_point(sb->s_mtd, 0, PAGE_SIZE, &sbi->mtd_point_size, &sbi->linear_virt_addr, &sbi->linear_phys_addr); if (err || sbi->mtd_point_size != PAGE_SIZE) { pr_err("unable to get direct memory access to mtd:%s\n", sb->s_mtd->name); return err ? : -ENODATA; } pr_info("checking physical address %pap for linear cramfs image\n", &sbi->linear_phys_addr); err = cramfs_read_super(sb, fc, &super); if (err) return err; /* Remap the whole filesystem now */ pr_info("linear cramfs image on mtd:%s appears to be %lu KB in size\n", sb->s_mtd->name, sbi->size/1024); mtd_unpoint(sb->s_mtd, 0, PAGE_SIZE); err = mtd_point(sb->s_mtd, 0, sbi->size, &sbi->mtd_point_size, &sbi->linear_virt_addr, &sbi->linear_phys_addr); if (err || sbi->mtd_point_size != sbi->size) { pr_err("unable to get direct memory access to mtd:%s\n", sb->s_mtd->name); return err ? : -ENODATA; } return cramfs_finalize_super(sb, &super.root); } static int cramfs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; u64 id = 0; if (sb->s_bdev) id = huge_encode_dev(sb->s_bdev->bd_dev); else if (sb->s_dev) id = huge_encode_dev(sb->s_dev); buf->f_type = CRAMFS_MAGIC; buf->f_bsize = PAGE_SIZE; buf->f_blocks = CRAMFS_SB(sb)->blocks; buf->f_bfree = 0; buf->f_bavail = 0; buf->f_files = CRAMFS_SB(sb)->files; buf->f_ffree = 0; buf->f_fsid = u64_to_fsid(id); buf->f_namelen = CRAMFS_MAXPATHLEN; return 0; } /* * Read a cramfs directory entry. */ static int cramfs_readdir(struct file *file, struct dir_context *ctx) { struct inode *inode = file_inode(file); struct super_block *sb = inode->i_sb; char *buf; unsigned int offset; /* Offset within the thing. 
*/ if (ctx->pos >= inode->i_size) return 0; offset = ctx->pos; /* Directory entries are always 4-byte aligned */ if (offset & 3) return -EINVAL; buf = kmalloc(CRAMFS_MAXPATHLEN, GFP_KERNEL); if (!buf) return -ENOMEM; while (offset < inode->i_size) { struct cramfs_inode *de; unsigned long nextoffset; char *name; ino_t ino; umode_t mode; int namelen; mutex_lock(&read_mutex); de = cramfs_read(sb, OFFSET(inode) + offset, sizeof(*de)+CRAMFS_MAXPATHLEN); name = (char *)(de+1); /* * Namelengths on disk are shifted by two * and the name padded out to 4-byte boundaries * with zeroes. */ namelen = de->namelen << 2; memcpy(buf, name, namelen); ino = cramino(de, OFFSET(inode) + offset); mode = de->mode; mutex_unlock(&read_mutex); nextoffset = offset + sizeof(*de) + namelen; for (;;) { if (!namelen) { kfree(buf); return -EIO; } if (buf[namelen-1]) break; namelen--; } if (!dir_emit(ctx, buf, namelen, ino, mode >> 12)) break; ctx->pos = offset = nextoffset; } kfree(buf); return 0; } /* * Lookup and fill in the inode data.. */ static struct dentry *cramfs_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) { unsigned int offset = 0; struct inode *inode = NULL; int sorted; mutex_lock(&read_mutex); sorted = CRAMFS_SB(dir->i_sb)->flags & CRAMFS_FLAG_SORTED_DIRS; while (offset < dir->i_size) { struct cramfs_inode *de; char *name; int namelen, retval; int dir_off = OFFSET(dir) + offset; de = cramfs_read(dir->i_sb, dir_off, sizeof(*de)+CRAMFS_MAXPATHLEN); name = (char *)(de+1); /* Try to take advantage of sorted directories */ if (sorted && (dentry->d_name.name[0] < name[0])) break; namelen = de->namelen << 2; offset += sizeof(*de) + namelen; /* Quick check that the name is roughly the right length */ if (((dentry->d_name.len + 3) & ~3) != namelen) continue; for (;;) { if (!namelen) { inode = ERR_PTR(-EIO); goto out; } if (name[namelen-1]) break; namelen--; } if (namelen != dentry->d_name.len) continue; retval = memcmp(dentry->d_name.name, name, namelen); if (retval > 0) continue; if (!retval) { inode = get_cramfs_inode(dir->i_sb, de, dir_off); break; } /* else (retval < 0) */ if (sorted) break; } out: mutex_unlock(&read_mutex); return d_splice_alias(inode, dentry); } static int cramfs_read_folio(struct file *file, struct folio *folio) { struct inode *inode = folio->mapping->host; u32 maxblock; int bytes_filled; void *pgdata; bool success = false; maxblock = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT; bytes_filled = 0; pgdata = kmap_local_folio(folio, 0); if (folio->index < maxblock) { struct super_block *sb = inode->i_sb; u32 blkptr_offset = OFFSET(inode) + folio->index * 4; u32 block_ptr, block_start, block_len; bool uncompressed, direct; mutex_lock(&read_mutex); block_ptr = *(u32 *) cramfs_read(sb, blkptr_offset, 4); uncompressed = (block_ptr & CRAMFS_BLK_FLAG_UNCOMPRESSED); direct = (block_ptr & CRAMFS_BLK_FLAG_DIRECT_PTR); block_ptr &= ~CRAMFS_BLK_FLAGS; if (direct) { /* * The block pointer is an absolute start pointer, * shifted by 2 bits. The size is included in the * first 2 bytes of the data block when compressed, * or PAGE_SIZE otherwise. */ block_start = block_ptr << CRAMFS_BLK_DIRECT_PTR_SHIFT; if (uncompressed) { block_len = PAGE_SIZE; /* if last block: cap to file length */ if (folio->index == maxblock - 1) block_len = offset_in_page(inode->i_size); } else { block_len = *(u16 *) cramfs_read(sb, block_start, 2); block_start += 2; } } else { /* * The block pointer indicates one past the end of * the current block (start of next block). 
If this * is the first block then it starts where the block * pointer table ends, otherwise its start comes * from the previous block's pointer. */ block_start = OFFSET(inode) + maxblock * 4; if (folio->index) block_start = *(u32 *) cramfs_read(sb, blkptr_offset - 4, 4); /* Beware... previous ptr might be a direct ptr */ if (unlikely(block_start & CRAMFS_BLK_FLAG_DIRECT_PTR)) { /* See comments on earlier code. */ u32 prev_start = block_start; block_start = prev_start & ~CRAMFS_BLK_FLAGS; block_start <<= CRAMFS_BLK_DIRECT_PTR_SHIFT; if (prev_start & CRAMFS_BLK_FLAG_UNCOMPRESSED) { block_start += PAGE_SIZE; } else { block_len = *(u16 *) cramfs_read(sb, block_start, 2); block_start += 2 + block_len; } } block_start &= ~CRAMFS_BLK_FLAGS; block_len = block_ptr - block_start; } if (block_len == 0) ; /* hole */ else if (unlikely(block_len > 2*PAGE_SIZE || (uncompressed && block_len > PAGE_SIZE))) { mutex_unlock(&read_mutex); pr_err("bad data blocksize %u\n", block_len); goto err; } else if (uncompressed) { memcpy(pgdata, cramfs_read(sb, block_start, block_len), block_len); bytes_filled = block_len; } else { bytes_filled = cramfs_uncompress_block(pgdata, PAGE_SIZE, cramfs_read(sb, block_start, block_len), block_len); } mutex_unlock(&read_mutex); if (unlikely(bytes_filled < 0)) goto err; } memset(pgdata + bytes_filled, 0, PAGE_SIZE - bytes_filled); flush_dcache_folio(folio); success = true; err: kunmap_local(pgdata); folio_end_read(folio, success); return 0; } static const struct address_space_operations cramfs_aops = { .read_folio = cramfs_read_folio }; /* * Our operations: */ /* * A directory can only readdir */ static const struct file_operations cramfs_directory_operations = { .llseek = generic_file_llseek, .read = generic_read_dir, .iterate_shared = cramfs_readdir, }; static const struct inode_operations cramfs_dir_inode_operations = { .lookup = cramfs_lookup, }; static const struct super_operations cramfs_ops = { .statfs = cramfs_statfs, }; static int cramfs_get_tree(struct fs_context *fc) { int ret = -ENOPROTOOPT; if (IS_ENABLED(CONFIG_CRAMFS_MTD)) { ret = get_tree_mtd(fc, cramfs_mtd_fill_super); if (!ret) return 0; } if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV)) ret = get_tree_bdev(fc, cramfs_blkdev_fill_super); return ret; } static const struct fs_context_operations cramfs_context_ops = { .get_tree = cramfs_get_tree, .reconfigure = cramfs_reconfigure, }; /* * Set up the filesystem mount context. */ static int cramfs_init_fs_context(struct fs_context *fc) { fc->ops = &cramfs_context_ops; return 0; } static struct file_system_type cramfs_fs_type = { .owner = THIS_MODULE, .name = "cramfs", .init_fs_context = cramfs_init_fs_context, .kill_sb = cramfs_kill_sb, .fs_flags = FS_REQUIRES_DEV, }; MODULE_ALIAS_FS("cramfs"); static int __init init_cramfs_fs(void) { int rv; rv = cramfs_uncompress_init(); if (rv < 0) return rv; rv = register_filesystem(&cramfs_fs_type); if (rv < 0) cramfs_uncompress_exit(); return rv; } static void __exit exit_cramfs_fs(void) { cramfs_uncompress_exit(); unregister_filesystem(&cramfs_fs_type); } module_init(init_cramfs_fs) module_exit(exit_cramfs_fs) MODULE_DESCRIPTION("Compressed ROM file system support"); MODULE_LICENSE("GPL"); |
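To make the extended block-pointer layout walked by cramfs_read_folio() above easier to follow, here is a small standalone sketch (not part of the cramfs sources) that decodes one block pointer from an in-memory image. The flag values mirror uapi/linux/cramfs_fs.h, the page size is assumed to be 4096, and the "previous pointer is itself a direct pointer" corner case handled by the kernel code is deliberately omitted; all names are hypothetical.

/* Sketch only: decode one cramfs block pointer from an in-memory image. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLK_FLAG_UNCOMPRESSED	(1U << 31)	/* mirrors CRAMFS_BLK_FLAG_UNCOMPRESSED */
#define BLK_FLAG_DIRECT_PTR	(1U << 30)	/* mirrors CRAMFS_BLK_FLAG_DIRECT_PTR */
#define BLK_FLAGS		(BLK_FLAG_UNCOMPRESSED | BLK_FLAG_DIRECT_PTR)
#define BLK_DIRECT_PTR_SHIFT	2
#define BLK_PAGE_SIZE		4096u		/* assumed page size */

struct cramfs_extent {
	uint32_t start;		/* byte offset of the block data in the image */
	uint32_t len;		/* stored length (compressed unless raw) */
	bool raw;		/* true if stored uncompressed */
};

/*
 * image:      whole cramfs image mapped in memory (native byte order)
 * blkptr_off: offset of the inode's block pointer table
 * index:      page index within the file
 * maxblock:   number of block pointers for the file
 */
static struct cramfs_extent cramfs_decode_block(const uint8_t *image,
						uint32_t blkptr_off,
						uint32_t index,
						uint32_t maxblock)
{
	struct cramfs_extent e = { 0, 0, false };
	uint32_t ptr, start;
	uint16_t hdr_len;

	memcpy(&ptr, image + blkptr_off + index * 4, 4);
	e.raw = ptr & BLK_FLAG_UNCOMPRESSED;

	if (ptr & BLK_FLAG_DIRECT_PTR) {
		/* Absolute start address, stored shifted right by 2 bits */
		e.start = (ptr & ~BLK_FLAGS) << BLK_DIRECT_PTR_SHIFT;
		if (e.raw) {
			/* raw block: one page (last block is capped to i_size) */
			e.len = BLK_PAGE_SIZE;
		} else {
			/* compressed size sits in the first two bytes */
			memcpy(&hdr_len, image + e.start, 2);
			e.start += 2;
			e.len = hdr_len;
		}
	} else {
		/* Classic layout: the pointer is one past the end of the block */
		if (index == 0)
			start = blkptr_off + maxblock * 4;
		else
			memcpy(&start, image + blkptr_off + (index - 1) * 4, 4);
		e.start = start & ~BLK_FLAGS;
		e.len = (ptr & ~BLK_FLAGS) - e.start;
	}
	return e;
}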
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/bitmap.h>
#include <linux/bug.h>
#include <linux/export.h>
#include <linux/idr.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/xarray.h>

/**
 * idr_alloc_u32() - Allocate an ID.
 * @idr: IDR handle.
 * @ptr: Pointer to be associated with the new ID.
 * @nextid: Pointer to an ID.
 * @max: The maximum ID to allocate (inclusive).
 * @gfp: Memory allocation flags.
 *
 * Allocates an unused ID in the range specified by @nextid and @max.
 * Note that @max is inclusive whereas the @end parameter to idr_alloc()
 * is exclusive.
The new ID is assigned to @nextid before the pointer * is inserted into the IDR, so if @nextid points into the object pointed * to by @ptr, a concurrent lookup will not find an uninitialised ID. * * The caller should provide their own locking to ensure that two * concurrent modifications to the IDR are not possible. Read-only * accesses to the IDR may be done under the RCU read lock or may * exclude simultaneous writers. * * Return: 0 if an ID was allocated, -ENOMEM if memory allocation failed, * or -ENOSPC if no free IDs could be found. If an error occurred, * @nextid is unchanged. */ int idr_alloc_u32(struct idr *idr, void *ptr, u32 *nextid, unsigned long max, gfp_t gfp) { struct radix_tree_iter iter; void __rcu **slot; unsigned int base = idr->idr_base; unsigned int id = *nextid; if (WARN_ON_ONCE(!(idr->idr_rt.xa_flags & ROOT_IS_IDR))) idr->idr_rt.xa_flags |= IDR_RT_MARKER; id = (id < base) ? 0 : id - base; radix_tree_iter_init(&iter, id); slot = idr_get_free(&idr->idr_rt, &iter, gfp, max - base); if (IS_ERR(slot)) return PTR_ERR(slot); *nextid = iter.index + base; /* there is a memory barrier inside radix_tree_iter_replace() */ radix_tree_iter_replace(&idr->idr_rt, &iter, slot, ptr); radix_tree_iter_tag_clear(&idr->idr_rt, &iter, IDR_FREE); return 0; } EXPORT_SYMBOL_GPL(idr_alloc_u32); /** * idr_alloc() - Allocate an ID. * @idr: IDR handle. * @ptr: Pointer to be associated with the new ID. * @start: The minimum ID (inclusive). * @end: The maximum ID (exclusive). * @gfp: Memory allocation flags. * * Allocates an unused ID in the range specified by @start and @end. If * @end is <= 0, it is treated as one larger than %INT_MAX. This allows * callers to use @start + N as @end as long as N is within integer range. * * The caller should provide their own locking to ensure that two * concurrent modifications to the IDR are not possible. Read-only * accesses to the IDR may be done under the RCU read lock or may * exclude simultaneous writers. * * Return: The newly allocated ID, -ENOMEM if memory allocation failed, * or -ENOSPC if no free IDs could be found. */ int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp) { u32 id = start; int ret; if (WARN_ON_ONCE(start < 0)) return -EINVAL; ret = idr_alloc_u32(idr, ptr, &id, end > 0 ? end - 1 : INT_MAX, gfp); if (ret) return ret; return id; } EXPORT_SYMBOL_GPL(idr_alloc); /** * idr_alloc_cyclic() - Allocate an ID cyclically. * @idr: IDR handle. * @ptr: Pointer to be associated with the new ID. * @start: The minimum ID (inclusive). * @end: The maximum ID (exclusive). * @gfp: Memory allocation flags. * * Allocates an unused ID in the range specified by @start and @end. If * @end is <= 0, it is treated as one larger than %INT_MAX. This allows * callers to use @start + N as @end as long as N is within integer range. * The search for an unused ID will start at the last ID allocated and will * wrap around to @start if no free IDs are found before reaching @end. * * The caller should provide their own locking to ensure that two * concurrent modifications to the IDR are not possible. Read-only * accesses to the IDR may be done under the RCU read lock or may * exclude simultaneous writers. * * Return: The newly allocated ID, -ENOMEM if memory allocation failed, * or -ENOSPC if no free IDs could be found. */ int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end, gfp_t gfp) { u32 id = idr->idr_next; int err, max = end > 0 ? 
end - 1 : INT_MAX; if ((int)id < start) id = start; err = idr_alloc_u32(idr, ptr, &id, max, gfp); if ((err == -ENOSPC) && (id > start)) { id = start; err = idr_alloc_u32(idr, ptr, &id, max, gfp); } if (err) return err; idr->idr_next = id + 1; return id; } EXPORT_SYMBOL(idr_alloc_cyclic); /** * idr_remove() - Remove an ID from the IDR. * @idr: IDR handle. * @id: Pointer ID. * * Removes this ID from the IDR. If the ID was not previously in the IDR, * this function returns %NULL. * * Since this function modifies the IDR, the caller should provide their * own locking to ensure that concurrent modification of the same IDR is * not possible. * * Return: The pointer formerly associated with this ID. */ void *idr_remove(struct idr *idr, unsigned long id) { return radix_tree_delete_item(&idr->idr_rt, id - idr->idr_base, NULL); } EXPORT_SYMBOL_GPL(idr_remove); /** * idr_find() - Return pointer for given ID. * @idr: IDR handle. * @id: Pointer ID. * * Looks up the pointer associated with this ID. A %NULL pointer may * indicate that @id is not allocated or that the %NULL pointer was * associated with this ID. * * This function can be called under rcu_read_lock(), given that the leaf * pointers lifetimes are correctly managed. * * Return: The pointer associated with this ID. */ void *idr_find(const struct idr *idr, unsigned long id) { return radix_tree_lookup(&idr->idr_rt, id - idr->idr_base); } EXPORT_SYMBOL_GPL(idr_find); /** * idr_for_each() - Iterate through all stored pointers. * @idr: IDR handle. * @fn: Function to be called for each pointer. * @data: Data passed to callback function. * * The callback function will be called for each entry in @idr, passing * the ID, the entry and @data. * * If @fn returns anything other than %0, the iteration stops and that * value is returned from this function. * * idr_for_each() can be called concurrently with idr_alloc() and * idr_remove() if protected by RCU. Newly added entries may not be * seen and deleted entries may be seen, but adding and removing entries * will not cause other entries to be skipped, nor spurious ones to be seen. */ int idr_for_each(const struct idr *idr, int (*fn)(int id, void *p, void *data), void *data) { struct radix_tree_iter iter; void __rcu **slot; int base = idr->idr_base; radix_tree_for_each_slot(slot, &idr->idr_rt, &iter, 0) { int ret; unsigned long id = iter.index + base; if (WARN_ON_ONCE(id > INT_MAX)) break; ret = fn(id, rcu_dereference_raw(*slot), data); if (ret) return ret; } return 0; } EXPORT_SYMBOL(idr_for_each); /** * idr_get_next_ul() - Find next populated entry. * @idr: IDR handle. * @nextid: Pointer to an ID. * * Returns the next populated entry in the tree with an ID greater than * or equal to the value pointed to by @nextid. On exit, @nextid is updated * to the ID of the found value. To use in a loop, the value pointed to by * nextid must be incremented by the user. */ void *idr_get_next_ul(struct idr *idr, unsigned long *nextid) { struct radix_tree_iter iter; void __rcu **slot; void *entry = NULL; unsigned long base = idr->idr_base; unsigned long id = *nextid; id = (id < base) ? 0 : id - base; radix_tree_for_each_slot(slot, &idr->idr_rt, &iter, id) { entry = rcu_dereference_raw(*slot); if (!entry) continue; if (!xa_is_internal(entry)) break; if (slot != &idr->idr_rt.xa_head && !xa_is_retry(entry)) break; slot = radix_tree_iter_retry(&iter); } if (!slot) return NULL; *nextid = iter.index + base; return entry; } EXPORT_SYMBOL(idr_get_next_ul); /** * idr_get_next() - Find next populated entry. * @idr: IDR handle. 
* @nextid: Pointer to an ID. * * Returns the next populated entry in the tree with an ID greater than * or equal to the value pointed to by @nextid. On exit, @nextid is updated * to the ID of the found value. To use in a loop, the value pointed to by * nextid must be incremented by the user. */ void *idr_get_next(struct idr *idr, int *nextid) { unsigned long id = *nextid; void *entry = idr_get_next_ul(idr, &id); if (WARN_ON_ONCE(id > INT_MAX)) return NULL; *nextid = id; return entry; } EXPORT_SYMBOL(idr_get_next); /** * idr_replace() - replace pointer for given ID. * @idr: IDR handle. * @ptr: New pointer to associate with the ID. * @id: ID to change. * * Replace the pointer registered with an ID and return the old value. * This function can be called under the RCU read lock concurrently with * idr_alloc() and idr_remove() (as long as the ID being removed is not * the one being replaced!). * * Returns: the old value on success. %-ENOENT indicates that @id was not * found. %-EINVAL indicates that @ptr was not valid. */ void *idr_replace(struct idr *idr, void *ptr, unsigned long id) { struct radix_tree_node *node; void __rcu **slot = NULL; void *entry; id -= idr->idr_base; entry = __radix_tree_lookup(&idr->idr_rt, id, &node, &slot); if (!slot || radix_tree_tag_get(&idr->idr_rt, id, IDR_FREE)) return ERR_PTR(-ENOENT); __radix_tree_replace(&idr->idr_rt, node, slot, ptr); return entry; } EXPORT_SYMBOL(idr_replace); /** * DOC: IDA description * * The IDA is an ID allocator which does not provide the ability to * associate an ID with a pointer. As such, it only needs to store one * bit per ID, and so is more space efficient than an IDR. To use an IDA, * define it using DEFINE_IDA() (or embed a &struct ida in a data structure, * then initialise it using ida_init()). To allocate a new ID, call * ida_alloc(), ida_alloc_min(), ida_alloc_max() or ida_alloc_range(). * To free an ID, call ida_free(). * * ida_destroy() can be used to dispose of an IDA without needing to * free the individual IDs in it. You can use ida_is_empty() to find * out whether the IDA has any IDs currently allocated. * * The IDA handles its own locking. It is safe to call any of the IDA * functions without synchronisation in your code. * * IDs are currently limited to the range [0-INT_MAX]. If this is an awkward * limitation, it should be quite straightforward to raise the maximum. */ /* * Developer's notes: * * The IDA uses the functionality provided by the XArray to store bitmaps in * each entry. The XA_FREE_MARK is only cleared when all bits in the bitmap * have been set. * * I considered telling the XArray that each slot is an order-10 node * and indexing by bit number, but the XArray can't allow a single multi-index * entry in the head, which would significantly increase memory consumption * for the IDA. So instead we divide the index by the number of bits in the * leaf bitmap before doing a radix tree lookup. * * As an optimisation, if there are only a few low bits set in any given * leaf, instead of allocating a 128-byte bitmap, we store the bits * as a value entry. Value entries never have the XA_FREE_MARK cleared * because we can always convert them into a bitmap entry. * * It would be possible to optimise further; once we've run out of a * single 128-byte bitmap, we currently switch to a 576-byte node, put * the 128-byte bitmap in the first entry and then start allocating extra * 128-byte entries. We could instead use the 512 bytes of the node's * data as a bitmap before moving to that scheme. 
I do not believe this * is a worthwhile optimisation; Rasmus Villemoes surveyed the current * users of the IDA and almost none of them use more than 1024 entries. * Those that do use more than the 8192 IDs that the 512 bytes would * provide. * * The IDA always uses a lock to alloc/free. If we add a 'test_bit' * equivalent, it will still need locking. Going to RCU lookup would require * using RCU to free bitmaps, and that's not trivial without embedding an * RCU head in the bitmap, which adds a 2-pointer overhead to each 128-byte * bitmap, which is excessive. */ /** * ida_alloc_range() - Allocate an unused ID. * @ida: IDA handle. * @min: Lowest ID to allocate. * @max: Highest ID to allocate. * @gfp: Memory allocation flags. * * Allocate an ID between @min and @max, inclusive. The allocated ID will * not exceed %INT_MAX, even if @max is larger. * * Context: Any context. It is safe to call this function without * locking in your code. * Return: The allocated ID, or %-ENOMEM if memory could not be allocated, * or %-ENOSPC if there are no free IDs. */ int ida_alloc_range(struct ida *ida, unsigned int min, unsigned int max, gfp_t gfp) { XA_STATE(xas, &ida->xa, min / IDA_BITMAP_BITS); unsigned bit = min % IDA_BITMAP_BITS; unsigned long flags; struct ida_bitmap *bitmap, *alloc = NULL; if ((int)min < 0) return -ENOSPC; if ((int)max < 0) max = INT_MAX; retry: xas_lock_irqsave(&xas, flags); next: bitmap = xas_find_marked(&xas, max / IDA_BITMAP_BITS, XA_FREE_MARK); if (xas.xa_index > min / IDA_BITMAP_BITS) bit = 0; if (xas.xa_index * IDA_BITMAP_BITS + bit > max) goto nospc; if (xa_is_value(bitmap)) { unsigned long tmp = xa_to_value(bitmap); if (bit < BITS_PER_XA_VALUE) { bit = find_next_zero_bit(&tmp, BITS_PER_XA_VALUE, bit); if (xas.xa_index * IDA_BITMAP_BITS + bit > max) goto nospc; if (bit < BITS_PER_XA_VALUE) { tmp |= 1UL << bit; xas_store(&xas, xa_mk_value(tmp)); goto out; } } bitmap = alloc; if (!bitmap) bitmap = kzalloc(sizeof(*bitmap), GFP_NOWAIT); if (!bitmap) goto alloc; bitmap->bitmap[0] = tmp; xas_store(&xas, bitmap); if (xas_error(&xas)) { bitmap->bitmap[0] = 0; goto out; } } if (bitmap) { bit = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, bit); if (xas.xa_index * IDA_BITMAP_BITS + bit > max) goto nospc; if (bit == IDA_BITMAP_BITS) goto next; __set_bit(bit, bitmap->bitmap); if (bitmap_full(bitmap->bitmap, IDA_BITMAP_BITS)) xas_clear_mark(&xas, XA_FREE_MARK); } else { if (bit < BITS_PER_XA_VALUE) { bitmap = xa_mk_value(1UL << bit); } else { bitmap = alloc; if (!bitmap) bitmap = kzalloc(sizeof(*bitmap), GFP_NOWAIT); if (!bitmap) goto alloc; __set_bit(bit, bitmap->bitmap); } xas_store(&xas, bitmap); } out: xas_unlock_irqrestore(&xas, flags); if (xas_nomem(&xas, gfp)) { xas.xa_index = min / IDA_BITMAP_BITS; bit = min % IDA_BITMAP_BITS; goto retry; } if (bitmap != alloc) kfree(alloc); if (xas_error(&xas)) return xas_error(&xas); return xas.xa_index * IDA_BITMAP_BITS + bit; alloc: xas_unlock_irqrestore(&xas, flags); alloc = kzalloc(sizeof(*bitmap), gfp); if (!alloc) return -ENOMEM; xas_set(&xas, min / IDA_BITMAP_BITS); bit = min % IDA_BITMAP_BITS; goto retry; nospc: xas_unlock_irqrestore(&xas, flags); kfree(alloc); return -ENOSPC; } EXPORT_SYMBOL(ida_alloc_range); /** * ida_find_first_range - Get the lowest used ID. * @ida: IDA handle. * @min: Lowest ID to get. * @max: Highest ID to get. * * Get the lowest used ID between @min and @max, inclusive. The returned * ID will not exceed %INT_MAX, even if @max is larger. * * Context: Any context. Takes and releases the xa_lock. 
* Return: The lowest used ID, or errno if no used ID is found. */ int ida_find_first_range(struct ida *ida, unsigned int min, unsigned int max) { unsigned long index = min / IDA_BITMAP_BITS; unsigned int offset = min % IDA_BITMAP_BITS; unsigned long *addr, size, bit; unsigned long tmp = 0; unsigned long flags; void *entry; int ret; if ((int)min < 0) return -EINVAL; if ((int)max < 0) max = INT_MAX; xa_lock_irqsave(&ida->xa, flags); entry = xa_find(&ida->xa, &index, max / IDA_BITMAP_BITS, XA_PRESENT); if (!entry) { ret = -ENOENT; goto err_unlock; } if (index > min / IDA_BITMAP_BITS) offset = 0; if (index * IDA_BITMAP_BITS + offset > max) { ret = -ENOENT; goto err_unlock; } if (xa_is_value(entry)) { tmp = xa_to_value(entry); addr = &tmp; size = BITS_PER_XA_VALUE; } else { addr = ((struct ida_bitmap *)entry)->bitmap; size = IDA_BITMAP_BITS; } bit = find_next_bit(addr, size, offset); xa_unlock_irqrestore(&ida->xa, flags); if (bit == size || index * IDA_BITMAP_BITS + bit > max) return -ENOENT; return index * IDA_BITMAP_BITS + bit; err_unlock: xa_unlock_irqrestore(&ida->xa, flags); return ret; } EXPORT_SYMBOL(ida_find_first_range); /** * ida_free() - Release an allocated ID. * @ida: IDA handle. * @id: Previously allocated ID. * * Context: Any context. It is safe to call this function without * locking in your code. */ void ida_free(struct ida *ida, unsigned int id) { XA_STATE(xas, &ida->xa, id / IDA_BITMAP_BITS); unsigned bit = id % IDA_BITMAP_BITS; struct ida_bitmap *bitmap; unsigned long flags; if ((int)id < 0) return; xas_lock_irqsave(&xas, flags); bitmap = xas_load(&xas); if (xa_is_value(bitmap)) { unsigned long v = xa_to_value(bitmap); if (bit >= BITS_PER_XA_VALUE) goto err; if (!(v & (1UL << bit))) goto err; v &= ~(1UL << bit); if (!v) goto delete; xas_store(&xas, xa_mk_value(v)); } else { if (!bitmap || !test_bit(bit, bitmap->bitmap)) goto err; __clear_bit(bit, bitmap->bitmap); xas_set_mark(&xas, XA_FREE_MARK); if (bitmap_empty(bitmap->bitmap, IDA_BITMAP_BITS)) { kfree(bitmap); delete: xas_store(&xas, NULL); } } xas_unlock_irqrestore(&xas, flags); return; err: xas_unlock_irqrestore(&xas, flags); WARN(1, "ida_free called for id=%d which is not allocated.\n", id); } EXPORT_SYMBOL(ida_free); /** * ida_destroy() - Free all IDs. * @ida: IDA handle. * * Calling this function frees all IDs and releases all resources used * by an IDA. When this call returns, the IDA is empty and can be reused * or freed. If the IDA is already empty, there is no need to call this * function. * * Context: Any context. It is safe to call this function without * locking in your code. 
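 *
 * As a sketch of a typical teardown path (the &struct ida name below is
 * hypothetical):
 *
 *	ida_destroy(&example_ida);
 *
 * after which ida_is_empty(&example_ida) returns true and the IDA may be
 * reused or freed.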
*/ void ida_destroy(struct ida *ida) { XA_STATE(xas, &ida->xa, 0); struct ida_bitmap *bitmap; unsigned long flags; xas_lock_irqsave(&xas, flags); xas_for_each(&xas, bitmap, ULONG_MAX) { if (!xa_is_value(bitmap)) kfree(bitmap); xas_store(&xas, NULL); } xas_unlock_irqrestore(&xas, flags); } EXPORT_SYMBOL(ida_destroy); #ifndef __KERNEL__ extern void xa_dump_index(unsigned long index, unsigned int shift); #define IDA_CHUNK_SHIFT ilog2(IDA_BITMAP_BITS) static void ida_dump_entry(void *entry, unsigned long index) { unsigned long i; if (!entry) return; if (xa_is_node(entry)) { struct xa_node *node = xa_to_node(entry); unsigned int shift = node->shift + IDA_CHUNK_SHIFT + XA_CHUNK_SHIFT; xa_dump_index(index * IDA_BITMAP_BITS, shift); xa_dump_node(node); for (i = 0; i < XA_CHUNK_SIZE; i++) ida_dump_entry(node->slots[i], index | (i << node->shift)); } else if (xa_is_value(entry)) { xa_dump_index(index * IDA_BITMAP_BITS, ilog2(BITS_PER_LONG)); pr_cont("value: data %lx [%px]\n", xa_to_value(entry), entry); } else { struct ida_bitmap *bitmap = entry; xa_dump_index(index * IDA_BITMAP_BITS, IDA_CHUNK_SHIFT); pr_cont("bitmap: %p data", bitmap); for (i = 0; i < IDA_BITMAP_LONGS; i++) pr_cont(" %lx", bitmap->bitmap[i]); pr_cont("\n"); } } static void ida_dump(struct ida *ida) { struct xarray *xa = &ida->xa; pr_debug("ida: %p node %p free %d\n", ida, xa->xa_head, xa->xa_flags >> ROOT_TAG_SHIFT); ida_dump_entry(xa->xa_head, 0); } #endif
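/*
 * A minimal usage sketch of the IDA API described in the "IDA description"
 * DOC block above.  Everything named example_* below is hypothetical; only
 * DEFINE_IDA(), ida_alloc_range() and ida_free() are real API.
 */

static DEFINE_IDA(example_minor_ida);

struct example_dev {
	int minor;
};

/* Reserve a free minor number in [0, 255]; may sleep with GFP_KERNEL. */
static int example_dev_register(struct example_dev *dev)
{
	int id = ida_alloc_range(&example_minor_ida, 0, 255, GFP_KERNEL);

	if (id < 0)
		return id;	/* -ENOMEM or -ENOSPC */
	dev->minor = id;
	return 0;
}

/* Return the minor so a later example_dev_register() can reuse it. */
static void example_dev_unregister(struct example_dev *dev)
{
	ida_free(&example_minor_ida, dev->minor);
}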
// SPDX-License-Identifier: GPL-2.0-or-later /* * Bridge multicast support. * * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au> */ #include <linux/err.h> #include <linux/export.h> #include <linux/if_ether.h> #include <linux/igmp.h> #include <linux/in.h> #include <linux/jhash.h> #include <linux/kernel.h> #include <linux/log2.h> #include <linux/netdevice.h> #include <linux/netfilter_bridge.h> #include <linux/random.h> #include <linux/rculist.h> #include <linux/skbuff.h> #include <linux/slab.h> #include <linux/timer.h> #include <linux/inetdevice.h> #include <linux/mroute.h> #include <net/ip.h> #include <net/switchdev.h> #if IS_ENABLED(CONFIG_IPV6) #include <linux/icmpv6.h> #include <net/ipv6.h> #include <net/mld.h> #include <net/ip6_checksum.h> #include <net/addrconf.h> #endif #include <trace/events/bridge.h> #include "br_private.h" #include "br_private_mcast_eht.h" static const struct rhashtable_params br_mdb_rht_params = { .head_offset = offsetof(struct net_bridge_mdb_entry, rhnode), .key_offset = offsetof(struct net_bridge_mdb_entry, addr), .key_len = sizeof(struct br_ip), .automatic_shrinking = true, }; static const struct rhashtable_params br_sg_port_rht_params = { .head_offset = offsetof(struct net_bridge_port_group, rhnode), .key_offset = offsetof(struct net_bridge_port_group, key), .key_len = sizeof(struct net_bridge_port_group_sg_key), .automatic_shrinking = true, }; static void br_multicast_start_querier(struct net_bridge_mcast *brmctx, struct bridge_mcast_own_query *query); static void br_ip4_multicast_add_router(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx); static void br_ip4_multicast_leave_group(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, __be32 group, __u16 vid, const unsigned char *src); static void br_multicast_port_group_rexmit(struct timer_list *t); static void br_multicast_rport_del_notify(struct net_bridge_mcast_port *pmctx, bool deleted); static void br_ip6_multicast_add_router(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx); #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_leave_group(struct net_bridge_mcast *brmctx, struct
net_bridge_mcast_port *pmctx, const struct in6_addr *group, __u16 vid, const unsigned char *src); #endif static struct net_bridge_port_group * __br_multicast_add_group(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct br_ip *group, const unsigned char *src, u8 filter_mode, bool igmpv2_mldv1, bool blocked); static void br_multicast_find_del_pg(struct net_bridge *br, struct net_bridge_port_group *pg); static void __br_multicast_stop(struct net_bridge_mcast *brmctx); static int br_mc_disabled_update(struct net_device *dev, bool value, struct netlink_ext_ack *extack); static struct net_bridge_port_group * br_sg_port_find(struct net_bridge *br, struct net_bridge_port_group_sg_key *sg_p) { lockdep_assert_held_once(&br->multicast_lock); return rhashtable_lookup_fast(&br->sg_port_tbl, sg_p, br_sg_port_rht_params); } static struct net_bridge_mdb_entry *br_mdb_ip_get_rcu(struct net_bridge *br, struct br_ip *dst) { return rhashtable_lookup(&br->mdb_hash_tbl, dst, br_mdb_rht_params); } struct net_bridge_mdb_entry *br_mdb_ip_get(struct net_bridge *br, struct br_ip *dst) { struct net_bridge_mdb_entry *ent; lockdep_assert_held_once(&br->multicast_lock); rcu_read_lock(); ent = rhashtable_lookup(&br->mdb_hash_tbl, dst, br_mdb_rht_params); rcu_read_unlock(); return ent; } static struct net_bridge_mdb_entry *br_mdb_ip4_get(struct net_bridge *br, __be32 dst, __u16 vid) { struct br_ip br_dst; memset(&br_dst, 0, sizeof(br_dst)); br_dst.dst.ip4 = dst; br_dst.proto = htons(ETH_P_IP); br_dst.vid = vid; return br_mdb_ip_get(br, &br_dst); } #if IS_ENABLED(CONFIG_IPV6) static struct net_bridge_mdb_entry *br_mdb_ip6_get(struct net_bridge *br, const struct in6_addr *dst, __u16 vid) { struct br_ip br_dst; memset(&br_dst, 0, sizeof(br_dst)); br_dst.dst.ip6 = *dst; br_dst.proto = htons(ETH_P_IPV6); br_dst.vid = vid; return br_mdb_ip_get(br, &br_dst); } #endif struct net_bridge_mdb_entry * br_mdb_entry_skb_get(struct net_bridge_mcast *brmctx, struct sk_buff *skb, u16 vid) { struct net_bridge *br = brmctx->br; struct br_ip ip; if (!br_opt_get(br, BROPT_MULTICAST_ENABLED) || br_multicast_ctx_vlan_global_disabled(brmctx)) return NULL; if (BR_INPUT_SKB_CB(skb)->igmp) return NULL; memset(&ip, 0, sizeof(ip)); ip.proto = skb->protocol; ip.vid = vid; switch (skb->protocol) { case htons(ETH_P_IP): ip.dst.ip4 = ip_hdr(skb)->daddr; if (brmctx->multicast_igmp_version == 3) { struct net_bridge_mdb_entry *mdb; ip.src.ip4 = ip_hdr(skb)->saddr; mdb = br_mdb_ip_get_rcu(br, &ip); if (mdb) return mdb; ip.src.ip4 = 0; } break; #if IS_ENABLED(CONFIG_IPV6) case htons(ETH_P_IPV6): ip.dst.ip6 = ipv6_hdr(skb)->daddr; if (brmctx->multicast_mld_version == 2) { struct net_bridge_mdb_entry *mdb; ip.src.ip6 = ipv6_hdr(skb)->saddr; mdb = br_mdb_ip_get_rcu(br, &ip); if (mdb) return mdb; memset(&ip.src.ip6, 0, sizeof(ip.src.ip6)); } break; #endif default: ip.proto = 0; ether_addr_copy(ip.dst.mac_addr, eth_hdr(skb)->h_dest); } return br_mdb_ip_get_rcu(br, &ip); } /* IMPORTANT: this function must be used only when the contexts cannot be * passed down (e.g. timer) and must be used for read-only purposes because * the vlan snooping option can change, so it can return any context * (non-vlan or vlan). Its initial intended purpose is to read timer values * from the *current* context based on the option. At worst that could lead * to inconsistent timers when the contexts are changed, i.e. 
src timer * which needs to re-arm with a specific delay taken from the old context */ static struct net_bridge_mcast_port * br_multicast_pg_to_port_ctx(const struct net_bridge_port_group *pg) { struct net_bridge_mcast_port *pmctx = &pg->key.port->multicast_ctx; struct net_bridge_vlan *vlan; lockdep_assert_held_once(&pg->key.port->br->multicast_lock); /* if vlan snooping is disabled use the port's multicast context */ if (!pg->key.addr.vid || !br_opt_get(pg->key.port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED)) goto out; /* locking is tricky here, due to different rules for multicast and * vlans we need to take rcu to find the vlan and make sure it has * the BR_VLFLAG_MCAST_ENABLED flag set, it can only change under * multicast_lock which must be already held here, so the vlan's pmctx * can safely be used on return */ rcu_read_lock(); vlan = br_vlan_find(nbp_vlan_group_rcu(pg->key.port), pg->key.addr.vid); if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx)) pmctx = &vlan->port_mcast_ctx; else pmctx = NULL; rcu_read_unlock(); out: return pmctx; } static struct net_bridge_mcast_port * br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid) { struct net_bridge_mcast_port *pmctx = NULL; struct net_bridge_vlan *vlan; lockdep_assert_held_once(&port->br->multicast_lock); if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED)) return NULL; /* Take RCU to access the vlan. */ rcu_read_lock(); vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid); if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx)) pmctx = &vlan->port_mcast_ctx; rcu_read_unlock(); return pmctx; } /* when snooping we need to check if the contexts should be used * in the following order: * - if pmctx is non-NULL (port), check if it should be used * - if pmctx is NULL (bridge), check if brmctx should be used */ static bool br_multicast_ctx_should_use(const struct net_bridge_mcast *brmctx, const struct net_bridge_mcast_port *pmctx) { if (!netif_running(brmctx->br->dev)) return false; if (pmctx) return !br_multicast_port_ctx_state_disabled(pmctx); else return !br_multicast_ctx_vlan_disabled(brmctx); } static bool br_port_group_equal(struct net_bridge_port_group *p, struct net_bridge_port *port, const unsigned char *src) { if (p->key.port != port) return false; if (!(port->flags & BR_MULTICAST_TO_UNICAST)) return true; return ether_addr_equal(src, p->eth_addr); } static void __fwd_add_star_excl(struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, struct br_ip *sg_ip) { struct net_bridge_port_group_sg_key sg_key; struct net_bridge_port_group *src_pg; struct net_bridge_mcast *brmctx; memset(&sg_key, 0, sizeof(sg_key)); brmctx = br_multicast_port_ctx_get_global(pmctx); sg_key.port = pg->key.port; sg_key.addr = *sg_ip; if (br_sg_port_find(brmctx->br, &sg_key)) return; src_pg = __br_multicast_add_group(brmctx, pmctx, sg_ip, pg->eth_addr, MCAST_INCLUDE, false, false); if (IS_ERR_OR_NULL(src_pg) || src_pg->rt_protocol != RTPROT_KERNEL) return; src_pg->flags |= MDB_PG_FLAGS_STAR_EXCL; } static void __fwd_del_star_excl(struct net_bridge_port_group *pg, struct br_ip *sg_ip) { struct net_bridge_port_group_sg_key sg_key; struct net_bridge *br = pg->key.port->br; struct net_bridge_port_group *src_pg; memset(&sg_key, 0, sizeof(sg_key)); sg_key.port = pg->key.port; sg_key.addr = *sg_ip; src_pg = br_sg_port_find(br, &sg_key); if (!src_pg || !(src_pg->flags & MDB_PG_FLAGS_STAR_EXCL) || src_pg->rt_protocol != RTPROT_KERNEL) return; br_multicast_find_del_pg(br, src_pg); } /* When a 
port group transitions to (or is added as) EXCLUDE we need to add it * to all other ports' S,G entries which are not blocked by the current group * for proper replication, the assumption is that any S,G blocked entries * are already added so the S,G,port lookup should skip them. * When a port group transitions from EXCLUDE -> INCLUDE mode or is being * deleted we need to remove it from all ports' S,G entries where it was * automatically installed before (i.e. where it's MDB_PG_FLAGS_STAR_EXCL). */ void br_multicast_star_g_handle_mode(struct net_bridge_port_group *pg, u8 filter_mode) { struct net_bridge *br = pg->key.port->br; struct net_bridge_port_group *pg_lst; struct net_bridge_mcast_port *pmctx; struct net_bridge_mdb_entry *mp; struct br_ip sg_ip; if (WARN_ON(!br_multicast_is_star_g(&pg->key.addr))) return; mp = br_mdb_ip_get(br, &pg->key.addr); if (!mp) return; pmctx = br_multicast_pg_to_port_ctx(pg); if (!pmctx) return; memset(&sg_ip, 0, sizeof(sg_ip)); sg_ip = pg->key.addr; for (pg_lst = mlock_dereference(mp->ports, br); pg_lst; pg_lst = mlock_dereference(pg_lst->next, br)) { struct net_bridge_group_src *src_ent; if (pg_lst == pg) continue; hlist_for_each_entry(src_ent, &pg_lst->src_list, node) { if (!(src_ent->flags & BR_SGRP_F_INSTALLED)) continue; sg_ip.src = src_ent->addr.src; switch (filter_mode) { case MCAST_INCLUDE: __fwd_del_star_excl(pg, &sg_ip); break; case MCAST_EXCLUDE: __fwd_add_star_excl(pmctx, pg, &sg_ip); break; } } } } /* called when adding a new S,G with host_joined == false by default */ static void br_multicast_sg_host_state(struct net_bridge_mdb_entry *star_mp, struct net_bridge_port_group *sg) { struct net_bridge_mdb_entry *sg_mp; if (WARN_ON(!br_multicast_is_star_g(&star_mp->addr))) return; if (!star_mp->host_joined) return; sg_mp = br_mdb_ip_get(star_mp->br, &sg->key.addr); if (!sg_mp) return; sg_mp->host_joined = true; } /* set the host_joined state of all of *,G's S,G entries */ static void br_multicast_star_g_host_state(struct net_bridge_mdb_entry *star_mp) { struct net_bridge *br = star_mp->br; struct net_bridge_mdb_entry *sg_mp; struct net_bridge_port_group *pg; struct br_ip sg_ip; if (WARN_ON(!br_multicast_is_star_g(&star_mp->addr))) return; memset(&sg_ip, 0, sizeof(sg_ip)); sg_ip = star_mp->addr; for (pg = mlock_dereference(star_mp->ports, br); pg; pg = mlock_dereference(pg->next, br)) { struct net_bridge_group_src *src_ent; hlist_for_each_entry(src_ent, &pg->src_list, node) { if (!(src_ent->flags & BR_SGRP_F_INSTALLED)) continue; sg_ip.src = src_ent->addr.src; sg_mp = br_mdb_ip_get(br, &sg_ip); if (!sg_mp) continue; sg_mp->host_joined = star_mp->host_joined; } } } static void br_multicast_sg_del_exclude_ports(struct net_bridge_mdb_entry *sgmp) { struct net_bridge_port_group __rcu **pp; struct net_bridge_port_group *p; /* *,G exclude ports are only added to S,G entries */ if (WARN_ON(br_multicast_is_star_g(&sgmp->addr))) return; /* we need the STAR_EXCLUDE ports if there are non-STAR_EXCLUDE ports * we should ignore perm entries since they're managed by user-space */ for (pp = &sgmp->ports; (p = mlock_dereference(*pp, sgmp->br)) != NULL; pp = &p->next) if (!(p->flags & (MDB_PG_FLAGS_STAR_EXCL | MDB_PG_FLAGS_PERMANENT))) return; /* currently the host can only have joined the *,G which means * we treat it as EXCLUDE {}, so for an S,G it's considered a * STAR_EXCLUDE entry and we can safely leave it */ sgmp->host_joined = false; for (pp = &sgmp->ports; (p = mlock_dereference(*pp, sgmp->br)) != NULL;) { if (!(p->flags & MDB_PG_FLAGS_PERMANENT)) 
br_multicast_del_pg(sgmp, p, pp); else pp = &p->next; } } void br_multicast_sg_add_exclude_ports(struct net_bridge_mdb_entry *star_mp, struct net_bridge_port_group *sg) { struct net_bridge_port_group_sg_key sg_key; struct net_bridge *br = star_mp->br; struct net_bridge_mcast_port *pmctx; struct net_bridge_port_group *pg; struct net_bridge_mcast *brmctx; if (WARN_ON(br_multicast_is_star_g(&sg->key.addr))) return; if (WARN_ON(!br_multicast_is_star_g(&star_mp->addr))) return; br_multicast_sg_host_state(star_mp, sg); memset(&sg_key, 0, sizeof(sg_key)); sg_key.addr = sg->key.addr; /* we need to add all exclude ports to the S,G */ for (pg = mlock_dereference(star_mp->ports, br); pg; pg = mlock_dereference(pg->next, br)) { struct net_bridge_port_group *src_pg; if (pg == sg || pg->filter_mode == MCAST_INCLUDE) continue; sg_key.port = pg->key.port; if (br_sg_port_find(br, &sg_key)) continue; pmctx = br_multicast_pg_to_port_ctx(pg); if (!pmctx) continue; brmctx = br_multicast_port_ctx_get_global(pmctx); src_pg = __br_multicast_add_group(brmctx, pmctx, &sg->key.addr, sg->eth_addr, MCAST_INCLUDE, false, false); if (IS_ERR_OR_NULL(src_pg) || src_pg->rt_protocol != RTPROT_KERNEL) continue; src_pg->flags |= MDB_PG_FLAGS_STAR_EXCL; } } static void br_multicast_fwd_src_add(struct net_bridge_group_src *src) { struct net_bridge_mdb_entry *star_mp; struct net_bridge_mcast_port *pmctx; struct net_bridge_port_group *sg; struct net_bridge_mcast *brmctx; struct br_ip sg_ip; if (src->flags & BR_SGRP_F_INSTALLED) return; memset(&sg_ip, 0, sizeof(sg_ip)); pmctx = br_multicast_pg_to_port_ctx(src->pg); if (!pmctx) return; brmctx = br_multicast_port_ctx_get_global(pmctx); sg_ip = src->pg->key.addr; sg_ip.src = src->addr.src; sg = __br_multicast_add_group(brmctx, pmctx, &sg_ip, src->pg->eth_addr, MCAST_INCLUDE, false, !timer_pending(&src->timer)); if (IS_ERR_OR_NULL(sg)) return; src->flags |= BR_SGRP_F_INSTALLED; sg->flags &= ~MDB_PG_FLAGS_STAR_EXCL; /* if it was added by user-space as perm we can skip next steps */ if (sg->rt_protocol != RTPROT_KERNEL && (sg->flags & MDB_PG_FLAGS_PERMANENT)) return; /* the kernel is now responsible for removing this S,G */ timer_delete(&sg->timer); star_mp = br_mdb_ip_get(src->br, &src->pg->key.addr); if (!star_mp) return; br_multicast_sg_add_exclude_ports(star_mp, sg); } static void br_multicast_fwd_src_remove(struct net_bridge_group_src *src, bool fastleave) { struct net_bridge_port_group *p, *pg = src->pg; struct net_bridge_port_group __rcu **pp; struct net_bridge_mdb_entry *mp; struct br_ip sg_ip; memset(&sg_ip, 0, sizeof(sg_ip)); sg_ip = pg->key.addr; sg_ip.src = src->addr.src; mp = br_mdb_ip_get(src->br, &sg_ip); if (!mp) return; for (pp = &mp->ports; (p = mlock_dereference(*pp, src->br)) != NULL; pp = &p->next) { if (!br_port_group_equal(p, pg->key.port, pg->eth_addr)) continue; if (p->rt_protocol != RTPROT_KERNEL && (p->flags & MDB_PG_FLAGS_PERMANENT) && !(src->flags & BR_SGRP_F_USER_ADDED)) break; if (fastleave) p->flags |= MDB_PG_FLAGS_FAST_LEAVE; br_multicast_del_pg(mp, p, pp); break; } src->flags &= ~BR_SGRP_F_INSTALLED; } /* install S,G and based on src's timer enable or disable forwarding */ static void br_multicast_fwd_src_handle(struct net_bridge_group_src *src) { struct net_bridge_port_group_sg_key sg_key; struct net_bridge_port_group *sg; u8 old_flags; br_multicast_fwd_src_add(src); memset(&sg_key, 0, sizeof(sg_key)); sg_key.addr = src->pg->key.addr; sg_key.addr.src = src->addr.src; sg_key.port = src->pg->key.port; sg = br_sg_port_find(src->br, &sg_key); if (!sg || 
(sg->flags & MDB_PG_FLAGS_PERMANENT)) return; old_flags = sg->flags; if (timer_pending(&src->timer)) sg->flags &= ~MDB_PG_FLAGS_BLOCKED; else sg->flags |= MDB_PG_FLAGS_BLOCKED; if (old_flags != sg->flags) { struct net_bridge_mdb_entry *sg_mp; sg_mp = br_mdb_ip_get(src->br, &sg_key.addr); if (!sg_mp) return; br_mdb_notify(src->br->dev, sg_mp, sg, RTM_NEWMDB); } } static void br_multicast_destroy_mdb_entry(struct net_bridge_mcast_gc *gc) { struct net_bridge_mdb_entry *mp; mp = container_of(gc, struct net_bridge_mdb_entry, mcast_gc); WARN_ON(!hlist_unhashed(&mp->mdb_node)); WARN_ON(mp->ports); timer_shutdown_sync(&mp->timer); kfree_rcu(mp, rcu); } static void br_multicast_del_mdb_entry(struct net_bridge_mdb_entry *mp) { struct net_bridge *br = mp->br; rhashtable_remove_fast(&br->mdb_hash_tbl, &mp->rhnode, br_mdb_rht_params); hlist_del_init_rcu(&mp->mdb_node); hlist_add_head(&mp->mcast_gc.gc_node, &br->mcast_gc_list); queue_work(system_long_wq, &br->mcast_gc_work); } static void br_multicast_group_expired(struct timer_list *t) { struct net_bridge_mdb_entry *mp = timer_container_of(mp, t, timer); struct net_bridge *br = mp->br; spin_lock(&br->multicast_lock); if (hlist_unhashed(&mp->mdb_node) || !netif_running(br->dev) || timer_pending(&mp->timer)) goto out; br_multicast_host_leave(mp, true); if (mp->ports) goto out; br_multicast_del_mdb_entry(mp); out: spin_unlock(&br->multicast_lock); } static void br_multicast_destroy_group_src(struct net_bridge_mcast_gc *gc) { struct net_bridge_group_src *src; src = container_of(gc, struct net_bridge_group_src, mcast_gc); WARN_ON(!hlist_unhashed(&src->node)); timer_shutdown_sync(&src->timer); kfree_rcu(src, rcu); } void __br_multicast_del_group_src(struct net_bridge_group_src *src) { struct net_bridge *br = src->pg->key.port->br; hlist_del_init_rcu(&src->node); src->pg->src_ents--; hlist_add_head(&src->mcast_gc.gc_node, &br->mcast_gc_list); queue_work(system_long_wq, &br->mcast_gc_work); } void br_multicast_del_group_src(struct net_bridge_group_src *src, bool fastleave) { br_multicast_fwd_src_remove(src, fastleave); __br_multicast_del_group_src(src); } static int br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx, struct netlink_ext_ack *extack, const char *what) { u32 max = READ_ONCE(pmctx->mdb_max_entries); u32 n = READ_ONCE(pmctx->mdb_n_entries); if (max && n >= max) { NL_SET_ERR_MSG_FMT_MOD(extack, "%s is already in %u groups, and mcast_max_groups=%u", what, n, max); return -E2BIG; } WRITE_ONCE(pmctx->mdb_n_entries, n + 1); return 0; } static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx) { u32 n = READ_ONCE(pmctx->mdb_n_entries); WARN_ON_ONCE(n == 0); WRITE_ONCE(pmctx->mdb_n_entries, n - 1); } static int br_multicast_port_ngroups_inc(struct net_bridge_port *port, const struct br_ip *group, struct netlink_ext_ack *extack) { struct net_bridge_mcast_port *pmctx; int err; lockdep_assert_held_once(&port->br->multicast_lock); /* Always count on the port context. */ err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack, "Port"); if (err) { trace_br_mdb_full(port->dev, group); return err; } /* Only count on the VLAN context if VID is given, and if snooping on * that VLAN is enabled. 
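 * (For example, with vlan mcast snooping enabled, a group learned on a port
 * in VLAN 10 is charged twice: once against that port's mcast_max_groups
 * limit and once against the port/VLAN 10 context's limit; with snooping
 * disabled only the per-port counter moves.)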
*/ if (!group->vid) return 0; pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid); if (!pmctx) return 0; err = br_multicast_port_ngroups_inc_one(pmctx, extack, "Port-VLAN"); if (err) { trace_br_mdb_full(port->dev, group); goto dec_one_out; } return 0; dec_one_out: br_multicast_port_ngroups_dec_one(&port->multicast_ctx); return err; } static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid) { struct net_bridge_mcast_port *pmctx; lockdep_assert_held_once(&port->br->multicast_lock); if (vid) { pmctx = br_multicast_port_vid_to_port_ctx(port, vid); if (pmctx) br_multicast_port_ngroups_dec_one(pmctx); } br_multicast_port_ngroups_dec_one(&port->multicast_ctx); } u32 br_multicast_ngroups_get(const struct net_bridge_mcast_port *pmctx) { return READ_ONCE(pmctx->mdb_n_entries); } void br_multicast_ngroups_set_max(struct net_bridge_mcast_port *pmctx, u32 max) { WRITE_ONCE(pmctx->mdb_max_entries, max); } u32 br_multicast_ngroups_get_max(const struct net_bridge_mcast_port *pmctx) { return READ_ONCE(pmctx->mdb_max_entries); } static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc) { struct net_bridge_port_group *pg; pg = container_of(gc, struct net_bridge_port_group, mcast_gc); WARN_ON(!hlist_unhashed(&pg->mglist)); WARN_ON(!hlist_empty(&pg->src_list)); timer_shutdown_sync(&pg->rexmit_timer); timer_shutdown_sync(&pg->timer); kfree_rcu(pg, rcu); } void br_multicast_del_pg(struct net_bridge_mdb_entry *mp, struct net_bridge_port_group *pg, struct net_bridge_port_group __rcu **pp) { struct net_bridge *br = pg->key.port->br; struct net_bridge_group_src *ent; struct hlist_node *tmp; rcu_assign_pointer(*pp, pg->next); hlist_del_init(&pg->mglist); br_multicast_eht_clean_sets(pg); hlist_for_each_entry_safe(ent, tmp, &pg->src_list, node) br_multicast_del_group_src(ent, false); br_mdb_notify(br->dev, mp, pg, RTM_DELMDB); if (!br_multicast_is_star_g(&mp->addr)) { rhashtable_remove_fast(&br->sg_port_tbl, &pg->rhnode, br_sg_port_rht_params); br_multicast_sg_del_exclude_ports(mp); } else { br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE); } br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid); hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list); queue_work(system_long_wq, &br->mcast_gc_work); if (!mp->ports && !mp->host_joined && netif_running(br->dev)) mod_timer(&mp->timer, jiffies); } static void br_multicast_find_del_pg(struct net_bridge *br, struct net_bridge_port_group *pg) { struct net_bridge_port_group __rcu **pp; struct net_bridge_mdb_entry *mp; struct net_bridge_port_group *p; mp = br_mdb_ip_get(br, &pg->key.addr); if (WARN_ON(!mp)) return; for (pp = &mp->ports; (p = mlock_dereference(*pp, br)) != NULL; pp = &p->next) { if (p != pg) continue; br_multicast_del_pg(mp, pg, pp); return; } WARN_ON(1); } static void br_multicast_port_group_expired(struct timer_list *t) { struct net_bridge_port_group *pg = timer_container_of(pg, t, timer); struct net_bridge_group_src *src_ent; struct net_bridge *br = pg->key.port->br; struct hlist_node *tmp; bool changed; spin_lock(&br->multicast_lock); if (!netif_running(br->dev) || timer_pending(&pg->timer) || hlist_unhashed(&pg->mglist) || pg->flags & MDB_PG_FLAGS_PERMANENT) goto out; changed = !!(pg->filter_mode == MCAST_EXCLUDE); pg->filter_mode = MCAST_INCLUDE; hlist_for_each_entry_safe(src_ent, tmp, &pg->src_list, node) { if (!timer_pending(&src_ent->timer)) { br_multicast_del_group_src(src_ent, false); changed = true; } } if (hlist_empty(&pg->src_list)) { br_multicast_find_del_pg(br, pg); } else if (changed) 
{ struct net_bridge_mdb_entry *mp = br_mdb_ip_get(br, &pg->key.addr); if (changed && br_multicast_is_star_g(&pg->key.addr)) br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE); if (WARN_ON(!mp)) goto out; br_mdb_notify(br->dev, mp, pg, RTM_NEWMDB); } out: spin_unlock(&br->multicast_lock); } static void br_multicast_gc(struct hlist_head *head) { struct net_bridge_mcast_gc *gcent; struct hlist_node *tmp; hlist_for_each_entry_safe(gcent, tmp, head, gc_node) { hlist_del_init(&gcent->gc_node); gcent->destroy(gcent); } } static void __br_multicast_query_handle_vlan(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct sk_buff *skb) { struct net_bridge_vlan *vlan = NULL; if (pmctx && br_multicast_port_ctx_is_vlan(pmctx)) vlan = pmctx->vlan; else if (br_multicast_ctx_is_vlan(brmctx)) vlan = brmctx->vlan; if (vlan && !(vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED)) { u16 vlan_proto; if (br_vlan_get_proto(brmctx->br->dev, &vlan_proto) != 0) return; __vlan_hwaccel_put_tag(skb, htons(vlan_proto), vlan->vid); } } static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, __be32 ip_dst, __be32 group, bool with_srcs, bool over_lmqt, u8 sflag, u8 *igmp_type, bool *need_rexmit) { struct net_bridge_port *p = pg ? pg->key.port : NULL; struct net_bridge_group_src *ent; size_t pkt_size, igmp_hdr_size; unsigned long now = jiffies; struct igmpv3_query *ihv3; void *csum_start = NULL; __sum16 *csum = NULL; struct sk_buff *skb; struct igmphdr *ih; struct ethhdr *eth; unsigned long lmqt; struct iphdr *iph; u16 lmqt_srcs = 0; igmp_hdr_size = sizeof(*ih); if (brmctx->multicast_igmp_version == 3) { igmp_hdr_size = sizeof(*ihv3); if (pg && with_srcs) { lmqt = now + (brmctx->multicast_last_member_interval * brmctx->multicast_last_member_count); hlist_for_each_entry(ent, &pg->src_list, node) { if (over_lmqt == time_after(ent->timer.expires, lmqt) && ent->src_query_rexmit_cnt > 0) lmqt_srcs++; } if (!lmqt_srcs) return NULL; igmp_hdr_size += lmqt_srcs * sizeof(__be32); } } pkt_size = sizeof(*eth) + sizeof(*iph) + 4 + igmp_hdr_size; if ((p && pkt_size > p->dev->mtu) || pkt_size > brmctx->br->dev->mtu) return NULL; skb = netdev_alloc_skb_ip_align(brmctx->br->dev, pkt_size); if (!skb) goto out; __br_multicast_query_handle_vlan(brmctx, pmctx, skb); skb->protocol = htons(ETH_P_IP); skb_reset_mac_header(skb); eth = eth_hdr(skb); ether_addr_copy(eth->h_source, brmctx->br->dev->dev_addr); ip_eth_mc_map(ip_dst, eth->h_dest); eth->h_proto = htons(ETH_P_IP); skb_put(skb, sizeof(*eth)); skb_set_network_header(skb, skb->len); iph = ip_hdr(skb); iph->tot_len = htons(pkt_size - sizeof(*eth)); iph->version = 4; iph->ihl = 6; iph->tos = 0xc0; iph->id = 0; iph->frag_off = htons(IP_DF); iph->ttl = 1; iph->protocol = IPPROTO_IGMP; iph->saddr = br_opt_get(brmctx->br, BROPT_MULTICAST_QUERY_USE_IFADDR) ? inet_select_addr(brmctx->br->dev, 0, RT_SCOPE_LINK) : 0; iph->daddr = ip_dst; ((u8 *)&iph[1])[0] = IPOPT_RA; ((u8 *)&iph[1])[1] = 4; ((u8 *)&iph[1])[2] = 0; ((u8 *)&iph[1])[3] = 0; ip_send_check(iph); skb_put(skb, 24); skb_set_transport_header(skb, skb->len); *igmp_type = IGMP_HOST_MEMBERSHIP_QUERY; switch (brmctx->multicast_igmp_version) { case 2: ih = igmp_hdr(skb); ih->type = IGMP_HOST_MEMBERSHIP_QUERY; ih->code = (group ? 
brmctx->multicast_last_member_interval : brmctx->multicast_query_response_interval) / (HZ / IGMP_TIMER_SCALE); ih->group = group; ih->csum = 0; csum = &ih->csum; csum_start = (void *)ih; break; case 3: ihv3 = igmpv3_query_hdr(skb); ihv3->type = IGMP_HOST_MEMBERSHIP_QUERY; ihv3->code = (group ? brmctx->multicast_last_member_interval : brmctx->multicast_query_response_interval) / (HZ / IGMP_TIMER_SCALE); ihv3->group = group; ihv3->qqic = brmctx->multicast_query_interval / HZ; ihv3->nsrcs = htons(lmqt_srcs); ihv3->resv = 0; ihv3->suppress = sflag; ihv3->qrv = 2; ihv3->csum = 0; csum = &ihv3->csum; csum_start = (void *)ihv3; if (!pg || !with_srcs) break; lmqt_srcs = 0; hlist_for_each_entry(ent, &pg->src_list, node) { if (over_lmqt == time_after(ent->timer.expires, lmqt) && ent->src_query_rexmit_cnt > 0) { ihv3->srcs[lmqt_srcs++] = ent->addr.src.ip4; ent->src_query_rexmit_cnt--; if (need_rexmit && ent->src_query_rexmit_cnt) *need_rexmit = true; } } if (WARN_ON(lmqt_srcs != ntohs(ihv3->nsrcs))) { kfree_skb(skb); return NULL; } break; } if (WARN_ON(!csum || !csum_start)) { kfree_skb(skb); return NULL; } *csum = ip_compute_csum(csum_start, igmp_hdr_size); skb_put(skb, igmp_hdr_size); __skb_pull(skb, sizeof(*eth)); out: return skb; } #if IS_ENABLED(CONFIG_IPV6) static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, const struct in6_addr *ip6_dst, const struct in6_addr *group, bool with_srcs, bool over_llqt, u8 sflag, u8 *igmp_type, bool *need_rexmit) { struct net_bridge_port *p = pg ? pg->key.port : NULL; struct net_bridge_group_src *ent; size_t pkt_size, mld_hdr_size; unsigned long now = jiffies; struct mld2_query *mld2q; void *csum_start = NULL; unsigned long interval; __sum16 *csum = NULL; struct ipv6hdr *ip6h; struct mld_msg *mldq; struct sk_buff *skb; unsigned long llqt; struct ethhdr *eth; u16 llqt_srcs = 0; u8 *hopopt; mld_hdr_size = sizeof(*mldq); if (brmctx->multicast_mld_version == 2) { mld_hdr_size = sizeof(*mld2q); if (pg && with_srcs) { llqt = now + (brmctx->multicast_last_member_interval * brmctx->multicast_last_member_count); hlist_for_each_entry(ent, &pg->src_list, node) { if (over_llqt == time_after(ent->timer.expires, llqt) && ent->src_query_rexmit_cnt > 0) llqt_srcs++; } if (!llqt_srcs) return NULL; mld_hdr_size += llqt_srcs * sizeof(struct in6_addr); } } pkt_size = sizeof(*eth) + sizeof(*ip6h) + 8 + mld_hdr_size; if ((p && pkt_size > p->dev->mtu) || pkt_size > brmctx->br->dev->mtu) return NULL; skb = netdev_alloc_skb_ip_align(brmctx->br->dev, pkt_size); if (!skb) goto out; __br_multicast_query_handle_vlan(brmctx, pmctx, skb); skb->protocol = htons(ETH_P_IPV6); /* Ethernet header */ skb_reset_mac_header(skb); eth = eth_hdr(skb); ether_addr_copy(eth->h_source, brmctx->br->dev->dev_addr); eth->h_proto = htons(ETH_P_IPV6); skb_put(skb, sizeof(*eth)); /* IPv6 header + HbH option */ skb_set_network_header(skb, skb->len); ip6h = ipv6_hdr(skb); *(__force __be32 *)ip6h = htonl(0x60000000); ip6h->payload_len = htons(8 + mld_hdr_size); ip6h->nexthdr = IPPROTO_HOPOPTS; ip6h->hop_limit = 1; ip6h->daddr = *ip6_dst; if (ipv6_dev_get_saddr(dev_net(brmctx->br->dev), brmctx->br->dev, &ip6h->daddr, 0, &ip6h->saddr)) { kfree_skb(skb); br_opt_toggle(brmctx->br, BROPT_HAS_IPV6_ADDR, false); return NULL; } br_opt_toggle(brmctx->br, BROPT_HAS_IPV6_ADDR, true); ipv6_eth_mc_map(&ip6h->daddr, eth->h_dest); hopopt = (u8 *)(ip6h + 1); hopopt[0] = IPPROTO_ICMPV6; /* next hdr */ hopopt[1] = 0; /* length of HbH 
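 (in 8-octet units beyond the first 8; 0 means the whole Hop-by-Hop block,
  Router Alert option plus Pad1 padding, is exactly the 8 bytes reserved
  after the IPv6 header)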
*/ hopopt[2] = IPV6_TLV_ROUTERALERT; /* Router Alert */ hopopt[3] = 2; /* Length of RA Option */ hopopt[4] = 0; /* Type = 0x0000 (MLD) */ hopopt[5] = 0; hopopt[6] = IPV6_TLV_PAD1; /* Pad1 */ hopopt[7] = IPV6_TLV_PAD1; /* Pad1 */ skb_put(skb, sizeof(*ip6h) + 8); /* ICMPv6 */ skb_set_transport_header(skb, skb->len); interval = ipv6_addr_any(group) ? brmctx->multicast_query_response_interval : brmctx->multicast_last_member_interval; *igmp_type = ICMPV6_MGM_QUERY; switch (brmctx->multicast_mld_version) { case 1: mldq = (struct mld_msg *)icmp6_hdr(skb); mldq->mld_type = ICMPV6_MGM_QUERY; mldq->mld_code = 0; mldq->mld_cksum = 0; mldq->mld_maxdelay = htons((u16)jiffies_to_msecs(interval)); mldq->mld_reserved = 0; mldq->mld_mca = *group; csum = &mldq->mld_cksum; csum_start = (void *)mldq; break; case 2: mld2q = (struct mld2_query *)icmp6_hdr(skb); mld2q->mld2q_mrc = htons((u16)jiffies_to_msecs(interval)); mld2q->mld2q_type = ICMPV6_MGM_QUERY; mld2q->mld2q_code = 0; mld2q->mld2q_cksum = 0; mld2q->mld2q_resv1 = 0; mld2q->mld2q_resv2 = 0; mld2q->mld2q_suppress = sflag; mld2q->mld2q_qrv = 2; mld2q->mld2q_nsrcs = htons(llqt_srcs); mld2q->mld2q_qqic = brmctx->multicast_query_interval / HZ; mld2q->mld2q_mca = *group; csum = &mld2q->mld2q_cksum; csum_start = (void *)mld2q; if (!pg || !with_srcs) break; llqt_srcs = 0; hlist_for_each_entry(ent, &pg->src_list, node) { if (over_llqt == time_after(ent->timer.expires, llqt) && ent->src_query_rexmit_cnt > 0) { mld2q->mld2q_srcs[llqt_srcs++] = ent->addr.src.ip6; ent->src_query_rexmit_cnt--; if (need_rexmit && ent->src_query_rexmit_cnt) *need_rexmit = true; } } if (WARN_ON(llqt_srcs != ntohs(mld2q->mld2q_nsrcs))) { kfree_skb(skb); return NULL; } break; } if (WARN_ON(!csum || !csum_start)) { kfree_skb(skb); return NULL; } *csum = csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr, mld_hdr_size, IPPROTO_ICMPV6, csum_partial(csum_start, mld_hdr_size, 0)); skb_put(skb, mld_hdr_size); __skb_pull(skb, sizeof(*eth)); out: return skb; } #endif static struct sk_buff *br_multicast_alloc_query(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, struct br_ip *ip_dst, struct br_ip *group, bool with_srcs, bool over_lmqt, u8 sflag, u8 *igmp_type, bool *need_rexmit) { __be32 ip4_dst; switch (group->proto) { case htons(ETH_P_IP): ip4_dst = ip_dst ? 
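/* No explicit destination supplied by the caller: general queries
 * default to the all-hosts group (224.0.0.1 for IGMP, ff02::1 for
 * MLD below), while group/source-specific queries are sent to the
 * address passed in via ip_dst.
 */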
ip_dst->dst.ip4 : htonl(INADDR_ALLHOSTS_GROUP); return br_ip4_multicast_alloc_query(brmctx, pmctx, pg, ip4_dst, group->dst.ip4, with_srcs, over_lmqt, sflag, igmp_type, need_rexmit); #if IS_ENABLED(CONFIG_IPV6) case htons(ETH_P_IPV6): { struct in6_addr ip6_dst; if (ip_dst) ip6_dst = ip_dst->dst.ip6; else ipv6_addr_set(&ip6_dst, htonl(0xff020000), 0, 0, htonl(1)); return br_ip6_multicast_alloc_query(brmctx, pmctx, pg, &ip6_dst, &group->dst.ip6, with_srcs, over_lmqt, sflag, igmp_type, need_rexmit); } #endif } return NULL; } struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br, struct br_ip *group) { struct net_bridge_mdb_entry *mp; int err; mp = br_mdb_ip_get(br, group); if (mp) return mp; if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) { trace_br_mdb_full(br->dev, group); br_mc_disabled_update(br->dev, false, NULL); br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false); return ERR_PTR(-E2BIG); } mp = kzalloc(sizeof(*mp), GFP_ATOMIC); if (unlikely(!mp)) return ERR_PTR(-ENOMEM); mp->br = br; mp->addr = *group; mp->mcast_gc.destroy = br_multicast_destroy_mdb_entry; timer_setup(&mp->timer, br_multicast_group_expired, 0); err = rhashtable_lookup_insert_fast(&br->mdb_hash_tbl, &mp->rhnode, br_mdb_rht_params); if (err) { kfree(mp); mp = ERR_PTR(err); } else { hlist_add_head_rcu(&mp->mdb_node, &br->mdb_list); } return mp; } static void br_multicast_group_src_expired(struct timer_list *t) { struct net_bridge_group_src *src = timer_container_of(src, t, timer); struct net_bridge_port_group *pg; struct net_bridge *br = src->br; spin_lock(&br->multicast_lock); if (hlist_unhashed(&src->node) || !netif_running(br->dev) || timer_pending(&src->timer)) goto out; pg = src->pg; if (pg->filter_mode == MCAST_INCLUDE) { br_multicast_del_group_src(src, false); if (!hlist_empty(&pg->src_list)) goto out; br_multicast_find_del_pg(br, pg); } else { br_multicast_fwd_src_handle(src); } out: spin_unlock(&br->multicast_lock); } struct net_bridge_group_src * br_multicast_find_group_src(struct net_bridge_port_group *pg, struct br_ip *ip) { struct net_bridge_group_src *ent; switch (ip->proto) { case htons(ETH_P_IP): hlist_for_each_entry(ent, &pg->src_list, node) if (ip->src.ip4 == ent->addr.src.ip4) return ent; break; #if IS_ENABLED(CONFIG_IPV6) case htons(ETH_P_IPV6): hlist_for_each_entry(ent, &pg->src_list, node) if (!ipv6_addr_cmp(&ent->addr.src.ip6, &ip->src.ip6)) return ent; break; #endif } return NULL; } struct net_bridge_group_src * br_multicast_new_group_src(struct net_bridge_port_group *pg, struct br_ip *src_ip) { struct net_bridge_group_src *grp_src; if (unlikely(pg->src_ents >= PG_SRC_ENT_LIMIT)) return NULL; switch (src_ip->proto) { case htons(ETH_P_IP): if (ipv4_is_zeronet(src_ip->src.ip4) || ipv4_is_multicast(src_ip->src.ip4)) return NULL; break; #if IS_ENABLED(CONFIG_IPV6) case htons(ETH_P_IPV6): if (ipv6_addr_any(&src_ip->src.ip6) || ipv6_addr_is_multicast(&src_ip->src.ip6)) return NULL; break; #endif } grp_src = kzalloc(sizeof(*grp_src), GFP_ATOMIC); if (unlikely(!grp_src)) return NULL; grp_src->pg = pg; grp_src->br = pg->key.port->br; grp_src->addr = *src_ip; grp_src->mcast_gc.destroy = br_multicast_destroy_group_src; timer_setup(&grp_src->timer, br_multicast_group_src_expired, 0); hlist_add_head_rcu(&grp_src->node, &pg->src_list); pg->src_ents++; return grp_src; } struct net_bridge_port_group *br_multicast_new_port_group( struct net_bridge_port *port, const struct br_ip *group, struct net_bridge_port_group __rcu *next, unsigned char flags, const unsigned char *src, u8 
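/* filter_mode is the initial IGMPv3/MLDv2 filter mode of the new
 * entry (MCAST_INCLUDE or MCAST_EXCLUDE) and rt_protocol records
 * which path installed it; the snooping code further down passes
 * RTPROT_KERNEL here.
 */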
filter_mode, u8 rt_protocol, struct netlink_ext_ack *extack) { struct net_bridge_port_group *p; int err; err = br_multicast_port_ngroups_inc(port, group, extack); if (err) return NULL; p = kzalloc(sizeof(*p), GFP_ATOMIC); if (unlikely(!p)) { NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group"); goto dec_out; } p->key.addr = *group; p->key.port = port; p->flags = flags; p->filter_mode = filter_mode; p->rt_protocol = rt_protocol; p->eht_host_tree = RB_ROOT; p->eht_set_tree = RB_ROOT; p->mcast_gc.destroy = br_multicast_destroy_port_group; INIT_HLIST_HEAD(&p->src_list); if (!br_multicast_is_star_g(group) && rhashtable_lookup_insert_fast(&port->br->sg_port_tbl, &p->rhnode, br_sg_port_rht_params)) { NL_SET_ERR_MSG_MOD(extack, "Couldn't insert new port group"); goto free_out; } rcu_assign_pointer(p->next, next); timer_setup(&p->timer, br_multicast_port_group_expired, 0); timer_setup(&p->rexmit_timer, br_multicast_port_group_rexmit, 0); hlist_add_head(&p->mglist, &port->mglist); if (src) memcpy(p->eth_addr, src, ETH_ALEN); else eth_broadcast_addr(p->eth_addr); return p; free_out: kfree(p); dec_out: br_multicast_port_ngroups_dec(port, group->vid); return NULL; } void br_multicast_del_port_group(struct net_bridge_port_group *p) { struct net_bridge_port *port = p->key.port; __u16 vid = p->key.addr.vid; hlist_del_init(&p->mglist); if (!br_multicast_is_star_g(&p->key.addr)) rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode, br_sg_port_rht_params); kfree(p); br_multicast_port_ngroups_dec(port, vid); } void br_multicast_host_join(const struct net_bridge_mcast *brmctx, struct net_bridge_mdb_entry *mp, bool notify) { if (!mp->host_joined) { mp->host_joined = true; if (br_multicast_is_star_g(&mp->addr)) br_multicast_star_g_host_state(mp); if (notify) br_mdb_notify(mp->br->dev, mp, NULL, RTM_NEWMDB); } if (br_group_is_l2(&mp->addr)) return; mod_timer(&mp->timer, jiffies + brmctx->multicast_membership_interval); } void br_multicast_host_leave(struct net_bridge_mdb_entry *mp, bool notify) { if (!mp->host_joined) return; mp->host_joined = false; if (br_multicast_is_star_g(&mp->addr)) br_multicast_star_g_host_state(mp); if (notify) br_mdb_notify(mp->br->dev, mp, NULL, RTM_DELMDB); } static struct net_bridge_port_group * __br_multicast_add_group(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct br_ip *group, const unsigned char *src, u8 filter_mode, bool igmpv2_mldv1, bool blocked) { struct net_bridge_port_group __rcu **pp; struct net_bridge_port_group *p = NULL; struct net_bridge_mdb_entry *mp; unsigned long now = jiffies; if (!br_multicast_ctx_should_use(brmctx, pmctx)) goto out; mp = br_multicast_new_group(brmctx->br, group); if (IS_ERR(mp)) return ERR_CAST(mp); if (!pmctx) { br_multicast_host_join(brmctx, mp, true); goto out; } for (pp = &mp->ports; (p = mlock_dereference(*pp, brmctx->br)) != NULL; pp = &p->next) { if (br_port_group_equal(p, pmctx->port, src)) goto found; if ((unsigned long)p->key.port < (unsigned long)pmctx->port) break; } p = br_multicast_new_port_group(pmctx->port, group, *pp, 0, src, filter_mode, RTPROT_KERNEL, NULL); if (unlikely(!p)) { p = ERR_PTR(-ENOMEM); goto out; } rcu_assign_pointer(*pp, p); if (blocked) p->flags |= MDB_PG_FLAGS_BLOCKED; br_mdb_notify(brmctx->br->dev, mp, p, RTM_NEWMDB); found: if (igmpv2_mldv1) mod_timer(&p->timer, now + brmctx->multicast_membership_interval); out: return p; } static int br_multicast_add_group(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct br_ip *group, const unsigned char 
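/* Locked wrapper around __br_multicast_add_group(): takes the bridge
 * multicast lock and folds the returned pointer into an error code.
 * A NULL port group is not a failure here, it is how host-joined
 * groups (joined by the bridge device itself) are reported.
 */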
*src, u8 filter_mode, bool igmpv2_mldv1) { struct net_bridge_port_group *pg; int err; spin_lock(&brmctx->br->multicast_lock); pg = __br_multicast_add_group(brmctx, pmctx, group, src, filter_mode, igmpv2_mldv1, false); /* NULL is considered valid for host joined groups */ err = PTR_ERR_OR_ZERO(pg); spin_unlock(&brmctx->br->multicast_lock); return err; } static int br_ip4_multicast_add_group(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, __be32 group, __u16 vid, const unsigned char *src, bool igmpv2) { struct br_ip br_group; u8 filter_mode; if (ipv4_is_local_multicast(group)) return 0; memset(&br_group, 0, sizeof(br_group)); br_group.dst.ip4 = group; br_group.proto = htons(ETH_P_IP); br_group.vid = vid; filter_mode = igmpv2 ? MCAST_EXCLUDE : MCAST_INCLUDE; return br_multicast_add_group(brmctx, pmctx, &br_group, src, filter_mode, igmpv2); } #if IS_ENABLED(CONFIG_IPV6) static int br_ip6_multicast_add_group(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, const struct in6_addr *group, __u16 vid, const unsigned char *src, bool mldv1) { struct br_ip br_group; u8 filter_mode; if (ipv6_addr_is_ll_all_nodes(group)) return 0; memset(&br_group, 0, sizeof(br_group)); br_group.dst.ip6 = *group; br_group.proto = htons(ETH_P_IPV6); br_group.vid = vid; filter_mode = mldv1 ? MCAST_EXCLUDE : MCAST_INCLUDE; return br_multicast_add_group(brmctx, pmctx, &br_group, src, filter_mode, mldv1); } #endif static bool br_multicast_rport_del(struct hlist_node *rlist) { if (hlist_unhashed(rlist)) return false; hlist_del_init_rcu(rlist); return true; } static bool br_ip4_multicast_rport_del(struct net_bridge_mcast_port *pmctx) { return br_multicast_rport_del(&pmctx->ip4_rlist); } static bool br_ip6_multicast_rport_del(struct net_bridge_mcast_port *pmctx) { #if IS_ENABLED(CONFIG_IPV6) return br_multicast_rport_del(&pmctx->ip6_rlist); #else return false; #endif } static void br_multicast_router_expired(struct net_bridge_mcast_port *pmctx, struct timer_list *t, struct hlist_node *rlist) { struct net_bridge *br = pmctx->port->br; bool del; spin_lock(&br->multicast_lock); if (pmctx->multicast_router == MDB_RTR_TYPE_DISABLED || pmctx->multicast_router == MDB_RTR_TYPE_PERM || timer_pending(t)) goto out; del = br_multicast_rport_del(rlist); br_multicast_rport_del_notify(pmctx, del); out: spin_unlock(&br->multicast_lock); } static void br_ip4_multicast_router_expired(struct timer_list *t) { struct net_bridge_mcast_port *pmctx = timer_container_of(pmctx, t, ip4_mc_router_timer); br_multicast_router_expired(pmctx, t, &pmctx->ip4_rlist); } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_router_expired(struct timer_list *t) { struct net_bridge_mcast_port *pmctx = timer_container_of(pmctx, t, ip6_mc_router_timer); br_multicast_router_expired(pmctx, t, &pmctx->ip6_rlist); } #endif static void br_mc_router_state_change(struct net_bridge *p, bool is_mc_router) { struct switchdev_attr attr = { .orig_dev = p->dev, .id = SWITCHDEV_ATTR_ID_BRIDGE_MROUTER, .flags = SWITCHDEV_F_DEFER, .u.mrouter = is_mc_router, }; switchdev_port_attr_set(p->dev, &attr, NULL); } static void br_multicast_local_router_expired(struct net_bridge_mcast *brmctx, struct timer_list *timer) { spin_lock(&brmctx->br->multicast_lock); if (brmctx->multicast_router == MDB_RTR_TYPE_DISABLED || brmctx->multicast_router == MDB_RTR_TYPE_PERM || br_ip4_multicast_is_router(brmctx) || br_ip6_multicast_is_router(brmctx)) goto out; br_mc_router_state_change(brmctx->br, false); out: spin_unlock(&brmctx->br->multicast_lock); } static 
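/* Per-address-family wrappers for the bridge's own multicast router
 * timers.  br_multicast_local_router_expired() above only clears the
 * switchdev mrouter state when neither family's router timer is still
 * running and the router type is not a fixed (disabled/permanent)
 * configuration.
 */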
void br_ip4_multicast_local_router_expired(struct timer_list *t) { struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t, ip4_mc_router_timer); br_multicast_local_router_expired(brmctx, t); } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_local_router_expired(struct timer_list *t) { struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t, ip6_mc_router_timer); br_multicast_local_router_expired(brmctx, t); } #endif static void br_multicast_querier_expired(struct net_bridge_mcast *brmctx, struct bridge_mcast_own_query *query) { spin_lock(&brmctx->br->multicast_lock); if (!netif_running(brmctx->br->dev) || br_multicast_ctx_vlan_global_disabled(brmctx) || !br_opt_get(brmctx->br, BROPT_MULTICAST_ENABLED)) goto out; br_multicast_start_querier(brmctx, query); out: spin_unlock(&brmctx->br->multicast_lock); } static void br_ip4_multicast_querier_expired(struct timer_list *t) { struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t, ip4_other_query.timer); br_multicast_querier_expired(brmctx, &brmctx->ip4_own_query); } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_querier_expired(struct timer_list *t) { struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t, ip6_other_query.timer); br_multicast_querier_expired(brmctx, &brmctx->ip6_own_query); } #endif static void br_multicast_query_delay_expired(struct timer_list *t) { } static void br_multicast_select_own_querier(struct net_bridge_mcast *brmctx, struct br_ip *ip, struct sk_buff *skb) { if (ip->proto == htons(ETH_P_IP)) brmctx->ip4_querier.addr.src.ip4 = ip_hdr(skb)->saddr; #if IS_ENABLED(CONFIG_IPV6) else brmctx->ip6_querier.addr.src.ip6 = ipv6_hdr(skb)->saddr; #endif } static void __br_multicast_send_query(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, struct br_ip *ip_dst, struct br_ip *group, bool with_srcs, u8 sflag, bool *need_rexmit) { bool over_lmqt = !!sflag; struct sk_buff *skb; u8 igmp_type; if (!br_multicast_ctx_should_use(brmctx, pmctx) || !br_multicast_ctx_matches_vlan_snooping(brmctx)) return; again_under_lmqt: skb = br_multicast_alloc_query(brmctx, pmctx, pg, ip_dst, group, with_srcs, over_lmqt, sflag, &igmp_type, need_rexmit); if (!skb) return; if (pmctx) { skb->dev = pmctx->port->dev; br_multicast_count(brmctx->br, pmctx->port, skb, igmp_type, BR_MCAST_DIR_TX); NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(pmctx->port->dev), NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit); if (over_lmqt && with_srcs && sflag) { over_lmqt = false; goto again_under_lmqt; } } else { br_multicast_select_own_querier(brmctx, group, skb); br_multicast_count(brmctx->br, NULL, skb, igmp_type, BR_MCAST_DIR_RX); netif_rx(skb); } } static void br_multicast_read_querier(const struct bridge_mcast_querier *querier, struct bridge_mcast_querier *dest) { unsigned int seq; memset(dest, 0, sizeof(*dest)); do { seq = read_seqcount_begin(&querier->seq); dest->port_ifidx = querier->port_ifidx; memcpy(&dest->addr, &querier->addr, sizeof(struct br_ip)); } while (read_seqcount_retry(&querier->seq, seq)); } static void br_multicast_update_querier(struct net_bridge_mcast *brmctx, struct bridge_mcast_querier *querier, int ifindex, struct br_ip *saddr) { write_seqcount_begin(&querier->seq); querier->port_ifidx = ifindex; memcpy(&querier->addr, saddr, sizeof(*saddr)); write_seqcount_end(&querier->seq); } static void br_multicast_send_query(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct bridge_mcast_own_query *own_query) { struct 
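/* Build and send a general query for this bridge or port context.
 * Nothing is sent unless querier mode is enabled and no other querier
 * is currently active (other_query timer not pending); afterwards the
 * own-query timer is re-armed with the startup interval until
 * multicast_startup_query_count queries have gone out, then with the
 * regular query interval.
 */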
bridge_mcast_other_query *other_query = NULL; struct bridge_mcast_querier *querier; struct br_ip br_group; unsigned long time; if (!br_multicast_ctx_should_use(brmctx, pmctx) || !br_opt_get(brmctx->br, BROPT_MULTICAST_ENABLED) || !brmctx->multicast_querier) return; memset(&br_group.dst, 0, sizeof(br_group.dst)); if (pmctx ? (own_query == &pmctx->ip4_own_query) : (own_query == &brmctx->ip4_own_query)) { querier = &brmctx->ip4_querier; other_query = &brmctx->ip4_other_query; br_group.proto = htons(ETH_P_IP); #if IS_ENABLED(CONFIG_IPV6) } else { querier = &brmctx->ip6_querier; other_query = &brmctx->ip6_other_query; br_group.proto = htons(ETH_P_IPV6); #endif } if (!other_query || timer_pending(&other_query->timer)) return; /* we're about to select ourselves as querier */ if (!pmctx && querier->port_ifidx) { struct br_ip zeroip = {}; br_multicast_update_querier(brmctx, querier, 0, &zeroip); } __br_multicast_send_query(brmctx, pmctx, NULL, NULL, &br_group, false, 0, NULL); time = jiffies; time += own_query->startup_sent < brmctx->multicast_startup_query_count ? brmctx->multicast_startup_query_interval : brmctx->multicast_query_interval; mod_timer(&own_query->timer, time); } static void br_multicast_port_query_expired(struct net_bridge_mcast_port *pmctx, struct bridge_mcast_own_query *query) { struct net_bridge *br = pmctx->port->br; struct net_bridge_mcast *brmctx; spin_lock(&br->multicast_lock); if (br_multicast_port_ctx_state_stopped(pmctx)) goto out; brmctx = br_multicast_port_ctx_get_global(pmctx); if (query->startup_sent < brmctx->multicast_startup_query_count) query->startup_sent++; br_multicast_send_query(brmctx, pmctx, query); out: spin_unlock(&br->multicast_lock); } static void br_ip4_multicast_port_query_expired(struct timer_list *t) { struct net_bridge_mcast_port *pmctx = timer_container_of(pmctx, t, ip4_own_query.timer); br_multicast_port_query_expired(pmctx, &pmctx->ip4_own_query); } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_port_query_expired(struct timer_list *t) { struct net_bridge_mcast_port *pmctx = timer_container_of(pmctx, t, ip6_own_query.timer); br_multicast_port_query_expired(pmctx, &pmctx->ip6_own_query); } #endif static void br_multicast_port_group_rexmit(struct timer_list *t) { struct net_bridge_port_group *pg = timer_container_of(pg, t, rexmit_timer); struct bridge_mcast_other_query *other_query = NULL; struct net_bridge *br = pg->key.port->br; struct net_bridge_mcast_port *pmctx; struct net_bridge_mcast *brmctx; bool need_rexmit = false; spin_lock(&br->multicast_lock); if (!netif_running(br->dev) || hlist_unhashed(&pg->mglist) || !br_opt_get(br, BROPT_MULTICAST_ENABLED)) goto out; pmctx = br_multicast_pg_to_port_ctx(pg); if (!pmctx) goto out; brmctx = br_multicast_port_ctx_get_global(pmctx); if (!brmctx->multicast_querier) goto out; if (pg->key.addr.proto == htons(ETH_P_IP)) other_query = &brmctx->ip4_other_query; #if IS_ENABLED(CONFIG_IPV6) else other_query = &brmctx->ip6_other_query; #endif if (!other_query || timer_pending(&other_query->timer)) goto out; if (pg->grp_query_rexmit_cnt) { pg->grp_query_rexmit_cnt--; __br_multicast_send_query(brmctx, pmctx, pg, &pg->key.addr, &pg->key.addr, false, 1, NULL); } __br_multicast_send_query(brmctx, pmctx, pg, &pg->key.addr, &pg->key.addr, true, 0, &need_rexmit); if (pg->grp_query_rexmit_cnt || need_rexmit) mod_timer(&pg->rexmit_timer, jiffies + brmctx->multicast_last_member_interval); out: spin_unlock(&br->multicast_lock); } static int br_mc_disabled_update(struct net_device *dev, bool value, struct 
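/* Push the bridge's multicast-enabled toggle down to offloading
 * hardware through switchdev; the attribute is expressed as
 * "mc_disabled", hence the inversion of @value.
 */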
netlink_ext_ack *extack) { struct switchdev_attr attr = { .orig_dev = dev, .id = SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED, .flags = SWITCHDEV_F_DEFER, .u.mc_disabled = !value, }; return switchdev_port_attr_set(dev, &attr, extack); } void br_multicast_port_ctx_init(struct net_bridge_port *port, struct net_bridge_vlan *vlan, struct net_bridge_mcast_port *pmctx) { pmctx->port = port; pmctx->vlan = vlan; pmctx->multicast_router = MDB_RTR_TYPE_TEMP_QUERY; timer_setup(&pmctx->ip4_mc_router_timer, br_ip4_multicast_router_expired, 0); timer_setup(&pmctx->ip4_own_query.timer, br_ip4_multicast_port_query_expired, 0); #if IS_ENABLED(CONFIG_IPV6) timer_setup(&pmctx->ip6_mc_router_timer, br_ip6_multicast_router_expired, 0); timer_setup(&pmctx->ip6_own_query.timer, br_ip6_multicast_port_query_expired, 0); #endif } void br_multicast_port_ctx_deinit(struct net_bridge_mcast_port *pmctx) { struct net_bridge *br = pmctx->port->br; bool del = false; #if IS_ENABLED(CONFIG_IPV6) timer_delete_sync(&pmctx->ip6_mc_router_timer); #endif timer_delete_sync(&pmctx->ip4_mc_router_timer); spin_lock_bh(&br->multicast_lock); del |= br_ip6_multicast_rport_del(pmctx); del |= br_ip4_multicast_rport_del(pmctx); br_multicast_rport_del_notify(pmctx, del); spin_unlock_bh(&br->multicast_lock); } int br_multicast_add_port(struct net_bridge_port *port) { int err; port->multicast_eht_hosts_limit = BR_MCAST_DEFAULT_EHT_HOSTS_LIMIT; br_multicast_port_ctx_init(port, NULL, &port->multicast_ctx); err = br_mc_disabled_update(port->dev, br_opt_get(port->br, BROPT_MULTICAST_ENABLED), NULL); if (err && err != -EOPNOTSUPP) return err; port->mcast_stats = netdev_alloc_pcpu_stats(struct bridge_mcast_stats); if (!port->mcast_stats) return -ENOMEM; return 0; } void br_multicast_del_port(struct net_bridge_port *port) { struct net_bridge *br = port->br; struct net_bridge_port_group *pg; struct hlist_node *n; /* Take care of the remaining groups, only perm ones should be left */ spin_lock_bh(&br->multicast_lock); hlist_for_each_entry_safe(pg, n, &port->mglist, mglist) br_multicast_find_del_pg(br, pg); spin_unlock_bh(&br->multicast_lock); flush_work(&br->mcast_gc_work); br_multicast_port_ctx_deinit(&port->multicast_ctx); free_percpu(port->mcast_stats); } static void br_multicast_enable(struct bridge_mcast_own_query *query) { query->startup_sent = 0; if (timer_delete_sync_try(&query->timer) >= 0 || timer_delete(&query->timer)) mod_timer(&query->timer, jiffies); } static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx) { struct net_bridge *br = pmctx->port->br; struct net_bridge_mcast *brmctx; brmctx = br_multicast_port_ctx_get_global(pmctx); if (!br_opt_get(br, BROPT_MULTICAST_ENABLED) || !netif_running(br->dev)) return; br_multicast_enable(&pmctx->ip4_own_query); #if IS_ENABLED(CONFIG_IPV6) br_multicast_enable(&pmctx->ip6_own_query); #endif if (pmctx->multicast_router == MDB_RTR_TYPE_PERM) { br_ip4_multicast_add_router(brmctx, pmctx); br_ip6_multicast_add_router(brmctx, pmctx); } if (br_multicast_port_ctx_is_vlan(pmctx)) { struct net_bridge_port_group *pg; u32 n = 0; /* The mcast_n_groups counter might be wrong. First, * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries * are flushed, thus mcast_n_groups after the toggle does not * reflect the true values. And second, permanent entries added * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected * either. Thus we have to refresh the counter. 
*/ hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) { if (pg->key.addr.vid == pmctx->vlan->vid) n++; } WRITE_ONCE(pmctx->mdb_n_entries, n); } } static void br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx) { struct net_bridge *br = pmctx->port->br; spin_lock_bh(&br->multicast_lock); if (br_multicast_port_ctx_is_vlan(pmctx) && !(pmctx->vlan->priv_flags & BR_VLFLAG_MCAST_ENABLED)) { spin_unlock_bh(&br->multicast_lock); return; } __br_multicast_enable_port_ctx(pmctx); spin_unlock_bh(&br->multicast_lock); } static void __br_multicast_disable_port_ctx(struct net_bridge_mcast_port *pmctx) { struct net_bridge_port_group *pg; struct hlist_node *n; bool del = false; hlist_for_each_entry_safe(pg, n, &pmctx->port->mglist, mglist) if (!(pg->flags & MDB_PG_FLAGS_PERMANENT) && (!br_multicast_port_ctx_is_vlan(pmctx) || pg->key.addr.vid == pmctx->vlan->vid)) br_multicast_find_del_pg(pmctx->port->br, pg); del |= br_ip4_multicast_rport_del(pmctx); timer_delete(&pmctx->ip4_mc_router_timer); timer_delete(&pmctx->ip4_own_query.timer); del |= br_ip6_multicast_rport_del(pmctx); #if IS_ENABLED(CONFIG_IPV6) timer_delete(&pmctx->ip6_mc_router_timer); timer_delete(&pmctx->ip6_own_query.timer); #endif br_multicast_rport_del_notify(pmctx, del); } static void br_multicast_disable_port_ctx(struct net_bridge_mcast_port *pmctx) { struct net_bridge *br = pmctx->port->br; spin_lock_bh(&br->multicast_lock); if (br_multicast_port_ctx_is_vlan(pmctx) && !(pmctx->vlan->priv_flags & BR_VLFLAG_MCAST_ENABLED)) { spin_unlock_bh(&br->multicast_lock); return; } __br_multicast_disable_port_ctx(pmctx); spin_unlock_bh(&br->multicast_lock); } static void br_multicast_toggle_port(struct net_bridge_port *port, bool on) { #if IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING) if (br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED)) { struct net_bridge_vlan_group *vg; struct net_bridge_vlan *vlan; rcu_read_lock(); vg = nbp_vlan_group_rcu(port); if (!vg) { rcu_read_unlock(); return; } /* iterate each vlan, toggle vlan multicast context */ list_for_each_entry_rcu(vlan, &vg->vlan_list, vlist) { struct net_bridge_mcast_port *pmctx = &vlan->port_mcast_ctx; u8 state = br_vlan_get_state(vlan); /* enable vlan multicast context when state is * LEARNING or FORWARDING */ if (on && br_vlan_state_allowed(state, true)) br_multicast_enable_port_ctx(pmctx); else br_multicast_disable_port_ctx(pmctx); } rcu_read_unlock(); return; } #endif /* toggle port multicast context when vlan snooping is disabled */ if (on) br_multicast_enable_port_ctx(&port->multicast_ctx); else br_multicast_disable_port_ctx(&port->multicast_ctx); } void br_multicast_enable_port(struct net_bridge_port *port) { br_multicast_toggle_port(port, true); } void br_multicast_disable_port(struct net_bridge_port *port) { br_multicast_toggle_port(port, false); } static int __grp_src_delete_marked(struct net_bridge_port_group *pg) { struct net_bridge_group_src *ent; struct hlist_node *tmp; int deleted = 0; hlist_for_each_entry_safe(ent, tmp, &pg->src_list, node) if (ent->flags & BR_SGRP_F_DELETE) { br_multicast_del_group_src(ent, false); deleted++; } return deleted; } static void __grp_src_mod_timer(struct net_bridge_group_src *src, unsigned long expires) { mod_timer(&src->timer, expires); br_multicast_fwd_src_handle(src); } static void __grp_src_query_marked_and_rexmit(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg) { struct bridge_mcast_other_query *other_query = NULL; u32 lmqc = brmctx->multicast_last_member_count; unsigned 
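/* Send group-and-source specific queries for every source flagged
 * BR_SGRP_F_SEND: sources whose timers are still above the last
 * member query time get their retransmit counters charged and their
 * timers lowered to LMQT, and the port group's rexmit timer is armed
 * so the query is repeated while this bridge is the active querier.
 */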
long lmqt, lmi, now = jiffies; struct net_bridge_group_src *ent; if (!netif_running(brmctx->br->dev) || !br_opt_get(brmctx->br, BROPT_MULTICAST_ENABLED)) return; if (pg->key.addr.proto == htons(ETH_P_IP)) other_query = &brmctx->ip4_other_query; #if IS_ENABLED(CONFIG_IPV6) else other_query = &brmctx->ip6_other_query; #endif lmqt = now + br_multicast_lmqt(brmctx); hlist_for_each_entry(ent, &pg->src_list, node) { if (ent->flags & BR_SGRP_F_SEND) { ent->flags &= ~BR_SGRP_F_SEND; if (ent->timer.expires > lmqt) { if (brmctx->multicast_querier && other_query && !timer_pending(&other_query->timer)) ent->src_query_rexmit_cnt = lmqc; __grp_src_mod_timer(ent, lmqt); } } } if (!brmctx->multicast_querier || !other_query || timer_pending(&other_query->timer)) return; __br_multicast_send_query(brmctx, pmctx, pg, &pg->key.addr, &pg->key.addr, true, 1, NULL); lmi = now + brmctx->multicast_last_member_interval; if (!timer_pending(&pg->rexmit_timer) || time_after(pg->rexmit_timer.expires, lmi)) mod_timer(&pg->rexmit_timer, lmi); } static void __grp_send_query_and_rexmit(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg) { struct bridge_mcast_other_query *other_query = NULL; unsigned long now = jiffies, lmi; if (!netif_running(brmctx->br->dev) || !br_opt_get(brmctx->br, BROPT_MULTICAST_ENABLED)) return; if (pg->key.addr.proto == htons(ETH_P_IP)) other_query = &brmctx->ip4_other_query; #if IS_ENABLED(CONFIG_IPV6) else other_query = &brmctx->ip6_other_query; #endif if (brmctx->multicast_querier && other_query && !timer_pending(&other_query->timer)) { lmi = now + brmctx->multicast_last_member_interval; pg->grp_query_rexmit_cnt = brmctx->multicast_last_member_count - 1; __br_multicast_send_query(brmctx, pmctx, pg, &pg->key.addr, &pg->key.addr, false, 0, NULL); if (!timer_pending(&pg->rexmit_timer) || time_after(pg->rexmit_timer.expires, lmi)) mod_timer(&pg->rexmit_timer, lmi); } if (pg->filter_mode == MCAST_EXCLUDE && (!timer_pending(&pg->timer) || time_after(pg->timer.expires, now + br_multicast_lmqt(brmctx)))) mod_timer(&pg->timer, now + br_multicast_lmqt(brmctx)); } /* State Msg type New state Actions * INCLUDE (A) IS_IN (B) INCLUDE (A+B) (B)=GMI * INCLUDE (A) ALLOW (B) INCLUDE (A+B) (B)=GMI * EXCLUDE (X,Y) ALLOW (A) EXCLUDE (X+A,Y-A) (A)=GMI */ static bool br_multicast_isinc_allow(const struct net_bridge_mcast *brmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { struct net_bridge_group_src *ent; unsigned long now = jiffies; bool changed = false; struct br_ip src_ip; u32 src_idx; memset(&src_ip, 0, sizeof(src_ip)); src_ip.proto = pg->key.addr.proto; for (src_idx = 0; src_idx < nsrcs; src_idx++) { memcpy(&src_ip.src, srcs + (src_idx * addr_size), addr_size); ent = br_multicast_find_group_src(pg, &src_ip); if (!ent) { ent = br_multicast_new_group_src(pg, &src_ip); if (ent) changed = true; } if (ent) __grp_src_mod_timer(ent, now + br_multicast_gmi(brmctx)); } if (br_multicast_eht_handle(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type)) changed = true; return changed; } /* State Msg type New state Actions * INCLUDE (A) IS_EX (B) EXCLUDE (A*B,B-A) (B-A)=0 * Delete (A-B) * Group Timer=GMI */ static void __grp_src_isexc_incl(const struct net_bridge_mcast *brmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { struct net_bridge_group_src *ent; struct br_ip src_ip; u32 src_idx; hlist_for_each_entry(ent, &pg->src_list, node) ent->flags |= 
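/* Mark-and-sweep: every existing source is first flagged for
 * deletion, sources re-listed in the IS_EXCLUDE record are unmarked
 * (or created), and __grp_src_delete_marked() then drops whatever the
 * report no longer mentions, giving the (A*B, B-A) transition from
 * the table above.
 */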
BR_SGRP_F_DELETE; memset(&src_ip, 0, sizeof(src_ip)); src_ip.proto = pg->key.addr.proto; for (src_idx = 0; src_idx < nsrcs; src_idx++) { memcpy(&src_ip.src, srcs + (src_idx * addr_size), addr_size); ent = br_multicast_find_group_src(pg, &src_ip); if (ent) ent->flags &= ~BR_SGRP_F_DELETE; else ent = br_multicast_new_group_src(pg, &src_ip); if (ent) br_multicast_fwd_src_handle(ent); } br_multicast_eht_handle(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); __grp_src_delete_marked(pg); } /* State Msg type New state Actions * EXCLUDE (X,Y) IS_EX (A) EXCLUDE (A-Y,Y*A) (A-X-Y)=GMI * Delete (X-A) * Delete (Y-A) * Group Timer=GMI */ static bool __grp_src_isexc_excl(const struct net_bridge_mcast *brmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { struct net_bridge_group_src *ent; unsigned long now = jiffies; bool changed = false; struct br_ip src_ip; u32 src_idx; hlist_for_each_entry(ent, &pg->src_list, node) ent->flags |= BR_SGRP_F_DELETE; memset(&src_ip, 0, sizeof(src_ip)); src_ip.proto = pg->key.addr.proto; for (src_idx = 0; src_idx < nsrcs; src_idx++) { memcpy(&src_ip.src, srcs + (src_idx * addr_size), addr_size); ent = br_multicast_find_group_src(pg, &src_ip); if (ent) { ent->flags &= ~BR_SGRP_F_DELETE; } else { ent = br_multicast_new_group_src(pg, &src_ip); if (ent) { __grp_src_mod_timer(ent, now + br_multicast_gmi(brmctx)); changed = true; } } } if (br_multicast_eht_handle(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type)) changed = true; if (__grp_src_delete_marked(pg)) changed = true; return changed; } static bool br_multicast_isexc(const struct net_bridge_mcast *brmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { bool changed = false; switch (pg->filter_mode) { case MCAST_INCLUDE: __grp_src_isexc_incl(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); br_multicast_star_g_handle_mode(pg, MCAST_EXCLUDE); changed = true; break; case MCAST_EXCLUDE: changed = __grp_src_isexc_excl(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); break; } pg->filter_mode = MCAST_EXCLUDE; mod_timer(&pg->timer, jiffies + br_multicast_gmi(brmctx)); return changed; } /* State Msg type New state Actions * INCLUDE (A) TO_IN (B) INCLUDE (A+B) (B)=GMI * Send Q(G,A-B) */ static bool __grp_src_toin_incl(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { u32 src_idx, to_send = pg->src_ents; struct net_bridge_group_src *ent; unsigned long now = jiffies; bool changed = false; struct br_ip src_ip; hlist_for_each_entry(ent, &pg->src_list, node) ent->flags |= BR_SGRP_F_SEND; memset(&src_ip, 0, sizeof(src_ip)); src_ip.proto = pg->key.addr.proto; for (src_idx = 0; src_idx < nsrcs; src_idx++) { memcpy(&src_ip.src, srcs + (src_idx * addr_size), addr_size); ent = br_multicast_find_group_src(pg, &src_ip); if (ent) { ent->flags &= ~BR_SGRP_F_SEND; to_send--; } else { ent = br_multicast_new_group_src(pg, &src_ip); if (ent) changed = true; } if (ent) __grp_src_mod_timer(ent, now + br_multicast_gmi(brmctx)); } if (br_multicast_eht_handle(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type)) changed = true; if (to_send) __grp_src_query_marked_and_rexmit(brmctx, pmctx, pg); return changed; } /* State Msg type New state Actions * EXCLUDE (X,Y) TO_IN (A) EXCLUDE (X+A,Y-A) (A)=GMI * Send Q(G,X-A) * Send Q(G) */ static bool __grp_src_toin_excl(struct net_bridge_mcast 
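/* TO_IN received while the group is in EXCLUDE mode: sources with
 * running timers are candidates for a group-and-source specific
 * query, the reported sources get their timers refreshed, and a
 * group specific query is sent as well, matching the Send Q(G,X-A)
 * and Send Q(G) actions in the table above.
 */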
*brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { u32 src_idx, to_send = pg->src_ents; struct net_bridge_group_src *ent; unsigned long now = jiffies; bool changed = false; struct br_ip src_ip; hlist_for_each_entry(ent, &pg->src_list, node) if (timer_pending(&ent->timer)) ent->flags |= BR_SGRP_F_SEND; memset(&src_ip, 0, sizeof(src_ip)); src_ip.proto = pg->key.addr.proto; for (src_idx = 0; src_idx < nsrcs; src_idx++) { memcpy(&src_ip.src, srcs + (src_idx * addr_size), addr_size); ent = br_multicast_find_group_src(pg, &src_ip); if (ent) { if (timer_pending(&ent->timer)) { ent->flags &= ~BR_SGRP_F_SEND; to_send--; } } else { ent = br_multicast_new_group_src(pg, &src_ip); if (ent) changed = true; } if (ent) __grp_src_mod_timer(ent, now + br_multicast_gmi(brmctx)); } if (br_multicast_eht_handle(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type)) changed = true; if (to_send) __grp_src_query_marked_and_rexmit(brmctx, pmctx, pg); __grp_send_query_and_rexmit(brmctx, pmctx, pg); return changed; } static bool br_multicast_toin(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { bool changed = false; switch (pg->filter_mode) { case MCAST_INCLUDE: changed = __grp_src_toin_incl(brmctx, pmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); break; case MCAST_EXCLUDE: changed = __grp_src_toin_excl(brmctx, pmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); break; } if (br_multicast_eht_should_del_pg(pg)) { pg->flags |= MDB_PG_FLAGS_FAST_LEAVE; br_multicast_find_del_pg(pg->key.port->br, pg); /* a notification has already been sent and we shouldn't * access pg after the delete so we have to return false */ changed = false; } return changed; } /* State Msg type New state Actions * INCLUDE (A) TO_EX (B) EXCLUDE (A*B,B-A) (B-A)=0 * Delete (A-B) * Send Q(G,A*B) * Group Timer=GMI */ static void __grp_src_toex_incl(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { struct net_bridge_group_src *ent; u32 src_idx, to_send = 0; struct br_ip src_ip; hlist_for_each_entry(ent, &pg->src_list, node) ent->flags = (ent->flags & ~BR_SGRP_F_SEND) | BR_SGRP_F_DELETE; memset(&src_ip, 0, sizeof(src_ip)); src_ip.proto = pg->key.addr.proto; for (src_idx = 0; src_idx < nsrcs; src_idx++) { memcpy(&src_ip.src, srcs + (src_idx * addr_size), addr_size); ent = br_multicast_find_group_src(pg, &src_ip); if (ent) { ent->flags = (ent->flags & ~BR_SGRP_F_DELETE) | BR_SGRP_F_SEND; to_send++; } else { ent = br_multicast_new_group_src(pg, &src_ip); } if (ent) br_multicast_fwd_src_handle(ent); } br_multicast_eht_handle(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); __grp_src_delete_marked(pg); if (to_send) __grp_src_query_marked_and_rexmit(brmctx, pmctx, pg); } /* State Msg type New state Actions * EXCLUDE (X,Y) TO_EX (A) EXCLUDE (A-Y,Y*A) (A-X-Y)=Group Timer * Delete (X-A) * Delete (Y-A) * Send Q(G,A-Y) * Group Timer=GMI */ static bool __grp_src_toex_excl(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { struct net_bridge_group_src *ent; u32 src_idx, to_send = 0; bool changed = false; struct br_ip src_ip; hlist_for_each_entry(ent, &pg->src_list, node) ent->flags 
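/* All known sources start out marked for deletion with their SEND
 * flag cleared; sources named in the TO_EX record are kept (and
 * queried if their timers are still running) while the rest are
 * deleted, mirroring the Delete (X-A)/Delete (Y-A)/Send Q(G,A-Y)
 * actions in the table above.
 */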
= (ent->flags & ~BR_SGRP_F_SEND) | BR_SGRP_F_DELETE; memset(&src_ip, 0, sizeof(src_ip)); src_ip.proto = pg->key.addr.proto; for (src_idx = 0; src_idx < nsrcs; src_idx++) { memcpy(&src_ip.src, srcs + (src_idx * addr_size), addr_size); ent = br_multicast_find_group_src(pg, &src_ip); if (ent) { ent->flags &= ~BR_SGRP_F_DELETE; } else { ent = br_multicast_new_group_src(pg, &src_ip); if (ent) { __grp_src_mod_timer(ent, pg->timer.expires); changed = true; } } if (ent && timer_pending(&ent->timer)) { ent->flags |= BR_SGRP_F_SEND; to_send++; } } if (br_multicast_eht_handle(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type)) changed = true; if (__grp_src_delete_marked(pg)) changed = true; if (to_send) __grp_src_query_marked_and_rexmit(brmctx, pmctx, pg); return changed; } static bool br_multicast_toex(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { bool changed = false; switch (pg->filter_mode) { case MCAST_INCLUDE: __grp_src_toex_incl(brmctx, pmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); br_multicast_star_g_handle_mode(pg, MCAST_EXCLUDE); changed = true; break; case MCAST_EXCLUDE: changed = __grp_src_toex_excl(brmctx, pmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); break; } pg->filter_mode = MCAST_EXCLUDE; mod_timer(&pg->timer, jiffies + br_multicast_gmi(brmctx)); return changed; } /* State Msg type New state Actions * INCLUDE (A) BLOCK (B) INCLUDE (A) Send Q(G,A*B) */ static bool __grp_src_block_incl(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { struct net_bridge_group_src *ent; u32 src_idx, to_send = 0; bool changed = false; struct br_ip src_ip; hlist_for_each_entry(ent, &pg->src_list, node) ent->flags &= ~BR_SGRP_F_SEND; memset(&src_ip, 0, sizeof(src_ip)); src_ip.proto = pg->key.addr.proto; for (src_idx = 0; src_idx < nsrcs; src_idx++) { memcpy(&src_ip.src, srcs + (src_idx * addr_size), addr_size); ent = br_multicast_find_group_src(pg, &src_ip); if (ent) { ent->flags |= BR_SGRP_F_SEND; to_send++; } } if (br_multicast_eht_handle(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type)) changed = true; if (to_send) __grp_src_query_marked_and_rexmit(brmctx, pmctx, pg); return changed; } /* State Msg type New state Actions * EXCLUDE (X,Y) BLOCK (A) EXCLUDE (X+(A-Y),Y) (A-X-Y)=Group Timer * Send Q(G,A-Y) */ static bool __grp_src_block_excl(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { struct net_bridge_group_src *ent; u32 src_idx, to_send = 0; bool changed = false; struct br_ip src_ip; hlist_for_each_entry(ent, &pg->src_list, node) ent->flags &= ~BR_SGRP_F_SEND; memset(&src_ip, 0, sizeof(src_ip)); src_ip.proto = pg->key.addr.proto; for (src_idx = 0; src_idx < nsrcs; src_idx++) { memcpy(&src_ip.src, srcs + (src_idx * addr_size), addr_size); ent = br_multicast_find_group_src(pg, &src_ip); if (!ent) { ent = br_multicast_new_group_src(pg, &src_ip); if (ent) { __grp_src_mod_timer(ent, pg->timer.expires); changed = true; } } if (ent && timer_pending(&ent->timer)) { ent->flags |= BR_SGRP_F_SEND; to_send++; } } if (br_multicast_eht_handle(brmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type)) changed = true; if (to_send) __grp_src_query_marked_and_rexmit(brmctx, pmctx, pg); return changed; } static bool 
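/* BLOCK_OLD_SOURCES handling: dispatch on the current filter mode
 * and, if the port group is left with an empty include source list
 * (or the EHT code requests it), delete the whole port group.  In
 * that case false is returned because a notification has already
 * been sent and pg must not be dereferenced again.
 */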
br_multicast_block(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct net_bridge_port_group *pg, void *h_addr, void *srcs, u32 nsrcs, size_t addr_size, int grec_type) { bool changed = false; switch (pg->filter_mode) { case MCAST_INCLUDE: changed = __grp_src_block_incl(brmctx, pmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); break; case MCAST_EXCLUDE: changed = __grp_src_block_excl(brmctx, pmctx, pg, h_addr, srcs, nsrcs, addr_size, grec_type); break; } if ((pg->filter_mode == MCAST_INCLUDE && hlist_empty(&pg->src_list)) || br_multicast_eht_should_del_pg(pg)) { if (br_multicast_eht_should_del_pg(pg)) pg->flags |= MDB_PG_FLAGS_FAST_LEAVE; br_multicast_find_del_pg(pg->key.port->br, pg); /* a notification has already been sent and we shouldn't * access pg after the delete so we have to return false */ changed = false; } return changed; } static struct net_bridge_port_group * br_multicast_find_port(struct net_bridge_mdb_entry *mp, struct net_bridge_port *p, const unsigned char *src) { struct net_bridge *br __maybe_unused = mp->br; struct net_bridge_port_group *pg; for (pg = mlock_dereference(mp->ports, br); pg; pg = mlock_dereference(pg->next, br)) if (br_port_group_equal(pg, p, src)) return pg; return NULL; } static int br_ip4_multicast_igmp3_report(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct sk_buff *skb, u16 vid) { bool igmpv2 = brmctx->multicast_igmp_version == 2; struct net_bridge_mdb_entry *mdst; struct net_bridge_port_group *pg; const unsigned char *src; struct igmpv3_report *ih; struct igmpv3_grec *grec; int i, len, num, type; __be32 group, *h_addr; bool changed = false; int err = 0; u16 nsrcs; ih = igmpv3_report_hdr(skb); num = ntohs(ih->ngrec); len = skb_transport_offset(skb) + sizeof(*ih); for (i = 0; i < num; i++) { len += sizeof(*grec); if (!ip_mc_may_pull(skb, len)) return -EINVAL; grec = (void *)(skb->data + len - sizeof(*grec)); group = grec->grec_mca; type = grec->grec_type; nsrcs = ntohs(grec->grec_nsrcs); len += nsrcs * 4; if (!ip_mc_may_pull(skb, len)) return -EINVAL; switch (type) { case IGMPV3_MODE_IS_INCLUDE: case IGMPV3_MODE_IS_EXCLUDE: case IGMPV3_CHANGE_TO_INCLUDE: case IGMPV3_CHANGE_TO_EXCLUDE: case IGMPV3_ALLOW_NEW_SOURCES: case IGMPV3_BLOCK_OLD_SOURCES: break; default: continue; } src = eth_hdr(skb)->h_source; if (nsrcs == 0 && (type == IGMPV3_CHANGE_TO_INCLUDE || type == IGMPV3_MODE_IS_INCLUDE)) { if (!pmctx || igmpv2) { br_ip4_multicast_leave_group(brmctx, pmctx, group, vid, src); continue; } } else { err = br_ip4_multicast_add_group(brmctx, pmctx, group, vid, src, igmpv2); if (err) break; } if (!pmctx || igmpv2) continue; spin_lock(&brmctx->br->multicast_lock); if (!br_multicast_ctx_should_use(brmctx, pmctx)) goto unlock_continue; mdst = br_mdb_ip4_get(brmctx->br, group, vid); if (!mdst) goto unlock_continue; pg = br_multicast_find_port(mdst, pmctx->port, src); if (!pg || (pg->flags & MDB_PG_FLAGS_PERMANENT)) goto unlock_continue; /* reload grec and host addr */ grec = (void *)(skb->data + len - sizeof(*grec) - (nsrcs * 4)); h_addr = &ip_hdr(skb)->saddr; switch (type) { case IGMPV3_ALLOW_NEW_SOURCES: changed = br_multicast_isinc_allow(brmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(__be32), type); break; case IGMPV3_MODE_IS_INCLUDE: changed = br_multicast_isinc_allow(brmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(__be32), type); break; case IGMPV3_MODE_IS_EXCLUDE: changed = br_multicast_isexc(brmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(__be32), type); break; case IGMPV3_CHANGE_TO_INCLUDE: 
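/* Each IGMPv3 group record type is mapped onto one of the RFC
 * 3376-style state transition helpers above; the source list follows
 * the fixed part of the record, hence grec->grec_src with 4-byte
 * (IPv4) entries.
 */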
changed = br_multicast_toin(brmctx, pmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(__be32), type); break; case IGMPV3_CHANGE_TO_EXCLUDE: changed = br_multicast_toex(brmctx, pmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(__be32), type); break; case IGMPV3_BLOCK_OLD_SOURCES: changed = br_multicast_block(brmctx, pmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(__be32), type); break; } if (changed) br_mdb_notify(brmctx->br->dev, mdst, pg, RTM_NEWMDB); unlock_continue: spin_unlock(&brmctx->br->multicast_lock); } return err; } #if IS_ENABLED(CONFIG_IPV6) static int br_ip6_multicast_mld2_report(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct sk_buff *skb, u16 vid) { bool mldv1 = brmctx->multicast_mld_version == 1; struct net_bridge_mdb_entry *mdst; struct net_bridge_port_group *pg; unsigned int nsrcs_offset; struct mld2_report *mld2r; const unsigned char *src; struct in6_addr *h_addr; struct mld2_grec *grec; unsigned int grec_len; bool changed = false; int i, len, num; int err = 0; if (!ipv6_mc_may_pull(skb, sizeof(*mld2r))) return -EINVAL; mld2r = (struct mld2_report *)icmp6_hdr(skb); num = ntohs(mld2r->mld2r_ngrec); len = skb_transport_offset(skb) + sizeof(*mld2r); for (i = 0; i < num; i++) { __be16 *_nsrcs, __nsrcs; u16 nsrcs; nsrcs_offset = len + offsetof(struct mld2_grec, grec_nsrcs); if (skb_transport_offset(skb) + ipv6_transport_len(skb) < nsrcs_offset + sizeof(__nsrcs)) return -EINVAL; _nsrcs = skb_header_pointer(skb, nsrcs_offset, sizeof(__nsrcs), &__nsrcs); if (!_nsrcs) return -EINVAL; nsrcs = ntohs(*_nsrcs); grec_len = struct_size(grec, grec_src, nsrcs); if (!ipv6_mc_may_pull(skb, len + grec_len)) return -EINVAL; grec = (struct mld2_grec *)(skb->data + len); len += grec_len; switch (grec->grec_type) { case MLD2_MODE_IS_INCLUDE: case MLD2_MODE_IS_EXCLUDE: case MLD2_CHANGE_TO_INCLUDE: case MLD2_CHANGE_TO_EXCLUDE: case MLD2_ALLOW_NEW_SOURCES: case MLD2_BLOCK_OLD_SOURCES: break; default: continue; } src = eth_hdr(skb)->h_source; if ((grec->grec_type == MLD2_CHANGE_TO_INCLUDE || grec->grec_type == MLD2_MODE_IS_INCLUDE) && nsrcs == 0) { if (!pmctx || mldv1) { br_ip6_multicast_leave_group(brmctx, pmctx, &grec->grec_mca, vid, src); continue; } } else { err = br_ip6_multicast_add_group(brmctx, pmctx, &grec->grec_mca, vid, src, mldv1); if (err) break; } if (!pmctx || mldv1) continue; spin_lock(&brmctx->br->multicast_lock); if (!br_multicast_ctx_should_use(brmctx, pmctx)) goto unlock_continue; mdst = br_mdb_ip6_get(brmctx->br, &grec->grec_mca, vid); if (!mdst) goto unlock_continue; pg = br_multicast_find_port(mdst, pmctx->port, src); if (!pg || (pg->flags & MDB_PG_FLAGS_PERMANENT)) goto unlock_continue; h_addr = &ipv6_hdr(skb)->saddr; switch (grec->grec_type) { case MLD2_ALLOW_NEW_SOURCES: changed = br_multicast_isinc_allow(brmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(struct in6_addr), grec->grec_type); break; case MLD2_MODE_IS_INCLUDE: changed = br_multicast_isinc_allow(brmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(struct in6_addr), grec->grec_type); break; case MLD2_MODE_IS_EXCLUDE: changed = br_multicast_isexc(brmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(struct in6_addr), grec->grec_type); break; case MLD2_CHANGE_TO_INCLUDE: changed = br_multicast_toin(brmctx, pmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(struct in6_addr), grec->grec_type); break; case MLD2_CHANGE_TO_EXCLUDE: changed = br_multicast_toex(brmctx, pmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(struct in6_addr), grec->grec_type); break; case MLD2_BLOCK_OLD_SOURCES: changed = 
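/* The MLDv2 report path mirrors the IGMPv3 handling above, only
 * with 16-byte struct in6_addr source entries and the MLD2_* record
 * types.
 */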
br_multicast_block(brmctx, pmctx, pg, h_addr, grec->grec_src, nsrcs, sizeof(struct in6_addr), grec->grec_type); break; } if (changed) br_mdb_notify(brmctx->br->dev, mdst, pg, RTM_NEWMDB); unlock_continue: spin_unlock(&brmctx->br->multicast_lock); } return err; } #endif static bool br_multicast_select_querier(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct br_ip *saddr) { int port_ifidx = pmctx ? pmctx->port->dev->ifindex : 0; struct timer_list *own_timer, *other_timer; struct bridge_mcast_querier *querier; switch (saddr->proto) { case htons(ETH_P_IP): querier = &brmctx->ip4_querier; own_timer = &brmctx->ip4_own_query.timer; other_timer = &brmctx->ip4_other_query.timer; if (!querier->addr.src.ip4 || ntohl(saddr->src.ip4) <= ntohl(querier->addr.src.ip4)) goto update; break; #if IS_ENABLED(CONFIG_IPV6) case htons(ETH_P_IPV6): querier = &brmctx->ip6_querier; own_timer = &brmctx->ip6_own_query.timer; other_timer = &brmctx->ip6_other_query.timer; if (ipv6_addr_cmp(&saddr->src.ip6, &querier->addr.src.ip6) <= 0) goto update; break; #endif default: return false; } if (!timer_pending(own_timer) && !timer_pending(other_timer)) goto update; return false; update: br_multicast_update_querier(brmctx, querier, port_ifidx, saddr); return true; } static struct net_bridge_port * __br_multicast_get_querier_port(struct net_bridge *br, const struct bridge_mcast_querier *querier) { int port_ifidx = READ_ONCE(querier->port_ifidx); struct net_bridge_port *p; struct net_device *dev; if (port_ifidx == 0) return NULL; dev = dev_get_by_index_rcu(dev_net(br->dev), port_ifidx); if (!dev) return NULL; p = br_port_get_rtnl_rcu(dev); if (!p || p->br != br) return NULL; return p; } size_t br_multicast_querier_state_size(void) { return nla_total_size(0) + /* nest attribute */ nla_total_size(sizeof(__be32)) + /* BRIDGE_QUERIER_IP_ADDRESS */ nla_total_size(sizeof(int)) + /* BRIDGE_QUERIER_IP_PORT */ nla_total_size_64bit(sizeof(u64)) + /* BRIDGE_QUERIER_IP_OTHER_TIMER */ #if IS_ENABLED(CONFIG_IPV6) nla_total_size(sizeof(struct in6_addr)) + /* BRIDGE_QUERIER_IPV6_ADDRESS */ nla_total_size(sizeof(int)) + /* BRIDGE_QUERIER_IPV6_PORT */ nla_total_size_64bit(sizeof(u64)) + /* BRIDGE_QUERIER_IPV6_OTHER_TIMER */ #endif 0; } /* protected by rtnl or rcu */ int br_multicast_dump_querier_state(struct sk_buff *skb, const struct net_bridge_mcast *brmctx, int nest_attr) { struct bridge_mcast_querier querier = {}; struct net_bridge_port *p; struct nlattr *nest; if (!br_opt_get(brmctx->br, BROPT_MULTICAST_ENABLED) || br_multicast_ctx_vlan_global_disabled(brmctx)) return 0; nest = nla_nest_start(skb, nest_attr); if (!nest) return -EMSGSIZE; rcu_read_lock(); if (!brmctx->multicast_querier && !timer_pending(&brmctx->ip4_other_query.timer)) goto out_v6; br_multicast_read_querier(&brmctx->ip4_querier, &querier); if (nla_put_in_addr(skb, BRIDGE_QUERIER_IP_ADDRESS, querier.addr.src.ip4)) { rcu_read_unlock(); goto out_err; } p = __br_multicast_get_querier_port(brmctx->br, &querier); if (timer_pending(&brmctx->ip4_other_query.timer) && (nla_put_u64_64bit(skb, BRIDGE_QUERIER_IP_OTHER_TIMER, br_timer_value(&brmctx->ip4_other_query.timer), BRIDGE_QUERIER_PAD) || (p && nla_put_u32(skb, BRIDGE_QUERIER_IP_PORT, p->dev->ifindex)))) { rcu_read_unlock(); goto out_err; } out_v6: #if IS_ENABLED(CONFIG_IPV6) if (!brmctx->multicast_querier && !timer_pending(&brmctx->ip6_other_query.timer)) goto out; br_multicast_read_querier(&brmctx->ip6_querier, &querier); if (nla_put_in6_addr(skb, BRIDGE_QUERIER_IPV6_ADDRESS, &querier.addr.src.ip6)) { 
rcu_read_unlock(); goto out_err; } p = __br_multicast_get_querier_port(brmctx->br, &querier); if (timer_pending(&brmctx->ip6_other_query.timer) && (nla_put_u64_64bit(skb, BRIDGE_QUERIER_IPV6_OTHER_TIMER, br_timer_value(&brmctx->ip6_other_query.timer), BRIDGE_QUERIER_PAD) || (p && nla_put_u32(skb, BRIDGE_QUERIER_IPV6_PORT, p->dev->ifindex)))) { rcu_read_unlock(); goto out_err; } out: #endif rcu_read_unlock(); nla_nest_end(skb, nest); if (!nla_len(nest)) nla_nest_cancel(skb, nest); return 0; out_err: nla_nest_cancel(skb, nest); return -EMSGSIZE; } static void br_multicast_update_query_timer(struct net_bridge_mcast *brmctx, struct bridge_mcast_other_query *query, unsigned long max_delay) { if (!timer_pending(&query->timer)) mod_timer(&query->delay_timer, jiffies + max_delay); mod_timer(&query->timer, jiffies + brmctx->multicast_querier_interval); } static void br_port_mc_router_state_change(struct net_bridge_port *p, bool is_mc_router) { struct switchdev_attr attr = { .orig_dev = p->dev, .id = SWITCHDEV_ATTR_ID_PORT_MROUTER, .flags = SWITCHDEV_F_DEFER, .u.mrouter = is_mc_router, }; switchdev_port_attr_set(p->dev, &attr, NULL); } static struct net_bridge_port * br_multicast_rport_from_node(struct net_bridge_mcast *brmctx, struct hlist_head *mc_router_list, struct hlist_node *rlist) { struct net_bridge_mcast_port *pmctx; #if IS_ENABLED(CONFIG_IPV6) if (mc_router_list == &brmctx->ip6_mc_router_list) pmctx = hlist_entry(rlist, struct net_bridge_mcast_port, ip6_rlist); else #endif pmctx = hlist_entry(rlist, struct net_bridge_mcast_port, ip4_rlist); return pmctx->port; } static struct hlist_node * br_multicast_get_rport_slot(struct net_bridge_mcast *brmctx, struct net_bridge_port *port, struct hlist_head *mc_router_list) { struct hlist_node *slot = NULL; struct net_bridge_port *p; struct hlist_node *rlist; hlist_for_each(rlist, mc_router_list) { p = br_multicast_rport_from_node(brmctx, mc_router_list, rlist); if ((unsigned long)port >= (unsigned long)p) break; slot = rlist; } return slot; } static bool br_multicast_no_router_otherpf(struct net_bridge_mcast_port *pmctx, struct hlist_node *rnode) { #if IS_ENABLED(CONFIG_IPV6) if (rnode != &pmctx->ip6_rlist) return hlist_unhashed(&pmctx->ip6_rlist); else return hlist_unhashed(&pmctx->ip4_rlist); #else return true; #endif } /* Add port to router_list * list is maintained ordered by pointer value * and locked by br->multicast_lock and RCU */ static void br_multicast_add_router(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct hlist_node *rlist, struct hlist_head *mc_router_list) { struct hlist_node *slot; if (!hlist_unhashed(rlist)) return; slot = br_multicast_get_rport_slot(brmctx, pmctx->port, mc_router_list); if (slot) hlist_add_behind_rcu(rlist, slot); else hlist_add_head_rcu(rlist, mc_router_list); /* For backwards compatibility for now, only notify if we * switched from no IPv4/IPv6 multicast router to a new * IPv4 or IPv6 multicast router. 
*/ if (br_multicast_no_router_otherpf(pmctx, rlist)) { br_rtr_notify(pmctx->port->br->dev, pmctx, RTM_NEWMDB); br_port_mc_router_state_change(pmctx->port, true); } } /* Add port to router_list * list is maintained ordered by pointer value * and locked by br->multicast_lock and RCU */ static void br_ip4_multicast_add_router(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx) { br_multicast_add_router(brmctx, pmctx, &pmctx->ip4_rlist, &brmctx->ip4_mc_router_list); } /* Add port to router_list * list is maintained ordered by pointer value * and locked by br->multicast_lock and RCU */ static void br_ip6_multicast_add_router(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx) { #if IS_ENABLED(CONFIG_IPV6) br_multicast_add_router(brmctx, pmctx, &pmctx->ip6_rlist, &brmctx->ip6_mc_router_list); #endif } static void br_multicast_mark_router(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct timer_list *timer, struct hlist_node *rlist, struct hlist_head *mc_router_list) { unsigned long now = jiffies; if (!br_multicast_ctx_should_use(brmctx, pmctx)) return; if (!pmctx) { if (brmctx->multicast_router == MDB_RTR_TYPE_TEMP_QUERY) { if (!br_ip4_multicast_is_router(brmctx) && !br_ip6_multicast_is_router(brmctx)) br_mc_router_state_change(brmctx->br, true); mod_timer(timer, now + brmctx->multicast_querier_interval); } return; } if (pmctx->multicast_router == MDB_RTR_TYPE_DISABLED || pmctx->multicast_router == MDB_RTR_TYPE_PERM) return; br_multicast_add_router(brmctx, pmctx, rlist, mc_router_list); mod_timer(timer, now + brmctx->multicast_querier_interval); } static void br_ip4_multicast_mark_router(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx) { struct timer_list *timer = &brmctx->ip4_mc_router_timer; struct hlist_node *rlist = NULL; if (pmctx) { timer = &pmctx->ip4_mc_router_timer; rlist = &pmctx->ip4_rlist; } br_multicast_mark_router(brmctx, pmctx, timer, rlist, &brmctx->ip4_mc_router_list); } static void br_ip6_multicast_mark_router(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx) { #if IS_ENABLED(CONFIG_IPV6) struct timer_list *timer = &brmctx->ip6_mc_router_timer; struct hlist_node *rlist = NULL; if (pmctx) { timer = &pmctx->ip6_mc_router_timer; rlist = &pmctx->ip6_rlist; } br_multicast_mark_router(brmctx, pmctx, timer, rlist, &brmctx->ip6_mc_router_list); #endif } static void br_ip4_multicast_query_received(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct bridge_mcast_other_query *query, struct br_ip *saddr, unsigned long max_delay) { if (!br_multicast_select_querier(brmctx, pmctx, saddr)) return; br_multicast_update_query_timer(brmctx, query, max_delay); br_ip4_multicast_mark_router(brmctx, pmctx); } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_query_received(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct bridge_mcast_other_query *query, struct br_ip *saddr, unsigned long max_delay) { if (!br_multicast_select_querier(brmctx, pmctx, saddr)) return; br_multicast_update_query_timer(brmctx, query, max_delay); br_ip6_multicast_mark_router(brmctx, pmctx); } #endif static void br_ip4_multicast_query(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct sk_buff *skb, u16 vid) { unsigned int transport_len = ip_transport_len(skb); const struct iphdr *iph = ip_hdr(skb); struct igmphdr *ih = igmp_hdr(skb); struct net_bridge_mdb_entry *mp; struct igmpv3_query *ih3; struct net_bridge_port_group *p; struct 
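/* Incoming IGMP query: the maximum response time comes from the v2
 * code field (1/10 s units) or the IGMPv3 MRC encoding.  A general
 * query (group 0) only refreshes the other-querier and router-port
 * state, while a group-specific query additionally lowers the
 * matching group/port timers so stale memberships can expire.
 */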
net_bridge_port_group __rcu **pp; struct br_ip saddr = {}; unsigned long max_delay; unsigned long now = jiffies; __be32 group; spin_lock(&brmctx->br->multicast_lock); if (!br_multicast_ctx_should_use(brmctx, pmctx)) goto out; group = ih->group; if (transport_len == sizeof(*ih)) { max_delay = ih->code * (HZ / IGMP_TIMER_SCALE); if (!max_delay) { max_delay = 10 * HZ; group = 0; } } else if (transport_len >= sizeof(*ih3)) { ih3 = igmpv3_query_hdr(skb); if (ih3->nsrcs || (brmctx->multicast_igmp_version == 3 && group && ih3->suppress)) goto out; max_delay = ih3->code ? IGMPV3_MRC(ih3->code) * (HZ / IGMP_TIMER_SCALE) : 1; } else { goto out; } if (!group) { saddr.proto = htons(ETH_P_IP); saddr.src.ip4 = iph->saddr; br_ip4_multicast_query_received(brmctx, pmctx, &brmctx->ip4_other_query, &saddr, max_delay); goto out; } mp = br_mdb_ip4_get(brmctx->br, group, vid); if (!mp) goto out; max_delay *= brmctx->multicast_last_member_count; if (mp->host_joined && (timer_pending(&mp->timer) ? time_after(mp->timer.expires, now + max_delay) : timer_delete_sync_try(&mp->timer) >= 0)) mod_timer(&mp->timer, now + max_delay); for (pp = &mp->ports; (p = mlock_dereference(*pp, brmctx->br)) != NULL; pp = &p->next) { if (timer_pending(&p->timer) ? time_after(p->timer.expires, now + max_delay) : timer_delete_sync_try(&p->timer) >= 0 && (brmctx->multicast_igmp_version == 2 || p->filter_mode == MCAST_EXCLUDE)) mod_timer(&p->timer, now + max_delay); } out: spin_unlock(&brmctx->br->multicast_lock); } #if IS_ENABLED(CONFIG_IPV6) static int br_ip6_multicast_query(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct sk_buff *skb, u16 vid) { unsigned int transport_len = ipv6_transport_len(skb); struct mld_msg *mld; struct net_bridge_mdb_entry *mp; struct mld2_query *mld2q; struct net_bridge_port_group *p; struct net_bridge_port_group __rcu **pp; struct br_ip saddr = {}; unsigned long max_delay; unsigned long now = jiffies; unsigned int offset = skb_transport_offset(skb); const struct in6_addr *group = NULL; bool is_general_query; int err = 0; spin_lock(&brmctx->br->multicast_lock); if (!br_multicast_ctx_should_use(brmctx, pmctx)) goto out; if (transport_len == sizeof(*mld)) { if (!pskb_may_pull(skb, offset + sizeof(*mld))) { err = -EINVAL; goto out; } mld = (struct mld_msg *) icmp6_hdr(skb); max_delay = msecs_to_jiffies(ntohs(mld->mld_maxdelay)); if (max_delay) group = &mld->mld_mca; } else { if (!pskb_may_pull(skb, offset + sizeof(*mld2q))) { err = -EINVAL; goto out; } mld2q = (struct mld2_query *)icmp6_hdr(skb); if (!mld2q->mld2q_nsrcs) group = &mld2q->mld2q_mca; if (brmctx->multicast_mld_version == 2 && !ipv6_addr_any(&mld2q->mld2q_mca) && mld2q->mld2q_suppress) goto out; max_delay = max(msecs_to_jiffies(mldv2_mrc(mld2q)), 1UL); } is_general_query = group && ipv6_addr_any(group); if (is_general_query) { saddr.proto = htons(ETH_P_IPV6); saddr.src.ip6 = ipv6_hdr(skb)->saddr; br_ip6_multicast_query_received(brmctx, pmctx, &brmctx->ip6_other_query, &saddr, max_delay); goto out; } else if (!group) { goto out; } mp = br_mdb_ip6_get(brmctx->br, group, vid); if (!mp) goto out; max_delay *= brmctx->multicast_last_member_count; if (mp->host_joined && (timer_pending(&mp->timer) ? time_after(mp->timer.expires, now + max_delay) : timer_delete_sync_try(&mp->timer) >= 0)) mod_timer(&mp->timer, now + max_delay); for (pp = &mp->ports; (p = mlock_dereference(*pp, brmctx->br)) != NULL; pp = &p->next) { if (timer_pending(&p->timer) ? 
time_after(p->timer.expires, now + max_delay) : timer_delete_sync_try(&p->timer) >= 0 && (brmctx->multicast_mld_version == 1 || p->filter_mode == MCAST_EXCLUDE)) mod_timer(&p->timer, now + max_delay); } out: spin_unlock(&brmctx->br->multicast_lock); return err; } #endif static void br_multicast_leave_group(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct br_ip *group, struct bridge_mcast_other_query *other_query, struct bridge_mcast_own_query *own_query, const unsigned char *src) { struct net_bridge_mdb_entry *mp; struct net_bridge_port_group *p; unsigned long now; unsigned long time; spin_lock(&brmctx->br->multicast_lock); if (!br_multicast_ctx_should_use(brmctx, pmctx)) goto out; mp = br_mdb_ip_get(brmctx->br, group); if (!mp) goto out; if (pmctx && (pmctx->port->flags & BR_MULTICAST_FAST_LEAVE)) { struct net_bridge_port_group __rcu **pp; for (pp = &mp->ports; (p = mlock_dereference(*pp, brmctx->br)) != NULL; pp = &p->next) { if (!br_port_group_equal(p, pmctx->port, src)) continue; if (p->flags & MDB_PG_FLAGS_PERMANENT) break; p->flags |= MDB_PG_FLAGS_FAST_LEAVE; br_multicast_del_pg(mp, p, pp); } goto out; } if (timer_pending(&other_query->timer)) goto out; if (brmctx->multicast_querier) { __br_multicast_send_query(brmctx, pmctx, NULL, NULL, &mp->addr, false, 0, NULL); time = jiffies + brmctx->multicast_last_member_count * brmctx->multicast_last_member_interval; mod_timer(&own_query->timer, time); for (p = mlock_dereference(mp->ports, brmctx->br); p != NULL && pmctx != NULL; p = mlock_dereference(p->next, brmctx->br)) { if (!br_port_group_equal(p, pmctx->port, src)) continue; if (!hlist_unhashed(&p->mglist) && (timer_pending(&p->timer) ? time_after(p->timer.expires, time) : timer_delete_sync_try(&p->timer) >= 0)) { mod_timer(&p->timer, time); } break; } } now = jiffies; time = now + brmctx->multicast_last_member_count * brmctx->multicast_last_member_interval; if (!pmctx) { if (mp->host_joined && (timer_pending(&mp->timer) ? time_after(mp->timer.expires, time) : timer_delete_sync_try(&mp->timer) >= 0)) { mod_timer(&mp->timer, time); } goto out; } for (p = mlock_dereference(mp->ports, brmctx->br); p != NULL; p = mlock_dereference(p->next, brmctx->br)) { if (p->key.port != pmctx->port) continue; if (!hlist_unhashed(&p->mglist) && (timer_pending(&p->timer) ? time_after(p->timer.expires, time) : timer_delete_sync_try(&p->timer) >= 0)) { mod_timer(&p->timer, time); } break; } out: spin_unlock(&brmctx->br->multicast_lock); } static void br_ip4_multicast_leave_group(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, __be32 group, __u16 vid, const unsigned char *src) { struct br_ip br_group; struct bridge_mcast_own_query *own_query; if (ipv4_is_local_multicast(group)) return; own_query = pmctx ? &pmctx->ip4_own_query : &brmctx->ip4_own_query; memset(&br_group, 0, sizeof(br_group)); br_group.dst.ip4 = group; br_group.proto = htons(ETH_P_IP); br_group.vid = vid; br_multicast_leave_group(brmctx, pmctx, &br_group, &brmctx->ip4_other_query, own_query, src); } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_leave_group(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, const struct in6_addr *group, __u16 vid, const unsigned char *src) { struct br_ip br_group; struct bridge_mcast_own_query *own_query; if (ipv6_addr_is_ll_all_nodes(group)) return; own_query = pmctx ? 
&pmctx->ip6_own_query : &brmctx->ip6_own_query; memset(&br_group, 0, sizeof(br_group)); br_group.dst.ip6 = *group; br_group.proto = htons(ETH_P_IPV6); br_group.vid = vid; br_multicast_leave_group(brmctx, pmctx, &br_group, &brmctx->ip6_other_query, own_query, src); } #endif static void br_multicast_err_count(const struct net_bridge *br, const struct net_bridge_port *p, __be16 proto) { struct bridge_mcast_stats __percpu *stats; struct bridge_mcast_stats *pstats; if (!br_opt_get(br, BROPT_MULTICAST_STATS_ENABLED)) return; if (p) stats = p->mcast_stats; else stats = br->mcast_stats; if (WARN_ON(!stats)) return; pstats = this_cpu_ptr(stats); u64_stats_update_begin(&pstats->syncp); switch (proto) { case htons(ETH_P_IP): pstats->mstats.igmp_parse_errors++; break; #if IS_ENABLED(CONFIG_IPV6) case htons(ETH_P_IPV6): pstats->mstats.mld_parse_errors++; break; #endif } u64_stats_update_end(&pstats->syncp); } static void br_multicast_pim(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, const struct sk_buff *skb) { unsigned int offset = skb_transport_offset(skb); struct pimhdr *pimhdr, _pimhdr; pimhdr = skb_header_pointer(skb, offset, sizeof(_pimhdr), &_pimhdr); if (!pimhdr || pim_hdr_version(pimhdr) != PIM_VERSION || pim_hdr_type(pimhdr) != PIM_TYPE_HELLO) return; spin_lock(&brmctx->br->multicast_lock); br_ip4_multicast_mark_router(brmctx, pmctx); spin_unlock(&brmctx->br->multicast_lock); } static int br_ip4_multicast_mrd_rcv(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct sk_buff *skb) { if (ip_hdr(skb)->protocol != IPPROTO_IGMP || igmp_hdr(skb)->type != IGMP_MRDISC_ADV) return -ENOMSG; spin_lock(&brmctx->br->multicast_lock); br_ip4_multicast_mark_router(brmctx, pmctx); spin_unlock(&brmctx->br->multicast_lock); return 0; } static int br_multicast_ipv4_rcv(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct sk_buff *skb, u16 vid) { struct net_bridge_port *p = pmctx ? 
pmctx->port : NULL; const unsigned char *src; struct igmphdr *ih; int err; err = ip_mc_check_igmp(skb); if (err == -ENOMSG) { if (!ipv4_is_local_multicast(ip_hdr(skb)->daddr)) { BR_INPUT_SKB_CB(skb)->mrouters_only = 1; } else if (pim_ipv4_all_pim_routers(ip_hdr(skb)->daddr)) { if (ip_hdr(skb)->protocol == IPPROTO_PIM) br_multicast_pim(brmctx, pmctx, skb); } else if (ipv4_is_all_snoopers(ip_hdr(skb)->daddr)) { br_ip4_multicast_mrd_rcv(brmctx, pmctx, skb); } return 0; } else if (err < 0) { br_multicast_err_count(brmctx->br, p, skb->protocol); return err; } ih = igmp_hdr(skb); src = eth_hdr(skb)->h_source; BR_INPUT_SKB_CB(skb)->igmp = ih->type; switch (ih->type) { case IGMP_HOST_MEMBERSHIP_REPORT: case IGMPV2_HOST_MEMBERSHIP_REPORT: BR_INPUT_SKB_CB(skb)->mrouters_only = 1; err = br_ip4_multicast_add_group(brmctx, pmctx, ih->group, vid, src, true); break; case IGMPV3_HOST_MEMBERSHIP_REPORT: err = br_ip4_multicast_igmp3_report(brmctx, pmctx, skb, vid); break; case IGMP_HOST_MEMBERSHIP_QUERY: br_ip4_multicast_query(brmctx, pmctx, skb, vid); break; case IGMP_HOST_LEAVE_MESSAGE: br_ip4_multicast_leave_group(brmctx, pmctx, ih->group, vid, src); break; } br_multicast_count(brmctx->br, p, skb, BR_INPUT_SKB_CB(skb)->igmp, BR_MCAST_DIR_RX); return err; } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_mrd_rcv(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct sk_buff *skb) { if (icmp6_hdr(skb)->icmp6_type != ICMPV6_MRDISC_ADV) return; spin_lock(&brmctx->br->multicast_lock); br_ip6_multicast_mark_router(brmctx, pmctx); spin_unlock(&brmctx->br->multicast_lock); } static int br_multicast_ipv6_rcv(struct net_bridge_mcast *brmctx, struct net_bridge_mcast_port *pmctx, struct sk_buff *skb, u16 vid) { struct net_bridge_port *p = pmctx ? 
pmctx->port : NULL; const unsigned char *src; struct mld_msg *mld; int err; err = ipv6_mc_check_mld(skb); if (err == -ENOMSG || err == -ENODATA) { if (!ipv6_addr_is_ll_all_nodes(&ipv6_hdr(skb)->daddr)) BR_INPUT_SKB_CB(skb)->mrouters_only = 1; if (err == -ENODATA && ipv6_addr_is_all_snoopers(&ipv6_hdr(skb)->daddr)) br_ip6_multicast_mrd_rcv(brmctx, pmctx, skb); return 0; } else if (err < 0) { br_multicast_err_count(brmctx->br, p, skb->protocol); return err; } mld = (struct mld_msg *)skb_transport_header(skb); BR_INPUT_SKB_CB(skb)->igmp = mld->mld_type; switch (mld->mld_type) { case ICMPV6_MGM_REPORT: src = eth_hdr(skb)->h_source; BR_INPUT_SKB_CB(skb)->mrouters_only = 1; err = br_ip6_multicast_add_group(brmctx, pmctx, &mld->mld_mca, vid, src, true); break; case ICMPV6_MLD2_REPORT: err = br_ip6_multicast_mld2_report(brmctx, pmctx, skb, vid); break; case ICMPV6_MGM_QUERY: err = br_ip6_multicast_query(brmctx, pmctx, skb, vid); break; case ICMPV6_MGM_REDUCTION: src = eth_hdr(skb)->h_source; br_ip6_multicast_leave_group(brmctx, pmctx, &mld->mld_mca, vid, src); break; } br_multicast_count(brmctx->br, p, skb, BR_INPUT_SKB_CB(skb)->igmp, BR_MCAST_DIR_RX); return err; } #endif int br_multicast_rcv(struct net_bridge_mcast **brmctx, struct net_bridge_mcast_port **pmctx, struct net_bridge_vlan *vlan, struct sk_buff *skb, u16 vid) { int ret = 0; BR_INPUT_SKB_CB(skb)->igmp = 0; BR_INPUT_SKB_CB(skb)->mrouters_only = 0; if (!br_opt_get((*brmctx)->br, BROPT_MULTICAST_ENABLED)) return 0; if (br_opt_get((*brmctx)->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED) && vlan) { const struct net_bridge_vlan *masterv; /* the vlan has the master flag set only when transmitting * through the bridge device */ if (br_vlan_is_master(vlan)) { masterv = vlan; *brmctx = &vlan->br_mcast_ctx; *pmctx = NULL; } else { masterv = vlan->brvlan; *brmctx = &vlan->brvlan->br_mcast_ctx; *pmctx = &vlan->port_mcast_ctx; } if (!(masterv->priv_flags & BR_VLFLAG_GLOBAL_MCAST_ENABLED)) return 0; } switch (skb->protocol) { case htons(ETH_P_IP): ret = br_multicast_ipv4_rcv(*brmctx, *pmctx, skb, vid); break; #if IS_ENABLED(CONFIG_IPV6) case htons(ETH_P_IPV6): ret = br_multicast_ipv6_rcv(*brmctx, *pmctx, skb, vid); break; #endif } return ret; } static void br_multicast_query_expired(struct net_bridge_mcast *brmctx, struct bridge_mcast_own_query *query, struct bridge_mcast_querier *querier) { spin_lock(&brmctx->br->multicast_lock); if (br_multicast_ctx_vlan_disabled(brmctx)) goto out; if (query->startup_sent < brmctx->multicast_startup_query_count) query->startup_sent++; br_multicast_send_query(brmctx, NULL, query); out: spin_unlock(&brmctx->br->multicast_lock); } static void br_ip4_multicast_query_expired(struct timer_list *t) { struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t, ip4_own_query.timer); br_multicast_query_expired(brmctx, &brmctx->ip4_own_query, &brmctx->ip4_querier); } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_query_expired(struct timer_list *t) { struct net_bridge_mcast *brmctx = timer_container_of(brmctx, t, ip6_own_query.timer); br_multicast_query_expired(brmctx, &brmctx->ip6_own_query, &brmctx->ip6_querier); } #endif static void br_multicast_gc_work(struct work_struct *work) { struct net_bridge *br = container_of(work, struct net_bridge, mcast_gc_work); HLIST_HEAD(deleted_head); spin_lock_bh(&br->multicast_lock); hlist_move_list(&br->mcast_gc_list, &deleted_head); spin_unlock_bh(&br->multicast_lock); br_multicast_gc(&deleted_head); } void br_multicast_ctx_init(struct net_bridge *br, struct net_bridge_vlan 
*vlan, struct net_bridge_mcast *brmctx) { brmctx->br = br; brmctx->vlan = vlan; brmctx->multicast_router = MDB_RTR_TYPE_TEMP_QUERY; brmctx->multicast_last_member_count = 2; brmctx->multicast_startup_query_count = 2; brmctx->multicast_last_member_interval = HZ; brmctx->multicast_query_response_interval = 10 * HZ; brmctx->multicast_startup_query_interval = 125 * HZ / 4; brmctx->multicast_query_interval = 125 * HZ; brmctx->multicast_querier_interval = 255 * HZ; brmctx->multicast_membership_interval = 260 * HZ; brmctx->ip4_querier.port_ifidx = 0; seqcount_spinlock_init(&brmctx->ip4_querier.seq, &br->multicast_lock); brmctx->multicast_igmp_version = 2; #if IS_ENABLED(CONFIG_IPV6) brmctx->multicast_mld_version = 1; brmctx->ip6_querier.port_ifidx = 0; seqcount_spinlock_init(&brmctx->ip6_querier.seq, &br->multicast_lock); #endif timer_setup(&brmctx->ip4_mc_router_timer, br_ip4_multicast_local_router_expired, 0); timer_setup(&brmctx->ip4_other_query.timer, br_ip4_multicast_querier_expired, 0); timer_setup(&brmctx->ip4_other_query.delay_timer, br_multicast_query_delay_expired, 0); timer_setup(&brmctx->ip4_own_query.timer, br_ip4_multicast_query_expired, 0); #if IS_ENABLED(CONFIG_IPV6) timer_setup(&brmctx->ip6_mc_router_timer, br_ip6_multicast_local_router_expired, 0); timer_setup(&brmctx->ip6_other_query.timer, br_ip6_multicast_querier_expired, 0); timer_setup(&brmctx->ip6_other_query.delay_timer, br_multicast_query_delay_expired, 0); timer_setup(&brmctx->ip6_own_query.timer, br_ip6_multicast_query_expired, 0); #endif } void br_multicast_ctx_deinit(struct net_bridge_mcast *brmctx) { __br_multicast_stop(brmctx); } void br_multicast_init(struct net_bridge *br) { br->hash_max = BR_MULTICAST_DEFAULT_HASH_MAX; br_multicast_ctx_init(br, NULL, &br->multicast_ctx); br_opt_toggle(br, BROPT_MULTICAST_ENABLED, true); br_opt_toggle(br, BROPT_HAS_IPV6_ADDR, true); spin_lock_init(&br->multicast_lock); INIT_HLIST_HEAD(&br->mdb_list); INIT_HLIST_HEAD(&br->mcast_gc_list); INIT_WORK(&br->mcast_gc_work, br_multicast_gc_work); } static void br_ip4_multicast_join_snoopers(struct net_bridge *br) { struct in_device *in_dev = in_dev_get(br->dev); if (!in_dev) return; __ip_mc_inc_group(in_dev, htonl(INADDR_ALLSNOOPERS_GROUP), GFP_ATOMIC); in_dev_put(in_dev); } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_join_snoopers(struct net_bridge *br) { struct in6_addr addr; ipv6_addr_set(&addr, htonl(0xff020000), 0, 0, htonl(0x6a)); ipv6_dev_mc_inc(br->dev, &addr); } #else static inline void br_ip6_multicast_join_snoopers(struct net_bridge *br) { } #endif void br_multicast_join_snoopers(struct net_bridge *br) { br_ip4_multicast_join_snoopers(br); br_ip6_multicast_join_snoopers(br); } static void br_ip4_multicast_leave_snoopers(struct net_bridge *br) { struct in_device *in_dev = in_dev_get(br->dev); if (WARN_ON(!in_dev)) return; __ip_mc_dec_group(in_dev, htonl(INADDR_ALLSNOOPERS_GROUP), GFP_ATOMIC); in_dev_put(in_dev); } #if IS_ENABLED(CONFIG_IPV6) static void br_ip6_multicast_leave_snoopers(struct net_bridge *br) { struct in6_addr addr; ipv6_addr_set(&addr, htonl(0xff020000), 0, 0, htonl(0x6a)); ipv6_dev_mc_dec(br->dev, &addr); } #else static inline void br_ip6_multicast_leave_snoopers(struct net_bridge *br) { } #endif void br_multicast_leave_snoopers(struct net_bridge *br) { br_ip4_multicast_leave_snoopers(br); br_ip6_multicast_leave_snoopers(br); } static void __br_multicast_open_query(struct net_bridge *br, struct bridge_mcast_own_query *query) { query->startup_sent = 0; if (!br_opt_get(br, BROPT_MULTICAST_ENABLED)) 
return; mod_timer(&query->timer, jiffies); } static void __br_multicast_open(struct net_bridge_mcast *brmctx) { __br_multicast_open_query(brmctx->br, &brmctx->ip4_own_query); #if IS_ENABLED(CONFIG_IPV6) __br_multicast_open_query(brmctx->br, &brmctx->ip6_own_query); #endif } void br_multicast_open(struct net_bridge *br) { ASSERT_RTNL(); if (br_opt_get(br, BROPT_MCAST_VLAN_SNOOPING_ENABLED)) { struct net_bridge_vlan_group *vg; struct net_bridge_vlan *vlan; vg = br_vlan_group(br); if (vg) { list_for_each_entry(vlan, &vg->vlan_list, vlist) { struct net_bridge_mcast *brmctx; brmctx = &vlan->br_mcast_ctx; if (br_vlan_is_brentry(vlan) && !br_multicast_ctx_vlan_disabled(brmctx)) __br_multicast_open(&vlan->br_mcast_ctx); } } } else { __br_multicast_open(&br->multicast_ctx); } } static void __br_multicast_stop(struct net_bridge_mcast *brmctx) { timer_delete_sync(&brmctx->ip4_mc_router_timer); timer_delete_sync(&brmctx->ip4_other_query.timer); timer_delete_sync(&brmctx->ip4_other_query.delay_timer); timer_delete_sync(&brmctx->ip4_own_query.timer); #if IS_ENABLED(CONFIG_IPV6) timer_delete_sync(&brmctx->ip6_mc_router_timer); timer_delete_sync(&brmctx->ip6_other_query.timer); timer_delete_sync(&brmctx->ip6_other_query.delay_timer); timer_delete_sync(&brmctx->ip6_own_query.timer); #endif } void br_multicast_update_vlan_mcast_ctx(struct net_bridge_vlan *v, u8 state) { #if IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING) struct net_bridge *br; if (!br_vlan_should_use(v)) return; if (br_vlan_is_master(v)) return; br = v->port->br; if (!br_opt_get(br, BROPT_MCAST_VLAN_SNOOPING_ENABLED)) return; if (br_vlan_state_allowed(state, true)) br_multicast_enable_port_ctx(&v->port_mcast_ctx); /* Multicast is not disabled for the vlan when it goes in * blocking state because the timers will expire and stop by * themselves without sending more queries. 
*/ #endif } void br_multicast_toggle_one_vlan(struct net_bridge_vlan *vlan, bool on) { struct net_bridge *br; /* it's okay to check for the flag without the multicast lock because it * can only change under RTNL -> multicast_lock, we need the latter to * sync with timers and packets */ if (on == !!(vlan->priv_flags & BR_VLFLAG_MCAST_ENABLED)) return; if (br_vlan_is_master(vlan)) { br = vlan->br; if (!br_vlan_is_brentry(vlan) || (on && br_multicast_ctx_vlan_global_disabled(&vlan->br_mcast_ctx))) return; spin_lock_bh(&br->multicast_lock); vlan->priv_flags ^= BR_VLFLAG_MCAST_ENABLED; spin_unlock_bh(&br->multicast_lock); if (on) __br_multicast_open(&vlan->br_mcast_ctx); else __br_multicast_stop(&vlan->br_mcast_ctx); } else { struct net_bridge_mcast *brmctx; brmctx = br_multicast_port_ctx_get_global(&vlan->port_mcast_ctx); if (on && br_multicast_ctx_vlan_global_disabled(brmctx)) return; br = vlan->port->br; spin_lock_bh(&br->multicast_lock); vlan->priv_flags ^= BR_VLFLAG_MCAST_ENABLED; if (on) __br_multicast_enable_port_ctx(&vlan->port_mcast_ctx); else __br_multicast_disable_port_ctx(&vlan->port_mcast_ctx); spin_unlock_bh(&br->multicast_lock); } } static void br_multicast_toggle_vlan(struct net_bridge_vlan *vlan, bool on) { struct net_bridge_port *p; if (WARN_ON_ONCE(!br_vlan_is_master(vlan))) return; list_for_each_entry(p, &vlan->br->port_list, list) { struct net_bridge_vlan *vport; vport = br_vlan_find(nbp_vlan_group(p), vlan->vid); if (!vport) continue; br_multicast_toggle_one_vlan(vport, on); } if (br_vlan_is_brentry(vlan)) br_multicast_toggle_one_vlan(vlan, on); } int br_multicast_toggle_vlan_snooping(struct net_bridge *br, bool on, struct netlink_ext_ack *extack) { struct net_bridge_vlan_group *vg; struct net_bridge_vlan *vlan; struct net_bridge_port *p; if (br_opt_get(br, BROPT_MCAST_VLAN_SNOOPING_ENABLED) == on) return 0; if (on && !br_opt_get(br, BROPT_VLAN_ENABLED)) { NL_SET_ERR_MSG_MOD(extack, "Cannot enable multicast vlan snooping with vlan filtering disabled"); return -EINVAL; } vg = br_vlan_group(br); if (!vg) return 0; br_opt_toggle(br, BROPT_MCAST_VLAN_SNOOPING_ENABLED, on); /* disable/enable non-vlan mcast contexts based on vlan snooping */ if (on) __br_multicast_stop(&br->multicast_ctx); else __br_multicast_open(&br->multicast_ctx); list_for_each_entry(p, &br->port_list, list) { if (on) br_multicast_disable_port_ctx(&p->multicast_ctx); else br_multicast_enable_port_ctx(&p->multicast_ctx); } list_for_each_entry(vlan, &vg->vlan_list, vlist) br_multicast_toggle_vlan(vlan, on); return 0; } bool br_multicast_toggle_global_vlan(struct net_bridge_vlan *vlan, bool on) { ASSERT_RTNL(); /* BR_VLFLAG_GLOBAL_MCAST_ENABLED relies on eventual consistency and * requires only RTNL to change */ if (on == !!(vlan->priv_flags & BR_VLFLAG_GLOBAL_MCAST_ENABLED)) return false; vlan->priv_flags ^= BR_VLFLAG_GLOBAL_MCAST_ENABLED; br_multicast_toggle_vlan(vlan, on); return true; } void br_multicast_stop(struct net_bridge *br) { ASSERT_RTNL(); if (br_opt_get(br, BROPT_MCAST_VLAN_SNOOPING_ENABLED)) { struct net_bridge_vlan_group *vg; struct net_bridge_vlan *vlan; vg = br_vlan_group(br); if (vg) { list_for_each_entry(vlan, &vg->vlan_list, vlist) { struct net_bridge_mcast *brmctx; brmctx = &vlan->br_mcast_ctx; if (br_vlan_is_brentry(vlan) && !br_multicast_ctx_vlan_disabled(brmctx)) __br_multicast_stop(&vlan->br_mcast_ctx); } } } else { __br_multicast_stop(&br->multicast_ctx); } } void br_multicast_dev_del(struct net_bridge *br) { struct net_bridge_mdb_entry *mp; HLIST_HEAD(deleted_head); struct hlist_node 
*tmp; spin_lock_bh(&br->multicast_lock); hlist_for_each_entry_safe(mp, tmp, &br->mdb_list, mdb_node) br_multicast_del_mdb_entry(mp); hlist_move_list(&br->mcast_gc_list, &deleted_head); spin_unlock_bh(&br->multicast_lock); br_multicast_ctx_deinit(&br->multicast_ctx); br_multicast_gc(&deleted_head); cancel_work_sync(&br->mcast_gc_work); rcu_barrier(); } int br_multicast_set_router(struct net_bridge_mcast *brmctx, unsigned long val) { int err = -EINVAL; spin_lock_bh(&brmctx->br->multicast_lock); switch (val) { case MDB_RTR_TYPE_DISABLED: case MDB_RTR_TYPE_PERM: br_mc_router_state_change(brmctx->br, val == MDB_RTR_TYPE_PERM); timer_delete(&brmctx->ip4_mc_router_timer); #if IS_ENABLED(CONFIG_IPV6) timer_delete(&brmctx->ip6_mc_router_timer); #endif brmctx->multicast_router = val; err = 0; break; case MDB_RTR_TYPE_TEMP_QUERY: if (brmctx->multicast_router != MDB_RTR_TYPE_TEMP_QUERY) br_mc_router_state_change(brmctx->br, false); brmctx->multicast_router = val; err = 0; break; } spin_unlock_bh(&brmctx->br->multicast_lock); return err; } static void br_multicast_rport_del_notify(struct net_bridge_mcast_port *pmctx, bool deleted) { if (!deleted) return; /* For backwards compatibility for now, only notify if there is * no multicast router anymore for both IPv4 and IPv6. */ if (!hlist_unhashed(&pmctx->ip4_rlist)) return; #if IS_ENABLED(CONFIG_IPV6) if (!hlist_unhashed(&pmctx->ip6_rlist)) return; #endif br_rtr_notify(pmctx->port->br->dev, pmctx, RTM_DELMDB); br_port_mc_router_state_change(pmctx->port, false); /* don't allow timer refresh */ if (pmctx->multicast_router == MDB_RTR_TYPE_TEMP) pmctx->multicast_router = MDB_RTR_TYPE_TEMP_QUERY; } int br_multicast_set_port_router(struct net_bridge_mcast_port *pmctx, unsigned long val) { struct net_bridge_mcast *brmctx; unsigned long now = jiffies; int err = -EINVAL; bool del = false; brmctx = br_multicast_port_ctx_get_global(pmctx); spin_lock_bh(&brmctx->br->multicast_lock); if (pmctx->multicast_router == val) { /* Refresh the temp router port timer */ if (pmctx->multicast_router == MDB_RTR_TYPE_TEMP) { mod_timer(&pmctx->ip4_mc_router_timer, now + brmctx->multicast_querier_interval); #if IS_ENABLED(CONFIG_IPV6) mod_timer(&pmctx->ip6_mc_router_timer, now + brmctx->multicast_querier_interval); #endif } err = 0; goto unlock; } switch (val) { case MDB_RTR_TYPE_DISABLED: pmctx->multicast_router = MDB_RTR_TYPE_DISABLED; del |= br_ip4_multicast_rport_del(pmctx); timer_delete(&pmctx->ip4_mc_router_timer); del |= br_ip6_multicast_rport_del(pmctx); #if IS_ENABLED(CONFIG_IPV6) timer_delete(&pmctx->ip6_mc_router_timer); #endif br_multicast_rport_del_notify(pmctx, del); break; case MDB_RTR_TYPE_TEMP_QUERY: pmctx->multicast_router = MDB_RTR_TYPE_TEMP_QUERY; del |= br_ip4_multicast_rport_del(pmctx); del |= br_ip6_multicast_rport_del(pmctx); br_multicast_rport_del_notify(pmctx, del); break; case MDB_RTR_TYPE_PERM: pmctx->multicast_router = MDB_RTR_TYPE_PERM; timer_delete(&pmctx->ip4_mc_router_timer); br_ip4_multicast_add_router(brmctx, pmctx); #if IS_ENABLED(CONFIG_IPV6) timer_delete(&pmctx->ip6_mc_router_timer); #endif br_ip6_multicast_add_router(brmctx, pmctx); break; case MDB_RTR_TYPE_TEMP: pmctx->multicast_router = MDB_RTR_TYPE_TEMP; br_ip4_multicast_mark_router(brmctx, pmctx); br_ip6_multicast_mark_router(brmctx, pmctx); break; default: goto unlock; } err = 0; unlock: spin_unlock_bh(&brmctx->br->multicast_lock); return err; } int br_multicast_set_vlan_router(struct net_bridge_vlan *v, u8 mcast_router) { int err; if (br_vlan_is_master(v)) err = 
br_multicast_set_router(&v->br_mcast_ctx, mcast_router); else err = br_multicast_set_port_router(&v->port_mcast_ctx, mcast_router); return err; } static void br_multicast_start_querier(struct net_bridge_mcast *brmctx, struct bridge_mcast_own_query *query) { struct net_bridge_port *port; if (!br_multicast_ctx_matches_vlan_snooping(brmctx)) return; __br_multicast_open_query(brmctx->br, query); rcu_read_lock(); list_for_each_entry_rcu(port, &brmctx->br->port_list, list) { struct bridge_mcast_own_query *ip4_own_query; #if IS_ENABLED(CONFIG_IPV6) struct bridge_mcast_own_query *ip6_own_query; #endif if (br_multicast_port_ctx_state_stopped(&port->multicast_ctx)) continue; if (br_multicast_ctx_is_vlan(brmctx)) { struct net_bridge_vlan *vlan; vlan = br_vlan_find(nbp_vlan_group_rcu(port), brmctx->vlan->vid); if (!vlan || br_multicast_port_ctx_state_stopped(&vlan->port_mcast_ctx)) continue; ip4_own_query = &vlan->port_mcast_ctx.ip4_own_query; #if IS_ENABLED(CONFIG_IPV6) ip6_own_query = &vlan->port_mcast_ctx.ip6_own_query; #endif } else { ip4_own_query = &port->multicast_ctx.ip4_own_query; #if IS_ENABLED(CONFIG_IPV6) ip6_own_query = &port->multicast_ctx.ip6_own_query; #endif } if (query == &brmctx->ip4_own_query) br_multicast_enable(ip4_own_query); #if IS_ENABLED(CONFIG_IPV6) else br_multicast_enable(ip6_own_query); #endif } rcu_read_unlock(); } int br_multicast_toggle(struct net_bridge *br, unsigned long val, struct netlink_ext_ack *extack) { struct net_bridge_port *port; bool change_snoopers = false; int err = 0; spin_lock_bh(&br->multicast_lock); if (!!br_opt_get(br, BROPT_MULTICAST_ENABLED) == !!val) goto unlock; err = br_mc_disabled_update(br->dev, val, extack); if (err == -EOPNOTSUPP) err = 0; if (err) goto unlock; br_opt_toggle(br, BROPT_MULTICAST_ENABLED, !!val); if (!br_opt_get(br, BROPT_MULTICAST_ENABLED)) { change_snoopers = true; goto unlock; } if (!netif_running(br->dev)) goto unlock; br_multicast_open(br); list_for_each_entry(port, &br->port_list, list) __br_multicast_enable_port_ctx(&port->multicast_ctx); change_snoopers = true; unlock: spin_unlock_bh(&br->multicast_lock); /* br_multicast_join_snoopers has the potential to cause * an MLD Report/Leave to be delivered to br_multicast_rcv, * which would in turn call br_multicast_add_group, which would * attempt to acquire multicast_lock. This function should be * called after the lock has been released to avoid deadlocks on * multicast_lock. * * br_multicast_leave_snoopers does not have the problem since * br_multicast_rcv first checks BROPT_MULTICAST_ENABLED, and * returns without calling br_multicast_ipv4/6_rcv if it's not * enabled. Moved both functions out just for symmetry. 
*/ if (change_snoopers) { if (br_opt_get(br, BROPT_MULTICAST_ENABLED)) br_multicast_join_snoopers(br); else br_multicast_leave_snoopers(br); } return err; } bool br_multicast_enabled(const struct net_device *dev) { struct net_bridge *br = netdev_priv(dev); return !!br_opt_get(br, BROPT_MULTICAST_ENABLED); } EXPORT_SYMBOL_GPL(br_multicast_enabled); bool br_multicast_router(const struct net_device *dev) { struct net_bridge *br = netdev_priv(dev); bool is_router; spin_lock_bh(&br->multicast_lock); is_router = br_multicast_is_router(&br->multicast_ctx, NULL); spin_unlock_bh(&br->multicast_lock); return is_router; } EXPORT_SYMBOL_GPL(br_multicast_router); int br_multicast_set_querier(struct net_bridge_mcast *brmctx, unsigned long val) { unsigned long max_delay; val = !!val; spin_lock_bh(&brmctx->br->multicast_lock); if (brmctx->multicast_querier == val) goto unlock; WRITE_ONCE(brmctx->multicast_querier, val); if (!val) goto unlock; max_delay = brmctx->multicast_query_response_interval; if (!timer_pending(&brmctx->ip4_other_query.timer)) mod_timer(&brmctx->ip4_other_query.delay_timer, jiffies + max_delay); br_multicast_start_querier(brmctx, &brmctx->ip4_own_query); #if IS_ENABLED(CONFIG_IPV6) if (!timer_pending(&brmctx->ip6_other_query.timer)) mod_timer(&brmctx->ip6_other_query.delay_timer, jiffies + max_delay); br_multicast_start_querier(brmctx, &brmctx->ip6_own_query); #endif unlock: spin_unlock_bh(&brmctx->br->multicast_lock); return 0; } int br_multicast_set_igmp_version(struct net_bridge_mcast *brmctx, unsigned long val) { /* Currently we support only version 2 and 3 */ switch (val) { case 2: case 3: break; default: return -EINVAL; } spin_lock_bh(&brmctx->br->multicast_lock); brmctx->multicast_igmp_version = val; spin_unlock_bh(&brmctx->br->multicast_lock); return 0; } #if IS_ENABLED(CONFIG_IPV6) int br_multicast_set_mld_version(struct net_bridge_mcast *brmctx, unsigned long val) { /* Currently we support version 1 and 2 */ switch (val) { case 1: case 2: break; default: return -EINVAL; } spin_lock_bh(&brmctx->br->multicast_lock); brmctx->multicast_mld_version = val; spin_unlock_bh(&brmctx->br->multicast_lock); return 0; } #endif void br_multicast_set_query_intvl(struct net_bridge_mcast *brmctx, unsigned long val) { unsigned long intvl_jiffies = clock_t_to_jiffies(val); if (intvl_jiffies < BR_MULTICAST_QUERY_INTVL_MIN) { br_info(brmctx->br, "trying to set multicast query interval below minimum, setting to %lu (%ums)\n", jiffies_to_clock_t(BR_MULTICAST_QUERY_INTVL_MIN), jiffies_to_msecs(BR_MULTICAST_QUERY_INTVL_MIN)); intvl_jiffies = BR_MULTICAST_QUERY_INTVL_MIN; } if (intvl_jiffies > BR_MULTICAST_QUERY_INTVL_MAX) { br_info(brmctx->br, "trying to set multicast query interval above maximum, setting to %lu (%ums)\n", jiffies_to_clock_t(BR_MULTICAST_QUERY_INTVL_MAX), jiffies_to_msecs(BR_MULTICAST_QUERY_INTVL_MAX)); intvl_jiffies = BR_MULTICAST_QUERY_INTVL_MAX; } brmctx->multicast_query_interval = intvl_jiffies; } void br_multicast_set_startup_query_intvl(struct net_bridge_mcast *brmctx, unsigned long val) { unsigned long intvl_jiffies = clock_t_to_jiffies(val); if (intvl_jiffies < BR_MULTICAST_STARTUP_QUERY_INTVL_MIN) { br_info(brmctx->br, "trying to set multicast startup query interval below minimum, setting to %lu (%ums)\n", jiffies_to_clock_t(BR_MULTICAST_STARTUP_QUERY_INTVL_MIN), jiffies_to_msecs(BR_MULTICAST_STARTUP_QUERY_INTVL_MIN)); intvl_jiffies = BR_MULTICAST_STARTUP_QUERY_INTVL_MIN; } if (intvl_jiffies > BR_MULTICAST_STARTUP_QUERY_INTVL_MAX) { br_info(brmctx->br, "trying to set 
multicast startup query interval above maximum, setting to %lu (%ums)\n", jiffies_to_clock_t(BR_MULTICAST_STARTUP_QUERY_INTVL_MAX), jiffies_to_msecs(BR_MULTICAST_STARTUP_QUERY_INTVL_MAX)); intvl_jiffies = BR_MULTICAST_STARTUP_QUERY_INTVL_MAX; } brmctx->multicast_startup_query_interval = intvl_jiffies; } /** * br_multicast_list_adjacent - Returns snooped multicast addresses * @dev: The bridge port adjacent to which to retrieve addresses * @br_ip_list: The list to store found, snooped multicast IP addresses in * * Creates a list of IP addresses (struct br_ip_list) sensed by the multicast * snooping feature on all bridge ports of dev's bridge device, excluding * the addresses from dev itself. * * Returns the number of items added to br_ip_list. * * Notes: * - br_ip_list needs to be initialized by caller * - br_ip_list might contain duplicates in the end * (needs to be taken care of by caller) * - br_ip_list needs to be freed by caller */ int br_multicast_list_adjacent(struct net_device *dev, struct list_head *br_ip_list) { struct net_bridge *br; struct net_bridge_port *port; struct net_bridge_port_group *group; struct br_ip_list *entry; int count = 0; rcu_read_lock(); if (!br_ip_list || !netif_is_bridge_port(dev)) goto unlock; port = br_port_get_rcu(dev); if (!port || !port->br) goto unlock; br = port->br; list_for_each_entry_rcu(port, &br->port_list, list) { if (!port->dev || port->dev == dev) continue; hlist_for_each_entry_rcu(group, &port->mglist, mglist) { entry = kmalloc(sizeof(*entry), GFP_ATOMIC); if (!entry) goto unlock; entry->addr = group->key.addr; list_add(&entry->list, br_ip_list); count++; } } unlock: rcu_read_unlock(); return count; } EXPORT_SYMBOL_GPL(br_multicast_list_adjacent); /** * br_multicast_has_querier_anywhere - Checks for a querier on a bridge * @dev: The bridge port providing the bridge on which to check for a querier * @proto: The protocol family to check for: IGMP -> ETH_P_IP, MLD -> ETH_P_IPV6 * * Checks whether the given interface has a bridge on top and if so returns * true if a valid querier exists anywhere on the bridged link layer. * Otherwise returns false. */ bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto) { struct net_bridge *br; struct net_bridge_port *port; struct ethhdr eth; bool ret = false; rcu_read_lock(); if (!netif_is_bridge_port(dev)) goto unlock; port = br_port_get_rcu(dev); if (!port || !port->br) goto unlock; br = port->br; memset(&eth, 0, sizeof(eth)); eth.h_proto = htons(proto); ret = br_multicast_querier_exists(&br->multicast_ctx, &eth, NULL); unlock: rcu_read_unlock(); return ret; } EXPORT_SYMBOL_GPL(br_multicast_has_querier_anywhere); /** * br_multicast_has_querier_adjacent - Checks for a querier behind a bridge port * @dev: The bridge port adjacent to which to check for a querier * @proto: The protocol family to check for: IGMP -> ETH_P_IP, MLD -> ETH_P_IPV6 * * Checks whether the given interface has a bridge on top and if so returns * true if a selected querier is behind one of the other ports of this * bridge. Otherwise returns false. 
*/ bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto) { struct net_bridge_mcast *brmctx; struct net_bridge *br; struct net_bridge_port *port; bool ret = false; int port_ifidx; rcu_read_lock(); if (!netif_is_bridge_port(dev)) goto unlock; port = br_port_get_rcu(dev); if (!port || !port->br) goto unlock; br = port->br; brmctx = &br->multicast_ctx; switch (proto) { case ETH_P_IP: port_ifidx = brmctx->ip4_querier.port_ifidx; if (!timer_pending(&brmctx->ip4_other_query.timer) || port_ifidx == port->dev->ifindex) goto unlock; break; #if IS_ENABLED(CONFIG_IPV6) case ETH_P_IPV6: port_ifidx = brmctx->ip6_querier.port_ifidx; if (!timer_pending(&brmctx->ip6_other_query.timer) || port_ifidx == port->dev->ifindex) goto unlock; break; #endif default: goto unlock; } ret = true; unlock: rcu_read_unlock(); return ret; } EXPORT_SYMBOL_GPL(br_multicast_has_querier_adjacent); /** * br_multicast_has_router_adjacent - Checks for a router behind a bridge port * @dev: The bridge port adjacent to which to check for a multicast router * @proto: The protocol family to check for: IGMP -> ETH_P_IP, MLD -> ETH_P_IPV6 * * Checks whether the given interface has a bridge on top and if so returns * true if a multicast router is behind one of the other ports of this * bridge. Otherwise returns false. */ bool br_multicast_has_router_adjacent(struct net_device *dev, int proto) { struct net_bridge_mcast_port *pmctx; struct net_bridge_mcast *brmctx; struct net_bridge_port *port; bool ret = false; rcu_read_lock(); port = br_port_get_check_rcu(dev); if (!port) goto unlock; brmctx = &port->br->multicast_ctx; switch (proto) { case ETH_P_IP: hlist_for_each_entry_rcu(pmctx, &brmctx->ip4_mc_router_list, ip4_rlist) { if (pmctx->port == port) continue; ret = true; goto unlock; } break; #if IS_ENABLED(CONFIG_IPV6) case ETH_P_IPV6: hlist_for_each_entry_rcu(pmctx, &brmctx->ip6_mc_router_list, ip6_rlist) { if (pmctx->port == port) continue; ret = true; goto unlock; } break; #endif default: /* when compiled without IPv6 support, be conservative and * always assume presence of an IPv6 multicast router */ ret = true; } unlock: rcu_read_unlock(); return ret; } EXPORT_SYMBOL_GPL(br_multicast_has_router_adjacent); static void br_mcast_stats_add(struct bridge_mcast_stats __percpu *stats, const struct sk_buff *skb, u8 type, u8 dir) { struct bridge_mcast_stats *pstats = this_cpu_ptr(stats); __be16 proto = skb->protocol; unsigned int t_len; u64_stats_update_begin(&pstats->syncp); switch (proto) { case htons(ETH_P_IP): t_len = ntohs(ip_hdr(skb)->tot_len) - ip_hdrlen(skb); switch (type) { case IGMP_HOST_MEMBERSHIP_REPORT: pstats->mstats.igmp_v1reports[dir]++; break; case IGMPV2_HOST_MEMBERSHIP_REPORT: pstats->mstats.igmp_v2reports[dir]++; break; case IGMPV3_HOST_MEMBERSHIP_REPORT: pstats->mstats.igmp_v3reports[dir]++; break; case IGMP_HOST_MEMBERSHIP_QUERY: if (t_len != sizeof(struct igmphdr)) { pstats->mstats.igmp_v3queries[dir]++; } else { unsigned int offset = skb_transport_offset(skb); struct igmphdr *ih, _ihdr; ih = skb_header_pointer(skb, offset, sizeof(_ihdr), &_ihdr); if (!ih) break; if (!ih->code) pstats->mstats.igmp_v1queries[dir]++; else pstats->mstats.igmp_v2queries[dir]++; } break; case IGMP_HOST_LEAVE_MESSAGE: pstats->mstats.igmp_leaves[dir]++; break; } break; #if IS_ENABLED(CONFIG_IPV6) case htons(ETH_P_IPV6): t_len = ntohs(ipv6_hdr(skb)->payload_len) + sizeof(struct ipv6hdr); t_len -= skb_network_header_len(skb); switch (type) { case ICMPV6_MGM_REPORT: pstats->mstats.mld_v1reports[dir]++; break; case 
ICMPV6_MLD2_REPORT: pstats->mstats.mld_v2reports[dir]++; break; case ICMPV6_MGM_QUERY: if (t_len != sizeof(struct mld_msg)) pstats->mstats.mld_v2queries[dir]++; else pstats->mstats.mld_v1queries[dir]++; break; case ICMPV6_MGM_REDUCTION: pstats->mstats.mld_leaves[dir]++; break; } break; #endif /* CONFIG_IPV6 */ } u64_stats_update_end(&pstats->syncp); } void br_multicast_count(struct net_bridge *br, const struct net_bridge_port *p, const struct sk_buff *skb, u8 type, u8 dir) { struct bridge_mcast_stats __percpu *stats; /* if multicast_disabled is true then igmp type can't be set */ if (!type || !br_opt_get(br, BROPT_MULTICAST_STATS_ENABLED)) return; if (p) stats = p->mcast_stats; else stats = br->mcast_stats; if (WARN_ON(!stats)) return; br_mcast_stats_add(stats, skb, type, dir); } int br_multicast_init_stats(struct net_bridge *br) { br->mcast_stats = netdev_alloc_pcpu_stats(struct bridge_mcast_stats); if (!br->mcast_stats) return -ENOMEM; return 0; } void br_multicast_uninit_stats(struct net_bridge *br) { free_percpu(br->mcast_stats); } /* noinline for https://llvm.org/pr45802#c9 */ static noinline_for_stack void mcast_stats_add_dir(u64 *dst, u64 *src) { dst[BR_MCAST_DIR_RX] += src[BR_MCAST_DIR_RX]; dst[BR_MCAST_DIR_TX] += src[BR_MCAST_DIR_TX]; } void br_multicast_get_stats(const struct net_bridge *br, const struct net_bridge_port *p, struct br_mcast_stats *dest) { struct bridge_mcast_stats __percpu *stats; struct br_mcast_stats tdst; int i; memset(dest, 0, sizeof(*dest)); if (p) stats = p->mcast_stats; else stats = br->mcast_stats; if (WARN_ON(!stats)) return; memset(&tdst, 0, sizeof(tdst)); for_each_possible_cpu(i) { struct bridge_mcast_stats *cpu_stats = per_cpu_ptr(stats, i); struct br_mcast_stats temp; unsigned int start; do { start = u64_stats_fetch_begin(&cpu_stats->syncp); memcpy(&temp, &cpu_stats->mstats, sizeof(temp)); } while (u64_stats_fetch_retry(&cpu_stats->syncp, start)); mcast_stats_add_dir(tdst.igmp_v1queries, temp.igmp_v1queries); mcast_stats_add_dir(tdst.igmp_v2queries, temp.igmp_v2queries); mcast_stats_add_dir(tdst.igmp_v3queries, temp.igmp_v3queries); mcast_stats_add_dir(tdst.igmp_leaves, temp.igmp_leaves); mcast_stats_add_dir(tdst.igmp_v1reports, temp.igmp_v1reports); mcast_stats_add_dir(tdst.igmp_v2reports, temp.igmp_v2reports); mcast_stats_add_dir(tdst.igmp_v3reports, temp.igmp_v3reports); tdst.igmp_parse_errors += temp.igmp_parse_errors; mcast_stats_add_dir(tdst.mld_v1queries, temp.mld_v1queries); mcast_stats_add_dir(tdst.mld_v2queries, temp.mld_v2queries); mcast_stats_add_dir(tdst.mld_leaves, temp.mld_leaves); mcast_stats_add_dir(tdst.mld_v1reports, temp.mld_v1reports); mcast_stats_add_dir(tdst.mld_v2reports, temp.mld_v2reports); tdst.mld_parse_errors += temp.mld_parse_errors; } memcpy(dest, &tdst, sizeof(*dest)); } int br_mdb_hash_init(struct net_bridge *br) { int err; err = rhashtable_init(&br->sg_port_tbl, &br_sg_port_rht_params); if (err) return err; err = rhashtable_init(&br->mdb_hash_tbl, &br_mdb_rht_params); if (err) { rhashtable_destroy(&br->sg_port_tbl); return err; } return 0; } void br_mdb_hash_fini(struct net_bridge *br) { rhashtable_destroy(&br->sg_port_tbl); rhashtable_destroy(&br->mdb_hash_tbl); } |
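/*
 * A minimal caller-side sketch (not part of the bridge code) of the helpers
 * exported above. The function name example_dump_snooped_addrs() and the use
 * of pr_debug() are illustrative only; the list handling follows the
 * br_multicast_list_adjacent() kernel-doc: the caller owns br_ip_list, it may
 * contain duplicates, and its entries must be freed by the caller.
 */
#include <linux/if_bridge.h>
#include <linux/if_ether.h>
#include <linux/list.h>
#include <linux/printk.h>
#include <linux/slab.h>

static void example_dump_snooped_addrs(struct net_device *bridge_port_dev)
{
        struct br_ip_list *entry, *tmp;
        LIST_HEAD(addr_list);
        int count;

        count = br_multicast_list_adjacent(bridge_port_dev, &addr_list);
        pr_debug("%d multicast group(s) snooped on the other bridge ports\n", count);

        if (br_multicast_has_querier_adjacent(bridge_port_dev, ETH_P_IP))
                pr_debug("IGMP querier behind another port of this bridge\n");
        if (br_multicast_has_router_adjacent(bridge_port_dev, ETH_P_IPV6))
                pr_debug("IPv6 multicast router behind another port of this bridge\n");

        /* The caller must free the entries (duplicates are possible). */
        list_for_each_entry_safe(entry, tmp, &addr_list, list) {
                list_del(&entry->list);
                kfree(entry);
        }
}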
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2017 Facebook  */
#include <linux/slab.h>
#include <linux/bpf.h>
#include <linux/btf.h>
#include "map_in_map.h"

struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
{
        struct bpf_map *inner_map, *inner_map_meta;
        u32 inner_map_meta_size;

        CLASS(fd, f)(inner_map_ufd);

        inner_map = __bpf_map_get(f);
        if (IS_ERR(inner_map))
                return inner_map;

        /* Does not support >1 level map-in-map */
        if (inner_map->inner_map_meta)
                return ERR_PTR(-EINVAL);

        if (!inner_map->ops->map_meta_equal)
                return ERR_PTR(-ENOTSUPP);

        inner_map_meta_size = sizeof(*inner_map_meta);
        /* In some cases verifier needs to access beyond just base map. */
        if (inner_map->ops == &array_map_ops ||
            inner_map->ops == &percpu_array_map_ops)
                inner_map_meta_size = sizeof(struct bpf_array);

        inner_map_meta = kzalloc(inner_map_meta_size, GFP_USER);
        if (!inner_map_meta)
                return ERR_PTR(-ENOMEM);

        inner_map_meta->map_type = inner_map->map_type;
        inner_map_meta->key_size = inner_map->key_size;
        inner_map_meta->value_size = inner_map->value_size;
        inner_map_meta->map_flags = inner_map->map_flags;
        inner_map_meta->max_entries = inner_map->max_entries;

        inner_map_meta->record = btf_record_dup(inner_map->record);
        if (IS_ERR(inner_map_meta->record)) {
                /* btf_record_dup returns NULL or valid pointer in case of
                 * invalid/empty/valid, but ERR_PTR in case of errors. During
                 * equality NULL or IS_ERR is equivalent.
                 */
                struct bpf_map *ret = ERR_CAST(inner_map_meta->record);

                kfree(inner_map_meta);
                return ret;
        }

        /* Note: We must use the same BTF, as we also used btf_record_dup above
         * which relies on BTF being same for both maps, as some members like
         * record->fields.list_head have pointers like value_rec pointing into
         * inner_map->btf.
         */
        if (inner_map->btf) {
                btf_get(inner_map->btf);
                inner_map_meta->btf = inner_map->btf;
        }

        /* Misc members not needed in bpf_map_meta_equal() check.
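         * They are still copied below (->ops and, for the (per-cpu) array
         * case, the index_mask/elem_size/bypass_spec_v1 details) so that the
         * verifier can look beyond the base map, as noted in the comment
         * above the bpf_array sizing.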
*/ inner_map_meta->ops = inner_map->ops; if (inner_map->ops == &array_map_ops || inner_map->ops == &percpu_array_map_ops) { struct bpf_array *inner_array_meta = container_of(inner_map_meta, struct bpf_array, map); struct bpf_array *inner_array = container_of(inner_map, struct bpf_array, map); inner_array_meta->index_mask = inner_array->index_mask; inner_array_meta->elem_size = inner_array->elem_size; inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1; } return inner_map_meta; } void bpf_map_meta_free(struct bpf_map *map_meta) { bpf_map_free_record(map_meta); btf_put(map_meta->btf); kfree(map_meta); } bool bpf_map_meta_equal(const struct bpf_map *meta0, const struct bpf_map *meta1) { /* No need to compare ops because it is covered by map_type */ return meta0->map_type == meta1->map_type && meta0->key_size == meta1->key_size && meta0->value_size == meta1->value_size && meta0->map_flags == meta1->map_flags && btf_record_equal(meta0->record, meta1->record); } void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file /* not used */, int ufd) { struct bpf_map *inner_map, *inner_map_meta; CLASS(fd, f)(ufd); inner_map = __bpf_map_get(f); if (IS_ERR(inner_map)) return inner_map; inner_map_meta = map->inner_map_meta; if (inner_map_meta->ops->map_meta_equal(inner_map_meta, inner_map)) bpf_map_inc(inner_map); else inner_map = ERR_PTR(-EINVAL); return inner_map; } void bpf_map_fd_put_ptr(struct bpf_map *map, void *ptr, bool need_defer) { struct bpf_map *inner_map = ptr; /* Defer the freeing of inner map according to the sleepable attribute * of bpf program which owns the outer map, so unnecessary waiting for * RCU tasks trace grace period can be avoided. */ if (need_defer) { if (atomic64_read(&map->sleepable_refcnt)) WRITE_ONCE(inner_map->free_after_mult_rcu_gp, true); else WRITE_ONCE(inner_map->free_after_rcu_gp, true); } bpf_map_put(inner_map); } u32 bpf_map_fd_sys_lookup_elem(void *ptr) { return ((struct bpf_map *)ptr)->id; } |
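/*
 * A hedged userspace sketch (not part of this file) of the rule that
 * bpf_map_meta_alloc() and ->map_meta_equal() enforce: the template inner map
 * given at outer-map creation time fixes map_type/key_size/value_size/
 * map_flags, and every inner map inserted later must match them. It assumes
 * libbpf's bpf_map_create()/LIBBPF_OPTS() API; example_create_array_of_maps()
 * and the map names are made up for illustration.
 */
#include <bpf/bpf.h>

static int example_create_array_of_maps(void)
{
        int tmpl_fd, outer_fd, inner_fd, key = 0;

        /* Template inner map; its metadata is duplicated by bpf_map_meta_alloc(). */
        tmpl_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "inner_tmpl",
                                 sizeof(int), sizeof(long), 4, NULL);
        if (tmpl_fd < 0)
                return tmpl_fd;

        LIBBPF_OPTS(bpf_map_create_opts, opts, .inner_map_fd = tmpl_fd);
        outer_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY_OF_MAPS, "outer",
                                  sizeof(int), sizeof(int), 8, &opts);
        if (outer_fd < 0)
                return outer_fd;

        /* Any inner map stored later must satisfy ->map_meta_equal(). */
        inner_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "inner0",
                                  sizeof(int), sizeof(long), 4, NULL);
        if (inner_fd < 0)
                return inner_fd;

        /* The value written into an *_OF_MAPS map is the inner map's fd. */
        return bpf_map_update_elem(outer_fd, &key, &inner_fd, 0);
}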
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Mirics MSi001 silicon tuner driver
 *
 * Copyright (C) 2013 Antti Palosaari <crope@iki.fi>
 * Copyright (C) 2014 Antti Palosaari <crope@iki.fi>
 */

#include <linux/module.h>
#include <linux/gcd.h>
#include <media/v4l2-device.h>
#include <media/v4l2-ctrls.h>

static const struct v4l2_frequency_band bands[] = {
        {
                .type = V4L2_TUNER_RF,
                .index = 0,
                .capability = V4L2_TUNER_CAP_1HZ | V4L2_TUNER_CAP_FREQ_BANDS,
                .rangelow = 49000000,
                .rangehigh = 263000000,
        }, {
                .type = V4L2_TUNER_RF,
                .index = 1,
                .capability = V4L2_TUNER_CAP_1HZ | V4L2_TUNER_CAP_FREQ_BANDS,
                .rangelow = 390000000,
                .rangehigh = 960000000,
        },
};

struct msi001_dev {
        struct spi_device *spi;
        struct v4l2_subdev sd;

        /* Controls */
        struct v4l2_ctrl_handler hdl;
        struct v4l2_ctrl *bandwidth_auto;
        struct v4l2_ctrl *bandwidth;
        struct v4l2_ctrl *lna_gain;
        struct v4l2_ctrl *mixer_gain;
        struct v4l2_ctrl *if_gain;

        unsigned int f_tuner;
};

static inline struct msi001_dev *sd_to_msi001_dev(struct v4l2_subdev *sd)
{
        return container_of(sd, struct msi001_dev, sd);
}

static int msi001_wreg(struct msi001_dev *dev, u32 data)
{
        /* Register format: 4 bits addr + 20 bits value */
        return spi_write(dev->spi, &data, 3);
};

static int msi001_set_gain(struct msi001_dev *dev, int lna_gain,
                           int mixer_gain, int if_gain)
{
        struct spi_device *spi = dev->spi;
        int ret;
        u32 reg;

        dev_dbg(&spi->dev, "lna=%d mixer=%d if=%d\n", lna_gain, mixer_gain, if_gain);

        reg = 1 << 0;
        reg |= (59 - if_gain) << 4;
        reg |= 0 << 10;
        reg |= (1 - mixer_gain) << 12;
        reg |= (1 - lna_gain) << 13;
        reg |= 4 << 14;
        reg |= 0 << 17;
        ret =
msi001_wreg(dev, reg); if (ret) goto err; return 0; err: dev_dbg(&spi->dev, "failed %d\n", ret); return ret; }; static int msi001_set_tuner(struct msi001_dev *dev) { struct spi_device *spi = dev->spi; int ret, i; unsigned int uitmp, div_n, k, k_thresh, k_frac, div_lo, f_if1; u32 reg; u64 f_vco; u8 mode, filter_mode; static const struct { u32 rf; u8 mode; u8 div_lo; } band_lut[] = { { 50000000, 0xe1, 16}, /* AM_MODE2, antenna 2 */ {108000000, 0x42, 32}, /* VHF_MODE */ {330000000, 0x44, 16}, /* B3_MODE */ {960000000, 0x48, 4}, /* B45_MODE */ { ~0U, 0x50, 2}, /* BL_MODE */ }; static const struct { u32 freq; u8 filter_mode; } if_freq_lut[] = { { 0, 0x03}, /* Zero IF */ { 450000, 0x02}, /* 450 kHz IF */ {1620000, 0x01}, /* 1.62 MHz IF */ {2048000, 0x00}, /* 2.048 MHz IF */ }; static const struct { u32 freq; u8 val; } bandwidth_lut[] = { { 200000, 0x00}, /* 200 kHz */ { 300000, 0x01}, /* 300 kHz */ { 600000, 0x02}, /* 600 kHz */ {1536000, 0x03}, /* 1.536 MHz */ {5000000, 0x04}, /* 5 MHz */ {6000000, 0x05}, /* 6 MHz */ {7000000, 0x06}, /* 7 MHz */ {8000000, 0x07}, /* 8 MHz */ }; unsigned int f_rf = dev->f_tuner; /* * bandwidth (Hz) * 200000, 300000, 600000, 1536000, 5000000, 6000000, 7000000, 8000000 */ unsigned int bandwidth; /* * intermediate frequency (Hz) * 0, 450000, 1620000, 2048000 */ unsigned int f_if = 0; #define F_REF 24000000 #define DIV_PRE_N 4 #define F_VCO_STEP div_lo dev_dbg(&spi->dev, "f_rf=%d f_if=%d\n", f_rf, f_if); for (i = 0; i < ARRAY_SIZE(band_lut); i++) { if (f_rf <= band_lut[i].rf) { mode = band_lut[i].mode; div_lo = band_lut[i].div_lo; break; } } if (i == ARRAY_SIZE(band_lut)) { ret = -EINVAL; goto err; } /* AM_MODE is upconverted */ if ((mode >> 0) & 0x1) f_if1 = 5 * F_REF; else f_if1 = 0; for (i = 0; i < ARRAY_SIZE(if_freq_lut); i++) { if (f_if == if_freq_lut[i].freq) { filter_mode = if_freq_lut[i].filter_mode; break; } } if (i == ARRAY_SIZE(if_freq_lut)) { ret = -EINVAL; goto err; } /* filters */ bandwidth = dev->bandwidth->val; bandwidth = clamp(bandwidth, 200000U, 8000000U); for (i = 0; i < ARRAY_SIZE(bandwidth_lut); i++) { if (bandwidth <= bandwidth_lut[i].freq) { bandwidth = bandwidth_lut[i].val; break; } } if (i == ARRAY_SIZE(bandwidth_lut)) { ret = -EINVAL; goto err; } dev->bandwidth->val = bandwidth_lut[i].freq; dev_dbg(&spi->dev, "bandwidth selected=%d\n", bandwidth_lut[i].freq); /* * Fractional-N synthesizer * * +---------------------------------------+ * v | * Fref +----+ +-------+ +----+ +------+ +---+ * ------> | PD | --> | VCO | ------> | /4 | --> | /N.F | <-- | K | * +----+ +-------+ +----+ +------+ +---+ * | * | * v * +-------+ Fout * | /Rout | ------> * +-------+ */ /* Calculate PLL integer and fractional control word. */ f_vco = (u64) (f_rf + f_if + f_if1) * div_lo; div_n = div_u64_rem(f_vco, DIV_PRE_N * F_REF, &k); k_thresh = (DIV_PRE_N * F_REF) / F_VCO_STEP; k_frac = div_u64((u64) k * k_thresh, (DIV_PRE_N * F_REF)); /* Find out greatest common divisor and divide to smaller. */ uitmp = gcd(k_thresh, k_frac); k_thresh /= uitmp; k_frac /= uitmp; /* Force divide to reg max. Resolution will be reduced. */ uitmp = DIV_ROUND_UP(k_thresh, 4095); k_thresh = DIV_ROUND_CLOSEST(k_thresh, uitmp); k_frac = DIV_ROUND_CLOSEST(k_frac, uitmp); /* Calculate real RF set. 
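 * Worked example (illustrative numbers only): f_rf = 100 MHz falls in the
 * VHF_MODE row of band_lut, so div_lo = 32; with F_REF = 24 MHz,
 * DIV_PRE_N = 4 and f_if = f_if1 = 0:
 *   f_vco    = 100 MHz * 32              = 3.2 GHz
 *   div_n    = 3.2 GHz / 96 MHz          = 33, remainder k = 32 MHz
 *   k_thresh = 96 MHz / 32               = 3000000
 *   k_frac   = 32 MHz * 3000000 / 96 MHz = 1000000
 *   gcd      = 1000000, so k_thresh = 3 and k_frac = 1 (N.F = 33 + 1/3)
 * and the read-back below reproduces exactly 100 MHz.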
*/ uitmp = (unsigned int) F_REF * DIV_PRE_N * div_n; uitmp += (unsigned int) F_REF * DIV_PRE_N * k_frac / k_thresh; uitmp /= div_lo; dev_dbg(&spi->dev, "f_rf=%u:%u f_vco=%llu div_n=%u k_thresh=%u k_frac=%u div_lo=%u\n", f_rf, uitmp, f_vco, div_n, k_thresh, k_frac, div_lo); ret = msi001_wreg(dev, 0x00000e); if (ret) goto err; ret = msi001_wreg(dev, 0x000003); if (ret) goto err; reg = 0 << 0; reg |= mode << 4; reg |= filter_mode << 12; reg |= bandwidth << 14; reg |= 0x02 << 17; reg |= 0x00 << 20; ret = msi001_wreg(dev, reg); if (ret) goto err; reg = 5 << 0; reg |= k_thresh << 4; reg |= 1 << 19; reg |= 1 << 21; ret = msi001_wreg(dev, reg); if (ret) goto err; reg = 2 << 0; reg |= k_frac << 4; reg |= div_n << 16; ret = msi001_wreg(dev, reg); if (ret) goto err; ret = msi001_set_gain(dev, dev->lna_gain->cur.val, dev->mixer_gain->cur.val, dev->if_gain->cur.val); if (ret) goto err; reg = 6 << 0; reg |= 63 << 4; reg |= 4095 << 10; ret = msi001_wreg(dev, reg); if (ret) goto err; return 0; err: dev_dbg(&spi->dev, "failed %d\n", ret); return ret; } static int msi001_standby(struct v4l2_subdev *sd) { struct msi001_dev *dev = sd_to_msi001_dev(sd); return msi001_wreg(dev, 0x000000); } static int msi001_g_tuner(struct v4l2_subdev *sd, struct v4l2_tuner *v) { struct msi001_dev *dev = sd_to_msi001_dev(sd); struct spi_device *spi = dev->spi; dev_dbg(&spi->dev, "index=%d\n", v->index); strscpy(v->name, "Mirics MSi001", sizeof(v->name)); v->type = V4L2_TUNER_RF; v->capability = V4L2_TUNER_CAP_1HZ | V4L2_TUNER_CAP_FREQ_BANDS; v->rangelow = 49000000; v->rangehigh = 960000000; return 0; } static int msi001_s_tuner(struct v4l2_subdev *sd, const struct v4l2_tuner *v) { struct msi001_dev *dev = sd_to_msi001_dev(sd); struct spi_device *spi = dev->spi; dev_dbg(&spi->dev, "index=%d\n", v->index); return 0; } static int msi001_g_frequency(struct v4l2_subdev *sd, struct v4l2_frequency *f) { struct msi001_dev *dev = sd_to_msi001_dev(sd); struct spi_device *spi = dev->spi; dev_dbg(&spi->dev, "tuner=%d\n", f->tuner); f->frequency = dev->f_tuner; return 0; } static int msi001_s_frequency(struct v4l2_subdev *sd, const struct v4l2_frequency *f) { struct msi001_dev *dev = sd_to_msi001_dev(sd); struct spi_device *spi = dev->spi; unsigned int band; dev_dbg(&spi->dev, "tuner=%d type=%d frequency=%u\n", f->tuner, f->type, f->frequency); if (f->frequency < ((bands[0].rangehigh + bands[1].rangelow) / 2)) band = 0; else band = 1; dev->f_tuner = clamp_t(unsigned int, f->frequency, bands[band].rangelow, bands[band].rangehigh); return msi001_set_tuner(dev); } static int msi001_enum_freq_bands(struct v4l2_subdev *sd, struct v4l2_frequency_band *band) { struct msi001_dev *dev = sd_to_msi001_dev(sd); struct spi_device *spi = dev->spi; dev_dbg(&spi->dev, "tuner=%d type=%d index=%d\n", band->tuner, band->type, band->index); if (band->index >= ARRAY_SIZE(bands)) return -EINVAL; band->capability = bands[band->index].capability; band->rangelow = bands[band->index].rangelow; band->rangehigh = bands[band->index].rangehigh; return 0; } static const struct v4l2_subdev_tuner_ops msi001_tuner_ops = { .standby = msi001_standby, .g_tuner = msi001_g_tuner, .s_tuner = msi001_s_tuner, .g_frequency = msi001_g_frequency, .s_frequency = msi001_s_frequency, .enum_freq_bands = msi001_enum_freq_bands, }; static const struct v4l2_subdev_ops msi001_ops = { .tuner = &msi001_tuner_ops, }; static int msi001_s_ctrl(struct v4l2_ctrl *ctrl) { struct msi001_dev *dev = container_of(ctrl->handler, struct msi001_dev, hdl); struct spi_device *spi = dev->spi; int ret; 
dev_dbg(&spi->dev, "id=%d name=%s val=%d min=%lld max=%lld step=%lld\n", ctrl->id, ctrl->name, ctrl->val, ctrl->minimum, ctrl->maximum, ctrl->step); switch (ctrl->id) { case V4L2_CID_RF_TUNER_BANDWIDTH_AUTO: case V4L2_CID_RF_TUNER_BANDWIDTH: ret = msi001_set_tuner(dev); break; case V4L2_CID_RF_TUNER_LNA_GAIN: ret = msi001_set_gain(dev, dev->lna_gain->val, dev->mixer_gain->cur.val, dev->if_gain->cur.val); break; case V4L2_CID_RF_TUNER_MIXER_GAIN: ret = msi001_set_gain(dev, dev->lna_gain->cur.val, dev->mixer_gain->val, dev->if_gain->cur.val); break; case V4L2_CID_RF_TUNER_IF_GAIN: ret = msi001_set_gain(dev, dev->lna_gain->cur.val, dev->mixer_gain->cur.val, dev->if_gain->val); break; default: dev_dbg(&spi->dev, "unknown control %d\n", ctrl->id); ret = -EINVAL; } return ret; } static const struct v4l2_ctrl_ops msi001_ctrl_ops = { .s_ctrl = msi001_s_ctrl, }; static int msi001_probe(struct spi_device *spi) { struct msi001_dev *dev; int ret; dev_dbg(&spi->dev, "\n"); dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) { ret = -ENOMEM; goto err; } dev->spi = spi; dev->f_tuner = bands[0].rangelow; v4l2_spi_subdev_init(&dev->sd, spi, &msi001_ops); /* Register controls */ v4l2_ctrl_handler_init(&dev->hdl, 5); dev->bandwidth_auto = v4l2_ctrl_new_std(&dev->hdl, &msi001_ctrl_ops, V4L2_CID_RF_TUNER_BANDWIDTH_AUTO, 0, 1, 1, 1); dev->bandwidth = v4l2_ctrl_new_std(&dev->hdl, &msi001_ctrl_ops, V4L2_CID_RF_TUNER_BANDWIDTH, 200000, 8000000, 1, 200000); if (dev->hdl.error) { ret = dev->hdl.error; dev_err(&spi->dev, "Could not initialize controls\n"); /* control init failed, free handler */ goto err_ctrl_handler_free; } v4l2_ctrl_auto_cluster(2, &dev->bandwidth_auto, 0, false); dev->lna_gain = v4l2_ctrl_new_std(&dev->hdl, &msi001_ctrl_ops, V4L2_CID_RF_TUNER_LNA_GAIN, 0, 1, 1, 1); dev->mixer_gain = v4l2_ctrl_new_std(&dev->hdl, &msi001_ctrl_ops, V4L2_CID_RF_TUNER_MIXER_GAIN, 0, 1, 1, 1); dev->if_gain = v4l2_ctrl_new_std(&dev->hdl, &msi001_ctrl_ops, V4L2_CID_RF_TUNER_IF_GAIN, 0, 59, 1, 0); if (dev->hdl.error) { ret = dev->hdl.error; dev_err(&spi->dev, "Could not initialize controls\n"); /* control init failed, free handler */ goto err_ctrl_handler_free; } dev->sd.ctrl_handler = &dev->hdl; return 0; err_ctrl_handler_free: v4l2_ctrl_handler_free(&dev->hdl); kfree(dev); err: return ret; } static void msi001_remove(struct spi_device *spi) { struct v4l2_subdev *sd = spi_get_drvdata(spi); struct msi001_dev *dev = sd_to_msi001_dev(sd); dev_dbg(&spi->dev, "\n"); /* * Registered by v4l2_spi_new_subdev() from master driver, but we must * unregister it from here. Weird. */ v4l2_device_unregister_subdev(&dev->sd); v4l2_ctrl_handler_free(&dev->hdl); kfree(dev); } static const struct spi_device_id msi001_id_table[] = { {"msi001", 0}, {} }; MODULE_DEVICE_TABLE(spi, msi001_id_table); static struct spi_driver msi001_driver = { .driver = { .name = "msi001", .suppress_bind_attrs = true, }, .probe = msi001_probe, .remove = msi001_remove, .id_table = msi001_id_table, }; module_spi_driver(msi001_driver); MODULE_AUTHOR("Antti Palosaari <crope@iki.fi>"); MODULE_DESCRIPTION("Mirics MSi001"); MODULE_LICENSE("GPL"); |
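/*
 * Editor's illustration (not part of the msi001 driver): the tuner
 * programming above boils down to one fixed-point calculation --
 * f_vco = (f_rf + f_if + f_if1) * div_lo is split into an integer multiple
 * div_n of DIV_PRE_N * F_REF plus a k_frac/k_thresh fraction, which is then
 * reduced by their GCD and clamped to the 12-bit register field.  The
 * stand-alone sketch below reproduces that arithmetic for one hypothetical
 * tune, 100 MHz with zero IF in the VHF band entry (div_lo = 32), assuming
 * the same F_REF = 24 MHz and pre-divider of 4; the 4095 clamp is omitted
 * because the reduced denominator already fits.  Expected output is
 * div_n = 33 with a 1/3 fraction, which recomputes to exactly 100000000 Hz.
 */
#include <stdio.h>
#include <stdint.h>

/* Euclid's algorithm, standing in for the kernel's gcd() helper. */
static unsigned int gcd_u(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

int main(void)
{
	const uint64_t f_ref = 24000000;	/* F_REF */
	const uint64_t div_pre_n = 4;		/* DIV_PRE_N */
	const uint64_t f_rf = 100000000;	/* hypothetical tune, zero IF */
	const uint64_t div_lo = 32;		/* VHF entry of band_lut */
	uint64_t f_vco = f_rf * div_lo;				/* 3.2 GHz */
	unsigned int div_n = f_vco / (div_pre_n * f_ref);	/* integer word */
	unsigned int k = f_vco % (div_pre_n * f_ref);		/* leftover Hz */
	unsigned int k_thresh = (div_pre_n * f_ref) / div_lo;	/* F_VCO_STEP = div_lo */
	unsigned int k_frac = (uint64_t)k * k_thresh / (div_pre_n * f_ref);
	unsigned int g = gcd_u(k_thresh, k_frac);
	uint64_t f_out;

	k_thresh /= g;
	k_frac /= g;
	/* Recompute the RF frequency the synthesizer will actually produce. */
	f_out = (f_ref * div_pre_n * div_n +
		 f_ref * div_pre_n * k_frac / k_thresh) / div_lo;
	printf("div_n=%u k_frac=%u/%u f_out=%llu Hz\n",
	       div_n, k_frac, k_thresh, (unsigned long long)f_out);
	return 0;
}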
/* SPDX-License-Identifier: GPL-2.0 */

#ifndef BTRFS_SPACE_INFO_H
#define BTRFS_SPACE_INFO_H

#include <trace/events/btrfs.h>
#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/kobject.h>
#include <linux/lockdep.h>
#include <linux/wait.h>
#include <linux/rwsem.h>
#include "volumes.h"

struct btrfs_fs_info;
struct btrfs_block_group;

/*
 * Different levels of flushing for space reservations.
 *
 * The higher the level, the more methods we try to reclaim space.
 */
enum btrfs_reserve_flush_enum {
	/* If we are in the transaction, we can't flush anything. */
	BTRFS_RESERVE_NO_FLUSH,

	/*
	 * Flush space by:
	 * - Running delayed inode items
	 * - Allocating a new chunk
	 */
	BTRFS_RESERVE_FLUSH_LIMIT,

	/*
	 * Flush space by:
	 * - Running delayed inode items
	 * - Running delayed refs
	 * - Running delalloc and waiting for ordered extents
	 * - Allocating a new chunk
	 * - Committing transaction
	 */
	BTRFS_RESERVE_FLUSH_EVICT,

	/*
	 * Flush space by the above mentioned methods and by:
	 * - Running delayed iputs
	 * - Committing transaction
	 *
	 * Can be interrupted by a fatal signal.
	 */
	BTRFS_RESERVE_FLUSH_DATA,
	BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE,
	BTRFS_RESERVE_FLUSH_ALL,

	/*
	 * Pretty much the same as FLUSH_ALL, but can also steal space from
	 * the global rsv.
	 *
	 * Can be interrupted by a fatal signal.
	 */
	BTRFS_RESERVE_FLUSH_ALL_STEAL,

	/*
	 * This is for btrfs_use_block_rsv only.  We have exhausted our block
	 * rsv and our global block rsv.  This can happen for things like
	 * delalloc where we are overwriting a lot of extents with a single
	 * extent and didn't reserve enough space.  Alternatively it can
	 * happen with delalloc where we reserve one extent's worth for a
	 * large extent but fragmentation leads to multiple extents being
	 * created.  This will give us the reservation in the case of
	 *
	 *   if (num_bytes < (space_info->total_bytes -
	 *                    btrfs_space_info_used(space_info, false)))
	 *
	 * which ignores bytes_may_use.  This is potentially dangerous, but
	 * our reservation system is generally pessimistic, so it is able to
	 * absorb this style of mistake.
	 */
	BTRFS_RESERVE_FLUSH_EMERGENCY,
};

/*
 * Please be aware that the order of the enum values will be the order of the
 * reclaim process in btrfs_async_reclaim_metadata_space().
*/ enum btrfs_flush_state { FLUSH_DELAYED_ITEMS_NR = 1, FLUSH_DELAYED_ITEMS = 2, FLUSH_DELAYED_REFS_NR = 3, FLUSH_DELAYED_REFS = 4, FLUSH_DELALLOC = 5, FLUSH_DELALLOC_WAIT = 6, FLUSH_DELALLOC_FULL = 7, ALLOC_CHUNK = 8, ALLOC_CHUNK_FORCE = 9, RUN_DELAYED_IPUTS = 10, COMMIT_TRANS = 11, RESET_ZONES = 12, }; enum btrfs_space_info_sub_group { BTRFS_SUB_GROUP_PRIMARY, BTRFS_SUB_GROUP_DATA_RELOC, BTRFS_SUB_GROUP_TREELOG, }; #define BTRFS_SPACE_INFO_SUB_GROUP_MAX 1 struct btrfs_space_info { struct btrfs_fs_info *fs_info; struct btrfs_space_info *parent; struct btrfs_space_info *sub_group[BTRFS_SPACE_INFO_SUB_GROUP_MAX]; int subgroup_id; spinlock_t lock; u64 total_bytes; /* total bytes in the space, this doesn't take mirrors into account */ u64 bytes_used; /* total bytes used, this doesn't take mirrors into account */ u64 bytes_pinned; /* total bytes pinned, will be freed when the transaction finishes */ u64 bytes_reserved; /* total bytes the allocator has reserved for current allocations */ u64 bytes_may_use; /* number of bytes that may be used for delalloc/allocations */ u64 bytes_readonly; /* total bytes that are read only */ u64 bytes_zone_unusable; /* total bytes that are unusable until resetting the device zone */ u64 max_extent_size; /* This will hold the maximum extent size of the space info if we had an ENOSPC in the allocator. */ /* Chunk size in bytes */ u64 chunk_size; /* * Once a block group drops below this threshold (percents) we'll * schedule it for reclaim. */ int bg_reclaim_threshold; int clamp; /* Used to scale our threshold for preemptive flushing. The value is >> clamp, so turns out to be a 2^clamp divisor. */ unsigned int full:1; /* indicates that we cannot allocate any more chunks for this space */ unsigned int chunk_alloc:1; /* set if we are allocating a chunk */ unsigned int flush:1; /* set if we are trying to make space */ unsigned int force_alloc; /* set if we need to force a chunk alloc for this space */ u64 disk_used; /* total bytes used on disk */ u64 disk_total; /* total bytes on disk, takes mirrors into account */ u64 flags; struct list_head list; /* Protected by the spinlock 'lock'. */ struct list_head ro_bgs; struct list_head priority_tickets; struct list_head tickets; /* * Size of space that needs to be reclaimed in order to satisfy pending * tickets */ u64 reclaim_size; /* * tickets_id just indicates the next ticket will be handled, so note * it's not stored per ticket. */ u64 tickets_id; struct rw_semaphore groups_sem; /* for block groups in our same type */ struct list_head block_groups[BTRFS_NR_RAID_TYPES]; struct kobject kobj; struct kobject *block_group_kobjs[BTRFS_NR_RAID_TYPES]; /* * Monotonically increasing counter of block group reclaim attempts * Exposed in /sys/fs/<uuid>/allocation/<type>/reclaim_count */ u64 reclaim_count; /* * Monotonically increasing counter of reclaimed bytes * Exposed in /sys/fs/<uuid>/allocation/<type>/reclaim_bytes */ u64 reclaim_bytes; /* * Monotonically increasing counter of reclaim errors * Exposed in /sys/fs/<uuid>/allocation/<type>/reclaim_errors */ u64 reclaim_errors; /* * If true, use the dynamic relocation threshold, instead of the * fixed bg_reclaim_threshold. */ bool dynamic_reclaim; /* * Periodically check all block groups against the reclaim * threshold in the cleaner thread. */ bool periodic_reclaim; /* * Periodic reclaim should be a no-op if a space_info hasn't * freed any space since the last time we tried. */ bool periodic_reclaim_ready; /* * Net bytes freed or allocated since the last reclaim pass. 
*/ s64 reclaimable_bytes; }; struct reserve_ticket { u64 bytes; int error; bool steal; struct list_head list; wait_queue_head_t wait; }; static inline bool btrfs_mixed_space_info(const struct btrfs_space_info *space_info) { return ((space_info->flags & BTRFS_BLOCK_GROUP_METADATA) && (space_info->flags & BTRFS_BLOCK_GROUP_DATA)); } /* * * Declare a helper function to detect underflow of various space info members */ #define DECLARE_SPACE_INFO_UPDATE(name, trace_name) \ static inline void \ btrfs_space_info_update_##name(struct btrfs_space_info *sinfo, \ s64 bytes) \ { \ struct btrfs_fs_info *fs_info = sinfo->fs_info; \ const u64 abs_bytes = (bytes < 0) ? -bytes : bytes; \ lockdep_assert_held(&sinfo->lock); \ trace_update_##name(fs_info, sinfo, sinfo->name, bytes); \ trace_btrfs_space_reservation(fs_info, trace_name, \ sinfo->flags, abs_bytes, \ bytes > 0); \ if (bytes < 0 && sinfo->name < -bytes) { \ WARN_ON(1); \ sinfo->name = 0; \ return; \ } \ sinfo->name += bytes; \ } DECLARE_SPACE_INFO_UPDATE(bytes_may_use, "space_info"); DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned"); DECLARE_SPACE_INFO_UPDATE(bytes_zone_unusable, "zone_unusable"); int btrfs_init_space_info(struct btrfs_fs_info *fs_info); void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info, struct btrfs_block_group *block_group); void btrfs_update_space_info_chunk_size(struct btrfs_space_info *space_info, u64 chunk_size); struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, u64 flags); u64 __pure btrfs_space_info_used(const struct btrfs_space_info *s_info, bool may_use_included); void btrfs_clear_space_info_full(struct btrfs_fs_info *info); void btrfs_dump_space_info(struct btrfs_fs_info *fs_info, struct btrfs_space_info *info, u64 bytes, bool dump_block_groups); int btrfs_reserve_metadata_bytes(struct btrfs_fs_info *fs_info, struct btrfs_space_info *space_info, u64 orig_bytes, enum btrfs_reserve_flush_enum flush); void btrfs_try_granting_tickets(struct btrfs_fs_info *fs_info, struct btrfs_space_info *space_info); int btrfs_can_overcommit(struct btrfs_fs_info *fs_info, const struct btrfs_space_info *space_info, u64 bytes, enum btrfs_reserve_flush_enum flush); static inline void btrfs_space_info_free_bytes_may_use( struct btrfs_space_info *space_info, u64 num_bytes) { spin_lock(&space_info->lock); btrfs_space_info_update_bytes_may_use(space_info, -num_bytes); btrfs_try_granting_tickets(space_info->fs_info, space_info); spin_unlock(&space_info->lock); } int btrfs_reserve_data_bytes(struct btrfs_space_info *space_info, u64 bytes, enum btrfs_reserve_flush_enum flush); void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info); void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info); u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo); void btrfs_space_info_update_reclaimable(struct btrfs_space_info *space_info, s64 bytes); void btrfs_set_periodic_reclaim_ready(struct btrfs_space_info *space_info, bool ready); int btrfs_calc_reclaim_threshold(const struct btrfs_space_info *space_info); void btrfs_reclaim_sweep(const struct btrfs_fs_info *fs_info); void btrfs_return_free_space(struct btrfs_space_info *space_info, u64 len); #endif /* BTRFS_SPACE_INFO_H */ |
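/*
 * Editor's note (not part of the btrfs sources): DECLARE_SPACE_INFO_UPDATE()
 * above stamps out one locked, underflow-guarded update helper per counter,
 * but the tracepoint plumbing makes the macro harder to read than what it
 * generates.  Purely as a reading aid, this is roughly what the bytes_pinned
 * instantiation expands to, with the two trace calls elided; the "_sketch"
 * name is made up and does not exist in the tree.
 */
static inline void
btrfs_space_info_update_bytes_pinned_sketch(struct btrfs_space_info *sinfo,
					    s64 bytes)
{
	/* Every counter update must happen under the space_info lock. */
	lockdep_assert_held(&sinfo->lock);
	if (bytes < 0 && sinfo->bytes_pinned < -bytes) {
		/* Would underflow: warn and clamp to zero instead of wrapping. */
		WARN_ON(1);
		sinfo->bytes_pinned = 0;
		return;
	}
	sinfo->bytes_pinned += bytes;
}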
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Copyright (C) 2014 Facebook.  All rights reserved.
 */

#ifndef BTRFS_QGROUP_H
#define BTRFS_QGROUP_H

#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/rbtree.h>
#include <linux/kobject.h>
#include <linux/list.h>
#include <uapi/linux/btrfs_tree.h>

struct extent_buffer;
struct extent_changeset;
struct btrfs_delayed_extent_op;
struct btrfs_fs_info;
struct btrfs_root;
struct btrfs_ioctl_quota_ctl_args;
struct btrfs_trans_handle;
struct btrfs_delayed_ref_root;
struct btrfs_inode;
struct btrfs_transaction;
struct btrfs_block_group;
struct btrfs_qgroup_swapped_blocks;

/*
 * Btrfs qgroup overview
 *
 * Btrfs qgroup splits into 3 main parts:
 * 1) Reserve
 *    Reserve metadata/data space for incoming operations.
 *    Affects how the qgroup limit works.
 *
 * 2) Trace
 *    Tell btrfs qgroup to trace dirty extents.
 *
 *    Dirty extents include:
 *    - Newly allocated extents
 *    - Extents going to be deleted (in this trans)
 *    - Extents whose owner is going to be modified
 *
 *    This is the main part that affects whether qgroup numbers will stay
 *    consistent.
 *    Btrfs qgroup can trace clean extents without causing any problem, but
 *    it will consume extra CPU time, so it should be avoided if possible.
 *
 * 3) Account
 *    Btrfs qgroup will update its numbers based on the dirty extents traced
 *    in the previous step.
 *
 *    Normally done at qgroup rescan and transaction commit time.
 */

/*
 * Special performance optimization for balance.
 *
 * For balance, we need to swap the subtrees of the subvolume and reloc trees.
 * In theory, we need to trace all subtree blocks of both the subvolume and
 * reloc trees, since their owner has changed during such a swap.
 *
 * However, since balance has ensured that both subtrees contain the same
 * contents and have the same tree structures, such a swap won't cause a
 * qgroup number change.
* * But there is a race window between subtree swap and transaction commit, * during that window, if we increase/decrease tree level or merge/split tree * blocks, we still need to trace the original subtrees. * * So for balance, we use a delayed subtree tracing, whose workflow is: * * 1) Record the subtree root block get swapped. * * During subtree swap: * O = Old tree blocks * N = New tree blocks * reloc tree subvolume tree X * Root Root * / \ / \ * NA OB OA OB * / | | \ / | | \ * NC ND OE OF OC OD OE OF * * In this case, NA and OA are going to be swapped, record (NA, OA) into * subvolume tree X. * * 2) After subtree swap. * reloc tree subvolume tree X * Root Root * / \ / \ * OA OB NA OB * / | | \ / | | \ * OC OD OE OF NC ND OE OF * * 3a) COW happens for OB * If we are going to COW tree block OB, we check OB's bytenr against * tree X's swapped_blocks structure. * If it doesn't fit any, nothing will happen. * * 3b) COW happens for NA * Check NA's bytenr against tree X's swapped_blocks, and get a hit. * Then we do subtree scan on both subtrees OA and NA. * Resulting 6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND). * * Then no matter what we do to subvolume tree X, qgroup numbers will * still be correct. * Then NA's record gets removed from X's swapped_blocks. * * 4) Transaction commit * Any record in X's swapped_blocks gets removed, since there is no * modification to the swapped subtrees, no need to trigger heavy qgroup * subtree rescan for them. */ /* * These flags share the flags field of the btrfs_qgroup_status_item with the * persisted flags defined in btrfs_tree.h. * * To minimize the chance of collision with new persisted status flags, these * count backwards from the MSB. */ #define BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN (1ULL << 63) #define BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING (1ULL << 62) #define BTRFS_QGROUP_DROP_SUBTREE_THRES_DEFAULT (3) /* * Record a dirty extent, and info qgroup to update quota on it */ struct btrfs_qgroup_extent_record { /* * The bytenr of the extent is given by its index in the dirty_extents * xarray of struct btrfs_delayed_ref_root left shifted by * fs_info->sectorsize_bits. */ u64 num_bytes; /* * For qgroup reserved data space freeing. * * @data_rsv_refroot and @data_rsv will be recorded after * BTRFS_ADD_DELAYED_EXTENT is called. * And will be used to free reserved qgroup space at * transaction commit time. */ u32 data_rsv; /* reserved data space needs to be freed */ u64 data_rsv_refroot; /* which root the reserved data belongs to */ struct ulist *old_roots; }; struct btrfs_qgroup_swapped_block { struct rb_node node; int level; bool trace_leaf; /* bytenr/generation of the tree block in subvolume tree after swap */ u64 subvol_bytenr; u64 subvol_generation; /* bytenr/generation of the tree block in reloc tree after swap */ u64 reloc_bytenr; u64 reloc_generation; u64 last_snapshot; struct btrfs_key first_key; }; /* * Qgroup reservation types: * * DATA: * space reserved for data * * META_PERTRANS: * Space reserved for metadata (per-transaction) * Due to the fact that qgroup data is only updated at transaction commit * time, reserved space for metadata must be kept until transaction * commits. * Any metadata reserved that are used in btrfs_start_transaction() should * be of this type. * * META_PREALLOC: * There are cases where metadata space is reserved before starting * transaction, and then btrfs_join_transaction() to get a trans handle. * Any metadata reserved for such usage should be of this type. 
* And after join_transaction() part (or all) of such reservation should * be converted into META_PERTRANS. */ enum btrfs_qgroup_rsv_type { BTRFS_QGROUP_RSV_DATA, BTRFS_QGROUP_RSV_META_PERTRANS, BTRFS_QGROUP_RSV_META_PREALLOC, BTRFS_QGROUP_RSV_LAST, }; /* * Represents how many bytes we have reserved for this qgroup. * * Each type should have different reservation behavior. * E.g, data follows its io_tree flag modification, while * *currently* meta is just reserve-and-clear during transaction. * * TODO: Add new type for reservation which can survive transaction commit. * Current metadata reservation behavior is not suitable for such case. */ struct btrfs_qgroup_rsv { u64 values[BTRFS_QGROUP_RSV_LAST]; }; /* * one struct for each qgroup, organized in fs_info->qgroup_tree. */ struct btrfs_qgroup { u64 qgroupid; /* * state */ u64 rfer; /* referenced */ u64 rfer_cmpr; /* referenced compressed */ u64 excl; /* exclusive */ u64 excl_cmpr; /* exclusive compressed */ /* * limits */ u64 lim_flags; /* which limits are set */ u64 max_rfer; u64 max_excl; u64 rsv_rfer; u64 rsv_excl; /* * reservation tracking */ struct btrfs_qgroup_rsv rsv; /* * lists */ struct list_head groups; /* groups this group is member of */ struct list_head members; /* groups that are members of this group */ struct list_head dirty; /* dirty groups */ /* * For qgroup iteration usage. * * The iteration list should always be empty until qgroup_iterator_add() * is called. And should be reset to empty after the iteration is * finished. */ struct list_head iterator; /* * For nested iterator usage. * * Here we support at most one level of nested iterator calls like: * * LIST_HEAD(all_qgroups); * { * LIST_HEAD(local_qgroups); * qgroup_iterator_add(local_qgroups, qg); * qgroup_iterator_nested_add(all_qgroups, qg); * do_some_work(local_qgroups); * qgroup_iterator_clean(local_qgroups); * } * do_some_work(all_qgroups); * qgroup_iterator_nested_clean(all_qgroups); */ struct list_head nested_iterator; struct rb_node node; /* tree of qgroups */ /* * temp variables for accounting operations * Refer to qgroup_shared_accounting() for details. */ u64 old_refcnt; u64 new_refcnt; /* * Sysfs kobjectid */ struct kobject kobj; }; /* Glue structure to represent the relations between qgroups. */ struct btrfs_qgroup_list { struct list_head next_group; struct list_head next_member; struct btrfs_qgroup *group; struct btrfs_qgroup *member; }; struct btrfs_squota_delta { /* The fstree root this delta counts against. */ u64 root; /* The number of bytes in the extent being counted. */ u64 num_bytes; /* The generation the extent was created in. */ u64 generation; /* Whether we are using or freeing the extent. */ bool is_inc; /* Whether the extent is data or metadata. 
*/ bool is_data; }; static inline u64 btrfs_qgroup_subvolid(u64 qgroupid) { return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1)); } /* * For qgroup event trace points only */ enum { ENUM_BIT(QGROUP_RESERVE), ENUM_BIT(QGROUP_RELEASE), ENUM_BIT(QGROUP_FREE), }; enum btrfs_qgroup_mode { BTRFS_QGROUP_MODE_DISABLED, BTRFS_QGROUP_MODE_FULL, BTRFS_QGROUP_MODE_SIMPLE }; enum btrfs_qgroup_mode btrfs_qgroup_mode(const struct btrfs_fs_info *fs_info); bool btrfs_qgroup_enabled(const struct btrfs_fs_info *fs_info); bool btrfs_qgroup_full_accounting(const struct btrfs_fs_info *fs_info); int btrfs_quota_enable(struct btrfs_fs_info *fs_info, struct btrfs_ioctl_quota_ctl_args *quota_ctl_args); int btrfs_quota_disable(struct btrfs_fs_info *fs_info); int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info); int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info, bool interruptible); int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst, struct btrfs_qgroup_list *prealloc); int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst); int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid); int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid); int btrfs_qgroup_cleanup_dropped_subvolume(struct btrfs_fs_info *fs_info, u64 subvolid); int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid, struct btrfs_qgroup_limit *limit); int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info); void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info); int btrfs_qgroup_trace_extent_nolock( struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_root *delayed_refs, struct btrfs_qgroup_extent_record *record, u64 bytenr); int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans, struct btrfs_qgroup_extent_record *qrecord, u64 bytenr); int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr, u64 num_bytes); int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans, struct extent_buffer *eb); int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans, struct extent_buffer *root_eb, u64 root_gen, int root_level); int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr, u64 num_bytes, struct ulist *old_roots, struct ulist *new_roots); int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans); int btrfs_run_qgroups(struct btrfs_trans_handle *trans); int btrfs_qgroup_check_inherit(struct btrfs_fs_info *fs_info, struct btrfs_qgroup_inherit *inherit, size_t size); int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, u64 objectid, u64 inode_rootid, struct btrfs_qgroup_inherit *inherit); void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info, u64 ref_root, u64 num_bytes, enum btrfs_qgroup_rsv_type type); #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS int btrfs_verify_qgroup_counts(const struct btrfs_fs_info *fs_info, u64 qgroupid, u64 rfer, u64 excl); #endif /* New io_tree based accurate qgroup reserve API */ int btrfs_qgroup_reserve_data(struct btrfs_inode *inode, struct extent_changeset **reserved, u64 start, u64 len); int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len, u64 *released); int btrfs_qgroup_free_data(struct btrfs_inode *inode, struct extent_changeset *reserved, u64 start, u64 len, u64 *freed); int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes, enum btrfs_qgroup_rsv_type type, bool enforce); int 
__btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes, enum btrfs_qgroup_rsv_type type, bool enforce, bool noflush); /* Reserve metadata space for pertrans and prealloc type */ static inline int btrfs_qgroup_reserve_meta_pertrans(struct btrfs_root *root, int num_bytes, bool enforce) { return __btrfs_qgroup_reserve_meta(root, num_bytes, BTRFS_QGROUP_RSV_META_PERTRANS, enforce, false); } static inline int btrfs_qgroup_reserve_meta_prealloc(struct btrfs_root *root, int num_bytes, bool enforce, bool noflush) { return __btrfs_qgroup_reserve_meta(root, num_bytes, BTRFS_QGROUP_RSV_META_PREALLOC, enforce, noflush); } void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes, enum btrfs_qgroup_rsv_type type); /* Free per-transaction meta reservation for error handling */ static inline void btrfs_qgroup_free_meta_pertrans(struct btrfs_root *root, int num_bytes) { __btrfs_qgroup_free_meta(root, num_bytes, BTRFS_QGROUP_RSV_META_PERTRANS); } /* Pre-allocated meta reservation can be freed at need */ static inline void btrfs_qgroup_free_meta_prealloc(struct btrfs_root *root, int num_bytes) { __btrfs_qgroup_free_meta(root, num_bytes, BTRFS_QGROUP_RSV_META_PREALLOC); } void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root); void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes); void btrfs_qgroup_check_reserved_leak(struct btrfs_inode *inode); /* btrfs_qgroup_swapped_blocks related functions */ void btrfs_qgroup_init_swapped_blocks( struct btrfs_qgroup_swapped_blocks *swapped_blocks); void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root); int btrfs_qgroup_add_swapped_blocks(struct btrfs_root *subvol_root, struct btrfs_block_group *bg, struct extent_buffer *subvol_parent, int subvol_slot, struct extent_buffer *reloc_parent, int reloc_slot, u64 last_snapshot); int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *eb); void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans); bool btrfs_check_quota_leak(const struct btrfs_fs_info *fs_info); int btrfs_record_squota_delta(struct btrfs_fs_info *fs_info, const struct btrfs_squota_delta *delta); #endif |
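/*
 * Editor's illustration (not from the btrfs sources): the META_PREALLOC
 * comment above describes a reserve-early / convert-later dance whose pieces
 * are scattered across several declarations.  This hypothetical flow strings
 * them together: only the three qgroup helpers are real, while the byte
 * counts and the elided join/modify step are placeholders.
 */
static inline int example_qgroup_prealloc_then_convert(struct btrfs_root *root)
{
	const int reserved = 16384;	/* speculative pre-reservation */
	const int used = 4096;		/* what the operation really consumed */
	int ret;

	/* Reserve before the transaction is joined. */
	ret = btrfs_qgroup_reserve_meta_prealloc(root, reserved, true, false);
	if (ret)
		return ret;

	/* ... btrfs_join_transaction() and the actual metadata changes ... */

	/* Move the consumed part into the per-transaction bucket. */
	btrfs_qgroup_convert_reserved_meta(root, used);
	/* Hand back the unused remainder of the prealloc reservation. */
	btrfs_qgroup_free_meta_prealloc(root, reserved - used);
	return 0;
}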
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Driver for DiBcom DiB3000MC/P-demodulator.
 *
 * Copyright (C) 2004-7 DiBcom (http://www.dibcom.fr/)
 * Copyright (C) 2004-5 Patrick Boettcher (patrick.boettcher@posteo.de)
 *
 * This code is partially based on the previous dib3000mc.c .
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/i2c.h>

#include <media/dvb_frontend.h>

#include "dib3000mc.h"

static int debug;
module_param(debug, int, 0644);
MODULE_PARM_DESC(debug, "turn on debugging (default: 0)");

static int buggy_sfn_workaround;
module_param(buggy_sfn_workaround, int, 0644);
MODULE_PARM_DESC(buggy_sfn_workaround, "Enable work-around for buggy SFNs (default: 0)");

#define dprintk(fmt, arg...) do {					\
	if (debug)							\
		printk(KERN_DEBUG pr_fmt("%s: " fmt),			\
		       __func__, ##arg);				\
} while (0)

struct dib3000mc_state {
	struct dvb_frontend demod;
	struct dib3000mc_config *cfg;

	u8 i2c_addr;
	struct i2c_adapter *i2c_adap;

	struct dibx000_i2c_master i2c_master;

	u32 timf;
	u32 current_bandwidth;

	u16 dev_id;

	u8 sfn_workaround_active :1;
};

static u16 dib3000mc_read_word(struct dib3000mc_state *state, u16 reg)
{
	struct i2c_msg msg[2] = {
		{ .addr = state->i2c_addr >> 1, .flags = 0,        .len = 2 },
		{ .addr = state->i2c_addr >> 1, .flags = I2C_M_RD, .len = 2 },
	};
	u16 word;
	u8 *b;

	b = kmalloc(4, GFP_KERNEL);
	if (!b)
		return 0;

	b[0] = (reg >> 8) | 0x80;
	b[1] = reg;
	b[2] = 0;
	b[3] = 0;

	msg[0].buf = b;
	msg[1].buf = b + 2;

	if (i2c_transfer(state->i2c_adap, msg, 2) != 2)
		dprintk("i2c read error on %d\n", reg);

	word = (b[2] << 8) | b[3];
	kfree(b);

	return word;
}

static int dib3000mc_write_word(struct dib3000mc_state *state, u16 reg, u16 val)
{
	struct i2c_msg msg = {
		.addr = state->i2c_addr >> 1, .flags = 0, .len = 4
	};
	int rc;
	u8 *b;

	b = kmalloc(4, GFP_KERNEL);
	if (!b)
		return -ENOMEM;

	b[0] = reg >> 8;
	b[1] = reg;
	b[2] = val >> 8;
	b[3] = val;

	msg.buf = b;

	rc = i2c_transfer(state->i2c_adap, &msg, 1) != 1 ?
-EREMOTEIO : 0; kfree(b); return rc; } static int dib3000mc_identify(struct dib3000mc_state *state) { u16 value; if ((value = dib3000mc_read_word(state, 1025)) != 0x01b3) { dprintk("-E- DiB3000MC/P: wrong Vendor ID (read=0x%x)\n",value); return -EREMOTEIO; } value = dib3000mc_read_word(state, 1026); if (value != 0x3001 && value != 0x3002) { dprintk("-E- DiB3000MC/P: wrong Device ID (%x)\n",value); return -EREMOTEIO; } state->dev_id = value; dprintk("-I- found DiB3000MC/P: %x\n",state->dev_id); return 0; } static int dib3000mc_set_timing(struct dib3000mc_state *state, s16 nfft, u32 bw, u8 update_offset) { u32 timf; if (state->timf == 0) { timf = 1384402; // default value for 8MHz if (update_offset) msleep(200); // first time we do an update } else timf = state->timf; timf *= (bw / 1000); if (update_offset) { s16 tim_offs = dib3000mc_read_word(state, 416); if (tim_offs & 0x2000) tim_offs -= 0x4000; if (nfft == TRANSMISSION_MODE_2K) tim_offs *= 4; timf += tim_offs; state->timf = timf / (bw / 1000); } dprintk("timf: %d\n", timf); dib3000mc_write_word(state, 23, (u16) (timf >> 16)); dib3000mc_write_word(state, 24, (u16) (timf ) & 0xffff); return 0; } static int dib3000mc_setup_pwm_state(struct dib3000mc_state *state) { u16 reg_51, reg_52 = state->cfg->agc->setup & 0xfefb; if (state->cfg->pwm3_inversion) { reg_51 = (2 << 14) | (0 << 10) | (7 << 6) | (2 << 2) | (2 << 0); reg_52 |= (1 << 2); } else { reg_51 = (2 << 14) | (4 << 10) | (7 << 6) | (2 << 2) | (2 << 0); reg_52 |= (1 << 8); } dib3000mc_write_word(state, 51, reg_51); dib3000mc_write_word(state, 52, reg_52); if (state->cfg->use_pwm3) dib3000mc_write_word(state, 245, (1 << 3) | (1 << 0)); else dib3000mc_write_word(state, 245, 0); dib3000mc_write_word(state, 1040, 0x3); return 0; } static int dib3000mc_set_output_mode(struct dib3000mc_state *state, int mode) { int ret = 0; u16 fifo_threshold = 1792; u16 outreg = 0; u16 outmode = 0; u16 elecout = 1; u16 smo_reg = dib3000mc_read_word(state, 206) & 0x0010; /* keep the pid_parse bit */ dprintk("-I- Setting output mode for demod %p to %d\n", &state->demod, mode); switch (mode) { case OUTMODE_HIGH_Z: // disable elecout = 0; break; case OUTMODE_MPEG2_PAR_GATED_CLK: // STBs with parallel gated clock outmode = 0; break; case OUTMODE_MPEG2_PAR_CONT_CLK: // STBs with parallel continues clock outmode = 1; break; case OUTMODE_MPEG2_SERIAL: // STBs with serial input outmode = 2; break; case OUTMODE_MPEG2_FIFO: // e.g. 
USB feeding elecout = 3; /*ADDR @ 206 : P_smo_error_discard [1;6:6] = 0 P_smo_rs_discard [1;5:5] = 0 P_smo_pid_parse [1;4:4] = 0 P_smo_fifo_flush [1;3:3] = 0 P_smo_mode [2;2:1] = 11 P_smo_ovf_prot [1;0:0] = 0 */ smo_reg |= 3 << 1; fifo_threshold = 512; outmode = 5; break; case OUTMODE_DIVERSITY: outmode = 4; elecout = 1; break; default: dprintk("Unhandled output_mode passed to be set for demod %p\n",&state->demod); outmode = 0; break; } if ((state->cfg->output_mpeg2_in_188_bytes)) smo_reg |= (1 << 5); // P_smo_rs_discard [1;5:5] = 1 outreg = dib3000mc_read_word(state, 244) & 0x07FF; outreg |= (outmode << 11); ret |= dib3000mc_write_word(state, 244, outreg); ret |= dib3000mc_write_word(state, 206, smo_reg); /*smo_ mode*/ ret |= dib3000mc_write_word(state, 207, fifo_threshold); /* synchronous fread */ ret |= dib3000mc_write_word(state, 1040, elecout); /* P_out_cfg */ return ret; } static int dib3000mc_set_bandwidth(struct dib3000mc_state *state, u32 bw) { u16 bw_cfg[6] = { 0 }; u16 imp_bw_cfg[3] = { 0 }; u16 reg; /* settings here are for 27.7MHz */ switch (bw) { case 8000: bw_cfg[0] = 0x0019; bw_cfg[1] = 0x5c30; bw_cfg[2] = 0x0054; bw_cfg[3] = 0x88a0; bw_cfg[4] = 0x01a6; bw_cfg[5] = 0xab20; imp_bw_cfg[0] = 0x04db; imp_bw_cfg[1] = 0x00db; imp_bw_cfg[2] = 0x00b7; break; case 7000: bw_cfg[0] = 0x001c; bw_cfg[1] = 0xfba5; bw_cfg[2] = 0x0060; bw_cfg[3] = 0x9c25; bw_cfg[4] = 0x01e3; bw_cfg[5] = 0x0cb7; imp_bw_cfg[0] = 0x04c0; imp_bw_cfg[1] = 0x00c0; imp_bw_cfg[2] = 0x00a0; break; case 6000: bw_cfg[0] = 0x0021; bw_cfg[1] = 0xd040; bw_cfg[2] = 0x0070; bw_cfg[3] = 0xb62b; bw_cfg[4] = 0x0233; bw_cfg[5] = 0x8ed5; imp_bw_cfg[0] = 0x04a5; imp_bw_cfg[1] = 0x00a5; imp_bw_cfg[2] = 0x0089; break; case 5000: bw_cfg[0] = 0x0028; bw_cfg[1] = 0x9380; bw_cfg[2] = 0x0087; bw_cfg[3] = 0x4100; bw_cfg[4] = 0x02a4; bw_cfg[5] = 0x4500; imp_bw_cfg[0] = 0x0489; imp_bw_cfg[1] = 0x0089; imp_bw_cfg[2] = 0x0072; break; default: return -EINVAL; } for (reg = 6; reg < 12; reg++) dib3000mc_write_word(state, reg, bw_cfg[reg - 6]); dib3000mc_write_word(state, 12, 0x0000); dib3000mc_write_word(state, 13, 0x03e8); dib3000mc_write_word(state, 14, 0x0000); dib3000mc_write_word(state, 15, 0x03f2); dib3000mc_write_word(state, 16, 0x0001); dib3000mc_write_word(state, 17, 0xb0d0); // P_sec_len dib3000mc_write_word(state, 18, 0x0393); dib3000mc_write_word(state, 19, 0x8700); for (reg = 55; reg < 58; reg++) dib3000mc_write_word(state, reg, imp_bw_cfg[reg - 55]); // Timing configuration dib3000mc_set_timing(state, TRANSMISSION_MODE_2K, bw, 0); return 0; } static u16 impulse_noise_val[29] = { 0x38, 0x6d9, 0x3f28, 0x7a7, 0x3a74, 0x196, 0x32a, 0x48c, 0x3ffe, 0x7f3, 0x2d94, 0x76, 0x53d, 0x3ff8, 0x7e3, 0x3320, 0x76, 0x5b3, 0x3feb, 0x7d2, 0x365e, 0x76, 0x48c, 0x3ffe, 0x5b3, 0x3feb, 0x76, 0x0000, 0xd }; static void dib3000mc_set_impulse_noise(struct dib3000mc_state *state, u8 mode, s16 nfft) { u16 i; for (i = 58; i < 87; i++) dib3000mc_write_word(state, i, impulse_noise_val[i-58]); if (nfft == TRANSMISSION_MODE_8K) { dib3000mc_write_word(state, 58, 0x3b); dib3000mc_write_word(state, 84, 0x00); dib3000mc_write_word(state, 85, 0x8200); } dib3000mc_write_word(state, 34, 0x1294); dib3000mc_write_word(state, 35, 0x1ff8); if (mode == 1) dib3000mc_write_word(state, 55, dib3000mc_read_word(state, 55) | (1 << 10)); } static int dib3000mc_init(struct dvb_frontend *demod) { struct dib3000mc_state *state = demod->demodulator_priv; struct dibx000_agc_config *agc = state->cfg->agc; // Restart Configuration dib3000mc_write_word(state, 1027, 0x8000); 
dib3000mc_write_word(state, 1027, 0x0000); // power up the demod + mobility configuration dib3000mc_write_word(state, 140, 0x0000); dib3000mc_write_word(state, 1031, 0); if (state->cfg->mobile_mode) { dib3000mc_write_word(state, 139, 0x0000); dib3000mc_write_word(state, 141, 0x0000); dib3000mc_write_word(state, 175, 0x0002); dib3000mc_write_word(state, 1032, 0x0000); } else { dib3000mc_write_word(state, 139, 0x0001); dib3000mc_write_word(state, 141, 0x0000); dib3000mc_write_word(state, 175, 0x0000); dib3000mc_write_word(state, 1032, 0x012C); } dib3000mc_write_word(state, 1033, 0x0000); // P_clk_cfg dib3000mc_write_word(state, 1037, 0x3130); // other configurations // P_ctrl_sfreq dib3000mc_write_word(state, 33, (5 << 0)); dib3000mc_write_word(state, 88, (1 << 10) | (0x10 << 0)); // Phase noise control // P_fft_phacor_inh, P_fft_phacor_cpe, P_fft_powrange dib3000mc_write_word(state, 99, (1 << 9) | (0x20 << 0)); if (state->cfg->phase_noise_mode == 0) dib3000mc_write_word(state, 111, 0x00); else dib3000mc_write_word(state, 111, 0x02); // P_agc_global dib3000mc_write_word(state, 50, 0x8000); // agc setup misc dib3000mc_setup_pwm_state(state); // P_agc_counter_lock dib3000mc_write_word(state, 53, 0x87); // P_agc_counter_unlock dib3000mc_write_word(state, 54, 0x87); /* agc */ dib3000mc_write_word(state, 36, state->cfg->max_time); dib3000mc_write_word(state, 37, (state->cfg->agc_command1 << 13) | (state->cfg->agc_command2 << 12) | (0x1d << 0)); dib3000mc_write_word(state, 38, state->cfg->pwm3_value); dib3000mc_write_word(state, 39, state->cfg->ln_adc_level); // set_agc_loop_Bw dib3000mc_write_word(state, 40, 0x0179); dib3000mc_write_word(state, 41, 0x03f0); dib3000mc_write_word(state, 42, agc->agc1_max); dib3000mc_write_word(state, 43, agc->agc1_min); dib3000mc_write_word(state, 44, agc->agc2_max); dib3000mc_write_word(state, 45, agc->agc2_min); dib3000mc_write_word(state, 46, (agc->agc1_pt1 << 8) | agc->agc1_pt2); dib3000mc_write_word(state, 47, (agc->agc1_slope1 << 8) | agc->agc1_slope2); dib3000mc_write_word(state, 48, (agc->agc2_pt1 << 8) | agc->agc2_pt2); dib3000mc_write_word(state, 49, (agc->agc2_slope1 << 8) | agc->agc2_slope2); // Begin: TimeOut registers // P_pha3_thres dib3000mc_write_word(state, 110, 3277); // P_timf_alpha = 6, P_corm_alpha = 6, P_corm_thres = 0x80 dib3000mc_write_word(state, 26, 0x6680); // lock_mask0 dib3000mc_write_word(state, 1, 4); // lock_mask1 dib3000mc_write_word(state, 2, 4); // lock_mask2 dib3000mc_write_word(state, 3, 0x1000); // P_search_maxtrial=1 dib3000mc_write_word(state, 5, 1); dib3000mc_set_bandwidth(state, 8000); // div_lock_mask dib3000mc_write_word(state, 4, 0x814); dib3000mc_write_word(state, 21, (1 << 9) | 0x164); dib3000mc_write_word(state, 22, 0x463d); // Spurious rm cfg // P_cspu_regul, P_cspu_win_cut dib3000mc_write_word(state, 120, 0x200f); // P_adp_selec_monit dib3000mc_write_word(state, 134, 0); // Fec cfg dib3000mc_write_word(state, 195, 0x10); // diversity register: P_dvsy_sync_wait.. 
dib3000mc_write_word(state, 180, 0x2FF0); // Impulse noise configuration dib3000mc_set_impulse_noise(state, 0, TRANSMISSION_MODE_8K); // output mode set-up dib3000mc_set_output_mode(state, OUTMODE_HIGH_Z); /* close the i2c-gate */ dib3000mc_write_word(state, 769, (1 << 7) ); return 0; } static int dib3000mc_sleep(struct dvb_frontend *demod) { struct dib3000mc_state *state = demod->demodulator_priv; dib3000mc_write_word(state, 1031, 0xFFFF); dib3000mc_write_word(state, 1032, 0xFFFF); dib3000mc_write_word(state, 1033, 0xFFF0); return 0; } static void dib3000mc_set_adp_cfg(struct dib3000mc_state *state, s16 qam) { u16 cfg[4] = { 0 },reg; switch (qam) { case QPSK: cfg[0] = 0x099a; cfg[1] = 0x7fae; cfg[2] = 0x0333; cfg[3] = 0x7ff0; break; case QAM_16: cfg[0] = 0x023d; cfg[1] = 0x7fdf; cfg[2] = 0x00a4; cfg[3] = 0x7ff0; break; case QAM_64: cfg[0] = 0x0148; cfg[1] = 0x7ff0; cfg[2] = 0x00a4; cfg[3] = 0x7ff8; break; } for (reg = 129; reg < 133; reg++) dib3000mc_write_word(state, reg, cfg[reg - 129]); } static void dib3000mc_set_channel_cfg(struct dib3000mc_state *state, struct dtv_frontend_properties *ch, u16 seq) { u16 value; u32 bw = BANDWIDTH_TO_KHZ(ch->bandwidth_hz); dib3000mc_set_bandwidth(state, bw); dib3000mc_set_timing(state, ch->transmission_mode, bw, 0); #if 1 dib3000mc_write_word(state, 100, (16 << 6) + 9); #else if (boost) dib3000mc_write_word(state, 100, (11 << 6) + 6); else dib3000mc_write_word(state, 100, (16 << 6) + 9); #endif dib3000mc_write_word(state, 1027, 0x0800); dib3000mc_write_word(state, 1027, 0x0000); //Default cfg isi offset adp dib3000mc_write_word(state, 26, 0x6680); dib3000mc_write_word(state, 29, 0x1273); dib3000mc_write_word(state, 33, 5); dib3000mc_set_adp_cfg(state, QAM_16); dib3000mc_write_word(state, 133, 15564); dib3000mc_write_word(state, 12 , 0x0); dib3000mc_write_word(state, 13 , 0x3e8); dib3000mc_write_word(state, 14 , 0x0); dib3000mc_write_word(state, 15 , 0x3f2); dib3000mc_write_word(state, 93,0); dib3000mc_write_word(state, 94,0); dib3000mc_write_word(state, 95,0); dib3000mc_write_word(state, 96,0); dib3000mc_write_word(state, 97,0); dib3000mc_write_word(state, 98,0); dib3000mc_set_impulse_noise(state, 0, ch->transmission_mode); value = 0; switch (ch->transmission_mode) { case TRANSMISSION_MODE_2K: value |= (0 << 7); break; default: case TRANSMISSION_MODE_8K: value |= (1 << 7); break; } switch (ch->guard_interval) { case GUARD_INTERVAL_1_32: value |= (0 << 5); break; case GUARD_INTERVAL_1_16: value |= (1 << 5); break; case GUARD_INTERVAL_1_4: value |= (3 << 5); break; default: case GUARD_INTERVAL_1_8: value |= (2 << 5); break; } switch (ch->modulation) { case QPSK: value |= (0 << 3); break; case QAM_16: value |= (1 << 3); break; default: case QAM_64: value |= (2 << 3); break; } switch (HIERARCHY_1) { case HIERARCHY_2: value |= 2; break; case HIERARCHY_4: value |= 4; break; default: case HIERARCHY_1: value |= 1; break; } dib3000mc_write_word(state, 0, value); dib3000mc_write_word(state, 5, (1 << 8) | ((seq & 0xf) << 4)); value = 0; if (ch->hierarchy == 1) value |= (1 << 4); if (1 == 1) value |= 1; switch ((ch->hierarchy == 0 || 1 == 1) ? 
ch->code_rate_HP : ch->code_rate_LP) { case FEC_2_3: value |= (2 << 1); break; case FEC_3_4: value |= (3 << 1); break; case FEC_5_6: value |= (5 << 1); break; case FEC_7_8: value |= (7 << 1); break; default: case FEC_1_2: value |= (1 << 1); break; } dib3000mc_write_word(state, 181, value); // diversity synchro delay add 50% SFN margin switch (ch->transmission_mode) { case TRANSMISSION_MODE_8K: value = 256; break; case TRANSMISSION_MODE_2K: default: value = 64; break; } switch (ch->guard_interval) { case GUARD_INTERVAL_1_16: value *= 2; break; case GUARD_INTERVAL_1_8: value *= 4; break; case GUARD_INTERVAL_1_4: value *= 8; break; default: case GUARD_INTERVAL_1_32: value *= 1; break; } value <<= 4; value |= dib3000mc_read_word(state, 180) & 0x000f; dib3000mc_write_word(state, 180, value); // restart demod value = dib3000mc_read_word(state, 0); dib3000mc_write_word(state, 0, value | (1 << 9)); dib3000mc_write_word(state, 0, value); msleep(30); dib3000mc_set_impulse_noise(state, state->cfg->impulse_noise_mode, ch->transmission_mode); } static int dib3000mc_autosearch_start(struct dvb_frontend *demod) { struct dtv_frontend_properties *chan = &demod->dtv_property_cache; struct dib3000mc_state *state = demod->demodulator_priv; u16 reg; // u32 val; struct dtv_frontend_properties schan; schan = *chan; /* TODO what is that ? */ /* a channel for autosearch */ schan.transmission_mode = TRANSMISSION_MODE_8K; schan.guard_interval = GUARD_INTERVAL_1_32; schan.modulation = QAM_64; schan.code_rate_HP = FEC_2_3; schan.code_rate_LP = FEC_2_3; schan.hierarchy = 0; dib3000mc_set_channel_cfg(state, &schan, 11); reg = dib3000mc_read_word(state, 0); dib3000mc_write_word(state, 0, reg | (1 << 8)); dib3000mc_read_word(state, 511); dib3000mc_write_word(state, 0, reg); return 0; } static int dib3000mc_autosearch_is_irq(struct dvb_frontend *demod) { struct dib3000mc_state *state = demod->demodulator_priv; u16 irq_pending = dib3000mc_read_word(state, 511); if (irq_pending & 0x1) // failed return 1; if (irq_pending & 0x2) // succeeded return 2; return 0; // still pending } static int dib3000mc_tune(struct dvb_frontend *demod) { struct dtv_frontend_properties *ch = &demod->dtv_property_cache; struct dib3000mc_state *state = demod->demodulator_priv; // ** configure demod ** dib3000mc_set_channel_cfg(state, ch, 0); // activates isi if (state->sfn_workaround_active) { dprintk("SFN workaround is active\n"); dib3000mc_write_word(state, 29, 0x1273); dib3000mc_write_word(state, 108, 0x4000); // P_pha3_force_pha_shift } else { dib3000mc_write_word(state, 29, 0x1073); dib3000mc_write_word(state, 108, 0x0000); // P_pha3_force_pha_shift } dib3000mc_set_adp_cfg(state, (u8)ch->modulation); if (ch->transmission_mode == TRANSMISSION_MODE_8K) { dib3000mc_write_word(state, 26, 38528); dib3000mc_write_word(state, 33, 8); } else { dib3000mc_write_word(state, 26, 30336); dib3000mc_write_word(state, 33, 6); } if (dib3000mc_read_word(state, 509) & 0x80) dib3000mc_set_timing(state, ch->transmission_mode, BANDWIDTH_TO_KHZ(ch->bandwidth_hz), 1); return 0; } struct i2c_adapter * dib3000mc_get_tuner_i2c_master(struct dvb_frontend *demod, int gating) { struct dib3000mc_state *st = demod->demodulator_priv; return dibx000_get_i2c_adapter(&st->i2c_master, DIBX000_I2C_INTERFACE_TUNER, gating); } EXPORT_SYMBOL(dib3000mc_get_tuner_i2c_master); static int dib3000mc_get_frontend(struct dvb_frontend* fe, struct dtv_frontend_properties *fep) { struct dib3000mc_state *state = fe->demodulator_priv; u16 tps = dib3000mc_read_word(state,458); fep->inversion = 
INVERSION_AUTO; fep->bandwidth_hz = state->current_bandwidth; switch ((tps >> 8) & 0x1) { case 0: fep->transmission_mode = TRANSMISSION_MODE_2K; break; case 1: fep->transmission_mode = TRANSMISSION_MODE_8K; break; } switch (tps & 0x3) { case 0: fep->guard_interval = GUARD_INTERVAL_1_32; break; case 1: fep->guard_interval = GUARD_INTERVAL_1_16; break; case 2: fep->guard_interval = GUARD_INTERVAL_1_8; break; case 3: fep->guard_interval = GUARD_INTERVAL_1_4; break; } switch ((tps >> 13) & 0x3) { case 0: fep->modulation = QPSK; break; case 1: fep->modulation = QAM_16; break; case 2: default: fep->modulation = QAM_64; break; } /* as long as the frontend_param structure is fixed for hierarchical transmission I refuse to use it */ /* (tps >> 12) & 0x1 == hrch is used, (tps >> 9) & 0x7 == alpha */ fep->hierarchy = HIERARCHY_NONE; switch ((tps >> 5) & 0x7) { case 1: fep->code_rate_HP = FEC_1_2; break; case 2: fep->code_rate_HP = FEC_2_3; break; case 3: fep->code_rate_HP = FEC_3_4; break; case 5: fep->code_rate_HP = FEC_5_6; break; case 7: default: fep->code_rate_HP = FEC_7_8; break; } switch ((tps >> 2) & 0x7) { case 1: fep->code_rate_LP = FEC_1_2; break; case 2: fep->code_rate_LP = FEC_2_3; break; case 3: fep->code_rate_LP = FEC_3_4; break; case 5: fep->code_rate_LP = FEC_5_6; break; case 7: default: fep->code_rate_LP = FEC_7_8; break; } return 0; } static int dib3000mc_set_frontend(struct dvb_frontend *fe) { struct dtv_frontend_properties *fep = &fe->dtv_property_cache; struct dib3000mc_state *state = fe->demodulator_priv; int ret; dib3000mc_set_output_mode(state, OUTMODE_HIGH_Z); state->current_bandwidth = fep->bandwidth_hz; dib3000mc_set_bandwidth(state, BANDWIDTH_TO_KHZ(fep->bandwidth_hz)); /* maybe the parameter has been changed */ state->sfn_workaround_active = buggy_sfn_workaround; if (fe->ops.tuner_ops.set_params) { fe->ops.tuner_ops.set_params(fe); msleep(100); } if (fep->transmission_mode == TRANSMISSION_MODE_AUTO || fep->guard_interval == GUARD_INTERVAL_AUTO || fep->modulation == QAM_AUTO || fep->code_rate_HP == FEC_AUTO) { int i = 1000, found; dib3000mc_autosearch_start(fe); do { msleep(1); found = dib3000mc_autosearch_is_irq(fe); } while (found == 0 && i--); dprintk("autosearch returns: %d\n",found); if (found == 0 || found == 1) return 0; // no channel found dib3000mc_get_frontend(fe, fep); } ret = dib3000mc_tune(fe); /* make this a config parameter */ dib3000mc_set_output_mode(state, OUTMODE_MPEG2_FIFO); return ret; } static int dib3000mc_read_status(struct dvb_frontend *fe, enum fe_status *stat) { struct dib3000mc_state *state = fe->demodulator_priv; u16 lock = dib3000mc_read_word(state, 509); *stat = 0; if (lock & 0x8000) *stat |= FE_HAS_SIGNAL; if (lock & 0x3000) *stat |= FE_HAS_CARRIER; if (lock & 0x0100) *stat |= FE_HAS_VITERBI; if (lock & 0x0010) *stat |= FE_HAS_SYNC; if (lock & 0x0008) *stat |= FE_HAS_LOCK; return 0; } static int dib3000mc_read_ber(struct dvb_frontend *fe, u32 *ber) { struct dib3000mc_state *state = fe->demodulator_priv; *ber = (dib3000mc_read_word(state, 500) << 16) | dib3000mc_read_word(state, 501); return 0; } static int dib3000mc_read_unc_blocks(struct dvb_frontend *fe, u32 *unc) { struct dib3000mc_state *state = fe->demodulator_priv; *unc = dib3000mc_read_word(state, 508); return 0; } static int dib3000mc_read_signal_strength(struct dvb_frontend *fe, u16 *strength) { struct dib3000mc_state *state = fe->demodulator_priv; u16 val = dib3000mc_read_word(state, 392); *strength = 65535 - val; return 0; } static int dib3000mc_read_snr(struct dvb_frontend* fe, 
u16 *snr) { *snr = 0x0000; return 0; } static int dib3000mc_fe_get_tune_settings(struct dvb_frontend* fe, struct dvb_frontend_tune_settings *tune) { tune->min_delay_ms = 1000; return 0; } static void dib3000mc_release(struct dvb_frontend *fe) { struct dib3000mc_state *state = fe->demodulator_priv; dibx000_exit_i2c_master(&state->i2c_master); kfree(state); } int dib3000mc_pid_control(struct dvb_frontend *fe, int index, int pid,int onoff) { struct dib3000mc_state *state = fe->demodulator_priv; dib3000mc_write_word(state, 212 + index, onoff ? (1 << 13) | pid : 0); return 0; } EXPORT_SYMBOL(dib3000mc_pid_control); int dib3000mc_pid_parse(struct dvb_frontend *fe, int onoff) { struct dib3000mc_state *state = fe->demodulator_priv; u16 tmp = dib3000mc_read_word(state, 206) & ~(1 << 4); tmp |= (onoff << 4); return dib3000mc_write_word(state, 206, tmp); } EXPORT_SYMBOL(dib3000mc_pid_parse); void dib3000mc_set_config(struct dvb_frontend *fe, struct dib3000mc_config *cfg) { struct dib3000mc_state *state = fe->demodulator_priv; state->cfg = cfg; } EXPORT_SYMBOL(dib3000mc_set_config); int dib3000mc_i2c_enumeration(struct i2c_adapter *i2c, int no_of_demods, u8 default_addr, struct dib3000mc_config cfg[]) { struct dib3000mc_state *dmcst; int k; u8 new_addr; static const u8 DIB3000MC_I2C_ADDRESS[] = { 20, 22, 24, 26 }; dmcst = kzalloc(sizeof(struct dib3000mc_state), GFP_KERNEL); if (dmcst == NULL) return -ENOMEM; dmcst->i2c_adap = i2c; for (k = no_of_demods-1; k >= 0; k--) { dmcst->cfg = &cfg[k]; /* designated i2c address */ new_addr = DIB3000MC_I2C_ADDRESS[k]; dmcst->i2c_addr = new_addr; if (dib3000mc_identify(dmcst) != 0) { dmcst->i2c_addr = default_addr; if (dib3000mc_identify(dmcst) != 0) { dprintk("-E- DiB3000P/MC #%d: not identified\n", k); kfree(dmcst); return -ENODEV; } } dib3000mc_set_output_mode(dmcst, OUTMODE_MPEG2_PAR_CONT_CLK); // set new i2c address and force divstr (Bit 1) to value 0 (Bit 0) dib3000mc_write_word(dmcst, 1024, (new_addr << 3) | 0x1); dmcst->i2c_addr = new_addr; } for (k = 0; k < no_of_demods; k++) { dmcst->cfg = &cfg[k]; dmcst->i2c_addr = DIB3000MC_I2C_ADDRESS[k]; dib3000mc_write_word(dmcst, 1024, dmcst->i2c_addr << 3); /* turn off data output */ dib3000mc_set_output_mode(dmcst, OUTMODE_HIGH_Z); } kfree(dmcst); return 0; } EXPORT_SYMBOL(dib3000mc_i2c_enumeration); static const struct dvb_frontend_ops dib3000mc_ops; struct dvb_frontend * dib3000mc_attach(struct i2c_adapter *i2c_adap, u8 i2c_addr, struct dib3000mc_config *cfg) { struct dvb_frontend *demod; struct dib3000mc_state *st; st = kzalloc(sizeof(struct dib3000mc_state), GFP_KERNEL); if (st == NULL) return NULL; st->cfg = cfg; st->i2c_adap = i2c_adap; st->i2c_addr = i2c_addr; demod = &st->demod; demod->demodulator_priv = st; memcpy(&st->demod.ops, &dib3000mc_ops, sizeof(struct dvb_frontend_ops)); if (dib3000mc_identify(st) != 0) goto error; dibx000_init_i2c_master(&st->i2c_master, DIB3000MC, st->i2c_adap, st->i2c_addr); dib3000mc_write_word(st, 1037, 0x3130); return demod; error: kfree(st); return NULL; } EXPORT_SYMBOL_GPL(dib3000mc_attach); static const struct dvb_frontend_ops dib3000mc_ops = { .delsys = { SYS_DVBT }, .info = { .name = "DiBcom 3000MC/P", .frequency_min_hz = 44250 * kHz, .frequency_max_hz = 867250 * kHz, .frequency_stepsize_hz = 62500, .caps = FE_CAN_INVERSION_AUTO | FE_CAN_FEC_1_2 | FE_CAN_FEC_2_3 | FE_CAN_FEC_3_4 | FE_CAN_FEC_5_6 | FE_CAN_FEC_7_8 | FE_CAN_FEC_AUTO | FE_CAN_QPSK | FE_CAN_QAM_16 | FE_CAN_QAM_64 | FE_CAN_QAM_AUTO | FE_CAN_TRANSMISSION_MODE_AUTO | FE_CAN_GUARD_INTERVAL_AUTO | FE_CAN_RECOVER 
| FE_CAN_HIERARCHY_AUTO, }, .release = dib3000mc_release, .init = dib3000mc_init, .sleep = dib3000mc_sleep, .set_frontend = dib3000mc_set_frontend, .get_tune_settings = dib3000mc_fe_get_tune_settings, .get_frontend = dib3000mc_get_frontend, .read_status = dib3000mc_read_status, .read_ber = dib3000mc_read_ber, .read_signal_strength = dib3000mc_read_signal_strength, .read_snr = dib3000mc_read_snr, .read_ucblocks = dib3000mc_read_unc_blocks, }; MODULE_AUTHOR("Patrick Boettcher <patrick.boettcher@posteo.de>"); MODULE_DESCRIPTION("Driver for the DiBcom 3000MC/P COFDM demodulator"); MODULE_LICENSE("GPL"); |
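/*
 * Illustrative usage sketch, not part of the original driver: roughly how a
 * bridge/board driver might pick up this demodulator and drive the exported
 * PID-filter helpers.  "example_cfg", the I2C address and the PID value are
 * placeholders (board specific), and the header paths may differ between
 * kernel versions.
 */
#include <linux/i2c.h>
#include <media/dvb_frontend.h>
#include "dib3000mc.h"

static struct dib3000mc_config example_cfg;	/* all-defaults placeholder config */

static struct dvb_frontend *example_attach_demod(struct i2c_adapter *i2c)
{
	/* dvb_attach() resolves the attach symbol and takes a module reference */
	struct dvb_frontend *fe =
		dvb_attach(dib3000mc_attach, i2c,
			   20 /* example address; real value is board specific */,
			   &example_cfg);

	if (!fe)
		return NULL;

	/* enable PID parsing and open one filter slot for an example PID */
	dib3000mc_pid_parse(fe, 1);
	dib3000mc_pid_control(fe, 0, 0x100, 1);
	return fe;
}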
| 6 7 6 6 6 7 7 8 8 5 2 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 | // SPDX-License-Identifier: GPL-2.0-or-later /* Kernel cryptographic api. * cast6.c - Cast6 cipher algorithm [rfc2612]. * * CAST-256 (*cast6*) is a DES like Substitution-Permutation Network (SPN) * cryptosystem built upon the CAST-128 (*cast5*) [rfc2144] encryption * algorithm. * * Copyright (C) 2003 Kartikey Mahendra Bhatt <kartik_me@hotmail.com>. */ #include <linux/unaligned.h> #include <crypto/algapi.h> #include <linux/init.h> #include <linux/module.h> #include <linux/errno.h> #include <linux/string.h> #include <linux/types.h> #include <crypto/cast6.h> #define s1 cast_s1 #define s2 cast_s2 #define s3 cast_s3 #define s4 cast_s4 #define F1(D, r, m) ((I = ((m) + (D))), (I = rol32(I, (r))), \ (((s1[I >> 24] ^ s2[(I>>16)&0xff]) - s3[(I>>8)&0xff]) + s4[I&0xff])) #define F2(D, r, m) ((I = ((m) ^ (D))), (I = rol32(I, (r))), \ (((s1[I >> 24] - s2[(I>>16)&0xff]) + s3[(I>>8)&0xff]) ^ s4[I&0xff])) #define F3(D, r, m) ((I = ((m) - (D))), (I = rol32(I, (r))), \ (((s1[I >> 24] + s2[(I>>16)&0xff]) ^ s3[(I>>8)&0xff]) - s4[I&0xff])) static const u32 Tm[24][8] = { { 0x5a827999, 0xc95c653a, 0x383650db, 0xa7103c7c, 0x15ea281d, 0x84c413be, 0xf39dff5f, 0x6277eb00 } , { 0xd151d6a1, 0x402bc242, 0xaf05ade3, 0x1ddf9984, 0x8cb98525, 0xfb9370c6, 0x6a6d5c67, 0xd9474808 } , { 0x482133a9, 0xb6fb1f4a, 0x25d50aeb, 0x94aef68c, 0x0388e22d, 0x7262cdce, 0xe13cb96f, 0x5016a510 } , { 0xbef090b1, 0x2dca7c52, 0x9ca467f3, 0x0b7e5394, 0x7a583f35, 0xe9322ad6, 0x580c1677, 0xc6e60218 } , { 0x35bfedb9, 0xa499d95a, 0x1373c4fb, 0x824db09c, 0xf1279c3d, 0x600187de, 0xcedb737f, 0x3db55f20 } , { 0xac8f4ac1, 0x1b693662, 0x8a432203, 0xf91d0da4, 0x67f6f945, 0xd6d0e4e6, 0x45aad087, 0xb484bc28 } , { 0x235ea7c9, 0x9238936a, 0x01127f0b, 0x6fec6aac, 0xdec6564d, 0x4da041ee, 0xbc7a2d8f, 0x2b541930 } , { 0x9a2e04d1, 0x0907f072, 0x77e1dc13, 0xe6bbc7b4, 0x5595b355, 0xc46f9ef6, 0x33498a97, 0xa2237638 } , { 0x10fd61d9, 0x7fd74d7a, 0xeeb1391b, 0x5d8b24bc, 0xcc65105d, 0x3b3efbfe, 0xaa18e79f, 0x18f2d340 } , { 0x87ccbee1, 0xf6a6aa82, 0x65809623, 0xd45a81c4, 0x43346d65, 0xb20e5906, 0x20e844a7, 0x8fc23048 } , { 0xfe9c1be9, 0x6d76078a, 0xdc4ff32b, 0x4b29decc, 0xba03ca6d, 0x28ddb60e, 0x97b7a1af, 0x06918d50 } , { 0x756b78f1, 0xe4456492, 0x531f5033, 0xc1f93bd4, 0x30d32775, 0x9fad1316, 0x0e86feb7, 0x7d60ea58 } , { 0xec3ad5f9, 0x5b14c19a, 0xc9eead3b, 0x38c898dc, 0xa7a2847d, 0x167c701e, 0x85565bbf, 0xf4304760 } , { 0x630a3301, 0xd1e41ea2, 0x40be0a43, 0xaf97f5e4, 0x1e71e185, 0x8d4bcd26, 0xfc25b8c7, 0x6affa468 } , { 0xd9d99009, 0x48b37baa, 
0xb78d674b, 0x266752ec, 0x95413e8d, 0x041b2a2e, 0x72f515cf, 0xe1cf0170 } , { 0x50a8ed11, 0xbf82d8b2, 0x2e5cc453, 0x9d36aff4, 0x0c109b95, 0x7aea8736, 0xe9c472d7, 0x589e5e78 } , { 0xc7784a19, 0x365235ba, 0xa52c215b, 0x14060cfc, 0x82dff89d, 0xf1b9e43e, 0x6093cfdf, 0xcf6dbb80 } , { 0x3e47a721, 0xad2192c2, 0x1bfb7e63, 0x8ad56a04, 0xf9af55a5, 0x68894146, 0xd7632ce7, 0x463d1888 } , { 0xb5170429, 0x23f0efca, 0x92cadb6b, 0x01a4c70c, 0x707eb2ad, 0xdf589e4e, 0x4e3289ef, 0xbd0c7590 } , { 0x2be66131, 0x9ac04cd2, 0x099a3873, 0x78742414, 0xe74e0fb5, 0x5627fb56, 0xc501e6f7, 0x33dbd298 } , { 0xa2b5be39, 0x118fa9da, 0x8069957b, 0xef43811c, 0x5e1d6cbd, 0xccf7585e, 0x3bd143ff, 0xaaab2fa0 } , { 0x19851b41, 0x885f06e2, 0xf738f283, 0x6612de24, 0xd4ecc9c5, 0x43c6b566, 0xb2a0a107, 0x217a8ca8 } , { 0x90547849, 0xff2e63ea, 0x6e084f8b, 0xdce23b2c, 0x4bbc26cd, 0xba96126e, 0x296ffe0f, 0x9849e9b0 } , { 0x0723d551, 0x75fdc0f2, 0xe4d7ac93, 0x53b19834, 0xc28b83d5, 0x31656f76, 0xa03f5b17, 0x0f1946b8 } }; static const u8 Tr[4][8] = { { 0x13, 0x04, 0x15, 0x06, 0x17, 0x08, 0x19, 0x0a } , { 0x1b, 0x0c, 0x1d, 0x0e, 0x1f, 0x10, 0x01, 0x12 } , { 0x03, 0x14, 0x05, 0x16, 0x07, 0x18, 0x09, 0x1a } , { 0x0b, 0x1c, 0x0d, 0x1e, 0x0f, 0x00, 0x11, 0x02 } }; /* forward octave */ static inline void W(u32 *key, unsigned int i) { u32 I; key[6] ^= F1(key[7], Tr[i % 4][0], Tm[i][0]); key[5] ^= F2(key[6], Tr[i % 4][1], Tm[i][1]); key[4] ^= F3(key[5], Tr[i % 4][2], Tm[i][2]); key[3] ^= F1(key[4], Tr[i % 4][3], Tm[i][3]); key[2] ^= F2(key[3], Tr[i % 4][4], Tm[i][4]); key[1] ^= F3(key[2], Tr[i % 4][5], Tm[i][5]); key[0] ^= F1(key[1], Tr[i % 4][6], Tm[i][6]); key[7] ^= F2(key[0], Tr[i % 4][7], Tm[i][7]); } int __cast6_setkey(struct cast6_ctx *c, const u8 *in_key, unsigned int key_len) { int i; u32 key[8]; __be32 p_key[8]; /* padded key */ if (key_len % 4 != 0) return -EINVAL; memset(p_key, 0, 32); memcpy(p_key, in_key, key_len); key[0] = be32_to_cpu(p_key[0]); /* A */ key[1] = be32_to_cpu(p_key[1]); /* B */ key[2] = be32_to_cpu(p_key[2]); /* C */ key[3] = be32_to_cpu(p_key[3]); /* D */ key[4] = be32_to_cpu(p_key[4]); /* E */ key[5] = be32_to_cpu(p_key[5]); /* F */ key[6] = be32_to_cpu(p_key[6]); /* G */ key[7] = be32_to_cpu(p_key[7]); /* H */ for (i = 0; i < 12; i++) { W(key, 2 * i); W(key, 2 * i + 1); c->Kr[i][0] = key[0] & 0x1f; c->Kr[i][1] = key[2] & 0x1f; c->Kr[i][2] = key[4] & 0x1f; c->Kr[i][3] = key[6] & 0x1f; c->Km[i][0] = key[7]; c->Km[i][1] = key[5]; c->Km[i][2] = key[3]; c->Km[i][3] = key[1]; } return 0; } EXPORT_SYMBOL_GPL(__cast6_setkey); int cast6_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen) { return __cast6_setkey(crypto_tfm_ctx(tfm), key, keylen); } EXPORT_SYMBOL_GPL(cast6_setkey); /*forward quad round*/ static inline void Q(u32 *block, const u8 *Kr, const u32 *Km) { u32 I; block[2] ^= F1(block[3], Kr[0], Km[0]); block[1] ^= F2(block[2], Kr[1], Km[1]); block[0] ^= F3(block[1], Kr[2], Km[2]); block[3] ^= F1(block[0], Kr[3], Km[3]); } /*reverse quad round*/ static inline void QBAR(u32 *block, const u8 *Kr, const u32 *Km) { u32 I; block[3] ^= F1(block[0], Kr[3], Km[3]); block[0] ^= F3(block[1], Kr[2], Km[2]); block[1] ^= F2(block[2], Kr[1], Km[1]); block[2] ^= F1(block[3], Kr[0], Km[0]); } void __cast6_encrypt(const void *ctx, u8 *outbuf, const u8 *inbuf) { const struct cast6_ctx *c = ctx; u32 block[4]; const u32 *Km; const u8 *Kr; block[0] = get_unaligned_be32(inbuf); block[1] = get_unaligned_be32(inbuf + 4); block[2] = get_unaligned_be32(inbuf + 8); block[3] = get_unaligned_be32(inbuf + 12); Km = c->Km[0]; Kr = 
c->Kr[0]; Q(block, Kr, Km); Km = c->Km[1]; Kr = c->Kr[1]; Q(block, Kr, Km); Km = c->Km[2]; Kr = c->Kr[2]; Q(block, Kr, Km); Km = c->Km[3]; Kr = c->Kr[3]; Q(block, Kr, Km); Km = c->Km[4]; Kr = c->Kr[4]; Q(block, Kr, Km); Km = c->Km[5]; Kr = c->Kr[5]; Q(block, Kr, Km); Km = c->Km[6]; Kr = c->Kr[6]; QBAR(block, Kr, Km); Km = c->Km[7]; Kr = c->Kr[7]; QBAR(block, Kr, Km); Km = c->Km[8]; Kr = c->Kr[8]; QBAR(block, Kr, Km); Km = c->Km[9]; Kr = c->Kr[9]; QBAR(block, Kr, Km); Km = c->Km[10]; Kr = c->Kr[10]; QBAR(block, Kr, Km); Km = c->Km[11]; Kr = c->Kr[11]; QBAR(block, Kr, Km); put_unaligned_be32(block[0], outbuf); put_unaligned_be32(block[1], outbuf + 4); put_unaligned_be32(block[2], outbuf + 8); put_unaligned_be32(block[3], outbuf + 12); } EXPORT_SYMBOL_GPL(__cast6_encrypt); static void cast6_encrypt(struct crypto_tfm *tfm, u8 *outbuf, const u8 *inbuf) { __cast6_encrypt(crypto_tfm_ctx(tfm), outbuf, inbuf); } void __cast6_decrypt(const void *ctx, u8 *outbuf, const u8 *inbuf) { const struct cast6_ctx *c = ctx; u32 block[4]; const u32 *Km; const u8 *Kr; block[0] = get_unaligned_be32(inbuf); block[1] = get_unaligned_be32(inbuf + 4); block[2] = get_unaligned_be32(inbuf + 8); block[3] = get_unaligned_be32(inbuf + 12); Km = c->Km[11]; Kr = c->Kr[11]; Q(block, Kr, Km); Km = c->Km[10]; Kr = c->Kr[10]; Q(block, Kr, Km); Km = c->Km[9]; Kr = c->Kr[9]; Q(block, Kr, Km); Km = c->Km[8]; Kr = c->Kr[8]; Q(block, Kr, Km); Km = c->Km[7]; Kr = c->Kr[7]; Q(block, Kr, Km); Km = c->Km[6]; Kr = c->Kr[6]; Q(block, Kr, Km); Km = c->Km[5]; Kr = c->Kr[5]; QBAR(block, Kr, Km); Km = c->Km[4]; Kr = c->Kr[4]; QBAR(block, Kr, Km); Km = c->Km[3]; Kr = c->Kr[3]; QBAR(block, Kr, Km); Km = c->Km[2]; Kr = c->Kr[2]; QBAR(block, Kr, Km); Km = c->Km[1]; Kr = c->Kr[1]; QBAR(block, Kr, Km); Km = c->Km[0]; Kr = c->Kr[0]; QBAR(block, Kr, Km); put_unaligned_be32(block[0], outbuf); put_unaligned_be32(block[1], outbuf + 4); put_unaligned_be32(block[2], outbuf + 8); put_unaligned_be32(block[3], outbuf + 12); } EXPORT_SYMBOL_GPL(__cast6_decrypt); static void cast6_decrypt(struct crypto_tfm *tfm, u8 *outbuf, const u8 *inbuf) { __cast6_decrypt(crypto_tfm_ctx(tfm), outbuf, inbuf); } static struct crypto_alg alg = { .cra_name = "cast6", .cra_driver_name = "cast6-generic", .cra_priority = 100, .cra_flags = CRYPTO_ALG_TYPE_CIPHER, .cra_blocksize = CAST6_BLOCK_SIZE, .cra_ctxsize = sizeof(struct cast6_ctx), .cra_module = THIS_MODULE, .cra_u = { .cipher = { .cia_min_keysize = CAST6_MIN_KEY_SIZE, .cia_max_keysize = CAST6_MAX_KEY_SIZE, .cia_setkey = cast6_setkey, .cia_encrypt = cast6_encrypt, .cia_decrypt = cast6_decrypt} } }; static int __init cast6_mod_init(void) { return crypto_register_alg(&alg); } static void __exit cast6_mod_fini(void) { crypto_unregister_alg(&alg); } module_init(cast6_mod_init); module_exit(cast6_mod_fini); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Cast6 Cipher Algorithm"); MODULE_ALIAS_CRYPTO("cast6"); MODULE_ALIAS_CRYPTO("cast6-generic"); |
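/*
 * Illustrative usage sketch, not part of the original file: driving the
 * exported library interface (__cast6_setkey / __cast6_encrypt /
 * __cast6_decrypt) directly, e.g. from a throwaway self-test.  The key and
 * plaintext bytes are arbitrary example values, not RFC 2612 test vectors.
 */
#include <linux/errno.h>
#include <linux/string.h>
#include <crypto/cast6.h>

static int cast6_roundtrip_example(void)
{
	struct cast6_ctx ctx;
	u8 key[CAST6_MAX_KEY_SIZE] = { 0x01, 0x23, 0x45, 0x67 };	/* rest zero */
	u8 pt[CAST6_BLOCK_SIZE]   = { 0xde, 0xad, 0xbe, 0xef };	/* rest zero */
	u8 ct[CAST6_BLOCK_SIZE], back[CAST6_BLOCK_SIZE];
	int err;

	/* key length must be a multiple of 4 bytes; here the full 32 bytes */
	err = __cast6_setkey(&ctx, key, sizeof(key));
	if (err)
		return err;

	__cast6_encrypt(&ctx, ct, pt);		/* one 16-byte block */
	__cast6_decrypt(&ctx, back, ct);	/* must reproduce pt */

	return memcmp(pt, back, CAST6_BLOCK_SIZE) ? -EINVAL : 0;
}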
2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Abstract layer for MIDI v1.0 stream * Copyright (c) by Jaroslav Kysela <perex@perex.cz> */ #include <sound/core.h> #include <linux/major.h> #include <linux/init.h> #include <linux/sched/signal.h> #include <linux/slab.h> #include <linux/time.h> #include <linux/wait.h> #include <linux/mutex.h> #include <linux/module.h> #include <linux/delay.h> #include <linux/mm.h> #include <linux/nospec.h> #include <sound/rawmidi.h> #include <sound/info.h> #include <sound/control.h> #include <sound/minors.h> #include <sound/initval.h> #include <sound/ump.h> MODULE_AUTHOR("Jaroslav Kysela <perex@perex.cz>"); MODULE_DESCRIPTION("Midlevel RawMidi code for ALSA."); MODULE_LICENSE("GPL"); #ifdef CONFIG_SND_OSSEMUL static int midi_map[SNDRV_CARDS]; static int amidi_map[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS-1)] = 1}; module_param_array(midi_map, int, NULL, 0444); MODULE_PARM_DESC(midi_map, "Raw MIDI device number assigned to 1st OSS device."); module_param_array(amidi_map, int, NULL, 0444); MODULE_PARM_DESC(amidi_map, "Raw MIDI device number assigned to 2nd OSS device."); #endif /* CONFIG_SND_OSSEMUL */ static int snd_rawmidi_dev_free(struct snd_device *device); static int snd_rawmidi_dev_register(struct snd_device *device); static int snd_rawmidi_dev_disconnect(struct snd_device *device); static LIST_HEAD(snd_rawmidi_devices); static DEFINE_MUTEX(register_mutex); #define rmidi_err(rmidi, fmt, args...) \ dev_err((rmidi)->dev, fmt, ##args) #define rmidi_warn(rmidi, fmt, args...) \ dev_warn((rmidi)->dev, fmt, ##args) #define rmidi_dbg(rmidi, fmt, args...) \ dev_dbg((rmidi)->dev, fmt, ##args) struct snd_rawmidi_status32 { s32 stream; s32 tstamp_sec; /* Timestamp */ s32 tstamp_nsec; u32 avail; /* available bytes */ u32 xruns; /* count of overruns since last status (in bytes) */ unsigned char reserved[16]; /* reserved for future use */ }; #define SNDRV_RAWMIDI_IOCTL_STATUS32 _IOWR('W', 0x20, struct snd_rawmidi_status32) struct snd_rawmidi_status64 { int stream; u8 rsvd[4]; /* alignment */ s64 tstamp_sec; /* Timestamp */ s64 tstamp_nsec; size_t avail; /* available bytes */ size_t xruns; /* count of overruns since last status (in bytes) */ unsigned char reserved[16]; /* reserved for future use */ }; #define SNDRV_RAWMIDI_IOCTL_STATUS64 _IOWR('W', 0x20, struct snd_rawmidi_status64) #define rawmidi_is_ump(rmidi) \ (IS_ENABLED(CONFIG_SND_UMP) && ((rmidi)->info_flags & SNDRV_RAWMIDI_INFO_UMP)) static struct snd_rawmidi *snd_rawmidi_search(struct snd_card *card, int device) { struct snd_rawmidi *rawmidi; list_for_each_entry(rawmidi, &snd_rawmidi_devices, list) if (rawmidi->card == card && rawmidi->device == device) return rawmidi; return NULL; } static inline unsigned short snd_rawmidi_file_flags(struct file *file) { switch (file->f_mode & (FMODE_READ | FMODE_WRITE)) { case FMODE_WRITE: return SNDRV_RAWMIDI_LFLG_OUTPUT; case FMODE_READ: return SNDRV_RAWMIDI_LFLG_INPUT; default: return SNDRV_RAWMIDI_LFLG_OPEN; } } static inline bool __snd_rawmidi_ready(struct snd_rawmidi_runtime *runtime) { return runtime->avail >= runtime->avail_min; } static bool snd_rawmidi_ready(struct snd_rawmidi_substream *substream) { guard(spinlock_irqsave)(&substream->lock); return __snd_rawmidi_ready(substream->runtime); } static inline int snd_rawmidi_ready_append(struct snd_rawmidi_substream *substream, size_t count) { struct snd_rawmidi_runtime *runtime = substream->runtime; return runtime->avail >= runtime->avail_min && 
(!substream->append || runtime->avail >= count); } static void snd_rawmidi_input_event_work(struct work_struct *work) { struct snd_rawmidi_runtime *runtime = container_of(work, struct snd_rawmidi_runtime, event_work); if (runtime->event) runtime->event(runtime->substream); } /* buffer refcount management: call with substream->lock held */ static inline void snd_rawmidi_buffer_ref(struct snd_rawmidi_runtime *runtime) { runtime->buffer_ref++; } static inline void snd_rawmidi_buffer_unref(struct snd_rawmidi_runtime *runtime) { runtime->buffer_ref--; } static void snd_rawmidi_buffer_ref_sync(struct snd_rawmidi_substream *substream) { int loop = HZ; spin_lock_irq(&substream->lock); while (substream->runtime->buffer_ref) { spin_unlock_irq(&substream->lock); if (!--loop) { rmidi_err(substream->rmidi, "Buffer ref sync timeout\n"); return; } schedule_timeout_uninterruptible(1); spin_lock_irq(&substream->lock); } spin_unlock_irq(&substream->lock); } static int snd_rawmidi_runtime_create(struct snd_rawmidi_substream *substream) { struct snd_rawmidi_runtime *runtime; runtime = kzalloc(sizeof(*runtime), GFP_KERNEL); if (!runtime) return -ENOMEM; runtime->substream = substream; init_waitqueue_head(&runtime->sleep); INIT_WORK(&runtime->event_work, snd_rawmidi_input_event_work); runtime->event = NULL; runtime->buffer_size = PAGE_SIZE; runtime->avail_min = 1; if (substream->stream == SNDRV_RAWMIDI_STREAM_INPUT) runtime->avail = 0; else runtime->avail = runtime->buffer_size; runtime->buffer = kvzalloc(runtime->buffer_size, GFP_KERNEL); if (!runtime->buffer) { kfree(runtime); return -ENOMEM; } runtime->appl_ptr = runtime->hw_ptr = 0; substream->runtime = runtime; if (rawmidi_is_ump(substream->rmidi)) runtime->align = 3; return 0; } /* get the current alignment (either 0 or 3) */ static inline int get_align(struct snd_rawmidi_runtime *runtime) { if (IS_ENABLED(CONFIG_SND_UMP)) return runtime->align; else return 0; } /* get the trimmed size with the current alignment */ #define get_aligned_size(runtime, size) ((size) & ~get_align(runtime)) static int snd_rawmidi_runtime_free(struct snd_rawmidi_substream *substream) { struct snd_rawmidi_runtime *runtime = substream->runtime; kvfree(runtime->buffer); kfree(runtime); substream->runtime = NULL; return 0; } static inline void snd_rawmidi_output_trigger(struct snd_rawmidi_substream *substream, int up) { if (!substream->opened) return; substream->ops->trigger(substream, up); } static void snd_rawmidi_input_trigger(struct snd_rawmidi_substream *substream, int up) { if (!substream->opened) return; substream->ops->trigger(substream, up); if (!up) cancel_work_sync(&substream->runtime->event_work); } static void __reset_runtime_ptrs(struct snd_rawmidi_runtime *runtime, bool is_input) { runtime->drain = 0; runtime->appl_ptr = runtime->hw_ptr = 0; runtime->avail = is_input ? 
0 : runtime->buffer_size; } static void reset_runtime_ptrs(struct snd_rawmidi_substream *substream, bool is_input) { guard(spinlock_irqsave)(&substream->lock); if (substream->opened && substream->runtime) __reset_runtime_ptrs(substream->runtime, is_input); } int snd_rawmidi_drop_output(struct snd_rawmidi_substream *substream) { snd_rawmidi_output_trigger(substream, 0); reset_runtime_ptrs(substream, false); return 0; } EXPORT_SYMBOL(snd_rawmidi_drop_output); int snd_rawmidi_drain_output(struct snd_rawmidi_substream *substream) { int err = 0; long timeout; struct snd_rawmidi_runtime *runtime; scoped_guard(spinlock_irq, &substream->lock) { runtime = substream->runtime; if (!substream->opened || !runtime || !runtime->buffer) return -EINVAL; snd_rawmidi_buffer_ref(runtime); runtime->drain = 1; } timeout = wait_event_interruptible_timeout(runtime->sleep, (runtime->avail >= runtime->buffer_size), 10*HZ); scoped_guard(spinlock_irq, &substream->lock) { if (signal_pending(current)) err = -ERESTARTSYS; if (runtime->avail < runtime->buffer_size && !timeout) { rmidi_warn(substream->rmidi, "rawmidi drain error (avail = %li, buffer_size = %li)\n", (long)runtime->avail, (long)runtime->buffer_size); err = -EIO; } runtime->drain = 0; } if (err != -ERESTARTSYS) { /* we need wait a while to make sure that Tx FIFOs are empty */ if (substream->ops->drain) substream->ops->drain(substream); else msleep(50); snd_rawmidi_drop_output(substream); } scoped_guard(spinlock_irq, &substream->lock) snd_rawmidi_buffer_unref(runtime); return err; } EXPORT_SYMBOL(snd_rawmidi_drain_output); int snd_rawmidi_drain_input(struct snd_rawmidi_substream *substream) { snd_rawmidi_input_trigger(substream, 0); reset_runtime_ptrs(substream, true); return 0; } EXPORT_SYMBOL(snd_rawmidi_drain_input); /* look for an available substream for the given stream direction; * if a specific subdevice is given, try to assign it */ static int assign_substream(struct snd_rawmidi *rmidi, int subdevice, int stream, int mode, struct snd_rawmidi_substream **sub_ret) { struct snd_rawmidi_substream *substream; struct snd_rawmidi_str *s = &rmidi->streams[stream]; static const unsigned int info_flags[2] = { [SNDRV_RAWMIDI_STREAM_OUTPUT] = SNDRV_RAWMIDI_INFO_OUTPUT, [SNDRV_RAWMIDI_STREAM_INPUT] = SNDRV_RAWMIDI_INFO_INPUT, }; if (!(rmidi->info_flags & info_flags[stream])) return -ENXIO; if (subdevice >= 0 && subdevice >= s->substream_count) return -ENODEV; list_for_each_entry(substream, &s->substreams, list) { if (substream->opened) { if (stream == SNDRV_RAWMIDI_STREAM_INPUT || !(mode & SNDRV_RAWMIDI_LFLG_APPEND) || !substream->append) continue; } if (subdevice < 0 || subdevice == substream->number) { *sub_ret = substream; return 0; } } return -EAGAIN; } /* open and do ref-counting for the given substream */ static int open_substream(struct snd_rawmidi *rmidi, struct snd_rawmidi_substream *substream, int mode) { int err; if (substream->use_count == 0) { err = snd_rawmidi_runtime_create(substream); if (err < 0) return err; err = substream->ops->open(substream); if (err < 0) { snd_rawmidi_runtime_free(substream); return err; } guard(spinlock_irq)(&substream->lock); substream->opened = 1; substream->active_sensing = 0; if (mode & SNDRV_RAWMIDI_LFLG_APPEND) substream->append = 1; substream->pid = get_pid(task_pid(current)); rmidi->streams[substream->stream].substream_opened++; } substream->use_count++; return 0; } static void close_substream(struct snd_rawmidi *rmidi, struct snd_rawmidi_substream *substream, int cleanup); static int rawmidi_open_priv(struct 
snd_rawmidi *rmidi, int subdevice, int mode, struct snd_rawmidi_file *rfile) { struct snd_rawmidi_substream *sinput = NULL, *soutput = NULL; int err; rfile->input = rfile->output = NULL; if (mode & SNDRV_RAWMIDI_LFLG_INPUT) { err = assign_substream(rmidi, subdevice, SNDRV_RAWMIDI_STREAM_INPUT, mode, &sinput); if (err < 0) return err; } if (mode & SNDRV_RAWMIDI_LFLG_OUTPUT) { err = assign_substream(rmidi, subdevice, SNDRV_RAWMIDI_STREAM_OUTPUT, mode, &soutput); if (err < 0) return err; } if (sinput) { err = open_substream(rmidi, sinput, mode); if (err < 0) return err; } if (soutput) { err = open_substream(rmidi, soutput, mode); if (err < 0) { if (sinput) close_substream(rmidi, sinput, 0); return err; } } rfile->rmidi = rmidi; rfile->input = sinput; rfile->output = soutput; return 0; } /* called from sound/core/seq/seq_midi.c */ int snd_rawmidi_kernel_open(struct snd_rawmidi *rmidi, int subdevice, int mode, struct snd_rawmidi_file *rfile) { int err; if (snd_BUG_ON(!rfile)) return -EINVAL; if (!try_module_get(rmidi->card->module)) return -ENXIO; guard(mutex)(&rmidi->open_mutex); err = rawmidi_open_priv(rmidi, subdevice, mode, rfile); if (err < 0) module_put(rmidi->card->module); return err; } EXPORT_SYMBOL(snd_rawmidi_kernel_open); static int snd_rawmidi_open(struct inode *inode, struct file *file) { int maj = imajor(inode); struct snd_card *card; int subdevice; unsigned short fflags; int err; struct snd_rawmidi *rmidi; struct snd_rawmidi_file *rawmidi_file = NULL; wait_queue_entry_t wait; if ((file->f_flags & O_APPEND) && !(file->f_flags & O_NONBLOCK)) return -EINVAL; /* invalid combination */ err = stream_open(inode, file); if (err < 0) return err; if (maj == snd_major) { rmidi = snd_lookup_minor_data(iminor(inode), SNDRV_DEVICE_TYPE_RAWMIDI); #ifdef CONFIG_SND_OSSEMUL } else if (maj == SOUND_MAJOR) { rmidi = snd_lookup_oss_minor_data(iminor(inode), SNDRV_OSS_DEVICE_TYPE_MIDI); #endif } else return -ENXIO; if (rmidi == NULL) return -ENODEV; if (!try_module_get(rmidi->card->module)) { snd_card_unref(rmidi->card); return -ENXIO; } mutex_lock(&rmidi->open_mutex); card = rmidi->card; err = snd_card_file_add(card, file); if (err < 0) goto __error_card; fflags = snd_rawmidi_file_flags(file); if ((file->f_flags & O_APPEND) || maj == SOUND_MAJOR) /* OSS emul? 
*/ fflags |= SNDRV_RAWMIDI_LFLG_APPEND; rawmidi_file = kmalloc(sizeof(*rawmidi_file), GFP_KERNEL); if (rawmidi_file == NULL) { err = -ENOMEM; goto __error; } rawmidi_file->user_pversion = 0; init_waitqueue_entry(&wait, current); add_wait_queue(&rmidi->open_wait, &wait); while (1) { subdevice = snd_ctl_get_preferred_subdevice(card, SND_CTL_SUBDEV_RAWMIDI); err = rawmidi_open_priv(rmidi, subdevice, fflags, rawmidi_file); if (err >= 0) break; if (err == -EAGAIN) { if (file->f_flags & O_NONBLOCK) { err = -EBUSY; break; } } else break; set_current_state(TASK_INTERRUPTIBLE); mutex_unlock(&rmidi->open_mutex); schedule(); mutex_lock(&rmidi->open_mutex); if (rmidi->card->shutdown) { err = -ENODEV; break; } if (signal_pending(current)) { err = -ERESTARTSYS; break; } } remove_wait_queue(&rmidi->open_wait, &wait); if (err < 0) { kfree(rawmidi_file); goto __error; } #ifdef CONFIG_SND_OSSEMUL if (rawmidi_file->input && rawmidi_file->input->runtime) rawmidi_file->input->runtime->oss = (maj == SOUND_MAJOR); if (rawmidi_file->output && rawmidi_file->output->runtime) rawmidi_file->output->runtime->oss = (maj == SOUND_MAJOR); #endif file->private_data = rawmidi_file; mutex_unlock(&rmidi->open_mutex); snd_card_unref(rmidi->card); return 0; __error: snd_card_file_remove(card, file); __error_card: mutex_unlock(&rmidi->open_mutex); module_put(rmidi->card->module); snd_card_unref(rmidi->card); return err; } static void close_substream(struct snd_rawmidi *rmidi, struct snd_rawmidi_substream *substream, int cleanup) { if (--substream->use_count) return; if (cleanup) { if (substream->stream == SNDRV_RAWMIDI_STREAM_INPUT) snd_rawmidi_input_trigger(substream, 0); else { if (substream->active_sensing) { unsigned char buf = 0xfe; /* sending single active sensing message * to shut the device up */ snd_rawmidi_kernel_write(substream, &buf, 1); } if (snd_rawmidi_drain_output(substream) == -ERESTARTSYS) snd_rawmidi_output_trigger(substream, 0); } snd_rawmidi_buffer_ref_sync(substream); } scoped_guard(spinlock_irq, &substream->lock) { substream->opened = 0; substream->append = 0; } substream->ops->close(substream); if (substream->runtime->private_free) substream->runtime->private_free(substream); snd_rawmidi_runtime_free(substream); put_pid(substream->pid); substream->pid = NULL; rmidi->streams[substream->stream].substream_opened--; } static void rawmidi_release_priv(struct snd_rawmidi_file *rfile) { struct snd_rawmidi *rmidi; rmidi = rfile->rmidi; guard(mutex)(&rmidi->open_mutex); if (rfile->input) { close_substream(rmidi, rfile->input, 1); rfile->input = NULL; } if (rfile->output) { close_substream(rmidi, rfile->output, 1); rfile->output = NULL; } rfile->rmidi = NULL; wake_up(&rmidi->open_wait); } /* called from sound/core/seq/seq_midi.c */ int snd_rawmidi_kernel_release(struct snd_rawmidi_file *rfile) { struct snd_rawmidi *rmidi; if (snd_BUG_ON(!rfile)) return -ENXIO; rmidi = rfile->rmidi; rawmidi_release_priv(rfile); module_put(rmidi->card->module); return 0; } EXPORT_SYMBOL(snd_rawmidi_kernel_release); static int snd_rawmidi_release(struct inode *inode, struct file *file) { struct snd_rawmidi_file *rfile; struct snd_rawmidi *rmidi; struct module *module; rfile = file->private_data; rmidi = rfile->rmidi; rawmidi_release_priv(rfile); kfree(rfile); module = rmidi->card->module; snd_card_file_remove(rmidi->card, file); module_put(module); return 0; } static int snd_rawmidi_info(struct snd_rawmidi_substream *substream, struct snd_rawmidi_info *info) { struct snd_rawmidi *rmidi; if (substream == NULL) return -ENODEV; rmidi = 
substream->rmidi; memset(info, 0, sizeof(*info)); info->card = rmidi->card->number; info->device = rmidi->device; info->subdevice = substream->number; info->stream = substream->stream; info->flags = rmidi->info_flags; if (substream->inactive) info->flags |= SNDRV_RAWMIDI_INFO_STREAM_INACTIVE; strscpy(info->id, rmidi->id); strscpy(info->name, rmidi->name); strscpy(info->subname, substream->name); info->subdevices_count = substream->pstr->substream_count; info->subdevices_avail = (substream->pstr->substream_count - substream->pstr->substream_opened); info->tied_device = rmidi->tied_device; return 0; } static int snd_rawmidi_info_user(struct snd_rawmidi_substream *substream, struct snd_rawmidi_info __user *_info) { struct snd_rawmidi_info info; int err; err = snd_rawmidi_info(substream, &info); if (err < 0) return err; if (copy_to_user(_info, &info, sizeof(struct snd_rawmidi_info))) return -EFAULT; return 0; } static int __snd_rawmidi_info_select(struct snd_card *card, struct snd_rawmidi_info *info) { struct snd_rawmidi *rmidi; struct snd_rawmidi_str *pstr; struct snd_rawmidi_substream *substream; rmidi = snd_rawmidi_search(card, info->device); if (!rmidi) return -ENXIO; if (info->stream < 0 || info->stream > 1) return -EINVAL; info->stream = array_index_nospec(info->stream, 2); pstr = &rmidi->streams[info->stream]; if (pstr->substream_count == 0) return -ENOENT; if (info->subdevice >= pstr->substream_count) return -ENXIO; list_for_each_entry(substream, &pstr->substreams, list) { if ((unsigned int)substream->number == info->subdevice) return snd_rawmidi_info(substream, info); } return -ENXIO; } int snd_rawmidi_info_select(struct snd_card *card, struct snd_rawmidi_info *info) { guard(mutex)(®ister_mutex); return __snd_rawmidi_info_select(card, info); } EXPORT_SYMBOL(snd_rawmidi_info_select); static int snd_rawmidi_info_select_user(struct snd_card *card, struct snd_rawmidi_info __user *_info) { int err; struct snd_rawmidi_info info; if (get_user(info.device, &_info->device)) return -EFAULT; if (get_user(info.stream, &_info->stream)) return -EFAULT; if (get_user(info.subdevice, &_info->subdevice)) return -EFAULT; err = snd_rawmidi_info_select(card, &info); if (err < 0) return err; if (copy_to_user(_info, &info, sizeof(struct snd_rawmidi_info))) return -EFAULT; return 0; } static int resize_runtime_buffer(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params *params, bool is_input) { struct snd_rawmidi_runtime *runtime = substream->runtime; char *newbuf, *oldbuf; unsigned int framing = params->mode & SNDRV_RAWMIDI_MODE_FRAMING_MASK; if (params->buffer_size < 32 || params->buffer_size > 1024L * 1024L) return -EINVAL; if (framing == SNDRV_RAWMIDI_MODE_FRAMING_TSTAMP && (params->buffer_size & 0x1f) != 0) return -EINVAL; if (params->avail_min < 1 || params->avail_min > params->buffer_size) return -EINVAL; if (params->buffer_size & get_align(runtime)) return -EINVAL; if (params->buffer_size != runtime->buffer_size) { newbuf = kvzalloc(params->buffer_size, GFP_KERNEL); if (!newbuf) return -ENOMEM; spin_lock_irq(&substream->lock); if (runtime->buffer_ref) { spin_unlock_irq(&substream->lock); kvfree(newbuf); return -EBUSY; } oldbuf = runtime->buffer; runtime->buffer = newbuf; runtime->buffer_size = params->buffer_size; __reset_runtime_ptrs(runtime, is_input); spin_unlock_irq(&substream->lock); kvfree(oldbuf); } runtime->avail_min = params->avail_min; return 0; } int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params *params) { int err; 
snd_rawmidi_drain_output(substream); guard(mutex)(&substream->rmidi->open_mutex); if (substream->append && substream->use_count > 1) return -EBUSY; err = resize_runtime_buffer(substream, params, false); if (!err) substream->active_sensing = !params->no_active_sensing; return err; } EXPORT_SYMBOL(snd_rawmidi_output_params); int snd_rawmidi_input_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params *params) { unsigned int framing = params->mode & SNDRV_RAWMIDI_MODE_FRAMING_MASK; unsigned int clock_type = params->mode & SNDRV_RAWMIDI_MODE_CLOCK_MASK; int err; snd_rawmidi_drain_input(substream); guard(mutex)(&substream->rmidi->open_mutex); if (framing == SNDRV_RAWMIDI_MODE_FRAMING_NONE && clock_type != SNDRV_RAWMIDI_MODE_CLOCK_NONE) err = -EINVAL; else if (clock_type > SNDRV_RAWMIDI_MODE_CLOCK_MONOTONIC_RAW) err = -EINVAL; else if (framing > SNDRV_RAWMIDI_MODE_FRAMING_TSTAMP) err = -EINVAL; else err = resize_runtime_buffer(substream, params, true); if (!err) { substream->framing = framing; substream->clock_type = clock_type; } return 0; } EXPORT_SYMBOL(snd_rawmidi_input_params); static int snd_rawmidi_output_status(struct snd_rawmidi_substream *substream, struct snd_rawmidi_status64 *status) { struct snd_rawmidi_runtime *runtime = substream->runtime; memset(status, 0, sizeof(*status)); status->stream = SNDRV_RAWMIDI_STREAM_OUTPUT; guard(spinlock_irq)(&substream->lock); status->avail = runtime->avail; return 0; } static int snd_rawmidi_input_status(struct snd_rawmidi_substream *substream, struct snd_rawmidi_status64 *status) { struct snd_rawmidi_runtime *runtime = substream->runtime; memset(status, 0, sizeof(*status)); status->stream = SNDRV_RAWMIDI_STREAM_INPUT; guard(spinlock_irq)(&substream->lock); status->avail = runtime->avail; status->xruns = runtime->xruns; runtime->xruns = 0; return 0; } static int snd_rawmidi_ioctl_status32(struct snd_rawmidi_file *rfile, struct snd_rawmidi_status32 __user *argp) { int err = 0; struct snd_rawmidi_status32 __user *status = argp; struct snd_rawmidi_status32 status32; struct snd_rawmidi_status64 status64; if (copy_from_user(&status32, argp, sizeof(struct snd_rawmidi_status32))) return -EFAULT; switch (status32.stream) { case SNDRV_RAWMIDI_STREAM_OUTPUT: if (rfile->output == NULL) return -EINVAL; err = snd_rawmidi_output_status(rfile->output, &status64); break; case SNDRV_RAWMIDI_STREAM_INPUT: if (rfile->input == NULL) return -EINVAL; err = snd_rawmidi_input_status(rfile->input, &status64); break; default: return -EINVAL; } if (err < 0) return err; status32 = (struct snd_rawmidi_status32) { .stream = status64.stream, .tstamp_sec = status64.tstamp_sec, .tstamp_nsec = status64.tstamp_nsec, .avail = status64.avail, .xruns = status64.xruns, }; if (copy_to_user(status, &status32, sizeof(*status))) return -EFAULT; return 0; } static int snd_rawmidi_ioctl_status64(struct snd_rawmidi_file *rfile, struct snd_rawmidi_status64 __user *argp) { int err = 0; struct snd_rawmidi_status64 status; if (copy_from_user(&status, argp, sizeof(struct snd_rawmidi_status64))) return -EFAULT; switch (status.stream) { case SNDRV_RAWMIDI_STREAM_OUTPUT: if (rfile->output == NULL) return -EINVAL; err = snd_rawmidi_output_status(rfile->output, &status); break; case SNDRV_RAWMIDI_STREAM_INPUT: if (rfile->input == NULL) return -EINVAL; err = snd_rawmidi_input_status(rfile->input, &status); break; default: return -EINVAL; } if (err < 0) return err; if (copy_to_user(argp, &status, sizeof(struct snd_rawmidi_status64))) return -EFAULT; return 0; } static long 
snd_rawmidi_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct snd_rawmidi_file *rfile; struct snd_rawmidi *rmidi; void __user *argp = (void __user *)arg; rfile = file->private_data; if (((cmd >> 8) & 0xff) != 'W') return -ENOTTY; switch (cmd) { case SNDRV_RAWMIDI_IOCTL_PVERSION: return put_user(SNDRV_RAWMIDI_VERSION, (int __user *)argp) ? -EFAULT : 0; case SNDRV_RAWMIDI_IOCTL_INFO: { int stream; struct snd_rawmidi_info __user *info = argp; if (get_user(stream, &info->stream)) return -EFAULT; switch (stream) { case SNDRV_RAWMIDI_STREAM_INPUT: return snd_rawmidi_info_user(rfile->input, info); case SNDRV_RAWMIDI_STREAM_OUTPUT: return snd_rawmidi_info_user(rfile->output, info); default: return -EINVAL; } } case SNDRV_RAWMIDI_IOCTL_USER_PVERSION: if (get_user(rfile->user_pversion, (unsigned int __user *)arg)) return -EFAULT; return 0; case SNDRV_RAWMIDI_IOCTL_PARAMS: { struct snd_rawmidi_params params; if (copy_from_user(¶ms, argp, sizeof(struct snd_rawmidi_params))) return -EFAULT; if (rfile->user_pversion < SNDRV_PROTOCOL_VERSION(2, 0, 2)) { params.mode = 0; memset(params.reserved, 0, sizeof(params.reserved)); } switch (params.stream) { case SNDRV_RAWMIDI_STREAM_OUTPUT: if (rfile->output == NULL) return -EINVAL; return snd_rawmidi_output_params(rfile->output, ¶ms); case SNDRV_RAWMIDI_STREAM_INPUT: if (rfile->input == NULL) return -EINVAL; return snd_rawmidi_input_params(rfile->input, ¶ms); default: return -EINVAL; } } case SNDRV_RAWMIDI_IOCTL_STATUS32: return snd_rawmidi_ioctl_status32(rfile, argp); case SNDRV_RAWMIDI_IOCTL_STATUS64: return snd_rawmidi_ioctl_status64(rfile, argp); case SNDRV_RAWMIDI_IOCTL_DROP: { int val; if (get_user(val, (int __user *) argp)) return -EFAULT; switch (val) { case SNDRV_RAWMIDI_STREAM_OUTPUT: if (rfile->output == NULL) return -EINVAL; return snd_rawmidi_drop_output(rfile->output); default: return -EINVAL; } } case SNDRV_RAWMIDI_IOCTL_DRAIN: { int val; if (get_user(val, (int __user *) argp)) return -EFAULT; switch (val) { case SNDRV_RAWMIDI_STREAM_OUTPUT: if (rfile->output == NULL) return -EINVAL; return snd_rawmidi_drain_output(rfile->output); case SNDRV_RAWMIDI_STREAM_INPUT: if (rfile->input == NULL) return -EINVAL; return snd_rawmidi_drain_input(rfile->input); default: return -EINVAL; } } default: rmidi = rfile->rmidi; if (rmidi->ops && rmidi->ops->ioctl) return rmidi->ops->ioctl(rmidi, cmd, argp); rmidi_dbg(rmidi, "rawmidi: unknown command = 0x%x\n", cmd); } return -ENOTTY; } /* ioctl to find the next device; either legacy or UMP depending on @find_ump */ static int snd_rawmidi_next_device(struct snd_card *card, int __user *argp, bool find_ump) { struct snd_rawmidi *rmidi; int device; bool is_ump; if (get_user(device, argp)) return -EFAULT; if (device >= SNDRV_RAWMIDI_DEVICES) /* next device is -1 */ device = SNDRV_RAWMIDI_DEVICES - 1; scoped_guard(mutex, ®ister_mutex) { device = device < 0 ? 
0 : device + 1; for (; device < SNDRV_RAWMIDI_DEVICES; device++) { rmidi = snd_rawmidi_search(card, device); if (!rmidi) continue; is_ump = rawmidi_is_ump(rmidi); if (find_ump == is_ump) break; } if (device == SNDRV_RAWMIDI_DEVICES) device = -1; } if (put_user(device, argp)) return -EFAULT; return 0; } #if IS_ENABLED(CONFIG_SND_UMP) /* inquiry of UMP endpoint and block info via control API */ static int snd_rawmidi_call_ump_ioctl(struct snd_card *card, int cmd, void __user *argp) { struct snd_ump_endpoint_info __user *info = argp; struct snd_rawmidi *rmidi; int device; if (get_user(device, &info->device)) return -EFAULT; guard(mutex)(®ister_mutex); rmidi = snd_rawmidi_search(card, device); if (rmidi && rmidi->ops && rmidi->ops->ioctl) return rmidi->ops->ioctl(rmidi, cmd, argp); else return -ENXIO; } #endif static int snd_rawmidi_control_ioctl(struct snd_card *card, struct snd_ctl_file *control, unsigned int cmd, unsigned long arg) { void __user *argp = (void __user *)arg; switch (cmd) { case SNDRV_CTL_IOCTL_RAWMIDI_NEXT_DEVICE: return snd_rawmidi_next_device(card, argp, false); #if IS_ENABLED(CONFIG_SND_UMP) case SNDRV_CTL_IOCTL_UMP_NEXT_DEVICE: return snd_rawmidi_next_device(card, argp, true); case SNDRV_CTL_IOCTL_UMP_ENDPOINT_INFO: return snd_rawmidi_call_ump_ioctl(card, SNDRV_UMP_IOCTL_ENDPOINT_INFO, argp); case SNDRV_CTL_IOCTL_UMP_BLOCK_INFO: return snd_rawmidi_call_ump_ioctl(card, SNDRV_UMP_IOCTL_BLOCK_INFO, argp); #endif case SNDRV_CTL_IOCTL_RAWMIDI_PREFER_SUBDEVICE: { int val; if (get_user(val, (int __user *)argp)) return -EFAULT; control->preferred_subdevice[SND_CTL_SUBDEV_RAWMIDI] = val; return 0; } case SNDRV_CTL_IOCTL_RAWMIDI_INFO: return snd_rawmidi_info_select_user(card, argp); } return -ENOIOCTLCMD; } static int receive_with_tstamp_framing(struct snd_rawmidi_substream *substream, const unsigned char *buffer, int src_count, const struct timespec64 *tstamp) { struct snd_rawmidi_runtime *runtime = substream->runtime; struct snd_rawmidi_framing_tstamp *dest_ptr; struct snd_rawmidi_framing_tstamp frame = { .tv_sec = tstamp->tv_sec, .tv_nsec = tstamp->tv_nsec }; int orig_count = src_count; int frame_size = sizeof(struct snd_rawmidi_framing_tstamp); int align = get_align(runtime); BUILD_BUG_ON(frame_size != 0x20); if (snd_BUG_ON((runtime->hw_ptr & 0x1f) != 0)) return -EINVAL; while (src_count > align) { if ((int)(runtime->buffer_size - runtime->avail) < frame_size) { runtime->xruns += src_count; break; } if (src_count >= SNDRV_RAWMIDI_FRAMING_DATA_LENGTH) frame.length = SNDRV_RAWMIDI_FRAMING_DATA_LENGTH; else { frame.length = get_aligned_size(runtime, src_count); if (!frame.length) break; memset(frame.data, 0, SNDRV_RAWMIDI_FRAMING_DATA_LENGTH); } memcpy(frame.data, buffer, frame.length); buffer += frame.length; src_count -= frame.length; dest_ptr = (struct snd_rawmidi_framing_tstamp *) (runtime->buffer + runtime->hw_ptr); *dest_ptr = frame; runtime->avail += frame_size; runtime->hw_ptr += frame_size; runtime->hw_ptr %= runtime->buffer_size; } return orig_count - src_count; } static struct timespec64 get_framing_tstamp(struct snd_rawmidi_substream *substream) { struct timespec64 ts64 = {0, 0}; switch (substream->clock_type) { case SNDRV_RAWMIDI_MODE_CLOCK_MONOTONIC_RAW: ktime_get_raw_ts64(&ts64); break; case SNDRV_RAWMIDI_MODE_CLOCK_MONOTONIC: ktime_get_ts64(&ts64); break; case SNDRV_RAWMIDI_MODE_CLOCK_REALTIME: ktime_get_real_ts64(&ts64); break; } return ts64; } /** * snd_rawmidi_receive - receive the input data from the device * @substream: the rawmidi substream * @buffer: the 
buffer pointer * @count: the data size to read * * Reads the data from the internal buffer. * * Return: The size of read data, or a negative error code on failure. */ int snd_rawmidi_receive(struct snd_rawmidi_substream *substream, const unsigned char *buffer, int count) { struct timespec64 ts64 = get_framing_tstamp(substream); int result = 0, count1; struct snd_rawmidi_runtime *runtime; guard(spinlock_irqsave)(&substream->lock); if (!substream->opened) return -EBADFD; runtime = substream->runtime; if (!runtime || !runtime->buffer) { rmidi_dbg(substream->rmidi, "snd_rawmidi_receive: input is not active!!!\n"); return -EINVAL; } count = get_aligned_size(runtime, count); if (!count) return result; if (substream->framing == SNDRV_RAWMIDI_MODE_FRAMING_TSTAMP) { result = receive_with_tstamp_framing(substream, buffer, count, &ts64); } else if (count == 1) { /* special case, faster code */ substream->bytes++; if (runtime->avail < runtime->buffer_size) { runtime->buffer[runtime->hw_ptr++] = buffer[0]; runtime->hw_ptr %= runtime->buffer_size; runtime->avail++; result++; } else { runtime->xruns++; } } else { substream->bytes += count; count1 = runtime->buffer_size - runtime->hw_ptr; if (count1 > count) count1 = count; if (count1 > (int)(runtime->buffer_size - runtime->avail)) count1 = runtime->buffer_size - runtime->avail; count1 = get_aligned_size(runtime, count1); if (!count1) return result; memcpy(runtime->buffer + runtime->hw_ptr, buffer, count1); runtime->hw_ptr += count1; runtime->hw_ptr %= runtime->buffer_size; runtime->avail += count1; count -= count1; result += count1; if (count > 0) { buffer += count1; count1 = count; if (count1 > (int)(runtime->buffer_size - runtime->avail)) { count1 = runtime->buffer_size - runtime->avail; runtime->xruns += count - count1; } if (count1 > 0) { memcpy(runtime->buffer, buffer, count1); runtime->hw_ptr = count1; runtime->avail += count1; result += count1; } } } if (result > 0) { if (runtime->event) schedule_work(&runtime->event_work); else if (__snd_rawmidi_ready(runtime)) wake_up(&runtime->sleep); } return result; } EXPORT_SYMBOL(snd_rawmidi_receive); static long snd_rawmidi_kernel_read1(struct snd_rawmidi_substream *substream, unsigned char __user *userbuf, unsigned char *kernelbuf, long count) { unsigned long flags; long result = 0, count1; struct snd_rawmidi_runtime *runtime = substream->runtime; unsigned long appl_ptr; int err = 0; spin_lock_irqsave(&substream->lock, flags); snd_rawmidi_buffer_ref(runtime); while (count > 0 && runtime->avail) { count1 = runtime->buffer_size - runtime->appl_ptr; if (count1 > count) count1 = count; if (count1 > (int)runtime->avail) count1 = runtime->avail; /* update runtime->appl_ptr before unlocking for userbuf */ appl_ptr = runtime->appl_ptr; runtime->appl_ptr += count1; runtime->appl_ptr %= runtime->buffer_size; runtime->avail -= count1; if (kernelbuf) memcpy(kernelbuf + result, runtime->buffer + appl_ptr, count1); if (userbuf) { spin_unlock_irqrestore(&substream->lock, flags); if (copy_to_user(userbuf + result, runtime->buffer + appl_ptr, count1)) err = -EFAULT; spin_lock_irqsave(&substream->lock, flags); if (err) goto out; } result += count1; count -= count1; } out: snd_rawmidi_buffer_unref(runtime); spin_unlock_irqrestore(&substream->lock, flags); return result > 0 ? 
result : err; } long snd_rawmidi_kernel_read(struct snd_rawmidi_substream *substream, unsigned char *buf, long count) { snd_rawmidi_input_trigger(substream, 1); return snd_rawmidi_kernel_read1(substream, NULL/*userbuf*/, buf, count); } EXPORT_SYMBOL(snd_rawmidi_kernel_read); static ssize_t snd_rawmidi_read(struct file *file, char __user *buf, size_t count, loff_t *offset) { long result; int count1; struct snd_rawmidi_file *rfile; struct snd_rawmidi_substream *substream; struct snd_rawmidi_runtime *runtime; rfile = file->private_data; substream = rfile->input; if (substream == NULL) return -EIO; runtime = substream->runtime; snd_rawmidi_input_trigger(substream, 1); result = 0; while (count > 0) { spin_lock_irq(&substream->lock); while (!__snd_rawmidi_ready(runtime)) { wait_queue_entry_t wait; if ((file->f_flags & O_NONBLOCK) != 0 || result > 0) { spin_unlock_irq(&substream->lock); return result > 0 ? result : -EAGAIN; } init_waitqueue_entry(&wait, current); add_wait_queue(&runtime->sleep, &wait); set_current_state(TASK_INTERRUPTIBLE); spin_unlock_irq(&substream->lock); schedule(); remove_wait_queue(&runtime->sleep, &wait); if (rfile->rmidi->card->shutdown) return -ENODEV; if (signal_pending(current)) return result > 0 ? result : -ERESTARTSYS; spin_lock_irq(&substream->lock); if (!runtime->avail) { spin_unlock_irq(&substream->lock); return result > 0 ? result : -EIO; } } spin_unlock_irq(&substream->lock); count1 = snd_rawmidi_kernel_read1(substream, (unsigned char __user *)buf, NULL/*kernelbuf*/, count); if (count1 < 0) return result > 0 ? result : count1; result += count1; buf += count1; count -= count1; } return result; } /** * snd_rawmidi_transmit_empty - check whether the output buffer is empty * @substream: the rawmidi substream * * Return: 1 if the internal output buffer is empty, 0 if not. */ int snd_rawmidi_transmit_empty(struct snd_rawmidi_substream *substream) { struct snd_rawmidi_runtime *runtime; guard(spinlock_irqsave)(&substream->lock); runtime = substream->runtime; if (!substream->opened || !runtime || !runtime->buffer) { rmidi_dbg(substream->rmidi, "snd_rawmidi_transmit_empty: output is not active!!!\n"); return 1; } return (runtime->avail >= runtime->buffer_size); } EXPORT_SYMBOL(snd_rawmidi_transmit_empty); /* * __snd_rawmidi_transmit_peek - copy data from the internal buffer * @substream: the rawmidi substream * @buffer: the buffer pointer * @count: data size to transfer * * This is a variant of snd_rawmidi_transmit_peek() without spinlock. 
*/ static int __snd_rawmidi_transmit_peek(struct snd_rawmidi_substream *substream, unsigned char *buffer, int count) { int result, count1; struct snd_rawmidi_runtime *runtime = substream->runtime; if (runtime->buffer == NULL) { rmidi_dbg(substream->rmidi, "snd_rawmidi_transmit_peek: output is not active!!!\n"); return -EINVAL; } result = 0; if (runtime->avail >= runtime->buffer_size) { /* warning: lowlevel layer MUST trigger down the hardware */ goto __skip; } if (count == 1) { /* special case, faster code */ *buffer = runtime->buffer[runtime->hw_ptr]; result++; } else { count1 = runtime->buffer_size - runtime->hw_ptr; if (count1 > count) count1 = count; if (count1 > (int)(runtime->buffer_size - runtime->avail)) count1 = runtime->buffer_size - runtime->avail; count1 = get_aligned_size(runtime, count1); if (!count1) goto __skip; memcpy(buffer, runtime->buffer + runtime->hw_ptr, count1); count -= count1; result += count1; if (count > 0) { if (count > (int)(runtime->buffer_size - runtime->avail - count1)) count = runtime->buffer_size - runtime->avail - count1; count = get_aligned_size(runtime, count); if (!count) goto __skip; memcpy(buffer + count1, runtime->buffer, count); result += count; } } __skip: return result; } /** * snd_rawmidi_transmit_peek - copy data from the internal buffer * @substream: the rawmidi substream * @buffer: the buffer pointer * @count: data size to transfer * * Copies data from the internal output buffer to the given buffer. * * Call this in the interrupt handler when the midi output is ready, * and call snd_rawmidi_transmit_ack() after the transmission is * finished. * * Return: The size of copied data, or a negative error code on failure. */ int snd_rawmidi_transmit_peek(struct snd_rawmidi_substream *substream, unsigned char *buffer, int count) { guard(spinlock_irqsave)(&substream->lock); if (!substream->opened || !substream->runtime) return -EBADFD; return __snd_rawmidi_transmit_peek(substream, buffer, count); } EXPORT_SYMBOL(snd_rawmidi_transmit_peek); /* * __snd_rawmidi_transmit_ack - acknowledge the transmission * @substream: the rawmidi substream * @count: the transferred count * * This is a variant of __snd_rawmidi_transmit_ack() without spinlock. */ static int __snd_rawmidi_transmit_ack(struct snd_rawmidi_substream *substream, int count) { struct snd_rawmidi_runtime *runtime = substream->runtime; if (runtime->buffer == NULL) { rmidi_dbg(substream->rmidi, "snd_rawmidi_transmit_ack: output is not active!!!\n"); return -EINVAL; } snd_BUG_ON(runtime->avail + count > runtime->buffer_size); count = get_aligned_size(runtime, count); runtime->hw_ptr += count; runtime->hw_ptr %= runtime->buffer_size; runtime->avail += count; substream->bytes += count; if (count > 0) { if (runtime->drain || __snd_rawmidi_ready(runtime)) wake_up(&runtime->sleep); } return count; } /** * snd_rawmidi_transmit_ack - acknowledge the transmission * @substream: the rawmidi substream * @count: the transferred count * * Advances the hardware pointer for the internal output buffer with * the given size and updates the condition. * Call after the transmission is finished. * * Return: The advanced size if successful, or a negative error code on failure. 
*/ int snd_rawmidi_transmit_ack(struct snd_rawmidi_substream *substream, int count) { guard(spinlock_irqsave)(&substream->lock); if (!substream->opened || !substream->runtime) return -EBADFD; return __snd_rawmidi_transmit_ack(substream, count); } EXPORT_SYMBOL(snd_rawmidi_transmit_ack); /** * snd_rawmidi_transmit - copy from the buffer to the device * @substream: the rawmidi substream * @buffer: the buffer pointer * @count: the data size to transfer * * Copies data from the buffer to the device and advances the pointer. * * Return: The copied size if successful, or a negative error code on failure. */ int snd_rawmidi_transmit(struct snd_rawmidi_substream *substream, unsigned char *buffer, int count) { guard(spinlock_irqsave)(&substream->lock); if (!substream->opened) return -EBADFD; count = __snd_rawmidi_transmit_peek(substream, buffer, count); if (count <= 0) return count; return __snd_rawmidi_transmit_ack(substream, count); } EXPORT_SYMBOL(snd_rawmidi_transmit); /** * snd_rawmidi_proceed - Discard the all pending bytes and proceed * @substream: rawmidi substream * * Return: the number of discarded bytes */ int snd_rawmidi_proceed(struct snd_rawmidi_substream *substream) { struct snd_rawmidi_runtime *runtime; int count = 0; guard(spinlock_irqsave)(&substream->lock); runtime = substream->runtime; if (substream->opened && runtime && runtime->avail < runtime->buffer_size) { count = runtime->buffer_size - runtime->avail; __snd_rawmidi_transmit_ack(substream, count); } return count; } EXPORT_SYMBOL(snd_rawmidi_proceed); static long snd_rawmidi_kernel_write1(struct snd_rawmidi_substream *substream, const unsigned char __user *userbuf, const unsigned char *kernelbuf, long count) { unsigned long flags; long count1, result; struct snd_rawmidi_runtime *runtime = substream->runtime; unsigned long appl_ptr; if (!kernelbuf && !userbuf) return -EINVAL; if (snd_BUG_ON(!runtime->buffer)) return -EINVAL; result = 0; spin_lock_irqsave(&substream->lock, flags); if (substream->append) { if ((long)runtime->avail < count) { spin_unlock_irqrestore(&substream->lock, flags); return -EAGAIN; } } snd_rawmidi_buffer_ref(runtime); while (count > 0 && runtime->avail > 0) { count1 = runtime->buffer_size - runtime->appl_ptr; if (count1 > count) count1 = count; if (count1 > (long)runtime->avail) count1 = runtime->avail; /* update runtime->appl_ptr before unlocking for userbuf */ appl_ptr = runtime->appl_ptr; runtime->appl_ptr += count1; runtime->appl_ptr %= runtime->buffer_size; runtime->avail -= count1; if (kernelbuf) memcpy(runtime->buffer + appl_ptr, kernelbuf + result, count1); else if (userbuf) { spin_unlock_irqrestore(&substream->lock, flags); if (copy_from_user(runtime->buffer + appl_ptr, userbuf + result, count1)) { spin_lock_irqsave(&substream->lock, flags); result = result > 0 ? 
result : -EFAULT; goto __end; } spin_lock_irqsave(&substream->lock, flags); } result += count1; count -= count1; } __end: count1 = runtime->avail < runtime->buffer_size; snd_rawmidi_buffer_unref(runtime); spin_unlock_irqrestore(&substream->lock, flags); if (count1) snd_rawmidi_output_trigger(substream, 1); return result; } long snd_rawmidi_kernel_write(struct snd_rawmidi_substream *substream, const unsigned char *buf, long count) { return snd_rawmidi_kernel_write1(substream, NULL, buf, count); } EXPORT_SYMBOL(snd_rawmidi_kernel_write); static ssize_t snd_rawmidi_write(struct file *file, const char __user *buf, size_t count, loff_t *offset) { long result, timeout; int count1; struct snd_rawmidi_file *rfile; struct snd_rawmidi_runtime *runtime; struct snd_rawmidi_substream *substream; rfile = file->private_data; substream = rfile->output; runtime = substream->runtime; /* we cannot put an atomic message to our buffer */ if (substream->append && count > runtime->buffer_size) return -EIO; result = 0; while (count > 0) { spin_lock_irq(&substream->lock); while (!snd_rawmidi_ready_append(substream, count)) { wait_queue_entry_t wait; if (file->f_flags & O_NONBLOCK) { spin_unlock_irq(&substream->lock); return result > 0 ? result : -EAGAIN; } init_waitqueue_entry(&wait, current); add_wait_queue(&runtime->sleep, &wait); set_current_state(TASK_INTERRUPTIBLE); spin_unlock_irq(&substream->lock); timeout = schedule_timeout(30 * HZ); remove_wait_queue(&runtime->sleep, &wait); if (rfile->rmidi->card->shutdown) return -ENODEV; if (signal_pending(current)) return result > 0 ? result : -ERESTARTSYS; spin_lock_irq(&substream->lock); if (!runtime->avail && !timeout) { spin_unlock_irq(&substream->lock); return result > 0 ? result : -EIO; } } spin_unlock_irq(&substream->lock); count1 = snd_rawmidi_kernel_write1(substream, buf, NULL, count); if (count1 < 0) return result > 0 ? result : count1; result += count1; buf += count1; if ((size_t)count1 < count && (file->f_flags & O_NONBLOCK)) break; count -= count1; } if (file->f_flags & O_DSYNC) { spin_lock_irq(&substream->lock); while (runtime->avail != runtime->buffer_size) { wait_queue_entry_t wait; unsigned int last_avail = runtime->avail; init_waitqueue_entry(&wait, current); add_wait_queue(&runtime->sleep, &wait); set_current_state(TASK_INTERRUPTIBLE); spin_unlock_irq(&substream->lock); timeout = schedule_timeout(30 * HZ); remove_wait_queue(&runtime->sleep, &wait); if (signal_pending(current)) return result > 0 ? result : -ERESTARTSYS; if (runtime->avail == last_avail && !timeout) return result > 0 ? 
result : -EIO; spin_lock_irq(&substream->lock); } spin_unlock_irq(&substream->lock); } return result; } static __poll_t snd_rawmidi_poll(struct file *file, poll_table *wait) { struct snd_rawmidi_file *rfile; struct snd_rawmidi_runtime *runtime; __poll_t mask; rfile = file->private_data; if (rfile->input != NULL) { runtime = rfile->input->runtime; snd_rawmidi_input_trigger(rfile->input, 1); poll_wait(file, &runtime->sleep, wait); } if (rfile->output != NULL) { runtime = rfile->output->runtime; poll_wait(file, &runtime->sleep, wait); } mask = 0; if (rfile->input != NULL) { if (snd_rawmidi_ready(rfile->input)) mask |= EPOLLIN | EPOLLRDNORM; } if (rfile->output != NULL) { if (snd_rawmidi_ready(rfile->output)) mask |= EPOLLOUT | EPOLLWRNORM; } return mask; } /* */ #ifdef CONFIG_COMPAT #include "rawmidi_compat.c" #else #define snd_rawmidi_ioctl_compat NULL #endif /* */ static void snd_rawmidi_proc_info_read(struct snd_info_entry *entry, struct snd_info_buffer *buffer) { struct snd_rawmidi *rmidi; struct snd_rawmidi_substream *substream; struct snd_rawmidi_runtime *runtime; unsigned long buffer_size, avail, xruns; unsigned int clock_type; static const char *clock_names[4] = { "none", "realtime", "monotonic", "monotonic raw" }; rmidi = entry->private_data; snd_iprintf(buffer, "%s\n\n", rmidi->name); if (IS_ENABLED(CONFIG_SND_UMP)) snd_iprintf(buffer, "Type: %s\n", rawmidi_is_ump(rmidi) ? "UMP" : "Legacy"); if (rmidi->ops && rmidi->ops->proc_read) rmidi->ops->proc_read(entry, buffer); guard(mutex)(&rmidi->open_mutex); if (rmidi->info_flags & SNDRV_RAWMIDI_INFO_OUTPUT) { list_for_each_entry(substream, &rmidi->streams[SNDRV_RAWMIDI_STREAM_OUTPUT].substreams, list) { snd_iprintf(buffer, "Output %d\n" " Tx bytes : %lu\n", substream->number, (unsigned long) substream->bytes); if (substream->opened) { snd_iprintf(buffer, " Owner PID : %d\n", pid_vnr(substream->pid)); runtime = substream->runtime; scoped_guard(spinlock_irq, &substream->lock) { buffer_size = runtime->buffer_size; avail = runtime->avail; } snd_iprintf(buffer, " Mode : %s\n" " Buffer size : %lu\n" " Avail : %lu\n", runtime->oss ? 
"OSS compatible" : "native", buffer_size, avail); } } } if (rmidi->info_flags & SNDRV_RAWMIDI_INFO_INPUT) { list_for_each_entry(substream, &rmidi->streams[SNDRV_RAWMIDI_STREAM_INPUT].substreams, list) { snd_iprintf(buffer, "Input %d\n" " Rx bytes : %lu\n", substream->number, (unsigned long) substream->bytes); if (substream->opened) { snd_iprintf(buffer, " Owner PID : %d\n", pid_vnr(substream->pid)); runtime = substream->runtime; scoped_guard(spinlock_irq, &substream->lock) { buffer_size = runtime->buffer_size; avail = runtime->avail; xruns = runtime->xruns; } snd_iprintf(buffer, " Buffer size : %lu\n" " Avail : %lu\n" " Overruns : %lu\n", buffer_size, avail, xruns); if (substream->framing == SNDRV_RAWMIDI_MODE_FRAMING_TSTAMP) { clock_type = substream->clock_type >> SNDRV_RAWMIDI_MODE_CLOCK_SHIFT; if (!snd_BUG_ON(clock_type >= ARRAY_SIZE(clock_names))) snd_iprintf(buffer, " Framing : tstamp\n" " Clock type : %s\n", clock_names[clock_type]); } } } } } /* * Register functions */ static const struct file_operations snd_rawmidi_f_ops = { .owner = THIS_MODULE, .read = snd_rawmidi_read, .write = snd_rawmidi_write, .open = snd_rawmidi_open, .release = snd_rawmidi_release, .poll = snd_rawmidi_poll, .unlocked_ioctl = snd_rawmidi_ioctl, .compat_ioctl = snd_rawmidi_ioctl_compat, }; static int snd_rawmidi_alloc_substreams(struct snd_rawmidi *rmidi, struct snd_rawmidi_str *stream, int direction, int count) { struct snd_rawmidi_substream *substream; int idx; for (idx = 0; idx < count; idx++) { substream = kzalloc(sizeof(*substream), GFP_KERNEL); if (!substream) return -ENOMEM; substream->stream = direction; substream->number = idx; substream->rmidi = rmidi; substream->pstr = stream; spin_lock_init(&substream->lock); list_add_tail(&substream->list, &stream->substreams); stream->substream_count++; } return 0; } /* used for both rawmidi and ump */ int snd_rawmidi_init(struct snd_rawmidi *rmidi, struct snd_card *card, char *id, int device, int output_count, int input_count, unsigned int info_flags) { int err; static const struct snd_device_ops ops = { .dev_free = snd_rawmidi_dev_free, .dev_register = snd_rawmidi_dev_register, .dev_disconnect = snd_rawmidi_dev_disconnect, }; rmidi->card = card; rmidi->device = device; mutex_init(&rmidi->open_mutex); init_waitqueue_head(&rmidi->open_wait); INIT_LIST_HEAD(&rmidi->streams[SNDRV_RAWMIDI_STREAM_INPUT].substreams); INIT_LIST_HEAD(&rmidi->streams[SNDRV_RAWMIDI_STREAM_OUTPUT].substreams); rmidi->info_flags = info_flags; if (id != NULL) strscpy(rmidi->id, id, sizeof(rmidi->id)); err = snd_device_alloc(&rmidi->dev, card); if (err < 0) return err; if (rawmidi_is_ump(rmidi)) dev_set_name(rmidi->dev, "umpC%iD%i", card->number, device); else dev_set_name(rmidi->dev, "midiC%iD%i", card->number, device); err = snd_rawmidi_alloc_substreams(rmidi, &rmidi->streams[SNDRV_RAWMIDI_STREAM_INPUT], SNDRV_RAWMIDI_STREAM_INPUT, input_count); if (err < 0) return err; err = snd_rawmidi_alloc_substreams(rmidi, &rmidi->streams[SNDRV_RAWMIDI_STREAM_OUTPUT], SNDRV_RAWMIDI_STREAM_OUTPUT, output_count); if (err < 0) return err; err = snd_device_new(card, SNDRV_DEV_RAWMIDI, rmidi, &ops); if (err < 0) return err; return 0; } EXPORT_SYMBOL_GPL(snd_rawmidi_init); /** * snd_rawmidi_new - create a rawmidi instance * @card: the card instance * @id: the id string * @device: the device index * @output_count: the number of output streams * @input_count: the number of input streams * @rrawmidi: the pointer to store the new rawmidi instance * * Creates a new rawmidi instance. 
* Use snd_rawmidi_set_ops() to set the operators to the new instance. * * Return: Zero if successful, or a negative error code on failure. */ int snd_rawmidi_new(struct snd_card *card, char *id, int device, int output_count, int input_count, struct snd_rawmidi **rrawmidi) { struct snd_rawmidi *rmidi; int err; if (rrawmidi) *rrawmidi = NULL; rmidi = kzalloc(sizeof(*rmidi), GFP_KERNEL); if (!rmidi) return -ENOMEM; err = snd_rawmidi_init(rmidi, card, id, device, output_count, input_count, 0); if (err < 0) { snd_rawmidi_free(rmidi); return err; } if (rrawmidi) *rrawmidi = rmidi; return 0; } EXPORT_SYMBOL(snd_rawmidi_new); static void snd_rawmidi_free_substreams(struct snd_rawmidi_str *stream) { struct snd_rawmidi_substream *substream; while (!list_empty(&stream->substreams)) { substream = list_entry(stream->substreams.next, struct snd_rawmidi_substream, list); list_del(&substream->list); kfree(substream); } } /* called from ump.c, too */ int snd_rawmidi_free(struct snd_rawmidi *rmidi) { if (!rmidi) return 0; snd_info_free_entry(rmidi->proc_entry); rmidi->proc_entry = NULL; if (rmidi->ops && rmidi->ops->dev_unregister) rmidi->ops->dev_unregister(rmidi); snd_rawmidi_free_substreams(&rmidi->streams[SNDRV_RAWMIDI_STREAM_INPUT]); snd_rawmidi_free_substreams(&rmidi->streams[SNDRV_RAWMIDI_STREAM_OUTPUT]); if (rmidi->private_free) rmidi->private_free(rmidi); put_device(rmidi->dev); kfree(rmidi); return 0; } EXPORT_SYMBOL_GPL(snd_rawmidi_free); static int snd_rawmidi_dev_free(struct snd_device *device) { struct snd_rawmidi *rmidi = device->device_data; return snd_rawmidi_free(rmidi); } #if IS_ENABLED(CONFIG_SND_SEQUENCER) static void snd_rawmidi_dev_seq_free(struct snd_seq_device *device) { struct snd_rawmidi *rmidi = device->private_data; rmidi->seq_dev = NULL; } #endif static int snd_rawmidi_dev_register(struct snd_device *device) { int err; struct snd_info_entry *entry; char name[16]; struct snd_rawmidi *rmidi = device->device_data; if (rmidi->device >= SNDRV_RAWMIDI_DEVICES) return -ENOMEM; err = 0; scoped_guard(mutex, ®ister_mutex) { if (snd_rawmidi_search(rmidi->card, rmidi->device)) err = -EBUSY; else list_add_tail(&rmidi->list, &snd_rawmidi_devices); } if (err < 0) return err; err = snd_register_device(SNDRV_DEVICE_TYPE_RAWMIDI, rmidi->card, rmidi->device, &snd_rawmidi_f_ops, rmidi, rmidi->dev); if (err < 0) { rmidi_err(rmidi, "unable to register\n"); goto error; } if (rmidi->ops && rmidi->ops->dev_register) { err = rmidi->ops->dev_register(rmidi); if (err < 0) goto error_unregister; } #ifdef CONFIG_SND_OSSEMUL rmidi->ossreg = 0; if (!rawmidi_is_ump(rmidi) && (int)rmidi->device == midi_map[rmidi->card->number]) { if (snd_register_oss_device(SNDRV_OSS_DEVICE_TYPE_MIDI, rmidi->card, 0, &snd_rawmidi_f_ops, rmidi) < 0) { rmidi_err(rmidi, "unable to register OSS rawmidi device %i:%i\n", rmidi->card->number, 0); } else { rmidi->ossreg++; #ifdef SNDRV_OSS_INFO_DEV_MIDI snd_oss_info_register(SNDRV_OSS_INFO_DEV_MIDI, rmidi->card->number, rmidi->name); #endif } } if (!rawmidi_is_ump(rmidi) && (int)rmidi->device == amidi_map[rmidi->card->number]) { if (snd_register_oss_device(SNDRV_OSS_DEVICE_TYPE_MIDI, rmidi->card, 1, &snd_rawmidi_f_ops, rmidi) < 0) { rmidi_err(rmidi, "unable to register OSS rawmidi device %i:%i\n", rmidi->card->number, 1); } else { rmidi->ossreg++; } } #endif /* CONFIG_SND_OSSEMUL */ sprintf(name, "midi%d", rmidi->device); entry = snd_info_create_card_entry(rmidi->card, name, rmidi->card->proc_root); if (entry) { entry->private_data = rmidi; entry->c.text.read = 
snd_rawmidi_proc_info_read; if (snd_info_register(entry) < 0) { snd_info_free_entry(entry); entry = NULL; } } rmidi->proc_entry = entry; #if IS_ENABLED(CONFIG_SND_SEQUENCER) /* no own registration mechanism? */ if (!rmidi->ops || !rmidi->ops->dev_register) { if (snd_seq_device_new(rmidi->card, rmidi->device, SNDRV_SEQ_DEV_ID_MIDISYNTH, 0, &rmidi->seq_dev) >= 0) { rmidi->seq_dev->private_data = rmidi; rmidi->seq_dev->private_free = snd_rawmidi_dev_seq_free; sprintf(rmidi->seq_dev->name, "MIDI %d-%d", rmidi->card->number, rmidi->device); snd_device_register(rmidi->card, rmidi->seq_dev); } } #endif return 0; error_unregister: snd_unregister_device(rmidi->dev); error: scoped_guard(mutex, ®ister_mutex) list_del(&rmidi->list); return err; } static int snd_rawmidi_dev_disconnect(struct snd_device *device) { struct snd_rawmidi *rmidi = device->device_data; int dir; guard(mutex)(®ister_mutex); guard(mutex)(&rmidi->open_mutex); wake_up(&rmidi->open_wait); list_del_init(&rmidi->list); for (dir = 0; dir < 2; dir++) { struct snd_rawmidi_substream *s; list_for_each_entry(s, &rmidi->streams[dir].substreams, list) { if (s->runtime) wake_up(&s->runtime->sleep); } } #ifdef CONFIG_SND_OSSEMUL if (rmidi->ossreg) { if ((int)rmidi->device == midi_map[rmidi->card->number]) { snd_unregister_oss_device(SNDRV_OSS_DEVICE_TYPE_MIDI, rmidi->card, 0); #ifdef SNDRV_OSS_INFO_DEV_MIDI snd_oss_info_unregister(SNDRV_OSS_INFO_DEV_MIDI, rmidi->card->number); #endif } if ((int)rmidi->device == amidi_map[rmidi->card->number]) snd_unregister_oss_device(SNDRV_OSS_DEVICE_TYPE_MIDI, rmidi->card, 1); rmidi->ossreg = 0; } #endif /* CONFIG_SND_OSSEMUL */ snd_unregister_device(rmidi->dev); return 0; } /** * snd_rawmidi_set_ops - set the rawmidi operators * @rmidi: the rawmidi instance * @stream: the stream direction, SNDRV_RAWMIDI_STREAM_XXX * @ops: the operator table * * Sets the rawmidi operators for the given stream direction. */ void snd_rawmidi_set_ops(struct snd_rawmidi *rmidi, int stream, const struct snd_rawmidi_ops *ops) { struct snd_rawmidi_substream *substream; list_for_each_entry(substream, &rmidi->streams[stream].substreams, list) substream->ops = ops; } EXPORT_SYMBOL(snd_rawmidi_set_ops); /* * ENTRY functions */ static int __init alsa_rawmidi_init(void) { snd_ctl_register_ioctl(snd_rawmidi_control_ioctl); snd_ctl_register_ioctl_compat(snd_rawmidi_control_ioctl); #ifdef CONFIG_SND_OSSEMUL { int i; /* check device map table */ for (i = 0; i < SNDRV_CARDS; i++) { if (midi_map[i] < 0 || midi_map[i] >= SNDRV_RAWMIDI_DEVICES) { pr_err("ALSA: rawmidi: invalid midi_map[%d] = %d\n", i, midi_map[i]); midi_map[i] = 0; } if (amidi_map[i] < 0 || amidi_map[i] >= SNDRV_RAWMIDI_DEVICES) { pr_err("ALSA: rawmidi: invalid amidi_map[%d] = %d\n", i, amidi_map[i]); amidi_map[i] = 1; } } } #endif /* CONFIG_SND_OSSEMUL */ return 0; } static void __exit alsa_rawmidi_exit(void) { snd_ctl_unregister_ioctl(snd_rawmidi_control_ioctl); snd_ctl_unregister_ioctl_compat(snd_rawmidi_control_ioctl); } module_init(alsa_rawmidi_init) module_exit(alsa_rawmidi_exit) |
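/*
 * A minimal driver-side sketch of the API above, for illustration only.
 * All mychip_* identifiers and struct mychip are hypothetical placeholders;
 * only snd_rawmidi_new(), snd_rawmidi_set_ops() and snd_rawmidi_transmit()
 * are the real interfaces documented in this file.
 */
#include <linux/types.h>
#include <sound/core.h>
#include <sound/rawmidi.h>

struct mychip;						/* hypothetical chip state */
bool mychip_fifo_space(struct mychip *chip);		/* hypothetical helper */
void mychip_fifo_put(struct mychip *chip, unsigned char byte);	/* hypothetical helper */

static int mychip_output_open(struct snd_rawmidi_substream *substream)
{
	return 0;	/* e.g. enable the UART transmitter */
}

static int mychip_output_close(struct snd_rawmidi_substream *substream)
{
	return 0;
}

static void mychip_output_trigger(struct snd_rawmidi_substream *substream, int up)
{
	struct mychip *chip = substream->rmidi->private_data;
	unsigned char byte;

	if (!up)
		return;
	/* push as many queued bytes as the hardware FIFO accepts right now */
	while (mychip_fifo_space(chip) &&
	       snd_rawmidi_transmit(substream, &byte, 1) == 1)
		mychip_fifo_put(chip, byte);
}

static const struct snd_rawmidi_ops mychip_output_ops = {
	.open		= mychip_output_open,
	.close		= mychip_output_close,
	.trigger	= mychip_output_trigger,
};

static int __maybe_unused mychip_create_rawmidi(struct snd_card *card, struct mychip *chip)
{
	struct snd_rawmidi *rmidi;
	int err;

	err = snd_rawmidi_new(card, "MyChip MIDI", 0, 1, 0, &rmidi);
	if (err < 0)
		return err;
	rmidi->private_data = chip;
	strscpy(rmidi->name, "MyChip MIDI", sizeof(rmidi->name));
	rmidi->info_flags = SNDRV_RAWMIDI_INFO_OUTPUT;
	snd_rawmidi_set_ops(rmidi, SNDRV_RAWMIDI_STREAM_OUTPUT, &mychip_output_ops);
	return 0;
}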
// SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 2010-2013 Felix Fietkau <nbd@openwrt.org> * Copyright (C) 2019-2022 Intel Corporation */ #include <linux/netdevice.h> #include <linux/types.h> #include <linux/skbuff.h> #include <linux/debugfs.h> #include <linux/random.h> #include <linux/moduleparam.h> #include <linux/ieee80211.h> #include <linux/minmax.h> #include <net/mac80211.h> #include "rate.h" #include "sta_info.h" #include "rc80211_minstrel_ht.h" #define AVG_AMPDU_SIZE 16 #define AVG_PKT_SIZE 1200 /* Number of bits for an average sized packet */ #define MCS_NBITS ((AVG_PKT_SIZE * AVG_AMPDU_SIZE) << 3) /* Number of symbols for a packet with (bps) bits per symbol */ #define MCS_NSYMS(bps) DIV_ROUND_UP(MCS_NBITS, (bps)) /* Transmission time (nanoseconds) for a packet containing (syms) symbols */ #define MCS_SYMBOL_TIME(sgi, syms) \ (sgi ?
\ ((syms) * 18000 + 4000) / 5 : /* syms * 3.6 us */ \ ((syms) * 1000) << 2 /* syms * 4 us */ \ ) /* Transmit duration for the raw data part of an average sized packet */ #define MCS_DURATION(streams, sgi, bps) \ (MCS_SYMBOL_TIME(sgi, MCS_NSYMS((streams) * (bps))) / AVG_AMPDU_SIZE) #define BW_20 0 #define BW_40 1 #define BW_80 2 /* * Define group sort order: HT40 -> SGI -> #streams */ #define GROUP_IDX(_streams, _sgi, _ht40) \ MINSTREL_HT_GROUP_0 + \ MINSTREL_MAX_STREAMS * 2 * _ht40 + \ MINSTREL_MAX_STREAMS * _sgi + \ _streams - 1 #define _MAX(a, b) (((a)>(b))?(a):(b)) #define GROUP_SHIFT(duration) \ _MAX(0, 16 - __builtin_clz(duration)) /* MCS rate information for an MCS group */ #define __MCS_GROUP(_streams, _sgi, _ht40, _s) \ [GROUP_IDX(_streams, _sgi, _ht40)] = { \ .streams = _streams, \ .shift = _s, \ .bw = _ht40, \ .flags = \ IEEE80211_TX_RC_MCS | \ (_sgi ? IEEE80211_TX_RC_SHORT_GI : 0) | \ (_ht40 ? IEEE80211_TX_RC_40_MHZ_WIDTH : 0), \ .duration = { \ MCS_DURATION(_streams, _sgi, _ht40 ? 54 : 26) >> _s, \ MCS_DURATION(_streams, _sgi, _ht40 ? 108 : 52) >> _s, \ MCS_DURATION(_streams, _sgi, _ht40 ? 162 : 78) >> _s, \ MCS_DURATION(_streams, _sgi, _ht40 ? 216 : 104) >> _s, \ MCS_DURATION(_streams, _sgi, _ht40 ? 324 : 156) >> _s, \ MCS_DURATION(_streams, _sgi, _ht40 ? 432 : 208) >> _s, \ MCS_DURATION(_streams, _sgi, _ht40 ? 486 : 234) >> _s, \ MCS_DURATION(_streams, _sgi, _ht40 ? 540 : 260) >> _s \ } \ } #define MCS_GROUP_SHIFT(_streams, _sgi, _ht40) \ GROUP_SHIFT(MCS_DURATION(_streams, _sgi, _ht40 ? 54 : 26)) #define MCS_GROUP(_streams, _sgi, _ht40) \ __MCS_GROUP(_streams, _sgi, _ht40, \ MCS_GROUP_SHIFT(_streams, _sgi, _ht40)) #define VHT_GROUP_IDX(_streams, _sgi, _bw) \ (MINSTREL_VHT_GROUP_0 + \ MINSTREL_MAX_STREAMS * 2 * (_bw) + \ MINSTREL_MAX_STREAMS * (_sgi) + \ (_streams) - 1) #define BW2VBPS(_bw, r3, r2, r1) \ (_bw == BW_80 ? r3 : _bw == BW_40 ? r2 : r1) #define __VHT_GROUP(_streams, _sgi, _bw, _s) \ [VHT_GROUP_IDX(_streams, _sgi, _bw)] = { \ .streams = _streams, \ .shift = _s, \ .bw = _bw, \ .flags = \ IEEE80211_TX_RC_VHT_MCS | \ (_sgi ? IEEE80211_TX_RC_SHORT_GI : 0) | \ (_bw == BW_80 ? IEEE80211_TX_RC_80_MHZ_WIDTH : \ _bw == BW_40 ? IEEE80211_TX_RC_40_MHZ_WIDTH : 0), \ .duration = { \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 117, 54, 26)) >> _s, \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 234, 108, 52)) >> _s, \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 351, 162, 78)) >> _s, \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 468, 216, 104)) >> _s, \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 702, 324, 156)) >> _s, \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 936, 432, 208)) >> _s, \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 1053, 486, 234)) >> _s, \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 1170, 540, 260)) >> _s, \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 1404, 648, 312)) >> _s, \ MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 1560, 720, 346)) >> _s \ } \ } #define VHT_GROUP_SHIFT(_streams, _sgi, _bw) \ GROUP_SHIFT(MCS_DURATION(_streams, _sgi, \ BW2VBPS(_bw, 117, 54, 26))) #define VHT_GROUP(_streams, _sgi, _bw) \ __VHT_GROUP(_streams, _sgi, _bw, \ VHT_GROUP_SHIFT(_streams, _sgi, _bw)) #define CCK_DURATION(_bitrate, _short) \ (1000 * (10 /* SIFS */ + \ (_short ? 
72 + 24 : 144 + 48) + \ (8 * (AVG_PKT_SIZE + 4) * 10) / (_bitrate))) #define CCK_DURATION_LIST(_short, _s) \ CCK_DURATION(10, _short) >> _s, \ CCK_DURATION(20, _short) >> _s, \ CCK_DURATION(55, _short) >> _s, \ CCK_DURATION(110, _short) >> _s #define __CCK_GROUP(_s) \ [MINSTREL_CCK_GROUP] = { \ .streams = 1, \ .flags = 0, \ .shift = _s, \ .duration = { \ CCK_DURATION_LIST(false, _s), \ CCK_DURATION_LIST(true, _s) \ } \ } #define CCK_GROUP_SHIFT \ GROUP_SHIFT(CCK_DURATION(10, false)) #define CCK_GROUP __CCK_GROUP(CCK_GROUP_SHIFT) #define OFDM_DURATION(_bitrate) \ (1000 * (16 /* SIFS + signal ext */ + \ 16 /* T_PREAMBLE */ + \ 4 /* T_SIGNAL */ + \ 4 * (((16 + 80 * (AVG_PKT_SIZE + 4) + 6) / \ ((_bitrate) * 4))))) #define OFDM_DURATION_LIST(_s) \ OFDM_DURATION(60) >> _s, \ OFDM_DURATION(90) >> _s, \ OFDM_DURATION(120) >> _s, \ OFDM_DURATION(180) >> _s, \ OFDM_DURATION(240) >> _s, \ OFDM_DURATION(360) >> _s, \ OFDM_DURATION(480) >> _s, \ OFDM_DURATION(540) >> _s #define __OFDM_GROUP(_s) \ [MINSTREL_OFDM_GROUP] = { \ .streams = 1, \ .flags = 0, \ .shift = _s, \ .duration = { \ OFDM_DURATION_LIST(_s), \ } \ } #define OFDM_GROUP_SHIFT \ GROUP_SHIFT(OFDM_DURATION(60)) #define OFDM_GROUP __OFDM_GROUP(OFDM_GROUP_SHIFT) static bool minstrel_vht_only = true; module_param(minstrel_vht_only, bool, 0644); MODULE_PARM_DESC(minstrel_vht_only, "Use only VHT rates when VHT is supported by sta."); /* * To enable sufficiently targeted rate sampling, MCS rates are divided into * groups, based on the number of streams and flags (HT40, SGI) that they * use. * * Sortorder has to be fixed for GROUP_IDX macro to be applicable: * BW -> SGI -> #streams */ const struct mcs_group minstrel_mcs_groups[] = { MCS_GROUP(1, 0, BW_20), MCS_GROUP(2, 0, BW_20), MCS_GROUP(3, 0, BW_20), MCS_GROUP(4, 0, BW_20), MCS_GROUP(1, 1, BW_20), MCS_GROUP(2, 1, BW_20), MCS_GROUP(3, 1, BW_20), MCS_GROUP(4, 1, BW_20), MCS_GROUP(1, 0, BW_40), MCS_GROUP(2, 0, BW_40), MCS_GROUP(3, 0, BW_40), MCS_GROUP(4, 0, BW_40), MCS_GROUP(1, 1, BW_40), MCS_GROUP(2, 1, BW_40), MCS_GROUP(3, 1, BW_40), MCS_GROUP(4, 1, BW_40), CCK_GROUP, OFDM_GROUP, VHT_GROUP(1, 0, BW_20), VHT_GROUP(2, 0, BW_20), VHT_GROUP(3, 0, BW_20), VHT_GROUP(4, 0, BW_20), VHT_GROUP(1, 1, BW_20), VHT_GROUP(2, 1, BW_20), VHT_GROUP(3, 1, BW_20), VHT_GROUP(4, 1, BW_20), VHT_GROUP(1, 0, BW_40), VHT_GROUP(2, 0, BW_40), VHT_GROUP(3, 0, BW_40), VHT_GROUP(4, 0, BW_40), VHT_GROUP(1, 1, BW_40), VHT_GROUP(2, 1, BW_40), VHT_GROUP(3, 1, BW_40), VHT_GROUP(4, 1, BW_40), VHT_GROUP(1, 0, BW_80), VHT_GROUP(2, 0, BW_80), VHT_GROUP(3, 0, BW_80), VHT_GROUP(4, 0, BW_80), VHT_GROUP(1, 1, BW_80), VHT_GROUP(2, 1, BW_80), VHT_GROUP(3, 1, BW_80), VHT_GROUP(4, 1, BW_80), }; const s16 minstrel_cck_bitrates[4] = { 10, 20, 55, 110 }; const s16 minstrel_ofdm_bitrates[8] = { 60, 90, 120, 180, 240, 360, 480, 540 }; static u8 sample_table[SAMPLE_COLUMNS][MCS_GROUP_RATES] __read_mostly; static const u8 minstrel_sample_seq[] = { MINSTREL_SAMPLE_TYPE_INC, MINSTREL_SAMPLE_TYPE_JUMP, MINSTREL_SAMPLE_TYPE_INC, MINSTREL_SAMPLE_TYPE_JUMP, MINSTREL_SAMPLE_TYPE_INC, MINSTREL_SAMPLE_TYPE_SLOW, }; static void minstrel_ht_update_rates(struct minstrel_priv *mp, struct minstrel_ht_sta *mi); /* * Some VHT MCSes are invalid (when Ndbps / Nes is not an integer) * e.g for MCS9@20MHzx1Nss: Ndbps=8x52*(5/6) Nes=1 * * Returns the valid mcs map for struct minstrel_mcs_group_data.supported */ static u16 minstrel_get_valid_vht_rates(int bw, int nss, __le16 mcs_map) { u16 mask = 0; if (bw == BW_20) { if (nss != 3 && nss != 6) mask = BIT(9); } else if (bw 
== BW_80) { if (nss == 3 || nss == 7) mask = BIT(6); else if (nss == 6) mask = BIT(9); } else { WARN_ON(bw != BW_40); } switch ((le16_to_cpu(mcs_map) >> (2 * (nss - 1))) & 3) { case IEEE80211_VHT_MCS_SUPPORT_0_7: mask |= 0x300; break; case IEEE80211_VHT_MCS_SUPPORT_0_8: mask |= 0x200; break; case IEEE80211_VHT_MCS_SUPPORT_0_9: break; default: mask = 0x3ff; } return 0x3ff & ~mask; } static bool minstrel_ht_is_legacy_group(int group) { return group == MINSTREL_CCK_GROUP || group == MINSTREL_OFDM_GROUP; } /* * Look up an MCS group index based on mac80211 rate information */ static int minstrel_ht_get_group_idx(struct ieee80211_tx_rate *rate) { return GROUP_IDX((rate->idx / 8) + 1, !!(rate->flags & IEEE80211_TX_RC_SHORT_GI), !!(rate->flags & IEEE80211_TX_RC_40_MHZ_WIDTH)); } /* * Look up an MCS group index based on new cfg80211 rate_info. */ static int minstrel_ht_ri_get_group_idx(struct rate_info *rate) { return GROUP_IDX((rate->mcs / 8) + 1, !!(rate->flags & RATE_INFO_FLAGS_SHORT_GI), !!(rate->bw & RATE_INFO_BW_40)); } static int minstrel_vht_get_group_idx(struct ieee80211_tx_rate *rate) { return VHT_GROUP_IDX(ieee80211_rate_get_vht_nss(rate), !!(rate->flags & IEEE80211_TX_RC_SHORT_GI), !!(rate->flags & IEEE80211_TX_RC_40_MHZ_WIDTH) + 2*!!(rate->flags & IEEE80211_TX_RC_80_MHZ_WIDTH)); } /* * Look up an MCS group index based on new cfg80211 rate_info. */ static int minstrel_vht_ri_get_group_idx(struct rate_info *rate) { return VHT_GROUP_IDX(rate->nss, !!(rate->flags & RATE_INFO_FLAGS_SHORT_GI), !!(rate->bw & RATE_INFO_BW_40) + 2*!!(rate->bw & RATE_INFO_BW_80)); } static struct minstrel_rate_stats * minstrel_ht_get_stats(struct minstrel_priv *mp, struct minstrel_ht_sta *mi, struct ieee80211_tx_rate *rate) { int group, idx; if (rate->flags & IEEE80211_TX_RC_MCS) { group = minstrel_ht_get_group_idx(rate); idx = rate->idx % 8; goto out; } if (rate->flags & IEEE80211_TX_RC_VHT_MCS) { group = minstrel_vht_get_group_idx(rate); idx = ieee80211_rate_get_vht_mcs(rate); goto out; } group = MINSTREL_CCK_GROUP; for (idx = 0; idx < ARRAY_SIZE(mp->cck_rates); idx++) { if (!(mi->supported[group] & BIT(idx))) continue; if (rate->idx != mp->cck_rates[idx]) continue; /* short preamble */ if ((mi->supported[group] & BIT(idx + 4)) && (rate->flags & IEEE80211_TX_RC_USE_SHORT_PREAMBLE)) idx += 4; goto out; } group = MINSTREL_OFDM_GROUP; for (idx = 0; idx < ARRAY_SIZE(mp->ofdm_rates[0]); idx++) if (rate->idx == mp->ofdm_rates[mi->band][idx]) goto out; idx = 0; out: return &mi->groups[group].rates[idx]; } /* * Get the minstrel rate statistics for specified STA and rate info. 
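 * This is the rate_info-based counterpart of minstrel_ht_get_stats() above,
 * used from minstrel_ht_tx_status() when the driver reports per-rate status
 * through struct ieee80211_rate_status (st->rates).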
*/ static struct minstrel_rate_stats * minstrel_ht_ri_get_stats(struct minstrel_priv *mp, struct minstrel_ht_sta *mi, struct ieee80211_rate_status *rate_status) { int group, idx; struct rate_info *rate = &rate_status->rate_idx; if (rate->flags & RATE_INFO_FLAGS_MCS) { group = minstrel_ht_ri_get_group_idx(rate); idx = rate->mcs % 8; goto out; } if (rate->flags & RATE_INFO_FLAGS_VHT_MCS) { group = minstrel_vht_ri_get_group_idx(rate); idx = rate->mcs; goto out; } group = MINSTREL_CCK_GROUP; for (idx = 0; idx < ARRAY_SIZE(mp->cck_rates); idx++) { if (rate->legacy != minstrel_cck_bitrates[ mp->cck_rates[idx] ]) continue; /* short preamble */ if ((mi->supported[group] & BIT(idx + 4)) && mi->use_short_preamble) idx += 4; goto out; } group = MINSTREL_OFDM_GROUP; for (idx = 0; idx < ARRAY_SIZE(mp->ofdm_rates[0]); idx++) if (rate->legacy == minstrel_ofdm_bitrates[ mp->ofdm_rates[mi->band][idx] ]) goto out; idx = 0; out: return &mi->groups[group].rates[idx]; } static inline struct minstrel_rate_stats * minstrel_get_ratestats(struct minstrel_ht_sta *mi, int index) { return &mi->groups[MI_RATE_GROUP(index)].rates[MI_RATE_IDX(index)]; } static inline int minstrel_get_duration(int index) { const struct mcs_group *group = &minstrel_mcs_groups[MI_RATE_GROUP(index)]; unsigned int duration = group->duration[MI_RATE_IDX(index)]; return duration << group->shift; } static unsigned int minstrel_ht_avg_ampdu_len(struct minstrel_ht_sta *mi) { int duration; if (mi->avg_ampdu_len) return MINSTREL_TRUNC(mi->avg_ampdu_len); if (minstrel_ht_is_legacy_group(MI_RATE_GROUP(mi->max_tp_rate[0]))) return 1; duration = minstrel_get_duration(mi->max_tp_rate[0]); if (duration > 400 * 1000) return 2; if (duration > 250 * 1000) return 4; if (duration > 150 * 1000) return 8; return 16; } /* * Return current throughput based on the average A-MPDU length, taking into * account the expected number of retransmissions and their expected length */ int minstrel_ht_get_tp_avg(struct minstrel_ht_sta *mi, int group, int rate, int prob_avg) { unsigned int nsecs = 0, overhead = mi->overhead; unsigned int ampdu_len = 1; /* do not account throughput if success prob is below 10% */ if (prob_avg < MINSTREL_FRAC(10, 100)) return 0; if (minstrel_ht_is_legacy_group(group)) overhead = mi->overhead_legacy; else ampdu_len = minstrel_ht_avg_ampdu_len(mi); nsecs = 1000 * overhead / ampdu_len; nsecs += minstrel_mcs_groups[group].duration[rate] << minstrel_mcs_groups[group].shift; /* * For the throughput calculation, limit the probability value to 90% to * account for collision related packet error rate fluctuation * (prob is scaled - see MINSTREL_FRAC above) */ if (prob_avg > MINSTREL_FRAC(90, 100)) prob_avg = MINSTREL_FRAC(90, 100); return MINSTREL_TRUNC(100 * ((prob_avg * 1000000) / nsecs)); } /* * Find & sort topmost throughput rates * * If multiple rates provide equal throughput the sorting is based on their * current success probability. Higher success probability is preferred among * MCS groups, CCK rates do not provide aggregation and are therefore at last. 
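 *
 * For example: a candidate rate that outperforms the entries at positions
 * 1..3 of the (descending) list but not the entry at position 0 is inserted
 * at position 1; the old entries at 1 and 2 shift down and the previous
 * last entry is dropped.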
*/ static void minstrel_ht_sort_best_tp_rates(struct minstrel_ht_sta *mi, u16 index, u16 *tp_list) { int cur_group, cur_idx, cur_tp_avg, cur_prob; int tmp_group, tmp_idx, tmp_tp_avg, tmp_prob; int j = MAX_THR_RATES; cur_group = MI_RATE_GROUP(index); cur_idx = MI_RATE_IDX(index); cur_prob = mi->groups[cur_group].rates[cur_idx].prob_avg; cur_tp_avg = minstrel_ht_get_tp_avg(mi, cur_group, cur_idx, cur_prob); do { tmp_group = MI_RATE_GROUP(tp_list[j - 1]); tmp_idx = MI_RATE_IDX(tp_list[j - 1]); tmp_prob = mi->groups[tmp_group].rates[tmp_idx].prob_avg; tmp_tp_avg = minstrel_ht_get_tp_avg(mi, tmp_group, tmp_idx, tmp_prob); if (cur_tp_avg < tmp_tp_avg || (cur_tp_avg == tmp_tp_avg && cur_prob <= tmp_prob)) break; j--; } while (j > 0); if (j < MAX_THR_RATES - 1) { memmove(&tp_list[j + 1], &tp_list[j], (sizeof(*tp_list) * (MAX_THR_RATES - (j + 1)))); } if (j < MAX_THR_RATES) tp_list[j] = index; } /* * Find and set the topmost probability rate per sta and per group */ static void minstrel_ht_set_best_prob_rate(struct minstrel_ht_sta *mi, u16 *dest, u16 index) { struct minstrel_mcs_group_data *mg; struct minstrel_rate_stats *mrs; int tmp_group, tmp_idx, tmp_tp_avg, tmp_prob; int max_tp_group, max_tp_idx, max_tp_prob; int cur_tp_avg, cur_group, cur_idx; int max_gpr_group, max_gpr_idx; int max_gpr_tp_avg, max_gpr_prob; cur_group = MI_RATE_GROUP(index); cur_idx = MI_RATE_IDX(index); mg = &mi->groups[cur_group]; mrs = &mg->rates[cur_idx]; tmp_group = MI_RATE_GROUP(*dest); tmp_idx = MI_RATE_IDX(*dest); tmp_prob = mi->groups[tmp_group].rates[tmp_idx].prob_avg; tmp_tp_avg = minstrel_ht_get_tp_avg(mi, tmp_group, tmp_idx, tmp_prob); /* if max_tp_rate[0] is from MCS_GROUP max_prob_rate get selected from * MCS_GROUP as well as CCK_GROUP rates do not allow aggregation */ max_tp_group = MI_RATE_GROUP(mi->max_tp_rate[0]); max_tp_idx = MI_RATE_IDX(mi->max_tp_rate[0]); max_tp_prob = mi->groups[max_tp_group].rates[max_tp_idx].prob_avg; if (minstrel_ht_is_legacy_group(MI_RATE_GROUP(index)) && !minstrel_ht_is_legacy_group(max_tp_group)) return; /* skip rates faster than max tp rate with lower prob */ if (minstrel_get_duration(mi->max_tp_rate[0]) > minstrel_get_duration(index) && mrs->prob_avg < max_tp_prob) return; max_gpr_group = MI_RATE_GROUP(mg->max_group_prob_rate); max_gpr_idx = MI_RATE_IDX(mg->max_group_prob_rate); max_gpr_prob = mi->groups[max_gpr_group].rates[max_gpr_idx].prob_avg; if (mrs->prob_avg > MINSTREL_FRAC(75, 100)) { cur_tp_avg = minstrel_ht_get_tp_avg(mi, cur_group, cur_idx, mrs->prob_avg); if (cur_tp_avg > tmp_tp_avg) *dest = index; max_gpr_tp_avg = minstrel_ht_get_tp_avg(mi, max_gpr_group, max_gpr_idx, max_gpr_prob); if (cur_tp_avg > max_gpr_tp_avg) mg->max_group_prob_rate = index; } else { if (mrs->prob_avg > tmp_prob) *dest = index; if (mrs->prob_avg > max_gpr_prob) mg->max_group_prob_rate = index; } } /* * Assign new rate set per sta and use CCK rates only if the fastest * rate (max_tp_rate[0]) is from CCK group. This prohibits such sorted * rate sets where MCS and CCK rates are mixed, because CCK rates can * not use aggregation. 
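 * In practice: if the best legacy (CCK/OFDM) rate yields a higher estimated
 * throughput than the best MCS rate, the legacy candidates are sorted into
 * the MCS throughput list one by one, displacing slower MCS entries.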
*/ static void minstrel_ht_assign_best_tp_rates(struct minstrel_ht_sta *mi, u16 tmp_mcs_tp_rate[MAX_THR_RATES], u16 tmp_legacy_tp_rate[MAX_THR_RATES]) { unsigned int tmp_group, tmp_idx, tmp_cck_tp, tmp_mcs_tp, tmp_prob; int i; tmp_group = MI_RATE_GROUP(tmp_legacy_tp_rate[0]); tmp_idx = MI_RATE_IDX(tmp_legacy_tp_rate[0]); tmp_prob = mi->groups[tmp_group].rates[tmp_idx].prob_avg; tmp_cck_tp = minstrel_ht_get_tp_avg(mi, tmp_group, tmp_idx, tmp_prob); tmp_group = MI_RATE_GROUP(tmp_mcs_tp_rate[0]); tmp_idx = MI_RATE_IDX(tmp_mcs_tp_rate[0]); tmp_prob = mi->groups[tmp_group].rates[tmp_idx].prob_avg; tmp_mcs_tp = minstrel_ht_get_tp_avg(mi, tmp_group, tmp_idx, tmp_prob); if (tmp_cck_tp > tmp_mcs_tp) { for(i = 0; i < MAX_THR_RATES; i++) { minstrel_ht_sort_best_tp_rates(mi, tmp_legacy_tp_rate[i], tmp_mcs_tp_rate); } } } /* * Try to increase robustness of max_prob rate by decrease number of * streams if possible. */ static inline void minstrel_ht_prob_rate_reduce_streams(struct minstrel_ht_sta *mi) { struct minstrel_mcs_group_data *mg; int tmp_max_streams, group, tmp_idx, tmp_prob; int tmp_tp = 0; if (!mi->sta->deflink.ht_cap.ht_supported) return; group = MI_RATE_GROUP(mi->max_tp_rate[0]); tmp_max_streams = minstrel_mcs_groups[group].streams; for (group = 0; group < ARRAY_SIZE(minstrel_mcs_groups); group++) { mg = &mi->groups[group]; if (!mi->supported[group] || group == MINSTREL_CCK_GROUP) continue; tmp_idx = MI_RATE_IDX(mg->max_group_prob_rate); tmp_prob = mi->groups[group].rates[tmp_idx].prob_avg; if (tmp_tp < minstrel_ht_get_tp_avg(mi, group, tmp_idx, tmp_prob) && (minstrel_mcs_groups[group].streams < tmp_max_streams)) { mi->max_prob_rate = mg->max_group_prob_rate; tmp_tp = minstrel_ht_get_tp_avg(mi, group, tmp_idx, tmp_prob); } } } static u16 __minstrel_ht_get_sample_rate(struct minstrel_ht_sta *mi, enum minstrel_sample_type type) { u16 *rates = mi->sample[type].sample_rates; u16 cur; int i; for (i = 0; i < MINSTREL_SAMPLE_RATES; i++) { if (!rates[i]) continue; cur = rates[i]; rates[i] = 0; return cur; } return 0; } static inline int minstrel_ewma(int old, int new, int weight) { int diff, incr; diff = new - old; incr = (EWMA_DIV - weight) * diff / EWMA_DIV; return old + incr; } static inline int minstrel_filter_avg_add(u16 *prev_1, u16 *prev_2, s32 in) { s32 out_1 = *prev_1; s32 out_2 = *prev_2; s32 val; if (!in) in += 1; if (!out_1) { val = out_1 = in; goto out; } val = MINSTREL_AVG_COEFF1 * in; val += MINSTREL_AVG_COEFF2 * out_1; val += MINSTREL_AVG_COEFF3 * out_2; val >>= MINSTREL_SCALE; if (val > 1 << MINSTREL_SCALE) val = 1 << MINSTREL_SCALE; if (val < 0) val = 1; out: *prev_2 = out_1; *prev_1 = val; return val; } /* * Recalculate statistics and counters of a given rate */ static void minstrel_ht_calc_rate_stats(struct minstrel_priv *mp, struct minstrel_rate_stats *mrs) { unsigned int cur_prob; if (unlikely(mrs->attempts > 0)) { cur_prob = MINSTREL_FRAC(mrs->success, mrs->attempts); minstrel_filter_avg_add(&mrs->prob_avg, &mrs->prob_avg_1, cur_prob); mrs->att_hist += mrs->attempts; mrs->succ_hist += mrs->success; } mrs->last_success = mrs->success; mrs->last_attempts = mrs->attempts; mrs->success = 0; mrs->attempts = 0; } static bool minstrel_ht_find_sample_rate(struct minstrel_ht_sta *mi, int type, int idx) { int i; for (i = 0; i < MINSTREL_SAMPLE_RATES; i++) { u16 cur = mi->sample[type].sample_rates[i]; if (cur == idx) return true; if (!cur) break; } return false; } static int minstrel_ht_move_sample_rates(struct minstrel_ht_sta *mi, int type, u32 fast_rate_dur, u32 slow_rate_dur) { u16 
*rates = mi->sample[type].sample_rates; int i, j; for (i = 0, j = 0; i < MINSTREL_SAMPLE_RATES; i++) { u32 duration; bool valid = false; u16 cur; cur = rates[i]; if (!cur) continue; duration = minstrel_get_duration(cur); switch (type) { case MINSTREL_SAMPLE_TYPE_SLOW: valid = duration > fast_rate_dur && duration < slow_rate_dur; break; case MINSTREL_SAMPLE_TYPE_INC: case MINSTREL_SAMPLE_TYPE_JUMP: valid = duration < fast_rate_dur; break; default: valid = false; break; } if (!valid) { rates[i] = 0; continue; } if (i == j) continue; rates[j++] = cur; rates[i] = 0; } return j; } static int minstrel_ht_group_min_rate_offset(struct minstrel_ht_sta *mi, int group, u32 max_duration) { u16 supported = mi->supported[group]; int i; for (i = 0; i < MCS_GROUP_RATES && supported; i++, supported >>= 1) { if (!(supported & BIT(0))) continue; if (minstrel_get_duration(MI_RATE(group, i)) >= max_duration) continue; return i; } return -1; } /* * Incremental update rates: * Flip through groups and pick the first group rate that is faster than the * highest currently selected rate */ static u16 minstrel_ht_next_inc_rate(struct minstrel_ht_sta *mi, u32 fast_rate_dur) { u8 type = MINSTREL_SAMPLE_TYPE_INC; int i, index = 0; u8 group; group = mi->sample[type].sample_group; for (i = 0; i < ARRAY_SIZE(minstrel_mcs_groups); i++) { group = (group + 1) % ARRAY_SIZE(minstrel_mcs_groups); index = minstrel_ht_group_min_rate_offset(mi, group, fast_rate_dur); if (index < 0) continue; index = MI_RATE(group, index & 0xf); if (!minstrel_ht_find_sample_rate(mi, type, index)) goto out; } index = 0; out: mi->sample[type].sample_group = group; return index; } static int minstrel_ht_next_group_sample_rate(struct minstrel_ht_sta *mi, int group, u16 supported, int offset) { struct minstrel_mcs_group_data *mg = &mi->groups[group]; u16 idx; int i; for (i = 0; i < MCS_GROUP_RATES; i++) { idx = sample_table[mg->column][mg->index]; if (++mg->index >= MCS_GROUP_RATES) { mg->index = 0; if (++mg->column >= ARRAY_SIZE(sample_table)) mg->column = 0; } if (idx < offset) continue; if (!(supported & BIT(idx))) continue; return MI_RATE(group, idx); } return -1; } /* * Jump rates: * Sample random rates, use those that are faster than the highest * currently selected rate. 
Rates between the fastest and the slowest * get sorted into the slow sample bucket, but only if it has room */ static u16 minstrel_ht_next_jump_rate(struct minstrel_ht_sta *mi, u32 fast_rate_dur, u32 slow_rate_dur, int *slow_rate_ofs) { struct minstrel_rate_stats *mrs; u32 max_duration = slow_rate_dur; int i, index, offset; u16 *slow_rates; u16 supported; u32 duration; u8 group; if (*slow_rate_ofs >= MINSTREL_SAMPLE_RATES) max_duration = fast_rate_dur; slow_rates = mi->sample[MINSTREL_SAMPLE_TYPE_SLOW].sample_rates; group = mi->sample[MINSTREL_SAMPLE_TYPE_JUMP].sample_group; for (i = 0; i < ARRAY_SIZE(minstrel_mcs_groups); i++) { u8 type; group = (group + 1) % ARRAY_SIZE(minstrel_mcs_groups); supported = mi->supported[group]; if (!supported) continue; offset = minstrel_ht_group_min_rate_offset(mi, group, max_duration); if (offset < 0) continue; index = minstrel_ht_next_group_sample_rate(mi, group, supported, offset); if (index < 0) continue; duration = minstrel_get_duration(index); if (duration < fast_rate_dur) type = MINSTREL_SAMPLE_TYPE_JUMP; else type = MINSTREL_SAMPLE_TYPE_SLOW; if (minstrel_ht_find_sample_rate(mi, type, index)) continue; if (type == MINSTREL_SAMPLE_TYPE_JUMP) goto found; if (*slow_rate_ofs >= MINSTREL_SAMPLE_RATES) continue; if (duration >= slow_rate_dur) continue; /* skip slow rates with high success probability */ mrs = minstrel_get_ratestats(mi, index); if (mrs->prob_avg > MINSTREL_FRAC(95, 100)) continue; slow_rates[(*slow_rate_ofs)++] = index; if (*slow_rate_ofs >= MINSTREL_SAMPLE_RATES) max_duration = fast_rate_dur; } index = 0; found: mi->sample[MINSTREL_SAMPLE_TYPE_JUMP].sample_group = group; return index; } static void minstrel_ht_refill_sample_rates(struct minstrel_ht_sta *mi) { u32 prob_dur = minstrel_get_duration(mi->max_prob_rate); u32 tp_dur = minstrel_get_duration(mi->max_tp_rate[0]); u32 tp2_dur = minstrel_get_duration(mi->max_tp_rate[1]); u32 fast_rate_dur = min(min(tp_dur, tp2_dur), prob_dur); u32 slow_rate_dur = max(max(tp_dur, tp2_dur), prob_dur); u16 *rates; int i, j; rates = mi->sample[MINSTREL_SAMPLE_TYPE_INC].sample_rates; i = minstrel_ht_move_sample_rates(mi, MINSTREL_SAMPLE_TYPE_INC, fast_rate_dur, slow_rate_dur); while (i < MINSTREL_SAMPLE_RATES) { rates[i] = minstrel_ht_next_inc_rate(mi, tp_dur); if (!rates[i]) break; i++; } rates = mi->sample[MINSTREL_SAMPLE_TYPE_JUMP].sample_rates; i = minstrel_ht_move_sample_rates(mi, MINSTREL_SAMPLE_TYPE_JUMP, fast_rate_dur, slow_rate_dur); j = minstrel_ht_move_sample_rates(mi, MINSTREL_SAMPLE_TYPE_SLOW, fast_rate_dur, slow_rate_dur); while (i < MINSTREL_SAMPLE_RATES) { rates[i] = minstrel_ht_next_jump_rate(mi, fast_rate_dur, slow_rate_dur, &j); if (!rates[i]) break; i++; } for (i = 0; i < ARRAY_SIZE(mi->sample); i++) memcpy(mi->sample[i].cur_sample_rates, mi->sample[i].sample_rates, sizeof(mi->sample[i].cur_sample_rates)); } /* * Update rate statistics and select new primary rates * * Rules for rate selection: * - max_prob_rate must use only one stream, as a tradeoff between delivery * probability and throughput during strong fluctuations * - as long as the max prob rate has a probability of more than 75%, pick * higher throughput rates, even if the probability is a bit lower */ static void minstrel_ht_update_stats(struct minstrel_priv *mp, struct minstrel_ht_sta *mi) { struct minstrel_mcs_group_data *mg; struct minstrel_rate_stats *mrs; int group, i, j, cur_prob; u16 tmp_mcs_tp_rate[MAX_THR_RATES], tmp_group_tp_rate[MAX_THR_RATES]; u16 tmp_legacy_tp_rate[MAX_THR_RATES], tmp_max_prob_rate; u16 index; 
bool ht_supported = mi->sta->deflink.ht_cap.ht_supported; if (mi->ampdu_packets > 0) { if (!ieee80211_hw_check(mp->hw, TX_STATUS_NO_AMPDU_LEN)) mi->avg_ampdu_len = minstrel_ewma(mi->avg_ampdu_len, MINSTREL_FRAC(mi->ampdu_len, mi->ampdu_packets), EWMA_LEVEL); else mi->avg_ampdu_len = 0; mi->ampdu_len = 0; mi->ampdu_packets = 0; } if (mi->supported[MINSTREL_CCK_GROUP]) group = MINSTREL_CCK_GROUP; else if (mi->supported[MINSTREL_OFDM_GROUP]) group = MINSTREL_OFDM_GROUP; else group = 0; index = MI_RATE(group, 0); for (j = 0; j < ARRAY_SIZE(tmp_legacy_tp_rate); j++) tmp_legacy_tp_rate[j] = index; if (mi->supported[MINSTREL_VHT_GROUP_0]) group = MINSTREL_VHT_GROUP_0; else if (ht_supported) group = MINSTREL_HT_GROUP_0; else if (mi->supported[MINSTREL_CCK_GROUP]) group = MINSTREL_CCK_GROUP; else group = MINSTREL_OFDM_GROUP; index = MI_RATE(group, 0); tmp_max_prob_rate = index; for (j = 0; j < ARRAY_SIZE(tmp_mcs_tp_rate); j++) tmp_mcs_tp_rate[j] = index; /* Find best rate sets within all MCS groups*/ for (group = 0; group < ARRAY_SIZE(minstrel_mcs_groups); group++) { u16 *tp_rate = tmp_mcs_tp_rate; u16 last_prob = 0; mg = &mi->groups[group]; if (!mi->supported[group]) continue; /* (re)Initialize group rate indexes */ for(j = 0; j < MAX_THR_RATES; j++) tmp_group_tp_rate[j] = MI_RATE(group, 0); if (group == MINSTREL_CCK_GROUP && ht_supported) tp_rate = tmp_legacy_tp_rate; for (i = MCS_GROUP_RATES - 1; i >= 0; i--) { if (!(mi->supported[group] & BIT(i))) continue; index = MI_RATE(group, i); mrs = &mg->rates[i]; mrs->retry_updated = false; minstrel_ht_calc_rate_stats(mp, mrs); if (mrs->att_hist) last_prob = max(last_prob, mrs->prob_avg); else mrs->prob_avg = max(last_prob, mrs->prob_avg); cur_prob = mrs->prob_avg; if (minstrel_ht_get_tp_avg(mi, group, i, cur_prob) == 0) continue; /* Find max throughput rate set */ minstrel_ht_sort_best_tp_rates(mi, index, tp_rate); /* Find max throughput rate set within a group */ minstrel_ht_sort_best_tp_rates(mi, index, tmp_group_tp_rate); } memcpy(mg->max_group_tp_rate, tmp_group_tp_rate, sizeof(mg->max_group_tp_rate)); } /* Assign new rate set per sta */ minstrel_ht_assign_best_tp_rates(mi, tmp_mcs_tp_rate, tmp_legacy_tp_rate); memcpy(mi->max_tp_rate, tmp_mcs_tp_rate, sizeof(mi->max_tp_rate)); for (group = 0; group < ARRAY_SIZE(minstrel_mcs_groups); group++) { if (!mi->supported[group]) continue; mg = &mi->groups[group]; mg->max_group_prob_rate = MI_RATE(group, 0); for (i = 0; i < MCS_GROUP_RATES; i++) { if (!(mi->supported[group] & BIT(i))) continue; index = MI_RATE(group, i); /* Find max probability rate per group and global */ minstrel_ht_set_best_prob_rate(mi, &tmp_max_prob_rate, index); } } mi->max_prob_rate = tmp_max_prob_rate; /* Try to increase robustness of max_prob_rate*/ minstrel_ht_prob_rate_reduce_streams(mi); minstrel_ht_refill_sample_rates(mi); #ifdef CONFIG_MAC80211_DEBUGFS /* use fixed index if set */ if (mp->fixed_rate_idx != -1) { for (i = 0; i < 4; i++) mi->max_tp_rate[i] = mp->fixed_rate_idx; mi->max_prob_rate = mp->fixed_rate_idx; } #endif /* Reset update timer */ mi->last_stats_update = jiffies; mi->sample_time = jiffies; } static bool minstrel_ht_txstat_valid(struct minstrel_priv *mp, struct minstrel_ht_sta *mi, struct ieee80211_tx_rate *rate) { int i; if (rate->idx < 0) return false; if (!rate->count) return false; if (rate->flags & IEEE80211_TX_RC_MCS || rate->flags & IEEE80211_TX_RC_VHT_MCS) return true; for (i = 0; i < ARRAY_SIZE(mp->cck_rates); i++) if (rate->idx == mp->cck_rates[i]) return true; for (i = 0; i < 
ARRAY_SIZE(mp->ofdm_rates[0]); i++) if (rate->idx == mp->ofdm_rates[mi->band][i]) return true; return false; } /* * Check whether rate_status contains valid information. */ static bool minstrel_ht_ri_txstat_valid(struct minstrel_priv *mp, struct minstrel_ht_sta *mi, struct ieee80211_rate_status *rate_status) { int i; if (!rate_status) return false; if (!rate_status->try_count) return false; if (rate_status->rate_idx.flags & RATE_INFO_FLAGS_MCS || rate_status->rate_idx.flags & RATE_INFO_FLAGS_VHT_MCS) return true; for (i = 0; i < ARRAY_SIZE(mp->cck_rates); i++) { if (rate_status->rate_idx.legacy == minstrel_cck_bitrates[ mp->cck_rates[i] ]) return true; } for (i = 0; i < ARRAY_SIZE(mp->ofdm_rates); i++) { if (rate_status->rate_idx.legacy == minstrel_ofdm_bitrates[ mp->ofdm_rates[mi->band][i] ]) return true; } return false; } static void minstrel_downgrade_rate(struct minstrel_ht_sta *mi, u16 *idx, bool primary) { int group, orig_group; orig_group = group = MI_RATE_GROUP(*idx); while (group > 0) { group--; if (!mi->supported[group]) continue; if (minstrel_mcs_groups[group].streams > minstrel_mcs_groups[orig_group].streams) continue; if (primary) *idx = mi->groups[group].max_group_tp_rate[0]; else *idx = mi->groups[group].max_group_tp_rate[1]; break; } } static void minstrel_ht_tx_status(void *priv, struct ieee80211_supported_band *sband, void *priv_sta, struct ieee80211_tx_status *st) { struct ieee80211_tx_info *info = st->info; struct minstrel_ht_sta *mi = priv_sta; struct ieee80211_tx_rate *ar = info->status.rates; struct minstrel_rate_stats *rate, *rate2; struct minstrel_priv *mp = priv; u32 update_interval = mp->update_interval; bool last, update = false; int i; /* Ignore packet that was sent with noAck flag */ if (info->flags & IEEE80211_TX_CTL_NO_ACK) return; /* This packet was aggregated but doesn't carry status info */ if ((info->flags & IEEE80211_TX_CTL_AMPDU) && !(info->flags & IEEE80211_TX_STAT_AMPDU)) return; if (!(info->flags & IEEE80211_TX_STAT_AMPDU)) { info->status.ampdu_ack_len = (info->flags & IEEE80211_TX_STAT_ACK ? 1 : 0); info->status.ampdu_len = 1; } /* wraparound */ if (mi->total_packets >= ~0 - info->status.ampdu_len) { mi->total_packets = 0; mi->sample_packets = 0; } mi->total_packets += info->status.ampdu_len; if (info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) mi->sample_packets += info->status.ampdu_len; mi->ampdu_packets++; mi->ampdu_len += info->status.ampdu_len; if (st->rates && st->n_rates) { last = !minstrel_ht_ri_txstat_valid(mp, mi, &(st->rates[0])); for (i = 0; !last; i++) { last = (i == st->n_rates - 1) || !minstrel_ht_ri_txstat_valid(mp, mi, &(st->rates[i + 1])); rate = minstrel_ht_ri_get_stats(mp, mi, &(st->rates[i])); if (last) rate->success += info->status.ampdu_ack_len; rate->attempts += st->rates[i].try_count * info->status.ampdu_len; } } else { last = !minstrel_ht_txstat_valid(mp, mi, &ar[0]); for (i = 0; !last; i++) { last = (i == IEEE80211_TX_MAX_RATES - 1) || !minstrel_ht_txstat_valid(mp, mi, &ar[i + 1]); rate = minstrel_ht_get_stats(mp, mi, &ar[i]); if (last) rate->success += info->status.ampdu_ack_len; rate->attempts += ar[i].count * info->status.ampdu_len; } } if (mp->hw->max_rates > 1) { /* * check for sudden death of spatial multiplexing, * downgrade to a lower number of streams if necessary. 
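 * (a max_tp_rate entry with more than 30 attempts and a success rate below
 * one quarter is replaced by the best rate of a lower group that does not
 * use more spatial streams)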
*/ rate = minstrel_get_ratestats(mi, mi->max_tp_rate[0]); if (rate->attempts > 30 && rate->success < rate->attempts / 4) { minstrel_downgrade_rate(mi, &mi->max_tp_rate[0], true); update = true; } rate2 = minstrel_get_ratestats(mi, mi->max_tp_rate[1]); if (rate2->attempts > 30 && rate2->success < rate2->attempts / 4) { minstrel_downgrade_rate(mi, &mi->max_tp_rate[1], false); update = true; } } if (time_after(jiffies, mi->last_stats_update + update_interval)) { update = true; minstrel_ht_update_stats(mp, mi); } if (update) minstrel_ht_update_rates(mp, mi); } static void minstrel_calc_retransmit(struct minstrel_priv *mp, struct minstrel_ht_sta *mi, int index) { struct minstrel_rate_stats *mrs; unsigned int tx_time, tx_time_rtscts, tx_time_data; unsigned int cw = mp->cw_min; unsigned int ctime = 0; unsigned int t_slot = 9; /* FIXME */ unsigned int ampdu_len = minstrel_ht_avg_ampdu_len(mi); unsigned int overhead = 0, overhead_rtscts = 0; mrs = minstrel_get_ratestats(mi, index); if (mrs->prob_avg < MINSTREL_FRAC(1, 10)) { mrs->retry_count = 1; mrs->retry_count_rtscts = 1; return; } mrs->retry_count = 2; mrs->retry_count_rtscts = 2; mrs->retry_updated = true; tx_time_data = minstrel_get_duration(index) * ampdu_len / 1000; /* Contention time for first 2 tries */ ctime = (t_slot * cw) >> 1; cw = min((cw << 1) | 1, mp->cw_max); ctime += (t_slot * cw) >> 1; cw = min((cw << 1) | 1, mp->cw_max); if (minstrel_ht_is_legacy_group(MI_RATE_GROUP(index))) { overhead = mi->overhead_legacy; overhead_rtscts = mi->overhead_legacy_rtscts; } else { overhead = mi->overhead; overhead_rtscts = mi->overhead_rtscts; } /* Total TX time for data and Contention after first 2 tries */ tx_time = ctime + 2 * (overhead + tx_time_data); tx_time_rtscts = ctime + 2 * (overhead_rtscts + tx_time_data); /* See how many more tries we can fit inside segment size */ do { /* Contention time for this try */ ctime = (t_slot * cw) >> 1; cw = min((cw << 1) | 1, mp->cw_max); /* Total TX time after this try */ tx_time += ctime + overhead + tx_time_data; tx_time_rtscts += ctime + overhead_rtscts + tx_time_data; if (tx_time_rtscts < mp->segment_size) mrs->retry_count_rtscts++; } while ((tx_time < mp->segment_size) && (++mrs->retry_count < mp->max_retry)); } static void minstrel_ht_set_rate(struct minstrel_priv *mp, struct minstrel_ht_sta *mi, struct ieee80211_sta_rates *ratetbl, int offset, int index) { int group_idx = MI_RATE_GROUP(index); const struct mcs_group *group = &minstrel_mcs_groups[group_idx]; struct minstrel_rate_stats *mrs; u8 idx; u16 flags = group->flags; mrs = minstrel_get_ratestats(mi, index); if (!mrs->retry_updated) minstrel_calc_retransmit(mp, mi, index); if (mrs->prob_avg < MINSTREL_FRAC(20, 100) || !mrs->retry_count) { ratetbl->rate[offset].count = 2; ratetbl->rate[offset].count_rts = 2; ratetbl->rate[offset].count_cts = 2; } else { ratetbl->rate[offset].count = mrs->retry_count; ratetbl->rate[offset].count_cts = mrs->retry_count; ratetbl->rate[offset].count_rts = mrs->retry_count_rtscts; } index = MI_RATE_IDX(index); if (group_idx == MINSTREL_CCK_GROUP) idx = mp->cck_rates[index % ARRAY_SIZE(mp->cck_rates)]; else if (group_idx == MINSTREL_OFDM_GROUP) idx = mp->ofdm_rates[mi->band][index % ARRAY_SIZE(mp->ofdm_rates[0])]; else if (flags & IEEE80211_TX_RC_VHT_MCS) idx = ((group->streams - 1) << 4) | (index & 0xF); else idx = index + (group->streams - 1) * 8; /* enable RTS/CTS if needed: * - if station is in dynamic SMPS (and streams > 1) * - for fallback rates, to increase chances of getting through */ if (offset > 0 || 
(mi->sta->deflink.smps_mode == IEEE80211_SMPS_DYNAMIC && group->streams > 1)) { ratetbl->rate[offset].count = ratetbl->rate[offset].count_rts; flags |= IEEE80211_TX_RC_USE_RTS_CTS; } ratetbl->rate[offset].idx = idx; ratetbl->rate[offset].flags = flags; } static inline int minstrel_ht_get_prob_avg(struct minstrel_ht_sta *mi, int rate) { int group = MI_RATE_GROUP(rate); rate = MI_RATE_IDX(rate); return mi->groups[group].rates[rate].prob_avg; } static int minstrel_ht_get_max_amsdu_len(struct minstrel_ht_sta *mi) { int group = MI_RATE_GROUP(mi->max_prob_rate); const struct mcs_group *g = &minstrel_mcs_groups[group]; int rate = MI_RATE_IDX(mi->max_prob_rate); unsigned int duration; /* Disable A-MSDU if max_prob_rate is bad */ if (mi->groups[group].rates[rate].prob_avg < MINSTREL_FRAC(50, 100)) return 1; duration = g->duration[rate]; duration <<= g->shift; /* If the rate is slower than single-stream MCS1, make A-MSDU limit small */ if (duration > MCS_DURATION(1, 0, 52)) return 500; /* * If the rate is slower than single-stream MCS4, limit A-MSDU to usual * data packet size */ if (duration > MCS_DURATION(1, 0, 104)) return 1600; /* * If the rate is slower than single-stream MCS7, or if the max throughput * rate success probability is less than 75%, limit A-MSDU to twice the usual * data packet size */ if (duration > MCS_DURATION(1, 0, 260) || (minstrel_ht_get_prob_avg(mi, mi->max_tp_rate[0]) < MINSTREL_FRAC(75, 100))) return 3200; /* * HT A-MPDU limits maximum MPDU size under BA agreement to 4095 bytes. * Since aggregation sessions are started/stopped without txq flush, use * the limit here to avoid the complexity of having to de-aggregate * packets in the queue. */ if (!mi->sta->deflink.vht_cap.vht_supported) return IEEE80211_MAX_MPDU_LEN_HT_BA; /* unlimited */ return 0; } static void minstrel_ht_update_rates(struct minstrel_priv *mp, struct minstrel_ht_sta *mi) { struct ieee80211_sta_rates *rates; int i = 0; int max_rates = min_t(int, mp->hw->max_rates, IEEE80211_TX_RATE_TABLE_SIZE); rates = kzalloc(sizeof(*rates), GFP_ATOMIC); if (!rates) return; /* Start with max_tp_rate[0] */ minstrel_ht_set_rate(mp, mi, rates, i++, mi->max_tp_rate[0]); /* Fill up remaining, keep one entry for max_probe_rate */ for (; i < (max_rates - 1); i++) minstrel_ht_set_rate(mp, mi, rates, i, mi->max_tp_rate[i]); if (i < max_rates) minstrel_ht_set_rate(mp, mi, rates, i++, mi->max_prob_rate); if (i < IEEE80211_TX_RATE_TABLE_SIZE) rates->rate[i].idx = -1; mi->sta->deflink.agg.max_rc_amsdu_len = minstrel_ht_get_max_amsdu_len(mi); ieee80211_sta_recalc_aggregates(mi->sta); rate_control_set_rates(mp->hw, mi->sta, rates); } static u16 minstrel_ht_get_sample_rate(struct minstrel_priv *mp, struct minstrel_ht_sta *mi) { u8 seq; if (mp->hw->max_rates > 1) { seq = mi->sample_seq; mi->sample_seq = (seq + 1) % ARRAY_SIZE(minstrel_sample_seq); seq = minstrel_sample_seq[seq]; } else { seq = MINSTREL_SAMPLE_TYPE_INC; } return __minstrel_ht_get_sample_rate(mi, seq); } static void minstrel_ht_get_rate(void *priv, struct ieee80211_sta *sta, void *priv_sta, struct ieee80211_tx_rate_control *txrc) { const struct mcs_group *sample_group; struct ieee80211_tx_info *info = IEEE80211_SKB_CB(txrc->skb); struct ieee80211_tx_rate *rate = &info->status.rates[0]; struct minstrel_ht_sta *mi = priv_sta; struct minstrel_priv *mp = priv; u16 sample_idx; info->flags |= mi->tx_flags; #ifdef CONFIG_MAC80211_DEBUGFS if (mp->fixed_rate_idx != -1) return; #endif /* Don't use EAPOL frames for sampling on non-mrr hw */ if (mp->hw->max_rates == 1 && 
(info->control.flags & IEEE80211_TX_CTRL_PORT_CTRL_PROTO)) return; if (time_is_after_jiffies(mi->sample_time)) return; mi->sample_time = jiffies + MINSTREL_SAMPLE_INTERVAL; sample_idx = minstrel_ht_get_sample_rate(mp, mi); if (!sample_idx) return; sample_group = &minstrel_mcs_groups[MI_RATE_GROUP(sample_idx)]; sample_idx = MI_RATE_IDX(sample_idx); if (sample_group == &minstrel_mcs_groups[MINSTREL_CCK_GROUP] && (sample_idx >= 4) != txrc->short_preamble) return; info->flags |= IEEE80211_TX_CTL_RATE_CTRL_PROBE; rate->count = 1; if (sample_group == &minstrel_mcs_groups[MINSTREL_CCK_GROUP]) { int idx = sample_idx % ARRAY_SIZE(mp->cck_rates); rate->idx = mp->cck_rates[idx]; } else if (sample_group == &minstrel_mcs_groups[MINSTREL_OFDM_GROUP]) { int idx = sample_idx % ARRAY_SIZE(mp->ofdm_rates[0]); rate->idx = mp->ofdm_rates[mi->band][idx]; } else if (sample_group->flags & IEEE80211_TX_RC_VHT_MCS) { ieee80211_rate_set_vht(rate, MI_RATE_IDX(sample_idx), sample_group->streams); } else { rate->idx = sample_idx + (sample_group->streams - 1) * 8; } rate->flags = sample_group->flags; } static void minstrel_ht_update_cck(struct minstrel_priv *mp, struct minstrel_ht_sta *mi, struct ieee80211_supported_band *sband, struct ieee80211_sta *sta) { int i; if (sband->band != NL80211_BAND_2GHZ) return; if (sta->deflink.ht_cap.ht_supported && !ieee80211_hw_check(mp->hw, SUPPORTS_HT_CCK_RATES)) return; for (i = 0; i < 4; i++) { if (mp->cck_rates[i] == 0xff || !rate_supported(sta, sband->band, mp->cck_rates[i])) continue; mi->supported[MINSTREL_CCK_GROUP] |= BIT(i); if (sband->bitrates[i].flags & IEEE80211_RATE_SHORT_PREAMBLE) mi->supported[MINSTREL_CCK_GROUP] |= BIT(i + 4); } } static void minstrel_ht_update_ofdm(struct minstrel_priv *mp, struct minstrel_ht_sta *mi, struct ieee80211_supported_band *sband, struct ieee80211_sta *sta) { const u8 *rates; int i; if (sta->deflink.ht_cap.ht_supported) return; rates = mp->ofdm_rates[sband->band]; for (i = 0; i < ARRAY_SIZE(mp->ofdm_rates[0]); i++) { if (rates[i] == 0xff || !rate_supported(sta, sband->band, rates[i])) continue; mi->supported[MINSTREL_OFDM_GROUP] |= BIT(i); } } static void minstrel_ht_update_caps(void *priv, struct ieee80211_supported_band *sband, struct cfg80211_chan_def *chandef, struct ieee80211_sta *sta, void *priv_sta) { struct minstrel_priv *mp = priv; struct minstrel_ht_sta *mi = priv_sta; struct ieee80211_mcs_info *mcs = &sta->deflink.ht_cap.mcs; u16 ht_cap = sta->deflink.ht_cap.cap; struct ieee80211_sta_vht_cap *vht_cap = &sta->deflink.vht_cap; const struct ieee80211_rate *ctl_rate; struct sta_info *sta_info; bool ldpc, erp; int use_vht; int ack_dur; int stbc; int i; BUILD_BUG_ON(ARRAY_SIZE(minstrel_mcs_groups) != MINSTREL_GROUPS_NB); if (vht_cap->vht_supported) use_vht = vht_cap->vht_mcs.tx_mcs_map != cpu_to_le16(~0); else use_vht = 0; memset(mi, 0, sizeof(*mi)); mi->sta = sta; mi->band = sband->band; mi->last_stats_update = jiffies; ack_dur = ieee80211_frame_duration(sband->band, 10, 60, 1, 1); mi->overhead = ieee80211_frame_duration(sband->band, 0, 60, 1, 1); mi->overhead += ack_dur; mi->overhead_rtscts = mi->overhead + 2 * ack_dur; ctl_rate = &sband->bitrates[rate_lowest_index(sband, sta)]; erp = ctl_rate->flags & IEEE80211_RATE_ERP_G; ack_dur = ieee80211_frame_duration(sband->band, 10, ctl_rate->bitrate, erp, 1); mi->overhead_legacy = ack_dur; mi->overhead_legacy_rtscts = mi->overhead_legacy + 2 * ack_dur; mi->avg_ampdu_len = MINSTREL_FRAC(1, 1); if (!use_vht) { stbc = (ht_cap & IEEE80211_HT_CAP_RX_STBC) >> IEEE80211_HT_CAP_RX_STBC_SHIFT; ldpc 
= ht_cap & IEEE80211_HT_CAP_LDPC_CODING; } else { stbc = (vht_cap->cap & IEEE80211_VHT_CAP_RXSTBC_MASK) >> IEEE80211_VHT_CAP_RXSTBC_SHIFT; ldpc = vht_cap->cap & IEEE80211_VHT_CAP_RXLDPC; } mi->tx_flags |= stbc << IEEE80211_TX_CTL_STBC_SHIFT; if (ldpc) mi->tx_flags |= IEEE80211_TX_CTL_LDPC; for (i = 0; i < ARRAY_SIZE(mi->groups); i++) { u32 gflags = minstrel_mcs_groups[i].flags; int bw, nss; mi->supported[i] = 0; if (minstrel_ht_is_legacy_group(i)) continue; if (gflags & IEEE80211_TX_RC_SHORT_GI) { if (gflags & IEEE80211_TX_RC_40_MHZ_WIDTH) { if (!(ht_cap & IEEE80211_HT_CAP_SGI_40)) continue; } else { if (!(ht_cap & IEEE80211_HT_CAP_SGI_20)) continue; } } if (gflags & IEEE80211_TX_RC_40_MHZ_WIDTH && sta->deflink.bandwidth < IEEE80211_STA_RX_BW_40) continue; nss = minstrel_mcs_groups[i].streams; /* Mark MCS > 7 as unsupported if STA is in static SMPS mode */ if (sta->deflink.smps_mode == IEEE80211_SMPS_STATIC && nss > 1) continue; /* HT rate */ if (gflags & IEEE80211_TX_RC_MCS) { if (use_vht && minstrel_vht_only) continue; mi->supported[i] = mcs->rx_mask[nss - 1]; continue; } /* VHT rate */ if (!vht_cap->vht_supported || WARN_ON(!(gflags & IEEE80211_TX_RC_VHT_MCS)) || WARN_ON(gflags & IEEE80211_TX_RC_160_MHZ_WIDTH)) continue; if (gflags & IEEE80211_TX_RC_80_MHZ_WIDTH) { if (sta->deflink.bandwidth < IEEE80211_STA_RX_BW_80 || ((gflags & IEEE80211_TX_RC_SHORT_GI) && !(vht_cap->cap & IEEE80211_VHT_CAP_SHORT_GI_80))) { continue; } } if (gflags & IEEE80211_TX_RC_40_MHZ_WIDTH) bw = BW_40; else if (gflags & IEEE80211_TX_RC_80_MHZ_WIDTH) bw = BW_80; else bw = BW_20; mi->supported[i] = minstrel_get_valid_vht_rates(bw, nss, vht_cap->vht_mcs.tx_mcs_map); } sta_info = container_of(sta, struct sta_info, sta); mi->use_short_preamble = test_sta_flag(sta_info, WLAN_STA_SHORT_PREAMBLE) && sta_info->sdata->vif.bss_conf.use_short_preamble; minstrel_ht_update_cck(mp, mi, sband, sta); minstrel_ht_update_ofdm(mp, mi, sband, sta); /* create an initial rate table with the lowest supported rates */ minstrel_ht_update_stats(mp, mi); minstrel_ht_update_rates(mp, mi); } static void minstrel_ht_rate_init(void *priv, struct ieee80211_supported_band *sband, struct cfg80211_chan_def *chandef, struct ieee80211_sta *sta, void *priv_sta) { minstrel_ht_update_caps(priv, sband, chandef, sta, priv_sta); } static void minstrel_ht_rate_update(void *priv, struct ieee80211_supported_band *sband, struct cfg80211_chan_def *chandef, struct ieee80211_sta *sta, void *priv_sta, u32 changed) { minstrel_ht_update_caps(priv, sband, chandef, sta, priv_sta); } static void * minstrel_ht_alloc_sta(void *priv, struct ieee80211_sta *sta, gfp_t gfp) { struct ieee80211_supported_band *sband; struct minstrel_ht_sta *mi; struct minstrel_priv *mp = priv; struct ieee80211_hw *hw = mp->hw; int max_rates = 0; int i; for (i = 0; i < NUM_NL80211_BANDS; i++) { sband = hw->wiphy->bands[i]; if (sband && sband->n_bitrates > max_rates) max_rates = sband->n_bitrates; } return kzalloc(sizeof(*mi), gfp); } static void minstrel_ht_free_sta(void *priv, struct ieee80211_sta *sta, void *priv_sta) { kfree(priv_sta); } static void minstrel_ht_fill_rate_array(u8 *dest, struct ieee80211_supported_band *sband, const s16 *bitrates, int n_rates) { int i, j; for (i = 0; i < sband->n_bitrates; i++) { struct ieee80211_rate *rate = &sband->bitrates[i]; for (j = 0; j < n_rates; j++) { if (rate->bitrate != bitrates[j]) continue; dest[j] = i; break; } } } static void minstrel_ht_init_cck_rates(struct minstrel_priv *mp) { static const s16 bitrates[4] = { 10, 20, 55, 110 }; struct 
ieee80211_supported_band *sband; memset(mp->cck_rates, 0xff, sizeof(mp->cck_rates)); sband = mp->hw->wiphy->bands[NL80211_BAND_2GHZ]; if (!sband) return; BUILD_BUG_ON(ARRAY_SIZE(mp->cck_rates) != ARRAY_SIZE(bitrates)); minstrel_ht_fill_rate_array(mp->cck_rates, sband, minstrel_cck_bitrates, ARRAY_SIZE(minstrel_cck_bitrates)); } static void minstrel_ht_init_ofdm_rates(struct minstrel_priv *mp, enum nl80211_band band) { static const s16 bitrates[8] = { 60, 90, 120, 180, 240, 360, 480, 540 }; struct ieee80211_supported_band *sband; memset(mp->ofdm_rates[band], 0xff, sizeof(mp->ofdm_rates[band])); sband = mp->hw->wiphy->bands[band]; if (!sband) return; BUILD_BUG_ON(ARRAY_SIZE(mp->ofdm_rates[band]) != ARRAY_SIZE(bitrates)); minstrel_ht_fill_rate_array(mp->ofdm_rates[band], sband, minstrel_ofdm_bitrates, ARRAY_SIZE(minstrel_ofdm_bitrates)); } static void * minstrel_ht_alloc(struct ieee80211_hw *hw) { struct minstrel_priv *mp; int i; mp = kzalloc(sizeof(struct minstrel_priv), GFP_ATOMIC); if (!mp) return NULL; /* contention window settings * Just an approximation. Using the per-queue values would complicate * the calculations and is probably unnecessary */ mp->cw_min = 15; mp->cw_max = 1023; /* maximum time that the hw is allowed to stay in one MRR segment */ mp->segment_size = 6000; if (hw->max_rate_tries > 0) mp->max_retry = hw->max_rate_tries; else /* safe default, does not necessarily have to match hw properties */ mp->max_retry = 7; mp->hw = hw; mp->update_interval = HZ / 20; minstrel_ht_init_cck_rates(mp); for (i = 0; i < ARRAY_SIZE(mp->hw->wiphy->bands); i++) minstrel_ht_init_ofdm_rates(mp, i); return mp; } #ifdef CONFIG_MAC80211_DEBUGFS static void minstrel_ht_add_debugfs(struct ieee80211_hw *hw, void *priv, struct dentry *debugfsdir) { struct minstrel_priv *mp = priv; mp->fixed_rate_idx = (u32) -1; debugfs_create_u32("fixed_rate_idx", S_IRUGO | S_IWUGO, debugfsdir, &mp->fixed_rate_idx); } #endif static void minstrel_ht_free(void *priv) { kfree(priv); } static u32 minstrel_ht_get_expected_throughput(void *priv_sta) { struct minstrel_ht_sta *mi = priv_sta; int i, j, prob, tp_avg; i = MI_RATE_GROUP(mi->max_tp_rate[0]); j = MI_RATE_IDX(mi->max_tp_rate[0]); prob = mi->groups[i].rates[j].prob_avg; /* convert tp_avg from pkt per second in kbps */ tp_avg = minstrel_ht_get_tp_avg(mi, i, j, prob) * 10; tp_avg = tp_avg * AVG_PKT_SIZE * 8 / 1024; return tp_avg; } static const struct rate_control_ops mac80211_minstrel_ht = { .name = "minstrel_ht", .capa = RATE_CTRL_CAPA_AMPDU_TRIGGER, .tx_status_ext = minstrel_ht_tx_status, .get_rate = minstrel_ht_get_rate, .rate_init = minstrel_ht_rate_init, .rate_update = minstrel_ht_rate_update, .alloc_sta = minstrel_ht_alloc_sta, .free_sta = minstrel_ht_free_sta, .alloc = minstrel_ht_alloc, .free = minstrel_ht_free, #ifdef CONFIG_MAC80211_DEBUGFS .add_debugfs = minstrel_ht_add_debugfs, .add_sta_debugfs = minstrel_ht_add_sta_debugfs, #endif .get_expected_throughput = minstrel_ht_get_expected_throughput, }; static void __init init_sample_table(void) { int col, i, new_idx; u8 rnd[MCS_GROUP_RATES]; memset(sample_table, 0xff, sizeof(sample_table)); for (col = 0; col < SAMPLE_COLUMNS; col++) { get_random_bytes(rnd, sizeof(rnd)); for (i = 0; i < MCS_GROUP_RATES; i++) { new_idx = (i + rnd[i]) % MCS_GROUP_RATES; while (sample_table[col][new_idx] != 0xff) new_idx = (new_idx + 1) % MCS_GROUP_RATES; sample_table[col][new_idx] = i; } } } int __init rc80211_minstrel_init(void) { init_sample_table(); return ieee80211_rate_control_register(&mac80211_minstrel_ht); } void 
rc80211_minstrel_exit(void) { ieee80211_rate_control_unregister(&mac80211_minstrel_ht); }
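The statistics update above keeps quantities such as avg_ampdu_len and prob_avg as fixed-point exponentially weighted moving averages. The stand-alone sketch below illustrates that averaging scheme; the constants (MINSTREL_SCALE, EWMA_LEVEL, EWMA_DIV) and the ewma() helper are assumptions chosen to mirror typical minstrel values, not copied from this file.

/* Illustrative user-space sketch of minstrel-style fixed-point averaging.
 * All constants and helper names here are assumptions for demonstration only. */
#include <stdio.h>

#define MINSTREL_SCALE      12
#define MINSTREL_FRAC(v, d) (((v) << MINSTREL_SCALE) / (d))   /* fixed-point fraction */
#define MINSTREL_TRUNC(v)   ((v) >> MINSTREL_SCALE)           /* back to an integer  */
#define EWMA_LEVEL          96   /* assumed weight of the old average */
#define EWMA_DIV            128

/* Move the running average towards the new sample by (EWMA_DIV - weight) / EWMA_DIV. */
static int ewma(int old, int sample, int weight)
{
        return old + (EWMA_DIV - weight) * (sample - old) / EWMA_DIV;
}

int main(void)
{
        /* Start from an average A-MPDU length of 1 frame and feed in bursts of 16 frames. */
        int avg = MINSTREL_FRAC(1, 1);
        int i;

        for (i = 0; i < 10; i++)
                avg = ewma(avg, MINSTREL_FRAC(16, 1), EWMA_LEVEL);

        printf("average A-MPDU length after 10 samples: ~%d frames\n", MINSTREL_TRUNC(avg));
        return 0;
}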
/* * Copyright (C) 2014 Red Hat * Copyright (C) 2014 Intel Corp. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR * OTHER DEALINGS IN THE SOFTWARE. * * Authors: * Rob Clark <robdclark@gmail.com> * Daniel Vetter <daniel.vetter@ffwll.ch> */ #ifndef DRM_ATOMIC_H_ #define DRM_ATOMIC_H_ #include <drm/drm_crtc.h> #include <drm/drm_util.h> /** * struct drm_crtc_commit - track modeset commits on a CRTC * * This structure is used to track pending modeset changes and atomic commit on * a per-CRTC basis. Since updating the list should never block, this structure * is reference counted to allow waiters to safely wait on an event to complete, * without holding any locks. * * It has 3 different events in total to allow a fine-grained synchronization * between outstanding updates:: * * atomic commit thread hardware * * write new state into hardware ----> ... * signal hw_done * switch to new state on next * ...
v/hblank * * wait for buffers to show up ... * * ... send completion irq * irq handler signals flip_done * cleanup old buffers * * signal cleanup_done * * wait for flip_done <---- * clean up atomic state * * The important bit to know is that &cleanup_done is the terminal event, but the * ordering between &flip_done and &hw_done is entirely up to the specific driver * and modeset state change. * * For an implementation of how to use this look at * drm_atomic_helper_setup_commit() from the atomic helper library. * * See also drm_crtc_commit_wait(). */ struct drm_crtc_commit { /** * @crtc: * * DRM CRTC for this commit. */ struct drm_crtc *crtc; /** * @ref: * * Reference count for this structure. Needed to allow blocking on * completions without the risk of the completion disappearing * meanwhile. */ struct kref ref; /** * @flip_done: * * Will be signalled when the hardware has flipped to the new set of * buffers. Signals at the same time as when the drm event for this * commit is sent to userspace, or when an out-fence is signalled. Note * that for most hardware, in most cases this happens after @hw_done is * signalled. * * Completion of this stage is signalled implicitly by calling * drm_crtc_send_vblank_event() on &drm_crtc_state.event. */ struct completion flip_done; /** * @hw_done: * * Will be signalled when all hw register changes for this commit have * been written out. Especially when disabling a pipe this can be much * later than @flip_done, since that can signal already when the * screen goes black, whereas to fully shut down a pipe more register * I/O is required. * * Note that this does not need to include separately reference-counted * resources like backing storage buffer pinning, or runtime pm * management. * * Drivers should call drm_atomic_helper_commit_hw_done() to signal * completion of this stage. */ struct completion hw_done; /** * @cleanup_done: * * Will be signalled after old buffers have been cleaned up by calling * drm_atomic_helper_cleanup_planes(). Since this can only happen after * a vblank wait completed it might be a bit later. This completion is * useful to throttle updates and avoid hardware updates getting ahead * of the buffer cleanup too much. * * Drivers should call drm_atomic_helper_commit_cleanup_done() to signal * completion of this stage. */ struct completion cleanup_done; /** * @commit_entry: * * Entry on the per-CRTC &drm_crtc.commit_list. Protected by * &drm_crtc.commit_lock. */ struct list_head commit_entry; /** * @event: * * &drm_pending_vblank_event pointer to clean up private events. */ struct drm_pending_vblank_event *event; /** * @abort_completion: * * A flag that's set after drm_atomic_helper_setup_commit() takes a * second reference for the completion of &drm_crtc_state.event. It's * used by the free code to remove the second reference if commit fails. */ bool abort_completion; }; struct __drm_planes_state { struct drm_plane *ptr; struct drm_plane_state *state, *old_state, *new_state; }; struct __drm_crtcs_state { struct drm_crtc *ptr; struct drm_crtc_state *state, *old_state, *new_state; /** * @commit: * * A reference to the CRTC commit object that is kept for use by * drm_atomic_helper_wait_for_flip_done() after * drm_atomic_helper_commit_hw_done() is called. This ensures that a * concurrent commit won't free a commit object that is still in use.
*/ struct drm_crtc_commit *commit; s32 __user *out_fence_ptr; u64 last_vblank_count; }; struct __drm_connnectors_state { struct drm_connector *ptr; struct drm_connector_state *state, *old_state, *new_state; /** * @out_fence_ptr: * * User-provided pointer which the kernel uses to return a sync_file * file descriptor. Used by writeback connectors to signal completion of * the writeback. */ s32 __user *out_fence_ptr; }; struct drm_private_obj; struct drm_private_state; /** * struct drm_private_state_funcs - atomic state functions for private objects * * These hooks are used by atomic helpers to create, swap and destroy states of * private objects. The structure itself is used as a vtable to identify the * associated private object type. Each private object type that needs to be * added to the atomic states is expected to have an implementation of these * hooks and pass a pointer to its drm_private_state_funcs struct to * drm_atomic_get_private_obj_state(). */ struct drm_private_state_funcs { /** * @atomic_duplicate_state: * * Duplicate the current state of the private object and return it. It * is an error to call this before obj->state has been initialized. * * RETURNS: * * Duplicated atomic state or NULL when obj->state is not * initialized or allocation failed. */ struct drm_private_state *(*atomic_duplicate_state)(struct drm_private_obj *obj); /** * @atomic_destroy_state: * * Frees the private object state created with @atomic_duplicate_state. */ void (*atomic_destroy_state)(struct drm_private_obj *obj, struct drm_private_state *state); /** * @atomic_print_state: * * If driver subclasses &struct drm_private_state, it should implement * this optional hook for printing additional driver specific state. * * Do not call this directly, use drm_atomic_private_obj_print_state() * instead. */ void (*atomic_print_state)(struct drm_printer *p, const struct drm_private_state *state); }; /** * struct drm_private_obj - base struct for driver private atomic object * * A driver private object is initialized by calling * drm_atomic_private_obj_init() and cleaned up by calling * drm_atomic_private_obj_fini(). * * Currently only tracks the state update functions and the opaque driver * private state itself, but in the future might also track which * &drm_modeset_lock is required to duplicate and update this object's state. * * All private objects must be initialized before the DRM device they are * attached to is registered to the DRM subsystem (call to drm_dev_register()) * and should stay around until this DRM device is unregistered (call to * drm_dev_unregister()). In other words, private objects lifetime is tied * to the DRM device lifetime. This implies that: * * 1/ all calls to drm_atomic_private_obj_init() must be done before calling * drm_dev_register() * 2/ all calls to drm_atomic_private_obj_fini() must be done after calling * drm_dev_unregister() * * If that private object is used to store a state shared by multiple * CRTCs, proper care must be taken to ensure that non-blocking commits are * properly ordered to avoid a use-after-free issue. * * Indeed, assuming a sequence of two non-blocking &drm_atomic_commit on two * different &drm_crtc using different &drm_plane and &drm_connector, so with no * resources shared, there's no guarantee on which commit is going to happen * first. However, the second &drm_atomic_commit will consider the first * &drm_private_obj its old state, and will be in charge of freeing it whenever * the second &drm_atomic_commit is done. 
* * If the first &drm_atomic_commit happens after it, it will consider its * &drm_private_obj the new state and will be likely to access it, resulting in * an access to a freed memory region. Drivers should store (and get a reference * to) the &drm_crtc_commit structure in our private state in * &drm_mode_config_helper_funcs.atomic_commit_setup, and then wait for that * commit to complete as the first step of * &drm_mode_config_helper_funcs.atomic_commit_tail, similar to * drm_atomic_helper_wait_for_dependencies(). */ struct drm_private_obj { /** * @head: List entry used to attach a private object to a &drm_device * (queued to &drm_mode_config.privobj_list). */ struct list_head head; /** * @lock: Modeset lock to protect the state object. */ struct drm_modeset_lock lock; /** * @state: Current atomic state for this driver private object. */ struct drm_private_state *state; /** * @funcs: * * Functions to manipulate the state of this driver private object, see * &drm_private_state_funcs. */ const struct drm_private_state_funcs *funcs; }; /** * drm_for_each_privobj() - private object iterator * * @privobj: pointer to the current private object. Updated after each * iteration * @dev: the DRM device we want get private objects from * * Allows one to iterate over all private objects attached to @dev */ #define drm_for_each_privobj(privobj, dev) \ list_for_each_entry(privobj, &(dev)->mode_config.privobj_list, head) /** * struct drm_private_state - base struct for driver private object state * * Currently only contains a backpointer to the overall atomic update, * and the relevant private object but in the future also might hold * synchronization information similar to e.g. &drm_crtc.commit. */ struct drm_private_state { /** * @state: backpointer to global drm_atomic_state */ struct drm_atomic_state *state; /** * @obj: backpointer to the private object */ struct drm_private_obj *obj; }; struct __drm_private_objs_state { struct drm_private_obj *ptr; struct drm_private_state *state, *old_state, *new_state; }; /** * struct drm_atomic_state - Atomic commit structure * * This structure is the kernel counterpart of @drm_mode_atomic and represents * an atomic commit that transitions from an old to a new display state. It * contains all the objects affected by the atomic commit and both the new * state structures and pointers to the old state structures for * these. * * States are added to an atomic update by calling drm_atomic_get_crtc_state(), * drm_atomic_get_plane_state(), drm_atomic_get_connector_state(), or for * private state structures, drm_atomic_get_private_obj_state(). * * NOTE: struct drm_atomic_state first started as a single collection of * entities state pointers (drm_plane_state, drm_crtc_state, etc.). * * At atomic_check time, you could get the state about to be committed * from drm_atomic_state, and the one currently running from the * entities state pointer (drm_crtc.state, for example). After the call * to drm_atomic_helper_swap_state(), the entities state pointer would * contain the state previously checked, and the drm_atomic_state * structure the old state. * * Over time, and in order to avoid confusion, drm_atomic_state has * grown to have both the old state (ie, the state we replace) and the * new state (ie, the state we want to apply). Those names are stable * during the commit process, which makes it easier to reason about. * * You can still find some traces of that evolution through some hooks * or callbacks taking a drm_atomic_state parameter called names like * "old_state". 
This doesn't necessarily mean that the previous * drm_atomic_state is passed, but rather that this used to be the state * collection we were replacing after drm_atomic_helper_swap_state(), * but the variable name was never updated. * * Some atomic operations implementations followed a similar process. We * first started to pass the entity state only. However, it was pretty * cumbersome for drivers, and especially CRTCs, to retrieve the states * of other components. Thus, we switched to passing the whole * drm_atomic_state as a parameter to those operations. Similarly, the * transition isn't complete yet, and one might still find atomic * operations taking a drm_atomic_state pointer, or a component state * pointer. The former is the preferred form. */ struct drm_atomic_state { /** * @ref: * * Count of all references to this update (will not be freed until zero). */ struct kref ref; /** * @dev: Parent DRM Device. */ struct drm_device *dev; /** * @allow_modeset: * * Allow full modeset. This is used by the ATOMIC IOCTL handler to * implement the DRM_MODE_ATOMIC_ALLOW_MODESET flag. Drivers should * generally not consult this flag, but instead look at the output of * drm_atomic_crtc_needs_modeset(). The detailed rules are: * * - Drivers must not consult @allow_modeset in the atomic commit path. * Use drm_atomic_crtc_needs_modeset() instead. * * - Drivers must consult @allow_modeset before adding unrelated struct * drm_crtc_state to this commit by calling * drm_atomic_get_crtc_state(). See also the warning in the * documentation for that function. * * - Drivers must never change this flag, it is under the exclusive * control of userspace. * * - Drivers may consult @allow_modeset in the atomic check path, if * they have the choice between an optimal hardware configuration * which requires a modeset, and a less optimal configuration which * can be committed without a modeset. An example would be suboptimal * scanout FIFO allocation resulting in increased idle power * consumption. This allows userspace to avoid flickering and delays * for the normal composition loop at reasonable cost. */ bool allow_modeset : 1; /** * @legacy_cursor_update: * * Hint to enforce legacy cursor IOCTL semantics. * * WARNING: This is thoroughly broken and pretty much impossible to * implement correctly. Drivers must ignore this and should instead * implement &drm_plane_helper_funcs.atomic_async_check and * &drm_plane_helper_funcs.atomic_async_commit hooks. New users of this * flag are not allowed. */ bool legacy_cursor_update : 1; /** * @async_update: hint for asynchronous plane update */ bool async_update : 1; /** * @duplicated: * * Indicates whether or not this atomic state was duplicated using * drm_atomic_helper_duplicate_state(). Drivers and atomic helpers * should use this to fixup normal inconsistencies in duplicated * states. */ bool duplicated : 1; /** * @planes: * * Pointer to array of @drm_plane and @drm_plane_state part of this * update. */ struct __drm_planes_state *planes; /** * @crtcs: * * Pointer to array of @drm_crtc and @drm_crtc_state part of this * update. */ struct __drm_crtcs_state *crtcs; /** * @num_connector: size of the @connectors array */ int num_connector; /** * @connectors: * * Pointer to array of @drm_connector and @drm_connector_state part of * this update. 
*/ struct __drm_connnectors_state *connectors; /** * @num_private_objs: size of the @private_objs array */ int num_private_objs; /** * @private_objs: * * Pointer to array of @drm_private_obj and @drm_private_obj_state part * of this update. */ struct __drm_private_objs_state *private_objs; /** * @acquire_ctx: acquire context for this atomic modeset state update */ struct drm_modeset_acquire_ctx *acquire_ctx; /** * @fake_commit: * * Used for signaling unbound planes/connectors. * When a connector or plane is not bound to any CRTC, it's still important * to preserve linearity to prevent the atomic states from being freed too early. * * This commit (if set) is not bound to any CRTC, but will be completed when * drm_atomic_helper_commit_hw_done() is called. */ struct drm_crtc_commit *fake_commit; /** * @commit_work: * * Work item which can be used by the driver or helpers to execute the * commit without blocking. */ struct work_struct commit_work; }; void __drm_crtc_commit_free(struct kref *kref); /** * drm_crtc_commit_get - acquire a reference to the CRTC commit * @commit: CRTC commit * * Increases the reference of @commit. * * Returns: * The pointer to @commit, with reference increased. */ static inline struct drm_crtc_commit *drm_crtc_commit_get(struct drm_crtc_commit *commit) { kref_get(&commit->ref); return commit; } /** * drm_crtc_commit_put - release a reference to the CRTC commit * @commit: CRTC commit * * This releases a reference to @commit which is freed after removing the * final reference. No locking required and callable from any context. */ static inline void drm_crtc_commit_put(struct drm_crtc_commit *commit) { kref_put(&commit->ref, __drm_crtc_commit_free); } int drm_crtc_commit_wait(struct drm_crtc_commit *commit); struct drm_atomic_state * __must_check drm_atomic_state_alloc(struct drm_device *dev); void drm_atomic_state_clear(struct drm_atomic_state *state); /** * drm_atomic_state_get - acquire a reference to the atomic state * @state: The atomic state * * Returns a new reference to the @state */ static inline struct drm_atomic_state * drm_atomic_state_get(struct drm_atomic_state *state) { kref_get(&state->ref); return state; } void __drm_atomic_state_free(struct kref *ref); /** * drm_atomic_state_put - release a reference to the atomic state * @state: The atomic state * * This releases a reference to @state which is freed after removing the * final reference. No locking required and callable from any context.
*/ static inline void drm_atomic_state_put(struct drm_atomic_state *state) { kref_put(&state->ref, __drm_atomic_state_free); } int __must_check drm_atomic_state_init(struct drm_device *dev, struct drm_atomic_state *state); void drm_atomic_state_default_clear(struct drm_atomic_state *state); void drm_atomic_state_default_release(struct drm_atomic_state *state); struct drm_crtc_state * __must_check drm_atomic_get_crtc_state(struct drm_atomic_state *state, struct drm_crtc *crtc); struct drm_plane_state * __must_check drm_atomic_get_plane_state(struct drm_atomic_state *state, struct drm_plane *plane); struct drm_connector_state * __must_check drm_atomic_get_connector_state(struct drm_atomic_state *state, struct drm_connector *connector); void drm_atomic_private_obj_init(struct drm_device *dev, struct drm_private_obj *obj, struct drm_private_state *state, const struct drm_private_state_funcs *funcs); void drm_atomic_private_obj_fini(struct drm_private_obj *obj); struct drm_private_state * __must_check drm_atomic_get_private_obj_state(struct drm_atomic_state *state, struct drm_private_obj *obj); struct drm_private_state * drm_atomic_get_old_private_obj_state(const struct drm_atomic_state *state, struct drm_private_obj *obj); struct drm_private_state * drm_atomic_get_new_private_obj_state(const struct drm_atomic_state *state, struct drm_private_obj *obj); struct drm_connector * drm_atomic_get_old_connector_for_encoder(const struct drm_atomic_state *state, struct drm_encoder *encoder); struct drm_connector * drm_atomic_get_new_connector_for_encoder(const struct drm_atomic_state *state, struct drm_encoder *encoder); struct drm_connector * drm_atomic_get_connector_for_encoder(const struct drm_encoder *encoder, struct drm_modeset_acquire_ctx *ctx); struct drm_crtc * drm_atomic_get_old_crtc_for_encoder(struct drm_atomic_state *state, struct drm_encoder *encoder); struct drm_crtc * drm_atomic_get_new_crtc_for_encoder(struct drm_atomic_state *state, struct drm_encoder *encoder); /** * drm_atomic_get_existing_crtc_state - get CRTC state, if it exists * @state: global atomic state object * @crtc: CRTC to grab * * This function returns the CRTC state for the given CRTC, or NULL * if the CRTC is not part of the global atomic state. * * This function is deprecated, @drm_atomic_get_old_crtc_state or * @drm_atomic_get_new_crtc_state should be used instead. */ static inline struct drm_crtc_state * drm_atomic_get_existing_crtc_state(const struct drm_atomic_state *state, struct drm_crtc *crtc) { return state->crtcs[drm_crtc_index(crtc)].state; } /** * drm_atomic_get_old_crtc_state - get old CRTC state, if it exists * @state: global atomic state object * @crtc: CRTC to grab * * This function returns the old CRTC state for the given CRTC, or * NULL if the CRTC is not part of the global atomic state. */ static inline struct drm_crtc_state * drm_atomic_get_old_crtc_state(const struct drm_atomic_state *state, struct drm_crtc *crtc) { return state->crtcs[drm_crtc_index(crtc)].old_state; } /** * drm_atomic_get_new_crtc_state - get new CRTC state, if it exists * @state: global atomic state object * @crtc: CRTC to grab * * This function returns the new CRTC state for the given CRTC, or * NULL if the CRTC is not part of the global atomic state. 
*/ static inline struct drm_crtc_state * drm_atomic_get_new_crtc_state(const struct drm_atomic_state *state, struct drm_crtc *crtc) { return state->crtcs[drm_crtc_index(crtc)].new_state; } /** * drm_atomic_get_existing_plane_state - get plane state, if it exists * @state: global atomic state object * @plane: plane to grab * * This function returns the plane state for the given plane, or NULL * if the plane is not part of the global atomic state. * * This function is deprecated, @drm_atomic_get_old_plane_state or * @drm_atomic_get_new_plane_state should be used instead. */ static inline struct drm_plane_state * drm_atomic_get_existing_plane_state(const struct drm_atomic_state *state, struct drm_plane *plane) { return state->planes[drm_plane_index(plane)].state; } /** * drm_atomic_get_old_plane_state - get plane state, if it exists * @state: global atomic state object * @plane: plane to grab * * This function returns the old plane state for the given plane, or * NULL if the plane is not part of the global atomic state. */ static inline struct drm_plane_state * drm_atomic_get_old_plane_state(const struct drm_atomic_state *state, struct drm_plane *plane) { return state->planes[drm_plane_index(plane)].old_state; } /** * drm_atomic_get_new_plane_state - get plane state, if it exists * @state: global atomic state object * @plane: plane to grab * * This function returns the new plane state for the given plane, or * NULL if the plane is not part of the global atomic state. */ static inline struct drm_plane_state * drm_atomic_get_new_plane_state(const struct drm_atomic_state *state, struct drm_plane *plane) { return state->planes[drm_plane_index(plane)].new_state; } /** * drm_atomic_get_existing_connector_state - get connector state, if it exists * @state: global atomic state object * @connector: connector to grab * * This function returns the connector state for the given connector, * or NULL if the connector is not part of the global atomic state. * * This function is deprecated, @drm_atomic_get_old_connector_state or * @drm_atomic_get_new_connector_state should be used instead. */ static inline struct drm_connector_state * drm_atomic_get_existing_connector_state(const struct drm_atomic_state *state, struct drm_connector *connector) { int index = drm_connector_index(connector); if (index >= state->num_connector) return NULL; return state->connectors[index].state; } /** * drm_atomic_get_old_connector_state - get connector state, if it exists * @state: global atomic state object * @connector: connector to grab * * This function returns the old connector state for the given connector, * or NULL if the connector is not part of the global atomic state. */ static inline struct drm_connector_state * drm_atomic_get_old_connector_state(const struct drm_atomic_state *state, struct drm_connector *connector) { int index = drm_connector_index(connector); if (index >= state->num_connector) return NULL; return state->connectors[index].old_state; } /** * drm_atomic_get_new_connector_state - get connector state, if it exists * @state: global atomic state object * @connector: connector to grab * * This function returns the new connector state for the given connector, * or NULL if the connector is not part of the global atomic state. 
*/ static inline struct drm_connector_state * drm_atomic_get_new_connector_state(const struct drm_atomic_state *state, struct drm_connector *connector) { int index = drm_connector_index(connector); if (index >= state->num_connector) return NULL; return state->connectors[index].new_state; } /** * __drm_atomic_get_current_plane_state - get current plane state * @state: global atomic state object * @plane: plane to grab * * This function returns the plane state for the given plane, either from * @state, or if the plane isn't part of the atomic state update, from @plane. * This is useful in atomic check callbacks, when drivers need to peek at, but * not change, state of other planes, since it avoids threading an error code * back up the call chain. * * WARNING: * * Note that this function is in general unsafe since it doesn't check for the * required locking for access state structures. Drivers must ensure that it is * safe to access the returned state structure through other means. One common * example is when planes are fixed to a single CRTC, and the driver knows that * the CRTC lock is held already. In that case holding the CRTC lock gives a * read-lock on all planes connected to that CRTC. But if planes can be * reassigned things get more tricky. In that case it's better to use * drm_atomic_get_plane_state and wire up full error handling. * * Returns: * * Read-only pointer to the current plane state. */ static inline const struct drm_plane_state * __drm_atomic_get_current_plane_state(const struct drm_atomic_state *state, struct drm_plane *plane) { if (state->planes[drm_plane_index(plane)].state) return state->planes[drm_plane_index(plane)].state; return plane->state; } int __must_check drm_atomic_add_encoder_bridges(struct drm_atomic_state *state, struct drm_encoder *encoder); int __must_check drm_atomic_add_affected_connectors(struct drm_atomic_state *state, struct drm_crtc *crtc); int __must_check drm_atomic_add_affected_planes(struct drm_atomic_state *state, struct drm_crtc *crtc); int __must_check drm_atomic_check_only(struct drm_atomic_state *state); int __must_check drm_atomic_commit(struct drm_atomic_state *state); int __must_check drm_atomic_nonblocking_commit(struct drm_atomic_state *state); void drm_state_dump(struct drm_device *dev, struct drm_printer *p); /** * for_each_oldnew_connector_in_state - iterate over all connectors in an atomic update * @__state: &struct drm_atomic_state pointer * @connector: &struct drm_connector iteration cursor * @old_connector_state: &struct drm_connector_state iteration cursor for the * old state * @new_connector_state: &struct drm_connector_state iteration cursor for the * new state * @__i: int iteration cursor, for macro-internal use * * This iterates over all connectors in an atomic update, tracking both old and * new state. This is useful in places where the state delta needs to be * considered, for example in atomic check functions. 
*/ #define for_each_oldnew_connector_in_state(__state, connector, old_connector_state, new_connector_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->num_connector; \ (__i)++) \ for_each_if ((__state)->connectors[__i].ptr && \ ((connector) = (__state)->connectors[__i].ptr, \ (void)(connector) /* Only to avoid unused-but-set-variable warning */, \ (old_connector_state) = (__state)->connectors[__i].old_state, \ (new_connector_state) = (__state)->connectors[__i].new_state, 1)) /** * for_each_old_connector_in_state - iterate over all connectors in an atomic update * @__state: &struct drm_atomic_state pointer * @connector: &struct drm_connector iteration cursor * @old_connector_state: &struct drm_connector_state iteration cursor for the * old state * @__i: int iteration cursor, for macro-internal use * * This iterates over all connectors in an atomic update, tracking only the old * state. This is useful in disable functions, where we need the old state the * hardware is still in. */ #define for_each_old_connector_in_state(__state, connector, old_connector_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->num_connector; \ (__i)++) \ for_each_if ((__state)->connectors[__i].ptr && \ ((connector) = (__state)->connectors[__i].ptr, \ (void)(connector) /* Only to avoid unused-but-set-variable warning */, \ (old_connector_state) = (__state)->connectors[__i].old_state, 1)) /** * for_each_new_connector_in_state - iterate over all connectors in an atomic update * @__state: &struct drm_atomic_state pointer * @connector: &struct drm_connector iteration cursor * @new_connector_state: &struct drm_connector_state iteration cursor for the * new state * @__i: int iteration cursor, for macro-internal use * * This iterates over all connectors in an atomic update, tracking only the new * state. This is useful in enable functions, where we need the new state the * hardware should be in when the atomic commit operation has completed. */ #define for_each_new_connector_in_state(__state, connector, new_connector_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->num_connector; \ (__i)++) \ for_each_if ((__state)->connectors[__i].ptr && \ ((connector) = (__state)->connectors[__i].ptr, \ (void)(connector) /* Only to avoid unused-but-set-variable warning */, \ (new_connector_state) = (__state)->connectors[__i].new_state, \ (void)(new_connector_state) /* Only to avoid unused-but-set-variable warning */, 1)) /** * for_each_oldnew_crtc_in_state - iterate over all CRTCs in an atomic update * @__state: &struct drm_atomic_state pointer * @crtc: &struct drm_crtc iteration cursor * @old_crtc_state: &struct drm_crtc_state iteration cursor for the old state * @new_crtc_state: &struct drm_crtc_state iteration cursor for the new state * @__i: int iteration cursor, for macro-internal use * * This iterates over all CRTCs in an atomic update, tracking both old and * new state. This is useful in places where the state delta needs to be * considered, for example in atomic check functions. 
*/ #define for_each_oldnew_crtc_in_state(__state, crtc, old_crtc_state, new_crtc_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->dev->mode_config.num_crtc; \ (__i)++) \ for_each_if ((__state)->crtcs[__i].ptr && \ ((crtc) = (__state)->crtcs[__i].ptr, \ (void)(crtc) /* Only to avoid unused-but-set-variable warning */, \ (old_crtc_state) = (__state)->crtcs[__i].old_state, \ (void)(old_crtc_state) /* Only to avoid unused-but-set-variable warning */, \ (new_crtc_state) = (__state)->crtcs[__i].new_state, \ (void)(new_crtc_state) /* Only to avoid unused-but-set-variable warning */, 1)) /** * for_each_old_crtc_in_state - iterate over all CRTCs in an atomic update * @__state: &struct drm_atomic_state pointer * @crtc: &struct drm_crtc iteration cursor * @old_crtc_state: &struct drm_crtc_state iteration cursor for the old state * @__i: int iteration cursor, for macro-internal use * * This iterates over all CRTCs in an atomic update, tracking only the old * state. This is useful in disable functions, where we need the old state the * hardware is still in. */ #define for_each_old_crtc_in_state(__state, crtc, old_crtc_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->dev->mode_config.num_crtc; \ (__i)++) \ for_each_if ((__state)->crtcs[__i].ptr && \ ((crtc) = (__state)->crtcs[__i].ptr, \ (void)(crtc) /* Only to avoid unused-but-set-variable warning */, \ (old_crtc_state) = (__state)->crtcs[__i].old_state, 1)) /** * for_each_new_crtc_in_state - iterate over all CRTCs in an atomic update * @__state: &struct drm_atomic_state pointer * @crtc: &struct drm_crtc iteration cursor * @new_crtc_state: &struct drm_crtc_state iteration cursor for the new state * @__i: int iteration cursor, for macro-internal use * * This iterates over all CRTCs in an atomic update, tracking only the new * state. This is useful in enable functions, where we need the new state the * hardware should be in when the atomic commit operation has completed. */ #define for_each_new_crtc_in_state(__state, crtc, new_crtc_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->dev->mode_config.num_crtc; \ (__i)++) \ for_each_if ((__state)->crtcs[__i].ptr && \ ((crtc) = (__state)->crtcs[__i].ptr, \ (void)(crtc) /* Only to avoid unused-but-set-variable warning */, \ (new_crtc_state) = (__state)->crtcs[__i].new_state, \ (void)(new_crtc_state) /* Only to avoid unused-but-set-variable warning */, 1)) /** * for_each_oldnew_plane_in_state - iterate over all planes in an atomic update * @__state: &struct drm_atomic_state pointer * @plane: &struct drm_plane iteration cursor * @old_plane_state: &struct drm_plane_state iteration cursor for the old state * @new_plane_state: &struct drm_plane_state iteration cursor for the new state * @__i: int iteration cursor, for macro-internal use * * This iterates over all planes in an atomic update, tracking both old and * new state. This is useful in places where the state delta needs to be * considered, for example in atomic check functions. 
*/ #define for_each_oldnew_plane_in_state(__state, plane, old_plane_state, new_plane_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->dev->mode_config.num_total_plane; \ (__i)++) \ for_each_if ((__state)->planes[__i].ptr && \ ((plane) = (__state)->planes[__i].ptr, \ (void)(plane) /* Only to avoid unused-but-set-variable warning */, \ (old_plane_state) = (__state)->planes[__i].old_state,\ (new_plane_state) = (__state)->planes[__i].new_state, 1)) /** * for_each_oldnew_plane_in_state_reverse - iterate over all planes in an atomic * update in reverse order * @__state: &struct drm_atomic_state pointer * @plane: &struct drm_plane iteration cursor * @old_plane_state: &struct drm_plane_state iteration cursor for the old state * @new_plane_state: &struct drm_plane_state iteration cursor for the new state * @__i: int iteration cursor, for macro-internal use * * This iterates over all planes in an atomic update in reverse order, * tracking both old and new state. This is useful in places where the * state delta needs to be considered, for example in atomic check functions. */ #define for_each_oldnew_plane_in_state_reverse(__state, plane, old_plane_state, new_plane_state, __i) \ for ((__i) = ((__state)->dev->mode_config.num_total_plane - 1); \ (__i) >= 0; \ (__i)--) \ for_each_if ((__state)->planes[__i].ptr && \ ((plane) = (__state)->planes[__i].ptr, \ (old_plane_state) = (__state)->planes[__i].old_state,\ (new_plane_state) = (__state)->planes[__i].new_state, 1)) /** * for_each_new_plane_in_state_reverse - other than only tracking new state, * it's the same as for_each_oldnew_plane_in_state_reverse * @__state: &struct drm_atomic_state pointer * @plane: &struct drm_plane iteration cursor * @new_plane_state: &struct drm_plane_state iteration cursor for the new state * @__i: int iteration cursor, for macro-internal use */ #define for_each_new_plane_in_state_reverse(__state, plane, new_plane_state, __i) \ for ((__i) = ((__state)->dev->mode_config.num_total_plane - 1); \ (__i) >= 0; \ (__i)--) \ for_each_if ((__state)->planes[__i].ptr && \ ((plane) = (__state)->planes[__i].ptr, \ (new_plane_state) = (__state)->planes[__i].new_state, 1)) /** * for_each_old_plane_in_state - iterate over all planes in an atomic update * @__state: &struct drm_atomic_state pointer * @plane: &struct drm_plane iteration cursor * @old_plane_state: &struct drm_plane_state iteration cursor for the old state * @__i: int iteration cursor, for macro-internal use * * This iterates over all planes in an atomic update, tracking only the old * state. This is useful in disable functions, where we need the old state the * hardware is still in. */ #define for_each_old_plane_in_state(__state, plane, old_plane_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->dev->mode_config.num_total_plane; \ (__i)++) \ for_each_if ((__state)->planes[__i].ptr && \ ((plane) = (__state)->planes[__i].ptr, \ (old_plane_state) = (__state)->planes[__i].old_state, 1)) /** * for_each_new_plane_in_state - iterate over all planes in an atomic update * @__state: &struct drm_atomic_state pointer * @plane: &struct drm_plane iteration cursor * @new_plane_state: &struct drm_plane_state iteration cursor for the new state * @__i: int iteration cursor, for macro-internal use * * This iterates over all planes in an atomic update, tracking only the new * state. This is useful in enable functions, where we need the new state the * hardware should be in when the atomic commit operation has completed. 
*/ #define for_each_new_plane_in_state(__state, plane, new_plane_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->dev->mode_config.num_total_plane; \ (__i)++) \ for_each_if ((__state)->planes[__i].ptr && \ ((plane) = (__state)->planes[__i].ptr, \ (void)(plane) /* Only to avoid unused-but-set-variable warning */, \ (new_plane_state) = (__state)->planes[__i].new_state, \ (void)(new_plane_state) /* Only to avoid unused-but-set-variable warning */, 1)) /** * for_each_oldnew_private_obj_in_state - iterate over all private objects in an atomic update * @__state: &struct drm_atomic_state pointer * @obj: &struct drm_private_obj iteration cursor * @old_obj_state: &struct drm_private_state iteration cursor for the old state * @new_obj_state: &struct drm_private_state iteration cursor for the new state * @__i: int iteration cursor, for macro-internal use * * This iterates over all private objects in an atomic update, tracking both * old and new state. This is useful in places where the state delta needs * to be considered, for example in atomic check functions. */ #define for_each_oldnew_private_obj_in_state(__state, obj, old_obj_state, new_obj_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->num_private_objs && \ ((obj) = (__state)->private_objs[__i].ptr, \ (old_obj_state) = (__state)->private_objs[__i].old_state, \ (new_obj_state) = (__state)->private_objs[__i].new_state, 1); \ (__i)++) /** * for_each_old_private_obj_in_state - iterate over all private objects in an atomic update * @__state: &struct drm_atomic_state pointer * @obj: &struct drm_private_obj iteration cursor * @old_obj_state: &struct drm_private_state iteration cursor for the old state * @__i: int iteration cursor, for macro-internal use * * This iterates over all private objects in an atomic update, tracking only * the old state. This is useful in disable functions, where we need the old * state the hardware is still in. */ #define for_each_old_private_obj_in_state(__state, obj, old_obj_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->num_private_objs && \ ((obj) = (__state)->private_objs[__i].ptr, \ (old_obj_state) = (__state)->private_objs[__i].old_state, 1); \ (__i)++) /** * for_each_new_private_obj_in_state - iterate over all private objects in an atomic update * @__state: &struct drm_atomic_state pointer * @obj: &struct drm_private_obj iteration cursor * @new_obj_state: &struct drm_private_state iteration cursor for the new state * @__i: int iteration cursor, for macro-internal use * * This iterates over all private objects in an atomic update, tracking only * the new state. This is useful in enable functions, where we need the new state the * hardware should be in when the atomic commit operation has completed. */ #define for_each_new_private_obj_in_state(__state, obj, new_obj_state, __i) \ for ((__i) = 0; \ (__i) < (__state)->num_private_objs && \ ((obj) = (__state)->private_objs[__i].ptr, \ (void)(obj) /* Only to avoid unused-but-set-variable warning */, \ (new_obj_state) = (__state)->private_objs[__i].new_state, 1); \ (__i)++) /** * drm_atomic_crtc_needs_modeset - compute combined modeset need * @state: &drm_crtc_state for the CRTC * * To give drivers flexibility &struct drm_crtc_state has 3 booleans to track * whether the state CRTC changed enough to need a full modeset cycle: * mode_changed, active_changed and connectors_changed. This helper simply * combines these three to compute the overall need for a modeset for @state. 
* * The atomic helper code sets these booleans, but drivers can and should * change them appropriately to accurately represent whether a modeset is * really needed. In general, drivers should avoid full modesets whenever * possible. * * For example if the CRTC mode has changed, and the hardware is able to enact * the requested mode change without going through a full modeset, the driver * should clear mode_changed in its &drm_mode_config_funcs.atomic_check * implementation. */ static inline bool drm_atomic_crtc_needs_modeset(const struct drm_crtc_state *state) { return state->mode_changed || state->active_changed || state->connectors_changed; } /** * drm_atomic_crtc_effectively_active - compute whether CRTC is actually active * @state: &drm_crtc_state for the CRTC * * When in self refresh mode, the crtc_state->active value will be false, since * the CRTC is off. However in some cases we're interested in whether the CRTC * is active, or effectively active (ie: it's connected to an active display). * In these cases, use this function instead of just checking active. */ static inline bool drm_atomic_crtc_effectively_active(const struct drm_crtc_state *state) { return state->active || state->self_refresh_active; } /** * struct drm_bus_cfg - bus configuration * * This structure stores the configuration of a physical bus between two * components in an output pipeline, usually between two bridges, an encoder * and a bridge, or a bridge and a connector. * * The bus configuration is stored in &drm_bridge_state separately for the * input and output buses, as seen from the point of view of each bridge. The * bus configuration of a bridge output is usually identical to the * configuration of the next bridge's input, but may differ if the signals are * modified between the two bridges, for instance by an inverter on the board. * The input and output configurations of a bridge may differ if the bridge * modifies the signals internally, for instance by performing format * conversion, or modifying signals polarities. */ struct drm_bus_cfg { /** * @format: format used on this bus (one of the MEDIA_BUS_FMT_* format) * * This field should not be directly modified by drivers * (drm_atomic_bridge_chain_select_bus_fmts() takes care of the bus * format negotiation). */ u32 format; /** * @flags: DRM_BUS_* flags used on this bus */ u32 flags; }; /** * struct drm_bridge_state - Atomic bridge state object */ struct drm_bridge_state { /** * @base: inherit from &drm_private_state */ struct drm_private_state base; /** * @bridge: the bridge this state refers to */ struct drm_bridge *bridge; /** * @input_bus_cfg: input bus configuration */ struct drm_bus_cfg input_bus_cfg; /** * @output_bus_cfg: output bus configuration */ struct drm_bus_cfg output_bus_cfg; }; static inline struct drm_bridge_state * drm_priv_to_bridge_state(struct drm_private_state *priv) { return container_of(priv, struct drm_bridge_state, base); } struct drm_bridge_state * drm_atomic_get_bridge_state(struct drm_atomic_state *state, struct drm_bridge *bridge); struct drm_bridge_state * drm_atomic_get_old_bridge_state(const struct drm_atomic_state *state, struct drm_bridge *bridge); struct drm_bridge_state * drm_atomic_get_new_bridge_state(const struct drm_atomic_state *state, struct drm_bridge *bridge); #endif /* DRM_ATOMIC_H_ */ |
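The iterator macros and drm_atomic_crtc_needs_modeset() above are typically used together in a driver's &drm_mode_config_funcs.atomic_check hook. Below is a minimal, hypothetical sketch of that pattern, not code from this header: foo_atomic_check() and foo_crtc_timings_compatible() are made-up driver names, and splitting the check into drm_atomic_helper_check_modeset() plus drm_atomic_helper_check_planes() is just one common way to structure it.

/* Hypothetical driver sketch; assumes <drm/drm_atomic.h> and <drm/drm_atomic_helper.h>. */
#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>

static bool foo_crtc_timings_compatible(const struct drm_display_mode *old_mode,
					const struct drm_display_mode *new_mode)
{
	/*
	 * Placeholder for a hardware-specific test, e.g. "only the pixel
	 * clock changed and the PLL can be reprogrammed on the fly".
	 */
	return false;
}

static int foo_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
{
	struct drm_crtc *crtc;
	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
	int i, ret;

	/* Let the helpers compute mode_changed/active_changed/connectors_changed first. */
	ret = drm_atomic_helper_check_modeset(dev, state);
	if (ret)
		return ret;

	for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) {
		if (!drm_atomic_crtc_needs_modeset(new_crtc_state))
			continue;

		/*
		 * If the hardware could move from the old to the new timings
		 * without a full disable/enable cycle, clear mode_changed as
		 * the kernel-doc above suggests; the CRTC may still need a
		 * modeset for other reasons (active_changed, connectors_changed).
		 */
		if (foo_crtc_timings_compatible(&old_crtc_state->mode,
						&new_crtc_state->mode))
			new_crtc_state->mode_changed = false;
	}

	return drm_atomic_helper_check_planes(dev, state);
}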
// SPDX-License-Identifier: GPL-2.0 #include "bcachefs.h" #include "alloc_background.h" #include "alloc_foreground.h" #include "backpointers.h" #include "bkey_buf.h" #include "btree_cache.h" #include "btree_io.h" #include "btree_key_cache.h" #include "btree_update.h" #include "btree_update_interior.h" #include "btree_gc.h" #include "btree_write_buffer.h" #include "buckets.h" #include "buckets_waiting_for_journal.h" #include "clock.h" #include "debug.h" #include "disk_accounting.h" #include "ec.h" #include "enumerated_ref.h" #include "error.h" #include "lru.h" #include "recovery.h" #include "varint.h" #include <linux/kthread.h> #include <linux/math64.h> #include <linux/random.h> #include <linux/rculist.h> #include <linux/rcupdate.h> #include <linux/sched/task.h> #include <linux/sort.h> #include <linux/jiffies.h> static void bch2_discard_one_bucket_fast(struct bch_dev *, u64); /* Persistent alloc info: */ static const unsigned BCH_ALLOC_V1_FIELD_BYTES[] = { #define x(name, bits) [BCH_ALLOC_FIELD_V1_##name] = bits / 8, BCH_ALLOC_FIELDS_V1() #undef x }; struct bkey_alloc_unpacked { u64 journal_seq; u8 gen; u8 oldest_gen; u8 data_type; bool need_discard:1;
bool need_inc_gen:1; #define x(_name, _bits) u##_bits _name; BCH_ALLOC_FIELDS_V2() #undef x }; static inline u64 alloc_field_v1_get(const struct bch_alloc *a, const void **p, unsigned field) { unsigned bytes = BCH_ALLOC_V1_FIELD_BYTES[field]; u64 v; if (!(a->fields & (1 << field))) return 0; switch (bytes) { case 1: v = *((const u8 *) *p); break; case 2: v = le16_to_cpup(*p); break; case 4: v = le32_to_cpup(*p); break; case 8: v = le64_to_cpup(*p); break; default: BUG(); } *p += bytes; return v; } static void bch2_alloc_unpack_v1(struct bkey_alloc_unpacked *out, struct bkey_s_c k) { const struct bch_alloc *in = bkey_s_c_to_alloc(k).v; const void *d = in->data; unsigned idx = 0; out->gen = in->gen; #define x(_name, _bits) out->_name = alloc_field_v1_get(in, &d, idx++); BCH_ALLOC_FIELDS_V1() #undef x } static int bch2_alloc_unpack_v2(struct bkey_alloc_unpacked *out, struct bkey_s_c k) { struct bkey_s_c_alloc_v2 a = bkey_s_c_to_alloc_v2(k); const u8 *in = a.v->data; const u8 *end = bkey_val_end(a); unsigned fieldnr = 0; int ret; u64 v; out->gen = a.v->gen; out->oldest_gen = a.v->oldest_gen; out->data_type = a.v->data_type; #define x(_name, _bits) \ if (fieldnr < a.v->nr_fields) { \ ret = bch2_varint_decode_fast(in, end, &v); \ if (ret < 0) \ return ret; \ in += ret; \ } else { \ v = 0; \ } \ out->_name = v; \ if (v != out->_name) \ return -1; \ fieldnr++; BCH_ALLOC_FIELDS_V2() #undef x return 0; } static int bch2_alloc_unpack_v3(struct bkey_alloc_unpacked *out, struct bkey_s_c k) { struct bkey_s_c_alloc_v3 a = bkey_s_c_to_alloc_v3(k); const u8 *in = a.v->data; const u8 *end = bkey_val_end(a); unsigned fieldnr = 0; int ret; u64 v; out->gen = a.v->gen; out->oldest_gen = a.v->oldest_gen; out->data_type = a.v->data_type; out->need_discard = BCH_ALLOC_V3_NEED_DISCARD(a.v); out->need_inc_gen = BCH_ALLOC_V3_NEED_INC_GEN(a.v); out->journal_seq = le64_to_cpu(a.v->journal_seq); #define x(_name, _bits) \ if (fieldnr < a.v->nr_fields) { \ ret = bch2_varint_decode_fast(in, end, &v); \ if (ret < 0) \ return ret; \ in += ret; \ } else { \ v = 0; \ } \ out->_name = v; \ if (v != out->_name) \ return -1; \ fieldnr++; BCH_ALLOC_FIELDS_V2() #undef x return 0; } static struct bkey_alloc_unpacked bch2_alloc_unpack(struct bkey_s_c k) { struct bkey_alloc_unpacked ret = { .gen = 0 }; switch (k.k->type) { case KEY_TYPE_alloc: bch2_alloc_unpack_v1(&ret, k); break; case KEY_TYPE_alloc_v2: bch2_alloc_unpack_v2(&ret, k); break; case KEY_TYPE_alloc_v3: bch2_alloc_unpack_v3(&ret, k); break; } return ret; } static unsigned bch_alloc_v1_val_u64s(const struct bch_alloc *a) { unsigned i, bytes = offsetof(struct bch_alloc, data); for (i = 0; i < ARRAY_SIZE(BCH_ALLOC_V1_FIELD_BYTES); i++) if (a->fields & (1 << i)) bytes += BCH_ALLOC_V1_FIELD_BYTES[i]; return DIV_ROUND_UP(bytes, sizeof(u64)); } int bch2_alloc_v1_validate(struct bch_fs *c, struct bkey_s_c k, struct bkey_validate_context from) { struct bkey_s_c_alloc a = bkey_s_c_to_alloc(k); int ret = 0; /* allow for unknown fields */ bkey_fsck_err_on(bkey_val_u64s(a.k) < bch_alloc_v1_val_u64s(a.v), c, alloc_v1_val_size_bad, "incorrect value size (%zu < %u)", bkey_val_u64s(a.k), bch_alloc_v1_val_u64s(a.v)); fsck_err: return ret; } int bch2_alloc_v2_validate(struct bch_fs *c, struct bkey_s_c k, struct bkey_validate_context from) { struct bkey_alloc_unpacked u; int ret = 0; bkey_fsck_err_on(bch2_alloc_unpack_v2(&u, k), c, alloc_v2_unpack_error, "unpack error"); fsck_err: return ret; } int bch2_alloc_v3_validate(struct bch_fs *c, struct bkey_s_c k, struct bkey_validate_context from) 
{ struct bkey_alloc_unpacked u; int ret = 0; bkey_fsck_err_on(bch2_alloc_unpack_v3(&u, k), c, alloc_v3_unpack_error, "unpack error"); fsck_err: return ret; } int bch2_alloc_v4_validate(struct bch_fs *c, struct bkey_s_c k, struct bkey_validate_context from) { struct bch_alloc_v4 a; int ret = 0; bkey_val_copy(&a, bkey_s_c_to_alloc_v4(k)); bkey_fsck_err_on(alloc_v4_u64s_noerror(&a) > bkey_val_u64s(k.k), c, alloc_v4_val_size_bad, "bad val size (%u > %zu)", alloc_v4_u64s_noerror(&a), bkey_val_u64s(k.k)); bkey_fsck_err_on(!BCH_ALLOC_V4_BACKPOINTERS_START(&a) && BCH_ALLOC_V4_NR_BACKPOINTERS(&a), c, alloc_v4_backpointers_start_bad, "invalid backpointers_start"); bkey_fsck_err_on(alloc_data_type(a, a.data_type) != a.data_type, c, alloc_key_data_type_bad, "invalid data type (got %u should be %u)", a.data_type, alloc_data_type(a, a.data_type)); for (unsigned i = 0; i < 2; i++) bkey_fsck_err_on(a.io_time[i] > LRU_TIME_MAX, c, alloc_key_io_time_bad, "invalid io_time[%s]: %llu, max %llu", i == READ ? "read" : "write", a.io_time[i], LRU_TIME_MAX); unsigned stripe_sectors = BCH_ALLOC_V4_BACKPOINTERS_START(&a) * sizeof(u64) > offsetof(struct bch_alloc_v4, stripe_sectors) ? a.stripe_sectors : 0; switch (a.data_type) { case BCH_DATA_free: case BCH_DATA_need_gc_gens: case BCH_DATA_need_discard: bkey_fsck_err_on(stripe_sectors || a.dirty_sectors || a.cached_sectors || a.stripe, c, alloc_key_empty_but_have_data, "empty data type free but have data %u.%u.%u %u", stripe_sectors, a.dirty_sectors, a.cached_sectors, a.stripe); break; case BCH_DATA_sb: case BCH_DATA_journal: case BCH_DATA_btree: case BCH_DATA_user: case BCH_DATA_parity: bkey_fsck_err_on(!a.dirty_sectors && !stripe_sectors, c, alloc_key_dirty_sectors_0, "data_type %s but dirty_sectors==0", bch2_data_type_str(a.data_type)); break; case BCH_DATA_cached: bkey_fsck_err_on(!a.cached_sectors || a.dirty_sectors || stripe_sectors || a.stripe, c, alloc_key_cached_inconsistency, "data type inconsistency"); bkey_fsck_err_on(!a.io_time[READ] && !(c->recovery.passes_to_run & BIT_ULL(BCH_RECOVERY_PASS_check_alloc_to_lru_refs)), c, alloc_key_cached_but_read_time_zero, "cached bucket with read_time == 0"); break; case BCH_DATA_stripe: break; } fsck_err: return ret; } void bch2_alloc_v4_swab(struct bkey_s k) { struct bch_alloc_v4 *a = bkey_s_to_alloc_v4(k).v; a->journal_seq_nonempty = swab64(a->journal_seq_nonempty); a->journal_seq_empty = swab64(a->journal_seq_empty); a->flags = swab32(a->flags); a->dirty_sectors = swab32(a->dirty_sectors); a->cached_sectors = swab32(a->cached_sectors); a->io_time[0] = swab64(a->io_time[0]); a->io_time[1] = swab64(a->io_time[1]); a->stripe = swab32(a->stripe); a->nr_external_backpointers = swab32(a->nr_external_backpointers); a->stripe_sectors = swab32(a->stripe_sectors); } static inline void __bch2_alloc_v4_to_text(struct printbuf *out, struct bch_fs *c, unsigned dev, const struct bch_alloc_v4 *a) { struct bch_dev *ca = c ? 
bch2_dev_tryget_noerror(c, dev) : NULL; prt_newline(out); printbuf_indent_add(out, 2); prt_printf(out, "gen %u oldest_gen %u data_type ", a->gen, a->oldest_gen); bch2_prt_data_type(out, a->data_type); prt_newline(out); prt_printf(out, "journal_seq_nonempty %llu\n", a->journal_seq_nonempty); prt_printf(out, "journal_seq_empty %llu\n", a->journal_seq_empty); prt_printf(out, "need_discard %llu\n", BCH_ALLOC_V4_NEED_DISCARD(a)); prt_printf(out, "need_inc_gen %llu\n", BCH_ALLOC_V4_NEED_INC_GEN(a)); prt_printf(out, "dirty_sectors %u\n", a->dirty_sectors); prt_printf(out, "stripe_sectors %u\n", a->stripe_sectors); prt_printf(out, "cached_sectors %u\n", a->cached_sectors); prt_printf(out, "stripe %u\n", a->stripe); prt_printf(out, "stripe_redundancy %u\n", a->stripe_redundancy); prt_printf(out, "io_time[READ] %llu\n", a->io_time[READ]); prt_printf(out, "io_time[WRITE] %llu\n", a->io_time[WRITE]); if (ca) prt_printf(out, "fragmentation %llu\n", alloc_lru_idx_fragmentation(*a, ca)); prt_printf(out, "bp_start %llu\n", BCH_ALLOC_V4_BACKPOINTERS_START(a)); printbuf_indent_sub(out, 2); bch2_dev_put(ca); } void bch2_alloc_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k) { struct bch_alloc_v4 _a; const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &_a); __bch2_alloc_v4_to_text(out, c, k.k->p.inode, a); } void bch2_alloc_v4_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k) { __bch2_alloc_v4_to_text(out, c, k.k->p.inode, bkey_s_c_to_alloc_v4(k).v); } void __bch2_alloc_to_v4(struct bkey_s_c k, struct bch_alloc_v4 *out) { if (k.k->type == KEY_TYPE_alloc_v4) { void *src, *dst; *out = *bkey_s_c_to_alloc_v4(k).v; src = alloc_v4_backpointers(out); SET_BCH_ALLOC_V4_BACKPOINTERS_START(out, BCH_ALLOC_V4_U64s); dst = alloc_v4_backpointers(out); if (src < dst) memset(src, 0, dst - src); SET_BCH_ALLOC_V4_NR_BACKPOINTERS(out, 0); } else { struct bkey_alloc_unpacked u = bch2_alloc_unpack(k); *out = (struct bch_alloc_v4) { .journal_seq_nonempty = u.journal_seq, .flags = u.need_discard, .gen = u.gen, .oldest_gen = u.oldest_gen, .data_type = u.data_type, .stripe_redundancy = u.stripe_redundancy, .dirty_sectors = u.dirty_sectors, .cached_sectors = u.cached_sectors, .io_time[READ] = u.read_time, .io_time[WRITE] = u.write_time, .stripe = u.stripe, }; SET_BCH_ALLOC_V4_BACKPOINTERS_START(out, BCH_ALLOC_V4_U64s); } } static noinline struct bkey_i_alloc_v4 * __bch2_alloc_to_v4_mut(struct btree_trans *trans, struct bkey_s_c k) { struct bkey_i_alloc_v4 *ret; ret = bch2_trans_kmalloc(trans, max(bkey_bytes(k.k), sizeof(struct bkey_i_alloc_v4))); if (IS_ERR(ret)) return ret; if (k.k->type == KEY_TYPE_alloc_v4) { void *src, *dst; bkey_reassemble(&ret->k_i, k); src = alloc_v4_backpointers(&ret->v); SET_BCH_ALLOC_V4_BACKPOINTERS_START(&ret->v, BCH_ALLOC_V4_U64s); dst = alloc_v4_backpointers(&ret->v); if (src < dst) memset(src, 0, dst - src); SET_BCH_ALLOC_V4_NR_BACKPOINTERS(&ret->v, 0); set_alloc_v4_u64s(ret); } else { bkey_alloc_v4_init(&ret->k_i); ret->k.p = k.k->p; bch2_alloc_to_v4(k, &ret->v); } return ret; } static inline struct bkey_i_alloc_v4 *bch2_alloc_to_v4_mut_inlined(struct btree_trans *trans, struct bkey_s_c k) { struct bkey_s_c_alloc_v4 a; if (likely(k.k->type == KEY_TYPE_alloc_v4) && ((a = bkey_s_c_to_alloc_v4(k), true) && BCH_ALLOC_V4_NR_BACKPOINTERS(a.v) == 0)) return bch2_bkey_make_mut_noupdate_typed(trans, k, alloc_v4); return __bch2_alloc_to_v4_mut(trans, k); } struct bkey_i_alloc_v4 *bch2_alloc_to_v4_mut(struct btree_trans *trans, struct bkey_s_c k) { return 
bch2_alloc_to_v4_mut_inlined(trans, k); } struct bkey_i_alloc_v4 * bch2_trans_start_alloc_update_noupdate(struct btree_trans *trans, struct btree_iter *iter, struct bpos pos) { struct bkey_s_c k = bch2_bkey_get_iter(trans, iter, BTREE_ID_alloc, pos, BTREE_ITER_with_updates| BTREE_ITER_cached| BTREE_ITER_intent); int ret = bkey_err(k); if (unlikely(ret)) return ERR_PTR(ret); struct bkey_i_alloc_v4 *a = bch2_alloc_to_v4_mut_inlined(trans, k); ret = PTR_ERR_OR_ZERO(a); if (unlikely(ret)) goto err; return a; err: bch2_trans_iter_exit(trans, iter); return ERR_PTR(ret); } __flatten struct bkey_i_alloc_v4 *bch2_trans_start_alloc_update(struct btree_trans *trans, struct bpos pos, enum btree_iter_update_trigger_flags flags) { struct btree_iter iter; struct bkey_s_c k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_alloc, pos, BTREE_ITER_with_updates| BTREE_ITER_cached| BTREE_ITER_intent); int ret = bkey_err(k); if (unlikely(ret)) return ERR_PTR(ret); if ((void *) k.v >= trans->mem && (void *) k.v < trans->mem + trans->mem_top) { bch2_trans_iter_exit(trans, &iter); return container_of(bkey_s_c_to_alloc_v4(k).v, struct bkey_i_alloc_v4, v); } struct bkey_i_alloc_v4 *a = bch2_alloc_to_v4_mut_inlined(trans, k); if (IS_ERR(a)) { bch2_trans_iter_exit(trans, &iter); return a; } ret = bch2_trans_update_ip(trans, &iter, &a->k_i, flags, _RET_IP_); bch2_trans_iter_exit(trans, &iter); return unlikely(ret) ? ERR_PTR(ret) : a; } static struct bpos alloc_gens_pos(struct bpos pos, unsigned *offset) { *offset = pos.offset & KEY_TYPE_BUCKET_GENS_MASK; pos.offset >>= KEY_TYPE_BUCKET_GENS_BITS; return pos; } static struct bpos bucket_gens_pos_to_alloc(struct bpos pos, unsigned offset) { pos.offset <<= KEY_TYPE_BUCKET_GENS_BITS; pos.offset += offset; return pos; } static unsigned alloc_gen(struct bkey_s_c k, unsigned offset) { return k.k->type == KEY_TYPE_bucket_gens ? 
bkey_s_c_to_bucket_gens(k).v->gens[offset] : 0; } int bch2_bucket_gens_validate(struct bch_fs *c, struct bkey_s_c k, struct bkey_validate_context from) { int ret = 0; bkey_fsck_err_on(bkey_val_bytes(k.k) != sizeof(struct bch_bucket_gens), c, bucket_gens_val_size_bad, "bad val size (%zu != %zu)", bkey_val_bytes(k.k), sizeof(struct bch_bucket_gens)); fsck_err: return ret; } void bch2_bucket_gens_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k) { struct bkey_s_c_bucket_gens g = bkey_s_c_to_bucket_gens(k); unsigned i; for (i = 0; i < ARRAY_SIZE(g.v->gens); i++) { if (i) prt_char(out, ' '); prt_printf(out, "%u", g.v->gens[i]); } } int bch2_bucket_gens_init(struct bch_fs *c) { struct btree_trans *trans = bch2_trans_get(c); struct bkey_i_bucket_gens g; bool have_bucket_gens_key = false; int ret; ret = for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN, BTREE_ITER_prefetch, k, ({ /* * Not a fsck error because this is checked/repaired by * bch2_check_alloc_key() which runs later: */ if (!bch2_dev_bucket_exists(c, k.k->p)) continue; struct bch_alloc_v4 a; u8 gen = bch2_alloc_to_v4(k, &a)->gen; unsigned offset; struct bpos pos = alloc_gens_pos(iter.pos, &offset); int ret2 = 0; if (have_bucket_gens_key && !bkey_eq(g.k.p, pos)) { ret2 = bch2_btree_insert_trans(trans, BTREE_ID_bucket_gens, &g.k_i, 0) ?: bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); if (ret2) goto iter_err; have_bucket_gens_key = false; } if (!have_bucket_gens_key) { bkey_bucket_gens_init(&g.k_i); g.k.p = pos; have_bucket_gens_key = true; } g.v.gens[offset] = gen; iter_err: ret2; })); if (have_bucket_gens_key && !ret) ret = commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc, bch2_btree_insert_trans(trans, BTREE_ID_bucket_gens, &g.k_i, 0)); bch2_trans_put(trans); bch_err_fn(c, ret); return ret; } int bch2_alloc_read(struct bch_fs *c) { down_read(&c->state_lock); struct btree_trans *trans = bch2_trans_get(c); struct bch_dev *ca = NULL; int ret; if (c->sb.version_upgrade_complete >= bcachefs_metadata_version_bucket_gens) { ret = for_each_btree_key(trans, iter, BTREE_ID_bucket_gens, POS_MIN, BTREE_ITER_prefetch, k, ({ u64 start = bucket_gens_pos_to_alloc(k.k->p, 0).offset; u64 end = bucket_gens_pos_to_alloc(bpos_nosnap_successor(k.k->p), 0).offset; if (k.k->type != KEY_TYPE_bucket_gens) continue; ca = bch2_dev_iterate(c, ca, k.k->p.inode); /* * Not a fsck error because this is checked/repaired by * bch2_check_alloc_key() which runs later: */ if (!ca) { bch2_btree_iter_set_pos(trans, &iter, POS(k.k->p.inode + 1, 0)); continue; } const struct bch_bucket_gens *g = bkey_s_c_to_bucket_gens(k).v; for (u64 b = max_t(u64, ca->mi.first_bucket, start); b < min_t(u64, ca->mi.nbuckets, end); b++) *bucket_gen(ca, b) = g->gens[b & KEY_TYPE_BUCKET_GENS_MASK]; 0; })); } else { ret = for_each_btree_key(trans, iter, BTREE_ID_alloc, POS_MIN, BTREE_ITER_prefetch, k, ({ ca = bch2_dev_iterate(c, ca, k.k->p.inode); /* * Not a fsck error because this is checked/repaired by * bch2_check_alloc_key() which runs later: */ if (!ca) { bch2_btree_iter_set_pos(trans, &iter, POS(k.k->p.inode + 1, 0)); continue; } if (k.k->p.offset < ca->mi.first_bucket) { bch2_btree_iter_set_pos(trans, &iter, POS(k.k->p.inode, ca->mi.first_bucket)); continue; } if (k.k->p.offset >= ca->mi.nbuckets) { bch2_btree_iter_set_pos(trans, &iter, POS(k.k->p.inode + 1, 0)); continue; } struct bch_alloc_v4 a; *bucket_gen(ca, k.k->p.offset) = bch2_alloc_to_v4(k, &a)->gen; 0; })); } bch2_dev_put(ca); bch2_trans_put(trans); up_read(&c->state_lock); 
bch_err_fn(c, ret); return ret; } /* Free space/discard btree: */ static int __need_discard_or_freespace_err(struct btree_trans *trans, struct bkey_s_c alloc_k, bool set, bool discard, bool repair) { struct bch_fs *c = trans->c; enum bch_fsck_flags flags = FSCK_CAN_IGNORE|(repair ? FSCK_CAN_FIX : 0); enum bch_sb_error_id err_id = discard ? BCH_FSCK_ERR_need_discard_key_wrong : BCH_FSCK_ERR_freespace_key_wrong; enum btree_id btree = discard ? BTREE_ID_need_discard : BTREE_ID_freespace; struct printbuf buf = PRINTBUF; bch2_bkey_val_to_text(&buf, c, alloc_k); int ret = __bch2_fsck_err(NULL, trans, flags, err_id, "bucket incorrectly %sset in %s btree\n%s", set ? "" : "un", bch2_btree_id_str(btree), buf.buf); if (bch2_err_matches(ret, BCH_ERR_fsck_ignore) || bch2_err_matches(ret, BCH_ERR_fsck_errors_not_fixed)) ret = 0; printbuf_exit(&buf); return ret; } #define need_discard_or_freespace_err(...) \ fsck_err_wrap(__need_discard_or_freespace_err(__VA_ARGS__)) #define need_discard_or_freespace_err_on(cond, ...) \ (unlikely(cond) ? need_discard_or_freespace_err(__VA_ARGS__) : false) static int bch2_bucket_do_index(struct btree_trans *trans, struct bch_dev *ca, struct bkey_s_c alloc_k, const struct bch_alloc_v4 *a, bool set) { enum btree_id btree; struct bpos pos; if (a->data_type != BCH_DATA_free && a->data_type != BCH_DATA_need_discard) return 0; switch (a->data_type) { case BCH_DATA_free: btree = BTREE_ID_freespace; pos = alloc_freespace_pos(alloc_k.k->p, *a); break; case BCH_DATA_need_discard: btree = BTREE_ID_need_discard; pos = alloc_k.k->p; break; default: return 0; } struct btree_iter iter; struct bkey_s_c old = bch2_bkey_get_iter(trans, &iter, btree, pos, BTREE_ITER_intent); int ret = bkey_err(old); if (ret) return ret; need_discard_or_freespace_err_on(ca->mi.freespace_initialized && !old.k->type != set, trans, alloc_k, set, btree == BTREE_ID_need_discard, false); ret = bch2_btree_bit_mod_iter(trans, &iter, set); fsck_err: bch2_trans_iter_exit(trans, &iter); return ret; } static noinline int bch2_bucket_gen_update(struct btree_trans *trans, struct bpos bucket, u8 gen) { struct btree_iter iter; unsigned offset; struct bpos pos = alloc_gens_pos(bucket, &offset); struct bkey_i_bucket_gens *g; struct bkey_s_c k; int ret; g = bch2_trans_kmalloc(trans, sizeof(*g)); ret = PTR_ERR_OR_ZERO(g); if (ret) return ret; k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_bucket_gens, pos, BTREE_ITER_intent| BTREE_ITER_with_updates); ret = bkey_err(k); if (ret) return ret; if (k.k->type != KEY_TYPE_bucket_gens) { bkey_bucket_gens_init(&g->k_i); g->k.p = iter.pos; } else { bkey_reassemble(&g->k_i, k); } g->v.gens[offset] = gen; ret = bch2_trans_update(trans, &iter, &g->k_i, 0); bch2_trans_iter_exit(trans, &iter); return ret; } static inline int bch2_dev_data_type_accounting_mod(struct btree_trans *trans, struct bch_dev *ca, enum bch_data_type data_type, s64 delta_buckets, s64 delta_sectors, s64 delta_fragmented, unsigned flags) { s64 d[3] = { delta_buckets, delta_sectors, delta_fragmented }; return bch2_disk_accounting_mod2(trans, flags & BTREE_TRIGGER_gc, d, dev_data_type, .dev = ca->dev_idx, .data_type = data_type); } int bch2_alloc_key_to_dev_counters(struct btree_trans *trans, struct bch_dev *ca, const struct bch_alloc_v4 *old, const struct bch_alloc_v4 *new, unsigned flags) { s64 old_sectors = bch2_bucket_sectors(*old); s64 new_sectors = bch2_bucket_sectors(*new); if (old->data_type != new->data_type) { int ret = bch2_dev_data_type_accounting_mod(trans, ca, new->data_type, 1, new_sectors, 
bch2_bucket_sectors_fragmented(ca, *new), flags) ?: bch2_dev_data_type_accounting_mod(trans, ca, old->data_type, -1, -old_sectors, -bch2_bucket_sectors_fragmented(ca, *old), flags); if (ret) return ret; } else if (old_sectors != new_sectors) { int ret = bch2_dev_data_type_accounting_mod(trans, ca, new->data_type, 0, new_sectors - old_sectors, bch2_bucket_sectors_fragmented(ca, *new) - bch2_bucket_sectors_fragmented(ca, *old), flags); if (ret) return ret; } s64 old_unstriped = bch2_bucket_sectors_unstriped(*old); s64 new_unstriped = bch2_bucket_sectors_unstriped(*new); if (old_unstriped != new_unstriped) { int ret = bch2_dev_data_type_accounting_mod(trans, ca, BCH_DATA_unstriped, !!new_unstriped - !!old_unstriped, new_unstriped - old_unstriped, 0, flags); if (ret) return ret; } return 0; } int bch2_trigger_alloc(struct btree_trans *trans, enum btree_id btree, unsigned level, struct bkey_s_c old, struct bkey_s new, enum btree_iter_update_trigger_flags flags) { struct bch_fs *c = trans->c; struct printbuf buf = PRINTBUF; int ret = 0; struct bch_dev *ca = bch2_dev_bucket_tryget(c, new.k->p); if (!ca) return bch_err_throw(c, trigger_alloc); struct bch_alloc_v4 old_a_convert; const struct bch_alloc_v4 *old_a = bch2_alloc_to_v4(old, &old_a_convert); struct bch_alloc_v4 *new_a; if (likely(new.k->type == KEY_TYPE_alloc_v4)) { new_a = bkey_s_to_alloc_v4(new).v; } else { BUG_ON(!(flags & (BTREE_TRIGGER_gc|BTREE_TRIGGER_check_repair))); struct bkey_i_alloc_v4 *new_ka = bch2_alloc_to_v4_mut_inlined(trans, new.s_c); ret = PTR_ERR_OR_ZERO(new_ka); if (unlikely(ret)) goto err; new_a = &new_ka->v; } if (flags & BTREE_TRIGGER_transactional) { alloc_data_type_set(new_a, new_a->data_type); int is_empty_delta = (int) data_type_is_empty(new_a->data_type) - (int) data_type_is_empty(old_a->data_type); if (is_empty_delta < 0) { new_a->io_time[READ] = bch2_current_io_time(c, READ); new_a->io_time[WRITE]= bch2_current_io_time(c, WRITE); SET_BCH_ALLOC_V4_NEED_INC_GEN(new_a, true); SET_BCH_ALLOC_V4_NEED_DISCARD(new_a, true); } if (data_type_is_empty(new_a->data_type) && BCH_ALLOC_V4_NEED_INC_GEN(new_a) && !bch2_bucket_is_open_safe(c, new.k->p.inode, new.k->p.offset)) { if (new_a->oldest_gen == new_a->gen && !bch2_bucket_sectors_total(*new_a)) new_a->oldest_gen++; new_a->gen++; SET_BCH_ALLOC_V4_NEED_INC_GEN(new_a, false); alloc_data_type_set(new_a, new_a->data_type); } if (old_a->data_type != new_a->data_type || (new_a->data_type == BCH_DATA_free && alloc_freespace_genbits(*old_a) != alloc_freespace_genbits(*new_a))) { ret = bch2_bucket_do_index(trans, ca, old, old_a, false) ?: bch2_bucket_do_index(trans, ca, new.s_c, new_a, true); if (ret) goto err; } if (new_a->data_type == BCH_DATA_cached && !new_a->io_time[READ]) new_a->io_time[READ] = bch2_current_io_time(c, READ); ret = bch2_lru_change(trans, new.k->p.inode, bucket_to_u64(new.k->p), alloc_lru_idx_read(*old_a), alloc_lru_idx_read(*new_a)); if (ret) goto err; ret = bch2_lru_change(trans, BCH_LRU_BUCKET_FRAGMENTATION, bucket_to_u64(new.k->p), alloc_lru_idx_fragmentation(*old_a, ca), alloc_lru_idx_fragmentation(*new_a, ca)); if (ret) goto err; if (old_a->gen != new_a->gen) { ret = bch2_bucket_gen_update(trans, new.k->p, new_a->gen); if (ret) goto err; } ret = bch2_alloc_key_to_dev_counters(trans, ca, old_a, new_a, flags); if (ret) goto err; } if ((flags & BTREE_TRIGGER_atomic) && (flags & BTREE_TRIGGER_insert)) { u64 transaction_seq = trans->journal_res.seq; BUG_ON(!transaction_seq); if (log_fsck_err_on(transaction_seq && new_a->journal_seq_nonempty > 
transaction_seq, trans, alloc_key_journal_seq_in_future, "bucket journal seq in future (currently at %llu)\n%s", journal_cur_seq(&c->journal), (bch2_bkey_val_to_text(&buf, c, new.s_c), buf.buf))) new_a->journal_seq_nonempty = transaction_seq; int is_empty_delta = (int) data_type_is_empty(new_a->data_type) - (int) data_type_is_empty(old_a->data_type); /* * Record journal sequence number of empty -> nonempty transition: * Note that there may be multiple empty -> nonempty * transitions, data in a bucket may be overwritten while we're * still writing to it - so be careful to only record the first: * */ if (is_empty_delta < 0 && new_a->journal_seq_empty <= c->journal.flushed_seq_ondisk) { new_a->journal_seq_nonempty = transaction_seq; new_a->journal_seq_empty = 0; } /* * Bucket becomes empty: mark it as waiting for a journal flush, * unless updates since empty -> nonempty transition were never * flushed - we may need to ask the journal not to flush * intermediate sequence numbers: */ if (is_empty_delta > 0) { if (new_a->journal_seq_nonempty == transaction_seq || bch2_journal_noflush_seq(&c->journal, new_a->journal_seq_nonempty, transaction_seq)) { new_a->journal_seq_nonempty = new_a->journal_seq_empty = 0; } else { new_a->journal_seq_empty = transaction_seq; ret = bch2_set_bucket_needs_journal_commit(&c->buckets_waiting_for_journal, c->journal.flushed_seq_ondisk, new.k->p.inode, new.k->p.offset, transaction_seq); if (bch2_fs_fatal_err_on(ret, c, "setting bucket_needs_journal_commit: %s", bch2_err_str(ret))) goto err; } } if (new_a->gen != old_a->gen) { guard(rcu)(); u8 *gen = bucket_gen(ca, new.k->p.offset); if (unlikely(!gen)) goto invalid_bucket; *gen = new_a->gen; } #define eval_state(_a, expr) ({ const struct bch_alloc_v4 *a = _a; expr; }) #define statechange(expr) !eval_state(old_a, expr) && eval_state(new_a, expr) #define bucket_flushed(a) (a->journal_seq_empty <= c->journal.flushed_seq_ondisk) if (statechange(a->data_type == BCH_DATA_free) && bucket_flushed(new_a)) closure_wake_up(&c->freelist_wait); if (statechange(a->data_type == BCH_DATA_need_discard) && !bch2_bucket_is_open_safe(c, new.k->p.inode, new.k->p.offset) && bucket_flushed(new_a)) bch2_discard_one_bucket_fast(ca, new.k->p.offset); if (statechange(a->data_type == BCH_DATA_cached) && !bch2_bucket_is_open(c, new.k->p.inode, new.k->p.offset) && should_invalidate_buckets(ca, bch2_dev_usage_read(ca))) bch2_dev_do_invalidates(ca); if (statechange(a->data_type == BCH_DATA_need_gc_gens)) bch2_gc_gens_async(c); } if ((flags & BTREE_TRIGGER_gc) && (flags & BTREE_TRIGGER_insert)) { guard(rcu)(); struct bucket *g = gc_bucket(ca, new.k->p.offset); if (unlikely(!g)) goto invalid_bucket; g->gen_valid = 1; g->gen = new_a->gen; } err: fsck_err: printbuf_exit(&buf); bch2_dev_put(ca); return ret; invalid_bucket: bch2_fs_inconsistent(c, "reference to invalid bucket\n%s", (bch2_bkey_val_to_text(&buf, c, new.s_c), buf.buf)); ret = bch_err_throw(c, trigger_alloc); goto err; } /* * This synthesizes deleted extents for holes, similar to BTREE_ITER_slots for * extents style btrees, but works on non-extents btrees: */ static struct bkey_s_c bch2_get_key_or_hole(struct btree_trans *trans, struct btree_iter *iter, struct bpos end, struct bkey *hole) { struct bkey_s_c k = bch2_btree_iter_peek_slot(trans, iter); if (bkey_err(k)) return k; if (k.k->type) { return k; } else { struct btree_iter iter2; struct bpos next; bch2_trans_copy_iter(trans, &iter2, iter); struct btree_path *path = btree_iter_path(trans, iter); if (!bpos_eq(path->l[0].b->key.k.p, 
SPOS_MAX)) end = bkey_min(end, bpos_nosnap_successor(path->l[0].b->key.k.p)); end = bkey_min(end, POS(iter->pos.inode, iter->pos.offset + U32_MAX - 1)); /* * btree node min/max is a closed interval, upto takes a half * open interval: */ k = bch2_btree_iter_peek_max(trans, &iter2, end); next = iter2.pos; bch2_trans_iter_exit(trans, &iter2); BUG_ON(next.offset >= iter->pos.offset + U32_MAX); if (bkey_err(k)) return k; bkey_init(hole); hole->p = iter->pos; bch2_key_resize(hole, next.offset - iter->pos.offset); return (struct bkey_s_c) { hole, NULL }; } } static bool next_bucket(struct bch_fs *c, struct bch_dev **ca, struct bpos *bucket) { if (*ca) { if (bucket->offset < (*ca)->mi.first_bucket) bucket->offset = (*ca)->mi.first_bucket; if (bucket->offset < (*ca)->mi.nbuckets) return true; bch2_dev_put(*ca); *ca = NULL; bucket->inode++; bucket->offset = 0; } guard(rcu)(); *ca = __bch2_next_dev_idx(c, bucket->inode, NULL); if (*ca) { *bucket = POS((*ca)->dev_idx, (*ca)->mi.first_bucket); bch2_dev_get(*ca); } return *ca != NULL; } static struct bkey_s_c bch2_get_key_or_real_bucket_hole(struct btree_trans *trans, struct btree_iter *iter, struct bch_dev **ca, struct bkey *hole) { struct bch_fs *c = trans->c; struct bkey_s_c k; again: k = bch2_get_key_or_hole(trans, iter, POS_MAX, hole); if (bkey_err(k)) return k; *ca = bch2_dev_iterate_noerror(c, *ca, k.k->p.inode); if (!k.k->type) { struct bpos hole_start = bkey_start_pos(k.k); if (!*ca || !bucket_valid(*ca, hole_start.offset)) { if (!next_bucket(c, ca, &hole_start)) return bkey_s_c_null; bch2_btree_iter_set_pos(trans, iter, hole_start); goto again; } if (k.k->p.offset > (*ca)->mi.nbuckets) bch2_key_resize(hole, (*ca)->mi.nbuckets - hole_start.offset); } return k; } static noinline_for_stack int bch2_check_alloc_key(struct btree_trans *trans, struct bkey_s_c alloc_k, struct btree_iter *alloc_iter, struct btree_iter *discard_iter, struct btree_iter *freespace_iter, struct btree_iter *bucket_gens_iter) { struct bch_fs *c = trans->c; struct bch_alloc_v4 a_convert; const struct bch_alloc_v4 *a; unsigned gens_offset; struct bkey_s_c k; struct printbuf buf = PRINTBUF; int ret = 0; struct bch_dev *ca = bch2_dev_bucket_tryget_noerror(c, alloc_k.k->p); if (fsck_err_on(!ca, trans, alloc_key_to_missing_dev_bucket, "alloc key for invalid device:bucket %llu:%llu", alloc_k.k->p.inode, alloc_k.k->p.offset)) ret = bch2_btree_delete_at(trans, alloc_iter, 0); if (!ca) return ret; if (!ca->mi.freespace_initialized) goto out; a = bch2_alloc_to_v4(alloc_k, &a_convert); bch2_btree_iter_set_pos(trans, discard_iter, alloc_k.k->p); k = bch2_btree_iter_peek_slot(trans, discard_iter); ret = bkey_err(k); if (ret) goto err; bool is_discarded = a->data_type == BCH_DATA_need_discard; if (need_discard_or_freespace_err_on(!!k.k->type != is_discarded, trans, alloc_k, !is_discarded, true, true)) { ret = bch2_btree_bit_mod_iter(trans, discard_iter, is_discarded); if (ret) goto err; } bch2_btree_iter_set_pos(trans, freespace_iter, alloc_freespace_pos(alloc_k.k->p, *a)); k = bch2_btree_iter_peek_slot(trans, freespace_iter); ret = bkey_err(k); if (ret) goto err; bool is_free = a->data_type == BCH_DATA_free; if (need_discard_or_freespace_err_on(!!k.k->type != is_free, trans, alloc_k, !is_free, false, true)) { ret = bch2_btree_bit_mod_iter(trans, freespace_iter, is_free); if (ret) goto err; } bch2_btree_iter_set_pos(trans, bucket_gens_iter, alloc_gens_pos(alloc_k.k->p, &gens_offset)); k = bch2_btree_iter_peek_slot(trans, bucket_gens_iter); ret = bkey_err(k); if (ret) goto err; if 
(fsck_err_on(a->gen != alloc_gen(k, gens_offset), trans, bucket_gens_key_wrong, "incorrect gen in bucket_gens btree (got %u should be %u)\n%s", alloc_gen(k, gens_offset), a->gen, (printbuf_reset(&buf), bch2_bkey_val_to_text(&buf, c, alloc_k), buf.buf))) { struct bkey_i_bucket_gens *g = bch2_trans_kmalloc(trans, sizeof(*g)); ret = PTR_ERR_OR_ZERO(g); if (ret) goto err; if (k.k->type == KEY_TYPE_bucket_gens) { bkey_reassemble(&g->k_i, k); } else { bkey_bucket_gens_init(&g->k_i); g->k.p = alloc_gens_pos(alloc_k.k->p, &gens_offset); } g->v.gens[gens_offset] = a->gen; ret = bch2_trans_update(trans, bucket_gens_iter, &g->k_i, 0); if (ret) goto err; } out: err: fsck_err: bch2_dev_put(ca); printbuf_exit(&buf); return ret; } static noinline_for_stack int bch2_check_alloc_hole_freespace(struct btree_trans *trans, struct bch_dev *ca, struct bpos start, struct bpos *end, struct btree_iter *freespace_iter) { struct bkey_s_c k; struct printbuf buf = PRINTBUF; int ret; if (!ca->mi.freespace_initialized) return 0; bch2_btree_iter_set_pos(trans, freespace_iter, start); k = bch2_btree_iter_peek_slot(trans, freespace_iter); ret = bkey_err(k); if (ret) goto err; *end = bkey_min(k.k->p, *end); if (fsck_err_on(k.k->type != KEY_TYPE_set, trans, freespace_hole_missing, "hole in alloc btree missing in freespace btree\n" "device %llu buckets %llu-%llu", freespace_iter->pos.inode, freespace_iter->pos.offset, end->offset)) { struct bkey_i *update = bch2_trans_kmalloc(trans, sizeof(*update)); ret = PTR_ERR_OR_ZERO(update); if (ret) goto err; bkey_init(&update->k); update->k.type = KEY_TYPE_set; update->k.p = freespace_iter->pos; bch2_key_resize(&update->k, min_t(u64, U32_MAX, end->offset - freespace_iter->pos.offset)); ret = bch2_trans_update(trans, freespace_iter, update, 0); if (ret) goto err; } err: fsck_err: printbuf_exit(&buf); return ret; } static noinline_for_stack int bch2_check_alloc_hole_bucket_gens(struct btree_trans *trans, struct bpos start, struct bpos *end, struct btree_iter *bucket_gens_iter) { struct bkey_s_c k; struct printbuf buf = PRINTBUF; unsigned i, gens_offset, gens_end_offset; int ret; bch2_btree_iter_set_pos(trans, bucket_gens_iter, alloc_gens_pos(start, &gens_offset)); k = bch2_btree_iter_peek_slot(trans, bucket_gens_iter); ret = bkey_err(k); if (ret) goto err; if (bkey_cmp(alloc_gens_pos(start, &gens_offset), alloc_gens_pos(*end, &gens_end_offset))) gens_end_offset = KEY_TYPE_BUCKET_GENS_NR; if (k.k->type == KEY_TYPE_bucket_gens) { struct bkey_i_bucket_gens g; bool need_update = false; bkey_reassemble(&g.k_i, k); for (i = gens_offset; i < gens_end_offset; i++) { if (fsck_err_on(g.v.gens[i], trans, bucket_gens_hole_wrong, "hole in alloc btree at %llu:%llu with nonzero gen in bucket_gens btree (%u)", bucket_gens_pos_to_alloc(k.k->p, i).inode, bucket_gens_pos_to_alloc(k.k->p, i).offset, g.v.gens[i])) { g.v.gens[i] = 0; need_update = true; } } if (need_update) { struct bkey_i *u = bch2_trans_kmalloc(trans, sizeof(g)); ret = PTR_ERR_OR_ZERO(u); if (ret) goto err; memcpy(u, &g, sizeof(g)); ret = bch2_trans_update(trans, bucket_gens_iter, u, 0); if (ret) goto err; } } *end = bkey_min(*end, bucket_gens_pos_to_alloc(bpos_nosnap_successor(k.k->p), 0)); err: fsck_err: printbuf_exit(&buf); return ret; } struct check_discard_freespace_key_async { struct work_struct work; struct bch_fs *c; struct bbpos pos; }; static int bch2_recheck_discard_freespace_key(struct btree_trans *trans, struct bbpos pos) { struct btree_iter iter; struct bkey_s_c k = bch2_bkey_get_iter(trans, &iter, pos.btree, pos.pos, 0); int 
ret = bkey_err(k); if (ret) return ret; u8 gen; ret = k.k->type != KEY_TYPE_set ? bch2_check_discard_freespace_key(trans, &iter, &gen, false) : 0; bch2_trans_iter_exit(trans, &iter); return ret; } static void check_discard_freespace_key_work(struct work_struct *work) { struct check_discard_freespace_key_async *w = container_of(work, struct check_discard_freespace_key_async, work); bch2_trans_do(w->c, bch2_recheck_discard_freespace_key(trans, w->pos)); enumerated_ref_put(&w->c->writes, BCH_WRITE_REF_check_discard_freespace_key); kfree(w); } int bch2_check_discard_freespace_key(struct btree_trans *trans, struct btree_iter *iter, u8 *gen, bool async_repair) { struct bch_fs *c = trans->c; enum bch_data_type state = iter->btree_id == BTREE_ID_need_discard ? BCH_DATA_need_discard : BCH_DATA_free; struct printbuf buf = PRINTBUF; unsigned fsck_flags = (async_repair ? FSCK_ERR_NO_LOG : 0)| FSCK_CAN_FIX|FSCK_CAN_IGNORE; struct bpos bucket = iter->pos; bucket.offset &= ~(~0ULL << 56); u64 genbits = iter->pos.offset & (~0ULL << 56); struct btree_iter alloc_iter; struct bkey_s_c alloc_k = bch2_bkey_get_iter(trans, &alloc_iter, BTREE_ID_alloc, bucket, async_repair ? BTREE_ITER_cached : 0); int ret = bkey_err(alloc_k); if (ret) return ret; if (!bch2_dev_bucket_exists(c, bucket)) { if (__fsck_err(trans, fsck_flags, need_discard_freespace_key_to_invalid_dev_bucket, "entry in %s btree for nonexistant dev:bucket %llu:%llu", bch2_btree_id_str(iter->btree_id), bucket.inode, bucket.offset)) goto delete; ret = 1; goto out; } struct bch_alloc_v4 a_convert; const struct bch_alloc_v4 *a = bch2_alloc_to_v4(alloc_k, &a_convert); if (a->data_type != state || (state == BCH_DATA_free && genbits != alloc_freespace_genbits(*a))) { if (__fsck_err(trans, fsck_flags, need_discard_freespace_key_bad, "%s\nincorrectly set at %s:%llu:%llu:0 (free %u, genbits %llu should be %llu)", (bch2_bkey_val_to_text(&buf, c, alloc_k), buf.buf), bch2_btree_id_str(iter->btree_id), iter->pos.inode, iter->pos.offset, a->data_type == state, genbits >> 56, alloc_freespace_genbits(*a) >> 56)) goto delete; ret = 1; goto out; } *gen = a->gen; out: fsck_err: bch2_set_btree_iter_dontneed(trans, &alloc_iter); bch2_trans_iter_exit(trans, &alloc_iter); printbuf_exit(&buf); return ret; delete: if (!async_repair) { ret = bch2_btree_bit_mod_iter(trans, iter, false) ?: bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc) ?: bch_err_throw(c, transaction_restart_commit); goto out; } else { /* * We can't repair here when called from the allocator path: the * commit will recurse back into the allocator */ struct check_discard_freespace_key_async *w = kzalloc(sizeof(*w), GFP_KERNEL); if (!w) goto out; if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_check_discard_freespace_key)) { kfree(w); goto out; } INIT_WORK(&w->work, check_discard_freespace_key_work); w->c = c; w->pos = BBPOS(iter->btree_id, iter->pos); queue_work(c->write_ref_wq, &w->work); ret = 1; /* don't allocate from this bucket */ goto out; } } static int bch2_check_discard_freespace_key_fsck(struct btree_trans *trans, struct btree_iter *iter) { u8 gen; int ret = bch2_check_discard_freespace_key(trans, iter, &gen, false); return ret < 0 ? ret : 0; } /* * We've already checked that generation numbers in the bucket_gens btree are * valid for buckets that exist; this just checks for keys for nonexistent * buckets. 
*/ static noinline_for_stack int bch2_check_bucket_gens_key(struct btree_trans *trans, struct btree_iter *iter, struct bkey_s_c k) { struct bch_fs *c = trans->c; struct bkey_i_bucket_gens g; u64 start = bucket_gens_pos_to_alloc(k.k->p, 0).offset; u64 end = bucket_gens_pos_to_alloc(bpos_nosnap_successor(k.k->p), 0).offset; u64 b; bool need_update = false; struct printbuf buf = PRINTBUF; int ret = 0; BUG_ON(k.k->type != KEY_TYPE_bucket_gens); bkey_reassemble(&g.k_i, k); struct bch_dev *ca = bch2_dev_tryget_noerror(c, k.k->p.inode); if (!ca) { if (fsck_err(trans, bucket_gens_to_invalid_dev, "bucket_gens key for invalid device:\n%s", (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) ret = bch2_btree_delete_at(trans, iter, 0); goto out; } if (fsck_err_on(end <= ca->mi.first_bucket || start >= ca->mi.nbuckets, trans, bucket_gens_to_invalid_buckets, "bucket_gens key for invalid buckets:\n%s", (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { ret = bch2_btree_delete_at(trans, iter, 0); goto out; } for (b = start; b < ca->mi.first_bucket; b++) if (fsck_err_on(g.v.gens[b & KEY_TYPE_BUCKET_GENS_MASK], trans, bucket_gens_nonzero_for_invalid_buckets, "bucket_gens key has nonzero gen for invalid bucket")) { g.v.gens[b & KEY_TYPE_BUCKET_GENS_MASK] = 0; need_update = true; } for (b = ca->mi.nbuckets; b < end; b++) if (fsck_err_on(g.v.gens[b & KEY_TYPE_BUCKET_GENS_MASK], trans, bucket_gens_nonzero_for_invalid_buckets, "bucket_gens key has nonzero gen for invalid bucket")) { g.v.gens[b & KEY_TYPE_BUCKET_GENS_MASK] = 0; need_update = true; } if (need_update) { struct bkey_i *u = bch2_trans_kmalloc(trans, sizeof(g)); ret = PTR_ERR_OR_ZERO(u); if (ret) goto out; memcpy(u, &g, sizeof(g)); ret = bch2_trans_update(trans, iter, u, 0); } out: fsck_err: bch2_dev_put(ca); printbuf_exit(&buf); return ret; } int bch2_check_alloc_info(struct bch_fs *c) { struct btree_trans *trans = bch2_trans_get(c); struct btree_iter iter, discard_iter, freespace_iter, bucket_gens_iter; struct bch_dev *ca = NULL; struct bkey hole; struct bkey_s_c k; int ret = 0; bch2_trans_iter_init(trans, &iter, BTREE_ID_alloc, POS_MIN, BTREE_ITER_prefetch); bch2_trans_iter_init(trans, &discard_iter, BTREE_ID_need_discard, POS_MIN, BTREE_ITER_prefetch); bch2_trans_iter_init(trans, &freespace_iter, BTREE_ID_freespace, POS_MIN, BTREE_ITER_prefetch); bch2_trans_iter_init(trans, &bucket_gens_iter, BTREE_ID_bucket_gens, POS_MIN, BTREE_ITER_prefetch); while (1) { struct bpos next; bch2_trans_begin(trans); k = bch2_get_key_or_real_bucket_hole(trans, &iter, &ca, &hole); ret = bkey_err(k); if (ret) goto bkey_err; if (!k.k) break; if (k.k->type) { next = bpos_nosnap_successor(k.k->p); ret = bch2_check_alloc_key(trans, k, &iter, &discard_iter, &freespace_iter, &bucket_gens_iter); if (ret) goto bkey_err; } else { next = k.k->p; ret = bch2_check_alloc_hole_freespace(trans, ca, bkey_start_pos(k.k), &next, &freespace_iter) ?: bch2_check_alloc_hole_bucket_gens(trans, bkey_start_pos(k.k), &next, &bucket_gens_iter); if (ret) goto bkey_err; } ret = bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); if (ret) goto bkey_err; bch2_btree_iter_set_pos(trans, &iter, next); bkey_err: if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) continue; if (ret) break; } bch2_trans_iter_exit(trans, &bucket_gens_iter); bch2_trans_iter_exit(trans, &freespace_iter); bch2_trans_iter_exit(trans, &discard_iter); bch2_trans_iter_exit(trans, &iter); bch2_dev_put(ca); ca = NULL; if (ret < 0) goto err; ret = for_each_btree_key(trans, iter, BTREE_ID_need_discard, POS_MIN, 
BTREE_ITER_prefetch, k, bch2_check_discard_freespace_key_fsck(trans, &iter)); if (ret) goto err; bch2_trans_iter_init(trans, &iter, BTREE_ID_freespace, POS_MIN, BTREE_ITER_prefetch); while (1) { bch2_trans_begin(trans); k = bch2_btree_iter_peek(trans, &iter); if (!k.k) break; ret = bkey_err(k) ?: bch2_check_discard_freespace_key_fsck(trans, &iter); if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) { ret = 0; continue; } if (ret) { struct printbuf buf = PRINTBUF; bch2_bkey_val_to_text(&buf, c, k); bch_err(c, "while checking %s", buf.buf); printbuf_exit(&buf); break; } bch2_btree_iter_set_pos(trans, &iter, bpos_nosnap_successor(iter.pos)); } bch2_trans_iter_exit(trans, &iter); if (ret) goto err; ret = for_each_btree_key_commit(trans, iter, BTREE_ID_bucket_gens, POS_MIN, BTREE_ITER_prefetch, k, NULL, NULL, BCH_TRANS_COMMIT_no_enospc, bch2_check_bucket_gens_key(trans, &iter, k)); err: bch2_trans_put(trans); bch_err_fn(c, ret); return ret; } static int bch2_check_alloc_to_lru_ref(struct btree_trans *trans, struct btree_iter *alloc_iter, struct bkey_buf *last_flushed) { struct bch_fs *c = trans->c; struct bch_alloc_v4 a_convert; const struct bch_alloc_v4 *a; struct bkey_s_c alloc_k; struct printbuf buf = PRINTBUF; int ret; alloc_k = bch2_btree_iter_peek(trans, alloc_iter); if (!alloc_k.k) return 0; ret = bkey_err(alloc_k); if (ret) return ret; struct bch_dev *ca = bch2_dev_tryget_noerror(c, alloc_k.k->p.inode); if (!ca) return 0; a = bch2_alloc_to_v4(alloc_k, &a_convert); u64 lru_idx = alloc_lru_idx_fragmentation(*a, ca); if (lru_idx) { ret = bch2_lru_check_set(trans, BCH_LRU_BUCKET_FRAGMENTATION, bucket_to_u64(alloc_k.k->p), lru_idx, alloc_k, last_flushed); if (ret) goto err; } if (a->data_type != BCH_DATA_cached) goto err; if (fsck_err_on(!a->io_time[READ], trans, alloc_key_cached_but_read_time_zero, "cached bucket with read_time 0\n%s", (printbuf_reset(&buf), bch2_bkey_val_to_text(&buf, c, alloc_k), buf.buf))) { struct bkey_i_alloc_v4 *a_mut = bch2_alloc_to_v4_mut(trans, alloc_k); ret = PTR_ERR_OR_ZERO(a_mut); if (ret) goto err; a_mut->v.io_time[READ] = bch2_current_io_time(c, READ); ret = bch2_trans_update(trans, alloc_iter, &a_mut->k_i, BTREE_TRIGGER_norun); if (ret) goto err; a = &a_mut->v; } ret = bch2_lru_check_set(trans, alloc_k.k->p.inode, bucket_to_u64(alloc_k.k->p), a->io_time[READ], alloc_k, last_flushed); if (ret) goto err; err: fsck_err: bch2_dev_put(ca); printbuf_exit(&buf); return ret; } int bch2_check_alloc_to_lru_refs(struct bch_fs *c) { struct bkey_buf last_flushed; bch2_bkey_buf_init(&last_flushed); bkey_init(&last_flushed.k->k); int ret = bch2_trans_run(c, for_each_btree_key_commit(trans, iter, BTREE_ID_alloc, POS_MIN, BTREE_ITER_prefetch, k, NULL, NULL, BCH_TRANS_COMMIT_no_enospc, bch2_check_alloc_to_lru_ref(trans, &iter, &last_flushed))) ?: bch2_check_stripe_to_lru_refs(c); bch2_bkey_buf_exit(&last_flushed, c); bch_err_fn(c, ret); return ret; } static int discard_in_flight_add(struct bch_dev *ca, u64 bucket, bool in_progress) { struct bch_fs *c = ca->fs; int ret; mutex_lock(&ca->discard_buckets_in_flight_lock); struct discard_in_flight *i = darray_find_p(ca->discard_buckets_in_flight, i, i->bucket == bucket); if (i) { ret = bch_err_throw(c, EEXIST_discard_in_flight_add); goto out; } ret = darray_push(&ca->discard_buckets_in_flight, ((struct discard_in_flight) { .in_progress = in_progress, .bucket = bucket, })); out: mutex_unlock(&ca->discard_buckets_in_flight_lock); return ret; } static void discard_in_flight_remove(struct bch_dev *ca, u64 bucket) { 
mutex_lock(&ca->discard_buckets_in_flight_lock); struct discard_in_flight *i = darray_find_p(ca->discard_buckets_in_flight, i, i->bucket == bucket); BUG_ON(!i || !i->in_progress); darray_remove_item(&ca->discard_buckets_in_flight, i); mutex_unlock(&ca->discard_buckets_in_flight_lock); } struct discard_buckets_state { u64 seen; u64 open; u64 need_journal_commit; u64 discarded; }; static int bch2_discard_one_bucket(struct btree_trans *trans, struct bch_dev *ca, struct btree_iter *need_discard_iter, struct bpos *discard_pos_done, struct discard_buckets_state *s, bool fastpath) { struct bch_fs *c = trans->c; struct bpos pos = need_discard_iter->pos; struct btree_iter iter = {}; struct bkey_s_c k; struct bkey_i_alloc_v4 *a; struct printbuf buf = PRINTBUF; bool discard_locked = false; int ret = 0; if (bch2_bucket_is_open_safe(c, pos.inode, pos.offset)) { s->open++; goto out; } u64 seq_ready = bch2_bucket_journal_seq_ready(&c->buckets_waiting_for_journal, pos.inode, pos.offset); if (seq_ready > c->journal.flushed_seq_ondisk) { if (seq_ready > c->journal.flushing_seq) s->need_journal_commit++; goto out; } k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_alloc, need_discard_iter->pos, BTREE_ITER_cached); ret = bkey_err(k); if (ret) goto out; a = bch2_alloc_to_v4_mut(trans, k); ret = PTR_ERR_OR_ZERO(a); if (ret) goto out; if (a->v.data_type != BCH_DATA_need_discard) { if (need_discard_or_freespace_err(trans, k, true, true, true)) { ret = bch2_btree_bit_mod_iter(trans, need_discard_iter, false); if (ret) goto out; goto commit; } goto out; } if (!fastpath) { if (discard_in_flight_add(ca, iter.pos.offset, true)) goto out; discard_locked = true; } if (!bkey_eq(*discard_pos_done, iter.pos)) { s->discarded++; *discard_pos_done = iter.pos; if (bch2_discard_opt_enabled(c, ca) && !c->opts.nochanges) { /* * This works without any other locks because this is the only * thread that removes items from the need_discard tree */ bch2_trans_unlock_long(trans); blkdev_issue_discard(ca->disk_sb.bdev, k.k->p.offset * ca->mi.bucket_size, ca->mi.bucket_size, GFP_KERNEL); ret = bch2_trans_relock_notrace(trans); if (ret) goto out; } } SET_BCH_ALLOC_V4_NEED_DISCARD(&a->v, false); alloc_data_type_set(&a->v, a->v.data_type); ret = bch2_trans_update(trans, &iter, &a->k_i, 0); if (ret) goto out; commit: ret = bch2_trans_commit(trans, NULL, NULL, BCH_WATERMARK_btree| BCH_TRANS_COMMIT_no_enospc); if (ret) goto out; if (!fastpath) count_event(c, bucket_discard); else count_event(c, bucket_discard_fast); out: fsck_err: if (discard_locked) discard_in_flight_remove(ca, iter.pos.offset); if (!ret) s->seen++; bch2_trans_iter_exit(trans, &iter); printbuf_exit(&buf); return ret; } static void bch2_do_discards_work(struct work_struct *work) { struct bch_dev *ca = container_of(work, struct bch_dev, discard_work); struct bch_fs *c = ca->fs; struct discard_buckets_state s = {}; struct bpos discard_pos_done = POS_MAX; int ret; /* * We're doing the commit in bch2_discard_one_bucket instead of using * for_each_btree_key_commit() so that we can increment counters after * successful commit: */ ret = bch2_trans_run(c, for_each_btree_key_max(trans, iter, BTREE_ID_need_discard, POS(ca->dev_idx, 0), POS(ca->dev_idx, U64_MAX), 0, k, bch2_discard_one_bucket(trans, ca, &iter, &discard_pos_done, &s, false))); if (s.need_journal_commit > dev_buckets_available(ca, BCH_WATERMARK_normal)) bch2_journal_flush_async(&c->journal, NULL); trace_discard_buckets(c, s.seen, s.open, s.need_journal_commit, s.discarded, bch2_err_str(ret)); 
enumerated_ref_put(&ca->io_ref[WRITE], BCH_DEV_WRITE_REF_dev_do_discards); enumerated_ref_put(&c->writes, BCH_WRITE_REF_discard); } void bch2_dev_do_discards(struct bch_dev *ca) { struct bch_fs *c = ca->fs; if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_discard)) return; if (!bch2_dev_get_ioref(c, ca->dev_idx, WRITE, BCH_DEV_WRITE_REF_dev_do_discards)) goto put_write_ref; if (queue_work(c->write_ref_wq, &ca->discard_work)) return; enumerated_ref_put(&ca->io_ref[WRITE], BCH_DEV_WRITE_REF_dev_do_discards); put_write_ref: enumerated_ref_put(&c->writes, BCH_WRITE_REF_discard); } void bch2_do_discards(struct bch_fs *c) { for_each_member_device(c, ca) bch2_dev_do_discards(ca); } static int bch2_do_discards_fast_one(struct btree_trans *trans, struct bch_dev *ca, u64 bucket, struct bpos *discard_pos_done, struct discard_buckets_state *s) { struct btree_iter need_discard_iter; struct bkey_s_c discard_k = bch2_bkey_get_iter(trans, &need_discard_iter, BTREE_ID_need_discard, POS(ca->dev_idx, bucket), 0); int ret = bkey_err(discard_k); if (ret) return ret; if (log_fsck_err_on(discard_k.k->type != KEY_TYPE_set, trans, discarding_bucket_not_in_need_discard_btree, "attempting to discard bucket %u:%llu not in need_discard btree", ca->dev_idx, bucket)) goto out; ret = bch2_discard_one_bucket(trans, ca, &need_discard_iter, discard_pos_done, s, true); out: fsck_err: bch2_trans_iter_exit(trans, &need_discard_iter); return ret; } static void bch2_do_discards_fast_work(struct work_struct *work) { struct bch_dev *ca = container_of(work, struct bch_dev, discard_fast_work); struct bch_fs *c = ca->fs; struct discard_buckets_state s = {}; struct bpos discard_pos_done = POS_MAX; struct btree_trans *trans = bch2_trans_get(c); int ret = 0; while (1) { bool got_bucket = false; u64 bucket; mutex_lock(&ca->discard_buckets_in_flight_lock); darray_for_each(ca->discard_buckets_in_flight, i) { if (i->in_progress) continue; got_bucket = true; bucket = i->bucket; i->in_progress = true; break; } mutex_unlock(&ca->discard_buckets_in_flight_lock); if (!got_bucket) break; ret = lockrestart_do(trans, bch2_do_discards_fast_one(trans, ca, bucket, &discard_pos_done, &s)); bch_err_fn(c, ret); discard_in_flight_remove(ca, bucket); if (ret) break; } trace_discard_buckets_fast(c, s.seen, s.open, s.need_journal_commit, s.discarded, bch2_err_str(ret)); bch2_trans_put(trans); enumerated_ref_put(&ca->io_ref[WRITE], BCH_DEV_WRITE_REF_discard_one_bucket_fast); enumerated_ref_put(&c->writes, BCH_WRITE_REF_discard_fast); } static void bch2_discard_one_bucket_fast(struct bch_dev *ca, u64 bucket) { struct bch_fs *c = ca->fs; if (discard_in_flight_add(ca, bucket, false)) return; if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_discard_fast)) return; if (!bch2_dev_get_ioref(c, ca->dev_idx, WRITE, BCH_DEV_WRITE_REF_discard_one_bucket_fast)) goto put_ref; if (queue_work(c->write_ref_wq, &ca->discard_fast_work)) return; enumerated_ref_put(&ca->io_ref[WRITE], BCH_DEV_WRITE_REF_discard_one_bucket_fast); put_ref: enumerated_ref_put(&c->writes, BCH_WRITE_REF_discard_fast); } static int invalidate_one_bp(struct btree_trans *trans, struct bch_dev *ca, struct bkey_s_c_backpointer bp, struct bkey_buf *last_flushed) { struct btree_iter extent_iter; struct bkey_s_c extent_k = bch2_backpointer_get_key(trans, bp, &extent_iter, 0, last_flushed); int ret = bkey_err(extent_k); if (ret) return ret; if (!extent_k.k) return 0; struct bkey_i *n = bch2_bkey_make_mut(trans, &extent_iter, &extent_k, BTREE_UPDATE_internal_snapshot_node); ret = PTR_ERR_OR_ZERO(n); if 
(ret) goto err; bch2_bkey_drop_device(bkey_i_to_s(n), ca->dev_idx); err: bch2_trans_iter_exit(trans, &extent_iter); return ret; } static int invalidate_one_bucket_by_bps(struct btree_trans *trans, struct bch_dev *ca, struct bpos bucket, u8 gen, struct bkey_buf *last_flushed) { struct bpos bp_start = bucket_pos_to_bp_start(ca, bucket); struct bpos bp_end = bucket_pos_to_bp_end(ca, bucket); return for_each_btree_key_max_commit(trans, iter, BTREE_ID_backpointers, bp_start, bp_end, 0, k, NULL, NULL, BCH_WATERMARK_btree| BCH_TRANS_COMMIT_no_enospc, ({ if (k.k->type != KEY_TYPE_backpointer) continue; struct bkey_s_c_backpointer bp = bkey_s_c_to_backpointer(k); if (bp.v->bucket_gen != gen) continue; /* filter out bps with gens that don't match */ invalidate_one_bp(trans, ca, bp, last_flushed); })); } noinline_for_stack static int invalidate_one_bucket(struct btree_trans *trans, struct bch_dev *ca, struct btree_iter *lru_iter, struct bkey_s_c lru_k, struct bkey_buf *last_flushed, s64 *nr_to_invalidate) { struct bch_fs *c = trans->c; struct printbuf buf = PRINTBUF; struct bpos bucket = u64_to_bucket(lru_k.k->p.offset); struct btree_iter alloc_iter = {}; int ret = 0; if (*nr_to_invalidate <= 0) return 1; if (!bch2_dev_bucket_exists(c, bucket)) { if (fsck_err(trans, lru_entry_to_invalid_bucket, "lru key points to nonexistent device:bucket %llu:%llu", bucket.inode, bucket.offset)) return bch2_btree_bit_mod_buffered(trans, BTREE_ID_lru, lru_iter->pos, false); goto out; } if (bch2_bucket_is_open_safe(c, bucket.inode, bucket.offset)) return 0; struct bkey_s_c alloc_k = bch2_bkey_get_iter(trans, &alloc_iter, BTREE_ID_alloc, bucket, BTREE_ITER_cached); ret = bkey_err(alloc_k); if (ret) return ret; struct bch_alloc_v4 a_convert; const struct bch_alloc_v4 *a = bch2_alloc_to_v4(alloc_k, &a_convert); /* We expect harmless races here due to the btree write buffer: */ if (lru_pos_time(lru_iter->pos) != alloc_lru_idx_read(*a)) goto out; /* * Impossible since alloc_lru_idx_read() only returns nonzero if the * bucket is supposed to be on the cached bucket LRU (i.e. 
* BCH_DATA_cached) * * bch2_lru_validate() also disallows lru keys with lru_pos_time() == 0 */ BUG_ON(a->data_type != BCH_DATA_cached); BUG_ON(a->dirty_sectors); if (!a->cached_sectors) { bch2_check_bucket_backpointer_mismatch(trans, ca, bucket.offset, true, last_flushed); goto out; } unsigned cached_sectors = a->cached_sectors; u8 gen = a->gen; ret = invalidate_one_bucket_by_bps(trans, ca, bucket, gen, last_flushed); if (ret) goto out; trace_and_count(c, bucket_invalidate, c, bucket.inode, bucket.offset, cached_sectors); --*nr_to_invalidate; out: fsck_err: bch2_trans_iter_exit(trans, &alloc_iter); printbuf_exit(&buf); return ret; } static struct bkey_s_c next_lru_key(struct btree_trans *trans, struct btree_iter *iter, struct bch_dev *ca, bool *wrapped) { struct bkey_s_c k; again: k = bch2_btree_iter_peek_max(trans, iter, lru_pos(ca->dev_idx, U64_MAX, LRU_TIME_MAX)); if (!k.k && !*wrapped) { bch2_btree_iter_set_pos(trans, iter, lru_pos(ca->dev_idx, 0, 0)); *wrapped = true; goto again; } return k; } static void bch2_do_invalidates_work(struct work_struct *work) { struct bch_dev *ca = container_of(work, struct bch_dev, invalidate_work); struct bch_fs *c = ca->fs; struct btree_trans *trans = bch2_trans_get(c); int ret = 0; struct bkey_buf last_flushed; bch2_bkey_buf_init(&last_flushed); bkey_init(&last_flushed.k->k); ret = bch2_btree_write_buffer_tryflush(trans); if (ret) goto err; s64 nr_to_invalidate = should_invalidate_buckets(ca, bch2_dev_usage_read(ca)); struct btree_iter iter; bool wrapped = false; bch2_trans_iter_init(trans, &iter, BTREE_ID_lru, lru_pos(ca->dev_idx, 0, ((bch2_current_io_time(c, READ) + U32_MAX) & LRU_TIME_MAX)), 0); while (true) { bch2_trans_begin(trans); struct bkey_s_c k = next_lru_key(trans, &iter, ca, &wrapped); ret = bkey_err(k); if (ret) goto restart_err; if (!k.k) break; ret = invalidate_one_bucket(trans, ca, &iter, k, &last_flushed, &nr_to_invalidate); restart_err: if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) continue; if (ret) break; bch2_btree_iter_advance(trans, &iter); } bch2_trans_iter_exit(trans, &iter); err: bch2_trans_put(trans); bch2_bkey_buf_exit(&last_flushed, c); enumerated_ref_put(&ca->io_ref[WRITE], BCH_DEV_WRITE_REF_do_invalidates); enumerated_ref_put(&c->writes, BCH_WRITE_REF_invalidate); } void bch2_dev_do_invalidates(struct bch_dev *ca) { struct bch_fs *c = ca->fs; if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_invalidate)) return; if (!bch2_dev_get_ioref(c, ca->dev_idx, WRITE, BCH_DEV_WRITE_REF_do_invalidates)) goto put_ref; if (queue_work(c->write_ref_wq, &ca->invalidate_work)) return; enumerated_ref_put(&ca->io_ref[WRITE], BCH_DEV_WRITE_REF_do_invalidates); put_ref: enumerated_ref_put(&c->writes, BCH_WRITE_REF_invalidate); } void bch2_do_invalidates(struct bch_fs *c) { for_each_member_device(c, ca) bch2_dev_do_invalidates(ca); } int bch2_dev_freespace_init(struct bch_fs *c, struct bch_dev *ca, u64 bucket_start, u64 bucket_end) { struct btree_trans *trans = bch2_trans_get(c); struct btree_iter iter; struct bkey_s_c k; struct bkey hole; struct bpos end = POS(ca->dev_idx, bucket_end); struct bch_member *m; unsigned long last_updated = jiffies; int ret; BUG_ON(bucket_start > bucket_end); BUG_ON(bucket_end > ca->mi.nbuckets); bch2_trans_iter_init(trans, &iter, BTREE_ID_alloc, POS(ca->dev_idx, max_t(u64, ca->mi.first_bucket, bucket_start)), BTREE_ITER_prefetch); /* * Scan the alloc btree for every bucket on @ca, and add buckets to the * freespace/need_discard/need_gc_gens btrees as needed: */ while (1) { if (time_after(jiffies, 
last_updated + HZ * 10)) { bch_info(ca, "%s: currently at %llu/%llu", __func__, iter.pos.offset, ca->mi.nbuckets); last_updated = jiffies; } bch2_trans_begin(trans); if (bkey_ge(iter.pos, end)) { ret = 0; break; } k = bch2_get_key_or_hole(trans, &iter, end, &hole); ret = bkey_err(k); if (ret) goto bkey_err; if (k.k->type) { /* * We process live keys in the alloc btree one at a * time: */ struct bch_alloc_v4 a_convert; const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &a_convert); ret = bch2_bucket_do_index(trans, ca, k, a, true) ?: bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); if (ret) goto bkey_err; bch2_btree_iter_advance(trans, &iter); } else { struct bkey_i *freespace; freespace = bch2_trans_kmalloc(trans, sizeof(*freespace)); ret = PTR_ERR_OR_ZERO(freespace); if (ret) goto bkey_err; bkey_init(&freespace->k); freespace->k.type = KEY_TYPE_set; freespace->k.p = k.k->p; freespace->k.size = k.k->size; ret = bch2_btree_insert_trans(trans, BTREE_ID_freespace, freespace, 0) ?: bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); if (ret) goto bkey_err; bch2_btree_iter_set_pos(trans, &iter, k.k->p); } bkey_err: if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) continue; if (ret) break; } bch2_trans_iter_exit(trans, &iter); bch2_trans_put(trans); if (ret < 0) { bch_err_msg(ca, ret, "initializing free space"); return ret; } mutex_lock(&c->sb_lock); m = bch2_members_v2_get_mut(c->disk_sb.sb, ca->dev_idx); SET_BCH_MEMBER_FREESPACE_INITIALIZED(m, true); mutex_unlock(&c->sb_lock); return 0; } int bch2_fs_freespace_init(struct bch_fs *c) { if (c->sb.features & BIT_ULL(BCH_FEATURE_small_image)) return 0; /* * We can crash during the device add path, so we need to check this on * every mount: */ bool doing_init = false; for_each_member_device(c, ca) { if (ca->mi.freespace_initialized) continue; if (!doing_init) { bch_info(c, "initializing freespace"); doing_init = true; } int ret = bch2_dev_freespace_init(c, ca, 0, ca->mi.nbuckets); if (ret) { bch2_dev_put(ca); bch_err_fn(c, ret); return ret; } } if (doing_init) { mutex_lock(&c->sb_lock); bch2_write_super(c); mutex_unlock(&c->sb_lock); bch_verbose(c, "done initializing freespace"); } return 0; } /* device removal */ int bch2_dev_remove_alloc(struct bch_fs *c, struct bch_dev *ca) { struct bpos start = POS(ca->dev_idx, 0); struct bpos end = POS(ca->dev_idx, U64_MAX); int ret; /* * We clear the LRU and need_discard btrees first so that we don't race * with bch2_do_invalidates() and bch2_do_discards() */ ret = bch2_btree_delete_range(c, BTREE_ID_lru, start, end, BTREE_TRIGGER_norun, NULL) ?: bch2_btree_delete_range(c, BTREE_ID_need_discard, start, end, BTREE_TRIGGER_norun, NULL) ?: bch2_btree_delete_range(c, BTREE_ID_freespace, start, end, BTREE_TRIGGER_norun, NULL) ?: bch2_btree_delete_range(c, BTREE_ID_backpointers, start, end, BTREE_TRIGGER_norun, NULL) ?: bch2_btree_delete_range(c, BTREE_ID_bucket_gens, start, end, BTREE_TRIGGER_norun, NULL) ?: bch2_btree_delete_range(c, BTREE_ID_alloc, start, end, BTREE_TRIGGER_norun, NULL) ?: bch2_dev_usage_remove(c, ca->dev_idx); bch_err_msg(ca, ret, "removing dev alloc info"); return ret; } /* Bucket IO clocks: */ static int __bch2_bucket_io_time_reset(struct btree_trans *trans, unsigned dev, size_t bucket_nr, int rw) { struct bch_fs *c = trans->c; struct btree_iter iter; struct bkey_i_alloc_v4 *a = bch2_trans_start_alloc_update_noupdate(trans, &iter, POS(dev, bucket_nr)); int ret = PTR_ERR_OR_ZERO(a); if (ret) return ret; u64 now = bch2_current_io_time(c, rw); if 
(a->v.io_time[rw] == now) goto out; a->v.io_time[rw] = now; ret = bch2_trans_update(trans, &iter, &a->k_i, 0) ?: bch2_trans_commit(trans, NULL, NULL, 0); out: bch2_trans_iter_exit(trans, &iter); return ret; } int bch2_bucket_io_time_reset(struct btree_trans *trans, unsigned dev, size_t bucket_nr, int rw) { if (bch2_trans_relock(trans)) bch2_trans_begin(trans); return nested_lockrestart_do(trans, __bch2_bucket_io_time_reset(trans, dev, bucket_nr, rw)); } /* Startup/shutdown (ro/rw): */ void bch2_recalc_capacity(struct bch_fs *c) { u64 capacity = 0, reserved_sectors = 0, gc_reserve; unsigned bucket_size_max = 0; unsigned long ra_pages = 0; lockdep_assert_held(&c->state_lock); guard(rcu)(); for_each_member_device_rcu(c, ca, NULL) { struct block_device *bdev = READ_ONCE(ca->disk_sb.bdev); if (bdev) ra_pages += bdev->bd_disk->bdi->ra_pages; if (ca->mi.state != BCH_MEMBER_STATE_rw) continue; u64 dev_reserve = 0; /* * We need to reserve buckets (from the number * of currently available buckets) against * foreground writes so that mainly copygc can * make forward progress. * * We need enough to refill the various reserves * from scratch - copygc will use its entire * reserve all at once, then run against when * its reserve is refilled (from the formerly * available buckets). * * This reserve is just used when considering if * allocations for foreground writes must wait - * not -ENOSPC calculations. */ dev_reserve += ca->nr_btree_reserve * 2; dev_reserve += ca->mi.nbuckets >> 6; /* copygc reserve */ dev_reserve += 1; /* btree write point */ dev_reserve += 1; /* copygc write point */ dev_reserve += 1; /* rebalance write point */ dev_reserve *= ca->mi.bucket_size; capacity += bucket_to_sector(ca, ca->mi.nbuckets - ca->mi.first_bucket); reserved_sectors += dev_reserve * 2; bucket_size_max = max_t(unsigned, bucket_size_max, ca->mi.bucket_size); } bch2_set_ra_pages(c, ra_pages); gc_reserve = c->opts.gc_reserve_bytes ? 
c->opts.gc_reserve_bytes >> 9 : div64_u64(capacity * c->opts.gc_reserve_percent, 100); reserved_sectors = max(gc_reserve, reserved_sectors); reserved_sectors = min(reserved_sectors, capacity); c->reserved = reserved_sectors; c->capacity = capacity - reserved_sectors; c->bucket_size_max = bucket_size_max; /* Wake up case someone was waiting for buckets */ closure_wake_up(&c->freelist_wait); } u64 bch2_min_rw_member_capacity(struct bch_fs *c) { u64 ret = U64_MAX; guard(rcu)(); for_each_rw_member_rcu(c, ca) ret = min(ret, ca->mi.nbuckets * ca->mi.bucket_size); return ret; } static bool bch2_dev_has_open_write_point(struct bch_fs *c, struct bch_dev *ca) { struct open_bucket *ob; for (ob = c->open_buckets; ob < c->open_buckets + ARRAY_SIZE(c->open_buckets); ob++) { scoped_guard(spinlock, &ob->lock) { if (ob->valid && !ob->on_partial_list && ob->dev == ca->dev_idx) return true; } } return false; } void bch2_dev_allocator_set_rw(struct bch_fs *c, struct bch_dev *ca, bool rw) { /* BCH_DATA_free == all rw devs */ for (unsigned i = 0; i < ARRAY_SIZE(c->rw_devs); i++) if (rw && (i == BCH_DATA_free || (ca->mi.data_allowed & BIT(i)))) set_bit(ca->dev_idx, c->rw_devs[i].d); else clear_bit(ca->dev_idx, c->rw_devs[i].d); } /* device goes ro: */ void bch2_dev_allocator_remove(struct bch_fs *c, struct bch_dev *ca) { lockdep_assert_held(&c->state_lock); /* First, remove device from allocation groups: */ bch2_dev_allocator_set_rw(c, ca, false); c->rw_devs_change_count++; /* * Capacity is calculated based off of devices in allocation groups: */ bch2_recalc_capacity(c); bch2_open_buckets_stop(c, ca, false); /* * Wake up threads that were blocked on allocation, so they can notice * the device can no longer be removed and the capacity has changed: */ closure_wake_up(&c->freelist_wait); /* * journal_res_get() can block waiting for free space in the journal - * it needs to notice there may not be devices to allocate from anymore: */ wake_up(&c->journal.wait); /* Now wait for any in flight writes: */ closure_wait_event(&c->open_buckets_wait, !bch2_dev_has_open_write_point(c, ca)); } /* device goes rw: */ void bch2_dev_allocator_add(struct bch_fs *c, struct bch_dev *ca) { lockdep_assert_held(&c->state_lock); bch2_dev_allocator_set_rw(c, ca, true); c->rw_devs_change_count++; } void bch2_dev_allocator_background_exit(struct bch_dev *ca) { darray_exit(&ca->discard_buckets_in_flight); } void bch2_dev_allocator_background_init(struct bch_dev *ca) { mutex_init(&ca->discard_buckets_in_flight_lock); INIT_WORK(&ca->discard_work, bch2_do_discards_work); INIT_WORK(&ca->discard_fast_work, bch2_do_discards_fast_work); INIT_WORK(&ca->invalidate_work, bch2_do_invalidates_work); } void bch2_fs_allocator_background_init(struct bch_fs *c) { spin_lock_init(&c->freelist_lock); } |
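/*
 * Illustrative sketch (not from the allocator code above): the shape of the
 * reserve arithmetic in bch2_recalc_capacity(), redone as a standalone
 * userspace program with made-up numbers for one hypothetical rw device.
 * All values and variable names here are examples only; only the formula
 * mirrors the code above (btree reserve * 2, a 1/64 copygc reserve, three
 * write points, everything doubled, then clamped against the gc reserve and
 * the raw capacity).
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* hypothetical device: 1M buckets of 512 sectors each */
	uint64_t nbuckets		= 1ULL << 20;
	uint64_t first_bucket		= 16;
	uint64_t bucket_size		= 512;	/* in sectors */
	uint64_t nr_btree_reserve	= 64;
	uint64_t gc_reserve_percent	= 8;	/* assumed default */

	uint64_t dev_reserve = nr_btree_reserve * 2	/* btree reserve */
			     + (nbuckets >> 6)		/* copygc reserve */
			     + 3;			/* btree/copygc/rebalance write points */
	dev_reserve *= bucket_size;

	uint64_t capacity	  = (nbuckets - first_bucket) * bucket_size;
	uint64_t reserved_sectors = dev_reserve * 2;
	uint64_t gc_reserve	  = capacity * gc_reserve_percent / 100;

	if (reserved_sectors < gc_reserve)
		reserved_sectors = gc_reserve;
	if (reserved_sectors > capacity)
		reserved_sectors = capacity;

	printf("usable capacity: %llu sectors\n",
	       (unsigned long long)(capacity - reserved_sectors));
	return 0;
}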
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __IPC_NAMESPACE_H__
#define __IPC_NAMESPACE_H__

#include <linux/err.h>
#include <linux/idr.h>
#include <linux/rwsem.h>
#include <linux/notifier.h>
#include <linux/nsproxy.h>
#include <linux/ns_common.h>
#include <linux/refcount.h>
#include <linux/rhashtable-types.h>
#include <linux/sysctl.h>
#include <linux/percpu_counter.h>

struct user_namespace;

struct ipc_ids {
	int in_use;
	unsigned short seq;
	struct rw_semaphore rwsem;
	struct idr ipcs_idr;
	int max_idx;
	int last_idx;	/* For wrap around detection */
#ifdef CONFIG_CHECKPOINT_RESTORE
	int next_id;
#endif
	struct rhashtable key_ht;
};

struct ipc_namespace {
	struct ipc_ids	ids[3];

	int		sem_ctls[4];
	int		used_sems;

	unsigned int	msg_ctlmax;
	unsigned int	msg_ctlmnb;
	unsigned int	msg_ctlmni;
	struct percpu_counter percpu_msg_bytes;
	struct percpu_counter percpu_msg_hdrs;

	size_t		shm_ctlmax;
	size_t		shm_ctlall;
	unsigned long	shm_tot;
	int		shm_ctlmni;
	/*
	 * Defines whether IPC_RMID is forced for _all_ shm segments regardless
	 * of shmctl()
	 */
	int		shm_rmid_forced;

	struct notifier_block ipcns_nb;

	/* The kern_mount of the mqueuefs sb.  We take a ref on it */
	struct vfsmount	*mq_mnt;

	/* # queues in this ns, protected by mq_lock */
	unsigned int	mq_queues_count;

	/* next fields are set through sysctl */
	unsigned int	mq_queues_max;		/* initialized to DFLT_QUEUESMAX */
	unsigned int	mq_msg_max;		/* initialized to DFLT_MSGMAX */
	unsigned int	mq_msgsize_max;		/* initialized to DFLT_MSGSIZEMAX */
	unsigned int	mq_msg_default;
	unsigned int	mq_msgsize_default;

	struct ctl_table_set	mq_set;
	struct ctl_table_header	*mq_sysctls;

	struct ctl_table_set	ipc_set;
	struct ctl_table_header	*ipc_sysctls;

	/* user_ns which owns the ipc ns */
	struct user_namespace *user_ns;
	struct ucounts *ucounts;

	struct llist_node mnt_llist;

	struct ns_common ns;
} __randomize_layout;

extern struct ipc_namespace init_ipc_ns;
extern spinlock_t mq_lock;

#ifdef CONFIG_SYSVIPC
extern void shm_destroy_orphaned(struct ipc_namespace *ns);
#else /* CONFIG_SYSVIPC */
static inline void shm_destroy_orphaned(struct ipc_namespace *ns) {}
#endif /* CONFIG_SYSVIPC */

#ifdef CONFIG_POSIX_MQUEUE
extern int mq_init_ns(struct ipc_namespace *ns);
/*
 * POSIX Message Queue default values:
 *
 * MIN_*: Lowest value an admin can set the maximum unprivileged limit to
 * DFLT_*MAX: Default values for the maximum unprivileged limits
 * DFLT_{MSG,MSGSIZE}: Default values used when the user doesn't supply
 *   an attribute to the open call and the queue must be created
 * HARD_*: Highest value the maximums can be set to.  These are enforced
 *   on CAP_SYS_RESOURCE apps as well making them inviolate (so make them
 *   suitably high)
 *
 * POSIX Requirements:
 *   Per app minimum openable message queues - 8.  This does not map well
 *   to the fact that we limit the number of queues on a per namespace
 *   basis instead of a per app basis.  So, make the default high enough
 *   that no given app should have a hard time opening 8 queues.
 *   Minimum maximum for HARD_MSGMAX - 32767.  I bumped this to 65536.
 *   Minimum maximum for HARD_MSGSIZEMAX - POSIX is silent on this.  However,
 *   we have run into a situation where running applications in the wild
 *   require this to be at least 5MB, and preferably 10MB, so I set the
 *   value to 16MB in hopes that this user is the worst of the bunch and
 *   the new maximum will handle anyone else.  I may have to revisit this
 *   in the future.
 */
#define DFLT_QUEUESMAX	256
#define MIN_MSGMAX	1
#define DFLT_MSG	10U
#define DFLT_MSGMAX	10
#define HARD_MSGMAX	65536
#define MIN_MSGSIZEMAX	128
#define DFLT_MSGSIZE	8192U
#define DFLT_MSGSIZEMAX	8192
#define HARD_MSGSIZEMAX	(16*1024*1024)
#else
static inline int mq_init_ns(struct ipc_namespace *ns) { return 0; }
#endif

#if defined(CONFIG_IPC_NS)
extern struct ipc_namespace *copy_ipcs(unsigned long flags,
	struct user_namespace *user_ns, struct ipc_namespace *ns);

static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
{
	if (ns)
		refcount_inc(&ns->ns.count);
	return ns;
}

static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
{
	if (ns) {
		if (refcount_inc_not_zero(&ns->ns.count))
			return ns;
	}

	return NULL;
}

extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
	struct user_namespace *user_ns, struct ipc_namespace *ns)
{
	if (flags & CLONE_NEWIPC)
		return ERR_PTR(-EINVAL);

	return ns;
}

static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
{
	return ns;
}

static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
{
	return ns;
}

static inline void put_ipc_ns(struct ipc_namespace *ns)
{
}
#endif

#ifdef CONFIG_POSIX_MQUEUE_SYSCTL
void retire_mq_sysctls(struct ipc_namespace *ns);
bool setup_mq_sysctls(struct ipc_namespace *ns);
#else /* CONFIG_POSIX_MQUEUE_SYSCTL */
static inline void retire_mq_sysctls(struct ipc_namespace *ns) { }
static inline bool setup_mq_sysctls(struct ipc_namespace *ns)
{
	return true;
}
#endif /* CONFIG_POSIX_MQUEUE_SYSCTL */

#ifdef CONFIG_SYSVIPC_SYSCTL
bool setup_ipc_sysctls(struct ipc_namespace *ns);
void retire_ipc_sysctls(struct ipc_namespace *ns);
#else /* CONFIG_SYSVIPC_SYSCTL */
static inline void retire_ipc_sysctls(struct ipc_namespace *ns) { }
static inline bool setup_ipc_sysctls(struct ipc_namespace *ns)
{
	return true;
}
#endif /* CONFIG_SYSVIPC_SYSCTL */

#endif
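/*
 * Illustrative sketch (not part of the header above): how these limits look
 * from userspace.  An unprivileged mq_open() may ask for up to the
 * per-namespace sysctls (fs.mqueue.msg_max / fs.mqueue.msgsize_max, which
 * start out at DFLT_MSGMAX and DFLT_MSGSIZEMAX and are capped by the HARD_*
 * values); exceeding them without CAP_SYS_RESOURCE fails with EINVAL.  The
 * queue name and attribute values below are examples only; link with -lrt
 * on older glibc.
 */
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>

int example_open_queue(void)
{
	struct mq_attr attr = {
		.mq_maxmsg  = 10,	/* within the DFLT_MSGMAX default */
		.mq_msgsize = 8192,	/* within the DFLT_MSGSIZEMAX default */
	};

	/* O_CREAT with explicit attributes; a NULL attr would use the defaults */
	mqd_t q = mq_open("/example-queue", O_CREAT | O_RDWR, 0600, &attr);
	if (q == (mqd_t)-1) {
		perror("mq_open");
		return -1;
	}

	mq_close(q);
	return 0;
}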
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef ASM_X86_SERPENT_SSE2_H
#define ASM_X86_SERPENT_SSE2_H

#include <linux/crypto.h>
#include <crypto/serpent.h>

#ifdef CONFIG_X86_32

#define SERPENT_PARALLEL_BLOCKS 4

asmlinkage void __serpent_enc_blk_4way(const struct serpent_ctx *ctx, u8 *dst,
				       const u8 *src, bool xor);
asmlinkage void serpent_dec_blk_4way(const struct serpent_ctx *ctx, u8 *dst,
				     const u8 *src);

static inline void serpent_enc_blk_xway(const void *ctx, u8 *dst, const u8 *src)
{
	__serpent_enc_blk_4way(ctx, dst, src, false);
}

static inline void serpent_enc_blk_xway_xor(const struct serpent_ctx *ctx,
					    u8 *dst, const u8 *src)
{
	__serpent_enc_blk_4way(ctx, dst, src, true);
}

static inline void serpent_dec_blk_xway(const void *ctx, u8 *dst, const u8 *src)
{
	serpent_dec_blk_4way(ctx, dst, src);
}

#else

#define SERPENT_PARALLEL_BLOCKS 8

asmlinkage void __serpent_enc_blk_8way(const struct serpent_ctx *ctx, u8 *dst,
				       const u8 *src, bool xor);
asmlinkage void serpent_dec_blk_8way(const struct serpent_ctx *ctx, u8 *dst,
				     const u8 *src);

static inline void serpent_enc_blk_xway(const void *ctx, u8 *dst, const u8 *src)
{
	__serpent_enc_blk_8way(ctx, dst, src, false);
}

static inline void serpent_enc_blk_xway_xor(const struct serpent_ctx *ctx,
					    u8 *dst, const u8 *src)
{
	__serpent_enc_blk_8way(ctx, dst, src, true);
}

static inline void serpent_dec_blk_xway(const void *ctx, u8 *dst, const u8 *src)
{
	serpent_dec_blk_8way(ctx, dst, src);
}

#endif

#endif
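/*
 * Illustrative sketch (not part of the header above): the xway helpers
 * process SERPENT_PARALLEL_BLOCKS contiguous blocks per call (4 on x86_32,
 * 8 on x86_64), with src and dst laid out back to back.  The single-block
 * fallback __serpent_encrypt() and SERPENT_BLOCK_SIZE are assumed to come
 * from <crypto/serpent.h>; treat this as a usage outline rather than
 * actual glue code.
 */
static void example_serpent_encrypt_blocks(struct serpent_ctx *ctx,
					   u8 *dst, const u8 *src,
					   unsigned int nblocks)
{
	/* full groups go through the parallel (SSE2) path in one call */
	while (nblocks >= SERPENT_PARALLEL_BLOCKS) {
		serpent_enc_blk_xway(ctx, dst, src);
		src += SERPENT_PARALLEL_BLOCKS * SERPENT_BLOCK_SIZE;
		dst += SERPENT_PARALLEL_BLOCKS * SERPENT_BLOCK_SIZE;
		nblocks -= SERPENT_PARALLEL_BLOCKS;
	}

	/* any tail is handled one block at a time */
	while (nblocks--) {
		__serpent_encrypt(ctx, dst, src);
		src += SERPENT_BLOCK_SIZE;
		dst += SERPENT_BLOCK_SIZE;
	}
}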
// SPDX-License-Identifier: GPL-2.0
/*
 * comedi/drivers/pcl812.c
 *
 * Author: Michal Dobes <dobes@tesnet.cz>
 *
 * hardware driver for Advantech cards
 *  card: PCL-812, PCL-812PG, PCL-813, PCL-813B
 *  driver: pcl812, pcl812pg, pcl813, pcl813b
 * and for ADlink cards
 *  card: ACL-8112DG, ACL-8112HG, ACL-8112PG, ACL-8113, ACL-8216
 *  driver: acl8112dg, acl8112hg, acl8112pg, acl8113, acl8216
 * and for ICP DAS cards
 *  card: ISO-813, A-821PGH, A-821PGL, A-821PGL-NDA, A-822PGH, A-822PGL,
 *  driver: iso813, a821pgh, a-821pgl, a-821pglnda, a822pgh, a822pgl,
 *  card: A-823PGH, A-823PGL, A-826PG
 *  driver: a823pgh, a823pgl, a826pg
 */

/*
 * Driver: pcl812
 * Description: Advantech PCL-812/PG, PCL-813/B,
 *   ADLink ACL-8112DG/HG/PG, ACL-8113, ACL-8216,
 *   ICP DAS A-821PGH/PGL/PGL-NDA, A-822PGH/PGL, A-823PGH/PGL, A-826PG,
 *   ICP DAS ISO-813
 * Author: Michal Dobes <dobes@tesnet.cz>
 * Devices: [Advantech] PCL-812 (pcl812), PCL-812PG (pcl812pg),
 *   PCL-813 (pcl813), PCL-813B (pcl813b), [ADLink] ACL-8112DG (acl8112dg),
 *   ACL-8112HG (acl8112hg), ACL-8113 (acl-8113), ACL-8216 (acl8216),
 *   [ICP] ISO-813 (iso813), A-821PGH (a821pgh), A-821PGL (a821pgl),
 *   A-821PGL-NDA (a821pglnda), A-822PGH (a822pgh), A-822PGL (a822pgl),
 *   A-823PGH (a823pgh), A-823PGL (a823pgl), A-826PG (a826pg)
 * Updated: Mon, 06 Aug 2007 12:03:15 +0100
 * Status: works (I hope. My board fired up under my hands
 *   and I can't test all features.)
* * This driver supports insn and cmd interfaces. Some boards support only insn * because their hardware don't allow more (PCL-813/B, ACL-8113, ISO-813). * Data transfer over DMA is supported only when you measure only one * channel, this is too hardware limitation of these boards. * * Options for PCL-812: * [0] - IO Base * [1] - IRQ (0=disable, 2, 3, 4, 5, 6, 7; 10, 11, 12, 14, 15) * [2] - DMA (0=disable, 1, 3) * [3] - 0=trigger source is internal 8253 with 2MHz clock * 1=trigger source is external * [4] - 0=A/D input range is +/-10V * 1=A/D input range is +/-5V * 2=A/D input range is +/-2.5V * 3=A/D input range is +/-1.25V * 4=A/D input range is +/-0.625V * 5=A/D input range is +/-0.3125V * [5] - 0=D/A outputs 0-5V (internal reference -5V) * 1=D/A outputs 0-10V (internal reference -10V) * 2=D/A outputs unknown (external reference) * * Options for PCL-812PG, ACL-8112PG: * [0] - IO Base * [1] - IRQ (0=disable, 2, 3, 4, 5, 6, 7; 10, 11, 12, 14, 15) * [2] - DMA (0=disable, 1, 3) * [3] - 0=trigger source is internal 8253 with 2MHz clock * 1=trigger source is external * [4] - 0=A/D have max +/-5V input * 1=A/D have max +/-10V input * [5] - 0=D/A outputs 0-5V (internal reference -5V) * 1=D/A outputs 0-10V (internal reference -10V) * 2=D/A outputs unknown (external reference) * * Options for ACL-8112DG/HG, A-822PGL/PGH, A-823PGL/PGH, ACL-8216, A-826PG: * [0] - IO Base * [1] - IRQ (0=disable, 2, 3, 4, 5, 6, 7; 10, 11, 12, 14, 15) * [2] - DMA (0=disable, 1, 3) * [3] - 0=trigger source is internal 8253 with 2MHz clock * 1=trigger source is external * [4] - 0=A/D channels are S.E. * 1=A/D channels are DIFF * [5] - 0=D/A outputs 0-5V (internal reference -5V) * 1=D/A outputs 0-10V (internal reference -10V) * 2=D/A outputs unknown (external reference) * * Options for A-821PGL/PGH: * [0] - IO Base * [1] - IRQ (0=disable, 2, 3, 4, 5, 6, 7) * [2] - 0=A/D channels are S.E. * 1=A/D channels are DIFF * [3] - 0=D/A output 0-5V (internal reference -5V) * 1=D/A output 0-10V (internal reference -10V) * * Options for A-821PGL-NDA: * [0] - IO Base * [1] - IRQ (0=disable, 2, 3, 4, 5, 6, 7) * [2] - 0=A/D channels are S.E. 
* 1=A/D channels are DIFF * * Options for PCL-813: * [0] - IO Base * * Options for PCL-813B: * [0] - IO Base * [1] - 0= bipolar inputs * 1= unipolar inputs * * Options for ACL-8113, ISO-813: * [0] - IO Base * [1] - 0= 10V bipolar inputs * 1= 10V unipolar inputs * 2= 20V bipolar inputs * 3= 20V unipolar inputs */ #include <linux/module.h> #include <linux/interrupt.h> #include <linux/gfp.h> #include <linux/delay.h> #include <linux/io.h> #include <linux/comedi/comedidev.h> #include <linux/comedi/comedi_8254.h> #include <linux/comedi/comedi_isadma.h> /* * Register I/O map */ #define PCL812_TIMER_BASE 0x00 #define PCL812_AI_LSB_REG 0x04 #define PCL812_AI_MSB_REG 0x05 #define PCL812_AI_MSB_DRDY BIT(4) #define PCL812_AO_LSB_REG(x) (0x04 + ((x) * 2)) #define PCL812_AO_MSB_REG(x) (0x05 + ((x) * 2)) #define PCL812_DI_LSB_REG 0x06 #define PCL812_DI_MSB_REG 0x07 #define PCL812_STATUS_REG 0x08 #define PCL812_STATUS_DRDY BIT(5) #define PCL812_RANGE_REG 0x09 #define PCL812_MUX_REG 0x0a #define PCL812_MUX_CHAN(x) ((x) << 0) #define PCL812_MUX_CS0 BIT(4) #define PCL812_MUX_CS1 BIT(5) #define PCL812_CTRL_REG 0x0b #define PCL812_CTRL_TRIG(x) (((x) & 0x7) << 0) #define PCL812_CTRL_DISABLE_TRIG PCL812_CTRL_TRIG(0) #define PCL812_CTRL_SOFT_TRIG PCL812_CTRL_TRIG(1) #define PCL812_CTRL_PACER_DMA_TRIG PCL812_CTRL_TRIG(2) #define PCL812_CTRL_PACER_EOC_TRIG PCL812_CTRL_TRIG(6) #define PCL812_SOFTTRIG_REG 0x0c #define PCL812_DO_LSB_REG 0x0d #define PCL812_DO_MSB_REG 0x0e #define MAX_CHANLIST_LEN 256 /* length of scan list */ static const struct comedi_lrange range_pcl812pg_ai = { 5, { BIP_RANGE(5), BIP_RANGE(2.5), BIP_RANGE(1.25), BIP_RANGE(0.625), BIP_RANGE(0.3125) } }; static const struct comedi_lrange range_pcl812pg2_ai = { 5, { BIP_RANGE(10), BIP_RANGE(5), BIP_RANGE(2.5), BIP_RANGE(1.25), BIP_RANGE(0.625) } }; static const struct comedi_lrange range812_bipolar1_25 = { 1, { BIP_RANGE(1.25) } }; static const struct comedi_lrange range812_bipolar0_625 = { 1, { BIP_RANGE(0.625) } }; static const struct comedi_lrange range812_bipolar0_3125 = { 1, { BIP_RANGE(0.3125) } }; static const struct comedi_lrange range_pcl813b_ai = { 4, { BIP_RANGE(5), BIP_RANGE(2.5), BIP_RANGE(1.25), BIP_RANGE(0.625) } }; static const struct comedi_lrange range_pcl813b2_ai = { 4, { UNI_RANGE(10), UNI_RANGE(5), UNI_RANGE(2.5), UNI_RANGE(1.25) } }; static const struct comedi_lrange range_iso813_1_ai = { 5, { BIP_RANGE(5), BIP_RANGE(2.5), BIP_RANGE(1.25), BIP_RANGE(0.625), BIP_RANGE(0.3125) } }; static const struct comedi_lrange range_iso813_1_2_ai = { 5, { UNI_RANGE(10), UNI_RANGE(5), UNI_RANGE(2.5), UNI_RANGE(1.25), UNI_RANGE(0.625) } }; static const struct comedi_lrange range_iso813_2_ai = { 4, { BIP_RANGE(5), BIP_RANGE(2.5), BIP_RANGE(1.25), BIP_RANGE(0.625) } }; static const struct comedi_lrange range_iso813_2_2_ai = { 4, { UNI_RANGE(10), UNI_RANGE(5), UNI_RANGE(2.5), UNI_RANGE(1.25) } }; static const struct comedi_lrange range_acl8113_1_ai = { 4, { BIP_RANGE(5), BIP_RANGE(2.5), BIP_RANGE(1.25), BIP_RANGE(0.625) } }; static const struct comedi_lrange range_acl8113_1_2_ai = { 4, { UNI_RANGE(10), UNI_RANGE(5), UNI_RANGE(2.5), UNI_RANGE(1.25) } }; static const struct comedi_lrange range_acl8113_2_ai = { 3, { BIP_RANGE(5), BIP_RANGE(2.5), BIP_RANGE(1.25) } }; static const struct comedi_lrange range_acl8113_2_2_ai = { 3, { UNI_RANGE(10), UNI_RANGE(5), UNI_RANGE(2.5) } }; static const struct comedi_lrange range_acl8112dg_ai = { 9, { BIP_RANGE(5), BIP_RANGE(2.5), BIP_RANGE(1.25), BIP_RANGE(0.625), UNI_RANGE(10), UNI_RANGE(5), UNI_RANGE(2.5), 
UNI_RANGE(1.25), BIP_RANGE(10) } }; static const struct comedi_lrange range_acl8112hg_ai = { 12, { BIP_RANGE(5), BIP_RANGE(0.5), BIP_RANGE(0.05), BIP_RANGE(0.005), UNI_RANGE(10), UNI_RANGE(1), UNI_RANGE(0.1), UNI_RANGE(0.01), BIP_RANGE(10), BIP_RANGE(1), BIP_RANGE(0.1), BIP_RANGE(0.01) } }; static const struct comedi_lrange range_a821pgh_ai = { 4, { BIP_RANGE(5), BIP_RANGE(0.5), BIP_RANGE(0.05), BIP_RANGE(0.005) } }; enum pcl812_boardtype { BOARD_PCL812PG = 0, /* and ACL-8112PG */ BOARD_PCL813B = 1, BOARD_PCL812 = 2, BOARD_PCL813 = 3, BOARD_ISO813 = 5, BOARD_ACL8113 = 6, BOARD_ACL8112 = 7, /* ACL-8112DG/HG, A-822PGL/PGH, A-823PGL/PGH */ BOARD_ACL8216 = 8, /* and ICP DAS A-826PG */ BOARD_A821 = 9, /* PGH, PGL, PGL/NDA versions */ }; struct pcl812_board { const char *name; enum pcl812_boardtype board_type; int n_aichan; int n_aochan; unsigned int ai_ns_min; const struct comedi_lrange *rangelist_ai; unsigned int irq_bits; unsigned int has_dma:1; unsigned int has_16bit_ai:1; unsigned int has_mpc508_mux:1; unsigned int has_dio:1; }; static const struct pcl812_board boardtypes[] = { { .name = "pcl812", .board_type = BOARD_PCL812, .n_aichan = 16, .n_aochan = 2, .ai_ns_min = 33000, .rangelist_ai = &range_bipolar10, .irq_bits = 0xdcfc, .has_dma = 1, .has_dio = 1, }, { .name = "pcl812pg", .board_type = BOARD_PCL812PG, .n_aichan = 16, .n_aochan = 2, .ai_ns_min = 33000, .rangelist_ai = &range_pcl812pg_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_dio = 1, }, { .name = "acl8112pg", .board_type = BOARD_PCL812PG, .n_aichan = 16, .n_aochan = 2, .ai_ns_min = 10000, .rangelist_ai = &range_pcl812pg_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_dio = 1, }, { .name = "acl8112dg", .board_type = BOARD_ACL8112, .n_aichan = 16, /* 8 differential */ .n_aochan = 2, .ai_ns_min = 10000, .rangelist_ai = &range_acl8112dg_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_mpc508_mux = 1, .has_dio = 1, }, { .name = "acl8112hg", .board_type = BOARD_ACL8112, .n_aichan = 16, /* 8 differential */ .n_aochan = 2, .ai_ns_min = 10000, .rangelist_ai = &range_acl8112hg_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_mpc508_mux = 1, .has_dio = 1, }, { .name = "a821pgl", .board_type = BOARD_A821, .n_aichan = 16, /* 8 differential */ .n_aochan = 1, .ai_ns_min = 10000, .rangelist_ai = &range_pcl813b_ai, .irq_bits = 0x000c, .has_dio = 1, }, { .name = "a821pglnda", .board_type = BOARD_A821, .n_aichan = 16, /* 8 differential */ .ai_ns_min = 10000, .rangelist_ai = &range_pcl813b_ai, .irq_bits = 0x000c, }, { .name = "a821pgh", .board_type = BOARD_A821, .n_aichan = 16, /* 8 differential */ .n_aochan = 1, .ai_ns_min = 10000, .rangelist_ai = &range_a821pgh_ai, .irq_bits = 0x000c, .has_dio = 1, }, { .name = "a822pgl", .board_type = BOARD_ACL8112, .n_aichan = 16, /* 8 differential */ .n_aochan = 2, .ai_ns_min = 10000, .rangelist_ai = &range_acl8112dg_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_dio = 1, }, { .name = "a822pgh", .board_type = BOARD_ACL8112, .n_aichan = 16, /* 8 differential */ .n_aochan = 2, .ai_ns_min = 10000, .rangelist_ai = &range_acl8112hg_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_dio = 1, }, { .name = "a823pgl", .board_type = BOARD_ACL8112, .n_aichan = 16, /* 8 differential */ .n_aochan = 2, .ai_ns_min = 8000, .rangelist_ai = &range_acl8112dg_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_dio = 1, }, { .name = "a823pgh", .board_type = BOARD_ACL8112, .n_aichan = 16, /* 8 differential */ .n_aochan = 2, .ai_ns_min = 8000, .rangelist_ai = &range_acl8112hg_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_dio = 1, }, { .name = "pcl813", .board_type = 
BOARD_PCL813, .n_aichan = 32, .rangelist_ai = &range_pcl813b_ai, }, { .name = "pcl813b", .board_type = BOARD_PCL813B, .n_aichan = 32, .rangelist_ai = &range_pcl813b_ai, }, { .name = "acl8113", .board_type = BOARD_ACL8113, .n_aichan = 32, .rangelist_ai = &range_acl8113_1_ai, }, { .name = "iso813", .board_type = BOARD_ISO813, .n_aichan = 32, .rangelist_ai = &range_iso813_1_ai, }, { .name = "acl8216", .board_type = BOARD_ACL8216, .n_aichan = 16, /* 8 differential */ .n_aochan = 2, .ai_ns_min = 10000, .rangelist_ai = &range_pcl813b2_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_16bit_ai = 1, .has_mpc508_mux = 1, .has_dio = 1, }, { .name = "a826pg", .board_type = BOARD_ACL8216, .n_aichan = 16, /* 8 differential */ .n_aochan = 2, .ai_ns_min = 10000, .rangelist_ai = &range_pcl813b2_ai, .irq_bits = 0xdcfc, .has_dma = 1, .has_16bit_ai = 1, .has_dio = 1, }, }; struct pcl812_private { struct comedi_isadma *dma; unsigned char range_correction; /* =1 we must add 1 to range number */ unsigned int last_ai_chanspec; unsigned char mode_reg_int; /* stored INT number for some cards */ unsigned int ai_poll_ptr; /* how many samples transfer poll */ unsigned int max_812_ai_mode0_rangewait; /* settling time for gain */ unsigned int use_diff:1; unsigned int use_mpc508:1; unsigned int use_ext_trg:1; unsigned int ai_dma:1; unsigned int ai_eos:1; }; static void pcl812_ai_setup_dma(struct comedi_device *dev, struct comedi_subdevice *s, unsigned int unread_samples) { struct pcl812_private *devpriv = dev->private; struct comedi_isadma *dma = devpriv->dma; struct comedi_isadma_desc *desc = &dma->desc[dma->cur_dma]; unsigned int bytes; unsigned int max_samples; unsigned int nsamples; comedi_isadma_disable(dma->chan); /* if using EOS, adapt DMA buffer to one scan */ bytes = devpriv->ai_eos ? comedi_bytes_per_scan(s) : desc->maxsize; max_samples = comedi_bytes_to_samples(s, bytes); /* * Determine dma size based on the buffer size plus the number of * unread samples and the number of samples remaining in the command. */ nsamples = comedi_nsamples_left(s, max_samples + unread_samples); if (nsamples > unread_samples) { nsamples -= unread_samples; desc->size = comedi_samples_to_bytes(s, nsamples); comedi_isadma_program(desc); } } static void pcl812_ai_set_chan_range(struct comedi_device *dev, unsigned int chanspec, char wait) { struct pcl812_private *devpriv = dev->private; unsigned int chan = CR_CHAN(chanspec); unsigned int range = CR_RANGE(chanspec); unsigned int mux = 0; if (chanspec == devpriv->last_ai_chanspec) return; devpriv->last_ai_chanspec = chanspec; if (devpriv->use_mpc508) { if (devpriv->use_diff) { mux |= PCL812_MUX_CS0 | PCL812_MUX_CS1; } else { if (chan < 8) mux |= PCL812_MUX_CS0; else mux |= PCL812_MUX_CS1; } } outb(mux | PCL812_MUX_CHAN(chan), dev->iobase + PCL812_MUX_REG); outb(range + devpriv->range_correction, dev->iobase + PCL812_RANGE_REG); if (wait) /* * XXX this depends on selected range and can be very long for * some high gain ranges! 
*/ udelay(devpriv->max_812_ai_mode0_rangewait); } static void pcl812_ai_clear_eoc(struct comedi_device *dev) { /* writing any value clears the interrupt request */ outb(0, dev->iobase + PCL812_STATUS_REG); } static void pcl812_ai_soft_trig(struct comedi_device *dev) { /* writing any value triggers a software conversion */ outb(255, dev->iobase + PCL812_SOFTTRIG_REG); } static unsigned int pcl812_ai_get_sample(struct comedi_device *dev, struct comedi_subdevice *s) { unsigned int val; val = inb(dev->iobase + PCL812_AI_MSB_REG) << 8; val |= inb(dev->iobase + PCL812_AI_LSB_REG); return val & s->maxdata; } static int pcl812_ai_eoc(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned long context) { unsigned int status; if (s->maxdata > 0x0fff) { status = inb(dev->iobase + PCL812_STATUS_REG); if ((status & PCL812_STATUS_DRDY) == 0) return 0; } else { status = inb(dev->iobase + PCL812_AI_MSB_REG); if ((status & PCL812_AI_MSB_DRDY) == 0) return 0; } return -EBUSY; } static int pcl812_ai_cmdtest(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_cmd *cmd) { const struct pcl812_board *board = dev->board_ptr; struct pcl812_private *devpriv = dev->private; int err = 0; unsigned int flags; /* Step 1 : check if triggers are trivially valid */ err |= comedi_check_trigger_src(&cmd->start_src, TRIG_NOW); err |= comedi_check_trigger_src(&cmd->scan_begin_src, TRIG_FOLLOW); if (devpriv->use_ext_trg) flags = TRIG_EXT; else flags = TRIG_TIMER; err |= comedi_check_trigger_src(&cmd->convert_src, flags); err |= comedi_check_trigger_src(&cmd->scan_end_src, TRIG_COUNT); err |= comedi_check_trigger_src(&cmd->stop_src, TRIG_COUNT | TRIG_NONE); if (err) return 1; /* Step 2a : make sure trigger sources are unique */ err |= comedi_check_trigger_is_unique(cmd->stop_src); /* Step 2b : and mutually compatible */ if (err) return 2; /* Step 3: check if arguments are trivially valid */ err |= comedi_check_trigger_arg_is(&cmd->start_arg, 0); err |= comedi_check_trigger_arg_is(&cmd->scan_begin_arg, 0); if (cmd->convert_src == TRIG_TIMER) { err |= comedi_check_trigger_arg_min(&cmd->convert_arg, board->ai_ns_min); } else { /* TRIG_EXT */ err |= comedi_check_trigger_arg_is(&cmd->convert_arg, 0); } err |= comedi_check_trigger_arg_min(&cmd->chanlist_len, 1); err |= comedi_check_trigger_arg_is(&cmd->scan_end_arg, cmd->chanlist_len); if (cmd->stop_src == TRIG_COUNT) err |= comedi_check_trigger_arg_min(&cmd->stop_arg, 1); else /* TRIG_NONE */ err |= comedi_check_trigger_arg_is(&cmd->stop_arg, 0); if (err) return 3; /* step 4: fix up any arguments */ if (cmd->convert_src == TRIG_TIMER) { unsigned int arg = cmd->convert_arg; comedi_8254_cascade_ns_to_timer(dev->pacer, &arg, cmd->flags); err |= comedi_check_trigger_arg_is(&cmd->convert_arg, arg); } if (err) return 4; return 0; } static int pcl812_ai_cmd(struct comedi_device *dev, struct comedi_subdevice *s) { struct pcl812_private *devpriv = dev->private; struct comedi_isadma *dma = devpriv->dma; struct comedi_cmd *cmd = &s->async->cmd; unsigned int ctrl = 0; unsigned int i; pcl812_ai_set_chan_range(dev, cmd->chanlist[0], 1); if (dma) { /* check if we can use DMA transfer */ devpriv->ai_dma = 1; for (i = 1; i < cmd->chanlist_len; i++) if (cmd->chanlist[0] != cmd->chanlist[i]) { /* we cann't use DMA :-( */ devpriv->ai_dma = 0; break; } } else { devpriv->ai_dma = 0; } devpriv->ai_poll_ptr = 0; /* don't we want wake up every scan? 
*/ if (cmd->flags & CMDF_WAKE_EOS) { devpriv->ai_eos = 1; /* DMA is useless for this situation */ if (cmd->chanlist_len == 1) devpriv->ai_dma = 0; } if (devpriv->ai_dma) { /* setup and enable dma for the first buffer */ dma->cur_dma = 0; pcl812_ai_setup_dma(dev, s, 0); } switch (cmd->convert_src) { case TRIG_TIMER: comedi_8254_update_divisors(dev->pacer); comedi_8254_pacer_enable(dev->pacer, 1, 2, true); break; } if (devpriv->ai_dma) ctrl |= PCL812_CTRL_PACER_DMA_TRIG; else ctrl |= PCL812_CTRL_PACER_EOC_TRIG; outb(devpriv->mode_reg_int | ctrl, dev->iobase + PCL812_CTRL_REG); return 0; } static bool pcl812_ai_next_chan(struct comedi_device *dev, struct comedi_subdevice *s) { struct comedi_cmd *cmd = &s->async->cmd; if (cmd->stop_src == TRIG_COUNT && s->async->scans_done >= cmd->stop_arg) { s->async->events |= COMEDI_CB_EOA; return false; } return true; } static void pcl812_handle_eoc(struct comedi_device *dev, struct comedi_subdevice *s) { struct comedi_cmd *cmd = &s->async->cmd; unsigned int chan = s->async->cur_chan; unsigned int next_chan; unsigned short val; if (pcl812_ai_eoc(dev, s, NULL, 0)) { dev_dbg(dev->class_dev, "A/D cmd IRQ without DRDY!\n"); s->async->events |= COMEDI_CB_ERROR; return; } val = pcl812_ai_get_sample(dev, s); comedi_buf_write_samples(s, &val, 1); /* Set up next channel. Added by abbotti 2010-01-20, but untested. */ next_chan = s->async->cur_chan; if (cmd->chanlist[chan] != cmd->chanlist[next_chan]) pcl812_ai_set_chan_range(dev, cmd->chanlist[next_chan], 0); pcl812_ai_next_chan(dev, s); } static void transfer_from_dma_buf(struct comedi_device *dev, struct comedi_subdevice *s, unsigned short *ptr, unsigned int bufptr, unsigned int len) { unsigned int i; unsigned short val; for (i = len; i; i--) { val = ptr[bufptr++]; comedi_buf_write_samples(s, &val, 1); if (!pcl812_ai_next_chan(dev, s)) break; } } static void pcl812_handle_dma(struct comedi_device *dev, struct comedi_subdevice *s) { struct pcl812_private *devpriv = dev->private; struct comedi_isadma *dma = devpriv->dma; struct comedi_isadma_desc *desc = &dma->desc[dma->cur_dma]; unsigned int nsamples; int bufptr; nsamples = comedi_bytes_to_samples(s, desc->size) - devpriv->ai_poll_ptr; bufptr = devpriv->ai_poll_ptr; devpriv->ai_poll_ptr = 0; /* restart dma with the next buffer */ dma->cur_dma = 1 - dma->cur_dma; pcl812_ai_setup_dma(dev, s, nsamples); transfer_from_dma_buf(dev, s, desc->virt_addr, bufptr, nsamples); } static irqreturn_t pcl812_interrupt(int irq, void *d) { struct comedi_device *dev = d; struct comedi_subdevice *s = dev->read_subdev; struct pcl812_private *devpriv = dev->private; if (!dev->attached) { pcl812_ai_clear_eoc(dev); return IRQ_HANDLED; } if (devpriv->ai_dma) pcl812_handle_dma(dev, s); else pcl812_handle_eoc(dev, s); pcl812_ai_clear_eoc(dev); comedi_handle_events(dev, s); return IRQ_HANDLED; } static int pcl812_ai_poll(struct comedi_device *dev, struct comedi_subdevice *s) { struct pcl812_private *devpriv = dev->private; struct comedi_isadma *dma = devpriv->dma; struct comedi_isadma_desc *desc; unsigned long flags; unsigned int poll; int ret; /* poll is valid only for DMA transfer */ if (!devpriv->ai_dma) return 0; spin_lock_irqsave(&dev->spinlock, flags); poll = comedi_isadma_poll(dma); poll = comedi_bytes_to_samples(s, poll); if (poll > devpriv->ai_poll_ptr) { desc = &dma->desc[dma->cur_dma]; transfer_from_dma_buf(dev, s, desc->virt_addr, devpriv->ai_poll_ptr, poll - devpriv->ai_poll_ptr); /* new buffer position */ devpriv->ai_poll_ptr = poll; ret = comedi_buf_n_bytes_ready(s); } else { 
/* no new samples */ ret = 0; } spin_unlock_irqrestore(&dev->spinlock, flags); return ret; } static int pcl812_ai_cancel(struct comedi_device *dev, struct comedi_subdevice *s) { struct pcl812_private *devpriv = dev->private; if (devpriv->ai_dma) comedi_isadma_disable(devpriv->dma->chan); outb(devpriv->mode_reg_int | PCL812_CTRL_DISABLE_TRIG, dev->iobase + PCL812_CTRL_REG); comedi_8254_pacer_enable(dev->pacer, 1, 2, false); pcl812_ai_clear_eoc(dev); return 0; } static int pcl812_ai_insn_read(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { struct pcl812_private *devpriv = dev->private; int ret = 0; int i; outb(devpriv->mode_reg_int | PCL812_CTRL_SOFT_TRIG, dev->iobase + PCL812_CTRL_REG); pcl812_ai_set_chan_range(dev, insn->chanspec, 1); for (i = 0; i < insn->n; i++) { pcl812_ai_clear_eoc(dev); pcl812_ai_soft_trig(dev); ret = comedi_timeout(dev, s, insn, pcl812_ai_eoc, 0); if (ret) break; data[i] = pcl812_ai_get_sample(dev, s); } outb(devpriv->mode_reg_int | PCL812_CTRL_DISABLE_TRIG, dev->iobase + PCL812_CTRL_REG); pcl812_ai_clear_eoc(dev); return ret ? ret : insn->n; } static int pcl812_ao_insn_write(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { unsigned int chan = CR_CHAN(insn->chanspec); unsigned int val = s->readback[chan]; int i; for (i = 0; i < insn->n; i++) { val = data[i]; outb(val & 0xff, dev->iobase + PCL812_AO_LSB_REG(chan)); outb((val >> 8) & 0x0f, dev->iobase + PCL812_AO_MSB_REG(chan)); } s->readback[chan] = val; return insn->n; } static int pcl812_di_insn_bits(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { data[1] = inb(dev->iobase + PCL812_DI_LSB_REG) | (inb(dev->iobase + PCL812_DI_MSB_REG) << 8); return insn->n; } static int pcl812_do_insn_bits(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_insn *insn, unsigned int *data) { if (comedi_dio_update_state(s, data)) { outb(s->state & 0xff, dev->iobase + PCL812_DO_LSB_REG); outb((s->state >> 8), dev->iobase + PCL812_DO_MSB_REG); } data[1] = s->state; return insn->n; } static void pcl812_reset(struct comedi_device *dev) { const struct pcl812_board *board = dev->board_ptr; struct pcl812_private *devpriv = dev->private; unsigned int chan; /* disable analog input trigger */ outb(devpriv->mode_reg_int | PCL812_CTRL_DISABLE_TRIG, dev->iobase + PCL812_CTRL_REG); pcl812_ai_clear_eoc(dev); /* * Invalidate last_ai_chanspec then set analog input to * known channel/range. 
*/ devpriv->last_ai_chanspec = CR_PACK(16, 0, 0); pcl812_ai_set_chan_range(dev, CR_PACK(0, 0, 0), 0); /* set analog output channels to 0V */ for (chan = 0; chan < board->n_aochan; chan++) { outb(0, dev->iobase + PCL812_AO_LSB_REG(chan)); outb(0, dev->iobase + PCL812_AO_MSB_REG(chan)); } /* set all digital outputs low */ if (board->has_dio) { outb(0, dev->iobase + PCL812_DO_MSB_REG); outb(0, dev->iobase + PCL812_DO_LSB_REG); } } static void pcl812_set_ai_range_table(struct comedi_device *dev, struct comedi_subdevice *s, struct comedi_devconfig *it) { const struct pcl812_board *board = dev->board_ptr; struct pcl812_private *devpriv = dev->private; switch (board->board_type) { case BOARD_PCL812PG: if (it->options[4] == 1) s->range_table = &range_pcl812pg2_ai; else s->range_table = board->rangelist_ai; break; case BOARD_PCL812: switch (it->options[4]) { case 0: s->range_table = &range_bipolar10; break; case 1: s->range_table = &range_bipolar5; break; case 2: s->range_table = &range_bipolar2_5; break; case 3: s->range_table = &range812_bipolar1_25; break; case 4: s->range_table = &range812_bipolar0_625; break; case 5: s->range_table = &range812_bipolar0_3125; break; default: s->range_table = &range_bipolar10; break; } break; case BOARD_PCL813B: if (it->options[1] == 1) s->range_table = &range_pcl813b2_ai; else s->range_table = board->rangelist_ai; break; case BOARD_ISO813: switch (it->options[1]) { case 0: s->range_table = &range_iso813_1_ai; break; case 1: s->range_table = &range_iso813_1_2_ai; break; case 2: s->range_table = &range_iso813_2_ai; devpriv->range_correction = 1; break; case 3: s->range_table = &range_iso813_2_2_ai; devpriv->range_correction = 1; break; default: s->range_table = &range_iso813_1_ai; break; } break; case BOARD_ACL8113: switch (it->options[1]) { case 0: s->range_table = &range_acl8113_1_ai; break; case 1: s->range_table = &range_acl8113_1_2_ai; break; case 2: s->range_table = &range_acl8113_2_ai; devpriv->range_correction = 1; break; case 3: s->range_table = &range_acl8113_2_2_ai; devpriv->range_correction = 1; break; default: s->range_table = &range_acl8113_1_ai; break; } break; default: s->range_table = board->rangelist_ai; break; } } static void pcl812_alloc_dma(struct comedi_device *dev, unsigned int dma_chan) { struct pcl812_private *devpriv = dev->private; /* only DMA channels 3 and 1 are valid */ if (!(dma_chan == 3 || dma_chan == 1)) return; /* DMA uses two 8K buffers */ devpriv->dma = comedi_isadma_alloc(dev, 2, dma_chan, dma_chan, PAGE_SIZE * 2, COMEDI_ISADMA_READ); } static void pcl812_free_dma(struct comedi_device *dev) { struct pcl812_private *devpriv = dev->private; if (devpriv) comedi_isadma_free(devpriv->dma); } static int pcl812_attach(struct comedi_device *dev, struct comedi_devconfig *it) { const struct pcl812_board *board = dev->board_ptr; struct pcl812_private *devpriv; struct comedi_subdevice *s; int n_subdevices; int subdev; int ret; devpriv = comedi_alloc_devpriv(dev, sizeof(*devpriv)); if (!devpriv) return -ENOMEM; ret = comedi_request_region(dev, it->options[0], 0x10); if (ret) return ret; if (board->irq_bits) { dev->pacer = comedi_8254_io_alloc(dev->iobase + PCL812_TIMER_BASE, I8254_OSC_BASE_2MHZ, I8254_IO8, 0); if (IS_ERR(dev->pacer)) return PTR_ERR(dev->pacer); if (it->options[1] > 0 && it->options[1] < 16 && (1 << it->options[1]) & board->irq_bits) { ret = request_irq(it->options[1], pcl812_interrupt, 0, dev->board_name, dev); if (ret == 0) dev->irq = it->options[1]; } } /* we need an IRQ to do DMA on channel 3 or 1 */ if (dev->irq && 
board->has_dma) pcl812_alloc_dma(dev, it->options[2]); /* differential analog inputs? */ switch (board->board_type) { case BOARD_A821: if (it->options[2] == 1) devpriv->use_diff = 1; break; case BOARD_ACL8112: case BOARD_ACL8216: if (it->options[4] == 1) devpriv->use_diff = 1; break; default: break; } n_subdevices = 1; /* all boardtypes have analog inputs */ if (board->n_aochan > 0) n_subdevices++; if (board->has_dio) n_subdevices += 2; ret = comedi_alloc_subdevices(dev, n_subdevices); if (ret) return ret; subdev = 0; /* Analog Input subdevice */ s = &dev->subdevices[subdev]; s->type = COMEDI_SUBD_AI; s->subdev_flags = SDF_READABLE; if (devpriv->use_diff) { s->subdev_flags |= SDF_DIFF; s->n_chan = board->n_aichan / 2; } else { s->subdev_flags |= SDF_GROUND; s->n_chan = board->n_aichan; } s->maxdata = board->has_16bit_ai ? 0xffff : 0x0fff; pcl812_set_ai_range_table(dev, s, it); s->insn_read = pcl812_ai_insn_read; if (dev->irq) { dev->read_subdev = s; s->subdev_flags |= SDF_CMD_READ; s->len_chanlist = MAX_CHANLIST_LEN; s->do_cmdtest = pcl812_ai_cmdtest; s->do_cmd = pcl812_ai_cmd; s->poll = pcl812_ai_poll; s->cancel = pcl812_ai_cancel; } devpriv->use_mpc508 = board->has_mpc508_mux; subdev++; /* analog output */ if (board->n_aochan > 0) { s = &dev->subdevices[subdev]; s->type = COMEDI_SUBD_AO; s->subdev_flags = SDF_WRITABLE | SDF_GROUND; s->n_chan = board->n_aochan; s->maxdata = 0xfff; switch (board->board_type) { case BOARD_A821: if (it->options[3] == 1) s->range_table = &range_unipolar10; else s->range_table = &range_unipolar5; break; case BOARD_PCL812: case BOARD_ACL8112: case BOARD_PCL812PG: case BOARD_ACL8216: switch (it->options[5]) { case 1: s->range_table = &range_unipolar10; break; case 2: s->range_table = &range_unknown; break; default: s->range_table = &range_unipolar5; break; } break; default: s->range_table = &range_unipolar5; break; } s->insn_write = pcl812_ao_insn_write; ret = comedi_alloc_subdev_readback(s); if (ret) return ret; subdev++; } if (board->has_dio) { /* Digital Input subdevice */ s = &dev->subdevices[subdev]; s->type = COMEDI_SUBD_DI; s->subdev_flags = SDF_READABLE; s->n_chan = 16; s->maxdata = 1; s->range_table = &range_digital; s->insn_bits = pcl812_di_insn_bits; subdev++; /* Digital Output subdevice */ s = &dev->subdevices[subdev]; s->type = COMEDI_SUBD_DO; s->subdev_flags = SDF_WRITABLE; s->n_chan = 16; s->maxdata = 1; s->range_table = &range_digital; s->insn_bits = pcl812_do_insn_bits; subdev++; } switch (board->board_type) { case BOARD_ACL8216: case BOARD_PCL812PG: case BOARD_PCL812: case BOARD_ACL8112: devpriv->max_812_ai_mode0_rangewait = 1; if (it->options[3] > 0) /* we use external trigger */ devpriv->use_ext_trg = 1; break; case BOARD_A821: devpriv->max_812_ai_mode0_rangewait = 1; devpriv->mode_reg_int = (dev->irq << 4) & 0xf0; break; case BOARD_PCL813B: case BOARD_PCL813: case BOARD_ISO813: case BOARD_ACL8113: /* maybe there must by greatest timeout */ devpriv->max_812_ai_mode0_rangewait = 5; break; } pcl812_reset(dev); return 0; } static void pcl812_detach(struct comedi_device *dev) { pcl812_free_dma(dev); comedi_legacy_detach(dev); } static struct comedi_driver pcl812_driver = { .driver_name = "pcl812", .module = THIS_MODULE, .attach = pcl812_attach, .detach = pcl812_detach, .board_name = &boardtypes[0].name, .num_names = ARRAY_SIZE(boardtypes), .offset = sizeof(struct pcl812_board), }; module_comedi_driver(pcl812_driver); MODULE_AUTHOR("Comedi https://www.comedi.org"); MODULE_DESCRIPTION("Comedi low-level driver"); MODULE_LICENSE("GPL"); |
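/*
 * A minimal userspace sketch of how the instruction-mode analog input path
 * implemented by pcl812_ai_insn_read() above is typically exercised through
 * comedilib.  The device node /dev/comedi0 and the subdevice/channel/range
 * numbers are illustrative assumptions only, not values taken from this
 * driver.
 */
#include <stdio.h>
#include <comedilib.h>

int main(void)
{
	comedi_t *dev;
	lsampl_t sample;

	dev = comedi_open("/dev/comedi0");	/* assumed device node */
	if (!dev) {
		comedi_perror("comedi_open");
		return 1;
	}

	/* one software-triggered conversion: subdevice 0, channel 0, range 0 */
	if (comedi_data_read(dev, 0, 0, 0, AREF_GROUND, &sample) < 0) {
		comedi_perror("comedi_data_read");
		comedi_close(dev);
		return 1;
	}

	printf("raw sample: %u\n", sample);
	comedi_close(dev);
	return 0;
}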
// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) /* af_can.c - Protocol family CAN core module * (used by different CAN protocol modules) * * Copyright (c) 2002-2017 Volkswagen Group Electronic Research * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of Volkswagen nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * Alternatively, provided that this notice is retained in full, this * software may be distributed under the terms of the GNU General * Public License ("GPL") version 2, in which case the provisions of the * GPL apply INSTEAD OF those given above. * * The provided data structures and external interfaces from this code * are not restricted to be used by modules with a GPL compatible license. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH * DAMAGE.
* */ #include <linux/module.h> #include <linux/stddef.h> #include <linux/init.h> #include <linux/kmod.h> #include <linux/slab.h> #include <linux/list.h> #include <linux/spinlock.h> #include <linux/rcupdate.h> #include <linux/uaccess.h> #include <linux/net.h> #include <linux/netdevice.h> #include <linux/socket.h> #include <linux/if_ether.h> #include <linux/if_arp.h> #include <linux/skbuff.h> #include <linux/can.h> #include <linux/can/core.h> #include <linux/can/skb.h> #include <linux/can/can-ml.h> #include <linux/ratelimit.h> #include <net/net_namespace.h> #include <net/sock.h> #include "af_can.h" MODULE_DESCRIPTION("Controller Area Network PF_CAN core"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Urs Thuermann <urs.thuermann@volkswagen.de>, " "Oliver Hartkopp <oliver.hartkopp@volkswagen.de>"); MODULE_ALIAS_NETPROTO(PF_CAN); static int stats_timer __read_mostly = 1; module_param(stats_timer, int, 0444); MODULE_PARM_DESC(stats_timer, "enable timer for statistics (default:on)"); static struct kmem_cache *rcv_cache __read_mostly; /* table of registered CAN protocols */ static const struct can_proto __rcu *proto_tab[CAN_NPROTO] __read_mostly; static DEFINE_MUTEX(proto_tab_lock); static atomic_t skbcounter = ATOMIC_INIT(0); /* af_can socket functions */ void can_sock_destruct(struct sock *sk) { skb_queue_purge(&sk->sk_receive_queue); skb_queue_purge(&sk->sk_error_queue); } EXPORT_SYMBOL(can_sock_destruct); static const struct can_proto *can_get_proto(int protocol) { const struct can_proto *cp; rcu_read_lock(); cp = rcu_dereference(proto_tab[protocol]); if (cp && !try_module_get(cp->prot->owner)) cp = NULL; rcu_read_unlock(); return cp; } static inline void can_put_proto(const struct can_proto *cp) { module_put(cp->prot->owner); } static int can_create(struct net *net, struct socket *sock, int protocol, int kern) { struct sock *sk; const struct can_proto *cp; int err = 0; sock->state = SS_UNCONNECTED; if (protocol < 0 || protocol >= CAN_NPROTO) return -EINVAL; cp = can_get_proto(protocol); #ifdef CONFIG_MODULES if (!cp) { /* try to load protocol module if kernel is modular */ err = request_module("can-proto-%d", protocol); /* In case of error we only print a message but don't * return the error code immediately. Below we will * return -EPROTONOSUPPORT */ if (err) pr_err_ratelimited("can: request_module (can-proto-%d) failed.\n", protocol); cp = can_get_proto(protocol); } #endif /* check for available protocol and correct usage */ if (!cp) return -EPROTONOSUPPORT; if (cp->type != sock->type) { err = -EPROTOTYPE; goto errout; } sock->ops = cp->ops; sk = sk_alloc(net, PF_CAN, GFP_KERNEL, cp->prot, kern); if (!sk) { err = -ENOMEM; goto errout; } sock_init_data(sock, sk); sk->sk_destruct = can_sock_destruct; if (sk->sk_prot->init) err = sk->sk_prot->init(sk); if (err) { /* release sk on errors */ sock_orphan(sk); sock_put(sk); sock->sk = NULL; } else { sock_prot_inuse_add(net, sk->sk_prot, 1); } errout: can_put_proto(cp); return err; } /* af_can tx path */ /** * can_send - transmit a CAN frame (optional with local loopback) * @skb: pointer to socket buffer with CAN frame in data section * @loop: loopback for listeners on local CAN sockets (recommended default!) * * Due to the loopback this routine must not be called from hardirq context. 
* * Return: * 0 on success * -ENETDOWN when the selected interface is down * -ENOBUFS on full driver queue (see net_xmit_errno()) * -ENOMEM when local loopback failed at calling skb_clone() * -EPERM when trying to send on a non-CAN interface * -EMSGSIZE CAN frame size is bigger than CAN interface MTU * -EINVAL when the skb->data does not contain a valid CAN frame */ int can_send(struct sk_buff *skb, int loop) { struct sk_buff *newskb = NULL; struct can_pkg_stats *pkg_stats = dev_net(skb->dev)->can.pkg_stats; int err = -EINVAL; if (can_is_canxl_skb(skb)) { skb->protocol = htons(ETH_P_CANXL); } else if (can_is_can_skb(skb)) { skb->protocol = htons(ETH_P_CAN); } else if (can_is_canfd_skb(skb)) { struct canfd_frame *cfd = (struct canfd_frame *)skb->data; skb->protocol = htons(ETH_P_CANFD); /* set CAN FD flag for CAN FD frames by default */ cfd->flags |= CANFD_FDF; } else { goto inval_skb; } /* Make sure the CAN frame can pass the selected CAN netdevice. */ if (unlikely(skb->len > skb->dev->mtu)) { err = -EMSGSIZE; goto inval_skb; } if (unlikely(skb->dev->type != ARPHRD_CAN)) { err = -EPERM; goto inval_skb; } if (unlikely(!(skb->dev->flags & IFF_UP))) { err = -ENETDOWN; goto inval_skb; } skb->ip_summed = CHECKSUM_UNNECESSARY; skb_reset_mac_header(skb); skb_reset_network_header(skb); skb_reset_transport_header(skb); if (loop) { /* local loopback of sent CAN frames */ /* indication for the CAN driver: do loopback */ skb->pkt_type = PACKET_LOOPBACK; /* The reference to the originating sock may be required * by the receiving socket to check whether the frame is * its own. Example: can_raw sockopt CAN_RAW_RECV_OWN_MSGS * Therefore we have to ensure that skb->sk remains the * reference to the originating sock by restoring skb->sk * after each skb_clone() or skb_orphan() usage. */ if (!(skb->dev->flags & IFF_ECHO)) { /* If the interface is not capable to do loopback * itself, we do it here. */ newskb = skb_clone(skb, GFP_ATOMIC); if (!newskb) { kfree_skb(skb); return -ENOMEM; } can_skb_set_owner(newskb, skb->sk); newskb->ip_summed = CHECKSUM_UNNECESSARY; newskb->pkt_type = PACKET_BROADCAST; } } else { /* indication for the CAN driver: no loopback required */ skb->pkt_type = PACKET_HOST; } /* send to netdevice */ err = dev_queue_xmit(skb); if (err > 0) err = net_xmit_errno(err); if (err) { kfree_skb(newskb); return err; } if (newskb) netif_rx(newskb); /* update statistics */ atomic_long_inc(&pkg_stats->tx_frames); atomic_long_inc(&pkg_stats->tx_frames_delta); return 0; inval_skb: kfree_skb(skb); return err; } EXPORT_SYMBOL(can_send); /* af_can rx path */ static struct can_dev_rcv_lists *can_dev_rcv_lists_find(struct net *net, struct net_device *dev) { if (dev) { struct can_ml_priv *can_ml = can_get_ml_priv(dev); return &can_ml->dev_rcv_lists; } else { return net->can.rx_alldev_list; } } /** * effhash - hash function for 29 bit CAN identifier reduction * @can_id: 29 bit CAN identifier * * Description: * To reduce the linear traversal in one linked list of _single_ EFF CAN * frame subscriptions the 29 bit identifier is mapped to 10 bits. 
* (see CAN_EFF_RCV_HASH_BITS definition) * * Return: * Hash value from 0x000 - 0x3FF ( enforced by CAN_EFF_RCV_HASH_BITS mask ) */ static unsigned int effhash(canid_t can_id) { unsigned int hash; hash = can_id; hash ^= can_id >> CAN_EFF_RCV_HASH_BITS; hash ^= can_id >> (2 * CAN_EFF_RCV_HASH_BITS); return hash & ((1 << CAN_EFF_RCV_HASH_BITS) - 1); } /** * can_rcv_list_find - determine optimal filterlist inside device filter struct * @can_id: pointer to CAN identifier of a given can_filter * @mask: pointer to CAN mask of a given can_filter * @dev_rcv_lists: pointer to the device filter struct * * Description: * Returns the optimal filterlist to reduce the filter handling in the * receive path. This function is called by service functions that need * to register or unregister a can_filter in the filter lists. * * A filter matches in general, when * * <received_can_id> & mask == can_id & mask * * so every bit set in the mask (even CAN_EFF_FLAG, CAN_RTR_FLAG) describe * relevant bits for the filter. * * The filter can be inverted (CAN_INV_FILTER bit set in can_id) or it can * filter for error messages (CAN_ERR_FLAG bit set in mask). For error msg * frames there is a special filterlist and a special rx path filter handling. * * Return: * Pointer to optimal filterlist for the given can_id/mask pair. * Consistency checked mask. * Reduced can_id to have a preprocessed filter compare value. */ static struct hlist_head *can_rcv_list_find(canid_t *can_id, canid_t *mask, struct can_dev_rcv_lists *dev_rcv_lists) { canid_t inv = *can_id & CAN_INV_FILTER; /* save flag before masking */ /* filter for error message frames in extra filterlist */ if (*mask & CAN_ERR_FLAG) { /* clear CAN_ERR_FLAG in filter entry */ *mask &= CAN_ERR_MASK; return &dev_rcv_lists->rx[RX_ERR]; } /* with cleared CAN_ERR_FLAG we have a simple mask/value filterpair */ #define CAN_EFF_RTR_FLAGS (CAN_EFF_FLAG | CAN_RTR_FLAG) /* ensure valid values in can_mask for 'SFF only' frame filtering */ if ((*mask & CAN_EFF_FLAG) && !(*can_id & CAN_EFF_FLAG)) *mask &= (CAN_SFF_MASK | CAN_EFF_RTR_FLAGS); /* reduce condition testing at receive time */ *can_id &= *mask; /* inverse can_id/can_mask filter */ if (inv) return &dev_rcv_lists->rx[RX_INV]; /* mask == 0 => no condition testing at receive time */ if (!(*mask)) return &dev_rcv_lists->rx[RX_ALL]; /* extra filterlists for the subscription of a single non-RTR can_id */ if (((*mask & CAN_EFF_RTR_FLAGS) == CAN_EFF_RTR_FLAGS) && !(*can_id & CAN_RTR_FLAG)) { if (*can_id & CAN_EFF_FLAG) { if (*mask == (CAN_EFF_MASK | CAN_EFF_RTR_FLAGS)) return &dev_rcv_lists->rx_eff[effhash(*can_id)]; } else { if (*mask == (CAN_SFF_MASK | CAN_EFF_RTR_FLAGS)) return &dev_rcv_lists->rx_sff[*can_id]; } } /* default: filter via can_id/can_mask */ return &dev_rcv_lists->rx[RX_FIL]; } /** * can_rx_register - subscribe CAN frames from a specific interface * @net: the applicable net namespace * @dev: pointer to netdevice (NULL => subscribe from 'all' CAN devices list) * @can_id: CAN identifier (see description) * @mask: CAN mask (see description) * @func: callback function on filter match * @data: returned parameter for callback function * @ident: string for calling module identification * @sk: socket pointer (might be NULL) * * Description: * Invokes the callback function with the received sk_buff and the given * parameter 'data' on a matching receive filter. 
A filter matches, when * * <received_can_id> & mask == can_id & mask * * The filter can be inverted (CAN_INV_FILTER bit set in can_id) or it can * filter for error message frames (CAN_ERR_FLAG bit set in mask). * * The provided pointer to the sk_buff is guaranteed to be valid as long as * the callback function is running. The callback function must *not* free * the given sk_buff while processing it's task. When the given sk_buff is * needed after the end of the callback function it must be cloned inside * the callback function with skb_clone(). * * Return: * 0 on success * -ENOMEM on missing cache mem to create subscription entry * -ENODEV unknown device */ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id, canid_t mask, void (*func)(struct sk_buff *, void *), void *data, char *ident, struct sock *sk) { struct receiver *rcv; struct hlist_head *rcv_list; struct can_dev_rcv_lists *dev_rcv_lists; struct can_rcv_lists_stats *rcv_lists_stats = net->can.rcv_lists_stats; /* insert new receiver (dev,canid,mask) -> (func,data) */ if (dev && (dev->type != ARPHRD_CAN || !can_get_ml_priv(dev))) return -ENODEV; if (dev && !net_eq(net, dev_net(dev))) return -ENODEV; rcv = kmem_cache_alloc(rcv_cache, GFP_KERNEL); if (!rcv) return -ENOMEM; spin_lock_bh(&net->can.rcvlists_lock); dev_rcv_lists = can_dev_rcv_lists_find(net, dev); rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); rcv->can_id = can_id; rcv->mask = mask; rcv->matches = 0; rcv->func = func; rcv->data = data; rcv->ident = ident; rcv->sk = sk; hlist_add_head_rcu(&rcv->list, rcv_list); dev_rcv_lists->entries++; rcv_lists_stats->rcv_entries++; rcv_lists_stats->rcv_entries_max = max(rcv_lists_stats->rcv_entries_max, rcv_lists_stats->rcv_entries); spin_unlock_bh(&net->can.rcvlists_lock); return 0; } EXPORT_SYMBOL(can_rx_register); /* can_rx_delete_receiver - rcu callback for single receiver entry removal */ static void can_rx_delete_receiver(struct rcu_head *rp) { struct receiver *rcv = container_of(rp, struct receiver, rcu); struct sock *sk = rcv->sk; kmem_cache_free(rcv_cache, rcv); if (sk) sock_put(sk); } /** * can_rx_unregister - unsubscribe CAN frames from a specific interface * @net: the applicable net namespace * @dev: pointer to netdevice (NULL => unsubscribe from 'all' CAN devices list) * @can_id: CAN identifier * @mask: CAN mask * @func: callback function on filter match * @data: returned parameter for callback function * * Description: * Removes subscription entry depending on given (subscription) values. */ void can_rx_unregister(struct net *net, struct net_device *dev, canid_t can_id, canid_t mask, void (*func)(struct sk_buff *, void *), void *data) { struct receiver *rcv = NULL; struct hlist_head *rcv_list; struct can_rcv_lists_stats *rcv_lists_stats = net->can.rcv_lists_stats; struct can_dev_rcv_lists *dev_rcv_lists; if (dev && dev->type != ARPHRD_CAN) return; if (dev && !net_eq(net, dev_net(dev))) return; spin_lock_bh(&net->can.rcvlists_lock); dev_rcv_lists = can_dev_rcv_lists_find(net, dev); rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); /* Search the receiver list for the item to delete. This should * exist, since no receiver may be unregistered that hasn't * been registered before. */ hlist_for_each_entry_rcu(rcv, rcv_list, list) { if (rcv->can_id == can_id && rcv->mask == mask && rcv->func == func && rcv->data == data) break; } /* Check for bugs in CAN protocol implementations using af_can.c: * 'rcv' will be NULL if no matching list item was found for removal. 
* As this case may potentially happen when closing a socket while * the notifier for removing the CAN netdev is running we just print * a warning here. */ if (!rcv) { pr_warn("can: receive list entry not found for dev %s, id %03X, mask %03X\n", DNAME(dev), can_id, mask); goto out; } hlist_del_rcu(&rcv->list); dev_rcv_lists->entries--; if (rcv_lists_stats->rcv_entries > 0) rcv_lists_stats->rcv_entries--; out: spin_unlock_bh(&net->can.rcvlists_lock); /* schedule the receiver item for deletion */ if (rcv) { if (rcv->sk) sock_hold(rcv->sk); call_rcu(&rcv->rcu, can_rx_delete_receiver); } } EXPORT_SYMBOL(can_rx_unregister); static inline void deliver(struct sk_buff *skb, struct receiver *rcv) { rcv->func(skb, rcv->data); rcv->matches++; } static int can_rcv_filter(struct can_dev_rcv_lists *dev_rcv_lists, struct sk_buff *skb) { struct receiver *rcv; int matches = 0; struct can_frame *cf = (struct can_frame *)skb->data; canid_t can_id = cf->can_id; if (dev_rcv_lists->entries == 0) return 0; if (can_id & CAN_ERR_FLAG) { /* check for error message frame entries only */ hlist_for_each_entry_rcu(rcv, &dev_rcv_lists->rx[RX_ERR], list) { if (can_id & rcv->mask) { deliver(skb, rcv); matches++; } } return matches; } /* check for unfiltered entries */ hlist_for_each_entry_rcu(rcv, &dev_rcv_lists->rx[RX_ALL], list) { deliver(skb, rcv); matches++; } /* check for can_id/mask entries */ hlist_for_each_entry_rcu(rcv, &dev_rcv_lists->rx[RX_FIL], list) { if ((can_id & rcv->mask) == rcv->can_id) { deliver(skb, rcv); matches++; } } /* check for inverted can_id/mask entries */ hlist_for_each_entry_rcu(rcv, &dev_rcv_lists->rx[RX_INV], list) { if ((can_id & rcv->mask) != rcv->can_id) { deliver(skb, rcv); matches++; } } /* check filterlists for single non-RTR can_ids */ if (can_id & CAN_RTR_FLAG) return matches; if (can_id & CAN_EFF_FLAG) { hlist_for_each_entry_rcu(rcv, &dev_rcv_lists->rx_eff[effhash(can_id)], list) { if (rcv->can_id == can_id) { deliver(skb, rcv); matches++; } } } else { can_id &= CAN_SFF_MASK; hlist_for_each_entry_rcu(rcv, &dev_rcv_lists->rx_sff[can_id], list) { deliver(skb, rcv); matches++; } } return matches; } static void can_receive(struct sk_buff *skb, struct net_device *dev) { struct can_dev_rcv_lists *dev_rcv_lists; struct net *net = dev_net(dev); struct can_pkg_stats *pkg_stats = net->can.pkg_stats; int matches; /* update statistics */ atomic_long_inc(&pkg_stats->rx_frames); atomic_long_inc(&pkg_stats->rx_frames_delta); /* create non-zero unique skb identifier together with *skb */ while (!(can_skb_prv(skb)->skbcnt)) can_skb_prv(skb)->skbcnt = atomic_inc_return(&skbcounter); rcu_read_lock(); /* deliver the packet to sockets listening on all devices */ matches = can_rcv_filter(net->can.rx_alldev_list, skb); /* find receive list for this device */ dev_rcv_lists = can_dev_rcv_lists_find(net, dev); matches += can_rcv_filter(dev_rcv_lists, skb); rcu_read_unlock(); /* consume the skbuff allocated by the netdevice driver */ consume_skb(skb); if (matches > 0) { atomic_long_inc(&pkg_stats->matches); atomic_long_inc(&pkg_stats->matches_delta); } } static int can_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) { if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || !can_is_can_skb(skb))) { pr_warn_once("PF_CAN: dropped non conform CAN skbuff: dev type %d, len %d\n", dev->type, skb->len); kfree_skb_reason(skb, SKB_DROP_REASON_CAN_RX_INVALID_FRAME); return NET_RX_DROP; } can_receive(skb, dev); return NET_RX_SUCCESS; } static int 
canfd_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) { if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || !can_is_canfd_skb(skb))) { pr_warn_once("PF_CAN: dropped non conform CAN FD skbuff: dev type %d, len %d\n", dev->type, skb->len); kfree_skb_reason(skb, SKB_DROP_REASON_CANFD_RX_INVALID_FRAME); return NET_RX_DROP; } can_receive(skb, dev); return NET_RX_SUCCESS; } static int canxl_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) { if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || !can_is_canxl_skb(skb))) { pr_warn_once("PF_CAN: dropped non conform CAN XL skbuff: dev type %d, len %d\n", dev->type, skb->len); kfree_skb_reason(skb, SKB_DROP_REASON_CANXL_RX_INVALID_FRAME); return NET_RX_DROP; } can_receive(skb, dev); return NET_RX_SUCCESS; } /* af_can protocol functions */ /** * can_proto_register - register CAN transport protocol * @cp: pointer to CAN protocol structure * * Return: * 0 on success * -EINVAL invalid (out of range) protocol number * -EBUSY protocol already in use * -ENOBUF if proto_register() fails */ int can_proto_register(const struct can_proto *cp) { int proto = cp->protocol; int err = 0; if (proto < 0 || proto >= CAN_NPROTO) { pr_err("can: protocol number %d out of range\n", proto); return -EINVAL; } err = proto_register(cp->prot, 0); if (err < 0) return err; mutex_lock(&proto_tab_lock); if (rcu_access_pointer(proto_tab[proto])) { pr_err("can: protocol %d already registered\n", proto); err = -EBUSY; } else { RCU_INIT_POINTER(proto_tab[proto], cp); } mutex_unlock(&proto_tab_lock); if (err < 0) proto_unregister(cp->prot); return err; } EXPORT_SYMBOL(can_proto_register); /** * can_proto_unregister - unregister CAN transport protocol * @cp: pointer to CAN protocol structure */ void can_proto_unregister(const struct can_proto *cp) { int proto = cp->protocol; mutex_lock(&proto_tab_lock); BUG_ON(rcu_access_pointer(proto_tab[proto]) != cp); RCU_INIT_POINTER(proto_tab[proto], NULL); mutex_unlock(&proto_tab_lock); synchronize_rcu(); proto_unregister(cp->prot); } EXPORT_SYMBOL(can_proto_unregister); static int can_pernet_init(struct net *net) { spin_lock_init(&net->can.rcvlists_lock); net->can.rx_alldev_list = kzalloc(sizeof(*net->can.rx_alldev_list), GFP_KERNEL); if (!net->can.rx_alldev_list) goto out; net->can.pkg_stats = kzalloc(sizeof(*net->can.pkg_stats), GFP_KERNEL); if (!net->can.pkg_stats) goto out_free_rx_alldev_list; net->can.rcv_lists_stats = kzalloc(sizeof(*net->can.rcv_lists_stats), GFP_KERNEL); if (!net->can.rcv_lists_stats) goto out_free_pkg_stats; if (IS_ENABLED(CONFIG_PROC_FS)) { /* the statistics are updated every second (timer triggered) */ if (stats_timer) { timer_setup(&net->can.stattimer, can_stat_update, 0); mod_timer(&net->can.stattimer, round_jiffies(jiffies + HZ)); } net->can.pkg_stats->jiffies_init = jiffies; can_init_proc(net); } return 0; out_free_pkg_stats: kfree(net->can.pkg_stats); out_free_rx_alldev_list: kfree(net->can.rx_alldev_list); out: return -ENOMEM; } static void can_pernet_exit(struct net *net) { if (IS_ENABLED(CONFIG_PROC_FS)) { can_remove_proc(net); if (stats_timer) timer_delete_sync(&net->can.stattimer); } kfree(net->can.rx_alldev_list); kfree(net->can.pkg_stats); kfree(net->can.rcv_lists_stats); } /* af_can module init/exit functions */ static struct packet_type can_packet __read_mostly = { .type = cpu_to_be16(ETH_P_CAN), .func = can_rcv, }; static struct packet_type canfd_packet __read_mostly = { .type 
= cpu_to_be16(ETH_P_CANFD), .func = canfd_rcv, }; static struct packet_type canxl_packet __read_mostly = { .type = cpu_to_be16(ETH_P_CANXL), .func = canxl_rcv, }; static const struct net_proto_family can_family_ops = { .family = PF_CAN, .create = can_create, .owner = THIS_MODULE, }; static struct pernet_operations can_pernet_ops __read_mostly = { .init = can_pernet_init, .exit = can_pernet_exit, }; static __init int can_init(void) { int err; /* check for correct padding to be able to use the structs similarly */ BUILD_BUG_ON(offsetof(struct can_frame, len) != offsetof(struct canfd_frame, len) || offsetof(struct can_frame, len) != offsetof(struct canxl_frame, flags) || offsetof(struct can_frame, data) != offsetof(struct canfd_frame, data)); pr_info("can: controller area network core\n"); rcv_cache = kmem_cache_create("can_receiver", sizeof(struct receiver), 0, 0, NULL); if (!rcv_cache) return -ENOMEM; err = register_pernet_subsys(&can_pernet_ops); if (err) goto out_pernet; /* protocol register */ err = sock_register(&can_family_ops); if (err) goto out_sock; dev_add_pack(&can_packet); dev_add_pack(&canfd_packet); dev_add_pack(&canxl_packet); return 0; out_sock: unregister_pernet_subsys(&can_pernet_ops); out_pernet: kmem_cache_destroy(rcv_cache); return err; } static __exit void can_exit(void) { /* protocol unregister */ dev_remove_pack(&canxl_packet); dev_remove_pack(&canfd_packet); dev_remove_pack(&can_packet); sock_unregister(PF_CAN); unregister_pernet_subsys(&can_pernet_ops); rcu_barrier(); /* Wait for completion of call_rcu()'s */ kmem_cache_destroy(rcv_cache); } module_init(can_init); module_exit(can_exit); |
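/*
 * The PF_CAN core above only provides the protocol family plumbing; frames
 * are handed to can_send() by a transport protocol module such as can_raw.
 * A minimal userspace sketch of the path that ends up in can_create() and
 * can_send() is shown below.  The interface name "can0" is an assumption
 * made for illustration.
 */
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/can.h>
#include <linux/can/raw.h>

int send_one_frame(void)
{
	struct sockaddr_can addr = { 0 };
	struct can_frame frame = { 0 };
	struct ifreq ifr = { 0 };
	int s, ret;

	s = socket(PF_CAN, SOCK_RAW, CAN_RAW);	/* resolved by can_create() */
	if (s < 0)
		return -1;

	strncpy(ifr.ifr_name, "can0", sizeof(ifr.ifr_name) - 1);
	if (ioctl(s, SIOCGIFINDEX, &ifr) < 0) {
		close(s);
		return -1;
	}

	addr.can_family = AF_CAN;
	addr.can_ifindex = ifr.ifr_ifindex;
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(s);
		return -1;
	}

	frame.can_id = 0x123;			/* standard 11-bit identifier */
	frame.len = 2;				/* payload length in bytes */
	frame.data[0] = 0xde;
	frame.data[1] = 0xad;

	/* can_raw builds the skb and passes it to can_send() with loopback */
	ret = (write(s, &frame, sizeof(frame)) == (ssize_t)sizeof(frame)) ? 0 : -1;

	close(s);
	return ret;
}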
/* SPDX-License-Identifier: GPL-2.0 */ #ifndef _BCACHEFS_BTREE_UPDATE_INTERIOR_H #define _BCACHEFS_BTREE_UPDATE_INTERIOR_H #include "btree_cache.h" #include "btree_locking.h" #include "btree_update.h" #define BTREE_UPDATE_NODES_MAX ((BTREE_MAX_DEPTH - 2) * 2 + GC_MERGE_NODES) #define BTREE_UPDATE_JOURNAL_RES (BTREE_UPDATE_NODES_MAX * (BKEY_BTREE_PTR_U64s_MAX + 1)) int bch2_btree_node_check_topology(struct btree_trans *, struct btree *); #define BTREE_UPDATE_MODES() \ x(none) \ x(node) \ x(root) \ x(update) enum btree_update_mode { #define x(n) BTREE_UPDATE_##n, BTREE_UPDATE_MODES() #undef x }; /* * Tracks an in progress split/rewrite of a btree node and the update to the * parent node: * * When we split/rewrite a node, we do all the updates in memory without * waiting for any writes to complete - we allocate the new node(s) and update * the parent node, possibly recursively up to the root. * * The end result is that we have one or more new nodes being written - * possibly several, if there were multiple splits - and then a write (updating * an interior node) which will make all these new nodes visible. * * Additionally, as we split/rewrite nodes we free the old nodes - but the old * nodes can't be freed (their space on disk can't be reclaimed) until the * update to the interior node that makes the new node visible completes - * until then, the old nodes are still reachable on disk. * */ struct btree_update { struct closure cl; struct bch_fs *c; u64 start_time; unsigned long ip_started; struct list_head list; struct list_head unwritten_list; enum btree_update_mode mode; enum bch_trans_commit_flags flags; unsigned nodes_written:1; unsigned took_gc_lock:1; enum btree_id btree_id; struct bpos node_start; struct bpos node_end; enum btree_node_rewrite_reason node_needed_rewrite; u16 node_written; u16 node_sectors; u16 node_remaining; unsigned update_level_start; unsigned update_level_end; struct disk_reservation disk_res; /* * BTREE_UPDATE_node: * The update that made the new nodes visible was a regular update to an * existing interior node - @b.
We can't write out the update to @b * until the new nodes we created are finished writing, so we block @b * from writing by putting this btree_interior update on the * @b->write_blocked list with @write_blocked_list: */ struct btree *b; struct list_head write_blocked_list; /* * We may be freeing nodes that were dirty, and thus had journal entries * pinned: we need to transfer the oldest of those pins to the * btree_update operation, and release it when the new node(s) * are all persistent and reachable: */ struct journal_entry_pin journal; /* Preallocated nodes we reserve when we start the update: */ struct prealloc_nodes { struct btree *b[BTREE_UPDATE_NODES_MAX]; unsigned nr; } prealloc_nodes[2]; /* Nodes being freed: */ struct keylist old_keys; u64 _old_keys[BTREE_UPDATE_NODES_MAX * BKEY_BTREE_PTR_U64s_MAX]; /* Nodes being added: */ struct keylist new_keys; u64 _new_keys[BTREE_UPDATE_NODES_MAX * BKEY_BTREE_PTR_U64s_MAX]; /* New nodes, that will be made reachable by this update: */ struct btree *new_nodes[BTREE_UPDATE_NODES_MAX]; unsigned nr_new_nodes; struct btree *old_nodes[BTREE_UPDATE_NODES_MAX]; __le64 old_nodes_seq[BTREE_UPDATE_NODES_MAX]; unsigned nr_old_nodes; open_bucket_idx_t open_buckets[BTREE_UPDATE_NODES_MAX * BCH_REPLICAS_MAX]; open_bucket_idx_t nr_open_buckets; unsigned journal_u64s; u64 journal_entries[BTREE_UPDATE_JOURNAL_RES]; /* Only here to reduce stack usage on recursive splits: */ struct keylist parent_keys; /* * Enough room for btree_split's keys without realloc - btree node * pointers never have crc/compression info, so we only need to acount * for the pointers for three keys */ u64 inline_keys[BKEY_BTREE_PTR_U64s_MAX * 3]; }; struct btree *__bch2_btree_node_alloc_replacement(struct btree_update *, struct btree_trans *, struct btree *, struct bkey_format); int bch2_btree_split_leaf(struct btree_trans *, btree_path_idx_t, unsigned); int bch2_btree_increase_depth(struct btree_trans *, btree_path_idx_t, unsigned); int __bch2_foreground_maybe_merge(struct btree_trans *, btree_path_idx_t, unsigned, unsigned, enum btree_node_sibling); static inline int bch2_foreground_maybe_merge_sibling(struct btree_trans *trans, btree_path_idx_t path_idx, unsigned level, unsigned flags, enum btree_node_sibling sib) { struct btree_path *path = trans->paths + path_idx; struct btree *b; EBUG_ON(!btree_node_locked(path, level)); if (static_branch_unlikely(&bch2_btree_node_merging_disabled)) return 0; b = path->l[level].b; if (b->sib_u64s[sib] > trans->c->btree_foreground_merge_threshold) return 0; return __bch2_foreground_maybe_merge(trans, path_idx, level, flags, sib); } static inline int bch2_foreground_maybe_merge(struct btree_trans *trans, btree_path_idx_t path, unsigned level, unsigned flags) { bch2_trans_verify_not_unlocked_or_in_restart(trans); return bch2_foreground_maybe_merge_sibling(trans, path, level, flags, btree_prev_sib) ?: bch2_foreground_maybe_merge_sibling(trans, path, level, flags, btree_next_sib); } int bch2_btree_node_rewrite(struct btree_trans *, struct btree_iter *, struct btree *, unsigned, unsigned); int bch2_btree_node_rewrite_key(struct btree_trans *, enum btree_id, unsigned, struct bkey_i *, unsigned); int bch2_btree_node_rewrite_pos(struct btree_trans *, enum btree_id, unsigned, struct bpos, unsigned, unsigned); int bch2_btree_node_rewrite_key_get_iter(struct btree_trans *, struct btree *, unsigned); void bch2_btree_node_rewrite_async(struct bch_fs *, struct btree *); int bch2_btree_node_update_key(struct btree_trans *, struct btree_iter *, struct btree *, 
struct bkey_i *, unsigned, bool); int bch2_btree_node_update_key_get_iter(struct btree_trans *, struct btree *, struct bkey_i *, unsigned, bool); void bch2_btree_set_root_for_read(struct bch_fs *, struct btree *); int bch2_btree_root_alloc_fake_trans(struct btree_trans *, enum btree_id, unsigned); void bch2_btree_root_alloc_fake(struct bch_fs *, enum btree_id, unsigned); static inline unsigned btree_update_reserve_required(struct bch_fs *c, struct btree *b) { unsigned depth = btree_node_root(c, b)->c.level + 1; /* * Number of nodes we might have to allocate in a worst case btree * split operation - we split all the way up to the root, then allocate * a new root, unless we're already at max depth: */ if (depth < BTREE_MAX_DEPTH) return (depth - b->c.level) * 2 + 1; else return (depth - b->c.level) * 2 - 1; } static inline void btree_node_reset_sib_u64s(struct btree *b) { b->sib_u64s[0] = b->nr.live_u64s; b->sib_u64s[1] = b->nr.live_u64s; } static inline void *btree_data_end(struct btree *b) { return (void *) b->data + btree_buf_bytes(b); } static inline struct bkey_packed *unwritten_whiteouts_start(struct btree *b) { return (void *) ((u64 *) btree_data_end(b) - b->whiteout_u64s); } static inline struct bkey_packed *unwritten_whiteouts_end(struct btree *b) { return btree_data_end(b); } static inline void *write_block(struct btree *b) { return (void *) b->data + (b->written << 9); } static inline bool __btree_addr_written(struct btree *b, void *p) { return p < write_block(b); } static inline bool bset_written(struct btree *b, struct bset *i) { return __btree_addr_written(b, i); } static inline bool bkey_written(struct btree *b, struct bkey_packed *k) { return __btree_addr_written(b, k); } static inline ssize_t __bch2_btree_u64s_remaining(struct btree *b, void *end) { ssize_t used = bset_byte_offset(b, end) / sizeof(u64) + b->whiteout_u64s; ssize_t total = btree_buf_bytes(b) >> 3; /* Always leave one extra u64 for bch2_varint_decode: */ used++; return total - used; } static inline size_t bch2_btree_keys_u64s_remaining(struct btree *b) { ssize_t remaining = __bch2_btree_u64s_remaining(b, btree_bkey_last(b, bset_tree_last(b))); BUG_ON(remaining < 0); if (bset_written(b, btree_bset_last(b))) return 0; return remaining; } #define BTREE_WRITE_SET_U64s_BITS 9 static inline unsigned btree_write_set_buffer(struct btree *b) { /* * Could buffer up larger amounts of keys for btrees with larger keys, * pending benchmarking: */ return 8 << BTREE_WRITE_SET_U64s_BITS; } static inline struct btree_node_entry *want_new_bset(struct bch_fs *c, struct btree *b) { struct bset_tree *t = bset_tree_last(b); struct btree_node_entry *bne = max(write_block(b), (void *) btree_bkey_last(b, t)); ssize_t remaining_space = __bch2_btree_u64s_remaining(b, bne->keys.start); if (unlikely(bset_written(b, bset(b, t)))) { if (b->written + block_sectors(c) <= btree_sectors(c)) return bne; } else { if (unlikely(bset_u64s(t) * sizeof(u64) > btree_write_set_buffer(b)) && remaining_space > (ssize_t) (btree_write_set_buffer(b) >> 3)) return bne; } return NULL; } static inline void push_whiteout(struct btree *b, struct bpos pos) { struct bkey_packed k; BUG_ON(bch2_btree_keys_u64s_remaining(b) < BKEY_U64s); EBUG_ON(btree_node_just_written(b)); if (!bkey_pack_pos(&k, pos, b)) { struct bkey *u = (void *) &k; bkey_init(u); u->p = pos; } k.needs_whiteout = true; b->whiteout_u64s += k.u64s; bkey_p_copy(unwritten_whiteouts_start(b), &k); } /* * write lock must be held on @b (else the dirty bset that we were going to * insert into could be 
written out from under us) */ static inline bool bch2_btree_node_insert_fits(struct btree *b, unsigned u64s) { if (unlikely(btree_node_need_rewrite(b))) return false; return u64s <= bch2_btree_keys_u64s_remaining(b); } void bch2_btree_updates_to_text(struct printbuf *, struct bch_fs *); bool bch2_btree_interior_updates_flush(struct bch_fs *); void bch2_journal_entry_to_btree_root(struct bch_fs *, struct jset_entry *); struct jset_entry *bch2_btree_roots_to_journal_entries(struct bch_fs *, struct jset_entry *, unsigned long); void bch2_async_btree_node_rewrites_flush(struct bch_fs *); void bch2_do_pending_node_rewrites(struct bch_fs *); void bch2_free_pending_node_rewrites(struct bch_fs *); void bch2_btree_reserve_cache_to_text(struct printbuf *, struct bch_fs *); void bch2_fs_btree_interior_update_exit(struct bch_fs *); void bch2_fs_btree_interior_update_init_early(struct bch_fs *); int bch2_fs_btree_interior_update_init(struct bch_fs *); #endif /* _BCACHEFS_BTREE_UPDATE_INTERIOR_H */ |
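/*
 * A standalone restatement of the worst-case allocation count computed by
 * btree_update_reserve_required() above, with the bcachefs types stripped
 * out so the arithmetic can be checked in isolation.  MAX_DEPTH_EXAMPLE and
 * the function names here are illustrative stand-ins, not bcachefs symbols.
 */
#include <assert.h>

#define MAX_DEPTH_EXAMPLE 4

/*
 * A split at @level may split every ancestor up to the current root (two
 * new nodes per level) and, while the tree is below its maximum depth,
 * allocate one brand new root on top; at maximum depth no new root is
 * possible.
 */
static unsigned int reserve_required(unsigned int root_level, unsigned int level)
{
	unsigned int depth = root_level + 1;

	if (depth < MAX_DEPTH_EXAMPLE)
		return (depth - level) * 2 + 1;
	else
		return (depth - level) * 2 - 1;
}

int main(void)
{
	/* root at level 1, splitting a leaf: two levels * 2 nodes + a new root = 5 */
	assert(reserve_required(1, 0) == 5);

	/* already at maximum depth: no new root can be allocated above it */
	assert(reserve_required(MAX_DEPTH_EXAMPLE - 1, 0) == 2 * MAX_DEPTH_EXAMPLE - 1);

	return 0;
}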
// SPDX-License-Identifier: GPL-2.0-only
/*
 * linux/kernel/ptrace.c
 *
 * (C) Copyright 1999 Linus Torvalds
 *
 * Common interfaces for "ptrace()" which we do not want
 * to continually duplicate across every architecture.
 */

#include <linux/capability.h>
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/sched/mm.h>
#include <linux/sched/coredump.h>
#include <linux/sched/task.h>
#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/ptrace.h>
#include <linux/security.h>
#include <linux/signal.h>
#include <linux/uio.h>
#include <linux/audit.h>
#include <linux/pid_namespace.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
#include <linux/regset.h>
#include <linux/hw_breakpoint.h>
#include <linux/cn_proc.h>
#include <linux/compat.h>
#include <linux/sched/signal.h>
#include <linux/minmax.h>
#include <linux/syscall_user_dispatch.h>

#include <asm/syscall.h>	/* for syscall_get_* */

/*
 * Access another process' address space via ptrace.
 * Source/target buffer must be kernel space,
 * Do not walk the page table directly, use get_user_pages
 */
int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
		     void *buf, int len, unsigned int gup_flags)
{
	struct mm_struct *mm;
	int ret;

	mm = get_task_mm(tsk);
	if (!mm)
		return 0;

	if (!tsk->ptrace ||
	    (current != tsk->parent) ||
	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
	     !ptracer_capable(tsk, mm->user_ns))) {
		mmput(mm);
		return 0;
	}

	ret = access_remote_vm(mm, addr, buf, len, gup_flags);
	mmput(mm);

	return ret;
}
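/*
 * A minimal usage sketch (illustrative only, not part of the original
 * source): a word-sized peek in the style of generic_ptrace_peekdata(),
 * defined later in this file, is assumed to look roughly like the
 * following.  The caller checks the copied length because
 * ptrace_access_vm() returns the number of bytes copied (0 when access
 * is refused), not a negative errno.
 *
 *	unsigned long tmp;
 *	int copied;
 *
 *	copied = ptrace_access_vm(tsk, addr, &tmp, sizeof(tmp), FOLL_FORCE);
 *	if (copied != sizeof(tmp))
 *		return -EIO;
 *	return put_user(tmp, (unsigned long __user *)data);
 */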
void __ptrace_link(struct task_struct *child, struct task_struct *new_parent,
		   const struct cred *ptracer_cred)
{
	BUG_ON(!list_empty(&child->ptrace_entry));
	list_add(&child->ptrace_entry, &new_parent->ptraced);
	child->parent = new_parent;
	child->ptracer_cred = get_cred(ptracer_cred);
}

/*
 * ptrace a task: make the debugger its new parent and
 * move it to the ptrace list.
 *
 * Must be called with the tasklist lock write-held.
 */
static void ptrace_link(struct task_struct *child, struct task_struct *new_parent)
{
	__ptrace_link(child, new_parent, current_cred());
}

/**
 * __ptrace_unlink - unlink ptracee and restore its execution state
 * @child: ptracee to be unlinked
 *
 * Remove @child from the ptrace list, move it back to the original parent,
 * and restore the execution state so that it conforms to the group stop
 * state.
 *
 * Unlinking can happen via two paths - explicit PTRACE_DETACH or ptracer
 * exiting. For PTRACE_DETACH, unless the ptracee has been killed between
 * ptrace_check_attach() and here, it's guaranteed to be in TASK_TRACED.
 * If the ptracer is exiting, the ptracee can be in any state.
 *
 * After detach, the ptracee should be in a state which conforms to the
 * group stop. If the group is stopped or in the process of stopping, the
 * ptracee should be put into TASK_STOPPED; otherwise, it should be woken
 * up from TASK_TRACED.
 *
 * If the ptracee is in TASK_TRACED and needs to be moved to TASK_STOPPED,
 * it goes through TRACED -> RUNNING -> STOPPED transition which is similar
 * to but in the opposite direction of what happens while attaching to a
 * stopped task. However, in this direction, the intermediate RUNNING
 * state is not hidden even from the current ptracer and if it immediately
 * re-attaches and performs a WNOHANG wait(2), it may fail.
 *
 * CONTEXT:
 * write_lock_irq(tasklist_lock)
 */
void __ptrace_unlink(struct task_struct *child)
{
	const struct cred *old_cred;

	BUG_ON(!child->ptrace);

	clear_task_syscall_work(child, SYSCALL_TRACE);
#if defined(CONFIG_GENERIC_ENTRY) || defined(TIF_SYSCALL_EMU)
	clear_task_syscall_work(child, SYSCALL_EMU);
#endif

	child->parent = child->real_parent;
	list_del_init(&child->ptrace_entry);
	old_cred = child->ptracer_cred;
	child->ptracer_cred = NULL;
	put_cred(old_cred);

	spin_lock(&child->sighand->siglock);
	child->ptrace = 0;
	/*
	 * Clear all pending traps and TRAPPING. TRAPPING should be
	 * cleared regardless of JOBCTL_STOP_PENDING. Do it explicitly.
	 */
	task_clear_jobctl_pending(child, JOBCTL_TRAP_MASK);
	task_clear_jobctl_trapping(child);

	/*
	 * Reinstate JOBCTL_STOP_PENDING if group stop is in effect and
	 * @child isn't dead.
	 */
	if (!(child->flags & PF_EXITING) &&
	    (child->signal->flags & SIGNAL_STOP_STOPPED ||
	     child->signal->group_stop_count))
		child->jobctl |= JOBCTL_STOP_PENDING;

	/*
	 * If transition to TASK_STOPPED is pending or in TASK_TRACED, kick
	 * @child in the butt. Note that @resume should be used iff @child
	 * is in TASK_TRACED; otherwise, we might unduly disrupt
	 * TASK_KILLABLE sleeps.
	 */
	if (child->jobctl & JOBCTL_STOP_PENDING || task_is_traced(child))
		ptrace_signal_wake_up(child, true);

	spin_unlock(&child->sighand->siglock);
}

static bool looks_like_a_spurious_pid(struct task_struct *task)
{
	if (task->exit_code != ((PTRACE_EVENT_EXEC << 8) | SIGTRAP))
		return false;

	if (task_pid_vnr(task) == task->ptrace_message)
		return false;

	/*
	 * The tracee changed its pid but the PTRACE_EVENT_EXEC event
	 * was not wait()'ed, most probably debugger targets the old
	 * leader which was destroyed in de_thread().
	 */
	return true;
}

/*
 * Ensure that nothing can wake it up, even SIGKILL
 *
 * A task is switched to this state while a ptrace operation is in progress;
 * such that the ptrace operation is uninterruptible.
 */
static bool ptrace_freeze_traced(struct task_struct *task)
{
	bool ret = false;

	/* Lockless, nobody but us can set this flag */
	if (task->jobctl & JOBCTL_LISTENING)
		return ret;

	spin_lock_irq(&task->sighand->siglock);
	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
	    !__fatal_signal_pending(task)) {
		task->jobctl |= JOBCTL_PTRACE_FROZEN;
		ret = true;
	}
	spin_unlock_irq(&task->sighand->siglock);

	return ret;
}

static void ptrace_unfreeze_traced(struct task_struct *task)
{
	unsigned long flags;

	/*
	 * The child may be awake and may have cleared
	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume). The child will
	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
	 */
	if (lock_task_sighand(task, &flags)) {
		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
		if (__fatal_signal_pending(task)) {