// SPDX-License-Identifier: GPL-2.0-only
/*
 * Kernel-based Virtual Machine driver for Linux
 *
 * This module enables machines with Intel VT-x extensions to run virtual
 * machines without emulation or binary translation.
 *
 * Copyright (C) 2006 Qumranet, Inc.
 * Copyright 2010 Red Hat, Inc. and/or its affiliates.
 *
 * Authors:
 *   Avi Kivity   <avi@qumranet.com>
 *   Yaniv Kamay  <yaniv@qumranet.com>
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/highmem.h>
#include <linux/hrtimer.h>
#include <linux/kernel.h>
#include <linux/kvm_host.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/mod_devicetable.h>
#include <linux/mm.h>
#include <linux/objtool.h>
#include <linux/sched.h>
#include <linux/sched/smt.h>
#include <linux/slab.h>
#include <linux/tboot.h>
#include <linux/trace_events.h>
#include <linux/entry-kvm.h>

#include <asm/apic.h>
#include <asm/asm.h>
#include <asm/cpu.h>
#include <asm/cpu_device_id.h>
#include <asm/debugreg.h>
#include <asm/desc.h>
#include <asm/fpu/api.h>
#include <asm/fpu/xstate.h>
#include <asm/fred.h>
#include <asm/idtentry.h>
#include <asm/io.h>
#include <asm/irq_remapping.h>
#include <asm/reboot.h>
#include <asm/perf_event.h>
#include <asm/mmu_context.h>
#include <asm/mshyperv.h>
#include <asm/msr.h>
#include <asm/mwait.h>
#include <asm/spec-ctrl.h>
#include <asm/vmx.h>

#include <trace/events/ipi.h>

#include "capabilities.h"
#include "common.h"
#include "cpuid.h"
#include "hyperv.h"
#include "kvm_onhyperv.h"
#include "irq.h"
#include "kvm_cache_regs.h"
#include "lapic.h"
#include "mmu.h"
#include "nested.h"
#include "pmu.h"
#include "sgx.h"
#include "trace.h"
#include "vmcs.h"
#include "vmcs12.h"
#include "vmx.h"
#include "x86.h"
#include "x86_ops.h"
#include "smm.h"
#include "vmx_onhyperv.h"
#include "posted_intr.h"

#include "mmu/spte.h"

MODULE_AUTHOR("Qumranet");
MODULE_DESCRIPTION("KVM support for VMX (Intel VT-x) extensions");
MODULE_LICENSE("GPL");

#ifdef MODULE
static const struct x86_cpu_id vmx_cpu_id[] = {
	X86_MATCH_FEATURE(X86_FEATURE_VMX, NULL),
	{}
};
MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id);
#endif

bool __read_mostly enable_vpid = 1;
module_param_named(vpid, enable_vpid, bool, 0444);

static bool __read_mostly enable_vnmi = 1;
module_param_named(vnmi, enable_vnmi, bool, 0444);

bool __read_mostly flexpriority_enabled = 1;
module_param_named(flexpriority, flexpriority_enabled, bool, 0444);

bool __read_mostly enable_ept = 1;
module_param_named(ept, enable_ept, bool, 0444);

bool __read_mostly enable_unrestricted_guest = 1;
module_param_named(unrestricted_guest, enable_unrestricted_guest, bool, 0444);

bool __read_mostly enable_ept_ad_bits = 1;
module_param_named(eptad, enable_ept_ad_bits, bool, 0444);

static bool __read_mostly emulate_invalid_guest_state = true;
module_param(emulate_invalid_guest_state, bool, 0444);

static bool __read_mostly fasteoi = 1;
module_param(fasteoi, bool, 0444);

module_param(enable_apicv, bool, 0444);
module_param(enable_ipiv, bool, 0444);
module_param(enable_device_posted_irqs, bool, 0444);

/*
 * If nested=1, nested virtualization is supported, i.e., guests may use
 * VMX and be a hypervisor for its own guests. If nested=0, guests may not
 * use VMX instructions.
 */
static bool __read_mostly nested = 1;
module_param(nested, bool, 0444);

bool __read_mostly enable_pml = 1;
module_param_named(pml, enable_pml, bool, 0444);

static bool __read_mostly error_on_inconsistent_vmcs_config = true;
module_param(error_on_inconsistent_vmcs_config, bool, 0444);

static bool __read_mostly dump_invalid_vmcs = 0;
module_param(dump_invalid_vmcs, bool, 0644);
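/*
 * A note on the parameters above: those registered with mode 0444 are
 * read-only once the module is loaded, while 0644 ones such as
 * dump_invalid_vmcs may be changed at runtime. Assuming the module is
 * loaded as kvm_intel, all of them show up under
 * /sys/module/kvm_intel/parameters/.
 */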
#define MSR_BITMAP_MODE_X2APIC		1
#define MSR_BITMAP_MODE_X2APIC_APICV	2

#define KVM_VMX_TSC_MULTIPLIER_MAX	0xffffffffffffffffULL

/* Guest_tsc -> host_tsc conversion requires 64-bit division. */
static int __read_mostly cpu_preemption_timer_multi;
static bool __read_mostly enable_preemption_timer = 1;
#ifdef CONFIG_X86_64
module_param_named(preemption_timer, enable_preemption_timer, bool, S_IRUGO);
#endif

extern bool __read_mostly allow_smaller_maxphyaddr;
module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);

#define KVM_VM_CR0_ALWAYS_OFF (X86_CR0_NW | X86_CR0_CD)
#define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST X86_CR0_NE
#define KVM_VM_CR0_ALWAYS_ON				\
	(KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST | X86_CR0_PG | X86_CR0_PE)

#define KVM_VM_CR4_ALWAYS_ON_UNRESTRICTED_GUEST X86_CR4_VMXE
#define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE)
#define KVM_RMODE_VM_CR4_ALWAYS_ON (X86_CR4_VME | X86_CR4_PAE | X86_CR4_VMXE)

#define RMODE_GUEST_OWNED_EFLAGS_BITS (~(X86_EFLAGS_IOPL | X86_EFLAGS_VM))

#define MSR_IA32_RTIT_STATUS_MASK (~(RTIT_STATUS_FILTEREN | \
	RTIT_STATUS_CONTEXTEN | RTIT_STATUS_TRIGGEREN | \
	RTIT_STATUS_ERROR | RTIT_STATUS_STOPPED | \
	RTIT_STATUS_BYTECNT))

/*
 * These 2 parameters are used to config the controls for Pause-Loop Exiting:
 * ple_gap:    upper bound on the amount of time between two successive
 *             executions of PAUSE in a loop. Also indicate if ple enabled.
 *             According to test, this time is usually smaller than 128 cycles.
 * ple_window: upper bound on the amount of time a guest is allowed to execute
 *             in a PAUSE loop. Tests indicate that most spinlocks are held for
 *             less than 2^12 cycles
 * Time is measured based on a counter that runs at the same rate as the TSC,
 * refer SDM volume 3b section 21.6.13 & 22.1.3.
 */
static unsigned int ple_gap = KVM_DEFAULT_PLE_GAP;
module_param(ple_gap, uint, 0444);

static unsigned int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
module_param(ple_window, uint, 0444);

/* Default doubles per-vcpu window every exit. */
static unsigned int ple_window_grow = KVM_DEFAULT_PLE_WINDOW_GROW;
module_param(ple_window_grow, uint, 0444);

/* Default resets per-vcpu window every exit to ple_window. */
static unsigned int ple_window_shrink = KVM_DEFAULT_PLE_WINDOW_SHRINK;
module_param(ple_window_shrink, uint, 0444);
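/*
 * For illustration: ple_gap doubles as the enable flag (see the comment
 * above), so loading the module with e.g. "modprobe kvm_intel ple_gap=0"
 * turns Pause-Loop Exiting off entirely, in which case the window
 * parameters are unused. The kvm_intel module name is assumed here.
 */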
/* Default is to compute the maximum so we can never overflow. */
static unsigned int ple_window_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
module_param(ple_window_max, uint, 0444);

/* Default is SYSTEM mode, 1 for host-guest mode (which is BROKEN) */
int __read_mostly pt_mode = PT_MODE_SYSTEM;
#ifdef CONFIG_BROKEN
module_param(pt_mode, int, S_IRUGO);
#endif

struct x86_pmu_lbr __ro_after_init vmx_lbr_caps;

static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond);
static DEFINE_MUTEX(vmx_l1d_flush_mutex);

/* Storage for pre module init parameter parsing */
static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush_param = VMENTER_L1D_FLUSH_AUTO;

static const struct {
	const char *option;
	bool for_parse;
} vmentry_l1d_param[] = {
	[VMENTER_L1D_FLUSH_AUTO]	 = {"auto", true},
	[VMENTER_L1D_FLUSH_NEVER]	 = {"never", true},
	[VMENTER_L1D_FLUSH_COND]	 = {"cond", true},
	[VMENTER_L1D_FLUSH_ALWAYS]	 = {"always", true},
	[VMENTER_L1D_FLUSH_EPT_DISABLED] = {"EPT disabled", false},
	[VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false},
};

#define L1D_CACHE_ORDER 4
static void *vmx_l1d_flush_pages;

static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
{
	struct page *page;
	unsigned int i;

	if (!boot_cpu_has_bug(X86_BUG_L1TF)) {
		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
		return 0;
	}

	if (!enable_ept) {
		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
		return 0;
	}

	if (kvm_host.arch_capabilities & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
		return 0;
	}

	/* If set to auto use the default l1tf mitigation method */
	if (l1tf == VMENTER_L1D_FLUSH_AUTO) {
		switch (l1tf_mitigation) {
		case L1TF_MITIGATION_OFF:
			l1tf = VMENTER_L1D_FLUSH_NEVER;
			break;
		case L1TF_MITIGATION_AUTO:
		case L1TF_MITIGATION_FLUSH_NOWARN:
		case L1TF_MITIGATION_FLUSH:
		case L1TF_MITIGATION_FLUSH_NOSMT:
			l1tf = VMENTER_L1D_FLUSH_COND;
			break;
		case L1TF_MITIGATION_FULL:
		case L1TF_MITIGATION_FULL_FORCE:
			l1tf = VMENTER_L1D_FLUSH_ALWAYS;
			break;
		}
	} else if (l1tf_mitigation == L1TF_MITIGATION_FULL_FORCE) {
		l1tf = VMENTER_L1D_FLUSH_ALWAYS;
	}

	if (l1tf != VMENTER_L1D_FLUSH_NEVER && !vmx_l1d_flush_pages &&
	    !boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {
		/*
		 * This allocation for vmx_l1d_flush_pages is not tied to a VM
		 * lifetime and so should not be charged to a memcg.
		 */
		page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
		if (!page)
			return -ENOMEM;
		vmx_l1d_flush_pages = page_address(page);

		/*
		 * Initialize each page with a different pattern in
		 * order to protect against KSM in the nested
		 * virtualization case.
		 */
		for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) {
			memset(vmx_l1d_flush_pages + i * PAGE_SIZE, i + 1,
			       PAGE_SIZE);
		}
	}

	l1tf_vmx_mitigation = l1tf;

	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
		static_branch_enable(&vmx_l1d_should_flush);
	else
		static_branch_disable(&vmx_l1d_should_flush);

	if (l1tf == VMENTER_L1D_FLUSH_COND)
		static_branch_enable(&vmx_l1d_flush_cond);
	else
		static_branch_disable(&vmx_l1d_flush_cond);
	return 0;
}

static int vmentry_l1d_flush_parse(const char *s)
{
	unsigned int i;

	if (s) {
		for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) {
			if (vmentry_l1d_param[i].for_parse &&
			    sysfs_streq(s, vmentry_l1d_param[i].option))
				return i;
		}
	}
	return -EINVAL;
}

static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp)
{
	int l1tf, ret;

	l1tf = vmentry_l1d_flush_parse(s);
	if (l1tf < 0)
		return l1tf;

	if (!boot_cpu_has(X86_BUG_L1TF))
		return 0;

	/*
	 * Has vmx_init() run already? If not then this is the pre init
	 * parameter parsing.
	 * In that case just store the value and let
	 * vmx_init() do the proper setup after enable_ept has been
	 * established.
	 */
	if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_AUTO) {
		vmentry_l1d_flush_param = l1tf;
		return 0;
	}

	mutex_lock(&vmx_l1d_flush_mutex);
	ret = vmx_setup_l1d_flush(l1tf);
	mutex_unlock(&vmx_l1d_flush_mutex);
	return ret;
}

static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
{
	if (WARN_ON_ONCE(l1tf_vmx_mitigation >= ARRAY_SIZE(vmentry_l1d_param)))
		return sysfs_emit(s, "???\n");

	return sysfs_emit(s, "%s\n", vmentry_l1d_param[l1tf_vmx_mitigation].option);
}

static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
{
	u64 msr;

	if (!vmx->disable_fb_clear)
		return;

	msr = native_rdmsrq(MSR_IA32_MCU_OPT_CTRL);
	msr |= FB_CLEAR_DIS;
	native_wrmsrq(MSR_IA32_MCU_OPT_CTRL, msr);
	/* Cache the MSR value to avoid reading it later */
	vmx->msr_ia32_mcu_opt_ctrl = msr;
}

static __always_inline void vmx_enable_fb_clear(struct vcpu_vmx *vmx)
{
	if (!vmx->disable_fb_clear)
		return;

	vmx->msr_ia32_mcu_opt_ctrl &= ~FB_CLEAR_DIS;
	native_wrmsrq(MSR_IA32_MCU_OPT_CTRL, vmx->msr_ia32_mcu_opt_ctrl);
}

static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
{
	/*
	 * Disable VERW's behavior of clearing CPU buffers for the guest if the
	 * CPU isn't affected by MDS/TAA, and the host hasn't forcefully enabled
	 * the mitigation. Disabling the clearing behavior provides a
	 * performance boost for guests that aren't aware that manually clearing
	 * CPU buffers is unnecessary, at the cost of MSR accesses on VM-Entry
	 * and VM-Exit.
	 */
	vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) &&
				(kvm_host.arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
				!boot_cpu_has_bug(X86_BUG_MDS) &&
				!boot_cpu_has_bug(X86_BUG_TAA);

	/*
	 * If guest will not execute VERW, there is no need to set FB_CLEAR_DIS
	 * at VMEntry. Skip the MSR read/write when a guest has no use case to
	 * execute VERW.
	 */
	if ((vcpu->arch.arch_capabilities & ARCH_CAP_FB_CLEAR) ||
	    ((vcpu->arch.arch_capabilities & ARCH_CAP_MDS_NO) &&
	     (vcpu->arch.arch_capabilities & ARCH_CAP_TAA_NO) &&
	     (vcpu->arch.arch_capabilities & ARCH_CAP_PSDP_NO) &&
	     (vcpu->arch.arch_capabilities & ARCH_CAP_FBSDP_NO) &&
	     (vcpu->arch.arch_capabilities & ARCH_CAP_SBDR_SSDP_NO)))
		vmx->disable_fb_clear = false;
}

static const struct kernel_param_ops vmentry_l1d_flush_ops = {
	.set = vmentry_l1d_flush_set,
	.get = vmentry_l1d_flush_get,
};
module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, 0644);
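/*
 * The 0644 mode above means the L1D flush policy can be changed after the
 * module is loaded, e.g. (assuming the module is loaded as kvm_intel):
 *
 *   echo cond > /sys/module/kvm_intel/parameters/vmentry_l1d_flush
 *
 * which goes through vmentry_l1d_flush_set() and re-runs
 * vmx_setup_l1d_flush() under vmx_l1d_flush_mutex.
 */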
static u32 vmx_segment_access_rights(struct kvm_segment *var);

void vmx_vmexit(void);

#define vmx_insn_failed(fmt...)		\
do {					\
	WARN_ONCE(1, fmt);		\
	pr_warn_ratelimited(fmt);	\
} while (0)

noinline void vmread_error(unsigned long field)
{
	vmx_insn_failed("vmread failed: field=%lx\n", field);
}

#ifndef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
noinstr void vmread_error_trampoline2(unsigned long field, bool fault)
{
	if (fault) {
		kvm_spurious_fault();
	} else {
		instrumentation_begin();
		vmread_error(field);
		instrumentation_end();
	}
}
#endif

noinline void vmwrite_error(unsigned long field, unsigned long value)
{
	vmx_insn_failed("vmwrite failed: field=%lx val=%lx err=%u\n",
			field, value, vmcs_read32(VM_INSTRUCTION_ERROR));
}

noinline void vmclear_error(struct vmcs *vmcs, u64 phys_addr)
{
	vmx_insn_failed("vmclear failed: %p/%llx err=%u\n",
			vmcs, phys_addr, vmcs_read32(VM_INSTRUCTION_ERROR));
}

noinline void vmptrld_error(struct vmcs *vmcs, u64 phys_addr)
{
	vmx_insn_failed("vmptrld failed: %p/%llx err=%u\n",
			vmcs, phys_addr, vmcs_read32(VM_INSTRUCTION_ERROR));
}

noinline void invvpid_error(unsigned long ext, u16 vpid, gva_t gva)
{
	vmx_insn_failed("invvpid failed: ext=0x%lx vpid=%u gva=0x%lx\n",
			ext, vpid, gva);
}

noinline void invept_error(unsigned long ext, u64 eptp)
{
	vmx_insn_failed("invept failed: ext=0x%lx eptp=%llx\n", ext, eptp);
}

static DEFINE_PER_CPU(struct vmcs *, vmxarea);
DEFINE_PER_CPU(struct vmcs *, current_vmcs);
/*
 * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
 * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
 */
static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);

static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
static DEFINE_SPINLOCK(vmx_vpid_lock);

struct vmcs_config vmcs_config __ro_after_init;
struct vmx_capability vmx_capability __ro_after_init;

#define VMX_SEGMENT_FIELD(seg)					\
	[VCPU_SREG_##seg] = {					\
		.selector = GUEST_##seg##_SELECTOR,		\
		.base = GUEST_##seg##_BASE,			\
		.limit = GUEST_##seg##_LIMIT,			\
		.ar_bytes = GUEST_##seg##_AR_BYTES,		\
	}

static const struct kvm_vmx_segment_field {
	unsigned selector;
	unsigned base;
	unsigned limit;
	unsigned ar_bytes;
} kvm_vmx_segment_fields[] = {
	VMX_SEGMENT_FIELD(CS),
	VMX_SEGMENT_FIELD(DS),
	VMX_SEGMENT_FIELD(ES),
	VMX_SEGMENT_FIELD(FS),
	VMX_SEGMENT_FIELD(GS),
	VMX_SEGMENT_FIELD(SS),
	VMX_SEGMENT_FIELD(TR),
	VMX_SEGMENT_FIELD(LDTR),
};
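/*
 * For reference, VMX_SEGMENT_FIELD(CS) above expands to:
 *
 *   [VCPU_SREG_CS] = {
 *       .selector = GUEST_CS_SELECTOR,
 *       .base     = GUEST_CS_BASE,
 *       .limit    = GUEST_CS_LIMIT,
 *       .ar_bytes = GUEST_CS_AR_BYTES,
 *   }
 *
 * i.e. each table entry maps a KVM segment register to the four VMCS
 * guest-state field encodings that describe it.
 */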
*/ if (ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED && (ms_hyperv.nested_features & HV_X64_ENLIGHTENED_VMCS_VERSION) >= KVM_EVMCS_VERSION) { /* Check that we have assist pages on all online CPUs */ for_each_online_cpu(cpu) { if (!hv_get_vp_assist_page(cpu)) { enlightened_vmcs = false; break; } } if (enlightened_vmcs) { pr_info("Using Hyper-V Enlightened VMCS\n"); static_branch_enable(&__kvm_is_using_evmcs); } if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) vt_x86_ops.enable_l2_tlb_flush = hv_enable_l2_tlb_flush; } else { enlightened_vmcs = false; } } static void hv_reset_evmcs(void) { struct hv_vp_assist_page *vp_ap; if (!kvm_is_using_evmcs()) return; /* * KVM should enable eVMCS if and only if all CPUs have a VP assist * page, and should reject CPU onlining if eVMCS is enabled but the CPU * doesn't have a VP assist page allocated. */ vp_ap = hv_get_vp_assist_page(smp_processor_id()); if (WARN_ON_ONCE(!vp_ap)) return; /* * Reset everything to support using non-enlightened VMCS access later * (e.g. when we reload the module with enlightened_vmcs=0) */ vp_ap->nested_control.features.directhypercall = 0; vp_ap->current_nested_vmcs = 0; vp_ap->enlighten_vmentry = 0; } #else /* IS_ENABLED(CONFIG_HYPERV) */ static void hv_init_evmcs(void) {} static void hv_reset_evmcs(void) {} #endif /* IS_ENABLED(CONFIG_HYPERV) */ /* * Comment format: document - errata name - stepping - processor name. * Taken from * https://www.virtualbox.org/svn/vbox/trunk/src/VBox/VMM/VMMR0/HMR0.cpp */ static u32 vmx_preemption_cpu_tfms[] = { /* 323344.pdf - BA86 - D0 - Xeon 7500 Series */ 0x000206E6, /* 323056.pdf - AAX65 - C2 - Xeon L3406 */ /* 322814.pdf - AAT59 - C2 - i7-600, i5-500, i5-400 and i3-300 Mobile */ /* 322911.pdf - AAU65 - C2 - i5-600, i3-500 Desktop and Pentium G6950 */ 0x00020652, /* 322911.pdf - AAU65 - K0 - i5-600, i3-500 Desktop and Pentium G6950 */ 0x00020655, /* 322373.pdf - AAO95 - B1 - Xeon 3400 Series */ /* 322166.pdf - AAN92 - B1 - i7-800 and i5-700 Desktop */ /* * 320767.pdf - AAP86 - B1 - * i7-900 Mobile Extreme, i7-800 and i7-700 Mobile */ 0x000106E5, /* 321333.pdf - AAM126 - C0 - Xeon 3500 */ 0x000106A0, /* 321333.pdf - AAM126 - C1 - Xeon 3500 */ 0x000106A1, /* 320836.pdf - AAJ124 - C0 - i7-900 Desktop Extreme and i7-900 Desktop */ 0x000106A4, /* 321333.pdf - AAM126 - D0 - Xeon 3500 */ /* 321324.pdf - AAK139 - D0 - Xeon 5500 */ /* 320836.pdf - AAJ124 - D0 - i7-900 Extreme and i7-900 Desktop */ 0x000106A5, /* Xeon E3-1220 V2 */ 0x000306A8, }; static inline bool cpu_has_broken_vmx_preemption_timer(void) { u32 eax = cpuid_eax(0x00000001), i; /* Clear the reserved bits */ eax &= ~(0x3U << 14 | 0xfU << 28); for (i = 0; i < ARRAY_SIZE(vmx_preemption_cpu_tfms); i++) if (eax == vmx_preemption_cpu_tfms[i]) return true; return false; } static inline bool cpu_need_virtualize_apic_accesses(struct kvm_vcpu *vcpu) { return flexpriority_enabled && lapic_in_kernel(vcpu); } struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr) { int i; i = kvm_find_user_return_msr(msr); if (i >= 0) return &vmx->guest_uret_msrs[i]; return NULL; } static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx, struct vmx_uret_msr *msr, u64 data) { unsigned int slot = msr - vmx->guest_uret_msrs; int ret = 0; if (msr->load_into_hardware) { preempt_disable(); ret = kvm_set_user_return_msr(slot, data, msr->mask); preempt_enable(); } if (!ret) msr->data = data; return ret; } /* * Disable VMX and clear CR4.VMXE (even if VMXOFF faults) * * Note, VMXOFF causes a #UD if the CPU is !post-VMXON, but it's
impossible to * atomically track post-VMXON state, e.g. this may be called in NMI context. * Eat all faults, as all other faults on VMXOFF are mode related, i.e. * faults are guaranteed to be due to the !post-VMXON check unless the CPU is * magically in RM, VM86, compat mode, or at CPL>0. */ static int kvm_cpu_vmxoff(void) { asm goto("1: vmxoff\n\t" _ASM_EXTABLE(1b, %l[fault]) ::: "cc", "memory" : fault); cr4_clear_bits(X86_CR4_VMXE); return 0; fault: cr4_clear_bits(X86_CR4_VMXE); return -EIO; } void vmx_emergency_disable_virtualization_cpu(void) { int cpu = raw_smp_processor_id(); struct loaded_vmcs *v; kvm_rebooting = true; /* * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be * set in task context. If this races with VMX being disabled by an NMI, * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to * kvm_rebooting set. */ if (!(__read_cr4() & X86_CR4_VMXE)) return; list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu), loaded_vmcss_on_cpu_link) { vmcs_clear(v->vmcs); if (v->shadow_vmcs) vmcs_clear(v->shadow_vmcs); } kvm_cpu_vmxoff(); } static void __loaded_vmcs_clear(void *arg) { struct loaded_vmcs *loaded_vmcs = arg; int cpu = raw_smp_processor_id(); if (loaded_vmcs->cpu != cpu) return; /* vcpu migration can race with cpu offline */ if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs) per_cpu(current_vmcs, cpu) = NULL; vmcs_clear(loaded_vmcs->vmcs); if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched) vmcs_clear(loaded_vmcs->shadow_vmcs); list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link); /* * Ensure all writes to loaded_vmcs, including deleting it from its * current percpu list, complete before setting loaded_vmcs->cpu to * -1, otherwise a different cpu can see loaded_vmcs->cpu == -1 first * and add loaded_vmcs to its percpu list before it's deleted from this * cpu's list. Pairs with the smp_rmb() in vmx_vcpu_load_vmcs().
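 *
 * Editor's illustration (hypothetical interleaving, not upstream code)
 * of the reordering the barrier pair forbids:
 *
 *	CPU0 (__loaded_vmcs_clear)	CPU1 (vmx_vcpu_load_vmcs)
 *	list_del(...);
 *	smp_wmb();
 *	loaded_vmcs->cpu = -1;
 *					reads loaded_vmcs->cpu == -1
 *					smp_rmb();
 *					list_add(...) onto CPU1's list
 *
 * Without the pairing, CPU1 could observe cpu == -1 before CPU0's
 * list_del() is globally visible and link the VMCS onto its own list
 * while the entry still appears on CPU0's list.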
*/ smp_wmb(); loaded_vmcs->cpu = -1; loaded_vmcs->launched = 0; } void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs) { int cpu = loaded_vmcs->cpu; if (cpu != -1) smp_call_function_single(cpu, __loaded_vmcs_clear, loaded_vmcs, 1); } static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg, unsigned field) { bool ret; u32 mask = 1 << (seg * SEG_FIELD_NR + field); if (!kvm_register_is_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS)) { kvm_register_mark_available(&vmx->vcpu, VCPU_EXREG_SEGMENTS); vmx->segment_cache.bitmask = 0; } ret = vmx->segment_cache.bitmask & mask; vmx->segment_cache.bitmask |= mask; return ret; } static u16 vmx_read_guest_seg_selector(struct vcpu_vmx *vmx, unsigned seg) { u16 *p = &vmx->segment_cache.seg[seg].selector; if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_SEL)) *p = vmcs_read16(kvm_vmx_segment_fields[seg].selector); return *p; } static ulong vmx_read_guest_seg_base(struct vcpu_vmx *vmx, unsigned seg) { ulong *p = &vmx->segment_cache.seg[seg].base; if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_BASE)) *p = vmcs_readl(kvm_vmx_segment_fields[seg].base); return *p; } static u32 vmx_read_guest_seg_limit(struct vcpu_vmx *vmx, unsigned seg) { u32 *p = &vmx->segment_cache.seg[seg].limit; if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_LIMIT)) *p = vmcs_read32(kvm_vmx_segment_fields[seg].limit); return *p; } static u32 vmx_read_guest_seg_ar(struct vcpu_vmx *vmx, unsigned seg) { u32 *p = &vmx->segment_cache.seg[seg].ar; if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_AR)) *p = vmcs_read32(kvm_vmx_segment_fields[seg].ar_bytes); return *p; } void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu) { u32 eb; eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR) | (1u << DB_VECTOR) | (1u << AC_VECTOR); /* * #VE isn't used for VMX. To test against unexpected changes * related to #VE for VMX, intercept unexpected #VE and warn on it. */ if (IS_ENABLED(CONFIG_KVM_INTEL_PROVE_VE)) eb |= 1u << VE_VECTOR; /* * Guest access to VMware backdoor ports could legitimately * trigger #GP because of TSS I/O permission bitmap. * We intercept those #GP and allow access to them anyway * as VMware does. */ if (enable_vmware_backdoor) eb |= (1u << GP_VECTOR); if ((vcpu->guest_debug & (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)) == (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)) eb |= 1u << BP_VECTOR; if (to_vmx(vcpu)->rmode.vm86_active) eb = ~0; if (!vmx_need_pf_intercept(vcpu)) eb &= ~(1u << PF_VECTOR); /* When we are running a nested L2 guest and L1 specified for it a * certain exception bitmap, we must trap the same exceptions and pass * them to L1. When running L2, we will only handle the exceptions * specified above if L1 did not want them. */ if (is_guest_mode(vcpu)) eb |= get_vmcs12(vcpu)->exception_bitmap; else { int mask = 0, match = 0; if (enable_ept && (eb & (1u << PF_VECTOR))) { /* * If EPT is enabled, #PF is currently only intercepted * if MAXPHYADDR is smaller on the guest than on the * host. In that case we only care about present, * non-reserved faults. For vmcs02, however, PFEC_MASK * and PFEC_MATCH are set in prepare_vmcs02_rare. */ mask = PFERR_PRESENT_MASK | PFERR_RSVD_MASK; match = PFERR_PRESENT_MASK; } vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, mask); vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, match); } /* * Disabling xfd interception indicates that dynamic xfeatures * might be used in the guest. Always trap #NM in this case * to save guest xfd_err timely. 
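 *
 * Editor's note (illustrative): the exception bitmap is one bit per
 * vector, so trapping #NM is just setting bit NM_VECTOR (vector 7):
 *
 *	eb |= (1u << NM_VECTOR);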
*/ if (vcpu->arch.xfd_no_write_intercept) eb |= (1u << NM_VECTOR); vmcs_write32(EXCEPTION_BITMAP, eb); } /* * Check if MSR is intercepted for currently loaded MSR bitmap. */ static bool msr_write_intercepted(struct vcpu_vmx *vmx, u32 msr) { if (!(exec_controls_get(vmx) & CPU_BASED_USE_MSR_BITMAPS)) return true; return vmx_test_msr_bitmap_write(vmx->loaded_vmcs->msr_bitmap, msr); } unsigned int __vmx_vcpu_run_flags(struct vcpu_vmx *vmx) { unsigned int flags = 0; if (vmx->loaded_vmcs->launched) flags |= VMX_RUN_VMRESUME; /* * If writes to the SPEC_CTRL MSR aren't intercepted, the guest is free * to change it directly without causing a vmexit. In that case read * it after vmexit and store it in vmx->spec_ctrl. */ if (!msr_write_intercepted(vmx, MSR_IA32_SPEC_CTRL)) flags |= VMX_RUN_SAVE_SPEC_CTRL; if (static_branch_unlikely(&cpu_buf_vm_clear) && kvm_vcpu_can_access_host_mmio(&vmx->vcpu)) flags |= VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO; return flags; } static __always_inline void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx, unsigned long entry, unsigned long exit) { vm_entry_controls_clearbit(vmx, entry); vm_exit_controls_clearbit(vmx, exit); } int vmx_find_loadstore_msr_slot(struct vmx_msrs *m, u32 msr) { unsigned int i; for (i = 0; i < m->nr; ++i) { if (m->val[i].index == msr) return i; } return -ENOENT; } static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) { int i; struct msr_autoload *m = &vmx->msr_autoload; switch (msr) { case MSR_EFER: if (cpu_has_load_ia32_efer()) { clear_atomic_switch_msr_special(vmx, VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER); return; } break; case MSR_CORE_PERF_GLOBAL_CTRL: if (cpu_has_load_perf_global_ctrl()) { clear_atomic_switch_msr_special(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL); return; } break; } i = vmx_find_loadstore_msr_slot(&m->guest, msr); if (i < 0) goto skip_guest; --m->guest.nr; m->guest.val[i] = m->guest.val[m->guest.nr]; vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr); skip_guest: i = vmx_find_loadstore_msr_slot(&m->host, msr); if (i < 0) return; --m->host.nr; m->host.val[i] = m->host.val[m->host.nr]; vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr); } static __always_inline void add_atomic_switch_msr_special(struct vcpu_vmx *vmx, unsigned long entry, unsigned long exit, unsigned long guest_val_vmcs, unsigned long host_val_vmcs, u64 guest_val, u64 host_val) { vmcs_write64(guest_val_vmcs, guest_val); if (host_val_vmcs != HOST_IA32_EFER) vmcs_write64(host_val_vmcs, host_val); vm_entry_controls_setbit(vmx, entry); vm_exit_controls_setbit(vmx, exit); } static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, u64 guest_val, u64 host_val, bool entry_only) { int i, j = 0; struct msr_autoload *m = &vmx->msr_autoload; switch (msr) { case MSR_EFER: if (cpu_has_load_ia32_efer()) { add_atomic_switch_msr_special(vmx, VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER, GUEST_IA32_EFER, HOST_IA32_EFER, guest_val, host_val); return; } break; case MSR_CORE_PERF_GLOBAL_CTRL: if (cpu_has_load_perf_global_ctrl()) { add_atomic_switch_msr_special(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL, GUEST_IA32_PERF_GLOBAL_CTRL, HOST_IA32_PERF_GLOBAL_CTRL, guest_val, host_val); return; } break; case MSR_IA32_PEBS_ENABLE: /* PEBS needs a quiescent period after being disabled (to write * a record). Disabling PEBS through VMX MSR swapping doesn't * provide that period, so a CPU could write host's record into * guest's memory. 
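 *
 * Editor's aside (illustrative, not upstream code): note that this
 * function and clear_atomic_switch_msr() above treat the autoload lists
 * as unordered arrays, so removal is an O(1) swap-with-last:
 *
 *	--m->guest.nr;
 *	m->guest.val[i] = m->guest.val[m->guest.nr];
 *
 * Ordering does not matter because the CPU processes every entry on
 * each VM-Entry/VM-Exit regardless of its position in the list.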
*/ wrmsrq(MSR_IA32_PEBS_ENABLE, 0); } i = vmx_find_loadstore_msr_slot(&m->guest, msr); if (!entry_only) j = vmx_find_loadstore_msr_slot(&m->host, msr); if ((i < 0 && m->guest.nr == MAX_NR_LOADSTORE_MSRS) || (j < 0 && m->host.nr == MAX_NR_LOADSTORE_MSRS)) { printk_once(KERN_WARNING "Not enough msr switch entries. " "Can't add msr %x\n", msr); return; } if (i < 0) { i = m->guest.nr++; vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr); } m->guest.val[i].index = msr; m->guest.val[i].value = guest_val; if (entry_only) return; if (j < 0) { j = m->host.nr++; vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr); } m->host.val[j].index = msr; m->host.val[j].value = host_val; } static bool update_transition_efer(struct vcpu_vmx *vmx) { u64 guest_efer = vmx->vcpu.arch.efer; u64 ignore_bits = 0; int i; /* Shadow paging assumes NX to be available. */ if (!enable_ept) guest_efer |= EFER_NX; /* * LMA and LME handled by hardware; SCE meaningless outside long mode. */ ignore_bits |= EFER_SCE; #ifdef CONFIG_X86_64 ignore_bits |= EFER_LMA | EFER_LME; /* SCE is meaningful only in long mode on Intel */ if (guest_efer & EFER_LMA) ignore_bits &= ~(u64)EFER_SCE; #endif /* * On EPT, we can't emulate NX, so we must switch EFER atomically. * On CPUs that support "load IA32_EFER", always switch EFER * atomically, since it's faster than switching it manually. */ if (cpu_has_load_ia32_efer() || (enable_ept && ((vmx->vcpu.arch.efer ^ kvm_host.efer) & EFER_NX))) { if (!(guest_efer & EFER_LMA)) guest_efer &= ~EFER_LME; if (guest_efer != kvm_host.efer) add_atomic_switch_msr(vmx, MSR_EFER, guest_efer, kvm_host.efer, false); else clear_atomic_switch_msr(vmx, MSR_EFER); return false; } i = kvm_find_user_return_msr(MSR_EFER); if (i < 0) return false; clear_atomic_switch_msr(vmx, MSR_EFER); guest_efer &= ~ignore_bits; guest_efer |= kvm_host.efer & ignore_bits; vmx->guest_uret_msrs[i].data = guest_efer; vmx->guest_uret_msrs[i].mask = ~ignore_bits; return true; } #ifdef CONFIG_X86_32 /* * On 32-bit kernels, VM exits still load the FS and GS bases from the * VMCS rather than the segment table. KVM uses this helper to figure * out the current bases to poke them into the VMCS before entry. */ static unsigned long segment_base(u16 selector) { struct desc_struct *table; unsigned long v; if (!(selector & ~SEGMENT_RPL_MASK)) return 0; table = get_current_gdt_ro(); if ((selector & SEGMENT_TI_MASK) == SEGMENT_LDT) { u16 ldt_selector = kvm_read_ldt(); if (!(ldt_selector & ~SEGMENT_RPL_MASK)) return 0; table = (struct desc_struct *)segment_base(ldt_selector); } v = get_desc_base(&table[selector >> 3]); return v; } #endif static inline bool pt_can_write_msr(struct vcpu_vmx *vmx) { return vmx_pt_mode_is_host_guest() && !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN); } static inline bool pt_output_base_valid(struct kvm_vcpu *vcpu, u64 base) { /* The base must be 128-byte aligned and a legal physical address. 
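 *
 * Editor's sketch (assumes the obvious semantics of
 * kvm_vcpu_is_legal_aligned_gpa()): the check below is roughly
 *
 *	(base & 127) == 0 && base within the guest's MAXPHYADDR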
*/ return kvm_vcpu_is_legal_aligned_gpa(vcpu, base, 128); } static inline void pt_load_msr(struct pt_ctx *ctx, u32 addr_range) { u32 i; wrmsrq(MSR_IA32_RTIT_STATUS, ctx->status); wrmsrq(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base); wrmsrq(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask); wrmsrq(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match); for (i = 0; i < addr_range; i++) { wrmsrq(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]); wrmsrq(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]); } } static inline void pt_save_msr(struct pt_ctx *ctx, u32 addr_range) { u32 i; rdmsrq(MSR_IA32_RTIT_STATUS, ctx->status); rdmsrq(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base); rdmsrq(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask); rdmsrq(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match); for (i = 0; i < addr_range; i++) { rdmsrq(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]); rdmsrq(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]); } } static void pt_guest_enter(struct vcpu_vmx *vmx) { if (vmx_pt_mode_is_system()) return; /* * GUEST_IA32_RTIT_CTL is already set in the VMCS. * Save host state before VM entry. */ rdmsrq(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl); if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) { wrmsrq(MSR_IA32_RTIT_CTL, 0); pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.num_address_ranges); pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.num_address_ranges); } } static void pt_guest_exit(struct vcpu_vmx *vmx) { if (vmx_pt_mode_is_system()) return; if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) { pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.num_address_ranges); pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.num_address_ranges); } /* * KVM requires VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest, * i.e. RTIT_CTL is always cleared on VM-Exit. Restore it if necessary. */ if (vmx->pt_desc.host.ctl) wrmsrq(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl); } void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel, unsigned long fs_base, unsigned long gs_base) { if (unlikely(fs_sel != host->fs_sel)) { if (!(fs_sel & 7)) vmcs_write16(HOST_FS_SELECTOR, fs_sel); else vmcs_write16(HOST_FS_SELECTOR, 0); host->fs_sel = fs_sel; } if (unlikely(gs_sel != host->gs_sel)) { if (!(gs_sel & 7)) vmcs_write16(HOST_GS_SELECTOR, gs_sel); else vmcs_write16(HOST_GS_SELECTOR, 0); host->gs_sel = gs_sel; } if (unlikely(fs_base != host->fs_base)) { vmcs_writel(HOST_FS_BASE, fs_base); host->fs_base = fs_base; } if (unlikely(gs_base != host->gs_base)) { vmcs_writel(HOST_GS_BASE, gs_base); host->gs_base = gs_base; } } void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct vcpu_vt *vt = to_vt(vcpu); struct vmcs_host_state *host_state; #ifdef CONFIG_X86_64 int cpu = raw_smp_processor_id(); #endif unsigned long fs_base, gs_base; u16 fs_sel, gs_sel; int i; /* * Note that guest MSRs to be saved/restored can also be changed * when guest state is loaded. This happens when guest transitions * to/from long-mode by setting MSR_EFER.LMA. */ if (!vmx->guest_uret_msrs_loaded) { vmx->guest_uret_msrs_loaded = true; for (i = 0; i < kvm_nr_uret_msrs; ++i) { if (!vmx->guest_uret_msrs[i].load_into_hardware) continue; kvm_set_user_return_msr(i, vmx->guest_uret_msrs[i].data, vmx->guest_uret_msrs[i].mask); } } if (vmx->nested.need_vmcs12_to_shadow_sync) nested_sync_vmcs12_to_shadow(vcpu); if (vt->guest_state_loaded) return; host_state = &vmx->loaded_vmcs->host_state; /* * Set host fs and gs selectors. Unfortunately, 22.2.3 does not * allow segment selectors with cpl > 0 or ti == 1. 
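 *
 * Editor's note (illustrative): the low 3 bits of a selector are RPL
 * (bits 1:0) and TI (bit 2):
 *
 *	15             3   2    1  0
 *	+---------------+----+------+
 *	|     index     | TI | RPL  |
 *	+---------------+----+------+
 *
 * hence the "(sel & 7)" tests in vmx_set_host_fs_gs() above, which fall
 * back to the NULL selector (0) whenever RPL != 0 or TI != 0.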
*/ host_state->ldt_sel = kvm_read_ldt(); #ifdef CONFIG_X86_64 savesegment(ds, host_state->ds_sel); savesegment(es, host_state->es_sel); gs_base = cpu_kernelmode_gs_base(cpu); if (likely(is_64bit_mm(current->mm))) { current_save_fsgs(); fs_sel = current->thread.fsindex; gs_sel = current->thread.gsindex; fs_base = current->thread.fsbase; vt->msr_host_kernel_gs_base = current->thread.gsbase; } else { savesegment(fs, fs_sel); savesegment(gs, gs_sel); fs_base = read_msr(MSR_FS_BASE); vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE); } wrmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base); #else savesegment(fs, fs_sel); savesegment(gs, gs_sel); fs_base = segment_base(fs_sel); gs_base = segment_base(gs_sel); #endif vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base); vt->guest_state_loaded = true; } static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx) { struct vmcs_host_state *host_state; if (!vmx->vt.guest_state_loaded) return; host_state = &vmx->loaded_vmcs->host_state; ++vmx->vcpu.stat.host_state_reload; #ifdef CONFIG_X86_64 rdmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base); #endif if (host_state->ldt_sel || (host_state->gs_sel & 7)) { kvm_load_ldt(host_state->ldt_sel); #ifdef CONFIG_X86_64 load_gs_index(host_state->gs_sel); #else loadsegment(gs, host_state->gs_sel); #endif } if (host_state->fs_sel & 7) loadsegment(fs, host_state->fs_sel); #ifdef CONFIG_X86_64 if (unlikely(host_state->ds_sel | host_state->es_sel)) { loadsegment(ds, host_state->ds_sel); loadsegment(es, host_state->es_sel); } #endif invalidate_tss_limit(); #ifdef CONFIG_X86_64 wrmsrq(MSR_KERNEL_GS_BASE, vmx->vt.msr_host_kernel_gs_base); #endif load_fixmap_gdt(raw_smp_processor_id()); vmx->vt.guest_state_loaded = false; vmx->guest_uret_msrs_loaded = false; } #ifdef CONFIG_X86_64 static u64 vmx_read_guest_kernel_gs_base(struct vcpu_vmx *vmx) { preempt_disable(); if (vmx->vt.guest_state_loaded) rdmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base); preempt_enable(); return vmx->msr_guest_kernel_gs_base; } static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data) { preempt_disable(); if (vmx->vt.guest_state_loaded) wrmsrq(MSR_KERNEL_GS_BASE, data); preempt_enable(); vmx->msr_guest_kernel_gs_base = data; } #endif static void grow_ple_window(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned int old = vmx->ple_window; vmx->ple_window = __grow_ple_window(old, ple_window, ple_window_grow, ple_window_max); if (vmx->ple_window != old) { vmx->ple_window_dirty = true; trace_kvm_ple_window_update(vcpu->vcpu_id, vmx->ple_window, old); } } static void shrink_ple_window(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned int old = vmx->ple_window; vmx->ple_window = __shrink_ple_window(old, ple_window, ple_window_shrink, ple_window); if (vmx->ple_window != old) { vmx->ple_window_dirty = true; trace_kvm_ple_window_update(vcpu->vcpu_id, vmx->ple_window, old); } } void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); bool already_loaded = vmx->loaded_vmcs->cpu == cpu; struct vmcs *prev; if (!already_loaded) { loaded_vmcs_clear(vmx->loaded_vmcs); local_irq_disable(); /* * Ensure loaded_vmcs->cpu is read before adding loaded_vmcs to * this cpu's percpu list, otherwise it may not yet be deleted * from its previous cpu's percpu list. Pairs with the * smp_wmb() in __loaded_vmcs_clear().
*/ smp_rmb(); list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link, &per_cpu(loaded_vmcss_on_cpu, cpu)); local_irq_enable(); } prev = per_cpu(current_vmcs, cpu); if (prev != vmx->loaded_vmcs->vmcs) { per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs; vmcs_load(vmx->loaded_vmcs->vmcs); } if (!already_loaded) { void *gdt = get_current_gdt_ro(); /* * Flush all EPTP/VPID contexts, the new pCPU may have stale * TLB entries from its previous association with the vCPU. */ kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); /* * Linux uses per-cpu TSS and GDT, so set these when switching * processors. See 22.2.4. */ vmcs_writel(HOST_TR_BASE, (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss); vmcs_writel(HOST_GDTR_BASE, (unsigned long)gdt); /* 22.2.4 */ if (IS_ENABLED(CONFIG_IA32_EMULATION) || IS_ENABLED(CONFIG_X86_32)) { /* 22.2.3 */ vmcs_writel(HOST_IA32_SYSENTER_ESP, (unsigned long)(cpu_entry_stack(cpu) + 1)); } vmx->loaded_vmcs->cpu = cpu; } } /* * Switches to specified vcpu, until a matching vcpu_put(), but assumes * vcpu mutex is already taken. */ void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm)) shrink_ple_window(vcpu); vmx_vcpu_load_vmcs(vcpu, cpu); vmx_vcpu_pi_load(vcpu, cpu); } void vmx_vcpu_put(struct kvm_vcpu *vcpu) { vmx_vcpu_pi_put(vcpu); vmx_prepare_switch_to_host(to_vmx(vcpu)); } bool vmx_emulation_required(struct kvm_vcpu *vcpu) { return emulate_invalid_guest_state && !vmx_guest_state_valid(vcpu); } unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long rflags, save_rflags; if (!kvm_register_is_available(vcpu, VCPU_EXREG_RFLAGS)) { kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS); rflags = vmcs_readl(GUEST_RFLAGS); if (vmx->rmode.vm86_active) { rflags &= RMODE_GUEST_OWNED_EFLAGS_BITS; save_rflags = vmx->rmode.save_rflags; rflags |= save_rflags & ~RMODE_GUEST_OWNED_EFLAGS_BITS; } vmx->rflags = rflags; } return vmx->rflags; } void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) { struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long old_rflags; /* * Unlike CR0 and CR4, RFLAGS handling requires checking if the vCPU * is an unrestricted guest in order to mark L2 as needing emulation * if L1 runs L2 as a restricted guest. 
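 *
 * Editor's summary (illustrative) of the vm86 split implemented by
 * vmx_get_rflags() above and the write path below:
 *
 *	write:	rmode.save_rflags = rflags;
 *		GUEST_RFLAGS = rflags | X86_EFLAGS_IOPL | X86_EFLAGS_VM;
 *	read:	rflags = (GUEST_RFLAGS & RMODE_GUEST_OWNED_EFLAGS_BITS) |
 *			 (rmode.save_rflags & ~RMODE_GUEST_OWNED_EFLAGS_BITS);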
*/ if (is_unrestricted_guest(vcpu)) { kvm_register_mark_available(vcpu, VCPU_EXREG_RFLAGS); vmx->rflags = rflags; vmcs_writel(GUEST_RFLAGS, rflags); return; } old_rflags = vmx_get_rflags(vcpu); vmx->rflags = rflags; if (vmx->rmode.vm86_active) { vmx->rmode.save_rflags = rflags; rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM; } vmcs_writel(GUEST_RFLAGS, rflags); if ((old_rflags ^ vmx->rflags) & X86_EFLAGS_VM) vmx->vt.emulation_required = vmx_emulation_required(vcpu); } bool vmx_get_if_flag(struct kvm_vcpu *vcpu) { return vmx_get_rflags(vcpu) & X86_EFLAGS_IF; } u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu) { u32 interruptibility = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO); int ret = 0; if (interruptibility & GUEST_INTR_STATE_STI) ret |= KVM_X86_SHADOW_INT_STI; if (interruptibility & GUEST_INTR_STATE_MOV_SS) ret |= KVM_X86_SHADOW_INT_MOV_SS; return ret; } void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) { u32 interruptibility_old = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO); u32 interruptibility = interruptibility_old; interruptibility &= ~(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS); if (mask & KVM_X86_SHADOW_INT_MOV_SS) interruptibility |= GUEST_INTR_STATE_MOV_SS; else if (mask & KVM_X86_SHADOW_INT_STI) interruptibility |= GUEST_INTR_STATE_STI; if (interruptibility != interruptibility_old) vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, interruptibility); } static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data) { struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long value; /* * Any MSR write that attempts to change bits marked reserved will * cause a #GP fault. */ if (data & vmx->pt_desc.ctl_bitmask) return 1; /* * Any attempt to modify IA32_RTIT_CTL while TraceEn is set will * result in a #GP unless the same write also clears TraceEn. */ if ((vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) && (data & RTIT_CTL_TRACEEN) && data != vmx->pt_desc.guest.ctl) return 1; /* * A WRMSR to IA32_RTIT_CTL that sets TraceEn but clears ToPA * and FabricEn causes a #GP if * CPUID.(EAX=14H, ECX=0):ECX.SNGLRGNOUT[bit 2] = 0 */ if ((data & RTIT_CTL_TRACEEN) && !(data & RTIT_CTL_TOPA) && !(data & RTIT_CTL_FABRIC_EN) && !intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_single_range_output)) return 1; /* * MTCFreq, CycThresh and PSBFreq encoding checks: any MSR write that * utilizes encodings marked reserved will cause a #GP fault. */ value = intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc_periods); if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc) && !test_bit((data & RTIT_CTL_MTC_RANGE) >> RTIT_CTL_MTC_RANGE_OFFSET, &value)) return 1; value = intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_cycle_thresholds); if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_cyc) && !test_bit((data & RTIT_CTL_CYC_THRESH) >> RTIT_CTL_CYC_THRESH_OFFSET, &value)) return 1; value = intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_periods); if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_cyc) && !test_bit((data & RTIT_CTL_PSB_FREQ) >> RTIT_CTL_PSB_FREQ_OFFSET, &value)) return 1; /* * If ADDRx_CFG is reserved or the encoding is > 2, the write will * cause a #GP fault.
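 *
 * Editor's note (SDM-derived, illustrative): each ADDRn_CFG field is 4
 * bits wide; 0 = range unused, 1 = FilterEn range, 2 = TraceStop range,
 * and anything greater is reserved, hence the "> 2" tests below (with n
 * the zero-based range index and RTIT_CTL_ADDRn_OFFSET a placeholder for
 * the per-range offset macros):
 *
 *	cfg = (data >> RTIT_CTL_ADDRn_OFFSET) & 0xf;
 *	ok  = cfg <= 2 && (cfg == 0 || n < num_address_ranges);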
*/ value = (data & RTIT_CTL_ADDR0) >> RTIT_CTL_ADDR0_OFFSET; if ((value && (vmx->pt_desc.num_address_ranges < 1)) || (value > 2)) return 1; value = (data & RTIT_CTL_ADDR1) >> RTIT_CTL_ADDR1_OFFSET; if ((value && (vmx->pt_desc.num_address_ranges < 2)) || (value > 2)) return 1; value = (data & RTIT_CTL_ADDR2) >> RTIT_CTL_ADDR2_OFFSET; if ((value && (vmx->pt_desc.num_address_ranges < 3)) || (value > 2)) return 1; value = (data & RTIT_CTL_ADDR3) >> RTIT_CTL_ADDR3_OFFSET; if ((value && (vmx->pt_desc.num_address_ranges < 4)) || (value > 2)) return 1; return 0; } int vmx_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type, void *insn, int insn_len) { /* * Emulation of instructions in SGX enclaves is impossible as RIP does * not point at the failing instruction, and even if it did, the code * stream is inaccessible. Inject #UD instead of exiting to userspace * so that guest userspace can't DoS the guest simply by triggering * emulation (enclaves are CPL3 only). */ if (vmx_get_exit_reason(vcpu).enclave_mode) { kvm_queue_exception(vcpu, UD_VECTOR); return X86EMUL_PROPAGATE_FAULT; } /* Check that emulation is possible during event vectoring */ if ((to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) && !kvm_can_emulate_event_vectoring(emul_type)) return X86EMUL_UNHANDLEABLE_VECTORING; return X86EMUL_CONTINUE; } static int skip_emulated_instruction(struct kvm_vcpu *vcpu) { union vmx_exit_reason exit_reason = vmx_get_exit_reason(vcpu); unsigned long rip, orig_rip; u32 instr_len; /* * Using VMCS.VM_EXIT_INSTRUCTION_LEN on EPT misconfig depends on * undefined behavior: Intel's SDM doesn't mandate the VMCS field be * set when EPT misconfig occurs. In practice, real hardware updates * VM_EXIT_INSTRUCTION_LEN on EPT misconfig, but other hypervisors * (namely Hyper-V) don't set it due to it being undefined behavior, * i.e. we end up advancing IP with some random value. */ if (!static_cpu_has(X86_FEATURE_HYPERVISOR) || exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) { instr_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); /* * Emulating an enclave's instructions isn't supported as KVM * cannot access the enclave's memory or its true RIP, e.g. the * vmcs.GUEST_RIP points at the exit point of the enclave, not * the RIP that actually triggered the VM-Exit. But, because * most instructions that cause VM-Exit will #UD in an enclave, * most instruction-based VM-Exits simply do not occur. * * There are a few exceptions, notably the debug instructions * INT1ICEBRK and INT3, as they are allowed in debug enclaves * and generate #DB/#BP as expected, which KVM might intercept. * But again, the CPU does the dirty work and saves an instr * length of zero so VMMs don't shoot themselves in the foot. * WARN if KVM tries to skip a non-zero length instruction on * a VM-Exit from an enclave. */ if (!instr_len) goto rip_updated; WARN_ONCE(exit_reason.enclave_mode, "skipping instruction after SGX enclave VM-Exit"); orig_rip = kvm_rip_read(vcpu); rip = orig_rip + instr_len; #ifdef CONFIG_X86_64 /* * We need to mask out the high 32 bits of RIP if not in 64-bit * mode, but just finding out that we are in 64-bit mode is * quite expensive. Only do it if there was a carry. 
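 *
 * Editor's worked example: "((rip ^ orig_rip) >> 31) == 3" is true only
 * when bits 32:31 both changed and no higher bit did, i.e. when the
 * increment carried out of bit 31:
 *
 *	orig_rip       = 0x00000000ffffffff
 *	rip            = 0x0000000100000001   (instr_len == 2)
 *	rip ^ orig_rip = 0x00000001fffffffe   ->  >> 31 == 3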
*/ if (unlikely(((rip ^ orig_rip) >> 31) == 3) && !is_64_bit_mode(vcpu)) rip = (u32)rip; #endif kvm_rip_write(vcpu, rip); } else { if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP)) return 0; } rip_updated: /* skipping an emulated instruction also counts */ vmx_set_interrupt_shadow(vcpu, 0); return 1; } /* * Recognizes a pending MTF VM-exit and records the nested state for later * delivery. */ void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); struct vcpu_vmx *vmx = to_vmx(vcpu); if (!is_guest_mode(vcpu)) return; /* * Per the SDM, MTF takes priority over debug-trap exceptions besides * TSS T-bit traps and ICEBP (INT1). KVM doesn't emulate T-bit traps * or ICEBP (in the emulator proper), and skipping of ICEBP after an * intercepted #DB deliberately avoids single-step #DB and MTF updates * as ICEBP is higher priority than both. As instruction emulation is * completed at this point (i.e. KVM is at the instruction boundary), * any #DB exception pending delivery must be a debug-trap of lower * priority than MTF. Record the pending MTF state to be delivered in * vmx_check_nested_events(). */ if (nested_cpu_has_mtf(vmcs12) && (!vcpu->arch.exception.pending || vcpu->arch.exception.vector == DB_VECTOR) && (!vcpu->arch.exception_vmexit.pending || vcpu->arch.exception_vmexit.vector == DB_VECTOR)) { vmx->nested.mtf_pending = true; kvm_make_request(KVM_REQ_EVENT, vcpu); } else { vmx->nested.mtf_pending = false; } } int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu) { vmx_update_emulated_instruction(vcpu); return skip_emulated_instruction(vcpu); } static void vmx_clear_hlt(struct kvm_vcpu *vcpu) { /* * Ensure that we clear the HLT state in the VMCS. We don't need to * explicitly skip the instruction because if the HLT state is set, * then the instruction is already executing and RIP has already been * advanced. */ if (kvm_hlt_in_guest(vcpu->kvm) && vmcs_read32(GUEST_ACTIVITY_STATE) == GUEST_ACTIVITY_HLT) vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); } void vmx_inject_exception(struct kvm_vcpu *vcpu) { struct kvm_queued_exception *ex = &vcpu->arch.exception; u32 intr_info = ex->vector | INTR_INFO_VALID_MASK; struct vcpu_vmx *vmx = to_vmx(vcpu); kvm_deliver_exception_payload(vcpu, ex); if (ex->has_error_code) { /* * Despite the error code being architecturally defined as 32 * bits, and the VMCS field being 32 bits, Intel CPUs and thus * VMX don't actually support setting bits 31:16. Hardware * will (should) never provide a bogus error code, but AMD CPUs * do generate error codes with bits 31:16 set, and so KVM's * ABI lets userspace shove in arbitrary 32-bit values. Drop * the upper bits to avoid VM-Fail; losing information that * doesn't really exist is preferable to killing the VM.
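 *
 * Editor's example: the "(u16)ex->error_code" cast below performs the
 * truncation described above, e.g. an AMD-style error code of
 * 0x8000000b is written to the VMCS as 0x000b.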
*/ vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)ex->error_code); intr_info |= INTR_INFO_DELIVER_CODE_MASK; } if (vmx->rmode.vm86_active) { int inc_eip = 0; if (kvm_exception_is_soft(ex->vector)) inc_eip = vcpu->arch.event_exit_inst_len; kvm_inject_realmode_interrupt(vcpu, ex->vector, inc_eip); return; } WARN_ON_ONCE(vmx->vt.emulation_required); if (kvm_exception_is_soft(ex->vector)) { vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, vmx->vcpu.arch.event_exit_inst_len); intr_info |= INTR_TYPE_SOFT_EXCEPTION; } else intr_info |= INTR_TYPE_HARD_EXCEPTION; vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info); vmx_clear_hlt(vcpu); } static void vmx_setup_uret_msr(struct vcpu_vmx *vmx, unsigned int msr, bool load_into_hardware) { struct vmx_uret_msr *uret_msr; uret_msr = vmx_find_uret_msr(vmx, msr); if (!uret_msr) return; uret_msr->load_into_hardware = load_into_hardware; } /* * Configuring user return MSRs to automatically save, load, and restore MSRs * that need to be shoved into hardware when running the guest. Note, omitting * an MSR here does _NOT_ mean it's not emulated, only that it will not be * loaded into hardware when running the guest. */ static void vmx_setup_uret_msrs(struct vcpu_vmx *vmx) { #ifdef CONFIG_X86_64 bool load_syscall_msrs; /* * The SYSCALL MSRs are only needed on long mode guests, and only * when EFER.SCE is set. */ load_syscall_msrs = is_long_mode(&vmx->vcpu) && (vmx->vcpu.arch.efer & EFER_SCE); vmx_setup_uret_msr(vmx, MSR_STAR, load_syscall_msrs); vmx_setup_uret_msr(vmx, MSR_LSTAR, load_syscall_msrs); vmx_setup_uret_msr(vmx, MSR_SYSCALL_MASK, load_syscall_msrs); #endif vmx_setup_uret_msr(vmx, MSR_EFER, update_transition_efer(vmx)); vmx_setup_uret_msr(vmx, MSR_TSC_AUX, guest_cpu_cap_has(&vmx->vcpu, X86_FEATURE_RDTSCP) || guest_cpu_cap_has(&vmx->vcpu, X86_FEATURE_RDPID)); /* * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations of new * kernel and old userspace. If those guests run on a tsx=off host, do * allow guests to use TSX_CTRL, but don't change the value in hardware * so that TSX remains always disabled. */ vmx_setup_uret_msr(vmx, MSR_IA32_TSX_CTRL, boot_cpu_has(X86_FEATURE_RTM)); /* * The set of MSRs to load may have changed, reload MSRs before the * next VM-Enter. */ vmx->guest_uret_msrs_loaded = false; } u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING)) return vmcs12->tsc_offset; return 0; } u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING) && nested_cpu_has2(vmcs12, SECONDARY_EXEC_TSC_SCALING)) return vmcs12->tsc_multiplier; return kvm_caps.default_tsc_scaling_ratio; } void vmx_write_tsc_offset(struct kvm_vcpu *vcpu) { vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset); } void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu) { vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio); } /* * Userspace is allowed to set any supported IA32_FEATURE_CONTROL regardless of * guest CPUID. Note, KVM allows userspace to set "VMX in SMX" to maintain * backwards compatibility even though KVM doesn't support emulating SMX. And * because userspace set "VMX in SMX", the guest must also be allowed to set it, * e.g. if the MSR is left unlocked and the guest does a RMW operation. 
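 *
 * Editor's example (hypothetical guest code): with the MSR left
 * unlocked, a guest may legitimately do
 *
 *	rdmsr(MSR_IA32_FEAT_CTL, lo, hi);
 *	wrmsr(MSR_IA32_FEAT_CTL, lo | FEAT_CTL_LOCKED, hi);
 *
 * writing back whatever bits userspace seeded, including "VMX in SMX".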
*/ #define KVM_SUPPORTED_FEATURE_CONTROL (FEAT_CTL_LOCKED | \ FEAT_CTL_VMX_ENABLED_INSIDE_SMX | \ FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX | \ FEAT_CTL_SGX_LC_ENABLED | \ FEAT_CTL_SGX_ENABLED | \ FEAT_CTL_LMCE_ENABLED) static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx, struct msr_data *msr) { uint64_t valid_bits; /* * Ensure KVM_SUPPORTED_FEATURE_CONTROL is updated when new bits are * exposed to the guest. */ WARN_ON_ONCE(vmx->msr_ia32_feature_control_valid_bits & ~KVM_SUPPORTED_FEATURE_CONTROL); if (!msr->host_initiated && (vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED)) return false; if (msr->host_initiated) valid_bits = KVM_SUPPORTED_FEATURE_CONTROL; else valid_bits = vmx->msr_ia32_feature_control_valid_bits; return !(msr->data & ~valid_bits); } int vmx_get_feature_msr(u32 msr, u64 *data) { switch (msr) { case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR: if (!nested) return 1; return vmx_get_vmx_msr(&vmcs_config.nested, msr, data); default: return KVM_MSR_RET_UNSUPPORTED; } } /* * Reads an msr value (of 'msr_info->index') into 'msr_info->data'. * Returns 0 on success, non-0 otherwise. * Assumes vcpu_load() was already called. */ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmx_uret_msr *msr; u32 index; switch (msr_info->index) { #ifdef CONFIG_X86_64 case MSR_FS_BASE: msr_info->data = vmcs_readl(GUEST_FS_BASE); break; case MSR_GS_BASE: msr_info->data = vmcs_readl(GUEST_GS_BASE); break; case MSR_KERNEL_GS_BASE: msr_info->data = vmx_read_guest_kernel_gs_base(vmx); break; #endif case MSR_EFER: return kvm_get_msr_common(vcpu, msr_info); case MSR_IA32_TSX_CTRL: if (!msr_info->host_initiated && !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR)) return 1; goto find_uret_msr; case MSR_IA32_UMWAIT_CONTROL: if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx)) return 1; msr_info->data = vmx->msr_ia32_umwait_control; break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && !guest_has_spec_ctrl_msr(vcpu)) return 1; msr_info->data = to_vmx(vcpu)->spec_ctrl; break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); break; case MSR_IA32_SYSENTER_EIP: msr_info->data = vmcs_readl(GUEST_SYSENTER_EIP); break; case MSR_IA32_SYSENTER_ESP: msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP); break; case MSR_IA32_BNDCFGS: if (!kvm_mpx_supported() || (!msr_info->host_initiated && !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX))) return 1; msr_info->data = vmcs_read64(GUEST_BNDCFGS); break; case MSR_IA32_MCG_EXT_CTL: if (!msr_info->host_initiated && !(vmx->msr_ia32_feature_control & FEAT_CTL_LMCE_ENABLED)) return 1; msr_info->data = vcpu->arch.mcg_ext_ctl; break; case MSR_IA32_FEAT_CTL: msr_info->data = vmx->msr_ia32_feature_control; break; case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3: if (!msr_info->host_initiated && !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC)) return 1; msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash [msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0]; break; case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR: if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX)) return 1; if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index, &msr_info->data)) return 1; #ifdef CONFIG_KVM_HYPERV /* * Enlightened VMCS v1 doesn't have certain VMCS fields but * instead of just ignoring the features, different Hyper-V * versions are either trying to use them and fail or do some * sanity checking and refuse to boot. Filter all unsupported * features out. 
*/ if (!msr_info->host_initiated && guest_cpu_cap_has_evmcs(vcpu)) nested_evmcs_filter_control_msr(vcpu, msr_info->index, &msr_info->data); #endif break; case MSR_IA32_RTIT_CTL: if (!vmx_pt_mode_is_host_guest()) return 1; msr_info->data = vmx->pt_desc.guest.ctl; break; case MSR_IA32_RTIT_STATUS: if (!vmx_pt_mode_is_host_guest()) return 1; msr_info->data = vmx->pt_desc.guest.status; break; case MSR_IA32_RTIT_CR3_MATCH: if (!vmx_pt_mode_is_host_guest() || !intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_cr3_filtering)) return 1; msr_info->data = vmx->pt_desc.guest.cr3_match; break; case MSR_IA32_RTIT_OUTPUT_BASE: if (!vmx_pt_mode_is_host_guest() || (!intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_topa_output) && !intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_single_range_output))) return 1; msr_info->data = vmx->pt_desc.guest.output_base; break; case MSR_IA32_RTIT_OUTPUT_MASK: if (!vmx_pt_mode_is_host_guest() || (!intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_topa_output) && !intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_single_range_output))) return 1; msr_info->data = vmx->pt_desc.guest.output_mask; break; case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B: index = msr_info->index - MSR_IA32_RTIT_ADDR0_A; if (!vmx_pt_mode_is_host_guest() || (index >= 2 * vmx->pt_desc.num_address_ranges)) return 1; if (index % 2) msr_info->data = vmx->pt_desc.guest.addr_b[index / 2]; else msr_info->data = vmx->pt_desc.guest.addr_a[index / 2]; break; case MSR_IA32_DEBUGCTLMSR: msr_info->data = vmx_guest_debugctl_read(); break; default: find_uret_msr: msr = vmx_find_uret_msr(vmx, msr_info->index); if (msr) { msr_info->data = msr->data; break; } return kvm_get_msr_common(vcpu, msr_info); } return 0; } static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu, u64 data) { #ifdef CONFIG_X86_64 if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM)) return (u32)data; #endif return (unsigned long)data; } u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) { u64 debugctl = 0; if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) && (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))) debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT; if ((kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT) && (host_initiated || intel_pmu_lbr_is_enabled(vcpu))) debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI; if (boot_cpu_has(X86_FEATURE_RTM) && (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM))) debugctl |= DEBUGCTLMSR_RTM_DEBUG; return debugctl; } bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) { u64 invalid; invalid = data & ~vmx_get_supported_debugctl(vcpu, host_initiated); if (invalid & (DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR)) { kvm_pr_unimpl_wrmsr(vcpu, MSR_IA32_DEBUGCTLMSR, data); invalid &= ~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR); } return !invalid; } /* * Writes msr value into the appropriate "register". * Returns 0 on success, non-0 otherwise. * Assumes vcpu_load() was already called. 
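 *
 * Editor's note (illustrative call pattern only):
 *
 *	vcpu_load(vcpu);
 *	ret = vmx_set_msr(vcpu, &msr_info);
 *	vcpu_put(vcpu);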
*/ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmx_uret_msr *msr; int ret = 0; u32 msr_index = msr_info->index; u64 data = msr_info->data; u32 index; switch (msr_index) { case MSR_EFER: ret = kvm_set_msr_common(vcpu, msr_info); break; #ifdef CONFIG_X86_64 case MSR_FS_BASE: vmx_segment_cache_clear(vmx); vmcs_writel(GUEST_FS_BASE, data); break; case MSR_GS_BASE: vmx_segment_cache_clear(vmx); vmcs_writel(GUEST_GS_BASE, data); break; case MSR_KERNEL_GS_BASE: vmx_write_guest_kernel_gs_base(vmx, data); break; case MSR_IA32_XFD: ret = kvm_set_msr_common(vcpu, msr_info); /* * Always intercepting WRMSR could incur non-negligible * overhead given xfd might be changed frequently in * guest context switch. Disable write interception * upon the first write with a non-zero value (indicating * potential usage on dynamic xfeatures). Also update * exception bitmap to trap #NM for proper virtualization * of guest xfd_err. */ if (!ret && data) { vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW); vcpu->arch.xfd_no_write_intercept = true; vmx_update_exception_bitmap(vcpu); } break; #endif case MSR_IA32_SYSENTER_CS: if (is_guest_mode(vcpu)) get_vmcs12(vcpu)->guest_sysenter_cs = data; vmcs_write32(GUEST_SYSENTER_CS, data); break; case MSR_IA32_SYSENTER_EIP: if (is_guest_mode(vcpu)) { data = nested_vmx_truncate_sysenter_addr(vcpu, data); get_vmcs12(vcpu)->guest_sysenter_eip = data; } vmcs_writel(GUEST_SYSENTER_EIP, data); break; case MSR_IA32_SYSENTER_ESP: if (is_guest_mode(vcpu)) { data = nested_vmx_truncate_sysenter_addr(vcpu, data); get_vmcs12(vcpu)->guest_sysenter_esp = data; } vmcs_writel(GUEST_SYSENTER_ESP, data); break; case MSR_IA32_DEBUGCTLMSR: if (!vmx_is_valid_debugctl(vcpu, data, msr_info->host_initiated)) return 1; data &= vmx_get_supported_debugctl(vcpu, msr_info->host_initiated); if (is_guest_mode(vcpu) && get_vmcs12(vcpu)->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS) get_vmcs12(vcpu)->guest_ia32_debugctl = data; vmx_guest_debugctl_write(vcpu, data); if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event && (data & DEBUGCTLMSR_LBR)) intel_pmu_create_guest_lbr_event(vcpu); return 0; case MSR_IA32_BNDCFGS: if (!kvm_mpx_supported() || (!msr_info->host_initiated && !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX))) return 1; if (is_noncanonical_msr_address(data & PAGE_MASK, vcpu) || (data & MSR_IA32_BNDCFGS_RSVD)) return 1; if (is_guest_mode(vcpu) && ((vmx->nested.msrs.entry_ctls_high & VM_ENTRY_LOAD_BNDCFGS) || (vmx->nested.msrs.exit_ctls_high & VM_EXIT_CLEAR_BNDCFGS))) get_vmcs12(vcpu)->guest_bndcfgs = data; vmcs_write64(GUEST_BNDCFGS, data); break; case MSR_IA32_UMWAIT_CONTROL: if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx)) return 1; /* The reserved bit 1 and non-32 bit [63:32] should be zero */ if (data & (BIT_ULL(1) | GENMASK_ULL(63, 32))) return 1; vmx->msr_ia32_umwait_control = data; break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && !guest_has_spec_ctrl_msr(vcpu)) return 1; if (kvm_spec_ctrl_test_value(data)) return 1; vmx->spec_ctrl = data; if (!data) break; /* * For non-nested: * When it's written (to non-zero) for the first time, pass * it through. * * For nested: * The handling of the MSR bitmap for L2 guests is done in * nested_vmx_prepare_msr_bitmap. We should not touch the * vmcs02.msr_bitmap here since it gets completely overwritten * in the merging. We update the vmcs01 here for L1 as well * since it will end up touching the MSR anyway now. 
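 *
 * Editor's summary of the resulting policy (illustrative):
 *
 *	WRMSR(SPEC_CTRL, 0)        ->  stays intercepted, no bitmap change
 *	first WRMSR(SPEC_CTRL, !0) ->  interception disabled (RW); the MSR
 *				       is then read back after VM-Exit via
 *				       VMX_RUN_SAVE_SPEC_CTRL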
*/ vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW); break; case MSR_IA32_TSX_CTRL: if (!msr_info->host_initiated && !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR)) return 1; if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR)) return 1; goto find_uret_msr; case MSR_IA32_CR_PAT: ret = kvm_set_msr_common(vcpu, msr_info); if (ret) break; if (is_guest_mode(vcpu) && get_vmcs12(vcpu)->vm_exit_controls & VM_EXIT_SAVE_IA32_PAT) get_vmcs12(vcpu)->guest_ia32_pat = data; if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) vmcs_write64(GUEST_IA32_PAT, data); break; case MSR_IA32_MCG_EXT_CTL: if ((!msr_info->host_initiated && !(to_vmx(vcpu)->msr_ia32_feature_control & FEAT_CTL_LMCE_ENABLED)) || (data & ~MCG_EXT_CTL_LMCE_EN)) return 1; vcpu->arch.mcg_ext_ctl = data; break; case MSR_IA32_FEAT_CTL: if (!is_vmx_feature_control_msr_valid(vmx, msr_info)) return 1; vmx->msr_ia32_feature_control = data; if (msr_info->host_initiated && data == 0) vmx_leave_nested(vcpu); /* SGX may be enabled/disabled by guest's firmware */ vmx_write_encls_bitmap(vcpu, NULL); break; case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3: /* * On real hardware, the LE hash MSRs are writable before * the firmware sets bit 0 in MSR 0x7a ("activating" SGX), * at which point SGX related bits in IA32_FEATURE_CONTROL * become writable. * * KVM does not emulate SGX activation for simplicity, so * allow writes to the LE hash MSRs if IA32_FEATURE_CONTROL * is unlocked. This is technically not architectural * behavior, but it's close enough. */ if (!msr_info->host_initiated && (!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC) || ((vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED) && !(vmx->msr_ia32_feature_control & FEAT_CTL_SGX_LC_ENABLED)))) return 1; vmx->msr_ia32_sgxlepubkeyhash [msr_index - MSR_IA32_SGXLEPUBKEYHASH0] = data; break; case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR: if (!msr_info->host_initiated) return 1; /* they are read-only */ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX)) return 1; return vmx_set_vmx_msr(vcpu, msr_index, data); case MSR_IA32_RTIT_CTL: if (!vmx_pt_mode_is_host_guest() || vmx_rtit_ctl_check(vcpu, data) || vmx->nested.vmxon) return 1; vmcs_write64(GUEST_IA32_RTIT_CTL, data); vmx->pt_desc.guest.ctl = data; pt_update_intercept_for_msr(vcpu); break; case MSR_IA32_RTIT_STATUS: if (!pt_can_write_msr(vmx)) return 1; if (data & MSR_IA32_RTIT_STATUS_MASK) return 1; vmx->pt_desc.guest.status = data; break; case MSR_IA32_RTIT_CR3_MATCH: if (!pt_can_write_msr(vmx)) return 1; if (!intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_cr3_filtering)) return 1; vmx->pt_desc.guest.cr3_match = data; break; case MSR_IA32_RTIT_OUTPUT_BASE: if (!pt_can_write_msr(vmx)) return 1; if (!intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_topa_output) && !intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_single_range_output)) return 1; if (!pt_output_base_valid(vcpu, data)) return 1; vmx->pt_desc.guest.output_base = data; break; case MSR_IA32_RTIT_OUTPUT_MASK: if (!pt_can_write_msr(vmx)) return 1; if (!intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_topa_output) && !intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_single_range_output)) return 1; vmx->pt_desc.guest.output_mask = data; break; case MSR_IA32_RTIT_ADDR0_A ... 
MSR_IA32_RTIT_ADDR3_B: if (!pt_can_write_msr(vmx)) return 1; index = msr_info->index - MSR_IA32_RTIT_ADDR0_A; if (index >= 2 * vmx->pt_desc.num_address_ranges) return 1; if (is_noncanonical_msr_address(data, vcpu)) return 1; if (index % 2) vmx->pt_desc.guest.addr_b[index / 2] = data; else vmx->pt_desc.guest.addr_a[index / 2] = data; break; case MSR_IA32_PERF_CAPABILITIES: if (data & PMU_CAP_LBR_FMT) { if ((data & PMU_CAP_LBR_FMT) != (kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT)) return 1; if (!cpuid_model_is_consistent(vcpu)) return 1; } if (data & PERF_CAP_PEBS_FORMAT) { if ((data & PERF_CAP_PEBS_MASK) != (kvm_caps.supported_perf_cap & PERF_CAP_PEBS_MASK)) return 1; if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DS)) return 1; if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DTES64)) return 1; if (!cpuid_model_is_consistent(vcpu)) return 1; } ret = kvm_set_msr_common(vcpu, msr_info); break; default: find_uret_msr: msr = vmx_find_uret_msr(vmx, msr_index); if (msr) ret = vmx_set_guest_uret_msr(vmx, msr, data); else ret = kvm_set_msr_common(vcpu, msr_info); } /* FB_CLEAR may have changed, also update the FB_CLEAR_DIS behavior */ if (msr_index == MSR_IA32_ARCH_CAPABILITIES) vmx_update_fb_clear_dis(vcpu, vmx); return ret; } void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) { unsigned long guest_owned_bits; kvm_register_mark_available(vcpu, reg); switch (reg) { case VCPU_REGS_RSP: vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP); break; case VCPU_REGS_RIP: vcpu->arch.regs[VCPU_REGS_RIP] = vmcs_readl(GUEST_RIP); break; case VCPU_EXREG_PDPTR: if (enable_ept) ept_save_pdptrs(vcpu); break; case VCPU_EXREG_CR0: guest_owned_bits = vcpu->arch.cr0_guest_owned_bits; vcpu->arch.cr0 &= ~guest_owned_bits; vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & guest_owned_bits; break; case VCPU_EXREG_CR3: /* * When intercepting CR3 loads, e.g. for shadowing paging, KVM's * CR3 is loaded into hardware, not the guest's CR3. */ if (!(exec_controls_get(to_vmx(vcpu)) & CPU_BASED_CR3_LOAD_EXITING)) vcpu->arch.cr3 = vmcs_readl(GUEST_CR3); break; case VCPU_EXREG_CR4: guest_owned_bits = vcpu->arch.cr4_guest_owned_bits; vcpu->arch.cr4 &= ~guest_owned_bits; vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits; break; default: KVM_BUG_ON(1, vcpu->kvm); break; } } /* * There is no X86_FEATURE for SGX yet, but anyway we need to query CPUID * directly instead of going through cpu_has(), to ensure KVM is trapping * ENCLS whenever it's supported in hardware. It does not matter whether * the host OS supports or has enabled SGX. */ static bool cpu_has_sgx(void) { return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0)); } static int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt, u32 msr, u32 *result) { u32 vmx_msr_low, vmx_msr_high; u32 ctl = ctl_min | ctl_opt; rdmsr(msr, vmx_msr_low, vmx_msr_high); ctl &= vmx_msr_high; /* bit == 0 in high word ==> must be zero */ ctl |= vmx_msr_low; /* bit == 1 in low word ==> must be one */ /* Ensure minimum (required) set of control bits are supported. 
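 *
 * Editor's worked example (illustrative numbers): with
 *
 *	ctl_min = 0x002, ctl_opt = 0x100
 *	vmx_msr_low = 0x016 (must-be-one), vmx_msr_high = 0x1fe (may-be-one)
 *
 * ctl = ((0x102 & 0x1fe) | 0x016) = 0x116, and the check below passes
 * because 0x002 & ~0x116 == 0.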
*/ if (ctl_min & ~ctl) return -EIO; *result = ctl; return 0; } static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr) { u64 allowed; rdmsrq(msr, allowed); return ctl_opt & allowed; } #define vmx_check_entry_exit_pairs(pairs, entry_controls, exit_controls) \ ({ \ int i, r = 0; \ \ BUILD_BUG_ON(sizeof(pairs[0].entry_control) != sizeof(entry_controls)); \ BUILD_BUG_ON(sizeof(pairs[0].exit_control) != sizeof(exit_controls)); \ \ for (i = 0; i < ARRAY_SIZE(pairs); i++) { \ typeof(entry_controls) n_ctrl = pairs[i].entry_control; \ typeof(exit_controls) x_ctrl = pairs[i].exit_control; \ \ if (!(entry_controls & n_ctrl) == !(exit_controls & x_ctrl)) \ continue; \ \ pr_warn_once("Inconsistent VM-Entry/VM-Exit pair, " \ "entry = %llx (%llx), exit = %llx (%llx)\n", \ (u64)(entry_controls & n_ctrl), (u64)n_ctrl, \ (u64)(exit_controls & x_ctrl), (u64)x_ctrl); \ \ if (error_on_inconsistent_vmcs_config) \ r = -EIO; \ \ entry_controls &= ~n_ctrl; \ exit_controls &= ~x_ctrl; \ } \ r; \ }) static int setup_vmcs_config(struct vmcs_config *vmcs_conf, struct vmx_capability *vmx_cap) { u32 _pin_based_exec_control = 0; u32 _cpu_based_exec_control = 0; u32 _cpu_based_2nd_exec_control = 0; u64 _cpu_based_3rd_exec_control = 0; u32 _vmexit_control = 0; u32 _vmentry_control = 0; u64 basic_msr; u64 misc_msr; /* * LOAD/SAVE_DEBUG_CONTROLS are absent because both are mandatory. * SAVE_IA32_PAT and SAVE_IA32_EFER are absent because KVM always * intercepts writes to PAT and EFER, i.e. never enables those controls. */ struct { u32 entry_control; u32 exit_control; } const vmcs_entry_exit_pairs[] = { { VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL }, { VM_ENTRY_LOAD_IA32_PAT, VM_EXIT_LOAD_IA32_PAT }, { VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER }, { VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS }, { VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL }, }; memset(vmcs_conf, 0, sizeof(*vmcs_conf)); if (adjust_vmx_controls(KVM_REQUIRED_VMX_CPU_BASED_VM_EXEC_CONTROL, KVM_OPTIONAL_VMX_CPU_BASED_VM_EXEC_CONTROL, MSR_IA32_VMX_PROCBASED_CTLS, &_cpu_based_exec_control)) return -EIO; if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) { if (adjust_vmx_controls(KVM_REQUIRED_VMX_SECONDARY_VM_EXEC_CONTROL, KVM_OPTIONAL_VMX_SECONDARY_VM_EXEC_CONTROL, MSR_IA32_VMX_PROCBASED_CTLS2, &_cpu_based_2nd_exec_control)) return -EIO; } if (!IS_ENABLED(CONFIG_KVM_INTEL_PROVE_VE)) _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_EPT_VIOLATION_VE; #ifndef CONFIG_X86_64 if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) _cpu_based_exec_control &= ~CPU_BASED_TPR_SHADOW; #endif if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW)) _cpu_based_2nd_exec_control &= ~( SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); rdmsr_safe(MSR_IA32_VMX_EPT_VPID_CAP, &vmx_cap->ept, &vmx_cap->vpid); if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) && vmx_cap->ept) { pr_warn_once("EPT CAP should not exist if not support " "1-setting enable EPT VM-execution control\n"); if (error_on_inconsistent_vmcs_config) return -EIO; vmx_cap->ept = 0; _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_EPT_VIOLATION_VE; } if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_VPID) && vmx_cap->vpid) { pr_warn_once("VPID CAP should not exist if not support " "1-setting enable VPID VM-execution control\n"); if (error_on_inconsistent_vmcs_config) return -EIO; vmx_cap->vpid = 0; } if (!cpu_has_sgx()) _cpu_based_2nd_exec_control &= 
~SECONDARY_EXEC_ENCLS_EXITING; if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) _cpu_based_3rd_exec_control = adjust_vmx_controls64(KVM_OPTIONAL_VMX_TERTIARY_VM_EXEC_CONTROL, MSR_IA32_VMX_PROCBASED_CTLS3); if (adjust_vmx_controls(KVM_REQUIRED_VMX_VM_EXIT_CONTROLS, KVM_OPTIONAL_VMX_VM_EXIT_CONTROLS, MSR_IA32_VMX_EXIT_CTLS, &_vmexit_control)) return -EIO; if (adjust_vmx_controls(KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL, KVM_OPTIONAL_VMX_PIN_BASED_VM_EXEC_CONTROL, MSR_IA32_VMX_PINBASED_CTLS, &_pin_based_exec_control)) return -EIO; if (cpu_has_broken_vmx_preemption_timer()) _pin_based_exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER; if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)) _pin_based_exec_control &= ~PIN_BASED_POSTED_INTR; if (adjust_vmx_controls(KVM_REQUIRED_VMX_VM_ENTRY_CONTROLS, KVM_OPTIONAL_VMX_VM_ENTRY_CONTROLS, MSR_IA32_VMX_ENTRY_CTLS, &_vmentry_control)) return -EIO; if (vmx_check_entry_exit_pairs(vmcs_entry_exit_pairs, _vmentry_control, _vmexit_control)) return -EIO; /* * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they * can't be used due to an errata where VM Exit may incorrectly clear * IA32_PERF_GLOBAL_CTRL[34:32]. Workaround the errata by using the * MSR load mechanism to switch IA32_PERF_GLOBAL_CTRL. */ switch (boot_cpu_data.x86_vfm) { case INTEL_NEHALEM_EP: /* AAK155 */ case INTEL_NEHALEM: /* AAP115 */ case INTEL_WESTMERE: /* AAT100 */ case INTEL_WESTMERE_EP: /* BC86,AAY89,BD102 */ case INTEL_NEHALEM_EX: /* BA97 */ _vmentry_control &= ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; _vmexit_control &= ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL; pr_warn_once("VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL " "does not work properly. Using workaround\n"); break; default: break; } rdmsrq(MSR_IA32_VMX_BASIC, basic_msr); /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */ if (vmx_basic_vmcs_size(basic_msr) > PAGE_SIZE) return -EIO; #ifdef CONFIG_X86_64 /* * KVM expects to be able to shove all legal physical addresses into * VMCS fields for 64-bit kernels, and per the SDM, "This bit is always * 0 for processors that support Intel 64 architecture". */ if (basic_msr & VMX_BASIC_32BIT_PHYS_ADDR_ONLY) return -EIO; #endif /* Require Write-Back (WB) memory type for VMCS accesses. 
*/ if (vmx_basic_vmcs_mem_type(basic_msr) != X86_MEMTYPE_WB) return -EIO; rdmsrq(MSR_IA32_VMX_MISC, misc_msr); vmcs_conf->basic = basic_msr; vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control; vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control; vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control; vmcs_conf->cpu_based_3rd_exec_ctrl = _cpu_based_3rd_exec_control; vmcs_conf->vmexit_ctrl = _vmexit_control; vmcs_conf->vmentry_ctrl = _vmentry_control; vmcs_conf->misc = misc_msr; #if IS_ENABLED(CONFIG_HYPERV) if (enlightened_vmcs) evmcs_sanitize_exec_ctrls(vmcs_conf); #endif return 0; } static bool __kvm_is_vmx_supported(void) { int cpu = smp_processor_id(); if (!(cpuid_ecx(1) & feature_bit(VMX))) { pr_err("VMX not supported by CPU %d\n", cpu); return false; } if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || !this_cpu_has(X86_FEATURE_VMX)) { pr_err("VMX not enabled (by BIOS) in MSR_IA32_FEAT_CTL on CPU %d\n", cpu); return false; } return true; } static bool kvm_is_vmx_supported(void) { bool supported; migrate_disable(); supported = __kvm_is_vmx_supported(); migrate_enable(); return supported; } int vmx_check_processor_compat(void) { int cpu = raw_smp_processor_id(); struct vmcs_config vmcs_conf; struct vmx_capability vmx_cap; if (!__kvm_is_vmx_supported()) return -EIO; if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0) { pr_err("Failed to setup VMCS config on CPU %d\n", cpu); return -EIO; } if (nested) nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept); if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config))) { pr_err("Inconsistent VMCS config on CPU %d\n", cpu); return -EIO; } return 0; } static int kvm_cpu_vmxon(u64 vmxon_pointer) { u64 msr; cr4_set_bits(X86_CR4_VMXE); asm goto("1: vmxon %[vmxon_pointer]\n\t" _ASM_EXTABLE(1b, %l[fault]) : : [vmxon_pointer] "m"(vmxon_pointer) : : fault); return 0; fault: WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n", rdmsrq_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr); cr4_clear_bits(X86_CR4_VMXE); return -EFAULT; } int vmx_enable_virtualization_cpu(void) { int cpu = raw_smp_processor_id(); u64 phys_addr = __pa(per_cpu(vmxarea, cpu)); int r; if (cr4_read_shadow() & X86_CR4_VMXE) return -EBUSY; /* * This can happen if we hot-added a CPU but failed to allocate * VP assist page for it. 
*/ if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu)) return -EFAULT; intel_pt_handle_vmx(1); r = kvm_cpu_vmxon(phys_addr); if (r) { intel_pt_handle_vmx(0); return r; } return 0; } static void vmclear_local_loaded_vmcss(void) { int cpu = raw_smp_processor_id(); struct loaded_vmcs *v, *n; list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu), loaded_vmcss_on_cpu_link) __loaded_vmcs_clear(v); } void vmx_disable_virtualization_cpu(void) { vmclear_local_loaded_vmcss(); if (kvm_cpu_vmxoff()) kvm_spurious_fault(); hv_reset_evmcs(); intel_pt_handle_vmx(0); } struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags) { int node = cpu_to_node(cpu); struct page *pages; struct vmcs *vmcs; pages = __alloc_pages_node(node, flags, 0); if (!pages) return NULL; vmcs = page_address(pages); memset(vmcs, 0, vmx_basic_vmcs_size(vmcs_config.basic)); /* KVM supports Enlightened VMCS v1 only */ if (kvm_is_using_evmcs()) vmcs->hdr.revision_id = KVM_EVMCS_VERSION; else vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic); if (shadow) vmcs->hdr.shadow_vmcs = 1; return vmcs; } void free_vmcs(struct vmcs *vmcs) { free_page((unsigned long)vmcs); } /* * Free a VMCS, but before that VMCLEAR it on the CPU where it was last loaded */ void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) { if (!loaded_vmcs->vmcs) return; loaded_vmcs_clear(loaded_vmcs); free_vmcs(loaded_vmcs->vmcs); loaded_vmcs->vmcs = NULL; if (loaded_vmcs->msr_bitmap) free_page((unsigned long)loaded_vmcs->msr_bitmap); WARN_ON(loaded_vmcs->shadow_vmcs != NULL); } int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) { loaded_vmcs->vmcs = alloc_vmcs(false); if (!loaded_vmcs->vmcs) return -ENOMEM; vmcs_clear(loaded_vmcs->vmcs); loaded_vmcs->shadow_vmcs = NULL; loaded_vmcs->hv_timer_soft_disabled = false; loaded_vmcs->cpu = -1; loaded_vmcs->launched = 0; if (cpu_has_vmx_msr_bitmap()) { loaded_vmcs->msr_bitmap = (unsigned long *) __get_free_page(GFP_KERNEL_ACCOUNT); if (!loaded_vmcs->msr_bitmap) goto out_vmcs; memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); } memset(&loaded_vmcs->host_state, 0, sizeof(struct vmcs_host_state)); memset(&loaded_vmcs->controls_shadow, 0, sizeof(struct vmcs_controls_shadow)); return 0; out_vmcs: free_loaded_vmcs(loaded_vmcs); return -ENOMEM; } static void free_kvm_area(void) { int cpu; for_each_possible_cpu(cpu) { free_vmcs(per_cpu(vmxarea, cpu)); per_cpu(vmxarea, cpu) = NULL; } } static __init int alloc_kvm_area(void) { int cpu; for_each_possible_cpu(cpu) { struct vmcs *vmcs; vmcs = alloc_vmcs_cpu(false, cpu, GFP_KERNEL); if (!vmcs) { free_kvm_area(); return -ENOMEM; } /* * When eVMCS is enabled, alloc_vmcs_cpu() sets * vmcs->revision_id to KVM_EVMCS_VERSION instead of * revision_id reported by MSR_IA32_VMX_BASIC. * * However, even though not explicitly documented by * TLFS, VMXArea passed as VMXON argument should * still be marked with revision_id reported by * physical CPU. */ if (kvm_is_using_evmcs()) vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic); per_cpu(vmxarea, cpu) = vmcs; } return 0; } static void fix_pmode_seg(struct kvm_vcpu *vcpu, int seg, struct kvm_segment *save) { if (!emulate_invalid_guest_state) { /* * CS and SS RPL should be equal during guest entry according * to VMX spec, but in reality it is not always so. Since vcpu * is in the middle of the transition from real mode to * protected mode it is safe to assume that RPL 0 is a good * default value. 
*/ if (seg == VCPU_SREG_CS || seg == VCPU_SREG_SS) save->selector &= ~SEGMENT_RPL_MASK; save->dpl = save->selector & SEGMENT_RPL_MASK; save->s = 1; } __vmx_set_segment(vcpu, save, seg); } static void enter_pmode(struct kvm_vcpu *vcpu) { unsigned long flags; struct vcpu_vmx *vmx = to_vmx(vcpu); /* * Update real mode segment cache. It may be not up-to-date if segment * register was written while vcpu was in a guest mode. */ vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_ES], VCPU_SREG_ES); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_DS], VCPU_SREG_DS); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_FS], VCPU_SREG_FS); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_GS], VCPU_SREG_GS); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_SS], VCPU_SREG_SS); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_CS], VCPU_SREG_CS); vmx->rmode.vm86_active = 0; __vmx_set_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_TR], VCPU_SREG_TR); flags = vmcs_readl(GUEST_RFLAGS); flags &= RMODE_GUEST_OWNED_EFLAGS_BITS; flags |= vmx->rmode.save_rflags & ~RMODE_GUEST_OWNED_EFLAGS_BITS; vmcs_writel(GUEST_RFLAGS, flags); vmcs_writel(GUEST_CR4, (vmcs_readl(GUEST_CR4) & ~X86_CR4_VME) | (vmcs_readl(CR4_READ_SHADOW) & X86_CR4_VME)); vmx_update_exception_bitmap(vcpu); fix_pmode_seg(vcpu, VCPU_SREG_CS, &vmx->rmode.segs[VCPU_SREG_CS]); fix_pmode_seg(vcpu, VCPU_SREG_SS, &vmx->rmode.segs[VCPU_SREG_SS]); fix_pmode_seg(vcpu, VCPU_SREG_ES, &vmx->rmode.segs[VCPU_SREG_ES]); fix_pmode_seg(vcpu, VCPU_SREG_DS, &vmx->rmode.segs[VCPU_SREG_DS]); fix_pmode_seg(vcpu, VCPU_SREG_FS, &vmx->rmode.segs[VCPU_SREG_FS]); fix_pmode_seg(vcpu, VCPU_SREG_GS, &vmx->rmode.segs[VCPU_SREG_GS]); } static void fix_rmode_seg(int seg, struct kvm_segment *save) { const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg]; struct kvm_segment var = *save; var.dpl = 0x3; if (seg == VCPU_SREG_CS) var.type = 0x3; if (!emulate_invalid_guest_state) { var.selector = var.base >> 4; var.base = var.base & 0xffff0; var.limit = 0xffff; var.g = 0; var.db = 0; var.present = 1; var.s = 1; var.l = 0; var.unusable = 0; var.type = 0x3; var.avl = 0; if (save->base & 0xf) pr_warn_once("segment base is not paragraph aligned " "when entering protected mode (seg=%d)", seg); } vmcs_write16(sf->selector, var.selector); vmcs_writel(sf->base, var.base); vmcs_write32(sf->limit, var.limit); vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(&var)); } static void enter_rmode(struct kvm_vcpu *vcpu) { unsigned long flags; struct vcpu_vmx *vmx = to_vmx(vcpu); struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm); /* * KVM should never use VM86 to virtualize Real Mode when L2 is active, * as using VM86 is unnecessary if unrestricted guest is enabled, and * if unrestricted guest is disabled, VM-Enter (from L1) with CR0.PG=0 * should VM-Fail and KVM should reject userspace attempts to stuff * CR0.PG=0 when L2 is active. 
*/ WARN_ON_ONCE(is_guest_mode(vcpu)); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_TR], VCPU_SREG_TR); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_ES], VCPU_SREG_ES); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_DS], VCPU_SREG_DS); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_FS], VCPU_SREG_FS); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_GS], VCPU_SREG_GS); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_SS], VCPU_SREG_SS); vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_CS], VCPU_SREG_CS); vmx->rmode.vm86_active = 1; vmx_segment_cache_clear(vmx); vmcs_writel(GUEST_TR_BASE, kvm_vmx->tss_addr); vmcs_write32(GUEST_TR_LIMIT, RMODE_TSS_SIZE - 1); vmcs_write32(GUEST_TR_AR_BYTES, 0x008b); flags = vmcs_readl(GUEST_RFLAGS); vmx->rmode.save_rflags = flags; flags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM; vmcs_writel(GUEST_RFLAGS, flags); vmcs_writel(GUEST_CR4, vmcs_readl(GUEST_CR4) | X86_CR4_VME); vmx_update_exception_bitmap(vcpu); fix_rmode_seg(VCPU_SREG_SS, &vmx->rmode.segs[VCPU_SREG_SS]); fix_rmode_seg(VCPU_SREG_CS, &vmx->rmode.segs[VCPU_SREG_CS]); fix_rmode_seg(VCPU_SREG_ES, &vmx->rmode.segs[VCPU_SREG_ES]); fix_rmode_seg(VCPU_SREG_DS, &vmx->rmode.segs[VCPU_SREG_DS]); fix_rmode_seg(VCPU_SREG_GS, &vmx->rmode.segs[VCPU_SREG_GS]); fix_rmode_seg(VCPU_SREG_FS, &vmx->rmode.segs[VCPU_SREG_FS]); } int vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer) { struct vcpu_vmx *vmx = to_vmx(vcpu); /* Nothing to do if hardware doesn't support EFER. */ if (!vmx_find_uret_msr(vmx, MSR_EFER)) return 0; vcpu->arch.efer = efer; #ifdef CONFIG_X86_64 if (efer & EFER_LMA) vm_entry_controls_setbit(vmx, VM_ENTRY_IA32E_MODE); else vm_entry_controls_clearbit(vmx, VM_ENTRY_IA32E_MODE); #else if (KVM_BUG_ON(efer & EFER_LMA, vcpu->kvm)) return 1; #endif vmx_setup_uret_msrs(vmx); return 0; } #ifdef CONFIG_X86_64 static void enter_lmode(struct kvm_vcpu *vcpu) { u32 guest_tr_ar; vmx_segment_cache_clear(to_vmx(vcpu)); guest_tr_ar = vmcs_read32(GUEST_TR_AR_BYTES); if ((guest_tr_ar & VMX_AR_TYPE_MASK) != VMX_AR_TYPE_BUSY_64_TSS) { pr_debug_ratelimited("%s: tss fixup for long mode. \n", __func__); vmcs_write32(GUEST_TR_AR_BYTES, (guest_tr_ar & ~VMX_AR_TYPE_MASK) | VMX_AR_TYPE_BUSY_64_TSS); } vmx_set_efer(vcpu, vcpu->arch.efer | EFER_LMA); } static void exit_lmode(struct kvm_vcpu *vcpu) { vmx_set_efer(vcpu, vcpu->arch.efer & ~EFER_LMA); } #endif void vmx_flush_tlb_all(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); /* * INVEPT must be issued when EPT is enabled, irrespective of VPID, as * the CPU is not required to invalidate guest-physical mappings on * VM-Entry, even if VPID is disabled. Guest-physical mappings are * associated with the root EPT structure and not any particular VPID * (INVVPID also isn't required to invalidate guest-physical mappings). */ if (enable_ept) { ept_sync_global(); } else if (enable_vpid) { if (cpu_has_vmx_invvpid_global()) { vpid_sync_vcpu_global(); } else { vpid_sync_vcpu_single(vmx->vpid); vpid_sync_vcpu_single(vmx->nested.vpid02); } } } static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu) { if (is_guest_mode(vcpu) && nested_cpu_has_vpid(get_vmcs12(vcpu))) return nested_get_vpid02(vcpu); return to_vmx(vcpu)->vpid; } void vmx_flush_tlb_current(struct kvm_vcpu *vcpu) { struct kvm_mmu *mmu = vcpu->arch.mmu; u64 root_hpa = mmu->root.hpa; /* No flush required if the current context is invalid. 
*/ if (!VALID_PAGE(root_hpa)) return; if (enable_ept) ept_sync_context(construct_eptp(vcpu, root_hpa, mmu->root_role.level)); else vpid_sync_context(vmx_get_current_vpid(vcpu)); } void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) { /* * vpid_sync_vcpu_addr() is a nop if vpid==0, see the comment in * vmx_flush_tlb_guest() for an explanation of why this is ok. */ vpid_sync_vcpu_addr(vmx_get_current_vpid(vcpu), addr); } void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu) { /* * vpid_sync_context() is a nop if vpid==0, e.g. if enable_vpid==0 or a * vpid couldn't be allocated for this vCPU. VM-Enter and VM-Exit are * required to flush GVA->{G,H}PA mappings from the TLB if vpid is * disabled (VM-Enter with vpid enabled and vpid==0 is disallowed), * i.e. no explicit INVVPID is necessary. */ vpid_sync_context(vmx_get_current_vpid(vcpu)); } void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu) { struct kvm_mmu *mmu = vcpu->arch.walk_mmu; if (!kvm_register_is_dirty(vcpu, VCPU_EXREG_PDPTR)) return; if (is_pae_paging(vcpu)) { vmcs_write64(GUEST_PDPTR0, mmu->pdptrs[0]); vmcs_write64(GUEST_PDPTR1, mmu->pdptrs[1]); vmcs_write64(GUEST_PDPTR2, mmu->pdptrs[2]); vmcs_write64(GUEST_PDPTR3, mmu->pdptrs[3]); } } void ept_save_pdptrs(struct kvm_vcpu *vcpu) { struct kvm_mmu *mmu = vcpu->arch.walk_mmu; if (WARN_ON_ONCE(!is_pae_paging(vcpu))) return; mmu->pdptrs[0] = vmcs_read64(GUEST_PDPTR0); mmu->pdptrs[1] = vmcs_read64(GUEST_PDPTR1); mmu->pdptrs[2] = vmcs_read64(GUEST_PDPTR2); mmu->pdptrs[3] = vmcs_read64(GUEST_PDPTR3); kvm_register_mark_available(vcpu, VCPU_EXREG_PDPTR); } #define CR3_EXITING_BITS (CPU_BASED_CR3_LOAD_EXITING | \ CPU_BASED_CR3_STORE_EXITING) bool vmx_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { if (is_guest_mode(vcpu)) return nested_guest_cr0_valid(vcpu, cr0); if (to_vmx(vcpu)->nested.vmxon) return nested_host_cr0_valid(vcpu, cr0); return true; } void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long hw_cr0, old_cr0_pg; u32 tmp; old_cr0_pg = kvm_read_cr0_bits(vcpu, X86_CR0_PG); hw_cr0 = (cr0 & ~KVM_VM_CR0_ALWAYS_OFF); if (enable_unrestricted_guest) hw_cr0 |= KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST; else { hw_cr0 |= KVM_VM_CR0_ALWAYS_ON; if (!enable_ept) hw_cr0 |= X86_CR0_WP; if (vmx->rmode.vm86_active && (cr0 & X86_CR0_PE)) enter_pmode(vcpu); if (!vmx->rmode.vm86_active && !(cr0 & X86_CR0_PE)) enter_rmode(vcpu); } vmcs_writel(CR0_READ_SHADOW, cr0); vmcs_writel(GUEST_CR0, hw_cr0); vcpu->arch.cr0 = cr0; kvm_register_mark_available(vcpu, VCPU_EXREG_CR0); #ifdef CONFIG_X86_64 if (vcpu->arch.efer & EFER_LME) { if (!old_cr0_pg && (cr0 & X86_CR0_PG)) enter_lmode(vcpu); else if (old_cr0_pg && !(cr0 & X86_CR0_PG)) exit_lmode(vcpu); } #endif if (enable_ept && !enable_unrestricted_guest) { /* * Ensure KVM has an up-to-date snapshot of the guest's CR3. If * the below code _enables_ CR3 exiting, vmx_cache_reg() will * (correctly) stop reading vmcs.GUEST_CR3 because it thinks * KVM's CR3 is installed. */ if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3)) vmx_cache_reg(vcpu, VCPU_EXREG_CR3); /* * When running with EPT but not unrestricted guest, KVM must * intercept CR3 accesses when paging is _disabled_. This is * necessary because restricted guests can't actually run with * paging disabled, and so KVM stuffs its own CR3 in order to * run the guest when identity mapped page tables. * * Do _NOT_ check the old CR0.PG, e.g. to optimize away the * update, it may be stale with respect to CR3 interception, * e.g. after nested VM-Enter. 
* * Lastly, honor L1's desires, i.e. intercept CR3 loads and/or * stores to forward them to L1, even if KVM does not need to * intercept them to preserve its identity mapped page tables. */ if (!(cr0 & X86_CR0_PG)) { exec_controls_setbit(vmx, CR3_EXITING_BITS); } else if (!is_guest_mode(vcpu)) { exec_controls_clearbit(vmx, CR3_EXITING_BITS); } else { tmp = exec_controls_get(vmx); tmp &= ~CR3_EXITING_BITS; tmp |= get_vmcs12(vcpu)->cpu_based_vm_exec_control & CR3_EXITING_BITS; exec_controls_set(vmx, tmp); } /* Note, vmx_set_cr4() consumes the new vcpu->arch.cr0. */ if ((old_cr0_pg ^ cr0) & X86_CR0_PG) vmx_set_cr4(vcpu, kvm_read_cr4(vcpu)); /* * When !CR0_PG -> CR0_PG, vcpu->arch.cr3 becomes active, but * GUEST_CR3 is still vmx->ept_identity_map_addr if EPT + !URG. */ if (!(old_cr0_pg & X86_CR0_PG) && (cr0 & X86_CR0_PG)) kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3); } /* depends on vcpu->arch.cr0 to be set to a new value */ vmx->vt.emulation_required = vmx_emulation_required(vcpu); } static int vmx_get_max_ept_level(void) { if (cpu_has_vmx_ept_5levels()) return 5; return 4; } u64 construct_eptp(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) { u64 eptp = VMX_EPTP_MT_WB; eptp |= (root_level == 5) ? VMX_EPTP_PWL_5 : VMX_EPTP_PWL_4; if (enable_ept_ad_bits && (!is_guest_mode(vcpu) || nested_ept_ad_enabled(vcpu))) eptp |= VMX_EPTP_AD_ENABLE_BIT; eptp |= root_hpa; return eptp; } void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) { struct kvm *kvm = vcpu->kvm; bool update_guest_cr3 = true; unsigned long guest_cr3; u64 eptp; if (enable_ept) { eptp = construct_eptp(vcpu, root_hpa, root_level); vmcs_write64(EPT_POINTER, eptp); hv_track_root_tdp(vcpu, root_hpa); if (!enable_unrestricted_guest && !is_paging(vcpu)) guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr; else if (kvm_register_is_dirty(vcpu, VCPU_EXREG_CR3)) guest_cr3 = vcpu->arch.cr3; else /* vmcs.GUEST_CR3 is already up-to-date. */ update_guest_cr3 = false; vmx_ept_load_pdptrs(vcpu); } else { guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) | kvm_get_active_cr3_lam_bits(vcpu); } if (update_guest_cr3) vmcs_writel(GUEST_CR3, guest_cr3); } bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { /* * We operate under the default treatment of SMM, so VMX cannot be * enabled under SMM. Note, whether or not VMXE is allowed at all, * i.e. is a reserved bit, is handled by common x86 code. */ if ((cr4 & X86_CR4_VMXE) && is_smm(vcpu)) return false; if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4)) return false; return true; } void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { unsigned long old_cr4 = kvm_read_cr4(vcpu); struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long hw_cr4; /* * Pass through host's Machine Check Enable value to hw_cr4, which * is in force while we are in guest mode. Do not let guests control * this bit, even if host CR4.MCE == 0. 
*/ hw_cr4 = (cr4_read_shadow() & X86_CR4_MCE) | (cr4 & ~X86_CR4_MCE); if (enable_unrestricted_guest) hw_cr4 |= KVM_VM_CR4_ALWAYS_ON_UNRESTRICTED_GUEST; else if (vmx->rmode.vm86_active) hw_cr4 |= KVM_RMODE_VM_CR4_ALWAYS_ON; else hw_cr4 |= KVM_PMODE_VM_CR4_ALWAYS_ON; if (vmx_umip_emulated()) { if (cr4 & X86_CR4_UMIP) { secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_DESC); hw_cr4 &= ~X86_CR4_UMIP; } else if (!is_guest_mode(vcpu) || !nested_cpu_has2(get_vmcs12(vcpu), SECONDARY_EXEC_DESC)) { secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_DESC); } } vcpu->arch.cr4 = cr4; kvm_register_mark_available(vcpu, VCPU_EXREG_CR4); if (!enable_unrestricted_guest) { if (enable_ept) { if (!is_paging(vcpu)) { hw_cr4 &= ~X86_CR4_PAE; hw_cr4 |= X86_CR4_PSE; } else if (!(cr4 & X86_CR4_PAE)) { hw_cr4 &= ~X86_CR4_PAE; } } /* * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in * hardware. To emulate this behavior, SMEP/SMAP/PKU needs * to be manually disabled when guest switches to non-paging * mode. * * If !enable_unrestricted_guest, the CPU is always running * with CR0.PG=1 and CR4 needs to be modified. * If enable_unrestricted_guest, the CPU automatically * disables SMEP/SMAP/PKU when the guest sets CR0.PG=0. */ if (!is_paging(vcpu)) hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE); } vmcs_writel(CR4_READ_SHADOW, cr4); vmcs_writel(GUEST_CR4, hw_cr4); if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE)) vcpu->arch.cpuid_dynamic_bits_dirty = true; } void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 ar; if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) { *var = vmx->rmode.segs[seg]; if (seg == VCPU_SREG_TR || var->selector == vmx_read_guest_seg_selector(vmx, seg)) return; var->base = vmx_read_guest_seg_base(vmx, seg); var->selector = vmx_read_guest_seg_selector(vmx, seg); return; } var->base = vmx_read_guest_seg_base(vmx, seg); var->limit = vmx_read_guest_seg_limit(vmx, seg); var->selector = vmx_read_guest_seg_selector(vmx, seg); ar = vmx_read_guest_seg_ar(vmx, seg); var->unusable = (ar >> 16) & 1; var->type = ar & 15; var->s = (ar >> 4) & 1; var->dpl = (ar >> 5) & 3; /* * Some userspaces do not preserve unusable property. Since usable * segment has to be present according to VMX spec we can use present * property to amend userspace bug by making unusable segment always * nonpresent. vmx_segment_access_rights() already marks nonpresent * segment as unusable. 
*/ var->present = !var->unusable; var->avl = (ar >> 12) & 1; var->l = (ar >> 13) & 1; var->db = (ar >> 14) & 1; var->g = (ar >> 15) & 1; } u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg) { struct kvm_segment s; if (to_vmx(vcpu)->rmode.vm86_active) { vmx_get_segment(vcpu, &s, seg); return s.base; } return vmx_read_guest_seg_base(to_vmx(vcpu), seg); } static int __vmx_get_cpl(struct kvm_vcpu *vcpu, bool no_cache) { struct vcpu_vmx *vmx = to_vmx(vcpu); int ar; if (unlikely(vmx->rmode.vm86_active)) return 0; if (no_cache) ar = vmcs_read32(GUEST_SS_AR_BYTES); else ar = vmx_read_guest_seg_ar(vmx, VCPU_SREG_SS); return VMX_AR_DPL(ar); } int vmx_get_cpl(struct kvm_vcpu *vcpu) { return __vmx_get_cpl(vcpu, false); } int vmx_get_cpl_no_cache(struct kvm_vcpu *vcpu) { return __vmx_get_cpl(vcpu, true); } static u32 vmx_segment_access_rights(struct kvm_segment *var) { u32 ar; ar = var->type & 15; ar |= (var->s & 1) << 4; ar |= (var->dpl & 3) << 5; ar |= (var->present & 1) << 7; ar |= (var->avl & 1) << 12; ar |= (var->l & 1) << 13; ar |= (var->db & 1) << 14; ar |= (var->g & 1) << 15; ar |= (var->unusable || !var->present) << 16; return ar; } void __vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) { struct vcpu_vmx *vmx = to_vmx(vcpu); const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg]; vmx_segment_cache_clear(vmx); if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) { vmx->rmode.segs[seg] = *var; if (seg == VCPU_SREG_TR) vmcs_write16(sf->selector, var->selector); else if (var->s) fix_rmode_seg(seg, &vmx->rmode.segs[seg]); return; } vmcs_writel(sf->base, var->base); vmcs_write32(sf->limit, var->limit); vmcs_write16(sf->selector, var->selector); /* * Fix the "Accessed" bit in AR field of segment registers for older * qemu binaries. * IA32 arch specifies that at the time of processor reset the * "Accessed" bit in the AR field of segment registers is 1. And qemu * is setting it to 0 in the userland code. This causes invalid guest * state vmexit when "unrestricted guest" mode is turned on. * Fix for this setup issue in cpu_reset is being pushed in the qemu * tree. Newer qemu binaries with that qemu fix would not need this * kvm hack. 
*/ if (is_unrestricted_guest(vcpu) && (seg != VCPU_SREG_LDTR)) var->type |= 0x1; /* Accessed */ vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var)); } void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) { __vmx_set_segment(vcpu, var, seg); to_vmx(vcpu)->vt.emulation_required = vmx_emulation_required(vcpu); } void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) { u32 ar = vmx_read_guest_seg_ar(to_vmx(vcpu), VCPU_SREG_CS); *db = (ar >> 14) & 1; *l = (ar >> 13) & 1; } void vmx_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { dt->size = vmcs_read32(GUEST_IDTR_LIMIT); dt->address = vmcs_readl(GUEST_IDTR_BASE); } void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { vmcs_write32(GUEST_IDTR_LIMIT, dt->size); vmcs_writel(GUEST_IDTR_BASE, dt->address); } void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { dt->size = vmcs_read32(GUEST_GDTR_LIMIT); dt->address = vmcs_readl(GUEST_GDTR_BASE); } void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) { vmcs_write32(GUEST_GDTR_LIMIT, dt->size); vmcs_writel(GUEST_GDTR_BASE, dt->address); } static bool rmode_segment_valid(struct kvm_vcpu *vcpu, int seg) { struct kvm_segment var; u32 ar; vmx_get_segment(vcpu, &var, seg); var.dpl = 0x3; if (seg == VCPU_SREG_CS) var.type = 0x3; ar = vmx_segment_access_rights(&var); if (var.base != (var.selector << 4)) return false; if (var.limit != 0xffff) return false; if (ar != 0xf3) return false; return true; } static bool code_segment_valid(struct kvm_vcpu *vcpu) { struct kvm_segment cs; unsigned int cs_rpl; vmx_get_segment(vcpu, &cs, VCPU_SREG_CS); cs_rpl = cs.selector & SEGMENT_RPL_MASK; if (cs.unusable) return false; if (~cs.type & (VMX_AR_TYPE_CODE_MASK|VMX_AR_TYPE_ACCESSES_MASK)) return false; if (!cs.s) return false; if (cs.type & VMX_AR_TYPE_WRITEABLE_MASK) { if (cs.dpl > cs_rpl) return false; } else { if (cs.dpl != cs_rpl) return false; } if (!cs.present) return false; /* TODO: Add Reserved field check, this'll require a new member in the kvm_segment_field structure */ return true; } static bool stack_segment_valid(struct kvm_vcpu *vcpu) { struct kvm_segment ss; unsigned int ss_rpl; vmx_get_segment(vcpu, &ss, VCPU_SREG_SS); ss_rpl = ss.selector & SEGMENT_RPL_MASK; if (ss.unusable) return true; if (ss.type != 3 && ss.type != 7) return false; if (!ss.s) return false; if (ss.dpl != ss_rpl) /* DPL != RPL */ return false; if (!ss.present) return false; return true; } static bool data_segment_valid(struct kvm_vcpu *vcpu, int seg) { struct kvm_segment var; unsigned int rpl; vmx_get_segment(vcpu, &var, seg); rpl = var.selector & SEGMENT_RPL_MASK; if (var.unusable) return true; if (!var.s) return false; if (!var.present) return false; if (~var.type & (VMX_AR_TYPE_CODE_MASK|VMX_AR_TYPE_WRITEABLE_MASK)) { if (var.dpl < rpl) /* DPL < RPL */ return false; } /* TODO: Add other members to kvm_segment_field to allow checking for other access * rights flags */ return true; } static bool tr_valid(struct kvm_vcpu *vcpu) { struct kvm_segment tr; vmx_get_segment(vcpu, &tr, VCPU_SREG_TR); if (tr.unusable) return false; if (tr.selector & SEGMENT_TI_MASK) /* TI = 1 */ return false; if (tr.type != 3 && tr.type != 11) /* TODO: Check if guest is in IA32e mode */ return false; if (!tr.present) return false; return true; } static bool ldtr_valid(struct kvm_vcpu *vcpu) { struct kvm_segment ldtr; vmx_get_segment(vcpu, &ldtr, VCPU_SREG_LDTR); if (ldtr.unusable) return true; if (ldtr.selector & SEGMENT_TI_MASK) /* TI = 1 */ return false; if (ldtr.type != 2) return false; 
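/* Per the SDM's VM-entry guest-state checks, a usable LDTR must also have its present bit set; the check below rejects an LDTR that is not present. */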
if (!ldtr.present) return false; return true; } static bool cs_ss_rpl_check(struct kvm_vcpu *vcpu) { struct kvm_segment cs, ss; vmx_get_segment(vcpu, &cs, VCPU_SREG_CS); vmx_get_segment(vcpu, &ss, VCPU_SREG_SS); return ((cs.selector & SEGMENT_RPL_MASK) == (ss.selector & SEGMENT_RPL_MASK)); } /* * Check if guest state is valid. Returns true if valid, false if * not. * We assume that registers are always usable */ bool __vmx_guest_state_valid(struct kvm_vcpu *vcpu) { /* real mode guest state checks */ if (!is_protmode(vcpu) || (vmx_get_rflags(vcpu) & X86_EFLAGS_VM)) { if (!rmode_segment_valid(vcpu, VCPU_SREG_CS)) return false; if (!rmode_segment_valid(vcpu, VCPU_SREG_SS)) return false; if (!rmode_segment_valid(vcpu, VCPU_SREG_DS)) return false; if (!rmode_segment_valid(vcpu, VCPU_SREG_ES)) return false; if (!rmode_segment_valid(vcpu, VCPU_SREG_FS)) return false; if (!rmode_segment_valid(vcpu, VCPU_SREG_GS)) return false; } else { /* protected mode guest state checks */ if (!cs_ss_rpl_check(vcpu)) return false; if (!code_segment_valid(vcpu)) return false; if (!stack_segment_valid(vcpu)) return false; if (!data_segment_valid(vcpu, VCPU_SREG_DS)) return false; if (!data_segment_valid(vcpu, VCPU_SREG_ES)) return false; if (!data_segment_valid(vcpu, VCPU_SREG_FS)) return false; if (!data_segment_valid(vcpu, VCPU_SREG_GS)) return false; if (!tr_valid(vcpu)) return false; if (!ldtr_valid(vcpu)) return false; } /* TODO: * - Add checks on RIP * - Add checks on RFLAGS */ return true; } static int init_rmode_tss(struct kvm *kvm, void __user *ua) { const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0))); u16 data; int i; for (i = 0; i < 3; i++) { if (__copy_to_user(ua + PAGE_SIZE * i, zero_page, PAGE_SIZE)) return -EFAULT; } data = TSS_BASE_SIZE + TSS_REDIRECTION_SIZE; if (__copy_to_user(ua + TSS_IOPB_BASE_OFFSET, &data, sizeof(u16))) return -EFAULT; data = ~0; if (__copy_to_user(ua + RMODE_TSS_SIZE - 1, &data, sizeof(u8))) return -EFAULT; return 0; } static int init_rmode_identity_map(struct kvm *kvm) { struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm); int i, r = 0; void __user *uaddr; u32 tmp; /* Protect kvm_vmx->ept_identity_pagetable_done. 
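 * Holding slots_lock here also serializes creation of the identity-map memslot via __x86_set_memory_region() below.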
*/ mutex_lock(&kvm->slots_lock); if (likely(kvm_vmx->ept_identity_pagetable_done)) goto out; if (!kvm_vmx->ept_identity_map_addr) kvm_vmx->ept_identity_map_addr = VMX_EPT_IDENTITY_PAGETABLE_ADDR; uaddr = __x86_set_memory_region(kvm, IDENTITY_PAGETABLE_PRIVATE_MEMSLOT, kvm_vmx->ept_identity_map_addr, PAGE_SIZE); if (IS_ERR(uaddr)) { r = PTR_ERR(uaddr); goto out; } /* Set up identity-mapping pagetable for EPT in real mode */ for (i = 0; i < (PAGE_SIZE / sizeof(tmp)); i++) { tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); if (__copy_to_user(uaddr + i * sizeof(tmp), &tmp, sizeof(tmp))) { r = -EFAULT; goto out; } } kvm_vmx->ept_identity_pagetable_done = true; out: mutex_unlock(&kvm->slots_lock); return r; } static void seg_setup(int seg) { const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg]; unsigned int ar; vmcs_write16(sf->selector, 0); vmcs_writel(sf->base, 0); vmcs_write32(sf->limit, 0xffff); ar = 0x93; if (seg == VCPU_SREG_CS) ar |= 0x08; /* code segment */ vmcs_write32(sf->ar_bytes, ar); } int allocate_vpid(void) { int vpid; if (!enable_vpid) return 0; spin_lock(&vmx_vpid_lock); vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS); if (vpid < VMX_NR_VPIDS) __set_bit(vpid, vmx_vpid_bitmap); else vpid = 0; spin_unlock(&vmx_vpid_lock); return vpid; } void free_vpid(int vpid) { if (!enable_vpid || vpid == 0) return; spin_lock(&vmx_vpid_lock); __clear_bit(vpid, vmx_vpid_bitmap); spin_unlock(&vmx_vpid_lock); } static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx) { /* * When KVM is a nested hypervisor on top of Hyper-V and uses * 'Enlightened MSR Bitmap' feature L0 needs to know that MSR * bitmap has changed. */ if (kvm_is_using_evmcs()) { struct hv_enlightened_vmcs *evmcs = (void *)vmx->vmcs01.vmcs; if (evmcs->hv_enlightenments_control.msr_bitmap) evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP; } vmx->nested.force_msr_bitmap_recalc = true; } void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type, bool set) { struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap; if (!cpu_has_vmx_msr_bitmap()) return; vmx_msr_bitmap_l01_changed(vmx); if (type & MSR_TYPE_R) { if (!set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_READ)) vmx_clear_msr_bitmap_read(msr_bitmap, msr); else vmx_set_msr_bitmap_read(msr_bitmap, msr); } if (type & MSR_TYPE_W) { if (!set && kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE)) vmx_clear_msr_bitmap_write(msr_bitmap, msr); else vmx_set_msr_bitmap_write(msr_bitmap, msr); } } static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu) { /* * x2APIC indices for 64-bit accesses into the RDMSR and WRMSR halves * of the MSR bitmap. KVM emulates APIC registers up through 0x3f0, * i.e. MSR 0x83f, and so only needs to dynamically manipulate 64 bits. */ const int read_idx = APIC_BASE_MSR / BITS_PER_LONG_LONG; const int write_idx = read_idx + (0x800 / sizeof(u64)); struct vcpu_vmx *vmx = to_vmx(vcpu); u64 *msr_bitmap = (u64 *)vmx->vmcs01.msr_bitmap; u8 mode; if (!cpu_has_vmx_msr_bitmap() || WARN_ON_ONCE(!lapic_in_kernel(vcpu))) return; if (cpu_has_secondary_exec_ctrls() && (secondary_exec_controls_get(vmx) & SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { mode = MSR_BITMAP_MODE_X2APIC; if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) mode |= MSR_BITMAP_MODE_X2APIC_APICV; } else { mode = 0; } if (mode == vmx->x2apic_msr_bitmap_mode) return; vmx->x2apic_msr_bitmap_mode = mode; /* * Reset the bitmap for MSRs 0x800 - 0x83f. 
Leave AMD's uber-extended * registers (0x840 and above) intercepted, KVM doesn't support them. * Intercept all writes by default and poke holes as needed. Pass * through reads for all valid registers by default in x2APIC+APICv * mode, only the current timer count needs on-demand emulation by KVM. */ if (mode & MSR_BITMAP_MODE_X2APIC_APICV) msr_bitmap[read_idx] = ~kvm_lapic_readable_reg_mask(vcpu->arch.apic); else msr_bitmap[read_idx] = ~0ull; msr_bitmap[write_idx] = ~0ull; /* * TPR reads and writes can be virtualized even if virtual interrupt * delivery is not in use. */ vmx_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW, !(mode & MSR_BITMAP_MODE_X2APIC)); if (mode & MSR_BITMAP_MODE_X2APIC_APICV) { vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW); vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W); vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W); if (enable_ipiv) vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR), MSR_TYPE_RW); } } void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); bool flag = !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN); u32 i; vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_STATUS, MSR_TYPE_RW, flag); vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_BASE, MSR_TYPE_RW, flag); vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_MASK, MSR_TYPE_RW, flag); vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_CR3_MATCH, MSR_TYPE_RW, flag); for (i = 0; i < vmx->pt_desc.num_address_ranges; i++) { vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_A + i * 2, MSR_TYPE_RW, flag); vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_B + i * 2, MSR_TYPE_RW, flag); } } void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu) { if (!cpu_has_vmx_msr_bitmap()) return; vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R); #ifdef CONFIG_X86_64 vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW); vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW); vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); #endif vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW); vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW); vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW); if (kvm_cstate_in_guest(vcpu->kvm)) { vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R); vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R); vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R); vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R); } /* PT MSRs can be passed through iff PT is exposed to the guest. */ if (vmx_pt_mode_is_host_guest()) pt_update_intercept_for_msr(vcpu); if (vcpu->arch.xfd_no_write_intercept) vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW); vmx_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW, !to_vmx(vcpu)->spec_ctrl); if (kvm_cpu_cap_has(X86_FEATURE_XFD)) vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R, !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD)); if (cpu_feature_enabled(X86_FEATURE_IBPB)) vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W, !guest_has_pred_cmd_msr(vcpu)); if (cpu_feature_enabled(X86_FEATURE_FLUSH_L1D)) vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W, !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D)); /* * x2APIC and LBR MSR intercepts are modified on-demand and cannot be * filtered by userspace. 
*/ } static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu, int vector) { struct vcpu_vmx *vmx = to_vmx(vcpu); /* * DO NOT query the vCPU's vmcs12, as vmcs12 is dynamically allocated * and freed, and must not be accessed outside of vcpu->mutex. The * vCPU's cached PI NV is valid if and only if posted interrupts are * enabled in its vmcs12, i.e. checking the vector also checks that * L1 has enabled posted interrupts for L2. */ if (is_guest_mode(vcpu) && vector == vmx->nested.posted_intr_nv) { /* * If a posted intr is not recognized by hardware, * we will deliver it in the next vmentry. */ vmx->nested.pi_pending = true; kvm_make_request(KVM_REQ_EVENT, vcpu); /* * This pairs with the smp_mb_*() after setting vcpu->mode in * vcpu_enter_guest() to guarantee the vCPU sees the event * request if triggering a posted interrupt "fails" because * vcpu->mode != IN_GUEST_MODE. The extra barrier is needed as * the smp_wmb() in kvm_make_request() only ensures everything * done before making the request is visible when the request * is visible, it doesn't ensure ordering between the store to * vcpu->requests and the load from vcpu->mode. */ smp_mb__after_atomic(); /* the PIR and ON have been set by L1. */ kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_NESTED_VECTOR); return 0; } return -1; } /* * Send an interrupt to the vcpu via the posted-interrupt mechanism. * 1. If the target vcpu is running (non-root mode), send a posted interrupt * notification to the vcpu and hardware will sync PIR to vIRR atomically. * 2. If the target vcpu isn't running (root mode), kick it to pick up the * interrupt from PIR in the next vmentry. */ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector) { struct vcpu_vt *vt = to_vt(vcpu); int r; r = vmx_deliver_nested_posted_interrupt(vcpu, vector); if (!r) return 0; /* Note, this is called iff the local APIC is in-kernel. */ if (!vcpu->arch.apic->apicv_active) return -1; __vmx_deliver_posted_interrupt(vcpu, &vt->pi_desc, vector); return 0; } void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector) { struct kvm_vcpu *vcpu = apic->vcpu; if (vmx_deliver_posted_interrupt(vcpu, vector)) { kvm_lapic_set_irr(vector, apic); kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_vcpu_kick(vcpu); } else { trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector); } } /* * Set up the vmcs's constant host-state fields, i.e., host-state fields that * will not change in the lifetime of the guest. * Note that host-state that does change is set elsewhere. E.g., host-state * that is set differently for each CPU is set in vmx_vcpu_load(), not here. */ void vmx_set_constant_host_state(struct vcpu_vmx *vmx) { u32 low32, high32; unsigned long tmpl; unsigned long cr0, cr3, cr4; cr0 = read_cr0(); WARN_ON(cr0 & X86_CR0_TS); vmcs_writel(HOST_CR0, cr0); /* 22.2.3 */ /* * Save the most likely value for this task's CR3 in the VMCS. * We can't use __get_current_cr3_fast() because we're not atomic. */ cr3 = __read_cr3(); vmcs_writel(HOST_CR3, cr3); /* 22.2.3 FIXME: shadow tables */ vmx->loaded_vmcs->host_state.cr3 = cr3; /* Save the most likely value for this task's CR4 in the VMCS. */ cr4 = cr4_read_shadow(); vmcs_writel(HOST_CR4, cr4); /* 22.2.3, 22.2.5 */ vmx->loaded_vmcs->host_state.cr4 = cr4; vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS); /* 22.2.4 */ #ifdef CONFIG_X86_64 /* * Load null selectors, so we can avoid reloading them in * vmx_prepare_switch_to_host(), in case userspace uses * the null selectors too (the expected case).
*/ vmcs_write16(HOST_DS_SELECTOR, 0); vmcs_write16(HOST_ES_SELECTOR, 0); #else vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS); /* 22.2.4 */ vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS); /* 22.2.4 */ #endif vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS); /* 22.2.4 */ vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8); /* 22.2.4 */ vmcs_writel(HOST_IDTR_BASE, host_idt_base); /* 22.2.4 */ vmcs_writel(HOST_RIP, (unsigned long)vmx_vmexit); /* 22.2.5 */ rdmsr(MSR_IA32_SYSENTER_CS, low32, high32); vmcs_write32(HOST_IA32_SYSENTER_CS, low32); /* * SYSENTER is used for 32-bit system calls on either 32-bit or * 64-bit kernels. It is always zero If neither is allowed, otherwise * vmx_vcpu_load_vmcs loads it with the per-CPU entry stack (and may * have already done so!). */ if (!IS_ENABLED(CONFIG_IA32_EMULATION) && !IS_ENABLED(CONFIG_X86_32)) vmcs_writel(HOST_IA32_SYSENTER_ESP, 0); rdmsrq(MSR_IA32_SYSENTER_EIP, tmpl); vmcs_writel(HOST_IA32_SYSENTER_EIP, tmpl); /* 22.2.3 */ if (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PAT) { rdmsr(MSR_IA32_CR_PAT, low32, high32); vmcs_write64(HOST_IA32_PAT, low32 | ((u64) high32 << 32)); } if (cpu_has_load_ia32_efer()) vmcs_write64(HOST_IA32_EFER, kvm_host.efer); } void set_cr4_guest_host_mask(struct vcpu_vmx *vmx) { struct kvm_vcpu *vcpu = &vmx->vcpu; vcpu->arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS & ~vcpu->arch.cr4_guest_rsvd_bits; if (!enable_ept) { vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_TLBFLUSH_BITS; vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_PDPTR_BITS; } if (is_guest_mode(&vmx->vcpu)) vcpu->arch.cr4_guest_owned_bits &= ~get_vmcs12(vcpu)->cr4_guest_host_mask; vmcs_writel(CR4_GUEST_HOST_MASK, ~vcpu->arch.cr4_guest_owned_bits); } static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx) { u32 pin_based_exec_ctrl = vmcs_config.pin_based_exec_ctrl; if (!kvm_vcpu_apicv_active(&vmx->vcpu)) pin_based_exec_ctrl &= ~PIN_BASED_POSTED_INTR; if (!enable_vnmi) pin_based_exec_ctrl &= ~PIN_BASED_VIRTUAL_NMIS; if (!enable_preemption_timer) pin_based_exec_ctrl &= ~PIN_BASED_VMX_PREEMPTION_TIMER; return pin_based_exec_ctrl; } static u32 vmx_vmentry_ctrl(void) { u32 vmentry_ctrl = vmcs_config.vmentry_ctrl; if (vmx_pt_mode_is_system()) vmentry_ctrl &= ~(VM_ENTRY_PT_CONCEAL_PIP | VM_ENTRY_LOAD_IA32_RTIT_CTL); /* * IA32e mode, and loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically. */ vmentry_ctrl &= ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER | VM_ENTRY_IA32E_MODE); return vmentry_ctrl; } static u32 vmx_vmexit_ctrl(void) { u32 vmexit_ctrl = vmcs_config.vmexit_ctrl; /* * Not used by KVM and never set in vmcs01 or vmcs02, but emulated for * nested virtualization and thus allowed to be set in vmcs12. 
*/ vmexit_ctrl &= ~(VM_EXIT_SAVE_IA32_PAT | VM_EXIT_SAVE_IA32_EFER | VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); if (vmx_pt_mode_is_system()) vmexit_ctrl &= ~(VM_EXIT_PT_CONCEAL_PIP | VM_EXIT_CLEAR_IA32_RTIT_CTL); /* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */ return vmexit_ctrl & ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER); } void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); if (is_guest_mode(vcpu)) { vmx->nested.update_vmcs01_apicv_status = true; return; } pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx)); if (kvm_vcpu_apicv_active(vcpu)) { secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); if (enable_ipiv) tertiary_exec_controls_setbit(vmx, TERTIARY_EXEC_IPI_VIRT); } else { secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); if (enable_ipiv) tertiary_exec_controls_clearbit(vmx, TERTIARY_EXEC_IPI_VIRT); } vmx_update_msr_bitmap_x2apic(vcpu); } static u32 vmx_exec_control(struct vcpu_vmx *vmx) { u32 exec_control = vmcs_config.cpu_based_exec_ctrl; /* * Not used by KVM, but fully supported for nesting, i.e. are allowed in * vmcs12 and propagated to vmcs02 when set in vmcs12. */ exec_control &= ~(CPU_BASED_RDTSC_EXITING | CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_TRAP_FLAG | CPU_BASED_PAUSE_EXITING); /* INTR_WINDOW_EXITING and NMI_WINDOW_EXITING are toggled dynamically */ exec_control &= ~(CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING); if (vmx->vcpu.arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT) exec_control &= ~CPU_BASED_MOV_DR_EXITING; if (!cpu_need_tpr_shadow(&vmx->vcpu)) exec_control &= ~CPU_BASED_TPR_SHADOW; #ifdef CONFIG_X86_64 if (exec_control & CPU_BASED_TPR_SHADOW) exec_control &= ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING); else exec_control |= CPU_BASED_CR8_STORE_EXITING | CPU_BASED_CR8_LOAD_EXITING; #endif /* No need to intercept CR3 access or INVLPG when using EPT. */ if (enable_ept) exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING | CPU_BASED_INVLPG_EXITING); if (kvm_mwait_in_guest(vmx->vcpu.kvm)) exec_control &= ~(CPU_BASED_MWAIT_EXITING | CPU_BASED_MONITOR_EXITING); if (kvm_hlt_in_guest(vmx->vcpu.kvm)) exec_control &= ~CPU_BASED_HLT_EXITING; return exec_control; } static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx) { u64 exec_control = vmcs_config.cpu_based_3rd_exec_ctrl; /* * IPI virtualization relies on APICv. Disable IPI virtualization if * APICv is inhibited. */ if (!enable_ipiv || !kvm_vcpu_apicv_active(&vmx->vcpu)) exec_control &= ~TERTIARY_EXEC_IPI_VIRT; return exec_control; } /* * Adjust a single secondary execution control bit to intercept/allow an * instruction in the guest. This is usually done based on whether or not a * feature has been exposed to the guest in order to correctly emulate faults. */ static inline void vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control, u32 control, bool enabled, bool exiting) { /* * If the control is for an opt-in feature, clear the control if the * feature is not exposed to the guest, i.e. not enabled. If the * control is opt-out, i.e. an exiting control, clear the control if * the feature _is_ exposed to the guest, i.e. exiting/interception is * disabled for the associated instruction. Note, the caller is * responsible for presetting exec_control to set all supported bits.
*/ if (enabled == exiting) *exec_control &= ~control; /* * Update the nested MSR settings so that a nested VMM can/can't set * controls for features that are/aren't exposed to the guest. */ if (nested && kvm_check_has_quirk(vmx->vcpu.kvm, KVM_X86_QUIRK_STUFF_FEATURE_MSRS)) { /* * All features that can be added to or removed from VMX MSRs must * be supported in the first place for nested virtualization. */ if (WARN_ON_ONCE(!(vmcs_config.nested.secondary_ctls_high & control))) enabled = false; if (enabled) vmx->nested.msrs.secondary_ctls_high |= control; else vmx->nested.msrs.secondary_ctls_high &= ~control; } } /* * Wrapper macro for the common case of adjusting a secondary execution control * based on a single guest CPUID bit, with a dedicated feature bit. This also * verifies that the control is actually supported by KVM and hardware. */ #define vmx_adjust_sec_exec_control(vmx, exec_control, name, feat_name, ctrl_name, exiting) \ ({ \ struct kvm_vcpu *__vcpu = &(vmx)->vcpu; \ bool __enabled; \ \ if (cpu_has_vmx_##name()) { \ __enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name); \ vmx_adjust_secondary_exec_control(vmx, exec_control, SECONDARY_EXEC_##ctrl_name,\ __enabled, exiting); \ } \ }) /* More macro magic for ENABLE_/opt-in versus _EXITING/opt-out controls. */ #define vmx_adjust_sec_exec_feature(vmx, exec_control, lname, uname) \ vmx_adjust_sec_exec_control(vmx, exec_control, lname, uname, ENABLE_##uname, false) #define vmx_adjust_sec_exec_exiting(vmx, exec_control, lname, uname) \ vmx_adjust_sec_exec_control(vmx, exec_control, lname, uname, uname##_EXITING, true) static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx) { struct kvm_vcpu *vcpu = &vmx->vcpu; u32 exec_control = vmcs_config.cpu_based_2nd_exec_ctrl; if (vmx_pt_mode_is_system()) exec_control &= ~(SECONDARY_EXEC_PT_USE_GPA | SECONDARY_EXEC_PT_CONCEAL_VMX); if (!cpu_need_virtualize_apic_accesses(vcpu)) exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; if (vmx->vpid == 0) exec_control &= ~SECONDARY_EXEC_ENABLE_VPID; if (!enable_ept) { exec_control &= ~SECONDARY_EXEC_ENABLE_EPT; exec_control &= ~SECONDARY_EXEC_EPT_VIOLATION_VE; enable_unrestricted_guest = 0; } if (!enable_unrestricted_guest) exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST; if (kvm_pause_in_guest(vmx->vcpu.kvm)) exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING; if (!kvm_vcpu_apicv_active(vcpu)) exec_control &= ~(SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; /* * KVM doesn't support VMFUNC for L1, but the control is set in KVM's * base configuration as KVM emulates VMFUNC[EPTP_SWITCHING] for L2. */ exec_control &= ~SECONDARY_EXEC_ENABLE_VMFUNC; /* SECONDARY_EXEC_DESC is enabled/disabled on writes to CR4.UMIP, * in vmx_set_cr4. */ exec_control &= ~SECONDARY_EXEC_DESC; /* SECONDARY_EXEC_SHADOW_VMCS is enabled when L1 executes VMPTRLD (handle_vmptrld). We can NOT enable shadow_vmcs here because we don't yet have a current VMCS12 */ exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS; /* * PML is enabled/disabled when dirty logging of memslots changes, but * it needs to be set here when dirty logging is already active, e.g. * if this vCPU was created after dirty logging was enabled. */ if (!enable_pml || !atomic_read(&vcpu->kvm->nr_memslots_dirty_logging)) exec_control &= ~SECONDARY_EXEC_ENABLE_PML; vmx_adjust_sec_exec_feature(vmx, &exec_control, xsaves, XSAVES); /* * RDPID is also gated by ENABLE_RDTSCP, so turn on the control if either * feature is exposed to the guest.
This creates a virtualization hole * if both are supported in hardware but only one is exposed to the * guest, but letting the guest execute RDTSCP or RDPID when either one * is advertised is preferable to emulating the advertised instruction * in KVM on #UD, and obviously better than incorrectly injecting #UD. */ if (cpu_has_vmx_rdtscp()) { bool rdpid_or_rdtscp_enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) || guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID); vmx_adjust_secondary_exec_control(vmx, &exec_control, SECONDARY_EXEC_ENABLE_RDTSCP, rdpid_or_rdtscp_enabled, false); } vmx_adjust_sec_exec_feature(vmx, &exec_control, invpcid, INVPCID); vmx_adjust_sec_exec_exiting(vmx, &exec_control, rdrand, RDRAND); vmx_adjust_sec_exec_exiting(vmx, &exec_control, rdseed, RDSEED); vmx_adjust_sec_exec_control(vmx, &exec_control, waitpkg, WAITPKG, ENABLE_USR_WAIT_PAUSE, false); if (!vcpu->kvm->arch.bus_lock_detection_enabled) exec_control &= ~SECONDARY_EXEC_BUS_LOCK_DETECTION; if (!kvm_notify_vmexit_enabled(vcpu->kvm)) exec_control &= ~SECONDARY_EXEC_NOTIFY_VM_EXITING; return exec_control; } static inline int vmx_get_pid_table_order(struct kvm *kvm) { return get_order(kvm->arch.max_vcpu_ids * sizeof(*to_kvm_vmx(kvm)->pid_table)); } static int vmx_alloc_ipiv_pid_table(struct kvm *kvm) { struct page *pages; struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm); if (!irqchip_in_kernel(kvm) || !enable_ipiv) return 0; if (kvm_vmx->pid_table) return 0; pages = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, vmx_get_pid_table_order(kvm)); if (!pages) return -ENOMEM; kvm_vmx->pid_table = (void *)page_address(pages); return 0; } int vmx_vcpu_precreate(struct kvm *kvm) { return vmx_alloc_ipiv_pid_table(kvm); } #define VMX_XSS_EXIT_BITMAP 0 static void init_vmcs(struct vcpu_vmx *vmx) { struct kvm *kvm = vmx->vcpu.kvm; struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm); if (nested) nested_vmx_set_vmcs_shadowing_bitmap(); if (cpu_has_vmx_msr_bitmap()) vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap)); vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA); /* 22.3.1.5 */ /* Control */ pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx)); exec_controls_set(vmx, vmx_exec_control(vmx)); if (cpu_has_secondary_exec_ctrls()) { secondary_exec_controls_set(vmx, vmx_secondary_exec_control(vmx)); if (vmx->ve_info) vmcs_write64(VE_INFORMATION_ADDRESS, __pa(vmx->ve_info)); } if (cpu_has_tertiary_exec_ctrls()) tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx)); if (enable_apicv && lapic_in_kernel(&vmx->vcpu)) { vmcs_write64(EOI_EXIT_BITMAP0, 0); vmcs_write64(EOI_EXIT_BITMAP1, 0); vmcs_write64(EOI_EXIT_BITMAP2, 0); vmcs_write64(EOI_EXIT_BITMAP3, 0); vmcs_write16(GUEST_INTR_STATUS, 0); vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR); vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->vt.pi_desc))); } if (vmx_can_use_ipiv(&vmx->vcpu)) { vmcs_write64(PID_POINTER_TABLE, __pa(kvm_vmx->pid_table)); vmcs_write16(LAST_PID_POINTER_INDEX, kvm->arch.max_vcpu_ids - 1); } if (!kvm_pause_in_guest(kvm)) { vmcs_write32(PLE_GAP, ple_gap); vmx->ple_window = ple_window; vmx->ple_window_dirty = true; } if (kvm_notify_vmexit_enabled(kvm)) vmcs_write32(NOTIFY_WINDOW, kvm->arch.notify_window); vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0); vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, 0); vmcs_write32(CR3_TARGET_COUNT, 0); /* 22.2.1 */ vmcs_write16(HOST_FS_SELECTOR, 0); /* 22.2.4 */ vmcs_write16(HOST_GS_SELECTOR, 0); /* 22.2.4 */ vmx_set_constant_host_state(vmx); vmcs_writel(HOST_FS_BASE, 0); /* 22.2.4 */ vmcs_writel(HOST_GS_BASE, 0); /* 22.2.4 */ if (cpu_has_vmx_vmfunc()) 
vmcs_write64(VM_FUNCTION_CONTROL, 0); vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0); vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, 0); vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host.val)); vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 0); vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest.val)); if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat); vm_exit_controls_set(vmx, vmx_vmexit_ctrl()); /* 22.2.1, 20.8.1 */ vm_entry_controls_set(vmx, vmx_vmentry_ctrl()); vmx->vcpu.arch.cr0_guest_owned_bits = vmx_l1_guest_owned_cr0_bits(); vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr0_guest_owned_bits); set_cr4_guest_host_mask(vmx); if (vmx->vpid != 0) vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid); if (cpu_has_vmx_xsaves()) vmcs_write64(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP); if (enable_pml) { vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg)); vmcs_write16(GUEST_PML_INDEX, PML_HEAD_INDEX); } vmx_write_encls_bitmap(&vmx->vcpu, NULL); if (vmx_pt_mode_is_host_guest()) { memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc)); /* Bit[6~0] are forced to 1, writes are ignored. */ vmx->pt_desc.guest.output_mask = 0x7F; vmcs_write64(GUEST_IA32_RTIT_CTL, 0); } vmcs_write32(GUEST_SYSENTER_CS, 0); vmcs_writel(GUEST_SYSENTER_ESP, 0); vmcs_writel(GUEST_SYSENTER_EIP, 0); vmx_guest_debugctl_write(&vmx->vcpu, 0); if (cpu_has_vmx_tpr_shadow()) { vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0); if (cpu_need_tpr_shadow(&vmx->vcpu)) vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, __pa(vmx->vcpu.arch.apic->regs)); vmcs_write32(TPR_THRESHOLD, 0); } vmx_setup_uret_msrs(vmx); } static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); init_vmcs(vmx); if (nested && kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_STUFF_FEATURE_MSRS)) memcpy(&vmx->nested.msrs, &vmcs_config.nested, sizeof(vmx->nested.msrs)); vcpu_setup_sgx_lepubkeyhash(vcpu); vmx->nested.posted_intr_nv = -1; vmx->nested.vmxon_ptr = INVALID_GPA; vmx->nested.current_vmptr = INVALID_GPA; #ifdef CONFIG_KVM_HYPERV vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID; #endif if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_STUFF_FEATURE_MSRS)) vcpu->arch.microcode_version = 0x100000000ULL; vmx->msr_ia32_feature_control_valid_bits = FEAT_CTL_LOCKED; /* * Enforce invariant: pi_desc.nv is always either POSTED_INTR_VECTOR * or POSTED_INTR_WAKEUP_VECTOR. 
*/ vmx->vt.pi_desc.nv = POSTED_INTR_VECTOR; __pi_set_sn(&vmx->vt.pi_desc); } void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) { struct vcpu_vmx *vmx = to_vmx(vcpu); if (!init_event) __vmx_vcpu_reset(vcpu); vmx->rmode.vm86_active = 0; vmx->spec_ctrl = 0; vmx->msr_ia32_umwait_control = 0; vmx->hv_deadline_tsc = -1; kvm_set_cr8(vcpu, 0); seg_setup(VCPU_SREG_CS); vmcs_write16(GUEST_CS_SELECTOR, 0xf000); vmcs_writel(GUEST_CS_BASE, 0xffff0000ul); seg_setup(VCPU_SREG_DS); seg_setup(VCPU_SREG_ES); seg_setup(VCPU_SREG_FS); seg_setup(VCPU_SREG_GS); seg_setup(VCPU_SREG_SS); vmcs_write16(GUEST_TR_SELECTOR, 0); vmcs_writel(GUEST_TR_BASE, 0); vmcs_write32(GUEST_TR_LIMIT, 0xffff); vmcs_write32(GUEST_TR_AR_BYTES, 0x008b); vmcs_write16(GUEST_LDTR_SELECTOR, 0); vmcs_writel(GUEST_LDTR_BASE, 0); vmcs_write32(GUEST_LDTR_LIMIT, 0xffff); vmcs_write32(GUEST_LDTR_AR_BYTES, 0x00082); vmcs_writel(GUEST_GDTR_BASE, 0); vmcs_write32(GUEST_GDTR_LIMIT, 0xffff); vmcs_writel(GUEST_IDTR_BASE, 0); vmcs_write32(GUEST_IDTR_LIMIT, 0xffff); vmx_segment_cache_clear(vmx); kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS); vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0); vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0); if (kvm_mpx_supported()) vmcs_write64(GUEST_BNDCFGS, 0); vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */ kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); vpid_sync_context(vmx->vpid); vmx_update_fb_clear_dis(vcpu, vmx); } void vmx_enable_irq_window(struct kvm_vcpu *vcpu) { exec_controls_setbit(to_vmx(vcpu), CPU_BASED_INTR_WINDOW_EXITING); } void vmx_enable_nmi_window(struct kvm_vcpu *vcpu) { if (!enable_vnmi || vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_STI) { vmx_enable_irq_window(vcpu); return; } exec_controls_setbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING); } void vmx_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) { struct vcpu_vmx *vmx = to_vmx(vcpu); uint32_t intr; int irq = vcpu->arch.interrupt.nr; trace_kvm_inj_virq(irq, vcpu->arch.interrupt.soft, reinjected); ++vcpu->stat.irq_injections; if (vmx->rmode.vm86_active) { int inc_eip = 0; if (vcpu->arch.interrupt.soft) inc_eip = vcpu->arch.event_exit_inst_len; kvm_inject_realmode_interrupt(vcpu, irq, inc_eip); return; } intr = irq | INTR_INFO_VALID_MASK; if (vcpu->arch.interrupt.soft) { intr |= INTR_TYPE_SOFT_INTR; vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, vmx->vcpu.arch.event_exit_inst_len); } else intr |= INTR_TYPE_EXT_INTR; vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr); vmx_clear_hlt(vcpu); } void vmx_inject_nmi(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); if (!enable_vnmi) { /* * Tracking the NMI-blocked state in software is built upon * finding the next open IRQ window. This, in turn, depends on * well-behaving guests: They have to keep IRQs disabled at * least as long as the NMI handler runs. Otherwise we may * cause NMI nesting, maybe breaking the guest. But as this is * highly unlikely, we can live with the residual risk. 
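* If the guest does keep IRQs disabled, __vmx_handle_exit() breaks out * of the software-tracked NMI-blocked state after one second, see the * vnmi_blocked_time handling there.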
*/ vmx->loaded_vmcs->soft_vnmi_blocked = 1; vmx->loaded_vmcs->vnmi_blocked_time = 0; } ++vcpu->stat.nmi_injections; vmx->loaded_vmcs->nmi_known_unmasked = false; if (vmx->rmode.vm86_active) { kvm_inject_realmode_interrupt(vcpu, NMI_VECTOR, 0); return; } vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR); vmx_clear_hlt(vcpu); } bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); bool masked; if (!enable_vnmi) return vmx->loaded_vmcs->soft_vnmi_blocked; if (vmx->loaded_vmcs->nmi_known_unmasked) return false; masked = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_NMI; vmx->loaded_vmcs->nmi_known_unmasked = !masked; return masked; } void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) { struct vcpu_vmx *vmx = to_vmx(vcpu); if (!enable_vnmi) { if (vmx->loaded_vmcs->soft_vnmi_blocked != masked) { vmx->loaded_vmcs->soft_vnmi_blocked = masked; vmx->loaded_vmcs->vnmi_blocked_time = 0; } } else { vmx->loaded_vmcs->nmi_known_unmasked = !masked; if (masked) vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); else vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); } } bool vmx_nmi_blocked(struct kvm_vcpu *vcpu) { if (is_guest_mode(vcpu) && nested_exit_on_nmi(vcpu)) return false; if (!enable_vnmi && to_vmx(vcpu)->loaded_vmcs->soft_vnmi_blocked) return true; return (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_STI | GUEST_INTR_STATE_NMI)); } int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection) { if (to_vmx(vcpu)->nested.nested_run_pending) return -EBUSY; /* An NMI must not be injected into L2 if it's supposed to VM-Exit. */ if (for_injection && is_guest_mode(vcpu) && nested_exit_on_nmi(vcpu)) return -EBUSY; return !vmx_nmi_blocked(vcpu); } bool __vmx_interrupt_blocked(struct kvm_vcpu *vcpu) { return !(vmx_get_rflags(vcpu) & X86_EFLAGS_IF) || (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)); } bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu) { if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) return false; return __vmx_interrupt_blocked(vcpu); } int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) { if (to_vmx(vcpu)->nested.nested_run_pending) return -EBUSY; /* * An IRQ must not be injected into L2 if it's supposed to VM-Exit, * e.g. if the IRQ arrived asynchronously after checking nested events. */ if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) return -EBUSY; return !vmx_interrupt_blocked(vcpu); } int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr) { void __user *ret; if (enable_unrestricted_guest) return 0; mutex_lock(&kvm->slots_lock); ret = __x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, addr, PAGE_SIZE * 3); mutex_unlock(&kvm->slots_lock); if (IS_ERR(ret)) return PTR_ERR(ret); to_kvm_vmx(kvm)->tss_addr = addr; return init_rmode_tss(kvm, ret); } int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) { to_kvm_vmx(kvm)->ept_identity_map_addr = ident_addr; return 0; } static bool rmode_exception(struct kvm_vcpu *vcpu, int vec) { switch (vec) { case BP_VECTOR: /* * Update instruction length as we may reinject the exception * from user space while in guest debugging mode. 
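* The saved length is consumed via VM_ENTRY_INSTRUCTION_LEN when the * #BP is reinjected as a software exception, which requires a valid * instruction length on VM-Entry.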
*/ to_vmx(vcpu)->vcpu.arch.event_exit_inst_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) return false; fallthrough; case DB_VECTOR: return !(vcpu->guest_debug & (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)); case DE_VECTOR: case OF_VECTOR: case BR_VECTOR: case UD_VECTOR: case DF_VECTOR: case SS_VECTOR: case GP_VECTOR: case MF_VECTOR: return true; } return false; } static int handle_rmode_exception(struct kvm_vcpu *vcpu, int vec, u32 err_code) { /* * An instruction with the address size override prefix (opcode 0x67) * causes a #SS fault with error code 0 in VM86 mode. */ if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0) { if (kvm_emulate_instruction(vcpu, 0)) { if (vcpu->arch.halt_request) { vcpu->arch.halt_request = 0; return kvm_emulate_halt_noskip(vcpu); } return 1; } return 0; } /* * Forward all other exceptions that are valid in real mode. * FIXME: Breaks guest debugging in real mode, needs to be fixed with * the required debugging infrastructure rework. */ kvm_queue_exception(vcpu, vec); return 1; } static int handle_machine_check(struct kvm_vcpu *vcpu) { /* handled by vmx_vcpu_run() */ return 1; } /* * If the host has split lock detection disabled, then #AC is * unconditionally injected into the guest, which is the pre split lock * detection behaviour. * * If the host has split lock detection enabled then #AC is * only injected into the guest when: * - Guest CPL == 3 (user mode) * - Guest has #AC detection enabled in CR0 * - Guest EFLAGS has AC bit set */ bool vmx_guest_inject_ac(struct kvm_vcpu *vcpu) { if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) return true; return vmx_get_cpl(vcpu) == 3 && kvm_is_cr0_bit_set(vcpu, X86_CR0_AM) && (kvm_get_rflags(vcpu) & X86_EFLAGS_AC); } static bool is_xfd_nm_fault(struct kvm_vcpu *vcpu) { return vcpu->arch.guest_fpu.fpstate->xfd && !kvm_is_cr0_bit_set(vcpu, X86_CR0_TS); } static int handle_exception_nmi(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct kvm_run *kvm_run = vcpu->run; u32 intr_info, ex_no, error_code; unsigned long cr2, dr6; u32 vect_info; vect_info = vmx->idt_vectoring_info; intr_info = vmx_get_intr_info(vcpu); /* * Machine checks are handled by handle_exception_irqoff(), or by * vmx_vcpu_run() if a #MC occurs on VM-Entry. NMIs are handled by * vmx_vcpu_enter_exit(). */ if (is_machine_check(intr_info) || is_nmi(intr_info)) return 1; /* * Queue the exception here instead of in handle_nm_fault_irqoff(). * This ensures the nested_vmx check is not skipped, so the vmexit can * be reflected to L1 (when it intercepts #NM) before reaching this * point. */ if (is_nm_fault(intr_info)) { kvm_queue_exception_p(vcpu, NM_VECTOR, is_xfd_nm_fault(vcpu) ? vcpu->arch.guest_fpu.xfd_err : 0); return 1; } if (is_invalid_opcode(intr_info)) return handle_ud(vcpu); if (WARN_ON_ONCE(is_ve_fault(intr_info))) { struct vmx_ve_information *ve_info = vmx->ve_info; WARN_ONCE(ve_info->exit_reason != EXIT_REASON_EPT_VIOLATION, "Unexpected #VE on VM-Exit reason 0x%x", ve_info->exit_reason); dump_vmcs(vcpu); kvm_mmu_print_sptes(vcpu, ve_info->guest_physical_address, "#VE"); return 1; } error_code = 0; if (intr_info & INTR_INFO_DELIVER_CODE_MASK) error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE); if (!vmx->rmode.vm86_active && is_gp_fault(intr_info)) { WARN_ON_ONCE(!enable_vmware_backdoor); /* * VMware backdoor emulation on #GP interception only handles * IN{S}, OUT{S}, and RDPMC, none of which generate a non-zero * error code on #GP.
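* A non-zero error code therefore cannot come from a backdoor access * and is simply forwarded to the guest, as done below.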
*/ if (error_code) { kvm_queue_exception_e(vcpu, GP_VECTOR, error_code); return 1; } return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP); } /* * The #PF with PFEC.RSVD = 1 indicates the guest is accessing * MMIO; it is better to report an internal error. * See the comments in vmx_handle_exit. */ if ((vect_info & VECTORING_INFO_VALID_MASK) && !(is_page_fault(intr_info) && !(error_code & PFERR_RSVD_MASK))) { vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_SIMUL_EX; vcpu->run->internal.ndata = 4; vcpu->run->internal.data[0] = vect_info; vcpu->run->internal.data[1] = intr_info; vcpu->run->internal.data[2] = error_code; vcpu->run->internal.data[3] = vcpu->arch.last_vmentry_cpu; return 0; } if (is_page_fault(intr_info)) { cr2 = vmx_get_exit_qual(vcpu); if (enable_ept && !vcpu->arch.apf.host_apf_flags) { /* * EPT will cause a page fault only if we need to * detect illegal GPAs. */ WARN_ON_ONCE(!allow_smaller_maxphyaddr); kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code); return 1; } else return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0); } ex_no = intr_info & INTR_INFO_VECTOR_MASK; if (vmx->rmode.vm86_active && rmode_exception(vcpu, ex_no)) return handle_rmode_exception(vcpu, ex_no, error_code); switch (ex_no) { case DB_VECTOR: dr6 = vmx_get_exit_qual(vcpu); if (!(vcpu->guest_debug & (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) { /* * If the #DB was due to ICEBP, a.k.a. INT1, skip the * instruction. ICEBP generates a trap-like #DB, but, * despite its interception control being tied to #DB, * it is an instruction intercept, i.e. the VM-Exit occurs * on the ICEBP itself. Use the inner "skip" helper to * avoid single-step #DB and MTF updates, as ICEBP is * higher priority. Note, skipping ICEBP still clears * STI and MOVSS blocking. * * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS * if single-step is enabled in RFLAGS and STI or MOVSS * blocking is active, as the CPU doesn't set the bit * on VM-Exit due to #DB interception. VM-Entry has a * consistency check that a single-step #DB is pending * in this scenario as the previous instruction cannot * have toggled RFLAGS.TF 0=>1 (because STI and POP/MOV * don't modify RFLAGS), therefore the one instruction * delay when activating single-step breakpoints must * have already expired. Note, the CPU sets/clears BS * as appropriate for all other VM-Exit types. */ if (is_icebp(intr_info)) WARN_ON(!skip_emulated_instruction(vcpu)); else if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) && (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS))) vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS); kvm_queue_exception_p(vcpu, DB_VECTOR, dr6); return 1; } kvm_run->debug.arch.dr6 = dr6 | DR6_ACTIVE_LOW; kvm_run->debug.arch.dr7 = vmcs_readl(GUEST_DR7); fallthrough; case BP_VECTOR: /* * Update instruction length as we may reinject #BP from * user space while in guest debugging mode. Reading it for * #DB as well causes no harm; it is not used in that case. */ vmx->vcpu.arch.event_exit_inst_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); kvm_run->exit_reason = KVM_EXIT_DEBUG; kvm_run->debug.arch.pc = kvm_get_linear_rip(vcpu); kvm_run->debug.arch.exception = ex_no; break; case AC_VECTOR: if (vmx_guest_inject_ac(vcpu)) { kvm_queue_exception_e(vcpu, AC_VECTOR, error_code); return 1; } /* * Handle split lock.
Depending on detection mode this will * either warn and disable split lock detection for this * task or force SIGBUS on it. */ if (handle_guest_split_lock(kvm_rip_read(vcpu))) return 1; fallthrough; default: kvm_run->exit_reason = KVM_EXIT_EXCEPTION; kvm_run->ex.exception = ex_no; kvm_run->ex.error_code = error_code; break; } return 0; } static __always_inline int handle_external_interrupt(struct kvm_vcpu *vcpu) { ++vcpu->stat.irq_exits; return 1; } static int handle_triple_fault(struct kvm_vcpu *vcpu) { vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; vcpu->mmio_needed = 0; return 0; } static int handle_io(struct kvm_vcpu *vcpu) { unsigned long exit_qualification; int size, in, string; unsigned port; exit_qualification = vmx_get_exit_qual(vcpu); string = (exit_qualification & 16) != 0; ++vcpu->stat.io_exits; if (string) return kvm_emulate_instruction(vcpu, 0); port = exit_qualification >> 16; size = (exit_qualification & 7) + 1; in = (exit_qualification & 8) != 0; return kvm_fast_pio(vcpu, size, port, in); } void vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall) { /* * Patch in the VMCALL instruction: */ hypercall[0] = 0x0f; hypercall[1] = 0x01; hypercall[2] = 0xc1; } /* called to set cr0 as appropriate for a mov-to-cr0 exit. */ static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val) { if (is_guest_mode(vcpu)) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); unsigned long orig_val = val; /* * We get here when L2 changed cr0 in a way that did not change * any of L1's shadowed bits (see nested_vmx_exit_handled_cr), * but did change L0 shadowed bits. So we first calculate the * effective cr0 value that L1 would like to write into the * hardware. It consists of the L2-owned bits from the new * value combined with the L1-owned bits from L1's guest_cr0. */ val = (val & ~vmcs12->cr0_guest_host_mask) | (vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask); if (kvm_set_cr0(vcpu, val)) return 1; vmcs_writel(CR0_READ_SHADOW, orig_val); return 0; } else { return kvm_set_cr0(vcpu, val); } } static int handle_set_cr4(struct kvm_vcpu *vcpu, unsigned long val) { if (is_guest_mode(vcpu)) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); unsigned long orig_val = val; /* analogously to handle_set_cr0 */ val = (val & ~vmcs12->cr4_guest_host_mask) | (vmcs12->guest_cr4 & vmcs12->cr4_guest_host_mask); if (kvm_set_cr4(vcpu, val)) return 1; vmcs_writel(CR4_READ_SHADOW, orig_val); return 0; } else return kvm_set_cr4(vcpu, val); } static int handle_desc(struct kvm_vcpu *vcpu) { /* * UMIP emulation relies on intercepting writes to CR4.UMIP, i.e. this * and other code needs to be updated if UMIP can be guest owned. 
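* Descriptor-table exiting (SECONDARY_EXEC_DESC) is enabled only to * emulate UMIP on hardware without native UMIP support, hence the * sanity check on CR4.UMIP below.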
*/ BUILD_BUG_ON(KVM_POSSIBLE_CR4_GUEST_BITS & X86_CR4_UMIP); WARN_ON_ONCE(!kvm_is_cr4_bit_set(vcpu, X86_CR4_UMIP)); return kvm_emulate_instruction(vcpu, 0); } static int handle_cr(struct kvm_vcpu *vcpu) { unsigned long exit_qualification, val; int cr; int reg; int err; int ret; exit_qualification = vmx_get_exit_qual(vcpu); cr = exit_qualification & 15; reg = (exit_qualification >> 8) & 15; switch ((exit_qualification >> 4) & 3) { case 0: /* mov to cr */ val = kvm_register_read(vcpu, reg); trace_kvm_cr_write(cr, val); switch (cr) { case 0: err = handle_set_cr0(vcpu, val); return kvm_complete_insn_gp(vcpu, err); case 3: WARN_ON_ONCE(enable_unrestricted_guest); err = kvm_set_cr3(vcpu, val); return kvm_complete_insn_gp(vcpu, err); case 4: err = handle_set_cr4(vcpu, val); return kvm_complete_insn_gp(vcpu, err); case 8: { u8 cr8_prev = kvm_get_cr8(vcpu); u8 cr8 = (u8)val; err = kvm_set_cr8(vcpu, cr8); ret = kvm_complete_insn_gp(vcpu, err); if (lapic_in_kernel(vcpu)) return ret; if (cr8_prev <= cr8) return ret; /* * TODO: we might be squashing a * KVM_GUESTDBG_SINGLESTEP-triggered * KVM_EXIT_DEBUG here. */ vcpu->run->exit_reason = KVM_EXIT_SET_TPR; return 0; } } break; case 2: /* clts */ KVM_BUG(1, vcpu->kvm, "Guest always owns CR0.TS"); return -EIO; case 1: /*mov from cr*/ switch (cr) { case 3: WARN_ON_ONCE(enable_unrestricted_guest); val = kvm_read_cr3(vcpu); kvm_register_write(vcpu, reg, val); trace_kvm_cr_read(cr, val); return kvm_skip_emulated_instruction(vcpu); case 8: val = kvm_get_cr8(vcpu); kvm_register_write(vcpu, reg, val); trace_kvm_cr_read(cr, val); return kvm_skip_emulated_instruction(vcpu); } break; case 3: /* lmsw */ val = (exit_qualification >> LMSW_SOURCE_DATA_SHIFT) & 0x0f; trace_kvm_cr_write(0, (kvm_read_cr0_bits(vcpu, ~0xful) | val)); kvm_lmsw(vcpu, val); return kvm_skip_emulated_instruction(vcpu); default: break; } vcpu->run->exit_reason = 0; vcpu_unimpl(vcpu, "unhandled control register: op %d cr %d\n", (int)(exit_qualification >> 4) & 3, cr); return 0; } static int handle_dr(struct kvm_vcpu *vcpu) { unsigned long exit_qualification; int dr, dr7, reg; int err = 1; exit_qualification = vmx_get_exit_qual(vcpu); dr = exit_qualification & DEBUG_REG_ACCESS_NUM; /* First, if DR does not exist, trigger UD */ if (!kvm_require_dr(vcpu, dr)) return 1; if (vmx_get_cpl(vcpu) > 0) goto out; dr7 = vmcs_readl(GUEST_DR7); if (dr7 & DR7_GD) { /* * As the vm-exit takes precedence over the debug trap, we * need to emulate the latter, either for the host or the * guest debugging itself. */ if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) { vcpu->run->debug.arch.dr6 = DR6_BD | DR6_ACTIVE_LOW; vcpu->run->debug.arch.dr7 = dr7; vcpu->run->debug.arch.pc = kvm_get_linear_rip(vcpu); vcpu->run->debug.arch.exception = DB_VECTOR; vcpu->run->exit_reason = KVM_EXIT_DEBUG; return 0; } else { kvm_queue_exception_p(vcpu, DB_VECTOR, DR6_BD); return 1; } } if (vcpu->guest_debug == 0) { exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_MOV_DR_EXITING); /* * No more DR vmexits; force a reload of the debug registers * and reenter on this instruction. The next vmexit will * retrieve the full state of the debug registers. 
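* vmx_sync_dirty_debug_regs() undoes this: it reads the debug registers * back out of hardware and re-enables MOV-DR exiting.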
*/ vcpu->arch.switch_db_regs |= KVM_DEBUGREG_WONT_EXIT; return 1; } reg = DEBUG_REG_ACCESS_REG(exit_qualification); if (exit_qualification & TYPE_MOV_FROM_DR) { kvm_register_write(vcpu, reg, kvm_get_dr(vcpu, dr)); err = 0; } else { err = kvm_set_dr(vcpu, dr, kvm_register_read(vcpu, reg)); } out: return kvm_complete_insn_gp(vcpu, err); } void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) { get_debugreg(vcpu->arch.db[0], 0); get_debugreg(vcpu->arch.db[1], 1); get_debugreg(vcpu->arch.db[2], 2); get_debugreg(vcpu->arch.db[3], 3); get_debugreg(vcpu->arch.dr6, 6); vcpu->arch.dr7 = vmcs_readl(GUEST_DR7); vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_WONT_EXIT; exec_controls_setbit(to_vmx(vcpu), CPU_BASED_MOV_DR_EXITING); /* * exc_debug expects dr6 to be cleared after it runs; avoid letting it * see a stale dr6 from the guest. */ set_debugreg(DR6_RESERVED, 6); } void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) { vmcs_writel(GUEST_DR7, val); } static int handle_tpr_below_threshold(struct kvm_vcpu *vcpu) { kvm_apic_update_ppr(vcpu); return 1; } static int handle_interrupt_window(struct kvm_vcpu *vcpu) { exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_INTR_WINDOW_EXITING); kvm_make_request(KVM_REQ_EVENT, vcpu); ++vcpu->stat.irq_window_exits; return 1; } static int handle_invlpg(struct kvm_vcpu *vcpu) { unsigned long exit_qualification = vmx_get_exit_qual(vcpu); kvm_mmu_invlpg(vcpu, exit_qualification); return kvm_skip_emulated_instruction(vcpu); } static int handle_apic_access(struct kvm_vcpu *vcpu) { if (likely(fasteoi)) { unsigned long exit_qualification = vmx_get_exit_qual(vcpu); int access_type, offset; access_type = exit_qualification & APIC_ACCESS_TYPE; offset = exit_qualification & APIC_ACCESS_OFFSET; /* * A sane guest uses MOV to write the EOI register and ignores * the written value, so short-circuit here to avoid heavy * instruction emulation. */ if ((access_type == TYPE_LINEAR_APIC_INST_WRITE) && (offset == APIC_EOI)) { kvm_lapic_set_eoi(vcpu); return kvm_skip_emulated_instruction(vcpu); } } return kvm_emulate_instruction(vcpu, 0); } static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu) { unsigned long exit_qualification = vmx_get_exit_qual(vcpu); int vector = exit_qualification & 0xff; /* EOI-induced VM exit is trap-like, thus there is no need to adjust IP */ kvm_apic_set_eoi_accelerated(vcpu, vector); return 1; } static int handle_apic_write(struct kvm_vcpu *vcpu) { unsigned long exit_qualification = vmx_get_exit_qual(vcpu); /* * APIC-write VM-Exit is trap-like: KVM doesn't need to advance RIP, and * hardware has done any necessary aliasing, offset adjustments, etc... * for the access. I.e. the correct value has already been written to * the vAPIC page for the correct 16-byte chunk. KVM needs only to * retrieve the register value and emulate the access.
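* The exit qualification holds the page offset of the write; masking * with 0xff0 below yields the 16-byte-aligned register offset.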
*/ u32 offset = exit_qualification & 0xff0; kvm_apic_write_nodecode(vcpu, offset); return 1; } static int handle_task_switch(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long exit_qualification; bool has_error_code = false; u32 error_code = 0; u16 tss_selector; int reason, type, idt_v, idt_index; idt_v = (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK); idt_index = (vmx->idt_vectoring_info & VECTORING_INFO_VECTOR_MASK); type = (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK); exit_qualification = vmx_get_exit_qual(vcpu); reason = (u32)exit_qualification >> 30; if (reason == TASK_SWITCH_GATE && idt_v) { switch (type) { case INTR_TYPE_NMI_INTR: vcpu->arch.nmi_injected = false; vmx_set_nmi_mask(vcpu, true); break; case INTR_TYPE_EXT_INTR: case INTR_TYPE_SOFT_INTR: kvm_clear_interrupt_queue(vcpu); break; case INTR_TYPE_HARD_EXCEPTION: if (vmx->idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) { has_error_code = true; error_code = vmcs_read32(IDT_VECTORING_ERROR_CODE); } fallthrough; case INTR_TYPE_SOFT_EXCEPTION: kvm_clear_exception_queue(vcpu); break; default: break; } } tss_selector = exit_qualification; if (!idt_v || (type != INTR_TYPE_HARD_EXCEPTION && type != INTR_TYPE_EXT_INTR && type != INTR_TYPE_NMI_INTR)) WARN_ON(!skip_emulated_instruction(vcpu)); /* * TODO: What about debug traps on tss switch? * Are we supposed to inject them and update dr6? */ return kvm_task_switch(vcpu, tss_selector, type == INTR_TYPE_SOFT_INTR ? idt_index : -1, reason, has_error_code, error_code); } static int handle_ept_violation(struct kvm_vcpu *vcpu) { unsigned long exit_qualification = vmx_get_exit_qual(vcpu); gpa_t gpa; /* * EPT violation happened while executing iret from NMI, * "blocked by NMI" bit has to be set before next VM entry. * There are errata that may cause this bit to not be set: * AAK134, BY25. */ if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) && enable_vnmi && (exit_qualification & INTR_INFO_UNBLOCK_NMI)) vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); trace_kvm_page_fault(vcpu, gpa, exit_qualification); /* * Check that the GPA doesn't exceed physical memory limits, as that is * a guest page fault. We have to emulate the instruction here, because * if the illegal address is that of a paging structure, then * EPT_VIOLATION_ACC_WRITE bit is set. Alternatively, if supported we * would also use advanced VM-exit information for EPT violations to * reconstruct the page fault error code. */ if (unlikely(allow_smaller_maxphyaddr && !kvm_vcpu_is_legal_gpa(vcpu, gpa))) return kvm_emulate_instruction(vcpu, 0); return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification); } static int handle_ept_misconfig(struct kvm_vcpu *vcpu) { gpa_t gpa; if (vmx_check_emulate_instruction(vcpu, EMULTYPE_PF, NULL, 0)) return 1; /* * A nested guest cannot optimize MMIO vmexits, because we have an * nGPA here instead of the required GPA. 
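* The fast MMIO path below only works for writes whose data can be * ignored, e.g. doorbell kicks, which is why a zero-length write to the * dedicated KVM_FAST_MMIO_BUS suffices.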
*/ gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); if (!is_guest_mode(vcpu) && !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) { trace_kvm_fast_mmio(gpa); return kvm_skip_emulated_instruction(vcpu); } return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0); } static int handle_nmi_window(struct kvm_vcpu *vcpu) { if (KVM_BUG_ON(!enable_vnmi, vcpu->kvm)) return -EIO; exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING); ++vcpu->stat.nmi_window_exits; kvm_make_request(KVM_REQ_EVENT, vcpu); return 1; } /* * Returns true if emulation is required (due to the vCPU having invalid state * with unrestricted guest mode disabled) and KVM can't faithfully emulate the * current vCPU state. */ static bool vmx_unhandleable_emulation_required(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); if (!vmx->vt.emulation_required) return false; /* * It is architecturally impossible for emulation to be required when a * nested VM-Enter is pending completion, as VM-Enter will VM-Fail if * guest state is invalid and unrestricted guest is disabled, i.e. KVM * should synthesize VM-Fail instead of emulating L2 code. This path is * only reachable if userspace modifies L2 guest state after KVM has * performed the nested VM-Enter consistency checks. */ if (vmx->nested.nested_run_pending) return true; /* * KVM only supports emulating exceptions if the vCPU is in Real Mode. * If emulation is required, KVM can't perform a successful VM-Enter to * inject the exception. */ return !vmx->rmode.vm86_active && (kvm_is_exception_pending(vcpu) || vcpu->arch.exception.injected); } static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); bool intr_window_requested; unsigned count = 130; intr_window_requested = exec_controls_get(vmx) & CPU_BASED_INTR_WINDOW_EXITING; while (vmx->vt.emulation_required && count-- != 0) { if (intr_window_requested && !vmx_interrupt_blocked(vcpu)) return handle_interrupt_window(&vmx->vcpu); if (kvm_test_request(KVM_REQ_EVENT, vcpu)) return 1; if (!kvm_emulate_instruction(vcpu, 0)) return 0; if (vmx_unhandleable_emulation_required(vcpu)) { kvm_prepare_emulation_failure_exit(vcpu); return 0; } if (vcpu->arch.halt_request) { vcpu->arch.halt_request = 0; return kvm_emulate_halt_noskip(vcpu); } /* * Note, return 1 and not 0; vcpu_run() will invoke * xfer_to_guest_mode() which will create a proper return * code. */ if (__xfer_to_guest_mode_work_pending()) return 1; } return 1; } int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu) { if (vmx_unhandleable_emulation_required(vcpu)) { kvm_prepare_emulation_failure_exit(vcpu); return 0; } return 1; } /* * A PAUSE exit indicates a vcpu busy-waiting on a spinlock. KVM does not * enable plain PAUSE exiting, so we only get here on CPUs with * PAUSE-loop exiting. */ static int handle_pause(struct kvm_vcpu *vcpu) { if (!kvm_pause_in_guest(vcpu->kvm)) grow_ple_window(vcpu); /* * Intel SDM vol3 ch-25.1.3 says: the "PAUSE-loop exiting" * VM-execution control is ignored if CPL > 0. OTOH, KVM * never sets PAUSE_EXITING and only sets PLE if supported, * so the vcpu must be at CPL=0 if it gets a PAUSE exit.
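* Note, grow_ple_window() above backs off PAUSE-loop exiting for vCPUs * that trigger it frequently, to limit spurious exits.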
*/ kvm_vcpu_on_spin(vcpu, true); return kvm_skip_emulated_instruction(vcpu); } static int handle_monitor_trap(struct kvm_vcpu *vcpu) { return 1; } static int handle_invpcid(struct kvm_vcpu *vcpu) { u32 vmx_instruction_info; unsigned long type; gva_t gva; struct { u64 pcid; u64 gla; } operand; int gpr_index; if (!guest_cpu_cap_has(vcpu, X86_FEATURE_INVPCID)) { kvm_queue_exception(vcpu, UD_VECTOR); return 1; } vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); gpr_index = vmx_get_instr_info_reg2(vmx_instruction_info); type = kvm_register_read(vcpu, gpr_index); /* According to the Intel instruction reference, the memory operand * is read even if it isn't needed (e.g., for type==all) */ if (get_vmx_mem_address(vcpu, vmx_get_exit_qual(vcpu), vmx_instruction_info, false, sizeof(operand), &gva)) return 1; return kvm_handle_invpcid(vcpu, type, gva); } static int handle_pml_full(struct kvm_vcpu *vcpu) { unsigned long exit_qualification; trace_kvm_pml_full(vcpu->vcpu_id); exit_qualification = vmx_get_exit_qual(vcpu); /* * PML buffer FULL happened while executing iret from NMI, * "blocked by NMI" bit has to be set before next VM entry. */ if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) && enable_vnmi && (exit_qualification & INTR_INFO_UNBLOCK_NMI)) vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); /* * PML buffer already flushed at beginning of VMEXIT. Nothing to do * here, and there's no userspace involvement needed for PML. */ return 1; } static fastpath_t handle_fastpath_preemption_timer(struct kvm_vcpu *vcpu, bool force_immediate_exit) { struct vcpu_vmx *vmx = to_vmx(vcpu); /* * In the *extremely* unlikely scenario that this is a spurious VM-Exit * due to the timer expiring while it was "soft" disabled, just eat the * exit and re-enter the guest. */ if (unlikely(vmx->loaded_vmcs->hv_timer_soft_disabled)) return EXIT_FASTPATH_REENTER_GUEST; /* * If the timer expired because KVM used it to force an immediate exit, * then mission accomplished. */ if (force_immediate_exit) return EXIT_FASTPATH_EXIT_HANDLED; /* * If L2 is active, go down the slow path as emulating the guest timer * expiration likely requires synthesizing a nested VM-Exit. */ if (is_guest_mode(vcpu)) return EXIT_FASTPATH_NONE; kvm_lapic_expired_hv_timer(vcpu); return EXIT_FASTPATH_REENTER_GUEST; } static int handle_preemption_timer(struct kvm_vcpu *vcpu) { /* * This non-fastpath handler is reached if and only if the preemption * timer was being used to emulate a guest timer while L2 is active. * All other scenarios are supposed to be handled in the fastpath. */ WARN_ON_ONCE(!is_guest_mode(vcpu)); kvm_lapic_expired_hv_timer(vcpu); return 1; } /* * When nested=0, all VMX instruction VM Exits filter here. The handlers * are overwritten by nested_vmx_hardware_setup() when nested=1. */ static int handle_vmx_instruction(struct kvm_vcpu *vcpu) { kvm_queue_exception(vcpu, UD_VECTOR); return 1; } #ifndef CONFIG_X86_SGX_KVM static int handle_encls(struct kvm_vcpu *vcpu) { /* * SGX virtualization is disabled. There is no software enable bit for * SGX, so KVM intercepts all ENCLS leafs and injects a #UD to prevent * the guest from executing ENCLS (when SGX is supported by hardware). */ kvm_queue_exception(vcpu, UD_VECTOR); return 1; } #endif /* CONFIG_X86_SGX_KVM */ static int handle_bus_lock_vmexit(struct kvm_vcpu *vcpu) { /* * Hardware may or may not set the BUS_LOCK_DETECTED flag on BUS_LOCK * VM-Exits. Unconditionally set the flag here and leave the handling to * vmx_handle_exit().
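* vmx_handle_exit() forwards the detection to userspace by setting * KVM_RUN_X86_BUS_LOCK in the run flags.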
*/ to_vt(vcpu)->exit_reason.bus_lock_detected = true; return 1; } static int handle_notify(struct kvm_vcpu *vcpu) { unsigned long exit_qual = vmx_get_exit_qual(vcpu); bool context_invalid = exit_qual & NOTIFY_VM_CONTEXT_INVALID; ++vcpu->stat.notify_window_exits; /* * Notify VM exit happened while executing iret from NMI, * "blocked by NMI" bit has to be set before next VM entry. */ if (enable_vnmi && (exit_qual & INTR_INFO_UNBLOCK_NMI)) vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); if (vcpu->kvm->arch.notify_vmexit_flags & KVM_X86_NOTIFY_VMEXIT_USER || context_invalid) { vcpu->run->exit_reason = KVM_EXIT_NOTIFY; vcpu->run->notify.flags = context_invalid ? KVM_NOTIFY_CONTEXT_INVALID : 0; return 0; } return 1; } /* * The exit handlers return 1 if the exit was handled fully and guest execution * may resume. Otherwise they set the kvm_run parameter to indicate what needs * to be done to userspace and return 0. */ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { [EXIT_REASON_EXCEPTION_NMI] = handle_exception_nmi, [EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt, [EXIT_REASON_TRIPLE_FAULT] = handle_triple_fault, [EXIT_REASON_NMI_WINDOW] = handle_nmi_window, [EXIT_REASON_IO_INSTRUCTION] = handle_io, [EXIT_REASON_CR_ACCESS] = handle_cr, [EXIT_REASON_DR_ACCESS] = handle_dr, [EXIT_REASON_CPUID] = kvm_emulate_cpuid, [EXIT_REASON_MSR_READ] = kvm_emulate_rdmsr, [EXIT_REASON_MSR_WRITE] = kvm_emulate_wrmsr, [EXIT_REASON_INTERRUPT_WINDOW] = handle_interrupt_window, [EXIT_REASON_HLT] = kvm_emulate_halt, [EXIT_REASON_INVD] = kvm_emulate_invd, [EXIT_REASON_INVLPG] = handle_invlpg, [EXIT_REASON_RDPMC] = kvm_emulate_rdpmc, [EXIT_REASON_VMCALL] = kvm_emulate_hypercall, [EXIT_REASON_VMCLEAR] = handle_vmx_instruction, [EXIT_REASON_VMLAUNCH] = handle_vmx_instruction, [EXIT_REASON_VMPTRLD] = handle_vmx_instruction, [EXIT_REASON_VMPTRST] = handle_vmx_instruction, [EXIT_REASON_VMREAD] = handle_vmx_instruction, [EXIT_REASON_VMRESUME] = handle_vmx_instruction, [EXIT_REASON_VMWRITE] = handle_vmx_instruction, [EXIT_REASON_VMOFF] = handle_vmx_instruction, [EXIT_REASON_VMON] = handle_vmx_instruction, [EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold, [EXIT_REASON_APIC_ACCESS] = handle_apic_access, [EXIT_REASON_APIC_WRITE] = handle_apic_write, [EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced, [EXIT_REASON_WBINVD] = kvm_emulate_wbinvd, [EXIT_REASON_XSETBV] = kvm_emulate_xsetbv, [EXIT_REASON_TASK_SWITCH] = handle_task_switch, [EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check, [EXIT_REASON_GDTR_IDTR] = handle_desc, [EXIT_REASON_LDTR_TR] = handle_desc, [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation, [EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig, [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause, [EXIT_REASON_MWAIT_INSTRUCTION] = kvm_emulate_mwait, [EXIT_REASON_MONITOR_TRAP_FLAG] = handle_monitor_trap, [EXIT_REASON_MONITOR_INSTRUCTION] = kvm_emulate_monitor, [EXIT_REASON_INVEPT] = handle_vmx_instruction, [EXIT_REASON_INVVPID] = handle_vmx_instruction, [EXIT_REASON_RDRAND] = kvm_handle_invalid_op, [EXIT_REASON_RDSEED] = kvm_handle_invalid_op, [EXIT_REASON_PML_FULL] = handle_pml_full, [EXIT_REASON_INVPCID] = handle_invpcid, [EXIT_REASON_VMFUNC] = handle_vmx_instruction, [EXIT_REASON_PREEMPTION_TIMER] = handle_preemption_timer, [EXIT_REASON_ENCLS] = handle_encls, [EXIT_REASON_BUS_LOCK] = handle_bus_lock_vmexit, [EXIT_REASON_NOTIFY] = handle_notify, }; static const int kvm_vmx_max_exit_handlers = ARRAY_SIZE(kvm_vmx_exit_handlers); void 
vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) { struct vcpu_vmx *vmx = to_vmx(vcpu); *reason = vmx->vt.exit_reason.full; *info1 = vmx_get_exit_qual(vcpu); if (!(vmx->vt.exit_reason.failed_vmentry)) { *info2 = vmx->idt_vectoring_info; *intr_info = vmx_get_intr_info(vcpu); if (is_exception_with_error_code(*intr_info)) *error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE); else *error_code = 0; } else { *info2 = 0; *intr_info = 0; *error_code = 0; } } void vmx_get_entry_info(struct kvm_vcpu *vcpu, u32 *intr_info, u32 *error_code) { *intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD); if (is_exception_with_error_code(*intr_info)) *error_code = vmcs_read32(VM_ENTRY_EXCEPTION_ERROR_CODE); else *error_code = 0; } static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx) { if (vmx->pml_pg) { __free_page(vmx->pml_pg); vmx->pml_pg = NULL; } } static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); u16 pml_idx, pml_tail_index; u64 *pml_buf; int i; pml_idx = vmcs_read16(GUEST_PML_INDEX); /* Do nothing if PML buffer is empty */ if (pml_idx == PML_HEAD_INDEX) return; /* * PML index always points to the next available PML buffer entity * unless PML log has just overflowed. */ pml_tail_index = (pml_idx >= PML_LOG_NR_ENTRIES) ? 0 : pml_idx + 1; /* * PML log is written backwards: the CPU first writes the entry 511 * then the entry 510, and so on. * * Read the entries in the same order they were written, to ensure that * the dirty ring is filled in the same order the CPU wrote them. */ pml_buf = page_address(vmx->pml_pg); for (i = PML_HEAD_INDEX; i >= pml_tail_index; i--) { u64 gpa; gpa = pml_buf[i]; WARN_ON(gpa & (PAGE_SIZE - 1)); kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT); } /* reset PML index */ vmcs_write16(GUEST_PML_INDEX, PML_HEAD_INDEX); } static void vmx_dump_sel(char *name, uint32_t sel) { pr_err("%s sel=0x%04x, attr=0x%05x, limit=0x%08x, base=0x%016lx\n", name, vmcs_read16(sel), vmcs_read32(sel + GUEST_ES_AR_BYTES - GUEST_ES_SELECTOR), vmcs_read32(sel + GUEST_ES_LIMIT - GUEST_ES_SELECTOR), vmcs_readl(sel + GUEST_ES_BASE - GUEST_ES_SELECTOR)); } static void vmx_dump_dtsel(char *name, uint32_t limit) { pr_err("%s limit=0x%08x, base=0x%016lx\n", name, vmcs_read32(limit), vmcs_readl(limit + GUEST_GDTR_BASE - GUEST_GDTR_LIMIT)); } static void vmx_dump_msrs(char *name, struct vmx_msrs *m) { unsigned int i; struct vmx_msr_entry *e; pr_err("MSR %s:\n", name); for (i = 0, e = m->val; i < m->nr; ++i, ++e) pr_err(" %2d: msr=0x%08x value=0x%016llx\n", i, e->index, e->value); } void dump_vmcs(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 vmentry_ctl, vmexit_ctl; u32 cpu_based_exec_ctrl, pin_based_exec_ctrl, secondary_exec_control; u64 tertiary_exec_control; unsigned long cr4; int efer_slot; if (!dump_invalid_vmcs) { pr_warn_ratelimited("set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.\n"); return; } vmentry_ctl = vmcs_read32(VM_ENTRY_CONTROLS); vmexit_ctl = vmcs_read32(VM_EXIT_CONTROLS); cpu_based_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL); cr4 = vmcs_readl(GUEST_CR4); if (cpu_has_secondary_exec_ctrls()) secondary_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL); else secondary_exec_control = 0; if (cpu_has_tertiary_exec_ctrls()) tertiary_exec_control = vmcs_read64(TERTIARY_VM_EXEC_CONTROL); else tertiary_exec_control = 0; pr_err("VMCS %p, last attempted VM-entry on CPU %d\n", vmx->loaded_vmcs->vmcs, 
vcpu->arch.last_vmentry_cpu); pr_err("*** Guest State ***\n"); pr_err("CR0: actual=0x%016lx, shadow=0x%016lx, gh_mask=%016lx\n", vmcs_readl(GUEST_CR0), vmcs_readl(CR0_READ_SHADOW), vmcs_readl(CR0_GUEST_HOST_MASK)); pr_err("CR4: actual=0x%016lx, shadow=0x%016lx, gh_mask=%016lx\n", cr4, vmcs_readl(CR4_READ_SHADOW), vmcs_readl(CR4_GUEST_HOST_MASK)); pr_err("CR3 = 0x%016lx\n", vmcs_readl(GUEST_CR3)); if (cpu_has_vmx_ept()) { pr_err("PDPTR0 = 0x%016llx PDPTR1 = 0x%016llx\n", vmcs_read64(GUEST_PDPTR0), vmcs_read64(GUEST_PDPTR1)); pr_err("PDPTR2 = 0x%016llx PDPTR3 = 0x%016llx\n", vmcs_read64(GUEST_PDPTR2), vmcs_read64(GUEST_PDPTR3)); } pr_err("RSP = 0x%016lx RIP = 0x%016lx\n", vmcs_readl(GUEST_RSP), vmcs_readl(GUEST_RIP)); pr_err("RFLAGS=0x%08lx DR7 = 0x%016lx\n", vmcs_readl(GUEST_RFLAGS), vmcs_readl(GUEST_DR7)); pr_err("Sysenter RSP=%016lx CS:RIP=%04x:%016lx\n", vmcs_readl(GUEST_SYSENTER_ESP), vmcs_read32(GUEST_SYSENTER_CS), vmcs_readl(GUEST_SYSENTER_EIP)); vmx_dump_sel("CS: ", GUEST_CS_SELECTOR); vmx_dump_sel("DS: ", GUEST_DS_SELECTOR); vmx_dump_sel("SS: ", GUEST_SS_SELECTOR); vmx_dump_sel("ES: ", GUEST_ES_SELECTOR); vmx_dump_sel("FS: ", GUEST_FS_SELECTOR); vmx_dump_sel("GS: ", GUEST_GS_SELECTOR); vmx_dump_dtsel("GDTR:", GUEST_GDTR_LIMIT); vmx_dump_sel("LDTR:", GUEST_LDTR_SELECTOR); vmx_dump_dtsel("IDTR:", GUEST_IDTR_LIMIT); vmx_dump_sel("TR: ", GUEST_TR_SELECTOR); efer_slot = vmx_find_loadstore_msr_slot(&vmx->msr_autoload.guest, MSR_EFER); if (vmentry_ctl & VM_ENTRY_LOAD_IA32_EFER) pr_err("EFER= 0x%016llx\n", vmcs_read64(GUEST_IA32_EFER)); else if (efer_slot >= 0) pr_err("EFER= 0x%016llx (autoload)\n", vmx->msr_autoload.guest.val[efer_slot].value); else if (vmentry_ctl & VM_ENTRY_IA32E_MODE) pr_err("EFER= 0x%016llx (effective)\n", vcpu->arch.efer | (EFER_LMA | EFER_LME)); else pr_err("EFER= 0x%016llx (effective)\n", vcpu->arch.efer & ~(EFER_LMA | EFER_LME)); if (vmentry_ctl & VM_ENTRY_LOAD_IA32_PAT) pr_err("PAT = 0x%016llx\n", vmcs_read64(GUEST_IA32_PAT)); pr_err("DebugCtl = 0x%016llx DebugExceptions = 0x%016lx\n", vmcs_read64(GUEST_IA32_DEBUGCTL), vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS)); if (cpu_has_load_perf_global_ctrl() && vmentry_ctl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) pr_err("PerfGlobCtl = 0x%016llx\n", vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL)); if (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS) pr_err("BndCfgS = 0x%016llx\n", vmcs_read64(GUEST_BNDCFGS)); pr_err("Interruptibility = %08x ActivityState = %08x\n", vmcs_read32(GUEST_INTERRUPTIBILITY_INFO), vmcs_read32(GUEST_ACTIVITY_STATE)); if (secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) pr_err("InterruptStatus = %04x\n", vmcs_read16(GUEST_INTR_STATUS)); if (vmcs_read32(VM_ENTRY_MSR_LOAD_COUNT) > 0) vmx_dump_msrs("guest autoload", &vmx->msr_autoload.guest); if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0) vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest); pr_err("*** Host State ***\n"); pr_err("RIP = 0x%016lx RSP = 0x%016lx\n", vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP)); pr_err("CS=%04x SS=%04x DS=%04x ES=%04x FS=%04x GS=%04x TR=%04x\n", vmcs_read16(HOST_CS_SELECTOR), vmcs_read16(HOST_SS_SELECTOR), vmcs_read16(HOST_DS_SELECTOR), vmcs_read16(HOST_ES_SELECTOR), vmcs_read16(HOST_FS_SELECTOR), vmcs_read16(HOST_GS_SELECTOR), vmcs_read16(HOST_TR_SELECTOR)); pr_err("FSBase=%016lx GSBase=%016lx TRBase=%016lx\n", vmcs_readl(HOST_FS_BASE), vmcs_readl(HOST_GS_BASE), vmcs_readl(HOST_TR_BASE)); pr_err("GDTBase=%016lx IDTBase=%016lx\n", vmcs_readl(HOST_GDTR_BASE), vmcs_readl(HOST_IDTR_BASE)); pr_err("CR0=%016lx CR3=%016lx 
CR4=%016lx\n", vmcs_readl(HOST_CR0), vmcs_readl(HOST_CR3), vmcs_readl(HOST_CR4)); pr_err("Sysenter RSP=%016lx CS:RIP=%04x:%016lx\n", vmcs_readl(HOST_IA32_SYSENTER_ESP), vmcs_read32(HOST_IA32_SYSENTER_CS), vmcs_readl(HOST_IA32_SYSENTER_EIP)); if (vmexit_ctl & VM_EXIT_LOAD_IA32_EFER) pr_err("EFER= 0x%016llx\n", vmcs_read64(HOST_IA32_EFER)); if (vmexit_ctl & VM_EXIT_LOAD_IA32_PAT) pr_err("PAT = 0x%016llx\n", vmcs_read64(HOST_IA32_PAT)); if (cpu_has_load_perf_global_ctrl() && vmexit_ctl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) pr_err("PerfGlobCtl = 0x%016llx\n", vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL)); if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0) vmx_dump_msrs("host autoload", &vmx->msr_autoload.host); pr_err("*** Control State ***\n"); pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n", cpu_based_exec_ctrl, secondary_exec_control, tertiary_exec_control); pr_err("PinBased=0x%08x EntryControls=%08x ExitControls=%08x\n", pin_based_exec_ctrl, vmentry_ctl, vmexit_ctl); pr_err("ExceptionBitmap=%08x PFECmask=%08x PFECmatch=%08x\n", vmcs_read32(EXCEPTION_BITMAP), vmcs_read32(PAGE_FAULT_ERROR_CODE_MASK), vmcs_read32(PAGE_FAULT_ERROR_CODE_MATCH)); pr_err("VMEntry: intr_info=%08x errcode=%08x ilen=%08x\n", vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), vmcs_read32(VM_ENTRY_EXCEPTION_ERROR_CODE), vmcs_read32(VM_ENTRY_INSTRUCTION_LEN)); pr_err("VMExit: intr_info=%08x errcode=%08x ilen=%08x\n", vmcs_read32(VM_EXIT_INTR_INFO), vmcs_read32(VM_EXIT_INTR_ERROR_CODE), vmcs_read32(VM_EXIT_INSTRUCTION_LEN)); pr_err(" reason=%08x qualification=%016lx\n", vmcs_read32(VM_EXIT_REASON), vmcs_readl(EXIT_QUALIFICATION)); pr_err("IDTVectoring: info=%08x errcode=%08x\n", vmcs_read32(IDT_VECTORING_INFO_FIELD), vmcs_read32(IDT_VECTORING_ERROR_CODE)); pr_err("TSC Offset = 0x%016llx\n", vmcs_read64(TSC_OFFSET)); if (secondary_exec_control & SECONDARY_EXEC_TSC_SCALING) pr_err("TSC Multiplier = 0x%016llx\n", vmcs_read64(TSC_MULTIPLIER)); if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW) { if (secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) { u16 status = vmcs_read16(GUEST_INTR_STATUS); pr_err("SVI|RVI = %02x|%02x ", status >> 8, status & 0xff); } pr_cont("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD)); if (secondary_exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) pr_err("APIC-access addr = 0x%016llx ", vmcs_read64(APIC_ACCESS_ADDR)); pr_cont("virt-APIC addr = 0x%016llx\n", vmcs_read64(VIRTUAL_APIC_PAGE_ADDR)); } if (pin_based_exec_ctrl & PIN_BASED_POSTED_INTR) pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV)); if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT)) pr_err("EPT pointer = 0x%016llx\n", vmcs_read64(EPT_POINTER)); if (secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING) pr_err("PLE Gap=%08x Window=%08x\n", vmcs_read32(PLE_GAP), vmcs_read32(PLE_WINDOW)); if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID) pr_err("Virtual processor ID = 0x%04x\n", vmcs_read16(VIRTUAL_PROCESSOR_ID)); if (secondary_exec_control & SECONDARY_EXEC_EPT_VIOLATION_VE) { struct vmx_ve_information *ve_info = vmx->ve_info; u64 ve_info_pa = vmcs_read64(VE_INFORMATION_ADDRESS); /* * If KVM is dumping the VMCS, then something has gone wrong * already. Derefencing an address from the VMCS, which could * very well be corrupted, is a terrible idea. The virtual * address is known so use it. */ pr_err("VE info address = 0x%016llx%s\n", ve_info_pa, ve_info_pa == __pa(ve_info) ? 
"" : "(corrupted!)"); pr_err("ve_info: 0x%08x 0x%08x 0x%016llx 0x%016llx 0x%016llx 0x%04x\n", ve_info->exit_reason, ve_info->delivery, ve_info->exit_qualification, ve_info->guest_linear_address, ve_info->guest_physical_address, ve_info->eptp_index); } } /* * The guest has exited. See if we can fix it or if we need userspace * assistance. */ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) { struct vcpu_vmx *vmx = to_vmx(vcpu); union vmx_exit_reason exit_reason = vmx_get_exit_reason(vcpu); u32 vectoring_info = vmx->idt_vectoring_info; u16 exit_handler_index; /* * Flush logged GPAs PML buffer, this will make dirty_bitmap more * updated. Another good is, in kvm_vm_ioctl_get_dirty_log, before * querying dirty_bitmap, we only need to kick all vcpus out of guest * mode as if vcpus is in root mode, the PML buffer must has been * flushed already. Note, PML is never enabled in hardware while * running L2. */ if (enable_pml && !is_guest_mode(vcpu)) vmx_flush_pml_buffer(vcpu); /* * KVM should never reach this point with a pending nested VM-Enter. * More specifically, short-circuiting VM-Entry to emulate L2 due to * invalid guest state should never happen as that means KVM knowingly * allowed a nested VM-Enter with an invalid vmcs12. More below. */ if (KVM_BUG_ON(vmx->nested.nested_run_pending, vcpu->kvm)) return -EIO; if (is_guest_mode(vcpu)) { /* * PML is never enabled when running L2, bail immediately if a * PML full exit occurs as something is horribly wrong. */ if (exit_reason.basic == EXIT_REASON_PML_FULL) goto unexpected_vmexit; /* * The host physical addresses of some pages of guest memory * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC * Page). The CPU may write to these pages via their host * physical address while L2 is running, bypassing any * address-translation-based dirty tracking (e.g. EPT write * protection). * * Mark them dirty on every exit from L2 to prevent them from * getting out of sync with dirty tracking. */ nested_mark_vmcs12_pages_dirty(vcpu); /* * Synthesize a triple fault if L2 state is invalid. In normal * operation, nested VM-Enter rejects any attempt to enter L2 * with invalid state. However, those checks are skipped if * state is being stuffed via RSM or KVM_SET_NESTED_STATE. If * L2 state is invalid, it means either L1 modified SMRAM state * or userspace provided bad state. Synthesize TRIPLE_FAULT as * doing so is architecturally allowed in the RSM case, and is * the least awful solution for the userspace case without * risking false positives. */ if (vmx->vt.emulation_required) { nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0); return 1; } if (nested_vmx_reflect_vmexit(vcpu)) return 1; } /* If guest state is invalid, start emulating. L2 is handled above. 
*/ if (vmx->vt.emulation_required) return handle_invalid_guest_state(vcpu); if (exit_reason.failed_vmentry) { dump_vmcs(vcpu); vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY; vcpu->run->fail_entry.hardware_entry_failure_reason = exit_reason.full; vcpu->run->fail_entry.cpu = vcpu->arch.last_vmentry_cpu; return 0; } if (unlikely(vmx->fail)) { dump_vmcs(vcpu); vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY; vcpu->run->fail_entry.hardware_entry_failure_reason = vmcs_read32(VM_INSTRUCTION_ERROR); vcpu->run->fail_entry.cpu = vcpu->arch.last_vmentry_cpu; return 0; } if ((vectoring_info & VECTORING_INFO_VALID_MASK) && (exit_reason.basic != EXIT_REASON_EXCEPTION_NMI && exit_reason.basic != EXIT_REASON_EPT_VIOLATION && exit_reason.basic != EXIT_REASON_PML_FULL && exit_reason.basic != EXIT_REASON_APIC_ACCESS && exit_reason.basic != EXIT_REASON_TASK_SWITCH && exit_reason.basic != EXIT_REASON_NOTIFY && exit_reason.basic != EXIT_REASON_EPT_MISCONFIG)) { kvm_prepare_event_vectoring_exit(vcpu, INVALID_GPA); return 0; } if (unlikely(!enable_vnmi && vmx->loaded_vmcs->soft_vnmi_blocked)) { if (!vmx_interrupt_blocked(vcpu)) { vmx->loaded_vmcs->soft_vnmi_blocked = 0; } else if (vmx->loaded_vmcs->vnmi_blocked_time > 1000000000LL && vcpu->arch.nmi_pending) { /* * This CPU doesn't help us find the end of an * NMI-blocked window if the guest runs with IRQs * disabled. So we pull the trigger after 1 s of * futile waiting, but inform the user about this. */ printk(KERN_WARNING "%s: Breaking out of NMI-blocked " "state on VCPU %d after 1 s timeout\n", __func__, vcpu->vcpu_id); vmx->loaded_vmcs->soft_vnmi_blocked = 0; } } if (exit_fastpath != EXIT_FASTPATH_NONE) return 1; if (exit_reason.basic >= kvm_vmx_max_exit_handlers) goto unexpected_vmexit; #ifdef CONFIG_MITIGATION_RETPOLINE if (exit_reason.basic == EXIT_REASON_MSR_WRITE) return kvm_emulate_wrmsr(vcpu); else if (exit_reason.basic == EXIT_REASON_PREEMPTION_TIMER) return handle_preemption_timer(vcpu); else if (exit_reason.basic == EXIT_REASON_INTERRUPT_WINDOW) return handle_interrupt_window(vcpu); else if (exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT) return handle_external_interrupt(vcpu); else if (exit_reason.basic == EXIT_REASON_HLT) return kvm_emulate_halt(vcpu); else if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG) return handle_ept_misconfig(vcpu); #endif exit_handler_index = array_index_nospec((u16)exit_reason.basic, kvm_vmx_max_exit_handlers); if (!kvm_vmx_exit_handlers[exit_handler_index]) goto unexpected_vmexit; return kvm_vmx_exit_handlers[exit_handler_index](vcpu); unexpected_vmexit: vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n", exit_reason.full); dump_vmcs(vcpu); vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; vcpu->run->internal.ndata = 2; vcpu->run->internal.data[0] = exit_reason.full; vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; return 0; } int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) { int ret = __vmx_handle_exit(vcpu, exit_fastpath); /* * Exit to user space when a bus lock is detected, to inform userspace * that there is a bus lock in the guest. */ if (vmx_get_exit_reason(vcpu).bus_lock_detected) { if (ret > 0) vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK; vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK; return 0; } return ret; } /* * Software-based L1D cache flush, which is used when microcode providing * the cache control MSR is not loaded.
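* When the MSR is available, a single write of L1D_FLUSH to * MSR_IA32_FLUSH_CMD replaces all of this; see the X86_FEATURE_FLUSH_L1D * fast path below.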
* * The L1D cache is 32 KiB on Nehalem and later microarchitectures, but to * flush it is required to read in 64 KiB because the replacement algorithm * is not exactly LRU. This could be sized at runtime via topology * information but as all relevant affected CPUs have 32KiB L1D cache size * there is no point in doing so. */ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu) { int size = PAGE_SIZE << L1D_CACHE_ORDER; /* * This code is only executed when the flush mode is 'cond' or * 'always' */ if (static_branch_likely(&vmx_l1d_flush_cond)) { bool flush_l1d; /* * Clear the per-vcpu flush bit, it gets set again if the vCPU * is reloaded, i.e. if the vCPU is scheduled out or if KVM * exits to userspace, or if KVM reaches one of the unsafe * VMEXIT handlers, e.g. if KVM calls into the emulator. */ flush_l1d = vcpu->arch.l1tf_flush_l1d; vcpu->arch.l1tf_flush_l1d = false; /* * Clear the per-cpu flush bit, it gets set again from * the interrupt handlers. */ flush_l1d |= kvm_get_cpu_l1tf_flush_l1d(); kvm_clear_cpu_l1tf_flush_l1d(); if (!flush_l1d) return; } vcpu->stat.l1d_flush++; if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) { native_wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH); return; } asm volatile( /* First ensure the pages are in the TLB */ "xorl %%eax, %%eax\n" ".Lpopulate_tlb:\n\t" "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t" "addl $4096, %%eax\n\t" "cmpl %%eax, %[size]\n\t" "jne .Lpopulate_tlb\n\t" "xorl %%eax, %%eax\n\t" "cpuid\n\t" /* Now fill the cache */ "xorl %%eax, %%eax\n" ".Lfill_cache:\n" "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t" "addl $64, %%eax\n\t" "cmpl %%eax, %[size]\n\t" "jne .Lfill_cache\n\t" "lfence\n" :: [flush_pages] "r" (vmx_l1d_flush_pages), [size] "r" (size) : "eax", "ebx", "ecx", "edx"); } void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); int tpr_threshold; if (is_guest_mode(vcpu) && nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) return; tpr_threshold = (irr == -1 || tpr < irr) ? 0 : irr; if (is_guest_mode(vcpu)) to_vmx(vcpu)->nested.l1_tpr_threshold = tpr_threshold; else vmcs_write32(TPR_THRESHOLD, tpr_threshold); } void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 sec_exec_control; if (!lapic_in_kernel(vcpu)) return; if (!flexpriority_enabled && !cpu_has_vmx_virtualize_x2apic_mode()) return; /* Postpone execution until vmcs01 is the current VMCS. */ if (is_guest_mode(vcpu)) { vmx->nested.change_vmcs01_virtual_apic_mode = true; return; } sec_exec_control = secondary_exec_controls_get(vmx); sec_exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE); switch (kvm_get_apic_mode(vcpu)) { case LAPIC_MODE_INVALID: WARN_ONCE(true, "Invalid local APIC state"); break; case LAPIC_MODE_DISABLED: break; case LAPIC_MODE_XAPIC: if (flexpriority_enabled) { sec_exec_control |= SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); /* * Flush the TLB, reloading the APIC access page will * only do so if its physical address has changed, but * the guest may have inserted a non-APIC mapping into * the TLB while the APIC access page was disabled. 
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu); } break; case LAPIC_MODE_X2APIC: if (cpu_has_vmx_virtualize_x2apic_mode()) sec_exec_control |= SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; break; } secondary_exec_controls_set(vmx, sec_exec_control); vmx_update_msr_bitmap_x2apic(vcpu); } void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu) { const gfn_t gfn = APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT; struct kvm *kvm = vcpu->kvm; struct kvm_memslots *slots = kvm_memslots(kvm); struct kvm_memory_slot *slot; struct page *refcounted_page; unsigned long mmu_seq; kvm_pfn_t pfn; bool writable; /* Defer reload until vmcs01 is the current VMCS. */ if (is_guest_mode(vcpu)) { to_vmx(vcpu)->nested.reload_vmcs01_apic_access_page = true; return; } if (!(secondary_exec_controls_get(to_vmx(vcpu)) & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) return; /* * Explicitly grab the memslot using KVM's internal slot ID to ensure * KVM doesn't unintentionally grab a userspace memslot. It _should_ * be impossible for userspace to create a memslot for the APIC when * APICv is enabled, but paranoia won't hurt in this case. */ slot = id_to_memslot(slots, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT); if (!slot || slot->flags & KVM_MEMSLOT_INVALID) return; /* * Ensure that the mmu_notifier sequence count is read before KVM * retrieves the pfn from the primary MMU. Note, the memslot is * protected by SRCU, not the mmu_notifier. Pairs with the smp_wmb() * in kvm_mmu_invalidate_end(). */ mmu_seq = kvm->mmu_invalidate_seq; smp_rmb(); /* * No need to retry if the memslot does not exist or is invalid. KVM * controls the APIC-access page memslot, and only deletes the memslot * if APICv is permanently inhibited, i.e. the memslot won't reappear. */ pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, &writable, &refcounted_page); if (is_error_noslot_pfn(pfn)) return; read_lock(&vcpu->kvm->mmu_lock); if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn)) kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); else vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn)); /* * Do not pin the APIC access page in memory so that it can be freely * migrated; the MMU notifier will call us again if it is migrated or * swapped out. KVM backs the memslot with anonymous memory; the pfn * should always point at a refcounted page (if the pfn is valid). */ if (!WARN_ON_ONCE(!refcounted_page)) kvm_release_page_clean(refcounted_page); /* * No need for a manual TLB flush at this point; KVM has already done a * flush if there were SPTEs pointing at the previous page. */ read_unlock(&vcpu->kvm->mmu_lock); } void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr) { u16 status; u8 old; /* * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI * is relevant if and only if Virtual Interrupt Delivery is * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's * vAPIC, not L1's vAPIC. KVM must update vmcs01 on the next nested * VM-Exit, otherwise L1 will run with a stale SVI. */ if (is_guest_mode(vcpu)) { /* * KVM is supposed to forward intercepted L2 EOIs to L1 if VID * is enabled in vmcs12; as above, the EOIs affect L2's vAPIC. * Note, userspace can stuff state while L2 is active; assert * that VID is disabled if and only if the vCPU is in KVM_RUN * to avoid false positives if userspace is setting APIC state.
*/ WARN_ON_ONCE(vcpu->wants_to_run && nested_cpu_has_vid(get_vmcs12(vcpu))); to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true; return; } if (max_isr == -1) max_isr = 0; status = vmcs_read16(GUEST_INTR_STATUS); old = status >> 8; if (max_isr != old) { status &= 0xff; status |= max_isr << 8; vmcs_write16(GUEST_INTR_STATUS, status); } } static void vmx_set_rvi(int vector) { u16 status; u8 old; if (vector == -1) vector = 0; status = vmcs_read16(GUEST_INTR_STATUS); old = (u8)status & 0xff; if ((u8)vector != old) { status &= ~0xff; status |= (u8)vector; vmcs_write16(GUEST_INTR_STATUS, status); } } int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu) { struct vcpu_vt *vt = to_vt(vcpu); int max_irr; bool got_posted_interrupt; if (KVM_BUG_ON(!enable_apicv, vcpu->kvm)) return -EIO; if (pi_test_on(&vt->pi_desc)) { pi_clear_on(&vt->pi_desc); /* * IOMMU can write to PID.ON, so the barrier matters even on UP. * But on x86 this is just a compiler barrier anyway. */ smp_mb__after_atomic(); got_posted_interrupt = kvm_apic_update_irr(vcpu, vt->pi_desc.pir, &max_irr); } else { max_irr = kvm_lapic_find_highest_irr(vcpu); got_posted_interrupt = false; } /* * Newly recognized interrupts are injected via either virtual interrupt * delivery (RVI) or KVM_REQ_EVENT. Virtual interrupt delivery is * disabled in two cases: * * 1) If L2 is running and the vCPU has a new pending interrupt. If L1 * wants to exit on interrupts, KVM_REQ_EVENT is needed to synthesize a * VM-Exit to L1. If L1 doesn't want to exit, the interrupt is injected * into L2, but KVM doesn't use virtual interrupt delivery to inject * interrupts into L2, and so KVM_REQ_EVENT is again needed. * * 2) If APICv is disabled for this vCPU, assigned devices may still * attempt to post interrupts. The posted interrupt vector will cause * a VM-Exit and the subsequent entry will call sync_pir_to_irr. */ if (!is_guest_mode(vcpu) && kvm_vcpu_apicv_active(vcpu)) vmx_set_rvi(max_irr); else if (got_posted_interrupt) kvm_make_request(KVM_REQ_EVENT, vcpu); return max_irr; } void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) { if (!kvm_vcpu_apicv_active(vcpu)) return; vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]); vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]); vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]); vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]); } void vmx_do_interrupt_irqoff(unsigned long entry); void vmx_do_nmi_irqoff(void); static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) { /* * Save xfd_err to guest_fpu before interrupt is enabled, so the * MSR value is not clobbered by the host activity before the guest * has chance to consume it. * * Update the guest's XFD_ERR if and only if XFD is enabled, as the #NM * interception may have been caused by L1 interception. Per the SDM, * XFD_ERR is not modified for non-XFD #NM, i.e. if CR0.TS=1. * * Note, XFD_ERR is updated _before_ the #NM interception check, i.e. * unlike CR2 and DR6, the value is not a payload that is attached to * the #NM exception. 
*/ if (is_xfd_nm_fault(vcpu)) rdmsrq(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); } static void handle_exception_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) { /* if exit due to PF check for async PF */ if (is_page_fault(intr_info)) vcpu->arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags(); /* if exit due to NM, handle before interrupts are enabled */ else if (is_nm_fault(intr_info)) handle_nm_fault_irqoff(vcpu); /* Handle machine checks before interrupts are enabled */ else if (is_machine_check(intr_info)) kvm_machine_check(); } static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu, u32 intr_info) { unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK; if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm, "unexpected VM-Exit interrupt info: 0x%x", intr_info)) return; kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ); if (cpu_feature_enabled(X86_FEATURE_FRED)) fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector); else vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector)); kvm_after_interrupt(vcpu); vcpu->arch.at_instruction_boundary = true; } void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) { if (to_vt(vcpu)->emulation_required) return; if (vmx_get_exit_reason(vcpu).basic == EXIT_REASON_EXTERNAL_INTERRUPT) handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu)); else if (vmx_get_exit_reason(vcpu).basic == EXIT_REASON_EXCEPTION_NMI) handle_exception_irqoff(vcpu, vmx_get_intr_info(vcpu)); } /* * The kvm parameter can be NULL (module initialization, or invocation before * VM creation). Be sure to check the kvm parameter before using it. */ bool vmx_has_emulated_msr(struct kvm *kvm, u32 index) { switch (index) { case MSR_IA32_SMBASE: if (!IS_ENABLED(CONFIG_KVM_SMM)) return false; /* * We cannot do SMM unless we can run the guest in big * real mode. */ return enable_unrestricted_guest || emulate_invalid_guest_state; case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR: return nested; case MSR_AMD64_VIRT_SPEC_CTRL: case MSR_AMD64_TSC_RATIO: /* This is AMD only. */ return false; default: return true; } } static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx) { u32 exit_intr_info; bool unblock_nmi; u8 vector; bool idtv_info_valid; idtv_info_valid = vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK; if (enable_vnmi) { if (vmx->loaded_vmcs->nmi_known_unmasked) return; exit_intr_info = vmx_get_intr_info(&vmx->vcpu); unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0; vector = exit_intr_info & INTR_INFO_VECTOR_MASK; /* * SDM 3: 27.7.1.2 (September 2008) * Re-set bit "block by NMI" before VM entry if vmexit caused by * a guest IRET fault. * SDM 3: 23.2.2 (September 2008) * Bit 12 is undefined in any of the following cases: * If the VM exit sets the valid bit in the IDT-vectoring * information field. * If the VM exit is due to a double fault. 
*/ if ((exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi && vector != DF_VECTOR && !idtv_info_valid) vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); else vmx->loaded_vmcs->nmi_known_unmasked = !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_NMI); } else if (unlikely(vmx->loaded_vmcs->soft_vnmi_blocked)) vmx->loaded_vmcs->vnmi_blocked_time += ktime_to_ns(ktime_sub(ktime_get(), vmx->loaded_vmcs->entry_time)); } static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu, u32 idt_vectoring_info, int instr_len_field, int error_code_field) { u8 vector; int type; bool idtv_info_valid; idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK; vcpu->arch.nmi_injected = false; kvm_clear_exception_queue(vcpu); kvm_clear_interrupt_queue(vcpu); if (!idtv_info_valid) return; kvm_make_request(KVM_REQ_EVENT, vcpu); vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK; type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK; switch (type) { case INTR_TYPE_NMI_INTR: vcpu->arch.nmi_injected = true; /* * SDM 3: 27.7.1.2 (September 2008) * Clear bit "block by NMI" before VM entry if a NMI * delivery faulted. */ vmx_set_nmi_mask(vcpu, false); break; case INTR_TYPE_SOFT_EXCEPTION: vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field); fallthrough; case INTR_TYPE_HARD_EXCEPTION: { u32 error_code = 0; if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) error_code = vmcs_read32(error_code_field); kvm_requeue_exception(vcpu, vector, idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK, error_code); break; } case INTR_TYPE_SOFT_INTR: vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field); fallthrough; case INTR_TYPE_EXT_INTR: kvm_queue_interrupt(vcpu, vector, type == INTR_TYPE_SOFT_INTR); break; default: break; } } static void vmx_complete_interrupts(struct vcpu_vmx *vmx) { __vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info, VM_EXIT_INSTRUCTION_LEN, IDT_VECTORING_ERROR_CODE); } void vmx_cancel_injection(struct kvm_vcpu *vcpu) { __vmx_complete_interrupts(vcpu, vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), VM_ENTRY_INSTRUCTION_LEN, VM_ENTRY_EXCEPTION_ERROR_CODE); vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); } static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) { int i, nr_msrs; struct perf_guest_switch_msr *msrs; struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu); pmu->host_cross_mapped_mask = 0; if (pmu->pebs_enable & pmu->global_ctrl) intel_pmu_cross_mapped_check(pmu); /* Note, nr_msrs may be garbage if perf_guest_get_msrs() returns NULL. 
*/ msrs = perf_guest_get_msrs(&nr_msrs, (void *)pmu); if (!msrs) return; for (i = 0; i < nr_msrs; i++) if (msrs[i].host == msrs[i].guest) clear_atomic_switch_msr(vmx, msrs[i].msr); else add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest, msrs[i].host, false); } static void vmx_update_hv_timer(struct kvm_vcpu *vcpu, bool force_immediate_exit) { struct vcpu_vmx *vmx = to_vmx(vcpu); u64 tscl; u32 delta_tsc; if (force_immediate_exit) { vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, 0); vmx->loaded_vmcs->hv_timer_soft_disabled = false; } else if (vmx->hv_deadline_tsc != -1) { tscl = rdtsc(); if (vmx->hv_deadline_tsc > tscl) /* set_hv_timer ensures the delta fits in 32-bits */ delta_tsc = (u32)((vmx->hv_deadline_tsc - tscl) >> cpu_preemption_timer_multi); else delta_tsc = 0; vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, delta_tsc); vmx->loaded_vmcs->hv_timer_soft_disabled = false; } else if (!vmx->loaded_vmcs->hv_timer_soft_disabled) { vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, -1); vmx->loaded_vmcs->hv_timer_soft_disabled = true; } } void noinstr vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp) { if (unlikely(host_rsp != vmx->loaded_vmcs->host_state.rsp)) { vmx->loaded_vmcs->host_state.rsp = host_rsp; vmcs_writel(HOST_RSP, host_rsp); } } void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx, unsigned int flags) { u64 hostval = this_cpu_read(x86_spec_ctrl_current); if (!cpu_feature_enabled(X86_FEATURE_MSR_SPEC_CTRL)) return; if (flags & VMX_RUN_SAVE_SPEC_CTRL) vmx->spec_ctrl = native_rdmsrq(MSR_IA32_SPEC_CTRL); /* * If the guest/host SPEC_CTRL values differ, restore the host value. * * For legacy IBRS, the IBRS bit always needs to be written after * transitioning from a less privileged predictor mode, regardless of * whether the guest/host values differ. */ if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) || vmx->spec_ctrl != hostval) native_wrmsrq(MSR_IA32_SPEC_CTRL, hostval); barrier_nospec(); } static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu, bool force_immediate_exit) { /* * If L2 is active, some VMX preemption timer exits can be handled in * the fastpath even, all other exits must use the slow path. */ if (is_guest_mode(vcpu) && vmx_get_exit_reason(vcpu).basic != EXIT_REASON_PREEMPTION_TIMER) return EXIT_FASTPATH_NONE; switch (vmx_get_exit_reason(vcpu).basic) { case EXIT_REASON_MSR_WRITE: return handle_fastpath_set_msr_irqoff(vcpu); case EXIT_REASON_PREEMPTION_TIMER: return handle_fastpath_preemption_timer(vcpu, force_immediate_exit); case EXIT_REASON_HLT: return handle_fastpath_hlt(vcpu); default: return EXIT_FASTPATH_NONE; } } noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu) { if ((u16)vmx_get_exit_reason(vcpu).basic != EXIT_REASON_EXCEPTION_NMI || !is_nmi(vmx_get_intr_info(vcpu))) return; kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); if (cpu_feature_enabled(X86_FEATURE_FRED)) fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR); else vmx_do_nmi_irqoff(); kvm_after_interrupt(vcpu); } static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, unsigned int flags) { struct vcpu_vmx *vmx = to_vmx(vcpu); guest_state_enter_irqoff(); /* * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW * mitigation for MDS is done late in VMentry and is still * executed in spite of L1D Flush. This is because an extra VERW * should not matter much after the big hammer L1D Flush. * * cpu_buf_vm_clear is used when system is not vulnerable to MDS/TAA, * and is affected by MMIO Stale Data. In such cases mitigation in only * needed against an MMIO capable guest. 
*/ if (static_branch_unlikely(&vmx_l1d_should_flush)) vmx_l1d_flush(vcpu); else if (static_branch_unlikely(&cpu_buf_vm_clear) && (flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO)) mds_clear_cpu_buffers(); vmx_disable_fb_clear(vmx); if (vcpu->arch.cr2 != native_read_cr2()) native_write_cr2(vcpu->arch.cr2); vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs, flags); vcpu->arch.cr2 = native_read_cr2(); vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET; vmx->idt_vectoring_info = 0; vmx_enable_fb_clear(vmx); if (unlikely(vmx->fail)) { vmx->vt.exit_reason.full = 0xdead; goto out; } vmx->vt.exit_reason.full = vmcs_read32(VM_EXIT_REASON); if (likely(!vmx_get_exit_reason(vcpu).failed_vmentry)) vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); vmx_handle_nmi(vcpu); out: guest_state_exit_irqoff(); } fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags) { bool force_immediate_exit = run_flags & KVM_RUN_FORCE_IMMEDIATE_EXIT; struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long cr3, cr4; /* Record the guest's net vcpu time for enforced NMI injections. */ if (unlikely(!enable_vnmi && vmx->loaded_vmcs->soft_vnmi_blocked)) vmx->loaded_vmcs->entry_time = ktime_get(); /* * Don't enter VMX if guest state is invalid, let the exit handler * start emulation until we arrive back to a valid state. Synthesize a * consistency check VM-Exit due to invalid guest state and bail. */ if (unlikely(vmx->vt.emulation_required)) { vmx->fail = 0; vmx->vt.exit_reason.full = EXIT_REASON_INVALID_STATE; vmx->vt.exit_reason.failed_vmentry = 1; kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1); vmx->vt.exit_qualification = ENTRY_FAIL_DEFAULT; kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2); vmx->vt.exit_intr_info = 0; return EXIT_FASTPATH_NONE; } trace_kvm_entry(vcpu, force_immediate_exit); if (vmx->ple_window_dirty) { vmx->ple_window_dirty = false; vmcs_write32(PLE_WINDOW, vmx->ple_window); } /* * We did this in prepare_switch_to_guest, because it needs to * be within srcu_read_lock. */ WARN_ON_ONCE(vmx->nested.need_vmcs12_to_shadow_sync); if (kvm_register_is_dirty(vcpu, VCPU_REGS_RSP)) vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]); if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP)) vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]); vcpu->arch.regs_dirty = 0; if (run_flags & KVM_RUN_LOAD_GUEST_DR6) set_debugreg(vcpu->arch.dr6, 6); if (run_flags & KVM_RUN_LOAD_DEBUGCTL) vmx_reload_guest_debugctl(vcpu); /* * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time * it switches back to the current->mm, which can occur in KVM context * when switching to a temporary mm to patch kernel code, e.g. if KVM * toggles a static key while handling a VM-Exit. */ cr3 = __get_current_cr3_fast(); if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) { vmcs_writel(HOST_CR3, cr3); vmx->loaded_vmcs->host_state.cr3 = cr3; } cr4 = cr4_read_shadow(); if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) { vmcs_writel(HOST_CR4, cr4); vmx->loaded_vmcs->host_state.cr4 = cr4; } /* When single-stepping over STI and MOV SS, we must clear the * corresponding interruptibility bits in the guest state. Otherwise * vmentry fails as it then expects bit 14 (BS) in pending debug * exceptions being set, but that's not correct for the guest debugging * case. 
*/ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) vmx_set_interrupt_shadow(vcpu, 0); kvm_load_guest_xsave_state(vcpu); pt_guest_enter(vmx); atomic_switch_perf_msrs(vmx); if (intel_pmu_lbr_is_enabled(vcpu)) vmx_passthrough_lbr_msrs(vcpu); if (enable_preemption_timer) vmx_update_hv_timer(vcpu, force_immediate_exit); else if (force_immediate_exit) smp_send_reschedule(vcpu->cpu); kvm_wait_lapic_expire(vcpu); /* The actual VMENTER/EXIT is in the .noinstr.text section. */ vmx_vcpu_enter_exit(vcpu, __vmx_vcpu_run_flags(vmx)); /* All fields are clean at this point */ if (kvm_is_using_evmcs()) { current_evmcs->hv_clean_fields |= HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL; current_evmcs->hv_vp_id = kvm_hv_get_vpindex(vcpu); } /* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */ if (vcpu->arch.host_debugctl) update_debugctlmsr(vcpu->arch.host_debugctl); #ifndef CONFIG_X86_64 /* * The sysexit path does not restore ds/es, so we must set them to * a reasonable value ourselves. * * We can't defer this to vmx_prepare_switch_to_host() since that * function may be executed in interrupt context, which saves and * restore segments around it, nullifying its effect. */ loadsegment(ds, __USER_DS); loadsegment(es, __USER_DS); #endif pt_guest_exit(vmx); kvm_load_host_xsave_state(vcpu); if (is_guest_mode(vcpu)) { /* * Track VMLAUNCH/VMRESUME that have made past guest state * checking. */ if (vmx->nested.nested_run_pending && !vmx_get_exit_reason(vcpu).failed_vmentry) ++vcpu->stat.nested_run; vmx->nested.nested_run_pending = 0; } if (unlikely(vmx->fail)) return EXIT_FASTPATH_NONE; if (unlikely((u16)vmx_get_exit_reason(vcpu).basic == EXIT_REASON_MCE_DURING_VMENTRY)) kvm_machine_check(); trace_kvm_exit(vcpu, KVM_ISA_VMX); if (unlikely(vmx_get_exit_reason(vcpu).failed_vmentry)) return EXIT_FASTPATH_NONE; vmx->loaded_vmcs->launched = 1; vmx_recover_nmi_blocking(vmx); vmx_complete_interrupts(vmx); return vmx_exit_handlers_fastpath(vcpu, force_immediate_exit); } void vmx_vcpu_free(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); if (enable_pml) vmx_destroy_pml_buffer(vmx); free_vpid(vmx->vpid); nested_vmx_free_vcpu(vcpu); free_loaded_vmcs(vmx->loaded_vmcs); free_page((unsigned long)vmx->ve_info); } int vmx_vcpu_create(struct kvm_vcpu *vcpu) { struct vmx_uret_msr *tsx_ctrl; struct vcpu_vmx *vmx; int i, err; BUILD_BUG_ON(offsetof(struct vcpu_vmx, vcpu) != 0); vmx = to_vmx(vcpu); INIT_LIST_HEAD(&vmx->vt.pi_wakeup_list); err = -ENOMEM; vmx->vpid = allocate_vpid(); /* * If PML is turned on, failure on enabling PML just results in failure * of creating the vcpu, therefore we can simplify PML logic (by * avoiding dealing with cases, such as enabling PML partially on vcpus * for the guest), etc. */ if (enable_pml) { vmx->pml_pg = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); if (!vmx->pml_pg) goto free_vpid; } for (i = 0; i < kvm_nr_uret_msrs; ++i) vmx->guest_uret_msrs[i].mask = -1ull; if (boot_cpu_has(X86_FEATURE_RTM)) { /* * TSX_CTRL_CPUID_CLEAR is handled in the CPUID interception. * Keep the host value unchanged to avoid changing CPUID bits * under the host kernel's feet. */ tsx_ctrl = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL); if (tsx_ctrl) tsx_ctrl->mask = ~(u64)TSX_CTRL_CPUID_CLEAR; } err = alloc_loaded_vmcs(&vmx->vmcs01); if (err < 0) goto free_pml; /* * Use Hyper-V 'Enlightened MSR Bitmap' feature when KVM runs as a * nested (L1) hypervisor and Hyper-V in L0 supports it. 
Enable the * feature only for vmcs01, KVM currently isn't equipped to realize any * performance benefits from enabling it for vmcs02. */ if (kvm_is_using_evmcs() && (ms_hyperv.nested_features & HV_X64_NESTED_MSR_BITMAP)) { struct hv_enlightened_vmcs *evmcs = (void *)vmx->vmcs01.vmcs; evmcs->hv_enlightenments_control.msr_bitmap = 1; } vmx->loaded_vmcs = &vmx->vmcs01; if (cpu_need_virtualize_apic_accesses(vcpu)) { err = kvm_alloc_apic_access_page(vcpu->kvm); if (err) goto free_vmcs; } if (enable_ept && !enable_unrestricted_guest) { err = init_rmode_identity_map(vcpu->kvm); if (err) goto free_vmcs; } err = -ENOMEM; if (vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_EPT_VIOLATION_VE) { struct page *page; BUILD_BUG_ON(sizeof(*vmx->ve_info) > PAGE_SIZE); /* ve_info must be page aligned. */ page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); if (!page) goto free_vmcs; vmx->ve_info = page_to_virt(page); } if (vmx_can_use_ipiv(vcpu)) WRITE_ONCE(to_kvm_vmx(vcpu->kvm)->pid_table[vcpu->vcpu_id], __pa(&vmx->vt.pi_desc) | PID_TABLE_ENTRY_VALID); return 0; free_vmcs: free_loaded_vmcs(vmx->loaded_vmcs); free_pml: vmx_destroy_pml_buffer(vmx); free_vpid: free_vpid(vmx->vpid); return err; } #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" int vmx_vm_init(struct kvm *kvm) { if (!ple_gap) kvm->arch.pause_in_guest = true; if (boot_cpu_has(X86_BUG_L1TF) && enable_ept) { switch (l1tf_mitigation) { case L1TF_MITIGATION_OFF: case L1TF_MITIGATION_FLUSH_NOWARN: /* 'I explicitly don't care' is set */ break; case L1TF_MITIGATION_AUTO: case L1TF_MITIGATION_FLUSH: case L1TF_MITIGATION_FLUSH_NOSMT: case L1TF_MITIGATION_FULL: /* * Warn upon starting the first VM in a potentially * insecure environment. */ if (sched_smt_active()) pr_warn_once(L1TF_MSG_SMT); if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_NEVER) pr_warn_once(L1TF_MSG_L1D); break; case L1TF_MITIGATION_FULL_FORCE: /* Flush is enforced */ break; } } if (enable_pml) kvm->arch.cpu_dirty_log_size = PML_LOG_NR_ENTRIES; return 0; } static inline bool vmx_ignore_guest_pat(struct kvm *kvm) { /* * Non-coherent DMA devices need the guest to flush CPU properly. * In that case it is not possible to map all guest RAM as WB, so * always trust guest PAT. */ return !kvm_arch_has_noncoherent_dma(kvm) && kvm_check_has_quirk(kvm, KVM_X86_QUIRK_IGNORE_GUEST_PAT); } u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { /* * Force UC for host MMIO regions, as allowing the guest to access MMIO * with cacheable accesses will result in Machine Checks. */ if (is_mmio) return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT; /* Force WB if ignoring guest PAT */ if (vmx_ignore_guest_pat(vcpu->kvm)) return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT); } static void vmcs_set_secondary_exec_control(struct vcpu_vmx *vmx, u32 new_ctl) { /* * These bits in the secondary execution controls field * are dynamic, the others are mostly based on the hypervisor * architecture and the guest's CPUID. Do not touch the * dynamic bits. 
*/ u32 mask = SECONDARY_EXEC_SHADOW_VMCS | SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE | SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | SECONDARY_EXEC_DESC; u32 cur_ctl = secondary_exec_controls_get(vmx); secondary_exec_controls_set(vmx, (new_ctl & ~mask) | (cur_ctl & mask)); } /* * Generate MSR_IA32_VMX_CR{0,4}_FIXED1 according to CPUID. Only set bits * (indicating "allowed-1") if they are supported in the guest's CPUID. */ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct kvm_cpuid_entry2 *entry; vmx->nested.msrs.cr0_fixed1 = 0xffffffff; vmx->nested.msrs.cr4_fixed1 = X86_CR4_PCE; #define cr4_fixed1_update(_cr4_mask, _reg, _cpuid_mask) do { \ if (entry && (entry->_reg & (_cpuid_mask))) \ vmx->nested.msrs.cr4_fixed1 |= (_cr4_mask); \ } while (0) entry = kvm_find_cpuid_entry(vcpu, 0x1); cr4_fixed1_update(X86_CR4_VME, edx, feature_bit(VME)); cr4_fixed1_update(X86_CR4_PVI, edx, feature_bit(VME)); cr4_fixed1_update(X86_CR4_TSD, edx, feature_bit(TSC)); cr4_fixed1_update(X86_CR4_DE, edx, feature_bit(DE)); cr4_fixed1_update(X86_CR4_PSE, edx, feature_bit(PSE)); cr4_fixed1_update(X86_CR4_PAE, edx, feature_bit(PAE)); cr4_fixed1_update(X86_CR4_MCE, edx, feature_bit(MCE)); cr4_fixed1_update(X86_CR4_PGE, edx, feature_bit(PGE)); cr4_fixed1_update(X86_CR4_OSFXSR, edx, feature_bit(FXSR)); cr4_fixed1_update(X86_CR4_OSXMMEXCPT, edx, feature_bit(XMM)); cr4_fixed1_update(X86_CR4_VMXE, ecx, feature_bit(VMX)); cr4_fixed1_update(X86_CR4_SMXE, ecx, feature_bit(SMX)); cr4_fixed1_update(X86_CR4_PCIDE, ecx, feature_bit(PCID)); cr4_fixed1_update(X86_CR4_OSXSAVE, ecx, feature_bit(XSAVE)); entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 0); cr4_fixed1_update(X86_CR4_FSGSBASE, ebx, feature_bit(FSGSBASE)); cr4_fixed1_update(X86_CR4_SMEP, ebx, feature_bit(SMEP)); cr4_fixed1_update(X86_CR4_SMAP, ebx, feature_bit(SMAP)); cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU)); cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP)); cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57)); entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1); cr4_fixed1_update(X86_CR4_LAM_SUP, eax, feature_bit(LAM)); #undef cr4_fixed1_update } static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct kvm_cpuid_entry2 *best = NULL; int i; for (i = 0; i < PT_CPUID_LEAVES; i++) { best = kvm_find_cpuid_entry_index(vcpu, 0x14, i); if (!best) return; vmx->pt_desc.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM] = best->eax; vmx->pt_desc.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM] = best->ebx; vmx->pt_desc.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM] = best->ecx; vmx->pt_desc.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM] = best->edx; } /* Get the number of configurable Address Ranges for filtering */ vmx->pt_desc.num_address_ranges = intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_num_address_ranges); /* Initialize and clear the no dependency bits */ vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS | RTIT_CTL_USR | RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC | RTIT_CTL_BRANCH_EN); /* * If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set otherwise * will inject an #GP */ if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_cr3_filtering)) vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_CR3EN; /* * If CPUID.(EAX=14H,ECX=0):EBX[1]=1 CYCEn, CycThresh and * PSBFreq can be set */ if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_cyc)) vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_CYCLEACC | RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ); /* * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn and MTCFreq can be set */ if 
(intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc)) vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_MTC_EN | RTIT_CTL_MTC_RANGE); /* If CPUID.(EAX=14H,ECX=0):EBX[4]=1 FUPonPTW and PTWEn can be set */ if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_ptwrite)) vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_FUP_ON_PTW | RTIT_CTL_PTW_EN); /* If CPUID.(EAX=14H,ECX=0):EBX[5]=1 PwrEvEn can be set */ if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_power_event_trace)) vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_PWR_EVT_EN; /* If CPUID.(EAX=14H,ECX=0):ECX[0]=1 ToPA can be set */ if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_topa_output)) vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_TOPA; /* If CPUID.(EAX=14H,ECX=0):ECX[3]=1 FabricEn can be set */ if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_output_subsys)) vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN; /* unmask address range configure area */ for (i = 0; i < vmx->pt_desc.num_address_ranges; i++) vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); } void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); /* * XSAVES is effectively enabled if and only if XSAVE is also exposed * to the guest. XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be * set if and only if XSAVE is supported. */ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE)) guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES); vmx_setup_uret_msrs(vmx); if (cpu_has_secondary_exec_ctrls()) vmcs_set_secondary_exec_control(vmx, vmx_secondary_exec_control(vmx)); if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX)) vmx->msr_ia32_feature_control_valid_bits |= FEAT_CTL_VMX_ENABLED_INSIDE_SMX | FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX; else vmx->msr_ia32_feature_control_valid_bits &= ~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX | FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX); if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX)) nested_vmx_cr_fixed1_bits_update(vcpu); if (boot_cpu_has(X86_FEATURE_INTEL_PT) && guest_cpu_cap_has(vcpu, X86_FEATURE_INTEL_PT)) update_intel_pt_cfg(vcpu); if (boot_cpu_has(X86_FEATURE_RTM)) { struct vmx_uret_msr *msr; msr = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL); if (msr) { bool enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_RTM); vmx_set_guest_uret_msr(vmx, msr, enabled ? 0 : TSX_CTRL_RTM_DISABLE); } } set_cr4_guest_host_mask(vmx); vmx_write_encls_bitmap(vcpu, NULL); if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX)) vmx->msr_ia32_feature_control_valid_bits |= FEAT_CTL_SGX_ENABLED; else vmx->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_SGX_ENABLED; if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC)) vmx->msr_ia32_feature_control_valid_bits |= FEAT_CTL_SGX_LC_ENABLED; else vmx->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_SGX_LC_ENABLED; /* Recalc MSR interception to account for feature changes. */ vmx_recalc_msr_intercepts(vcpu); /* Refresh #PF interception to account for MAXPHYADDR changes. */ vmx_update_exception_bitmap(vcpu); } static __init u64 vmx_get_perf_capabilities(void) { u64 perf_cap = PMU_CAP_FW_WRITES; u64 host_perf_cap = 0; if (!enable_pmu) return 0; if (boot_cpu_has(X86_FEATURE_PDCM)) rdmsrq(MSR_IA32_PERF_CAPABILITIES, host_perf_cap); if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR)) { x86_perf_get_lbr(&vmx_lbr_caps); /* * KVM requires LBR callstack support, as the overhead due to * context switching LBRs without said support is too high. * See intel_pmu_create_guest_lbr_event() for more info. 
*/ if (!vmx_lbr_caps.has_callstack) memset(&vmx_lbr_caps, 0, sizeof(vmx_lbr_caps)); else if (vmx_lbr_caps.nr) perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT; } if (vmx_pebs_supported()) { perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK; /* * Disallow adaptive PEBS as it is functionally broken, can be * used by the guest to read *host* LBRs, and can be used to * bypass userspace event filters. To correctly and safely * support adaptive PEBS, KVM needs to: * * 1. Account for the ADAPTIVE flag when (re)programming fixed * counters. * * 2. Gain support from perf (or take direct control of counter * programming) to support events without adaptive PEBS * enabled for the hardware counter. * * 3. Ensure LBR MSRs cannot hold host data on VM-Entry with * adaptive PEBS enabled and MSR_PEBS_DATA_CFG.LBRS=1. * * 4. Document which PMU events are effectively exposed to the * guest via adaptive PEBS, and make adaptive PEBS mutually * exclusive with KVM_SET_PMU_EVENT_FILTER if necessary. */ perf_cap &= ~PERF_CAP_PEBS_BASELINE; } return perf_cap; } static __init void vmx_set_cpu_caps(void) { kvm_set_cpu_caps(); /* CPUID 0x1 */ if (nested) kvm_cpu_cap_set(X86_FEATURE_VMX); /* CPUID 0x7 */ if (kvm_mpx_supported()) kvm_cpu_cap_check_and_set(X86_FEATURE_MPX); if (!cpu_has_vmx_invpcid()) kvm_cpu_cap_clear(X86_FEATURE_INVPCID); if (vmx_pt_mode_is_host_guest()) kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT); if (vmx_pebs_supported()) { kvm_cpu_cap_check_and_set(X86_FEATURE_DS); kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64); } if (!enable_pmu) kvm_cpu_cap_clear(X86_FEATURE_PDCM); kvm_caps.supported_perf_cap = vmx_get_perf_capabilities(); if (!enable_sgx) { kvm_cpu_cap_clear(X86_FEATURE_SGX); kvm_cpu_cap_clear(X86_FEATURE_SGX_LC); kvm_cpu_cap_clear(X86_FEATURE_SGX1); kvm_cpu_cap_clear(X86_FEATURE_SGX2); kvm_cpu_cap_clear(X86_FEATURE_SGX_EDECCSSA); } if (vmx_umip_emulated()) kvm_cpu_cap_set(X86_FEATURE_UMIP); /* CPUID 0xD.1 */ kvm_caps.supported_xss = 0; if (!cpu_has_vmx_xsaves()) kvm_cpu_cap_clear(X86_FEATURE_XSAVES); /* CPUID 0x80000001 and 0x7 (RDPID) */ if (!cpu_has_vmx_rdtscp()) { kvm_cpu_cap_clear(X86_FEATURE_RDTSCP); kvm_cpu_cap_clear(X86_FEATURE_RDPID); } if (cpu_has_vmx_waitpkg()) kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG); } static bool vmx_is_io_intercepted(struct kvm_vcpu *vcpu, struct x86_instruction_info *info, unsigned long *exit_qualification) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); unsigned short port; int size; bool imm; /* * If the 'use IO bitmaps' VM-execution control is 0, IO instruction * VM-exits depend on the 'unconditional IO exiting' VM-execution * control. * * Otherwise, IO instruction VM-exits are controlled by the IO bitmaps. 
*/ if (!nested_cpu_has(vmcs12, CPU_BASED_USE_IO_BITMAPS)) return nested_cpu_has(vmcs12, CPU_BASED_UNCOND_IO_EXITING); if (info->intercept == x86_intercept_in || info->intercept == x86_intercept_ins) { port = info->src_val; size = info->dst_bytes; imm = info->src_type == OP_IMM; } else { port = info->dst_val; size = info->src_bytes; imm = info->dst_type == OP_IMM; } *exit_qualification = ((unsigned long)port << 16) | (size - 1); if (info->intercept == x86_intercept_ins || info->intercept == x86_intercept_outs) *exit_qualification |= BIT(4); if (info->rep_prefix) *exit_qualification |= BIT(5); if (imm) *exit_qualification |= BIT(6); return nested_vmx_check_io_bitmaps(vcpu, port, size); } int vmx_check_intercept(struct kvm_vcpu *vcpu, struct x86_instruction_info *info, enum x86_intercept_stage stage, struct x86_exception *exception) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); unsigned long exit_qualification = 0; u32 vm_exit_reason; u64 exit_insn_len; switch (info->intercept) { case x86_intercept_rdpid: /* * RDPID causes #UD if not enabled through secondary execution * controls (ENABLE_RDTSCP). Note, the implicit MSR access to * TSC_AUX is NOT subject to interception, i.e. checking only * the dedicated execution control is architecturally correct. */ if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_RDTSCP)) { exception->vector = UD_VECTOR; exception->error_code_valid = false; return X86EMUL_PROPAGATE_FAULT; } return X86EMUL_CONTINUE; case x86_intercept_in: case x86_intercept_ins: case x86_intercept_out: case x86_intercept_outs: if (!vmx_is_io_intercepted(vcpu, info, &exit_qualification)) return X86EMUL_CONTINUE; vm_exit_reason = EXIT_REASON_IO_INSTRUCTION; break; case x86_intercept_lgdt: case x86_intercept_lidt: case x86_intercept_lldt: case x86_intercept_ltr: case x86_intercept_sgdt: case x86_intercept_sidt: case x86_intercept_sldt: case x86_intercept_str: if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_DESC)) return X86EMUL_CONTINUE; if (info->intercept == x86_intercept_lldt || info->intercept == x86_intercept_ltr || info->intercept == x86_intercept_sldt || info->intercept == x86_intercept_str) vm_exit_reason = EXIT_REASON_LDTR_TR; else vm_exit_reason = EXIT_REASON_GDTR_IDTR; /* * FIXME: Decode the ModR/M to generate the correct exit * qualification for memory operands. */ break; case x86_intercept_hlt: if (!nested_cpu_has(vmcs12, CPU_BASED_HLT_EXITING)) return X86EMUL_CONTINUE; vm_exit_reason = EXIT_REASON_HLT; break; case x86_intercept_pause: /* * PAUSE is a single-byte NOP with a REPE prefix, i.e. collides * with vanilla NOPs in the emulator. Apply the interception * check only to actual PAUSE instructions. Don't check * PAUSE-loop-exiting, software can't expect a given PAUSE to * exit, i.e. KVM is within its rights to allow L2 to execute * the PAUSE. */ if ((info->rep_prefix != REPE_PREFIX) || !nested_cpu_has(vmcs12, CPU_BASED_PAUSE_EXITING)) return X86EMUL_CONTINUE; vm_exit_reason = EXIT_REASON_PAUSE_INSTRUCTION; break; /* TODO: check more intercepts... 
*/ default: return X86EMUL_UNHANDLEABLE; } exit_insn_len = abs_diff((s64)info->next_rip, (s64)info->rip); if (!exit_insn_len || exit_insn_len > X86_MAX_INSTRUCTION_LENGTH) return X86EMUL_UNHANDLEABLE; __nested_vmx_vmexit(vcpu, vm_exit_reason, 0, exit_qualification, exit_insn_len); return X86EMUL_INTERCEPTED; } #ifdef CONFIG_X86_64 /* (a << shift) / divisor, return 1 if overflow otherwise 0 */ static inline int u64_shl_div_u64(u64 a, unsigned int shift, u64 divisor, u64 *result) { u64 low = a << shift, high = a >> (64 - shift); /* To avoid the overflow on divq */ if (high >= divisor) return 1; /* Low hold the result, high hold rem which is discarded */ asm("divq %2\n\t" : "=a" (low), "=d" (high) : "rm" (divisor), "0" (low), "1" (high)); *result = low; return 0; } int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, bool *expired) { struct vcpu_vmx *vmx; u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles; struct kvm_timer *ktimer = &vcpu->arch.apic->lapic_timer; vmx = to_vmx(vcpu); tscl = rdtsc(); guest_tscl = kvm_read_l1_tsc(vcpu, tscl); delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl; lapic_timer_advance_cycles = nsec_to_cycles(vcpu, ktimer->timer_advance_ns); if (delta_tsc > lapic_timer_advance_cycles) delta_tsc -= lapic_timer_advance_cycles; else delta_tsc = 0; /* Convert to host delta tsc if tsc scaling is enabled */ if (vcpu->arch.l1_tsc_scaling_ratio != kvm_caps.default_tsc_scaling_ratio && delta_tsc && u64_shl_div_u64(delta_tsc, kvm_caps.tsc_scaling_ratio_frac_bits, vcpu->arch.l1_tsc_scaling_ratio, &delta_tsc)) return -ERANGE; /* * If the delta tsc can't fit in the 32 bit after the multi shift, * we can't use the preemption timer. * It's possible that it fits on later vmentries, but checking * on every vmentry is costly so we just use an hrtimer. */ if (delta_tsc >> (cpu_preemption_timer_multi + 32)) return -ERANGE; vmx->hv_deadline_tsc = tscl + delta_tsc; *expired = !delta_tsc; return 0; } void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu) { to_vmx(vcpu)->hv_deadline_tsc = -1; } #endif void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); if (WARN_ON_ONCE(!enable_pml)) return; if (is_guest_mode(vcpu)) { vmx->nested.update_vmcs01_cpu_dirty_logging = true; return; } /* * Note, nr_memslots_dirty_logging can be changed concurrent with this * code, but in that case another update request will be made and so * the guest will never run with a stale PML value. */ if (atomic_read(&vcpu->kvm->nr_memslots_dirty_logging)) secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_ENABLE_PML); else secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_ENABLE_PML); } void vmx_setup_mce(struct kvm_vcpu *vcpu) { if (vcpu->arch.mcg_cap & MCG_LMCE_P) to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |= FEAT_CTL_LMCE_ENABLED; else to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_LMCE_ENABLED; } #ifdef CONFIG_KVM_SMM int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) { /* we need a nested vmexit to enter SMM, postpone if run is pending */ if (to_vmx(vcpu)->nested.nested_run_pending) return -EBUSY; return !is_smm(vcpu); } int vmx_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram) { struct vcpu_vmx *vmx = to_vmx(vcpu); /* * TODO: Implement custom flows for forcing the vCPU out/in of L2 on * SMI and RSM. Using the common VM-Exit + VM-Enter routines is wrong * SMI and RSM only modify state that is saved and restored via SMRAM. * E.g. 
most MSRs are left untouched, but many are modified by VM-Exit * and VM-Enter, and thus L2's values may be corrupted on SMI+RSM. */ vmx->nested.smm.guest_mode = is_guest_mode(vcpu); if (vmx->nested.smm.guest_mode) nested_vmx_vmexit(vcpu, -1, 0, 0); vmx->nested.smm.vmxon = vmx->nested.vmxon; vmx->nested.vmxon = false; vmx_clear_hlt(vcpu); return 0; } int vmx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram) { struct vcpu_vmx *vmx = to_vmx(vcpu); int ret; if (vmx->nested.smm.vmxon) { vmx->nested.vmxon = true; vmx->nested.smm.vmxon = false; } if (vmx->nested.smm.guest_mode) { ret = nested_vmx_enter_non_root_mode(vcpu, false); if (ret) return ret; vmx->nested.nested_run_pending = 1; vmx->nested.smm.guest_mode = false; } return 0; } void vmx_enable_smi_window(struct kvm_vcpu *vcpu) { /* RSM will cause a vmexit anyway. */ } #endif bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu) { return to_vmx(vcpu)->nested.vmxon && !is_guest_mode(vcpu); } void vmx_migrate_timers(struct kvm_vcpu *vcpu) { if (is_guest_mode(vcpu)) { struct hrtimer *timer = &to_vmx(vcpu)->nested.preemption_timer; if (hrtimer_try_to_cancel(timer) == 1) hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED); } } void vmx_hardware_unsetup(void) { kvm_set_posted_intr_wakeup_handler(NULL); if (nested) nested_vmx_hardware_unsetup(); free_kvm_area(); } void vmx_vm_destroy(struct kvm *kvm) { struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm); free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm)); } /* * Note, the SDM states that the linear address is masked *after* the modified * canonicality check, whereas KVM masks (untags) the address and then performs * a "normal" canonicality check. Functionally, the two methods are identical, * and when the masking occurs relative to the canonicality check isn't visible * to software, i.e. KVM's behavior doesn't violate the SDM. */ gva_t vmx_get_untagged_addr(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags) { int lam_bit; unsigned long cr3_bits; if (flags & (X86EMUL_F_FETCH | X86EMUL_F_IMPLICIT | X86EMUL_F_INVLPG)) return gva; if (!is_64_bit_mode(vcpu)) return gva; /* * Bit 63 determines if the address should be treated as user address * or a supervisor address. */ if (!(gva & BIT_ULL(63))) { cr3_bits = kvm_get_active_cr3_lam_bits(vcpu); if (!(cr3_bits & (X86_CR3_LAM_U57 | X86_CR3_LAM_U48))) return gva; /* LAM_U48 is ignored if LAM_U57 is set. */ lam_bit = cr3_bits & X86_CR3_LAM_U57 ? 56 : 47; } else { if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_LAM_SUP)) return gva; lam_bit = kvm_is_cr4_bit_set(vcpu, X86_CR4_LA57) ? 56 : 47; } /* * Untag the address by sign-extending the lam_bit, but NOT to bit 63. * Bit 63 is retained from the raw virtual address so that untagging * doesn't change a user access to a supervisor access, and vice versa. */ return (sign_extend64(gva, lam_bit) & ~BIT_ULL(63)) | (gva & BIT_ULL(63)); } static unsigned int vmx_handle_intel_pt_intr(void) { struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); /* '0' on failure so that the !PT case can use a RET0 static call. */ if (!vcpu || !kvm_handling_nmi_from_guest(vcpu)) return 0; kvm_make_request(KVM_REQ_PMI, vcpu); __set_bit(MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT, (unsigned long *)&vcpu->arch.pmu.global_status); return 1; } static __init void vmx_setup_user_return_msrs(void) { /* * Though SYSCALL is only supported in 64-bit mode on Intel CPUs, kvm * will emulate SYSCALL in legacy mode if the vendor string in guest * CPUID.0:{EBX,ECX,EDX} is "AuthenticAMD" or "AMDisbetter!" 
To * support this emulation, MSR_STAR is included in the list for i386, * but is never loaded into hardware. MSR_CSTAR is also never loaded * into hardware and is here purely for emulation purposes. */ const u32 vmx_uret_msrs_list[] = { #ifdef CONFIG_X86_64 MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR, #endif MSR_EFER, MSR_TSC_AUX, MSR_STAR, MSR_IA32_TSX_CTRL, }; int i; BUILD_BUG_ON(ARRAY_SIZE(vmx_uret_msrs_list) != MAX_NR_USER_RETURN_MSRS); for (i = 0; i < ARRAY_SIZE(vmx_uret_msrs_list); ++i) kvm_add_user_return_msr(vmx_uret_msrs_list[i]); } static void __init vmx_setup_me_spte_mask(void) { u64 me_mask = 0; /* * On pre-MKTME system, boot_cpu_data.x86_phys_bits equals to * kvm_host.maxphyaddr. On MKTME and/or TDX capable systems, * boot_cpu_data.x86_phys_bits holds the actual physical address * w/o the KeyID bits, and kvm_host.maxphyaddr equals to * MAXPHYADDR reported by CPUID. Those bits between are KeyID bits. */ if (boot_cpu_data.x86_phys_bits != kvm_host.maxphyaddr) me_mask = rsvd_bits(boot_cpu_data.x86_phys_bits, kvm_host.maxphyaddr - 1); /* * Unlike SME, host kernel doesn't support setting up any * MKTME KeyID on Intel platforms. No memory encryption * bits should be included into the SPTE. */ kvm_mmu_set_me_spte_mask(0, me_mask); } __init int vmx_hardware_setup(void) { unsigned long host_bndcfgs; struct desc_ptr dt; int r; store_idt(&dt); host_idt_base = dt.address; vmx_setup_user_return_msrs(); if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0) return -EIO; if (boot_cpu_has(X86_FEATURE_NX)) kvm_enable_efer_bits(EFER_NX); if (boot_cpu_has(X86_FEATURE_MPX)) { rdmsrq(MSR_IA32_BNDCFGS, host_bndcfgs); WARN_ONCE(host_bndcfgs, "BNDCFGS in host will be lost"); } if (!cpu_has_vmx_mpx()) kvm_caps.supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR); if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() || !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global())) enable_vpid = 0; if (!cpu_has_vmx_ept() || !cpu_has_vmx_ept_4levels() || !cpu_has_vmx_ept_mt_wb() || !cpu_has_vmx_invept_global()) enable_ept = 0; /* NX support is required for shadow paging. */ if (!enable_ept && !boot_cpu_has(X86_FEATURE_NX)) { pr_err_ratelimited("NX (Execute Disable) not supported\n"); return -EOPNOTSUPP; } if (!cpu_has_vmx_ept_ad_bits() || !enable_ept) enable_ept_ad_bits = 0; if (!cpu_has_vmx_unrestricted_guest() || !enable_ept) enable_unrestricted_guest = 0; if (!cpu_has_vmx_flexpriority()) flexpriority_enabled = 0; if (!cpu_has_virtual_nmis()) enable_vnmi = 0; #ifdef CONFIG_X86_SGX_KVM if (!cpu_has_vmx_encls_vmexit()) enable_sgx = false; #endif /* * set_apic_access_page_addr() is used to reload apic access * page upon invalidation. No need to do anything if not * using the APIC_ACCESS_ADDR VMCS field. 
*/ if (!flexpriority_enabled) vt_x86_ops.set_apic_access_page_addr = NULL; if (!cpu_has_vmx_tpr_shadow()) vt_x86_ops.update_cr8_intercept = NULL; #if IS_ENABLED(CONFIG_HYPERV) if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH && enable_ept) { vt_x86_ops.flush_remote_tlbs = hv_flush_remote_tlbs; vt_x86_ops.flush_remote_tlbs_range = hv_flush_remote_tlbs_range; } #endif if (!cpu_has_vmx_ple()) { ple_gap = 0; ple_window = 0; ple_window_grow = 0; ple_window_max = 0; ple_window_shrink = 0; } if (!cpu_has_vmx_apicv()) enable_apicv = 0; if (!enable_apicv) vt_x86_ops.sync_pir_to_irr = NULL; if (!enable_apicv || !cpu_has_vmx_ipiv()) enable_ipiv = false; if (cpu_has_vmx_tsc_scaling()) kvm_caps.has_tsc_control = true; kvm_caps.max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX; kvm_caps.tsc_scaling_ratio_frac_bits = 48; kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection(); kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit(); set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ if (enable_ept) kvm_mmu_set_ept_masks(enable_ept_ad_bits, cpu_has_vmx_ept_execute_only()); else vt_x86_ops.get_mt_mask = NULL; /* * Setup shadow_me_value/shadow_me_mask to include MKTME KeyID * bits to shadow_zero_check. */ vmx_setup_me_spte_mask(); kvm_configure_mmu(enable_ept, 0, vmx_get_max_ept_level(), ept_caps_to_lpage_level(vmx_capability.ept)); /* * Only enable PML when hardware supports PML feature, and both EPT * and EPT A/D bit features are enabled -- PML depends on them to work. */ if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml()) enable_pml = 0; if (!cpu_has_vmx_preemption_timer()) enable_preemption_timer = false; if (enable_preemption_timer) { u64 use_timer_freq = 5000ULL * 1000 * 1000; cpu_preemption_timer_multi = vmx_misc_preemption_timer_rate(vmcs_config.misc); if (tsc_khz) use_timer_freq = (u64)tsc_khz * 1000; use_timer_freq >>= cpu_preemption_timer_multi; /* * KVM "disables" the preemption timer by setting it to its max * value. Don't use the timer if it might cause spurious exits * at a rate faster than 0.1 Hz (of uninterrupted guest time). */ if (use_timer_freq > 0xffffffffu / 10) enable_preemption_timer = false; } if (!enable_preemption_timer) { vt_x86_ops.set_hv_timer = NULL; vt_x86_ops.cancel_hv_timer = NULL; } kvm_caps.supported_mce_cap |= MCG_LMCE_P; kvm_caps.supported_mce_cap |= MCG_CMCI_P; if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST) return -EINVAL; if (!enable_ept || !enable_pmu || !cpu_has_vmx_intel_pt()) pt_mode = PT_MODE_SYSTEM; if (pt_mode == PT_MODE_HOST_GUEST) vt_init_ops.handle_intel_pt_intr = vmx_handle_intel_pt_intr; else vt_init_ops.handle_intel_pt_intr = NULL; setup_default_sgx_lepubkeyhash(); if (nested) { nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept); r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers); if (r) return r; } vmx_set_cpu_caps(); r = alloc_kvm_area(); if (r && nested) nested_vmx_hardware_unsetup(); kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler); /* * On Intel CPUs that lack self-snoop feature, letting the guest control * memory types may result in unexpected behavior. So always ignore guest * PAT on those CPUs and map VM as writeback, not allowing userspace to * disable the quirk. * * On certain Intel CPUs (e.g. SPR, ICX), though self-snoop feature is * supported, UC is slow enough to cause issues with some older guests (e.g. * an old version of bochs driver uses ioremap() instead of ioremap_wc() to * map the video RAM, causing wayland desktop to fail to get started * correctly). 
To avoid breaking those older guests that rely on KVM to force * memory type to WB, provide KVM_X86_QUIRK_IGNORE_GUEST_PAT to preserve the * safer (for performance) default behavior. * * On top of this, non-coherent DMA devices need the guest to flush CPU * caches properly. This also requires honoring guest PAT, and is forced * independent of the quirk in vmx_ignore_guest_pat(). */ if (!static_cpu_has(X86_FEATURE_SELFSNOOP)) kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT; kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT; return r; } static void vmx_cleanup_l1d_flush(void) { if (vmx_l1d_flush_pages) { free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER); vmx_l1d_flush_pages = NULL; } /* Restore state so sysfs ignores VMX */ l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; } void vmx_exit(void) { allow_smaller_maxphyaddr = false; vmx_cleanup_l1d_flush(); kvm_x86_vendor_exit(); } int __init vmx_init(void) { int r, cpu; KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_vmx); if (!kvm_is_vmx_supported()) return -EOPNOTSUPP; /* * Note, hv_init_evmcs() touches only VMX knobs, i.e. there's nothing * to unwind if a later step fails. */ hv_init_evmcs(); r = kvm_x86_vendor_init(&vt_init_ops); if (r) return r; /* * Must be called after common x86 init so enable_ept is properly set * up. Hand the parameter mitigation value in which was stored in * the pre module init parser. If no parameter was given, it will * contain 'auto' which will be turned into the default 'cond' * mitigation mode. */ r = vmx_setup_l1d_flush(vmentry_l1d_flush_param); if (r) goto err_l1d_flush; for_each_possible_cpu(cpu) { INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu)); pi_init_cpu(cpu); } vmx_check_vmcs12_offsets(); /* * Shadow paging doesn't have a (further) performance penalty * from GUEST_MAXPHYADDR < HOST_MAXPHYADDR so enable it * by default */ if (!enable_ept) allow_smaller_maxphyaddr = true; return 0; err_l1d_flush: kvm_x86_vendor_exit(); return r; } |
// SPDX-License-Identifier: GPL-2.0 /* * linux/fs/ufs/util.c * * Copyright (C) 1998 * Daniel Pirkl <daniel.pirkl@email.cz> * Charles University, Faculty of Mathematics and Physics */ #include <linux/string.h> #include <linux/slab.h> #include <linux/buffer_head.h> #include "ufs_fs.h" #include "ufs.h" #include "swab.h" #include "util.h" struct ufs_buffer_head * _ubh_bread_ (struct ufs_sb_private_info * uspi, struct super_block *sb, u64 fragment, u64 size) { struct ufs_buffer_head * ubh; unsigned i, j ; u64 count = 0; if (size & ~uspi->s_fmask) return NULL; count = size >> uspi->s_fshift; if (count > UFS_MAXFRAG) return NULL; ubh = kmalloc (sizeof (struct ufs_buffer_head), GFP_NOFS); if (!ubh) return NULL; ubh->fragment = fragment; ubh->count = count; for (i = 0; i < count; i++) if (!(ubh->bh[i] = sb_bread(sb, fragment + i))) goto failed; for (; i < UFS_MAXFRAG; i++) ubh->bh[i] = NULL; return ubh; failed: for (j = 0; j < i; j++) brelse (ubh->bh[j]); kfree(ubh); return NULL; } struct ufs_buffer_head * ubh_bread_uspi (struct ufs_sb_private_info * uspi, struct super_block *sb, u64 fragment, u64 size) { unsigned i, j; u64 count = 0; if (size & ~uspi->s_fmask) return NULL; count = size >> uspi->s_fshift; if (count <= 0 || count > UFS_MAXFRAG) return NULL; USPI_UBH(uspi)->fragment = fragment; USPI_UBH(uspi)->count = count; for (i = 0; i < count; i++) if (!(USPI_UBH(uspi)->bh[i] = sb_bread(sb, fragment + i))) goto failed; for (; i < UFS_MAXFRAG; i++) USPI_UBH(uspi)->bh[i] = NULL; return USPI_UBH(uspi); failed: for (j = 0; j < i; j++) brelse (USPI_UBH(uspi)->bh[j]); return NULL; } void ubh_brelse (struct ufs_buffer_head * ubh) { unsigned i; if (!ubh) return; for (i = 0; i < ubh->count; i++) brelse (ubh->bh[i]); kfree (ubh); } void ubh_brelse_uspi (struct ufs_sb_private_info * uspi) { unsigned i; if (!USPI_UBH(uspi)) return; for ( i = 0; i < USPI_UBH(uspi)->count; i++ ) { brelse (USPI_UBH(uspi)->bh[i]); USPI_UBH(uspi)->bh[i] = NULL; } } void ubh_mark_buffer_dirty (struct ufs_buffer_head * ubh) { unsigned i; if (!ubh) return; for ( i = 0; i < ubh->count; i++ ) mark_buffer_dirty (ubh->bh[i]); } void ubh_sync_block(struct ufs_buffer_head *ubh) { if (ubh) { unsigned i; for (i = 0; i < ubh->count; i++) write_dirty_buffer(ubh->bh[i], 0); for (i = 0; i < ubh->count; i++) wait_on_buffer(ubh->bh[i]); } } void ubh_bforget (struct ufs_buffer_head * ubh) { unsigned i; if (!ubh) return; for ( i = 0; i < ubh->count; i++ ) if ( ubh->bh[i] ) bforget (ubh->bh[i]); } int ubh_buffer_dirty (struct ufs_buffer_head * ubh) { unsigned i; unsigned result = 0; if (!ubh) return 0; for ( i = 0; i < ubh->count; i++ ) result |= buffer_dirty(ubh->bh[i]); return result; } dev_t ufs_get_inode_dev(struct super_block *sb, struct ufs_inode_info
*ufsi) { __u32 fs32; dev_t dev; if ((UFS_SB(sb)->s_flags & UFS_ST_MASK) == UFS_ST_SUNx86) fs32 = fs32_to_cpu(sb, ufsi->i_u1.i_data[1]); else fs32 = fs32_to_cpu(sb, ufsi->i_u1.i_data[0]); switch (UFS_SB(sb)->s_flags & UFS_ST_MASK) { case UFS_ST_SUNx86: case UFS_ST_SUN: if ((fs32 & 0xffff0000) == 0 || (fs32 & 0xffff0000) == 0xffff0000) dev = old_decode_dev(fs32 & 0x7fff); else dev = MKDEV(sysv_major(fs32), sysv_minor(fs32)); break; default: dev = old_decode_dev(fs32); break; } return dev; } void ufs_set_inode_dev(struct super_block *sb, struct ufs_inode_info *ufsi, dev_t dev) { __u32 fs32; switch (UFS_SB(sb)->s_flags & UFS_ST_MASK) { case UFS_ST_SUNx86: case UFS_ST_SUN: fs32 = sysv_encode_dev(dev); if ((fs32 & 0xffff8000) == 0) { fs32 = old_encode_dev(dev); } break; default: fs32 = old_encode_dev(dev); break; } if ((UFS_SB(sb)->s_flags & UFS_ST_MASK) == UFS_ST_SUNx86) ufsi->i_u1.i_data[1] = cpu_to_fs32(sb, fs32); else ufsi->i_u1.i_data[0] = cpu_to_fs32(sb, fs32); } /** * ufs_get_locked_folio() - locate, pin and lock a pagecache folio, if not exist * read it from disk. * @mapping: the address_space to search * @index: the page index * * Locates the desired pagecache folio, if not exist we'll read it, * locks it, increments its reference * count and returns its address. * */ struct folio *ufs_get_locked_folio(struct address_space *mapping, pgoff_t index) { struct inode *inode = mapping->host; struct folio *folio = filemap_lock_folio(mapping, index); if (IS_ERR(folio)) { folio = read_mapping_folio(mapping, index, NULL); if (IS_ERR(folio)) { printk(KERN_ERR "ufs_change_blocknr: read_mapping_folio error: ino %lu, index: %lu\n", mapping->host->i_ino, index); return folio; } folio_lock(folio); if (unlikely(folio->mapping == NULL)) { /* Truncate got there first */ folio_unlock(folio); folio_put(folio); return NULL; } } if (!folio_buffers(folio)) create_empty_buffers(folio, 1 << inode->i_blkbits, 0); return folio; } |
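/*
 * Standalone userspace sketch (not part of fs/ufs/util.c, added for
 * illustration): the size checks in _ubh_bread_() and ubh_bread_uspi() above.
 * A request is rejected unless 'size' is a whole number of fragments
 * (size & ~s_fmask must be zero) and the resulting fragment count fits in
 * the UFS_MAXFRAG-sized bh[] array.  The 1024-byte fragment size is an
 * example; the real value comes from the on-disk superblock.
 */
#include <stdint.h>
#include <stdio.h>

#define UFS_MAXFRAG	8	/* size of the bh[] array in ufs_buffer_head */

int main(void)
{
	unsigned int s_fshift = 10;			/* log2(fragment size) */
	uint64_t s_fmask = ~((1ULL << s_fshift) - 1);	/* fragment-aligned bits */
	uint64_t sizes[] = { 1024, 4096, 3000, 16384 };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		uint64_t size = sizes[i];
		uint64_t count = size >> s_fshift;

		if (size & ~s_fmask)
			printf("size %5llu: rejected, not a whole number of fragments\n",
			       (unsigned long long)size);
		else if (count > UFS_MAXFRAG)
			printf("size %5llu: rejected, %llu fragments > UFS_MAXFRAG\n",
			       (unsigned long long)size, (unsigned long long)count);
		else
			printf("size %5llu: sb_bread() called %llu times\n",
			       (unsigned long long)size, (unsigned long long)count);
	}
	return 0;
}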
/* SPDX-License-Identifier: GPL-2.0 */ /* * iommu trace points * * Copyright (C) 2013 Shuah Khan <shuah.kh@samsung.com> * */ #undef TRACE_SYSTEM #define TRACE_SYSTEM iommu #if !defined(_TRACE_IOMMU_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_IOMMU_H #include <linux/tracepoint.h> struct device; DECLARE_EVENT_CLASS(iommu_group_event, TP_PROTO(int group_id, struct device *dev), TP_ARGS(group_id, dev), TP_STRUCT__entry( __field(int, gid) __string(device, dev_name(dev)) ), TP_fast_assign( __entry->gid = group_id; __assign_str(device); ), TP_printk("IOMMU: groupID=%d device=%s", __entry->gid, __get_str(device) ) ); DEFINE_EVENT(iommu_group_event, add_device_to_group, TP_PROTO(int group_id, struct device *dev), TP_ARGS(group_id, dev) ); DEFINE_EVENT(iommu_group_event, remove_device_from_group, TP_PROTO(int group_id, struct device *dev), TP_ARGS(group_id, dev) ); DECLARE_EVENT_CLASS(iommu_device_event, TP_PROTO(struct device *dev), TP_ARGS(dev), TP_STRUCT__entry( __string(device, dev_name(dev)) ), TP_fast_assign( __assign_str(device); ), TP_printk("IOMMU: device=%s", __get_str(device) ) ); DEFINE_EVENT(iommu_device_event, attach_device_to_domain, TP_PROTO(struct device *dev), TP_ARGS(dev) ); TRACE_EVENT(map, TP_PROTO(unsigned long iova, phys_addr_t paddr, size_t size), TP_ARGS(iova, paddr, size), TP_STRUCT__entry( __field(u64, iova) __field(u64, paddr) __field(size_t, size) ), TP_fast_assign( __entry->iova = iova; __entry->paddr = paddr; __entry->size = size; ), TP_printk("IOMMU: iova=0x%016llx - 0x%016llx paddr=0x%016llx size=%zu", __entry->iova, __entry->iova + __entry->size, __entry->paddr, __entry->size ) ); TRACE_EVENT(unmap, TP_PROTO(unsigned long iova, size_t size, size_t unmapped_size), TP_ARGS(iova, size, unmapped_size), TP_STRUCT__entry( __field(u64, iova) __field(size_t, size) __field(size_t, unmapped_size) ), TP_fast_assign( __entry->iova = iova; __entry->size = size; __entry->unmapped_size = unmapped_size; ), TP_printk("IOMMU: iova=0x%016llx - 0x%016llx size=%zu unmapped_size=%zu", __entry->iova, __entry->iova + __entry->size, __entry->size, __entry->unmapped_size ) ); DECLARE_EVENT_CLASS(iommu_error, TP_PROTO(struct device *dev, unsigned long iova, int flags), TP_ARGS(dev, iova, flags), TP_STRUCT__entry( __string(device, dev_name(dev)) __string(driver, dev_driver_string(dev)) __field(u64, iova) __field(int, flags) ), TP_fast_assign( __assign_str(device); __assign_str(driver); __entry->iova = iova; __entry->flags = flags; ), TP_printk("IOMMU:%s %s iova=0x%016llx flags=0x%04x", __get_str(driver), __get_str(device), __entry->iova, __entry->flags ) ); DEFINE_EVENT(iommu_error, io_page_fault, TP_PROTO(struct device *dev, unsigned long iova, int flags), TP_ARGS(dev, iova, flags) ); #endif /* _TRACE_IOMMU_H */ /* This part must be outside protection */ #include <trace/define_trace.h>
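/*
 * Standalone userspace sketch (not part of this header, added for
 * illustration): the events declared above live under the "iommu" trace
 * system, so they can be enabled and read through tracefs.  This assumes
 * tracefs is mounted at /sys/kernel/tracing and the program runs as root;
 * it only consumes the map/unmap events, it does not generate them.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0) {
		perror(path);
		if (fd >= 0)
			close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd, i;

	write_str("/sys/kernel/tracing/events/iommu/map/enable", "1");
	write_str("/sys/kernel/tracing/events/iommu/unmap/enable", "1");

	fd = open("/sys/kernel/tracing/trace_pipe", O_RDONLY);
	if (fd < 0) {
		perror("trace_pipe");
		return 1;
	}
	/* Print a few chunks of the formatted "IOMMU: iova=..." records. */
	for (i = 0; i < 4 && (n = read(fd, buf, sizeof(buf) - 1)) > 0; i++) {
		buf[n] = '\0';
		fputs(buf, stdout);
	}
	close(fd);
	return 0;
}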
// SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 2007 IBM Corporation * * Author: Cedric Le Goater <clg@fr.ibm.com> */ #include <linux/nsproxy.h> #include <linux/ipc_namespace.h> #include <linux/sysctl.h> #include <linux/stat.h> #include <linux/capability.h> #include <linux/slab.h> #include <linux/cred.h> static int msg_max_limit_min = MIN_MSGMAX; static int msg_max_limit_max = HARD_MSGMAX; static int msg_maxsize_limit_min = MIN_MSGSIZEMAX; static int msg_maxsize_limit_max = HARD_MSGSIZEMAX; static const struct ctl_table mq_sysctls[] = { { .procname = "queues_max", .data = &init_ipc_ns.mq_queues_max, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "msg_max", .data = &init_ipc_ns.mq_msg_max, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &msg_max_limit_min, .extra2 = &msg_max_limit_max, }, { .procname = "msgsize_max", .data = &init_ipc_ns.mq_msgsize_max, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &msg_maxsize_limit_min, .extra2 = &msg_maxsize_limit_max, }, { .procname = "msg_default", .data = &init_ipc_ns.mq_msg_default, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &msg_max_limit_min, .extra2 = &msg_max_limit_max, }, { .procname = "msgsize_default", .data = &init_ipc_ns.mq_msgsize_default, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &msg_maxsize_limit_min, .extra2 = &msg_maxsize_limit_max, }, }; static struct ctl_table_set *set_lookup(struct ctl_table_root *root) { return &current->nsproxy->ipc_ns->mq_set; } static int set_is_seen(struct ctl_table_set *set) { return &current->nsproxy->ipc_ns->mq_set == set; } static void mq_set_ownership(struct ctl_table_header *head, kuid_t *uid, kgid_t *gid) { struct ipc_namespace *ns = container_of(head->set, struct ipc_namespace, mq_set); kuid_t ns_root_uid = make_kuid(ns->user_ns, 0); kgid_t ns_root_gid = make_kgid(ns->user_ns, 0); *uid = uid_valid(ns_root_uid) ? ns_root_uid : GLOBAL_ROOT_UID; *gid = gid_valid(ns_root_gid) ?
ns_root_gid : GLOBAL_ROOT_GID; } static int mq_permissions(struct ctl_table_header *head, const struct ctl_table *table) { int mode = table->mode; kuid_t ns_root_uid; kgid_t ns_root_gid; mq_set_ownership(head, &ns_root_uid, &ns_root_gid); if (uid_eq(current_euid(), ns_root_uid)) mode >>= 6; else if (in_egroup_p(ns_root_gid)) mode >>= 3; mode &= 7; return (mode << 6) | (mode << 3) | mode; } static struct ctl_table_root set_root = { .lookup = set_lookup, .permissions = mq_permissions, .set_ownership = mq_set_ownership, }; bool setup_mq_sysctls(struct ipc_namespace *ns) { struct ctl_table *tbl; setup_sysctl_set(&ns->mq_set, &set_root, set_is_seen); tbl = kmemdup(mq_sysctls, sizeof(mq_sysctls), GFP_KERNEL); if (tbl) { int i; for (i = 0; i < ARRAY_SIZE(mq_sysctls); i++) { if (tbl[i].data == &init_ipc_ns.mq_queues_max) tbl[i].data = &ns->mq_queues_max; else if (tbl[i].data == &init_ipc_ns.mq_msg_max) tbl[i].data = &ns->mq_msg_max; else if (tbl[i].data == &init_ipc_ns.mq_msgsize_max) tbl[i].data = &ns->mq_msgsize_max; else if (tbl[i].data == &init_ipc_ns.mq_msg_default) tbl[i].data = &ns->mq_msg_default; else if (tbl[i].data == &init_ipc_ns.mq_msgsize_default) tbl[i].data = &ns->mq_msgsize_default; else tbl[i].data = NULL; } ns->mq_sysctls = __register_sysctl_table(&ns->mq_set, "fs/mqueue", tbl, ARRAY_SIZE(mq_sysctls)); } if (!ns->mq_sysctls) { kfree(tbl); retire_sysctl_set(&ns->mq_set); return false; } return true; } void retire_mq_sysctls(struct ipc_namespace *ns) { const struct ctl_table *tbl; tbl = ns->mq_sysctls->ctl_table_arg; unregister_sysctl_table(ns->mq_sysctls); retire_sysctl_set(&ns->mq_set); kfree(tbl); }
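/*
 * Editorial note: mq_permissions() above reduces the table mode to the
 * triad that applies to the caller and replicates it into all three
 * positions, so the generic sysctl code then grants exactly the
 * namespace-relative rights. A userspace sketch (hypothetical demo_*
 * names) of that transformation:
 */
#include <stdio.h>

enum demo_who { DEMO_NS_ROOT, DEMO_NS_GROUP, DEMO_OTHER };

static int demo_mq_permissions(int mode, enum demo_who who)
{
	if (who == DEMO_NS_ROOT)
		mode >>= 6;		/* owner triad */
	else if (who == DEMO_NS_GROUP)
		mode >>= 3;		/* group triad */
	mode &= 7;
	return (mode << 6) | (mode << 3) | mode;
}

int main(void)
{
	/* a 0644 entry: the namespace root sees 0666, everyone else 0444 */
	printf("%o %o\n",
	       demo_mq_permissions(0644, DEMO_NS_ROOT),
	       demo_mq_permissions(0644, DEMO_OTHER));
	return 0;
}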
// SPDX-License-Identifier: GPL-2.0-or-later /* * Handle bridge arp/nd proxy/suppress * * Copyright (C) 2017 Cumulus Networks * Copyright (c) 2017 Roopa Prabhu <roopa@cumulusnetworks.com> * * Authors: * Roopa Prabhu <roopa@cumulusnetworks.com> */ #include <linux/kernel.h> #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/neighbour.h> #include <net/arp.h> #include <linux/if_vlan.h> #include <linux/inetdevice.h> #include <net/addrconf.h> #include <net/ipv6_stubs.h> #if IS_ENABLED(CONFIG_IPV6) #include <net/ip6_checksum.h> #endif #include "br_private.h" void br_recalculate_neigh_suppress_enabled(struct net_bridge *br) { struct net_bridge_port *p; bool neigh_suppress = false; list_for_each_entry(p, &br->port_list, list) { if (p->flags & (BR_NEIGH_SUPPRESS | BR_NEIGH_VLAN_SUPPRESS)) { neigh_suppress = true; break; } } br_opt_toggle(br, BROPT_NEIGH_SUPPRESS_ENABLED, neigh_suppress); } #if IS_ENABLED(CONFIG_INET) static void br_arp_send(struct net_bridge *br, struct net_bridge_port *p, struct net_device *dev, __be32 dest_ip, __be32 src_ip, const unsigned char *dest_hw, const unsigned char *src_hw, const unsigned char *target_hw, __be16 vlan_proto, u16 vlan_tci) { struct net_bridge_vlan_group *vg; struct sk_buff *skb; u16 pvid; netdev_dbg(dev, "arp send dev %s dst %pI4 dst_hw %pM src %pI4 src_hw %pM\n", dev->name, &dest_ip, dest_hw, &src_ip, src_hw); if (!vlan_tci) { arp_send(ARPOP_REPLY, ETH_P_ARP, dest_ip, dev, src_ip, dest_hw, src_hw, target_hw); return; } skb = arp_create(ARPOP_REPLY, ETH_P_ARP, dest_ip, dev, src_ip, dest_hw, src_hw, target_hw); if
(!skb) return; if (p) vg = nbp_vlan_group_rcu(p); else vg = br_vlan_group_rcu(br); pvid = br_get_pvid(vg); if (pvid == (vlan_tci & VLAN_VID_MASK)) vlan_tci = 0; if (vlan_tci) __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci); if (p) { arp_xmit(skb); } else { skb_reset_mac_header(skb); __skb_pull(skb, skb_network_offset(skb)); skb->ip_summed = CHECKSUM_UNNECESSARY; skb->pkt_type = PACKET_HOST; netif_rx(skb); } } static int br_chk_addr_ip(struct net_device *dev, struct netdev_nested_priv *priv) { __be32 ip = *(__be32 *)priv->data; struct in_device *in_dev; __be32 addr = 0; in_dev = __in_dev_get_rcu(dev); if (in_dev) addr = inet_confirm_addr(dev_net(dev), in_dev, 0, ip, RT_SCOPE_HOST); if (addr == ip) return 1; return 0; } static bool br_is_local_ip(struct net_device *dev, __be32 ip) { struct netdev_nested_priv priv = { .data = (void *)&ip, }; if (br_chk_addr_ip(dev, &priv)) return true; /* check if ip is configured on upper dev */ if (netdev_walk_all_upper_dev_rcu(dev, br_chk_addr_ip, &priv)) return true; return false; } void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br, u16 vid, struct net_bridge_port *p) { struct net_device *dev = br->dev; struct net_device *vlandev = dev; struct neighbour *n; struct arphdr *parp; u8 *arpptr, *sha; __be32 sip, tip; BR_INPUT_SKB_CB(skb)->proxyarp_replied = 0; if ((dev->flags & IFF_NOARP) || !pskb_may_pull(skb, arp_hdr_len(dev))) return; parp = arp_hdr(skb); if (parp->ar_pro != htons(ETH_P_IP) || parp->ar_hln != dev->addr_len || parp->ar_pln != 4) return; arpptr = (u8 *)parp + sizeof(struct arphdr); sha = arpptr; arpptr += dev->addr_len; /* sha */ memcpy(&sip, arpptr, sizeof(sip)); arpptr += sizeof(sip); arpptr += dev->addr_len; /* tha */ memcpy(&tip, arpptr, sizeof(tip)); if (ipv4_is_loopback(tip) || ipv4_is_multicast(tip)) return; if (br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED)) { if (br_is_neigh_suppress_enabled(p, vid)) return; if (is_unicast_ether_addr(eth_hdr(skb)->h_dest) && parp->ar_op == htons(ARPOP_REQUEST)) return; if (parp->ar_op != htons(ARPOP_RREQUEST) && parp->ar_op != htons(ARPOP_RREPLY) && (ipv4_is_zeronet(sip) || sip == tip)) { /* prevent flooding to neigh suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } } if (parp->ar_op != htons(ARPOP_REQUEST)) return; if (vid != 0) { vlandev = __vlan_find_dev_deep_rcu(br->dev, skb->vlan_proto, vid); if (!vlandev) return; } if (br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED) && br_is_local_ip(vlandev, tip)) { /* it's our local ip, so don't proxy reply * and don't forward to neigh suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } n = neigh_lookup(&arp_tbl, &tip, vlandev); if (n) { struct net_bridge_fdb_entry *f; if (!(READ_ONCE(n->nud_state) & NUD_VALID)) { neigh_release(n); return; } f = br_fdb_find_rcu(br, n->ha, vid); if (f) { bool replied = false; if ((p && (p->flags & BR_PROXYARP)) || (f->dst && (f->dst->flags & BR_PROXYARP_WIFI)) || br_is_neigh_suppress_enabled(f->dst, vid)) { if (!vid) br_arp_send(br, p, skb->dev, sip, tip, sha, n->ha, sha, 0, 0); else br_arp_send(br, p, skb->dev, sip, tip, sha, n->ha, sha, skb->vlan_proto, skb_vlan_tag_get(skb)); replied = true; } /* If we have replied, or as long as we know the * mac, mark the ARP as replied */ if (replied || br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED)) BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; } neigh_release(n); } } #endif #if IS_ENABLED(CONFIG_IPV6) struct nd_msg *br_is_nd_neigh_msg(const struct sk_buff *skb, struct nd_msg *msg) { struct nd_msg *m; m = skb_header_pointer(skb,
skb_network_offset(skb) + sizeof(struct ipv6hdr), sizeof(*msg), msg); if (!m) return NULL; if (m->icmph.icmp6_code != 0 || (m->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION && m->icmph.icmp6_type != NDISC_NEIGHBOUR_ADVERTISEMENT)) return NULL; return m; } static void br_nd_send(struct net_bridge *br, struct net_bridge_port *p, struct sk_buff *request, struct neighbour *n, __be16 vlan_proto, u16 vlan_tci, struct nd_msg *ns) { struct net_device *dev = request->dev; struct net_bridge_vlan_group *vg; struct sk_buff *reply; struct nd_msg *na; struct ipv6hdr *pip6; int na_olen = 8; /* opt hdr + ETH_ALEN for target */ int ns_olen; int i, len; u8 *daddr; u16 pvid; if (!dev) return; len = LL_RESERVED_SPACE(dev) + sizeof(struct ipv6hdr) + sizeof(*na) + na_olen + dev->needed_tailroom; reply = alloc_skb(len, GFP_ATOMIC); if (!reply) return; reply->protocol = htons(ETH_P_IPV6); reply->dev = dev; skb_reserve(reply, LL_RESERVED_SPACE(dev)); skb_push(reply, sizeof(struct ethhdr)); skb_set_mac_header(reply, 0); daddr = eth_hdr(request)->h_source; /* Do we need option processing ? */ ns_olen = request->len - (skb_network_offset(request) + sizeof(struct ipv6hdr)) - sizeof(*ns); for (i = 0; i < ns_olen - 1; i += (ns->opt[i + 1] << 3)) { if (!ns->opt[i + 1]) { kfree_skb(reply); return; } if (ns->opt[i] == ND_OPT_SOURCE_LL_ADDR) { daddr = ns->opt + i + sizeof(struct nd_opt_hdr); break; } } /* Ethernet header */ ether_addr_copy(eth_hdr(reply)->h_dest, daddr); ether_addr_copy(eth_hdr(reply)->h_source, n->ha); eth_hdr(reply)->h_proto = htons(ETH_P_IPV6); reply->protocol = htons(ETH_P_IPV6); skb_pull(reply, sizeof(struct ethhdr)); skb_set_network_header(reply, 0); skb_put(reply, sizeof(struct ipv6hdr)); /* IPv6 header */ pip6 = ipv6_hdr(reply); memset(pip6, 0, sizeof(struct ipv6hdr)); pip6->version = 6; pip6->priority = ipv6_hdr(request)->priority; pip6->nexthdr = IPPROTO_ICMPV6; pip6->hop_limit = 255; pip6->daddr = ipv6_hdr(request)->saddr; pip6->saddr = *(struct in6_addr *)n->primary_key; skb_pull(reply, sizeof(struct ipv6hdr)); skb_set_transport_header(reply, 0); na = (struct nd_msg *)skb_put(reply, sizeof(*na) + na_olen); /* Neighbor Advertisement */ memset(na, 0, sizeof(*na) + na_olen); na->icmph.icmp6_type = NDISC_NEIGHBOUR_ADVERTISEMENT; na->icmph.icmp6_router = (n->flags & NTF_ROUTER) ? 
1 : 0; na->icmph.icmp6_override = 1; na->icmph.icmp6_solicited = 1; na->target = ns->target; ether_addr_copy(&na->opt[2], n->ha); na->opt[0] = ND_OPT_TARGET_LL_ADDR; na->opt[1] = na_olen >> 3; na->icmph.icmp6_cksum = csum_ipv6_magic(&pip6->saddr, &pip6->daddr, sizeof(*na) + na_olen, IPPROTO_ICMPV6, csum_partial(na, sizeof(*na) + na_olen, 0)); pip6->payload_len = htons(sizeof(*na) + na_olen); skb_push(reply, sizeof(struct ipv6hdr)); skb_push(reply, sizeof(struct ethhdr)); reply->ip_summed = CHECKSUM_UNNECESSARY; if (p) vg = nbp_vlan_group_rcu(p); else vg = br_vlan_group_rcu(br); pvid = br_get_pvid(vg); if (pvid == (vlan_tci & VLAN_VID_MASK)) vlan_tci = 0; if (vlan_tci) __vlan_hwaccel_put_tag(reply, vlan_proto, vlan_tci); netdev_dbg(dev, "nd send dev %s dst %pI6 dst_hw %pM src %pI6 src_hw %pM\n", dev->name, &pip6->daddr, daddr, &pip6->saddr, n->ha); if (p) { dev_queue_xmit(reply); } else { skb_reset_mac_header(reply); __skb_pull(reply, skb_network_offset(reply)); reply->ip_summed = CHECKSUM_UNNECESSARY; reply->pkt_type = PACKET_HOST; netif_rx(reply); } } static int br_chk_addr_ip6(struct net_device *dev, struct netdev_nested_priv *priv) { struct in6_addr *addr = (struct in6_addr *)priv->data; if (ipv6_chk_addr(dev_net(dev), addr, dev, 0)) return 1; return 0; } static bool br_is_local_ip6(struct net_device *dev, struct in6_addr *addr) { struct netdev_nested_priv priv = { .data = (void *)addr, }; if (br_chk_addr_ip6(dev, &priv)) return true; /* check if ip is configured on upper dev */ if (netdev_walk_all_upper_dev_rcu(dev, br_chk_addr_ip6, &priv)) return true; return false; } void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br, u16 vid, struct net_bridge_port *p, struct nd_msg *msg) { struct net_device *dev = br->dev; struct net_device *vlandev = NULL; struct in6_addr *saddr, *daddr; struct ipv6hdr *iphdr; struct neighbour *n; BR_INPUT_SKB_CB(skb)->proxyarp_replied = 0; if (br_is_neigh_suppress_enabled(p, vid)) return; if (is_unicast_ether_addr(eth_hdr(skb)->h_dest) && msg->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) return; if (msg->icmph.icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT && !msg->icmph.icmp6_solicited) { /* prevent flooding to neigh suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } if (msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION) return; iphdr = ipv6_hdr(skb); saddr = &iphdr->saddr; daddr = &iphdr->daddr; if (ipv6_addr_any(saddr) || !ipv6_addr_cmp(saddr, daddr)) { /* prevent flooding to neigh suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } if (vid != 0) { /* build neigh table lookup on the vlan device */ vlandev = __vlan_find_dev_deep_rcu(br->dev, skb->vlan_proto, vid); if (!vlandev) return; } else { vlandev = dev; } if (br_is_local_ip6(vlandev, &msg->target)) { /* it's our own ip, so don't proxy reply * and don't forward to arp suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } n = neigh_lookup(ipv6_stub->nd_tbl, &msg->target, vlandev); if (n) { struct net_bridge_fdb_entry *f; if (!(READ_ONCE(n->nud_state) & NUD_VALID)) { neigh_release(n); return; } f = br_fdb_find_rcu(br, n->ha, vid); if (f) { bool replied = false; if (br_is_neigh_suppress_enabled(f->dst, vid)) { if (vid != 0) br_nd_send(br, p, skb, n, skb->vlan_proto, skb_vlan_tag_get(skb), msg); else br_nd_send(br, p, skb, n, 0, 0, msg); replied = true; } /* If we have replied or as long as we know the * mac, indicate to NEIGH_SUPPRESS ports that we * have replied */ if (replied || br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED))
BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; } neigh_release(n); } } #endif bool br_is_neigh_suppress_enabled(const struct net_bridge_port *p, u16 vid) { if (!p) return false; if (!vid) return !!(p->flags & BR_NEIGH_SUPPRESS); if (p->flags & BR_NEIGH_VLAN_SUPPRESS) { struct net_bridge_vlan_group *vg = nbp_vlan_group_rcu(p); struct net_bridge_vlan *v; v = br_vlan_find(vg, vid); if (!v) return false; return !!(v->priv_flags & BR_VLFLAG_NEIGH_SUPPRESS_ENABLED); } else { return !!(p->flags & BR_NEIGH_SUPPRESS); } }
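/*
 * Editorial note: br_do_proxy_suppress_arp() above walks the variable part
 * of the ARP packet by hand, because struct arphdr only covers the fixed
 * header; sender/target addresses follow it back to back. A userspace
 * sketch of the same walk, assuming Ethernet ARP over IPv4 (hardware
 * address length 6, protocol address length 4); names are hypothetical.
 */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define DEMO_ETH_ALEN 6

struct demo_arp_fields {
	uint8_t sha[DEMO_ETH_ALEN];	/* sender hardware address */
	uint8_t sip[4];			/* sender IPv4 address */
	uint8_t tha[DEMO_ETH_ALEN];	/* target hardware address */
	uint8_t tip[4];			/* target IPv4 address */
};

/* @p points just past the fixed ARP header, as arpptr does above */
static void demo_parse_arp_payload(const uint8_t *p, struct demo_arp_fields *out)
{
	memcpy(out->sha, p, DEMO_ETH_ALEN); p += DEMO_ETH_ALEN;
	memcpy(out->sip, p, 4);             p += 4;
	memcpy(out->tha, p, DEMO_ETH_ALEN); p += DEMO_ETH_ALEN;
	memcpy(out->tip, p, 4);
}

int main(void)
{
	const uint8_t payload[] = {
		0x02, 0x00, 0x00, 0x00, 0x00, 0x01,	/* sha */
		192, 168, 0, 1,				/* sip */
		0, 0, 0, 0, 0, 0,			/* tha: unknown */
		192, 168, 0, 2,				/* tip */
	};
	struct demo_arp_fields f;

	demo_parse_arp_payload(payload, &f);
	printf("who has %u.%u.%u.%u? tell %u.%u.%u.%u\n",
	       f.tip[0], f.tip[1], f.tip[2], f.tip[3],
	       f.sip[0], f.sip[1], f.sip[2], f.sip[3]);
	return 0;
}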
// SPDX-License-Identifier: GPL-2.0 /* Multipath TCP * * Copyright (c) 2025, Matthieu Baerts.
*/ #define pr_fmt(fmt) "MPTCP: " fmt #include <net/netns/generic.h> #include "protocol.h" #include "mib.h" #include "mptcp_pm_gen.h" static int pm_nl_pernet_id; struct pm_nl_pernet { /* protects pernet updates */ spinlock_t lock; struct list_head local_addr_list; unsigned int addrs; unsigned int stale_loss_cnt; unsigned int add_addr_signal_max; unsigned int add_addr_accept_max; unsigned int local_addr_max; unsigned int subflows_max; unsigned int next_id; DECLARE_BITMAP(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); }; #define MPTCP_PM_ADDR_MAX 8 static struct pm_nl_pernet *pm_nl_get_pernet(const struct net *net) { return net_generic(net, pm_nl_pernet_id); } static struct pm_nl_pernet * pm_nl_get_pernet_from_msk(const struct mptcp_sock *msk) { return pm_nl_get_pernet(sock_net((struct sock *)msk)); } static struct pm_nl_pernet *genl_info_pm_nl(struct genl_info *info) { return pm_nl_get_pernet(genl_info_net(info)); } unsigned int mptcp_pm_get_add_addr_signal_max(const struct mptcp_sock *msk) { const struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); return READ_ONCE(pernet->add_addr_signal_max); } EXPORT_SYMBOL_GPL(mptcp_pm_get_add_addr_signal_max); unsigned int mptcp_pm_get_add_addr_accept_max(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); return READ_ONCE(pernet->add_addr_accept_max); } EXPORT_SYMBOL_GPL(mptcp_pm_get_add_addr_accept_max); unsigned int mptcp_pm_get_subflows_max(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); return READ_ONCE(pernet->subflows_max); } EXPORT_SYMBOL_GPL(mptcp_pm_get_subflows_max); unsigned int mptcp_pm_get_local_addr_max(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); return READ_ONCE(pernet->local_addr_max); } EXPORT_SYMBOL_GPL(mptcp_pm_get_local_addr_max); static bool lookup_subflow_by_daddr(const struct list_head *list, const struct mptcp_addr_info *daddr) { struct mptcp_subflow_context *subflow; struct mptcp_addr_info cur; list_for_each_entry(subflow, list, node) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); if (!((1 << inet_sk_state_load(ssk)) & (TCPF_ESTABLISHED | TCPF_SYN_SENT | TCPF_SYN_RECV))) continue; mptcp_remote_address((struct sock_common *)ssk, &cur); if (mptcp_addresses_equal(&cur, daddr, daddr->port)) return true; } return false; } static bool select_local_address(const struct pm_nl_pernet *pernet, const struct mptcp_sock *msk, struct mptcp_pm_local *new_local) { struct mptcp_pm_addr_entry *entry; bool found = false; msk_owned_by_me(msk); rcu_read_lock(); list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) continue; if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap)) continue; new_local->addr = entry->addr; new_local->flags = entry->flags; new_local->ifindex = entry->ifindex; found = true; break; } rcu_read_unlock(); return found; } static bool select_signal_address(struct pm_nl_pernet *pernet, const struct mptcp_sock *msk, struct mptcp_pm_local *new_local) { struct mptcp_pm_addr_entry *entry; bool found = false; rcu_read_lock(); /* do not keep any additional per socket state, just signal * the address list in order. * Note: removal from the local address list during the msk life-cycle * can lead to additional addresses not being announced. 
*/ list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap)) continue; if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) continue; new_local->addr = entry->addr; new_local->flags = entry->flags; new_local->ifindex = entry->ifindex; found = true; break; } rcu_read_unlock(); return found; } /* Fill all the remote addresses into the array addrs[], * and return the array size. */ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *local, bool fullmesh, struct mptcp_addr_info *addrs) { bool deny_id0 = READ_ONCE(msk->pm.remote_deny_join_id0); struct sock *sk = (struct sock *)msk, *ssk; struct mptcp_subflow_context *subflow; struct mptcp_addr_info remote = { 0 }; unsigned int subflows_max; int i = 0; subflows_max = mptcp_pm_get_subflows_max(msk); mptcp_remote_address((struct sock_common *)sk, &remote); /* Non-fullmesh endpoint, fill in the single entry * corresponding to the primary MPC subflow remote address */ if (!fullmesh) { if (deny_id0) return 0; if (!mptcp_pm_addr_families_match(sk, local, &remote)) return 0; msk->pm.subflows++; addrs[i++] = remote; } else { DECLARE_BITMAP(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); /* Forbid creation of new subflows matching existing * ones, possibly already created by incoming ADD_ADDR */ bitmap_zero(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); mptcp_for_each_subflow(msk, subflow) if (READ_ONCE(subflow->local_id) == local->id) __set_bit(subflow->remote_id, unavail_id); mptcp_for_each_subflow(msk, subflow) { ssk = mptcp_subflow_tcp_sock(subflow); mptcp_remote_address((struct sock_common *)ssk, &addrs[i]); addrs[i].id = READ_ONCE(subflow->remote_id); if (deny_id0 && !addrs[i].id) continue; if (test_bit(addrs[i].id, unavail_id)) continue; if (!mptcp_pm_addr_families_match(sk, local, &addrs[i])) continue; if (msk->pm.subflows < subflows_max) { /* forbid creating multiple addresses towards * this id */ __set_bit(addrs[i].id, unavail_id); msk->pm.subflows++; i++; } } } return i; } static struct mptcp_pm_addr_entry * __lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id) { struct mptcp_pm_addr_entry *entry; list_for_each_entry_rcu(entry, &pernet->local_addr_list, list, lockdep_is_held(&pernet->lock)) { if (entry->addr.id == id) return entry; } return NULL; } static struct mptcp_pm_addr_entry * __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) { struct mptcp_pm_addr_entry *entry; list_for_each_entry_rcu(entry, &pernet->local_addr_list, list, lockdep_is_held(&pernet->lock)) { if (mptcp_addresses_equal(&entry->addr, info, entry->addr.port)) return entry; } return NULL; } static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { struct sock *sk = (struct sock *)msk; unsigned int add_addr_signal_max; bool signal_and_subflow = false; unsigned int local_addr_max; struct pm_nl_pernet *pernet; struct mptcp_pm_local local; unsigned int subflows_max; pernet = pm_nl_get_pernet(sock_net(sk)); add_addr_signal_max = mptcp_pm_get_add_addr_signal_max(msk); local_addr_max = mptcp_pm_get_local_addr_max(msk); subflows_max = mptcp_pm_get_subflows_max(msk); /* do lazy endpoint usage accounting for the MPC subflows */ if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(msk->first); struct mptcp_pm_addr_entry *entry; struct mptcp_addr_info mpc_addr; bool backup = false; mptcp_local_address((struct sock_common *)msk->first, &mpc_addr);
rcu_read_lock(); entry = __lookup_addr(pernet, &mpc_addr); if (entry) { __clear_bit(entry->addr.id, msk->pm.id_avail_bitmap); msk->mpc_endpoint_id = entry->addr.id; backup = !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); } rcu_read_unlock(); if (backup) mptcp_pm_send_ack(msk, subflow, true, backup); msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED); } pr_debug("local %d:%d signal %d:%d subflows %d:%d\n", msk->pm.local_addr_used, local_addr_max, msk->pm.add_addr_signaled, add_addr_signal_max, msk->pm.subflows, subflows_max); /* check first for announce */ if (msk->pm.add_addr_signaled < add_addr_signal_max) { /* due to racing events on both ends we can reach here while * previous add address is still running: if we invoke now * mptcp_pm_announce_addr(), that will fail and the * corresponding id will be marked as used. * Instead let the PM machinery reschedule us when the * current address announce will be completed. */ if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL)) return; if (!select_signal_address(pernet, msk, &local)) goto subflow; /* If the alloc fails, we are on memory pressure, not worth * continuing, and trying to create subflows. */ if (!mptcp_pm_alloc_anno_list(msk, &local.addr)) return; __clear_bit(local.addr.id, msk->pm.id_avail_bitmap); msk->pm.add_addr_signaled++; /* Special case for ID0: set the correct ID */ if (local.addr.id == msk->mpc_endpoint_id) local.addr.id = 0; mptcp_pm_announce_addr(msk, &local.addr, false); mptcp_pm_addr_send_ack(msk); if (local.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) signal_and_subflow = true; } subflow: /* check if should create a new subflow */ while (msk->pm.local_addr_used < local_addr_max && msk->pm.subflows < subflows_max) { struct mptcp_addr_info addrs[MPTCP_PM_ADDR_MAX]; bool fullmesh; int i, nr; if (signal_and_subflow) signal_and_subflow = false; else if (!select_local_address(pernet, msk, &local)) break; fullmesh = !!(local.flags & MPTCP_PM_ADDR_FLAG_FULLMESH); __clear_bit(local.addr.id, msk->pm.id_avail_bitmap); /* Special case for ID0: set the correct ID */ if (local.addr.id == msk->mpc_endpoint_id) local.addr.id = 0; else /* local_addr_used is not decr for ID 0 */ msk->pm.local_addr_used++; nr = fill_remote_addresses_vec(msk, &local.addr, fullmesh, addrs); if (nr == 0) continue; spin_unlock_bh(&msk->pm.lock); for (i = 0; i < nr; i++) __mptcp_subflow_connect(sk, &local, &addrs[i]); spin_lock_bh(&msk->pm.lock); } mptcp_pm_nl_check_work_pending(msk); } static void mptcp_pm_nl_fully_established(struct mptcp_sock *msk) { mptcp_pm_create_subflow_or_signal_addr(msk); } static void mptcp_pm_nl_subflow_established(struct mptcp_sock *msk) { mptcp_pm_create_subflow_or_signal_addr(msk); } /* Fill all the local addresses into the array addrs[], * and return the array size. 
*/ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *remote, struct mptcp_pm_local *locals) { struct sock *sk = (struct sock *)msk; struct mptcp_pm_addr_entry *entry; struct mptcp_addr_info mpc_addr; struct pm_nl_pernet *pernet; unsigned int subflows_max; int i = 0; pernet = pm_nl_get_pernet_from_msk(msk); subflows_max = mptcp_pm_get_subflows_max(msk); mptcp_local_address((struct sock_common *)msk, &mpc_addr); rcu_read_lock(); list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { if (!(entry->flags & MPTCP_PM_ADDR_FLAG_FULLMESH)) continue; if (!mptcp_pm_addr_families_match(sk, &entry->addr, remote)) continue; if (msk->pm.subflows < subflows_max) { locals[i].addr = entry->addr; locals[i].flags = entry->flags; locals[i].ifindex = entry->ifindex; /* Special case for ID0: set the correct ID */ if (mptcp_addresses_equal(&locals[i].addr, &mpc_addr, locals[i].addr.port)) locals[i].addr.id = 0; msk->pm.subflows++; i++; } } rcu_read_unlock(); /* If the array is empty, fill in the single * 'IPADDRANY' local address */ if (!i) { memset(&locals[i], 0, sizeof(locals[i])); locals[i].addr.family = #if IS_ENABLED(CONFIG_MPTCP_IPV6) remote->family == AF_INET6 && ipv6_addr_v4mapped(&remote->addr6) ? AF_INET : #endif remote->family; if (!mptcp_pm_addr_families_match(sk, &locals[i].addr, remote)) return 0; msk->pm.subflows++; i++; } return i; } static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) { struct mptcp_pm_local locals[MPTCP_PM_ADDR_MAX]; struct sock *sk = (struct sock *)msk; unsigned int add_addr_accept_max; struct mptcp_addr_info remote; unsigned int subflows_max; bool sf_created = false; int i, nr; add_addr_accept_max = mptcp_pm_get_add_addr_accept_max(msk); subflows_max = mptcp_pm_get_subflows_max(msk); pr_debug("accepted %d:%d remote family %d\n", msk->pm.add_addr_accepted, add_addr_accept_max, msk->pm.remote.family); remote = msk->pm.remote; mptcp_pm_announce_addr(msk, &remote, true); mptcp_pm_addr_send_ack(msk); if (lookup_subflow_by_daddr(&msk->conn_list, &remote)) return; /* pick id 0 port, if none is provided the remote address */ if (!remote.port) remote.port = sk->sk_dport; /* connect to the specified remote address, using whatever * local address the routing configuration will pick. */ nr = fill_local_addresses_vec(msk, &remote, locals); if (nr == 0) return; spin_unlock_bh(&msk->pm.lock); for (i = 0; i < nr; i++) if (__mptcp_subflow_connect(sk, &locals[i], &remote) == 0) sf_created = true; spin_lock_bh(&msk->pm.lock); if (sf_created) { /* add_addr_accepted is not decr for ID 0 */ if (remote.id) msk->pm.add_addr_accepted++; if (msk->pm.add_addr_accepted >= add_addr_accept_max || msk->pm.subflows >= subflows_max) WRITE_ONCE(msk->pm.accept_addr, false); } } void mptcp_pm_nl_rm_addr(struct mptcp_sock *msk, u8 rm_id) { if (rm_id && WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)) { /* Note: if the subflow has been closed before, this * add_addr_accepted counter will not be decremented. 
*/ if (--msk->pm.add_addr_accepted < mptcp_pm_get_add_addr_accept_max(msk)) WRITE_ONCE(msk->pm.accept_addr, true); } } static bool address_use_port(struct mptcp_pm_addr_entry *entry) { return (entry->flags & (MPTCP_PM_ADDR_FLAG_SIGNAL | MPTCP_PM_ADDR_FLAG_SUBFLOW)) == MPTCP_PM_ADDR_FLAG_SIGNAL; } /* caller must ensure the RCU grace period is already elapsed */ static void __mptcp_pm_release_addr_entry(struct mptcp_pm_addr_entry *entry) { if (entry->lsk) sock_release(entry->lsk); kfree(entry); } static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, struct mptcp_pm_addr_entry *entry, bool needs_id, bool replace) { struct mptcp_pm_addr_entry *cur, *del_entry = NULL; unsigned int addr_max; int ret = -EINVAL; spin_lock_bh(&pernet->lock); /* to keep the code simple, don't do IDR-like allocation for address ID, * just bail when we exceed limits */ if (pernet->next_id == MPTCP_PM_MAX_ADDR_ID) pernet->next_id = 1; if (pernet->addrs >= MPTCP_PM_ADDR_MAX) { ret = -ERANGE; goto out; } if (test_bit(entry->addr.id, pernet->id_bitmap)) { ret = -EBUSY; goto out; } /* do not insert a duplicate address; only signal endpoints that use a * dedicated port are differentiated on it */ if (!address_use_port(entry)) entry->addr.port = 0; list_for_each_entry(cur, &pernet->local_addr_list, list) { if (mptcp_addresses_equal(&cur->addr, &entry->addr, cur->addr.port || entry->addr.port)) { /* allow replacing the existing endpoint only if such * endpoint is an implicit one and the user-space * did not provide an endpoint id */ if (!(cur->flags & MPTCP_PM_ADDR_FLAG_IMPLICIT)) { ret = -EEXIST; goto out; } if (entry->addr.id) goto out; /* allow callers that only need to look up the local * addr's id to skip replacement. This allows them to * avoid calling synchronize_rcu in the packet recv * path.
*/ if (!replace) { kfree(entry); ret = cur->addr.id; goto out; } pernet->addrs--; entry->addr.id = cur->addr.id; list_del_rcu(&cur->list); del_entry = cur; break; } } if (!entry->addr.id && needs_id) { find_next: entry->addr.id = find_next_zero_bit(pernet->id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1, pernet->next_id); if (!entry->addr.id && pernet->next_id != 1) { pernet->next_id = 1; goto find_next; } } if (!entry->addr.id && needs_id) goto out; __set_bit(entry->addr.id, pernet->id_bitmap); if (entry->addr.id > pernet->next_id) pernet->next_id = entry->addr.id; if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL) { addr_max = pernet->add_addr_signal_max; WRITE_ONCE(pernet->add_addr_signal_max, addr_max + 1); } if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { addr_max = pernet->local_addr_max; WRITE_ONCE(pernet->local_addr_max, addr_max + 1); } pernet->addrs++; if (!entry->addr.port) list_add_tail_rcu(&entry->list, &pernet->local_addr_list); else list_add_rcu(&entry->list, &pernet->local_addr_list); ret = entry->addr.id; out: spin_unlock_bh(&pernet->lock); /* just replaced an existing entry, free it */ if (del_entry) { synchronize_rcu(); __mptcp_pm_release_addr_entry(del_entry); } return ret; } static struct lock_class_key mptcp_slock_keys[2]; static struct lock_class_key mptcp_keys[2]; static int mptcp_pm_nl_create_listen_socket(struct sock *sk, struct mptcp_pm_addr_entry *entry) { bool is_ipv6 = sk->sk_family == AF_INET6; int addrlen = sizeof(struct sockaddr_in); struct sockaddr_storage addr; struct sock *newsk, *ssk; int backlog = 1024; int err; err = sock_create_kern(sock_net(sk), entry->addr.family, SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk); if (err) return err; newsk = entry->lsk->sk; if (!newsk) return -EINVAL; /* The subflow socket lock is acquired in a nested to the msk one * in several places, even by the TCP stack, and this msk is a kernel * socket: lockdep complains. Instead of propagating the _nested * modifiers in several places, re-init the lock class for the msk * socket to an mptcp specific one. */ sock_lock_init_class_and_name(newsk, is_ipv6 ? "mlock-AF_INET6" : "mlock-AF_INET", &mptcp_slock_keys[is_ipv6], is_ipv6 ? "msk_lock-AF_INET6" : "msk_lock-AF_INET", &mptcp_keys[is_ipv6]); lock_sock(newsk); ssk = __mptcp_nmpc_sk(mptcp_sk(newsk)); release_sock(newsk); if (IS_ERR(ssk)) return PTR_ERR(ssk); mptcp_info2sockaddr(&entry->addr, &addr, entry->addr.family); #if IS_ENABLED(CONFIG_MPTCP_IPV6) if (entry->addr.family == AF_INET6) addrlen = sizeof(struct sockaddr_in6); #endif if (ssk->sk_family == AF_INET) err = inet_bind_sk(ssk, (struct sockaddr *)&addr, addrlen); #if IS_ENABLED(CONFIG_MPTCP_IPV6) else if (ssk->sk_family == AF_INET6) err = inet6_bind_sk(ssk, (struct sockaddr *)&addr, addrlen); #endif if (err) return err; /* We don't use mptcp_set_state() here because it needs to be called * under the msk socket lock. For the moment, that will not bring * anything more than only calling inet_sk_state_store(), because the * old status is known (TCP_CLOSE). */ inet_sk_state_store(newsk, TCP_LISTEN); lock_sock(ssk); WRITE_ONCE(mptcp_subflow_ctx(ssk)->pm_listener, true); err = __inet_listen_sk(ssk, backlog); if (!err) mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CREATED); release_sock(ssk); return err; } int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_pm_addr_entry *skc) { struct mptcp_pm_addr_entry *entry; struct pm_nl_pernet *pernet; int ret; pernet = pm_nl_get_pernet_from_msk(msk); rcu_read_lock(); entry = __lookup_addr(pernet, &skc->addr); ret = entry ? 
entry->addr.id : -1; rcu_read_unlock(); if (ret >= 0) return ret; /* address not found, add to local list */ entry = kmemdup(skc, sizeof(*skc), GFP_ATOMIC); if (!entry) return -ENOMEM; entry->addr.port = 0; ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true, false); if (ret < 0) kfree(entry); return ret; } bool mptcp_pm_nl_is_backup(struct mptcp_sock *msk, struct mptcp_addr_info *skc) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); struct mptcp_pm_addr_entry *entry; bool backup; rcu_read_lock(); entry = __lookup_addr(pernet, skc); backup = entry && !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); rcu_read_unlock(); return backup; } static int mptcp_nl_add_subflow_or_signal_addr(struct net *net, struct mptcp_addr_info *addr) { struct mptcp_sock *msk; long s_slot = 0, s_num = 0; while ((msk = mptcp_token_iter_next(net, &s_slot, &s_num)) != NULL) { struct sock *sk = (struct sock *)msk; struct mptcp_addr_info mpc_addr; if (!READ_ONCE(msk->fully_established) || mptcp_pm_is_userspace(msk)) goto next; /* if the endp linked to the init sf is re-added with a != ID */ mptcp_local_address((struct sock_common *)msk, &mpc_addr); lock_sock(sk); spin_lock_bh(&msk->pm.lock); if (mptcp_addresses_equal(addr, &mpc_addr, addr->port)) msk->mpc_endpoint_id = addr->id; mptcp_pm_create_subflow_or_signal_addr(msk); spin_unlock_bh(&msk->pm.lock); release_sock(sk); next: sock_put(sk); cond_resched(); } return 0; } static bool mptcp_pm_has_addr_attr_id(const struct nlattr *attr, struct genl_info *info) { struct nlattr *tb[MPTCP_PM_ADDR_ATTR_MAX + 1]; if (!nla_parse_nested_deprecated(tb, MPTCP_PM_ADDR_ATTR_MAX, attr, mptcp_pm_address_nl_policy, info->extack) && tb[MPTCP_PM_ADDR_ATTR_ID]) return true; return false; } /* Add an MPTCP endpoint */ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info) { struct pm_nl_pernet *pernet = genl_info_pm_nl(info); struct mptcp_pm_addr_entry addr, *entry; struct nlattr *attr; int ret; if (GENL_REQ_ATTR_CHECK(info, MPTCP_PM_ENDPOINT_ADDR)) return -EINVAL; attr = info->attrs[MPTCP_PM_ENDPOINT_ADDR]; ret = mptcp_pm_parse_entry(attr, info, true, &addr); if (ret < 0) return ret; if (addr.addr.port && !address_use_port(&addr)) { NL_SET_ERR_MSG_ATTR(info->extack, attr, "flags must have signal and not subflow when using port"); return -EINVAL; } if (addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL && addr.flags & MPTCP_PM_ADDR_FLAG_FULLMESH) { NL_SET_ERR_MSG_ATTR(info->extack, attr, "flags mustn't have both signal and fullmesh"); return -EINVAL; } if (addr.flags & MPTCP_PM_ADDR_FLAG_IMPLICIT) { NL_SET_ERR_MSG_ATTR(info->extack, attr, "can't create IMPLICIT endpoint"); return -EINVAL; } entry = kmemdup(&addr, sizeof(addr), GFP_KERNEL_ACCOUNT); if (!entry) { GENL_SET_ERR_MSG(info, "can't allocate addr"); return -ENOMEM; } if (entry->addr.port) { ret = mptcp_pm_nl_create_listen_socket(skb->sk, entry); if (ret) { GENL_SET_ERR_MSG_FMT(info, "create listen socket error: %d", ret); goto out_free; } } ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, !mptcp_pm_has_addr_attr_id(attr, info), true); if (ret < 0) { GENL_SET_ERR_MSG_FMT(info, "too many addresses or duplicate one: %d", ret); goto out_free; } mptcp_nl_add_subflow_or_signal_addr(sock_net(skb->sk), &entry->addr); return 0; out_free: __mptcp_pm_release_addr_entry(entry); return ret; } static u8 mptcp_endp_get_local_id(struct mptcp_sock *msk, const struct mptcp_addr_info *addr) { return msk->mpc_endpoint_id == addr->id ? 
0 : addr->id; } static bool mptcp_pm_remove_anno_addr(struct mptcp_sock *msk, const struct mptcp_addr_info *addr, bool force) { struct mptcp_rm_list list = { .nr = 0 }; bool ret; list.ids[list.nr++] = mptcp_endp_get_local_id(msk, addr); ret = mptcp_remove_anno_list_by_saddr(msk, addr); if (ret || force) { spin_lock_bh(&msk->pm.lock); if (ret) { __set_bit(addr->id, msk->pm.id_avail_bitmap); msk->pm.add_addr_signaled--; } mptcp_pm_remove_addr(msk, &list); spin_unlock_bh(&msk->pm.lock); } return ret; } static void __mark_subflow_endp_available(struct mptcp_sock *msk, u8 id) { /* If it was marked as used, and not ID 0, decrement local_addr_used */ if (!__test_and_set_bit(id ? : msk->mpc_endpoint_id, msk->pm.id_avail_bitmap) && id && !WARN_ON_ONCE(msk->pm.local_addr_used == 0)) msk->pm.local_addr_used--; } static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net, const struct mptcp_pm_addr_entry *entry) { const struct mptcp_addr_info *addr = &entry->addr; struct mptcp_rm_list list = { .nr = 1 }; long s_slot = 0, s_num = 0; struct mptcp_sock *msk; pr_debug("remove_id=%d\n", addr->id); while ((msk = mptcp_token_iter_next(net, &s_slot, &s_num)) != NULL) { struct sock *sk = (struct sock *)msk; bool remove_subflow; if (mptcp_pm_is_userspace(msk)) goto next; lock_sock(sk); remove_subflow = mptcp_lookup_subflow_by_saddr(&msk->conn_list, addr); mptcp_pm_remove_anno_addr(msk, addr, remove_subflow && !(entry->flags & MPTCP_PM_ADDR_FLAG_IMPLICIT)); list.ids[0] = mptcp_endp_get_local_id(msk, addr); if (remove_subflow) { spin_lock_bh(&msk->pm.lock); mptcp_pm_rm_subflow(msk, &list); spin_unlock_bh(&msk->pm.lock); } if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { spin_lock_bh(&msk->pm.lock); __mark_subflow_endp_available(msk, list.ids[0]); spin_unlock_bh(&msk->pm.lock); } if (msk->mpc_endpoint_id == entry->addr.id) msk->mpc_endpoint_id = 0; release_sock(sk); next: sock_put(sk); cond_resched(); } return 0; } static int mptcp_nl_remove_id_zero_address(struct net *net, struct mptcp_addr_info *addr) { struct mptcp_rm_list list = { .nr = 0 }; long s_slot = 0, s_num = 0; struct mptcp_sock *msk; list.ids[list.nr++] = 0; while ((msk = mptcp_token_iter_next(net, &s_slot, &s_num)) != NULL) { struct sock *sk = (struct sock *)msk; struct mptcp_addr_info msk_local; if (list_empty(&msk->conn_list) || mptcp_pm_is_userspace(msk)) goto next; mptcp_local_address((struct sock_common *)msk, &msk_local); if (!mptcp_addresses_equal(&msk_local, addr, addr->port)) goto next; lock_sock(sk); spin_lock_bh(&msk->pm.lock); mptcp_pm_remove_addr(msk, &list); mptcp_pm_rm_subflow(msk, &list); __mark_subflow_endp_available(msk, 0); spin_unlock_bh(&msk->pm.lock); release_sock(sk); next: sock_put(sk); cond_resched(); } return 0; } /* Remove an MPTCP endpoint */ int mptcp_pm_nl_del_addr_doit(struct sk_buff *skb, struct genl_info *info) { struct pm_nl_pernet *pernet = genl_info_pm_nl(info); struct mptcp_pm_addr_entry addr, *entry; unsigned int addr_max; struct nlattr *attr; int ret; if (GENL_REQ_ATTR_CHECK(info, MPTCP_PM_ENDPOINT_ADDR)) return -EINVAL; attr = info->attrs[MPTCP_PM_ENDPOINT_ADDR]; ret = mptcp_pm_parse_entry(attr, info, false, &addr); if (ret < 0) return ret; /* the zero id address is special: the first address used by the msk * always gets such an id, so different subflows can have different zero * id addresses. Additionally zero id is not accounted for in id_bitmap. * Let's use an 'mptcp_rm_list' instead of the common remove code. 
*/ if (addr.addr.id == 0) return mptcp_nl_remove_id_zero_address(sock_net(skb->sk), &addr.addr); spin_lock_bh(&pernet->lock); entry = __lookup_addr_by_id(pernet, addr.addr.id); if (!entry) { NL_SET_ERR_MSG_ATTR(info->extack, attr, "address not found"); spin_unlock_bh(&pernet->lock); return -EINVAL; } if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL) { addr_max = pernet->add_addr_signal_max; WRITE_ONCE(pernet->add_addr_signal_max, addr_max - 1); } if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { addr_max = pernet->local_addr_max; WRITE_ONCE(pernet->local_addr_max, addr_max - 1); } pernet->addrs--; list_del_rcu(&entry->list); __clear_bit(entry->addr.id, pernet->id_bitmap); spin_unlock_bh(&pernet->lock); mptcp_nl_remove_subflow_and_signal_addr(sock_net(skb->sk), entry); synchronize_rcu(); __mptcp_pm_release_addr_entry(entry); return ret; } static void mptcp_pm_flush_addrs_and_subflows(struct mptcp_sock *msk, struct list_head *rm_list) { struct mptcp_rm_list alist = { .nr = 0 }, slist = { .nr = 0 }; struct mptcp_pm_addr_entry *entry; list_for_each_entry(entry, rm_list, list) { if (slist.nr < MPTCP_RM_IDS_MAX && mptcp_lookup_subflow_by_saddr(&msk->conn_list, &entry->addr)) slist.ids[slist.nr++] = mptcp_endp_get_local_id(msk, &entry->addr); if (alist.nr < MPTCP_RM_IDS_MAX && mptcp_remove_anno_list_by_saddr(msk, &entry->addr)) alist.ids[alist.nr++] = mptcp_endp_get_local_id(msk, &entry->addr); } spin_lock_bh(&msk->pm.lock); if (alist.nr) { msk->pm.add_addr_signaled -= alist.nr; mptcp_pm_remove_addr(msk, &alist); } if (slist.nr) mptcp_pm_rm_subflow(msk, &slist); /* Reset counters: maybe some subflows have been removed before */ bitmap_fill(msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); msk->pm.local_addr_used = 0; spin_unlock_bh(&msk->pm.lock); } static void mptcp_nl_flush_addrs_list(struct net *net, struct list_head *rm_list) { long s_slot = 0, s_num = 0; struct mptcp_sock *msk; if (list_empty(rm_list)) return; while ((msk = mptcp_token_iter_next(net, &s_slot, &s_num)) != NULL) { struct sock *sk = (struct sock *)msk; if (!mptcp_pm_is_userspace(msk)) { lock_sock(sk); mptcp_pm_flush_addrs_and_subflows(msk, rm_list); release_sock(sk); } sock_put(sk); cond_resched(); } } /* caller must ensure the RCU grace period is already elapsed */ static void __flush_addrs(struct list_head *list) { while (!list_empty(list)) { struct mptcp_pm_addr_entry *cur; cur = list_entry(list->next, struct mptcp_pm_addr_entry, list); list_del_rcu(&cur->list); __mptcp_pm_release_addr_entry(cur); } } static void __reset_counters(struct pm_nl_pernet *pernet) { WRITE_ONCE(pernet->add_addr_signal_max, 0); WRITE_ONCE(pernet->add_addr_accept_max, 0); WRITE_ONCE(pernet->local_addr_max, 0); pernet->addrs = 0; } int mptcp_pm_nl_flush_addrs_doit(struct sk_buff *skb, struct genl_info *info) { struct pm_nl_pernet *pernet = genl_info_pm_nl(info); LIST_HEAD(free_list); spin_lock_bh(&pernet->lock); list_splice_init(&pernet->local_addr_list, &free_list); __reset_counters(pernet); pernet->next_id = 1; bitmap_zero(pernet->id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); spin_unlock_bh(&pernet->lock); mptcp_nl_flush_addrs_list(sock_net(skb->sk), &free_list); synchronize_rcu(); __flush_addrs(&free_list); return 0; } int mptcp_pm_nl_get_addr(u8 id, struct mptcp_pm_addr_entry *addr, struct genl_info *info) { struct pm_nl_pernet *pernet = genl_info_pm_nl(info); struct mptcp_pm_addr_entry *entry; int ret = -EINVAL; rcu_read_lock(); entry = __lookup_addr_by_id(pernet, id); if (entry) { *addr = *entry; ret = 0; } rcu_read_unlock(); return ret; } int 
mptcp_pm_nl_dump_addr(struct sk_buff *msg, struct netlink_callback *cb) { struct net *net = sock_net(msg->sk); struct mptcp_pm_addr_entry *entry; struct pm_nl_pernet *pernet; int id = cb->args[0]; int i; pernet = pm_nl_get_pernet(net); rcu_read_lock(); for (i = id; i < MPTCP_PM_MAX_ADDR_ID + 1; i++) { if (test_bit(i, pernet->id_bitmap)) { entry = __lookup_addr_by_id(pernet, i); if (!entry) break; if (entry->addr.id <= id) continue; if (mptcp_pm_genl_fill_addr(msg, cb, entry) < 0) break; id = entry->addr.id; } } rcu_read_unlock(); cb->args[0] = id; return msg->len; } static int parse_limit(struct genl_info *info, int id, unsigned int *limit) { struct nlattr *attr = info->attrs[id]; if (!attr) return 0; *limit = nla_get_u32(attr); if (*limit > MPTCP_PM_ADDR_MAX) { NL_SET_ERR_MSG_ATTR_FMT(info->extack, attr, "limit greater than maximum (%u)", MPTCP_PM_ADDR_MAX); return -EINVAL; } return 0; } int mptcp_pm_nl_set_limits_doit(struct sk_buff *skb, struct genl_info *info) { struct pm_nl_pernet *pernet = genl_info_pm_nl(info); unsigned int rcv_addrs, subflows; int ret; spin_lock_bh(&pernet->lock); rcv_addrs = pernet->add_addr_accept_max; ret = parse_limit(info, MPTCP_PM_ATTR_RCV_ADD_ADDRS, &rcv_addrs); if (ret) goto unlock; subflows = pernet->subflows_max; ret = parse_limit(info, MPTCP_PM_ATTR_SUBFLOWS, &subflows); if (ret) goto unlock; WRITE_ONCE(pernet->add_addr_accept_max, rcv_addrs); WRITE_ONCE(pernet->subflows_max, subflows); unlock: spin_unlock_bh(&pernet->lock); return ret; } int mptcp_pm_nl_get_limits_doit(struct sk_buff *skb, struct genl_info *info) { struct pm_nl_pernet *pernet = genl_info_pm_nl(info); struct sk_buff *msg; void *reply; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; reply = genlmsg_put_reply(msg, info, &mptcp_genl_family, 0, MPTCP_PM_CMD_GET_LIMITS); if (!reply) goto fail; if (nla_put_u32(msg, MPTCP_PM_ATTR_RCV_ADD_ADDRS, READ_ONCE(pernet->add_addr_accept_max))) goto fail; if (nla_put_u32(msg, MPTCP_PM_ATTR_SUBFLOWS, READ_ONCE(pernet->subflows_max))) goto fail; genlmsg_end(msg, reply); return genlmsg_reply(msg, info); fail: GENL_SET_ERR_MSG(info, "not enough space in Netlink message"); nlmsg_free(msg); return -EMSGSIZE; } static void mptcp_pm_nl_fullmesh(struct mptcp_sock *msk, struct mptcp_addr_info *addr) { struct mptcp_rm_list list = { .nr = 0 }; list.ids[list.nr++] = mptcp_endp_get_local_id(msk, addr); spin_lock_bh(&msk->pm.lock); mptcp_pm_rm_subflow(msk, &list); __mark_subflow_endp_available(msk, list.ids[0]); mptcp_pm_create_subflow_or_signal_addr(msk); spin_unlock_bh(&msk->pm.lock); } static void mptcp_pm_nl_set_flags_all(struct net *net, struct mptcp_pm_addr_entry *local, u8 changed) { u8 is_subflow = !!(local->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW); u8 bkup = !!(local->flags & MPTCP_PM_ADDR_FLAG_BACKUP); long s_slot = 0, s_num = 0; struct mptcp_sock *msk; if (changed == MPTCP_PM_ADDR_FLAG_FULLMESH && !is_subflow) return; while ((msk = mptcp_token_iter_next(net, &s_slot, &s_num)) != NULL) { struct sock *sk = (struct sock *)msk; if (list_empty(&msk->conn_list) || mptcp_pm_is_userspace(msk)) goto next; lock_sock(sk); if (changed & MPTCP_PM_ADDR_FLAG_BACKUP) mptcp_pm_mp_prio_send_ack(msk, &local->addr, NULL, bkup); /* Subflows will only be recreated if the SUBFLOW flag is set */ if (is_subflow && (changed & MPTCP_PM_ADDR_FLAG_FULLMESH)) mptcp_pm_nl_fullmesh(msk, &local->addr); release_sock(sk); next: sock_put(sk); cond_resched(); } } int mptcp_pm_nl_set_flags(struct mptcp_pm_addr_entry *local, struct genl_info *info) { struct nlattr *attr = 
info->attrs[MPTCP_PM_ATTR_ADDR]; u8 changed, mask = MPTCP_PM_ADDR_FLAG_BACKUP | MPTCP_PM_ADDR_FLAG_FULLMESH; struct net *net = genl_info_net(info); struct mptcp_pm_addr_entry *entry; struct pm_nl_pernet *pernet; u8 lookup_by_id = 0; pernet = pm_nl_get_pernet(net); if (local->addr.family == AF_UNSPEC) { lookup_by_id = 1; if (!local->addr.id) { NL_SET_ERR_MSG_ATTR(info->extack, attr, "missing address ID"); return -EOPNOTSUPP; } } spin_lock_bh(&pernet->lock); entry = lookup_by_id ? __lookup_addr_by_id(pernet, local->addr.id) : __lookup_addr(pernet, &local->addr); if (!entry) { spin_unlock_bh(&pernet->lock); NL_SET_ERR_MSG_ATTR(info->extack, attr, "address not found"); return -EINVAL; } if ((local->flags & MPTCP_PM_ADDR_FLAG_FULLMESH) && (entry->flags & (MPTCP_PM_ADDR_FLAG_SIGNAL | MPTCP_PM_ADDR_FLAG_IMPLICIT))) { spin_unlock_bh(&pernet->lock); NL_SET_ERR_MSG_ATTR(info->extack, attr, "invalid addr flags"); return -EINVAL; } changed = (local->flags ^ entry->flags) & mask; entry->flags = (entry->flags & ~mask) | (local->flags & mask); *local = *entry; spin_unlock_bh(&pernet->lock); mptcp_pm_nl_set_flags_all(net, local, changed); return 0; } bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); if (msk->pm.subflows == mptcp_pm_get_subflows_max(msk) || (find_next_and_bit(pernet->id_bitmap, msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1, 0) == MPTCP_PM_MAX_ADDR_ID + 1)) { WRITE_ONCE(msk->pm.work_pending, false); return false; } return true; } /* Called under PM lock */ void __mptcp_pm_kernel_worker(struct mptcp_sock *msk) { struct mptcp_pm_data *pm = &msk->pm; if (pm->status & BIT(MPTCP_PM_ADD_ADDR_RECEIVED)) { pm->status &= ~BIT(MPTCP_PM_ADD_ADDR_RECEIVED); mptcp_pm_nl_add_addr_received(msk); } if (pm->status & BIT(MPTCP_PM_ESTABLISHED)) { pm->status &= ~BIT(MPTCP_PM_ESTABLISHED); mptcp_pm_nl_fully_established(msk); } if (pm->status & BIT(MPTCP_PM_SUBFLOW_ESTABLISHED)) { pm->status &= ~BIT(MPTCP_PM_SUBFLOW_ESTABLISHED); mptcp_pm_nl_subflow_established(msk); } } static int __net_init pm_nl_init_net(struct net *net) { struct pm_nl_pernet *pernet = pm_nl_get_pernet(net); INIT_LIST_HEAD_RCU(&pernet->local_addr_list); /* Cit. 2 subflows ought to be enough for anybody. */ pernet->subflows_max = 2; pernet->next_id = 1; pernet->stale_loss_cnt = 4; spin_lock_init(&pernet->lock); /* No need to initialize other pernet fields, the struct is zeroed at * allocation time. */ return 0; } static void __net_exit pm_nl_exit_net(struct list_head *net_list) { struct net *net; list_for_each_entry(net, net_list, exit_list) { struct pm_nl_pernet *pernet = pm_nl_get_pernet(net); /* net is removed from namespace list, can't race with * other modifiers, also netns core already waited for a * RCU grace period. */ __flush_addrs(&pernet->local_addr_list); } } static struct pernet_operations mptcp_pm_pernet_ops = { .init = pm_nl_init_net, .exit_batch = pm_nl_exit_net, .id = &pm_nl_pernet_id, .size = sizeof(struct pm_nl_pernet), }; struct mptcp_pm_ops mptcp_pm_kernel = { .name = "kernel", .owner = THIS_MODULE, }; void __init mptcp_pm_kernel_register(void) { if (register_pernet_subsys(&mptcp_pm_pernet_ops) < 0) panic("Failed to register MPTCP PM pernet subsystem.\n"); mptcp_pm_register(&mptcp_pm_kernel); }
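/*
 * Editorial note: mptcp_pm_nl_append_new_local_addr() above allocates
 * endpoint IDs by scanning a bitmap from next_id and wrapping back to 1
 * once the top of the range is reached (ID 0 is reserved for the initial
 * subflow). A userspace sketch of that allocator; the demo_* helpers are
 * hypothetical stand-ins for the kernel's bitmap API.
 */
#include <stdbool.h>
#include <stdio.h>

#define DEMO_MAX_ADDR_ID 255u

static bool demo_test_bit(const unsigned char *map, unsigned int bit)
{
	return map[bit / 8] & (1u << (bit % 8));
}

static unsigned int demo_find_next_zero_bit(const unsigned char *map,
					    unsigned int size, unsigned int start)
{
	unsigned int i;

	for (i = start; i < size; i++)
		if (!demo_test_bit(map, i))
			return i;
	return size;	/* "not found", like the kernel's find_next_zero_bit() */
}

static int demo_alloc_id(unsigned char *map, unsigned int *next_id)
{
	unsigned int id;

	id = demo_find_next_zero_bit(map, DEMO_MAX_ADDR_ID + 1, *next_id);
	if (id > DEMO_MAX_ADDR_ID && *next_id != 1) {
		*next_id = 1;	/* wrap around and retry from ID 1 */
		id = demo_find_next_zero_bit(map, DEMO_MAX_ADDR_ID + 1, 1);
	}
	if (id > DEMO_MAX_ADDR_ID)
		return -1;	/* bitmap exhausted */
	map[id / 8] |= 1u << (id % 8);
	if (id > *next_id)
		*next_id = id;
	return (int)id;
}

int main(void)
{
	unsigned char map[(DEMO_MAX_ADDR_ID + 1) / 8] = { 0 };
	unsigned int next_id = 1;
	int a, b, c;

	/* the first three endpoints get IDs 1, 2 and 3 */
	a = demo_alloc_id(map, &next_id);
	b = demo_alloc_id(map, &next_id);
	c = demo_alloc_id(map, &next_id);
	printf("%d %d %d\n", a, b, c);
	return 0;
}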
// SPDX-License-Identifier: GPL-2.0-only /* * Copyright 2004, Instant802 Networks, Inc. * Copyright 2013-2014 Intel Mobile Communications GmbH * Copyright (C) 2022 Intel Corporation */ #include <linux/netdevice.h> #include <linux/skbuff.h> #include <linux/module.h> #include <linux/if_arp.h> #include <linux/types.h> #include <net/ip.h> #include <net/pkt_sched.h> #include <net/mac80211.h> #include "ieee80211_i.h" #include "wme.h" /* Default mapping in classifier to work with default * queue setup. */ const int ieee802_1d_to_ac[8] = { IEEE80211_AC_BE, IEEE80211_AC_BK, IEEE80211_AC_BK, IEEE80211_AC_BE, IEEE80211_AC_VI, IEEE80211_AC_VI, IEEE80211_AC_VO, IEEE80211_AC_VO }; static int wme_downgrade_ac(struct sk_buff *skb) { switch (skb->priority) { case 6: case 7: skb->priority = 5; /* VO -> VI */ return 0; case 4: case 5: skb->priority = 3; /* VI -> BE */ return 0; case 0: case 3: skb->priority = 2; /* BE -> BK */ return 0; default: return -1; } } /** * ieee80211_fix_reserved_tid - return the TID to use if this one is reserved * @tid: the assumed-reserved TID * * Returns: the alternative TID to use, or 0 on error */ static inline u8 ieee80211_fix_reserved_tid(u8 tid) { switch (tid) { case 0: return 3; case 1: return 2; case 2: return 1; case 3: return 0; case 4: return 5; case 5: return 4; case 6: return 7; case 7: return 6; } return 0; } static u16 ieee80211_downgrade_queue(struct ieee80211_sub_if_data *sdata, struct sta_info *sta, struct sk_buff *skb) { struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; /* in case we are a client verify acm is not set for this ac */ while (sdata->wmm_acm & BIT(skb->priority)) { int ac = ieee802_1d_to_ac[skb->priority]; if (ifmgd->tx_tspec[ac].admitted_time && skb->priority == ifmgd->tx_tspec[ac].up) return ac; if (wme_downgrade_ac(skb)) { /* * This should not really happen. The AP has marked all * lower ACs to require admission control which is not * a reasonable configuration. Allow the frame to be * transmitted using AC_BK as a workaround. 
*/ break; } } /* Check to see if this is a reserved TID */ if (sta && sta->reserved_tid == skb->priority) skb->priority = ieee80211_fix_reserved_tid(skb->priority); /* look up which queue to use for frames with this 1d tag */ return ieee802_1d_to_ac[skb->priority]; } /* Indicate which queue to use for this fully formed 802.11 frame */ u16 ieee80211_select_queue_80211(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb, struct ieee80211_hdr *hdr) { struct ieee80211_local *local = sdata->local; struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); u8 *p; /* Ensure hash is set prior to potential SW encryption */ skb_get_hash(skb); if ((info->control.flags & IEEE80211_TX_CTRL_DONT_REORDER) || local->hw.queues < IEEE80211_NUM_ACS) return 0; if (!ieee80211_is_data(hdr->frame_control)) { skb->priority = 7; return ieee802_1d_to_ac[skb->priority]; } if (!ieee80211_is_data_qos(hdr->frame_control)) { skb->priority = 0; return ieee802_1d_to_ac[skb->priority]; } p = ieee80211_get_qos_ctl(hdr); skb->priority = *p & IEEE80211_QOS_CTL_TAG1D_MASK; return ieee80211_downgrade_queue(sdata, NULL, skb); } u16 ieee80211_select_queue(struct ieee80211_sub_if_data *sdata, struct sta_info *sta, struct sk_buff *skb) { const struct ethhdr *eth = (void *)skb->data; struct mac80211_qos_map *qos_map; bool qos; /* Ensure hash is set prior to potential SW encryption */ skb_get_hash(skb); /* all mesh/ocb stations are required to support WME */ if ((sdata->vif.type == NL80211_IFTYPE_MESH_POINT && !is_multicast_ether_addr(eth->h_dest)) || (sdata->vif.type == NL80211_IFTYPE_OCB && sta)) qos = true; else if (sta) qos = sta->sta.wme; else qos = false; if (!qos) { skb->priority = 0; /* required for correct WPA/11i MIC */ return IEEE80211_AC_BE; } if (skb->protocol == sdata->control_port_protocol) { skb->priority = 7; goto downgrade; } /* use the data classifier to determine what 802.1d tag the * data frame has */ qos_map = rcu_dereference(sdata->qos_map); skb->priority = cfg80211_classify8021d(skb, qos_map ? &qos_map->qos_map : NULL); downgrade: return ieee80211_downgrade_queue(sdata, sta, skb); } /** * ieee80211_set_qos_hdr - Fill in the QoS header if there is one. 
* * @sdata: local subif * @skb: packet to be updated */ void ieee80211_set_qos_hdr(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb) { struct ieee80211_hdr *hdr = (void *)skb->data; struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); u8 tid = skb->priority & IEEE80211_QOS_CTL_TAG1D_MASK; u8 flags; u8 *p; if (!ieee80211_is_data_qos(hdr->frame_control)) return; p = ieee80211_get_qos_ctl(hdr); /* don't overwrite the QoS field of injected frames */ if (info->flags & IEEE80211_TX_CTL_INJECTED) { /* do take into account Ack policy of injected frames */ if (*p & IEEE80211_QOS_CTL_ACK_POLICY_NOACK) info->flags |= IEEE80211_TX_CTL_NO_ACK; return; } /* set up the first byte */ /* * preserve everything but the TID and ACK policy * (which we both write here) */ flags = *p & ~(IEEE80211_QOS_CTL_TID_MASK | IEEE80211_QOS_CTL_ACK_POLICY_MASK); if (is_multicast_ether_addr(hdr->addr1) || sdata->noack_map & BIT(tid)) { flags |= IEEE80211_QOS_CTL_ACK_POLICY_NOACK; info->flags |= IEEE80211_TX_CTL_NO_ACK; } *p = flags | tid; /* set up the second byte */ p++; if (ieee80211_vif_is_mesh(&sdata->vif)) { /* preserve RSPI and Mesh PS Level bit */ *p &= ((IEEE80211_QOS_CTL_RSPI | IEEE80211_QOS_CTL_MESH_PS_LEVEL) >> 8); /* Nulls don't have a mesh header (frame body) */ if (!ieee80211_is_qos_nullfunc(hdr->frame_control)) *p |= (IEEE80211_QOS_CTL_MESH_CONTROL_PRESENT >> 8); } else { *p = 0; } }
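/*
 * Standalone illustration (userspace C, not mac80211 code) of how the
 * ieee802_1d_to_ac[] table and the VO->VI->BE->BK downgrade chain above
 * combine: while admission control blocks the current priority, the 802.1D
 * priority is stepped down until it lands on a permitted AC. acm_mask is a
 * hypothetical stand-in for sdata->wmm_acm.
 */
#include <stdio.h>

enum { AC_VO, AC_VI, AC_BE, AC_BK };	/* mac80211 AC numbering */

static const int up_to_ac[8] = {
	AC_BE, AC_BK, AC_BK, AC_BE, AC_VI, AC_VI, AC_VO, AC_VO
};

/* returns the next lower priority, or -1 from BK (nowhere left to go) */
static int downgrade(int prio)
{
	switch (prio) {
	case 6: case 7: return 5;	/* VO -> VI */
	case 4: case 5: return 3;	/* VI -> BE */
	case 0: case 3: return 2;	/* BE -> BK */
	default: return -1;
	}
}

int main(void)
{
	unsigned int acm_mask = 0xf0;	/* priorities 4-7 need admission */
	int prio = 7;			/* start as voice */

	while (acm_mask & (1u << prio)) {
		int next = downgrade(prio);

		if (next < 0)
			break;		/* transmit anyway, as the code above does */
		prio = next;
	}
	/* prints "prio 3 -> AC 2", i.e. best effort */
	printf("prio %d -> AC %d\n", prio, up_to_ac[prio]);
	return 0;
}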
// SPDX-License-Identifier: GPL-2.0-only /* * CAN driver for PEAK System PCAN-USB FD / PCAN-USB Pro FD adapter * * Copyright (C) 2013-2014 Stephane Grosjean <s.grosjean@peak-system.com> */ #include <linux/ethtool.h> #include <linux/module.h> #include <linux/netdevice.h> #include <linux/usb.h> #include <linux/can.h> #include <linux/can/dev.h> #include <linux/can/error.h> #include <linux/can/dev/peak_canfd.h> #include "pcan_usb_core.h" #include "pcan_usb_pro.h" #define PCAN_USBPROFD_CHANNEL_COUNT 2 #define PCAN_USBFD_CHANNEL_COUNT 1 /* PCAN-USB Pro FD adapter internal clock (Hz) */ #define PCAN_UFD_CRYSTAL_HZ 80000000 #define PCAN_UFD_CMD_BUFFER_SIZE 512 #define PCAN_UFD_LOSPD_PKT_SIZE 64 /* PCAN-USB Pro FD command timeout (ms.) 
*/ #define PCAN_UFD_CMD_TIMEOUT_MS 1000 /* PCAN-USB Pro FD rx/tx buffers size */ #define PCAN_UFD_RX_BUFFER_SIZE 2048 #define PCAN_UFD_TX_BUFFER_SIZE 512 /* struct pcan_ufd_fw_info::type */ #define PCAN_USBFD_TYPE_STD 1 #define PCAN_USBFD_TYPE_EXT 2 /* includes EP numbers */ /* read some versions info from the hw device */ struct __packed pcan_ufd_fw_info { __le16 size_of; /* sizeof this */ __le16 type; /* type of this structure */ u8 hw_type; /* Type of hardware (HW_TYPE_xxx) */ u8 bl_version[3]; /* Bootloader version */ u8 hw_version; /* Hardware version (PCB) */ u8 fw_version[3]; /* Firmware version */ __le32 dev_id[2]; /* "device id" per CAN */ __le32 ser_no; /* S/N */ __le32 flags; /* special functions */ /* extended data when type == PCAN_USBFD_TYPE_EXT */ u8 cmd_out_ep; /* ep for cmd */ u8 cmd_in_ep; /* ep for replies */ u8 data_out_ep[2]; /* ep for CANx TX */ u8 data_in_ep; /* ep for CAN RX */ u8 dummy[3]; }; /* handle device specific info used by the netdevices */ struct pcan_usb_fd_if { struct peak_usb_device *dev[PCAN_USB_MAX_CHANNEL]; struct pcan_ufd_fw_info fw_info; struct peak_time_ref time_ref; int cm_ignore_count; int dev_opened_count; }; /* device information */ struct pcan_usb_fd_device { struct peak_usb_device dev; struct can_berr_counter bec; struct pcan_usb_fd_if *usb_if; u8 *cmd_buffer_addr; }; /* Extended USB commands (non uCAN commands) */ /* Clock Modes command */ #define PCAN_UFD_CMD_CLK_SET 0x80 #define PCAN_UFD_CLK_80MHZ 0x0 #define PCAN_UFD_CLK_60MHZ 0x1 #define PCAN_UFD_CLK_40MHZ 0x2 #define PCAN_UFD_CLK_30MHZ 0x3 #define PCAN_UFD_CLK_24MHZ 0x4 #define PCAN_UFD_CLK_20MHZ 0x5 #define PCAN_UFD_CLK_DEF PCAN_UFD_CLK_80MHZ struct __packed pcan_ufd_clock { __le16 opcode_channel; u8 mode; u8 unused[5]; }; /* LED control command */ #define PCAN_UFD_CMD_LED_SET 0x86 #define PCAN_UFD_LED_DEV 0x00 #define PCAN_UFD_LED_FAST 0x01 #define PCAN_UFD_LED_SLOW 0x02 #define PCAN_UFD_LED_ON 0x03 #define PCAN_UFD_LED_OFF 0x04 #define PCAN_UFD_LED_DEF PCAN_UFD_LED_DEV struct __packed pcan_ufd_led { __le16 opcode_channel; u8 mode; u8 unused[5]; }; /* Extended usage of uCAN commands CMD_xxx_xx_OPTION for PCAN-USB Pro FD */ #define PCAN_UFD_FLTEXT_CALIBRATION 0x8000 struct __packed pcan_ufd_options { __le16 opcode_channel; __le16 ucan_mask; u16 unused; __le16 usb_mask; }; /* Extended usage of uCAN messages for PCAN-USB Pro FD */ #define PCAN_UFD_MSG_CALIBRATION 0x100 struct __packed pcan_ufd_ts_msg { __le16 size; __le16 type; __le32 ts_low; __le32 ts_high; __le16 usb_frame_index; u16 unused; }; #define PCAN_UFD_MSG_OVERRUN 0x101 #define PCAN_UFD_OVMSG_CHANNEL(o) ((o)->channel & 0xf) struct __packed pcan_ufd_ovr_msg { __le16 size; __le16 type; __le32 ts_low; __le32 ts_high; u8 channel; u8 unused[3]; }; #define PCAN_UFD_CMD_DEVID_SET 0x81 struct __packed pcan_ufd_device_id { __le16 opcode_channel; u16 unused; __le32 device_id; }; static inline int pufd_omsg_get_channel(struct pcan_ufd_ovr_msg *om) { return om->channel & 0xf; } /* Clock mode frequency values */ static const u32 pcan_usb_fd_clk_freq[6] = { [PCAN_UFD_CLK_80MHZ] = 80000000, [PCAN_UFD_CLK_60MHZ] = 60000000, [PCAN_UFD_CLK_40MHZ] = 40000000, [PCAN_UFD_CLK_30MHZ] = 30000000, [PCAN_UFD_CLK_24MHZ] = 24000000, [PCAN_UFD_CLK_20MHZ] = 20000000 }; /* return a device USB interface */ static inline struct pcan_usb_fd_if *pcan_usb_fd_dev_if(struct peak_usb_device *dev) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); return pdev->usb_if; } /* return a device USB commands buffer */ static inline void 
*pcan_usb_fd_cmd_buffer(struct peak_usb_device *dev) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); return pdev->cmd_buffer_addr; } /* send PCAN-USB Pro FD commands synchronously */ static int pcan_usb_fd_send_cmd(struct peak_usb_device *dev, void *cmd_tail) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); struct pcan_ufd_fw_info *fw_info = &pdev->usb_if->fw_info; void *cmd_head = pcan_usb_fd_cmd_buffer(dev); int err = 0; u8 *packet_ptr; int packet_len; ptrdiff_t cmd_len; /* usb device unregistered? */ if (!(dev->state & PCAN_USB_STATE_CONNECTED)) return 0; /* if a packet is not filled completely by commands, the command list * is terminated with an "end of collection" record. */ cmd_len = cmd_tail - cmd_head; if (cmd_len <= (PCAN_UFD_CMD_BUFFER_SIZE - sizeof(u64))) { memset(cmd_tail, 0xff, sizeof(u64)); cmd_len += sizeof(u64); } packet_ptr = cmd_head; packet_len = cmd_len; /* firmware is not able to re-assemble 512 bytes buffer in full-speed */ if (unlikely(dev->udev->speed != USB_SPEED_HIGH)) packet_len = min(packet_len, PCAN_UFD_LOSPD_PKT_SIZE); do { err = usb_bulk_msg(dev->udev, usb_sndbulkpipe(dev->udev, fw_info->cmd_out_ep), packet_ptr, packet_len, NULL, PCAN_UFD_CMD_TIMEOUT_MS); if (err) { netdev_err(dev->netdev, "sending command failure: %d\n", err); break; } packet_ptr += packet_len; cmd_len -= packet_len; if (cmd_len < PCAN_UFD_LOSPD_PKT_SIZE) packet_len = cmd_len; } while (packet_len > 0); return err; } static int pcan_usb_fd_read_fwinfo(struct peak_usb_device *dev, struct pcan_ufd_fw_info *fw_info) { return pcan_usb_pro_send_req(dev, PCAN_USBPRO_REQ_INFO, PCAN_USBPRO_INFO_FW, fw_info, sizeof(*fw_info)); } /* build the commands list in the given buffer, to enter operational mode */ static int pcan_usb_fd_build_restart_cmd(struct peak_usb_device *dev, u8 *buf) { struct pucan_wr_err_cnt *prc; struct pucan_command *cmd; u8 *pc = buf; /* 1st, reset error counters: */ prc = (struct pucan_wr_err_cnt *)pc; prc->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, PUCAN_CMD_WR_ERR_CNT); /* select both counters */ prc->sel_mask = cpu_to_le16(PUCAN_WRERRCNT_TE|PUCAN_WRERRCNT_RE); /* and reset their values */ prc->tx_counter = 0; prc->rx_counter = 0; /* moves the pointer forward */ pc += sizeof(struct pucan_wr_err_cnt); /* add command to switch from ISO to non-ISO mode, if fw allows it */ if (dev->can.ctrlmode_supported & CAN_CTRLMODE_FD_NON_ISO) { struct pucan_options *puo = (struct pucan_options *)pc; puo->opcode_channel = (dev->can.ctrlmode & CAN_CTRLMODE_FD_NON_ISO) ? pucan_cmd_opcode_channel(dev->ctrl_idx, PUCAN_CMD_CLR_DIS_OPTION) : pucan_cmd_opcode_channel(dev->ctrl_idx, PUCAN_CMD_SET_EN_OPTION); puo->options = cpu_to_le16(PUCAN_OPTION_CANDFDISO); /* to be sure that no other extended bits will be taken into * account */ puo->unused = 0; /* moves the pointer forward */ pc += sizeof(struct pucan_options); } /* next, go back to operational mode */ cmd = (struct pucan_command *)pc; cmd->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, (dev->can.ctrlmode & CAN_CTRLMODE_LISTENONLY) ? 
PUCAN_CMD_LISTEN_ONLY_MODE : PUCAN_CMD_NORMAL_MODE); pc += sizeof(struct pucan_command); return pc - buf; } /* set CAN bus on/off */ static int pcan_usb_fd_set_bus(struct peak_usb_device *dev, u8 onoff) { u8 *pc = pcan_usb_fd_cmd_buffer(dev); int l; if (onoff) { /* build the cmds list to enter operational mode */ l = pcan_usb_fd_build_restart_cmd(dev, pc); } else { struct pucan_command *cmd = (struct pucan_command *)pc; /* build cmd to go back to reset mode */ cmd->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, PUCAN_CMD_RESET_MODE); l = sizeof(struct pucan_command); } /* send the command */ return pcan_usb_fd_send_cmd(dev, pc + l); } /* set filtering masks: * * idx in range [0..63] selects a row #idx, all rows otherwise * mask in range [0..0xffffffff] defines up to 32 CANIDs in the row(s) * * Each bit of this 64 x 32 bits array defines a CANID value: * * bit[i,j] = 1 implies that CANID=(i x 32)+j will be received, while * bit[i,j] = 0 implies that CANID=(i x 32)+j will be discarded. */ static int pcan_usb_fd_set_filter_std(struct peak_usb_device *dev, int idx, u32 mask) { struct pucan_filter_std *cmd = pcan_usb_fd_cmd_buffer(dev); int i, n; /* select all rows when idx is out of range [0..63] */ if ((idx < 0) || (idx >= (1 << PUCAN_FLTSTD_ROW_IDX_BITS))) { n = 1 << PUCAN_FLTSTD_ROW_IDX_BITS; idx = 0; /* select the row (and only the row) otherwise */ } else { n = idx + 1; } for (i = idx; i < n; i++, cmd++) { cmd->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, PUCAN_CMD_FILTER_STD); cmd->idx = cpu_to_le16(i); cmd->mask = cpu_to_le32(mask); } /* send the command */ return pcan_usb_fd_send_cmd(dev, cmd); } /* set/unset options * * onoff set(1)/unset(0) options * mask each bit defines a kind of options to set/unset */ static int pcan_usb_fd_set_options(struct peak_usb_device *dev, bool onoff, u16 ucan_mask, u16 usb_mask) { struct pcan_ufd_options *cmd = pcan_usb_fd_cmd_buffer(dev); cmd->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, (onoff) ? 
PUCAN_CMD_SET_EN_OPTION : PUCAN_CMD_CLR_DIS_OPTION); cmd->ucan_mask = cpu_to_le16(ucan_mask); cmd->usb_mask = cpu_to_le16(usb_mask); /* send the command */ return pcan_usb_fd_send_cmd(dev, ++cmd); } /* setup LED control */ static int pcan_usb_fd_set_can_led(struct peak_usb_device *dev, u8 led_mode) { struct pcan_ufd_led *cmd = pcan_usb_fd_cmd_buffer(dev); cmd->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, PCAN_UFD_CMD_LED_SET); cmd->mode = led_mode; /* send the command */ return pcan_usb_fd_send_cmd(dev, ++cmd); } /* set CAN clock domain */ static int pcan_usb_fd_set_clock_domain(struct peak_usb_device *dev, u8 clk_mode) { struct pcan_ufd_clock *cmd = pcan_usb_fd_cmd_buffer(dev); cmd->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, PCAN_UFD_CMD_CLK_SET); cmd->mode = clk_mode; /* send the command */ return pcan_usb_fd_send_cmd(dev, ++cmd); } /* set bittiming for CAN and CAN-FD header */ static int pcan_usb_fd_set_bittiming_slow(struct peak_usb_device *dev, struct can_bittiming *bt) { struct pucan_timing_slow *cmd = pcan_usb_fd_cmd_buffer(dev); cmd->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, PUCAN_CMD_TIMING_SLOW); cmd->sjw_t = PUCAN_TSLOW_SJW_T(bt->sjw - 1, dev->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES); cmd->tseg2 = PUCAN_TSLOW_TSEG2(bt->phase_seg2 - 1); cmd->tseg1 = PUCAN_TSLOW_TSEG1(bt->prop_seg + bt->phase_seg1 - 1); cmd->brp = cpu_to_le16(PUCAN_TSLOW_BRP(bt->brp - 1)); cmd->ewl = 96; /* default */ /* send the command */ return pcan_usb_fd_send_cmd(dev, ++cmd); } /* set CAN-FD bittiming for data */ static int pcan_usb_fd_set_bittiming_fast(struct peak_usb_device *dev, struct can_bittiming *bt) { struct pucan_timing_fast *cmd = pcan_usb_fd_cmd_buffer(dev); cmd->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, PUCAN_CMD_TIMING_FAST); cmd->sjw = PUCAN_TFAST_SJW(bt->sjw - 1); cmd->tseg2 = PUCAN_TFAST_TSEG2(bt->phase_seg2 - 1); cmd->tseg1 = PUCAN_TFAST_TSEG1(bt->prop_seg + bt->phase_seg1 - 1); cmd->brp = cpu_to_le16(PUCAN_TFAST_BRP(bt->brp - 1)); /* send the command */ return pcan_usb_fd_send_cmd(dev, ++cmd); } /* read user CAN channel id from device */ static int pcan_usb_fd_get_can_channel_id(struct peak_usb_device *dev, u32 *can_ch_id) { int err; struct pcan_usb_fd_if *usb_if = pcan_usb_fd_dev_if(dev); err = pcan_usb_fd_read_fwinfo(dev, &usb_if->fw_info); if (err) return err; *can_ch_id = le32_to_cpu(usb_if->fw_info.dev_id[dev->ctrl_idx]); return err; } /* set a new CAN channel id in the flash memory of the device */ static int pcan_usb_fd_set_can_channel_id(struct peak_usb_device *dev, u32 can_ch_id) { struct pcan_ufd_device_id *cmd = pcan_usb_fd_cmd_buffer(dev); cmd->opcode_channel = pucan_cmd_opcode_channel(dev->ctrl_idx, PCAN_UFD_CMD_DEVID_SET); cmd->device_id = cpu_to_le32(can_ch_id); /* send the command */ return pcan_usb_fd_send_cmd(dev, ++cmd); } /* handle restart but in asynchronously way * (uses PCAN-USB Pro code to complete asynchronous request) */ static int pcan_usb_fd_restart_async(struct peak_usb_device *dev, struct urb *urb, u8 *buf) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); struct pcan_ufd_fw_info *fw_info = &pdev->usb_if->fw_info; u8 *pc = buf; /* build the entire cmds list in the provided buffer, to go back into * operational mode. 
*/ pc += pcan_usb_fd_build_restart_cmd(dev, pc); /* add EOC */ memset(pc, 0xff, sizeof(struct pucan_command)); pc += sizeof(struct pucan_command); /* complete the URB */ usb_fill_bulk_urb(urb, dev->udev, usb_sndbulkpipe(dev->udev, fw_info->cmd_out_ep), buf, pc - buf, pcan_usb_pro_restart_complete, dev); /* and submit it. */ return usb_submit_urb(urb, GFP_ATOMIC); } static int pcan_usb_fd_drv_loaded(struct peak_usb_device *dev, bool loaded) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); pdev->cmd_buffer_addr[0] = 0; pdev->cmd_buffer_addr[1] = !!loaded; return pcan_usb_pro_send_req(dev, PCAN_USBPRO_REQ_FCT, PCAN_USBPRO_FCT_DRVLD, pdev->cmd_buffer_addr, PCAN_USBPRO_FCT_DRVLD_REQ_LEN); } static int pcan_usb_fd_decode_canmsg(struct pcan_usb_fd_if *usb_if, struct pucan_msg *rx_msg) { struct pucan_rx_msg *rm = (struct pucan_rx_msg *)rx_msg; struct peak_usb_device *dev; struct net_device *netdev; struct canfd_frame *cfd; struct sk_buff *skb; const u16 rx_msg_flags = le16_to_cpu(rm->flags); if (pucan_msg_get_channel(rm) >= ARRAY_SIZE(usb_if->dev)) return -ENOMEM; dev = usb_if->dev[pucan_msg_get_channel(rm)]; netdev = dev->netdev; if (rx_msg_flags & PUCAN_MSG_EXT_DATA_LEN) { /* CANFD frame case */ skb = alloc_canfd_skb(netdev, &cfd); if (!skb) return -ENOMEM; if (rx_msg_flags & PUCAN_MSG_BITRATE_SWITCH) cfd->flags |= CANFD_BRS; if (rx_msg_flags & PUCAN_MSG_ERROR_STATE_IND) cfd->flags |= CANFD_ESI; cfd->len = can_fd_dlc2len(pucan_msg_get_dlc(rm)); } else { /* CAN 2.0 frame case */ skb = alloc_can_skb(netdev, (struct can_frame **)&cfd); if (!skb) return -ENOMEM; can_frame_set_cc_len((struct can_frame *)cfd, pucan_msg_get_dlc(rm), dev->can.ctrlmode); } cfd->can_id = le32_to_cpu(rm->can_id); if (rx_msg_flags & PUCAN_MSG_EXT_ID) cfd->can_id |= CAN_EFF_FLAG; if (rx_msg_flags & PUCAN_MSG_RTR) { cfd->can_id |= CAN_RTR_FLAG; } else { memcpy(cfd->data, rm->d, cfd->len); netdev->stats.rx_bytes += cfd->len; } netdev->stats.rx_packets++; peak_usb_netif_rx_64(skb, le32_to_cpu(rm->ts_low), le32_to_cpu(rm->ts_high)); return 0; } /* handle uCAN status message */ static int pcan_usb_fd_decode_status(struct pcan_usb_fd_if *usb_if, struct pucan_msg *rx_msg) { struct pucan_status_msg *sm = (struct pucan_status_msg *)rx_msg; struct pcan_usb_fd_device *pdev; enum can_state new_state = CAN_STATE_ERROR_ACTIVE; enum can_state rx_state, tx_state; struct peak_usb_device *dev; struct net_device *netdev; struct can_frame *cf; struct sk_buff *skb; if (pucan_stmsg_get_channel(sm) >= ARRAY_SIZE(usb_if->dev)) return -ENOMEM; dev = usb_if->dev[pucan_stmsg_get_channel(sm)]; pdev = container_of(dev, struct pcan_usb_fd_device, dev); netdev = dev->netdev; /* nothing should be sent while in BUS_OFF state */ if (dev->can.state == CAN_STATE_BUS_OFF) return 0; if (sm->channel_p_w_b & PUCAN_BUS_BUSOFF) { new_state = CAN_STATE_BUS_OFF; } else if (sm->channel_p_w_b & PUCAN_BUS_PASSIVE) { new_state = CAN_STATE_ERROR_PASSIVE; } else if (sm->channel_p_w_b & PUCAN_BUS_WARNING) { new_state = CAN_STATE_ERROR_WARNING; } else { /* back to (or still in) ERROR_ACTIVE state */ new_state = CAN_STATE_ERROR_ACTIVE; pdev->bec.txerr = 0; pdev->bec.rxerr = 0; } /* state hasn't changed */ if (new_state == dev->can.state) return 0; /* handle bus state change */ tx_state = (pdev->bec.txerr >= pdev->bec.rxerr) ? new_state : 0; rx_state = (pdev->bec.txerr <= pdev->bec.rxerr) ? 
new_state : 0; /* allocate an skb to store the error frame */ skb = alloc_can_err_skb(netdev, &cf); can_change_state(netdev, cf, tx_state, rx_state); /* things must be done even in case of OOM */ if (new_state == CAN_STATE_BUS_OFF) can_bus_off(netdev); if (!skb) return -ENOMEM; peak_usb_netif_rx_64(skb, le32_to_cpu(sm->ts_low), le32_to_cpu(sm->ts_high)); return 0; } /* handle uCAN error message */ static int pcan_usb_fd_decode_error(struct pcan_usb_fd_if *usb_if, struct pucan_msg *rx_msg) { struct pucan_error_msg *er = (struct pucan_error_msg *)rx_msg; struct pcan_usb_fd_device *pdev; struct peak_usb_device *dev; if (pucan_ermsg_get_channel(er) >= ARRAY_SIZE(usb_if->dev)) return -EINVAL; dev = usb_if->dev[pucan_ermsg_get_channel(er)]; pdev = container_of(dev, struct pcan_usb_fd_device, dev); /* keep a trace of tx and rx error counters for later use */ pdev->bec.txerr = er->tx_err_cnt; pdev->bec.rxerr = er->rx_err_cnt; return 0; } /* handle uCAN overrun message */ static int pcan_usb_fd_decode_overrun(struct pcan_usb_fd_if *usb_if, struct pucan_msg *rx_msg) { struct pcan_ufd_ovr_msg *ov = (struct pcan_ufd_ovr_msg *)rx_msg; struct peak_usb_device *dev; struct net_device *netdev; struct can_frame *cf; struct sk_buff *skb; if (pufd_omsg_get_channel(ov) >= ARRAY_SIZE(usb_if->dev)) return -EINVAL; dev = usb_if->dev[pufd_omsg_get_channel(ov)]; netdev = dev->netdev; /* allocate an skb to store the error frame */ skb = alloc_can_err_skb(netdev, &cf); if (!skb) return -ENOMEM; cf->can_id |= CAN_ERR_CRTL; cf->data[1] |= CAN_ERR_CRTL_RX_OVERFLOW; peak_usb_netif_rx_64(skb, le32_to_cpu(ov->ts_low), le32_to_cpu(ov->ts_high)); netdev->stats.rx_over_errors++; netdev->stats.rx_errors++; return 0; } /* handle USB calibration message */ static void pcan_usb_fd_decode_ts(struct pcan_usb_fd_if *usb_if, struct pucan_msg *rx_msg) { struct pcan_ufd_ts_msg *ts = (struct pcan_ufd_ts_msg *)rx_msg; /* should wait until clock is stabilized */ if (usb_if->cm_ignore_count > 0) usb_if->cm_ignore_count--; else peak_usb_set_ts_now(&usb_if->time_ref, le32_to_cpu(ts->ts_low)); } /* callback for bulk IN urb */ static int pcan_usb_fd_decode_buf(struct peak_usb_device *dev, struct urb *urb) { struct pcan_usb_fd_if *usb_if = pcan_usb_fd_dev_if(dev); struct net_device *netdev = dev->netdev; struct pucan_msg *rx_msg; u8 *msg_ptr, *msg_end; int err = 0; /* loop reading all the records from the incoming message */ msg_ptr = urb->transfer_buffer; msg_end = urb->transfer_buffer + urb->actual_length; for (; msg_ptr < msg_end;) { u16 rx_msg_type, rx_msg_size; rx_msg = (struct pucan_msg *)msg_ptr; if (!rx_msg->size) { /* null packet found: end of list */ break; } rx_msg_size = le16_to_cpu(rx_msg->size); rx_msg_type = le16_to_cpu(rx_msg->type); /* check if the record goes out of current packet */ if (msg_ptr + rx_msg_size > msg_end) { netdev_err(netdev, "got frag rec: should inc usb rx buf sze\n"); err = -EBADMSG; break; } switch (rx_msg_type) { case PUCAN_MSG_CAN_RX: err = pcan_usb_fd_decode_canmsg(usb_if, rx_msg); if (err < 0) goto fail; break; case PCAN_UFD_MSG_CALIBRATION: pcan_usb_fd_decode_ts(usb_if, rx_msg); break; case PUCAN_MSG_ERROR: err = pcan_usb_fd_decode_error(usb_if, rx_msg); if (err < 0) goto fail; break; case PUCAN_MSG_STATUS: err = pcan_usb_fd_decode_status(usb_if, rx_msg); if (err < 0) goto fail; break; case PCAN_UFD_MSG_OVERRUN: err = pcan_usb_fd_decode_overrun(usb_if, rx_msg); if (err < 0) goto fail; break; default: netdev_err(netdev, "unhandled msg type 0x%02x (%d): ignored\n", rx_msg_type, rx_msg_type); break; } 
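/* Each record is self-describing: it begins with its own 16-bit size and
 * type, and the list is 32-bit aligned and null-size terminated (see the
 * matching terminator written by pcan_usb_fd_encode_msg() below), so
 * advancing by rx_msg_size steps to the next record.
 */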
msg_ptr += rx_msg_size; } fail: if (err) pcan_dump_mem("received msg", urb->transfer_buffer, urb->actual_length); return err; } /* CAN/CANFD frames encoding callback */ static int pcan_usb_fd_encode_msg(struct peak_usb_device *dev, struct sk_buff *skb, u8 *obuf, size_t *size) { struct pucan_tx_msg *tx_msg = (struct pucan_tx_msg *)obuf; struct canfd_frame *cfd = (struct canfd_frame *)skb->data; u16 tx_msg_size, tx_msg_flags; u8 dlc; if (cfd->len > CANFD_MAX_DLEN) return -EINVAL; tx_msg_size = ALIGN(sizeof(struct pucan_tx_msg) + cfd->len, 4); tx_msg->size = cpu_to_le16(tx_msg_size); tx_msg->type = cpu_to_le16(PUCAN_MSG_CAN_TX); tx_msg_flags = 0; if (cfd->can_id & CAN_EFF_FLAG) { tx_msg_flags |= PUCAN_MSG_EXT_ID; tx_msg->can_id = cpu_to_le32(cfd->can_id & CAN_EFF_MASK); } else { tx_msg->can_id = cpu_to_le32(cfd->can_id & CAN_SFF_MASK); } if (can_is_canfd_skb(skb)) { /* considering a CANFD frame */ dlc = can_fd_len2dlc(cfd->len); tx_msg_flags |= PUCAN_MSG_EXT_DATA_LEN; if (cfd->flags & CANFD_BRS) tx_msg_flags |= PUCAN_MSG_BITRATE_SWITCH; if (cfd->flags & CANFD_ESI) tx_msg_flags |= PUCAN_MSG_ERROR_STATE_IND; } else { /* CAN 2.0 frames */ dlc = can_get_cc_dlc((struct can_frame *)cfd, dev->can.ctrlmode); if (cfd->can_id & CAN_RTR_FLAG) tx_msg_flags |= PUCAN_MSG_RTR; } /* Single-Shot frame */ if (dev->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT) tx_msg_flags |= PUCAN_MSG_SINGLE_SHOT; tx_msg->flags = cpu_to_le16(tx_msg_flags); tx_msg->channel_dlc = PUCAN_MSG_CHANNEL_DLC(dev->ctrl_idx, dlc); memcpy(tx_msg->d, cfd->data, cfd->len); /* add null size message to tag the end (messages are 32-bits aligned) */ tx_msg = (struct pucan_tx_msg *)(obuf + tx_msg_size); tx_msg->size = 0; /* set the whole size of the USB packet to send */ *size = tx_msg_size + sizeof(u32); return 0; } /* start the interface (last chance before set bus on) */ static int pcan_usb_fd_start(struct peak_usb_device *dev) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); int err; /* set filter mode: all acceptance */ err = pcan_usb_fd_set_filter_std(dev, -1, 0xffffffff); if (err) return err; /* opening first device: */ if (pdev->usb_if->dev_opened_count == 0) { /* reset time_ref */ peak_usb_init_time_ref(&pdev->usb_if->time_ref, &pcan_usb_pro_fd); /* enable USB calibration messages */ err = pcan_usb_fd_set_options(dev, 1, PUCAN_OPTION_ERROR, PCAN_UFD_FLTEXT_CALIBRATION); } pdev->usb_if->dev_opened_count++; /* reset cached error counters */ pdev->bec.txerr = 0; pdev->bec.rxerr = 0; return err; } /* socket callback used to copy berr counters values received through USB */ static int pcan_usb_fd_get_berr_counter(const struct net_device *netdev, struct can_berr_counter *bec) { struct peak_usb_device *dev = netdev_priv(netdev); struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); *bec = pdev->bec; /* must return 0 */ return 0; } /* probe function for all PCAN-USB FD family usb interfaces */ static int pcan_usb_fd_probe(struct usb_interface *intf) { struct usb_host_interface *iface_desc = &intf->altsetting[0]; /* CAN interface is always interface #0 */ return iface_desc->desc.bInterfaceNumber; } /* stop interface (last chance before set bus off) */ static int pcan_usb_fd_stop(struct peak_usb_device *dev) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); /* turn off special msgs for that interface if no other dev opened */ if (pdev->usb_if->dev_opened_count == 1) pcan_usb_fd_set_options(dev, 0, PUCAN_OPTION_ERROR, PCAN_UFD_FLTEXT_CALIBRATION); 
pdev->usb_if->dev_opened_count--; return 0; } /* called when probing, to initialize a device object */ static int pcan_usb_fd_init(struct peak_usb_device *dev) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); struct pcan_ufd_fw_info *fw_info; int i, err = -ENOMEM; /* do this for 1st channel only */ if (!dev->prev_siblings) { /* allocate netdevices common structure attached to first one */ pdev->usb_if = kzalloc(sizeof(*pdev->usb_if), GFP_KERNEL); if (!pdev->usb_if) goto err_out; /* allocate command buffer once for all for the interface */ pdev->cmd_buffer_addr = kzalloc(PCAN_UFD_CMD_BUFFER_SIZE, GFP_KERNEL); if (!pdev->cmd_buffer_addr) goto err_out_1; /* number of ts msgs to ignore before taking one into account */ pdev->usb_if->cm_ignore_count = 5; fw_info = &pdev->usb_if->fw_info; err = pcan_usb_fd_read_fwinfo(dev, fw_info); if (err) { dev_err(dev->netdev->dev.parent, "unable to read %s firmware info (err %d)\n", dev->adapter->name, err); goto err_out_2; } /* explicit use of dev_xxx() instead of netdev_xxx() here: * information displayed are related to the device itself, not * to the canx (channel) device. */ dev_info(dev->netdev->dev.parent, "PEAK-System %s v%u fw v%u.%u.%u (%u channels)\n", dev->adapter->name, fw_info->hw_version, fw_info->fw_version[0], fw_info->fw_version[1], fw_info->fw_version[2], dev->adapter->ctrl_count); /* check for ability to switch between ISO/non-ISO modes */ if (fw_info->fw_version[0] >= 2) { /* firmware >= 2.x supports ISO/non-ISO switching */ dev->can.ctrlmode_supported |= CAN_CTRLMODE_FD_NON_ISO; } else { /* firmware < 2.x only supports fixed(!) non-ISO */ dev->can.ctrlmode |= CAN_CTRLMODE_FD_NON_ISO; } /* if vendor rsp is of type 2, then it contains EP numbers to * use for cmds pipes. If not, then default EP should be used. */ if (fw_info->type != cpu_to_le16(PCAN_USBFD_TYPE_EXT)) { fw_info->cmd_out_ep = PCAN_USBPRO_EP_CMDOUT; fw_info->cmd_in_ep = PCAN_USBPRO_EP_CMDIN; } /* tell the hardware the can driver is running */ err = pcan_usb_fd_drv_loaded(dev, 1); if (err) { dev_err(dev->netdev->dev.parent, "unable to tell %s driver is loaded (err %d)\n", dev->adapter->name, err); goto err_out_2; } } else { /* otherwise, simply copy previous sibling's values */ struct pcan_usb_fd_device *ppdev = container_of(dev->prev_siblings, struct pcan_usb_fd_device, dev); pdev->usb_if = ppdev->usb_if; pdev->cmd_buffer_addr = ppdev->cmd_buffer_addr; /* do a copy of the ctrlmode[_supported] too */ dev->can.ctrlmode = ppdev->dev.can.ctrlmode; dev->can.ctrlmode_supported = ppdev->dev.can.ctrlmode_supported; fw_info = &pdev->usb_if->fw_info; } pdev->usb_if->dev[dev->ctrl_idx] = dev; dev->can_channel_id = le32_to_cpu(pdev->usb_if->fw_info.dev_id[dev->ctrl_idx]); /* if vendor rsp is of type 2, then it contains EP numbers to * use for data pipes. If not, then statically defined EP are used * (see peak_usb_create_dev()). 
*/ if (fw_info->type == cpu_to_le16(PCAN_USBFD_TYPE_EXT)) { dev->ep_msg_in = fw_info->data_in_ep; dev->ep_msg_out = fw_info->data_out_ep[dev->ctrl_idx]; } /* set clock domain */ for (i = 0; i < ARRAY_SIZE(pcan_usb_fd_clk_freq); i++) if (dev->adapter->clock.freq == pcan_usb_fd_clk_freq[i]) break; if (i >= ARRAY_SIZE(pcan_usb_fd_clk_freq)) { dev_warn(dev->netdev->dev.parent, "incompatible clock frequencies\n"); err = -EINVAL; goto err_out_2; } pcan_usb_fd_set_clock_domain(dev, i); /* set LED in default state (end of init phase) */ pcan_usb_fd_set_can_led(dev, PCAN_UFD_LED_DEF); return 0; err_out_2: kfree(pdev->cmd_buffer_addr); err_out_1: kfree(pdev->usb_if); err_out: return err; } /* called when driver module is being unloaded */ static void pcan_usb_fd_exit(struct peak_usb_device *dev) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); /* when rmmod called before unplug and if down, should reset things * before leaving */ if (dev->can.state != CAN_STATE_STOPPED) { /* set bus off on the corresponding channel */ pcan_usb_fd_set_bus(dev, 0); } /* switch off corresponding CAN LEDs */ pcan_usb_fd_set_can_led(dev, PCAN_UFD_LED_OFF); /* if channel #0 (only) */ if (dev->ctrl_idx == 0) { /* turn off calibration message if any device were opened */ if (pdev->usb_if->dev_opened_count > 0) pcan_usb_fd_set_options(dev, 0, PUCAN_OPTION_ERROR, PCAN_UFD_FLTEXT_CALIBRATION); /* tell USB adapter that the driver is being unloaded */ pcan_usb_fd_drv_loaded(dev, 0); } } /* called when the USB adapter is unplugged */ static void pcan_usb_fd_free(struct peak_usb_device *dev) { /* last device: can free shared objects now */ if (!dev->prev_siblings && !dev->next_siblings) { struct pcan_usb_fd_device *pdev = container_of(dev, struct pcan_usb_fd_device, dev); /* free commands buffer */ kfree(pdev->cmd_buffer_addr); /* free usb interface object */ kfree(pdev->usb_if); } } /* blink LED's */ static int pcan_usb_fd_set_phys_id(struct net_device *netdev, enum ethtool_phys_id_state state) { struct peak_usb_device *dev = netdev_priv(netdev); int err = 0; switch (state) { case ETHTOOL_ID_ACTIVE: err = pcan_usb_fd_set_can_led(dev, PCAN_UFD_LED_FAST); break; case ETHTOOL_ID_INACTIVE: err = pcan_usb_fd_set_can_led(dev, PCAN_UFD_LED_DEF); break; default: break; } return err; } static const struct ethtool_ops pcan_usb_fd_ethtool_ops = { .set_phys_id = pcan_usb_fd_set_phys_id, .get_ts_info = pcan_get_ts_info, .get_eeprom_len = peak_usb_get_eeprom_len, .get_eeprom = peak_usb_get_eeprom, .set_eeprom = peak_usb_set_eeprom, }; /* describes the PCAN-USB FD adapter */ static const struct can_bittiming_const pcan_usb_fd_const = { .name = "pcan_usb_fd", .tseg1_min = 1, .tseg1_max = (1 << PUCAN_TSLOW_TSGEG1_BITS), .tseg2_min = 1, .tseg2_max = (1 << PUCAN_TSLOW_TSGEG2_BITS), .sjw_max = (1 << PUCAN_TSLOW_SJW_BITS), .brp_min = 1, .brp_max = (1 << PUCAN_TSLOW_BRP_BITS), .brp_inc = 1, }; static const struct can_bittiming_const pcan_usb_fd_data_const = { .name = "pcan_usb_fd", .tseg1_min = 1, .tseg1_max = (1 << PUCAN_TFAST_TSGEG1_BITS), .tseg2_min = 1, .tseg2_max = (1 << PUCAN_TFAST_TSGEG2_BITS), .sjw_max = (1 << PUCAN_TFAST_SJW_BITS), .brp_min = 1, .brp_max = (1 << PUCAN_TFAST_BRP_BITS), .brp_inc = 1, }; const struct peak_usb_adapter pcan_usb_fd = { .name = "PCAN-USB FD", .device_id = PCAN_USBFD_PRODUCT_ID, .ctrl_count = PCAN_USBFD_CHANNEL_COUNT, .ctrlmode_supported = CAN_CTRLMODE_FD | CAN_CTRLMODE_3_SAMPLES | CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_ONE_SHOT | CAN_CTRLMODE_CC_LEN8_DLC, .clock = { .freq = 
PCAN_UFD_CRYSTAL_HZ, }, .bittiming_const = &pcan_usb_fd_const, .data_bittiming_const = &pcan_usb_fd_data_const, /* size of device private data */ .sizeof_dev_private = sizeof(struct pcan_usb_fd_device), .ethtool_ops = &pcan_usb_fd_ethtool_ops, /* timestamps usage */ .ts_used_bits = 32, .us_per_ts_scale = 1, /* us = (ts * scale) >> shift */ .us_per_ts_shift = 0, /* give here messages in/out endpoints */ .ep_msg_in = PCAN_USBPRO_EP_MSGIN, .ep_msg_out = {PCAN_USBPRO_EP_MSGOUT_0}, /* size of rx/tx usb buffers */ .rx_buffer_size = PCAN_UFD_RX_BUFFER_SIZE, .tx_buffer_size = PCAN_UFD_TX_BUFFER_SIZE, /* device callbacks */ .intf_probe = pcan_usb_fd_probe, .dev_init = pcan_usb_fd_init, .dev_exit = pcan_usb_fd_exit, .dev_free = pcan_usb_fd_free, .dev_set_bus = pcan_usb_fd_set_bus, .dev_set_bittiming = pcan_usb_fd_set_bittiming_slow, .dev_set_data_bittiming = pcan_usb_fd_set_bittiming_fast, .dev_get_can_channel_id = pcan_usb_fd_get_can_channel_id, .dev_set_can_channel_id = pcan_usb_fd_set_can_channel_id, .dev_decode_buf = pcan_usb_fd_decode_buf, .dev_start = pcan_usb_fd_start, .dev_stop = pcan_usb_fd_stop, .dev_restart_async = pcan_usb_fd_restart_async, .dev_encode_msg = pcan_usb_fd_encode_msg, .do_get_berr_counter = pcan_usb_fd_get_berr_counter, }; /* describes the PCAN-CHIP USB */ static const struct can_bittiming_const pcan_usb_chip_const = { .name = "pcan_chip_usb", .tseg1_min = 1, .tseg1_max = (1 << PUCAN_TSLOW_TSGEG1_BITS), .tseg2_min = 1, .tseg2_max = (1 << PUCAN_TSLOW_TSGEG2_BITS), .sjw_max = (1 << PUCAN_TSLOW_SJW_BITS), .brp_min = 1, .brp_max = (1 << PUCAN_TSLOW_BRP_BITS), .brp_inc = 1, }; static const struct can_bittiming_const pcan_usb_chip_data_const = { .name = "pcan_chip_usb", .tseg1_min = 1, .tseg1_max = (1 << PUCAN_TFAST_TSGEG1_BITS), .tseg2_min = 1, .tseg2_max = (1 << PUCAN_TFAST_TSGEG2_BITS), .sjw_max = (1 << PUCAN_TFAST_SJW_BITS), .brp_min = 1, .brp_max = (1 << PUCAN_TFAST_BRP_BITS), .brp_inc = 1, }; const struct peak_usb_adapter pcan_usb_chip = { .name = "PCAN-Chip USB", .device_id = PCAN_USBCHIP_PRODUCT_ID, .ctrl_count = PCAN_USBFD_CHANNEL_COUNT, .ctrlmode_supported = CAN_CTRLMODE_FD | CAN_CTRLMODE_3_SAMPLES | CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_ONE_SHOT | CAN_CTRLMODE_CC_LEN8_DLC, .clock = { .freq = PCAN_UFD_CRYSTAL_HZ, }, .bittiming_const = &pcan_usb_chip_const, .data_bittiming_const = &pcan_usb_chip_data_const, /* size of device private data */ .sizeof_dev_private = sizeof(struct pcan_usb_fd_device), .ethtool_ops = &pcan_usb_fd_ethtool_ops, /* timestamps usage */ .ts_used_bits = 32, .us_per_ts_scale = 1, /* us = (ts * scale) >> shift */ .us_per_ts_shift = 0, /* give here messages in/out endpoints */ .ep_msg_in = PCAN_USBPRO_EP_MSGIN, .ep_msg_out = {PCAN_USBPRO_EP_MSGOUT_0}, /* size of rx/tx usb buffers */ .rx_buffer_size = PCAN_UFD_RX_BUFFER_SIZE, .tx_buffer_size = PCAN_UFD_TX_BUFFER_SIZE, /* device callbacks */ .intf_probe = pcan_usb_pro_probe, /* same as PCAN-USB Pro */ .dev_init = pcan_usb_fd_init, .dev_exit = pcan_usb_fd_exit, .dev_free = pcan_usb_fd_free, .dev_set_bus = pcan_usb_fd_set_bus, .dev_set_bittiming = pcan_usb_fd_set_bittiming_slow, .dev_set_data_bittiming = pcan_usb_fd_set_bittiming_fast, .dev_get_can_channel_id = pcan_usb_fd_get_can_channel_id, .dev_set_can_channel_id = pcan_usb_fd_set_can_channel_id, .dev_decode_buf = pcan_usb_fd_decode_buf, .dev_start = pcan_usb_fd_start, .dev_stop = pcan_usb_fd_stop, .dev_restart_async = pcan_usb_fd_restart_async, .dev_encode_msg = pcan_usb_fd_encode_msg, .do_get_berr_counter = pcan_usb_fd_get_berr_counter, }; /* 
describes the PCAN-USB Pro FD adapter */ static const struct can_bittiming_const pcan_usb_pro_fd_const = { .name = "pcan_usb_pro_fd", .tseg1_min = 1, .tseg1_max = (1 << PUCAN_TSLOW_TSGEG1_BITS), .tseg2_min = 1, .tseg2_max = (1 << PUCAN_TSLOW_TSGEG2_BITS), .sjw_max = (1 << PUCAN_TSLOW_SJW_BITS), .brp_min = 1, .brp_max = (1 << PUCAN_TSLOW_BRP_BITS), .brp_inc = 1, }; static const struct can_bittiming_const pcan_usb_pro_fd_data_const = { .name = "pcan_usb_pro_fd", .tseg1_min = 1, .tseg1_max = (1 << PUCAN_TFAST_TSGEG1_BITS), .tseg2_min = 1, .tseg2_max = (1 << PUCAN_TFAST_TSGEG2_BITS), .sjw_max = (1 << PUCAN_TFAST_SJW_BITS), .brp_min = 1, .brp_max = (1 << PUCAN_TFAST_BRP_BITS), .brp_inc = 1, }; const struct peak_usb_adapter pcan_usb_pro_fd = { .name = "PCAN-USB Pro FD", .device_id = PCAN_USBPROFD_PRODUCT_ID, .ctrl_count = PCAN_USBPROFD_CHANNEL_COUNT, .ctrlmode_supported = CAN_CTRLMODE_FD | CAN_CTRLMODE_3_SAMPLES | CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_ONE_SHOT | CAN_CTRLMODE_CC_LEN8_DLC, .clock = { .freq = PCAN_UFD_CRYSTAL_HZ, }, .bittiming_const = &pcan_usb_pro_fd_const, .data_bittiming_const = &pcan_usb_pro_fd_data_const, /* size of device private data */ .sizeof_dev_private = sizeof(struct pcan_usb_fd_device), .ethtool_ops = &pcan_usb_fd_ethtool_ops, /* timestamps usage */ .ts_used_bits = 32, .us_per_ts_scale = 1, /* us = (ts * scale) >> shift */ .us_per_ts_shift = 0, /* give here messages in/out endpoints */ .ep_msg_in = PCAN_USBPRO_EP_MSGIN, .ep_msg_out = {PCAN_USBPRO_EP_MSGOUT_0, PCAN_USBPRO_EP_MSGOUT_1}, /* size of rx/tx usb buffers */ .rx_buffer_size = PCAN_UFD_RX_BUFFER_SIZE, .tx_buffer_size = PCAN_UFD_TX_BUFFER_SIZE, /* device callbacks */ .intf_probe = pcan_usb_pro_probe, /* same as PCAN-USB Pro */ .dev_init = pcan_usb_fd_init, .dev_exit = pcan_usb_fd_exit, .dev_free = pcan_usb_fd_free, .dev_set_bus = pcan_usb_fd_set_bus, .dev_set_bittiming = pcan_usb_fd_set_bittiming_slow, .dev_set_data_bittiming = pcan_usb_fd_set_bittiming_fast, .dev_get_can_channel_id = pcan_usb_fd_get_can_channel_id, .dev_set_can_channel_id = pcan_usb_fd_set_can_channel_id, .dev_decode_buf = pcan_usb_fd_decode_buf, .dev_start = pcan_usb_fd_start, .dev_stop = pcan_usb_fd_stop, .dev_restart_async = pcan_usb_fd_restart_async, .dev_encode_msg = pcan_usb_fd_encode_msg, .do_get_berr_counter = pcan_usb_fd_get_berr_counter, }; /* describes the PCAN-USB X6 adapter */ static const struct can_bittiming_const pcan_usb_x6_const = { .name = "pcan_usb_x6", .tseg1_min = 1, .tseg1_max = (1 << PUCAN_TSLOW_TSGEG1_BITS), .tseg2_min = 1, .tseg2_max = (1 << PUCAN_TSLOW_TSGEG2_BITS), .sjw_max = (1 << PUCAN_TSLOW_SJW_BITS), .brp_min = 1, .brp_max = (1 << PUCAN_TSLOW_BRP_BITS), .brp_inc = 1, }; static const struct can_bittiming_const pcan_usb_x6_data_const = { .name = "pcan_usb_x6", .tseg1_min = 1, .tseg1_max = (1 << PUCAN_TFAST_TSGEG1_BITS), .tseg2_min = 1, .tseg2_max = (1 << PUCAN_TFAST_TSGEG2_BITS), .sjw_max = (1 << PUCAN_TFAST_SJW_BITS), .brp_min = 1, .brp_max = (1 << PUCAN_TFAST_BRP_BITS), .brp_inc = 1, }; const struct peak_usb_adapter pcan_usb_x6 = { .name = "PCAN-USB X6", .device_id = PCAN_USBX6_PRODUCT_ID, .ctrl_count = PCAN_USBPROFD_CHANNEL_COUNT, .ctrlmode_supported = CAN_CTRLMODE_FD | CAN_CTRLMODE_3_SAMPLES | CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_ONE_SHOT | CAN_CTRLMODE_CC_LEN8_DLC, .clock = { .freq = PCAN_UFD_CRYSTAL_HZ, }, .bittiming_const = &pcan_usb_x6_const, .data_bittiming_const = &pcan_usb_x6_data_const, /* size of device private data */ .sizeof_dev_private = sizeof(struct pcan_usb_fd_device), .ethtool_ops = 
&pcan_usb_fd_ethtool_ops, /* timestamps usage */ .ts_used_bits = 32, .us_per_ts_scale = 1, /* us = (ts * scale) >> shift */ .us_per_ts_shift = 0, /* give here messages in/out endpoints */ .ep_msg_in = PCAN_USBPRO_EP_MSGIN, .ep_msg_out = {PCAN_USBPRO_EP_MSGOUT_0, PCAN_USBPRO_EP_MSGOUT_1}, /* size of rx/tx usb buffers */ .rx_buffer_size = PCAN_UFD_RX_BUFFER_SIZE, .tx_buffer_size = PCAN_UFD_TX_BUFFER_SIZE, /* device callbacks */ .intf_probe = pcan_usb_pro_probe, /* same as PCAN-USB Pro */ .dev_init = pcan_usb_fd_init, .dev_exit = pcan_usb_fd_exit, .dev_free = pcan_usb_fd_free, .dev_set_bus = pcan_usb_fd_set_bus, .dev_set_bittiming = pcan_usb_fd_set_bittiming_slow, .dev_set_data_bittiming = pcan_usb_fd_set_bittiming_fast, .dev_get_can_channel_id = pcan_usb_fd_get_can_channel_id, .dev_set_can_channel_id = pcan_usb_fd_set_can_channel_id, .dev_decode_buf = pcan_usb_fd_decode_buf, .dev_start = pcan_usb_fd_start, .dev_stop = pcan_usb_fd_stop, .dev_restart_async = pcan_usb_fd_restart_async, .dev_encode_msg = pcan_usb_fd_encode_msg, .do_get_berr_counter = pcan_usb_fd_get_berr_counter, };
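/*
 * Minimal userspace sketch of the command-buffer convention used by the
 * driver above: helpers fill records from the start of a shared buffer and
 * hand the tail pointer to the send routine, which appends an all-0xff
 * "end of collection" record when room is left, as pcan_usb_fd_send_cmd()
 * does. The fixed 8-byte record and 64-byte buffer are assumptions of this
 * sketch; the driver uses variable-size commands in a
 * PCAN_UFD_CMD_BUFFER_SIZE (512) byte buffer.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CMD_BUF_SIZE 64

struct cmd {
	uint16_t opcode_channel;
	uint8_t payload[6];
};

static size_t finalize_cmd_list(uint8_t *buf, void *tail)
{
	size_t len = (uint8_t *)tail - buf;

	/* terminate the list with an "end of collection" record */
	if (len <= CMD_BUF_SIZE - sizeof(uint64_t)) {
		memset(tail, 0xff, sizeof(uint64_t));
		len += sizeof(uint64_t);
	}
	return len;	/* number of bytes to push to the device */
}

int main(void)
{
	uint8_t buf[CMD_BUF_SIZE] = { 0 };
	struct cmd *c = (struct cmd *)buf;

	c->opcode_channel = 0x80;	/* e.g. a clock-set record */
	c++;				/* tail now points past the record */
	printf("%zu bytes\n", finalize_cmd_list(buf, c));	/* 16 bytes */
	return 0;
}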
/* * Copyright (c) 2004-2011 Atheros Communications Inc. * Copyright (c) 2011-2012 Qualcomm Atheros, Inc. * * Permission to use, copy, modify, and/or distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
*/ #include "core.h" #include "hif-ops.h" #include "target.h" #include "debug.h" int ath6kl_bmi_done(struct ath6kl *ar) { int ret; u32 cid = BMI_DONE; if (ar->bmi.done_sent) { ath6kl_dbg(ATH6KL_DBG_BMI, "bmi done skipped\n"); return 0; } ar->bmi.done_sent = true; ret = ath6kl_hif_bmi_write(ar, (u8 *)&cid, sizeof(cid)); if (ret) { ath6kl_err("Unable to send bmi done: %d\n", ret); return ret; } return 0; } int ath6kl_bmi_get_target_info(struct ath6kl *ar, struct ath6kl_bmi_target_info *targ_info) { int ret; u32 cid = BMI_GET_TARGET_INFO; if (ar->bmi.done_sent) { ath6kl_err("bmi done sent already, cmd %d disallowed\n", cid); return -EACCES; } ret = ath6kl_hif_bmi_write(ar, (u8 *)&cid, sizeof(cid)); if (ret) { ath6kl_err("Unable to send get target info: %d\n", ret); return ret; } if (ar->hif_type == ATH6KL_HIF_TYPE_USB) { ret = ath6kl_hif_bmi_read(ar, (u8 *)targ_info, sizeof(*targ_info)); } else { ret = ath6kl_hif_bmi_read(ar, (u8 *)&targ_info->version, sizeof(targ_info->version)); } if (ret) { ath6kl_err("Unable to recv target info: %d\n", ret); return ret; } if (le32_to_cpu(targ_info->version) == TARGET_VERSION_SENTINAL) { /* Determine how many bytes are in the Target's targ_info */ ret = ath6kl_hif_bmi_read(ar, (u8 *)&targ_info->byte_count, sizeof(targ_info->byte_count)); if (ret) { ath6kl_err("unable to read target info byte count: %d\n", ret); return ret; } /* * The target's targ_info doesn't match the host's targ_info. * We need to do some backwards compatibility to make this work. */ if (le32_to_cpu(targ_info->byte_count) != sizeof(*targ_info)) { ath6kl_err("mismatched byte count %d vs. expected %zd\n", le32_to_cpu(targ_info->byte_count), sizeof(*targ_info)); return -EINVAL; } /* Read the remainder of the targ_info */ ret = ath6kl_hif_bmi_read(ar, ((u8 *)targ_info) + sizeof(targ_info->byte_count), sizeof(*targ_info) - sizeof(targ_info->byte_count)); if (ret) { ath6kl_err("Unable to read target info (%d bytes): %d\n", targ_info->byte_count, ret); return ret; } } ath6kl_dbg(ATH6KL_DBG_BMI, "target info (ver: 0x%x type: 0x%x)\n", targ_info->version, targ_info->type); return 0; } int ath6kl_bmi_read(struct ath6kl *ar, u32 addr, u8 *buf, u32 len) { u32 cid = BMI_READ_MEMORY; int ret; u32 offset; u32 len_remain, rx_len; u16 size; if (ar->bmi.done_sent) { ath6kl_err("bmi done sent already, cmd %d disallowed\n", cid); return -EACCES; } size = ar->bmi.max_data_size + sizeof(cid) + sizeof(addr) + sizeof(len); if (size > ar->bmi.max_cmd_size) { WARN_ON(1); return -EINVAL; } memset(ar->bmi.cmd_buf, 0, size); ath6kl_dbg(ATH6KL_DBG_BMI, "bmi read memory: device: addr: 0x%x, len: %d\n", addr, len); len_remain = len; while (len_remain) { rx_len = (len_remain < ar->bmi.max_data_size) ? 
len_remain : ar->bmi.max_data_size; offset = 0; memcpy(&(ar->bmi.cmd_buf[offset]), &cid, sizeof(cid)); offset += sizeof(cid); memcpy(&(ar->bmi.cmd_buf[offset]), &addr, sizeof(addr)); offset += sizeof(addr); memcpy(&(ar->bmi.cmd_buf[offset]), &rx_len, sizeof(rx_len)); offset += sizeof(len); ret = ath6kl_hif_bmi_write(ar, ar->bmi.cmd_buf, offset); if (ret) { ath6kl_err("Unable to write to the device: %d\n", ret); return ret; } ret = ath6kl_hif_bmi_read(ar, ar->bmi.cmd_buf, rx_len); if (ret) { ath6kl_err("Unable to read from the device: %d\n", ret); return ret; } memcpy(&buf[len - len_remain], ar->bmi.cmd_buf, rx_len); len_remain -= rx_len; addr += rx_len; } return 0; } int ath6kl_bmi_write(struct ath6kl *ar, u32 addr, u8 *buf, u32 len) { u32 cid = BMI_WRITE_MEMORY; int ret; u32 offset; u32 len_remain, tx_len; const u32 header = sizeof(cid) + sizeof(addr) + sizeof(len); u8 aligned_buf[400]; u8 *src; if (ar->bmi.done_sent) { ath6kl_err("bmi done sent already, cmd %d disallowed\n", cid); return -EACCES; } if ((ar->bmi.max_data_size + header) > ar->bmi.max_cmd_size) { WARN_ON(1); return -EINVAL; } if (WARN_ON(ar->bmi.max_data_size > sizeof(aligned_buf))) return -E2BIG; memset(ar->bmi.cmd_buf, 0, ar->bmi.max_data_size + header); ath6kl_dbg(ATH6KL_DBG_BMI, "bmi write memory: addr: 0x%x, len: %d\n", addr, len); len_remain = len; while (len_remain) { src = &buf[len - len_remain]; if (len_remain < (ar->bmi.max_data_size - header)) { if (len_remain & 3) { /* align it with 4 bytes */ len_remain = len_remain + (4 - (len_remain & 3)); memcpy(aligned_buf, src, len_remain); src = aligned_buf; } tx_len = len_remain; } else { tx_len = (ar->bmi.max_data_size - header); } offset = 0; memcpy(&(ar->bmi.cmd_buf[offset]), &cid, sizeof(cid)); offset += sizeof(cid); memcpy(&(ar->bmi.cmd_buf[offset]), &addr, sizeof(addr)); offset += sizeof(addr); memcpy(&(ar->bmi.cmd_buf[offset]), &tx_len, sizeof(tx_len)); offset += sizeof(tx_len); memcpy(&(ar->bmi.cmd_buf[offset]), src, tx_len); offset += tx_len; ret = ath6kl_hif_bmi_write(ar, ar->bmi.cmd_buf, offset); if (ret) { ath6kl_err("Unable to write to the device: %d\n", ret); return ret; } len_remain -= tx_len; addr += tx_len; } return 0; } int ath6kl_bmi_execute(struct ath6kl *ar, u32 addr, u32 *param) { u32 cid = BMI_EXECUTE; int ret; u32 offset; u16 size; if (ar->bmi.done_sent) { ath6kl_err("bmi done sent already, cmd %d disallowed\n", cid); return -EACCES; } size = sizeof(cid) + sizeof(addr) + sizeof(*param); if (size > ar->bmi.max_cmd_size) { WARN_ON(1); return -EINVAL; } memset(ar->bmi.cmd_buf, 0, size); ath6kl_dbg(ATH6KL_DBG_BMI, "bmi execute: addr: 0x%x, param: %d)\n", addr, *param); offset = 0; memcpy(&(ar->bmi.cmd_buf[offset]), &cid, sizeof(cid)); offset += sizeof(cid); memcpy(&(ar->bmi.cmd_buf[offset]), &addr, sizeof(addr)); offset += sizeof(addr); memcpy(&(ar->bmi.cmd_buf[offset]), param, sizeof(*param)); offset += sizeof(*param); ret = ath6kl_hif_bmi_write(ar, ar->bmi.cmd_buf, offset); if (ret) { ath6kl_err("Unable to write to the device: %d\n", ret); return ret; } ret = ath6kl_hif_bmi_read(ar, ar->bmi.cmd_buf, sizeof(*param)); if (ret) { ath6kl_err("Unable to read from the device: %d\n", ret); return ret; } memcpy(param, ar->bmi.cmd_buf, sizeof(*param)); return 0; } int ath6kl_bmi_set_app_start(struct ath6kl *ar, u32 addr) { u32 cid = BMI_SET_APP_START; int ret; u32 offset; u16 size; if (ar->bmi.done_sent) { ath6kl_err("bmi done sent already, cmd %d disallowed\n", cid); return -EACCES; } size = sizeof(cid) + sizeof(addr); if (size > ar->bmi.max_cmd_size) { 
WARN_ON(1); return -EINVAL; } memset(ar->bmi.cmd_buf, 0, size); ath6kl_dbg(ATH6KL_DBG_BMI, "bmi set app start: addr: 0x%x\n", addr); offset = 0; memcpy(&(ar->bmi.cmd_buf[offset]), &cid, sizeof(cid)); offset += sizeof(cid); memcpy(&(ar->bmi.cmd_buf[offset]), &addr, sizeof(addr)); offset += sizeof(addr); ret = ath6kl_hif_bmi_write(ar, ar->bmi.cmd_buf, offset); if (ret) { ath6kl_err("Unable to write to the device: %d\n", ret); return ret; } return 0; } int ath6kl_bmi_reg_read(struct ath6kl *ar, u32 addr, u32 *param) { u32 cid = BMI_READ_SOC_REGISTER; int ret; u32 offset; u16 size; if (ar->bmi.done_sent) { ath6kl_err("bmi done sent already, cmd %d disallowed\n", cid); return -EACCES; } size = sizeof(cid) + sizeof(addr); if (size > ar->bmi.max_cmd_size) { WARN_ON(1); return -EINVAL; } memset(ar->bmi.cmd_buf, 0, size); ath6kl_dbg(ATH6KL_DBG_BMI, "bmi read SOC reg: addr: 0x%x\n", addr); offset = 0; memcpy(&(ar->bmi.cmd_buf[offset]), &cid, sizeof(cid)); offset += sizeof(cid); memcpy(&(ar->bmi.cmd_buf[offset]), &addr, sizeof(addr)); offset += sizeof(addr); ret = ath6kl_hif_bmi_write(ar, ar->bmi.cmd_buf, offset); if (ret) { ath6kl_err("Unable to write to the device: %d\n", ret); return ret; } ret = ath6kl_hif_bmi_read(ar, ar->bmi.cmd_buf, sizeof(*param)); if (ret) { ath6kl_err("Unable to read from the device: %d\n", ret); return ret; } memcpy(param, ar->bmi.cmd_buf, sizeof(*param)); return 0; } int ath6kl_bmi_reg_write(struct ath6kl *ar, u32 addr, u32 param) { u32 cid = BMI_WRITE_SOC_REGISTER; int ret; u32 offset; u16 size; if (ar->bmi.done_sent) { ath6kl_err("bmi done sent already, cmd %d disallowed\n", cid); return -EACCES; } size = sizeof(cid) + sizeof(addr) + sizeof(param); if (size > ar->bmi.max_cmd_size) { WARN_ON(1); return -EINVAL; } memset(ar->bmi.cmd_buf, 0, size); ath6kl_dbg(ATH6KL_DBG_BMI, "bmi write SOC reg: addr: 0x%x, param: %d\n", addr, param); offset = 0; memcpy(&(ar->bmi.cmd_buf[offset]), &cid, sizeof(cid)); offset += sizeof(cid); memcpy(&(ar->bmi.cmd_buf[offset]), &addr, sizeof(addr)); offset += sizeof(addr); memcpy(&(ar->bmi.cmd_buf[offset]), &param, sizeof(param)); offset += sizeof(param); ret = ath6kl_hif_bmi_write(ar, ar->bmi.cmd_buf, offset); if (ret) { ath6kl_err("Unable to write to the device: %d\n", ret); return ret; } return 0; } int ath6kl_bmi_lz_data(struct ath6kl *ar, u8 *buf, u32 len) { u32 cid = BMI_LZ_DATA; int ret; u32 offset; u32 len_remain, tx_len; const u32 header = sizeof(cid) + sizeof(len); u16 size; if (ar->bmi.done_sent) { ath6kl_err("bmi done sent already, cmd %d disallowed\n", cid); return -EACCES; } size = ar->bmi.max_data_size + header; if (size > ar->bmi.max_cmd_size) { WARN_ON(1); return -EINVAL; } memset(ar->bmi.cmd_buf, 0, size); ath6kl_dbg(ATH6KL_DBG_BMI, "bmi send LZ data: len: %d)\n", len); len_remain = len; while (len_remain) { tx_len = (len_remain < (ar->bmi.max_data_size - header)) ?
len_remain : (ar->bmi.max_data_size - header); offset = 0; memcpy(&(ar->bmi.cmd_buf[offset]), &cid, sizeof(cid)); offset += sizeof(cid); memcpy(&(ar->bmi.cmd_buf[offset]), &tx_len, sizeof(tx_len)); offset += sizeof(tx_len); memcpy(&(ar->bmi.cmd_buf[offset]), &buf[len - len_remain], tx_len); offset += tx_len; ret = ath6kl_hif_bmi_write(ar, ar->bmi.cmd_buf, offset); if (ret) { ath6kl_err("Unable to write to the device: %d\n", ret); return ret; } len_remain -= tx_len; } return 0; } int ath6kl_bmi_lz_stream_start(struct ath6kl *ar, u32 addr) { u32 cid = BMI_LZ_STREAM_START; int ret; u32 offset; u16 size; if (ar->bmi.done_sent) { ath6kl_err("bmi done sent already, cmd %d disallowed\n", cid); return -EACCES; } size = sizeof(cid) + sizeof(addr); if (size > ar->bmi.max_cmd_size) { WARN_ON(1); return -EINVAL; } memset(ar->bmi.cmd_buf, 0, size); ath6kl_dbg(ATH6KL_DBG_BMI, "bmi LZ stream start: addr: 0x%x)\n", addr); offset = 0; memcpy(&(ar->bmi.cmd_buf[offset]), &cid, sizeof(cid)); offset += sizeof(cid); memcpy(&(ar->bmi.cmd_buf[offset]), &addr, sizeof(addr)); offset += sizeof(addr); ret = ath6kl_hif_bmi_write(ar, ar->bmi.cmd_buf, offset); if (ret) { ath6kl_err("Unable to start LZ stream to the device: %d\n", ret); return ret; } return 0; } int ath6kl_bmi_fast_download(struct ath6kl *ar, u32 addr, u8 *buf, u32 len) { int ret; u32 last_word = 0; u32 last_word_offset = len & ~0x3; u32 unaligned_bytes = len & 0x3; ret = ath6kl_bmi_lz_stream_start(ar, addr); if (ret) return ret; if (unaligned_bytes) { /* copy the last word into a zero padded buffer */ memcpy(&last_word, &buf[last_word_offset], unaligned_bytes); } ret = ath6kl_bmi_lz_data(ar, buf, last_word_offset); if (ret) return ret; if (unaligned_bytes) ret = ath6kl_bmi_lz_data(ar, (u8 *)&last_word, 4); if (!ret) { /* Close compressed stream and open a new (fake) one. * This serves mainly to flush Target caches. */ ret = ath6kl_bmi_lz_stream_start(ar, 0x00); } return ret; } void ath6kl_bmi_reset(struct ath6kl *ar) { ar->bmi.done_sent = false; } int ath6kl_bmi_init(struct ath6kl *ar) { if (WARN_ON(ar->bmi.max_data_size == 0)) return -EINVAL; /* cmd + addr + len + data_size */ ar->bmi.max_cmd_size = ar->bmi.max_data_size + (sizeof(u32) * 3); ar->bmi.cmd_buf = kzalloc(ar->bmi.max_cmd_size, GFP_KERNEL); if (!ar->bmi.cmd_buf) return -ENOMEM; return 0; } void ath6kl_bmi_cleanup(struct ath6kl *ar) { kfree(ar->bmi.cmd_buf); ar->bmi.cmd_buf = NULL; } |
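The BMI helpers above all build requests the same way: a little-endian command id, followed by optional address and length words, packed back-to-back into ar->bmi.cmd_buf and handed to the HIF layer, with large transfers split into max_data_size chunks. The following stand-alone sketch models only that packing and chunking logic; the command value and buffer size are assumptions for illustration, and no hardware is involved.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BMI_READ_MEMORY 2		/* assumed value, for illustration only */
#define MAX_DATA_SIZE   256		/* assumed per-transfer limit */

/* Pack a read request the way ath6kl_bmi_read() does: cid, addr, len. */
static size_t bmi_pack_read_cmd(uint8_t *cmd_buf, uint32_t addr, uint32_t len)
{
	uint32_t cid = BMI_READ_MEMORY;
	size_t offset = 0;

	memcpy(cmd_buf + offset, &cid, sizeof(cid));
	offset += sizeof(cid);
	memcpy(cmd_buf + offset, &addr, sizeof(addr));
	offset += sizeof(addr);
	memcpy(cmd_buf + offset, &len, sizeof(len));
	offset += sizeof(len);

	return offset;			/* number of header bytes to send */
}

int main(void)
{
	uint8_t cmd_buf[MAX_DATA_SIZE + 3 * sizeof(uint32_t)];
	uint32_t remain = 1000, addr = 0x540000, chunk;

	/* Same chunking loop as ath6kl_bmi_read(): never request more
	 * than the per-transfer limit in one command. */
	while (remain) {
		chunk = remain < MAX_DATA_SIZE ? remain : MAX_DATA_SIZE;
		printf("read %u bytes at 0x%x (%zu header bytes)\n",
		       (unsigned)chunk, (unsigned)addr,
		       bmi_pack_read_cmd(cmd_buf, addr, chunk));
		addr += chunk;
		remain -= chunk;
	}
	return 0;
}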
907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 | // SPDX-License-Identifier: GPL-2.0 /* Fintek F81604 USB-to-2CAN controller driver. * * Copyright (C) 2023 Ji-Ze Hong (Peter Hong) <peter_hong@fintek.com.tw> */ #include <linux/bitfield.h> #include <linux/netdevice.h> #include <linux/units.h> #include <linux/usb.h> #include <linux/can.h> #include <linux/can/dev.h> #include <linux/can/error.h> #include <linux/can/platform/sja1000.h> #include <linux/unaligned.h> /* vendor and product id */ #define F81604_VENDOR_ID 0x2c42 #define F81604_PRODUCT_ID 0x1709 #define F81604_CAN_CLOCK (12 * MEGA) #define F81604_MAX_DEV 2 #define F81604_SET_DEVICE_RETRY 10 #define F81604_USB_TIMEOUT 2000 #define F81604_SET_GET_REGISTER 0xA0 #define F81604_PORT_OFFSET 0x1000 #define F81604_MAX_RX_URBS 4 #define F81604_CMD_DATA 0x00 #define F81604_DLC_LEN_MASK GENMASK(3, 0) #define F81604_DLC_EFF_BIT BIT(7) #define F81604_DLC_RTR_BIT BIT(6) #define F81604_SFF_SHIFT 5 #define F81604_EFF_SHIFT 3 #define F81604_BRP_MASK GENMASK(5, 0) #define F81604_SJW_MASK GENMASK(7, 6) #define F81604_SEG1_MASK GENMASK(3, 0) #define F81604_SEG2_MASK GENMASK(6, 4) #define F81604_CLEAR_ALC 0 #define F81604_CLEAR_ECC 1 #define F81604_CLEAR_OVERRUN 2 /* device setting */ #define F81604_CTRL_MODE_REG 0x80 #define F81604_TX_ONESHOT (0x03 << 3) #define F81604_TX_NORMAL (0x01 << 3) #define F81604_RX_AUTO_RELEASE_BUF BIT(1) #define F81604_INT_WHEN_CHANGE BIT(0) #define F81604_TERMINATOR_REG 0x105 #define F81604_CAN0_TERM BIT(2) #define F81604_CAN1_TERM BIT(3) #define F81604_TERMINATION_DISABLED CAN_TERMINATION_DISABLED #define F81604_TERMINATION_ENABLED 120 /* SJA1000 registers - manual section 6.4 (Pelican Mode) */ #define F81604_SJA1000_MOD 0x00 #define F81604_SJA1000_CMR 0x01 #define F81604_SJA1000_IR 0x03 #define F81604_SJA1000_IER 0x04 #define F81604_SJA1000_ALC 0x0B #define F81604_SJA1000_ECC 0x0C #define F81604_SJA1000_RXERR 0x0E #define F81604_SJA1000_TXERR 0x0F #define F81604_SJA1000_ACCC0 0x10 #define F81604_SJA1000_ACCM0 0x14 #define F81604_MAX_FILTER_CNT 4 /* Common registers - manual section 6.5 */ #define F81604_SJA1000_BTR0 0x06 #define F81604_SJA1000_BTR1 0x07 #define F81604_SJA1000_BTR1_SAMPLE_TRIPLE 
BIT(7) #define F81604_SJA1000_OCR 0x08 #define F81604_SJA1000_CDR 0x1F /* mode register */ #define F81604_SJA1000_MOD_RM 0x01 #define F81604_SJA1000_MOD_LOM 0x02 #define F81604_SJA1000_MOD_STM 0x04 /* commands */ #define F81604_SJA1000_CMD_CDO 0x08 /* interrupt sources */ #define F81604_SJA1000_IRQ_BEI 0x80 #define F81604_SJA1000_IRQ_ALI 0x40 #define F81604_SJA1000_IRQ_EPI 0x20 #define F81604_SJA1000_IRQ_DOI 0x08 #define F81604_SJA1000_IRQ_EI 0x04 #define F81604_SJA1000_IRQ_TI 0x02 #define F81604_SJA1000_IRQ_RI 0x01 #define F81604_SJA1000_IRQ_ALL 0xFF #define F81604_SJA1000_IRQ_OFF 0x00 /* status register content */ #define F81604_SJA1000_SR_BS 0x80 #define F81604_SJA1000_SR_ES 0x40 #define F81604_SJA1000_SR_TCS 0x08 /* ECC register */ #define F81604_SJA1000_ECC_SEG 0x1F #define F81604_SJA1000_ECC_DIR 0x20 #define F81604_SJA1000_ECC_BIT 0x00 #define F81604_SJA1000_ECC_FORM 0x40 #define F81604_SJA1000_ECC_STUFF 0x80 #define F81604_SJA1000_ECC_MASK 0xc0 /* ALC register */ #define F81604_SJA1000_ALC_MASK 0x1f /* table of devices that work with this driver */ static const struct usb_device_id f81604_table[] = { { USB_DEVICE(F81604_VENDOR_ID, F81604_PRODUCT_ID) }, {} /* Terminating entry */ }; MODULE_DEVICE_TABLE(usb, f81604_table); static const struct ethtool_ops f81604_ethtool_ops = { .get_ts_info = ethtool_op_get_ts_info, }; static const u16 f81604_termination[] = { F81604_TERMINATION_DISABLED, F81604_TERMINATION_ENABLED }; struct f81604_priv { struct net_device *netdev[F81604_MAX_DEV]; }; struct f81604_port_priv { struct can_priv can; struct net_device *netdev; struct sk_buff *echo_skb; unsigned long clear_flags; struct work_struct clear_reg_work; struct usb_device *dev; struct usb_interface *intf; struct usb_anchor urbs_anchor; }; /* Interrupt endpoint data format: * Byte 0: Status register. * Byte 1: Interrupt register. * Byte 2: Interrupt enable register. * Byte 3: Arbitration lost capture(ALC) register. * Byte 4: Error code capture(ECC) register. * Byte 5: Error warning limit register. * Byte 6: RX error counter register. * Byte 7: TX error counter register. * Byte 8: Reserved. */ struct f81604_int_data { u8 sr; u8 isrc; u8 ier; u8 alc; u8 ecc; u8 ewlr; u8 rxerr; u8 txerr; u8 val; } __packed __aligned(4); struct f81604_sff { __be16 id; u8 data[CAN_MAX_DLEN]; } __packed __aligned(2); struct f81604_eff { __be32 id; u8 data[CAN_MAX_DLEN]; } __packed __aligned(2); struct f81604_can_frame { u8 cmd; /* According for F81604 DLC define: * bit 3~0: data length (0~8) * bit6: is RTR flag. * bit7: is EFF frame. 
*/ u8 dlc; union { struct f81604_sff sff; struct f81604_eff eff; }; } __packed __aligned(2); static const u8 bulk_in_addr[F81604_MAX_DEV] = { 2, 4 }; static const u8 bulk_out_addr[F81604_MAX_DEV] = { 1, 3 }; static const u8 int_in_addr[F81604_MAX_DEV] = { 1, 3 }; static int f81604_write(struct usb_device *dev, u16 reg, u8 data) { int ret; ret = usb_control_msg_send(dev, 0, F81604_SET_GET_REGISTER, USB_TYPE_VENDOR | USB_DIR_OUT, 0, reg, &data, sizeof(data), F81604_USB_TIMEOUT, GFP_KERNEL); if (ret) dev_err(&dev->dev, "%s: reg: %x data: %x failed: %pe\n", __func__, reg, data, ERR_PTR(ret)); return ret; } static int f81604_read(struct usb_device *dev, u16 reg, u8 *data) { int ret; ret = usb_control_msg_recv(dev, 0, F81604_SET_GET_REGISTER, USB_TYPE_VENDOR | USB_DIR_IN, 0, reg, data, sizeof(*data), F81604_USB_TIMEOUT, GFP_KERNEL); if (ret < 0) dev_err(&dev->dev, "%s: reg: %x failed: %pe\n", __func__, reg, ERR_PTR(ret)); return ret; } static int f81604_update_bits(struct usb_device *dev, u16 reg, u8 mask, u8 data) { int ret; u8 tmp; ret = f81604_read(dev, reg, &tmp); if (ret) return ret; tmp &= ~mask; tmp |= (mask & data); return f81604_write(dev, reg, tmp); } static int f81604_sja1000_write(struct f81604_port_priv *priv, u16 reg, u8 data) { int port = priv->netdev->dev_port; int real_reg; real_reg = reg + F81604_PORT_OFFSET * port + F81604_PORT_OFFSET; return f81604_write(priv->dev, real_reg, data); } static int f81604_sja1000_read(struct f81604_port_priv *priv, u16 reg, u8 *data) { int port = priv->netdev->dev_port; int real_reg; real_reg = reg + F81604_PORT_OFFSET * port + F81604_PORT_OFFSET; return f81604_read(priv->dev, real_reg, data); } static int f81604_set_reset_mode(struct f81604_port_priv *priv) { int ret, i; u8 tmp; /* disable interrupts */ ret = f81604_sja1000_write(priv, F81604_SJA1000_IER, F81604_SJA1000_IRQ_OFF); if (ret) return ret; for (i = 0; i < F81604_SET_DEVICE_RETRY; i++) { ret = f81604_sja1000_read(priv, F81604_SJA1000_MOD, &tmp); if (ret) return ret; /* check reset bit */ if (tmp & F81604_SJA1000_MOD_RM) { priv->can.state = CAN_STATE_STOPPED; return 0; } /* reset chip */ ret = f81604_sja1000_write(priv, F81604_SJA1000_MOD, F81604_SJA1000_MOD_RM); if (ret) return ret; } return -EPERM; } static int f81604_set_normal_mode(struct f81604_port_priv *priv) { u8 tmp, ier = 0; u8 mod_reg = 0; int ret, i; for (i = 0; i < F81604_SET_DEVICE_RETRY; i++) { ret = f81604_sja1000_read(priv, F81604_SJA1000_MOD, &tmp); if (ret) return ret; /* check reset bit */ if ((tmp & F81604_SJA1000_MOD_RM) == 0) { priv->can.state = CAN_STATE_ERROR_ACTIVE; /* enable interrupts, RI handled by bulk-in */ ier = F81604_SJA1000_IRQ_ALL & ~F81604_SJA1000_IRQ_RI; if (!(priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)) ier &= ~F81604_SJA1000_IRQ_BEI; return f81604_sja1000_write(priv, F81604_SJA1000_IER, ier); } /* set chip to normal mode */ if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY) mod_reg |= F81604_SJA1000_MOD_LOM; if (priv->can.ctrlmode & CAN_CTRLMODE_PRESUME_ACK) mod_reg |= F81604_SJA1000_MOD_STM; ret = f81604_sja1000_write(priv, F81604_SJA1000_MOD, mod_reg); if (ret) return ret; } return -EPERM; } static int f81604_chipset_init(struct f81604_port_priv *priv) { int i, ret; /* set clock divider and output control register */ ret = f81604_sja1000_write(priv, F81604_SJA1000_CDR, CDR_CBP | CDR_PELICAN); if (ret) return ret; /* set acceptance filter (accept all) */ for (i = 0; i < F81604_MAX_FILTER_CNT; ++i) { ret = f81604_sja1000_write(priv, F81604_SJA1000_ACCC0 + i, 0); if (ret) return ret; } for (i 
= 0; i < F81604_MAX_FILTER_CNT; ++i) { ret = f81604_sja1000_write(priv, F81604_SJA1000_ACCM0 + i, 0xFF); if (ret) return ret; } return f81604_sja1000_write(priv, F81604_SJA1000_OCR, OCR_TX0_PUSHPULL | OCR_TX1_PUSHPULL | OCR_MODE_NORMAL); } static void f81604_process_rx_packet(struct net_device *netdev, struct f81604_can_frame *frame) { struct net_device_stats *stats = &netdev->stats; struct can_frame *cf; struct sk_buff *skb; if (frame->cmd != F81604_CMD_DATA) return; skb = alloc_can_skb(netdev, &cf); if (!skb) { stats->rx_dropped++; return; } cf->len = can_cc_dlc2len(frame->dlc & F81604_DLC_LEN_MASK); if (frame->dlc & F81604_DLC_EFF_BIT) { cf->can_id = get_unaligned_be32(&frame->eff.id) >> F81604_EFF_SHIFT; cf->can_id |= CAN_EFF_FLAG; if (!(frame->dlc & F81604_DLC_RTR_BIT)) memcpy(cf->data, frame->eff.data, cf->len); } else { cf->can_id = get_unaligned_be16(&frame->sff.id) >> F81604_SFF_SHIFT; if (!(frame->dlc & F81604_DLC_RTR_BIT)) memcpy(cf->data, frame->sff.data, cf->len); } if (frame->dlc & F81604_DLC_RTR_BIT) cf->can_id |= CAN_RTR_FLAG; else stats->rx_bytes += cf->len; stats->rx_packets++; netif_rx(skb); } static void f81604_read_bulk_callback(struct urb *urb) { struct f81604_can_frame *frame = urb->transfer_buffer; struct net_device *netdev = urb->context; int ret; if (!netif_device_present(netdev)) return; if (urb->status) netdev_info(netdev, "%s: URB aborted %pe\n", __func__, ERR_PTR(urb->status)); switch (urb->status) { case 0: /* success */ break; case -ENOENT: case -EPIPE: case -EPROTO: case -ESHUTDOWN: return; default: goto resubmit_urb; } if (urb->actual_length != sizeof(*frame)) { netdev_warn(netdev, "URB length %u not equal to %zu\n", urb->actual_length, sizeof(*frame)); goto resubmit_urb; } f81604_process_rx_packet(netdev, frame); resubmit_urb: ret = usb_submit_urb(urb, GFP_ATOMIC); if (ret == -ENODEV) netif_device_detach(netdev); else if (ret) netdev_err(netdev, "%s: failed to resubmit read bulk urb: %pe\n", __func__, ERR_PTR(ret)); } static void f81604_handle_tx(struct f81604_port_priv *priv, struct f81604_int_data *data) { struct net_device *netdev = priv->netdev; struct net_device_stats *stats = &netdev->stats; /* transmission buffer released */ if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT && !(data->sr & F81604_SJA1000_SR_TCS)) { stats->tx_errors++; can_free_echo_skb(netdev, 0, NULL); } else { /* transmission complete */ stats->tx_bytes += can_get_echo_skb(netdev, 0, NULL); stats->tx_packets++; } netif_wake_queue(netdev); } static void f81604_handle_can_bus_errors(struct f81604_port_priv *priv, struct f81604_int_data *data) { enum can_state can_state = priv->can.state; struct net_device *netdev = priv->netdev; struct net_device_stats *stats = &netdev->stats; struct can_frame *cf; struct sk_buff *skb; /* Note: ALC/ECC will not auto clear by read here, must be cleared by * read register (via clear_reg_work). 
*/ skb = alloc_can_err_skb(netdev, &cf); if (skb) { cf->can_id |= CAN_ERR_CNT; cf->data[6] = data->txerr; cf->data[7] = data->rxerr; } if (data->isrc & F81604_SJA1000_IRQ_DOI) { /* data overrun interrupt */ netdev_dbg(netdev, "data overrun interrupt\n"); if (skb) { cf->can_id |= CAN_ERR_CRTL; cf->data[1] = CAN_ERR_CRTL_RX_OVERFLOW; } stats->rx_over_errors++; stats->rx_errors++; set_bit(F81604_CLEAR_OVERRUN, &priv->clear_flags); } if (data->isrc & F81604_SJA1000_IRQ_EI) { /* error warning interrupt */ netdev_dbg(netdev, "error warning interrupt\n"); if (data->sr & F81604_SJA1000_SR_BS) can_state = CAN_STATE_BUS_OFF; else if (data->sr & F81604_SJA1000_SR_ES) can_state = CAN_STATE_ERROR_WARNING; else can_state = CAN_STATE_ERROR_ACTIVE; } if (data->isrc & F81604_SJA1000_IRQ_BEI) { /* bus error interrupt */ netdev_dbg(netdev, "bus error interrupt\n"); priv->can.can_stats.bus_error++; if (skb) { cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR; /* set error type */ switch (data->ecc & F81604_SJA1000_ECC_MASK) { case F81604_SJA1000_ECC_BIT: cf->data[2] |= CAN_ERR_PROT_BIT; break; case F81604_SJA1000_ECC_FORM: cf->data[2] |= CAN_ERR_PROT_FORM; break; case F81604_SJA1000_ECC_STUFF: cf->data[2] |= CAN_ERR_PROT_STUFF; break; default: break; } /* set error location */ cf->data[3] = data->ecc & F81604_SJA1000_ECC_SEG; } /* Error occurred during transmission? */ if ((data->ecc & F81604_SJA1000_ECC_DIR) == 0) { stats->tx_errors++; if (skb) cf->data[2] |= CAN_ERR_PROT_TX; } else { stats->rx_errors++; } set_bit(F81604_CLEAR_ECC, &priv->clear_flags); } if (data->isrc & F81604_SJA1000_IRQ_EPI) { if (can_state == CAN_STATE_ERROR_PASSIVE) can_state = CAN_STATE_ERROR_WARNING; else can_state = CAN_STATE_ERROR_PASSIVE; /* error passive interrupt */ netdev_dbg(netdev, "error passive interrupt: %d\n", can_state); } if (data->isrc & F81604_SJA1000_IRQ_ALI) { /* arbitration lost interrupt */ netdev_dbg(netdev, "arbitration lost interrupt\n"); priv->can.can_stats.arbitration_lost++; if (skb) { cf->can_id |= CAN_ERR_LOSTARB; cf->data[0] = data->alc & F81604_SJA1000_ALC_MASK; } set_bit(F81604_CLEAR_ALC, &priv->clear_flags); } if (can_state != priv->can.state) { enum can_state tx_state, rx_state; tx_state = data->txerr >= data->rxerr ? can_state : 0; rx_state = data->txerr <= data->rxerr ? 
can_state : 0; can_change_state(netdev, cf, tx_state, rx_state); if (can_state == CAN_STATE_BUS_OFF) can_bus_off(netdev); } if (priv->clear_flags) schedule_work(&priv->clear_reg_work); if (skb) netif_rx(skb); } static void f81604_read_int_callback(struct urb *urb) { struct f81604_int_data *data = urb->transfer_buffer; struct net_device *netdev = urb->context; struct f81604_port_priv *priv; int ret; priv = netdev_priv(netdev); if (!netif_device_present(netdev)) return; if (urb->status) netdev_info(netdev, "%s: Int URB aborted: %pe\n", __func__, ERR_PTR(urb->status)); switch (urb->status) { case 0: /* success */ break; case -ENOENT: case -EPIPE: case -EPROTO: case -ESHUTDOWN: return; default: goto resubmit_urb; } /* handle Errors */ if (data->isrc & (F81604_SJA1000_IRQ_DOI | F81604_SJA1000_IRQ_EI | F81604_SJA1000_IRQ_BEI | F81604_SJA1000_IRQ_EPI | F81604_SJA1000_IRQ_ALI)) f81604_handle_can_bus_errors(priv, data); /* handle TX */ if (priv->can.state != CAN_STATE_BUS_OFF && (data->isrc & F81604_SJA1000_IRQ_TI)) f81604_handle_tx(priv, data); resubmit_urb: ret = usb_submit_urb(urb, GFP_ATOMIC); if (ret == -ENODEV) netif_device_detach(netdev); else if (ret) netdev_err(netdev, "%s: failed to resubmit int urb: %pe\n", __func__, ERR_PTR(ret)); } static void f81604_unregister_urbs(struct f81604_port_priv *priv) { usb_kill_anchored_urbs(&priv->urbs_anchor); } static int f81604_register_urbs(struct f81604_port_priv *priv) { struct net_device *netdev = priv->netdev; struct f81604_int_data *int_data; int id = netdev->dev_port; struct urb *int_urb; int rx_urb_cnt; int ret; for (rx_urb_cnt = 0; rx_urb_cnt < F81604_MAX_RX_URBS; ++rx_urb_cnt) { struct f81604_can_frame *frame; struct urb *rx_urb; rx_urb = usb_alloc_urb(0, GFP_KERNEL); if (!rx_urb) { ret = -ENOMEM; break; } frame = kmalloc(sizeof(*frame), GFP_KERNEL); if (!frame) { usb_free_urb(rx_urb); ret = -ENOMEM; break; } usb_fill_bulk_urb(rx_urb, priv->dev, usb_rcvbulkpipe(priv->dev, bulk_in_addr[id]), frame, sizeof(*frame), f81604_read_bulk_callback, netdev); rx_urb->transfer_flags |= URB_FREE_BUFFER; usb_anchor_urb(rx_urb, &priv->urbs_anchor); ret = usb_submit_urb(rx_urb, GFP_KERNEL); if (ret) { usb_unanchor_urb(rx_urb); usb_free_urb(rx_urb); break; } /* Drop reference, USB core will take care of freeing it */ usb_free_urb(rx_urb); } if (rx_urb_cnt == 0) { netdev_warn(netdev, "%s: submit rx urb failed: %pe\n", __func__, ERR_PTR(ret)); goto error; } int_urb = usb_alloc_urb(0, GFP_KERNEL); if (!int_urb) { ret = -ENOMEM; goto error; } int_data = kmalloc(sizeof(*int_data), GFP_KERNEL); if (!int_data) { usb_free_urb(int_urb); ret = -ENOMEM; goto error; } usb_fill_int_urb(int_urb, priv->dev, usb_rcvintpipe(priv->dev, int_in_addr[id]), int_data, sizeof(*int_data), f81604_read_int_callback, netdev, 1); int_urb->transfer_flags |= URB_FREE_BUFFER; usb_anchor_urb(int_urb, &priv->urbs_anchor); ret = usb_submit_urb(int_urb, GFP_KERNEL); if (ret) { usb_unanchor_urb(int_urb); usb_free_urb(int_urb); netdev_warn(netdev, "%s: submit int urb failed: %pe\n", __func__, ERR_PTR(ret)); goto error; } /* Drop reference, USB core will take care of freeing it */ usb_free_urb(int_urb); return 0; error: f81604_unregister_urbs(priv); return ret; } static int f81604_start(struct net_device *netdev) { struct f81604_port_priv *priv = netdev_priv(netdev); int ret; u8 mode; u8 tmp; mode = F81604_RX_AUTO_RELEASE_BUF | F81604_INT_WHEN_CHANGE; /* Set TR/AT mode */ if (priv->can.ctrlmode & CAN_CTRLMODE_ONE_SHOT) mode |= F81604_TX_ONESHOT; else mode |= F81604_TX_NORMAL; ret = 
f81604_sja1000_write(priv, F81604_CTRL_MODE_REG, mode); if (ret) return ret; /* set reset mode */ ret = f81604_set_reset_mode(priv); if (ret) return ret; ret = f81604_chipset_init(priv); if (ret) return ret; /* Clear error counters and error code capture */ ret = f81604_sja1000_write(priv, F81604_SJA1000_TXERR, 0); if (ret) return ret; ret = f81604_sja1000_write(priv, F81604_SJA1000_RXERR, 0); if (ret) return ret; /* Read clear for ECC/ALC/IR register */ ret = f81604_sja1000_read(priv, F81604_SJA1000_ECC, &tmp); if (ret) return ret; ret = f81604_sja1000_read(priv, F81604_SJA1000_ALC, &tmp); if (ret) return ret; ret = f81604_sja1000_read(priv, F81604_SJA1000_IR, &tmp); if (ret) return ret; ret = f81604_register_urbs(priv); if (ret) return ret; ret = f81604_set_normal_mode(priv); if (ret) { f81604_unregister_urbs(priv); return ret; } return 0; } static int f81604_set_bittiming(struct net_device *dev) { struct f81604_port_priv *priv = netdev_priv(dev); struct can_bittiming *bt = &priv->can.bittiming; u8 btr0, btr1; int ret; btr0 = FIELD_PREP(F81604_BRP_MASK, bt->brp - 1) | FIELD_PREP(F81604_SJW_MASK, bt->sjw - 1); btr1 = FIELD_PREP(F81604_SEG1_MASK, bt->prop_seg + bt->phase_seg1 - 1) | FIELD_PREP(F81604_SEG2_MASK, bt->phase_seg2 - 1); if (priv->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES) btr1 |= F81604_SJA1000_BTR1_SAMPLE_TRIPLE; ret = f81604_sja1000_write(priv, F81604_SJA1000_BTR0, btr0); if (ret) { netdev_warn(dev, "%s: Set BTR0 failed: %pe\n", __func__, ERR_PTR(ret)); return ret; } ret = f81604_sja1000_write(priv, F81604_SJA1000_BTR1, btr1); if (ret) { netdev_warn(dev, "%s: Set BTR1 failed: %pe\n", __func__, ERR_PTR(ret)); return ret; } return 0; } static int f81604_set_mode(struct net_device *netdev, enum can_mode mode) { int ret; switch (mode) { case CAN_MODE_START: ret = f81604_start(netdev); if (!ret && netif_queue_stopped(netdev)) netif_wake_queue(netdev); break; default: ret = -EOPNOTSUPP; } return ret; } static void f81604_write_bulk_callback(struct urb *urb) { struct net_device *netdev = urb->context; if (!netif_device_present(netdev)) return; if (urb->status) netdev_info(netdev, "%s: Tx URB error: %pe\n", __func__, ERR_PTR(urb->status)); } static void f81604_clear_reg_work(struct work_struct *work) { struct f81604_port_priv *priv; u8 tmp; priv = container_of(work, struct f81604_port_priv, clear_reg_work); /* dummy read for clear Arbitration lost capture(ALC) register. */ if (test_and_clear_bit(F81604_CLEAR_ALC, &priv->clear_flags)) f81604_sja1000_read(priv, F81604_SJA1000_ALC, &tmp); /* dummy read for clear Error code capture(ECC) register. */ if (test_and_clear_bit(F81604_CLEAR_ECC, &priv->clear_flags)) f81604_sja1000_read(priv, F81604_SJA1000_ECC, &tmp); /* dummy write for clear data overrun flag. 
*/ if (test_and_clear_bit(F81604_CLEAR_OVERRUN, &priv->clear_flags)) f81604_sja1000_write(priv, F81604_SJA1000_CMR, F81604_SJA1000_CMD_CDO); } static netdev_tx_t f81604_start_xmit(struct sk_buff *skb, struct net_device *netdev) { struct can_frame *cf = (struct can_frame *)skb->data; struct f81604_port_priv *priv = netdev_priv(netdev); struct net_device_stats *stats = &netdev->stats; struct f81604_can_frame *frame; struct urb *write_urb; int ret; if (can_dev_dropped_skb(netdev, skb)) return NETDEV_TX_OK; netif_stop_queue(netdev); write_urb = usb_alloc_urb(0, GFP_ATOMIC); if (!write_urb) goto nomem_urb; frame = kzalloc(sizeof(*frame), GFP_ATOMIC); if (!frame) goto nomem_buf; usb_fill_bulk_urb(write_urb, priv->dev, usb_sndbulkpipe(priv->dev, bulk_out_addr[netdev->dev_port]), frame, sizeof(*frame), f81604_write_bulk_callback, priv->netdev); write_urb->transfer_flags |= URB_FREE_BUFFER; frame->cmd = F81604_CMD_DATA; frame->dlc = cf->len; if (cf->can_id & CAN_RTR_FLAG) frame->dlc |= F81604_DLC_RTR_BIT; if (cf->can_id & CAN_EFF_FLAG) { u32 id = (cf->can_id & CAN_EFF_MASK) << F81604_EFF_SHIFT; put_unaligned_be32(id, &frame->eff.id); frame->dlc |= F81604_DLC_EFF_BIT; if (!(cf->can_id & CAN_RTR_FLAG)) memcpy(&frame->eff.data, cf->data, cf->len); } else { u32 id = (cf->can_id & CAN_SFF_MASK) << F81604_SFF_SHIFT; put_unaligned_be16(id, &frame->sff.id); if (!(cf->can_id & CAN_RTR_FLAG)) memcpy(&frame->sff.data, cf->data, cf->len); } can_put_echo_skb(skb, netdev, 0, 0); ret = usb_submit_urb(write_urb, GFP_ATOMIC); if (ret) { netdev_err(netdev, "%s: failed to resubmit tx bulk urb: %pe\n", __func__, ERR_PTR(ret)); can_free_echo_skb(netdev, 0, NULL); stats->tx_dropped++; stats->tx_errors++; if (ret == -ENODEV) netif_device_detach(netdev); else netif_wake_queue(netdev); } /* let usb core take care of this urb */ usb_free_urb(write_urb); return NETDEV_TX_OK; nomem_buf: usb_free_urb(write_urb); nomem_urb: dev_kfree_skb(skb); stats->tx_dropped++; stats->tx_errors++; netif_wake_queue(netdev); return NETDEV_TX_OK; } static int f81604_get_berr_counter(const struct net_device *netdev, struct can_berr_counter *bec) { struct f81604_port_priv *priv = netdev_priv(netdev); u8 txerr, rxerr; int ret; ret = f81604_sja1000_read(priv, F81604_SJA1000_TXERR, &txerr); if (ret) return ret; ret = f81604_sja1000_read(priv, F81604_SJA1000_RXERR, &rxerr); if (ret) return ret; bec->txerr = txerr; bec->rxerr = rxerr; return 0; } /* Open USB device */ static int f81604_open(struct net_device *netdev) { int ret; ret = open_candev(netdev); if (ret) return ret; ret = f81604_start(netdev); if (ret) { if (ret == -ENODEV) netif_device_detach(netdev); close_candev(netdev); return ret; } netif_start_queue(netdev); return 0; } /* Close USB device */ static int f81604_close(struct net_device *netdev) { struct f81604_port_priv *priv = netdev_priv(netdev); f81604_set_reset_mode(priv); netif_stop_queue(netdev); cancel_work_sync(&priv->clear_reg_work); close_candev(netdev); f81604_unregister_urbs(priv); return 0; } static const struct net_device_ops f81604_netdev_ops = { .ndo_open = f81604_open, .ndo_stop = f81604_close, .ndo_start_xmit = f81604_start_xmit, .ndo_change_mtu = can_change_mtu, }; static const struct can_bittiming_const f81604_bittiming_const = { .name = KBUILD_MODNAME, .tseg1_min = 1, .tseg1_max = 16, .tseg2_min = 1, .tseg2_max = 8, .sjw_max = 4, .brp_min = 1, .brp_max = 64, .brp_inc = 1, }; /* Called by the usb core when driver is unloaded or device is removed */ static void f81604_disconnect(struct usb_interface *intf) { struct 
f81604_priv *priv = usb_get_intfdata(intf); int i; for (i = 0; i < ARRAY_SIZE(priv->netdev); ++i) { if (!priv->netdev[i]) continue; unregister_netdev(priv->netdev[i]); free_candev(priv->netdev[i]); } } static int __f81604_set_termination(struct usb_device *dev, int idx, u16 term) { u8 mask, data = 0; if (idx == 0) mask = F81604_CAN0_TERM; else mask = F81604_CAN1_TERM; if (term) data = mask; return f81604_update_bits(dev, F81604_TERMINATOR_REG, mask, data); } static int f81604_set_termination(struct net_device *netdev, u16 term) { struct f81604_port_priv *port_priv = netdev_priv(netdev); ASSERT_RTNL(); return __f81604_set_termination(port_priv->dev, netdev->dev_port, term); } static int f81604_probe(struct usb_interface *intf, const struct usb_device_id *id) { struct usb_device *dev = interface_to_usbdev(intf); struct net_device *netdev; struct f81604_priv *priv; int i, ret; priv = devm_kzalloc(&intf->dev, sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM; usb_set_intfdata(intf, priv); for (i = 0; i < ARRAY_SIZE(priv->netdev); ++i) { ret = __f81604_set_termination(dev, i, 0); if (ret) { dev_err(&intf->dev, "Setting termination of CH#%d failed: %pe\n", i, ERR_PTR(ret)); return ret; } } for (i = 0; i < ARRAY_SIZE(priv->netdev); ++i) { struct f81604_port_priv *port_priv; netdev = alloc_candev(sizeof(*port_priv), 1); if (!netdev) { dev_err(&intf->dev, "Couldn't alloc candev: %d\n", i); ret = -ENOMEM; goto failure_cleanup; } port_priv = netdev_priv(netdev); INIT_WORK(&port_priv->clear_reg_work, f81604_clear_reg_work); init_usb_anchor(&port_priv->urbs_anchor); port_priv->intf = intf; port_priv->dev = dev; port_priv->netdev = netdev; port_priv->can.clock.freq = F81604_CAN_CLOCK; port_priv->can.termination_const = f81604_termination; port_priv->can.termination_const_cnt = ARRAY_SIZE(f81604_termination); port_priv->can.bittiming_const = &f81604_bittiming_const; port_priv->can.do_set_bittiming = f81604_set_bittiming; port_priv->can.do_set_mode = f81604_set_mode; port_priv->can.do_set_termination = f81604_set_termination; port_priv->can.do_get_berr_counter = f81604_get_berr_counter; port_priv->can.ctrlmode_supported = CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_3_SAMPLES | CAN_CTRLMODE_ONE_SHOT | CAN_CTRLMODE_BERR_REPORTING | CAN_CTRLMODE_PRESUME_ACK; netdev->ethtool_ops = &f81604_ethtool_ops; netdev->netdev_ops = &f81604_netdev_ops; netdev->flags |= IFF_ECHO; netdev->dev_port = i; SET_NETDEV_DEV(netdev, &intf->dev); ret = register_candev(netdev); if (ret) { netdev_err(netdev, "register CAN device failed: %pe\n", ERR_PTR(ret)); free_candev(netdev); goto failure_cleanup; } priv->netdev[i] = netdev; } return 0; failure_cleanup: f81604_disconnect(intf); return ret; } static struct usb_driver f81604_driver = { .name = KBUILD_MODNAME, .probe = f81604_probe, .disconnect = f81604_disconnect, .id_table = f81604_table, }; module_usb_driver(f81604_driver); MODULE_AUTHOR("Ji-Ze Hong (Peter Hong) <peter_hong@fintek.com.tw>"); MODULE_DESCRIPTION("Fintek F81604 USB to 2xCANBUS"); MODULE_LICENSE("GPL"); |
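f81604_start_xmit() places the CAN identifier in the bulk frame by shifting it into the top bits of a big-endian field (5 bits of shift for an 11-bit standard ID in 16 bits, 3 bits for a 29-bit extended ID in 32 bits) and folds the RTR/EFF flags into the DLC byte. The arithmetic can be checked in isolation with the short userspace sketch below; the constants mirror the defines above, and the manual byte-swap stands in for put_unaligned_be16().

#include <stdint.h>
#include <stdio.h>

#define F81604_DLC_RTR_BIT (1u << 6)
#define F81604_DLC_EFF_BIT (1u << 7)
#define F81604_SFF_SHIFT   5

/* Pack an 11-bit standard ID the way f81604_start_xmit() does:
 * shift it into the top of a 16-bit big-endian field. */
static void pack_sff(uint8_t out[2], uint32_t can_id)
{
	uint16_t id = (can_id & 0x7FF) << F81604_SFF_SHIFT;

	out[0] = id >> 8;		/* big-endian, as put_unaligned_be16() */
	out[1] = id & 0xFF;
}

int main(void)
{
	uint8_t id_bytes[2];
	uint8_t dlc = 8;		/* 8 data bytes */

	/* An RTR or extended frame would also OR the flag bits into dlc:
	 * dlc |= F81604_DLC_RTR_BIT; dlc |= F81604_DLC_EFF_BIT; */
	pack_sff(id_bytes, 0x123);

	/* 0x123 << 5 == 0x2460, so the wire bytes are 24 60 */
	printf("id bytes: %02x %02x, dlc byte: %02x\n",
	       id_bytes[0], id_bytes[1], dlc);
	return 0;
}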
| 39 48 48 8 5 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 | /* * Copyright IBM Corporation, 2012 * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> * * This program is free software; you can redistribute it and/or modify it * under the terms of version 2.1 of the GNU Lesser General Public License * as published by the Free Software Foundation. * * This program is distributed in the hope that it would be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. * */ #ifndef _LINUX_HUGETLB_CGROUP_H #define _LINUX_HUGETLB_CGROUP_H #include <linux/mmdebug.h> struct hugetlb_cgroup; struct resv_map; struct file_region; #ifdef CONFIG_CGROUP_HUGETLB enum hugetlb_memory_event { HUGETLB_MAX, HUGETLB_NR_MEMORY_EVENTS, }; struct hugetlb_cgroup_per_node { /* hugetlb usage in pages over all hstates. */ unsigned long usage[HUGE_MAX_HSTATE]; }; struct hugetlb_cgroup { struct cgroup_subsys_state css; /* * the counter to account for hugepages from hugetlb. */ struct page_counter hugepage[HUGE_MAX_HSTATE]; /* * the counter to account for hugepage reservations from hugetlb. 
*/ struct page_counter rsvd_hugepage[HUGE_MAX_HSTATE]; atomic_long_t events[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS]; atomic_long_t events_local[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS]; /* Handle for "hugetlb.events" */ struct cgroup_file events_file[HUGE_MAX_HSTATE]; /* Handle for "hugetlb.events.local" */ struct cgroup_file events_local_file[HUGE_MAX_HSTATE]; struct hugetlb_cgroup_per_node *nodeinfo[]; }; static inline struct hugetlb_cgroup * __hugetlb_cgroup_from_folio(struct folio *folio, bool rsvd) { VM_BUG_ON_FOLIO(!folio_test_hugetlb(folio), folio); if (rsvd) return folio->_hugetlb_cgroup_rsvd; else return folio->_hugetlb_cgroup; } static inline struct hugetlb_cgroup *hugetlb_cgroup_from_folio(struct folio *folio) { return __hugetlb_cgroup_from_folio(folio, false); } static inline struct hugetlb_cgroup * hugetlb_cgroup_from_folio_rsvd(struct folio *folio) { return __hugetlb_cgroup_from_folio(folio, true); } static inline void __set_hugetlb_cgroup(struct folio *folio, struct hugetlb_cgroup *h_cg, bool rsvd) { VM_BUG_ON_FOLIO(!folio_test_hugetlb(folio), folio); if (rsvd) folio->_hugetlb_cgroup_rsvd = h_cg; else folio->_hugetlb_cgroup = h_cg; } static inline void set_hugetlb_cgroup(struct folio *folio, struct hugetlb_cgroup *h_cg) { __set_hugetlb_cgroup(folio, h_cg, false); } static inline void set_hugetlb_cgroup_rsvd(struct folio *folio, struct hugetlb_cgroup *h_cg) { __set_hugetlb_cgroup(folio, h_cg, true); } static inline bool hugetlb_cgroup_disabled(void) { return !cgroup_subsys_enabled(hugetlb_cgrp_subsys); } static inline void hugetlb_cgroup_put_rsvd_cgroup(struct hugetlb_cgroup *h_cg) { css_put(&h_cg->css); } static inline void resv_map_dup_hugetlb_cgroup_uncharge_info( struct resv_map *resv_map) { if (resv_map->css) css_get(resv_map->css); } static inline void resv_map_put_hugetlb_cgroup_uncharge_info( struct resv_map *resv_map) { if (resv_map->css) css_put(resv_map->css); } extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup **ptr); extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages, struct hugetlb_cgroup **ptr); extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg, struct folio *folio); extern void hugetlb_cgroup_commit_charge_rsvd(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg, struct folio *folio); extern void hugetlb_cgroup_uncharge_folio(int idx, unsigned long nr_pages, struct folio *folio); extern void hugetlb_cgroup_uncharge_folio_rsvd(int idx, unsigned long nr_pages, struct folio *folio); extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg); extern void hugetlb_cgroup_uncharge_cgroup_rsvd(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg); extern void hugetlb_cgroup_uncharge_counter(struct resv_map *resv, unsigned long start, unsigned long end); extern void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv, struct file_region *rg, unsigned long nr_pages, bool region_del); extern void hugetlb_cgroup_file_init(void) __init; extern void hugetlb_cgroup_migrate(struct folio *old_folio, struct folio *new_folio); #else static inline void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv, struct file_region *rg, unsigned long nr_pages, bool region_del) { } static inline struct hugetlb_cgroup *hugetlb_cgroup_from_folio(struct folio *folio) { return NULL; } static inline struct hugetlb_cgroup * hugetlb_cgroup_from_folio_rsvd(struct folio *folio) { return NULL; } 
static inline void set_hugetlb_cgroup(struct folio *folio, struct hugetlb_cgroup *h_cg) { } static inline void set_hugetlb_cgroup_rsvd(struct folio *folio, struct hugetlb_cgroup *h_cg) { } static inline bool hugetlb_cgroup_disabled(void) { return true; } static inline void hugetlb_cgroup_put_rsvd_cgroup(struct hugetlb_cgroup *h_cg) { } static inline void resv_map_dup_hugetlb_cgroup_uncharge_info( struct resv_map *resv_map) { } static inline void resv_map_put_hugetlb_cgroup_uncharge_info( struct resv_map *resv_map) { } static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup **ptr) { return 0; } static inline int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages, struct hugetlb_cgroup **ptr) { return 0; } static inline void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg, struct folio *folio) { } static inline void hugetlb_cgroup_commit_charge_rsvd(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg, struct folio *folio) { } static inline void hugetlb_cgroup_uncharge_folio(int idx, unsigned long nr_pages, struct folio *folio) { } static inline void hugetlb_cgroup_uncharge_folio_rsvd(int idx, unsigned long nr_pages, struct folio *folio) { } static inline void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg) { } static inline void hugetlb_cgroup_uncharge_cgroup_rsvd(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg) { } static inline void hugetlb_cgroup_uncharge_counter(struct resv_map *resv, unsigned long start, unsigned long end) { } static inline void hugetlb_cgroup_file_init(void) { } static inline void hugetlb_cgroup_migrate(struct folio *old_folio, struct folio *new_folio) { } #endif /* CONFIG_MEM_RES_CTLR_HUGETLB */ #endif |
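The header above pairs the real declarations under CONFIG_CGROUP_HUGETLB with empty static-inline stubs so that callers never need their own #ifdef. A minimal, generic miniature of that pattern is sketched below; CONFIG_FOO and foo_account() are made-up names used purely for illustration.

#include <stdio.h>

#ifdef CONFIG_FOO
int foo_account(unsigned long nr_pages);	/* real implementation elsewhere */
#else
static inline int foo_account(unsigned long nr_pages)
{
	(void)nr_pages;
	return 0;		/* feature compiled out: charging always "succeeds" */
}
#endif

int main(void)
{
	/* Callers are written once and never need an #ifdef of their own. */
	printf("charge returned %d\n", foo_account(512));
	return 0;
}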
| 23 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 | /* SPDX-License-Identifier: GPL-2.0 */ /* Copyright (C) B.A.T.M.A.N. contributors: * * Simon Wunderlich */ #ifndef _NET_BATMAN_ADV_BLA_H_ #define _NET_BATMAN_ADV_BLA_H_ #include "main.h" #include <linux/compiler.h> #include <linux/netdevice.h> #include <linux/netlink.h> #include <linux/skbuff.h> #include <linux/stddef.h> #include <linux/types.h> /** * batadv_bla_is_loopdetect_mac() - check if the mac address is from a loop * detect frame sent by bridge loop avoidance * @mac: mac address to check * * Return: true if the it looks like a loop detect frame * (mac starts with BA:BE), false otherwise */ static inline bool batadv_bla_is_loopdetect_mac(const uint8_t *mac) { if (mac[0] == 0xba && mac[1] == 0xbe) return true; return false; } #ifdef CONFIG_BATMAN_ADV_BLA bool batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, unsigned short vid, int packet_type); bool batadv_bla_tx(struct batadv_priv *bat_priv, struct sk_buff *skb, unsigned short vid); bool batadv_bla_is_backbone_gw(struct sk_buff *skb, struct batadv_orig_node *orig_node, int hdr_size); int batadv_bla_claim_dump(struct sk_buff *msg, struct netlink_callback *cb); int batadv_bla_backbone_dump(struct sk_buff *msg, struct netlink_callback *cb); bool batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, u8 *orig, unsigned short vid); bool batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv, struct sk_buff *skb); void batadv_bla_update_orig_address(struct batadv_priv *bat_priv, struct batadv_hard_iface *primary_if, struct batadv_hard_iface *oldif); void batadv_bla_status_update(struct net_device *net_dev); int batadv_bla_init(struct batadv_priv *bat_priv); void batadv_bla_free(struct batadv_priv *bat_priv); #ifdef CONFIG_BATMAN_ADV_DAT bool batadv_bla_check_claim(struct batadv_priv *bat_priv, u8 *addr, unsigned short vid); #endif #define BATADV_BLA_CRC_INIT 0 #else /* ifdef CONFIG_BATMAN_ADV_BLA */ static inline bool batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb, unsigned short vid, int packet_type) { return false; } static inline bool batadv_bla_tx(struct batadv_priv *bat_priv, struct sk_buff *skb, unsigned short vid) { return false; } static inline bool batadv_bla_is_backbone_gw(struct sk_buff *skb, struct batadv_orig_node *orig_node, int hdr_size) { return false; } static inline bool batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, u8 *orig, unsigned short vid) { return false; } static inline bool batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv, struct sk_buff *skb) { return false; } static inline void batadv_bla_update_orig_address(struct batadv_priv *bat_priv, struct batadv_hard_iface *primary_if, struct batadv_hard_iface *oldif) { } static inline int batadv_bla_init(struct batadv_priv *bat_priv) { return 1; } static inline void batadv_bla_free(struct batadv_priv *bat_priv) { } static inline int batadv_bla_claim_dump(struct sk_buff *msg, struct netlink_callback *cb) { return -EOPNOTSUPP; } static inline int batadv_bla_backbone_dump(struct sk_buff *msg, struct netlink_callback *cb) { return -EOPNOTSUPP; } static inline bool 
batadv_bla_check_claim(struct batadv_priv *bat_priv, u8 *addr, unsigned short vid) { return true; } #endif /* ifdef CONFIG_BATMAN_ADV_BLA */ #endif /* ifndef _NET_BATMAN_ADV_BLA_H_ */ |
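As the kernel-doc above notes, batadv_bla_is_loopdetect_mac() only inspects the first two bytes of the source MAC (BA:BE). The same check, reduced to a trivial stand-alone program:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Same test as batadv_bla_is_loopdetect_mac(): loop-detect frames are
 * sent from a MAC address that starts with BA:BE. */
static bool is_loopdetect_mac(const uint8_t *mac)
{
	return mac[0] == 0xba && mac[1] == 0xbe;
}

int main(void)
{
	const uint8_t mac[6] = { 0xba, 0xbe, 0x12, 0x34, 0x56, 0x78 };

	printf("loop detect: %s\n", is_loopdetect_mac(mac) ? "yes" : "no");
	return 0;
}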
| 4 4 4 4 4 2 4 1 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Randomness driver for virtio * Copyright (C) 2007, 2008 Rusty Russell IBM Corporation */ #include <asm/barrier.h> #include <linux/err.h> #include <linux/hw_random.h> #include <linux/scatterlist.h> #include <linux/spinlock.h> #include <linux/virtio.h> #include <linux/virtio_rng.h> #include <linux/module.h> #include <linux/slab.h> static DEFINE_IDA(rng_index_ida); struct virtrng_info { struct hwrng hwrng; struct virtqueue *vq; char name[25]; int index; bool hwrng_register_done; bool hwrng_removed; /* data transfer */ struct completion have_data; unsigned int data_avail; unsigned int data_idx; /* minimal size returned by rng_buffer_size() */ #if SMP_CACHE_BYTES < 32 u8 data[32]; #else u8 data[SMP_CACHE_BYTES]; #endif }; static void random_recv_done(struct virtqueue *vq) { struct virtrng_info *vi = vq->vdev->priv; unsigned int len; /* We can get spurious callbacks, e.g. shared IRQs + virtio_pci. */ if (!virtqueue_get_buf(vi->vq, &len)) return; smp_store_release(&vi->data_avail, len); complete(&vi->have_data); } static void request_entropy(struct virtrng_info *vi) { struct scatterlist sg; reinit_completion(&vi->have_data); vi->data_idx = 0; sg_init_one(&sg, vi->data, sizeof(vi->data)); /* There should always be room for one buffer. 
*/ virtqueue_add_inbuf(vi->vq, &sg, 1, vi->data, GFP_KERNEL); virtqueue_kick(vi->vq); } static unsigned int copy_data(struct virtrng_info *vi, void *buf, unsigned int size) { size = min_t(unsigned int, size, vi->data_avail); memcpy(buf, vi->data + vi->data_idx, size); vi->data_idx += size; vi->data_avail -= size; if (vi->data_avail == 0) request_entropy(vi); return size; } static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait) { int ret; struct virtrng_info *vi = (struct virtrng_info *)rng->priv; unsigned int chunk; size_t read; if (vi->hwrng_removed) return -ENODEV; read = 0; /* copy available data */ if (smp_load_acquire(&vi->data_avail)) { chunk = copy_data(vi, buf, size); size -= chunk; read += chunk; } if (!wait) return read; /* We have already copied available entropy, * so either size is 0 or data_avail is 0 */ while (size != 0) { /* data_avail is 0 but a request is pending */ ret = wait_for_completion_killable(&vi->have_data); if (ret < 0) return ret; /* if vi->data_avail is 0, we have been interrupted * by a cleanup, but buffer stays in the queue */ if (vi->data_avail == 0) return read; chunk = copy_data(vi, buf + read, size); size -= chunk; read += chunk; } return read; } static void virtio_cleanup(struct hwrng *rng) { struct virtrng_info *vi = (struct virtrng_info *)rng->priv; complete(&vi->have_data); } static int probe_common(struct virtio_device *vdev) { int err, index; struct virtrng_info *vi = NULL; vi = kzalloc(sizeof(struct virtrng_info), GFP_KERNEL); if (!vi) return -ENOMEM; vi->index = index = ida_alloc(&rng_index_ida, GFP_KERNEL); if (index < 0) { err = index; goto err_ida; } sprintf(vi->name, "virtio_rng.%d", index); init_completion(&vi->have_data); vi->hwrng = (struct hwrng) { .read = virtio_read, .cleanup = virtio_cleanup, .priv = (unsigned long)vi, .name = vi->name, }; vdev->priv = vi; /* We expect a single virtqueue. */ vi->vq = virtio_find_single_vq(vdev, random_recv_done, "input"); if (IS_ERR(vi->vq)) { err = PTR_ERR(vi->vq); goto err_find; } virtio_device_ready(vdev); /* we always have a pending entropy request */ request_entropy(vi); return 0; err_find: ida_free(&rng_index_ida, index); err_ida: kfree(vi); return err; } static void remove_common(struct virtio_device *vdev) { struct virtrng_info *vi = vdev->priv; vi->hwrng_removed = true; vi->data_avail = 0; vi->data_idx = 0; complete(&vi->have_data); if (vi->hwrng_register_done) hwrng_unregister(&vi->hwrng); virtio_reset_device(vdev); vdev->config->del_vqs(vdev); ida_free(&rng_index_ida, vi->index); kfree(vi); } static int virtrng_probe(struct virtio_device *vdev) { return probe_common(vdev); } static void virtrng_remove(struct virtio_device *vdev) { remove_common(vdev); } static void virtrng_scan(struct virtio_device *vdev) { struct virtrng_info *vi = vdev->priv; int err; err = hwrng_register(&vi->hwrng); if (!err) vi->hwrng_register_done = true; } static int virtrng_freeze(struct virtio_device *vdev) { remove_common(vdev); return 0; } static int virtrng_restore(struct virtio_device *vdev) { int err; err = probe_common(vdev); if (!err) { struct virtrng_info *vi = vdev->priv; /* * Set hwrng_removed to ensure that virtio_read() * does not block waiting for data before the * registration is complete. 
*/ vi->hwrng_removed = true; err = hwrng_register(&vi->hwrng); if (!err) { vi->hwrng_register_done = true; vi->hwrng_removed = false; } } return err; } static const struct virtio_device_id id_table[] = { { VIRTIO_ID_RNG, VIRTIO_DEV_ANY_ID }, { 0 }, }; static struct virtio_driver virtio_rng_driver = { .driver.name = KBUILD_MODNAME, .id_table = id_table, .probe = virtrng_probe, .remove = virtrng_remove, .scan = virtrng_scan, .freeze = pm_sleep_ptr(virtrng_freeze), .restore = pm_sleep_ptr(virtrng_restore), }; module_virtio_driver(virtio_rng_driver); MODULE_DEVICE_TABLE(virtio, id_table); MODULE_DESCRIPTION("Virtio random number driver"); MODULE_LICENSE("GPL"); |
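copy_data() above drains the driver's internal entropy buffer: it hands out at most data_avail bytes, advances data_idx, and queues a fresh request once the buffer is empty. The userspace model below mirrors only that bookkeeping; refill() is a synchronous stand-in for request_entropy(), which in the real driver posts an asynchronous virtqueue buffer.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 32		/* mirrors the >= 32 byte data[] above */

struct rng_model {
	uint8_t data[BUF_SIZE];
	unsigned int data_avail;
	unsigned int data_idx;
};

/* Stand-in for request_entropy(): pretend fresh bytes arrived at once. */
static void refill(struct rng_model *vi)
{
	memset(vi->data, 0xab, sizeof(vi->data));
	vi->data_idx = 0;
	vi->data_avail = sizeof(vi->data);
}

/* Same bookkeeping as copy_data(): never hand out more than is
 * available, advance the read index, refill when the buffer is empty. */
static unsigned int copy_data(struct rng_model *vi, void *buf, unsigned int size)
{
	if (size > vi->data_avail)
		size = vi->data_avail;
	memcpy(buf, vi->data + vi->data_idx, size);
	vi->data_idx += size;
	vi->data_avail -= size;
	if (vi->data_avail == 0)
		refill(vi);
	return size;
}

int main(void)
{
	struct rng_model vi;
	uint8_t out[100];
	unsigned int read = 0;

	refill(&vi);
	while (read < sizeof(out))
		read += copy_data(&vi, out + read, sizeof(out) - read);
	printf("read %u bytes total\n", read);
	return 0;
}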
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 | /* * Copyright © 2017 Red Hat * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice (including the next * paragraph) shall be included in all copies or substantial portions of the * Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. * * Authors: * */ #ifndef __DRM_SYNCOBJ_H__ #define __DRM_SYNCOBJ_H__ #include <linux/dma-fence.h> #include <linux/dma-fence-chain.h> struct drm_file; /** * struct drm_syncobj - sync object. * * This structure defines a generic sync object which wraps a &dma_fence. */ struct drm_syncobj { /** * @refcount: Reference count of this object. */ struct kref refcount; /** * @fence: * NULL or a pointer to the fence bound to this object. * * This field should not be used directly. Use drm_syncobj_fence_get() * and drm_syncobj_replace_fence() instead. */ struct dma_fence __rcu *fence; /** * @cb_list: List of callbacks to call when the &fence gets replaced. */ struct list_head cb_list; /** * @ev_fd_list: List of registered eventfd. */ struct list_head ev_fd_list; /** * @lock: Protects &cb_list and &ev_fd_list, and write-locks &fence. */ spinlock_t lock; /** * @file: A file backing for this syncobj. */ struct file *file; }; void drm_syncobj_free(struct kref *kref); /** * drm_syncobj_get - acquire a syncobj reference * @obj: sync object * * This acquires an additional reference to @obj. It is illegal to call this * without already holding a reference. No locks required. */ static inline void drm_syncobj_get(struct drm_syncobj *obj) { kref_get(&obj->refcount); } /** * drm_syncobj_put - release a reference to a sync object. * @obj: sync object. */ static inline void drm_syncobj_put(struct drm_syncobj *obj) { kref_put(&obj->refcount, drm_syncobj_free); } /** * drm_syncobj_fence_get - get a reference to a fence in a sync object * @syncobj: sync object. * * This acquires additional reference to &drm_syncobj.fence contained in @obj, * if not NULL. It is illegal to call this without already holding a reference. * No locks required. * * Returns: * Either the fence of @obj or NULL if there's none. 
 */
static inline struct dma_fence *
drm_syncobj_fence_get(struct drm_syncobj *syncobj)
{
	struct dma_fence *fence;

	rcu_read_lock();
	fence = dma_fence_get_rcu_safe(&syncobj->fence);
	rcu_read_unlock();

	return fence;
}

struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
				     u32 handle);
void drm_syncobj_add_point(struct drm_syncobj *syncobj,
			   struct dma_fence_chain *chain,
			   struct dma_fence *fence,
			   uint64_t point);
void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
			       struct dma_fence *fence);
int drm_syncobj_find_fence(struct drm_file *file_private,
			   u32 handle, u64 point, u64 flags,
			   struct dma_fence **fence);
void drm_syncobj_free(struct kref *kref);
int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
		       struct dma_fence *fence);
int drm_syncobj_get_handle(struct drm_file *file_private,
			   struct drm_syncobj *syncobj, u32 *handle);
int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd);

#endif
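The kerneldoc above encodes the reference rules for these helpers: drm_syncobj_fence_get() hands back its own fence reference (possibly NULL), and a lookup via drm_syncobj_find() must be balanced by drm_syncobj_put(). The in-kernel sketch below shows that pattern; example_wait_syncobj() is a hypothetical driver helper, not part of this header:

/* Hedged sketch: look up a syncobj handle, grab whatever fence is currently
 * attached, wait on it, and drop both references in the documented order. */
static int example_wait_syncobj(struct drm_file *file_priv, u32 handle)
{
	struct drm_syncobj *syncobj;
	struct dma_fence *fence;
	int ret = 0;

	syncobj = drm_syncobj_find(file_priv, handle);	/* takes a syncobj ref */
	if (!syncobj)
		return -ENOENT;

	fence = drm_syncobj_fence_get(syncobj);		/* takes a fence ref, may be NULL */
	if (fence) {
		long t = dma_fence_wait_timeout(fence, true, HZ);

		if (t == 0)
			ret = -ETIMEDOUT;
		else if (t < 0)
			ret = t;
		dma_fence_put(fence);			/* balance drm_syncobj_fence_get() */
	}

	drm_syncobj_put(syncobj);			/* balance drm_syncobj_find() */
	return ret;
}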
| 2 2 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 | // SPDX-License-Identifier: GPL-2.0-only /* * IPV4 GSO/GRO offload support * Linux INET implementation * * Copyright (C) 2016 secunet Security Networks AG * Author: Steffen Klassert <steffen.klassert@secunet.com> * * ESP GRO support */ #include <linux/skbuff.h> #include <linux/init.h> #include <net/protocol.h> #include <crypto/aead.h> #include <crypto/authenc.h> #include <linux/err.h> #include <linux/module.h> #include <net/gro.h> #include <net/gso.h> #include <net/ip.h> #include <net/xfrm.h> #include <net/esp.h> #include <linux/scatterlist.h> #include <linux/kernel.h> #include <linux/slab.h> #include <linux/spinlock.h> #include <net/udp.h> static struct sk_buff *esp4_gro_receive(struct list_head *head, struct sk_buff *skb) { int offset = skb_gro_offset(skb); struct xfrm_offload *xo; struct xfrm_state *x; int encap_type = 0; __be32 seq; __be32 spi; if (!pskb_pull(skb, offset)) return NULL; if (xfrm_parse_spi(skb, IPPROTO_ESP, &spi, &seq) != 0) goto out; xo = xfrm_offload(skb); if (!xo || !(xo->flags & CRYPTO_DONE)) { struct sec_path *sp = secpath_set(skb); if (!sp) goto out; if (sp->len == XFRM_MAX_DEPTH) goto out_reset; x = xfrm_input_state_lookup(dev_net(skb->dev), skb->mark, (xfrm_address_t *)&ip_hdr(skb)->daddr, spi, IPPROTO_ESP, AF_INET); if (unlikely(x && x->dir && x->dir != XFRM_SA_DIR_IN)) { /* non-offload path will record the error and audit log */ xfrm_state_put(x); x = NULL; } if (!x) goto out_reset; skb->mark = xfrm_smark_get(skb->mark, x); sp->xvec[sp->len++] = x; sp->olen++; xo = xfrm_offload(skb); if (!xo) goto out_reset; } xo->flags |= XFRM_GRO; if (NAPI_GRO_CB(skb)->proto == IPPROTO_UDP) encap_type = UDP_ENCAP_ESPINUDP; XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = NULL; XFRM_SPI_SKB_CB(skb)->family = AF_INET; XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct iphdr, daddr); XFRM_SPI_SKB_CB(skb)->seq = seq; /* We don't need to handle errors from xfrm_input, it does all * the error handling and frees the resources on error. 
*/ xfrm_input(skb, IPPROTO_ESP, spi, encap_type); return ERR_PTR(-EINPROGRESS); out_reset: secpath_reset(skb); out: skb_push(skb, offset); NAPI_GRO_CB(skb)->same_flow = 0; NAPI_GRO_CB(skb)->flush = 1; return NULL; } static void esp4_gso_encap(struct xfrm_state *x, struct sk_buff *skb) { struct ip_esp_hdr *esph; struct iphdr *iph = ip_hdr(skb); struct xfrm_offload *xo = xfrm_offload(skb); int proto = iph->protocol; skb_push(skb, -skb_network_offset(skb)); esph = ip_esp_hdr(skb); *skb_mac_header(skb) = IPPROTO_ESP; esph->spi = x->id.spi; esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low); xo->proto = proto; } static struct sk_buff *xfrm4_tunnel_gso_segment(struct xfrm_state *x, struct sk_buff *skb, netdev_features_t features) { __be16 type = x->inner_mode.family == AF_INET6 ? htons(ETH_P_IPV6) : htons(ETH_P_IP); return skb_eth_gso_segment(skb, features, type); } static struct sk_buff *xfrm4_transport_gso_segment(struct xfrm_state *x, struct sk_buff *skb, netdev_features_t features) { const struct net_offload *ops; struct sk_buff *segs = ERR_PTR(-EINVAL); struct xfrm_offload *xo = xfrm_offload(skb); skb->transport_header += x->props.header_len; ops = rcu_dereference(inet_offloads[xo->proto]); if (likely(ops && ops->callbacks.gso_segment)) segs = ops->callbacks.gso_segment(skb, features); return segs; } static struct sk_buff *xfrm4_beet_gso_segment(struct xfrm_state *x, struct sk_buff *skb, netdev_features_t features) { struct xfrm_offload *xo = xfrm_offload(skb); struct sk_buff *segs = ERR_PTR(-EINVAL); const struct net_offload *ops; u8 proto = xo->proto; skb->transport_header += x->props.header_len; if (x->sel.family != AF_INET6) { if (proto == IPPROTO_BEETPH) { struct ip_beet_phdr *ph = (struct ip_beet_phdr *)skb->data; skb->transport_header += ph->hdrlen * 8; proto = ph->nexthdr; } else { skb->transport_header -= IPV4_BEET_PHMAXLEN; } } else { __be16 frag; skb->transport_header += ipv6_skip_exthdr(skb, 0, &proto, &frag); if (proto == IPPROTO_TCP) skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4; } if (proto == IPPROTO_IPV6) skb_shinfo(skb)->gso_type |= SKB_GSO_IPXIP4; __skb_pull(skb, skb_transport_offset(skb)); ops = rcu_dereference(inet_offloads[proto]); if (likely(ops && ops->callbacks.gso_segment)) segs = ops->callbacks.gso_segment(skb, features); return segs; } static struct sk_buff *xfrm4_outer_mode_gso_segment(struct xfrm_state *x, struct sk_buff *skb, netdev_features_t features) { switch (x->outer_mode.encap) { case XFRM_MODE_TUNNEL: return xfrm4_tunnel_gso_segment(x, skb, features); case XFRM_MODE_TRANSPORT: return xfrm4_transport_gso_segment(x, skb, features); case XFRM_MODE_BEET: return xfrm4_beet_gso_segment(x, skb, features); } return ERR_PTR(-EOPNOTSUPP); } static struct sk_buff *esp4_gso_segment(struct sk_buff *skb, netdev_features_t features) { struct xfrm_state *x; struct ip_esp_hdr *esph; struct crypto_aead *aead; netdev_features_t esp_features = features; struct xfrm_offload *xo = xfrm_offload(skb); struct sec_path *sp; if (!xo) return ERR_PTR(-EINVAL); if (!(skb_shinfo(skb)->gso_type & SKB_GSO_ESP)) return ERR_PTR(-EINVAL); sp = skb_sec_path(skb); x = sp->xvec[sp->len - 1]; aead = x->data; esph = ip_esp_hdr(skb); if (esph->spi != x->id.spi) return ERR_PTR(-EINVAL); if (!pskb_may_pull(skb, sizeof(*esph) + crypto_aead_ivsize(aead))) return ERR_PTR(-EINVAL); __skb_pull(skb, sizeof(*esph) + crypto_aead_ivsize(aead)); skb->encap_hdr_csum = 1; if ((!(skb->dev->gso_partial_features & NETIF_F_HW_ESP) && !(features & NETIF_F_HW_ESP)) || x->xso.dev != skb->dev) esp_features = 
features & ~(NETIF_F_SG | NETIF_F_CSUM_MASK | NETIF_F_SCTP_CRC); else if (!(features & NETIF_F_HW_ESP_TX_CSUM) && !(skb->dev->gso_partial_features & NETIF_F_HW_ESP_TX_CSUM)) esp_features = features & ~(NETIF_F_CSUM_MASK | NETIF_F_SCTP_CRC); xo->flags |= XFRM_GSO_SEGMENT; return xfrm4_outer_mode_gso_segment(x, skb, esp_features); } static int esp_input_tail(struct xfrm_state *x, struct sk_buff *skb) { struct crypto_aead *aead = x->data; struct xfrm_offload *xo = xfrm_offload(skb); if (!pskb_may_pull(skb, sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead))) return -EINVAL; if (!(xo->flags & CRYPTO_DONE)) skb->ip_summed = CHECKSUM_NONE; return esp_input_done2(skb, 0); } static int esp_xmit(struct xfrm_state *x, struct sk_buff *skb, netdev_features_t features) { int err; int alen; int blksize; struct xfrm_offload *xo; struct ip_esp_hdr *esph; struct crypto_aead *aead; struct esp_info esp; bool hw_offload = true; __u32 seq; int encap_type = 0; esp.inplace = true; xo = xfrm_offload(skb); if (!xo) return -EINVAL; if ((!(features & NETIF_F_HW_ESP) && !(skb->dev->gso_partial_features & NETIF_F_HW_ESP)) || x->xso.dev != skb->dev) { xo->flags |= CRYPTO_FALLBACK; hw_offload = false; } esp.proto = xo->proto; /* skb is pure payload to encrypt */ aead = x->data; alen = crypto_aead_authsize(aead); esp.tfclen = 0; /* XXX: Add support for tfc padding here. */ blksize = ALIGN(crypto_aead_blocksize(aead), 4); esp.clen = ALIGN(skb->len + 2 + esp.tfclen, blksize); esp.plen = esp.clen - skb->len - esp.tfclen; esp.tailen = esp.tfclen + esp.plen + alen; esp.esph = ip_esp_hdr(skb); if (x->encap) encap_type = x->encap->encap_type; if (!hw_offload || !skb_is_gso(skb) || (hw_offload && encap_type == UDP_ENCAP_ESPINUDP)) { esp.nfrags = esp_output_head(x, skb, &esp); if (esp.nfrags < 0) return esp.nfrags; } seq = xo->seq.low; esph = esp.esph; esph->spi = x->id.spi; skb_push(skb, -skb_network_offset(skb)); if (xo->flags & XFRM_GSO_SEGMENT) { esph->seq_no = htonl(seq); if (!skb_is_gso(skb)) xo->seq.low++; else xo->seq.low += skb_shinfo(skb)->gso_segs; } if (xo->seq.low < seq) xo->seq.hi++; esp.seqno = cpu_to_be64(seq + ((u64)xo->seq.hi << 32)); if (hw_offload && encap_type == UDP_ENCAP_ESPINUDP) { /* In the XFRM stack, the encapsulation protocol is set to iphdr->protocol by * setting *skb_mac_header(skb) (see esp_output_udp_encap()) where skb->mac_header * points to iphdr->protocol (see xfrm4_tunnel_encap_add()). * However, in esp_xmit(), skb->mac_header doesn't point to iphdr->protocol. * Therefore, the protocol field needs to be corrected. 
*/ ip_hdr(skb)->protocol = IPPROTO_UDP; esph->seq_no = htonl(seq); } ip_hdr(skb)->tot_len = htons(skb->len); ip_send_check(ip_hdr(skb)); if (hw_offload) { if (!skb_ext_add(skb, SKB_EXT_SEC_PATH)) return -ENOMEM; xo = xfrm_offload(skb); if (!xo) return -EINVAL; xo->flags |= XFRM_XMIT; return 0; } err = esp_output_tail(x, skb, &esp); if (err) return err; secpath_reset(skb); if (skb_needs_linearize(skb, skb->dev->features) && __skb_linearize(skb)) return -ENOMEM; return 0; } static const struct net_offload esp4_offload = { .callbacks = { .gro_receive = esp4_gro_receive, .gso_segment = esp4_gso_segment, }, }; static const struct xfrm_type_offload esp_type_offload = { .owner = THIS_MODULE, .proto = IPPROTO_ESP, .input_tail = esp_input_tail, .xmit = esp_xmit, .encap = esp4_gso_encap, }; static int __init esp4_offload_init(void) { if (xfrm_register_type_offload(&esp_type_offload, AF_INET) < 0) { pr_info("%s: can't add xfrm type offload\n", __func__); return -EAGAIN; } return inet_add_offload(&esp4_offload, IPPROTO_ESP); } static void __exit esp4_offload_exit(void) { xfrm_unregister_type_offload(&esp_type_offload, AF_INET); inet_del_offload(&esp4_offload, IPPROTO_ESP); } module_init(esp4_offload_init); module_exit(esp4_offload_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Steffen Klassert <steffen.klassert@secunet.com>"); MODULE_ALIAS_XFRM_OFFLOAD_TYPE(AF_INET, XFRM_PROTO_ESP); MODULE_DESCRIPTION("IPV4 GSO/GRO offload support"); |
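esp_xmit() above sizes the ESP trailer before encryption: the payload plus two trailer bytes (pad length, next header) is rounded up to the cipher block size (which is itself rounded up to a multiple of 4), and the ICV length is added on top. The standalone sketch below replays that arithmetic with made-up example parameters (16-byte block, 16-byte ICV) so the resulting clen/plen/tailen can be sanity-checked outside the kernel:

/* Standalone illustration of the trailer sizing done in esp_xmit():
 *   clen   = ALIGN(len + 2 + tfclen, blksize)   padded payload length
 *   plen   = clen - len - tfclen                pad bytes inserted
 *   tailen = tfclen + plen + alen               trailer incl. ICV
 * The ALIGN macro is re-derived here so the example builds in userspace. */
#include <stdio.h>

#define ALIGN(x, a) (((x) + (a) - 1) / (a) * (a))

int main(void)
{
	unsigned int len = 1400;	/* payload bytes to encrypt (example) */
	unsigned int tfclen = 0;	/* no TFC padding, as in esp_xmit() */
	unsigned int blksize = 16;	/* e.g. AES-CBC block size (example) */
	unsigned int alen = 16;		/* ICV/auth tag length (example) */

	unsigned int clen = ALIGN(len + 2 + tfclen, blksize);
	unsigned int plen = clen - len - tfclen;
	unsigned int tailen = tfclen + plen + alen;

	/* Prints: clen=1408 plen=8 tailen=24 */
	printf("clen=%u plen=%u tailen=%u\n", clen, plen, tailen);
	return 0;
}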
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright (C) 2008 IBM Corporation
 *
 * Author: Mimi Zohar <zohar@us.ibm.com>
 *
 * File: ima_api.c
 *	Implements must_appraise_or_measure, collect_measurement,
 *	appraise_measurement, store_measurement and store_template.
*/ #include <linux/slab.h> #include <linux/file.h> #include <linux/fs.h> #include <linux/xattr.h> #include <linux/evm.h> #include <linux/fsverity.h> #include "ima.h" /* * ima_free_template_entry - free an existing template entry */ void ima_free_template_entry(struct ima_template_entry *entry) { int i; for (i = 0; i < entry->template_desc->num_fields; i++) kfree(entry->template_data[i].data); kfree(entry->digests); kfree(entry); } /* * ima_alloc_init_template - create and initialize a new template entry */ int ima_alloc_init_template(struct ima_event_data *event_data, struct ima_template_entry **entry, struct ima_template_desc *desc) { struct ima_template_desc *template_desc; struct tpm_digest *digests; int i, result = 0; if (desc) template_desc = desc; else template_desc = ima_template_desc_current(); *entry = kzalloc(struct_size(*entry, template_data, template_desc->num_fields), GFP_NOFS); if (!*entry) return -ENOMEM; digests = kcalloc(NR_BANKS(ima_tpm_chip) + ima_extra_slots, sizeof(*digests), GFP_NOFS); if (!digests) { kfree(*entry); *entry = NULL; return -ENOMEM; } (*entry)->digests = digests; (*entry)->template_desc = template_desc; for (i = 0; i < template_desc->num_fields; i++) { const struct ima_template_field *field = template_desc->fields[i]; u32 len; result = field->field_init(event_data, &((*entry)->template_data[i])); if (result != 0) goto out; len = (*entry)->template_data[i].len; (*entry)->template_data_len += sizeof(len); (*entry)->template_data_len += len; } return 0; out: ima_free_template_entry(*entry); *entry = NULL; return result; } /* * ima_store_template - store ima template measurements * * Calculate the hash of a template entry, add the template entry * to an ordered list of measurement entries maintained inside the kernel, * and also update the aggregate integrity value (maintained inside the * configured TPM PCR) over the hashes of the current list of measurement * entries. * * Applications retrieve the current kernel-held measurement list through * the securityfs entries in /sys/kernel/security/ima. The signed aggregate * TPM PCR (called quote) can be retrieved using a TPM user space library * and is used to validate the measurement list. * * Returns 0 on success, error code otherwise */ int ima_store_template(struct ima_template_entry *entry, int violation, struct inode *inode, const unsigned char *filename, int pcr) { static const char op[] = "add_template_measure"; static const char audit_cause[] = "hashing_error"; char *template_name = entry->template_desc->name; int result; if (!violation) { result = ima_calc_field_array_hash(&entry->template_data[0], entry); if (result < 0) { integrity_audit_msg(AUDIT_INTEGRITY_PCR, inode, template_name, op, audit_cause, result, 0); return result; } } entry->pcr = pcr; result = ima_add_template_entry(entry, violation, op, inode, filename); return result; } /* * ima_add_violation - add violation to measurement list. * * Violations are flagged in the measurement list with zero hash values. * By extending the PCR with 0xFF's instead of with zeroes, the PCR * value is invalidated. 
*/ void ima_add_violation(struct file *file, const unsigned char *filename, struct ima_iint_cache *iint, const char *op, const char *cause) { struct ima_template_entry *entry; struct inode *inode = file_inode(file); struct ima_event_data event_data = { .iint = iint, .file = file, .filename = filename, .violation = cause }; int violation = 1; int result; /* can overflow, only indicator */ atomic_long_inc(&ima_htable.violations); result = ima_alloc_init_template(&event_data, &entry, NULL); if (result < 0) { result = -ENOMEM; goto err_out; } result = ima_store_template(entry, violation, inode, filename, CONFIG_IMA_MEASURE_PCR_IDX); if (result < 0) ima_free_template_entry(entry); err_out: integrity_audit_msg(AUDIT_INTEGRITY_PCR, inode, filename, op, cause, result, 0); } /** * ima_get_action - appraise & measure decision based on policy. * @idmap: idmap of the mount the inode was found from * @inode: pointer to the inode associated with the object being validated * @cred: pointer to credentials structure to validate * @prop: properties of the task being validated * @mask: contains the permission mask (MAY_READ, MAY_WRITE, MAY_EXEC, * MAY_APPEND) * @func: caller identifier * @pcr: pointer filled in if matched measure policy sets pcr= * @template_desc: pointer filled in if matched measure policy sets template= * @func_data: func specific data, may be NULL * @allowed_algos: allowlist of hash algorithms for the IMA xattr * * The policy is defined in terms of keypairs: * subj=, obj=, type=, func=, mask=, fsmagic= * subj,obj, and type: are LSM specific. * func: FILE_CHECK | BPRM_CHECK | CREDS_CHECK | MMAP_CHECK | MODULE_CHECK * | KEXEC_CMDLINE | KEY_CHECK | CRITICAL_DATA | SETXATTR_CHECK * | MMAP_CHECK_REQPROT * mask: contains the permission mask * fsmagic: hex value * * Returns IMA_MEASURE, IMA_APPRAISE mask. * */ int ima_get_action(struct mnt_idmap *idmap, struct inode *inode, const struct cred *cred, struct lsm_prop *prop, int mask, enum ima_hooks func, int *pcr, struct ima_template_desc **template_desc, const char *func_data, unsigned int *allowed_algos) { int flags = IMA_MEASURE | IMA_AUDIT | IMA_APPRAISE | IMA_HASH; flags &= ima_policy_flag; return ima_match_policy(idmap, inode, cred, prop, func, mask, flags, pcr, template_desc, func_data, allowed_algos); } static bool ima_get_verity_digest(struct ima_iint_cache *iint, struct inode *inode, struct ima_max_digest_data *hash) { enum hash_algo alg; int digest_len; /* * On failure, 'measure' policy rules will result in a file data * hash containing 0's. */ digest_len = fsverity_get_digest(inode, hash->digest, NULL, &alg); if (digest_len == 0) return false; /* * Unlike in the case of actually calculating the file hash, in * the fsverity case regardless of the hash algorithm, return * the verity digest to be included in the measurement list. A * mismatch between the verity algorithm and the xattr signature * algorithm, if one exists, will be detected later. */ hash->hdr.algo = alg; hash->hdr.length = digest_len; return true; } /* * ima_collect_measurement - collect file measurement * * Calculate the file hash, if it doesn't already exist, * storing the measurement and i_version in the iint. * * Must be called with iint->mutex held. 
* * Return 0 on success, error code otherwise */ int ima_collect_measurement(struct ima_iint_cache *iint, struct file *file, void *buf, loff_t size, enum hash_algo algo, struct modsig *modsig) { const char *audit_cause = "failed"; struct inode *inode = file_inode(file); struct inode *real_inode = d_real_inode(file_dentry(file)); struct ima_max_digest_data hash; struct ima_digest_data *hash_hdr = container_of(&hash.hdr, struct ima_digest_data, hdr); struct name_snapshot filename; struct kstat stat; int result = 0; int length; void *tmpbuf; u64 i_version = 0; /* * Always collect the modsig, because IMA might have already collected * the file digest without collecting the modsig in a previous * measurement rule. */ if (modsig) ima_collect_modsig(modsig, buf, size); if (iint->flags & IMA_COLLECTED) goto out; /* * Detecting file change is based on i_version. On filesystems * which do not support i_version, support was originally limited * to an initial measurement/appraisal/audit, but was modified to * assume the file changed. */ result = vfs_getattr_nosec(&file->f_path, &stat, STATX_CHANGE_COOKIE, AT_STATX_SYNC_AS_STAT); if (!result && (stat.result_mask & STATX_CHANGE_COOKIE)) i_version = stat.change_cookie; hash.hdr.algo = algo; hash.hdr.length = hash_digest_size[algo]; /* Initialize hash digest to 0's in case of failure */ memset(&hash.digest, 0, sizeof(hash.digest)); if (iint->flags & IMA_VERITY_REQUIRED) { if (!ima_get_verity_digest(iint, inode, &hash)) { audit_cause = "no-verity-digest"; result = -ENODATA; } } else if (buf) { result = ima_calc_buffer_hash(buf, size, hash_hdr); } else { result = ima_calc_file_hash(file, hash_hdr); } if (result && result != -EBADF && result != -EINVAL) goto out; length = sizeof(hash.hdr) + hash.hdr.length; tmpbuf = krealloc(iint->ima_hash, length, GFP_NOFS); if (!tmpbuf) { result = -ENOMEM; goto out; } iint->ima_hash = tmpbuf; memcpy(iint->ima_hash, &hash, length); if (real_inode == inode) iint->real_inode.version = i_version; else integrity_inode_attrs_store(&iint->real_inode, i_version, real_inode); /* Possibly temporary failure due to type of read (eg. O_DIRECT) */ if (!result) iint->flags |= IMA_COLLECTED; out: if (result) { if (file->f_flags & O_DIRECT) audit_cause = "failed(directio)"; take_dentry_name_snapshot(&filename, file->f_path.dentry); integrity_audit_msg(AUDIT_INTEGRITY_DATA, inode, filename.name.name, "collect_data", audit_cause, result, 0); release_dentry_name_snapshot(&filename); } return result; } /* * ima_store_measurement - store file measurement * * Create an "ima" template and then store the template by calling * ima_store_template. * * We only get here if the inode has not already been measured, * but the measurement could already exist: * - multiple copies of the same file on either the same or * different filesystems. * - the inode was previously flushed as well as the iint info, * containing the hashing info. * * Must be called with iint->mutex held. 
*/ void ima_store_measurement(struct ima_iint_cache *iint, struct file *file, const unsigned char *filename, struct evm_ima_xattr_data *xattr_value, int xattr_len, const struct modsig *modsig, int pcr, struct ima_template_desc *template_desc) { static const char op[] = "add_template_measure"; static const char audit_cause[] = "ENOMEM"; int result = -ENOMEM; struct inode *inode = file_inode(file); struct ima_template_entry *entry; struct ima_event_data event_data = { .iint = iint, .file = file, .filename = filename, .xattr_value = xattr_value, .xattr_len = xattr_len, .modsig = modsig }; int violation = 0; /* * We still need to store the measurement in the case of MODSIG because * we only have its contents to put in the list at the time of * appraisal, but a file measurement from earlier might already exist in * the measurement list. */ if (iint->measured_pcrs & (0x1 << pcr) && !modsig) return; result = ima_alloc_init_template(&event_data, &entry, template_desc); if (result < 0) { integrity_audit_msg(AUDIT_INTEGRITY_PCR, inode, filename, op, audit_cause, result, 0); return; } result = ima_store_template(entry, violation, inode, filename, pcr); if ((!result || result == -EEXIST) && !(file->f_flags & O_DIRECT)) { iint->flags |= IMA_MEASURED; iint->measured_pcrs |= (0x1 << pcr); } if (result < 0) ima_free_template_entry(entry); } void ima_audit_measurement(struct ima_iint_cache *iint, const unsigned char *filename) { struct audit_buffer *ab; char *hash; const char *algo_name = hash_algo_name[iint->ima_hash->algo]; int i; if (iint->flags & IMA_AUDITED) return; hash = kzalloc((iint->ima_hash->length * 2) + 1, GFP_KERNEL); if (!hash) return; for (i = 0; i < iint->ima_hash->length; i++) hex_byte_pack(hash + (i * 2), iint->ima_hash->digest[i]); hash[i * 2] = '\0'; ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_INTEGRITY_RULE); if (!ab) goto out; audit_log_format(ab, "file="); audit_log_untrustedstring(ab, filename); audit_log_format(ab, " hash=\"%s:%s\"", algo_name, hash); audit_log_task_info(ab); audit_log_end(ab); iint->flags |= IMA_AUDITED; out: kfree(hash); return; } /* * ima_d_path - return a pointer to the full pathname * * Attempt to return a pointer to the full pathname for use in the * IMA measurement list, IMA audit records, and auditing logs. * * On failure, return a pointer to a copy of the filename, not dname. * Returning a pointer to dname, could result in using the pointer * after the memory has been freed. */ const char *ima_d_path(const struct path *path, char **pathbuf, char *namebuf) { struct name_snapshot filename; char *pathname = NULL; *pathbuf = __getname(); if (*pathbuf) { pathname = d_absolute_path(path, *pathbuf, PATH_MAX); if (IS_ERR(pathname)) { __putname(*pathbuf); *pathbuf = NULL; pathname = NULL; } } if (!pathname) { take_dentry_name_snapshot(&filename, path->dentry); strscpy(namebuf, filename.name.name, NAME_MAX); release_dentry_name_snapshot(&filename); pathname = namebuf; } return pathname; } |
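The ima_store_template() comment earlier in this file notes that applications read the kernel-held measurement list through securityfs under /sys/kernel/security/ima. A minimal userspace sketch that dumps the human-readable list, assuming securityfs is mounted at its usual location and the caller is privileged enough to read the (normally root-only) file:

/* Minimal sketch: print the ASCII measurement list IMA exposes via
 * securityfs. Assumes securityfs is mounted at /sys/kernel/security and
 * that the caller may read the file. */
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/kernel/security/ima/ascii_runtime_measurements";
	char line[4096];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return 1;
	}

	/* One entry per line: PCR, template hash, template name, fields... */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);

	fclose(f);
	return 0;
}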
| 11 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_SCHED_TASK_STACK_H #define _LINUX_SCHED_TASK_STACK_H /* * task->stack (kernel stack) handling interfaces: */ #include <linux/sched.h> #include <linux/magic.h> #include <linux/refcount.h> #include <linux/kasan.h> #ifdef CONFIG_THREAD_INFO_IN_TASK /* * When accessing the stack of a non-current task that might exit, use * try_get_task_stack() instead. task_stack_page will return a pointer * that could get freed out from under you. */ static __always_inline void *task_stack_page(const struct task_struct *task) { return task->stack; } #define setup_thread_stack(new,old) do { } while(0) static __always_inline unsigned long *end_of_stack(const struct task_struct *task) { #ifdef CONFIG_STACK_GROWSUP return (unsigned long *)((unsigned long)task->stack + THREAD_SIZE) - 1; #else return task->stack; #endif } #else #define task_stack_page(task) ((void *)(task)->stack) static inline void setup_thread_stack(struct task_struct *p, struct task_struct *org) { *task_thread_info(p) = *task_thread_info(org); task_thread_info(p)->task = p; } /* * Return the address of the last usable long on the stack. * * When the stack grows down, this is just above the thread * info struct. Going any lower will corrupt the threadinfo. * * When the stack grows up, this is the highest address. * Beyond that position, we corrupt data on the next page. */ static inline unsigned long *end_of_stack(struct task_struct *p) { #ifdef CONFIG_STACK_GROWSUP return (unsigned long *)((unsigned long)task_thread_info(p) + THREAD_SIZE) - 1; #else return (unsigned long *)(task_thread_info(p) + 1); #endif } #endif #ifdef CONFIG_THREAD_INFO_IN_TASK static inline void *try_get_task_stack(struct task_struct *tsk) { return refcount_inc_not_zero(&tsk->stack_refcount) ? task_stack_page(tsk) : NULL; } extern void put_task_stack(struct task_struct *tsk); #else static inline void *try_get_task_stack(struct task_struct *tsk) { return task_stack_page(tsk); } static inline void put_task_stack(struct task_struct *tsk) {} #endif void exit_task_stack_account(struct task_struct *tsk); #define task_stack_end_corrupted(task) \ (*(end_of_stack(task)) != STACK_END_MAGIC) static inline int object_is_on_stack(const void *obj) { void *stack = task_stack_page(current); obj = kasan_reset_tag(obj); return (obj >= stack) && (obj < (stack + THREAD_SIZE)); } extern void thread_stack_cache_init(void); #ifdef CONFIG_DEBUG_STACK_USAGE unsigned long stack_not_used(struct task_struct *p); #else static inline unsigned long stack_not_used(struct task_struct *p) { return 0; } #endif extern void set_task_stack_end_magic(struct task_struct *tsk); static inline int kstack_end(void *addr) { /* Reliable end of stack detection: * Some APM bios versions misalign the stack */ return !(((unsigned long)addr+sizeof(void*)-1) & (THREAD_SIZE-sizeof(void*))); } #endif /* _LINUX_SCHED_TASK_STACK_H */ |
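The comment in the CONFIG_THREAD_INFO_IN_TASK branch above is the key usage rule: when inspecting the stack of a task that may exit concurrently, pin it with try_get_task_stack() and drop the pin with put_task_stack(). A hedged in-kernel sketch of that pattern; example_inspect_stack() is a hypothetical helper, not part of this header:

/* Hedged sketch: safely peek at another task's kernel stack. The "inspect"
 * step is only a pr_info(); the point is the pin/unpin pattern required when
 * the target task might exit underneath us. */
static void example_inspect_stack(struct task_struct *tsk)
{
	void *stack = try_get_task_stack(tsk);	/* NULL if the stack is already gone */

	if (!stack)
		return;

	pr_info("%s: stack of pid %d spans [%px, %px)\n",
		__func__, task_pid_nr(tsk), stack, stack + THREAD_SIZE);

	put_task_stack(tsk);			/* drop the reference taken above */
}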
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 *	IPv6 output functions
 *	Linux INET6 implementation
 *
 *	Authors:
 *	Pedro Roque		<roque@di.fc.ul.pt>
 *
 *	Based on linux/net/ipv4/ip_output.c
 *
 *	Changes:
 *	A.N.Kuznetsov	:	arithmetics in fragmentation.
 *				extension headers are implemented.
 *				route changes now work.
 *				ip6_forward does not confuse sniffers.
 *				etc.
 *
 *	H.
von Brand : Added missing #include <linux/string.h> * Imran Patel : frag id should be in NBO * Kazunori MIYAZAWA @USAGI * : add ip6_append_data and related functions * for datagram xmit */ #include <linux/errno.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/socket.h> #include <linux/net.h> #include <linux/netdevice.h> #include <linux/if_arp.h> #include <linux/in6.h> #include <linux/tcp.h> #include <linux/route.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/bpf-cgroup.h> #include <linux/netfilter.h> #include <linux/netfilter_ipv6.h> #include <net/sock.h> #include <net/snmp.h> #include <net/gso.h> #include <net/ipv6.h> #include <net/ndisc.h> #include <net/protocol.h> #include <net/ip6_route.h> #include <net/addrconf.h> #include <net/rawv6.h> #include <net/icmp.h> #include <net/xfrm.h> #include <net/checksum.h> #include <linux/mroute6.h> #include <net/l3mdev.h> #include <net/lwtunnel.h> #include <net/ip_tunnels.h> static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *skb) { struct dst_entry *dst = skb_dst(skb); struct net_device *dev = dst->dev; struct inet6_dev *idev = ip6_dst_idev(dst); unsigned int hh_len = LL_RESERVED_SPACE(dev); const struct in6_addr *daddr, *nexthop; struct ipv6hdr *hdr; struct neighbour *neigh; int ret; /* Be paranoid, rather than too clever. */ if (unlikely(hh_len > skb_headroom(skb)) && dev->header_ops) { /* Make sure idev stays alive */ rcu_read_lock(); skb = skb_expand_head(skb, hh_len); if (!skb) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); rcu_read_unlock(); return -ENOMEM; } rcu_read_unlock(); } hdr = ipv6_hdr(skb); daddr = &hdr->daddr; if (ipv6_addr_is_multicast(daddr)) { if (!(dev->flags & IFF_LOOPBACK) && sk_mc_loop(sk) && ((mroute6_is_socket(net, skb) && !(IP6CB(skb)->flags & IP6SKB_FORWARDED)) || ipv6_chk_mcast_addr(dev, daddr, &hdr->saddr))) { struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC); /* Do not check for IFF_ALLMULTI; multicast routing is not supported in any case. */ if (newskb) NF_HOOK(NFPROTO_IPV6, NF_INET_POST_ROUTING, net, sk, newskb, NULL, newskb->dev, dev_loopback_xmit); if (hdr->hop_limit == 0) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); kfree_skb(skb); return 0; } } IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_OUTMCAST, skb->len); if (IPV6_ADDR_MC_SCOPE(daddr) <= IPV6_ADDR_SCOPE_NODELOCAL && !(dev->flags & IFF_LOOPBACK)) { kfree_skb(skb); return 0; } } if (lwtunnel_xmit_redirect(dst->lwtstate)) { int res = lwtunnel_xmit(skb); if (res != LWTUNNEL_XMIT_CONTINUE) return res; } IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_OUT, skb->len); rcu_read_lock(); nexthop = rt6_nexthop(dst_rt6_info(dst), daddr); neigh = __ipv6_neigh_lookup_noref(dev, nexthop); if (IS_ERR_OR_NULL(neigh)) { if (unlikely(!neigh)) neigh = __neigh_create(&nd_tbl, nexthop, dev, false); if (IS_ERR(neigh)) { rcu_read_unlock(); IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTNOROUTES); kfree_skb_reason(skb, SKB_DROP_REASON_NEIGH_CREATEFAIL); return -EINVAL; } } sock_confirm_neigh(skb, neigh); ret = neigh_output(neigh, skb, false); rcu_read_unlock(); return ret; } static int ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk, struct sk_buff *skb, unsigned int mtu) { struct sk_buff *segs, *nskb; netdev_features_t features; int ret = 0; /* Please see corresponding comment in ip_finish_output_gso * describing the cases where GSO segment length exceeds the * egress MTU. 
*/ features = netif_skb_features(skb); segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); if (IS_ERR_OR_NULL(segs)) { kfree_skb(skb); return -ENOMEM; } consume_skb(skb); skb_list_walk_safe(segs, segs, nskb) { int err; skb_mark_not_on_list(segs); /* Last GSO segment can be smaller than gso_size (and MTU). * Adding a fragment header would produce an "atomic fragment", * which is considered harmful (RFC-8021). Avoid that. */ err = segs->len > mtu ? ip6_fragment(net, sk, segs, ip6_finish_output2) : ip6_finish_output2(net, sk, segs); if (err && ret == 0) ret = err; } return ret; } static int ip6_finish_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb, unsigned int mtu) { if (!(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) && !skb_gso_validate_network_len(skb, mtu)) return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu); return ip6_finish_output2(net, sk, skb); } static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb) { unsigned int mtu; #if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM) /* Policy lookup after SNAT yielded a new policy */ if (skb_dst(skb)->xfrm) { IP6CB(skb)->flags |= IP6SKB_REROUTED; return dst_output(net, sk, skb); } #endif mtu = ip6_skb_dst_mtu(skb); if (skb_is_gso(skb)) return ip6_finish_output_gso(net, sk, skb, mtu); if (skb->len > mtu || (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size)) return ip6_fragment(net, sk, skb, ip6_finish_output2); return ip6_finish_output2(net, sk, skb); } static int ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb) { int ret; ret = BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb); switch (ret) { case NET_XMIT_SUCCESS: case NET_XMIT_CN: return __ip6_finish_output(net, sk, skb) ? : ret; default: kfree_skb_reason(skb, SKB_DROP_REASON_BPF_CGROUP_EGRESS); return ret; } } int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb) { struct net_device *dev = skb_dst(skb)->dev, *indev = skb->dev; struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb)); skb->protocol = htons(ETH_P_IPV6); skb->dev = dev; if (unlikely(!idev || READ_ONCE(idev->cnf.disable_ipv6))) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); kfree_skb_reason(skb, SKB_DROP_REASON_IPV6DISABLED); return 0; } return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, net, sk, skb, indev, dev, ip6_finish_output, !(IP6CB(skb)->flags & IP6SKB_REROUTED)); } EXPORT_SYMBOL(ip6_output); bool ip6_autoflowlabel(struct net *net, const struct sock *sk) { if (!inet6_test_bit(AUTOFLOWLABEL_SET, sk)) return ip6_default_np_autolabel(net); return inet6_test_bit(AUTOFLOWLABEL, sk); } /* * xmit an sk_buff (used by TCP and SCTP) * Note : socket lock is not held for SYNACK packets, but might be modified * by calls to skb_set_owner_w() and ipv6_local_error(), * which are using proper atomic operations or spinlocks. 
*/ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6, __u32 mark, struct ipv6_txoptions *opt, int tclass, u32 priority) { struct net *net = sock_net(sk); const struct ipv6_pinfo *np = inet6_sk(sk); struct in6_addr *first_hop = &fl6->daddr; struct dst_entry *dst = skb_dst(skb); struct net_device *dev = dst->dev; struct inet6_dev *idev = ip6_dst_idev(dst); struct hop_jumbo_hdr *hop_jumbo; int hoplen = sizeof(*hop_jumbo); unsigned int head_room; struct ipv6hdr *hdr; u8 proto = fl6->flowi6_proto; int seg_len = skb->len; int hlimit = -1; u32 mtu; head_room = sizeof(struct ipv6hdr) + hoplen + LL_RESERVED_SPACE(dev); if (opt) head_room += opt->opt_nflen + opt->opt_flen; if (unlikely(head_room > skb_headroom(skb))) { /* Make sure idev stays alive */ rcu_read_lock(); skb = skb_expand_head(skb, head_room); if (!skb) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); rcu_read_unlock(); return -ENOBUFS; } rcu_read_unlock(); } if (opt) { seg_len += opt->opt_nflen + opt->opt_flen; if (opt->opt_flen) ipv6_push_frag_opts(skb, opt, &proto); if (opt->opt_nflen) ipv6_push_nfrag_opts(skb, opt, &proto, &first_hop, &fl6->saddr); } if (unlikely(seg_len > IPV6_MAXPLEN)) { hop_jumbo = skb_push(skb, hoplen); hop_jumbo->nexthdr = proto; hop_jumbo->hdrlen = 0; hop_jumbo->tlv_type = IPV6_TLV_JUMBO; hop_jumbo->tlv_len = 4; hop_jumbo->jumbo_payload_len = htonl(seg_len + hoplen); proto = IPPROTO_HOPOPTS; seg_len = 0; IP6CB(skb)->flags |= IP6SKB_FAKEJUMBO; } skb_push(skb, sizeof(struct ipv6hdr)); skb_reset_network_header(skb); hdr = ipv6_hdr(skb); /* * Fill in the IPv6 header */ if (np) hlimit = READ_ONCE(np->hop_limit); if (hlimit < 0) hlimit = ip6_dst_hoplimit(dst); ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel, ip6_autoflowlabel(net, sk), fl6)); hdr->payload_len = htons(seg_len); hdr->nexthdr = proto; hdr->hop_limit = hlimit; hdr->saddr = fl6->saddr; hdr->daddr = *first_hop; skb->protocol = htons(ETH_P_IPV6); skb->priority = priority; skb->mark = mark; mtu = dst_mtu(dst); if ((skb->len <= mtu) || skb->ignore_df || skb_is_gso(skb)) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTREQUESTS); /* if egress device is enslaved to an L3 master device pass the * skb to its handler for processing */ skb = l3mdev_ip6_out((struct sock *)sk, skb); if (unlikely(!skb)) return 0; /* hooks should never assume socket lock is held. 
* we promote our socket to non const */ return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, (struct sock *)sk, skb, NULL, dev, dst_output); } skb->dev = dev; /* ipv6_local_error() does not require socket lock, * we promote our socket to non const */ ipv6_local_error((struct sock *)sk, EMSGSIZE, fl6, mtu); IP6_INC_STATS(net, idev, IPSTATS_MIB_FRAGFAILS); kfree_skb(skb); return -EMSGSIZE; } EXPORT_SYMBOL(ip6_xmit); static int ip6_call_ra_chain(struct sk_buff *skb, int sel) { struct ip6_ra_chain *ra; struct sock *last = NULL; read_lock(&ip6_ra_lock); for (ra = ip6_ra_chain; ra; ra = ra->next) { struct sock *sk = ra->sk; if (sk && ra->sel == sel && (!sk->sk_bound_dev_if || sk->sk_bound_dev_if == skb->dev->ifindex)) { if (inet6_test_bit(RTALERT_ISOLATE, sk) && !net_eq(sock_net(sk), dev_net(skb->dev))) { continue; } if (last) { struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC); if (skb2) rawv6_rcv(last, skb2); } last = sk; } } if (last) { rawv6_rcv(last, skb); read_unlock(&ip6_ra_lock); return 1; } read_unlock(&ip6_ra_lock); return 0; } static int ip6_forward_proxy_check(struct sk_buff *skb) { struct ipv6hdr *hdr = ipv6_hdr(skb); u8 nexthdr = hdr->nexthdr; __be16 frag_off; int offset; if (ipv6_ext_hdr(nexthdr)) { offset = ipv6_skip_exthdr(skb, sizeof(*hdr), &nexthdr, &frag_off); if (offset < 0) return 0; } else offset = sizeof(struct ipv6hdr); if (nexthdr == IPPROTO_ICMPV6) { struct icmp6hdr *icmp6; if (!pskb_may_pull(skb, (skb_network_header(skb) + offset + 1 - skb->data))) return 0; icmp6 = (struct icmp6hdr *)(skb_network_header(skb) + offset); switch (icmp6->icmp6_type) { case NDISC_ROUTER_SOLICITATION: case NDISC_ROUTER_ADVERTISEMENT: case NDISC_NEIGHBOUR_SOLICITATION: case NDISC_NEIGHBOUR_ADVERTISEMENT: case NDISC_REDIRECT: /* For reaction involving unicast neighbor discovery * message destined to the proxied address, pass it to * input function. */ return 1; default: break; } } /* * The proxying router can't forward traffic sent to a link-local * address, so signal the sender and discard the packet. This * behavior is clarified by the MIPv6 specification. 
*/ if (ipv6_addr_type(&hdr->daddr) & IPV6_ADDR_LINKLOCAL) { dst_link_failure(skb); return -1; } return 0; } static inline int ip6_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { #ifdef CONFIG_NET_SWITCHDEV if (skb->offload_l3_fwd_mark) { consume_skb(skb); return 0; } #endif skb_clear_tstamp(skb); return dst_output(net, sk, skb); } static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu) { if (skb->len <= mtu) return false; /* ipv6 conntrack defrag sets max_frag_size + ignore_df */ if (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu) return true; if (skb->ignore_df) return false; if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu)) return false; return true; } int ip6_forward(struct sk_buff *skb) { struct dst_entry *dst = skb_dst(skb); struct ipv6hdr *hdr = ipv6_hdr(skb); struct inet6_skb_parm *opt = IP6CB(skb); struct net *net = dev_net(dst->dev); struct inet6_dev *idev; SKB_DR(reason); u32 mtu; idev = __in6_dev_get_safely(dev_get_by_index_rcu(net, IP6CB(skb)->iif)); if (READ_ONCE(net->ipv6.devconf_all->forwarding) == 0) goto error; if (skb->pkt_type != PACKET_HOST) goto drop; if (unlikely(skb->sk)) goto drop; if (skb_warn_if_lro(skb)) goto drop; if (!READ_ONCE(net->ipv6.devconf_all->disable_policy) && (!idev || !READ_ONCE(idev->cnf.disable_policy)) && !xfrm6_policy_check(NULL, XFRM_POLICY_FWD, skb)) { __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS); goto drop; } skb_forward_csum(skb); /* * We DO NOT make any processing on * RA packets, pushing them to user level AS IS * without ane WARRANTY that application will be able * to interpret them. The reason is that we * cannot make anything clever here. * * We are not end-node, so that if packet contains * AH/ESP, we cannot make anything. * Defragmentation also would be mistake, RA packets * cannot be fragmented, because there is no warranty * that different fragments will go along one path. --ANK */ if (unlikely(opt->flags & IP6SKB_ROUTERALERT)) { if (ip6_call_ra_chain(skb, ntohs(opt->ra))) return 0; } /* * check and decrement ttl */ if (hdr->hop_limit <= 1) { icmpv6_send(skb, ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT, 0); __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS); kfree_skb_reason(skb, SKB_DROP_REASON_IP_INHDR); return -ETIMEDOUT; } /* XXX: idev->cnf.proxy_ndp? */ if (READ_ONCE(net->ipv6.devconf_all->proxy_ndp) && pneigh_lookup(&nd_tbl, net, &hdr->daddr, skb->dev, 0)) { int proxied = ip6_forward_proxy_check(skb); if (proxied > 0) { /* It's tempting to decrease the hop limit * here by 1, as we do at the end of the * function too. * * But that would be incorrect, as proxying is * not forwarding. The ip6_input function * will handle this packet locally, and it * depends on the hop limit being unchanged. * * One example is the NDP hop limit, that * always has to stay 255, but other would be * similar checks around RA packets, where the * user can even change the desired limit. */ return ip6_input(skb); } else if (proxied < 0) { __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS); goto drop; } } if (!xfrm6_route_forward(skb)) { __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS); SKB_DR_SET(reason, XFRM_POLICY); goto drop; } dst = skb_dst(skb); /* IPv6 specs say nothing about it, but it is clear that we cannot send redirects to source routed frames. We don't send redirects to frames decapsulated from IPsec. 
*/ if (IP6CB(skb)->iif == dst->dev->ifindex && opt->srcrt == 0 && !skb_sec_path(skb)) { struct in6_addr *target = NULL; struct inet_peer *peer; struct rt6_info *rt; /* * incoming and outgoing devices are the same * send a redirect. */ rt = dst_rt6_info(dst); if (rt->rt6i_flags & RTF_GATEWAY) target = &rt->rt6i_gateway; else target = &hdr->daddr; rcu_read_lock(); peer = inet_getpeer_v6(net->ipv6.peers, &hdr->daddr); /* Limit redirects both by destination (here) and by source (inside ndisc_send_redirect) */ if (inet_peer_xrlim_allow(peer, 1*HZ)) ndisc_send_redirect(skb, target); rcu_read_unlock(); } else { int addrtype = ipv6_addr_type(&hdr->saddr); /* This check is security critical. */ if (addrtype == IPV6_ADDR_ANY || addrtype & (IPV6_ADDR_MULTICAST | IPV6_ADDR_LOOPBACK)) goto error; if (addrtype & IPV6_ADDR_LINKLOCAL) { icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_NOT_NEIGHBOUR, 0); goto error; } } __IP6_INC_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_OUTFORWDATAGRAMS); mtu = ip6_dst_mtu_maybe_forward(dst, true); if (mtu < IPV6_MIN_MTU) mtu = IPV6_MIN_MTU; if (ip6_pkt_too_big(skb, mtu)) { /* Again, force OUTPUT device used as source address */ skb->dev = dst->dev; icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu); __IP6_INC_STATS(net, idev, IPSTATS_MIB_INTOOBIGERRORS); __IP6_INC_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_FRAGFAILS); kfree_skb_reason(skb, SKB_DROP_REASON_PKT_TOO_BIG); return -EMSGSIZE; } if (skb_cow(skb, dst->dev->hard_header_len)) { __IP6_INC_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_OUTDISCARDS); goto drop; } hdr = ipv6_hdr(skb); /* Mangling hops number delayed to point after skb COW */ hdr->hop_limit--; return NF_HOOK(NFPROTO_IPV6, NF_INET_FORWARD, net, NULL, skb, skb->dev, dst->dev, ip6_forward_finish); error: __IP6_INC_STATS(net, idev, IPSTATS_MIB_INADDRERRORS); SKB_DR_SET(reason, IP_INADDRERRORS); drop: kfree_skb_reason(skb, reason); return -EINVAL; } static void ip6_copy_metadata(struct sk_buff *to, struct sk_buff *from) { to->pkt_type = from->pkt_type; to->priority = from->priority; to->protocol = from->protocol; skb_dst_drop(to); skb_dst_set(to, dst_clone(skb_dst(from))); to->dev = from->dev; to->mark = from->mark; skb_copy_hash(to, from); #ifdef CONFIG_NET_SCHED to->tc_index = from->tc_index; #endif nf_copy(to, from); skb_ext_copy(to, from); skb_copy_secmark(to, from); } int ip6_fraglist_init(struct sk_buff *skb, unsigned int hlen, u8 *prevhdr, u8 nexthdr, __be32 frag_id, struct ip6_fraglist_iter *iter) { unsigned int first_len; struct frag_hdr *fh; /* BUILD HEADER */ *prevhdr = NEXTHDR_FRAGMENT; iter->tmp_hdr = kmemdup(skb_network_header(skb), hlen, GFP_ATOMIC); if (!iter->tmp_hdr) return -ENOMEM; iter->frag = skb_shinfo(skb)->frag_list; skb_frag_list_init(skb); iter->offset = 0; iter->hlen = hlen; iter->frag_id = frag_id; iter->nexthdr = nexthdr; __skb_pull(skb, hlen); fh = __skb_push(skb, sizeof(struct frag_hdr)); __skb_push(skb, hlen); skb_reset_network_header(skb); memcpy(skb_network_header(skb), iter->tmp_hdr, hlen); fh->nexthdr = nexthdr; fh->reserved = 0; fh->frag_off = htons(IP6_MF); fh->identification = frag_id; first_len = skb_pagelen(skb); skb->data_len = first_len - skb_headlen(skb); skb->len = first_len; ipv6_hdr(skb)->payload_len = htons(first_len - sizeof(struct ipv6hdr)); return 0; } EXPORT_SYMBOL(ip6_fraglist_init); void ip6_fraglist_prepare(struct sk_buff *skb, struct ip6_fraglist_iter *iter) { struct sk_buff *frag = iter->frag; unsigned int hlen = iter->hlen; struct frag_hdr *fh; frag->ip_summed = CHECKSUM_NONE; skb_reset_transport_header(frag); fh = 
__skb_push(frag, sizeof(struct frag_hdr)); __skb_push(frag, hlen); skb_reset_network_header(frag); memcpy(skb_network_header(frag), iter->tmp_hdr, hlen); iter->offset += skb->len - hlen - sizeof(struct frag_hdr); fh->nexthdr = iter->nexthdr; fh->reserved = 0; fh->frag_off = htons(iter->offset); if (frag->next) fh->frag_off |= htons(IP6_MF); fh->identification = iter->frag_id; ipv6_hdr(frag)->payload_len = htons(frag->len - sizeof(struct ipv6hdr)); ip6_copy_metadata(frag, skb); } EXPORT_SYMBOL(ip6_fraglist_prepare); void ip6_frag_init(struct sk_buff *skb, unsigned int hlen, unsigned int mtu, unsigned short needed_tailroom, int hdr_room, u8 *prevhdr, u8 nexthdr, __be32 frag_id, struct ip6_frag_state *state) { state->prevhdr = prevhdr; state->nexthdr = nexthdr; state->frag_id = frag_id; state->hlen = hlen; state->mtu = mtu; state->left = skb->len - hlen; /* Space per frame */ state->ptr = hlen; /* Where to start from */ state->hroom = hdr_room; state->troom = needed_tailroom; state->offset = 0; } EXPORT_SYMBOL(ip6_frag_init); struct sk_buff *ip6_frag_next(struct sk_buff *skb, struct ip6_frag_state *state) { u8 *prevhdr = state->prevhdr, *fragnexthdr_offset; struct sk_buff *frag; struct frag_hdr *fh; unsigned int len; len = state->left; /* IF: it doesn't fit, use 'mtu' - the data space left */ if (len > state->mtu) len = state->mtu; /* IF: we are not sending up to and including the packet end then align the next start on an eight byte boundary */ if (len < state->left) len &= ~7; /* Allocate buffer */ frag = alloc_skb(len + state->hlen + sizeof(struct frag_hdr) + state->hroom + state->troom, GFP_ATOMIC); if (!frag) return ERR_PTR(-ENOMEM); /* * Set up data on packet */ ip6_copy_metadata(frag, skb); skb_reserve(frag, state->hroom); skb_put(frag, len + state->hlen + sizeof(struct frag_hdr)); skb_reset_network_header(frag); fh = (struct frag_hdr *)(skb_network_header(frag) + state->hlen); frag->transport_header = (frag->network_header + state->hlen + sizeof(struct frag_hdr)); /* * Charge the memory for the fragment to any owner * it might possess */ if (skb->sk) skb_set_owner_w(frag, skb->sk); /* * Copy the packet header into the new buffer. */ skb_copy_from_linear_data(skb, skb_network_header(frag), state->hlen); fragnexthdr_offset = skb_network_header(frag); fragnexthdr_offset += prevhdr - skb_network_header(skb); *fragnexthdr_offset = NEXTHDR_FRAGMENT; /* * Build fragment header. */ fh->nexthdr = state->nexthdr; fh->reserved = 0; fh->identification = state->frag_id; /* * Copy a block of the IP datagram. */ BUG_ON(skb_copy_bits(skb, state->ptr, skb_transport_header(frag), len)); state->left -= len; fh->frag_off = htons(state->offset); if (state->left > 0) fh->frag_off |= htons(IP6_MF); ipv6_hdr(frag)->payload_len = htons(frag->len - sizeof(struct ipv6hdr)); state->ptr += len; state->offset += len; return frag; } EXPORT_SYMBOL(ip6_frag_next); int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, int (*output)(struct net *, struct sock *, struct sk_buff *)) { struct sk_buff *frag; struct rt6_info *rt = dst_rt6_info(skb_dst(skb)); struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ? 
inet6_sk(skb->sk) : NULL; u8 tstamp_type = skb->tstamp_type; struct ip6_frag_state state; unsigned int mtu, hlen, nexthdr_offset; ktime_t tstamp = skb->tstamp; int hroom, err = 0; __be32 frag_id; u8 *prevhdr, nexthdr = 0; err = ip6_find_1stfragopt(skb, &prevhdr); if (err < 0) goto fail; hlen = err; nexthdr = *prevhdr; nexthdr_offset = prevhdr - skb_network_header(skb); mtu = ip6_skb_dst_mtu(skb); /* We must not fragment if the socket is set to force MTU discovery * or if the skb is not generated by a local socket. */ if (unlikely(!skb->ignore_df && skb->len > mtu)) goto fail_toobig; if (IP6CB(skb)->frag_max_size) { if (IP6CB(skb)->frag_max_size > mtu) goto fail_toobig; /* don't send fragments larger than what we received */ mtu = IP6CB(skb)->frag_max_size; if (mtu < IPV6_MIN_MTU) mtu = IPV6_MIN_MTU; } if (np) { u32 frag_size = READ_ONCE(np->frag_size); if (frag_size && frag_size < mtu) mtu = frag_size; } if (mtu < hlen + sizeof(struct frag_hdr) + 8) goto fail_toobig; mtu -= hlen + sizeof(struct frag_hdr); frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr, &ipv6_hdr(skb)->saddr); if (skb->ip_summed == CHECKSUM_PARTIAL && (err = skb_checksum_help(skb))) goto fail; prevhdr = skb_network_header(skb) + nexthdr_offset; hroom = LL_RESERVED_SPACE(rt->dst.dev); if (skb_has_frag_list(skb)) { unsigned int first_len = skb_pagelen(skb); struct ip6_fraglist_iter iter; struct sk_buff *frag2; if (first_len - hlen > mtu || ((first_len - hlen) & 7) || skb_cloned(skb) || skb_headroom(skb) < (hroom + sizeof(struct frag_hdr))) goto slow_path; skb_walk_frags(skb, frag) { /* Correct geometry. */ if (frag->len > mtu || ((frag->len & 7) && frag->next) || skb_headroom(frag) < (hlen + hroom + sizeof(struct frag_hdr))) goto slow_path_clean; /* Partially cloned skb? */ if (skb_shared(frag)) goto slow_path_clean; BUG_ON(frag->sk); if (skb->sk) { frag->sk = skb->sk; frag->destructor = sock_wfree; } skb->truesize -= frag->truesize; } err = ip6_fraglist_init(skb, hlen, prevhdr, nexthdr, frag_id, &iter); if (err < 0) goto fail; /* We prevent @rt from being freed. */ rcu_read_lock(); for (;;) { /* Prepare header of the next frame, * before the previous one goes down. */ if (iter.frag) ip6_fraglist_prepare(skb, &iter); skb_set_delivery_time(skb, tstamp, tstamp_type); err = output(net, sk, skb); if (!err) IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), IPSTATS_MIB_FRAGCREATES); if (err || !iter.frag) break; skb = ip6_fraglist_next(&iter); } kfree(iter.tmp_hdr); if (err == 0) { IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), IPSTATS_MIB_FRAGOKS); rcu_read_unlock(); return 0; } kfree_skb_list(iter.frag); IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), IPSTATS_MIB_FRAGFAILS); rcu_read_unlock(); return err; slow_path_clean: skb_walk_frags(skb, frag2) { if (frag2 == frag) break; frag2->sk = NULL; frag2->destructor = NULL; skb->truesize += frag2->truesize; } } slow_path: /* * Fragment the datagram. */ ip6_frag_init(skb, hlen, mtu, rt->dst.dev->needed_tailroom, LL_RESERVED_SPACE(rt->dst.dev), prevhdr, nexthdr, frag_id, &state); /* * Keep copying data until we run out. */ while (state.left > 0) { frag = ip6_frag_next(skb, &state); if (IS_ERR(frag)) { err = PTR_ERR(frag); goto fail; } /* * Put this fragment into the sending queue.
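 *
 * (Added illustration: with a 1500-byte path MTU and no extension
 * headers (hlen == 40), the code above computed
 * mtu = 1500 - 40 - sizeof(struct frag_hdr) = 1452; ip6_frag_next()
 * rounds every fragment except the last down to a multiple of eight,
 * so each full fragment carries 1448 payload bytes, i.e. 1496 bytes
 * on the wire.)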
*/ skb_set_delivery_time(frag, tstamp, tstamp_type); err = output(net, sk, frag); if (err) goto fail; IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_FRAGCREATES); } IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_FRAGOKS); consume_skb(skb); return err; fail_toobig: icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu); err = -EMSGSIZE; fail: IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_FRAGFAILS); kfree_skb(skb); return err; } static inline int ip6_rt_check(const struct rt6key *rt_key, const struct in6_addr *fl_addr, const struct in6_addr *addr_cache) { return (rt_key->plen != 128 || !ipv6_addr_equal(fl_addr, &rt_key->addr)) && (!addr_cache || !ipv6_addr_equal(fl_addr, addr_cache)); } static struct dst_entry *ip6_sk_dst_check(struct sock *sk, struct dst_entry *dst, const struct flowi6 *fl6) { struct ipv6_pinfo *np = inet6_sk(sk); struct rt6_info *rt; if (!dst) goto out; if (dst->ops->family != AF_INET6) { dst_release(dst); return NULL; } rt = dst_rt6_info(dst); /* Yes, checking route validity in the not-connected * case is not very simple. Take into account * that we do not support routing by source, TOS, * and MSG_DONTROUTE --ANK (980726) * * 1. ip6_rt_check(): If the route was a host route, * check that the cached destination is current. * If it is a network route, we still may * check its validity using the saved pointer * to the last used address: daddr_cache. * We do not want to save the whole address now * (because the main consumer of this service * is tcp, which does not have this problem), * so the last trick works only on connected * sockets. * 2. oif also should be the same. */ if (ip6_rt_check(&rt->rt6i_dst, &fl6->daddr, np->daddr_cache) || #ifdef CONFIG_IPV6_SUBTREES ip6_rt_check(&rt->rt6i_src, &fl6->saddr, np->saddr_cache) || #endif (fl6->flowi6_oif && fl6->flowi6_oif != dst->dev->ifindex)) { dst_release(dst); dst = NULL; } out: return dst; } static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk, struct dst_entry **dst, struct flowi6 *fl6) { #ifdef CONFIG_IPV6_OPTIMISTIC_DAD struct neighbour *n; struct rt6_info *rt; #endif int err; int flags = 0; /* The correct way to handle this would be to do * ip6_route_get_saddr, and then ip6_route_output; however, * the route-specific preferred source forces the * ip6_route_output call _before_ ip6_route_get_saddr. * * In source specific routing (no src=any default route), * ip6_route_output will fail given a src=any saddr, though, * so we try it again later. */ if (ipv6_addr_any(&fl6->saddr)) { struct fib6_info *from; struct rt6_info *rt; *dst = ip6_route_output(net, sk, fl6); rt = (*dst)->error ? NULL : dst_rt6_info(*dst); rcu_read_lock(); from = rt ? rcu_dereference(rt->from) : NULL; err = ip6_route_get_saddr(net, from, &fl6->daddr, sk ? READ_ONCE(inet6_sk(sk)->srcprefs) : 0, fl6->flowi6_l3mdev, &fl6->saddr); rcu_read_unlock(); if (err) goto out_err_release; /* If we had an erroneous initial result, pretend it * never existed and let the SA-enabled version take * over.
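 *
 * (Added note: "SA-enabled" refers to the retry below, which re-runs
 * the lookup via ip6_route_output_flags() once fl6->saddr has been
 * filled in by ip6_route_get_saddr() above.)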
*/ if ((*dst)->error) { dst_release(*dst); *dst = NULL; } if (fl6->flowi6_oif) flags |= RT6_LOOKUP_F_IFACE; } if (!*dst) *dst = ip6_route_output_flags(net, sk, fl6, flags); err = (*dst)->error; if (err) goto out_err_release; #ifdef CONFIG_IPV6_OPTIMISTIC_DAD /* * Here if the dst entry we've looked up * has a neighbour entry that is in the INCOMPLETE * state and the src address from the flow is * marked as OPTIMISTIC, we release the found * dst entry and replace it instead with the * dst entry of the nexthop router */ rt = dst_rt6_info(*dst); rcu_read_lock(); n = __ipv6_neigh_lookup_noref(rt->dst.dev, rt6_nexthop(rt, &fl6->daddr)); err = n && !(READ_ONCE(n->nud_state) & NUD_VALID) ? -EINVAL : 0; rcu_read_unlock(); if (err) { struct inet6_ifaddr *ifp; struct flowi6 fl_gw6; int redirect; ifp = ipv6_get_ifaddr(net, &fl6->saddr, (*dst)->dev, 1); redirect = (ifp && ifp->flags & IFA_F_OPTIMISTIC); if (ifp) in6_ifa_put(ifp); if (redirect) { /* * We need to get the dst entry for the * default router instead */ dst_release(*dst); memcpy(&fl_gw6, fl6, sizeof(struct flowi6)); memset(&fl_gw6.daddr, 0, sizeof(struct in6_addr)); *dst = ip6_route_output(net, sk, &fl_gw6); err = (*dst)->error; if (err) goto out_err_release; } } #endif if (ipv6_addr_v4mapped(&fl6->saddr) && !(ipv6_addr_v4mapped(&fl6->daddr) || ipv6_addr_any(&fl6->daddr))) { err = -EAFNOSUPPORT; goto out_err_release; } return 0; out_err_release: dst_release(*dst); *dst = NULL; if (err == -ENETUNREACH) IP6_INC_STATS(net, NULL, IPSTATS_MIB_OUTNOROUTES); return err; } /** * ip6_dst_lookup - perform route lookup on flow * @net: Network namespace to perform lookup in * @sk: socket which provides route info * @dst: pointer to dst_entry * for result * @fl6: flow to lookup * * This function performs a route lookup on the given flow. * * It returns zero on success, or a standard errno code on error. */ int ip6_dst_lookup(struct net *net, struct sock *sk, struct dst_entry **dst, struct flowi6 *fl6) { *dst = NULL; return ip6_dst_lookup_tail(net, sk, dst, fl6); } EXPORT_SYMBOL_GPL(ip6_dst_lookup); /** * ip6_dst_lookup_flow - perform route lookup on flow with ipsec * @net: Network namespace to perform lookup in * @sk: socket which provides route info * @fl6: flow to lookup * @final_dst: final destination address for ipsec lookup * * This function performs a route lookup on the given flow. * * It returns a valid dst pointer on success, or a pointer encoded * error code. */ struct dst_entry *ip6_dst_lookup_flow(struct net *net, const struct sock *sk, struct flowi6 *fl6, const struct in6_addr *final_dst) { struct dst_entry *dst = NULL; int err; err = ip6_dst_lookup_tail(net, sk, &dst, fl6); if (err) return ERR_PTR(err); if (final_dst) fl6->daddr = *final_dst; return xfrm_lookup_route(net, dst, flowi6_to_flowi(fl6), sk, 0); } EXPORT_SYMBOL_GPL(ip6_dst_lookup_flow); /** * ip6_sk_dst_lookup_flow - perform socket cached route lookup on flow * @sk: socket which provides the dst cache and route info * @fl6: flow to lookup * @final_dst: final destination address for ipsec lookup * @connected: whether @sk is connected or not * * This function performs a route lookup on the given flow with the * possibility of using the cached route in the socket if it is valid. * It will take the socket dst lock when operating on the dst cache. * As a result, this function can only be used in process context. * * In addition, for a connected socket, cache the dst in the socket * if the current cache is not valid. 
* * It returns a valid dst pointer on success, or a pointer encoded * error code. */ struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6, const struct in6_addr *final_dst, bool connected) { struct dst_entry *dst = sk_dst_check(sk, inet6_sk(sk)->dst_cookie); dst = ip6_sk_dst_check(sk, dst, fl6); if (dst) return dst; dst = ip6_dst_lookup_flow(sock_net(sk), sk, fl6, final_dst); if (connected && !IS_ERR(dst)) ip6_sk_dst_store_flow(sk, dst_clone(dst), fl6); return dst; } EXPORT_SYMBOL_GPL(ip6_sk_dst_lookup_flow); static inline struct ipv6_opt_hdr *ip6_opt_dup(struct ipv6_opt_hdr *src, gfp_t gfp) { return src ? kmemdup(src, (src->hdrlen + 1) * 8, gfp) : NULL; } static inline struct ipv6_rt_hdr *ip6_rthdr_dup(struct ipv6_rt_hdr *src, gfp_t gfp) { return src ? kmemdup(src, (src->hdrlen + 1) * 8, gfp) : NULL; } static void ip6_append_data_mtu(unsigned int *mtu, int *maxfraglen, unsigned int fragheaderlen, struct sk_buff *skb, struct rt6_info *rt, unsigned int orig_mtu) { if (!(rt->dst.flags & DST_XFRM_TUNNEL)) { if (!skb) { /* first fragment, reserve header_len */ *mtu = orig_mtu - rt->dst.header_len; } else { /* * this fragment is not first, the headers * space is regarded as data space. */ *mtu = orig_mtu; } *maxfraglen = ((*mtu - fragheaderlen) & ~7) + fragheaderlen - sizeof(struct frag_hdr); } } static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork, struct inet6_cork *v6_cork, struct ipcm6_cookie *ipc6, struct rt6_info *rt) { struct ipv6_pinfo *np = inet6_sk(sk); unsigned int mtu, frag_size; struct ipv6_txoptions *nopt, *opt = ipc6->opt; /* callers pass dst together with a reference, set it first so * ip6_cork_release() can put it down even in case of an error. */ cork->base.dst = &rt->dst; /* * setup for corking */ if (opt) { if (WARN_ON(v6_cork->opt)) return -EINVAL; nopt = v6_cork->opt = kzalloc(sizeof(*opt), sk->sk_allocation); if (unlikely(!nopt)) return -ENOBUFS; nopt->tot_len = sizeof(*opt); nopt->opt_flen = opt->opt_flen; nopt->opt_nflen = opt->opt_nflen; nopt->dst0opt = ip6_opt_dup(opt->dst0opt, sk->sk_allocation); if (opt->dst0opt && !nopt->dst0opt) return -ENOBUFS; nopt->dst1opt = ip6_opt_dup(opt->dst1opt, sk->sk_allocation); if (opt->dst1opt && !nopt->dst1opt) return -ENOBUFS; nopt->hopopt = ip6_opt_dup(opt->hopopt, sk->sk_allocation); if (opt->hopopt && !nopt->hopopt) return -ENOBUFS; nopt->srcrt = ip6_rthdr_dup(opt->srcrt, sk->sk_allocation); if (opt->srcrt && !nopt->srcrt) return -ENOBUFS; /* need source address above miyazawa*/ } v6_cork->hop_limit = ipc6->hlimit; v6_cork->tclass = ipc6->tclass; v6_cork->dontfrag = ipc6->dontfrag; if (rt->dst.flags & DST_XFRM_TUNNEL) mtu = READ_ONCE(np->pmtudisc) >= IPV6_PMTUDISC_PROBE ? READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst); else mtu = READ_ONCE(np->pmtudisc) >= IPV6_PMTUDISC_PROBE ? 
READ_ONCE(rt->dst.dev->mtu) : dst_mtu(xfrm_dst_path(&rt->dst)); frag_size = READ_ONCE(np->frag_size); if (frag_size && frag_size < mtu) mtu = frag_size; cork->base.fragsize = mtu; cork->base.gso_size = ipc6->gso_size; cork->base.tx_flags = 0; cork->base.mark = ipc6->sockc.mark; cork->base.priority = ipc6->sockc.priority; sock_tx_timestamp(sk, &ipc6->sockc, &cork->base.tx_flags); if (ipc6->sockc.tsflags & SOCKCM_FLAG_TS_OPT_ID) { cork->base.flags |= IPCORK_TS_OPT_ID; cork->base.ts_opt_id = ipc6->sockc.ts_opt_id; } cork->base.length = 0; cork->base.transmit_time = ipc6->sockc.transmit_time; return 0; } static int __ip6_append_data(struct sock *sk, struct sk_buff_head *queue, struct inet_cork_full *cork_full, struct inet6_cork *v6_cork, struct page_frag *pfrag, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), void *from, size_t length, int transhdrlen, unsigned int flags) { struct sk_buff *skb, *skb_prev = NULL; struct inet_cork *cork = &cork_full->base; struct flowi6 *fl6 = &cork_full->fl.u.ip6; unsigned int maxfraglen, fragheaderlen, mtu, orig_mtu, pmtu; struct ubuf_info *uarg = NULL; int exthdrlen = 0; int dst_exthdrlen = 0; int hh_len; int copy; int err; int offset = 0; bool zc = false; u32 tskey = 0; struct rt6_info *rt = dst_rt6_info(cork->dst); bool paged, hold_tskey = false, extra_uref = false; struct ipv6_txoptions *opt = v6_cork->opt; int csummode = CHECKSUM_NONE; unsigned int maxnonfragsize, headersize; unsigned int wmem_alloc_delta = 0; skb = skb_peek_tail(queue); if (!skb) { exthdrlen = opt ? opt->opt_flen : 0; dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len; } paged = !!cork->gso_size; mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize; orig_mtu = mtu; hh_len = LL_RESERVED_SPACE(rt->dst.dev); fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len + (opt ? opt->opt_nflen : 0); headersize = sizeof(struct ipv6hdr) + (opt ? opt->opt_flen + opt->opt_nflen : 0) + rt->rt6i_nfheader_len; if (mtu <= fragheaderlen || ((mtu - fragheaderlen) & ~7) + fragheaderlen <= sizeof(struct frag_hdr)) goto emsgsize; maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - sizeof(struct frag_hdr); /* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit * the first fragment */ if (headersize + transhdrlen > mtu) goto emsgsize; if (cork->length + length > mtu - headersize && v6_cork->dontfrag && (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_ICMPV6 || sk->sk_protocol == IPPROTO_RAW)) { ipv6_local_rxpmtu(sk, fl6, mtu - headersize + sizeof(struct ipv6hdr)); goto emsgsize; } if (ip6_sk_ignore_df(sk)) maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN; else maxnonfragsize = mtu; if (cork->length + length > maxnonfragsize - headersize) { emsgsize: pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0); ipv6_local_error(sk, EMSGSIZE, fl6, pmtu); return -EMSGSIZE; } /* CHECKSUM_PARTIAL only with no extension headers and when * we are not going to fragment */ if (transhdrlen && sk->sk_protocol == IPPROTO_UDP && headersize == sizeof(struct ipv6hdr) && length <= mtu - headersize && (!(flags & MSG_MORE) || cork->gso_size) && rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM)) csummode = CHECKSUM_PARTIAL; if ((flags & MSG_ZEROCOPY) && length) { struct msghdr *msg = from; if (getfrag == ip_generic_getfrag && msg->msg_ubuf) { if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb)) return -EINVAL; /* Leave uarg NULL if can't zerocopy, callers should * be able to handle it. 
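 *
 * (Added note: the test below requires both scatter/gather support
 * (NETIF_F_SG) and CHECKSUM_PARTIAL before honouring msg->msg_ubuf:
 * the user pages must be left in place and checksummed by the device
 * rather than copied into the skb.)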
*/ if ((rt->dst.dev->features & NETIF_F_SG) && csummode == CHECKSUM_PARTIAL) { paged = true; zc = true; uarg = msg->msg_ubuf; } } else if (sock_flag(sk, SOCK_ZEROCOPY)) { uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb), false); if (!uarg) return -ENOBUFS; extra_uref = !skb_zcopy(skb); /* only ref on new uarg */ if (rt->dst.dev->features & NETIF_F_SG && csummode == CHECKSUM_PARTIAL) { paged = true; zc = true; } else { uarg_to_msgzc(uarg)->zerocopy = 0; skb_zcopy_set(skb, uarg, &extra_uref); } } } else if ((flags & MSG_SPLICE_PAGES) && length) { if (inet_test_bit(HDRINCL, sk)) return -EPERM; if (rt->dst.dev->features & NETIF_F_SG && getfrag == ip_generic_getfrag) /* We need an empty buffer to attach stuff to */ paged = true; else flags &= ~MSG_SPLICE_PAGES; } if (cork->tx_flags & SKBTX_ANY_TSTAMP && READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) { if (cork->flags & IPCORK_TS_OPT_ID) { tskey = cork->ts_opt_id; } else { tskey = atomic_inc_return(&sk->sk_tskey) - 1; hold_tskey = true; } } /* * Let's try using as much space as possible. * Use MTU if total length of the message fits into the MTU. * Otherwise, we need to reserve fragment header and * fragment alignment (= 8-15 octets, in total). * * Note that we may need to "move" the data from the tail * of the buffer to the new fragment when we split * the message. * * FIXME: It may be fragmented into multiple chunks * at once if non-fragmentable extension headers * are too large. * --yoshfuji */ cork->length += length; if (!skb) goto alloc_new_skb; while (length > 0) { /* Check if the remaining data fits into current packet. */ copy = (cork->length <= mtu ? mtu : maxfraglen) - skb->len; if (copy < length) copy = maxfraglen - skb->len; if (copy <= 0) { char *data; unsigned int datalen; unsigned int fraglen; unsigned int fraggap; unsigned int alloclen, alloc_extra; unsigned int pagedlen; alloc_new_skb: /* There's no room in the current skb */ if (skb) fraggap = skb->len - maxfraglen; else fraggap = 0; /* update mtu and maxfraglen if necessary */ if (!skb || !skb_prev) ip6_append_data_mtu(&mtu, &maxfraglen, fragheaderlen, skb, rt, orig_mtu); skb_prev = skb; /* * If remaining data exceeds the mtu, * we know we need more fragment(s). */ datalen = length + fraggap; if (datalen > (cork->length <= mtu ? mtu : maxfraglen) - fragheaderlen) datalen = maxfraglen - fragheaderlen - rt->dst.trailer_len; fraglen = datalen + fragheaderlen; pagedlen = 0; alloc_extra = hh_len; alloc_extra += dst_exthdrlen; alloc_extra += rt->dst.trailer_len; /* We just reserve space for fragment header. * Note: this may be an overallocation if the message * (without MSG_MORE) fits into the MTU. */ alloc_extra += sizeof(struct frag_hdr); if ((flags & MSG_MORE) && !(rt->dst.dev->features&NETIF_F_SG)) alloclen = mtu; else if (!paged && (fraglen + alloc_extra < SKB_MAX_ALLOC || !(rt->dst.dev->features & NETIF_F_SG))) alloclen = fraglen; else { alloclen = fragheaderlen + transhdrlen; pagedlen = datalen - transhdrlen; } alloclen += alloc_extra; if (datalen != length + fraggap) { /* * this is not the last fragment, the trailer * space is regarded as data space. */ datalen += rt->dst.trailer_len; } fraglen = datalen + fragheaderlen; copy = datalen - transhdrlen - fraggap - pagedlen; /* [!] NOTE: copy may be negative if pagedlen>0 * because then the equation reduces to -fraggap.
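 *
 * (Added derivation: in the paged branch above, pagedlen was set to
 * datalen - transhdrlen, so
 * copy = datalen - transhdrlen - fraggap - pagedlen = -fraggap;
 * that is why only the MSG_SPLICE_PAGES path tolerates copy < 0 in
 * the check that follows.)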
*/ if (copy < 0 && !(flags & MSG_SPLICE_PAGES)) { err = -EINVAL; goto error; } if (transhdrlen) { skb = sock_alloc_send_skb(sk, alloclen, (flags & MSG_DONTWAIT), &err); } else { skb = NULL; if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <= 2 * sk->sk_sndbuf) skb = alloc_skb(alloclen, sk->sk_allocation); if (unlikely(!skb)) err = -ENOBUFS; } if (!skb) goto error; /* * Fill in the control structures */ skb->protocol = htons(ETH_P_IPV6); skb->ip_summed = csummode; skb->csum = 0; /* reserve for fragmentation and ipsec header */ skb_reserve(skb, hh_len + sizeof(struct frag_hdr) + dst_exthdrlen); /* * Find where to start putting bytes */ data = skb_put(skb, fraglen - pagedlen); skb_set_network_header(skb, exthdrlen); data += fragheaderlen; skb->transport_header = (skb->network_header + fragheaderlen); if (fraggap) { skb->csum = skb_copy_and_csum_bits( skb_prev, maxfraglen, data + transhdrlen, fraggap); skb_prev->csum = csum_sub(skb_prev->csum, skb->csum); data += fraggap; pskb_trim_unique(skb_prev, maxfraglen); } if (copy > 0 && INDIRECT_CALL_1(getfrag, ip_generic_getfrag, from, data + transhdrlen, offset, copy, fraggap, skb) < 0) { err = -EFAULT; kfree_skb(skb); goto error; } else if (flags & MSG_SPLICE_PAGES) { copy = 0; } offset += copy; length -= copy + transhdrlen; transhdrlen = 0; exthdrlen = 0; dst_exthdrlen = 0; /* Only the initial fragment is time stamped */ skb_shinfo(skb)->tx_flags = cork->tx_flags; cork->tx_flags = 0; skb_shinfo(skb)->tskey = tskey; tskey = 0; skb_zcopy_set(skb, uarg, &extra_uref); if ((flags & MSG_CONFIRM) && !skb_prev) skb_set_dst_pending_confirm(skb, 1); /* * Put the packet on the pending queue */ if (!skb->destructor) { skb->destructor = sock_wfree; skb->sk = sk; wmem_alloc_delta += skb->truesize; } __skb_queue_tail(queue, skb); continue; } if (copy > length) copy = length; if (!(rt->dst.dev->features&NETIF_F_SG) && skb_tailroom(skb) >= copy) { unsigned int off; off = skb->len; if (INDIRECT_CALL_1(getfrag, ip_generic_getfrag, from, skb_put(skb, copy), offset, copy, off, skb) < 0) { __skb_trim(skb, off); err = -EFAULT; goto error; } } else if (flags & MSG_SPLICE_PAGES) { struct msghdr *msg = from; err = -EIO; if (WARN_ON_ONCE(copy > msg->msg_iter.count)) goto error; err = skb_splice_from_iter(skb, &msg->msg_iter, copy, sk->sk_allocation); if (err < 0) goto error; copy = err; wmem_alloc_delta += copy; } else if (!zc) { int i = skb_shinfo(skb)->nr_frags; err = -ENOMEM; if (!sk_page_frag_refill(sk, pfrag)) goto error; skb_zcopy_downgrade_managed(skb); if (!skb_can_coalesce(skb, i, pfrag->page, pfrag->offset)) { err = -EMSGSIZE; if (i == MAX_SKB_FRAGS) goto error; __skb_fill_page_desc(skb, i, pfrag->page, pfrag->offset, 0); skb_shinfo(skb)->nr_frags = ++i; get_page(pfrag->page); } copy = min_t(int, copy, pfrag->size - pfrag->offset); if (INDIRECT_CALL_1(getfrag, ip_generic_getfrag, from, page_address(pfrag->page) + pfrag->offset, offset, copy, skb->len, skb) < 0) goto error_efault; pfrag->offset += copy; skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); skb->len += copy; skb->data_len += copy; skb->truesize += copy; wmem_alloc_delta += copy; } else { err = skb_zerocopy_iter_dgram(skb, from, copy); if (err < 0) goto error; } offset += copy; length -= copy; } if (wmem_alloc_delta) refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc); return 0; error_efault: err = -EFAULT; error: net_zcopy_put_abort(uarg, extra_uref); cork->length -= length; IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS); refcount_add(wmem_alloc_delta, 
&sk->sk_wmem_alloc); if (hold_tskey) atomic_dec(&sk->sk_tskey); return err; } int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), void *from, size_t length, int transhdrlen, struct ipcm6_cookie *ipc6, struct flowi6 *fl6, struct rt6_info *rt, unsigned int flags) { struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); int exthdrlen; int err; if (flags&MSG_PROBE) return 0; if (skb_queue_empty(&sk->sk_write_queue)) { /* * setup for corking */ dst_hold(&rt->dst); err = ip6_setup_cork(sk, &inet->cork, &np->cork, ipc6, rt); if (err) return err; inet->cork.fl.u.ip6 = *fl6; exthdrlen = (ipc6->opt ? ipc6->opt->opt_flen : 0); length += exthdrlen; transhdrlen += exthdrlen; } else { transhdrlen = 0; } return __ip6_append_data(sk, &sk->sk_write_queue, &inet->cork, &np->cork, sk_page_frag(sk), getfrag, from, length, transhdrlen, flags); } EXPORT_SYMBOL_GPL(ip6_append_data); static void ip6_cork_steal_dst(struct sk_buff *skb, struct inet_cork_full *cork) { struct dst_entry *dst = cork->base.dst; cork->base.dst = NULL; skb_dst_set(skb, dst); } static void ip6_cork_release(struct inet_cork_full *cork, struct inet6_cork *v6_cork) { if (v6_cork->opt) { struct ipv6_txoptions *opt = v6_cork->opt; kfree(opt->dst0opt); kfree(opt->dst1opt); kfree(opt->hopopt); kfree(opt->srcrt); kfree(opt); v6_cork->opt = NULL; } if (cork->base.dst) { dst_release(cork->base.dst); cork->base.dst = NULL; } } struct sk_buff *__ip6_make_skb(struct sock *sk, struct sk_buff_head *queue, struct inet_cork_full *cork, struct inet6_cork *v6_cork) { struct sk_buff *skb, *tmp_skb; struct sk_buff **tail_skb; struct in6_addr *final_dst; struct net *net = sock_net(sk); struct ipv6hdr *hdr; struct ipv6_txoptions *opt = v6_cork->opt; struct rt6_info *rt = dst_rt6_info(cork->base.dst); struct flowi6 *fl6 = &cork->fl.u.ip6; unsigned char proto = fl6->flowi6_proto; skb = __skb_dequeue(queue); if (!skb) goto out; tail_skb = &(skb_shinfo(skb)->frag_list); /* move skb->data to ip header from ext header */ if (skb->data < skb_network_header(skb)) __skb_pull(skb, skb_network_offset(skb)); while ((tmp_skb = __skb_dequeue(queue)) != NULL) { __skb_pull(tmp_skb, skb_network_header_len(skb)); *tail_skb = tmp_skb; tail_skb = &(tmp_skb->next); skb->len += tmp_skb->len; skb->data_len += tmp_skb->len; skb->truesize += tmp_skb->truesize; tmp_skb->destructor = NULL; tmp_skb->sk = NULL; } /* Allow local fragmentation. 
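 *
 * (Added note: ip6_sk_ignore_df() is true for sockets that have not
 * forced path MTU discovery, so the resulting skb may later be split
 * by ip6_fragment() instead of failing with EMSGSIZE.)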
*/ skb->ignore_df = ip6_sk_ignore_df(sk); __skb_pull(skb, skb_network_header_len(skb)); final_dst = &fl6->daddr; if (opt && opt->opt_flen) ipv6_push_frag_opts(skb, opt, &proto); if (opt && opt->opt_nflen) ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr); skb_push(skb, sizeof(struct ipv6hdr)); skb_reset_network_header(skb); hdr = ipv6_hdr(skb); ip6_flow_hdr(hdr, v6_cork->tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel, ip6_autoflowlabel(net, sk), fl6)); hdr->hop_limit = v6_cork->hop_limit; hdr->nexthdr = proto; hdr->saddr = fl6->saddr; hdr->daddr = *final_dst; skb->priority = cork->base.priority; skb->mark = cork->base.mark; if (sk_is_tcp(sk)) skb_set_delivery_time(skb, cork->base.transmit_time, SKB_CLOCK_MONOTONIC); else skb_set_delivery_type_by_clockid(skb, cork->base.transmit_time, sk->sk_clockid); ip6_cork_steal_dst(skb, cork); IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS); if (proto == IPPROTO_ICMPV6) { struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb)); u8 icmp6_type; if (sk->sk_socket->type == SOCK_RAW && !(fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH)) icmp6_type = fl6->fl6_icmp_type; else icmp6_type = icmp6_hdr(skb)->icmp6_type; ICMP6MSGOUT_INC_STATS(net, idev, icmp6_type); ICMP6_INC_STATS(net, idev, ICMP6_MIB_OUTMSGS); } ip6_cork_release(cork, v6_cork); out: return skb; } int ip6_send_skb(struct sk_buff *skb) { struct net *net = sock_net(skb->sk); struct rt6_info *rt = dst_rt6_info(skb_dst(skb)); int err; rcu_read_lock(); err = ip6_local_out(net, skb->sk, skb); if (err) { if (err > 0) err = net_xmit_errno(err); if (err) IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS); } rcu_read_unlock(); return err; } int ip6_push_pending_frames(struct sock *sk) { struct sk_buff *skb; skb = ip6_finish_skb(sk); if (!skb) return 0; return ip6_send_skb(skb); } EXPORT_SYMBOL_GPL(ip6_push_pending_frames); static void __ip6_flush_pending_frames(struct sock *sk, struct sk_buff_head *queue, struct inet_cork_full *cork, struct inet6_cork *v6_cork) { struct sk_buff *skb; while ((skb = __skb_dequeue_tail(queue)) != NULL) { if (skb_dst(skb)) IP6_INC_STATS(sock_net(sk), ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_OUTDISCARDS); kfree_skb(skb); } ip6_cork_release(cork, v6_cork); } void ip6_flush_pending_frames(struct sock *sk) { __ip6_flush_pending_frames(sk, &sk->sk_write_queue, &inet_sk(sk)->cork, &inet6_sk(sk)->cork); } EXPORT_SYMBOL_GPL(ip6_flush_pending_frames); struct sk_buff *ip6_make_skb(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), void *from, size_t length, int transhdrlen, struct ipcm6_cookie *ipc6, struct rt6_info *rt, unsigned int flags, struct inet_cork_full *cork) { struct inet6_cork v6_cork; struct sk_buff_head queue; int exthdrlen = (ipc6->opt ? ipc6->opt->opt_flen : 0); int err; if (flags & MSG_PROBE) { dst_release(&rt->dst); return NULL; } __skb_queue_head_init(&queue); cork->base.flags = 0; cork->base.addr = 0; cork->base.opt = NULL; v6_cork.opt = NULL; err = ip6_setup_cork(sk, cork, &v6_cork, ipc6, rt); if (err) { ip6_cork_release(cork, &v6_cork); return ERR_PTR(err); } err = __ip6_append_data(sk, &queue, cork, &v6_cork, &current->task_frag, getfrag, from, length + exthdrlen, transhdrlen + exthdrlen, flags); if (err) { __ip6_flush_pending_frames(sk, &queue, cork, &v6_cork); return ERR_PTR(err); } return __ip6_make_skb(sk, &queue, cork, &v6_cork); }
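/*
 * Editor's addition, not part of the original file: a minimal sketch of
 * how a datagram sender typically drives the corking API above.
 * ip6_append_data() queues payload on sk->sk_write_queue, and
 * ip6_push_pending_frames() builds the final packet (fragmenting in
 * ip6_fragment() if needed) and hands it to ip6_send_skb(). The name
 * example_sendmsg() is invented for illustration; route and cookie
 * setup, error paths, and locking subtleties are elided.
 */
static int example_sendmsg(struct sock *sk, struct msghdr *msg, size_t len,
			   struct ipcm6_cookie *ipc6, struct flowi6 *fl6,
			   struct rt6_info *rt)
{
	int err;

	lock_sock(sk);
	err = ip6_append_data(sk, ip_generic_getfrag, msg, len,
			      0 /* transhdrlen: no transport header here */,
			      ipc6, fl6, rt, msg->msg_flags);
	if (err)
		/* partial appends must not linger on the write queue */
		ip6_flush_pending_frames(sk);
	else if (!(msg->msg_flags & MSG_MORE))
		err = ip6_push_pending_frames(sk);
	release_sock(sk);
	return err;
}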
/* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_SCHED_H #define _LINUX_SCHED_H /* * Define 'struct task_struct' and provide the main scheduler * APIs (schedule(), wakeup variants, etc.)
*/ #include <uapi/linux/sched.h> #include <asm/current.h> #include <asm/processor.h> #include <linux/thread_info.h> #include <linux/preempt.h> #include <linux/cpumask_types.h> #include <linux/cache.h> #include <linux/irqflags_types.h> #include <linux/smp_types.h> #include <linux/pid_types.h> #include <linux/sem_types.h> #include <linux/shm.h> #include <linux/kmsan_types.h> #include <linux/mutex_types.h> #include <linux/plist_types.h> #include <linux/hrtimer_types.h> #include <linux/timer_types.h> #include <linux/seccomp_types.h> #include <linux/nodemask_types.h> #include <linux/refcount_types.h> #include <linux/resource.h> #include <linux/latencytop.h> #include <linux/sched/prio.h> #include <linux/sched/types.h> #include <linux/signal_types.h> #include <linux/syscall_user_dispatch_types.h> #include <linux/mm_types_task.h> #include <linux/netdevice_xmit.h> #include <linux/task_io_accounting.h> #include <linux/posix-timers_types.h> #include <linux/restart_block.h> #include <uapi/linux/rseq.h> #include <linux/seqlock_types.h> #include <linux/kcsan.h> #include <linux/rv.h> #include <linux/uidgid_types.h> #include <linux/tracepoint-defs.h> #include <asm/kmap_size.h> /* task_struct member predeclarations (sorted alphabetically): */ struct audit_context; struct bio_list; struct blk_plug; struct bpf_local_storage; struct bpf_run_ctx; struct bpf_net_context; struct capture_control; struct cfs_rq; struct fs_struct; struct futex_pi_state; struct io_context; struct io_uring_task; struct mempolicy; struct nameidata; struct nsproxy; struct perf_event_context; struct perf_ctx_data; struct pid_namespace; struct pipe_inode_info; struct rcu_node; struct reclaim_state; struct robust_list_head; struct root_domain; struct rq; struct sched_attr; struct sched_dl_entity; struct seq_file; struct sighand_struct; struct signal_struct; struct task_delay_info; struct task_group; struct task_struct; struct user_event_mm; #include <linux/sched/ext.h> /* * Task state bitmask. NOTE! These bits are also * encoded in fs/proc/array.c: get_task_state(). * * We have two separate sets of flags: task->__state * is about runnability, while task->exit_state are * about the task exiting. Confusing, but this way * modifying one set can't modify the other one by * mistake. */ /* Used in tsk->__state: */ #define TASK_RUNNING 0x00000000 #define TASK_INTERRUPTIBLE 0x00000001 #define TASK_UNINTERRUPTIBLE 0x00000002 #define __TASK_STOPPED 0x00000004 #define __TASK_TRACED 0x00000008 /* Used in tsk->exit_state: */ #define EXIT_DEAD 0x00000010 #define EXIT_ZOMBIE 0x00000020 #define EXIT_TRACE (EXIT_ZOMBIE | EXIT_DEAD) /* Used in tsk->__state again: */ #define TASK_PARKED 0x00000040 #define TASK_DEAD 0x00000080 #define TASK_WAKEKILL 0x00000100 #define TASK_WAKING 0x00000200 #define TASK_NOLOAD 0x00000400 #define TASK_NEW 0x00000800 #define TASK_RTLOCK_WAIT 0x00001000 #define TASK_FREEZABLE 0x00002000 #define __TASK_FREEZABLE_UNSAFE (0x00004000 * IS_ENABLED(CONFIG_LOCKDEP)) #define TASK_FROZEN 0x00008000 #define TASK_STATE_MAX 0x00010000 #define TASK_ANY (TASK_STATE_MAX-1) /* * DO NOT ADD ANY NEW USERS ! 
*/ #define TASK_FREEZABLE_UNSAFE (TASK_FREEZABLE | __TASK_FREEZABLE_UNSAFE) /* Convenience macros for the sake of set_current_state: */ #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE) #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED) #define TASK_TRACED __TASK_TRACED #define TASK_IDLE (TASK_UNINTERRUPTIBLE | TASK_NOLOAD) /* Convenience macros for the sake of wake_up(): */ #define TASK_NORMAL (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE) /* get_task_state(): */ #define TASK_REPORT (TASK_RUNNING | TASK_INTERRUPTIBLE | \ TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \ __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \ TASK_PARKED) #define task_is_running(task) (READ_ONCE((task)->__state) == TASK_RUNNING) #define task_is_traced(task) ((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0) #define task_is_stopped(task) ((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0) #define task_is_stopped_or_traced(task) ((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0) /* * Special states are those that do not use the normal wait-loop pattern. See * the comment with set_special_state(). */ #define is_special_task_state(state) \ ((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | \ TASK_DEAD | TASK_FROZEN)) #ifdef CONFIG_DEBUG_ATOMIC_SLEEP # define debug_normal_state_change(state_value) \ do { \ WARN_ON_ONCE(is_special_task_state(state_value)); \ current->task_state_change = _THIS_IP_; \ } while (0) # define debug_special_state_change(state_value) \ do { \ WARN_ON_ONCE(!is_special_task_state(state_value)); \ current->task_state_change = _THIS_IP_; \ } while (0) # define debug_rtlock_wait_set_state() \ do { \ current->saved_state_change = current->task_state_change;\ current->task_state_change = _THIS_IP_; \ } while (0) # define debug_rtlock_wait_restore_state() \ do { \ current->task_state_change = current->saved_state_change;\ } while (0) #else # define debug_normal_state_change(cond) do { } while (0) # define debug_special_state_change(cond) do { } while (0) # define debug_rtlock_wait_set_state() do { } while (0) # define debug_rtlock_wait_restore_state() do { } while (0) #endif #define trace_set_current_state(state_value) \ do { \ if (tracepoint_enabled(sched_set_state_tp)) \ __trace_set_current_state(state_value); \ } while (0) /* * set_current_state() includes a barrier so that the write of current->__state * is correctly serialised wrt the caller's subsequent test of whether to * actually sleep: * * for (;;) { * set_current_state(TASK_UNINTERRUPTIBLE); * if (CONDITION) * break; * * schedule(); * } * __set_current_state(TASK_RUNNING); * * If the caller does not need such serialisation (because, for instance, the * CONDITION test and condition change and wakeup are under the same lock) then * use __set_current_state(). * * The above is typically ordered against the wakeup, which does: * * CONDITION = 1; * wake_up_state(p, TASK_UNINTERRUPTIBLE); * * where wake_up_state()/try_to_wake_up() executes a full memory barrier before * accessing p->__state. * * Wakeup will do: if (@state & p->__state) p->__state = TASK_RUNNING, that is, * once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a * TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING). * * However, with slightly different timing the wakeup TASK_RUNNING store can * also collide with the TASK_UNINTERRUPTIBLE store. Losing that store is not * a problem either because that will result in one extra go around the loop * and our @cond test will save the day. 
* * Also see the comments of try_to_wake_up(). */ #define __set_current_state(state_value) \ do { \ debug_normal_state_change((state_value)); \ trace_set_current_state(state_value); \ WRITE_ONCE(current->__state, (state_value)); \ } while (0) #define set_current_state(state_value) \ do { \ debug_normal_state_change((state_value)); \ trace_set_current_state(state_value); \ smp_store_mb(current->__state, (state_value)); \ } while (0) /* * set_special_state() should be used for those states when the blocking task * cannot use the regular condition-based wait-loop. In that case we must * serialize against wakeups such that any possible in-flight TASK_RUNNING * stores will not collide with our state change. */ #define set_special_state(state_value) \ do { \ unsigned long flags; /* may shadow */ \ \ raw_spin_lock_irqsave(&current->pi_lock, flags); \ debug_special_state_change((state_value)); \ trace_set_current_state(state_value); \ WRITE_ONCE(current->__state, (state_value)); \ raw_spin_unlock_irqrestore(&current->pi_lock, flags); \ } while (0) /* * PREEMPT_RT specific variants for "sleeping" spin/rwlocks * * RT's spin/rwlock substitutions are state preserving. The state of the * task when blocking on the lock is saved in task_struct::saved_state and * restored after the lock has been acquired. These operations are * serialized by task_struct::pi_lock against try_to_wake_up(). Any non-RT * lock-related wakeups while the task is blocked on the lock are * redirected to operate on task_struct::saved_state to ensure that these * are not dropped. On restore task_struct::saved_state is set to * TASK_RUNNING so any wakeup attempt redirected to saved_state will fail. * * The lock operation looks like this: * * current_save_and_set_rtlock_wait_state(); * for (;;) { * if (try_lock()) * break; * raw_spin_unlock_irq(&lock->wait_lock); * schedule_rtlock(); * raw_spin_lock_irq(&lock->wait_lock); * set_current_state(TASK_RTLOCK_WAIT); * } * current_restore_rtlock_saved_state(); */ #define current_save_and_set_rtlock_wait_state() \ do { \ lockdep_assert_irqs_disabled(); \ raw_spin_lock(&current->pi_lock); \ current->saved_state = current->__state; \ debug_rtlock_wait_set_state(); \ trace_set_current_state(TASK_RTLOCK_WAIT); \ WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT); \ raw_spin_unlock(&current->pi_lock); \ } while (0); #define current_restore_rtlock_saved_state() \ do { \ lockdep_assert_irqs_disabled(); \ raw_spin_lock(&current->pi_lock); \ debug_rtlock_wait_restore_state(); \ trace_set_current_state(current->saved_state); \ WRITE_ONCE(current->__state, current->saved_state); \ current->saved_state = TASK_RUNNING; \ raw_spin_unlock(&current->pi_lock); \ } while (0); #define get_current_state() READ_ONCE(current->__state) /* * Define the task command name length as an enum, so it can be visible to * BPF programs.
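 *
 * (Added note: unlike a #define, an enumerator is emitted into the
 * kernel's BTF type information, so BPF programs can reference
 * TASK_COMM_LEN through CO-RE relocations instead of hard-coding 16.)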
*/ enum { TASK_COMM_LEN = 16, }; extern void sched_tick(void); #define MAX_SCHEDULE_TIMEOUT LONG_MAX extern long schedule_timeout(long timeout); extern long schedule_timeout_interruptible(long timeout); extern long schedule_timeout_killable(long timeout); extern long schedule_timeout_uninterruptible(long timeout); extern long schedule_timeout_idle(long timeout); asmlinkage void schedule(void); extern void schedule_preempt_disabled(void); asmlinkage void preempt_schedule_irq(void); #ifdef CONFIG_PREEMPT_RT extern void schedule_rtlock(void); #endif extern int __must_check io_schedule_prepare(void); extern void io_schedule_finish(int token); extern long io_schedule_timeout(long timeout); extern void io_schedule(void); /* wrapper function to trace from this header file */ DECLARE_TRACEPOINT(sched_set_state_tp); extern void __trace_set_current_state(int state_value); /** * struct prev_cputime - snapshot of system and user cputime * @utime: time spent in user mode * @stime: time spent in system mode * @lock: protects the above two fields * * Stores previous user/system time values such that we can guarantee * monotonicity. */ struct prev_cputime { #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE u64 utime; u64 stime; raw_spinlock_t lock; #endif }; enum vtime_state { /* Task is sleeping or running in a CPU with VTIME inactive: */ VTIME_INACTIVE = 0, /* Task is idle */ VTIME_IDLE, /* Task runs in kernelspace in a CPU with VTIME active: */ VTIME_SYS, /* Task runs in userspace in a CPU with VTIME active: */ VTIME_USER, /* Task runs as guests in a CPU with VTIME active: */ VTIME_GUEST, }; struct vtime { seqcount_t seqcount; unsigned long long starttime; enum vtime_state state; unsigned int cpu; u64 utime; u64 stime; u64 gtime; }; /* * Utilization clamp constraints. * @UCLAMP_MIN: Minimum utilization * @UCLAMP_MAX: Maximum utilization * @UCLAMP_CNT: Utilization clamp constraints count */ enum uclamp_id { UCLAMP_MIN = 0, UCLAMP_MAX, UCLAMP_CNT }; extern struct root_domain def_root_domain; extern struct mutex sched_domains_mutex; extern void sched_domains_mutex_lock(void); extern void sched_domains_mutex_unlock(void); struct sched_param { int sched_priority; }; struct sched_info { #ifdef CONFIG_SCHED_INFO /* Cumulative counters: */ /* # of times we have run on this CPU: */ unsigned long pcount; /* Time spent waiting on a runqueue: */ unsigned long long run_delay; /* Max time spent waiting on a runqueue: */ unsigned long long max_run_delay; /* Min time spent waiting on a runqueue: */ unsigned long long min_run_delay; /* Timestamps: */ /* When did we last run on a CPU? */ unsigned long long last_arrival; /* When were we last queued to run? */ unsigned long long last_queued; #endif /* CONFIG_SCHED_INFO */ }; /* * Integer metrics need fixed point arithmetic, e.g., sched/fair * has a few: load, load_avg, util_avg, freq, and capacity. * * We define a basic fixed point arithmetic range, and then formalize * all these metrics based on that basic range. */ # define SCHED_FIXEDPOINT_SHIFT 10 # define SCHED_FIXEDPOINT_SCALE (1L << SCHED_FIXEDPOINT_SHIFT) /* Increase resolution of cpu_capacity calculations */ # define SCHED_CAPACITY_SHIFT SCHED_FIXEDPOINT_SHIFT # define SCHED_CAPACITY_SCALE (1L << SCHED_CAPACITY_SHIFT) struct load_weight { unsigned long weight; u32 inv_weight; }; /* * The load/runnable/util_avg accumulates an infinite geometric series * (see __update_load_avg_cfs_rq() in kernel/sched/pelt.c). 
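 *
 * (Added sketch of that series: with u_i the runnable/running fraction
 * of the i-th most recent 1024us window and a decay factor y chosen so
 * that y^32 = 1/2,
 *
 * sum = u_0 + u_1*y + u_2*y^2 + u_3*y^3 + ...
 *
 * whose maximum, LOAD_AVG_MAX = 47742, is the divisor that appears in
 * the overflow bound further down.)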
* * [load_avg definition] * * load_avg = runnable% * scale_load_down(load) * * [runnable_avg definition] * * runnable_avg = runnable% * SCHED_CAPACITY_SCALE * * [util_avg definition] * * util_avg = running% * SCHED_CAPACITY_SCALE * * where runnable% is the time ratio that a sched_entity is runnable and * running% the time ratio that a sched_entity is running. * * For cfs_rq, they are the aggregated values of all runnable and blocked * sched_entities. * * The load/runnable/util_avg doesn't directly factor in frequency scaling and CPU * capacity scaling. The scaling is done through the rq_clock_pelt that is used * for computing those signals (see update_rq_clock_pelt()). * * N.B., the above ratios (runnable% and running%) themselves are in the * range of [0, 1]. To do fixed point arithmetic, we therefore scale them * to as large a range as necessary. This is for example reflected by * util_avg's SCHED_CAPACITY_SCALE. * * [Overflow issue] * * The 64-bit load_sum can have 4353082796 (=2^64/47742/88761) entities * with the highest load (=88761), always runnable on a single cfs_rq, * and should not overflow as the number already hits PID_MAX_LIMIT. * * For all other cases (including 32-bit kernels), struct load_weight's * weight will overflow first, because: * * Max(load_avg) <= Max(load.weight) * * Then it is the load_weight's responsibility to consider overflow * issues. */ struct sched_avg { u64 last_update_time; u64 load_sum; u64 runnable_sum; u32 util_sum; u32 period_contrib; unsigned long load_avg; unsigned long runnable_avg; unsigned long util_avg; unsigned int util_est; } ____cacheline_aligned; /* * The UTIL_AVG_UNCHANGED flag is used to synchronize util_est with util_avg * updates. When a task is dequeued, its util_est should not be updated if its * util_avg has not been updated in the meantime. * This information is mapped into the MSB bit of util_est at dequeue time. * Since the max value of util_est for a task is 1024 (PELT util_avg for a task) * it is safe to use MSB.
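 *
 * (Added note: UTIL_AVG_UNCHANGED below is exactly that MSB,
 * 0x80000000, while the lower bits hold the util_est value itself;
 * since a task's util_est never exceeds 1024, flag and value cannot
 * collide.)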
*/ #define UTIL_EST_WEIGHT_SHIFT 2 #define UTIL_AVG_UNCHANGED 0x80000000 struct sched_statistics { #ifdef CONFIG_SCHEDSTATS u64 wait_start; u64 wait_max; u64 wait_count; u64 wait_sum; u64 iowait_count; u64 iowait_sum; u64 sleep_start; u64 sleep_max; s64 sum_sleep_runtime; u64 block_start; u64 block_max; s64 sum_block_runtime; s64 exec_max; u64 slice_max; u64 nr_migrations_cold; u64 nr_failed_migrations_affine; u64 nr_failed_migrations_running; u64 nr_failed_migrations_hot; u64 nr_forced_migrations; #ifdef CONFIG_NUMA_BALANCING u64 numa_task_migrated; u64 numa_task_swapped; #endif u64 nr_wakeups; u64 nr_wakeups_sync; u64 nr_wakeups_migrate; u64 nr_wakeups_local; u64 nr_wakeups_remote; u64 nr_wakeups_affine; u64 nr_wakeups_affine_attempts; u64 nr_wakeups_passive; u64 nr_wakeups_idle; #ifdef CONFIG_SCHED_CORE u64 core_forceidle_sum; #endif #endif /* CONFIG_SCHEDSTATS */ } ____cacheline_aligned; struct sched_entity { /* For load-balancing: */ struct load_weight load; struct rb_node run_node; u64 deadline; u64 min_vruntime; u64 min_slice; struct list_head group_node; unsigned char on_rq; unsigned char sched_delayed; unsigned char rel_deadline; unsigned char custom_slice; /* hole */ u64 exec_start; u64 sum_exec_runtime; u64 prev_sum_exec_runtime; u64 vruntime; s64 vlag; u64 slice; u64 nr_migrations; #ifdef CONFIG_FAIR_GROUP_SCHED int depth; struct sched_entity *parent; /* rq on which this entity is (to be) queued: */ struct cfs_rq *cfs_rq; /* rq "owned" by this entity/group: */ struct cfs_rq *my_q; /* cached value of my_q->h_nr_running */ unsigned long runnable_weight; #endif /* * Per entity load average tracking. * * Put into separate cache line so it does not * collide with read-mostly values above. */ struct sched_avg avg; }; struct sched_rt_entity { struct list_head run_list; unsigned long timeout; unsigned long watchdog_stamp; unsigned int time_slice; unsigned short on_rq; unsigned short on_list; struct sched_rt_entity *back; #ifdef CONFIG_RT_GROUP_SCHED struct sched_rt_entity *parent; /* rq on which this entity is (to be) queued: */ struct rt_rq *rt_rq; /* rq "owned" by this entity/group: */ struct rt_rq *my_q; #endif } __randomize_layout; typedef bool (*dl_server_has_tasks_f)(struct sched_dl_entity *); typedef struct task_struct *(*dl_server_pick_f)(struct sched_dl_entity *); struct sched_dl_entity { struct rb_node rb_node; /* * Original scheduling parameters. Copied here from sched_attr * during sched_setattr(), they will remain the same until * the next sched_setattr(). */ u64 dl_runtime; /* Maximum runtime for each instance */ u64 dl_deadline; /* Relative deadline of each instance */ u64 dl_period; /* Separation of two instances (period) */ u64 dl_bw; /* dl_runtime / dl_period */ u64 dl_density; /* dl_runtime / dl_deadline */ /* * Actual scheduling parameters. Initialized with the values above, * they are continuously updated during task execution. Note that * the remaining runtime could be < 0 in case we are in overrun. */ s64 runtime; /* Remaining runtime for this instance */ u64 deadline; /* Absolute deadline for this instance */ unsigned int flags; /* Specifying the scheduler behaviour */ /* * Some bool flags: * * @dl_throttled tells if we exhausted the runtime. If so, the * task has to wait for a replenishment to be performed at the * next firing of dl_timer. * * @dl_yielded tells if task gave up the CPU before consuming * all its available runtime during the last job. * * @dl_non_contending tells if the task is inactive while still * contributing to the active utilization. 
In other words, it * indicates whether the inactive timer has been armed and its handler * has not been executed yet. This flag is useful to avoid race * conditions between the inactive timer handler and the wakeup * code. * * @dl_overrun tells if the task asked to be informed about runtime * overruns. * * @dl_server tells if this is a server entity. * * @dl_defer tells if this is a deferred or regular server. For * now, only the deferred server exists. * * @dl_defer_armed tells if the deferrable server is waiting * for the replenishment timer to activate it. * * @dl_server_active tells if the dl_server is active (started). * The dl_server is started on the first CFS enqueue on an idle runqueue * and is stopped when a dequeue results in 0 CFS tasks on the * runqueue. In other words, the dl_server is active only when the CPU's * runqueue has at least one CFS task. * * @dl_defer_running tells if the deferrable server is actually * running, skipping the defer phase. */ unsigned int dl_throttled : 1; unsigned int dl_yielded : 1; unsigned int dl_non_contending : 1; unsigned int dl_overrun : 1; unsigned int dl_server : 1; unsigned int dl_server_active : 1; unsigned int dl_defer : 1; unsigned int dl_defer_armed : 1; unsigned int dl_defer_running : 1; /* * Bandwidth enforcement timer. Each -deadline task has its * own bandwidth to be enforced, thus we need one timer per task. */ struct hrtimer dl_timer; /* * Inactive timer, responsible for decreasing the active utilization * at the "0-lag time". When a -deadline task blocks, it contributes * to GRUB's active utilization until the "0-lag time", hence a * timer is needed to decrease the active utilization at the correct * time. */ struct hrtimer inactive_timer; /* * Bits for DL-server functionality. Also see the comment near * dl_server_update(). * * @rq the runqueue this server is for * * @server_has_tasks() returns true if @server_pick_task() would return a * runnable task. */ struct rq *rq; dl_server_has_tasks_f server_has_tasks; dl_server_pick_f server_pick_task; #ifdef CONFIG_RT_MUTEXES /* * Priority Inheritance. When a DEADLINE scheduling entity is boosted, * pi_se points to the donor; otherwise it points to the dl_se it belongs * to (the original one/itself). */ struct sched_dl_entity *pi_se; #endif }; #ifdef CONFIG_UCLAMP_TASK /* Number of utilization clamp buckets (shorter alias) */ #define UCLAMP_BUCKETS CONFIG_UCLAMP_BUCKETS_COUNT /* * Utilization clamp for a scheduling entity * @value: clamp value "assigned" to a se * @bucket_id: bucket index corresponding to the "assigned" value * @active: the se is currently refcounted in a rq's bucket * @user_defined: the requested clamp value comes from user-space * * The bucket_id is the index of the clamp bucket matching the clamp value. * It is pre-computed and stored to avoid expensive integer divisions on * the fast path. * * The active bit is set whenever a task has an "effective" value assigned, * which can be different from the clamp value "requested" from user-space. * This makes it possible to know that a task is refcounted in the rq's * bucket corresponding to the "effective" bucket_id. * * The user_defined bit is set whenever a task has a task-specific clamp * value requested from userspace, i.e. the system defaults apply to this task * only as a restriction. This allows relaxing default clamps when a less * restrictive task-specific value has been requested, thus allowing the * implementation of a "nice" semantic. For example, a task running with a 20% * default boost can still drop its own boosting to 0%.
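* * As a sketch of the bucket mapping (assuming buckets evenly split the range): * with UCLAMP_BUCKETS == 5 over [0, SCHED_CAPACITY_SCALE], each bucket spans * about 205 utilization units (1024/5, rounded), so a requested value of 512 * would be pre-computed into bucket_id 512/205 == 2.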
*/ struct uclamp_se { unsigned int value : bits_per(SCHED_CAPACITY_SCALE); unsigned int bucket_id : bits_per(UCLAMP_BUCKETS); unsigned int active : 1; unsigned int user_defined : 1; }; #endif /* CONFIG_UCLAMP_TASK */ union rcu_special { struct { u8 blocked; u8 need_qs; u8 exp_hint; /* Hint for performance. */ u8 need_mb; /* Readers need smp_mb(). */ } b; /* Bits. */ u32 s; /* Set of bits. */ }; enum perf_event_task_context { perf_invalid_context = -1, perf_hw_context = 0, perf_sw_context, perf_nr_task_contexts, }; /* * Number of contexts where an event can trigger: * task, softirq, hardirq, nmi. */ #define PERF_NR_CONTEXTS 4 struct wake_q_node { struct wake_q_node *next; }; struct kmap_ctrl { #ifdef CONFIG_KMAP_LOCAL int idx; pte_t pteval[KM_MAX_IDX]; #endif }; struct task_struct { #ifdef CONFIG_THREAD_INFO_IN_TASK /* * For reasons of header soup (see current_thread_info()), this * must be the first element of task_struct. */ struct thread_info thread_info; #endif unsigned int __state; /* saved state for "spinlock sleepers" */ unsigned int saved_state; /* * This begins the randomizable portion of task_struct. Only * scheduling-critical items should be added above here. */ randomized_struct_fields_start void *stack; refcount_t usage; /* Per task flags (PF_*), defined further below: */ unsigned int flags; unsigned int ptrace; #ifdef CONFIG_MEM_ALLOC_PROFILING struct alloc_tag *alloc_tag; #endif int on_cpu; struct __call_single_node wake_entry; unsigned int wakee_flips; unsigned long wakee_flip_decay_ts; struct task_struct *last_wakee; /* * recent_used_cpu is initially set as the last CPU used by a task * that wakes affine another task. Waker/wakee relationships can * push tasks around a CPU where each wakeup moves to the next one. * Tracking a recently used CPU allows a quick search for a recently * used CPU that may be idle. */ int recent_used_cpu; int wake_cpu; int on_rq; int prio; int static_prio; int normal_prio; unsigned int rt_priority; struct sched_entity se; struct sched_rt_entity rt; struct sched_dl_entity dl; struct sched_dl_entity *dl_server; #ifdef CONFIG_SCHED_CLASS_EXT struct sched_ext_entity scx; #endif const struct sched_class *sched_class; #ifdef CONFIG_SCHED_CORE struct rb_node core_node; unsigned long core_cookie; unsigned int core_occupation; #endif #ifdef CONFIG_CGROUP_SCHED struct task_group *sched_task_group; #endif #ifdef CONFIG_UCLAMP_TASK /* * Clamp values requested for a scheduling entity. * Must be updated with task_rq_lock() held. */ struct uclamp_se uclamp_req[UCLAMP_CNT]; /* * Effective clamp values used for a scheduling entity. * Must be updated with task_rq_lock() held. 
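* * (Loosely, the "effective" value is the requested value further restricted * by the cgroup hierarchy and the system-wide defaults, and it is recomputed * whenever any of those change.)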
*/ struct uclamp_se uclamp[UCLAMP_CNT]; #endif struct sched_statistics stats; #ifdef CONFIG_PREEMPT_NOTIFIERS /* List of struct preempt_notifier: */ struct hlist_head preempt_notifiers; #endif #ifdef CONFIG_BLK_DEV_IO_TRACE unsigned int btrace_seq; #endif unsigned int policy; unsigned long max_allowed_capacity; int nr_cpus_allowed; const cpumask_t *cpus_ptr; cpumask_t *user_cpus_ptr; cpumask_t cpus_mask; void *migration_pending; unsigned short migration_disabled; unsigned short migration_flags; #ifdef CONFIG_PREEMPT_RCU int rcu_read_lock_nesting; union rcu_special rcu_read_unlock_special; struct list_head rcu_node_entry; struct rcu_node *rcu_blocked_node; #endif /* #ifdef CONFIG_PREEMPT_RCU */ #ifdef CONFIG_TASKS_RCU unsigned long rcu_tasks_nvcsw; u8 rcu_tasks_holdout; u8 rcu_tasks_idx; int rcu_tasks_idle_cpu; struct list_head rcu_tasks_holdout_list; int rcu_tasks_exit_cpu; struct list_head rcu_tasks_exit_list; #endif /* #ifdef CONFIG_TASKS_RCU */ #ifdef CONFIG_TASKS_TRACE_RCU int trc_reader_nesting; int trc_ipi_to_cpu; union rcu_special trc_reader_special; struct list_head trc_holdout_list; struct list_head trc_blkd_node; int trc_blkd_cpu; #endif /* #ifdef CONFIG_TASKS_TRACE_RCU */ struct sched_info sched_info; struct list_head tasks; struct plist_node pushable_tasks; struct rb_node pushable_dl_tasks; struct mm_struct *mm; struct mm_struct *active_mm; struct address_space *faults_disabled_mapping; int exit_state; int exit_code; int exit_signal; /* The signal sent when the parent dies: */ int pdeath_signal; /* JOBCTL_*, siglock protected: */ unsigned long jobctl; /* Used for emulating ABI behavior of previous Linux versions: */ unsigned int personality; /* Scheduler bits, serialized by scheduler locks: */ unsigned sched_reset_on_fork:1; unsigned sched_contributes_to_load:1; unsigned sched_migrated:1; unsigned sched_task_hot:1; /* Force alignment to the next boundary: */ unsigned :0; /* Unserialized, strictly 'current' */ /* * This field must not be in the scheduler word above due to wakelist * queueing no longer being serialized by p->on_cpu. However: * * p->XXX = X; ttwu() * schedule() if (p->on_rq && ..) // false * smp_mb__after_spinlock(); if (smp_load_acquire(&p->on_cpu) && //true * deactivate_task() ttwu_queue_wakelist()) * p->on_rq = 0; p->sched_remote_wakeup = Y; * * guarantees all stores of 'current' are visible before * ->sched_remote_wakeup gets used, so it can be in this word. */ unsigned sched_remote_wakeup:1; #ifdef CONFIG_RT_MUTEXES unsigned sched_rt_mutex:1; #endif /* Bit to tell TOMOYO we're in execve(): */ unsigned in_execve:1; unsigned in_iowait:1; #ifndef TIF_RESTORE_SIGMASK unsigned restore_sigmask:1; #endif #ifdef CONFIG_MEMCG_V1 unsigned in_user_fault:1; #endif #ifdef CONFIG_LRU_GEN /* whether the LRU algorithm may apply to this access */ unsigned in_lru_fault:1; #endif #ifdef CONFIG_COMPAT_BRK unsigned brk_randomized:1; #endif #ifdef CONFIG_CGROUPS /* disallow userland-initiated cgroup migration */ unsigned no_cgroup_migration:1; /* task is frozen/stopped (used by the cgroup freezer) */ unsigned frozen:1; #endif #ifdef CONFIG_BLK_CGROUP unsigned use_memdelay:1; #endif #ifdef CONFIG_PSI /* Stalled due to lack of memory */ unsigned in_memstall:1; #endif #ifdef CONFIG_PAGE_OWNER /* Used by page_owner=on to detect recursion in page tracking. 
*/ unsigned in_page_owner:1; #endif #ifdef CONFIG_EVENTFD /* Recursion prevention for eventfd_signal() */ unsigned in_eventfd:1; #endif #ifdef CONFIG_ARCH_HAS_CPU_PASID unsigned pasid_activated:1; #endif #ifdef CONFIG_X86_BUS_LOCK_DETECT unsigned reported_split_lock:1; #endif #ifdef CONFIG_TASK_DELAY_ACCT /* delay due to memory thrashing */ unsigned in_thrashing:1; #endif unsigned in_nf_duplicate:1; #ifdef CONFIG_PREEMPT_RT struct netdev_xmit net_xmit; #endif unsigned long atomic_flags; /* Flags requiring atomic access. */ struct restart_block restart_block; pid_t pid; pid_t tgid; #ifdef CONFIG_STACKPROTECTOR /* Canary value for the -fstack-protector GCC feature: */ unsigned long stack_canary; #endif /* * Pointers to the (original) parent process, youngest child, younger sibling, * older sibling, respectively. (p->father can be replaced with * p->real_parent->pid) */ /* Real parent process: */ struct task_struct __rcu *real_parent; /* Recipient of SIGCHLD, wait4() reports: */ struct task_struct __rcu *parent; /* * Children/sibling form the list of natural children: */ struct list_head children; struct list_head sibling; struct task_struct *group_leader; /* * 'ptraced' is the list of tasks this task is using ptrace() on. * * This includes both natural children and PTRACE_ATTACH targets. * 'ptrace_entry' is this task's link on the p->parent->ptraced list. */ struct list_head ptraced; struct list_head ptrace_entry; /* PID/PID hash table linkage. */ struct pid *thread_pid; struct hlist_node pid_links[PIDTYPE_MAX]; struct list_head thread_node; struct completion *vfork_done; /* CLONE_CHILD_SETTID: */ int __user *set_child_tid; /* CLONE_CHILD_CLEARTID: */ int __user *clear_child_tid; /* PF_KTHREAD | PF_IO_WORKER */ void *worker_private; u64 utime; u64 stime; #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME u64 utimescaled; u64 stimescaled; #endif u64 gtime; struct prev_cputime prev_cputime; #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN struct vtime vtime; #endif #ifdef CONFIG_NO_HZ_FULL atomic_t tick_dep_mask; #endif /* Context switch counts: */ unsigned long nvcsw; unsigned long nivcsw; /* Monotonic time in nsecs: */ u64 start_time; /* Boot based time in nsecs: */ u64 start_boottime; /* MM fault and swap info: this can arguably be seen as either mm-specific or thread-specific: */ unsigned long min_flt; unsigned long maj_flt; /* Empty if CONFIG_POSIX_CPUTIMERS=n */ struct posix_cputimers posix_cputimers; #ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK struct posix_cputimers_work posix_cputimers_work; #endif /* Process credentials: */ /* Tracer's credentials at attach: */ const struct cred __rcu *ptracer_cred; /* Objective and real subjective task credentials (COW): */ const struct cred __rcu *real_cred; /* Effective (overridable) subjective task credentials (COW): */ const struct cred __rcu *cred; #ifdef CONFIG_KEYS /* Cached requested key. */ struct key *cached_requested_key; #endif /* * executable name, excluding path. * * - normally initialized begin_new_exec() * - set it with set_task_comm() * - strscpy_pad() to ensure it is always NUL-terminated and * zero-padded * - task_lock() to ensure the operation is atomic and the name is * fully updated. 
*/ char comm[TASK_COMM_LEN]; struct nameidata *nameidata; #ifdef CONFIG_SYSVIPC struct sysv_sem sysvsem; struct sysv_shm sysvshm; #endif #ifdef CONFIG_DETECT_HUNG_TASK unsigned long last_switch_count; unsigned long last_switch_time; #endif /* Filesystem information: */ struct fs_struct *fs; /* Open file information: */ struct files_struct *files; #ifdef CONFIG_IO_URING struct io_uring_task *io_uring; #endif /* Namespaces: */ struct nsproxy *nsproxy; /* Signal handlers: */ struct signal_struct *signal; struct sighand_struct __rcu *sighand; sigset_t blocked; sigset_t real_blocked; /* Restored if set_restore_sigmask() was used: */ sigset_t saved_sigmask; struct sigpending pending; unsigned long sas_ss_sp; size_t sas_ss_size; unsigned int sas_ss_flags; struct callback_head *task_works; #ifdef CONFIG_AUDIT #ifdef CONFIG_AUDITSYSCALL struct audit_context *audit_context; #endif kuid_t loginuid; unsigned int sessionid; #endif struct seccomp seccomp; struct syscall_user_dispatch syscall_dispatch; /* Thread group tracking: */ u64 parent_exec_id; u64 self_exec_id; /* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed, mempolicy: */ spinlock_t alloc_lock; /* Protection of the PI data structures: */ raw_spinlock_t pi_lock; struct wake_q_node wake_q; #ifdef CONFIG_RT_MUTEXES /* PI waiters blocked on a rt_mutex held by this task: */ struct rb_root_cached pi_waiters; /* Updated under owner's pi_lock and rq lock */ struct task_struct *pi_top_task; /* Deadlock detection and priority inheritance handling: */ struct rt_mutex_waiter *pi_blocked_on; #endif #ifdef CONFIG_DEBUG_MUTEXES /* Mutex deadlock detection: */ struct mutex_waiter *blocked_on; #endif #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER /* * Encoded lock address causing task block (lower 2 bits = type from * <linux/hung_task.h>). Accessed via hung_task_*() helpers. 
*/ unsigned long blocker; #endif #ifdef CONFIG_DEBUG_ATOMIC_SLEEP int non_block_count; #endif #ifdef CONFIG_TRACE_IRQFLAGS struct irqtrace_events irqtrace; unsigned int hardirq_threaded; u64 hardirq_chain_key; int softirqs_enabled; int softirq_context; int irq_config; #endif #ifdef CONFIG_PREEMPT_RT int softirq_disable_cnt; #endif #ifdef CONFIG_LOCKDEP # define MAX_LOCK_DEPTH 48UL u64 curr_chain_key; int lockdep_depth; unsigned int lockdep_recursion; struct held_lock held_locks[MAX_LOCK_DEPTH]; #endif #if defined(CONFIG_UBSAN) && !defined(CONFIG_UBSAN_TRAP) unsigned int in_ubsan; #endif /* Journalling filesystem info: */ void *journal_info; /* Stacked block device info: */ struct bio_list *bio_list; /* Stack plugging: */ struct blk_plug *plug; /* VM state: */ struct reclaim_state *reclaim_state; struct io_context *io_context; #ifdef CONFIG_COMPACTION struct capture_control *capture_control; #endif /* Ptrace state: */ unsigned long ptrace_message; kernel_siginfo_t *last_siginfo; struct task_io_accounting ioac; #ifdef CONFIG_PSI /* Pressure stall state */ unsigned int psi_flags; #endif #ifdef CONFIG_TASK_XACCT /* Accumulated RSS usage: */ u64 acct_rss_mem1; /* Accumulated virtual memory usage: */ u64 acct_vm_mem1; /* stime + utime since last update: */ u64 acct_timexpd; #endif #ifdef CONFIG_CPUSETS /* Protected by ->alloc_lock: */ nodemask_t mems_allowed; /* Sequence number to catch updates: */ seqcount_spinlock_t mems_allowed_seq; int cpuset_mem_spread_rotor; #endif #ifdef CONFIG_CGROUPS /* Control Group info protected by css_set_lock: */ struct css_set __rcu *cgroups; /* cg_list protected by css_set_lock and tsk->alloc_lock: */ struct list_head cg_list; #endif #ifdef CONFIG_X86_CPU_RESCTRL u32 closid; u32 rmid; #endif #ifdef CONFIG_FUTEX struct robust_list_head __user *robust_list; #ifdef CONFIG_COMPAT struct compat_robust_list_head __user *compat_robust_list; #endif struct list_head pi_state_list; struct futex_pi_state *pi_state_cache; struct mutex futex_exit_mutex; unsigned int futex_state; #endif #ifdef CONFIG_PERF_EVENTS u8 perf_recursion[PERF_NR_CONTEXTS]; struct perf_event_context *perf_event_ctxp; struct mutex perf_event_mutex; struct list_head perf_event_list; struct perf_ctx_data __rcu *perf_ctx_data; #endif #ifdef CONFIG_DEBUG_PREEMPT unsigned long preempt_disable_ip; #endif #ifdef CONFIG_NUMA /* Protected by alloc_lock: */ struct mempolicy *mempolicy; short il_prev; u8 il_weight; short pref_node_fork; #endif #ifdef CONFIG_NUMA_BALANCING int numa_scan_seq; unsigned int numa_scan_period; unsigned int numa_scan_period_max; int numa_preferred_nid; unsigned long numa_migrate_retry; /* Migration stamp: */ u64 node_stamp; u64 last_task_numa_placement; u64 last_sum_exec_runtime; struct callback_head numa_work; /* * This pointer is only modified for current in syscall and * pagefault context (and for tasks being destroyed), so it can be read * from any of the following contexts: * - RCU read-side critical section * - current->numa_group from everywhere * - task's runqueue locked, task not running */ struct numa_group __rcu *numa_group; /* * numa_faults is an array split into four regions: * faults_memory, faults_cpu, faults_memory_buffer, faults_cpu_buffer * in this precise order. * * faults_memory: Exponential decaying average of faults on a per-node * basis. Scheduling placement decisions are made based on these * counts. The values remain static for the duration of a PTE scan. * faults_cpu: Track the nodes the process was running on when a NUMA * hinting fault was incurred. 
* faults_memory_buffer and faults_cpu_buffer: Record faults per node * during the current scan window. When the scan completes, the counts * in faults_memory and faults_cpu decay and these values are copied. */ unsigned long *numa_faults; unsigned long total_numa_faults; /* * numa_faults_locality tracks if faults recorded during the last * scan window were remote/local or failed to migrate. The task scan * period is adapted based on the locality of the faults with different * weights depending on whether they were shared or private faults */ unsigned long numa_faults_locality[3]; unsigned long numa_pages_migrated; #endif /* CONFIG_NUMA_BALANCING */ #ifdef CONFIG_RSEQ struct rseq __user *rseq; u32 rseq_len; u32 rseq_sig; /* * RmW on rseq_event_mask must be performed atomically * with respect to preemption. */ unsigned long rseq_event_mask; # ifdef CONFIG_DEBUG_RSEQ /* * This is a place holder to save a copy of the rseq fields for * validation of read-only fields. The struct rseq has a * variable-length array at the end, so it cannot be used * directly. Reserve a size large enough for the known fields. */ char rseq_fields[sizeof(struct rseq)]; # endif #endif #ifdef CONFIG_SCHED_MM_CID int mm_cid; /* Current cid in mm */ int last_mm_cid; /* Most recent cid in mm */ int migrate_from_cpu; int mm_cid_active; /* Whether cid bitmap is active */ struct callback_head cid_work; #endif struct tlbflush_unmap_batch tlb_ubc; /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; struct page_frag task_frag; #ifdef CONFIG_TASK_DELAY_ACCT struct task_delay_info *delays; #endif #ifdef CONFIG_FAULT_INJECTION int make_it_fail; unsigned int fail_nth; #endif /* * When (nr_dirtied >= nr_dirtied_pause), it's time to call * balance_dirty_pages() for a dirty throttling pause: */ int nr_dirtied; int nr_dirtied_pause; /* Start of a write-and-pause period: */ unsigned long dirty_paused_when; #ifdef CONFIG_LATENCYTOP int latency_record_count; struct latency_record latency_record[LT_SAVECOUNT]; #endif /* * Time slack values; these are used to round up poll() and * select() etc timeout values. These are in nanoseconds. */ u64 timer_slack_ns; u64 default_timer_slack_ns; #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) unsigned int kasan_depth; #endif #ifdef CONFIG_KCSAN struct kcsan_ctx kcsan_ctx; #ifdef CONFIG_TRACE_IRQFLAGS struct irqtrace_events kcsan_save_irqtrace; #endif #ifdef CONFIG_KCSAN_WEAK_MEMORY int kcsan_stack_depth; #endif #endif #ifdef CONFIG_KMSAN struct kmsan_ctx kmsan_ctx; #endif #if IS_ENABLED(CONFIG_KUNIT) struct kunit *kunit_test; #endif #ifdef CONFIG_FUNCTION_GRAPH_TRACER /* Index of current stored address in ret_stack: */ int curr_ret_stack; int curr_ret_depth; /* Stack of return addresses for return function tracing: */ unsigned long *ret_stack; /* Timestamp for last schedule: */ unsigned long long ftrace_timestamp; unsigned long long ftrace_sleeptime; /* * Number of functions that haven't been traced * because of depth overrun: */ atomic_t trace_overrun; /* Pause tracing: */ atomic_t tracing_graph_pause; #endif #ifdef CONFIG_TRACING /* Bitmask and counter of trace recursion: */ unsigned long trace_recursion; #endif /* CONFIG_TRACING */ #ifdef CONFIG_KCOV /* See kernel/kcov.c for more details. 
*/ /* Coverage collection mode enabled for this task (0 if disabled): */ unsigned int kcov_mode; /* Size of the kcov_area: */ unsigned int kcov_size; /* Buffer for coverage collection: */ void *kcov_area; /* KCOV descriptor wired with this task or NULL: */ struct kcov *kcov; /* KCOV common handle for remote coverage collection: */ u64 kcov_handle; /* KCOV sequence number: */ int kcov_sequence; /* Collect coverage from softirq context: */ unsigned int kcov_softirq; #endif #ifdef CONFIG_MEMCG_V1 struct mem_cgroup *memcg_in_oom; #endif #ifdef CONFIG_MEMCG /* Number of pages to reclaim on returning to userland: */ unsigned int memcg_nr_pages_over_high; /* Used by memcontrol for targeted memcg charge: */ struct mem_cgroup *active_memcg; /* Cache for current->cgroups->memcg->objcg lookups: */ struct obj_cgroup *objcg; #endif #ifdef CONFIG_BLK_CGROUP struct gendisk *throttle_disk; #endif #ifdef CONFIG_UPROBES struct uprobe_task *utask; #endif #if defined(CONFIG_BCACHE) || defined(CONFIG_BCACHE_MODULE) unsigned int sequential_io; unsigned int sequential_io_avg; #endif struct kmap_ctrl kmap_ctrl; #ifdef CONFIG_DEBUG_ATOMIC_SLEEP unsigned long task_state_change; # ifdef CONFIG_PREEMPT_RT unsigned long saved_state_change; # endif #endif struct rcu_head rcu; refcount_t rcu_users; int pagefault_disabled; #ifdef CONFIG_MMU struct task_struct *oom_reaper_list; struct timer_list oom_reaper_timer; #endif #ifdef CONFIG_VMAP_STACK struct vm_struct *stack_vm_area; #endif #ifdef CONFIG_THREAD_INFO_IN_TASK /* A live task holds one reference: */ refcount_t stack_refcount; #endif #ifdef CONFIG_LIVEPATCH int patch_state; #endif #ifdef CONFIG_SECURITY /* Used by LSM modules for access restriction: */ void *security; #endif #ifdef CONFIG_BPF_SYSCALL /* Used by BPF task local storage */ struct bpf_local_storage __rcu *bpf_storage; /* Used for BPF run context */ struct bpf_run_ctx *bpf_ctx; #endif /* Used by BPF for per-TASK xdp storage */ struct bpf_net_context *bpf_net_context; #ifdef CONFIG_GCC_PLUGIN_STACKLEAK unsigned long lowest_stack; unsigned long prev_lowest_stack; #endif #ifdef CONFIG_X86_MCE void __user *mce_vaddr; __u64 mce_kflags; u64 mce_addr; __u64 mce_ripv : 1, mce_whole_page : 1, __mce_reserved : 62; struct callback_head mce_kill_me; int mce_count; #endif #ifdef CONFIG_KRETPROBES struct llist_head kretprobe_instances; #endif #ifdef CONFIG_RETHOOK struct llist_head rethooks; #endif #ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH /* * If L1D flush is supported on mm context switch * then we use this callback head to queue kill work * to kill tasks that are not running on SMT disabled * cores */ struct callback_head l1d_flush_kill; #endif #ifdef CONFIG_RV /* * Per-task RV monitor. Nowadays fixed in RV_PER_TASK_MONITORS. * If we find justification for more monitors, we can think * about adding more or developing a dynamic method. So far, * none of these are justified. */ union rv_task_monitor rv[RV_PER_TASK_MONITORS]; #endif #ifdef CONFIG_USER_EVENTS struct user_event_mm *user_event_mm; #endif /* CPU-specific state of this task: */ struct thread_struct thread; /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. 
*/ randomized_struct_fields_end } __attribute__ ((aligned (64))); #define TASK_REPORT_IDLE (TASK_REPORT + 1) #define TASK_REPORT_MAX (TASK_REPORT_IDLE << 1) static inline unsigned int __task_state_index(unsigned int tsk_state, unsigned int tsk_exit_state) { unsigned int state = (tsk_state | tsk_exit_state) & TASK_REPORT; BUILD_BUG_ON_NOT_POWER_OF_2(TASK_REPORT_MAX); if ((tsk_state & TASK_IDLE) == TASK_IDLE) state = TASK_REPORT_IDLE; /* * We're lying here, but rather than expose a completely new task state * to userspace, we can make this appear as if the task has gone through * a regular rt_mutex_lock() call. * Report frozen tasks as uninterruptible. */ if ((tsk_state & TASK_RTLOCK_WAIT) || (tsk_state & TASK_FROZEN)) state = TASK_UNINTERRUPTIBLE; return fls(state); } static inline unsigned int task_state_index(struct task_struct *tsk) { return __task_state_index(READ_ONCE(tsk->__state), tsk->exit_state); } static inline char task_index_to_char(unsigned int state) { static const char state_char[] = "RSDTtXZPI"; BUILD_BUG_ON(TASK_REPORT_MAX * 2 != 1 << (sizeof(state_char) - 1)); return state_char[state]; } static inline char task_state_to_char(struct task_struct *tsk) { return task_index_to_char(task_state_index(tsk)); } extern struct pid *cad_pid; /* * Per process flags */ #define PF_VCPU 0x00000001 /* I'm a virtual CPU */ #define PF_IDLE 0x00000002 /* I am an IDLE thread */ #define PF_EXITING 0x00000004 /* Getting shut down */ #define PF_POSTCOREDUMP 0x00000008 /* Coredumps should ignore this task */ #define PF_IO_WORKER 0x00000010 /* Task is an IO worker */ #define PF_WQ_WORKER 0x00000020 /* I'm a workqueue worker */ #define PF_FORKNOEXEC 0x00000040 /* Forked but didn't exec */ #define PF_MCE_PROCESS 0x00000080 /* Process policy on mce errors */ #define PF_SUPERPRIV 0x00000100 /* Used super-user privileges */ #define PF_DUMPCORE 0x00000200 /* Dumped core */ #define PF_SIGNALED 0x00000400 /* Killed by a signal */ #define PF_MEMALLOC 0x00000800 /* Allocating memory to free memory. See memalloc_noreclaim_save() */ #define PF_NPROC_EXCEEDED 0x00001000 /* set_user() noticed that RLIMIT_NPROC was exceeded */ #define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */ #define PF_USER_WORKER 0x00004000 /* Kernel thread cloned from userspace thread */ #define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */ #define PF_KCOMPACTD 0x00010000 /* I am kcompactd */ #define PF_KSWAPD 0x00020000 /* I am kswapd */ #define PF_MEMALLOC_NOFS 0x00040000 /* All allocations inherit GFP_NOFS. See memalloc_nofs_save() */ #define PF_MEMALLOC_NOIO 0x00080000 /* All allocations inherit GFP_NOIO. See memalloc_noio_save() */ #define PF_LOCAL_THROTTLE 0x00100000 /* Throttle writes only against the bdi I write to, * I am cleaning dirty pages from some other bdi. */ #define PF_KTHREAD 0x00200000 /* I am a kernel thread */ #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ #define PF__HOLE__00800000 0x00800000 #define PF__HOLE__01000000 0x01000000 #define PF__HOLE__02000000 0x02000000 #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ #define PF_MEMALLOC_PIN 0x10000000 /* Allocations constrained to zones which allow long term pinning.
* See memalloc_pin_save() */ #define PF_BLOCK_TS 0x20000000 /* plug has ts that needs updating */ #define PF__HOLE__40000000 0x40000000 #define PF_SUSPEND_TASK 0x80000000 /* This thread called freeze_processes() and should not be frozen */ /* * Only the _current_ task can read/write to tsk->flags, but other * tasks can access tsk->flags in readonly mode for example * with tsk_used_math (like during threaded core dumping). * There is however an exception to this rule during ptrace * or during fork: the ptracer task is allowed to write to the * child->flags of its traced child (same goes for fork, the parent * can write to the child->flags), because we're guaranteed the * child is not running and in turn not changing child->flags * at the same time the parent does it. */ #define clear_stopped_child_used_math(child) do { (child)->flags &= ~PF_USED_MATH; } while (0) #define set_stopped_child_used_math(child) do { (child)->flags |= PF_USED_MATH; } while (0) #define clear_used_math() clear_stopped_child_used_math(current) #define set_used_math() set_stopped_child_used_math(current) #define conditional_stopped_child_used_math(condition, child) \ do { (child)->flags &= ~PF_USED_MATH, (child)->flags |= (condition) ? PF_USED_MATH : 0; } while (0) #define conditional_used_math(condition) conditional_stopped_child_used_math(condition, current) #define copy_to_stopped_child_used_math(child) \ do { (child)->flags &= ~PF_USED_MATH, (child)->flags |= current->flags & PF_USED_MATH; } while (0) /* NOTE: this will return 0 or PF_USED_MATH, it will never return 1 */ #define tsk_used_math(p) ((p)->flags & PF_USED_MATH) #define used_math() tsk_used_math(current) static __always_inline bool is_percpu_thread(void) { return (current->flags & PF_NO_SETAFFINITY) && (current->nr_cpus_allowed == 1); } /* Per-process atomic flags. */ #define PFA_NO_NEW_PRIVS 0 /* May not gain new privileges. 
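* (Typically set via prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0). Unlike most PFA * flags below, it deliberately has no TASK_PFA_CLEAR() accessor, so it can * never be cleared once set.)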
*/ #define PFA_SPREAD_PAGE 1 /* Spread page cache over cpuset */ #define PFA_SPREAD_SLAB 2 /* Spread some slab caches over cpuset */ #define PFA_SPEC_SSB_DISABLE 3 /* Speculative Store Bypass disabled */ #define PFA_SPEC_SSB_FORCE_DISABLE 4 /* Speculative Store Bypass force disabled*/ #define PFA_SPEC_IB_DISABLE 5 /* Indirect branch speculation restricted */ #define PFA_SPEC_IB_FORCE_DISABLE 6 /* Indirect branch speculation permanently restricted */ #define PFA_SPEC_SSB_NOEXEC 7 /* Speculative Store Bypass clear on execve() */ #define TASK_PFA_TEST(name, func) \ static inline bool task_##func(struct task_struct *p) \ { return test_bit(PFA_##name, &p->atomic_flags); } #define TASK_PFA_SET(name, func) \ static inline void task_set_##func(struct task_struct *p) \ { set_bit(PFA_##name, &p->atomic_flags); } #define TASK_PFA_CLEAR(name, func) \ static inline void task_clear_##func(struct task_struct *p) \ { clear_bit(PFA_##name, &p->atomic_flags); } TASK_PFA_TEST(NO_NEW_PRIVS, no_new_privs) TASK_PFA_SET(NO_NEW_PRIVS, no_new_privs) TASK_PFA_TEST(SPREAD_PAGE, spread_page) TASK_PFA_SET(SPREAD_PAGE, spread_page) TASK_PFA_CLEAR(SPREAD_PAGE, spread_page) TASK_PFA_TEST(SPREAD_SLAB, spread_slab) TASK_PFA_SET(SPREAD_SLAB, spread_slab) TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab) TASK_PFA_TEST(SPEC_SSB_DISABLE, spec_ssb_disable) TASK_PFA_SET(SPEC_SSB_DISABLE, spec_ssb_disable) TASK_PFA_CLEAR(SPEC_SSB_DISABLE, spec_ssb_disable) TASK_PFA_TEST(SPEC_SSB_NOEXEC, spec_ssb_noexec) TASK_PFA_SET(SPEC_SSB_NOEXEC, spec_ssb_noexec) TASK_PFA_CLEAR(SPEC_SSB_NOEXEC, spec_ssb_noexec) TASK_PFA_TEST(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable) TASK_PFA_SET(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable) TASK_PFA_TEST(SPEC_IB_DISABLE, spec_ib_disable) TASK_PFA_SET(SPEC_IB_DISABLE, spec_ib_disable) TASK_PFA_CLEAR(SPEC_IB_DISABLE, spec_ib_disable) TASK_PFA_TEST(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable) TASK_PFA_SET(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable) static inline void current_restore_flags(unsigned long orig_flags, unsigned long flags) { current->flags &= ~flags; current->flags |= orig_flags & flags; } extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial); extern int task_can_attach(struct task_struct *p); extern int dl_bw_alloc(int cpu, u64 dl_bw); extern void dl_bw_free(int cpu, u64 dl_bw); /* do_set_cpus_allowed() - consider using set_cpus_allowed_ptr() instead */ extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask); /** * set_cpus_allowed_ptr - set CPU affinity mask of a task * @p: the task * @new_mask: CPU affinity mask * * Return: zero if successful, or a negative error code */ extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask); extern int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node); extern void release_user_cpus_ptr(struct task_struct *p); extern int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask); extern void force_compatible_cpus_allowed_ptr(struct task_struct *p); extern void relax_compatible_cpus_allowed_ptr(struct task_struct *p); extern int yield_to(struct task_struct *p, bool preempt); extern void set_user_nice(struct task_struct *p, long nice); extern int task_prio(const struct task_struct *p); /** * task_nice - return the nice value of a given task. * @p: the task in question. * * Return: The nice value [ -20 ... 0 ... 19 ]. 
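* * For example, the default static_prio of 120 maps to nice 0: * PRIO_TO_NICE(120) == 0, PRIO_TO_NICE(100) == -20, PRIO_TO_NICE(139) == 19.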
*/ static inline int task_nice(const struct task_struct *p) { return PRIO_TO_NICE((p)->static_prio); } extern int can_nice(const struct task_struct *p, const int nice); extern int task_curr(const struct task_struct *p); extern int idle_cpu(int cpu); extern int available_idle_cpu(int cpu); extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *); extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *); extern void sched_set_fifo(struct task_struct *p); extern void sched_set_fifo_low(struct task_struct *p); extern void sched_set_normal(struct task_struct *p, int nice); extern int sched_setattr(struct task_struct *, const struct sched_attr *); extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *); extern struct task_struct *idle_task(int cpu); /** * is_idle_task - is the specified task an idle task? * @p: the task in question. * * Return: 1 if @p is an idle task. 0 otherwise. */ static __always_inline bool is_idle_task(const struct task_struct *p) { return !!(p->flags & PF_IDLE); } extern struct task_struct *curr_task(int cpu); extern void ia64_set_curr_task(int cpu, struct task_struct *p); void yield(void); union thread_union { struct task_struct task; #ifndef CONFIG_THREAD_INFO_IN_TASK struct thread_info thread_info; #endif unsigned long stack[THREAD_SIZE/sizeof(long)]; }; #ifndef CONFIG_THREAD_INFO_IN_TASK extern struct thread_info init_thread_info; #endif extern unsigned long init_stack[THREAD_SIZE / sizeof(unsigned long)]; #ifdef CONFIG_THREAD_INFO_IN_TASK # define task_thread_info(task) (&(task)->thread_info) #else # define task_thread_info(task) ((struct thread_info *)(task)->stack) #endif /* * find a task by one of its numerical ids * * find_task_by_pid_ns(): * finds a task by its pid in the specified namespace * find_task_by_vpid(): * finds a task by its virtual pid * * see also find_vpid() etc in include/linux/pid.h */ extern struct task_struct *find_task_by_vpid(pid_t nr); extern struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns); /* * find a task by its virtual pid and get the task struct */ extern struct task_struct *find_get_task_by_vpid(pid_t nr); extern int wake_up_state(struct task_struct *tsk, unsigned int state); extern int wake_up_process(struct task_struct *tsk); extern void wake_up_new_task(struct task_struct *tsk); extern void kick_process(struct task_struct *tsk); extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec); #define set_task_comm(tsk, from) ({ \ BUILD_BUG_ON(sizeof(from) != TASK_COMM_LEN); \ __set_task_comm(tsk, from, false); \ }) /* * - Why not use task_lock()? * User space can randomly change their names anyway, so locking for readers * doesn't make sense. For writers, locking is probably necessary, as a race * condition could lead to long-term mixed results. * The strscpy_pad() in __set_task_comm() can ensure that the task comm is * always NUL-terminated and zero-padded. Therefore the race condition between * reader and writer is not an issue. * * - BUILD_BUG_ON() can help prevent the buf from being truncated. * Since the callers don't perform any return value checks, this safeguard is * necessary. */ #define get_task_comm(buf, tsk) ({ \ BUILD_BUG_ON(sizeof(buf) < TASK_COMM_LEN); \ strscpy_pad(buf, (tsk)->comm); \ buf; \ }) static __always_inline void scheduler_ipi(void) { /* * Fold TIF_NEED_RESCHED into the preempt_count; anybody setting * TIF_NEED_RESCHED remotely (for the first time) will also send * this IPI. 
*/ preempt_fold_need_resched(); } extern unsigned long wait_task_inactive(struct task_struct *, unsigned int match_state); /* * Set thread flags in other task's structures. * See asm/thread_info.h for TIF_xxxx flags available: */ static inline void set_tsk_thread_flag(struct task_struct *tsk, int flag) { set_ti_thread_flag(task_thread_info(tsk), flag); } static inline void clear_tsk_thread_flag(struct task_struct *tsk, int flag) { clear_ti_thread_flag(task_thread_info(tsk), flag); } static inline void update_tsk_thread_flag(struct task_struct *tsk, int flag, bool value) { update_ti_thread_flag(task_thread_info(tsk), flag, value); } static inline int test_and_set_tsk_thread_flag(struct task_struct *tsk, int flag) { return test_and_set_ti_thread_flag(task_thread_info(tsk), flag); } static inline int test_and_clear_tsk_thread_flag(struct task_struct *tsk, int flag) { return test_and_clear_ti_thread_flag(task_thread_info(tsk), flag); } static inline int test_tsk_thread_flag(struct task_struct *tsk, int flag) { return test_ti_thread_flag(task_thread_info(tsk), flag); } static inline void set_tsk_need_resched(struct task_struct *tsk) { set_tsk_thread_flag(tsk,TIF_NEED_RESCHED); } static inline void clear_tsk_need_resched(struct task_struct *tsk) { atomic_long_andnot(_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY, (atomic_long_t *)&task_thread_info(tsk)->flags); } static inline int test_tsk_need_resched(struct task_struct *tsk) { return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED)); } /* * cond_resched() and cond_resched_lock(): latency reduction via * explicit rescheduling in places that are safe. The return * value indicates whether a reschedule was done in fact. * cond_resched_lock() will drop the spinlock before scheduling, */ #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) extern int __cond_resched(void); #if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) DECLARE_STATIC_CALL(cond_resched, __cond_resched); static __always_inline int _cond_resched(void) { return static_call_mod(cond_resched)(); } #elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY) extern int dynamic_cond_resched(void); static __always_inline int _cond_resched(void) { return dynamic_cond_resched(); } #else /* !CONFIG_PREEMPTION */ static inline int _cond_resched(void) { return __cond_resched(); } #endif /* PREEMPT_DYNAMIC && CONFIG_HAVE_PREEMPT_DYNAMIC_CALL */ #else /* CONFIG_PREEMPTION && !CONFIG_PREEMPT_DYNAMIC */ static inline int _cond_resched(void) { return 0; } #endif /* !CONFIG_PREEMPTION || CONFIG_PREEMPT_DYNAMIC */ #define cond_resched() ({ \ __might_resched(__FILE__, __LINE__, 0); \ _cond_resched(); \ }) extern int __cond_resched_lock(spinlock_t *lock); extern int __cond_resched_rwlock_read(rwlock_t *lock); extern int __cond_resched_rwlock_write(rwlock_t *lock); #define MIGHT_RESCHED_RCU_SHIFT 8 #define MIGHT_RESCHED_PREEMPT_MASK ((1U << MIGHT_RESCHED_RCU_SHIFT) - 1) #ifndef CONFIG_PREEMPT_RT /* * Non RT kernels have an elevated preempt count due to the held lock, * but are not allowed to be inside a RCU read side critical section */ # define PREEMPT_LOCK_RESCHED_OFFSETS PREEMPT_LOCK_OFFSET #else /* * spin/rw_lock() on RT implies rcu_read_lock(). The might_sleep() check in * cond_resched*lock() has to take that into account because it checks for * preempt_count() and rcu_preempt_depth(). 
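* * Concretely, as the define below shows, the RCU nesting expected from the * held lock is encoded above bit MIGHT_RESCHED_RCU_SHIFT, on top of the usual * PREEMPT_LOCK_OFFSET in the low preempt-count bits.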
*/ # define PREEMPT_LOCK_RESCHED_OFFSETS \ (PREEMPT_LOCK_OFFSET + (1U << MIGHT_RESCHED_RCU_SHIFT)) #endif #define cond_resched_lock(lock) ({ \ __might_resched(__FILE__, __LINE__, PREEMPT_LOCK_RESCHED_OFFSETS); \ __cond_resched_lock(lock); \ }) #define cond_resched_rwlock_read(lock) ({ \ __might_resched(__FILE__, __LINE__, PREEMPT_LOCK_RESCHED_OFFSETS); \ __cond_resched_rwlock_read(lock); \ }) #define cond_resched_rwlock_write(lock) ({ \ __might_resched(__FILE__, __LINE__, PREEMPT_LOCK_RESCHED_OFFSETS); \ __cond_resched_rwlock_write(lock); \ }) static __always_inline bool need_resched(void) { return unlikely(tif_need_resched()); } /* * Wrappers for p->thread_info->cpu access. No-op on UP. */ #ifdef CONFIG_SMP static inline unsigned int task_cpu(const struct task_struct *p) { return READ_ONCE(task_thread_info(p)->cpu); } extern void set_task_cpu(struct task_struct *p, unsigned int cpu); #else static inline unsigned int task_cpu(const struct task_struct *p) { return 0; } static inline void set_task_cpu(struct task_struct *p, unsigned int cpu) { } #endif /* CONFIG_SMP */ static inline bool task_is_runnable(struct task_struct *p) { return p->on_rq && !p->se.sched_delayed; } extern bool sched_task_on_rq(struct task_struct *p); extern unsigned long get_wchan(struct task_struct *p); extern struct task_struct *cpu_curr_snapshot(int cpu); #include <linux/spinlock.h> /* * In order to reduce various lock holder preemption latencies, provide an * interface to see if a vCPU is currently running or not. * * This allows us to terminate optimistic spin loops and block, analogous to * the native optimistic spin heuristic of testing if the lock owner task is * running or not. */ #ifndef vcpu_is_preempted static inline bool vcpu_is_preempted(int cpu) { return false; } #endif extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask); extern long sched_getaffinity(pid_t pid, struct cpumask *mask); #ifndef TASK_SIZE_OF #define TASK_SIZE_OF(tsk) TASK_SIZE #endif static inline bool owner_on_cpu(struct task_struct *owner) { /* * Due to the lock holder preemption issue, we skip spinning if the * task is not on a CPU or its CPU is preempted. */ return READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner)); } /* Returns effective CPU energy utilization, as seen by the scheduler */ unsigned long sched_cpu_util(int cpu); #ifdef CONFIG_SCHED_CORE extern void sched_core_free(struct task_struct *tsk); extern void sched_core_fork(struct task_struct *p); extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type, unsigned long uaddr); extern int sched_core_idle_cpu(int cpu); #else static inline void sched_core_free(struct task_struct *tsk) { } static inline void sched_core_fork(struct task_struct *p) { } static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); } #endif extern void sched_set_stop_task(int cpu, struct task_struct *stop); #ifdef CONFIG_MEM_ALLOC_PROFILING static __always_inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag) { swap(current->alloc_tag, tag); return tag; } static __always_inline void alloc_tag_restore(struct alloc_tag *tag, struct alloc_tag *old) { #ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG WARN(current->alloc_tag != tag, "current->alloc_tag was changed:\n"); #endif current->alloc_tag = old; } #else #define alloc_tag_save(_tag) NULL #define alloc_tag_restore(_tag, _old) do {} while (0) #endif #endif
// SPDX-License-Identifier: GPL-2.0-or-later /* * Handle firewalling * Linux ethernet bridge * * Authors: * Lennert Buytenhek <buytenh@gnu.org> * Bart De Schuymer <bdschuym@pandora.be> * * Lennert dedicates this file to Kerstin Wurdinger.
*/ #include <linux/module.h> #include <linux/kernel.h> #include <linux/slab.h> #include <linux/ip.h> #include <linux/netdevice.h> #include <linux/skbuff.h> #include <linux/if_arp.h> #include <linux/if_ether.h> #include <linux/if_vlan.h> #include <linux/if_pppox.h> #include <linux/ppp_defs.h> #include <linux/netfilter_bridge.h> #include <uapi/linux/netfilter_bridge.h> #include <linux/netfilter_ipv4.h> #include <linux/netfilter_ipv6.h> #include <linux/netfilter_arp.h> #include <linux/in_route.h> #include <linux/rculist.h> #include <linux/inetdevice.h> #include <net/ip.h> #include <net/ipv6.h> #include <net/addrconf.h> #include <net/dst_metadata.h> #include <net/route.h> #include <net/netfilter/br_netfilter.h> #include <net/netns/generic.h> #include <net/inet_dscp.h> #include <linux/uaccess.h> #include "br_private.h" #ifdef CONFIG_SYSCTL #include <linux/sysctl.h> #endif #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include <net/netfilter/nf_conntrack_core.h> #endif static unsigned int brnf_net_id __read_mostly; struct brnf_net { bool enabled; #ifdef CONFIG_SYSCTL struct ctl_table_header *ctl_hdr; #endif /* default value is 1 */ int call_iptables; int call_ip6tables; int call_arptables; /* default value is 0 */ int filter_vlan_tagged; int filter_pppoe_tagged; int pass_vlan_indev; }; #define IS_IP(skb) \ (!skb_vlan_tag_present(skb) && skb->protocol == htons(ETH_P_IP)) #define IS_IPV6(skb) \ (!skb_vlan_tag_present(skb) && skb->protocol == htons(ETH_P_IPV6)) #define IS_ARP(skb) \ (!skb_vlan_tag_present(skb) && skb->protocol == htons(ETH_P_ARP)) static inline __be16 vlan_proto(const struct sk_buff *skb) { if (skb_vlan_tag_present(skb)) return skb->protocol; else if (skb->protocol == htons(ETH_P_8021Q)) return vlan_eth_hdr(skb)->h_vlan_encapsulated_proto; else return 0; } static inline bool is_vlan_ip(const struct sk_buff *skb, const struct net *net) { struct brnf_net *brnet = net_generic(net, brnf_net_id); return vlan_proto(skb) == htons(ETH_P_IP) && brnet->filter_vlan_tagged; } static inline bool is_vlan_ipv6(const struct sk_buff *skb, const struct net *net) { struct brnf_net *brnet = net_generic(net, brnf_net_id); return vlan_proto(skb) == htons(ETH_P_IPV6) && brnet->filter_vlan_tagged; } static inline bool is_vlan_arp(const struct sk_buff *skb, const struct net *net) { struct brnf_net *brnet = net_generic(net, brnf_net_id); return vlan_proto(skb) == htons(ETH_P_ARP) && brnet->filter_vlan_tagged; } static inline __be16 pppoe_proto(const struct sk_buff *skb) { return *((__be16 *)(skb_mac_header(skb) + ETH_HLEN + sizeof(struct pppoe_hdr))); } static inline bool is_pppoe_ip(const struct sk_buff *skb, const struct net *net) { struct brnf_net *brnet = net_generic(net, brnf_net_id); return skb->protocol == htons(ETH_P_PPP_SES) && pppoe_proto(skb) == htons(PPP_IP) && brnet->filter_pppoe_tagged; } static inline bool is_pppoe_ipv6(const struct sk_buff *skb, const struct net *net) { struct brnf_net *brnet = net_generic(net, brnf_net_id); return skb->protocol == htons(ETH_P_PPP_SES) && pppoe_proto(skb) == htons(PPP_IPV6) && brnet->filter_pppoe_tagged; } /* largest possible L2 header, see br_nf_dev_queue_xmit() */ #define NF_BRIDGE_MAX_MAC_HEADER_LENGTH (PPPOE_SES_HLEN + ETH_HLEN) struct brnf_frag_data { local_lock_t bh_lock; char mac[NF_BRIDGE_MAX_MAC_HEADER_LENGTH]; u8 encap_size; u8 size; u16 vlan_tci; __be16 vlan_proto; }; static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage) = { .bh_lock = INIT_LOCAL_LOCK(bh_lock), }; static void nf_bridge_info_free(struct sk_buff *skb) { skb_ext_del(skb, 
SKB_EXT_BRIDGE_NF); } static inline struct net_device *bridge_parent(const struct net_device *dev) { struct net_bridge_port *port; port = br_port_get_rcu(dev); return port ? port->br->dev : NULL; } static inline struct nf_bridge_info *nf_bridge_unshare(struct sk_buff *skb) { return skb_ext_add(skb, SKB_EXT_BRIDGE_NF); } unsigned int nf_bridge_encap_header_len(const struct sk_buff *skb) { switch (skb->protocol) { case __cpu_to_be16(ETH_P_8021Q): return VLAN_HLEN; case __cpu_to_be16(ETH_P_PPP_SES): return PPPOE_SES_HLEN; default: return 0; } } static inline void nf_bridge_pull_encap_header(struct sk_buff *skb) { unsigned int len = nf_bridge_encap_header_len(skb); skb_pull(skb, len); skb->network_header += len; } static inline void nf_bridge_pull_encap_header_rcsum(struct sk_buff *skb) { unsigned int len = nf_bridge_encap_header_len(skb); skb_pull_rcsum(skb, len); skb->network_header += len; } /* When handing a packet over to the IP layer * check whether we have a skb that is in the * expected format */ static int br_validate_ipv4(struct net *net, struct sk_buff *skb) { const struct iphdr *iph; u32 len; if (!pskb_may_pull(skb, sizeof(struct iphdr))) goto inhdr_error; iph = ip_hdr(skb); /* Basic sanity checks */ if (iph->ihl < 5 || iph->version != 4) goto inhdr_error; if (!pskb_may_pull(skb, iph->ihl*4)) goto inhdr_error; iph = ip_hdr(skb); if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl))) goto csum_error; len = skb_ip_totlen(skb); if (skb->len < len) { __IP_INC_STATS(net, IPSTATS_MIB_INTRUNCATEDPKTS); goto drop; } else if (len < (iph->ihl*4)) goto inhdr_error; if (pskb_trim_rcsum(skb, len)) { __IP_INC_STATS(net, IPSTATS_MIB_INDISCARDS); goto drop; } memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); /* We should really parse IP options here but until * somebody who actually uses IP options complains to * us we'll just silently ignore the options because * we're lazy! */ return 0; csum_error: __IP_INC_STATS(net, IPSTATS_MIB_CSUMERRORS); inhdr_error: __IP_INC_STATS(net, IPSTATS_MIB_INHDRERRORS); drop: return -1; } void nf_bridge_update_protocol(struct sk_buff *skb) { const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); switch (nf_bridge->orig_proto) { case BRNF_PROTO_8021Q: skb->protocol = htons(ETH_P_8021Q); break; case BRNF_PROTO_PPPOE: skb->protocol = htons(ETH_P_PPP_SES); break; case BRNF_PROTO_UNCHANGED: break; } } /* Obtain the correct destination MAC address, while preserving the original * source MAC address. If we already know this address, we just copy it. If we * don't, we use the neighbour framework to find out. In both cases, we make * sure that br_handle_frame_finish() is called afterwards. */ int br_nf_pre_routing_finish_bridge(struct net *net, struct sock *sk, struct sk_buff *skb) { struct neighbour *neigh; struct dst_entry *dst; skb->dev = bridge_parent(skb->dev); if (!skb->dev) goto free_skb; dst = skb_dst(skb); neigh = dst_neigh_lookup_skb(dst, skb); if (neigh) { struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); int ret; if ((READ_ONCE(neigh->nud_state) & NUD_CONNECTED) && READ_ONCE(neigh->hh.hh_len)) { struct net_device *br_indev; br_indev = nf_bridge_get_physindev(skb, net); if (!br_indev) { neigh_release(neigh); goto free_skb; } neigh_hh_bridge(&neigh->hh, skb); skb->dev = br_indev; ret = br_handle_frame_finish(net, sk, skb); } else { /* the neighbour function below overwrites the complete * MAC header, so we save the Ethernet source address and * protocol number. 
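* (That is the trailing ETH_HLEN - ETH_ALEN == 8 bytes of the Ethernet header: * the 6-byte source MAC plus the 2-byte protocol field, matching the offsets * in the copy below.)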
*/ skb_copy_from_linear_data_offset(skb, -(ETH_HLEN-ETH_ALEN), nf_bridge->neigh_header, ETH_HLEN-ETH_ALEN); /* tell br_dev_xmit to continue with forwarding */ nf_bridge->bridged_dnat = 1; /* FIXME Need to refragment */ ret = READ_ONCE(neigh->output)(neigh, skb); } neigh_release(neigh); return ret; } free_skb: kfree_skb(skb); return 0; } static inline bool br_nf_ipv4_daddr_was_changed(const struct sk_buff *skb, const struct nf_bridge_info *nf_bridge) { return ip_hdr(skb)->daddr != nf_bridge->ipv4_daddr; } /* This requires some explaining. If DNAT has taken place, * we will need to fix up the destination Ethernet address. * This is also true when SNAT takes place (for the reply direction). * * There are two cases to consider: * 1. The packet was DNAT'ed to a device in the same bridge * port group as it was received on. We can still bridge * the packet. * 2. The packet was DNAT'ed to a different device, either * a non-bridged device or another bridge port group. * The packet will need to be routed. * * The correct way of distinguishing between these two cases is to * call ip_route_input() and to look at skb->dst->dev, which is * changed to the destination device if ip_route_input() succeeds. * * Let's first consider the case that ip_route_input() succeeds: * * If the output device equals the logical bridge device the packet * came in on, we can consider this bridging. The corresponding MAC * address will be obtained in br_nf_pre_routing_finish_bridge. * Otherwise, the packet is considered to be routed and we just * change the destination MAC address so that the packet will * later be passed up to the IP stack to be routed. For a redirected * packet, ip_route_input() will give back the localhost as output device, * which differs from the bridge device. * * Let's now consider the case that ip_route_input() fails: * * This can be because the destination address is martian, in which case * the packet will be dropped. * If IP forwarding is disabled, ip_route_input() will fail, while * ip_route_output_key() can return success. The source * address for ip_route_output_key() is set to zero, so ip_route_output_key() * thinks we're handling a locally generated packet and won't care * if IP forwarding is enabled. If the output device equals the logical bridge * device, we proceed as if ip_route_input() succeeded. If it differs from the * logical bridge port or if ip_route_output_key() fails we drop the packet. 
*/ static int br_nf_pre_routing_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); struct net_device *dev = skb->dev, *br_indev; const struct iphdr *iph = ip_hdr(skb); enum skb_drop_reason reason; struct rtable *rt; br_indev = nf_bridge_get_physindev(skb, net); if (!br_indev) { kfree_skb(skb); return 0; } nf_bridge->frag_max_size = IPCB(skb)->frag_max_size; if (nf_bridge->pkt_otherhost) { skb->pkt_type = PACKET_OTHERHOST; nf_bridge->pkt_otherhost = false; } nf_bridge->in_prerouting = 0; if (br_nf_ipv4_daddr_was_changed(skb, nf_bridge)) { reason = ip_route_input(skb, iph->daddr, iph->saddr, ip4h_dscp(iph), dev); if (reason) { kfree_skb_reason(skb, reason); return 0; } else { if (skb_dst(skb)->dev == dev) { skb->dev = br_indev; nf_bridge_update_protocol(skb); nf_bridge_push_encap_header(skb); br_nf_hook_thresh(NF_BR_PRE_ROUTING, net, sk, skb, skb->dev, NULL, br_nf_pre_routing_finish_bridge); return 0; } ether_addr_copy(eth_hdr(skb)->h_dest, dev->dev_addr); skb->pkt_type = PACKET_HOST; } } else { rt = bridge_parent_rtable(br_indev); if (!rt) { kfree_skb(skb); return 0; } skb_dst_drop(skb); skb_dst_set_noref(skb, &rt->dst); } skb->dev = br_indev; nf_bridge_update_protocol(skb); nf_bridge_push_encap_header(skb); br_nf_hook_thresh(NF_BR_PRE_ROUTING, net, sk, skb, skb->dev, NULL, br_handle_frame_finish); return 0; } static struct net_device *brnf_get_logical_dev(struct sk_buff *skb, const struct net_device *dev, const struct net *net) { struct net_device *vlan, *br; struct brnf_net *brnet = net_generic(net, brnf_net_id); br = bridge_parent(dev); if (brnet->pass_vlan_indev == 0 || !skb_vlan_tag_present(skb)) return br; vlan = __vlan_find_dev_deep_rcu(br, skb->vlan_proto, skb_vlan_tag_get(skb) & VLAN_VID_MASK); return vlan ? vlan : br; } /* Some common code for IPv4/IPv6 */ struct net_device *setup_pre_routing(struct sk_buff *skb, const struct net *net) { struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); if (skb->pkt_type == PACKET_OTHERHOST) { skb->pkt_type = PACKET_HOST; nf_bridge->pkt_otherhost = true; } nf_bridge->in_prerouting = 1; nf_bridge->physinif = skb->dev->ifindex; skb->dev = brnf_get_logical_dev(skb, skb->dev, net); if (skb->protocol == htons(ETH_P_8021Q)) nf_bridge->orig_proto = BRNF_PROTO_8021Q; else if (skb->protocol == htons(ETH_P_PPP_SES)) nf_bridge->orig_proto = BRNF_PROTO_PPPOE; /* Must drop socket now because of tproxy. */ skb_orphan(skb); return skb->dev; } /* Direct IPv6 traffic to br_nf_pre_routing_ipv6. * Replicate the checks that IPv4 does on packet reception. * Set skb->dev to the bridge device (i.e. parent of the * receiving device) to make netfilter happy, the REDIRECT * target in particular. Save the original destination IP * address to be able to detect DNAT afterwards. 
*/ static unsigned int br_nf_pre_routing(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { struct nf_bridge_info *nf_bridge; struct net_bridge_port *p; struct net_bridge *br; __u32 len = nf_bridge_encap_header_len(skb); struct brnf_net *brnet; if (unlikely(!pskb_may_pull(skb, len))) return NF_DROP_REASON(skb, SKB_DROP_REASON_PKT_TOO_SMALL, 0); p = br_port_get_rcu(state->in); if (p == NULL) return NF_DROP_REASON(skb, SKB_DROP_REASON_DEV_READY, 0); br = p->br; brnet = net_generic(state->net, brnf_net_id); if (IS_IPV6(skb) || is_vlan_ipv6(skb, state->net) || is_pppoe_ipv6(skb, state->net)) { if (!brnet->call_ip6tables && !br_opt_get(br, BROPT_NF_CALL_IP6TABLES)) return NF_ACCEPT; if (!ipv6_mod_enabled()) { pr_warn_once("Module ipv6 is disabled, so call_ip6tables is not supported."); return NF_DROP_REASON(skb, SKB_DROP_REASON_IPV6DISABLED, 0); } nf_bridge_pull_encap_header_rcsum(skb); return br_nf_pre_routing_ipv6(priv, skb, state); } if (!brnet->call_iptables && !br_opt_get(br, BROPT_NF_CALL_IPTABLES)) return NF_ACCEPT; if (!IS_IP(skb) && !is_vlan_ip(skb, state->net) && !is_pppoe_ip(skb, state->net)) return NF_ACCEPT; nf_bridge_pull_encap_header_rcsum(skb); if (br_validate_ipv4(state->net, skb)) return NF_DROP_REASON(skb, SKB_DROP_REASON_IP_INHDR, 0); if (!nf_bridge_alloc(skb)) return NF_DROP_REASON(skb, SKB_DROP_REASON_NOMEM, 0); if (!setup_pre_routing(skb, state->net)) return NF_DROP_REASON(skb, SKB_DROP_REASON_DEV_READY, 0); nf_bridge = nf_bridge_info_get(skb); nf_bridge->ipv4_daddr = ip_hdr(skb)->daddr; skb->protocol = htons(ETH_P_IP); skb->transport_header = skb->network_header + ip_hdr(skb)->ihl * 4; NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, state->net, state->sk, skb, skb->dev, NULL, br_nf_pre_routing_finish); return NF_STOLEN; } #if IS_ENABLED(CONFIG_NF_CONNTRACK) /* conntracks' nf_confirm logic cannot handle cloned skbs referencing * the same nf_conn entry, which will happen for multicast (broadcast) * Frames on bridges. * * Example: * macvlan0 * br0 * ethX ethY * * ethX (or Y) receives multicast or broadcast packet containing * an IP packet, not yet in conntrack table. * * 1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. * -> skb->_nfct now references a unconfirmed entry * 2. skb is broad/mcast packet. bridge now passes clones out on each bridge * interface. * 3. skb gets passed up the stack. * 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb * and schedules a work queue to send them out on the lower devices. * * The clone skb->_nfct is not a copy, it is the same entry as the * original skb. The macvlan rx handler then returns RX_HANDLER_PASS. * 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb. * * The Macvlan broadcast worker and normal confirm path will race. * * This race will not happen if step 2 already confirmed a clone. In that * case later steps perform skb_clone() with skb->_nfct already confirmed (in * hash table). This works fine. * * But such confirmation won't happen when eb/ip/nftables rules dropped the * packets before they reached the nf_confirm step in postrouting. * * Work around this problem by explicit confirmation of the entry at * LOCAL_IN time, before upper layer has a chance to clone the unconfirmed * entry. 
* */ static unsigned int br_nf_local_in(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { bool promisc = BR_INPUT_SKB_CB(skb)->promisc; struct nf_conntrack *nfct = skb_nfct(skb); const struct nf_ct_hook *ct_hook; struct nf_conn *ct; int ret; if (promisc) { nf_reset_ct(skb); return NF_ACCEPT; } if (!nfct || skb->pkt_type == PACKET_HOST) return NF_ACCEPT; ct = container_of(nfct, struct nf_conn, ct_general); if (likely(nf_ct_is_confirmed(ct))) return NF_ACCEPT; if (WARN_ON_ONCE(refcount_read(&nfct->use) != 1)) { nf_reset_ct(skb); return NF_ACCEPT; } WARN_ON_ONCE(skb_shared(skb)); /* We can't call nf_confirm here, it would create a dependency * on nf_conntrack module. */ ct_hook = rcu_dereference(nf_ct_hook); if (!ct_hook) { skb->_nfct = 0ul; nf_conntrack_put(nfct); return NF_ACCEPT; } nf_bridge_pull_encap_header(skb); ret = ct_hook->confirm(skb); switch (ret & NF_VERDICT_MASK) { case NF_STOLEN: return NF_STOLEN; default: nf_bridge_push_encap_header(skb); break; } ct = container_of(nfct, struct nf_conn, ct_general); WARN_ON_ONCE(!nf_ct_is_confirmed(ct)); return ret; } #endif /* PF_BRIDGE/FORWARD *************************************************/ static int br_nf_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); struct net_device *in; if (!IS_ARP(skb) && !is_vlan_arp(skb, net)) { if (skb->protocol == htons(ETH_P_IP)) nf_bridge->frag_max_size = IPCB(skb)->frag_max_size; if (skb->protocol == htons(ETH_P_IPV6)) nf_bridge->frag_max_size = IP6CB(skb)->frag_max_size; in = nf_bridge_get_physindev(skb, net); if (!in) { kfree_skb(skb); return 0; } if (nf_bridge->pkt_otherhost) { skb->pkt_type = PACKET_OTHERHOST; nf_bridge->pkt_otherhost = false; } nf_bridge_update_protocol(skb); } else { in = *((struct net_device **)(skb->cb)); } nf_bridge_push_encap_header(skb); br_nf_hook_thresh(NF_BR_FORWARD, net, sk, skb, in, skb->dev, br_forward_finish); return 0; } static unsigned int br_nf_forward_ip(struct sk_buff *skb, const struct nf_hook_state *state, u8 pf) { struct nf_bridge_info *nf_bridge; struct net_device *parent; nf_bridge = nf_bridge_info_get(skb); if (!nf_bridge) return NF_ACCEPT; /* Need exclusive nf_bridge_info since we might have multiple * different physoutdevs. 
*/ if (!nf_bridge_unshare(skb)) return NF_DROP_REASON(skb, SKB_DROP_REASON_NOMEM, 0); nf_bridge = nf_bridge_info_get(skb); if (!nf_bridge) return NF_DROP_REASON(skb, SKB_DROP_REASON_NOMEM, 0); parent = bridge_parent(state->out); if (!parent) return NF_DROP_REASON(skb, SKB_DROP_REASON_DEV_READY, 0); nf_bridge_pull_encap_header(skb); if (skb->pkt_type == PACKET_OTHERHOST) { skb->pkt_type = PACKET_HOST; nf_bridge->pkt_otherhost = true; } if (pf == NFPROTO_IPV4) { if (br_validate_ipv4(state->net, skb)) return NF_DROP_REASON(skb, SKB_DROP_REASON_IP_INHDR, 0); IPCB(skb)->frag_max_size = nf_bridge->frag_max_size; skb->protocol = htons(ETH_P_IP); } else if (pf == NFPROTO_IPV6) { if (br_validate_ipv6(state->net, skb)) return NF_DROP_REASON(skb, SKB_DROP_REASON_IP_INHDR, 0); IP6CB(skb)->frag_max_size = nf_bridge->frag_max_size; skb->protocol = htons(ETH_P_IPV6); } else { WARN_ON_ONCE(1); return NF_DROP; } nf_bridge->physoutdev = skb->dev; NF_HOOK(pf, NF_INET_FORWARD, state->net, NULL, skb, brnf_get_logical_dev(skb, state->in, state->net), parent, br_nf_forward_finish); return NF_STOLEN; } static unsigned int br_nf_forward_arp(struct sk_buff *skb, const struct nf_hook_state *state) { struct net_bridge_port *p; struct net_bridge *br; struct net_device **d = (struct net_device **)(skb->cb); struct brnf_net *brnet; p = br_port_get_rcu(state->out); if (p == NULL) return NF_ACCEPT; br = p->br; brnet = net_generic(state->net, brnf_net_id); if (!brnet->call_arptables && !br_opt_get(br, BROPT_NF_CALL_ARPTABLES)) return NF_ACCEPT; if (is_vlan_arp(skb, state->net)) nf_bridge_pull_encap_header(skb); if (unlikely(!pskb_may_pull(skb, sizeof(struct arphdr)))) return NF_DROP_REASON(skb, SKB_DROP_REASON_PKT_TOO_SMALL, 0); if (arp_hdr(skb)->ar_pln != 4) { if (is_vlan_arp(skb, state->net)) nf_bridge_push_encap_header(skb); return NF_ACCEPT; } *d = state->in; NF_HOOK(NFPROTO_ARP, NF_ARP_FORWARD, state->net, state->sk, skb, state->in, state->out, br_nf_forward_finish); return NF_STOLEN; } /* This is the 'purely bridged' case. For IP, we pass the packet to * netfilter with indev and outdev set to the bridge device, * but we are still able to filter on the 'real' indev/outdev * because of the physdev module. For ARP, indev and outdev are the * bridge ports. 
*/ static unsigned int br_nf_forward(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { if (IS_IP(skb) || is_vlan_ip(skb, state->net) || is_pppoe_ip(skb, state->net)) return br_nf_forward_ip(skb, state, NFPROTO_IPV4); if (IS_IPV6(skb) || is_vlan_ipv6(skb, state->net) || is_pppoe_ipv6(skb, state->net)) return br_nf_forward_ip(skb, state, NFPROTO_IPV6); if (IS_ARP(skb) || is_vlan_arp(skb, state->net)) return br_nf_forward_arp(skb, state); return NF_ACCEPT; } static int br_nf_push_frag_xmit(struct net *net, struct sock *sk, struct sk_buff *skb) { struct brnf_frag_data *data; int err; data = this_cpu_ptr(&brnf_frag_data_storage); err = skb_cow_head(skb, data->size); if (err) { kfree_skb(skb); return 0; } if (data->vlan_proto) __vlan_hwaccel_put_tag(skb, data->vlan_proto, data->vlan_tci); skb_copy_to_linear_data_offset(skb, -data->size, data->mac, data->size); __skb_push(skb, data->encap_size); nf_bridge_info_free(skb); return br_dev_queue_push_xmit(net, sk, skb); } static int br_nf_ip_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, int (*output)(struct net *, struct sock *, struct sk_buff *)) { unsigned int mtu = ip_skb_dst_mtu(sk, skb); struct iphdr *iph = ip_hdr(skb); if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) || (IPCB(skb)->frag_max_size && IPCB(skb)->frag_max_size > mtu))) { IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS); kfree_skb(skb); return -EMSGSIZE; } return ip_do_fragment(net, sk, skb, output); } static unsigned int nf_bridge_mtu_reduction(const struct sk_buff *skb) { const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); if (nf_bridge->orig_proto == BRNF_PROTO_PPPOE) return PPPOE_SES_HLEN; return 0; } static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff *skb) { struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); unsigned int mtu, mtu_reserved; int ret; mtu_reserved = nf_bridge_mtu_reduction(skb); mtu = skb->dev->mtu; if (nf_bridge->pkt_otherhost) { skb->pkt_type = PACKET_OTHERHOST; nf_bridge->pkt_otherhost = false; } if (nf_bridge->frag_max_size && nf_bridge->frag_max_size < mtu) mtu = nf_bridge->frag_max_size; nf_bridge_update_protocol(skb); nf_bridge_push_encap_header(skb); if (skb_is_gso(skb) || skb->len + mtu_reserved <= mtu) { nf_bridge_info_free(skb); return br_dev_queue_push_xmit(net, sk, skb); } /* Fragmentation on metadata/template dst is not supported */ if (unlikely(!skb_valid_dst(skb))) goto drop; /* This is wrong! We should preserve the original fragment * boundaries by preserving frag_list rather than refragmenting. 
*/ if (IS_ENABLED(CONFIG_NF_DEFRAG_IPV4) && skb->protocol == htons(ETH_P_IP)) { struct brnf_frag_data *data; if (br_validate_ipv4(net, skb)) goto drop; IPCB(skb)->frag_max_size = nf_bridge->frag_max_size; local_lock_nested_bh(&brnf_frag_data_storage.bh_lock); data = this_cpu_ptr(&brnf_frag_data_storage); if (skb_vlan_tag_present(skb)) { data->vlan_tci = skb->vlan_tci; data->vlan_proto = skb->vlan_proto; } else { data->vlan_proto = 0; } data->encap_size = nf_bridge_encap_header_len(skb); data->size = ETH_HLEN + data->encap_size; skb_copy_from_linear_data_offset(skb, -data->size, data->mac, data->size); ret = br_nf_ip_fragment(net, sk, skb, br_nf_push_frag_xmit); local_unlock_nested_bh(&brnf_frag_data_storage.bh_lock); return ret; } if (IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) && skb->protocol == htons(ETH_P_IPV6)) { const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops(); struct brnf_frag_data *data; if (br_validate_ipv6(net, skb)) goto drop; IP6CB(skb)->frag_max_size = nf_bridge->frag_max_size; local_lock_nested_bh(&brnf_frag_data_storage.bh_lock); data = this_cpu_ptr(&brnf_frag_data_storage); data->encap_size = nf_bridge_encap_header_len(skb); data->size = ETH_HLEN + data->encap_size; skb_copy_from_linear_data_offset(skb, -data->size, data->mac, data->size); if (v6ops) { ret = v6ops->fragment(net, sk, skb, br_nf_push_frag_xmit); local_unlock_nested_bh(&brnf_frag_data_storage.bh_lock); return ret; } local_unlock_nested_bh(&brnf_frag_data_storage.bh_lock); kfree_skb(skb); return -EMSGSIZE; } nf_bridge_info_free(skb); return br_dev_queue_push_xmit(net, sk, skb); drop: kfree_skb(skb); return 0; } /* PF_BRIDGE/POST_ROUTING ********************************************/ static unsigned int br_nf_post_routing(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); struct net_device *realoutdev = bridge_parent(skb->dev); u_int8_t pf; /* if nf_bridge is set, but ->physoutdev is NULL, this packet came in * on a bridge, but was delivered locally and is now being routed: * * POST_ROUTING was already invoked from the ip stack. */ if (!nf_bridge || !nf_bridge->physoutdev) return NF_ACCEPT; if (!realoutdev) return NF_DROP_REASON(skb, SKB_DROP_REASON_DEV_READY, 0); if (IS_IP(skb) || is_vlan_ip(skb, state->net) || is_pppoe_ip(skb, state->net)) pf = NFPROTO_IPV4; else if (IS_IPV6(skb) || is_vlan_ipv6(skb, state->net) || is_pppoe_ipv6(skb, state->net)) pf = NFPROTO_IPV6; else return NF_ACCEPT; if (skb->pkt_type == PACKET_OTHERHOST) { skb->pkt_type = PACKET_HOST; nf_bridge->pkt_otherhost = true; } nf_bridge_pull_encap_header(skb); if (pf == NFPROTO_IPV4) skb->protocol = htons(ETH_P_IP); else skb->protocol = htons(ETH_P_IPV6); NF_HOOK(pf, NF_INET_POST_ROUTING, state->net, state->sk, skb, NULL, realoutdev, br_nf_dev_queue_xmit); return NF_STOLEN; } /* IP/SABOTAGE *****************************************************/ /* Don't hand locally destined packets to PF_INET(6)/PRE_ROUTING * for the second time. 
*/ static unsigned int ip_sabotage_in(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); if (nf_bridge) { if (nf_bridge->sabotage_in_done) return NF_ACCEPT; if (!nf_bridge->in_prerouting && !netif_is_l3_master(skb->dev) && !netif_is_l3_slave(skb->dev)) { nf_bridge->sabotage_in_done = 1; state->okfn(state->net, state->sk, skb); return NF_STOLEN; } } return NF_ACCEPT; } /* This is called when br_netfilter has called into iptables/netfilter, * and DNAT has taken place on a bridge-forwarded packet. * * neigh->output has created a new MAC header, with local br0 MAC * as saddr. * * This restores the original MAC saddr of the bridged packet * before invoking bridge forward logic to transmit the packet. */ static void br_nf_pre_routing_finish_bridge_slow(struct sk_buff *skb) { struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); struct net_device *br_indev; br_indev = nf_bridge_get_physindev(skb, dev_net(skb->dev)); if (!br_indev) { kfree_skb(skb); return; } skb_pull(skb, ETH_HLEN); nf_bridge->bridged_dnat = 0; BUILD_BUG_ON(sizeof(nf_bridge->neigh_header) != (ETH_HLEN - ETH_ALEN)); skb_copy_to_linear_data_offset(skb, -(ETH_HLEN - ETH_ALEN), nf_bridge->neigh_header, ETH_HLEN - ETH_ALEN); skb->dev = br_indev; nf_bridge->physoutdev = NULL; br_handle_frame_finish(dev_net(skb->dev), NULL, skb); } static int br_nf_dev_xmit(struct sk_buff *skb) { const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb); if (nf_bridge && nf_bridge->bridged_dnat) { br_nf_pre_routing_finish_bridge_slow(skb); return 1; } return 0; } static const struct nf_br_ops br_ops = { .br_dev_xmit_hook = br_nf_dev_xmit, }; /* For br_nf_post_routing, we need (prio = NF_BR_PRI_LAST), because * br_dev_queue_push_xmit is called afterwards */ static const struct nf_hook_ops br_nf_ops[] = { { .hook = br_nf_pre_routing, .pf = NFPROTO_BRIDGE, .hooknum = NF_BR_PRE_ROUTING, .priority = NF_BR_PRI_BRNF, }, #if IS_ENABLED(CONFIG_NF_CONNTRACK) { .hook = br_nf_local_in, .pf = NFPROTO_BRIDGE, .hooknum = NF_BR_LOCAL_IN, .priority = NF_BR_PRI_LAST, }, #endif { .hook = br_nf_forward, .pf = NFPROTO_BRIDGE, .hooknum = NF_BR_FORWARD, .priority = NF_BR_PRI_BRNF, }, { .hook = br_nf_post_routing, .pf = NFPROTO_BRIDGE, .hooknum = NF_BR_POST_ROUTING, .priority = NF_BR_PRI_LAST, }, { .hook = ip_sabotage_in, .pf = NFPROTO_IPV4, .hooknum = NF_INET_PRE_ROUTING, .priority = NF_IP_PRI_FIRST, }, { .hook = ip_sabotage_in, .pf = NFPROTO_IPV6, .hooknum = NF_INET_PRE_ROUTING, .priority = NF_IP6_PRI_FIRST, }, }; static int brnf_device_event(struct notifier_block *unused, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct brnf_net *brnet; struct net *net; int ret; if (event != NETDEV_REGISTER || !netif_is_bridge_master(dev)) return NOTIFY_DONE; ASSERT_RTNL(); net = dev_net(dev); brnet = net_generic(net, brnf_net_id); if (brnet->enabled) return NOTIFY_OK; ret = nf_register_net_hooks(net, br_nf_ops, ARRAY_SIZE(br_nf_ops)); if (ret) return NOTIFY_BAD; brnet->enabled = true; return NOTIFY_OK; } static struct notifier_block brnf_notifier __read_mostly = { .notifier_call = brnf_device_event, }; /* recursively invokes nf_hook_slow (again), skipping already-called * hooks (< NF_BR_PRI_BRNF). * * Called with rcu read lock held. 
*/ int br_nf_hook_thresh(unsigned int hook, struct net *net, struct sock *sk, struct sk_buff *skb, struct net_device *indev, struct net_device *outdev, int (*okfn)(struct net *, struct sock *, struct sk_buff *)) { const struct nf_hook_entries *e; struct nf_hook_state state; struct nf_hook_ops **ops; unsigned int i; int ret; e = rcu_dereference(net->nf.hooks_bridge[hook]); if (!e) return okfn(net, sk, skb); ops = nf_hook_entries_get_hook_ops(e); for (i = 0; i < e->num_hook_entries; i++) { /* These hooks have already been called */ if (ops[i]->priority < NF_BR_PRI_BRNF) continue; /* These hooks have not been called yet, run them. */ if (ops[i]->priority > NF_BR_PRI_BRNF) break; /* take a closer look at NF_BR_PRI_BRNF. */ if (ops[i]->hook == br_nf_pre_routing) { /* This hook diverted the skb to this function, * hooks after this have not been run yet. */ i++; break; } } nf_hook_state_init(&state, hook, NFPROTO_BRIDGE, indev, outdev, sk, net, okfn); ret = nf_hook_slow(skb, &state, e, i); if (ret == 1) ret = okfn(net, sk, skb); return ret; } #ifdef CONFIG_SYSCTL static int brnf_sysctl_call_tables(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret; ret = proc_dointvec(ctl, write, buffer, lenp, ppos); if (write && *(int *)(ctl->data)) *(int *)(ctl->data) = 1; return ret; } static struct ctl_table brnf_table[] = { { .procname = "bridge-nf-call-arptables", .maxlen = sizeof(int), .mode = 0644, .proc_handler = brnf_sysctl_call_tables, }, { .procname = "bridge-nf-call-iptables", .maxlen = sizeof(int), .mode = 0644, .proc_handler = brnf_sysctl_call_tables, }, { .procname = "bridge-nf-call-ip6tables", .maxlen = sizeof(int), .mode = 0644, .proc_handler = brnf_sysctl_call_tables, }, { .procname = "bridge-nf-filter-vlan-tagged", .maxlen = sizeof(int), .mode = 0644, .proc_handler = brnf_sysctl_call_tables, }, { .procname = "bridge-nf-filter-pppoe-tagged", .maxlen = sizeof(int), .mode = 0644, .proc_handler = brnf_sysctl_call_tables, }, { .procname = "bridge-nf-pass-vlan-input-dev", .maxlen = sizeof(int), .mode = 0644, .proc_handler = brnf_sysctl_call_tables, }, }; static inline void br_netfilter_sysctl_default(struct brnf_net *brnf) { brnf->call_iptables = 1; brnf->call_ip6tables = 1; brnf->call_arptables = 1; brnf->filter_vlan_tagged = 0; brnf->filter_pppoe_tagged = 0; brnf->pass_vlan_indev = 0; } static int br_netfilter_sysctl_init_net(struct net *net) { struct ctl_table *table = brnf_table; struct brnf_net *brnet; if (!net_eq(net, &init_net)) { table = kmemdup(table, sizeof(brnf_table), GFP_KERNEL); if (!table) return -ENOMEM; } brnet = net_generic(net, brnf_net_id); table[0].data = &brnet->call_arptables; table[1].data = &brnet->call_iptables; table[2].data = &brnet->call_ip6tables; table[3].data = &brnet->filter_vlan_tagged; table[4].data = &brnet->filter_pppoe_tagged; table[5].data = &brnet->pass_vlan_indev; br_netfilter_sysctl_default(brnet); brnet->ctl_hdr = register_net_sysctl_sz(net, "net/bridge", table, ARRAY_SIZE(brnf_table)); if (!brnet->ctl_hdr) { if (!net_eq(net, &init_net)) kfree(table); return -ENOMEM; } return 0; } static void br_netfilter_sysctl_exit_net(struct net *net, struct brnf_net *brnet) { const struct ctl_table *table = brnet->ctl_hdr->ctl_table_arg; unregister_net_sysctl_table(brnet->ctl_hdr); if (!net_eq(net, &init_net)) kfree(table); } static int __net_init brnf_init_net(struct net *net) { return br_netfilter_sysctl_init_net(net); } #endif static void __net_exit brnf_exit_net(struct net *net) { struct brnf_net *brnet; brnet = 
	net_generic(net, brnf_net_id);

	if (brnet->enabled) {
		nf_unregister_net_hooks(net, br_nf_ops, ARRAY_SIZE(br_nf_ops));
		brnet->enabled = false;
	}
#ifdef CONFIG_SYSCTL
	br_netfilter_sysctl_exit_net(net, brnet);
#endif
}

static struct pernet_operations brnf_net_ops __read_mostly = {
#ifdef CONFIG_SYSCTL
	.init = brnf_init_net,
#endif
	.exit = brnf_exit_net,
	.id   = &brnf_net_id,
	.size = sizeof(struct brnf_net),
};

static int __init br_netfilter_init(void)
{
	int ret;

	ret = register_pernet_subsys(&brnf_net_ops);
	if (ret < 0)
		return ret;

	ret = register_netdevice_notifier(&brnf_notifier);
	if (ret < 0) {
		unregister_pernet_subsys(&brnf_net_ops);
		return ret;
	}

	RCU_INIT_POINTER(nf_br_ops, &br_ops);
	printk(KERN_NOTICE "Bridge firewalling registered\n");
	return 0;
}

static void __exit br_netfilter_fini(void)
{
	RCU_INIT_POINTER(nf_br_ops, NULL);
	unregister_netdevice_notifier(&brnf_notifier);
	unregister_pernet_subsys(&brnf_net_ops);
}

module_init(br_netfilter_init);
module_exit(br_netfilter_fini);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Lennert Buytenhek <buytenh@gnu.org>");
MODULE_AUTHOR("Bart De Schuymer <bdschuym@pandora.be>");
MODULE_DESCRIPTION("Linux ethernet netfilter firewall bridge");
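/*
 * Illustrative sketch, not part of the module: the per-netns knobs registered
 * in brnf_table above show up under /proc/sys/net/bridge/ once br_netfilter is
 * loaded (bridge-nf-call-iptables, bridge-nf-call-ip6tables,
 * bridge-nf-call-arptables, all defaulting to 1 per
 * br_netfilter_sysctl_default()). A minimal user-space helper that flips one
 * of them could look like the following; the proc path is derived from the
 * sysctl names above, everything else is an assumption for the example.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int brnf_set_call_iptables(int enable)
{
	const char *path = "/proc/sys/net/bridge/bridge-nf-call-iptables";
	char val = enable ? '1' : '0';
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror(path);		/* typically: br_netfilter not loaded */
		return -1;
	}
	if (write(fd, &val, 1) != 1) {
		perror("write");
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	/* Ask the bridge to pass bridged IPv4 traffic through the iptables hooks. */
	return brnf_set_call_iptables(1) ? 1 : 0;
}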
/*
 * algif_rng: User-space interface for random number generators
 *
 * This file provides the user-space API for random number generators.
 *
 * Copyright (C) 2014, Stephan Mueller <smueller@chronox.de>
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, and the entire permission notice in its entirety,
 *    including the disclaimer of warranties.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of the author may not be used to endorse or promote
 *    products derived from this software without specific prior
 *    written permission.
 *
 * ALTERNATIVELY, this product may be distributed under the terms of
 * the GNU General Public License, in which case the provisions of the GPL2
 * are required INSTEAD OF the above restrictions. (This clause is
 * necessary due to a potential bad interaction between the GPL and
 * the restrictions contained in a BSD-style copyright.)
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF
 * WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
 * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
 * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
 * USE OF THIS SOFTWARE, EVEN IF NOT ADVISED OF THE POSSIBILITY OF SUCH
 * DAMAGE.
*/ #include <linux/capability.h> #include <linux/module.h> #include <crypto/rng.h> #include <linux/random.h> #include <crypto/if_alg.h> #include <linux/net.h> #include <net/sock.h> MODULE_LICENSE("GPL"); MODULE_AUTHOR("Stephan Mueller <smueller@chronox.de>"); MODULE_DESCRIPTION("User-space interface for random number generators"); struct rng_ctx { #define MAXSIZE 128 unsigned int len; struct crypto_rng *drng; u8 *addtl; size_t addtl_len; }; struct rng_parent_ctx { struct crypto_rng *drng; u8 *entropy; }; static void rng_reset_addtl(struct rng_ctx *ctx) { kfree_sensitive(ctx->addtl); ctx->addtl = NULL; ctx->addtl_len = 0; } static int _rng_recvmsg(struct crypto_rng *drng, struct msghdr *msg, size_t len, u8 *addtl, size_t addtl_len) { int err = 0; int genlen = 0; u8 result[MAXSIZE]; if (len == 0) return 0; if (len > MAXSIZE) len = MAXSIZE; /* * although not strictly needed, this is a precaution against coding * errors */ memset(result, 0, len); /* * The enforcement of a proper seeding of an RNG is done within an * RNG implementation. Some RNGs (DRBG, krng) do not need specific * seeding as they automatically seed. The X9.31 DRNG will return * an error if it was not seeded properly. */ genlen = crypto_rng_generate(drng, addtl, addtl_len, result, len); if (genlen < 0) return genlen; err = memcpy_to_msg(msg, result, len); memzero_explicit(result, len); return err ? err : len; } static int rng_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags) { struct sock *sk = sock->sk; struct alg_sock *ask = alg_sk(sk); struct rng_ctx *ctx = ask->private; return _rng_recvmsg(ctx->drng, msg, len, NULL, 0); } static int rng_test_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags) { struct sock *sk = sock->sk; struct alg_sock *ask = alg_sk(sk); struct rng_ctx *ctx = ask->private; int ret; lock_sock(sock->sk); ret = _rng_recvmsg(ctx->drng, msg, len, ctx->addtl, ctx->addtl_len); rng_reset_addtl(ctx); release_sock(sock->sk); return ret; } static int rng_test_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { int err; struct alg_sock *ask = alg_sk(sock->sk); struct rng_ctx *ctx = ask->private; lock_sock(sock->sk); if (len > MAXSIZE) { err = -EMSGSIZE; goto unlock; } rng_reset_addtl(ctx); ctx->addtl = kmalloc(len, GFP_KERNEL); if (!ctx->addtl) { err = -ENOMEM; goto unlock; } err = memcpy_from_msg(ctx->addtl, msg, len); if (err) { rng_reset_addtl(ctx); goto unlock; } ctx->addtl_len = len; unlock: release_sock(sock->sk); return err ? 
err : len; } static struct proto_ops algif_rng_ops = { .family = PF_ALG, .connect = sock_no_connect, .socketpair = sock_no_socketpair, .getname = sock_no_getname, .ioctl = sock_no_ioctl, .listen = sock_no_listen, .shutdown = sock_no_shutdown, .mmap = sock_no_mmap, .bind = sock_no_bind, .accept = sock_no_accept, .sendmsg = sock_no_sendmsg, .release = af_alg_release, .recvmsg = rng_recvmsg, }; static struct proto_ops __maybe_unused algif_rng_test_ops = { .family = PF_ALG, .connect = sock_no_connect, .socketpair = sock_no_socketpair, .getname = sock_no_getname, .ioctl = sock_no_ioctl, .listen = sock_no_listen, .shutdown = sock_no_shutdown, .mmap = sock_no_mmap, .bind = sock_no_bind, .accept = sock_no_accept, .release = af_alg_release, .recvmsg = rng_test_recvmsg, .sendmsg = rng_test_sendmsg, }; static void *rng_bind(const char *name, u32 type, u32 mask) { struct rng_parent_ctx *pctx; struct crypto_rng *rng; pctx = kzalloc(sizeof(*pctx), GFP_KERNEL); if (!pctx) return ERR_PTR(-ENOMEM); rng = crypto_alloc_rng(name, type, mask); if (IS_ERR(rng)) { kfree(pctx); return ERR_CAST(rng); } pctx->drng = rng; return pctx; } static void rng_release(void *private) { struct rng_parent_ctx *pctx = private; if (unlikely(!pctx)) return; crypto_free_rng(pctx->drng); kfree_sensitive(pctx->entropy); kfree_sensitive(pctx); } static void rng_sock_destruct(struct sock *sk) { struct alg_sock *ask = alg_sk(sk); struct rng_ctx *ctx = ask->private; rng_reset_addtl(ctx); sock_kfree_s(sk, ctx, ctx->len); af_alg_release_parent(sk); } static int rng_accept_parent(void *private, struct sock *sk) { struct rng_ctx *ctx; struct rng_parent_ctx *pctx = private; struct alg_sock *ask = alg_sk(sk); unsigned int len = sizeof(*ctx); ctx = sock_kmalloc(sk, len, GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->len = len; ctx->addtl = NULL; ctx->addtl_len = 0; /* * No seeding done at that point -- if multiple accepts are * done on one RNG instance, each resulting FD points to the same * state of the RNG. */ ctx->drng = pctx->drng; ask->private = ctx; sk->sk_destruct = rng_sock_destruct; /* * Non NULL pctx->entropy means that CAVP test has been initiated on * this socket, replace proto_ops algif_rng_ops with algif_rng_test_ops. */ if (IS_ENABLED(CONFIG_CRYPTO_USER_API_RNG_CAVP) && pctx->entropy) sk->sk_socket->ops = &algif_rng_test_ops; return 0; } static int rng_setkey(void *private, const u8 *seed, unsigned int seedlen) { struct rng_parent_ctx *pctx = private; /* * Check whether seedlen is of sufficient size is done in RNG * implementations. */ return crypto_rng_reset(pctx->drng, seed, seedlen); } static int __maybe_unused rng_setentropy(void *private, sockptr_t entropy, unsigned int len) { struct rng_parent_ctx *pctx = private; u8 *kentropy = NULL; if (!capable(CAP_SYS_ADMIN)) return -EACCES; if (pctx->entropy) return -EINVAL; if (len > MAXSIZE) return -EMSGSIZE; if (len) { kentropy = memdup_sockptr(entropy, len); if (IS_ERR(kentropy)) return PTR_ERR(kentropy); } crypto_rng_alg(pctx->drng)->set_ent(pctx->drng, kentropy, len); /* * Since rng doesn't perform any memory management for the entropy * buffer, save kentropy pointer to pctx now to free it after use. 
 */
	pctx->entropy = kentropy;
	return 0;
}

static const struct af_alg_type algif_type_rng = {
	.bind = rng_bind,
	.release = rng_release,
	.accept = rng_accept_parent,
	.setkey = rng_setkey,
#ifdef CONFIG_CRYPTO_USER_API_RNG_CAVP
	.setentropy = rng_setentropy,
#endif
	.ops = &algif_rng_ops,
	.name = "rng",
	.owner = THIS_MODULE
};

static int __init rng_init(void)
{
	return af_alg_register_type(&algif_type_rng);
}

static void __exit rng_exit(void)
{
	int err = af_alg_unregister_type(&algif_type_rng);
	BUG_ON(err);
}

module_init(rng_init);
module_exit(rng_exit);
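/*
 * Illustrative sketch, not part of the module: what the user-space side of
 * this interface can look like. An AF_ALG socket of type "rng" is bound to a
 * kernel RNG; accept() hands out an operation fd whose read() is serviced by
 * rng_recvmsg() above (at most MAXSIZE bytes per call). The algorithm name
 * "stdrng" is an assumption here; any RNG registered with the crypto API
 * works, and RNGs that require an explicit seed take it via
 * setsockopt(ALG_SET_KEY), which maps to rng_setkey().
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

int main(void)
{
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type   = "rng",
		.salg_name   = "stdrng",	/* assumed; pick any registered rng */
	};
	unsigned char buf[64];
	int tfm, op;
	ssize_t n;

	tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfm < 0 || bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		perror("AF_ALG rng");
		return 1;
	}

	/* Each accept() yields an fd sharing the parent's RNG state. */
	op = accept(tfm, NULL, 0);
	if (op < 0) {
		perror("accept");
		return 1;
	}

	n = read(op, buf, sizeof(buf));	/* ends up in rng_recvmsg() */
	if (n < 0) {
		perror("read");
		return 1;
	}

	for (ssize_t i = 0; i < n; i++)
		printf("%02x", buf[i]);
	printf("\n");

	close(op);
	close(tfm);
	return 0;
}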
914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 | // SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2014 Facebook. All rights reserved. */ #include <linux/sched.h> #include <linux/stacktrace.h> #include "messages.h" #include "ctree.h" #include "disk-io.h" #include "locking.h" #include "delayed-ref.h" #include "ref-verify.h" #include "fs.h" #include "accessors.h" /* * Used to keep track the roots and number of refs each root has for a given * bytenr. This just tracks the number of direct references, no shared * references. */ struct root_entry { u64 root_objectid; u64 num_refs; struct rb_node node; }; /* * These are meant to represent what should exist in the extent tree, these can * be used to verify the extent tree is consistent as these should all match * what the extent tree says. */ struct ref_entry { u64 root_objectid; u64 parent; u64 owner; u64 offset; u64 num_refs; struct rb_node node; }; #define MAX_TRACE 16 /* * Whenever we add/remove a reference we record the action. The action maps * back to the delayed ref action. We hold the ref we are changing in the * action so we can account for the history properly, and we record the root we * were called with since it could be different from ref_root. We also store * stack traces because that's how I roll. */ struct ref_action { int action; u64 root; struct ref_entry ref; struct list_head list; unsigned long trace[MAX_TRACE]; unsigned int trace_len; }; /* * One of these for every block we reference, it holds the roots and references * to it as well as all of the ref actions that have occurred to it. We never * free it until we unmount the file system in order to make sure re-allocations * are happening properly. 
*/ struct block_entry { u64 bytenr; u64 len; u64 num_refs; int metadata; int from_disk; struct rb_root roots; struct rb_root refs; struct rb_node node; struct list_head actions; }; static int block_entry_bytenr_key_cmp(const void *key, const struct rb_node *node) { const u64 *bytenr = key; const struct block_entry *entry = rb_entry(node, struct block_entry, node); if (entry->bytenr < *bytenr) return 1; else if (entry->bytenr > *bytenr) return -1; return 0; } static int block_entry_bytenr_cmp(struct rb_node *new, const struct rb_node *existing) { const struct block_entry *new_entry = rb_entry(new, struct block_entry, node); return block_entry_bytenr_key_cmp(&new_entry->bytenr, existing); } static struct block_entry *insert_block_entry(struct rb_root *root, struct block_entry *be) { struct rb_node *node; node = rb_find_add(&be->node, root, block_entry_bytenr_cmp); return rb_entry_safe(node, struct block_entry, node); } static struct block_entry *lookup_block_entry(struct rb_root *root, u64 bytenr) { struct rb_node *node; node = rb_find(&bytenr, root, block_entry_bytenr_key_cmp); return rb_entry_safe(node, struct block_entry, node); } static int root_entry_root_objectid_key_cmp(const void *key, const struct rb_node *node) { const u64 *objectid = key; const struct root_entry *entry = rb_entry(node, struct root_entry, node); if (entry->root_objectid < *objectid) return 1; else if (entry->root_objectid > *objectid) return -1; return 0; } static int root_entry_root_objectid_cmp(struct rb_node *new, const struct rb_node *existing) { const struct root_entry *new_entry = rb_entry(new, struct root_entry, node); return root_entry_root_objectid_key_cmp(&new_entry->root_objectid, existing); } static struct root_entry *insert_root_entry(struct rb_root *root, struct root_entry *re) { struct rb_node *node; node = rb_find_add(&re->node, root, root_entry_root_objectid_cmp); return rb_entry_safe(node, struct root_entry, node); } static int comp_refs(struct ref_entry *ref1, struct ref_entry *ref2) { if (ref1->root_objectid < ref2->root_objectid) return -1; if (ref1->root_objectid > ref2->root_objectid) return 1; if (ref1->parent < ref2->parent) return -1; if (ref1->parent > ref2->parent) return 1; if (ref1->owner < ref2->owner) return -1; if (ref1->owner > ref2->owner) return 1; if (ref1->offset < ref2->offset) return -1; if (ref1->offset > ref2->offset) return 1; return 0; } static int ref_entry_cmp(struct rb_node *new, const struct rb_node *existing) { struct ref_entry *new_entry = rb_entry(new, struct ref_entry, node); struct ref_entry *existing_entry = rb_entry(existing, struct ref_entry, node); return comp_refs(new_entry, existing_entry); } static struct ref_entry *insert_ref_entry(struct rb_root *root, struct ref_entry *ref) { struct rb_node *node; node = rb_find_add(&ref->node, root, ref_entry_cmp); return rb_entry_safe(node, struct ref_entry, node); } static struct root_entry *lookup_root_entry(struct rb_root *root, u64 objectid) { struct rb_node *node; node = rb_find(&objectid, root, root_entry_root_objectid_key_cmp); return rb_entry_safe(node, struct root_entry, node); } #ifdef CONFIG_STACKTRACE static void __save_stack_trace(struct ref_action *ra) { ra->trace_len = stack_trace_save(ra->trace, MAX_TRACE, 2); } static void __print_stack_trace(struct btrfs_fs_info *fs_info, struct ref_action *ra) { if (ra->trace_len == 0) { btrfs_err(fs_info, " ref-verify: no stacktrace"); return; } stack_trace_print(ra->trace, ra->trace_len, 2); } #else static inline void __save_stack_trace(struct ref_action *ra) { } 
static inline void __print_stack_trace(struct btrfs_fs_info *fs_info, struct ref_action *ra) { btrfs_err(fs_info, " ref-verify: no stacktrace support"); } #endif static void free_block_entry(struct block_entry *be) { struct root_entry *re; struct ref_entry *ref; struct ref_action *ra; struct rb_node *n; while ((n = rb_first(&be->roots))) { re = rb_entry(n, struct root_entry, node); rb_erase(&re->node, &be->roots); kfree(re); } while((n = rb_first(&be->refs))) { ref = rb_entry(n, struct ref_entry, node); rb_erase(&ref->node, &be->refs); kfree(ref); } while (!list_empty(&be->actions)) { ra = list_first_entry(&be->actions, struct ref_action, list); list_del(&ra->list); kfree(ra); } kfree(be); } static struct block_entry *add_block_entry(struct btrfs_fs_info *fs_info, u64 bytenr, u64 len, u64 root_objectid) { struct block_entry *be = NULL, *exist; struct root_entry *re = NULL; re = kzalloc(sizeof(struct root_entry), GFP_NOFS); be = kzalloc(sizeof(struct block_entry), GFP_NOFS); if (!be || !re) { kfree(re); kfree(be); return ERR_PTR(-ENOMEM); } be->bytenr = bytenr; be->len = len; re->root_objectid = root_objectid; re->num_refs = 0; spin_lock(&fs_info->ref_verify_lock); exist = insert_block_entry(&fs_info->block_tree, be); if (exist) { if (root_objectid) { struct root_entry *exist_re; exist_re = insert_root_entry(&exist->roots, re); if (exist_re) kfree(re); } else { kfree(re); } kfree(be); return exist; } be->num_refs = 0; be->metadata = 0; be->from_disk = 0; be->roots = RB_ROOT; be->refs = RB_ROOT; INIT_LIST_HEAD(&be->actions); if (root_objectid) insert_root_entry(&be->roots, re); else kfree(re); return be; } static int add_tree_block(struct btrfs_fs_info *fs_info, u64 ref_root, u64 parent, u64 bytenr, int level) { struct block_entry *be; struct root_entry *re; struct ref_entry *ref = NULL, *exist; ref = kmalloc(sizeof(struct ref_entry), GFP_NOFS); if (!ref) return -ENOMEM; if (parent) ref->root_objectid = 0; else ref->root_objectid = ref_root; ref->parent = parent; ref->owner = level; ref->offset = 0; ref->num_refs = 1; be = add_block_entry(fs_info, bytenr, fs_info->nodesize, ref_root); if (IS_ERR(be)) { kfree(ref); return PTR_ERR(be); } be->num_refs++; be->from_disk = 1; be->metadata = 1; if (!parent) { ASSERT(ref_root); re = lookup_root_entry(&be->roots, ref_root); ASSERT(re); re->num_refs++; } exist = insert_ref_entry(&be->refs, ref); if (exist) { exist->num_refs++; kfree(ref); } spin_unlock(&fs_info->ref_verify_lock); return 0; } static int add_shared_data_ref(struct btrfs_fs_info *fs_info, u64 parent, u32 num_refs, u64 bytenr, u64 num_bytes) { struct block_entry *be; struct ref_entry *ref; ref = kzalloc(sizeof(struct ref_entry), GFP_NOFS); if (!ref) return -ENOMEM; be = add_block_entry(fs_info, bytenr, num_bytes, 0); if (IS_ERR(be)) { kfree(ref); return PTR_ERR(be); } be->num_refs += num_refs; ref->parent = parent; ref->num_refs = num_refs; if (insert_ref_entry(&be->refs, ref)) { spin_unlock(&fs_info->ref_verify_lock); btrfs_err(fs_info, "existing shared ref when reading from disk?"); kfree(ref); return -EINVAL; } spin_unlock(&fs_info->ref_verify_lock); return 0; } static int add_extent_data_ref(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf, struct btrfs_extent_data_ref *dref, u64 bytenr, u64 num_bytes) { struct block_entry *be; struct ref_entry *ref; struct root_entry *re; u64 ref_root = btrfs_extent_data_ref_root(leaf, dref); u64 owner = btrfs_extent_data_ref_objectid(leaf, dref); u64 offset = btrfs_extent_data_ref_offset(leaf, dref); u32 num_refs = 
btrfs_extent_data_ref_count(leaf, dref); ref = kzalloc(sizeof(struct ref_entry), GFP_NOFS); if (!ref) return -ENOMEM; be = add_block_entry(fs_info, bytenr, num_bytes, ref_root); if (IS_ERR(be)) { kfree(ref); return PTR_ERR(be); } be->num_refs += num_refs; ref->parent = 0; ref->owner = owner; ref->root_objectid = ref_root; ref->offset = offset; ref->num_refs = num_refs; if (insert_ref_entry(&be->refs, ref)) { spin_unlock(&fs_info->ref_verify_lock); btrfs_err(fs_info, "existing ref when reading from disk?"); kfree(ref); return -EINVAL; } re = lookup_root_entry(&be->roots, ref_root); if (!re) { spin_unlock(&fs_info->ref_verify_lock); btrfs_err(fs_info, "missing root in new block entry?"); return -EINVAL; } re->num_refs += num_refs; spin_unlock(&fs_info->ref_verify_lock); return 0; } static int process_extent_item(struct btrfs_fs_info *fs_info, struct btrfs_path *path, struct btrfs_key *key, int slot, int *tree_block_level) { struct btrfs_extent_item *ei; struct btrfs_extent_inline_ref *iref; struct btrfs_extent_data_ref *dref; struct btrfs_shared_data_ref *sref; struct extent_buffer *leaf = path->nodes[0]; u32 item_size = btrfs_item_size(leaf, slot); unsigned long end, ptr; u64 offset, flags, count; int type; int ret = 0; ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item); flags = btrfs_extent_flags(leaf, ei); if ((key->type == BTRFS_EXTENT_ITEM_KEY) && flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { struct btrfs_tree_block_info *info; info = (struct btrfs_tree_block_info *)(ei + 1); *tree_block_level = btrfs_tree_block_level(leaf, info); iref = (struct btrfs_extent_inline_ref *)(info + 1); } else { if (key->type == BTRFS_METADATA_ITEM_KEY) *tree_block_level = key->offset; iref = (struct btrfs_extent_inline_ref *)(ei + 1); } ptr = (unsigned long)iref; end = (unsigned long)ei + item_size; while (ptr < end) { iref = (struct btrfs_extent_inline_ref *)ptr; type = btrfs_extent_inline_ref_type(leaf, iref); offset = btrfs_extent_inline_ref_offset(leaf, iref); switch (type) { case BTRFS_TREE_BLOCK_REF_KEY: ret = add_tree_block(fs_info, offset, 0, key->objectid, *tree_block_level); break; case BTRFS_SHARED_BLOCK_REF_KEY: ret = add_tree_block(fs_info, 0, offset, key->objectid, *tree_block_level); break; case BTRFS_EXTENT_DATA_REF_KEY: dref = (struct btrfs_extent_data_ref *)(&iref->offset); ret = add_extent_data_ref(fs_info, leaf, dref, key->objectid, key->offset); break; case BTRFS_SHARED_DATA_REF_KEY: sref = (struct btrfs_shared_data_ref *)(iref + 1); count = btrfs_shared_data_ref_count(leaf, sref); ret = add_shared_data_ref(fs_info, offset, count, key->objectid, key->offset); break; case BTRFS_EXTENT_OWNER_REF_KEY: if (!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) { btrfs_err(fs_info, "found extent owner ref without simple quotas enabled"); ret = -EINVAL; } break; default: btrfs_err(fs_info, "invalid key type in iref"); ret = -EINVAL; break; } if (ret) break; ptr += btrfs_extent_inline_ref_size(type); } return ret; } static int process_leaf(struct btrfs_root *root, struct btrfs_path *path, u64 *bytenr, u64 *num_bytes, int *tree_block_level) { struct btrfs_fs_info *fs_info = root->fs_info; struct extent_buffer *leaf = path->nodes[0]; struct btrfs_extent_data_ref *dref; struct btrfs_shared_data_ref *sref; u32 count; int i = 0, ret = 0; struct btrfs_key key; int nritems = btrfs_header_nritems(leaf); for (i = 0; i < nritems; i++) { btrfs_item_key_to_cpu(leaf, &key, i); switch (key.type) { case BTRFS_EXTENT_ITEM_KEY: *num_bytes = key.offset; fallthrough; case BTRFS_METADATA_ITEM_KEY: *bytenr = key.objectid; 
ret = process_extent_item(fs_info, path, &key, i, tree_block_level); break; case BTRFS_TREE_BLOCK_REF_KEY: ret = add_tree_block(fs_info, key.offset, 0, key.objectid, *tree_block_level); break; case BTRFS_SHARED_BLOCK_REF_KEY: ret = add_tree_block(fs_info, 0, key.offset, key.objectid, *tree_block_level); break; case BTRFS_EXTENT_DATA_REF_KEY: dref = btrfs_item_ptr(leaf, i, struct btrfs_extent_data_ref); ret = add_extent_data_ref(fs_info, leaf, dref, *bytenr, *num_bytes); break; case BTRFS_SHARED_DATA_REF_KEY: sref = btrfs_item_ptr(leaf, i, struct btrfs_shared_data_ref); count = btrfs_shared_data_ref_count(leaf, sref); ret = add_shared_data_ref(fs_info, key.offset, count, *bytenr, *num_bytes); break; default: break; } if (ret) break; } return ret; } /* Walk down to the leaf from the given level */ static int walk_down_tree(struct btrfs_root *root, struct btrfs_path *path, int level, u64 *bytenr, u64 *num_bytes, int *tree_block_level) { struct extent_buffer *eb; int ret = 0; while (level >= 0) { if (level) { eb = btrfs_read_node_slot(path->nodes[level], path->slots[level]); if (IS_ERR(eb)) return PTR_ERR(eb); btrfs_tree_read_lock(eb); path->nodes[level-1] = eb; path->slots[level-1] = 0; path->locks[level-1] = BTRFS_READ_LOCK; } else { ret = process_leaf(root, path, bytenr, num_bytes, tree_block_level); if (ret) break; } level--; } return ret; } /* Walk up to the next node that needs to be processed */ static int walk_up_tree(struct btrfs_path *path, int *level) { int l; for (l = 0; l < BTRFS_MAX_LEVEL; l++) { if (!path->nodes[l]) continue; if (l) { path->slots[l]++; if (path->slots[l] < btrfs_header_nritems(path->nodes[l])) { *level = l; return 0; } } btrfs_tree_unlock_rw(path->nodes[l], path->locks[l]); free_extent_buffer(path->nodes[l]); path->nodes[l] = NULL; path->slots[l] = 0; path->locks[l] = 0; } return 1; } static void dump_ref_action(struct btrfs_fs_info *fs_info, struct ref_action *ra) { btrfs_err(fs_info, " Ref action %d, root %llu, ref_root %llu, parent %llu, owner %llu, offset %llu, num_refs %llu", ra->action, ra->root, ra->ref.root_objectid, ra->ref.parent, ra->ref.owner, ra->ref.offset, ra->ref.num_refs); __print_stack_trace(fs_info, ra); } /* * Dumps all the information from the block entry to printk, it's going to be * awesome. */ static void dump_block_entry(struct btrfs_fs_info *fs_info, struct block_entry *be) { struct ref_entry *ref; struct root_entry *re; struct ref_action *ra; struct rb_node *n; btrfs_err(fs_info, "dumping block entry [%llu %llu], num_refs %llu, metadata %d, from disk %d", be->bytenr, be->len, be->num_refs, be->metadata, be->from_disk); for (n = rb_first(&be->refs); n; n = rb_next(n)) { ref = rb_entry(n, struct ref_entry, node); btrfs_err(fs_info, " ref root %llu, parent %llu, owner %llu, offset %llu, num_refs %llu", ref->root_objectid, ref->parent, ref->owner, ref->offset, ref->num_refs); } for (n = rb_first(&be->roots); n; n = rb_next(n)) { re = rb_entry(n, struct root_entry, node); btrfs_err(fs_info, " root entry %llu, num_refs %llu", re->root_objectid, re->num_refs); } list_for_each_entry(ra, &be->actions, list) dump_ref_action(fs_info, ra); } /* * Called when we modify a ref for a bytenr. * * This will add an action item to the given bytenr and do sanity checks to make * sure we haven't messed something up. If we are making a new allocation and * this block entry has history we will delete all previous actions as long as * our sanity checks pass as they are no longer needed. 
*/ int btrfs_ref_tree_mod(struct btrfs_fs_info *fs_info, const struct btrfs_ref *generic_ref) { struct ref_entry *ref = NULL, *exist; struct ref_action *ra = NULL; struct block_entry *be = NULL; struct root_entry *re = NULL; int action = generic_ref->action; int ret = 0; bool metadata; u64 bytenr = generic_ref->bytenr; u64 num_bytes = generic_ref->num_bytes; u64 parent = generic_ref->parent; u64 ref_root = 0; u64 owner = 0; u64 offset = 0; if (!btrfs_test_opt(fs_info, REF_VERIFY)) return 0; if (generic_ref->type == BTRFS_REF_METADATA) { if (!parent) ref_root = generic_ref->ref_root; owner = generic_ref->tree_ref.level; } else if (!parent) { ref_root = generic_ref->ref_root; owner = generic_ref->data_ref.objectid; offset = generic_ref->data_ref.offset; } metadata = owner < BTRFS_FIRST_FREE_OBJECTID; ref = kzalloc(sizeof(struct ref_entry), GFP_NOFS); ra = kmalloc(sizeof(struct ref_action), GFP_NOFS); if (!ra || !ref) { kfree(ref); kfree(ra); ret = -ENOMEM; goto out; } ref->parent = parent; ref->owner = owner; ref->root_objectid = ref_root; ref->offset = offset; ref->num_refs = (action == BTRFS_DROP_DELAYED_REF) ? -1 : 1; memcpy(&ra->ref, ref, sizeof(struct ref_entry)); /* * Save the extra info from the delayed ref in the ref action to make it * easier to figure out what is happening. The real ref's we add to the * ref tree need to reflect what we save on disk so it matches any * on-disk refs we pre-loaded. */ ra->ref.owner = owner; ra->ref.offset = offset; ra->ref.root_objectid = ref_root; __save_stack_trace(ra); INIT_LIST_HEAD(&ra->list); ra->action = action; ra->root = generic_ref->real_root; /* * This is an allocation, preallocate the block_entry in case we haven't * used it before. */ ret = -EINVAL; if (action == BTRFS_ADD_DELAYED_EXTENT) { /* * For subvol_create we'll just pass in whatever the parent root * is and the new root objectid, so let's not treat the passed * in root as if it really has a ref for this bytenr. */ be = add_block_entry(fs_info, bytenr, num_bytes, ref_root); if (IS_ERR(be)) { kfree(ref); kfree(ra); ret = PTR_ERR(be); goto out; } be->num_refs++; if (metadata) be->metadata = 1; if (be->num_refs != 1) { btrfs_err(fs_info, "re-allocated a block that still has references to it!"); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); kfree(ref); kfree(ra); goto out_unlock; } while (!list_empty(&be->actions)) { struct ref_action *tmp; tmp = list_first_entry(&be->actions, struct ref_action, list); list_del(&tmp->list); kfree(tmp); } } else { struct root_entry *tmp; if (!parent) { re = kmalloc(sizeof(struct root_entry), GFP_NOFS); if (!re) { kfree(ref); kfree(ra); ret = -ENOMEM; goto out; } /* * This is the root that is modifying us, so it's the * one we want to lookup below when we modify the * re->num_refs. 
*/ ref_root = generic_ref->real_root; re->root_objectid = generic_ref->real_root; re->num_refs = 0; } spin_lock(&fs_info->ref_verify_lock); be = lookup_block_entry(&fs_info->block_tree, bytenr); if (!be) { btrfs_err(fs_info, "trying to do action %d to bytenr %llu num_bytes %llu but there is no existing entry!", action, bytenr, num_bytes); dump_ref_action(fs_info, ra); kfree(ref); kfree(ra); kfree(re); goto out_unlock; } else if (be->num_refs == 0) { btrfs_err(fs_info, "trying to do action %d for a bytenr that has 0 total references", action); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); kfree(ref); kfree(ra); kfree(re); goto out_unlock; } if (!parent) { tmp = insert_root_entry(&be->roots, re); if (tmp) { kfree(re); re = tmp; } } } exist = insert_ref_entry(&be->refs, ref); if (exist) { if (action == BTRFS_DROP_DELAYED_REF) { if (exist->num_refs == 0) { btrfs_err(fs_info, "dropping a ref for a existing root that doesn't have a ref on the block"); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); kfree(ref); kfree(ra); goto out_unlock; } exist->num_refs--; if (exist->num_refs == 0) { rb_erase(&exist->node, &be->refs); kfree(exist); } } else if (!be->metadata) { exist->num_refs++; } else { btrfs_err(fs_info, "attempting to add another ref for an existing ref on a tree block"); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); kfree(ref); kfree(ra); goto out_unlock; } kfree(ref); } else { if (action == BTRFS_DROP_DELAYED_REF) { btrfs_err(fs_info, "dropping a ref for a root that doesn't have a ref on the block"); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); rb_erase(&ref->node, &be->refs); kfree(ref); kfree(ra); goto out_unlock; } } if (!parent && !re) { re = lookup_root_entry(&be->roots, ref_root); if (!re) { /* * This shouldn't happen because we will add our re * above when we lookup the be with !parent, but just in * case catch this case so we don't panic because I * didn't think of some other corner case. 
*/ btrfs_err(fs_info, "failed to find root %llu for %llu", generic_ref->real_root, be->bytenr); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); kfree(ra); goto out_unlock; } } if (action == BTRFS_DROP_DELAYED_REF) { if (re) re->num_refs--; be->num_refs--; } else if (action == BTRFS_ADD_DELAYED_REF) { be->num_refs++; if (re) re->num_refs++; } list_add_tail(&ra->list, &be->actions); ret = 0; out_unlock: spin_unlock(&fs_info->ref_verify_lock); out: if (ret) { btrfs_free_ref_cache(fs_info); btrfs_clear_opt(fs_info->mount_opt, REF_VERIFY); } return ret; } /* Free up the ref cache */ void btrfs_free_ref_cache(struct btrfs_fs_info *fs_info) { struct block_entry *be; struct rb_node *n; if (!btrfs_test_opt(fs_info, REF_VERIFY)) return; spin_lock(&fs_info->ref_verify_lock); while ((n = rb_first(&fs_info->block_tree))) { be = rb_entry(n, struct block_entry, node); rb_erase(&be->node, &fs_info->block_tree); free_block_entry(be); cond_resched_lock(&fs_info->ref_verify_lock); } spin_unlock(&fs_info->ref_verify_lock); } void btrfs_free_ref_tree_range(struct btrfs_fs_info *fs_info, u64 start, u64 len) { struct block_entry *be = NULL, *entry; struct rb_node *n; if (!btrfs_test_opt(fs_info, REF_VERIFY)) return; spin_lock(&fs_info->ref_verify_lock); n = fs_info->block_tree.rb_node; while (n) { entry = rb_entry(n, struct block_entry, node); if (entry->bytenr < start) { n = n->rb_right; } else if (entry->bytenr > start) { n = n->rb_left; } else { be = entry; break; } /* We want to get as close to start as possible */ if (be == NULL || (entry->bytenr < start && be->bytenr > start) || (entry->bytenr < start && entry->bytenr > be->bytenr)) be = entry; } /* * Could have an empty block group, maybe have something to check for * this case to verify we were actually empty? */ if (!be) { spin_unlock(&fs_info->ref_verify_lock); return; } n = &be->node; while (n) { be = rb_entry(n, struct block_entry, node); n = rb_next(n); if (be->bytenr < start && be->bytenr + be->len > start) { btrfs_err(fs_info, "block entry overlaps a block group [%llu,%llu]!", start, len); dump_block_entry(fs_info, be); continue; } if (be->bytenr < start) continue; if (be->bytenr >= start + len) break; if (be->bytenr + be->len > start + len) { btrfs_err(fs_info, "block entry overlaps a block group [%llu,%llu]!", start, len); dump_block_entry(fs_info, be); } rb_erase(&be->node, &fs_info->block_tree); free_block_entry(be); } spin_unlock(&fs_info->ref_verify_lock); } /* Walk down all roots and build the ref tree, meant to be called at mount */ int btrfs_build_ref_tree(struct btrfs_fs_info *fs_info) { struct btrfs_root *extent_root; struct btrfs_path *path; struct extent_buffer *eb; int tree_block_level = 0; u64 bytenr = 0, num_bytes = 0; int ret, level; if (!btrfs_test_opt(fs_info, REF_VERIFY)) return 0; path = btrfs_alloc_path(); if (!path) return -ENOMEM; extent_root = btrfs_extent_root(fs_info, 0); eb = btrfs_read_lock_root_node(extent_root); level = btrfs_header_level(eb); path->nodes[level] = eb; path->slots[level] = 0; path->locks[level] = BTRFS_READ_LOCK; while (1) { /* * We have to keep track of the bytenr/num_bytes we last hit * because we could have run out of space for an inline ref, and * would have had to added a ref key item which may appear on a * different leaf from the original extent item. 
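		 * (For example, the EXTENT_ITEM for a bytenr can be the
		 * last item in one leaf while its TREE_BLOCK_REF or
		 * EXTENT_DATA_REF keys continue on the next leaf, so
		 * process_leaf() must be able to resume mid-extent.)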
*/ ret = walk_down_tree(extent_root, path, level, &bytenr, &num_bytes, &tree_block_level); if (ret) break; ret = walk_up_tree(path, &level); if (ret < 0) break; if (ret > 0) { ret = 0; break; } } if (ret) { btrfs_free_ref_cache(fs_info); btrfs_clear_opt(fs_info->mount_opt, REF_VERIFY); } btrfs_free_path(path); return ret; }
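/*
 * A minimal userspace sketch (all names hypothetical, not kernel code) of
 * the traversal pattern used by walk_down_tree()/walk_up_tree() above: an
 * explicit path of (node, slot) pairs per level replaces recursion, so the
 * walk can stop after any leaf and resume exactly where it left off.
 */
#include <stdio.h>

#define MAX_LEVEL 8

struct tnode {
	int nritems;
	struct tnode **children;	/* NULL for leaves */
	const int *items;		/* valid for leaves */
};

struct tpath {
	struct tnode *nodes[MAX_LEVEL];
	int slots[MAX_LEVEL];
};

static void process_leaf_items(const struct tnode *leaf)
{
	for (int i = 0; i < leaf->nritems; i++)
		printf("item %d\n", leaf->items[i]);
}

/* Descend from 'level' to the leftmost unvisited leaf, recording the path. */
static void walk_down(struct tpath *p, int level)
{
	while (level > 0) {
		struct tnode *n = p->nodes[level];

		p->nodes[level - 1] = n->children[p->slots[level]];
		p->slots[level - 1] = 0;
		level--;
	}
	process_leaf_items(p->nodes[0]);
}

/* Climb until some ancestor has an unvisited slot; returns 1 when done. */
static int walk_up(struct tpath *p, int *level)
{
	for (int l = 0; l < MAX_LEVEL; l++) {
		if (!p->nodes[l])
			continue;
		if (l && ++p->slots[l] < p->nodes[l]->nritems) {
			*level = l;
			return 0;
		}
		p->nodes[l] = NULL;
		p->slots[l] = 0;
	}
	return 1;
}

int main(void)
{
	const int a[] = { 1, 2 }, b[] = { 3, 4 };
	struct tnode la = { 2, NULL, a }, lb = { 2, NULL, b };
	struct tnode *kids[] = { &la, &lb };
	struct tnode root = { 2, kids, NULL };
	struct tpath p = { .nodes = { [1] = &root } };
	int level = 1;

	while (1) {
		walk_down(&p, level);		/* prints 1 2, then 3 4 */
		if (walk_up(&p, &level))
			break;
	}
	return 0;
}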
/*
 * Copyright 2011 Red Hat, Inc.
 * Copyright © 2014 The Chromium OS Authors
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software")
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sub
 * license, and/or sell copies of the Software, and to permit persons to whom
 * the Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER
 * IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * Authors:
 *	Adam Jackson <ajax@redhat.com>
 *	Ben Widawsky <ben@bwidawsk.net>
 */

/*
 * This is vgem, a (non-hardware-backed) GEM service. This is used by Mesa's
 * software renderer and the X server for efficient buffer sharing.
 */

#include <linux/dma-buf.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/shmem_fs.h>
#include <linux/vmalloc.h>

#include <drm/drm_drv.h>
#include <drm/drm_file.h>
#include <drm/drm_gem_shmem_helper.h>
#include <drm/drm_ioctl.h>
#include <drm/drm_managed.h>
#include <drm/drm_prime.h>

#include "vgem_drv.h"

#define DRIVER_NAME	"vgem"
#define DRIVER_DESC	"Virtual GEM provider"
#define DRIVER_MAJOR	1
#define DRIVER_MINOR	0

static struct vgem_device {
	struct drm_device drm;
	struct platform_device *platform;
} *vgem_device;

static int vgem_open(struct drm_device *dev, struct drm_file *file)
{
	struct vgem_file *vfile;
	int ret;

	vfile = kzalloc(sizeof(*vfile), GFP_KERNEL);
	if (!vfile)
		return -ENOMEM;

	file->driver_priv = vfile;

	ret = vgem_fence_open(vfile);
	if (ret) {
		kfree(vfile);
		return ret;
	}

	return 0;
}

static void vgem_postclose(struct drm_device *dev, struct drm_file *file)
{
	struct vgem_file *vfile = file->driver_priv;

	vgem_fence_close(vfile);
	kfree(vfile);
}

static struct drm_ioctl_desc vgem_ioctls[] = {
	DRM_IOCTL_DEF_DRV(VGEM_FENCE_ATTACH, vgem_fence_attach_ioctl, DRM_RENDER_ALLOW),
	DRM_IOCTL_DEF_DRV(VGEM_FENCE_SIGNAL, vgem_fence_signal_ioctl, DRM_RENDER_ALLOW),
};

DEFINE_DRM_GEM_FOPS(vgem_driver_fops);

static struct drm_gem_object *vgem_gem_create_object(struct drm_device *dev,
						     size_t size)
{
	struct drm_gem_shmem_object *obj;

	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
	if (!obj)
		return ERR_PTR(-ENOMEM);

	/*
	 * vgem doesn't have any begin/end cpu access ioctls, therefore it
	 * must use coherent memory or dma-buf sharing just won't work.
	 */
	obj->map_wc = true;

	return &obj->base;
}

static const struct drm_driver vgem_driver = {
	.driver_features	= DRIVER_GEM | DRIVER_RENDER,
	.open			= vgem_open,
	.postclose		= vgem_postclose,
	.ioctls			= vgem_ioctls,
	.num_ioctls		= ARRAY_SIZE(vgem_ioctls),
	.fops			= &vgem_driver_fops,

	DRM_GEM_SHMEM_DRIVER_OPS,
	.gem_create_object	= vgem_gem_create_object,

	.name	= DRIVER_NAME,
	.desc	= DRIVER_DESC,
	.major	= DRIVER_MAJOR,
	.minor	= DRIVER_MINOR,
};

static int __init vgem_init(void)
{
	int ret;
	struct platform_device *pdev;

	pdev = platform_device_register_simple("vgem", -1, NULL, 0);
	if (IS_ERR(pdev))
		return PTR_ERR(pdev);

	if (!devres_open_group(&pdev->dev, NULL, GFP_KERNEL)) {
		ret = -ENOMEM;
		goto out_unregister;
	}

	dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));

	vgem_device = devm_drm_dev_alloc(&pdev->dev, &vgem_driver,
					 struct vgem_device, drm);
	if (IS_ERR(vgem_device)) {
		ret = PTR_ERR(vgem_device);
		goto out_devres;
	}
	vgem_device->platform = pdev;

	/* Final step: expose the device/driver to userspace */
	ret = drm_dev_register(&vgem_device->drm, 0);
	if (ret)
		goto out_devres;

	return 0;

out_devres:
	devres_release_group(&pdev->dev, NULL);
out_unregister:
	platform_device_unregister(pdev);
	return ret;
}

static void __exit vgem_exit(void)
{
	struct platform_device *pdev = vgem_device->platform;

	drm_dev_unregister(&vgem_device->drm);
	devres_release_group(&pdev->dev, NULL);
	platform_device_unregister(pdev);
}

module_init(vgem_init);
module_exit(vgem_exit);

MODULE_AUTHOR("Red Hat, Inc.");
MODULE_AUTHOR("Intel Corporation");
MODULE_DESCRIPTION(DRIVER_DESC);
MODULE_LICENSE("GPL and additional rights");
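/*
 * A hedged userspace sketch of exercising vgem: DRM_GEM_SHMEM_DRIVER_OPS
 * above wires up the standard dumb-buffer ioctls, so a client can allocate
 * a buffer, map it, and fill it from the CPU. The device node path and the
 * uapi header locations are assumptions; adjust them for your system.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/drm.h>
#include <drm/drm_mode.h>

int main(void)
{
	/* Assumption: vgem happens to be card0; probe /dev/dri/card* in practice. */
	int fd = open("/dev/dri/card0", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	struct drm_mode_create_dumb create = {
		.width = 640, .height = 480, .bpp = 32,
	};
	if (ioctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &create)) {
		perror("DRM_IOCTL_MODE_CREATE_DUMB");
		return 1;
	}

	struct drm_mode_map_dumb map = { .handle = create.handle };
	if (ioctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &map)) {
		perror("DRM_IOCTL_MODE_MAP_DUMB");
		return 1;
	}

	void *ptr = mmap(NULL, create.size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, map.offset);
	if (ptr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(ptr, 0xff, create.size);	/* plain CPU access to the shmem pages */

	munmap(ptr, create.size);
	close(fd);
	return 0;
}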
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __LINUX_PAGE_EXT_H
#define __LINUX_PAGE_EXT_H

#include <linux/types.h>
#include <linux/mmzone.h>
#include <linux/stacktrace.h>

struct pglist_data;

#ifdef CONFIG_PAGE_EXTENSION
/**
 * struct page_ext_operations - per page_ext client operations
 * @offset: Offset to the client's data within page_ext. Offset is returned to
 *          the client by page_ext_init.
 * @size: The size of the client data within page_ext.
 * @need: Function that returns true if client requires page_ext.
 * @init: (optional) Called to initialize client once page_exts are allocated.
 * @need_shared_flags: True when client is using shared page_ext->flags
 *                     field.
 *
 * Each Page Extension client must define page_ext_operations in
 * page_ext_ops array.
 */
struct page_ext_operations {
	size_t offset;
	size_t size;
	bool (*need)(void);
	void (*init)(void);
	bool need_shared_flags;
};

/*
 * The page_ext_flags users must set need_shared_flags to true.
 */
enum page_ext_flags {
	PAGE_EXT_OWNER,
	PAGE_EXT_OWNER_ALLOCATED,
#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
	PAGE_EXT_YOUNG,
	PAGE_EXT_IDLE,
#endif
};

/*
 * Page Extension can be considered as an extended mem_map.
 * A page_ext page is associated with every page descriptor. The
 * page_ext helps us add more information about the page.
 * All page_ext are allocated at boot or memory hotplug event,
 * then the page_ext for pfn always exists.
 */
struct page_ext {
	unsigned long flags;
};

extern bool early_page_ext;
extern unsigned long page_ext_size;
extern void pgdat_page_ext_init(struct pglist_data *pgdat);

static inline bool early_page_ext_enabled(void)
{
	return early_page_ext;
}

#ifdef CONFIG_SPARSEMEM
static inline void page_ext_init_flatmem(void)
{
}
extern void page_ext_init(void);
static inline void page_ext_init_flatmem_late(void)
{
}

static inline bool page_ext_iter_next_fast_possible(unsigned long next_pfn)
{
	/*
	 * page_ext is allocated per memory section. Once we cross a
	 * memory section, we have to fetch the new pointer.
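	 *
	 * Worked numbers (x86-64 defaults, assumed here): with 128 MiB
	 * sections and 4 KiB pages, PAGES_PER_SECTION is 32768, so the
	 * modulo below is non-zero for 32767 of every 32768 pfns and the
	 * iterator almost always takes the cheap pointer-increment path.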
	 */
	return next_pfn % PAGES_PER_SECTION;
}
#else
extern void page_ext_init_flatmem(void);
extern void page_ext_init_flatmem_late(void);
static inline void page_ext_init(void)
{
}

static inline bool page_ext_iter_next_fast_possible(unsigned long next_pfn)
{
	return true;
}
#endif

extern struct page_ext *page_ext_get(const struct page *page);
extern void page_ext_put(struct page_ext *page_ext);
extern struct page_ext *page_ext_lookup(unsigned long pfn);

static inline void *page_ext_data(struct page_ext *page_ext,
				  struct page_ext_operations *ops)
{
	return (void *)(page_ext) + ops->offset;
}

static inline struct page_ext *page_ext_next(struct page_ext *curr)
{
	void *next = curr;

	next += page_ext_size;
	return next;
}

struct page_ext_iter {
	unsigned long index;
	unsigned long start_pfn;
	struct page_ext *page_ext;
};

/**
 * page_ext_iter_begin() - Prepare for iterating through page extensions.
 * @iter: page extension iterator.
 * @pfn: PFN of the page we're interested in.
 *
 * Must be called with RCU read lock taken.
 *
 * Return: NULL if no page_ext exists for this page.
 */
static inline struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter,
						   unsigned long pfn)
{
	iter->index = 0;
	iter->start_pfn = pfn;
	iter->page_ext = page_ext_lookup(pfn);

	return iter->page_ext;
}

/**
 * page_ext_iter_next() - Get next page extension
 * @iter: page extension iterator.
 *
 * Must be called with RCU read lock taken.
 *
 * Return: NULL if no next page_ext exists.
 */
static inline struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
{
	unsigned long pfn;

	if (WARN_ON_ONCE(!iter->page_ext))
		return NULL;

	iter->index++;
	pfn = iter->start_pfn + iter->index;

	if (page_ext_iter_next_fast_possible(pfn))
		iter->page_ext = page_ext_next(iter->page_ext);
	else
		iter->page_ext = page_ext_lookup(pfn);

	return iter->page_ext;
}

/**
 * page_ext_iter_get() - Get current page extension
 * @iter: page extension iterator.
 *
 * Return: NULL if no page_ext exists for this iterator.
 */
static inline struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
{
	return iter->page_ext;
}

/**
 * for_each_page_ext(): iterate through page_ext objects.
 * @__page: the page we're interested in
 * @__pgcount: how many pages to iterate through
 * @__page_ext: struct page_ext pointer where the current page_ext
 *              object is returned
 * @__iter: struct page_ext_iter object (defined in the stack)
 *
 * IMPORTANT: must be called with RCU read lock taken.
 */
#define for_each_page_ext(__page, __pgcount, __page_ext, __iter)	\
	for (__page_ext = page_ext_iter_begin(&__iter, page_to_pfn(__page));\
	     __page_ext && __iter.index < __pgcount;			\
	     __page_ext = page_ext_iter_next(&__iter))

#else /* !CONFIG_PAGE_EXTENSION */
struct page_ext;

static inline bool early_page_ext_enabled(void)
{
	return false;
}

static inline void pgdat_page_ext_init(struct pglist_data *pgdat)
{
}

static inline void page_ext_init(void)
{
}

static inline void page_ext_init_flatmem_late(void)
{
}

static inline void page_ext_init_flatmem(void)
{
}

static inline struct page_ext *page_ext_get(const struct page *page)
{
	return NULL;
}

static inline void page_ext_put(struct page_ext *page_ext)
{
}
#endif /* CONFIG_PAGE_EXTENSION */
#endif /* __LINUX_PAGE_EXT_H */
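/*
 * A standalone sketch of the registration pattern this header implies (all
 * names hypothetical, compiled as ordinary userspace C): each client
 * publishes only its payload size, the core assigns the offset and grows one
 * shared extension blob, and page_ext_data()-style pointer arithmetic hands
 * every client its own slice.
 */
#include <stdio.h>
#include <stdlib.h>

struct ext_ops {
	size_t offset;		/* assigned by the core, like page_ext_init() */
	size_t size;		/* declared by the client */
};

static size_t ext_size;		/* plays the role of page_ext_size */

static void register_ext_client(struct ext_ops *ops)
{
	ops->offset = ext_size;
	ext_size += ops->size;
}

static void *ext_data(void *ext, const struct ext_ops *ops)
{
	return (char *)ext + ops->offset;	/* cf. page_ext_data() */
}

struct owner_info {		/* hypothetical client payload */
	int order;
};

int main(void)
{
	struct ext_ops owner_ops = { .size = sizeof(struct owner_info) };

	register_ext_client(&owner_ops);

	/* One blob per tracked object, sized after all clients registered. */
	void *ext = calloc(1, ext_size);
	if (!ext)
		return 1;

	struct owner_info *info = ext_data(ext, &owner_ops);
	info->order = 3;
	printf("order=%d\n", info->order);

	free(ext);
	return 0;
}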
// SPDX-License-Identifier: GPL-2.0

#include <linux/sysctl.h>
#include <net/lwtunnel.h>
#include <net/netfilter/nf_hooks_lwtunnel.h>
#include <linux/netfilter.h>

#include "nf_internals.h"

static inline int nf_hooks_lwtunnel_get(void)
{
	if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
		return 1;
	else
		return 0;
}

static inline int nf_hooks_lwtunnel_set(int enable)
{
	if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled)) {
		if (!enable)
			return -EBUSY;
	} else if (enable) {
		static_branch_enable(&nf_hooks_lwtunnel_enabled);
	}

	return 0;
}

#ifdef CONFIG_SYSCTL
int nf_hooks_lwtunnel_sysctl_handler(const struct ctl_table *table, int write,
				     void *buffer, size_t *lenp, loff_t *ppos)
{
	int proc_nf_hooks_lwtunnel_enabled = 0;
	struct ctl_table tmp = {
		.procname = table->procname,
		.data = &proc_nf_hooks_lwtunnel_enabled,
		.maxlen = sizeof(int),
		.mode = table->mode,
		.extra1 = SYSCTL_ZERO,
		.extra2 = SYSCTL_ONE,
	};
	int ret;

	if (!write)
		proc_nf_hooks_lwtunnel_enabled = nf_hooks_lwtunnel_get();

	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);

	if (write && ret == 0)
		ret = nf_hooks_lwtunnel_set(proc_nf_hooks_lwtunnel_enabled);

	return ret;
}
EXPORT_SYMBOL_GPL(nf_hooks_lwtunnel_sysctl_handler);

static struct ctl_table nf_lwtunnel_sysctl_table[] = {
	{
		.procname	= "nf_hooks_lwtunnel",
		.data		= NULL,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= nf_hooks_lwtunnel_sysctl_handler,
	},
};

static int __net_init nf_lwtunnel_net_init(struct net *net)
{
	struct ctl_table_header *hdr;
	struct ctl_table *table;

	table = nf_lwtunnel_sysctl_table;
	if (!net_eq(net, &init_net)) {
		table = kmemdup(nf_lwtunnel_sysctl_table,
				sizeof(nf_lwtunnel_sysctl_table),
				GFP_KERNEL);
		if (!table)
			goto err_alloc;
	}

	hdr = register_net_sysctl_sz(net, "net/netfilter", table,
				     ARRAY_SIZE(nf_lwtunnel_sysctl_table));
	if (!hdr)
		goto err_reg;

	net->nf.nf_lwtnl_dir_header = hdr;

	return 0;

err_reg:
	if (!net_eq(net, &init_net))
		kfree(table);
err_alloc:
	return -ENOMEM;
}

static void __net_exit nf_lwtunnel_net_exit(struct net *net)
{
	const struct ctl_table *table;

	table = net->nf.nf_lwtnl_dir_header->ctl_table_arg;
	unregister_net_sysctl_table(net->nf.nf_lwtnl_dir_header);
	if (!net_eq(net, &init_net))
		kfree(table);
}

static struct pernet_operations nf_lwtunnel_net_ops = {
	.init = nf_lwtunnel_net_init,
	.exit = nf_lwtunnel_net_exit,
};

int __init netfilter_lwtunnel_init(void)
{
	return register_pernet_subsys(&nf_lwtunnel_net_ops);
}

void netfilter_lwtunnel_fini(void)
{
	unregister_pernet_subsys(&nf_lwtunnel_net_ops);
}
#else
int __init netfilter_lwtunnel_init(void) { return 0; }
void netfilter_lwtunnel_fini(void) {}
#endif /* CONFIG_SYSCTL */
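/*
 * A hedged userspace sketch of flipping the knob registered above; the
 * /proc path follows from register_net_sysctl_sz(net, "net/netfilter", ...)
 * plus the "nf_hooks_lwtunnel" procname. Note the one-way semantics:
 * nf_hooks_lwtunnel_set() refuses to disable an enabled static branch, so
 * writing "0" after "1" fails with EBUSY.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/net/netfilter/nf_hooks_lwtunnel", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	if (fputs("1\n", f) == EOF || fclose(f) == EOF) {
		perror("write");
		return 1;
	}
	return 0;
}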
// SPDX-License-Identifier: GPL-2.0-or-later /* * OV519 driver * * Copyright (C) 2008-2011 Jean-François Moine <moinejf@free.fr> * Copyright (C) 2009 Hans de Goede <hdegoede@redhat.com> * * This module is adapted from the ov51x-jpeg package, which itself * was adapted from the ov511 driver. * * Original copyright for the ov511 driver is: * * Copyright (c) 1999-2006 Mark W. McClelland * Support for OV519, OV8610 Copyright (c) 2003 Joerg Heckenbach * Many improvements by Bret Wallach <bwallac1@san.rr.com> * Color fixes by Orion Sky Lawlor <olawlor@acm.org> (2/26/2000) * OV7620 fixes by Charl P.
Botha <cpbotha@ieee.org> * Changes by Claudio Matsuoka <claudio@conectiva.com> * * ov51x-jpeg original copyright is: * * Copyright (c) 2004-2007 Romain Beauxis <toots@rastageeks.org> * Support for OV7670 sensors was contributed by Sam Skipsey <aoanla@yahoo.com> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #define MODULE_NAME "ov519" #include <linux/input.h> #include "gspca.h" /* The jpeg_hdr is used by w996Xcf only */ /* The CONEX_CAM define for jpeg.h needs renaming, now its used here too */ #define CONEX_CAM #include "jpeg.h" MODULE_AUTHOR("Jean-Francois Moine <http://moinejf.free.fr>"); MODULE_DESCRIPTION("OV519 USB Camera Driver"); MODULE_LICENSE("GPL"); /* global parameters */ static int frame_rate; /* Number of times to retry a failed I2C transaction. Increase this if you * are getting "Failed to read sensor ID..." */ static int i2c_detect_tries = 10; /* ov519 device descriptor */ struct sd { struct gspca_dev gspca_dev; /* !! must be the first item */ struct v4l2_ctrl *jpegqual; struct v4l2_ctrl *freq; struct { /* h/vflip control cluster */ struct v4l2_ctrl *hflip; struct v4l2_ctrl *vflip; }; struct { /* autobrightness/brightness control cluster */ struct v4l2_ctrl *autobright; struct v4l2_ctrl *brightness; }; u8 revision; u8 packet_nr; char bridge; #define BRIDGE_OV511 0 #define BRIDGE_OV511PLUS 1 #define BRIDGE_OV518 2 #define BRIDGE_OV518PLUS 3 #define BRIDGE_OV519 4 /* = ov530 */ #define BRIDGE_OVFX2 5 #define BRIDGE_W9968CF 6 #define BRIDGE_MASK 7 char invert_led; #define BRIDGE_INVERT_LED 8 char snapshot_pressed; char snapshot_needs_reset; /* Determined by sensor type */ u8 sif; #define QUALITY_MIN 50 #define QUALITY_MAX 70 #define QUALITY_DEF 50 u8 stopped; /* Streaming is temporarily paused */ u8 first_frame; u8 frame_rate; /* current Framerate */ u8 clockdiv; /* clockdiv override */ s8 sensor; /* Type of image sensor chip (SEN_*) */ u8 sensor_addr; u16 sensor_width; u16 sensor_height; s16 sensor_reg_cache[256]; u8 jpeg_hdr[JPEG_HDR_SZ]; }; enum sensors { SEN_OV2610, SEN_OV2610AE, SEN_OV3610, SEN_OV6620, SEN_OV6630, SEN_OV66308AF, SEN_OV7610, SEN_OV7620, SEN_OV7620AE, SEN_OV7640, SEN_OV7648, SEN_OV7660, SEN_OV7670, SEN_OV76BE, SEN_OV8610, SEN_OV9600, }; /* Note this is a bit of a hack, but the w9968cf driver needs the code for all the ov sensors which is already present here. When we have the time we really should move the sensor drivers to v4l2 sub drivers. 
*/ #include "w996Xcf.c" /* table of the disabled controls */ struct ctrl_valid { unsigned int has_brightness:1; unsigned int has_contrast:1; unsigned int has_exposure:1; unsigned int has_autogain:1; unsigned int has_sat:1; unsigned int has_hvflip:1; unsigned int has_autobright:1; unsigned int has_freq:1; }; static const struct ctrl_valid valid_controls[] = { [SEN_OV2610] = { .has_exposure = 1, .has_autogain = 1, }, [SEN_OV2610AE] = { .has_exposure = 1, .has_autogain = 1, }, [SEN_OV3610] = { /* No controls */ }, [SEN_OV6620] = { .has_brightness = 1, .has_contrast = 1, .has_sat = 1, .has_autobright = 1, .has_freq = 1, }, [SEN_OV6630] = { .has_brightness = 1, .has_contrast = 1, .has_sat = 1, .has_autobright = 1, .has_freq = 1, }, [SEN_OV66308AF] = { .has_brightness = 1, .has_contrast = 1, .has_sat = 1, .has_autobright = 1, .has_freq = 1, }, [SEN_OV7610] = { .has_brightness = 1, .has_contrast = 1, .has_sat = 1, .has_autobright = 1, .has_freq = 1, }, [SEN_OV7620] = { .has_brightness = 1, .has_contrast = 1, .has_sat = 1, .has_autobright = 1, .has_freq = 1, }, [SEN_OV7620AE] = { .has_brightness = 1, .has_contrast = 1, .has_sat = 1, .has_autobright = 1, .has_freq = 1, }, [SEN_OV7640] = { .has_brightness = 1, .has_sat = 1, .has_freq = 1, }, [SEN_OV7648] = { .has_brightness = 1, .has_sat = 1, .has_freq = 1, }, [SEN_OV7660] = { .has_brightness = 1, .has_contrast = 1, .has_sat = 1, .has_hvflip = 1, .has_freq = 1, }, [SEN_OV7670] = { .has_brightness = 1, .has_contrast = 1, .has_hvflip = 1, .has_freq = 1, }, [SEN_OV76BE] = { .has_brightness = 1, .has_contrast = 1, .has_sat = 1, .has_autobright = 1, .has_freq = 1, }, [SEN_OV8610] = { .has_brightness = 1, .has_contrast = 1, .has_sat = 1, .has_autobright = 1, }, [SEN_OV9600] = { .has_exposure = 1, .has_autogain = 1, }, }; static const struct v4l2_pix_format ov519_vga_mode[] = { {320, 240, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 320, .sizeimage = 320 * 240 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 1}, {640, 480, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 640, .sizeimage = 640 * 480 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 0}, }; static const struct v4l2_pix_format ov519_sif_mode[] = { {160, 120, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 160, .sizeimage = 160 * 120 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 3}, {176, 144, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 176, .sizeimage = 176 * 144 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 1}, {320, 240, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 320, .sizeimage = 320 * 240 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 2}, {352, 288, V4L2_PIX_FMT_JPEG, V4L2_FIELD_NONE, .bytesperline = 352, .sizeimage = 352 * 288 * 3 / 8 + 590, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 0}, }; /* Note some of the sizeimage values for the ov511 / ov518 may seem larger then necessary, however they need to be this big as the ov511 / ov518 always fills the entire isoc frame, using 0 padding bytes when it doesn't have any data. So with low framerates the amount of data transferred can become quite large (libv4l will remove all the 0 padding in userspace). 
*/ static const struct v4l2_pix_format ov518_vga_mode[] = { {320, 240, V4L2_PIX_FMT_OV518, V4L2_FIELD_NONE, .bytesperline = 320, .sizeimage = 320 * 240 * 3, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 1}, {640, 480, V4L2_PIX_FMT_OV518, V4L2_FIELD_NONE, .bytesperline = 640, .sizeimage = 640 * 480 * 2, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 0}, }; static const struct v4l2_pix_format ov518_sif_mode[] = { {160, 120, V4L2_PIX_FMT_OV518, V4L2_FIELD_NONE, .bytesperline = 160, .sizeimage = 70000, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 3}, {176, 144, V4L2_PIX_FMT_OV518, V4L2_FIELD_NONE, .bytesperline = 176, .sizeimage = 70000, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 1}, {320, 240, V4L2_PIX_FMT_OV518, V4L2_FIELD_NONE, .bytesperline = 320, .sizeimage = 320 * 240 * 3, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 2}, {352, 288, V4L2_PIX_FMT_OV518, V4L2_FIELD_NONE, .bytesperline = 352, .sizeimage = 352 * 288 * 3, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 0}, }; static const struct v4l2_pix_format ov511_vga_mode[] = { {320, 240, V4L2_PIX_FMT_OV511, V4L2_FIELD_NONE, .bytesperline = 320, .sizeimage = 320 * 240 * 3, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 1}, {640, 480, V4L2_PIX_FMT_OV511, V4L2_FIELD_NONE, .bytesperline = 640, .sizeimage = 640 * 480 * 2, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 0}, }; static const struct v4l2_pix_format ov511_sif_mode[] = { {160, 120, V4L2_PIX_FMT_OV511, V4L2_FIELD_NONE, .bytesperline = 160, .sizeimage = 70000, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 3}, {176, 144, V4L2_PIX_FMT_OV511, V4L2_FIELD_NONE, .bytesperline = 176, .sizeimage = 70000, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 1}, {320, 240, V4L2_PIX_FMT_OV511, V4L2_FIELD_NONE, .bytesperline = 320, .sizeimage = 320 * 240 * 3, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 2}, {352, 288, V4L2_PIX_FMT_OV511, V4L2_FIELD_NONE, .bytesperline = 352, .sizeimage = 352 * 288 * 3, .colorspace = V4L2_COLORSPACE_JPEG, .priv = 0}, }; static const struct v4l2_pix_format ovfx2_ov2610_mode[] = { {800, 600, V4L2_PIX_FMT_SBGGR8, V4L2_FIELD_NONE, .bytesperline = 800, .sizeimage = 800 * 600, .colorspace = V4L2_COLORSPACE_SRGB, .priv = 1}, {1600, 1200, V4L2_PIX_FMT_SBGGR8, V4L2_FIELD_NONE, .bytesperline = 1600, .sizeimage = 1600 * 1200, .colorspace = V4L2_COLORSPACE_SRGB}, }; static const struct v4l2_pix_format ovfx2_ov3610_mode[] = { {640, 480, V4L2_PIX_FMT_SBGGR8, V4L2_FIELD_NONE, .bytesperline = 640, .sizeimage = 640 * 480, .colorspace = V4L2_COLORSPACE_SRGB, .priv = 1}, {800, 600, V4L2_PIX_FMT_SBGGR8, V4L2_FIELD_NONE, .bytesperline = 800, .sizeimage = 800 * 600, .colorspace = V4L2_COLORSPACE_SRGB, .priv = 1}, {1024, 768, V4L2_PIX_FMT_SBGGR8, V4L2_FIELD_NONE, .bytesperline = 1024, .sizeimage = 1024 * 768, .colorspace = V4L2_COLORSPACE_SRGB, .priv = 1}, {1600, 1200, V4L2_PIX_FMT_SBGGR8, V4L2_FIELD_NONE, .bytesperline = 1600, .sizeimage = 1600 * 1200, .colorspace = V4L2_COLORSPACE_SRGB, .priv = 0}, {2048, 1536, V4L2_PIX_FMT_SBGGR8, V4L2_FIELD_NONE, .bytesperline = 2048, .sizeimage = 2048 * 1536, .colorspace = V4L2_COLORSPACE_SRGB, .priv = 0}, }; static const struct v4l2_pix_format ovfx2_ov9600_mode[] = { {640, 480, V4L2_PIX_FMT_SBGGR8, V4L2_FIELD_NONE, .bytesperline = 640, .sizeimage = 640 * 480, .colorspace = V4L2_COLORSPACE_SRGB, .priv = 1}, {1280, 1024, V4L2_PIX_FMT_SBGGR8, V4L2_FIELD_NONE, .bytesperline = 1280, .sizeimage = 1280 * 1024, .colorspace = V4L2_COLORSPACE_SRGB}, }; /* Registers common to OV511 / OV518 */ #define R51x_FIFO_PSIZE 0x30 /* 2 bytes wide w/ OV518(+) */ #define R51x_SYS_RESET 0x50 /* 
Reset type flags */ #define OV511_RESET_OMNICE 0x08 #define R51x_SYS_INIT 0x53 #define R51x_SYS_SNAP 0x52 #define R51x_SYS_CUST_ID 0x5f #define R51x_COMP_LUT_BEGIN 0x80 /* OV511 Camera interface register numbers */ #define R511_CAM_DELAY 0x10 #define R511_CAM_EDGE 0x11 #define R511_CAM_PXCNT 0x12 #define R511_CAM_LNCNT 0x13 #define R511_CAM_PXDIV 0x14 #define R511_CAM_LNDIV 0x15 #define R511_CAM_UV_EN 0x16 #define R511_CAM_LINE_MODE 0x17 #define R511_CAM_OPTS 0x18 #define R511_SNAP_FRAME 0x19 #define R511_SNAP_PXCNT 0x1a #define R511_SNAP_LNCNT 0x1b #define R511_SNAP_PXDIV 0x1c #define R511_SNAP_LNDIV 0x1d #define R511_SNAP_UV_EN 0x1e #define R511_SNAP_OPTS 0x1f #define R511_DRAM_FLOW_CTL 0x20 #define R511_FIFO_OPTS 0x31 #define R511_I2C_CTL 0x40 #define R511_SYS_LED_CTL 0x55 /* OV511+ only */ #define R511_COMP_EN 0x78 #define R511_COMP_LUT_EN 0x79 /* OV518 Camera interface register numbers */ #define R518_GPIO_OUT 0x56 /* OV518(+) only */ #define R518_GPIO_CTL 0x57 /* OV518(+) only */ /* OV519 Camera interface register numbers */ #define OV519_R10_H_SIZE 0x10 #define OV519_R11_V_SIZE 0x11 #define OV519_R12_X_OFFSETL 0x12 #define OV519_R13_X_OFFSETH 0x13 #define OV519_R14_Y_OFFSETL 0x14 #define OV519_R15_Y_OFFSETH 0x15 #define OV519_R16_DIVIDER 0x16 #define OV519_R20_DFR 0x20 #define OV519_R25_FORMAT 0x25 /* OV519 System Controller register numbers */ #define OV519_R51_RESET1 0x51 #define OV519_R54_EN_CLK1 0x54 #define OV519_R57_SNAPSHOT 0x57 #define OV519_GPIO_DATA_OUT0 0x71 #define OV519_GPIO_IO_CTRL0 0x72 /*#define OV511_ENDPOINT_ADDRESS 1 * Isoc endpoint number */ /* * The FX2 chip does not give us a zero length read at end of frame. * It does, however, give a short read at the end of a frame, if * necessary, rather than run two frames together. * * By choosing the right bulk transfer size, we are guaranteed to always * get a short read for the last read of each frame. Frame sizes are * always a composite number (width * height, or a multiple) so if we * choose a prime number, we are guaranteed that the last read of a * frame will be short. * * But it isn't that easy: the 2.6 kernel requires a multiple of 4KB, * otherwise EOVERFLOW "babbling" errors occur. I have not been able * to figure out why. [PMiller] * * The constant (13 * 4096) is the largest "prime enough" number less than 64KB. * * It isn't enough to know the number of bytes per frame, in case we * have data dropouts or buffer overruns (even though the FX2 double * buffers, there are some pretty strict real time constraints for * isochronous transfer for larger frame sizes). 
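 *
 * Worked check of the divisibility argument: OVFX2_BULK_SIZE is
 * 13 * 4096 = 53248 = 13 * 2^12, so a frame size divides it evenly only
 * if the frame size is a multiple of both 4096 and 13. For example,
 * 640 * 480 = 307200 = 75 * 4096, and 75 is not a multiple of 13, so the
 * final bulk read of each frame is 307200 mod 53248 = 40960 bytes -- a
 * guaranteed short read.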
*/ /*jfm: this value does not work for 800x600 - see isoc_init */ #define OVFX2_BULK_SIZE (13 * 4096) /* I2C registers */ #define R51x_I2C_W_SID 0x41 #define R51x_I2C_SADDR_3 0x42 #define R51x_I2C_SADDR_2 0x43 #define R51x_I2C_R_SID 0x44 #define R51x_I2C_DATA 0x45 #define R518_I2C_CTL 0x47 /* OV518(+) only */ #define OVFX2_I2C_ADDR 0x00 /* I2C ADDRESSES */ #define OV7xx0_SID 0x42 #define OV_HIRES_SID 0x60 /* OV9xxx / OV2xxx / OV3xxx */ #define OV8xx0_SID 0xa0 #define OV6xx0_SID 0xc0 /* OV7610 registers */ #define OV7610_REG_GAIN 0x00 /* gain setting (5:0) */ #define OV7610_REG_BLUE 0x01 /* blue channel balance */ #define OV7610_REG_RED 0x02 /* red channel balance */ #define OV7610_REG_SAT 0x03 /* saturation */ #define OV8610_REG_HUE 0x04 /* 04 reserved */ #define OV7610_REG_CNT 0x05 /* Y contrast */ #define OV7610_REG_BRT 0x06 /* Y brightness */ #define OV7610_REG_COM_C 0x14 /* misc common regs */ #define OV7610_REG_ID_HIGH 0x1c /* manufacturer ID MSB */ #define OV7610_REG_ID_LOW 0x1d /* manufacturer ID LSB */ #define OV7610_REG_COM_I 0x29 /* misc settings */ /* OV7660 and OV7670 registers */ #define OV7670_R00_GAIN 0x00 /* Gain lower 8 bits (rest in vref) */ #define OV7670_R01_BLUE 0x01 /* blue gain */ #define OV7670_R02_RED 0x02 /* red gain */ #define OV7670_R03_VREF 0x03 /* Pieces of GAIN, VSTART, VSTOP */ #define OV7670_R04_COM1 0x04 /* Control 1 */ /*#define OV7670_R07_AECHH 0x07 * AEC MS 5 bits */ #define OV7670_R0C_COM3 0x0c /* Control 3 */ #define OV7670_R0D_COM4 0x0d /* Control 4 */ #define OV7670_R0E_COM5 0x0e /* All "reserved" */ #define OV7670_R0F_COM6 0x0f /* Control 6 */ #define OV7670_R10_AECH 0x10 /* More bits of AEC value */ #define OV7670_R11_CLKRC 0x11 /* Clock control */ #define OV7670_R12_COM7 0x12 /* Control 7 */ #define OV7670_COM7_FMT_VGA 0x00 /*#define OV7670_COM7_YUV 0x00 * YUV */ #define OV7670_COM7_FMT_QVGA 0x10 /* QVGA format */ #define OV7670_COM7_FMT_MASK 0x38 #define OV7670_COM7_RESET 0x80 /* Register reset */ #define OV7670_R13_COM8 0x13 /* Control 8 */ #define OV7670_COM8_AEC 0x01 /* Auto exposure enable */ #define OV7670_COM8_AWB 0x02 /* White balance enable */ #define OV7670_COM8_AGC 0x04 /* Auto gain enable */ #define OV7670_COM8_BFILT 0x20 /* Band filter enable */ #define OV7670_COM8_AECSTEP 0x40 /* Unlimited AEC step size */ #define OV7670_COM8_FASTAEC 0x80 /* Enable fast AGC/AEC */ #define OV7670_R14_COM9 0x14 /* Control 9 - gain ceiling */ #define OV7670_R15_COM10 0x15 /* Control 10 */ #define OV7670_R17_HSTART 0x17 /* Horiz start high bits */ #define OV7670_R18_HSTOP 0x18 /* Horiz stop high bits */ #define OV7670_R19_VSTART 0x19 /* Vert start high bits */ #define OV7670_R1A_VSTOP 0x1a /* Vert stop high bits */ #define OV7670_R1E_MVFP 0x1e /* Mirror / vflip */ #define OV7670_MVFP_VFLIP 0x10 /* vertical flip */ #define OV7670_MVFP_MIRROR 0x20 /* Mirror image */ #define OV7670_R24_AEW 0x24 /* AGC upper limit */ #define OV7670_R25_AEB 0x25 /* AGC lower limit */ #define OV7670_R26_VPT 0x26 /* AGC/AEC fast mode op region */ #define OV7670_R32_HREF 0x32 /* HREF pieces */ #define OV7670_R3A_TSLB 0x3a /* lots of stuff */ #define OV7670_R3B_COM11 0x3b /* Control 11 */ #define OV7670_COM11_EXP 0x02 #define OV7670_COM11_HZAUTO 0x10 /* Auto detect 50/60 Hz */ #define OV7670_R3C_COM12 0x3c /* Control 12 */ #define OV7670_R3D_COM13 0x3d /* Control 13 */ #define OV7670_COM13_GAMMA 0x80 /* Gamma enable */ #define OV7670_COM13_UVSAT 0x40 /* UV saturation auto adjustment */ #define OV7670_R3E_COM14 0x3e /* Control 14 */ #define OV7670_R3F_EDGE 0x3f /* Edge 
enhancement factor */ #define OV7670_R40_COM15 0x40 /* Control 15 */ /*#define OV7670_COM15_R00FF 0xc0 * 00 to FF */ #define OV7670_R41_COM16 0x41 /* Control 16 */ #define OV7670_COM16_AWBGAIN 0x08 /* AWB gain enable */ /* end of ov7660 common registers */ #define OV7670_R55_BRIGHT 0x55 /* Brightness */ #define OV7670_R56_CONTRAS 0x56 /* Contrast control */ #define OV7670_R69_GFIX 0x69 /* Fix gain control */ /*#define OV7670_R8C_RGB444 0x8c * RGB 444 control */ #define OV7670_R9F_HAECC1 0x9f /* Hist AEC/AGC control 1 */ #define OV7670_RA0_HAECC2 0xa0 /* Hist AEC/AGC control 2 */ #define OV7670_RA5_BD50MAX 0xa5 /* 50hz banding step limit */ #define OV7670_RA6_HAECC3 0xa6 /* Hist AEC/AGC control 3 */ #define OV7670_RA7_HAECC4 0xa7 /* Hist AEC/AGC control 4 */ #define OV7670_RA8_HAECC5 0xa8 /* Hist AEC/AGC control 5 */ #define OV7670_RA9_HAECC6 0xa9 /* Hist AEC/AGC control 6 */ #define OV7670_RAA_HAECC7 0xaa /* Hist AEC/AGC control 7 */ #define OV7670_RAB_BD60MAX 0xab /* 60hz banding step limit */ struct ov_regvals { u8 reg; u8 val; }; struct ov_i2c_regvals { u8 reg; u8 val; }; /* Settings for OV2610 camera chip */ static const struct ov_i2c_regvals norm_2610[] = { { 0x12, 0x80 }, /* reset */ }; static const struct ov_i2c_regvals norm_2610ae[] = { {0x12, 0x80}, /* reset */ {0x13, 0xcd}, {0x09, 0x01}, {0x0d, 0x00}, {0x11, 0x80}, {0x12, 0x20}, /* 1600x1200 */ {0x33, 0x0c}, {0x35, 0x90}, {0x36, 0x37}, /* ms-win traces */ {0x11, 0x83}, /* clock / 3 ? */ {0x2d, 0x00}, /* 60 Hz filter */ {0x24, 0xb0}, /* normal colors */ {0x25, 0x90}, {0x10, 0x43}, }; static const struct ov_i2c_regvals norm_3620b[] = { /* * From the datasheet: "Note that after writing to register COMH * (0x12) to change the sensor mode, registers related to the * sensor's cropping window will be reset back to their default * values." * * "wait 4096 external clock ... to make sure the sensor is * stable and ready to access registers" i.e. 160us at 24MHz */ { 0x12, 0x80 }, /* COMH reset */ { 0x12, 0x00 }, /* QXGA, master */ /* * 11 CLKRC "Clock Rate Control" * [7] internal frequency doublers: on * [6] video port mode: master * [5:0] clock divider: 1 */ { 0x11, 0x80 }, /* * 13 COMI "Common Control I" * = 192 (0xC0) 11000000 * COMI[7] "AEC speed selection" * = 1 (0x01) 1....... "Faster AEC correction" * COMI[6] "AEC speed step selection" * = 1 (0x01) .1...... "Big steps, fast" * COMI[5] "Banding filter on off" * = 0 (0x00) ..0..... "Off" * COMI[4] "Banding filter option" * = 0 (0x00) ...0.... "Main clock is 48 MHz and * the PLL is ON" * COMI[3] "Reserved" * = 0 (0x00) ....0... * COMI[2] "AGC auto manual control selection" * = 0 (0x00) .....0.. "Manual" * COMI[1] "AWB auto manual control selection" * = 0 (0x00) ......0. "Manual" * COMI[0] "Exposure control" * = 0 (0x00) .......0 "Manual" */ { 0x13, 0xc0 }, /* * 09 COMC "Common Control C" * = 8 (0x08) 00001000 * COMC[7:5] "Reserved" * = 0 (0x00) 000..... * COMC[4] "Sleep Mode Enable" * = 0 (0x00) ...0.... "Normal mode" * COMC[3:2] "Sensor sampling reset timing selection" * = 2 (0x02) ....10.. "Longer reset time" * COMC[1:0] "Output drive current select" * = 0 (0x00) ......00 "Weakest" */ { 0x09, 0x08 }, /* * 0C COMD "Common Control D" * = 8 (0x08) 00001000 * COMD[7] "Reserved" * = 0 (0x00) 0....... * COMD[6] "Swap MSB and LSB at the output port" * = 0 (0x00) .0...... "False" * COMD[5:3] "Reserved" * = 1 (0x01) ..001... * COMD[2] "Output Average On Off" * = 0 (0x00) .....0.. "Output Normal" * COMD[1] "Sensor precharge voltage selection" * = 0 (0x00) ......0. 
"Selects internal * reference precharge * voltage" * COMD[0] "Snapshot option" * = 0 (0x00) .......0 "Enable live video output * after snapshot sequence" */ { 0x0c, 0x08 }, /* * 0D COME "Common Control E" * = 161 (0xA1) 10100001 * COME[7] "Output average option" * = 1 (0x01) 1....... "Output average of 4 pixels" * COME[6] "Anti-blooming control" * = 0 (0x00) .0...... "Off" * COME[5:3] "Reserved" * = 4 (0x04) ..100... * COME[2] "Clock output power down pin status" * = 0 (0x00) .....0.. "Tri-state data output pin * on power down" * COME[1] "Data output pin status selection at power down" * = 0 (0x00) ......0. "Tri-state VSYNC, PCLK, * HREF, and CHSYNC pins on * power down" * COME[0] "Auto zero circuit select" * = 1 (0x01) .......1 "On" */ { 0x0d, 0xa1 }, /* * 0E COMF "Common Control F" * = 112 (0x70) 01110000 * COMF[7] "System clock selection" * = 0 (0x00) 0....... "Use 24 MHz system clock" * COMF[6:4] "Reserved" * = 7 (0x07) .111.... * COMF[3] "Manual auto negative offset canceling selection" * = 0 (0x00) ....0... "Auto detect negative * offset and cancel it" * COMF[2:0] "Reserved" * = 0 (0x00) .....000 */ { 0x0e, 0x70 }, /* * 0F COMG "Common Control G" * = 66 (0x42) 01000010 * COMG[7] "Optical black output selection" * = 0 (0x00) 0....... "Disable" * COMG[6] "Black level calibrate selection" * = 1 (0x01) .1...... "Use optical black pixels * to calibrate" * COMG[5:4] "Reserved" * = 0 (0x00) ..00.... * COMG[3] "Channel offset adjustment" * = 0 (0x00) ....0... "Disable offset adjustment" * COMG[2] "ADC black level calibration option" * = 0 (0x00) .....0.. "Use B/G line and G/R * line to calibrate each * channel's black level" * COMG[1] "Reserved" * = 1 (0x01) ......1. * COMG[0] "ADC black level calibration enable" * = 0 (0x00) .......0 "Disable" */ { 0x0f, 0x42 }, /* * 14 COMJ "Common Control J" * = 198 (0xC6) 11000110 * COMJ[7:6] "AGC gain ceiling" * = 3 (0x03) 11...... "8x" * COMJ[5:4] "Reserved" * = 0 (0x00) ..00.... * COMJ[3] "Auto banding filter" * = 0 (0x00) ....0... "Banding filter is always * on off depending on * COMI[5] setting" * COMJ[2] "VSYNC drop option" * = 1 (0x01) .....1.. "SYNC is dropped if frame * data is dropped" * COMJ[1] "Frame data drop" * = 1 (0x01) ......1. "Drop frame data if * exposure is not within * tolerance. In AEC mode, * data is normally dropped * when data is out of * range." * COMJ[0] "Reserved" * = 0 (0x00) .......0 */ { 0x14, 0xc6 }, /* * 15 COMK "Common Control K" * = 2 (0x02) 00000010 * COMK[7] "CHSYNC pin output swap" * = 0 (0x00) 0....... "CHSYNC" * COMK[6] "HREF pin output swap" * = 0 (0x00) .0...... "HREF" * COMK[5] "PCLK output selection" * = 0 (0x00) ..0..... "PCLK always output" * COMK[4] "PCLK edge selection" * = 0 (0x00) ...0.... "Data valid on falling edge" * COMK[3] "HREF output polarity" * = 0 (0x00) ....0... "positive" * COMK[2] "Reserved" * = 0 (0x00) .....0.. * COMK[1] "VSYNC polarity" * = 1 (0x01) ......1. "negative" * COMK[0] "HSYNC polarity" * = 0 (0x00) .......0 "positive" */ { 0x15, 0x02 }, /* * 33 CHLF "Current Control" * = 9 (0x09) 00001001 * CHLF[7:6] "Sensor current control" * = 0 (0x00) 00...... * CHLF[5] "Sensor current range control" * = 0 (0x00) ..0..... "normal range" * CHLF[4] "Sensor current" * = 0 (0x00) ...0.... "normal current" * CHLF[3] "Sensor buffer current control" * = 1 (0x01) ....1... "half current" * CHLF[2] "Column buffer current control" * = 0 (0x00) .....0.. "normal current" * CHLF[1] "Analog DSP current control" * = 0 (0x00) ......0. "normal current" * CHLF[1] "ADC current control" * = 0 (0x00) ......0. 
"normal current" */ { 0x33, 0x09 }, /* * 34 VBLM "Blooming Control" * = 80 (0x50) 01010000 * VBLM[7] "Hard soft reset switch" * = 0 (0x00) 0....... "Hard reset" * VBLM[6:4] "Blooming voltage selection" * = 5 (0x05) .101.... * VBLM[3:0] "Sensor current control" * = 0 (0x00) ....0000 */ { 0x34, 0x50 }, /* * 36 VCHG "Sensor Precharge Voltage Control" * = 0 (0x00) 00000000 * VCHG[7] "Reserved" * = 0 (0x00) 0....... * VCHG[6:4] "Sensor precharge voltage control" * = 0 (0x00) .000.... * VCHG[3:0] "Sensor array common reference" * = 0 (0x00) ....0000 */ { 0x36, 0x00 }, /* * 37 ADC "ADC Reference Control" * = 4 (0x04) 00000100 * ADC[7:4] "Reserved" * = 0 (0x00) 0000.... * ADC[3] "ADC input signal range" * = 0 (0x00) ....0... "Input signal 1.0x" * ADC[2:0] "ADC range control" * = 4 (0x04) .....100 */ { 0x37, 0x04 }, /* * 38 ACOM "Analog Common Ground" * = 82 (0x52) 01010010 * ACOM[7] "Analog gain control" * = 0 (0x00) 0....... "Gain 1x" * ACOM[6] "Analog black level calibration" * = 1 (0x01) .1...... "On" * ACOM[5:0] "Reserved" * = 18 (0x12) ..010010 */ { 0x38, 0x52 }, /* * 3A FREFA "Internal Reference Adjustment" * = 0 (0x00) 00000000 * FREFA[7:0] "Range" * = 0 (0x00) 00000000 */ { 0x3a, 0x00 }, /* * 3C FVOPT "Internal Reference Adjustment" * = 31 (0x1F) 00011111 * FVOPT[7:0] "Range" * = 31 (0x1F) 00011111 */ { 0x3c, 0x1f }, /* * 44 Undocumented = 0 (0x00) 00000000 * 44[7:0] "It's a secret" * = 0 (0x00) 00000000 */ { 0x44, 0x00 }, /* * 40 Undocumented = 0 (0x00) 00000000 * 40[7:0] "It's a secret" * = 0 (0x00) 00000000 */ { 0x40, 0x00 }, /* * 41 Undocumented = 0 (0x00) 00000000 * 41[7:0] "It's a secret" * = 0 (0x00) 00000000 */ { 0x41, 0x00 }, /* * 42 Undocumented = 0 (0x00) 00000000 * 42[7:0] "It's a secret" * = 0 (0x00) 00000000 */ { 0x42, 0x00 }, /* * 43 Undocumented = 0 (0x00) 00000000 * 43[7:0] "It's a secret" * = 0 (0x00) 00000000 */ { 0x43, 0x00 }, /* * 45 Undocumented = 128 (0x80) 10000000 * 45[7:0] "It's a secret" * = 128 (0x80) 10000000 */ { 0x45, 0x80 }, /* * 48 Undocumented = 192 (0xC0) 11000000 * 48[7:0] "It's a secret" * = 192 (0xC0) 11000000 */ { 0x48, 0xc0 }, /* * 49 Undocumented = 25 (0x19) 00011001 * 49[7:0] "It's a secret" * = 25 (0x19) 00011001 */ { 0x49, 0x19 }, /* * 4B Undocumented = 128 (0x80) 10000000 * 4B[7:0] "It's a secret" * = 128 (0x80) 10000000 */ { 0x4b, 0x80 }, /* * 4D Undocumented = 196 (0xC4) 11000100 * 4D[7:0] "It's a secret" * = 196 (0xC4) 11000100 */ { 0x4d, 0xc4 }, /* * 35 VREF "Reference Voltage Control" * = 76 (0x4c) 01001100 * VREF[7:5] "Column high reference control" * = 2 (0x02) 010..... "higher voltage" * VREF[4:2] "Column low reference control" * = 3 (0x03) ...011.. "Highest voltage" * VREF[1:0] "Reserved" * = 0 (0x00) ......00 */ { 0x35, 0x4c }, /* * 3D Undocumented = 0 (0x00) 00000000 * 3D[7:0] "It's a secret" * = 0 (0x00) 00000000 */ { 0x3d, 0x00 }, /* * 3E Undocumented = 0 (0x00) 00000000 * 3E[7:0] "It's a secret" * = 0 (0x00) 00000000 */ { 0x3e, 0x00 }, /* * 3B FREFB "Internal Reference Adjustment" * = 24 (0x18) 00011000 * FREFB[7:0] "Range" * = 24 (0x18) 00011000 */ { 0x3b, 0x18 }, /* * 33 CHLF "Current Control" * = 25 (0x19) 00011001 * CHLF[7:6] "Sensor current control" * = 0 (0x00) 00...... * CHLF[5] "Sensor current range control" * = 0 (0x00) ..0..... "normal range" * CHLF[4] "Sensor current" * = 1 (0x01) ...1.... "double current" * CHLF[3] "Sensor buffer current control" * = 1 (0x01) ....1... "half current" * CHLF[2] "Column buffer current control" * = 0 (0x00) .....0.. 
"normal current" * CHLF[1] "Analog DSP current control" * = 0 (0x00) ......0. "normal current" * CHLF[1] "ADC current control" * = 0 (0x00) ......0. "normal current" */ { 0x33, 0x19 }, /* * 34 VBLM "Blooming Control" * = 90 (0x5A) 01011010 * VBLM[7] "Hard soft reset switch" * = 0 (0x00) 0....... "Hard reset" * VBLM[6:4] "Blooming voltage selection" * = 5 (0x05) .101.... * VBLM[3:0] "Sensor current control" * = 10 (0x0A) ....1010 */ { 0x34, 0x5a }, /* * 3B FREFB "Internal Reference Adjustment" * = 0 (0x00) 00000000 * FREFB[7:0] "Range" * = 0 (0x00) 00000000 */ { 0x3b, 0x00 }, /* * 33 CHLF "Current Control" * = 9 (0x09) 00001001 * CHLF[7:6] "Sensor current control" * = 0 (0x00) 00...... * CHLF[5] "Sensor current range control" * = 0 (0x00) ..0..... "normal range" * CHLF[4] "Sensor current" * = 0 (0x00) ...0.... "normal current" * CHLF[3] "Sensor buffer current control" * = 1 (0x01) ....1... "half current" * CHLF[2] "Column buffer current control" * = 0 (0x00) .....0.. "normal current" * CHLF[1] "Analog DSP current control" * = 0 (0x00) ......0. "normal current" * CHLF[1] "ADC current control" * = 0 (0x00) ......0. "normal current" */ { 0x33, 0x09 }, /* * 34 VBLM "Blooming Control" * = 80 (0x50) 01010000 * VBLM[7] "Hard soft reset switch" * = 0 (0x00) 0....... "Hard reset" * VBLM[6:4] "Blooming voltage selection" * = 5 (0x05) .101.... * VBLM[3:0] "Sensor current control" * = 0 (0x00) ....0000 */ { 0x34, 0x50 }, /* * 12 COMH "Common Control H" * = 64 (0x40) 01000000 * COMH[7] "SRST" * = 0 (0x00) 0....... "No-op" * COMH[6:4] "Resolution selection" * = 4 (0x04) .100.... "XGA" * COMH[3] "Master slave selection" * = 0 (0x00) ....0... "Master mode" * COMH[2] "Internal B/R channel option" * = 0 (0x00) .....0.. "B/R use same channel" * COMH[1] "Color bar test pattern" * = 0 (0x00) ......0. "Off" * COMH[0] "Reserved" * = 0 (0x00) .......0 */ { 0x12, 0x40 }, /* * 17 HREFST "Horizontal window start" * = 31 (0x1F) 00011111 * HREFST[7:0] "Horizontal window start, 8 MSBs" * = 31 (0x1F) 00011111 */ { 0x17, 0x1f }, /* * 18 HREFEND "Horizontal window end" * = 95 (0x5F) 01011111 * HREFEND[7:0] "Horizontal Window End, 8 MSBs" * = 95 (0x5F) 01011111 */ { 0x18, 0x5f }, /* * 19 VSTRT "Vertical window start" * = 0 (0x00) 00000000 * VSTRT[7:0] "Vertical Window Start, 8 MSBs" * = 0 (0x00) 00000000 */ { 0x19, 0x00 }, /* * 1A VEND "Vertical window end" * = 96 (0x60) 01100000 * VEND[7:0] "Vertical Window End, 8 MSBs" * = 96 (0x60) 01100000 */ { 0x1a, 0x60 }, /* * 32 COMM "Common Control M" * = 18 (0x12) 00010010 * COMM[7:6] "Pixel clock divide option" * = 0 (0x00) 00...... "/1" * COMM[5:3] "Horizontal window end position, 3 LSBs" * = 2 (0x02) ..010... * COMM[2:0] "Horizontal window start position, 3 LSBs" * = 2 (0x02) .....010 */ { 0x32, 0x12 }, /* * 03 COMA "Common Control A" * = 74 (0x4A) 01001010 * COMA[7:4] "AWB Update Threshold" * = 4 (0x04) 0100.... * COMA[3:2] "Vertical window end line control 2 LSBs" * = 2 (0x02) ....10.. * COMA[1:0] "Vertical window start line control 2 LSBs" * = 2 (0x02) ......10 */ { 0x03, 0x4a }, /* * 11 CLKRC "Clock Rate Control" * = 128 (0x80) 10000000 * CLKRC[7] "Internal frequency doublers on off seclection" * = 1 (0x01) 1....... "On" * CLKRC[6] "Digital video master slave selection" * = 0 (0x00) .0...... "Master mode, sensor * provides PCLK" * CLKRC[5:0] "Clock divider { CLK = PCLK/(1+CLKRC[5:0]) }" * = 0 (0x00) ..000000 */ { 0x11, 0x80 }, /* * 12 COMH "Common Control H" * = 0 (0x00) 00000000 * COMH[7] "SRST" * = 0 (0x00) 0....... 
"No-op" * COMH[6:4] "Resolution selection" * = 0 (0x00) .000.... "QXGA" * COMH[3] "Master slave selection" * = 0 (0x00) ....0... "Master mode" * COMH[2] "Internal B/R channel option" * = 0 (0x00) .....0.. "B/R use same channel" * COMH[1] "Color bar test pattern" * = 0 (0x00) ......0. "Off" * COMH[0] "Reserved" * = 0 (0x00) .......0 */ { 0x12, 0x00 }, /* * 12 COMH "Common Control H" * = 64 (0x40) 01000000 * COMH[7] "SRST" * = 0 (0x00) 0....... "No-op" * COMH[6:4] "Resolution selection" * = 4 (0x04) .100.... "XGA" * COMH[3] "Master slave selection" * = 0 (0x00) ....0... "Master mode" * COMH[2] "Internal B/R channel option" * = 0 (0x00) .....0.. "B/R use same channel" * COMH[1] "Color bar test pattern" * = 0 (0x00) ......0. "Off" * COMH[0] "Reserved" * = 0 (0x00) .......0 */ { 0x12, 0x40 }, /* * 17 HREFST "Horizontal window start" * = 31 (0x1F) 00011111 * HREFST[7:0] "Horizontal window start, 8 MSBs" * = 31 (0x1F) 00011111 */ { 0x17, 0x1f }, /* * 18 HREFEND "Horizontal window end" * = 95 (0x5F) 01011111 * HREFEND[7:0] "Horizontal Window End, 8 MSBs" * = 95 (0x5F) 01011111 */ { 0x18, 0x5f }, /* * 19 VSTRT "Vertical window start" * = 0 (0x00) 00000000 * VSTRT[7:0] "Vertical Window Start, 8 MSBs" * = 0 (0x00) 00000000 */ { 0x19, 0x00 }, /* * 1A VEND "Vertical window end" * = 96 (0x60) 01100000 * VEND[7:0] "Vertical Window End, 8 MSBs" * = 96 (0x60) 01100000 */ { 0x1a, 0x60 }, /* * 32 COMM "Common Control M" * = 18 (0x12) 00010010 * COMM[7:6] "Pixel clock divide option" * = 0 (0x00) 00...... "/1" * COMM[5:3] "Horizontal window end position, 3 LSBs" * = 2 (0x02) ..010... * COMM[2:0] "Horizontal window start position, 3 LSBs" * = 2 (0x02) .....010 */ { 0x32, 0x12 }, /* * 03 COMA "Common Control A" * = 74 (0x4A) 01001010 * COMA[7:4] "AWB Update Threshold" * = 4 (0x04) 0100.... * COMA[3:2] "Vertical window end line control 2 LSBs" * = 2 (0x02) ....10.. * COMA[1:0] "Vertical window start line control 2 LSBs" * = 2 (0x02) ......10 */ { 0x03, 0x4a }, /* * 02 RED "Red Gain Control" * = 175 (0xAF) 10101111 * RED[7] "Action" * = 1 (0x01) 1....... "gain = 1/(1+bitrev([6:0]))" * RED[6:0] "Value" * = 47 (0x2F) .0101111 */ { 0x02, 0xaf }, /* * 2D ADDVSL "VSYNC Pulse Width" * = 210 (0xD2) 11010010 * ADDVSL[7:0] "VSYNC pulse width, LSB" * = 210 (0xD2) 11010010 */ { 0x2d, 0xd2 }, /* * 00 GAIN = 24 (0x18) 00011000 * GAIN[7:6] "Reserved" * = 0 (0x00) 00...... * GAIN[5] "Double" * = 0 (0x00) ..0..... "False" * GAIN[4] "Double" * = 1 (0x01) ...1.... "True" * GAIN[3:0] "Range" * = 8 (0x08) ....1000 */ { 0x00, 0x18 }, /* * 01 BLUE "Blue Gain Control" * = 240 (0xF0) 11110000 * BLUE[7] "Action" * = 1 (0x01) 1....... 
"gain = 1/(1+bitrev([6:0]))" * BLUE[6:0] "Value" * = 112 (0x70) .1110000 */ { 0x01, 0xf0 }, /* * 10 AEC "Automatic Exposure Control" * = 10 (0x0A) 00001010 * AEC[7:0] "Automatic Exposure Control, 8 MSBs" * = 10 (0x0A) 00001010 */ { 0x10, 0x0a }, { 0xe1, 0x67 }, { 0xe3, 0x03 }, { 0xe4, 0x26 }, { 0xe5, 0x3e }, { 0xf8, 0x01 }, { 0xff, 0x01 }, }; static const struct ov_i2c_regvals norm_6x20[] = { { 0x12, 0x80 }, /* reset */ { 0x11, 0x01 }, { 0x03, 0x60 }, { 0x05, 0x7f }, /* For when autoadjust is off */ { 0x07, 0xa8 }, /* The ratio of 0x0c and 0x0d controls the white point */ { 0x0c, 0x24 }, { 0x0d, 0x24 }, { 0x0f, 0x15 }, /* COMS */ { 0x10, 0x75 }, /* AEC Exposure time */ { 0x12, 0x24 }, /* Enable AGC */ { 0x14, 0x04 }, /* 0x16: 0x06 helps frame stability with moving objects */ { 0x16, 0x06 }, /* { 0x20, 0x30 }, * Aperture correction enable */ { 0x26, 0xb2 }, /* BLC enable */ /* 0x28: 0x05 Selects RGB format if RGB on */ { 0x28, 0x05 }, { 0x2a, 0x04 }, /* Disable framerate adjust */ /* { 0x2b, 0xac }, * Framerate; Set 2a[7] first */ { 0x2d, 0x85 }, { 0x33, 0xa0 }, /* Color Processing Parameter */ { 0x34, 0xd2 }, /* Max A/D range */ { 0x38, 0x8b }, { 0x39, 0x40 }, { 0x3c, 0x39 }, /* Enable AEC mode changing */ { 0x3c, 0x3c }, /* Change AEC mode */ { 0x3c, 0x24 }, /* Disable AEC mode changing */ { 0x3d, 0x80 }, /* These next two registers (0x4a, 0x4b) are undocumented. * They control the color balance */ { 0x4a, 0x80 }, { 0x4b, 0x80 }, { 0x4d, 0xd2 }, /* This reduces noise a bit */ { 0x4e, 0xc1 }, { 0x4f, 0x04 }, /* Do 50-53 have any effect? */ /* Toggle 0x12[2] off and on here? */ }; static const struct ov_i2c_regvals norm_6x30[] = { { 0x12, 0x80 }, /* Reset */ { 0x00, 0x1f }, /* Gain */ { 0x01, 0x99 }, /* Blue gain */ { 0x02, 0x7c }, /* Red gain */ { 0x03, 0xc0 }, /* Saturation */ { 0x05, 0x0a }, /* Contrast */ { 0x06, 0x95 }, /* Brightness */ { 0x07, 0x2d }, /* Sharpness */ { 0x0c, 0x20 }, { 0x0d, 0x20 }, { 0x0e, 0xa0 }, /* Was 0x20, bit7 enables a 2x gain which we need */ { 0x0f, 0x05 }, { 0x10, 0x9a }, { 0x11, 0x00 }, /* Pixel clock = fastest */ { 0x12, 0x24 }, /* Enable AGC and AWB */ { 0x13, 0x21 }, { 0x14, 0x80 }, { 0x15, 0x01 }, { 0x16, 0x03 }, { 0x17, 0x38 }, { 0x18, 0xea }, { 0x19, 0x04 }, { 0x1a, 0x93 }, { 0x1b, 0x00 }, { 0x1e, 0xc4 }, { 0x1f, 0x04 }, { 0x20, 0x20 }, { 0x21, 0x10 }, { 0x22, 0x88 }, { 0x23, 0xc0 }, /* Crystal circuit power level */ { 0x25, 0x9a }, /* Increase AEC black ratio */ { 0x26, 0xb2 }, /* BLC enable */ { 0x27, 0xa2 }, { 0x28, 0x00 }, { 0x29, 0x00 }, { 0x2a, 0x84 }, /* 60 Hz power */ { 0x2b, 0xa8 }, /* 60 Hz power */ { 0x2c, 0xa0 }, { 0x2d, 0x95 }, /* Enable auto-brightness */ { 0x2e, 0x88 }, { 0x33, 0x26 }, { 0x34, 0x03 }, { 0x36, 0x8f }, { 0x37, 0x80 }, { 0x38, 0x83 }, { 0x39, 0x80 }, { 0x3a, 0x0f }, { 0x3b, 0x3c }, { 0x3c, 0x1a }, { 0x3d, 0x80 }, { 0x3e, 0x80 }, { 0x3f, 0x0e }, { 0x40, 0x00 }, /* White bal */ { 0x41, 0x00 }, /* White bal */ { 0x42, 0x80 }, { 0x43, 0x3f }, /* White bal */ { 0x44, 0x80 }, { 0x45, 0x20 }, { 0x46, 0x20 }, { 0x47, 0x80 }, { 0x48, 0x7f }, { 0x49, 0x00 }, { 0x4a, 0x00 }, { 0x4b, 0x80 }, { 0x4c, 0xd0 }, { 0x4d, 0x10 }, /* U = 0.563u, V = 0.714v */ { 0x4e, 0x40 }, { 0x4f, 0x07 }, /* UV avg., col. 
killer: max */ { 0x50, 0xff }, { 0x54, 0x23 }, /* Max AGC gain: 18dB */ { 0x55, 0xff }, { 0x56, 0x12 }, { 0x57, 0x81 }, { 0x58, 0x75 }, { 0x59, 0x01 }, /* AGC dark current comp.: +1 */ { 0x5a, 0x2c }, { 0x5b, 0x0f }, /* AWB chrominance levels */ { 0x5c, 0x10 }, { 0x3d, 0x80 }, { 0x27, 0xa6 }, { 0x12, 0x20 }, /* Toggle AWB */ { 0x12, 0x24 }, }; /* Lawrence Glaister <lg@jfm.bc.ca> reports: * * Register 0x0f in the 7610 has the following effects: * * 0x85 (AEC method 1): Best overall, good contrast range * 0x45 (AEC method 2): Very overexposed * 0xa5 (spec sheet default): Ok, but the black level is * shifted resulting in loss of contrast * 0x05 (old driver setting): very overexposed, too much * contrast */ static const struct ov_i2c_regvals norm_7610[] = { { 0x10, 0xff }, { 0x16, 0x06 }, { 0x28, 0x24 }, { 0x2b, 0xac }, { 0x12, 0x00 }, { 0x38, 0x81 }, { 0x28, 0x24 }, /* 0c */ { 0x0f, 0x85 }, /* lg's setting */ { 0x15, 0x01 }, { 0x20, 0x1c }, { 0x23, 0x2a }, { 0x24, 0x10 }, { 0x25, 0x8a }, { 0x26, 0xa2 }, { 0x27, 0xc2 }, { 0x2a, 0x04 }, { 0x2c, 0xfe }, { 0x2d, 0x93 }, { 0x30, 0x71 }, { 0x31, 0x60 }, { 0x32, 0x26 }, { 0x33, 0x20 }, { 0x34, 0x48 }, { 0x12, 0x24 }, { 0x11, 0x01 }, { 0x0c, 0x24 }, { 0x0d, 0x24 }, }; static const struct ov_i2c_regvals norm_7620[] = { { 0x12, 0x80 }, /* reset */ { 0x00, 0x00 }, /* gain */ { 0x01, 0x80 }, /* blue gain */ { 0x02, 0x80 }, /* red gain */ { 0x03, 0xc0 }, /* OV7670_R03_VREF */ { 0x06, 0x60 }, { 0x07, 0x00 }, { 0x0c, 0x24 }, { 0x0c, 0x24 }, { 0x0d, 0x24 }, { 0x11, 0x01 }, { 0x12, 0x24 }, { 0x13, 0x01 }, { 0x14, 0x84 }, { 0x15, 0x01 }, { 0x16, 0x03 }, { 0x17, 0x2f }, { 0x18, 0xcf }, { 0x19, 0x06 }, { 0x1a, 0xf5 }, { 0x1b, 0x00 }, { 0x20, 0x18 }, { 0x21, 0x80 }, { 0x22, 0x80 }, { 0x23, 0x00 }, { 0x26, 0xa2 }, { 0x27, 0xea }, { 0x28, 0x22 }, /* Was 0x20, bit1 enables a 2x gain which we need */ { 0x29, 0x00 }, { 0x2a, 0x10 }, { 0x2b, 0x00 }, { 0x2c, 0x88 }, { 0x2d, 0x91 }, { 0x2e, 0x80 }, { 0x2f, 0x44 }, { 0x60, 0x27 }, { 0x61, 0x02 }, { 0x62, 0x5f }, { 0x63, 0xd5 }, { 0x64, 0x57 }, { 0x65, 0x83 }, { 0x66, 0x55 }, { 0x67, 0x92 }, { 0x68, 0xcf }, { 0x69, 0x76 }, { 0x6a, 0x22 }, { 0x6b, 0x00 }, { 0x6c, 0x02 }, { 0x6d, 0x44 }, { 0x6e, 0x80 }, { 0x6f, 0x1d }, { 0x70, 0x8b }, { 0x71, 0x00 }, { 0x72, 0x14 }, { 0x73, 0x54 }, { 0x74, 0x00 }, { 0x75, 0x8e }, { 0x76, 0x00 }, { 0x77, 0xff }, { 0x78, 0x80 }, { 0x79, 0x80 }, { 0x7a, 0x80 }, { 0x7b, 0xe2 }, { 0x7c, 0x00 }, }; /* 7640 and 7648. The defaults should be OK for most registers. */ static const struct ov_i2c_regvals norm_7640[] = { { 0x12, 0x80 }, { 0x12, 0x14 }, }; static const struct ov_regvals init_519_ov7660[] = { { 0x5d, 0x03 }, /* Turn off suspend mode */ { 0x53, 0x9b }, /* 0x9f enables the (unused) microcontroller */ { 0x54, 0x0f }, /* bit2 (jpeg enable) */ { 0xa2, 0x20 }, /* a2-a5 are undocumented */ { 0xa3, 0x18 }, { 0xa4, 0x04 }, { 0xa5, 0x28 }, { 0x37, 0x00 }, /* SetUsbInit */ { 0x55, 0x02 }, /* 4.096 Mhz audio clock */ /* Enable both fields, YUV Input, disable defect comp (why?) 
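 * (Register 0x53 gets the same 0x9b value in ov519_configure() below;
 * 0x54 differs, 0x0f here versus 0xff there, though both keep bit2 set
 * for jpeg.)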
*/ { 0x20, 0x0c }, /* 0x0d does U <-> V swap */ { 0x21, 0x38 }, { 0x22, 0x1d }, { 0x17, 0x50 }, /* undocumented */ { 0x37, 0x00 }, /* undocumented */ { 0x40, 0xff }, /* I2C timeout counter */ { 0x46, 0x00 }, /* I2C clock prescaler */ }; static const struct ov_i2c_regvals norm_7660[] = { {OV7670_R12_COM7, OV7670_COM7_RESET}, {OV7670_R11_CLKRC, 0x81}, {0x92, 0x00}, /* DM_LNL */ {0x93, 0x00}, /* DM_LNH */ {0x9d, 0x4c}, /* BD50ST */ {0x9e, 0x3f}, /* BD60ST */ {OV7670_R3B_COM11, 0x02}, {OV7670_R13_COM8, 0xf5}, {OV7670_R10_AECH, 0x00}, {OV7670_R00_GAIN, 0x00}, {OV7670_R01_BLUE, 0x7c}, {OV7670_R02_RED, 0x9d}, {OV7670_R12_COM7, 0x00}, {OV7670_R04_COM1, 00}, {OV7670_R18_HSTOP, 0x01}, {OV7670_R17_HSTART, 0x13}, {OV7670_R32_HREF, 0x92}, {OV7670_R19_VSTART, 0x02}, {OV7670_R1A_VSTOP, 0x7a}, {OV7670_R03_VREF, 0x00}, {OV7670_R0E_COM5, 0x04}, {OV7670_R0F_COM6, 0x62}, {OV7670_R15_COM10, 0x00}, {0x16, 0x02}, /* RSVD */ {0x1b, 0x00}, /* PSHFT */ {OV7670_R1E_MVFP, 0x01}, {0x29, 0x3c}, /* RSVD */ {0x33, 0x00}, /* CHLF */ {0x34, 0x07}, /* ARBLM */ {0x35, 0x84}, /* RSVD */ {0x36, 0x00}, /* RSVD */ {0x37, 0x04}, /* ADC */ {0x39, 0x43}, /* OFON */ {OV7670_R3A_TSLB, 0x00}, {OV7670_R3C_COM12, 0x6c}, {OV7670_R3D_COM13, 0x98}, {OV7670_R3F_EDGE, 0x23}, {OV7670_R40_COM15, 0xc1}, {OV7670_R41_COM16, 0x22}, {0x6b, 0x0a}, /* DBLV */ {0xa1, 0x08}, /* RSVD */ {0x69, 0x80}, /* HV */ {0x43, 0xf0}, /* RSVD.. */ {0x44, 0x10}, {0x45, 0x78}, {0x46, 0xa8}, {0x47, 0x60}, {0x48, 0x80}, {0x59, 0xba}, {0x5a, 0x9a}, {0x5b, 0x22}, {0x5c, 0xb9}, {0x5d, 0x9b}, {0x5e, 0x10}, {0x5f, 0xe0}, {0x60, 0x85}, {0x61, 0x60}, {0x9f, 0x9d}, /* RSVD */ {0xa0, 0xa0}, /* DSPC2 */ {0x4f, 0x60}, /* matrix */ {0x50, 0x64}, {0x51, 0x04}, {0x52, 0x18}, {0x53, 0x3c}, {0x54, 0x54}, {0x55, 0x40}, {0x56, 0x40}, {0x57, 0x40}, {0x58, 0x0d}, /* matrix sign */ {0x8b, 0xcc}, /* RSVD */ {0x8c, 0xcc}, {0x8d, 0xcf}, {0x6c, 0x40}, /* gamma curve */ {0x6d, 0xe0}, {0x6e, 0xa0}, {0x6f, 0x80}, {0x70, 0x70}, {0x71, 0x80}, {0x72, 0x60}, {0x73, 0x60}, {0x74, 0x50}, {0x75, 0x40}, {0x76, 0x38}, {0x77, 0x3c}, {0x78, 0x32}, {0x79, 0x1a}, {0x7a, 0x28}, {0x7b, 0x24}, {0x7c, 0x04}, /* gamma curve */ {0x7d, 0x12}, {0x7e, 0x26}, {0x7f, 0x46}, {0x80, 0x54}, {0x81, 0x64}, {0x82, 0x70}, {0x83, 0x7c}, {0x84, 0x86}, {0x85, 0x8e}, {0x86, 0x9c}, {0x87, 0xab}, {0x88, 0xc4}, {0x89, 0xd1}, {0x8a, 0xe5}, {OV7670_R14_COM9, 0x1e}, {OV7670_R24_AEW, 0x80}, {OV7670_R25_AEB, 0x72}, {OV7670_R26_VPT, 0xb3}, {0x62, 0x80}, /* LCC1 */ {0x63, 0x80}, /* LCC2 */ {0x64, 0x06}, /* LCC3 */ {0x65, 0x00}, /* LCC4 */ {0x66, 0x01}, /* LCC5 */ {0x94, 0x0e}, /* RSVD.. */ {0x95, 0x14}, {OV7670_R13_COM8, OV7670_COM8_FASTAEC | OV7670_COM8_AECSTEP | OV7670_COM8_BFILT | 0x10 | OV7670_COM8_AGC | OV7670_COM8_AWB | OV7670_COM8_AEC}, {0xa1, 0xc8} }; static const struct ov_i2c_regvals norm_9600[] = { {0x12, 0x80}, {0x0c, 0x28}, {0x11, 0x80}, {0x13, 0xb5}, {0x14, 0x3e}, {0x1b, 0x04}, {0x24, 0xb0}, {0x25, 0x90}, {0x26, 0x94}, {0x35, 0x90}, {0x37, 0x07}, {0x38, 0x08}, {0x01, 0x8e}, {0x02, 0x85} }; /* 7670. Defaults taken from OmniVision provided data, * as provided by Jonathan Corbet of OLPC */ static const struct ov_i2c_regvals norm_7670[] = { { OV7670_R12_COM7, OV7670_COM7_RESET }, { OV7670_R3A_TSLB, 0x04 }, /* OV */ { OV7670_R12_COM7, OV7670_COM7_FMT_VGA }, /* VGA */ { OV7670_R11_CLKRC, 0x01 }, /* * Set the hardware window. These values from OV don't entirely * make sense - hstop is less than hstart. But they work... 
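 *
 * A plausible reading (an assumption, not from OmniVision docs): the
 * sensor's horizontal counter wraps at the line length. With HREF = 0xb6
 * the 3 LSBs give hstart = (0x13 << 3) | 6 = 158 and
 * hstop = (0x01 << 3) | 6 = 14, and on a 784-pixel line
 * (14 - 158) mod 784 = 640, exactly one VGA row.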
*/ { OV7670_R17_HSTART, 0x13 }, { OV7670_R18_HSTOP, 0x01 }, { OV7670_R32_HREF, 0xb6 }, { OV7670_R19_VSTART, 0x02 }, { OV7670_R1A_VSTOP, 0x7a }, { OV7670_R03_VREF, 0x0a }, { OV7670_R0C_COM3, 0x00 }, { OV7670_R3E_COM14, 0x00 }, /* Mystery scaling numbers */ { 0x70, 0x3a }, { 0x71, 0x35 }, { 0x72, 0x11 }, { 0x73, 0xf0 }, { 0xa2, 0x02 }, /* { OV7670_R15_COM10, 0x0 }, */ /* Gamma curve values */ { 0x7a, 0x20 }, { 0x7b, 0x10 }, { 0x7c, 0x1e }, { 0x7d, 0x35 }, { 0x7e, 0x5a }, { 0x7f, 0x69 }, { 0x80, 0x76 }, { 0x81, 0x80 }, { 0x82, 0x88 }, { 0x83, 0x8f }, { 0x84, 0x96 }, { 0x85, 0xa3 }, { 0x86, 0xaf }, { 0x87, 0xc4 }, { 0x88, 0xd7 }, { 0x89, 0xe8 }, /* AGC and AEC parameters. Note we start by disabling those features, then turn them back on only after tweaking the values. */ { OV7670_R13_COM8, OV7670_COM8_FASTAEC | OV7670_COM8_AECSTEP | OV7670_COM8_BFILT }, { OV7670_R00_GAIN, 0x00 }, { OV7670_R10_AECH, 0x00 }, { OV7670_R0D_COM4, 0x40 }, /* magic reserved bit */ { OV7670_R14_COM9, 0x18 }, /* 4x gain + magic rsvd bit */ { OV7670_RA5_BD50MAX, 0x05 }, { OV7670_RAB_BD60MAX, 0x07 }, { OV7670_R24_AEW, 0x95 }, { OV7670_R25_AEB, 0x33 }, { OV7670_R26_VPT, 0xe3 }, { OV7670_R9F_HAECC1, 0x78 }, { OV7670_RA0_HAECC2, 0x68 }, { 0xa1, 0x03 }, /* magic */ { OV7670_RA6_HAECC3, 0xd8 }, { OV7670_RA7_HAECC4, 0xd8 }, { OV7670_RA8_HAECC5, 0xf0 }, { OV7670_RA9_HAECC6, 0x90 }, { OV7670_RAA_HAECC7, 0x94 }, { OV7670_R13_COM8, OV7670_COM8_FASTAEC | OV7670_COM8_AECSTEP | OV7670_COM8_BFILT | OV7670_COM8_AGC | OV7670_COM8_AEC }, /* Almost all of these are magic "reserved" values. */ { OV7670_R0E_COM5, 0x61 }, { OV7670_R0F_COM6, 0x4b }, { 0x16, 0x02 }, { OV7670_R1E_MVFP, 0x07 }, { 0x21, 0x02 }, { 0x22, 0x91 }, { 0x29, 0x07 }, { 0x33, 0x0b }, { 0x35, 0x0b }, { 0x37, 0x1d }, { 0x38, 0x71 }, { 0x39, 0x2a }, { OV7670_R3C_COM12, 0x78 }, { 0x4d, 0x40 }, { 0x4e, 0x20 }, { OV7670_R69_GFIX, 0x00 }, { 0x6b, 0x4a }, { 0x74, 0x10 }, { 0x8d, 0x4f }, { 0x8e, 0x00 }, { 0x8f, 0x00 }, { 0x90, 0x00 }, { 0x91, 0x00 }, { 0x96, 0x00 }, { 0x9a, 0x00 }, { 0xb0, 0x84 }, { 0xb1, 0x0c }, { 0xb2, 0x0e }, { 0xb3, 0x82 }, { 0xb8, 0x0a }, /* More reserved magic, some of which tweaks white balance */ { 0x43, 0x0a }, { 0x44, 0xf0 }, { 0x45, 0x34 }, { 0x46, 0x58 }, { 0x47, 0x28 }, { 0x48, 0x3a }, { 0x59, 0x88 }, { 0x5a, 0x88 }, { 0x5b, 0x44 }, { 0x5c, 0x67 }, { 0x5d, 0x49 }, { 0x5e, 0x0e }, { 0x6c, 0x0a }, { 0x6d, 0x55 }, { 0x6e, 0x11 }, { 0x6f, 0x9f }, /* "9e for advance AWB" */ { 0x6a, 0x40 }, { OV7670_R01_BLUE, 0x40 }, { OV7670_R02_RED, 0x60 }, { OV7670_R13_COM8, OV7670_COM8_FASTAEC | OV7670_COM8_AECSTEP | OV7670_COM8_BFILT | OV7670_COM8_AGC | OV7670_COM8_AEC | OV7670_COM8_AWB }, /* Matrix coefficients */ { 0x4f, 0x80 }, { 0x50, 0x80 }, { 0x51, 0x00 }, { 0x52, 0x22 }, { 0x53, 0x5e }, { 0x54, 0x80 }, { 0x58, 0x9e }, { OV7670_R41_COM16, OV7670_COM16_AWBGAIN }, { OV7670_R3F_EDGE, 0x00 }, { 0x75, 0x05 }, { 0x76, 0xe1 }, { 0x4c, 0x00 }, { 0x77, 0x01 }, { OV7670_R3D_COM13, OV7670_COM13_GAMMA | OV7670_COM13_UVSAT | 2}, /* was 3 */ { 0x4b, 0x09 }, { 0xc9, 0x60 }, { OV7670_R41_COM16, 0x38 }, { 0x56, 0x40 }, { 0x34, 0x11 }, { OV7670_R3B_COM11, OV7670_COM11_EXP|OV7670_COM11_HZAUTO }, { 0xa4, 0x88 }, { 0x96, 0x00 }, { 0x97, 0x30 }, { 0x98, 0x20 }, { 0x99, 0x30 }, { 0x9a, 0x84 }, { 0x9b, 0x29 }, { 0x9c, 0x03 }, { 0x9d, 0x4c }, { 0x9e, 0x3f }, { 0x78, 0x04 }, /* Extra-weird stuff.
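 * Each { 0x79, index } write below seems to select an internal address
 * and the following { 0xc8, value } write sets its data (an observation
 * from the pairs themselves, not from documentation).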
Some sort of multiplexor register */ { 0x79, 0x01 }, { 0xc8, 0xf0 }, { 0x79, 0x0f }, { 0xc8, 0x00 }, { 0x79, 0x10 }, { 0xc8, 0x7e }, { 0x79, 0x0a }, { 0xc8, 0x80 }, { 0x79, 0x0b }, { 0xc8, 0x01 }, { 0x79, 0x0c }, { 0xc8, 0x0f }, { 0x79, 0x0d }, { 0xc8, 0x20 }, { 0x79, 0x09 }, { 0xc8, 0x80 }, { 0x79, 0x02 }, { 0xc8, 0xc0 }, { 0x79, 0x03 }, { 0xc8, 0x40 }, { 0x79, 0x05 }, { 0xc8, 0x30 }, { 0x79, 0x26 }, }; static const struct ov_i2c_regvals norm_8610[] = { { 0x12, 0x80 }, { 0x00, 0x00 }, { 0x01, 0x80 }, { 0x02, 0x80 }, { 0x03, 0xc0 }, { 0x04, 0x30 }, { 0x05, 0x30 }, /* was 0x10, new from windrv 090403 */ { 0x06, 0x70 }, /* was 0x80, new from windrv 090403 */ { 0x0a, 0x86 }, { 0x0b, 0xb0 }, { 0x0c, 0x20 }, { 0x0d, 0x20 }, { 0x11, 0x01 }, { 0x12, 0x25 }, { 0x13, 0x01 }, { 0x14, 0x04 }, { 0x15, 0x01 }, /* Lin and Win think different about UV order */ { 0x16, 0x03 }, { 0x17, 0x38 }, /* was 0x2f, new from windrv 090403 */ { 0x18, 0xea }, /* was 0xcf, new from windrv 090403 */ { 0x19, 0x02 }, /* was 0x06, new from windrv 090403 */ { 0x1a, 0xf5 }, { 0x1b, 0x00 }, { 0x20, 0xd0 }, /* was 0x90, new from windrv 090403 */ { 0x23, 0xc0 }, /* was 0x00, new from windrv 090403 */ { 0x24, 0x30 }, /* was 0x1d, new from windrv 090403 */ { 0x25, 0x50 }, /* was 0x57, new from windrv 090403 */ { 0x26, 0xa2 }, { 0x27, 0xea }, { 0x28, 0x00 }, { 0x29, 0x00 }, { 0x2a, 0x80 }, { 0x2b, 0xc8 }, /* was 0xcc, new from windrv 090403 */ { 0x2c, 0xac }, { 0x2d, 0x45 }, /* was 0xd5, new from windrv 090403 */ { 0x2e, 0x80 }, { 0x2f, 0x14 }, /* was 0x01, new from windrv 090403 */ { 0x4c, 0x00 }, { 0x4d, 0x30 }, /* was 0x10, new from windrv 090403 */ { 0x60, 0x02 }, /* was 0x01, new from windrv 090403 */ { 0x61, 0x00 }, /* was 0x09, new from windrv 090403 */ { 0x62, 0x5f }, /* was 0xd7, new from windrv 090403 */ { 0x63, 0xff }, { 0x64, 0x53 }, /* new windrv 090403 says 0x57, * maybe that's wrong */ { 0x65, 0x00 }, { 0x66, 0x55 }, { 0x67, 0xb0 }, { 0x68, 0xc0 }, /* was 0xaf, new from windrv 090403 */ { 0x69, 0x02 }, { 0x6a, 0x22 }, { 0x6b, 0x00 }, { 0x6c, 0x99 }, /* was 0x80, old windrv says 0x00, but * deleting bit7 colors the first images red */ { 0x6d, 0x11 }, /* was 0x00, new from windrv 090403 */ { 0x6e, 0x11 }, /* was 0x00, new from windrv 090403 */ { 0x6f, 0x01 }, { 0x70, 0x8b }, { 0x71, 0x00 }, { 0x72, 0x14 }, { 0x73, 0x54 }, { 0x74, 0x00 },/* 0x60? 
- was 0x00, new from windrv 090403 */ { 0x75, 0x0e }, { 0x76, 0x02 }, /* was 0x02, new from windrv 090403 */ { 0x77, 0xff }, { 0x78, 0x80 }, { 0x79, 0x80 }, { 0x7a, 0x80 }, { 0x7b, 0x10 }, /* was 0x13, new from windrv 090403 */ { 0x7c, 0x00 }, { 0x7d, 0x08 }, /* was 0x09, new from windrv 090403 */ { 0x7e, 0x08 }, /* was 0xc0, new from windrv 090403 */ { 0x7f, 0xfb }, { 0x80, 0x28 }, { 0x81, 0x00 }, { 0x82, 0x23 }, { 0x83, 0x0b }, { 0x84, 0x00 }, { 0x85, 0x62 }, /* was 0x61, new from windrv 090403 */ { 0x86, 0xc9 }, { 0x87, 0x00 }, { 0x88, 0x00 }, { 0x89, 0x01 }, { 0x12, 0x20 }, { 0x12, 0x25 }, /* was 0x24, new from windrv 090403 */ }; /* Convert a 0..255 value (128 = neutral) to the sign-magnitude format used by some ov7670 registers, e.g. brightness */ static unsigned char ov7670_abs_to_sm(unsigned char v) { if (v > 127) return v & 0x7f; return (128 - v) | 0x80; } /* Write an OV519 register */ static void reg_w(struct sd *sd, u16 index, u16 value) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int ret, req = 0; if (sd->gspca_dev.usb_err < 0) return; /* Avoid things going too fast for the bridge with an xhci host */ udelay(150); switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: req = 2; break; case BRIDGE_OVFX2: req = 0x0a; fallthrough; case BRIDGE_W9968CF: gspca_dbg(gspca_dev, D_USBO, "SET %02x %04x %04x\n", req, value, index); ret = usb_control_msg(sd->gspca_dev.dev, usb_sndctrlpipe(sd->gspca_dev.dev, 0), req, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, value, index, NULL, 0, 500); goto leave; default: req = 1; } gspca_dbg(gspca_dev, D_USBO, "SET %02x 0000 %04x %02x\n", req, index, value); sd->gspca_dev.usb_buf[0] = value; ret = usb_control_msg(sd->gspca_dev.dev, usb_sndctrlpipe(sd->gspca_dev.dev, 0), req, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0, index, sd->gspca_dev.usb_buf, 1, 500); leave: if (ret < 0) { gspca_err(gspca_dev, "reg_w %02x failed %d\n", index, ret); sd->gspca_dev.usb_err = ret; return; } } /* Read from an OV519 register; note: not valid for the w9968cf! */ /* returns: negative is error, pos or zero is data */ static int reg_r(struct sd *sd, u16 index) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int ret; int req; if (sd->gspca_dev.usb_err < 0) return -1; switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: req = 3; break; case BRIDGE_OVFX2: req = 0x0b; break; default: req = 1; } /* Avoid things going too fast for the bridge with an xhci host */ udelay(150); ret = usb_control_msg(sd->gspca_dev.dev, usb_rcvctrlpipe(sd->gspca_dev.dev, 0), req, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0, index, sd->gspca_dev.usb_buf, 1, 500); if (ret >= 0) { ret = sd->gspca_dev.usb_buf[0]; gspca_dbg(gspca_dev, D_USBI, "GET %02x 0000 %04x %02x\n", req, index, ret); } else { gspca_err(gspca_dev, "reg_r %02x failed %d\n", index, ret); sd->gspca_dev.usb_err = ret; /* * Make sure the result is zeroed to avoid uninitialized * values. */ gspca_dev->usb_buf[0] = 0; } return ret; } /* Read 8 values from an OV519 register */ static int reg_r8(struct sd *sd, u16 index) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int ret; if (sd->gspca_dev.usb_err < 0) return -1; /* Avoid things going too fast for the bridge with an xhci host */ udelay(150); ret = usb_control_msg(sd->gspca_dev.dev, usb_rcvctrlpipe(sd->gspca_dev.dev, 0), 1, /* REQ_IO */ USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0, index, sd->gspca_dev.usb_buf, 8, 500); if (ret >= 0) { ret = sd->gspca_dev.usb_buf[0]; } else { gspca_err(gspca_dev, "reg_r8 %02x failed %d\n", index, ret); sd->gspca_dev.usb_err = ret; /* * Make sure the buffer is zeroed to avoid uninitialized * values.
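 * (The callers below, e.g. ov518_i2c_w(), ignore the byte read and
 * only rely on the shared usb_buf holding defined data.)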
*/ memset(gspca_dev->usb_buf, 0, 8); } return ret; } /* * Writes bits at positions specified by mask to an OV51x reg. Bits that are in * the same position as 1's in "mask" are cleared and set to "value". Bits * that are in the same position as 0's in "mask" are preserved, regardless * of their respective state in "value". */ static void reg_w_mask(struct sd *sd, u16 index, u8 value, u8 mask) { int ret; u8 oldval; if (mask != 0xff) { value &= mask; /* Enforce mask on value */ ret = reg_r(sd, index); if (ret < 0) return; oldval = ret & ~mask; /* Clear the masked bits */ value |= oldval; /* Set the desired bits */ } reg_w(sd, index, value); } /* * Writes an n-byte value to a single register. Only valid with certain * registers (0x30 and 0xc4 - 0xce). */ static void ov518_reg_w32(struct sd *sd, u16 index, u32 value, int n) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int ret; if (sd->gspca_dev.usb_err < 0) return; *((__le32 *) sd->gspca_dev.usb_buf) = __cpu_to_le32(value); /* Avoid things going too fast for the bridge with an xhci host */ udelay(150); ret = usb_control_msg(sd->gspca_dev.dev, usb_sndctrlpipe(sd->gspca_dev.dev, 0), 1 /* REQ_IO */, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0, index, sd->gspca_dev.usb_buf, n, 500); if (ret < 0) { gspca_err(gspca_dev, "reg_w32 %02x failed %d\n", index, ret); sd->gspca_dev.usb_err = ret; } } static void ov511_i2c_w(struct sd *sd, u8 reg, u8 value) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int rc, retries; gspca_dbg(gspca_dev, D_USBO, "ov511_i2c_w %02x %02x\n", reg, value); /* Three byte write cycle */ for (retries = 6; ; ) { /* Select camera register */ reg_w(sd, R51x_I2C_SADDR_3, reg); /* Write "value" to I2C data port of OV511 */ reg_w(sd, R51x_I2C_DATA, value); /* Initiate 3-byte write cycle */ reg_w(sd, R511_I2C_CTL, 0x01); do { rc = reg_r(sd, R511_I2C_CTL); } while (rc > 0 && ((rc & 1) == 0)); /* Retry until idle */ if (rc < 0) return; if ((rc & 2) == 0) /* Ack? */ break; if (--retries < 0) { gspca_dbg(gspca_dev, D_USBO, "i2c write retries exhausted\n"); return; } } } static int ov511_i2c_r(struct sd *sd, u8 reg) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int rc, value, retries; /* Two byte write cycle */ for (retries = 6; ; ) { /* Select camera register */ reg_w(sd, R51x_I2C_SADDR_2, reg); /* Initiate 2-byte write cycle */ reg_w(sd, R511_I2C_CTL, 0x03); do { rc = reg_r(sd, R511_I2C_CTL); } while (rc > 0 && ((rc & 1) == 0)); /* Retry until idle */ if (rc < 0) return rc; if ((rc & 2) == 0) /* Ack? */ break; /* I2C abort */ reg_w(sd, R511_I2C_CTL, 0x10); if (--retries < 0) { gspca_dbg(gspca_dev, D_USBI, "i2c write retries exhausted\n"); return -1; } } /* Two byte read cycle */ for (retries = 6; ; ) { /* Initiate 2-byte read cycle */ reg_w(sd, R511_I2C_CTL, 0x05); do { rc = reg_r(sd, R511_I2C_CTL); } while (rc > 0 && ((rc & 1) == 0)); /* Retry until idle */ if (rc < 0) return rc; if ((rc & 2) == 0) /* Ack? */ break; /* I2C abort */ reg_w(sd, R511_I2C_CTL, 0x10); if (--retries < 0) { gspca_dbg(gspca_dev, D_USBI, "i2c read retries exhausted\n"); return -1; } } value = reg_r(sd, R51x_I2C_DATA); gspca_dbg(gspca_dev, D_USBI, "ov511_i2c_r %02x %02x\n", reg, value); /* This is needed to make i2c_w() work */ reg_w(sd, R511_I2C_CTL, 0x05); return value; } /* * The OV518 I2C I/O procedure is different, hence this separate function. * This is normally only called from i2c_w(). Note that this function * always succeeds regardless of whether the sensor is present and working.
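 *
 * Because no ack status is ever reported, sensor presence cannot be
 * inferred from these writes; init_ov_sensor() below instead reads back
 * the sensor ID registers to verify that a sensor actually answered.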
*/ static void ov518_i2c_w(struct sd *sd, u8 reg, u8 value) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; gspca_dbg(gspca_dev, D_USBO, "ov518_i2c_w %02x %02x\n", reg, value); /* Select camera register */ reg_w(sd, R51x_I2C_SADDR_3, reg); /* Write "value" to I2C data port of the OV518 */ reg_w(sd, R51x_I2C_DATA, value); /* Initiate 3-byte write cycle */ reg_w(sd, R518_I2C_CTL, 0x01); /* wait for write complete */ msleep(4); reg_r8(sd, R518_I2C_CTL); } /* * returns: negative is error, pos or zero is data * * The OV518 I2C I/O procedure is different, hence this separate function. * This is normally only called from i2c_r(). Note that this function * always succeeds regardless of whether the sensor is present and working. */ static int ov518_i2c_r(struct sd *sd, u8 reg) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int value; /* Select camera register */ reg_w(sd, R51x_I2C_SADDR_2, reg); /* Initiate 2-byte write cycle */ reg_w(sd, R518_I2C_CTL, 0x03); reg_r8(sd, R518_I2C_CTL); /* Initiate 2-byte read cycle */ reg_w(sd, R518_I2C_CTL, 0x05); reg_r8(sd, R518_I2C_CTL); value = reg_r(sd, R51x_I2C_DATA); gspca_dbg(gspca_dev, D_USBI, "ov518_i2c_r %02x %02x\n", reg, value); return value; } static void ovfx2_i2c_w(struct sd *sd, u8 reg, u8 value) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int ret; if (sd->gspca_dev.usb_err < 0) return; ret = usb_control_msg(sd->gspca_dev.dev, usb_sndctrlpipe(sd->gspca_dev.dev, 0), 0x02, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, (u16) value, (u16) reg, NULL, 0, 500); if (ret < 0) { gspca_err(gspca_dev, "ovfx2_i2c_w %02x failed %d\n", reg, ret); sd->gspca_dev.usb_err = ret; } gspca_dbg(gspca_dev, D_USBO, "ovfx2_i2c_w %02x %02x\n", reg, value); } static int ovfx2_i2c_r(struct sd *sd, u8 reg) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int ret; if (sd->gspca_dev.usb_err < 0) return -1; ret = usb_control_msg(sd->gspca_dev.dev, usb_rcvctrlpipe(sd->gspca_dev.dev, 0), 0x03, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0, (u16) reg, sd->gspca_dev.usb_buf, 1, 500); if (ret >= 0) { ret = sd->gspca_dev.usb_buf[0]; gspca_dbg(gspca_dev, D_USBI, "ovfx2_i2c_r %02x %02x\n", reg, ret); } else { gspca_err(gspca_dev, "ovfx2_i2c_r %02x failed %d\n", reg, ret); sd->gspca_dev.usb_err = ret; } return ret; } static void i2c_w(struct sd *sd, u8 reg, u8 value) { if (sd->sensor_reg_cache[reg] == value) return; switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: ov511_i2c_w(sd, reg, value); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: case BRIDGE_OV519: ov518_i2c_w(sd, reg, value); break; case BRIDGE_OVFX2: ovfx2_i2c_w(sd, reg, value); break; case BRIDGE_W9968CF: w9968cf_i2c_w(sd, reg, value); break; } if (sd->gspca_dev.usb_err >= 0) { /* Upon sensor reset, empty the register cache */ if (reg == 0x12 && (value & 0x80)) memset(sd->sensor_reg_cache, -1, sizeof(sd->sensor_reg_cache)); else sd->sensor_reg_cache[reg] = value; } } static int i2c_r(struct sd *sd, u8 reg) { int ret = -1; if (sd->sensor_reg_cache[reg] != -1) return sd->sensor_reg_cache[reg]; switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: ret = ov511_i2c_r(sd, reg); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: case BRIDGE_OV519: ret = ov518_i2c_r(sd, reg); break; case BRIDGE_OVFX2: ret = ovfx2_i2c_r(sd, reg); break; case BRIDGE_W9968CF: ret = w9968cf_i2c_r(sd, reg); break; } if (ret >= 0) sd->sensor_reg_cache[reg] = ret; return ret; } /* Writes bits at positions specified by mask to an I2C reg.
Bits that are in * the same position as 1's in "mask" are cleared and set to "value". Bits * that are in the same position as 0's in "mask" are preserved, regardless * of their respective state in "value". */ static void i2c_w_mask(struct sd *sd, u8 reg, u8 value, u8 mask) { int rc; u8 oldval; value &= mask; /* Enforce mask on value */ rc = i2c_r(sd, reg); if (rc < 0) return; oldval = rc & ~mask; /* Clear the masked bits */ value |= oldval; /* Set the desired bits */ i2c_w(sd, reg, value); } /* Temporarily stops OV511 from functioning. Must do this before changing * registers while the camera is streaming */ static inline void ov51x_stop(struct sd *sd) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; gspca_dbg(gspca_dev, D_STREAM, "stopping\n"); sd->stopped = 1; switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: reg_w(sd, R51x_SYS_RESET, 0x3d); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: reg_w_mask(sd, R51x_SYS_RESET, 0x3a, 0x3a); break; case BRIDGE_OV519: reg_w(sd, OV519_R51_RESET1, 0x0f); reg_w(sd, OV519_R51_RESET1, 0x00); reg_w(sd, 0x22, 0x00); /* FRAR */ break; case BRIDGE_OVFX2: reg_w_mask(sd, 0x0f, 0x00, 0x02); break; case BRIDGE_W9968CF: reg_w(sd, 0x3c, 0x0a05); /* stop USB transfer */ break; } } /* Restarts the OV511 after ov51x_stop() is called. Has no effect if it is not * actually stopped (for performance). */ static inline void ov51x_restart(struct sd *sd) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; gspca_dbg(gspca_dev, D_STREAM, "restarting\n"); if (!sd->stopped) return; sd->stopped = 0; /* Reinitialize the stream */ switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: reg_w(sd, R51x_SYS_RESET, 0x00); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: reg_w(sd, 0x2f, 0x80); reg_w(sd, R51x_SYS_RESET, 0x00); break; case BRIDGE_OV519: reg_w(sd, OV519_R51_RESET1, 0x0f); reg_w(sd, OV519_R51_RESET1, 0x00); reg_w(sd, 0x22, 0x1d); /* FRAR */ break; case BRIDGE_OVFX2: reg_w_mask(sd, 0x0f, 0x02, 0x02); break; case BRIDGE_W9968CF: reg_w(sd, 0x3c, 0x8a05); /* USB FIFO enable */ break; } } static void ov51x_set_slave_ids(struct sd *sd, u8 slave); /* This does an initial reset of an OmniVision sensor and ensures that I2C * is synchronized. Returns <0 on failure. */ static int init_ov_sensor(struct sd *sd, u8 slave) { int i; struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; ov51x_set_slave_ids(sd, slave); /* Reset the sensor */ i2c_w(sd, 0x12, 0x80); /* Wait for it to initialize */ msleep(150); for (i = 0; i < i2c_detect_tries; i++) { if (i2c_r(sd, OV7610_REG_ID_HIGH) == 0x7f && i2c_r(sd, OV7610_REG_ID_LOW) == 0xa2) { gspca_dbg(gspca_dev, D_PROBE, "I2C synced in %d attempt(s)\n", i); return 0; } /* Reset the sensor */ i2c_w(sd, 0x12, 0x80); /* Wait for it to initialize */ msleep(150); /* Dummy read to sync I2C */ if (i2c_r(sd, 0x00) < 0) return -1; } return -1; } /* Set the read and write slave IDs. The "slave" argument is the write slave, * and the read slave will be set to (slave + 1). * This should not be called from outside the i2c I/O functions.
*/ static void ov51x_set_slave_ids(struct sd *sd, u8 slave) { switch (sd->bridge) { case BRIDGE_OVFX2: reg_w(sd, OVFX2_I2C_ADDR, slave); return; case BRIDGE_W9968CF: sd->sensor_addr = slave; return; } reg_w(sd, R51x_I2C_W_SID, slave); reg_w(sd, R51x_I2C_R_SID, slave + 1); } static void write_regvals(struct sd *sd, const struct ov_regvals *regvals, int n) { while (--n >= 0) { reg_w(sd, regvals->reg, regvals->val); regvals++; } } static void write_i2c_regvals(struct sd *sd, const struct ov_i2c_regvals *regvals, int n) { while (--n >= 0) { i2c_w(sd, regvals->reg, regvals->val); regvals++; } } /**************************************************************************** * * OV511 and sensor configuration * ***************************************************************************/ /* This initializes the OV2x10 / OV3610 / OV3620 / OV9600 */ static void ov_hires_configure(struct sd *sd) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int high, low; if (sd->bridge != BRIDGE_OVFX2) { gspca_err(gspca_dev, "error hires sensors only supported with ovfx2\n"); return; } gspca_dbg(gspca_dev, D_PROBE, "starting ov hires configuration\n"); /* Detect sensor (sub)type */ high = i2c_r(sd, 0x0a); low = i2c_r(sd, 0x0b); /* info("%x, %x", high, low); */ switch (high) { case 0x96: switch (low) { case 0x40: gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV2610\n"); sd->sensor = SEN_OV2610; return; case 0x41: gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV2610AE\n"); sd->sensor = SEN_OV2610AE; return; case 0xb1: gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV9600\n"); sd->sensor = SEN_OV9600; return; } break; case 0x36: if ((low & 0x0f) == 0x00) { gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV3610\n"); sd->sensor = SEN_OV3610; return; } break; } gspca_err(gspca_dev, "Error unknown sensor type: %02x%02x\n", high, low); } /* This initializes the OV8110 and OV8610 sensors. The OV8110 uses * the same register settings as the OV8610, since they are very similar. */ static void ov8xx0_configure(struct sd *sd) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int rc; gspca_dbg(gspca_dev, D_PROBE, "starting ov8xx0 configuration\n"); /* Detect sensor (sub)type */ rc = i2c_r(sd, OV7610_REG_COM_I); if (rc < 0) { gspca_err(gspca_dev, "Error detecting sensor type\n"); return; } if ((rc & 3) == 1) sd->sensor = SEN_OV8610; else gspca_err(gspca_dev, "Unknown image sensor version: %d\n", rc & 3); } /* This initializes the OV7610, OV7620, or OV76BE sensor. The OV76BE uses * the same register settings as the OV7610, since they are very similar. */ static void ov7xx0_configure(struct sd *sd) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int rc, high, low; gspca_dbg(gspca_dev, D_PROBE, "starting OV7xx0 configuration\n"); /* Detect sensor (sub)type */ rc = i2c_r(sd, OV7610_REG_COM_I); /* add OV7670 here * it appears to be wrongly detected as a 7610 by default */ if (rc < 0) { gspca_err(gspca_dev, "Error detecting sensor type\n"); return; } if ((rc & 3) == 3) { /* quick hack to make OV7670s work */ high = i2c_r(sd, 0x0a); low = i2c_r(sd, 0x0b); /* info("%x, %x", high, low); */ if (high == 0x76 && (low & 0xf0) == 0x70) { gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV76%02x\n", low); sd->sensor = SEN_OV7670; } else { gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV7610\n"); sd->sensor = SEN_OV7610; } } else if ((rc & 3) == 1) { /* I don't know what's different about the 76BE yet.
*/ if (i2c_r(sd, 0x15) & 1) { gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV7620AE\n"); sd->sensor = SEN_OV7620AE; } else { gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV76BE\n"); sd->sensor = SEN_OV76BE; } } else if ((rc & 3) == 0) { /* try to read product id registers */ high = i2c_r(sd, 0x0a); if (high < 0) { gspca_err(gspca_dev, "Error detecting camera chip PID\n"); return; } low = i2c_r(sd, 0x0b); if (low < 0) { gspca_err(gspca_dev, "Error detecting camera chip VER\n"); return; } if (high == 0x76) { switch (low) { case 0x30: gspca_err(gspca_dev, "Sensor is an OV7630/OV7635\n"); gspca_err(gspca_dev, "7630 is not supported by this driver\n"); return; case 0x40: gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV7645\n"); sd->sensor = SEN_OV7640; /* FIXME */ break; case 0x45: gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV7645B\n"); sd->sensor = SEN_OV7640; /* FIXME */ break; case 0x48: gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV7648\n"); sd->sensor = SEN_OV7648; break; case 0x60: gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV7660\n"); sd->sensor = SEN_OV7660; break; default: gspca_err(gspca_dev, "Unknown sensor: 0x76%02x\n", low); return; } } else { gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV7620\n"); sd->sensor = SEN_OV7620; } } else { gspca_err(gspca_dev, "Unknown image sensor version: %d\n", rc & 3); } } /* This initializes the OV6620, OV6630, OV6630AE, or OV6630AF sensor. */ static void ov6xx0_configure(struct sd *sd) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int rc; gspca_dbg(gspca_dev, D_PROBE, "starting OV6xx0 configuration\n"); /* Detect sensor (sub)type */ rc = i2c_r(sd, OV7610_REG_COM_I); if (rc < 0) { gspca_err(gspca_dev, "Error detecting sensor type\n"); return; } /* Ugh. The first two bits are the version bits, but * the entire register value must be used. I guess OVT * underestimated how many variants they would make. */ switch (rc) { case 0x00: sd->sensor = SEN_OV6630; pr_warn("WARNING: Sensor is an OV66308. Your camera may have been misdetected in previous driver versions.\n"); break; case 0x01: sd->sensor = SEN_OV6620; gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV6620\n"); break; case 0x02: sd->sensor = SEN_OV6630; gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV66308AE\n"); break; case 0x03: sd->sensor = SEN_OV66308AF; gspca_dbg(gspca_dev, D_PROBE, "Sensor is an OV66308AF\n"); break; case 0x90: sd->sensor = SEN_OV6630; pr_warn("WARNING: Sensor is an OV66307. Your camera may have been misdetected in previous driver versions.\n"); break; default: gspca_err(gspca_dev, "FATAL: Unknown sensor version: 0x%02x\n", rc); return; } /* Set sensor-specific vars */ sd->sif = 1; } /* Turns on or off the LED.
Only has an effect with OV511+/OV518(+)/OV519 */ static void ov51x_led_control(struct sd *sd, int on) { if (sd->invert_led) on = !on; switch (sd->bridge) { /* OV511 has no LED control */ case BRIDGE_OV511PLUS: reg_w(sd, R511_SYS_LED_CTL, on); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: reg_w_mask(sd, R518_GPIO_OUT, 0x02 * on, 0x02); break; case BRIDGE_OV519: reg_w_mask(sd, OV519_GPIO_DATA_OUT0, on, 1); break; } } static void sd_reset_snapshot(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; if (!sd->snapshot_needs_reset) return; /* Note it is important that we clear sd->snapshot_needs_reset, before actually clearing the snapshot state in the bridge otherwise we might race with the pkt_scan interrupt handler */ sd->snapshot_needs_reset = 0; switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: reg_w(sd, R51x_SYS_SNAP, 0x02); reg_w(sd, R51x_SYS_SNAP, 0x00); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: reg_w(sd, R51x_SYS_SNAP, 0x02); /* Reset */ reg_w(sd, R51x_SYS_SNAP, 0x01); /* Enable */ break; case BRIDGE_OV519: reg_w(sd, R51x_SYS_RESET, 0x40); reg_w(sd, R51x_SYS_RESET, 0x00); break; } } static void ov51x_upload_quan_tables(struct sd *sd) { static const unsigned char yQuanTable511[] = { 0, 1, 1, 2, 2, 3, 3, 4, 1, 1, 1, 2, 2, 3, 4, 4, 1, 1, 2, 2, 3, 4, 4, 4, 2, 2, 2, 3, 4, 4, 4, 4, 2, 2, 3, 4, 4, 5, 5, 5, 3, 3, 4, 4, 5, 5, 5, 5, 3, 4, 4, 4, 5, 5, 5, 5, 4, 4, 4, 4, 5, 5, 5, 5 }; static const unsigned char uvQuanTable511[] = { 0, 2, 2, 3, 4, 4, 4, 4, 2, 2, 2, 4, 4, 4, 4, 4, 2, 2, 3, 4, 4, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4 }; /* OV518 quantization tables are 8x4 (instead of 8x8) */ static const unsigned char yQuanTable518[] = { 5, 4, 5, 6, 6, 7, 7, 7, 5, 5, 5, 5, 6, 7, 7, 7, 6, 6, 6, 6, 7, 7, 7, 8, 7, 7, 6, 7, 7, 7, 8, 8 }; static const unsigned char uvQuanTable518[] = { 6, 6, 6, 7, 7, 7, 7, 7, 6, 6, 6, 7, 7, 7, 7, 7, 6, 6, 6, 7, 7, 7, 7, 8, 7, 7, 7, 7, 7, 7, 8, 8 }; struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; const unsigned char *pYTable, *pUVTable; unsigned char val0, val1; int i, size, reg = R51x_COMP_LUT_BEGIN; gspca_dbg(gspca_dev, D_PROBE, "Uploading quantization tables\n"); if (sd->bridge == BRIDGE_OV511 || sd->bridge == BRIDGE_OV511PLUS) { pYTable = yQuanTable511; pUVTable = uvQuanTable511; size = 32; } else { pYTable = yQuanTable518; pUVTable = uvQuanTable518; size = 16; } for (i = 0; i < size; i++) { val0 = *pYTable++; val1 = *pYTable++; val0 &= 0x0f; val1 &= 0x0f; val0 |= val1 << 4; reg_w(sd, reg, val0); val0 = *pUVTable++; val1 = *pUVTable++; val0 &= 0x0f; val1 &= 0x0f; val0 |= val1 << 4; reg_w(sd, reg + size, val0); reg++; } } /* This initializes the OV511/OV511+ and the sensor */ static void ov511_configure(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; /* For 511 and 511+ */ static const struct ov_regvals init_511[] = { { R51x_SYS_RESET, 0x7f }, { R51x_SYS_INIT, 0x01 }, { R51x_SYS_RESET, 0x7f }, { R51x_SYS_INIT, 0x01 }, { R51x_SYS_RESET, 0x3f }, { R51x_SYS_INIT, 0x01 }, { R51x_SYS_RESET, 0x3d }, }; static const struct ov_regvals norm_511[] = { { R511_DRAM_FLOW_CTL, 0x01 }, { R51x_SYS_SNAP, 0x00 }, { R51x_SYS_SNAP, 0x02 }, { R51x_SYS_SNAP, 0x00 }, { R511_FIFO_OPTS, 0x1f }, { R511_COMP_EN, 0x00 }, { R511_COMP_LUT_EN, 0x03 }, }; static const struct ov_regvals norm_511_p[] = { { R511_DRAM_FLOW_CTL, 0xff }, { R51x_SYS_SNAP, 0x00 }, { R51x_SYS_SNAP, 0x02 }, { R51x_SYS_SNAP, 0x00 }, { R511_FIFO_OPTS, 0xff }, { 
R511_COMP_EN, 0x00 }, { R511_COMP_LUT_EN, 0x03 }, }; static const struct ov_regvals compress_511[] = { { 0x70, 0x1f }, { 0x71, 0x05 }, { 0x72, 0x06 }, { 0x73, 0x06 }, { 0x74, 0x14 }, { 0x75, 0x03 }, { 0x76, 0x04 }, { 0x77, 0x04 }, }; gspca_dbg(gspca_dev, D_PROBE, "Device custom id %x\n", reg_r(sd, R51x_SYS_CUST_ID)); write_regvals(sd, init_511, ARRAY_SIZE(init_511)); switch (sd->bridge) { case BRIDGE_OV511: write_regvals(sd, norm_511, ARRAY_SIZE(norm_511)); break; case BRIDGE_OV511PLUS: write_regvals(sd, norm_511_p, ARRAY_SIZE(norm_511_p)); break; } /* Init compression */ write_regvals(sd, compress_511, ARRAY_SIZE(compress_511)); ov51x_upload_quan_tables(sd); } /* This initializes the OV518/OV518+ and the sensor */ static void ov518_configure(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; /* For 518 and 518+ */ static const struct ov_regvals init_518[] = { { R51x_SYS_RESET, 0x40 }, { R51x_SYS_INIT, 0xe1 }, { R51x_SYS_RESET, 0x3e }, { R51x_SYS_INIT, 0xe1 }, { R51x_SYS_RESET, 0x00 }, { R51x_SYS_INIT, 0xe1 }, { 0x46, 0x00 }, { 0x5d, 0x03 }, }; static const struct ov_regvals norm_518[] = { { R51x_SYS_SNAP, 0x02 }, /* Reset */ { R51x_SYS_SNAP, 0x01 }, /* Enable */ { 0x31, 0x0f }, { 0x5d, 0x03 }, { 0x24, 0x9f }, { 0x25, 0x90 }, { 0x20, 0x00 }, { 0x51, 0x04 }, { 0x71, 0x19 }, { 0x2f, 0x80 }, }; static const struct ov_regvals norm_518_p[] = { { R51x_SYS_SNAP, 0x02 }, /* Reset */ { R51x_SYS_SNAP, 0x01 }, /* Enable */ { 0x31, 0x0f }, { 0x5d, 0x03 }, { 0x24, 0x9f }, { 0x25, 0x90 }, { 0x20, 0x60 }, { 0x51, 0x02 }, { 0x71, 0x19 }, { 0x40, 0xff }, { 0x41, 0x42 }, { 0x46, 0x00 }, { 0x33, 0x04 }, { 0x21, 0x19 }, { 0x3f, 0x10 }, { 0x2f, 0x80 }, }; /* First 5 bits of custom ID reg are a revision ID on OV518 */ sd->revision = reg_r(sd, R51x_SYS_CUST_ID) & 0x1f; gspca_dbg(gspca_dev, D_PROBE, "Device revision %d\n", sd->revision); write_regvals(sd, init_518, ARRAY_SIZE(init_518)); /* Set LED GPIO pin to output mode */ reg_w_mask(sd, R518_GPIO_CTL, 0x00, 0x02); switch (sd->bridge) { case BRIDGE_OV518: write_regvals(sd, norm_518, ARRAY_SIZE(norm_518)); break; case BRIDGE_OV518PLUS: write_regvals(sd, norm_518_p, ARRAY_SIZE(norm_518_p)); break; } ov51x_upload_quan_tables(sd); reg_w(sd, 0x2f, 0x80); } static void ov519_configure(struct sd *sd) { static const struct ov_regvals init_519[] = { { 0x5a, 0x6d }, /* EnableSystem */ { 0x53, 0x9b }, /* don't enable the microcontroller */ { OV519_R54_EN_CLK1, 0xff }, /* set bit2 to enable jpeg */ { 0x5d, 0x03 }, { 0x49, 0x01 }, { 0x48, 0x00 }, /* Set LED pin to output mode. Bit 4 must be cleared or sensor * detection will fail. This deserves further investigation. 
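 * (0xee = 11101110b leaves bits 0 and 4 low; bit 0 plausibly is the LED
 * pin, which ov51x_led_control() above drives through bit 0 of
 * OV519_GPIO_DATA_OUT0. Why bit 4 matters is the open question noted
 * above.)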
 */
	{ OV519_GPIO_IO_CTRL0, 0xee },
	{ OV519_R51_RESET1, 0x0f },
	{ OV519_R51_RESET1, 0x00 },
	{ 0x22, 0x00 },		/* windows reads 0x55 at this point */
	};

	write_regvals(sd, init_519, ARRAY_SIZE(init_519));
}

static void ovfx2_configure(struct sd *sd)
{
	static const struct ov_regvals init_fx2[] = {
		{ 0x00, 0x60 },
		{ 0x02, 0x01 },
		{ 0x0f, 0x1d },
		{ 0xe9, 0x82 },
		{ 0xea, 0xc7 },
		{ 0xeb, 0x10 },
		{ 0xec, 0xf6 },
	};

	sd->stopped = 1;

	write_regvals(sd, init_fx2, ARRAY_SIZE(init_fx2));
}

/* set the mode */
/* This function works for ov7660 only */
static void ov519_set_mode(struct sd *sd)
{
	static const struct ov_regvals bridge_ov7660[2][10] = {
		{{0x10, 0x14}, {0x11, 0x1e}, {0x12, 0x00}, {0x13, 0x00},
		 {0x14, 0x00}, {0x15, 0x00}, {0x16, 0x00}, {0x20, 0x0c},
		 {0x25, 0x01}, {0x26, 0x00}},
		{{0x10, 0x28}, {0x11, 0x3c}, {0x12, 0x00}, {0x13, 0x00},
		 {0x14, 0x00}, {0x15, 0x00}, {0x16, 0x00}, {0x20, 0x0c},
		 {0x25, 0x03}, {0x26, 0x00}}
	};
	static const struct ov_i2c_regvals sensor_ov7660[2][3] = {
		{{0x12, 0x00}, {0x24, 0x00}, {0x0c, 0x0c}},
		{{0x12, 0x00}, {0x04, 0x00}, {0x0c, 0x00}}
	};
	static const struct ov_i2c_regvals sensor_ov7660_2[] = {
		{OV7670_R17_HSTART, 0x13},
		{OV7670_R18_HSTOP, 0x01},
		{OV7670_R32_HREF, 0x92},
		{OV7670_R19_VSTART, 0x02},
		{OV7670_R1A_VSTOP, 0x7a},
		{OV7670_R03_VREF, 0x00},
/*		{0x33, 0x00}, */
/*		{0x34, 0x07}, */
/*		{0x36, 0x00}, */
/*		{0x6b, 0x0a}, */
	};

	write_regvals(sd, bridge_ov7660[sd->gspca_dev.curr_mode],
			ARRAY_SIZE(bridge_ov7660[0]));
	write_i2c_regvals(sd, sensor_ov7660[sd->gspca_dev.curr_mode],
			ARRAY_SIZE(sensor_ov7660[0]));
	write_i2c_regvals(sd, sensor_ov7660_2, ARRAY_SIZE(sensor_ov7660_2));
}

/* set the frame rate */
/* This function works for sensors ov7640, ov7648, ov7660 and ov7670 only */
static void ov519_set_fr(struct sd *sd)
{
	int fr;
	u8 clock;
	/* frame rate table with indices:
	 *	- mode = 0: 320x240, 1: 640x480
	 *	- fr rate = 0: 30, 1: 25, 2: 20, 3: 15, 4: 10, 5: 5
	 *	- reg = 0: bridge a4, 1: bridge 23, 2: sensor 11 (clock)
	 */
	static const u8 fr_tb[2][6][3] = {
		{{0x04, 0xff, 0x00},
		 {0x04, 0x1f, 0x00},
		 {0x04, 0x1b, 0x00},
		 {0x04, 0x15, 0x00},
		 {0x04, 0x09, 0x00},
		 {0x04, 0x01, 0x00}},
		{{0x0c, 0xff, 0x00},
		 {0x0c, 0x1f, 0x00},
		 {0x0c, 0x1b, 0x00},
		 {0x04, 0xff, 0x01},
		 {0x04, 0x1f, 0x01},
		 {0x04, 0x1b, 0x01}},
	};

	if (frame_rate > 0)
		sd->frame_rate = frame_rate;
	if (sd->frame_rate >= 30)
		fr = 0;
	else if (sd->frame_rate >= 25)
		fr = 1;
	else if (sd->frame_rate >= 20)
		fr = 2;
	else if (sd->frame_rate >= 15)
		fr = 3;
	else if (sd->frame_rate >= 10)
		fr = 4;
	else
		fr = 5;

	reg_w(sd, 0xa4, fr_tb[sd->gspca_dev.curr_mode][fr][0]);
	reg_w(sd, 0x23, fr_tb[sd->gspca_dev.curr_mode][fr][1]);
	clock = fr_tb[sd->gspca_dev.curr_mode][fr][2];
	if (sd->sensor == SEN_OV7660)
		clock |= 0x80;		/* enable double clock */
	ov518_i2c_w(sd, OV7670_R11_CLKRC, clock);
}

static void setautogain(struct gspca_dev *gspca_dev, s32 val)
{
	struct sd *sd = (struct sd *) gspca_dev;

	i2c_w_mask(sd, 0x13, val ?
0x05 : 0x00, 0x05); } /* this function is called at probe time */ static int sd_config(struct gspca_dev *gspca_dev, const struct usb_device_id *id) { struct sd *sd = (struct sd *) gspca_dev; struct cam *cam = &gspca_dev->cam; sd->bridge = id->driver_info & BRIDGE_MASK; sd->invert_led = (id->driver_info & BRIDGE_INVERT_LED) != 0; switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: cam->cam_mode = ov511_vga_mode; cam->nmodes = ARRAY_SIZE(ov511_vga_mode); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: cam->cam_mode = ov518_vga_mode; cam->nmodes = ARRAY_SIZE(ov518_vga_mode); break; case BRIDGE_OV519: cam->cam_mode = ov519_vga_mode; cam->nmodes = ARRAY_SIZE(ov519_vga_mode); break; case BRIDGE_OVFX2: cam->cam_mode = ov519_vga_mode; cam->nmodes = ARRAY_SIZE(ov519_vga_mode); cam->bulk_size = OVFX2_BULK_SIZE; cam->bulk_nurbs = MAX_NURBS; cam->bulk = 1; break; case BRIDGE_W9968CF: cam->cam_mode = w9968cf_vga_mode; cam->nmodes = ARRAY_SIZE(w9968cf_vga_mode); break; } sd->frame_rate = 15; return 0; } /* this function is called at probe and resume time */ static int sd_init(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; struct cam *cam = &gspca_dev->cam; switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: ov511_configure(gspca_dev); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: ov518_configure(gspca_dev); break; case BRIDGE_OV519: ov519_configure(sd); break; case BRIDGE_OVFX2: ovfx2_configure(sd); break; case BRIDGE_W9968CF: w9968cf_configure(sd); break; } /* The OV519 must be more aggressive about sensor detection since * I2C write will never fail if the sensor is not present. We have * to try to initialize the sensor to detect its presence */ sd->sensor = -1; /* Test for 76xx */ if (init_ov_sensor(sd, OV7xx0_SID) >= 0) { ov7xx0_configure(sd); /* Test for 6xx0 */ } else if (init_ov_sensor(sd, OV6xx0_SID) >= 0) { ov6xx0_configure(sd); /* Test for 8xx0 */ } else if (init_ov_sensor(sd, OV8xx0_SID) >= 0) { ov8xx0_configure(sd); /* Test for 3xxx / 2xxx */ } else if (init_ov_sensor(sd, OV_HIRES_SID) >= 0) { ov_hires_configure(sd); } else { gspca_err(gspca_dev, "Can't determine sensor slave IDs\n"); goto error; } if (sd->sensor < 0) goto error; ov51x_led_control(sd, 0); /* turn LED off */ switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: if (sd->sif) { cam->cam_mode = ov511_sif_mode; cam->nmodes = ARRAY_SIZE(ov511_sif_mode); } break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: if (sd->sif) { cam->cam_mode = ov518_sif_mode; cam->nmodes = ARRAY_SIZE(ov518_sif_mode); } break; case BRIDGE_OV519: if (sd->sif) { cam->cam_mode = ov519_sif_mode; cam->nmodes = ARRAY_SIZE(ov519_sif_mode); } break; case BRIDGE_OVFX2: switch (sd->sensor) { case SEN_OV2610: case SEN_OV2610AE: cam->cam_mode = ovfx2_ov2610_mode; cam->nmodes = ARRAY_SIZE(ovfx2_ov2610_mode); break; case SEN_OV3610: cam->cam_mode = ovfx2_ov3610_mode; cam->nmodes = ARRAY_SIZE(ovfx2_ov3610_mode); break; case SEN_OV9600: cam->cam_mode = ovfx2_ov9600_mode; cam->nmodes = ARRAY_SIZE(ovfx2_ov9600_mode); break; default: if (sd->sif) { cam->cam_mode = ov519_sif_mode; cam->nmodes = ARRAY_SIZE(ov519_sif_mode); } break; } break; case BRIDGE_W9968CF: if (sd->sif) cam->nmodes = ARRAY_SIZE(w9968cf_vga_mode) - 1; /* w9968cf needs initialisation once the sensor is known */ w9968cf_init(sd); break; } /* initialize the sensor */ switch (sd->sensor) { case SEN_OV2610: write_i2c_regvals(sd, norm_2610, ARRAY_SIZE(norm_2610)); /* Enable autogain, autoexpo, awb, bandfilter */ i2c_w_mask(sd, 0x13, 0x27, 0x27); break; 
case SEN_OV2610AE: write_i2c_regvals(sd, norm_2610ae, ARRAY_SIZE(norm_2610ae)); /* enable autoexpo */ i2c_w_mask(sd, 0x13, 0x05, 0x05); break; case SEN_OV3610: write_i2c_regvals(sd, norm_3620b, ARRAY_SIZE(norm_3620b)); /* Enable autogain, autoexpo, awb, bandfilter */ i2c_w_mask(sd, 0x13, 0x27, 0x27); break; case SEN_OV6620: write_i2c_regvals(sd, norm_6x20, ARRAY_SIZE(norm_6x20)); break; case SEN_OV6630: case SEN_OV66308AF: write_i2c_regvals(sd, norm_6x30, ARRAY_SIZE(norm_6x30)); break; default: /* case SEN_OV7610: */ /* case SEN_OV76BE: */ write_i2c_regvals(sd, norm_7610, ARRAY_SIZE(norm_7610)); i2c_w_mask(sd, 0x0e, 0x00, 0x40); break; case SEN_OV7620: case SEN_OV7620AE: write_i2c_regvals(sd, norm_7620, ARRAY_SIZE(norm_7620)); break; case SEN_OV7640: case SEN_OV7648: write_i2c_regvals(sd, norm_7640, ARRAY_SIZE(norm_7640)); break; case SEN_OV7660: i2c_w(sd, OV7670_R12_COM7, OV7670_COM7_RESET); msleep(14); reg_w(sd, OV519_R57_SNAPSHOT, 0x23); write_regvals(sd, init_519_ov7660, ARRAY_SIZE(init_519_ov7660)); write_i2c_regvals(sd, norm_7660, ARRAY_SIZE(norm_7660)); sd->gspca_dev.curr_mode = 1; /* 640x480 */ ov519_set_mode(sd); ov519_set_fr(sd); sd_reset_snapshot(gspca_dev); ov51x_restart(sd); ov51x_stop(sd); /* not in win traces */ ov51x_led_control(sd, 0); break; case SEN_OV7670: write_i2c_regvals(sd, norm_7670, ARRAY_SIZE(norm_7670)); break; case SEN_OV8610: write_i2c_regvals(sd, norm_8610, ARRAY_SIZE(norm_8610)); break; case SEN_OV9600: write_i2c_regvals(sd, norm_9600, ARRAY_SIZE(norm_9600)); /* enable autoexpo */ /* i2c_w_mask(sd, 0x13, 0x05, 0x05); */ break; } return gspca_dev->usb_err; error: gspca_err(gspca_dev, "OV519 Config failed\n"); return -EINVAL; } /* function called at start time before URB creation */ static int sd_isoc_init(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; switch (sd->bridge) { case BRIDGE_OVFX2: if (gspca_dev->pixfmt.width != 800) gspca_dev->cam.bulk_size = OVFX2_BULK_SIZE; else gspca_dev->cam.bulk_size = 7 * 4096; break; } return 0; } /* Set up the OV511/OV511+ with the given image parameters. * * Do not put any sensor-specific code in here (including I2C I/O functions) */ static void ov511_mode_init_regs(struct sd *sd) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int hsegs, vsegs, packet_size, fps, needed; int interlaced = 0; struct usb_host_interface *alt; struct usb_interface *intf; intf = usb_ifnum_to_if(sd->gspca_dev.dev, sd->gspca_dev.iface); alt = usb_altnum_to_altsetting(intf, sd->gspca_dev.alt); if (!alt) { gspca_err(gspca_dev, "Couldn't get altsetting\n"); sd->gspca_dev.usb_err = -EIO; return; } if (alt->desc.bNumEndpoints < 1) { sd->gspca_dev.usb_err = -ENODEV; return; } packet_size = le16_to_cpu(alt->endpoint[0].desc.wMaxPacketSize); reg_w(sd, R51x_FIFO_PSIZE, packet_size >> 5); reg_w(sd, R511_CAM_UV_EN, 0x01); reg_w(sd, R511_SNAP_UV_EN, 0x01); reg_w(sd, R511_SNAP_OPTS, 0x03); /* Here I'm assuming that snapshot size == image size. * I hope that's always true. 
	   --claudio */
	hsegs = (sd->gspca_dev.pixfmt.width >> 3) - 1;
	vsegs = (sd->gspca_dev.pixfmt.height >> 3) - 1;

	reg_w(sd, R511_CAM_PXCNT, hsegs);
	reg_w(sd, R511_CAM_LNCNT, vsegs);
	reg_w(sd, R511_CAM_PXDIV, 0x00);
	reg_w(sd, R511_CAM_LNDIV, 0x00);

	/* YUV420, low pass filter on */
	reg_w(sd, R511_CAM_OPTS, 0x03);

	/* Snapshot additions */
	reg_w(sd, R511_SNAP_PXCNT, hsegs);
	reg_w(sd, R511_SNAP_LNCNT, vsegs);
	reg_w(sd, R511_SNAP_PXDIV, 0x00);
	reg_w(sd, R511_SNAP_LNDIV, 0x00);

	/******** Set the framerate ********/
	if (frame_rate > 0)
		sd->frame_rate = frame_rate;

	switch (sd->sensor) {
	case SEN_OV6620:
		/* No framerate control, doesn't like higher rates yet */
		sd->clockdiv = 3;
		break;

	/* Note once the FIXME's in mode_init_ov_sensor_regs() are fixed
	   for more sensors we need to do this for them too */
	case SEN_OV7620:
	case SEN_OV7620AE:
	case SEN_OV7640:
	case SEN_OV7648:
	case SEN_OV76BE:
		if (sd->gspca_dev.pixfmt.width == 320)
			interlaced = 1;
		fallthrough;
	case SEN_OV6630:
	case SEN_OV7610:
	case SEN_OV7670:
		switch (sd->frame_rate) {
		case 30:
		case 25:
			/* Not enough bandwidth to do 640x480 @ 30 fps */
			if (sd->gspca_dev.pixfmt.width != 640) {
				sd->clockdiv = 0;
				break;
			}
			/* For 640x480 case */
			fallthrough;
		default:
/*		case 20: */
/*		case 15: */
			sd->clockdiv = 1;
			break;
		case 10:
			sd->clockdiv = 2;
			break;
		case 5:
			sd->clockdiv = 5;
			break;
		}
		if (interlaced) {
			sd->clockdiv = (sd->clockdiv + 1) * 2 - 1;
			/* Higher than 10 does not work */
			if (sd->clockdiv > 10)
				sd->clockdiv = 10;
		}
		break;

	case SEN_OV8610:
		/* No framerate control ?? */
		sd->clockdiv = 0;
		break;
	}

	/* Check if we have enough bandwidth to disable compression */
	fps = (interlaced ? 60 : 30) / (sd->clockdiv + 1) + 1;
	needed = fps * sd->gspca_dev.pixfmt.width *
			sd->gspca_dev.pixfmt.height * 3 / 2;
	/* 1000 isoc packets/sec */
	if (needed > 1000 * packet_size) {
		/* Enable Y and UV quantization and compression */
		reg_w(sd, R511_COMP_EN, 0x07);
		reg_w(sd, R511_COMP_LUT_EN, 0x03);
	} else {
		reg_w(sd, R511_COMP_EN, 0x06);
		reg_w(sd, R511_COMP_LUT_EN, 0x00);
	}

	reg_w(sd, R51x_SYS_RESET, OV511_RESET_OMNICE);
	reg_w(sd, R51x_SYS_RESET, 0);
}
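
/*
 * Worked example for the compression decision above (illustrative numbers
 * only, not taken from a trace): at 640x480 with clockdiv = 1 and no
 * interlacing, fps = 30 / 2 + 1 = 16, so the uncompressed YUV420 stream
 * needs 16 * 640 * 480 * 3 / 2 = 7372800 bytes/s. Even assuming a fairly
 * large 961-byte isoc packet, the bus carries only 1000 * 961 = 961000
 * bytes/s, so compression stays enabled for VGA-sized modes.
 */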

/* Sets up the OV518/OV518+ with the given image parameters
 *
 * OV518 needs a completely different approach, until we can figure out what
 * the individual registers do. Also, only 15 FPS is supported now.
 *
 * Do not put any sensor-specific code in here (including I2C I/O functions)
 */
static void ov518_mode_init_regs(struct sd *sd)
{
	struct gspca_dev *gspca_dev = (struct gspca_dev *)sd;
	int hsegs, vsegs, packet_size;
	struct usb_host_interface *alt;
	struct usb_interface *intf;

	intf = usb_ifnum_to_if(sd->gspca_dev.dev, sd->gspca_dev.iface);
	alt = usb_altnum_to_altsetting(intf, sd->gspca_dev.alt);
	if (!alt) {
		gspca_err(gspca_dev, "Couldn't get altsetting\n");
		sd->gspca_dev.usb_err = -EIO;
		return;
	}
	if (alt->desc.bNumEndpoints < 1) {
		sd->gspca_dev.usb_err = -ENODEV;
		return;
	}

	packet_size = le16_to_cpu(alt->endpoint[0].desc.wMaxPacketSize);
	ov518_reg_w32(sd, R51x_FIFO_PSIZE, packet_size & ~7, 2);

	/******** Set the mode ********/
	reg_w(sd, 0x2b, 0);
	reg_w(sd, 0x2c, 0);
	reg_w(sd, 0x2d, 0);
	reg_w(sd, 0x2e, 0);
	reg_w(sd, 0x3b, 0);
	reg_w(sd, 0x3c, 0);
	reg_w(sd, 0x3d, 0);
	reg_w(sd, 0x3e, 0);

	if (sd->bridge == BRIDGE_OV518) {
		/* Set 8-bit (YVYU) input format */
		reg_w_mask(sd, 0x20, 0x08, 0x08);

		/* Set 12-bit (4:2:0) output format */
		reg_w_mask(sd, 0x28, 0x80, 0xf0);
		reg_w_mask(sd, 0x38, 0x80, 0xf0);
	} else {
		reg_w(sd, 0x28, 0x80);
		reg_w(sd, 0x38, 0x80);
	}

	hsegs = sd->gspca_dev.pixfmt.width / 16;
	vsegs = sd->gspca_dev.pixfmt.height / 4;

	reg_w(sd, 0x29, hsegs);
	reg_w(sd, 0x2a, vsegs);

	reg_w(sd, 0x39, hsegs);
	reg_w(sd, 0x3a, vsegs);

	/* Windows driver does this here; who knows why */
	reg_w(sd, 0x2f, 0x80);

	/******** Set the framerate ********/
	if (sd->bridge == BRIDGE_OV518PLUS && sd->revision == 0 &&
	    sd->sensor == SEN_OV7620AE)
		sd->clockdiv = 0;
	else
		sd->clockdiv = 1;

	/* Mode independent, but framerate dependent, regs */
	/* 0x51: Clock divider; Only works on some cams which use 2 crystals */
	reg_w(sd, 0x51, 0x04);
	reg_w(sd, 0x22, 0x18);
	reg_w(sd, 0x23, 0xff);

	if (sd->bridge == BRIDGE_OV518PLUS) {
		switch (sd->sensor) {
		case SEN_OV7620AE:
			/*
			 * HdG: 640x480 needs special handling on device
			 * revision 2, we check for device revision > 0 to
			 * avoid regressions, as we don't know the correct
			 * thing to do for revision 1.
			 *
			 * Also this likely means we don't need to
			 * differentiate between the OV7620 and OV7620AE;
			 * earlier testing hitting this same problem likely
			 * happened to be with revision < 2 cams using an
			 * OV7620 and revision 2 cams using an OV7620AE.
			 */
			if (sd->revision > 0 &&
			    sd->gspca_dev.pixfmt.width == 640) {
				reg_w(sd, 0x20, 0x60);
				reg_w(sd, 0x21, 0x1f);
			} else {
				reg_w(sd, 0x20, 0x00);
				reg_w(sd, 0x21, 0x19);
			}
			break;
		case SEN_OV7620:
			reg_w(sd, 0x20, 0x00);
			reg_w(sd, 0x21, 0x19);
			break;
		default:
			reg_w(sd, 0x21, 0x19);
		}
	} else
		reg_w(sd, 0x71, 0x17);	/* Compression-related? */

	/* FIXME: Sensor-specific */
	/* Bit 5 is what matters here.
Of course, it is "reserved" */ i2c_w(sd, 0x54, 0x23); reg_w(sd, 0x2f, 0x80); if (sd->bridge == BRIDGE_OV518PLUS) { reg_w(sd, 0x24, 0x94); reg_w(sd, 0x25, 0x90); ov518_reg_w32(sd, 0xc4, 400, 2); /* 190h */ ov518_reg_w32(sd, 0xc6, 540, 2); /* 21ch */ ov518_reg_w32(sd, 0xc7, 540, 2); /* 21ch */ ov518_reg_w32(sd, 0xc8, 108, 2); /* 6ch */ ov518_reg_w32(sd, 0xca, 131098, 3); /* 2001ah */ ov518_reg_w32(sd, 0xcb, 532, 2); /* 214h */ ov518_reg_w32(sd, 0xcc, 2400, 2); /* 960h */ ov518_reg_w32(sd, 0xcd, 32, 2); /* 20h */ ov518_reg_w32(sd, 0xce, 608, 2); /* 260h */ } else { reg_w(sd, 0x24, 0x9f); reg_w(sd, 0x25, 0x90); ov518_reg_w32(sd, 0xc4, 400, 2); /* 190h */ ov518_reg_w32(sd, 0xc6, 381, 2); /* 17dh */ ov518_reg_w32(sd, 0xc7, 381, 2); /* 17dh */ ov518_reg_w32(sd, 0xc8, 128, 2); /* 80h */ ov518_reg_w32(sd, 0xca, 183331, 3); /* 2cc23h */ ov518_reg_w32(sd, 0xcb, 746, 2); /* 2eah */ ov518_reg_w32(sd, 0xcc, 1750, 2); /* 6d6h */ ov518_reg_w32(sd, 0xcd, 45, 2); /* 2dh */ ov518_reg_w32(sd, 0xce, 851, 2); /* 353h */ } reg_w(sd, 0x2f, 0x80); } /* Sets up the OV519 with the given image parameters * * OV519 needs a completely different approach, until we can figure out what * the individual registers do. * * Do not put any sensor-specific code in here (including I2C I/O functions) */ static void ov519_mode_init_regs(struct sd *sd) { static const struct ov_regvals mode_init_519_ov7670[] = { { 0x5d, 0x03 }, /* Turn off suspend mode */ { 0x53, 0x9f }, /* was 9b in 1.65-1.08 */ { OV519_R54_EN_CLK1, 0x0f }, /* bit2 (jpeg enable) */ { 0xa2, 0x20 }, /* a2-a5 are undocumented */ { 0xa3, 0x18 }, { 0xa4, 0x04 }, { 0xa5, 0x28 }, { 0x37, 0x00 }, /* SetUsbInit */ { 0x55, 0x02 }, /* 4.096 Mhz audio clock */ /* Enable both fields, YUV Input, disable defect comp (why?) */ { 0x20, 0x0c }, { 0x21, 0x38 }, { 0x22, 0x1d }, { 0x17, 0x50 }, /* undocumented */ { 0x37, 0x00 }, /* undocumented */ { 0x40, 0xff }, /* I2C timeout counter */ { 0x46, 0x00 }, /* I2C clock prescaler */ { 0x59, 0x04 }, /* new from windrv 090403 */ { 0xff, 0x00 }, /* undocumented */ /* windows reads 0x55 at this point, why? */ }; static const struct ov_regvals mode_init_519[] = { { 0x5d, 0x03 }, /* Turn off suspend mode */ { 0x53, 0x9f }, /* was 9b in 1.65-1.08 */ { OV519_R54_EN_CLK1, 0x0f }, /* bit2 (jpeg enable) */ { 0xa2, 0x20 }, /* a2-a5 are undocumented */ { 0xa3, 0x18 }, { 0xa4, 0x04 }, { 0xa5, 0x28 }, { 0x37, 0x00 }, /* SetUsbInit */ { 0x55, 0x02 }, /* 4.096 Mhz audio clock */ /* Enable both fields, YUV Input, disable defect comp (why?) */ { 0x22, 0x1d }, { 0x17, 0x50 }, /* undocumented */ { 0x37, 0x00 }, /* undocumented */ { 0x40, 0xff }, /* I2C timeout counter */ { 0x46, 0x00 }, /* I2C clock prescaler */ { 0x59, 0x04 }, /* new from windrv 090403 */ { 0xff, 0x00 }, /* undocumented */ /* windows reads 0x55 at this point, why? 
*/ }; struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; /******** Set the mode ********/ switch (sd->sensor) { default: write_regvals(sd, mode_init_519, ARRAY_SIZE(mode_init_519)); if (sd->sensor == SEN_OV7640 || sd->sensor == SEN_OV7648) { /* Select 8-bit input mode */ reg_w_mask(sd, OV519_R20_DFR, 0x10, 0x10); } break; case SEN_OV7660: return; /* done by ov519_set_mode/fr() */ case SEN_OV7670: write_regvals(sd, mode_init_519_ov7670, ARRAY_SIZE(mode_init_519_ov7670)); break; } reg_w(sd, OV519_R10_H_SIZE, sd->gspca_dev.pixfmt.width >> 4); reg_w(sd, OV519_R11_V_SIZE, sd->gspca_dev.pixfmt.height >> 3); if (sd->sensor == SEN_OV7670 && sd->gspca_dev.cam.cam_mode[sd->gspca_dev.curr_mode].priv) reg_w(sd, OV519_R12_X_OFFSETL, 0x04); else if (sd->sensor == SEN_OV7648 && sd->gspca_dev.cam.cam_mode[sd->gspca_dev.curr_mode].priv) reg_w(sd, OV519_R12_X_OFFSETL, 0x01); else reg_w(sd, OV519_R12_X_OFFSETL, 0x00); reg_w(sd, OV519_R13_X_OFFSETH, 0x00); reg_w(sd, OV519_R14_Y_OFFSETL, 0x00); reg_w(sd, OV519_R15_Y_OFFSETH, 0x00); reg_w(sd, OV519_R16_DIVIDER, 0x00); reg_w(sd, OV519_R25_FORMAT, 0x03); /* YUV422 */ reg_w(sd, 0x26, 0x00); /* Undocumented */ /******** Set the framerate ********/ if (frame_rate > 0) sd->frame_rate = frame_rate; /* FIXME: These are only valid at the max resolution. */ sd->clockdiv = 0; switch (sd->sensor) { case SEN_OV7640: case SEN_OV7648: switch (sd->frame_rate) { default: /* case 30: */ reg_w(sd, 0xa4, 0x0c); reg_w(sd, 0x23, 0xff); break; case 25: reg_w(sd, 0xa4, 0x0c); reg_w(sd, 0x23, 0x1f); break; case 20: reg_w(sd, 0xa4, 0x0c); reg_w(sd, 0x23, 0x1b); break; case 15: reg_w(sd, 0xa4, 0x04); reg_w(sd, 0x23, 0xff); sd->clockdiv = 1; break; case 10: reg_w(sd, 0xa4, 0x04); reg_w(sd, 0x23, 0x1f); sd->clockdiv = 1; break; case 5: reg_w(sd, 0xa4, 0x04); reg_w(sd, 0x23, 0x1b); sd->clockdiv = 1; break; } break; case SEN_OV8610: switch (sd->frame_rate) { default: /* 15 fps */ /* case 15: */ reg_w(sd, 0xa4, 0x06); reg_w(sd, 0x23, 0xff); break; case 10: reg_w(sd, 0xa4, 0x06); reg_w(sd, 0x23, 0x1f); break; case 5: reg_w(sd, 0xa4, 0x06); reg_w(sd, 0x23, 0x1b); break; } break; case SEN_OV7670: /* guesses, based on 7640 */ gspca_dbg(gspca_dev, D_STREAM, "Setting framerate to %d fps\n", (sd->frame_rate == 0) ? 15 : sd->frame_rate); reg_w(sd, 0xa4, 0x10); switch (sd->frame_rate) { case 30: reg_w(sd, 0x23, 0xff); break; case 20: reg_w(sd, 0x23, 0x1b); break; default: /* case 15: */ reg_w(sd, 0x23, 0xff); sd->clockdiv = 1; break; } break; } } static void mode_init_ov_sensor_regs(struct sd *sd) { struct gspca_dev *gspca_dev = (struct gspca_dev *)sd; int qvga, xstart, xend, ystart, yend; u8 v; qvga = gspca_dev->cam.cam_mode[gspca_dev->curr_mode].priv & 1; /******** Mode (VGA/QVGA) and sensor specific regs ********/ switch (sd->sensor) { case SEN_OV2610: i2c_w_mask(sd, 0x14, qvga ? 0x20 : 0x00, 0x20); i2c_w_mask(sd, 0x28, qvga ? 0x00 : 0x20, 0x20); i2c_w(sd, 0x24, qvga ? 0x20 : 0x3a); i2c_w(sd, 0x25, qvga ? 0x30 : 0x60); i2c_w_mask(sd, 0x2d, qvga ? 0x40 : 0x00, 0x40); i2c_w_mask(sd, 0x67, qvga ? 0xf0 : 0x90, 0xf0); i2c_w_mask(sd, 0x74, qvga ? 0x20 : 0x00, 0x20); return; case SEN_OV2610AE: { u8 v; /* frame rates: * 10fps / 5 fps for 1600x1200 * 40fps / 20fps for 800x600 */ v = 80; if (qvga) { if (sd->frame_rate < 25) v = 0x81; } else { if (sd->frame_rate < 10) v = 0x81; } i2c_w(sd, 0x11, v); i2c_w(sd, 0x12, qvga ? 
0x60 : 0x20); return; } case SEN_OV3610: if (qvga) { xstart = (1040 - gspca_dev->pixfmt.width) / 2 + (0x1f << 4); ystart = (776 - gspca_dev->pixfmt.height) / 2; } else { xstart = (2076 - gspca_dev->pixfmt.width) / 2 + (0x10 << 4); ystart = (1544 - gspca_dev->pixfmt.height) / 2; } xend = xstart + gspca_dev->pixfmt.width; yend = ystart + gspca_dev->pixfmt.height; /* Writing to the COMH register resets the other windowing regs to their default values, so we must do this first. */ i2c_w_mask(sd, 0x12, qvga ? 0x40 : 0x00, 0xf0); i2c_w_mask(sd, 0x32, (((xend >> 1) & 7) << 3) | ((xstart >> 1) & 7), 0x3f); i2c_w_mask(sd, 0x03, (((yend >> 1) & 3) << 2) | ((ystart >> 1) & 3), 0x0f); i2c_w(sd, 0x17, xstart >> 4); i2c_w(sd, 0x18, xend >> 4); i2c_w(sd, 0x19, ystart >> 3); i2c_w(sd, 0x1a, yend >> 3); return; case SEN_OV8610: /* For OV8610 qvga means qsvga */ i2c_w_mask(sd, OV7610_REG_COM_C, qvga ? (1 << 5) : 0, 1 << 5); i2c_w_mask(sd, 0x13, 0x00, 0x20); /* Select 16 bit data bus */ i2c_w_mask(sd, 0x12, 0x04, 0x06); /* AWB: 1 Test pattern: 0 */ i2c_w_mask(sd, 0x2d, 0x00, 0x40); /* from windrv 090403 */ i2c_w_mask(sd, 0x28, 0x20, 0x20); /* progressive mode on */ break; case SEN_OV7610: i2c_w_mask(sd, 0x14, qvga ? 0x20 : 0x00, 0x20); i2c_w(sd, 0x35, qvga ? 0x1e : 0x9e); i2c_w_mask(sd, 0x13, 0x00, 0x20); /* Select 16 bit data bus */ i2c_w_mask(sd, 0x12, 0x04, 0x06); /* AWB: 1 Test pattern: 0 */ break; case SEN_OV7620: case SEN_OV7620AE: case SEN_OV76BE: i2c_w_mask(sd, 0x14, qvga ? 0x20 : 0x00, 0x20); i2c_w_mask(sd, 0x28, qvga ? 0x00 : 0x20, 0x20); i2c_w(sd, 0x24, qvga ? 0x20 : 0x3a); i2c_w(sd, 0x25, qvga ? 0x30 : 0x60); i2c_w_mask(sd, 0x2d, qvga ? 0x40 : 0x00, 0x40); i2c_w_mask(sd, 0x67, qvga ? 0xb0 : 0x90, 0xf0); i2c_w_mask(sd, 0x74, qvga ? 0x20 : 0x00, 0x20); i2c_w_mask(sd, 0x13, 0x00, 0x20); /* Select 16 bit data bus */ i2c_w_mask(sd, 0x12, 0x04, 0x06); /* AWB: 1 Test pattern: 0 */ if (sd->sensor == SEN_OV76BE) i2c_w(sd, 0x35, qvga ? 0x1e : 0x9e); break; case SEN_OV7640: case SEN_OV7648: i2c_w_mask(sd, 0x14, qvga ? 0x20 : 0x00, 0x20); i2c_w_mask(sd, 0x28, qvga ? 0x00 : 0x20, 0x20); /* Setting this undocumented bit in qvga mode removes a very annoying vertical shaking of the image */ i2c_w_mask(sd, 0x2d, qvga ? 0x40 : 0x00, 0x40); /* Unknown */ i2c_w_mask(sd, 0x67, qvga ? 0xf0 : 0x90, 0xf0); /* Allow higher automatic gain (to allow higher framerates) */ i2c_w_mask(sd, 0x74, qvga ? 0x20 : 0x00, 0x20); i2c_w_mask(sd, 0x12, 0x04, 0x04); /* AWB: 1 */ break; case SEN_OV7670: /* set COM7_FMT_VGA or COM7_FMT_QVGA * do we need to set anything else? * HSTART etc are set in set_ov_sensor_window itself */ i2c_w_mask(sd, OV7670_R12_COM7, qvga ? OV7670_COM7_FMT_QVGA : OV7670_COM7_FMT_VGA, OV7670_COM7_FMT_MASK); i2c_w_mask(sd, 0x13, 0x00, 0x20); /* Select 16 bit data bus */ i2c_w_mask(sd, OV7670_R13_COM8, OV7670_COM8_AWB, OV7670_COM8_AWB); if (qvga) { /* QVGA from ov7670.c by * Jonathan Corbet */ xstart = 164; xend = 28; ystart = 14; yend = 494; } else { /* VGA */ xstart = 158; xend = 14; ystart = 10; yend = 490; } /* OV7670 hardware window registers are split across * multiple locations */ i2c_w(sd, OV7670_R17_HSTART, xstart >> 3); i2c_w(sd, OV7670_R18_HSTOP, xend >> 3); v = i2c_r(sd, OV7670_R32_HREF); v = (v & 0xc0) | ((xend & 0x7) << 3) | (xstart & 0x07); msleep(10); /* need to sleep between read and write to * same reg! 
*/ i2c_w(sd, OV7670_R32_HREF, v); i2c_w(sd, OV7670_R19_VSTART, ystart >> 2); i2c_w(sd, OV7670_R1A_VSTOP, yend >> 2); v = i2c_r(sd, OV7670_R03_VREF); v = (v & 0xc0) | ((yend & 0x3) << 2) | (ystart & 0x03); msleep(10); /* need to sleep between read and write to * same reg! */ i2c_w(sd, OV7670_R03_VREF, v); break; case SEN_OV6620: i2c_w_mask(sd, 0x14, qvga ? 0x20 : 0x00, 0x20); i2c_w_mask(sd, 0x13, 0x00, 0x20); /* Select 16 bit data bus */ i2c_w_mask(sd, 0x12, 0x04, 0x06); /* AWB: 1 Test pattern: 0 */ break; case SEN_OV6630: case SEN_OV66308AF: i2c_w_mask(sd, 0x14, qvga ? 0x20 : 0x00, 0x20); i2c_w_mask(sd, 0x12, 0x04, 0x06); /* AWB: 1 Test pattern: 0 */ break; case SEN_OV9600: { const struct ov_i2c_regvals *vals; static const struct ov_i2c_regvals sxga_15[] = { {0x11, 0x80}, {0x14, 0x3e}, {0x24, 0x85}, {0x25, 0x75} }; static const struct ov_i2c_regvals sxga_7_5[] = { {0x11, 0x81}, {0x14, 0x3e}, {0x24, 0x85}, {0x25, 0x75} }; static const struct ov_i2c_regvals vga_30[] = { {0x11, 0x81}, {0x14, 0x7e}, {0x24, 0x70}, {0x25, 0x60} }; static const struct ov_i2c_regvals vga_15[] = { {0x11, 0x83}, {0x14, 0x3e}, {0x24, 0x80}, {0x25, 0x70} }; /* frame rates: * 15fps / 7.5 fps for 1280x1024 * 30fps / 15fps for 640x480 */ i2c_w_mask(sd, 0x12, qvga ? 0x40 : 0x00, 0x40); if (qvga) vals = sd->frame_rate < 30 ? vga_15 : vga_30; else vals = sd->frame_rate < 15 ? sxga_7_5 : sxga_15; write_i2c_regvals(sd, vals, ARRAY_SIZE(sxga_15)); return; } default: return; } /******** Clock programming ********/ i2c_w(sd, 0x11, sd->clockdiv); } /* this function works for bridge ov519 and sensors ov7660 and ov7670 only */ static void sethvflip(struct gspca_dev *gspca_dev, s32 hflip, s32 vflip) { struct sd *sd = (struct sd *) gspca_dev; if (sd->gspca_dev.streaming) reg_w(sd, OV519_R51_RESET1, 0x0f); /* block stream */ i2c_w_mask(sd, OV7670_R1E_MVFP, OV7670_MVFP_MIRROR * hflip | OV7670_MVFP_VFLIP * vflip, OV7670_MVFP_MIRROR | OV7670_MVFP_VFLIP); if (sd->gspca_dev.streaming) reg_w(sd, OV519_R51_RESET1, 0x00); /* restart stream */ } static void set_ov_sensor_window(struct sd *sd) { struct gspca_dev *gspca_dev; int qvga, crop; int hwsbase, hwebase, vwsbase, vwebase, hwscale, vwscale; /* mode setup is fully handled in mode_init_ov_sensor_regs for these */ switch (sd->sensor) { case SEN_OV2610: case SEN_OV2610AE: case SEN_OV3610: case SEN_OV7670: case SEN_OV9600: mode_init_ov_sensor_regs(sd); return; case SEN_OV7660: ov519_set_mode(sd); ov519_set_fr(sd); return; } gspca_dev = &sd->gspca_dev; qvga = gspca_dev->cam.cam_mode[gspca_dev->curr_mode].priv & 1; crop = gspca_dev->cam.cam_mode[gspca_dev->curr_mode].priv & 2; /* The different sensor ICs handle setting up of window differently. * IF YOU SET IT WRONG, YOU WILL GET ALL ZERO ISOC DATA FROM OV51x!! 
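 *
 * As a worked example (illustrative arithmetic only): for an OV7620 in
 * VGA mode, hwsbase = hwebase = 0x2f, vwsbase = vwebase = 0x05, and
 * hwscale = 2 / vwscale = 1, so the writes below come out as
 * reg 0x17 = 0x2f, reg 0x18 = 0x2f + (640 >> 2) = 0xcf,
 * reg 0x19 = 0x05 and reg 0x1a = 0x05 + (480 >> 1) = 0xf5.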
*/ switch (sd->sensor) { case SEN_OV8610: hwsbase = 0x1e; hwebase = 0x1e; vwsbase = 0x02; vwebase = 0x02; break; case SEN_OV7610: case SEN_OV76BE: hwsbase = 0x38; hwebase = 0x3a; vwsbase = vwebase = 0x05; break; case SEN_OV6620: case SEN_OV6630: case SEN_OV66308AF: hwsbase = 0x38; hwebase = 0x3a; vwsbase = 0x05; vwebase = 0x06; if (sd->sensor == SEN_OV66308AF && qvga) /* HDG: this fixes U and V getting swapped */ hwsbase++; if (crop) { hwsbase += 8; hwebase += 8; vwsbase += 11; vwebase += 11; } break; case SEN_OV7620: case SEN_OV7620AE: hwsbase = 0x2f; /* From 7620.SET (spec is wrong) */ hwebase = 0x2f; vwsbase = vwebase = 0x05; break; case SEN_OV7640: case SEN_OV7648: hwsbase = 0x1a; hwebase = 0x1a; vwsbase = vwebase = 0x03; break; default: return; } switch (sd->sensor) { case SEN_OV6620: case SEN_OV6630: case SEN_OV66308AF: if (qvga) { /* QCIF */ hwscale = 0; vwscale = 0; } else { /* CIF */ hwscale = 1; vwscale = 1; /* The datasheet says 0; * it's wrong */ } break; case SEN_OV8610: if (qvga) { /* QSVGA */ hwscale = 1; vwscale = 1; } else { /* SVGA */ hwscale = 2; vwscale = 2; } break; default: /* SEN_OV7xx0 */ if (qvga) { /* QVGA */ hwscale = 1; vwscale = 0; } else { /* VGA */ hwscale = 2; vwscale = 1; } } mode_init_ov_sensor_regs(sd); i2c_w(sd, 0x17, hwsbase); i2c_w(sd, 0x18, hwebase + (sd->sensor_width >> hwscale)); i2c_w(sd, 0x19, vwsbase); i2c_w(sd, 0x1a, vwebase + (sd->sensor_height >> vwscale)); } /* -- start the camera -- */ static int sd_start(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; /* Default for most bridges, allow bridge_mode_init_regs to override */ sd->sensor_width = sd->gspca_dev.pixfmt.width; sd->sensor_height = sd->gspca_dev.pixfmt.height; switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: ov511_mode_init_regs(sd); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: ov518_mode_init_regs(sd); break; case BRIDGE_OV519: ov519_mode_init_regs(sd); break; /* case BRIDGE_OVFX2: nothing to do */ case BRIDGE_W9968CF: w9968cf_mode_init_regs(sd); break; } set_ov_sensor_window(sd); /* Force clear snapshot state in case the snapshot button was pressed while we weren't streaming */ sd->snapshot_needs_reset = 1; sd_reset_snapshot(gspca_dev); sd->first_frame = 3; ov51x_restart(sd); ov51x_led_control(sd, 1); return gspca_dev->usb_err; } static void sd_stopN(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; ov51x_stop(sd); ov51x_led_control(sd, 0); } static void sd_stop0(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *) gspca_dev; if (!sd->gspca_dev.present) return; if (sd->bridge == BRIDGE_W9968CF) w9968cf_stop0(sd); #if IS_ENABLED(CONFIG_INPUT) /* If the last button state is pressed, release it now! 
*/ if (sd->snapshot_pressed) { input_report_key(gspca_dev->input_dev, KEY_CAMERA, 0); input_sync(gspca_dev->input_dev); sd->snapshot_pressed = 0; } #endif if (sd->bridge == BRIDGE_OV519) reg_w(sd, OV519_R57_SNAPSHOT, 0x23); } static void ov51x_handle_button(struct gspca_dev *gspca_dev, u8 state) { struct sd *sd = (struct sd *) gspca_dev; if (sd->snapshot_pressed != state) { #if IS_ENABLED(CONFIG_INPUT) input_report_key(gspca_dev->input_dev, KEY_CAMERA, state); input_sync(gspca_dev->input_dev); #endif if (state) sd->snapshot_needs_reset = 1; sd->snapshot_pressed = state; } else { /* On the ov511 / ov519 we need to reset the button state multiple times, as resetting does not work as long as the button stays pressed */ switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: case BRIDGE_OV519: if (state) sd->snapshot_needs_reset = 1; break; } } } static void ov511_pkt_scan(struct gspca_dev *gspca_dev, u8 *in, /* isoc packet */ int len) /* iso packet length */ { struct sd *sd = (struct sd *) gspca_dev; /* SOF/EOF packets have 1st to 8th bytes zeroed and the 9th * byte non-zero. The EOF packet has image width/height in the * 10th and 11th bytes. The 9th byte is given as follows: * * bit 7: EOF * 6: compression enabled * 5: 422/420/400 modes * 4: 422/420/400 modes * 3: 1 * 2: snapshot button on * 1: snapshot frame * 0: even/odd field */ if (!(in[0] | in[1] | in[2] | in[3] | in[4] | in[5] | in[6] | in[7]) && (in[8] & 0x08)) { ov51x_handle_button(gspca_dev, (in[8] >> 2) & 1); if (in[8] & 0x80) { /* Frame end */ if ((in[9] + 1) * 8 != gspca_dev->pixfmt.width || (in[10] + 1) * 8 != gspca_dev->pixfmt.height) { gspca_err(gspca_dev, "Invalid frame size, got: %dx%d, requested: %dx%d\n", (in[9] + 1) * 8, (in[10] + 1) * 8, gspca_dev->pixfmt.width, gspca_dev->pixfmt.height); gspca_dev->last_packet_type = DISCARD_PACKET; return; } /* Add 11 byte footer to frame, might be useful */ gspca_frame_add(gspca_dev, LAST_PACKET, in, 11); return; } else { /* Frame start */ gspca_frame_add(gspca_dev, FIRST_PACKET, in, 0); sd->packet_nr = 0; } } /* Ignore the packet number */ len--; /* intermediate packet */ gspca_frame_add(gspca_dev, INTER_PACKET, in, len); } static void ov518_pkt_scan(struct gspca_dev *gspca_dev, u8 *data, /* isoc packet */ int len) /* iso packet length */ { struct sd *sd = (struct sd *) gspca_dev; /* A false positive here is likely, until OVT gives me * the definitive SOF/EOF format */ if ((!(data[0] | data[1] | data[2] | data[3] | data[5])) && data[6]) { ov51x_handle_button(gspca_dev, (data[6] >> 1) & 1); gspca_frame_add(gspca_dev, LAST_PACKET, NULL, 0); gspca_frame_add(gspca_dev, FIRST_PACKET, NULL, 0); sd->packet_nr = 0; } if (gspca_dev->last_packet_type == DISCARD_PACKET) return; /* Does this device use packet numbers ? 
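 * (If the isoc payload length is not a multiple of 8, the last byte is
 * taken as a packet number and matched against sd->packet_nr below; a
 * mismatch marks the frame for discard.)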
*/ if (len & 7) { len--; if (sd->packet_nr == data[len]) sd->packet_nr++; /* The last few packets of the frame (which are all 0's except that they may contain part of the footer), are numbered 0 */ else if (sd->packet_nr == 0 || data[len]) { gspca_err(gspca_dev, "Invalid packet nr: %d (expect: %d)\n", (int)data[len], (int)sd->packet_nr); gspca_dev->last_packet_type = DISCARD_PACKET; return; } } /* intermediate packet */ gspca_frame_add(gspca_dev, INTER_PACKET, data, len); } static void ov519_pkt_scan(struct gspca_dev *gspca_dev, u8 *data, /* isoc packet */ int len) /* iso packet length */ { /* Header of ov519 is 16 bytes: * Byte Value Description * 0 0xff magic * 1 0xff magic * 2 0xff magic * 3 0xXX 0x50 = SOF, 0x51 = EOF * 9 0xXX 0x01 initial frame without data, * 0x00 standard frame with image * 14 Lo in EOF: length of image data / 8 * 15 Hi */ if (data[0] == 0xff && data[1] == 0xff && data[2] == 0xff) { switch (data[3]) { case 0x50: /* start of frame */ /* Don't check the button state here, as the state usually (always ?) changes at EOF and checking it here leads to unnecessary snapshot state resets. */ #define HDRSZ 16 data += HDRSZ; len -= HDRSZ; #undef HDRSZ if (data[0] == 0xff || data[1] == 0xd8) gspca_frame_add(gspca_dev, FIRST_PACKET, data, len); else gspca_dev->last_packet_type = DISCARD_PACKET; return; case 0x51: /* end of frame */ ov51x_handle_button(gspca_dev, data[11] & 1); if (data[9] != 0) gspca_dev->last_packet_type = DISCARD_PACKET; gspca_frame_add(gspca_dev, LAST_PACKET, NULL, 0); return; } } /* intermediate packet */ gspca_frame_add(gspca_dev, INTER_PACKET, data, len); } static void ovfx2_pkt_scan(struct gspca_dev *gspca_dev, u8 *data, /* isoc packet */ int len) /* iso packet length */ { struct sd *sd = (struct sd *) gspca_dev; gspca_frame_add(gspca_dev, INTER_PACKET, data, len); /* A short read signals EOF */ if (len < gspca_dev->cam.bulk_size) { /* If the frame is short, and it is one of the first ones the sensor and bridge are still syncing, so drop it. 
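 * (A complete frame must carry at least width * height bytes of image
 * data, which is what the image_len check below tests for.)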
*/ if (sd->first_frame) { sd->first_frame--; if (gspca_dev->image_len < sd->gspca_dev.pixfmt.width * sd->gspca_dev.pixfmt.height) gspca_dev->last_packet_type = DISCARD_PACKET; } gspca_frame_add(gspca_dev, LAST_PACKET, NULL, 0); gspca_frame_add(gspca_dev, FIRST_PACKET, NULL, 0); } } static void sd_pkt_scan(struct gspca_dev *gspca_dev, u8 *data, /* isoc packet */ int len) /* iso packet length */ { struct sd *sd = (struct sd *) gspca_dev; switch (sd->bridge) { case BRIDGE_OV511: case BRIDGE_OV511PLUS: ov511_pkt_scan(gspca_dev, data, len); break; case BRIDGE_OV518: case BRIDGE_OV518PLUS: ov518_pkt_scan(gspca_dev, data, len); break; case BRIDGE_OV519: ov519_pkt_scan(gspca_dev, data, len); break; case BRIDGE_OVFX2: ovfx2_pkt_scan(gspca_dev, data, len); break; case BRIDGE_W9968CF: w9968cf_pkt_scan(gspca_dev, data, len); break; } } /* -- management routines -- */ static void setbrightness(struct gspca_dev *gspca_dev, s32 val) { struct sd *sd = (struct sd *) gspca_dev; static const struct ov_i2c_regvals brit_7660[][7] = { {{0x0f, 0x6a}, {0x24, 0x40}, {0x25, 0x2b}, {0x26, 0x90}, {0x27, 0xe0}, {0x28, 0xe0}, {0x2c, 0xe0}}, {{0x0f, 0x6a}, {0x24, 0x50}, {0x25, 0x40}, {0x26, 0xa1}, {0x27, 0xc0}, {0x28, 0xc0}, {0x2c, 0xc0}}, {{0x0f, 0x6a}, {0x24, 0x68}, {0x25, 0x58}, {0x26, 0xc2}, {0x27, 0xa0}, {0x28, 0xa0}, {0x2c, 0xa0}}, {{0x0f, 0x6a}, {0x24, 0x70}, {0x25, 0x68}, {0x26, 0xd3}, {0x27, 0x80}, {0x28, 0x80}, {0x2c, 0x80}}, {{0x0f, 0x6a}, {0x24, 0x80}, {0x25, 0x70}, {0x26, 0xd3}, {0x27, 0x20}, {0x28, 0x20}, {0x2c, 0x20}}, {{0x0f, 0x6a}, {0x24, 0x88}, {0x25, 0x78}, {0x26, 0xd3}, {0x27, 0x40}, {0x28, 0x40}, {0x2c, 0x40}}, {{0x0f, 0x6a}, {0x24, 0x90}, {0x25, 0x80}, {0x26, 0xd4}, {0x27, 0x60}, {0x28, 0x60}, {0x2c, 0x60}} }; switch (sd->sensor) { case SEN_OV8610: case SEN_OV7610: case SEN_OV76BE: case SEN_OV6620: case SEN_OV6630: case SEN_OV66308AF: case SEN_OV7640: case SEN_OV7648: i2c_w(sd, OV7610_REG_BRT, val); break; case SEN_OV7620: case SEN_OV7620AE: i2c_w(sd, OV7610_REG_BRT, val); break; case SEN_OV7660: write_i2c_regvals(sd, brit_7660[val], ARRAY_SIZE(brit_7660[0])); break; case SEN_OV7670: /*win trace * i2c_w_mask(sd, OV7670_R13_COM8, 0, OV7670_COM8_AEC); */ i2c_w(sd, OV7670_R55_BRIGHT, ov7670_abs_to_sm(val)); break; } } static void setcontrast(struct gspca_dev *gspca_dev, s32 val) { struct sd *sd = (struct sd *) gspca_dev; static const struct ov_i2c_regvals contrast_7660[][31] = { {{0x6c, 0xf0}, {0x6d, 0xf0}, {0x6e, 0xf8}, {0x6f, 0xa0}, {0x70, 0x58}, {0x71, 0x38}, {0x72, 0x30}, {0x73, 0x30}, {0x74, 0x28}, {0x75, 0x28}, {0x76, 0x24}, {0x77, 0x24}, {0x78, 0x22}, {0x79, 0x28}, {0x7a, 0x2a}, {0x7b, 0x34}, {0x7c, 0x0f}, {0x7d, 0x1e}, {0x7e, 0x3d}, {0x7f, 0x65}, {0x80, 0x70}, {0x81, 0x77}, {0x82, 0x7d}, {0x83, 0x83}, {0x84, 0x88}, {0x85, 0x8d}, {0x86, 0x96}, {0x87, 0x9f}, {0x88, 0xb0}, {0x89, 0xc4}, {0x8a, 0xd9}}, {{0x6c, 0xf0}, {0x6d, 0xf0}, {0x6e, 0xf8}, {0x6f, 0x94}, {0x70, 0x58}, {0x71, 0x40}, {0x72, 0x30}, {0x73, 0x30}, {0x74, 0x30}, {0x75, 0x30}, {0x76, 0x2c}, {0x77, 0x24}, {0x78, 0x22}, {0x79, 0x28}, {0x7a, 0x2a}, {0x7b, 0x31}, {0x7c, 0x0f}, {0x7d, 0x1e}, {0x7e, 0x3d}, {0x7f, 0x62}, {0x80, 0x6d}, {0x81, 0x75}, {0x82, 0x7b}, {0x83, 0x81}, {0x84, 0x87}, {0x85, 0x8d}, {0x86, 0x98}, {0x87, 0xa1}, {0x88, 0xb2}, {0x89, 0xc6}, {0x8a, 0xdb}}, {{0x6c, 0xf0}, {0x6d, 0xf0}, {0x6e, 0xf0}, {0x6f, 0x84}, {0x70, 0x58}, {0x71, 0x48}, {0x72, 0x40}, {0x73, 0x40}, {0x74, 0x28}, {0x75, 0x28}, {0x76, 0x28}, {0x77, 0x24}, {0x78, 0x26}, {0x79, 0x28}, {0x7a, 0x28}, {0x7b, 0x34}, {0x7c, 0x0f}, {0x7d, 0x1e}, {0x7e, 0x3c}, 
{0x7f, 0x5d}, {0x80, 0x68}, {0x81, 0x71}, {0x82, 0x79}, {0x83, 0x81}, {0x84, 0x86}, {0x85, 0x8b}, {0x86, 0x95}, {0x87, 0x9e}, {0x88, 0xb1}, {0x89, 0xc5}, {0x8a, 0xd9}}, {{0x6c, 0xf0}, {0x6d, 0xf0}, {0x6e, 0xf0}, {0x6f, 0x70}, {0x70, 0x58}, {0x71, 0x58}, {0x72, 0x48}, {0x73, 0x48}, {0x74, 0x38}, {0x75, 0x40}, {0x76, 0x34}, {0x77, 0x34}, {0x78, 0x2e}, {0x79, 0x28}, {0x7a, 0x24}, {0x7b, 0x22}, {0x7c, 0x0f}, {0x7d, 0x1e}, {0x7e, 0x3c}, {0x7f, 0x58}, {0x80, 0x63}, {0x81, 0x6e}, {0x82, 0x77}, {0x83, 0x80}, {0x84, 0x87}, {0x85, 0x8f}, {0x86, 0x9c}, {0x87, 0xa9}, {0x88, 0xc0}, {0x89, 0xd4}, {0x8a, 0xe6}}, {{0x6c, 0xa0}, {0x6d, 0xf0}, {0x6e, 0x90}, {0x6f, 0x80}, {0x70, 0x70}, {0x71, 0x80}, {0x72, 0x60}, {0x73, 0x60}, {0x74, 0x58}, {0x75, 0x60}, {0x76, 0x4c}, {0x77, 0x38}, {0x78, 0x38}, {0x79, 0x2a}, {0x7a, 0x20}, {0x7b, 0x0e}, {0x7c, 0x0a}, {0x7d, 0x14}, {0x7e, 0x26}, {0x7f, 0x46}, {0x80, 0x54}, {0x81, 0x64}, {0x82, 0x70}, {0x83, 0x7c}, {0x84, 0x87}, {0x85, 0x93}, {0x86, 0xa6}, {0x87, 0xb4}, {0x88, 0xd0}, {0x89, 0xe5}, {0x8a, 0xf5}}, {{0x6c, 0x60}, {0x6d, 0x80}, {0x6e, 0x60}, {0x6f, 0x80}, {0x70, 0x80}, {0x71, 0x80}, {0x72, 0x88}, {0x73, 0x30}, {0x74, 0x70}, {0x75, 0x68}, {0x76, 0x64}, {0x77, 0x50}, {0x78, 0x3c}, {0x79, 0x22}, {0x7a, 0x10}, {0x7b, 0x08}, {0x7c, 0x06}, {0x7d, 0x0e}, {0x7e, 0x1a}, {0x7f, 0x3a}, {0x80, 0x4a}, {0x81, 0x5a}, {0x82, 0x6b}, {0x83, 0x7b}, {0x84, 0x89}, {0x85, 0x96}, {0x86, 0xaf}, {0x87, 0xc3}, {0x88, 0xe1}, {0x89, 0xf2}, {0x8a, 0xfa}}, {{0x6c, 0x20}, {0x6d, 0x40}, {0x6e, 0x20}, {0x6f, 0x60}, {0x70, 0x88}, {0x71, 0xc8}, {0x72, 0xc0}, {0x73, 0xb8}, {0x74, 0xa8}, {0x75, 0xb8}, {0x76, 0x80}, {0x77, 0x5c}, {0x78, 0x26}, {0x79, 0x10}, {0x7a, 0x08}, {0x7b, 0x04}, {0x7c, 0x02}, {0x7d, 0x06}, {0x7e, 0x0a}, {0x7f, 0x22}, {0x80, 0x33}, {0x81, 0x4c}, {0x82, 0x64}, {0x83, 0x7b}, {0x84, 0x90}, {0x85, 0xa7}, {0x86, 0xc7}, {0x87, 0xde}, {0x88, 0xf1}, {0x89, 0xf9}, {0x8a, 0xfd}}, }; switch (sd->sensor) { case SEN_OV7610: case SEN_OV6620: i2c_w(sd, OV7610_REG_CNT, val); break; case SEN_OV6630: case SEN_OV66308AF: i2c_w_mask(sd, OV7610_REG_CNT, val >> 4, 0x0f); break; case SEN_OV8610: { static const u8 ctab[] = { 0x03, 0x09, 0x0b, 0x0f, 0x53, 0x6f, 0x35, 0x7f }; /* Use Y gamma control instead. Bit 0 enables it. */ i2c_w(sd, 0x64, ctab[val >> 5]); break; } case SEN_OV7620: case SEN_OV7620AE: { static const u8 ctab[] = { 0x01, 0x05, 0x09, 0x11, 0x15, 0x35, 0x37, 0x57, 0x5b, 0xa5, 0xa7, 0xc7, 0xc9, 0xcf, 0xef, 0xff }; /* Use Y gamma control instead. Bit 0 enables it. 
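 * The contrast control value runs from 0 to 255, so val >> 4 selects one
 * of the 16 curve presets in ctab above.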
*/ i2c_w(sd, 0x64, ctab[val >> 4]); break; } case SEN_OV7660: write_i2c_regvals(sd, contrast_7660[val], ARRAY_SIZE(contrast_7660[0])); break; case SEN_OV7670: /* check that this isn't just the same as ov7610 */ i2c_w(sd, OV7670_R56_CONTRAS, val >> 1); break; } } static void setexposure(struct gspca_dev *gspca_dev, s32 val) { struct sd *sd = (struct sd *) gspca_dev; i2c_w(sd, 0x10, val); } static void setcolors(struct gspca_dev *gspca_dev, s32 val) { struct sd *sd = (struct sd *) gspca_dev; static const struct ov_i2c_regvals colors_7660[][6] = { {{0x4f, 0x28}, {0x50, 0x2a}, {0x51, 0x02}, {0x52, 0x0a}, {0x53, 0x19}, {0x54, 0x23}}, {{0x4f, 0x47}, {0x50, 0x4a}, {0x51, 0x03}, {0x52, 0x11}, {0x53, 0x2c}, {0x54, 0x3e}}, {{0x4f, 0x66}, {0x50, 0x6b}, {0x51, 0x05}, {0x52, 0x19}, {0x53, 0x40}, {0x54, 0x59}}, {{0x4f, 0x84}, {0x50, 0x8b}, {0x51, 0x06}, {0x52, 0x20}, {0x53, 0x53}, {0x54, 0x73}}, {{0x4f, 0xa3}, {0x50, 0xab}, {0x51, 0x08}, {0x52, 0x28}, {0x53, 0x66}, {0x54, 0x8e}}, }; switch (sd->sensor) { case SEN_OV8610: case SEN_OV7610: case SEN_OV76BE: case SEN_OV6620: case SEN_OV6630: case SEN_OV66308AF: i2c_w(sd, OV7610_REG_SAT, val); break; case SEN_OV7620: case SEN_OV7620AE: /* Use UV gamma control instead. Bits 0 & 7 are reserved. */ /* rc = ov_i2c_write(sd->dev, 0x62, (val >> 9) & 0x7e); if (rc < 0) goto out; */ i2c_w(sd, OV7610_REG_SAT, val); break; case SEN_OV7640: case SEN_OV7648: i2c_w(sd, OV7610_REG_SAT, val & 0xf0); break; case SEN_OV7660: write_i2c_regvals(sd, colors_7660[val], ARRAY_SIZE(colors_7660[0])); break; case SEN_OV7670: /* supported later once I work out how to do it * transparently fail now! */ /* set REG_COM13 values for UV sat auto mode */ break; } } static void setautobright(struct gspca_dev *gspca_dev, s32 val) { struct sd *sd = (struct sd *) gspca_dev; i2c_w_mask(sd, 0x2d, val ? 0x10 : 0x00, 0x10); } static void setfreq_i(struct sd *sd, s32 val) { if (sd->sensor == SEN_OV7660 || sd->sensor == SEN_OV7670) { switch (val) { case 0: /* Banding filter disabled */ i2c_w_mask(sd, OV7670_R13_COM8, 0, OV7670_COM8_BFILT); break; case 1: /* 50 hz */ i2c_w_mask(sd, OV7670_R13_COM8, OV7670_COM8_BFILT, OV7670_COM8_BFILT); i2c_w_mask(sd, OV7670_R3B_COM11, 0x08, 0x18); break; case 2: /* 60 hz */ i2c_w_mask(sd, OV7670_R13_COM8, OV7670_COM8_BFILT, OV7670_COM8_BFILT); i2c_w_mask(sd, OV7670_R3B_COM11, 0x00, 0x18); break; case 3: /* Auto hz - ov7670 only */ i2c_w_mask(sd, OV7670_R13_COM8, OV7670_COM8_BFILT, OV7670_COM8_BFILT); i2c_w_mask(sd, OV7670_R3B_COM11, OV7670_COM11_HZAUTO, 0x18); break; } } else { switch (val) { case 0: /* Banding filter disabled */ i2c_w_mask(sd, 0x2d, 0x00, 0x04); i2c_w_mask(sd, 0x2a, 0x00, 0x80); break; case 1: /* 50 hz (filter on and framerate adj) */ i2c_w_mask(sd, 0x2d, 0x04, 0x04); i2c_w_mask(sd, 0x2a, 0x80, 0x80); /* 20 fps -> 16.667 fps */ if (sd->sensor == SEN_OV6620 || sd->sensor == SEN_OV6630 || sd->sensor == SEN_OV66308AF) i2c_w(sd, 0x2b, 0x5e); else i2c_w(sd, 0x2b, 0xac); break; case 2: /* 60 hz (filter on, ...) */ i2c_w_mask(sd, 0x2d, 0x04, 0x04); if (sd->sensor == SEN_OV6620 || sd->sensor == SEN_OV6630 || sd->sensor == SEN_OV66308AF) { /* 20 fps -> 15 fps */ i2c_w_mask(sd, 0x2a, 0x80, 0x80); i2c_w(sd, 0x2b, 0xa8); } else { /* no framerate adj. 
*/ i2c_w_mask(sd, 0x2a, 0x00, 0x80); } break; } } } static void setfreq(struct gspca_dev *gspca_dev, s32 val) { struct sd *sd = (struct sd *) gspca_dev; setfreq_i(sd, val); /* Ugly but necessary */ if (sd->bridge == BRIDGE_W9968CF) w9968cf_set_crop_window(sd); } static int sd_get_jcomp(struct gspca_dev *gspca_dev, struct v4l2_jpegcompression *jcomp) { struct sd *sd = (struct sd *) gspca_dev; if (sd->bridge != BRIDGE_W9968CF) return -ENOTTY; memset(jcomp, 0, sizeof *jcomp); jcomp->quality = v4l2_ctrl_g_ctrl(sd->jpegqual); jcomp->jpeg_markers = V4L2_JPEG_MARKER_DHT | V4L2_JPEG_MARKER_DQT | V4L2_JPEG_MARKER_DRI; return 0; } static int sd_set_jcomp(struct gspca_dev *gspca_dev, const struct v4l2_jpegcompression *jcomp) { struct sd *sd = (struct sd *) gspca_dev; if (sd->bridge != BRIDGE_W9968CF) return -ENOTTY; v4l2_ctrl_s_ctrl(sd->jpegqual, jcomp->quality); return 0; } static int sd_g_volatile_ctrl(struct v4l2_ctrl *ctrl) { struct gspca_dev *gspca_dev = container_of(ctrl->handler, struct gspca_dev, ctrl_handler); struct sd *sd = (struct sd *)gspca_dev; gspca_dev->usb_err = 0; switch (ctrl->id) { case V4L2_CID_AUTOGAIN: gspca_dev->exposure->val = i2c_r(sd, 0x10); break; } return 0; } static int sd_s_ctrl(struct v4l2_ctrl *ctrl) { struct gspca_dev *gspca_dev = container_of(ctrl->handler, struct gspca_dev, ctrl_handler); struct sd *sd = (struct sd *)gspca_dev; gspca_dev->usb_err = 0; if (!gspca_dev->streaming) return 0; switch (ctrl->id) { case V4L2_CID_BRIGHTNESS: setbrightness(gspca_dev, ctrl->val); break; case V4L2_CID_CONTRAST: setcontrast(gspca_dev, ctrl->val); break; case V4L2_CID_POWER_LINE_FREQUENCY: setfreq(gspca_dev, ctrl->val); break; case V4L2_CID_AUTOBRIGHTNESS: if (ctrl->is_new) setautobright(gspca_dev, ctrl->val); if (!ctrl->val && sd->brightness->is_new) setbrightness(gspca_dev, sd->brightness->val); break; case V4L2_CID_SATURATION: setcolors(gspca_dev, ctrl->val); break; case V4L2_CID_HFLIP: sethvflip(gspca_dev, ctrl->val, sd->vflip->val); break; case V4L2_CID_AUTOGAIN: if (ctrl->is_new) setautogain(gspca_dev, ctrl->val); if (!ctrl->val && gspca_dev->exposure->is_new) setexposure(gspca_dev, gspca_dev->exposure->val); break; case V4L2_CID_JPEG_COMPRESSION_QUALITY: return -EBUSY; /* Should never happen, as we grab the ctrl */ } return gspca_dev->usb_err; } static const struct v4l2_ctrl_ops sd_ctrl_ops = { .g_volatile_ctrl = sd_g_volatile_ctrl, .s_ctrl = sd_s_ctrl, }; static int sd_init_controls(struct gspca_dev *gspca_dev) { struct sd *sd = (struct sd *)gspca_dev; struct v4l2_ctrl_handler *hdl = &gspca_dev->ctrl_handler; gspca_dev->vdev.ctrl_handler = hdl; v4l2_ctrl_handler_init(hdl, 10); if (valid_controls[sd->sensor].has_brightness) sd->brightness = v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_BRIGHTNESS, 0, sd->sensor == SEN_OV7660 ? 6 : 255, 1, sd->sensor == SEN_OV7660 ? 3 : 127); if (valid_controls[sd->sensor].has_contrast) { if (sd->sensor == SEN_OV7660) v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_CONTRAST, 0, 6, 1, 3); else v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_CONTRAST, 0, 255, 1, (sd->sensor == SEN_OV6630 || sd->sensor == SEN_OV66308AF) ? 200 : 127); } if (valid_controls[sd->sensor].has_sat) v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_SATURATION, 0, sd->sensor == SEN_OV7660 ? 4 : 255, 1, sd->sensor == SEN_OV7660 ? 
2 : 127); if (valid_controls[sd->sensor].has_exposure) gspca_dev->exposure = v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_EXPOSURE, 0, 255, 1, 127); if (valid_controls[sd->sensor].has_hvflip) { sd->hflip = v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_HFLIP, 0, 1, 1, 0); sd->vflip = v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_VFLIP, 0, 1, 1, 0); } if (valid_controls[sd->sensor].has_autobright) sd->autobright = v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_AUTOBRIGHTNESS, 0, 1, 1, 1); if (valid_controls[sd->sensor].has_autogain) gspca_dev->autogain = v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_AUTOGAIN, 0, 1, 1, 1); if (valid_controls[sd->sensor].has_freq) { if (sd->sensor == SEN_OV7670) sd->freq = v4l2_ctrl_new_std_menu(hdl, &sd_ctrl_ops, V4L2_CID_POWER_LINE_FREQUENCY, V4L2_CID_POWER_LINE_FREQUENCY_AUTO, 0, V4L2_CID_POWER_LINE_FREQUENCY_AUTO); else sd->freq = v4l2_ctrl_new_std_menu(hdl, &sd_ctrl_ops, V4L2_CID_POWER_LINE_FREQUENCY, V4L2_CID_POWER_LINE_FREQUENCY_60HZ, 0, 0); } if (sd->bridge == BRIDGE_W9968CF) sd->jpegqual = v4l2_ctrl_new_std(hdl, &sd_ctrl_ops, V4L2_CID_JPEG_COMPRESSION_QUALITY, QUALITY_MIN, QUALITY_MAX, 1, QUALITY_DEF); if (hdl->error) { gspca_err(gspca_dev, "Could not initialize controls\n"); return hdl->error; } if (gspca_dev->autogain) v4l2_ctrl_auto_cluster(3, &gspca_dev->autogain, 0, true); if (sd->autobright) v4l2_ctrl_auto_cluster(2, &sd->autobright, 0, false); if (sd->hflip) v4l2_ctrl_cluster(2, &sd->hflip); return 0; } /* sub-driver description */ static const struct sd_desc sd_desc = { .name = MODULE_NAME, .config = sd_config, .init = sd_init, .init_controls = sd_init_controls, .isoc_init = sd_isoc_init, .start = sd_start, .stopN = sd_stopN, .stop0 = sd_stop0, .pkt_scan = sd_pkt_scan, .dq_callback = sd_reset_snapshot, .get_jcomp = sd_get_jcomp, .set_jcomp = sd_set_jcomp, #if IS_ENABLED(CONFIG_INPUT) .other_input = 1, #endif }; /* -- module initialisation -- */ static const struct usb_device_id device_table[] = { {USB_DEVICE(0x041e, 0x4003), .driver_info = BRIDGE_W9968CF }, {USB_DEVICE(0x041e, 0x4052), .driver_info = BRIDGE_OV519 | BRIDGE_INVERT_LED }, {USB_DEVICE(0x041e, 0x405f), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x041e, 0x4060), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x041e, 0x4061), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x041e, 0x4064), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x041e, 0x4067), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x041e, 0x4068), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x045e, 0x028c), .driver_info = BRIDGE_OV519 | BRIDGE_INVERT_LED }, {USB_DEVICE(0x054c, 0x0154), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x054c, 0x0155), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x05a9, 0x0511), .driver_info = BRIDGE_OV511 }, {USB_DEVICE(0x05a9, 0x0518), .driver_info = BRIDGE_OV518 }, {USB_DEVICE(0x05a9, 0x0519), .driver_info = BRIDGE_OV519 | BRIDGE_INVERT_LED }, {USB_DEVICE(0x05a9, 0x0530), .driver_info = BRIDGE_OV519 | BRIDGE_INVERT_LED }, {USB_DEVICE(0x05a9, 0x2800), .driver_info = BRIDGE_OVFX2 }, {USB_DEVICE(0x05a9, 0x4519), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x05a9, 0x8519), .driver_info = BRIDGE_OV519 }, {USB_DEVICE(0x05a9, 0xa511), .driver_info = BRIDGE_OV511PLUS }, {USB_DEVICE(0x05a9, 0xa518), .driver_info = BRIDGE_OV518PLUS }, {USB_DEVICE(0x0813, 0x0002), .driver_info = BRIDGE_OV511PLUS }, {USB_DEVICE(0x0b62, 0x0059), .driver_info = BRIDGE_OVFX2 }, {USB_DEVICE(0x0e96, 0xc001), .driver_info = BRIDGE_OVFX2 }, {USB_DEVICE(0x1046, 0x9967), .driver_info = BRIDGE_W9968CF }, {USB_DEVICE(0x8020, 0xef04), .driver_info 
	 = BRIDGE_OVFX2 },
	{}
};

MODULE_DEVICE_TABLE(usb, device_table);

/* -- device connect -- */
static int sd_probe(struct usb_interface *intf, const struct usb_device_id *id)
{
	return gspca_dev_probe(intf, id, &sd_desc, sizeof(struct sd),
				THIS_MODULE);
}

static struct usb_driver sd_driver = {
	.name = MODULE_NAME,
	.id_table = device_table,
	.probe = sd_probe,
	.disconnect = gspca_disconnect,
#ifdef CONFIG_PM
	.suspend = gspca_suspend,
	.resume = gspca_resume,
	.reset_resume = gspca_resume,
#endif
};

module_usb_driver(sd_driver);

module_param(frame_rate, int, 0644);
MODULE_PARM_DESC(frame_rate, "Frame rate (5, 10, 15, 20 or 30 fps)");
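
/*
 * Usage example for the parameter above (hypothetical invocation; the
 * module name is assumed to follow the usual gspca naming):
 *
 *	modprobe gspca_ov519 frame_rate=15
 *
 * The requested rate is remapped to the nearest rate the bridge/sensor
 * combination supports the next time streaming starts.
 */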
// SPDX-License-Identifier: GPL-2.0
/*
 * ESSIV skcipher and aead template for block encryption
 *
 * This template encapsulates the ESSIV IV generation algorithm used by
 * dm-crypt and fscrypt, which converts the initial vector for the skcipher
 * used for block encryption, by encrypting it using the hash of the
 * skcipher key as encryption key. Usually, the input IV is a 64-bit sector
 * number in LE representation zero-padded to the size of the IV, but this
 * is not assumed by this driver.
 *
 * The typical use of this template is to instantiate the skcipher
 * 'essiv(cbc(aes),sha256)', which is the only instantiation used by
 * fscrypt, and the most relevant one for dm-crypt. However, dm-crypt
 * also permits ESSIV to be used in combination with the authenc template,
 * e.g., 'essiv(authenc(hmac(sha256),cbc(aes)),sha256)', in which case
 * we need to instantiate an aead that accepts the same special key format
 * as the authenc template, and deals with the way the encrypted IV is
 * embedded into the AAD area of the aead request. This means the AEAD
 * flavor produced by this template is tightly coupled to the way dm-crypt
 * happens to use it.
 *
 * Copyright (c) 2019 Linaro, Ltd. <ard.biesheuvel@linaro.org>
 *
 * Heavily based on:
 * adiantum length-preserving encryption mode
 *
 * Copyright 2018 Google LLC
 */
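
/*
 * Sketch of the IV transformation this template implements (illustrative
 * pseudo-code only, not part of the crypto API):
 *
 *	salt = Hash(K)		// e.g. SHA-256 of the skcipher key
 *	IV'  = E(salt, IV)	// one single-block cipher encryption
 *
 * so for 'essiv(cbc(aes),sha256)' on a dm-crypt volume, the IV actually
 * fed to cbc(aes) for sector n would be AES(SHA-256(K), le64(n) padded
 * with zeroes to the cipher block size).
 */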

#include <crypto/authenc.h>
#include <crypto/internal/aead.h>
#include <crypto/internal/cipher.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <linux/module.h>

#include "internal.h"

struct essiv_instance_ctx {
	union {
		struct crypto_skcipher_spawn	skcipher_spawn;
		struct crypto_aead_spawn	aead_spawn;
	} u;
	char	essiv_cipher_name[CRYPTO_MAX_ALG_NAME];
	char	shash_driver_name[CRYPTO_MAX_ALG_NAME];
};

struct essiv_tfm_ctx {
	union {
		struct crypto_skcipher	*skcipher;
		struct crypto_aead	*aead;
	} u;
	struct crypto_cipher	*essiv_cipher;
	struct crypto_shash	*hash;
	int			ivoffset;
};

struct essiv_aead_request_ctx {
	struct scatterlist	sg[4];
	u8			*assoc;
	struct aead_request	aead_req;
};

static int essiv_skcipher_setkey(struct crypto_skcipher *tfm,
				 const u8 *key, unsigned int keylen)
{
	struct essiv_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
	u8 salt[HASH_MAX_DIGESTSIZE];
	int err;

	crypto_skcipher_clear_flags(tctx->u.skcipher, CRYPTO_TFM_REQ_MASK);
	crypto_skcipher_set_flags(tctx->u.skcipher,
				  crypto_skcipher_get_flags(tfm) &
				  CRYPTO_TFM_REQ_MASK);
	err = crypto_skcipher_setkey(tctx->u.skcipher, key, keylen);
	if (err)
		return err;

	err = crypto_shash_tfm_digest(tctx->hash, key, keylen, salt);
	if (err)
		return err;

	crypto_cipher_clear_flags(tctx->essiv_cipher, CRYPTO_TFM_REQ_MASK);
	crypto_cipher_set_flags(tctx->essiv_cipher,
				crypto_skcipher_get_flags(tfm) &
				CRYPTO_TFM_REQ_MASK);
	return crypto_cipher_setkey(tctx->essiv_cipher, salt,
				    crypto_shash_digestsize(tctx->hash));
}

static int essiv_aead_setkey(struct crypto_aead *tfm, const u8 *key,
			     unsigned int keylen)
{
	struct essiv_tfm_ctx *tctx = crypto_aead_ctx(tfm);
	SHASH_DESC_ON_STACK(desc, tctx->hash);
	struct crypto_authenc_keys keys;
	u8 salt[HASH_MAX_DIGESTSIZE];
	int err;

	crypto_aead_clear_flags(tctx->u.aead, CRYPTO_TFM_REQ_MASK);
	crypto_aead_set_flags(tctx->u.aead, crypto_aead_get_flags(tfm) &
					    CRYPTO_TFM_REQ_MASK);
	err = crypto_aead_setkey(tctx->u.aead, key, keylen);
	if (err)
		return err;

	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
		return -EINVAL;

	desc->tfm = tctx->hash;
	err = crypto_shash_init(desc) ?:
	      crypto_shash_update(desc, keys.enckey, keys.enckeylen) ?:
	      crypto_shash_finup(desc, keys.authkey, keys.authkeylen, salt);
	if (err)
		return err;

	crypto_cipher_clear_flags(tctx->essiv_cipher, CRYPTO_TFM_REQ_MASK);
	crypto_cipher_set_flags(tctx->essiv_cipher,
				crypto_aead_get_flags(tfm) &
				CRYPTO_TFM_REQ_MASK);
	return crypto_cipher_setkey(tctx->essiv_cipher, salt,
				    crypto_shash_digestsize(tctx->hash));
}

static int essiv_aead_setauthsize(struct crypto_aead *tfm,
				  unsigned int authsize)
{
	struct essiv_tfm_ctx *tctx = crypto_aead_ctx(tfm);

	return crypto_aead_setauthsize(tctx->u.aead, authsize);
}

static void essiv_skcipher_done(void *data, int err)
{
	struct skcipher_request *req = data;

	skcipher_request_complete(req, err);
}

static int essiv_skcipher_crypt(struct skcipher_request *req, bool enc)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	const struct essiv_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
	struct skcipher_request *subreq = skcipher_request_ctx(req);

	crypto_cipher_encrypt_one(tctx->essiv_cipher, req->iv, req->iv);

	skcipher_request_set_tfm(subreq, tctx->u.skcipher);
	skcipher_request_set_crypt(subreq, req->src,
req->dst, req->cryptlen, req->iv); skcipher_request_set_callback(subreq, skcipher_request_flags(req), essiv_skcipher_done, req); return enc ? crypto_skcipher_encrypt(subreq) : crypto_skcipher_decrypt(subreq); } static int essiv_skcipher_encrypt(struct skcipher_request *req) { return essiv_skcipher_crypt(req, true); } static int essiv_skcipher_decrypt(struct skcipher_request *req) { return essiv_skcipher_crypt(req, false); } static void essiv_aead_done(void *data, int err) { struct aead_request *req = data; struct essiv_aead_request_ctx *rctx = aead_request_ctx(req); if (err == -EINPROGRESS) goto out; kfree(rctx->assoc); out: aead_request_complete(req, err); } static int essiv_aead_crypt(struct aead_request *req, bool enc) { struct crypto_aead *tfm = crypto_aead_reqtfm(req); const struct essiv_tfm_ctx *tctx = crypto_aead_ctx(tfm); struct essiv_aead_request_ctx *rctx = aead_request_ctx(req); struct aead_request *subreq = &rctx->aead_req; struct scatterlist *src = req->src; int err; crypto_cipher_encrypt_one(tctx->essiv_cipher, req->iv, req->iv); /* * dm-crypt embeds the sector number and the IV in the AAD region, so * we have to copy the converted IV into the right scatterlist before * we pass it on. */ rctx->assoc = NULL; if (req->src == req->dst || !enc) { scatterwalk_map_and_copy(req->iv, req->dst, req->assoclen - crypto_aead_ivsize(tfm), crypto_aead_ivsize(tfm), 1); } else { u8 *iv = (u8 *)aead_request_ctx(req) + tctx->ivoffset; int ivsize = crypto_aead_ivsize(tfm); int ssize = req->assoclen - ivsize; struct scatterlist *sg; int nents; if (ssize < 0) return -EINVAL; nents = sg_nents_for_len(req->src, ssize); if (nents < 0) return -EINVAL; memcpy(iv, req->iv, ivsize); sg_init_table(rctx->sg, 4); if (unlikely(nents > 1)) { /* * This is a case that rarely occurs in practice, but * for correctness, we have to deal with it nonetheless. */ rctx->assoc = kmalloc(ssize, GFP_ATOMIC); if (!rctx->assoc) return -ENOMEM; scatterwalk_map_and_copy(rctx->assoc, req->src, 0, ssize, 0); sg_set_buf(rctx->sg, rctx->assoc, ssize); } else { sg_set_page(rctx->sg, sg_page(req->src), ssize, req->src->offset); } sg_set_buf(rctx->sg + 1, iv, ivsize); sg = scatterwalk_ffwd(rctx->sg + 2, req->src, req->assoclen); if (sg != rctx->sg + 2) sg_chain(rctx->sg, 3, sg); src = rctx->sg; } aead_request_set_tfm(subreq, tctx->u.aead); aead_request_set_ad(subreq, req->assoclen); aead_request_set_callback(subreq, aead_request_flags(req), essiv_aead_done, req); aead_request_set_crypt(subreq, src, req->dst, req->cryptlen, req->iv); err = enc ? 
crypto_aead_encrypt(subreq) : crypto_aead_decrypt(subreq); if (rctx->assoc && err != -EINPROGRESS && err != -EBUSY) kfree(rctx->assoc); return err; } static int essiv_aead_encrypt(struct aead_request *req) { return essiv_aead_crypt(req, true); } static int essiv_aead_decrypt(struct aead_request *req) { return essiv_aead_crypt(req, false); } static int essiv_init_tfm(struct essiv_instance_ctx *ictx, struct essiv_tfm_ctx *tctx) { struct crypto_cipher *essiv_cipher; struct crypto_shash *hash; int err; essiv_cipher = crypto_alloc_cipher(ictx->essiv_cipher_name, 0, 0); if (IS_ERR(essiv_cipher)) return PTR_ERR(essiv_cipher); hash = crypto_alloc_shash(ictx->shash_driver_name, 0, 0); if (IS_ERR(hash)) { err = PTR_ERR(hash); goto err_free_essiv_cipher; } tctx->essiv_cipher = essiv_cipher; tctx->hash = hash; return 0; err_free_essiv_cipher: crypto_free_cipher(essiv_cipher); return err; } static int essiv_skcipher_init_tfm(struct crypto_skcipher *tfm) { struct skcipher_instance *inst = skcipher_alg_instance(tfm); struct essiv_instance_ctx *ictx = skcipher_instance_ctx(inst); struct essiv_tfm_ctx *tctx = crypto_skcipher_ctx(tfm); struct crypto_skcipher *skcipher; int err; skcipher = crypto_spawn_skcipher(&ictx->u.skcipher_spawn); if (IS_ERR(skcipher)) return PTR_ERR(skcipher); crypto_skcipher_set_reqsize(tfm, sizeof(struct skcipher_request) + crypto_skcipher_reqsize(skcipher)); err = essiv_init_tfm(ictx, tctx); if (err) { crypto_free_skcipher(skcipher); return err; } tctx->u.skcipher = skcipher; return 0; } static int essiv_aead_init_tfm(struct crypto_aead *tfm) { struct aead_instance *inst = aead_alg_instance(tfm); struct essiv_instance_ctx *ictx = aead_instance_ctx(inst); struct essiv_tfm_ctx *tctx = crypto_aead_ctx(tfm); struct crypto_aead *aead; unsigned int subreq_size; int err; BUILD_BUG_ON(offsetofend(struct essiv_aead_request_ctx, aead_req) != sizeof(struct essiv_aead_request_ctx)); aead = crypto_spawn_aead(&ictx->u.aead_spawn); if (IS_ERR(aead)) return PTR_ERR(aead); subreq_size = sizeof_field(struct essiv_aead_request_ctx, aead_req) + crypto_aead_reqsize(aead); tctx->ivoffset = offsetof(struct essiv_aead_request_ctx, aead_req) + subreq_size; crypto_aead_set_reqsize(tfm, tctx->ivoffset + crypto_aead_ivsize(aead)); err = essiv_init_tfm(ictx, tctx); if (err) { crypto_free_aead(aead); return err; } tctx->u.aead = aead; return 0; } static void essiv_skcipher_exit_tfm(struct crypto_skcipher *tfm) { struct essiv_tfm_ctx *tctx = crypto_skcipher_ctx(tfm); crypto_free_skcipher(tctx->u.skcipher); crypto_free_cipher(tctx->essiv_cipher); crypto_free_shash(tctx->hash); } static void essiv_aead_exit_tfm(struct crypto_aead *tfm) { struct essiv_tfm_ctx *tctx = crypto_aead_ctx(tfm); crypto_free_aead(tctx->u.aead); crypto_free_cipher(tctx->essiv_cipher); crypto_free_shash(tctx->hash); } static void essiv_skcipher_free_instance(struct skcipher_instance *inst) { struct essiv_instance_ctx *ictx = skcipher_instance_ctx(inst); crypto_drop_skcipher(&ictx->u.skcipher_spawn); kfree(inst); } static void essiv_aead_free_instance(struct aead_instance *inst) { struct essiv_instance_ctx *ictx = aead_instance_ctx(inst); crypto_drop_aead(&ictx->u.aead_spawn); kfree(inst); } static bool parse_cipher_name(char *essiv_cipher_name, const char *cra_name) { const char *p, *q; int len; /* find the last opening parens */ p = strrchr(cra_name, '('); if (!p++) return false; /* find the first closing parens in the tail of the string */ q = strchr(p, ')'); if (!q) return false; len = q - p; if (len >= CRYPTO_MAX_ALG_NAME) return false; 
strscpy(essiv_cipher_name, p, len + 1); return true; } static bool essiv_supported_algorithms(const char *essiv_cipher_name, struct shash_alg *hash_alg, int ivsize) { struct crypto_alg *alg; bool ret = false; alg = crypto_alg_mod_lookup(essiv_cipher_name, CRYPTO_ALG_TYPE_CIPHER, CRYPTO_ALG_TYPE_MASK); if (IS_ERR(alg)) return false; if (hash_alg->digestsize < alg->cra_cipher.cia_min_keysize || hash_alg->digestsize > alg->cra_cipher.cia_max_keysize) goto out; if (ivsize != alg->cra_blocksize) goto out; if (crypto_shash_alg_needs_key(hash_alg)) goto out; ret = true; out: crypto_mod_put(alg); return ret; } static int essiv_create(struct crypto_template *tmpl, struct rtattr **tb) { struct skcipher_alg_common *skcipher_alg = NULL; struct crypto_attr_type *algt; const char *inner_cipher_name; const char *shash_name; struct skcipher_instance *skcipher_inst = NULL; struct aead_instance *aead_inst = NULL; struct crypto_instance *inst; struct crypto_alg *base, *block_base; struct essiv_instance_ctx *ictx; struct aead_alg *aead_alg = NULL; struct crypto_alg *_hash_alg; struct shash_alg *hash_alg; int ivsize; u32 type; u32 mask; int err; algt = crypto_get_attr_type(tb); if (IS_ERR(algt)) return PTR_ERR(algt); inner_cipher_name = crypto_attr_alg_name(tb[1]); if (IS_ERR(inner_cipher_name)) return PTR_ERR(inner_cipher_name); shash_name = crypto_attr_alg_name(tb[2]); if (IS_ERR(shash_name)) return PTR_ERR(shash_name); type = algt->type & algt->mask; mask = crypto_algt_inherited_mask(algt); switch (type) { case CRYPTO_ALG_TYPE_LSKCIPHER: skcipher_inst = kzalloc(sizeof(*skcipher_inst) + sizeof(*ictx), GFP_KERNEL); if (!skcipher_inst) return -ENOMEM; inst = skcipher_crypto_instance(skcipher_inst); base = &skcipher_inst->alg.base; ictx = crypto_instance_ctx(inst); /* Symmetric cipher, e.g., "cbc(aes)" */ err = crypto_grab_skcipher(&ictx->u.skcipher_spawn, inst, inner_cipher_name, 0, mask); if (err) goto out_free_inst; skcipher_alg = crypto_spawn_skcipher_alg_common( &ictx->u.skcipher_spawn); block_base = &skcipher_alg->base; ivsize = skcipher_alg->ivsize; break; case CRYPTO_ALG_TYPE_AEAD: aead_inst = kzalloc(sizeof(*aead_inst) + sizeof(*ictx), GFP_KERNEL); if (!aead_inst) return -ENOMEM; inst = aead_crypto_instance(aead_inst); base = &aead_inst->alg.base; ictx = crypto_instance_ctx(inst); /* AEAD cipher, e.g., "authenc(hmac(sha256),cbc(aes))" */ err = crypto_grab_aead(&ictx->u.aead_spawn, inst, inner_cipher_name, 0, mask); if (err) goto out_free_inst; aead_alg = crypto_spawn_aead_alg(&ictx->u.aead_spawn); block_base = &aead_alg->base; if (!strstarts(block_base->cra_name, "authenc(")) { pr_warn("Only authenc() type AEADs are supported by ESSIV\n"); err = -EINVAL; goto out_drop_skcipher; } ivsize = aead_alg->ivsize; break; default: return -EINVAL; } if (!parse_cipher_name(ictx->essiv_cipher_name, block_base->cra_name)) { pr_warn("Failed to parse ESSIV cipher name from skcipher cra_name\n"); err = -EINVAL; goto out_drop_skcipher; } /* Synchronous hash, e.g., "sha256" */ _hash_alg = crypto_alg_mod_lookup(shash_name, CRYPTO_ALG_TYPE_SHASH, CRYPTO_ALG_TYPE_MASK | mask); if (IS_ERR(_hash_alg)) { err = PTR_ERR(_hash_alg); goto out_drop_skcipher; } hash_alg = __crypto_shash_alg(_hash_alg); /* Check the set of algorithms */ if (!essiv_supported_algorithms(ictx->essiv_cipher_name, hash_alg, ivsize)) { pr_warn("Unsupported essiv instantiation: essiv(%s,%s)\n", block_base->cra_name, hash_alg->base.cra_name); err = -EINVAL; goto out_free_hash; } /* record the driver name so we can instantiate this exact algo later */ 
	strscpy(ictx->shash_driver_name, hash_alg->base.cra_driver_name);

	/* Instance fields */

	err = -ENAMETOOLONG;
	if (snprintf(base->cra_name, CRYPTO_MAX_ALG_NAME,
		     "essiv(%s,%s)", block_base->cra_name,
		     hash_alg->base.cra_name) >= CRYPTO_MAX_ALG_NAME)
		goto out_free_hash;
	if (snprintf(base->cra_driver_name, CRYPTO_MAX_ALG_NAME,
		     "essiv(%s,%s)", block_base->cra_driver_name,
		     hash_alg->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
		goto out_free_hash;

	/*
	 * hash_alg wasn't gotten via crypto_grab*(), so we need to inherit its
	 * flags manually.
	 */
	base->cra_flags |= (hash_alg->base.cra_flags & CRYPTO_ALG_INHERITED_FLAGS);
	base->cra_blocksize = block_base->cra_blocksize;
	base->cra_ctxsize = sizeof(struct essiv_tfm_ctx);
	base->cra_alignmask = block_base->cra_alignmask;
	base->cra_priority = block_base->cra_priority;

	if (type == CRYPTO_ALG_TYPE_LSKCIPHER) {
		skcipher_inst->alg.setkey = essiv_skcipher_setkey;
		skcipher_inst->alg.encrypt = essiv_skcipher_encrypt;
		skcipher_inst->alg.decrypt = essiv_skcipher_decrypt;
		skcipher_inst->alg.init = essiv_skcipher_init_tfm;
		skcipher_inst->alg.exit = essiv_skcipher_exit_tfm;

		skcipher_inst->alg.min_keysize = skcipher_alg->min_keysize;
		skcipher_inst->alg.max_keysize = skcipher_alg->max_keysize;
		skcipher_inst->alg.ivsize = ivsize;
		skcipher_inst->alg.chunksize = skcipher_alg->chunksize;

		skcipher_inst->free = essiv_skcipher_free_instance;

		err = skcipher_register_instance(tmpl, skcipher_inst);
	} else {
		aead_inst->alg.setkey = essiv_aead_setkey;
		aead_inst->alg.setauthsize = essiv_aead_setauthsize;
		aead_inst->alg.encrypt = essiv_aead_encrypt;
		aead_inst->alg.decrypt = essiv_aead_decrypt;
		aead_inst->alg.init = essiv_aead_init_tfm;
		aead_inst->alg.exit = essiv_aead_exit_tfm;

		aead_inst->alg.ivsize = ivsize;
		aead_inst->alg.maxauthsize = crypto_aead_alg_maxauthsize(aead_alg);
		aead_inst->alg.chunksize = crypto_aead_alg_chunksize(aead_alg);

		aead_inst->free = essiv_aead_free_instance;

		err = aead_register_instance(tmpl, aead_inst);
	}

	if (err)
		goto out_free_hash;

	crypto_mod_put(_hash_alg);
	return 0;

out_free_hash:
	crypto_mod_put(_hash_alg);
out_drop_skcipher:
	if (type == CRYPTO_ALG_TYPE_LSKCIPHER)
		crypto_drop_skcipher(&ictx->u.skcipher_spawn);
	else
		crypto_drop_aead(&ictx->u.aead_spawn);
out_free_inst:
	kfree(skcipher_inst);
	kfree(aead_inst);
	return err;
}

/* essiv(cipher_name, shash_name) */
static struct crypto_template essiv_tmpl = {
	.name	= "essiv",
	.create	= essiv_create,
	.module	= THIS_MODULE,
};

static int __init essiv_module_init(void)
{
	return crypto_register_template(&essiv_tmpl);
}

static void __exit essiv_module_exit(void)
{
	crypto_unregister_template(&essiv_tmpl);
}

module_init(essiv_module_init);
module_exit(essiv_module_exit);

MODULE_DESCRIPTION("ESSIV skcipher/aead wrapper for block encryption");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("essiv");
MODULE_IMPORT_NS("CRYPTO_INTERNAL");
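/*
 * Illustrative sketch (not part of the file above): the ESSIV construction
 * this template implements, written as standalone C. The helpers toy_hash()
 * and toy_block_encrypt() are hypothetical stand-ins for the shash and
 * cipher transforms the template grabs; real users reach this logic through
 * the crypto API, as in essiv_skcipher_setkey() above.
 */
#include <stdint.h>
#include <string.h>

#define TOY_DIGEST_SIZE	32	/* e.g. sha256 */
#define TOY_BLOCK_SIZE	16	/* e.g. aes */

/* Hypothetical primitives, assumed to be provided elsewhere. */
void toy_hash(const uint8_t *in, size_t len, uint8_t out[TOY_DIGEST_SIZE]);
void toy_block_encrypt(const uint8_t key[TOY_DIGEST_SIZE],
		       const uint8_t in[TOY_BLOCK_SIZE],
		       uint8_t out[TOY_BLOCK_SIZE]);

/*
 * ESSIV: IV = E_{H(K)}(sector). Hashing the bulk key K yields a salt whose
 * size matches the ESSIV cipher's key size (that match is what
 * essiv_supported_algorithms() checks); encrypting the sector number with
 * that salt yields an unpredictable but reproducible per-sector IV.
 */
static void toy_essiv_iv(const uint8_t *key, size_t keylen, uint64_t sector,
			 uint8_t iv[TOY_BLOCK_SIZE])
{
	uint8_t salt[TOY_DIGEST_SIZE];
	uint8_t block[TOY_BLOCK_SIZE] = { 0 };

	toy_hash(key, keylen, salt);
	memcpy(block, &sector, sizeof(sector));	/* sector number, zero padded */
	toy_block_encrypt(salt, block, iv);
}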
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_PGALLOC_H
#define _ASM_X86_PGALLOC_H

#include <linux/threads.h>
#include <linux/mm.h>		/* for struct page */
#include <linux/pagemap.h>

#include <asm/cpufeature.h>

#define __HAVE_ARCH_PTE_ALLOC_ONE
#define __HAVE_ARCH_PGD_FREE
#include <asm-generic/pgalloc.h>

static inline int __paravirt_pgd_alloc(struct mm_struct *mm) { return 0; }

#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
#else
#define paravirt_pgd_alloc(mm)	__paravirt_pgd_alloc(mm)
static inline void paravirt_pgd_free(struct mm_struct *mm, pgd_t *pgd) {}
static inline void paravirt_alloc_pte(struct mm_struct *mm, unsigned long pfn) {}
static inline void paravirt_alloc_pmd(struct mm_struct *mm, unsigned long pfn) {}
static inline void paravirt_alloc_pmd_clone(unsigned long pfn, unsigned long clonepfn,
					    unsigned long start, unsigned long count) {}
static inline void paravirt_alloc_pud(struct mm_struct *mm, unsigned long pfn) {}
static inline void paravirt_alloc_p4d(struct mm_struct *mm, unsigned long pfn) {}
static inline void paravirt_release_pte(unsigned long pfn) {}
static inline void paravirt_release_pmd(unsigned long pfn) {}
static inline void paravirt_release_pud(unsigned long pfn) {}
static inline void paravirt_release_p4d(unsigned long pfn) {}
#endif

/*
 * When Page Table Isolation is active, we acquire two PGDs instead of one.
 * Being order-1, the allocation is both 8k in size and 8k-aligned, which
 * lets us just flip bit 12 in a pointer to swap between the two 4k halves.
 */
static inline unsigned int pgd_allocation_order(void)
{
	if (cpu_feature_enabled(X86_FEATURE_PTI))
		return 1;
	return 0;
}

/*
 * Allocate and free page tables.
 */
extern pgd_t *pgd_alloc(struct mm_struct *);
extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);

extern pgtable_t pte_alloc_one(struct mm_struct *);

extern void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte);

static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte,
				  unsigned long address)
{
	___pte_free_tlb(tlb, pte);
}

static inline void pmd_populate_kernel(struct mm_struct *mm,
				       pmd_t *pmd, pte_t *pte)
{
	paravirt_alloc_pte(mm, __pa(pte) >> PAGE_SHIFT);
	set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
}

static inline void pmd_populate_kernel_safe(struct mm_struct *mm,
					    pmd_t *pmd, pte_t *pte)
{
	paravirt_alloc_pte(mm, __pa(pte) >> PAGE_SHIFT);
	set_pmd_safe(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
}

static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
				struct page *pte)
{
	unsigned long pfn = page_to_pfn(pte);

	paravirt_alloc_pte(mm, pfn);
	set_pmd(pmd, __pmd(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
}

#if CONFIG_PGTABLE_LEVELS > 2
extern void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);

static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
				  unsigned long address)
{
	___pmd_free_tlb(tlb, pmd);
}

#ifdef CONFIG_X86_PAE
extern void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd);
#else	/* !CONFIG_X86_PAE */
static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
{
	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
	set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd)));
}

static inline void pud_populate_safe(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
{
	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
	set_pud_safe(pud, __pud(_PAGE_TABLE | __pa(pmd)));
}
#endif	/* CONFIG_X86_PAE */

#if CONFIG_PGTABLE_LEVELS > 3
static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
{
	paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
	set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
}

static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
{
	paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
	set_p4d_safe(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
}

extern void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud);

static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
				  unsigned long address)
{
	___pud_free_tlb(tlb, pud);
}

#if CONFIG_PGTABLE_LEVELS > 4
static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
{
	if (!pgtable_l5_enabled())
		return;
	paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
	set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
}

static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
{
	if (!pgtable_l5_enabled())
		return;
	paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
	set_pgd_safe(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
}

extern void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d);

static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
				  unsigned long address)
{
	if (pgtable_l5_enabled())
		___p4d_free_tlb(tlb, p4d);
}

#endif	/* CONFIG_PGTABLE_LEVELS > 4 */
#endif	/* CONFIG_PGTABLE_LEVELS > 3 */
#endif	/* CONFIG_PGTABLE_LEVELS > 2 */

#endif /* _ASM_X86_PGALLOC_H */
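/*
 * Illustrative aside (not from the header above): why pgd_allocation_order()
 * returns 1 under PTI. An order-1 allocation is 8k and 8k-aligned, so the
 * kernel and user halves of the PGD pair differ only in bit 12 of their
 * address, exactly as the comment above describes. The standalone helper
 * below re-derives that pointer flip; it is a sketch of the idea, not the
 * kernel's own helper for this (kernel_to_user_pgdp() and friends).
 */
#include <stdint.h>

/* Toggle between the two 4k halves of an 8k-aligned, order-1 PGD pair. */
static inline uintptr_t toy_other_pgd_half(uintptr_t pgdp)
{
	return pgdp ^ (1UL << 12);	/* flip bit 12 */
}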
/*
  FUSE: Filesystem in Userspace
  Copyright (C) 2001-2018  Miklos Szeredi <miklos@szeredi.hu>

  This program can be distributed under the terms of the GNU GPL.
  See the file COPYING.
*/

#include "fuse_i.h"
#include <linux/iversion.h>
#include <linux/posix_acl.h>
#include <linux/pagemap.h>
#include <linux/highmem.h>

static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
{
	struct fuse_conn *fc = get_fuse_conn(dir);
	struct fuse_inode *fi = get_fuse_inode(dir);

	if (!fc->do_readdirplus)
		return false;
	if (!fc->readdirplus_auto)
		return true;
	if (test_and_clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state))
		return true;
	if (ctx->pos == 0)
		return true;
	return false;
}

static void fuse_add_dirent_to_cache(struct file *file,
				     struct fuse_dirent *dirent, loff_t pos)
{
	struct fuse_inode *fi = get_fuse_inode(file_inode(file));
	size_t reclen = FUSE_DIRENT_SIZE(dirent);
	pgoff_t index;
	struct page *page;
	loff_t size;
	u64 version;
	unsigned int offset;
	void *addr;

	spin_lock(&fi->rdc.lock);
	/*
	 * Is the cache already complete? Or does this entry not go at the end
	 * of the cache?
*/ if (fi->rdc.cached || pos != fi->rdc.pos) { spin_unlock(&fi->rdc.lock); return; } version = fi->rdc.version; size = fi->rdc.size; offset = size & ~PAGE_MASK; index = size >> PAGE_SHIFT; /* Dirent doesn't fit in current page? Jump to next page. */ if (offset + reclen > PAGE_SIZE) { index++; offset = 0; } spin_unlock(&fi->rdc.lock); if (offset) { page = find_lock_page(file->f_mapping, index); } else { page = find_or_create_page(file->f_mapping, index, mapping_gfp_mask(file->f_mapping)); } if (!page) return; spin_lock(&fi->rdc.lock); /* Raced with another readdir */ if (fi->rdc.version != version || fi->rdc.size != size || WARN_ON(fi->rdc.pos != pos)) goto unlock; addr = kmap_local_page(page); if (!offset) { clear_page(addr); SetPageUptodate(page); } memcpy(addr + offset, dirent, reclen); kunmap_local(addr); fi->rdc.size = (index << PAGE_SHIFT) + offset + reclen; fi->rdc.pos = dirent->off; unlock: spin_unlock(&fi->rdc.lock); unlock_page(page); put_page(page); } static void fuse_readdir_cache_end(struct file *file, loff_t pos) { struct fuse_inode *fi = get_fuse_inode(file_inode(file)); loff_t end; spin_lock(&fi->rdc.lock); /* does cache end position match current position? */ if (fi->rdc.pos != pos) { spin_unlock(&fi->rdc.lock); return; } fi->rdc.cached = true; end = ALIGN(fi->rdc.size, PAGE_SIZE); spin_unlock(&fi->rdc.lock); /* truncate unused tail of cache */ truncate_inode_pages(file->f_mapping, end); } static bool fuse_emit(struct file *file, struct dir_context *ctx, struct fuse_dirent *dirent) { struct fuse_file *ff = file->private_data; if (ff->open_flags & FOPEN_CACHE_DIR) fuse_add_dirent_to_cache(file, dirent, ctx->pos); return dir_emit(ctx, dirent->name, dirent->namelen, dirent->ino, dirent->type | FILLDIR_FLAG_NOINTR); } static int parse_dirfile(char *buf, size_t nbytes, struct file *file, struct dir_context *ctx) { while (nbytes >= FUSE_NAME_OFFSET) { struct fuse_dirent *dirent = (struct fuse_dirent *) buf; size_t reclen = FUSE_DIRENT_SIZE(dirent); if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) return -EIO; if (reclen > nbytes) break; if (memchr(dirent->name, '/', dirent->namelen) != NULL) return -EIO; if (!fuse_emit(file, ctx, dirent)) break; buf += reclen; nbytes -= reclen; ctx->pos = dirent->off; } return 0; } static int fuse_direntplus_link(struct file *file, struct fuse_direntplus *direntplus, u64 attr_version, u64 evict_ctr) { struct fuse_entry_out *o = &direntplus->entry_out; struct fuse_dirent *dirent = &direntplus->dirent; struct dentry *parent = file->f_path.dentry; struct qstr name = QSTR_INIT(dirent->name, dirent->namelen); struct dentry *dentry; struct dentry *alias; struct inode *dir = d_inode(parent); struct fuse_conn *fc; struct inode *inode; DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); int epoch; if (!o->nodeid) { /* * Unlike in the case of fuse_lookup, zero nodeid does not mean * ENOENT. Instead, it only means the userspace filesystem did * not want to return attributes/handle for this entry. * * So do nothing. */ return 0; } if (name.name[0] == '.') { /* * We could potentially refresh the attributes of the directory * and its parent? */ if (name.len == 1) return 0; if (name.name[1] == '.' 
&& name.len == 2) return 0; } if (invalid_nodeid(o->nodeid)) return -EIO; if (fuse_invalid_attr(&o->attr)) return -EIO; fc = get_fuse_conn(dir); epoch = atomic_read(&fc->epoch); name.hash = full_name_hash(parent, name.name, name.len); dentry = d_lookup(parent, &name); if (!dentry) { retry: dentry = d_alloc_parallel(parent, &name, &wq); if (IS_ERR(dentry)) return PTR_ERR(dentry); } if (!d_in_lookup(dentry)) { struct fuse_inode *fi; inode = d_inode(dentry); if (inode && get_node_id(inode) != o->nodeid) inode = NULL; if (!inode || fuse_stale_inode(inode, o->generation, &o->attr)) { if (inode) fuse_make_bad(inode); d_invalidate(dentry); dput(dentry); goto retry; } if (fuse_is_bad(inode)) { dput(dentry); return -EIO; } fi = get_fuse_inode(inode); spin_lock(&fi->lock); fi->nlookup++; spin_unlock(&fi->lock); forget_all_cached_acls(inode); fuse_change_attributes(inode, &o->attr, NULL, ATTR_TIMEOUT(o), attr_version); /* * The other branch comes via fuse_iget() * which bumps nlookup inside */ } else { inode = fuse_iget(dir->i_sb, o->nodeid, o->generation, &o->attr, ATTR_TIMEOUT(o), attr_version, evict_ctr); if (!inode) inode = ERR_PTR(-ENOMEM); alias = d_splice_alias(inode, dentry); d_lookup_done(dentry); if (alias) { dput(dentry); dentry = alias; } if (IS_ERR(dentry)) { if (!IS_ERR(inode)) { struct fuse_inode *fi = get_fuse_inode(inode); spin_lock(&fi->lock); fi->nlookup--; spin_unlock(&fi->lock); } return PTR_ERR(dentry); } } if (fc->readdirplus_auto) set_bit(FUSE_I_INIT_RDPLUS, &get_fuse_inode(inode)->state); dentry->d_time = epoch; fuse_change_entry_timeout(dentry, o); dput(dentry); return 0; } static void fuse_force_forget(struct file *file, u64 nodeid) { struct inode *inode = file_inode(file); struct fuse_mount *fm = get_fuse_mount(inode); struct fuse_forget_in inarg; FUSE_ARGS(args); memset(&inarg, 0, sizeof(inarg)); inarg.nlookup = 1; args.opcode = FUSE_FORGET; args.nodeid = nodeid; args.in_numargs = 1; args.in_args[0].size = sizeof(inarg); args.in_args[0].value = &inarg; args.force = true; args.noreply = true; fuse_simple_request(fm, &args); /* ignore errors */ } static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file, struct dir_context *ctx, u64 attr_version, u64 evict_ctr) { struct fuse_direntplus *direntplus; struct fuse_dirent *dirent; size_t reclen; int over = 0; int ret; while (nbytes >= FUSE_NAME_OFFSET_DIRENTPLUS) { direntplus = (struct fuse_direntplus *) buf; dirent = &direntplus->dirent; reclen = FUSE_DIRENTPLUS_SIZE(direntplus); if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX) return -EIO; if (reclen > nbytes) break; if (memchr(dirent->name, '/', dirent->namelen) != NULL) return -EIO; if (!over) { /* We fill entries into dstbuf only as much as it can hold. But we still continue iterating over remaining entries to link them. If not, we need to send a FORGET for each of those which we did not link. 
*/ over = !fuse_emit(file, ctx, dirent); if (!over) ctx->pos = dirent->off; } buf += reclen; nbytes -= reclen; ret = fuse_direntplus_link(file, direntplus, attr_version, evict_ctr); if (ret) fuse_force_forget(file, direntplus->entry_out.nodeid); } return 0; } static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx) { int plus; ssize_t res; struct inode *inode = file_inode(file); struct fuse_mount *fm = get_fuse_mount(inode); struct fuse_conn *fc = fm->fc; struct fuse_io_args ia = {}; struct fuse_args *args = &ia.ap.args; void *buf; size_t bufsize = clamp((unsigned int) ctx->count, PAGE_SIZE, fc->max_pages << PAGE_SHIFT); u64 attr_version = 0, evict_ctr = 0; bool locked; buf = kvmalloc(bufsize, GFP_KERNEL); if (!buf) return -ENOMEM; args->out_args[0].value = buf; plus = fuse_use_readdirplus(inode, ctx); if (plus) { attr_version = fuse_get_attr_version(fm->fc); evict_ctr = fuse_get_evict_ctr(fm->fc); fuse_read_args_fill(&ia, file, ctx->pos, bufsize, FUSE_READDIRPLUS); } else { fuse_read_args_fill(&ia, file, ctx->pos, bufsize, FUSE_READDIR); } locked = fuse_lock_inode(inode); res = fuse_simple_request(fm, args); fuse_unlock_inode(inode, locked); if (res >= 0) { if (!res) { struct fuse_file *ff = file->private_data; if (ff->open_flags & FOPEN_CACHE_DIR) fuse_readdir_cache_end(file, ctx->pos); } else if (plus) { res = parse_dirplusfile(buf, res, file, ctx, attr_version, evict_ctr); } else { res = parse_dirfile(buf, res, file, ctx); } } kvfree(buf); fuse_invalidate_atime(inode); return res; } enum fuse_parse_result { FOUND_ERR = -1, FOUND_NONE = 0, FOUND_SOME, FOUND_ALL, }; static enum fuse_parse_result fuse_parse_cache(struct fuse_file *ff, void *addr, unsigned int size, struct dir_context *ctx) { unsigned int offset = ff->readdir.cache_off & ~PAGE_MASK; enum fuse_parse_result res = FOUND_NONE; WARN_ON(offset >= size); for (;;) { struct fuse_dirent *dirent = addr + offset; unsigned int nbytes = size - offset; size_t reclen; if (nbytes < FUSE_NAME_OFFSET || !dirent->namelen) break; reclen = FUSE_DIRENT_SIZE(dirent); /* derefs ->namelen */ if (WARN_ON(dirent->namelen > FUSE_NAME_MAX)) return FOUND_ERR; if (WARN_ON(reclen > nbytes)) return FOUND_ERR; if (WARN_ON(memchr(dirent->name, '/', dirent->namelen) != NULL)) return FOUND_ERR; if (ff->readdir.pos == ctx->pos) { res = FOUND_SOME; if (!dir_emit(ctx, dirent->name, dirent->namelen, dirent->ino, dirent->type | FILLDIR_FLAG_NOINTR)) return FOUND_ALL; ctx->pos = dirent->off; } ff->readdir.pos = dirent->off; ff->readdir.cache_off += reclen; offset += reclen; } return res; } static void fuse_rdc_reset(struct inode *inode) { struct fuse_inode *fi = get_fuse_inode(inode); fi->rdc.cached = false; fi->rdc.version++; fi->rdc.size = 0; fi->rdc.pos = 0; } #define UNCACHED 1 static int fuse_readdir_cached(struct file *file, struct dir_context *ctx) { struct fuse_file *ff = file->private_data; struct inode *inode = file_inode(file); struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_inode *fi = get_fuse_inode(inode); enum fuse_parse_result res; pgoff_t index; unsigned int size; struct page *page; void *addr; /* Seeked? If so, reset the cache stream */ if (ff->readdir.pos != ctx->pos) { ff->readdir.pos = 0; ff->readdir.cache_off = 0; } /* * We're just about to start reading into the cache or reading the * cache; both cases require an up-to-date mtime value. 
	 */
	if (!ctx->pos && fc->auto_inval_data) {
		int err = fuse_update_attributes(inode, file, STATX_MTIME);

		if (err)
			return err;
	}

retry:
	spin_lock(&fi->rdc.lock);
retry_locked:
	if (!fi->rdc.cached) {
		/* Starting cache? Set cache mtime. */
		if (!ctx->pos && !fi->rdc.size) {
			fi->rdc.mtime = inode_get_mtime(inode);
			fi->rdc.iversion = inode_query_iversion(inode);
		}
		spin_unlock(&fi->rdc.lock);
		return UNCACHED;
	}
	/*
	 * When at the beginning of the directory (i.e. just after opendir(3)
	 * or rewinddir(3)), we need to check whether the directory contents
	 * have changed, and reset the cache if so.
	 */
	if (!ctx->pos) {
		struct timespec64 mtime = inode_get_mtime(inode);

		if (inode_peek_iversion(inode) != fi->rdc.iversion ||
		    !timespec64_equal(&fi->rdc.mtime, &mtime)) {
			fuse_rdc_reset(inode);
			goto retry_locked;
		}
	}

	/*
	 * If the cache version changed since the last getdents() call, then
	 * reset the cache stream.
	 */
	if (ff->readdir.version != fi->rdc.version) {
		ff->readdir.pos = 0;
		ff->readdir.cache_off = 0;
	}
	/*
	 * If at the beginning of the cache, then reset version to
	 * current.
	 */
	if (ff->readdir.pos == 0)
		ff->readdir.version = fi->rdc.version;

	WARN_ON(fi->rdc.size < ff->readdir.cache_off);

	index = ff->readdir.cache_off >> PAGE_SHIFT;

	if (index == (fi->rdc.size >> PAGE_SHIFT))
		size = fi->rdc.size & ~PAGE_MASK;
	else
		size = PAGE_SIZE;
	spin_unlock(&fi->rdc.lock);

	/* EOF? */
	if ((ff->readdir.cache_off & ~PAGE_MASK) == size)
		return 0;

	page = find_get_page_flags(file->f_mapping, index,
				   FGP_ACCESSED | FGP_LOCK);
	/* Page gone missing, then re-added to cache, but not initialized? */
	if (page && !PageUptodate(page)) {
		unlock_page(page);
		put_page(page);
		page = NULL;
	}
	spin_lock(&fi->rdc.lock);
	if (!page) {
		/*
		 * Uh-oh: page gone missing, cache is useless
		 */
		if (fi->rdc.version == ff->readdir.version)
			fuse_rdc_reset(inode);
		goto retry_locked;
	}

	/* Make sure it's still the same version after getting the page. */
	if (ff->readdir.version != fi->rdc.version) {
		spin_unlock(&fi->rdc.lock);
		unlock_page(page);
		put_page(page);
		goto retry;
	}
	spin_unlock(&fi->rdc.lock);

	/*
	 * Contents of the page are now protected against changing by holding
	 * the page lock.
	 */
	addr = kmap_local_page(page);
	res = fuse_parse_cache(ff, addr, size, ctx);
	kunmap_local(addr);
	unlock_page(page);
	put_page(page);

	if (res == FOUND_ERR)
		return -EIO;

	if (res == FOUND_ALL)
		return 0;

	if (size == PAGE_SIZE) {
		/* We hit end of page: skip to next page. */
		ff->readdir.cache_off = ALIGN(ff->readdir.cache_off, PAGE_SIZE);
		goto retry;
	}

	/*
	 * End of cache reached. If we found the position, then we are done;
	 * otherwise we need to fall back to uncached, since the position we
	 * were looking for wasn't in the cache.
	 */
	return res == FOUND_SOME ? 0 : UNCACHED;
}

int fuse_readdir(struct file *file, struct dir_context *ctx)
{
	struct fuse_file *ff = file->private_data;
	struct inode *inode = file_inode(file);
	int err;

	if (fuse_is_bad(inode))
		return -EIO;

	err = UNCACHED;
	if (ff->open_flags & FOPEN_CACHE_DIR)
		err = fuse_readdir_cached(file, ctx);
	if (err == UNCACHED)
		err = fuse_readdir_uncached(file, ctx);

	return err;
}
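/*
 * Illustrative sketch (not part of the file above): the validation pattern
 * that parse_dirfile() and fuse_parse_cache() apply to a buffer of dirent
 * records, redone against a simplified record layout. struct toy_dirent and
 * TOY_RECLEN() are stand-ins for fuse_dirent and FUSE_DIRENT_SIZE(); the
 * checks mirror the ones above: a non-empty, bounded name, a record that
 * fits in the remaining bytes, and no '/' inside a name.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_dirent {
	uint64_t ino;
	uint64_t off;
	uint32_t namelen;
	uint32_t type;
	char name[];
};

#define TOY_NAME_MAX	1024
#define TOY_RECLEN(d) \
	((sizeof(struct toy_dirent) + (d)->namelen + 7) & ~(size_t)7)

/* Returns 0 if every record in buf[0..nbytes) is well formed, -1 otherwise. */
static int toy_validate_dirents(const char *buf, size_t nbytes)
{
	while (nbytes >= sizeof(struct toy_dirent)) {
		const struct toy_dirent *d = (const void *)buf;
		size_t reclen = TOY_RECLEN(d);

		if (!d->namelen || d->namelen > TOY_NAME_MAX)
			return -1;	/* corrupt length field */
		if (reclen > nbytes)
			break;		/* partial trailing record: stop, as the kernel does */
		if (memchr(d->name, '/', d->namelen))
			return -1;	/* names must not contain path separators */
		buf += reclen;
		nbytes -= reclen;
	}
	return 0;
}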
// SPDX-License-Identifier: GPL-2.0
/* Copyright 2011-2014 Autronica Fire and Security AS
 *
 * Author(s):
 *	2011-2014 Arvid Brodin, arvid.brodin@alten.se
 *
 * The HSR spec says never to forward the same frame twice on the same
 * interface. A frame is identified by its source MAC address and its HSR
 * sequence number.
 * This code keeps track of senders and their sequence numbers to allow
 * filtering of duplicate frames, and to detect HSR ring errors. The same
 * code handles filtering of duplicates for PRP as well.
 */

#include <linux/if_ether.h>
#include <linux/etherdevice.h>
#include <linux/slab.h>
#include <linux/rculist.h>
#include "hsr_main.h"
#include "hsr_framereg.h"
#include "hsr_netlink.h"

/* seq_nr_after(a, b) - return true if a is after (higher in sequence than) b,
 * false otherwise.
 */
static bool seq_nr_after(u16 a, u16 b)
{
	/* Remove inconsistency where
	 * seq_nr_after(a, b) == seq_nr_before(a, b)
	 */
	if ((int)b - a == 32768)
		return false;

	return (((s16)(b - a)) < 0);
}

#define seq_nr_before(a, b)		seq_nr_after((b), (a))
#define seq_nr_before_or_eq(a, b)	(!seq_nr_after((a), (b)))

#define PRP_DROP_WINDOW_LEN 32768

bool hsr_addr_is_redbox(struct hsr_priv *hsr, unsigned char *addr)
{
	if (!hsr->redbox || !is_valid_ether_addr(hsr->macaddress_redbox))
		return false;

	return ether_addr_equal(addr, hsr->macaddress_redbox);
}

bool hsr_addr_is_self(struct hsr_priv *hsr, unsigned char *addr)
{
	struct hsr_self_node *sn;
	bool ret = false;

	rcu_read_lock();
	sn = rcu_dereference(hsr->self_node);
	if (!sn) {
		WARN_ONCE(1, "HSR: No self node\n");
		goto out;
	}

	if (ether_addr_equal(addr, sn->macaddress_A) ||
	    ether_addr_equal(addr, sn->macaddress_B))
		ret = true;
out:
	rcu_read_unlock();
	return ret;
}

/* Search for mac entry. Caller must hold rcu read lock. */
static struct hsr_node *find_node_by_addr_A(struct list_head *node_db,
					    const unsigned char addr[ETH_ALEN])
{
	struct hsr_node *node;

	list_for_each_entry_rcu(node, node_db, mac_list) {
		if (ether_addr_equal(node->macaddress_A, addr))
			return node;
	}

	return NULL;
}

/* Check if a node for a given MAC address is already present in the database */
bool hsr_is_node_in_db(struct list_head *node_db,
		       const unsigned char addr[ETH_ALEN])
{
	return !!find_node_by_addr_A(node_db, addr);
}

/* Helper for device init; the self_node is used in hsr_rcv() to recognize
 * frames from self that have been looped over the HSR ring.
 */
int hsr_create_self_node(struct hsr_priv *hsr,
			 const unsigned char addr_a[ETH_ALEN],
			 const unsigned char addr_b[ETH_ALEN])
{
	struct hsr_self_node *sn, *old;

	sn = kmalloc(sizeof(*sn), GFP_KERNEL);
	if (!sn)
		return -ENOMEM;

	ether_addr_copy(sn->macaddress_A, addr_a);
	ether_addr_copy(sn->macaddress_B, addr_b);

	spin_lock_bh(&hsr->list_lock);
	old = rcu_replace_pointer(hsr->self_node, sn,
				  lockdep_is_held(&hsr->list_lock));
	spin_unlock_bh(&hsr->list_lock);

	if (old)
		kfree_rcu(old, rcu_head);
	return 0;
}

void hsr_del_self_node(struct hsr_priv *hsr)
{
	struct hsr_self_node *old;

	spin_lock_bh(&hsr->list_lock);
	old = rcu_replace_pointer(hsr->self_node, NULL,
				  lockdep_is_held(&hsr->list_lock));
	spin_unlock_bh(&hsr->list_lock);
	if (old)
		kfree_rcu(old, rcu_head);
}

void hsr_del_nodes(struct list_head *node_db)
{
	struct hsr_node *node;
	struct hsr_node *tmp;

	list_for_each_entry_safe(node, tmp, node_db, mac_list)
		kfree(node);
}

void prp_handle_san_frame(bool san, enum hsr_port_type port,
			  struct hsr_node *node)
{
	/* Mark if the SAN node is over LAN_A or LAN_B */
	if (port == HSR_PT_SLAVE_A) {
		node->san_a = true;
		return;
	}

	if (port == HSR_PT_SLAVE_B)
		node->san_b = true;
}

/* Allocate an hsr_node and add it to node_db. 'addr' is the node's address_A;
 * seq_out is used to initialize filtering of outgoing duplicate frames
 * originating from the newly added node.
 */
static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
				     struct list_head *node_db,
				     unsigned char addr[],
				     u16 seq_out, bool san,
				     enum hsr_port_type rx_port)
{
	struct hsr_node *new_node, *node;
	unsigned long now;
	int i;

	new_node = kzalloc(sizeof(*new_node), GFP_ATOMIC);
	if (!new_node)
		return NULL;

	ether_addr_copy(new_node->macaddress_A, addr);
	spin_lock_init(&new_node->seq_out_lock);

	/* We are only interested in time diffs here, so use current jiffies
	 * as initialization. (0 could trigger a spurious ring error warning.)
	 */
	now = jiffies;
	for (i = 0; i < HSR_PT_PORTS; i++) {
		new_node->time_in[i] = now;
		new_node->time_out[i] = now;
	}
	for (i = 0; i < HSR_PT_PORTS; i++) {
		new_node->seq_out[i] = seq_out;
		new_node->seq_expected[i] = seq_out + 1;
		new_node->seq_start[i] = seq_out + 1;
	}

	if (san && hsr->proto_ops->handle_san_frame)
		hsr->proto_ops->handle_san_frame(san, rx_port, new_node);

	spin_lock_bh(&hsr->list_lock);
	list_for_each_entry_rcu(node, node_db, mac_list,
				lockdep_is_held(&hsr->list_lock)) {
		if (ether_addr_equal(node->macaddress_A, addr))
			goto out;
		if (ether_addr_equal(node->macaddress_B, addr))
			goto out;
	}
	list_add_tail_rcu(&new_node->mac_list, node_db);
	spin_unlock_bh(&hsr->list_lock);
	return new_node;
out:
	spin_unlock_bh(&hsr->list_lock);
	kfree(new_node);
	return node;
}

void prp_update_san_info(struct hsr_node *node, bool is_sup)
{
	if (!is_sup)
		return;

	node->san_a = false;
	node->san_b = false;
}

/* Get the hsr_node from which 'skb' was sent. */
struct hsr_node *hsr_get_node(struct hsr_port *port, struct list_head *node_db,
			      struct sk_buff *skb, bool is_sup,
			      enum hsr_port_type rx_port)
{
	struct hsr_priv *hsr = port->hsr;
	struct hsr_node *node;
	struct ethhdr *ethhdr;
	struct prp_rct *rct;
	bool san = false;
	u16 seq_out;

	if (!skb_mac_header_was_set(skb))
		return NULL;

	ethhdr = (struct ethhdr *)skb_mac_header(skb);

	list_for_each_entry_rcu(node, node_db, mac_list) {
		if (ether_addr_equal(node->macaddress_A, ethhdr->h_source)) {
			if (hsr->proto_ops->update_san_info)
				hsr->proto_ops->update_san_info(node, is_sup);
			return node;
		}
		if (ether_addr_equal(node->macaddress_B, ethhdr->h_source)) {
			if (hsr->proto_ops->update_san_info)
				hsr->proto_ops->update_san_info(node, is_sup);
			return node;
		}
	}

	/* Check if the required node is not in the proxy nodes table */
	list_for_each_entry_rcu(node, &hsr->proxy_node_db, mac_list) {
		if (ether_addr_equal(node->macaddress_A, ethhdr->h_source)) {
			if (hsr->proto_ops->update_san_info)
				hsr->proto_ops->update_san_info(node, is_sup);
			return node;
		}
	}

	/* Any node connected to the HSR/PRP device may get a node entry
	 * created for it here.
	 */
	if (ethhdr->h_proto == htons(ETH_P_PRP) ||
	    ethhdr->h_proto == htons(ETH_P_HSR)) {
		/* Check if skb contains hsr_ethhdr */
		if (skb->mac_len < sizeof(struct hsr_ethhdr))
			return NULL;

		/* Use the existing sequence_nr from the tag as starting point
		 * for filtering duplicate frames.
		 */
		seq_out = hsr_get_skb_sequence_nr(skb) - 1;
	} else {
		rct = skb_get_PRP_rct(skb);
		if (rct && prp_check_lsdu_size(skb, rct, is_sup)) {
			seq_out = prp_get_skb_sequence_nr(rct);
		} else {
			if (rx_port != HSR_PT_MASTER)
				san = true;
			seq_out = HSR_SEQNR_START;
		}
	}

	return hsr_add_node(hsr, node_db, ethhdr->h_source, seq_out,
			    san, rx_port);
}

/* Use the Supervision frame's info about an eventual macaddress_B for merging
 * nodes that have previously had their macaddress_B registered as a separate
 * node.
 */
void hsr_handle_sup_frame(struct hsr_frame_info *frame)
{
	struct hsr_node *node_curr = frame->node_src;
	struct hsr_port *port_rcv = frame->port_rcv;
	struct hsr_priv *hsr = port_rcv->hsr;
	struct hsr_sup_payload *hsr_sp;
	struct hsr_sup_tlv *hsr_sup_tlv;
	struct hsr_node *node_real;
	struct sk_buff *skb = NULL;
	struct list_head *node_db;
	struct ethhdr *ethhdr;
	int i;
	unsigned int pull_size = 0;
	unsigned int total_pull_size = 0;

	/* Here either frame->skb_hsr or frame->skb_prp should be
	 * valid, as a supervision frame will always have protocol
	 * header info.
	 */
	if (frame->skb_hsr)
		skb = frame->skb_hsr;
	else if (frame->skb_prp)
		skb = frame->skb_prp;
	else if (frame->skb_std)
		skb = frame->skb_std;
	if (!skb)
		return;

	/* Leave the ethernet header. */
	pull_size = sizeof(struct ethhdr);
	skb_pull(skb, pull_size);
	total_pull_size += pull_size;

	ethhdr = (struct ethhdr *)skb_mac_header(skb);

	/* And leave the HSR tag. */
	if (ethhdr->h_proto == htons(ETH_P_HSR)) {
		pull_size = sizeof(struct hsr_tag);
		skb_pull(skb, pull_size);
		total_pull_size += pull_size;
	}

	/* And leave the HSR sup tag. */
	pull_size = sizeof(struct hsr_sup_tag);
	skb_pull(skb, pull_size);
	total_pull_size += pull_size;

	/* get HSR sup payload */
	hsr_sp = (struct hsr_sup_payload *)skb->data;

	/* Merge node_curr (registered on macaddress_B) into node_real */
	node_db = &port_rcv->hsr->node_db;
	node_real = find_node_by_addr_A(node_db, hsr_sp->macaddress_A);
	if (!node_real)
		/* No frame received from AddrA of this node yet */
		node_real = hsr_add_node(hsr, node_db, hsr_sp->macaddress_A,
					 HSR_SEQNR_START - 1, true,
					 port_rcv->type);
	if (!node_real)
		goto done; /* No mem */
	if (node_real == node_curr)
		/* Node has already been merged */
		goto done;

	/* Leave the first HSR sup payload. */
	pull_size = sizeof(struct hsr_sup_payload);
	skb_pull(skb, pull_size);
	total_pull_size += pull_size;

	/* Get second supervision tlv */
	hsr_sup_tlv = (struct hsr_sup_tlv *)skb->data;
	/* And check if it is a redbox mac TLV */
	if (hsr_sup_tlv->HSR_TLV_type == PRP_TLV_REDBOX_MAC) {
		/* We could stop here after pulling the first hsr_sup_payload,
		 * or proceed and also accept a macaddress_B TLV for redboxes.
		 */
		/* Sanity check length */
		if (hsr_sup_tlv->HSR_TLV_length != 6)
			goto done;

		/* Leave the second HSR sup tlv. */
		pull_size = sizeof(struct hsr_sup_tlv);
		skb_pull(skb, pull_size);
		total_pull_size += pull_size;

		/* Get redbox mac address. */
		hsr_sp = (struct hsr_sup_payload *)skb->data;

		/* Check if redbox mac and node mac are equal. */
		if (!ether_addr_equal(node_real->macaddress_A,
				      hsr_sp->macaddress_A)) {
			/* This is a redbox supervision frame for a VDAN! */
			goto done;
		}
	}

	ether_addr_copy(node_real->macaddress_B, ethhdr->h_source);
	spin_lock_bh(&node_real->seq_out_lock);
	for (i = 0; i < HSR_PT_PORTS; i++) {
		if (!node_curr->time_in_stale[i] &&
		    time_after(node_curr->time_in[i], node_real->time_in[i])) {
			node_real->time_in[i] = node_curr->time_in[i];
			node_real->time_in_stale[i] =
						node_curr->time_in_stale[i];
		}
		if (seq_nr_after(node_curr->seq_out[i], node_real->seq_out[i]))
			node_real->seq_out[i] = node_curr->seq_out[i];
	}
	spin_unlock_bh(&node_real->seq_out_lock);
	node_real->addr_B_port = port_rcv->type;

	spin_lock_bh(&hsr->list_lock);
	if (!node_curr->removed) {
		list_del_rcu(&node_curr->mac_list);
		node_curr->removed = true;
		kfree_rcu(node_curr, rcu_head);
	}
	spin_unlock_bh(&hsr->list_lock);

done:
	/* Push back here */
	skb_push(skb, total_pull_size);
}

/* 'skb' is a frame meant for this host, that is to be passed to upper layers.
 *
 * If the frame was sent by a node's B interface, replace the source
 * address with that node's "official" address (macaddress_A) so that upper
 * layers recognize where it came from.
 */
void hsr_addr_subst_source(struct hsr_node *node, struct sk_buff *skb)
{
	if (!skb_mac_header_was_set(skb)) {
		WARN_ONCE(1, "%s: Mac header not set\n", __func__);
		return;
	}

	memcpy(&eth_hdr(skb)->h_source, node->macaddress_A, ETH_ALEN);
}

/* 'skb' is a frame meant for another host.
 * 'port' is the outgoing interface
 *
 * Substitute the target (dest) MAC address if necessary, so that it matches
 * the recipient interface MAC address, regardless of whether that is the
 * recipient's A or B interface.
 * This is needed to keep the packets flowing through switches that learn on
 * which "side" the different interfaces are.
 */
void hsr_addr_subst_dest(struct hsr_node *node_src, struct sk_buff *skb,
			 struct hsr_port *port)
{
	struct hsr_node *node_dst;

	if (!skb_mac_header_was_set(skb)) {
		WARN_ONCE(1, "%s: Mac header not set\n", __func__);
		return;
	}

	if (!is_unicast_ether_addr(eth_hdr(skb)->h_dest))
		return;

	node_dst = find_node_by_addr_A(&port->hsr->node_db,
				       eth_hdr(skb)->h_dest);
	if (!node_dst && port->hsr->redbox)
		node_dst = find_node_by_addr_A(&port->hsr->proxy_node_db,
					       eth_hdr(skb)->h_dest);

	if (!node_dst) {
		if (port->hsr->prot_version != PRP_V1 && net_ratelimit())
			netdev_err(skb->dev, "%s: Unknown node\n", __func__);
		return;
	}
	if (port->type != node_dst->addr_B_port)
		return;
	if (is_valid_ether_addr(node_dst->macaddress_B))
		ether_addr_copy(eth_hdr(skb)->h_dest, node_dst->macaddress_B);
}

void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
			   u16 sequence_nr)
{
	/* Don't register incoming frames without a valid sequence number. This
	 * ensures entries of restarted nodes get pruned so that they can
	 * re-register and resume communications.
	 */
	if (!(port->dev->features & NETIF_F_HW_HSR_TAG_RM) &&
	    seq_nr_before(sequence_nr, node->seq_out[port->type]))
		return;

	node->time_in[port->type] = jiffies;
	node->time_in_stale[port->type] = false;
}

/* 'skb' is a HSR Ethernet frame (with a HSR tag inserted), with a valid
 * ethhdr->h_source address and skb->mac_header set.
 *
 * Return:
 *	 1 if frame can be shown to have been sent recently on this interface,
 *	 0 otherwise, or
 *	 negative error code on error
 */
int hsr_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
{
	struct hsr_node *node = frame->node_src;
	u16 sequence_nr = frame->sequence_nr;

	spin_lock_bh(&node->seq_out_lock);
	if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]) &&
	    time_is_after_jiffies(node->time_out[port->type] +
	    msecs_to_jiffies(HSR_ENTRY_FORGET_TIME))) {
		spin_unlock_bh(&node->seq_out_lock);
		return 1;
	}

	node->time_out[port->type] = jiffies;
	node->seq_out[port->type] = sequence_nr;
	spin_unlock_bh(&node->seq_out_lock);
	return 0;
}

/* Adaptation of the PRP duplicate discard algorithm described in wireshark
 * wiki (https://wiki.wireshark.org/PRP)
 *
 * A drop window is maintained for both LANs with start sequence set to the
 * first sequence accepted on the LAN that has not been seen on the other LAN,
 * and expected sequence set to the latest received sequence number plus one.
 *
 * When a frame is received on either LAN it is compared against the received
 * frames on the other LAN. If it is outside the drop window of the other LAN
 * the frame is accepted and the drop window is updated.
 * The drop window for the other LAN is reset.
 *
 * 'port' is the outgoing interface
 * 'frame' is the frame to be sent
 *
 * Return:
 *	 1 if frame can be shown to have been sent recently on this interface,
 *	 0 otherwise
 */
int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
{
	enum hsr_port_type other_port;
	enum hsr_port_type rcv_port;
	struct hsr_node *node;
	u16 sequence_diff;
	u16 sequence_exp;
	u16 sequence_nr;

	/* Outgoing frames are always in order
	 * and can be checked the same way as for HSR
	 */
	if (frame->port_rcv->type == HSR_PT_MASTER)
		return hsr_register_frame_out(port, frame);

	/* for PRP we should only forward frames from the slave ports
	 * to the master port
	 */
	if (port->type != HSR_PT_MASTER)
		return 1;

	node = frame->node_src;
	sequence_nr = frame->sequence_nr;
	sequence_exp = sequence_nr + 1;
	rcv_port = frame->port_rcv->type;
	other_port = rcv_port == HSR_PT_SLAVE_A ? HSR_PT_SLAVE_B :
				 HSR_PT_SLAVE_A;

	spin_lock_bh(&node->seq_out_lock);
	if (time_is_before_jiffies(node->time_out[port->type] +
	    msecs_to_jiffies(HSR_ENTRY_FORGET_TIME)) ||
	    (node->seq_start[rcv_port] == node->seq_expected[rcv_port] &&
	     node->seq_start[other_port] == node->seq_expected[other_port])) {
		/* the node hasn't been sending for a while
		 * or both drop windows are empty, forward the frame
		 */
		node->seq_start[rcv_port] = sequence_nr;
	} else if (seq_nr_before(sequence_nr, node->seq_expected[other_port]) &&
		   seq_nr_before_or_eq(node->seq_start[other_port],
				       sequence_nr)) {
		/* drop the frame, update the drop window for the other port
		 * and reset our drop window
		 */
		node->seq_start[other_port] = sequence_exp;
		node->seq_expected[rcv_port] = sequence_exp;
		node->seq_start[rcv_port] = node->seq_expected[rcv_port];
		spin_unlock_bh(&node->seq_out_lock);
		return 1;
	}

	/* update the drop window for the port where this frame was received
	 * and clear the drop window for the other port
	 */
	node->seq_start[other_port] = node->seq_expected[other_port];
	node->seq_expected[rcv_port] = sequence_exp;
	sequence_diff = sequence_exp - node->seq_start[rcv_port];
	if (sequence_diff > PRP_DROP_WINDOW_LEN)
		node->seq_start[rcv_port] = sequence_exp - PRP_DROP_WINDOW_LEN;

	node->time_out[port->type] = jiffies;
	node->seq_out[port->type] = sequence_nr;
	spin_unlock_bh(&node->seq_out_lock);
	return 0;
}

#if IS_MODULE(CONFIG_PRP_DUP_DISCARD_KUNIT_TEST)
EXPORT_SYMBOL(prp_register_frame_out);
#endif

static struct hsr_port *get_late_port(struct hsr_priv *hsr,
				      struct hsr_node *node)
{
	if (node->time_in_stale[HSR_PT_SLAVE_A])
		return hsr_port_get_hsr(hsr, HSR_PT_SLAVE_A);
	if (node->time_in_stale[HSR_PT_SLAVE_B])
		return hsr_port_get_hsr(hsr, HSR_PT_SLAVE_B);

	if (time_after(node->time_in[HSR_PT_SLAVE_B],
		       node->time_in[HSR_PT_SLAVE_A] +
					msecs_to_jiffies(MAX_SLAVE_DIFF)))
		return hsr_port_get_hsr(hsr, HSR_PT_SLAVE_A);
	if (time_after(node->time_in[HSR_PT_SLAVE_A],
		       node->time_in[HSR_PT_SLAVE_B] +
					msecs_to_jiffies(MAX_SLAVE_DIFF)))
		return hsr_port_get_hsr(hsr, HSR_PT_SLAVE_B);

	return NULL;
}

/* Remove stale sequence_nr records. Called by timer every
 * HSR_LIFE_CHECK_INTERVAL (two seconds or so).
 */
void hsr_prune_nodes(struct timer_list *t)
{
	struct hsr_priv *hsr = timer_container_of(hsr, t, prune_timer);
	struct hsr_node *node;
	struct hsr_node *tmp;
	struct hsr_port *port;
	unsigned long timestamp;
	unsigned long time_a, time_b;

	spin_lock_bh(&hsr->list_lock);
	list_for_each_entry_safe(node, tmp, &hsr->node_db, mac_list) {
		/* Don't prune own node. Neither time_in[HSR_PT_SLAVE_A]
		 * nor time_in[HSR_PT_SLAVE_B] will ever be updated for
		 * the master port. Thus the master node would be repeatedly
		 * pruned, leading to packet loss.
		 */
		if (hsr_addr_is_self(hsr, node->macaddress_A))
			continue;

		/* Shorthand */
		time_a = node->time_in[HSR_PT_SLAVE_A];
		time_b = node->time_in[HSR_PT_SLAVE_B];

		/* Check for timestamps old enough to risk wrap-around */
		if (time_after(jiffies, time_a + MAX_JIFFY_OFFSET / 2))
			node->time_in_stale[HSR_PT_SLAVE_A] = true;
		if (time_after(jiffies, time_b + MAX_JIFFY_OFFSET / 2))
			node->time_in_stale[HSR_PT_SLAVE_B] = true;

		/* Get age of newest frame from node.
		 * At least one time_in is OK here; nodes get pruned long
		 * before both time_ins can get stale
		 */
		timestamp = time_a;
		if (node->time_in_stale[HSR_PT_SLAVE_A] ||
		    (!node->time_in_stale[HSR_PT_SLAVE_B] &&
		    time_after(time_b, time_a)))
			timestamp = time_b;

		/* Warn of ring error only as long as we get frames at all */
		if (time_is_after_jiffies(timestamp +
				msecs_to_jiffies(1.5 * MAX_SLAVE_DIFF))) {
			rcu_read_lock();
			port = get_late_port(hsr, node);
			if (port)
				hsr_nl_ringerror(hsr, node->macaddress_A, port);
			rcu_read_unlock();
		}

		/* Prune old entries */
		if (time_is_before_jiffies(timestamp +
				msecs_to_jiffies(HSR_NODE_FORGET_TIME))) {
			hsr_nl_nodedown(hsr, node->macaddress_A);
			if (!node->removed) {
				list_del_rcu(&node->mac_list);
				node->removed = true;
				/* Note that we need to free this entry later: */
				kfree_rcu(node, rcu_head);
			}
		}
	}
	spin_unlock_bh(&hsr->list_lock);

	/* Restart timer */
	mod_timer(&hsr->prune_timer,
		  jiffies + msecs_to_jiffies(PRUNE_PERIOD));
}

void hsr_prune_proxy_nodes(struct timer_list *t)
{
	struct hsr_priv *hsr = timer_container_of(hsr, t, prune_proxy_timer);
	unsigned long timestamp;
	struct hsr_node *node;
	struct hsr_node *tmp;

	spin_lock_bh(&hsr->list_lock);
	list_for_each_entry_safe(node, tmp, &hsr->proxy_node_db, mac_list) {
		/* Don't prune RedBox node. */
		if (hsr_addr_is_redbox(hsr, node->macaddress_A))
			continue;

		timestamp = node->time_in[HSR_PT_INTERLINK];

		/* Prune old entries */
		if (time_is_before_jiffies(timestamp +
				msecs_to_jiffies(HSR_PROXY_NODE_FORGET_TIME))) {
			hsr_nl_nodedown(hsr, node->macaddress_A);
			if (!node->removed) {
				list_del_rcu(&node->mac_list);
				node->removed = true;
				/* Note that we need to free this entry later: */
				kfree_rcu(node, rcu_head);
			}
		}
	}
	spin_unlock_bh(&hsr->list_lock);

	/* Restart timer */
	mod_timer(&hsr->prune_proxy_timer,
		  jiffies + msecs_to_jiffies(PRUNE_PROXY_PERIOD));
}

void *hsr_get_next_node(struct hsr_priv *hsr, void *_pos,
			unsigned char addr[ETH_ALEN])
{
	struct hsr_node *node;

	if (!_pos) {
		node = list_first_or_null_rcu(&hsr->node_db,
					      struct hsr_node, mac_list);
		if (node)
			ether_addr_copy(addr, node->macaddress_A);
		return node;
	}

	node = _pos;
	list_for_each_entry_continue_rcu(node, &hsr->node_db, mac_list) {
		ether_addr_copy(addr, node->macaddress_A);
		return node;
	}

	return NULL;
}

int hsr_get_node_data(struct hsr_priv *hsr,
		      const unsigned char *addr,
		      unsigned char addr_b[ETH_ALEN],
		      unsigned int *addr_b_ifindex,
		      int *if1_age,
		      u16 *if1_seq,
		      int *if2_age,
		      u16 *if2_seq)
{
	struct hsr_node *node;
	struct hsr_port *port;
	unsigned long tdiff;

	node = find_node_by_addr_A(&hsr->node_db, addr);
	if (!node)
		return -ENOENT;

	ether_addr_copy(addr_b, node->macaddress_B);

	tdiff = jiffies - node->time_in[HSR_PT_SLAVE_A];
	if (node->time_in_stale[HSR_PT_SLAVE_A])
		*if1_age = INT_MAX;
#if HZ <= MSEC_PER_SEC
	else if (tdiff > msecs_to_jiffies(INT_MAX))
		*if1_age = INT_MAX;
#endif
	else
		*if1_age = jiffies_to_msecs(tdiff);

	tdiff = jiffies - node->time_in[HSR_PT_SLAVE_B];
	if (node->time_in_stale[HSR_PT_SLAVE_B])
		*if2_age = INT_MAX;
#if HZ <= MSEC_PER_SEC
	else if (tdiff > msecs_to_jiffies(INT_MAX))
		*if2_age = INT_MAX;
#endif
	else
		*if2_age = jiffies_to_msecs(tdiff);

	/* Present sequence numbers as if they were incoming on interface */
	*if1_seq = node->seq_out[HSR_PT_SLAVE_B];
	*if2_seq = node->seq_out[HSR_PT_SLAVE_A];

	if (node->addr_B_port != HSR_PT_NONE) {
		port = hsr_port_get_hsr(hsr, node->addr_B_port);
		*addr_b_ifindex = port->dev->ifindex;
	} else {
		*addr_b_ifindex = -1;
	}

	return 0;
}
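/*
 * Illustrative sketch (not part of the file above): the wrap-safe sequence
 * comparison that seq_nr_after() relies on, as standalone C with a tiny
 * self-check. Casting the u16 difference to s16 makes "after" mean "within
 * the forward half of the 16-bit sequence space" (two's complement assumed,
 * as in the kernel), and the explicit 32768 check breaks the tie where a and
 * b are exact opposites, so that exactly one of after(a, b) / after(b, a)
 * holds instead of both.
 */
#include <stdint.h>
#include <assert.h>

static int toy_seq_after(uint16_t a, uint16_t b)
{
	/* a and b are exactly half the sequence space apart: treat a as
	 * not-after b, so the pair is ordered one way only.
	 */
	if ((int)b - (int)a == 32768)
		return 0;
	return (int16_t)(uint16_t)(b - a) < 0;
}

int main(void)
{
	assert(toy_seq_after(2, 1));		/* plain ordering */
	assert(toy_seq_after(5, 65530));	/* ordering across the u16 wrap */
	assert(!toy_seq_after(1, 2));
	assert(!toy_seq_after(0, 32768));	/* tie, broken asymmetrically */
	assert(toy_seq_after(32768, 0));
	return 0;
}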
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2000-2002 Joakim Axelsson <gozem@linux.nu>
 *                         Patrick Schaaf <bof@bof.de>
 * Copyright (C) 2003-2013 Jozsef Kadlecsik <kadlec@netfilter.org>
 */

/* Kernel module implementing an IP set type: the bitmap:ip type */

#include <linux/module.h>
#include <linux/ip.h>
#include <linux/skbuff.h>
#include <linux/errno.h>
#include <linux/bitops.h>
#include <linux/spinlock.h>
#include <linux/netlink.h>
#include <linux/jiffies.h>
#include <linux/timer.h>
#include <net/netlink.h>
#include <net/tcp.h>

#include <linux/netfilter/ipset/pfxlen.h>
#include <linux/netfilter/ipset/ip_set.h>
#include <linux/netfilter/ipset/ip_set_bitmap.h>

#define IPSET_TYPE_REV_MIN	0
/*				1	Counter support added */
/*				2	Comment support added */
#define IPSET_TYPE_REV_MAX	3	/* skbinfo support added */

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@netfilter.org>");
IP_SET_MODULE_DESC("bitmap:ip", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX);
MODULE_ALIAS("ip_set_bitmap:ip");

#define MTYPE		bitmap_ip
#define HOST_MASK	32

/* Type structure */
struct bitmap_ip {
	unsigned long *members;	/* the set members */
	u32 first_ip;		/* host byte order, included in range */
	u32 last_ip;		/* host byte order, included in range */
	u32 elements;		/* number of max elements in the set */
	u32 hosts;		/* number of hosts in a subnet */
	size_t memsize;		/* members size */
	u8 netmask;		/* subnet netmask */
	struct timer_list gc;	/* garbage collection */
	struct ip_set *set;	/* attached to this ip_set */
	unsigned char extensions[]	/* data extensions */
		__aligned(__alignof__(u64));
};

/* ADT structure for generic function args */
struct bitmap_ip_adt_elem {
	u16 id;
};

static u32
ip_to_id(const struct bitmap_ip *m, u32 ip)
{
	return ((ip & ip_set_hostmask(m->netmask)) - m->first_ip) / m->hosts;
}

/* Common functions */

static int
bitmap_ip_do_test(const struct bitmap_ip_adt_elem *e,
		  struct bitmap_ip *map, size_t dsize)
{
	return !!test_bit(e->id, map->members);
}

static int
bitmap_ip_gc_test(u16 id, const struct bitmap_ip *map, size_t dsize)
{
return !!test_bit(id, map->members); } static int bitmap_ip_do_add(const struct bitmap_ip_adt_elem *e, struct bitmap_ip *map, u32 flags, size_t dsize) { return !!test_bit(e->id, map->members); } static int bitmap_ip_do_del(const struct bitmap_ip_adt_elem *e, struct bitmap_ip *map) { return !test_and_clear_bit(e->id, map->members); } static int bitmap_ip_do_list(struct sk_buff *skb, const struct bitmap_ip *map, u32 id, size_t dsize) { return nla_put_ipaddr4(skb, IPSET_ATTR_IP, htonl(map->first_ip + id * map->hosts)); } static int bitmap_ip_do_head(struct sk_buff *skb, const struct bitmap_ip *map) { return nla_put_ipaddr4(skb, IPSET_ATTR_IP, htonl(map->first_ip)) || nla_put_ipaddr4(skb, IPSET_ATTR_IP_TO, htonl(map->last_ip)) || (map->netmask != 32 && nla_put_u8(skb, IPSET_ATTR_NETMASK, map->netmask)); } static int bitmap_ip_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { struct bitmap_ip *map = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct bitmap_ip_adt_elem e = { .id = 0 }; struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); u32 ip; ip = ntohl(ip4addr(skb, opt->flags & IPSET_DIM_ONE_SRC)); if (ip < map->first_ip || ip > map->last_ip) return -IPSET_ERR_BITMAP_RANGE; e.id = ip_to_id(map, ip); return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); } static int bitmap_ip_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { struct bitmap_ip *map = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; u32 ip = 0, ip_to = 0; struct bitmap_ip_adt_elem e = { .id = 0 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); int ret = 0; if (tb[IPSET_ATTR_LINENO]) *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); if (unlikely(!tb[IPSET_ATTR_IP])) return -IPSET_ERR_PROTOCOL; ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP], &ip); if (ret) return ret; ret = ip_set_get_extensions(set, tb, &ext); if (ret) return ret; if (ip < map->first_ip || ip > map->last_ip) return -IPSET_ERR_BITMAP_RANGE; if (adt == IPSET_TEST) { e.id = ip_to_id(map, ip); return adtfn(set, &e, &ext, &ext, flags); } if (tb[IPSET_ATTR_IP_TO]) { ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to); if (ret) return ret; if (ip > ip_to) swap(ip, ip_to); } else if (tb[IPSET_ATTR_CIDR]) { u8 cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); if (!cidr || cidr > HOST_MASK) return -IPSET_ERR_INVALID_CIDR; ip_set_mask_from_to(ip, ip_to, cidr); } else { ip_to = ip; } if (ip < map->first_ip || ip_to > map->last_ip) return -IPSET_ERR_BITMAP_RANGE; for (; !before(ip_to, ip); ip += map->hosts) { e.id = ip_to_id(map, ip); ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) return ret; ret = 0; } return ret; } static bool bitmap_ip_same_set(const struct ip_set *a, const struct ip_set *b) { const struct bitmap_ip *x = a->data; const struct bitmap_ip *y = b->data; return x->first_ip == y->first_ip && x->last_ip == y->last_ip && x->netmask == y->netmask && a->timeout == b->timeout && a->extensions == b->extensions; } /* Plain variant */ struct bitmap_ip_elem { }; #include "ip_set_bitmap_gen.h" /* Create bitmap:ip type of sets */ static bool init_map_ip(struct ip_set *set, struct bitmap_ip *map, u32 first_ip, u32 last_ip, u32 elements, u32 hosts, u8 netmask) { map->members = bitmap_zalloc(elements, GFP_KERNEL | __GFP_NOWARN); if (!map->members) return false; map->first_ip = first_ip; map->last_ip = last_ip; map->elements = elements; map->hosts = hosts; map->netmask = netmask; 
set->timeout = IPSET_NO_TIMEOUT; map->set = set; set->data = map; set->family = NFPROTO_IPV4; return true; } static u32 range_to_mask(u32 from, u32 to, u8 *bits) { u32 mask = 0xFFFFFFFE; *bits = 32; while (--(*bits) > 0 && mask && (to & mask) != from) mask <<= 1; return mask; } static int bitmap_ip_create(struct net *net, struct ip_set *set, struct nlattr *tb[], u32 flags) { struct bitmap_ip *map; u32 first_ip = 0, last_ip = 0, hosts; u64 elements; u8 netmask = 32; int ret; if (unlikely(!tb[IPSET_ATTR_IP] || !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS))) return -IPSET_ERR_PROTOCOL; ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP], &first_ip); if (ret) return ret; if (tb[IPSET_ATTR_IP_TO]) { ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &last_ip); if (ret) return ret; if (first_ip > last_ip) swap(first_ip, last_ip); } else if (tb[IPSET_ATTR_CIDR]) { u8 cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); if (cidr >= HOST_MASK) return -IPSET_ERR_INVALID_CIDR; ip_set_mask_from_to(first_ip, last_ip, cidr); } else { return -IPSET_ERR_PROTOCOL; } if (tb[IPSET_ATTR_NETMASK]) { netmask = nla_get_u8(tb[IPSET_ATTR_NETMASK]); if (netmask > HOST_MASK) return -IPSET_ERR_INVALID_NETMASK; first_ip &= ip_set_hostmask(netmask); last_ip |= ~ip_set_hostmask(netmask); } if (netmask == 32) { hosts = 1; elements = (u64)last_ip - first_ip + 1; } else { u8 mask_bits; u32 mask; mask = range_to_mask(first_ip, last_ip, &mask_bits); if ((!mask && (first_ip || last_ip != 0xFFFFFFFF)) || netmask <= mask_bits) return -IPSET_ERR_BITMAP_RANGE; pr_debug("mask_bits %u, netmask %u\n", mask_bits, netmask); hosts = 2U << (32 - netmask - 1); elements = 2UL << (netmask - mask_bits - 1); } if (elements > IPSET_BITMAP_MAX_RANGE + 1) return -IPSET_ERR_BITMAP_RANGE_SIZE; pr_debug("hosts %u, elements %llu\n", hosts, (unsigned long long)elements); set->dsize = ip_set_elem_len(set, tb, 0, 0); map = ip_set_alloc(sizeof(*map) + elements * set->dsize); if (!map) return -ENOMEM; map->memsize = BITS_TO_LONGS(elements) * sizeof(unsigned long); set->variant = &bitmap_ip; if (!init_map_ip(set, map, first_ip, last_ip, elements, hosts, netmask)) { ip_set_free(map); return -ENOMEM; } if (tb[IPSET_ATTR_TIMEOUT]) { set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); bitmap_ip_gc_init(set, bitmap_ip_gc); } return 0; } static struct ip_set_type bitmap_ip_type __read_mostly = { .name = "bitmap:ip", .protocol = IPSET_PROTOCOL, .features = IPSET_TYPE_IP, .dimension = IPSET_DIM_ONE, .family = NFPROTO_IPV4, .revision_min = IPSET_TYPE_REV_MIN, .revision_max = IPSET_TYPE_REV_MAX, .create = bitmap_ip_create, .create_policy = { [IPSET_ATTR_IP] = { .type = NLA_NESTED }, [IPSET_ATTR_IP_TO] = { .type = NLA_NESTED }, [IPSET_ATTR_CIDR] = { .type = NLA_U8 }, [IPSET_ATTR_NETMASK] = { .type = NLA_U8 }, [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, }, .adt_policy = { [IPSET_ATTR_IP] = { .type = NLA_NESTED }, [IPSET_ATTR_IP_TO] = { .type = NLA_NESTED }, [IPSET_ATTR_CIDR] = { .type = NLA_U8 }, [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING, .len = IPSET_MAX_COMMENT_SIZE }, [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 }, [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 }, [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 }, }, .me = THIS_MODULE, }; static int __init bitmap_ip_init(void) { return 
ip_set_type_register(&bitmap_ip_type);
}

static void __exit
bitmap_ip_fini(void)
{
	rcu_barrier();
	ip_set_type_unregister(&bitmap_ip_type);
}

module_init(bitmap_ip_init);
module_exit(bitmap_ip_fini);
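/*
 * Illustrative sketch (not part of the original source): the address-to-bit
 * mapping used by bitmap:ip, redone as a standalone userspace program so the
 * arithmetic of ip_to_id() and the hosts/netmask handling in
 * bitmap_ip_create() can be checked by hand.  The sample range
 * 192.168.0.0/22 with netmask 26 is an assumption chosen for the example,
 * and hostmask() is only a stand-in for the kernel's ip_set_hostmask().
 * The sketch assumes netmask < 32, matching the non-trivial branch of
 * bitmap_ip_create().
 */
#include <stdint.h>
#include <stdio.h>

/* host-order mask with the top 'bits' bits set (stand-in for ip_set_hostmask()) */
static uint32_t hostmask(uint8_t bits)
{
	return bits ? ~((1U << (32 - bits)) - 1) : 0;
}

int main(void)
{
	uint32_t first_ip = 0xC0A80000;		/* 192.168.0.0, host byte order */
	uint8_t netmask = 26;
	/* same value as the kernel's "hosts = 2U << (32 - netmask - 1)" */
	uint32_t hosts = 1U << (32 - netmask);
	uint32_t ip = 0xC0A80185;		/* 192.168.1.133 */
	/* the ip_to_id() computation: mask, offset from first_ip, divide by subnet size */
	uint32_t id = ((ip & hostmask(netmask)) - first_ip) / hosts;

	printf("192.168.1.133 maps to bit %u (each bit covers %u addresses)\n",
	       id, hosts);	/* prints bit 6, 64 addresses per bit */
	return 0;
}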
// SPDX-License-Identifier: GPL-2.0+
/*
 * NILFS module and super block management.
 *
 * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
 *
 * Written by Ryusuke Konishi.
 */
/*
 *  linux/fs/ext2/super.c
 *
 * Copyright (C) 1992, 1993, 1994, 1995
 * Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 *
 *  from
 *
 *  linux/fs/minix/inode.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 *
 *  Big-endian to little-endian byte-swapping/bitmaps by
 *  David S.
Miller (davem@caip.rutgers.edu), 1995 */ #include <linux/module.h> #include <linux/string.h> #include <linux/slab.h> #include <linux/init.h> #include <linux/blkdev.h> #include <linux/crc32.h> #include <linux/vfs.h> #include <linux/writeback.h> #include <linux/seq_file.h> #include <linux/mount.h> #include <linux/fs_context.h> #include <linux/fs_parser.h> #include "nilfs.h" #include "export.h" #include "mdt.h" #include "alloc.h" #include "btree.h" #include "btnode.h" #include "page.h" #include "cpfile.h" #include "sufile.h" /* nilfs_sufile_resize(), nilfs_sufile_set_alloc_range() */ #include "ifile.h" #include "dat.h" #include "segment.h" #include "segbuf.h" MODULE_AUTHOR("NTT Corp."); MODULE_DESCRIPTION("A New Implementation of the Log-structured Filesystem " "(NILFS)"); MODULE_LICENSE("GPL"); static struct kmem_cache *nilfs_inode_cachep; struct kmem_cache *nilfs_transaction_cachep; struct kmem_cache *nilfs_segbuf_cachep; struct kmem_cache *nilfs_btree_path_cache; static int nilfs_setup_super(struct super_block *sb, int is_mount); void __nilfs_msg(struct super_block *sb, const char *fmt, ...) { struct va_format vaf; va_list args; int level; va_start(args, fmt); level = printk_get_level(fmt); vaf.fmt = printk_skip_level(fmt); vaf.va = &args; if (sb) printk("%c%cNILFS (%s): %pV\n", KERN_SOH_ASCII, level, sb->s_id, &vaf); else printk("%c%cNILFS: %pV\n", KERN_SOH_ASCII, level, &vaf); va_end(args); } static void nilfs_set_error(struct super_block *sb) { struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_super_block **sbp; down_write(&nilfs->ns_sem); if (!(nilfs->ns_mount_state & NILFS_ERROR_FS)) { nilfs->ns_mount_state |= NILFS_ERROR_FS; sbp = nilfs_prepare_super(sb, 0); if (likely(sbp)) { sbp[0]->s_state |= cpu_to_le16(NILFS_ERROR_FS); if (sbp[1]) sbp[1]->s_state |= cpu_to_le16(NILFS_ERROR_FS); nilfs_commit_super(sb, NILFS_SB_COMMIT_ALL); } } up_write(&nilfs->ns_sem); } /** * __nilfs_error() - report failure condition on a filesystem * @sb: super block instance * @function: name of calling function * @fmt: format string for message to be output * @...: optional arguments to @fmt * * __nilfs_error() sets an ERROR_FS flag on the superblock as well as * reporting an error message. This function should be called when * NILFS detects incoherences or defects of meta data on disk. * * This implements the body of nilfs_error() macro. Normally, * nilfs_error() should be used. As for sustainable errors such as a * single-shot I/O error, nilfs_err() should be used instead. * * Callers should not add a trailing newline since this will do it. */ void __nilfs_error(struct super_block *sb, const char *function, const char *fmt, ...) 
{ struct the_nilfs *nilfs = sb->s_fs_info; struct va_format vaf; va_list args; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; printk(KERN_CRIT "NILFS error (device %s): %s: %pV\n", sb->s_id, function, &vaf); va_end(args); if (!sb_rdonly(sb)) { nilfs_set_error(sb); if (nilfs_test_opt(nilfs, ERRORS_RO)) { printk(KERN_CRIT "Remounting filesystem read-only\n"); sb->s_flags |= SB_RDONLY; } } if (nilfs_test_opt(nilfs, ERRORS_PANIC)) panic("NILFS (device %s): panic forced after error\n", sb->s_id); } struct inode *nilfs_alloc_inode(struct super_block *sb) { struct nilfs_inode_info *ii; ii = alloc_inode_sb(sb, nilfs_inode_cachep, GFP_NOFS); if (!ii) return NULL; ii->i_bh = NULL; ii->i_state = 0; ii->i_type = 0; ii->i_cno = 0; ii->i_assoc_inode = NULL; ii->i_bmap = &ii->i_bmap_data; return &ii->vfs_inode; } static void nilfs_free_inode(struct inode *inode) { if (nilfs_is_metadata_file_inode(inode)) nilfs_mdt_destroy(inode); kmem_cache_free(nilfs_inode_cachep, NILFS_I(inode)); } static int nilfs_sync_super(struct super_block *sb, int flag) { struct the_nilfs *nilfs = sb->s_fs_info; int err; retry: set_buffer_dirty(nilfs->ns_sbh[0]); if (nilfs_test_opt(nilfs, BARRIER)) { err = __sync_dirty_buffer(nilfs->ns_sbh[0], REQ_SYNC | REQ_PREFLUSH | REQ_FUA); } else { err = sync_dirty_buffer(nilfs->ns_sbh[0]); } if (unlikely(err)) { nilfs_err(sb, "unable to write superblock: err=%d", err); if (err == -EIO && nilfs->ns_sbh[1]) { /* * sbp[0] points to newer log than sbp[1], * so copy sbp[0] to sbp[1] to take over sbp[0]. */ memcpy(nilfs->ns_sbp[1], nilfs->ns_sbp[0], nilfs->ns_sbsize); nilfs_fall_back_super_block(nilfs); goto retry; } } else { struct nilfs_super_block *sbp = nilfs->ns_sbp[0]; nilfs->ns_sbwcount++; /* * The latest segment becomes trailable from the position * written in superblock. */ clear_nilfs_discontinued(nilfs); /* update GC protection for recent segments */ if (nilfs->ns_sbh[1]) { if (flag == NILFS_SB_COMMIT_ALL) { set_buffer_dirty(nilfs->ns_sbh[1]); if (sync_dirty_buffer(nilfs->ns_sbh[1]) < 0) goto out; } if (le64_to_cpu(nilfs->ns_sbp[1]->s_last_cno) < le64_to_cpu(nilfs->ns_sbp[0]->s_last_cno)) sbp = nilfs->ns_sbp[1]; } spin_lock(&nilfs->ns_last_segment_lock); nilfs->ns_prot_seq = le64_to_cpu(sbp->s_last_seq); spin_unlock(&nilfs->ns_last_segment_lock); } out: return err; } void nilfs_set_log_cursor(struct nilfs_super_block *sbp, struct the_nilfs *nilfs) { sector_t nfreeblocks; /* nilfs->ns_sem must be locked by the caller. */ nilfs_count_free_blocks(nilfs, &nfreeblocks); sbp->s_free_blocks_count = cpu_to_le64(nfreeblocks); spin_lock(&nilfs->ns_last_segment_lock); sbp->s_last_seq = cpu_to_le64(nilfs->ns_last_seq); sbp->s_last_pseg = cpu_to_le64(nilfs->ns_last_pseg); sbp->s_last_cno = cpu_to_le64(nilfs->ns_last_cno); spin_unlock(&nilfs->ns_last_segment_lock); } struct nilfs_super_block **nilfs_prepare_super(struct super_block *sb, int flip) { struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_super_block **sbp = nilfs->ns_sbp; /* nilfs->ns_sem must be locked by the caller. 
*/ if (sbp[0]->s_magic != cpu_to_le16(NILFS_SUPER_MAGIC)) { if (sbp[1] && sbp[1]->s_magic == cpu_to_le16(NILFS_SUPER_MAGIC)) { memcpy(sbp[0], sbp[1], nilfs->ns_sbsize); } else { nilfs_crit(sb, "superblock broke"); return NULL; } } else if (sbp[1] && sbp[1]->s_magic != cpu_to_le16(NILFS_SUPER_MAGIC)) { memcpy(sbp[1], sbp[0], nilfs->ns_sbsize); } if (flip && sbp[1]) nilfs_swap_super_block(nilfs); return sbp; } int nilfs_commit_super(struct super_block *sb, int flag) { struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_super_block **sbp = nilfs->ns_sbp; time64_t t; /* nilfs->ns_sem must be locked by the caller. */ t = ktime_get_real_seconds(); nilfs->ns_sbwtime = t; sbp[0]->s_wtime = cpu_to_le64(t); sbp[0]->s_sum = 0; sbp[0]->s_sum = cpu_to_le32(crc32_le(nilfs->ns_crc_seed, (unsigned char *)sbp[0], nilfs->ns_sbsize)); if (flag == NILFS_SB_COMMIT_ALL && sbp[1]) { sbp[1]->s_wtime = sbp[0]->s_wtime; sbp[1]->s_sum = 0; sbp[1]->s_sum = cpu_to_le32(crc32_le(nilfs->ns_crc_seed, (unsigned char *)sbp[1], nilfs->ns_sbsize)); } clear_nilfs_sb_dirty(nilfs); nilfs->ns_flushed_device = 1; /* make sure store to ns_flushed_device cannot be reordered */ smp_wmb(); return nilfs_sync_super(sb, flag); } /** * nilfs_cleanup_super() - write filesystem state for cleanup * @sb: super block instance to be unmounted or degraded to read-only * * This function restores state flags in the on-disk super block. * This will set "clean" flag (i.e. NILFS_VALID_FS) unless the * filesystem was not clean previously. * * Return: 0 on success, %-EIO if I/O error or superblock is corrupted. */ int nilfs_cleanup_super(struct super_block *sb) { struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_super_block **sbp; int flag = NILFS_SB_COMMIT; int ret = -EIO; sbp = nilfs_prepare_super(sb, 0); if (sbp) { sbp[0]->s_state = cpu_to_le16(nilfs->ns_mount_state); nilfs_set_log_cursor(sbp[0], nilfs); if (sbp[1] && sbp[0]->s_last_cno == sbp[1]->s_last_cno) { /* * make the "clean" flag also to the opposite * super block if both super blocks point to * the same checkpoint. */ sbp[1]->s_state = sbp[0]->s_state; flag = NILFS_SB_COMMIT_ALL; } ret = nilfs_commit_super(sb, flag); } return ret; } /** * nilfs_move_2nd_super - relocate secondary super block * @sb: super block instance * @sb2off: new offset of the secondary super block (in bytes) * * Return: 0 on success, or a negative error code on failure. */ static int nilfs_move_2nd_super(struct super_block *sb, loff_t sb2off) { struct the_nilfs *nilfs = sb->s_fs_info; struct buffer_head *nsbh; struct nilfs_super_block *nsbp; sector_t blocknr, newblocknr; unsigned long offset; int sb2i; /* array index of the secondary superblock */ int ret = 0; /* nilfs->ns_sem must be locked by the caller. 
*/ if (nilfs->ns_sbh[1] && nilfs->ns_sbh[1]->b_blocknr > nilfs->ns_first_data_block) { sb2i = 1; blocknr = nilfs->ns_sbh[1]->b_blocknr; } else if (nilfs->ns_sbh[0]->b_blocknr > nilfs->ns_first_data_block) { sb2i = 0; blocknr = nilfs->ns_sbh[0]->b_blocknr; } else { sb2i = -1; blocknr = 0; } if (sb2i >= 0 && (u64)blocknr << nilfs->ns_blocksize_bits == sb2off) goto out; /* super block location is unchanged */ /* Get new super block buffer */ newblocknr = sb2off >> nilfs->ns_blocksize_bits; offset = sb2off & (nilfs->ns_blocksize - 1); nsbh = sb_getblk(sb, newblocknr); if (!nsbh) { nilfs_warn(sb, "unable to move secondary superblock to block %llu", (unsigned long long)newblocknr); ret = -EIO; goto out; } nsbp = (void *)nsbh->b_data + offset; lock_buffer(nsbh); if (sb2i >= 0) { /* * The position of the second superblock only changes by 4KiB, * which is larger than the maximum superblock data size * (= 1KiB), so there is no need to use memmove() to allow * overlap between source and destination. */ memcpy(nsbp, nilfs->ns_sbp[sb2i], nilfs->ns_sbsize); /* * Zero fill after copy to avoid overwriting in case of move * within the same block. */ memset(nsbh->b_data, 0, offset); memset((void *)nsbp + nilfs->ns_sbsize, 0, nsbh->b_size - offset - nilfs->ns_sbsize); } else { memset(nsbh->b_data, 0, nsbh->b_size); } set_buffer_uptodate(nsbh); unlock_buffer(nsbh); if (sb2i >= 0) { brelse(nilfs->ns_sbh[sb2i]); nilfs->ns_sbh[sb2i] = nsbh; nilfs->ns_sbp[sb2i] = nsbp; } else if (nilfs->ns_sbh[0]->b_blocknr < nilfs->ns_first_data_block) { /* secondary super block will be restored to index 1 */ nilfs->ns_sbh[1] = nsbh; nilfs->ns_sbp[1] = nsbp; } else { brelse(nsbh); } out: return ret; } /** * nilfs_resize_fs - resize the filesystem * @sb: super block instance * @newsize: new size of the filesystem (in bytes) * * Return: 0 on success, or a negative error code on failure. */ int nilfs_resize_fs(struct super_block *sb, __u64 newsize) { struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_super_block **sbp; __u64 devsize, newnsegs; loff_t sb2off; int ret; ret = -ERANGE; devsize = bdev_nr_bytes(sb->s_bdev); if (newsize > devsize) goto out; /* * Prevent underflow in second superblock position calculation. * The exact minimum size check is done in nilfs_sufile_resize(). */ if (newsize < 4096) { ret = -ENOSPC; goto out; } /* * Write lock is required to protect some functions depending * on the number of segments, the number of reserved segments, * and so forth. */ down_write(&nilfs->ns_segctor_sem); sb2off = NILFS_SB2_OFFSET_BYTES(newsize); newnsegs = sb2off >> nilfs->ns_blocksize_bits; newnsegs = div64_ul(newnsegs, nilfs->ns_blocks_per_segment); ret = nilfs_sufile_resize(nilfs->ns_sufile, newnsegs); up_write(&nilfs->ns_segctor_sem); if (ret < 0) goto out; ret = nilfs_construct_segment(sb); if (ret < 0) goto out; down_write(&nilfs->ns_sem); nilfs_move_2nd_super(sb, sb2off); ret = -EIO; sbp = nilfs_prepare_super(sb, 0); if (likely(sbp)) { nilfs_set_log_cursor(sbp[0], nilfs); /* * Drop NILFS_RESIZE_FS flag for compatibility with * mount-time resize which may be implemented in a * future release. */ sbp[0]->s_state = cpu_to_le16(le16_to_cpu(sbp[0]->s_state) & ~NILFS_RESIZE_FS); sbp[0]->s_dev_size = cpu_to_le64(newsize); sbp[0]->s_nsegments = cpu_to_le64(nilfs->ns_nsegments); if (sbp[1]) memcpy(sbp[1], sbp[0], nilfs->ns_sbsize); ret = nilfs_commit_super(sb, NILFS_SB_COMMIT_ALL); } up_write(&nilfs->ns_sem); /* * Reset the range of allocatable segments last. 
This order * is important in the case of expansion because the secondary * superblock must be protected from log write until migration * completes. */ if (!ret) nilfs_sufile_set_alloc_range(nilfs->ns_sufile, 0, newnsegs - 1); out: return ret; } static void nilfs_put_super(struct super_block *sb) { struct the_nilfs *nilfs = sb->s_fs_info; nilfs_detach_log_writer(sb); if (!sb_rdonly(sb)) { down_write(&nilfs->ns_sem); nilfs_cleanup_super(sb); up_write(&nilfs->ns_sem); } nilfs_sysfs_delete_device_group(nilfs); iput(nilfs->ns_sufile); iput(nilfs->ns_cpfile); iput(nilfs->ns_dat); destroy_nilfs(nilfs); sb->s_fs_info = NULL; } static int nilfs_sync_fs(struct super_block *sb, int wait) { struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_super_block **sbp; int err = 0; /* This function is called when super block should be written back */ if (wait) err = nilfs_construct_segment(sb); down_write(&nilfs->ns_sem); if (nilfs_sb_dirty(nilfs)) { sbp = nilfs_prepare_super(sb, nilfs_sb_will_flip(nilfs)); if (likely(sbp)) { nilfs_set_log_cursor(sbp[0], nilfs); nilfs_commit_super(sb, NILFS_SB_COMMIT); } } up_write(&nilfs->ns_sem); if (!err) err = nilfs_flush_device(nilfs); return err; } int nilfs_attach_checkpoint(struct super_block *sb, __u64 cno, int curr_mnt, struct nilfs_root **rootp) { struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_root *root; int err = -ENOMEM; root = nilfs_find_or_create_root( nilfs, curr_mnt ? NILFS_CPTREE_CURRENT_CNO : cno); if (!root) return err; if (root->ifile) goto reuse; /* already attached checkpoint */ down_read(&nilfs->ns_segctor_sem); err = nilfs_ifile_read(sb, root, cno, nilfs->ns_inode_size); up_read(&nilfs->ns_segctor_sem); if (unlikely(err)) goto failed; reuse: *rootp = root; return 0; failed: if (err == -EINVAL) nilfs_err(sb, "Invalid checkpoint (checkpoint number=%llu)", (unsigned long long)cno); nilfs_put_root(root); return err; } static int nilfs_freeze(struct super_block *sb) { struct the_nilfs *nilfs = sb->s_fs_info; int err; if (sb_rdonly(sb)) return 0; /* Mark super block clean */ down_write(&nilfs->ns_sem); err = nilfs_cleanup_super(sb); up_write(&nilfs->ns_sem); return err; } static int nilfs_unfreeze(struct super_block *sb) { struct the_nilfs *nilfs = sb->s_fs_info; if (sb_rdonly(sb)) return 0; down_write(&nilfs->ns_sem); nilfs_setup_super(sb, false); up_write(&nilfs->ns_sem); return 0; } static int nilfs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; struct nilfs_root *root = NILFS_I(d_inode(dentry))->i_root; struct the_nilfs *nilfs = root->nilfs; u64 id = huge_encode_dev(sb->s_bdev->bd_dev); unsigned long long blocks; unsigned long overhead; unsigned long nrsvblocks; sector_t nfreeblocks; u64 nmaxinodes, nfreeinodes; int err; /* * Compute all of the segment blocks * * The blocks before first segment and after last segment * are excluded. */ blocks = nilfs->ns_blocks_per_segment * nilfs->ns_nsegments - nilfs->ns_first_data_block; nrsvblocks = nilfs->ns_nrsvsegs * nilfs->ns_blocks_per_segment; /* * Compute the overhead * * When distributing meta data blocks outside segment structure, * We must count them as the overhead. 
*/ overhead = 0; err = nilfs_count_free_blocks(nilfs, &nfreeblocks); if (unlikely(err)) return err; err = nilfs_ifile_count_free_inodes(root->ifile, &nmaxinodes, &nfreeinodes); if (unlikely(err)) { nilfs_warn(sb, "failed to count free inodes: err=%d", err); if (err == -ERANGE) { /* * If nilfs_palloc_count_max_entries() returns * -ERANGE error code then we simply treat * curent inodes count as maximum possible and * zero as free inodes value. */ nmaxinodes = atomic64_read(&root->inodes_count); nfreeinodes = 0; err = 0; } else return err; } buf->f_type = NILFS_SUPER_MAGIC; buf->f_bsize = sb->s_blocksize; buf->f_blocks = blocks - overhead; buf->f_bfree = nfreeblocks; buf->f_bavail = (buf->f_bfree >= nrsvblocks) ? (buf->f_bfree - nrsvblocks) : 0; buf->f_files = nmaxinodes; buf->f_ffree = nfreeinodes; buf->f_namelen = NILFS_NAME_LEN; buf->f_fsid = u64_to_fsid(id); return 0; } static int nilfs_show_options(struct seq_file *seq, struct dentry *dentry) { struct super_block *sb = dentry->d_sb; struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_root *root = NILFS_I(d_inode(dentry))->i_root; if (!nilfs_test_opt(nilfs, BARRIER)) seq_puts(seq, ",nobarrier"); if (root->cno != NILFS_CPTREE_CURRENT_CNO) seq_printf(seq, ",cp=%llu", (unsigned long long)root->cno); if (nilfs_test_opt(nilfs, ERRORS_PANIC)) seq_puts(seq, ",errors=panic"); if (nilfs_test_opt(nilfs, ERRORS_CONT)) seq_puts(seq, ",errors=continue"); if (nilfs_test_opt(nilfs, STRICT_ORDER)) seq_puts(seq, ",order=strict"); if (nilfs_test_opt(nilfs, NORECOVERY)) seq_puts(seq, ",norecovery"); if (nilfs_test_opt(nilfs, DISCARD)) seq_puts(seq, ",discard"); return 0; } static const struct super_operations nilfs_sops = { .alloc_inode = nilfs_alloc_inode, .free_inode = nilfs_free_inode, .dirty_inode = nilfs_dirty_inode, .evict_inode = nilfs_evict_inode, .put_super = nilfs_put_super, .sync_fs = nilfs_sync_fs, .freeze_fs = nilfs_freeze, .unfreeze_fs = nilfs_unfreeze, .statfs = nilfs_statfs, .show_options = nilfs_show_options }; enum { Opt_err, Opt_barrier, Opt_snapshot, Opt_order, Opt_norecovery, Opt_discard, }; static const struct constant_table nilfs_param_err[] = { {"continue", NILFS_MOUNT_ERRORS_CONT}, {"panic", NILFS_MOUNT_ERRORS_PANIC}, {"remount-ro", NILFS_MOUNT_ERRORS_RO}, {} }; static const struct fs_parameter_spec nilfs_param_spec[] = { fsparam_enum ("errors", Opt_err, nilfs_param_err), fsparam_flag_no ("barrier", Opt_barrier), fsparam_u64 ("cp", Opt_snapshot), fsparam_string ("order", Opt_order), fsparam_flag ("norecovery", Opt_norecovery), fsparam_flag_no ("discard", Opt_discard), {} }; struct nilfs_fs_context { unsigned long ns_mount_opt; __u64 cno; }; static int nilfs_parse_param(struct fs_context *fc, struct fs_parameter *param) { struct nilfs_fs_context *nilfs = fc->fs_private; int is_remount = fc->purpose == FS_CONTEXT_FOR_RECONFIGURE; struct fs_parse_result result; int opt; opt = fs_parse(fc, nilfs_param_spec, param, &result); if (opt < 0) return opt; switch (opt) { case Opt_barrier: if (result.negated) nilfs_clear_opt(nilfs, BARRIER); else nilfs_set_opt(nilfs, BARRIER); break; case Opt_order: if (strcmp(param->string, "relaxed") == 0) /* Ordered data semantics */ nilfs_clear_opt(nilfs, STRICT_ORDER); else if (strcmp(param->string, "strict") == 0) /* Strict in-order semantics */ nilfs_set_opt(nilfs, STRICT_ORDER); else return -EINVAL; break; case Opt_err: nilfs->ns_mount_opt &= ~NILFS_MOUNT_ERROR_MODE; nilfs->ns_mount_opt |= result.uint_32; break; case Opt_snapshot: if (is_remount) { struct super_block *sb = fc->root->d_sb; nilfs_err(sb, 
"\"%s\" option is invalid for remount", param->key); return -EINVAL; } if (result.uint_64 == 0) { nilfs_err(NULL, "invalid option \"cp=0\": invalid checkpoint number 0"); return -EINVAL; } nilfs->cno = result.uint_64; break; case Opt_norecovery: nilfs_set_opt(nilfs, NORECOVERY); break; case Opt_discard: if (result.negated) nilfs_clear_opt(nilfs, DISCARD); else nilfs_set_opt(nilfs, DISCARD); break; default: return -EINVAL; } return 0; } static int nilfs_setup_super(struct super_block *sb, int is_mount) { struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_super_block **sbp; int max_mnt_count; int mnt_count; /* nilfs->ns_sem must be locked by the caller. */ sbp = nilfs_prepare_super(sb, 0); if (!sbp) return -EIO; if (!is_mount) goto skip_mount_setup; max_mnt_count = le16_to_cpu(sbp[0]->s_max_mnt_count); mnt_count = le16_to_cpu(sbp[0]->s_mnt_count); if (nilfs->ns_mount_state & NILFS_ERROR_FS) { nilfs_warn(sb, "mounting fs with errors"); #if 0 } else if (max_mnt_count >= 0 && mnt_count >= max_mnt_count) { nilfs_warn(sb, "maximal mount count reached"); #endif } if (!max_mnt_count) sbp[0]->s_max_mnt_count = cpu_to_le16(NILFS_DFL_MAX_MNT_COUNT); sbp[0]->s_mnt_count = cpu_to_le16(mnt_count + 1); sbp[0]->s_mtime = cpu_to_le64(ktime_get_real_seconds()); skip_mount_setup: sbp[0]->s_state = cpu_to_le16(le16_to_cpu(sbp[0]->s_state) & ~NILFS_VALID_FS); /* synchronize sbp[1] with sbp[0] */ if (sbp[1]) memcpy(sbp[1], sbp[0], nilfs->ns_sbsize); return nilfs_commit_super(sb, NILFS_SB_COMMIT_ALL); } struct nilfs_super_block *nilfs_read_super_block(struct super_block *sb, u64 pos, int blocksize, struct buffer_head **pbh) { unsigned long long sb_index = pos; unsigned long offset; offset = do_div(sb_index, blocksize); *pbh = sb_bread(sb, sb_index); if (!*pbh) return NULL; return (struct nilfs_super_block *)((char *)(*pbh)->b_data + offset); } int nilfs_store_magic(struct super_block *sb, struct nilfs_super_block *sbp) { struct the_nilfs *nilfs = sb->s_fs_info; sb->s_magic = le16_to_cpu(sbp->s_magic); /* FS independent flags */ #ifdef NILFS_ATIME_DISABLE sb->s_flags |= SB_NOATIME; #endif nilfs->ns_resuid = le16_to_cpu(sbp->s_def_resuid); nilfs->ns_resgid = le16_to_cpu(sbp->s_def_resgid); nilfs->ns_interval = le32_to_cpu(sbp->s_c_interval); nilfs->ns_watermark = le32_to_cpu(sbp->s_c_block_max); return 0; } int nilfs_check_feature_compatibility(struct super_block *sb, struct nilfs_super_block *sbp) { __u64 features; features = le64_to_cpu(sbp->s_feature_incompat) & ~NILFS_FEATURE_INCOMPAT_SUPP; if (features) { nilfs_err(sb, "couldn't mount because of unsupported optional features (%llx)", (unsigned long long)features); return -EINVAL; } features = le64_to_cpu(sbp->s_feature_compat_ro) & ~NILFS_FEATURE_COMPAT_RO_SUPP; if (!sb_rdonly(sb) && features) { nilfs_err(sb, "couldn't mount RDWR because of unsupported optional features (%llx)", (unsigned long long)features); return -EINVAL; } return 0; } static int nilfs_get_root_dentry(struct super_block *sb, struct nilfs_root *root, struct dentry **root_dentry) { struct inode *inode; struct dentry *dentry; int ret = 0; inode = nilfs_iget(sb, root, NILFS_ROOT_INO); if (IS_ERR(inode)) { ret = PTR_ERR(inode); nilfs_err(sb, "error %d getting root inode", ret); goto out; } if (!S_ISDIR(inode->i_mode) || !inode->i_blocks || !inode->i_size) { iput(inode); nilfs_err(sb, "corrupt root inode"); ret = -EINVAL; goto out; } if (root->cno == NILFS_CPTREE_CURRENT_CNO) { dentry = d_find_alias(inode); if (!dentry) { dentry = d_make_root(inode); if (!dentry) { ret = -ENOMEM; goto 
failed_dentry; } } else { iput(inode); } } else { dentry = d_obtain_root(inode); if (IS_ERR(dentry)) { ret = PTR_ERR(dentry); goto failed_dentry; } } *root_dentry = dentry; out: return ret; failed_dentry: nilfs_err(sb, "error %d getting root dentry", ret); goto out; } static int nilfs_attach_snapshot(struct super_block *s, __u64 cno, struct dentry **root_dentry) { struct the_nilfs *nilfs = s->s_fs_info; struct nilfs_root *root; int ret; mutex_lock(&nilfs->ns_snapshot_mount_mutex); down_read(&nilfs->ns_segctor_sem); ret = nilfs_cpfile_is_snapshot(nilfs->ns_cpfile, cno); up_read(&nilfs->ns_segctor_sem); if (ret < 0) { ret = (ret == -ENOENT) ? -EINVAL : ret; goto out; } else if (!ret) { nilfs_err(s, "The specified checkpoint is not a snapshot (checkpoint number=%llu)", (unsigned long long)cno); ret = -EINVAL; goto out; } ret = nilfs_attach_checkpoint(s, cno, false, &root); if (ret) { nilfs_err(s, "error %d while loading snapshot (checkpoint number=%llu)", ret, (unsigned long long)cno); goto out; } ret = nilfs_get_root_dentry(s, root, root_dentry); nilfs_put_root(root); out: mutex_unlock(&nilfs->ns_snapshot_mount_mutex); return ret; } /** * nilfs_tree_is_busy() - try to shrink dentries of a checkpoint * @root_dentry: root dentry of the tree to be shrunk * * Return: true if the tree was in-use, false otherwise. */ static bool nilfs_tree_is_busy(struct dentry *root_dentry) { shrink_dcache_parent(root_dentry); return d_count(root_dentry) > 1; } int nilfs_checkpoint_is_mounted(struct super_block *sb, __u64 cno) { struct the_nilfs *nilfs = sb->s_fs_info; struct nilfs_root *root; struct inode *inode; struct dentry *dentry; int ret; if (cno > nilfs->ns_cno) return false; if (cno >= nilfs_last_cno(nilfs)) return true; /* protect recent checkpoints */ ret = false; root = nilfs_lookup_root(nilfs, cno); if (root) { inode = nilfs_ilookup(sb, root, NILFS_ROOT_INO); if (inode) { dentry = d_find_alias(inode); if (dentry) { ret = nilfs_tree_is_busy(dentry); dput(dentry); } iput(inode); } nilfs_put_root(root); } return ret; } /** * nilfs_fill_super() - initialize a super block instance * @sb: super_block * @fc: filesystem context * * This function is called exclusively by nilfs->ns_mount_mutex. * So, the recovery process is protected from other simultaneous mounts. * * Return: 0 on success, or a negative error code on failure. 
*/ static int nilfs_fill_super(struct super_block *sb, struct fs_context *fc) { struct the_nilfs *nilfs; struct nilfs_root *fsroot; struct nilfs_fs_context *ctx = fc->fs_private; __u64 cno; int err; nilfs = alloc_nilfs(sb); if (!nilfs) return -ENOMEM; sb->s_fs_info = nilfs; err = init_nilfs(nilfs, sb); if (err) goto failed_nilfs; /* Copy in parsed mount options */ nilfs->ns_mount_opt = ctx->ns_mount_opt; sb->s_op = &nilfs_sops; sb->s_export_op = &nilfs_export_ops; sb->s_root = NULL; sb->s_time_gran = 1; sb->s_max_links = NILFS_LINK_MAX; sb->s_bdi = bdi_get(sb->s_bdev->bd_disk->bdi); err = load_nilfs(nilfs, sb); if (err) goto failed_nilfs; super_set_uuid(sb, nilfs->ns_sbp[0]->s_uuid, sizeof(nilfs->ns_sbp[0]->s_uuid)); super_set_sysfs_name_bdev(sb); cno = nilfs_last_cno(nilfs); err = nilfs_attach_checkpoint(sb, cno, true, &fsroot); if (err) { nilfs_err(sb, "error %d while loading last checkpoint (checkpoint number=%llu)", err, (unsigned long long)cno); goto failed_unload; } if (!sb_rdonly(sb)) { err = nilfs_attach_log_writer(sb, fsroot); if (err) goto failed_checkpoint; } err = nilfs_get_root_dentry(sb, fsroot, &sb->s_root); if (err) goto failed_segctor; nilfs_put_root(fsroot); if (!sb_rdonly(sb)) { down_write(&nilfs->ns_sem); nilfs_setup_super(sb, true); up_write(&nilfs->ns_sem); } return 0; failed_segctor: nilfs_detach_log_writer(sb); failed_checkpoint: nilfs_put_root(fsroot); failed_unload: nilfs_sysfs_delete_device_group(nilfs); iput(nilfs->ns_sufile); iput(nilfs->ns_cpfile); iput(nilfs->ns_dat); failed_nilfs: destroy_nilfs(nilfs); return err; } static int nilfs_reconfigure(struct fs_context *fc) { struct nilfs_fs_context *ctx = fc->fs_private; struct super_block *sb = fc->root->d_sb; struct the_nilfs *nilfs = sb->s_fs_info; int err; sync_filesystem(sb); err = -EINVAL; if (!nilfs_valid_fs(nilfs)) { nilfs_warn(sb, "couldn't remount because the filesystem is in an incomplete recovery state"); goto ignore_opts; } if ((bool)(fc->sb_flags & SB_RDONLY) == sb_rdonly(sb)) goto out; if (fc->sb_flags & SB_RDONLY) { sb->s_flags |= SB_RDONLY; /* * Remounting a valid RW partition RDONLY, so set * the RDONLY flag and then mark the partition as valid again. */ down_write(&nilfs->ns_sem); nilfs_cleanup_super(sb); up_write(&nilfs->ns_sem); } else { __u64 features; struct nilfs_root *root; /* * Mounting a RDONLY partition read-write, so reread and * store the current valid flag. (It may have been changed * by fsck since we originally mounted the partition.) 
*/ down_read(&nilfs->ns_sem); features = le64_to_cpu(nilfs->ns_sbp[0]->s_feature_compat_ro) & ~NILFS_FEATURE_COMPAT_RO_SUPP; up_read(&nilfs->ns_sem); if (features) { nilfs_warn(sb, "couldn't remount RDWR because of unsupported optional features (%llx)", (unsigned long long)features); err = -EROFS; goto ignore_opts; } sb->s_flags &= ~SB_RDONLY; root = NILFS_I(d_inode(sb->s_root))->i_root; err = nilfs_attach_log_writer(sb, root); if (err) { sb->s_flags |= SB_RDONLY; goto ignore_opts; } down_write(&nilfs->ns_sem); nilfs_setup_super(sb, true); up_write(&nilfs->ns_sem); } out: sb->s_flags = (sb->s_flags & ~SB_POSIXACL); /* Copy over parsed remount options */ nilfs->ns_mount_opt = ctx->ns_mount_opt; return 0; ignore_opts: return err; } static int nilfs_get_tree(struct fs_context *fc) { struct nilfs_fs_context *ctx = fc->fs_private; struct super_block *s; dev_t dev; int err; if (ctx->cno && !(fc->sb_flags & SB_RDONLY)) { nilfs_err(NULL, "invalid option \"cp=%llu\": read-only option is not specified", ctx->cno); return -EINVAL; } err = lookup_bdev(fc->source, &dev); if (err) return err; s = sget_dev(fc, dev); if (IS_ERR(s)) return PTR_ERR(s); if (!s->s_root) { err = setup_bdev_super(s, fc->sb_flags, fc); if (!err) err = nilfs_fill_super(s, fc); if (err) goto failed_super; s->s_flags |= SB_ACTIVE; } else if (!ctx->cno) { if (nilfs_tree_is_busy(s->s_root)) { if ((fc->sb_flags ^ s->s_flags) & SB_RDONLY) { nilfs_err(s, "the device already has a %s mount.", sb_rdonly(s) ? "read-only" : "read/write"); err = -EBUSY; goto failed_super; } } else { /* * Try reconfigure to setup mount states if the current * tree is not mounted and only snapshots use this sb. * * Since nilfs_reconfigure() requires fc->root to be * set, set it first and release it on failure. */ fc->root = dget(s->s_root); err = nilfs_reconfigure(fc); if (err) { dput(fc->root); fc->root = NULL; /* prevent double release */ goto failed_super; } return 0; } } if (ctx->cno) { struct dentry *root_dentry; err = nilfs_attach_snapshot(s, ctx->cno, &root_dentry); if (err) goto failed_super; fc->root = root_dentry; return 0; } fc->root = dget(s->s_root); return 0; failed_super: deactivate_locked_super(s); return err; } static void nilfs_free_fc(struct fs_context *fc) { kfree(fc->fs_private); } static const struct fs_context_operations nilfs_context_ops = { .parse_param = nilfs_parse_param, .get_tree = nilfs_get_tree, .reconfigure = nilfs_reconfigure, .free = nilfs_free_fc, }; static int nilfs_init_fs_context(struct fs_context *fc) { struct nilfs_fs_context *ctx; ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->ns_mount_opt = NILFS_MOUNT_ERRORS_RO | NILFS_MOUNT_BARRIER; fc->fs_private = ctx; fc->ops = &nilfs_context_ops; return 0; } struct file_system_type nilfs_fs_type = { .owner = THIS_MODULE, .name = "nilfs2", .kill_sb = kill_block_super, .fs_flags = FS_REQUIRES_DEV, .init_fs_context = nilfs_init_fs_context, .parameters = nilfs_param_spec, }; MODULE_ALIAS_FS("nilfs2"); static void nilfs_inode_init_once(void *obj) { struct nilfs_inode_info *ii = obj; INIT_LIST_HEAD(&ii->i_dirty); #ifdef CONFIG_NILFS_XATTR init_rwsem(&ii->xattr_sem); #endif inode_init_once(&ii->vfs_inode); } static void nilfs_segbuf_init_once(void *obj) { memset(obj, 0, sizeof(struct nilfs_segment_buffer)); } static void nilfs_destroy_cachep(void) { /* * Make sure all delayed rcu free inodes are flushed before we * destroy cache. 
	 */
	rcu_barrier();

	kmem_cache_destroy(nilfs_inode_cachep);
	kmem_cache_destroy(nilfs_transaction_cachep);
	kmem_cache_destroy(nilfs_segbuf_cachep);
	kmem_cache_destroy(nilfs_btree_path_cache);
}

static int __init nilfs_init_cachep(void)
{
	nilfs_inode_cachep = kmem_cache_create("nilfs2_inode_cache",
			sizeof(struct nilfs_inode_info), 0,
			SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT,
			nilfs_inode_init_once);
	if (!nilfs_inode_cachep)
		goto fail;

	nilfs_transaction_cachep = kmem_cache_create("nilfs2_transaction_cache",
			sizeof(struct nilfs_transaction_info), 0,
			SLAB_RECLAIM_ACCOUNT, NULL);
	if (!nilfs_transaction_cachep)
		goto fail;

	nilfs_segbuf_cachep = kmem_cache_create("nilfs2_segbuf_cache",
			sizeof(struct nilfs_segment_buffer), 0,
			SLAB_RECLAIM_ACCOUNT, nilfs_segbuf_init_once);
	if (!nilfs_segbuf_cachep)
		goto fail;

	nilfs_btree_path_cache = kmem_cache_create("nilfs2_btree_path_cache",
			sizeof(struct nilfs_btree_path) * NILFS_BTREE_LEVEL_MAX,
			0, 0, NULL);
	if (!nilfs_btree_path_cache)
		goto fail;

	return 0;

fail:
	nilfs_destroy_cachep();
	return -ENOMEM;
}

static int __init init_nilfs_fs(void)
{
	int err;

	err = nilfs_init_cachep();
	if (err)
		goto fail;

	err = nilfs_sysfs_init();
	if (err)
		goto free_cachep;

	err = register_filesystem(&nilfs_fs_type);
	if (err)
		goto deinit_sysfs_entry;

	printk(KERN_INFO "NILFS version 2 loaded\n");
	return 0;

deinit_sysfs_entry:
	nilfs_sysfs_exit();
free_cachep:
	nilfs_destroy_cachep();
fail:
	return err;
}

static void __exit exit_nilfs_fs(void)
{
	nilfs_destroy_cachep();
	nilfs_sysfs_exit();
	unregister_filesystem(&nilfs_fs_type);
}

module_init(init_nilfs_fs)
module_exit(exit_nilfs_fs)
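/*
 * Illustrative sketch (not part of the original source): the
 * checksum-over-self pattern used by nilfs_commit_super() above, shown on a
 * toy structure as a standalone userspace program.  The struct layout, the
 * toy_crc32() routine and the sample values are assumptions for the example;
 * toy_crc32() is a plain bitwise CRC-32, not the kernel's crc32_le().  The
 * point is only that the checksum field is zeroed before the CRC is taken
 * over the whole block, exactly as s_sum is handled for sbp[0] and sbp[1].
 */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

struct toy_super {
	uint64_t wtime;
	uint32_t state;
	uint32_t sum;	/* covers the whole struct, computed with sum == 0 */
};

/* toy bitwise CRC-32 (reflected, polynomial 0xEDB88320) */
static uint32_t toy_crc32(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}

int main(void)
{
	struct toy_super sb = { .wtime = 1700000000, .state = 1, .sum = 0 };

	sb.sum = 0;	/* zero the field first, as done for s_sum ... */
	sb.sum = toy_crc32((const unsigned char *)&sb, sizeof(sb));	/* ... then checksum everything */
	printf("sum = 0x%08x\n", sb.sum);
	return 0;
}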
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 *	PF_INET6 socket protocol family
 *	Linux INET6 implementation
 *
 *	Authors:
 *	Pedro Roque		<roque@di.fc.ul.pt>
 *
 *	Adapted from linux/net/ipv4/af_inet.c
 *
 *	Fixes:
 *	piggy, Karl Knutson	:	Socket protocol table
 *	Hideaki YOSHIFUJI	:	sin6_scope_id support
 *	Arnaldo Melo		:	check proc_net_create return, cleanups
 */

#define pr_fmt(fmt) "IPv6: " fmt

#include <linux/module.h>
#include <linux/capability.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/socket.h>
#include <linux/in.h>
#include <linux/kernel.h>
#include <linux/timer.h>
#include <linux/string.h>
#include <linux/sockios.h>
#include <linux/net.h>
#include <linux/fcntl.h>
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/proc_fs.h>
#include <linux/stat.h>
#include <linux/init.h>
#include <linux/slab.h>

#include <linux/inet.h>
#include <linux/netdevice.h>
#include <linux/icmpv6.h>
#include <linux/netfilter_ipv6.h>

#include <net/ip.h>
#include <net/ipv6.h>
#include <net/udp.h>
#include <net/udplite.h>
#include <net/tcp.h>
#include <net/ping.h>
#include <net/protocol.h>
#include <net/inet_common.h>
#include <net/route.h>
#include <net/transp_v6.h>
#include <net/ip6_route.h>
#include <net/addrconf.h>
#include <net/ipv6_stubs.h> #include <net/ndisc.h> #ifdef CONFIG_IPV6_TUNNEL #include <net/ip6_tunnel.h> #endif #include <net/calipso.h> #include <net/seg6.h> #include <net/rpl.h> #include <net/compat.h> #include <net/xfrm.h> #include <net/ioam6.h> #include <net/rawv6.h> #include <net/rps.h> #include <linux/uaccess.h> #include <linux/mroute6.h> #include "ip6_offload.h" MODULE_AUTHOR("Cast of dozens"); MODULE_DESCRIPTION("IPv6 protocol stack for Linux"); MODULE_LICENSE("GPL"); /* The inetsw6 table contains everything that inet6_create needs to * build a new socket. */ static struct list_head inetsw6[SOCK_MAX]; static DEFINE_SPINLOCK(inetsw6_lock); struct ipv6_params ipv6_defaults = { .disable_ipv6 = 0, .autoconf = 1, }; static int disable_ipv6_mod; module_param_named(disable, disable_ipv6_mod, int, 0444); MODULE_PARM_DESC(disable, "Disable IPv6 module such that it is non-functional"); module_param_named(disable_ipv6, ipv6_defaults.disable_ipv6, int, 0444); MODULE_PARM_DESC(disable_ipv6, "Disable IPv6 on all interfaces"); module_param_named(autoconf, ipv6_defaults.autoconf, int, 0444); MODULE_PARM_DESC(autoconf, "Enable IPv6 address autoconfiguration on all interfaces"); bool ipv6_mod_enabled(void) { return disable_ipv6_mod == 0; } EXPORT_SYMBOL_GPL(ipv6_mod_enabled); static struct ipv6_pinfo *inet6_sk_generic(struct sock *sk) { const int offset = sk->sk_prot->ipv6_pinfo_offset; return (struct ipv6_pinfo *)(((u8 *)sk) + offset); } void inet6_sock_destruct(struct sock *sk) { inet6_cleanup_sock(sk); inet_sock_destruct(sk); } EXPORT_SYMBOL_GPL(inet6_sock_destruct); static int inet6_create(struct net *net, struct socket *sock, int protocol, int kern) { struct inet_sock *inet; struct ipv6_pinfo *np; struct sock *sk; struct inet_protosw *answer; struct proto *answer_prot; unsigned char answer_flags; int try_loading_module = 0; int err; if (protocol < 0 || protocol >= IPPROTO_MAX) return -EINVAL; /* Look for the requested type/protocol pair. */ lookup_protocol: err = -ESOCKTNOSUPPORT; rcu_read_lock(); list_for_each_entry_rcu(answer, &inetsw6[sock->type], list) { err = 0; /* Check the non-wild match. */ if (protocol == answer->protocol) { if (protocol != IPPROTO_IP) break; } else { /* Check for the two wild cases. */ if (IPPROTO_IP == protocol) { protocol = answer->protocol; break; } if (IPPROTO_IP == answer->protocol) break; } err = -EPROTONOSUPPORT; } if (err) { if (try_loading_module < 2) { rcu_read_unlock(); /* * Be more specific, e.g. net-pf-10-proto-132-type-1 * (net-pf-PF_INET6-proto-IPPROTO_SCTP-type-SOCK_STREAM) */ if (++try_loading_module == 1) request_module("net-pf-%d-proto-%d-type-%d", PF_INET6, protocol, sock->type); /* * Fall back to generic, e.g. 
net-pf-10-proto-132 * (net-pf-PF_INET6-proto-IPPROTO_SCTP) */ else request_module("net-pf-%d-proto-%d", PF_INET6, protocol); goto lookup_protocol; } else goto out_rcu_unlock; } err = -EPERM; if (sock->type == SOCK_RAW && !kern && !ns_capable(net->user_ns, CAP_NET_RAW)) goto out_rcu_unlock; sock->ops = answer->ops; answer_prot = answer->prot; answer_flags = answer->flags; rcu_read_unlock(); WARN_ON(!answer_prot->slab); err = -ENOBUFS; sk = sk_alloc(net, PF_INET6, GFP_KERNEL, answer_prot, kern); if (!sk) goto out; sock_init_data(sock, sk); err = 0; if (INET_PROTOSW_REUSE & answer_flags) sk->sk_reuse = SK_CAN_REUSE; if (INET_PROTOSW_ICSK & answer_flags) inet_init_csk_locks(sk); inet = inet_sk(sk); inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags); if (SOCK_RAW == sock->type) { inet->inet_num = protocol; if (IPPROTO_RAW == protocol) inet_set_bit(HDRINCL, sk); } sk->sk_destruct = inet6_sock_destruct; sk->sk_family = PF_INET6; sk->sk_protocol = protocol; sk->sk_backlog_rcv = answer->prot->backlog_rcv; inet_sk(sk)->pinet6 = np = inet6_sk_generic(sk); np->hop_limit = -1; np->mcast_hops = IPV6_DEFAULT_MCASTHOPS; inet6_set_bit(MC6_LOOP, sk); inet6_set_bit(MC6_ALL, sk); np->pmtudisc = IPV6_PMTUDISC_WANT; inet6_assign_bit(REPFLOW, sk, net->ipv6.sysctl.flowlabel_reflect & FLOWLABEL_REFLECT_ESTABLISHED); sk->sk_ipv6only = net->ipv6.sysctl.bindv6only; sk->sk_txrehash = READ_ONCE(net->core.sysctl_txrehash); /* Init the ipv4 part of the socket since we can have sockets * using v6 API for ipv4. */ inet->uc_ttl = -1; inet_set_bit(MC_LOOP, sk); inet->mc_ttl = 1; inet->mc_index = 0; RCU_INIT_POINTER(inet->mc_list, NULL); inet->rcv_tos = 0; if (READ_ONCE(net->ipv4.sysctl_ip_no_pmtu_disc)) inet->pmtudisc = IP_PMTUDISC_DONT; else inet->pmtudisc = IP_PMTUDISC_WANT; if (inet->inet_num) { /* It assumes that any protocol which allows * the user to assign a number at socket * creation time automatically shares. */ inet->inet_sport = htons(inet->inet_num); err = sk->sk_prot->hash(sk); if (err) goto out_sk_release; } if (sk->sk_prot->init) { err = sk->sk_prot->init(sk); if (err) goto out_sk_release; } if (!kern) { err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk); if (err) goto out_sk_release; } out: return err; out_rcu_unlock: rcu_read_unlock(); goto out; out_sk_release: sk_common_release(sk); sock->sk = NULL; goto out; } static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, u32 flags) { struct sockaddr_in6 *addr = (struct sockaddr_in6 *)uaddr; struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); struct net *net = sock_net(sk); __be32 v4addr = 0; unsigned short snum; bool saved_ipv6only; int addr_type = 0; int err = 0; if (addr->sin6_family != AF_INET6) return -EAFNOSUPPORT; addr_type = ipv6_addr_type(&addr->sin6_addr); if ((addr_type & IPV6_ADDR_MULTICAST) && sk->sk_type == SOCK_STREAM) return -EINVAL; snum = ntohs(addr->sin6_port); if (!(flags & BIND_NO_CAP_NET_BIND_SERVICE) && snum && inet_port_requires_bind_service(net, snum) && !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) return -EACCES; if (flags & BIND_WITH_LOCK) lock_sock(sk); /* Check these errors (active socket, double bind). */ if (sk->sk_state != TCP_CLOSE || inet->inet_num) { err = -EINVAL; goto out; } /* Check if the address belongs to the host. 
*/ if (addr_type == IPV6_ADDR_MAPPED) { struct net_device *dev = NULL; int chk_addr_ret; /* Binding to v4-mapped address on a v6-only socket * makes no sense */ if (ipv6_only_sock(sk)) { err = -EINVAL; goto out; } rcu_read_lock(); if (sk->sk_bound_dev_if) { dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); if (!dev) { err = -ENODEV; goto out_unlock; } } /* Reproduce AF_INET checks to make the bindings consistent */ v4addr = addr->sin6_addr.s6_addr32[3]; chk_addr_ret = inet_addr_type_dev_table(net, dev, v4addr); rcu_read_unlock(); if (!inet_addr_valid_or_nonlocal(net, inet, v4addr, chk_addr_ret)) { err = -EADDRNOTAVAIL; goto out; } } else { if (addr_type != IPV6_ADDR_ANY) { struct net_device *dev = NULL; rcu_read_lock(); if (__ipv6_addr_needs_scope_id(addr_type)) { if (addr_len >= sizeof(struct sockaddr_in6) && addr->sin6_scope_id) { /* Override any existing binding, if another one * is supplied by user. */ sk->sk_bound_dev_if = addr->sin6_scope_id; } /* Binding to link-local address requires an interface */ if (!sk->sk_bound_dev_if) { err = -EINVAL; goto out_unlock; } } if (sk->sk_bound_dev_if) { dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); if (!dev) { err = -ENODEV; goto out_unlock; } } /* ipv4 addr of the socket is invalid. Only the * unspecified and mapped address have a v4 equivalent. */ v4addr = LOOPBACK4_IPV6; if (!(addr_type & IPV6_ADDR_MULTICAST)) { if (!ipv6_can_nonlocal_bind(net, inet) && !ipv6_chk_addr(net, &addr->sin6_addr, dev, 0)) { err = -EADDRNOTAVAIL; goto out_unlock; } } rcu_read_unlock(); } } inet->inet_rcv_saddr = v4addr; inet->inet_saddr = v4addr; sk->sk_v6_rcv_saddr = addr->sin6_addr; if (!(addr_type & IPV6_ADDR_MULTICAST)) np->saddr = addr->sin6_addr; saved_ipv6only = sk->sk_ipv6only; if (addr_type != IPV6_ADDR_ANY && addr_type != IPV6_ADDR_MAPPED) sk->sk_ipv6only = 1; /* Make sure we are allowed to bind here. */ if (snum || !(inet_test_bit(BIND_ADDRESS_NO_PORT, sk) || (flags & BIND_FORCE_ADDRESS_NO_PORT))) { err = sk->sk_prot->get_port(sk, snum); if (err) { sk->sk_ipv6only = saved_ipv6only; inet_reset_saddr(sk); goto out; } if (!(flags & BIND_FROM_BPF)) { err = BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk); if (err) { sk->sk_ipv6only = saved_ipv6only; inet_reset_saddr(sk); if (sk->sk_prot->put_port) sk->sk_prot->put_port(sk); goto out; } } } if (addr_type != IPV6_ADDR_ANY) sk->sk_userlocks |= SOCK_BINDADDR_LOCK; if (snum) sk->sk_userlocks |= SOCK_BINDPORT_LOCK; inet->inet_sport = htons(inet->inet_num); inet->inet_dport = 0; inet->inet_daddr = 0; out: if (flags & BIND_WITH_LOCK) release_sock(sk); return err; out_unlock: rcu_read_unlock(); goto out; } int inet6_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len) { u32 flags = BIND_WITH_LOCK; const struct proto *prot; int err = 0; /* IPV6_ADDRFORM can change sk->sk_prot under us. */ prot = READ_ONCE(sk->sk_prot); /* If the socket has its own bind function then use it. */ if (prot->bind) return prot->bind(sk, uaddr, addr_len); if (addr_len < SIN6_LEN_RFC2133) return -EINVAL; /* BPF prog is run before any checks are done so that if the prog * changes context in a wrong way it will be caught. 
*/ err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, &addr_len, CGROUP_INET6_BIND, &flags); if (err) return err; return __inet6_bind(sk, uaddr, addr_len, flags); } /* bind for INET6 API */ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { return inet6_bind_sk(sock->sk, uaddr, addr_len); } EXPORT_SYMBOL(inet6_bind); int inet6_release(struct socket *sock) { struct sock *sk = sock->sk; if (!sk) return -EINVAL; /* Free mc lists */ ipv6_sock_mc_close(sk); /* Free ac lists */ ipv6_sock_ac_close(sk); return inet_release(sock); } EXPORT_SYMBOL(inet6_release); void inet6_cleanup_sock(struct sock *sk) { struct ipv6_pinfo *np = inet6_sk(sk); struct sk_buff *skb; struct ipv6_txoptions *opt; /* Release rx options */ skb = xchg(&np->pktoptions, NULL); kfree_skb(skb); skb = xchg(&np->rxpmtu, NULL); kfree_skb(skb); /* Free flowlabels */ fl6_free_socklist(sk); /* Free tx options */ opt = unrcu_pointer(xchg(&np->opt, NULL)); if (opt) { atomic_sub(opt->tot_len, &sk->sk_omem_alloc); txopt_put(opt); } } EXPORT_SYMBOL_GPL(inet6_cleanup_sock); /* * This does both peername and sockname. */ int inet6_getname(struct socket *sock, struct sockaddr *uaddr, int peer) { struct sockaddr_in6 *sin = (struct sockaddr_in6 *)uaddr; int sin_addr_len = sizeof(*sin); struct sock *sk = sock->sk; struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); sin->sin6_family = AF_INET6; sin->sin6_flowinfo = 0; sin->sin6_scope_id = 0; lock_sock(sk); if (peer) { if (!inet->inet_dport || (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)) && peer == 1)) { release_sock(sk); return -ENOTCONN; } sin->sin6_port = inet->inet_dport; sin->sin6_addr = sk->sk_v6_daddr; if (inet6_test_bit(SNDFLOW, sk)) sin->sin6_flowinfo = np->flow_label; BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &sin_addr_len, CGROUP_INET6_GETPEERNAME); } else { if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) sin->sin6_addr = np->saddr; else sin->sin6_addr = sk->sk_v6_rcv_saddr; sin->sin6_port = inet->inet_sport; BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &sin_addr_len, CGROUP_INET6_GETSOCKNAME); } sin->sin6_scope_id = ipv6_iface_scope_id(&sin->sin6_addr, sk->sk_bound_dev_if); release_sock(sk); return sin_addr_len; } EXPORT_SYMBOL(inet6_getname); int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { void __user *argp = (void __user *)arg; struct sock *sk = sock->sk; struct net *net = sock_net(sk); const struct proto *prot; switch (cmd) { case SIOCADDRT: case SIOCDELRT: { struct in6_rtmsg rtmsg; if (copy_from_user(&rtmsg, argp, sizeof(rtmsg))) return -EFAULT; return ipv6_route_ioctl(net, cmd, &rtmsg); } case SIOCSIFADDR: return addrconf_add_ifaddr(net, argp); case SIOCDIFADDR: return addrconf_del_ifaddr(net, argp); case SIOCSIFDSTADDR: return addrconf_set_dstaddr(net, argp); default: /* IPV6_ADDRFORM can change sk->sk_prot under us. 
*/ prot = READ_ONCE(sk->sk_prot); if (!prot->ioctl) return -ENOIOCTLCMD; return sk_ioctl(sk, cmd, (void __user *)arg); } /*NOTREACHED*/ return 0; } EXPORT_SYMBOL(inet6_ioctl); #ifdef CONFIG_COMPAT struct compat_in6_rtmsg { struct in6_addr rtmsg_dst; struct in6_addr rtmsg_src; struct in6_addr rtmsg_gateway; u32 rtmsg_type; u16 rtmsg_dst_len; u16 rtmsg_src_len; u32 rtmsg_metric; u32 rtmsg_info; u32 rtmsg_flags; s32 rtmsg_ifindex; }; static int inet6_compat_routing_ioctl(struct sock *sk, unsigned int cmd, struct compat_in6_rtmsg __user *ur) { struct in6_rtmsg rt; if (copy_from_user(&rt.rtmsg_dst, &ur->rtmsg_dst, 3 * sizeof(struct in6_addr)) || get_user(rt.rtmsg_type, &ur->rtmsg_type) || get_user(rt.rtmsg_dst_len, &ur->rtmsg_dst_len) || get_user(rt.rtmsg_src_len, &ur->rtmsg_src_len) || get_user(rt.rtmsg_metric, &ur->rtmsg_metric) || get_user(rt.rtmsg_info, &ur->rtmsg_info) || get_user(rt.rtmsg_flags, &ur->rtmsg_flags) || get_user(rt.rtmsg_ifindex, &ur->rtmsg_ifindex)) return -EFAULT; return ipv6_route_ioctl(sock_net(sk), cmd, &rt); } int inet6_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { void __user *argp = compat_ptr(arg); struct sock *sk = sock->sk; switch (cmd) { case SIOCADDRT: case SIOCDELRT: return inet6_compat_routing_ioctl(sk, cmd, argp); default: return -ENOIOCTLCMD; } } EXPORT_SYMBOL_GPL(inet6_compat_ioctl); #endif /* CONFIG_COMPAT */ INDIRECT_CALLABLE_DECLARE(int udpv6_sendmsg(struct sock *, struct msghdr *, size_t)); int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) { struct sock *sk = sock->sk; const struct proto *prot; if (unlikely(inet_send_prepare(sk))) return -EAGAIN; /* IPV6_ADDRFORM can change sk->sk_prot under us. */ prot = READ_ONCE(sk->sk_prot); return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, sk, msg, size); } INDIRECT_CALLABLE_DECLARE(int udpv6_recvmsg(struct sock *, struct msghdr *, size_t, int, int *)); int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags) { struct sock *sk = sock->sk; const struct proto *prot; int addr_len = 0; int err; if (likely(!(flags & MSG_ERRQUEUE))) sock_rps_record_flow(sk); /* IPV6_ADDRFORM can change sk->sk_prot under us. 
*/ prot = READ_ONCE(sk->sk_prot); err = INDIRECT_CALL_2(prot->recvmsg, tcp_recvmsg, udpv6_recvmsg, sk, msg, size, flags, &addr_len); if (err >= 0) msg->msg_namelen = addr_len; return err; } const struct proto_ops inet6_stream_ops = { .family = PF_INET6, .owner = THIS_MODULE, .release = inet6_release, .bind = inet6_bind, .connect = inet_stream_connect, /* ok */ .socketpair = sock_no_socketpair, /* a do nothing */ .accept = inet_accept, /* ok */ .getname = inet6_getname, .poll = tcp_poll, /* ok */ .ioctl = inet6_ioctl, /* must change */ .gettstamp = sock_gettstamp, .listen = inet_listen, /* ok */ .shutdown = inet_shutdown, /* ok */ .setsockopt = sock_common_setsockopt, /* ok */ .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ .recvmsg = inet6_recvmsg, /* retpoline's sake */ #ifdef CONFIG_MMU .mmap = tcp_mmap, #endif .splice_eof = inet_splice_eof, .sendmsg_locked = tcp_sendmsg_locked, .splice_read = tcp_splice_read, .set_peek_off = sk_set_peek_off, .read_sock = tcp_read_sock, .read_skb = tcp_read_skb, .peek_len = tcp_peek_len, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, #endif .set_rcvlowat = tcp_set_rcvlowat, }; EXPORT_SYMBOL_GPL(inet6_stream_ops); const struct proto_ops inet6_dgram_ops = { .family = PF_INET6, .owner = THIS_MODULE, .release = inet6_release, .bind = inet6_bind, .connect = inet_dgram_connect, /* ok */ .socketpair = sock_no_socketpair, /* a do nothing */ .accept = sock_no_accept, /* a do nothing */ .getname = inet6_getname, .poll = udp_poll, /* ok */ .ioctl = inet6_ioctl, /* must change */ .gettstamp = sock_gettstamp, .listen = sock_no_listen, /* ok */ .shutdown = inet_shutdown, /* ok */ .setsockopt = sock_common_setsockopt, /* ok */ .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ .recvmsg = inet6_recvmsg, /* retpoline's sake */ .read_skb = udp_read_skb, .mmap = sock_no_mmap, .set_peek_off = udp_set_peek_off, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, #endif }; static const struct net_proto_family inet6_family_ops = { .family = PF_INET6, .create = inet6_create, .owner = THIS_MODULE, }; int inet6_register_protosw(struct inet_protosw *p) { struct list_head *lh; struct inet_protosw *answer; struct list_head *last_perm; int protocol = p->protocol; int ret; spin_lock_bh(&inetsw6_lock); ret = -EINVAL; if (p->type >= SOCK_MAX) goto out_illegal; /* If we are trying to override a permanent protocol, bail. */ answer = NULL; ret = -EPERM; last_perm = &inetsw6[p->type]; list_for_each(lh, &inetsw6[p->type]) { answer = list_entry(lh, struct inet_protosw, list); /* Check only the non-wild match. */ if (INET_PROTOSW_PERMANENT & answer->flags) { if (protocol == answer->protocol) break; last_perm = lh; } answer = NULL; } if (answer) goto out_permanent; /* Add the new entry after the last permanent entry if any, so that * the new entry does not override a permanent entry when matched with * a wild-card protocol. But it is allowed to override any existing * non-permanent entry. This means that when we remove this entry, the * system automatically returns to the old behavior. 
*/ list_add_rcu(&p->list, last_perm); ret = 0; out: spin_unlock_bh(&inetsw6_lock); return ret; out_permanent: pr_err("Attempt to override permanent protocol %d\n", protocol); goto out; out_illegal: pr_err("Ignoring attempt to register invalid socket type %d\n", p->type); goto out; } EXPORT_SYMBOL(inet6_register_protosw); void inet6_unregister_protosw(struct inet_protosw *p) { if (INET_PROTOSW_PERMANENT & p->flags) { pr_err("Attempt to unregister permanent protocol %d\n", p->protocol); } else { spin_lock_bh(&inetsw6_lock); list_del_rcu(&p->list); spin_unlock_bh(&inetsw6_lock); synchronize_net(); } } EXPORT_SYMBOL(inet6_unregister_protosw); int inet6_sk_rebuild_header(struct sock *sk) { struct ipv6_pinfo *np = inet6_sk(sk); struct dst_entry *dst; dst = __sk_dst_check(sk, np->dst_cookie); if (!dst) { struct inet_sock *inet = inet_sk(sk); struct in6_addr *final_p, final; struct flowi6 fl6; memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_proto = sk->sk_protocol; fl6.daddr = sk->sk_v6_daddr; fl6.saddr = np->saddr; fl6.flowlabel = np->flow_label; fl6.flowi6_oif = sk->sk_bound_dev_if; fl6.flowi6_mark = sk->sk_mark; fl6.fl6_dport = inet->inet_dport; fl6.fl6_sport = inet->inet_sport; fl6.flowi6_uid = sk_uid(sk); security_sk_classify_flow(sk, flowi6_to_flowi_common(&fl6)); rcu_read_lock(); final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt), &final); rcu_read_unlock(); dst = ip6_dst_lookup_flow(sock_net(sk), sk, &fl6, final_p); if (IS_ERR(dst)) { sk->sk_route_caps = 0; WRITE_ONCE(sk->sk_err_soft, -PTR_ERR(dst)); return PTR_ERR(dst); } ip6_dst_store(sk, dst, NULL, NULL); } return 0; } EXPORT_SYMBOL_GPL(inet6_sk_rebuild_header); bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb, const struct inet6_skb_parm *opt) { const struct ipv6_pinfo *np = inet6_sk(sk); if (np->rxopt.all) { if (((opt->flags & IP6SKB_HOPBYHOP) && (np->rxopt.bits.hopopts || np->rxopt.bits.ohopopts)) || (ip6_flowinfo((struct ipv6hdr *) skb_network_header(skb)) && np->rxopt.bits.rxflow) || (opt->srcrt && (np->rxopt.bits.srcrt || np->rxopt.bits.osrcrt)) || ((opt->dst1 || opt->dst0) && (np->rxopt.bits.dstopts || np->rxopt.bits.odstopts))) return true; } return false; } static struct packet_type ipv6_packet_type __read_mostly = { .type = cpu_to_be16(ETH_P_IPV6), .func = ipv6_rcv, .list_func = ipv6_list_rcv, }; static int __init ipv6_packet_init(void) { dev_add_pack(&ipv6_packet_type); return 0; } static void ipv6_packet_cleanup(void) { dev_remove_pack(&ipv6_packet_type); } static int __net_init ipv6_init_mibs(struct net *net) { int i; net->mib.udp_stats_in6 = alloc_percpu(struct udp_mib); if (!net->mib.udp_stats_in6) return -ENOMEM; net->mib.udplite_stats_in6 = alloc_percpu(struct udp_mib); if (!net->mib.udplite_stats_in6) goto err_udplite_mib; net->mib.ipv6_statistics = alloc_percpu(struct ipstats_mib); if (!net->mib.ipv6_statistics) goto err_ip_mib; for_each_possible_cpu(i) { struct ipstats_mib *af_inet6_stats; af_inet6_stats = per_cpu_ptr(net->mib.ipv6_statistics, i); u64_stats_init(&af_inet6_stats->syncp); } net->mib.icmpv6_statistics = alloc_percpu(struct icmpv6_mib); if (!net->mib.icmpv6_statistics) goto err_icmp_mib; net->mib.icmpv6msg_statistics = kzalloc(sizeof(struct icmpv6msg_mib), GFP_KERNEL); if (!net->mib.icmpv6msg_statistics) goto err_icmpmsg_mib; return 0; err_icmpmsg_mib: free_percpu(net->mib.icmpv6_statistics); err_icmp_mib: free_percpu(net->mib.ipv6_statistics); err_ip_mib: free_percpu(net->mib.udplite_stats_in6); err_udplite_mib: free_percpu(net->mib.udp_stats_in6); return -ENOMEM; } static void 
ipv6_cleanup_mibs(struct net *net) { free_percpu(net->mib.udp_stats_in6); free_percpu(net->mib.udplite_stats_in6); free_percpu(net->mib.ipv6_statistics); free_percpu(net->mib.icmpv6_statistics); kfree(net->mib.icmpv6msg_statistics); } static int __net_init inet6_net_init(struct net *net) { int err = 0; net->ipv6.sysctl.bindv6only = 0; net->ipv6.sysctl.icmpv6_time = 1*HZ; net->ipv6.sysctl.icmpv6_echo_ignore_all = 0; net->ipv6.sysctl.icmpv6_echo_ignore_multicast = 0; net->ipv6.sysctl.icmpv6_echo_ignore_anycast = 0; net->ipv6.sysctl.icmpv6_error_anycast_as_unicast = 0; /* By default, rate limit error messages. * Except for pmtu discovery, it would break it. * proc_do_large_bitmap needs pointer to the bitmap. */ bitmap_set(net->ipv6.sysctl.icmpv6_ratemask, 0, ICMPV6_ERRMSG_MAX + 1); bitmap_clear(net->ipv6.sysctl.icmpv6_ratemask, ICMPV6_PKT_TOOBIG, 1); net->ipv6.sysctl.icmpv6_ratemask_ptr = net->ipv6.sysctl.icmpv6_ratemask; net->ipv6.sysctl.flowlabel_consistency = 1; net->ipv6.sysctl.auto_flowlabels = IP6_DEFAULT_AUTO_FLOW_LABELS; net->ipv6.sysctl.idgen_retries = 3; net->ipv6.sysctl.idgen_delay = 1 * HZ; net->ipv6.sysctl.flowlabel_state_ranges = 0; net->ipv6.sysctl.max_dst_opts_cnt = IP6_DEFAULT_MAX_DST_OPTS_CNT; net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT; net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN; net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN; net->ipv6.sysctl.fib_notify_on_flag_change = 0; atomic_set(&net->ipv6.fib6_sernum, 1); net->ipv6.sysctl.ioam6_id = IOAM6_DEFAULT_ID; net->ipv6.sysctl.ioam6_id_wide = IOAM6_DEFAULT_ID_WIDE; err = ipv6_init_mibs(net); if (err) return err; #ifdef CONFIG_PROC_FS err = udp6_proc_init(net); if (err) goto out; err = tcp6_proc_init(net); if (err) goto proc_tcp6_fail; err = ac6_proc_init(net); if (err) goto proc_ac6_fail; #endif return err; #ifdef CONFIG_PROC_FS proc_ac6_fail: tcp6_proc_exit(net); proc_tcp6_fail: udp6_proc_exit(net); out: ipv6_cleanup_mibs(net); return err; #endif } static void __net_exit inet6_net_exit(struct net *net) { #ifdef CONFIG_PROC_FS udp6_proc_exit(net); tcp6_proc_exit(net); ac6_proc_exit(net); #endif ipv6_cleanup_mibs(net); } static struct pernet_operations inet6_net_ops = { .init = inet6_net_init, .exit = inet6_net_exit, }; static int ipv6_route_input(struct sk_buff *skb) { ip6_route_input(skb); return skb_dst(skb)->error; } static const struct ipv6_stub ipv6_stub_impl = { .ipv6_sock_mc_join = ipv6_sock_mc_join, .ipv6_sock_mc_drop = ipv6_sock_mc_drop, .ipv6_dst_lookup_flow = ip6_dst_lookup_flow, .ipv6_route_input = ipv6_route_input, .fib6_get_table = fib6_get_table, .fib6_table_lookup = fib6_table_lookup, .fib6_lookup = fib6_lookup, .fib6_select_path = fib6_select_path, .ip6_mtu_from_fib6 = ip6_mtu_from_fib6, .fib6_nh_init = fib6_nh_init, .fib6_nh_release = fib6_nh_release, .fib6_nh_release_dsts = fib6_nh_release_dsts, .fib6_update_sernum = fib6_update_sernum_stub, .fib6_rt_update = fib6_rt_update, .ip6_del_rt = ip6_del_rt, .udpv6_encap_enable = udpv6_encap_enable, .ndisc_send_na = ndisc_send_na, #if IS_ENABLED(CONFIG_XFRM) .xfrm6_local_rxpmtu = xfrm6_local_rxpmtu, .xfrm6_udp_encap_rcv = xfrm6_udp_encap_rcv, .xfrm6_gro_udp_encap_rcv = xfrm6_gro_udp_encap_rcv, .xfrm6_rcv_encap = xfrm6_rcv_encap, #endif .nd_tbl = &nd_tbl, .ipv6_fragment = ip6_fragment, .ipv6_dev_find = ipv6_dev_find, .ip6_xmit = ip6_xmit, }; static const struct ipv6_bpf_stub ipv6_bpf_stub_impl = { .inet6_bind = __inet6_bind, .udp6_lib_lookup = __udp6_lib_lookup, .ipv6_setsockopt = do_ipv6_setsockopt, 
.ipv6_getsockopt = do_ipv6_getsockopt, .ipv6_dev_get_saddr = ipv6_dev_get_saddr, }; static int __init inet6_init(void) { struct list_head *r; int err = 0; sock_skb_cb_check_size(sizeof(struct inet6_skb_parm)); /* Register the socket-side information for inet6_create. */ for (r = &inetsw6[0]; r < &inetsw6[SOCK_MAX]; ++r) INIT_LIST_HEAD(r); raw_hashinfo_init(&raw_v6_hashinfo); if (disable_ipv6_mod) { pr_info("Loaded, but administratively disabled, reboot required to enable\n"); goto out; } err = proto_register(&tcpv6_prot, 1); if (err) goto out; err = proto_register(&udpv6_prot, 1); if (err) goto out_unregister_tcp_proto; err = proto_register(&udplitev6_prot, 1); if (err) goto out_unregister_udp_proto; err = proto_register(&rawv6_prot, 1); if (err) goto out_unregister_udplite_proto; err = proto_register(&pingv6_prot, 1); if (err) goto out_unregister_raw_proto; /* We MUST register RAW sockets before we create the ICMP6, * IGMP6, or NDISC control sockets. */ err = rawv6_init(); if (err) goto out_unregister_ping_proto; /* Register the family here so that the init calls below will * be able to create sockets. (?? is this dangerous ??) */ err = sock_register(&inet6_family_ops); if (err) goto out_sock_register_fail; /* * ipngwg API draft makes clear that the correct semantics * for TCP and UDP is to consider one TCP and UDP instance * in a host available by both INET and INET6 APIs and * able to communicate via both network protocols. */ err = register_pernet_subsys(&inet6_net_ops); if (err) goto register_pernet_fail; err = ip6_mr_init(); if (err) goto ipmr_fail; err = icmpv6_init(); if (err) goto icmp_fail; err = ndisc_init(); if (err) goto ndisc_fail; err = igmp6_init(); if (err) goto igmp_fail; err = ipv6_netfilter_init(); if (err) goto netfilter_fail; /* Create /proc/foo6 entries. */ #ifdef CONFIG_PROC_FS err = -ENOMEM; if (raw6_proc_init()) goto proc_raw6_fail; if (udplite6_proc_init()) goto proc_udplite6_fail; if (ipv6_misc_proc_init()) goto proc_misc6_fail; if (if6_proc_init()) goto proc_if6_fail; #endif err = ip6_route_init(); if (err) goto ip6_route_fail; err = ndisc_late_init(); if (err) goto ndisc_late_fail; err = ip6_flowlabel_init(); if (err) goto ip6_flowlabel_fail; err = ipv6_anycast_init(); if (err) goto ipv6_anycast_fail; err = addrconf_init(); if (err) goto addrconf_fail; /* Init v6 extension headers. */ err = ipv6_exthdrs_init(); if (err) goto ipv6_exthdrs_fail; err = ipv6_frag_init(); if (err) goto ipv6_frag_fail; /* Init v6 transport protocols. 
*/ err = udpv6_init(); if (err) goto udpv6_fail; err = udplitev6_init(); if (err) goto udplitev6_fail; err = udpv6_offload_init(); if (err) goto udpv6_offload_fail; err = tcpv6_init(); if (err) goto tcpv6_fail; err = ipv6_packet_init(); if (err) goto ipv6_packet_fail; err = pingv6_init(); if (err) goto pingv6_fail; err = calipso_init(); if (err) goto calipso_fail; err = seg6_init(); if (err) goto seg6_fail; err = rpl_init(); if (err) goto rpl_fail; err = ioam6_init(); if (err) goto ioam6_fail; err = igmp6_late_init(); if (err) goto igmp6_late_err; #ifdef CONFIG_SYSCTL err = ipv6_sysctl_register(); if (err) goto sysctl_fail; #endif /* ensure that ipv6 stubs are visible only after ipv6 is ready */ wmb(); ipv6_stub = &ipv6_stub_impl; ipv6_bpf_stub = &ipv6_bpf_stub_impl; out: return err; #ifdef CONFIG_SYSCTL sysctl_fail: igmp6_late_cleanup(); #endif igmp6_late_err: ioam6_exit(); ioam6_fail: rpl_exit(); rpl_fail: seg6_exit(); seg6_fail: calipso_exit(); calipso_fail: pingv6_exit(); pingv6_fail: ipv6_packet_cleanup(); ipv6_packet_fail: tcpv6_exit(); tcpv6_fail: udpv6_offload_exit(); udpv6_offload_fail: udplitev6_exit(); udplitev6_fail: udpv6_exit(); udpv6_fail: ipv6_frag_exit(); ipv6_frag_fail: ipv6_exthdrs_exit(); ipv6_exthdrs_fail: addrconf_cleanup(); addrconf_fail: ipv6_anycast_cleanup(); ipv6_anycast_fail: ip6_flowlabel_cleanup(); ip6_flowlabel_fail: ndisc_late_cleanup(); ndisc_late_fail: ip6_route_cleanup(); ip6_route_fail: #ifdef CONFIG_PROC_FS if6_proc_exit(); proc_if6_fail: ipv6_misc_proc_exit(); proc_misc6_fail: udplite6_proc_exit(); proc_udplite6_fail: raw6_proc_exit(); proc_raw6_fail: #endif ipv6_netfilter_fini(); netfilter_fail: igmp6_cleanup(); igmp_fail: ndisc_cleanup(); ndisc_fail: icmpv6_cleanup(); icmp_fail: ip6_mr_cleanup(); ipmr_fail: unregister_pernet_subsys(&inet6_net_ops); register_pernet_fail: sock_unregister(PF_INET6); rtnl_unregister_all(PF_INET6); out_sock_register_fail: rawv6_exit(); out_unregister_ping_proto: proto_unregister(&pingv6_prot); out_unregister_raw_proto: proto_unregister(&rawv6_prot); out_unregister_udplite_proto: proto_unregister(&udplitev6_prot); out_unregister_udp_proto: proto_unregister(&udpv6_prot); out_unregister_tcp_proto: proto_unregister(&tcpv6_prot); goto out; } module_init(inet6_init); MODULE_ALIAS_NETPROTO(PF_INET6); |
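A minimal userspace sketch (not part of the kernel sources above) that exercises the link-local handling in __inet6_bind(): binding to a fe80::/10 address fails with EINVAL unless the socket is tied to an interface, which is why sin6_scope_id is filled in below. The interface name "eth0", the address, and the port are illustrative assumptions.

/* Hypothetical userspace example: bind an AF_INET6 socket to a link-local
 * address. __inet6_bind() requires a bound device for such binds, so
 * sin6_scope_id must carry the interface index.
 */
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in6 sa;
	int fd = socket(AF_INET6, SOCK_STREAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	memset(&sa, 0, sizeof(sa));
	sa.sin6_family = AF_INET6;
	sa.sin6_port = htons(8080);
	inet_pton(AF_INET6, "fe80::1", &sa.sin6_addr);
	/* Without this, the kernel rejects the link-local bind with EINVAL. */
	sa.sin6_scope_id = if_nametoindex("eth0");

	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		perror("bind");
	close(fd);
	return 0;
}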
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Landlock LSM - Object management
 *
 * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
 * Copyright © 2018-2020 ANSSI
 */

#include <linux/bug.h>
#include <linux/compiler_types.h>
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

#include "object.h"

struct landlock_object *
landlock_create_object(const struct landlock_object_underops *const underops,
		       void *const underobj)
{
	struct landlock_object *new_object;

	if (WARN_ON_ONCE(!underops || !underobj))
		return ERR_PTR(-ENOENT);
	new_object = kzalloc(sizeof(*new_object), GFP_KERNEL_ACCOUNT);
	if (!new_object)
		return ERR_PTR(-ENOMEM);
	refcount_set(&new_object->usage, 1);
	spin_lock_init(&new_object->lock);
	new_object->underops = underops;
	new_object->underobj = underobj;
	return new_object;
}

/*
 * The caller must own the object (i.e. thanks to object->usage) to safely put
 * it.
 */
void landlock_put_object(struct landlock_object *const object)
{
	/*
	 * The call to @object->underops->release(object) might sleep, e.g.
	 * because of iput().
	 */
	might_sleep();
	if (!object)
		return;

	/*
	 * If the @object's refcount cannot drop to zero, we can just decrement
	 * the refcount without holding a lock. Otherwise, the decrement must
	 * happen under @object->lock for synchronization with things like
	 * get_inode_object().
	 */
	if (refcount_dec_and_lock(&object->usage, &object->lock)) {
		__acquire(&object->lock);
		/*
		 * With @object->lock initially held, remove the reference from
		 * @object->underobj to @object (if it still exists).
		 */
		object->underops->release(object);
		kfree_rcu(object, rcu_free);
	}
}
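A hypothetical in-kernel sketch (not taken from the Landlock tree) of how a caller is expected to pair landlock_create_object() with landlock_put_object(). The my_resource type, my_release() callback, and my_attach() helper are illustrative assumptions; only the object API itself comes from the code above.

/* Illustrative only: my_resource, my_release() and my_attach() are made up.
 * The release() callback runs with object->lock held and must drop it, which
 * mirrors the contract described in landlock_put_object() above.
 */
#include <linux/compiler_types.h>
#include <linux/err.h>
#include <linux/spinlock.h>

#include "object.h"

struct my_resource {
	struct landlock_object *object;
};

static void my_release(struct landlock_object *const object)
	__releases(object->lock)
{
	struct my_resource *const res = object->underobj;

	/* Detach the underlying resource before the object is freed. */
	object->underobj = NULL;
	spin_unlock(&object->lock);
	if (res)
		res->object = NULL;
}

static const struct landlock_object_underops my_underops = {
	.release = my_release,
};

static int my_attach(struct my_resource *const res)
{
	struct landlock_object *const object =
		landlock_create_object(&my_underops, res);

	if (IS_ERR(object))
		return PTR_ERR(object);
	res->object = object;
	/* ... hand the object to another user (e.g. a ruleset), which would
	 * take its own reference ...
	 */
	landlock_put_object(object);	/* drop the initial reference */
	return 0;
}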
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_IRQ_H
#define _LINUX_IRQ_H

/*
 * Please do not include this file in generic code.  There is currently
 * no requirement for any architecture to implement anything held
 * within this file.
 *
 * Thanks. --rmk
 */

#include <linux/cache.h>
#include <linux/spinlock.h>
#include <linux/cpumask.h>
#include <linux/irqhandler.h>
#include <linux/irqreturn.h>
#include <linux/irqnr.h>
#include <linux/topology.h>
#include <linux/io.h>
#include <linux/slab.h>

#include <asm/irq.h>
#include <asm/ptrace.h>
#include <asm/irq_regs.h>

struct seq_file;
struct module;
struct msi_msg;
struct irq_affinity_desc;
enum irqchip_irq_state;

/*
 * IRQ line status.
 *
 * Bits 0-7 are the same as the IRQF_* bits in linux/interrupt.h
 *
 * IRQ_TYPE_NONE		- default, unspecified type
 * IRQ_TYPE_EDGE_RISING		- rising edge triggered
 * IRQ_TYPE_EDGE_FALLING	- falling edge triggered
 * IRQ_TYPE_EDGE_BOTH		- rising and falling edge triggered
 * IRQ_TYPE_LEVEL_HIGH		- high level triggered
 * IRQ_TYPE_LEVEL_LOW		- low level triggered
 * IRQ_TYPE_LEVEL_MASK		- Mask to filter out the level bits
 * IRQ_TYPE_SENSE_MASK		- Mask for all the above bits
 * IRQ_TYPE_DEFAULT		- For use by some PICs to ask irq_set_type
 *				  to setup the HW to a sane default (used
 *				  by irqdomain map() callbacks to synchronize
 *				  the HW state and SW flags for a newly
 *				  allocated descriptor).
 *
 * IRQ_TYPE_PROBE		- Special flag for probing in progress
 *
 * Bits which can be modified via irq_set/clear/modify_status_flags()
 * IRQ_LEVEL			- Interrupt is level type. Will be also
 *				  updated in the code when the above trigger
 *				  bits are modified via irq_set_irq_type()
 * IRQ_PER_CPU			- Mark an interrupt PER_CPU.
Will protect * it from affinity setting * IRQ_NOPROBE - Interrupt cannot be probed by autoprobing * IRQ_NOREQUEST - Interrupt cannot be requested via * request_irq() * IRQ_NOTHREAD - Interrupt cannot be threaded * IRQ_NOAUTOEN - Interrupt is not automatically enabled in * request/setup_irq() * IRQ_NO_BALANCING - Interrupt cannot be balanced (affinity set) * IRQ_NESTED_THREAD - Interrupt nests into another thread * IRQ_PER_CPU_DEVID - Dev_id is a per-cpu variable * IRQ_IS_POLLED - Always polled by another interrupt. Exclude * it from the spurious interrupt detection * mechanism and from core side polling. * IRQ_DISABLE_UNLAZY - Disable lazy irq disable * IRQ_HIDDEN - Don't show up in /proc/interrupts * IRQ_NO_DEBUG - Exclude from note_interrupt() debugging */ enum { IRQ_TYPE_NONE = 0x00000000, IRQ_TYPE_EDGE_RISING = 0x00000001, IRQ_TYPE_EDGE_FALLING = 0x00000002, IRQ_TYPE_EDGE_BOTH = (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_EDGE_RISING), IRQ_TYPE_LEVEL_HIGH = 0x00000004, IRQ_TYPE_LEVEL_LOW = 0x00000008, IRQ_TYPE_LEVEL_MASK = (IRQ_TYPE_LEVEL_LOW | IRQ_TYPE_LEVEL_HIGH), IRQ_TYPE_SENSE_MASK = 0x0000000f, IRQ_TYPE_DEFAULT = IRQ_TYPE_SENSE_MASK, IRQ_TYPE_PROBE = 0x00000010, IRQ_LEVEL = (1 << 8), IRQ_PER_CPU = (1 << 9), IRQ_NOPROBE = (1 << 10), IRQ_NOREQUEST = (1 << 11), IRQ_NOAUTOEN = (1 << 12), IRQ_NO_BALANCING = (1 << 13), IRQ_NESTED_THREAD = (1 << 15), IRQ_NOTHREAD = (1 << 16), IRQ_PER_CPU_DEVID = (1 << 17), IRQ_IS_POLLED = (1 << 18), IRQ_DISABLE_UNLAZY = (1 << 19), IRQ_HIDDEN = (1 << 20), IRQ_NO_DEBUG = (1 << 21), }; #define IRQF_MODIFY_MASK \ (IRQ_TYPE_SENSE_MASK | IRQ_NOPROBE | IRQ_NOREQUEST | \ IRQ_NOAUTOEN | IRQ_LEVEL | IRQ_NO_BALANCING | \ IRQ_PER_CPU | IRQ_NESTED_THREAD | IRQ_NOTHREAD | IRQ_PER_CPU_DEVID | \ IRQ_IS_POLLED | IRQ_DISABLE_UNLAZY | IRQ_HIDDEN) #define IRQ_NO_BALANCING_MASK (IRQ_PER_CPU | IRQ_NO_BALANCING) /* * Return value for chip->irq_set_affinity() * * IRQ_SET_MASK_OK - OK, core updates irq_common_data.affinity * IRQ_SET_MASK_NOCOPY - OK, chip did update irq_common_data.affinity * IRQ_SET_MASK_OK_DONE - Same as IRQ_SET_MASK_OK for core. Special code to * support stacked irqchips, which indicates skipping * all descendant irqchips. */ enum { IRQ_SET_MASK_OK = 0, IRQ_SET_MASK_OK_NOCOPY, IRQ_SET_MASK_OK_DONE, }; struct msi_desc; struct irq_domain; /** * struct irq_common_data - per irq data shared by all irqchips * @state_use_accessors: status information for irq chip functions. * Use accessor functions to deal with it * @node: node index useful for balancing * @handler_data: per-IRQ data for the irq_chip methods * @affinity: IRQ affinity on SMP. If this is an IPI * related irq, then this is the mask of the * CPUs to which an IPI can be sent. * @effective_affinity: The effective IRQ affinity on SMP as some irq * chips do not allow multi CPU destinations. * A subset of @affinity. * @msi_desc: MSI descriptor * @ipi_offset: Offset of first IPI target cpu in @affinity. Optional. 
*/ struct irq_common_data { unsigned int __private state_use_accessors; #ifdef CONFIG_NUMA unsigned int node; #endif void *handler_data; struct msi_desc *msi_desc; #ifdef CONFIG_SMP cpumask_var_t affinity; #endif #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK cpumask_var_t effective_affinity; #endif #ifdef CONFIG_GENERIC_IRQ_IPI unsigned int ipi_offset; #endif }; /** * struct irq_data - per irq chip data passed down to chip functions * @mask: precomputed bitmask for accessing the chip registers * @irq: interrupt number * @hwirq: hardware interrupt number, local to the interrupt domain * @common: point to data shared by all irqchips * @chip: low level interrupt hardware access * @domain: Interrupt translation domain; responsible for mapping * between hwirq number and linux irq number. * @parent_data: pointer to parent struct irq_data to support hierarchy * irq_domain * @chip_data: platform-specific per-chip private data for the chip * methods, to allow shared chip implementations */ struct irq_data { u32 mask; unsigned int irq; irq_hw_number_t hwirq; struct irq_common_data *common; struct irq_chip *chip; struct irq_domain *domain; #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY struct irq_data *parent_data; #endif void *chip_data; }; /* * Bit masks for irq_common_data.state_use_accessors * * IRQD_TRIGGER_MASK - Mask for the trigger type bits * IRQD_SETAFFINITY_PENDING - Affinity setting is pending * IRQD_ACTIVATED - Interrupt has already been activated * IRQD_NO_BALANCING - Balancing disabled for this IRQ * IRQD_PER_CPU - Interrupt is per cpu * IRQD_AFFINITY_SET - Interrupt affinity was set * IRQD_LEVEL - Interrupt is level triggered * IRQD_WAKEUP_STATE - Interrupt is configured for wakeup * from suspend * IRQD_IRQ_DISABLED - Disabled state of the interrupt * IRQD_IRQ_MASKED - Masked state of the interrupt * IRQD_IRQ_INPROGRESS - In progress state of the interrupt * IRQD_WAKEUP_ARMED - Wakeup mode armed * IRQD_FORWARDED_TO_VCPU - The interrupt is forwarded to a VCPU * IRQD_AFFINITY_MANAGED - Affinity is auto-managed by the kernel * IRQD_IRQ_STARTED - Startup state of the interrupt * IRQD_MANAGED_SHUTDOWN - Interrupt was shutdown due to empty affinity * mask. Applies only to affinity managed irqs. * IRQD_SINGLE_TARGET - IRQ allows only a single affinity target * IRQD_DEFAULT_TRIGGER_SET - Expected trigger already been set * IRQD_CAN_RESERVE - Can use reservation mode * IRQD_HANDLE_ENFORCE_IRQCTX - Enforce that handle_irq_*() is only invoked * from actual interrupt context. * IRQD_AFFINITY_ON_ACTIVATE - Affinity is set on activation. Don't call * irq_chip::irq_set_affinity() when deactivated. * IRQD_IRQ_ENABLED_ON_SUSPEND - Interrupt is enabled on suspend by irq pm if * irqchip have flag IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set. * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in progress in which * case it must be resent at the next available opportunity. 
*/ enum { IRQD_TRIGGER_MASK = 0xf, IRQD_SETAFFINITY_PENDING = BIT(8), IRQD_ACTIVATED = BIT(9), IRQD_NO_BALANCING = BIT(10), IRQD_PER_CPU = BIT(11), IRQD_AFFINITY_SET = BIT(12), IRQD_LEVEL = BIT(13), IRQD_WAKEUP_STATE = BIT(14), IRQD_IRQ_DISABLED = BIT(16), IRQD_IRQ_MASKED = BIT(17), IRQD_IRQ_INPROGRESS = BIT(18), IRQD_WAKEUP_ARMED = BIT(19), IRQD_FORWARDED_TO_VCPU = BIT(20), IRQD_AFFINITY_MANAGED = BIT(21), IRQD_IRQ_STARTED = BIT(22), IRQD_MANAGED_SHUTDOWN = BIT(23), IRQD_SINGLE_TARGET = BIT(24), IRQD_DEFAULT_TRIGGER_SET = BIT(25), IRQD_CAN_RESERVE = BIT(26), IRQD_HANDLE_ENFORCE_IRQCTX = BIT(27), IRQD_AFFINITY_ON_ACTIVATE = BIT(28), IRQD_IRQ_ENABLED_ON_SUSPEND = BIT(29), IRQD_RESEND_WHEN_IN_PROGRESS = BIT(30), }; #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors) static inline bool irqd_is_setaffinity_pending(struct irq_data *d) { return __irqd_to_state(d) & IRQD_SETAFFINITY_PENDING; } static inline bool irqd_is_per_cpu(struct irq_data *d) { return __irqd_to_state(d) & IRQD_PER_CPU; } static inline bool irqd_can_balance(struct irq_data *d) { return !(__irqd_to_state(d) & (IRQD_PER_CPU | IRQD_NO_BALANCING)); } static inline bool irqd_affinity_was_set(struct irq_data *d) { return __irqd_to_state(d) & IRQD_AFFINITY_SET; } static inline void irqd_mark_affinity_was_set(struct irq_data *d) { __irqd_to_state(d) |= IRQD_AFFINITY_SET; } static inline bool irqd_trigger_type_was_set(struct irq_data *d) { return __irqd_to_state(d) & IRQD_DEFAULT_TRIGGER_SET; } static inline u32 irqd_get_trigger_type(struct irq_data *d) { return __irqd_to_state(d) & IRQD_TRIGGER_MASK; } /* * Must only be called inside irq_chip.irq_set_type() functions or * from the DT/ACPI setup code. */ static inline void irqd_set_trigger_type(struct irq_data *d, u32 type) { __irqd_to_state(d) &= ~IRQD_TRIGGER_MASK; __irqd_to_state(d) |= type & IRQD_TRIGGER_MASK; __irqd_to_state(d) |= IRQD_DEFAULT_TRIGGER_SET; } static inline bool irqd_is_level_type(struct irq_data *d) { return __irqd_to_state(d) & IRQD_LEVEL; } /* * Must only be called of irqchip.irq_set_affinity() or low level * hierarchy domain allocation functions. 
*/ static inline void irqd_set_single_target(struct irq_data *d) { __irqd_to_state(d) |= IRQD_SINGLE_TARGET; } static inline bool irqd_is_single_target(struct irq_data *d) { return __irqd_to_state(d) & IRQD_SINGLE_TARGET; } static inline void irqd_set_handle_enforce_irqctx(struct irq_data *d) { __irqd_to_state(d) |= IRQD_HANDLE_ENFORCE_IRQCTX; } static inline bool irqd_is_handle_enforce_irqctx(struct irq_data *d) { return __irqd_to_state(d) & IRQD_HANDLE_ENFORCE_IRQCTX; } static inline bool irqd_is_enabled_on_suspend(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_ENABLED_ON_SUSPEND; } static inline bool irqd_is_wakeup_set(struct irq_data *d) { return __irqd_to_state(d) & IRQD_WAKEUP_STATE; } static inline bool irqd_irq_disabled(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_DISABLED; } static inline bool irqd_irq_masked(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_MASKED; } static inline bool irqd_irq_inprogress(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_INPROGRESS; } static inline bool irqd_is_wakeup_armed(struct irq_data *d) { return __irqd_to_state(d) & IRQD_WAKEUP_ARMED; } static inline bool irqd_is_forwarded_to_vcpu(struct irq_data *d) { return __irqd_to_state(d) & IRQD_FORWARDED_TO_VCPU; } static inline void irqd_set_forwarded_to_vcpu(struct irq_data *d) { __irqd_to_state(d) |= IRQD_FORWARDED_TO_VCPU; } static inline void irqd_clr_forwarded_to_vcpu(struct irq_data *d) { __irqd_to_state(d) &= ~IRQD_FORWARDED_TO_VCPU; } static inline bool irqd_affinity_is_managed(struct irq_data *d) { return __irqd_to_state(d) & IRQD_AFFINITY_MANAGED; } static inline bool irqd_is_activated(struct irq_data *d) { return __irqd_to_state(d) & IRQD_ACTIVATED; } static inline void irqd_set_activated(struct irq_data *d) { __irqd_to_state(d) |= IRQD_ACTIVATED; } static inline void irqd_clr_activated(struct irq_data *d) { __irqd_to_state(d) &= ~IRQD_ACTIVATED; } static inline bool irqd_is_started(struct irq_data *d) { return __irqd_to_state(d) & IRQD_IRQ_STARTED; } static inline bool irqd_is_managed_and_shutdown(struct irq_data *d) { return __irqd_to_state(d) & IRQD_MANAGED_SHUTDOWN; } static inline void irqd_set_can_reserve(struct irq_data *d) { __irqd_to_state(d) |= IRQD_CAN_RESERVE; } static inline void irqd_clr_can_reserve(struct irq_data *d) { __irqd_to_state(d) &= ~IRQD_CAN_RESERVE; } static inline bool irqd_can_reserve(struct irq_data *d) { return __irqd_to_state(d) & IRQD_CAN_RESERVE; } static inline void irqd_set_affinity_on_activate(struct irq_data *d) { __irqd_to_state(d) |= IRQD_AFFINITY_ON_ACTIVATE; } static inline bool irqd_affinity_on_activate(struct irq_data *d) { return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE; } static inline void irqd_set_resend_when_in_progress(struct irq_data *d) { __irqd_to_state(d) |= IRQD_RESEND_WHEN_IN_PROGRESS; } static inline bool irqd_needs_resend_when_in_progress(struct irq_data *d) { return __irqd_to_state(d) & IRQD_RESEND_WHEN_IN_PROGRESS; } #undef __irqd_to_state static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d) { return d->hwirq; } /** * struct irq_chip - hardware interrupt chip descriptor * * @name: name for /proc/interrupts * @irq_startup: start up the interrupt (defaults to ->enable if NULL) * @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL) * @irq_enable: enable the interrupt (defaults to chip->unmask if NULL) * @irq_disable: disable the interrupt * @irq_ack: start of a new interrupt * @irq_mask: mask an interrupt source * @irq_mask_ack: ack and mask an interrupt 
source * @irq_unmask: unmask an interrupt source * @irq_eoi: end of interrupt * @irq_set_affinity: Set the CPU affinity on SMP machines. If the force * argument is true, it tells the driver to * unconditionally apply the affinity setting. Sanity * checks against the supplied affinity mask are not * required. This is used for CPU hotplug where the * target CPU is not yet set in the cpu_online_mask. * @irq_retrigger: resend an IRQ to the CPU * @irq_set_type: set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ * @irq_set_wake: enable/disable power-management wake-on of an IRQ * @irq_bus_lock: function to lock access to slow bus (i2c) chips * @irq_bus_sync_unlock:function to sync and unlock slow bus (i2c) chips * @irq_cpu_online: configure an interrupt source for a secondary CPU * @irq_cpu_offline: un-configure an interrupt source for a secondary CPU * @irq_suspend: function called from core code on suspend once per * chip, when one or more interrupts are installed * @irq_resume: function called from core code on resume once per chip, * when one ore more interrupts are installed * @irq_pm_shutdown: function called from core code on shutdown once per chip * @irq_calc_mask: Optional function to set irq_data.mask for special cases * @irq_print_chip: optional to print special chip info in show_interrupts * @irq_request_resources: optional to request resources before calling * any other callback related to this irq * @irq_release_resources: optional to release resources acquired with * irq_request_resources * @irq_compose_msi_msg: optional to compose message content for MSI * @irq_write_msi_msg: optional to write message content for MSI * @irq_get_irqchip_state: return the internal state of an interrupt * @irq_set_irqchip_state: set the internal state of a interrupt * @irq_set_vcpu_affinity: optional to target a vCPU in a virtual machine * @ipi_send_single: send a single IPI to destination cpus * @ipi_send_mask: send an IPI to destination cpus in cpumask * @irq_nmi_setup: function called from core code before enabling an NMI * @irq_nmi_teardown: function called from core code after disabling an NMI * @irq_force_complete_move: optional function to force complete pending irq move * @flags: chip specific flags */ struct irq_chip { const char *name; unsigned int (*irq_startup)(struct irq_data *data); void (*irq_shutdown)(struct irq_data *data); void (*irq_enable)(struct irq_data *data); void (*irq_disable)(struct irq_data *data); void (*irq_ack)(struct irq_data *data); void (*irq_mask)(struct irq_data *data); void (*irq_mask_ack)(struct irq_data *data); void (*irq_unmask)(struct irq_data *data); void (*irq_eoi)(struct irq_data *data); int (*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force); int (*irq_retrigger)(struct irq_data *data); int (*irq_set_type)(struct irq_data *data, unsigned int flow_type); int (*irq_set_wake)(struct irq_data *data, unsigned int on); void (*irq_bus_lock)(struct irq_data *data); void (*irq_bus_sync_unlock)(struct irq_data *data); #ifdef CONFIG_DEPRECATED_IRQ_CPU_ONOFFLINE void (*irq_cpu_online)(struct irq_data *data); void (*irq_cpu_offline)(struct irq_data *data); #endif void (*irq_suspend)(struct irq_data *data); void (*irq_resume)(struct irq_data *data); void (*irq_pm_shutdown)(struct irq_data *data); void (*irq_calc_mask)(struct irq_data *data); void (*irq_print_chip)(struct irq_data *data, struct seq_file *p); int (*irq_request_resources)(struct irq_data *data); void (*irq_release_resources)(struct irq_data *data); void 
(*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg); void (*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg); int (*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state); int (*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state); int (*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info); void (*ipi_send_single)(struct irq_data *data, unsigned int cpu); void (*ipi_send_mask)(struct irq_data *data, const struct cpumask *dest); int (*irq_nmi_setup)(struct irq_data *data); void (*irq_nmi_teardown)(struct irq_data *data); void (*irq_force_complete_move)(struct irq_data *data); unsigned long flags; }; /* * irq_chip specific flags * * IRQCHIP_SET_TYPE_MASKED: Mask before calling chip.irq_set_type() * IRQCHIP_EOI_IF_HANDLED: Only issue irq_eoi() when irq was handled * IRQCHIP_MASK_ON_SUSPEND: Mask non wake irqs in the suspend path * IRQCHIP_ONOFFLINE_ENABLED: Only call irq_on/off_line callbacks * when irq enabled * IRQCHIP_SKIP_SET_WAKE: Skip chip.irq_set_wake(), for this irq chip * IRQCHIP_ONESHOT_SAFE: One shot does not require mask/unmask * IRQCHIP_EOI_THREADED: Chip requires eoi() on unmask in threaded mode * IRQCHIP_SUPPORTS_LEVEL_MSI: Chip can provide two doorbells for Level MSIs * IRQCHIP_SUPPORTS_NMI: Chip can deliver NMIs, only for root irqchips * IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND: Invokes __enable_irq()/__disable_irq() for wake irqs * in the suspend path if they are in disabled state * IRQCHIP_AFFINITY_PRE_STARTUP: Default affinity update before startup * IRQCHIP_IMMUTABLE: Don't ever change anything in this chip * IRQCHIP_MOVE_DEFERRED: Move the interrupt in actual interrupt context */ enum { IRQCHIP_SET_TYPE_MASKED = (1 << 0), IRQCHIP_EOI_IF_HANDLED = (1 << 1), IRQCHIP_MASK_ON_SUSPEND = (1 << 2), IRQCHIP_ONOFFLINE_ENABLED = (1 << 3), IRQCHIP_SKIP_SET_WAKE = (1 << 4), IRQCHIP_ONESHOT_SAFE = (1 << 5), IRQCHIP_EOI_THREADED = (1 << 6), IRQCHIP_SUPPORTS_LEVEL_MSI = (1 << 7), IRQCHIP_SUPPORTS_NMI = (1 << 8), IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND = (1 << 9), IRQCHIP_AFFINITY_PRE_STARTUP = (1 << 10), IRQCHIP_IMMUTABLE = (1 << 11), IRQCHIP_MOVE_DEFERRED = (1 << 12), }; #include <linux/irqdesc.h> /* * Pick up the arch-dependent methods: */ #include <asm/hw_irq.h> #ifndef NR_IRQS_LEGACY # define NR_IRQS_LEGACY 0 #endif #ifndef ARCH_IRQ_INIT_FLAGS # define ARCH_IRQ_INIT_FLAGS 0 #endif #define IRQ_DEFAULT_INIT_FLAGS ARCH_IRQ_INIT_FLAGS struct irqaction; extern int setup_percpu_irq(unsigned int irq, struct irqaction *new); #ifdef CONFIG_DEPRECATED_IRQ_CPU_ONOFFLINE extern void irq_cpu_online(void); extern void irq_cpu_offline(void); #endif extern int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *cpumask, bool force); extern int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info); #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_IRQ_MIGRATION) extern void irq_migrate_all_off_this_cpu(void); extern int irq_affinity_online_cpu(unsigned int cpu); #else # define irq_affinity_online_cpu NULL #endif #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_PENDING_IRQ) bool irq_can_move_in_process_context(struct irq_data *data); void __irq_move_irq(struct irq_data *data); static inline void irq_move_irq(struct irq_data *data) { if (unlikely(irqd_is_setaffinity_pending(data))) __irq_move_irq(data); } void irq_move_masked_irq(struct irq_data *data); #else static inline bool irq_can_move_in_process_context(struct irq_data *data) { return true; } static inline void irq_move_irq(struct 
irq_data *data) { } static inline void irq_move_masked_irq(struct irq_data *data) { } #endif extern int no_irq_affinity; #ifdef CONFIG_HARDIRQS_SW_RESEND int irq_set_parent(int irq, int parent_irq); #else static inline int irq_set_parent(int irq, int parent_irq) { return 0; } #endif /* * Built-in IRQ handlers for various IRQ types, * callable via desc->handle_irq() */ extern void handle_level_irq(struct irq_desc *desc); extern void handle_fasteoi_irq(struct irq_desc *desc); extern void handle_edge_irq(struct irq_desc *desc); extern void handle_edge_eoi_irq(struct irq_desc *desc); extern void handle_simple_irq(struct irq_desc *desc); extern void handle_untracked_irq(struct irq_desc *desc); extern void handle_percpu_irq(struct irq_desc *desc); extern void handle_percpu_devid_irq(struct irq_desc *desc); extern void handle_bad_irq(struct irq_desc *desc); extern void handle_nested_irq(unsigned int irq); extern void handle_fasteoi_nmi(struct irq_desc *desc); extern void handle_percpu_devid_fasteoi_nmi(struct irq_desc *desc); extern int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg); extern int irq_chip_pm_get(struct irq_data *data); extern int irq_chip_pm_put(struct irq_data *data); #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY extern void handle_fasteoi_ack_irq(struct irq_desc *desc); extern void handle_fasteoi_mask_irq(struct irq_desc *desc); extern int irq_chip_set_parent_state(struct irq_data *data, enum irqchip_irq_state which, bool val); extern int irq_chip_get_parent_state(struct irq_data *data, enum irqchip_irq_state which, bool *state); extern void irq_chip_enable_parent(struct irq_data *data); extern void irq_chip_disable_parent(struct irq_data *data); extern void irq_chip_ack_parent(struct irq_data *data); extern int irq_chip_retrigger_hierarchy(struct irq_data *data); extern void irq_chip_mask_parent(struct irq_data *data); extern void irq_chip_mask_ack_parent(struct irq_data *data); extern void irq_chip_unmask_parent(struct irq_data *data); extern void irq_chip_eoi_parent(struct irq_data *data); extern int irq_chip_set_affinity_parent(struct irq_data *data, const struct cpumask *dest, bool force); extern int irq_chip_set_wake_parent(struct irq_data *data, unsigned int on); extern int irq_chip_set_vcpu_affinity_parent(struct irq_data *data, void *vcpu_info); extern int irq_chip_set_type_parent(struct irq_data *data, unsigned int type); extern int irq_chip_request_resources_parent(struct irq_data *data); extern void irq_chip_release_resources_parent(struct irq_data *data); #endif /* Disable or mask interrupts during a kernel kexec */ extern void machine_kexec_mask_interrupts(void); /* Handling of unhandled and spurious interrupts: */ extern void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret); /* Enable/disable irq debugging output: */ extern int noirqdebug_setup(char *str); /* Checks whether the interrupt can be requested by request_irq(): */ extern bool can_request_irq(unsigned int irq, unsigned long irqflags); /* Dummy irq-chip implementations: */ extern struct irq_chip no_irq_chip; extern struct irq_chip dummy_irq_chip; extern void irq_set_chip_and_handler_name(unsigned int irq, const struct irq_chip *chip, irq_flow_handler_t handle, const char *name); static inline void irq_set_chip_and_handler(unsigned int irq, const struct irq_chip *chip, irq_flow_handler_t handle) { irq_set_chip_and_handler_name(irq, chip, handle, NULL); } extern int irq_set_percpu_devid(unsigned int irq); extern int irq_set_percpu_devid_partition(unsigned int irq, const struct cpumask 
*affinity); extern int irq_get_percpu_devid_partition(unsigned int irq, struct cpumask *affinity); extern void __irq_set_handler(unsigned int irq, irq_flow_handler_t handle, int is_chained, const char *name); static inline void irq_set_handler(unsigned int irq, irq_flow_handler_t handle) { __irq_set_handler(irq, handle, 0, NULL); } /* * Set a highlevel chained flow handler for a given IRQ. * (a chained handler is automatically enabled and set to * IRQ_NOREQUEST, IRQ_NOPROBE, and IRQ_NOTHREAD) */ static inline void irq_set_chained_handler(unsigned int irq, irq_flow_handler_t handle) { __irq_set_handler(irq, handle, 1, NULL); } /* * Set a highlevel chained flow handler and its data for a given IRQ. * (a chained handler is automatically enabled and set to * IRQ_NOREQUEST, IRQ_NOPROBE, and IRQ_NOTHREAD) */ void irq_set_chained_handler_and_data(unsigned int irq, irq_flow_handler_t handle, void *data); void irq_modify_status(unsigned int irq, unsigned long clr, unsigned long set); static inline void irq_set_status_flags(unsigned int irq, unsigned long set) { irq_modify_status(irq, 0, set); } static inline void irq_clear_status_flags(unsigned int irq, unsigned long clr) { irq_modify_status(irq, clr, 0); } static inline void irq_set_noprobe(unsigned int irq) { irq_modify_status(irq, 0, IRQ_NOPROBE); } static inline void irq_set_probe(unsigned int irq) { irq_modify_status(irq, IRQ_NOPROBE, 0); } static inline void irq_set_nothread(unsigned int irq) { irq_modify_status(irq, 0, IRQ_NOTHREAD); } static inline void irq_set_thread(unsigned int irq) { irq_modify_status(irq, IRQ_NOTHREAD, 0); } static inline void irq_set_nested_thread(unsigned int irq, bool nest) { if (nest) irq_set_status_flags(irq, IRQ_NESTED_THREAD); else irq_clear_status_flags(irq, IRQ_NESTED_THREAD); } static inline void irq_set_percpu_devid_flags(unsigned int irq) { irq_set_status_flags(irq, IRQ_NOAUTOEN | IRQ_PER_CPU | IRQ_NOTHREAD | IRQ_NOPROBE | IRQ_PER_CPU_DEVID); } /* Set/get chip/data for an IRQ: */ extern int irq_set_chip(unsigned int irq, const struct irq_chip *chip); extern int irq_set_handler_data(unsigned int irq, void *data); extern int irq_set_chip_data(unsigned int irq, void *data); extern int irq_set_irq_type(unsigned int irq, unsigned int type); extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry); extern int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset, struct msi_desc *entry); extern struct irq_data *irq_get_irq_data(unsigned int irq); static inline struct irq_chip *irq_get_chip(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? d->chip : NULL; } static inline struct irq_chip *irq_data_get_irq_chip(struct irq_data *d) { return d->chip; } static inline void *irq_get_chip_data(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? d->chip_data : NULL; } static inline void *irq_data_get_irq_chip_data(struct irq_data *d) { return d->chip_data; } static inline void *irq_get_handler_data(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? d->common->handler_data : NULL; } static inline void *irq_data_get_irq_handler_data(struct irq_data *d) { return d->common->handler_data; } static inline struct msi_desc *irq_get_msi_desc(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? 
d->common->msi_desc : NULL; } static inline struct msi_desc *irq_data_get_msi_desc(struct irq_data *d) { return d->common->msi_desc; } static inline u32 irq_get_trigger_type(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? irqd_get_trigger_type(d) : 0; } static inline int irq_common_data_get_node(struct irq_common_data *d) { #ifdef CONFIG_NUMA return d->node; #else return 0; #endif } static inline int irq_data_get_node(struct irq_data *d) { return irq_common_data_get_node(d->common); } static inline const struct cpumask *irq_data_get_affinity_mask(struct irq_data *d) { #ifdef CONFIG_SMP return d->common->affinity; #else return cpumask_of(0); #endif } static inline void irq_data_update_affinity(struct irq_data *d, const struct cpumask *m) { #ifdef CONFIG_SMP cpumask_copy(d->common->affinity, m); #endif } static inline const struct cpumask *irq_get_affinity_mask(int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? irq_data_get_affinity_mask(d) : NULL; } #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK static inline const struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d) { return d->common->effective_affinity; } static inline void irq_data_update_effective_affinity(struct irq_data *d, const struct cpumask *m) { cpumask_copy(d->common->effective_affinity, m); } #else static inline void irq_data_update_effective_affinity(struct irq_data *d, const struct cpumask *m) { } static inline const struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d) { return irq_data_get_affinity_mask(d); } #endif static inline const struct cpumask *irq_get_effective_affinity_mask(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); return d ? irq_data_get_effective_affinity_mask(d) : NULL; } unsigned int arch_dynirq_lower_bound(unsigned int from); int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, struct module *owner, const struct irq_affinity_desc *affinity); int __devm_irq_alloc_descs(struct device *dev, int irq, unsigned int from, unsigned int cnt, int node, struct module *owner, const struct irq_affinity_desc *affinity); /* use macros to avoid needing export.h for THIS_MODULE */ #define irq_alloc_descs(irq, from, cnt, node) \ __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE, NULL) #define irq_alloc_desc(node) \ irq_alloc_descs(-1, 1, 1, node) #define irq_alloc_desc_at(at, node) \ irq_alloc_descs(at, at, 1, node) #define irq_alloc_desc_from(from, node) \ irq_alloc_descs(-1, from, 1, node) #define irq_alloc_descs_from(from, cnt, node) \ irq_alloc_descs(-1, from, cnt, node) #define devm_irq_alloc_descs(dev, irq, from, cnt, node) \ __devm_irq_alloc_descs(dev, irq, from, cnt, node, THIS_MODULE, NULL) #define devm_irq_alloc_desc(dev, node) \ devm_irq_alloc_descs(dev, -1, 1, 1, node) #define devm_irq_alloc_desc_at(dev, at, node) \ devm_irq_alloc_descs(dev, at, at, 1, node) #define devm_irq_alloc_desc_from(dev, from, node) \ devm_irq_alloc_descs(dev, -1, from, 1, node) #define devm_irq_alloc_descs_from(dev, from, cnt, node) \ devm_irq_alloc_descs(dev, -1, from, cnt, node) void irq_free_descs(unsigned int irq, unsigned int cnt); static inline void irq_free_desc(unsigned int irq) { irq_free_descs(irq, 1); } #ifdef CONFIG_GENERIC_IRQ_LEGACY void irq_init_desc(unsigned int irq); #endif /** * struct irq_chip_regs - register offsets for struct irq_chip_generic * @enable: Enable register offset to reg_base * @disable: Disable register offset to reg_base * @mask: Mask register offset to reg_base * @ack: Ack register
offset to reg_base * @eoi: Eoi register offset to reg_base * @type: Type configuration register offset to reg_base */ struct irq_chip_regs { unsigned long enable; unsigned long disable; unsigned long mask; unsigned long ack; unsigned long eoi; unsigned long type; }; /** * struct irq_chip_type - Generic interrupt chip instance for a flow type * @chip: The real interrupt chip which provides the callbacks * @regs: Register offsets for this chip * @handler: Flow handler associated with this chip * @type: Chip can handle these flow types * @mask_cache_priv: Cached mask register private to the chip type * @mask_cache: Pointer to cached mask register * * An irq_chip_generic can have several instances of irq_chip_type when * it requires different functions and register offsets for different * flow types. */ struct irq_chip_type { struct irq_chip chip; struct irq_chip_regs regs; irq_flow_handler_t handler; u32 type; u32 mask_cache_priv; u32 *mask_cache; }; /** * struct irq_chip_generic - Generic irq chip data structure * @lock: Lock to protect register and cache data access * @reg_base: Register base address (virtual) * @reg_readl: Alternate I/O accessor (defaults to readl if NULL) * @reg_writel: Alternate I/O accessor (defaults to writel if NULL) * @suspend: Function called from core code on suspend once per * chip; can be useful instead of irq_chip::suspend to * handle chip details even when no interrupts are in use * @resume: Function called from core code on resume once per chip; * can be useful instead of irq_chip::resume to handle * chip details even when no interrupts are in use * @irq_base: Interrupt base nr for this chip * @irq_cnt: Number of interrupts handled by this chip * @mask_cache: Cached mask register shared between all chip types * @wake_enabled: Interrupt can wakeup from suspend * @wake_active: Interrupt is marked as a wakeup-from-suspend source * @num_ct: Number of available irq_chip_type instances (usually 1) * @private: Private data for non generic chip callbacks * @installed: bitfield to denote installed interrupts * @unused: bitfield to denote unused interrupts * @domain: irq domain pointer * @list: List head for keeping track of instances * @chip_types: Array of interrupt irq_chip_types * * Note that irq_chip_generic can have multiple irq_chip_type * implementations which can be associated with a particular irq line of * an irq_chip_generic instance. That allows sharing and protecting * state in an irq_chip_generic instance when we need to implement * different flow mechanisms (level/edge) for it. */ struct irq_chip_generic { raw_spinlock_t lock; void __iomem *reg_base; u32 (*reg_readl)(void __iomem *addr); void (*reg_writel)(u32 val, void __iomem *addr); void (*suspend)(struct irq_chip_generic *gc); void (*resume)(struct irq_chip_generic *gc); unsigned int irq_base; unsigned int irq_cnt; u32 mask_cache; u32 wake_enabled; u32 wake_active; unsigned int num_ct; void *private; unsigned long installed; unsigned long unused; struct irq_domain *domain; struct list_head list; struct irq_chip_type chip_types[]; }; /** * enum irq_gc_flags - Initialization flags for generic irq chips * @IRQ_GC_INIT_MASK_CACHE: Initialize the mask_cache by reading mask reg * @IRQ_GC_INIT_NESTED_LOCK: Set the lock class of the irqs to nested for * irq chips which need to call irq_set_wake() on * the parent irq.
Usually GPIO implementations * @IRQ_GC_MASK_CACHE_PER_TYPE: Mask cache is chip type private * @IRQ_GC_NO_MASK: Do not calculate irq_data->mask * @IRQ_GC_BE_IO: Use big-endian register accesses (default: LE) */ enum irq_gc_flags { IRQ_GC_INIT_MASK_CACHE = 1 << 0, IRQ_GC_INIT_NESTED_LOCK = 1 << 1, IRQ_GC_MASK_CACHE_PER_TYPE = 1 << 2, IRQ_GC_NO_MASK = 1 << 3, IRQ_GC_BE_IO = 1 << 4, }; /* * struct irq_domain_chip_generic - Generic irq chip data structure for irq domains * @irqs_per_chip: Number of interrupts per chip * @num_chips: Number of chips * @irq_flags_to_set: IRQ* flags to set on irq setup * @irq_flags_to_clear: IRQ* flags to clear on irq setup * @gc_flags: Generic chip specific setup flags * @exit: Function called on each chip when it is destroyed. * @gc: Array of pointers to generic interrupt chips */ struct irq_domain_chip_generic { unsigned int irqs_per_chip; unsigned int num_chips; unsigned int irq_flags_to_clear; unsigned int irq_flags_to_set; enum irq_gc_flags gc_flags; void (*exit)(struct irq_chip_generic *gc); struct irq_chip_generic *gc[]; }; /** * struct irq_domain_chip_generic_info - Generic chip information structure * @name: Name of the generic interrupt chip * @handler: Interrupt handler used by the generic interrupt chip * @irqs_per_chip: Number of interrupts each chip handles (max 32) * @num_ct: Number of irq_chip_type instances associated with each * chip * @irq_flags_to_clear: IRQ_* bits to clear in the mapping function * @irq_flags_to_set: IRQ_* bits to set in the mapping function * @gc_flags: Generic chip specific setup flags * @init: Function called on each chip when it is created. * Allows some additional chip initialisation. * @exit: Function called on each chip when it is destroyed. * Allows some chip cleanup operations.
*/ struct irq_domain_chip_generic_info { const char *name; irq_flow_handler_t handler; unsigned int irqs_per_chip; unsigned int num_ct; unsigned int irq_flags_to_clear; unsigned int irq_flags_to_set; enum irq_gc_flags gc_flags; int (*init)(struct irq_chip_generic *gc); void (*exit)(struct irq_chip_generic *gc); }; /* Generic chip callback functions */ void irq_gc_noop(struct irq_data *d); void irq_gc_mask_disable_reg(struct irq_data *d); void irq_gc_mask_set_bit(struct irq_data *d); void irq_gc_mask_clr_bit(struct irq_data *d); void irq_gc_unmask_enable_reg(struct irq_data *d); void irq_gc_ack_set_bit(struct irq_data *d); void irq_gc_ack_clr_bit(struct irq_data *d); void irq_gc_mask_disable_and_ack_set(struct irq_data *d); void irq_gc_eoi(struct irq_data *d); int irq_gc_set_wake(struct irq_data *d, unsigned int on); /* Setup functions for irq_chip_generic */ int irq_map_generic_chip(struct irq_domain *d, unsigned int virq, irq_hw_number_t hw_irq); void irq_unmap_generic_chip(struct irq_domain *d, unsigned int virq); struct irq_chip_generic * irq_alloc_generic_chip(const char *name, int nr_ct, unsigned int irq_base, void __iomem *reg_base, irq_flow_handler_t handler); void irq_setup_generic_chip(struct irq_chip_generic *gc, u32 msk, enum irq_gc_flags flags, unsigned int clr, unsigned int set); int irq_setup_alt_chip(struct irq_data *d, unsigned int type); void irq_remove_generic_chip(struct irq_chip_generic *gc, u32 msk, unsigned int clr, unsigned int set); struct irq_chip_generic * devm_irq_alloc_generic_chip(struct device *dev, const char *name, int num_ct, unsigned int irq_base, void __iomem *reg_base, irq_flow_handler_t handler); int devm_irq_setup_generic_chip(struct device *dev, struct irq_chip_generic *gc, u32 msk, enum irq_gc_flags flags, unsigned int clr, unsigned int set); struct irq_chip_generic *irq_get_domain_generic_chip(struct irq_domain *d, unsigned int hw_irq); #ifdef CONFIG_GENERIC_IRQ_CHIP int irq_domain_alloc_generic_chips(struct irq_domain *d, const struct irq_domain_chip_generic_info *info); void irq_domain_remove_generic_chips(struct irq_domain *d); #else static inline int irq_domain_alloc_generic_chips(struct irq_domain *d, const struct irq_domain_chip_generic_info *info) { return -EINVAL; } static inline void irq_domain_remove_generic_chips(struct irq_domain *d) { } #endif /* CONFIG_GENERIC_IRQ_CHIP */ int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip, int num_ct, const char *name, irq_flow_handler_t handler, unsigned int clr, unsigned int set, enum irq_gc_flags flags); #define irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, name, \ handler, clr, set, flags) \ ({ \ MAYBE_BUILD_BUG_ON(irqs_per_chip > 32); \ __irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, name,\ handler, clr, set, flags); \ }) static inline void irq_free_generic_chip(struct irq_chip_generic *gc) { kfree(gc); } static inline void irq_destroy_generic_chip(struct irq_chip_generic *gc, u32 msk, unsigned int clr, unsigned int set) { irq_remove_generic_chip(gc, msk, clr, set); irq_free_generic_chip(gc); } static inline struct irq_chip_type *irq_data_get_chip_type(struct irq_data *d) { return container_of(d->chip, struct irq_chip_type, chip); } #define IRQ_MSK(n) (u32)((n) < 32 ? 
((1 << (n)) - 1) : UINT_MAX) static inline void irq_reg_writel(struct irq_chip_generic *gc, u32 val, int reg_offset) { if (gc->reg_writel) gc->reg_writel(val, gc->reg_base + reg_offset); else writel(val, gc->reg_base + reg_offset); } static inline u32 irq_reg_readl(struct irq_chip_generic *gc, int reg_offset) { if (gc->reg_readl) return gc->reg_readl(gc->reg_base + reg_offset); else return readl(gc->reg_base + reg_offset); } struct irq_matrix; struct irq_matrix *irq_alloc_matrix(unsigned int matrix_bits, unsigned int alloc_start, unsigned int alloc_end); void irq_matrix_online(struct irq_matrix *m); void irq_matrix_offline(struct irq_matrix *m); void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit, bool replace); int irq_matrix_reserve_managed(struct irq_matrix *m, const struct cpumask *msk); void irq_matrix_remove_managed(struct irq_matrix *m, const struct cpumask *msk); int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk, unsigned int *mapped_cpu); void irq_matrix_reserve(struct irq_matrix *m); void irq_matrix_remove_reserved(struct irq_matrix *m); int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk, bool reserved, unsigned int *mapped_cpu); void irq_matrix_free(struct irq_matrix *m, unsigned int cpu, unsigned int bit, bool managed); void irq_matrix_assign(struct irq_matrix *m, unsigned int bit); unsigned int irq_matrix_available(struct irq_matrix *m, bool cpudown); unsigned int irq_matrix_allocated(struct irq_matrix *m); unsigned int irq_matrix_reserved(struct irq_matrix *m); void irq_matrix_debug_show(struct seq_file *sf, struct irq_matrix *m, int ind); /* Contrary to Linux irqs, for hardware irqs the irq number 0 is valid */ #define INVALID_HWIRQ (~0UL) irq_hw_number_t ipi_get_hwirq(unsigned int irq, unsigned int cpu); int __ipi_send_single(struct irq_desc *desc, unsigned int cpu); int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest); int ipi_send_single(unsigned int virq, unsigned int cpu); int ipi_send_mask(unsigned int virq, const struct cpumask *dest); void ipi_mux_process(void); int ipi_mux_create(unsigned int nr_ipi, void (*mux_send)(unsigned int cpu)); #ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER /* * Registers a generic IRQ handling function as the top-level IRQ handler in * the system, which is generally the first C code called from an assembly * architecture-specific interrupt handler. * * Returns 0 on success, or -EBUSY if an IRQ handler has already been * registered. */ int __init set_handle_irq(void (*handle_irq)(struct pt_regs *)); /* * Allows interrupt handlers to find the irqchip that's been registered as the * top-level IRQ handler. */ extern void (*handle_arch_irq)(struct pt_regs *) __ro_after_init; asmlinkage void generic_handle_arch_irq(struct pt_regs *regs); #else #ifndef set_handle_irq #define set_handle_irq(handle_irq) \ do { \ (void)handle_irq; \ WARN_ON(1); \ } while (0) #endif #endif #endif /* _LINUX_IRQ_H */
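For illustration, here is a minimal sketch of how a driver might wire together the generic-chip helpers declared in the header above. It assumes a hypothetical memory-mapped controller with 32 level-triggered interrupts; the MY_INTC_* offsets and my_intc_setup() are invented for this example, while the helper calls follow the signatures declared in this header.

#include <linux/errno.h>
#include <linux/irq.h>

/* Hypothetical register offsets relative to gc->reg_base. */
#define MY_INTC_MASK	0x04
#define MY_INTC_ACK	0x08

static int my_intc_setup(void __iomem *reg_base, unsigned int irq_base)
{
	struct irq_chip_generic *gc;
	struct irq_chip_type *ct;

	/* One irq_chip_type instance, 32 Linux irqs starting at irq_base. */
	gc = irq_alloc_generic_chip("my-intc", 1, irq_base, reg_base,
				    handle_level_irq);
	if (!gc)
		return -ENOMEM;

	ct = gc->chip_types;
	ct->regs.mask = MY_INTC_MASK;
	ct->regs.ack = MY_INTC_ACK;
	ct->chip.irq_mask = irq_gc_mask_set_bit;	/* set bit == masked */
	ct->chip.irq_unmask = irq_gc_mask_clr_bit;
	ct->chip.irq_ack = irq_gc_ack_set_bit;

	/* All 32 irqs are valid; prime mask_cache from the hardware. */
	irq_setup_generic_chip(gc, IRQ_MSK(32), IRQ_GC_INIT_MASK_CACHE,
			       IRQ_NOREQUEST, 0);
	return 0;
}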
// SPDX-License-Identifier: GPL-2.0-or-later /* * ChaCha and HChaCha functions (x86_64 optimized) * * Copyright (C) 2015 Martin Willi */ #include <asm/simd.h> #include <crypto/chacha.h> #include <linux/jump_label.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/sizes.h> asmlinkage void chacha_block_xor_ssse3(const struct chacha_state *state, u8 *dst, const u8 *src, unsigned int len, int nrounds); asmlinkage void chacha_4block_xor_ssse3(const struct chacha_state *state, u8 *dst, const u8 *src, unsigned int len, int nrounds); asmlinkage void hchacha_block_ssse3(const struct chacha_state *state, u32 out[HCHACHA_OUT_WORDS], int nrounds); asmlinkage void chacha_2block_xor_avx2(const struct chacha_state *state, u8 *dst, const u8 *src, unsigned int len, int nrounds); asmlinkage void chacha_4block_xor_avx2(const struct chacha_state *state, u8 *dst, const u8 *src, unsigned int len, int nrounds); asmlinkage void chacha_8block_xor_avx2(const struct chacha_state *state, u8 *dst, const u8 *src, unsigned int len, int nrounds); asmlinkage void chacha_2block_xor_avx512vl(const struct chacha_state *state, u8 *dst, const u8 *src, unsigned int len, int nrounds); asmlinkage void chacha_4block_xor_avx512vl(const struct chacha_state *state, u8 *dst, const u8 *src, unsigned int len, int nrounds); asmlinkage void chacha_8block_xor_avx512vl(const struct chacha_state *state, u8 *dst, const u8 *src, unsigned int len, int nrounds); static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_simd); static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_avx2); static __ro_after_init DEFINE_STATIC_KEY_FALSE(chacha_use_avx512vl); static unsigned int chacha_advance(unsigned int len, unsigned int maxblocks) { len = min(len, maxblocks * CHACHA_BLOCK_SIZE); return round_up(len, CHACHA_BLOCK_SIZE) / CHACHA_BLOCK_SIZE; } static void chacha_dosimd(struct chacha_state *state, u8 *dst, const u8 *src, unsigned int bytes, int nrounds) { if (static_branch_likely(&chacha_use_avx512vl)) { while (bytes >= CHACHA_BLOCK_SIZE * 8) { chacha_8block_xor_avx512vl(state, dst, src, bytes, nrounds); bytes -= CHACHA_BLOCK_SIZE * 8; src += CHACHA_BLOCK_SIZE * 8; dst += CHACHA_BLOCK_SIZE * 8; state->x[12] += 8; } if (bytes > CHACHA_BLOCK_SIZE * 4) { chacha_8block_xor_avx512vl(state, dst, src, bytes, nrounds); state->x[12] += chacha_advance(bytes, 8); return; } if (bytes > CHACHA_BLOCK_SIZE * 2) { chacha_4block_xor_avx512vl(state, dst, src, bytes, nrounds); state->x[12] += chacha_advance(bytes, 4); return; } if (bytes) { chacha_2block_xor_avx512vl(state, dst, src, bytes, nrounds); state->x[12] += chacha_advance(bytes, 2); return; } } if (static_branch_likely(&chacha_use_avx2)) { while (bytes >= CHACHA_BLOCK_SIZE * 8) { chacha_8block_xor_avx2(state, dst, src, bytes, nrounds); bytes -= CHACHA_BLOCK_SIZE * 8; src +=
CHACHA_BLOCK_SIZE * 8; dst += CHACHA_BLOCK_SIZE * 8; state->x[12] += 8; } if (bytes > CHACHA_BLOCK_SIZE * 4) { chacha_8block_xor_avx2(state, dst, src, bytes, nrounds); state->x[12] += chacha_advance(bytes, 8); return; } if (bytes > CHACHA_BLOCK_SIZE * 2) { chacha_4block_xor_avx2(state, dst, src, bytes, nrounds); state->x[12] += chacha_advance(bytes, 4); return; } if (bytes > CHACHA_BLOCK_SIZE) { chacha_2block_xor_avx2(state, dst, src, bytes, nrounds); state->x[12] += chacha_advance(bytes, 2); return; } } while (bytes >= CHACHA_BLOCK_SIZE * 4) { chacha_4block_xor_ssse3(state, dst, src, bytes, nrounds); bytes -= CHACHA_BLOCK_SIZE * 4; src += CHACHA_BLOCK_SIZE * 4; dst += CHACHA_BLOCK_SIZE * 4; state->x[12] += 4; } if (bytes > CHACHA_BLOCK_SIZE) { chacha_4block_xor_ssse3(state, dst, src, bytes, nrounds); state->x[12] += chacha_advance(bytes, 4); return; } if (bytes) { chacha_block_xor_ssse3(state, dst, src, bytes, nrounds); state->x[12]++; } } void hchacha_block_arch(const struct chacha_state *state, u32 out[HCHACHA_OUT_WORDS], int nrounds) { if (!static_branch_likely(&chacha_use_simd)) { hchacha_block_generic(state, out, nrounds); } else { kernel_fpu_begin(); hchacha_block_ssse3(state, out, nrounds); kernel_fpu_end(); } } EXPORT_SYMBOL(hchacha_block_arch); void chacha_crypt_arch(struct chacha_state *state, u8 *dst, const u8 *src, unsigned int bytes, int nrounds) { if (!static_branch_likely(&chacha_use_simd) || bytes <= CHACHA_BLOCK_SIZE) return chacha_crypt_generic(state, dst, src, bytes, nrounds); do { unsigned int todo = min_t(unsigned int, bytes, SZ_4K); kernel_fpu_begin(); chacha_dosimd(state, dst, src, todo, nrounds); kernel_fpu_end(); bytes -= todo; src += todo; dst += todo; } while (bytes); } EXPORT_SYMBOL(chacha_crypt_arch); bool chacha_is_arch_optimized(void) { return static_key_enabled(&chacha_use_simd); } EXPORT_SYMBOL(chacha_is_arch_optimized); static int __init chacha_simd_mod_init(void) { if (!boot_cpu_has(X86_FEATURE_SSSE3)) return 0; static_branch_enable(&chacha_use_simd); if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_AVX2) && cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) { static_branch_enable(&chacha_use_avx2); if (boot_cpu_has(X86_FEATURE_AVX512VL) && boot_cpu_has(X86_FEATURE_AVX512BW)) /* kmovq */ static_branch_enable(&chacha_use_avx512vl); } return 0; } subsys_initcall(chacha_simd_mod_init); static void __exit chacha_simd_mod_exit(void) { } module_exit(chacha_simd_mod_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Martin Willi <martin@strongswan.org>"); MODULE_DESCRIPTION("ChaCha and HChaCha functions (x86_64 optimized)");
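The file above only supplies the x86 backends; callers normally reach them through the ChaCha library interface in <crypto/chacha.h>. As an illustrative sketch of such a caller (demo_chacha20() is invented for this example; chacha_init() and chacha_crypt() are the library entry points, and chacha_crypt() dispatches to chacha_crypt_arch() above when an arch implementation is available):

#include <crypto/chacha.h>
#include <linux/string.h>

static void demo_chacha20(u8 *dst, const u8 *src, unsigned int len,
			  const u32 key[CHACHA_KEY_SIZE / sizeof(u32)],
			  const u8 iv[CHACHA_IV_SIZE])
{
	struct chacha_state state;

	/* Expand the key and nonce into the 16-word ChaCha state. */
	chacha_init(&state, key, iv);
	/* 20 rounds; lands in chacha_dosimd() when SSSE3+ is usable. */
	chacha_crypt(&state, dst, src, len, 20);
	/* Wipe key material from the stack. */
	memzero_explicit(&state, sizeof(state));
}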
// SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 2012 Red Hat * * based in parts on udlfb.c: * Copyright (C) 2009 Roberto De Ioris <roberto@unbit.it> * Copyright (C) 2009 Jaya Kumar <jayakumar.lkml@gmail.com> * Copyright (C) 2009 Bernie Thompson <bernie@plugable.com> */ #include <linux/bitfield.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_crtc_helper.h> #include <drm/drm_damage_helper.h> #include <drm/drm_drv.h> #include <drm/drm_edid.h> #include <drm/drm_fourcc.h> #include <drm/drm_gem_atomic_helper.h> #include <drm/drm_gem_framebuffer_helper.h> #include <drm/drm_gem_shmem_helper.h> #include <drm/drm_modeset_helper_vtables.h> #include <drm/drm_probe_helper.h> #include <drm/drm_vblank.h> #include "udl_drv.h" #include "udl_edid.h" #include "udl_proto.h" /* * All DisplayLink bulk operations start with 0xaf (UDL_MSG_BULK), followed by * a specific command code. All operations are written to a command buffer, which * the driver sends to the device.
*/ static char *udl_set_register(char *buf, u8 reg, u8 val) { *buf++ = UDL_MSG_BULK; *buf++ = UDL_CMD_WRITEREG; *buf++ = reg; *buf++ = val; return buf; } static char *udl_vidreg_lock(char *buf) { return udl_set_register(buf, UDL_REG_VIDREG, UDL_VIDREG_LOCK); } static char *udl_vidreg_unlock(char *buf) { return udl_set_register(buf, UDL_REG_VIDREG, UDL_VIDREG_UNLOCK); } static char *udl_set_blank_mode(char *buf, u8 mode) { return udl_set_register(buf, UDL_REG_BLANKMODE, mode); } static char *udl_set_color_depth(char *buf, u8 selection) { return udl_set_register(buf, UDL_REG_COLORDEPTH, selection); } static char *udl_set_base16bpp(char *buf, u32 base) { /* the base pointer is 24 bits wide, 0x20 is hi byte. */ u8 reg20 = FIELD_GET(UDL_BASE_ADDR2_MASK, base); u8 reg21 = FIELD_GET(UDL_BASE_ADDR1_MASK, base); u8 reg22 = FIELD_GET(UDL_BASE_ADDR0_MASK, base); buf = udl_set_register(buf, UDL_REG_BASE16BPP_ADDR2, reg20); buf = udl_set_register(buf, UDL_REG_BASE16BPP_ADDR1, reg21); buf = udl_set_register(buf, UDL_REG_BASE16BPP_ADDR0, reg22); return buf; } /* * DisplayLink HW has separate 16bpp and 8bpp framebuffers. * In 24bpp modes, the low 3:2:3 RGB bits go in the 8bpp framebuffer */ static char *udl_set_base8bpp(char *buf, u32 base) { /* the base pointer is 24 bits wide, 0x26 is hi byte. */ u8 reg26 = FIELD_GET(UDL_BASE_ADDR2_MASK, base); u8 reg27 = FIELD_GET(UDL_BASE_ADDR1_MASK, base); u8 reg28 = FIELD_GET(UDL_BASE_ADDR0_MASK, base); buf = udl_set_register(buf, UDL_REG_BASE8BPP_ADDR2, reg26); buf = udl_set_register(buf, UDL_REG_BASE8BPP_ADDR1, reg27); buf = udl_set_register(buf, UDL_REG_BASE8BPP_ADDR0, reg28); return buf; } static char *udl_set_register_16(char *wrptr, u8 reg, u16 value) { wrptr = udl_set_register(wrptr, reg, value >> 8); return udl_set_register(wrptr, reg+1, value); } /* * This is kind of weird because the controller takes some * register values in a different byte order than other registers. */ static char *udl_set_register_16be(char *wrptr, u8 reg, u16 value) { wrptr = udl_set_register(wrptr, reg, value); return udl_set_register(wrptr, reg+1, value >> 8); } /* * LFSR is linear feedback shift register. The reason we have this is * because the display controller needs to minimize the clock depth of * various counters used in the display path. So this code reverses the * provided value into the lfsr16 value by counting backwards to get * the value that needs to be set in the hardware comparator to get the * same actual count. This makes sense once you read the above a couple * of times and think about it from a hardware perspective. */ static u16 udl_lfsr16(u16 actual_count) { u32 lv = 0xFFFF; /* This is the lfsr value that the hw starts with */ while (actual_count--) { lv = ((lv << 1) | (((lv >> 15) ^ (lv >> 4) ^ (lv >> 2) ^ (lv >> 1)) & 1)) & 0xFFFF; } return (u16) lv; } /* * This does LFSR conversion on the value that is to be written. * See LFSR explanation above for more detail. */ static char *udl_set_register_lfsr16(char *wrptr, u8 reg, u16 value) { return udl_set_register_16(wrptr, reg, udl_lfsr16(value)); } /* * Takes a DRM display mode and converts it into the DisplayLink * equivalent register commands.
*/ static char *udl_set_display_mode(char *buf, struct drm_display_mode *mode) { u16 reg01 = mode->crtc_htotal - mode->crtc_hsync_start; u16 reg03 = reg01 + mode->crtc_hdisplay; u16 reg05 = mode->crtc_vtotal - mode->crtc_vsync_start; u16 reg07 = reg05 + mode->crtc_vdisplay; u16 reg09 = mode->crtc_htotal - 1; u16 reg0b = 1; /* libdlo hardcodes hsync start to 1 */ u16 reg0d = mode->crtc_hsync_end - mode->crtc_hsync_start + 1; u16 reg0f = mode->hdisplay; u16 reg11 = mode->crtc_vtotal; u16 reg13 = 0; /* libdlo hardcodes vsync start to 0 */ u16 reg15 = mode->crtc_vsync_end - mode->crtc_vsync_start; u16 reg17 = mode->crtc_vdisplay; u16 reg1b = mode->clock / 5; buf = udl_set_register_lfsr16(buf, UDL_REG_XDISPLAYSTART, reg01); buf = udl_set_register_lfsr16(buf, UDL_REG_XDISPLAYEND, reg03); buf = udl_set_register_lfsr16(buf, UDL_REG_YDISPLAYSTART, reg05); buf = udl_set_register_lfsr16(buf, UDL_REG_YDISPLAYEND, reg07); buf = udl_set_register_lfsr16(buf, UDL_REG_XENDCOUNT, reg09); buf = udl_set_register_lfsr16(buf, UDL_REG_HSYNCSTART, reg0b); buf = udl_set_register_lfsr16(buf, UDL_REG_HSYNCEND, reg0d); buf = udl_set_register_16(buf, UDL_REG_HPIXELS, reg0f); buf = udl_set_register_lfsr16(buf, UDL_REG_YENDCOUNT, reg11); buf = udl_set_register_lfsr16(buf, UDL_REG_VSYNCSTART, reg13); buf = udl_set_register_lfsr16(buf, UDL_REG_VSYNCEND, reg15); buf = udl_set_register_16(buf, UDL_REG_VPIXELS, reg17); buf = udl_set_register_16be(buf, UDL_REG_PIXELCLOCK5KHZ, reg1b); return buf; } static char *udl_dummy_render(char *wrptr) { *wrptr++ = UDL_MSG_BULK; *wrptr++ = UDL_CMD_WRITECOPY16; *wrptr++ = 0x00; /* from addr */ *wrptr++ = 0x00; *wrptr++ = 0x00; *wrptr++ = 0x01; /* one pixel */ *wrptr++ = 0x00; /* to address */ *wrptr++ = 0x00; *wrptr++ = 0x00; return wrptr; } static long udl_log_cpp(unsigned int cpp) { if (WARN_ON(!is_power_of_2(cpp))) return -EINVAL; return __ffs(cpp); } static int udl_handle_damage(struct drm_framebuffer *fb, const struct iosys_map *map, const struct drm_rect *clip) { struct drm_device *dev = fb->dev; struct udl_device *udl = to_udl(dev); void *vaddr = map->vaddr; /* TODO: Use mapping abstraction properly */ int i, ret; char *cmd; struct urb *urb; int log_bpp; ret = udl_log_cpp(fb->format->cpp[0]); if (ret < 0) return ret; log_bpp = ret; urb = udl_get_urb(udl); if (!urb) return -ENOMEM; cmd = urb->transfer_buffer; for (i = clip->y1; i < clip->y2; i++) { const int line_offset = fb->pitches[0] * i; const int byte_offset = line_offset + (clip->x1 << log_bpp); const int dev_byte_offset = (fb->width * i + clip->x1) << log_bpp; const int byte_width = drm_rect_width(clip) << log_bpp; ret = udl_render_hline(udl, log_bpp, &urb, (char *)vaddr, &cmd, byte_offset, dev_byte_offset, byte_width); if (ret) return ret; } if (cmd > (char *)urb->transfer_buffer) { /* Send partial buffer remaining before exiting */ int len; if (cmd < (char *)urb->transfer_buffer + urb->transfer_buffer_length) *cmd++ = UDL_MSG_BULK; len = cmd - (char *)urb->transfer_buffer; ret = udl_submit_urb(udl, urb, len); } else { udl_urb_completion(urb); } return 0; } /* * Primary plane */ static const uint32_t udl_primary_plane_formats[] = { DRM_FORMAT_RGB565, DRM_FORMAT_XRGB8888, }; static const uint64_t udl_primary_plane_fmtmods[] = { DRM_FORMAT_MOD_LINEAR, DRM_FORMAT_MOD_INVALID }; static int udl_primary_plane_helper_atomic_check(struct drm_plane *plane, struct drm_atomic_state *state) { struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane); struct drm_crtc *new_crtc = new_plane_state->crtc; struct 
drm_crtc_state *new_crtc_state = NULL; if (new_crtc) new_crtc_state = drm_atomic_get_new_crtc_state(state, new_crtc); return drm_atomic_helper_check_plane_state(new_plane_state, new_crtc_state, DRM_PLANE_NO_SCALING, DRM_PLANE_NO_SCALING, false, false); } static void udl_primary_plane_helper_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct drm_device *dev = plane->dev; struct drm_plane_state *plane_state = drm_atomic_get_new_plane_state(state, plane); struct drm_shadow_plane_state *shadow_plane_state = to_drm_shadow_plane_state(plane_state); struct drm_framebuffer *fb = plane_state->fb; struct drm_plane_state *old_plane_state = drm_atomic_get_old_plane_state(state, plane); struct drm_atomic_helper_damage_iter iter; struct drm_rect damage; int ret, idx; if (!fb) return; /* no framebuffer; plane is disabled */ ret = drm_gem_fb_begin_cpu_access(fb, DMA_FROM_DEVICE); if (ret) return; if (!drm_dev_enter(dev, &idx)) goto out_drm_gem_fb_end_cpu_access; drm_atomic_helper_damage_iter_init(&iter, old_plane_state, plane_state); drm_atomic_for_each_plane_damage(&iter, &damage) { udl_handle_damage(fb, &shadow_plane_state->data[0], &damage); } drm_dev_exit(idx); out_drm_gem_fb_end_cpu_access: drm_gem_fb_end_cpu_access(fb, DMA_FROM_DEVICE); } static const struct drm_plane_helper_funcs udl_primary_plane_helper_funcs = { DRM_GEM_SHADOW_PLANE_HELPER_FUNCS, .atomic_check = udl_primary_plane_helper_atomic_check, .atomic_update = udl_primary_plane_helper_atomic_update, }; static const struct drm_plane_funcs udl_primary_plane_funcs = { .update_plane = drm_atomic_helper_update_plane, .disable_plane = drm_atomic_helper_disable_plane, .destroy = drm_plane_cleanup, DRM_GEM_SHADOW_PLANE_FUNCS, }; /* * CRTC */ static void udl_crtc_helper_atomic_enable(struct drm_crtc *crtc, struct drm_atomic_state *state) { struct drm_device *dev = crtc->dev; struct udl_device *udl = to_udl(dev); struct drm_crtc_state *crtc_state = drm_atomic_get_new_crtc_state(state, crtc); struct drm_display_mode *mode = &crtc_state->mode; struct urb *urb; char *buf; int idx; if (!drm_dev_enter(dev, &idx)) return; urb = udl_get_urb(udl); if (!urb) goto out; buf = (char *)urb->transfer_buffer; buf = udl_vidreg_lock(buf); buf = udl_set_color_depth(buf, UDL_COLORDEPTH_16BPP); /* set base for 16bpp segment to 0 */ buf = udl_set_base16bpp(buf, 0); /* set base for 8bpp segment to end of fb */ buf = udl_set_base8bpp(buf, 2 * mode->vdisplay * mode->hdisplay); buf = udl_set_display_mode(buf, mode); buf = udl_set_blank_mode(buf, UDL_BLANKMODE_ON); buf = udl_vidreg_unlock(buf); buf = udl_dummy_render(buf); udl_submit_urb(udl, urb, buf - (char *)urb->transfer_buffer); out: drm_dev_exit(idx); } static void udl_crtc_helper_atomic_disable(struct drm_crtc *crtc, struct drm_atomic_state *state) { struct drm_device *dev = crtc->dev; struct udl_device *udl = to_udl(dev); struct urb *urb; char *buf; int idx; if (!drm_dev_enter(dev, &idx)) return; urb = udl_get_urb(udl); if (!urb) goto out; buf = (char *)urb->transfer_buffer; buf = udl_vidreg_lock(buf); buf = udl_set_blank_mode(buf, UDL_BLANKMODE_POWERDOWN); buf = udl_vidreg_unlock(buf); buf = udl_dummy_render(buf); udl_submit_urb(udl, urb, buf - (char *)urb->transfer_buffer); out: drm_dev_exit(idx); } static const struct drm_crtc_helper_funcs udl_crtc_helper_funcs = { .atomic_check = drm_crtc_helper_atomic_check, .atomic_enable = udl_crtc_helper_atomic_enable, .atomic_disable = udl_crtc_helper_atomic_disable, }; static const struct drm_crtc_funcs udl_crtc_funcs = { .reset = 
drm_atomic_helper_crtc_reset, .destroy = drm_crtc_cleanup, .set_config = drm_atomic_helper_set_config, .page_flip = drm_atomic_helper_page_flip, .atomic_duplicate_state = drm_atomic_helper_crtc_duplicate_state, .atomic_destroy_state = drm_atomic_helper_crtc_destroy_state, }; /* * Encoder */ static const struct drm_encoder_funcs udl_encoder_funcs = { .destroy = drm_encoder_cleanup, }; /* * Connector */ static int udl_connector_helper_get_modes(struct drm_connector *connector) { const struct drm_edid *drm_edid; int count; drm_edid = udl_edid_read(connector); drm_edid_connector_update(connector, drm_edid); count = drm_edid_connector_add_modes(connector); drm_edid_free(drm_edid); return count; } static int udl_connector_helper_detect_ctx(struct drm_connector *connector, struct drm_modeset_acquire_ctx *ctx, bool force) { struct udl_device *udl = to_udl(connector->dev); if (udl_probe_edid(udl)) return connector_status_connected; return connector_status_disconnected; } static const struct drm_connector_helper_funcs udl_connector_helper_funcs = { .get_modes = udl_connector_helper_get_modes, .detect_ctx = udl_connector_helper_detect_ctx, }; static const struct drm_connector_funcs udl_connector_funcs = { .reset = drm_atomic_helper_connector_reset, .fill_modes = drm_helper_probe_single_connector_modes, .destroy = drm_connector_cleanup, .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state, .atomic_destroy_state = drm_atomic_helper_connector_destroy_state, }; /* * Modesetting */ static enum drm_mode_status udl_mode_config_mode_valid(struct drm_device *dev, const struct drm_display_mode *mode) { struct udl_device *udl = to_udl(dev); if (udl->sku_pixel_limit) { if (mode->vdisplay * mode->hdisplay > udl->sku_pixel_limit) return MODE_MEM; } return MODE_OK; } static const struct drm_mode_config_funcs udl_mode_config_funcs = { .fb_create = drm_gem_fb_create_with_dirty, .mode_valid = udl_mode_config_mode_valid, .atomic_check = drm_atomic_helper_check, .atomic_commit = drm_atomic_helper_commit, }; int udl_modeset_init(struct udl_device *udl) { struct drm_device *dev = &udl->drm; struct drm_plane *primary_plane; struct drm_crtc *crtc; struct drm_encoder *encoder; struct drm_connector *connector; int ret; ret = drmm_mode_config_init(dev); if (ret) return ret; dev->mode_config.min_width = 640; dev->mode_config.min_height = 480; dev->mode_config.max_width = 2048; dev->mode_config.max_height = 2048; dev->mode_config.preferred_depth = 16; dev->mode_config.funcs = &udl_mode_config_funcs; primary_plane = &udl->primary_plane; ret = drm_universal_plane_init(dev, primary_plane, 0, &udl_primary_plane_funcs, udl_primary_plane_formats, ARRAY_SIZE(udl_primary_plane_formats), udl_primary_plane_fmtmods, DRM_PLANE_TYPE_PRIMARY, NULL); if (ret) return ret; drm_plane_helper_add(primary_plane, &udl_primary_plane_helper_funcs); drm_plane_enable_fb_damage_clips(primary_plane); crtc = &udl->crtc; ret = drm_crtc_init_with_planes(dev, crtc, primary_plane, NULL, &udl_crtc_funcs, NULL); if (ret) return ret; drm_crtc_helper_add(crtc, &udl_crtc_helper_funcs); encoder = &udl->encoder; ret = drm_encoder_init(dev, encoder, &udl_encoder_funcs, DRM_MODE_ENCODER_DAC, NULL); if (ret) return ret; encoder->possible_crtcs = drm_crtc_mask(crtc); connector = &udl->connector; ret = drm_connector_init(dev, connector, &udl_connector_funcs, DRM_MODE_CONNECTOR_VGA); if (ret) return ret; drm_connector_helper_add(connector, &udl_connector_helper_funcs); connector->polled = DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT; ret = 
drm_connector_attach_encoder(connector, encoder); if (ret) return ret; drm_mode_config_reset(dev); drmm_kms_helper_poll_init(dev); return 0; }
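The enable and disable paths above build the same kind of bulk command sequence: lock the video registers, write register values, unlock, then pad with a dummy render. As an illustrative sketch of that bracket (demo_blank_sequence() is invented for this example; the udl_* helpers are the static functions defined earlier in this file, and on real hardware the buffer comes from udl_get_urb()):

static int demo_blank_sequence(char *buf)
{
	char *end = buf;

	/* Each udl_set_register() emits UDL_MSG_BULK, UDL_CMD_WRITEREG, reg, val. */
	end = udl_vidreg_lock(end);
	end = udl_set_blank_mode(end, UDL_BLANKMODE_POWERDOWN);
	end = udl_vidreg_unlock(end);
	end = udl_dummy_render(end);

	/* Number of command bytes to pass to udl_submit_urb(). */
	return end - buf;
}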
// SPDX-License-Identifier: GPL-2.0-or-later /* * Copyright (c) 2013 Andrew Duggan <aduggan@synaptics.com> * Copyright (c) 2013 Synaptics Incorporated * Copyright (c) 2014 Benjamin Tissoires <benjamin.tissoires@gmail.com> * Copyright (c) 2014 Red Hat, Inc */ #include <linux/kernel.h> #include <linux/hid.h> #include <linux/input.h> #include <linux/input/mt.h> #include <linux/irq.h> #include <linux/irqdomain.h> #include <linux/module.h> #include <linux/pm.h> #include <linux/slab.h> #include
<linux/wait.h> #include <linux/sched.h> #include <linux/rmi.h> #include "hid-ids.h" #define RMI_MOUSE_REPORT_ID 0x01 /* Mouse emulation Report */ #define RMI_WRITE_REPORT_ID 0x09 /* Output Report */ #define RMI_READ_ADDR_REPORT_ID 0x0a /* Output Report */ #define RMI_READ_DATA_REPORT_ID 0x0b /* Input Report */ #define RMI_ATTN_REPORT_ID 0x0c /* Input Report */ #define RMI_SET_RMI_MODE_REPORT_ID 0x0f /* Feature Report */ /* flags */ #define RMI_READ_REQUEST_PENDING 0 #define RMI_READ_DATA_PENDING 1 #define RMI_STARTED 2 /* device flags */ #define RMI_DEVICE BIT(0) #define RMI_DEVICE_HAS_PHYS_BUTTONS BIT(1) #define RMI_DEVICE_OUTPUT_SET_REPORT BIT(2) /* * retrieve the ctrl registers * the ctrl register has a size of 20 but a fw bug split it into 16 + 4, * and there is no way to know if the first 20 bytes are here or not. * We use only the first 12 bytes, so read only those. */ #define RMI_F11_CTRL_REG_COUNT 12 enum rmi_mode_type { RMI_MODE_OFF = 0, RMI_MODE_ATTN_REPORTS = 1, RMI_MODE_NO_PACKED_ATTN_REPORTS = 2, }; /** * struct rmi_data - stores information for hid communication * * @page_mutex: Locks current page to avoid changing pages in unexpected ways. * @page: Keeps track of the current virtual page * @xport: transport device to be registered with the RMI4 core. * * @wait: Used for waiting for read data * * @writeReport: output buffer when writing RMI registers * @readReport: input buffer when reading RMI registers * * @input_report_size: size of an input report (advertised by HID) * @output_report_size: size of an output report (advertised by HID) * * @flags: flags for the current device (started, reading, etc...) * * @reset_work: worker which will be called in case of a mouse report * @hdev: pointer to the struct hid_device * * @device_flags: flags which describe the device * * @domain: the IRQ domain allocated for this RMI4 device * @rmi_irq: the irq that will be used to generate events to rmi-core */ struct rmi_data { struct mutex page_mutex; int page; struct rmi_transport_dev xport; wait_queue_head_t wait; u8 *writeReport; u8 *readReport; u32 input_report_size; u32 output_report_size; unsigned long flags; struct work_struct reset_work; struct hid_device *hdev; unsigned long device_flags; struct irq_domain *domain; int rmi_irq; }; #define RMI_PAGE(addr) (((addr) >> 8) & 0xff) static int rmi_write_report(struct hid_device *hdev, u8 *report, int len); /** * rmi_set_page - Set RMI page * @hdev: The pointer to the hid_device struct * @page: The new page address. * * RMI devices have 16-bit addressing, but some of the physical * implementations (like SMBus) only have 8-bit addressing. So RMI implements * a page address at 0xff of every page so we can reliably switch pages * every 256 registers. * * The page_mutex lock must be held when this function is entered. * * Returns zero on success, non-zero on failure.
*/ static int rmi_set_page(struct hid_device *hdev, u8 page) { struct rmi_data *data = hid_get_drvdata(hdev); int retval; data->writeReport[0] = RMI_WRITE_REPORT_ID; data->writeReport[1] = 1; data->writeReport[2] = 0xFF; data->writeReport[4] = page; retval = rmi_write_report(hdev, data->writeReport, data->output_report_size); if (retval != data->output_report_size) { dev_err(&hdev->dev, "%s: set page failed: %d.", __func__, retval); return retval; } data->page = page; return 0; } static int rmi_set_mode(struct hid_device *hdev, u8 mode) { int ret; const u8 txbuf[2] = {RMI_SET_RMI_MODE_REPORT_ID, mode}; u8 *buf; buf = kmemdup(txbuf, sizeof(txbuf), GFP_KERNEL); if (!buf) return -ENOMEM; ret = hid_hw_raw_request(hdev, RMI_SET_RMI_MODE_REPORT_ID, buf, sizeof(txbuf), HID_FEATURE_REPORT, HID_REQ_SET_REPORT); kfree(buf); if (ret < 0) { dev_err(&hdev->dev, "unable to set rmi mode to %d (%d)\n", mode, ret); return ret; } return 0; } static int rmi_write_report(struct hid_device *hdev, u8 *report, int len) { struct rmi_data *data = hid_get_drvdata(hdev); int ret; if (data->device_flags & RMI_DEVICE_OUTPUT_SET_REPORT) { /* * Talk to device by using SET_REPORT requests instead. */ ret = hid_hw_raw_request(hdev, report[0], report, len, HID_OUTPUT_REPORT, HID_REQ_SET_REPORT); } else { ret = hid_hw_output_report(hdev, (void *)report, len); } if (ret < 0) { dev_err(&hdev->dev, "failed to write hid report (%d)\n", ret); return ret; } return ret; } static int rmi_hid_read_block(struct rmi_transport_dev *xport, u16 addr, void *buf, size_t len) { struct rmi_data *data = container_of(xport, struct rmi_data, xport); struct hid_device *hdev = data->hdev; int ret; int bytes_read; int bytes_needed; int retries; int read_input_count; mutex_lock(&data->page_mutex); if (RMI_PAGE(addr) != data->page) { ret = rmi_set_page(hdev, RMI_PAGE(addr)); if (ret < 0) goto exit; } for (retries = 5; retries > 0; retries--) { data->writeReport[0] = RMI_READ_ADDR_REPORT_ID; data->writeReport[1] = 0; /* old 1 byte read count */ data->writeReport[2] = addr & 0xFF; data->writeReport[3] = (addr >> 8) & 0xFF; data->writeReport[4] = len & 0xFF; data->writeReport[5] = (len >> 8) & 0xFF; set_bit(RMI_READ_REQUEST_PENDING, &data->flags); ret = rmi_write_report(hdev, data->writeReport, data->output_report_size); if (ret != data->output_report_size) { dev_err(&hdev->dev, "failed to write request output report (%d)\n", ret); goto exit; } bytes_read = 0; bytes_needed = len; while (bytes_read < len) { if (!wait_event_timeout(data->wait, test_bit(RMI_READ_DATA_PENDING, &data->flags), msecs_to_jiffies(1000))) { hid_warn(hdev, "%s: timeout elapsed\n", __func__); ret = -EAGAIN; break; } read_input_count = data->readReport[1]; memcpy(buf + bytes_read, &data->readReport[2], min(read_input_count, bytes_needed)); bytes_read += read_input_count; bytes_needed -= read_input_count; clear_bit(RMI_READ_DATA_PENDING, &data->flags); } if (ret >= 0) { ret = 0; break; } } exit: clear_bit(RMI_READ_REQUEST_PENDING, &data->flags); mutex_unlock(&data->page_mutex); return ret; } static int rmi_hid_write_block(struct rmi_transport_dev *xport, u16 addr, const void *buf, size_t len) { struct rmi_data *data = container_of(xport, struct rmi_data, xport); struct hid_device *hdev = data->hdev; int ret; mutex_lock(&data->page_mutex); if (RMI_PAGE(addr) != data->page) { ret = rmi_set_page(hdev, RMI_PAGE(addr)); if (ret < 0) goto exit; } data->writeReport[0] = RMI_WRITE_REPORT_ID; data->writeReport[1] = len; data->writeReport[2] = addr & 0xFF; data->writeReport[3] = (addr >> 8) & 
0xFF; memcpy(&data->writeReport[4], buf, len); ret = rmi_write_report(hdev, data->writeReport, data->output_report_size); if (ret < 0) { dev_err(&hdev->dev, "failed to write request output report (%d)\n", ret); goto exit; } ret = 0; exit: mutex_unlock(&data->page_mutex); return ret; } static int rmi_reset_attn_mode(struct hid_device *hdev) { struct rmi_data *data = hid_get_drvdata(hdev); struct rmi_device *rmi_dev = data->xport.rmi_dev; int ret; ret = rmi_set_mode(hdev, RMI_MODE_ATTN_REPORTS); if (ret) return ret; if (test_bit(RMI_STARTED, &data->flags)) ret = rmi_dev->driver->reset_handler(rmi_dev); return ret; } static void rmi_reset_work(struct work_struct *work) { struct rmi_data *hdata = container_of(work, struct rmi_data, reset_work); /* switch the device to RMI if we receive a generic mouse report */ rmi_reset_attn_mode(hdata->hdev); } static int rmi_input_event(struct hid_device *hdev, u8 *data, int size) { struct rmi_data *hdata = hid_get_drvdata(hdev); struct rmi_device *rmi_dev = hdata->xport.rmi_dev; unsigned long flags; if (!(test_bit(RMI_STARTED, &hdata->flags))) return 0; pm_wakeup_event(hdev->dev.parent, 0); local_irq_save(flags); rmi_set_attn_data(rmi_dev, data[1], &data[2], size - 2); generic_handle_irq(hdata->rmi_irq); local_irq_restore(flags); return 1; } static int rmi_read_data_event(struct hid_device *hdev, u8 *data, int size) { struct rmi_data *hdata = hid_get_drvdata(hdev); if (!test_bit(RMI_READ_REQUEST_PENDING, &hdata->flags)) { hid_dbg(hdev, "no read request pending\n"); return 0; } memcpy(hdata->readReport, data, min((u32)size, hdata->input_report_size)); set_bit(RMI_READ_DATA_PENDING, &hdata->flags); wake_up(&hdata->wait); return 1; } static int rmi_check_sanity(struct hid_device *hdev, u8 *data, int size) { int valid_size = size; /* * On the Dell XPS 13 9333, the bus sometimes gets confused and fills * the report with a sentinel value "ff". Synaptics told us that such * behavior does not come from the touchpad itself, so we filter out * such reports here.
*/ while (valid_size > 0 && data[valid_size - 1] == 0xff) valid_size--; return valid_size; } static int rmi_raw_event(struct hid_device *hdev, struct hid_report *report, u8 *data, int size) { struct rmi_data *hdata = hid_get_drvdata(hdev); if (!(hdata->device_flags & RMI_DEVICE)) return 0; size = rmi_check_sanity(hdev, data, size); if (size < 2) return 0; switch (data[0]) { case RMI_READ_DATA_REPORT_ID: return rmi_read_data_event(hdev, data, size); case RMI_ATTN_REPORT_ID: return rmi_input_event(hdev, data, size); default: return 1; } return 0; } static int rmi_event(struct hid_device *hdev, struct hid_field *field, struct hid_usage *usage, __s32 value) { struct rmi_data *data = hid_get_drvdata(hdev); if ((data->device_flags & RMI_DEVICE) && (field->application == HID_GD_POINTER || field->application == HID_GD_MOUSE)) { if (data->device_flags & RMI_DEVICE_HAS_PHYS_BUTTONS) { if ((usage->hid & HID_USAGE_PAGE) == HID_UP_BUTTON) return 0; if ((usage->hid == HID_GD_X || usage->hid == HID_GD_Y) && !value) return 1; } schedule_work(&data->reset_work); return 1; } return 0; } static void rmi_report(struct hid_device *hid, struct hid_report *report) { struct hid_field *field = report->field[0]; if (!(hid->claimed & HID_CLAIMED_INPUT)) return; switch (report->id) { case RMI_READ_DATA_REPORT_ID: case RMI_ATTN_REPORT_ID: return; } if (field && field->hidinput && field->hidinput->input) input_sync(field->hidinput->input); } static int rmi_suspend(struct hid_device *hdev, pm_message_t message) { struct rmi_data *data = hid_get_drvdata(hdev); struct rmi_device *rmi_dev = data->xport.rmi_dev; int ret; if (!(data->device_flags & RMI_DEVICE)) return 0; ret = rmi_driver_suspend(rmi_dev, false); if (ret) { hid_warn(hdev, "Failed to suspend device: %d\n", ret); return ret; } return 0; } static int rmi_post_resume(struct hid_device *hdev) { struct rmi_data *data = hid_get_drvdata(hdev); struct rmi_device *rmi_dev = data->xport.rmi_dev; int ret; if (!(data->device_flags & RMI_DEVICE)) return 0; /* Make sure the HID device is ready to receive events */ ret = hid_hw_open(hdev); if (ret) return ret; ret = rmi_reset_attn_mode(hdev); if (ret) goto out; ret = rmi_driver_resume(rmi_dev, false); if (ret) { hid_warn(hdev, "Failed to resume device: %d\n", ret); goto out; } out: hid_hw_close(hdev); return ret; } static int rmi_hid_reset(struct rmi_transport_dev *xport, u16 reset_addr) { struct rmi_data *data = container_of(xport, struct rmi_data, xport); struct hid_device *hdev = data->hdev; return rmi_reset_attn_mode(hdev); } static int rmi_input_configured(struct hid_device *hdev, struct hid_input *hi) { struct rmi_data *data = hid_get_drvdata(hdev); struct input_dev *input = hi->input; int ret = 0; if (!(data->device_flags & RMI_DEVICE)) return 0; data->xport.input = input; hid_dbg(hdev, "Opening low level driver\n"); ret = hid_hw_open(hdev); if (ret) return ret; /* Allow incoming hid reports */ hid_device_io_start(hdev); ret = rmi_set_mode(hdev, RMI_MODE_ATTN_REPORTS); if (ret < 0) { dev_err(&hdev->dev, "failed to set rmi mode\n"); goto exit; } ret = rmi_set_page(hdev, 0); if (ret < 0) { dev_err(&hdev->dev, "failed to set page select to 0.\n"); goto exit; } ret = rmi_register_transport_device(&data->xport); if (ret < 0) { dev_err(&hdev->dev, "failed to register transport driver\n"); goto exit; } set_bit(RMI_STARTED, &data->flags); exit: hid_device_io_stop(hdev); hid_hw_close(hdev); return ret; } static int rmi_input_mapping(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage
*usage, unsigned long **bit, int *max) { struct rmi_data *data = hid_get_drvdata(hdev); /* * we want to make HID ignore the advertised HID collection * for RMI devices */ if (data->device_flags & RMI_DEVICE) { if ((data->device_flags & RMI_DEVICE_HAS_PHYS_BUTTONS) && ((usage->hid & HID_USAGE_PAGE) == HID_UP_BUTTON)) return 0; return -1; } return 0; } static int rmi_check_valid_report_id(struct hid_device *hdev, unsigned type, unsigned id, struct hid_report **report) { int i; *report = hdev->report_enum[type].report_id_hash[id]; if (*report) { for (i = 0; i < (*report)->maxfield; i++) { unsigned app = (*report)->field[i]->application; if ((app & HID_USAGE_PAGE) >= HID_UP_MSVENDOR) return 1; } } return 0; } static struct rmi_device_platform_data rmi_hid_pdata = { .sensor_pdata = { .sensor_type = rmi_sensor_touchpad, .axis_align.flip_y = true, .dribble = RMI_REG_STATE_ON, .palm_detect = RMI_REG_STATE_OFF, }, }; static const struct rmi_transport_ops hid_rmi_ops = { .write_block = rmi_hid_write_block, .read_block = rmi_hid_read_block, .reset = rmi_hid_reset, }; static void rmi_irq_teardown(void *data) { struct rmi_data *hdata = data; struct irq_domain *domain = hdata->domain; if (!domain) return; irq_dispose_mapping(irq_find_mapping(domain, 0)); irq_domain_remove(domain); hdata->domain = NULL; hdata->rmi_irq = 0; } static int rmi_irq_map(struct irq_domain *h, unsigned int virq, irq_hw_number_t hw_irq_num) { irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq); return 0; } static const struct irq_domain_ops rmi_irq_ops = { .map = rmi_irq_map, }; static int rmi_setup_irq_domain(struct hid_device *hdev) { struct rmi_data *hdata = hid_get_drvdata(hdev); int ret; hdata->domain = irq_domain_create_linear(hdev->dev.fwnode, 1, &rmi_irq_ops, hdata); if (!hdata->domain) return -ENOMEM; ret = devm_add_action_or_reset(&hdev->dev, &rmi_irq_teardown, hdata); if (ret) return ret; hdata->rmi_irq = irq_create_mapping(hdata->domain, 0); if (hdata->rmi_irq <= 0) { hid_err(hdev, "Can't allocate an IRQ\n"); return hdata->rmi_irq < 0 ? hdata->rmi_irq : -ENXIO; } return 0; } static int rmi_probe(struct hid_device *hdev, const struct hid_device_id *id) { struct rmi_data *data = NULL; int ret; size_t alloc_size; struct hid_report *input_report; struct hid_report *output_report; struct hid_report *feature_report; data = devm_kzalloc(&hdev->dev, sizeof(struct rmi_data), GFP_KERNEL); if (!data) return -ENOMEM; INIT_WORK(&data->reset_work, rmi_reset_work); data->hdev = hdev; hid_set_drvdata(hdev, data); hdev->quirks |= HID_QUIRK_NO_INIT_REPORTS; hdev->quirks |= HID_QUIRK_NO_INPUT_SYNC; ret = hid_parse(hdev); if (ret) { hid_err(hdev, "parse failed\n"); return ret; } if (id->driver_data) data->device_flags = id->driver_data; /* * Check for the RMI specific report ids. 
If they are missing, * simply return and let the events be processed by hid-input */ if (!rmi_check_valid_report_id(hdev, HID_FEATURE_REPORT, RMI_SET_RMI_MODE_REPORT_ID, &feature_report)) { hid_dbg(hdev, "device does not have set mode feature report\n"); goto start; } if (!rmi_check_valid_report_id(hdev, HID_INPUT_REPORT, RMI_ATTN_REPORT_ID, &input_report)) { hid_dbg(hdev, "device does not have attention input report\n"); goto start; } data->input_report_size = hid_report_len(input_report); if (!rmi_check_valid_report_id(hdev, HID_OUTPUT_REPORT, RMI_WRITE_REPORT_ID, &output_report)) { hid_dbg(hdev, "device does not have rmi write output report\n"); goto start; } data->output_report_size = hid_report_len(output_report); data->device_flags |= RMI_DEVICE; alloc_size = data->output_report_size + data->input_report_size; data->writeReport = devm_kzalloc(&hdev->dev, alloc_size, GFP_KERNEL); if (!data->writeReport) { hid_err(hdev, "failed to allocate buffer for HID reports\n"); return -ENOMEM; } data->readReport = data->writeReport + data->output_report_size; init_waitqueue_head(&data->wait); mutex_init(&data->page_mutex); ret = rmi_setup_irq_domain(hdev); if (ret) { hid_err(hdev, "failed to allocate IRQ domain\n"); return ret; } if (data->device_flags & RMI_DEVICE_HAS_PHYS_BUTTONS) rmi_hid_pdata.gpio_data.disable = true; data->xport.dev = hdev->dev.parent; data->xport.pdata = rmi_hid_pdata; data->xport.pdata.irq = data->rmi_irq; data->xport.proto_name = "hid"; data->xport.ops = &hid_rmi_ops; start: ret = hid_hw_start(hdev, HID_CONNECT_DEFAULT); if (ret) { hid_err(hdev, "hw start failed\n"); return ret; } return 0; } static void rmi_remove(struct hid_device *hdev) { struct rmi_data *hdata = hid_get_drvdata(hdev); if ((hdata->device_flags & RMI_DEVICE) && test_bit(RMI_STARTED, &hdata->flags)) { clear_bit(RMI_STARTED, &hdata->flags); cancel_work_sync(&hdata->reset_work); rmi_unregister_transport_device(&hdata->xport); } hid_hw_stop(hdev); } static const struct hid_device_id rmi_id[] = { { HID_USB_DEVICE(USB_VENDOR_ID_RAZER, USB_DEVICE_ID_RAZER_BLADE_14), .driver_data = RMI_DEVICE_HAS_PHYS_BUTTONS }, { HID_USB_DEVICE(USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_X1_COVER) }, { HID_USB_DEVICE(USB_VENDOR_ID_PRIMAX, USB_DEVICE_ID_PRIMAX_REZEL) }, { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, USB_DEVICE_ID_SYNAPTICS_ACER_SWITCH5), .driver_data = RMI_DEVICE_OUTPUT_SET_REPORT }, { HID_DEVICE(HID_BUS_ANY, HID_GROUP_RMI, HID_ANY_ID, HID_ANY_ID) }, { } }; MODULE_DEVICE_TABLE(hid, rmi_id); static struct hid_driver rmi_driver = { .name = "hid-rmi", .id_table = rmi_id, .probe = rmi_probe, .remove = rmi_remove, .event = rmi_event, .raw_event = rmi_raw_event, .report = rmi_report, .input_mapping = rmi_input_mapping, .input_configured = rmi_input_configured, .suspend = pm_ptr(rmi_suspend), .resume = pm_ptr(rmi_post_resume), .reset_resume = pm_ptr(rmi_post_resume), }; module_hid_driver(rmi_driver); MODULE_AUTHOR("Andrew Duggan <aduggan@synaptics.com>"); MODULE_DESCRIPTION("RMI HID driver"); MODULE_LICENSE("GPL");
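The probe path above only enables RMI handling when the device exposes the set-mode feature report, the attention input report and the write output report; anything else falls back to plain hid-input. For reference, here is a minimal userspace sketch of the same mode-setting handshake that rmi_set_mode() and rmi_input_configured() perform from kernel space. It is an illustration only, not part of either driver: the hidraw node, the feature report ID (0x0f for RMI_SET_RMI_MODE_REPORT_ID) and the mode value (1 for RMI_MODE_ATTN_REPORTS) are assumptions and should be verified against the device's report descriptor.

/*
 * Hedged userspace sketch: switch a Synaptics HID touchpad into RMI
 * attention-report mode via hidraw, mirroring what rmi_set_mode() does.
 * Report ID 0x0f and mode 1 are assumed values; /dev/hidraw0 is a
 * hypothetical node.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hidraw.h>

int main(void)
{
	unsigned char buf[2] = { 0x0f, 0x01 };	/* report ID, mode */
	int fd = open("/dev/hidraw0", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* HIDIOCSFEATURE issues a SET_REPORT(Feature) request */
	if (ioctl(fd, HIDIOCSFEATURE(sizeof(buf)), buf) < 0)
		perror("HIDIOCSFEATURE");
	close(fd);
	return 0;
}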
// SPDX-License-Identifier: GPL-2.0-or-later /* * GRE over IPv6 protocol decoder. * * Authors: Dmitry Kozlov (xeb@mail.ru) */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/capability.h> #include <linux/module.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/slab.h> #include <linux/uaccess.h> #include <linux/skbuff.h> #include <linux/netdevice.h> #include <linux/in.h> #include <linux/tcp.h> #include <linux/udp.h> #include <linux/if_arp.h> #include <linux/init.h> #include <linux/in6.h> #include <linux/inetdevice.h> #include <linux/igmp.h> #include <linux/netfilter_ipv4.h> #include <linux/etherdevice.h> #include <linux/if_ether.h> #include <linux/hash.h> #include <linux/if_tunnel.h> #include <linux/ip6_tunnel.h> #include <net/sock.h> #include <net/ip.h> #include <net/ip_tunnels.h> #include <net/icmp.h> #include <net/protocol.h> #include <net/addrconf.h> #include <net/arp.h> #include <net/checksum.h> #include <net/dsfield.h> #include <net/inet_ecn.h> #include <net/xfrm.h> #include <net/net_namespace.h> #include <net/netns/generic.h> #include <net/netdev_lock.h> #include <net/rtnetlink.h> #include <net/ipv6.h> #include <net/ip6_fib.h> #include <net/ip6_route.h> #include <net/ip6_tunnel.h> #include <net/gre.h> #include <net/erspan.h> #include <net/dst_metadata.h> static bool log_ecn_error = true; module_param(log_ecn_error, bool, 0644); MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); #define IP6_GRE_HASH_SIZE_SHIFT 5 #define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT) static unsigned int ip6gre_net_id __read_mostly; struct ip6gre_net { struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE]; struct ip6_tnl __rcu *collect_md_tun; struct ip6_tnl __rcu *collect_md_tun_erspan; struct net_device *fb_tunnel_dev; }; static struct rtnl_link_ops ip6gre_link_ops __read_mostly; static struct rtnl_link_ops ip6gre_tap_ops __read_mostly; static struct rtnl_link_ops ip6erspan_tap_ops __read_mostly; static int ip6gre_tunnel_init(struct net_device *dev); static void ip6gre_tunnel_setup(struct net_device *dev); static void ip6gre_tunnel_link(struct ip6gre_net *ign, struct ip6_tnl *t); static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu); static void ip6erspan_tnl_link_config(struct ip6_tnl *t, int set_mtu); /* Tunnel hash table */ /* 4 hash tables: 3: (remote,local) 2: (remote,*) 1: (*,local) 0: (*,*) We require an exact key match, i.e. if a key is present in the packet it will match only a tunnel with the same key; if it is not present, it will match only a keyless tunnel. All keyless packets, if not matched against configured keyless tunnels, will match the fallback tunnel. 
*/ #define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(IP6_GRE_HASH_SIZE - 1)) static u32 HASH_ADDR(const struct in6_addr *addr) { u32 hash = ipv6_addr_hash(addr); return hash_32(hash, IP6_GRE_HASH_SIZE_SHIFT); } #define tunnels_r_l tunnels[3] #define tunnels_r tunnels[2] #define tunnels_l tunnels[1] #define tunnels_wc tunnels[0] /* Given src, dst and key, find appropriate for input tunnel. */ static struct ip6_tnl *ip6gre_tunnel_lookup(struct net_device *dev, const struct in6_addr *remote, const struct in6_addr *local, __be32 key, __be16 gre_proto) { struct net *net = dev_net(dev); int link = dev->ifindex; unsigned int h0 = HASH_ADDR(remote); unsigned int h1 = HASH_KEY(key); struct ip6_tnl *t, *cand = NULL; struct ip6gre_net *ign = net_generic(net, ip6gre_net_id); int dev_type = (gre_proto == htons(ETH_P_TEB) || gre_proto == htons(ETH_P_ERSPAN) || gre_proto == htons(ETH_P_ERSPAN2)) ? ARPHRD_ETHER : ARPHRD_IP6GRE; int score, cand_score = 4; struct net_device *ndev; for_each_ip_tunnel_rcu(t, ign->tunnels_r_l[h0 ^ h1]) { if (!ipv6_addr_equal(local, &t->parms.laddr) || !ipv6_addr_equal(remote, &t->parms.raddr) || key != t->parms.i_key || !(t->dev->flags & IFF_UP)) continue; if (t->dev->type != ARPHRD_IP6GRE && t->dev->type != dev_type) continue; score = 0; if (t->parms.link != link) score |= 1; if (t->dev->type != dev_type) score |= 2; if (score == 0) return t; if (score < cand_score) { cand = t; cand_score = score; } } for_each_ip_tunnel_rcu(t, ign->tunnels_r[h0 ^ h1]) { if (!ipv6_addr_equal(remote, &t->parms.raddr) || key != t->parms.i_key || !(t->dev->flags & IFF_UP)) continue; if (t->dev->type != ARPHRD_IP6GRE && t->dev->type != dev_type) continue; score = 0; if (t->parms.link != link) score |= 1; if (t->dev->type != dev_type) score |= 2; if (score == 0) return t; if (score < cand_score) { cand = t; cand_score = score; } } for_each_ip_tunnel_rcu(t, ign->tunnels_l[h1]) { if ((!ipv6_addr_equal(local, &t->parms.laddr) && (!ipv6_addr_equal(local, &t->parms.raddr) || !ipv6_addr_is_multicast(local))) || key != t->parms.i_key || !(t->dev->flags & IFF_UP)) continue; if (t->dev->type != ARPHRD_IP6GRE && t->dev->type != dev_type) continue; score = 0; if (t->parms.link != link) score |= 1; if (t->dev->type != dev_type) score |= 2; if (score == 0) return t; if (score < cand_score) { cand = t; cand_score = score; } } for_each_ip_tunnel_rcu(t, ign->tunnels_wc[h1]) { if (t->parms.i_key != key || !(t->dev->flags & IFF_UP)) continue; if (t->dev->type != ARPHRD_IP6GRE && t->dev->type != dev_type) continue; score = 0; if (t->parms.link != link) score |= 1; if (t->dev->type != dev_type) score |= 2; if (score == 0) return t; if (score < cand_score) { cand = t; cand_score = score; } } if (cand) return cand; if (gre_proto == htons(ETH_P_ERSPAN) || gre_proto == htons(ETH_P_ERSPAN2)) t = rcu_dereference(ign->collect_md_tun_erspan); else t = rcu_dereference(ign->collect_md_tun); if (t && t->dev->flags & IFF_UP) return t; ndev = READ_ONCE(ign->fb_tunnel_dev); if (ndev && ndev->flags & IFF_UP) return netdev_priv(ndev); return NULL; } static struct ip6_tnl __rcu **__ip6gre_bucket(struct ip6gre_net *ign, const struct __ip6_tnl_parm *p) { const struct in6_addr *remote = &p->raddr; const struct in6_addr *local = &p->laddr; unsigned int h = HASH_KEY(p->i_key); int prio = 0; if (!ipv6_addr_any(local)) prio |= 1; if (!ipv6_addr_any(remote) && !ipv6_addr_is_multicast(remote)) { prio |= 2; h ^= HASH_ADDR(remote); } return &ign->tunnels[prio][h]; } static void ip6gre_tunnel_link_md(struct ip6gre_net *ign, struct 
ip6_tnl *t) { if (t->parms.collect_md) rcu_assign_pointer(ign->collect_md_tun, t); } static void ip6erspan_tunnel_link_md(struct ip6gre_net *ign, struct ip6_tnl *t) { if (t->parms.collect_md) rcu_assign_pointer(ign->collect_md_tun_erspan, t); } static void ip6gre_tunnel_unlink_md(struct ip6gre_net *ign, struct ip6_tnl *t) { if (t->parms.collect_md) rcu_assign_pointer(ign->collect_md_tun, NULL); } static void ip6erspan_tunnel_unlink_md(struct ip6gre_net *ign, struct ip6_tnl *t) { if (t->parms.collect_md) rcu_assign_pointer(ign->collect_md_tun_erspan, NULL); } static inline struct ip6_tnl __rcu **ip6gre_bucket(struct ip6gre_net *ign, const struct ip6_tnl *t) { return __ip6gre_bucket(ign, &t->parms); } static void ip6gre_tunnel_link(struct ip6gre_net *ign, struct ip6_tnl *t) { struct ip6_tnl __rcu **tp = ip6gre_bucket(ign, t); rcu_assign_pointer(t->next, rtnl_dereference(*tp)); rcu_assign_pointer(*tp, t); } static void ip6gre_tunnel_unlink(struct ip6gre_net *ign, struct ip6_tnl *t) { struct ip6_tnl __rcu **tp; struct ip6_tnl *iter; for (tp = ip6gre_bucket(ign, t); (iter = rtnl_dereference(*tp)) != NULL; tp = &iter->next) { if (t == iter) { rcu_assign_pointer(*tp, t->next); break; } } } static struct ip6_tnl *ip6gre_tunnel_find(struct net *net, const struct __ip6_tnl_parm *parms, int type) { const struct in6_addr *remote = &parms->raddr; const struct in6_addr *local = &parms->laddr; __be32 key = parms->i_key; int link = parms->link; struct ip6_tnl *t; struct ip6_tnl __rcu **tp; struct ip6gre_net *ign = net_generic(net, ip6gre_net_id); for (tp = __ip6gre_bucket(ign, parms); (t = rtnl_dereference(*tp)) != NULL; tp = &t->next) if (ipv6_addr_equal(local, &t->parms.laddr) && ipv6_addr_equal(remote, &t->parms.raddr) && key == t->parms.i_key && link == t->parms.link && type == t->dev->type) break; return t; } static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net, const struct __ip6_tnl_parm *parms, int create) { struct ip6_tnl *t, *nt; struct net_device *dev; char name[IFNAMSIZ]; struct ip6gre_net *ign = net_generic(net, ip6gre_net_id); t = ip6gre_tunnel_find(net, parms, ARPHRD_IP6GRE); if (t && create) return NULL; if (t || !create) return t; if (parms->name[0]) { if (!dev_valid_name(parms->name)) return NULL; strscpy(name, parms->name, IFNAMSIZ); } else { strcpy(name, "ip6gre%d"); } dev = alloc_netdev(sizeof(*t), name, NET_NAME_UNKNOWN, ip6gre_tunnel_setup); if (!dev) return NULL; dev_net_set(dev, net); nt = netdev_priv(dev); nt->parms = *parms; dev->rtnl_link_ops = &ip6gre_link_ops; nt->dev = dev; nt->net = dev_net(dev); if (register_netdevice(dev) < 0) goto failed_free; ip6gre_tnl_link_config(nt, 1); ip6gre_tunnel_link(ign, nt); return nt; failed_free: free_netdev(dev); return NULL; } static void ip6erspan_tunnel_uninit(struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); struct ip6gre_net *ign = net_generic(t->net, ip6gre_net_id); ip6erspan_tunnel_unlink_md(ign, t); ip6gre_tunnel_unlink(ign, t); dst_cache_reset(&t->dst_cache); netdev_put(dev, &t->dev_tracker); } static void ip6gre_tunnel_uninit(struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); struct ip6gre_net *ign = net_generic(t->net, ip6gre_net_id); ip6gre_tunnel_unlink_md(ign, t); ip6gre_tunnel_unlink(ign, t); if (ign->fb_tunnel_dev == dev) WRITE_ONCE(ign->fb_tunnel_dev, NULL); dst_cache_reset(&t->dst_cache); netdev_put(dev, &t->dev_tracker); } static int ip6gre_err(struct sk_buff *skb, struct inet6_skb_parm *opt, u8 type, u8 code, int offset, __be32 info) { struct net *net = dev_net(skb->dev); const struct 
ipv6hdr *ipv6h; struct tnl_ptk_info tpi; struct ip6_tnl *t; if (gre_parse_header(skb, &tpi, NULL, htons(ETH_P_IPV6), offset) < 0) return -EINVAL; ipv6h = (const struct ipv6hdr *)skb->data; t = ip6gre_tunnel_lookup(skb->dev, &ipv6h->daddr, &ipv6h->saddr, tpi.key, tpi.proto); if (!t) return -ENOENT; switch (type) { case ICMPV6_DEST_UNREACH: net_dbg_ratelimited("%s: Path to destination invalid or inactive!\n", t->parms.name); if (code != ICMPV6_PORT_UNREACH) break; return 0; case ICMPV6_TIME_EXCEED: if (code == ICMPV6_EXC_HOPLIMIT) { net_dbg_ratelimited("%s: Too small hop limit or routing loop in tunnel!\n", t->parms.name); break; } return 0; case ICMPV6_PARAMPROB: { struct ipv6_tlv_tnl_enc_lim *tel; __u32 teli; teli = 0; if (code == ICMPV6_HDR_FIELD) teli = ip6_tnl_parse_tlv_enc_lim(skb, skb->data); if (teli && teli == be32_to_cpu(info) - 2) { tel = (struct ipv6_tlv_tnl_enc_lim *) &skb->data[teli]; if (tel->encap_limit == 0) { net_dbg_ratelimited("%s: Too small encapsulation limit or routing loop in tunnel!\n", t->parms.name); } } else { net_dbg_ratelimited("%s: Recipient unable to parse tunneled packet!\n", t->parms.name); } return 0; } case ICMPV6_PKT_TOOBIG: ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL)); return 0; case NDISC_REDIRECT: ip6_redirect(skb, net, skb->dev->ifindex, 0, sock_net_uid(net, NULL)); return 0; } if (time_before(jiffies, t->err_time + IP6TUNNEL_ERR_TIMEO)) t->err_count++; else t->err_count = 1; t->err_time = jiffies; return 0; } static int ip6gre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi) { const struct ipv6hdr *ipv6h; struct ip6_tnl *tunnel; ipv6h = ipv6_hdr(skb); tunnel = ip6gre_tunnel_lookup(skb->dev, &ipv6h->saddr, &ipv6h->daddr, tpi->key, tpi->proto); if (tunnel) { if (tunnel->parms.collect_md) { IP_TUNNEL_DECLARE_FLAGS(flags); struct metadata_dst *tun_dst; __be64 tun_id; ip_tunnel_flags_copy(flags, tpi->flags); tun_id = key32_to_tunnel_id(tpi->key); tun_dst = ipv6_tun_rx_dst(skb, flags, tun_id, 0); if (!tun_dst) return PACKET_REJECT; ip6_tnl_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error); } else { ip6_tnl_rcv(tunnel, skb, tpi, NULL, log_ecn_error); } return PACKET_RCVD; } return PACKET_REJECT; } static int ip6erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, int gre_hdr_len) { struct erspan_base_hdr *ershdr; const struct ipv6hdr *ipv6h; struct erspan_md2 *md2; struct ip6_tnl *tunnel; u8 ver; if (unlikely(!pskb_may_pull(skb, sizeof(*ershdr)))) return PACKET_REJECT; ipv6h = ipv6_hdr(skb); ershdr = (struct erspan_base_hdr *)skb->data; ver = ershdr->ver; tunnel = ip6gre_tunnel_lookup(skb->dev, &ipv6h->saddr, &ipv6h->daddr, tpi->key, tpi->proto); if (tunnel) { int len = erspan_hdr_len(ver); if (unlikely(!pskb_may_pull(skb, len))) return PACKET_REJECT; if (__iptunnel_pull_header(skb, len, htons(ETH_P_TEB), false, false) < 0) return PACKET_REJECT; if (tunnel->parms.collect_md) { struct erspan_metadata *pkt_md, *md; IP_TUNNEL_DECLARE_FLAGS(flags); struct metadata_dst *tun_dst; struct ip_tunnel_info *info; unsigned char *gh; __be64 tun_id; __set_bit(IP_TUNNEL_KEY_BIT, tpi->flags); ip_tunnel_flags_copy(flags, tpi->flags); tun_id = key32_to_tunnel_id(tpi->key); tun_dst = ipv6_tun_rx_dst(skb, flags, tun_id, sizeof(*md)); if (!tun_dst) return PACKET_REJECT; /* skb can be uncloned in __iptunnel_pull_header, so * old pkt_md is no longer valid and we need to reset * it */ gh = skb_network_header(skb) + skb_network_header_len(skb); pkt_md = (struct erspan_metadata *)(gh + gre_hdr_len + sizeof(*ershdr)); info = &tun_dst->u.tun_info; md = 
ip_tunnel_info_opts(info); md->version = ver; md2 = &md->u.md2; memcpy(md2, pkt_md, ver == 1 ? ERSPAN_V1_MDSIZE : ERSPAN_V2_MDSIZE); __set_bit(IP_TUNNEL_ERSPAN_OPT_BIT, info->key.tun_flags); info->options_len = sizeof(*md); ip6_tnl_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error); } else { ip6_tnl_rcv(tunnel, skb, tpi, NULL, log_ecn_error); } return PACKET_RCVD; } return PACKET_REJECT; } static int gre_rcv(struct sk_buff *skb) { struct tnl_ptk_info tpi; bool csum_err = false; int hdr_len; hdr_len = gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IPV6), 0); if (hdr_len < 0) goto drop; if (iptunnel_pull_header(skb, hdr_len, tpi.proto, false)) goto drop; if (unlikely(tpi.proto == htons(ETH_P_ERSPAN) || tpi.proto == htons(ETH_P_ERSPAN2))) { if (ip6erspan_rcv(skb, &tpi, hdr_len) == PACKET_RCVD) return 0; goto out; } if (ip6gre_rcv(skb, &tpi) == PACKET_RCVD) return 0; out: icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0); drop: kfree_skb(skb); return 0; } static int gre_handle_offloads(struct sk_buff *skb, bool csum) { return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE); } static void prepare_ip6gre_xmit_ipv4(struct sk_buff *skb, struct net_device *dev, struct flowi6 *fl6, __u8 *dsfield, int *encap_limit) { const struct iphdr *iph = ip_hdr(skb); struct ip6_tnl *t = netdev_priv(dev); if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) *encap_limit = t->parms.encap_limit; memcpy(fl6, &t->fl.u.ip6, sizeof(*fl6)); if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) *dsfield = ipv4_get_dsfield(iph); else *dsfield = ip6_tclass(t->parms.flowinfo); if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK) fl6->flowi6_mark = skb->mark; else fl6->flowi6_mark = t->parms.fwmark; fl6->flowi6_uid = sock_net_uid(dev_net(dev), NULL); } static int prepare_ip6gre_xmit_ipv6(struct sk_buff *skb, struct net_device *dev, struct flowi6 *fl6, __u8 *dsfield, int *encap_limit) { struct ipv6hdr *ipv6h; struct ip6_tnl *t = netdev_priv(dev); __u16 offset; offset = ip6_tnl_parse_tlv_enc_lim(skb, skb_network_header(skb)); /* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */ ipv6h = ipv6_hdr(skb); if (offset > 0) { struct ipv6_tlv_tnl_enc_lim *tel; tel = (struct ipv6_tlv_tnl_enc_lim *)&skb_network_header(skb)[offset]; if (tel->encap_limit == 0) { icmpv6_ndo_send(skb, ICMPV6_PARAMPROB, ICMPV6_HDR_FIELD, offset + 2); return -1; } *encap_limit = tel->encap_limit - 1; } else if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) { *encap_limit = t->parms.encap_limit; } memcpy(fl6, &t->fl.u.ip6, sizeof(*fl6)); if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) *dsfield = ipv6_get_dsfield(ipv6h); else *dsfield = ip6_tclass(t->parms.flowinfo); if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL) fl6->flowlabel |= ip6_flowlabel(ipv6h); if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK) fl6->flowi6_mark = skb->mark; else fl6->flowi6_mark = t->parms.fwmark; fl6->flowi6_uid = sock_net_uid(dev_net(dev), NULL); return 0; } static int prepare_ip6gre_xmit_other(struct sk_buff *skb, struct net_device *dev, struct flowi6 *fl6, __u8 *dsfield, int *encap_limit) { struct ip6_tnl *t = netdev_priv(dev); if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) *encap_limit = t->parms.encap_limit; memcpy(fl6, &t->fl.u.ip6, sizeof(*fl6)); if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) *dsfield = 0; else *dsfield = ip6_tclass(t->parms.flowinfo); if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK) fl6->flowi6_mark = skb->mark; else fl6->flowi6_mark = t->parms.fwmark; fl6->flowi6_uid = sock_net_uid(dev_net(dev), NULL); return 0; } static 
struct ip_tunnel_info *skb_tunnel_info_txcheck(struct sk_buff *skb) { struct ip_tunnel_info *tun_info; tun_info = skb_tunnel_info(skb); if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX))) return ERR_PTR(-EINVAL); return tun_info; } static netdev_tx_t __gre6_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, struct flowi6 *fl6, int encap_limit, __u32 *pmtu, __be16 proto) { struct ip6_tnl *tunnel = netdev_priv(dev); IP_TUNNEL_DECLARE_FLAGS(flags); __be16 protocol; if (dev->type == ARPHRD_ETHER) IPCB(skb)->flags = 0; if (dev->header_ops && dev->type == ARPHRD_IP6GRE) fl6->daddr = ((struct ipv6hdr *)skb->data)->daddr; else fl6->daddr = tunnel->parms.raddr; /* Push GRE header. */ protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto; if (tunnel->parms.collect_md) { struct ip_tunnel_info *tun_info; const struct ip_tunnel_key *key; int tun_hlen; tun_info = skb_tunnel_info_txcheck(skb); if (IS_ERR(tun_info) || unlikely(ip_tunnel_info_af(tun_info) != AF_INET6)) return -EINVAL; key = &tun_info->key; memset(fl6, 0, sizeof(*fl6)); fl6->flowi6_proto = IPPROTO_GRE; fl6->daddr = key->u.ipv6.dst; fl6->flowlabel = key->label; fl6->flowi6_uid = sock_net_uid(dev_net(dev), NULL); fl6->fl6_gre_key = tunnel_id_to_key32(key->tun_id); dsfield = key->tos; ip_tunnel_flags_zero(flags); __set_bit(IP_TUNNEL_CSUM_BIT, flags); __set_bit(IP_TUNNEL_KEY_BIT, flags); __set_bit(IP_TUNNEL_SEQ_BIT, flags); ip_tunnel_flags_and(flags, flags, key->tun_flags); tun_hlen = gre_calc_hlen(flags); if (skb_cow_head(skb, dev->needed_headroom ?: tun_hlen + tunnel->encap_hlen)) return -ENOMEM; gre_build_header(skb, tun_hlen, flags, protocol, tunnel_id_to_key32(tun_info->key.tun_id), test_bit(IP_TUNNEL_SEQ_BIT, flags) ? htonl(atomic_fetch_inc(&tunnel->o_seqno)) : 0); } else { if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen)) return -ENOMEM; ip_tunnel_flags_copy(flags, tunnel->parms.o_flags); gre_build_header(skb, tunnel->tun_hlen, flags, protocol, tunnel->parms.o_key, test_bit(IP_TUNNEL_SEQ_BIT, flags) ? htonl(atomic_fetch_inc(&tunnel->o_seqno)) : 0); } return ip6_tnl_xmit(skb, dev, dsfield, fl6, encap_limit, pmtu, NEXTHDR_GRE); } static inline int ip6gre_xmit_ipv4(struct sk_buff *skb, struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); int encap_limit = -1; struct flowi6 fl6; __u8 dsfield = 0; __u32 mtu; int err; memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); if (!t->parms.collect_md) prepare_ip6gre_xmit_ipv4(skb, dev, &fl6, &dsfield, &encap_limit); err = gre_handle_offloads(skb, test_bit(IP_TUNNEL_CSUM_BIT, t->parms.o_flags)); if (err) return -1; err = __gre6_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu, skb->protocol); if (err != 0) { /* XXX: send ICMP error even if DF is not set. 
*/ if (err == -EMSGSIZE) icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); return -1; } return 0; } static inline int ip6gre_xmit_ipv6(struct sk_buff *skb, struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); struct ipv6hdr *ipv6h = ipv6_hdr(skb); int encap_limit = -1; struct flowi6 fl6; __u8 dsfield = 0; __u32 mtu; int err; if (ipv6_addr_equal(&t->parms.raddr, &ipv6h->saddr)) return -1; if (!t->parms.collect_md && prepare_ip6gre_xmit_ipv6(skb, dev, &fl6, &dsfield, &encap_limit)) return -1; if (gre_handle_offloads(skb, test_bit(IP_TUNNEL_CSUM_BIT, t->parms.o_flags))) return -1; err = __gre6_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu, skb->protocol); if (err != 0) { if (err == -EMSGSIZE) icmpv6_ndo_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu); return -1; } return 0; } static int ip6gre_xmit_other(struct sk_buff *skb, struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); int encap_limit = -1; struct flowi6 fl6; __u8 dsfield = 0; __u32 mtu; int err; if (!t->parms.collect_md && prepare_ip6gre_xmit_other(skb, dev, &fl6, &dsfield, &encap_limit)) return -1; err = gre_handle_offloads(skb, test_bit(IP_TUNNEL_CSUM_BIT, t->parms.o_flags)); if (err) return err; err = __gre6_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu, skb->protocol); return err; } static netdev_tx_t ip6gre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); __be16 payload_protocol; int ret; if (!pskb_inet_may_pull(skb)) goto tx_err; if (!ip6_tnl_xmit_ctl(t, &t->parms.laddr, &t->parms.raddr)) goto tx_err; payload_protocol = skb_protocol(skb, true); switch (payload_protocol) { case htons(ETH_P_IP): ret = ip6gre_xmit_ipv4(skb, dev); break; case htons(ETH_P_IPV6): ret = ip6gre_xmit_ipv6(skb, dev); break; default: ret = ip6gre_xmit_other(skb, dev); break; } if (ret < 0) goto tx_err; return NETDEV_TX_OK; tx_err: if (!t->parms.collect_md || !IS_ERR(skb_tunnel_info_txcheck(skb))) DEV_STATS_INC(dev, tx_errors); DEV_STATS_INC(dev, tx_dropped); kfree_skb(skb); return NETDEV_TX_OK; } static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb, struct net_device *dev) { struct ip_tunnel_info *tun_info = NULL; struct ip6_tnl *t = netdev_priv(dev); struct dst_entry *dst = skb_dst(skb); IP_TUNNEL_DECLARE_FLAGS(flags) = { }; bool truncate = false; int encap_limit = -1; __u8 dsfield = false; struct flowi6 fl6; int err = -EINVAL; __be16 proto; __u32 mtu; int nhoff; if (!pskb_inet_may_pull(skb)) goto tx_err; if (!ip6_tnl_xmit_ctl(t, &t->parms.laddr, &t->parms.raddr)) goto tx_err; if (gre_handle_offloads(skb, false)) goto tx_err; if (skb->len > dev->mtu + dev->hard_header_len) { if (pskb_trim(skb, dev->mtu + dev->hard_header_len)) goto tx_err; truncate = true; } nhoff = skb_network_offset(skb); if (skb->protocol == htons(ETH_P_IP) && (ntohs(ip_hdr(skb)->tot_len) > skb->len - nhoff)) truncate = true; if (skb->protocol == htons(ETH_P_IPV6)) { int thoff; if (skb_transport_header_was_set(skb)) thoff = skb_transport_offset(skb); else thoff = nhoff + sizeof(struct ipv6hdr); if (ntohs(ipv6_hdr(skb)->payload_len) > skb->len - thoff) truncate = true; } if (skb_cow_head(skb, dev->needed_headroom ?: t->hlen)) goto tx_err; __clear_bit(IP_TUNNEL_KEY_BIT, t->parms.o_flags); IPCB(skb)->flags = 0; /* For collect_md mode, derive fl6 from the tunnel key, * for native mode, call prepare_ip6gre_xmit_{ipv4,ipv6}. 
*/ if (t->parms.collect_md) { const struct ip_tunnel_key *key; struct erspan_metadata *md; __be32 tun_id; tun_info = skb_tunnel_info_txcheck(skb); if (IS_ERR(tun_info) || unlikely(ip_tunnel_info_af(tun_info) != AF_INET6)) goto tx_err; key = &tun_info->key; memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_proto = IPPROTO_GRE; fl6.daddr = key->u.ipv6.dst; fl6.flowlabel = key->label; fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL); fl6.fl6_gre_key = tunnel_id_to_key32(key->tun_id); dsfield = key->tos; if (!test_bit(IP_TUNNEL_ERSPAN_OPT_BIT, tun_info->key.tun_flags)) goto tx_err; if (tun_info->options_len < sizeof(*md)) goto tx_err; md = ip_tunnel_info_opts(tun_info); tun_id = tunnel_id_to_key32(key->tun_id); if (md->version == 1) { erspan_build_header(skb, ntohl(tun_id), ntohl(md->u.index), truncate, false); proto = htons(ETH_P_ERSPAN); } else if (md->version == 2) { erspan_build_header_v2(skb, ntohl(tun_id), md->u.md2.dir, get_hwid(&md->u.md2), truncate, false); proto = htons(ETH_P_ERSPAN2); } else { goto tx_err; } } else { switch (skb->protocol) { case htons(ETH_P_IP): memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); prepare_ip6gre_xmit_ipv4(skb, dev, &fl6, &dsfield, &encap_limit); break; case htons(ETH_P_IPV6): if (ipv6_addr_equal(&t->parms.raddr, &ipv6_hdr(skb)->saddr)) goto tx_err; if (prepare_ip6gre_xmit_ipv6(skb, dev, &fl6, &dsfield, &encap_limit)) goto tx_err; break; default: memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); break; } if (t->parms.erspan_ver == 1) { erspan_build_header(skb, ntohl(t->parms.o_key), t->parms.index, truncate, false); proto = htons(ETH_P_ERSPAN); } else if (t->parms.erspan_ver == 2) { erspan_build_header_v2(skb, ntohl(t->parms.o_key), t->parms.dir, t->parms.hwid, truncate, false); proto = htons(ETH_P_ERSPAN2); } else { goto tx_err; } fl6.daddr = t->parms.raddr; } /* Push GRE header. */ __set_bit(IP_TUNNEL_SEQ_BIT, flags); gre_build_header(skb, 8, flags, proto, 0, htonl(atomic_fetch_inc(&t->o_seqno))); /* TooBig packet may have updated dst->dev's mtu */ if (!t->parms.collect_md && dst && dst_mtu(dst) > dst->dev->mtu) dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, false); err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu, NEXTHDR_GRE); if (err != 0) { /* XXX: send ICMP error even if DF is not set. 
*/ if (err == -EMSGSIZE) { if (skb->protocol == htons(ETH_P_IP)) icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); else icmpv6_ndo_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu); } goto tx_err; } return NETDEV_TX_OK; tx_err: if (!IS_ERR(tun_info)) DEV_STATS_INC(dev, tx_errors); DEV_STATS_INC(dev, tx_dropped); kfree_skb(skb); return NETDEV_TX_OK; } static void ip6gre_tnl_link_config_common(struct ip6_tnl *t) { struct net_device *dev = t->dev; struct __ip6_tnl_parm *p = &t->parms; struct flowi6 *fl6 = &t->fl.u.ip6; if (dev->type != ARPHRD_ETHER) { __dev_addr_set(dev, &p->laddr, sizeof(struct in6_addr)); memcpy(dev->broadcast, &p->raddr, sizeof(struct in6_addr)); } /* Set up flowi template */ fl6->saddr = p->laddr; fl6->daddr = p->raddr; fl6->flowi6_oif = p->link; fl6->flowlabel = 0; fl6->flowi6_proto = IPPROTO_GRE; fl6->fl6_gre_key = t->parms.o_key; if (!(p->flags&IP6_TNL_F_USE_ORIG_TCLASS)) fl6->flowlabel |= IPV6_TCLASS_MASK & p->flowinfo; if (!(p->flags&IP6_TNL_F_USE_ORIG_FLOWLABEL)) fl6->flowlabel |= IPV6_FLOWLABEL_MASK & p->flowinfo; p->flags &= ~(IP6_TNL_F_CAP_XMIT|IP6_TNL_F_CAP_RCV|IP6_TNL_F_CAP_PER_PACKET); p->flags |= ip6_tnl_get_cap(t, &p->laddr, &p->raddr); if (p->flags&IP6_TNL_F_CAP_XMIT && p->flags&IP6_TNL_F_CAP_RCV && dev->type != ARPHRD_ETHER) dev->flags |= IFF_POINTOPOINT; else dev->flags &= ~IFF_POINTOPOINT; } static void ip6gre_tnl_link_config_route(struct ip6_tnl *t, int set_mtu, int t_hlen) { const struct __ip6_tnl_parm *p = &t->parms; struct net_device *dev = t->dev; if (p->flags & IP6_TNL_F_CAP_XMIT) { int strict = (ipv6_addr_type(&p->raddr) & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL)); struct rt6_info *rt = rt6_lookup(t->net, &p->raddr, &p->laddr, p->link, NULL, strict); if (!rt) return; if (rt->dst.dev) { unsigned short dst_len = rt->dst.dev->hard_header_len + t_hlen; if (t->dev->header_ops) dev->hard_header_len = dst_len; else dev->needed_headroom = dst_len; if (set_mtu) { int mtu = rt->dst.dev->mtu - t_hlen; if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) mtu -= 8; if (dev->type == ARPHRD_ETHER) mtu -= ETH_HLEN; if (mtu < IPV6_MIN_MTU) mtu = IPV6_MIN_MTU; WRITE_ONCE(dev->mtu, mtu); } } ip6_rt_put(rt); } } static int ip6gre_calc_hlen(struct ip6_tnl *tunnel) { int t_hlen; tunnel->tun_hlen = gre_calc_hlen(tunnel->parms.o_flags); tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen; t_hlen = tunnel->hlen + sizeof(struct ipv6hdr); if (tunnel->dev->header_ops) tunnel->dev->hard_header_len = LL_MAX_HEADER + t_hlen; else tunnel->dev->needed_headroom = LL_MAX_HEADER + t_hlen; return t_hlen; } static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu) { ip6gre_tnl_link_config_common(t); ip6gre_tnl_link_config_route(t, set_mtu, ip6gre_calc_hlen(t)); } static void ip6gre_tnl_copy_tnl_parm(struct ip6_tnl *t, const struct __ip6_tnl_parm *p) { t->parms.laddr = p->laddr; t->parms.raddr = p->raddr; t->parms.flags = p->flags; t->parms.hop_limit = p->hop_limit; t->parms.encap_limit = p->encap_limit; t->parms.flowinfo = p->flowinfo; t->parms.link = p->link; t->parms.proto = p->proto; t->parms.i_key = p->i_key; t->parms.o_key = p->o_key; ip_tunnel_flags_copy(t->parms.i_flags, p->i_flags); ip_tunnel_flags_copy(t->parms.o_flags, p->o_flags); t->parms.fwmark = p->fwmark; t->parms.erspan_ver = p->erspan_ver; t->parms.index = p->index; t->parms.dir = p->dir; t->parms.hwid = p->hwid; dst_cache_reset(&t->dst_cache); } static int ip6gre_tnl_change(struct ip6_tnl *t, const struct __ip6_tnl_parm *p, int set_mtu) { ip6gre_tnl_copy_tnl_parm(t, p); ip6gre_tnl_link_config(t, 
set_mtu); return 0; } static void ip6gre_tnl_parm_from_user(struct __ip6_tnl_parm *p, const struct ip6_tnl_parm2 *u) { p->laddr = u->laddr; p->raddr = u->raddr; p->flags = u->flags; p->hop_limit = u->hop_limit; p->encap_limit = u->encap_limit; p->flowinfo = u->flowinfo; p->link = u->link; p->i_key = u->i_key; p->o_key = u->o_key; gre_flags_to_tnl_flags(p->i_flags, u->i_flags); gre_flags_to_tnl_flags(p->o_flags, u->o_flags); memcpy(p->name, u->name, sizeof(u->name)); } static void ip6gre_tnl_parm_to_user(struct ip6_tnl_parm2 *u, const struct __ip6_tnl_parm *p) { u->proto = IPPROTO_GRE; u->laddr = p->laddr; u->raddr = p->raddr; u->flags = p->flags; u->hop_limit = p->hop_limit; u->encap_limit = p->encap_limit; u->flowinfo = p->flowinfo; u->link = p->link; u->i_key = p->i_key; u->o_key = p->o_key; u->i_flags = gre_tnl_flags_to_gre_flags(p->i_flags); u->o_flags = gre_tnl_flags_to_gre_flags(p->o_flags); memcpy(u->name, p->name, sizeof(u->name)); } static int ip6gre_tunnel_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __user *data, int cmd) { int err = 0; struct ip6_tnl_parm2 p; struct __ip6_tnl_parm p1; struct ip6_tnl *t = netdev_priv(dev); struct net *net = t->net; struct ip6gre_net *ign = net_generic(net, ip6gre_net_id); memset(&p1, 0, sizeof(p1)); switch (cmd) { case SIOCGETTUNNEL: if (dev == ign->fb_tunnel_dev) { if (copy_from_user(&p, data, sizeof(p))) { err = -EFAULT; break; } ip6gre_tnl_parm_from_user(&p1, &p); t = ip6gre_tunnel_locate(net, &p1, 0); if (!t) t = netdev_priv(dev); } memset(&p, 0, sizeof(p)); ip6gre_tnl_parm_to_user(&p, &t->parms); if (copy_to_user(data, &p, sizeof(p))) err = -EFAULT; break; case SIOCADDTUNNEL: case SIOCCHGTUNNEL: err = -EPERM; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) goto done; err = -EFAULT; if (copy_from_user(&p, data, sizeof(p))) goto done; err = -EINVAL; if ((p.i_flags|p.o_flags)&(GRE_VERSION|GRE_ROUTING)) goto done; if (!(p.i_flags&GRE_KEY)) p.i_key = 0; if (!(p.o_flags&GRE_KEY)) p.o_key = 0; ip6gre_tnl_parm_from_user(&p1, &p); t = ip6gre_tunnel_locate(net, &p1, cmd == SIOCADDTUNNEL); if (dev != ign->fb_tunnel_dev && cmd == SIOCCHGTUNNEL) { if (t) { if (t->dev != dev) { err = -EEXIST; break; } } else { t = netdev_priv(dev); ip6gre_tunnel_unlink(ign, t); synchronize_net(); ip6gre_tnl_change(t, &p1, 1); ip6gre_tunnel_link(ign, t); netdev_state_change(dev); } } if (t) { err = 0; memset(&p, 0, sizeof(p)); ip6gre_tnl_parm_to_user(&p, &t->parms); if (copy_to_user(data, &p, sizeof(p))) err = -EFAULT; } else err = (cmd == SIOCADDTUNNEL ? 
-ENOBUFS : -ENOENT); break; case SIOCDELTUNNEL: err = -EPERM; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) goto done; if (dev == ign->fb_tunnel_dev) { err = -EFAULT; if (copy_from_user(&p, data, sizeof(p))) goto done; err = -ENOENT; ip6gre_tnl_parm_from_user(&p1, &p); t = ip6gre_tunnel_locate(net, &p1, 0); if (!t) goto done; err = -EPERM; if (t == netdev_priv(ign->fb_tunnel_dev)) goto done; dev = t->dev; } unregister_netdevice(dev); err = 0; break; default: err = -EINVAL; } done: return err; } static int ip6gre_header(struct sk_buff *skb, struct net_device *dev, unsigned short type, const void *daddr, const void *saddr, unsigned int len) { struct ip6_tnl *t = netdev_priv(dev); struct ipv6hdr *ipv6h; __be16 *p; ipv6h = skb_push(skb, t->hlen + sizeof(*ipv6h)); ip6_flow_hdr(ipv6h, 0, ip6_make_flowlabel(dev_net(dev), skb, t->fl.u.ip6.flowlabel, true, &t->fl.u.ip6)); ipv6h->hop_limit = t->parms.hop_limit; ipv6h->nexthdr = NEXTHDR_GRE; ipv6h->saddr = t->parms.laddr; ipv6h->daddr = t->parms.raddr; p = (__be16 *)(ipv6h + 1); p[0] = ip_tunnel_flags_to_be16(t->parms.o_flags); p[1] = htons(type); /* * Set the source hardware address. */ if (saddr) memcpy(&ipv6h->saddr, saddr, sizeof(struct in6_addr)); if (daddr) memcpy(&ipv6h->daddr, daddr, sizeof(struct in6_addr)); if (!ipv6_addr_any(&ipv6h->daddr)) return t->hlen; return -t->hlen; } static const struct header_ops ip6gre_header_ops = { .create = ip6gre_header, }; static const struct net_device_ops ip6gre_netdev_ops = { .ndo_init = ip6gre_tunnel_init, .ndo_uninit = ip6gre_tunnel_uninit, .ndo_start_xmit = ip6gre_tunnel_xmit, .ndo_siocdevprivate = ip6gre_tunnel_siocdevprivate, .ndo_change_mtu = ip6_tnl_change_mtu, .ndo_get_iflink = ip6_tnl_get_iflink, }; static void ip6gre_dev_free(struct net_device *dev) { struct ip6_tnl *t = netdev_priv(dev); gro_cells_destroy(&t->gro_cells); dst_cache_destroy(&t->dst_cache); } static void ip6gre_tunnel_setup(struct net_device *dev) { dev->netdev_ops = &ip6gre_netdev_ops; dev->needs_free_netdev = true; dev->priv_destructor = ip6gre_dev_free; dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS; dev->type = ARPHRD_IP6GRE; dev->flags |= IFF_NOARP; dev->addr_len = sizeof(struct in6_addr); netif_keep_dst(dev); /* This perm addr will be used as interface identifier by IPv6 */ dev->addr_assign_type = NET_ADDR_RANDOM; eth_random_addr(dev->perm_addr); } #define GRE6_FEATURES (NETIF_F_SG | \ NETIF_F_FRAGLIST | \ NETIF_F_HIGHDMA | \ NETIF_F_HW_CSUM) static void ip6gre_tnl_init_features(struct net_device *dev) { struct ip6_tnl *nt = netdev_priv(dev); dev->features |= GRE6_FEATURES; dev->hw_features |= GRE6_FEATURES; /* TCP offload with GRE SEQ is not supported, nor can we support 2 * levels of outer headers requiring an update. 
*/ if (test_bit(IP_TUNNEL_SEQ_BIT, nt->parms.o_flags)) return; if (test_bit(IP_TUNNEL_CSUM_BIT, nt->parms.o_flags) && nt->encap.type != TUNNEL_ENCAP_NONE) return; dev->features |= NETIF_F_GSO_SOFTWARE; dev->hw_features |= NETIF_F_GSO_SOFTWARE; dev->lltx = true; } static int ip6gre_tunnel_init_common(struct net_device *dev) { struct ip6_tnl *tunnel; int ret; int t_hlen; tunnel = netdev_priv(dev); tunnel->dev = dev; strcpy(tunnel->parms.name, dev->name); ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL); if (ret) return ret; ret = gro_cells_init(&tunnel->gro_cells, dev); if (ret) goto cleanup_dst_cache_init; t_hlen = ip6gre_calc_hlen(tunnel); dev->mtu = ETH_DATA_LEN - t_hlen; if (dev->type == ARPHRD_ETHER) dev->mtu -= ETH_HLEN; if (!(tunnel->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) dev->mtu -= 8; if (tunnel->parms.collect_md) { netif_keep_dst(dev); } ip6gre_tnl_init_features(dev); netdev_hold(dev, &tunnel->dev_tracker, GFP_KERNEL); netdev_lockdep_set_classes(dev); return 0; cleanup_dst_cache_init: dst_cache_destroy(&tunnel->dst_cache); return ret; } static int ip6gre_tunnel_init(struct net_device *dev) { struct ip6_tnl *tunnel; int ret; ret = ip6gre_tunnel_init_common(dev); if (ret) return ret; tunnel = netdev_priv(dev); if (tunnel->parms.collect_md) return 0; __dev_addr_set(dev, &tunnel->parms.laddr, sizeof(struct in6_addr)); memcpy(dev->broadcast, &tunnel->parms.raddr, sizeof(struct in6_addr)); if (ipv6_addr_any(&tunnel->parms.raddr)) dev->header_ops = &ip6gre_header_ops; return 0; } static void ip6gre_fb_tunnel_init(struct net_device *dev) { struct ip6_tnl *tunnel = netdev_priv(dev); tunnel->dev = dev; tunnel->net = dev_net(dev); strcpy(tunnel->parms.name, dev->name); tunnel->hlen = sizeof(struct ipv6hdr) + 4; } static struct inet6_protocol ip6gre_protocol __read_mostly = { .handler = gre_rcv, .err_handler = ip6gre_err, .flags = INET6_PROTO_FINAL, }; static void __net_exit ip6gre_exit_rtnl_net(struct net *net, struct list_head *head) { struct ip6gre_net *ign = net_generic(net, ip6gre_net_id); struct net_device *dev, *aux; int prio; for_each_netdev_safe(net, dev, aux) if (dev->rtnl_link_ops == &ip6gre_link_ops || dev->rtnl_link_ops == &ip6gre_tap_ops || dev->rtnl_link_ops == &ip6erspan_tap_ops) unregister_netdevice_queue(dev, head); for (prio = 0; prio < 4; prio++) { int h; for (h = 0; h < IP6_GRE_HASH_SIZE; h++) { struct ip6_tnl *t; t = rtnl_net_dereference(net, ign->tunnels[prio][h]); while (t) { /* If dev is in the same netns, it has already * been added to the list by the previous loop. */ if (!net_eq(dev_net(t->dev), net)) unregister_netdevice_queue(t->dev, head); t = rtnl_net_dereference(net, t->next); } } } } static int __net_init ip6gre_init_net(struct net *net) { struct ip6gre_net *ign = net_generic(net, ip6gre_net_id); struct net_device *ndev; int err; if (!net_has_fallback_tunnels(net)) return 0; ndev = alloc_netdev(sizeof(struct ip6_tnl), "ip6gre0", NET_NAME_UNKNOWN, ip6gre_tunnel_setup); if (!ndev) { err = -ENOMEM; goto err_alloc_dev; } ign->fb_tunnel_dev = ndev; dev_net_set(ign->fb_tunnel_dev, net); /* FB netdevice is special: we have one, and only one per netns. * Allowing to move it to another netns is clearly unsafe. 
*/ ign->fb_tunnel_dev->netns_immutable = true; ip6gre_fb_tunnel_init(ign->fb_tunnel_dev); ign->fb_tunnel_dev->rtnl_link_ops = &ip6gre_link_ops; err = register_netdev(ign->fb_tunnel_dev); if (err) goto err_reg_dev; rcu_assign_pointer(ign->tunnels_wc[0], netdev_priv(ign->fb_tunnel_dev)); return 0; err_reg_dev: free_netdev(ndev); err_alloc_dev: return err; } static struct pernet_operations ip6gre_net_ops = { .init = ip6gre_init_net, .exit_rtnl = ip6gre_exit_rtnl_net, .id = &ip6gre_net_id, .size = sizeof(struct ip6gre_net), }; static int ip6gre_tunnel_validate(struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { __be16 flags; if (!data) return 0; flags = 0; if (data[IFLA_GRE_IFLAGS]) flags |= nla_get_be16(data[IFLA_GRE_IFLAGS]); if (data[IFLA_GRE_OFLAGS]) flags |= nla_get_be16(data[IFLA_GRE_OFLAGS]); if (flags & (GRE_VERSION|GRE_ROUTING)) return -EINVAL; return 0; } static int ip6gre_tap_validate(struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { struct in6_addr daddr; if (tb[IFLA_ADDRESS]) { if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) return -EINVAL; if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) return -EADDRNOTAVAIL; } if (!data) goto out; if (data[IFLA_GRE_REMOTE]) { daddr = nla_get_in6_addr(data[IFLA_GRE_REMOTE]); if (ipv6_addr_any(&daddr)) return -EINVAL; } out: return ip6gre_tunnel_validate(tb, data, extack); } static int ip6erspan_tap_validate(struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { __be16 flags = 0; int ret, ver = 0; if (!data) return 0; ret = ip6gre_tap_validate(tb, data, extack); if (ret) return ret; /* ERSPAN should only have GRE sequence and key flag */ if (data[IFLA_GRE_OFLAGS]) flags |= nla_get_be16(data[IFLA_GRE_OFLAGS]); if (data[IFLA_GRE_IFLAGS]) flags |= nla_get_be16(data[IFLA_GRE_IFLAGS]); if (!data[IFLA_GRE_COLLECT_METADATA] && flags != (GRE_SEQ | GRE_KEY)) return -EINVAL; /* The ERSPAN Session ID is only 10 bits wide. Since we reuse the * 32-bit key field as the ID, check its range. 
*/ if (data[IFLA_GRE_IKEY] && (ntohl(nla_get_be32(data[IFLA_GRE_IKEY])) & ~ID_MASK)) return -EINVAL; if (data[IFLA_GRE_OKEY] && (ntohl(nla_get_be32(data[IFLA_GRE_OKEY])) & ~ID_MASK)) return -EINVAL; if (data[IFLA_GRE_ERSPAN_VER]) { ver = nla_get_u8(data[IFLA_GRE_ERSPAN_VER]); if (ver != 1 && ver != 2) return -EINVAL; } if (ver == 1) { if (data[IFLA_GRE_ERSPAN_INDEX]) { u32 index = nla_get_u32(data[IFLA_GRE_ERSPAN_INDEX]); if (index & ~INDEX_MASK) return -EINVAL; } } else if (ver == 2) { if (data[IFLA_GRE_ERSPAN_DIR]) { u16 dir = nla_get_u8(data[IFLA_GRE_ERSPAN_DIR]); if (dir & ~(DIR_MASK >> DIR_OFFSET)) return -EINVAL; } if (data[IFLA_GRE_ERSPAN_HWID]) { u16 hwid = nla_get_u16(data[IFLA_GRE_ERSPAN_HWID]); if (hwid & ~(HWID_MASK >> HWID_OFFSET)) return -EINVAL; } } return 0; } static void ip6erspan_set_version(struct nlattr *data[], struct __ip6_tnl_parm *parms) { if (!data) return; parms->erspan_ver = 1; if (data[IFLA_GRE_ERSPAN_VER]) parms->erspan_ver = nla_get_u8(data[IFLA_GRE_ERSPAN_VER]); if (parms->erspan_ver == 1) { if (data[IFLA_GRE_ERSPAN_INDEX]) parms->index = nla_get_u32(data[IFLA_GRE_ERSPAN_INDEX]); } else if (parms->erspan_ver == 2) { if (data[IFLA_GRE_ERSPAN_DIR]) parms->dir = nla_get_u8(data[IFLA_GRE_ERSPAN_DIR]); if (data[IFLA_GRE_ERSPAN_HWID]) parms->hwid = nla_get_u16(data[IFLA_GRE_ERSPAN_HWID]); } } static void ip6gre_netlink_parms(struct nlattr *data[], struct __ip6_tnl_parm *parms) { memset(parms, 0, sizeof(*parms)); if (!data) return; if (data[IFLA_GRE_LINK]) parms->link = nla_get_u32(data[IFLA_GRE_LINK]); if (data[IFLA_GRE_IFLAGS]) gre_flags_to_tnl_flags(parms->i_flags, nla_get_be16(data[IFLA_GRE_IFLAGS])); if (data[IFLA_GRE_OFLAGS]) gre_flags_to_tnl_flags(parms->o_flags, nla_get_be16(data[IFLA_GRE_OFLAGS])); if (data[IFLA_GRE_IKEY]) parms->i_key = nla_get_be32(data[IFLA_GRE_IKEY]); if (data[IFLA_GRE_OKEY]) parms->o_key = nla_get_be32(data[IFLA_GRE_OKEY]); if (data[IFLA_GRE_LOCAL]) parms->laddr = nla_get_in6_addr(data[IFLA_GRE_LOCAL]); if (data[IFLA_GRE_REMOTE]) parms->raddr = nla_get_in6_addr(data[IFLA_GRE_REMOTE]); if (data[IFLA_GRE_TTL]) parms->hop_limit = nla_get_u8(data[IFLA_GRE_TTL]); if (data[IFLA_GRE_ENCAP_LIMIT]) parms->encap_limit = nla_get_u8(data[IFLA_GRE_ENCAP_LIMIT]); if (data[IFLA_GRE_FLOWINFO]) parms->flowinfo = nla_get_be32(data[IFLA_GRE_FLOWINFO]); if (data[IFLA_GRE_FLAGS]) parms->flags = nla_get_u32(data[IFLA_GRE_FLAGS]); if (data[IFLA_GRE_FWMARK]) parms->fwmark = nla_get_u32(data[IFLA_GRE_FWMARK]); if (data[IFLA_GRE_COLLECT_METADATA]) parms->collect_md = true; } static int ip6gre_tap_init(struct net_device *dev) { int ret; ret = ip6gre_tunnel_init_common(dev); if (ret) return ret; dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; return 0; } static const struct net_device_ops ip6gre_tap_netdev_ops = { .ndo_init = ip6gre_tap_init, .ndo_uninit = ip6gre_tunnel_uninit, .ndo_start_xmit = ip6gre_tunnel_xmit, .ndo_set_mac_address = eth_mac_addr, .ndo_validate_addr = eth_validate_addr, .ndo_change_mtu = ip6_tnl_change_mtu, .ndo_get_iflink = ip6_tnl_get_iflink, }; static int ip6erspan_calc_hlen(struct ip6_tnl *tunnel) { int t_hlen; tunnel->tun_hlen = 8; tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen + erspan_hdr_len(tunnel->parms.erspan_ver); t_hlen = tunnel->hlen + sizeof(struct ipv6hdr); tunnel->dev->needed_headroom = LL_MAX_HEADER + t_hlen; return t_hlen; } static int ip6erspan_tap_init(struct net_device *dev) { struct ip6_tnl *tunnel; int t_hlen; int ret; tunnel = netdev_priv(dev); tunnel->dev = dev; strcpy(tunnel->parms.name, dev->name); ret = 
static int ip6gre_tap_init(struct net_device *dev)
{
	int ret;

	ret = ip6gre_tunnel_init_common(dev);
	if (ret)
		return ret;

	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;

	return 0;
}

static const struct net_device_ops ip6gre_tap_netdev_ops = {
	.ndo_init = ip6gre_tap_init,
	.ndo_uninit = ip6gre_tunnel_uninit,
	.ndo_start_xmit = ip6gre_tunnel_xmit,
	.ndo_set_mac_address = eth_mac_addr,
	.ndo_validate_addr = eth_validate_addr,
	.ndo_change_mtu = ip6_tnl_change_mtu,
	.ndo_get_iflink = ip6_tnl_get_iflink,
};

static int ip6erspan_calc_hlen(struct ip6_tnl *tunnel)
{
	int t_hlen;

	tunnel->tun_hlen = 8;
	tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen +
		       erspan_hdr_len(tunnel->parms.erspan_ver);

	t_hlen = tunnel->hlen + sizeof(struct ipv6hdr);
	tunnel->dev->needed_headroom = LL_MAX_HEADER + t_hlen;
	return t_hlen;
}

static int ip6erspan_tap_init(struct net_device *dev)
{
	struct ip6_tnl *tunnel;
	int t_hlen;
	int ret;

	tunnel = netdev_priv(dev);

	tunnel->dev = dev;
	strcpy(tunnel->parms.name, dev->name);

	ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL);
	if (ret)
		return ret;

	ret = gro_cells_init(&tunnel->gro_cells, dev);
	if (ret)
		goto cleanup_dst_cache_init;

	t_hlen = ip6erspan_calc_hlen(tunnel);
	dev->mtu = ETH_DATA_LEN - t_hlen;
	if (dev->type == ARPHRD_ETHER)
		dev->mtu -= ETH_HLEN;
	if (!(tunnel->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
		dev->mtu -= 8;

	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
	ip6erspan_tnl_link_config(tunnel, 1);

	netdev_hold(dev, &tunnel->dev_tracker, GFP_KERNEL);
	netdev_lockdep_set_classes(dev);
	return 0;

cleanup_dst_cache_init:
	dst_cache_destroy(&tunnel->dst_cache);
	return ret;
}

static const struct net_device_ops ip6erspan_netdev_ops = {
	.ndo_init = ip6erspan_tap_init,
	.ndo_uninit = ip6erspan_tunnel_uninit,
	.ndo_start_xmit = ip6erspan_tunnel_xmit,
	.ndo_set_mac_address = eth_mac_addr,
	.ndo_validate_addr = eth_validate_addr,
	.ndo_change_mtu = ip6_tnl_change_mtu,
	.ndo_get_iflink = ip6_tnl_get_iflink,
};

static void ip6gre_tap_setup(struct net_device *dev)
{
	ether_setup(dev);

	dev->max_mtu = 0;
	dev->netdev_ops = &ip6gre_tap_netdev_ops;
	dev->needs_free_netdev = true;
	dev->priv_destructor = ip6gre_dev_free;
	dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;

	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
	netif_keep_dst(dev);
}

static bool ip6gre_netlink_encap_parms(struct nlattr *data[],
				       struct ip_tunnel_encap *ipencap)
{
	bool ret = false;

	memset(ipencap, 0, sizeof(*ipencap));

	if (!data)
		return ret;

	if (data[IFLA_GRE_ENCAP_TYPE]) {
		ret = true;
		ipencap->type = nla_get_u16(data[IFLA_GRE_ENCAP_TYPE]);
	}

	if (data[IFLA_GRE_ENCAP_FLAGS]) {
		ret = true;
		ipencap->flags = nla_get_u16(data[IFLA_GRE_ENCAP_FLAGS]);
	}

	if (data[IFLA_GRE_ENCAP_SPORT]) {
		ret = true;
		ipencap->sport = nla_get_be16(data[IFLA_GRE_ENCAP_SPORT]);
	}

	if (data[IFLA_GRE_ENCAP_DPORT]) {
		ret = true;
		ipencap->dport = nla_get_be16(data[IFLA_GRE_ENCAP_DPORT]);
	}

	return ret;
}

static int ip6gre_newlink_common(struct net *link_net, struct net_device *dev,
				 struct nlattr *tb[], struct nlattr *data[],
				 struct netlink_ext_ack *extack)
{
	struct ip6_tnl *nt;
	struct ip_tunnel_encap ipencap;
	int err;

	nt = netdev_priv(dev);

	if (ip6gre_netlink_encap_parms(data, &ipencap)) {
		int err = ip6_tnl_encap_setup(nt, &ipencap);

		if (err < 0)
			return err;
	}

	if (dev->type == ARPHRD_ETHER && !tb[IFLA_ADDRESS])
		eth_hw_addr_random(dev);

	nt->dev = dev;
	nt->net = link_net;

	err = register_netdevice(dev);
	if (err)
		goto out;

	if (tb[IFLA_MTU])
		ip6_tnl_change_mtu(dev, nla_get_u32(tb[IFLA_MTU]));

out:
	return err;
}
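/* Editorial note (worked example, not part of the original file): for an
 * ip6erspan device the usable MTU computed in ip6erspan_tap_init() is
 *
 *	ETH_DATA_LEN (1500)
 *	  - t_hlen (8-byte GRE header + encap_hlen + version-dependent
 *	            ERSPAN header + 40-byte IPv6 header)
 *	  - ETH_HLEN (14, the inner Ethernet header)
 *	  - 8 more bytes if an encapsulation limit option is in use.
 */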
static int ip6gre_newlink(struct net_device *dev,
			  struct rtnl_newlink_params *params,
			  struct netlink_ext_ack *extack)
{
	struct net *net = params->link_net ? : dev_net(dev);
	struct ip6_tnl *nt = netdev_priv(dev);
	struct nlattr **data = params->data;
	struct nlattr **tb = params->tb;
	struct ip6gre_net *ign;
	int err;

	ip6gre_netlink_parms(data, &nt->parms);
	ign = net_generic(net, ip6gre_net_id);

	if (nt->parms.collect_md) {
		if (rtnl_dereference(ign->collect_md_tun))
			return -EEXIST;
	} else {
		if (ip6gre_tunnel_find(net, &nt->parms, dev->type))
			return -EEXIST;
	}

	err = ip6gre_newlink_common(net, dev, tb, data, extack);
	if (!err) {
		ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
		ip6gre_tunnel_link_md(ign, nt);
		ip6gre_tunnel_link(net_generic(net, ip6gre_net_id), nt);
	}
	return err;
}

static struct ip6_tnl *
ip6gre_changelink_common(struct net_device *dev, struct nlattr *tb[],
			 struct nlattr *data[], struct __ip6_tnl_parm *p_p,
			 struct netlink_ext_ack *extack)
{
	struct ip6_tnl *t, *nt = netdev_priv(dev);
	struct net *net = nt->net;
	struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);
	struct ip_tunnel_encap ipencap;

	if (dev == ign->fb_tunnel_dev)
		return ERR_PTR(-EINVAL);

	if (ip6gre_netlink_encap_parms(data, &ipencap)) {
		int err = ip6_tnl_encap_setup(nt, &ipencap);

		if (err < 0)
			return ERR_PTR(err);
	}

	ip6gre_netlink_parms(data, p_p);

	t = ip6gre_tunnel_locate(net, p_p, 0);

	if (t) {
		if (t->dev != dev)
			return ERR_PTR(-EEXIST);
	} else {
		t = nt;
	}

	return t;
}

static int ip6gre_changelink(struct net_device *dev, struct nlattr *tb[],
			     struct nlattr *data[],
			     struct netlink_ext_ack *extack)
{
	struct ip6_tnl *t = netdev_priv(dev);
	struct ip6gre_net *ign = net_generic(t->net, ip6gre_net_id);
	struct __ip6_tnl_parm p;

	t = ip6gre_changelink_common(dev, tb, data, &p, extack);
	if (IS_ERR(t))
		return PTR_ERR(t);

	ip6gre_tunnel_unlink_md(ign, t);
	ip6gre_tunnel_unlink(ign, t);
	ip6gre_tnl_change(t, &p, !tb[IFLA_MTU]);
	ip6gre_tunnel_link_md(ign, t);
	ip6gre_tunnel_link(ign, t);
	return 0;
}

static void ip6gre_dellink(struct net_device *dev, struct list_head *head)
{
	struct net *net = dev_net(dev);
	struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);

	if (dev != ign->fb_tunnel_dev)
		unregister_netdevice_queue(dev, head);
}

static size_t ip6gre_get_size(const struct net_device *dev)
{
	return
		/* IFLA_GRE_LINK */
		nla_total_size(4) +
		/* IFLA_GRE_IFLAGS */
		nla_total_size(2) +
		/* IFLA_GRE_OFLAGS */
		nla_total_size(2) +
		/* IFLA_GRE_IKEY */
		nla_total_size(4) +
		/* IFLA_GRE_OKEY */
		nla_total_size(4) +
		/* IFLA_GRE_LOCAL */
		nla_total_size(sizeof(struct in6_addr)) +
		/* IFLA_GRE_REMOTE */
		nla_total_size(sizeof(struct in6_addr)) +
		/* IFLA_GRE_TTL */
		nla_total_size(1) +
		/* IFLA_GRE_ENCAP_LIMIT */
		nla_total_size(1) +
		/* IFLA_GRE_FLOWINFO */
		nla_total_size(4) +
		/* IFLA_GRE_FLAGS */
		nla_total_size(4) +
		/* IFLA_GRE_ENCAP_TYPE */
		nla_total_size(2) +
		/* IFLA_GRE_ENCAP_FLAGS */
		nla_total_size(2) +
		/* IFLA_GRE_ENCAP_SPORT */
		nla_total_size(2) +
		/* IFLA_GRE_ENCAP_DPORT */
		nla_total_size(2) +
		/* IFLA_GRE_COLLECT_METADATA */
		nla_total_size(0) +
		/* IFLA_GRE_FWMARK */
		nla_total_size(4) +
		/* IFLA_GRE_ERSPAN_INDEX */
		nla_total_size(4) +
		0;
}
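/* Editorial note (worked example, not part of the original file): each
 * nla_total_size(n) above is NLA_ALIGN(NLA_HDRLEN + n) with NLA_HDRLEN = 4,
 * so a u32 attribute costs 8 bytes, a u16 attribute also costs 8
 * (4 + 2 rounded up to a 4-byte boundary), a flag attribute (payload 0)
 * costs 4, and a struct in6_addr (16 bytes) costs 20.
 */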
static int ip6gre_fill_info(struct sk_buff *skb, const struct net_device *dev)
{
	struct ip6_tnl *t = netdev_priv(dev);
	struct __ip6_tnl_parm *p = &t->parms;
	IP_TUNNEL_DECLARE_FLAGS(o_flags);

	ip_tunnel_flags_copy(o_flags, p->o_flags);

	if (p->erspan_ver == 1 || p->erspan_ver == 2) {
		if (!p->collect_md)
			__set_bit(IP_TUNNEL_KEY_BIT, o_flags);

		if (nla_put_u8(skb, IFLA_GRE_ERSPAN_VER, p->erspan_ver))
			goto nla_put_failure;

		if (p->erspan_ver == 1) {
			if (nla_put_u32(skb, IFLA_GRE_ERSPAN_INDEX, p->index))
				goto nla_put_failure;
		} else {
			if (nla_put_u8(skb, IFLA_GRE_ERSPAN_DIR, p->dir))
				goto nla_put_failure;
			if (nla_put_u16(skb, IFLA_GRE_ERSPAN_HWID, p->hwid))
				goto nla_put_failure;
		}
	}

	if (nla_put_u32(skb, IFLA_GRE_LINK, p->link) ||
	    nla_put_be16(skb, IFLA_GRE_IFLAGS,
			 gre_tnl_flags_to_gre_flags(p->i_flags)) ||
	    nla_put_be16(skb, IFLA_GRE_OFLAGS,
			 gre_tnl_flags_to_gre_flags(o_flags)) ||
	    nla_put_be32(skb, IFLA_GRE_IKEY, p->i_key) ||
	    nla_put_be32(skb, IFLA_GRE_OKEY, p->o_key) ||
	    nla_put_in6_addr(skb, IFLA_GRE_LOCAL, &p->laddr) ||
	    nla_put_in6_addr(skb, IFLA_GRE_REMOTE, &p->raddr) ||
	    nla_put_u8(skb, IFLA_GRE_TTL, p->hop_limit) ||
	    nla_put_u8(skb, IFLA_GRE_ENCAP_LIMIT, p->encap_limit) ||
	    nla_put_be32(skb, IFLA_GRE_FLOWINFO, p->flowinfo) ||
	    nla_put_u32(skb, IFLA_GRE_FLAGS, p->flags) ||
	    nla_put_u32(skb, IFLA_GRE_FWMARK, p->fwmark))
		goto nla_put_failure;

	if (nla_put_u16(skb, IFLA_GRE_ENCAP_TYPE, t->encap.type) ||
	    nla_put_be16(skb, IFLA_GRE_ENCAP_SPORT, t->encap.sport) ||
	    nla_put_be16(skb, IFLA_GRE_ENCAP_DPORT, t->encap.dport) ||
	    nla_put_u16(skb, IFLA_GRE_ENCAP_FLAGS, t->encap.flags))
		goto nla_put_failure;

	if (p->collect_md) {
		if (nla_put_flag(skb, IFLA_GRE_COLLECT_METADATA))
			goto nla_put_failure;
	}

	return 0;

nla_put_failure:
	return -EMSGSIZE;
}

static const struct nla_policy ip6gre_policy[IFLA_GRE_MAX + 1] = {
	[IFLA_GRE_LINK]        = { .type = NLA_U32 },
	[IFLA_GRE_IFLAGS]      = { .type = NLA_U16 },
	[IFLA_GRE_OFLAGS]      = { .type = NLA_U16 },
	[IFLA_GRE_IKEY]        = { .type = NLA_U32 },
	[IFLA_GRE_OKEY]        = { .type = NLA_U32 },
	[IFLA_GRE_LOCAL]       = { .len = sizeof_field(struct ipv6hdr, saddr) },
	[IFLA_GRE_REMOTE]      = { .len = sizeof_field(struct ipv6hdr, daddr) },
	[IFLA_GRE_TTL]         = { .type = NLA_U8 },
	[IFLA_GRE_ENCAP_LIMIT] = { .type = NLA_U8 },
	[IFLA_GRE_FLOWINFO]    = { .type = NLA_U32 },
	[IFLA_GRE_FLAGS]       = { .type = NLA_U32 },
	[IFLA_GRE_ENCAP_TYPE]   = { .type = NLA_U16 },
	[IFLA_GRE_ENCAP_FLAGS]  = { .type = NLA_U16 },
	[IFLA_GRE_ENCAP_SPORT]  = { .type = NLA_U16 },
	[IFLA_GRE_ENCAP_DPORT]  = { .type = NLA_U16 },
	[IFLA_GRE_COLLECT_METADATA] = { .type = NLA_FLAG },
	[IFLA_GRE_FWMARK]       = { .type = NLA_U32 },
	[IFLA_GRE_ERSPAN_INDEX] = { .type = NLA_U32 },
	[IFLA_GRE_ERSPAN_VER]	= { .type = NLA_U8 },
	[IFLA_GRE_ERSPAN_DIR]	= { .type = NLA_U8 },
	[IFLA_GRE_ERSPAN_HWID]	= { .type = NLA_U16 },
};

static void ip6erspan_tap_setup(struct net_device *dev)
{
	ether_setup(dev);

	dev->max_mtu = 0;
	dev->netdev_ops = &ip6erspan_netdev_ops;
	dev->needs_free_netdev = true;
	dev->priv_destructor = ip6gre_dev_free;
	dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;

	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
	netif_keep_dst(dev);
}
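/* Editorial note (illustrative, not part of the original file; iproute2
 * option names quoted from memory): an ERSPAN v1 device that satisfies
 * ip6erspan_tap_validate() carries both GRE_SEQ and GRE_KEY, e.g.:
 *
 *	ip link add dev ers0 type ip6erspan seq key 100 \
 *		local 2001:db8::1 remote 2001:db8::2 erspan_ver 1 erspan 123
 */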
static int ip6erspan_newlink(struct net_device *dev,
			     struct rtnl_newlink_params *params,
			     struct netlink_ext_ack *extack)
{
	struct net *net = params->link_net ? : dev_net(dev);
	struct ip6_tnl *nt = netdev_priv(dev);
	struct nlattr **data = params->data;
	struct nlattr **tb = params->tb;
	struct ip6gre_net *ign;
	int err;

	ip6gre_netlink_parms(data, &nt->parms);
	ip6erspan_set_version(data, &nt->parms);
	ign = net_generic(net, ip6gre_net_id);

	if (nt->parms.collect_md) {
		if (rtnl_dereference(ign->collect_md_tun_erspan))
			return -EEXIST;
	} else {
		if (ip6gre_tunnel_find(net, &nt->parms, dev->type))
			return -EEXIST;
	}

	err = ip6gre_newlink_common(net, dev, tb, data, extack);
	if (!err) {
		ip6erspan_tnl_link_config(nt, !tb[IFLA_MTU]);
		ip6erspan_tunnel_link_md(ign, nt);
		ip6gre_tunnel_link(net_generic(net, ip6gre_net_id), nt);
	}
	return err;
}

static void ip6erspan_tnl_link_config(struct ip6_tnl *t, int set_mtu)
{
	ip6gre_tnl_link_config_common(t);
	ip6gre_tnl_link_config_route(t, set_mtu, ip6erspan_calc_hlen(t));
}

static int ip6erspan_tnl_change(struct ip6_tnl *t,
				const struct __ip6_tnl_parm *p, int set_mtu)
{
	ip6gre_tnl_copy_tnl_parm(t, p);
	ip6erspan_tnl_link_config(t, set_mtu);
	return 0;
}

static int ip6erspan_changelink(struct net_device *dev, struct nlattr *tb[],
				struct nlattr *data[],
				struct netlink_ext_ack *extack)
{
	struct ip6gre_net *ign = net_generic(dev_net(dev), ip6gre_net_id);
	struct __ip6_tnl_parm p;
	struct ip6_tnl *t;

	t = ip6gre_changelink_common(dev, tb, data, &p, extack);
	if (IS_ERR(t))
		return PTR_ERR(t);

	ip6erspan_set_version(data, &p);
	ip6gre_tunnel_unlink_md(ign, t);
	ip6gre_tunnel_unlink(ign, t);
	ip6erspan_tnl_change(t, &p, !tb[IFLA_MTU]);
	ip6erspan_tunnel_link_md(ign, t);
	ip6gre_tunnel_link(ign, t);
	return 0;
}

static struct rtnl_link_ops ip6gre_link_ops __read_mostly = {
	.kind		= "ip6gre",
	.maxtype	= IFLA_GRE_MAX,
	.policy		= ip6gre_policy,
	.priv_size	= sizeof(struct ip6_tnl),
	.setup		= ip6gre_tunnel_setup,
	.validate	= ip6gre_tunnel_validate,
	.newlink	= ip6gre_newlink,
	.changelink	= ip6gre_changelink,
	.dellink	= ip6gre_dellink,
	.get_size	= ip6gre_get_size,
	.fill_info	= ip6gre_fill_info,
	.get_link_net	= ip6_tnl_get_link_net,
};

static struct rtnl_link_ops ip6gre_tap_ops __read_mostly = {
	.kind		= "ip6gretap",
	.maxtype	= IFLA_GRE_MAX,
	.policy		= ip6gre_policy,
	.priv_size	= sizeof(struct ip6_tnl),
	.setup		= ip6gre_tap_setup,
	.validate	= ip6gre_tap_validate,
	.newlink	= ip6gre_newlink,
	.changelink	= ip6gre_changelink,
	.get_size	= ip6gre_get_size,
	.fill_info	= ip6gre_fill_info,
	.get_link_net	= ip6_tnl_get_link_net,
};

static struct rtnl_link_ops ip6erspan_tap_ops __read_mostly = {
	.kind		= "ip6erspan",
	.maxtype	= IFLA_GRE_MAX,
	.policy		= ip6gre_policy,
	.priv_size	= sizeof(struct ip6_tnl),
	.setup		= ip6erspan_tap_setup,
	.validate	= ip6erspan_tap_validate,
	.newlink	= ip6erspan_newlink,
	.changelink	= ip6erspan_changelink,
	.get_size	= ip6gre_get_size,
	.fill_info	= ip6gre_fill_info,
	.get_link_net	= ip6_tnl_get_link_net,
};
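/* Editorial note (not part of the original file): the .kind strings above
 * are what IFLA_INFO_KIND selects on, so "type ip6gre", "type ip6gretap"
 * and "type ip6erspan" on an ip(8) command line dispatch to the matching
 * ops table; the three kinds share the policy, get_size and fill_info
 * callbacks.
 */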
/*
 *	And now the modules code and kernel interface.
 */

static int __init ip6gre_init(void)
{
	int err;

	pr_info("GRE over IPv6 tunneling driver\n");

	err = register_pernet_device(&ip6gre_net_ops);
	if (err < 0)
		return err;

	err = inet6_add_protocol(&ip6gre_protocol, IPPROTO_GRE);
	if (err < 0) {
		pr_info("%s: can't add protocol\n", __func__);
		goto add_proto_failed;
	}

	err = rtnl_link_register(&ip6gre_link_ops);
	if (err < 0)
		goto rtnl_link_failed;

	err = rtnl_link_register(&ip6gre_tap_ops);
	if (err < 0)
		goto tap_ops_failed;

	err = rtnl_link_register(&ip6erspan_tap_ops);
	if (err < 0)
		goto erspan_link_failed;

out:
	return err;

erspan_link_failed:
	rtnl_link_unregister(&ip6gre_tap_ops);
tap_ops_failed:
	rtnl_link_unregister(&ip6gre_link_ops);
rtnl_link_failed:
	inet6_del_protocol(&ip6gre_protocol, IPPROTO_GRE);
add_proto_failed:
	unregister_pernet_device(&ip6gre_net_ops);
	goto out;
}

static void __exit ip6gre_fini(void)
{
	rtnl_link_unregister(&ip6gre_tap_ops);
	rtnl_link_unregister(&ip6gre_link_ops);
	rtnl_link_unregister(&ip6erspan_tap_ops);
	inet6_del_protocol(&ip6gre_protocol, IPPROTO_GRE);
	unregister_pernet_device(&ip6gre_net_ops);
}

module_init(ip6gre_init);
module_exit(ip6gre_fini);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("D. Kozlov <xeb@mail.ru>");
MODULE_DESCRIPTION("GRE over IPv6 tunneling device");
MODULE_ALIAS_RTNL_LINK("ip6gre");
MODULE_ALIAS_RTNL_LINK("ip6gretap");
MODULE_ALIAS_RTNL_LINK("ip6erspan");
MODULE_ALIAS_NETDEV("ip6gre0");
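/* Editorial note (not part of the original file): ip6gre_init() unwinds in
 * reverse registration order on failure: a failure registering
 * ip6erspan_tap_ops unregisters the two earlier link ops, removes the GRE
 * protocol handler, then unregisters the pernet device, so no partial
 * state is left behind.
 */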
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * INET		An implementation of the TCP/IP protocol suite for the LINUX
 *		operating system.  INET is implemented using the BSD Socket
 *		interface as the means of communication with the user level.
 *
 *		Definitions for the AF_INET socket handler.
 *
 * Version:	@(#)sock.h	1.0.4	05/13/93
 *
 * Authors:	Ross Biro
 *		Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
 *		Corey Minyard <wf-rch!minyard@relay.EU.net>
 *		Florian La Roche <flla@stud.uni-sb.de>
 *
 * Fixes:
 *		Alan Cox	:	Volatiles in skbuff pointers. See
 *					skbuff comments. May be overdone,
 *					better to prove they can be removed
 *					than the reverse.
 *		Alan Cox	:	Added a zapped field for tcp to note
 *					a socket is reset and must stay shut up
 *		Alan Cox	:	New fields for options
 *		Pauline Middelink :	identd support
 *		Alan Cox	:	Eliminate low level recv/recvfrom
 *		David S. Miller	:	New socket lookup architecture.
 *		Steve Whitehouse:	Default routines for sock_ops
 *		Arnaldo C. Melo :	removed net_pinfo, tp_pinfo and made
 *					protinfo be just a void pointer, as the
 *					protocol specific parts were moved to
 *					respective headers and ipv4/v6, etc now
 *					use private slabcaches for its socks
 *		Pedro Hortas	:	New flags field for socket options
 */
#ifndef _SOCK_H
#define _SOCK_H

#include <linux/hardirq.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/list_nulls.h>
#include <linux/timer.h>
#include <linux/cache.h>
#include <linux/bitops.h>
#include <linux/lockdep.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>	/* struct sk_buff */
#include <linux/mm.h>
#include <linux/security.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/page_counter.h>
#include <linux/memcontrol.h>
#include <linux/static_key.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/cgroup-defs.h>
#include <linux/rbtree.h>
#include <linux/rculist_nulls.h>
#include <linux/poll.h>
#include <linux/sockptr.h>
#include <linux/indirect_call_wrapper.h>
#include <linux/atomic.h>
#include <linux/refcount.h>
#include <linux/llist.h>
#include <net/dst.h>
#include <net/checksum.h>
#include <net/tcp_states.h>
#include <linux/net_tstamp.h>
#include <net/l3mdev.h>
#include <uapi/linux/socket.h>

/*
 * This structure really needs to be cleaned up.
 * Most of it is for TCP, and not used by any of
 * the other protocols.
 */

/* This is the per-socket lock.  The spinlock provides a synchronization
 * between user contexts and software interrupt processing, whereas the
 * mini-semaphore synchronizes multiple users amongst themselves.
 */
typedef struct {
	spinlock_t		slock;
	int			owned;
	wait_queue_head_t	wq;
	/*
	 * We express the mutex-alike socket_lock semantics
	 * to the lock validator by explicitly managing
	 * the slock as a lock variant (in addition to
	 * the slock itself):
	 */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map dep_map;
#endif
} socket_lock_t;

struct sock;
struct proto;
struct net;

typedef __u32 __bitwise __portpair;
typedef __u64 __bitwise __addrpair;

/**
 *	struct sock_common - minimal network layer representation of sockets
 *	@skc_daddr: Foreign IPv4 addr
 *	@skc_rcv_saddr: Bound local IPv4 addr
 *	@skc_addrpair: 8-byte-aligned __u64 union of @skc_daddr & @skc_rcv_saddr
 *	@skc_hash: hash value used with various protocol lookup tables
 *	@skc_u16hashes: two u16 hash values used by UDP lookup tables
 *	@skc_dport: placeholder for inet_dport/tw_dport
 *	@skc_num: placeholder for inet_num/tw_num
 *	@skc_portpair: __u32 union of @skc_dport & @skc_num
 *	@skc_family: network address family
 *	@skc_state: Connection state
 *	@skc_reuse: %SO_REUSEADDR setting
 *	@skc_reuseport: %SO_REUSEPORT setting
 *	@skc_ipv6only: socket is IPV6 only
 *	@skc_net_refcnt: socket is using net ref counting
 *	@skc_bound_dev_if: bound device index if != 0
 *	@skc_bind_node: bind hash linkage for various protocol lookup tables
 *	@skc_portaddr_node: second hash linkage for UDP/UDP-Lite protocol
 *	@skc_prot: protocol handlers inside a network family
 *	@skc_net: reference to the network namespace of this socket
 *	@skc_v6_daddr: IPV6 destination address
 *	@skc_v6_rcv_saddr: IPV6 source address
 *	@skc_cookie: socket's cookie value
 *	@skc_node: main hash linkage for various protocol lookup tables
 *	@skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol
 *	@skc_tx_queue_mapping: tx queue number for this connection
 *	@skc_rx_queue_mapping: rx queue number for this connection
 *	@skc_flags: place holder for sk_flags
 *		%SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
 *		%SO_OOBINLINE settings, %SO_TIMESTAMPING settings
 *	@skc_listener: connection request listener socket (aka rsk_listener)
 *		[union with @skc_flags]
 *	@skc_tw_dr: (aka tw_dr) ptr to &struct inet_timewait_death_row
 *		[union with @skc_flags]
 *	@skc_incoming_cpu: record/match cpu processing incoming packets
 *	@skc_rcv_wnd: (aka rsk_rcv_wnd) TCP receive window size (possibly scaled)
 *		[union with @skc_incoming_cpu]
 *	@skc_tw_rcv_nxt: (aka tw_rcv_nxt) TCP window next expected seq number
 *		[union with @skc_incoming_cpu]
 *	@skc_refcnt: reference count
 *
 *	This is the minimal network layer representation of sockets, the header
 *	for struct sock and struct inet_timewait_sock.
 */
struct sock_common {
	union {
		__addrpair	skc_addrpair;
		struct {
			__be32	skc_daddr;
			__be32	skc_rcv_saddr;
		};
	};
	union  {
		unsigned int	skc_hash;
		__u16		skc_u16hashes[2];
	};
	/* skc_dport && skc_num must be grouped as well */
	union {
		__portpair	skc_portpair;
		struct {
			__be16	skc_dport;
			__u16	skc_num;
		};
	};

	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse:4;
	unsigned char		skc_reuseport:1;
	unsigned char		skc_ipv6only:1;
	unsigned char		skc_net_refcnt:1;
	int			skc_bound_dev_if;
	union {
		struct hlist_node	skc_bind_node;
		struct hlist_node	skc_portaddr_node;
	};
	struct proto		*skc_prot;
	possible_net_t		skc_net;

#if IS_ENABLED(CONFIG_IPV6)
	struct in6_addr		skc_v6_daddr;
	struct in6_addr		skc_v6_rcv_saddr;
#endif

	atomic64_t		skc_cookie;

	/* following fields are padding to force
	 * offset(struct sock, sk_refcnt) == 128 on 64bit arches
	 * assuming IPV6 is enabled. We use this padding differently
	 * for different kind of 'sockets'
	 */
	union {
		unsigned long	skc_flags;
		struct sock	*skc_listener; /* request_sock */
		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
	};
	/*
	 * fields between dontcopy_begin/dontcopy_end
	 * are not copied in sock_copy()
	 */
	/* private: */
	int			skc_dontcopy_begin[0];
	/* public: */
	union {
		struct hlist_node	skc_node;
		struct hlist_nulls_node skc_nulls_node;
	};
	unsigned short		skc_tx_queue_mapping;
#ifdef CONFIG_SOCK_RX_QUEUE_MAPPING
	unsigned short		skc_rx_queue_mapping;
#endif
	union {
		int		skc_incoming_cpu;
		u32		skc_rcv_wnd;
		u32		skc_tw_rcv_nxt; /* struct tcp_timewait_sock  */
	};

	refcount_t		skc_refcnt;
	/* private: */
	int                     skc_dontcopy_end[0];
	union {
		u32		skc_rxhash;
		u32		skc_window_clamp;
		u32		skc_tw_snd_nxt; /* struct tcp_timewait_sock */
	};
	/* public: */
};
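/* Editorial note (not part of the original file): the leading unions let
 * lookup code compare both IPv4 addresses (or the port pair) in one machine
 * word: with the documented 8-byte alignment, skc_addrpair overlays
 * skc_daddr/skc_rcv_saddr, so a single 64-bit compare on 64-bit arches can
 * match "daddr == X && rcv_saddr == Y".
 */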
struct bpf_local_storage;
struct sk_filter;

/**
  *	struct sock - network layer representation of sockets
  *	@__sk_common: shared layout with inet_timewait_sock
  *	@sk_shutdown: mask of %SEND_SHUTDOWN and/or %RCV_SHUTDOWN
  *	@sk_userlocks: %SO_SNDBUF and %SO_RCVBUF settings
  *	@sk_lock:	synchronizer
  *	@sk_kern_sock: True if sock is using kernel lock classes
  *	@sk_rcvbuf: size of receive buffer in bytes
  *	@sk_wq: sock wait queue and async head
  *	@sk_rx_dst: receive input route used by early demux
  *	@sk_rx_dst_ifindex: ifindex for @sk_rx_dst
  *	@sk_rx_dst_cookie: cookie for @sk_rx_dst
  *	@sk_dst_cache: destination cache
  *	@sk_dst_pending_confirm: need to confirm neighbour
  *	@sk_policy: flow policy
  *	@sk_receive_queue: incoming packets
  *	@sk_wmem_alloc: transmit queue bytes committed
  *	@sk_tsq_flags: TCP Small Queues flags
  *	@sk_write_queue: Packet sending queue
  *	@sk_omem_alloc: "o" is "option" or "other"
  *	@sk_wmem_queued: persistent queue size
  *	@sk_forward_alloc: space allocated forward
  *	@sk_reserved_mem: space reserved and non-reclaimable for the socket
  *	@sk_napi_id: id of the last napi context to receive data for sk
  *	@sk_ll_usec: usecs to busypoll when there is no data
  *	@sk_allocation: allocation mode
  *	@sk_pacing_rate: Pacing rate (if supported by transport/packet scheduler)
  *	@sk_pacing_status: Pacing status (requested, handled by sch_fq)
  *	@sk_max_pacing_rate: Maximum pacing rate (%SO_MAX_PACING_RATE)
  *	@sk_sndbuf: size of send buffer in bytes
  *	@sk_no_check_tx: %SO_NO_CHECK setting, set checksum in TX packets
  *	@sk_no_check_rx: allow zero checksum in RX packets
  *	@sk_route_caps: route capabilities (e.g. %NETIF_F_TSO)
  *	@sk_gso_disabled: if set, NETIF_F_GSO_MASK is forbidden.
  *	@sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
  *	@sk_gso_max_size: Maximum GSO segment size to build
  *	@sk_gso_max_segs: Maximum number of GSO segments
  *	@sk_pacing_shift: scaling factor for TCP Small Queues
  *	@sk_lingertime: %SO_LINGER l_linger setting
  *	@sk_backlog: always used with the per-socket spinlock held
  *	@sk_callback_lock: used with the callbacks in the end of this struct
  *	@sk_error_queue: rarely used
  *	@sk_prot_creator: sk_prot of original sock creator (see ipv6_setsockopt,
  *			  IPV6_ADDRFORM for instance)
  *	@sk_err: last error
  *	@sk_err_soft: errors that don't cause failure but are the cause of a
  *		      persistent failure not just 'timed out'
  *	@sk_drops: raw/udp drops counter
  *	@sk_ack_backlog: current listen backlog
  *	@sk_max_ack_backlog: listen backlog set in listen()
  *	@sk_uid: user id of owner
  *	@sk_prefer_busy_poll: prefer busypolling over softirq processing
  *	@sk_busy_poll_budget: napi processing budget when busypolling
  *	@sk_priority: %SO_PRIORITY setting
  *	@sk_type: socket type (%SOCK_STREAM, etc)
  *	@sk_protocol: which protocol this socket belongs in this network family
  *	@sk_peer_lock: lock protecting @sk_peer_pid and @sk_peer_cred
  *	@sk_peer_pid: &struct pid for this socket's peer
  *	@sk_peer_cred: %SO_PEERCRED setting
  *	@sk_rcvlowat: %SO_RCVLOWAT setting
  *	@sk_rcvtimeo: %SO_RCVTIMEO setting
  *	@sk_sndtimeo: %SO_SNDTIMEO setting
  *	@sk_txhash: computed flow hash for use on transmit
  *	@sk_txrehash: enable TX hash rethink
  *	@sk_filter: socket filtering instructions
  *	@sk_timer: sock cleanup timer
  *	@sk_stamp: time stamp of last packet received
  *	@sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
  *	@sk_tsflags: SO_TIMESTAMPING flags
  *	@sk_bpf_cb_flags: used in bpf_setsockopt()
  *	@sk_use_task_frag: allow sk_page_frag() to use current->task_frag.
  *			   Sockets that can be used under memory reclaim should
  *			   set this to false.
  *	@sk_bind_phc: SO_TIMESTAMPING bind PHC index of PTP virtual clock
  *		      for timestamping
  *	@sk_tskey: counter to disambiguate concurrent tstamp requests
  *	@sk_zckey: counter to order MSG_ZEROCOPY notifications
  *	@sk_socket: Identd and reporting IO signals
  *	@sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock.
  *	@sk_frag: cached page frag
  *	@sk_peek_off: current peek_offset value
  *	@sk_send_head: front of stuff to transmit
  *	@tcp_rtx_queue: TCP re-transmit queue [union with @sk_send_head]
  *	@sk_security: used by security modules
  *	@sk_mark: generic packet mark
  *	@sk_cgrp_data: cgroup data for this cgroup
  *	@sk_memcg: this socket's memory cgroup association
  *	@sk_write_pending: a write to stream socket waits to start
  *	@sk_disconnects: number of disconnect operations performed on this sock
  *	@sk_state_change: callback to indicate change in the state of the sock
  *	@sk_data_ready: callback to indicate there is data to be processed
  *	@sk_write_space: callback to indicate there is buffer sending space available
  *	@sk_error_report: callback to indicate errors (e.g. %MSG_ERRQUEUE)
  *	@sk_backlog_rcv: callback to process the backlog
  *	@sk_validate_xmit_skb: ptr to an optional validate function
  *	@sk_destruct: called at sock freeing time, i.e. when all refcnt == 0
  *	@sk_reuseport_cb: reuseport group container
  *	@sk_bpf_storage: ptr to cache and control for bpf_sk_storage
  *	@sk_rcu: used during RCU grace period
  *	@sk_clockid: clockid used by time-based scheduling (SO_TXTIME)
  *	@sk_txtime_deadline_mode: set deadline mode for SO_TXTIME
  *	@sk_txtime_report_errors: set report errors mode for SO_TXTIME
  *	@sk_txtime_unused: unused txtime flags
  *	@sk_scm_recv_flags: all flags used by scm_recv()
  *	@sk_scm_credentials: flagged by SO_PASSCRED to recv SCM_CREDENTIALS
  *	@sk_scm_security: flagged by SO_PASSSEC to recv SCM_SECURITY
  *	@sk_scm_pidfd: flagged by SO_PASSPIDFD to recv SCM_PIDFD
  *	@sk_scm_rights: flagged by SO_PASSRIGHTS to recv SCM_RIGHTS
  *	@sk_scm_unused: unused flags for scm_recv()
  *	@ns_tracker: tracker for netns reference
  *	@sk_user_frags: xarray of pages the user is holding a reference on.
  *	@sk_owner: reference to the real owner of the socket that calls
  *		   sock_lock_init_class_and_name().
  */
struct sock {
	/*
	 * Now struct inet_timewait_sock also uses sock_common, so please
	 * don't add anything before this first member (__sk_common) --acme
	 */
	struct sock_common	__sk_common;
#define sk_node			__sk_common.skc_node
#define sk_nulls_node		__sk_common.skc_nulls_node
#define sk_refcnt		__sk_common.skc_refcnt
#define sk_tx_queue_mapping	__sk_common.skc_tx_queue_mapping
#ifdef CONFIG_SOCK_RX_QUEUE_MAPPING
#define sk_rx_queue_mapping	__sk_common.skc_rx_queue_mapping
#endif

#define sk_dontcopy_begin	__sk_common.skc_dontcopy_begin
#define sk_dontcopy_end		__sk_common.skc_dontcopy_end
#define sk_hash			__sk_common.skc_hash
#define sk_portpair		__sk_common.skc_portpair
#define sk_num			__sk_common.skc_num
#define sk_dport		__sk_common.skc_dport
#define sk_addrpair		__sk_common.skc_addrpair
#define sk_daddr		__sk_common.skc_daddr
#define sk_rcv_saddr		__sk_common.skc_rcv_saddr
#define sk_family		__sk_common.skc_family
#define sk_state		__sk_common.skc_state
#define sk_reuse		__sk_common.skc_reuse
#define sk_reuseport		__sk_common.skc_reuseport
#define sk_ipv6only		__sk_common.skc_ipv6only
#define sk_net_refcnt		__sk_common.skc_net_refcnt
#define sk_bound_dev_if		__sk_common.skc_bound_dev_if
#define sk_bind_node		__sk_common.skc_bind_node
#define sk_prot			__sk_common.skc_prot
#define sk_net			__sk_common.skc_net
#define sk_v6_daddr		__sk_common.skc_v6_daddr
#define sk_v6_rcv_saddr	__sk_common.skc_v6_rcv_saddr
#define sk_cookie		__sk_common.skc_cookie
#define sk_incoming_cpu		__sk_common.skc_incoming_cpu
#define sk_flags		__sk_common.skc_flags
#define sk_rxhash		__sk_common.skc_rxhash

	__cacheline_group_begin(sock_write_rx);

	atomic_t		sk_drops;
	__s32			sk_peek_off;
	struct sk_buff_head	sk_error_queue;
	struct sk_buff_head	sk_receive_queue;
	/*
	 * The backlog queue is special, it is always used with
	 * the per-socket spinlock held and requires low latency
	 * access. Therefore we special case its implementation.
	 * Note : rmem_alloc is in this structure to fill a hole
	 * on 64bit arches, not because it's logically part of
	 * backlog.
	 */
	struct {
		atomic_t	rmem_alloc;
		int		len;
		struct sk_buff	*head;
		struct sk_buff	*tail;
	} sk_backlog;
#define sk_rmem_alloc sk_backlog.rmem_alloc

	__cacheline_group_end(sock_write_rx);

	__cacheline_group_begin(sock_read_rx);
	/* early demux fields */
	struct dst_entry __rcu	*sk_rx_dst;
	int			sk_rx_dst_ifindex;
	u32			sk_rx_dst_cookie;

#ifdef CONFIG_NET_RX_BUSY_POLL
	unsigned int		sk_ll_usec;
	unsigned int		sk_napi_id;
	u16			sk_busy_poll_budget;
	u8			sk_prefer_busy_poll;
#endif
	u8			sk_userlocks;
	int			sk_rcvbuf;

	struct sk_filter __rcu	*sk_filter;
	union {
		struct socket_wq __rcu	*sk_wq;
		/* private: */
		struct socket_wq	*sk_wq_raw;
		/* public: */
	};

	void			(*sk_data_ready)(struct sock *sk);
	long			sk_rcvtimeo;
	int			sk_rcvlowat;
	__cacheline_group_end(sock_read_rx);

	__cacheline_group_begin(sock_read_rxtx);
	int			sk_err;
	struct socket		*sk_socket;
	struct mem_cgroup	*sk_memcg;
#ifdef CONFIG_XFRM
	struct xfrm_policy __rcu *sk_policy[2];
#endif
	__cacheline_group_end(sock_read_rxtx);

	__cacheline_group_begin(sock_write_rxtx);
	socket_lock_t		sk_lock;
	u32			sk_reserved_mem;
	int			sk_forward_alloc;
	u32			sk_tsflags;
	__cacheline_group_end(sock_write_rxtx);

	__cacheline_group_begin(sock_write_tx);
	int			sk_write_pending;
	atomic_t		sk_omem_alloc;
	int			sk_sndbuf;

	int			sk_wmem_queued;
	refcount_t		sk_wmem_alloc;
	unsigned long		sk_tsq_flags;
	union {
		struct sk_buff	*sk_send_head;
		struct rb_root	tcp_rtx_queue;
	};
	struct sk_buff_head	sk_write_queue;
	u32			sk_dst_pending_confirm;
	u32			sk_pacing_status; /* see enum sk_pacing */
	struct page_frag	sk_frag;
	struct timer_list	sk_timer;

	unsigned long		sk_pacing_rate; /* bytes per second */
	atomic_t		sk_zckey;
	atomic_t		sk_tskey;
	__cacheline_group_end(sock_write_tx);

	__cacheline_group_begin(sock_read_tx);
	unsigned long		sk_max_pacing_rate;
	long			sk_sndtimeo;
	u32			sk_priority;
	u32			sk_mark;
	struct dst_entry __rcu	*sk_dst_cache;
	netdev_features_t	sk_route_caps;
#ifdef CONFIG_SOCK_VALIDATE_XMIT
	struct sk_buff*		(*sk_validate_xmit_skb)(struct sock *sk,
							struct net_device *dev,
							struct sk_buff *skb);
#endif
	u16			sk_gso_type;
	u16			sk_gso_max_segs;
	unsigned int		sk_gso_max_size;
	gfp_t			sk_allocation;
	u32			sk_txhash;
	u8			sk_pacing_shift;
	bool			sk_use_task_frag;
	__cacheline_group_end(sock_read_tx);

	/*
	 * Because of non atomicity rules, all
	 * changes are protected by socket lock.
	 */
	u8			sk_gso_disabled : 1,
				sk_kern_sock : 1,
				sk_no_check_tx : 1,
				sk_no_check_rx : 1;
	u8			sk_shutdown;
	u16			sk_type;
	u16			sk_protocol;
	unsigned long	        sk_lingertime;
	struct proto		*sk_prot_creator;
	rwlock_t		sk_callback_lock;
	int			sk_err_soft;
	u32			sk_ack_backlog;
	u32			sk_max_ack_backlog;
	kuid_t			sk_uid;
	spinlock_t		sk_peer_lock;
	int			sk_bind_phc;
	struct pid		*sk_peer_pid;
	const struct cred	*sk_peer_cred;

	ktime_t			sk_stamp;
#if BITS_PER_LONG==32
	seqlock_t		sk_stamp_seq;
#endif
	int			sk_disconnects;

	union {
		u8		sk_txrehash;
		u8		sk_scm_recv_flags;
		struct {
			u8	sk_scm_credentials : 1,
				sk_scm_security : 1,
				sk_scm_pidfd : 1,
				sk_scm_rights : 1,
				sk_scm_unused : 4;
		};
	};
	u8			sk_clockid;
	u8			sk_txtime_deadline_mode : 1,
				sk_txtime_report_errors : 1,
				sk_txtime_unused : 6;
#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
	u8			sk_bpf_cb_flags;

	void			*sk_user_data;
#ifdef CONFIG_SECURITY
	void			*sk_security;
#endif
	struct sock_cgroup_data	sk_cgrp_data;
	void			(*sk_state_change)(struct sock *sk);
	void			(*sk_write_space)(struct sock *sk);
	void			(*sk_error_report)(struct sock *sk);
	int			(*sk_backlog_rcv)(struct sock *sk,
						  struct sk_buff *skb);
	void                    (*sk_destruct)(struct sock *sk);
	struct sock_reuseport __rcu	*sk_reuseport_cb;
#ifdef CONFIG_BPF_SYSCALL
	struct bpf_local_storage __rcu	*sk_bpf_storage;
#endif
	struct rcu_head		sk_rcu;
	netns_tracker		ns_tracker;
	struct xarray		sk_user_frags;

#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
	struct module		*sk_owner;
#endif
};

struct sock_bh_locked {
	struct sock *sock;
	local_lock_t bh_lock;
};

enum sk_pacing {
	SK_PACING_NONE		= 0,
	SK_PACING_NEEDED	= 1,
	SK_PACING_FQ		= 2,
};

/* flag bits in sk_user_data
 *
 * - SK_USER_DATA_NOCOPY:      Pointer stored in sk_user_data might
 *   not be suitable for copying when cloning the socket. For instance,
 *   it can point to a reference counted object. sk_user_data bottom
 *   bit is set if pointer must not be copied.
 *
 * - SK_USER_DATA_BPF:         Mark whether sk_user_data field is
 *   managed/owned by a BPF reuseport array. This bit should be set
 *   when sk_user_data's sk is added to the bpf's reuseport_array.
 *
 * - SK_USER_DATA_PSOCK:       Mark whether pointer stored in
 *   sk_user_data points to psock type. This bit should be set
 *   when sk_user_data is assigned to a psock object.
 */
#define SK_USER_DATA_NOCOPY	1UL
#define SK_USER_DATA_BPF	2UL
#define SK_USER_DATA_PSOCK	4UL
#define SK_USER_DATA_PTRMASK	~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF |\
				  SK_USER_DATA_PSOCK)

/**
 * sk_user_data_is_nocopy - Test if sk_user_data pointer must not be copied
 * @sk: socket
 */
static inline bool sk_user_data_is_nocopy(const struct sock *sk)
{
	return ((uintptr_t)sk->sk_user_data & SK_USER_DATA_NOCOPY);
}

#define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))

/**
 * __locked_read_sk_user_data_with_flags - return the pointer
 * only if all the argument flags have been set in sk_user_data. Otherwise
 * return NULL
 *
 * @sk: socket
 * @flags: flag bits
 *
 * The caller must be holding sk->sk_callback_lock.
 */
static inline void *
__locked_read_sk_user_data_with_flags(const struct sock *sk,
				      uintptr_t flags)
{
	uintptr_t sk_user_data =
		(uintptr_t)rcu_dereference_check(__sk_user_data(sk),
						 lockdep_is_held(&sk->sk_callback_lock));

	WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);

	if ((sk_user_data & flags) == flags)
		return (void *)(sk_user_data & SK_USER_DATA_PTRMASK);
	return NULL;
}
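/* Editorial note (worked example, not part of the original file): the flag
 * bits live in the low bits of sk_user_data, which are available because
 * the stored pointers are at least 8-byte aligned on common configurations:
 * SK_USER_DATA_PTRMASK is ~7UL, so a pointer 0x...40 stored together with
 * SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF reads back as 0x...43, and masking
 * with SK_USER_DATA_PTRMASK recovers 0x...40.
 */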
/**
 * __rcu_dereference_sk_user_data_with_flags - return the pointer
 * only if all the argument flags have been set in sk_user_data. Otherwise
 * return NULL
 *
 * @sk: socket
 * @flags: flag bits
 */
static inline void *
__rcu_dereference_sk_user_data_with_flags(const struct sock *sk,
					  uintptr_t flags)
{
	uintptr_t sk_user_data = (uintptr_t)rcu_dereference(__sk_user_data(sk));

	WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);

	if ((sk_user_data & flags) == flags)
		return (void *)(sk_user_data & SK_USER_DATA_PTRMASK);
	return NULL;
}

#define rcu_dereference_sk_user_data(sk)				\
	__rcu_dereference_sk_user_data_with_flags(sk, 0)
#define __rcu_assign_sk_user_data_with_flags(sk, ptr, flags)		\
({									\
	uintptr_t __tmp1 = (uintptr_t)(ptr),				\
		  __tmp2 = (uintptr_t)(flags);				\
	WARN_ON_ONCE(__tmp1 & ~SK_USER_DATA_PTRMASK);			\
	WARN_ON_ONCE(__tmp2 & SK_USER_DATA_PTRMASK);			\
	rcu_assign_pointer(__sk_user_data((sk)),			\
			   __tmp1 | __tmp2);				\
})
#define rcu_assign_sk_user_data(sk, ptr)				\
	__rcu_assign_sk_user_data_with_flags(sk, ptr, 0)

static inline struct net *sock_net(const struct sock *sk)
{
	return read_pnet(&sk->sk_net);
}

static inline void sock_net_set(struct sock *sk, struct net *net)
{
	write_pnet(&sk->sk_net, net);
}

/*
 * SK_CAN_REUSE and SK_NO_REUSE on a socket indicate whether or not its
 * port may be reused by someone else. SK_FORCE_REUSE on a socket means
 * that the socket will reuse everybody else's port without looking at
 * the other's sk_reuse value.
 */

#define SK_NO_REUSE	0
#define SK_CAN_REUSE	1
#define SK_FORCE_REUSE	2

int sk_set_peek_off(struct sock *sk, int val);

static inline int sk_peek_offset(const struct sock *sk, int flags)
{
	if (unlikely(flags & MSG_PEEK)) {
		return READ_ONCE(sk->sk_peek_off);
	}

	return 0;
}

static inline void sk_peek_offset_bwd(struct sock *sk, int val)
{
	s32 off = READ_ONCE(sk->sk_peek_off);

	if (unlikely(off >= 0)) {
		off = max_t(s32, off - val, 0);
		WRITE_ONCE(sk->sk_peek_off, off);
	}
}

static inline void sk_peek_offset_fwd(struct sock *sk, int val)
{
	sk_peek_offset_bwd(sk, -val);
}
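/* Editorial note (worked example, not part of the original file): with
 * SO_PEEK_OFF enabled, a recv(..., MSG_PEEK) of 10 bytes leaves the data
 * queued but moves sk_peek_off forward by 10 via sk_peek_offset_fwd(), so
 * the next peek starts past it; a normal read of those bytes then walks
 * the offset back with sk_peek_offset_bwd(), clamping at zero.
 */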
/*
 * Hashed lists helper routines
 */
static inline struct sock *sk_entry(const struct hlist_node *node)
{
	return hlist_entry(node, struct sock, sk_node);
}

static inline struct sock *__sk_head(const struct hlist_head *head)
{
	return hlist_entry(head->first, struct sock, sk_node);
}

static inline struct sock *sk_head(const struct hlist_head *head)
{
	return hlist_empty(head) ? NULL : __sk_head(head);
}

static inline struct sock *__sk_nulls_head(const struct hlist_nulls_head *head)
{
	return hlist_nulls_entry(head->first, struct sock, sk_nulls_node);
}

static inline struct sock *sk_nulls_head(const struct hlist_nulls_head *head)
{
	return hlist_nulls_empty(head) ? NULL : __sk_nulls_head(head);
}

static inline struct sock *sk_next(const struct sock *sk)
{
	return hlist_entry_safe(sk->sk_node.next, struct sock, sk_node);
}

static inline struct sock *sk_nulls_next(const struct sock *sk)
{
	return (!is_a_nulls(sk->sk_nulls_node.next)) ?
		hlist_nulls_entry(sk->sk_nulls_node.next,
				  struct sock, sk_nulls_node) :
		NULL;
}

static inline bool sk_unhashed(const struct sock *sk)
{
	return hlist_unhashed(&sk->sk_node);
}

static inline bool sk_hashed(const struct sock *sk)
{
	return !sk_unhashed(sk);
}

static inline void sk_node_init(struct hlist_node *node)
{
	node->pprev = NULL;
}

static inline void __sk_del_node(struct sock *sk)
{
	__hlist_del(&sk->sk_node);
}

/* NB: equivalent to hlist_del_init_rcu */
static inline bool __sk_del_node_init(struct sock *sk)
{
	if (sk_hashed(sk)) {
		__sk_del_node(sk);
		sk_node_init(&sk->sk_node);
		return true;
	}
	return false;
}

/* Grab socket reference count. This operation is valid only
   when sk is ALREADY grabbed f.e. it is found in hash table
   or a list and the lookup is made under lock preventing hash table
   modifications.
 */

static __always_inline void sock_hold(struct sock *sk)
{
	refcount_inc(&sk->sk_refcnt);
}

/* Ungrab socket in the context, which assumes that socket refcnt
   cannot hit zero, f.e. it is true in context of any socketcall.
 */
static __always_inline void __sock_put(struct sock *sk)
{
	refcount_dec(&sk->sk_refcnt);
}

static inline bool sk_del_node_init(struct sock *sk)
{
	bool rc = __sk_del_node_init(sk);

	if (rc) {
		/* paranoid for a while -acme */
		WARN_ON(refcount_read(&sk->sk_refcnt) == 1);
		__sock_put(sk);
	}
	return rc;
}
#define sk_del_node_init_rcu(sk)	sk_del_node_init(sk)

static inline bool __sk_nulls_del_node_init_rcu(struct sock *sk)
{
	if (sk_hashed(sk)) {
		hlist_nulls_del_init_rcu(&sk->sk_nulls_node);
		return true;
	}
	return false;
}

static inline bool sk_nulls_del_node_init_rcu(struct sock *sk)
{
	bool rc = __sk_nulls_del_node_init_rcu(sk);

	if (rc) {
		/* paranoid for a while -acme */
		WARN_ON(refcount_read(&sk->sk_refcnt) == 1);
		__sock_put(sk);
	}
	return rc;
}

static inline void __sk_add_node(struct sock *sk, struct hlist_head *list)
{
	hlist_add_head(&sk->sk_node, list);
}

static inline void sk_add_node(struct sock *sk, struct hlist_head *list)
{
	sock_hold(sk);
	__sk_add_node(sk, list);
}

static inline void sk_add_node_rcu(struct sock *sk, struct hlist_head *list)
{
	sock_hold(sk);
	if (IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport &&
	    sk->sk_family == AF_INET6)
		hlist_add_tail_rcu(&sk->sk_node, list);
	else
		hlist_add_head_rcu(&sk->sk_node, list);
}

static inline void sk_add_node_tail_rcu(struct sock *sk, struct hlist_head *list)
{
	sock_hold(sk);
	hlist_add_tail_rcu(&sk->sk_node, list);
}

static inline void __sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list)
{
	hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list);
}

static inline void __sk_nulls_add_node_tail_rcu(struct sock *sk, struct hlist_nulls_head *list)
{
	hlist_nulls_add_tail_rcu(&sk->sk_nulls_node, list);
}

static inline void sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list)
{
	sock_hold(sk);
	__sk_nulls_add_node_rcu(sk, list);
}

static inline void __sk_del_bind_node(struct sock *sk)
{
	__hlist_del(&sk->sk_bind_node);
}

static inline void sk_add_bind_node(struct sock *sk,
					struct hlist_head *list)
{
	hlist_add_head(&sk->sk_bind_node, list);
}

#define sk_for_each(__sk, list) \
	hlist_for_each_entry(__sk, list, sk_node)
#define sk_for_each_rcu(__sk, list) \
	hlist_for_each_entry_rcu(__sk, list, sk_node)
#define sk_nulls_for_each(__sk, node, list) \
	hlist_nulls_for_each_entry(__sk, node, list, sk_nulls_node)
#define sk_nulls_for_each_rcu(__sk, node, list) \
	hlist_nulls_for_each_entry_rcu(__sk, node, list, sk_nulls_node)
#define sk_for_each_from(__sk) \
	hlist_for_each_entry_from(__sk, sk_node)
#define sk_nulls_for_each_from(__sk, node) \
	if (__sk && ({ node = &(__sk)->sk_nulls_node; 1; })) \
		hlist_nulls_for_each_entry_from(__sk, node, sk_nulls_node)
#define sk_for_each_safe(__sk, tmp, list) \
	hlist_for_each_entry_safe(__sk, tmp, list, sk_node)
#define sk_for_each_bound(__sk, list) \
	hlist_for_each_entry(__sk, list, sk_bind_node)
#define sk_for_each_bound_safe(__sk, tmp, list) \
	hlist_for_each_entry_safe(__sk, tmp, list, sk_bind_node)

/**
 * sk_for_each_entry_offset_rcu - iterate over a list at a given struct offset
 * @tpos:	the type * to use as a loop cursor.
 * @pos:	the &struct hlist_node to use as a loop cursor.
 * @head:	the head for your list.
 * @offset:	offset of hlist_node within the struct.
 *
 */
#define sk_for_each_entry_offset_rcu(tpos, pos, head, offset)		       \
	for (pos = rcu_dereference(hlist_first_rcu(head));		       \
	     pos != NULL &&						       \
		({ tpos = (typeof(*tpos) *)((void *)pos - offset); 1;});       \
	     pos = rcu_dereference(hlist_next_rcu(pos)))
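/* Editorial note (illustrative sketch, not part of the original file):
 * a typical lookup walks one hash chain under the appropriate lock or
 * the RCU read side, e.g.:
 *
 *	struct sock *sk;
 *
 *	sk_for_each_rcu(sk, &head) {
 *		if (sk->sk_hash == hash && ...)
 *			return sk;
 *	}
 */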
static inline struct user_namespace *sk_user_ns(const struct sock *sk)
{
	/* Careful: only use this in a context where these parameters
	 * cannot change and are all valid, such as recvmsg from
	 * userspace.
	 */
	return sk->sk_socket->file->f_cred->user_ns;
}

/* Sock flags */
enum sock_flags {
	SOCK_DEAD,
	SOCK_DONE,
	SOCK_URGINLINE,
	SOCK_KEEPOPEN,
	SOCK_LINGER,
	SOCK_DESTROY,
	SOCK_BROADCAST,
	SOCK_TIMESTAMP,
	SOCK_ZAPPED,
	SOCK_USE_WRITE_QUEUE, /* whether to call sk->sk_write_space in sock_wfree */
	SOCK_DBG, /* %SO_DEBUG setting */
	SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
	SOCK_RCVTSTAMPNS, /* %SO_TIMESTAMPNS setting */
	SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
	SOCK_MEMALLOC, /* VM depends on this socket for swapping */
	SOCK_TIMESTAMPING_RX_SOFTWARE,  /* %SOF_TIMESTAMPING_RX_SOFTWARE */
	SOCK_FASYNC, /* fasync() active */
	SOCK_RXQ_OVFL,
	SOCK_ZEROCOPY, /* buffers from userspace */
	SOCK_WIFI_STATUS, /* push wifi status to userspace */
	SOCK_NOFCS, /* Tell NIC not to do the Ethernet FCS.
		     * Will use last 4 bytes of packet sent from
		     * user-space instead.
		     */
	SOCK_FILTER_LOCKED, /* Filter cannot be changed anymore */
	SOCK_SELECT_ERR_QUEUE, /* Wake select on error queue */
	SOCK_RCU_FREE, /* wait rcu grace period in sk_destruct() */
	SOCK_TXTIME,
	SOCK_XDP, /* XDP is attached */
	SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
	SOCK_RCVMARK, /* Receive SO_MARK ancillary data with packet */
	SOCK_RCVPRIORITY, /* Receive SO_PRIORITY ancillary data with packet */
	SOCK_TIMESTAMPING_ANY, /* Copy of sk_tsflags & TSFLAGS_ANY */
};

#define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))

/*
 * The highest bit of sk_tsflags is reserved for kernel-internal
 * SOCKCM_FLAG_TS_OPT_ID. There is a check in core/sock.c to control
 * that SOF_TIMESTAMPING* values do not reach this reserved area
 */
#define SOCKCM_FLAG_TS_OPT_ID	BIT(31)
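/* Editorial note (illustrative, not part of the original file): each enum
 * value above is one bit position in sk->sk_flags, used through the
 * helpers just below, e.g.
 *
 *	sock_set_flag(sk, SOCK_LINGER);
 *	if (sock_flag(sk, SOCK_LINGER))
 *		timeout = sk->sk_lingertime;
 */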
There is a check in core/sock.c to control that * SOF_TIMESTAMPING* values do not reach this reserved area */ #define SOCKCM_FLAG_TS_OPT_ID BIT(31) static inline void sock_copy_flags(struct sock *nsk, const struct sock *osk) { nsk->sk_flags = osk->sk_flags; } static inline void sock_set_flag(struct sock *sk, enum sock_flags flag) { __set_bit(flag, &sk->sk_flags); } static inline void sock_reset_flag(struct sock *sk, enum sock_flags flag) { __clear_bit(flag, &sk->sk_flags); } static inline void sock_valbool_flag(struct sock *sk, enum sock_flags bit, int valbool) { if (valbool) sock_set_flag(sk, bit); else sock_reset_flag(sk, bit); } static inline bool sock_flag(const struct sock *sk, enum sock_flags flag) { return test_bit(flag, &sk->sk_flags); } #ifdef CONFIG_NET DECLARE_STATIC_KEY_FALSE(memalloc_socks_key); static inline int sk_memalloc_socks(void) { return static_branch_unlikely(&memalloc_socks_key); } void __receive_sock(struct file *file); #else static inline int sk_memalloc_socks(void) { return 0; } static inline void __receive_sock(struct file *file) { } #endif static inline gfp_t sk_gfp_mask(const struct sock *sk, gfp_t gfp_mask) { return gfp_mask | (sk->sk_allocation & __GFP_MEMALLOC); } static inline void sk_acceptq_removed(struct sock *sk) { WRITE_ONCE(sk->sk_ack_backlog, sk->sk_ack_backlog - 1); } static inline void sk_acceptq_added(struct sock *sk) { WRITE_ONCE(sk->sk_ack_backlog, sk->sk_ack_backlog + 1); } /* Note: If you think the test should be: * return READ_ONCE(sk->sk_ack_backlog) >= READ_ONCE(sk->sk_max_ack_backlog); * Then please take a look at commit 64a146513f8f ("[NET]: Revert incorrect accept queue backlog changes.") */ static inline bool sk_acceptq_is_full(const struct sock *sk) { return READ_ONCE(sk->sk_ack_backlog) > READ_ONCE(sk->sk_max_ack_backlog); } /* * Compute minimal free write space needed to queue new packets. */ static inline int sk_stream_min_wspace(const struct sock *sk) { return READ_ONCE(sk->sk_wmem_queued) >> 1; } static inline int sk_stream_wspace(const struct sock *sk) { return READ_ONCE(sk->sk_sndbuf) - READ_ONCE(sk->sk_wmem_queued); } static inline void sk_wmem_queued_add(struct sock *sk, int val) { WRITE_ONCE(sk->sk_wmem_queued, sk->sk_wmem_queued + val); } static inline void sk_forward_alloc_add(struct sock *sk, int val) { /* Paired with lockless reads of sk->sk_forward_alloc */ WRITE_ONCE(sk->sk_forward_alloc, sk->sk_forward_alloc + val); } void sk_stream_write_space(struct sock *sk); /* OOB backlog add */ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) { /* dont let skb dst not refcounted, we are going to leave rcu lock */ skb_dst_force(skb); if (!sk->sk_backlog.tail) WRITE_ONCE(sk->sk_backlog.head, skb); else sk->sk_backlog.tail->next = skb; WRITE_ONCE(sk->sk_backlog.tail, skb); skb->next = NULL; } /* * Take into account size of receive queue and backlog queue * Do not take into account this skb truesize, * to allow even a single big packet to come. */ static inline bool sk_rcvqueues_full(const struct sock *sk, unsigned int limit) { unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc); return qsize > limit; } /* The per-socket spinlock must be held here. 
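 *
 * As an illustrative aside that is not part of the original comment: a
 * typical caller, e.g. __sk_receive_skb() in net/core/sock.c, uses this
 * roughly as follows (fragment only, variable names as in that file):
 *
 *	bh_lock_sock_nested(sk);
 *	if (!sock_owned_by_user(sk))
 *		rc = sk_backlog_rcv(sk, skb);
 *	else if (sk_add_backlog(sk, skb, READ_ONCE(sk->sk_rcvbuf)))
 *		rc = -ENOBUFS;	(the caller then unlocks and drops the skb)
 *	bh_unlock_sock(sk);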
*/ static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb, unsigned int limit) { if (sk_rcvqueues_full(sk, limit)) return -ENOBUFS; /* * If the skb was allocated from pfmemalloc reserves, only * allow SOCK_MEMALLOC sockets to use it as this socket is * helping free memory */ if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC)) return -ENOMEM; __sk_add_backlog(sk, skb); sk->sk_backlog.len += skb->truesize; return 0; } int __sk_backlog_rcv(struct sock *sk, struct sk_buff *skb); INDIRECT_CALLABLE_DECLARE(int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)); INDIRECT_CALLABLE_DECLARE(int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)); static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb) { if (sk_memalloc_socks() && skb_pfmemalloc(skb)) return __sk_backlog_rcv(sk, skb); return INDIRECT_CALL_INET(sk->sk_backlog_rcv, tcp_v6_do_rcv, tcp_v4_do_rcv, sk, skb); } static inline void sk_incoming_cpu_update(struct sock *sk) { int cpu = raw_smp_processor_id(); if (unlikely(READ_ONCE(sk->sk_incoming_cpu) != cpu)) WRITE_ONCE(sk->sk_incoming_cpu, cpu); } static inline void sock_rps_save_rxhash(struct sock *sk, const struct sk_buff *skb) { #ifdef CONFIG_RPS /* The following WRITE_ONCE() is paired with the READ_ONCE() * here, and another one in sock_rps_record_flow(). */ if (unlikely(READ_ONCE(sk->sk_rxhash) != skb->hash)) WRITE_ONCE(sk->sk_rxhash, skb->hash); #endif } static inline void sock_rps_reset_rxhash(struct sock *sk) { #ifdef CONFIG_RPS /* Paired with READ_ONCE() in sock_rps_record_flow() */ WRITE_ONCE(sk->sk_rxhash, 0); #endif } #define sk_wait_event(__sk, __timeo, __condition, __wait) \ ({ int __rc, __dis = __sk->sk_disconnects; \ release_sock(__sk); \ __rc = __condition; \ if (!__rc) { \ *(__timeo) = wait_woken(__wait, \ TASK_INTERRUPTIBLE, \ *(__timeo)); \ } \ sched_annotate_sleep(); \ lock_sock(__sk); \ __rc = __dis == __sk->sk_disconnects ? __condition : -EPIPE; \ __rc; \ }) int sk_stream_wait_connect(struct sock *sk, long *timeo_p); int sk_stream_wait_memory(struct sock *sk, long *timeo_p); void sk_stream_wait_close(struct sock *sk, long timeo_p); int sk_stream_error(struct sock *sk, int flags, int err); void sk_stream_kill_queues(struct sock *sk); void sk_set_memalloc(struct sock *sk); void sk_clear_memalloc(struct sock *sk); void __sk_flush_backlog(struct sock *sk); static inline bool sk_flush_backlog(struct sock *sk) { if (unlikely(READ_ONCE(sk->sk_backlog.tail))) { __sk_flush_backlog(sk); return true; } return false; } int sk_wait_data(struct sock *sk, long *timeo, const struct sk_buff *skb); struct request_sock_ops; struct timewait_sock_ops; struct inet_hashinfo; struct raw_hashinfo; struct smc_hashinfo; struct module; struct sk_psock; /* * caches using SLAB_TYPESAFE_BY_RCU should let .next pointer from nulls nodes * un-modified. Special care is taken when initializing object to zero. */ static inline void sk_prot_clear_nulls(struct sock *sk, int size) { if (offsetof(struct sock, sk_node.next) != 0) memset(sk, 0, offsetof(struct sock, sk_node.next)); memset(&sk->sk_node.pprev, 0, size - offsetof(struct sock, sk_node.pprev)); } struct proto_accept_arg { int flags; int err; int is_empty; bool kern; }; /* Networking protocol blocks we attach to sockets. 
* socket layer -> transport layer interface */ struct proto { void (*close)(struct sock *sk, long timeout); int (*pre_connect)(struct sock *sk, struct sockaddr *uaddr, int addr_len); int (*connect)(struct sock *sk, struct sockaddr *uaddr, int addr_len); int (*disconnect)(struct sock *sk, int flags); struct sock * (*accept)(struct sock *sk, struct proto_accept_arg *arg); int (*ioctl)(struct sock *sk, int cmd, int *karg); int (*init)(struct sock *sk); void (*destroy)(struct sock *sk); void (*shutdown)(struct sock *sk, int how); int (*setsockopt)(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int (*getsockopt)(struct sock *sk, int level, int optname, char __user *optval, int __user *option); void (*keepalive)(struct sock *sk, int valbool); #ifdef CONFIG_COMPAT int (*compat_ioctl)(struct sock *sk, unsigned int cmd, unsigned long arg); #endif int (*sendmsg)(struct sock *sk, struct msghdr *msg, size_t len); int (*recvmsg)(struct sock *sk, struct msghdr *msg, size_t len, int flags, int *addr_len); void (*splice_eof)(struct socket *sock); int (*bind)(struct sock *sk, struct sockaddr *addr, int addr_len); int (*bind_add)(struct sock *sk, struct sockaddr *addr, int addr_len); int (*backlog_rcv) (struct sock *sk, struct sk_buff *skb); bool (*bpf_bypass_getsockopt)(int level, int optname); void (*release_cb)(struct sock *sk); /* Keeping track of sk's, looking them up, and port selection methods. */ int (*hash)(struct sock *sk); void (*unhash)(struct sock *sk); void (*rehash)(struct sock *sk); int (*get_port)(struct sock *sk, unsigned short snum); void (*put_port)(struct sock *sk); #ifdef CONFIG_BPF_SYSCALL int (*psock_update_sk_prot)(struct sock *sk, struct sk_psock *psock, bool restore); #endif /* Keeping track of sockets in use */ #ifdef CONFIG_PROC_FS unsigned int inuse_idx; #endif bool (*stream_memory_free)(const struct sock *sk, int wake); bool (*sock_is_readable)(struct sock *sk); /* Memory pressure */ void (*enter_memory_pressure)(struct sock *sk); void (*leave_memory_pressure)(struct sock *sk); atomic_long_t *memory_allocated; /* Current allocated memory. */ int __percpu *per_cpu_fw_alloc; struct percpu_counter *sockets_allocated; /* Current number of sockets. */ /* * Pressure flag: try to collapse. * Technical note: it is used by multiple contexts non atomically. * Make sure to use READ_ONCE()/WRITE_ONCE() for all reads/writes. * All the __sk_mem_schedule() is of this nature: accounting * is strict, actions are advisory and have some latency. 
*/ unsigned long *memory_pressure; long *sysctl_mem; int *sysctl_wmem; int *sysctl_rmem; u32 sysctl_wmem_offset; u32 sysctl_rmem_offset; int max_header; bool no_autobind; struct kmem_cache *slab; unsigned int obj_size; unsigned int ipv6_pinfo_offset; slab_flags_t slab_flags; unsigned int useroffset; /* Usercopy region offset */ unsigned int usersize; /* Usercopy region size */ unsigned int __percpu *orphan_count; struct request_sock_ops *rsk_prot; struct timewait_sock_ops *twsk_prot; union { struct inet_hashinfo *hashinfo; struct udp_table *udp_table; struct raw_hashinfo *raw_hash; struct smc_hashinfo *smc_hash; } h; struct module *owner; char name[32]; struct list_head node; int (*diag_destroy)(struct sock *sk, int err); } __randomize_layout; int proto_register(struct proto *prot, int alloc_slab); void proto_unregister(struct proto *prot); int sock_load_diag_module(int family, int protocol); INDIRECT_CALLABLE_DECLARE(bool tcp_stream_memory_free(const struct sock *sk, int wake)); static inline bool __sk_stream_memory_free(const struct sock *sk, int wake) { if (READ_ONCE(sk->sk_wmem_queued) >= READ_ONCE(sk->sk_sndbuf)) return false; return sk->sk_prot->stream_memory_free ? INDIRECT_CALL_INET_1(sk->sk_prot->stream_memory_free, tcp_stream_memory_free, sk, wake) : true; } static inline bool sk_stream_memory_free(const struct sock *sk) { return __sk_stream_memory_free(sk, 0); } static inline bool __sk_stream_is_writeable(const struct sock *sk, int wake) { return sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) && __sk_stream_memory_free(sk, wake); } static inline bool sk_stream_is_writeable(const struct sock *sk) { return __sk_stream_is_writeable(sk, 0); } static inline int sk_under_cgroup_hierarchy(struct sock *sk, struct cgroup *ancestor) { #ifdef CONFIG_SOCK_CGROUP_DATA return cgroup_is_descendant(sock_cgroup_ptr(&sk->sk_cgrp_data), ancestor); #else return -ENOTSUPP; #endif } #define SK_ALLOC_PERCPU_COUNTER_BATCH 16 static inline void sk_sockets_allocated_dec(struct sock *sk) { percpu_counter_add_batch(sk->sk_prot->sockets_allocated, -1, SK_ALLOC_PERCPU_COUNTER_BATCH); } static inline void sk_sockets_allocated_inc(struct sock *sk) { percpu_counter_add_batch(sk->sk_prot->sockets_allocated, 1, SK_ALLOC_PERCPU_COUNTER_BATCH); } static inline u64 sk_sockets_allocated_read_positive(struct sock *sk) { return percpu_counter_read_positive(sk->sk_prot->sockets_allocated); } static inline int proto_sockets_allocated_sum_positive(struct proto *prot) { return percpu_counter_sum_positive(prot->sockets_allocated); } #ifdef CONFIG_PROC_FS #define PROTO_INUSE_NR 64 /* should be enough for the first time */ struct prot_inuse { int all; int val[PROTO_INUSE_NR]; }; static inline void sock_prot_inuse_add(const struct net *net, const struct proto *prot, int val) { this_cpu_add(net->core.prot_inuse->val[prot->inuse_idx], val); } static inline void sock_inuse_add(const struct net *net, int val) { this_cpu_add(net->core.prot_inuse->all, val); } int sock_prot_inuse_get(struct net *net, struct proto *proto); int sock_inuse_get(struct net *net); #else static inline void sock_prot_inuse_add(const struct net *net, const struct proto *prot, int val) { } static inline void sock_inuse_add(const struct net *net, int val) { } #endif /* With per-bucket locks this operation is not-atomic, so that * this version is not worse. 
*/ static inline int __sk_prot_rehash(struct sock *sk) { sk->sk_prot->unhash(sk); return sk->sk_prot->hash(sk); } /* About 10 seconds */ #define SOCK_DESTROY_TIME (10*HZ) /* Sockets 0-1023 can't be bound to unless you are superuser */ #define PROT_SOCK 1024 #define SHUTDOWN_MASK 3 #define RCV_SHUTDOWN 1 #define SEND_SHUTDOWN 2 #define SOCK_BINDADDR_LOCK 4 #define SOCK_BINDPORT_LOCK 8 struct socket_alloc { struct socket socket; struct inode vfs_inode; }; static inline struct socket *SOCKET_I(struct inode *inode) { return &container_of(inode, struct socket_alloc, vfs_inode)->socket; } static inline struct inode *SOCK_INODE(struct socket *socket) { return &container_of(socket, struct socket_alloc, socket)->vfs_inode; } /* * Functions for memory accounting */ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind); int __sk_mem_schedule(struct sock *sk, int size, int kind); void __sk_mem_reduce_allocated(struct sock *sk, int amount); void __sk_mem_reclaim(struct sock *sk, int amount); #define SK_MEM_SEND 0 #define SK_MEM_RECV 1 /* sysctl_mem values are in pages */ static inline long sk_prot_mem_limits(const struct sock *sk, int index) { return READ_ONCE(sk->sk_prot->sysctl_mem[index]); } static inline int sk_mem_pages(int amt) { return (amt + PAGE_SIZE - 1) >> PAGE_SHIFT; } static inline bool sk_has_account(struct sock *sk) { /* return true if protocol supports memory accounting */ return !!sk->sk_prot->memory_allocated; } static inline bool sk_wmem_schedule(struct sock *sk, int size) { int delta; if (!sk_has_account(sk)) return true; delta = size - sk->sk_forward_alloc; return delta <= 0 || __sk_mem_schedule(sk, delta, SK_MEM_SEND); } static inline bool __sk_rmem_schedule(struct sock *sk, int size, bool pfmemalloc) { int delta; if (!sk_has_account(sk)) return true; delta = size - sk->sk_forward_alloc; return delta <= 0 || __sk_mem_schedule(sk, delta, SK_MEM_RECV) || pfmemalloc; } static inline bool sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size) { return __sk_rmem_schedule(sk, size, skb_pfmemalloc(skb)); } static inline int sk_unused_reserved_mem(const struct sock *sk) { int unused_mem; if (likely(!sk->sk_reserved_mem)) return 0; unused_mem = sk->sk_reserved_mem - sk->sk_wmem_queued - atomic_read(&sk->sk_rmem_alloc); return unused_mem > 0 ? 
unused_mem : 0; } static inline void sk_mem_reclaim(struct sock *sk) { int reclaimable; if (!sk_has_account(sk)) return; reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk); if (reclaimable >= (int)PAGE_SIZE) __sk_mem_reclaim(sk, reclaimable); } static inline void sk_mem_reclaim_final(struct sock *sk) { sk->sk_reserved_mem = 0; sk_mem_reclaim(sk); } static inline void sk_mem_charge(struct sock *sk, int size) { if (!sk_has_account(sk)) return; sk_forward_alloc_add(sk, -size); } static inline void sk_mem_uncharge(struct sock *sk, int size) { if (!sk_has_account(sk)) return; sk_forward_alloc_add(sk, size); sk_mem_reclaim(sk); } #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES) static inline void sk_owner_set(struct sock *sk, struct module *owner) { __module_get(owner); sk->sk_owner = owner; } static inline void sk_owner_clear(struct sock *sk) { sk->sk_owner = NULL; } static inline void sk_owner_put(struct sock *sk) { module_put(sk->sk_owner); } #else static inline void sk_owner_set(struct sock *sk, struct module *owner) { } static inline void sk_owner_clear(struct sock *sk) { } static inline void sk_owner_put(struct sock *sk) { } #endif /* * Macro so as to not evaluate some arguments when * lockdep is not enabled. * * Mark both the sk_lock and the sk_lock.slock as a * per-address-family lock class. */ #define sock_lock_init_class_and_name(sk, sname, skey, name, key) \ do { \ sk_owner_set(sk, THIS_MODULE); \ sk->sk_lock.owned = 0; \ init_waitqueue_head(&sk->sk_lock.wq); \ spin_lock_init(&(sk)->sk_lock.slock); \ debug_check_no_locks_freed((void *)&(sk)->sk_lock, \ sizeof((sk)->sk_lock)); \ lockdep_set_class_and_name(&(sk)->sk_lock.slock, \ (skey), (sname)); \ lockdep_init_map(&(sk)->sk_lock.dep_map, (name), (key), 0); \ } while (0) static inline bool lockdep_sock_is_held(const struct sock *sk) { return lockdep_is_held(&sk->sk_lock) || lockdep_is_held(&sk->sk_lock.slock); } void lock_sock_nested(struct sock *sk, int subclass); static inline void lock_sock(struct sock *sk) { lock_sock_nested(sk, 0); } void __lock_sock(struct sock *sk); void __release_sock(struct sock *sk); void release_sock(struct sock *sk); /* BH context may only use the following locking interface. */ #define bh_lock_sock(__sk) spin_lock(&((__sk)->sk_lock.slock)) #define bh_lock_sock_nested(__sk) \ spin_lock_nested(&((__sk)->sk_lock.slock), \ SINGLE_DEPTH_NESTING) #define bh_unlock_sock(__sk) spin_unlock(&((__sk)->sk_lock.slock)) bool __lock_sock_fast(struct sock *sk) __acquires(&sk->sk_lock.slock); /** * lock_sock_fast - fast version of lock_sock * @sk: socket * * This version should be used for very small section, where process won't block * return false if fast path is taken: * * sk_lock.slock locked, owned = 0, BH disabled * * return true if slow path is taken: * * sk_lock.slock unlocked, owned = 1, BH enabled */ static inline bool lock_sock_fast(struct sock *sk) { /* The sk_lock has mutex_lock() semantics here. */ mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_); return __lock_sock_fast(sk); } /* fast socket lock variant for caller already holding a [different] socket lock */ static inline bool lock_sock_fast_nested(struct sock *sk) { mutex_acquire(&sk->sk_lock.dep_map, SINGLE_DEPTH_NESTING, 0, _RET_IP_); return __lock_sock_fast(sk); } /** * unlock_sock_fast - complement of lock_sock_fast * @sk: socket * @slow: slow mode * * fast unlock socket for user context. 
* If slow mode is on, we call regular release_sock() */ static inline void unlock_sock_fast(struct sock *sk, bool slow) __releases(&sk->sk_lock.slock) { if (slow) { release_sock(sk); __release(&sk->sk_lock.slock); } else { mutex_release(&sk->sk_lock.dep_map, _RET_IP_); spin_unlock_bh(&sk->sk_lock.slock); } } void sockopt_lock_sock(struct sock *sk); void sockopt_release_sock(struct sock *sk); bool sockopt_ns_capable(struct user_namespace *ns, int cap); bool sockopt_capable(int cap); /* Used by processes to "lock" a socket state, so that * interrupts and bottom half handlers won't change it * from under us. It essentially blocks any incoming * packets, so that we won't get any new data or any * packets that change the state of the socket. * * While locked, BH processing will add new packets to * the backlog queue. This queue is processed by the * owner of the socket lock right before it is released. * * Since ~2.3.5 it is also exclusive sleep lock serializing * accesses from user process context. */ static inline void sock_owned_by_me(const struct sock *sk) { #ifdef CONFIG_LOCKDEP WARN_ON_ONCE(!lockdep_sock_is_held(sk) && debug_locks); #endif } static inline void sock_not_owned_by_me(const struct sock *sk) { #ifdef CONFIG_LOCKDEP WARN_ON_ONCE(lockdep_sock_is_held(sk) && debug_locks); #endif } static inline bool sock_owned_by_user(const struct sock *sk) { sock_owned_by_me(sk); return sk->sk_lock.owned; } static inline bool sock_owned_by_user_nocheck(const struct sock *sk) { return sk->sk_lock.owned; } static inline void sock_release_ownership(struct sock *sk) { DEBUG_NET_WARN_ON_ONCE(!sock_owned_by_user_nocheck(sk)); sk->sk_lock.owned = 0; /* The sk_lock has mutex_unlock() semantics: */ mutex_release(&sk->sk_lock.dep_map, _RET_IP_); } /* no reclassification while locks are held */ static inline bool sock_allow_reclassification(const struct sock *csk) { struct sock *sk = (struct sock *)csk; return !sock_owned_by_user_nocheck(sk) && !spin_is_locked(&sk->sk_lock.slock); } struct sock *sk_alloc(struct net *net, int family, gfp_t priority, struct proto *prot, int kern); void sk_free(struct sock *sk); void sk_net_refcnt_upgrade(struct sock *sk); void sk_destruct(struct sock *sk); struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority); struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force, gfp_t priority); void __sock_wfree(struct sk_buff *skb); void sock_wfree(struct sk_buff *skb); struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size, gfp_t priority); void skb_orphan_partial(struct sk_buff *skb); void sock_rfree(struct sk_buff *skb); void sock_efree(struct sk_buff *skb); #ifdef CONFIG_INET void sock_edemux(struct sk_buff *skb); void sock_pfree(struct sk_buff *skb); static inline void skb_set_owner_edemux(struct sk_buff *skb, struct sock *sk) { skb_orphan(skb); if (refcount_inc_not_zero(&sk->sk_refcnt)) { skb->sk = sk; skb->destructor = sock_edemux; } } #else #define sock_edemux sock_efree #endif int sk_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); int do_sock_setsockopt(struct socket *sock, bool compat, int level, int optname, sockptr_t optval, int optlen); int do_sock_getsockopt(struct socket *sock, bool compat, int level, int optname, sockptr_t optval, sockptr_t optlen); int sk_getsockopt(struct sock *sk, int level, int optname, sockptr_t optval, sockptr_t optlen); int sock_gettstamp(struct socket 
*sock, void __user *userstamp, bool timeval, bool time32); struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len, unsigned long data_len, int noblock, int *errcode, int max_page_order); static inline struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size, int noblock, int *errcode) { return sock_alloc_send_pskb(sk, size, 0, noblock, errcode, 0); } void *sock_kmalloc(struct sock *sk, int size, gfp_t priority); void *sock_kmemdup(struct sock *sk, const void *src, int size, gfp_t priority); void sock_kfree_s(struct sock *sk, void *mem, int size); void sock_kzfree_s(struct sock *sk, void *mem, int size); void sk_send_sigurg(struct sock *sk); static inline void sock_replace_proto(struct sock *sk, struct proto *proto) { if (sk->sk_socket) clear_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); WRITE_ONCE(sk->sk_prot, proto); } struct sockcm_cookie { u64 transmit_time; u32 mark; u32 tsflags; u32 ts_opt_id; u32 priority; u32 dmabuf_id; }; static inline void sockcm_init(struct sockcm_cookie *sockc, const struct sock *sk) { *sockc = (struct sockcm_cookie) { .mark = READ_ONCE(sk->sk_mark), .tsflags = READ_ONCE(sk->sk_tsflags), .priority = READ_ONCE(sk->sk_priority), }; } int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, struct sockcm_cookie *sockc); int sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct sockcm_cookie *sockc); /* * Functions to fill in entries in struct proto_ops when a protocol * does not implement a particular function. */ int sock_no_bind(struct socket *, struct sockaddr *, int); int sock_no_connect(struct socket *, struct sockaddr *, int, int); int sock_no_socketpair(struct socket *, struct socket *); int sock_no_accept(struct socket *, struct socket *, struct proto_accept_arg *); int sock_no_getname(struct socket *, struct sockaddr *, int); int sock_no_ioctl(struct socket *, unsigned int, unsigned long); int sock_no_listen(struct socket *, int); int sock_no_shutdown(struct socket *, int); int sock_no_sendmsg(struct socket *, struct msghdr *, size_t); int sock_no_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len); int sock_no_recvmsg(struct socket *, struct msghdr *, size_t, int); int sock_no_mmap(struct file *file, struct socket *sock, struct vm_area_struct *vma); /* * Functions to fill in entries in struct proto_ops when a protocol * uses the inet style. */ int sock_common_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen); int sock_common_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags); int sock_common_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen); void sk_common_release(struct sock *sk); /* * Default socket callbacks and setup code */ /* Initialise core socket variables using an explicit uid. */ void sock_init_data_uid(struct socket *sock, struct sock *sk, kuid_t uid); /* Initialise core socket variables. * Assumes struct socket *sock is embedded in a struct socket_alloc. */ void sock_init_data(struct socket *sock, struct sock *sk); /* * Socket reference counting postulates. * * * Each user of socket SHOULD hold a reference count. * * Each access point to socket (an hash table bucket, reference from a list, * running timer, skb in flight MUST hold a reference count. * * When reference count hits 0, it means it will never increase back. 
* * When reference count hits 0, it means that no references from * outside exist to this socket and current process on current CPU * is last user and may/should destroy this socket. * * sk_free is called from any context: process, BH, IRQ. When * it is called, socket has no references from outside -> sk_free * may release descendant resources allocated by the socket, but * to the time when it is called, socket is NOT referenced by any * hash tables, lists etc. * * Packets, delivered from outside (from network or from another process) * and enqueued on receive/error queues SHOULD NOT grab reference count, * when they sit in queue. Otherwise, packets will leak to hole, when * socket is looked up by one cpu and unhasing is made by another CPU. * It is true for udp/raw, netlink (leak to receive and error queues), tcp * (leak to backlog). Packet socket does all the processing inside * BR_NETPROTO_LOCK, so that it has not this race condition. UNIX sockets * use separate SMP lock, so that they are prone too. */ /* Ungrab socket and destroy it, if it was the last reference. */ static inline void sock_put(struct sock *sk) { if (refcount_dec_and_test(&sk->sk_refcnt)) sk_free(sk); } /* Generic version of sock_put(), dealing with all sockets * (TCP_TIMEWAIT, TCP_NEW_SYN_RECV, ESTABLISHED...) */ void sock_gen_put(struct sock *sk); int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested, unsigned int trim_cap, bool refcounted); static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested) { return __sk_receive_skb(sk, skb, nested, 1, true); } static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) { /* sk_tx_queue_mapping accept only upto a 16-bit value */ if (WARN_ON_ONCE((unsigned short)tx_queue >= USHRT_MAX)) return; /* Paired with READ_ONCE() in sk_tx_queue_get() and * other WRITE_ONCE() because socket lock might be not held. */ WRITE_ONCE(sk->sk_tx_queue_mapping, tx_queue); } #define NO_QUEUE_MAPPING USHRT_MAX static inline void sk_tx_queue_clear(struct sock *sk) { /* Paired with READ_ONCE() in sk_tx_queue_get() and * other WRITE_ONCE() because socket lock might be not held. */ WRITE_ONCE(sk->sk_tx_queue_mapping, NO_QUEUE_MAPPING); } static inline int sk_tx_queue_get(const struct sock *sk) { if (sk) { /* Paired with WRITE_ONCE() in sk_tx_queue_clear() * and sk_tx_queue_set(). 
*/ int val = READ_ONCE(sk->sk_tx_queue_mapping); if (val != NO_QUEUE_MAPPING) return val; } return -1; } static inline void __sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb, bool force_set) { #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING if (skb_rx_queue_recorded(skb)) { u16 rx_queue = skb_get_rx_queue(skb); if (force_set || unlikely(READ_ONCE(sk->sk_rx_queue_mapping) != rx_queue)) WRITE_ONCE(sk->sk_rx_queue_mapping, rx_queue); } #endif } static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb) { __sk_rx_queue_set(sk, skb, true); } static inline void sk_rx_queue_update(struct sock *sk, const struct sk_buff *skb) { __sk_rx_queue_set(sk, skb, false); } static inline void sk_rx_queue_clear(struct sock *sk) { #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING WRITE_ONCE(sk->sk_rx_queue_mapping, NO_QUEUE_MAPPING); #endif } static inline int sk_rx_queue_get(const struct sock *sk) { #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING if (sk) { int res = READ_ONCE(sk->sk_rx_queue_mapping); if (res != NO_QUEUE_MAPPING) return res; } #endif return -1; } static inline void sk_set_socket(struct sock *sk, struct socket *sock) { sk->sk_socket = sock; } static inline wait_queue_head_t *sk_sleep(struct sock *sk) { BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0); return &rcu_dereference_raw(sk->sk_wq)->wait; } /* Detach socket from process context. * Announce socket dead, detach it from wait queue and inode. * Note that parent inode held reference count on this struct sock, * we do not release it in this function, because protocol * probably wants some additional cleanups or even continuing * to work with this socket (TCP). */ static inline void sock_orphan(struct sock *sk) { write_lock_bh(&sk->sk_callback_lock); sock_set_flag(sk, SOCK_DEAD); sk_set_socket(sk, NULL); sk->sk_wq = NULL; /* Note: sk_uid is unchanged. */ write_unlock_bh(&sk->sk_callback_lock); } static inline void sock_graft(struct sock *sk, struct socket *parent) { WARN_ON(parent->sk); write_lock_bh(&sk->sk_callback_lock); rcu_assign_pointer(sk->sk_wq, &parent->wq); parent->sk = sk; sk_set_socket(sk, parent); WRITE_ONCE(sk->sk_uid, SOCK_INODE(parent)->i_uid); security_sock_graft(sk, parent); write_unlock_bh(&sk->sk_callback_lock); } static inline kuid_t sk_uid(const struct sock *sk) { /* Paired with WRITE_ONCE() in sockfs_setattr() */ return READ_ONCE(sk->sk_uid); } unsigned long __sock_i_ino(struct sock *sk); unsigned long sock_i_ino(struct sock *sk); static inline kuid_t sock_net_uid(const struct net *net, const struct sock *sk) { return sk ? 
sk_uid(sk) : make_kuid(net->user_ns, 0); } static inline u32 net_tx_rndhash(void) { u32 v = get_random_u32(); return v ?: 1; } static inline void sk_set_txhash(struct sock *sk) { /* This pairs with READ_ONCE() in skb_set_hash_from_sk() */ WRITE_ONCE(sk->sk_txhash, net_tx_rndhash()); } static inline bool sk_rethink_txhash(struct sock *sk) { if (sk->sk_txhash && sk->sk_txrehash == SOCK_TXREHASH_ENABLED) { sk_set_txhash(sk); return true; } return false; } static inline struct dst_entry * __sk_dst_get(const struct sock *sk) { return rcu_dereference_check(sk->sk_dst_cache, lockdep_sock_is_held(sk)); } static inline struct dst_entry * sk_dst_get(const struct sock *sk) { struct dst_entry *dst; rcu_read_lock(); dst = rcu_dereference(sk->sk_dst_cache); if (dst && !rcuref_get(&dst->__rcuref)) dst = NULL; rcu_read_unlock(); return dst; } static inline void __dst_negative_advice(struct sock *sk) { struct dst_entry *dst = __sk_dst_get(sk); if (dst && dst->ops->negative_advice) dst->ops->negative_advice(sk, dst); } static inline void dst_negative_advice(struct sock *sk) { sk_rethink_txhash(sk); __dst_negative_advice(sk); } static inline void __sk_dst_set(struct sock *sk, struct dst_entry *dst) { struct dst_entry *old_dst; sk_tx_queue_clear(sk); WRITE_ONCE(sk->sk_dst_pending_confirm, 0); old_dst = rcu_dereference_protected(sk->sk_dst_cache, lockdep_sock_is_held(sk)); rcu_assign_pointer(sk->sk_dst_cache, dst); dst_release(old_dst); } static inline void sk_dst_set(struct sock *sk, struct dst_entry *dst) { struct dst_entry *old_dst; sk_tx_queue_clear(sk); WRITE_ONCE(sk->sk_dst_pending_confirm, 0); old_dst = unrcu_pointer(xchg(&sk->sk_dst_cache, RCU_INITIALIZER(dst))); dst_release(old_dst); } static inline void __sk_dst_reset(struct sock *sk) { __sk_dst_set(sk, NULL); } static inline void sk_dst_reset(struct sock *sk) { sk_dst_set(sk, NULL); } struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie); struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie); static inline void sk_dst_confirm(struct sock *sk) { if (!READ_ONCE(sk->sk_dst_pending_confirm)) WRITE_ONCE(sk->sk_dst_pending_confirm, 1); } static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n) { if (skb_get_dst_pending_confirm(skb)) { struct sock *sk = skb->sk; if (sk && READ_ONCE(sk->sk_dst_pending_confirm)) WRITE_ONCE(sk->sk_dst_pending_confirm, 0); neigh_confirm(n); } } bool sk_mc_loop(const struct sock *sk); static inline bool sk_can_gso(const struct sock *sk) { return net_gso_ok(sk->sk_route_caps, sk->sk_gso_type); } void sk_setup_caps(struct sock *sk, struct dst_entry *dst); static inline void sk_gso_disable(struct sock *sk) { sk->sk_gso_disabled = 1; sk->sk_route_caps &= ~NETIF_F_GSO_MASK; } static inline int skb_do_copy_data_nocache(struct sock *sk, struct sk_buff *skb, struct iov_iter *from, char *to, int copy, int offset) { if (skb->ip_summed == CHECKSUM_NONE) { __wsum csum = 0; if (!csum_and_copy_from_iter_full(to, copy, &csum, from)) return -EFAULT; skb->csum = csum_block_add(skb->csum, csum, offset); } else if (sk->sk_route_caps & NETIF_F_NOCACHE_COPY) { if (!copy_from_iter_full_nocache(to, copy, from)) return -EFAULT; } else if (!copy_from_iter_full(to, copy, from)) return -EFAULT; return 0; } static inline int skb_add_data_nocache(struct sock *sk, struct sk_buff *skb, struct iov_iter *from, int copy) { int err, offset = skb->len; err = skb_do_copy_data_nocache(sk, skb, from, skb_put(skb, copy), copy, offset); if (err) __skb_trim(skb, offset); return err; } static inline int 
skb_copy_to_page_nocache(struct sock *sk, struct iov_iter *from, struct sk_buff *skb, struct page *page, int off, int copy) { int err; err = skb_do_copy_data_nocache(sk, skb, from, page_address(page) + off, copy, skb->len); if (err) return err; skb_len_add(skb, copy); sk_wmem_queued_add(sk, copy); sk_mem_charge(sk, copy); return 0; } /** * sk_wmem_alloc_get - returns write allocations * @sk: socket * * Return: sk_wmem_alloc minus initial offset of one */ static inline int sk_wmem_alloc_get(const struct sock *sk) { return refcount_read(&sk->sk_wmem_alloc) - 1; } /** * sk_rmem_alloc_get - returns read allocations * @sk: socket * * Return: sk_rmem_alloc */ static inline int sk_rmem_alloc_get(const struct sock *sk) { return atomic_read(&sk->sk_rmem_alloc); } /** * sk_has_allocations - check if allocations are outstanding * @sk: socket * * Return: true if socket has write or read allocations */ static inline bool sk_has_allocations(const struct sock *sk) { return sk_wmem_alloc_get(sk) || sk_rmem_alloc_get(sk); } /** * skwq_has_sleeper - check if there are any waiting processes * @wq: struct socket_wq * * Return: true if socket_wq has waiting processes * * The purpose of the skwq_has_sleeper and sock_poll_wait is to wrap the memory * barrier call. They were added due to the race found within the tcp code. * * Consider following tcp code paths:: * * CPU1 CPU2 * sys_select receive packet * ... ... * __add_wait_queue update tp->rcv_nxt * ... ... * tp->rcv_nxt check sock_def_readable * ... { * schedule rcu_read_lock(); * wq = rcu_dereference(sk->sk_wq); * if (wq && waitqueue_active(&wq->wait)) * wake_up_interruptible(&wq->wait) * ... * } * * The race for tcp fires when the __add_wait_queue changes done by CPU1 stay * in its cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 * could then endup calling schedule and sleep forever if there are no more * data on the socket. * */ static inline bool skwq_has_sleeper(struct socket_wq *wq) { return wq && wq_has_sleeper(&wq->wait); } /** * sock_poll_wait - wrapper for the poll_wait call. * @filp: file * @sock: socket to wait on * @p: poll_table * * See the comments in the wq_has_sleeper function. */ static inline void sock_poll_wait(struct file *filp, struct socket *sock, poll_table *p) { /* Provides a barrier we need to be sure we are in sync * with the socket flags modification. * * This memory barrier is paired in the wq_has_sleeper. */ poll_wait(filp, &sock->wq.wait, p); } static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk) { /* This pairs with WRITE_ONCE() in sk_set_txhash() */ u32 txhash = READ_ONCE(sk->sk_txhash); if (txhash) { skb->l4_hash = 1; skb->hash = txhash; } } void skb_set_owner_w(struct sk_buff *skb, struct sock *sk); /* * Queue a received datagram if it will fit. Stream and sequenced * protocols can't normally use this as they need to fit buffers in * and play with them. * * Inlined as it's very short and called for pretty much every * packet ever received. 
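 *
 * Illustrative sketch, not part of the original comment: a datagram queueing
 * helper such as __sock_queue_rcv_skb() pairs this with the receive queue
 * roughly like so:
 *
 *	skb_set_owner_r(skb, sk);
 *	__skb_queue_tail(&sk->sk_receive_queue, skb);
 *	if (!sock_flag(sk, SOCK_DEAD))
 *		sk->sk_data_ready(sk);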
*/ static inline void skb_set_owner_r(struct sk_buff *skb, struct sock *sk) { skb_orphan(skb); skb->sk = sk; skb->destructor = sock_rfree; atomic_add(skb->truesize, &sk->sk_rmem_alloc); sk_mem_charge(sk, skb->truesize); } static inline __must_check bool skb_set_owner_sk_safe(struct sk_buff *skb, struct sock *sk) { if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) { skb_orphan(skb); skb->destructor = sock_efree; skb->sk = sk; return true; } return false; } static inline struct sk_buff *skb_clone_and_charge_r(struct sk_buff *skb, struct sock *sk) { skb = skb_clone(skb, sk_gfp_mask(sk, GFP_ATOMIC)); if (skb) { if (sk_rmem_schedule(sk, skb, skb->truesize)) { skb_set_owner_r(skb, sk); return skb; } __kfree_skb(skb); } return NULL; } static inline void skb_prepare_for_gro(struct sk_buff *skb) { if (skb->destructor != sock_wfree) { skb_orphan(skb); return; } skb->slow_gro = 1; } void sk_reset_timer(struct sock *sk, struct timer_list *timer, unsigned long expires); void sk_stop_timer(struct sock *sk, struct timer_list *timer); void sk_stop_timer_sync(struct sock *sk, struct timer_list *timer); int __sk_queue_drop_skb(struct sock *sk, struct sk_buff_head *sk_queue, struct sk_buff *skb, unsigned int flags, void (*destructor)(struct sock *sk, struct sk_buff *skb)); int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); int sock_queue_rcv_skb_reason(struct sock *sk, struct sk_buff *skb, enum skb_drop_reason *reason); static inline int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) { return sock_queue_rcv_skb_reason(sk, skb, NULL); } int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb); struct sk_buff *sock_dequeue_err_skb(struct sock *sk); /* * Recover an error report and clear atomically */ static inline int sock_error(struct sock *sk) { int err; /* Avoid an atomic operation for the common case. * This is racy since another cpu/thread can change sk_err under us. */ if (likely(data_race(!sk->sk_err))) return 0; err = xchg(&sk->sk_err, 0); return -err; } void sk_error_report(struct sock *sk); static inline unsigned long sock_wspace(struct sock *sk) { int amt = 0; if (!(sk->sk_shutdown & SEND_SHUTDOWN)) { amt = sk->sk_sndbuf - refcount_read(&sk->sk_wmem_alloc); if (amt < 0) amt = 0; } return amt; } /* Note: * We use sk->sk_wq_raw, from contexts knowing this * pointer is not NULL and cannot disappear/change. */ static inline void sk_set_bit(int nr, struct sock *sk) { if ((nr == SOCKWQ_ASYNC_NOSPACE || nr == SOCKWQ_ASYNC_WAITDATA) && !sock_flag(sk, SOCK_FASYNC)) return; set_bit(nr, &sk->sk_wq_raw->flags); } static inline void sk_clear_bit(int nr, struct sock *sk) { if ((nr == SOCKWQ_ASYNC_NOSPACE || nr == SOCKWQ_ASYNC_WAITDATA) && !sock_flag(sk, SOCK_FASYNC)) return; clear_bit(nr, &sk->sk_wq_raw->flags); } static inline void sk_wake_async(const struct sock *sk, int how, int band) { if (sock_flag(sk, SOCK_FASYNC)) { rcu_read_lock(); sock_wake_async(rcu_dereference(sk->sk_wq), how, band); rcu_read_unlock(); } } static inline void sk_wake_async_rcu(const struct sock *sk, int how, int band) { if (unlikely(sock_flag(sk, SOCK_FASYNC))) sock_wake_async(rcu_dereference(sk->sk_wq), how, band); } /* Since sk_{r,w}mem_alloc sums skb->truesize, even a small frame might * need sizeof(sk_buff) + MTU + padding, unless net driver perform copybreak. * Note: for send buffers, TCP works better if we can build two skbs at * minimum. 
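 *
 * Back-of-the-envelope illustration (not in the original text): assuming
 * SKB_DATA_ALIGN(sizeof(struct sk_buff)) rounds up to 256 bytes on a 64-bit
 * build with 64-byte cache lines, TCP_SKB_MIN_TRUESIZE below works out to
 * 2048 + 256 = 2304 bytes, giving SOCK_MIN_SNDBUF = 4608 and
 * SOCK_MIN_RCVBUF = 2304.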
*/ #define TCP_SKB_MIN_TRUESIZE (2048 + SKB_DATA_ALIGN(sizeof(struct sk_buff))) #define SOCK_MIN_SNDBUF (TCP_SKB_MIN_TRUESIZE * 2) #define SOCK_MIN_RCVBUF TCP_SKB_MIN_TRUESIZE static inline void sk_stream_moderate_sndbuf(struct sock *sk) { u32 val; if (sk->sk_userlocks & SOCK_SNDBUF_LOCK) return; val = min(sk->sk_sndbuf, sk->sk_wmem_queued >> 1); val = max_t(u32, val, sk_unused_reserved_mem(sk)); WRITE_ONCE(sk->sk_sndbuf, max_t(u32, val, SOCK_MIN_SNDBUF)); } /** * sk_page_frag - return an appropriate page_frag * @sk: socket * * Use the per task page_frag instead of the per socket one for * optimization when we know that we're in process context and own * everything that's associated with %current. * * Both direct reclaim and page faults can nest inside other * socket operations and end up recursing into sk_page_frag() * while it's already in use: explicitly avoid task page_frag * when users disable sk_use_task_frag. * * Return: a per task page_frag if context allows that, * otherwise a per socket one. */ static inline struct page_frag *sk_page_frag(struct sock *sk) { if (sk->sk_use_task_frag) return &current->task_frag; return &sk->sk_frag; } bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag); /* * Default write policy as shown to user space via poll/select/SIGIO */ static inline bool sock_writeable(const struct sock *sk) { return refcount_read(&sk->sk_wmem_alloc) < (READ_ONCE(sk->sk_sndbuf) >> 1); } static inline gfp_t gfp_any(void) { return in_softirq() ? GFP_ATOMIC : GFP_KERNEL; } static inline gfp_t gfp_memcg_charge(void) { return in_softirq() ? GFP_ATOMIC : GFP_KERNEL; } static inline long sock_rcvtimeo(const struct sock *sk, bool noblock) { return noblock ? 0 : READ_ONCE(sk->sk_rcvtimeo); } static inline long sock_sndtimeo(const struct sock *sk, bool noblock) { return noblock ? 0 : READ_ONCE(sk->sk_sndtimeo); } static inline int sock_rcvlowat(const struct sock *sk, int waitall, int len) { int v = waitall ? len : min_t(int, READ_ONCE(sk->sk_rcvlowat), len); return v ?: 1; } /* Alas, with timeout socket operations are not restartable. * Compare this to poll(). */ static inline int sock_intr_errno(long timeo) { return timeo == MAX_SCHEDULE_TIMEOUT ? -ERESTARTSYS : -EINTR; } struct sock_skb_cb { u32 dropcount; }; /* Store sock_skb_cb at the end of skb->cb[] so protocol families * using skb->cb[] would keep using it directly and utilize its * alignment guarantee. */ #define SOCK_SKB_CB_OFFSET (sizeof_field(struct sk_buff, cb) - \ sizeof(struct sock_skb_cb)) #define SOCK_SKB_CB(__skb) ((struct sock_skb_cb *)((__skb)->cb + \ SOCK_SKB_CB_OFFSET)) #define sock_skb_cb_check_size(size) \ BUILD_BUG_ON((size) > SOCK_SKB_CB_OFFSET) static inline void sock_skb_set_dropcount(const struct sock *sk, struct sk_buff *skb) { SOCK_SKB_CB(skb)->dropcount = sock_flag(sk, SOCK_RXQ_OVFL) ?
atomic_read(&sk->sk_drops) : 0; } static inline void sk_drops_add(struct sock *sk, const struct sk_buff *skb) { int segs = max_t(u16, 1, skb_shinfo(skb)->gso_segs); atomic_add(segs, &sk->sk_drops); } static inline ktime_t sock_read_timestamp(struct sock *sk) { #if BITS_PER_LONG==32 unsigned int seq; ktime_t kt; do { seq = read_seqbegin(&sk->sk_stamp_seq); kt = sk->sk_stamp; } while (read_seqretry(&sk->sk_stamp_seq, seq)); return kt; #else return READ_ONCE(sk->sk_stamp); #endif } static inline void sock_write_timestamp(struct sock *sk, ktime_t kt) { #if BITS_PER_LONG==32 write_seqlock(&sk->sk_stamp_seq); sk->sk_stamp = kt; write_sequnlock(&sk->sk_stamp_seq); #else WRITE_ONCE(sk->sk_stamp, kt); #endif } void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb); void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk, struct sk_buff *skb); bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk); int skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk, struct timespec64 *ts); static inline void sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); u32 tsflags = READ_ONCE(sk->sk_tsflags); ktime_t kt = skb->tstamp; /* * generate control messages if * - receive time stamping in software requested * - software time stamp available and wanted * - hardware time stamps available and wanted */ if (sock_flag(sk, SOCK_RCVTSTAMP) || (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || (hwtstamps->hwtstamp && (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) __sock_recv_timestamp(msg, sk, skb); else sock_write_timestamp(sk, kt); if (sock_flag(sk, SOCK_WIFI_STATUS) && skb_wifi_acked_valid(skb)) __sock_recv_wifi_status(msg, sk, skb); } void __sock_recv_cmsgs(struct msghdr *msg, struct sock *sk, struct sk_buff *skb); #define SK_DEFAULT_STAMP (-1L * NSEC_PER_SEC) static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { #define FLAGS_RECV_CMSGS ((1UL << SOCK_RXQ_OVFL) | \ (1UL << SOCK_RCVTSTAMP) | \ (1UL << SOCK_RCVMARK) | \ (1UL << SOCK_RCVPRIORITY) | \ (1UL << SOCK_TIMESTAMPING_ANY)) #define TSFLAGS_ANY (SOF_TIMESTAMPING_SOFTWARE | \ SOF_TIMESTAMPING_RAW_HARDWARE) if (READ_ONCE(sk->sk_flags) & FLAGS_RECV_CMSGS) __sock_recv_cmsgs(msg, sk, skb); else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP))) sock_write_timestamp(sk, skb->tstamp); else if (unlikely(sock_read_timestamp(sk) == SK_DEFAULT_STAMP)) sock_write_timestamp(sk, 0); } void __sock_tx_timestamp(__u32 tsflags, __u8 *tx_flags); /** * _sock_tx_timestamp - checks whether the outgoing packet is to be time stamped * @sk: socket sending this packet * @sockc: pointer to socket cmsg cookie to get timestamping info * @tx_flags: completed with instructions for time stamping * @tskey: filled in with next sk_tskey (not for TCP, which uses seqno) * * Note: callers should take care of initial ``*tx_flags`` value (usually 0) */ static inline void _sock_tx_timestamp(struct sock *sk, const struct sockcm_cookie *sockc, __u8 *tx_flags, __u32 *tskey) { __u32 tsflags = sockc->tsflags; if (unlikely(tsflags)) { __sock_tx_timestamp(tsflags, tx_flags); if (tsflags & SOF_TIMESTAMPING_OPT_ID && tskey && tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) { if (tsflags & SOCKCM_FLAG_TS_OPT_ID) *tskey = sockc->ts_opt_id; else *tskey = atomic_inc_return(&sk->sk_tskey) - 1; } } } static inline void sock_tx_timestamp(struct sock *sk, const struct sockcm_cookie *sockc, __u8 *tx_flags) { 
_sock_tx_timestamp(sk, sockc, tx_flags, NULL); } static inline void skb_setup_tx_timestamp(struct sk_buff *skb, const struct sockcm_cookie *sockc) { _sock_tx_timestamp(skb->sk, sockc, &skb_shinfo(skb)->tx_flags, &skb_shinfo(skb)->tskey); } static inline bool sk_is_inet(const struct sock *sk) { int family = READ_ONCE(sk->sk_family); return family == AF_INET || family == AF_INET6; } static inline bool sk_is_tcp(const struct sock *sk) { return sk_is_inet(sk) && sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP; } static inline bool sk_is_udp(const struct sock *sk) { return sk_is_inet(sk) && sk->sk_type == SOCK_DGRAM && sk->sk_protocol == IPPROTO_UDP; } static inline bool sk_is_unix(const struct sock *sk) { return sk->sk_family == AF_UNIX; } static inline bool sk_is_stream_unix(const struct sock *sk) { return sk_is_unix(sk) && sk->sk_type == SOCK_STREAM; } static inline bool sk_is_vsock(const struct sock *sk) { return sk->sk_family == AF_VSOCK; } static inline bool sk_may_scm_recv(const struct sock *sk) { return (IS_ENABLED(CONFIG_UNIX) && sk->sk_family == AF_UNIX) || sk->sk_family == AF_NETLINK || (IS_ENABLED(CONFIG_BT) && sk->sk_family == AF_BLUETOOTH); } /** * sk_eat_skb - Release a skb if it is no longer needed * @sk: socket to eat this skb from * @skb: socket buffer to eat * * This routine must be called with interrupts disabled or with the socket * locked so that the sk_buff queue operation is ok. */ static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb) { __skb_unlink(skb, &sk->sk_receive_queue); __kfree_skb(skb); } static inline bool skb_sk_is_prefetched(struct sk_buff *skb) { #ifdef CONFIG_INET return skb->destructor == sock_pfree; #else return false; #endif /* CONFIG_INET */ } /* This helper checks if a socket is a full socket, * ie _not_ a timewait or request socket. */ static inline bool sk_fullsock(const struct sock *sk) { return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV); } static inline bool sk_is_refcounted(struct sock *sk) { /* Only full sockets have sk->sk_flags. */ return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE); } static inline bool sk_requests_wifi_status(struct sock *sk) { return sk && sk_fullsock(sk) && sock_flag(sk, SOCK_WIFI_STATUS); } /* Checks if this SKB belongs to an HW offloaded socket * and whether any SW fallbacks are required based on dev. * Check decrypted mark in case skb_orphan() cleared socket. */ static inline struct sk_buff *sk_validate_xmit_skb(struct sk_buff *skb, struct net_device *dev) { #ifdef CONFIG_SOCK_VALIDATE_XMIT struct sock *sk = skb->sk; if (sk && sk_fullsock(sk) && sk->sk_validate_xmit_skb) { skb = sk->sk_validate_xmit_skb(sk, dev, skb); } else if (unlikely(skb_is_decrypted(skb))) { pr_warn_ratelimited("unencrypted skb with no associated socket - dropping\n"); kfree_skb(skb); skb = NULL; } #endif return skb; } /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV * SYNACK messages can be attached to either ones (depending on SYNCOOKIE) */ static inline bool sk_listener(const struct sock *sk) { return (1 << sk->sk_state) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV); } /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV or TIME_WAIT * TCP SYNACK messages can be attached to LISTEN or NEW_SYN_RECV (depending on SYNCOOKIE) * TCP RST and ACK can be attached to TIME_WAIT. 
*/ static inline bool sk_listener_or_tw(const struct sock *sk) { return (1 << READ_ONCE(sk->sk_state)) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV | TCPF_TIME_WAIT); } void sock_enable_timestamp(struct sock *sk, enum sock_flags flag); int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len, int level, int type); bool sk_ns_capable(const struct sock *sk, struct user_namespace *user_ns, int cap); bool sk_capable(const struct sock *sk, int cap); bool sk_net_capable(const struct sock *sk, int cap); void sk_get_meminfo(const struct sock *sk, u32 *meminfo); /* Take into consideration the size of the struct sk_buff overhead in the * determination of these values, since that is non-constant across * platforms. This makes socket queueing behavior and performance * not depend upon such differences. */ #define _SK_MEM_PACKETS 256 #define _SK_MEM_OVERHEAD SKB_TRUESIZE(256) #define SK_WMEM_MAX (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) #define SK_RMEM_MAX (_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) extern __u32 sysctl_wmem_max; extern __u32 sysctl_rmem_max; extern __u32 sysctl_wmem_default; extern __u32 sysctl_rmem_default; #define SKB_FRAG_PAGE_ORDER get_order(32768) DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_wmem ? */ if (proto->sysctl_wmem_offset) return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_wmem_offset)); return READ_ONCE(*proto->sysctl_wmem); } static inline int sk_get_rmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_rmem ? */ if (proto->sysctl_rmem_offset) return READ_ONCE(*(int *)((void *)sock_net(sk) + proto->sysctl_rmem_offset)); return READ_ONCE(*proto->sysctl_rmem); } /* Default TCP Small queue budget is ~1 ms of data (1sec >> 10) * Some wifi drivers need to tweak it to get more chunks. 
* They can use this helper from their ndo_start_xmit() */ static inline void sk_pacing_shift_update(struct sock *sk, int val) { if (!sk || !sk_fullsock(sk) || READ_ONCE(sk->sk_pacing_shift) == val) return; WRITE_ONCE(sk->sk_pacing_shift, val); } /* if a socket is bound to a device, check that the given device * index is either the same or that the socket is bound to an L3 * master device and the given device index is also enslaved to * that L3 master */ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif) { int bound_dev_if = READ_ONCE(sk->sk_bound_dev_if); int mdif; if (!bound_dev_if || bound_dev_if == dif) return true; mdif = l3mdev_master_ifindex_by_index(sock_net(sk), dif); if (mdif && mdif == bound_dev_if) return true; return false; } void sock_def_readable(struct sock *sk); int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk); void sock_set_timestamp(struct sock *sk, int optname, bool valbool); int sock_set_timestamping(struct sock *sk, int optname, struct so_timestamping timestamping); #if defined(CONFIG_CGROUP_BPF) void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op); #else static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op) { } #endif void sock_no_linger(struct sock *sk); void sock_set_keepalive(struct sock *sk); void sock_set_priority(struct sock *sk, u32 priority); void sock_set_rcvbuf(struct sock *sk, int val); void sock_set_mark(struct sock *sk, u32 val); void sock_set_reuseaddr(struct sock *sk); void sock_set_reuseport(struct sock *sk); void sock_set_sndtimeo(struct sock *sk, s64 secs); int sock_bind_add(struct sock *sk, struct sockaddr *addr, int addr_len); int sock_get_timeout(long timeo, void *optval, bool old_timeval); int sock_copy_user_timeval(struct __kernel_sock_timeval *tv, sockptr_t optval, int optlen, bool old_timeval); int sock_ioctl_inout(struct sock *sk, unsigned int cmd, void __user *arg, void *karg, size_t size); int sk_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); static inline bool sk_is_readable(struct sock *sk) { const struct proto *prot = READ_ONCE(sk->sk_prot); if (prot->sock_is_readable) return prot->sock_is_readable(sk); return false; } #endif /* _SOCK_H */
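/*
 * Illustrative sketch, not part of either kernel header reproduced here: the
 * usual pairing of the fast socket lock helpers declared above. The function
 * name and the use of sock_read_timestamp() as the protected operation are
 * hypothetical; the locking pattern follows the lock_sock_fast() /
 * unlock_sock_fast() kernel-doc.
 */
#include <net/sock.h>

static ktime_t example_read_sock_stamp(struct sock *sk)
{
	bool slow = lock_sock_fast(sk);		/* true means the slow path was taken */
	ktime_t kt = sock_read_timestamp(sk);	/* any short, non-blocking section */

	unlock_sock_fast(sk, slow);		/* release_sock() or spin_unlock_bh() */
	return kt;
}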
/* SPDX-License-Identifier: GPL-2.0 */ /* * thermal_core.h * * Copyright (C) 2012 Intel Corp * Author: Durgadoss R <durgadoss.r@intel.com> */ #ifndef __THERMAL_CORE_H__ #define __THERMAL_CORE_H__ #include <linux/cleanup.h> #include <linux/device.h> #include <linux/thermal.h> #include "thermal_netlink.h" #include "thermal_thresholds.h" #include "thermal_debugfs.h" struct thermal_attr { struct device_attribute attr; char name[THERMAL_NAME_LENGTH]; }; struct thermal_trip_attrs { struct thermal_attr type; struct thermal_attr temp; struct thermal_attr hyst; }; struct thermal_trip_desc { struct thermal_trip trip; struct thermal_trip_attrs trip_attrs; struct list_head list_node; struct list_head thermal_instances; int threshold; }; /** * struct thermal_governor - structure that holds thermal governor information * @name: name of the governor * @bind_to_tz: callback called when binding to a thermal zone. If it * returns 0, the governor is bound to the thermal zone, * otherwise it fails. * @unbind_from_tz: callback called when a governor is unbound from a * thermal zone. * @trip_crossed: called for trip points that have just been crossed * @manage: called on thermal zone temperature updates * @update_tz: callback called when thermal zone internals have changed, e.g.
* thermal cooling instance was added/removed * @governor_list: node in thermal_governor_list (in thermal_core.c) */ struct thermal_governor { const char *name; int (*bind_to_tz)(struct thermal_zone_device *tz); void (*unbind_from_tz)(struct thermal_zone_device *tz); void (*trip_crossed)(struct thermal_zone_device *tz, const struct thermal_trip *trip, bool upward); void (*manage)(struct thermal_zone_device *tz); void (*update_tz)(struct thermal_zone_device *tz, enum thermal_notify_event reason); struct list_head governor_list; }; #define TZ_STATE_FLAG_SUSPENDED BIT(0) #define TZ_STATE_FLAG_RESUMING BIT(1) #define TZ_STATE_FLAG_INIT BIT(2) #define TZ_STATE_FLAG_EXIT BIT(3) #define TZ_STATE_READY 0 /** * struct thermal_zone_device - structure for a thermal zone * @id: unique id number for each thermal zone * @type: the thermal zone device type * @device: &struct device for this thermal zone * @removal: removal completion * @resume: resume completion * @trips_high: trips above the current zone temperature * @trips_reached: trips below or at the current zone temperature * @trips_invalid: trips with invalid temperature * @mode: current mode of this thermal zone * @devdata: private pointer for device private data * @num_trips: number of trip points the thermal zone supports * @passive_delay_jiffies: number of jiffies to wait between polls when * performing passive cooling. * @polling_delay_jiffies: number of jiffies to wait between polls when * checking whether trip points have been crossed (0 for * interrupt driven systems) * @recheck_delay_jiffies: delay after a failed attempt to determine the zone * temperature before trying again * @temperature: current temperature. This is only for core code, * drivers should use thermal_zone_get_temp() to get the * current temperature * @last_temperature: previous temperature read * @emul_temperature: emulated temperature when using CONFIG_THERMAL_EMULATION * @passive: 1 if you've crossed a passive trip point, 0 otherwise. * @prev_low_trip: the low current temperature if you've crossed a passive trip point. * @prev_high_trip: the above current temperature if you've crossed a passive trip point. 
* @ops: operations this &thermal_zone_device supports * @tzp: thermal zone parameters * @governor: pointer to the governor for this thermal zone * @governor_data: private pointer for governor data * @ida: &struct ida to generate unique id for this zone's cooling * devices * @lock: lock to protect thermal_instances list * @node: node in thermal_tz_list (in thermal_core.c) * @poll_queue: delayed work for polling * @notify_event: Last notification event * @state: current state of the thermal zone * @trips: array of struct thermal_trip objects */ struct thermal_zone_device { int id; char type[THERMAL_NAME_LENGTH]; struct device device; struct completion removal; struct completion resume; struct attribute_group trips_attribute_group; struct list_head trips_high; struct list_head trips_reached; struct list_head trips_invalid; enum thermal_device_mode mode; void *devdata; int num_trips; unsigned long passive_delay_jiffies; unsigned long polling_delay_jiffies; unsigned long recheck_delay_jiffies; int temperature; int last_temperature; int emul_temperature; int passive; int prev_low_trip; int prev_high_trip; struct thermal_zone_device_ops ops; struct thermal_zone_params *tzp; struct thermal_governor *governor; void *governor_data; struct ida ida; struct mutex lock; struct list_head node; struct delayed_work poll_queue; enum thermal_notify_event notify_event; u8 state; #ifdef CONFIG_THERMAL_DEBUGFS struct thermal_debugfs *debugfs; #endif struct list_head user_thresholds; struct thermal_trip_desc trips[] __counted_by(num_trips); }; DEFINE_GUARD(thermal_zone, struct thermal_zone_device *, mutex_lock(&_T->lock), mutex_unlock(&_T->lock)) DEFINE_GUARD(thermal_zone_reverse, struct thermal_zone_device *, mutex_unlock(&_T->lock), mutex_lock(&_T->lock)) /* Initial thermal zone temperature. */ #define THERMAL_TEMP_INIT INT_MIN /* * Default and maximum delay after a failed thermal zone temperature check * before attempting to check it again (in jiffies). 
*/ #define THERMAL_RECHECK_DELAY msecs_to_jiffies(250) #define THERMAL_MAX_RECHECK_DELAY (120 * HZ) /* Default Thermal Governor */ #if defined(CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE) #define DEFAULT_THERMAL_GOVERNOR "step_wise" #elif defined(CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE) #define DEFAULT_THERMAL_GOVERNOR "fair_share" #elif defined(CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE) #define DEFAULT_THERMAL_GOVERNOR "user_space" #elif defined(CONFIG_THERMAL_DEFAULT_GOV_POWER_ALLOCATOR) #define DEFAULT_THERMAL_GOVERNOR "power_allocator" #elif defined(CONFIG_THERMAL_DEFAULT_GOV_BANG_BANG) #define DEFAULT_THERMAL_GOVERNOR "bang_bang" #endif /* Initial state of a cooling device during binding */ #define THERMAL_NO_TARGET -1UL /* Init section thermal table */ extern struct thermal_governor *__governor_thermal_table[]; extern struct thermal_governor *__governor_thermal_table_end[]; #define THERMAL_TABLE_ENTRY(table, name) \ static typeof(name) *__thermal_table_entry_##name \ __used __section("__" #table "_thermal_table") = &name #define THERMAL_GOVERNOR_DECLARE(name) THERMAL_TABLE_ENTRY(governor, name) #define for_each_governor_table(__governor) \ for (__governor = __governor_thermal_table; \ __governor < __governor_thermal_table_end; \ __governor++) int for_each_thermal_zone(int (*cb)(struct thermal_zone_device *, void *), void *); int for_each_thermal_cooling_device(int (*cb)(struct thermal_cooling_device *, void *), void *); int for_each_thermal_governor(int (*cb)(struct thermal_governor *, void *), void *thermal_governor); struct thermal_zone_device *thermal_zone_get_by_id(int id); DEFINE_CLASS(thermal_zone_get_by_id, struct thermal_zone_device *, if (_T) put_device(&_T->device), thermal_zone_get_by_id(id), int id) static inline bool cdev_is_power_actor(struct thermal_cooling_device *cdev) { return cdev->ops->get_requested_power && cdev->ops->state2power && cdev->ops->power2state; } void thermal_cdev_update(struct thermal_cooling_device *); void thermal_cdev_update_nocheck(struct thermal_cooling_device *cdev); void __thermal_cdev_update(struct thermal_cooling_device *cdev); int get_tz_trend(struct thermal_zone_device *tz, const struct thermal_trip *trip); /* * This structure is used to describe the behavior of * a certain cooling device on a certain trip point * in a certain thermal zone */ struct thermal_instance { int id; char name[THERMAL_NAME_LENGTH]; struct thermal_cooling_device *cdev; const struct thermal_trip *trip; bool initialized; unsigned long upper; /* Highest cooling state for this trip point */ unsigned long lower; /* Lowest cooling state for this trip point */ unsigned long target; /* expected cooling state */ char attr_name[THERMAL_NAME_LENGTH]; struct device_attribute attr; char weight_attr_name[THERMAL_NAME_LENGTH]; struct device_attribute weight_attr; struct list_head trip_node; /* node in trip->thermal_instances */ struct list_head cdev_node; /* node in cdev->thermal_instances */ unsigned int weight; /* The weight of the cooling device */ bool upper_no_limit; }; #define to_thermal_zone(_dev) \ container_of(_dev, struct thermal_zone_device, device) #define to_cooling_device(_dev) \ container_of(_dev, struct thermal_cooling_device, device) int thermal_register_governor(struct thermal_governor *); void thermal_unregister_governor(struct thermal_governor *); int thermal_zone_device_set_policy(struct thermal_zone_device *, char *); int thermal_build_list_of_policies(char *buf); void __thermal_zone_device_update(struct thermal_zone_device *tz, enum thermal_notify_event event); void 
thermal_zone_device_critical_reboot(struct thermal_zone_device *tz); void thermal_zone_device_critical_shutdown(struct thermal_zone_device *tz); void thermal_governor_update_tz(struct thermal_zone_device *tz, enum thermal_notify_event reason); /* Helpers */ #define for_each_trip_desc(__tz, __td) \ for (__td = __tz->trips; __td - __tz->trips < __tz->num_trips; __td++) #define trip_to_trip_desc(__trip) \ container_of(__trip, struct thermal_trip_desc, trip) const char *thermal_trip_type_name(enum thermal_trip_type trip_type); void thermal_zone_set_trips(struct thermal_zone_device *tz, int low, int high); int thermal_zone_trip_id(const struct thermal_zone_device *tz, const struct thermal_trip *trip); int __thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp); void thermal_zone_set_trip_hyst(struct thermal_zone_device *tz, struct thermal_trip *trip, int hyst); /* sysfs I/F */ int thermal_zone_create_device_groups(struct thermal_zone_device *tz); void thermal_zone_destroy_device_groups(struct thermal_zone_device *); void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *); void thermal_cooling_device_destroy_sysfs(struct thermal_cooling_device *cdev); void thermal_cooling_device_stats_reinit(struct thermal_cooling_device *cdev); /* used only at binding time */ ssize_t trip_point_show(struct device *, struct device_attribute *, char *); ssize_t weight_show(struct device *, struct device_attribute *, char *); ssize_t weight_store(struct device *, struct device_attribute *, const char *, size_t); #ifdef CONFIG_THERMAL_STATISTICS void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev, unsigned long new_state); #else static inline void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev, unsigned long new_state) {} #endif /* CONFIG_THERMAL_STATISTICS */ #endif /* __THERMAL_CORE_H__ */ |
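/*
 * Editor's sketch (not part of the kernel tree): a minimal governor showing
 * how the struct thermal_governor hooks and the table macros above fit
 * together. The "demo" name and the log-only policy are hypothetical; real
 * governors (step_wise, bang_bang, ...) drive cooling devices from their
 * .manage/.trip_crossed callbacks instead of merely logging.
 */
static void demo_trip_crossed(struct thermal_zone_device *tz,
			      const struct thermal_trip *trip,
			      bool upward)
{
	/* Invoked by the core whenever a trip point is crossed. */
	dev_dbg(&tz->device, "trip %d crossed going %s\n",
		thermal_zone_trip_id(tz, trip), upward ? "up" : "down");
}

static struct thermal_governor thermal_gov_demo = {
	.name		= "demo",
	.trip_crossed	= demo_trip_crossed,
};

/* Emits an entry into the __governor_thermal_table init section. */
THERMAL_GOVERNOR_DECLARE(thermal_gov_demo);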
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 *  Hardware dependent layer
 *  Copyright (c) by Jaroslav Kysela <perex@perex.cz>
 */

#include <linux/major.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/time.h>
#include <linux/mutex.h>
#include <linux/module.h>
#include <linux/sched/signal.h>
#include <sound/core.h>
#include <sound/control.h>
#include <sound/minors.h>
#include <sound/hwdep.h>
#include <sound/info.h>

MODULE_AUTHOR("Jaroslav Kysela <perex@perex.cz>");
MODULE_DESCRIPTION("Hardware dependent layer");
MODULE_LICENSE("GPL");

static LIST_HEAD(snd_hwdep_devices);
static DEFINE_MUTEX(register_mutex);

static int snd_hwdep_dev_free(struct snd_device *device);
static int snd_hwdep_dev_register(struct snd_device *device);
static int snd_hwdep_dev_disconnect(struct snd_device *device);

static struct snd_hwdep *snd_hwdep_search(struct snd_card *card, int device)
{
	struct snd_hwdep *hwdep;

	list_for_each_entry(hwdep, &snd_hwdep_devices, list)
		if (hwdep->card == card && hwdep->device == device)
			return hwdep;
	return NULL;
}

static loff_t snd_hwdep_llseek(struct file *file, loff_t offset, int orig)
{
	struct snd_hwdep *hw = file->private_data;

	if (hw->ops.llseek)
		return hw->ops.llseek(hw, file, offset, orig);
	return -ENXIO;
}

static ssize_t snd_hwdep_read(struct file *file, char __user *buf,
			      size_t count, loff_t *offset)
{
	struct snd_hwdep *hw = file->private_data;

	if (hw->ops.read)
		return hw->ops.read(hw, buf, count, offset);
	return -ENXIO;
}

static ssize_t snd_hwdep_write(struct file *file, const char __user *buf,
			       size_t count, loff_t *offset)
{
	struct snd_hwdep *hw = file->private_data;

	if (hw->ops.write)
		return hw->ops.write(hw, buf, count, offset);
	return -ENXIO;
}

static int snd_hwdep_open(struct inode *inode, struct file *file)
{
	int major = imajor(inode);
	struct snd_hwdep *hw;
	int err;
	wait_queue_entry_t wait;

	if (major == snd_major) {
		hw = snd_lookup_minor_data(iminor(inode),
					   SNDRV_DEVICE_TYPE_HWDEP);
#ifdef CONFIG_SND_OSSEMUL
	} else if (major == SOUND_MAJOR) {
		hw = snd_lookup_oss_minor_data(iminor(inode),
					       SNDRV_OSS_DEVICE_TYPE_DMFM);
#endif
	} else
		return -ENXIO;
	if (hw == NULL)
		return -ENODEV;

	if (!try_module_get(hw->card->module)) {
		snd_card_unref(hw->card);
		return -EFAULT;
	}

	init_waitqueue_entry(&wait, current);
	add_wait_queue(&hw->open_wait, &wait);
	mutex_lock(&hw->open_mutex);
	while (1) {
		if (hw->exclusive && hw->used > 0) {
			err = -EBUSY;
			break;
		}
		if (!hw->ops.open) {
			err = 0;
			break;
		}
		err = hw->ops.open(hw, file);
		if (err >= 0)
			break;
		if (err == -EAGAIN) {
			if (file->f_flags & O_NONBLOCK) {
				err = -EBUSY;
				break;
			}
		} else
			break;
		set_current_state(TASK_INTERRUPTIBLE);
		mutex_unlock(&hw->open_mutex);
		schedule();
		mutex_lock(&hw->open_mutex);
		if (hw->card->shutdown) {
			err = -ENODEV;
			break;
		}
		if (signal_pending(current)) {
			err = -ERESTARTSYS;
			break;
		}
	}
	remove_wait_queue(&hw->open_wait, &wait);
	if (err >= 0) {
		err = snd_card_file_add(hw->card, file);
		if (err >= 0) {
			file->private_data = hw;
			hw->used++;
		} else {
			if (hw->ops.release)
				hw->ops.release(hw, file);
		}
	}
	mutex_unlock(&hw->open_mutex);
	if (err < 0)
		module_put(hw->card->module);
	snd_card_unref(hw->card);
	return err;
}

static int snd_hwdep_release(struct inode *inode, struct file *file)
{
	int err = 0;
	struct snd_hwdep *hw = file->private_data;
	struct module *mod = hw->card->module;

	scoped_guard(mutex, &hw->open_mutex) {
		if (hw->ops.release)
			err = hw->ops.release(hw, file);
		if (hw->used > 0)
			hw->used--;
	}
	wake_up(&hw->open_wait);

	snd_card_file_remove(hw->card, file);
	module_put(mod);
	return err;
}

static __poll_t snd_hwdep_poll(struct file *file, poll_table *wait)
{
	struct snd_hwdep *hw = file->private_data;

	if (hw->ops.poll)
		return hw->ops.poll(hw, file, wait);
	return 0;
}

static int snd_hwdep_info(struct snd_hwdep *hw,
			  struct snd_hwdep_info __user *_info)
{
	struct snd_hwdep_info info;

	memset(&info, 0, sizeof(info));
	info.card = hw->card->number;
	strscpy(info.id, hw->id, sizeof(info.id));
	strscpy(info.name, hw->name, sizeof(info.name));
	info.iface = hw->iface;
	if (copy_to_user(_info, &info, sizeof(info)))
		return -EFAULT;
	return 0;
}

static int snd_hwdep_dsp_status(struct snd_hwdep *hw,
				struct snd_hwdep_dsp_status __user *_info)
{
	struct snd_hwdep_dsp_status info;
	int err;

	if (!hw->ops.dsp_status)
		return -ENXIO;
	memset(&info, 0, sizeof(info));
	info.dsp_loaded = hw->dsp_loaded;
	err = hw->ops.dsp_status(hw, &info);
	if (err < 0)
		return err;
	if (copy_to_user(_info, &info, sizeof(info)))
		return -EFAULT;
	return 0;
}

static int snd_hwdep_dsp_load(struct snd_hwdep *hw,
			      struct snd_hwdep_dsp_image *info)
{
	int err;

	if (!hw->ops.dsp_load)
		return -ENXIO;
	if (info->index >= 32)
		return -EINVAL;
	/* check whether the dsp was already loaded */
	if (hw->dsp_loaded & (1u << info->index))
		return -EBUSY;
	err = hw->ops.dsp_load(hw, info);
	if (err < 0)
		return err;
	hw->dsp_loaded |= (1u << info->index);
	return 0;
}

static int snd_hwdep_dsp_load_user(struct snd_hwdep *hw,
				   struct snd_hwdep_dsp_image __user *_info)
{
	struct snd_hwdep_dsp_image info = {};

	if (copy_from_user(&info, _info, sizeof(info)))
		return -EFAULT;
	return snd_hwdep_dsp_load(hw, &info);
}

static long snd_hwdep_ioctl(struct file *file, unsigned int cmd,
			    unsigned long arg)
{
	struct snd_hwdep *hw = file->private_data;
	void __user *argp = (void __user *)arg;

	switch (cmd) {
	case SNDRV_HWDEP_IOCTL_PVERSION:
		return put_user(SNDRV_HWDEP_VERSION, (int __user *)argp);
	case SNDRV_HWDEP_IOCTL_INFO:
		return snd_hwdep_info(hw, argp);
	case SNDRV_HWDEP_IOCTL_DSP_STATUS:
		return snd_hwdep_dsp_status(hw, argp);
	case SNDRV_HWDEP_IOCTL_DSP_LOAD:
		return snd_hwdep_dsp_load_user(hw, argp);
	}
	if (hw->ops.ioctl)
		return hw->ops.ioctl(hw, file, cmd, arg);
	return -ENOTTY;
}

static int snd_hwdep_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct snd_hwdep *hw = file->private_data;

	if (hw->ops.mmap)
		return hw->ops.mmap(hw, file, vma);
	return -ENXIO;
}

static int snd_hwdep_control_ioctl(struct snd_card *card,
				   struct snd_ctl_file *control,
				   unsigned int cmd, unsigned long arg)
{
	switch (cmd) {
	case SNDRV_CTL_IOCTL_HWDEP_NEXT_DEVICE:
	{
		int device;

		if (get_user(device, (int __user *)arg))
			return -EFAULT;
		scoped_guard(mutex, &register_mutex) {
			if (device < 0)
				device = 0;
			else if (device < SNDRV_MINOR_HWDEPS)
				device++;
			else
				device = SNDRV_MINOR_HWDEPS;
			while (device < SNDRV_MINOR_HWDEPS) {
				if (snd_hwdep_search(card, device))
					break;
				device++;
			}
			if (device >= SNDRV_MINOR_HWDEPS)
				device = -1;
		}
		if (put_user(device, (int __user *)arg))
			return -EFAULT;
		return 0;
	}
	case SNDRV_CTL_IOCTL_HWDEP_INFO:
	{
		struct snd_hwdep_info __user *info = (struct snd_hwdep_info __user *)arg;
		int device;
		struct snd_hwdep *hwdep;

		if (get_user(device, &info->device))
			return -EFAULT;
		scoped_guard(mutex, &register_mutex) {
			hwdep = snd_hwdep_search(card, device);
			if (!hwdep)
				return -ENXIO;
			return snd_hwdep_info(hwdep, info);
		}
		break;
	}
	}
	return -ENOIOCTLCMD;
}

#ifdef CONFIG_COMPAT
#include "hwdep_compat.c"
#else
#define snd_hwdep_ioctl_compat	NULL
#endif

/*
 */

static const struct file_operations snd_hwdep_f_ops = {
	.owner =	THIS_MODULE,
	.llseek =	snd_hwdep_llseek,
	.read =		snd_hwdep_read,
	.write =	snd_hwdep_write,
	.open =		snd_hwdep_open,
	.release =	snd_hwdep_release,
	.poll =		snd_hwdep_poll,
	.unlocked_ioctl = snd_hwdep_ioctl,
	.compat_ioctl =	snd_hwdep_ioctl_compat,
	.mmap =		snd_hwdep_mmap,
};

static void snd_hwdep_free(struct snd_hwdep *hwdep)
{
	if (!hwdep)
		return;
	if (hwdep->private_free)
		hwdep->private_free(hwdep);
	put_device(hwdep->dev);
	kfree(hwdep);
}

/**
 * snd_hwdep_new - create a new hwdep instance
 * @card: the card instance
 * @id: the id string
 * @device: the device index (zero-based)
 * @rhwdep: the pointer to store the new hwdep instance
 *
 * Creates a new hwdep instance with the given index on the card.
 * The callbacks (hwdep->ops) must be set on the returned instance
 * after this call manually by the caller.
 *
 * Return: Zero if successful, or a negative error code on failure.
 */
int snd_hwdep_new(struct snd_card *card, char *id, int device,
		  struct snd_hwdep **rhwdep)
{
	struct snd_hwdep *hwdep;
	int err;
	static const struct snd_device_ops ops = {
		.dev_free = snd_hwdep_dev_free,
		.dev_register = snd_hwdep_dev_register,
		.dev_disconnect = snd_hwdep_dev_disconnect,
	};

	if (snd_BUG_ON(!card))
		return -ENXIO;
	if (rhwdep)
		*rhwdep = NULL;
	hwdep = kzalloc(sizeof(*hwdep), GFP_KERNEL);
	if (!hwdep)
		return -ENOMEM;

	init_waitqueue_head(&hwdep->open_wait);
	mutex_init(&hwdep->open_mutex);
	hwdep->card = card;
	hwdep->device = device;
	if (id)
		strscpy(hwdep->id, id, sizeof(hwdep->id));

	err = snd_device_alloc(&hwdep->dev, card);
	if (err < 0) {
		snd_hwdep_free(hwdep);
		return err;
	}
	dev_set_name(hwdep->dev, "hwC%iD%i", card->number, device);
#ifdef CONFIG_SND_OSSEMUL
	hwdep->oss_type = -1;
#endif

	err = snd_device_new(card, SNDRV_DEV_HWDEP, hwdep, &ops);
	if (err < 0) {
		snd_hwdep_free(hwdep);
		return err;
	}

	if (rhwdep)
		*rhwdep = hwdep;
	return 0;
}
EXPORT_SYMBOL(snd_hwdep_new);

static int snd_hwdep_dev_free(struct snd_device *device)
{
	snd_hwdep_free(device->device_data);
	return 0;
}

static int snd_hwdep_dev_register(struct snd_device *device)
{
	struct snd_hwdep *hwdep = device->device_data;
	struct snd_card *card = hwdep->card;
	int err;

	guard(mutex)(&register_mutex);
	if (snd_hwdep_search(card, hwdep->device))
		return -EBUSY;
	list_add_tail(&hwdep->list, &snd_hwdep_devices);
	err = snd_register_device(SNDRV_DEVICE_TYPE_HWDEP,
				  hwdep->card, hwdep->device,
				  &snd_hwdep_f_ops, hwdep, hwdep->dev);
	if (err < 0) {
		dev_err(hwdep->dev, "unable to register\n");
		list_del(&hwdep->list);
		return err;
	}

#ifdef CONFIG_SND_OSSEMUL
	hwdep->ossreg = 0;
	if (hwdep->oss_type >= 0) {
		if (hwdep->oss_type == SNDRV_OSS_DEVICE_TYPE_DMFM &&
		    hwdep->device)
			dev_warn(hwdep->dev,
				 "only hwdep device 0 can be registered as OSS direct FM device!\n");
		else if (snd_register_oss_device(hwdep->oss_type,
						 card, hwdep->device,
						 &snd_hwdep_f_ops, hwdep) < 0)
			dev_warn(hwdep->dev,
				 "unable to register OSS compatibility device\n");
		else
			hwdep->ossreg = 1;
	}
#endif
	return 0;
}

static int snd_hwdep_dev_disconnect(struct snd_device *device)
{
	struct snd_hwdep *hwdep = device->device_data;

	if (snd_BUG_ON(!hwdep))
		return -ENXIO;
	guard(mutex)(&register_mutex);
	if (snd_hwdep_search(hwdep->card, hwdep->device) != hwdep)
		return -EINVAL;
	guard(mutex)(&hwdep->open_mutex);
	wake_up(&hwdep->open_wait);
#ifdef CONFIG_SND_OSSEMUL
	if (hwdep->ossreg)
		snd_unregister_oss_device(hwdep->oss_type, hwdep->card, hwdep->device);
#endif
	snd_unregister_device(hwdep->dev);
	list_del_init(&hwdep->list);
	return 0;
}

#ifdef CONFIG_SND_PROC_FS
/*
 *  Info interface
 */

static void snd_hwdep_proc_read(struct snd_info_entry *entry,
				struct snd_info_buffer *buffer)
{
	struct snd_hwdep *hwdep;

	guard(mutex)(&register_mutex);
	list_for_each_entry(hwdep, &snd_hwdep_devices, list)
		snd_iprintf(buffer, "%02i-%02i: %s\n",
			    hwdep->card->number, hwdep->device, hwdep->name);
}

static struct snd_info_entry *snd_hwdep_proc_entry;

static void __init snd_hwdep_proc_init(void)
{
	struct snd_info_entry *entry;

	entry = snd_info_create_module_entry(THIS_MODULE, "hwdep", NULL);
	if (entry) {
		entry->c.text.read = snd_hwdep_proc_read;
		if (snd_info_register(entry) < 0) {
			snd_info_free_entry(entry);
			entry = NULL;
		}
	}
	snd_hwdep_proc_entry = entry;
}

static void __exit snd_hwdep_proc_done(void)
{
	snd_info_free_entry(snd_hwdep_proc_entry);
}
#else /* !CONFIG_SND_PROC_FS */
#define snd_hwdep_proc_init()
#define snd_hwdep_proc_done()
#endif /* CONFIG_SND_PROC_FS */

/*
 *  ENTRY functions
 */

static int __init alsa_hwdep_init(void)
{
	snd_hwdep_proc_init();
	snd_ctl_register_ioctl(snd_hwdep_control_ioctl);
	snd_ctl_register_ioctl_compat(snd_hwdep_control_ioctl);
	return 0;
}

static void __exit alsa_hwdep_exit(void)
{
	snd_ctl_unregister_ioctl(snd_hwdep_control_ioctl);
	snd_ctl_unregister_ioctl_compat(snd_hwdep_control_ioctl);
	snd_hwdep_proc_done();
}

module_init(alsa_hwdep_init)
module_exit(alsa_hwdep_exit)
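/*
 * Editor's sketch (not part of this file): the typical driver-side use of
 * snd_hwdep_new(). "mychip" and its trivial open callback are hypothetical;
 * a real driver fills in only the hw->ops callbacks it supports before
 * snd_card_register() makes the device reachable from user space.
 */
static int mychip_hwdep_open(struct snd_hwdep *hw, struct file *file)
{
	return 0;	/* accept any opener; gate access here if needed */
}

static int mychip_create_hwdep(struct snd_card *card, void *chip)
{
	struct snd_hwdep *hw;
	int err;

	err = snd_hwdep_new(card, "MyChip", 0, &hw);
	if (err < 0)
		return err;

	hw->private_data = chip;	/* retrieved in the ops callbacks */
	hw->exclusive = 1;		/* let snd_hwdep_open() allow one opener */
	hw->ops.open = mychip_hwdep_open;
	return 0;
}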
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Netlink interface for IEEE 802.15.4 stack
 *
 * Copyright 2007, 2008 Siemens AG
 *
 * Written by:
 * Sergey Lapin <slapin@ossfans.org>
 * Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
 * Maxim Osipov <maxim.osipov@siemens.com>
 */

#include <linux/kernel.h>
#include <linux/gfp.h>
#include <net/genetlink.h>
#include <linux/nl802154.h>

#include "ieee802154.h"

static unsigned int ieee802154_seq_num;
static DEFINE_SPINLOCK(ieee802154_seq_lock);

/* Requests to userspace */
struct sk_buff *ieee802154_nl_create(int flags, u8 req)
{
	void *hdr;
	struct sk_buff *msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
	unsigned long f;

	if (!msg)
		return NULL;

	spin_lock_irqsave(&ieee802154_seq_lock, f);
	hdr = genlmsg_put(msg, 0, ieee802154_seq_num++,
			  &nl802154_family, flags, req);
	spin_unlock_irqrestore(&ieee802154_seq_lock, f);
	if (!hdr) {
		nlmsg_free(msg);
		return NULL;
	}

	return msg;
}

int ieee802154_nl_mcast(struct sk_buff *msg, unsigned int group)
{
	struct nlmsghdr *nlh = nlmsg_hdr(msg);
	void *hdr = genlmsg_data(nlmsg_data(nlh));

	genlmsg_end(msg, hdr);

	return genlmsg_multicast(&nl802154_family, msg, 0, group, GFP_ATOMIC);
}

struct sk_buff *ieee802154_nl_new_reply(struct genl_info *info,
					int flags, u8 req)
{
	void *hdr;
	struct sk_buff *msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);

	if (!msg)
		return NULL;

	hdr = genlmsg_put_reply(msg, info, &nl802154_family, flags, req);
	if (!hdr) {
		nlmsg_free(msg);
		return NULL;
	}

	return msg;
}

int ieee802154_nl_reply(struct sk_buff *msg, struct genl_info *info)
{
	struct nlmsghdr *nlh = nlmsg_hdr(msg);
	void *hdr = genlmsg_data(nlmsg_data(nlh));

	genlmsg_end(msg, hdr);

	return genlmsg_reply(msg, info);
}

static const struct genl_small_ops ieee802154_ops[] = {
	/* see nl-phy.c */
	IEEE802154_DUMP(IEEE802154_LIST_PHY, ieee802154_list_phy,
			ieee802154_dump_phy),
	IEEE802154_OP(IEEE802154_ADD_IFACE, ieee802154_add_iface),
	IEEE802154_OP(IEEE802154_DEL_IFACE, ieee802154_del_iface),
	/* see nl-mac.c */
	IEEE802154_OP(IEEE802154_ASSOCIATE_REQ, ieee802154_associate_req),
	IEEE802154_OP(IEEE802154_ASSOCIATE_RESP, ieee802154_associate_resp),
	IEEE802154_OP(IEEE802154_DISASSOCIATE_REQ, ieee802154_disassociate_req),
	IEEE802154_OP(IEEE802154_SCAN_REQ, ieee802154_scan_req),
	IEEE802154_OP(IEEE802154_START_REQ, ieee802154_start_req),
	IEEE802154_DUMP(IEEE802154_LIST_IFACE, ieee802154_list_iface,
			ieee802154_dump_iface),
	IEEE802154_OP(IEEE802154_SET_MACPARAMS, ieee802154_set_macparams),
	IEEE802154_OP(IEEE802154_LLSEC_GETPARAMS, ieee802154_llsec_getparams),
	IEEE802154_OP(IEEE802154_LLSEC_SETPARAMS, ieee802154_llsec_setparams),
	IEEE802154_DUMP(IEEE802154_LLSEC_LIST_KEY, NULL,
			ieee802154_llsec_dump_keys),
	IEEE802154_OP(IEEE802154_LLSEC_ADD_KEY, ieee802154_llsec_add_key),
	IEEE802154_OP(IEEE802154_LLSEC_DEL_KEY, ieee802154_llsec_del_key),
	IEEE802154_DUMP(IEEE802154_LLSEC_LIST_DEV, NULL,
			ieee802154_llsec_dump_devs),
	IEEE802154_OP(IEEE802154_LLSEC_ADD_DEV, ieee802154_llsec_add_dev),
	IEEE802154_OP(IEEE802154_LLSEC_DEL_DEV, ieee802154_llsec_del_dev),
	IEEE802154_DUMP(IEEE802154_LLSEC_LIST_DEVKEY, NULL,
			ieee802154_llsec_dump_devkeys),
	IEEE802154_OP(IEEE802154_LLSEC_ADD_DEVKEY, ieee802154_llsec_add_devkey),
	IEEE802154_OP(IEEE802154_LLSEC_DEL_DEVKEY, ieee802154_llsec_del_devkey),
	IEEE802154_DUMP(IEEE802154_LLSEC_LIST_SECLEVEL, NULL,
			ieee802154_llsec_dump_seclevels),
	IEEE802154_OP(IEEE802154_LLSEC_ADD_SECLEVEL, ieee802154_llsec_add_seclevel),
	IEEE802154_OP(IEEE802154_LLSEC_DEL_SECLEVEL, ieee802154_llsec_del_seclevel),
};

static const struct genl_multicast_group ieee802154_mcgrps[] = {
	[IEEE802154_COORD_MCGRP] = { .name = IEEE802154_MCAST_COORD_NAME, },
	[IEEE802154_BEACON_MCGRP] = { .name = IEEE802154_MCAST_BEACON_NAME, },
};

struct genl_family nl802154_family __ro_after_init = {
	.hdrsize	= 0,
	.name		= IEEE802154_NL_NAME,
	.version	= 1,
	.maxattr	= IEEE802154_ATTR_MAX,
	.policy		= ieee802154_policy,
	.module		= THIS_MODULE,
	.small_ops	= ieee802154_ops,
	.n_small_ops	= ARRAY_SIZE(ieee802154_ops),
	.resv_start_op	= IEEE802154_LLSEC_DEL_SECLEVEL + 1,
	.mcgrps		= ieee802154_mcgrps,
	.n_mcgrps	= ARRAY_SIZE(ieee802154_mcgrps),
};

int __init ieee802154_nl_init(void)
{
	return genl_register_family(&nl802154_family);
}

void ieee802154_nl_exit(void)
{
	genl_unregister_family(&nl802154_family);
}
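/*
 * Editor's sketch (not part of this file): the create/fill/mcast pattern
 * the helpers above support. DEMO_CMD and DEMO_ATTR are hypothetical ids
 * chosen only for illustration; the real users of this interface live in
 * nl-mac.c and nl-phy.c with commands and attributes from nl802154.h.
 */
enum { DEMO_CMD = 1, DEMO_ATTR = 1 };

static int demo_coord_notify(u16 pan_id)
{
	struct sk_buff *msg;

	msg = ieee802154_nl_create(0, DEMO_CMD);
	if (!msg)
		return -ENOBUFS;

	if (nla_put_u16(msg, DEMO_ATTR, pan_id)) {
		nlmsg_free(msg);
		return -EMSGSIZE;
	}

	/* Finalizes the message and multicasts it to the coordinator group. */
	return ieee802154_nl_mcast(msg, IEEE802154_COORD_MCGRP);
}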
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Written by Mark Hemment, 1996 (markhe@nextd.demon.co.uk).
 *
 * (C) SGI 2006, Christoph Lameter
 *	Cleaned up and restructured to ease the addition of alternative
 *	implementations of SLAB allocators.
 * (C) Linux Foundation 2008-2013
 *	Unified interface for all slab allocators
 */

#ifndef _LINUX_SLAB_H
#define _LINUX_SLAB_H

#include <linux/cache.h>
#include <linux/gfp.h>
#include <linux/overflow.h>
#include <linux/types.h>
#include <linux/rcupdate.h>
#include <linux/workqueue.h>
#include <linux/percpu-refcount.h>
#include <linux/cleanup.h>
#include <linux/hash.h>

enum _slab_flag_bits {
	_SLAB_CONSISTENCY_CHECKS,
	_SLAB_RED_ZONE,
	_SLAB_POISON,
	_SLAB_KMALLOC,
	_SLAB_HWCACHE_ALIGN,
	_SLAB_CACHE_DMA,
	_SLAB_CACHE_DMA32,
	_SLAB_STORE_USER,
	_SLAB_PANIC,
	_SLAB_TYPESAFE_BY_RCU,
	_SLAB_TRACE,
#ifdef CONFIG_DEBUG_OBJECTS
	_SLAB_DEBUG_OBJECTS,
#endif
	_SLAB_NOLEAKTRACE,
	_SLAB_NO_MERGE,
#ifdef CONFIG_FAILSLAB
	_SLAB_FAILSLAB,
#endif
#ifdef CONFIG_MEMCG
	_SLAB_ACCOUNT,
#endif
#ifdef CONFIG_KASAN_GENERIC
	_SLAB_KASAN,
#endif
	_SLAB_NO_USER_FLAGS,
#ifdef CONFIG_KFENCE
	_SLAB_SKIP_KFENCE,
#endif
#ifndef CONFIG_SLUB_TINY
	_SLAB_RECLAIM_ACCOUNT,
#endif
	_SLAB_OBJECT_POISON,
	_SLAB_CMPXCHG_DOUBLE,
#ifdef CONFIG_SLAB_OBJ_EXT
	_SLAB_NO_OBJ_EXT,
#endif
	_SLAB_FLAGS_LAST_BIT
};

#define __SLAB_FLAG_BIT(nr)	((slab_flags_t __force)(1U << (nr)))
#define __SLAB_FLAG_UNUSED	((slab_flags_t __force)(0U))

/*
 * Flags to pass to kmem_cache_create().
 * The ones marked DEBUG need CONFIG_SLUB_DEBUG enabled, otherwise are no-op
 */
/* DEBUG: Perform (expensive) checks on alloc/free */
#define SLAB_CONSISTENCY_CHECKS	__SLAB_FLAG_BIT(_SLAB_CONSISTENCY_CHECKS)
/* DEBUG: Red zone objs in a cache */
#define SLAB_RED_ZONE		__SLAB_FLAG_BIT(_SLAB_RED_ZONE)
/* DEBUG: Poison objects */
#define SLAB_POISON		__SLAB_FLAG_BIT(_SLAB_POISON)
/* Indicate a kmalloc slab */
#define SLAB_KMALLOC		__SLAB_FLAG_BIT(_SLAB_KMALLOC)
/**
 * define SLAB_HWCACHE_ALIGN - Align objects on cache line boundaries.
 *
 * Sufficiently large objects are aligned on cache line boundary. For object
 * size smaller than a half of cache line size, the alignment is on the half of
 * cache line size. In general, if object size is smaller than 1/2^n of cache
 * line size, the alignment is adjusted to 1/2^n.
 *
 * If explicit alignment is also requested by the respective
 * &struct kmem_cache_args field, the greater of both alignments is applied.
 */
#define SLAB_HWCACHE_ALIGN	__SLAB_FLAG_BIT(_SLAB_HWCACHE_ALIGN)
/* Use GFP_DMA memory */
#define SLAB_CACHE_DMA		__SLAB_FLAG_BIT(_SLAB_CACHE_DMA)
/* Use GFP_DMA32 memory */
#define SLAB_CACHE_DMA32	__SLAB_FLAG_BIT(_SLAB_CACHE_DMA32)
/* DEBUG: Store the last owner for bug hunting */
#define SLAB_STORE_USER		__SLAB_FLAG_BIT(_SLAB_STORE_USER)
/* Panic if kmem_cache_create() fails */
#define SLAB_PANIC		__SLAB_FLAG_BIT(_SLAB_PANIC)
/**
 * define SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS!
 *
 * This delays freeing the SLAB page by a grace period, it does _NOT_
 * delay object freeing. This means that if you do kmem_cache_free()
 * that memory location is free to be reused at any time. Thus it may
 * be possible to see another object there in the same RCU grace period.
 *
 * This feature only ensures the memory location backing the object
 * stays valid, the trick to using this is relying on an independent
 * object validation pass. Something like:
 *
 * ::
 *
 *  begin:
 *   rcu_read_lock();
 *   obj = lockless_lookup(key);
 *   if (obj) {
 *     if (!try_get_ref(obj)) { // might fail for free objects
 *       rcu_read_unlock();
 *       goto begin;
 *     }
 *
 *     if (obj->key != key) { // not the object we expected
 *       put_ref(obj);
 *       rcu_read_unlock();
 *       goto begin;
 *     }
 *   }
 *   rcu_read_unlock();
 *
 * This is useful if we need to approach a kernel structure obliquely,
 * from its address obtained without the usual locking. We can lock
 * the structure to stabilize it and check it's still at the given address,
 * only if we can be sure that the memory has not been meanwhile reused
 * for some other kind of object (which our subsystem's lock might corrupt).
 *
 * rcu_read_lock before reading the address, then rcu_read_unlock after
 * taking the spinlock within the structure expected at that address.
 *
 * Note that the object identity check has to be done *after* acquiring a
 * reference, therefore the user has to ensure proper ordering for loads.
 * Similarly, when initializing objects allocated with SLAB_TYPESAFE_BY_RCU,
 * the newly allocated object has to be fully initialized *before* its
 * refcount gets initialized and proper ordering for stores is required.
 * refcount_{add|inc}_not_zero_acquire() and refcount_set_release() are
 * designed with the proper fences required for reference counting objects
 * allocated with SLAB_TYPESAFE_BY_RCU.
 *
 * Note that it is not possible to acquire a lock within a structure
 * allocated with SLAB_TYPESAFE_BY_RCU without first acquiring a reference
 * as described above.  The reason is that SLAB_TYPESAFE_BY_RCU pages
 * are not zeroed before being given to the slab, which means that any
 * locks must be initialized after each and every kmem_cache_alloc().
 * Alternatively, make the ctor passed to kmem_cache_create() initialize
 * the locks at page-allocation time, as is done in __i915_request_ctor(),
 * sighand_ctor(), and anon_vma_ctor().  Such a ctor permits readers
 * to safely acquire those ctor-initialized locks under rcu_read_lock()
 * protection.
 *
 * Note that SLAB_TYPESAFE_BY_RCU was originally named SLAB_DESTROY_BY_RCU.
 */
#define SLAB_TYPESAFE_BY_RCU	__SLAB_FLAG_BIT(_SLAB_TYPESAFE_BY_RCU)
/* Trace allocations and frees */
#define SLAB_TRACE		__SLAB_FLAG_BIT(_SLAB_TRACE)

/* Flag to prevent checks on free */
#ifdef CONFIG_DEBUG_OBJECTS
# define SLAB_DEBUG_OBJECTS	__SLAB_FLAG_BIT(_SLAB_DEBUG_OBJECTS)
#else
# define SLAB_DEBUG_OBJECTS	__SLAB_FLAG_UNUSED
#endif

/* Avoid kmemleak tracing */
#define SLAB_NOLEAKTRACE	__SLAB_FLAG_BIT(_SLAB_NOLEAKTRACE)

/*
 * Prevent merging with compatible kmem caches. This flag should be used
 * cautiously. Valid use cases:
 *
 * - caches created for self-tests (e.g. kunit)
 * - general caches created and used by a subsystem, only when a
 *   (subsystem-specific) debug option is enabled
 * - performance critical caches, should be very rare and consulted with slab
 *   maintainers, and not used together with CONFIG_SLUB_TINY
 */
#define SLAB_NO_MERGE		__SLAB_FLAG_BIT(_SLAB_NO_MERGE)

/* Fault injection mark */
#ifdef CONFIG_FAILSLAB
# define SLAB_FAILSLAB		__SLAB_FLAG_BIT(_SLAB_FAILSLAB)
#else
# define SLAB_FAILSLAB		__SLAB_FLAG_UNUSED
#endif
/**
 * define SLAB_ACCOUNT - Account allocations to memcg.
 *
 * All object allocations from this cache will be memcg accounted, regardless of
 * __GFP_ACCOUNT being or not being passed to individual allocations.
 */
#ifdef CONFIG_MEMCG
# define SLAB_ACCOUNT		__SLAB_FLAG_BIT(_SLAB_ACCOUNT)
#else
# define SLAB_ACCOUNT		__SLAB_FLAG_UNUSED
#endif

#ifdef CONFIG_KASAN_GENERIC
#define SLAB_KASAN		__SLAB_FLAG_BIT(_SLAB_KASAN)
#else
#define SLAB_KASAN		__SLAB_FLAG_UNUSED
#endif

/*
 * Ignore user specified debugging flags.
 * Intended for caches created for self-tests so they have only flags
 * specified in the code and other flags are ignored.
 */
#define SLAB_NO_USER_FLAGS	__SLAB_FLAG_BIT(_SLAB_NO_USER_FLAGS)

#ifdef CONFIG_KFENCE
#define SLAB_SKIP_KFENCE	__SLAB_FLAG_BIT(_SLAB_SKIP_KFENCE)
#else
#define SLAB_SKIP_KFENCE	__SLAB_FLAG_UNUSED
#endif

/* The following flags affect the page allocator grouping pages by mobility */
/**
 * define SLAB_RECLAIM_ACCOUNT - Objects are reclaimable.
 *
 * Use this flag for caches that have an associated shrinker. As a result, slab
 * pages are allocated with __GFP_RECLAIMABLE, which affects grouping pages by
 * mobility, and are accounted in SReclaimable counter in /proc/meminfo
 */
#ifndef CONFIG_SLUB_TINY
#define SLAB_RECLAIM_ACCOUNT	__SLAB_FLAG_BIT(_SLAB_RECLAIM_ACCOUNT)
#else
#define SLAB_RECLAIM_ACCOUNT	__SLAB_FLAG_UNUSED
#endif
#define SLAB_TEMPORARY		SLAB_RECLAIM_ACCOUNT	/* Objects are short-lived */

/* Slab created using create_boot_cache */
#ifdef CONFIG_SLAB_OBJ_EXT
#define SLAB_NO_OBJ_EXT		__SLAB_FLAG_BIT(_SLAB_NO_OBJ_EXT)
#else
#define SLAB_NO_OBJ_EXT		__SLAB_FLAG_UNUSED
#endif

/*
 * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
 *
 * Dereferencing ZERO_SIZE_PTR will lead to a distinct access fault.
 *
 * ZERO_SIZE_PTR can be passed to kfree though in the same way that NULL can.
 * Both make kfree a no-op.
 */
#define ZERO_SIZE_PTR ((void *)16)

#define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
				(unsigned long)ZERO_SIZE_PTR)

#include <linux/kasan.h>

struct list_lru;
struct mem_cgroup;
/*
 * struct kmem_cache related prototypes
 */
bool slab_is_available(void);

/**
 * struct kmem_cache_args - Less common arguments for kmem_cache_create()
 *
 * Any uninitialized fields of the structure are interpreted as unused. The
 * exception is @freeptr_offset where %0 is a valid value, so
 * @use_freeptr_offset must also be set to %true in order to interpret the
 * field as used. For @useroffset %0 is also valid, but only with non-%0
 * @usersize.
 *
 * When %NULL args is passed to kmem_cache_create(), it is equivalent to all
 * fields unused.
 */
struct kmem_cache_args {
	/**
	 * @align: The required alignment for the objects.
	 *
	 * %0 means no specific alignment is requested.
	 */
	unsigned int align;
	/**
	 * @useroffset: Usercopy region offset.
	 *
	 * %0 is a valid offset, when @usersize is non-%0
	 */
	unsigned int useroffset;
	/**
	 * @usersize: Usercopy region size.
	 *
	 * %0 means no usercopy region is specified.
	 */
	unsigned int usersize;
	/**
	 * @freeptr_offset: Custom offset for the free pointer
	 * in &SLAB_TYPESAFE_BY_RCU caches
	 *
	 * By default &SLAB_TYPESAFE_BY_RCU caches place the free pointer
	 * outside of the object. This might cause the object to grow in size.
	 * Cache creators that have a reason to avoid this can specify a custom
	 * free pointer offset in their struct where the free pointer will be
	 * placed.
	 *
	 * Note that placing the free pointer inside the object requires the
	 * caller to ensure that no fields are invalidated that are required to
	 * guard against object recycling (See &SLAB_TYPESAFE_BY_RCU for
	 * details).
	 *
	 * Using %0 as a value for @freeptr_offset is valid. If @freeptr_offset
	 * is specified, %use_freeptr_offset must be set %true.
	 *
	 * Note that @ctor currently isn't supported with custom free pointers
	 * as a @ctor requires an external free pointer.
	 */
	unsigned int freeptr_offset;
	/**
	 * @use_freeptr_offset: Whether a @freeptr_offset is used.
	 */
	bool use_freeptr_offset;
	/**
	 * @ctor: A constructor for the objects.
	 *
	 * The constructor is invoked for each object in a newly allocated slab
	 * page. It is the cache user's responsibility to free the object in
	 * the same state as after calling the constructor, or deal
	 * appropriately with any differences between a freshly constructed
	 * and a reallocated object.
	 *
	 * %NULL means no constructor.
	 */
	void (*ctor)(void *);
};

struct kmem_cache *__kmem_cache_create_args(const char *name,
					    unsigned int object_size,
					    struct kmem_cache_args *args,
					    slab_flags_t flags);
static inline struct kmem_cache *
__kmem_cache_create(const char *name, unsigned int size, unsigned int align,
		    slab_flags_t flags, void (*ctor)(void *))
{
	struct kmem_cache_args kmem_args = {
		.align	= align,
		.ctor	= ctor,
	};

	return __kmem_cache_create_args(name, size, &kmem_args, flags);
}

/**
 * kmem_cache_create_usercopy - Create a kmem cache with a region suitable
 * for copying to userspace.
 * @name: A string which is used in /proc/slabinfo to identify this cache.
 * @size: The size of objects to be created in this cache.
 * @align: The required alignment for the objects.
 * @flags: SLAB flags
 * @useroffset: Usercopy region offset
 * @usersize: Usercopy region size
 * @ctor: A constructor for the objects, or %NULL.
 *
 * This is a legacy wrapper, new code should use either KMEM_CACHE_USERCOPY()
 * if whitelisting a single field is sufficient, or kmem_cache_create() with
 * the necessary parameters passed via the args parameter (see
 * &struct kmem_cache_args)
 *
 * Return: a pointer to the cache on success, NULL on failure.
 */
static inline struct kmem_cache *
kmem_cache_create_usercopy(const char *name, unsigned int size,
			   unsigned int align, slab_flags_t flags,
			   unsigned int useroffset, unsigned int usersize,
			   void (*ctor)(void *))
{
	struct kmem_cache_args kmem_args = {
		.align		= align,
		.ctor		= ctor,
		.useroffset	= useroffset,
		.usersize	= usersize,
	};

	return __kmem_cache_create_args(name, size, &kmem_args, flags);
}

/* If NULL is passed for @args, use this variant with default arguments. */
static inline struct kmem_cache *
__kmem_cache_default_args(const char *name, unsigned int size,
			  struct kmem_cache_args *args,
			  slab_flags_t flags)
{
	struct kmem_cache_args kmem_default_args = {};

	/* Make sure we don't get passed garbage. */
	if (WARN_ON_ONCE(args))
		return ERR_PTR(-EINVAL);

	return __kmem_cache_create_args(name, size, &kmem_default_args, flags);
}

/**
 * kmem_cache_create - Create a kmem cache.
 * @__name: A string which is used in /proc/slabinfo to identify this cache.
 * @__object_size: The size of objects to be created in this cache.
 * @__args: Optional arguments, see &struct kmem_cache_args. Passing %NULL
 *	    means defaults will be used for all the arguments.
 *
 * This is currently implemented as a macro using ``_Generic()`` to call
 * either the new variant of the function, or a legacy one.
 *
 * The new variant has 4 parameters:
 * ``kmem_cache_create(name, object_size, args, flags)``
 *
 * See __kmem_cache_create_args() which implements this.
 *
 * The legacy variant has 5 parameters:
 * ``kmem_cache_create(name, object_size, align, flags, ctor)``
 *
 * The align and ctor parameters map to the respective fields of
 * &struct kmem_cache_args
 *
 * Context: Cannot be called within an interrupt, but can be interrupted.
 *
 * Return: a pointer to the cache on success, NULL on failure.
 */
#define kmem_cache_create(__name, __object_size, __args, ...)		\
	_Generic((__args),						\
		struct kmem_cache_args *: __kmem_cache_create_args,	\
		void *: __kmem_cache_default_args,			\
		default: __kmem_cache_create)(__name, __object_size, __args, __VA_ARGS__)

void kmem_cache_destroy(struct kmem_cache *s);
int kmem_cache_shrink(struct kmem_cache *s);

/*
 * Please use this macro to create slab caches. Simply specify the
 * name of the structure and maybe some flags that are listed above.
 *
 * The alignment of the struct determines object alignment. If you
 * f.e. add ____cacheline_aligned_in_smp to the struct declaration
 * then the objects will be properly aligned in SMP configurations.
 */
#define KMEM_CACHE(__struct, __flags)					\
	__kmem_cache_create_args(#__struct, sizeof(struct __struct),	\
			&(struct kmem_cache_args) {			\
				.align	= __alignof__(struct __struct),	\
			}, (__flags))

/*
 * To whitelist a single field for copying to/from usercopy, use this
 * macro instead of KMEM_CACHE() above.
 */
#define KMEM_CACHE_USERCOPY(__struct, __flags, __field)			\
	__kmem_cache_create_args(#__struct, sizeof(struct __struct),	\
			&(struct kmem_cache_args) {			\
				.align		= __alignof__(struct __struct),		\
				.useroffset	= offsetof(struct __struct, __field),	\
				.usersize	= sizeof_field(struct __struct, __field), \
			}, (__flags))
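/*
 * Editor's sketch (not part of this header): one cache created through the
 * entry points above. "struct demo_obj" and the wrapper are hypothetical.
 */
struct demo_obj { int id; char payload[24]; };

static inline struct kmem_cache *demo_cache_create(void)
{
	struct kmem_cache_args args = {
		.align = __alignof__(struct demo_obj),	/* new-style args */
	};

	/*
	 * The third argument is a struct kmem_cache_args *, so _Generic()
	 * dispatches to __kmem_cache_create_args(). Passing NULL instead
	 * selects __kmem_cache_default_args(), and the legacy 5-argument
	 * form (name, size, align, flags, ctor) still compiles as well.
	 * KMEM_CACHE(demo_obj, SLAB_HWCACHE_ALIGN) is the usual shorthand
	 * when the struct name should double as the cache name.
	 */
	return kmem_cache_create("demo_obj", sizeof(struct demo_obj),
				 &args, SLAB_HWCACHE_ALIGN);
}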
/*
 * Common kmalloc functions provided by all allocators
 */
void * __must_check krealloc_noprof(const void *objp, size_t new_size,
				    gfp_t flags) __realloc_size(2);
#define krealloc(...)			alloc_hooks(krealloc_noprof(__VA_ARGS__))

void kfree(const void *objp);
void kfree_sensitive(const void *objp);
size_t __ksize(const void *objp);

DEFINE_FREE(kfree, void *, if (!IS_ERR_OR_NULL(_T)) kfree(_T))
DEFINE_FREE(kfree_sensitive, void *, if (_T) kfree_sensitive(_T))

/**
 * ksize - Report actual allocation size of associated object
 *
 * @objp: Pointer returned from a prior kmalloc()-family allocation.
 *
 * This should not be used for writing beyond the originally requested
 * allocation size. Either use krealloc() or round up the allocation size
 * with kmalloc_size_roundup() prior to allocation. If this is used to
 * access beyond the originally requested allocation size, UBSAN_BOUNDS
 * and/or FORTIFY_SOURCE may trip, since they only know about the
 * originally allocated size via the __alloc_size attribute.
 */
size_t ksize(const void *objp);

#ifdef CONFIG_PRINTK
bool kmem_dump_obj(void *object);
#else
static inline bool kmem_dump_obj(void *object) { return false; }
#endif

/*
 * Some archs want to perform DMA into kmalloc caches and need a guaranteed
 * alignment larger than the alignment of a 64-bit integer.
 * Setting ARCH_DMA_MINALIGN in arch headers allows that.
 */
#ifdef ARCH_HAS_DMA_MINALIGN
#if ARCH_DMA_MINALIGN > 8 && !defined(ARCH_KMALLOC_MINALIGN)
#define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
#endif
#endif

#ifndef ARCH_KMALLOC_MINALIGN
#define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
#elif ARCH_KMALLOC_MINALIGN > 8
#define KMALLOC_MIN_SIZE ARCH_KMALLOC_MINALIGN
#define KMALLOC_SHIFT_LOW ilog2(KMALLOC_MIN_SIZE)
#endif

/*
 * Setting ARCH_SLAB_MINALIGN in arch headers allows a different alignment.
 * Intended for arches that get misalignment faults even for 64 bit integer
 * aligned buffers.
 */
#ifndef ARCH_SLAB_MINALIGN
#define ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
#endif

/*
 * Arches can define this function if they want to decide the minimum slab
 * alignment at runtime. The value returned by the function must be a power
 * of two and >= ARCH_SLAB_MINALIGN.
 */
#ifndef arch_slab_minalign
static inline unsigned int arch_slab_minalign(void)
{
	return ARCH_SLAB_MINALIGN;
}
#endif

/*
 * kmem_cache_alloc and friends return pointers aligned to ARCH_SLAB_MINALIGN.
 * kmalloc and friends return pointers aligned to both ARCH_KMALLOC_MINALIGN
 * and ARCH_SLAB_MINALIGN, but here we only assume the former alignment.
 */
#define __assume_kmalloc_alignment __assume_aligned(ARCH_KMALLOC_MINALIGN)
#define __assume_slab_alignment __assume_aligned(ARCH_SLAB_MINALIGN)
#define __assume_page_alignment __assume_aligned(PAGE_SIZE)

/*
 * Kmalloc array related definitions
 */

/*
 * SLUB directly allocates requests fitting in to an order-1 page
 * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
 */
#define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
#define KMALLOC_SHIFT_MAX	(MAX_PAGE_ORDER + PAGE_SHIFT)
#ifndef KMALLOC_SHIFT_LOW
#define KMALLOC_SHIFT_LOW	3
#endif

/* Maximum allocatable size */
#define KMALLOC_MAX_SIZE	(1UL << KMALLOC_SHIFT_MAX)
/* Maximum size for which we actually use a slab cache */
#define KMALLOC_MAX_CACHE_SIZE	(1UL << KMALLOC_SHIFT_HIGH)
/* Maximum order allocatable via the slab allocator */
#define KMALLOC_MAX_ORDER	(KMALLOC_SHIFT_MAX - PAGE_SHIFT)

/*
 * Kmalloc subsystem.
 */
#ifndef KMALLOC_MIN_SIZE
#define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW)
#endif

/*
 * This restriction comes from the byte sized index implementation.
 * Page size is normally 2^12 bytes and, in this case, if we want to use
 * a byte sized index which can represent 2^8 entries, the size of the object
 * should be equal or greater to 2^12 / 2^8 = 2^4 = 16.
 * If the minimum size of kmalloc is less than 16, we use it as the minimum
 * object size and give up on using the byte sized index.
 */
#define SLAB_OBJ_MIN_SIZE	(KMALLOC_MIN_SIZE < 16 ? \
				(KMALLOC_MIN_SIZE) : 16)

#ifdef CONFIG_RANDOM_KMALLOC_CACHES
#define RANDOM_KMALLOC_CACHES_NR	15 // # of cache copies
#else
#define RANDOM_KMALLOC_CACHES_NR	0
#endif

/*
 * Whenever changing this, take care that kmalloc_type() and
 * create_kmalloc_caches() still work as intended.
 *
 * KMALLOC_NORMAL can contain only unaccounted objects whereas KMALLOC_CGROUP
 * is for accounted but unreclaimable and non-dma objects. All the other
 * kmem caches can have both accounted and unaccounted objects.
 */
enum kmalloc_cache_type {
	KMALLOC_NORMAL = 0,
#ifndef CONFIG_ZONE_DMA
	KMALLOC_DMA = KMALLOC_NORMAL,
#endif
#ifndef CONFIG_MEMCG
	KMALLOC_CGROUP = KMALLOC_NORMAL,
#endif
	KMALLOC_RANDOM_START = KMALLOC_NORMAL,
	KMALLOC_RANDOM_END = KMALLOC_RANDOM_START + RANDOM_KMALLOC_CACHES_NR,
#ifdef CONFIG_SLUB_TINY
	KMALLOC_RECLAIM = KMALLOC_NORMAL,
#else
	KMALLOC_RECLAIM,
#endif
#ifdef CONFIG_ZONE_DMA
	KMALLOC_DMA,
#endif
#ifdef CONFIG_MEMCG
	KMALLOC_CGROUP,
#endif
	NR_KMALLOC_TYPES
};

typedef struct kmem_cache * kmem_buckets[KMALLOC_SHIFT_HIGH + 1];

extern kmem_buckets kmalloc_caches[NR_KMALLOC_TYPES];

/*
 * Define gfp bits that should not be set for KMALLOC_NORMAL.
 */
#define KMALLOC_NOT_NORMAL_BITS					\
	(__GFP_RECLAIMABLE |					\
	(IS_ENABLED(CONFIG_ZONE_DMA)   ? __GFP_DMA : 0) |	\
	(IS_ENABLED(CONFIG_MEMCG) ? __GFP_ACCOUNT : 0))

extern unsigned long random_kmalloc_seed;

static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
{
	/*
	 * The most common case is KMALLOC_NORMAL, so test for it
	 * with a single branch for all the relevant flags.
	 */
	if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
#ifdef CONFIG_RANDOM_KMALLOC_CACHES
		/* RANDOM_KMALLOC_CACHES_NR (=15) copies + the KMALLOC_NORMAL */
		return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
						      ilog2(RANDOM_KMALLOC_CACHES_NR + 1));
#else
		return KMALLOC_NORMAL;
#endif

	/*
	 * At least one of the flags has to be set. Their priorities in
	 * decreasing order are:
	 *  1) __GFP_DMA
	 *  2) __GFP_RECLAIMABLE
	 *  3) __GFP_ACCOUNT
	 */
	if (IS_ENABLED(CONFIG_ZONE_DMA) && (flags & __GFP_DMA))
		return KMALLOC_DMA;
	if (!IS_ENABLED(CONFIG_MEMCG) || (flags & __GFP_RECLAIMABLE))
		return KMALLOC_RECLAIM;
	else
		return KMALLOC_CGROUP;
}

/*
 * Figure out which kmalloc slab an allocation of a certain size
 * belongs to.
 * 0 = zero alloc
 * 1 =  65 .. 96 bytes
 * 2 = 129 .. 192 bytes
 * n = 2^(n-1)+1 .. 2^n
 *
 * Note: __kmalloc_index() is compile-time optimized, and not runtime optimized;
 * typical usage is via kmalloc_index() and therefore evaluated at compile-time.
 * Callers where !size_is_constant should only be test modules, where runtime
 * overheads of __kmalloc_index() can be tolerated.  Also see kmalloc_slab().
 */
static __always_inline unsigned int __kmalloc_index(size_t size,
						    bool size_is_constant)
{
	if (!size)
		return 0;

	if (size <= KMALLOC_MIN_SIZE)
		return KMALLOC_SHIFT_LOW;

	if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
		return 1;
	if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
		return 2;
	if (size <=          8) return 3;
	if (size <=         16) return 4;
	if (size <=         32) return 5;
	if (size <=         64) return 6;
	if (size <=        128) return 7;
	if (size <=        256) return 8;
	if (size <=        512) return 9;
	if (size <=       1024) return 10;
	if (size <=   2 * 1024) return 11;
	if (size <=   4 * 1024) return 12;
	if (size <=   8 * 1024) return 13;
	if (size <=  16 * 1024) return 14;
	if (size <=  32 * 1024) return 15;
	if (size <=  64 * 1024) return 16;
	if (size <= 128 * 1024) return 17;
	if (size <= 256 * 1024) return 18;
	if (size <= 512 * 1024) return 19;
	if (size <= 1024 * 1024) return 20;
	if (size <=  2 * 1024 * 1024) return 21;

	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
	else
		BUG();

	/* Will never be reached. Needed because the compiler may complain */
	return -1;
}
static_assert(PAGE_SHIFT <= 20);
#define kmalloc_index(s) __kmalloc_index(s, true)

#include <linux/alloc_tag.h>

/**
 * kmem_cache_alloc - Allocate an object
 * @cachep: The cache to allocate from.
 * @flags: See kmalloc().
 *
 * Allocate an object from this cache.
 * See kmem_cache_zalloc() for a shortcut of adding __GFP_ZERO to flags.
 *
 * Return: pointer to the new object or %NULL in case of error
 */
void *kmem_cache_alloc_noprof(struct kmem_cache *cachep,
			      gfp_t flags) __assume_slab_alignment __malloc;
#define kmem_cache_alloc(...)		alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))

void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
				  gfp_t gfpflags) __assume_slab_alignment __malloc;
#define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))

/**
 * kmem_cache_charge - memcg charge an already allocated slab memory
 * @objp: address of the slab object to memcg charge
 * @gfpflags: describe the allocation context
 *
 * kmem_cache_charge allows charging a slab object to the current memcg,
 * primarily in cases where charging at allocation time might not be possible
 * because the target memcg is not known (i.e. softirq context)
 *
 * The objp should be a pointer returned by the slab allocator functions like
 * kmalloc (with __GFP_ACCOUNT in flags) or kmem_cache_alloc. The memcg charge
 * behavior can be controlled through the gfpflags parameter, which affects
 * how the necessary internal metadata can be allocated. Including
 * __GFP_NOFAIL denotes that overcharging is requested instead of failure, but
 * is not applied for the internal metadata allocation.
 *
 * There are several cases where it will return true even if the charging was
 * not done:
 *
 * 1. For !CONFIG_MEMCG or cgroup_disable=memory systems.
 * 2. Already charged slab objects.
 * 3. For slab objects from KMALLOC_NORMAL caches - allocated by kmalloc()
 *    without __GFP_ACCOUNT
 * 4. Allocating internal metadata has failed
 *
 * Return: true if charge was successful otherwise false.
 */
bool kmem_cache_charge(void *objp, gfp_t gfpflags);
void kmem_cache_free(struct kmem_cache *s, void *objp);

kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
				  unsigned int useroffset, unsigned int usersize,
				  void (*ctor)(void *));

/*
 * Bulk allocation and freeing operations. These are accelerated in an
 * allocator specific way to avoid taking locks repeatedly or building
 * metadata structures unnecessarily.
 *
 * Note that interrupts must be enabled when calling these functions.
 */
void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);

int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
#define kmem_cache_alloc_bulk(...)	alloc_hooks(kmem_cache_alloc_bulk_noprof(__VA_ARGS__))

static __always_inline void kfree_bulk(size_t size, void **p)
{
	kmem_cache_free_bulk(NULL, size, p);
}

void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
				   int node) __assume_slab_alignment __malloc;
#define kmem_cache_alloc_node(...)	alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))

/*
 * These macros allow declaring a kmem_buckets * parameter alongside size, which
 * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of call
 * sites don't have to pass NULL.
 */
#ifdef CONFIG_SLAB_BUCKETS
#define DECL_BUCKET_PARAMS(_size, _b)	size_t (_size), kmem_buckets *(_b)
#define PASS_BUCKET_PARAMS(_size, _b)	(_size), (_b)
#define PASS_BUCKET_PARAM(_b)		(_b)
#else
#define DECL_BUCKET_PARAMS(_size, _b)	size_t (_size)
#define PASS_BUCKET_PARAMS(_size, _b)	(_size)
#define PASS_BUCKET_PARAM(_b)		NULL
#endif

/*
 * The following functions are not to be used directly and are intended only
 * for internal use from kmalloc() and kmalloc_node()
 * with the exception of kunit tests
 */
void *__kmalloc_noprof(size_t size, gfp_t flags)
				__assume_kmalloc_alignment __alloc_size(1);

void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
				__assume_kmalloc_alignment __alloc_size(1);

void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
				__assume_kmalloc_alignment __alloc_size(3);

void *__kmalloc_cache_node_noprof(struct kmem_cache *s, gfp_t gfpflags,
				  int node, size_t size)
				__assume_kmalloc_alignment __alloc_size(4);

void *__kmalloc_large_noprof(size_t size, gfp_t flags)
				__assume_page_alignment __alloc_size(1);

void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
				__assume_page_alignment __alloc_size(1);

/**
 * kmalloc - allocate kernel memory
 * @size: how many bytes of memory are required.
 * @flags: describe the allocation context
 *
 * kmalloc is the normal method of allocating memory
 * for objects smaller than page size in the kernel.
 *
 * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
 * bytes. For @size of power of two bytes, the alignment is also guaranteed
 * to be at least the size. For other sizes, the alignment is guaranteed to
 * be at least the largest power-of-two divisor of @size.
 *
 * The @flags argument may be one of the GFP flags defined at
 * include/linux/gfp_types.h and described at
 * :ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>`
 *
 * The recommended usage of the @flags is described at
 * :ref:`Documentation/core-api/memory-allocation.rst <memory_allocation>`
 *
 * Below is a brief outline of the most useful GFP flags
 *
 * %GFP_KERNEL
 *	Allocate normal kernel ram. May sleep.
 *
 * %GFP_NOWAIT
 *	Allocation will not sleep.
 *
 * %GFP_ATOMIC
 *	Allocation will not sleep.  May use emergency pools.
 *
 * Also it is possible to set different flags by OR'ing
 * in one or more of the following additional @flags:
 *
 * %__GFP_ZERO
 *	Zero the allocated memory before returning. Also see kzalloc().
 *
 * %__GFP_HIGH
 *	This allocation has high priority and may use emergency pools.
 *
 * %__GFP_NOFAIL
 *	Indicate that this allocation is in no way allowed to fail
 *	(think twice before using).
 *
 * %__GFP_NORETRY
 *	If memory is not immediately available,
 *	then give up at once.
 *
 * %__GFP_NOWARN
 *	If allocation fails, don't issue any warnings.
 *
 * %__GFP_RETRY_MAYFAIL
 *	Try really hard to succeed the allocation but fail
 *	eventually.
 */
static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t flags)
{
	if (__builtin_constant_p(size) && size) {
		unsigned int index;

		if (size > KMALLOC_MAX_CACHE_SIZE)
			return __kmalloc_large_noprof(size, flags);

		index = kmalloc_index(size);
		return __kmalloc_cache_noprof(
				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
				flags, size);
	}
	return __kmalloc_noprof(size, flags);
}
#define kmalloc(...)			alloc_hooks(kmalloc_noprof(__VA_ARGS__))
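/*
 * Editor's sketch (not part of this header): what the constant-size fast
 * path above buys a caller. "struct demo_ctx" is hypothetical.
 */
struct demo_ctx { u64 id; char name[32]; };

static inline struct demo_ctx *demo_ctx_alloc(gfp_t gfp)
{
	/*
	 * sizeof() is a compile-time constant, so kmalloc() resolves the
	 * size class via kmalloc_index() at build time and calls
	 * __kmalloc_cache_noprof() directly; a runtime-variable size would
	 * go through __kmalloc_noprof() instead.
	 */
	return kmalloc(sizeof(struct demo_ctx), gfp | __GFP_ZERO);
}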
#define kmem_buckets_alloc(_b, _size, _flags)	\
	alloc_hooks(__kmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))

#define kmem_buckets_alloc_track_caller(_b, _size, _flags)	\
	alloc_hooks(__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, \
						       NUMA_NO_NODE, _RET_IP_))

static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gfp_t flags, int node)
{
	if (__builtin_constant_p(size) && size) {
		unsigned int index;

		if (size > KMALLOC_MAX_CACHE_SIZE)
			return __kmalloc_large_node_noprof(size, flags, node);

		index = kmalloc_index(size);
		return __kmalloc_cache_node_noprof(
				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
				flags, node, size);
	}
	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node);
}
#define kmalloc_node(...)			alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))

/**
 * kmalloc_array - allocate memory for an array.
 * @n: number of elements.
 * @size: element size.
 * @flags: the type of memory to allocate (see kmalloc).
 */
static inline __alloc_size(1, 2) void *kmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
{
	size_t bytes;

	if (unlikely(check_mul_overflow(n, size, &bytes)))
		return NULL;
	return kmalloc_noprof(bytes, flags);
}
#define kmalloc_array(...)			alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))

/**
 * krealloc_array - reallocate memory for an array.
 * @p: pointer to the memory chunk to reallocate
 * @new_n: new number of elements to alloc
 * @new_size: new size of a single member of the array
 * @flags: the type of memory to allocate (see kmalloc)
 *
 * If __GFP_ZERO logic is requested, callers must ensure that, starting with the
 * initial memory allocation, every subsequent call to this API for the same
 * memory allocation is flagged with __GFP_ZERO. Otherwise, it is possible that
 * __GFP_ZERO is not fully honored by this API.
 *
 * See krealloc_noprof() for further details.
 *
 * In any case, the contents of the object pointed to are preserved up to the
 * lesser of the new and old sizes.
 */
static inline __realloc_size(2, 3) void * __must_check krealloc_array_noprof(void *p,
								       size_t new_n,
								       size_t new_size,
								       gfp_t flags)
{
	size_t bytes;

	if (unlikely(check_mul_overflow(new_n, new_size, &bytes)))
		return NULL;

	return krealloc_noprof(p, bytes, flags);
}
#define krealloc_array(...)			alloc_hooks(krealloc_array_noprof(__VA_ARGS__))

/**
 * kcalloc - allocate memory for an array. The memory is set to zero.
 * @n: number of elements.
 * @size: element size.
 * @flags: the type of memory to allocate (see kmalloc).
 */
#define kcalloc(n, size, flags)		kmalloc_array(n, size, (flags) | __GFP_ZERO)

void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node,
					 unsigned long caller) __alloc_size(1);
#define kmalloc_node_track_caller_noprof(size, flags, node, caller) \
	__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, caller)
#define kmalloc_node_track_caller(...)		\
	alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))

/*
 * kmalloc_track_caller is a special version of kmalloc that records the
 * calling function of the routine calling it for slab leak tracking instead
 * of just the calling function (confusing, eh?).
 * It's useful when the call to kmalloc comes from a widely-used standard
 * allocator where we care about the real place the memory allocation
 * request comes from.
 */
#define kmalloc_track_caller(...)		kmalloc_node_track_caller(__VA_ARGS__, NUMA_NO_NODE)

#define kmalloc_track_caller_noprof(...)	\
		kmalloc_node_track_caller_noprof(__VA_ARGS__, NUMA_NO_NODE, _RET_IP_)
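/*
 * Illustrative sketch (not part of this header): a wrapper allocator using
 * kmalloc_track_caller() so that slab leak tracking attributes the memory to
 * the wrapper's caller rather than to the wrapper itself. "foo_alloc" is an
 * assumption for the example.
 *
 *	void *foo_alloc(size_t len)
 *	{
 *		return kmalloc_track_caller(len, GFP_KERNEL);
 *	}
 */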
static inline __alloc_size(1, 2) void *kmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags,
								 int node)
{
	size_t bytes;

	if (unlikely(check_mul_overflow(n, size, &bytes)))
		return NULL;
	if (__builtin_constant_p(n) && __builtin_constant_p(size))
		return kmalloc_node_noprof(bytes, flags, node);
	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(bytes, NULL), flags, node);
}
#define kmalloc_array_node(...)			alloc_hooks(kmalloc_array_node_noprof(__VA_ARGS__))

#define kcalloc_node(_n, _size, _flags, _node)	\
	kmalloc_array_node(_n, _size, (_flags) | __GFP_ZERO, _node)

/*
 * Shortcuts
 */
#define kmem_cache_zalloc(_k, _flags)		kmem_cache_alloc(_k, (_flags)|__GFP_ZERO)

/**
 * kzalloc - allocate memory. The memory is set to zero.
 * @size: how many bytes of memory are required.
 * @flags: the type of memory to allocate (see kmalloc).
 */
static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
{
	return kmalloc_noprof(size, flags | __GFP_ZERO);
}
#define kzalloc(...)				alloc_hooks(kzalloc_noprof(__VA_ARGS__))
#define kzalloc_node(_size, _flags, _node)	kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)

void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) __alloc_size(1);
#define kvmalloc_node_noprof(size, flags, node)	\
	__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node)
#define kvmalloc_node(...)			alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))

#define kvmalloc(_size, _flags)			kvmalloc_node(_size, _flags, NUMA_NO_NODE)
#define kvmalloc_noprof(_size, _flags)		kvmalloc_node_noprof(_size, _flags, NUMA_NO_NODE)
#define kvzalloc(_size, _flags)			kvmalloc(_size, (_flags)|__GFP_ZERO)

#define kvzalloc_node(_size, _flags, _node)	kvmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
#define kmem_buckets_valloc(_b, _size, _flags)	\
	alloc_hooks(__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))

static inline __alloc_size(1, 2) void *
kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node)
{
	size_t bytes;

	if (unlikely(check_mul_overflow(n, size, &bytes)))
		return NULL;

	return kvmalloc_node_noprof(bytes, flags, node);
}

#define kvmalloc_array_noprof(...)		kvmalloc_array_node_noprof(__VA_ARGS__, NUMA_NO_NODE)
#define kvcalloc_node_noprof(_n,_s,_f,_node)	kvmalloc_array_node_noprof(_n,_s,(_f)|__GFP_ZERO,_node)
#define kvcalloc_noprof(...)			kvcalloc_node_noprof(__VA_ARGS__, NUMA_NO_NODE)

#define kvmalloc_array(...)			alloc_hooks(kvmalloc_array_noprof(__VA_ARGS__))
#define kvcalloc_node(...)			alloc_hooks(kvcalloc_node_noprof(__VA_ARGS__))
#define kvcalloc(...)				alloc_hooks(kvcalloc_noprof(__VA_ARGS__))

void *kvrealloc_noprof(const void *p, size_t size, gfp_t flags) __realloc_size(2);
#define kvrealloc(...)				alloc_hooks(kvrealloc_noprof(__VA_ARGS__))

extern void kvfree(const void *addr);
DEFINE_FREE(kvfree, void *, if (!IS_ERR_OR_NULL(_T)) kvfree(_T))

extern void kvfree_sensitive(const void *addr, size_t len);

unsigned int kmem_cache_size(struct kmem_cache *s);

#ifndef CONFIG_KVFREE_RCU_BATCHED
static inline void kvfree_rcu_barrier(void)
{
	rcu_barrier();
}

static inline void kfree_rcu_scheduler_running(void) { }
#else
void kvfree_rcu_barrier(void);

void kfree_rcu_scheduler_running(void);
#endif
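/*
 * Illustrative sketch (not part of this header): kvmalloc() may fall back to
 * vmalloc for large requests, so the result must be released with kvfree(),
 * never plain kfree(). The DEFINE_FREE(kvfree, ...) clause above also enables
 * scope-based cleanup via __free() from <linux/cleanup.h>. The buffer size is
 * an assumption for the example.
 *
 *	char *buf __free(kvfree) = kvmalloc(SZ_1M, GFP_KERNEL);
 *
 *	if (!buf)
 *		return -ENOMEM;
 *	...		(buf is kvfree()d automatically on scope exit)
 */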
/**
 * kmalloc_size_roundup - Report allocation bucket size for the given size
 *
 * @size: Number of bytes to round up from.
 *
 * This returns the number of bytes that would be available in a kmalloc()
 * allocation of @size bytes. For example, a 126 byte request would be
 * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
 * for the general-purpose kmalloc()-based allocations, and is not for the
 * pre-sized kmem_cache_alloc()-based allocations.)
 *
 * Use this to kmalloc() the full bucket size ahead of time instead of using
 * ksize() to query the size after an allocation.
 */
size_t kmalloc_size_roundup(size_t size);

void __init kmem_cache_init_late(void);
void __init kvfree_rcu_init(void);

#endif	/* _LINUX_SLAB_H */