2 252 260 189 244 218 123 189 5 5 5 5 123 232 228 231 231 227 214 228 227 1 226 220 220 226 227 227 227 227 227 222 221 221 226 10 221 220 224 227 226 219 212 219 217 219 210 219 219 218 232 218 232 215 222 12 11 7 10 208 209 224 224 119 186 208 208 208 209 209 208 209 209 213 212 212 3 212 208 208 209 209 209 209 209 209 208 45 46 208 209 47 209 3 212 2 1 1 74 3 3 2 2 2 2 211 210 210 210 2 209 210 1 209 209 210 1 209 4 4 41 42 4 4 2 3 1 4 4 4 4 5 1 1 1 1 1 185 185 185 186 206 200 201 54 2 192 191 122 120 118 122 118 119 32 119 39 38 69 1 123 123 123 123 123 123 98 123 123 1 1 1 1 1 1 2 189 189 188 190 189 189 188 190 1 189 189 59 59 59 59 59 52 1 51 6 59 7 6 60 59 1 58 59 59 59 59 59 59 59 7 7 7 7 7 7 7 59 60 221 220 123 2 1 1 1 1 1 1 221 220 217 218 218 218 219 219 217 217 218 217 217 217 218 217 211 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 62 64 64 64 64 61 64 64 61 64 64 63 128 125 10 128 123 123 123 123 26 23 13 23 23 23 23 3 3 3 26 26 26 3 3 26 2 2 2 2 47 2 25 3 12 12 12 12 1 14 13 13 3 42 43 42 43 4 4 4 2 3 2 1 3 2 1 3 5 3 2 1 1 3 2 2 2 1 3 2 1 2 2 76 75 73 75 2 1 3 2 4 1 4 1 1 3 1 5 2 2 75 76 229 12 12 209 209 189 121 214 43 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 3 3 2 2 1 2 2 5 5 5 8 3 5 7 6 6 6 5 5 5 8 5 4 4 4 4 2 2 2 7 6 3 3 6 7 4 3 3 4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 /* * Digital Audio (PCM) abstract layer * Copyright (c) by Jaroslav Kysela <perex@perex.cz> * * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * */ #include <linux/mm.h> #include <linux/module.h> #include <linux/file.h> #include <linux/slab.h> #include <linux/sched/signal.h> #include <linux/time.h> #include <linux/pm_qos.h> #include <linux/io.h> #include <linux/dma-mapping.h> #include <sound/core.h> #include <sound/control.h> #include <sound/info.h> #include <sound/pcm.h> #include <sound/pcm_params.h> #include <sound/timer.h> #include <sound/minors.h> #include <linux/uio.h> #include <linux/delay.h> #include "pcm_local.h" #ifdef CONFIG_SND_DEBUG #define CREATE_TRACE_POINTS #include "pcm_param_trace.h" #else #define trace_hw_mask_param_enabled() 0 #define trace_hw_interval_param_enabled() 0 #define trace_hw_mask_param(substream, type, index, prev, curr) #define trace_hw_interval_param(substream, type, index, prev, curr) #endif /* * Compatibility */ struct snd_pcm_hw_params_old { unsigned int flags; unsigned int masks[SNDRV_PCM_HW_PARAM_SUBFORMAT - SNDRV_PCM_HW_PARAM_ACCESS + 1]; struct snd_interval intervals[SNDRV_PCM_HW_PARAM_TICK_TIME - SNDRV_PCM_HW_PARAM_SAMPLE_BITS + 1]; unsigned int rmask; unsigned int cmask; unsigned int info; unsigned int msbits; unsigned int rate_num; unsigned int rate_den; snd_pcm_uframes_t fifo_size; unsigned char reserved[64]; }; #ifdef CONFIG_SND_SUPPORT_OLD_API #define SNDRV_PCM_IOCTL_HW_REFINE_OLD _IOWR('A', 0x10, struct snd_pcm_hw_params_old) #define SNDRV_PCM_IOCTL_HW_PARAMS_OLD _IOWR('A', 0x11, struct snd_pcm_hw_params_old) static int snd_pcm_hw_refine_old_user(struct snd_pcm_substream *substream, struct snd_pcm_hw_params_old __user * _oparams); static int snd_pcm_hw_params_old_user(struct snd_pcm_substream *substream, struct snd_pcm_hw_params_old __user * _oparams); #endif static int snd_pcm_open(struct file *file, struct snd_pcm *pcm, int stream); /* * */ static DEFINE_RWLOCK(snd_pcm_link_rwlock); static DECLARE_RWSEM(snd_pcm_link_rwsem); /* Writer in rwsem may block readers even during its waiting in queue, * and this may lead to a deadlock when the code path takes read sem * twice (e.g. one in snd_pcm_action_nonatomic() and another in * snd_pcm_stream_lock()). As a (suboptimal) workaround, let writer to * sleep until all the readers are completed without blocking by writer. */ static inline void down_write_nonfifo(struct rw_semaphore *lock) { while (!down_write_trylock(lock)) msleep(1); } #define PCM_LOCK_DEFAULT 0 #define PCM_LOCK_IRQ 1 #define PCM_LOCK_IRQSAVE 2 static unsigned long __snd_pcm_stream_lock_mode(struct snd_pcm_substream *substream, unsigned int mode) { unsigned long flags = 0; if (substream->pcm->nonatomic) { down_read_nested(&snd_pcm_link_rwsem, SINGLE_DEPTH_NESTING); mutex_lock(&substream->self_group.mutex); } else { switch (mode) { case PCM_LOCK_DEFAULT: read_lock(&snd_pcm_link_rwlock); break; case PCM_LOCK_IRQ: read_lock_irq(&snd_pcm_link_rwlock); break; case PCM_LOCK_IRQSAVE: read_lock_irqsave(&snd_pcm_link_rwlock, flags); break; } spin_lock(&substream->self_group.lock); } return flags; } static void __snd_pcm_stream_unlock_mode(struct snd_pcm_substream *substream, unsigned int mode, unsigned long flags) { if (substream->pcm->nonatomic) { mutex_unlock(&substream->self_group.mutex); up_read(&snd_pcm_link_rwsem); } else { spin_unlock(&substream->self_group.lock); switch (mode) { case PCM_LOCK_DEFAULT: read_unlock(&snd_pcm_link_rwlock); break; case PCM_LOCK_IRQ: read_unlock_irq(&snd_pcm_link_rwlock); break; case PCM_LOCK_IRQSAVE: read_unlock_irqrestore(&snd_pcm_link_rwlock, flags); break; } } } /** * snd_pcm_stream_lock - Lock the PCM stream * @substream: PCM substream * * This locks the PCM stream's spinlock or mutex depending on the nonatomic * flag of the given substream. This also takes the global link rw lock * (or rw sem), too, for avoiding the race with linked streams. */ void snd_pcm_stream_lock(struct snd_pcm_substream *substream) { __snd_pcm_stream_lock_mode(substream, PCM_LOCK_DEFAULT); } EXPORT_SYMBOL_GPL(snd_pcm_stream_lock); /** * snd_pcm_stream_lock - Unlock the PCM stream * @substream: PCM substream * * This unlocks the PCM stream that has been locked via snd_pcm_stream_lock(). */ void snd_pcm_stream_unlock(struct snd_pcm_substream *substream) { __snd_pcm_stream_unlock_mode(substream, PCM_LOCK_DEFAULT, 0); } EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock); /** * snd_pcm_stream_lock_irq - Lock the PCM stream * @substream: PCM substream * * This locks the PCM stream like snd_pcm_stream_lock() and disables the local * IRQ (only when nonatomic is false). In nonatomic case, this is identical * as snd_pcm_stream_lock(). */ void snd_pcm_stream_lock_irq(struct snd_pcm_substream *substream) { __snd_pcm_stream_lock_mode(substream, PCM_LOCK_IRQ); } EXPORT_SYMBOL_GPL(snd_pcm_stream_lock_irq); /** * snd_pcm_stream_unlock_irq - Unlock the PCM stream * @substream: PCM substream * * This is a counter-part of snd_pcm_stream_lock_irq(). */ void snd_pcm_stream_unlock_irq(struct snd_pcm_substream *substream) { __snd_pcm_stream_unlock_mode(substream, PCM_LOCK_IRQ, 0); } EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock_irq); unsigned long _snd_pcm_stream_lock_irqsave(struct snd_pcm_substream *substream) { return __snd_pcm_stream_lock_mode(substream, PCM_LOCK_IRQSAVE); } EXPORT_SYMBOL_GPL(_snd_pcm_stream_lock_irqsave); /** * snd_pcm_stream_unlock_irqrestore - Unlock the PCM stream * @substream: PCM substream * @flags: irq flags * * This is a counter-part of snd_pcm_stream_lock_irqsave(). */ void snd_pcm_stream_unlock_irqrestore(struct snd_pcm_substream *substream, unsigned long flags) { __snd_pcm_stream_unlock_mode(substream, PCM_LOCK_IRQSAVE, flags); } EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock_irqrestore); int snd_pcm_info(struct snd_pcm_substream *substream, struct snd_pcm_info *info) { struct snd_pcm *pcm = substream->pcm; struct snd_pcm_str *pstr = substream->pstr; memset(info, 0, sizeof(*info)); info->card = pcm->card->number; info->device = pcm->device; info->stream = substream->stream; info->subdevice = substream->number; strlcpy(info->id, pcm->id, sizeof(info->id)); strlcpy(info->name, pcm->name, sizeof(info->name)); info->dev_class = pcm->dev_class; info->dev_subclass = pcm->dev_subclass; info->subdevices_count = pstr->substream_count; info->subdevices_avail = pstr->substream_count - pstr->substream_opened; strlcpy(info->subname, substream->name, sizeof(info->subname)); return 0; } int snd_pcm_info_user(struct snd_pcm_substream *substream, struct snd_pcm_info __user * _info) { struct snd_pcm_info *info; int err; info = kmalloc(sizeof(*info), GFP_KERNEL); if (! info) return -ENOMEM; err = snd_pcm_info(substream, info); if (err >= 0) { if (copy_to_user(_info, info, sizeof(*info))) err = -EFAULT; } kfree(info); return err; } static bool hw_support_mmap(struct snd_pcm_substream *substream) { if (!(substream->runtime->hw.info & SNDRV_PCM_INFO_MMAP)) return false; /* architecture supports dma_mmap_coherent()? */ #if defined(CONFIG_ARCH_NO_COHERENT_DMA_MMAP) || !defined(CONFIG_HAS_DMA) if (!substream->ops->mmap && substream->dma_buffer.dev.type == SNDRV_DMA_TYPE_DEV) return false; #endif return true; } static int constrain_mask_params(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params) { struct snd_pcm_hw_constraints *constrs = &substream->runtime->hw_constraints; struct snd_mask *m; unsigned int k; struct snd_mask old_mask; int changed; for (k = SNDRV_PCM_HW_PARAM_FIRST_MASK; k <= SNDRV_PCM_HW_PARAM_LAST_MASK; k++) { m = hw_param_mask(params, k); if (snd_mask_empty(m)) return -EINVAL; /* This parameter is not requested to change by a caller. */ if (!(params->rmask & (1 << k))) continue; if (trace_hw_mask_param_enabled()) old_mask = *m; changed = snd_mask_refine(m, constrs_mask(constrs, k)); if (changed < 0) return changed; if (changed == 0) continue; /* Set corresponding flag so that the caller gets it. */ trace_hw_mask_param(substream, k, 0, &old_mask, m); params->cmask |= 1 << k; } return 0; } static int constrain_interval_params(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params) { struct snd_pcm_hw_constraints *constrs = &substream->runtime->hw_constraints; struct snd_interval *i; unsigned int k; struct snd_interval old_interval; int changed; for (k = SNDRV_PCM_HW_PARAM_FIRST_INTERVAL; k <= SNDRV_PCM_HW_PARAM_LAST_INTERVAL; k++) { i = hw_param_interval(params, k); if (snd_interval_empty(i)) return -EINVAL; /* This parameter is not requested to change by a caller. */ if (!(params->rmask & (1 << k))) continue; if (trace_hw_interval_param_enabled()) old_interval = *i; changed = snd_interval_refine(i, constrs_interval(constrs, k)); if (changed < 0) return changed; if (changed == 0) continue; /* Set corresponding flag so that the caller gets it. */ trace_hw_interval_param(substream, k, 0, &old_interval, i); params->cmask |= 1 << k; } return 0; } static int constrain_params_by_rules(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params) { struct snd_pcm_hw_constraints *constrs = &substream->runtime->hw_constraints; unsigned int k; unsigned int *rstamps; unsigned int vstamps[SNDRV_PCM_HW_PARAM_LAST_INTERVAL + 1]; unsigned int stamp; struct snd_pcm_hw_rule *r; unsigned int d; struct snd_mask old_mask; struct snd_interval old_interval; bool again; int changed, err = 0; /* * Each application of rule has own sequence number. * * Each member of 'rstamps' array represents the sequence number of * recent application of corresponding rule. */ rstamps = kcalloc(constrs->rules_num, sizeof(unsigned int), GFP_KERNEL); if (!rstamps) return -ENOMEM; /* * Each member of 'vstamps' array represents the sequence number of * recent application of rule in which corresponding parameters were * changed. * * In initial state, elements corresponding to parameters requested by * a caller is 1. For unrequested parameters, corresponding members * have 0 so that the parameters are never changed anymore. */ for (k = 0; k <= SNDRV_PCM_HW_PARAM_LAST_INTERVAL; k++) vstamps[k] = (params->rmask & (1 << k)) ? 1 : 0; /* Due to the above design, actual sequence number starts at 2. */ stamp = 2; retry: /* Apply all rules in order. */ again = false; for (k = 0; k < constrs->rules_num; k++) { r = &constrs->rules[k]; /* * Check condition bits of this rule. When the rule has * some condition bits, parameter without the bits is * never processed. SNDRV_PCM_HW_PARAMS_NO_PERIOD_WAKEUP * is an example of the condition bits. */ if (r->cond && !(r->cond & params->flags)) continue; /* * The 'deps' array includes maximum three dependencies * to SNDRV_PCM_HW_PARAM_XXXs for this rule. The fourth * member of this array is a sentinel and should be * negative value. * * This rule should be processed in this time when dependent * parameters were changed at former applications of the other * rules. */ for (d = 0; r->deps[d] >= 0; d++) { if (vstamps[r->deps[d]] > rstamps[k]) break; } if (r->deps[d] < 0) continue; if (trace_hw_mask_param_enabled()) { if (hw_is_mask(r->var)) old_mask = *hw_param_mask(params, r->var); } if (trace_hw_interval_param_enabled()) { if (hw_is_interval(r->var)) old_interval = *hw_param_interval(params, r->var); } changed = r->func(params, r); if (changed < 0) { err = changed; goto out; } /* * When the parameter is changed, notify it to the caller * by corresponding returned bit, then preparing for next * iteration. */ if (changed && r->var >= 0) { if (hw_is_mask(r->var)) { trace_hw_mask_param(substream, r->var, k + 1, &old_mask, hw_param_mask(params, r->var)); } if (hw_is_interval(r->var)) { trace_hw_interval_param(substream, r->var, k + 1, &old_interval, hw_param_interval(params, r->var)); } params->cmask |= (1 << r->var); vstamps[r->var] = stamp; again = true; } rstamps[k] = stamp++; } /* Iterate to evaluate all rules till no parameters are changed. */ if (again) goto retry; out: kfree(rstamps); return err; } static int fixup_unreferenced_params(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params) { const struct snd_interval *i; const struct snd_mask *m; int err; if (!params->msbits) { i = hw_param_interval_c(params, SNDRV_PCM_HW_PARAM_SAMPLE_BITS); if (snd_interval_single(i)) params->msbits = snd_interval_value(i); } if (!params->rate_den) { i = hw_param_interval_c(params, SNDRV_PCM_HW_PARAM_RATE); if (snd_interval_single(i)) { params->rate_num = snd_interval_value(i); params->rate_den = 1; } } if (!params->fifo_size) { m = hw_param_mask_c(params, SNDRV_PCM_HW_PARAM_FORMAT); i = hw_param_interval_c(params, SNDRV_PCM_HW_PARAM_CHANNELS); if (snd_mask_single(m) && snd_interval_single(i)) { err = substream->ops->ioctl(substream, SNDRV_PCM_IOCTL1_FIFO_SIZE, params); if (err < 0) return err; } } if (!params->info) { params->info = substream->runtime->hw.info; params->info &= ~(SNDRV_PCM_INFO_FIFO_IN_FRAMES | SNDRV_PCM_INFO_DRAIN_TRIGGER); if (!hw_support_mmap(substream)) params->info &= ~(SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_MMAP_VALID); } return 0; } int snd_pcm_hw_refine(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params) { int err; params->info = 0; params->fifo_size = 0; if (params->rmask & (1 << SNDRV_PCM_HW_PARAM_SAMPLE_BITS)) params->msbits = 0; if (params->rmask & (1 << SNDRV_PCM_HW_PARAM_RATE)) { params->rate_num = 0; params->rate_den = 0; } err = constrain_mask_params(substream, params); if (err < 0) return err; err = constrain_interval_params(substream, params); if (err < 0) return err; err = constrain_params_by_rules(substream, params); if (err < 0) return err; params->rmask = 0; return 0; } EXPORT_SYMBOL(snd_pcm_hw_refine); static int snd_pcm_hw_refine_user(struct snd_pcm_substream *substream, struct snd_pcm_hw_params __user * _params) { struct snd_pcm_hw_params *params; int err; params = memdup_user(_params, sizeof(*params)); if (IS_ERR(params)) return PTR_ERR(params); err = snd_pcm_hw_refine(substream, params); if (err < 0) goto end; err = fixup_unreferenced_params(substream, params); if (err < 0) goto end; if (copy_to_user(_params, params, sizeof(*params))) err = -EFAULT; end: kfree(params); return err; } static int period_to_usecs(struct snd_pcm_runtime *runtime) { int usecs; if (! runtime->rate) return -1; /* invalid */ /* take 75% of period time as the deadline */ usecs = (750000 / runtime->rate) * runtime->period_size; usecs += ((750000 % runtime->rate) * runtime->period_size) / runtime->rate; return usecs; } static void snd_pcm_set_state(struct snd_pcm_substream *substream, int state) { snd_pcm_stream_lock_irq(substream); if (substream->runtime->status->state != SNDRV_PCM_STATE_DISCONNECTED) substream->runtime->status->state = state; snd_pcm_stream_unlock_irq(substream); } static inline void snd_pcm_timer_notify(struct snd_pcm_substream *substream, int event) { #ifdef CONFIG_SND_PCM_TIMER if (substream->timer) snd_timer_notify(substream->timer, event, &substream->runtime->trigger_tstamp); #endif } /** * snd_pcm_hw_param_choose - choose a configuration defined by @params * @pcm: PCM instance * @params: the hw_params instance * * Choose one configuration from configuration space defined by @params. * The configuration chosen is that obtained fixing in this order: * first access, first format, first subformat, min channels, * min rate, min period time, max buffer size, min tick time * * Return: Zero if successful, or a negative error code on failure. */ static int snd_pcm_hw_params_choose(struct snd_pcm_substream *pcm, struct snd_pcm_hw_params *params) { static const int vars[] = { SNDRV_PCM_HW_PARAM_ACCESS, SNDRV_PCM_HW_PARAM_FORMAT, SNDRV_PCM_HW_PARAM_SUBFORMAT, SNDRV_PCM_HW_PARAM_CHANNELS, SNDRV_PCM_HW_PARAM_RATE, SNDRV_PCM_HW_PARAM_PERIOD_TIME, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, SNDRV_PCM_HW_PARAM_TICK_TIME, -1 }; const int *v; struct snd_mask old_mask; struct snd_interval old_interval; int changed; for (v = vars; *v != -1; v++) { /* Keep old parameter to trace. */ if (trace_hw_mask_param_enabled()) { if (hw_is_mask(*v)) old_mask = *hw_param_mask(params, *v); } if (trace_hw_interval_param_enabled()) { if (hw_is_interval(*v)) old_interval = *hw_param_interval(params, *v); } if (*v != SNDRV_PCM_HW_PARAM_BUFFER_SIZE) changed = snd_pcm_hw_param_first(pcm, params, *v, NULL); else changed = snd_pcm_hw_param_last(pcm, params, *v, NULL); if (changed < 0) return changed; if (changed == 0) continue; /* Trace the changed parameter. */ if (hw_is_mask(*v)) { trace_hw_mask_param(pcm, *v, 0, &old_mask, hw_param_mask(params, *v)); } if (hw_is_interval(*v)) { trace_hw_interval_param(pcm, *v, 0, &old_interval, hw_param_interval(params, *v)); } } return 0; } /* acquire buffer_mutex; if it's in r/w operation, return -EBUSY, otherwise * block the further r/w operations */ static int snd_pcm_buffer_access_lock(struct snd_pcm_runtime *runtime) { if (!atomic_dec_unless_positive(&runtime->buffer_accessing)) return -EBUSY; mutex_lock(&runtime->buffer_mutex); return 0; /* keep buffer_mutex, unlocked by below */ } /* release buffer_mutex and clear r/w access flag */ static void snd_pcm_buffer_access_unlock(struct snd_pcm_runtime *runtime) { mutex_unlock(&runtime->buffer_mutex); atomic_inc(&runtime->buffer_accessing); } #if IS_ENABLED(CONFIG_SND_PCM_OSS) #define is_oss_stream(substream) ((substream)->oss.oss) #else #define is_oss_stream(substream) false #endif static int snd_pcm_hw_params(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params) { struct snd_pcm_runtime *runtime; int err, usecs; unsigned int bits; snd_pcm_uframes_t frames; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; runtime = substream->runtime; err = snd_pcm_buffer_access_lock(runtime); if (err < 0) return err; snd_pcm_stream_lock_irq(substream); switch (runtime->status->state) { case SNDRV_PCM_STATE_OPEN: case SNDRV_PCM_STATE_SETUP: case SNDRV_PCM_STATE_PREPARED: if (!is_oss_stream(substream) && atomic_read(&substream->mmap_count)) err = -EBADFD; break; default: err = -EBADFD; break; } snd_pcm_stream_unlock_irq(substream); if (err) goto unlock; params->rmask = ~0U; err = snd_pcm_hw_refine(substream, params); if (err < 0) goto _error; err = snd_pcm_hw_params_choose(substream, params); if (err < 0) goto _error; err = fixup_unreferenced_params(substream, params); if (err < 0) goto _error; if (substream->ops->hw_params != NULL) { err = substream->ops->hw_params(substream, params); if (err < 0) goto _error; } runtime->access = params_access(params); runtime->format = params_format(params); runtime->subformat = params_subformat(params); runtime->channels = params_channels(params); runtime->rate = params_rate(params); runtime->period_size = params_period_size(params); runtime->periods = params_periods(params); runtime->buffer_size = params_buffer_size(params); runtime->info = params->info; runtime->rate_num = params->rate_num; runtime->rate_den = params->rate_den; runtime->no_period_wakeup = (params->info & SNDRV_PCM_INFO_NO_PERIOD_WAKEUP) && (params->flags & SNDRV_PCM_HW_PARAMS_NO_PERIOD_WAKEUP); bits = snd_pcm_format_physical_width(runtime->format); runtime->sample_bits = bits; bits *= runtime->channels; runtime->frame_bits = bits; frames = 1; while (bits % 8 != 0) { bits *= 2; frames *= 2; } runtime->byte_align = bits / 8; runtime->min_align = frames; /* Default sw params */ runtime->tstamp_mode = SNDRV_PCM_TSTAMP_NONE; runtime->period_step = 1; runtime->control->avail_min = runtime->period_size; runtime->start_threshold = 1; runtime->stop_threshold = runtime->buffer_size; runtime->silence_threshold = 0; runtime->silence_size = 0; runtime->boundary = runtime->buffer_size; while (runtime->boundary * 2 <= LONG_MAX - runtime->buffer_size) runtime->boundary *= 2; /* clear the buffer for avoiding possible kernel info leaks */ if (runtime->dma_area && !substream->ops->copy_user) { size_t size = runtime->dma_bytes; if (runtime->info & SNDRV_PCM_INFO_MMAP) size = PAGE_ALIGN(size); memset(runtime->dma_area, 0, size); } snd_pcm_timer_resolution_change(substream); snd_pcm_set_state(substream, SNDRV_PCM_STATE_SETUP); if (pm_qos_request_active(&substream->latency_pm_qos_req)) pm_qos_remove_request(&substream->latency_pm_qos_req); if ((usecs = period_to_usecs(runtime)) >= 0) pm_qos_add_request(&substream->latency_pm_qos_req, PM_QOS_CPU_DMA_LATENCY, usecs); err = 0; _error: if (err) { /* hardware might be unusable from this time, * so we force application to retry to set * the correct hardware parameter settings */ snd_pcm_set_state(substream, SNDRV_PCM_STATE_OPEN); if (substream->ops->hw_free != NULL) substream->ops->hw_free(substream); } unlock: snd_pcm_buffer_access_unlock(runtime); return err; } static int snd_pcm_hw_params_user(struct snd_pcm_substream *substream, struct snd_pcm_hw_params __user * _params) { struct snd_pcm_hw_params *params; int err; params = memdup_user(_params, sizeof(*params)); if (IS_ERR(params)) return PTR_ERR(params); err = snd_pcm_hw_params(substream, params); if (err < 0) goto end; if (copy_to_user(_params, params, sizeof(*params))) err = -EFAULT; end: kfree(params); return err; } static int snd_pcm_hw_free(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime; int result = 0; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; runtime = substream->runtime; result = snd_pcm_buffer_access_lock(runtime); if (result < 0) return result; snd_pcm_stream_lock_irq(substream); switch (runtime->status->state) { case SNDRV_PCM_STATE_SETUP: case SNDRV_PCM_STATE_PREPARED: if (atomic_read(&substream->mmap_count)) result = -EBADFD; break; default: result = -EBADFD; break; } snd_pcm_stream_unlock_irq(substream); if (result) goto unlock; if (substream->ops->hw_free) result = substream->ops->hw_free(substream); snd_pcm_set_state(substream, SNDRV_PCM_STATE_OPEN); pm_qos_remove_request(&substream->latency_pm_qos_req); unlock: snd_pcm_buffer_access_unlock(runtime); return result; } static int snd_pcm_sw_params(struct snd_pcm_substream *substream, struct snd_pcm_sw_params *params) { struct snd_pcm_runtime *runtime; int err; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; runtime = substream->runtime; snd_pcm_stream_lock_irq(substream); if (runtime->status->state == SNDRV_PCM_STATE_OPEN) { snd_pcm_stream_unlock_irq(substream); return -EBADFD; } snd_pcm_stream_unlock_irq(substream); if (params->tstamp_mode < 0 || params->tstamp_mode > SNDRV_PCM_TSTAMP_LAST) return -EINVAL; if (params->proto >= SNDRV_PROTOCOL_VERSION(2, 0, 12) && params->tstamp_type > SNDRV_PCM_TSTAMP_TYPE_LAST) return -EINVAL; if (params->avail_min == 0) return -EINVAL; if (params->silence_size >= runtime->boundary) { if (params->silence_threshold != 0) return -EINVAL; } else { if (params->silence_size > params->silence_threshold) return -EINVAL; if (params->silence_threshold > runtime->buffer_size) return -EINVAL; } err = 0; snd_pcm_stream_lock_irq(substream); runtime->tstamp_mode = params->tstamp_mode; if (params->proto >= SNDRV_PROTOCOL_VERSION(2, 0, 12)) runtime->tstamp_type = params->tstamp_type; runtime->period_step = params->period_step; runtime->control->avail_min = params->avail_min; runtime->start_threshold = params->start_threshold; runtime->stop_threshold = params->stop_threshold; runtime->silence_threshold = params->silence_threshold; runtime->silence_size = params->silence_size; params->boundary = runtime->boundary; if (snd_pcm_running(substream)) { if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK && runtime->silence_size > 0) snd_pcm_playback_silence(substream, ULONG_MAX); err = snd_pcm_update_state(substream, runtime); } snd_pcm_stream_unlock_irq(substream); return err; } static int snd_pcm_sw_params_user(struct snd_pcm_substream *substream, struct snd_pcm_sw_params __user * _params) { struct snd_pcm_sw_params params; int err; if (copy_from_user(&params, _params, sizeof(params))) return -EFAULT; err = snd_pcm_sw_params(substream, &params); if (copy_to_user(_params, &params, sizeof(params))) return -EFAULT; return err; } static inline snd_pcm_uframes_t snd_pcm_calc_delay(struct snd_pcm_substream *substream) { snd_pcm_uframes_t delay; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) delay = snd_pcm_playback_hw_avail(substream->runtime); else delay = snd_pcm_capture_avail(substream->runtime); return delay + substream->runtime->delay; } int snd_pcm_status(struct snd_pcm_substream *substream, struct snd_pcm_status *status) { struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_stream_lock_irq(substream); snd_pcm_unpack_audio_tstamp_config(status->audio_tstamp_data, &runtime->audio_tstamp_config); /* backwards compatible behavior */ if (runtime->audio_tstamp_config.type_requested == SNDRV_PCM_AUDIO_TSTAMP_TYPE_COMPAT) { if (runtime->hw.info & SNDRV_PCM_INFO_HAS_WALL_CLOCK) runtime->audio_tstamp_config.type_requested = SNDRV_PCM_AUDIO_TSTAMP_TYPE_LINK; else runtime->audio_tstamp_config.type_requested = SNDRV_PCM_AUDIO_TSTAMP_TYPE_DEFAULT; runtime->audio_tstamp_report.valid = 0; } else runtime->audio_tstamp_report.valid = 1; status->state = runtime->status->state; status->suspended_state = runtime->status->suspended_state; if (status->state == SNDRV_PCM_STATE_OPEN) goto _end; status->trigger_tstamp = runtime->trigger_tstamp; if (snd_pcm_running(substream)) { snd_pcm_update_hw_ptr(substream); if (runtime->tstamp_mode == SNDRV_PCM_TSTAMP_ENABLE) { status->tstamp = runtime->status->tstamp; status->driver_tstamp = runtime->driver_tstamp; status->audio_tstamp = runtime->status->audio_tstamp; if (runtime->audio_tstamp_report.valid == 1) /* backwards compatibility, no report provided in COMPAT mode */ snd_pcm_pack_audio_tstamp_report(&status->audio_tstamp_data, &status->audio_tstamp_accuracy, &runtime->audio_tstamp_report); goto _tstamp_end; } } else { /* get tstamp only in fallback mode and only if enabled */ if (runtime->tstamp_mode == SNDRV_PCM_TSTAMP_ENABLE) snd_pcm_gettime(runtime, &status->tstamp); } _tstamp_end: status->appl_ptr = runtime->control->appl_ptr; status->hw_ptr = runtime->status->hw_ptr; status->avail = snd_pcm_avail(substream); status->delay = snd_pcm_running(substream) ? snd_pcm_calc_delay(substream) : 0; status->avail_max = runtime->avail_max; status->overrange = runtime->overrange; runtime->avail_max = 0; runtime->overrange = 0; _end: snd_pcm_stream_unlock_irq(substream); return 0; } static int snd_pcm_status_user(struct snd_pcm_substream *substream, struct snd_pcm_status __user * _status, bool ext) { struct snd_pcm_status status; int res; memset(&status, 0, sizeof(status)); /* * with extension, parameters are read/write, * get audio_tstamp_data from user, * ignore rest of status structure */ if (ext && get_user(status.audio_tstamp_data, (u32 __user *)(&_status->audio_tstamp_data))) return -EFAULT; res = snd_pcm_status(substream, &status); if (res < 0) return res; if (copy_to_user(_status, &status, sizeof(status))) return -EFAULT; return 0; } static int snd_pcm_channel_info(struct snd_pcm_substream *substream, struct snd_pcm_channel_info * info) { struct snd_pcm_runtime *runtime; unsigned int channel; channel = info->channel; runtime = substream->runtime; snd_pcm_stream_lock_irq(substream); if (runtime->status->state == SNDRV_PCM_STATE_OPEN) { snd_pcm_stream_unlock_irq(substream); return -EBADFD; } snd_pcm_stream_unlock_irq(substream); if (channel >= runtime->channels) return -EINVAL; memset(info, 0, sizeof(*info)); info->channel = channel; return substream->ops->ioctl(substream, SNDRV_PCM_IOCTL1_CHANNEL_INFO, info); } static int snd_pcm_channel_info_user(struct snd_pcm_substream *substream, struct snd_pcm_channel_info __user * _info) { struct snd_pcm_channel_info info; int res; if (copy_from_user(&info, _info, sizeof(info))) return -EFAULT; res = snd_pcm_channel_info(substream, &info); if (res < 0) return res; if (copy_to_user(_info, &info, sizeof(info))) return -EFAULT; return 0; } static void snd_pcm_trigger_tstamp(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; if (runtime->trigger_master == NULL) return; if (runtime->trigger_master == substream) { if (!runtime->trigger_tstamp_latched) snd_pcm_gettime(runtime, &runtime->trigger_tstamp); } else { snd_pcm_trigger_tstamp(runtime->trigger_master); runtime->trigger_tstamp = runtime->trigger_master->runtime->trigger_tstamp; } runtime->trigger_master = NULL; } struct action_ops { int (*pre_action)(struct snd_pcm_substream *substream, int state); int (*do_action)(struct snd_pcm_substream *substream, int state); void (*undo_action)(struct snd_pcm_substream *substream, int state); void (*post_action)(struct snd_pcm_substream *substream, int state); }; /* * this functions is core for handling of linked stream * Note: the stream state might be changed also on failure * Note2: call with calling stream lock + link lock */ static int snd_pcm_action_group(const struct action_ops *ops, struct snd_pcm_substream *substream, int state, int stream_lock) { struct snd_pcm_substream *s = NULL; struct snd_pcm_substream *s1; int res = 0, depth = 1; snd_pcm_group_for_each_entry(s, substream) { if (s != substream) { if (!stream_lock) mutex_lock_nested(&s->runtime->buffer_mutex, depth); else if (s->pcm->nonatomic) mutex_lock_nested(&s->self_group.mutex, depth); else spin_lock_nested(&s->self_group.lock, depth); depth++; } res = ops->pre_action(s, state); if (res < 0) goto _unlock; } snd_pcm_group_for_each_entry(s, substream) { res = ops->do_action(s, state); if (res < 0) { if (ops->undo_action) { snd_pcm_group_for_each_entry(s1, substream) { if (s1 == s) /* failed stream */ break; ops->undo_action(s1, state); } } s = NULL; /* unlock all */ goto _unlock; } } snd_pcm_group_for_each_entry(s, substream) { ops->post_action(s, state); } _unlock: /* unlock streams */ snd_pcm_group_for_each_entry(s1, substream) { if (s1 != substream) { if (!stream_lock) mutex_unlock(&s1->runtime->buffer_mutex); else if (s1->pcm->nonatomic) mutex_unlock(&s1->self_group.mutex); else spin_unlock(&s1->self_group.lock); } if (s1 == s) /* end */ break; } return res; } /* * Note: call with stream lock */ static int snd_pcm_action_single(const struct action_ops *ops, struct snd_pcm_substream *substream, int state) { int res; res = ops->pre_action(substream, state); if (res < 0) return res; res = ops->do_action(substream, state); if (res == 0) ops->post_action(substream, state); else if (ops->undo_action) ops->undo_action(substream, state); return res; } /* * Note: call with stream lock */ static int snd_pcm_action(const struct action_ops *ops, struct snd_pcm_substream *substream, int state) { int res; if (!snd_pcm_stream_linked(substream)) return snd_pcm_action_single(ops, substream, state); if (substream->pcm->nonatomic) { if (!mutex_trylock(&substream->group->mutex)) { mutex_unlock(&substream->self_group.mutex); mutex_lock(&substream->group->mutex); mutex_lock(&substream->self_group.mutex); } res = snd_pcm_action_group(ops, substream, state, 1); mutex_unlock(&substream->group->mutex); } else { if (!spin_trylock(&substream->group->lock)) { spin_unlock(&substream->self_group.lock); spin_lock(&substream->group->lock); spin_lock(&substream->self_group.lock); } res = snd_pcm_action_group(ops, substream, state, 1); spin_unlock(&substream->group->lock); } return res; } /* * Note: don't use any locks before */ static int snd_pcm_action_lock_irq(const struct action_ops *ops, struct snd_pcm_substream *substream, int state) { int res; snd_pcm_stream_lock_irq(substream); res = snd_pcm_action(ops, substream, state); snd_pcm_stream_unlock_irq(substream); return res; } /* */ static int snd_pcm_action_nonatomic(const struct action_ops *ops, struct snd_pcm_substream *substream, int state) { int res; down_read(&snd_pcm_link_rwsem); res = snd_pcm_buffer_access_lock(substream->runtime); if (res < 0) goto unlock; if (snd_pcm_stream_linked(substream)) res = snd_pcm_action_group(ops, substream, state, 0); else res = snd_pcm_action_single(ops, substream, state); snd_pcm_buffer_access_unlock(substream->runtime); unlock: up_read(&snd_pcm_link_rwsem); return res; } /* * start callbacks */ static int snd_pcm_pre_start(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; if (runtime->status->state != SNDRV_PCM_STATE_PREPARED) return -EBADFD; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK && !snd_pcm_playback_data(substream)) return -EPIPE; runtime->trigger_tstamp_latched = false; runtime->trigger_master = substream; return 0; } static int snd_pcm_do_start(struct snd_pcm_substream *substream, int state) { if (substream->runtime->trigger_master != substream) return 0; return substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START); } static void snd_pcm_undo_start(struct snd_pcm_substream *substream, int state) { if (substream->runtime->trigger_master == substream) substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP); } static void snd_pcm_post_start(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_trigger_tstamp(substream); runtime->hw_ptr_jiffies = jiffies; runtime->hw_ptr_buffer_jiffies = (runtime->buffer_size * HZ) / runtime->rate; runtime->status->state = state; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK && runtime->silence_size > 0) snd_pcm_playback_silence(substream, ULONG_MAX); snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MSTART); } static const struct action_ops snd_pcm_action_start = { .pre_action = snd_pcm_pre_start, .do_action = snd_pcm_do_start, .undo_action = snd_pcm_undo_start, .post_action = snd_pcm_post_start }; /** * snd_pcm_start - start all linked streams * @substream: the PCM substream instance * * Return: Zero if successful, or a negative error code. * The stream lock must be acquired before calling this function. */ int snd_pcm_start(struct snd_pcm_substream *substream) { return snd_pcm_action(&snd_pcm_action_start, substream, SNDRV_PCM_STATE_RUNNING); } /* take the stream lock and start the streams */ static int snd_pcm_start_lock_irq(struct snd_pcm_substream *substream) { return snd_pcm_action_lock_irq(&snd_pcm_action_start, substream, SNDRV_PCM_STATE_RUNNING); } /* * stop callbacks */ static int snd_pcm_pre_stop(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; if (runtime->status->state == SNDRV_PCM_STATE_OPEN) return -EBADFD; runtime->trigger_master = substream; return 0; } static int snd_pcm_do_stop(struct snd_pcm_substream *substream, int state) { if (substream->runtime->trigger_master == substream && snd_pcm_running(substream)) substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP); return 0; /* unconditonally stop all substreams */ } static void snd_pcm_post_stop(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; if (runtime->status->state != state) { snd_pcm_trigger_tstamp(substream); runtime->status->state = state; snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MSTOP); } wake_up(&runtime->sleep); wake_up(&runtime->tsleep); } static const struct action_ops snd_pcm_action_stop = { .pre_action = snd_pcm_pre_stop, .do_action = snd_pcm_do_stop, .post_action = snd_pcm_post_stop }; /** * snd_pcm_stop - try to stop all running streams in the substream group * @substream: the PCM substream instance * @state: PCM state after stopping the stream * * The state of each stream is then changed to the given state unconditionally. * * Return: Zero if successful, or a negative error code. */ int snd_pcm_stop(struct snd_pcm_substream *substream, snd_pcm_state_t state) { return snd_pcm_action(&snd_pcm_action_stop, substream, state); } EXPORT_SYMBOL(snd_pcm_stop); /** * snd_pcm_drain_done - stop the DMA only when the given stream is playback * @substream: the PCM substream * * After stopping, the state is changed to SETUP. * Unlike snd_pcm_stop(), this affects only the given stream. * * Return: Zero if succesful, or a negative error code. */ int snd_pcm_drain_done(struct snd_pcm_substream *substream) { return snd_pcm_action_single(&snd_pcm_action_stop, substream, SNDRV_PCM_STATE_SETUP); } /** * snd_pcm_stop_xrun - stop the running streams as XRUN * @substream: the PCM substream instance * * This stops the given running substream (and all linked substreams) as XRUN. * Unlike snd_pcm_stop(), this function takes the substream lock by itself. * * Return: Zero if successful, or a negative error code. */ int snd_pcm_stop_xrun(struct snd_pcm_substream *substream) { unsigned long flags; snd_pcm_stream_lock_irqsave(substream, flags); if (substream->runtime && snd_pcm_running(substream)) __snd_pcm_xrun(substream); snd_pcm_stream_unlock_irqrestore(substream, flags); return 0; } EXPORT_SYMBOL_GPL(snd_pcm_stop_xrun); /* * pause callbacks */ static int snd_pcm_pre_pause(struct snd_pcm_substream *substream, int push) { struct snd_pcm_runtime *runtime = substream->runtime; if (!(runtime->info & SNDRV_PCM_INFO_PAUSE)) return -ENOSYS; if (push) { if (runtime->status->state != SNDRV_PCM_STATE_RUNNING) return -EBADFD; } else if (runtime->status->state != SNDRV_PCM_STATE_PAUSED) return -EBADFD; runtime->trigger_master = substream; return 0; } static int snd_pcm_do_pause(struct snd_pcm_substream *substream, int push) { if (substream->runtime->trigger_master != substream) return 0; /* some drivers might use hw_ptr to recover from the pause - update the hw_ptr now */ if (push) snd_pcm_update_hw_ptr(substream); /* The jiffies check in snd_pcm_update_hw_ptr*() is done by * a delta between the current jiffies, this gives a large enough * delta, effectively to skip the check once. */ substream->runtime->hw_ptr_jiffies = jiffies - HZ * 1000; return substream->ops->trigger(substream, push ? SNDRV_PCM_TRIGGER_PAUSE_PUSH : SNDRV_PCM_TRIGGER_PAUSE_RELEASE); } static void snd_pcm_undo_pause(struct snd_pcm_substream *substream, int push) { if (substream->runtime->trigger_master == substream) substream->ops->trigger(substream, push ? SNDRV_PCM_TRIGGER_PAUSE_RELEASE : SNDRV_PCM_TRIGGER_PAUSE_PUSH); } static void snd_pcm_post_pause(struct snd_pcm_substream *substream, int push) { struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_trigger_tstamp(substream); if (push) { runtime->status->state = SNDRV_PCM_STATE_PAUSED; snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MPAUSE); wake_up(&runtime->sleep); wake_up(&runtime->tsleep); } else { runtime->status->state = SNDRV_PCM_STATE_RUNNING; snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MCONTINUE); } } static const struct action_ops snd_pcm_action_pause = { .pre_action = snd_pcm_pre_pause, .do_action = snd_pcm_do_pause, .undo_action = snd_pcm_undo_pause, .post_action = snd_pcm_post_pause }; /* * Push/release the pause for all linked streams. */ static int snd_pcm_pause(struct snd_pcm_substream *substream, int push) { return snd_pcm_action(&snd_pcm_action_pause, substream, push); } #ifdef CONFIG_PM /* suspend */ static int snd_pcm_pre_suspend(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; switch (runtime->status->state) { case SNDRV_PCM_STATE_SUSPENDED: return -EBUSY; /* unresumable PCM state; return -EBUSY for skipping suspend */ case SNDRV_PCM_STATE_OPEN: case SNDRV_PCM_STATE_SETUP: case SNDRV_PCM_STATE_DISCONNECTED: return -EBUSY; } runtime->trigger_master = substream; return 0; } static int snd_pcm_do_suspend(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; if (runtime->trigger_master != substream) return 0; if (! snd_pcm_running(substream)) return 0; substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_SUSPEND); return 0; /* suspend unconditionally */ } static void snd_pcm_post_suspend(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_trigger_tstamp(substream); runtime->status->suspended_state = runtime->status->state; runtime->status->state = SNDRV_PCM_STATE_SUSPENDED; snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MSUSPEND); wake_up(&runtime->sleep); wake_up(&runtime->tsleep); } static const struct action_ops snd_pcm_action_suspend = { .pre_action = snd_pcm_pre_suspend, .do_action = snd_pcm_do_suspend, .post_action = snd_pcm_post_suspend }; /** * snd_pcm_suspend - trigger SUSPEND to all linked streams * @substream: the PCM substream * * After this call, all streams are changed to SUSPENDED state. * * Return: Zero if successful (or @substream is %NULL), or a negative error * code. */ int snd_pcm_suspend(struct snd_pcm_substream *substream) { int err; unsigned long flags; if (! substream) return 0; snd_pcm_stream_lock_irqsave(substream, flags); err = snd_pcm_action(&snd_pcm_action_suspend, substream, 0); snd_pcm_stream_unlock_irqrestore(substream, flags); return err; } EXPORT_SYMBOL(snd_pcm_suspend); /** * snd_pcm_suspend_all - trigger SUSPEND to all substreams in the given pcm * @pcm: the PCM instance * * After this call, all streams are changed to SUSPENDED state. * * Return: Zero if successful (or @pcm is %NULL), or a negative error code. */ int snd_pcm_suspend_all(struct snd_pcm *pcm) { struct snd_pcm_substream *substream; int stream, err = 0; if (! pcm) return 0; for (stream = 0; stream < 2; stream++) { for (substream = pcm->streams[stream].substream; substream; substream = substream->next) { /* FIXME: the open/close code should lock this as well */ if (substream->runtime == NULL) continue; /* * Skip BE dai link PCM's that are internal and may * not have their substream ops set. */ if (!substream->ops) continue; err = snd_pcm_suspend(substream); if (err < 0 && err != -EBUSY) return err; } } return 0; } EXPORT_SYMBOL(snd_pcm_suspend_all); /* resume */ static int snd_pcm_pre_resume(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; if (!(runtime->info & SNDRV_PCM_INFO_RESUME)) return -ENOSYS; runtime->trigger_master = substream; return 0; } static int snd_pcm_do_resume(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; if (runtime->trigger_master != substream) return 0; /* DMA not running previously? */ if (runtime->status->suspended_state != SNDRV_PCM_STATE_RUNNING && (runtime->status->suspended_state != SNDRV_PCM_STATE_DRAINING || substream->stream != SNDRV_PCM_STREAM_PLAYBACK)) return 0; return substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_RESUME); } static void snd_pcm_undo_resume(struct snd_pcm_substream *substream, int state) { if (substream->runtime->trigger_master == substream && snd_pcm_running(substream)) substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_SUSPEND); } static void snd_pcm_post_resume(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_trigger_tstamp(substream); runtime->status->state = runtime->status->suspended_state; snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MRESUME); } static const struct action_ops snd_pcm_action_resume = { .pre_action = snd_pcm_pre_resume, .do_action = snd_pcm_do_resume, .undo_action = snd_pcm_undo_resume, .post_action = snd_pcm_post_resume }; static int snd_pcm_resume(struct snd_pcm_substream *substream) { return snd_pcm_action_lock_irq(&snd_pcm_action_resume, substream, 0); } #else static int snd_pcm_resume(struct snd_pcm_substream *substream) { return -ENOSYS; } #endif /* CONFIG_PM */ /* * xrun ioctl * * Change the RUNNING stream(s) to XRUN state. */ static int snd_pcm_xrun(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; int result; snd_pcm_stream_lock_irq(substream); switch (runtime->status->state) { case SNDRV_PCM_STATE_XRUN: result = 0; /* already there */ break; case SNDRV_PCM_STATE_RUNNING: __snd_pcm_xrun(substream); result = 0; break; default: result = -EBADFD; } snd_pcm_stream_unlock_irq(substream); return result; } /* * reset ioctl */ static int snd_pcm_pre_reset(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; switch (runtime->status->state) { case SNDRV_PCM_STATE_RUNNING: case SNDRV_PCM_STATE_PREPARED: case SNDRV_PCM_STATE_PAUSED: case SNDRV_PCM_STATE_SUSPENDED: return 0; default: return -EBADFD; } } static int snd_pcm_do_reset(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; int err = substream->ops->ioctl(substream, SNDRV_PCM_IOCTL1_RESET, NULL); if (err < 0) return err; snd_pcm_stream_lock_irq(substream); runtime->hw_ptr_base = 0; runtime->hw_ptr_interrupt = runtime->status->hw_ptr - runtime->status->hw_ptr % runtime->period_size; runtime->silence_start = runtime->status->hw_ptr; runtime->silence_filled = 0; snd_pcm_stream_unlock_irq(substream); return 0; } static void snd_pcm_post_reset(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_stream_lock_irq(substream); runtime->control->appl_ptr = runtime->status->hw_ptr; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK && runtime->silence_size > 0) snd_pcm_playback_silence(substream, ULONG_MAX); snd_pcm_stream_unlock_irq(substream); } static const struct action_ops snd_pcm_action_reset = { .pre_action = snd_pcm_pre_reset, .do_action = snd_pcm_do_reset, .post_action = snd_pcm_post_reset }; static int snd_pcm_reset(struct snd_pcm_substream *substream) { return snd_pcm_action_nonatomic(&snd_pcm_action_reset, substream, 0); } /* * prepare ioctl */ /* we use the second argument for updating f_flags */ static int snd_pcm_pre_prepare(struct snd_pcm_substream *substream, int f_flags) { struct snd_pcm_runtime *runtime = substream->runtime; if (runtime->status->state == SNDRV_PCM_STATE_OPEN || runtime->status->state == SNDRV_PCM_STATE_DISCONNECTED) return -EBADFD; if (snd_pcm_running(substream)) return -EBUSY; substream->f_flags = f_flags; return 0; } static int snd_pcm_do_prepare(struct snd_pcm_substream *substream, int state) { int err; err = substream->ops->prepare(substream); if (err < 0) return err; return snd_pcm_do_reset(substream, 0); } static void snd_pcm_post_prepare(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; runtime->control->appl_ptr = runtime->status->hw_ptr; snd_pcm_set_state(substream, SNDRV_PCM_STATE_PREPARED); } static const struct action_ops snd_pcm_action_prepare = { .pre_action = snd_pcm_pre_prepare, .do_action = snd_pcm_do_prepare, .post_action = snd_pcm_post_prepare }; /** * snd_pcm_prepare - prepare the PCM substream to be triggerable * @substream: the PCM substream instance * @file: file to refer f_flags * * Return: Zero if successful, or a negative error code. */ static int snd_pcm_prepare(struct snd_pcm_substream *substream, struct file *file) { int f_flags; if (file) f_flags = file->f_flags; else f_flags = substream->f_flags; snd_pcm_stream_lock_irq(substream); switch (substream->runtime->status->state) { case SNDRV_PCM_STATE_PAUSED: snd_pcm_pause(substream, 0); /* fallthru */ case SNDRV_PCM_STATE_SUSPENDED: snd_pcm_stop(substream, SNDRV_PCM_STATE_SETUP); break; } snd_pcm_stream_unlock_irq(substream); return snd_pcm_action_nonatomic(&snd_pcm_action_prepare, substream, f_flags); } /* * drain ioctl */ static int snd_pcm_pre_drain_init(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; switch (runtime->status->state) { case SNDRV_PCM_STATE_OPEN: case SNDRV_PCM_STATE_DISCONNECTED: case SNDRV_PCM_STATE_SUSPENDED: return -EBADFD; } runtime->trigger_master = substream; return 0; } static int snd_pcm_do_drain_init(struct snd_pcm_substream *substream, int state) { struct snd_pcm_runtime *runtime = substream->runtime; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) { switch (runtime->status->state) { case SNDRV_PCM_STATE_PREPARED: /* start playback stream if possible */ if (! snd_pcm_playback_empty(substream)) { snd_pcm_do_start(substream, SNDRV_PCM_STATE_DRAINING); snd_pcm_post_start(substream, SNDRV_PCM_STATE_DRAINING); } else { runtime->status->state = SNDRV_PCM_STATE_SETUP; } break; case SNDRV_PCM_STATE_RUNNING: runtime->status->state = SNDRV_PCM_STATE_DRAINING; break; case SNDRV_PCM_STATE_XRUN: runtime->status->state = SNDRV_PCM_STATE_SETUP; break; default: break; } } else { /* stop running stream */ if (runtime->status->state == SNDRV_PCM_STATE_RUNNING) { int new_state = snd_pcm_capture_avail(runtime) > 0 ? SNDRV_PCM_STATE_DRAINING : SNDRV_PCM_STATE_SETUP; snd_pcm_do_stop(substream, new_state); snd_pcm_post_stop(substream, new_state); } } if (runtime->status->state == SNDRV_PCM_STATE_DRAINING && runtime->trigger_master == substream && (runtime->hw.info & SNDRV_PCM_INFO_DRAIN_TRIGGER)) return substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_DRAIN); return 0; } static void snd_pcm_post_drain_init(struct snd_pcm_substream *substream, int state) { } static const struct action_ops snd_pcm_action_drain_init = { .pre_action = snd_pcm_pre_drain_init, .do_action = snd_pcm_do_drain_init, .post_action = snd_pcm_post_drain_init }; static int snd_pcm_drop(struct snd_pcm_substream *substream); /* * Drain the stream(s). * When the substream is linked, sync until the draining of all playback streams * is finished. * After this call, all streams are supposed to be either SETUP or DRAINING * (capture only) state. */ static int snd_pcm_drain(struct snd_pcm_substream *substream, struct file *file) { struct snd_card *card; struct snd_pcm_runtime *runtime; struct snd_pcm_substream *s; wait_queue_entry_t wait; int result = 0; int nonblock = 0; card = substream->pcm->card; runtime = substream->runtime; if (runtime->status->state == SNDRV_PCM_STATE_OPEN) return -EBADFD; if (file) { if (file->f_flags & O_NONBLOCK) nonblock = 1; } else if (substream->f_flags & O_NONBLOCK) nonblock = 1; down_read(&snd_pcm_link_rwsem); snd_pcm_stream_lock_irq(substream); /* resume pause */ if (runtime->status->state == SNDRV_PCM_STATE_PAUSED) snd_pcm_pause(substream, 0); /* pre-start/stop - all running streams are changed to DRAINING state */ result = snd_pcm_action(&snd_pcm_action_drain_init, substream, 0); if (result < 0) goto unlock; /* in non-blocking, we don't wait in ioctl but let caller poll */ if (nonblock) { result = -EAGAIN; goto unlock; } for (;;) { long tout; struct snd_pcm_runtime *to_check; if (signal_pending(current)) { result = -ERESTARTSYS; break; } /* find a substream to drain */ to_check = NULL; snd_pcm_group_for_each_entry(s, substream) { if (s->stream != SNDRV_PCM_STREAM_PLAYBACK) continue; runtime = s->runtime; if (runtime->status->state == SNDRV_PCM_STATE_DRAINING) { to_check = runtime; break; } } if (!to_check) break; /* all drained */ init_waitqueue_entry(&wait, current); add_wait_queue(&to_check->sleep, &wait); snd_pcm_stream_unlock_irq(substream); up_read(&snd_pcm_link_rwsem); if (runtime->no_period_wakeup) tout = MAX_SCHEDULE_TIMEOUT; else { tout = 10; if (runtime->rate) { long t = runtime->period_size * 2 / runtime->rate; tout = max(t, tout); } tout = msecs_to_jiffies(tout * 1000); } tout = schedule_timeout_interruptible(tout); down_read(&snd_pcm_link_rwsem); snd_pcm_stream_lock_irq(substream); remove_wait_queue(&to_check->sleep, &wait); if (card->shutdown) { result = -ENODEV; break; } if (tout == 0) { if (substream->runtime->status->state == SNDRV_PCM_STATE_SUSPENDED) result = -ESTRPIPE; else { dev_dbg(substream->pcm->card->dev, "playback drain error (DMA or IRQ trouble?)\n"); snd_pcm_stop(substream, SNDRV_PCM_STATE_SETUP); result = -EIO; } break; } } unlock: snd_pcm_stream_unlock_irq(substream); up_read(&snd_pcm_link_rwsem); return result; } /* * drop ioctl * * Immediately put all linked substreams into SETUP state. */ static int snd_pcm_drop(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime; int result = 0; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; runtime = substream->runtime; if (runtime->status->state == SNDRV_PCM_STATE_OPEN || runtime->status->state == SNDRV_PCM_STATE_DISCONNECTED) return -EBADFD; snd_pcm_stream_lock_irq(substream); /* resume pause */ if (runtime->status->state == SNDRV_PCM_STATE_PAUSED) snd_pcm_pause(substream, 0); snd_pcm_stop(substream, SNDRV_PCM_STATE_SETUP); /* runtime->control->appl_ptr = runtime->status->hw_ptr; */ snd_pcm_stream_unlock_irq(substream); return result; } static bool is_pcm_file(struct file *file) { struct inode *inode = file_inode(file); unsigned int minor; if (!S_ISCHR(inode->i_mode) || imajor(inode) != snd_major) return false; minor = iminor(inode); return snd_lookup_minor_data(minor, SNDRV_DEVICE_TYPE_PCM_PLAYBACK) || snd_lookup_minor_data(minor, SNDRV_DEVICE_TYPE_PCM_CAPTURE); } /* * PCM link handling */ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd) { int res = 0; struct snd_pcm_file *pcm_file; struct snd_pcm_substream *substream1; struct snd_pcm_group *group; struct fd f = fdget(fd); if (!f.file) return -EBADFD; if (!is_pcm_file(f.file)) { res = -EBADFD; goto _badf; } pcm_file = f.file->private_data; substream1 = pcm_file->substream; if (substream == substream1) { res = -EINVAL; goto _badf; } group = kmalloc(sizeof(*group), GFP_KERNEL); if (!group) { res = -ENOMEM; goto _nolock; } down_write_nonfifo(&snd_pcm_link_rwsem); write_lock_irq(&snd_pcm_link_rwlock); if (substream->runtime->status->state == SNDRV_PCM_STATE_OPEN || substream->runtime->status->state != substream1->runtime->status->state || substream->pcm->nonatomic != substream1->pcm->nonatomic) { res = -EBADFD; goto _end; } if (snd_pcm_stream_linked(substream1)) { res = -EALREADY; goto _end; } if (!snd_pcm_stream_linked(substream)) { substream->group = group; group = NULL; spin_lock_init(&substream->group->lock); mutex_init(&substream->group->mutex); INIT_LIST_HEAD(&substream->group->substreams); list_add_tail(&substream->link_list, &substream->group->substreams); substream->group->count = 1; } list_add_tail(&substream1->link_list, &substream->group->substreams); substream->group->count++; substream1->group = substream->group; _end: write_unlock_irq(&snd_pcm_link_rwlock); up_write(&snd_pcm_link_rwsem); _nolock: snd_card_unref(substream1->pcm->card); kfree(group); _badf: fdput(f); return res; } static void relink_to_local(struct snd_pcm_substream *substream) { substream->group = &substream->self_group; INIT_LIST_HEAD(&substream->self_group.substreams); list_add_tail(&substream->link_list, &substream->self_group.substreams); } static int snd_pcm_unlink(struct snd_pcm_substream *substream) { struct snd_pcm_substream *s; int res = 0; down_write_nonfifo(&snd_pcm_link_rwsem); write_lock_irq(&snd_pcm_link_rwlock); if (!snd_pcm_stream_linked(substream)) { res = -EALREADY; goto _end; } list_del(&substream->link_list); substream->group->count--; if (substream->group->count == 1) { /* detach the last stream, too */ snd_pcm_group_for_each_entry(s, substream) { relink_to_local(s); break; } kfree(substream->group); } relink_to_local(substream); _end: write_unlock_irq(&snd_pcm_link_rwlock); up_write(&snd_pcm_link_rwsem); return res; } /* * hw configurator */ static int snd_pcm_hw_rule_mul(struct snd_pcm_hw_params *params, struct snd_pcm_hw_rule *rule) { struct snd_interval t; snd_interval_mul(hw_param_interval_c(params, rule->deps[0]), hw_param_interval_c(params, rule->deps[1]), &t); return snd_interval_refine(hw_param_interval(params, rule->var), &t); } static int snd_pcm_hw_rule_div(struct snd_pcm_hw_params *params, struct snd_pcm_hw_rule *rule) { struct snd_interval t; snd_interval_div(hw_param_interval_c(params, rule->deps[0]), hw_param_interval_c(params, rule->deps[1]), &t); return snd_interval_refine(hw_param_interval(params, rule->var), &t); } static int snd_pcm_hw_rule_muldivk(struct snd_pcm_hw_params *params, struct snd_pcm_hw_rule *rule) { struct snd_interval t; snd_interval_muldivk(hw_param_interval_c(params, rule->deps[0]), hw_param_interval_c(params, rule->deps[1]), (unsigned long) rule->private, &t); return snd_interval_refine(hw_param_interval(params, rule->var), &t); } static int snd_pcm_hw_rule_mulkdiv(struct snd_pcm_hw_params *params, struct snd_pcm_hw_rule *rule) { struct snd_interval t; snd_interval_mulkdiv(hw_param_interval_c(params, rule->deps[0]), (unsigned long) rule->private, hw_param_interval_c(params, rule->deps[1]), &t); return snd_interval_refine(hw_param_interval(params, rule->var), &t); } static int snd_pcm_hw_rule_format(struct snd_pcm_hw_params *params, struct snd_pcm_hw_rule *rule) { unsigned int k; const struct snd_interval *i = hw_param_interval_c(params, rule->deps[0]); struct snd_mask m; struct snd_mask *mask = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT); snd_mask_any(&m); for (k = 0; k <= SNDRV_PCM_FORMAT_LAST; ++k) { int bits; if (! snd_mask_test(mask, k)) continue; bits = snd_pcm_format_physical_width(k); if (bits <= 0) continue; /* ignore invalid formats */ if ((unsigned)bits < i->min || (unsigned)bits > i->max) snd_mask_reset(&m, k); } return snd_mask_refine(mask, &m); } static int snd_pcm_hw_rule_sample_bits(struct snd_pcm_hw_params *params, struct snd_pcm_hw_rule *rule) { struct snd_interval t; unsigned int k; t.min = UINT_MAX; t.max = 0; t.openmin = 0; t.openmax = 0; for (k = 0; k <= SNDRV_PCM_FORMAT_LAST; ++k) { int bits; if (! snd_mask_test(hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT), k)) continue; bits = snd_pcm_format_physical_width(k); if (bits <= 0) continue; /* ignore invalid formats */ if (t.min > (unsigned)bits) t.min = bits; if (t.max < (unsigned)bits) t.max = bits; } t.integer = 1; return snd_interval_refine(hw_param_interval(params, rule->var), &t); } #if SNDRV_PCM_RATE_5512 != 1 << 0 || SNDRV_PCM_RATE_192000 != 1 << 12 #error "Change this table" #endif static const unsigned int rates[] = { 5512, 8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200, 96000, 176400, 192000 }; const struct snd_pcm_hw_constraint_list snd_pcm_known_rates = { .count = ARRAY_SIZE(rates), .list = rates, }; static int snd_pcm_hw_rule_rate(struct snd_pcm_hw_params *params, struct snd_pcm_hw_rule *rule) { struct snd_pcm_hardware *hw = rule->private; return snd_interval_list(hw_param_interval(params, rule->var), snd_pcm_known_rates.count, snd_pcm_known_rates.list, hw->rates); } static int snd_pcm_hw_rule_buffer_bytes_max(struct snd_pcm_hw_params *params, struct snd_pcm_hw_rule *rule) { struct snd_interval t; struct snd_pcm_substream *substream = rule->private; t.min = 0; t.max = substream->buffer_bytes_max; t.openmin = 0; t.openmax = 0; t.integer = 1; return snd_interval_refine(hw_param_interval(params, rule->var), &t); } int snd_pcm_hw_constraints_init(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; struct snd_pcm_hw_constraints *constrs = &runtime->hw_constraints; int k, err; for (k = SNDRV_PCM_HW_PARAM_FIRST_MASK; k <= SNDRV_PCM_HW_PARAM_LAST_MASK; k++) { snd_mask_any(constrs_mask(constrs, k)); } for (k = SNDRV_PCM_HW_PARAM_FIRST_INTERVAL; k <= SNDRV_PCM_HW_PARAM_LAST_INTERVAL; k++) { snd_interval_any(constrs_interval(constrs, k)); } snd_interval_setinteger(constrs_interval(constrs, SNDRV_PCM_HW_PARAM_CHANNELS)); snd_interval_setinteger(constrs_interval(constrs, SNDRV_PCM_HW_PARAM_BUFFER_SIZE)); snd_interval_setinteger(constrs_interval(constrs, SNDRV_PCM_HW_PARAM_BUFFER_BYTES)); snd_interval_setinteger(constrs_interval(constrs, SNDRV_PCM_HW_PARAM_SAMPLE_BITS)); snd_interval_setinteger(constrs_interval(constrs, SNDRV_PCM_HW_PARAM_FRAME_BITS)); err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_FORMAT, snd_pcm_hw_rule_format, NULL, SNDRV_PCM_HW_PARAM_SAMPLE_BITS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_SAMPLE_BITS, snd_pcm_hw_rule_sample_bits, NULL, SNDRV_PCM_HW_PARAM_FORMAT, SNDRV_PCM_HW_PARAM_SAMPLE_BITS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_SAMPLE_BITS, snd_pcm_hw_rule_div, NULL, SNDRV_PCM_HW_PARAM_FRAME_BITS, SNDRV_PCM_HW_PARAM_CHANNELS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_FRAME_BITS, snd_pcm_hw_rule_mul, NULL, SNDRV_PCM_HW_PARAM_SAMPLE_BITS, SNDRV_PCM_HW_PARAM_CHANNELS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_FRAME_BITS, snd_pcm_hw_rule_mulkdiv, (void*) 8, SNDRV_PCM_HW_PARAM_PERIOD_BYTES, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_FRAME_BITS, snd_pcm_hw_rule_mulkdiv, (void*) 8, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_CHANNELS, snd_pcm_hw_rule_div, NULL, SNDRV_PCM_HW_PARAM_FRAME_BITS, SNDRV_PCM_HW_PARAM_SAMPLE_BITS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_RATE, snd_pcm_hw_rule_mulkdiv, (void*) 1000000, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, SNDRV_PCM_HW_PARAM_PERIOD_TIME, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_RATE, snd_pcm_hw_rule_mulkdiv, (void*) 1000000, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, SNDRV_PCM_HW_PARAM_BUFFER_TIME, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_PERIODS, snd_pcm_hw_rule_div, NULL, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, snd_pcm_hw_rule_div, NULL, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, SNDRV_PCM_HW_PARAM_PERIODS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, snd_pcm_hw_rule_mulkdiv, (void*) 8, SNDRV_PCM_HW_PARAM_PERIOD_BYTES, SNDRV_PCM_HW_PARAM_FRAME_BITS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, snd_pcm_hw_rule_muldivk, (void*) 1000000, SNDRV_PCM_HW_PARAM_PERIOD_TIME, SNDRV_PCM_HW_PARAM_RATE, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, snd_pcm_hw_rule_mul, NULL, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, SNDRV_PCM_HW_PARAM_PERIODS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, snd_pcm_hw_rule_mulkdiv, (void*) 8, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, SNDRV_PCM_HW_PARAM_FRAME_BITS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, snd_pcm_hw_rule_muldivk, (void*) 1000000, SNDRV_PCM_HW_PARAM_BUFFER_TIME, SNDRV_PCM_HW_PARAM_RATE, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_BYTES, snd_pcm_hw_rule_muldivk, (void*) 8, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, SNDRV_PCM_HW_PARAM_FRAME_BITS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, snd_pcm_hw_rule_muldivk, (void*) 8, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, SNDRV_PCM_HW_PARAM_FRAME_BITS, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_TIME, snd_pcm_hw_rule_mulkdiv, (void*) 1000000, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, SNDRV_PCM_HW_PARAM_RATE, -1); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_TIME, snd_pcm_hw_rule_mulkdiv, (void*) 1000000, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, SNDRV_PCM_HW_PARAM_RATE, -1); if (err < 0) return err; return 0; } int snd_pcm_hw_constraints_complete(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; struct snd_pcm_hardware *hw = &runtime->hw; int err; unsigned int mask = 0; if (hw->info & SNDRV_PCM_INFO_INTERLEAVED) mask |= 1 << SNDRV_PCM_ACCESS_RW_INTERLEAVED; if (hw->info & SNDRV_PCM_INFO_NONINTERLEAVED) mask |= 1 << SNDRV_PCM_ACCESS_RW_NONINTERLEAVED; if (hw_support_mmap(substream)) { if (hw->info & SNDRV_PCM_INFO_INTERLEAVED) mask |= 1 << SNDRV_PCM_ACCESS_MMAP_INTERLEAVED; if (hw->info & SNDRV_PCM_INFO_NONINTERLEAVED) mask |= 1 << SNDRV_PCM_ACCESS_MMAP_NONINTERLEAVED; if (hw->info & SNDRV_PCM_INFO_COMPLEX) mask |= 1 << SNDRV_PCM_ACCESS_MMAP_COMPLEX; } err = snd_pcm_hw_constraint_mask(runtime, SNDRV_PCM_HW_PARAM_ACCESS, mask); if (err < 0) return err; err = snd_pcm_hw_constraint_mask64(runtime, SNDRV_PCM_HW_PARAM_FORMAT, hw->formats); if (err < 0) return err; err = snd_pcm_hw_constraint_mask(runtime, SNDRV_PCM_HW_PARAM_SUBFORMAT, 1 << SNDRV_PCM_SUBFORMAT_STD); if (err < 0) return err; err = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_CHANNELS, hw->channels_min, hw->channels_max); if (err < 0) return err; err = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_RATE, hw->rate_min, hw->rate_max); if (err < 0) return err; err = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_PERIOD_BYTES, hw->period_bytes_min, hw->period_bytes_max); if (err < 0) return err; err = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_PERIODS, hw->periods_min, hw->periods_max); if (err < 0) return err; err = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, hw->period_bytes_min, hw->buffer_bytes_max); if (err < 0) return err; err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, snd_pcm_hw_rule_buffer_bytes_max, substream, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, -1); if (err < 0) return err; /* FIXME: remove */ if (runtime->dma_bytes) { err = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, 0, runtime->dma_bytes); if (err < 0) return err; } if (!(hw->rates & (SNDRV_PCM_RATE_KNOT | SNDRV_PCM_RATE_CONTINUOUS))) { err = snd_pcm_hw_rule_add(runtime, 0, SNDRV_PCM_HW_PARAM_RATE, snd_pcm_hw_rule_rate, hw, SNDRV_PCM_HW_PARAM_RATE, -1); if (err < 0) return err; } /* FIXME: this belong to lowlevel */ snd_pcm_hw_constraint_integer(runtime, SNDRV_PCM_HW_PARAM_PERIOD_SIZE); return 0; } static void pcm_release_private(struct snd_pcm_substream *substream) { if (snd_pcm_stream_linked(substream)) snd_pcm_unlink(substream); } void snd_pcm_release_substream(struct snd_pcm_substream *substream) { substream->ref_count--; if (substream->ref_count > 0) return; snd_pcm_drop(substream); if (substream->hw_opened) { if (substream->ops->hw_free && substream->runtime->status->state != SNDRV_PCM_STATE_OPEN) substream->ops->hw_free(substream); substream->ops->close(substream); substream->hw_opened = 0; } if (pm_qos_request_active(&substream->latency_pm_qos_req)) pm_qos_remove_request(&substream->latency_pm_qos_req); if (substream->pcm_release) { substream->pcm_release(substream); substream->pcm_release = NULL; } snd_pcm_detach_substream(substream); } EXPORT_SYMBOL(snd_pcm_release_substream); int snd_pcm_open_substream(struct snd_pcm *pcm, int stream, struct file *file, struct snd_pcm_substream **rsubstream) { struct snd_pcm_substream *substream; int err; err = snd_pcm_attach_substream(pcm, stream, file, &substream); if (err < 0) return err; if (substream->ref_count > 1) { *rsubstream = substream; return 0; } err = snd_pcm_hw_constraints_init(substream); if (err < 0) { pcm_dbg(pcm, "snd_pcm_hw_constraints_init failed\n"); goto error; } if ((err = substream->ops->open(substream)) < 0) goto error; substream->hw_opened = 1; err = snd_pcm_hw_constraints_complete(substream); if (err < 0) { pcm_dbg(pcm, "snd_pcm_hw_constraints_complete failed\n"); goto error; } *rsubstream = substream; return 0; error: snd_pcm_release_substream(substream); return err; } EXPORT_SYMBOL(snd_pcm_open_substream); static int snd_pcm_open_file(struct file *file, struct snd_pcm *pcm, int stream) { struct snd_pcm_file *pcm_file; struct snd_pcm_substream *substream; int err; err = snd_pcm_open_substream(pcm, stream, file, &substream); if (err < 0) return err; pcm_file = kzalloc(sizeof(*pcm_file), GFP_KERNEL); if (pcm_file == NULL) { snd_pcm_release_substream(substream); return -ENOMEM; } pcm_file->substream = substream; if (substream->ref_count == 1) { substream->file = pcm_file; substream->pcm_release = pcm_release_private; } file->private_data = pcm_file; return 0; } static int snd_pcm_playback_open(struct inode *inode, struct file *file) { struct snd_pcm *pcm; int err = nonseekable_open(inode, file); if (err < 0) return err; pcm = snd_lookup_minor_data(iminor(inode), SNDRV_DEVICE_TYPE_PCM_PLAYBACK); err = snd_pcm_open(file, pcm, SNDRV_PCM_STREAM_PLAYBACK); if (pcm) snd_card_unref(pcm->card); return err; } static int snd_pcm_capture_open(struct inode *inode, struct file *file) { struct snd_pcm *pcm; int err = nonseekable_open(inode, file); if (err < 0) return err; pcm = snd_lookup_minor_data(iminor(inode), SNDRV_DEVICE_TYPE_PCM_CAPTURE); err = snd_pcm_open(file, pcm, SNDRV_PCM_STREAM_CAPTURE); if (pcm) snd_card_unref(pcm->card); return err; } static int snd_pcm_open(struct file *file, struct snd_pcm *pcm, int stream) { int err; wait_queue_entry_t wait; if (pcm == NULL) { err = -ENODEV; goto __error1; } err = snd_card_file_add(pcm->card, file); if (err < 0) goto __error1; if (!try_module_get(pcm->card->module)) { err = -EFAULT; goto __error2; } init_waitqueue_entry(&wait, current); add_wait_queue(&pcm->open_wait, &wait); mutex_lock(&pcm->open_mutex); while (1) { err = snd_pcm_open_file(file, pcm, stream); if (err >= 0) break; if (err == -EAGAIN) { if (file->f_flags & O_NONBLOCK) { err = -EBUSY; break; } } else break; set_current_state(TASK_INTERRUPTIBLE); mutex_unlock(&pcm->open_mutex); schedule(); mutex_lock(&pcm->open_mutex); if (pcm->card->shutdown) { err = -ENODEV; break; } if (signal_pending(current)) { err = -ERESTARTSYS; break; } } remove_wait_queue(&pcm->open_wait, &wait); mutex_unlock(&pcm->open_mutex); if (err < 0) goto __error; return err; __error: module_put(pcm->card->module); __error2: snd_card_file_remove(pcm->card, file); __error1: return err; } static int snd_pcm_release(struct inode *inode, struct file *file) { struct snd_pcm *pcm; struct snd_pcm_substream *substream; struct snd_pcm_file *pcm_file; pcm_file = file->private_data; substream = pcm_file->substream; if (snd_BUG_ON(!substream)) return -ENXIO; pcm = substream->pcm; mutex_lock(&pcm->open_mutex); snd_pcm_release_substream(substream); kfree(pcm_file); mutex_unlock(&pcm->open_mutex); wake_up(&pcm->open_wait); module_put(pcm->card->module); snd_card_file_remove(pcm->card, file); return 0; } /* check and update PCM state; return 0 or a negative error * call this inside PCM lock */ static int do_pcm_hwsync(struct snd_pcm_substream *substream) { switch (substream->runtime->status->state) { case SNDRV_PCM_STATE_DRAINING: if (substream->stream == SNDRV_PCM_STREAM_CAPTURE) return -EBADFD; /* Fall through */ case SNDRV_PCM_STATE_RUNNING: return snd_pcm_update_hw_ptr(substream); case SNDRV_PCM_STATE_PREPARED: case SNDRV_PCM_STATE_PAUSED: return 0; case SNDRV_PCM_STATE_SUSPENDED: return -ESTRPIPE; case SNDRV_PCM_STATE_XRUN: return -EPIPE; default: return -EBADFD; } } /* increase the appl_ptr; returns the processed frames or a negative error */ static snd_pcm_sframes_t forward_appl_ptr(struct snd_pcm_substream *substream, snd_pcm_uframes_t frames, snd_pcm_sframes_t avail) { struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_sframes_t appl_ptr; int ret; if (avail <= 0) return 0; if (frames > (snd_pcm_uframes_t)avail) frames = avail; appl_ptr = runtime->control->appl_ptr + frames; if (appl_ptr >= (snd_pcm_sframes_t)runtime->boundary) appl_ptr -= runtime->boundary; ret = pcm_lib_apply_appl_ptr(substream, appl_ptr); return ret < 0 ? ret : frames; } /* decrease the appl_ptr; returns the processed frames or zero for error */ static snd_pcm_sframes_t rewind_appl_ptr(struct snd_pcm_substream *substream, snd_pcm_uframes_t frames, snd_pcm_sframes_t avail) { struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_sframes_t appl_ptr; int ret; if (avail <= 0) return 0; if (frames > (snd_pcm_uframes_t)avail) frames = avail; appl_ptr = runtime->control->appl_ptr - frames; if (appl_ptr < 0) appl_ptr += runtime->boundary; ret = pcm_lib_apply_appl_ptr(substream, appl_ptr); /* NOTE: we return zero for errors because PulseAudio gets depressed * upon receiving an error from rewind ioctl and stops processing * any longer. Returning zero means that no rewind is done, so * it's not absolutely wrong to answer like that. */ return ret < 0 ? 0 : frames; } static snd_pcm_sframes_t snd_pcm_rewind(struct snd_pcm_substream *substream, snd_pcm_uframes_t frames) { snd_pcm_sframes_t ret; if (frames == 0) return 0; snd_pcm_stream_lock_irq(substream); ret = do_pcm_hwsync(substream); if (!ret) ret = rewind_appl_ptr(substream, frames, snd_pcm_hw_avail(substream)); snd_pcm_stream_unlock_irq(substream); return ret; } static snd_pcm_sframes_t snd_pcm_forward(struct snd_pcm_substream *substream, snd_pcm_uframes_t frames) { snd_pcm_sframes_t ret; if (frames == 0) return 0; snd_pcm_stream_lock_irq(substream); ret = do_pcm_hwsync(substream); if (!ret) ret = forward_appl_ptr(substream, frames, snd_pcm_avail(substream)); snd_pcm_stream_unlock_irq(substream); return ret; } static int snd_pcm_hwsync(struct snd_pcm_substream *substream) { int err; snd_pcm_stream_lock_irq(substream); err = do_pcm_hwsync(substream); snd_pcm_stream_unlock_irq(substream); return err; } static int snd_pcm_delay(struct snd_pcm_substream *substream, snd_pcm_sframes_t *delay) { int err; snd_pcm_sframes_t n = 0; snd_pcm_stream_lock_irq(substream); err = do_pcm_hwsync(substream); if (!err) n = snd_pcm_calc_delay(substream); snd_pcm_stream_unlock_irq(substream); if (!err) *delay = n; return err; } static int snd_pcm_sync_ptr(struct snd_pcm_substream *substream, struct snd_pcm_sync_ptr __user *_sync_ptr) { struct snd_pcm_runtime *runtime = substream->runtime; struct snd_pcm_sync_ptr sync_ptr; volatile struct snd_pcm_mmap_status *status; volatile struct snd_pcm_mmap_control *control; int err; memset(&sync_ptr, 0, sizeof(sync_ptr)); if (get_user(sync_ptr.flags, (unsigned __user *)&(_sync_ptr->flags))) return -EFAULT; if (copy_from_user(&sync_ptr.c.control, &(_sync_ptr->c.control), sizeof(struct snd_pcm_mmap_control))) return -EFAULT; status = runtime->status; control = runtime->control; if (sync_ptr.flags & SNDRV_PCM_SYNC_PTR_HWSYNC) { err = snd_pcm_hwsync(substream); if (err < 0) return err; } snd_pcm_stream_lock_irq(substream); if (!(sync_ptr.flags & SNDRV_PCM_SYNC_PTR_APPL)) { err = pcm_lib_apply_appl_ptr(substream, sync_ptr.c.control.appl_ptr); if (err < 0) { snd_pcm_stream_unlock_irq(substream); return err; } } else { sync_ptr.c.control.appl_ptr = control->appl_ptr; } if (!(sync_ptr.flags & SNDRV_PCM_SYNC_PTR_AVAIL_MIN)) control->avail_min = sync_ptr.c.control.avail_min; else sync_ptr.c.control.avail_min = control->avail_min; sync_ptr.s.status.state = status->state; sync_ptr.s.status.hw_ptr = status->hw_ptr; sync_ptr.s.status.tstamp = status->tstamp; sync_ptr.s.status.suspended_state = status->suspended_state; sync_ptr.s.status.audio_tstamp = status->audio_tstamp; snd_pcm_stream_unlock_irq(substream); if (copy_to_user(_sync_ptr, &sync_ptr, sizeof(sync_ptr))) return -EFAULT; return 0; } static int snd_pcm_tstamp(struct snd_pcm_substream *substream, int __user *_arg) { struct snd_pcm_runtime *runtime = substream->runtime; int arg; if (get_user(arg, _arg)) return -EFAULT; if (arg < 0 || arg > SNDRV_PCM_TSTAMP_TYPE_LAST) return -EINVAL; runtime->tstamp_type = arg; return 0; } static int snd_pcm_xferi_frames_ioctl(struct snd_pcm_substream *substream, struct snd_xferi __user *_xferi) { struct snd_xferi xferi; struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_sframes_t result; if (runtime->status->state == SNDRV_PCM_STATE_OPEN) return -EBADFD; if (put_user(0, &_xferi->result)) return -EFAULT; if (copy_from_user(&xferi, _xferi, sizeof(xferi))) return -EFAULT; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) result = snd_pcm_lib_write(substream, xferi.buf, xferi.frames); else result = snd_pcm_lib_read(substream, xferi.buf, xferi.frames); __put_user(result, &_xferi->result); return result < 0 ? result : 0; } static int snd_pcm_xfern_frames_ioctl(struct snd_pcm_substream *substream, struct snd_xfern __user *_xfern) { struct snd_xfern xfern; struct snd_pcm_runtime *runtime = substream->runtime; void *bufs; snd_pcm_sframes_t result; if (runtime->status->state == SNDRV_PCM_STATE_OPEN) return -EBADFD; if (runtime->channels > 128) return -EINVAL; if (put_user(0, &_xfern->result)) return -EFAULT; if (copy_from_user(&xfern, _xfern, sizeof(xfern))) return -EFAULT; bufs = memdup_user(xfern.bufs, sizeof(void *) * runtime->channels); if (IS_ERR(bufs)) return PTR_ERR(bufs); if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) result = snd_pcm_lib_writev(substream, bufs, xfern.frames); else result = snd_pcm_lib_readv(substream, bufs, xfern.frames); kfree(bufs); __put_user(result, &_xfern->result); return result < 0 ? result : 0; } static int snd_pcm_rewind_ioctl(struct snd_pcm_substream *substream, snd_pcm_uframes_t __user *_frames) { snd_pcm_uframes_t frames; snd_pcm_sframes_t result; if (get_user(frames, _frames)) return -EFAULT; if (put_user(0, _frames)) return -EFAULT; result = snd_pcm_rewind(substream, frames); __put_user(result, _frames); return result < 0 ? result : 0; } static int snd_pcm_forward_ioctl(struct snd_pcm_substream *substream, snd_pcm_uframes_t __user *_frames) { snd_pcm_uframes_t frames; snd_pcm_sframes_t result; if (get_user(frames, _frames)) return -EFAULT; if (put_user(0, _frames)) return -EFAULT; result = snd_pcm_forward(substream, frames); __put_user(result, _frames); return result < 0 ? result : 0; } static int snd_pcm_common_ioctl(struct file *file, struct snd_pcm_substream *substream, unsigned int cmd, void __user *arg) { struct snd_pcm_file *pcm_file = file->private_data; int res; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; res = snd_power_wait(substream->pcm->card, SNDRV_CTL_POWER_D0); if (res < 0) return res; switch (cmd) { case SNDRV_PCM_IOCTL_PVERSION: return put_user(SNDRV_PCM_VERSION, (int __user *)arg) ? -EFAULT : 0; case SNDRV_PCM_IOCTL_INFO: return snd_pcm_info_user(substream, arg); case SNDRV_PCM_IOCTL_TSTAMP: /* just for compatibility */ return 0; case SNDRV_PCM_IOCTL_TTSTAMP: return snd_pcm_tstamp(substream, arg); case SNDRV_PCM_IOCTL_USER_PVERSION: if (get_user(pcm_file->user_pversion, (unsigned int __user *)arg)) return -EFAULT; return 0; case SNDRV_PCM_IOCTL_HW_REFINE: return snd_pcm_hw_refine_user(substream, arg); case SNDRV_PCM_IOCTL_HW_PARAMS: return snd_pcm_hw_params_user(substream, arg); case SNDRV_PCM_IOCTL_HW_FREE: return snd_pcm_hw_free(substream); case SNDRV_PCM_IOCTL_SW_PARAMS: return snd_pcm_sw_params_user(substream, arg); case SNDRV_PCM_IOCTL_STATUS: return snd_pcm_status_user(substream, arg, false); case SNDRV_PCM_IOCTL_STATUS_EXT: return snd_pcm_status_user(substream, arg, true); case SNDRV_PCM_IOCTL_CHANNEL_INFO: return snd_pcm_channel_info_user(substream, arg); case SNDRV_PCM_IOCTL_PREPARE: return snd_pcm_prepare(substream, file); case SNDRV_PCM_IOCTL_RESET: return snd_pcm_reset(substream); case SNDRV_PCM_IOCTL_START: return snd_pcm_start_lock_irq(substream); case SNDRV_PCM_IOCTL_LINK: return snd_pcm_link(substream, (int)(unsigned long) arg); case SNDRV_PCM_IOCTL_UNLINK: return snd_pcm_unlink(substream); case SNDRV_PCM_IOCTL_RESUME: return snd_pcm_resume(substream); case SNDRV_PCM_IOCTL_XRUN: return snd_pcm_xrun(substream); case SNDRV_PCM_IOCTL_HWSYNC: return snd_pcm_hwsync(substream); case SNDRV_PCM_IOCTL_DELAY: { snd_pcm_sframes_t delay; snd_pcm_sframes_t __user *res = arg; int err; err = snd_pcm_delay(substream, &delay); if (err) return err; if (put_user(delay, res)) return -EFAULT; return 0; } case SNDRV_PCM_IOCTL_SYNC_PTR: return snd_pcm_sync_ptr(substream, arg); #ifdef CONFIG_SND_SUPPORT_OLD_API case SNDRV_PCM_IOCTL_HW_REFINE_OLD: return snd_pcm_hw_refine_old_user(substream, arg); case SNDRV_PCM_IOCTL_HW_PARAMS_OLD: return snd_pcm_hw_params_old_user(substream, arg); #endif case SNDRV_PCM_IOCTL_DRAIN: return snd_pcm_drain(substream, file); case SNDRV_PCM_IOCTL_DROP: return snd_pcm_drop(substream); case SNDRV_PCM_IOCTL_PAUSE: return snd_pcm_action_lock_irq(&snd_pcm_action_pause, substream, (int)(unsigned long)arg); case SNDRV_PCM_IOCTL_WRITEI_FRAMES: case SNDRV_PCM_IOCTL_READI_FRAMES: return snd_pcm_xferi_frames_ioctl(substream, arg); case SNDRV_PCM_IOCTL_WRITEN_FRAMES: case SNDRV_PCM_IOCTL_READN_FRAMES: return snd_pcm_xfern_frames_ioctl(substream, arg); case SNDRV_PCM_IOCTL_REWIND: return snd_pcm_rewind_ioctl(substream, arg); case SNDRV_PCM_IOCTL_FORWARD: return snd_pcm_forward_ioctl(substream, arg); } pcm_dbg(substream->pcm, "unknown ioctl = 0x%x\n", cmd); return -ENOTTY; } static long snd_pcm_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct snd_pcm_file *pcm_file; pcm_file = file->private_data; if (((cmd >> 8) & 0xff) != 'A') return -ENOTTY; return snd_pcm_common_ioctl(file, pcm_file->substream, cmd, (void __user *)arg); } /** * snd_pcm_kernel_ioctl - Execute PCM ioctl in the kernel-space * @substream: PCM substream * @cmd: IOCTL cmd * @arg: IOCTL argument * * The function is provided primarily for OSS layer and USB gadget drivers, * and it allows only the limited set of ioctls (hw_params, sw_params, * prepare, start, drain, drop, forward). */ int snd_pcm_kernel_ioctl(struct snd_pcm_substream *substream, unsigned int cmd, void *arg) { snd_pcm_uframes_t *frames = arg; snd_pcm_sframes_t result; switch (cmd) { case SNDRV_PCM_IOCTL_FORWARD: { /* provided only for OSS; capture-only and no value returned */ if (substream->stream != SNDRV_PCM_STREAM_CAPTURE) return -EINVAL; result = snd_pcm_forward(substream, *frames); return result < 0 ? result : 0; } case SNDRV_PCM_IOCTL_HW_PARAMS: return snd_pcm_hw_params(substream, arg); case SNDRV_PCM_IOCTL_SW_PARAMS: return snd_pcm_sw_params(substream, arg); case SNDRV_PCM_IOCTL_PREPARE: return snd_pcm_prepare(substream, NULL); case SNDRV_PCM_IOCTL_START: return snd_pcm_start_lock_irq(substream); case SNDRV_PCM_IOCTL_DRAIN: return snd_pcm_drain(substream, NULL); case SNDRV_PCM_IOCTL_DROP: return snd_pcm_drop(substream); case SNDRV_PCM_IOCTL_DELAY: return snd_pcm_delay(substream, frames); default: return -EINVAL; } } EXPORT_SYMBOL(snd_pcm_kernel_ioctl); static ssize_t snd_pcm_read(struct file *file, char __user *buf, size_t count, loff_t * offset) { struct snd_pcm_file *pcm_file; struct snd_pcm_substream *substream; struct snd_pcm_runtime *runtime; snd_pcm_sframes_t result; pcm_file = file->private_data; substream = pcm_file->substream; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; runtime = substream->runtime; if (runtime->status->state == SNDRV_PCM_STATE_OPEN) return -EBADFD; if (!frame_aligned(runtime, count)) return -EINVAL; count = bytes_to_frames(runtime, count); result = snd_pcm_lib_read(substream, buf, count); if (result > 0) result = frames_to_bytes(runtime, result); return result; } static ssize_t snd_pcm_write(struct file *file, const char __user *buf, size_t count, loff_t * offset) { struct snd_pcm_file *pcm_file; struct snd_pcm_substream *substream; struct snd_pcm_runtime *runtime; snd_pcm_sframes_t result; pcm_file = file->private_data; substream = pcm_file->substream; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; runtime = substream->runtime; if (runtime->status->state == SNDRV_PCM_STATE_OPEN) return -EBADFD; if (!frame_aligned(runtime, count)) return -EINVAL; count = bytes_to_frames(runtime, count); result = snd_pcm_lib_write(substream, buf, count); if (result > 0) result = frames_to_bytes(runtime, result); return result; } static ssize_t snd_pcm_readv(struct kiocb *iocb, struct iov_iter *to) { struct snd_pcm_file *pcm_file; struct snd_pcm_substream *substream; struct snd_pcm_runtime *runtime; snd_pcm_sframes_t result; unsigned long i; void __user **bufs; snd_pcm_uframes_t frames; pcm_file = iocb->ki_filp->private_data; substream = pcm_file->substream; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; runtime = substream->runtime; if (runtime->status->state == SNDRV_PCM_STATE_OPEN) return -EBADFD; if (!iter_is_iovec(to)) return -EINVAL; if (to->nr_segs > 1024 || to->nr_segs != runtime->channels) return -EINVAL; if (!frame_aligned(runtime, to->iov->iov_len)) return -EINVAL; frames = bytes_to_samples(runtime, to->iov->iov_len); bufs = kmalloc_array(to->nr_segs, sizeof(void *), GFP_KERNEL); if (bufs == NULL) return -ENOMEM; for (i = 0; i < to->nr_segs; ++i) bufs[i] = to->iov[i].iov_base; result = snd_pcm_lib_readv(substream, bufs, frames); if (result > 0) result = frames_to_bytes(runtime, result); kfree(bufs); return result; } static ssize_t snd_pcm_writev(struct kiocb *iocb, struct iov_iter *from) { struct snd_pcm_file *pcm_file; struct snd_pcm_substream *substream; struct snd_pcm_runtime *runtime; snd_pcm_sframes_t result; unsigned long i; void __user **bufs; snd_pcm_uframes_t frames; pcm_file = iocb->ki_filp->private_data; substream = pcm_file->substream; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; runtime = substream->runtime; if (runtime->status->state == SNDRV_PCM_STATE_OPEN) return -EBADFD; if (!iter_is_iovec(from)) return -EINVAL; if (from->nr_segs > 128 || from->nr_segs != runtime->channels || !frame_aligned(runtime, from->iov->iov_len)) return -EINVAL; frames = bytes_to_samples(runtime, from->iov->iov_len); bufs = kmalloc_array(from->nr_segs, sizeof(void *), GFP_KERNEL); if (bufs == NULL) return -ENOMEM; for (i = 0; i < from->nr_segs; ++i) bufs[i] = from->iov[i].iov_base; result = snd_pcm_lib_writev(substream, bufs, frames); if (result > 0) result = frames_to_bytes(runtime, result); kfree(bufs); return result; } static __poll_t snd_pcm_poll(struct file *file, poll_table *wait) { struct snd_pcm_file *pcm_file; struct snd_pcm_substream *substream; struct snd_pcm_runtime *runtime; __poll_t mask, ok; snd_pcm_uframes_t avail; pcm_file = file->private_data; substream = pcm_file->substream; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) ok = EPOLLOUT | EPOLLWRNORM; else ok = EPOLLIN | EPOLLRDNORM; if (PCM_RUNTIME_CHECK(substream)) return ok | EPOLLERR; runtime = substream->runtime; poll_wait(file, &runtime->sleep, wait); mask = 0; snd_pcm_stream_lock_irq(substream); avail = snd_pcm_avail(substream); switch (runtime->status->state) { case SNDRV_PCM_STATE_RUNNING: case SNDRV_PCM_STATE_PREPARED: case SNDRV_PCM_STATE_PAUSED: if (avail >= runtime->control->avail_min) mask = ok; break; case SNDRV_PCM_STATE_DRAINING: if (substream->stream == SNDRV_PCM_STREAM_CAPTURE) { mask = ok; if (!avail) mask |= EPOLLERR; } break; default: mask = ok | EPOLLERR; break; } snd_pcm_stream_unlock_irq(substream); return mask; } /* * mmap support */ /* * Only on coherent architectures, we can mmap the status and the control records * for effcient data transfer. On others, we have to use HWSYNC ioctl... */ #if defined(CONFIG_X86) || defined(CONFIG_PPC) || defined(CONFIG_ALPHA) /* * mmap status record */ static vm_fault_t snd_pcm_mmap_status_fault(struct vm_fault *vmf) { struct snd_pcm_substream *substream = vmf->vma->vm_private_data; struct snd_pcm_runtime *runtime; if (substream == NULL) return VM_FAULT_SIGBUS; runtime = substream->runtime; vmf->page = virt_to_page(runtime->status); get_page(vmf->page); return 0; } static const struct vm_operations_struct snd_pcm_vm_ops_status = { .fault = snd_pcm_mmap_status_fault, }; static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file *file, struct vm_area_struct *area) { long size; if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start; if (size != PAGE_ALIGN(sizeof(struct snd_pcm_mmap_status))) return -EINVAL; area->vm_ops = &snd_pcm_vm_ops_status; area->vm_private_data = substream; area->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; return 0; } /* * mmap control record */ static vm_fault_t snd_pcm_mmap_control_fault(struct vm_fault *vmf) { struct snd_pcm_substream *substream = vmf->vma->vm_private_data; struct snd_pcm_runtime *runtime; if (substream == NULL) return VM_FAULT_SIGBUS; runtime = substream->runtime; vmf->page = virt_to_page(runtime->control); get_page(vmf->page); return 0; } static const struct vm_operations_struct snd_pcm_vm_ops_control = { .fault = snd_pcm_mmap_control_fault, }; static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file *file, struct vm_area_struct *area) { long size; if (!(area->vm_flags & VM_READ)) return -EINVAL; size = area->vm_end - area->vm_start; if (size != PAGE_ALIGN(sizeof(struct snd_pcm_mmap_control))) return -EINVAL; area->vm_ops = &snd_pcm_vm_ops_control; area->vm_private_data = substream; area->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; return 0; } static bool pcm_status_mmap_allowed(struct snd_pcm_file *pcm_file) { if (pcm_file->no_compat_mmap) return false; /* See pcm_control_mmap_allowed() below. * Since older alsa-lib requires both status and control mmaps to be * coupled, we have to disable the status mmap for old alsa-lib, too. */ if (pcm_file->user_pversion < SNDRV_PROTOCOL_VERSION(2, 0, 14) && (pcm_file->substream->runtime->hw.info & SNDRV_PCM_INFO_SYNC_APPLPTR)) return false; return true; } static bool pcm_control_mmap_allowed(struct snd_pcm_file *pcm_file) { if (pcm_file->no_compat_mmap) return false; /* Disallow the control mmap when SYNC_APPLPTR flag is set; * it enforces the user-space to fall back to snd_pcm_sync_ptr(), * thus it effectively assures the manual update of appl_ptr. */ if (pcm_file->substream->runtime->hw.info & SNDRV_PCM_INFO_SYNC_APPLPTR) return false; return true; } #else /* ! coherent mmap */ /* * don't support mmap for status and control records. */ #define pcm_status_mmap_allowed(pcm_file) false #define pcm_control_mmap_allowed(pcm_file) false static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file *file, struct vm_area_struct *area) { return -ENXIO; } static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file *file, struct vm_area_struct *area) { return -ENXIO; } #endif /* coherent mmap */ static inline struct page * snd_pcm_default_page_ops(struct snd_pcm_substream *substream, unsigned long ofs) { void *vaddr = substream->runtime->dma_area + ofs; return virt_to_page(vaddr); } /* * fault callback for mmapping a RAM page */ static vm_fault_t snd_pcm_mmap_data_fault(struct vm_fault *vmf) { struct snd_pcm_substream *substream = vmf->vma->vm_private_data; struct snd_pcm_runtime *runtime; unsigned long offset; struct page * page; size_t dma_bytes; if (substream == NULL) return VM_FAULT_SIGBUS; runtime = substream->runtime; offset = vmf->pgoff << PAGE_SHIFT; dma_bytes = PAGE_ALIGN(runtime->dma_bytes); if (offset > dma_bytes - PAGE_SIZE) return VM_FAULT_SIGBUS; if (substream->ops->page) page = substream->ops->page(substream, offset); else page = snd_pcm_default_page_ops(substream, offset); if (!page) return VM_FAULT_SIGBUS; get_page(page); vmf->page = page; return 0; } static const struct vm_operations_struct snd_pcm_vm_ops_data = { .open = snd_pcm_mmap_data_open, .close = snd_pcm_mmap_data_close, }; static const struct vm_operations_struct snd_pcm_vm_ops_data_fault = { .open = snd_pcm_mmap_data_open, .close = snd_pcm_mmap_data_close, .fault = snd_pcm_mmap_data_fault, }; /* * mmap the DMA buffer on RAM */ /** * snd_pcm_lib_default_mmap - Default PCM data mmap function * @substream: PCM substream * @area: VMA * * This is the default mmap handler for PCM data. When mmap pcm_ops is NULL, * this function is invoked implicitly. */ int snd_pcm_lib_default_mmap(struct snd_pcm_substream *substream, struct vm_area_struct *area) { area->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; #ifdef CONFIG_GENERIC_ALLOCATOR if (substream->dma_buffer.dev.type == SNDRV_DMA_TYPE_DEV_IRAM) { area->vm_page_prot = pgprot_writecombine(area->vm_page_prot); return remap_pfn_range(area, area->vm_start, substream->dma_buffer.addr >> PAGE_SHIFT, area->vm_end - area->vm_start, area->vm_page_prot); } #endif /* CONFIG_GENERIC_ALLOCATOR */ #ifndef CONFIG_X86 /* for avoiding warnings arch/x86/mm/pat.c */ if (IS_ENABLED(CONFIG_HAS_DMA) && !substream->ops->page && substream->dma_buffer.dev.type == SNDRV_DMA_TYPE_DEV) return dma_mmap_coherent(substream->dma_buffer.dev.dev, area, substream->runtime->dma_area, substream->runtime->dma_addr, substream->runtime->dma_bytes); #endif /* CONFIG_X86 */ /* mmap with fault handler */ area->vm_ops = &snd_pcm_vm_ops_data_fault; return 0; } EXPORT_SYMBOL_GPL(snd_pcm_lib_default_mmap); /* * mmap the DMA buffer on I/O memory area */ #if SNDRV_PCM_INFO_MMAP_IOMEM /** * snd_pcm_lib_mmap_iomem - Default PCM data mmap function for I/O mem * @substream: PCM substream * @area: VMA * * When your hardware uses the iomapped pages as the hardware buffer and * wants to mmap it, pass this function as mmap pcm_ops. Note that this * is supposed to work only on limited architectures. */ int snd_pcm_lib_mmap_iomem(struct snd_pcm_substream *substream, struct vm_area_struct *area) { struct snd_pcm_runtime *runtime = substream->runtime; area->vm_page_prot = pgprot_noncached(area->vm_page_prot); return vm_iomap_memory(area, runtime->dma_addr, runtime->dma_bytes); } EXPORT_SYMBOL(snd_pcm_lib_mmap_iomem); #endif /* SNDRV_PCM_INFO_MMAP */ /* * mmap DMA buffer */ int snd_pcm_mmap_data(struct snd_pcm_substream *substream, struct file *file, struct vm_area_struct *area) { struct snd_pcm_runtime *runtime; long size; unsigned long offset; size_t dma_bytes; int err; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) { if (!(area->vm_flags & (VM_WRITE|VM_READ))) return -EINVAL; } else { if (!(area->vm_flags & VM_READ)) return -EINVAL; } runtime = substream->runtime; if (runtime->status->state == SNDRV_PCM_STATE_OPEN) return -EBADFD; if (!(runtime->info & SNDRV_PCM_INFO_MMAP)) return -ENXIO; if (runtime->access == SNDRV_PCM_ACCESS_RW_INTERLEAVED || runtime->access == SNDRV_PCM_ACCESS_RW_NONINTERLEAVED) return -EINVAL; size = area->vm_end - area->vm_start; offset = area->vm_pgoff << PAGE_SHIFT; dma_bytes = PAGE_ALIGN(runtime->dma_bytes); if ((size_t)size > dma_bytes) return -EINVAL; if (offset > dma_bytes - size) return -EINVAL; area->vm_ops = &snd_pcm_vm_ops_data; area->vm_private_data = substream; if (substream->ops->mmap) err = substream->ops->mmap(substream, area); else err = snd_pcm_lib_default_mmap(substream, area); if (!err) atomic_inc(&substream->mmap_count); return err; } EXPORT_SYMBOL(snd_pcm_mmap_data); static int snd_pcm_mmap(struct file *file, struct vm_area_struct *area) { struct snd_pcm_file * pcm_file; struct snd_pcm_substream *substream; unsigned long offset; pcm_file = file->private_data; substream = pcm_file->substream; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; offset = area->vm_pgoff << PAGE_SHIFT; switch (offset) { case SNDRV_PCM_MMAP_OFFSET_STATUS: if (!pcm_status_mmap_allowed(pcm_file)) return -ENXIO; return snd_pcm_mmap_status(substream, file, area); case SNDRV_PCM_MMAP_OFFSET_CONTROL: if (!pcm_control_mmap_allowed(pcm_file)) return -ENXIO; return snd_pcm_mmap_control(substream, file, area); default: return snd_pcm_mmap_data(substream, file, area); } return 0; } static int snd_pcm_fasync(int fd, struct file * file, int on) { struct snd_pcm_file * pcm_file; struct snd_pcm_substream *substream; struct snd_pcm_runtime *runtime; pcm_file = file->private_data; substream = pcm_file->substream; if (PCM_RUNTIME_CHECK(substream)) return -ENXIO; runtime = substream->runtime; return fasync_helper(fd, file, on, &runtime->fasync); } /* * ioctl32 compat */ #ifdef CONFIG_COMPAT #include "pcm_compat.c" #else #define snd_pcm_ioctl_compat NULL #endif /* * To be removed helpers to keep binary compatibility */ #ifdef CONFIG_SND_SUPPORT_OLD_API #define __OLD_TO_NEW_MASK(x) ((x&7)|((x&0x07fffff8)<<5)) #define __NEW_TO_OLD_MASK(x) ((x&7)|((x&0xffffff00)>>5)) static void snd_pcm_hw_convert_from_old_params(struct snd_pcm_hw_params *params, struct snd_pcm_hw_params_old *oparams) { unsigned int i; memset(params, 0, sizeof(*params)); params->flags = oparams->flags; for (i = 0; i < ARRAY_SIZE(oparams->masks); i++) params->masks[i].bits[0] = oparams->masks[i]; memcpy(params->intervals, oparams->intervals, sizeof(oparams->intervals)); params->rmask = __OLD_TO_NEW_MASK(oparams->rmask); params->cmask = __OLD_TO_NEW_MASK(oparams->cmask); params->info = oparams->info; params->msbits = oparams->msbits; params->rate_num = oparams->rate_num; params->rate_den = oparams->rate_den; params->fifo_size = oparams->fifo_size; } static void snd_pcm_hw_convert_to_old_params(struct snd_pcm_hw_params_old *oparams, struct snd_pcm_hw_params *params) { unsigned int i; memset(oparams, 0, sizeof(*oparams)); oparams->flags = params->flags; for (i = 0; i < ARRAY_SIZE(oparams->masks); i++) oparams->masks[i] = params->masks[i].bits[0]; memcpy(oparams->intervals, params->intervals, sizeof(oparams->intervals)); oparams->rmask = __NEW_TO_OLD_MASK(params->rmask); oparams->cmask = __NEW_TO_OLD_MASK(params->cmask); oparams->info = params->info; oparams->msbits = params->msbits; oparams->rate_num = params->rate_num; oparams->rate_den = params->rate_den; oparams->fifo_size = params->fifo_size; } static int snd_pcm_hw_refine_old_user(struct snd_pcm_substream *substream, struct snd_pcm_hw_params_old __user * _oparams) { struct snd_pcm_hw_params *params; struct snd_pcm_hw_params_old *oparams = NULL; int err; params = kmalloc(sizeof(*params), GFP_KERNEL); if (!params) return -ENOMEM; oparams = memdup_user(_oparams, sizeof(*oparams)); if (IS_ERR(oparams)) { err = PTR_ERR(oparams); goto out; } snd_pcm_hw_convert_from_old_params(params, oparams); err = snd_pcm_hw_refine(substream, params); if (err < 0) goto out_old; err = fixup_unreferenced_params(substream, params); if (err < 0) goto out_old; snd_pcm_hw_convert_to_old_params(oparams, params); if (copy_to_user(_oparams, oparams, sizeof(*oparams))) err = -EFAULT; out_old: kfree(oparams); out: kfree(params); return err; } static int snd_pcm_hw_params_old_user(struct snd_pcm_substream *substream, struct snd_pcm_hw_params_old __user * _oparams) { struct snd_pcm_hw_params *params; struct snd_pcm_hw_params_old *oparams = NULL; int err; params = kmalloc(sizeof(*params), GFP_KERNEL); if (!params) return -ENOMEM; oparams = memdup_user(_oparams, sizeof(*oparams)); if (IS_ERR(oparams)) { err = PTR_ERR(oparams); goto out; } snd_pcm_hw_convert_from_old_params(params, oparams); err = snd_pcm_hw_params(substream, params); if (err < 0) goto out_old; snd_pcm_hw_convert_to_old_params(oparams, params); if (copy_to_user(_oparams, oparams, sizeof(*oparams))) err = -EFAULT; out_old: kfree(oparams); out: kfree(params); return err; } #endif /* CONFIG_SND_SUPPORT_OLD_API */ #ifndef CONFIG_MMU static unsigned long snd_pcm_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { struct snd_pcm_file *pcm_file = file->private_data; struct snd_pcm_substream *substream = pcm_file->substream; struct snd_pcm_runtime *runtime = substream->runtime; unsigned long offset = pgoff << PAGE_SHIFT; switch (offset) { case SNDRV_PCM_MMAP_OFFSET_STATUS: return (unsigned long)runtime->status; case SNDRV_PCM_MMAP_OFFSET_CONTROL: return (unsigned long)runtime->control; default: return (unsigned long)runtime->dma_area + offset; } } #else # define snd_pcm_get_unmapped_area NULL #endif /* * Register section */ const struct file_operations snd_pcm_f_ops[2] = { { .owner = THIS_MODULE, .write = snd_pcm_write, .write_iter = snd_pcm_writev, .open = snd_pcm_playback_open, .release = snd_pcm_release, .llseek = no_llseek, .poll = snd_pcm_poll, .unlocked_ioctl = snd_pcm_ioctl, .compat_ioctl = snd_pcm_ioctl_compat, .mmap = snd_pcm_mmap, .fasync = snd_pcm_fasync, .get_unmapped_area = snd_pcm_get_unmapped_area, }, { .owner = THIS_MODULE, .read = snd_pcm_read, .read_iter = snd_pcm_readv, .open = snd_pcm_capture_open, .release = snd_pcm_release, .llseek = no_llseek, .poll = snd_pcm_poll, .unlocked_ioctl = snd_pcm_ioctl, .compat_ioctl = snd_pcm_ioctl_compat, .mmap = snd_pcm_mmap, .fasync = snd_pcm_fasync, .get_unmapped_area = snd_pcm_get_unmapped_area, } };
23 23 23 23 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 /* * Copyright (c) 2004 The Regents of the University of Michigan. * Copyright (c) 2012 Jeff Layton <jlayton@redhat.com> * All rights reserved. * * Andy Adamson <andros@citi.umich.edu> * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its * contributors may be used to endorse or promote products derived * from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE * DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * */ #include <crypto/hash.h> #include <linux/file.h> #include <linux/slab.h> #include <linux/namei.h> #include <linux/sched.h> #include <linux/fs.h> #include <linux/module.h> #include <net/net_namespace.h> #include <linux/sunrpc/rpc_pipe_fs.h> #include <linux/sunrpc/clnt.h> #include <linux/nfsd/cld.h> #include "nfsd.h" #include "state.h" #include "vfs.h" #include "netns.h" #define NFSDDBG_FACILITY NFSDDBG_PROC /* Declarations */ struct nfsd4_client_tracking_ops { int (*init)(struct net *); void (*exit)(struct net *); void (*create)(struct nfs4_client *); void (*remove)(struct nfs4_client *); int (*check)(struct nfs4_client *); void (*grace_done)(struct nfsd_net *); }; /* Globals */ static char user_recovery_dirname[PATH_MAX] = "/var/lib/nfs/v4recovery"; static int nfs4_save_creds(const struct cred **original_creds) { struct cred *new; new = prepare_creds(); if (!new) return -ENOMEM; new->fsuid = GLOBAL_ROOT_UID; new->fsgid = GLOBAL_ROOT_GID; *original_creds = override_creds(new); put_cred(new); return 0; } static void nfs4_reset_creds(const struct cred *original) { revert_creds(original); } static void md5_to_hex(char *out, char *md5) { int i; for (i=0; i<16; i++) { unsigned char c = md5[i]; *out++ = '0' + ((c&0xf0)>>4) + (c>=0xa0)*('a'-'9'-1); *out++ = '0' + (c&0x0f) + ((c&0x0f)>=0x0a)*('a'-'9'-1); } *out = '\0'; } static int nfs4_make_rec_clidname(char *dname, const struct xdr_netobj *clname) { struct xdr_netobj cksum; struct crypto_shash *tfm; int status; dprintk("NFSD: nfs4_make_rec_clidname for %.*s\n", clname->len, clname->data); tfm = crypto_alloc_shash("md5", 0, 0); if (IS_ERR(tfm)) { status = PTR_ERR(tfm); goto out_no_tfm; } cksum.len = crypto_shash_digestsize(tfm); cksum.data = kmalloc(cksum.len, GFP_KERNEL); if (cksum.data == NULL) { status = -ENOMEM; goto out; } { SHASH_DESC_ON_STACK(desc, tfm); desc->tfm = tfm; desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; status = crypto_shash_digest(desc, clname->data, clname->len, cksum.data); shash_desc_zero(desc); } if (status) goto out; md5_to_hex(dname, cksum.data); status = 0; out: kfree(cksum.data); crypto_free_shash(tfm); out_no_tfm: return status; } /* * If we had an error generating the recdir name for the legacy tracker * then warn the admin. If the error doesn't appear to be transient, * then disable recovery tracking. */ static void legacy_recdir_name_error(struct nfs4_client *clp, int error) { printk(KERN_ERR "NFSD: unable to generate recoverydir " "name (%d).\n", error); /* * if the algorithm just doesn't exist, then disable the recovery * tracker altogether. The crypto libs will generally return this if * FIPS is enabled as well. */ if (error == -ENOENT) { printk(KERN_ERR "NFSD: disabling legacy clientid tracking. " "Reboot recovery will not function correctly!\n"); nfsd4_client_tracking_exit(clp->net); } } static void nfsd4_create_clid_dir(struct nfs4_client *clp) { const struct cred *original_cred; char dname[HEXDIR_LEN]; struct dentry *dir, *dentry; struct nfs4_client_reclaim *crp; int status; struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); if (test_and_set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) return; if (!nn->rec_file) return; status = nfs4_make_rec_clidname(dname, &clp->cl_name); if (status) return legacy_recdir_name_error(clp, status); status = nfs4_save_creds(&original_cred); if (status < 0) return; status = mnt_want_write_file(nn->rec_file); if (status) goto out_creds; dir = nn->rec_file->f_path.dentry; /* lock the parent */ inode_lock(d_inode(dir)); dentry = lookup_one_len(dname, dir, HEXDIR_LEN-1); if (IS_ERR(dentry)) { status = PTR_ERR(dentry); goto out_unlock; } if (d_really_is_positive(dentry)) /* * In the 4.1 case, where we're called from * reclaim_complete(), records from the previous reboot * may still be left, so this is OK. * * In the 4.0 case, we should never get here; but we may * as well be forgiving and just succeed silently. */ goto out_put; status = vfs_mkdir(d_inode(dir), dentry, S_IRWXU); out_put: dput(dentry); out_unlock: inode_unlock(d_inode(dir)); if (status == 0) { if (nn->in_grace) { crp = nfs4_client_to_reclaim(dname, nn); if (crp) crp->cr_clp = clp; } vfs_fsync(nn->rec_file, 0); } else { printk(KERN_ERR "NFSD: failed to write recovery record" " (err %d); please check that %s exists" " and is writeable", status, user_recovery_dirname); } mnt_drop_write_file(nn->rec_file); out_creds: nfs4_reset_creds(original_cred); } typedef int (recdir_func)(struct dentry *, struct dentry *, struct nfsd_net *); struct name_list { char name[HEXDIR_LEN]; struct list_head list; }; struct nfs4_dir_ctx { struct dir_context ctx; struct list_head names; }; static int nfsd4_build_namelist(struct dir_context *__ctx, const char *name, int namlen, loff_t offset, u64 ino, unsigned int d_type) { struct nfs4_dir_ctx *ctx = container_of(__ctx, struct nfs4_dir_ctx, ctx); struct name_list *entry; if (namlen != HEXDIR_LEN - 1) return 0; entry = kmalloc(sizeof(struct name_list), GFP_KERNEL); if (entry == NULL) return -ENOMEM; memcpy(entry->name, name, HEXDIR_LEN - 1); entry->name[HEXDIR_LEN - 1] = '\0'; list_add(&entry->list, &ctx->names); return 0; } static int nfsd4_list_rec_dir(recdir_func *f, struct nfsd_net *nn) { const struct cred *original_cred; struct dentry *dir = nn->rec_file->f_path.dentry; struct nfs4_dir_ctx ctx = { .ctx.actor = nfsd4_build_namelist, .names = LIST_HEAD_INIT(ctx.names) }; struct name_list *entry, *tmp; int status; status = nfs4_save_creds(&original_cred); if (status < 0) return status; status = vfs_llseek(nn->rec_file, 0, SEEK_SET); if (status < 0) { nfs4_reset_creds(original_cred); return status; } status = iterate_dir(nn->rec_file, &ctx.ctx); inode_lock_nested(d_inode(dir), I_MUTEX_PARENT); list_for_each_entry_safe(entry, tmp, &ctx.names, list) { if (!status) { struct dentry *dentry; dentry = lookup_one_len(entry->name, dir, HEXDIR_LEN-1); if (IS_ERR(dentry)) { status = PTR_ERR(dentry); break; } status = f(dir, dentry, nn); dput(dentry); } list_del(&entry->list); kfree(entry); } inode_unlock(d_inode(dir)); nfs4_reset_creds(original_cred); list_for_each_entry_safe(entry, tmp, &ctx.names, list) { dprintk("NFSD: %s. Left entry %s\n", __func__, entry->name); list_del(&entry->list); kfree(entry); } return status; } static int nfsd4_unlink_clid_dir(char *name, int namlen, struct nfsd_net *nn) { struct dentry *dir, *dentry; int status; dprintk("NFSD: nfsd4_unlink_clid_dir. name %.*s\n", namlen, name); dir = nn->rec_file->f_path.dentry; inode_lock_nested(d_inode(dir), I_MUTEX_PARENT); dentry = lookup_one_len(name, dir, namlen); if (IS_ERR(dentry)) { status = PTR_ERR(dentry); goto out_unlock; } status = -ENOENT; if (d_really_is_negative(dentry)) goto out; status = vfs_rmdir(d_inode(dir), dentry); out: dput(dentry); out_unlock: inode_unlock(d_inode(dir)); return status; } static void nfsd4_remove_clid_dir(struct nfs4_client *clp) { const struct cred *original_cred; struct nfs4_client_reclaim *crp; char dname[HEXDIR_LEN]; int status; struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); if (!nn->rec_file || !test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) return; status = nfs4_make_rec_clidname(dname, &clp->cl_name); if (status) return legacy_recdir_name_error(clp, status); status = mnt_want_write_file(nn->rec_file); if (status) goto out; clear_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); status = nfs4_save_creds(&original_cred); if (status < 0) goto out_drop_write; status = nfsd4_unlink_clid_dir(dname, HEXDIR_LEN-1, nn); nfs4_reset_creds(original_cred); if (status == 0) { vfs_fsync(nn->rec_file, 0); if (nn->in_grace) { /* remove reclaim record */ crp = nfsd4_find_reclaim_client(dname, nn); if (crp) nfs4_remove_reclaim_record(crp, nn); } } out_drop_write: mnt_drop_write_file(nn->rec_file); out: if (status) printk("NFSD: Failed to remove expired client state directory" " %.*s\n", HEXDIR_LEN, dname); } static int purge_old(struct dentry *parent, struct dentry *child, struct nfsd_net *nn) { int status; if (nfs4_has_reclaimed_state(child->d_name.name, nn)) return 0; status = vfs_rmdir(d_inode(parent), child); if (status) printk("failed to remove client recovery directory %pd\n", child); /* Keep trying, success or failure: */ return 0; } static void nfsd4_recdir_purge_old(struct nfsd_net *nn) { int status; nn->in_grace = false; if (!nn->rec_file) return; status = mnt_want_write_file(nn->rec_file); if (status) goto out; status = nfsd4_list_rec_dir(purge_old, nn); if (status == 0) vfs_fsync(nn->rec_file, 0); mnt_drop_write_file(nn->rec_file); out: nfs4_release_reclaim(nn); if (status) printk("nfsd4: failed to purge old clients from recovery" " directory %pD\n", nn->rec_file); } static int load_recdir(struct dentry *parent, struct dentry *child, struct nfsd_net *nn) { if (child->d_name.len != HEXDIR_LEN - 1) { printk("nfsd4: illegal name %pd in recovery directory\n", child); /* Keep trying; maybe the others are OK: */ return 0; } nfs4_client_to_reclaim(child->d_name.name, nn); return 0; } static int nfsd4_recdir_load(struct net *net) { int status; struct nfsd_net *nn = net_generic(net, nfsd_net_id); if (!nn->rec_file) return 0; status = nfsd4_list_rec_dir(load_recdir, nn); if (status) printk("nfsd4: failed loading clients from recovery" " directory %pD\n", nn->rec_file); return status; } /* * Hold reference to the recovery directory. */ static int nfsd4_init_recdir(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); const struct cred *original_cred; int status; printk("NFSD: Using %s as the NFSv4 state recovery directory\n", user_recovery_dirname); BUG_ON(nn->rec_file); status = nfs4_save_creds(&original_cred); if (status < 0) { printk("NFSD: Unable to change credentials to find recovery" " directory: error %d\n", status); return status; } nn->rec_file = filp_open(user_recovery_dirname, O_RDONLY | O_DIRECTORY, 0); if (IS_ERR(nn->rec_file)) { printk("NFSD: unable to find recovery directory %s\n", user_recovery_dirname); status = PTR_ERR(nn->rec_file); nn->rec_file = NULL; } nfs4_reset_creds(original_cred); if (!status) nn->in_grace = true; return status; } static void nfsd4_shutdown_recdir(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); if (!nn->rec_file) return; fput(nn->rec_file); nn->rec_file = NULL; } static int nfs4_legacy_state_init(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); int i; nn->reclaim_str_hashtbl = kmalloc_array(CLIENT_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL); if (!nn->reclaim_str_hashtbl) return -ENOMEM; for (i = 0; i < CLIENT_HASH_SIZE; i++) INIT_LIST_HEAD(&nn->reclaim_str_hashtbl[i]); nn->reclaim_str_hashtbl_size = 0; return 0; } static void nfs4_legacy_state_shutdown(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); kfree(nn->reclaim_str_hashtbl); } static int nfsd4_load_reboot_recovery_data(struct net *net) { int status; status = nfsd4_init_recdir(net); if (status) return status; status = nfsd4_recdir_load(net); if (status) nfsd4_shutdown_recdir(net); return status; } static int nfsd4_legacy_tracking_init(struct net *net) { int status; /* XXX: The legacy code won't work in a container */ if (net != &init_net) { pr_warn("NFSD: attempt to initialize legacy client tracking in a container ignored.\n"); return -EINVAL; } status = nfs4_legacy_state_init(net); if (status) return status; status = nfsd4_load_reboot_recovery_data(net); if (status) goto err; return 0; err: nfs4_legacy_state_shutdown(net); return status; } static void nfsd4_legacy_tracking_exit(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); nfs4_release_reclaim(nn); nfsd4_shutdown_recdir(net); nfs4_legacy_state_shutdown(net); } /* * Change the NFSv4 recovery directory to recdir. */ int nfs4_reset_recoverydir(char *recdir) { int status; struct path path; status = kern_path(recdir, LOOKUP_FOLLOW, &path); if (status) return status; status = -ENOTDIR; if (d_is_dir(path.dentry)) { strcpy(user_recovery_dirname, recdir); status = 0; } path_put(&path); return status; } char * nfs4_recoverydir(void) { return user_recovery_dirname; } static int nfsd4_check_legacy_client(struct nfs4_client *clp) { int status; char dname[HEXDIR_LEN]; struct nfs4_client_reclaim *crp; struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); /* did we already find that this client is stable? */ if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) return 0; status = nfs4_make_rec_clidname(dname, &clp->cl_name); if (status) { legacy_recdir_name_error(clp, status); return status; } /* look for it in the reclaim hashtable otherwise */ crp = nfsd4_find_reclaim_client(dname, nn); if (crp) { set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); crp->cr_clp = clp; return 0; } return -ENOENT; } static const struct nfsd4_client_tracking_ops nfsd4_legacy_tracking_ops = { .init = nfsd4_legacy_tracking_init, .exit = nfsd4_legacy_tracking_exit, .create = nfsd4_create_clid_dir, .remove = nfsd4_remove_clid_dir, .check = nfsd4_check_legacy_client, .grace_done = nfsd4_recdir_purge_old, }; /* Globals */ #define NFSD_PIPE_DIR "nfsd" #define NFSD_CLD_PIPE "cld" /* per-net-ns structure for holding cld upcall info */ struct cld_net { struct rpc_pipe *cn_pipe; spinlock_t cn_lock; struct list_head cn_list; unsigned int cn_xid; }; struct cld_upcall { struct list_head cu_list; struct cld_net *cu_net; struct completion cu_done; struct cld_msg cu_msg; }; static int __cld_pipe_upcall(struct rpc_pipe *pipe, struct cld_msg *cmsg) { int ret; struct rpc_pipe_msg msg; struct cld_upcall *cup = container_of(cmsg, struct cld_upcall, cu_msg); memset(&msg, 0, sizeof(msg)); msg.data = cmsg; msg.len = sizeof(*cmsg); ret = rpc_queue_upcall(pipe, &msg); if (ret < 0) { goto out; } wait_for_completion(&cup->cu_done); if (msg.errno < 0) ret = msg.errno; out: return ret; } static int cld_pipe_upcall(struct rpc_pipe *pipe, struct cld_msg *cmsg) { int ret; /* * -EAGAIN occurs when pipe is closed and reopened while there are * upcalls queued. */ do { ret = __cld_pipe_upcall(pipe, cmsg); } while (ret == -EAGAIN); return ret; } static ssize_t cld_pipe_downcall(struct file *filp, const char __user *src, size_t mlen) { struct cld_upcall *tmp, *cup; struct cld_msg __user *cmsg = (struct cld_msg __user *)src; uint32_t xid; struct nfsd_net *nn = net_generic(file_inode(filp)->i_sb->s_fs_info, nfsd_net_id); struct cld_net *cn = nn->cld_net; if (mlen != sizeof(*cmsg)) { dprintk("%s: got %zu bytes, expected %zu\n", __func__, mlen, sizeof(*cmsg)); return -EINVAL; } /* copy just the xid so we can try to find that */ if (copy_from_user(&xid, &cmsg->cm_xid, sizeof(xid)) != 0) { dprintk("%s: error when copying xid from userspace", __func__); return -EFAULT; } /* walk the list and find corresponding xid */ cup = NULL; spin_lock(&cn->cn_lock); list_for_each_entry(tmp, &cn->cn_list, cu_list) { if (get_unaligned(&tmp->cu_msg.cm_xid) == xid) { cup = tmp; list_del_init(&cup->cu_list); break; } } spin_unlock(&cn->cn_lock); /* couldn't find upcall? */ if (!cup) { dprintk("%s: couldn't find upcall -- xid=%u\n", __func__, xid); return -EINVAL; } if (copy_from_user(&cup->cu_msg, src, mlen) != 0) return -EFAULT; complete(&cup->cu_done); return mlen; } static void cld_pipe_destroy_msg(struct rpc_pipe_msg *msg) { struct cld_msg *cmsg = msg->data; struct cld_upcall *cup = container_of(cmsg, struct cld_upcall, cu_msg); /* errno >= 0 means we got a downcall */ if (msg->errno >= 0) return; complete(&cup->cu_done); } static const struct rpc_pipe_ops cld_upcall_ops = { .upcall = rpc_pipe_generic_upcall, .downcall = cld_pipe_downcall, .destroy_msg = cld_pipe_destroy_msg, }; static struct dentry * nfsd4_cld_register_sb(struct super_block *sb, struct rpc_pipe *pipe) { struct dentry *dir, *dentry; dir = rpc_d_lookup_sb(sb, NFSD_PIPE_DIR); if (dir == NULL) return ERR_PTR(-ENOENT); dentry = rpc_mkpipe_dentry(dir, NFSD_CLD_PIPE, NULL, pipe); dput(dir); return dentry; } static void nfsd4_cld_unregister_sb(struct rpc_pipe *pipe) { if (pipe->dentry) rpc_unlink(pipe->dentry); } static struct dentry * nfsd4_cld_register_net(struct net *net, struct rpc_pipe *pipe) { struct super_block *sb; struct dentry *dentry; sb = rpc_get_sb_net(net); if (!sb) return NULL; dentry = nfsd4_cld_register_sb(sb, pipe); rpc_put_sb_net(net); return dentry; } static void nfsd4_cld_unregister_net(struct net *net, struct rpc_pipe *pipe) { struct super_block *sb; sb = rpc_get_sb_net(net); if (sb) { nfsd4_cld_unregister_sb(pipe); rpc_put_sb_net(net); } } /* Initialize rpc_pipefs pipe for communication with client tracking daemon */ static int nfsd4_init_cld_pipe(struct net *net) { int ret; struct dentry *dentry; struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct cld_net *cn; if (nn->cld_net) return 0; cn = kzalloc(sizeof(*cn), GFP_KERNEL); if (!cn) { ret = -ENOMEM; goto err; } cn->cn_pipe = rpc_mkpipe_data(&cld_upcall_ops, RPC_PIPE_WAIT_FOR_OPEN); if (IS_ERR(cn->cn_pipe)) { ret = PTR_ERR(cn->cn_pipe); goto err; } spin_lock_init(&cn->cn_lock); INIT_LIST_HEAD(&cn->cn_list); dentry = nfsd4_cld_register_net(net, cn->cn_pipe); if (IS_ERR(dentry)) { ret = PTR_ERR(dentry); goto err_destroy_data; } cn->cn_pipe->dentry = dentry; nn->cld_net = cn; return 0; err_destroy_data: rpc_destroy_pipe_data(cn->cn_pipe); err: kfree(cn); printk(KERN_ERR "NFSD: unable to create nfsdcld upcall pipe (%d)\n", ret); return ret; } static void nfsd4_remove_cld_pipe(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct cld_net *cn = nn->cld_net; nfsd4_cld_unregister_net(net, cn->cn_pipe); rpc_destroy_pipe_data(cn->cn_pipe); kfree(nn->cld_net); nn->cld_net = NULL; } static struct cld_upcall * alloc_cld_upcall(struct cld_net *cn) { struct cld_upcall *new, *tmp; new = kzalloc(sizeof(*new), GFP_KERNEL); if (!new) return new; /* FIXME: hard cap on number in flight? */ restart_search: spin_lock(&cn->cn_lock); list_for_each_entry(tmp, &cn->cn_list, cu_list) { if (tmp->cu_msg.cm_xid == cn->cn_xid) { cn->cn_xid++; spin_unlock(&cn->cn_lock); goto restart_search; } } init_completion(&new->cu_done); new->cu_msg.cm_vers = CLD_UPCALL_VERSION; put_unaligned(cn->cn_xid++, &new->cu_msg.cm_xid); new->cu_net = cn; list_add(&new->cu_list, &cn->cn_list); spin_unlock(&cn->cn_lock); dprintk("%s: allocated xid %u\n", __func__, new->cu_msg.cm_xid); return new; } static void free_cld_upcall(struct cld_upcall *victim) { struct cld_net *cn = victim->cu_net; spin_lock(&cn->cn_lock); list_del(&victim->cu_list); spin_unlock(&cn->cn_lock); kfree(victim); } /* Ask daemon to create a new record */ static void nfsd4_cld_create(struct nfs4_client *clp) { int ret; struct cld_upcall *cup; struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); struct cld_net *cn = nn->cld_net; /* Don't upcall if it's already stored */ if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) return; cup = alloc_cld_upcall(cn); if (!cup) { ret = -ENOMEM; goto out_err; } cup->cu_msg.cm_cmd = Cld_Create; cup->cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len; memcpy(cup->cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data, clp->cl_name.len); ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg); if (!ret) { ret = cup->cu_msg.cm_status; set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); } free_cld_upcall(cup); out_err: if (ret) printk(KERN_ERR "NFSD: Unable to create client " "record on stable storage: %d\n", ret); } /* Ask daemon to create a new record */ static void nfsd4_cld_remove(struct nfs4_client *clp) { int ret; struct cld_upcall *cup; struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); struct cld_net *cn = nn->cld_net; /* Don't upcall if it's already removed */ if (!test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) return; cup = alloc_cld_upcall(cn); if (!cup) { ret = -ENOMEM; goto out_err; } cup->cu_msg.cm_cmd = Cld_Remove; cup->cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len; memcpy(cup->cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data, clp->cl_name.len); ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg); if (!ret) { ret = cup->cu_msg.cm_status; clear_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); } free_cld_upcall(cup); out_err: if (ret) printk(KERN_ERR "NFSD: Unable to remove client " "record from stable storage: %d\n", ret); } /* Check for presence of a record, and update its timestamp */ static int nfsd4_cld_check(struct nfs4_client *clp) { int ret; struct cld_upcall *cup; struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); struct cld_net *cn = nn->cld_net; /* Don't upcall if one was already stored during this grace pd */ if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) return 0; cup = alloc_cld_upcall(cn); if (!cup) { printk(KERN_ERR "NFSD: Unable to check client record on " "stable storage: %d\n", -ENOMEM); return -ENOMEM; } cup->cu_msg.cm_cmd = Cld_Check; cup->cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len; memcpy(cup->cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data, clp->cl_name.len); ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg); if (!ret) { ret = cup->cu_msg.cm_status; set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); } free_cld_upcall(cup); return ret; } static void nfsd4_cld_grace_done(struct nfsd_net *nn) { int ret; struct cld_upcall *cup; struct cld_net *cn = nn->cld_net; cup = alloc_cld_upcall(cn); if (!cup) { ret = -ENOMEM; goto out_err; } cup->cu_msg.cm_cmd = Cld_GraceDone; cup->cu_msg.cm_u.cm_gracetime = (int64_t)nn->boot_time; ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_msg); if (!ret) ret = cup->cu_msg.cm_status; free_cld_upcall(cup); out_err: if (ret) printk(KERN_ERR "NFSD: Unable to end grace period: %d\n", ret); } static const struct nfsd4_client_tracking_ops nfsd4_cld_tracking_ops = { .init = nfsd4_init_cld_pipe, .exit = nfsd4_remove_cld_pipe, .create = nfsd4_cld_create, .remove = nfsd4_cld_remove, .check = nfsd4_cld_check, .grace_done = nfsd4_cld_grace_done, }; /* upcall via usermodehelper */ static char cltrack_prog[PATH_MAX] = "/sbin/nfsdcltrack"; module_param_string(cltrack_prog, cltrack_prog, sizeof(cltrack_prog), S_IRUGO|S_IWUSR); MODULE_PARM_DESC(cltrack_prog, "Path to the nfsdcltrack upcall program"); static bool cltrack_legacy_disable; module_param(cltrack_legacy_disable, bool, S_IRUGO|S_IWUSR); MODULE_PARM_DESC(cltrack_legacy_disable, "Disable legacy recoverydir conversion. Default: false"); #define LEGACY_TOPDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_TOPDIR=" #define LEGACY_RECDIR_ENV_PREFIX "NFSDCLTRACK_LEGACY_RECDIR=" #define HAS_SESSION_ENV_PREFIX "NFSDCLTRACK_CLIENT_HAS_SESSION=" #define GRACE_START_ENV_PREFIX "NFSDCLTRACK_GRACE_START=" static char * nfsd4_cltrack_legacy_topdir(void) { int copied; size_t len; char *result; if (cltrack_legacy_disable) return NULL; len = strlen(LEGACY_TOPDIR_ENV_PREFIX) + strlen(nfs4_recoverydir()) + 1; result = kmalloc(len, GFP_KERNEL); if (!result) return result; copied = snprintf(result, len, LEGACY_TOPDIR_ENV_PREFIX "%s", nfs4_recoverydir()); if (copied >= len) { /* just return nothing if output was truncated */ kfree(result); return NULL; } return result; } static char * nfsd4_cltrack_legacy_recdir(const struct xdr_netobj *name) { int copied; size_t len; char *result; if (cltrack_legacy_disable) return NULL; /* +1 is for '/' between "topdir" and "recdir" */ len = strlen(LEGACY_RECDIR_ENV_PREFIX) + strlen(nfs4_recoverydir()) + 1 + HEXDIR_LEN; result = kmalloc(len, GFP_KERNEL); if (!result) return result; copied = snprintf(result, len, LEGACY_RECDIR_ENV_PREFIX "%s/", nfs4_recoverydir()); if (copied > (len - HEXDIR_LEN)) { /* just return nothing if output will be truncated */ kfree(result); return NULL; } copied = nfs4_make_rec_clidname(result + copied, name); if (copied) { kfree(result); return NULL; } return result; } static char * nfsd4_cltrack_client_has_session(struct nfs4_client *clp) { int copied; size_t len; char *result; /* prefix + Y/N character + terminating NULL */ len = strlen(HAS_SESSION_ENV_PREFIX) + 1 + 1; result = kmalloc(len, GFP_KERNEL); if (!result) return result; copied = snprintf(result, len, HAS_SESSION_ENV_PREFIX "%c", clp->cl_minorversion ? 'Y' : 'N'); if (copied >= len) { /* just return nothing if output was truncated */ kfree(result); return NULL; } return result; } static char * nfsd4_cltrack_grace_start(time_t grace_start) { int copied; size_t len; char *result; /* prefix + max width of int64_t string + terminating NULL */ len = strlen(GRACE_START_ENV_PREFIX) + 22 + 1; result = kmalloc(len, GFP_KERNEL); if (!result) return result; copied = snprintf(result, len, GRACE_START_ENV_PREFIX "%ld", grace_start); if (copied >= len) { /* just return nothing if output was truncated */ kfree(result); return NULL; } return result; } static int nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *env0, char *env1) { char *envp[3]; char *argv[4]; int ret; if (unlikely(!cltrack_prog[0])) { dprintk("%s: cltrack_prog is disabled\n", __func__); return -EACCES; } dprintk("%s: cmd: %s\n", __func__, cmd); dprintk("%s: arg: %s\n", __func__, arg ? arg : "(null)"); dprintk("%s: env0: %s\n", __func__, env0 ? env0 : "(null)"); dprintk("%s: env1: %s\n", __func__, env1 ? env1 : "(null)"); envp[0] = env0; envp[1] = env1; envp[2] = NULL; argv[0] = (char *)cltrack_prog; argv[1] = cmd; argv[2] = arg; argv[3] = NULL; ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC); /* * Disable the upcall mechanism if we're getting an ENOENT or EACCES * error. The admin can re-enable it on the fly by using sysfs * once the problem has been fixed. */ if (ret == -ENOENT || ret == -EACCES) { dprintk("NFSD: %s was not found or isn't executable (%d). " "Setting cltrack_prog to blank string!", cltrack_prog, ret); cltrack_prog[0] = '\0'; } dprintk("%s: %s return value: %d\n", __func__, cltrack_prog, ret); return ret; } static char * bin_to_hex_dup(const unsigned char *src, int srclen) { int i; char *buf, *hex; /* +1 for terminating NULL */ buf = kmalloc((srclen * 2) + 1, GFP_KERNEL); if (!buf) return buf; hex = buf; for (i = 0; i < srclen; i++) { sprintf(hex, "%2.2x", *src++); hex += 2; } return buf; } static int nfsd4_umh_cltrack_init(struct net *net) { int ret; struct nfsd_net *nn = net_generic(net, nfsd_net_id); char *grace_start = nfsd4_cltrack_grace_start(nn->boot_time); /* XXX: The usermode helper s not working in container yet. */ if (net != &init_net) { pr_warn("NFSD: attempt to initialize umh client tracking in a container ignored.\n"); kfree(grace_start); return -EINVAL; } ret = nfsd4_umh_cltrack_upcall("init", NULL, grace_start, NULL); kfree(grace_start); return ret; } static void nfsd4_cltrack_upcall_lock(struct nfs4_client *clp) { wait_on_bit_lock(&clp->cl_flags, NFSD4_CLIENT_UPCALL_LOCK, TASK_UNINTERRUPTIBLE); } static void nfsd4_cltrack_upcall_unlock(struct nfs4_client *clp) { smp_mb__before_atomic(); clear_bit(NFSD4_CLIENT_UPCALL_LOCK, &clp->cl_flags); smp_mb__after_atomic(); wake_up_bit(&clp->cl_flags, NFSD4_CLIENT_UPCALL_LOCK); } static void nfsd4_umh_cltrack_create(struct nfs4_client *clp) { char *hexid, *has_session, *grace_start; struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); /* * With v4.0 clients, there's little difference in outcome between a * create and check operation, and we can end up calling into this * function multiple times per client (once for each openowner). So, * for v4.0 clients skip upcalling once the client has been recorded * on stable storage. * * For v4.1+ clients, the outcome of the two operations is different, * so we must ensure that we upcall for the create operation. v4.1+ * clients call this on RECLAIM_COMPLETE though, so we should only end * up doing a single create upcall per client. */ if (clp->cl_minorversion == 0 && test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) return; hexid = bin_to_hex_dup(clp->cl_name.data, clp->cl_name.len); if (!hexid) { dprintk("%s: can't allocate memory for upcall!\n", __func__); return; } has_session = nfsd4_cltrack_client_has_session(clp); grace_start = nfsd4_cltrack_grace_start(nn->boot_time); nfsd4_cltrack_upcall_lock(clp); if (!nfsd4_umh_cltrack_upcall("create", hexid, has_session, grace_start)) set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); nfsd4_cltrack_upcall_unlock(clp); kfree(has_session); kfree(grace_start); kfree(hexid); } static void nfsd4_umh_cltrack_remove(struct nfs4_client *clp) { char *hexid; if (!test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) return; hexid = bin_to_hex_dup(clp->cl_name.data, clp->cl_name.len); if (!hexid) { dprintk("%s: can't allocate memory for upcall!\n", __func__); return; } nfsd4_cltrack_upcall_lock(clp); if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags) && nfsd4_umh_cltrack_upcall("remove", hexid, NULL, NULL) == 0) clear_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); nfsd4_cltrack_upcall_unlock(clp); kfree(hexid); } static int nfsd4_umh_cltrack_check(struct nfs4_client *clp) { int ret; char *hexid, *has_session, *legacy; if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) return 0; hexid = bin_to_hex_dup(clp->cl_name.data, clp->cl_name.len); if (!hexid) { dprintk("%s: can't allocate memory for upcall!\n", __func__); return -ENOMEM; } has_session = nfsd4_cltrack_client_has_session(clp); legacy = nfsd4_cltrack_legacy_recdir(&clp->cl_name); nfsd4_cltrack_upcall_lock(clp); if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) { ret = 0; } else { ret = nfsd4_umh_cltrack_upcall("check", hexid, has_session, legacy); if (ret == 0) set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); } nfsd4_cltrack_upcall_unlock(clp); kfree(has_session); kfree(legacy); kfree(hexid); return ret; } static void nfsd4_umh_cltrack_grace_done(struct nfsd_net *nn) { char *legacy; char timestr[22]; /* FIXME: better way to determine max size? */ sprintf(timestr, "%ld", nn->boot_time); legacy = nfsd4_cltrack_legacy_topdir(); nfsd4_umh_cltrack_upcall("gracedone", timestr, legacy, NULL); kfree(legacy); } static const struct nfsd4_client_tracking_ops nfsd4_umh_tracking_ops = { .init = nfsd4_umh_cltrack_init, .exit = NULL, .create = nfsd4_umh_cltrack_create, .remove = nfsd4_umh_cltrack_remove, .check = nfsd4_umh_cltrack_check, .grace_done = nfsd4_umh_cltrack_grace_done, }; int nfsd4_client_tracking_init(struct net *net) { int status; struct path path; struct nfsd_net *nn = net_generic(net, nfsd_net_id); /* just run the init if it the method is already decided */ if (nn->client_tracking_ops) goto do_init; /* * First, try a UMH upcall. It should succeed or fail quickly, so * there's little harm in trying that first. */ nn->client_tracking_ops = &nfsd4_umh_tracking_ops; status = nn->client_tracking_ops->init(net); if (!status) return status; /* * See if the recoverydir exists and is a directory. If it is, * then use the legacy ops. */ nn->client_tracking_ops = &nfsd4_legacy_tracking_ops; status = kern_path(nfs4_recoverydir(), LOOKUP_FOLLOW, &path); if (!status) { status = d_is_dir(path.dentry); path_put(&path); if (status) goto do_init; } /* Finally, try to use nfsdcld */ nn->client_tracking_ops = &nfsd4_cld_tracking_ops; printk(KERN_WARNING "NFSD: the nfsdcld client tracking upcall will be " "removed in 3.10. Please transition to using " "nfsdcltrack.\n"); do_init: status = nn->client_tracking_ops->init(net); if (status) { printk(KERN_WARNING "NFSD: Unable to initialize client " "recovery tracking! (%d)\n", status); nn->client_tracking_ops = NULL; } return status; } void nfsd4_client_tracking_exit(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); if (nn->client_tracking_ops) { if (nn->client_tracking_ops->exit) nn->client_tracking_ops->exit(net); nn->client_tracking_ops = NULL; } } void nfsd4_client_record_create(struct nfs4_client *clp) { struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); if (nn->client_tracking_ops) nn->client_tracking_ops->create(clp); } void nfsd4_client_record_remove(struct nfs4_client *clp) { struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); if (nn->client_tracking_ops) nn->client_tracking_ops->remove(clp); } int nfsd4_client_record_check(struct nfs4_client *clp) { struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); if (nn->client_tracking_ops) return nn->client_tracking_ops->check(clp); return -EOPNOTSUPP; } void nfsd4_record_grace_done(struct nfsd_net *nn) { if (nn->client_tracking_ops) nn->client_tracking_ops->grace_done(nn); } static int rpc_pipefs_event(struct notifier_block *nb, unsigned long event, void *ptr) { struct super_block *sb = ptr; struct net *net = sb->s_fs_info; struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct cld_net *cn = nn->cld_net; struct dentry *dentry; int ret = 0; if (!try_module_get(THIS_MODULE)) return 0; if (!cn) { module_put(THIS_MODULE); return 0; } switch (event) { case RPC_PIPEFS_MOUNT: dentry = nfsd4_cld_register_sb(sb, cn->cn_pipe); if (IS_ERR(dentry)) { ret = PTR_ERR(dentry); break; } cn->cn_pipe->dentry = dentry; break; case RPC_PIPEFS_UMOUNT: if (cn->cn_pipe->dentry) nfsd4_cld_unregister_sb(cn->cn_pipe); break; default: ret = -ENOTSUPP; break; } module_put(THIS_MODULE); return ret; } static struct notifier_block nfsd4_cld_block = { .notifier_call = rpc_pipefs_event, }; int register_cld_notifier(void) { return rpc_pipefs_notifier_register(&nfsd4_cld_block); } void unregister_cld_notifier(void) { rpc_pipefs_notifier_unregister(&nfsd4_cld_block); }
310 288 223 416 585 310 278 1 277 558 509 502 249 235 50 116 604 604 604 556 626 383 258 238 203 383 244 308 308 83 83 221 220 532 529 69 69 2 2 124 389 289 151 8 88 146 33 214 207 78 280 506 177 127 127 566 566 566 215 185 2 2 226 224 20 270 271 481 5 279 297 279 1 192 343 452 426 2 201 198 199 191 189 567 146 430 62 62 202 142 70 15 16 15 1 16 168 18 166 17 180 165 18 16 151 166 176 176 176 279 164 5 162 60 147 60 118 113 6 5 5 6 118 118 6 6 5 2 2 3 2 2 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 /* * Copyright (C) 2011 Novell Inc. * Copyright (C) 2016 Red Hat, Inc. * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 as published by * the Free Software Foundation. */ #include <linux/fs.h> #include <linux/mount.h> #include <linux/slab.h> #include <linux/cred.h> #include <linux/xattr.h> #include <linux/exportfs.h> #include <linux/uuid.h> #include <linux/namei.h> #include <linux/ratelimit.h> #include "overlayfs.h" int ovl_want_write(struct dentry *dentry) { struct ovl_fs *ofs = dentry->d_sb->s_fs_info; return mnt_want_write(ofs->upper_mnt); } void ovl_drop_write(struct dentry *dentry) { struct ovl_fs *ofs = dentry->d_sb->s_fs_info; mnt_drop_write(ofs->upper_mnt); } struct dentry *ovl_workdir(struct dentry *dentry) { struct ovl_fs *ofs = dentry->d_sb->s_fs_info; return ofs->workdir; } const struct cred *ovl_override_creds(struct super_block *sb) { struct ovl_fs *ofs = sb->s_fs_info; return override_creds(ofs->creator_cred); } struct super_block *ovl_same_sb(struct super_block *sb) { struct ovl_fs *ofs = sb->s_fs_info; if (!ofs->numlowerfs) return ofs->upper_mnt->mnt_sb; else if (ofs->numlowerfs == 1 && !ofs->upper_mnt) return ofs->lower_fs[0].sb; else return NULL; } /* * Check if underlying fs supports file handles and try to determine encoding * type, in order to deduce maximum inode number used by fs. * * Return 0 if file handles are not supported. * Return 1 (FILEID_INO32_GEN) if fs uses the default 32bit inode encoding. * Return -1 if fs uses a non default encoding with unknown inode size. */ int ovl_can_decode_fh(struct super_block *sb) { if (!sb->s_export_op || !sb->s_export_op->fh_to_dentry || uuid_is_null(&sb->s_uuid)) return 0; return sb->s_export_op->encode_fh ? -1 : FILEID_INO32_GEN; } struct dentry *ovl_indexdir(struct super_block *sb) { struct ovl_fs *ofs = sb->s_fs_info; return ofs->indexdir; } /* Index all files on copy up. For now only enabled for NFS export */ bool ovl_index_all(struct super_block *sb) { struct ovl_fs *ofs = sb->s_fs_info; return ofs->config.nfs_export && ofs->config.index; } /* Verify lower origin on lookup. For now only enabled for NFS export */ bool ovl_verify_lower(struct super_block *sb) { struct ovl_fs *ofs = sb->s_fs_info; return ofs->config.nfs_export && ofs->config.index; } struct ovl_entry *ovl_alloc_entry(unsigned int numlower) { size_t size = offsetof(struct ovl_entry, lowerstack[numlower]); struct ovl_entry *oe = kzalloc(size, GFP_KERNEL); if (oe) oe->numlower = numlower; return oe; } bool ovl_dentry_remote(struct dentry *dentry) { return dentry->d_flags & (DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE | DCACHE_OP_REAL); } bool ovl_dentry_weird(struct dentry *dentry) { return dentry->d_flags & (DCACHE_NEED_AUTOMOUNT | DCACHE_MANAGE_TRANSIT | DCACHE_OP_HASH | DCACHE_OP_COMPARE); } enum ovl_path_type ovl_path_type(struct dentry *dentry) { struct ovl_entry *oe = dentry->d_fsdata; enum ovl_path_type type = 0; if (ovl_dentry_upper(dentry)) { type = __OVL_PATH_UPPER; /* * Non-dir dentry can hold lower dentry of its copy up origin. */ if (oe->numlower) { if (ovl_test_flag(OVL_CONST_INO, d_inode(dentry))) type |= __OVL_PATH_ORIGIN; if (d_is_dir(dentry) || !ovl_has_upperdata(d_inode(dentry))) type |= __OVL_PATH_MERGE; } } else { if (oe->numlower > 1) type |= __OVL_PATH_MERGE; } return type; } void ovl_path_upper(struct dentry *dentry, struct path *path) { struct ovl_fs *ofs = dentry->d_sb->s_fs_info; path->mnt = ofs->upper_mnt; path->dentry = ovl_dentry_upper(dentry); } void ovl_path_lower(struct dentry *dentry, struct path *path) { struct ovl_entry *oe = dentry->d_fsdata; if (oe->numlower) { path->mnt = oe->lowerstack[0].layer->mnt; path->dentry = oe->lowerstack[0].dentry; } else { *path = (struct path) { }; } } void ovl_path_lowerdata(struct dentry *dentry, struct path *path) { struct ovl_entry *oe = dentry->d_fsdata; if (oe->numlower) { path->mnt = oe->lowerstack[oe->numlower - 1].layer->mnt; path->dentry = oe->lowerstack[oe->numlower - 1].dentry; } else { *path = (struct path) { }; } } enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path) { enum ovl_path_type type = ovl_path_type(dentry); if (!OVL_TYPE_UPPER(type)) ovl_path_lower(dentry, path); else ovl_path_upper(dentry, path); return type; } struct dentry *ovl_dentry_upper(struct dentry *dentry) { return ovl_upperdentry_dereference(OVL_I(d_inode(dentry))); } struct dentry *ovl_dentry_lower(struct dentry *dentry) { struct ovl_entry *oe = dentry->d_fsdata; return oe->numlower ? oe->lowerstack[0].dentry : NULL; } struct ovl_layer *ovl_layer_lower(struct dentry *dentry) { struct ovl_entry *oe = dentry->d_fsdata; return oe->numlower ? oe->lowerstack[0].layer : NULL; } /* * ovl_dentry_lower() could return either a data dentry or metacopy dentry * dependig on what is stored in lowerstack[0]. At times we need to find * lower dentry which has data (and not metacopy dentry). This helper * returns the lower data dentry. */ struct dentry *ovl_dentry_lowerdata(struct dentry *dentry) { struct ovl_entry *oe = dentry->d_fsdata; return oe->numlower ? oe->lowerstack[oe->numlower - 1].dentry : NULL; } struct dentry *ovl_dentry_real(struct dentry *dentry) { return ovl_dentry_upper(dentry) ?: ovl_dentry_lower(dentry); } struct dentry *ovl_i_dentry_upper(struct inode *inode) { return ovl_upperdentry_dereference(OVL_I(inode)); } struct inode *ovl_inode_upper(struct inode *inode) { struct dentry *upperdentry = ovl_i_dentry_upper(inode); return upperdentry ? d_inode(upperdentry) : NULL; } struct inode *ovl_inode_lower(struct inode *inode) { return OVL_I(inode)->lower; } struct inode *ovl_inode_real(struct inode *inode) { return ovl_inode_upper(inode) ?: ovl_inode_lower(inode); } /* Return inode which contains lower data. Do not return metacopy */ struct inode *ovl_inode_lowerdata(struct inode *inode) { if (WARN_ON(!S_ISREG(inode->i_mode))) return NULL; return OVL_I(inode)->lowerdata ?: ovl_inode_lower(inode); } /* Return real inode which contains data. Does not return metacopy inode */ struct inode *ovl_inode_realdata(struct inode *inode) { struct inode *upperinode; upperinode = ovl_inode_upper(inode); if (upperinode && ovl_has_upperdata(inode)) return upperinode; return ovl_inode_lowerdata(inode); } struct ovl_dir_cache *ovl_dir_cache(struct inode *inode) { return OVL_I(inode)->cache; } void ovl_set_dir_cache(struct inode *inode, struct ovl_dir_cache *cache) { OVL_I(inode)->cache = cache; } void ovl_dentry_set_flag(unsigned long flag, struct dentry *dentry) { set_bit(flag, &OVL_E(dentry)->flags); } void ovl_dentry_clear_flag(unsigned long flag, struct dentry *dentry) { clear_bit(flag, &OVL_E(dentry)->flags); } bool ovl_dentry_test_flag(unsigned long flag, struct dentry *dentry) { return test_bit(flag, &OVL_E(dentry)->flags); } bool ovl_dentry_is_opaque(struct dentry *dentry) { return ovl_dentry_test_flag(OVL_E_OPAQUE, dentry); } bool ovl_dentry_is_whiteout(struct dentry *dentry) { return !dentry->d_inode && ovl_dentry_is_opaque(dentry); } void ovl_dentry_set_opaque(struct dentry *dentry) { ovl_dentry_set_flag(OVL_E_OPAQUE, dentry); } /* * For hard links and decoded file handles, it's possible for ovl_dentry_upper() * to return positive, while there's no actual upper alias for the inode. * Copy up code needs to know about the existence of the upper alias, so it * can't use ovl_dentry_upper(). */ bool ovl_dentry_has_upper_alias(struct dentry *dentry) { return ovl_dentry_test_flag(OVL_E_UPPER_ALIAS, dentry); } void ovl_dentry_set_upper_alias(struct dentry *dentry) { ovl_dentry_set_flag(OVL_E_UPPER_ALIAS, dentry); } static bool ovl_should_check_upperdata(struct inode *inode) { if (!S_ISREG(inode->i_mode)) return false; if (!ovl_inode_lower(inode)) return false; return true; } bool ovl_has_upperdata(struct inode *inode) { if (!ovl_should_check_upperdata(inode)) return true; if (!ovl_test_flag(OVL_UPPERDATA, inode)) return false; /* * Pairs with smp_wmb() in ovl_set_upperdata(). Main user of * ovl_has_upperdata() is ovl_copy_up_meta_inode_data(). Make sure * if setting of OVL_UPPERDATA is visible, then effects of writes * before that are visible too. */ smp_rmb(); return true; } void ovl_set_upperdata(struct inode *inode) { /* * Pairs with smp_rmb() in ovl_has_upperdata(). Make sure * if OVL_UPPERDATA flag is visible, then effects of write operations * before it are visible as well. */ smp_wmb(); ovl_set_flag(OVL_UPPERDATA, inode); } /* Caller should hold ovl_inode->lock */ bool ovl_dentry_needs_data_copy_up_locked(struct dentry *dentry, int flags) { if (!ovl_open_flags_need_copy_up(flags)) return false; return !ovl_test_flag(OVL_UPPERDATA, d_inode(dentry)); } bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags) { if (!ovl_open_flags_need_copy_up(flags)) return false; return !ovl_has_upperdata(d_inode(dentry)); } bool ovl_redirect_dir(struct super_block *sb) { struct ovl_fs *ofs = sb->s_fs_info; return ofs->config.redirect_dir && !ofs->noxattr; } const char *ovl_dentry_get_redirect(struct dentry *dentry) { return OVL_I(d_inode(dentry))->redirect; } void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect) { struct ovl_inode *oi = OVL_I(d_inode(dentry)); kfree(oi->redirect); oi->redirect = redirect; } void ovl_inode_init(struct inode *inode, struct dentry *upperdentry, struct dentry *lowerdentry, struct dentry *lowerdata) { struct inode *realinode = d_inode(upperdentry ?: lowerdentry); if (upperdentry) OVL_I(inode)->__upperdentry = upperdentry; if (lowerdentry) OVL_I(inode)->lower = igrab(d_inode(lowerdentry)); if (lowerdata) OVL_I(inode)->lowerdata = igrab(d_inode(lowerdata)); ovl_copyattr(realinode, inode); ovl_copyflags(realinode, inode); if (!inode->i_ino) inode->i_ino = realinode->i_ino; } void ovl_inode_update(struct inode *inode, struct dentry *upperdentry) { struct inode *upperinode = d_inode(upperdentry); WARN_ON(OVL_I(inode)->__upperdentry); /* * Make sure upperdentry is consistent before making it visible */ smp_wmb(); OVL_I(inode)->__upperdentry = upperdentry; if (inode_unhashed(inode)) { if (!inode->i_ino) inode->i_ino = upperinode->i_ino; inode->i_private = upperinode; __insert_inode_hash(inode, (unsigned long) upperinode); } } static void ovl_dentry_version_inc(struct dentry *dentry, bool impurity) { struct inode *inode = d_inode(dentry); WARN_ON(!inode_is_locked(inode)); /* * Version is used by readdir code to keep cache consistent. For merge * dirs all changes need to be noted. For non-merge dirs, cache only * contains impure (ones which have been copied up and have origins) * entries, so only need to note changes to impure entries. */ if (OVL_TYPE_MERGE(ovl_path_type(dentry)) || impurity) OVL_I(inode)->version++; } void ovl_dir_modified(struct dentry *dentry, bool impurity) { /* Copy mtime/ctime */ ovl_copyattr(d_inode(ovl_dentry_upper(dentry)), d_inode(dentry)); ovl_dentry_version_inc(dentry, impurity); } u64 ovl_dentry_version_get(struct dentry *dentry) { struct inode *inode = d_inode(dentry); WARN_ON(!inode_is_locked(inode)); return OVL_I(inode)->version; } bool ovl_is_whiteout(struct dentry *dentry) { struct inode *inode = dentry->d_inode; return inode && IS_WHITEOUT(inode); } struct file *ovl_path_open(struct path *path, int flags) { return dentry_open(path, flags | O_NOATIME, current_cred()); } /* Caller should hold ovl_inode->lock */ static bool ovl_already_copied_up_locked(struct dentry *dentry, int flags) { bool disconnected = dentry->d_flags & DCACHE_DISCONNECTED; if (ovl_dentry_upper(dentry) && (ovl_dentry_has_upper_alias(dentry) || disconnected) && !ovl_dentry_needs_data_copy_up_locked(dentry, flags)) return true; return false; } bool ovl_already_copied_up(struct dentry *dentry, int flags) { bool disconnected = dentry->d_flags & DCACHE_DISCONNECTED; /* * Check if copy-up has happened as well as for upper alias (in * case of hard links) is there. * * Both checks are lockless: * - false negatives: will recheck under oi->lock * - false positives: * + ovl_dentry_upper() uses memory barriers to ensure the * upper dentry is up-to-date * + ovl_dentry_has_upper_alias() relies on locking of * upper parent i_rwsem to prevent reordering copy-up * with rename. */ if (ovl_dentry_upper(dentry) && (ovl_dentry_has_upper_alias(dentry) || disconnected) && !ovl_dentry_needs_data_copy_up(dentry, flags)) return true; return false; } int ovl_copy_up_start(struct dentry *dentry, int flags) { struct ovl_inode *oi = OVL_I(d_inode(dentry)); int err; err = mutex_lock_interruptible(&oi->lock); if (!err && ovl_already_copied_up_locked(dentry, flags)) { err = 1; /* Already copied up */ mutex_unlock(&oi->lock); } return err; } void ovl_copy_up_end(struct dentry *dentry) { mutex_unlock(&OVL_I(d_inode(dentry))->lock); } bool ovl_check_origin_xattr(struct dentry *dentry) { int res; res = vfs_getxattr(dentry, OVL_XATTR_ORIGIN, NULL, 0); /* Zero size value means "copied up but origin unknown" */ if (res >= 0) return true; return false; } bool ovl_check_dir_xattr(struct dentry *dentry, const char *name) { int res; char val; if (!d_is_dir(dentry)) return false; res = vfs_getxattr(dentry, name, &val, 1); if (res == 1 && val == 'y') return true; return false; } int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry, const char *name, const void *value, size_t size, int xerr) { int err; struct ovl_fs *ofs = dentry->d_sb->s_fs_info; if (ofs->noxattr) return xerr; err = ovl_do_setxattr(upperdentry, name, value, size, 0); if (err == -EOPNOTSUPP) { pr_warn("overlayfs: cannot set %s xattr on upper\n", name); ofs->noxattr = true; return xerr; } return err; } int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry) { int err; if (ovl_test_flag(OVL_IMPURE, d_inode(dentry))) return 0; /* * Do not fail when upper doesn't support xattrs. * Upper inodes won't have origin nor redirect xattr anyway. */ err = ovl_check_setxattr(dentry, upperdentry, OVL_XATTR_IMPURE, "y", 1, 0); if (!err) ovl_set_flag(OVL_IMPURE, d_inode(dentry)); return err; } void ovl_set_flag(unsigned long flag, struct inode *inode) { set_bit(flag, &OVL_I(inode)->flags); } void ovl_clear_flag(unsigned long flag, struct inode *inode) { clear_bit(flag, &OVL_I(inode)->flags); } bool ovl_test_flag(unsigned long flag, struct inode *inode) { return test_bit(flag, &OVL_I(inode)->flags); } /** * Caller must hold a reference to inode to prevent it from being freed while * it is marked inuse. */ bool ovl_inuse_trylock(struct dentry *dentry) { struct inode *inode = d_inode(dentry); bool locked = false; spin_lock(&inode->i_lock); if (!(inode->i_state & I_OVL_INUSE)) { inode->i_state |= I_OVL_INUSE; locked = true; } spin_unlock(&inode->i_lock); return locked; } void ovl_inuse_unlock(struct dentry *dentry) { if (dentry) { struct inode *inode = d_inode(dentry); spin_lock(&inode->i_lock); WARN_ON(!(inode->i_state & I_OVL_INUSE)); inode->i_state &= ~I_OVL_INUSE; spin_unlock(&inode->i_lock); } } bool ovl_is_inuse(struct dentry *dentry) { struct inode *inode = d_inode(dentry); bool inuse; spin_lock(&inode->i_lock); inuse = (inode->i_state & I_OVL_INUSE); spin_unlock(&inode->i_lock); return inuse; } /* * Does this overlay dentry need to be indexed on copy up? */ bool ovl_need_index(struct dentry *dentry) { struct dentry *lower = ovl_dentry_lower(dentry); if (!lower || !ovl_indexdir(dentry->d_sb)) return false; /* Index all files for NFS export and consistency verification */ if (ovl_index_all(dentry->d_sb)) return true; /* Index only lower hardlinks on copy up */ if (!d_is_dir(lower) && d_inode(lower)->i_nlink > 1) return true; return false; } /* Caller must hold OVL_I(inode)->lock */ static void ovl_cleanup_index(struct dentry *dentry) { struct dentry *indexdir = ovl_indexdir(dentry->d_sb); struct inode *dir = indexdir->d_inode; struct dentry *lowerdentry = ovl_dentry_lower(dentry); struct dentry *upperdentry = ovl_dentry_upper(dentry); struct dentry *index = NULL; struct inode *inode; struct qstr name = { }; int err; err = ovl_get_index_name(lowerdentry, &name); if (err) goto fail; inode = d_inode(upperdentry); if (!S_ISDIR(inode->i_mode) && inode->i_nlink != 1) { pr_warn_ratelimited("overlayfs: cleanup linked index (%pd2, ino=%lu, nlink=%u)\n", upperdentry, inode->i_ino, inode->i_nlink); /* * We either have a bug with persistent union nlink or a lower * hardlink was added while overlay is mounted. Adding a lower * hardlink and then unlinking all overlay hardlinks would drop * overlay nlink to zero before all upper inodes are unlinked. * As a safety measure, when that situation is detected, set * the overlay nlink to the index inode nlink minus one for the * index entry itself. */ set_nlink(d_inode(dentry), inode->i_nlink - 1); ovl_set_nlink_upper(dentry); goto out; } inode_lock_nested(dir, I_MUTEX_PARENT); index = lookup_one_len(name.name, indexdir, name.len); err = PTR_ERR(index); if (IS_ERR(index)) { index = NULL; } else if (ovl_index_all(dentry->d_sb)) { /* Whiteout orphan index to block future open by handle */ err = ovl_cleanup_and_whiteout(indexdir, dir, index); } else { /* Cleanup orphan index entries */ err = ovl_cleanup(dir, index); } inode_unlock(dir); if (err) goto fail; out: kfree(name.name); dput(index); return; fail: pr_err("overlayfs: cleanup index of '%pd2' failed (%i)\n", dentry, err); goto out; } /* * Operations that change overlay inode and upper inode nlink need to be * synchronized with copy up for persistent nlink accounting. */ int ovl_nlink_start(struct dentry *dentry, bool *locked) { struct ovl_inode *oi = OVL_I(d_inode(dentry)); const struct cred *old_cred; int err; if (!d_inode(dentry)) return 0; /* * With inodes index is enabled, we store the union overlay nlink * in an xattr on the index inode. When whiting out an indexed lower, * we need to decrement the overlay persistent nlink, but before the * first copy up, we have no upper index inode to store the xattr. * * As a workaround, before whiteout/rename over an indexed lower, * copy up to create the upper index. Creating the upper index will * initialize the overlay nlink, so it could be dropped if unlink * or rename succeeds. * * TODO: implement metadata only index copy up when called with * ovl_copy_up_flags(dentry, O_PATH). */ if (ovl_need_index(dentry) && !ovl_dentry_has_upper_alias(dentry)) { err = ovl_copy_up(dentry); if (err) return err; } err = mutex_lock_interruptible(&oi->lock); if (err) return err; if (d_is_dir(dentry) || !ovl_test_flag(OVL_INDEX, d_inode(dentry))) goto out; old_cred = ovl_override_creds(dentry->d_sb); /* * The overlay inode nlink should be incremented/decremented IFF the * upper operation succeeds, along with nlink change of upper inode. * Therefore, before link/unlink/rename, we store the union nlink * value relative to the upper inode nlink in an upper inode xattr. */ err = ovl_set_nlink_upper(dentry); revert_creds(old_cred); out: if (err) mutex_unlock(&oi->lock); else *locked = true; return err; } void ovl_nlink_end(struct dentry *dentry, bool locked) { if (locked) { if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) && d_inode(dentry)->i_nlink == 0) { const struct cred *old_cred; old_cred = ovl_override_creds(dentry->d_sb); ovl_cleanup_index(dentry); revert_creds(old_cred); } mutex_unlock(&OVL_I(d_inode(dentry))->lock); } } int ovl_lock_rename_workdir(struct dentry *workdir, struct dentry *upperdir) { /* Workdir should not be the same as upperdir */ if (workdir == upperdir) goto err; /* Workdir should not be subdir of upperdir and vice versa */ if (lock_rename(workdir, upperdir) != NULL) goto err_unlock; return 0; err_unlock: unlock_rename(workdir, upperdir); err: pr_err("overlayfs: failed to lock workdir+upperdir\n"); return -EIO; } /* err < 0, 0 if no metacopy xattr, 1 if metacopy xattr found */ int ovl_check_metacopy_xattr(struct dentry *dentry) { int res; /* Only regular files can have metacopy xattr */ if (!S_ISREG(d_inode(dentry)->i_mode)) return 0; res = vfs_getxattr(dentry, OVL_XATTR_METACOPY, NULL, 0); if (res < 0) { if (res == -ENODATA || res == -EOPNOTSUPP) return 0; goto out; } return 1; out: pr_warn_ratelimited("overlayfs: failed to get metacopy (%i)\n", res); return res; } bool ovl_is_metacopy_dentry(struct dentry *dentry) { struct ovl_entry *oe = dentry->d_fsdata; if (!d_is_reg(dentry)) return false; if (ovl_dentry_upper(dentry)) { if (!ovl_has_upperdata(d_inode(dentry))) return true; return false; } return (oe->numlower > 1); } ssize_t ovl_getxattr(struct dentry *dentry, char *name, char **value, size_t padding) { ssize_t res; char *buf = NULL; res = vfs_getxattr(dentry, name, NULL, 0); if (res < 0) { if (res == -ENODATA || res == -EOPNOTSUPP) return -ENODATA; goto fail; } if (res != 0) { buf = kzalloc(res + padding, GFP_KERNEL); if (!buf) return -ENOMEM; res = vfs_getxattr(dentry, name, buf, res); if (res < 0) goto fail; } *value = buf; return res; fail: pr_warn_ratelimited("overlayfs: failed to get xattr %s: err=%zi)\n", name, res); kfree(buf); return res; } char *ovl_get_redirect_xattr(struct dentry *dentry, int padding) { int res; char *s, *next, *buf = NULL; res = ovl_getxattr(dentry, OVL_XATTR_REDIRECT, &buf, padding + 1); if (res == -ENODATA) return NULL; if (res < 0) return ERR_PTR(res); if (res == 0) goto invalid; if (buf[0] == '/') { for (s = buf; *s++ == '/'; s = next) { next = strchrnul(s, '/'); if (s == next) goto invalid; } } else { if (strchr(buf, '/') != NULL) goto invalid; } return buf; invalid: pr_warn_ratelimited("overlayfs: invalid redirect (%s)\n", buf); res = -EINVAL; kfree(buf); return ERR_PTR(res); }
9 103 14 13 1 8 232 1 1 1 19 9 8 2 1 2 2 25 23 2 23 23 2 2 2 1 2 2 2 2 1 25 20 3 2 2 2 21 15 14 1 1 10 10 10 10 10 9 1 10 8 8 10 25 56 55 34 34 2 34 52 52 52 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 // SPDX-License-Identifier: GPL-2.0 /* * This module exports the functions: * * 'int set_selection(struct tiocl_selection __user *, struct tty_struct *)' * 'void clear_selection(void)' * 'int paste_selection(struct tty_struct *)' * 'int sel_loadlut(char __user *)' * * Now that /dev/vcs exists, most of this can disappear again. */ #include <linux/module.h> #include <linux/tty.h> #include <linux/sched.h> #include <linux/mm.h> #include <linux/mutex.h> #include <linux/slab.h> #include <linux/types.h> #include <linux/uaccess.h> #include <linux/kbd_kern.h> #include <linux/vt_kern.h> #include <linux/consolemap.h> #include <linux/selection.h> #include <linux/tiocl.h> #include <linux/console.h> #include <linux/tty_flip.h> #include <linux/sched/signal.h> /* Don't take this from <ctype.h>: 011-015 on the screen aren't spaces */ #define isspace(c) ((c) == ' ') extern void poke_blanked_console(void); /* FIXME: all this needs locking */ /* Variables for selection control. */ /* Use a dynamic buffer, instead of static (Dec 1994) */ struct vc_data *sel_cons; /* must not be deallocated */ static int use_unicode; static volatile int sel_start = -1; /* cleared by clear_selection */ static int sel_end; static int sel_buffer_lth; static char *sel_buffer; static DEFINE_MUTEX(sel_lock); /* clear_selection, highlight and highlight_pointer can be called from interrupt (via scrollback/front) */ /* set reverse video on characters s-e of console with selection. */ static inline void highlight(const int s, const int e) { invert_screen(sel_cons, s, e-s+2, 1); } /* use complementary color to show the pointer */ static inline void highlight_pointer(const int where) { complement_pos(sel_cons, where); } static u32 sel_pos(int n) { if (use_unicode) return screen_glyph_unicode(sel_cons, n / 2); return inverse_translate(sel_cons, screen_glyph(sel_cons, n), 0); } /** * clear_selection - remove current selection * * Remove the current selection highlight, if any from the console * holding the selection. The caller must hold the console lock. */ void clear_selection(void) { highlight_pointer(-1); /* hide the pointer */ if (sel_start != -1) { highlight(sel_start, sel_end); sel_start = -1; } } bool vc_is_sel(struct vc_data *vc) { return vc == sel_cons; } /* * User settable table: what characters are to be considered alphabetic? * 128 bits. Locked by the console lock. */ static u32 inwordLut[]={ 0x00000000, /* control chars */ 0x03FFE000, /* digits and "-./" */ 0x87FFFFFE, /* uppercase and '_' */ 0x07FFFFFE, /* lowercase */ }; static inline int inword(const u32 c) { return c > 0x7f || (( inwordLut[c>>5] >> (c & 0x1F) ) & 1); } /** * set loadlut - load the LUT table * @p: user table * * Load the LUT table from user space. The caller must hold the console * lock. Make a temporary copy so a partial update doesn't make a mess. */ int sel_loadlut(char __user *p) { u32 tmplut[ARRAY_SIZE(inwordLut)]; if (copy_from_user(tmplut, (u32 __user *)(p+4), sizeof(inwordLut))) return -EFAULT; memcpy(inwordLut, tmplut, sizeof(inwordLut)); return 0; } /* does screen address p correspond to character at LH/RH edge of screen? */ static inline int atedge(const int p, int size_row) { return (!(p % size_row) || !((p + 2) % size_row)); } /* stores the char in UTF8 and returns the number of bytes used (1-4) */ static int store_utf8(u32 c, char *p) { if (c < 0x80) { /* 0******* */ p[0] = c; return 1; } else if (c < 0x800) { /* 110***** 10****** */ p[0] = 0xc0 | (c >> 6); p[1] = 0x80 | (c & 0x3f); return 2; } else if (c < 0x10000) { /* 1110**** 10****** 10****** */ p[0] = 0xe0 | (c >> 12); p[1] = 0x80 | ((c >> 6) & 0x3f); p[2] = 0x80 | (c & 0x3f); return 3; } else if (c < 0x110000) { /* 11110*** 10****** 10****** 10****** */ p[0] = 0xf0 | (c >> 18); p[1] = 0x80 | ((c >> 12) & 0x3f); p[2] = 0x80 | ((c >> 6) & 0x3f); p[3] = 0x80 | (c & 0x3f); return 4; } else { /* outside Unicode, replace with U+FFFD */ p[0] = 0xef; p[1] = 0xbf; p[2] = 0xbd; return 3; } } /** * set_selection - set the current selection. * @sel: user selection info * @tty: the console tty * * Invoked by the ioctl handle for the vt layer. * * The entire selection process is managed under the console_lock. It's * a lot under the lock but its hardly a performance path */ static int __set_selection(const struct tiocl_selection __user *sel, struct tty_struct *tty) { struct vc_data *vc = vc_cons[fg_console].d; int new_sel_start, new_sel_end, spc; struct tiocl_selection v; char *bp, *obp; int i, ps, pe, multiplier; u32 c; int mode, ret = 0; poke_blanked_console(); if (copy_from_user(&v, sel, sizeof(*sel))) return -EFAULT; v.xs = min_t(u16, v.xs - 1, vc->vc_cols - 1); v.ys = min_t(u16, v.ys - 1, vc->vc_rows - 1); v.xe = min_t(u16, v.xe - 1, vc->vc_cols - 1); v.ye = min_t(u16, v.ye - 1, vc->vc_rows - 1); ps = v.ys * vc->vc_size_row + (v.xs << 1); pe = v.ye * vc->vc_size_row + (v.xe << 1); if (v.sel_mode == TIOCL_SELCLEAR) { /* useful for screendump without selection highlights */ clear_selection(); return 0; } if (mouse_reporting() && (v.sel_mode & TIOCL_SELMOUSEREPORT)) { mouse_report(tty, v.sel_mode & TIOCL_SELBUTTONMASK, v.xs, v.ys); return 0; } if (ps > pe) /* make sel_start <= sel_end */ swap(ps, pe); if (sel_cons != vc_cons[fg_console].d) { clear_selection(); sel_cons = vc_cons[fg_console].d; } mode = vt_do_kdgkbmode(fg_console); if (mode == K_UNICODE) use_unicode = 1; else use_unicode = 0; switch (v.sel_mode) { case TIOCL_SELCHAR: /* character-by-character selection */ new_sel_start = ps; new_sel_end = pe; break; case TIOCL_SELWORD: /* word-by-word selection */ spc = isspace(sel_pos(ps)); for (new_sel_start = ps; ; ps -= 2) { if ((spc && !isspace(sel_pos(ps))) || (!spc && !inword(sel_pos(ps)))) break; new_sel_start = ps; if (!(ps % vc->vc_size_row)) break; } spc = isspace(sel_pos(pe)); for (new_sel_end = pe; ; pe += 2) { if ((spc && !isspace(sel_pos(pe))) || (!spc && !inword(sel_pos(pe)))) break; new_sel_end = pe; if (!((pe + 2) % vc->vc_size_row)) break; } break; case TIOCL_SELLINE: /* line-by-line selection */ new_sel_start = ps - ps % vc->vc_size_row; new_sel_end = pe + vc->vc_size_row - pe % vc->vc_size_row - 2; break; case TIOCL_SELPOINTER: highlight_pointer(pe); return 0; default: return -EINVAL; } /* remove the pointer */ highlight_pointer(-1); /* select to end of line if on trailing space */ if (new_sel_end > new_sel_start && !atedge(new_sel_end, vc->vc_size_row) && isspace(sel_pos(new_sel_end))) { for (pe = new_sel_end + 2; ; pe += 2) if (!isspace(sel_pos(pe)) || atedge(pe, vc->vc_size_row)) break; if (isspace(sel_pos(pe))) new_sel_end = pe; } if (sel_start == -1) /* no current selection */ highlight(new_sel_start, new_sel_end); else if (new_sel_start == sel_start) { if (new_sel_end == sel_end) /* no action required */ return 0; else if (new_sel_end > sel_end) /* extend to right */ highlight(sel_end + 2, new_sel_end); else /* contract from right */ highlight(new_sel_end + 2, sel_end); } else if (new_sel_end == sel_end) { if (new_sel_start < sel_start) /* extend to left */ highlight(new_sel_start, sel_start - 2); else /* contract from left */ highlight(sel_start, new_sel_start - 2); } else /* some other case; start selection from scratch */ { clear_selection(); highlight(new_sel_start, new_sel_end); } sel_start = new_sel_start; sel_end = new_sel_end; /* Allocate a new buffer before freeing the old one ... */ multiplier = use_unicode ? 4 : 1; /* chars can take up to 4 bytes */ bp = kmalloc_array((sel_end - sel_start) / 2 + 1, multiplier, GFP_KERNEL); if (!bp) { printk(KERN_WARNING "selection: kmalloc() failed\n"); clear_selection(); return -ENOMEM; } kfree(sel_buffer); sel_buffer = bp; obp = bp; for (i = sel_start; i <= sel_end; i += 2) { c = sel_pos(i); if (use_unicode) bp += store_utf8(c, bp); else *bp++ = c; if (!isspace(c)) obp = bp; if (! ((i + 2) % vc->vc_size_row)) { /* strip trailing blanks from line and add newline, unless non-space at end of line. */ if (obp != bp) { bp = obp; *bp++ = '\r'; } obp = bp; } } sel_buffer_lth = bp - sel_buffer; return ret; } int set_selection(const struct tiocl_selection __user *v, struct tty_struct *tty) { int ret; mutex_lock(&sel_lock); console_lock(); ret = __set_selection(v, tty); console_unlock(); mutex_unlock(&sel_lock); return ret; } /* Insert the contents of the selection buffer into the * queue of the tty associated with the current console. * Invoked by ioctl(). * * Locking: called without locks. Calls the ldisc wrongly with * unsafe methods, */ int paste_selection(struct tty_struct *tty) { struct vc_data *vc = tty->driver_data; int pasted = 0; unsigned int count; struct tty_ldisc *ld; DECLARE_WAITQUEUE(wait, current); int ret = 0; console_lock(); poke_blanked_console(); console_unlock(); ld = tty_ldisc_ref_wait(tty); if (!ld) return -EIO; /* ldisc was hung up */ tty_buffer_lock_exclusive(&vc->port); add_wait_queue(&vc->paste_wait, &wait); mutex_lock(&sel_lock); while (sel_buffer && sel_buffer_lth > pasted) { set_current_state(TASK_INTERRUPTIBLE); if (signal_pending(current)) { ret = -EINTR; break; } if (tty_throttled(tty)) { mutex_unlock(&sel_lock); schedule(); mutex_lock(&sel_lock); continue; } __set_current_state(TASK_RUNNING); count = sel_buffer_lth - pasted; count = tty_ldisc_receive_buf(ld, sel_buffer + pasted, NULL, count); pasted += count; } mutex_unlock(&sel_lock); remove_wait_queue(&vc->paste_wait, &wait); __set_current_state(TASK_RUNNING); tty_buffer_unlock_exclusive(&vc->port); tty_ldisc_deref(ld); return ret; }
3436 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 /* SPDX-License-Identifier: GPL-2.0 */ /* * memory buffer pool support */ #ifndef _LINUX_MEMPOOL_H #define _LINUX_MEMPOOL_H #include <linux/wait.h> #include <linux/compiler.h> struct kmem_cache; typedef void * (mempool_alloc_t)(gfp_t gfp_mask, void *pool_data); typedef void (mempool_free_t)(void *element, void *pool_data); typedef struct mempool_s { spinlock_t lock; int min_nr; /* nr of elements at *elements */ int curr_nr; /* Current nr of elements at *elements */ void **elements; void *pool_data; mempool_alloc_t *alloc; mempool_free_t *free; wait_queue_head_t wait; } mempool_t; static inline bool mempool_initialized(mempool_t *pool) { return pool->elements != NULL; } void mempool_exit(mempool_t *pool); int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn, mempool_free_t *free_fn, void *pool_data, gfp_t gfp_mask, int node_id); int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn, mempool_free_t *free_fn, void *pool_data); extern mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn, mempool_free_t *free_fn, void *pool_data); extern mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn, mempool_free_t *free_fn, void *pool_data, gfp_t gfp_mask, int nid); extern int mempool_resize(mempool_t *pool, int new_min_nr); extern void mempool_destroy(mempool_t *pool); extern void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask) __malloc; extern void mempool_free(void *element, mempool_t *pool); /* * A mempool_alloc_t and mempool_free_t that get the memory from * a slab cache that is passed in through pool_data. * Note: the slab cache may not have a ctor function. */ void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data); void mempool_free_slab(void *element, void *pool_data); static inline int mempool_init_slab_pool(mempool_t *pool, int min_nr, struct kmem_cache *kc) { return mempool_init(pool, min_nr, mempool_alloc_slab, mempool_free_slab, (void *) kc); } static inline mempool_t * mempool_create_slab_pool(int min_nr, struct kmem_cache *kc) { return mempool_create(min_nr, mempool_alloc_slab, mempool_free_slab, (void *) kc); } /* * a mempool_alloc_t and a mempool_free_t to kmalloc and kfree the * amount of memory specified by pool_data */ void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data); void mempool_kfree(void *element, void *pool_data); static inline int mempool_init_kmalloc_pool(mempool_t *pool, int min_nr, size_t size) { return mempool_init(pool, min_nr, mempool_kmalloc, mempool_kfree, (void *) size); } static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size) { return mempool_create(min_nr, mempool_kmalloc, mempool_kfree, (void *) size); } /* * A mempool_alloc_t and mempool_free_t for a simple page allocator that * allocates pages of the order specified by pool_data */ void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data); void mempool_free_pages(void *element, void *pool_data); static inline int mempool_init_page_pool(mempool_t *pool, int min_nr, int order) { return mempool_init(pool, min_nr, mempool_alloc_pages, mempool_free_pages, (void *)(long)order); } static inline mempool_t *mempool_create_page_pool(int min_nr, int order) { return mempool_create(min_nr, mempool_alloc_pages, mempool_free_pages, (void *)(long)order); } #endif /* _LINUX_MEMPOOL_H */
2266 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 /* SPDX-License-Identifier: GPL-2.0 */ /* * include/linux/prandom.h * * Include file for the fast pseudo-random 32-bit * generation. */ #ifndef _LINUX_PRANDOM_H #define _LINUX_PRANDOM_H #include <linux/types.h> #include <linux/percpu.h> #include <linux/siphash.h> u32 prandom_u32(void); void prandom_bytes(void *buf, size_t nbytes); void prandom_seed(u32 seed); void prandom_reseed_late(void); #if BITS_PER_LONG == 64 /* * The core SipHash round function. Each line can be executed in * parallel given enough CPU resources. */ #define PRND_SIPROUND(v0, v1, v2, v3) SIPHASH_PERMUTATION(v0, v1, v2, v3) #define PRND_K0 (SIPHASH_CONST_0 ^ SIPHASH_CONST_2) #define PRND_K1 (SIPHASH_CONST_1 ^ SIPHASH_CONST_3) #elif BITS_PER_LONG == 32 /* * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash. * This is weaker, but 32-bit machines are not used for high-traffic * applications, so there is less output for an attacker to analyze. */ #define PRND_SIPROUND(v0, v1, v2, v3) HSIPHASH_PERMUTATION(v0, v1, v2, v3) #define PRND_K0 (HSIPHASH_CONST_0 ^ HSIPHASH_CONST_2) #define PRND_K1 (HSIPHASH_CONST_1 ^ HSIPHASH_CONST_3) #else #error Unsupported BITS_PER_LONG #endif struct rnd_state { __u32 s1, s2, s3, s4; }; u32 prandom_u32_state(struct rnd_state *state); void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes); void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state); #define prandom_init_once(pcpu_state) \ DO_ONCE(prandom_seed_full_state, (pcpu_state)) /** * prandom_u32_max - returns a pseudo-random number in interval [0, ep_ro) * @ep_ro: right open interval endpoint * * Returns a pseudo-random number that is in interval [0, ep_ro). Note * that the result depends on PRNG being well distributed in [0, ~0U] * u32 space. Here we use maximally equidistributed combined Tausworthe * generator, that is, prandom_u32(). This is useful when requesting a * random index of an array containing ep_ro elements, for example. * * Returns: pseudo-random number in interval [0, ep_ro) */ static inline u32 prandom_u32_max(u32 ep_ro) { return (u32)(((u64) prandom_u32() * ep_ro) >> 32); } /* * Handle minimum values for seeds */ static inline u32 __seed(u32 x, u32 m) { return (x < m) ? x + m : x; } /** * prandom_seed_state - set seed for prandom_u32_state(). * @state: pointer to state structure to receive the seed. * @seed: arbitrary 64-bit value to use as a seed. */ static inline void prandom_seed_state(struct rnd_state *state, u64 seed) { u32 i = ((seed >> 32) ^ (seed << 10) ^ seed) & 0xffffffffUL; state->s1 = __seed(i, 2U); state->s2 = __seed(i, 8U); state->s3 = __seed(i, 16U); state->s4 = __seed(i, 128U); } /* Pseudo random number generator from numerical recipes. */ static inline u32 next_pseudo_random32(u32 seed) { return seed * 1664525 + 1013904223; } #endif
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 // SPDX-License-Identifier: GPL-2.0 /* * linux/drivers/staging/erofs/dir.c * * Copyright (C) 2017-2018 HUAWEI, Inc. * http://www.huawei.com/ * Created by Gao Xiang <gaoxiang25@huawei.com> * * This file is subject to the terms and conditions of the GNU General Public * License. See the file COPYING in the main directory of the Linux * distribution for more details. */ #include "internal.h" static const unsigned char erofs_filetype_table[EROFS_FT_MAX] = { [EROFS_FT_UNKNOWN] = DT_UNKNOWN, [EROFS_FT_REG_FILE] = DT_REG, [EROFS_FT_DIR] = DT_DIR, [EROFS_FT_CHRDEV] = DT_CHR, [EROFS_FT_BLKDEV] = DT_BLK, [EROFS_FT_FIFO] = DT_FIFO, [EROFS_FT_SOCK] = DT_SOCK, [EROFS_FT_SYMLINK] = DT_LNK, }; static void debug_one_dentry(unsigned char d_type, const char *de_name, unsigned int de_namelen) { #ifdef CONFIG_EROFS_FS_DEBUG /* since the on-disk name could not have the trailing '\0' */ unsigned char dbg_namebuf[EROFS_NAME_LEN + 1]; memcpy(dbg_namebuf, de_name, de_namelen); dbg_namebuf[de_namelen] = '\0'; debugln("found dirent %s de_len %u d_type %d", dbg_namebuf, de_namelen, d_type); #endif } static int erofs_fill_dentries(struct dir_context *ctx, void *dentry_blk, unsigned *ofs, unsigned nameoff, unsigned maxsize) { struct erofs_dirent *de = dentry_blk; const struct erofs_dirent *end = dentry_blk + nameoff; de = dentry_blk + *ofs; while (de < end) { const char *de_name; unsigned int de_namelen; unsigned char d_type; if (de->file_type < EROFS_FT_MAX) d_type = erofs_filetype_table[de->file_type]; else d_type = DT_UNKNOWN; nameoff = le16_to_cpu(de->nameoff); de_name = (char *)dentry_blk + nameoff; /* the last dirent in the block? */ if (de + 1 >= end) de_namelen = strnlen(de_name, maxsize - nameoff); else de_namelen = le16_to_cpu(de[1].nameoff) - nameoff; /* a corrupted entry is found */ if (unlikely(nameoff + de_namelen > maxsize || de_namelen > EROFS_NAME_LEN)) { DBG_BUGON(1); return -EIO; } debug_one_dentry(d_type, de_name, de_namelen); if (!dir_emit(ctx, de_name, de_namelen, le64_to_cpu(de->nid), d_type)) /* stoped by some reason */ return 1; ++de; *ofs += sizeof(struct erofs_dirent); } *ofs = maxsize; return 0; } static int erofs_readdir(struct file *f, struct dir_context *ctx) { struct inode *dir = file_inode(f); struct address_space *mapping = dir->i_mapping; const size_t dirsize = i_size_read(dir); unsigned i = ctx->pos / EROFS_BLKSIZ; unsigned ofs = ctx->pos % EROFS_BLKSIZ; int err = 0; bool initial = true; while (ctx->pos < dirsize) { struct page *dentry_page; struct erofs_dirent *de; unsigned nameoff, maxsize; dentry_page = read_mapping_page(mapping, i, NULL); if (dentry_page == ERR_PTR(-ENOMEM)) { err = -ENOMEM; break; } else if (IS_ERR(dentry_page)) { errln("fail to readdir of logical block %u of nid %llu", i, EROFS_V(dir)->nid); err = PTR_ERR(dentry_page); break; } lock_page(dentry_page); de = (struct erofs_dirent *)kmap(dentry_page); nameoff = le16_to_cpu(de->nameoff); if (unlikely(nameoff < sizeof(struct erofs_dirent) || nameoff >= PAGE_SIZE)) { errln("%s, invalid de[0].nameoff %u", __func__, nameoff); err = -EIO; goto skip_this; } maxsize = min_t(unsigned, dirsize - ctx->pos + ofs, PAGE_SIZE); /* search dirents at the arbitrary position */ if (unlikely(initial)) { initial = false; ofs = roundup(ofs, sizeof(struct erofs_dirent)); if (unlikely(ofs >= nameoff)) goto skip_this; } err = erofs_fill_dentries(ctx, de, &ofs, nameoff, maxsize); skip_this: kunmap(dentry_page); unlock_page(dentry_page); put_page(dentry_page); ctx->pos = blknr_to_addr(i) + ofs; if (unlikely(err)) break; ++i; ofs = 0; } return err < 0 ? err : 0; } const struct file_operations erofs_dir_fops = { .llseek = generic_file_llseek, .read = generic_read_dir, .iterate = erofs_readdir, };
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 /* Copyright (C) 2003-2013 Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ /* Kernel module implementing an IP set type: the hash:net,port type */ #include <linux/jhash.h> #include <linux/module.h> #include <linux/ip.h> #include <linux/skbuff.h> #include <linux/errno.h> #include <linux/random.h> #include <net/ip.h> #include <net/ipv6.h> #include <net/netlink.h> #include <linux/netfilter.h> #include <linux/netfilter/ipset/pfxlen.h> #include <linux/netfilter/ipset/ip_set.h> #include <linux/netfilter/ipset/ip_set_getport.h> #include <linux/netfilter/ipset/ip_set_hash.h> #define IPSET_TYPE_REV_MIN 0 /* 1 SCTP and UDPLITE support added */ /* 2 Range as input support for IPv4 added */ /* 3 nomatch flag support added */ /* 4 Counters support added */ /* 5 Comments support added */ /* 6 Forceadd support added */ #define IPSET_TYPE_REV_MAX 7 /* skbinfo support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); IP_SET_MODULE_DESC("hash:net,port", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:net,port"); /* Type specific function prefix */ #define HTYPE hash_netport #define IP_SET_HASH_WITH_PROTO #define IP_SET_HASH_WITH_NETS /* We squeeze the "nomatch" flag into cidr: we don't support cidr == 0 * However this way we have to store internally cidr - 1, * dancing back and forth. */ #define IP_SET_HASH_WITH_NETS_PACKED /* IPv4 variant */ /* Member elements */ struct hash_netport4_elem { __be32 ip; __be16 port; u8 proto; u8 cidr:7; u8 nomatch:1; }; /* Common functions */ static inline bool hash_netport4_data_equal(const struct hash_netport4_elem *ip1, const struct hash_netport4_elem *ip2, u32 *multi) { return ip1->ip == ip2->ip && ip1->port == ip2->port && ip1->proto == ip2->proto && ip1->cidr == ip2->cidr; } static inline int hash_netport4_do_data_match(const struct hash_netport4_elem *elem) { return elem->nomatch ? -ENOTEMPTY : 1; } static inline void hash_netport4_data_set_flags(struct hash_netport4_elem *elem, u32 flags) { elem->nomatch = !!((flags >> 16) & IPSET_FLAG_NOMATCH); } static inline void hash_netport4_data_reset_flags(struct hash_netport4_elem *elem, u8 *flags) { swap(*flags, elem->nomatch); } static inline void hash_netport4_data_netmask(struct hash_netport4_elem *elem, u8 cidr) { elem->ip &= ip_set_netmask(cidr); elem->cidr = cidr - 1; } static bool hash_netport4_data_list(struct sk_buff *skb, const struct hash_netport4_elem *data) { u32 flags = data->nomatch ? IPSET_FLAG_NOMATCH : 0; if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, data->ip) || nla_put_net16(skb, IPSET_ATTR_PORT, data->port) || nla_put_u8(skb, IPSET_ATTR_CIDR, data->cidr + 1) || nla_put_u8(skb, IPSET_ATTR_PROTO, data->proto) || (flags && nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(flags)))) goto nla_put_failure; return false; nla_put_failure: return true; } static inline void hash_netport4_data_next(struct hash_netport4_elem *next, const struct hash_netport4_elem *d) { next->ip = d->ip; next->port = d->port; } #define MTYPE hash_netport4 #define HOST_MASK 32 #include "ip_set_hash_gen.h" static int hash_netport4_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { const struct hash_netport4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netport4_elem e = { .cidr = INIT_CIDR(h->nets[0].cidr[0], HOST_MASK), }; struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (adt == IPSET_TEST) e.cidr = HOST_MASK - 1; if (!ip_set_get_ip4_port(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.port, &e.proto)) return -EINVAL; ip4addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip); e.ip &= ip_set_netmask(e.cidr + 1); return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); } static int hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { const struct hash_netport4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netport4_elem e = { .cidr = HOST_MASK - 1 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 port, port_to, p = 0, ip = 0, ip_to = 0; bool with_ports = false; u8 cidr; int ret; if (tb[IPSET_ATTR_LINENO]) *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); if (unlikely(!tb[IPSET_ATTR_IP] || !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) || !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) || !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS))) return -IPSET_ERR_PROTOCOL; ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP], &ip); if (ret) return ret; ret = ip_set_get_extensions(set, tb, &ext); if (ret) return ret; if (tb[IPSET_ATTR_CIDR]) { cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); if (!cidr || cidr > HOST_MASK) return -IPSET_ERR_INVALID_CIDR; e.cidr = cidr - 1; } e.port = nla_get_be16(tb[IPSET_ATTR_PORT]); if (tb[IPSET_ATTR_PROTO]) { e.proto = nla_get_u8(tb[IPSET_ATTR_PROTO]); with_ports = ip_set_proto_with_ports(e.proto); if (e.proto == 0) return -IPSET_ERR_INVALID_PROTO; } else { return -IPSET_ERR_MISSING_PROTO; } if (!(with_ports || e.proto == IPPROTO_ICMP)) e.port = 0; with_ports = with_ports && tb[IPSET_ATTR_PORT_TO]; if (tb[IPSET_ATTR_CADT_FLAGS]) { u32 cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); if (cadt_flags & IPSET_FLAG_NOMATCH) flags |= (IPSET_FLAG_NOMATCH << 16); } if (adt == IPSET_TEST || !(with_ports || tb[IPSET_ATTR_IP_TO])) { e.ip = htonl(ip & ip_set_hostmask(e.cidr + 1)); ret = adtfn(set, &e, &ext, &ext, flags); return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } port = port_to = ntohs(e.port); if (tb[IPSET_ATTR_PORT_TO]) { port_to = ip_set_get_h16(tb[IPSET_ATTR_PORT_TO]); if (port_to < port) swap(port, port_to); } if (tb[IPSET_ATTR_IP_TO]) { ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to); if (ret) return ret; if (ip_to < ip) swap(ip, ip_to); if (ip + UINT_MAX == ip_to) return -IPSET_ERR_HASH_RANGE; } else { ip_set_mask_from_to(ip, ip_to, e.cidr + 1); } if (retried) { ip = ntohl(h->next.ip); p = ntohs(h->next.port); } else { p = port; } do { e.ip = htonl(ip); ip = ip_set_range_to_cidr(ip, ip_to, &cidr); e.cidr = cidr - 1; for (; p <= port_to; p++) { e.port = htons(p); ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) return ret; ret = 0; } p = port; } while (ip++ < ip_to); return ret; } /* IPv6 variant */ struct hash_netport6_elem { union nf_inet_addr ip; __be16 port; u8 proto; u8 cidr:7; u8 nomatch:1; }; /* Common functions */ static inline bool hash_netport6_data_equal(const struct hash_netport6_elem *ip1, const struct hash_netport6_elem *ip2, u32 *multi) { return ipv6_addr_equal(&ip1->ip.in6, &ip2->ip.in6) && ip1->port == ip2->port && ip1->proto == ip2->proto && ip1->cidr == ip2->cidr; } static inline int hash_netport6_do_data_match(const struct hash_netport6_elem *elem) { return elem->nomatch ? -ENOTEMPTY : 1; } static inline void hash_netport6_data_set_flags(struct hash_netport6_elem *elem, u32 flags) { elem->nomatch = !!((flags >> 16) & IPSET_FLAG_NOMATCH); } static inline void hash_netport6_data_reset_flags(struct hash_netport6_elem *elem, u8 *flags) { swap(*flags, elem->nomatch); } static inline void hash_netport6_data_netmask(struct hash_netport6_elem *elem, u8 cidr) { ip6_netmask(&elem->ip, cidr); elem->cidr = cidr - 1; } static bool hash_netport6_data_list(struct sk_buff *skb, const struct hash_netport6_elem *data) { u32 flags = data->nomatch ? IPSET_FLAG_NOMATCH : 0; if (nla_put_ipaddr6(skb, IPSET_ATTR_IP, &data->ip.in6) || nla_put_net16(skb, IPSET_ATTR_PORT, data->port) || nla_put_u8(skb, IPSET_ATTR_CIDR, data->cidr + 1) || nla_put_u8(skb, IPSET_ATTR_PROTO, data->proto) || (flags && nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(flags)))) goto nla_put_failure; return false; nla_put_failure: return true; } static inline void hash_netport6_data_next(struct hash_netport6_elem *next, const struct hash_netport6_elem *d) { next->port = d->port; } #undef MTYPE #undef HOST_MASK #define MTYPE hash_netport6 #define HOST_MASK 128 #define IP_SET_EMIT_CREATE #include "ip_set_hash_gen.h" static int hash_netport6_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { const struct hash_netport6 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netport6_elem e = { .cidr = INIT_CIDR(h->nets[0].cidr[0], HOST_MASK), }; struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (adt == IPSET_TEST) e.cidr = HOST_MASK - 1; if (!ip_set_get_ip6_port(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.port, &e.proto)) return -EINVAL; ip6addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip.in6); ip6_netmask(&e.ip, e.cidr + 1); return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); } static int hash_netport6_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { const struct hash_netport6 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netport6_elem e = { .cidr = HOST_MASK - 1 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 port, port_to; bool with_ports = false; u8 cidr; int ret; if (tb[IPSET_ATTR_LINENO]) *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); if (unlikely(!tb[IPSET_ATTR_IP] || !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) || !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) || !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS))) return -IPSET_ERR_PROTOCOL; if (unlikely(tb[IPSET_ATTR_IP_TO])) return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; ret = ip_set_get_ipaddr6(tb[IPSET_ATTR_IP], &e.ip); if (ret) return ret; ret = ip_set_get_extensions(set, tb, &ext); if (ret) return ret; if (tb[IPSET_ATTR_CIDR]) { cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); if (!cidr || cidr > HOST_MASK) return -IPSET_ERR_INVALID_CIDR; e.cidr = cidr - 1; } ip6_netmask(&e.ip, e.cidr + 1); e.port = nla_get_be16(tb[IPSET_ATTR_PORT]); if (tb[IPSET_ATTR_PROTO]) { e.proto = nla_get_u8(tb[IPSET_ATTR_PROTO]); with_ports = ip_set_proto_with_ports(e.proto); if (e.proto == 0) return -IPSET_ERR_INVALID_PROTO; } else { return -IPSET_ERR_MISSING_PROTO; } if (!(with_ports || e.proto == IPPROTO_ICMPV6)) e.port = 0; if (tb[IPSET_ATTR_CADT_FLAGS]) { u32 cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); if (cadt_flags & IPSET_FLAG_NOMATCH) flags |= (IPSET_FLAG_NOMATCH << 16); } if (adt == IPSET_TEST || !with_ports || !tb[IPSET_ATTR_PORT_TO]) { ret = adtfn(set, &e, &ext, &ext, flags); return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } port = ntohs(e.port); port_to = ip_set_get_h16(tb[IPSET_ATTR_PORT_TO]); if (port > port_to) swap(port, port_to); if (retried) port = ntohs(h->next.port); for (; port <= port_to; port++) { e.port = htons(port); ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) return ret; ret = 0; } return ret; } static struct ip_set_type hash_netport_type __read_mostly = { .name = "hash:net,port", .protocol = IPSET_PROTOCOL, .features = IPSET_TYPE_IP | IPSET_TYPE_PORT | IPSET_TYPE_NOMATCH, .dimension = IPSET_DIM_TWO, .family = NFPROTO_UNSPEC, .revision_min = IPSET_TYPE_REV_MIN, .revision_max = IPSET_TYPE_REV_MAX, .create = hash_netport_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, [IPSET_ATTR_MAXELEM] = { .type = NLA_U32 }, [IPSET_ATTR_PROBES] = { .type = NLA_U8 }, [IPSET_ATTR_RESIZE] = { .type = NLA_U8 }, [IPSET_ATTR_PROTO] = { .type = NLA_U8 }, [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, }, .adt_policy = { [IPSET_ATTR_IP] = { .type = NLA_NESTED }, [IPSET_ATTR_IP_TO] = { .type = NLA_NESTED }, [IPSET_ATTR_PORT] = { .type = NLA_U16 }, [IPSET_ATTR_PORT_TO] = { .type = NLA_U16 }, [IPSET_ATTR_PROTO] = { .type = NLA_U8 }, [IPSET_ATTR_CIDR] = { .type = NLA_U8 }, [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING, .len = IPSET_MAX_COMMENT_SIZE }, [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 }, [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 }, [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 }, }, .me = THIS_MODULE, }; static int __init hash_netport_init(void) { return ip_set_type_register(&hash_netport_type); } static void __exit hash_netport_fini(void) { rcu_barrier(); ip_set_type_unregister(&hash_netport_type); } module_init(hash_netport_init); module_exit(hash_netport_fini);
69 68 68 17 17 49 32 32 32 32 32 36 3 14 14 14 5 5 14 57 7 7 7 7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 /* * videobuf2-vmalloc.c - vmalloc memory allocator for videobuf2 * * Copyright (C) 2010 Samsung Electronics * * Author: Pawel Osciak <pawel@osciak.com> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation. */ #include <linux/io.h> #include <linux/module.h> #include <linux/mm.h> #include <linux/refcount.h> #include <linux/sched.h> #include <linux/slab.h> #include <linux/vmalloc.h> #include <media/videobuf2-v4l2.h> #include <media/videobuf2-vmalloc.h> #include <media/videobuf2-memops.h> struct vb2_vmalloc_buf { void *vaddr; struct frame_vector *vec; enum dma_data_direction dma_dir; unsigned long size; refcount_t refcount; struct vb2_vmarea_handler handler; struct dma_buf *dbuf; }; static void vb2_vmalloc_put(void *buf_priv); static void *vb2_vmalloc_alloc(struct device *dev, unsigned long attrs, unsigned long size, enum dma_data_direction dma_dir, gfp_t gfp_flags) { struct vb2_vmalloc_buf *buf; buf = kzalloc(sizeof(*buf), GFP_KERNEL | gfp_flags); if (!buf) return ERR_PTR(-ENOMEM); buf->size = size; buf->vaddr = vmalloc_user(buf->size); buf->dma_dir = dma_dir; buf->handler.refcount = &buf->refcount; buf->handler.put = vb2_vmalloc_put; buf->handler.arg = buf; if (!buf->vaddr) { pr_debug("vmalloc of size %ld failed\n", buf->size); kfree(buf); return ERR_PTR(-ENOMEM); } refcount_set(&buf->refcount, 1); return buf; } static void vb2_vmalloc_put(void *buf_priv) { struct vb2_vmalloc_buf *buf = buf_priv; if (refcount_dec_and_test(&buf->refcount)) { vfree(buf->vaddr); kfree(buf); } } static void *vb2_vmalloc_get_userptr(struct device *dev, unsigned long vaddr, unsigned long size, enum dma_data_direction dma_dir) { struct vb2_vmalloc_buf *buf; struct frame_vector *vec; int n_pages, offset, i; int ret = -ENOMEM; buf = kzalloc(sizeof(*buf), GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM); buf->dma_dir = dma_dir; offset = vaddr & ~PAGE_MASK; buf->size = size; vec = vb2_create_framevec(vaddr, size, dma_dir == DMA_FROM_DEVICE || dma_dir == DMA_BIDIRECTIONAL); if (IS_ERR(vec)) { ret = PTR_ERR(vec); goto fail_pfnvec_create; } buf->vec = vec; n_pages = frame_vector_count(vec); if (frame_vector_to_pages(vec) < 0) { unsigned long *nums = frame_vector_pfns(vec); /* * We cannot get page pointers for these pfns. Check memory is * physically contiguous and use direct mapping. */ for (i = 1; i < n_pages; i++) if (nums[i-1] + 1 != nums[i]) goto fail_map; buf->vaddr = (__force void *) ioremap_nocache(__pfn_to_phys(nums[0]), size + offset); } else { buf->vaddr = vm_map_ram(frame_vector_pages(vec), n_pages, -1, PAGE_KERNEL); } if (!buf->vaddr) goto fail_map; buf->vaddr += offset; return buf; fail_map: vb2_destroy_framevec(vec); fail_pfnvec_create: kfree(buf); return ERR_PTR(ret); } static void vb2_vmalloc_put_userptr(void *buf_priv) { struct vb2_vmalloc_buf *buf = buf_priv; unsigned long vaddr = (unsigned long)buf->vaddr & PAGE_MASK; unsigned int i; struct page **pages; unsigned int n_pages; if (!buf->vec->is_pfns) { n_pages = frame_vector_count(buf->vec); pages = frame_vector_pages(buf->vec); if (vaddr) vm_unmap_ram((void *)vaddr, n_pages); if (buf->dma_dir == DMA_FROM_DEVICE || buf->dma_dir == DMA_BIDIRECTIONAL) for (i = 0; i < n_pages; i++) set_page_dirty_lock(pages[i]); } else { iounmap((__force void __iomem *)buf->vaddr); } vb2_destroy_framevec(buf->vec); kfree(buf); } static void *vb2_vmalloc_vaddr(void *buf_priv) { struct vb2_vmalloc_buf *buf = buf_priv; if (!buf->vaddr) { pr_err("Address of an unallocated plane requested or cannot map user pointer\n"); return NULL; } return buf->vaddr; } static unsigned int vb2_vmalloc_num_users(void *buf_priv) { struct vb2_vmalloc_buf *buf = buf_priv; return refcount_read(&buf->refcount); } static int vb2_vmalloc_mmap(void *buf_priv, struct vm_area_struct *vma) { struct vb2_vmalloc_buf *buf = buf_priv; int ret; if (!buf) { pr_err("No memory to map\n"); return -EINVAL; } ret = remap_vmalloc_range(vma, buf->vaddr, 0); if (ret) { pr_err("Remapping vmalloc memory, error: %d\n", ret); return ret; } /* * Make sure that vm_areas for 2 buffers won't be merged together */ vma->vm_flags |= VM_DONTEXPAND; /* * Use common vm_area operations to track buffer refcount. */ vma->vm_private_data = &buf->handler; vma->vm_ops = &vb2_common_vm_ops; vma->vm_ops->open(vma); return 0; } #ifdef CONFIG_HAS_DMA /*********************************************/ /* DMABUF ops for exporters */ /*********************************************/ struct vb2_vmalloc_attachment { struct sg_table sgt; enum dma_data_direction dma_dir; }; static int vb2_vmalloc_dmabuf_ops_attach(struct dma_buf *dbuf, struct dma_buf_attachment *dbuf_attach) { struct vb2_vmalloc_attachment *attach; struct vb2_vmalloc_buf *buf = dbuf->priv; int num_pages = PAGE_ALIGN(buf->size) / PAGE_SIZE; struct sg_table *sgt; struct scatterlist *sg; void *vaddr = buf->vaddr; int ret; int i; attach = kzalloc(sizeof(*attach), GFP_KERNEL); if (!attach) return -ENOMEM; sgt = &attach->sgt; ret = sg_alloc_table(sgt, num_pages, GFP_KERNEL); if (ret) { kfree(attach); return ret; } for_each_sg(sgt->sgl, sg, sgt->nents, i) { struct page *page = vmalloc_to_page(vaddr); if (!page) { sg_free_table(sgt); kfree(attach); return -ENOMEM; } sg_set_page(sg, page, PAGE_SIZE, 0); vaddr += PAGE_SIZE; } attach->dma_dir = DMA_NONE; dbuf_attach->priv = attach; return 0; } static void vb2_vmalloc_dmabuf_ops_detach(struct dma_buf *dbuf, struct dma_buf_attachment *db_attach) { struct vb2_vmalloc_attachment *attach = db_attach->priv; struct sg_table *sgt; if (!attach) return; sgt = &attach->sgt; /* release the scatterlist cache */ if (attach->dma_dir != DMA_NONE) dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->orig_nents, attach->dma_dir); sg_free_table(sgt); kfree(attach); db_attach->priv = NULL; } static struct sg_table *vb2_vmalloc_dmabuf_ops_map( struct dma_buf_attachment *db_attach, enum dma_data_direction dma_dir) { struct vb2_vmalloc_attachment *attach = db_attach->priv; /* stealing dmabuf mutex to serialize map/unmap operations */ struct mutex *lock = &db_attach->dmabuf->lock; struct sg_table *sgt; mutex_lock(lock); sgt = &attach->sgt; /* return previously mapped sg table */ if (attach->dma_dir == dma_dir) { mutex_unlock(lock); return sgt; } /* release any previous cache */ if (attach->dma_dir != DMA_NONE) { dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->orig_nents, attach->dma_dir); attach->dma_dir = DMA_NONE; } /* mapping to the client with new direction */ sgt->nents = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents, dma_dir); if (!sgt->nents) { pr_err("failed to map scatterlist\n"); mutex_unlock(lock); return ERR_PTR(-EIO); } attach->dma_dir = dma_dir; mutex_unlock(lock); return sgt; } static void vb2_vmalloc_dmabuf_ops_unmap(struct dma_buf_attachment *db_attach, struct sg_table *sgt, enum dma_data_direction dma_dir) { /* nothing to be done here */ } static void vb2_vmalloc_dmabuf_ops_release(struct dma_buf *dbuf) { /* drop reference obtained in vb2_vmalloc_get_dmabuf */ vb2_vmalloc_put(dbuf->priv); } static void *vb2_vmalloc_dmabuf_ops_kmap(struct dma_buf *dbuf, unsigned long pgnum) { struct vb2_vmalloc_buf *buf = dbuf->priv; return buf->vaddr + pgnum * PAGE_SIZE; } static void *vb2_vmalloc_dmabuf_ops_vmap(struct dma_buf *dbuf) { struct vb2_vmalloc_buf *buf = dbuf->priv; return buf->vaddr; } static int vb2_vmalloc_dmabuf_ops_mmap(struct dma_buf *dbuf, struct vm_area_struct *vma) { return vb2_vmalloc_mmap(dbuf->priv, vma); } static const struct dma_buf_ops vb2_vmalloc_dmabuf_ops = { .attach = vb2_vmalloc_dmabuf_ops_attach, .detach = vb2_vmalloc_dmabuf_ops_detach, .map_dma_buf = vb2_vmalloc_dmabuf_ops_map, .unmap_dma_buf = vb2_vmalloc_dmabuf_ops_unmap, .map = vb2_vmalloc_dmabuf_ops_kmap, .vmap = vb2_vmalloc_dmabuf_ops_vmap, .mmap = vb2_vmalloc_dmabuf_ops_mmap, .release = vb2_vmalloc_dmabuf_ops_release, }; static struct dma_buf *vb2_vmalloc_get_dmabuf(void *buf_priv, unsigned long flags) { struct vb2_vmalloc_buf *buf = buf_priv; struct dma_buf *dbuf; DEFINE_DMA_BUF_EXPORT_INFO(exp_info); exp_info.ops = &vb2_vmalloc_dmabuf_ops; exp_info.size = buf->size; exp_info.flags = flags; exp_info.priv = buf; if (WARN_ON(!buf->vaddr)) return NULL; dbuf = dma_buf_export(&exp_info); if (IS_ERR(dbuf)) return NULL; /* dmabuf keeps reference to vb2 buffer */ refcount_inc(&buf->refcount); return dbuf; } #endif /* CONFIG_HAS_DMA */ /*********************************************/ /* callbacks for DMABUF buffers */ /*********************************************/ static int vb2_vmalloc_map_dmabuf(void *mem_priv) { struct vb2_vmalloc_buf *buf = mem_priv; buf->vaddr = dma_buf_vmap(buf->dbuf); return buf->vaddr ? 0 : -EFAULT; } static void vb2_vmalloc_unmap_dmabuf(void *mem_priv) { struct vb2_vmalloc_buf *buf = mem_priv; dma_buf_vunmap(buf->dbuf, buf->vaddr); buf->vaddr = NULL; } static void vb2_vmalloc_detach_dmabuf(void *mem_priv) { struct vb2_vmalloc_buf *buf = mem_priv; if (buf->vaddr) dma_buf_vunmap(buf->dbuf, buf->vaddr); kfree(buf); } static void *vb2_vmalloc_attach_dmabuf(struct device *dev, struct dma_buf *dbuf, unsigned long size, enum dma_data_direction dma_dir) { struct vb2_vmalloc_buf *buf; if (dbuf->size < size) return ERR_PTR(-EFAULT); buf = kzalloc(sizeof(*buf), GFP_KERNEL); if (!buf) return ERR_PTR(-ENOMEM); buf->dbuf = dbuf; buf->dma_dir = dma_dir; buf->size = size; return buf; } const struct vb2_mem_ops vb2_vmalloc_memops = { .alloc = vb2_vmalloc_alloc, .put = vb2_vmalloc_put, .get_userptr = vb2_vmalloc_get_userptr, .put_userptr = vb2_vmalloc_put_userptr, #ifdef CONFIG_HAS_DMA .get_dmabuf = vb2_vmalloc_get_dmabuf, #endif .map_dmabuf = vb2_vmalloc_map_dmabuf, .unmap_dmabuf = vb2_vmalloc_unmap_dmabuf, .attach_dmabuf = vb2_vmalloc_attach_dmabuf, .detach_dmabuf = vb2_vmalloc_detach_dmabuf, .vaddr = vb2_vmalloc_vaddr, .mmap = vb2_vmalloc_mmap, .num_users = vb2_vmalloc_num_users, }; EXPORT_SYMBOL_GPL(vb2_vmalloc_memops); MODULE_DESCRIPTION("vmalloc memory handling routines for videobuf2"); MODULE_AUTHOR("Pawel Osciak <pawel@osciak.com>"); MODULE_LICENSE("GPL");
2086 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 /* * pm_wakeup.h - Power management wakeup interface * * Copyright (C) 2008 Alan Stern * Copyright (C) 2010 Rafael J. Wysocki, Novell Inc. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #ifndef _LINUX_PM_WAKEUP_H #define _LINUX_PM_WAKEUP_H #ifndef _DEVICE_H_ # error "please don't include this file directly" #endif #include <linux/types.h> struct wake_irq; /** * struct wakeup_source - Representation of wakeup sources * * @name: Name of the wakeup source * @entry: Wakeup source list entry * @lock: Wakeup source lock * @wakeirq: Optional device specific wakeirq * @timer: Wakeup timer list * @timer_expires: Wakeup timer expiration * @total_time: Total time this wakeup source has been active. * @max_time: Maximum time this wakeup source has been continuously active. * @last_time: Monotonic clock when the wakeup source's was touched last time. * @prevent_sleep_time: Total time this source has been preventing autosleep. * @event_count: Number of signaled wakeup events. * @active_count: Number of times the wakeup source was activated. * @relax_count: Number of times the wakeup source was deactivated. * @expire_count: Number of times the wakeup source's timeout has expired. * @wakeup_count: Number of times the wakeup source might abort suspend. * @active: Status of the wakeup source. * @has_timeout: The wakeup source has been activated with a timeout. */ struct wakeup_source { const char *name; struct list_head entry; spinlock_t lock; struct wake_irq *wakeirq; struct timer_list timer; unsigned long timer_expires; ktime_t total_time; ktime_t max_time; ktime_t last_time; ktime_t start_prevent_time; ktime_t prevent_sleep_time; unsigned long event_count; unsigned long active_count; unsigned long relax_count; unsigned long expire_count; unsigned long wakeup_count; bool active:1; bool autosleep_enabled:1; }; #ifdef CONFIG_PM_SLEEP /* * Changes to device_may_wakeup take effect on the next pm state change. */ static inline bool device_can_wakeup(struct device *dev) { return dev->power.can_wakeup; } static inline bool device_may_wakeup(struct device *dev) { return dev->power.can_wakeup && !!dev->power.wakeup; } static inline void device_set_wakeup_path(struct device *dev) { dev->power.wakeup_path = true; } /* drivers/base/power/wakeup.c */ extern void wakeup_source_prepare(struct wakeup_source *ws, const char *name); extern struct wakeup_source *wakeup_source_create(const char *name); extern void wakeup_source_drop(struct wakeup_source *ws); extern void wakeup_source_destroy(struct wakeup_source *ws); extern void wakeup_source_add(struct wakeup_source *ws); extern void wakeup_source_remove(struct wakeup_source *ws); extern struct wakeup_source *wakeup_source_register(const char *name); extern void wakeup_source_unregister(struct wakeup_source *ws); extern int device_wakeup_enable(struct device *dev); extern int device_wakeup_disable(struct device *dev); extern void device_set_wakeup_capable(struct device *dev, bool capable); extern int device_init_wakeup(struct device *dev, bool val); extern int device_set_wakeup_enable(struct device *dev, bool enable); extern void __pm_stay_awake(struct wakeup_source *ws); extern void pm_stay_awake(struct device *dev); extern void __pm_relax(struct wakeup_source *ws); extern void pm_relax(struct device *dev); extern void pm_wakeup_ws_event(struct wakeup_source *ws, unsigned int msec, bool hard); extern void pm_wakeup_dev_event(struct device *dev, unsigned int msec, bool hard); #else /* !CONFIG_PM_SLEEP */ static inline void device_set_wakeup_capable(struct device *dev, bool capable) { dev->power.can_wakeup = capable; } static inline bool device_can_wakeup(struct device *dev) { return dev->power.can_wakeup; } static inline void wakeup_source_prepare(struct wakeup_source *ws, const char *name) {} static inline struct wakeup_source *wakeup_source_create(const char *name) { return NULL; } static inline void wakeup_source_drop(struct wakeup_source *ws) {} static inline void wakeup_source_destroy(struct wakeup_source *ws) {} static inline void wakeup_source_add(struct wakeup_source *ws) {} static inline void wakeup_source_remove(struct wakeup_source *ws) {} static inline struct wakeup_source *wakeup_source_register(const char *name) { return NULL; } static inline void wakeup_source_unregister(struct wakeup_source *ws) {} static inline int device_wakeup_enable(struct device *dev) { dev->power.should_wakeup = true; return 0; } static inline int device_wakeup_disable(struct device *dev) { dev->power.should_wakeup = false; return 0; } static inline int device_set_wakeup_enable(struct device *dev, bool enable) { dev->power.should_wakeup = enable; return 0; } static inline int device_init_wakeup(struct device *dev, bool val) { device_set_wakeup_capable(dev, val); device_set_wakeup_enable(dev, val); return 0; } static inline bool device_may_wakeup(struct device *dev) { return dev->power.can_wakeup && dev->power.should_wakeup; } static inline void device_set_wakeup_path(struct device *dev) {} static inline void __pm_stay_awake(struct wakeup_source *ws) {} static inline void pm_stay_awake(struct device *dev) {} static inline void __pm_relax(struct wakeup_source *ws) {} static inline void pm_relax(struct device *dev) {} static inline void pm_wakeup_ws_event(struct wakeup_source *ws, unsigned int msec, bool hard) {} static inline void pm_wakeup_dev_event(struct device *dev, unsigned int msec, bool hard) {} #endif /* !CONFIG_PM_SLEEP */ static inline void wakeup_source_init(struct wakeup_source *ws, const char *name) { wakeup_source_prepare(ws, name); wakeup_source_add(ws); } static inline void wakeup_source_trash(struct wakeup_source *ws) { wakeup_source_remove(ws); wakeup_source_drop(ws); } static inline void __pm_wakeup_event(struct wakeup_source *ws, unsigned int msec) { return pm_wakeup_ws_event(ws, msec, false); } static inline void pm_wakeup_event(struct device *dev, unsigned int msec) { return pm_wakeup_dev_event(dev, msec, false); } static inline void pm_wakeup_hard_event(struct device *dev) { return pm_wakeup_dev_event(dev, 0, true); } #endif /* _LINUX_PM_WAKEUP_H */
1 433 433 597 56 56 56 54 31 30 24 12 12 187 187 186 82 114 113 50 583 260 254 127 8 125 3908 3045 1567 3903 187 82 445 56 30 141 902 597 597 597 3906 3471 540 584 585 574 575 41 41 41 24 23 18 2 2 17 12 12 12 12 12 128 128 128 56 82 82 47 47 47 3904 3905 3399 752 347 416 445 127 395 41 79 817 3904 585 585 408 627 3468 540 1 3 1 440 440 440 434 431 430 429 429 107 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 /* * IPVS An implementation of the IP virtual server support for the * LINUX operating system. IPVS is now implemented as a module * over the Netfilter framework. IPVS can be used to build a * high-performance and highly available server based on a * cluster of servers. * * Authors: Wensong Zhang <wensong@linuxvirtualserver.org> * Peter Kese <peter.kese@ijs.si> * Julian Anastasov <ja@ssi.bg> * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * * The IPVS code for kernel 2.2 was done by Wensong Zhang and Peter Kese, * with changes/fixes from Julian Anastasov, Lars Marowsky-Bree, Horms * and others. * * Changes: * Paul `Rusty' Russell properly handle non-linear skbs * Harald Welte don't use nfcache * */ #define KMSG_COMPONENT "IPVS" #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt #include <linux/module.h> #include <linux/kernel.h> #include <linux/ip.h> #include <linux/tcp.h> #include <linux/sctp.h> #include <linux/icmp.h> #include <linux/slab.h> #include <net/ip.h> #include <net/tcp.h> #include <net/udp.h> #include <net/icmp.h> /* for icmp_send */ #include <net/route.h> #include <net/ip6_checksum.h> #include <net/netns/generic.h> /* net_generic() */ #include <linux/netfilter.h> #include <linux/netfilter_ipv4.h> #ifdef CONFIG_IP_VS_IPV6 #include <net/ipv6.h> #include <linux/netfilter_ipv6.h> #include <net/ip6_route.h> #endif #include <net/ip_vs.h> EXPORT_SYMBOL(register_ip_vs_scheduler); EXPORT_SYMBOL(unregister_ip_vs_scheduler); EXPORT_SYMBOL(ip_vs_proto_name); EXPORT_SYMBOL(ip_vs_conn_new); EXPORT_SYMBOL(ip_vs_conn_in_get); EXPORT_SYMBOL(ip_vs_conn_out_get); #ifdef CONFIG_IP_VS_PROTO_TCP EXPORT_SYMBOL(ip_vs_tcp_conn_listen); #endif EXPORT_SYMBOL(ip_vs_conn_put); #ifdef CONFIG_IP_VS_DEBUG EXPORT_SYMBOL(ip_vs_get_debug_level); #endif EXPORT_SYMBOL(ip_vs_new_conn_out); static unsigned int ip_vs_net_id __read_mostly; /* netns cnt used for uniqueness */ static atomic_t ipvs_netns_cnt = ATOMIC_INIT(0); /* ID used in ICMP lookups */ #define icmp_id(icmph) (((icmph)->un).echo.id) #define icmpv6_id(icmph) (icmph->icmp6_dataun.u_echo.identifier) const char *ip_vs_proto_name(unsigned int proto) { static char buf[20]; switch (proto) { case IPPROTO_IP: return "IP"; case IPPROTO_UDP: return "UDP"; case IPPROTO_TCP: return "TCP"; case IPPROTO_SCTP: return "SCTP"; case IPPROTO_ICMP: return "ICMP"; #ifdef CONFIG_IP_VS_IPV6 case IPPROTO_ICMPV6: return "ICMPv6"; #endif default: sprintf(buf, "IP_%u", proto); return buf; } } void ip_vs_init_hash_table(struct list_head *table, int rows) { while (--rows >= 0) INIT_LIST_HEAD(&table[rows]); } static inline void ip_vs_in_stats(struct ip_vs_conn *cp, struct sk_buff *skb) { struct ip_vs_dest *dest = cp->dest; struct netns_ipvs *ipvs = cp->ipvs; if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) { struct ip_vs_cpu_stats *s; struct ip_vs_service *svc; local_bh_disable(); s = this_cpu_ptr(dest->stats.cpustats); u64_stats_update_begin(&s->syncp); s->cnt.inpkts++; s->cnt.inbytes += skb->len; u64_stats_update_end(&s->syncp); svc = rcu_dereference(dest->svc); s = this_cpu_ptr(svc->stats.cpustats); u64_stats_update_begin(&s->syncp); s->cnt.inpkts++; s->cnt.inbytes += skb->len; u64_stats_update_end(&s->syncp); s = this_cpu_ptr(ipvs->tot_stats.cpustats); u64_stats_update_begin(&s->syncp); s->cnt.inpkts++; s->cnt.inbytes += skb->len; u64_stats_update_end(&s->syncp); local_bh_enable(); } } static inline void ip_vs_out_stats(struct ip_vs_conn *cp, struct sk_buff *skb) { struct ip_vs_dest *dest = cp->dest; struct netns_ipvs *ipvs = cp->ipvs; if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) { struct ip_vs_cpu_stats *s; struct ip_vs_service *svc; local_bh_disable(); s = this_cpu_ptr(dest->stats.cpustats); u64_stats_update_begin(&s->syncp); s->cnt.outpkts++; s->cnt.outbytes += skb->len; u64_stats_update_end(&s->syncp); svc = rcu_dereference(dest->svc); s = this_cpu_ptr(svc->stats.cpustats); u64_stats_update_begin(&s->syncp); s->cnt.outpkts++; s->cnt.outbytes += skb->len; u64_stats_update_end(&s->syncp); s = this_cpu_ptr(ipvs->tot_stats.cpustats); u64_stats_update_begin(&s->syncp); s->cnt.outpkts++; s->cnt.outbytes += skb->len; u64_stats_update_end(&s->syncp); local_bh_enable(); } } static inline void ip_vs_conn_stats(struct ip_vs_conn *cp, struct ip_vs_service *svc) { struct netns_ipvs *ipvs = svc->ipvs; struct ip_vs_cpu_stats *s; local_bh_disable(); s = this_cpu_ptr(cp->dest->stats.cpustats); u64_stats_update_begin(&s->syncp); s->cnt.conns++; u64_stats_update_end(&s->syncp); s = this_cpu_ptr(svc->stats.cpustats); u64_stats_update_begin(&s->syncp); s->cnt.conns++; u64_stats_update_end(&s->syncp); s = this_cpu_ptr(ipvs->tot_stats.cpustats); u64_stats_update_begin(&s->syncp); s->cnt.conns++; u64_stats_update_end(&s->syncp); local_bh_enable(); } static inline void ip_vs_set_state(struct ip_vs_conn *cp, int direction, const struct sk_buff *skb, struct ip_vs_proto_data *pd) { if (likely(pd->pp->state_transition)) pd->pp->state_transition(cp, direction, skb, pd); } static inline int ip_vs_conn_fill_param_persist(const struct ip_vs_service *svc, struct sk_buff *skb, int protocol, const union nf_inet_addr *caddr, __be16 cport, const union nf_inet_addr *vaddr, __be16 vport, struct ip_vs_conn_param *p) { ip_vs_conn_fill_param(svc->ipvs, svc->af, protocol, caddr, cport, vaddr, vport, p); p->pe = rcu_dereference(svc->pe); if (p->pe && p->pe->fill_param) return p->pe->fill_param(p, skb); return 0; } /* * IPVS persistent scheduling function * It creates a connection entry according to its template if exists, * or selects a server and creates a connection entry plus a template. * Locking: we are svc user (svc->refcnt), so we hold all dests too * Protocols supported: TCP, UDP */ static struct ip_vs_conn * ip_vs_sched_persist(struct ip_vs_service *svc, struct sk_buff *skb, __be16 src_port, __be16 dst_port, int *ignored, struct ip_vs_iphdr *iph) { struct ip_vs_conn *cp = NULL; struct ip_vs_dest *dest; struct ip_vs_conn *ct; __be16 dport = 0; /* destination port to forward */ unsigned int flags; struct ip_vs_conn_param param; const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) }; union nf_inet_addr snet; /* source network of the client, after masking */ const union nf_inet_addr *src_addr, *dst_addr; if (likely(!ip_vs_iph_inverse(iph))) { src_addr = &iph->saddr; dst_addr = &iph->daddr; } else { src_addr = &iph->daddr; dst_addr = &iph->saddr; } /* Mask saddr with the netmask to adjust template granularity */ #ifdef CONFIG_IP_VS_IPV6 if (svc->af == AF_INET6) ipv6_addr_prefix(&snet.in6, &src_addr->in6, (__force __u32) svc->netmask); else #endif snet.ip = src_addr->ip & svc->netmask; IP_VS_DBG_BUF(6, "p-schedule: src %s:%u dest %s:%u " "mnet %s\n", IP_VS_DBG_ADDR(svc->af, src_addr), ntohs(src_port), IP_VS_DBG_ADDR(svc->af, dst_addr), ntohs(dst_port), IP_VS_DBG_ADDR(svc->af, &snet)); /* * As far as we know, FTP is a very complicated network protocol, and * it uses control connection and data connections. For active FTP, * FTP server initialize data connection to the client, its source port * is often 20. For passive FTP, FTP server tells the clients the port * that it passively listens to, and the client issues the data * connection. In the tunneling or direct routing mode, the load * balancer is on the client-to-server half of connection, the port * number is unknown to the load balancer. So, a conn template like * <caddr, 0, vaddr, 0, daddr, 0> is created for persistent FTP * service, and a template like <caddr, 0, vaddr, vport, daddr, dport> * is created for other persistent services. */ { int protocol = iph->protocol; const union nf_inet_addr *vaddr = dst_addr; __be16 vport = 0; if (dst_port == svc->port) { /* non-FTP template: * <protocol, caddr, 0, vaddr, vport, daddr, dport> * FTP template: * <protocol, caddr, 0, vaddr, 0, daddr, 0> */ if (svc->port != FTPPORT) vport = dst_port; } else { /* Note: persistent fwmark-based services and * persistent port zero service are handled here. * fwmark template: * <IPPROTO_IP,caddr,0,fwmark,0,daddr,0> * port zero template: * <protocol,caddr,0,vaddr,0,daddr,0> */ if (svc->fwmark) { protocol = IPPROTO_IP; vaddr = &fwmark; } } /* return *ignored = -1 so NF_DROP can be used */ if (ip_vs_conn_fill_param_persist(svc, skb, protocol, &snet, 0, vaddr, vport, &param) < 0) { *ignored = -1; return NULL; } } /* Check if a template already exists */ ct = ip_vs_ct_in_get(&param); if (!ct || !ip_vs_check_template(ct, NULL)) { struct ip_vs_scheduler *sched; /* * No template found or the dest of the connection * template is not available. * return *ignored=0 i.e. ICMP and NF_DROP */ sched = rcu_dereference(svc->scheduler); if (sched) { /* read svc->sched_data after svc->scheduler */ smp_rmb(); dest = sched->schedule(svc, skb, iph); } else { dest = NULL; } if (!dest) { IP_VS_DBG(1, "p-schedule: no dest found.\n"); kfree(param.pe_data); *ignored = 0; return NULL; } if (dst_port == svc->port && svc->port != FTPPORT) dport = dest->port; /* Create a template * This adds param.pe_data to the template, * and thus param.pe_data will be destroyed * when the template expires */ ct = ip_vs_conn_new(&param, dest->af, &dest->addr, dport, IP_VS_CONN_F_TEMPLATE, dest, skb->mark); if (ct == NULL) { kfree(param.pe_data); *ignored = -1; return NULL; } ct->timeout = svc->timeout; } else { /* set destination with the found template */ dest = ct->dest; kfree(param.pe_data); } dport = dst_port; if (dport == svc->port && dest->port) dport = dest->port; flags = (svc->flags & IP_VS_SVC_F_ONEPACKET && iph->protocol == IPPROTO_UDP) ? IP_VS_CONN_F_ONE_PACKET : 0; /* * Create a new connection according to the template */ ip_vs_conn_fill_param(svc->ipvs, svc->af, iph->protocol, src_addr, src_port, dst_addr, dst_port, &param); cp = ip_vs_conn_new(&param, dest->af, &dest->addr, dport, flags, dest, skb->mark); if (cp == NULL) { ip_vs_conn_put(ct); *ignored = -1; return NULL; } /* * Add its control */ ip_vs_control_add(cp, ct); ip_vs_conn_put(ct); ip_vs_conn_stats(cp, svc); return cp; } /* * IPVS main scheduling function * It selects a server according to the virtual service, and * creates a connection entry. * Protocols supported: TCP, UDP * * Usage of *ignored * * 1 : protocol tried to schedule (eg. on SYN), found svc but the * svc/scheduler decides that this packet should be accepted with * NF_ACCEPT because it must not be scheduled. * * 0 : scheduler can not find destination, so try bypass or * return ICMP and then NF_DROP (ip_vs_leave). * * -1 : scheduler tried to schedule but fatal error occurred, eg. * ip_vs_conn_new failure (ENOMEM) or ip_vs_sip_fill_param * failure such as missing Call-ID, ENOMEM on skb_linearize * or pe_data. In this case we should return NF_DROP without * any attempts to send ICMP with ip_vs_leave. */ struct ip_vs_conn * ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb, struct ip_vs_proto_data *pd, int *ignored, struct ip_vs_iphdr *iph) { struct ip_vs_protocol *pp = pd->pp; struct ip_vs_conn *cp = NULL; struct ip_vs_scheduler *sched; struct ip_vs_dest *dest; __be16 _ports[2], *pptr, cport, vport; const void *caddr, *vaddr; unsigned int flags; *ignored = 1; /* * IPv6 frags, only the first hit here. */ pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports); if (pptr == NULL) return NULL; if (likely(!ip_vs_iph_inverse(iph))) { cport = pptr[0]; caddr = &iph->saddr; vport = pptr[1]; vaddr = &iph->daddr; } else { cport = pptr[1]; caddr = &iph->daddr; vport = pptr[0]; vaddr = &iph->saddr; } /* * FTPDATA needs this check when using local real server. * Never schedule Active FTPDATA connections from real server. * For LVS-NAT they must be already created. For other methods * with persistence the connection is created on SYN+ACK. */ if (cport == FTPDATA) { IP_VS_DBG_PKT(12, svc->af, pp, skb, iph->off, "Not scheduling FTPDATA"); return NULL; } /* * Do not schedule replies from local real server. */ if ((!skb->dev || skb->dev->flags & IFF_LOOPBACK)) { iph->hdr_flags ^= IP_VS_HDR_INVERSE; cp = pp->conn_in_get(svc->ipvs, svc->af, skb, iph); iph->hdr_flags ^= IP_VS_HDR_INVERSE; if (cp) { IP_VS_DBG_PKT(12, svc->af, pp, skb, iph->off, "Not scheduling reply for existing" " connection"); __ip_vs_conn_put(cp); return NULL; } } /* * Persistent service */ if (svc->flags & IP_VS_SVC_F_PERSISTENT) return ip_vs_sched_persist(svc, skb, cport, vport, ignored, iph); *ignored = 0; /* * Non-persistent service */ if (!svc->fwmark && vport != svc->port) { if (!svc->port) pr_err("Schedule: port zero only supported " "in persistent services, " "check your ipvs configuration\n"); return NULL; } sched = rcu_dereference(svc->scheduler); if (sched) { /* read svc->sched_data after svc->scheduler */ smp_rmb(); dest = sched->schedule(svc, skb, iph); } else { dest = NULL; } if (dest == NULL) { IP_VS_DBG(1, "Schedule: no dest found.\n"); return NULL; } flags = (svc->flags & IP_VS_SVC_F_ONEPACKET && iph->protocol == IPPROTO_UDP) ? IP_VS_CONN_F_ONE_PACKET : 0; /* * Create a connection entry. */ { struct ip_vs_conn_param p; ip_vs_conn_fill_param(svc->ipvs, svc->af, iph->protocol, caddr, cport, vaddr, vport, &p); cp = ip_vs_conn_new(&p, dest->af, &dest->addr, dest->port ? dest->port : vport, flags, dest, skb->mark); if (!cp) { *ignored = -1; return NULL; } } IP_VS_DBG_BUF(6, "Schedule fwd:%c c:%s:%u v:%s:%u " "d:%s:%u conn->flags:%X conn->refcnt:%d\n", ip_vs_fwd_tag(cp), IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport), IP_VS_DBG_ADDR(cp->daf, &cp->daddr), ntohs(cp->dport), cp->flags, refcount_read(&cp->refcnt)); ip_vs_conn_stats(cp, svc); return cp; } static inline int ip_vs_addr_is_unicast(struct net *net, int af, union nf_inet_addr *addr) { #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) return ipv6_addr_type(&addr->in6) & IPV6_ADDR_UNICAST; #endif return (inet_addr_type(net, addr->ip) == RTN_UNICAST); } /* * Pass or drop the packet. * Called by ip_vs_in, when the virtual service is available but * no destination is available for a new connection. */ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb, struct ip_vs_proto_data *pd, struct ip_vs_iphdr *iph) { __be16 _ports[2], *pptr, dport; struct netns_ipvs *ipvs = svc->ipvs; struct net *net = ipvs->net; pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports); if (!pptr) return NF_DROP; dport = likely(!ip_vs_iph_inverse(iph)) ? pptr[1] : pptr[0]; /* if it is fwmark-based service, the cache_bypass sysctl is up and the destination is a non-local unicast, then create a cache_bypass connection entry */ if (sysctl_cache_bypass(ipvs) && svc->fwmark && !(iph->hdr_flags & (IP_VS_HDR_INVERSE | IP_VS_HDR_ICMP)) && ip_vs_addr_is_unicast(net, svc->af, &iph->daddr)) { int ret; struct ip_vs_conn *cp; unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET && iph->protocol == IPPROTO_UDP) ? IP_VS_CONN_F_ONE_PACKET : 0; union nf_inet_addr daddr = { .all = { 0, 0, 0, 0 } }; /* create a new connection entry */ IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__); { struct ip_vs_conn_param p; ip_vs_conn_fill_param(svc->ipvs, svc->af, iph->protocol, &iph->saddr, pptr[0], &iph->daddr, pptr[1], &p); cp = ip_vs_conn_new(&p, svc->af, &daddr, 0, IP_VS_CONN_F_BYPASS | flags, NULL, skb->mark); if (!cp) return NF_DROP; } /* statistics */ ip_vs_in_stats(cp, skb); /* set state */ ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd); /* transmit the first SYN packet */ ret = cp->packet_xmit(skb, cp, pd->pp, iph); /* do not touch skb anymore */ if ((cp->flags & IP_VS_CONN_F_ONE_PACKET) && cp->control) atomic_inc(&cp->control->in_pkts); else atomic_inc(&cp->in_pkts); ip_vs_conn_put(cp); return ret; } /* * When the virtual ftp service is presented, packets destined * for other services on the VIP may get here (except services * listed in the ipvs table), pass the packets, because it is * not ipvs job to decide to drop the packets. */ if (svc->port == FTPPORT && dport != FTPPORT) return NF_ACCEPT; if (unlikely(ip_vs_iph_icmp(iph))) return NF_DROP; /* * Notify the client that the destination is unreachable, and * release the socket buffer. * Since it is in IP layer, the TCP socket is not actually * created, the TCP RST packet cannot be sent, instead that * ICMP_PORT_UNREACH is sent here no matter it is TCP/UDP. --WZ */ #ifdef CONFIG_IP_VS_IPV6 if (svc->af == AF_INET6) { if (!skb->dev) skb->dev = net->loopback_dev; icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0); } else #endif icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0); return NF_DROP; } #ifdef CONFIG_SYSCTL static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return ipvs->sysctl_snat_reroute; } static int sysctl_nat_icmp_send(struct netns_ipvs *ipvs) { return ipvs->sysctl_nat_icmp_send; } static int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs) { return ipvs->sysctl_expire_nodest_conn; } #else static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; } static int sysctl_nat_icmp_send(struct netns_ipvs *ipvs) { return 0; } static int sysctl_expire_nodest_conn(struct netns_ipvs *ipvs) { return 0; } #endif __sum16 ip_vs_checksum_complete(struct sk_buff *skb, int offset) { return csum_fold(skb_checksum(skb, offset, skb->len - offset, 0)); } static inline enum ip_defrag_users ip_vs_defrag_user(unsigned int hooknum) { if (NF_INET_LOCAL_IN == hooknum) return IP_DEFRAG_VS_IN; if (NF_INET_FORWARD == hooknum) return IP_DEFRAG_VS_FWD; return IP_DEFRAG_VS_OUT; } static inline int ip_vs_gather_frags(struct netns_ipvs *ipvs, struct sk_buff *skb, u_int32_t user) { int err; local_bh_disable(); err = ip_defrag(ipvs->net, skb, user); local_bh_enable(); if (!err) ip_send_check(ip_hdr(skb)); return err; } static int ip_vs_route_me_harder(struct netns_ipvs *ipvs, int af, struct sk_buff *skb, unsigned int hooknum) { if (!sysctl_snat_reroute(ipvs)) return 0; /* Reroute replies only to remote clients (FORWARD and LOCAL_OUT) */ if (NF_INET_LOCAL_IN == hooknum) return 0; #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { struct dst_entry *dst = skb_dst(skb); if (dst->dev && !(dst->dev->flags & IFF_LOOPBACK) && ip6_route_me_harder(ipvs->net, skb->sk, skb) != 0) return 1; } else #endif if (!(skb_rtable(skb)->rt_flags & RTCF_LOCAL) && ip_route_me_harder(ipvs->net, skb->sk, skb, RTN_LOCAL) != 0) return 1; return 0; } /* * Packet has been made sufficiently writable in caller * - inout: 1=in->out, 0=out->in */ void ip_vs_nat_icmp(struct sk_buff *skb, struct ip_vs_protocol *pp, struct ip_vs_conn *cp, int inout) { struct iphdr *iph = ip_hdr(skb); unsigned int icmp_offset = iph->ihl*4; struct icmphdr *icmph = (struct icmphdr *)(skb_network_header(skb) + icmp_offset); struct iphdr *ciph = (struct iphdr *)(icmph + 1); if (inout) { iph->saddr = cp->vaddr.ip; ip_send_check(iph); ciph->daddr = cp->vaddr.ip; ip_send_check(ciph); } else { iph->daddr = cp->daddr.ip; ip_send_check(iph); ciph->saddr = cp->daddr.ip; ip_send_check(ciph); } /* the TCP/UDP/SCTP port */ if (IPPROTO_TCP == ciph->protocol || IPPROTO_UDP == ciph->protocol || IPPROTO_SCTP == ciph->protocol) { __be16 *ports = (void *)ciph + ciph->ihl*4; if (inout) ports[1] = cp->vport; else ports[0] = cp->dport; } /* And finally the ICMP checksum */ icmph->checksum = 0; icmph->checksum = ip_vs_checksum_complete(skb, icmp_offset); skb->ip_summed = CHECKSUM_UNNECESSARY; if (inout) IP_VS_DBG_PKT(11, AF_INET, pp, skb, (void *)ciph - (void *)iph, "Forwarding altered outgoing ICMP"); else IP_VS_DBG_PKT(11, AF_INET, pp, skb, (void *)ciph - (void *)iph, "Forwarding altered incoming ICMP"); } #ifdef CONFIG_IP_VS_IPV6 void ip_vs_nat_icmp_v6(struct sk_buff *skb, struct ip_vs_protocol *pp, struct ip_vs_conn *cp, int inout) { struct ipv6hdr *iph = ipv6_hdr(skb); unsigned int icmp_offset = 0; unsigned int offs = 0; /* header offset*/ int protocol; struct icmp6hdr *icmph; struct ipv6hdr *ciph; unsigned short fragoffs; ipv6_find_hdr(skb, &icmp_offset, IPPROTO_ICMPV6, &fragoffs, NULL); icmph = (struct icmp6hdr *)(skb_network_header(skb) + icmp_offset); offs = icmp_offset + sizeof(struct icmp6hdr); ciph = (struct ipv6hdr *)(skb_network_header(skb) + offs); protocol = ipv6_find_hdr(skb, &offs, -1, &fragoffs, NULL); if (inout) { iph->saddr = cp->vaddr.in6; ciph->daddr = cp->vaddr.in6; } else { iph->daddr = cp->daddr.in6; ciph->saddr = cp->daddr.in6; } /* the TCP/UDP/SCTP port */ if (!fragoffs && (IPPROTO_TCP == protocol || IPPROTO_UDP == protocol || IPPROTO_SCTP == protocol)) { __be16 *ports = (void *)(skb_network_header(skb) + offs); IP_VS_DBG(11, "%s() changed port %d to %d\n", __func__, ntohs(inout ? ports[1] : ports[0]), ntohs(inout ? cp->vport : cp->dport)); if (inout) ports[1] = cp->vport; else ports[0] = cp->dport; } /* And finally the ICMP checksum */ icmph->icmp6_cksum = ~csum_ipv6_magic(&iph->saddr, &iph->daddr, skb->len - icmp_offset, IPPROTO_ICMPV6, 0); skb->csum_start = skb_network_header(skb) - skb->head + icmp_offset; skb->csum_offset = offsetof(struct icmp6hdr, icmp6_cksum); skb->ip_summed = CHECKSUM_PARTIAL; if (inout) IP_VS_DBG_PKT(11, AF_INET6, pp, skb, (void *)ciph - (void *)iph, "Forwarding altered outgoing ICMPv6"); else IP_VS_DBG_PKT(11, AF_INET6, pp, skb, (void *)ciph - (void *)iph, "Forwarding altered incoming ICMPv6"); } #endif /* Handle relevant response ICMP messages - forward to the right * destination host. */ static int handle_response_icmp(int af, struct sk_buff *skb, union nf_inet_addr *snet, __u8 protocol, struct ip_vs_conn *cp, struct ip_vs_protocol *pp, unsigned int offset, unsigned int ihl, unsigned int hooknum) { unsigned int verdict = NF_DROP; if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) goto ignore_cp; /* Ensure the checksum is correct */ if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) { /* Failed checksum! */ IP_VS_DBG_BUF(1, "Forward ICMP: failed checksum from %s!\n", IP_VS_DBG_ADDR(af, snet)); goto out; } if (IPPROTO_TCP == protocol || IPPROTO_UDP == protocol || IPPROTO_SCTP == protocol) offset += 2 * sizeof(__u16); if (!skb_make_writable(skb, offset)) goto out; #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) ip_vs_nat_icmp_v6(skb, pp, cp, 1); else #endif ip_vs_nat_icmp(skb, pp, cp, 1); if (ip_vs_route_me_harder(cp->ipvs, af, skb, hooknum)) goto out; /* do the statistics and put it back */ ip_vs_out_stats(cp, skb); skb->ipvs_property = 1; if (!(cp->flags & IP_VS_CONN_F_NFCT)) ip_vs_notrack(skb); else ip_vs_update_conntrack(skb, cp, 0); ignore_cp: verdict = NF_ACCEPT; out: __ip_vs_conn_put(cp); return verdict; } /* * Handle ICMP messages in the inside-to-outside direction (outgoing). * Find any that might be relevant, check against existing connections. * Currently handles error types - unreachable, quench, ttl exceeded. */ static int ip_vs_out_icmp(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related, unsigned int hooknum) { struct iphdr *iph; struct icmphdr _icmph, *ic; struct iphdr _ciph, *cih; /* The ip header contained within the ICMP */ struct ip_vs_iphdr ciph; struct ip_vs_conn *cp; struct ip_vs_protocol *pp; unsigned int offset, ihl; union nf_inet_addr snet; *related = 1; /* reassemble IP fragments */ if (ip_is_fragment(ip_hdr(skb))) { if (ip_vs_gather_frags(ipvs, skb, ip_vs_defrag_user(hooknum))) return NF_STOLEN; } iph = ip_hdr(skb); offset = ihl = iph->ihl * 4; ic = skb_header_pointer(skb, offset, sizeof(_icmph), &_icmph); if (ic == NULL) return NF_DROP; IP_VS_DBG(12, "Outgoing ICMP (%d,%d) %pI4->%pI4\n", ic->type, ntohs(icmp_id(ic)), &iph->saddr, &iph->daddr); /* * Work through seeing if this is for us. * These checks are supposed to be in an order that means easy * things are checked first to speed up processing.... however * this means that some packets will manage to get a long way * down this stack and then be rejected, but that's life. */ if ((ic->type != ICMP_DEST_UNREACH) && (ic->type != ICMP_SOURCE_QUENCH) && (ic->type != ICMP_TIME_EXCEEDED)) { *related = 0; return NF_ACCEPT; } /* Now find the contained IP header */ offset += sizeof(_icmph); cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph); if (cih == NULL) return NF_ACCEPT; /* The packet looks wrong, ignore */ pp = ip_vs_proto_get(cih->protocol); if (!pp) return NF_ACCEPT; /* Is the embedded protocol header present? */ if (unlikely(cih->frag_off & htons(IP_OFFSET) && pp->dont_defrag)) return NF_ACCEPT; IP_VS_DBG_PKT(11, AF_INET, pp, skb, offset, "Checking outgoing ICMP for"); ip_vs_fill_iph_skb_icmp(AF_INET, skb, offset, true, &ciph); /* The embedded headers contain source and dest in reverse order */ cp = pp->conn_out_get(ipvs, AF_INET, skb, &ciph); if (!cp) return NF_ACCEPT; snet.ip = iph->saddr; return handle_response_icmp(AF_INET, skb, &snet, cih->protocol, cp, pp, ciph.len, ihl, hooknum); } #ifdef CONFIG_IP_VS_IPV6 static int ip_vs_out_icmp_v6(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related, unsigned int hooknum, struct ip_vs_iphdr *ipvsh) { struct icmp6hdr _icmph, *ic; struct ip_vs_iphdr ciph = {.flags = 0, .fragoffs = 0};/*Contained IP */ struct ip_vs_conn *cp; struct ip_vs_protocol *pp; union nf_inet_addr snet; unsigned int offset; *related = 1; ic = frag_safe_skb_hp(skb, ipvsh->len, sizeof(_icmph), &_icmph); if (ic == NULL) return NF_DROP; /* * Work through seeing if this is for us. * These checks are supposed to be in an order that means easy * things are checked first to speed up processing.... however * this means that some packets will manage to get a long way * down this stack and then be rejected, but that's life. */ if (ic->icmp6_type & ICMPV6_INFOMSG_MASK) { *related = 0; return NF_ACCEPT; } /* Fragment header that is before ICMP header tells us that: * it's not an error message since they can't be fragmented. */ if (ipvsh->flags & IP6_FH_F_FRAG) return NF_DROP; IP_VS_DBG(8, "Outgoing ICMPv6 (%d,%d) %pI6c->%pI6c\n", ic->icmp6_type, ntohs(icmpv6_id(ic)), &ipvsh->saddr, &ipvsh->daddr); if (!ip_vs_fill_iph_skb_icmp(AF_INET6, skb, ipvsh->len + sizeof(_icmph), true, &ciph)) return NF_ACCEPT; /* The packet looks wrong, ignore */ pp = ip_vs_proto_get(ciph.protocol); if (!pp) return NF_ACCEPT; /* The embedded headers contain source and dest in reverse order */ cp = pp->conn_out_get(ipvs, AF_INET6, skb, &ciph); if (!cp) return NF_ACCEPT; snet.in6 = ciph.saddr.in6; offset = ciph.len; return handle_response_icmp(AF_INET6, skb, &snet, ciph.protocol, cp, pp, offset, sizeof(struct ipv6hdr), hooknum); } #endif /* * Check if sctp chunc is ABORT chunk */ static inline int is_sctp_abort(const struct sk_buff *skb, int nh_len) { struct sctp_chunkhdr *sch, schunk; sch = skb_header_pointer(skb, nh_len + sizeof(struct sctphdr), sizeof(schunk), &schunk); if (sch == NULL) return 0; if (sch->type == SCTP_CID_ABORT) return 1; return 0; } static inline int is_tcp_reset(const struct sk_buff *skb, int nh_len) { struct tcphdr _tcph, *th; th = skb_header_pointer(skb, nh_len, sizeof(_tcph), &_tcph); if (th == NULL) return 0; return th->rst; } static inline bool is_new_conn(const struct sk_buff *skb, struct ip_vs_iphdr *iph) { switch (iph->protocol) { case IPPROTO_TCP: { struct tcphdr _tcph, *th; th = skb_header_pointer(skb, iph->len, sizeof(_tcph), &_tcph); if (th == NULL) return false; return th->syn; } case IPPROTO_SCTP: { struct sctp_chunkhdr *sch, schunk; sch = skb_header_pointer(skb, iph->len + sizeof(struct sctphdr), sizeof(schunk), &schunk); if (sch == NULL) return false; return sch->type == SCTP_CID_INIT; } default: return false; } } static inline bool is_new_conn_expected(const struct ip_vs_conn *cp, int conn_reuse_mode) { /* Controlled (FTP DATA or persistence)? */ if (cp->control) return false; switch (cp->protocol) { case IPPROTO_TCP: return (cp->state == IP_VS_TCP_S_TIME_WAIT) || (cp->state == IP_VS_TCP_S_CLOSE) || ((conn_reuse_mode & 2) && (cp->state == IP_VS_TCP_S_FIN_WAIT) && (cp->flags & IP_VS_CONN_F_NOOUTPUT)); case IPPROTO_SCTP: return cp->state == IP_VS_SCTP_S_CLOSED; default: return false; } } /* Generic function to create new connections for outgoing RS packets * * Pre-requisites for successful connection creation: * 1) Virtual Service is NOT fwmark based: * In fwmark-VS actual vaddr and vport are unknown to IPVS * 2) Real Server and Virtual Service were NOT configured without port: * This is to allow match of different VS to the same RS ip-addr */ struct ip_vs_conn *ip_vs_new_conn_out(struct ip_vs_service *svc, struct ip_vs_dest *dest, struct sk_buff *skb, const struct ip_vs_iphdr *iph, __be16 dport, __be16 cport) { struct ip_vs_conn_param param; struct ip_vs_conn *ct = NULL, *cp = NULL; const union nf_inet_addr *vaddr, *daddr, *caddr; union nf_inet_addr snet; __be16 vport; unsigned int flags; EnterFunction(12); vaddr = &svc->addr; vport = svc->port; daddr = &iph->saddr; caddr = &iph->daddr; /* check pre-requisites are satisfied */ if (svc->fwmark) return NULL; if (!vport || !dport) return NULL; /* for persistent service first create connection template */ if (svc->flags & IP_VS_SVC_F_PERSISTENT) { /* apply netmask the same way ingress-side does */ #ifdef CONFIG_IP_VS_IPV6 if (svc->af == AF_INET6) ipv6_addr_prefix(&snet.in6, &caddr->in6, (__force __u32)svc->netmask); else #endif snet.ip = caddr->ip & svc->netmask; /* fill params and create template if not existent */ if (ip_vs_conn_fill_param_persist(svc, skb, iph->protocol, &snet, 0, vaddr, vport, &param) < 0) return NULL; ct = ip_vs_ct_in_get(&param); /* check if template exists and points to the same dest */ if (!ct || !ip_vs_check_template(ct, dest)) { ct = ip_vs_conn_new(&param, dest->af, daddr, dport, IP_VS_CONN_F_TEMPLATE, dest, 0); if (!ct) { kfree(param.pe_data); return NULL; } ct->timeout = svc->timeout; } else { kfree(param.pe_data); } } /* connection flags */ flags = ((svc->flags & IP_VS_SVC_F_ONEPACKET) && iph->protocol == IPPROTO_UDP) ? IP_VS_CONN_F_ONE_PACKET : 0; /* create connection */ ip_vs_conn_fill_param(svc->ipvs, svc->af, iph->protocol, caddr, cport, vaddr, vport, &param); cp = ip_vs_conn_new(&param, dest->af, daddr, dport, flags, dest, 0); if (!cp) { if (ct) ip_vs_conn_put(ct); return NULL; } if (ct) { ip_vs_control_add(cp, ct); ip_vs_conn_put(ct); } ip_vs_conn_stats(cp, svc); /* return connection (will be used to handle outgoing packet) */ IP_VS_DBG_BUF(6, "New connection RS-initiated:%c c:%s:%u v:%s:%u " "d:%s:%u conn->flags:%X conn->refcnt:%d\n", ip_vs_fwd_tag(cp), IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport), IP_VS_DBG_ADDR(cp->af, &cp->daddr), ntohs(cp->dport), cp->flags, refcount_read(&cp->refcnt)); LeaveFunction(12); return cp; } /* Handle outgoing packets which are considered requests initiated by * real servers, so that subsequent responses from external client can be * routed to the right real server. * Used also for outgoing responses in OPS mode. * * Connection management is handled by persistent-engine specific callback. */ static struct ip_vs_conn *__ip_vs_rs_conn_out(unsigned int hooknum, struct netns_ipvs *ipvs, int af, struct sk_buff *skb, const struct ip_vs_iphdr *iph) { struct ip_vs_dest *dest; struct ip_vs_conn *cp = NULL; __be16 _ports[2], *pptr; if (hooknum == NF_INET_LOCAL_IN) return NULL; pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports); if (!pptr) return NULL; dest = ip_vs_find_real_service(ipvs, af, iph->protocol, &iph->saddr, pptr[0]); if (dest) { struct ip_vs_service *svc; struct ip_vs_pe *pe; svc = rcu_dereference(dest->svc); if (svc) { pe = rcu_dereference(svc->pe); if (pe && pe->conn_out) cp = pe->conn_out(svc, dest, skb, iph, pptr[0], pptr[1]); } } return cp; } /* Handle response packets: rewrite addresses and send away... */ static unsigned int handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd, struct ip_vs_conn *cp, struct ip_vs_iphdr *iph, unsigned int hooknum) { struct ip_vs_protocol *pp = pd->pp; IP_VS_DBG_PKT(11, af, pp, skb, iph->off, "Outgoing packet"); if (!skb_make_writable(skb, iph->len)) goto drop; /* mangle the packet */ if (pp->snat_handler && !pp->snat_handler(skb, pp, cp, iph)) goto drop; #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) ipv6_hdr(skb)->saddr = cp->vaddr.in6; else #endif { ip_hdr(skb)->saddr = cp->vaddr.ip; ip_send_check(ip_hdr(skb)); } /* * nf_iterate does not expect change in the skb->dst->dev. * It looks like it is not fatal to enable this code for hooks * where our handlers are at the end of the chain list and * when all next handlers use skb->dst->dev and not outdev. * It will definitely route properly the inout NAT traffic * when multiple paths are used. */ /* For policy routing, packets originating from this * machine itself may be routed differently to packets * passing through. We want this packet to be routed as * if it came from this machine itself. So re-compute * the routing information. */ if (ip_vs_route_me_harder(cp->ipvs, af, skb, hooknum)) goto drop; IP_VS_DBG_PKT(10, af, pp, skb, iph->off, "After SNAT"); ip_vs_out_stats(cp, skb); ip_vs_set_state(cp, IP_VS_DIR_OUTPUT, skb, pd); skb->ipvs_property = 1; if (!(cp->flags & IP_VS_CONN_F_NFCT)) ip_vs_notrack(skb); else ip_vs_update_conntrack(skb, cp, 0); ip_vs_conn_put(cp); LeaveFunction(11); return NF_ACCEPT; drop: ip_vs_conn_put(cp); kfree_skb(skb); LeaveFunction(11); return NF_STOLEN; } /* * Check if outgoing packet belongs to the established ip_vs_conn. */ static unsigned int ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int af) { struct ip_vs_iphdr iph; struct ip_vs_protocol *pp; struct ip_vs_proto_data *pd; struct ip_vs_conn *cp; struct sock *sk; EnterFunction(11); /* Already marked as IPVS request or reply? */ if (skb->ipvs_property) return NF_ACCEPT; sk = skb_to_full_sk(skb); /* Bad... Do not break raw sockets */ if (unlikely(sk && hooknum == NF_INET_LOCAL_OUT && af == AF_INET)) { if (sk->sk_family == PF_INET && inet_sk(sk)->nodefrag) return NF_ACCEPT; } if (unlikely(!skb_dst(skb))) return NF_ACCEPT; if (!ipvs->enable) return NF_ACCEPT; ip_vs_fill_iph_skb(af, skb, false, &iph); #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { if (unlikely(iph.protocol == IPPROTO_ICMPV6)) { int related; int verdict = ip_vs_out_icmp_v6(ipvs, skb, &related, hooknum, &iph); if (related) return verdict; } } else #endif if (unlikely(iph.protocol == IPPROTO_ICMP)) { int related; int verdict = ip_vs_out_icmp(ipvs, skb, &related, hooknum); if (related) return verdict; } pd = ip_vs_proto_data_get(ipvs, iph.protocol); if (unlikely(!pd)) return NF_ACCEPT; pp = pd->pp; /* reassemble IP fragments */ #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET) #endif if (unlikely(ip_is_fragment(ip_hdr(skb)) && !pp->dont_defrag)) { if (ip_vs_gather_frags(ipvs, skb, ip_vs_defrag_user(hooknum))) return NF_STOLEN; ip_vs_fill_iph_skb(AF_INET, skb, false, &iph); } /* * Check if the packet belongs to an existing entry */ cp = pp->conn_out_get(ipvs, af, skb, &iph); if (likely(cp)) { if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) goto ignore_cp; return handle_response(af, skb, pd, cp, &iph, hooknum); } /* Check for real-server-started requests */ if (atomic_read(&ipvs->conn_out_counter)) { /* Currently only for UDP: * connection oriented protocols typically use * ephemeral ports for outgoing connections, so * related incoming responses would not match any VS */ if (pp->protocol == IPPROTO_UDP) { cp = __ip_vs_rs_conn_out(hooknum, ipvs, af, skb, &iph); if (likely(cp)) return handle_response(af, skb, pd, cp, &iph, hooknum); } } if (sysctl_nat_icmp_send(ipvs) && (pp->protocol == IPPROTO_TCP || pp->protocol == IPPROTO_UDP || pp->protocol == IPPROTO_SCTP)) { __be16 _ports[2], *pptr; pptr = frag_safe_skb_hp(skb, iph.len, sizeof(_ports), _ports); if (pptr == NULL) return NF_ACCEPT; /* Not for me */ if (ip_vs_has_real_service(ipvs, af, iph.protocol, &iph.saddr, pptr[0])) { /* * Notify the real server: there is no * existing entry if it is not RST * packet or not TCP packet. */ if ((iph.protocol != IPPROTO_TCP && iph.protocol != IPPROTO_SCTP) || ((iph.protocol == IPPROTO_TCP && !is_tcp_reset(skb, iph.len)) || (iph.protocol == IPPROTO_SCTP && !is_sctp_abort(skb, iph.len)))) { #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { if (!skb->dev) skb->dev = ipvs->net->loopback_dev; icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0); } else #endif icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0); return NF_DROP; } } } out: IP_VS_DBG_PKT(12, af, pp, skb, iph.off, "ip_vs_out: packet continues traversal as normal"); return NF_ACCEPT; ignore_cp: __ip_vs_conn_put(cp); goto out; } /* * It is hooked at the NF_INET_FORWARD and NF_INET_LOCAL_IN chain, * used only for VS/NAT. * Check if packet is reply for established ip_vs_conn. */ static unsigned int ip_vs_reply4(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { return ip_vs_out(net_ipvs(state->net), state->hook, skb, AF_INET); } /* * It is hooked at the NF_INET_LOCAL_OUT chain, used only for VS/NAT. * Check if packet is reply for established ip_vs_conn. */ static unsigned int ip_vs_local_reply4(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { return ip_vs_out(net_ipvs(state->net), state->hook, skb, AF_INET); } #ifdef CONFIG_IP_VS_IPV6 /* * It is hooked at the NF_INET_FORWARD and NF_INET_LOCAL_IN chain, * used only for VS/NAT. * Check if packet is reply for established ip_vs_conn. */ static unsigned int ip_vs_reply6(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { return ip_vs_out(net_ipvs(state->net), state->hook, skb, AF_INET6); } /* * It is hooked at the NF_INET_LOCAL_OUT chain, used only for VS/NAT. * Check if packet is reply for established ip_vs_conn. */ static unsigned int ip_vs_local_reply6(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { return ip_vs_out(net_ipvs(state->net), state->hook, skb, AF_INET6); } #endif static unsigned int ip_vs_try_to_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb, struct ip_vs_proto_data *pd, int *verdict, struct ip_vs_conn **cpp, struct ip_vs_iphdr *iph) { struct ip_vs_protocol *pp = pd->pp; if (!iph->fragoffs) { /* No (second) fragments need to enter here, as nf_defrag_ipv6 * replayed fragment zero will already have created the cp */ /* Schedule and create new connection entry into cpp */ if (!pp->conn_schedule(ipvs, af, skb, pd, verdict, cpp, iph)) return 0; } if (unlikely(!*cpp)) { /* sorry, all this trouble for a no-hit :) */ IP_VS_DBG_PKT(12, af, pp, skb, iph->off, "ip_vs_in: packet continues traversal as normal"); /* Fragment couldn't be mapped to a conn entry */ if (iph->fragoffs) IP_VS_DBG_PKT(7, af, pp, skb, iph->off, "unhandled fragment"); *verdict = NF_ACCEPT; return 0; } return 1; } /* * Handle ICMP messages in the outside-to-inside direction (incoming). * Find any that might be relevant, check against existing connections, * forward to the right destination host if relevant. * Currently handles error types - unreachable, quench, ttl exceeded. */ static int ip_vs_in_icmp(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related, unsigned int hooknum) { struct iphdr *iph; struct icmphdr _icmph, *ic; struct iphdr _ciph, *cih; /* The ip header contained within the ICMP */ struct ip_vs_iphdr ciph; struct ip_vs_conn *cp; struct ip_vs_protocol *pp; struct ip_vs_proto_data *pd; unsigned int offset, offset2, ihl, verdict; bool ipip, new_cp = false; *related = 1; /* reassemble IP fragments */ if (ip_is_fragment(ip_hdr(skb))) { if (ip_vs_gather_frags(ipvs, skb, ip_vs_defrag_user(hooknum))) return NF_STOLEN; } iph = ip_hdr(skb); offset = ihl = iph->ihl * 4; ic = skb_header_pointer(skb, offset, sizeof(_icmph), &_icmph); if (ic == NULL) return NF_DROP; IP_VS_DBG(12, "Incoming ICMP (%d,%d) %pI4->%pI4\n", ic->type, ntohs(icmp_id(ic)), &iph->saddr, &iph->daddr); /* * Work through seeing if this is for us. * These checks are supposed to be in an order that means easy * things are checked first to speed up processing.... however * this means that some packets will manage to get a long way * down this stack and then be rejected, but that's life. */ if ((ic->type != ICMP_DEST_UNREACH) && (ic->type != ICMP_SOURCE_QUENCH) && (ic->type != ICMP_TIME_EXCEEDED)) { *related = 0; return NF_ACCEPT; } /* Now find the contained IP header */ offset += sizeof(_icmph); cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph); if (cih == NULL) return NF_ACCEPT; /* The packet looks wrong, ignore */ /* Special case for errors for IPIP packets */ ipip = false; if (cih->protocol == IPPROTO_IPIP) { if (unlikely(cih->frag_off & htons(IP_OFFSET))) return NF_ACCEPT; /* Error for our IPIP must arrive at LOCAL_IN */ if (!(skb_rtable(skb)->rt_flags & RTCF_LOCAL)) return NF_ACCEPT; offset += cih->ihl * 4; cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph); if (cih == NULL) return NF_ACCEPT; /* The packet looks wrong, ignore */ ipip = true; } pd = ip_vs_proto_data_get(ipvs, cih->protocol); if (!pd) return NF_ACCEPT; pp = pd->pp; /* Is the embedded protocol header present? */ if (unlikely(cih->frag_off & htons(IP_OFFSET) && pp->dont_defrag)) return NF_ACCEPT; IP_VS_DBG_PKT(11, AF_INET, pp, skb, offset, "Checking incoming ICMP for"); offset2 = offset; ip_vs_fill_iph_skb_icmp(AF_INET, skb, offset, !ipip, &ciph); offset = ciph.len; /* The embedded headers contain source and dest in reverse order. * For IPIP this is error for request, not for reply. */ cp = pp->conn_in_get(ipvs, AF_INET, skb, &ciph); if (!cp) { int v; if (ipip || !sysctl_schedule_icmp(ipvs)) return NF_ACCEPT; if (!ip_vs_try_to_schedule(ipvs, AF_INET, skb, pd, &v, &cp, &ciph)) return v; new_cp = true; } verdict = NF_DROP; /* Ensure the checksum is correct */ if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) { /* Failed checksum! */ IP_VS_DBG(1, "Incoming ICMP: failed checksum from %pI4!\n", &iph->saddr); goto out; } if (ipip) { __be32 info = ic->un.gateway; __u8 type = ic->type; __u8 code = ic->code; /* Update the MTU */ if (ic->type == ICMP_DEST_UNREACH && ic->code == ICMP_FRAG_NEEDED) { struct ip_vs_dest *dest = cp->dest; u32 mtu = ntohs(ic->un.frag.mtu); __be16 frag_off = cih->frag_off; /* Strip outer IP and ICMP, go to IPIP header */ if (pskb_pull(skb, ihl + sizeof(_icmph)) == NULL) goto ignore_ipip; offset2 -= ihl + sizeof(_icmph); skb_reset_network_header(skb); IP_VS_DBG(12, "ICMP for IPIP %pI4->%pI4: mtu=%u\n", &ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr, mtu); ipv4_update_pmtu(skb, ipvs->net, mtu, 0, 0, 0, 0); /* Client uses PMTUD? */ if (!(frag_off & htons(IP_DF))) goto ignore_ipip; /* Prefer the resulting PMTU */ if (dest) { struct ip_vs_dest_dst *dest_dst; dest_dst = rcu_dereference(dest->dest_dst); if (dest_dst) mtu = dst_mtu(dest_dst->dst_cache); } if (mtu > 68 + sizeof(struct iphdr)) mtu -= sizeof(struct iphdr); info = htonl(mtu); } /* Strip outer IP, ICMP and IPIP, go to IP header of * original request. */ if (pskb_pull(skb, offset2) == NULL) goto ignore_ipip; skb_reset_network_header(skb); IP_VS_DBG(12, "Sending ICMP for %pI4->%pI4: t=%u, c=%u, i=%u\n", &ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr, type, code, ntohl(info)); icmp_send(skb, type, code, info); /* ICMP can be shorter but anyways, account it */ ip_vs_out_stats(cp, skb); ignore_ipip: consume_skb(skb); verdict = NF_STOLEN; goto out; } /* do the statistics and put it back */ ip_vs_in_stats(cp, skb); if (IPPROTO_TCP == cih->protocol || IPPROTO_UDP == cih->protocol || IPPROTO_SCTP == cih->protocol) offset += 2 * sizeof(__u16); verdict = ip_vs_icmp_xmit(skb, cp, pp, offset, hooknum, &ciph); out: if (likely(!new_cp)) __ip_vs_conn_put(cp); else ip_vs_conn_put(cp); return verdict; } #ifdef CONFIG_IP_VS_IPV6 static int ip_vs_in_icmp_v6(struct netns_ipvs *ipvs, struct sk_buff *skb, int *related, unsigned int hooknum, struct ip_vs_iphdr *iph) { struct icmp6hdr _icmph, *ic; struct ip_vs_iphdr ciph = {.flags = 0, .fragoffs = 0};/*Contained IP */ struct ip_vs_conn *cp; struct ip_vs_protocol *pp; struct ip_vs_proto_data *pd; unsigned int offset, verdict; bool new_cp = false; *related = 1; ic = frag_safe_skb_hp(skb, iph->len, sizeof(_icmph), &_icmph); if (ic == NULL) return NF_DROP; /* * Work through seeing if this is for us. * These checks are supposed to be in an order that means easy * things are checked first to speed up processing.... however * this means that some packets will manage to get a long way * down this stack and then be rejected, but that's life. */ if (ic->icmp6_type & ICMPV6_INFOMSG_MASK) { *related = 0; return NF_ACCEPT; } /* Fragment header that is before ICMP header tells us that: * it's not an error message since they can't be fragmented. */ if (iph->flags & IP6_FH_F_FRAG) return NF_DROP; IP_VS_DBG(8, "Incoming ICMPv6 (%d,%d) %pI6c->%pI6c\n", ic->icmp6_type, ntohs(icmpv6_id(ic)), &iph->saddr, &iph->daddr); offset = iph->len + sizeof(_icmph); if (!ip_vs_fill_iph_skb_icmp(AF_INET6, skb, offset, true, &ciph)) return NF_ACCEPT; pd = ip_vs_proto_data_get(ipvs, ciph.protocol); if (!pd) return NF_ACCEPT; pp = pd->pp; /* Cannot handle fragmented embedded protocol */ if (ciph.fragoffs) return NF_ACCEPT; IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offset, "Checking incoming ICMPv6 for"); /* The embedded headers contain source and dest in reverse order * if not from localhost */ cp = pp->conn_in_get(ipvs, AF_INET6, skb, &ciph); if (!cp) { int v; if (!sysctl_schedule_icmp(ipvs)) return NF_ACCEPT; if (!ip_vs_try_to_schedule(ipvs, AF_INET6, skb, pd, &v, &cp, &ciph)) return v; new_cp = true; } /* VS/TUN, VS/DR and LOCALNODE just let it go */ if ((hooknum == NF_INET_LOCAL_OUT) && (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)) { verdict = NF_ACCEPT; goto out; } /* do the statistics and put it back */ ip_vs_in_stats(cp, skb); /* Need to mangle contained IPv6 header in ICMPv6 packet */ offset = ciph.len; if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol || IPPROTO_SCTP == ciph.protocol) offset += 2 * sizeof(__u16); /* Also mangle ports */ verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum, &ciph); out: if (likely(!new_cp)) __ip_vs_conn_put(cp); else ip_vs_conn_put(cp); return verdict; } #endif /* * Check if it's for virtual services, look it up, * and send it on its way... */ static unsigned int ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int af) { struct ip_vs_iphdr iph; struct ip_vs_protocol *pp; struct ip_vs_proto_data *pd; struct ip_vs_conn *cp; int ret, pkts; struct sock *sk; /* Already marked as IPVS request or reply? */ if (skb->ipvs_property) return NF_ACCEPT; /* * Big tappo: * - remote client: only PACKET_HOST * - route: used for struct net when skb->dev is unset */ if (unlikely((skb->pkt_type != PACKET_HOST && hooknum != NF_INET_LOCAL_OUT) || !skb_dst(skb))) { ip_vs_fill_iph_skb(af, skb, false, &iph); IP_VS_DBG_BUF(12, "packet type=%d proto=%d daddr=%s" " ignored in hook %u\n", skb->pkt_type, iph.protocol, IP_VS_DBG_ADDR(af, &iph.daddr), hooknum); return NF_ACCEPT; } /* ipvs enabled in this netns ? */ if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable)) return NF_ACCEPT; ip_vs_fill_iph_skb(af, skb, false, &iph); /* Bad... Do not break raw sockets */ sk = skb_to_full_sk(skb); if (unlikely(sk && hooknum == NF_INET_LOCAL_OUT && af == AF_INET)) { if (sk->sk_family == PF_INET && inet_sk(sk)->nodefrag) return NF_ACCEPT; } #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { if (unlikely(iph.protocol == IPPROTO_ICMPV6)) { int related; int verdict = ip_vs_in_icmp_v6(ipvs, skb, &related, hooknum, &iph); if (related) return verdict; } } else #endif if (unlikely(iph.protocol == IPPROTO_ICMP)) { int related; int verdict = ip_vs_in_icmp(ipvs, skb, &related, hooknum); if (related) return verdict; } /* Protocol supported? */ pd = ip_vs_proto_data_get(ipvs, iph.protocol); if (unlikely(!pd)) { /* The only way we'll see this packet again is if it's * encapsulated, so mark it with ipvs_property=1 so we * skip it if we're ignoring tunneled packets */ if (sysctl_ignore_tunneled(ipvs)) skb->ipvs_property = 1; return NF_ACCEPT; } pp = pd->pp; /* * Check if the packet belongs to an existing connection entry */ cp = pp->conn_in_get(ipvs, af, skb, &iph); if (!iph.fragoffs && is_new_conn(skb, &iph) && cp) { int conn_reuse_mode = sysctl_conn_reuse_mode(ipvs); bool old_ct = false, resched = false; if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest && unlikely(!atomic_read(&cp->dest->weight))) { resched = true; old_ct = ip_vs_conn_uses_old_conntrack(cp, skb); } else if (conn_reuse_mode && is_new_conn_expected(cp, conn_reuse_mode)) { old_ct = ip_vs_conn_uses_old_conntrack(cp, skb); if (!atomic_read(&cp->n_control)) { resched = true; } else { /* Do not reschedule controlling connection * that uses conntrack while it is still * referenced by controlled connection(s). */ resched = !old_ct; } } if (resched) { if (!old_ct) cp->flags &= ~IP_VS_CONN_F_NFCT; if (!atomic_read(&cp->n_control)) ip_vs_conn_expire_now(cp); __ip_vs_conn_put(cp); if (old_ct) return NF_DROP; cp = NULL; } } if (unlikely(!cp)) { int v; if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph)) return v; } IP_VS_DBG_PKT(11, af, pp, skb, iph.off, "Incoming packet"); /* Check the server status */ if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) { /* the destination server is not available */ __u32 flags = cp->flags; /* when timer already started, silently drop the packet.*/ if (timer_pending(&cp->timer)) __ip_vs_conn_put(cp); else ip_vs_conn_put(cp); if (sysctl_expire_nodest_conn(ipvs) && !(flags & IP_VS_CONN_F_ONE_PACKET)) { /* try to expire the connection immediately */ ip_vs_conn_expire_now(cp); } return NF_DROP; } ip_vs_in_stats(cp, skb); ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd); if (cp->packet_xmit) ret = cp->packet_xmit(skb, cp, pp, &iph); /* do not touch skb anymore */ else { IP_VS_DBG_RL("warning: packet_xmit is null"); ret = NF_ACCEPT; } /* Increase its packet counter and check if it is needed * to be synchronized * * Sync connection if it is about to close to * encorage the standby servers to update the connections timeout * * For ONE_PKT let ip_vs_sync_conn() do the filter work. */ if (cp->flags & IP_VS_CONN_F_ONE_PACKET) pkts = sysctl_sync_threshold(ipvs); else pkts = atomic_add_return(1, &cp->in_pkts); if (ipvs->sync_state & IP_VS_STATE_MASTER) ip_vs_sync_conn(ipvs, cp, pkts); else if ((cp->flags & IP_VS_CONN_F_ONE_PACKET) && cp->control) /* increment is done inside ip_vs_sync_conn too */ atomic_inc(&cp->control->in_pkts); ip_vs_conn_put(cp); return ret; } /* * AF_INET handler in NF_INET_LOCAL_IN chain * Schedule and forward packets from remote clients */ static unsigned int ip_vs_remote_request4(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { return ip_vs_in(net_ipvs(state->net), state->hook, skb, AF_INET); } /* * AF_INET handler in NF_INET_LOCAL_OUT chain * Schedule and forward packets from local clients */ static unsigned int ip_vs_local_request4(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { return ip_vs_in(net_ipvs(state->net), state->hook, skb, AF_INET); } #ifdef CONFIG_IP_VS_IPV6 /* * AF_INET6 handler in NF_INET_LOCAL_IN chain * Schedule and forward packets from remote clients */ static unsigned int ip_vs_remote_request6(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { return ip_vs_in(net_ipvs(state->net), state->hook, skb, AF_INET6); } /* * AF_INET6 handler in NF_INET_LOCAL_OUT chain * Schedule and forward packets from local clients */ static unsigned int ip_vs_local_request6(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { return ip_vs_in(net_ipvs(state->net), state->hook, skb, AF_INET6); } #endif /* * It is hooked at the NF_INET_FORWARD chain, in order to catch ICMP * related packets destined for 0.0.0.0/0. * When fwmark-based virtual service is used, such as transparent * cache cluster, TCP packets can be marked and routed to ip_vs_in, * but ICMP destined for 0.0.0.0/0 cannot not be easily marked and * sent to ip_vs_in_icmp. So, catch them at the NF_INET_FORWARD chain * and send them to ip_vs_in_icmp. */ static unsigned int ip_vs_forward_icmp(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { int r; struct netns_ipvs *ipvs = net_ipvs(state->net); if (ip_hdr(skb)->protocol != IPPROTO_ICMP) return NF_ACCEPT; /* ipvs enabled in this netns ? */ if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable)) return NF_ACCEPT; return ip_vs_in_icmp(ipvs, skb, &r, state->hook); } #ifdef CONFIG_IP_VS_IPV6 static unsigned int ip_vs_forward_icmp_v6(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { int r; struct netns_ipvs *ipvs = net_ipvs(state->net); struct ip_vs_iphdr iphdr; ip_vs_fill_iph_skb(AF_INET6, skb, false, &iphdr); if (iphdr.protocol != IPPROTO_ICMPV6) return NF_ACCEPT; /* ipvs enabled in this netns ? */ if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable)) return NF_ACCEPT; return ip_vs_in_icmp_v6(ipvs, skb, &r, state->hook, &iphdr); } #endif static const struct nf_hook_ops ip_vs_ops[] = { /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_reply4, .pf = NFPROTO_IPV4, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP_PRI_NAT_SRC - 2, }, /* After packet filtering, forward packet through VS/DR, VS/TUN, * or VS/NAT(change destination), so that filtering rules can be * applied to IPVS. */ { .hook = ip_vs_remote_request4, .pf = NFPROTO_IPV4, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP_PRI_NAT_SRC - 1, }, /* Before ip_vs_in, change source only for VS/NAT */ { .hook = ip_vs_local_reply4, .pf = NFPROTO_IPV4, .hooknum = NF_INET_LOCAL_OUT, .priority = NF_IP_PRI_NAT_DST + 1, }, /* After mangle, schedule and forward local requests */ { .hook = ip_vs_local_request4, .pf = NFPROTO_IPV4, .hooknum = NF_INET_LOCAL_OUT, .priority = NF_IP_PRI_NAT_DST + 2, }, /* After packet filtering (but before ip_vs_out_icmp), catch icmp * destined for 0.0.0.0/0, which is for incoming IPVS connections */ { .hook = ip_vs_forward_icmp, .pf = NFPROTO_IPV4, .hooknum = NF_INET_FORWARD, .priority = 99, }, /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_reply4, .pf = NFPROTO_IPV4, .hooknum = NF_INET_FORWARD, .priority = 100, }, #ifdef CONFIG_IP_VS_IPV6 /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_reply6, .pf = NFPROTO_IPV6, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP6_PRI_NAT_SRC - 2, }, /* After packet filtering, forward packet through VS/DR, VS/TUN, * or VS/NAT(change destination), so that filtering rules can be * applied to IPVS. */ { .hook = ip_vs_remote_request6, .pf = NFPROTO_IPV6, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP6_PRI_NAT_SRC - 1, }, /* Before ip_vs_in, change source only for VS/NAT */ { .hook = ip_vs_local_reply6, .pf = NFPROTO_IPV6, .hooknum = NF_INET_LOCAL_OUT, .priority = NF_IP6_PRI_NAT_DST + 1, }, /* After mangle, schedule and forward local requests */ { .hook = ip_vs_local_request6, .pf = NFPROTO_IPV6, .hooknum = NF_INET_LOCAL_OUT, .priority = NF_IP6_PRI_NAT_DST + 2, }, /* After packet filtering (but before ip_vs_out_icmp), catch icmp * destined for 0.0.0.0/0, which is for incoming IPVS connections */ { .hook = ip_vs_forward_icmp_v6, .pf = NFPROTO_IPV6, .hooknum = NF_INET_FORWARD, .priority = 99, }, /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_reply6, .pf = NFPROTO_IPV6, .hooknum = NF_INET_FORWARD, .priority = 100, }, #endif }; /* * Initialize IP Virtual Server netns mem. */ static int __net_init __ip_vs_init(struct net *net) { struct netns_ipvs *ipvs; ipvs = net_generic(net, ip_vs_net_id); if (ipvs == NULL) return -ENOMEM; /* Hold the beast until a service is registerd */ ipvs->enable = 0; ipvs->net = net; /* Counters used for creating unique names */ ipvs->gen = atomic_read(&ipvs_netns_cnt); atomic_inc(&ipvs_netns_cnt); net->ipvs = ipvs; if (ip_vs_estimator_net_init(ipvs) < 0) goto estimator_fail; if (ip_vs_control_net_init(ipvs) < 0) goto control_fail; if (ip_vs_protocol_net_init(ipvs) < 0) goto protocol_fail; if (ip_vs_app_net_init(ipvs) < 0) goto app_fail; if (ip_vs_conn_net_init(ipvs) < 0) goto conn_fail; if (ip_vs_sync_net_init(ipvs) < 0) goto sync_fail; return 0; /* * Error handling */ sync_fail: ip_vs_conn_net_cleanup(ipvs); conn_fail: ip_vs_app_net_cleanup(ipvs); app_fail: ip_vs_protocol_net_cleanup(ipvs); protocol_fail: ip_vs_control_net_cleanup(ipvs); control_fail: ip_vs_estimator_net_cleanup(ipvs); estimator_fail: net->ipvs = NULL; return -ENOMEM; } static void __net_exit __ip_vs_cleanup(struct net *net) { struct netns_ipvs *ipvs = net_ipvs(net); ip_vs_service_net_cleanup(ipvs); /* ip_vs_flush() with locks */ ip_vs_conn_net_cleanup(ipvs); ip_vs_app_net_cleanup(ipvs); ip_vs_protocol_net_cleanup(ipvs); ip_vs_control_net_cleanup(ipvs); ip_vs_estimator_net_cleanup(ipvs); IP_VS_DBG(2, "ipvs netns %d released\n", ipvs->gen); net->ipvs = NULL; } static int __net_init __ip_vs_dev_init(struct net *net) { int ret; ret = nf_register_net_hooks(net, ip_vs_ops, ARRAY_SIZE(ip_vs_ops)); if (ret < 0) goto hook_fail; return 0; hook_fail: return ret; } static void __net_exit __ip_vs_dev_cleanup(struct net *net) { struct netns_ipvs *ipvs = net_ipvs(net); EnterFunction(2); nf_unregister_net_hooks(net, ip_vs_ops, ARRAY_SIZE(ip_vs_ops)); ipvs->enable = 0; /* Disable packet reception */ smp_wmb(); ip_vs_sync_net_cleanup(ipvs); LeaveFunction(2); } static struct pernet_operations ipvs_core_ops = { .init = __ip_vs_init, .exit = __ip_vs_cleanup, .id = &ip_vs_net_id, .size = sizeof(struct netns_ipvs), }; static struct pernet_operations ipvs_core_dev_ops = { .init = __ip_vs_dev_init, .exit = __ip_vs_dev_cleanup, }; /* * Initialize IP Virtual Server */ static int __init ip_vs_init(void) { int ret; ret = ip_vs_control_init(); if (ret < 0) { pr_err("can't setup control.\n"); goto exit; } ip_vs_protocol_init(); ret = ip_vs_conn_init(); if (ret < 0) { pr_err("can't setup connection table.\n"); goto cleanup_protocol; } ret = register_pernet_subsys(&ipvs_core_ops); /* Alloc ip_vs struct */ if (ret < 0) goto cleanup_conn; ret = register_pernet_device(&ipvs_core_dev_ops); if (ret < 0) goto cleanup_sub; ret = ip_vs_register_nl_ioctl(); if (ret < 0) { pr_err("can't register netlink/ioctl.\n"); goto cleanup_dev; } pr_info("ipvs loaded.\n"); return ret; cleanup_dev: unregister_pernet_device(&ipvs_core_dev_ops); cleanup_sub: unregister_pernet_subsys(&ipvs_core_ops); cleanup_conn: ip_vs_conn_cleanup(); cleanup_protocol: ip_vs_protocol_cleanup(); ip_vs_control_cleanup(); exit: return ret; } static void __exit ip_vs_cleanup(void) { ip_vs_unregister_nl_ioctl(); unregister_pernet_device(&ipvs_core_dev_ops); unregister_pernet_subsys(&ipvs_core_ops); /* free ip_vs struct */ ip_vs_conn_cleanup(); ip_vs_protocol_cleanup(); ip_vs_control_cleanup(); pr_info("ipvs unloaded.\n"); } module_init(ip_vs_init); module_exit(ip_vs_cleanup); MODULE_LICENSE("GPL");
79 31 31 31 30 11 30 10 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 // SPDX-License-Identifier: GPL-2.0 /* * xfrm4_state.c * * Changes: * YOSHIFUJI Hideaki @USAGI * Split up af-specific portion * */ #include <net/ip.h> #include <net/xfrm.h> #include <linux/pfkeyv2.h> #include <linux/ipsec.h> #include <linux/netfilter_ipv4.h> #include <linux/export.h> static int xfrm4_init_flags(struct xfrm_state *x) { if (xs_net(x)->ipv4.sysctl_ip_no_pmtu_disc) x->props.flags |= XFRM_STATE_NOPMTUDISC; return 0; } static void __xfrm4_init_tempsel(struct xfrm_selector *sel, const struct flowi *fl) { const struct flowi4 *fl4 = &fl->u.ip4; sel->daddr.a4 = fl4->daddr; sel->saddr.a4 = fl4->saddr; sel->dport = xfrm_flowi_dport(fl, &fl4->uli); sel->dport_mask = htons(0xffff); sel->sport = xfrm_flowi_sport(fl, &fl4->uli); sel->sport_mask = htons(0xffff); sel->family = AF_INET; sel->prefixlen_d = 32; sel->prefixlen_s = 32; sel->proto = fl4->flowi4_proto; sel->ifindex = fl4->flowi4_oif; } static void xfrm4_init_temprop(struct xfrm_state *x, const struct xfrm_tmpl *tmpl, const xfrm_address_t *daddr, const xfrm_address_t *saddr) { x->id = tmpl->id; if (x->id.daddr.a4 == 0) x->id.daddr.a4 = daddr->a4; x->props.saddr = tmpl->saddr; if (x->props.saddr.a4 == 0) x->props.saddr.a4 = saddr->a4; x->props.mode = tmpl->mode; x->props.reqid = tmpl->reqid; x->props.family = AF_INET; } int xfrm4_extract_header(struct sk_buff *skb) { const struct iphdr *iph = ip_hdr(skb); XFRM_MODE_SKB_CB(skb)->ihl = sizeof(*iph); XFRM_MODE_SKB_CB(skb)->id = iph->id; XFRM_MODE_SKB_CB(skb)->frag_off = iph->frag_off; XFRM_MODE_SKB_CB(skb)->tos = iph->tos; XFRM_MODE_SKB_CB(skb)->ttl = iph->ttl; XFRM_MODE_SKB_CB(skb)->optlen = iph->ihl * 4 - sizeof(*iph); memset(XFRM_MODE_SKB_CB(skb)->flow_lbl, 0, sizeof(XFRM_MODE_SKB_CB(skb)->flow_lbl)); return 0; } static struct xfrm_state_afinfo xfrm4_state_afinfo = { .family = AF_INET, .proto = IPPROTO_IPIP, .eth_proto = htons(ETH_P_IP), .owner = THIS_MODULE, .init_flags = xfrm4_init_flags, .init_tempsel = __xfrm4_init_tempsel, .init_temprop = xfrm4_init_temprop, .output = xfrm4_output, .output_finish = xfrm4_output_finish, .extract_input = xfrm4_extract_input, .extract_output = xfrm4_extract_output, .transport_finish = xfrm4_transport_finish, .local_error = xfrm4_local_error, }; void __init xfrm4_state_init(void) { xfrm_state_register_afinfo(&xfrm4_state_afinfo); }
6 6 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __IP_SET_BITMAP_H #define __IP_SET_BITMAP_H #include <uapi/linux/netfilter/ipset/ip_set_bitmap.h> #define IPSET_BITMAP_MAX_RANGE 0x0000FFFF enum { IPSET_ADD_STORE_PLAIN_TIMEOUT = -1, IPSET_ADD_FAILED = 1, IPSET_ADD_START_STORED_TIMEOUT, }; /* Common functions */ static inline u32 range_to_mask(u32 from, u32 to, u8 *bits) { u32 mask = 0xFFFFFFFE; *bits = 32; while (--(*bits) > 0 && mask && (to & mask) != from) mask <<= 1; return mask; } #endif /* __IP_SET_BITMAP_H */
3 3 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef IOCONTEXT_H #define IOCONTEXT_H #include <linux/radix-tree.h> #include <linux/rcupdate.h> #include <linux/workqueue.h> enum { ICQ_EXITED = 1 << 2, ICQ_DESTROYED = 1 << 3, }; /* * An io_cq (icq) is association between an io_context (ioc) and a * request_queue (q). This is used by elevators which need to track * information per ioc - q pair. * * Elevator can request use of icq by setting elevator_type->icq_size and * ->icq_align. Both size and align must be larger than that of struct * io_cq and elevator can use the tail area for private information. The * recommended way to do this is defining a struct which contains io_cq as * the first member followed by private members and using its size and * align. For example, * * struct snail_io_cq { * struct io_cq icq; * int poke_snail; * int feed_snail; * }; * * struct elevator_type snail_elv_type { * .ops = { ... }, * .icq_size = sizeof(struct snail_io_cq), * .icq_align = __alignof__(struct snail_io_cq), * ... * }; * * If icq_size is set, block core will manage icq's. All requests will * have its ->elv.icq field set before elevator_ops->elevator_set_req_fn() * is called and be holding a reference to the associated io_context. * * Whenever a new icq is created, elevator_ops->elevator_init_icq_fn() is * called and, on destruction, ->elevator_exit_icq_fn(). Both functions * are called with both the associated io_context and queue locks held. * * Elevator is allowed to lookup icq using ioc_lookup_icq() while holding * queue lock but the returned icq is valid only until the queue lock is * released. Elevators can not and should not try to create or destroy * icq's. * * As icq's are linked from both ioc and q, the locking rules are a bit * complex. * * - ioc lock nests inside q lock. * * - ioc->icq_list and icq->ioc_node are protected by ioc lock. * q->icq_list and icq->q_node by q lock. * * - ioc->icq_tree and ioc->icq_hint are protected by ioc lock, while icq * itself is protected by q lock. However, both the indexes and icq * itself are also RCU managed and lookup can be performed holding only * the q lock. * * - icq's are not reference counted. They are destroyed when either the * ioc or q goes away. Each request with icq set holds an extra * reference to ioc to ensure it stays until the request is completed. * * - Linking and unlinking icq's are performed while holding both ioc and q * locks. Due to the lock ordering, q exit is simple but ioc exit * requires reverse-order double lock dance. */ struct io_cq { struct request_queue *q; struct io_context *ioc; /* * q_node and ioc_node link io_cq through icq_list of q and ioc * respectively. Both fields are unused once ioc_exit_icq() is * called and shared with __rcu_icq_cache and __rcu_head which are * used for RCU free of io_cq. */ union { struct list_head q_node; struct kmem_cache *__rcu_icq_cache; }; union { struct hlist_node ioc_node; struct rcu_head __rcu_head; }; unsigned int flags; }; /* * I/O subsystem state of the associated processes. It is refcounted * and kmalloc'ed. These could be shared between processes. */ struct io_context { atomic_long_t refcount; atomic_t active_ref; atomic_t nr_tasks; /* all the fields below are protected by this lock */ spinlock_t lock; unsigned short ioprio; /* * For request batching */ int nr_batch_requests; /* Number of requests left in the batch */ unsigned long last_waited; /* Time last woken after wait for request */ struct radix_tree_root icq_tree; struct io_cq __rcu *icq_hint; struct hlist_head icq_list; struct work_struct release_work; }; /** * get_io_context_active - get active reference on ioc * @ioc: ioc of interest * * Only iocs with active reference can issue new IOs. This function * acquires an active reference on @ioc. The caller must already have an * active reference on @ioc. */ static inline void get_io_context_active(struct io_context *ioc) { WARN_ON_ONCE(atomic_long_read(&ioc->refcount) <= 0); WARN_ON_ONCE(atomic_read(&ioc->active_ref) <= 0); atomic_long_inc(&ioc->refcount); atomic_inc(&ioc->active_ref); } static inline void ioc_task_link(struct io_context *ioc) { get_io_context_active(ioc); WARN_ON_ONCE(atomic_read(&ioc->nr_tasks) <= 0); atomic_inc(&ioc->nr_tasks); } struct task_struct; #ifdef CONFIG_BLOCK void put_io_context(struct io_context *ioc); void put_io_context_active(struct io_context *ioc); void exit_io_context(struct task_struct *task); struct io_context *get_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node); #else struct io_context; static inline void put_io_context(struct io_context *ioc) { } static inline void exit_io_context(struct task_struct *task) { } #endif #endif
1173 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_PID_NS_H #define _LINUX_PID_NS_H #include <linux/sched.h> #include <linux/bug.h> #include <linux/mm.h> #include <linux/workqueue.h> #include <linux/threads.h> #include <linux/nsproxy.h> #include <linux/kref.h> #include <linux/ns_common.h> #include <linux/idr.h> struct fs_pin; enum { /* definitions for pid_namespace's hide_pid field */ HIDEPID_OFF = 0, HIDEPID_NO_ACCESS = 1, HIDEPID_INVISIBLE = 2, }; struct pid_namespace { struct kref kref; struct idr idr; struct rcu_head rcu; unsigned int pid_allocated; struct task_struct *child_reaper; struct kmem_cache *pid_cachep; unsigned int level; struct pid_namespace *parent; #ifdef CONFIG_PROC_FS struct vfsmount *proc_mnt; struct dentry *proc_self; struct dentry *proc_thread_self; #endif #ifdef CONFIG_BSD_PROCESS_ACCT struct fs_pin *bacct; #endif struct user_namespace *user_ns; struct ucounts *ucounts; struct work_struct proc_work; kgid_t pid_gid; int hide_pid; int reboot; /* group exit code if this pidns was rebooted */ struct ns_common ns; } __randomize_layout; extern struct pid_namespace init_pid_ns; #define PIDNS_ADDING (1U << 31) #ifdef CONFIG_PID_NS static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns) { if (ns != &init_pid_ns) kref_get(&ns->kref); return ns; } extern struct pid_namespace *copy_pid_ns(unsigned long flags, struct user_namespace *user_ns, struct pid_namespace *ns); extern void zap_pid_ns_processes(struct pid_namespace *pid_ns); extern int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd); extern void put_pid_ns(struct pid_namespace *ns); #else /* !CONFIG_PID_NS */ #include <linux/err.h> static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns) { return ns; } static inline struct pid_namespace *copy_pid_ns(unsigned long flags, struct user_namespace *user_ns, struct pid_namespace *ns) { if (flags & CLONE_NEWPID) ns = ERR_PTR(-EINVAL); return ns; } static inline void put_pid_ns(struct pid_namespace *ns) { } static inline void zap_pid_ns_processes(struct pid_namespace *ns) { BUG(); } static inline int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd) { return 0; } #endif /* CONFIG_PID_NS */ extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk); void pidhash_init(void); void pid_idr_init(void); #endif /* _LINUX_PID_NS_H */
2827 2748 2825 3198 504 3197 2827 2827 2819 2820 2 2817 1598 747 1598 764 1598 675 641 41 14 2 14 3 39 671 669 668 671 692 111 692 671 24 76 24 10 707 704 705 1 692 707 704 42 124 124 42 42 1 896 897 715 713 896 82 82 82 2081 2284 2537 576 575 575 124 576 230 230 4 3 443 462 458 458 456 1 455 458 458 308 306 305 8 299 249 246 187 129 132 48 44 292 297 306 265 264 264 137 160 49 49 45 44 45 35 33 32 15 32 4 4 2 4 29 31 35 14 5 15 25 21 40 16 40 2 14 40 39 28 12 19 16 12 48 47 7 48 31 29 31 22 12 24 24 19 18 19 141 141 141 103 103 12 11 10 142 141 141 34 140 129 128 10 120 140 137 130 142 1 24 20 24 43 41 5 41 6 1 22 24 23 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 /* File: fs/xattr.c Extended attribute handling. Copyright (C) 2001 by Andreas Gruenbacher <a.gruenbacher@computer.org> Copyright (C) 2001 SGI - Silicon Graphics, Inc <linux-xfs@oss.sgi.com> Copyright (c) 2004 Red Hat, Inc., James Morris <jmorris@redhat.com> */ #include <linux/fs.h> #include <linux/slab.h> #include <linux/file.h> #include <linux/xattr.h> #include <linux/mount.h> #include <linux/namei.h> #include <linux/security.h> #include <linux/evm.h> #include <linux/syscalls.h> #include <linux/export.h> #include <linux/fsnotify.h> #include <linux/audit.h> #include <linux/vmalloc.h> #include <linux/posix_acl_xattr.h> #include <linux/uaccess.h> static const char * strcmp_prefix(const char *a, const char *a_prefix) { while (*a_prefix && *a == *a_prefix) { a++; a_prefix++; } return *a_prefix ? NULL : a; } /* * In order to implement different sets of xattr operations for each xattr * prefix, a filesystem should create a null-terminated array of struct * xattr_handler (one for each prefix) and hang a pointer to it off of the * s_xattr field of the superblock. */ #define for_each_xattr_handler(handlers, handler) \ if (handlers) \ for ((handler) = *(handlers)++; \ (handler) != NULL; \ (handler) = *(handlers)++) /* * Find the xattr_handler with the matching prefix. */ static const struct xattr_handler * xattr_resolve_name(struct inode *inode, const char **name) { const struct xattr_handler **handlers = inode->i_sb->s_xattr; const struct xattr_handler *handler; if (!(inode->i_opflags & IOP_XATTR)) { if (unlikely(is_bad_inode(inode))) return ERR_PTR(-EIO); return ERR_PTR(-EOPNOTSUPP); } for_each_xattr_handler(handlers, handler) { const char *n; n = strcmp_prefix(*name, xattr_prefix(handler)); if (n) { if (!handler->prefix ^ !*n) { if (*n) continue; return ERR_PTR(-EINVAL); } *name = n; return handler; } } return ERR_PTR(-EOPNOTSUPP); } /* * Check permissions for extended attribute access. This is a bit complicated * because different namespaces have very different rules. */ static int xattr_permission(struct inode *inode, const char *name, int mask) { /* * We can never set or remove an extended attribute on a read-only * filesystem or on an immutable / append-only inode. */ if (mask & MAY_WRITE) { if (IS_IMMUTABLE(inode) || IS_APPEND(inode)) return -EPERM; /* * Updating an xattr will likely cause i_uid and i_gid * to be writen back improperly if their true value is * unknown to the vfs. */ if (HAS_UNMAPPED_ID(inode)) return -EPERM; } /* * No restriction for security.* and system.* from the VFS. Decision * on these is left to the underlying filesystem / security module. */ if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN) || !strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN)) return 0; /* * The trusted.* namespace can only be accessed by privileged users. */ if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN)) { if (!capable(CAP_SYS_ADMIN)) return (mask & MAY_WRITE) ? -EPERM : -ENODATA; return 0; } /* * In the user.* namespace, only regular files and directories can have * extended attributes. For sticky directories, only the owner and * privileged users can write attributes. */ if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) { if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode)) return (mask & MAY_WRITE) ? -EPERM : -ENODATA; if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) && (mask & MAY_WRITE) && !inode_owner_or_capable(inode)) return -EPERM; } return inode_permission(inode, mask); } int __vfs_setxattr(struct dentry *dentry, struct inode *inode, const char *name, const void *value, size_t size, int flags) { const struct xattr_handler *handler; handler = xattr_resolve_name(inode, &name); if (IS_ERR(handler)) return PTR_ERR(handler); if (!handler->set) return -EOPNOTSUPP; if (size == 0) value = ""; /* empty EA, do not remove */ return handler->set(handler, dentry, inode, name, value, size, flags); } EXPORT_SYMBOL(__vfs_setxattr); /** * __vfs_setxattr_noperm - perform setxattr operation without performing * permission checks. * * @dentry - object to perform setxattr on * @name - xattr name to set * @value - value to set @name to * @size - size of @value * @flags - flags to pass into filesystem operations * * returns the result of the internal setxattr or setsecurity operations. * * This function requires the caller to lock the inode's i_mutex before it * is executed. It also assumes that the caller will make the appropriate * permission checks. */ int __vfs_setxattr_noperm(struct dentry *dentry, const char *name, const void *value, size_t size, int flags) { struct inode *inode = dentry->d_inode; int error = -EAGAIN; int issec = !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN); if (issec) inode->i_flags &= ~S_NOSEC; if (inode->i_opflags & IOP_XATTR) { error = __vfs_setxattr(dentry, inode, name, value, size, flags); if (!error) { fsnotify_xattr(dentry); security_inode_post_setxattr(dentry, name, value, size, flags); } } else { if (unlikely(is_bad_inode(inode))) return -EIO; } if (error == -EAGAIN) { error = -EOPNOTSUPP; if (issec) { const char *suffix = name + XATTR_SECURITY_PREFIX_LEN; error = security_inode_setsecurity(inode, suffix, value, size, flags); if (!error) fsnotify_xattr(dentry); } } return error; } /** * __vfs_setxattr_locked: set an extended attribute while holding the inode * lock * * @dentry - object to perform setxattr on * @name - xattr name to set * @value - value to set @name to * @size - size of @value * @flags - flags to pass into filesystem operations * @delegated_inode - on return, will contain an inode pointer that * a delegation was broken on, NULL if none. */ int __vfs_setxattr_locked(struct dentry *dentry, const char *name, const void *value, size_t size, int flags, struct inode **delegated_inode) { struct inode *inode = dentry->d_inode; int error; error = xattr_permission(inode, name, MAY_WRITE); if (error) return error; error = security_inode_setxattr(dentry, name, value, size, flags); if (error) goto out; error = try_break_deleg(inode, delegated_inode); if (error) goto out; error = __vfs_setxattr_noperm(dentry, name, value, size, flags); out: return error; } EXPORT_SYMBOL_GPL(__vfs_setxattr_locked); int vfs_setxattr(struct dentry *dentry, const char *name, const void *value, size_t size, int flags) { struct inode *inode = dentry->d_inode; struct inode *delegated_inode = NULL; int error; retry_deleg: inode_lock(inode); error = __vfs_setxattr_locked(dentry, name, value, size, flags, &delegated_inode); inode_unlock(inode); if (delegated_inode) { error = break_deleg_wait(&delegated_inode); if (!error) goto retry_deleg; } return error; } EXPORT_SYMBOL_GPL(vfs_setxattr); static ssize_t xattr_getsecurity(struct inode *inode, const char *name, void *value, size_t size) { void *buffer = NULL; ssize_t len; if (!value || !size) { len = security_inode_getsecurity(inode, name, &buffer, false); goto out_noalloc; } len = security_inode_getsecurity(inode, name, &buffer, true); if (len < 0) return len; if (size < len) { len = -ERANGE; goto out; } memcpy(value, buffer, len); out: kfree(buffer); out_noalloc: return len; } /* * vfs_getxattr_alloc - allocate memory, if necessary, before calling getxattr * * Allocate memory, if not already allocated, or re-allocate correct size, * before retrieving the extended attribute. * * Returns the result of alloc, if failed, or the getxattr operation. */ ssize_t vfs_getxattr_alloc(struct dentry *dentry, const char *name, char **xattr_value, size_t xattr_size, gfp_t flags) { const struct xattr_handler *handler; struct inode *inode = dentry->d_inode; char *value = *xattr_value; int error; error = xattr_permission(inode, name, MAY_READ); if (error) return error; handler = xattr_resolve_name(inode, &name); if (IS_ERR(handler)) return PTR_ERR(handler); if (!handler->get) return -EOPNOTSUPP; error = handler->get(handler, dentry, inode, name, NULL, 0); if (error < 0) return error; if (!value || (error > xattr_size)) { value = krealloc(*xattr_value, error + 1, flags); if (!value) return -ENOMEM; memset(value, 0, error + 1); } error = handler->get(handler, dentry, inode, name, value, error); *xattr_value = value; return error; } ssize_t __vfs_getxattr(struct dentry *dentry, struct inode *inode, const char *name, void *value, size_t size) { const struct xattr_handler *handler; handler = xattr_resolve_name(inode, &name); if (IS_ERR(handler)) return PTR_ERR(handler); if (!handler->get) return -EOPNOTSUPP; return handler->get(handler, dentry, inode, name, value, size); } EXPORT_SYMBOL(__vfs_getxattr); ssize_t vfs_getxattr(struct dentry *dentry, const char *name, void *value, size_t size) { struct inode *inode = dentry->d_inode; int error; error = xattr_permission(inode, name, MAY_READ); if (error) return error; error = security_inode_getxattr(dentry, name); if (error) return error; if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) { const char *suffix = name + XATTR_SECURITY_PREFIX_LEN; int ret = xattr_getsecurity(inode, suffix, value, size); /* * Only overwrite the return value if a security module * is actually active. */ if (ret == -EOPNOTSUPP) goto nolsm; return ret; } nolsm: return __vfs_getxattr(dentry, inode, name, value, size); } EXPORT_SYMBOL_GPL(vfs_getxattr); ssize_t vfs_listxattr(struct dentry *dentry, char *list, size_t size) { struct inode *inode = d_inode(dentry); ssize_t error; error = security_inode_listxattr(dentry); if (error) return error; if (inode->i_op->listxattr && (inode->i_opflags & IOP_XATTR)) { error = inode->i_op->listxattr(dentry, list, size); } else { error = security_inode_listsecurity(inode, list, size); if (size && error > size) error = -ERANGE; } return error; } EXPORT_SYMBOL_GPL(vfs_listxattr); int __vfs_removexattr(struct dentry *dentry, const char *name) { struct inode *inode = d_inode(dentry); const struct xattr_handler *handler; handler = xattr_resolve_name(inode, &name); if (IS_ERR(handler)) return PTR_ERR(handler); if (!handler->set) return -EOPNOTSUPP; return handler->set(handler, dentry, inode, name, NULL, 0, XATTR_REPLACE); } EXPORT_SYMBOL(__vfs_removexattr); /** * __vfs_removexattr_locked: set an extended attribute while holding the inode * lock * * @dentry - object to perform setxattr on * @name - name of xattr to remove * @delegated_inode - on return, will contain an inode pointer that * a delegation was broken on, NULL if none. */ int __vfs_removexattr_locked(struct dentry *dentry, const char *name, struct inode **delegated_inode) { struct inode *inode = dentry->d_inode; int error; error = xattr_permission(inode, name, MAY_WRITE); if (error) return error; error = security_inode_removexattr(dentry, name); if (error) goto out; error = try_break_deleg(inode, delegated_inode); if (error) goto out; error = __vfs_removexattr(dentry, name); if (!error) { fsnotify_xattr(dentry); evm_inode_post_removexattr(dentry, name); } out: return error; } EXPORT_SYMBOL_GPL(__vfs_removexattr_locked); int vfs_removexattr(struct dentry *dentry, const char *name) { struct inode *inode = dentry->d_inode; struct inode *delegated_inode = NULL; int error; retry_deleg: inode_lock(inode); error = __vfs_removexattr_locked(dentry, name, &delegated_inode); inode_unlock(inode); if (delegated_inode) { error = break_deleg_wait(&delegated_inode); if (!error) goto retry_deleg; } return error; } EXPORT_SYMBOL_GPL(vfs_removexattr); /* * Extended attribute SET operations */ static long setxattr(struct dentry *d, const char __user *name, const void __user *value, size_t size, int flags) { int error; void *kvalue = NULL; char kname[XATTR_NAME_MAX + 1]; if (flags & ~(XATTR_CREATE|XATTR_REPLACE)) return -EINVAL; error = strncpy_from_user(kname, name, sizeof(kname)); if (error == 0 || error == sizeof(kname)) error = -ERANGE; if (error < 0) return error; if (size) { if (size > XATTR_SIZE_MAX) return -E2BIG; kvalue = kvmalloc(size, GFP_KERNEL); if (!kvalue) return -ENOMEM; if (copy_from_user(kvalue, value, size)) { error = -EFAULT; goto out; } if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) || (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)) posix_acl_fix_xattr_from_user(kvalue, size); else if (strcmp(kname, XATTR_NAME_CAPS) == 0) { error = cap_convert_nscap(d, &kvalue, size); if (error < 0) goto out; size = error; } } error = vfs_setxattr(d, kname, kvalue, size, flags); out: kvfree(kvalue); return error; } static int path_setxattr(const char __user *pathname, const char __user *name, const void __user *value, size_t size, int flags, unsigned int lookup_flags) { struct path path; int error; retry: error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path); if (error) return error; error = mnt_want_write(path.mnt); if (!error) { error = setxattr(path.dentry, name, value, size, flags); mnt_drop_write(path.mnt); } path_put(&path); if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } return error; } SYSCALL_DEFINE5(setxattr, const char __user *, pathname, const char __user *, name, const void __user *, value, size_t, size, int, flags) { return path_setxattr(pathname, name, value, size, flags, LOOKUP_FOLLOW); } SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname, const char __user *, name, const void __user *, value, size_t, size, int, flags) { return path_setxattr(pathname, name, value, size, flags, 0); } SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name, const void __user *,value, size_t, size, int, flags) { struct fd f = fdget(fd); int error = -EBADF; if (!f.file) return error; audit_file(f.file); error = mnt_want_write_file(f.file); if (!error) { error = setxattr(f.file->f_path.dentry, name, value, size, flags); mnt_drop_write_file(f.file); } fdput(f); return error; } /* * Extended attribute GET operations */ static ssize_t getxattr(struct dentry *d, const char __user *name, void __user *value, size_t size) { ssize_t error; void *kvalue = NULL; char kname[XATTR_NAME_MAX + 1]; error = strncpy_from_user(kname, name, sizeof(kname)); if (error == 0 || error == sizeof(kname)) error = -ERANGE; if (error < 0) return error; if (size) { if (size > XATTR_SIZE_MAX) size = XATTR_SIZE_MAX; kvalue = kvzalloc(size, GFP_KERNEL); if (!kvalue) return -ENOMEM; } error = vfs_getxattr(d, kname, kvalue, size); if (error > 0) { if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) || (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)) posix_acl_fix_xattr_to_user(kvalue, error); if (size && copy_to_user(value, kvalue, error)) error = -EFAULT; } else if (error == -ERANGE && size >= XATTR_SIZE_MAX) { /* The file system tried to returned a value bigger than XATTR_SIZE_MAX bytes. Not possible. */ error = -E2BIG; } kvfree(kvalue); return error; } static ssize_t path_getxattr(const char __user *pathname, const char __user *name, void __user *value, size_t size, unsigned int lookup_flags) { struct path path; ssize_t error; retry: error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path); if (error) return error; error = getxattr(path.dentry, name, value, size); path_put(&path); if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } return error; } SYSCALL_DEFINE4(getxattr, const char __user *, pathname, const char __user *, name, void __user *, value, size_t, size) { return path_getxattr(pathname, name, value, size, LOOKUP_FOLLOW); } SYSCALL_DEFINE4(lgetxattr, const char __user *, pathname, const char __user *, name, void __user *, value, size_t, size) { return path_getxattr(pathname, name, value, size, 0); } SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name, void __user *, value, size_t, size) { struct fd f = fdget(fd); ssize_t error = -EBADF; if (!f.file) return error; audit_file(f.file); error = getxattr(f.file->f_path.dentry, name, value, size); fdput(f); return error; } /* * Extended attribute LIST operations */ static ssize_t listxattr(struct dentry *d, char __user *list, size_t size) { ssize_t error; char *klist = NULL; if (size) { if (size > XATTR_LIST_MAX) size = XATTR_LIST_MAX; klist = kvmalloc(size, GFP_KERNEL); if (!klist) return -ENOMEM; } error = vfs_listxattr(d, klist, size); if (error > 0) { if (size && copy_to_user(list, klist, error)) error = -EFAULT; } else if (error == -ERANGE && size >= XATTR_LIST_MAX) { /* The file system tried to returned a list bigger than XATTR_LIST_MAX bytes. Not possible. */ error = -E2BIG; } kvfree(klist); return error; } static ssize_t path_listxattr(const char __user *pathname, char __user *list, size_t size, unsigned int lookup_flags) { struct path path; ssize_t error; retry: error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path); if (error) return error; error = listxattr(path.dentry, list, size); path_put(&path); if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } return error; } SYSCALL_DEFINE3(listxattr, const char __user *, pathname, char __user *, list, size_t, size) { return path_listxattr(pathname, list, size, LOOKUP_FOLLOW); } SYSCALL_DEFINE3(llistxattr, const char __user *, pathname, char __user *, list, size_t, size) { return path_listxattr(pathname, list, size, 0); } SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size) { struct fd f = fdget(fd); ssize_t error = -EBADF; if (!f.file) return error; audit_file(f.file); error = listxattr(f.file->f_path.dentry, list, size); fdput(f); return error; } /* * Extended attribute REMOVE operations */ static long removexattr(struct dentry *d, const char __user *name) { int error; char kname[XATTR_NAME_MAX + 1]; error = strncpy_from_user(kname, name, sizeof(kname)); if (error == 0 || error == sizeof(kname)) error = -ERANGE; if (error < 0) return error; return vfs_removexattr(d, kname); } static int path_removexattr(const char __user *pathname, const char __user *name, unsigned int lookup_flags) { struct path path; int error; retry: error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path); if (error) return error; error = mnt_want_write(path.mnt); if (!error) { error = removexattr(path.dentry, name); mnt_drop_write(path.mnt); } path_put(&path); if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } return error; } SYSCALL_DEFINE2(removexattr, const char __user *, pathname, const char __user *, name) { return path_removexattr(pathname, name, LOOKUP_FOLLOW); } SYSCALL_DEFINE2(lremovexattr, const char __user *, pathname, const char __user *, name) { return path_removexattr(pathname, name, 0); } SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name) { struct fd f = fdget(fd); int error = -EBADF; if (!f.file) return error; audit_file(f.file); error = mnt_want_write_file(f.file); if (!error) { error = removexattr(f.file->f_path.dentry, name); mnt_drop_write_file(f.file); } fdput(f); return error; } /* * Combine the results of the list() operation from every xattr_handler in the * list. */ ssize_t generic_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size) { const struct xattr_handler *handler, **handlers = dentry->d_sb->s_xattr; unsigned int size = 0; if (!buffer) { for_each_xattr_handler(handlers, handler) { if (!handler->name || (handler->list && !handler->list(dentry))) continue; size += strlen(handler->name) + 1; } } else { char *buf = buffer; size_t len; for_each_xattr_handler(handlers, handler) { if (!handler->name || (handler->list && !handler->list(dentry))) continue; len = strlen(handler->name); if (len + 1 > buffer_size) return -ERANGE; memcpy(buf, handler->name, len + 1); buf += len + 1; buffer_size -= len + 1; } size = buf - buffer; } return size; } EXPORT_SYMBOL(generic_listxattr); /** * xattr_full_name - Compute full attribute name from suffix * * @handler: handler of the xattr_handler operation * @name: name passed to the xattr_handler operation * * The get and set xattr handler operations are called with the remainder of * the attribute name after skipping the handler's prefix: for example, "foo" * is passed to the get operation of a handler with prefix "user." to get * attribute "user.foo". The full name is still "there" in the name though. * * Note: the list xattr handler operation when called from the vfs is passed a * NULL name; some file systems use this operation internally, with varying * semantics. */ const char *xattr_full_name(const struct xattr_handler *handler, const char *name) { size_t prefix_len = strlen(xattr_prefix(handler)); return name - prefix_len; } EXPORT_SYMBOL(xattr_full_name); /* * Allocate new xattr and copy in the value; but leave the name to callers. */ struct simple_xattr *simple_xattr_alloc(const void *value, size_t size) { struct simple_xattr *new_xattr; size_t len; /* wrap around? */ len = sizeof(*new_xattr) + size; if (len < sizeof(*new_xattr)) return NULL; new_xattr = kmalloc(len, GFP_KERNEL); if (!new_xattr) return NULL; new_xattr->size = size; memcpy(new_xattr->value, value, size); return new_xattr; } /* * xattr GET operation for in-memory/pseudo filesystems */ int simple_xattr_get(struct simple_xattrs *xattrs, const char *name, void *buffer, size_t size) { struct simple_xattr *xattr; int ret = -ENODATA; spin_lock(&xattrs->lock); list_for_each_entry(xattr, &xattrs->head, list) { if (strcmp(name, xattr->name)) continue; ret = xattr->size; if (buffer) { if (size < xattr->size) ret = -ERANGE; else memcpy(buffer, xattr->value, xattr->size); } break; } spin_unlock(&xattrs->lock); return ret; } /** * simple_xattr_set - xattr SET operation for in-memory/pseudo filesystems * @xattrs: target simple_xattr list * @name: name of the extended attribute * @value: value of the xattr. If %NULL, will remove the attribute. * @size: size of the new xattr * @flags: %XATTR_{CREATE|REPLACE} * * %XATTR_CREATE is set, the xattr shouldn't exist already; otherwise fails * with -EEXIST. If %XATTR_REPLACE is set, the xattr should exist; * otherwise, fails with -ENODATA. * * Returns 0 on success, -errno on failure. */ int simple_xattr_set(struct simple_xattrs *xattrs, const char *name, const void *value, size_t size, int flags) { struct simple_xattr *xattr; struct simple_xattr *new_xattr = NULL; int err = 0; /* value == NULL means remove */ if (value) { new_xattr = simple_xattr_alloc(value, size); if (!new_xattr) return -ENOMEM; new_xattr->name = kstrdup(name, GFP_KERNEL); if (!new_xattr->name) { kfree(new_xattr); return -ENOMEM; } } spin_lock(&xattrs->lock); list_for_each_entry(xattr, &xattrs->head, list) { if (!strcmp(name, xattr->name)) { if (flags & XATTR_CREATE) { xattr = new_xattr; err = -EEXIST; } else if (new_xattr) { list_replace(&xattr->list, &new_xattr->list); } else { list_del(&xattr->list); } goto out; } } if (flags & XATTR_REPLACE) { xattr = new_xattr; err = -ENODATA; } else { list_add(&new_xattr->list, &xattrs->head); xattr = NULL; } out: spin_unlock(&xattrs->lock); if (xattr) { kfree(xattr->name); kfree(xattr); } return err; } static bool xattr_is_trusted(const char *name) { return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN); } static int xattr_list_one(char **buffer, ssize_t *remaining_size, const char *name) { size_t len = strlen(name) + 1; if (*buffer) { if (*remaining_size < len) return -ERANGE; memcpy(*buffer, name, len); *buffer += len; } *remaining_size -= len; return 0; } /* * xattr LIST operation for in-memory/pseudo filesystems */ ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs, char *buffer, size_t size) { bool trusted = capable(CAP_SYS_ADMIN); struct simple_xattr *xattr; ssize_t remaining_size = size; int err = 0; #ifdef CONFIG_FS_POSIX_ACL if (IS_POSIXACL(inode)) { if (inode->i_acl) { err = xattr_list_one(&buffer, &remaining_size, XATTR_NAME_POSIX_ACL_ACCESS); if (err) return err; } if (inode->i_default_acl) { err = xattr_list_one(&buffer, &remaining_size, XATTR_NAME_POSIX_ACL_DEFAULT); if (err) return err; } } #endif spin_lock(&xattrs->lock); list_for_each_entry(xattr, &xattrs->head, list) { /* skip "trusted." attributes for unprivileged callers */ if (!trusted && xattr_is_trusted(xattr->name)) continue; err = xattr_list_one(&buffer, &remaining_size, xattr->name); if (err) break; } spin_unlock(&xattrs->lock); return err ? err : size - remaining_size; } /* * Adds an extended attribute to the list */ void simple_xattr_list_add(struct simple_xattrs *xattrs, struct simple_xattr *new_xattr) { spin_lock(&xattrs->lock); list_add(&new_xattr->list, &xattrs->head); spin_unlock(&xattrs->lock); }
167 243 247 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 /* SPDX-License-Identifier: GPL-2.0 */ /* Copyright (C) 2007-2018 B.A.T.M.A.N. contributors: * * Marek Lindner, Simon Wunderlich * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public * License as published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, see <http://www.gnu.org/licenses/>. */ #ifndef _NET_BATMAN_ADV_HARD_INTERFACE_H_ #define _NET_BATMAN_ADV_HARD_INTERFACE_H_ #include "main.h" #include <linux/compiler.h> #include <linux/kref.h> #include <linux/notifier.h> #include <linux/rcupdate.h> #include <linux/stddef.h> #include <linux/types.h> struct net_device; struct net; /** * enum batadv_hard_if_state - State of a hard interface */ enum batadv_hard_if_state { /** * @BATADV_IF_NOT_IN_USE: interface is not used as slave interface of a * batman-adv soft interface */ BATADV_IF_NOT_IN_USE, /** * @BATADV_IF_TO_BE_REMOVED: interface will be removed from soft * interface */ BATADV_IF_TO_BE_REMOVED, /** @BATADV_IF_INACTIVE: interface is deactivated */ BATADV_IF_INACTIVE, /** @BATADV_IF_ACTIVE: interface is used */ BATADV_IF_ACTIVE, /** @BATADV_IF_TO_BE_ACTIVATED: interface is getting activated */ BATADV_IF_TO_BE_ACTIVATED, /** * @BATADV_IF_I_WANT_YOU: interface is queued up (using sysfs) for being * added as slave interface of a batman-adv soft interface */ BATADV_IF_I_WANT_YOU, }; /** * enum batadv_hard_if_bcast - broadcast avoidance options */ enum batadv_hard_if_bcast { /** @BATADV_HARDIF_BCAST_OK: Do broadcast on according hard interface */ BATADV_HARDIF_BCAST_OK = 0, /** * @BATADV_HARDIF_BCAST_NORECIPIENT: Broadcast not needed, there is no * recipient */ BATADV_HARDIF_BCAST_NORECIPIENT, /** * @BATADV_HARDIF_BCAST_DUPFWD: There is just the neighbor we got it * from */ BATADV_HARDIF_BCAST_DUPFWD, /** @BATADV_HARDIF_BCAST_DUPORIG: There is just the originator */ BATADV_HARDIF_BCAST_DUPORIG, }; /** * enum batadv_hard_if_cleanup - Cleanup modi for soft_iface after slave removal */ enum batadv_hard_if_cleanup { /** * @BATADV_IF_CLEANUP_KEEP: Don't automatically delete soft-interface */ BATADV_IF_CLEANUP_KEEP, /** * @BATADV_IF_CLEANUP_AUTO: Delete soft-interface after last slave was * removed */ BATADV_IF_CLEANUP_AUTO, }; extern struct notifier_block batadv_hard_if_notifier; struct net_device *batadv_get_real_netdev(struct net_device *net_device); bool batadv_is_cfg80211_hardif(struct batadv_hard_iface *hard_iface); bool batadv_is_wifi_hardif(struct batadv_hard_iface *hard_iface); struct batadv_hard_iface* batadv_hardif_get_by_netdev(const struct net_device *net_dev); int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface, struct net *net, const char *iface_name); void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface, enum batadv_hard_if_cleanup autodel); void batadv_hardif_remove_interfaces(void); int batadv_hardif_min_mtu(struct net_device *soft_iface); void batadv_update_min_mtu(struct net_device *soft_iface); void batadv_hardif_release(struct kref *ref); int batadv_hardif_no_broadcast(struct batadv_hard_iface *if_outgoing, u8 *orig_addr, u8 *orig_neigh); /** * batadv_hardif_put() - decrement the hard interface refcounter and possibly * release it * @hard_iface: the hard interface to free */ static inline void batadv_hardif_put(struct batadv_hard_iface *hard_iface) { kref_put(&hard_iface->refcount, batadv_hardif_release); } /** * batadv_primary_if_get_selected() - Get reference to primary interface * @bat_priv: the bat priv with all the soft interface information * * Return: primary interface (with increased refcnt), otherwise NULL */ static inline struct batadv_hard_iface * batadv_primary_if_get_selected(struct batadv_priv *bat_priv) { struct batadv_hard_iface *hard_iface; rcu_read_lock(); hard_iface = rcu_dereference(bat_priv->primary_if); if (!hard_iface) goto out; if (!kref_get_unless_zero(&hard_iface->refcount)) hard_iface = NULL; out: rcu_read_unlock(); return hard_iface; } #endif /* _NET_BATMAN_ADV_HARD_INTERFACE_H_ */
34 34 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 /* * NetLabel NETLINK Interface * * This file defines the NETLINK interface for the NetLabel system. The * NetLabel system manages static and dynamic label mappings for network * protocols such as CIPSO and RIPSO. * * Author: Paul Moore <paul@paul-moore.com> * */ /* * (c) Copyright Hewlett-Packard Development Company, L.P., 2006 * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See * the GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, see <http://www.gnu.org/licenses/>. * */ #include <linux/init.h> #include <linux/types.h> #include <linux/list.h> #include <linux/socket.h> #include <linux/audit.h> #include <linux/tty.h> #include <linux/security.h> #include <linux/gfp.h> #include <net/sock.h> #include <net/netlink.h> #include <net/genetlink.h> #include <net/netlabel.h> #include <asm/bug.h> #include "netlabel_mgmt.h" #include "netlabel_unlabeled.h" #include "netlabel_cipso_v4.h" #include "netlabel_calipso.h" #include "netlabel_user.h" /* * NetLabel NETLINK Setup Functions */ /** * netlbl_netlink_init - Initialize the NETLINK communication channel * * Description: * Call out to the NetLabel components so they can register their families and * commands with the Generic NETLINK mechanism. Returns zero on success and * non-zero on failure. * */ int __init netlbl_netlink_init(void) { int ret_val; ret_val = netlbl_mgmt_genl_init(); if (ret_val != 0) return ret_val; ret_val = netlbl_cipsov4_genl_init(); if (ret_val != 0) return ret_val; ret_val = netlbl_calipso_genl_init(); if (ret_val != 0) return ret_val; return netlbl_unlabel_genl_init(); } /* * NetLabel Audit Functions */ /** * netlbl_audit_start_common - Start an audit message * @type: audit message type * @audit_info: NetLabel audit information * * Description: * Start an audit message using the type specified in @type and fill the audit * message with some fields common to all NetLabel audit messages. Returns * a pointer to the audit buffer on success, NULL on failure. * */ struct audit_buffer *netlbl_audit_start_common(int type, struct netlbl_audit *audit_info) { struct audit_buffer *audit_buf; char *secctx; u32 secctx_len; if (audit_enabled == AUDIT_OFF) return NULL; audit_buf = audit_log_start(audit_context(), GFP_ATOMIC, type); if (audit_buf == NULL) return NULL; audit_log_format(audit_buf, "netlabel: auid=%u ses=%u", from_kuid(&init_user_ns, audit_info->loginuid), audit_info->sessionid); if (audit_info->secid != 0 && security_secid_to_secctx(audit_info->secid, &secctx, &secctx_len) == 0) { audit_log_format(audit_buf, " subj=%s", secctx); security_release_secctx(secctx, secctx_len); } return audit_buf; }
17 16 17 17 14 1 14 14 14 7 7 1 7 2 1 7 7 1 4 4 2 7 7 9 9 9 1 1 1 1 9 1 4 4 1 1 1 9 10 10 10 10 10 9 10 10 10 10 10 10 10 10 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 3 9 7 4 2 2 1 1 1 9 9 9 9 9 9 9 9 9 3 9 3 3 3 3 3 2 3 1 3 7 7 7 5 1 4 3 1 2 1 7 7 7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 /* * PPP async serial channel driver for Linux. * * Copyright 1999 Paul Mackerras. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * * This driver provides the encapsulation and framing for sending * and receiving PPP frames over async serial lines. It relies on * the generic PPP layer to give it frames to send and to process * received frames. It implements the PPP line discipline. * * Part of the code in this driver was inspired by the old async-only * PPP driver, written by Michael Callahan and Al Longyear, and * subsequently hacked by Paul Mackerras. */ #include <linux/module.h> #include <linux/kernel.h> #include <linux/skbuff.h> #include <linux/tty.h> #include <linux/netdevice.h> #include <linux/poll.h> #include <linux/crc-ccitt.h> #include <linux/ppp_defs.h> #include <linux/ppp-ioctl.h> #include <linux/ppp_channel.h> #include <linux/spinlock.h> #include <linux/init.h> #include <linux/interrupt.h> #include <linux/jiffies.h> #include <linux/slab.h> #include <asm/unaligned.h> #include <linux/uaccess.h> #include <asm/string.h> #define PPP_VERSION "2.4.2" #define OBUFSIZE 4096 /* Structure for storing local state. */ struct asyncppp { struct tty_struct *tty; unsigned int flags; unsigned int state; unsigned int rbits; int mru; spinlock_t xmit_lock; spinlock_t recv_lock; unsigned long xmit_flags; u32 xaccm[8]; u32 raccm; unsigned int bytes_sent; unsigned int bytes_rcvd; struct sk_buff *tpkt; int tpkt_pos; u16 tfcs; unsigned char *optr; unsigned char *olim; unsigned long last_xmit; struct sk_buff *rpkt; int lcp_fcs; struct sk_buff_head rqueue; struct tasklet_struct tsk; refcount_t refcnt; struct semaphore dead_sem; struct ppp_channel chan; /* interface to generic ppp layer */ unsigned char obuf[OBUFSIZE]; }; /* Bit numbers in xmit_flags */ #define XMIT_WAKEUP 0 #define XMIT_FULL 1 #define XMIT_BUSY 2 /* State bits */ #define SC_TOSS 1 #define SC_ESCAPE 2 #define SC_PREV_ERROR 4 /* Bits in rbits */ #define SC_RCV_BITS (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP) static int flag_time = HZ; module_param(flag_time, int, 0); MODULE_PARM_DESC(flag_time, "ppp_async: interval between flagged packets (in clock ticks)"); MODULE_LICENSE("GPL"); MODULE_ALIAS_LDISC(N_PPP); /* * Prototypes. */ static int ppp_async_encode(struct asyncppp *ap); static int ppp_async_send(struct ppp_channel *chan, struct sk_buff *skb); static int ppp_async_push(struct asyncppp *ap); static void ppp_async_flush_output(struct asyncppp *ap); static void ppp_async_input(struct asyncppp *ap, const unsigned char *buf, char *flags, int count); static int ppp_async_ioctl(struct ppp_channel *chan, unsigned int cmd, unsigned long arg); static void ppp_async_process(unsigned long arg); static void async_lcp_peek(struct asyncppp *ap, unsigned char *data, int len, int inbound); static const struct ppp_channel_ops async_ops = { .start_xmit = ppp_async_send, .ioctl = ppp_async_ioctl, }; /* * Routines implementing the PPP line discipline. */ /* * We have a potential race on dereferencing tty->disc_data, * because the tty layer provides no locking at all - thus one * cpu could be running ppp_asynctty_receive while another * calls ppp_asynctty_close, which zeroes tty->disc_data and * frees the memory that ppp_asynctty_receive is using. The best * way to fix this is to use a rwlock in the tty struct, but for now * we use a single global rwlock for all ttys in ppp line discipline. * * FIXME: this is no longer true. The _close path for the ldisc is * now guaranteed to be sane. */ static DEFINE_RWLOCK(disc_data_lock); static struct asyncppp *ap_get(struct tty_struct *tty) { struct asyncppp *ap; read_lock(&disc_data_lock); ap = tty->disc_data; if (ap != NULL) refcount_inc(&ap->refcnt); read_unlock(&disc_data_lock); return ap; } static void ap_put(struct asyncppp *ap) { if (refcount_dec_and_test(&ap->refcnt)) up(&ap->dead_sem); } /* * Called when a tty is put into PPP line discipline. Called in process * context. */ static int ppp_asynctty_open(struct tty_struct *tty) { struct asyncppp *ap; int err; int speed; if (tty->ops->write == NULL) return -EOPNOTSUPP; err = -ENOMEM; ap = kzalloc(sizeof(*ap), GFP_KERNEL); if (!ap) goto out; /* initialize the asyncppp structure */ ap->tty = tty; ap->mru = PPP_MRU; spin_lock_init(&ap->xmit_lock); spin_lock_init(&ap->recv_lock); ap->xaccm[0] = ~0U; ap->xaccm[3] = 0x60000000U; ap->raccm = ~0U; ap->optr = ap->obuf; ap->olim = ap->obuf; ap->lcp_fcs = -1; skb_queue_head_init(&ap->rqueue); tasklet_init(&ap->tsk, ppp_async_process, (unsigned long) ap); refcount_set(&ap->refcnt, 1); sema_init(&ap->dead_sem, 0); ap->chan.private = ap; ap->chan.ops = &async_ops; ap->chan.mtu = PPP_MRU; speed = tty_get_baud_rate(tty); ap->chan.speed = speed; err = ppp_register_channel(&ap->chan); if (err) goto out_free; tty->disc_data = ap; tty->receive_room = 65536; return 0; out_free: kfree(ap); out: return err; } /* * Called when the tty is put into another line discipline * or it hangs up. We have to wait for any cpu currently * executing in any of the other ppp_asynctty_* routines to * finish before we can call ppp_unregister_channel and free * the asyncppp struct. This routine must be called from * process context, not interrupt or softirq context. */ static void ppp_asynctty_close(struct tty_struct *tty) { struct asyncppp *ap; write_lock_irq(&disc_data_lock); ap = tty->disc_data; tty->disc_data = NULL; write_unlock_irq(&disc_data_lock); if (!ap) return; /* * We have now ensured that nobody can start using ap from now * on, but we have to wait for all existing users to finish. * Note that ppp_unregister_channel ensures that no calls to * our channel ops (i.e. ppp_async_send/ioctl) are in progress * by the time it returns. */ if (!refcount_dec_and_test(&ap->refcnt)) down(&ap->dead_sem); tasklet_kill(&ap->tsk); ppp_unregister_channel(&ap->chan); kfree_skb(ap->rpkt); skb_queue_purge(&ap->rqueue); kfree_skb(ap->tpkt); kfree(ap); } /* * Called on tty hangup in process context. * * Wait for I/O to driver to complete and unregister PPP channel. * This is already done by the close routine, so just call that. */ static int ppp_asynctty_hangup(struct tty_struct *tty) { ppp_asynctty_close(tty); return 0; } /* * Read does nothing - no data is ever available this way. * Pppd reads and writes packets via /dev/ppp instead. */ static ssize_t ppp_asynctty_read(struct tty_struct *tty, struct file *file, unsigned char __user *buf, size_t count) { return -EAGAIN; } /* * Write on the tty does nothing, the packets all come in * from the ppp generic stuff. */ static ssize_t ppp_asynctty_write(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t count) { return -EAGAIN; } /* * Called in process context only. May be re-entered by multiple * ioctl calling threads. */ static int ppp_asynctty_ioctl(struct tty_struct *tty, struct file *file, unsigned int cmd, unsigned long arg) { struct asyncppp *ap = ap_get(tty); int err, val; int __user *p = (int __user *)arg; if (!ap) return -ENXIO; err = -EFAULT; switch (cmd) { case PPPIOCGCHAN: err = -EFAULT; if (put_user(ppp_channel_index(&ap->chan), p)) break; err = 0; break; case PPPIOCGUNIT: err = -EFAULT; if (put_user(ppp_unit_number(&ap->chan), p)) break; err = 0; break; case TCFLSH: /* flush our buffers and the serial port's buffer */ if (arg == TCIOFLUSH || arg == TCOFLUSH) ppp_async_flush_output(ap); err = n_tty_ioctl_helper(tty, file, cmd, arg); break; case FIONREAD: val = 0; if (put_user(val, p)) break; err = 0; break; default: /* Try the various mode ioctls */ err = tty_mode_ioctl(tty, file, cmd, arg); } ap_put(ap); return err; } /* No kernel lock - fine */ static __poll_t ppp_asynctty_poll(struct tty_struct *tty, struct file *file, poll_table *wait) { return 0; } /* May sleep, don't call from interrupt level or with interrupts disabled */ static void ppp_asynctty_receive(struct tty_struct *tty, const unsigned char *buf, char *cflags, int count) { struct asyncppp *ap = ap_get(tty); unsigned long flags; if (!ap) return; spin_lock_irqsave(&ap->recv_lock, flags); ppp_async_input(ap, buf, cflags, count); spin_unlock_irqrestore(&ap->recv_lock, flags); if (!skb_queue_empty(&ap->rqueue)) tasklet_schedule(&ap->tsk); ap_put(ap); tty_unthrottle(tty); } static void ppp_asynctty_wakeup(struct tty_struct *tty) { struct asyncppp *ap = ap_get(tty); clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags); if (!ap) return; set_bit(XMIT_WAKEUP, &ap->xmit_flags); tasklet_schedule(&ap->tsk); ap_put(ap); } static struct tty_ldisc_ops ppp_ldisc = { .owner = THIS_MODULE, .magic = TTY_LDISC_MAGIC, .name = "ppp", .open = ppp_asynctty_open, .close = ppp_asynctty_close, .hangup = ppp_asynctty_hangup, .read = ppp_asynctty_read, .write = ppp_asynctty_write, .ioctl = ppp_asynctty_ioctl, .poll = ppp_asynctty_poll, .receive_buf = ppp_asynctty_receive, .write_wakeup = ppp_asynctty_wakeup, }; static int __init ppp_async_init(void) { int err; err = tty_register_ldisc(N_PPP, &ppp_ldisc); if (err != 0) printk(KERN_ERR "PPP_async: error %d registering line disc.\n", err); return err; } /* * The following routines provide the PPP channel interface. */ static int ppp_async_ioctl(struct ppp_channel *chan, unsigned int cmd, unsigned long arg) { struct asyncppp *ap = chan->private; void __user *argp = (void __user *)arg; int __user *p = argp; int err, val; u32 accm[8]; err = -EFAULT; switch (cmd) { case PPPIOCGFLAGS: val = ap->flags | ap->rbits; if (put_user(val, p)) break; err = 0; break; case PPPIOCSFLAGS: if (get_user(val, p)) break; ap->flags = val & ~SC_RCV_BITS; spin_lock_irq(&ap->recv_lock); ap->rbits = val & SC_RCV_BITS; spin_unlock_irq(&ap->recv_lock); err = 0; break; case PPPIOCGASYNCMAP: if (put_user(ap->xaccm[0], (u32 __user *)argp)) break; err = 0; break; case PPPIOCSASYNCMAP: if (get_user(ap->xaccm[0], (u32 __user *)argp)) break; err = 0; break; case PPPIOCGRASYNCMAP: if (put_user(ap->raccm, (u32 __user *)argp)) break; err = 0; break; case PPPIOCSRASYNCMAP: if (get_user(ap->raccm, (u32 __user *)argp)) break; err = 0; break; case PPPIOCGXASYNCMAP: if (copy_to_user(argp, ap->xaccm, sizeof(ap->xaccm))) break; err = 0; break; case PPPIOCSXASYNCMAP: if (copy_from_user(accm, argp, sizeof(accm))) break; accm[2] &= ~0x40000000U; /* can't escape 0x5e */ accm[3] |= 0x60000000U; /* must escape 0x7d, 0x7e */ memcpy(ap->xaccm, accm, sizeof(ap->xaccm)); err = 0; break; case PPPIOCGMRU: if (put_user(ap->mru, p)) break; err = 0; break; case PPPIOCSMRU: if (get_user(val, p)) break; if (val < PPP_MRU) val = PPP_MRU; ap->mru = val; err = 0; break; default: err = -ENOTTY; } return err; } /* * This is called at softirq level to deliver received packets * to the ppp_generic code, and to tell the ppp_generic code * if we can accept more output now. */ static void ppp_async_process(unsigned long arg) { struct asyncppp *ap = (struct asyncppp *) arg; struct sk_buff *skb; /* process received packets */ while ((skb = skb_dequeue(&ap->rqueue)) != NULL) { if (skb->cb[0]) ppp_input_error(&ap->chan, 0); ppp_input(&ap->chan, skb); } /* try to push more stuff out */ if (test_bit(XMIT_WAKEUP, &ap->xmit_flags) && ppp_async_push(ap)) ppp_output_wakeup(&ap->chan); } /* * Procedures for encapsulation and framing. */ /* * Procedure to encode the data for async serial transmission. * Does octet stuffing (escaping), puts the address/control bytes * on if A/C compression is disabled, and does protocol compression. * Assumes ap->tpkt != 0 on entry. * Returns 1 if we finished the current frame, 0 otherwise. */ #define PUT_BYTE(ap, buf, c, islcp) do { \ if ((islcp && c < 0x20) || (ap->xaccm[c >> 5] & (1 << (c & 0x1f)))) {\ *buf++ = PPP_ESCAPE; \ *buf++ = c ^ PPP_TRANS; \ } else \ *buf++ = c; \ } while (0) static int ppp_async_encode(struct asyncppp *ap) { int fcs, i, count, c, proto; unsigned char *buf, *buflim; unsigned char *data; int islcp; buf = ap->obuf; ap->olim = buf; ap->optr = buf; i = ap->tpkt_pos; data = ap->tpkt->data; count = ap->tpkt->len; fcs = ap->tfcs; proto = get_unaligned_be16(data); /* * LCP packets with code values between 1 (configure-reqest) * and 7 (code-reject) must be sent as though no options * had been negotiated. */ islcp = proto == PPP_LCP && 1 <= data[2] && data[2] <= 7; if (i == 0) { if (islcp) async_lcp_peek(ap, data, count, 0); /* * Start of a new packet - insert the leading FLAG * character if necessary. */ if (islcp || flag_time == 0 || time_after_eq(jiffies, ap->last_xmit + flag_time)) *buf++ = PPP_FLAG; ap->last_xmit = jiffies; fcs = PPP_INITFCS; /* * Put in the address/control bytes if necessary */ if ((ap->flags & SC_COMP_AC) == 0 || islcp) { PUT_BYTE(ap, buf, 0xff, islcp); fcs = PPP_FCS(fcs, 0xff); PUT_BYTE(ap, buf, 0x03, islcp); fcs = PPP_FCS(fcs, 0x03); } } /* * Once we put in the last byte, we need to put in the FCS * and closing flag, so make sure there is at least 7 bytes * of free space in the output buffer. */ buflim = ap->obuf + OBUFSIZE - 6; while (i < count && buf < buflim) { c = data[i++]; if (i == 1 && c == 0 && (ap->flags & SC_COMP_PROT)) continue; /* compress protocol field */ fcs = PPP_FCS(fcs, c); PUT_BYTE(ap, buf, c, islcp); } if (i < count) { /* * Remember where we are up to in this packet. */ ap->olim = buf; ap->tpkt_pos = i; ap->tfcs = fcs; return 0; } /* * We have finished the packet. Add the FCS and flag. */ fcs = ~fcs; c = fcs & 0xff; PUT_BYTE(ap, buf, c, islcp); c = (fcs >> 8) & 0xff; PUT_BYTE(ap, buf, c, islcp); *buf++ = PPP_FLAG; ap->olim = buf; consume_skb(ap->tpkt); ap->tpkt = NULL; return 1; } /* * Transmit-side routines. */ /* * Send a packet to the peer over an async tty line. * Returns 1 iff the packet was accepted. * If the packet was not accepted, we will call ppp_output_wakeup * at some later time. */ static int ppp_async_send(struct ppp_channel *chan, struct sk_buff *skb) { struct asyncppp *ap = chan->private; ppp_async_push(ap); if (test_and_set_bit(XMIT_FULL, &ap->xmit_flags)) return 0; /* already full */ ap->tpkt = skb; ap->tpkt_pos = 0; ppp_async_push(ap); return 1; } /* * Push as much data as possible out to the tty. */ static int ppp_async_push(struct asyncppp *ap) { int avail, sent, done = 0; struct tty_struct *tty = ap->tty; int tty_stuffed = 0; /* * We can get called recursively here if the tty write * function calls our wakeup function. This can happen * for example on a pty with both the master and slave * set to PPP line discipline. * We use the XMIT_BUSY bit to detect this and get out, * leaving the XMIT_WAKEUP bit set to tell the other * instance that it may now be able to write more now. */ if (test_and_set_bit(XMIT_BUSY, &ap->xmit_flags)) return 0; spin_lock_bh(&ap->xmit_lock); for (;;) { if (test_and_clear_bit(XMIT_WAKEUP, &ap->xmit_flags)) tty_stuffed = 0; if (!tty_stuffed && ap->optr < ap->olim) { avail = ap->olim - ap->optr; set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags); sent = tty->ops->write(tty, ap->optr, avail); if (sent < 0) goto flush; /* error, e.g. loss of CD */ ap->optr += sent; if (sent < avail) tty_stuffed = 1; continue; } if (ap->optr >= ap->olim && ap->tpkt) { if (ppp_async_encode(ap)) { /* finished processing ap->tpkt */ clear_bit(XMIT_FULL, &ap->xmit_flags); done = 1; } continue; } /* * We haven't made any progress this time around. * Clear XMIT_BUSY to let other callers in, but * after doing so we have to check if anyone set * XMIT_WAKEUP since we last checked it. If they * did, we should try again to set XMIT_BUSY and go * around again in case XMIT_BUSY was still set when * the other caller tried. */ clear_bit(XMIT_BUSY, &ap->xmit_flags); /* any more work to do? if not, exit the loop */ if (!(test_bit(XMIT_WAKEUP, &ap->xmit_flags) || (!tty_stuffed && ap->tpkt))) break; /* more work to do, see if we can do it now */ if (test_and_set_bit(XMIT_BUSY, &ap->xmit_flags)) break; } spin_unlock_bh(&ap->xmit_lock); return done; flush: clear_bit(XMIT_BUSY, &ap->xmit_flags); if (ap->tpkt) { kfree_skb(ap->tpkt); ap->tpkt = NULL; clear_bit(XMIT_FULL, &ap->xmit_flags); done = 1; } ap->optr = ap->olim; spin_unlock_bh(&ap->xmit_lock); return done; } /* * Flush output from our internal buffers. * Called for the TCFLSH ioctl. Can be entered in parallel * but this is covered by the xmit_lock. */ static void ppp_async_flush_output(struct asyncppp *ap) { int done = 0; spin_lock_bh(&ap->xmit_lock); ap->optr = ap->olim; if (ap->tpkt != NULL) { kfree_skb(ap->tpkt); ap->tpkt = NULL; clear_bit(XMIT_FULL, &ap->xmit_flags); done = 1; } spin_unlock_bh(&ap->xmit_lock); if (done) ppp_output_wakeup(&ap->chan); } /* * Receive-side routines. */ /* see how many ordinary chars there are at the start of buf */ static inline int scan_ordinary(struct asyncppp *ap, const unsigned char *buf, int count) { int i, c; for (i = 0; i < count; ++i) { c = buf[i]; if (c == PPP_ESCAPE || c == PPP_FLAG || (c < 0x20 && (ap->raccm & (1 << c)) != 0)) break; } return i; } /* called when a flag is seen - do end-of-packet processing */ static void process_input_packet(struct asyncppp *ap) { struct sk_buff *skb; unsigned char *p; unsigned int len, fcs, proto; skb = ap->rpkt; if (ap->state & (SC_TOSS | SC_ESCAPE)) goto err; if (skb == NULL) return; /* 0-length packet */ /* check the FCS */ p = skb->data; len = skb->len; if (len < 3) goto err; /* too short */ fcs = PPP_INITFCS; for (; len > 0; --len) fcs = PPP_FCS(fcs, *p++); if (fcs != PPP_GOODFCS) goto err; /* bad FCS */ skb_trim(skb, skb->len - 2); /* check for address/control and protocol compression */ p = skb->data; if (p[0] == PPP_ALLSTATIONS) { /* chop off address/control */ if (p[1] != PPP_UI || skb->len < 3) goto err; p = skb_pull(skb, 2); } proto = p[0]; if (proto & 1) { /* protocol is compressed */ *(u8 *)skb_push(skb, 1) = 0; } else { if (skb->len < 2) goto err; proto = (proto << 8) + p[1]; if (proto == PPP_LCP) async_lcp_peek(ap, p, skb->len, 1); } /* queue the frame to be processed */ skb->cb[0] = ap->state; skb_queue_tail(&ap->rqueue, skb); ap->rpkt = NULL; ap->state = 0; return; err: /* frame had an error, remember that, reset SC_TOSS & SC_ESCAPE */ ap->state = SC_PREV_ERROR; if (skb) { /* make skb appear as freshly allocated */ skb_trim(skb, 0); skb_reserve(skb, - skb_headroom(skb)); } } /* Called when the tty driver has data for us. Runs parallel with the other ldisc functions but will not be re-entered */ static void ppp_async_input(struct asyncppp *ap, const unsigned char *buf, char *flags, int count) { struct sk_buff *skb; int c, i, j, n, s, f; unsigned char *sp; /* update bits used for 8-bit cleanness detection */ if (~ap->rbits & SC_RCV_BITS) { s = 0; for (i = 0; i < count; ++i) { c = buf[i]; if (flags && flags[i] != 0) continue; s |= (c & 0x80)? SC_RCV_B7_1: SC_RCV_B7_0; c = ((c >> 4) ^ c) & 0xf; s |= (0x6996 & (1 << c))? SC_RCV_ODDP: SC_RCV_EVNP; } ap->rbits |= s; } while (count > 0) { /* scan through and see how many chars we can do in bulk */ if ((ap->state & SC_ESCAPE) && buf[0] == PPP_ESCAPE) n = 1; else n = scan_ordinary(ap, buf, count); f = 0; if (flags && (ap->state & SC_TOSS) == 0) { /* check the flags to see if any char had an error */ for (j = 0; j < n; ++j) if ((f = flags[j]) != 0) break; } if (f != 0) { /* start tossing */ ap->state |= SC_TOSS; } else if (n > 0 && (ap->state & SC_TOSS) == 0) { /* stuff the chars in the skb */ skb = ap->rpkt; if (!skb) { skb = dev_alloc_skb(ap->mru + PPP_HDRLEN + 2); if (!skb) goto nomem; ap->rpkt = skb; } if (skb->len == 0) { /* Try to get the payload 4-byte aligned. * This should match the * PPP_ALLSTATIONS/PPP_UI/compressed tests in * process_input_packet, but we do not have * enough chars here to test buf[1] and buf[2]. */ if (buf[0] != PPP_ALLSTATIONS) skb_reserve(skb, 2 + (buf[0] & 1)); } if (n > skb_tailroom(skb)) { /* packet overflowed MRU */ ap->state |= SC_TOSS; } else { sp = skb_put_data(skb, buf, n); if (ap->state & SC_ESCAPE) { sp[0] ^= PPP_TRANS; ap->state &= ~SC_ESCAPE; } } } if (n >= count) break; c = buf[n]; if (flags != NULL && flags[n] != 0) { ap->state |= SC_TOSS; } else if (c == PPP_FLAG) { process_input_packet(ap); } else if (c == PPP_ESCAPE) { ap->state |= SC_ESCAPE; } else if (I_IXON(ap->tty)) { if (c == START_CHAR(ap->tty)) start_tty(ap->tty); else if (c == STOP_CHAR(ap->tty)) stop_tty(ap->tty); } /* otherwise it's a char in the recv ACCM */ ++n; buf += n; if (flags) flags += n; count -= n; } return; nomem: printk(KERN_ERR "PPPasync: no memory (input pkt)\n"); ap->state |= SC_TOSS; } /* * We look at LCP frames going past so that we can notice * and react to the LCP configure-ack from the peer. * In the situation where the peer has been sent a configure-ack * already, LCP is up once it has sent its configure-ack * so the immediately following packet can be sent with the * configured LCP options. This allows us to process the following * packet correctly without pppd needing to respond quickly. * * We only respond to the received configure-ack if we have just * sent a configure-request, and the configure-ack contains the * same data (this is checked using a 16-bit crc of the data). */ #define CONFREQ 1 /* LCP code field values */ #define CONFACK 2 #define LCP_MRU 1 /* LCP option numbers */ #define LCP_ASYNCMAP 2 static void async_lcp_peek(struct asyncppp *ap, unsigned char *data, int len, int inbound) { int dlen, fcs, i, code; u32 val; data += 2; /* skip protocol bytes */ len -= 2; if (len < 4) /* 4 = code, ID, length */ return; code = data[0]; if (code != CONFACK && code != CONFREQ) return; dlen = get_unaligned_be16(data + 2); if (len < dlen) return; /* packet got truncated or length is bogus */ if (code == (inbound? CONFACK: CONFREQ)) { /* * sent confreq or received confack: * calculate the crc of the data from the ID field on. */ fcs = PPP_INITFCS; for (i = 1; i < dlen; ++i) fcs = PPP_FCS(fcs, data[i]); if (!inbound) { /* outbound confreq - remember the crc for later */ ap->lcp_fcs = fcs; return; } /* received confack, check the crc */ fcs ^= ap->lcp_fcs; ap->lcp_fcs = -1; if (fcs != 0) return; } else if (inbound) return; /* not interested in received confreq */ /* process the options in the confack */ data += 4; dlen -= 4; /* data[0] is code, data[1] is length */ while (dlen >= 2 && dlen >= data[1] && data[1] >= 2) { switch (data[0]) { case LCP_MRU: val = get_unaligned_be16(data + 2); if (inbound) ap->mru = val; else ap->chan.mtu = val; break; case LCP_ASYNCMAP: val = get_unaligned_be32(data + 2); if (inbound) ap->raccm = val; else ap->xaccm[0] = val; break; } dlen -= data[1]; data += data[1]; } } static void __exit ppp_async_cleanup(void) { if (tty_unregister_ldisc(N_PPP) != 0) printk(KERN_ERR "failed to unregister PPP line discipline\n"); } module_init(ppp_async_init); module_exit(ppp_async_cleanup);
6 4 1 5 3 2 3 6 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 /* * IP6 tables REJECT target module * Linux INET6 implementation * * Copyright (C)2003 USAGI/WIDE Project * * Authors: * Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> * * Copyright (c) 2005-2007 Patrick McHardy <kaber@trash.net> * * Based on net/ipv4/netfilter/ipt_REJECT.c * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/gfp.h> #include <linux/module.h> #include <linux/skbuff.h> #include <linux/icmpv6.h> #include <linux/netdevice.h> #include <net/icmp.h> #include <net/flow.h> #include <linux/netfilter/x_tables.h> #include <linux/netfilter_ipv6/ip6_tables.h> #include <linux/netfilter_ipv6/ip6t_REJECT.h> #include <net/netfilter/ipv6/nf_reject.h> MODULE_AUTHOR("Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>"); MODULE_DESCRIPTION("Xtables: packet \"rejection\" target for IPv6"); MODULE_LICENSE("GPL"); static unsigned int reject_tg6(struct sk_buff *skb, const struct xt_action_param *par) { const struct ip6t_reject_info *reject = par->targinfo; struct net *net = xt_net(par); switch (reject->with) { case IP6T_ICMP6_NO_ROUTE: nf_send_unreach6(net, skb, ICMPV6_NOROUTE, xt_hooknum(par)); break; case IP6T_ICMP6_ADM_PROHIBITED: nf_send_unreach6(net, skb, ICMPV6_ADM_PROHIBITED, xt_hooknum(par)); break; case IP6T_ICMP6_NOT_NEIGHBOUR: nf_send_unreach6(net, skb, ICMPV6_NOT_NEIGHBOUR, xt_hooknum(par)); break; case IP6T_ICMP6_ADDR_UNREACH: nf_send_unreach6(net, skb, ICMPV6_ADDR_UNREACH, xt_hooknum(par)); break; case IP6T_ICMP6_PORT_UNREACH: nf_send_unreach6(net, skb, ICMPV6_PORT_UNREACH, xt_hooknum(par)); break; case IP6T_ICMP6_ECHOREPLY: /* Do nothing */ break; case IP6T_TCP_RESET: nf_send_reset6(net, skb, xt_hooknum(par)); break; case IP6T_ICMP6_POLICY_FAIL: nf_send_unreach6(net, skb, ICMPV6_POLICY_FAIL, xt_hooknum(par)); break; case IP6T_ICMP6_REJECT_ROUTE: nf_send_unreach6(net, skb, ICMPV6_REJECT_ROUTE, xt_hooknum(par)); break; } return NF_DROP; } static int reject_tg6_check(const struct xt_tgchk_param *par) { const struct ip6t_reject_info *rejinfo = par->targinfo; const struct ip6t_entry *e = par->entryinfo; if (rejinfo->with == IP6T_ICMP6_ECHOREPLY) { pr_info_ratelimited("ECHOREPLY is not supported\n"); return -EINVAL; } else if (rejinfo->with == IP6T_TCP_RESET) { /* Must specify that it's a TCP packet */ if (!(e->ipv6.flags & IP6T_F_PROTO) || e->ipv6.proto != IPPROTO_TCP || (e->ipv6.invflags & XT_INV_PROTO)) { pr_info_ratelimited("TCP_RESET illegal for non-tcp\n"); return -EINVAL; } } return 0; } static struct xt_target reject_tg6_reg __read_mostly = { .name = "REJECT", .family = NFPROTO_IPV6, .target = reject_tg6, .targetsize = sizeof(struct ip6t_reject_info), .table = "filter", .hooks = (1 << NF_INET_LOCAL_IN) | (1 << NF_INET_FORWARD) | (1 << NF_INET_LOCAL_OUT), .checkentry = reject_tg6_check, .me = THIS_MODULE }; static int __init reject_tg6_init(void) { return xt_register_target(&reject_tg6_reg); } static void __exit reject_tg6_exit(void) { xt_unregister_target(&reject_tg6_reg); } module_init(reject_tg6_init); module_exit(reject_tg6_exit);
1 1 3 7 7 6 6 5 4 2 3 2 2 7 7 5 1 4 1 1 3 2 7 7 7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 /* * linux/fs/adfs/super.c * * Copyright (C) 1997-1999 Russell King * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ #include <linux/module.h> #include <linux/init.h> #include <linux/buffer_head.h> #include <linux/parser.h> #include <linux/mount.h> #include <linux/seq_file.h> #include <linux/slab.h> #include <linux/statfs.h> #include <linux/user_namespace.h> #include "adfs.h" #include "dir_f.h" #include "dir_fplus.h" #define ADFS_DEFAULT_OWNER_MASK S_IRWXU #define ADFS_DEFAULT_OTHER_MASK (S_IRWXG | S_IRWXO) void __adfs_error(struct super_block *sb, const char *function, const char *fmt, ...) { char error_buf[128]; va_list args; va_start(args, fmt); vsnprintf(error_buf, sizeof(error_buf), fmt, args); va_end(args); printk(KERN_CRIT "ADFS-fs error (device %s)%s%s: %s\n", sb->s_id, function ? ": " : "", function ? function : "", error_buf); } static int adfs_checkdiscrecord(struct adfs_discrecord *dr) { int i; /* sector size must be 256, 512 or 1024 bytes */ if (dr->log2secsize != 8 && dr->log2secsize != 9 && dr->log2secsize != 10) return 1; /* idlen must be at least log2secsize + 3 */ if (dr->idlen < dr->log2secsize + 3) return 1; /* we cannot have such a large disc that we * are unable to represent sector offsets in * 32 bits. This works out at 2.0 TB. */ if (le32_to_cpu(dr->disc_size_high) >> dr->log2secsize) return 1; /* idlen must be no greater than 19 v2 [1.0] */ if (dr->idlen > 19) return 1; /* reserved bytes should be zero */ for (i = 0; i < sizeof(dr->unused52); i++) if (dr->unused52[i] != 0) return 1; return 0; } static unsigned char adfs_calczonecheck(struct super_block *sb, unsigned char *map) { unsigned int v0, v1, v2, v3; int i; v0 = v1 = v2 = v3 = 0; for (i = sb->s_blocksize - 4; i; i -= 4) { v0 += map[i] + (v3 >> 8); v3 &= 0xff; v1 += map[i + 1] + (v0 >> 8); v0 &= 0xff; v2 += map[i + 2] + (v1 >> 8); v1 &= 0xff; v3 += map[i + 3] + (v2 >> 8); v2 &= 0xff; } v0 += v3 >> 8; v1 += map[1] + (v0 >> 8); v2 += map[2] + (v1 >> 8); v3 += map[3] + (v2 >> 8); return v0 ^ v1 ^ v2 ^ v3; } static int adfs_checkmap(struct super_block *sb, struct adfs_discmap *dm) { unsigned char crosscheck = 0, zonecheck = 1; int i; for (i = 0; i < ADFS_SB(sb)->s_map_size; i++) { unsigned char *map; map = dm[i].dm_bh->b_data; if (adfs_calczonecheck(sb, map) != map[0]) { adfs_error(sb, "zone %d fails zonecheck", i); zonecheck = 0; } crosscheck ^= map[3]; } if (crosscheck != 0xff) adfs_error(sb, "crosscheck != 0xff"); return crosscheck == 0xff && zonecheck; } static void adfs_put_super(struct super_block *sb) { int i; struct adfs_sb_info *asb = ADFS_SB(sb); for (i = 0; i < asb->s_map_size; i++) brelse(asb->s_map[i].dm_bh); kfree(asb->s_map); kfree_rcu(asb, rcu); } static int adfs_show_options(struct seq_file *seq, struct dentry *root) { struct adfs_sb_info *asb = ADFS_SB(root->d_sb); if (!uid_eq(asb->s_uid, GLOBAL_ROOT_UID)) seq_printf(seq, ",uid=%u", from_kuid_munged(&init_user_ns, asb->s_uid)); if (!gid_eq(asb->s_gid, GLOBAL_ROOT_GID)) seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, asb->s_gid)); if (asb->s_owner_mask != ADFS_DEFAULT_OWNER_MASK) seq_printf(seq, ",ownmask=%o", asb->s_owner_mask); if (asb->s_other_mask != ADFS_DEFAULT_OTHER_MASK) seq_printf(seq, ",othmask=%o", asb->s_other_mask); if (asb->s_ftsuffix != 0) seq_printf(seq, ",ftsuffix=%u", asb->s_ftsuffix); return 0; } enum {Opt_uid, Opt_gid, Opt_ownmask, Opt_othmask, Opt_ftsuffix, Opt_err}; static const match_table_t tokens = { {Opt_uid, "uid=%u"}, {Opt_gid, "gid=%u"}, {Opt_ownmask, "ownmask=%o"}, {Opt_othmask, "othmask=%o"}, {Opt_ftsuffix, "ftsuffix=%u"}, {Opt_err, NULL} }; static int parse_options(struct super_block *sb, char *options) { char *p; struct adfs_sb_info *asb = ADFS_SB(sb); int option; if (!options) return 0; while ((p = strsep(&options, ",")) != NULL) { substring_t args[MAX_OPT_ARGS]; int token; if (!*p) continue; token = match_token(p, tokens, args); switch (token) { case Opt_uid: if (match_int(args, &option)) return -EINVAL; asb->s_uid = make_kuid(current_user_ns(), option); if (!uid_valid(asb->s_uid)) return -EINVAL; break; case Opt_gid: if (match_int(args, &option)) return -EINVAL; asb->s_gid = make_kgid(current_user_ns(), option); if (!gid_valid(asb->s_gid)) return -EINVAL; break; case Opt_ownmask: if (match_octal(args, &option)) return -EINVAL; asb->s_owner_mask = option; break; case Opt_othmask: if (match_octal(args, &option)) return -EINVAL; asb->s_other_mask = option; break; case Opt_ftsuffix: if (match_int(args, &option)) return -EINVAL; asb->s_ftsuffix = option; break; default: printk("ADFS-fs: unrecognised mount option \"%s\" " "or missing value\n", p); return -EINVAL; } } return 0; } static int adfs_remount(struct super_block *sb, int *flags, char *data) { sync_filesystem(sb); *flags |= SB_NODIRATIME; return parse_options(sb, data); } static int adfs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; struct adfs_sb_info *sbi = ADFS_SB(sb); u64 id = huge_encode_dev(sb->s_bdev->bd_dev); buf->f_type = ADFS_SUPER_MAGIC; buf->f_namelen = sbi->s_namelen; buf->f_bsize = sb->s_blocksize; buf->f_blocks = sbi->s_size; buf->f_files = sbi->s_ids_per_zone * sbi->s_map_size; buf->f_bavail = buf->f_bfree = adfs_map_free(sb); buf->f_ffree = (long)(buf->f_bfree * buf->f_files) / (long)buf->f_blocks; buf->f_fsid.val[0] = (u32)id; buf->f_fsid.val[1] = (u32)(id >> 32); return 0; } static struct kmem_cache *adfs_inode_cachep; static struct inode *adfs_alloc_inode(struct super_block *sb) { struct adfs_inode_info *ei; ei = kmem_cache_alloc(adfs_inode_cachep, GFP_KERNEL); if (!ei) return NULL; return &ei->vfs_inode; } static void adfs_i_callback(struct rcu_head *head) { struct inode *inode = container_of(head, struct inode, i_rcu); kmem_cache_free(adfs_inode_cachep, ADFS_I(inode)); } static void adfs_destroy_inode(struct inode *inode) { call_rcu(&inode->i_rcu, adfs_i_callback); } static void init_once(void *foo) { struct adfs_inode_info *ei = (struct adfs_inode_info *) foo; inode_init_once(&ei->vfs_inode); } static int __init init_inodecache(void) { adfs_inode_cachep = kmem_cache_create("adfs_inode_cache", sizeof(struct adfs_inode_info), 0, (SLAB_RECLAIM_ACCOUNT| SLAB_MEM_SPREAD|SLAB_ACCOUNT), init_once); if (adfs_inode_cachep == NULL) return -ENOMEM; return 0; } static void destroy_inodecache(void) { /* * Make sure all delayed rcu free inodes are flushed before we * destroy cache. */ rcu_barrier(); kmem_cache_destroy(adfs_inode_cachep); } static const struct super_operations adfs_sops = { .alloc_inode = adfs_alloc_inode, .destroy_inode = adfs_destroy_inode, .drop_inode = generic_delete_inode, .write_inode = adfs_write_inode, .put_super = adfs_put_super, .statfs = adfs_statfs, .remount_fs = adfs_remount, .show_options = adfs_show_options, }; static struct adfs_discmap *adfs_read_map(struct super_block *sb, struct adfs_discrecord *dr) { struct adfs_discmap *dm; unsigned int map_addr, zone_size, nzones; int i, zone; struct adfs_sb_info *asb = ADFS_SB(sb); nzones = asb->s_map_size; zone_size = (8 << dr->log2secsize) - le16_to_cpu(dr->zone_spare); map_addr = (nzones >> 1) * zone_size - ((nzones > 1) ? ADFS_DR_SIZE_BITS : 0); map_addr = signed_asl(map_addr, asb->s_map2blk); asb->s_ids_per_zone = zone_size / (asb->s_idlen + 1); dm = kmalloc_array(nzones, sizeof(*dm), GFP_KERNEL); if (dm == NULL) { adfs_error(sb, "not enough memory"); return ERR_PTR(-ENOMEM); } for (zone = 0; zone < nzones; zone++, map_addr++) { dm[zone].dm_startbit = 0; dm[zone].dm_endbit = zone_size; dm[zone].dm_startblk = zone * zone_size - ADFS_DR_SIZE_BITS; dm[zone].dm_bh = sb_bread(sb, map_addr); if (!dm[zone].dm_bh) { adfs_error(sb, "unable to read map"); goto error_free; } } /* adjust the limits for the first and last map zones */ i = zone - 1; dm[0].dm_startblk = 0; dm[0].dm_startbit = ADFS_DR_SIZE_BITS; dm[i].dm_endbit = (le32_to_cpu(dr->disc_size_high) << (32 - dr->log2bpmb)) + (le32_to_cpu(dr->disc_size) >> dr->log2bpmb) + (ADFS_DR_SIZE_BITS - i * zone_size); if (adfs_checkmap(sb, dm)) return dm; adfs_error(sb, "map corrupted"); error_free: while (--zone >= 0) brelse(dm[zone].dm_bh); kfree(dm); return ERR_PTR(-EIO); } static inline unsigned long adfs_discsize(struct adfs_discrecord *dr, int block_bits) { unsigned long discsize; discsize = le32_to_cpu(dr->disc_size_high) << (32 - block_bits); discsize |= le32_to_cpu(dr->disc_size) >> block_bits; return discsize; } static int adfs_fill_super(struct super_block *sb, void *data, int silent) { struct adfs_discrecord *dr; struct buffer_head *bh; struct object_info root_obj; unsigned char *b_data; unsigned int blocksize; struct adfs_sb_info *asb; struct inode *root; int ret = -EINVAL; sb->s_flags |= SB_NODIRATIME; asb = kzalloc(sizeof(*asb), GFP_KERNEL); if (!asb) return -ENOMEM; sb->s_fs_info = asb; /* set default options */ asb->s_uid = GLOBAL_ROOT_UID; asb->s_gid = GLOBAL_ROOT_GID; asb->s_owner_mask = ADFS_DEFAULT_OWNER_MASK; asb->s_other_mask = ADFS_DEFAULT_OTHER_MASK; asb->s_ftsuffix = 0; if (parse_options(sb, data)) goto error; sb_set_blocksize(sb, BLOCK_SIZE); if (!(bh = sb_bread(sb, ADFS_DISCRECORD / BLOCK_SIZE))) { adfs_error(sb, "unable to read superblock"); ret = -EIO; goto error; } b_data = bh->b_data + (ADFS_DISCRECORD % BLOCK_SIZE); if (adfs_checkbblk(b_data)) { if (!silent) printk("VFS: Can't find an adfs filesystem on dev " "%s.\n", sb->s_id); ret = -EINVAL; goto error_free_bh; } dr = (struct adfs_discrecord *)(b_data + ADFS_DR_OFFSET); /* * Do some sanity checks on the ADFS disc record */ if (adfs_checkdiscrecord(dr)) { if (!silent) printk("VPS: Can't find an adfs filesystem on dev " "%s.\n", sb->s_id); ret = -EINVAL; goto error_free_bh; } blocksize = 1 << dr->log2secsize; brelse(bh); if (sb_set_blocksize(sb, blocksize)) { bh = sb_bread(sb, ADFS_DISCRECORD / sb->s_blocksize); if (!bh) { adfs_error(sb, "couldn't read superblock on " "2nd try."); ret = -EIO; goto error; } b_data = bh->b_data + (ADFS_DISCRECORD % sb->s_blocksize); if (adfs_checkbblk(b_data)) { adfs_error(sb, "disc record mismatch, very weird!"); ret = -EINVAL; goto error_free_bh; } dr = (struct adfs_discrecord *)(b_data + ADFS_DR_OFFSET); } else { if (!silent) printk(KERN_ERR "VFS: Unsupported blocksize on dev " "%s.\n", sb->s_id); ret = -EINVAL; goto error; } /* * blocksize on this device should now be set to the ADFS log2secsize */ sb->s_magic = ADFS_SUPER_MAGIC; asb->s_idlen = dr->idlen; asb->s_map_size = dr->nzones | (dr->nzones_high << 8); asb->s_map2blk = dr->log2bpmb - dr->log2secsize; asb->s_size = adfs_discsize(dr, sb->s_blocksize_bits); asb->s_version = dr->format_version; asb->s_log2sharesize = dr->log2sharesize; asb->s_map = adfs_read_map(sb, dr); if (IS_ERR(asb->s_map)) { ret = PTR_ERR(asb->s_map); goto error_free_bh; } brelse(bh); /* * set up enough so that we can read an inode */ sb->s_op = &adfs_sops; dr = (struct adfs_discrecord *)(asb->s_map[0].dm_bh->b_data + 4); root_obj.parent_id = root_obj.file_id = le32_to_cpu(dr->root); root_obj.name_len = 0; /* Set root object date as 01 Jan 1987 00:00:00 */ root_obj.loadaddr = 0xfff0003f; root_obj.execaddr = 0xec22c000; root_obj.size = ADFS_NEWDIR_SIZE; root_obj.attr = ADFS_NDA_DIRECTORY | ADFS_NDA_OWNER_READ | ADFS_NDA_OWNER_WRITE | ADFS_NDA_PUBLIC_READ; root_obj.filetype = -1; /* * If this is a F+ disk with variable length directories, * get the root_size from the disc record. */ if (asb->s_version) { root_obj.size = le32_to_cpu(dr->root_size); asb->s_dir = &adfs_fplus_dir_ops; asb->s_namelen = ADFS_FPLUS_NAME_LEN; } else { asb->s_dir = &adfs_f_dir_ops; asb->s_namelen = ADFS_F_NAME_LEN; } /* * ,xyz hex filetype suffix may be added by driver * to files that have valid RISC OS filetype */ if (asb->s_ftsuffix) asb->s_namelen += 4; sb->s_d_op = &adfs_dentry_operations; root = adfs_iget(sb, &root_obj); sb->s_root = d_make_root(root); if (!sb->s_root) { int i; for (i = 0; i < asb->s_map_size; i++) brelse(asb->s_map[i].dm_bh); kfree(asb->s_map); adfs_error(sb, "get root inode failed\n"); ret = -EIO; goto error; } return 0; error_free_bh: brelse(bh); error: sb->s_fs_info = NULL; kfree(asb); return ret; } static struct dentry *adfs_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { return mount_bdev(fs_type, flags, dev_name, data, adfs_fill_super); } static struct file_system_type adfs_fs_type = { .owner = THIS_MODULE, .name = "adfs", .mount = adfs_mount, .kill_sb = kill_block_super, .fs_flags = FS_REQUIRES_DEV, }; MODULE_ALIAS_FS("adfs"); static int __init init_adfs_fs(void) { int err = init_inodecache(); if (err) goto out1; err = register_filesystem(&adfs_fs_type); if (err) goto out; return 0; out: destroy_inodecache(); out1: return err; } static void __exit exit_adfs_fs(void) { unregister_filesystem(&adfs_fs_type); destroy_inodecache(); } module_init(init_adfs_fs) module_exit(exit_adfs_fs) MODULE_LICENSE("GPL");
541 541 540 444 444 443 541 18 10 18 444 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 /* * AppArmor security module * * This file contains AppArmor security identifier (secid) manipulation fns * * Copyright 2009-2017 Canonical Ltd. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License as * published by the Free Software Foundation, version 2 of the * License. * * * AppArmor allocates a unique secid for every label used. If a label * is replaced it receives the secid of the label it is replacing. */ #include <linux/errno.h> #include <linux/err.h> #include <linux/gfp.h> #include <linux/idr.h> #include <linux/slab.h> #include <linux/spinlock.h> #include "include/cred.h" #include "include/lib.h" #include "include/secid.h" #include "include/label.h" #include "include/policy_ns.h" /* * secids - do not pin labels with a refcount. They rely on the label * properly updating/freeing them */ #define AA_FIRST_SECID 1 static DEFINE_IDR(aa_secids); static DEFINE_SPINLOCK(secid_lock); /* * TODO: allow policy to reserve a secid range? * TODO: add secid pinning * TODO: use secid_update in label replace */ /** * aa_secid_update - update a secid mapping to a new label * @secid: secid to update * @label: label the secid will now map to */ void aa_secid_update(u32 secid, struct aa_label *label) { unsigned long flags; spin_lock_irqsave(&secid_lock, flags); idr_replace(&aa_secids, label, secid); spin_unlock_irqrestore(&secid_lock, flags); } /** * * see label for inverse aa_label_to_secid */ struct aa_label *aa_secid_to_label(u32 secid) { struct aa_label *label; rcu_read_lock(); label = idr_find(&aa_secids, secid); rcu_read_unlock(); return label; } int apparmor_secid_to_secctx(u32 secid, char **secdata, u32 *seclen) { /* TODO: cache secctx and ref count so we don't have to recreate */ struct aa_label *label = aa_secid_to_label(secid); int len; AA_BUG(!seclen); if (!label) return -EINVAL; if (secdata) len = aa_label_asxprint(secdata, root_ns, label, FLAG_SHOW_MODE | FLAG_VIEW_SUBNS | FLAG_HIDDEN_UNCONFINED | FLAG_ABS_ROOT, GFP_ATOMIC); else len = aa_label_snxprint(NULL, 0, root_ns, label, FLAG_SHOW_MODE | FLAG_VIEW_SUBNS | FLAG_HIDDEN_UNCONFINED | FLAG_ABS_ROOT); if (len < 0) return -ENOMEM; *seclen = len; return 0; } int apparmor_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid) { struct aa_label *label; label = aa_label_strn_parse(&root_ns->unconfined->label, secdata, seclen, GFP_KERNEL, false, false); if (IS_ERR(label)) return PTR_ERR(label); *secid = label->secid; return 0; } void apparmor_release_secctx(char *secdata, u32 seclen) { kfree(secdata); } /** * aa_alloc_secid - allocate a new secid for a profile * @label: the label to allocate a secid for * @gfp: memory allocation flags * * Returns: 0 with @label->secid initialized * <0 returns error with @label->secid set to AA_SECID_INVALID */ int aa_alloc_secid(struct aa_label *label, gfp_t gfp) { unsigned long flags; int ret; idr_preload(gfp); spin_lock_irqsave(&secid_lock, flags); ret = idr_alloc(&aa_secids, label, AA_FIRST_SECID, 0, GFP_ATOMIC); spin_unlock_irqrestore(&secid_lock, flags); idr_preload_end(); if (ret < 0) { label->secid = AA_SECID_INVALID; return ret; } AA_BUG(ret == AA_SECID_INVALID); label->secid = ret; return 0; } /** * aa_free_secid - free a secid * @secid: secid to free */ void aa_free_secid(u32 secid) { unsigned long flags; spin_lock_irqsave(&secid_lock, flags); idr_remove(&aa_secids, secid); spin_unlock_irqrestore(&secid_lock, flags); } void aa_secids_init(void) { idr_init_base(&aa_secids, AA_FIRST_SECID); }
44 828 801 31 7 5 18 5 2 2 3 829 830 828 1629 726 11 691 9 10 8 453 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 /* * Copyright (C) 2011 IBM Corporation * * Author: * Mimi Zohar <zohar@us.ibm.com> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation, version 2 of the License. */ #include <linux/module.h> #include <linux/file.h> #include <linux/fs.h> #include <linux/xattr.h> #include <linux/magic.h> #include <linux/ima.h> #include <linux/evm.h> #include "ima.h" static int __init default_appraise_setup(char *str) { #ifdef CONFIG_IMA_APPRAISE_BOOTPARAM if (strncmp(str, "off", 3) == 0) ima_appraise = 0; else if (strncmp(str, "log", 3) == 0) ima_appraise = IMA_APPRAISE_LOG; else if (strncmp(str, "fix", 3) == 0) ima_appraise = IMA_APPRAISE_FIX; #endif return 1; } __setup("ima_appraise=", default_appraise_setup); /* * is_ima_appraise_enabled - return appraise status * * Only return enabled, if not in ima_appraise="fix" or "log" modes. */ bool is_ima_appraise_enabled(void) { return ima_appraise & IMA_APPRAISE_ENFORCE; } /* * ima_must_appraise - set appraise flag * * Return 1 to appraise or hash */ int ima_must_appraise(struct inode *inode, int mask, enum ima_hooks func) { u32 secid; if (!ima_appraise) return 0; security_task_getsecid(current, &secid); return ima_match_policy(inode, current_cred(), secid, func, mask, IMA_APPRAISE | IMA_HASH, NULL); } static int ima_fix_xattr(struct dentry *dentry, struct integrity_iint_cache *iint) { int rc, offset; u8 algo = iint->ima_hash->algo; if (algo <= HASH_ALGO_SHA1) { offset = 1; iint->ima_hash->xattr.sha1.type = IMA_XATTR_DIGEST; } else { offset = 0; iint->ima_hash->xattr.ng.type = IMA_XATTR_DIGEST_NG; iint->ima_hash->xattr.ng.algo = algo; } rc = __vfs_setxattr_noperm(dentry, XATTR_NAME_IMA, &iint->ima_hash->xattr.data[offset], (sizeof(iint->ima_hash->xattr) - offset) + iint->ima_hash->length, 0); return rc; } /* Return specific func appraised cached result */ enum integrity_status ima_get_cache_status(struct integrity_iint_cache *iint, enum ima_hooks func) { switch (func) { case MMAP_CHECK: return iint->ima_mmap_status; case BPRM_CHECK: return iint->ima_bprm_status; case CREDS_CHECK: return iint->ima_creds_status; case FILE_CHECK: case POST_SETATTR: return iint->ima_file_status; case MODULE_CHECK ... MAX_CHECK - 1: default: return iint->ima_read_status; } } static void ima_set_cache_status(struct integrity_iint_cache *iint, enum ima_hooks func, enum integrity_status status) { switch (func) { case MMAP_CHECK: iint->ima_mmap_status = status; break; case BPRM_CHECK: iint->ima_bprm_status = status; break; case CREDS_CHECK: iint->ima_creds_status = status; case FILE_CHECK: case POST_SETATTR: iint->ima_file_status = status; break; case MODULE_CHECK ... MAX_CHECK - 1: default: iint->ima_read_status = status; break; } } static void ima_cache_flags(struct integrity_iint_cache *iint, enum ima_hooks func) { switch (func) { case MMAP_CHECK: iint->flags |= (IMA_MMAP_APPRAISED | IMA_APPRAISED); break; case BPRM_CHECK: iint->flags |= (IMA_BPRM_APPRAISED | IMA_APPRAISED); break; case CREDS_CHECK: iint->flags |= (IMA_CREDS_APPRAISED | IMA_APPRAISED); break; case FILE_CHECK: case POST_SETATTR: iint->flags |= (IMA_FILE_APPRAISED | IMA_APPRAISED); break; case MODULE_CHECK ... MAX_CHECK - 1: default: iint->flags |= (IMA_READ_APPRAISED | IMA_APPRAISED); break; } } enum hash_algo ima_get_hash_algo(struct evm_ima_xattr_data *xattr_value, int xattr_len) { struct signature_v2_hdr *sig; enum hash_algo ret; if (!xattr_value || xattr_len < 2) /* return default hash algo */ return ima_hash_algo; switch (xattr_value->type) { case EVM_IMA_XATTR_DIGSIG: sig = (typeof(sig))xattr_value; if (sig->version != 2 || xattr_len <= sizeof(*sig)) return ima_hash_algo; return sig->hash_algo; break; case IMA_XATTR_DIGEST_NG: ret = xattr_value->digest[0]; if (ret < HASH_ALGO__LAST) return ret; break; case IMA_XATTR_DIGEST: /* this is for backward compatibility */ if (xattr_len == 21) { unsigned int zero = 0; if (!memcmp(&xattr_value->digest[16], &zero, 4)) return HASH_ALGO_MD5; else return HASH_ALGO_SHA1; } else if (xattr_len == 17) return HASH_ALGO_MD5; break; } /* return default hash algo */ return ima_hash_algo; } int ima_read_xattr(struct dentry *dentry, struct evm_ima_xattr_data **xattr_value) { ssize_t ret; ret = vfs_getxattr_alloc(dentry, XATTR_NAME_IMA, (char **)xattr_value, 0, GFP_NOFS); if (ret == -EOPNOTSUPP) ret = 0; return ret; } /* * ima_appraise_measurement - appraise file measurement * * Call evm_verifyxattr() to verify the integrity of 'security.ima'. * Assuming success, compare the xattr hash with the collected measurement. * * Return 0 on success, error code otherwise */ int ima_appraise_measurement(enum ima_hooks func, struct integrity_iint_cache *iint, struct file *file, const unsigned char *filename, struct evm_ima_xattr_data *xattr_value, int xattr_len) { static const char op[] = "appraise_data"; const char *cause = "unknown"; struct dentry *dentry = file_dentry(file); struct inode *inode = d_backing_inode(dentry); enum integrity_status status = INTEGRITY_UNKNOWN; int rc = xattr_len, hash_start = 0; if (!(inode->i_opflags & IOP_XATTR)) return INTEGRITY_UNKNOWN; if (rc <= 0) { if (rc && rc != -ENODATA) goto out; cause = iint->flags & IMA_DIGSIG_REQUIRED ? "IMA-signature-required" : "missing-hash"; status = INTEGRITY_NOLABEL; if (file->f_mode & FMODE_CREATED) iint->flags |= IMA_NEW_FILE; if ((iint->flags & IMA_NEW_FILE) && (!(iint->flags & IMA_DIGSIG_REQUIRED) || (inode->i_size == 0))) status = INTEGRITY_PASS; goto out; } status = evm_verifyxattr(dentry, XATTR_NAME_IMA, xattr_value, rc, iint); switch (status) { case INTEGRITY_PASS: case INTEGRITY_PASS_IMMUTABLE: case INTEGRITY_UNKNOWN: break; case INTEGRITY_NOXATTRS: /* No EVM protected xattrs. */ case INTEGRITY_NOLABEL: /* No security.evm xattr. */ cause = "missing-HMAC"; goto out; case INTEGRITY_FAIL: /* Invalid HMAC/signature. */ cause = "invalid-HMAC"; goto out; default: WARN_ONCE(true, "Unexpected integrity status %d\n", status); } switch (xattr_value->type) { case IMA_XATTR_DIGEST_NG: /* first byte contains algorithm id */ hash_start = 1; /* fall through */ case IMA_XATTR_DIGEST: if (iint->flags & IMA_DIGSIG_REQUIRED) { cause = "IMA-signature-required"; status = INTEGRITY_FAIL; break; } clear_bit(IMA_DIGSIG, &iint->atomic_flags); if (xattr_len - sizeof(xattr_value->type) - hash_start >= iint->ima_hash->length) /* xattr length may be longer. md5 hash in previous version occupied 20 bytes in xattr, instead of 16 */ rc = memcmp(&xattr_value->digest[hash_start], iint->ima_hash->digest, iint->ima_hash->length); else rc = -EINVAL; if (rc) { cause = "invalid-hash"; status = INTEGRITY_FAIL; break; } status = INTEGRITY_PASS; break; case EVM_IMA_XATTR_DIGSIG: set_bit(IMA_DIGSIG, &iint->atomic_flags); rc = integrity_digsig_verify(INTEGRITY_KEYRING_IMA, (const char *)xattr_value, rc, iint->ima_hash->digest, iint->ima_hash->length); if (rc == -EOPNOTSUPP) { status = INTEGRITY_UNKNOWN; } else if (rc) { cause = "invalid-signature"; status = INTEGRITY_FAIL; } else { status = INTEGRITY_PASS; } break; default: status = INTEGRITY_UNKNOWN; cause = "unknown-ima-data"; break; } out: /* * File signatures on some filesystems can not be properly verified. * When such filesystems are mounted by an untrusted mounter or on a * system not willing to accept such a risk, fail the file signature * verification. */ if ((inode->i_sb->s_iflags & SB_I_IMA_UNVERIFIABLE_SIGNATURE) && ((inode->i_sb->s_iflags & SB_I_UNTRUSTED_MOUNTER) || (iint->flags & IMA_FAIL_UNVERIFIABLE_SIGS))) { status = INTEGRITY_FAIL; cause = "unverifiable-signature"; integrity_audit_msg(AUDIT_INTEGRITY_DATA, inode, filename, op, cause, rc, 0); } else if (status != INTEGRITY_PASS) { /* Fix mode, but don't replace file signatures. */ if ((ima_appraise & IMA_APPRAISE_FIX) && (!xattr_value || xattr_value->type != EVM_IMA_XATTR_DIGSIG)) { if (!ima_fix_xattr(dentry, iint)) status = INTEGRITY_PASS; } /* Permit new files with file signatures, but without data. */ if (inode->i_size == 0 && iint->flags & IMA_NEW_FILE && xattr_value && xattr_value->type == EVM_IMA_XATTR_DIGSIG) { status = INTEGRITY_PASS; } integrity_audit_msg(AUDIT_INTEGRITY_DATA, inode, filename, op, cause, rc, 0); } else { ima_cache_flags(iint, func); } ima_set_cache_status(iint, func, status); return status; } /* * ima_update_xattr - update 'security.ima' hash value */ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file) { struct dentry *dentry = file_dentry(file); int rc = 0; /* do not collect and update hash for digital signatures */ if (test_bit(IMA_DIGSIG, &iint->atomic_flags)) return; if ((iint->ima_file_status != INTEGRITY_PASS) && !(iint->flags & IMA_HASH)) return; rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo); if (rc < 0) return; inode_lock(file_inode(file)); ima_fix_xattr(dentry, iint); inode_unlock(file_inode(file)); } /** * ima_inode_post_setattr - reflect file metadata changes * @dentry: pointer to the affected dentry * * Changes to a dentry's metadata might result in needing to appraise. * * This function is called from notify_change(), which expects the caller * to lock the inode's i_mutex. */ void ima_inode_post_setattr(struct dentry *dentry) { struct inode *inode = d_backing_inode(dentry); struct integrity_iint_cache *iint; int action; if (!(ima_policy_flag & IMA_APPRAISE) || !S_ISREG(inode->i_mode) || !(inode->i_opflags & IOP_XATTR)) return; action = ima_must_appraise(inode, MAY_ACCESS, POST_SETATTR); if (!action) __vfs_removexattr(dentry, XATTR_NAME_IMA); iint = integrity_iint_find(inode); if (iint) { set_bit(IMA_CHANGE_ATTR, &iint->atomic_flags); if (!action) clear_bit(IMA_UPDATE_XATTR, &iint->atomic_flags); } } /* * ima_protect_xattr - protect 'security.ima' * * Ensure that not just anyone can modify or remove 'security.ima'. */ static int ima_protect_xattr(struct dentry *dentry, const char *xattr_name, const void *xattr_value, size_t xattr_value_len) { if (strcmp(xattr_name, XATTR_NAME_IMA) == 0) { if (!capable(CAP_SYS_ADMIN)) return -EPERM; return 1; } return 0; } static void ima_reset_appraise_flags(struct inode *inode, int digsig) { struct integrity_iint_cache *iint; if (!(ima_policy_flag & IMA_APPRAISE) || !S_ISREG(inode->i_mode)) return; iint = integrity_iint_find(inode); if (!iint) return; iint->measured_pcrs = 0; set_bit(IMA_CHANGE_XATTR, &iint->atomic_flags); if (digsig) set_bit(IMA_DIGSIG, &iint->atomic_flags); else clear_bit(IMA_DIGSIG, &iint->atomic_flags); } int ima_inode_setxattr(struct dentry *dentry, const char *xattr_name, const void *xattr_value, size_t xattr_value_len) { const struct evm_ima_xattr_data *xvalue = xattr_value; int result; result = ima_protect_xattr(dentry, xattr_name, xattr_value, xattr_value_len); if (result == 1) { if (!xattr_value_len || (xvalue->type >= IMA_XATTR_LAST)) return -EINVAL; ima_reset_appraise_flags(d_backing_inode(dentry), xvalue->type == EVM_IMA_XATTR_DIGSIG); result = 0; } return result; } int ima_inode_removexattr(struct dentry *dentry, const char *xattr_name) { int result; result = ima_protect_xattr(dentry, xattr_name, NULL, 0); if (result == 1) { ima_reset_appraise_flags(d_backing_inode(dentry), 0); result = 0; } return result; }
143 143 8 1 85 85 153 80 1 2 9 2 2 254 4 296 150 6 5 5 6 4 4 4 4 4 5 17 16 15 12 2 17 5 4 3 1 2 2 1 2 7 6 5 3 1 7 8 8 8 8 8 6 5 4 6 2 2 2 2 4 4 4 4 4 4 7 2 7 5 6 4 2 7 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 /* * net/tipc/node.c: TIPC node management routines * * Copyright (c) 2000-2006, 2012-2016, Ericsson AB * Copyright (c) 2005-2006, 2010-2014, Wind River Systems * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the names of the copyright holders nor the names of its * contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * Alternatively, this software may be distributed under the terms of the * GNU General Public License ("GPL") version 2 as published by the Free * Software Foundation. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ #include "core.h" #include "link.h" #include "node.h" #include "name_distr.h" #include "socket.h" #include "bcast.h" #include "monitor.h" #include "discover.h" #include "netlink.h" #define INVALID_NODE_SIG 0x10000 #define NODE_CLEANUP_AFTER 300000 /* Flags used to take different actions according to flag type * TIPC_NOTIFY_NODE_DOWN: notify node is down * TIPC_NOTIFY_NODE_UP: notify node is up * TIPC_DISTRIBUTE_NAME: publish or withdraw link state name type */ enum { TIPC_NOTIFY_NODE_DOWN = (1 << 3), TIPC_NOTIFY_NODE_UP = (1 << 4), TIPC_NOTIFY_LINK_UP = (1 << 6), TIPC_NOTIFY_LINK_DOWN = (1 << 7) }; struct tipc_link_entry { struct tipc_link *link; spinlock_t lock; /* per link */ u32 mtu; struct sk_buff_head inputq; struct tipc_media_addr maddr; }; struct tipc_bclink_entry { struct tipc_link *link; struct sk_buff_head inputq1; struct sk_buff_head arrvq; struct sk_buff_head inputq2; struct sk_buff_head namedq; }; /** * struct tipc_node - TIPC node structure * @addr: network address of node * @ref: reference counter to node object * @lock: rwlock governing access to structure * @net: the applicable net namespace * @hash: links to adjacent nodes in unsorted hash chain * @inputq: pointer to input queue containing messages for msg event * @namedq: pointer to name table input queue with name table messages * @active_links: bearer ids of active links, used as index into links[] array * @links: array containing references to all links to node * @action_flags: bit mask of different types of node actions * @state: connectivity state vs peer node * @sync_point: sequence number where synch/failover is finished * @list: links to adjacent nodes in sorted list of cluster's nodes * @working_links: number of working links to node (both active and standby) * @link_cnt: number of links to node * @capabilities: bitmap, indicating peer node's functional capabilities * @signature: node instance identifier * @link_id: local and remote bearer ids of changing link, if any * @publ_list: list of publications * @rcu: rcu struct for tipc_node * @delete_at: indicates the time for deleting a down node */ struct tipc_node { u32 addr; struct kref kref; rwlock_t lock; struct net *net; struct hlist_node hash; int active_links[2]; struct tipc_link_entry links[MAX_BEARERS]; struct tipc_bclink_entry bc_entry; int action_flags; struct list_head list; int state; bool failover_sent; u16 sync_point; int link_cnt; u16 working_links; u16 capabilities; u32 signature; u32 link_id; u8 peer_id[16]; struct list_head publ_list; struct list_head conn_sks; unsigned long keepalive_intv; struct timer_list timer; struct rcu_head rcu; unsigned long delete_at; }; /* Node FSM states and events: */ enum { SELF_DOWN_PEER_DOWN = 0xdd, SELF_UP_PEER_UP = 0xaa, SELF_DOWN_PEER_LEAVING = 0xd1, SELF_UP_PEER_COMING = 0xac, SELF_COMING_PEER_UP = 0xca, SELF_LEAVING_PEER_DOWN = 0x1d, NODE_FAILINGOVER = 0xf0, NODE_SYNCHING = 0xcc }; enum { SELF_ESTABL_CONTACT_EVT = 0xece, SELF_LOST_CONTACT_EVT = 0x1ce, PEER_ESTABL_CONTACT_EVT = 0x9ece, PEER_LOST_CONTACT_EVT = 0x91ce, NODE_FAILOVER_BEGIN_EVT = 0xfbe, NODE_FAILOVER_END_EVT = 0xfee, NODE_SYNCH_BEGIN_EVT = 0xcbe, NODE_SYNCH_END_EVT = 0xcee }; static void __tipc_node_link_down(struct tipc_node *n, int *bearer_id, struct sk_buff_head *xmitq, struct tipc_media_addr **maddr); static void tipc_node_link_down(struct tipc_node *n, int bearer_id, bool delete); static void node_lost_contact(struct tipc_node *n, struct sk_buff_head *inputq); static void tipc_node_delete(struct tipc_node *node); static void tipc_node_timeout(struct timer_list *t); static void tipc_node_fsm_evt(struct tipc_node *n, int evt); static struct tipc_node *tipc_node_find(struct net *net, u32 addr); static struct tipc_node *tipc_node_find_by_id(struct net *net, u8 *id); static void tipc_node_put(struct tipc_node *node); static bool node_is_up(struct tipc_node *n); static void tipc_node_delete_from_list(struct tipc_node *node); struct tipc_sock_conn { u32 port; u32 peer_port; u32 peer_node; struct list_head list; }; static struct tipc_link *node_active_link(struct tipc_node *n, int sel) { int bearer_id = n->active_links[sel & 1]; if (unlikely(bearer_id == INVALID_BEARER_ID)) return NULL; return n->links[bearer_id].link; } int tipc_node_get_mtu(struct net *net, u32 addr, u32 sel) { struct tipc_node *n; int bearer_id; unsigned int mtu = MAX_MSG_SIZE; n = tipc_node_find(net, addr); if (unlikely(!n)) return mtu; bearer_id = n->active_links[sel & 1]; if (likely(bearer_id != INVALID_BEARER_ID)) mtu = n->links[bearer_id].mtu; tipc_node_put(n); return mtu; } bool tipc_node_get_id(struct net *net, u32 addr, u8 *id) { u8 *own_id = tipc_own_id(net); struct tipc_node *n; if (!own_id) return true; if (addr == tipc_own_addr(net)) { memcpy(id, own_id, TIPC_NODEID_LEN); return true; } n = tipc_node_find(net, addr); if (!n) return false; memcpy(id, &n->peer_id, TIPC_NODEID_LEN); tipc_node_put(n); return true; } u16 tipc_node_get_capabilities(struct net *net, u32 addr) { struct tipc_node *n; u16 caps; n = tipc_node_find(net, addr); if (unlikely(!n)) return TIPC_NODE_CAPABILITIES; caps = n->capabilities; tipc_node_put(n); return caps; } static void tipc_node_kref_release(struct kref *kref) { struct tipc_node *n = container_of(kref, struct tipc_node, kref); kfree(n->bc_entry.link); kfree_rcu(n, rcu); } static void tipc_node_put(struct tipc_node *node) { kref_put(&node->kref, tipc_node_kref_release); } static void tipc_node_get(struct tipc_node *node) { kref_get(&node->kref); } /* * tipc_node_find - locate specified node object, if it exists */ static struct tipc_node *tipc_node_find(struct net *net, u32 addr) { struct tipc_net *tn = tipc_net(net); struct tipc_node *node; unsigned int thash = tipc_hashfn(addr); rcu_read_lock(); hlist_for_each_entry_rcu(node, &tn->node_htable[thash], hash) { if (node->addr != addr) continue; if (!kref_get_unless_zero(&node->kref)) node = NULL; break; } rcu_read_unlock(); return node; } /* tipc_node_find_by_id - locate specified node object by its 128-bit id * Note: this function is called only when a discovery request failed * to find the node by its 32-bit id, and is not time critical */ static struct tipc_node *tipc_node_find_by_id(struct net *net, u8 *id) { struct tipc_net *tn = tipc_net(net); struct tipc_node *n; bool found = false; rcu_read_lock(); list_for_each_entry_rcu(n, &tn->node_list, list) { read_lock_bh(&n->lock); if (!memcmp(id, n->peer_id, 16) && kref_get_unless_zero(&n->kref)) found = true; read_unlock_bh(&n->lock); if (found) break; } rcu_read_unlock(); return found ? n : NULL; } static void tipc_node_read_lock(struct tipc_node *n) { read_lock_bh(&n->lock); } static void tipc_node_read_unlock(struct tipc_node *n) { read_unlock_bh(&n->lock); } static void tipc_node_write_lock(struct tipc_node *n) { write_lock_bh(&n->lock); } static void tipc_node_write_unlock_fast(struct tipc_node *n) { write_unlock_bh(&n->lock); } static void tipc_node_write_unlock(struct tipc_node *n) { struct net *net = n->net; u32 addr = 0; u32 flags = n->action_flags; u32 link_id = 0; u32 bearer_id; struct list_head *publ_list; if (likely(!flags)) { write_unlock_bh(&n->lock); return; } addr = n->addr; link_id = n->link_id; bearer_id = link_id & 0xffff; publ_list = &n->publ_list; n->action_flags &= ~(TIPC_NOTIFY_NODE_DOWN | TIPC_NOTIFY_NODE_UP | TIPC_NOTIFY_LINK_DOWN | TIPC_NOTIFY_LINK_UP); write_unlock_bh(&n->lock); if (flags & TIPC_NOTIFY_NODE_DOWN) tipc_publ_notify(net, publ_list, addr); if (flags & TIPC_NOTIFY_NODE_UP) tipc_named_node_up(net, addr); if (flags & TIPC_NOTIFY_LINK_UP) { tipc_mon_peer_up(net, addr, bearer_id); tipc_nametbl_publish(net, TIPC_LINK_STATE, addr, addr, TIPC_NODE_SCOPE, link_id, link_id); } if (flags & TIPC_NOTIFY_LINK_DOWN) { tipc_mon_peer_down(net, addr, bearer_id); tipc_nametbl_withdraw(net, TIPC_LINK_STATE, addr, addr, link_id); } } static struct tipc_node *tipc_node_create(struct net *net, u32 addr, u8 *peer_id, u16 capabilities) { struct tipc_net *tn = net_generic(net, tipc_net_id); struct tipc_node *n, *temp_node; struct tipc_link *l; int bearer_id; int i; spin_lock_bh(&tn->node_list_lock); n = tipc_node_find(net, addr); if (n) { if (n->capabilities == capabilities) goto exit; /* Same node may come back with new capabilities */ write_lock_bh(&n->lock); n->capabilities = capabilities; for (bearer_id = 0; bearer_id < MAX_BEARERS; bearer_id++) { l = n->links[bearer_id].link; if (l) tipc_link_update_caps(l, capabilities); } write_unlock_bh(&n->lock); goto exit; } n = kzalloc(sizeof(*n), GFP_ATOMIC); if (!n) { pr_warn("Node creation failed, no memory\n"); goto exit; } n->addr = addr; memcpy(&n->peer_id, peer_id, 16); n->net = net; n->capabilities = capabilities; kref_init(&n->kref); rwlock_init(&n->lock); INIT_HLIST_NODE(&n->hash); INIT_LIST_HEAD(&n->list); INIT_LIST_HEAD(&n->publ_list); INIT_LIST_HEAD(&n->conn_sks); skb_queue_head_init(&n->bc_entry.namedq); skb_queue_head_init(&n->bc_entry.inputq1); __skb_queue_head_init(&n->bc_entry.arrvq); skb_queue_head_init(&n->bc_entry.inputq2); for (i = 0; i < MAX_BEARERS; i++) spin_lock_init(&n->links[i].lock); n->state = SELF_DOWN_PEER_LEAVING; n->delete_at = jiffies + msecs_to_jiffies(NODE_CLEANUP_AFTER); n->signature = INVALID_NODE_SIG; n->active_links[0] = INVALID_BEARER_ID; n->active_links[1] = INVALID_BEARER_ID; if (!tipc_link_bc_create(net, tipc_own_addr(net), addr, U16_MAX, tipc_link_window(tipc_bc_sndlink(net)), n->capabilities, &n->bc_entry.inputq1, &n->bc_entry.namedq, tipc_bc_sndlink(net), &n->bc_entry.link)) { pr_warn("Broadcast rcv link creation failed, no memory\n"); kfree(n); n = NULL; goto exit; } tipc_node_get(n); timer_setup(&n->timer, tipc_node_timeout, 0); n->keepalive_intv = U32_MAX; hlist_add_head_rcu(&n->hash, &tn->node_htable[tipc_hashfn(addr)]); list_for_each_entry_rcu(temp_node, &tn->node_list, list) { if (n->addr < temp_node->addr) break; } list_add_tail_rcu(&n->list, &temp_node->list); exit: spin_unlock_bh(&tn->node_list_lock); return n; } static void tipc_node_calculate_timer(struct tipc_node *n, struct tipc_link *l) { unsigned long tol = tipc_link_tolerance(l); unsigned long intv = ((tol / 4) > 500) ? 500 : tol / 4; /* Link with lowest tolerance determines timer interval */ if (intv < n->keepalive_intv) n->keepalive_intv = intv; /* Ensure link's abort limit corresponds to current tolerance */ tipc_link_set_abort_limit(l, tol / n->keepalive_intv); } static void tipc_node_delete_from_list(struct tipc_node *node) { list_del_rcu(&node->list); hlist_del_rcu(&node->hash); tipc_node_put(node); } static void tipc_node_delete(struct tipc_node *node) { tipc_node_delete_from_list(node); del_timer_sync(&node->timer); tipc_node_put(node); } void tipc_node_stop(struct net *net) { struct tipc_net *tn = tipc_net(net); struct tipc_node *node, *t_node; spin_lock_bh(&tn->node_list_lock); list_for_each_entry_safe(node, t_node, &tn->node_list, list) tipc_node_delete(node); spin_unlock_bh(&tn->node_list_lock); } void tipc_node_subscribe(struct net *net, struct list_head *subscr, u32 addr) { struct tipc_node *n; if (in_own_node(net, addr)) return; n = tipc_node_find(net, addr); if (!n) { pr_warn("Node subscribe rejected, unknown node 0x%x\n", addr); return; } tipc_node_write_lock(n); list_add_tail(subscr, &n->publ_list); tipc_node_write_unlock_fast(n); tipc_node_put(n); } void tipc_node_unsubscribe(struct net *net, struct list_head *subscr, u32 addr) { struct tipc_node *n; if (in_own_node(net, addr)) return; n = tipc_node_find(net, addr); if (!n) { pr_warn("Node unsubscribe rejected, unknown node 0x%x\n", addr); return; } tipc_node_write_lock(n); list_del_init(subscr); tipc_node_write_unlock_fast(n); tipc_node_put(n); } int tipc_node_add_conn(struct net *net, u32 dnode, u32 port, u32 peer_port) { struct tipc_node *node; struct tipc_sock_conn *conn; int err = 0; if (in_own_node(net, dnode)) return 0; node = tipc_node_find(net, dnode); if (!node) { pr_warn("Connecting sock to node 0x%x failed\n", dnode); return -EHOSTUNREACH; } conn = kmalloc(sizeof(*conn), GFP_ATOMIC); if (!conn) { err = -EHOSTUNREACH; goto exit; } conn->peer_node = dnode; conn->port = port; conn->peer_port = peer_port; tipc_node_write_lock(node); list_add_tail(&conn->list, &node->conn_sks); tipc_node_write_unlock(node); exit: tipc_node_put(node); return err; } void tipc_node_remove_conn(struct net *net, u32 dnode, u32 port) { struct tipc_node *node; struct tipc_sock_conn *conn, *safe; if (in_own_node(net, dnode)) return; node = tipc_node_find(net, dnode); if (!node) return; tipc_node_write_lock(node); list_for_each_entry_safe(conn, safe, &node->conn_sks, list) { if (port != conn->port) continue; list_del(&conn->list); kfree(conn); } tipc_node_write_unlock(node); tipc_node_put(node); } static void tipc_node_clear_links(struct tipc_node *node) { int i; for (i = 0; i < MAX_BEARERS; i++) { struct tipc_link_entry *le = &node->links[i]; if (le->link) { kfree(le->link); le->link = NULL; node->link_cnt--; } } } /* tipc_node_cleanup - delete nodes that does not * have active links for NODE_CLEANUP_AFTER time */ static bool tipc_node_cleanup(struct tipc_node *peer) { struct tipc_net *tn = tipc_net(peer->net); bool deleted = false; /* If lock held by tipc_node_stop() the node will be deleted anyway */ if (!spin_trylock_bh(&tn->node_list_lock)) return false; tipc_node_write_lock(peer); if (!node_is_up(peer) && time_after(jiffies, peer->delete_at)) { tipc_node_clear_links(peer); tipc_node_delete_from_list(peer); deleted = true; } tipc_node_write_unlock(peer); spin_unlock_bh(&tn->node_list_lock); return deleted; } /* tipc_node_timeout - handle expiration of node timer */ static void tipc_node_timeout(struct timer_list *t) { struct tipc_node *n = from_timer(n, t, timer); struct tipc_link_entry *le; struct sk_buff_head xmitq; int remains = n->link_cnt; int bearer_id; int rc = 0; if (!node_is_up(n) && tipc_node_cleanup(n)) { /*Removing the reference of Timer*/ tipc_node_put(n); return; } __skb_queue_head_init(&xmitq); /* Initial node interval to value larger (10 seconds), then it will be * recalculated with link lowest tolerance */ tipc_node_read_lock(n); n->keepalive_intv = 10000; tipc_node_read_unlock(n); for (bearer_id = 0; remains && (bearer_id < MAX_BEARERS); bearer_id++) { tipc_node_read_lock(n); le = &n->links[bearer_id]; if (le->link) { spin_lock_bh(&le->lock); /* Link tolerance may change asynchronously: */ tipc_node_calculate_timer(n, le->link); rc = tipc_link_timeout(le->link, &xmitq); spin_unlock_bh(&le->lock); remains--; } tipc_node_read_unlock(n); tipc_bearer_xmit(n->net, bearer_id, &xmitq, &le->maddr); if (rc & TIPC_LINK_DOWN_EVT) tipc_node_link_down(n, bearer_id, false); } mod_timer(&n->timer, jiffies + msecs_to_jiffies(n->keepalive_intv)); } /** * __tipc_node_link_up - handle addition of link * Node lock must be held by caller * Link becomes active (alone or shared) or standby, depending on its priority. */ static void __tipc_node_link_up(struct tipc_node *n, int bearer_id, struct sk_buff_head *xmitq) { int *slot0 = &n->active_links[0]; int *slot1 = &n->active_links[1]; struct tipc_link *ol = node_active_link(n, 0); struct tipc_link *nl = n->links[bearer_id].link; if (!nl || tipc_link_is_up(nl)) return; tipc_link_fsm_evt(nl, LINK_ESTABLISH_EVT); if (!tipc_link_is_up(nl)) return; n->working_links++; n->action_flags |= TIPC_NOTIFY_LINK_UP; n->link_id = tipc_link_id(nl); /* Leave room for tunnel header when returning 'mtu' to users: */ n->links[bearer_id].mtu = tipc_link_mtu(nl) - INT_H_SIZE; tipc_bearer_add_dest(n->net, bearer_id, n->addr); tipc_bcast_inc_bearer_dst_cnt(n->net, bearer_id); pr_debug("Established link <%s> on network plane %c\n", tipc_link_name(nl), tipc_link_plane(nl)); /* Ensure that a STATE message goes first */ tipc_link_build_state_msg(nl, xmitq); /* First link? => give it both slots */ if (!ol) { *slot0 = bearer_id; *slot1 = bearer_id; tipc_node_fsm_evt(n, SELF_ESTABL_CONTACT_EVT); n->failover_sent = false; n->action_flags |= TIPC_NOTIFY_NODE_UP; tipc_link_set_active(nl, true); tipc_bcast_add_peer(n->net, nl, xmitq); return; } /* Second link => redistribute slots */ if (tipc_link_prio(nl) > tipc_link_prio(ol)) { pr_debug("Old link <%s> becomes standby\n", tipc_link_name(ol)); *slot0 = bearer_id; *slot1 = bearer_id; tipc_link_set_active(nl, true); tipc_link_set_active(ol, false); } else if (tipc_link_prio(nl) == tipc_link_prio(ol)) { tipc_link_set_active(nl, true); *slot1 = bearer_id; } else { pr_debug("New link <%s> is standby\n", tipc_link_name(nl)); } /* Prepare synchronization with first link */ tipc_link_tnl_prepare(ol, nl, SYNCH_MSG, xmitq); } /** * tipc_node_link_up - handle addition of link * * Link becomes active (alone or shared) or standby, depending on its priority. */ static void tipc_node_link_up(struct tipc_node *n, int bearer_id, struct sk_buff_head *xmitq) { struct tipc_media_addr *maddr; tipc_node_write_lock(n); __tipc_node_link_up(n, bearer_id, xmitq); maddr = &n->links[bearer_id].maddr; tipc_bearer_xmit(n->net, bearer_id, xmitq, maddr); tipc_node_write_unlock(n); } /** * __tipc_node_link_down - handle loss of link */ static void __tipc_node_link_down(struct tipc_node *n, int *bearer_id, struct sk_buff_head *xmitq, struct tipc_media_addr **maddr) { struct tipc_link_entry *le = &n->links[*bearer_id]; int *slot0 = &n->active_links[0]; int *slot1 = &n->active_links[1]; int i, highest = 0, prio; struct tipc_link *l, *_l, *tnl; l = n->links[*bearer_id].link; if (!l || tipc_link_is_reset(l)) return; n->working_links--; n->action_flags |= TIPC_NOTIFY_LINK_DOWN; n->link_id = tipc_link_id(l); tipc_bearer_remove_dest(n->net, *bearer_id, n->addr); pr_debug("Lost link <%s> on network plane %c\n", tipc_link_name(l), tipc_link_plane(l)); /* Select new active link if any available */ *slot0 = INVALID_BEARER_ID; *slot1 = INVALID_BEARER_ID; for (i = 0; i < MAX_BEARERS; i++) { _l = n->links[i].link; if (!_l || !tipc_link_is_up(_l)) continue; if (_l == l) continue; prio = tipc_link_prio(_l); if (prio < highest) continue; if (prio > highest) { highest = prio; *slot0 = i; *slot1 = i; continue; } *slot1 = i; } if (!node_is_up(n)) { if (tipc_link_peer_is_down(l)) tipc_node_fsm_evt(n, PEER_LOST_CONTACT_EVT); tipc_node_fsm_evt(n, SELF_LOST_CONTACT_EVT); tipc_link_fsm_evt(l, LINK_RESET_EVT); tipc_link_reset(l); tipc_link_build_reset_msg(l, xmitq); *maddr = &n->links[*bearer_id].maddr; node_lost_contact(n, &le->inputq); tipc_bcast_dec_bearer_dst_cnt(n->net, *bearer_id); return; } tipc_bcast_dec_bearer_dst_cnt(n->net, *bearer_id); /* There is still a working link => initiate failover */ *bearer_id = n->active_links[0]; tnl = n->links[*bearer_id].link; tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT); tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT); n->sync_point = tipc_link_rcv_nxt(tnl) + (U16_MAX / 2 - 1); tipc_link_tnl_prepare(l, tnl, FAILOVER_MSG, xmitq); tipc_link_reset(l); tipc_link_fsm_evt(l, LINK_RESET_EVT); tipc_link_fsm_evt(l, LINK_FAILOVER_BEGIN_EVT); tipc_node_fsm_evt(n, NODE_FAILOVER_BEGIN_EVT); *maddr = &n->links[*bearer_id].maddr; } static void tipc_node_link_down(struct tipc_node *n, int bearer_id, bool delete) { struct tipc_link_entry *le = &n->links[bearer_id]; struct tipc_media_addr *maddr = NULL; struct tipc_link *l = le->link; int old_bearer_id = bearer_id; struct sk_buff_head xmitq; if (!l) return; __skb_queue_head_init(&xmitq); tipc_node_write_lock(n); if (!tipc_link_is_establishing(l)) { __tipc_node_link_down(n, &bearer_id, &xmitq, &maddr); if (delete) { kfree(l); le->link = NULL; n->link_cnt--; } } else { /* Defuse pending tipc_node_link_up() */ tipc_link_fsm_evt(l, LINK_RESET_EVT); } tipc_node_write_unlock(n); if (delete) tipc_mon_remove_peer(n->net, n->addr, old_bearer_id); if (!skb_queue_empty(&xmitq)) tipc_bearer_xmit(n->net, bearer_id, &xmitq, maddr); tipc_sk_rcv(n->net, &le->inputq); } static bool node_is_up(struct tipc_node *n) { return n->active_links[0] != INVALID_BEARER_ID; } bool tipc_node_is_up(struct net *net, u32 addr) { struct tipc_node *n; bool retval = false; if (in_own_node(net, addr)) return true; n = tipc_node_find(net, addr); if (!n) return false; retval = node_is_up(n); tipc_node_put(n); return retval; } static u32 tipc_node_suggest_addr(struct net *net, u32 addr) { struct tipc_node *n; addr ^= tipc_net(net)->random; while ((n = tipc_node_find(net, addr))) { tipc_node_put(n); addr++; } return addr; } /* tipc_node_try_addr(): Check if addr can be used by peer, suggest other if not * Returns suggested address if any, otherwise 0 */ u32 tipc_node_try_addr(struct net *net, u8 *id, u32 addr) { struct tipc_net *tn = tipc_net(net); struct tipc_node *n; /* Suggest new address if some other peer is using this one */ n = tipc_node_find(net, addr); if (n) { if (!memcmp(n->peer_id, id, NODE_ID_LEN)) addr = 0; tipc_node_put(n); if (!addr) return 0; return tipc_node_suggest_addr(net, addr); } /* Suggest previously used address if peer is known */ n = tipc_node_find_by_id(net, id); if (n) { addr = n->addr; tipc_node_put(n); return addr; } /* Even this node may be in conflict */ if (tn->trial_addr == addr) return tipc_node_suggest_addr(net, addr); return 0; } void tipc_node_check_dest(struct net *net, u32 addr, u8 *peer_id, struct tipc_bearer *b, u16 capabilities, u32 signature, struct tipc_media_addr *maddr, bool *respond, bool *dupl_addr) { struct tipc_node *n; struct tipc_link *l; struct tipc_link_entry *le; bool addr_match = false; bool sign_match = false; bool link_up = false; bool accept_addr = false; bool reset = true; char *if_name; unsigned long intv; u16 session; *dupl_addr = false; *respond = false; n = tipc_node_create(net, addr, peer_id, capabilities); if (!n) return; tipc_node_write_lock(n); le = &n->links[b->identity]; /* Prepare to validate requesting node's signature and media address */ l = le->link; link_up = l && tipc_link_is_up(l); addr_match = l && !memcmp(&le->maddr, maddr, sizeof(*maddr)); sign_match = (signature == n->signature); /* These three flags give us eight permutations: */ if (sign_match && addr_match && link_up) { /* All is fine. Do nothing. */ reset = false; } else if (sign_match && addr_match && !link_up) { /* Respond. The link will come up in due time */ *respond = true; } else if (sign_match && !addr_match && link_up) { /* Peer has changed i/f address without rebooting. * If so, the link will reset soon, and the next * discovery will be accepted. So we can ignore it. * It may also be an cloned or malicious peer having * chosen the same node address and signature as an * existing one. * Ignore requests until the link goes down, if ever. */ *dupl_addr = true; } else if (sign_match && !addr_match && !link_up) { /* Peer link has changed i/f address without rebooting. * It may also be a cloned or malicious peer; we can't * distinguish between the two. * The signature is correct, so we must accept. */ accept_addr = true; *respond = true; } else if (!sign_match && addr_match && link_up) { /* Peer node rebooted. Two possibilities: * - Delayed re-discovery; this link endpoint has already * reset and re-established contact with the peer, before * receiving a discovery message from that node. * (The peer happened to receive one from this node first). * - The peer came back so fast that our side has not * discovered it yet. Probing from this side will soon * reset the link, since there can be no working link * endpoint at the peer end, and the link will re-establish. * Accept the signature, since it comes from a known peer. */ n->signature = signature; } else if (!sign_match && addr_match && !link_up) { /* The peer node has rebooted. * Accept signature, since it is a known peer. */ n->signature = signature; *respond = true; } else if (!sign_match && !addr_match && link_up) { /* Peer rebooted with new address, or a new/duplicate peer. * Ignore until the link goes down, if ever. */ *dupl_addr = true; } else if (!sign_match && !addr_match && !link_up) { /* Peer rebooted with new address, or it is a new peer. * Accept signature and address. */ n->signature = signature; accept_addr = true; *respond = true; } if (!accept_addr) goto exit; /* Now create new link if not already existing */ if (!l) { if (n->link_cnt == 2) goto exit; if_name = strchr(b->name, ':') + 1; get_random_bytes(&session, sizeof(u16)); if (!tipc_link_create(net, if_name, b->identity, b->tolerance, b->net_plane, b->mtu, b->priority, b->window, session, tipc_own_addr(net), addr, peer_id, n->capabilities, tipc_bc_sndlink(n->net), n->bc_entry.link, &le->inputq, &n->bc_entry.namedq, &l)) { *respond = false; goto exit; } tipc_link_reset(l); tipc_link_fsm_evt(l, LINK_RESET_EVT); if (n->state == NODE_FAILINGOVER) tipc_link_fsm_evt(l, LINK_FAILOVER_BEGIN_EVT); le->link = l; n->link_cnt++; tipc_node_calculate_timer(n, l); if (n->link_cnt == 1) { intv = jiffies + msecs_to_jiffies(n->keepalive_intv); if (!mod_timer(&n->timer, intv)) tipc_node_get(n); } } memcpy(&le->maddr, maddr, sizeof(*maddr)); exit: tipc_node_write_unlock(n); if (reset && l && !tipc_link_is_reset(l)) tipc_node_link_down(n, b->identity, false); tipc_node_put(n); } void tipc_node_delete_links(struct net *net, int bearer_id) { struct tipc_net *tn = net_generic(net, tipc_net_id); struct tipc_node *n; rcu_read_lock(); list_for_each_entry_rcu(n, &tn->node_list, list) { tipc_node_link_down(n, bearer_id, true); } rcu_read_unlock(); } static void tipc_node_reset_links(struct tipc_node *n) { int i; pr_warn("Resetting all links to %x\n", n->addr); for (i = 0; i < MAX_BEARERS; i++) { tipc_node_link_down(n, i, false); } } /* tipc_node_fsm_evt - node finite state machine * Determines when contact is allowed with peer node */ static void tipc_node_fsm_evt(struct tipc_node *n, int evt) { int state = n->state; switch (state) { case SELF_DOWN_PEER_DOWN: switch (evt) { case SELF_ESTABL_CONTACT_EVT: state = SELF_UP_PEER_COMING; break; case PEER_ESTABL_CONTACT_EVT: state = SELF_COMING_PEER_UP; break; case SELF_LOST_CONTACT_EVT: case PEER_LOST_CONTACT_EVT: break; case NODE_SYNCH_END_EVT: case NODE_SYNCH_BEGIN_EVT: case NODE_FAILOVER_BEGIN_EVT: case NODE_FAILOVER_END_EVT: default: goto illegal_evt; } break; case SELF_UP_PEER_UP: switch (evt) { case SELF_LOST_CONTACT_EVT: state = SELF_DOWN_PEER_LEAVING; break; case PEER_LOST_CONTACT_EVT: state = SELF_LEAVING_PEER_DOWN; break; case NODE_SYNCH_BEGIN_EVT: state = NODE_SYNCHING; break; case NODE_FAILOVER_BEGIN_EVT: state = NODE_FAILINGOVER; break; case SELF_ESTABL_CONTACT_EVT: case PEER_ESTABL_CONTACT_EVT: case NODE_SYNCH_END_EVT: case NODE_FAILOVER_END_EVT: break; default: goto illegal_evt; } break; case SELF_DOWN_PEER_LEAVING: switch (evt) { case PEER_LOST_CONTACT_EVT: state = SELF_DOWN_PEER_DOWN; break; case SELF_ESTABL_CONTACT_EVT: case PEER_ESTABL_CONTACT_EVT: case SELF_LOST_CONTACT_EVT: break; case NODE_SYNCH_END_EVT: case NODE_SYNCH_BEGIN_EVT: case NODE_FAILOVER_BEGIN_EVT: case NODE_FAILOVER_END_EVT: default: goto illegal_evt; } break; case SELF_UP_PEER_COMING: switch (evt) { case PEER_ESTABL_CONTACT_EVT: state = SELF_UP_PEER_UP; break; case SELF_LOST_CONTACT_EVT: state = SELF_DOWN_PEER_DOWN; break; case SELF_ESTABL_CONTACT_EVT: case PEER_LOST_CONTACT_EVT: case NODE_SYNCH_END_EVT: case NODE_FAILOVER_BEGIN_EVT: break; case NODE_SYNCH_BEGIN_EVT: case NODE_FAILOVER_END_EVT: default: goto illegal_evt; } break; case SELF_COMING_PEER_UP: switch (evt) { case SELF_ESTABL_CONTACT_EVT: state = SELF_UP_PEER_UP; break; case PEER_LOST_CONTACT_EVT: state = SELF_DOWN_PEER_DOWN; break; case SELF_LOST_CONTACT_EVT: case PEER_ESTABL_CONTACT_EVT: break; case NODE_SYNCH_END_EVT: case NODE_SYNCH_BEGIN_EVT: case NODE_FAILOVER_BEGIN_EVT: case NODE_FAILOVER_END_EVT: default: goto illegal_evt; } break; case SELF_LEAVING_PEER_DOWN: switch (evt) { case SELF_LOST_CONTACT_EVT: state = SELF_DOWN_PEER_DOWN; break; case SELF_ESTABL_CONTACT_EVT: case PEER_ESTABL_CONTACT_EVT: case PEER_LOST_CONTACT_EVT: break; case NODE_SYNCH_END_EVT: case NODE_SYNCH_BEGIN_EVT: case NODE_FAILOVER_BEGIN_EVT: case NODE_FAILOVER_END_EVT: default: goto illegal_evt; } break; case NODE_FAILINGOVER: switch (evt) { case SELF_LOST_CONTACT_EVT: state = SELF_DOWN_PEER_LEAVING; break; case PEER_LOST_CONTACT_EVT: state = SELF_LEAVING_PEER_DOWN; break; case NODE_FAILOVER_END_EVT: state = SELF_UP_PEER_UP; break; case NODE_FAILOVER_BEGIN_EVT: case SELF_ESTABL_CONTACT_EVT: case PEER_ESTABL_CONTACT_EVT: break; case NODE_SYNCH_BEGIN_EVT: case NODE_SYNCH_END_EVT: default: goto illegal_evt; } break; case NODE_SYNCHING: switch (evt) { case SELF_LOST_CONTACT_EVT: state = SELF_DOWN_PEER_LEAVING; break; case PEER_LOST_CONTACT_EVT: state = SELF_LEAVING_PEER_DOWN; break; case NODE_SYNCH_END_EVT: state = SELF_UP_PEER_UP; break; case NODE_FAILOVER_BEGIN_EVT: state = NODE_FAILINGOVER; break; case NODE_SYNCH_BEGIN_EVT: case SELF_ESTABL_CONTACT_EVT: case PEER_ESTABL_CONTACT_EVT: break; case NODE_FAILOVER_END_EVT: default: goto illegal_evt; } break; default: pr_err("Unknown node fsm state %x\n", state); break; } n->state = state; return; illegal_evt: pr_err("Illegal node fsm evt %x in state %x\n", evt, state); } static void node_lost_contact(struct tipc_node *n, struct sk_buff_head *inputq) { struct tipc_sock_conn *conn, *safe; struct tipc_link *l; struct list_head *conns = &n->conn_sks; struct sk_buff *skb; uint i; pr_debug("Lost contact with %x\n", n->addr); n->delete_at = jiffies + msecs_to_jiffies(NODE_CLEANUP_AFTER); /* Clean up broadcast state */ tipc_bcast_remove_peer(n->net, n->bc_entry.link); /* Abort any ongoing link failover */ for (i = 0; i < MAX_BEARERS; i++) { l = n->links[i].link; if (l) tipc_link_fsm_evt(l, LINK_FAILOVER_END_EVT); } /* Notify publications from this node */ n->action_flags |= TIPC_NOTIFY_NODE_DOWN; /* Notify sockets connected to node */ list_for_each_entry_safe(conn, safe, conns, list) { skb = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE, TIPC_CONN_MSG, SHORT_H_SIZE, 0, tipc_own_addr(n->net), conn->peer_node, conn->port, conn->peer_port, TIPC_ERR_NO_NODE); if (likely(skb)) skb_queue_tail(inputq, skb); list_del(&conn->list); kfree(conn); } } /** * tipc_node_get_linkname - get the name of a link * * @bearer_id: id of the bearer * @node: peer node address * @linkname: link name output buffer * * Returns 0 on success */ int tipc_node_get_linkname(struct net *net, u32 bearer_id, u32 addr, char *linkname, size_t len) { struct tipc_link *link; int err = -EINVAL; struct tipc_node *node = tipc_node_find(net, addr); if (!node) return err; if (bearer_id >= MAX_BEARERS) goto exit; tipc_node_read_lock(node); link = node->links[bearer_id].link; if (link) { strncpy(linkname, tipc_link_name(link), len); err = 0; } tipc_node_read_unlock(node); exit: tipc_node_put(node); return err; } /* Caller should hold node lock for the passed node */ static int __tipc_nl_add_node(struct tipc_nl_msg *msg, struct tipc_node *node) { void *hdr; struct nlattr *attrs; hdr = genlmsg_put(msg->skb, msg->portid, msg->seq, &tipc_genl_family, NLM_F_MULTI, TIPC_NL_NODE_GET); if (!hdr) return -EMSGSIZE; attrs = nla_nest_start(msg->skb, TIPC_NLA_NODE); if (!attrs) goto msg_full; if (nla_put_u32(msg->skb, TIPC_NLA_NODE_ADDR, node->addr)) goto attr_msg_full; if (node_is_up(node)) if (nla_put_flag(msg->skb, TIPC_NLA_NODE_UP)) goto attr_msg_full; nla_nest_end(msg->skb, attrs); genlmsg_end(msg->skb, hdr); return 0; attr_msg_full: nla_nest_cancel(msg->skb, attrs); msg_full: genlmsg_cancel(msg->skb, hdr); return -EMSGSIZE; } /** * tipc_node_xmit() is the general link level function for message sending * @net: the applicable net namespace * @list: chain of buffers containing message * @dnode: address of destination node * @selector: a number used for deterministic link selection * Consumes the buffer chain. * Returns 0 if success, otherwise: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE,-ENOBUF */ int tipc_node_xmit(struct net *net, struct sk_buff_head *list, u32 dnode, int selector) { struct tipc_link_entry *le = NULL; struct tipc_node *n; struct sk_buff_head xmitq; int bearer_id; int rc; if (in_own_node(net, dnode)) { spin_lock_init(&list->lock); tipc_sk_rcv(net, list); return 0; } n = tipc_node_find(net, dnode); if (unlikely(!n)) { __skb_queue_purge(list); return -EHOSTUNREACH; } tipc_node_read_lock(n); bearer_id = n->active_links[selector & 1]; if (unlikely(bearer_id == INVALID_BEARER_ID)) { tipc_node_read_unlock(n); tipc_node_put(n); __skb_queue_purge(list); return -EHOSTUNREACH; } __skb_queue_head_init(&xmitq); le = &n->links[bearer_id]; spin_lock_bh(&le->lock); rc = tipc_link_xmit(le->link, list, &xmitq); spin_unlock_bh(&le->lock); tipc_node_read_unlock(n); if (unlikely(rc == -ENOBUFS)) tipc_node_link_down(n, bearer_id, false); else tipc_bearer_xmit(net, bearer_id, &xmitq, &le->maddr); tipc_node_put(n); return rc; } /* tipc_node_xmit_skb(): send single buffer to destination * Buffers sent via this functon are generally TIPC_SYSTEM_IMPORTANCE * messages, which will not be rejected * The only exception is datagram messages rerouted after secondary * lookup, which are rare and safe to dispose of anyway. */ int tipc_node_xmit_skb(struct net *net, struct sk_buff *skb, u32 dnode, u32 selector) { struct sk_buff_head head; __skb_queue_head_init(&head); __skb_queue_tail(&head, skb); tipc_node_xmit(net, &head, dnode, selector); return 0; } /* tipc_node_distr_xmit(): send single buffer msgs to individual destinations * Note: this is only for SYSTEM_IMPORTANCE messages, which cannot be rejected */ int tipc_node_distr_xmit(struct net *net, struct sk_buff_head *xmitq) { struct sk_buff *skb; u32 selector, dnode; while ((skb = __skb_dequeue(xmitq))) { selector = msg_origport(buf_msg(skb)); dnode = msg_destnode(buf_msg(skb)); tipc_node_xmit_skb(net, skb, dnode, selector); } return 0; } void tipc_node_broadcast(struct net *net, struct sk_buff *skb) { struct sk_buff *txskb; struct tipc_node *n; u32 dst; rcu_read_lock(); list_for_each_entry_rcu(n, tipc_nodes(net), list) { dst = n->addr; if (in_own_node(net, dst)) continue; if (!node_is_up(n)) continue; txskb = pskb_copy(skb, GFP_ATOMIC); if (!txskb) break; msg_set_destnode(buf_msg(txskb), dst); tipc_node_xmit_skb(net, txskb, dst, 0); } rcu_read_unlock(); kfree_skb(skb); } static void tipc_node_mcast_rcv(struct tipc_node *n) { struct tipc_bclink_entry *be = &n->bc_entry; /* 'arrvq' is under inputq2's lock protection */ spin_lock_bh(&be->inputq2.lock); spin_lock_bh(&be->inputq1.lock); skb_queue_splice_tail_init(&be->inputq1, &be->arrvq); spin_unlock_bh(&be->inputq1.lock); spin_unlock_bh(&be->inputq2.lock); tipc_sk_mcast_rcv(n->net, &be->arrvq, &be->inputq2); } static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr, int bearer_id, struct sk_buff_head *xmitq) { struct tipc_link *ucl; int rc; rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr); if (rc & TIPC_LINK_DOWN_EVT) { tipc_node_reset_links(n); return; } if (!(rc & TIPC_LINK_SND_STATE)) return; /* If probe message, a STATE response will be sent anyway */ if (msg_probe(hdr)) return; /* Produce a STATE message carrying broadcast NACK */ tipc_node_read_lock(n); ucl = n->links[bearer_id].link; if (ucl) tipc_link_build_state_msg(ucl, xmitq); tipc_node_read_unlock(n); } /** * tipc_node_bc_rcv - process TIPC broadcast packet arriving from off-node * @net: the applicable net namespace * @skb: TIPC packet * @bearer_id: id of bearer message arrived on * * Invoked with no locks held. */ static void tipc_node_bc_rcv(struct net *net, struct sk_buff *skb, int bearer_id) { int rc; struct sk_buff_head xmitq; struct tipc_bclink_entry *be; struct tipc_link_entry *le; struct tipc_msg *hdr = buf_msg(skb); int usr = msg_user(hdr); u32 dnode = msg_destnode(hdr); struct tipc_node *n; __skb_queue_head_init(&xmitq); /* If NACK for other node, let rcv link for that node peek into it */ if ((usr == BCAST_PROTOCOL) && (dnode != tipc_own_addr(net))) n = tipc_node_find(net, dnode); else n = tipc_node_find(net, msg_prevnode(hdr)); if (!n) { kfree_skb(skb); return; } be = &n->bc_entry; le = &n->links[bearer_id]; rc = tipc_bcast_rcv(net, be->link, skb); /* Broadcast ACKs are sent on a unicast link */ if (rc & TIPC_LINK_SND_STATE) { tipc_node_read_lock(n); tipc_link_build_state_msg(le->link, &xmitq); tipc_node_read_unlock(n); } if (!skb_queue_empty(&xmitq)) tipc_bearer_xmit(net, bearer_id, &xmitq, &le->maddr); if (!skb_queue_empty(&be->inputq1)) tipc_node_mcast_rcv(n); /* If reassembly or retransmission failure => reset all links to peer */ if (rc & TIPC_LINK_DOWN_EVT) tipc_node_reset_links(n); tipc_node_put(n); } /** * tipc_node_check_state - check and if necessary update node state * @skb: TIPC packet * @bearer_id: identity of bearer delivering the packet * Returns true if state and msg are ok, otherwise false */ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb, int bearer_id, struct sk_buff_head *xmitq) { struct tipc_msg *hdr = buf_msg(skb); int usr = msg_user(hdr); int mtyp = msg_type(hdr); u16 oseqno = msg_seqno(hdr); u16 iseqno = msg_seqno(msg_get_wrapped(hdr)); u16 exp_pkts = msg_msgcnt(hdr); u16 rcv_nxt, syncpt, dlv_nxt, inputq_len; int state = n->state; struct tipc_link *l, *tnl, *pl = NULL; struct tipc_media_addr *maddr; int pb_id; l = n->links[bearer_id].link; if (!l) return false; rcv_nxt = tipc_link_rcv_nxt(l); if (likely((state == SELF_UP_PEER_UP) && (usr != TUNNEL_PROTOCOL))) return true; /* Find parallel link, if any */ for (pb_id = 0; pb_id < MAX_BEARERS; pb_id++) { if ((pb_id != bearer_id) && n->links[pb_id].link) { pl = n->links[pb_id].link; break; } } if (!tipc_link_validate_msg(l, hdr)) return false; /* Check and update node accesibility if applicable */ if (state == SELF_UP_PEER_COMING) { if (!tipc_link_is_up(l)) return true; if (!msg_peer_link_is_up(hdr)) return true; tipc_node_fsm_evt(n, PEER_ESTABL_CONTACT_EVT); } if (state == SELF_DOWN_PEER_LEAVING) { if (msg_peer_node_is_up(hdr)) return false; tipc_node_fsm_evt(n, PEER_LOST_CONTACT_EVT); return true; } if (state == SELF_LEAVING_PEER_DOWN) return false; /* Ignore duplicate packets */ if ((usr != LINK_PROTOCOL) && less(oseqno, rcv_nxt)) return true; /* Initiate or update failover mode if applicable */ if ((usr == TUNNEL_PROTOCOL) && (mtyp == FAILOVER_MSG)) { syncpt = oseqno + exp_pkts - 1; if (pl && tipc_link_is_up(pl)) { __tipc_node_link_down(n, &pb_id, xmitq, &maddr); tipc_skb_queue_splice_tail_init(tipc_link_inputq(pl), tipc_link_inputq(l)); } /* If parallel link was already down, and this happened before * the tunnel link came up, FAILOVER was never sent. Ensure that * FAILOVER is sent to get peer out of NODE_FAILINGOVER state. */ if (n->state != NODE_FAILINGOVER && !n->failover_sent) { tipc_link_create_dummy_tnl_msg(l, xmitq); n->failover_sent = true; } /* If pkts arrive out of order, use lowest calculated syncpt */ if (less(syncpt, n->sync_point)) n->sync_point = syncpt; } /* Open parallel link when tunnel link reaches synch point */ if ((n->state == NODE_FAILINGOVER) && tipc_link_is_up(l)) { if (!more(rcv_nxt, n->sync_point)) return true; tipc_node_fsm_evt(n, NODE_FAILOVER_END_EVT); if (pl) tipc_link_fsm_evt(pl, LINK_FAILOVER_END_EVT); return true; } /* No synching needed if only one link */ if (!pl || !tipc_link_is_up(pl)) return true; /* Initiate synch mode if applicable */ if ((usr == TUNNEL_PROTOCOL) && (mtyp == SYNCH_MSG) && (oseqno == 1)) { syncpt = iseqno + exp_pkts - 1; if (!tipc_link_is_up(l)) __tipc_node_link_up(n, bearer_id, xmitq); if (n->state == SELF_UP_PEER_UP) { n->sync_point = syncpt; tipc_link_fsm_evt(l, LINK_SYNCH_BEGIN_EVT); tipc_node_fsm_evt(n, NODE_SYNCH_BEGIN_EVT); } } /* Open tunnel link when parallel link reaches synch point */ if (n->state == NODE_SYNCHING) { if (tipc_link_is_synching(l)) { tnl = l; } else { tnl = pl; pl = l; } inputq_len = skb_queue_len(tipc_link_inputq(pl)); dlv_nxt = tipc_link_rcv_nxt(pl) - inputq_len; if (more(dlv_nxt, n->sync_point)) { tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT); tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT); return true; } if (l == pl) return true; if ((usr == TUNNEL_PROTOCOL) && (mtyp == SYNCH_MSG)) return true; if (usr == LINK_PROTOCOL) return true; return false; } return true; } /** * tipc_rcv - process TIPC packets/messages arriving from off-node * @net: the applicable net namespace * @skb: TIPC packet * @bearer: pointer to bearer message arrived on * * Invoked with no locks held. Bearer pointer must point to a valid bearer * structure (i.e. cannot be NULL), but bearer can be inactive. */ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b) { struct sk_buff_head xmitq; struct tipc_node *n; struct tipc_msg *hdr; int bearer_id = b->identity; struct tipc_link_entry *le; u32 self = tipc_own_addr(net); int usr, rc = 0; u16 bc_ack; __skb_queue_head_init(&xmitq); /* Ensure message is well-formed before touching the header */ if (unlikely(!tipc_msg_validate(&skb))) goto discard; hdr = buf_msg(skb); usr = msg_user(hdr); bc_ack = msg_bcast_ack(hdr); /* Handle arrival of discovery or broadcast packet */ if (unlikely(msg_non_seq(hdr))) { if (unlikely(usr == LINK_CONFIG)) return tipc_disc_rcv(net, skb, b); else return tipc_node_bc_rcv(net, skb, bearer_id); } /* Discard unicast link messages destined for another node */ if (unlikely(!msg_short(hdr) && (msg_destnode(hdr) != self))) goto discard; /* Locate neighboring node that sent packet */ n = tipc_node_find(net, msg_prevnode(hdr)); if (unlikely(!n)) goto discard; le = &n->links[bearer_id]; /* Ensure broadcast reception is in synch with peer's send state */ if (unlikely(usr == LINK_PROTOCOL)) tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq); else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) tipc_bcast_ack_rcv(net, n->bc_entry.link, hdr); /* Receive packet directly if conditions permit */ tipc_node_read_lock(n); if (likely((n->state == SELF_UP_PEER_UP) && (usr != TUNNEL_PROTOCOL))) { spin_lock_bh(&le->lock); if (le->link) { rc = tipc_link_rcv(le->link, skb, &xmitq); skb = NULL; } spin_unlock_bh(&le->lock); } tipc_node_read_unlock(n); /* Check/update node state before receiving */ if (unlikely(skb)) { if (unlikely(skb_linearize(skb))) goto discard; tipc_node_write_lock(n); if (tipc_node_check_state(n, skb, bearer_id, &xmitq)) { if (le->link) { rc = tipc_link_rcv(le->link, skb, &xmitq); skb = NULL; } } tipc_node_write_unlock(n); } if (unlikely(rc & TIPC_LINK_UP_EVT)) tipc_node_link_up(n, bearer_id, &xmitq); if (unlikely(rc & TIPC_LINK_DOWN_EVT)) tipc_node_link_down(n, bearer_id, false); if (unlikely(!skb_queue_empty(&n->bc_entry.namedq))) tipc_named_rcv(net, &n->bc_entry.namedq); if (unlikely(!skb_queue_empty(&n->bc_entry.inputq1))) tipc_node_mcast_rcv(n); if (!skb_queue_empty(&le->inputq)) tipc_sk_rcv(net, &le->inputq); if (!skb_queue_empty(&xmitq)) tipc_bearer_xmit(net, bearer_id, &xmitq, &le->maddr); tipc_node_put(n); discard: kfree_skb(skb); } void tipc_node_apply_property(struct net *net, struct tipc_bearer *b, int prop) { struct tipc_net *tn = tipc_net(net); int bearer_id = b->identity; struct sk_buff_head xmitq; struct tipc_link_entry *e; struct tipc_node *n; __skb_queue_head_init(&xmitq); rcu_read_lock(); list_for_each_entry_rcu(n, &tn->node_list, list) { tipc_node_write_lock(n); e = &n->links[bearer_id]; if (e->link) { if (prop == TIPC_NLA_PROP_TOL) tipc_link_set_tolerance(e->link, b->tolerance, &xmitq); else if (prop == TIPC_NLA_PROP_MTU) tipc_link_set_mtu(e->link, b->mtu); } tipc_node_write_unlock(n); tipc_bearer_xmit(net, bearer_id, &xmitq, &e->maddr); } rcu_read_unlock(); } int tipc_nl_peer_rm(struct sk_buff *skb, struct genl_info *info) { struct net *net = sock_net(skb->sk); struct tipc_net *tn = net_generic(net, tipc_net_id); struct nlattr *attrs[TIPC_NLA_NET_MAX + 1]; struct tipc_node *peer; u32 addr; int err; /* We identify the peer by its net */ if (!info->attrs[TIPC_NLA_NET]) return -EINVAL; err = nla_parse_nested(attrs, TIPC_NLA_NET_MAX, info->attrs[TIPC_NLA_NET], tipc_nl_net_policy, info->extack); if (err) return err; if (!attrs[TIPC_NLA_NET_ADDR]) return -EINVAL; addr = nla_get_u32(attrs[TIPC_NLA_NET_ADDR]); if (in_own_node(net, addr)) return -ENOTSUPP; spin_lock_bh(&tn->node_list_lock); peer = tipc_node_find(net, addr); if (!peer) { spin_unlock_bh(&tn->node_list_lock); return -ENXIO; } tipc_node_write_lock(peer); if (peer->state != SELF_DOWN_PEER_DOWN && peer->state != SELF_DOWN_PEER_LEAVING) { tipc_node_write_unlock(peer); err = -EBUSY; goto err_out; } tipc_node_clear_links(peer); tipc_node_write_unlock(peer); tipc_node_delete(peer); err = 0; err_out: tipc_node_put(peer); spin_unlock_bh(&tn->node_list_lock); return err; } int tipc_nl_node_dump(struct sk_buff *skb, struct netlink_callback *cb) { int err; struct net *net = sock_net(skb->sk); struct tipc_net *tn = net_generic(net, tipc_net_id); int done = cb->args[0]; int last_addr = cb->args[1]; struct tipc_node *node; struct tipc_nl_msg msg; if (done) return 0; msg.skb = skb; msg.portid = NETLINK_CB(cb->skb).portid; msg.seq = cb->nlh->nlmsg_seq; rcu_read_lock(); if (last_addr) { node = tipc_node_find(net, last_addr); if (!node) { rcu_read_unlock(); /* We never set seq or call nl_dump_check_consistent() * this means that setting prev_seq here will cause the * consistence check to fail in the netlink callback * handler. Resulting in the NLMSG_DONE message having * the NLM_F_DUMP_INTR flag set if the node state * changed while we released the lock. */ cb->prev_seq = 1; return -EPIPE; } tipc_node_put(node); } list_for_each_entry_rcu(node, &tn->node_list, list) { if (last_addr) { if (node->addr == last_addr) last_addr = 0; else continue; } tipc_node_read_lock(node); err = __tipc_nl_add_node(&msg, node); if (err) { last_addr = node->addr; tipc_node_read_unlock(node); goto out; } tipc_node_read_unlock(node); } done = 1; out: cb->args[0] = done; cb->args[1] = last_addr; rcu_read_unlock(); return skb->len; } /* tipc_node_find_by_name - locate owner node of link by link's name * @net: the applicable net namespace * @name: pointer to link name string * @bearer_id: pointer to index in 'node->links' array where the link was found. * * Returns pointer to node owning the link, or 0 if no matching link is found. */ static struct tipc_node *tipc_node_find_by_name(struct net *net, const char *link_name, unsigned int *bearer_id) { struct tipc_net *tn = net_generic(net, tipc_net_id); struct tipc_link *l; struct tipc_node *n; struct tipc_node *found_node = NULL; int i; *bearer_id = 0; rcu_read_lock(); list_for_each_entry_rcu(n, &tn->node_list, list) { tipc_node_read_lock(n); for (i = 0; i < MAX_BEARERS; i++) { l = n->links[i].link; if (l && !strcmp(tipc_link_name(l), link_name)) { *bearer_id = i; found_node = n; break; } } tipc_node_read_unlock(n); if (found_node) break; } rcu_read_unlock(); return found_node; } int tipc_nl_node_set_link(struct sk_buff *skb, struct genl_info *info) { int err; int res = 0; int bearer_id; char *name; struct tipc_link *link; struct tipc_node *node; struct sk_buff_head xmitq; struct nlattr *attrs[TIPC_NLA_LINK_MAX + 1]; struct net *net = sock_net(skb->sk); __skb_queue_head_init(&xmitq); if (!info->attrs[TIPC_NLA_LINK]) return -EINVAL; err = nla_parse_nested(attrs, TIPC_NLA_LINK_MAX, info->attrs[TIPC_NLA_LINK], tipc_nl_link_policy, info->extack); if (err) return err; if (!attrs[TIPC_NLA_LINK_NAME]) return -EINVAL; name = nla_data(attrs[TIPC_NLA_LINK_NAME]); if (strcmp(name, tipc_bclink_name) == 0) return tipc_nl_bc_link_set(net, attrs); node = tipc_node_find_by_name(net, name, &bearer_id); if (!node) return -EINVAL; tipc_node_read_lock(node); link = node->links[bearer_id].link; if (!link) { res = -EINVAL; goto out; } if (attrs[TIPC_NLA_LINK_PROP]) { struct nlattr *props[TIPC_NLA_PROP_MAX + 1]; err = tipc_nl_parse_link_prop(attrs[TIPC_NLA_LINK_PROP], props); if (err) { res = err; goto out; } if (props[TIPC_NLA_PROP_TOL]) { u32 tol; tol = nla_get_u32(props[TIPC_NLA_PROP_TOL]); tipc_link_set_tolerance(link, tol, &xmitq); } if (props[TIPC_NLA_PROP_PRIO]) { u32 prio; prio = nla_get_u32(props[TIPC_NLA_PROP_PRIO]); tipc_link_set_prio(link, prio, &xmitq); } if (props[TIPC_NLA_PROP_WIN]) { u32 win; win = nla_get_u32(props[TIPC_NLA_PROP_WIN]); tipc_link_set_queue_limits(link, win); } } out: tipc_node_read_unlock(node); tipc_bearer_xmit(net, bearer_id, &xmitq, &node->links[bearer_id].maddr); return res; } int tipc_nl_node_get_link(struct sk_buff *skb, struct genl_info *info) { struct net *net = genl_info_net(info); struct nlattr *attrs[TIPC_NLA_LINK_MAX + 1]; struct tipc_nl_msg msg; char *name; int err; msg.portid = info->snd_portid; msg.seq = info->snd_seq; if (!info->attrs[TIPC_NLA_LINK]) return -EINVAL; err = nla_parse_nested(attrs, TIPC_NLA_LINK_MAX, info->attrs[TIPC_NLA_LINK], tipc_nl_link_policy, info->extack); if (err) return err; if (!attrs[TIPC_NLA_LINK_NAME]) return -EINVAL; name = nla_data(attrs[TIPC_NLA_LINK_NAME]); msg.skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); if (!msg.skb) return -ENOMEM; if (strcmp(name, tipc_bclink_name) == 0) { err = tipc_nl_add_bc_link(net, &msg); if (err) goto err_free; } else { int bearer_id; struct tipc_node *node; struct tipc_link *link; node = tipc_node_find_by_name(net, name, &bearer_id); if (!node) { err = -EINVAL; goto err_free; } tipc_node_read_lock(node); link = node->links[bearer_id].link; if (!link) { tipc_node_read_unlock(node); err = -EINVAL; goto err_free; } err = __tipc_nl_add_link(net, &msg, link, 0); tipc_node_read_unlock(node); if (err) goto err_free; } return genlmsg_reply(msg.skb, info); err_free: nlmsg_free(msg.skb); return err; } int tipc_nl_node_reset_link_stats(struct sk_buff *skb, struct genl_info *info) { int err; char *link_name; unsigned int bearer_id; struct tipc_link *link; struct tipc_node *node; struct nlattr *attrs[TIPC_NLA_LINK_MAX + 1]; struct net *net = sock_net(skb->sk); struct tipc_link_entry *le; if (!info->attrs[TIPC_NLA_LINK]) return -EINVAL; err = nla_parse_nested(attrs, TIPC_NLA_LINK_MAX, info->attrs[TIPC_NLA_LINK], tipc_nl_link_policy, info->extack); if (err) return err; if (!attrs[TIPC_NLA_LINK_NAME]) return -EINVAL; link_name = nla_data(attrs[TIPC_NLA_LINK_NAME]); if (strcmp(link_name, tipc_bclink_name) == 0) { err = tipc_bclink_reset_stats(net); if (err) return err; return 0; } node = tipc_node_find_by_name(net, link_name, &bearer_id); if (!node) return -EINVAL; le = &node->links[bearer_id]; tipc_node_read_lock(node); spin_lock_bh(&le->lock); link = node->links[bearer_id].link; if (!link) { spin_unlock_bh(&le->lock); tipc_node_read_unlock(node); return -EINVAL; } tipc_link_reset_stats(link); spin_unlock_bh(&le->lock); tipc_node_read_unlock(node); return 0; } /* Caller should hold node lock */ static int __tipc_nl_add_node_links(struct net *net, struct tipc_nl_msg *msg, struct tipc_node *node, u32 *prev_link) { u32 i; int err; for (i = *prev_link; i < MAX_BEARERS; i++) { *prev_link = i; if (!node->links[i].link) continue; err = __tipc_nl_add_link(net, msg, node->links[i].link, NLM_F_MULTI); if (err) return err; } *prev_link = 0; return 0; } int tipc_nl_node_dump_link(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); struct tipc_net *tn = net_generic(net, tipc_net_id); struct tipc_node *node; struct tipc_nl_msg msg; u32 prev_node = cb->args[0]; u32 prev_link = cb->args[1]; int done = cb->args[2]; int err; if (done) return 0; msg.skb = skb; msg.portid = NETLINK_CB(cb->skb).portid; msg.seq = cb->nlh->nlmsg_seq; rcu_read_lock(); if (prev_node) { node = tipc_node_find(net, prev_node); if (!node) { /* We never set seq or call nl_dump_check_consistent() * this means that setting prev_seq here will cause the * consistence check to fail in the netlink callback * handler. Resulting in the last NLMSG_DONE message * having the NLM_F_DUMP_INTR flag set. */ cb->prev_seq = 1; goto out; } tipc_node_put(node); list_for_each_entry_continue_rcu(node, &tn->node_list, list) { tipc_node_read_lock(node); err = __tipc_nl_add_node_links(net, &msg, node, &prev_link); tipc_node_read_unlock(node); if (err) goto out; prev_node = node->addr; } } else { err = tipc_nl_add_bc_link(net, &msg); if (err) goto out; list_for_each_entry_rcu(node, &tn->node_list, list) { tipc_node_read_lock(node); err = __tipc_nl_add_node_links(net, &msg, node, &prev_link); tipc_node_read_unlock(node); if (err) goto out; prev_node = node->addr; } } done = 1; out: rcu_read_unlock(); cb->args[0] = prev_node; cb->args[1] = prev_link; cb->args[2] = done; return skb->len; } int tipc_nl_node_set_monitor(struct sk_buff *skb, struct genl_info *info) { struct nlattr *attrs[TIPC_NLA_MON_MAX + 1]; struct net *net = sock_net(skb->sk); int err; if (!info->attrs[TIPC_NLA_MON]) return -EINVAL; err = nla_parse_nested(attrs, TIPC_NLA_MON_MAX, info->attrs[TIPC_NLA_MON], tipc_nl_monitor_policy, info->extack); if (err) return err; if (attrs[TIPC_NLA_MON_ACTIVATION_THRESHOLD]) { u32 val; val = nla_get_u32(attrs[TIPC_NLA_MON_ACTIVATION_THRESHOLD]); err = tipc_nl_monitor_set_threshold(net, val); if (err) return err; } return 0; } static int __tipc_nl_add_monitor_prop(struct net *net, struct tipc_nl_msg *msg) { struct nlattr *attrs; void *hdr; u32 val; hdr = genlmsg_put(msg->skb, msg->portid, msg->seq, &tipc_genl_family, 0, TIPC_NL_MON_GET); if (!hdr) return -EMSGSIZE; attrs = nla_nest_start(msg->skb, TIPC_NLA_MON); if (!attrs) goto msg_full; val = tipc_nl_monitor_get_threshold(net); if (nla_put_u32(msg->skb, TIPC_NLA_MON_ACTIVATION_THRESHOLD, val)) goto attr_msg_full; nla_nest_end(msg->skb, attrs); genlmsg_end(msg->skb, hdr); return 0; attr_msg_full: nla_nest_cancel(msg->skb, attrs); msg_full: genlmsg_cancel(msg->skb, hdr); return -EMSGSIZE; } int tipc_nl_node_get_monitor(struct sk_buff *skb, struct genl_info *info) { struct net *net = sock_net(skb->sk); struct tipc_nl_msg msg; int err; msg.skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); if (!msg.skb) return -ENOMEM; msg.portid = info->snd_portid; msg.seq = info->snd_seq; err = __tipc_nl_add_monitor_prop(net, &msg); if (err) { nlmsg_free(msg.skb); return err; } return genlmsg_reply(msg.skb, info); } int tipc_nl_node_dump_monitor(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); u32 prev_bearer = cb->args[0]; struct tipc_nl_msg msg; int bearer_id; int err; if (prev_bearer == MAX_BEARERS) return 0; msg.skb = skb; msg.portid = NETLINK_CB(cb->skb).portid; msg.seq = cb->nlh->nlmsg_seq; rtnl_lock(); for (bearer_id = prev_bearer; bearer_id < MAX_BEARERS; bearer_id++) { err = __tipc_nl_add_monitor(net, &msg, bearer_id); if (err) break; } rtnl_unlock(); cb->args[0] = bearer_id; return skb->len; } int tipc_nl_node_dump_monitor_peer(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); u32 prev_node = cb->args[1]; u32 bearer_id = cb->args[2]; int done = cb->args[0]; struct tipc_nl_msg msg; int err; if (!prev_node) { struct nlattr **attrs; struct nlattr *mon[TIPC_NLA_MON_MAX + 1]; err = tipc_nlmsg_parse(cb->nlh, &attrs); if (err) return err; if (!attrs[TIPC_NLA_MON]) return -EINVAL; err = nla_parse_nested(mon, TIPC_NLA_MON_MAX, attrs[TIPC_NLA_MON], tipc_nl_monitor_policy, NULL); if (err) return err; if (!mon[TIPC_NLA_MON_REF]) return -EINVAL; bearer_id = nla_get_u32(mon[TIPC_NLA_MON_REF]); if (bearer_id >= MAX_BEARERS) return -EINVAL; } if (done) return 0; msg.skb = skb; msg.portid = NETLINK_CB(cb->skb).portid; msg.seq = cb->nlh->nlmsg_seq; rtnl_lock(); err = tipc_nl_add_monitor_peer(net, &msg, bearer_id, &prev_node); if (!err) done = 1; rtnl_unlock(); cb->args[0] = done; cb->args[1] = prev_node; cb->args[2] = bearer_id; return skb->len; }
212 2313 220 2665 536 536 975 1851 2273 922 195 1940 241 44 1849 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 // SPDX-License-Identifier: GPL-2.0 /* * Copyright (c) 2003-2006, Cluster File Systems, Inc, info@clusterfs.com * Written by Alex Tomas <alex@clusterfs.com> */ #ifndef _EXT4_EXTENTS #define _EXT4_EXTENTS #include "ext4.h" /* * With AGGRESSIVE_TEST defined, the capacity of index/leaf blocks * becomes very small, so index split, in-depth growing and * other hard changes happen much more often. * This is for debug purposes only. */ #define AGGRESSIVE_TEST_ /* * With EXTENTS_STATS defined, the number of blocks and extents * are collected in the truncate path. They'll be shown at * umount time. */ #define EXTENTS_STATS__ /* * If CHECK_BINSEARCH is defined, then the results of the binary search * will also be checked by linear search. */ #define CHECK_BINSEARCH__ /* * If EXT_STATS is defined then stats numbers are collected. * These number will be displayed at umount time. */ #define EXT_STATS_ /* * ext4_inode has i_block array (60 bytes total). * The first 12 bytes store ext4_extent_header; * the remainder stores an array of ext4_extent. * For non-inode extent blocks, ext4_extent_tail * follows the array. */ /* * This is the extent tail on-disk structure. * All other extent structures are 12 bytes long. It turns out that * block_size % 12 >= 4 for at least all powers of 2 greater than 512, which * covers all valid ext4 block sizes. Therefore, this tail structure can be * crammed into the end of the block without having to rebalance the tree. */ struct ext4_extent_tail { __le32 et_checksum; /* crc32c(uuid+inum+extent_block) */ }; /* * This is the extent on-disk structure. * It's used at the bottom of the tree. */ struct ext4_extent { __le32 ee_block; /* first logical block extent covers */ __le16 ee_len; /* number of blocks covered by extent */ __le16 ee_start_hi; /* high 16 bits of physical block */ __le32 ee_start_lo; /* low 32 bits of physical block */ }; /* * This is index on-disk structure. * It's used at all the levels except the bottom. */ struct ext4_extent_idx { __le32 ei_block; /* index covers logical blocks from 'block' */ __le32 ei_leaf_lo; /* pointer to the physical block of the next * * level. leaf or next index could be there */ __le16 ei_leaf_hi; /* high 16 bits of physical block */ __u16 ei_unused; }; /* * Each block (leaves and indexes), even inode-stored has header. */ struct ext4_extent_header { __le16 eh_magic; /* probably will support different formats */ __le16 eh_entries; /* number of valid entries */ __le16 eh_max; /* capacity of store in entries */ __le16 eh_depth; /* has tree real underlying blocks? */ __le32 eh_generation; /* generation of the tree */ }; #define EXT4_EXT_MAGIC cpu_to_le16(0xf30a) #define EXT4_MAX_EXTENT_DEPTH 5 #define EXT4_EXTENT_TAIL_OFFSET(hdr) \ (sizeof(struct ext4_extent_header) + \ (sizeof(struct ext4_extent) * le16_to_cpu((hdr)->eh_max))) static inline struct ext4_extent_tail * find_ext4_extent_tail(struct ext4_extent_header *eh) { return (struct ext4_extent_tail *)(((void *)eh) + EXT4_EXTENT_TAIL_OFFSET(eh)); } /* * Array of ext4_ext_path contains path to some extent. * Creation/lookup routines use it for traversal/splitting/etc. * Truncate uses it to simulate recursive walking. */ struct ext4_ext_path { ext4_fsblk_t p_block; __u16 p_depth; __u16 p_maxdepth; struct ext4_extent *p_ext; struct ext4_extent_idx *p_idx; struct ext4_extent_header *p_hdr; struct buffer_head *p_bh; }; /* * structure for external API */ /* * EXT_INIT_MAX_LEN is the maximum number of blocks we can have in an * initialized extent. This is 2^15 and not (2^16 - 1), since we use the * MSB of ee_len field in the extent datastructure to signify if this * particular extent is an initialized extent or an unwritten (i.e. * preallocated). * EXT_UNWRITTEN_MAX_LEN is the maximum number of blocks we can have in an * unwritten extent. * If ee_len is <= 0x8000, it is an initialized extent. Otherwise, it is an * unwritten one. In other words, if MSB of ee_len is set, it is an * unwritten extent with only one special scenario when ee_len = 0x8000. * In this case we can not have an unwritten extent of zero length and * thus we make it as a special case of initialized extent with 0x8000 length. * This way we get better extent-to-group alignment for initialized extents. * Hence, the maximum number of blocks we can have in an *initialized* * extent is 2^15 (32768) and in an *unwritten* extent is 2^15-1 (32767). */ #define EXT_INIT_MAX_LEN (1UL << 15) #define EXT_UNWRITTEN_MAX_LEN (EXT_INIT_MAX_LEN - 1) #define EXT_FIRST_EXTENT(__hdr__) \ ((struct ext4_extent *) (((char *) (__hdr__)) + \ sizeof(struct ext4_extent_header))) #define EXT_FIRST_INDEX(__hdr__) \ ((struct ext4_extent_idx *) (((char *) (__hdr__)) + \ sizeof(struct ext4_extent_header))) #define EXT_HAS_FREE_INDEX(__path__) \ (le16_to_cpu((__path__)->p_hdr->eh_entries) \ < le16_to_cpu((__path__)->p_hdr->eh_max)) #define EXT_LAST_EXTENT(__hdr__) \ (EXT_FIRST_EXTENT((__hdr__)) + le16_to_cpu((__hdr__)->eh_entries) - 1) #define EXT_LAST_INDEX(__hdr__) \ (EXT_FIRST_INDEX((__hdr__)) + le16_to_cpu((__hdr__)->eh_entries) - 1) #define EXT_MAX_EXTENT(__hdr__) \ ((le16_to_cpu((__hdr__)->eh_max)) ? \ ((EXT_FIRST_EXTENT((__hdr__)) + le16_to_cpu((__hdr__)->eh_max) - 1)) \ : 0) #define EXT_MAX_INDEX(__hdr__) \ ((le16_to_cpu((__hdr__)->eh_max)) ? \ ((EXT_FIRST_INDEX((__hdr__)) + le16_to_cpu((__hdr__)->eh_max) - 1)) : 0) static inline struct ext4_extent_header *ext_inode_hdr(struct inode *inode) { return (struct ext4_extent_header *) EXT4_I(inode)->i_data; } static inline struct ext4_extent_header *ext_block_hdr(struct buffer_head *bh) { return (struct ext4_extent_header *) bh->b_data; } static inline unsigned short ext_depth(struct inode *inode) { return le16_to_cpu(ext_inode_hdr(inode)->eh_depth); } static inline void ext4_ext_mark_unwritten(struct ext4_extent *ext) { /* We can not have an unwritten extent of zero length! */ BUG_ON((le16_to_cpu(ext->ee_len) & ~EXT_INIT_MAX_LEN) == 0); ext->ee_len |= cpu_to_le16(EXT_INIT_MAX_LEN); } static inline int ext4_ext_is_unwritten(struct ext4_extent *ext) { /* Extent with ee_len of 0x8000 is treated as an initialized extent */ return (le16_to_cpu(ext->ee_len) > EXT_INIT_MAX_LEN); } static inline int ext4_ext_get_actual_len(struct ext4_extent *ext) { return (le16_to_cpu(ext->ee_len) <= EXT_INIT_MAX_LEN ? le16_to_cpu(ext->ee_len) : (le16_to_cpu(ext->ee_len) - EXT_INIT_MAX_LEN)); } static inline void ext4_ext_mark_initialized(struct ext4_extent *ext) { ext->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ext)); } /* * ext4_ext_pblock: * combine low and high parts of physical block number into ext4_fsblk_t */ static inline ext4_fsblk_t ext4_ext_pblock(struct ext4_extent *ex) { ext4_fsblk_t block; block = le32_to_cpu(ex->ee_start_lo); block |= ((ext4_fsblk_t) le16_to_cpu(ex->ee_start_hi) << 31) << 1; return block; } /* * ext4_idx_pblock: * combine low and high parts of a leaf physical block number into ext4_fsblk_t */ static inline ext4_fsblk_t ext4_idx_pblock(struct ext4_extent_idx *ix) { ext4_fsblk_t block; block = le32_to_cpu(ix->ei_leaf_lo); block |= ((ext4_fsblk_t) le16_to_cpu(ix->ei_leaf_hi) << 31) << 1; return block; } /* * ext4_ext_store_pblock: * stores a large physical block number into an extent struct, * breaking it into parts */ static inline void ext4_ext_store_pblock(struct ext4_extent *ex, ext4_fsblk_t pb) { ex->ee_start_lo = cpu_to_le32((unsigned long) (pb & 0xffffffff)); ex->ee_start_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1) & 0xffff); } /* * ext4_idx_store_pblock: * stores a large physical block number into an index struct, * breaking it into parts */ static inline void ext4_idx_store_pblock(struct ext4_extent_idx *ix, ext4_fsblk_t pb) { ix->ei_leaf_lo = cpu_to_le32((unsigned long) (pb & 0xffffffff)); ix->ei_leaf_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1) & 0xffff); } #define ext4_ext_dirty(handle, inode, path) \ __ext4_ext_dirty(__func__, __LINE__, (handle), (inode), (path)) int __ext4_ext_dirty(const char *where, unsigned int line, handle_t *handle, struct inode *inode, struct ext4_ext_path *path); #endif /* _EXT4_EXTENTS */
1040 1039 987 1040 1039 2 1039 1019 68 67 66 68 65 64 2 59 59 65 71 66 71 71 666 663 663 661 660 658 12 656 656 655 663 666 416 414 412 412 410 40 409 43 408 5 407 405 404 403 402 403 402 401 401 399 398 327 279 282 1 34 32 32 30 31 30 32 14 5 2 14 13 2 14 32 21 34 96 1 110 110 15 12 10 1 10 10 9 1 11 11 32 30 31 30 31 11 13 13 21 23 6 18 146 146 122 122 146 139 139 146 68 146 140 120 119 2 119 108 105 102 116 30 41 50 44 43 46 46 5146 52 5105 5105 1615 4193 3490 5105 5102 5102 5087 5086 4173 5015 5013 4251 3702 5013 2504 2151 5014 487 253 253 253 260 5131 33 33 33 31 5120 580 580 5 579 74 74 1 74 5209 5209 536 5207 44 37 5199 92 5134 1106 5202 434 5199 4341 5199 963 1165 227 5198 5200 5015 5199 61 17 17 17 26 5095 5095 1156 5027 5066 661 4287 475 3977 3977 23 3980 3976 3974 3970 3141 3138 3139 1285 1285 819 5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 /* * linux/fs/open.c * * Copyright (C) 1991, 1992 Linus Torvalds */ #include <linux/string.h> #include <linux/mm.h> #include <linux/file.h> #include <linux/fdtable.h> #include <linux/fsnotify.h> #include <linux/module.h> #include <linux/tty.h> #include <linux/namei.h> #include <linux/backing-dev.h> #include <linux/capability.h> #include <linux/securebits.h> #include <linux/security.h> #include <linux/mount.h> #include <linux/fcntl.h> #include <linux/slab.h> #include <linux/uaccess.h> #include <linux/fs.h> #include <linux/personality.h> #include <linux/pagemap.h> #include <linux/syscalls.h> #include <linux/rcupdate.h> #include <linux/audit.h> #include <linux/falloc.h> #include <linux/fs_struct.h> #include <linux/ima.h> #include <linux/dnotify.h> #include <linux/compat.h> #include "internal.h" int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs, struct file *filp) { int ret; struct iattr newattrs; /* Not pretty: "inode->i_size" shouldn't really be signed. But it is. */ if (length < 0) return -EINVAL; newattrs.ia_size = length; newattrs.ia_valid = ATTR_SIZE | time_attrs; if (filp) { newattrs.ia_file = filp; newattrs.ia_valid |= ATTR_FILE; } /* Remove suid, sgid, and file capabilities on truncate too */ ret = dentry_needs_remove_privs(dentry); if (ret < 0) return ret; if (ret) newattrs.ia_valid |= ret | ATTR_FORCE; inode_lock(dentry->d_inode); /* Note any delegations or leases have already been broken: */ ret = notify_change(dentry, &newattrs, NULL); inode_unlock(dentry->d_inode); return ret; } long vfs_truncate(const struct path *path, loff_t length) { struct inode *inode; long error; inode = path->dentry->d_inode; /* For directories it's -EISDIR, for other non-regulars - -EINVAL */ if (S_ISDIR(inode->i_mode)) return -EISDIR; if (!S_ISREG(inode->i_mode)) return -EINVAL; error = mnt_want_write(path->mnt); if (error) goto out; error = inode_permission(inode, MAY_WRITE); if (error) goto mnt_drop_write_and_out; error = -EPERM; if (IS_APPEND(inode)) goto mnt_drop_write_and_out; error = get_write_access(inode); if (error) goto mnt_drop_write_and_out; /* * Make sure that there are no leases. get_write_access() protects * against the truncate racing with a lease-granting setlease(). */ error = break_lease(inode, O_WRONLY); if (error) goto put_write_and_out; error = locks_verify_truncate(inode, NULL, length); if (!error) error = security_path_truncate(path); if (!error) error = do_truncate(path->dentry, length, 0, NULL); put_write_and_out: put_write_access(inode); mnt_drop_write_and_out: mnt_drop_write(path->mnt); out: return error; } EXPORT_SYMBOL_GPL(vfs_truncate); long do_sys_truncate(const char __user *pathname, loff_t length) { unsigned int lookup_flags = LOOKUP_FOLLOW; struct path path; int error; if (length < 0) /* sorry, but loff_t says... */ return -EINVAL; retry: error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path); if (!error) { error = vfs_truncate(&path, length); path_put(&path); } if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } return error; } SYSCALL_DEFINE2(truncate, const char __user *, path, long, length) { return do_sys_truncate(path, length); } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(truncate, const char __user *, path, compat_off_t, length) { return do_sys_truncate(path, length); } #endif long do_sys_ftruncate(unsigned int fd, loff_t length, int small) { struct inode *inode; struct dentry *dentry; struct fd f; int error; error = -EINVAL; if (length < 0) goto out; error = -EBADF; f = fdget(fd); if (!f.file) goto out; /* explicitly opened as large or we are on 64-bit box */ if (f.file->f_flags & O_LARGEFILE) small = 0; dentry = f.file->f_path.dentry; inode = dentry->d_inode; error = -EINVAL; if (!S_ISREG(inode->i_mode) || !(f.file->f_mode & FMODE_WRITE)) goto out_putf; error = -EINVAL; /* Cannot ftruncate over 2^31 bytes without large file support */ if (small && length > MAX_NON_LFS) goto out_putf; error = -EPERM; /* Check IS_APPEND on real upper inode */ if (IS_APPEND(file_inode(f.file))) goto out_putf; sb_start_write(inode->i_sb); error = locks_verify_truncate(inode, f.file, length); if (!error) error = security_path_truncate(&f.file->f_path); if (!error) error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, f.file); sb_end_write(inode->i_sb); out_putf: fdput(f); out: return error; } SYSCALL_DEFINE2(ftruncate, unsigned int, fd, unsigned long, length) { return do_sys_ftruncate(fd, length, 1); } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(ftruncate, unsigned int, fd, compat_ulong_t, length) { return do_sys_ftruncate(fd, length, 1); } #endif /* LFS versions of truncate are only needed on 32 bit machines */ #if BITS_PER_LONG == 32 SYSCALL_DEFINE2(truncate64, const char __user *, path, loff_t, length) { return do_sys_truncate(path, length); } SYSCALL_DEFINE2(ftruncate64, unsigned int, fd, loff_t, length) { return do_sys_ftruncate(fd, length, 0); } #endif /* BITS_PER_LONG == 32 */ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { struct inode *inode = file_inode(file); long ret; if (offset < 0 || len <= 0) return -EINVAL; /* Return error if mode is not supported */ if (mode & ~FALLOC_FL_SUPPORTED_MASK) return -EOPNOTSUPP; /* Punch hole and zero range are mutually exclusive */ if ((mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)) == (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)) return -EOPNOTSUPP; /* Punch hole must have keep size set */ if ((mode & FALLOC_FL_PUNCH_HOLE) && !(mode & FALLOC_FL_KEEP_SIZE)) return -EOPNOTSUPP; /* Collapse range should only be used exclusively. */ if ((mode & FALLOC_FL_COLLAPSE_RANGE) && (mode & ~FALLOC_FL_COLLAPSE_RANGE)) return -EINVAL; /* Insert range should only be used exclusively. */ if ((mode & FALLOC_FL_INSERT_RANGE) && (mode & ~FALLOC_FL_INSERT_RANGE)) return -EINVAL; /* Unshare range should only be used with allocate mode. */ if ((mode & FALLOC_FL_UNSHARE_RANGE) && (mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE))) return -EINVAL; if (!(file->f_mode & FMODE_WRITE)) return -EBADF; /* * We can only allow pure fallocate on append only files */ if ((mode & ~FALLOC_FL_KEEP_SIZE) && IS_APPEND(inode)) return -EPERM; if (IS_IMMUTABLE(inode)) return -EPERM; /* * We cannot allow any fallocate operation on an active swapfile */ if (IS_SWAPFILE(inode)) return -ETXTBSY; /* * Revalidate the write permissions, in case security policy has * changed since the files were opened. */ ret = security_file_permission(file, MAY_WRITE); if (ret) return ret; if (S_ISFIFO(inode->i_mode)) return -ESPIPE; if (S_ISDIR(inode->i_mode)) return -EISDIR; if (!S_ISREG(inode->i_mode) && !S_ISBLK(inode->i_mode)) return -ENODEV; /* Check for wrap through zero too */ if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0)) return -EFBIG; if (!file->f_op->fallocate) return -EOPNOTSUPP; file_start_write(file); ret = file->f_op->fallocate(file, mode, offset, len); /* * Create inotify and fanotify events. * * To keep the logic simple always create events if fallocate succeeds. * This implies that events are even created if the file size remains * unchanged, e.g. when using flag FALLOC_FL_KEEP_SIZE. */ if (ret == 0) fsnotify_modify(file); file_end_write(file); return ret; } EXPORT_SYMBOL_GPL(vfs_fallocate); int ksys_fallocate(int fd, int mode, loff_t offset, loff_t len) { struct fd f = fdget(fd); int error = -EBADF; if (f.file) { error = vfs_fallocate(f.file, mode, offset, len); fdput(f); } return error; } SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len) { return ksys_fallocate(fd, mode, offset, len); } /* * access() needs to use the real uid/gid, not the effective uid/gid. * We do this by temporarily clearing all FS-related capabilities and * switching the fsuid/fsgid around to the real ones. */ long do_faccessat(int dfd, const char __user *filename, int mode) { const struct cred *old_cred; struct cred *override_cred; struct path path; struct inode *inode; int res; unsigned int lookup_flags = LOOKUP_FOLLOW; if (mode & ~S_IRWXO) /* where's F_OK, X_OK, W_OK, R_OK? */ return -EINVAL; override_cred = prepare_creds(); if (!override_cred) return -ENOMEM; override_cred->fsuid = override_cred->uid; override_cred->fsgid = override_cred->gid; if (!issecure(SECURE_NO_SETUID_FIXUP)) { /* Clear the capabilities if we switch to a non-root user */ kuid_t root_uid = make_kuid(override_cred->user_ns, 0); if (!uid_eq(override_cred->uid, root_uid)) cap_clear(override_cred->cap_effective); else override_cred->cap_effective = override_cred->cap_permitted; } /* * The new set of credentials can *only* be used in * task-synchronous circumstances, and does not need * RCU freeing, unless somebody then takes a separate * reference to it. * * NOTE! This is _only_ true because this credential * is used purely for override_creds() that installs * it as the subjective cred. Other threads will be * accessing ->real_cred, not the subjective cred. * * If somebody _does_ make a copy of this (using the * 'get_current_cred()' function), that will clear the * non_rcu field, because now that other user may be * expecting RCU freeing. But normal thread-synchronous * cred accesses will keep things non-RCY. */ override_cred->non_rcu = 1; old_cred = override_creds(override_cred); retry: res = user_path_at(dfd, filename, lookup_flags, &path); if (res) goto out; inode = d_backing_inode(path.dentry); if ((mode & MAY_EXEC) && S_ISREG(inode->i_mode)) { /* * MAY_EXEC on regular files is denied if the fs is mounted * with the "noexec" flag. */ res = -EACCES; if (path_noexec(&path)) goto out_path_release; } res = inode_permission(inode, mode | MAY_ACCESS); /* SuS v2 requires we report a read only fs too */ if (res || !(mode & S_IWOTH) || special_file(inode->i_mode)) goto out_path_release; /* * This is a rare case where using __mnt_is_readonly() * is OK without a mnt_want/drop_write() pair. Since * no actual write to the fs is performed here, we do * not need to telegraph to that to anyone. * * By doing this, we accept that this access is * inherently racy and know that the fs may change * state before we even see this result. */ if (__mnt_is_readonly(path.mnt)) res = -EROFS; out_path_release: path_put(&path); if (retry_estale(res, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } out: revert_creds(old_cred); put_cred(override_cred); return res; } SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode) { return do_faccessat(dfd, filename, mode); } SYSCALL_DEFINE2(access, const char __user *, filename, int, mode) { return do_faccessat(AT_FDCWD, filename, mode); } int ksys_chdir(const char __user *filename) { struct path path; int error; unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_DIRECTORY; retry: error = user_path_at(AT_FDCWD, filename, lookup_flags, &path); if (error) goto out; error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out; set_fs_pwd(current->fs, &path); dput_and_out: path_put(&path); if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } out: return error; } SYSCALL_DEFINE1(chdir, const char __user *, filename) { return ksys_chdir(filename); } SYSCALL_DEFINE1(fchdir, unsigned int, fd) { struct fd f = fdget_raw(fd); int error; error = -EBADF; if (!f.file) goto out; error = -ENOTDIR; if (!d_can_lookup(f.file->f_path.dentry)) goto out_putf; error = inode_permission(file_inode(f.file), MAY_EXEC | MAY_CHDIR); if (!error) set_fs_pwd(current->fs, &f.file->f_path); out_putf: fdput(f); out: return error; } int ksys_chroot(const char __user *filename) { struct path path; int error; unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_DIRECTORY; retry: error = user_path_at(AT_FDCWD, filename, lookup_flags, &path); if (error) goto out; error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out; error = -EPERM; if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT)) goto dput_and_out; error = security_path_chroot(&path); if (error) goto dput_and_out; set_fs_root(current->fs, &path); error = 0; dput_and_out: path_put(&path); if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } out: return error; } SYSCALL_DEFINE1(chroot, const char __user *, filename) { return ksys_chroot(filename); } static int chmod_common(const struct path *path, umode_t mode) { struct inode *inode = path->dentry->d_inode; struct inode *delegated_inode = NULL; struct iattr newattrs; int error; error = mnt_want_write(path->mnt); if (error) return error; retry_deleg: inode_lock(inode); error = security_path_chmod(path, mode); if (error) goto out_unlock; newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO); newattrs.ia_valid = ATTR_MODE | ATTR_CTIME; error = notify_change(path->dentry, &newattrs, &delegated_inode); out_unlock: inode_unlock(inode); if (delegated_inode) { error = break_deleg_wait(&delegated_inode); if (!error) goto retry_deleg; } mnt_drop_write(path->mnt); return error; } int ksys_fchmod(unsigned int fd, umode_t mode) { struct fd f = fdget(fd); int err = -EBADF; if (f.file) { audit_file(f.file); err = chmod_common(&f.file->f_path, mode); fdput(f); } return err; } SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode) { return ksys_fchmod(fd, mode); } int do_fchmodat(int dfd, const char __user *filename, umode_t mode) { struct path path; int error; unsigned int lookup_flags = LOOKUP_FOLLOW; retry: error = user_path_at(dfd, filename, lookup_flags, &path); if (!error) { error = chmod_common(&path, mode); path_put(&path); if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } } return error; } SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t, mode) { return do_fchmodat(dfd, filename, mode); } SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode) { return do_fchmodat(AT_FDCWD, filename, mode); } static int chown_common(const struct path *path, uid_t user, gid_t group) { struct inode *inode = path->dentry->d_inode; struct inode *delegated_inode = NULL; int error; struct iattr newattrs; kuid_t uid; kgid_t gid; uid = make_kuid(current_user_ns(), user); gid = make_kgid(current_user_ns(), group); retry_deleg: newattrs.ia_valid = ATTR_CTIME; if (user != (uid_t) -1) { if (!uid_valid(uid)) return -EINVAL; newattrs.ia_valid |= ATTR_UID; newattrs.ia_uid = uid; } if (group != (gid_t) -1) { if (!gid_valid(gid)) return -EINVAL; newattrs.ia_valid |= ATTR_GID; newattrs.ia_gid = gid; } if (!S_ISDIR(inode->i_mode)) newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV; inode_lock(inode); error = security_path_chown(path, uid, gid); if (!error) error = notify_change(path->dentry, &newattrs, &delegated_inode); inode_unlock(inode); if (delegated_inode) { error = break_deleg_wait(&delegated_inode); if (!error) goto retry_deleg; } return error; } int do_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group, int flag) { struct path path; int error = -EINVAL; int lookup_flags; if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) goto out; lookup_flags = (flag & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW; if (flag & AT_EMPTY_PATH) lookup_flags |= LOOKUP_EMPTY; retry: error = user_path_at(dfd, filename, lookup_flags, &path); if (error) goto out; error = mnt_want_write(path.mnt); if (error) goto out_release; error = chown_common(&path, user, group); mnt_drop_write(path.mnt); out_release: path_put(&path); if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; goto retry; } out: return error; } SYSCALL_DEFINE5(fchownat, int, dfd, const char __user *, filename, uid_t, user, gid_t, group, int, flag) { return do_fchownat(dfd, filename, user, group, flag); } SYSCALL_DEFINE3(chown, const char __user *, filename, uid_t, user, gid_t, group) { return do_fchownat(AT_FDCWD, filename, user, group, 0); } SYSCALL_DEFINE3(lchown, const char __user *, filename, uid_t, user, gid_t, group) { return do_fchownat(AT_FDCWD, filename, user, group, AT_SYMLINK_NOFOLLOW); } int ksys_fchown(unsigned int fd, uid_t user, gid_t group) { struct fd f = fdget(fd); int error = -EBADF; if (!f.file) goto out; error = mnt_want_write_file(f.file); if (error) goto out_fput; audit_file(f.file); error = chown_common(&f.file->f_path, user, group); mnt_drop_write_file(f.file); out_fput: fdput(f); out: return error; } SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group) { return ksys_fchown(fd, user, group); } static int do_dentry_open(struct file *f, struct inode *inode, int (*open)(struct inode *, struct file *)) { static const struct file_operations empty_fops = {}; int error; path_get(&f->f_path); f->f_inode = inode; f->f_mapping = inode->i_mapping; /* Ensure that we skip any errors that predate opening of the file */ f->f_wb_err = filemap_sample_wb_err(f->f_mapping); if (unlikely(f->f_flags & O_PATH)) { f->f_mode = FMODE_PATH | FMODE_OPENED; f->f_op = &empty_fops; return 0; } /* Any file opened for execve()/uselib() has to be a regular file. */ if (unlikely(f->f_flags & FMODE_EXEC && !S_ISREG(inode->i_mode))) { error = -EACCES; goto cleanup_file; } if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { error = get_write_access(inode); if (unlikely(error)) goto cleanup_file; error = __mnt_want_write(f->f_path.mnt); if (unlikely(error)) { put_write_access(inode); goto cleanup_file; } f->f_mode |= FMODE_WRITER; } /* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */ if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)) f->f_mode |= FMODE_ATOMIC_POS; f->f_op = fops_get(inode->i_fop); if (unlikely(WARN_ON(!f->f_op))) { error = -ENODEV; goto cleanup_all; } error = security_file_open(f); if (error) goto cleanup_all; error = break_lease(locks_inode(f), f->f_flags); if (error) goto cleanup_all; /* normally all 3 are set; ->open() can clear them if needed */ f->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE; if (!open) open = f->f_op->open; if (open) { error = open(inode, f); if (error) goto cleanup_all; } f->f_mode |= FMODE_OPENED; if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) i_readcount_inc(inode); if ((f->f_mode & FMODE_READ) && likely(f->f_op->read || f->f_op->read_iter)) f->f_mode |= FMODE_CAN_READ; if ((f->f_mode & FMODE_WRITE) && likely(f->f_op->write || f->f_op->write_iter)) f->f_mode |= FMODE_CAN_WRITE; f->f_write_hint = WRITE_LIFE_NOT_SET; f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC); file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping); /* NB: we're sure to have correct a_ops only after f_op->open */ if (f->f_flags & O_DIRECT) { if (!f->f_mapping->a_ops || !f->f_mapping->a_ops->direct_IO) return -EINVAL; } return 0; cleanup_all: if (WARN_ON_ONCE(error > 0)) error = -EINVAL; fops_put(f->f_op); if (f->f_mode & FMODE_WRITER) { put_write_access(inode); __mnt_drop_write(f->f_path.mnt); } cleanup_file: path_put(&f->f_path); f->f_path.mnt = NULL; f->f_path.dentry = NULL; f->f_inode = NULL; return error; } /** * finish_open - finish opening a file * @file: file pointer * @dentry: pointer to dentry * @open: open callback * @opened: state of open * * This can be used to finish opening a file passed to i_op->atomic_open(). * * If the open callback is set to NULL, then the standard f_op->open() * filesystem callback is substituted. * * NB: the dentry reference is _not_ consumed. If, for example, the dentry is * the return value of d_splice_alias(), then the caller needs to perform dput() * on it after finish_open(). * * Returns zero on success or -errno if the open failed. */ int finish_open(struct file *file, struct dentry *dentry, int (*open)(struct inode *, struct file *)) { BUG_ON(file->f_mode & FMODE_OPENED); /* once it's opened, it's opened */ file->f_path.dentry = dentry; return do_dentry_open(file, d_backing_inode(dentry), open); } EXPORT_SYMBOL(finish_open); /** * finish_no_open - finish ->atomic_open() without opening the file * * @file: file pointer * @dentry: dentry or NULL (as returned from ->lookup()) * * This can be used to set the result of a successful lookup in ->atomic_open(). * * NB: unlike finish_open() this function does consume the dentry reference and * the caller need not dput() it. * * Returns "0" which must be the return value of ->atomic_open() after having * called this function. */ int finish_no_open(struct file *file, struct dentry *dentry) { file->f_path.dentry = dentry; return 0; } EXPORT_SYMBOL(finish_no_open); char *file_path(struct file *filp, char *buf, int buflen) { return d_path(&filp->f_path, buf, buflen); } EXPORT_SYMBOL(file_path); /** * vfs_open - open the file at the given path * @path: path to open * @file: newly allocated file with f_flag initialized * @cred: credentials to use */ int vfs_open(const struct path *path, struct file *file) { file->f_path = *path; return do_dentry_open(file, d_backing_inode(path->dentry), NULL); } struct file *dentry_open(const struct path *path, int flags, const struct cred *cred) { int error; struct file *f; validate_creds(cred); /* We must always pass in a valid mount pointer. */ BUG_ON(!path->mnt); f = alloc_empty_file(flags, cred); if (!IS_ERR(f)) { error = vfs_open(path, f); if (error) { fput(f); f = ERR_PTR(error); } } return f; } EXPORT_SYMBOL(dentry_open); struct file *open_with_fake_path(const struct path *path, int flags, struct inode *inode, const struct cred *cred) { struct file *f = alloc_empty_file_noaccount(flags, cred); if (!IS_ERR(f)) { int error; f->f_path = *path; error = do_dentry_open(f, inode, NULL); if (error) { fput(f); f = ERR_PTR(error); } } return f; } EXPORT_SYMBOL(open_with_fake_path); static inline int build_open_flags(int flags, umode_t mode, struct open_flags *op) { int lookup_flags = 0; int acc_mode = ACC_MODE(flags); /* * Clear out all open flags we don't know about so that we don't report * them in fcntl(F_GETFD) or similar interfaces. */ flags &= VALID_OPEN_FLAGS; if (flags & (O_CREAT | __O_TMPFILE)) op->mode = (mode & S_IALLUGO) | S_IFREG; else op->mode = 0; /* Must never be set by userspace */ flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC; /* * O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only * check for O_DSYNC if the need any syncing at all we enforce it's * always set instead of having to deal with possibly weird behaviour * for malicious applications setting only __O_SYNC. */ if (flags & __O_SYNC) flags |= O_DSYNC; if (flags & __O_TMPFILE) { if ((flags & O_TMPFILE_MASK) != O_TMPFILE) return -EINVAL; if (!(acc_mode & MAY_WRITE)) return -EINVAL; } else if (flags & O_PATH) { /* * If we have O_PATH in the open flag. Then we * cannot have anything other than the below set of flags */ flags &= O_DIRECTORY | O_NOFOLLOW | O_PATH; acc_mode = 0; } op->open_flag = flags; /* O_TRUNC implies we need access checks for write permissions */ if (flags & O_TRUNC) acc_mode |= MAY_WRITE; /* Allow the LSM permission hook to distinguish append access from general write access. */ if (flags & O_APPEND) acc_mode |= MAY_APPEND; op->acc_mode = acc_mode; op->intent = flags & O_PATH ? 0 : LOOKUP_OPEN; if (flags & O_CREAT) { op->intent |= LOOKUP_CREATE; if (flags & O_EXCL) op->intent |= LOOKUP_EXCL; } if (flags & O_DIRECTORY) lookup_flags |= LOOKUP_DIRECTORY; if (!(flags & O_NOFOLLOW)) lookup_flags |= LOOKUP_FOLLOW; op->lookup_flags = lookup_flags; return 0; } /** * file_open_name - open file and return file pointer * * @name: struct filename containing path to open * @flags: open flags as per the open(2) second argument * @mode: mode for the new file if O_CREAT is set, else ignored * * This is the helper to open a file from kernelspace if you really * have to. But in generally you should not do this, so please move * along, nothing to see here.. */ struct file *file_open_name(struct filename *name, int flags, umode_t mode) { struct open_flags op; int err = build_open_flags(flags, mode, &op); return err ? ERR_PTR(err) : do_filp_open(AT_FDCWD, name, &op); } /** * filp_open - open file and return file pointer * * @filename: path to open * @flags: open flags as per the open(2) second argument * @mode: mode for the new file if O_CREAT is set, else ignored * * This is the helper to open a file from kernelspace if you really * have to. But in generally you should not do this, so please move * along, nothing to see here.. */ struct file *filp_open(const char *filename, int flags, umode_t mode) { struct filename *name = getname_kernel(filename); struct file *file = ERR_CAST(name); if (!IS_ERR(name)) { file = file_open_name(name, flags, mode); putname(name); } return file; } EXPORT_SYMBOL(filp_open); struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt, const char *filename, int flags, umode_t mode) { struct open_flags op; int err = build_open_flags(flags, mode, &op); if (err) return ERR_PTR(err); return do_file_open_root(dentry, mnt, filename, &op); } EXPORT_SYMBOL(file_open_root); long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) { struct open_flags op; int fd = build_open_flags(flags, mode, &op); struct filename *tmp; if (fd) return fd; tmp = getname(filename); if (IS_ERR(tmp)) return PTR_ERR(tmp); fd = get_unused_fd_flags(flags); if (fd >= 0) { struct file *f = do_filp_open(dfd, tmp, &op); if (IS_ERR(f)) { put_unused_fd(fd); fd = PTR_ERR(f); } else { fsnotify_open(f); fd_install(fd, f); } } putname(tmp); return fd; } SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode) { if (force_o_largefile()) flags |= O_LARGEFILE; return do_sys_open(AT_FDCWD, filename, flags, mode); } SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags, umode_t, mode) { if (force_o_largefile()) flags |= O_LARGEFILE; return do_sys_open(dfd, filename, flags, mode); } #ifdef CONFIG_COMPAT /* * Exactly like sys_open(), except that it doesn't set the * O_LARGEFILE flag. */ COMPAT_SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode) { return do_sys_open(AT_FDCWD, filename, flags, mode); } /* * Exactly like sys_openat(), except that it doesn't set the * O_LARGEFILE flag. */ COMPAT_SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags, umode_t, mode) { return do_sys_open(dfd, filename, flags, mode); } #endif #ifndef __alpha__ /* * For backward compatibility? Maybe this should be moved * into arch/i386 instead? */ SYSCALL_DEFINE2(creat, const char __user *, pathname, umode_t, mode) { return ksys_open(pathname, O_CREAT | O_WRONLY | O_TRUNC, mode); } #endif /* * "id" is the POSIX thread ID. We use the * files pointer for this.. */ int filp_close(struct file *filp, fl_owner_t id) { int retval = 0; if (!file_count(filp)) { printk(KERN_ERR "VFS: Close: file count is 0\n"); return 0; } if (filp->f_op->flush) retval = filp->f_op->flush(filp, id); if (likely(!(filp->f_mode & FMODE_PATH))) { dnotify_flush(filp, id); locks_remove_posix(filp, id); } fput(filp); return retval; } EXPORT_SYMBOL(filp_close); /* * Careful here! We test whether the file pointer is NULL before * releasing the fd. This ensures that one clone task can't release * an fd while another clone is opening it. */ SYSCALL_DEFINE1(close, unsigned int, fd) { int retval = __close_fd(current->files, fd); /* can't restart close syscall because file table entry was cleared */ if (unlikely(retval == -ERESTARTSYS || retval == -ERESTARTNOINTR || retval == -ERESTARTNOHAND || retval == -ERESTART_RESTARTBLOCK)) retval = -EINTR; return retval; } /* * This routine simulates a hangup on the tty, to arrange that users * are given clean terminals at login time. */ SYSCALL_DEFINE0(vhangup) { if (capable(CAP_SYS_TTY_CONFIG)) { tty_vhangup_self(); return 0; } return -EPERM; } /* * Called when an inode is about to be open. * We use this to disallow opening large files on 32bit systems if * the caller didn't specify O_LARGEFILE. On 64bit systems we force * on this flag in sys_open. */ int generic_file_open(struct inode * inode, struct file * filp) { if (!(filp->f_flags & O_LARGEFILE) && i_size_read(inode) > MAX_NON_LFS) return -EOVERFLOW; return 0; } EXPORT_SYMBOL(generic_file_open); /* * This is used by subsystems that don't want seekable * file descriptors. The function is not supposed to ever fail, the only * reason it returns an 'int' and not 'void' is so that it can be plugged * directly into file_operations structure. */ int nonseekable_open(struct inode *inode, struct file *filp) { filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE); return 0; } EXPORT_SYMBOL(nonseekable_open); /* * stream_open is used by subsystems that want stream-like file descriptors. * Such file descriptors are not seekable and don't have notion of position * (file.f_pos is always 0). Contrary to file descriptors of other regular * files, .read() and .write() can run simultaneously. * * stream_open never fails and is marked to return int so that it could be * directly used as file_operations.open . */ int stream_open(struct inode *inode, struct file *filp) { filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE | FMODE_ATOMIC_POS); filp->f_mode |= FMODE_STREAM; return 0; } EXPORT_SYMBOL(stream_open);
9 7 6 4 2 2 2 4 9 5 2 2 462 462 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 /* * Stateless NAT actions * * Copyright (c) 2007 Herbert Xu <herbert@gondor.apana.org.au> * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License as published by the Free * Software Foundation; either version 2 of the License, or (at your option) * any later version. */ #include <linux/errno.h> #include <linux/init.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/netfilter.h> #include <linux/rtnetlink.h> #include <linux/skbuff.h> #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/string.h> #include <linux/tc_act/tc_nat.h> #include <net/act_api.h> #include <net/icmp.h> #include <net/ip.h> #include <net/netlink.h> #include <net/tc_act/tc_nat.h> #include <net/tcp.h> #include <net/udp.h> static unsigned int nat_net_id; static struct tc_action_ops act_nat_ops; static const struct nla_policy nat_policy[TCA_NAT_MAX + 1] = { [TCA_NAT_PARMS] = { .len = sizeof(struct tc_nat) }, }; static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, int ovr, int bind, bool rtnl_held, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, nat_net_id); struct nlattr *tb[TCA_NAT_MAX + 1]; struct tc_nat *parm; int ret = 0, err; struct tcf_nat *p; u32 index; if (nla == NULL) return -EINVAL; err = nla_parse_nested(tb, TCA_NAT_MAX, nla, nat_policy, NULL); if (err < 0) return err; if (tb[TCA_NAT_PARMS] == NULL) return -EINVAL; parm = nla_data(tb[TCA_NAT_PARMS]); index = parm->index; err = tcf_idr_check_alloc(tn, &index, a, bind); if (!err) { ret = tcf_idr_create(tn, index, est, a, &act_nat_ops, bind, false); if (ret) { tcf_idr_cleanup(tn, index); return ret; } ret = ACT_P_CREATED; } else if (err > 0) { if (bind) return 0; if (!ovr) { tcf_idr_release(*a, bind); return -EEXIST; } } else { return err; } p = to_tcf_nat(*a); spin_lock_bh(&p->tcf_lock); p->old_addr = parm->old_addr; p->new_addr = parm->new_addr; p->mask = parm->mask; p->flags = parm->flags; p->tcf_action = parm->action; spin_unlock_bh(&p->tcf_lock); if (ret == ACT_P_CREATED) tcf_idr_insert(tn, *a); return ret; } static int tcf_nat_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { struct tcf_nat *p = to_tcf_nat(a); struct iphdr *iph; __be32 old_addr; __be32 new_addr; __be32 mask; __be32 addr; int egress; int action; int ihl; int noff; spin_lock(&p->tcf_lock); tcf_lastuse_update(&p->tcf_tm); old_addr = p->old_addr; new_addr = p->new_addr; mask = p->mask; egress = p->flags & TCA_NAT_FLAG_EGRESS; action = p->tcf_action; bstats_update(&p->tcf_bstats, skb); spin_unlock(&p->tcf_lock); if (unlikely(action == TC_ACT_SHOT)) goto drop; noff = skb_network_offset(skb); if (!pskb_may_pull(skb, sizeof(*iph) + noff)) goto drop; iph = ip_hdr(skb); if (egress) addr = iph->saddr; else addr = iph->daddr; if (!((old_addr ^ addr) & mask)) { if (skb_try_make_writable(skb, sizeof(*iph) + noff)) goto drop; new_addr &= mask; new_addr |= addr & ~mask; /* Rewrite IP header */ iph = ip_hdr(skb); if (egress) iph->saddr = new_addr; else iph->daddr = new_addr; csum_replace4(&iph->check, addr, new_addr); } else if ((iph->frag_off & htons(IP_OFFSET)) || iph->protocol != IPPROTO_ICMP) { goto out; } ihl = iph->ihl * 4; /* It would be nice to share code with stateful NAT. */ switch (iph->frag_off & htons(IP_OFFSET) ? 0 : iph->protocol) { case IPPROTO_TCP: { struct tcphdr *tcph; if (!pskb_may_pull(skb, ihl + sizeof(*tcph) + noff) || skb_try_make_writable(skb, ihl + sizeof(*tcph) + noff)) goto drop; tcph = (void *)(skb_network_header(skb) + ihl); inet_proto_csum_replace4(&tcph->check, skb, addr, new_addr, true); break; } case IPPROTO_UDP: { struct udphdr *udph; if (!pskb_may_pull(skb, ihl + sizeof(*udph) + noff) || skb_try_make_writable(skb, ihl + sizeof(*udph) + noff)) goto drop; udph = (void *)(skb_network_header(skb) + ihl); if (udph->check || skb->ip_summed == CHECKSUM_PARTIAL) { inet_proto_csum_replace4(&udph->check, skb, addr, new_addr, true); if (!udph->check) udph->check = CSUM_MANGLED_0; } break; } case IPPROTO_ICMP: { struct icmphdr *icmph; if (!pskb_may_pull(skb, ihl + sizeof(*icmph) + noff)) goto drop; icmph = (void *)(skb_network_header(skb) + ihl); if ((icmph->type != ICMP_DEST_UNREACH) && (icmph->type != ICMP_TIME_EXCEEDED) && (icmph->type != ICMP_PARAMETERPROB)) break; if (!pskb_may_pull(skb, ihl + sizeof(*icmph) + sizeof(*iph) + noff)) goto drop; icmph = (void *)(skb_network_header(skb) + ihl); iph = (void *)(icmph + 1); if (egress) addr = iph->daddr; else addr = iph->saddr; if ((old_addr ^ addr) & mask) break; if (skb_try_make_writable(skb, ihl + sizeof(*icmph) + sizeof(*iph) + noff)) goto drop; icmph = (void *)(skb_network_header(skb) + ihl); iph = (void *)(icmph + 1); new_addr &= mask; new_addr |= addr & ~mask; /* XXX Fix up the inner checksums. */ if (egress) iph->daddr = new_addr; else iph->saddr = new_addr; inet_proto_csum_replace4(&icmph->checksum, skb, addr, new_addr, false); break; } default: break; } out: return action; drop: spin_lock(&p->tcf_lock); p->tcf_qstats.drops++; spin_unlock(&p->tcf_lock); return TC_ACT_SHOT; } static int tcf_nat_dump(struct sk_buff *skb, struct tc_action *a, int bind, int ref) { unsigned char *b = skb_tail_pointer(skb); struct tcf_nat *p = to_tcf_nat(a); struct tc_nat opt = { .old_addr = p->old_addr, .new_addr = p->new_addr, .mask = p->mask, .flags = p->flags, .index = p->tcf_index, .action = p->tcf_action, .refcnt = refcount_read(&p->tcf_refcnt) - ref, .bindcnt = atomic_read(&p->tcf_bindcnt) - bind, }; struct tcf_t t; if (nla_put(skb, TCA_NAT_PARMS, sizeof(opt), &opt)) goto nla_put_failure; tcf_tm_dump(&t, &p->tcf_tm); if (nla_put_64bit(skb, TCA_NAT_TM, sizeof(t), &t, TCA_NAT_PAD)) goto nla_put_failure; return skb->len; nla_put_failure: nlmsg_trim(skb, b); return -1; } static int tcf_nat_walker(struct net *net, struct sk_buff *skb, struct netlink_callback *cb, int type, const struct tc_action_ops *ops, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, nat_net_id); return tcf_generic_walker(tn, skb, cb, type, ops, extack); } static int tcf_nat_search(struct net *net, struct tc_action **a, u32 index, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, nat_net_id); return tcf_idr_search(tn, a, index); } static struct tc_action_ops act_nat_ops = { .kind = "nat", .type = TCA_ACT_NAT, .owner = THIS_MODULE, .act = tcf_nat_act, .dump = tcf_nat_dump, .init = tcf_nat_init, .walk = tcf_nat_walker, .lookup = tcf_nat_search, .size = sizeof(struct tcf_nat), }; static __net_init int nat_init_net(struct net *net) { struct tc_action_net *tn = net_generic(net, nat_net_id); return tc_action_net_init(net, tn, &act_nat_ops); } static void __net_exit nat_exit_net(struct list_head *net_list) { tc_action_net_exit(net_list, nat_net_id); } static struct pernet_operations nat_net_ops = { .init = nat_init_net, .exit_batch = nat_exit_net, .id = &nat_net_id, .size = sizeof(struct tc_action_net), }; MODULE_DESCRIPTION("Stateless NAT actions"); MODULE_LICENSE("GPL"); static int __init nat_init_module(void) { return tcf_register_action(&act_nat_ops, &nat_net_ops); } static void __exit nat_cleanup_module(void) { tcf_unregister_action(&act_nat_ops, &nat_net_ops); } module_init(nat_init_module); module_exit(nat_cleanup_module);
592 682 681 567 70 182 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 /* Authentication token and access key management * * Copyright (C) 2004, 2007 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * * * See Documentation/security/keys/core.rst for information on keys/keyrings. */ #ifndef _LINUX_KEY_H #define _LINUX_KEY_H #include <linux/types.h> #include <linux/list.h> #include <linux/rbtree.h> #include <linux/rcupdate.h> #include <linux/sysctl.h> #include <linux/rwsem.h> #include <linux/atomic.h> #include <linux/assoc_array.h> #include <linux/refcount.h> #include <linux/time64.h> #ifdef __KERNEL__ #include <linux/uidgid.h> /* key handle serial number */ typedef int32_t key_serial_t; /* key handle permissions mask */ typedef uint32_t key_perm_t; struct key; #ifdef CONFIG_KEYS #undef KEY_DEBUGGING #define KEY_POS_VIEW 0x01000000 /* possessor can view a key's attributes */ #define KEY_POS_READ 0x02000000 /* possessor can read key payload / view keyring */ #define KEY_POS_WRITE 0x04000000 /* possessor can update key payload / add link to keyring */ #define KEY_POS_SEARCH 0x08000000 /* possessor can find a key in search / search a keyring */ #define KEY_POS_LINK 0x10000000 /* possessor can create a link to a key/keyring */ #define KEY_POS_SETATTR 0x20000000 /* possessor can set key attributes */ #define KEY_POS_ALL 0x3f000000 #define KEY_USR_VIEW 0x00010000 /* user permissions... */ #define KEY_USR_READ 0x00020000 #define KEY_USR_WRITE 0x00040000 #define KEY_USR_SEARCH 0x00080000 #define KEY_USR_LINK 0x00100000 #define KEY_USR_SETATTR 0x00200000 #define KEY_USR_ALL 0x003f0000 #define KEY_GRP_VIEW 0x00000100 /* group permissions... */ #define KEY_GRP_READ 0x00000200 #define KEY_GRP_WRITE 0x00000400 #define KEY_GRP_SEARCH 0x00000800 #define KEY_GRP_LINK 0x00001000 #define KEY_GRP_SETATTR 0x00002000 #define KEY_GRP_ALL 0x00003f00 #define KEY_OTH_VIEW 0x00000001 /* third party permissions... */ #define KEY_OTH_READ 0x00000002 #define KEY_OTH_WRITE 0x00000004 #define KEY_OTH_SEARCH 0x00000008 #define KEY_OTH_LINK 0x00000010 #define KEY_OTH_SETATTR 0x00000020 #define KEY_OTH_ALL 0x0000003f #define KEY_PERM_UNDEF 0xffffffff struct seq_file; struct user_struct; struct signal_struct; struct cred; struct key_type; struct key_owner; struct keyring_list; struct keyring_name; struct keyring_index_key { struct key_type *type; const char *description; size_t desc_len; }; union key_payload { void __rcu *rcu_data0; void *data[4]; }; /*****************************************************************************/ /* * key reference with possession attribute handling * * NOTE! key_ref_t is a typedef'd pointer to a type that is not actually * defined. This is because we abuse the bottom bit of the reference to carry a * flag to indicate whether the calling process possesses that key in one of * its keyrings. * * the key_ref_t has been made a separate type so that the compiler can reject * attempts to dereference it without proper conversion. * * the three functions are used to assemble and disassemble references */ typedef struct __key_reference_with_attributes *key_ref_t; static inline key_ref_t make_key_ref(const struct key *key, bool possession) { return (key_ref_t) ((unsigned long) key | possession); } static inline struct key *key_ref_to_ptr(const key_ref_t key_ref) { return (struct key *) ((unsigned long) key_ref & ~1UL); } static inline bool is_key_possessed(const key_ref_t key_ref) { return (unsigned long) key_ref & 1UL; } typedef int (*key_restrict_link_func_t)(struct key *dest_keyring, const struct key_type *type, const union key_payload *payload, struct key *restriction_key); struct key_restriction { key_restrict_link_func_t check; struct key *key; struct key_type *keytype; }; enum key_state { KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE, /* Positively instantiated */ }; /*****************************************************************************/ /* * authentication token / access credential / keyring * - types of key include: * - keyrings * - disk encryption IDs * - Kerberos TGTs and tickets */ struct key { refcount_t usage; /* number of references */ key_serial_t serial; /* key serial number */ union { struct list_head graveyard_link; struct rb_node serial_node; }; struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ union { time64_t expiry; /* time at which key expires (or 0) */ time64_t revoked_at; /* time at which key was revoked */ }; time64_t last_used_at; /* last time used for LRU keyring discard */ kuid_t uid; kgid_t gid; key_perm_t perm; /* access permissions */ unsigned short quotalen; /* length added to quota */ unsigned short datalen; /* payload data length * - may not match RCU dereferenced payload * - payload should contain own length */ short state; /* Key state (+) or rejection error (-) */ #ifdef KEY_DEBUGGING unsigned magic; #define KEY_DEBUG_MAGIC 0x18273645u #endif unsigned long flags; /* status flags (change with bitops) */ #define KEY_FLAG_DEAD 0 /* set if key type has been deleted */ #define KEY_FLAG_REVOKED 1 /* set if key had been revoked */ #define KEY_FLAG_IN_QUOTA 2 /* set if key consumes quota */ #define KEY_FLAG_USER_CONSTRUCT 3 /* set if key is being constructed in userspace */ #define KEY_FLAG_ROOT_CAN_CLEAR 4 /* set if key can be cleared by root without permission */ #define KEY_FLAG_INVALIDATED 5 /* set if key has been invalidated */ #define KEY_FLAG_BUILTIN 6 /* set if key is built in to the kernel */ #define KEY_FLAG_ROOT_CAN_INVAL 7 /* set if key can be invalidated by root without permission */ #define KEY_FLAG_KEEP 8 /* set if key should not be removed */ #define KEY_FLAG_UID_KEYRING 9 /* set if key is a user or user session keyring */ /* the key type and key description string * - the desc is used to match a key against search criteria * - it should be a printable string * - eg: for krb5 AFS, this might be "afs@REDHAT.COM" */ union { struct keyring_index_key index_key; struct { struct key_type *type; /* type of key */ char *description; }; }; /* key data * - this is used to hold the data actually used in cryptography or * whatever */ union { union key_payload payload; struct { /* Keyring bits */ struct list_head name_link; struct assoc_array keys; }; }; /* This is set on a keyring to restrict the addition of a link to a key * to it. If this structure isn't provided then it is assumed that the * keyring is open to any addition. It is ignored for non-keyring * keys. Only set this value using keyring_restrict(), keyring_alloc(), * or key_alloc(). * * This is intended for use with rings of trusted keys whereby addition * to the keyring needs to be controlled. KEY_ALLOC_BYPASS_RESTRICTION * overrides this, allowing the kernel to add extra keys without * restriction. */ struct key_restriction *restrict_link; }; extern struct key *key_alloc(struct key_type *type, const char *desc, kuid_t uid, kgid_t gid, const struct cred *cred, key_perm_t perm, unsigned long flags, struct key_restriction *restrict_link); #define KEY_ALLOC_IN_QUOTA 0x0000 /* add to quota, reject if would overrun */ #define KEY_ALLOC_QUOTA_OVERRUN 0x0001 /* add to quota, permit even if overrun */ #define KEY_ALLOC_NOT_IN_QUOTA 0x0002 /* not in quota */ #define KEY_ALLOC_BUILT_IN 0x0004 /* Key is built into kernel */ #define KEY_ALLOC_BYPASS_RESTRICTION 0x0008 /* Override the check on restricted keyrings */ #define KEY_ALLOC_UID_KEYRING 0x0010 /* allocating a user or user session keyring */ #define KEY_ALLOC_SET_KEEP 0x0020 /* Set the KEEP flag on the key/keyring */ extern void key_revoke(struct key *key); extern void key_invalidate(struct key *key); extern void key_put(struct key *key); static inline struct key *__key_get(struct key *key) { refcount_inc(&key->usage); return key; } static inline struct key *key_get(struct key *key) { return key ? __key_get(key) : key; } static inline void key_ref_put(key_ref_t key_ref) { key_put(key_ref_to_ptr(key_ref)); } extern struct key *request_key(struct key_type *type, const char *description, const char *callout_info); extern struct key *request_key_with_auxdata(struct key_type *type, const char *description, const void *callout_info, size_t callout_len, void *aux); extern struct key *request_key_async(struct key_type *type, const char *description, const void *callout_info, size_t callout_len); extern struct key *request_key_async_with_auxdata(struct key_type *type, const char *description, const void *callout_info, size_t callout_len, void *aux); extern int wait_for_key_construction(struct key *key, bool intr); extern int key_validate(const struct key *key); extern key_ref_t key_create_or_update(key_ref_t keyring, const char *type, const char *description, const void *payload, size_t plen, key_perm_t perm, unsigned long flags); extern int key_update(key_ref_t key, const void *payload, size_t plen); extern int key_link(struct key *keyring, struct key *key); extern int key_unlink(struct key *keyring, struct key *key); extern struct key *keyring_alloc(const char *description, kuid_t uid, kgid_t gid, const struct cred *cred, key_perm_t perm, unsigned long flags, struct key_restriction *restrict_link, struct key *dest); extern int restrict_link_reject(struct key *keyring, const struct key_type *type, const union key_payload *payload, struct key *restriction_key); extern int keyring_clear(struct key *keyring); extern key_ref_t keyring_search(key_ref_t keyring, struct key_type *type, const char *description); extern int keyring_add_key(struct key *keyring, struct key *key); extern int keyring_restrict(key_ref_t keyring, const char *type, const char *restriction); extern struct key *key_lookup(key_serial_t id); static inline key_serial_t key_serial(const struct key *key) { return key ? key->serial : 0; } extern void key_set_timeout(struct key *, unsigned); /* * The permissions required on a key that we're looking up. */ #define KEY_NEED_VIEW 0x01 /* Require permission to view attributes */ #define KEY_NEED_READ 0x02 /* Require permission to read content */ #define KEY_NEED_WRITE 0x04 /* Require permission to update / modify */ #define KEY_NEED_SEARCH 0x08 /* Require permission to search (keyring) or find (key) */ #define KEY_NEED_LINK 0x10 /* Require permission to link */ #define KEY_NEED_SETATTR 0x20 /* Require permission to change attributes */ #define KEY_NEED_ALL 0x3f /* All the above permissions */ static inline short key_read_state(const struct key *key) { /* Barrier versus mark_key_instantiated(). */ return smp_load_acquire(&key->state); } /** * key_is_positive - Determine if a key has been positively instantiated * @key: The key to check. * * Return true if the specified key has been positively instantiated, false * otherwise. */ static inline bool key_is_positive(const struct key *key) { return key_read_state(key) == KEY_IS_POSITIVE; } static inline bool key_is_negative(const struct key *key) { return key_read_state(key) < 0; } #define dereference_key_rcu(KEY) \ (rcu_dereference((KEY)->payload.rcu_data0)) #define dereference_key_locked(KEY) \ (rcu_dereference_protected((KEY)->payload.rcu_data0, \ rwsem_is_locked(&((struct key *)(KEY))->sem))) #define rcu_assign_keypointer(KEY, PAYLOAD) \ do { \ rcu_assign_pointer((KEY)->payload.rcu_data0, (PAYLOAD)); \ } while (0) #ifdef CONFIG_SYSCTL extern struct ctl_table key_sysctls[]; #endif /* * the userspace interface */ extern int install_thread_keyring_to_cred(struct cred *cred); extern void key_fsuid_changed(struct task_struct *tsk); extern void key_fsgid_changed(struct task_struct *tsk); extern void key_init(void); #else /* CONFIG_KEYS */ #define key_validate(k) 0 #define key_serial(k) 0 #define key_get(k) ({ NULL; }) #define key_revoke(k) do { } while(0) #define key_invalidate(k) do { } while(0) #define key_put(k) do { } while(0) #define key_ref_put(k) do { } while(0) #define make_key_ref(k, p) NULL #define key_ref_to_ptr(k) NULL #define is_key_possessed(k) 0 #define key_fsuid_changed(t) do { } while(0) #define key_fsgid_changed(t) do { } while(0) #define key_init() do { } while(0) #endif /* CONFIG_KEYS */ #endif /* __KERNEL__ */ #endif /* _LINUX_KEY_H */
17 29 40 29 40 40 40 17 17 17 17 29 29 29 9 3 6 6 3 3 2 20 20 19 1 29 29 29 29 29 29 29 29 29 29 29 29 29 11 11 11 11 11 11 11 11 29 29 29 29 29 29 29 29 29 29 29 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 /* * scsi_logging.c * * Copyright (C) 2014 SUSE Linux Products GmbH * Copyright (C) 2014 Hannes Reinecke <hare@suse.de> * * This file is released under the GPLv2 */ #include <linux/kernel.h> #include <linux/atomic.h> #include <scsi/scsi.h> #include <scsi/scsi_cmnd.h> #include <scsi/scsi_device.h> #include <scsi/scsi_eh.h> #include <scsi/scsi_dbg.h> static char *scsi_log_reserve_buffer(size_t *len) { *len = 128; return kmalloc(*len, GFP_ATOMIC); } static void scsi_log_release_buffer(char *bufptr) { kfree(bufptr); } static inline const char *scmd_name(const struct scsi_cmnd *scmd) { return scmd->request->rq_disk ? scmd->request->rq_disk->disk_name : NULL; } static size_t sdev_format_header(char *logbuf, size_t logbuf_len, const char *name, int tag) { size_t off = 0; if (name) off += scnprintf(logbuf + off, logbuf_len - off, "[%s] ", name); if (WARN_ON(off >= logbuf_len)) return off; if (tag >= 0) off += scnprintf(logbuf + off, logbuf_len - off, "tag#%d ", tag); return off; } void sdev_prefix_printk(const char *level, const struct scsi_device *sdev, const char *name, const char *fmt, ...) { va_list args; char *logbuf; size_t off = 0, logbuf_len; if (!sdev) return; logbuf = scsi_log_reserve_buffer(&logbuf_len); if (!logbuf) return; if (name) off += scnprintf(logbuf + off, logbuf_len - off, "[%s] ", name); if (!WARN_ON(off >= logbuf_len)) { va_start(args, fmt); off += vscnprintf(logbuf + off, logbuf_len - off, fmt, args); va_end(args); } dev_printk(level, &sdev->sdev_gendev, "%s", logbuf); scsi_log_release_buffer(logbuf); } EXPORT_SYMBOL(sdev_prefix_printk); void scmd_printk(const char *level, const struct scsi_cmnd *scmd, const char *fmt, ...) { va_list args; char *logbuf; size_t off = 0, logbuf_len; if (!scmd || !scmd->cmnd) return; logbuf = scsi_log_reserve_buffer(&logbuf_len); if (!logbuf) return; off = sdev_format_header(logbuf, logbuf_len, scmd_name(scmd), scmd->request->tag); if (off < logbuf_len) { va_start(args, fmt); off += vscnprintf(logbuf + off, logbuf_len - off, fmt, args); va_end(args); } dev_printk(level, &scmd->device->sdev_gendev, "%s", logbuf); scsi_log_release_buffer(logbuf); } EXPORT_SYMBOL(scmd_printk); static size_t scsi_format_opcode_name(char *buffer, size_t buf_len, const unsigned char *cdbp) { int sa, cdb0; const char *cdb_name = NULL, *sa_name = NULL; size_t off; cdb0 = cdbp[0]; if (cdb0 == VARIABLE_LENGTH_CMD) { int len = scsi_varlen_cdb_length(cdbp); if (len < 10) { off = scnprintf(buffer, buf_len, "short variable length command, len=%d", len); return off; } sa = (cdbp[8] << 8) + cdbp[9]; } else sa = cdbp[1] & 0x1f; if (!scsi_opcode_sa_name(cdb0, sa, &cdb_name, &sa_name)) { if (cdb_name) off = scnprintf(buffer, buf_len, "%s", cdb_name); else { off = scnprintf(buffer, buf_len, "opcode=0x%x", cdb0); if (WARN_ON(off >= buf_len)) return off; if (cdb0 >= VENDOR_SPECIFIC_CDB) off += scnprintf(buffer + off, buf_len - off, " (vendor)"); else if (cdb0 >= 0x60 && cdb0 < 0x7e) off += scnprintf(buffer + off, buf_len - off, " (reserved)"); } } else { if (sa_name) off = scnprintf(buffer, buf_len, "%s", sa_name); else if (cdb_name) off = scnprintf(buffer, buf_len, "%s, sa=0x%x", cdb_name, sa); else off = scnprintf(buffer, buf_len, "opcode=0x%x, sa=0x%x", cdb0, sa); } WARN_ON(off >= buf_len); return off; } size_t __scsi_format_command(char *logbuf, size_t logbuf_len, const unsigned char *cdb, size_t cdb_len) { int len, k; size_t off; off = scsi_format_opcode_name(logbuf, logbuf_len, cdb); if (off >= logbuf_len) return off; len = scsi_command_size(cdb); if (cdb_len < len) len = cdb_len; /* print out all bytes in cdb */ for (k = 0; k < len; ++k) { if (off > logbuf_len - 3) break; off += scnprintf(logbuf + off, logbuf_len - off, " %02x", cdb[k]); } return off; } EXPORT_SYMBOL(__scsi_format_command); void scsi_print_command(struct scsi_cmnd *cmd) { int k; char *logbuf; size_t off, logbuf_len; if (!cmd->cmnd) return; logbuf = scsi_log_reserve_buffer(&logbuf_len); if (!logbuf) return; off = sdev_format_header(logbuf, logbuf_len, scmd_name(cmd), cmd->request->tag); if (off >= logbuf_len) goto out_printk; off += scnprintf(logbuf + off, logbuf_len - off, "CDB: "); if (WARN_ON(off >= logbuf_len)) goto out_printk; off += scsi_format_opcode_name(logbuf + off, logbuf_len - off, cmd->cmnd); if (off >= logbuf_len) goto out_printk; /* print out all bytes in cdb */ if (cmd->cmd_len > 16) { /* Print opcode in one line and use separate lines for CDB */ off += scnprintf(logbuf + off, logbuf_len - off, "\n"); dev_printk(KERN_INFO, &cmd->device->sdev_gendev, "%s", logbuf); scsi_log_release_buffer(logbuf); for (k = 0; k < cmd->cmd_len; k += 16) { size_t linelen = min(cmd->cmd_len - k, 16); logbuf = scsi_log_reserve_buffer(&logbuf_len); if (!logbuf) break; off = sdev_format_header(logbuf, logbuf_len, scmd_name(cmd), cmd->request->tag); if (!WARN_ON(off > logbuf_len - 58)) { off += scnprintf(logbuf + off, logbuf_len - off, "CDB[%02x]: ", k); hex_dump_to_buffer(&cmd->cmnd[k], linelen, 16, 1, logbuf + off, logbuf_len - off, false); } dev_printk(KERN_INFO, &cmd->device->sdev_gendev, "%s", logbuf); scsi_log_release_buffer(logbuf); } return; } if (!WARN_ON(off > logbuf_len - 49)) { off += scnprintf(logbuf + off, logbuf_len - off, " "); hex_dump_to_buffer(cmd->cmnd, cmd->cmd_len, 16, 1, logbuf + off, logbuf_len - off, false); } out_printk: dev_printk(KERN_INFO, &cmd->device->sdev_gendev, "%s", logbuf); scsi_log_release_buffer(logbuf); } EXPORT_SYMBOL(scsi_print_command); static size_t scsi_format_extd_sense(char *buffer, size_t buf_len, unsigned char asc, unsigned char ascq) { size_t off = 0; const char *extd_sense_fmt = NULL; const char *extd_sense_str = scsi_extd_sense_format(asc, ascq, &extd_sense_fmt); if (extd_sense_str) { off = scnprintf(buffer, buf_len, "Add. Sense: %s", extd_sense_str); if (extd_sense_fmt) off += scnprintf(buffer + off, buf_len - off, "(%s%x)", extd_sense_fmt, ascq); } else { if (asc >= 0x80) off = scnprintf(buffer, buf_len, "<<vendor>>"); off += scnprintf(buffer + off, buf_len - off, "ASC=0x%x ", asc); if (ascq >= 0x80) off += scnprintf(buffer + off, buf_len - off, "<<vendor>>"); off += scnprintf(buffer + off, buf_len - off, "ASCQ=0x%x ", ascq); } return off; } static size_t scsi_format_sense_hdr(char *buffer, size_t buf_len, const struct scsi_sense_hdr *sshdr) { const char *sense_txt; size_t off; off = scnprintf(buffer, buf_len, "Sense Key : "); sense_txt = scsi_sense_key_string(sshdr->sense_key); if (sense_txt) off += scnprintf(buffer + off, buf_len - off, "%s ", sense_txt); else off += scnprintf(buffer + off, buf_len - off, "0x%x ", sshdr->sense_key); off += scnprintf(buffer + off, buf_len - off, scsi_sense_is_deferred(sshdr) ? "[deferred] " : "[current] "); if (sshdr->response_code >= 0x72) off += scnprintf(buffer + off, buf_len - off, "[descriptor] "); return off; } static void scsi_log_dump_sense(const struct scsi_device *sdev, const char *name, int tag, const unsigned char *sense_buffer, int sense_len) { char *logbuf; size_t logbuf_len; int i; logbuf = scsi_log_reserve_buffer(&logbuf_len); if (!logbuf) return; for (i = 0; i < sense_len; i += 16) { int len = min(sense_len - i, 16); size_t off; off = sdev_format_header(logbuf, logbuf_len, name, tag); hex_dump_to_buffer(&sense_buffer[i], len, 16, 1, logbuf + off, logbuf_len - off, false); dev_printk(KERN_INFO, &sdev->sdev_gendev, "%s", logbuf); } scsi_log_release_buffer(logbuf); } static void scsi_log_print_sense_hdr(const struct scsi_device *sdev, const char *name, int tag, const struct scsi_sense_hdr *sshdr) { char *logbuf; size_t off, logbuf_len; logbuf = scsi_log_reserve_buffer(&logbuf_len); if (!logbuf) return; off = sdev_format_header(logbuf, logbuf_len, name, tag); off += scsi_format_sense_hdr(logbuf + off, logbuf_len - off, sshdr); dev_printk(KERN_INFO, &sdev->sdev_gendev, "%s", logbuf); scsi_log_release_buffer(logbuf); logbuf = scsi_log_reserve_buffer(&logbuf_len); if (!logbuf) return; off = sdev_format_header(logbuf, logbuf_len, name, tag); off += scsi_format_extd_sense(logbuf + off, logbuf_len - off, sshdr->asc, sshdr->ascq); dev_printk(KERN_INFO, &sdev->sdev_gendev, "%s", logbuf); scsi_log_release_buffer(logbuf); } static void scsi_log_print_sense(const struct scsi_device *sdev, const char *name, int tag, const unsigned char *sense_buffer, int sense_len) { struct scsi_sense_hdr sshdr; if (scsi_normalize_sense(sense_buffer, sense_len, &sshdr)) scsi_log_print_sense_hdr(sdev, name, tag, &sshdr); else scsi_log_dump_sense(sdev, name, tag, sense_buffer, sense_len); } /* * Print normalized SCSI sense header with a prefix. */ void scsi_print_sense_hdr(const struct scsi_device *sdev, const char *name, const struct scsi_sense_hdr *sshdr) { scsi_log_print_sense_hdr(sdev, name, -1, sshdr); } EXPORT_SYMBOL(scsi_print_sense_hdr); /* Normalize and print sense buffer with name prefix */ void __scsi_print_sense(const struct scsi_device *sdev, const char *name, const unsigned char *sense_buffer, int sense_len) { scsi_log_print_sense(sdev, name, -1, sense_buffer, sense_len); } EXPORT_SYMBOL(__scsi_print_sense); /* Normalize and print sense buffer in SCSI command */ void scsi_print_sense(const struct scsi_cmnd *cmd) { scsi_log_print_sense(cmd->device, scmd_name(cmd), cmd->request->tag, cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE); } EXPORT_SYMBOL(scsi_print_sense); void scsi_print_result(const struct scsi_cmnd *cmd, const char *msg, int disposition) { char *logbuf; size_t off, logbuf_len; const char *mlret_string = scsi_mlreturn_string(disposition); const char *hb_string = scsi_hostbyte_string(cmd->result); const char *db_string = scsi_driverbyte_string(cmd->result); logbuf = scsi_log_reserve_buffer(&logbuf_len); if (!logbuf) return; off = sdev_format_header(logbuf, logbuf_len, scmd_name(cmd), cmd->request->tag); if (off >= logbuf_len) goto out_printk; if (msg) { off += scnprintf(logbuf + off, logbuf_len - off, "%s: ", msg); if (WARN_ON(off >= logbuf_len)) goto out_printk; } if (mlret_string) off += scnprintf(logbuf + off, logbuf_len - off, "%s ", mlret_string); else off += scnprintf(logbuf + off, logbuf_len - off, "UNKNOWN(0x%02x) ", disposition); if (WARN_ON(off >= logbuf_len)) goto out_printk; off += scnprintf(logbuf + off, logbuf_len - off, "Result: "); if (WARN_ON(off >= logbuf_len)) goto out_printk; if (hb_string) off += scnprintf(logbuf + off, logbuf_len - off, "hostbyte=%s ", hb_string); else off += scnprintf(logbuf + off, logbuf_len - off, "hostbyte=0x%02x ", host_byte(cmd->result)); if (WARN_ON(off >= logbuf_len)) goto out_printk; if (db_string) off += scnprintf(logbuf + off, logbuf_len - off, "driverbyte=%s", db_string); else off += scnprintf(logbuf + off, logbuf_len - off, "driverbyte=0x%02x", driver_byte(cmd->result)); out_printk: dev_printk(KERN_INFO, &cmd->device->sdev_gendev, "%s", logbuf); scsi_log_release_buffer(logbuf); } EXPORT_SYMBOL(scsi_print_result);
3 4 2 3 88 88 15 2 3 2 6 4 2 3 2 2 1 1 3 1 3 2 4 3 3 1 4 3 3 2 2 1 2 5 4 3 2 3 2 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 /* * OSS compatible sequencer driver * * OSS compatible i/o control * * Copyright (C) 1998,99 Takashi Iwai <tiwai@suse.de> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #include "seq_oss_device.h" #include "seq_oss_readq.h" #include "seq_oss_writeq.h" #include "seq_oss_timer.h" #include "seq_oss_synth.h" #include "seq_oss_midi.h" #include "seq_oss_event.h" static int snd_seq_oss_synth_info_user(struct seq_oss_devinfo *dp, void __user *arg) { struct synth_info info; if (copy_from_user(&info, arg, sizeof(info))) return -EFAULT; if (snd_seq_oss_synth_make_info(dp, info.device, &info) < 0) return -EINVAL; if (copy_to_user(arg, &info, sizeof(info))) return -EFAULT; return 0; } static int snd_seq_oss_midi_info_user(struct seq_oss_devinfo *dp, void __user *arg) { struct midi_info info; if (copy_from_user(&info, arg, sizeof(info))) return -EFAULT; if (snd_seq_oss_midi_make_info(dp, info.device, &info) < 0) return -EINVAL; if (copy_to_user(arg, &info, sizeof(info))) return -EFAULT; return 0; } static int snd_seq_oss_oob_user(struct seq_oss_devinfo *dp, void __user *arg) { unsigned char ev[8]; struct snd_seq_event tmpev; if (copy_from_user(ev, arg, 8)) return -EFAULT; memset(&tmpev, 0, sizeof(tmpev)); snd_seq_oss_fill_addr(dp, &tmpev, dp->addr.client, dp->addr.port); tmpev.time.tick = 0; if (! snd_seq_oss_process_event(dp, (union evrec *)ev, &tmpev)) { snd_seq_oss_dispatch(dp, &tmpev, 0, 0); } return 0; } int snd_seq_oss_ioctl(struct seq_oss_devinfo *dp, unsigned int cmd, unsigned long carg) { int dev, val; void __user *arg = (void __user *)carg; int __user *p = arg; switch (cmd) { case SNDCTL_TMR_TIMEBASE: case SNDCTL_TMR_TEMPO: case SNDCTL_TMR_START: case SNDCTL_TMR_STOP: case SNDCTL_TMR_CONTINUE: case SNDCTL_TMR_METRONOME: case SNDCTL_TMR_SOURCE: case SNDCTL_TMR_SELECT: case SNDCTL_SEQ_CTRLRATE: return snd_seq_oss_timer_ioctl(dp->timer, cmd, arg); case SNDCTL_SEQ_PANIC: snd_seq_oss_reset(dp); return -EINVAL; case SNDCTL_SEQ_SYNC: if (! is_write_mode(dp->file_mode) || dp->writeq == NULL) return 0; while (snd_seq_oss_writeq_sync(dp->writeq)) ; if (signal_pending(current)) return -ERESTARTSYS; return 0; case SNDCTL_SEQ_RESET: snd_seq_oss_reset(dp); return 0; case SNDCTL_SEQ_TESTMIDI: if (get_user(dev, p)) return -EFAULT; return snd_seq_oss_midi_open(dp, dev, dp->file_mode); case SNDCTL_SEQ_GETINCOUNT: if (dp->readq == NULL || ! is_read_mode(dp->file_mode)) return 0; return put_user(dp->readq->qlen, p) ? -EFAULT : 0; case SNDCTL_SEQ_GETOUTCOUNT: if (! is_write_mode(dp->file_mode) || dp->writeq == NULL) return 0; return put_user(snd_seq_oss_writeq_get_free_size(dp->writeq), p) ? -EFAULT : 0; case SNDCTL_SEQ_GETTIME: return put_user(snd_seq_oss_timer_cur_tick(dp->timer), p) ? -EFAULT : 0; case SNDCTL_SEQ_RESETSAMPLES: if (get_user(dev, p)) return -EFAULT; return snd_seq_oss_synth_ioctl(dp, dev, cmd, carg); case SNDCTL_SEQ_NRSYNTHS: return put_user(dp->max_synthdev, p) ? -EFAULT : 0; case SNDCTL_SEQ_NRMIDIS: return put_user(dp->max_mididev, p) ? -EFAULT : 0; case SNDCTL_SYNTH_MEMAVL: if (get_user(dev, p)) return -EFAULT; val = snd_seq_oss_synth_ioctl(dp, dev, cmd, carg); return put_user(val, p) ? -EFAULT : 0; case SNDCTL_FM_4OP_ENABLE: if (get_user(dev, p)) return -EFAULT; snd_seq_oss_synth_ioctl(dp, dev, cmd, carg); return 0; case SNDCTL_SYNTH_INFO: case SNDCTL_SYNTH_ID: return snd_seq_oss_synth_info_user(dp, arg); case SNDCTL_SEQ_OUTOFBAND: return snd_seq_oss_oob_user(dp, arg); case SNDCTL_MIDI_INFO: return snd_seq_oss_midi_info_user(dp, arg); case SNDCTL_SEQ_THRESHOLD: if (! is_write_mode(dp->file_mode)) return 0; if (get_user(val, p)) return -EFAULT; if (val < 1) val = 1; if (val >= dp->writeq->maxlen) val = dp->writeq->maxlen - 1; snd_seq_oss_writeq_set_output(dp->writeq, val); return 0; case SNDCTL_MIDI_PRETIME: if (dp->readq == NULL || !is_read_mode(dp->file_mode)) return 0; if (get_user(val, p)) return -EFAULT; if (val <= 0) val = -1; else val = (HZ * val) / 10; dp->readq->pre_event_timeout = val; return put_user(val, p) ? -EFAULT : 0; default: if (! is_write_mode(dp->file_mode)) return -EIO; return snd_seq_oss_synth_ioctl(dp, 0, cmd, carg); } return 0; }
7812 14714 23868 7256 12 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_ERR_H #define _LINUX_ERR_H #include <linux/compiler.h> #include <linux/types.h> #include <asm/errno.h> /* * Kernel pointers have redundant information, so we can use a * scheme where we can return either an error code or a normal * pointer with the same return value. * * This should be a per-architecture thing, to allow different * error and pointer decisions. */ #define MAX_ERRNO 4095 #ifndef __ASSEMBLY__ #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO) static inline void * __must_check ERR_PTR(long error) { return (void *) error; } static inline long __must_check PTR_ERR(__force const void *ptr) { return (long) ptr; } static inline bool __must_check IS_ERR(__force const void *ptr) { return IS_ERR_VALUE((unsigned long)ptr); } static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr) { return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr); } /** * ERR_CAST - Explicitly cast an error-valued pointer to another pointer type * @ptr: The pointer to cast. * * Explicitly cast an error-valued pointer to another pointer type in such a * way as to make it clear that's what's going on. */ static inline void * __must_check ERR_CAST(__force const void *ptr) { /* cast away the const */ return (void *) ptr; } static inline int __must_check PTR_ERR_OR_ZERO(__force const void *ptr) { if (IS_ERR(ptr)) return PTR_ERR(ptr); else return 0; } /* Deprecated */ #define PTR_RET(p) PTR_ERR_OR_ZERO(p) #endif #endif /* _LINUX_ERR_H */
32 4 4 28 4 16 13 13 13 13 15 15 31 28 28 31 13 31 30 30 30 13 2 20 2 28 3 2 1 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 /* DataCenter TCP (DCTCP) congestion control. * * http://simula.stanford.edu/~alizade/Site/DCTCP.html * * This is an implementation of DCTCP over Reno, an enhancement to the * TCP congestion control algorithm designed for data centers. DCTCP * leverages Explicit Congestion Notification (ECN) in the network to * provide multi-bit feedback to the end hosts. DCTCP's goal is to meet * the following three data center transport requirements: * * - High burst tolerance (incast due to partition/aggregate) * - Low latency (short flows, queries) * - High throughput (continuous data updates, large file transfers) * with commodity shallow buffered switches * * The algorithm is described in detail in the following two papers: * * 1) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye, * Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan: * "Data Center TCP (DCTCP)", Data Center Networks session * Proc. ACM SIGCOMM, New Delhi, 2010. * http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf * * 2) Mohammad Alizadeh, Adel Javanmard, and Balaji Prabhakar: * "Analysis of DCTCP: Stability, Convergence, and Fairness" * Proc. ACM SIGMETRICS, San Jose, 2011. * http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf * * Initial prototype from Abdul Kabbani, Masato Yasuda and Mohammad Alizadeh. * * Authors: * * Daniel Borkmann <dborkman@redhat.com> * Florian Westphal <fw@strlen.de> * Glenn Judd <glenn.judd@morganstanley.com> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or (at * your option) any later version. */ #include <linux/module.h> #include <linux/mm.h> #include <net/tcp.h> #include <linux/inet_diag.h> #define DCTCP_MAX_ALPHA 1024U struct dctcp { u32 acked_bytes_ecn; u32 acked_bytes_total; u32 prior_snd_una; u32 prior_rcv_nxt; u32 dctcp_alpha; u32 next_seq; u32 ce_state; u32 loss_cwnd; }; static unsigned int dctcp_shift_g __read_mostly = 4; /* g = 1/2^4 */ module_param(dctcp_shift_g, uint, 0644); MODULE_PARM_DESC(dctcp_shift_g, "parameter g for updating dctcp_alpha"); static unsigned int dctcp_alpha_on_init __read_mostly = DCTCP_MAX_ALPHA; module_param(dctcp_alpha_on_init, uint, 0644); MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value"); static struct tcp_congestion_ops dctcp_reno; static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca) { ca->next_seq = tp->snd_nxt; ca->acked_bytes_ecn = 0; ca->acked_bytes_total = 0; } static void dctcp_init(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); if ((tp->ecn_flags & TCP_ECN_OK) || (sk->sk_state == TCP_LISTEN || sk->sk_state == TCP_CLOSE)) { struct dctcp *ca = inet_csk_ca(sk); ca->prior_snd_una = tp->snd_una; ca->prior_rcv_nxt = tp->rcv_nxt; ca->dctcp_alpha = min(dctcp_alpha_on_init, DCTCP_MAX_ALPHA); ca->loss_cwnd = 0; ca->ce_state = 0; dctcp_reset(tp, ca); return; } /* No ECN support? Fall back to Reno. Also need to clear * ECT from sk since it is set during 3WHS for DCTCP. */ inet_csk(sk)->icsk_ca_ops = &dctcp_reno; INET_ECN_dontxmit(sk); } static u32 dctcp_ssthresh(struct sock *sk) { struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); ca->loss_cwnd = tp->snd_cwnd; return max(tp->snd_cwnd - ((tp->snd_cwnd * ca->dctcp_alpha) >> 11U), 2U); } /* Minimal DCTP CE state machine: * * S: 0 <- last pkt was non-CE * 1 <- last pkt was CE */ static void dctcp_ce_state_0_to_1(struct sock *sk) { struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); if (!ca->ce_state) { /* State has changed from CE=0 to CE=1, force an immediate * ACK to reflect the new CE state. If an ACK was delayed, * send that first to reflect the prior CE state. */ if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) __tcp_send_ack(sk, ca->prior_rcv_nxt); inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW; } ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 1; tp->ecn_flags |= TCP_ECN_DEMAND_CWR; } static void dctcp_ce_state_1_to_0(struct sock *sk) { struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); if (ca->ce_state) { /* State has changed from CE=1 to CE=0, force an immediate * ACK to reflect the new CE state. If an ACK was delayed, * send that first to reflect the prior CE state. */ if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) __tcp_send_ack(sk, ca->prior_rcv_nxt); inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW; } ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 0; tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; } static void dctcp_update_alpha(struct sock *sk, u32 flags) { const struct tcp_sock *tp = tcp_sk(sk); struct dctcp *ca = inet_csk_ca(sk); u32 acked_bytes = tp->snd_una - ca->prior_snd_una; /* If ack did not advance snd_una, count dupack as MSS size. * If ack did update window, do not count it at all. */ if (acked_bytes == 0 && !(flags & CA_ACK_WIN_UPDATE)) acked_bytes = inet_csk(sk)->icsk_ack.rcv_mss; if (acked_bytes) { ca->acked_bytes_total += acked_bytes; ca->prior_snd_una = tp->snd_una; if (flags & CA_ACK_ECE) ca->acked_bytes_ecn += acked_bytes; } /* Expired RTT */ if (!before(tp->snd_una, ca->next_seq)) { u64 bytes_ecn = ca->acked_bytes_ecn; u32 alpha = ca->dctcp_alpha; /* alpha = (1 - g) * alpha + g * F */ alpha -= min_not_zero(alpha, alpha >> dctcp_shift_g); if (bytes_ecn) { /* If dctcp_shift_g == 1, a 32bit value would overflow * after 8 Mbytes. */ bytes_ecn <<= (10 - dctcp_shift_g); do_div(bytes_ecn, max(1U, ca->acked_bytes_total)); alpha = min(alpha + (u32)bytes_ecn, DCTCP_MAX_ALPHA); } /* dctcp_alpha can be read from dctcp_get_info() without * synchro, so we ask compiler to not use dctcp_alpha * as a temporary variable in prior operations. */ WRITE_ONCE(ca->dctcp_alpha, alpha); dctcp_reset(tp, ca); } } static void dctcp_react_to_loss(struct sock *sk) { struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); ca->loss_cwnd = tp->snd_cwnd; tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U); } static void dctcp_state(struct sock *sk, u8 new_state) { if (new_state == TCP_CA_Recovery && new_state != inet_csk(sk)->icsk_ca_state) dctcp_react_to_loss(sk); /* We handle RTO in dctcp_cwnd_event to ensure that we perform only * one loss-adjustment per RTT. */ } static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) { switch (ev) { case CA_EVENT_ECN_IS_CE: dctcp_ce_state_0_to_1(sk); break; case CA_EVENT_ECN_NO_CE: dctcp_ce_state_1_to_0(sk); break; case CA_EVENT_LOSS: dctcp_react_to_loss(sk); break; default: /* Don't care for the rest. */ break; } } static size_t dctcp_get_info(struct sock *sk, u32 ext, int *attr, union tcp_cc_info *info) { const struct dctcp *ca = inet_csk_ca(sk); /* Fill it also in case of VEGASINFO due to req struct limits. * We can still correctly retrieve it later. */ if (ext & (1 << (INET_DIAG_DCTCPINFO - 1)) || ext & (1 << (INET_DIAG_VEGASINFO - 1))) { memset(&info->dctcp, 0, sizeof(info->dctcp)); if (inet_csk(sk)->icsk_ca_ops != &dctcp_reno) { info->dctcp.dctcp_enabled = 1; info->dctcp.dctcp_ce_state = (u16) ca->ce_state; info->dctcp.dctcp_alpha = ca->dctcp_alpha; info->dctcp.dctcp_ab_ecn = ca->acked_bytes_ecn; info->dctcp.dctcp_ab_tot = ca->acked_bytes_total; } *attr = INET_DIAG_DCTCPINFO; return sizeof(info->dctcp); } return 0; } static u32 dctcp_cwnd_undo(struct sock *sk) { const struct dctcp *ca = inet_csk_ca(sk); return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd); } static struct tcp_congestion_ops dctcp __read_mostly = { .init = dctcp_init, .in_ack_event = dctcp_update_alpha, .cwnd_event = dctcp_cwnd_event, .ssthresh = dctcp_ssthresh, .cong_avoid = tcp_reno_cong_avoid, .undo_cwnd = dctcp_cwnd_undo, .set_state = dctcp_state, .get_info = dctcp_get_info, .flags = TCP_CONG_NEEDS_ECN, .owner = THIS_MODULE, .name = "dctcp", }; static struct tcp_congestion_ops dctcp_reno __read_mostly = { .ssthresh = tcp_reno_ssthresh, .cong_avoid = tcp_reno_cong_avoid, .undo_cwnd = tcp_reno_undo_cwnd, .get_info = dctcp_get_info, .owner = THIS_MODULE, .name = "dctcp-reno", }; static int __init dctcp_register(void) { BUILD_BUG_ON(sizeof(struct dctcp) > ICSK_CA_PRIV_SIZE); return tcp_register_congestion_control(&dctcp); } static void __exit dctcp_unregister(void) { tcp_unregister_congestion_control(&dctcp); } module_init(dctcp_register); module_exit(dctcp_unregister); MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>"); MODULE_AUTHOR("Florian Westphal <fw@strlen.de>"); MODULE_AUTHOR("Glenn Judd <glenn.judd@morganstanley.com>"); MODULE_LICENSE("GPL v2"); MODULE_DESCRIPTION("DataCenter TCP (DCTCP)");
73 392 421 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 // SPDX-License-Identifier: GPL-2.0 /* * linux/fs/ext4/xattr_trusted.c * Handler for trusted extended attributes. * * Copyright (C) 2003 by Andreas Gruenbacher, <a.gruenbacher@computer.org> */ #include <linux/string.h> #include <linux/capability.h> #include <linux/fs.h> #include "ext4_jbd2.h" #include "ext4.h" #include "xattr.h" static bool ext4_xattr_trusted_list(struct dentry *dentry) { return capable(CAP_SYS_ADMIN); } static int ext4_xattr_trusted_get(const struct xattr_handler *handler, struct dentry *unused, struct inode *inode, const char *name, void *buffer, size_t size) { return ext4_xattr_get(inode, EXT4_XATTR_INDEX_TRUSTED, name, buffer, size); } static int ext4_xattr_trusted_set(const struct xattr_handler *handler, struct dentry *unused, struct inode *inode, const char *name, const void *value, size_t size, int flags) { return ext4_xattr_set(inode, EXT4_XATTR_INDEX_TRUSTED, name, value, size, flags); } const struct xattr_handler ext4_xattr_trusted_handler = { .prefix = XATTR_TRUSTED_PREFIX, .list = ext4_xattr_trusted_list, .get = ext4_xattr_trusted_get, .set = ext4_xattr_trusted_set, };
49 47 47 1 71 71 71 71 2631 2635 41 2632 1 2633 2 46 46 3 3 3 3 3 3 3 3 3 3 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 // SPDX-License-Identifier: GPL-2.0 /* * Functions related to io context handling */ #include <linux/kernel.h> #include <linux/module.h> #include <linux/init.h> #include <linux/bio.h> #include <linux/blkdev.h> #include <linux/slab.h> #include <linux/sched/task.h> #include "blk.h" /* * For io context allocations */ static struct kmem_cache *iocontext_cachep; /** * get_io_context - increment reference count to io_context * @ioc: io_context to get * * Increment reference count to @ioc. */ void get_io_context(struct io_context *ioc) { BUG_ON(atomic_long_read(&ioc->refcount) <= 0); atomic_long_inc(&ioc->refcount); } EXPORT_SYMBOL(get_io_context); static void icq_free_icq_rcu(struct rcu_head *head) { struct io_cq *icq = container_of(head, struct io_cq, __rcu_head); kmem_cache_free(icq->__rcu_icq_cache, icq); } /* * Exit an icq. Called with ioc locked for blk-mq, and with both ioc * and queue locked for legacy. */ static void ioc_exit_icq(struct io_cq *icq) { struct elevator_type *et = icq->q->elevator->type; if (icq->flags & ICQ_EXITED) return; if (et->uses_mq && et->ops.mq.exit_icq) et->ops.mq.exit_icq(icq); else if (!et->uses_mq && et->ops.sq.elevator_exit_icq_fn) et->ops.sq.elevator_exit_icq_fn(icq); icq->flags |= ICQ_EXITED; } /* * Release an icq. Called with ioc locked for blk-mq, and with both ioc * and queue locked for legacy. */ static void ioc_destroy_icq(struct io_cq *icq) { struct io_context *ioc = icq->ioc; struct request_queue *q = icq->q; struct elevator_type *et = q->elevator->type; lockdep_assert_held(&ioc->lock); radix_tree_delete(&ioc->icq_tree, icq->q->id); hlist_del_init(&icq->ioc_node); list_del_init(&icq->q_node); /* * Both setting lookup hint to and clearing it from @icq are done * under queue_lock. If it's not pointing to @icq now, it never * will. Hint assignment itself can race safely. */ if (rcu_access_pointer(ioc->icq_hint) == icq) rcu_assign_pointer(ioc->icq_hint, NULL); ioc_exit_icq(icq); /* * @icq->q might have gone away by the time RCU callback runs * making it impossible to determine icq_cache. Record it in @icq. */ icq->__rcu_icq_cache = et->icq_cache; icq->flags |= ICQ_DESTROYED; call_rcu(&icq->__rcu_head, icq_free_icq_rcu); } /* * Slow path for ioc release in put_io_context(). Performs double-lock * dancing to unlink all icq's and then frees ioc. */ static void ioc_release_fn(struct work_struct *work) { struct io_context *ioc = container_of(work, struct io_context, release_work); unsigned long flags; /* * Exiting icq may call into put_io_context() through elevator * which will trigger lockdep warning. The ioc's are guaranteed to * be different, use a different locking subclass here. Use * irqsave variant as there's no spin_lock_irq_nested(). */ spin_lock_irqsave_nested(&ioc->lock, flags, 1); while (!hlist_empty(&ioc->icq_list)) { struct io_cq *icq = hlist_entry(ioc->icq_list.first, struct io_cq, ioc_node); struct request_queue *q = icq->q; if (spin_trylock(q->queue_lock)) { ioc_destroy_icq(icq); spin_unlock(q->queue_lock); } else { spin_unlock_irqrestore(&ioc->lock, flags); cpu_relax(); spin_lock_irqsave_nested(&ioc->lock, flags, 1); } } spin_unlock_irqrestore(&ioc->lock, flags); kmem_cache_free(iocontext_cachep, ioc); } /** * put_io_context - put a reference of io_context * @ioc: io_context to put * * Decrement reference count of @ioc and release it if the count reaches * zero. */ void put_io_context(struct io_context *ioc) { unsigned long flags; bool free_ioc = false; if (ioc == NULL) return; BUG_ON(atomic_long_read(&ioc->refcount) <= 0); /* * Releasing ioc requires reverse order double locking and we may * already be holding a queue_lock. Do it asynchronously from wq. */ if (atomic_long_dec_and_test(&ioc->refcount)) { spin_lock_irqsave(&ioc->lock, flags); if (!hlist_empty(&ioc->icq_list)) queue_work(system_power_efficient_wq, &ioc->release_work); else free_ioc = true; spin_unlock_irqrestore(&ioc->lock, flags); } if (free_ioc) kmem_cache_free(iocontext_cachep, ioc); } EXPORT_SYMBOL(put_io_context); /** * put_io_context_active - put active reference on ioc * @ioc: ioc of interest * * Undo get_io_context_active(). If active reference reaches zero after * put, @ioc can never issue further IOs and ioscheds are notified. */ void put_io_context_active(struct io_context *ioc) { struct elevator_type *et; unsigned long flags; struct io_cq *icq; if (!atomic_dec_and_test(&ioc->active_ref)) { put_io_context(ioc); return; } /* * Need ioc lock to walk icq_list and q lock to exit icq. Perform * reverse double locking. Read comment in ioc_release_fn() for * explanation on the nested locking annotation. */ retry: spin_lock_irqsave_nested(&ioc->lock, flags, 1); hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) { if (icq->flags & ICQ_EXITED) continue; et = icq->q->elevator->type; if (et->uses_mq) { ioc_exit_icq(icq); } else { if (spin_trylock(icq->q->queue_lock)) { ioc_exit_icq(icq); spin_unlock(icq->q->queue_lock); } else { spin_unlock_irqrestore(&ioc->lock, flags); cpu_relax(); goto retry; } } } spin_unlock_irqrestore(&ioc->lock, flags); put_io_context(ioc); } /* Called by the exiting task */ void exit_io_context(struct task_struct *task) { struct io_context *ioc; task_lock(task); ioc = task->io_context; task->io_context = NULL; task_unlock(task); atomic_dec(&ioc->nr_tasks); put_io_context_active(ioc); } static void __ioc_clear_queue(struct list_head *icq_list) { unsigned long flags; rcu_read_lock(); while (!list_empty(icq_list)) { struct io_cq *icq = list_entry(icq_list->next, struct io_cq, q_node); struct io_context *ioc = icq->ioc; spin_lock_irqsave(&ioc->lock, flags); if (icq->flags & ICQ_DESTROYED) { spin_unlock_irqrestore(&ioc->lock, flags); continue; } ioc_destroy_icq(icq); spin_unlock_irqrestore(&ioc->lock, flags); } rcu_read_unlock(); } /** * ioc_clear_queue - break any ioc association with the specified queue * @q: request_queue being cleared * * Walk @q->icq_list and exit all io_cq's. */ void ioc_clear_queue(struct request_queue *q) { LIST_HEAD(icq_list); spin_lock_irq(q->queue_lock); list_splice_init(&q->icq_list, &icq_list); if (q->mq_ops) { spin_unlock_irq(q->queue_lock); __ioc_clear_queue(&icq_list); } else { __ioc_clear_queue(&icq_list); spin_unlock_irq(q->queue_lock); } } int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node) { struct io_context *ioc; int ret; ioc = kmem_cache_alloc_node(iocontext_cachep, gfp_flags | __GFP_ZERO, node); if (unlikely(!ioc)) return -ENOMEM; /* initialize */ atomic_long_set(&ioc->refcount, 1); atomic_set(&ioc->nr_tasks, 1); atomic_set(&ioc->active_ref, 1); spin_lock_init(&ioc->lock); INIT_RADIX_TREE(&ioc->icq_tree, GFP_ATOMIC); INIT_HLIST_HEAD(&ioc->icq_list); INIT_WORK(&ioc->release_work, ioc_release_fn); /* * Try to install. ioc shouldn't be installed if someone else * already did or @task, which isn't %current, is exiting. Note * that we need to allow ioc creation on exiting %current as exit * path may issue IOs from e.g. exit_files(). The exit path is * responsible for not issuing IO after exit_io_context(). */ task_lock(task); if (!task->io_context && (task == current || !(task->flags & PF_EXITING))) task->io_context = ioc; else kmem_cache_free(iocontext_cachep, ioc); ret = task->io_context ? 0 : -EBUSY; task_unlock(task); return ret; } /** * get_task_io_context - get io_context of a task * @task: task of interest * @gfp_flags: allocation flags, used if allocation is necessary * @node: allocation node, used if allocation is necessary * * Return io_context of @task. If it doesn't exist, it is created with * @gfp_flags and @node. The returned io_context has its reference count * incremented. * * This function always goes through task_lock() and it's better to use * %current->io_context + get_io_context() for %current. */ struct io_context *get_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node) { struct io_context *ioc; might_sleep_if(gfpflags_allow_blocking(gfp_flags)); do { task_lock(task); ioc = task->io_context; if (likely(ioc)) { get_io_context(ioc); task_unlock(task); return ioc; } task_unlock(task); } while (!create_task_io_context(task, gfp_flags, node)); return NULL; } EXPORT_SYMBOL(get_task_io_context); /** * ioc_lookup_icq - lookup io_cq from ioc * @ioc: the associated io_context * @q: the associated request_queue * * Look up io_cq associated with @ioc - @q pair from @ioc. Must be called * with @q->queue_lock held. */ struct io_cq *ioc_lookup_icq(struct io_context *ioc, struct request_queue *q) { struct io_cq *icq; lockdep_assert_held(q->queue_lock); /* * icq's are indexed from @ioc using radix tree and hint pointer, * both of which are protected with RCU. All removals are done * holding both q and ioc locks, and we're holding q lock - if we * find a icq which points to us, it's guaranteed to be valid. */ rcu_read_lock(); icq = rcu_dereference(ioc->icq_hint); if (icq && icq->q == q) goto out; icq = radix_tree_lookup(&ioc->icq_tree, q->id); if (icq && icq->q == q) rcu_assign_pointer(ioc->icq_hint, icq); /* allowed to race */ else icq = NULL; out: rcu_read_unlock(); return icq; } EXPORT_SYMBOL(ioc_lookup_icq); /** * ioc_create_icq - create and link io_cq * @ioc: io_context of interest * @q: request_queue of interest * @gfp_mask: allocation mask * * Make sure io_cq linking @ioc and @q exists. If icq doesn't exist, they * will be created using @gfp_mask. * * The caller is responsible for ensuring @ioc won't go away and @q is * alive and will stay alive until this function returns. */ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q, gfp_t gfp_mask) { struct elevator_type *et = q->elevator->type; struct io_cq *icq; /* allocate stuff */ icq = kmem_cache_alloc_node(et->icq_cache, gfp_mask | __GFP_ZERO, q->node); if (!icq) return NULL; if (radix_tree_maybe_preload(gfp_mask) < 0) { kmem_cache_free(et->icq_cache, icq); return NULL; } icq->ioc = ioc; icq->q = q; INIT_LIST_HEAD(&icq->q_node); INIT_HLIST_NODE(&icq->ioc_node); /* lock both q and ioc and try to link @icq */ spin_lock_irq(q->queue_lock); spin_lock(&ioc->lock); if (likely(!radix_tree_insert(&ioc->icq_tree, q->id, icq))) { hlist_add_head(&icq->ioc_node, &ioc->icq_list); list_add(&icq->q_node, &q->icq_list); if (et->uses_mq && et->ops.mq.init_icq) et->ops.mq.init_icq(icq); else if (!et->uses_mq && et->ops.sq.elevator_init_icq_fn) et->ops.sq.elevator_init_icq_fn(icq); } else { kmem_cache_free(et->icq_cache, icq); icq = ioc_lookup_icq(ioc, q); if (!icq) printk(KERN_ERR "cfq: icq link failed!\n"); } spin_unlock(&ioc->lock); spin_unlock_irq(q->queue_lock); radix_tree_preload_end(); return icq; } static int __init blk_ioc_init(void) { iocontext_cachep = kmem_cache_create("blkdev_ioc", sizeof(struct io_context), 0, SLAB_PANIC, NULL); return 0; } subsys_initcall(blk_ioc_init);
13377 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _ASM_X86_CPUFEATURE_H #define _ASM_X86_CPUFEATURE_H #include <asm/processor.h> #if defined(__KERNEL__) && !defined(__ASSEMBLY__) #include <asm/asm.h> #include <linux/bitops.h> enum cpuid_leafs { CPUID_1_EDX = 0, CPUID_8000_0001_EDX, CPUID_8086_0001_EDX, CPUID_LNX_1, CPUID_1_ECX, CPUID_C000_0001_EDX, CPUID_8000_0001_ECX, CPUID_LNX_2, CPUID_LNX_3, CPUID_7_0_EBX, CPUID_D_1_EAX, CPUID_LNX_4, CPUID_DUMMY, CPUID_8000_0008_EBX, CPUID_6_EAX, CPUID_8000_000A_EDX, CPUID_7_ECX, CPUID_8000_0007_EBX, CPUID_7_EDX, }; #ifdef CONFIG_X86_FEATURE_NAMES extern const char * const x86_cap_flags[NCAPINTS*32]; extern const char * const x86_power_flags[32]; #define X86_CAP_FMT "%s" #define x86_cap_flag(flag) x86_cap_flags[flag] #else #define X86_CAP_FMT "%d:%d" #define x86_cap_flag(flag) ((flag) >> 5), ((flag) & 31) #endif /* * In order to save room, we index into this array by doing * X86_BUG_<name> - NCAPINTS*32. */ extern const char * const x86_bug_flags[NBUGINTS*32]; #define test_cpu_cap(c, bit) \ test_bit(bit, (unsigned long *)((c)->x86_capability)) /* * There are 32 bits/features in each mask word. The high bits * (selected with (bit>>5) give us the word number and the low 5 * bits give us the bit/feature number inside the word. * (1UL<<((bit)&31) gives us a mask for the feature_bit so we can * see if it is set in the mask word. */ #define CHECK_BIT_IN_MASK_WORD(maskname, word, bit) \ (((bit)>>5)==(word) && (1UL<<((bit)&31) & maskname##word )) #define REQUIRED_MASK_BIT_SET(feature_bit) \ ( CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 0, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 1, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 2, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 3, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 4, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 5, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 6, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 7, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 8, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 9, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 10, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 11, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 12, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 13, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 14, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 15, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) || \ REQUIRED_MASK_CHECK || \ BUILD_BUG_ON_ZERO(NCAPINTS != 19)) #define DISABLED_MASK_BIT_SET(feature_bit) \ ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 0, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 1, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 2, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 3, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 4, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 5, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 6, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 7, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 8, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 9, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 10, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 11, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 12, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 13, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 14, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 15, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) || \ DISABLED_MASK_CHECK || \ BUILD_BUG_ON_ZERO(NCAPINTS != 19)) #define cpu_has(c, bit) \ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \ test_cpu_cap(c, bit)) #define this_cpu_has(bit) \ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \ x86_this_cpu_test_bit(bit, (unsigned long *)&cpu_info.x86_capability)) /* * This macro is for detection of features which need kernel * infrastructure to be used. It may *not* directly test the CPU * itself. Use the cpu_has() family if you want true runtime * testing of CPU features, like in hypervisor code where you are * supporting a possible guest feature where host support for it * is not relevant. */ #define cpu_feature_enabled(bit) \ (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit)) #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit) #define set_cpu_cap(c, bit) set_bit(bit, (unsigned long *)((c)->x86_capability)) extern void setup_clear_cpu_cap(unsigned int bit); extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit); #define setup_force_cpu_cap(bit) do { \ set_cpu_cap(&boot_cpu_data, bit); \ set_bit(bit, (unsigned long *)cpu_caps_set); \ } while (0) #define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit) #if defined(__clang__) && !defined(CONFIG_CC_HAS_ASM_GOTO) /* * Workaround for the sake of BPF compilation which utilizes kernel * headers, but clang does not support ASM GOTO and fails the build. */ #ifndef __BPF_TRACING__ #warning "Compiler lacks ASM_GOTO support. Add -D __BPF_TRACING__ to your compiler arguments" #endif #define static_cpu_has(bit) boot_cpu_has(bit) #else /* * Static testing of CPU features. Used the same as boot_cpu_has(). * These will statically patch the target code for additional * performance. */ static __always_inline __pure bool _static_cpu_has(u16 bit) { asm_volatile_goto("1: jmp 6f\n" "2:\n" ".skip -(((5f-4f) - (2b-1b)) > 0) * " "((5f-4f) - (2b-1b)),0x90\n" "3:\n" ".section .altinstructions,\"a\"\n" " .long 1b - .\n" /* src offset */ " .long 4f - .\n" /* repl offset */ " .word %P[always]\n" /* always replace */ " .byte 3b - 1b\n" /* src len */ " .byte 5f - 4f\n" /* repl len */ " .byte 3b - 2b\n" /* pad len */ ".previous\n" ".section .altinstr_replacement,\"ax\"\n" "4: jmp %l[t_no]\n" "5:\n" ".previous\n" ".section .altinstructions,\"a\"\n" " .long 1b - .\n" /* src offset */ " .long 0\n" /* no replacement */ " .word %P[feature]\n" /* feature bit */ " .byte 3b - 1b\n" /* src len */ " .byte 0\n" /* repl len */ " .byte 0\n" /* pad len */ ".previous\n" ".section .altinstr_aux,\"ax\"\n" "6:\n" " testb %[bitnum],%[cap_byte]\n" " jnz %l[t_yes]\n" " jmp %l[t_no]\n" ".previous\n" : : [feature] "i" (bit), [always] "i" (X86_FEATURE_ALWAYS), [bitnum] "i" (1 << (bit & 7)), [cap_byte] "m" (((const char *)boot_cpu_data.x86_capability)[bit >> 3]) : : t_yes, t_no); t_yes: return true; t_no: return false; } #define static_cpu_has(bit) \ ( \ __builtin_constant_p(boot_cpu_has(bit)) ? \ boot_cpu_has(bit) : \ _static_cpu_has(bit) \ ) #endif #define cpu_has_bug(c, bit) cpu_has(c, (bit)) #define set_cpu_bug(c, bit) set_cpu_cap(c, (bit)) #define clear_cpu_bug(c, bit) clear_cpu_cap(c, (bit)) #define static_cpu_has_bug(bit) static_cpu_has((bit)) #define boot_cpu_has_bug(bit) cpu_has_bug(&boot_cpu_data, (bit)) #define boot_cpu_set_bug(bit) set_cpu_cap(&boot_cpu_data, (bit)) #define MAX_CPU_FEATURES (NCAPINTS * 32) #define cpu_have_feature boot_cpu_has #define CPU_FEATURE_TYPEFMT "x86,ven%04Xfam%04Xmod%04X" #define CPU_FEATURE_TYPEVAL boot_cpu_data.x86_vendor, boot_cpu_data.x86, \ boot_cpu_data.x86_model #endif /* defined(__KERNEL__) && !defined(__ASSEMBLY__) */ #endif /* _ASM_X86_CPUFEATURE_H */
114 114 63 11 24 3 3 63 54 25 54 54 54 54 54 54 54 29 54 35 54 54 30 24 3 54 35 54 1 54 54 63 54 54 22 22 22 22 30 25 30 5 5 30 24 30 12 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 /* * OSS compatible sequencer driver * * open/close and reset interface * * Copyright (C) 1998-1999 Takashi Iwai <tiwai@suse.de> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #include "seq_oss_device.h" #include "seq_oss_synth.h" #include "seq_oss_midi.h" #include "seq_oss_writeq.h" #include "seq_oss_readq.h" #include "seq_oss_timer.h" #include "seq_oss_event.h" #include <linux/init.h> #include <linux/export.h> #include <linux/moduleparam.h> #include <linux/slab.h> #include <linux/workqueue.h> /* * common variables */ static int maxqlen = SNDRV_SEQ_OSS_MAX_QLEN; module_param(maxqlen, int, 0444); MODULE_PARM_DESC(maxqlen, "maximum queue length"); static int system_client = -1; /* ALSA sequencer client number */ static int system_port = -1; static int num_clients; static struct seq_oss_devinfo *client_table[SNDRV_SEQ_OSS_MAX_CLIENTS]; /* * prototypes */ static int receive_announce(struct snd_seq_event *ev, int direct, void *private, int atomic, int hop); static int translate_mode(struct file *file); static int create_port(struct seq_oss_devinfo *dp); static int delete_port(struct seq_oss_devinfo *dp); static int alloc_seq_queue(struct seq_oss_devinfo *dp); static int delete_seq_queue(int queue); static void free_devinfo(void *private); #define call_ctl(type,rec) snd_seq_kernel_client_ctl(system_client, type, rec) /* call snd_seq_oss_midi_lookup_ports() asynchronously */ static void async_call_lookup_ports(struct work_struct *work) { snd_seq_oss_midi_lookup_ports(system_client); } static DECLARE_WORK(async_lookup_work, async_call_lookup_ports); /* * create sequencer client for OSS sequencer */ int __init snd_seq_oss_create_client(void) { int rc; struct snd_seq_port_info *port; struct snd_seq_port_callback port_callback; port = kmalloc(sizeof(*port), GFP_KERNEL); if (!port) { rc = -ENOMEM; goto __error; } /* create ALSA client */ rc = snd_seq_create_kernel_client(NULL, SNDRV_SEQ_CLIENT_OSS, "OSS sequencer"); if (rc < 0) goto __error; system_client = rc; /* create annoucement receiver port */ memset(port, 0, sizeof(*port)); strcpy(port->name, "Receiver"); port->addr.client = system_client; port->capability = SNDRV_SEQ_PORT_CAP_WRITE; /* receive only */ port->type = 0; memset(&port_callback, 0, sizeof(port_callback)); /* don't set port_callback.owner here. otherwise the module counter * is incremented and we can no longer release the module.. */ port_callback.event_input = receive_announce; port->kernel = &port_callback; call_ctl(SNDRV_SEQ_IOCTL_CREATE_PORT, port); if ((system_port = port->addr.port) >= 0) { struct snd_seq_port_subscribe subs; memset(&subs, 0, sizeof(subs)); subs.sender.client = SNDRV_SEQ_CLIENT_SYSTEM; subs.sender.port = SNDRV_SEQ_PORT_SYSTEM_ANNOUNCE; subs.dest.client = system_client; subs.dest.port = system_port; call_ctl(SNDRV_SEQ_IOCTL_SUBSCRIBE_PORT, &subs); } rc = 0; /* look up midi devices */ schedule_work(&async_lookup_work); __error: kfree(port); return rc; } /* * receive annoucement from system port, and check the midi device */ static int receive_announce(struct snd_seq_event *ev, int direct, void *private, int atomic, int hop) { struct snd_seq_port_info pinfo; if (atomic) return 0; /* it must not happen */ switch (ev->type) { case SNDRV_SEQ_EVENT_PORT_START: case SNDRV_SEQ_EVENT_PORT_CHANGE: if (ev->data.addr.client == system_client) break; /* ignore myself */ memset(&pinfo, 0, sizeof(pinfo)); pinfo.addr = ev->data.addr; if (call_ctl(SNDRV_SEQ_IOCTL_GET_PORT_INFO, &pinfo) >= 0) snd_seq_oss_midi_check_new_port(&pinfo); break; case SNDRV_SEQ_EVENT_PORT_EXIT: if (ev->data.addr.client == system_client) break; /* ignore myself */ snd_seq_oss_midi_check_exit_port(ev->data.addr.client, ev->data.addr.port); break; } return 0; } /* * delete OSS sequencer client */ int snd_seq_oss_delete_client(void) { cancel_work_sync(&async_lookup_work); if (system_client >= 0) snd_seq_delete_kernel_client(system_client); snd_seq_oss_midi_clear_all(); return 0; } /* * open sequencer device */ int snd_seq_oss_open(struct file *file, int level) { int i, rc; struct seq_oss_devinfo *dp; dp = kzalloc(sizeof(*dp), GFP_KERNEL); if (!dp) return -ENOMEM; dp->cseq = system_client; dp->port = -1; dp->queue = -1; for (i = 0; i < SNDRV_SEQ_OSS_MAX_CLIENTS; i++) { if (client_table[i] == NULL) break; } dp->index = i; if (i >= SNDRV_SEQ_OSS_MAX_CLIENTS) { pr_debug("ALSA: seq_oss: too many applications\n"); rc = -ENOMEM; goto _error; } /* look up synth and midi devices */ snd_seq_oss_synth_setup(dp); snd_seq_oss_midi_setup(dp); if (dp->synth_opened == 0 && dp->max_mididev == 0) { /* pr_err("ALSA: seq_oss: no device found\n"); */ rc = -ENODEV; goto _error; } /* create port */ rc = create_port(dp); if (rc < 0) { pr_err("ALSA: seq_oss: can't create port\n"); goto _error; } /* allocate queue */ rc = alloc_seq_queue(dp); if (rc < 0) goto _error; /* set address */ dp->addr.client = dp->cseq; dp->addr.port = dp->port; /*dp->addr.queue = dp->queue;*/ /*dp->addr.channel = 0;*/ dp->seq_mode = level; /* set up file mode */ dp->file_mode = translate_mode(file); /* initialize read queue */ if (is_read_mode(dp->file_mode)) { dp->readq = snd_seq_oss_readq_new(dp, maxqlen); if (!dp->readq) { rc = -ENOMEM; goto _error; } } /* initialize write queue */ if (is_write_mode(dp->file_mode)) { dp->writeq = snd_seq_oss_writeq_new(dp, maxqlen); if (!dp->writeq) { rc = -ENOMEM; goto _error; } } /* initialize timer */ dp->timer = snd_seq_oss_timer_new(dp); if (!dp->timer) { pr_err("ALSA: seq_oss: can't alloc timer\n"); rc = -ENOMEM; goto _error; } /* set private data pointer */ file->private_data = dp; /* set up for mode2 */ if (level == SNDRV_SEQ_OSS_MODE_MUSIC) snd_seq_oss_synth_setup_midi(dp); else if (is_read_mode(dp->file_mode)) snd_seq_oss_midi_open_all(dp, SNDRV_SEQ_OSS_FILE_READ); client_table[dp->index] = dp; num_clients++; return 0; _error: snd_seq_oss_synth_cleanup(dp); snd_seq_oss_midi_cleanup(dp); delete_seq_queue(dp->queue); delete_port(dp); return rc; } /* * translate file flags to private mode */ static int translate_mode(struct file *file) { int file_mode = 0; if ((file->f_flags & O_ACCMODE) != O_RDONLY) file_mode |= SNDRV_SEQ_OSS_FILE_WRITE; if ((file->f_flags & O_ACCMODE) != O_WRONLY) file_mode |= SNDRV_SEQ_OSS_FILE_READ; if (file->f_flags & O_NONBLOCK) file_mode |= SNDRV_SEQ_OSS_FILE_NONBLOCK; return file_mode; } /* * create sequencer port */ static int create_port(struct seq_oss_devinfo *dp) { int rc; struct snd_seq_port_info port; struct snd_seq_port_callback callback; memset(&port, 0, sizeof(port)); port.addr.client = dp->cseq; sprintf(port.name, "Sequencer-%d", dp->index); port.capability = SNDRV_SEQ_PORT_CAP_READ|SNDRV_SEQ_PORT_CAP_WRITE; /* no subscription */ port.type = SNDRV_SEQ_PORT_TYPE_SPECIFIC; port.midi_channels = 128; port.synth_voices = 128; memset(&callback, 0, sizeof(callback)); callback.owner = THIS_MODULE; callback.private_data = dp; callback.event_input = snd_seq_oss_event_input; callback.private_free = free_devinfo; port.kernel = &callback; rc = call_ctl(SNDRV_SEQ_IOCTL_CREATE_PORT, &port); if (rc < 0) return rc; dp->port = port.addr.port; return 0; } /* * delete ALSA port */ static int delete_port(struct seq_oss_devinfo *dp) { if (dp->port < 0) { kfree(dp); return 0; } return snd_seq_event_port_detach(dp->cseq, dp->port); } /* * allocate a queue */ static int alloc_seq_queue(struct seq_oss_devinfo *dp) { struct snd_seq_queue_info qinfo; int rc; memset(&qinfo, 0, sizeof(qinfo)); qinfo.owner = system_client; qinfo.locked = 1; strcpy(qinfo.name, "OSS Sequencer Emulation"); if ((rc = call_ctl(SNDRV_SEQ_IOCTL_CREATE_QUEUE, &qinfo)) < 0) return rc; dp->queue = qinfo.queue; return 0; } /* * release queue */ static int delete_seq_queue(int queue) { struct snd_seq_queue_info qinfo; int rc; if (queue < 0) return 0; memset(&qinfo, 0, sizeof(qinfo)); qinfo.queue = queue; rc = call_ctl(SNDRV_SEQ_IOCTL_DELETE_QUEUE, &qinfo); if (rc < 0) pr_err("ALSA: seq_oss: unable to delete queue %d (%d)\n", queue, rc); return rc; } /* * free device informations - private_free callback of port */ static void free_devinfo(void *private) { struct seq_oss_devinfo *dp = (struct seq_oss_devinfo *)private; snd_seq_oss_timer_delete(dp->timer); snd_seq_oss_writeq_delete(dp->writeq); snd_seq_oss_readq_delete(dp->readq); kfree(dp); } /* * close sequencer device */ void snd_seq_oss_release(struct seq_oss_devinfo *dp) { int queue; client_table[dp->index] = NULL; num_clients--; snd_seq_oss_reset(dp); snd_seq_oss_synth_cleanup(dp); snd_seq_oss_midi_cleanup(dp); /* clear slot */ queue = dp->queue; if (dp->port >= 0) delete_port(dp); delete_seq_queue(queue); } /* * reset sequencer devices */ void snd_seq_oss_reset(struct seq_oss_devinfo *dp) { int i; /* reset all synth devices */ for (i = 0; i < dp->max_synthdev; i++) snd_seq_oss_synth_reset(dp, i); /* reset all midi devices */ if (dp->seq_mode != SNDRV_SEQ_OSS_MODE_MUSIC) { for (i = 0; i < dp->max_mididev; i++) snd_seq_oss_midi_reset(dp, i); } /* remove queues */ if (dp->readq) snd_seq_oss_readq_clear(dp->readq); if (dp->writeq) snd_seq_oss_writeq_clear(dp->writeq); /* reset timer */ snd_seq_oss_timer_stop(dp->timer); } #ifdef CONFIG_SND_PROC_FS /* * misc. functions for proc interface */ char * enabled_str(int bool) { return bool ? "enabled" : "disabled"; } static char * filemode_str(int val) { static char *str[] = { "none", "read", "write", "read/write", }; return str[val & SNDRV_SEQ_OSS_FILE_ACMODE]; } /* * proc interface */ void snd_seq_oss_system_info_read(struct snd_info_buffer *buf) { int i; struct seq_oss_devinfo *dp; snd_iprintf(buf, "ALSA client number %d\n", system_client); snd_iprintf(buf, "ALSA receiver port %d\n", system_port); snd_iprintf(buf, "\nNumber of applications: %d\n", num_clients); for (i = 0; i < num_clients; i++) { snd_iprintf(buf, "\nApplication %d: ", i); if ((dp = client_table[i]) == NULL) { snd_iprintf(buf, "*empty*\n"); continue; } snd_iprintf(buf, "port %d : queue %d\n", dp->port, dp->queue); snd_iprintf(buf, " sequencer mode = %s : file open mode = %s\n", (dp->seq_mode ? "music" : "synth"), filemode_str(dp->file_mode)); if (dp->seq_mode) snd_iprintf(buf, " timer tempo = %d, timebase = %d\n", dp->timer->oss_tempo, dp->timer->oss_timebase); snd_iprintf(buf, " max queue length %d\n", maxqlen); if (is_read_mode(dp->file_mode) && dp->readq) snd_seq_oss_readq_info_read(dp->readq, buf); } } #endif /* CONFIG_SND_PROC_FS */
5 5 5 15 15 15 18 15 17 16 18 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 /* * IP Payload Compression Protocol (IPComp) for IPv6 - RFC3173 * * Copyright (C)2003 USAGI/WIDE Project * * Author Mitsuru KANDA <mk@linux-ipv6.org> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, see <http://www.gnu.org/licenses/>. */ /* * [Memo] * * Outbound: * The compression of IP datagram MUST be done before AH/ESP processing, * fragmentation, and the addition of Hop-by-Hop/Routing header. * * Inbound: * The decompression of IP datagram MUST be done after the reassembly, * AH/ESP processing. */ #define pr_fmt(fmt) "IPv6: " fmt #include <linux/module.h> #include <net/ip.h> #include <net/xfrm.h> #include <net/ipcomp.h> #include <linux/crypto.h> #include <linux/err.h> #include <linux/pfkeyv2.h> #include <linux/random.h> #include <linux/percpu.h> #include <linux/smp.h> #include <linux/list.h> #include <linux/vmalloc.h> #include <linux/rtnetlink.h> #include <net/ip6_route.h> #include <net/icmp.h> #include <net/ipv6.h> #include <net/protocol.h> #include <linux/ipv6.h> #include <linux/icmpv6.h> #include <linux/mutex.h> static int ipcomp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, u8 type, u8 code, int offset, __be32 info) { struct net *net = dev_net(skb->dev); __be32 spi; const struct ipv6hdr *iph = (const struct ipv6hdr *)skb->data; struct ip_comp_hdr *ipcomph = (struct ip_comp_hdr *)(skb->data + offset); struct xfrm_state *x; if (type != ICMPV6_PKT_TOOBIG && type != NDISC_REDIRECT) return 0; spi = htonl(ntohs(ipcomph->cpi)); x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr, spi, IPPROTO_COMP, AF_INET6); if (!x) return 0; if (type == NDISC_REDIRECT) ip6_redirect(skb, net, skb->dev->ifindex, 0, sock_net_uid(net, NULL)); else ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL)); xfrm_state_put(x); return 0; } static struct xfrm_state *ipcomp6_tunnel_create(struct xfrm_state *x) { struct net *net = xs_net(x); struct xfrm_state *t = NULL; t = xfrm_state_alloc(net); if (!t) goto out; t->id.proto = IPPROTO_IPV6; t->id.spi = xfrm6_tunnel_alloc_spi(net, (xfrm_address_t *)&x->props.saddr); if (!t->id.spi) goto error; memcpy(t->id.daddr.a6, x->id.daddr.a6, sizeof(struct in6_addr)); memcpy(&t->sel, &x->sel, sizeof(t->sel)); t->props.family = AF_INET6; t->props.mode = x->props.mode; memcpy(t->props.saddr.a6, x->props.saddr.a6, sizeof(struct in6_addr)); memcpy(&t->mark, &x->mark, sizeof(t->mark)); if (xfrm_init_state(t)) goto error; atomic_set(&t->tunnel_users, 1); out: return t; error: t->km.state = XFRM_STATE_DEAD; xfrm_state_put(t); t = NULL; goto out; } static int ipcomp6_tunnel_attach(struct xfrm_state *x) { struct net *net = xs_net(x); int err = 0; struct xfrm_state *t = NULL; __be32 spi; u32 mark = x->mark.m & x->mark.v; spi = xfrm6_tunnel_spi_lookup(net, (xfrm_address_t *)&x->props.saddr); if (spi) t = xfrm_state_lookup(net, mark, (xfrm_address_t *)&x->id.daddr, spi, IPPROTO_IPV6, AF_INET6); if (!t) { t = ipcomp6_tunnel_create(x); if (!t) { err = -EINVAL; goto out; } xfrm_state_insert(t); xfrm_state_hold(t); } x->tunnel = t; atomic_inc(&t->tunnel_users); out: return err; } static int ipcomp6_init_state(struct xfrm_state *x) { int err = -EINVAL; x->props.header_len = 0; switch (x->props.mode) { case XFRM_MODE_TRANSPORT: break; case XFRM_MODE_TUNNEL: x->props.header_len += sizeof(struct ipv6hdr); break; default: goto out; } err = ipcomp_init_state(x); if (err) goto out; if (x->props.mode == XFRM_MODE_TUNNEL) { err = ipcomp6_tunnel_attach(x); if (err) goto out; } err = 0; out: return err; } static int ipcomp6_rcv_cb(struct sk_buff *skb, int err) { return 0; } static const struct xfrm_type ipcomp6_type = { .description = "IPCOMP6", .owner = THIS_MODULE, .proto = IPPROTO_COMP, .init_state = ipcomp6_init_state, .destructor = ipcomp_destroy, .input = ipcomp_input, .output = ipcomp_output, .hdr_offset = xfrm6_find_1stfragopt, }; static struct xfrm6_protocol ipcomp6_protocol = { .handler = xfrm6_rcv, .cb_handler = ipcomp6_rcv_cb, .err_handler = ipcomp6_err, .priority = 0, }; static int __init ipcomp6_init(void) { if (xfrm_register_type(&ipcomp6_type, AF_INET6) < 0) { pr_info("%s: can't add xfrm type\n", __func__); return -EAGAIN; } if (xfrm6_protocol_register(&ipcomp6_protocol, IPPROTO_COMP) < 0) { pr_info("%s: can't add protocol\n", __func__); xfrm_unregister_type(&ipcomp6_type, AF_INET6); return -EAGAIN; } return 0; } static void __exit ipcomp6_fini(void) { if (xfrm6_protocol_deregister(&ipcomp6_protocol, IPPROTO_COMP) < 0) pr_info("%s: can't remove protocol\n", __func__); if (xfrm_unregister_type(&ipcomp6_type, AF_INET6) < 0) pr_info("%s: can't remove xfrm type\n", __func__); } module_init(ipcomp6_init); module_exit(ipcomp6_fini); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("IP Payload Compression Protocol (IPComp) for IPv6 - RFC3173"); MODULE_AUTHOR("Mitsuru KANDA <mk@linux-ipv6.org>"); MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_COMP);
13 13 13 13 13 13 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 /* SPDX-License-Identifier: GPL-2.0 */ /* * Copyright (C) 2007 Oracle. All rights reserved. */ #ifndef BTRFS_VOLUMES_H #define BTRFS_VOLUMES_H #include <linux/bio.h> #include <linux/sort.h> #include <linux/btrfs.h> #include "async-thread.h" #define BTRFS_MAX_DATA_CHUNK_SIZE (10ULL * SZ_1G) extern struct mutex uuid_mutex; #define BTRFS_STRIPE_LEN SZ_64K struct buffer_head; struct btrfs_pending_bios { struct bio *head; struct bio *tail; }; /* * Use sequence counter to get consistent device stat data on * 32-bit processors. */ #if BITS_PER_LONG==32 && defined(CONFIG_SMP) #include <linux/seqlock.h> #define __BTRFS_NEED_DEVICE_DATA_ORDERED #define btrfs_device_data_ordered_init(device) \ seqcount_init(&device->data_seqcount) #else #define btrfs_device_data_ordered_init(device) do { } while (0) #endif #define BTRFS_DEV_STATE_WRITEABLE (0) #define BTRFS_DEV_STATE_IN_FS_METADATA (1) #define BTRFS_DEV_STATE_MISSING (2) #define BTRFS_DEV_STATE_REPLACE_TGT (3) #define BTRFS_DEV_STATE_FLUSH_SENT (4) struct btrfs_device { struct list_head dev_list; struct list_head dev_alloc_list; struct btrfs_fs_devices *fs_devices; struct btrfs_fs_info *fs_info; struct rcu_string *name; u64 generation; spinlock_t io_lock ____cacheline_aligned; int running_pending; /* When true means this device has pending chunk alloc in * current transaction. Protected by chunk_mutex. */ bool has_pending_chunks; /* regular prio bios */ struct btrfs_pending_bios pending_bios; /* sync bios */ struct btrfs_pending_bios pending_sync_bios; struct block_device *bdev; /* the mode sent to blkdev_get */ fmode_t mode; unsigned long dev_state; blk_status_t last_flush_error; int flush_bio_sent; #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED seqcount_t data_seqcount; #endif /* the internal btrfs device id */ u64 devid; /* size of the device in memory */ u64 total_bytes; /* size of the device on disk */ u64 disk_total_bytes; /* bytes used */ u64 bytes_used; /* optimal io alignment for this device */ u32 io_align; /* optimal io width for this device */ u32 io_width; /* type and info about this device */ u64 type; /* minimal io size for this device */ u32 sector_size; /* physical drive uuid (or lvm uuid) */ u8 uuid[BTRFS_UUID_SIZE]; /* * size of the device on the current transaction * * This variant is update when committing the transaction, * and protected by device_list_mutex */ u64 commit_total_bytes; /* bytes used on the current transaction */ u64 commit_bytes_used; /* * used to manage the device which is resized * * It is protected by chunk_lock. */ struct list_head resized_list; /* for sending down flush barriers */ struct bio *flush_bio; struct completion flush_wait; /* per-device scrub information */ struct scrub_ctx *scrub_ctx; struct btrfs_work work; struct rcu_head rcu; /* readahead state */ atomic_t reada_in_flight; u64 reada_next; struct reada_zone *reada_curr_zone; struct radix_tree_root reada_zones; struct radix_tree_root reada_extents; /* disk I/O failure stats. For detailed description refer to * enum btrfs_dev_stat_values in ioctl.h */ int dev_stats_valid; /* Counter to record the change of device stats */ atomic_t dev_stats_ccnt; atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX]; }; /* * If we read those variants at the context of their own lock, we needn't * use the following helpers, reading them directly is safe. */ #if BITS_PER_LONG==32 && defined(CONFIG_SMP) #define BTRFS_DEVICE_GETSET_FUNCS(name) \ static inline u64 \ btrfs_device_get_##name(const struct btrfs_device *dev) \ { \ u64 size; \ unsigned int seq; \ \ do { \ seq = read_seqcount_begin(&dev->data_seqcount); \ size = dev->name; \ } while (read_seqcount_retry(&dev->data_seqcount, seq)); \ return size; \ } \ \ static inline void \ btrfs_device_set_##name(struct btrfs_device *dev, u64 size) \ { \ preempt_disable(); \ write_seqcount_begin(&dev->data_seqcount); \ dev->name = size; \ write_seqcount_end(&dev->data_seqcount); \ preempt_enable(); \ } #elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT) #define BTRFS_DEVICE_GETSET_FUNCS(name) \ static inline u64 \ btrfs_device_get_##name(const struct btrfs_device *dev) \ { \ u64 size; \ \ preempt_disable(); \ size = dev->name; \ preempt_enable(); \ return size; \ } \ \ static inline void \ btrfs_device_set_##name(struct btrfs_device *dev, u64 size) \ { \ preempt_disable(); \ dev->name = size; \ preempt_enable(); \ } #else #define BTRFS_DEVICE_GETSET_FUNCS(name) \ static inline u64 \ btrfs_device_get_##name(const struct btrfs_device *dev) \ { \ return dev->name; \ } \ \ static inline void \ btrfs_device_set_##name(struct btrfs_device *dev, u64 size) \ { \ dev->name = size; \ } #endif BTRFS_DEVICE_GETSET_FUNCS(total_bytes); BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes); BTRFS_DEVICE_GETSET_FUNCS(bytes_used); struct btrfs_fs_devices { u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */ struct list_head fs_list; u64 num_devices; u64 open_devices; u64 rw_devices; u64 missing_devices; u64 total_rw_bytes; u64 total_devices; struct block_device *latest_bdev; /* all of the devices in the FS, protected by a mutex * so we can safely walk it to write out the supers without * worrying about add/remove by the multi-device code. * Scrubbing super can kick off supers writing by holding * this mutex lock. */ struct mutex device_list_mutex; struct list_head devices; struct list_head resized_devices; /* devices not currently being allocated */ struct list_head alloc_list; struct btrfs_fs_devices *seed; int seeding; int opened; /* set when we find or add a device that doesn't have the * nonrot flag set */ int rotating; struct btrfs_fs_info *fs_info; /* sysfs kobjects */ struct kobject fsid_kobj; struct kobject *device_dir_kobj; struct completion kobj_unregister; }; #define BTRFS_BIO_INLINE_CSUM_SIZE 64 #define BTRFS_MAX_DEVS(info) ((BTRFS_MAX_ITEM_SIZE(info) \ - sizeof(struct btrfs_chunk)) \ / sizeof(struct btrfs_stripe) + 1) #define BTRFS_MAX_DEVS_SYS_CHUNK ((BTRFS_SYSTEM_CHUNK_ARRAY_SIZE \ - 2 * sizeof(struct btrfs_disk_key) \ - 2 * sizeof(struct btrfs_chunk)) \ / sizeof(struct btrfs_stripe) + 1) /* * we need the mirror number and stripe index to be passed around * the call chain while we are processing end_io (especially errors). * Really, what we need is a btrfs_bio structure that has this info * and is properly sized with its stripe array, but we're not there * quite yet. We have our own btrfs bioset, and all of the bios * we allocate are actually btrfs_io_bios. We'll cram as much of * struct btrfs_bio as we can into this over time. */ typedef void (btrfs_io_bio_end_io_t) (struct btrfs_io_bio *bio, int err); struct btrfs_io_bio { unsigned int mirror_num; unsigned int stripe_index; u64 logical; u8 *csum; u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE]; u8 *csum_allocated; btrfs_io_bio_end_io_t *end_io; struct bvec_iter iter; /* * This member must come last, bio_alloc_bioset will allocate enough * bytes for entire btrfs_io_bio but relies on bio being last. */ struct bio bio; }; static inline struct btrfs_io_bio *btrfs_io_bio(struct bio *bio) { return container_of(bio, struct btrfs_io_bio, bio); } struct btrfs_bio_stripe { struct btrfs_device *dev; u64 physical; u64 length; /* only used for discard mappings */ }; struct btrfs_bio; typedef void (btrfs_bio_end_io_t) (struct btrfs_bio *bio, int err); struct btrfs_bio { refcount_t refs; atomic_t stripes_pending; struct btrfs_fs_info *fs_info; u64 map_type; /* get from map_lookup->type */ bio_end_io_t *end_io; struct bio *orig_bio; void *private; atomic_t error; int max_errors; int num_stripes; int mirror_num; int num_tgtdevs; int *tgtdev_map; /* * logical block numbers for the start of each stripe * The last one or two are p/q. These are sorted, * so raid_map[0] is the start of our full stripe */ u64 *raid_map; struct btrfs_bio_stripe stripes[]; }; struct btrfs_device_info { struct btrfs_device *dev; u64 dev_offset; u64 max_avail; u64 total_avail; }; struct btrfs_raid_attr { int sub_stripes; /* sub_stripes info for map */ int dev_stripes; /* stripes per dev */ int devs_max; /* max devs to use */ int devs_min; /* min devs needed */ int tolerated_failures; /* max tolerated fail devs */ int devs_increment; /* ndevs has to be a multiple of this */ int ncopies; /* how many copies to data has */ int mindev_error; /* error code if min devs requisite is unmet */ const char raid_name[8]; /* name of the raid */ u64 bg_flag; /* block group flag of the raid */ }; extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES]; struct map_lookup { u64 type; int io_align; int io_width; u64 stripe_len; int num_stripes; int sub_stripes; int verified_stripes; /* For mount time dev extent verification */ struct btrfs_bio_stripe stripes[]; }; #define map_lookup_size(n) (sizeof(struct map_lookup) + \ (sizeof(struct btrfs_bio_stripe) * (n))) struct btrfs_balance_args; struct btrfs_balance_progress; struct btrfs_balance_control { struct btrfs_balance_args data; struct btrfs_balance_args meta; struct btrfs_balance_args sys; u64 flags; struct btrfs_balance_progress stat; }; enum btrfs_map_op { BTRFS_MAP_READ, BTRFS_MAP_WRITE, BTRFS_MAP_DISCARD, BTRFS_MAP_GET_READ_MIRRORS, }; static inline enum btrfs_map_op btrfs_op(struct bio *bio) { switch (bio_op(bio)) { case REQ_OP_DISCARD: return BTRFS_MAP_DISCARD; case REQ_OP_WRITE: return BTRFS_MAP_WRITE; default: WARN_ON_ONCE(1); case REQ_OP_READ: return BTRFS_MAP_READ; } } void btrfs_get_bbio(struct btrfs_bio *bbio); void btrfs_put_bbio(struct btrfs_bio *bbio); int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, u64 logical, u64 *length, struct btrfs_bio **bbio_ret, int mirror_num); int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, u64 logical, u64 *length, struct btrfs_bio **bbio_ret); int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, u64 physical, u64 **logical, int *naddrs, int *stripe_len); int btrfs_read_sys_array(struct btrfs_fs_info *fs_info); int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info); int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, u64 type); void btrfs_mapping_init(struct btrfs_mapping_tree *tree); void btrfs_mapping_tree_free(struct btrfs_mapping_tree *tree); blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num, int async_submit); int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, fmode_t flags, void *holder); struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags, void *holder); int btrfs_close_devices(struct btrfs_fs_devices *fs_devices); void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step); void btrfs_assign_next_active_device(struct btrfs_device *device, struct btrfs_device *this_dev); int btrfs_find_device_missing_or_by_path(struct btrfs_fs_info *fs_info, const char *device_path, struct btrfs_device **device); int btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info, u64 devid, const char *devpath, struct btrfs_device **device); struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info, const u64 *devid, const u8 *uuid); void btrfs_free_device(struct btrfs_device *device); int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path, u64 devid); void __exit btrfs_cleanup_fs_uuids(void); int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len); int btrfs_grow_device(struct btrfs_trans_handle *trans, struct btrfs_device *device, u64 new_size); struct btrfs_device *btrfs_find_device(struct btrfs_fs_devices *fs_devices, u64 devid, u8 *uuid, u8 *fsid, bool seed); int btrfs_shrink_device(struct btrfs_device *device, u64 new_size); int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *path); int btrfs_balance(struct btrfs_fs_info *fs_info, struct btrfs_balance_control *bctl, struct btrfs_ioctl_balance_args *bargs); int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info); int btrfs_recover_balance(struct btrfs_fs_info *fs_info); int btrfs_pause_balance(struct btrfs_fs_info *fs_info); int btrfs_cancel_balance(struct btrfs_fs_info *fs_info); int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info); int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info); int btrfs_chunk_readonly(struct btrfs_fs_info *fs_info, u64 chunk_offset); int find_free_dev_extent_start(struct btrfs_transaction *transaction, struct btrfs_device *device, u64 num_bytes, u64 search_start, u64 *start, u64 *max_avail); int find_free_dev_extent(struct btrfs_trans_handle *trans, struct btrfs_device *device, u64 num_bytes, u64 *start, u64 *max_avail); void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index); int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info, struct btrfs_ioctl_get_dev_stats *stats); void btrfs_init_devices_late(struct btrfs_fs_info *fs_info); int btrfs_init_dev_stats(struct btrfs_fs_info *fs_info); int btrfs_run_dev_stats(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); void btrfs_rm_dev_replace_remove_srcdev(struct btrfs_device *srcdev); void btrfs_rm_dev_replace_free_srcdev(struct btrfs_fs_info *fs_info, struct btrfs_device *srcdev); void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev); void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_path); int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len); unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info, u64 logical); int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans, u64 chunk_offset, u64 chunk_size); int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset); static inline void btrfs_dev_stat_inc(struct btrfs_device *dev, int index) { atomic_inc(dev->dev_stat_values + index); /* * This memory barrier orders stores updating statistics before stores * updating dev_stats_ccnt. * * It pairs with smp_rmb() in btrfs_run_dev_stats(). */ smp_mb__before_atomic(); atomic_inc(&dev->dev_stats_ccnt); } static inline int btrfs_dev_stat_read(struct btrfs_device *dev, int index) { return atomic_read(dev->dev_stat_values + index); } static inline int btrfs_dev_stat_read_and_reset(struct btrfs_device *dev, int index) { int ret; ret = atomic_xchg(dev->dev_stat_values + index, 0); /* * atomic_xchg implies a full memory barriers as per atomic_t.txt: * - RMW operations that have a return value are fully ordered; * * This implicit memory barriers is paired with the smp_rmb in * btrfs_run_dev_stats */ atomic_inc(&dev->dev_stats_ccnt); return ret; } static inline void btrfs_dev_stat_set(struct btrfs_device *dev, int index, unsigned long val) { atomic_set(dev->dev_stat_values + index, val); /* * This memory barrier orders stores updating statistics before stores * updating dev_stats_ccnt. * * It pairs with smp_rmb() in btrfs_run_dev_stats(). */ smp_mb__before_atomic(); atomic_inc(&dev->dev_stats_ccnt); } static inline void btrfs_dev_stat_reset(struct btrfs_device *dev, int index) { btrfs_dev_stat_set(dev, index, 0); } /* * Convert block group flags (BTRFS_BLOCK_GROUP_*) to btrfs_raid_types, which * can be used as index to access btrfs_raid_array[]. */ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags) { if (flags & BTRFS_BLOCK_GROUP_RAID10) return BTRFS_RAID_RAID10; else if (flags & BTRFS_BLOCK_GROUP_RAID1) return BTRFS_RAID_RAID1; else if (flags & BTRFS_BLOCK_GROUP_DUP) return BTRFS_RAID_DUP; else if (flags & BTRFS_BLOCK_GROUP_RAID0) return BTRFS_RAID_RAID0; else if (flags & BTRFS_BLOCK_GROUP_RAID5) return BTRFS_RAID_RAID5; else if (flags & BTRFS_BLOCK_GROUP_RAID6) return BTRFS_RAID_RAID6; return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */ } const char *get_raid_name(enum btrfs_raid_types type); void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info); void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans); struct list_head *btrfs_get_fs_uuids(void); void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info); void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info); bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info, struct btrfs_device *failing_dev); int btrfs_bg_type_to_factor(u64 flags); int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info); #endif
52 48 48 16 16 48 40 48 1 22 22 22 22 48 48 48 52 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 /* * Copyright (c) 2006 Oracle. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * */ #include <linux/highmem.h> #include <linux/gfp.h> #include <linux/cpu.h> #include <linux/export.h> #include "rds.h" struct rds_page_remainder { struct page *r_page; unsigned long r_offset; }; static DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_page_remainder, rds_page_remainders); /** * rds_page_remainder_alloc - build up regions of a message. * * @scat: Scatter list for message * @bytes: the number of bytes needed. * @gfp: the waiting behaviour of the allocation * * @gfp is always ored with __GFP_HIGHMEM. Callers must be prepared to * kmap the pages, etc. * * If @bytes is at least a full page then this just returns a page from * alloc_page(). * * If @bytes is a partial page then this stores the unused region of the * page in a per-cpu structure. Future partial-page allocations may be * satisfied from that cached region. This lets us waste less memory on * small allocations with minimal complexity. It works because the transmit * path passes read-only page regions down to devices. They hold a page * reference until they are done with the region. */ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes, gfp_t gfp) { struct rds_page_remainder *rem; unsigned long flags; struct page *page; int ret; gfp |= __GFP_HIGHMEM; /* jump straight to allocation if we're trying for a huge page */ if (bytes >= PAGE_SIZE) { page = alloc_page(gfp); if (!page) { ret = -ENOMEM; } else { sg_set_page(scat, page, PAGE_SIZE, 0); ret = 0; } goto out; } rem = &per_cpu(rds_page_remainders, get_cpu()); local_irq_save(flags); while (1) { /* avoid a tiny region getting stuck by tossing it */ if (rem->r_page && bytes > (PAGE_SIZE - rem->r_offset)) { rds_stats_inc(s_page_remainder_miss); __free_page(rem->r_page); rem->r_page = NULL; } /* hand out a fragment from the cached page */ if (rem->r_page && bytes <= (PAGE_SIZE - rem->r_offset)) { sg_set_page(scat, rem->r_page, bytes, rem->r_offset); get_page(sg_page(scat)); if (rem->r_offset != 0) rds_stats_inc(s_page_remainder_hit); rem->r_offset += ALIGN(bytes, 8); if (rem->r_offset >= PAGE_SIZE) { __free_page(rem->r_page); rem->r_page = NULL; } ret = 0; break; } /* alloc if there is nothing for us to use */ local_irq_restore(flags); put_cpu(); page = alloc_page(gfp); rem = &per_cpu(rds_page_remainders, get_cpu()); local_irq_save(flags); if (!page) { ret = -ENOMEM; break; } /* did someone race to fill the remainder before us? */ if (rem->r_page) { __free_page(page); continue; } /* otherwise install our page and loop around to alloc */ rem->r_page = page; rem->r_offset = 0; } local_irq_restore(flags); put_cpu(); out: rdsdebug("bytes %lu ret %d %p %u %u\n", bytes, ret, ret ? NULL : sg_page(scat), ret ? 0 : scat->offset, ret ? 0 : scat->length); return ret; } EXPORT_SYMBOL_GPL(rds_page_remainder_alloc); void rds_page_exit(void) { unsigned int cpu; for_each_possible_cpu(cpu) { struct rds_page_remainder *rem; rem = &per_cpu(rds_page_remainders, cpu); rdsdebug("cpu %u\n", cpu); if (rem->r_page) __free_page(rem->r_page); rem->r_page = NULL; } }
462 3 60 59 52 60 7 3 3 42 6 42 41 13 6 6 3 4 4 4 2 2 445 444 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 /* * ip_conntrack_proto_gre.c - Version 3.0 * * Connection tracking protocol helper module for GRE. * * GRE is a generic encapsulation protocol, which is generally not very * suited for NAT, as it has no protocol-specific part as port numbers. * * It has an optional key field, which may help us distinguishing two * connections between the same two hosts. * * GRE is defined in RFC 1701 and RFC 1702, as well as RFC 2784 * * PPTP is built on top of a modified version of GRE, and has a mandatory * field called "CallID", which serves us for the same purpose as the key * field in plain GRE. * * Documentation about PPTP can be found in RFC 2637 * * (C) 2000-2005 by Harald Welte <laforge@gnumonks.org> * * Development of this code funded by Astaro AG (http://www.astaro.com/) * * (C) 2006-2012 Patrick McHardy <kaber@trash.net> */ #include <linux/module.h> #include <linux/types.h> #include <linux/timer.h> #include <linux/list.h> #include <linux/seq_file.h> #include <linux/in.h> #include <linux/netdevice.h> #include <linux/skbuff.h> #include <linux/slab.h> #include <net/dst.h> #include <net/net_namespace.h> #include <net/netns/generic.h> #include <net/netfilter/nf_conntrack_l4proto.h> #include <net/netfilter/nf_conntrack_helper.h> #include <net/netfilter/nf_conntrack_core.h> #include <net/netfilter/nf_conntrack_timeout.h> #include <linux/netfilter/nf_conntrack_proto_gre.h> #include <linux/netfilter/nf_conntrack_pptp.h> static const unsigned int gre_timeouts[GRE_CT_MAX] = { [GRE_CT_UNREPLIED] = 30*HZ, [GRE_CT_REPLIED] = 180*HZ, }; static unsigned int proto_gre_net_id __read_mostly; static inline struct netns_proto_gre *gre_pernet(struct net *net) { return net_generic(net, proto_gre_net_id); } static void nf_ct_gre_keymap_flush(struct net *net) { struct netns_proto_gre *net_gre = gre_pernet(net); struct nf_ct_gre_keymap *km, *tmp; write_lock_bh(&net_gre->keymap_lock); list_for_each_entry_safe(km, tmp, &net_gre->keymap_list, list) { list_del(&km->list); kfree(km); } write_unlock_bh(&net_gre->keymap_lock); } static inline int gre_key_cmpfn(const struct nf_ct_gre_keymap *km, const struct nf_conntrack_tuple *t) { return km->tuple.src.l3num == t->src.l3num && !memcmp(&km->tuple.src.u3, &t->src.u3, sizeof(t->src.u3)) && !memcmp(&km->tuple.dst.u3, &t->dst.u3, sizeof(t->dst.u3)) && km->tuple.dst.protonum == t->dst.protonum && km->tuple.dst.u.all == t->dst.u.all; } /* look up the source key for a given tuple */ static __be16 gre_keymap_lookup(struct net *net, struct nf_conntrack_tuple *t) { struct netns_proto_gre *net_gre = gre_pernet(net); struct nf_ct_gre_keymap *km; __be16 key = 0; read_lock_bh(&net_gre->keymap_lock); list_for_each_entry(km, &net_gre->keymap_list, list) { if (gre_key_cmpfn(km, t)) { key = km->tuple.src.u.gre.key; break; } } read_unlock_bh(&net_gre->keymap_lock); pr_debug("lookup src key 0x%x for ", key); nf_ct_dump_tuple(t); return key; } /* add a single keymap entry, associate with specified master ct */ int nf_ct_gre_keymap_add(struct nf_conn *ct, enum ip_conntrack_dir dir, struct nf_conntrack_tuple *t) { struct net *net = nf_ct_net(ct); struct netns_proto_gre *net_gre = gre_pernet(net); struct nf_ct_pptp_master *ct_pptp_info = nfct_help_data(ct); struct nf_ct_gre_keymap **kmp, *km; kmp = &ct_pptp_info->keymap[dir]; if (*kmp) { /* check whether it's a retransmission */ read_lock_bh(&net_gre->keymap_lock); list_for_each_entry(km, &net_gre->keymap_list, list) { if (gre_key_cmpfn(km, t) && km == *kmp) { read_unlock_bh(&net_gre->keymap_lock); return 0; } } read_unlock_bh(&net_gre->keymap_lock); pr_debug("trying to override keymap_%s for ct %p\n", dir == IP_CT_DIR_REPLY ? "reply" : "orig", ct); return -EEXIST; } km = kmalloc(sizeof(*km), GFP_ATOMIC); if (!km) return -ENOMEM; memcpy(&km->tuple, t, sizeof(*t)); *kmp = km; pr_debug("adding new entry %p: ", km); nf_ct_dump_tuple(&km->tuple); write_lock_bh(&net_gre->keymap_lock); list_add_tail(&km->list, &net_gre->keymap_list); write_unlock_bh(&net_gre->keymap_lock); return 0; } EXPORT_SYMBOL_GPL(nf_ct_gre_keymap_add); /* destroy the keymap entries associated with specified master ct */ void nf_ct_gre_keymap_destroy(struct nf_conn *ct) { struct net *net = nf_ct_net(ct); struct netns_proto_gre *net_gre = gre_pernet(net); struct nf_ct_pptp_master *ct_pptp_info = nfct_help_data(ct); enum ip_conntrack_dir dir; pr_debug("entering for ct %p\n", ct); write_lock_bh(&net_gre->keymap_lock); for (dir = IP_CT_DIR_ORIGINAL; dir < IP_CT_DIR_MAX; dir++) { if (ct_pptp_info->keymap[dir]) { pr_debug("removing %p from list\n", ct_pptp_info->keymap[dir]); list_del(&ct_pptp_info->keymap[dir]->list); kfree(ct_pptp_info->keymap[dir]); ct_pptp_info->keymap[dir] = NULL; } } write_unlock_bh(&net_gre->keymap_lock); } EXPORT_SYMBOL_GPL(nf_ct_gre_keymap_destroy); /* PUBLIC CONNTRACK PROTO HELPER FUNCTIONS */ /* gre hdr info to tuple */ static bool gre_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff, struct net *net, struct nf_conntrack_tuple *tuple) { const struct pptp_gre_header *pgrehdr; struct pptp_gre_header _pgrehdr; __be16 srckey; const struct gre_base_hdr *grehdr; struct gre_base_hdr _grehdr; /* first only delinearize old RFC1701 GRE header */ grehdr = skb_header_pointer(skb, dataoff, sizeof(_grehdr), &_grehdr); if (!grehdr || (grehdr->flags & GRE_VERSION) != GRE_VERSION_1) { /* try to behave like "nf_conntrack_proto_generic" */ tuple->src.u.all = 0; tuple->dst.u.all = 0; return true; } /* PPTP header is variable length, only need up to the call_id field */ pgrehdr = skb_header_pointer(skb, dataoff, 8, &_pgrehdr); if (!pgrehdr) return true; if (grehdr->protocol != GRE_PROTO_PPP) { pr_debug("Unsupported GRE proto(0x%x)\n", ntohs(grehdr->protocol)); return false; } tuple->dst.u.gre.key = pgrehdr->call_id; srckey = gre_keymap_lookup(net, tuple); tuple->src.u.gre.key = srckey; return true; } #ifdef CONFIG_NF_CONNTRACK_PROCFS /* print private data for conntrack */ static void gre_print_conntrack(struct seq_file *s, struct nf_conn *ct) { seq_printf(s, "timeout=%u, stream_timeout=%u ", (ct->proto.gre.timeout / HZ), (ct->proto.gre.stream_timeout / HZ)); } #endif static unsigned int *gre_get_timeouts(struct net *net) { return gre_pernet(net)->gre_timeouts; } /* Returns verdict for packet, and may modify conntrack */ static int gre_packet(struct nf_conn *ct, const struct sk_buff *skb, unsigned int dataoff, enum ip_conntrack_info ctinfo) { /* If we've seen traffic both ways, this is a GRE connection. * Extend timeout. */ if (ct->status & IPS_SEEN_REPLY) { nf_ct_refresh_acct(ct, ctinfo, skb, ct->proto.gre.stream_timeout); /* Also, more likely to be important, and not a probe. */ if (!test_and_set_bit(IPS_ASSURED_BIT, &ct->status)) nf_conntrack_event_cache(IPCT_ASSURED, ct); } else nf_ct_refresh_acct(ct, ctinfo, skb, ct->proto.gre.timeout); return NF_ACCEPT; } /* Called when a new connection for this protocol found. */ static bool gre_new(struct nf_conn *ct, const struct sk_buff *skb, unsigned int dataoff) { unsigned int *timeouts = nf_ct_timeout_lookup(ct); if (!timeouts) timeouts = gre_get_timeouts(nf_ct_net(ct)); pr_debug(": "); nf_ct_dump_tuple(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); /* initialize to sane value. Ideally a conntrack helper * (e.g. in case of pptp) is increasing them */ ct->proto.gre.stream_timeout = timeouts[GRE_CT_REPLIED]; ct->proto.gre.timeout = timeouts[GRE_CT_UNREPLIED]; return true; } /* Called when a conntrack entry has already been removed from the hashes * and is about to be deleted from memory */ static void gre_destroy(struct nf_conn *ct) { struct nf_conn *master = ct->master; pr_debug(" entering\n"); if (!master) pr_debug("no master !?!\n"); else nf_ct_gre_keymap_destroy(master); } #ifdef CONFIG_NF_CONNTRACK_TIMEOUT #include <linux/netfilter/nfnetlink.h> #include <linux/netfilter/nfnetlink_cttimeout.h> static int gre_timeout_nlattr_to_obj(struct nlattr *tb[], struct net *net, void *data) { unsigned int *timeouts = data; struct netns_proto_gre *net_gre = gre_pernet(net); if (!timeouts) timeouts = gre_get_timeouts(net); /* set default timeouts for GRE. */ timeouts[GRE_CT_UNREPLIED] = net_gre->gre_timeouts[GRE_CT_UNREPLIED]; timeouts[GRE_CT_REPLIED] = net_gre->gre_timeouts[GRE_CT_REPLIED]; if (tb[CTA_TIMEOUT_GRE_UNREPLIED]) { timeouts[GRE_CT_UNREPLIED] = ntohl(nla_get_be32(tb[CTA_TIMEOUT_GRE_UNREPLIED])) * HZ; } if (tb[CTA_TIMEOUT_GRE_REPLIED]) { timeouts[GRE_CT_REPLIED] = ntohl(nla_get_be32(tb[CTA_TIMEOUT_GRE_REPLIED])) * HZ; } return 0; } static int gre_timeout_obj_to_nlattr(struct sk_buff *skb, const void *data) { const unsigned int *timeouts = data; if (nla_put_be32(skb, CTA_TIMEOUT_GRE_UNREPLIED, htonl(timeouts[GRE_CT_UNREPLIED] / HZ)) || nla_put_be32(skb, CTA_TIMEOUT_GRE_REPLIED, htonl(timeouts[GRE_CT_REPLIED] / HZ))) goto nla_put_failure; return 0; nla_put_failure: return -ENOSPC; } static const struct nla_policy gre_timeout_nla_policy[CTA_TIMEOUT_GRE_MAX+1] = { [CTA_TIMEOUT_GRE_UNREPLIED] = { .type = NLA_U32 }, [CTA_TIMEOUT_GRE_REPLIED] = { .type = NLA_U32 }, }; #endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ static int gre_init_net(struct net *net, u_int16_t proto) { struct netns_proto_gre *net_gre = gre_pernet(net); int i; rwlock_init(&net_gre->keymap_lock); INIT_LIST_HEAD(&net_gre->keymap_list); for (i = 0; i < GRE_CT_MAX; i++) net_gre->gre_timeouts[i] = gre_timeouts[i]; return 0; } /* protocol helper struct */ static const struct nf_conntrack_l4proto nf_conntrack_l4proto_gre4 = { .l3proto = AF_INET, .l4proto = IPPROTO_GRE, .pkt_to_tuple = gre_pkt_to_tuple, #ifdef CONFIG_NF_CONNTRACK_PROCFS .print_conntrack = gre_print_conntrack, #endif .packet = gre_packet, .new = gre_new, .destroy = gre_destroy, .me = THIS_MODULE, #if IS_ENABLED(CONFIG_NF_CT_NETLINK) .tuple_to_nlattr = nf_ct_port_tuple_to_nlattr, .nlattr_tuple_size = nf_ct_port_nlattr_tuple_size, .nlattr_to_tuple = nf_ct_port_nlattr_to_tuple, .nla_policy = nf_ct_port_nla_policy, #endif #ifdef CONFIG_NF_CONNTRACK_TIMEOUT .ctnl_timeout = { .nlattr_to_obj = gre_timeout_nlattr_to_obj, .obj_to_nlattr = gre_timeout_obj_to_nlattr, .nlattr_max = CTA_TIMEOUT_GRE_MAX, .obj_size = sizeof(unsigned int) * GRE_CT_MAX, .nla_policy = gre_timeout_nla_policy, }, #endif /* CONFIG_NF_CONNTRACK_TIMEOUT */ .net_id = &proto_gre_net_id, .init_net = gre_init_net, }; static int proto_gre_net_init(struct net *net) { int ret = 0; ret = nf_ct_l4proto_pernet_register_one(net, &nf_conntrack_l4proto_gre4); if (ret < 0) pr_err("nf_conntrack_gre4: pernet registration failed.\n"); return ret; } static void proto_gre_net_exit(struct net *net) { nf_ct_l4proto_pernet_unregister_one(net, &nf_conntrack_l4proto_gre4); nf_ct_gre_keymap_flush(net); } static struct pernet_operations proto_gre_net_ops = { .init = proto_gre_net_init, .exit = proto_gre_net_exit, .id = &proto_gre_net_id, .size = sizeof(struct netns_proto_gre), }; static int __init nf_ct_proto_gre_init(void) { int ret; BUILD_BUG_ON(offsetof(struct netns_proto_gre, nf) != 0); ret = register_pernet_subsys(&proto_gre_net_ops); if (ret < 0) goto out_pernet; ret = nf_ct_l4proto_register_one(&nf_conntrack_l4proto_gre4); if (ret < 0) goto out_gre4; return 0; out_gre4: unregister_pernet_subsys(&proto_gre_net_ops); out_pernet: return ret; } static void __exit nf_ct_proto_gre_fini(void) { nf_ct_l4proto_unregister_one(&nf_conntrack_l4proto_gre4); unregister_pernet_subsys(&proto_gre_net_ops); } module_init(nf_ct_proto_gre_init); module_exit(nf_ct_proto_gre_fini); MODULE_LICENSE("GPL");
1 1 1 1 1 1 1 1 1 1 1 1 1 1 118 8 119 1 1 1 2 10 2 1 10 1 1 1 1 1 1 1 1 3 3 2 222 141 2 1 1 2 1 1 127 127 138 138 138 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 /* * KVM PMU support for Intel CPUs * * Copyright 2011 Red Hat, Inc. and/or its affiliates. * * Authors: * Avi Kivity <avi@redhat.com> * Gleb Natapov <gleb@redhat.com> * * This work is licensed under the terms of the GNU GPL, version 2. See * the COPYING file in the top-level directory. * */ #include <linux/types.h> #include <linux/kvm_host.h> #include <linux/perf_event.h> #include <asm/perf_event.h> #include "x86.h" #include "cpuid.h" #include "lapic.h" #include "pmu.h" static struct kvm_event_hw_type_mapping intel_arch_events[] = { /* Index must match CPUID 0x0A.EBX bit vector */ [0] = { 0x3c, 0x00, PERF_COUNT_HW_CPU_CYCLES }, [1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS }, [2] = { 0x3c, 0x01, PERF_COUNT_HW_BUS_CYCLES }, [3] = { 0x2e, 0x4f, PERF_COUNT_HW_CACHE_REFERENCES }, [4] = { 0x2e, 0x41, PERF_COUNT_HW_CACHE_MISSES }, [5] = { 0xc4, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS }, [6] = { 0xc5, 0x00, PERF_COUNT_HW_BRANCH_MISSES }, [7] = { 0x00, 0x03, PERF_COUNT_HW_REF_CPU_CYCLES }, }; /* mapping between fixed pmc index and intel_arch_events array */ static int fixed_pmc_events[] = {1, 0, 7}; static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data) { int i; for (i = 0; i < pmu->nr_arch_fixed_counters; i++) { u8 new_ctrl = fixed_ctrl_field(data, i); u8 old_ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, i); struct kvm_pmc *pmc; pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i); if (old_ctrl == new_ctrl) continue; reprogram_fixed_counter(pmc, new_ctrl, i); } pmu->fixed_ctr_ctrl = data; } /* function is called when global control register has been updated. */ static void global_ctrl_changed(struct kvm_pmu *pmu, u64 data) { int bit; u64 diff = pmu->global_ctrl ^ data; pmu->global_ctrl = data; for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX) reprogram_counter(pmu, bit); } static unsigned intel_find_arch_event(struct kvm_pmu *pmu, u8 event_select, u8 unit_mask) { int i; for (i = 0; i < ARRAY_SIZE(intel_arch_events); i++) if (intel_arch_events[i].eventsel == event_select && intel_arch_events[i].unit_mask == unit_mask && (pmu->available_event_types & (1 << i))) break; if (i == ARRAY_SIZE(intel_arch_events)) return PERF_COUNT_HW_MAX; return intel_arch_events[i].event_type; } static unsigned intel_find_fixed_event(int idx) { u32 event; size_t size = ARRAY_SIZE(fixed_pmc_events); if (idx >= size) return PERF_COUNT_HW_MAX; event = fixed_pmc_events[array_index_nospec(idx, size)]; return intel_arch_events[event].event_type; } /* check if a PMC is enabled by comparing it with globl_ctrl bits. */ static bool intel_pmc_is_enabled(struct kvm_pmc *pmc) { struct kvm_pmu *pmu = pmc_to_pmu(pmc); return test_bit(pmc->idx, (unsigned long *)&pmu->global_ctrl); } static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx) { if (pmc_idx < INTEL_PMC_IDX_FIXED) return get_gp_pmc(pmu, MSR_P6_EVNTSEL0 + pmc_idx, MSR_P6_EVNTSEL0); else { u32 idx = pmc_idx - INTEL_PMC_IDX_FIXED; return get_fixed_pmc(pmu, idx + MSR_CORE_PERF_FIXED_CTR0); } } /* returns 0 if idx's corresponding MSR exists; otherwise returns 1. */ static int intel_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); bool fixed = idx & (1u << 30); idx &= ~(3u << 30); return (!fixed && idx >= pmu->nr_arch_gp_counters) || (fixed && idx >= pmu->nr_arch_fixed_counters); } static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *mask) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); bool fixed = idx & (1u << 30); struct kvm_pmc *counters; unsigned int num_counters; idx &= ~(3u << 30); if (fixed) { counters = pmu->fixed_counters; num_counters = pmu->nr_arch_fixed_counters; } else { counters = pmu->gp_counters; num_counters = pmu->nr_arch_gp_counters; } if (idx >= num_counters) return NULL; *mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP]; return &counters[array_index_nospec(idx, num_counters)]; } static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); int ret; switch (msr) { case MSR_CORE_PERF_FIXED_CTR_CTRL: case MSR_CORE_PERF_GLOBAL_STATUS: case MSR_CORE_PERF_GLOBAL_CTRL: case MSR_CORE_PERF_GLOBAL_OVF_CTRL: ret = pmu->version > 1; break; default: ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) || get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) || get_fixed_pmc(pmu, msr); break; } return ret; } static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); struct kvm_pmc *pmc; switch (msr) { case MSR_CORE_PERF_FIXED_CTR_CTRL: *data = pmu->fixed_ctr_ctrl; return 0; case MSR_CORE_PERF_GLOBAL_STATUS: *data = pmu->global_status; return 0; case MSR_CORE_PERF_GLOBAL_CTRL: *data = pmu->global_ctrl; return 0; case MSR_CORE_PERF_GLOBAL_OVF_CTRL: *data = pmu->global_ovf_ctrl; return 0; default: if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0))) { u64 val = pmc_read_counter(pmc); *data = val & pmu->counter_bitmask[KVM_PMC_GP]; return 0; } else if ((pmc = get_fixed_pmc(pmu, msr))) { u64 val = pmc_read_counter(pmc); *data = val & pmu->counter_bitmask[KVM_PMC_FIXED]; return 0; } else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) { *data = pmc->eventsel; return 0; } } return 1; } static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); struct kvm_pmc *pmc; u32 msr = msr_info->index; u64 data = msr_info->data; switch (msr) { case MSR_CORE_PERF_FIXED_CTR_CTRL: if (pmu->fixed_ctr_ctrl == data) return 0; if (!(data & 0xfffffffffffff444ull)) { reprogram_fixed_counters(pmu, data); return 0; } break; case MSR_CORE_PERF_GLOBAL_STATUS: if (msr_info->host_initiated) { pmu->global_status = data; return 0; } break; /* RO MSR */ case MSR_CORE_PERF_GLOBAL_CTRL: if (pmu->global_ctrl == data) return 0; if (!(data & pmu->global_ctrl_mask)) { global_ctrl_changed(pmu, data); return 0; } break; case MSR_CORE_PERF_GLOBAL_OVF_CTRL: if (!(data & (pmu->global_ctrl_mask & ~(3ull<<62)))) { if (!msr_info->host_initiated) pmu->global_status &= ~data; pmu->global_ovf_ctrl = data; return 0; } break; default: if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0))) { if (msr_info->host_initiated) pmc->counter = data; else pmc->counter = (s32)data; return 0; } else if ((pmc = get_fixed_pmc(pmu, msr))) { pmc->counter = data; return 0; } else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) { if (data == pmc->eventsel) return 0; if (!(data & pmu->reserved_bits)) { reprogram_gp_counter(pmc, data); return 0; } } } return 1; } static void intel_pmu_refresh(struct kvm_vcpu *vcpu) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); struct kvm_cpuid_entry2 *entry; union cpuid10_eax eax; union cpuid10_edx edx; pmu->nr_arch_gp_counters = 0; pmu->nr_arch_fixed_counters = 0; pmu->counter_bitmask[KVM_PMC_GP] = 0; pmu->counter_bitmask[KVM_PMC_FIXED] = 0; pmu->version = 0; pmu->reserved_bits = 0xffffffff00200000ull; entry = kvm_find_cpuid_entry(vcpu, 0xa, 0); if (!entry) return; eax.full = entry->eax; edx.full = entry->edx; pmu->version = eax.split.version_id; if (!pmu->version) return; pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters, INTEL_PMC_MAX_GENERIC); pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1; pmu->available_event_types = ~entry->ebx & ((1ull << eax.split.mask_length) - 1); if (pmu->version == 1) { pmu->nr_arch_fixed_counters = 0; } else { pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed, INTEL_PMC_MAX_FIXED); pmu->counter_bitmask[KVM_PMC_FIXED] = ((u64)1 << edx.split.bit_width_fixed) - 1; } pmu->global_ctrl = ((1ull << pmu->nr_arch_gp_counters) - 1) | (((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED); pmu->global_ctrl_mask = ~pmu->global_ctrl; entry = kvm_find_cpuid_entry(vcpu, 7, 0); if (entry && (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) && (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED; } static void intel_pmu_init(struct kvm_vcpu *vcpu) { int i; struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); for (i = 0; i < INTEL_PMC_MAX_GENERIC; i++) { pmu->gp_counters[i].type = KVM_PMC_GP; pmu->gp_counters[i].vcpu = vcpu; pmu->gp_counters[i].idx = i; } for (i = 0; i < INTEL_PMC_MAX_FIXED; i++) { pmu->fixed_counters[i].type = KVM_PMC_FIXED; pmu->fixed_counters[i].vcpu = vcpu; pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED; } } static void intel_pmu_reset(struct kvm_vcpu *vcpu) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); int i; for (i = 0; i < INTEL_PMC_MAX_GENERIC; i++) { struct kvm_pmc *pmc = &pmu->gp_counters[i]; pmc_stop_counter(pmc); pmc->counter = pmc->eventsel = 0; } for (i = 0; i < INTEL_PMC_MAX_FIXED; i++) pmc_stop_counter(&pmu->fixed_counters[i]); pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = pmu->global_ovf_ctrl = 0; } struct kvm_pmu_ops intel_pmu_ops = { .find_arch_event = intel_find_arch_event, .find_fixed_event = intel_find_fixed_event, .pmc_is_enabled = intel_pmc_is_enabled, .pmc_idx_to_pmc = intel_pmc_idx_to_pmc, .msr_idx_to_pmc = intel_msr_idx_to_pmc, .is_valid_msr_idx = intel_is_valid_msr_idx, .is_valid_msr = intel_is_valid_msr, .get_msr = intel_pmu_get_msr, .set_msr = intel_pmu_set_msr, .refresh = intel_pmu_refresh, .init = intel_pmu_init, .reset = intel_pmu_reset, };
19 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 /* * v4l2-fh.h * * V4L2 file handle. Store per file handle data for the V4L2 * framework. Using file handles is optional for the drivers. * * Copyright (C) 2009--2010 Nokia Corporation. * * Contact: Sakari Ailus <sakari.ailus@iki.fi> * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * version 2 as published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. */ #ifndef V4L2_FH_H #define V4L2_FH_H #include <linux/fs.h> #include <linux/kconfig.h> #include <linux/list.h> #include <linux/videodev2.h> struct video_device; struct v4l2_ctrl_handler; /** * struct v4l2_fh - Describes a V4L2 file handler * * @list: list of file handlers * @vdev: pointer to &struct video_device * @ctrl_handler: pointer to &struct v4l2_ctrl_handler * @prio: priority of the file handler, as defined by &enum v4l2_priority * * @wait: event' s wait queue * @subscribe_lock: serialise changes to the subscribed list; guarantee that * the add and del event callbacks are orderly called * @subscribed: list of subscribed events * @available: list of events waiting to be dequeued * @navailable: number of available events at @available list * @sequence: event sequence number * * @m2m_ctx: pointer to &struct v4l2_m2m_ctx */ struct v4l2_fh { struct list_head list; struct video_device *vdev; struct v4l2_ctrl_handler *ctrl_handler; enum v4l2_priority prio; /* Events */ wait_queue_head_t wait; struct mutex subscribe_lock; struct list_head subscribed; struct list_head available; unsigned int navailable; u32 sequence; #if IS_ENABLED(CONFIG_V4L2_MEM2MEM_DEV) struct v4l2_m2m_ctx *m2m_ctx; #endif }; /** * v4l2_fh_init - Initialise the file handle. * * @fh: pointer to &struct v4l2_fh * @vdev: pointer to &struct video_device * * Parts of the V4L2 framework using the * file handles should be initialised in this function. Must be called * from driver's v4l2_file_operations->open\(\) handler if the driver * uses &struct v4l2_fh. */ void v4l2_fh_init(struct v4l2_fh *fh, struct video_device *vdev); /** * v4l2_fh_add - Add the fh to the list of file handles on a video_device. * * @fh: pointer to &struct v4l2_fh * * .. note:: * The @fh file handle must be initialised first. */ void v4l2_fh_add(struct v4l2_fh *fh); /** * v4l2_fh_open - Ancillary routine that can be used as the open\(\) op * of v4l2_file_operations. * * @filp: pointer to struct file * * It allocates a v4l2_fh and inits and adds it to the &struct video_device * associated with the file pointer. */ int v4l2_fh_open(struct file *filp); /** * v4l2_fh_del - Remove file handle from the list of file handles. * * @fh: pointer to &struct v4l2_fh * * On error filp->private_data will be %NULL, otherwise it will point to * the &struct v4l2_fh. * * .. note:: * Must be called in v4l2_file_operations->release\(\) handler if the driver * uses &struct v4l2_fh. */ void v4l2_fh_del(struct v4l2_fh *fh); /** * v4l2_fh_exit - Release resources related to a file handle. * * @fh: pointer to &struct v4l2_fh * * Parts of the V4L2 framework using the v4l2_fh must release their * resources here, too. * * .. note:: * Must be called in v4l2_file_operations->release\(\) handler if the * driver uses &struct v4l2_fh. */ void v4l2_fh_exit(struct v4l2_fh *fh); /** * v4l2_fh_release - Ancillary routine that can be used as the release\(\) op * of v4l2_file_operations. * * @filp: pointer to struct file * * It deletes and exits the v4l2_fh associated with the file pointer and * frees it. It will do nothing if filp->private_data (the pointer to the * v4l2_fh struct) is %NULL. * * This function always returns 0. */ int v4l2_fh_release(struct file *filp); /** * v4l2_fh_is_singular - Returns 1 if this filehandle is the only filehandle * opened for the associated video_device. * * @fh: pointer to &struct v4l2_fh * * If @fh is NULL, then it returns 0. */ int v4l2_fh_is_singular(struct v4l2_fh *fh); /** * v4l2_fh_is_singular_file - Returns 1 if this filehandle is the only * filehandle opened for the associated video_device. * * @filp: pointer to struct file * * This is a helper function variant of v4l2_fh_is_singular() with uses * struct file as argument. * * If filp->private_data is %NULL, then it will return 0. */ static inline int v4l2_fh_is_singular_file(struct file *filp) { return v4l2_fh_is_singular(filp->private_data); } #endif /* V4L2_EVENT_H */
1555 1103 1103 200 1458 95 95 95 95 1798 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 /* * Copyright (C) 2007 Jens Axboe <jens.axboe@oracle.com> * * Scatterlist handling helpers. * * This source code is licensed under the GNU General Public License, * Version 2. See the file COPYING for more details. */ #include <linux/export.h> #include <linux/slab.h> #include <linux/scatterlist.h> #include <linux/highmem.h> #include <linux/kmemleak.h> /** * sg_next - return the next scatterlist entry in a list * @sg: The current sg entry * * Description: * Usually the next entry will be @sg@ + 1, but if this sg element is part * of a chained scatterlist, it could jump to the start of a new * scatterlist array. * **/ struct scatterlist *sg_next(struct scatterlist *sg) { if (sg_is_last(sg)) return NULL; sg++; if (unlikely(sg_is_chain(sg))) sg = sg_chain_ptr(sg); return sg; } EXPORT_SYMBOL(sg_next); /** * sg_nents - return total count of entries in scatterlist * @sg: The scatterlist * * Description: * Allows to know how many entries are in sg, taking into acount * chaining as well * **/ int sg_nents(struct scatterlist *sg) { int nents; for (nents = 0; sg; sg = sg_next(sg)) nents++; return nents; } EXPORT_SYMBOL(sg_nents); /** * sg_nents_for_len - return total count of entries in scatterlist * needed to satisfy the supplied length * @sg: The scatterlist * @len: The total required length * * Description: * Determines the number of entries in sg that are required to meet * the supplied length, taking into acount chaining as well * * Returns: * the number of sg entries needed, negative error on failure * **/ int sg_nents_for_len(struct scatterlist *sg, u64 len) { int nents; u64 total; if (!len) return 0; for (nents = 0, total = 0; sg; sg = sg_next(sg)) { nents++; total += sg->length; if (total >= len) return nents; } return -EINVAL; } EXPORT_SYMBOL(sg_nents_for_len); /** * sg_last - return the last scatterlist entry in a list * @sgl: First entry in the scatterlist * @nents: Number of entries in the scatterlist * * Description: * Should only be used casually, it (currently) scans the entire list * to get the last entry. * * Note that the @sgl@ pointer passed in need not be the first one, * the important bit is that @nents@ denotes the number of entries that * exist from @sgl@. * **/ struct scatterlist *sg_last(struct scatterlist *sgl, unsigned int nents) { struct scatterlist *sg, *ret = NULL; unsigned int i; for_each_sg(sgl, sg, nents, i) ret = sg; BUG_ON(!sg_is_last(ret)); return ret; } EXPORT_SYMBOL(sg_last); /** * sg_init_table - Initialize SG table * @sgl: The SG table * @nents: Number of entries in table * * Notes: * If this is part of a chained sg table, sg_mark_end() should be * used only on the last table part. * **/ void sg_init_table(struct scatterlist *sgl, unsigned int nents) { memset(sgl, 0, sizeof(*sgl) * nents); sg_init_marker(sgl, nents); } EXPORT_SYMBOL(sg_init_table); /** * sg_init_one - Initialize a single entry sg list * @sg: SG entry * @buf: Virtual address for IO * @buflen: IO length * **/ void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen) { sg_init_table(sg, 1); sg_set_buf(sg, buf, buflen); } EXPORT_SYMBOL(sg_init_one); /* * The default behaviour of sg_alloc_table() is to use these kmalloc/kfree * helpers. */ static struct scatterlist *sg_kmalloc(unsigned int nents, gfp_t gfp_mask) { if (nents == SG_MAX_SINGLE_ALLOC) { /* * Kmemleak doesn't track page allocations as they are not * commonly used (in a raw form) for kernel data structures. * As we chain together a list of pages and then a normal * kmalloc (tracked by kmemleak), in order to for that last * allocation not to become decoupled (and thus a * false-positive) we need to inform kmemleak of all the * intermediate allocations. */ void *ptr = (void *) __get_free_page(gfp_mask); kmemleak_alloc(ptr, PAGE_SIZE, 1, gfp_mask); return ptr; } else return kmalloc_array(nents, sizeof(struct scatterlist), gfp_mask); } static void sg_kfree(struct scatterlist *sg, unsigned int nents) { if (nents == SG_MAX_SINGLE_ALLOC) { kmemleak_free(sg); free_page((unsigned long) sg); } else kfree(sg); } /** * __sg_free_table - Free a previously mapped sg table * @table: The sg table header to use * @max_ents: The maximum number of entries per single scatterlist * @skip_first_chunk: don't free the (preallocated) first scatterlist chunk * @free_fn: Free function * * Description: * Free an sg table previously allocated and setup with * __sg_alloc_table(). The @max_ents value must be identical to * that previously used with __sg_alloc_table(). * **/ void __sg_free_table(struct sg_table *table, unsigned int max_ents, bool skip_first_chunk, sg_free_fn *free_fn) { struct scatterlist *sgl, *next; if (unlikely(!table->sgl)) return; sgl = table->sgl; while (table->orig_nents) { unsigned int alloc_size = table->orig_nents; unsigned int sg_size; /* * If we have more than max_ents segments left, * then assign 'next' to the sg table after the current one. * sg_size is then one less than alloc size, since the last * element is the chain pointer. */ if (alloc_size > max_ents) { next = sg_chain_ptr(&sgl[max_ents - 1]); alloc_size = max_ents; sg_size = alloc_size - 1; } else { sg_size = alloc_size; next = NULL; } table->orig_nents -= sg_size; if (skip_first_chunk) skip_first_chunk = false; else free_fn(sgl, alloc_size); sgl = next; } table->sgl = NULL; } EXPORT_SYMBOL(__sg_free_table); /** * sg_free_table - Free a previously allocated sg table * @table: The mapped sg table header * **/ void sg_free_table(struct sg_table *table) { __sg_free_table(table, SG_MAX_SINGLE_ALLOC, false, sg_kfree); } EXPORT_SYMBOL(sg_free_table); /** * __sg_alloc_table - Allocate and initialize an sg table with given allocator * @table: The sg table header to use * @nents: Number of entries in sg list * @max_ents: The maximum number of entries the allocator returns per call * @gfp_mask: GFP allocation mask * @alloc_fn: Allocator to use * * Description: * This function returns a @table @nents long. The allocator is * defined to return scatterlist chunks of maximum size @max_ents. * Thus if @nents is bigger than @max_ents, the scatterlists will be * chained in units of @max_ents. * * Notes: * If this function returns non-0 (eg failure), the caller must call * __sg_free_table() to cleanup any leftover allocations. * **/ int __sg_alloc_table(struct sg_table *table, unsigned int nents, unsigned int max_ents, struct scatterlist *first_chunk, gfp_t gfp_mask, sg_alloc_fn *alloc_fn) { struct scatterlist *sg, *prv; unsigned int left; memset(table, 0, sizeof(*table)); if (nents == 0) return -EINVAL; #ifndef CONFIG_ARCH_HAS_SG_CHAIN if (WARN_ON_ONCE(nents > max_ents)) return -EINVAL; #endif left = nents; prv = NULL; do { unsigned int sg_size, alloc_size = left; if (alloc_size > max_ents) { alloc_size = max_ents; sg_size = alloc_size - 1; } else sg_size = alloc_size; left -= sg_size; if (first_chunk) { sg = first_chunk; first_chunk = NULL; } else { sg = alloc_fn(alloc_size, gfp_mask); } if (unlikely(!sg)) { /* * Adjust entry count to reflect that the last * entry of the previous table won't be used for * linkage. Without this, sg_kfree() may get * confused. */ if (prv) table->nents = ++table->orig_nents; return -ENOMEM; } sg_init_table(sg, alloc_size); table->nents = table->orig_nents += sg_size; /* * If this is the first mapping, assign the sg table header. * If this is not the first mapping, chain previous part. */ if (prv) sg_chain(prv, max_ents, sg); else table->sgl = sg; /* * If no more entries after this one, mark the end */ if (!left) sg_mark_end(&sg[sg_size - 1]); prv = sg; } while (left); return 0; } EXPORT_SYMBOL(__sg_alloc_table); /** * sg_alloc_table - Allocate and initialize an sg table * @table: The sg table header to use * @nents: Number of entries in sg list * @gfp_mask: GFP allocation mask * * Description: * Allocate and initialize an sg table. If @nents@ is larger than * SG_MAX_SINGLE_ALLOC a chained sg table will be setup. * **/ int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask) { int ret; ret = __sg_alloc_table(table, nents, SG_MAX_SINGLE_ALLOC, NULL, gfp_mask, sg_kmalloc); if (unlikely(ret)) __sg_free_table(table, SG_MAX_SINGLE_ALLOC, false, sg_kfree); return ret; } EXPORT_SYMBOL(sg_alloc_table); /** * __sg_alloc_table_from_pages - Allocate and initialize an sg table from * an array of pages * @sgt: The sg table header to use * @pages: Pointer to an array of page pointers * @n_pages: Number of pages in the pages array * @offset: Offset from start of the first page to the start of a buffer * @size: Number of valid bytes in the buffer (after offset) * @max_segment: Maximum size of a scatterlist node in bytes (page aligned) * @gfp_mask: GFP allocation mask * * Description: * Allocate and initialize an sg table from a list of pages. Contiguous * ranges of the pages are squashed into a single scatterlist node up to the * maximum size specified in @max_segment. An user may provide an offset at a * start and a size of valid data in a buffer specified by the page array. * The returned sg table is released by sg_free_table. * * Returns: * 0 on success, negative error on failure */ int __sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages, unsigned int n_pages, unsigned int offset, unsigned long size, unsigned int max_segment, gfp_t gfp_mask) { unsigned int chunks, cur_page, seg_len, i; int ret; struct scatterlist *s; if (WARN_ON(!max_segment || offset_in_page(max_segment))) return -EINVAL; /* compute number of contiguous chunks */ chunks = 1; seg_len = 0; for (i = 1; i < n_pages; i++) { seg_len += PAGE_SIZE; if (seg_len >= max_segment || page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) { chunks++; seg_len = 0; } } ret = sg_alloc_table(sgt, chunks, gfp_mask); if (unlikely(ret)) return ret; /* merging chunks and putting them into the scatterlist */ cur_page = 0; for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { unsigned int j, chunk_size; /* look for the end of the current chunk */ seg_len = 0; for (j = cur_page + 1; j < n_pages; j++) { seg_len += PAGE_SIZE; if (seg_len >= max_segment || page_to_pfn(pages[j]) != page_to_pfn(pages[j - 1]) + 1) break; } chunk_size = ((j - cur_page) << PAGE_SHIFT) - offset; sg_set_page(s, pages[cur_page], min_t(unsigned long, size, chunk_size), offset); size -= chunk_size; offset = 0; cur_page = j; } return 0; } EXPORT_SYMBOL(__sg_alloc_table_from_pages); /** * sg_alloc_table_from_pages - Allocate and initialize an sg table from * an array of pages * @sgt: The sg table header to use * @pages: Pointer to an array of page pointers * @n_pages: Number of pages in the pages array * @offset: Offset from start of the first page to the start of a buffer * @size: Number of valid bytes in the buffer (after offset) * @gfp_mask: GFP allocation mask * * Description: * Allocate and initialize an sg table from a list of pages. Contiguous * ranges of the pages are squashed into a single scatterlist node. A user * may provide an offset at a start and a size of valid data in a buffer * specified by the page array. The returned sg table is released by * sg_free_table. * * Returns: * 0 on success, negative error on failure */ int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages, unsigned int n_pages, unsigned int offset, unsigned long size, gfp_t gfp_mask) { return __sg_alloc_table_from_pages(sgt, pages, n_pages, offset, size, SCATTERLIST_MAX_SEGMENT, gfp_mask); } EXPORT_SYMBOL(sg_alloc_table_from_pages); #ifdef CONFIG_SGL_ALLOC /** * sgl_alloc_order - allocate a scatterlist and its pages * @length: Length in bytes of the scatterlist. Must be at least one * @order: Second argument for alloc_pages() * @chainable: Whether or not to allocate an extra element in the scatterlist * for scatterlist chaining purposes * @gfp: Memory allocation flags * @nent_p: [out] Number of entries in the scatterlist that have pages * * Returns: A pointer to an initialized scatterlist or %NULL upon failure. */ struct scatterlist *sgl_alloc_order(unsigned long long length, unsigned int order, bool chainable, gfp_t gfp, unsigned int *nent_p) { struct scatterlist *sgl, *sg; struct page *page; unsigned int nent, nalloc; u32 elem_len; nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order); /* Check for integer overflow */ if (length > (nent << (PAGE_SHIFT + order))) return NULL; nalloc = nent; if (chainable) { /* Check for integer overflow */ if (nalloc + 1 < nalloc) return NULL; nalloc++; } sgl = kmalloc_array(nalloc, sizeof(struct scatterlist), (gfp & ~GFP_DMA) | __GFP_ZERO); if (!sgl) return NULL; sg_init_table(sgl, nalloc); sg = sgl; while (length) { elem_len = min_t(u64, length, PAGE_SIZE << order); page = alloc_pages(gfp, order); if (!page) { sgl_free_order(sgl, order); return NULL; } sg_set_page(sg, page, elem_len, 0); length -= elem_len; sg = sg_next(sg); } WARN_ONCE(length, "length = %lld\n", length); if (nent_p) *nent_p = nent; return sgl; } EXPORT_SYMBOL(sgl_alloc_order); /** * sgl_alloc - allocate a scatterlist and its pages * @length: Length in bytes of the scatterlist * @gfp: Memory allocation flags * @nent_p: [out] Number of entries in the scatterlist * * Returns: A pointer to an initialized scatterlist or %NULL upon failure. */ struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp, unsigned int *nent_p) { return sgl_alloc_order(length, 0, false, gfp, nent_p); } EXPORT_SYMBOL(sgl_alloc); /** * sgl_free_n_order - free a scatterlist and its pages * @sgl: Scatterlist with one or more elements * @nents: Maximum number of elements to free * @order: Second argument for __free_pages() * * Notes: * - If several scatterlists have been chained and each chain element is * freed separately then it's essential to set nents correctly to avoid that a * page would get freed twice. * - All pages in a chained scatterlist can be freed at once by setting @nents * to a high number. */ void sgl_free_n_order(struct scatterlist *sgl, int nents, int order) { struct scatterlist *sg; struct page *page; int i; for_each_sg(sgl, sg, nents, i) { if (!sg) break; page = sg_page(sg); if (page) __free_pages(page, order); } kfree(sgl); } EXPORT_SYMBOL(sgl_free_n_order); /** * sgl_free_order - free a scatterlist and its pages * @sgl: Scatterlist with one or more elements * @order: Second argument for __free_pages() */ void sgl_free_order(struct scatterlist *sgl, int order) { sgl_free_n_order(sgl, INT_MAX, order); } EXPORT_SYMBOL(sgl_free_order); /** * sgl_free - free a scatterlist and its pages * @sgl: Scatterlist with one or more elements */ void sgl_free(struct scatterlist *sgl) { sgl_free_order(sgl, 0); } EXPORT_SYMBOL(sgl_free); #endif /* CONFIG_SGL_ALLOC */ void __sg_page_iter_start(struct sg_page_iter *piter, struct scatterlist *sglist, unsigned int nents, unsigned long pgoffset) { piter->__pg_advance = 0; piter->__nents = nents; piter->sg = sglist; piter->sg_pgoffset = pgoffset; } EXPORT_SYMBOL(__sg_page_iter_start); static int sg_page_count(struct scatterlist *sg) { return PAGE_ALIGN(sg->offset + sg->length) >> PAGE_SHIFT; } bool __sg_page_iter_next(struct sg_page_iter *piter) { if (!piter->__nents || !piter->sg) return false; piter->sg_pgoffset += piter->__pg_advance; piter->__pg_advance = 1; while (piter->sg_pgoffset >= sg_page_count(piter->sg)) { piter->sg_pgoffset -= sg_page_count(piter->sg); piter->sg = sg_next(piter->sg); if (!--piter->__nents || !piter->sg) return false; } return true; } EXPORT_SYMBOL(__sg_page_iter_next); /** * sg_miter_start - start mapping iteration over a sg list * @miter: sg mapping iter to be started * @sgl: sg list to iterate over * @nents: number of sg entries * * Description: * Starts mapping iterator @miter. * * Context: * Don't care. */ void sg_miter_start(struct sg_mapping_iter *miter, struct scatterlist *sgl, unsigned int nents, unsigned int flags) { memset(miter, 0, sizeof(struct sg_mapping_iter)); __sg_page_iter_start(&miter->piter, sgl, nents, 0); WARN_ON(!(flags & (SG_MITER_TO_SG | SG_MITER_FROM_SG))); miter->__flags = flags; } EXPORT_SYMBOL(sg_miter_start); static bool sg_miter_get_next_page(struct sg_mapping_iter *miter) { if (!miter->__remaining) { struct scatterlist *sg; if (!__sg_page_iter_next(&miter->piter)) return false; sg = miter->piter.sg; miter->__offset = miter->piter.sg_pgoffset ? 0 : sg->offset; miter->piter.sg_pgoffset += miter->__offset >> PAGE_SHIFT; miter->__offset &= PAGE_SIZE - 1; miter->__remaining = sg->offset + sg->length - (miter->piter.sg_pgoffset << PAGE_SHIFT) - miter->__offset; miter->__remaining = min_t(unsigned long, miter->__remaining, PAGE_SIZE - miter->__offset); } return true; } /** * sg_miter_skip - reposition mapping iterator * @miter: sg mapping iter to be skipped * @offset: number of bytes to plus the current location * * Description: * Sets the offset of @miter to its current location plus @offset bytes. * If mapping iterator @miter has been proceeded by sg_miter_next(), this * stops @miter. * * Context: * Don't care if @miter is stopped, or not proceeded yet. * Otherwise, preemption disabled if the SG_MITER_ATOMIC is set. * * Returns: * true if @miter contains the valid mapping. false if end of sg * list is reached. */ bool sg_miter_skip(struct sg_mapping_iter *miter, off_t offset) { sg_miter_stop(miter); while (offset) { off_t consumed; if (!sg_miter_get_next_page(miter)) return false; consumed = min_t(off_t, offset, miter->__remaining); miter->__offset += consumed; miter->__remaining -= consumed; offset -= consumed; } return true; } EXPORT_SYMBOL(sg_miter_skip); /** * sg_miter_next - proceed mapping iterator to the next mapping * @miter: sg mapping iter to proceed * * Description: * Proceeds @miter to the next mapping. @miter should have been started * using sg_miter_start(). On successful return, @miter->page, * @miter->addr and @miter->length point to the current mapping. * * Context: * Preemption disabled if SG_MITER_ATOMIC. Preemption must stay disabled * till @miter is stopped. May sleep if !SG_MITER_ATOMIC. * * Returns: * true if @miter contains the next mapping. false if end of sg * list is reached. */ bool sg_miter_next(struct sg_mapping_iter *miter) { sg_miter_stop(miter); /* * Get to the next page if necessary. * __remaining, __offset is adjusted by sg_miter_stop */ if (!sg_miter_get_next_page(miter)) return false; miter->page = sg_page_iter_page(&miter->piter); miter->consumed = miter->length = miter->__remaining; if (miter->__flags & SG_MITER_ATOMIC) miter->addr = kmap_atomic(miter->page) + miter->__offset; else miter->addr = kmap(miter->page) + miter->__offset; return true; } EXPORT_SYMBOL(sg_miter_next); /** * sg_miter_stop - stop mapping iteration * @miter: sg mapping iter to be stopped * * Description: * Stops mapping iterator @miter. @miter should have been started * using sg_miter_start(). A stopped iteration can be resumed by * calling sg_miter_next() on it. This is useful when resources (kmap) * need to be released during iteration. * * Context: * Preemption disabled if the SG_MITER_ATOMIC is set. Don't care * otherwise. */ void sg_miter_stop(struct sg_mapping_iter *miter) { WARN_ON(miter->consumed > miter->length); /* drop resources from the last iteration */ if (miter->addr) { miter->__offset += miter->consumed; miter->__remaining -= miter->consumed; if ((miter->__flags & SG_MITER_TO_SG) && !PageSlab(miter->page)) flush_kernel_dcache_page(miter->page); if (miter->__flags & SG_MITER_ATOMIC) { WARN_ON_ONCE(preemptible()); kunmap_atomic(miter->addr); } else kunmap(miter->page); miter->page = NULL; miter->addr = NULL; miter->length = 0; miter->consumed = 0; } } EXPORT_SYMBOL(sg_miter_stop); /** * sg_copy_buffer - Copy data between a linear buffer and an SG list * @sgl: The SG list * @nents: Number of SG entries * @buf: Where to copy from * @buflen: The number of bytes to copy * @skip: Number of bytes to skip before copying * @to_buffer: transfer direction (true == from an sg list to a * buffer, false == from a buffer to an sg list * * Returns the number of copied bytes. * **/ size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf, size_t buflen, off_t skip, bool to_buffer) { unsigned int offset = 0; struct sg_mapping_iter miter; unsigned int sg_flags = SG_MITER_ATOMIC; if (to_buffer) sg_flags |= SG_MITER_FROM_SG; else sg_flags |= SG_MITER_TO_SG; sg_miter_start(&miter, sgl, nents, sg_flags); if (!sg_miter_skip(&miter, skip)) return false; while ((offset < buflen) && sg_miter_next(&miter)) { unsigned int len; len = min(miter.length, buflen - offset); if (to_buffer) memcpy(buf + offset, miter.addr, len); else memcpy(miter.addr, buf + offset, len); offset += len; } sg_miter_stop(&miter); return offset; } EXPORT_SYMBOL(sg_copy_buffer); /** * sg_copy_from_buffer - Copy from a linear buffer to an SG list * @sgl: The SG list * @nents: Number of SG entries * @buf: Where to copy from * @buflen: The number of bytes to copy * * Returns the number of copied bytes. * **/ size_t sg_copy_from_buffer(struct scatterlist *sgl, unsigned int nents, const void *buf, size_t buflen) { return sg_copy_buffer(sgl, nents, (void *)buf, buflen, 0, false); } EXPORT_SYMBOL(sg_copy_from_buffer); /** * sg_copy_to_buffer - Copy from an SG list to a linear buffer * @sgl: The SG list * @nents: Number of SG entries * @buf: Where to copy to * @buflen: The number of bytes to copy * * Returns the number of copied bytes. * **/ size_t sg_copy_to_buffer(struct scatterlist *sgl, unsigned int nents, void *buf, size_t buflen) { return sg_copy_buffer(sgl, nents, buf, buflen, 0, true); } EXPORT_SYMBOL(sg_copy_to_buffer); /** * sg_pcopy_from_buffer - Copy from a linear buffer to an SG list * @sgl: The SG list * @nents: Number of SG entries * @buf: Where to copy from * @buflen: The number of bytes to copy * @skip: Number of bytes to skip before copying * * Returns the number of copied bytes. * **/ size_t sg_pcopy_from_buffer(struct scatterlist *sgl, unsigned int nents, const void *buf, size_t buflen, off_t skip) { return sg_copy_buffer(sgl, nents, (void *)buf, buflen, skip, false); } EXPORT_SYMBOL(sg_pcopy_from_buffer); /** * sg_pcopy_to_buffer - Copy from an SG list to a linear buffer * @sgl: The SG list * @nents: Number of SG entries * @buf: Where to copy to * @buflen: The number of bytes to copy * @skip: Number of bytes to skip before copying * * Returns the number of copied bytes. * **/ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents, void *buf, size_t buflen, off_t skip) { return sg_copy_buffer(sgl, nents, buf, buflen, skip, true); } EXPORT_SYMBOL(sg_pcopy_to_buffer); /** * sg_zero_buffer - Zero-out a part of a SG list * @sgl: The SG list * @nents: Number of SG entries * @buflen: The number of bytes to zero out * @skip: Number of bytes to skip before zeroing * * Returns the number of bytes zeroed. **/ size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents, size_t buflen, off_t skip) { unsigned int offset = 0; struct sg_mapping_iter miter; unsigned int sg_flags = SG_MITER_ATOMIC | SG_MITER_TO_SG; sg_miter_start(&miter, sgl, nents, sg_flags); if (!sg_miter_skip(&miter, skip)) return false; while (offset < buflen && sg_miter_next(&miter)) { unsigned int len; len = min(miter.length, buflen - offset); memset(miter.addr, 0, len); offset += len; } sg_miter_stop(&miter); return offset; } EXPORT_SYMBOL(sg_zero_buffer);
40 40 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 /* * linux/lib/crc-ccitt.c * * This source code is licensed under the GNU General Public License, * Version 2. See the file COPYING for more details. */ #include <linux/types.h> #include <linux/module.h> #include <linux/crc-ccitt.h> /* * This mysterious table is just the CRC of each possible byte. It can be * computed using the standard bit-at-a-time methods. The polynomial can * be seen in entry 128, 0x8408. This corresponds to x^0 + x^5 + x^12. * Add the implicit x^16, and you have the standard CRC-CCITT. */ u16 const crc_ccitt_table[256] = { 0x0000, 0x1189, 0x2312, 0x329b, 0x4624, 0x57ad, 0x6536, 0x74bf, 0x8c48, 0x9dc1, 0xaf5a, 0xbed3, 0xca6c, 0xdbe5, 0xe97e, 0xf8f7, 0x1081, 0x0108, 0x3393, 0x221a, 0x56a5, 0x472c, 0x75b7, 0x643e, 0x9cc9, 0x8d40, 0xbfdb, 0xae52, 0xdaed, 0xcb64, 0xf9ff, 0xe876, 0x2102, 0x308b, 0x0210, 0x1399, 0x6726, 0x76af, 0x4434, 0x55bd, 0xad4a, 0xbcc3, 0x8e58, 0x9fd1, 0xeb6e, 0xfae7, 0xc87c, 0xd9f5, 0x3183, 0x200a, 0x1291, 0x0318, 0x77a7, 0x662e, 0x54b5, 0x453c, 0xbdcb, 0xac42, 0x9ed9, 0x8f50, 0xfbef, 0xea66, 0xd8fd, 0xc974, 0x4204, 0x538d, 0x6116, 0x709f, 0x0420, 0x15a9, 0x2732, 0x36bb, 0xce4c, 0xdfc5, 0xed5e, 0xfcd7, 0x8868, 0x99e1, 0xab7a, 0xbaf3, 0x5285, 0x430c, 0x7197, 0x601e, 0x14a1, 0x0528, 0x37b3, 0x263a, 0xdecd, 0xcf44, 0xfddf, 0xec56, 0x98e9, 0x8960, 0xbbfb, 0xaa72, 0x6306, 0x728f, 0x4014, 0x519d, 0x2522, 0x34ab, 0x0630, 0x17b9, 0xef4e, 0xfec7, 0xcc5c, 0xddd5, 0xa96a, 0xb8e3, 0x8a78, 0x9bf1, 0x7387, 0x620e, 0x5095, 0x411c, 0x35a3, 0x242a, 0x16b1, 0x0738, 0xffcf, 0xee46, 0xdcdd, 0xcd54, 0xb9eb, 0xa862, 0x9af9, 0x8b70, 0x8408, 0x9581, 0xa71a, 0xb693, 0xc22c, 0xd3a5, 0xe13e, 0xf0b7, 0x0840, 0x19c9, 0x2b52, 0x3adb, 0x4e64, 0x5fed, 0x6d76, 0x7cff, 0x9489, 0x8500, 0xb79b, 0xa612, 0xd2ad, 0xc324, 0xf1bf, 0xe036, 0x18c1, 0x0948, 0x3bd3, 0x2a5a, 0x5ee5, 0x4f6c, 0x7df7, 0x6c7e, 0xa50a, 0xb483, 0x8618, 0x9791, 0xe32e, 0xf2a7, 0xc03c, 0xd1b5, 0x2942, 0x38cb, 0x0a50, 0x1bd9, 0x6f66, 0x7eef, 0x4c74, 0x5dfd, 0xb58b, 0xa402, 0x9699, 0x8710, 0xf3af, 0xe226, 0xd0bd, 0xc134, 0x39c3, 0x284a, 0x1ad1, 0x0b58, 0x7fe7, 0x6e6e, 0x5cf5, 0x4d7c, 0xc60c, 0xd785, 0xe51e, 0xf497, 0x8028, 0x91a1, 0xa33a, 0xb2b3, 0x4a44, 0x5bcd, 0x6956, 0x78df, 0x0c60, 0x1de9, 0x2f72, 0x3efb, 0xd68d, 0xc704, 0xf59f, 0xe416, 0x90a9, 0x8120, 0xb3bb, 0xa232, 0x5ac5, 0x4b4c, 0x79d7, 0x685e, 0x1ce1, 0x0d68, 0x3ff3, 0x2e7a, 0xe70e, 0xf687, 0xc41c, 0xd595, 0xa12a, 0xb0a3, 0x8238, 0x93b1, 0x6b46, 0x7acf, 0x4854, 0x59dd, 0x2d62, 0x3ceb, 0x0e70, 0x1ff9, 0xf78f, 0xe606, 0xd49d, 0xc514, 0xb1ab, 0xa022, 0x92b9, 0x8330, 0x7bc7, 0x6a4e, 0x58d5, 0x495c, 0x3de3, 0x2c6a, 0x1ef1, 0x0f78 }; EXPORT_SYMBOL(crc_ccitt_table); /* * Similar table to calculate CRC16 variant known as CRC-CCITT-FALSE * Reflected bits order, does not augment final value. */ u16 const crc_ccitt_false_table[256] = { 0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7, 0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF, 0x1231, 0x0210, 0x3273, 0x2252, 0x52B5, 0x4294, 0x72F7, 0x62D6, 0x9339, 0x8318, 0xB37B, 0xA35A, 0xD3BD, 0xC39C, 0xF3FF, 0xE3DE, 0x2462, 0x3443, 0x0420, 0x1401, 0x64E6, 0x74C7, 0x44A4, 0x5485, 0xA56A, 0xB54B, 0x8528, 0x9509, 0xE5EE, 0xF5CF, 0xC5AC, 0xD58D, 0x3653, 0x2672, 0x1611, 0x0630, 0x76D7, 0x66F6, 0x5695, 0x46B4, 0xB75B, 0xA77A, 0x9719, 0x8738, 0xF7DF, 0xE7FE, 0xD79D, 0xC7BC, 0x48C4, 0x58E5, 0x6886, 0x78A7, 0x0840, 0x1861, 0x2802, 0x3823, 0xC9CC, 0xD9ED, 0xE98E, 0xF9AF, 0x8948, 0x9969, 0xA90A, 0xB92B, 0x5AF5, 0x4AD4, 0x7AB7, 0x6A96, 0x1A71, 0x0A50, 0x3A33, 0x2A12, 0xDBFD, 0xCBDC, 0xFBBF, 0xEB9E, 0x9B79, 0x8B58, 0xBB3B, 0xAB1A, 0x6CA6, 0x7C87, 0x4CE4, 0x5CC5, 0x2C22, 0x3C03, 0x0C60, 0x1C41, 0xEDAE, 0xFD8F, 0xCDEC, 0xDDCD, 0xAD2A, 0xBD0B, 0x8D68, 0x9D49, 0x7E97, 0x6EB6, 0x5ED5, 0x4EF4, 0x3E13, 0x2E32, 0x1E51, 0x0E70, 0xFF9F, 0xEFBE, 0xDFDD, 0xCFFC, 0xBF1B, 0xAF3A, 0x9F59, 0x8F78, 0x9188, 0x81A9, 0xB1CA, 0xA1EB, 0xD10C, 0xC12D, 0xF14E, 0xE16F, 0x1080, 0x00A1, 0x30C2, 0x20E3, 0x5004, 0x4025, 0x7046, 0x6067, 0x83B9, 0x9398, 0xA3FB, 0xB3DA, 0xC33D, 0xD31C, 0xE37F, 0xF35E, 0x02B1, 0x1290, 0x22F3, 0x32D2, 0x4235, 0x5214, 0x6277, 0x7256, 0xB5EA, 0xA5CB, 0x95A8, 0x8589, 0xF56E, 0xE54F, 0xD52C, 0xC50D, 0x34E2, 0x24C3, 0x14A0, 0x0481, 0x7466, 0x6447, 0x5424, 0x4405, 0xA7DB, 0xB7FA, 0x8799, 0x97B8, 0xE75F, 0xF77E, 0xC71D, 0xD73C, 0x26D3, 0x36F2, 0x0691, 0x16B0, 0x6657, 0x7676, 0x4615, 0x5634, 0xD94C, 0xC96D, 0xF90E, 0xE92F, 0x99C8, 0x89E9, 0xB98A, 0xA9AB, 0x5844, 0x4865, 0x7806, 0x6827, 0x18C0, 0x08E1, 0x3882, 0x28A3, 0xCB7D, 0xDB5C, 0xEB3F, 0xFB1E, 0x8BF9, 0x9BD8, 0xABBB, 0xBB9A, 0x4A75, 0x5A54, 0x6A37, 0x7A16, 0x0AF1, 0x1AD0, 0x2AB3, 0x3A92, 0xFD2E, 0xED0F, 0xDD6C, 0xCD4D, 0xBDAA, 0xAD8B, 0x9DE8, 0x8DC9, 0x7C26, 0x6C07, 0x5C64, 0x4C45, 0x3CA2, 0x2C83, 0x1CE0, 0x0CC1, 0xEF1F, 0xFF3E, 0xCF5D, 0xDF7C, 0xAF9B, 0xBFBA, 0x8FD9, 0x9FF8, 0x6E17, 0x7E36, 0x4E55, 0x5E74, 0x2E93, 0x3EB2, 0x0ED1, 0x1EF0 }; EXPORT_SYMBOL(crc_ccitt_false_table); /** * crc_ccitt - recompute the CRC (CRC-CCITT variant) for the data * buffer * @crc: previous CRC value * @buffer: data pointer * @len: number of bytes in the buffer */ u16 crc_ccitt(u16 crc, u8 const *buffer, size_t len) { while (len--) crc = crc_ccitt_byte(crc, *buffer++); return crc; } EXPORT_SYMBOL(crc_ccitt); /** * crc_ccitt_false - recompute the CRC (CRC-CCITT-FALSE variant) * for the data buffer * @crc: previous CRC value * @buffer: data pointer * @len: number of bytes in the buffer */ u16 crc_ccitt_false(u16 crc, u8 const *buffer, size_t len) { while (len--) crc = crc_ccitt_false_byte(crc, *buffer++); return crc; } EXPORT_SYMBOL(crc_ccitt_false); MODULE_DESCRIPTION("CRC-CCITT calculations"); MODULE_LICENSE("GPL");
1 1 1 6 6 8 8 6 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 /* * AT and PS/2 keyboard driver * * Copyright (c) 1999-2002 Vojtech Pavlik */ /* * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 as published by * the Free Software Foundation. */ /* * This driver can handle standard AT keyboards and PS/2 keyboards in * Translated and Raw Set 2 and Set 3, as well as AT keyboards on dumb * input-only controllers and AT keyboards connected over a one way RS232 * converter. */ #include <linux/delay.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/interrupt.h> #include <linux/init.h> #include <linux/input.h> #include <linux/serio.h> #include <linux/workqueue.h> #include <linux/libps2.h> #include <linux/mutex.h> #include <linux/dmi.h> #define DRIVER_DESC "AT and PS/2 keyboard driver" MODULE_AUTHOR("Vojtech Pavlik <vojtech@suse.cz>"); MODULE_DESCRIPTION(DRIVER_DESC); MODULE_LICENSE("GPL"); static int atkbd_set = 2; module_param_named(set, atkbd_set, int, 0); MODULE_PARM_DESC(set, "Select keyboard code set (2 = default, 3 = PS/2 native)"); #if defined(__i386__) || defined(__x86_64__) || defined(__hppa__) static bool atkbd_reset; #else static bool atkbd_reset = true; #endif module_param_named(reset, atkbd_reset, bool, 0); MODULE_PARM_DESC(reset, "Reset keyboard during initialization"); static bool atkbd_softrepeat; module_param_named(softrepeat, atkbd_softrepeat, bool, 0); MODULE_PARM_DESC(softrepeat, "Use software keyboard repeat"); static bool atkbd_softraw = true; module_param_named(softraw, atkbd_softraw, bool, 0); MODULE_PARM_DESC(softraw, "Use software generated rawmode"); static bool atkbd_scroll; module_param_named(scroll, atkbd_scroll, bool, 0); MODULE_PARM_DESC(scroll, "Enable scroll-wheel on MS Office and similar keyboards"); static bool atkbd_extra; module_param_named(extra, atkbd_extra, bool, 0); MODULE_PARM_DESC(extra, "Enable extra LEDs and keys on IBM RapidAcces, EzKey and similar keyboards"); static bool atkbd_terminal; module_param_named(terminal, atkbd_terminal, bool, 0); MODULE_PARM_DESC(terminal, "Enable break codes on an IBM Terminal keyboard connected via AT/PS2"); /* * Scancode to keycode tables. These are just the default setting, and * are loadable via a userland utility. */ #define ATKBD_KEYMAP_SIZE 512 static const unsigned short atkbd_set2_keycode[ATKBD_KEYMAP_SIZE] = { #ifdef CONFIG_KEYBOARD_ATKBD_HP_KEYCODES /* XXX: need a more general approach */ #include "hpps2atkbd.h" /* include the keyboard scancodes */ #else 0, 67, 65, 63, 61, 59, 60, 88, 0, 68, 66, 64, 62, 15, 41,117, 0, 56, 42, 93, 29, 16, 2, 0, 0, 0, 44, 31, 30, 17, 3, 0, 0, 46, 45, 32, 18, 5, 4, 95, 0, 57, 47, 33, 20, 19, 6,183, 0, 49, 48, 35, 34, 21, 7,184, 0, 0, 50, 36, 22, 8, 9,185, 0, 51, 37, 23, 24, 11, 10, 0, 0, 52, 53, 38, 39, 25, 12, 0, 0, 89, 40, 0, 26, 13, 0, 0, 58, 54, 28, 27, 0, 43, 0, 85, 0, 86, 91, 90, 92, 0, 14, 94, 0, 79,124, 75, 71,121, 0, 0, 82, 83, 80, 76, 77, 72, 1, 69, 87, 78, 81, 74, 55, 73, 70, 99, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 217,100,255, 0, 97,165, 0, 0,156, 0, 0, 0, 0, 0, 0,125, 173,114, 0,113, 0, 0, 0,126,128, 0, 0,140, 0, 0, 0,127, 159, 0,115, 0,164, 0, 0,116,158, 0,172,166, 0, 0, 0,142, 157, 0, 0, 0, 0, 0, 0, 0,155, 0, 98, 0, 0,163, 0, 0, 226, 0, 0, 0, 0, 0, 0, 0, 0,255, 96, 0, 0, 0,143, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,107, 0,105,102, 0, 0,112, 110,111,108,112,106,103, 0,119, 0,118,109, 0, 99,104,119, 0, 0, 0, 0, 65, 99, #endif }; static const unsigned short atkbd_set3_keycode[ATKBD_KEYMAP_SIZE] = { 0, 0, 0, 0, 0, 0, 0, 59, 1,138,128,129,130, 15, 41, 60, 131, 29, 42, 86, 58, 16, 2, 61,133, 56, 44, 31, 30, 17, 3, 62, 134, 46, 45, 32, 18, 5, 4, 63,135, 57, 47, 33, 20, 19, 6, 64, 136, 49, 48, 35, 34, 21, 7, 65,137,100, 50, 36, 22, 8, 9, 66, 125, 51, 37, 23, 24, 11, 10, 67,126, 52, 53, 38, 39, 25, 12, 68, 113,114, 40, 43, 26, 13, 87, 99, 97, 54, 28, 27, 43, 43, 88, 70, 108,105,119,103,111,107, 14,110, 0, 79,106, 75, 71,109,102,104, 82, 83, 80, 76, 77, 72, 69, 98, 0, 96, 81, 0, 78, 73, 55,183, 184,185,186,187, 74, 94, 92, 93, 0, 0, 0,125,126,127,112, 0, 0,139,172,163,165,115,152,172,166,140,160,154,113,114,167,168, 148,149,147,140 }; static const unsigned short atkbd_unxlate_table[128] = { 0,118, 22, 30, 38, 37, 46, 54, 61, 62, 70, 69, 78, 85,102, 13, 21, 29, 36, 45, 44, 53, 60, 67, 68, 77, 84, 91, 90, 20, 28, 27, 35, 43, 52, 51, 59, 66, 75, 76, 82, 14, 18, 93, 26, 34, 33, 42, 50, 49, 58, 65, 73, 74, 89,124, 17, 41, 88, 5, 6, 4, 12, 3, 11, 2, 10, 1, 9,119,126,108,117,125,123,107,115,116,121,105, 114,122,112,113,127, 96, 97,120, 7, 15, 23, 31, 39, 47, 55, 63, 71, 79, 86, 94, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 87,111, 19, 25, 57, 81, 83, 92, 95, 98, 99,100,101,103,104,106,109,110 }; #define ATKBD_CMD_SETLEDS 0x10ed #define ATKBD_CMD_GSCANSET 0x11f0 #define ATKBD_CMD_SSCANSET 0x10f0 #define ATKBD_CMD_GETID 0x02f2 #define ATKBD_CMD_SETREP 0x10f3 #define ATKBD_CMD_ENABLE 0x00f4 #define ATKBD_CMD_RESET_DIS 0x00f5 /* Reset to defaults and disable */ #define ATKBD_CMD_RESET_DEF 0x00f6 /* Reset to defaults */ #define ATKBD_CMD_SETALL_MB 0x00f8 /* Set all keys to give break codes */ #define ATKBD_CMD_SETALL_MBR 0x00fa /* ... and repeat */ #define ATKBD_CMD_RESET_BAT 0x02ff #define ATKBD_CMD_RESEND 0x00fe #define ATKBD_CMD_EX_ENABLE 0x10ea #define ATKBD_CMD_EX_SETLEDS 0x20eb #define ATKBD_CMD_OK_GETID 0x02e8 #define ATKBD_RET_ACK 0xfa #define ATKBD_RET_NAK 0xfe #define ATKBD_RET_BAT 0xaa #define ATKBD_RET_EMUL0 0xe0 #define ATKBD_RET_EMUL1 0xe1 #define ATKBD_RET_RELEASE 0xf0 #define ATKBD_RET_HANJA 0xf1 #define ATKBD_RET_HANGEUL 0xf2 #define ATKBD_RET_ERR 0xff #define ATKBD_KEY_UNKNOWN 0 #define ATKBD_KEY_NULL 255 #define ATKBD_SCR_1 0xfffe #define ATKBD_SCR_2 0xfffd #define ATKBD_SCR_4 0xfffc #define ATKBD_SCR_8 0xfffb #define ATKBD_SCR_CLICK 0xfffa #define ATKBD_SCR_LEFT 0xfff9 #define ATKBD_SCR_RIGHT 0xfff8 #define ATKBD_SPECIAL ATKBD_SCR_RIGHT #define ATKBD_LED_EVENT_BIT 0 #define ATKBD_REP_EVENT_BIT 1 #define ATKBD_XL_ERR 0x01 #define ATKBD_XL_BAT 0x02 #define ATKBD_XL_ACK 0x04 #define ATKBD_XL_NAK 0x08 #define ATKBD_XL_HANGEUL 0x10 #define ATKBD_XL_HANJA 0x20 static const struct { unsigned short keycode; unsigned char set2; } atkbd_scroll_keys[] = { { ATKBD_SCR_1, 0xc5 }, { ATKBD_SCR_2, 0x9d }, { ATKBD_SCR_4, 0xa4 }, { ATKBD_SCR_8, 0x9b }, { ATKBD_SCR_CLICK, 0xe0 }, { ATKBD_SCR_LEFT, 0xcb }, { ATKBD_SCR_RIGHT, 0xd2 }, }; /* * The atkbd control structure */ struct atkbd { struct ps2dev ps2dev; struct input_dev *dev; /* Written only during init */ char name[64]; char phys[32]; unsigned short id; unsigned short keycode[ATKBD_KEYMAP_SIZE]; DECLARE_BITMAP(force_release_mask, ATKBD_KEYMAP_SIZE); unsigned char set; bool translated; bool extra; bool write; bool softrepeat; bool softraw; bool scroll; bool enabled; /* Accessed only from interrupt */ unsigned char emul; bool resend; bool release; unsigned long xl_bit; unsigned int last; unsigned long time; unsigned long err_count; struct delayed_work event_work; unsigned long event_jiffies; unsigned long event_mask; /* Serializes reconnect(), attr->set() and event work */ struct mutex mutex; }; /* * System-specific keymap fixup routine */ static void (*atkbd_platform_fixup)(struct atkbd *, const void *data); static void *atkbd_platform_fixup_data; static unsigned int (*atkbd_platform_scancode_fixup)(struct atkbd *, unsigned int); /* * Certain keyboards to not like ATKBD_CMD_RESET_DIS and stop responding * to many commands until full reset (ATKBD_CMD_RESET_BAT) is performed. */ static bool atkbd_skip_deactivate; static ssize_t atkbd_attr_show_helper(struct device *dev, char *buf, ssize_t (*handler)(struct atkbd *, char *)); static ssize_t atkbd_attr_set_helper(struct device *dev, const char *buf, size_t count, ssize_t (*handler)(struct atkbd *, const char *, size_t)); #define ATKBD_DEFINE_ATTR(_name) \ static ssize_t atkbd_show_##_name(struct atkbd *, char *); \ static ssize_t atkbd_set_##_name(struct atkbd *, const char *, size_t); \ static ssize_t atkbd_do_show_##_name(struct device *d, \ struct device_attribute *attr, char *b) \ { \ return atkbd_attr_show_helper(d, b, atkbd_show_##_name); \ } \ static ssize_t atkbd_do_set_##_name(struct device *d, \ struct device_attribute *attr, const char *b, size_t s) \ { \ return atkbd_attr_set_helper(d, b, s, atkbd_set_##_name); \ } \ static struct device_attribute atkbd_attr_##_name = \ __ATTR(_name, S_IWUSR | S_IRUGO, atkbd_do_show_##_name, atkbd_do_set_##_name); ATKBD_DEFINE_ATTR(extra); ATKBD_DEFINE_ATTR(force_release); ATKBD_DEFINE_ATTR(scroll); ATKBD_DEFINE_ATTR(set); ATKBD_DEFINE_ATTR(softrepeat); ATKBD_DEFINE_ATTR(softraw); #define ATKBD_DEFINE_RO_ATTR(_name) \ static ssize_t atkbd_show_##_name(struct atkbd *, char *); \ static ssize_t atkbd_do_show_##_name(struct device *d, \ struct device_attribute *attr, char *b) \ { \ return atkbd_attr_show_helper(d, b, atkbd_show_##_name); \ } \ static struct device_attribute atkbd_attr_##_name = \ __ATTR(_name, S_IRUGO, atkbd_do_show_##_name, NULL); ATKBD_DEFINE_RO_ATTR(err_count); static struct attribute *atkbd_attributes[] = { &atkbd_attr_extra.attr, &atkbd_attr_force_release.attr, &atkbd_attr_scroll.attr, &atkbd_attr_set.attr, &atkbd_attr_softrepeat.attr, &atkbd_attr_softraw.attr, &atkbd_attr_err_count.attr, NULL }; static struct attribute_group atkbd_attribute_group = { .attrs = atkbd_attributes, }; static const unsigned int xl_table[] = { ATKBD_RET_BAT, ATKBD_RET_ERR, ATKBD_RET_ACK, ATKBD_RET_NAK, ATKBD_RET_HANJA, ATKBD_RET_HANGEUL, }; /* * Checks if we should mangle the scancode to extract 'release' bit * in translated mode. */ static bool atkbd_need_xlate(unsigned long xl_bit, unsigned char code) { int i; if (code == ATKBD_RET_EMUL0 || code == ATKBD_RET_EMUL1) return false; for (i = 0; i < ARRAY_SIZE(xl_table); i++) if (code == xl_table[i]) return test_bit(i, &xl_bit); return true; } /* * Calculates new value of xl_bit so the driver can distinguish * between make/break pair of scancodes for select keys and PS/2 * protocol responses. */ static void atkbd_calculate_xl_bit(struct atkbd *atkbd, unsigned char code) { int i; for (i = 0; i < ARRAY_SIZE(xl_table); i++) { if (!((code ^ xl_table[i]) & 0x7f)) { if (code & 0x80) __clear_bit(i, &atkbd->xl_bit); else __set_bit(i, &atkbd->xl_bit); break; } } } /* * Encode the scancode, 0xe0 prefix, and high bit into a single integer, * keeping kernel 2.4 compatibility for set 2 */ static unsigned int atkbd_compat_scancode(struct atkbd *atkbd, unsigned int code) { if (atkbd->set == 3) { if (atkbd->emul == 1) code |= 0x100; } else { code = (code & 0x7f) | ((code & 0x80) << 1); if (atkbd->emul == 1) code |= 0x80; } return code; } /* * atkbd_interrupt(). Here takes place processing of data received from * the keyboard into events. */ static irqreturn_t atkbd_interrupt(struct serio *serio, unsigned char data, unsigned int flags) { struct atkbd *atkbd = serio_get_drvdata(serio); struct input_dev *dev = atkbd->dev; unsigned int code = data; int scroll = 0, hscroll = 0, click = -1; int value; unsigned short keycode; dev_dbg(&serio->dev, "Received %02x flags %02x\n", data, flags); #if !defined(__i386__) && !defined (__x86_64__) if ((flags & (SERIO_FRAME | SERIO_PARITY)) && (~flags & SERIO_TIMEOUT) && !atkbd->resend && atkbd->write) { dev_warn(&serio->dev, "Frame/parity error: %02x\n", flags); serio_write(serio, ATKBD_CMD_RESEND); atkbd->resend = true; goto out; } if (!flags && data == ATKBD_RET_ACK) atkbd->resend = false; #endif if (unlikely(atkbd->ps2dev.flags & PS2_FLAG_ACK)) if (ps2_handle_ack(&atkbd->ps2dev, data)) goto out; if (unlikely(atkbd->ps2dev.flags & PS2_FLAG_CMD)) if (ps2_handle_response(&atkbd->ps2dev, data)) goto out; if (!atkbd->enabled) goto out; input_event(dev, EV_MSC, MSC_RAW, code); if (atkbd_platform_scancode_fixup) code = atkbd_platform_scancode_fixup(atkbd, code); if (atkbd->translated) { if (atkbd->emul || atkbd_need_xlate(atkbd->xl_bit, code)) { atkbd->release = code >> 7; code &= 0x7f; } if (!atkbd->emul) atkbd_calculate_xl_bit(atkbd, data); } switch (code) { case ATKBD_RET_BAT: atkbd->enabled = false; serio_reconnect(atkbd->ps2dev.serio); goto out; case ATKBD_RET_EMUL0: atkbd->emul = 1; goto out; case ATKBD_RET_EMUL1: atkbd->emul = 2; goto out; case ATKBD_RET_RELEASE: atkbd->release = true; goto out; case ATKBD_RET_ACK: case ATKBD_RET_NAK: if (printk_ratelimit()) dev_warn(&serio->dev, "Spurious %s on %s. " "Some program might be trying to access hardware directly.\n", data == ATKBD_RET_ACK ? "ACK" : "NAK", serio->phys); goto out; case ATKBD_RET_ERR: atkbd->err_count++; dev_dbg(&serio->dev, "Keyboard on %s reports too many keys pressed.\n", serio->phys); goto out; } code = atkbd_compat_scancode(atkbd, code); if (atkbd->emul && --atkbd->emul) goto out; keycode = atkbd->keycode[code]; if (!(atkbd->release && test_bit(code, atkbd->force_release_mask))) if (keycode != ATKBD_KEY_NULL) input_event(dev, EV_MSC, MSC_SCAN, code); switch (keycode) { case ATKBD_KEY_NULL: break; case ATKBD_KEY_UNKNOWN: dev_warn(&serio->dev, "Unknown key %s (%s set %d, code %#x on %s).\n", atkbd->release ? "released" : "pressed", atkbd->translated ? "translated" : "raw", atkbd->set, code, serio->phys); dev_warn(&serio->dev, "Use 'setkeycodes %s%02x <keycode>' to make it known.\n", code & 0x80 ? "e0" : "", code & 0x7f); input_sync(dev); break; case ATKBD_SCR_1: scroll = 1; break; case ATKBD_SCR_2: scroll = 2; break; case ATKBD_SCR_4: scroll = 4; break; case ATKBD_SCR_8: scroll = 8; break; case ATKBD_SCR_CLICK: click = !atkbd->release; break; case ATKBD_SCR_LEFT: hscroll = -1; break; case ATKBD_SCR_RIGHT: hscroll = 1; break; default: if (atkbd->release) { value = 0; atkbd->last = 0; } else if (!atkbd->softrepeat && test_bit(keycode, dev->key)) { /* Workaround Toshiba laptop multiple keypress */ value = time_before(jiffies, atkbd->time) && atkbd->last == code ? 1 : 2; } else { value = 1; atkbd->last = code; atkbd->time = jiffies + msecs_to_jiffies(dev->rep[REP_DELAY]) / 2; } input_event(dev, EV_KEY, keycode, value); input_sync(dev); if (value && test_bit(code, atkbd->force_release_mask)) { input_event(dev, EV_MSC, MSC_SCAN, code); input_report_key(dev, keycode, 0); input_sync(dev); } } if (atkbd->scroll) { if (click != -1) input_report_key(dev, BTN_MIDDLE, click); input_report_rel(dev, REL_WHEEL, atkbd->release ? -scroll : scroll); input_report_rel(dev, REL_HWHEEL, hscroll); input_sync(dev); } atkbd->release = false; out: return IRQ_HANDLED; } static int atkbd_set_repeat_rate(struct atkbd *atkbd) { const short period[32] = { 33, 37, 42, 46, 50, 54, 58, 63, 67, 75, 83, 92, 100, 109, 116, 125, 133, 149, 167, 182, 200, 217, 232, 250, 270, 303, 333, 370, 400, 435, 470, 500 }; const short delay[4] = { 250, 500, 750, 1000 }; struct input_dev *dev = atkbd->dev; unsigned char param; int i = 0, j = 0; while (i < ARRAY_SIZE(period) - 1 && period[i] < dev->rep[REP_PERIOD]) i++; dev->rep[REP_PERIOD] = period[i]; while (j < ARRAY_SIZE(delay) - 1 && delay[j] < dev->rep[REP_DELAY]) j++; dev->rep[REP_DELAY] = delay[j]; param = i | (j << 5); return ps2_command(&atkbd->ps2dev, &param, ATKBD_CMD_SETREP); } static int atkbd_set_leds(struct atkbd *atkbd) { struct input_dev *dev = atkbd->dev; unsigned char param[2]; param[0] = (test_bit(LED_SCROLLL, dev->led) ? 1 : 0) | (test_bit(LED_NUML, dev->led) ? 2 : 0) | (test_bit(LED_CAPSL, dev->led) ? 4 : 0); if (ps2_command(&atkbd->ps2dev, param, ATKBD_CMD_SETLEDS)) return -1; if (atkbd->extra) { param[0] = 0; param[1] = (test_bit(LED_COMPOSE, dev->led) ? 0x01 : 0) | (test_bit(LED_SLEEP, dev->led) ? 0x02 : 0) | (test_bit(LED_SUSPEND, dev->led) ? 0x04 : 0) | (test_bit(LED_MISC, dev->led) ? 0x10 : 0) | (test_bit(LED_MUTE, dev->led) ? 0x20 : 0); if (ps2_command(&atkbd->ps2dev, param, ATKBD_CMD_EX_SETLEDS)) return -1; } return 0; } /* * atkbd_event_work() is used to complete processing of events that * can not be processed by input_event() which is often called from * interrupt context. */ static void atkbd_event_work(struct work_struct *work) { struct atkbd *atkbd = container_of(work, struct atkbd, event_work.work); mutex_lock(&atkbd->mutex); if (!atkbd->enabled) { /* * Serio ports are resumed asynchronously so while driver core * thinks that device is already fully operational in reality * it may not be ready yet. In this case we need to keep * rescheduling till reconnect completes. */ schedule_delayed_work(&atkbd->event_work, msecs_to_jiffies(100)); } else { if (test_and_clear_bit(ATKBD_LED_EVENT_BIT, &atkbd->event_mask)) atkbd_set_leds(atkbd); if (test_and_clear_bit(ATKBD_REP_EVENT_BIT, &atkbd->event_mask)) atkbd_set_repeat_rate(atkbd); } mutex_unlock(&atkbd->mutex); } /* * Schedule switch for execution. We need to throttle requests, * otherwise keyboard may become unresponsive. */ static void atkbd_schedule_event_work(struct atkbd *atkbd, int event_bit) { unsigned long delay = msecs_to_jiffies(50); if (time_after(jiffies, atkbd->event_jiffies + delay)) delay = 0; atkbd->event_jiffies = jiffies; set_bit(event_bit, &atkbd->event_mask); mb(); schedule_delayed_work(&atkbd->event_work, delay); } /* * Event callback from the input module. Events that change the state of * the hardware are processed here. If action can not be performed in * interrupt context it is offloaded to atkbd_event_work. */ static int atkbd_event(struct input_dev *dev, unsigned int type, unsigned int code, int value) { struct atkbd *atkbd = input_get_drvdata(dev); if (!atkbd->write) return -1; switch (type) { case EV_LED: atkbd_schedule_event_work(atkbd, ATKBD_LED_EVENT_BIT); return 0; case EV_REP: if (!atkbd->softrepeat) atkbd_schedule_event_work(atkbd, ATKBD_REP_EVENT_BIT); return 0; default: return -1; } } /* * atkbd_enable() signals that interrupt handler is allowed to * generate input events. */ static inline void atkbd_enable(struct atkbd *atkbd) { serio_pause_rx(atkbd->ps2dev.serio); atkbd->enabled = true; serio_continue_rx(atkbd->ps2dev.serio); } /* * atkbd_disable() tells input handler that all incoming data except * for ACKs and command response should be dropped. */ static inline void atkbd_disable(struct atkbd *atkbd) { serio_pause_rx(atkbd->ps2dev.serio); atkbd->enabled = false; serio_continue_rx(atkbd->ps2dev.serio); } static int atkbd_activate(struct atkbd *atkbd) { struct ps2dev *ps2dev = &atkbd->ps2dev; /* * Enable the keyboard to receive keystrokes. */ if (ps2_command(ps2dev, NULL, ATKBD_CMD_ENABLE)) { dev_err(&ps2dev->serio->dev, "Failed to enable keyboard on %s\n", ps2dev->serio->phys); return -1; } return 0; } /* * atkbd_deactivate() resets and disables the keyboard from sending * keystrokes. */ static void atkbd_deactivate(struct atkbd *atkbd) { struct ps2dev *ps2dev = &atkbd->ps2dev; if (ps2_command(ps2dev, NULL, ATKBD_CMD_RESET_DIS)) dev_err(&ps2dev->serio->dev, "Failed to deactivate keyboard on %s\n", ps2dev->serio->phys); } /* * atkbd_probe() probes for an AT keyboard on a serio port. */ static int atkbd_probe(struct atkbd *atkbd) { struct ps2dev *ps2dev = &atkbd->ps2dev; unsigned char param[2]; /* * Some systems, where the bit-twiddling when testing the io-lines of the * controller may confuse the keyboard need a full reset of the keyboard. On * these systems the BIOS also usually doesn't do it for us. */ if (atkbd_reset) if (ps2_command(ps2dev, NULL, ATKBD_CMD_RESET_BAT)) dev_warn(&ps2dev->serio->dev, "keyboard reset failed on %s\n", ps2dev->serio->phys); /* * Then we check the keyboard ID. We should get 0xab83 under normal conditions. * Some keyboards report different values, but the first byte is always 0xab or * 0xac. Some old AT keyboards don't report anything. If a mouse is connected, this * should make sure we don't try to set the LEDs on it. */ param[0] = param[1] = 0xa5; /* initialize with invalid values */ if (ps2_command(ps2dev, param, ATKBD_CMD_GETID)) { /* * If the get ID command failed, we check if we can at least set the LEDs on * the keyboard. This should work on every keyboard out there. It also turns * the LEDs off, which we want anyway. */ param[0] = 0; if (ps2_command(ps2dev, param, ATKBD_CMD_SETLEDS)) return -1; atkbd->id = 0xabba; return 0; } if (!ps2_is_keyboard_id(param[0])) return -1; atkbd->id = (param[0] << 8) | param[1]; if (atkbd->id == 0xaca1 && atkbd->translated) { dev_err(&ps2dev->serio->dev, "NCD terminal keyboards are only supported on non-translating controllers. " "Use i8042.direct=1 to disable translation.\n"); return -1; } /* * Make sure nothing is coming from the keyboard and disturbs our * internal state. */ if (!atkbd_skip_deactivate) atkbd_deactivate(atkbd); return 0; } /* * atkbd_select_set checks if a keyboard has a working Set 3 support, and * sets it into that. Unfortunately there are keyboards that can be switched * to Set 3, but don't work well in that (BTC Multimedia ...) */ static int atkbd_select_set(struct atkbd *atkbd, int target_set, int allow_extra) { struct ps2dev *ps2dev = &atkbd->ps2dev; unsigned char param[2]; atkbd->extra = false; /* * For known special keyboards we can go ahead and set the correct set. * We check for NCD PS/2 Sun, NorthGate OmniKey 101 and * IBM RapidAccess / IBM EzButton / Chicony KBP-8993 keyboards. */ if (atkbd->translated) return 2; if (atkbd->id == 0xaca1) { param[0] = 3; ps2_command(ps2dev, param, ATKBD_CMD_SSCANSET); return 3; } if (allow_extra) { param[0] = 0x71; if (!ps2_command(ps2dev, param, ATKBD_CMD_EX_ENABLE)) { atkbd->extra = true; return 2; } } if (atkbd_terminal) { ps2_command(ps2dev, param, ATKBD_CMD_SETALL_MB); return 3; } if (target_set != 3) return 2; if (!ps2_command(ps2dev, param, ATKBD_CMD_OK_GETID)) { atkbd->id = param[0] << 8 | param[1]; return 2; } param[0] = 3; if (ps2_command(ps2dev, param, ATKBD_CMD_SSCANSET)) return 2; param[0] = 0; if (ps2_command(ps2dev, param, ATKBD_CMD_GSCANSET)) return 2; if (param[0] != 3) { param[0] = 2; if (ps2_command(ps2dev, param, ATKBD_CMD_SSCANSET)) return 2; } ps2_command(ps2dev, param, ATKBD_CMD_SETALL_MBR); return 3; } static int atkbd_reset_state(struct atkbd *atkbd) { struct ps2dev *ps2dev = &atkbd->ps2dev; unsigned char param[1]; /* * Set the LEDs to a predefined state (all off). */ param[0] = 0; if (ps2_command(ps2dev, param, ATKBD_CMD_SETLEDS)) return -1; /* * Set autorepeat to fastest possible. */ param[0] = 0; if (ps2_command(ps2dev, param, ATKBD_CMD_SETREP)) return -1; return 0; } /* * atkbd_cleanup() restores the keyboard state so that BIOS is happy after a * reboot. */ static void atkbd_cleanup(struct serio *serio) { struct atkbd *atkbd = serio_get_drvdata(serio); atkbd_disable(atkbd); ps2_command(&atkbd->ps2dev, NULL, ATKBD_CMD_RESET_DEF); } /* * atkbd_disconnect() closes and frees. */ static void atkbd_disconnect(struct serio *serio) { struct atkbd *atkbd = serio_get_drvdata(serio); sysfs_remove_group(&serio->dev.kobj, &atkbd_attribute_group); atkbd_disable(atkbd); input_unregister_device(atkbd->dev); /* * Make sure we don't have a command in flight. * Note that since atkbd->enabled is false event work will keep * rescheduling itself until it gets canceled and will not try * accessing freed input device or serio port. */ cancel_delayed_work_sync(&atkbd->event_work); serio_close(serio); serio_set_drvdata(serio, NULL); kfree(atkbd); } /* * generate release events for the keycodes given in data */ static void atkbd_apply_forced_release_keylist(struct atkbd* atkbd, const void *data) { const unsigned int *keys = data; unsigned int i; if (atkbd->set == 2) for (i = 0; keys[i] != -1U; i++) __set_bit(keys[i], atkbd->force_release_mask); } /* * Most special keys (Fn+F?) on Dell laptops do not generate release * events so we have to do it ourselves. */ static unsigned int atkbd_dell_laptop_forced_release_keys[] = { 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8f, 0x93, -1U }; /* * Perform fixup for HP system that doesn't generate release * for its video switch */ static unsigned int atkbd_hp_forced_release_keys[] = { 0x94, -1U }; /* * Samsung NC10,NC20 with Fn+F? key release not working */ static unsigned int atkbd_samsung_forced_release_keys[] = { 0x82, 0x83, 0x84, 0x86, 0x88, 0x89, 0xb3, 0xf7, 0xf9, -1U }; /* * Amilo Pi 3525 key release for Fn+Volume keys not working */ static unsigned int atkbd_amilo_pi3525_forced_release_keys[] = { 0x20, 0xa0, 0x2e, 0xae, 0x30, 0xb0, -1U }; /* * Amilo Xi 3650 key release for light touch bar not working */ static unsigned int atkbd_amilo_xi3650_forced_release_keys[] = { 0x67, 0xed, 0x90, 0xa2, 0x99, 0xa4, 0xae, 0xb0, -1U }; /* * Soltech TA12 system with broken key release on volume keys and mute key */ static unsigned int atkdb_soltech_ta12_forced_release_keys[] = { 0xa0, 0xae, 0xb0, -1U }; /* * Many notebooks don't send key release event for volume up/down * keys, with key list below common among them */ static unsigned int atkbd_volume_forced_release_keys[] = { 0xae, 0xb0, -1U }; /* * OQO 01+ multimedia keys (64--66) generate e0 6x upon release whereas * they should be generating e4-e6 (0x80 | code). */ static unsigned int atkbd_oqo_01plus_scancode_fixup(struct atkbd *atkbd, unsigned int code) { if (atkbd->translated && atkbd->emul == 1 && (code == 0x64 || code == 0x65 || code == 0x66)) { atkbd->emul = 0; code |= 0x80; } return code; } /* * atkbd_set_keycode_table() initializes keyboard's keycode table * according to the selected scancode set */ static void atkbd_set_keycode_table(struct atkbd *atkbd) { unsigned int scancode; int i, j; memset(atkbd->keycode, 0, sizeof(atkbd->keycode)); bitmap_zero(atkbd->force_release_mask, ATKBD_KEYMAP_SIZE); if (atkbd->translated) { for (i = 0; i < 128; i++) { scancode = atkbd_unxlate_table[i]; atkbd->keycode[i] = atkbd_set2_keycode[scancode]; atkbd->keycode[i | 0x80] = atkbd_set2_keycode[scancode | 0x80]; if (atkbd->scroll) for (j = 0; j < ARRAY_SIZE(atkbd_scroll_keys); j++) if ((scancode | 0x80) == atkbd_scroll_keys[j].set2) atkbd->keycode[i | 0x80] = atkbd_scroll_keys[j].keycode; } } else if (atkbd->set == 3) { memcpy(atkbd->keycode, atkbd_set3_keycode, sizeof(atkbd->keycode)); } else { memcpy(atkbd->keycode, atkbd_set2_keycode, sizeof(atkbd->keycode)); if (atkbd->scroll) for (i = 0; i < ARRAY_SIZE(atkbd_scroll_keys); i++) { scancode = atkbd_scroll_keys[i].set2; atkbd->keycode[scancode] = atkbd_scroll_keys[i].keycode; } } /* * HANGEUL and HANJA keys do not send release events so we need to * generate such events ourselves */ scancode = atkbd_compat_scancode(atkbd, ATKBD_RET_HANGEUL); atkbd->keycode[scancode] = KEY_HANGEUL; __set_bit(scancode, atkbd->force_release_mask); scancode = atkbd_compat_scancode(atkbd, ATKBD_RET_HANJA); atkbd->keycode[scancode] = KEY_HANJA; __set_bit(scancode, atkbd->force_release_mask); /* * Perform additional fixups */ if (atkbd_platform_fixup) atkbd_platform_fixup(atkbd, atkbd_platform_fixup_data); } /* * atkbd_set_device_attrs() sets up keyboard's input device structure */ static void atkbd_set_device_attrs(struct atkbd *atkbd) { struct input_dev *input_dev = atkbd->dev; int i; if (atkbd->extra) snprintf(atkbd->name, sizeof(atkbd->name), "AT Set 2 Extra keyboard"); else snprintf(atkbd->name, sizeof(atkbd->name), "AT %s Set %d keyboard", atkbd->translated ? "Translated" : "Raw", atkbd->set); snprintf(atkbd->phys, sizeof(atkbd->phys), "%s/input0", atkbd->ps2dev.serio->phys); input_dev->name = atkbd->name; input_dev->phys = atkbd->phys; input_dev->id.bustype = BUS_I8042; input_dev->id.vendor = 0x0001; input_dev->id.product = atkbd->translated ? 1 : atkbd->set; input_dev->id.version = atkbd->id; input_dev->event = atkbd_event; input_dev->dev.parent = &atkbd->ps2dev.serio->dev; input_set_drvdata(input_dev, atkbd); input_dev->evbit[0] = BIT_MASK(EV_KEY) | BIT_MASK(EV_REP) | BIT_MASK(EV_MSC); if (atkbd->write) { input_dev->evbit[0] |= BIT_MASK(EV_LED); input_dev->ledbit[0] = BIT_MASK(LED_NUML) | BIT_MASK(LED_CAPSL) | BIT_MASK(LED_SCROLLL); } if (atkbd->extra) input_dev->ledbit[0] |= BIT_MASK(LED_COMPOSE) | BIT_MASK(LED_SUSPEND) | BIT_MASK(LED_SLEEP) | BIT_MASK(LED_MUTE) | BIT_MASK(LED_MISC); if (!atkbd->softrepeat) { input_dev->rep[REP_DELAY] = 250; input_dev->rep[REP_PERIOD] = 33; } input_dev->mscbit[0] = atkbd->softraw ? BIT_MASK(MSC_SCAN) : BIT_MASK(MSC_RAW) | BIT_MASK(MSC_SCAN); if (atkbd->scroll) { input_dev->evbit[0] |= BIT_MASK(EV_REL); input_dev->relbit[0] = BIT_MASK(REL_WHEEL) | BIT_MASK(REL_HWHEEL); __set_bit(BTN_MIDDLE, input_dev->keybit); } input_dev->keycode = atkbd->keycode; input_dev->keycodesize = sizeof(unsigned short); input_dev->keycodemax = ARRAY_SIZE(atkbd_set2_keycode); for (i = 0; i < ATKBD_KEYMAP_SIZE; i++) { if (atkbd->keycode[i] != KEY_RESERVED && atkbd->keycode[i] != ATKBD_KEY_NULL && atkbd->keycode[i] < ATKBD_SPECIAL) { __set_bit(atkbd->keycode[i], input_dev->keybit); } } } /* * atkbd_connect() is called when the serio module finds an interface * that isn't handled yet by an appropriate device driver. We check if * there is an AT keyboard out there and if yes, we register ourselves * to the input module. */ static int atkbd_connect(struct serio *serio, struct serio_driver *drv) { struct atkbd *atkbd; struct input_dev *dev; int err = -ENOMEM; atkbd = kzalloc(sizeof(struct atkbd), GFP_KERNEL); dev = input_allocate_device(); if (!atkbd || !dev) goto fail1; atkbd->dev = dev; ps2_init(&atkbd->ps2dev, serio); INIT_DELAYED_WORK(&atkbd->event_work, atkbd_event_work); mutex_init(&atkbd->mutex); switch (serio->id.type) { case SERIO_8042_XL: atkbd->translated = true; /* Fall through */ case SERIO_8042: if (serio->write) atkbd->write = true; break; } atkbd->softraw = atkbd_softraw; atkbd->softrepeat = atkbd_softrepeat; atkbd->scroll = atkbd_scroll; if (atkbd->softrepeat) atkbd->softraw = true; serio_set_drvdata(serio, atkbd); err = serio_open(serio, drv); if (err) goto fail2; if (atkbd->write) { if (atkbd_probe(atkbd)) { err = -ENODEV; goto fail3; } atkbd->set = atkbd_select_set(atkbd, atkbd_set, atkbd_extra); atkbd_reset_state(atkbd); } else { atkbd->set = 2; atkbd->id = 0xab00; } atkbd_set_keycode_table(atkbd); atkbd_set_device_attrs(atkbd); err = sysfs_create_group(&serio->dev.kobj, &atkbd_attribute_group); if (err) goto fail3; atkbd_enable(atkbd); if (serio->write) atkbd_activate(atkbd); err = input_register_device(atkbd->dev); if (err) goto fail4; return 0; fail4: sysfs_remove_group(&serio->dev.kobj, &atkbd_attribute_group); fail3: serio_close(serio); fail2: serio_set_drvdata(serio, NULL); fail1: input_free_device(dev); kfree(atkbd); return err; } /* * atkbd_reconnect() tries to restore keyboard into a sane state and is * most likely called on resume. */ static int atkbd_reconnect(struct serio *serio) { struct atkbd *atkbd = serio_get_drvdata(serio); struct serio_driver *drv = serio->drv; int retval = -1; if (!atkbd || !drv) { dev_dbg(&serio->dev, "reconnect request, but serio is disconnected, ignoring...\n"); return -1; } mutex_lock(&atkbd->mutex); atkbd_disable(atkbd); if (atkbd->write) { if (atkbd_probe(atkbd)) goto out; if (atkbd->set != atkbd_select_set(atkbd, atkbd->set, atkbd->extra)) goto out; /* * Restore LED state and repeat rate. While input core * will do this for us at resume time reconnect may happen * because user requested it via sysfs or simply because * keyboard was unplugged and plugged in again so we need * to do it ourselves here. */ atkbd_set_leds(atkbd); if (!atkbd->softrepeat) atkbd_set_repeat_rate(atkbd); } /* * Reset our state machine in case reconnect happened in the middle * of multi-byte scancode. */ atkbd->xl_bit = 0; atkbd->emul = 0; atkbd_enable(atkbd); if (atkbd->write) atkbd_activate(atkbd); retval = 0; out: mutex_unlock(&atkbd->mutex); return retval; } static const struct serio_device_id atkbd_serio_ids[] = { { .type = SERIO_8042, .proto = SERIO_ANY, .id = SERIO_ANY, .extra = SERIO_ANY, }, { .type = SERIO_8042_XL, .proto = SERIO_ANY, .id = SERIO_ANY, .extra = SERIO_ANY, }, { .type = SERIO_RS232, .proto = SERIO_PS2SER, .id = SERIO_ANY, .extra = SERIO_ANY, }, { 0 } }; MODULE_DEVICE_TABLE(serio, atkbd_serio_ids); static struct serio_driver atkbd_drv = { .driver = { .name = "atkbd", }, .description = DRIVER_DESC, .id_table = atkbd_serio_ids, .interrupt = atkbd_interrupt, .connect = atkbd_connect, .reconnect = atkbd_reconnect, .disconnect = atkbd_disconnect, .cleanup = atkbd_cleanup, }; static ssize_t atkbd_attr_show_helper(struct device *dev, char *buf, ssize_t (*handler)(struct atkbd *, char *)) { struct serio *serio = to_serio_port(dev); struct atkbd *atkbd = serio_get_drvdata(serio); return handler(atkbd, buf); } static ssize_t atkbd_attr_set_helper(struct device *dev, const char *buf, size_t count, ssize_t (*handler)(struct atkbd *, const char *, size_t)) { struct serio *serio = to_serio_port(dev); struct atkbd *atkbd = serio_get_drvdata(serio); int retval; retval = mutex_lock_interruptible(&atkbd->mutex); if (retval) return retval; atkbd_disable(atkbd); retval = handler(atkbd, buf, count); atkbd_enable(atkbd); mutex_unlock(&atkbd->mutex); return retval; } static ssize_t atkbd_show_extra(struct atkbd *atkbd, char *buf) { return sprintf(buf, "%d\n", atkbd->extra ? 1 : 0); } static ssize_t atkbd_set_extra(struct atkbd *atkbd, const char *buf, size_t count) { struct input_dev *old_dev, *new_dev; unsigned int value; int err; bool old_extra; unsigned char old_set; if (!atkbd->write) return -EIO; err = kstrtouint(buf, 10, &value); if (err) return err; if (value > 1) return -EINVAL; if (atkbd->extra != value) { /* * Since device's properties will change we need to * unregister old device. But allocate and register * new one first to make sure we have it. */ old_dev = atkbd->dev; old_extra = atkbd->extra; old_set = atkbd->set; new_dev = input_allocate_device(); if (!new_dev) return -ENOMEM; atkbd->dev = new_dev; atkbd->set = atkbd_select_set(atkbd, atkbd->set, value); atkbd_reset_state(atkbd); atkbd_activate(atkbd); atkbd_set_keycode_table(atkbd); atkbd_set_device_attrs(atkbd); err = input_register_device(atkbd->dev); if (err) { input_free_device(new_dev); atkbd->dev = old_dev; atkbd->set = atkbd_select_set(atkbd, old_set, old_extra); atkbd_set_keycode_table(atkbd); atkbd_set_device_attrs(atkbd); return err; } input_unregister_device(old_dev); } return count; } static ssize_t atkbd_show_force_release(struct atkbd *atkbd, char *buf) { size_t len = scnprintf(buf, PAGE_SIZE - 1, "%*pbl", ATKBD_KEYMAP_SIZE, atkbd->force_release_mask); buf[len++] = '\n'; buf[len] = '\0'; return len; } static ssize_t atkbd_set_force_release(struct atkbd *atkbd, const char *buf, size_t count) { /* 64 bytes on stack should be acceptable */ DECLARE_BITMAP(new_mask, ATKBD_KEYMAP_SIZE); int err; err = bitmap_parselist(buf, new_mask, ATKBD_KEYMAP_SIZE); if (err) return err; memcpy(atkbd->force_release_mask, new_mask, sizeof(atkbd->force_release_mask)); return count; } static ssize_t atkbd_show_scroll(struct atkbd *atkbd, char *buf) { return sprintf(buf, "%d\n", atkbd->scroll ? 1 : 0); } static ssize_t atkbd_set_scroll(struct atkbd *atkbd, const char *buf, size_t count) { struct input_dev *old_dev, *new_dev; unsigned int value; int err; bool old_scroll; err = kstrtouint(buf, 10, &value); if (err) return err; if (value > 1) return -EINVAL; if (atkbd->scroll != value) { old_dev = atkbd->dev; old_scroll = atkbd->scroll; new_dev = input_allocate_device(); if (!new_dev) return -ENOMEM; atkbd->dev = new_dev; atkbd->scroll = value; atkbd_set_keycode_table(atkbd); atkbd_set_device_attrs(atkbd); err = input_register_device(atkbd->dev); if (err) { input_free_device(new_dev); atkbd->scroll = old_scroll; atkbd->dev = old_dev; atkbd_set_keycode_table(atkbd); atkbd_set_device_attrs(atkbd); return err; } input_unregister_device(old_dev); } return count; } static ssize_t atkbd_show_set(struct atkbd *atkbd, char *buf) { return sprintf(buf, "%d\n", atkbd->set); } static ssize_t atkbd_set_set(struct atkbd *atkbd, const char *buf, size_t count) { struct input_dev *old_dev, *new_dev; unsigned int value; int err; unsigned char old_set; bool old_extra; if (!atkbd->write) return -EIO; err = kstrtouint(buf, 10, &value); if (err) return err; if (value != 2 && value != 3) return -EINVAL; if (atkbd->set != value) { old_dev = atkbd->dev; old_extra = atkbd->extra; old_set = atkbd->set; new_dev = input_allocate_device(); if (!new_dev) return -ENOMEM; atkbd->dev = new_dev; atkbd->set = atkbd_select_set(atkbd, value, atkbd->extra); atkbd_reset_state(atkbd); atkbd_activate(atkbd); atkbd_set_keycode_table(atkbd); atkbd_set_device_attrs(atkbd); err = input_register_device(atkbd->dev); if (err) { input_free_device(new_dev); atkbd->dev = old_dev; atkbd->set = atkbd_select_set(atkbd, old_set, old_extra); atkbd_set_keycode_table(atkbd); atkbd_set_device_attrs(atkbd); return err; } input_unregister_device(old_dev); } return count; } static ssize_t atkbd_show_softrepeat(struct atkbd *atkbd, char *buf) { return sprintf(buf, "%d\n", atkbd->softrepeat ? 1 : 0); } static ssize_t atkbd_set_softrepeat(struct atkbd *atkbd, const char *buf, size_t count) { struct input_dev *old_dev, *new_dev; unsigned int value; int err; bool old_softrepeat, old_softraw; if (!atkbd->write) return -EIO; err = kstrtouint(buf, 10, &value); if (err) return err; if (value > 1) return -EINVAL; if (atkbd->softrepeat != value) { old_dev = atkbd->dev; old_softrepeat = atkbd->softrepeat; old_softraw = atkbd->softraw; new_dev = input_allocate_device(); if (!new_dev) return -ENOMEM; atkbd->dev = new_dev; atkbd->softrepeat = value; if (atkbd->softrepeat) atkbd->softraw = true; atkbd_set_device_attrs(atkbd); err = input_register_device(atkbd->dev); if (err) { input_free_device(new_dev); atkbd->dev = old_dev; atkbd->softrepeat = old_softrepeat; atkbd->softraw = old_softraw; atkbd_set_device_attrs(atkbd); return err; } input_unregister_device(old_dev); } return count; } static ssize_t atkbd_show_softraw(struct atkbd *atkbd, char *buf) { return sprintf(buf, "%d\n", atkbd->softraw ? 1 : 0); } static ssize_t atkbd_set_softraw(struct atkbd *atkbd, const char *buf, size_t count) { struct input_dev *old_dev, *new_dev; unsigned int value; int err; bool old_softraw; err = kstrtouint(buf, 10, &value); if (err) return err; if (value > 1) return -EINVAL; if (atkbd->softraw != value) { old_dev = atkbd->dev; old_softraw = atkbd->softraw; new_dev = input_allocate_device(); if (!new_dev) return -ENOMEM; atkbd->dev = new_dev; atkbd->softraw = value; atkbd_set_device_attrs(atkbd); err = input_register_device(atkbd->dev); if (err) { input_free_device(new_dev); atkbd->dev = old_dev; atkbd->softraw = old_softraw; atkbd_set_device_attrs(atkbd); return err; } input_unregister_device(old_dev); } return count; } static ssize_t atkbd_show_err_count(struct atkbd *atkbd, char *buf) { return sprintf(buf, "%lu\n", atkbd->err_count); } static int __init atkbd_setup_forced_release(const struct dmi_system_id *id) { atkbd_platform_fixup = atkbd_apply_forced_release_keylist; atkbd_platform_fixup_data = id->driver_data; return 1; } static int __init atkbd_setup_scancode_fixup(const struct dmi_system_id *id) { atkbd_platform_scancode_fixup = id->driver_data; return 1; } static int __init atkbd_deactivate_fixup(const struct dmi_system_id *id) { atkbd_skip_deactivate = true; return 1; } /* * NOTE: do not add any more "force release" quirks to this table. The * task of adjusting list of keys that should be "released" automatically * by the driver is now delegated to userspace tools, such as udev, so * submit such quirks there. */ static const struct dmi_system_id atkbd_dmi_quirk_table[] __initconst = { { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), DMI_MATCH(DMI_CHASSIS_TYPE, "8"), /* Portable */ }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_dell_laptop_forced_release_keys, }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Dell Computer Corporation"), DMI_MATCH(DMI_CHASSIS_TYPE, "8"), /* Portable */ }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_dell_laptop_forced_release_keys, }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), DMI_MATCH(DMI_PRODUCT_NAME, "HP 2133"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_hp_forced_release_keys, }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), DMI_MATCH(DMI_PRODUCT_NAME, "Pavilion ZV6100"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_volume_forced_release_keys, }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), DMI_MATCH(DMI_PRODUCT_NAME, "Presario R4000"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_volume_forced_release_keys, }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), DMI_MATCH(DMI_PRODUCT_NAME, "Presario R4100"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_volume_forced_release_keys, }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), DMI_MATCH(DMI_PRODUCT_NAME, "Presario R4200"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_volume_forced_release_keys, }, { /* Inventec Symphony */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "INVENTEC"), DMI_MATCH(DMI_PRODUCT_NAME, "SYMPHONY 6.0/7.0"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_volume_forced_release_keys, }, { /* Samsung NC10 */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD."), DMI_MATCH(DMI_PRODUCT_NAME, "NC10"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_samsung_forced_release_keys, }, { /* Samsung NC20 */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD."), DMI_MATCH(DMI_PRODUCT_NAME, "NC20"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_samsung_forced_release_keys, }, { /* Samsung SQ45S70S */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD."), DMI_MATCH(DMI_PRODUCT_NAME, "SQ45S70S"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_samsung_forced_release_keys, }, { /* Fujitsu Amilo PA 1510 */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), DMI_MATCH(DMI_PRODUCT_NAME, "AMILO Pa 1510"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_volume_forced_release_keys, }, { /* Fujitsu Amilo Pi 3525 */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), DMI_MATCH(DMI_PRODUCT_NAME, "AMILO Pi 3525"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_amilo_pi3525_forced_release_keys, }, { /* Fujitsu Amilo Xi 3650 */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), DMI_MATCH(DMI_PRODUCT_NAME, "AMILO Xi 3650"), }, .callback = atkbd_setup_forced_release, .driver_data = atkbd_amilo_xi3650_forced_release_keys, }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Soltech Corporation"), DMI_MATCH(DMI_PRODUCT_NAME, "TA12"), }, .callback = atkbd_setup_forced_release, .driver_data = atkdb_soltech_ta12_forced_release_keys, }, { /* OQO Model 01+ */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "OQO"), DMI_MATCH(DMI_PRODUCT_NAME, "ZEPTO"), }, .callback = atkbd_setup_scancode_fixup, .driver_data = atkbd_oqo_01plus_scancode_fixup, }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "LG Electronics"), }, .callback = atkbd_deactivate_fixup, }, { } }; static int __init atkbd_init(void) { dmi_check_system(atkbd_dmi_quirk_table); return serio_register_driver(&atkbd_drv); } static void __exit atkbd_exit(void) { serio_unregister_driver(&atkbd_drv); } module_init(atkbd_init); module_exit(atkbd_exit);
147 107 134 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __CGROUP_INTERNAL_H #define __CGROUP_INTERNAL_H #include <linux/cgroup.h> #include <linux/kernfs.h> #include <linux/workqueue.h> #include <linux/list.h> #include <linux/refcount.h> #define TRACE_CGROUP_PATH_LEN 1024 extern spinlock_t trace_cgroup_path_lock; extern char trace_cgroup_path[TRACE_CGROUP_PATH_LEN]; /* * cgroup_path() takes a spin lock. It is good practice not to take * spin locks within trace point handlers, as they are mostly hidden * from normal view. As cgroup_path() can take the kernfs_rename_lock * spin lock, it is best to not call that function from the trace event * handler. * * Note: trace_cgroup_##type##_enabled() is a static branch that will only * be set when the trace event is enabled. */ #define TRACE_CGROUP_PATH(type, cgrp, ...) \ do { \ if (trace_cgroup_##type##_enabled()) { \ spin_lock(&trace_cgroup_path_lock); \ cgroup_path(cgrp, trace_cgroup_path, \ TRACE_CGROUP_PATH_LEN); \ trace_cgroup_##type(cgrp, trace_cgroup_path, \ ##__VA_ARGS__); \ spin_unlock(&trace_cgroup_path_lock); \ } \ } while (0) struct cgroup_pidlist; struct cgroup_file_ctx { struct cgroup_namespace *ns; struct { void *trigger; } psi; struct { bool started; struct css_task_iter iter; } procs; struct { struct cgroup_pidlist *pidlist; } procs1; }; /* * A cgroup can be associated with multiple css_sets as different tasks may * belong to different cgroups on different hierarchies. In the other * direction, a css_set is naturally associated with multiple cgroups. * This M:N relationship is represented by the following link structure * which exists for each association and allows traversing the associations * from both sides. */ struct cgrp_cset_link { /* the cgroup and css_set this link associates */ struct cgroup *cgrp; struct css_set *cset; /* list of cgrp_cset_links anchored at cgrp->cset_links */ struct list_head cset_link; /* list of cgrp_cset_links anchored at css_set->cgrp_links */ struct list_head cgrp_link; }; /* used to track tasks and csets during migration */ struct cgroup_taskset { /* the src and dst cset list running through cset->mg_node */ struct list_head src_csets; struct list_head dst_csets; /* the number of tasks in the set */ int nr_tasks; /* the subsys currently being processed */ int ssid; /* * Fields for cgroup_taskset_*() iteration. * * Before migration is committed, the target migration tasks are on * ->mg_tasks of the csets on ->src_csets. After, on ->mg_tasks of * the csets on ->dst_csets. ->csets point to either ->src_csets * or ->dst_csets depending on whether migration is committed. * * ->cur_csets and ->cur_task point to the current task position * during iteration. */ struct list_head *csets; struct css_set *cur_cset; struct task_struct *cur_task; }; /* migration context also tracks preloading */ struct cgroup_mgctx { /* * Preloaded source and destination csets. Used to guarantee * atomic success or failure on actual migration. */ struct list_head preloaded_src_csets; struct list_head preloaded_dst_csets; /* tasks and csets to migrate */ struct cgroup_taskset tset; /* subsystems affected by migration */ u16 ss_mask; }; #define CGROUP_TASKSET_INIT(tset) \ { \ .src_csets = LIST_HEAD_INIT(tset.src_csets), \ .dst_csets = LIST_HEAD_INIT(tset.dst_csets), \ .csets = &tset.src_csets, \ } #define CGROUP_MGCTX_INIT(name) \ { \ LIST_HEAD_INIT(name.preloaded_src_csets), \ LIST_HEAD_INIT(name.preloaded_dst_csets), \ CGROUP_TASKSET_INIT(name.tset), \ } #define DEFINE_CGROUP_MGCTX(name) \ struct cgroup_mgctx name = CGROUP_MGCTX_INIT(name) struct cgroup_sb_opts { u16 subsys_mask; unsigned int flags; char *release_agent; bool cpuset_clone_children; char *name; /* User explicitly requested empty subsystem */ bool none; }; extern struct mutex cgroup_mutex; extern spinlock_t css_set_lock; extern struct cgroup_subsys *cgroup_subsys[]; extern struct list_head cgroup_roots; extern struct file_system_type cgroup_fs_type; /* iterate across the hierarchies */ #define for_each_root(root) \ list_for_each_entry((root), &cgroup_roots, root_list) /** * for_each_subsys - iterate all enabled cgroup subsystems * @ss: the iteration cursor * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end */ #define for_each_subsys(ss, ssid) \ for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT && \ (((ss) = cgroup_subsys[ssid]) || true); (ssid)++) static inline bool cgroup_is_dead(const struct cgroup *cgrp) { return !(cgrp->self.flags & CSS_ONLINE); } static inline bool notify_on_release(const struct cgroup *cgrp) { return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags); } void put_css_set_locked(struct css_set *cset); static inline void put_css_set(struct css_set *cset) { unsigned long flags; /* * Ensure that the refcount doesn't hit zero while any readers * can see it. Similar to atomic_dec_and_lock(), but for an * rwlock */ if (refcount_dec_not_one(&cset->refcount)) return; spin_lock_irqsave(&css_set_lock, flags); put_css_set_locked(cset); spin_unlock_irqrestore(&css_set_lock, flags); } /* * refcounted get/put for css_set objects */ static inline void get_css_set(struct css_set *cset) { refcount_inc(&cset->refcount); } bool cgroup_ssid_enabled(int ssid); bool cgroup_on_dfl(const struct cgroup *cgrp); bool cgroup_is_thread_root(struct cgroup *cgrp); bool cgroup_is_threaded(struct cgroup *cgrp); struct cgroup_root *cgroup_root_from_kf(struct kernfs_root *kf_root); struct cgroup *task_cgroup_from_root(struct task_struct *task, struct cgroup_root *root); struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn, bool drain_offline); void cgroup_kn_unlock(struct kernfs_node *kn); int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen, struct cgroup_namespace *ns); void cgroup_free_root(struct cgroup_root *root); void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts); int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags); int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask); struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags, struct cgroup_root *root, unsigned long magic, struct cgroup_namespace *ns); int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp); void cgroup_migrate_finish(struct cgroup_mgctx *mgctx); void cgroup_migrate_add_src(struct css_set *src_cset, struct cgroup *dst_cgrp, struct cgroup_mgctx *mgctx); int cgroup_migrate_prepare_dst(struct cgroup_mgctx *mgctx); int cgroup_migrate(struct task_struct *leader, bool threadgroup, struct cgroup_mgctx *mgctx); int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader, bool threadgroup); struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup) __acquires(&cgroup_threadgroup_rwsem); void cgroup_procs_write_finish(struct task_struct *task) __releases(&cgroup_threadgroup_rwsem); void cgroup_lock_and_drain_offline(struct cgroup *cgrp); int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode); int cgroup_rmdir(struct kernfs_node *kn); int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node, struct kernfs_root *kf_root); int cgroup_task_count(const struct cgroup *cgrp); /* * rstat.c */ int cgroup_rstat_init(struct cgroup *cgrp); void cgroup_rstat_exit(struct cgroup *cgrp); void cgroup_rstat_boot(void); void cgroup_base_stat_cputime_show(struct seq_file *seq); /* * namespace.c */ extern const struct proc_ns_operations cgroupns_operations; /* * cgroup-v1.c */ extern struct cftype cgroup1_base_files[]; extern struct kernfs_syscall_ops cgroup1_kf_syscall_ops; int proc_cgroupstats_show(struct seq_file *m, void *v); bool cgroup1_ssid_disabled(int ssid); void cgroup1_pidlist_destroy_all(struct cgroup *cgrp); void cgroup1_release_agent(struct work_struct *work); void cgroup1_check_for_release(struct cgroup *cgrp); struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags, void *data, unsigned long magic, struct cgroup_namespace *ns); #endif /* __CGROUP_INTERNAL_H */
41 34 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 /* * Media device node * * Copyright (C) 2010 Nokia Corporation * * Contacts: Laurent Pinchart <laurent.pinchart@ideasonboard.com> * Sakari Ailus <sakari.ailus@iki.fi> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * -- * * Common functions for media-related drivers to register and unregister media * device nodes. */ #ifndef _MEDIA_DEVNODE_H #define _MEDIA_DEVNODE_H #include <linux/poll.h> #include <linux/fs.h> #include <linux/device.h> #include <linux/cdev.h> struct media_device; /* * Flag to mark the media_devnode struct as registered. Drivers must not touch * this flag directly, it will be set and cleared by media_devnode_register and * media_devnode_unregister. */ #define MEDIA_FLAG_REGISTERED 0 /** * struct media_file_operations - Media device file operations * * @owner: should be filled with %THIS_MODULE * @read: pointer to the function that implements read() syscall * @write: pointer to the function that implements write() syscall * @poll: pointer to the function that implements poll() syscall * @ioctl: pointer to the function that implements ioctl() syscall * @compat_ioctl: pointer to the function that will handle 32 bits userspace * calls to the the ioctl() syscall on a Kernel compiled with 64 bits. * @open: pointer to the function that implements open() syscall * @release: pointer to the function that will release the resources allocated * by the @open function. */ struct media_file_operations { struct module *owner; ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); __poll_t (*poll) (struct file *, struct poll_table_struct *); long (*ioctl) (struct file *, unsigned int, unsigned long); long (*compat_ioctl) (struct file *, unsigned int, unsigned long); int (*open) (struct file *); int (*release) (struct file *); }; /** * struct media_devnode - Media device node * @media_dev: pointer to struct &media_device * @fops: pointer to struct &media_file_operations with media device ops * @dev: pointer to struct &device containing the media controller device * @cdev: struct cdev pointer character device * @parent: parent device * @minor: device node minor number * @flags: flags, combination of the ``MEDIA_FLAG_*`` constants * @release: release callback called at the end of ``media_devnode_release()`` * routine at media-device.c. * * This structure represents a media-related device node. * * The @parent is a physical device. It must be set by core or device drivers * before registering the node. */ struct media_devnode { struct media_device *media_dev; /* device ops */ const struct media_file_operations *fops; /* sysfs */ struct device dev; /* media device */ struct cdev cdev; /* character device */ struct device *parent; /* device parent */ /* device info */ int minor; unsigned long flags; /* Use bitops to access flags */ /* callbacks */ void (*release)(struct media_devnode *devnode); }; /* dev to media_devnode */ #define to_media_devnode(cd) container_of(cd, struct media_devnode, dev) /** * media_devnode_register - register a media device node * * @mdev: struct media_device we want to register a device node * @devnode: media device node structure we want to register * @owner: should be filled with %THIS_MODULE * * The registration code assigns minor numbers and registers the new device node * with the kernel. An error is returned if no free minor number can be found, * or if the registration of the device node fails. * * Zero is returned on success. * * Note that if the media_devnode_register call fails, the release() callback of * the media_devnode structure is *not* called, so the caller is responsible for * freeing any data. */ int __must_check media_devnode_register(struct media_device *mdev, struct media_devnode *devnode, struct module *owner); /** * media_devnode_unregister_prepare - clear the media device node register bit * @devnode: the device node to prepare for unregister * * This clears the passed device register bit. Future open calls will be met * with errors. Should be called before media_devnode_unregister() to avoid * races with unregister and device file open calls. * * This function can safely be called if the device node has never been * registered or has already been unregistered. */ void media_devnode_unregister_prepare(struct media_devnode *devnode); /** * media_devnode_unregister - unregister a media device node * @devnode: the device node to unregister * * This unregisters the passed device. Future open calls will be met with * errors. * * Should be called after media_devnode_unregister_prepare() */ void media_devnode_unregister(struct media_devnode *devnode); /** * media_devnode_data - returns a pointer to the &media_devnode * * @filp: pointer to struct &file */ static inline struct media_devnode *media_devnode_data(struct file *filp) { return filp->private_data; } /** * media_devnode_is_registered - returns true if &media_devnode is registered; * false otherwise. * * @devnode: pointer to struct &media_devnode. * * Note: If mdev is NULL, it also returns false. */ static inline int media_devnode_is_registered(struct media_devnode *devnode) { if (!devnode) return false; return test_bit(MEDIA_FLAG_REGISTERED, &devnode->flags); } #endif /* _MEDIA_DEVNODE_H */
153 114 114 114 114 114 114 72 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 /* SPDX-License-Identifier: GPL-2.0 */ /* * Internal header to deal with irq_desc->status which will be renamed * to irq_desc->settings. */ enum { _IRQ_DEFAULT_INIT_FLAGS = IRQ_DEFAULT_INIT_FLAGS, _IRQ_PER_CPU = IRQ_PER_CPU, _IRQ_LEVEL = IRQ_LEVEL, _IRQ_NOPROBE = IRQ_NOPROBE, _IRQ_NOREQUEST = IRQ_NOREQUEST, _IRQ_NOTHREAD = IRQ_NOTHREAD, _IRQ_NOAUTOEN = IRQ_NOAUTOEN, _IRQ_MOVE_PCNTXT = IRQ_MOVE_PCNTXT, _IRQ_NO_BALANCING = IRQ_NO_BALANCING, _IRQ_NESTED_THREAD = IRQ_NESTED_THREAD, _IRQ_PER_CPU_DEVID = IRQ_PER_CPU_DEVID, _IRQ_IS_POLLED = IRQ_IS_POLLED, _IRQ_DISABLE_UNLAZY = IRQ_DISABLE_UNLAZY, _IRQF_MODIFY_MASK = IRQF_MODIFY_MASK, }; #define IRQ_PER_CPU GOT_YOU_MORON #define IRQ_NO_BALANCING GOT_YOU_MORON #define IRQ_LEVEL GOT_YOU_MORON #define IRQ_NOPROBE GOT_YOU_MORON #define IRQ_NOREQUEST GOT_YOU_MORON #define IRQ_NOTHREAD GOT_YOU_MORON #define IRQ_NOAUTOEN GOT_YOU_MORON #define IRQ_NESTED_THREAD GOT_YOU_MORON #define IRQ_PER_CPU_DEVID GOT_YOU_MORON #define IRQ_IS_POLLED GOT_YOU_MORON #define IRQ_DISABLE_UNLAZY GOT_YOU_MORON #undef IRQF_MODIFY_MASK #define IRQF_MODIFY_MASK GOT_YOU_MORON static inline void irq_settings_clr_and_set(struct irq_desc *desc, u32 clr, u32 set) { desc->status_use_accessors &= ~(clr & _IRQF_MODIFY_MASK); desc->status_use_accessors |= (set & _IRQF_MODIFY_MASK); } static inline bool irq_settings_is_per_cpu(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_PER_CPU; } static inline bool irq_settings_is_per_cpu_devid(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_PER_CPU_DEVID; } static inline void irq_settings_set_per_cpu(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_PER_CPU; } static inline void irq_settings_set_no_balancing(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_NO_BALANCING; } static inline bool irq_settings_has_no_balance_set(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_NO_BALANCING; } static inline u32 irq_settings_get_trigger_mask(struct irq_desc *desc) { return desc->status_use_accessors & IRQ_TYPE_SENSE_MASK; } static inline void irq_settings_set_trigger_mask(struct irq_desc *desc, u32 mask) { desc->status_use_accessors &= ~IRQ_TYPE_SENSE_MASK; desc->status_use_accessors |= mask & IRQ_TYPE_SENSE_MASK; } static inline bool irq_settings_is_level(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_LEVEL; } static inline void irq_settings_clr_level(struct irq_desc *desc) { desc->status_use_accessors &= ~_IRQ_LEVEL; } static inline void irq_settings_set_level(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_LEVEL; } static inline bool irq_settings_can_request(struct irq_desc *desc) { return !(desc->status_use_accessors & _IRQ_NOREQUEST); } static inline void irq_settings_clr_norequest(struct irq_desc *desc) { desc->status_use_accessors &= ~_IRQ_NOREQUEST; } static inline void irq_settings_set_norequest(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_NOREQUEST; } static inline bool irq_settings_can_thread(struct irq_desc *desc) { return !(desc->status_use_accessors & _IRQ_NOTHREAD); } static inline void irq_settings_clr_nothread(struct irq_desc *desc) { desc->status_use_accessors &= ~_IRQ_NOTHREAD; } static inline void irq_settings_set_nothread(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_NOTHREAD; } static inline bool irq_settings_can_probe(struct irq_desc *desc) { return !(desc->status_use_accessors & _IRQ_NOPROBE); } static inline void irq_settings_clr_noprobe(struct irq_desc *desc) { desc->status_use_accessors &= ~_IRQ_NOPROBE; } static inline void irq_settings_set_noprobe(struct irq_desc *desc) { desc->status_use_accessors |= _IRQ_NOPROBE; } static inline bool irq_settings_can_move_pcntxt(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_MOVE_PCNTXT; } static inline bool irq_settings_can_autoenable(struct irq_desc *desc) { return !(desc->status_use_accessors & _IRQ_NOAUTOEN); } static inline bool irq_settings_is_nested_thread(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_NESTED_THREAD; } static inline bool irq_settings_is_polled(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_IS_POLLED; } static inline bool irq_settings_disable_unlazy(struct irq_desc *desc) { return desc->status_use_accessors & _IRQ_DISABLE_UNLAZY; } static inline void irq_settings_clr_disable_unlazy(struct irq_desc *desc) { desc->status_use_accessors &= ~_IRQ_DISABLE_UNLAZY; }
2 2 23 18 23 13 13 13 13 7 10 4 4 4 11 1 2 10 10 9 4 1 5 10 4 4 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 /* * linux/fs/9p/vfs_dir.c * * This file contains vfs directory ops for the 9P2000 protocol. * * Copyright (C) 2004 by Eric Van Hensbergen <ericvh@gmail.com> * Copyright (C) 2002 by Ron Minnich <rminnich@lanl.gov> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 * as published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to: * Free Software Foundation * 51 Franklin Street, Fifth Floor * Boston, MA 02111-1301 USA * */ #include <linux/module.h> #include <linux/errno.h> #include <linux/fs.h> #include <linux/file.h> #include <linux/stat.h> #include <linux/string.h> #include <linux/sched.h> #include <linux/inet.h> #include <linux/idr.h> #include <linux/slab.h> #include <linux/uio.h> #include <net/9p/9p.h> #include <net/9p/client.h> #include "v9fs.h" #include "v9fs_vfs.h" #include "fid.h" /** * struct p9_rdir - readdir accounting * @head: start offset of current dirread buffer * @tail: end offset of current dirread buffer * @buf: dirread buffer * * private structure for keeping track of readdir * allocated on demand */ struct p9_rdir { int head; int tail; uint8_t buf[]; }; /** * dt_type - return file type * @mistat: mistat structure * */ static inline int dt_type(struct p9_wstat *mistat) { unsigned long perm = mistat->mode; int rettype = DT_REG; if (perm & P9_DMDIR) rettype = DT_DIR; if (perm & P9_DMSYMLINK) rettype = DT_LNK; return rettype; } /** * v9fs_alloc_rdir_buf - Allocate buffer used for read and readdir * @filp: opened file structure * @buflen: Length in bytes of buffer to allocate * */ static struct p9_rdir *v9fs_alloc_rdir_buf(struct file *filp, int buflen) { struct p9_fid *fid = filp->private_data; if (!fid->rdir) fid->rdir = kzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL); return fid->rdir; } /** * v9fs_dir_readdir - iterate through a directory * @file: opened file structure * @ctx: actor we feed the entries to * */ static int v9fs_dir_readdir(struct file *file, struct dir_context *ctx) { bool over; struct p9_wstat st; int err = 0; struct p9_fid *fid; int buflen; struct p9_rdir *rdir; struct kvec kvec; p9_debug(P9_DEBUG_VFS, "name %pD\n", file); fid = file->private_data; buflen = fid->clnt->msize - P9_IOHDRSZ; rdir = v9fs_alloc_rdir_buf(file, buflen); if (!rdir) return -ENOMEM; kvec.iov_base = rdir->buf; kvec.iov_len = buflen; while (1) { if (rdir->tail == rdir->head) { struct iov_iter to; int n; iov_iter_kvec(&to, READ | ITER_KVEC, &kvec, 1, buflen); n = p9_client_read(file->private_data, ctx->pos, &to, &err); if (err) return err; if (n == 0) return 0; rdir->head = 0; rdir->tail = n; } while (rdir->head < rdir->tail) { err = p9stat_read(fid->clnt, rdir->buf + rdir->head, rdir->tail - rdir->head, &st); if (err <= 0) { p9_debug(P9_DEBUG_VFS, "returned %d\n", err); return -EIO; } over = !dir_emit(ctx, st.name, strlen(st.name), v9fs_qid2ino(&st.qid), dt_type(&st)); p9stat_free(&st); if (over) return 0; rdir->head += err; ctx->pos += err; } } } /** * v9fs_dir_readdir_dotl - iterate through a directory * @file: opened file structure * @ctx: actor we feed the entries to * */ static int v9fs_dir_readdir_dotl(struct file *file, struct dir_context *ctx) { int err = 0; struct p9_fid *fid; int buflen; struct p9_rdir *rdir; struct p9_dirent curdirent; p9_debug(P9_DEBUG_VFS, "name %pD\n", file); fid = file->private_data; buflen = fid->clnt->msize - P9_READDIRHDRSZ; rdir = v9fs_alloc_rdir_buf(file, buflen); if (!rdir) return -ENOMEM; while (1) { if (rdir->tail == rdir->head) { err = p9_client_readdir(fid, rdir->buf, buflen, ctx->pos); if (err <= 0) return err; rdir->head = 0; rdir->tail = err; } while (rdir->head < rdir->tail) { err = p9dirent_read(fid->clnt, rdir->buf + rdir->head, rdir->tail - rdir->head, &curdirent); if (err < 0) { p9_debug(P9_DEBUG_VFS, "returned %d\n", err); return -EIO; } if (!dir_emit(ctx, curdirent.d_name, strlen(curdirent.d_name), v9fs_qid2ino(&curdirent.qid), curdirent.d_type)) return 0; ctx->pos = curdirent.d_off; rdir->head += err; } } } /** * v9fs_dir_release - close a directory * @inode: inode of the directory * @filp: file pointer to a directory * */ int v9fs_dir_release(struct inode *inode, struct file *filp) { struct p9_fid *fid; fid = filp->private_data; p9_debug(P9_DEBUG_VFS, "inode: %p filp: %p fid: %d\n", inode, filp, fid ? fid->fid : -1); if (fid) p9_client_clunk(fid); return 0; } const struct file_operations v9fs_dir_operations = { .read = generic_read_dir, .llseek = generic_file_llseek, .iterate_shared = v9fs_dir_readdir, .open = v9fs_file_open, .release = v9fs_dir_release, }; const struct file_operations v9fs_dir_operations_dotl = { .read = generic_read_dir, .llseek = generic_file_llseek, .iterate_shared = v9fs_dir_readdir_dotl, .open = v9fs_file_open, .release = v9fs_dir_release, .fsync = v9fs_file_fsync_dotl, };
1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 /* * Glue Code for x86_64/AVX2/AES-NI assembler optimized version of Camellia * * Copyright © 2013 Jussi Kivilinna <jussi.kivilinna@mbnet.fi> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * */ #include <asm/crypto/camellia.h> #include <asm/crypto/glue_helper.h> #include <crypto/algapi.h> #include <crypto/internal/simd.h> #include <crypto/xts.h> #include <linux/crypto.h> #include <linux/err.h> #include <linux/module.h> #include <linux/types.h> #define CAMELLIA_AESNI_PARALLEL_BLOCKS 16 #define CAMELLIA_AESNI_AVX2_PARALLEL_BLOCKS 32 /* 32-way AVX2/AES-NI parallel cipher functions */ asmlinkage void camellia_ecb_enc_32way(struct camellia_ctx *ctx, u8 *dst, const u8 *src); asmlinkage void camellia_ecb_dec_32way(struct camellia_ctx *ctx, u8 *dst, const u8 *src); asmlinkage void camellia_cbc_dec_32way(struct camellia_ctx *ctx, u8 *dst, const u8 *src); asmlinkage void camellia_ctr_32way(struct camellia_ctx *ctx, u8 *dst, const u8 *src, le128 *iv); asmlinkage void camellia_xts_enc_32way(struct camellia_ctx *ctx, u8 *dst, const u8 *src, le128 *iv); asmlinkage void camellia_xts_dec_32way(struct camellia_ctx *ctx, u8 *dst, const u8 *src, le128 *iv); static const struct common_glue_ctx camellia_enc = { .num_funcs = 4, .fpu_blocks_limit = CAMELLIA_AESNI_PARALLEL_BLOCKS, .funcs = { { .num_blocks = CAMELLIA_AESNI_AVX2_PARALLEL_BLOCKS, .fn_u = { .ecb = GLUE_FUNC_CAST(camellia_ecb_enc_32way) } }, { .num_blocks = CAMELLIA_AESNI_PARALLEL_BLOCKS, .fn_u = { .ecb = GLUE_FUNC_CAST(camellia_ecb_enc_16way) } }, { .num_blocks = 2, .fn_u = { .ecb = GLUE_FUNC_CAST(camellia_enc_blk_2way) } }, { .num_blocks = 1, .fn_u = { .ecb = GLUE_FUNC_CAST(camellia_enc_blk) } } } }; static const struct common_glue_ctx camellia_ctr = { .num_funcs = 4, .fpu_blocks_limit = CAMELLIA_AESNI_PARALLEL_BLOCKS, .funcs = { { .num_blocks = CAMELLIA_AESNI_AVX2_PARALLEL_BLOCKS, .fn_u = { .ctr = GLUE_CTR_FUNC_CAST(camellia_ctr_32way) } }, { .num_blocks = CAMELLIA_AESNI_PARALLEL_BLOCKS, .fn_u = { .ctr = GLUE_CTR_FUNC_CAST(camellia_ctr_16way) } }, { .num_blocks = 2, .fn_u = { .ctr = GLUE_CTR_FUNC_CAST(camellia_crypt_ctr_2way) } }, { .num_blocks = 1, .fn_u = { .ctr = GLUE_CTR_FUNC_CAST(camellia_crypt_ctr) } } } }; static const struct common_glue_ctx camellia_enc_xts = { .num_funcs = 3, .fpu_blocks_limit = CAMELLIA_AESNI_PARALLEL_BLOCKS, .funcs = { { .num_blocks = CAMELLIA_AESNI_AVX2_PARALLEL_BLOCKS, .fn_u = { .xts = GLUE_XTS_FUNC_CAST(camellia_xts_enc_32way) } }, { .num_blocks = CAMELLIA_AESNI_PARALLEL_BLOCKS, .fn_u = { .xts = GLUE_XTS_FUNC_CAST(camellia_xts_enc_16way) } }, { .num_blocks = 1, .fn_u = { .xts = GLUE_XTS_FUNC_CAST(camellia_xts_enc) } } } }; static const struct common_glue_ctx camellia_dec = { .num_funcs = 4, .fpu_blocks_limit = CAMELLIA_AESNI_PARALLEL_BLOCKS, .funcs = { { .num_blocks = CAMELLIA_AESNI_AVX2_PARALLEL_BLOCKS, .fn_u = { .ecb = GLUE_FUNC_CAST(camellia_ecb_dec_32way) } }, { .num_blocks = CAMELLIA_AESNI_PARALLEL_BLOCKS, .fn_u = { .ecb = GLUE_FUNC_CAST(camellia_ecb_dec_16way) } }, { .num_blocks = 2, .fn_u = { .ecb = GLUE_FUNC_CAST(camellia_dec_blk_2way) } }, { .num_blocks = 1, .fn_u = { .ecb = GLUE_FUNC_CAST(camellia_dec_blk) } } } }; static const struct common_glue_ctx camellia_dec_cbc = { .num_funcs = 4, .fpu_blocks_limit = CAMELLIA_AESNI_PARALLEL_BLOCKS, .funcs = { { .num_blocks = CAMELLIA_AESNI_AVX2_PARALLEL_BLOCKS, .fn_u = { .cbc = GLUE_CBC_FUNC_CAST(camellia_cbc_dec_32way) } }, { .num_blocks = CAMELLIA_AESNI_PARALLEL_BLOCKS, .fn_u = { .cbc = GLUE_CBC_FUNC_CAST(camellia_cbc_dec_16way) } }, { .num_blocks = 2, .fn_u = { .cbc = GLUE_CBC_FUNC_CAST(camellia_decrypt_cbc_2way) } }, { .num_blocks = 1, .fn_u = { .cbc = GLUE_CBC_FUNC_CAST(camellia_dec_blk) } } } }; static const struct common_glue_ctx camellia_dec_xts = { .num_funcs = 3, .fpu_blocks_limit = CAMELLIA_AESNI_PARALLEL_BLOCKS, .funcs = { { .num_blocks = CAMELLIA_AESNI_AVX2_PARALLEL_BLOCKS, .fn_u = { .xts = GLUE_XTS_FUNC_CAST(camellia_xts_dec_32way) } }, { .num_blocks = CAMELLIA_AESNI_PARALLEL_BLOCKS, .fn_u = { .xts = GLUE_XTS_FUNC_CAST(camellia_xts_dec_16way) } }, { .num_blocks = 1, .fn_u = { .xts = GLUE_XTS_FUNC_CAST(camellia_xts_dec) } } } }; static int camellia_setkey(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen) { return __camellia_setkey(crypto_skcipher_ctx(tfm), key, keylen, &tfm->base.crt_flags); } static int ecb_encrypt(struct skcipher_request *req) { return glue_ecb_req_128bit(&camellia_enc, req); } static int ecb_decrypt(struct skcipher_request *req) { return glue_ecb_req_128bit(&camellia_dec, req); } static int cbc_encrypt(struct skcipher_request *req) { return glue_cbc_encrypt_req_128bit(GLUE_FUNC_CAST(camellia_enc_blk), req); } static int cbc_decrypt(struct skcipher_request *req) { return glue_cbc_decrypt_req_128bit(&camellia_dec_cbc, req); } static int ctr_crypt(struct skcipher_request *req) { return glue_ctr_req_128bit(&camellia_ctr, req); } static int xts_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct camellia_xts_ctx *ctx = crypto_skcipher_ctx(tfm); return glue_xts_req_128bit(&camellia_enc_xts, req, XTS_TWEAK_CAST(camellia_enc_blk), &ctx->tweak_ctx, &ctx->crypt_ctx); } static int xts_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct camellia_xts_ctx *ctx = crypto_skcipher_ctx(tfm); return glue_xts_req_128bit(&camellia_dec_xts, req, XTS_TWEAK_CAST(camellia_enc_blk), &ctx->tweak_ctx, &ctx->crypt_ctx); } static struct skcipher_alg camellia_algs[] = { { .base.cra_name = "__ecb(camellia)", .base.cra_driver_name = "__ecb-camellia-aesni-avx2", .base.cra_priority = 500, .base.cra_flags = CRYPTO_ALG_INTERNAL, .base.cra_blocksize = CAMELLIA_BLOCK_SIZE, .base.cra_ctxsize = sizeof(struct camellia_ctx), .base.cra_module = THIS_MODULE, .min_keysize = CAMELLIA_MIN_KEY_SIZE, .max_keysize = CAMELLIA_MAX_KEY_SIZE, .setkey = camellia_setkey, .encrypt = ecb_encrypt, .decrypt = ecb_decrypt, }, { .base.cra_name = "__cbc(camellia)", .base.cra_driver_name = "__cbc-camellia-aesni-avx2", .base.cra_priority = 500, .base.cra_flags = CRYPTO_ALG_INTERNAL, .base.cra_blocksize = CAMELLIA_BLOCK_SIZE, .base.cra_ctxsize = sizeof(struct camellia_ctx), .base.cra_module = THIS_MODULE, .min_keysize = CAMELLIA_MIN_KEY_SIZE, .max_keysize = CAMELLIA_MAX_KEY_SIZE, .ivsize = CAMELLIA_BLOCK_SIZE, .setkey = camellia_setkey, .encrypt = cbc_encrypt, .decrypt = cbc_decrypt, }, { .base.cra_name = "__ctr(camellia)", .base.cra_driver_name = "__ctr-camellia-aesni-avx2", .base.cra_priority = 500, .base.cra_flags = CRYPTO_ALG_INTERNAL, .base.cra_blocksize = 1, .base.cra_ctxsize = sizeof(struct camellia_ctx), .base.cra_module = THIS_MODULE, .min_keysize = CAMELLIA_MIN_KEY_SIZE, .max_keysize = CAMELLIA_MAX_KEY_SIZE, .ivsize = CAMELLIA_BLOCK_SIZE, .chunksize = CAMELLIA_BLOCK_SIZE, .setkey = camellia_setkey, .encrypt = ctr_crypt, .decrypt = ctr_crypt, }, { .base.cra_name = "__xts(camellia)", .base.cra_driver_name = "__xts-camellia-aesni-avx2", .base.cra_priority = 500, .base.cra_flags = CRYPTO_ALG_INTERNAL, .base.cra_blocksize = CAMELLIA_BLOCK_SIZE, .base.cra_ctxsize = sizeof(struct camellia_xts_ctx), .base.cra_module = THIS_MODULE, .min_keysize = 2 * CAMELLIA_MIN_KEY_SIZE, .max_keysize = 2 * CAMELLIA_MAX_KEY_SIZE, .ivsize = CAMELLIA_BLOCK_SIZE, .setkey = xts_camellia_setkey, .encrypt = xts_encrypt, .decrypt = xts_decrypt, }, }; static struct simd_skcipher_alg *camellia_simd_algs[ARRAY_SIZE(camellia_algs)]; static int __init camellia_aesni_init(void) { const char *feature_name; if (!boot_cpu_has(X86_FEATURE_AVX) || !boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_AES) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) { pr_info("AVX2 or AES-NI instructions are not detected.\n"); return -ENODEV; } if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) { pr_info("CPU feature '%s' is not supported.\n", feature_name); return -ENODEV; } return simd_register_skciphers_compat(camellia_algs, ARRAY_SIZE(camellia_algs), camellia_simd_algs); } static void __exit camellia_aesni_fini(void) { simd_unregister_skciphers(camellia_algs, ARRAY_SIZE(camellia_algs), camellia_simd_algs); } module_init(camellia_aesni_init); module_exit(camellia_aesni_fini); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Camellia Cipher Algorithm, AES-NI/AVX2 optimized"); MODULE_ALIAS_CRYPTO("camellia"); MODULE_ALIAS_CRYPTO("camellia-asm");
5 1 6 6 4 5 5 5 4 1 1 7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 /* * lib/ts_fsm.c A naive finite state machine text search approach * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * * Authors: Thomas Graf <tgraf@suug.ch> * * ========================================================================== * * A finite state machine consists of n states (struct ts_fsm_token) * representing the pattern as a finite automaton. The data is read * sequentially on an octet basis. Every state token specifies the number * of recurrences and the type of value accepted which can be either a * specific character or ctype based set of characters. The available * type of recurrences include 1, (0|1), [0 n], and [1 n]. * * The algorithm differs between strict/non-strict mode specifying * whether the pattern has to start at the first octet. Strict mode * is enabled by default and can be disabled by inserting * TS_FSM_HEAD_IGNORE as the first token in the chain. * * The runtime performance of the algorithm should be around O(n), * however while in strict mode the average runtime can be better. */ #include <linux/module.h> #include <linux/types.h> #include <linux/string.h> #include <linux/ctype.h> #include <linux/textsearch.h> #include <linux/textsearch_fsm.h> struct ts_fsm { unsigned int ntokens; struct ts_fsm_token tokens[0]; }; /* other values derived from ctype.h */ #define _A 0x100 /* ascii */ #define _W 0x200 /* wildcard */ /* Map to _ctype flags and some magic numbers */ static const u16 token_map[TS_FSM_TYPE_MAX+1] = { [TS_FSM_SPECIFIC] = 0, [TS_FSM_WILDCARD] = _W, [TS_FSM_CNTRL] = _C, [TS_FSM_LOWER] = _L, [TS_FSM_UPPER] = _U, [TS_FSM_PUNCT] = _P, [TS_FSM_SPACE] = _S, [TS_FSM_DIGIT] = _D, [TS_FSM_XDIGIT] = _D | _X, [TS_FSM_ALPHA] = _U | _L, [TS_FSM_ALNUM] = _U | _L | _D, [TS_FSM_PRINT] = _P | _U | _L | _D | _SP, [TS_FSM_GRAPH] = _P | _U | _L | _D, [TS_FSM_ASCII] = _A, }; static const u16 token_lookup_tbl[256] = { _W|_A|_C, _W|_A|_C, _W|_A|_C, _W|_A|_C, /* 0- 3 */ _W|_A|_C, _W|_A|_C, _W|_A|_C, _W|_A|_C, /* 4- 7 */ _W|_A|_C, _W|_A|_C|_S, _W|_A|_C|_S, _W|_A|_C|_S, /* 8- 11 */ _W|_A|_C|_S, _W|_A|_C|_S, _W|_A|_C, _W|_A|_C, /* 12- 15 */ _W|_A|_C, _W|_A|_C, _W|_A|_C, _W|_A|_C, /* 16- 19 */ _W|_A|_C, _W|_A|_C, _W|_A|_C, _W|_A|_C, /* 20- 23 */ _W|_A|_C, _W|_A|_C, _W|_A|_C, _W|_A|_C, /* 24- 27 */ _W|_A|_C, _W|_A|_C, _W|_A|_C, _W|_A|_C, /* 28- 31 */ _W|_A|_S|_SP, _W|_A|_P, _W|_A|_P, _W|_A|_P, /* 32- 35 */ _W|_A|_P, _W|_A|_P, _W|_A|_P, _W|_A|_P, /* 36- 39 */ _W|_A|_P, _W|_A|_P, _W|_A|_P, _W|_A|_P, /* 40- 43 */ _W|_A|_P, _W|_A|_P, _W|_A|_P, _W|_A|_P, /* 44- 47 */ _W|_A|_D, _W|_A|_D, _W|_A|_D, _W|_A|_D, /* 48- 51 */ _W|_A|_D, _W|_A|_D, _W|_A|_D, _W|_A|_D, /* 52- 55 */ _W|_A|_D, _W|_A|_D, _W|_A|_P, _W|_A|_P, /* 56- 59 */ _W|_A|_P, _W|_A|_P, _W|_A|_P, _W|_A|_P, /* 60- 63 */ _W|_A|_P, _W|_A|_U|_X, _W|_A|_U|_X, _W|_A|_U|_X, /* 64- 67 */ _W|_A|_U|_X, _W|_A|_U|_X, _W|_A|_U|_X, _W|_A|_U, /* 68- 71 */ _W|_A|_U, _W|_A|_U, _W|_A|_U, _W|_A|_U, /* 72- 75 */ _W|_A|_U, _W|_A|_U, _W|_A|_U, _W|_A|_U, /* 76- 79 */ _W|_A|_U, _W|_A|_U, _W|_A|_U, _W|_A|_U, /* 80- 83 */ _W|_A|_U, _W|_A|_U, _W|_A|_U, _W|_A|_U, /* 84- 87 */ _W|_A|_U, _W|_A|_U, _W|_A|_U, _W|_A|_P, /* 88- 91 */ _W|_A|_P, _W|_A|_P, _W|_A|_P, _W|_A|_P, /* 92- 95 */ _W|_A|_P, _W|_A|_L|_X, _W|_A|_L|_X, _W|_A|_L|_X, /* 96- 99 */ _W|_A|_L|_X, _W|_A|_L|_X, _W|_A|_L|_X, _W|_A|_L, /* 100-103 */ _W|_A|_L, _W|_A|_L, _W|_A|_L, _W|_A|_L, /* 104-107 */ _W|_A|_L, _W|_A|_L, _W|_A|_L, _W|_A|_L, /* 108-111 */ _W|_A|_L, _W|_A|_L, _W|_A|_L, _W|_A|_L, /* 112-115 */ _W|_A|_L, _W|_A|_L, _W|_A|_L, _W|_A|_L, /* 116-119 */ _W|_A|_L, _W|_A|_L, _W|_A|_L, _W|_A|_P, /* 120-123 */ _W|_A|_P, _W|_A|_P, _W|_A|_P, _W|_A|_C, /* 124-127 */ _W, _W, _W, _W, /* 128-131 */ _W, _W, _W, _W, /* 132-135 */ _W, _W, _W, _W, /* 136-139 */ _W, _W, _W, _W, /* 140-143 */ _W, _W, _W, _W, /* 144-147 */ _W, _W, _W, _W, /* 148-151 */ _W, _W, _W, _W, /* 152-155 */ _W, _W, _W, _W, /* 156-159 */ _W|_S|_SP, _W|_P, _W|_P, _W|_P, /* 160-163 */ _W|_P, _W|_P, _W|_P, _W|_P, /* 164-167 */ _W|_P, _W|_P, _W|_P, _W|_P, /* 168-171 */ _W|_P, _W|_P, _W|_P, _W|_P, /* 172-175 */ _W|_P, _W|_P, _W|_P, _W|_P, /* 176-179 */ _W|_P, _W|_P, _W|_P, _W|_P, /* 180-183 */ _W|_P, _W|_P, _W|_P, _W|_P, /* 184-187 */ _W|_P, _W|_P, _W|_P, _W|_P, /* 188-191 */ _W|_U, _W|_U, _W|_U, _W|_U, /* 192-195 */ _W|_U, _W|_U, _W|_U, _W|_U, /* 196-199 */ _W|_U, _W|_U, _W|_U, _W|_U, /* 200-203 */ _W|_U, _W|_U, _W|_U, _W|_U, /* 204-207 */ _W|_U, _W|_U, _W|_U, _W|_U, /* 208-211 */ _W|_U, _W|_U, _W|_U, _W|_P, /* 212-215 */ _W|_U, _W|_U, _W|_U, _W|_U, /* 216-219 */ _W|_U, _W|_U, _W|_U, _W|_L, /* 220-223 */ _W|_L, _W|_L, _W|_L, _W|_L, /* 224-227 */ _W|_L, _W|_L, _W|_L, _W|_L, /* 228-231 */ _W|_L, _W|_L, _W|_L, _W|_L, /* 232-235 */ _W|_L, _W|_L, _W|_L, _W|_L, /* 236-239 */ _W|_L, _W|_L, _W|_L, _W|_L, /* 240-243 */ _W|_L, _W|_L, _W|_L, _W|_P, /* 244-247 */ _W|_L, _W|_L, _W|_L, _W|_L, /* 248-251 */ _W|_L, _W|_L, _W|_L, _W|_L}; /* 252-255 */ static inline int match_token(struct ts_fsm_token *t, u8 d) { if (t->type) return (token_lookup_tbl[d] & t->type) != 0; else return t->value == d; } static unsigned int fsm_find(struct ts_config *conf, struct ts_state *state) { struct ts_fsm *fsm = ts_config_priv(conf); struct ts_fsm_token *cur = NULL, *next; unsigned int match_start, block_idx = 0, tok_idx; unsigned block_len = 0, strict, consumed = state->offset; const u8 *data; #define GET_NEXT_BLOCK() \ ({ consumed += block_idx; \ block_idx = 0; \ block_len = conf->get_next_block(consumed, &data, conf, state); }) #define TOKEN_MISMATCH() \ do { \ if (strict) \ goto no_match; \ block_idx++; \ goto startover; \ } while(0) #define end_of_data() unlikely(block_idx >= block_len && !GET_NEXT_BLOCK()) if (end_of_data()) goto no_match; strict = fsm->tokens[0].recur != TS_FSM_HEAD_IGNORE; startover: match_start = consumed + block_idx; for (tok_idx = 0; tok_idx < fsm->ntokens; tok_idx++) { cur = &fsm->tokens[tok_idx]; if (likely(tok_idx < (fsm->ntokens - 1))) next = &fsm->tokens[tok_idx + 1]; else next = NULL; switch (cur->recur) { case TS_FSM_SINGLE: if (end_of_data()) goto no_match; if (!match_token(cur, data[block_idx])) TOKEN_MISMATCH(); break; case TS_FSM_PERHAPS: if (end_of_data() || !match_token(cur, data[block_idx])) continue; break; case TS_FSM_MULTI: if (end_of_data()) goto no_match; if (!match_token(cur, data[block_idx])) TOKEN_MISMATCH(); block_idx++; /* fall through */ case TS_FSM_ANY: if (next == NULL) goto found_match; if (end_of_data()) continue; while (!match_token(next, data[block_idx])) { if (!match_token(cur, data[block_idx])) TOKEN_MISMATCH(); block_idx++; if (end_of_data()) goto no_match; } continue; /* * Optimization: Prefer small local loop over jumping * back and forth until garbage at head is munched. */ case TS_FSM_HEAD_IGNORE: if (end_of_data()) continue; while (!match_token(next, data[block_idx])) { /* * Special case, don't start over upon * a mismatch, give the user the * chance to specify the type of data * allowed to be ignored. */ if (!match_token(cur, data[block_idx])) goto no_match; block_idx++; if (end_of_data()) goto no_match; } match_start = consumed + block_idx; continue; } block_idx++; } if (end_of_data()) goto found_match; no_match: return UINT_MAX; found_match: state->offset = consumed + block_idx; return match_start; } static struct ts_config *fsm_init(const void *pattern, unsigned int len, gfp_t gfp_mask, int flags) { int i, err = -EINVAL; struct ts_config *conf; struct ts_fsm *fsm; struct ts_fsm_token *tokens = (struct ts_fsm_token *) pattern; unsigned int ntokens = len / sizeof(*tokens); size_t priv_size = sizeof(*fsm) + len; if (len % sizeof(struct ts_fsm_token) || ntokens < 1) goto errout; if (flags & TS_IGNORECASE) goto errout; for (i = 0; i < ntokens; i++) { struct ts_fsm_token *t = &tokens[i]; if (t->type > TS_FSM_TYPE_MAX || t->recur > TS_FSM_RECUR_MAX) goto errout; if (t->recur == TS_FSM_HEAD_IGNORE && (i != 0 || i == (ntokens - 1))) goto errout; } conf = alloc_ts_config(priv_size, gfp_mask); if (IS_ERR(conf)) return conf; conf->flags = flags; fsm = ts_config_priv(conf); fsm->ntokens = ntokens; memcpy(fsm->tokens, pattern, len); for (i = 0; i < fsm->ntokens; i++) { struct ts_fsm_token *t = &fsm->tokens[i]; t->type = token_map[t->type]; } return conf; errout: return ERR_PTR(err); } static void *fsm_get_pattern(struct ts_config *conf) { struct ts_fsm *fsm = ts_config_priv(conf); return fsm->tokens; } static unsigned int fsm_get_pattern_len(struct ts_config *conf) { struct ts_fsm *fsm = ts_config_priv(conf); return fsm->ntokens * sizeof(struct ts_fsm_token); } static struct ts_ops fsm_ops = { .name = "fsm", .find = fsm_find, .init = fsm_init, .get_pattern = fsm_get_pattern, .get_pattern_len = fsm_get_pattern_len, .owner = THIS_MODULE, .list = LIST_HEAD_INIT(fsm_ops.list) }; static int __init init_fsm(void) { return textsearch_register(&fsm_ops); } static void __exit exit_fsm(void) { textsearch_unregister(&fsm_ops); } MODULE_LICENSE("GPL"); module_init(init_fsm); module_exit(exit_fsm);
3 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 /* RFCOMM implementation for Linux Bluetooth stack (BlueZ) Copyright (C) 2002 Maxim Krasnyansky <maxk@qualcomm.com> Copyright (C) 2002 Marcel Holtmann <marcel@holtmann.org> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) AND AUTHOR(S) BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ALL LIABILITY, INCLUDING LIABILITY FOR INFRINGEMENT OF ANY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS, RELATING TO USE OF THIS SOFTWARE IS DISCLAIMED. */ #include <linux/refcount.h> #ifndef __RFCOMM_H #define __RFCOMM_H #define RFCOMM_CONN_TIMEOUT (HZ * 30) #define RFCOMM_DISC_TIMEOUT (HZ * 20) #define RFCOMM_AUTH_TIMEOUT (HZ * 25) #define RFCOMM_IDLE_TIMEOUT (HZ * 2) #define RFCOMM_DEFAULT_MTU 127 #define RFCOMM_DEFAULT_CREDITS 7 #define RFCOMM_MAX_L2CAP_MTU 1013 #define RFCOMM_MAX_CREDITS 40 #define RFCOMM_SKB_HEAD_RESERVE 8 #define RFCOMM_SKB_TAIL_RESERVE 2 #define RFCOMM_SKB_RESERVE (RFCOMM_SKB_HEAD_RESERVE + RFCOMM_SKB_TAIL_RESERVE) #define RFCOMM_SABM 0x2f #define RFCOMM_DISC 0x43 #define RFCOMM_UA 0x63 #define RFCOMM_DM 0x0f #define RFCOMM_UIH 0xef #define RFCOMM_TEST 0x08 #define RFCOMM_FCON 0x28 #define RFCOMM_FCOFF 0x18 #define RFCOMM_MSC 0x38 #define RFCOMM_RPN 0x24 #define RFCOMM_RLS 0x14 #define RFCOMM_PN 0x20 #define RFCOMM_NSC 0x04 #define RFCOMM_V24_FC 0x02 #define RFCOMM_V24_RTC 0x04 #define RFCOMM_V24_RTR 0x08 #define RFCOMM_V24_IC 0x40 #define RFCOMM_V24_DV 0x80 #define RFCOMM_RPN_BR_2400 0x0 #define RFCOMM_RPN_BR_4800 0x1 #define RFCOMM_RPN_BR_7200 0x2 #define RFCOMM_RPN_BR_9600 0x3 #define RFCOMM_RPN_BR_19200 0x4 #define RFCOMM_RPN_BR_38400 0x5 #define RFCOMM_RPN_BR_57600 0x6 #define RFCOMM_RPN_BR_115200 0x7 #define RFCOMM_RPN_BR_230400 0x8 #define RFCOMM_RPN_DATA_5 0x0 #define RFCOMM_RPN_DATA_6 0x1 #define RFCOMM_RPN_DATA_7 0x2 #define RFCOMM_RPN_DATA_8 0x3 #define RFCOMM_RPN_STOP_1 0 #define RFCOMM_RPN_STOP_15 1 #define RFCOMM_RPN_PARITY_NONE 0x0 #define RFCOMM_RPN_PARITY_ODD 0x1 #define RFCOMM_RPN_PARITY_EVEN 0x3 #define RFCOMM_RPN_PARITY_MARK 0x5 #define RFCOMM_RPN_PARITY_SPACE 0x7 #define RFCOMM_RPN_FLOW_NONE 0x00 #define RFCOMM_RPN_XON_CHAR 0x11 #define RFCOMM_RPN_XOFF_CHAR 0x13 #define RFCOMM_RPN_PM_BITRATE 0x0001 #define RFCOMM_RPN_PM_DATA 0x0002 #define RFCOMM_RPN_PM_STOP 0x0004 #define RFCOMM_RPN_PM_PARITY 0x0008 #define RFCOMM_RPN_PM_PARITY_TYPE 0x0010 #define RFCOMM_RPN_PM_XON 0x0020 #define RFCOMM_RPN_PM_XOFF 0x0040 #define RFCOMM_RPN_PM_FLOW 0x3F00 #define RFCOMM_RPN_PM_ALL 0x3F7F struct rfcomm_hdr { u8 addr; u8 ctrl; u8 len; /* Actual size can be 2 bytes */ } __packed; struct rfcomm_cmd { u8 addr; u8 ctrl; u8 len; u8 fcs; } __packed; struct rfcomm_mcc { u8 type; u8 len; } __packed; struct rfcomm_pn { u8 dlci; u8 flow_ctrl; u8 priority; u8 ack_timer; __le16 mtu; u8 max_retrans; u8 credits; } __packed; struct rfcomm_rpn { u8 dlci; u8 bit_rate; u8 line_settings; u8 flow_ctrl; u8 xon_char; u8 xoff_char; __le16 param_mask; } __packed; struct rfcomm_rls { u8 dlci; u8 status; } __packed; struct rfcomm_msc { u8 dlci; u8 v24_sig; } __packed; /* ---- Core structures, flags etc ---- */ struct rfcomm_session { struct list_head list; struct socket *sock; struct timer_list timer; unsigned long state; unsigned long flags; int initiator; /* Default DLC parameters */ int cfc; uint mtu; struct list_head dlcs; }; struct rfcomm_dlc { struct list_head list; struct rfcomm_session *session; struct sk_buff_head tx_queue; struct timer_list timer; struct mutex lock; unsigned long state; unsigned long flags; refcount_t refcnt; u8 dlci; u8 addr; u8 priority; u8 v24_sig; u8 remote_v24_sig; u8 mscex; u8 out; u8 sec_level; u8 role_switch; u32 defer_setup; uint mtu; uint cfc; uint rx_credits; uint tx_credits; void *owner; void (*data_ready)(struct rfcomm_dlc *d, struct sk_buff *skb); void (*state_change)(struct rfcomm_dlc *d, int err); void (*modem_status)(struct rfcomm_dlc *d, u8 v24_sig); }; /* DLC and session flags */ #define RFCOMM_RX_THROTTLED 0 #define RFCOMM_TX_THROTTLED 1 #define RFCOMM_TIMED_OUT 2 #define RFCOMM_MSC_PENDING 3 #define RFCOMM_SEC_PENDING 4 #define RFCOMM_AUTH_PENDING 5 #define RFCOMM_AUTH_ACCEPT 6 #define RFCOMM_AUTH_REJECT 7 #define RFCOMM_DEFER_SETUP 8 #define RFCOMM_ENC_DROP 9 /* Scheduling flags and events */ #define RFCOMM_SCHED_WAKEUP 31 /* MSC exchange flags */ #define RFCOMM_MSCEX_TX 1 #define RFCOMM_MSCEX_RX 2 #define RFCOMM_MSCEX_OK (RFCOMM_MSCEX_TX + RFCOMM_MSCEX_RX) /* CFC states */ #define RFCOMM_CFC_UNKNOWN -1 #define RFCOMM_CFC_DISABLED 0 #define RFCOMM_CFC_ENABLED RFCOMM_MAX_CREDITS /* ---- RFCOMM SEND RPN ---- */ int rfcomm_send_rpn(struct rfcomm_session *s, int cr, u8 dlci, u8 bit_rate, u8 data_bits, u8 stop_bits, u8 parity, u8 flow_ctrl_settings, u8 xon_char, u8 xoff_char, u16 param_mask); /* ---- RFCOMM DLCs (channels) ---- */ struct rfcomm_dlc *rfcomm_dlc_alloc(gfp_t prio); void rfcomm_dlc_free(struct rfcomm_dlc *d); int rfcomm_dlc_open(struct rfcomm_dlc *d, bdaddr_t *src, bdaddr_t *dst, u8 channel); int rfcomm_dlc_close(struct rfcomm_dlc *d, int reason); int rfcomm_dlc_send(struct rfcomm_dlc *d, struct sk_buff *skb); void rfcomm_dlc_send_noerror(struct rfcomm_dlc *d, struct sk_buff *skb); int rfcomm_dlc_set_modem_status(struct rfcomm_dlc *d, u8 v24_sig); int rfcomm_dlc_get_modem_status(struct rfcomm_dlc *d, u8 *v24_sig); void rfcomm_dlc_accept(struct rfcomm_dlc *d); struct rfcomm_dlc *rfcomm_dlc_exists(bdaddr_t *src, bdaddr_t *dst, u8 channel); #define rfcomm_dlc_lock(d) mutex_lock(&d->lock) #define rfcomm_dlc_unlock(d) mutex_unlock(&d->lock) static inline void rfcomm_dlc_hold(struct rfcomm_dlc *d) { refcount_inc(&d->refcnt); } static inline void rfcomm_dlc_put(struct rfcomm_dlc *d) { if (refcount_dec_and_test(&d->refcnt)) rfcomm_dlc_free(d); } void __rfcomm_dlc_throttle(struct rfcomm_dlc *d); void __rfcomm_dlc_unthrottle(struct rfcomm_dlc *d); static inline void rfcomm_dlc_throttle(struct rfcomm_dlc *d) { if (!test_and_set_bit(RFCOMM_RX_THROTTLED, &d->flags)) __rfcomm_dlc_throttle(d); } static inline void rfcomm_dlc_unthrottle(struct rfcomm_dlc *d) { if (test_and_clear_bit(RFCOMM_RX_THROTTLED, &d->flags)) __rfcomm_dlc_unthrottle(d); } /* ---- RFCOMM sessions ---- */ void rfcomm_session_getaddr(struct rfcomm_session *s, bdaddr_t *src, bdaddr_t *dst); /* ---- RFCOMM sockets ---- */ struct sockaddr_rc { sa_family_t rc_family; bdaddr_t rc_bdaddr; u8 rc_channel; }; #define RFCOMM_CONNINFO 0x02 struct rfcomm_conninfo { __u16 hci_handle; __u8 dev_class[3]; }; #define RFCOMM_LM 0x03 #define RFCOMM_LM_MASTER 0x0001 #define RFCOMM_LM_AUTH 0x0002 #define RFCOMM_LM_ENCRYPT 0x0004 #define RFCOMM_LM_TRUSTED 0x0008 #define RFCOMM_LM_RELIABLE 0x0010 #define RFCOMM_LM_SECURE 0x0020 #define RFCOMM_LM_FIPS 0x0040 #define rfcomm_pi(sk) ((struct rfcomm_pinfo *) sk) struct rfcomm_pinfo { struct bt_sock bt; bdaddr_t src; bdaddr_t dst; struct rfcomm_dlc *dlc; u8 channel; u8 sec_level; u8 role_switch; }; int rfcomm_init_sockets(void); void rfcomm_cleanup_sockets(void); int rfcomm_connect_ind(struct rfcomm_session *s, u8 channel, struct rfcomm_dlc **d); /* ---- RFCOMM TTY ---- */ #define RFCOMM_MAX_DEV 256 #define RFCOMMCREATEDEV _IOW('R', 200, int) #define RFCOMMRELEASEDEV _IOW('R', 201, int) #define RFCOMMGETDEVLIST _IOR('R', 210, int) #define RFCOMMGETDEVINFO _IOR('R', 211, int) #define RFCOMMSTEALDLC _IOW('R', 220, int) /* rfcomm_dev.flags bit definitions */ #define RFCOMM_REUSE_DLC 0 #define RFCOMM_RELEASE_ONHUP 1 #define RFCOMM_HANGUP_NOW 2 #define RFCOMM_TTY_ATTACHED 3 #define RFCOMM_DEFUNCT_BIT4 4 /* don't reuse this bit - userspace visible */ /* rfcomm_dev.status bit definitions */ #define RFCOMM_DEV_RELEASED 0 #define RFCOMM_TTY_OWNED 1 struct rfcomm_dev_req { s16 dev_id; u32 flags; bdaddr_t src; bdaddr_t dst; u8 channel; }; struct rfcomm_dev_info { s16 id; u32 flags; u16 state; bdaddr_t src; bdaddr_t dst; u8 channel; }; struct rfcomm_dev_list_req { u16 dev_num; struct rfcomm_dev_info dev_info[0]; }; int rfcomm_dev_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); #ifdef CONFIG_BT_RFCOMM_TTY int rfcomm_init_ttys(void); void rfcomm_cleanup_ttys(void); #else static inline int rfcomm_init_ttys(void) { return 0; } static inline void rfcomm_cleanup_ttys(void) { } #endif #endif /* __RFCOMM_H */
479 479 479 479 481 480 479 479 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 #include <linux/types.h> #include <linux/sched.h> #include <linux/module.h> #include <linux/sunrpc/types.h> #include <linux/sunrpc/xdr.h> #include <linux/sunrpc/svcsock.h> #include <linux/sunrpc/svcauth.h> #include <linux/sunrpc/gss_api.h> #include <linux/sunrpc/addr.h> #include <linux/err.h> #include <linux/seq_file.h> #include <linux/hash.h> #include <linux/string.h> #include <linux/slab.h> #include <net/sock.h> #include <net/ipv6.h> #include <linux/kernel.h> #include <linux/user_namespace.h> #define RPCDBG_FACILITY RPCDBG_AUTH #include "netns.h" /* * AUTHUNIX and AUTHNULL credentials are both handled here. * AUTHNULL is treated just like AUTHUNIX except that the uid/gid * are always nobody (-2). i.e. we do the same IP address checks for * AUTHNULL as for AUTHUNIX, and that is done here. */ struct unix_domain { struct auth_domain h; /* other stuff later */ }; extern struct auth_ops svcauth_null; extern struct auth_ops svcauth_unix; static void svcauth_unix_domain_release(struct auth_domain *dom) { struct unix_domain *ud = container_of(dom, struct unix_domain, h); kfree(dom->name); kfree(ud); } struct auth_domain *unix_domain_find(char *name) { struct auth_domain *rv; struct unix_domain *new = NULL; rv = auth_domain_lookup(name, NULL); while(1) { if (rv) { if (new && rv != &new->h) svcauth_unix_domain_release(&new->h); if (rv->flavour != &svcauth_unix) { auth_domain_put(rv); return NULL; } return rv; } new = kmalloc(sizeof(*new), GFP_KERNEL); if (new == NULL) return NULL; kref_init(&new->h.ref); new->h.name = kstrdup(name, GFP_KERNEL); if (new->h.name == NULL) { kfree(new); return NULL; } new->h.flavour = &svcauth_unix; rv = auth_domain_lookup(name, &new->h); } } EXPORT_SYMBOL_GPL(unix_domain_find); /************************************************** * cache for IP address to unix_domain * as needed by AUTH_UNIX */ #define IP_HASHBITS 8 #define IP_HASHMAX (1<<IP_HASHBITS) struct ip_map { struct cache_head h; char m_class[8]; /* e.g. "nfsd" */ struct in6_addr m_addr; struct unix_domain *m_client; }; static void ip_map_put(struct kref *kref) { struct cache_head *item = container_of(kref, struct cache_head, ref); struct ip_map *im = container_of(item, struct ip_map,h); if (test_bit(CACHE_VALID, &item->flags) && !test_bit(CACHE_NEGATIVE, &item->flags)) auth_domain_put(&im->m_client->h); kfree(im); } static inline int hash_ip6(const struct in6_addr *ip) { return hash_32(ipv6_addr_hash(ip), IP_HASHBITS); } static int ip_map_match(struct cache_head *corig, struct cache_head *cnew) { struct ip_map *orig = container_of(corig, struct ip_map, h); struct ip_map *new = container_of(cnew, struct ip_map, h); return strcmp(orig->m_class, new->m_class) == 0 && ipv6_addr_equal(&orig->m_addr, &new->m_addr); } static void ip_map_init(struct cache_head *cnew, struct cache_head *citem) { struct ip_map *new = container_of(cnew, struct ip_map, h); struct ip_map *item = container_of(citem, struct ip_map, h); strcpy(new->m_class, item->m_class); new->m_addr = item->m_addr; } static void update(struct cache_head *cnew, struct cache_head *citem) { struct ip_map *new = container_of(cnew, struct ip_map, h); struct ip_map *item = container_of(citem, struct ip_map, h); kref_get(&item->m_client->h.ref); new->m_client = item->m_client; } static struct cache_head *ip_map_alloc(void) { struct ip_map *i = kmalloc(sizeof(*i), GFP_KERNEL); if (i) return &i->h; else return NULL; } static void ip_map_request(struct cache_detail *cd, struct cache_head *h, char **bpp, int *blen) { char text_addr[40]; struct ip_map *im = container_of(h, struct ip_map, h); if (ipv6_addr_v4mapped(&(im->m_addr))) { snprintf(text_addr, 20, "%pI4", &im->m_addr.s6_addr32[3]); } else { snprintf(text_addr, 40, "%pI6", &im->m_addr); } qword_add(bpp, blen, im->m_class); qword_add(bpp, blen, text_addr); (*bpp)[-1] = '\n'; } static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char *class, struct in6_addr *addr); static int __ip_map_update(struct cache_detail *cd, struct ip_map *ipm, struct unix_domain *udom, time_t expiry); static int ip_map_parse(struct cache_detail *cd, char *mesg, int mlen) { /* class ipaddress [domainname] */ /* should be safe just to use the start of the input buffer * for scratch: */ char *buf = mesg; int len; char class[8]; union { struct sockaddr sa; struct sockaddr_in s4; struct sockaddr_in6 s6; } address; struct sockaddr_in6 sin6; int err; struct ip_map *ipmp; struct auth_domain *dom; time_t expiry; if (mesg[mlen-1] != '\n') return -EINVAL; mesg[mlen-1] = 0; /* class */ len = qword_get(&mesg, class, sizeof(class)); if (len <= 0) return -EINVAL; /* ip address */ len = qword_get(&mesg, buf, mlen); if (len <= 0) return -EINVAL; if (rpc_pton(cd->net, buf, len, &address.sa, sizeof(address)) == 0) return -EINVAL; switch (address.sa.sa_family) { case AF_INET: /* Form a mapped IPv4 address in sin6 */ sin6.sin6_family = AF_INET6; ipv6_addr_set_v4mapped(address.s4.sin_addr.s_addr, &sin6.sin6_addr); break; #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: memcpy(&sin6, &address.s6, sizeof(sin6)); break; #endif default: return -EINVAL; } expiry = get_expiry(&mesg); if (expiry ==0) return -EINVAL; /* domainname, or empty for NEGATIVE */ len = qword_get(&mesg, buf, mlen); if (len < 0) return -EINVAL; if (len) { dom = unix_domain_find(buf); if (dom == NULL) return -ENOENT; } else dom = NULL; /* IPv6 scope IDs are ignored for now */ ipmp = __ip_map_lookup(cd, class, &sin6.sin6_addr); if (ipmp) { err = __ip_map_update(cd, ipmp, container_of(dom, struct unix_domain, h), expiry); } else err = -ENOMEM; if (dom) auth_domain_put(dom); cache_flush(); return err; } static int ip_map_show(struct seq_file *m, struct cache_detail *cd, struct cache_head *h) { struct ip_map *im; struct in6_addr addr; char *dom = "-no-domain-"; if (h == NULL) { seq_puts(m, "#class IP domain\n"); return 0; } im = container_of(h, struct ip_map, h); /* class addr domain */ addr = im->m_addr; if (test_bit(CACHE_VALID, &h->flags) && !test_bit(CACHE_NEGATIVE, &h->flags)) dom = im->m_client->h.name; if (ipv6_addr_v4mapped(&addr)) { seq_printf(m, "%s %pI4 %s\n", im->m_class, &addr.s6_addr32[3], dom); } else { seq_printf(m, "%s %pI6 %s\n", im->m_class, &addr, dom); } return 0; } static struct ip_map *__ip_map_lookup(struct cache_detail *cd, char *class, struct in6_addr *addr) { struct ip_map ip; struct cache_head *ch; strcpy(ip.m_class, class); ip.m_addr = *addr; ch = sunrpc_cache_lookup(cd, &ip.h, hash_str(class, IP_HASHBITS) ^ hash_ip6(addr)); if (ch) return container_of(ch, struct ip_map, h); else return NULL; } static inline struct ip_map *ip_map_lookup(struct net *net, char *class, struct in6_addr *addr) { struct sunrpc_net *sn; sn = net_generic(net, sunrpc_net_id); return __ip_map_lookup(sn->ip_map_cache, class, addr); } static int __ip_map_update(struct cache_detail *cd, struct ip_map *ipm, struct unix_domain *udom, time_t expiry) { struct ip_map ip; struct cache_head *ch; ip.m_client = udom; ip.h.flags = 0; if (!udom) set_bit(CACHE_NEGATIVE, &ip.h.flags); ip.h.expiry_time = expiry; ch = sunrpc_cache_update(cd, &ip.h, &ipm->h, hash_str(ipm->m_class, IP_HASHBITS) ^ hash_ip6(&ipm->m_addr)); if (!ch) return -ENOMEM; cache_put(ch, cd); return 0; } static inline int ip_map_update(struct net *net, struct ip_map *ipm, struct unix_domain *udom, time_t expiry) { struct sunrpc_net *sn; sn = net_generic(net, sunrpc_net_id); return __ip_map_update(sn->ip_map_cache, ipm, udom, expiry); } void svcauth_unix_purge(struct net *net) { struct sunrpc_net *sn; sn = net_generic(net, sunrpc_net_id); cache_purge(sn->ip_map_cache); } EXPORT_SYMBOL_GPL(svcauth_unix_purge); static inline struct ip_map * ip_map_cached_get(struct svc_xprt *xprt) { struct ip_map *ipm = NULL; struct sunrpc_net *sn; if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags)) { spin_lock(&xprt->xpt_lock); ipm = xprt->xpt_auth_cache; if (ipm != NULL) { sn = net_generic(xprt->xpt_net, sunrpc_net_id); if (cache_is_expired(sn->ip_map_cache, &ipm->h)) { /* * The entry has been invalidated since it was * remembered, e.g. by a second mount from the * same IP address. */ xprt->xpt_auth_cache = NULL; spin_unlock(&xprt->xpt_lock); cache_put(&ipm->h, sn->ip_map_cache); return NULL; } cache_get(&ipm->h); } spin_unlock(&xprt->xpt_lock); } return ipm; } static inline void ip_map_cached_put(struct svc_xprt *xprt, struct ip_map *ipm) { if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags)) { spin_lock(&xprt->xpt_lock); if (xprt->xpt_auth_cache == NULL) { /* newly cached, keep the reference */ xprt->xpt_auth_cache = ipm; ipm = NULL; } spin_unlock(&xprt->xpt_lock); } if (ipm) { struct sunrpc_net *sn; sn = net_generic(xprt->xpt_net, sunrpc_net_id); cache_put(&ipm->h, sn->ip_map_cache); } } void svcauth_unix_info_release(struct svc_xprt *xpt) { struct ip_map *ipm; ipm = xpt->xpt_auth_cache; if (ipm != NULL) { struct sunrpc_net *sn; sn = net_generic(xpt->xpt_net, sunrpc_net_id); cache_put(&ipm->h, sn->ip_map_cache); } } /**************************************************************************** * auth.unix.gid cache * simple cache to map a UID to a list of GIDs * because AUTH_UNIX aka AUTH_SYS has a max of UNX_NGROUPS */ #define GID_HASHBITS 8 #define GID_HASHMAX (1<<GID_HASHBITS) struct unix_gid { struct cache_head h; kuid_t uid; struct group_info *gi; }; static int unix_gid_hash(kuid_t uid) { return hash_long(from_kuid(&init_user_ns, uid), GID_HASHBITS); } static void unix_gid_put(struct kref *kref) { struct cache_head *item = container_of(kref, struct cache_head, ref); struct unix_gid *ug = container_of(item, struct unix_gid, h); if (test_bit(CACHE_VALID, &item->flags) && !test_bit(CACHE_NEGATIVE, &item->flags)) put_group_info(ug->gi); kfree(ug); } static int unix_gid_match(struct cache_head *corig, struct cache_head *cnew) { struct unix_gid *orig = container_of(corig, struct unix_gid, h); struct unix_gid *new = container_of(cnew, struct unix_gid, h); return uid_eq(orig->uid, new->uid); } static void unix_gid_init(struct cache_head *cnew, struct cache_head *citem) { struct unix_gid *new = container_of(cnew, struct unix_gid, h); struct unix_gid *item = container_of(citem, struct unix_gid, h); new->uid = item->uid; } static void unix_gid_update(struct cache_head *cnew, struct cache_head *citem) { struct unix_gid *new = container_of(cnew, struct unix_gid, h); struct unix_gid *item = container_of(citem, struct unix_gid, h); get_group_info(item->gi); new->gi = item->gi; } static struct cache_head *unix_gid_alloc(void) { struct unix_gid *g = kmalloc(sizeof(*g), GFP_KERNEL); if (g) return &g->h; else return NULL; } static void unix_gid_request(struct cache_detail *cd, struct cache_head *h, char **bpp, int *blen) { char tuid[20]; struct unix_gid *ug = container_of(h, struct unix_gid, h); snprintf(tuid, 20, "%u", from_kuid(&init_user_ns, ug->uid)); qword_add(bpp, blen, tuid); (*bpp)[-1] = '\n'; } static struct unix_gid *unix_gid_lookup(struct cache_detail *cd, kuid_t uid); static int unix_gid_parse(struct cache_detail *cd, char *mesg, int mlen) { /* uid expiry Ngid gid0 gid1 ... gidN-1 */ int id; kuid_t uid; int gids; int rv; int i; int err; time_t expiry; struct unix_gid ug, *ugp; if (mesg[mlen - 1] != '\n') return -EINVAL; mesg[mlen-1] = 0; rv = get_int(&mesg, &id); if (rv) return -EINVAL; uid = make_kuid(&init_user_ns, id); ug.uid = uid; expiry = get_expiry(&mesg); if (expiry == 0) return -EINVAL; rv = get_int(&mesg, &gids); if (rv || gids < 0 || gids > 8192) return -EINVAL; ug.gi = groups_alloc(gids); if (!ug.gi) return -ENOMEM; for (i = 0 ; i < gids ; i++) { int gid; kgid_t kgid; rv = get_int(&mesg, &gid); err = -EINVAL; if (rv) goto out; kgid = make_kgid(&init_user_ns, gid); if (!gid_valid(kgid)) goto out; ug.gi->gid[i] = kgid; } groups_sort(ug.gi); ugp = unix_gid_lookup(cd, uid); if (ugp) { struct cache_head *ch; ug.h.flags = 0; ug.h.expiry_time = expiry; ch = sunrpc_cache_update(cd, &ug.h, &ugp->h, unix_gid_hash(uid)); if (!ch) err = -ENOMEM; else { err = 0; cache_put(ch, cd); } } else err = -ENOMEM; out: if (ug.gi) put_group_info(ug.gi); return err; } static int unix_gid_show(struct seq_file *m, struct cache_detail *cd, struct cache_head *h) { struct user_namespace *user_ns = &init_user_ns; struct unix_gid *ug; int i; int glen; if (h == NULL) { seq_puts(m, "#uid cnt: gids...\n"); return 0; } ug = container_of(h, struct unix_gid, h); if (test_bit(CACHE_VALID, &h->flags) && !test_bit(CACHE_NEGATIVE, &h->flags)) glen = ug->gi->ngroups; else glen = 0; seq_printf(m, "%u %d:", from_kuid_munged(user_ns, ug->uid), glen); for (i = 0; i < glen; i++) seq_printf(m, " %d", from_kgid_munged(user_ns, ug->gi->gid[i])); seq_printf(m, "\n"); return 0; } static const struct cache_detail unix_gid_cache_template = { .owner = THIS_MODULE, .hash_size = GID_HASHMAX, .name = "auth.unix.gid", .cache_put = unix_gid_put, .cache_request = unix_gid_request, .cache_parse = unix_gid_parse, .cache_show = unix_gid_show, .match = unix_gid_match, .init = unix_gid_init, .update = unix_gid_update, .alloc = unix_gid_alloc, }; int unix_gid_cache_create(struct net *net) { struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); struct cache_detail *cd; int err; cd = cache_create_net(&unix_gid_cache_template, net); if (IS_ERR(cd)) return PTR_ERR(cd); err = cache_register_net(cd, net); if (err) { cache_destroy_net(cd, net); return err; } sn->unix_gid_cache = cd; return 0; } void unix_gid_cache_destroy(struct net *net) { struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); struct cache_detail *cd = sn->unix_gid_cache; sn->unix_gid_cache = NULL; cache_purge(cd); cache_unregister_net(cd, net); cache_destroy_net(cd, net); } static struct unix_gid *unix_gid_lookup(struct cache_detail *cd, kuid_t uid) { struct unix_gid ug; struct cache_head *ch; ug.uid = uid; ch = sunrpc_cache_lookup(cd, &ug.h, unix_gid_hash(uid)); if (ch) return container_of(ch, struct unix_gid, h); else return NULL; } static struct group_info *unix_gid_find(kuid_t uid, struct svc_rqst *rqstp) { struct unix_gid *ug; struct group_info *gi; int ret; struct sunrpc_net *sn = net_generic(rqstp->rq_xprt->xpt_net, sunrpc_net_id); ug = unix_gid_lookup(sn->unix_gid_cache, uid); if (!ug) return ERR_PTR(-EAGAIN); ret = cache_check(sn->unix_gid_cache, &ug->h, &rqstp->rq_chandle); switch (ret) { case -ENOENT: return ERR_PTR(-ENOENT); case -ETIMEDOUT: return ERR_PTR(-ESHUTDOWN); case 0: gi = get_group_info(ug->gi); cache_put(&ug->h, sn->unix_gid_cache); return gi; default: return ERR_PTR(-EAGAIN); } } int svcauth_unix_set_client(struct svc_rqst *rqstp) { struct sockaddr_in *sin; struct sockaddr_in6 *sin6, sin6_storage; struct ip_map *ipm; struct group_info *gi; struct svc_cred *cred = &rqstp->rq_cred; struct svc_xprt *xprt = rqstp->rq_xprt; struct net *net = xprt->xpt_net; struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); switch (rqstp->rq_addr.ss_family) { case AF_INET: sin = svc_addr_in(rqstp); sin6 = &sin6_storage; ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &sin6->sin6_addr); break; case AF_INET6: sin6 = svc_addr_in6(rqstp); break; default: BUG(); } rqstp->rq_client = NULL; if (rqstp->rq_proc == 0) return SVC_OK; ipm = ip_map_cached_get(xprt); if (ipm == NULL) ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class, &sin6->sin6_addr); if (ipm == NULL) return SVC_DENIED; switch (cache_check(sn->ip_map_cache, &ipm->h, &rqstp->rq_chandle)) { default: BUG(); case -ETIMEDOUT: return SVC_CLOSE; case -EAGAIN: return SVC_DROP; case -ENOENT: return SVC_DENIED; case 0: rqstp->rq_client = &ipm->m_client->h; kref_get(&rqstp->rq_client->ref); ip_map_cached_put(xprt, ipm); break; } gi = unix_gid_find(cred->cr_uid, rqstp); switch (PTR_ERR(gi)) { case -EAGAIN: return SVC_DROP; case -ESHUTDOWN: return SVC_CLOSE; case -ENOENT: break; default: put_group_info(cred->cr_group_info); cred->cr_group_info = gi; } return SVC_OK; } EXPORT_SYMBOL_GPL(svcauth_unix_set_client); static int svcauth_null_accept(struct svc_rqst *rqstp, __be32 *authp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; struct svc_cred *cred = &rqstp->rq_cred; if (argv->iov_len < 3*4) return SVC_GARBAGE; if (svc_getu32(argv) != 0) { dprintk("svc: bad null cred\n"); *authp = rpc_autherr_badcred; return SVC_DENIED; } if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) { dprintk("svc: bad null verf\n"); *authp = rpc_autherr_badverf; return SVC_DENIED; } /* Signal that mapping to nobody uid/gid is required */ cred->cr_uid = INVALID_UID; cred->cr_gid = INVALID_GID; cred->cr_group_info = groups_alloc(0); if (cred->cr_group_info == NULL) return SVC_CLOSE; /* kmalloc failure - client must retry */ /* Put NULL verifier */ svc_putnl(resv, RPC_AUTH_NULL); svc_putnl(resv, 0); rqstp->rq_cred.cr_flavor = RPC_AUTH_NULL; return SVC_OK; } static int svcauth_null_release(struct svc_rqst *rqstp) { if (rqstp->rq_client) auth_domain_put(rqstp->rq_client); rqstp->rq_client = NULL; if (rqstp->rq_cred.cr_group_info) put_group_info(rqstp->rq_cred.cr_group_info); rqstp->rq_cred.cr_group_info = NULL; return 0; /* don't drop */ } struct auth_ops svcauth_null = { .name = "null", .owner = THIS_MODULE, .flavour = RPC_AUTH_NULL, .accept = svcauth_null_accept, .release = svcauth_null_release, .set_client = svcauth_unix_set_client, }; static int svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; struct svc_cred *cred = &rqstp->rq_cred; u32 slen, i; int len = argv->iov_len; if ((len -= 3*4) < 0) return SVC_GARBAGE; svc_getu32(argv); /* length */ svc_getu32(argv); /* time stamp */ slen = XDR_QUADLEN(svc_getnl(argv)); /* machname length */ if (slen > 64 || (len -= (slen + 3)*4) < 0) goto badcred; argv->iov_base = (void*)((__be32*)argv->iov_base + slen); /* skip machname */ argv->iov_len -= slen*4; /* * Note: we skip uid_valid()/gid_valid() checks here for * backwards compatibility with clients that use -1 id's. * Instead, -1 uid or gid is later mapped to the * (export-specific) anonymous id by nfsd_setuser. * Supplementary gid's will be left alone. */ cred->cr_uid = make_kuid(&init_user_ns, svc_getnl(argv)); /* uid */ cred->cr_gid = make_kgid(&init_user_ns, svc_getnl(argv)); /* gid */ slen = svc_getnl(argv); /* gids length */ if (slen > UNX_NGROUPS || (len -= (slen + 2)*4) < 0) goto badcred; cred->cr_group_info = groups_alloc(slen); if (cred->cr_group_info == NULL) return SVC_CLOSE; for (i = 0; i < slen; i++) { kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv)); cred->cr_group_info->gid[i] = kgid; } groups_sort(cred->cr_group_info); if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) { *authp = rpc_autherr_badverf; return SVC_DENIED; } /* Put NULL verifier */ svc_putnl(resv, RPC_AUTH_NULL); svc_putnl(resv, 0); rqstp->rq_cred.cr_flavor = RPC_AUTH_UNIX; return SVC_OK; badcred: *authp = rpc_autherr_badcred; return SVC_DENIED; } static int svcauth_unix_release(struct svc_rqst *rqstp) { /* Verifier (such as it is) is already in place. */ if (rqstp->rq_client) auth_domain_put(rqstp->rq_client); rqstp->rq_client = NULL; if (rqstp->rq_cred.cr_group_info) put_group_info(rqstp->rq_cred.cr_group_info); rqstp->rq_cred.cr_group_info = NULL; return 0; } struct auth_ops svcauth_unix = { .name = "unix", .owner = THIS_MODULE, .flavour = RPC_AUTH_UNIX, .accept = svcauth_unix_accept, .release = svcauth_unix_release, .domain_release = svcauth_unix_domain_release, .set_client = svcauth_unix_set_client, }; static const struct cache_detail ip_map_cache_template = { .owner = THIS_MODULE, .hash_size = IP_HASHMAX, .name = "auth.unix.ip", .cache_put = ip_map_put, .cache_request = ip_map_request, .cache_parse = ip_map_parse, .cache_show = ip_map_show, .match = ip_map_match, .init = ip_map_init, .update = update, .alloc = ip_map_alloc, }; int ip_map_cache_create(struct net *net) { struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); struct cache_detail *cd; int err; cd = cache_create_net(&ip_map_cache_template, net); if (IS_ERR(cd)) return PTR_ERR(cd); err = cache_register_net(cd, net); if (err) { cache_destroy_net(cd, net); return err; } sn->ip_map_cache = cd; return 0; } void ip_map_cache_destroy(struct net *net) { struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); struct cache_detail *cd = sn->ip_map_cache; sn->ip_map_cache = NULL; cache_purge(cd); cache_unregister_net(cd, net); cache_destroy_net(cd, net); }
3225 3228 3223 3226 3227 1528 1530 654 1530 1529 1528 1530 1529 2169 2169 19 1529 1530 1688 19 1022 1012 1014 332 1012 1014 1013 1014 1013 1013 1012 1041 1014 1013 19 2728 2305 2305 2276 1117 1305 19 73 3 73 73 2 2329 2326 2330 2329 876 876 2236 826 2 70 70 3144 1820 1563 1818 933 579 499 932 27 795 802 566 556 299 793 3226 3225 3227 3228 3227 3224 3224 40 40 3227 3225 3225 3227 98 98 98 237 238 238 222 222 38 3 1408 1383 1407 6548 292 183 43 29 29 14 14 49 49 49 30 25 25 24 25 61 61 55 61 6575 5882 1 5881 5884 895 446 445 894 5923 5927 5573 1012 1011 1015 5929 4967 3317 2578 820 2580 166 2577 2576 2579 2578 13 2619 2620 2214 2215 2215 2217 1499 1499 1198 885 885 886 885 224 1496 1496 1499 2373 2375 1111 1111 1111 1111 1111 367 2366 2 2368 2373 3 11 950 948 949 949 597 597 223 225 76 36 6 867 868 517 867 361 867 865 867 821 77 1 15 15 261 4 898 899 1151 1225 68 228 292 214 111 11 9 9 9 9 73 81 1 72 72 72 72 120 9 32 263 126 121 81 167 49 144 145 144 265 9 9 10 1250 1251 1194 1184 1186 1186 1186 1182 54 1180 764 1182 1096 236 1187 1182 14 14 14 14 10 10 10 260 260 262 11 410 412 66 1940 1730 1940 1940 1890 1937 1938 1937 1952 72 69 647 5 642 645 4 644 192 555 373 1 374 556 1106 1105 1105 1435 1435 1435 145 1429 1423 29 1422 1304 1930 1928 1929 1929 622 92 92 74 67 1352 1227 1707 819 772 772 753 1496 1495 1496 126 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 /* * linux/mm/filemap.c * * Copyright (C) 1994-1999 Linus Torvalds */ /* * This file handles the generic file mmap semantics used by * most "normal" filesystems (but you don't /have/ to use this: * the NFS filesystem used to do this differently, for example) */ #include <linux/export.h> #include <linux/compiler.h> #include <linux/dax.h> #include <linux/fs.h> #include <linux/sched/signal.h> #include <linux/uaccess.h> #include <linux/capability.h> #include <linux/kernel_stat.h> #include <linux/gfp.h> #include <linux/mm.h> #include <linux/swap.h> #include <linux/mman.h> #include <linux/pagemap.h> #include <linux/file.h> #include <linux/uio.h> #include <linux/hash.h> #include <linux/writeback.h> #include <linux/backing-dev.h> #include <linux/pagevec.h> #include <linux/blkdev.h> #include <linux/security.h> #include <linux/cpuset.h> #include <linux/hugetlb.h> #include <linux/memcontrol.h> #include <linux/cleancache.h> #include <linux/shmem_fs.h> #include <linux/rmap.h> #include "internal.h" #define CREATE_TRACE_POINTS #include <trace/events/filemap.h> /* * FIXME: remove all knowledge of the buffer layer from the core VM */ #include <linux/buffer_head.h> /* for try_to_free_buffers */ #include <asm/mman.h> /* * Shared mappings implemented 30.11.1994. It's not fully working yet, * though. * * Shared mappings now work. 15.8.1995 Bruno. * * finished 'unifying' the page and buffer cache and SMP-threaded the * page-cache, 21.05.1999, Ingo Molnar <mingo@redhat.com> * * SMP-threaded pagemap-LRU 1999, Andrea Arcangeli <andrea@suse.de> */ /* * Lock ordering: * * ->i_mmap_rwsem (truncate_pagecache) * ->private_lock (__free_pte->__set_page_dirty_buffers) * ->swap_lock (exclusive_swap_page, others) * ->i_pages lock * * ->i_mutex * ->i_mmap_rwsem (truncate->unmap_mapping_range) * * ->mmap_sem * ->i_mmap_rwsem * ->page_table_lock or pte_lock (various, mainly in memory.c) * ->i_pages lock (arch-dependent flush_dcache_mmap_lock) * * ->mmap_sem * ->lock_page (access_process_vm) * * ->i_mutex (generic_perform_write) * ->mmap_sem (fault_in_pages_readable->do_page_fault) * * bdi->wb.list_lock * sb_lock (fs/fs-writeback.c) * ->i_pages lock (__sync_single_inode) * * ->i_mmap_rwsem * ->anon_vma.lock (vma_adjust) * * ->anon_vma.lock * ->page_table_lock or pte_lock (anon_vma_prepare and various) * * ->page_table_lock or pte_lock * ->swap_lock (try_to_unmap_one) * ->private_lock (try_to_unmap_one) * ->i_pages lock (try_to_unmap_one) * ->zone_lru_lock(zone) (follow_page->mark_page_accessed) * ->zone_lru_lock(zone) (check_pte_range->isolate_lru_page) * ->private_lock (page_remove_rmap->set_page_dirty) * ->i_pages lock (page_remove_rmap->set_page_dirty) * bdi.wb->list_lock (page_remove_rmap->set_page_dirty) * ->inode->i_lock (page_remove_rmap->set_page_dirty) * ->memcg->move_lock (page_remove_rmap->lock_page_memcg) * bdi.wb->list_lock (zap_pte_range->set_page_dirty) * ->inode->i_lock (zap_pte_range->set_page_dirty) * ->private_lock (zap_pte_range->__set_page_dirty_buffers) * * ->i_mmap_rwsem * ->tasklist_lock (memory_failure, collect_procs_ao) */ static int page_cache_tree_insert(struct address_space *mapping, struct page *page, void **shadowp) { struct radix_tree_node *node; void **slot; int error; error = __radix_tree_create(&mapping->i_pages, page->index, 0, &node, &slot); if (error) return error; if (*slot) { void *p; p = radix_tree_deref_slot_protected(slot, &mapping->i_pages.xa_lock); if (!radix_tree_exceptional_entry(p)) return -EEXIST; mapping->nrexceptional--; if (shadowp) *shadowp = p; } __radix_tree_replace(&mapping->i_pages, node, slot, page, workingset_lookup_update(mapping)); mapping->nrpages++; return 0; } static void page_cache_tree_delete(struct address_space *mapping, struct page *page, void *shadow) { int i, nr; /* hugetlb pages are represented by one entry in the radix tree */ nr = PageHuge(page) ? 1 : hpage_nr_pages(page); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageTail(page), page); VM_BUG_ON_PAGE(nr != 1 && shadow, page); for (i = 0; i < nr; i++) { struct radix_tree_node *node; void **slot; __radix_tree_lookup(&mapping->i_pages, page->index + i, &node, &slot); VM_BUG_ON_PAGE(!node && nr != 1, page); radix_tree_clear_tags(&mapping->i_pages, node, slot); __radix_tree_replace(&mapping->i_pages, node, slot, shadow, workingset_lookup_update(mapping)); } page->mapping = NULL; /* Leave page->index set: truncation lookup relies upon it */ if (shadow) { mapping->nrexceptional += nr; /* * Make sure the nrexceptional update is committed before * the nrpages update so that final truncate racing * with reclaim does not see both counters 0 at the * same time and miss a shadow entry. */ smp_wmb(); } mapping->nrpages -= nr; } static void unaccount_page_cache_page(struct address_space *mapping, struct page *page) { int nr; /* * if we're uptodate, flush out into the cleancache, otherwise * invalidate any existing cleancache entries. We can't leave * stale data around in the cleancache once our page is gone */ if (PageUptodate(page) && PageMappedToDisk(page)) cleancache_put_page(page); else cleancache_invalidate_page(mapping, page); VM_BUG_ON_PAGE(PageTail(page), page); VM_BUG_ON_PAGE(page_mapped(page), page); if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(page_mapped(page))) { int mapcount; pr_alert("BUG: Bad page cache in process %s pfn:%05lx\n", current->comm, page_to_pfn(page)); dump_page(page, "still mapped when deleted"); dump_stack(); add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); mapcount = page_mapcount(page); if (mapping_exiting(mapping) && page_count(page) >= mapcount + 2) { /* * All vmas have already been torn down, so it's * a good bet that actually the page is unmapped, * and we'd prefer not to leak it: if we're wrong, * some other bad page check should catch it later. */ page_mapcount_reset(page); page_ref_sub(page, mapcount); } } /* hugetlb pages do not participate in page cache accounting. */ if (PageHuge(page)) return; nr = hpage_nr_pages(page); __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr); if (PageSwapBacked(page)) { __mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr); if (PageTransHuge(page)) __dec_node_page_state(page, NR_SHMEM_THPS); } else { VM_BUG_ON_PAGE(PageTransHuge(page), page); } /* * At this point page must be either written or cleaned by * truncate. Dirty page here signals a bug and loss of * unwritten data. * * This fixes dirty accounting after removing the page entirely * but leaves PageDirty set: it has no effect for truncated * page and anyway will be cleared before returning page into * buddy allocator. */ if (WARN_ON_ONCE(PageDirty(page))) account_page_cleaned(page, mapping, inode_to_wb(mapping->host)); } /* * Delete a page from the page cache and free it. Caller has to make * sure the page is locked and that nobody else uses it - or that usage * is safe. The caller must hold the i_pages lock. */ void __delete_from_page_cache(struct page *page, void *shadow) { struct address_space *mapping = page->mapping; trace_mm_filemap_delete_from_page_cache(page); unaccount_page_cache_page(mapping, page); page_cache_tree_delete(mapping, page, shadow); } static void page_cache_free_page(struct address_space *mapping, struct page *page) { void (*freepage)(struct page *); freepage = mapping->a_ops->freepage; if (freepage) freepage(page); if (PageTransHuge(page) && !PageHuge(page)) { page_ref_sub(page, HPAGE_PMD_NR); VM_BUG_ON_PAGE(page_count(page) <= 0, page); } else { put_page(page); } } /** * delete_from_page_cache - delete page from page cache * @page: the page which the kernel is trying to remove from page cache * * This must be called only on pages that have been verified to be in the page * cache and locked. It will never put the page into the free list, the caller * has a reference on the page. */ void delete_from_page_cache(struct page *page) { struct address_space *mapping = page_mapping(page); unsigned long flags; BUG_ON(!PageLocked(page)); xa_lock_irqsave(&mapping->i_pages, flags); __delete_from_page_cache(page, NULL); xa_unlock_irqrestore(&mapping->i_pages, flags); page_cache_free_page(mapping, page); } EXPORT_SYMBOL(delete_from_page_cache); /* * page_cache_tree_delete_batch - delete several pages from page cache * @mapping: the mapping to which pages belong * @pvec: pagevec with pages to delete * * The function walks over mapping->i_pages and removes pages passed in @pvec * from the mapping. The function expects @pvec to be sorted by page index. * It tolerates holes in @pvec (mapping entries at those indices are not * modified). The function expects only THP head pages to be present in the * @pvec and takes care to delete all corresponding tail pages from the * mapping as well. * * The function expects the i_pages lock to be held. */ static void page_cache_tree_delete_batch(struct address_space *mapping, struct pagevec *pvec) { struct radix_tree_iter iter; void **slot; int total_pages = 0; int i = 0, tail_pages = 0; struct page *page; pgoff_t start; start = pvec->pages[0]->index; radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) { if (i >= pagevec_count(pvec) && !tail_pages) break; page = radix_tree_deref_slot_protected(slot, &mapping->i_pages.xa_lock); if (radix_tree_exceptional_entry(page)) continue; if (!tail_pages) { /* * Some page got inserted in our range? Skip it. We * have our pages locked so they are protected from * being removed. */ if (page != pvec->pages[i]) continue; WARN_ON_ONCE(!PageLocked(page)); if (PageTransHuge(page) && !PageHuge(page)) tail_pages = HPAGE_PMD_NR - 1; page->mapping = NULL; /* * Leave page->index set: truncation lookup relies * upon it */ i++; } else { tail_pages--; } radix_tree_clear_tags(&mapping->i_pages, iter.node, slot); __radix_tree_replace(&mapping->i_pages, iter.node, slot, NULL, workingset_lookup_update(mapping)); total_pages++; } mapping->nrpages -= total_pages; } void delete_from_page_cache_batch(struct address_space *mapping, struct pagevec *pvec) { int i; unsigned long flags; if (!pagevec_count(pvec)) return; xa_lock_irqsave(&mapping->i_pages, flags); for (i = 0; i < pagevec_count(pvec); i++) { trace_mm_filemap_delete_from_page_cache(pvec->pages[i]); unaccount_page_cache_page(mapping, pvec->pages[i]); } page_cache_tree_delete_batch(mapping, pvec); xa_unlock_irqrestore(&mapping->i_pages, flags); for (i = 0; i < pagevec_count(pvec); i++) page_cache_free_page(mapping, pvec->pages[i]); } int filemap_check_errors(struct address_space *mapping) { int ret = 0; /* Check for outstanding write errors */ if (test_bit(AS_ENOSPC, &mapping->flags) && test_and_clear_bit(AS_ENOSPC, &mapping->flags)) ret = -ENOSPC; if (test_bit(AS_EIO, &mapping->flags) && test_and_clear_bit(AS_EIO, &mapping->flags)) ret = -EIO; return ret; } EXPORT_SYMBOL(filemap_check_errors); static int filemap_check_and_keep_errors(struct address_space *mapping) { /* Check for outstanding write errors */ if (test_bit(AS_EIO, &mapping->flags)) return -EIO; if (test_bit(AS_ENOSPC, &mapping->flags)) return -ENOSPC; return 0; } /** * __filemap_fdatawrite_range - start writeback on mapping dirty pages in range * @mapping: address space structure to write * @start: offset in bytes where the range starts * @end: offset in bytes where the range ends (inclusive) * @sync_mode: enable synchronous operation * * Start writeback against all of a mapping's dirty pages that lie * within the byte offsets <start, end> inclusive. * * If sync_mode is WB_SYNC_ALL then this is a "data integrity" operation, as * opposed to a regular memory cleansing writeback. The difference between * these two operations is that if a dirty page/buffer is encountered, it must * be waited upon, and not just skipped over. */ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, loff_t end, int sync_mode) { int ret; struct writeback_control wbc = { .sync_mode = sync_mode, .nr_to_write = LONG_MAX, .range_start = start, .range_end = end, }; if (!mapping_cap_writeback_dirty(mapping) || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) return 0; wbc_attach_fdatawrite_inode(&wbc, mapping->host); ret = do_writepages(mapping, &wbc); wbc_detach_inode(&wbc); return ret; } static inline int __filemap_fdatawrite(struct address_space *mapping, int sync_mode) { return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); } int filemap_fdatawrite(struct address_space *mapping) { return __filemap_fdatawrite(mapping, WB_SYNC_ALL); } EXPORT_SYMBOL(filemap_fdatawrite); int filemap_fdatawrite_range(struct address_space *mapping, loff_t start, loff_t end) { return __filemap_fdatawrite_range(mapping, start, end, WB_SYNC_ALL); } EXPORT_SYMBOL(filemap_fdatawrite_range); /** * filemap_flush - mostly a non-blocking flush * @mapping: target address_space * * This is a mostly non-blocking flush. Not suitable for data-integrity * purposes - I/O may not be started against all dirty pages. */ int filemap_flush(struct address_space *mapping) { return __filemap_fdatawrite(mapping, WB_SYNC_NONE); } EXPORT_SYMBOL(filemap_flush); /** * filemap_range_has_page - check if a page exists in range. * @mapping: address space within which to check * @start_byte: offset in bytes where the range starts * @end_byte: offset in bytes where the range ends (inclusive) * * Find at least one page in the range supplied, usually used to check if * direct writing in this range will trigger a writeback. */ bool filemap_range_has_page(struct address_space *mapping, loff_t start_byte, loff_t end_byte) { pgoff_t index = start_byte >> PAGE_SHIFT; pgoff_t end = end_byte >> PAGE_SHIFT; struct page *page; if (end_byte < start_byte) return false; if (mapping->nrpages == 0) return false; if (!find_get_pages_range(mapping, &index, end, 1, &page)) return false; put_page(page); return true; } EXPORT_SYMBOL(filemap_range_has_page); static void __filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte, loff_t end_byte) { pgoff_t index = start_byte >> PAGE_SHIFT; pgoff_t end = end_byte >> PAGE_SHIFT; struct pagevec pvec; int nr_pages; if (end_byte < start_byte) return; pagevec_init(&pvec); while (index <= end) { unsigned i; nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK); if (!nr_pages) break; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; wait_on_page_writeback(page); ClearPageError(page); } pagevec_release(&pvec); cond_resched(); } } /** * filemap_fdatawait_range - wait for writeback to complete * @mapping: address space structure to wait for * @start_byte: offset in bytes where the range starts * @end_byte: offset in bytes where the range ends (inclusive) * * Walk the list of under-writeback pages of the given address space * in the given range and wait for all of them. Check error status of * the address space and return it. * * Since the error status of the address space is cleared by this function, * callers are responsible for checking the return value and handling and/or * reporting the error. */ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte, loff_t end_byte) { __filemap_fdatawait_range(mapping, start_byte, end_byte); return filemap_check_errors(mapping); } EXPORT_SYMBOL(filemap_fdatawait_range); /** * filemap_fdatawait_range_keep_errors - wait for writeback to complete * @mapping: address space structure to wait for * @start_byte: offset in bytes where the range starts * @end_byte: offset in bytes where the range ends (inclusive) * * Walk the list of under-writeback pages of the given address space in the * given range and wait for all of them. Unlike filemap_fdatawait_range(), * this function does not clear error status of the address space. * * Use this function if callers don't handle errors themselves. Expected * call sites are system-wide / filesystem-wide data flushers: e.g. sync(2), * fsfreeze(8) */ int filemap_fdatawait_range_keep_errors(struct address_space *mapping, loff_t start_byte, loff_t end_byte) { __filemap_fdatawait_range(mapping, start_byte, end_byte); return filemap_check_and_keep_errors(mapping); } EXPORT_SYMBOL(filemap_fdatawait_range_keep_errors); /** * file_fdatawait_range - wait for writeback to complete * @file: file pointing to address space structure to wait for * @start_byte: offset in bytes where the range starts * @end_byte: offset in bytes where the range ends (inclusive) * * Walk the list of under-writeback pages of the address space that file * refers to, in the given range and wait for all of them. Check error * status of the address space vs. the file->f_wb_err cursor and return it. * * Since the error status of the file is advanced by this function, * callers are responsible for checking the return value and handling and/or * reporting the error. */ int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte) { struct address_space *mapping = file->f_mapping; __filemap_fdatawait_range(mapping, start_byte, end_byte); return file_check_and_advance_wb_err(file); } EXPORT_SYMBOL(file_fdatawait_range); /** * filemap_fdatawait_keep_errors - wait for writeback without clearing errors * @mapping: address space structure to wait for * * Walk the list of under-writeback pages of the given address space * and wait for all of them. Unlike filemap_fdatawait(), this function * does not clear error status of the address space. * * Use this function if callers don't handle errors themselves. Expected * call sites are system-wide / filesystem-wide data flushers: e.g. sync(2), * fsfreeze(8) */ int filemap_fdatawait_keep_errors(struct address_space *mapping) { __filemap_fdatawait_range(mapping, 0, LLONG_MAX); return filemap_check_and_keep_errors(mapping); } EXPORT_SYMBOL(filemap_fdatawait_keep_errors); static bool mapping_needs_writeback(struct address_space *mapping) { return (!dax_mapping(mapping) && mapping->nrpages) || (dax_mapping(mapping) && mapping->nrexceptional); } int filemap_write_and_wait(struct address_space *mapping) { int err = 0; if (mapping_needs_writeback(mapping)) { err = filemap_fdatawrite(mapping); /* * Even if the above returned error, the pages may be * written partially (e.g. -ENOSPC), so we wait for it. * But the -EIO is special case, it may indicate the worst * thing (e.g. bug) happened, so we avoid waiting for it. */ if (err != -EIO) { int err2 = filemap_fdatawait(mapping); if (!err) err = err2; } else { /* Clear any previously stored errors */ filemap_check_errors(mapping); } } else { err = filemap_check_errors(mapping); } return err; } EXPORT_SYMBOL(filemap_write_and_wait); /** * filemap_write_and_wait_range - write out & wait on a file range * @mapping: the address_space for the pages * @lstart: offset in bytes where the range starts * @lend: offset in bytes where the range ends (inclusive) * * Write out and wait upon file offsets lstart->lend, inclusive. * * Note that @lend is inclusive (describes the last byte to be written) so * that this function can be used to write to the very end-of-file (end = -1). */ int filemap_write_and_wait_range(struct address_space *mapping, loff_t lstart, loff_t lend) { int err = 0; if (mapping_needs_writeback(mapping)) { err = __filemap_fdatawrite_range(mapping, lstart, lend, WB_SYNC_ALL); /* See comment of filemap_write_and_wait() */ if (err != -EIO) { int err2 = filemap_fdatawait_range(mapping, lstart, lend); if (!err) err = err2; } else { /* Clear any previously stored errors */ filemap_check_errors(mapping); } } else { err = filemap_check_errors(mapping); } return err; } EXPORT_SYMBOL(filemap_write_and_wait_range); void __filemap_set_wb_err(struct address_space *mapping, int err) { errseq_t eseq = errseq_set(&mapping->wb_err, err); trace_filemap_set_wb_err(mapping, eseq); } EXPORT_SYMBOL(__filemap_set_wb_err); /** * file_check_and_advance_wb_err - report wb error (if any) that was previously * and advance wb_err to current one * @file: struct file on which the error is being reported * * When userland calls fsync (or something like nfsd does the equivalent), we * want to report any writeback errors that occurred since the last fsync (or * since the file was opened if there haven't been any). * * Grab the wb_err from the mapping. If it matches what we have in the file, * then just quickly return 0. The file is all caught up. * * If it doesn't match, then take the mapping value, set the "seen" flag in * it and try to swap it into place. If it works, or another task beat us * to it with the new value, then update the f_wb_err and return the error * portion. The error at this point must be reported via proper channels * (a'la fsync, or NFS COMMIT operation, etc.). * * While we handle mapping->wb_err with atomic operations, the f_wb_err * value is protected by the f_lock since we must ensure that it reflects * the latest value swapped in for this file descriptor. */ int file_check_and_advance_wb_err(struct file *file) { int err = 0; errseq_t old = READ_ONCE(file->f_wb_err); struct address_space *mapping = file->f_mapping; /* Locklessly handle the common case where nothing has changed */ if (errseq_check(&mapping->wb_err, old)) { /* Something changed, must use slow path */ spin_lock(&file->f_lock); old = file->f_wb_err; err = errseq_check_and_advance(&mapping->wb_err, &file->f_wb_err); trace_file_check_and_advance_wb_err(file, old); spin_unlock(&file->f_lock); } /* * We're mostly using this function as a drop in replacement for * filemap_check_errors. Clear AS_EIO/AS_ENOSPC to emulate the effect * that the legacy code would have had on these flags. */ clear_bit(AS_EIO, &mapping->flags); clear_bit(AS_ENOSPC, &mapping->flags); return err; } EXPORT_SYMBOL(file_check_and_advance_wb_err); /** * file_write_and_wait_range - write out & wait on a file range * @file: file pointing to address_space with pages * @lstart: offset in bytes where the range starts * @lend: offset in bytes where the range ends (inclusive) * * Write out and wait upon file offsets lstart->lend, inclusive. * * Note that @lend is inclusive (describes the last byte to be written) so * that this function can be used to write to the very end-of-file (end = -1). * * After writing out and waiting on the data, we check and advance the * f_wb_err cursor to the latest value, and return any errors detected there. */ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend) { int err = 0, err2; struct address_space *mapping = file->f_mapping; if (mapping_needs_writeback(mapping)) { err = __filemap_fdatawrite_range(mapping, lstart, lend, WB_SYNC_ALL); /* See comment of filemap_write_and_wait() */ if (err != -EIO) __filemap_fdatawait_range(mapping, lstart, lend); } err2 = file_check_and_advance_wb_err(file); if (!err) err = err2; return err; } EXPORT_SYMBOL(file_write_and_wait_range); /** * replace_page_cache_page - replace a pagecache page with a new one * @old: page to be replaced * @new: page to replace with * @gfp_mask: allocation mode * * This function replaces a page in the pagecache with a new one. On * success it acquires the pagecache reference for the new page and * drops it for the old page. Both the old and new pages must be * locked. This function does not add the new page to the LRU, the * caller must do that. * * The remove + add is atomic. The only way this function can fail is * memory allocation failure. */ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask) { int error; VM_BUG_ON_PAGE(!PageLocked(old), old); VM_BUG_ON_PAGE(!PageLocked(new), new); VM_BUG_ON_PAGE(new->mapping, new); error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK); if (!error) { struct address_space *mapping = old->mapping; void (*freepage)(struct page *); unsigned long flags; pgoff_t offset = old->index; freepage = mapping->a_ops->freepage; get_page(new); new->mapping = mapping; new->index = offset; xa_lock_irqsave(&mapping->i_pages, flags); __delete_from_page_cache(old, NULL); error = page_cache_tree_insert(mapping, new, NULL); BUG_ON(error); /* * hugetlb pages do not participate in page cache accounting. */ if (!PageHuge(new)) __inc_node_page_state(new, NR_FILE_PAGES); if (PageSwapBacked(new)) __inc_node_page_state(new, NR_SHMEM); xa_unlock_irqrestore(&mapping->i_pages, flags); mem_cgroup_migrate(old, new); radix_tree_preload_end(); if (freepage) freepage(old); put_page(old); } return error; } EXPORT_SYMBOL_GPL(replace_page_cache_page); static int __add_to_page_cache_locked(struct page *page, struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask, void **shadowp) { int huge = PageHuge(page); struct mem_cgroup *memcg; int error; VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageSwapBacked(page), page); if (!huge) { error = mem_cgroup_try_charge(page, current->mm, gfp_mask, &memcg, false); if (error) return error; } error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK); if (error) { if (!huge) mem_cgroup_cancel_charge(page, memcg, false); return error; } get_page(page); page->mapping = mapping; page->index = offset; xa_lock_irq(&mapping->i_pages); error = page_cache_tree_insert(mapping, page, shadowp); radix_tree_preload_end(); if (unlikely(error)) goto err_insert; /* hugetlb pages do not participate in page cache accounting. */ if (!huge) __inc_node_page_state(page, NR_FILE_PAGES); xa_unlock_irq(&mapping->i_pages); if (!huge) mem_cgroup_commit_charge(page, memcg, false, false); trace_mm_filemap_add_to_page_cache(page); return 0; err_insert: page->mapping = NULL; /* Leave page->index set: truncation relies upon it */ xa_unlock_irq(&mapping->i_pages); if (!huge) mem_cgroup_cancel_charge(page, memcg, false); put_page(page); return error; } /** * add_to_page_cache_locked - add a locked page to the pagecache * @page: page to add * @mapping: the page's address_space * @offset: page index * @gfp_mask: page allocation mode * * This function is used to add a page to the pagecache. It must be locked. * This function does not add the page to the LRU. The caller must do that. */ int add_to_page_cache_locked(struct page *page, struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask) { return __add_to_page_cache_locked(page, mapping, offset, gfp_mask, NULL); } EXPORT_SYMBOL(add_to_page_cache_locked); int add_to_page_cache_lru(struct page *page, struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask) { void *shadow = NULL; int ret; __SetPageLocked(page); ret = __add_to_page_cache_locked(page, mapping, offset, gfp_mask, &shadow); if (unlikely(ret)) __ClearPageLocked(page); else { /* * The page might have been evicted from cache only * recently, in which case it should be activated like * any other repeatedly accessed page. * The exception is pages getting rewritten; evicting other * data from the working set, only to cache data that will * get overwritten with something else, is a waste of memory. */ if (!(gfp_mask & __GFP_WRITE) && shadow && workingset_refault(shadow)) { SetPageActive(page); workingset_activation(page); } else ClearPageActive(page); lru_cache_add(page); } return ret; } EXPORT_SYMBOL_GPL(add_to_page_cache_lru); #ifdef CONFIG_NUMA struct page *__page_cache_alloc(gfp_t gfp) { int n; struct page *page; if (cpuset_do_page_mem_spread()) { unsigned int cpuset_mems_cookie; do { cpuset_mems_cookie = read_mems_allowed_begin(); n = cpuset_mem_spread_node(); page = __alloc_pages_node(n, gfp, 0); } while (!page && read_mems_allowed_retry(cpuset_mems_cookie)); return page; } return alloc_pages(gfp, 0); } EXPORT_SYMBOL(__page_cache_alloc); #endif /* * In order to wait for pages to become available there must be * waitqueues associated with pages. By using a hash table of * waitqueues where the bucket discipline is to maintain all * waiters on the same queue and wake all when any of the pages * become available, and for the woken contexts to check to be * sure the appropriate page became available, this saves space * at a cost of "thundering herd" phenomena during rare hash * collisions. */ #define PAGE_WAIT_TABLE_BITS 8 #define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS) static wait_queue_head_t page_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned; static wait_queue_head_t *page_waitqueue(struct page *page) { return &page_wait_table[hash_ptr(page, PAGE_WAIT_TABLE_BITS)]; } void __init pagecache_init(void) { int i; for (i = 0; i < PAGE_WAIT_TABLE_SIZE; i++) init_waitqueue_head(&page_wait_table[i]); page_writeback_init(); } /* This has the same layout as wait_bit_key - see fs/cachefiles/rdwr.c */ struct wait_page_key { struct page *page; int bit_nr; int page_match; }; struct wait_page_queue { struct page *page; int bit_nr; wait_queue_entry_t wait; }; static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, void *arg) { struct wait_page_key *key = arg; struct wait_page_queue *wait_page = container_of(wait, struct wait_page_queue, wait); if (wait_page->page != key->page) return 0; key->page_match = 1; if (wait_page->bit_nr != key->bit_nr) return 0; /* Stop walking if it's locked */ if (test_bit(key->bit_nr, &key->page->flags)) return -1; return autoremove_wake_function(wait, mode, sync, key); } static void wake_up_page_bit(struct page *page, int bit_nr) { wait_queue_head_t *q = page_waitqueue(page); struct wait_page_key key; unsigned long flags; wait_queue_entry_t bookmark; key.page = page; key.bit_nr = bit_nr; key.page_match = 0; bookmark.flags = 0; bookmark.private = NULL; bookmark.func = NULL; INIT_LIST_HEAD(&bookmark.entry); spin_lock_irqsave(&q->lock, flags); __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark); while (bookmark.flags & WQ_FLAG_BOOKMARK) { /* * Take a breather from holding the lock, * allow pages that finish wake up asynchronously * to acquire the lock and remove themselves * from wait queue */ spin_unlock_irqrestore(&q->lock, flags); cpu_relax(); spin_lock_irqsave(&q->lock, flags); __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark); } /* * It is possible for other pages to have collided on the waitqueue * hash, so in that case check for a page match. That prevents a long- * term waiter * * It is still possible to miss a case here, when we woke page waiters * and removed them from the waitqueue, but there are still other * page waiters. */ if (!waitqueue_active(q) || !key.page_match) { ClearPageWaiters(page); /* * It's possible to miss clearing Waiters here, when we woke * our page waiters, but the hashed waitqueue has waiters for * other pages on it. * * That's okay, it's a rare case. The next waker will clear it. */ } spin_unlock_irqrestore(&q->lock, flags); } static void wake_up_page(struct page *page, int bit) { if (!PageWaiters(page)) return; wake_up_page_bit(page, bit); } static inline int wait_on_page_bit_common(wait_queue_head_t *q, struct page *page, int bit_nr, int state, bool lock) { struct wait_page_queue wait_page; wait_queue_entry_t *wait = &wait_page.wait; int ret = 0; init_wait(wait); wait->flags = lock ? WQ_FLAG_EXCLUSIVE : 0; wait->func = wake_page_function; wait_page.page = page; wait_page.bit_nr = bit_nr; for (;;) { spin_lock_irq(&q->lock); if (likely(list_empty(&wait->entry))) { __add_wait_queue_entry_tail(q, wait); SetPageWaiters(page); } set_current_state(state); spin_unlock_irq(&q->lock); if (likely(test_bit(bit_nr, &page->flags))) { io_schedule(); } if (lock) { if (!test_and_set_bit_lock(bit_nr, &page->flags)) break; } else { if (!test_bit(bit_nr, &page->flags)) break; } if (unlikely(signal_pending_state(state, current))) { ret = -EINTR; break; } } finish_wait(q, wait); /* * A signal could leave PageWaiters set. Clearing it here if * !waitqueue_active would be possible (by open-coding finish_wait), * but still fail to catch it in the case of wait hash collision. We * already can fail to clear wait hash collision cases, so don't * bother with signals either. */ return ret; } void wait_on_page_bit(struct page *page, int bit_nr) { wait_queue_head_t *q = page_waitqueue(page); wait_on_page_bit_common(q, page, bit_nr, TASK_UNINTERRUPTIBLE, false); } EXPORT_SYMBOL(wait_on_page_bit); int wait_on_page_bit_killable(struct page *page, int bit_nr) { wait_queue_head_t *q = page_waitqueue(page); return wait_on_page_bit_common(q, page, bit_nr, TASK_KILLABLE, false); } EXPORT_SYMBOL(wait_on_page_bit_killable); /** * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue * @page: Page defining the wait queue of interest * @waiter: Waiter to add to the queue * * Add an arbitrary @waiter to the wait queue for the nominated @page. */ void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter) { wait_queue_head_t *q = page_waitqueue(page); unsigned long flags; spin_lock_irqsave(&q->lock, flags); __add_wait_queue_entry_tail(q, waiter); SetPageWaiters(page); spin_unlock_irqrestore(&q->lock, flags); } EXPORT_SYMBOL_GPL(add_page_wait_queue); #ifndef clear_bit_unlock_is_negative_byte /* * PG_waiters is the high bit in the same byte as PG_lock. * * On x86 (and on many other architectures), we can clear PG_lock and * test the sign bit at the same time. But if the architecture does * not support that special operation, we just do this all by hand * instead. * * The read of PG_waiters has to be after (or concurrently with) PG_locked * being cleared, but a memory barrier should be unneccssary since it is * in the same byte as PG_locked. */ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem) { clear_bit_unlock(nr, mem); /* smp_mb__after_atomic(); */ return test_bit(PG_waiters, mem); } #endif /** * unlock_page - unlock a locked page * @page: the page * * Unlocks the page and wakes up sleepers in ___wait_on_page_locked(). * Also wakes sleepers in wait_on_page_writeback() because the wakeup * mechanism between PageLocked pages and PageWriteback pages is shared. * But that's OK - sleepers in wait_on_page_writeback() just go back to sleep. * * Note that this depends on PG_waiters being the sign bit in the byte * that contains PG_locked - thus the BUILD_BUG_ON(). That allows us to * clear the PG_locked bit and test PG_waiters at the same time fairly * portably (architectures that do LL/SC can test any bit, while x86 can * test the sign bit). */ void unlock_page(struct page *page) { BUILD_BUG_ON(PG_waiters != 7); page = compound_head(page); VM_BUG_ON_PAGE(!PageLocked(page), page); if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags)) wake_up_page_bit(page, PG_locked); } EXPORT_SYMBOL(unlock_page); /** * end_page_writeback - end writeback against a page * @page: the page */ void end_page_writeback(struct page *page) { /* * TestClearPageReclaim could be used here but it is an atomic * operation and overkill in this particular case. Failing to * shuffle a page marked for immediate reclaim is too mild to * justify taking an atomic operation penalty at the end of * ever page writeback. */ if (PageReclaim(page)) { ClearPageReclaim(page); rotate_reclaimable_page(page); } if (!test_clear_page_writeback(page)) BUG(); smp_mb__after_atomic(); wake_up_page(page, PG_writeback); } EXPORT_SYMBOL(end_page_writeback); /* * After completing I/O on a page, call this routine to update the page * flags appropriately */ void page_endio(struct page *page, bool is_write, int err) { if (!is_write) { if (!err) { SetPageUptodate(page); } else { ClearPageUptodate(page); SetPageError(page); } unlock_page(page); } else { if (err) { struct address_space *mapping; SetPageError(page); mapping = page_mapping(page); if (mapping) mapping_set_error(mapping, err); } end_page_writeback(page); } } EXPORT_SYMBOL_GPL(page_endio); /** * __lock_page - get a lock on the page, assuming we need to sleep to get it * @__page: the page to lock */ void __lock_page(struct page *__page) { struct page *page = compound_head(__page); wait_queue_head_t *q = page_waitqueue(page); wait_on_page_bit_common(q, page, PG_locked, TASK_UNINTERRUPTIBLE, true); } EXPORT_SYMBOL(__lock_page); int __lock_page_killable(struct page *__page) { struct page *page = compound_head(__page); wait_queue_head_t *q = page_waitqueue(page); return wait_on_page_bit_common(q, page, PG_locked, TASK_KILLABLE, true); } EXPORT_SYMBOL_GPL(__lock_page_killable); /* * Return values: * 1 - page is locked; mmap_sem is still held. * 0 - page is not locked. * mmap_sem has been released (up_read()), unless flags had both * FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in * which case mmap_sem is still held. * * If neither ALLOW_RETRY nor KILLABLE are set, will always return 1 * with the page locked and the mmap_sem unperturbed. */ int __lock_page_or_retry(struct page *page, struct mm_struct *mm, unsigned int flags) { if (flags & FAULT_FLAG_ALLOW_RETRY) { /* * CAUTION! In this case, mmap_sem is not released * even though return 0. */ if (flags & FAULT_FLAG_RETRY_NOWAIT) return 0; up_read(&mm->mmap_sem); if (flags & FAULT_FLAG_KILLABLE) wait_on_page_locked_killable(page); else wait_on_page_locked(page); return 0; } else { if (flags & FAULT_FLAG_KILLABLE) { int ret; ret = __lock_page_killable(page); if (ret) { up_read(&mm->mmap_sem); return 0; } } else __lock_page(page); return 1; } } /** * page_cache_next_hole - find the next hole (not-present entry) * @mapping: mapping * @index: index * @max_scan: maximum range to search * * Search the set [index, min(index+max_scan-1, MAX_INDEX)] for the * lowest indexed hole. * * Returns: the index of the hole if found, otherwise returns an index * outside of the set specified (in which case 'return - index >= * max_scan' will be true). In rare cases of index wrap-around, 0 will * be returned. * * page_cache_next_hole may be called under rcu_read_lock. However, * like radix_tree_gang_lookup, this will not atomically search a * snapshot of the tree at a single point in time. For example, if a * hole is created at index 5, then subsequently a hole is created at * index 10, page_cache_next_hole covering both indexes may return 10 * if called under rcu_read_lock. */ pgoff_t page_cache_next_hole(struct address_space *mapping, pgoff_t index, unsigned long max_scan) { unsigned long i; for (i = 0; i < max_scan; i++) { struct page *page; page = radix_tree_lookup(&mapping->i_pages, index); if (!page || radix_tree_exceptional_entry(page)) break; index++; if (index == 0) break; } return index; } EXPORT_SYMBOL(page_cache_next_hole); /** * page_cache_prev_hole - find the prev hole (not-present entry) * @mapping: mapping * @index: index * @max_scan: maximum range to search * * Search backwards in the range [max(index-max_scan+1, 0), index] for * the first hole. * * Returns: the index of the hole if found, otherwise returns an index * outside of the set specified (in which case 'index - return >= * max_scan' will be true). In rare cases of wrap-around, ULONG_MAX * will be returned. * * page_cache_prev_hole may be called under rcu_read_lock. However, * like radix_tree_gang_lookup, this will not atomically search a * snapshot of the tree at a single point in time. For example, if a * hole is created at index 10, then subsequently a hole is created at * index 5, page_cache_prev_hole covering both indexes may return 5 if * called under rcu_read_lock. */ pgoff_t page_cache_prev_hole(struct address_space *mapping, pgoff_t index, unsigned long max_scan) { unsigned long i; for (i = 0; i < max_scan; i++) { struct page *page; page = radix_tree_lookup(&mapping->i_pages, index); if (!page || radix_tree_exceptional_entry(page)) break; index--; if (index == ULONG_MAX) break; } return index; } EXPORT_SYMBOL(page_cache_prev_hole); /** * find_get_entry - find and get a page cache entry * @mapping: the address_space to search * @offset: the page cache index * * Looks up the page cache slot at @mapping & @offset. If there is a * page cache page, it is returned with an increased refcount. * * If the slot holds a shadow entry of a previously evicted page, or a * swap entry from shmem/tmpfs, it is returned. * * Otherwise, %NULL is returned. */ struct page *find_get_entry(struct address_space *mapping, pgoff_t offset) { void **pagep; struct page *head, *page; rcu_read_lock(); repeat: page = NULL; pagep = radix_tree_lookup_slot(&mapping->i_pages, offset); if (pagep) { page = radix_tree_deref_slot(pagep); if (unlikely(!page)) goto out; if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) goto repeat; /* * A shadow entry of a recently evicted page, * or a swap entry from shmem/tmpfs. Return * it without attempting to raise page count. */ goto out; } head = compound_head(page); if (!page_cache_get_speculative(head)) goto repeat; /* The page was split under us? */ if (compound_head(page) != head) { put_page(head); goto repeat; } /* * Has the page moved? * This is part of the lockless pagecache protocol. See * include/linux/pagemap.h for details. */ if (unlikely(page != *pagep)) { put_page(head); goto repeat; } } out: rcu_read_unlock(); return page; } EXPORT_SYMBOL(find_get_entry); /** * find_lock_entry - locate, pin and lock a page cache entry * @mapping: the address_space to search * @offset: the page cache index * * Looks up the page cache slot at @mapping & @offset. If there is a * page cache page, it is returned locked and with an increased * refcount. * * If the slot holds a shadow entry of a previously evicted page, or a * swap entry from shmem/tmpfs, it is returned. * * Otherwise, %NULL is returned. * * find_lock_entry() may sleep. */ struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset) { struct page *page; repeat: page = find_get_entry(mapping, offset); if (page && !radix_tree_exception(page)) { lock_page(page); /* Has the page been truncated? */ if (unlikely(page_mapping(page) != mapping)) { unlock_page(page); put_page(page); goto repeat; } VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page); } return page; } EXPORT_SYMBOL(find_lock_entry); /** * pagecache_get_page - find and get a page reference * @mapping: the address_space to search * @offset: the page index * @fgp_flags: PCG flags * @gfp_mask: gfp mask to use for the page cache data page allocation * * Looks up the page cache slot at @mapping & @offset. * * PCG flags modify how the page is returned. * * @fgp_flags can be: * * - FGP_ACCESSED: the page will be marked accessed * - FGP_LOCK: Page is return locked * - FGP_CREAT: If page is not present then a new page is allocated using * @gfp_mask and added to the page cache and the VM's LRU * list. The page is returned locked and with an increased * refcount. Otherwise, NULL is returned. * * If FGP_LOCK or FGP_CREAT are specified then the function may sleep even * if the GFP flags specified for FGP_CREAT are atomic. * * If there is a page cache page, it is returned with an increased refcount. */ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, int fgp_flags, gfp_t gfp_mask) { struct page *page; repeat: page = find_get_entry(mapping, offset); if (radix_tree_exceptional_entry(page)) page = NULL; if (!page) goto no_page; if (fgp_flags & FGP_LOCK) { if (fgp_flags & FGP_NOWAIT) { if (!trylock_page(page)) { put_page(page); return NULL; } } else { lock_page(page); } /* Has the page been truncated? */ if (unlikely(page->mapping != mapping)) { unlock_page(page); put_page(page); goto repeat; } VM_BUG_ON_PAGE(page->index != offset, page); } if (page && (fgp_flags & FGP_ACCESSED)) mark_page_accessed(page); no_page: if (!page && (fgp_flags & FGP_CREAT)) { int err; if ((fgp_flags & FGP_WRITE) && mapping_cap_account_dirty(mapping)) gfp_mask |= __GFP_WRITE; if (fgp_flags & FGP_NOFS) gfp_mask &= ~__GFP_FS; page = __page_cache_alloc(gfp_mask); if (!page) return NULL; if (WARN_ON_ONCE(!(fgp_flags & FGP_LOCK))) fgp_flags |= FGP_LOCK; /* Init accessed so avoid atomic mark_page_accessed later */ if (fgp_flags & FGP_ACCESSED) __SetPageReferenced(page); err = add_to_page_cache_lru(page, mapping, offset, gfp_mask); if (unlikely(err)) { put_page(page); page = NULL; if (err == -EEXIST) goto repeat; } } return page; } EXPORT_SYMBOL(pagecache_get_page); /** * find_get_entries - gang pagecache lookup * @mapping: The address_space to search * @start: The starting page cache index * @nr_entries: The maximum number of entries * @entries: Where the resulting entries are placed * @indices: The cache indices corresponding to the entries in @entries * * find_get_entries() will search for and return a group of up to * @nr_entries entries in the mapping. The entries are placed at * @entries. find_get_entries() takes a reference against any actual * pages it returns. * * The search returns a group of mapping-contiguous page cache entries * with ascending indexes. There may be holes in the indices due to * not-present pages. * * Any shadow entries of evicted pages, or swap entries from * shmem/tmpfs, are included in the returned array. * * find_get_entries() returns the number of pages and shadow entries * which were found. */ unsigned find_get_entries(struct address_space *mapping, pgoff_t start, unsigned int nr_entries, struct page **entries, pgoff_t *indices) { void **slot; unsigned int ret = 0; struct radix_tree_iter iter; if (!nr_entries) return 0; rcu_read_lock(); radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) { struct page *head, *page; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) continue; if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { slot = radix_tree_iter_retry(&iter); continue; } /* * A shadow entry of a recently evicted page, a swap * entry from shmem/tmpfs or a DAX entry. Return it * without attempting to raise page count. */ goto export; } head = compound_head(page); if (!page_cache_get_speculative(head)) goto repeat; /* The page was split under us? */ if (compound_head(page) != head) { put_page(head); goto repeat; } /* Has the page moved? */ if (unlikely(page != *slot)) { put_page(head); goto repeat; } export: indices[ret] = iter.index; entries[ret] = page; if (++ret == nr_entries) break; } rcu_read_unlock(); return ret; } /** * find_get_pages_range - gang pagecache lookup * @mapping: The address_space to search * @start: The starting page index * @end: The final page index (inclusive) * @nr_pages: The maximum number of pages * @pages: Where the resulting pages are placed * * find_get_pages_range() will search for and return a group of up to @nr_pages * pages in the mapping starting at index @start and up to index @end * (inclusive). The pages are placed at @pages. find_get_pages_range() takes * a reference against the returned pages. * * The search returns a group of mapping-contiguous pages with ascending * indexes. There may be holes in the indices due to not-present pages. * We also update @start to index the next page for the traversal. * * find_get_pages_range() returns the number of pages which were found. If this * number is smaller than @nr_pages, the end of specified range has been * reached. */ unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start, pgoff_t end, unsigned int nr_pages, struct page **pages) { struct radix_tree_iter iter; void **slot; unsigned ret = 0; if (unlikely(!nr_pages)) return 0; rcu_read_lock(); radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, *start) { struct page *head, *page; if (iter.index > end) break; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) continue; if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { slot = radix_tree_iter_retry(&iter); continue; } /* * A shadow entry of a recently evicted page, * or a swap entry from shmem/tmpfs. Skip * over it. */ continue; } head = compound_head(page); if (!page_cache_get_speculative(head)) goto repeat; /* The page was split under us? */ if (compound_head(page) != head) { put_page(head); goto repeat; } /* Has the page moved? */ if (unlikely(page != *slot)) { put_page(head); goto repeat; } pages[ret] = page; if (++ret == nr_pages) { *start = pages[ret - 1]->index + 1; goto out; } } /* * We come here when there is no page beyond @end. We take care to not * overflow the index @start as it confuses some of the callers. This * breaks the iteration when there is page at index -1 but that is * already broken anyway. */ if (end == (pgoff_t)-1) *start = (pgoff_t)-1; else *start = end + 1; out: rcu_read_unlock(); return ret; } /** * find_get_pages_contig - gang contiguous pagecache lookup * @mapping: The address_space to search * @index: The starting page index * @nr_pages: The maximum number of pages * @pages: Where the resulting pages are placed * * find_get_pages_contig() works exactly like find_get_pages(), except * that the returned number of pages are guaranteed to be contiguous. * * find_get_pages_contig() returns the number of pages which were found. */ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index, unsigned int nr_pages, struct page **pages) { struct radix_tree_iter iter; void **slot; unsigned int ret = 0; if (unlikely(!nr_pages)) return 0; rcu_read_lock(); radix_tree_for_each_contig(slot, &mapping->i_pages, &iter, index) { struct page *head, *page; repeat: page = radix_tree_deref_slot(slot); /* The hole, there no reason to continue */ if (unlikely(!page)) break; if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { slot = radix_tree_iter_retry(&iter); continue; } /* * A shadow entry of a recently evicted page, * or a swap entry from shmem/tmpfs. Stop * looking for contiguous pages. */ break; } head = compound_head(page); if (!page_cache_get_speculative(head)) goto repeat; /* The page was split under us? */ if (compound_head(page) != head) { put_page(head); goto repeat; } /* Has the page moved? */ if (unlikely(page != *slot)) { put_page(head); goto repeat; } /* * must check mapping and index after taking the ref. * otherwise we can get both false positives and false * negatives, which is just confusing to the caller. */ if (page->mapping == NULL || page_to_pgoff(page) != iter.index) { put_page(page); break; } pages[ret] = page; if (++ret == nr_pages) break; } rcu_read_unlock(); return ret; } EXPORT_SYMBOL(find_get_pages_contig); /** * find_get_pages_range_tag - find and return pages in given range matching @tag * @mapping: the address_space to search * @index: the starting page index * @end: The final page index (inclusive) * @tag: the tag index * @nr_pages: the maximum number of pages * @pages: where the resulting pages are placed * * Like find_get_pages, except we only return pages which are tagged with * @tag. We update @index to index the next page for the traversal. */ unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index, pgoff_t end, int tag, unsigned int nr_pages, struct page **pages) { struct radix_tree_iter iter; void **slot; unsigned ret = 0; if (unlikely(!nr_pages)) return 0; rcu_read_lock(); radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, *index, tag) { struct page *head, *page; if (iter.index > end) break; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) continue; if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { slot = radix_tree_iter_retry(&iter); continue; } /* * A shadow entry of a recently evicted page. * * Those entries should never be tagged, but * this tree walk is lockless and the tags are * looked up in bulk, one radix tree node at a * time, so there is a sizable window for page * reclaim to evict a page we saw tagged. * * Skip over it. */ continue; } head = compound_head(page); if (!page_cache_get_speculative(head)) goto repeat; /* The page was split under us? */ if (compound_head(page) != head) { put_page(head); goto repeat; } /* Has the page moved? */ if (unlikely(page != *slot)) { put_page(head); goto repeat; } pages[ret] = page; if (++ret == nr_pages) { *index = pages[ret - 1]->index + 1; goto out; } } /* * We come here when we got at @end. We take care to not overflow the * index @index as it confuses some of the callers. This breaks the * iteration when there is page at index -1 but that is already broken * anyway. */ if (end == (pgoff_t)-1) *index = (pgoff_t)-1; else *index = end + 1; out: rcu_read_unlock(); return ret; } EXPORT_SYMBOL(find_get_pages_range_tag); /** * find_get_entries_tag - find and return entries that match @tag * @mapping: the address_space to search * @start: the starting page cache index * @tag: the tag index * @nr_entries: the maximum number of entries * @entries: where the resulting entries are placed * @indices: the cache indices corresponding to the entries in @entries * * Like find_get_entries, except we only return entries which are tagged with * @tag. */ unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start, int tag, unsigned int nr_entries, struct page **entries, pgoff_t *indices) { void **slot; unsigned int ret = 0; struct radix_tree_iter iter; if (!nr_entries) return 0; rcu_read_lock(); radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, start, tag) { struct page *head, *page; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) continue; if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { slot = radix_tree_iter_retry(&iter); continue; } /* * A shadow entry of a recently evicted page, a swap * entry from shmem/tmpfs or a DAX entry. Return it * without attempting to raise page count. */ goto export; } head = compound_head(page); if (!page_cache_get_speculative(head)) goto repeat; /* The page was split under us? */ if (compound_head(page) != head) { put_page(head); goto repeat; } /* Has the page moved? */ if (unlikely(page != *slot)) { put_page(head); goto repeat; } export: indices[ret] = iter.index; entries[ret] = page; if (++ret == nr_entries) break; } rcu_read_unlock(); return ret; } EXPORT_SYMBOL(find_get_entries_tag); /* * CD/DVDs are error prone. When a medium error occurs, the driver may fail * a _large_ part of the i/o request. Imagine the worst scenario: * * ---R__________________________________________B__________ * ^ reading here ^ bad block(assume 4k) * * read(R) => miss => readahead(R...B) => media error => frustrating retries * => failing the whole request => read(R) => read(R+1) => * readahead(R+1...B+1) => bang => read(R+2) => read(R+3) => * readahead(R+3...B+2) => bang => read(R+3) => read(R+4) => * readahead(R+4...B+3) => bang => read(R+4) => read(R+5) => ...... * * It is going insane. Fix it by quickly scaling down the readahead size. */ static void shrink_readahead_size_eio(struct file *filp, struct file_ra_state *ra) { ra->ra_pages /= 4; } /** * generic_file_buffered_read - generic file read routine * @iocb: the iocb to read * @iter: data destination * @written: already copied * * This is a generic file read routine, and uses the * mapping->a_ops->readpage() function for the actual low-level stuff. * * This is really ugly. But the goto's actually try to clarify some * of the logic when it comes to error handling etc. */ static ssize_t generic_file_buffered_read(struct kiocb *iocb, struct iov_iter *iter, ssize_t written) { struct file *filp = iocb->ki_filp; struct address_space *mapping = filp->f_mapping; struct inode *inode = mapping->host; struct file_ra_state *ra = &filp->f_ra; loff_t *ppos = &iocb->ki_pos; pgoff_t index; pgoff_t last_index; pgoff_t prev_index; unsigned long offset; /* offset into pagecache page */ unsigned int prev_offset; int error = 0; if (unlikely(*ppos >= inode->i_sb->s_maxbytes)) return 0; iov_iter_truncate(iter, inode->i_sb->s_maxbytes); index = *ppos >> PAGE_SHIFT; prev_index = ra->prev_pos >> PAGE_SHIFT; prev_offset = ra->prev_pos & (PAGE_SIZE-1); last_index = (*ppos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT; offset = *ppos & ~PAGE_MASK; for (;;) { struct page *page; pgoff_t end_index; loff_t isize; unsigned long nr, ret; cond_resched(); find_page: if (fatal_signal_pending(current)) { error = -EINTR; goto out; } page = find_get_page(mapping, index); if (!page) { if (iocb->ki_flags & IOCB_NOWAIT) goto would_block; page_cache_sync_readahead(mapping, ra, filp, index, last_index - index); page = find_get_page(mapping, index); if (unlikely(page == NULL)) goto no_cached_page; } if (PageReadahead(page)) { page_cache_async_readahead(mapping, ra, filp, page, index, last_index - index); } if (!PageUptodate(page)) { if (iocb->ki_flags & IOCB_NOWAIT) { put_page(page); goto would_block; } /* * See comment in do_read_cache_page on why * wait_on_page_locked is used to avoid unnecessarily * serialisations and why it's safe. */ error = wait_on_page_locked_killable(page); if (unlikely(error)) goto readpage_error; if (PageUptodate(page)) goto page_ok; if (inode->i_blkbits == PAGE_SHIFT || !mapping->a_ops->is_partially_uptodate) goto page_not_up_to_date; /* pipes can't handle partially uptodate pages */ if (unlikely(iter->type & ITER_PIPE)) goto page_not_up_to_date; if (!trylock_page(page)) goto page_not_up_to_date; /* Did it get truncated before we got the lock? */ if (!page->mapping) goto page_not_up_to_date_locked; if (!mapping->a_ops->is_partially_uptodate(page, offset, iter->count)) goto page_not_up_to_date_locked; unlock_page(page); } page_ok: /* * i_size must be checked after we know the page is Uptodate. * * Checking i_size after the check allows us to calculate * the correct value for "nr", which means the zero-filled * part of the page is not copied back to userspace (unless * another truncate extends the file - this is desired though). */ isize = i_size_read(inode); end_index = (isize - 1) >> PAGE_SHIFT; if (unlikely(!isize || index > end_index)) { put_page(page); goto out; } /* nr is the maximum number of bytes to copy from this page */ nr = PAGE_SIZE; if (index == end_index) { nr = ((isize - 1) & ~PAGE_MASK) + 1; if (nr <= offset) { put_page(page); goto out; } } nr = nr - offset; /* If users can be writing to this page using arbitrary * virtual addresses, take care about potential aliasing * before reading the page on the kernel side. */ if (mapping_writably_mapped(mapping)) flush_dcache_page(page); /* * When a sequential read accesses a page several times, * only mark it as accessed the first time. */ if (prev_index != index || offset != prev_offset) mark_page_accessed(page); prev_index = index; /* * Ok, we have the page, and it's up-to-date, so * now we can copy it to user space... */ ret = copy_page_to_iter(page, offset, nr, iter); offset += ret; index += offset >> PAGE_SHIFT; offset &= ~PAGE_MASK; prev_offset = offset; put_page(page); written += ret; if (!iov_iter_count(iter)) goto out; if (ret < nr) { error = -EFAULT; goto out; } continue; page_not_up_to_date: /* Get exclusive access to the page ... */ error = lock_page_killable(page); if (unlikely(error)) goto readpage_error; page_not_up_to_date_locked: /* Did it get truncated before we got the lock? */ if (!page->mapping) { unlock_page(page); put_page(page); continue; } /* Did somebody else fill it already? */ if (PageUptodate(page)) { unlock_page(page); goto page_ok; } readpage: /* * A previous I/O error may have been due to temporary * failures, eg. multipath errors. * PG_error will be set again if readpage fails. */ ClearPageError(page); /* Start the actual read. The read will unlock the page. */ error = mapping->a_ops->readpage(filp, page); if (unlikely(error)) { if (error == AOP_TRUNCATED_PAGE) { put_page(page); error = 0; goto find_page; } goto readpage_error; } if (!PageUptodate(page)) { error = lock_page_killable(page); if (unlikely(error)) goto readpage_error; if (!PageUptodate(page)) { if (page->mapping == NULL) { /* * invalidate_mapping_pages got it */ unlock_page(page); put_page(page); goto find_page; } unlock_page(page); shrink_readahead_size_eio(filp, ra); error = -EIO; goto readpage_error; } unlock_page(page); } goto page_ok; readpage_error: /* UHHUH! A synchronous read error occurred. Report it */ put_page(page); goto out; no_cached_page: /* * Ok, it wasn't cached, so we need to create a new * page.. */ page = page_cache_alloc(mapping); if (!page) { error = -ENOMEM; goto out; } error = add_to_page_cache_lru(page, mapping, index, mapping_gfp_constraint(mapping, GFP_KERNEL)); if (error) { put_page(page); if (error == -EEXIST) { error = 0; goto find_page; } goto out; } goto readpage; } would_block: error = -EAGAIN; out: ra->prev_pos = prev_index; ra->prev_pos <<= PAGE_SHIFT; ra->prev_pos |= prev_offset; *ppos = ((loff_t)index << PAGE_SHIFT) + offset; file_accessed(filp); return written ? written : error; } /** * generic_file_read_iter - generic filesystem read routine * @iocb: kernel I/O control block * @iter: destination for the data read * * This is the "read_iter()" routine for all filesystems * that can use the page cache directly. */ ssize_t generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) { size_t count = iov_iter_count(iter); ssize_t retval = 0; if (!count) goto out; /* skip atime */ if (iocb->ki_flags & IOCB_DIRECT) { struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; struct inode *inode = mapping->host; loff_t size; size = i_size_read(inode); if (iocb->ki_flags & IOCB_NOWAIT) { if (filemap_range_has_page(mapping, iocb->ki_pos, iocb->ki_pos + count - 1)) return -EAGAIN; } else { retval = filemap_write_and_wait_range(mapping, iocb->ki_pos, iocb->ki_pos + count - 1); if (retval < 0) goto out; } file_accessed(file); retval = mapping->a_ops->direct_IO(iocb, iter); if (retval >= 0) { iocb->ki_pos += retval; count -= retval; } iov_iter_revert(iter, count - iov_iter_count(iter)); /* * Btrfs can have a short DIO read if we encounter * compressed extents, so if there was an error, or if * we've already read everything we wanted to, or if * there was a short read because we hit EOF, go ahead * and return. Otherwise fallthrough to buffered io for * the rest of the read. Buffered reads will not work for * DAX files, so don't bother trying. */ if (retval < 0 || !count || iocb->ki_pos >= size || IS_DAX(inode)) goto out; } retval = generic_file_buffered_read(iocb, iter, retval); out: return retval; } EXPORT_SYMBOL(generic_file_read_iter); #ifdef CONFIG_MMU /** * page_cache_read - adds requested page to the page cache if not already there * @file: file to read * @offset: page index * @gfp_mask: memory allocation flags * * This adds the requested page to the page cache if it isn't already there, * and schedules an I/O to read in its contents from disk. */ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask) { struct address_space *mapping = file->f_mapping; struct page *page; int ret; do { page = __page_cache_alloc(gfp_mask); if (!page) return -ENOMEM; ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask); if (ret == 0) ret = mapping->a_ops->readpage(file, page); else if (ret == -EEXIST) ret = 0; /* losing race to add is OK */ put_page(page); } while (ret == AOP_TRUNCATED_PAGE); return ret; } #define MMAP_LOTSAMISS (100) /* * Synchronous readahead happens when we don't even find * a page in the page cache at all. */ static void do_sync_mmap_readahead(struct vm_area_struct *vma, struct file_ra_state *ra, struct file *file, pgoff_t offset) { struct address_space *mapping = file->f_mapping; /* If we don't want any read-ahead, don't bother */ if (vma->vm_flags & VM_RAND_READ) return; if (!ra->ra_pages) return; if (vma->vm_flags & VM_SEQ_READ) { page_cache_sync_readahead(mapping, ra, file, offset, ra->ra_pages); return; } /* Avoid banging the cache line if not needed */ if (ra->mmap_miss < MMAP_LOTSAMISS * 10) ra->mmap_miss++; /* * Do we miss much more than hit in this file? If so, * stop bothering with read-ahead. It will only hurt. */ if (ra->mmap_miss > MMAP_LOTSAMISS) return; /* * mmap read-around */ ra->start = max_t(long, 0, offset - ra->ra_pages / 2); ra->size = ra->ra_pages; ra->async_size = ra->ra_pages / 4; ra_submit(ra, mapping, file); } /* * Asynchronous readahead happens when we find the page and PG_readahead, * so we want to possibly extend the readahead further.. */ static void do_async_mmap_readahead(struct vm_area_struct *vma, struct file_ra_state *ra, struct file *file, struct page *page, pgoff_t offset) { struct address_space *mapping = file->f_mapping; /* If we don't want any read-ahead, don't bother */ if (vma->vm_flags & VM_RAND_READ) return; if (ra->mmap_miss > 0) ra->mmap_miss--; if (PageReadahead(page)) page_cache_async_readahead(mapping, ra, file, page, offset, ra->ra_pages); } /** * filemap_fault - read in file data for page fault handling * @vmf: struct vm_fault containing details of the fault * * filemap_fault() is invoked via the vma operations vector for a * mapped memory region to read in file data during a page fault. * * The goto's are kind of ugly, but this streamlines the normal case of having * it in the page cache, and handles the special cases reasonably without * having a lot of duplicated code. * * vma->vm_mm->mmap_sem must be held on entry. * * If our return value has VM_FAULT_RETRY set, it's because * lock_page_or_retry() returned 0. * The mmap_sem has usually been released in this case. * See __lock_page_or_retry() for the exception. * * If our return value does not have VM_FAULT_RETRY set, the mmap_sem * has not been released. * * We never return with VM_FAULT_RETRY and a bit from VM_FAULT_ERROR set. */ vm_fault_t filemap_fault(struct vm_fault *vmf) { int error; struct file *file = vmf->vma->vm_file; struct address_space *mapping = file->f_mapping; struct file_ra_state *ra = &file->f_ra; struct inode *inode = mapping->host; pgoff_t offset = vmf->pgoff; pgoff_t max_off; struct page *page; vm_fault_t ret = 0; max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); if (unlikely(offset >= max_off)) return VM_FAULT_SIGBUS; /* * Do we have something in the page cache already? */ page = find_get_page(mapping, offset); if (likely(page) && !(vmf->flags & FAULT_FLAG_TRIED)) { /* * We found the page, so try async readahead before * waiting for the lock. */ do_async_mmap_readahead(vmf->vma, ra, file, page, offset); } else if (!page) { /* No page in the page cache at all */ do_sync_mmap_readahead(vmf->vma, ra, file, offset); count_vm_event(PGMAJFAULT); count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT); ret = VM_FAULT_MAJOR; retry_find: page = find_get_page(mapping, offset); if (!page) goto no_cached_page; } if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { put_page(page); return ret | VM_FAULT_RETRY; } /* Did it get truncated? */ if (unlikely(page->mapping != mapping)) { unlock_page(page); put_page(page); goto retry_find; } VM_BUG_ON_PAGE(page->index != offset, page); /* * We have a locked page in the page cache, now we need to check * that it's up-to-date. If not, it is going to be due to an error. */ if (unlikely(!PageUptodate(page))) goto page_not_uptodate; /* * Found the page and have a reference on it. * We must recheck i_size under page lock. */ max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); if (unlikely(offset >= max_off)) { unlock_page(page); put_page(page); return VM_FAULT_SIGBUS; } vmf->page = page; return ret | VM_FAULT_LOCKED; no_cached_page: /* * We're only likely to ever get here if MADV_RANDOM is in * effect. */ error = page_cache_read(file, offset, vmf->gfp_mask); /* * The page we want has now been added to the page cache. * In the unlikely event that someone removed it in the * meantime, we'll just come back here and read it again. */ if (error >= 0) goto retry_find; /* * An error return from page_cache_read can result if the * system is low on memory, or a problem occurs while trying * to schedule I/O. */ if (error == -ENOMEM) return VM_FAULT_OOM; return VM_FAULT_SIGBUS; page_not_uptodate: /* * Umm, take care of errors if the page isn't up-to-date. * Try to re-read it _once_. We do this synchronously, * because there really aren't any performance issues here * and we need to check for errors. */ ClearPageError(page); error = mapping->a_ops->readpage(file, page); if (!error) { wait_on_page_locked(page); if (!PageUptodate(page)) error = -EIO; } put_page(page); if (!error || error == AOP_TRUNCATED_PAGE) goto retry_find; /* Things didn't work out. Return zero to tell the mm layer so. */ shrink_readahead_size_eio(file, ra); return VM_FAULT_SIGBUS; } EXPORT_SYMBOL(filemap_fault); void filemap_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff) { struct radix_tree_iter iter; void **slot; struct file *file = vmf->vma->vm_file; struct address_space *mapping = file->f_mapping; pgoff_t last_pgoff = start_pgoff; unsigned long max_idx; struct page *head, *page; rcu_read_lock(); radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start_pgoff) { if (iter.index > end_pgoff) break; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) goto next; if (radix_tree_exception(page)) { if (radix_tree_deref_retry(page)) { slot = radix_tree_iter_retry(&iter); continue; } goto next; } head = compound_head(page); if (!page_cache_get_speculative(head)) goto repeat; /* The page was split under us? */ if (compound_head(page) != head) { put_page(head); goto repeat; } /* Has the page moved? */ if (unlikely(page != *slot)) { put_page(head); goto repeat; } if (!PageUptodate(page) || PageReadahead(page) || PageHWPoison(page)) goto skip; if (!trylock_page(page)) goto skip; if (page->mapping != mapping || !PageUptodate(page)) goto unlock; max_idx = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE); if (page->index >= max_idx) goto unlock; if (file->f_ra.mmap_miss > 0) file->f_ra.mmap_miss--; vmf->address += (iter.index - last_pgoff) << PAGE_SHIFT; if (vmf->pte) vmf->pte += iter.index - last_pgoff; last_pgoff = iter.index; if (alloc_set_pte(vmf, NULL, page)) goto unlock; unlock_page(page); goto next; unlock: unlock_page(page); skip: put_page(page); next: /* Huge page is mapped? No need to proceed. */ if (pmd_trans_huge(*vmf->pmd)) break; if (iter.index == end_pgoff) break; } rcu_read_unlock(); } EXPORT_SYMBOL(filemap_map_pages); vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf) { struct page *page = vmf->page; struct inode *inode = file_inode(vmf->vma->vm_file); vm_fault_t ret = VM_FAULT_LOCKED; sb_start_pagefault(inode->i_sb); file_update_time(vmf->vma->vm_file); lock_page(page); if (page->mapping != inode->i_mapping) { unlock_page(page); ret = VM_FAULT_NOPAGE; goto out; } /* * We mark the page dirty already here so that when freeze is in * progress, we are guaranteed that writeback during freezing will * see the dirty page and writeprotect it again. */ set_page_dirty(page); wait_for_stable_page(page); out: sb_end_pagefault(inode->i_sb); return ret; } const struct vm_operations_struct generic_file_vm_ops = { .fault = filemap_fault, .map_pages = filemap_map_pages, .page_mkwrite = filemap_page_mkwrite, }; /* This is used for a general mmap of a disk file */ int generic_file_mmap(struct file * file, struct vm_area_struct * vma) { struct address_space *mapping = file->f_mapping; if (!mapping->a_ops->readpage) return -ENOEXEC; file_accessed(file); vma->vm_ops = &generic_file_vm_ops; return 0; } /* * This is for filesystems which do not implement ->writepage. */ int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma) { if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE)) return -EINVAL; return generic_file_mmap(file, vma); } #else int filemap_page_mkwrite(struct vm_fault *vmf) { return -ENOSYS; } int generic_file_mmap(struct file * file, struct vm_area_struct * vma) { return -ENOSYS; } int generic_file_readonly_mmap(struct file * file, struct vm_area_struct * vma) { return -ENOSYS; } #endif /* CONFIG_MMU */ EXPORT_SYMBOL(filemap_page_mkwrite); EXPORT_SYMBOL(generic_file_mmap); EXPORT_SYMBOL(generic_file_readonly_mmap); static struct page *wait_on_page_read(struct page *page) { if (!IS_ERR(page)) { wait_on_page_locked(page); if (!PageUptodate(page)) { put_page(page); page = ERR_PTR(-EIO); } } return page; } static struct page *do_read_cache_page(struct address_space *mapping, pgoff_t index, int (*filler)(void *, struct page *), void *data, gfp_t gfp) { struct page *page; int err; repeat: page = find_get_page(mapping, index); if (!page) { page = __page_cache_alloc(gfp); if (!page) return ERR_PTR(-ENOMEM); err = add_to_page_cache_lru(page, mapping, index, gfp); if (unlikely(err)) { put_page(page); if (err == -EEXIST) goto repeat; /* Presumably ENOMEM for radix tree node */ return ERR_PTR(err); } filler: err = filler(data, page); if (err < 0) { put_page(page); return ERR_PTR(err); } page = wait_on_page_read(page); if (IS_ERR(page)) return page; goto out; } if (PageUptodate(page)) goto out; /* * Page is not up to date and may be locked due one of the following * case a: Page is being filled and the page lock is held * case b: Read/write error clearing the page uptodate status * case c: Truncation in progress (page locked) * case d: Reclaim in progress * * Case a, the page will be up to date when the page is unlocked. * There is no need to serialise on the page lock here as the page * is pinned so the lock gives no additional protection. Even if the * the page is truncated, the data is still valid if PageUptodate as * it's a race vs truncate race. * Case b, the page will not be up to date * Case c, the page may be truncated but in itself, the data may still * be valid after IO completes as it's a read vs truncate race. The * operation must restart if the page is not uptodate on unlock but * otherwise serialising on page lock to stabilise the mapping gives * no additional guarantees to the caller as the page lock is * released before return. * Case d, similar to truncation. If reclaim holds the page lock, it * will be a race with remove_mapping that determines if the mapping * is valid on unlock but otherwise the data is valid and there is * no need to serialise with page lock. * * As the page lock gives no additional guarantee, we optimistically * wait on the page to be unlocked and check if it's up to date and * use the page if it is. Otherwise, the page lock is required to * distinguish between the different cases. The motivation is that we * avoid spurious serialisations and wakeups when multiple processes * wait on the same page for IO to complete. */ wait_on_page_locked(page); if (PageUptodate(page)) goto out; /* Distinguish between all the cases under the safety of the lock */ lock_page(page); /* Case c or d, restart the operation */ if (!page->mapping) { unlock_page(page); put_page(page); goto repeat; } /* Someone else locked and filled the page in a very small window */ if (PageUptodate(page)) { unlock_page(page); goto out; } /* * A previous I/O error may have been due to temporary * failures. * Clear page error before actual read, PG_error will be * set again if read page fails. */ ClearPageError(page); goto filler; out: mark_page_accessed(page); return page; } /** * read_cache_page - read into page cache, fill it if needed * @mapping: the page's address_space * @index: the page index * @filler: function to perform the read * @data: first arg to filler(data, page) function, often left as NULL * * Read into the page cache. If a page already exists, and PageUptodate() is * not set, try to fill the page and wait for it to become unlocked. * * If the page does not get brought uptodate, return -EIO. */ struct page *read_cache_page(struct address_space *mapping, pgoff_t index, int (*filler)(void *, struct page *), void *data) { return do_read_cache_page(mapping, index, filler, data, mapping_gfp_mask(mapping)); } EXPORT_SYMBOL(read_cache_page); /** * read_cache_page_gfp - read into page cache, using specified page allocation flags. * @mapping: the page's address_space * @index: the page index * @gfp: the page allocator flags to use if allocating * * This is the same as "read_mapping_page(mapping, index, NULL)", but with * any new page allocations done using the specified allocation flags. * * If the page does not get brought uptodate, return -EIO. */ struct page *read_cache_page_gfp(struct address_space *mapping, pgoff_t index, gfp_t gfp) { filler_t *filler = (filler_t *)mapping->a_ops->readpage; return do_read_cache_page(mapping, index, filler, NULL, gfp); } EXPORT_SYMBOL(read_cache_page_gfp); /* * Performs necessary checks before doing a write * * Can adjust writing position or amount of bytes to write. * Returns appropriate error code that caller should return or * zero in case that write should be allowed. */ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; unsigned long limit = rlimit(RLIMIT_FSIZE); loff_t pos; if (!iov_iter_count(from)) return 0; /* FIXME: this is for backwards compatibility with 2.4 */ if (iocb->ki_flags & IOCB_APPEND) iocb->ki_pos = i_size_read(inode); pos = iocb->ki_pos; if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)) return -EINVAL; if (limit != RLIM_INFINITY) { if (iocb->ki_pos >= limit) { send_sig(SIGXFSZ, current, 0); return -EFBIG; } iov_iter_truncate(from, limit - (unsigned long)pos); } /* * LFS rule */ if (unlikely(pos + iov_iter_count(from) > MAX_NON_LFS && !(file->f_flags & O_LARGEFILE))) { if (pos >= MAX_NON_LFS) return -EFBIG; iov_iter_truncate(from, MAX_NON_LFS - (unsigned long)pos); } /* * Are we about to exceed the fs block limit ? * * If we have written data it becomes a short write. If we have * exceeded without writing data we send a signal and return EFBIG. * Linus frestrict idea will clean these up nicely.. */ if (unlikely(pos >= inode->i_sb->s_maxbytes)) return -EFBIG; iov_iter_truncate(from, inode->i_sb->s_maxbytes - pos); return iov_iter_count(from); } EXPORT_SYMBOL(generic_write_checks); int pagecache_write_begin(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned flags, struct page **pagep, void **fsdata) { const struct address_space_operations *aops = mapping->a_ops; return aops->write_begin(file, mapping, pos, len, flags, pagep, fsdata); } EXPORT_SYMBOL(pagecache_write_begin); int pagecache_write_end(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, struct page *page, void *fsdata) { const struct address_space_operations *aops = mapping->a_ops; return aops->write_end(file, mapping, pos, len, copied, page, fsdata); } EXPORT_SYMBOL(pagecache_write_end); ssize_t generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; struct inode *inode = mapping->host; loff_t pos = iocb->ki_pos; ssize_t written; size_t write_len; pgoff_t end; write_len = iov_iter_count(from); end = (pos + write_len - 1) >> PAGE_SHIFT; if (iocb->ki_flags & IOCB_NOWAIT) { /* If there are pages to writeback, return */ if (filemap_range_has_page(inode->i_mapping, pos, pos + iov_iter_count(from))) return -EAGAIN; } else { written = filemap_write_and_wait_range(mapping, pos, pos + write_len - 1); if (written) goto out; } /* * After a write we want buffered reads to be sure to go to disk to get * the new data. We invalidate clean cached page from the region we're * about to write. We do this *before* the write so that we can return * without clobbering -EIOCBQUEUED from ->direct_IO(). */ written = invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end); /* * If a page can not be invalidated, return 0 to fall back * to buffered write. */ if (written) { if (written == -EBUSY) return 0; goto out; } written = mapping->a_ops->direct_IO(iocb, from); /* * Finally, try again to invalidate clean pages which might have been * cached by non-direct readahead, or faulted in by get_user_pages() * if the source of the write was an mmap'ed region of the file * we're writing. Either one is a pretty crazy thing to do, * so we don't support it 100%. If this invalidation * fails, tough, the write still worked... * * Most of the time we do not need this since dio_complete() will do * the invalidation for us. However there are some file systems that * do not end up with dio_complete() being called, so let's not break * them by removing it completely */ if (mapping->nrpages) invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end); if (written > 0) { pos += written; write_len -= written; if (pos > i_size_read(inode) && !S_ISBLK(inode->i_mode)) { i_size_write(inode, pos); mark_inode_dirty(inode); } iocb->ki_pos = pos; } iov_iter_revert(from, write_len - iov_iter_count(from)); out: return written; } EXPORT_SYMBOL(generic_file_direct_write); /* * Find or create a page at the given pagecache position. Return the locked * page. This function is specifically for buffered writes. */ struct page *grab_cache_page_write_begin(struct address_space *mapping, pgoff_t index, unsigned flags) { struct page *page; int fgp_flags = FGP_LOCK|FGP_WRITE|FGP_CREAT; if (flags & AOP_FLAG_NOFS) fgp_flags |= FGP_NOFS; page = pagecache_get_page(mapping, index, fgp_flags, mapping_gfp_mask(mapping)); if (page) wait_for_stable_page(page); return page; } EXPORT_SYMBOL(grab_cache_page_write_begin); ssize_t generic_perform_write(struct file *file, struct iov_iter *i, loff_t pos) { struct address_space *mapping = file->f_mapping; const struct address_space_operations *a_ops = mapping->a_ops; long status = 0; ssize_t written = 0; unsigned int flags = 0; do { struct page *page; unsigned long offset; /* Offset into pagecache page */ unsigned long bytes; /* Bytes to write to page */ size_t copied; /* Bytes copied from user */ void *fsdata; offset = (pos & (PAGE_SIZE - 1)); bytes = min_t(unsigned long, PAGE_SIZE - offset, iov_iter_count(i)); again: /* * Bring in the user page that we will copy from _first_. * Otherwise there's a nasty deadlock on copying from the * same page as we're writing to, without it being marked * up-to-date. * * Not only is this an optimisation, but it is also required * to check that the address is actually valid, when atomic * usercopies are used, below. */ if (unlikely(iov_iter_fault_in_readable(i, bytes))) { status = -EFAULT; break; } if (fatal_signal_pending(current)) { status = -EINTR; break; } status = a_ops->write_begin(file, mapping, pos, bytes, flags, &page, &fsdata); if (unlikely(status < 0)) break; if (mapping_writably_mapped(mapping)) flush_dcache_page(page); copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes); flush_dcache_page(page); status = a_ops->write_end(file, mapping, pos, bytes, copied, page, fsdata); if (unlikely(status < 0)) break; copied = status; cond_resched(); iov_iter_advance(i, copied); if (unlikely(copied == 0)) { /* * If we were unable to copy any data at all, we must * fall back to a single segment length write. * * If we didn't fallback here, we could livelock * because not all segments in the iov can be copied at * once without a pagefault. */ bytes = min_t(unsigned long, PAGE_SIZE - offset, iov_iter_single_seg_count(i)); goto again; } pos += copied; written += copied; balance_dirty_pages_ratelimited(mapping); } while (iov_iter_count(i)); return written ? written : status; } EXPORT_SYMBOL(generic_perform_write); /** * __generic_file_write_iter - write data to a file * @iocb: IO state structure (file, offset, etc.) * @from: iov_iter with data to write * * This function does all the work needed for actually writing data to a * file. It does all basic checks, removes SUID from the file, updates * modification times and calls proper subroutines depending on whether we * do direct IO or a standard buffered write. * * It expects i_mutex to be grabbed unless we work on a block device or similar * object which does not need locking at all. * * This function does *not* take care of syncing data in case of O_SYNC write. * A caller has to handle it. This is mainly due to the fact that we want to * avoid syncing under i_mutex. */ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct address_space * mapping = file->f_mapping; struct inode *inode = mapping->host; ssize_t written = 0; ssize_t err; ssize_t status; /* We can write back this queue in page reclaim */ current->backing_dev_info = inode_to_bdi(inode); err = file_remove_privs(file); if (err) goto out; err = file_update_time(file); if (err) goto out; if (iocb->ki_flags & IOCB_DIRECT) { loff_t pos, endbyte; written = generic_file_direct_write(iocb, from); /* * If the write stopped short of completing, fall back to * buffered writes. Some filesystems do this for writes to * holes, for example. For DAX files, a buffered write will * not succeed (even if it did, DAX does not handle dirty * page-cache pages correctly). */ if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) goto out; status = generic_perform_write(file, from, pos = iocb->ki_pos); /* * If generic_perform_write() returned a synchronous error * then we want to return the number of bytes which were * direct-written, or the error code if that was zero. Note * that this differs from normal direct-io semantics, which * will return -EFOO even if some bytes were written. */ if (unlikely(status < 0)) { err = status; goto out; } /* * We need to ensure that the page cache pages are written to * disk and invalidated to preserve the expected O_DIRECT * semantics. */ endbyte = pos + status - 1; err = filemap_write_and_wait_range(mapping, pos, endbyte); if (err == 0) { iocb->ki_pos = endbyte + 1; written += status; invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, endbyte >> PAGE_SHIFT); } else { /* * We don't know how much we wrote, so just return * the number of bytes which were direct-written */ } } else { written = generic_perform_write(file, from, iocb->ki_pos); if (likely(written > 0)) iocb->ki_pos += written; } out: current->backing_dev_info = NULL; return written ? written : err; } EXPORT_SYMBOL(__generic_file_write_iter); /** * generic_file_write_iter - write data to a file * @iocb: IO state structure * @from: iov_iter with data to write * * This is a wrapper around __generic_file_write_iter() to be used by most * filesystems. It takes care of syncing the file in case of O_SYNC file * and acquires i_mutex as needed. */ ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; ssize_t ret; inode_lock(inode); ret = generic_write_checks(iocb, from); if (ret > 0) ret = __generic_file_write_iter(iocb, from); inode_unlock(inode); if (ret > 0) ret = generic_write_sync(iocb, ret); return ret; } EXPORT_SYMBOL(generic_file_write_iter); /** * try_to_release_page() - release old fs-specific metadata on a page * * @page: the page which the kernel is trying to free * @gfp_mask: memory allocation flags (and I/O mode) * * The address_space is to try to release any data against the page * (presumably at page->private). If the release was successful, return '1'. * Otherwise return zero. * * This may also be called if PG_fscache is set on a page, indicating that the * page is known to the local caching routines. * * The @gfp_mask argument specifies whether I/O may be performed to release * this page (__GFP_IO), and whether the call may block (__GFP_RECLAIM & __GFP_FS). * */ int try_to_release_page(struct page *page, gfp_t gfp_mask) { struct address_space * const mapping = page->mapping; BUG_ON(!PageLocked(page)); if (PageWriteback(page)) return 0; if (mapping && mapping->a_ops->releasepage) return mapping->a_ops->releasepage(page, gfp_mask); return try_to_free_buffers(page); } EXPORT_SYMBOL(try_to_release_page);
71 71 1565 3536 1566 3540 3541 3537 3539 3539 3540 201 201 275 275 275 275 33 33 33 33 33 3326 3333 3330 3331 3332 3333 64 64 64 1210 1211 1211 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 /* * Tag allocation using scalable bitmaps. Uses active queue tracking to support * fairer distribution of tags between multiple submitters when a shared tag map * is used. * * Copyright (C) 2013-2014 Jens Axboe */ #include <linux/kernel.h> #include <linux/module.h> #include <linux/blk-mq.h> #include "blk.h" #include "blk-mq.h" #include "blk-mq-tag.h" bool blk_mq_has_free_tags(struct blk_mq_tags *tags) { if (!tags) return true; return sbitmap_any_bit_clear(&tags->bitmap_tags.sb); } /* * If a previously inactive queue goes active, bump the active user count. * We need to do this before try to allocate driver tag, then even if fail * to get tag when first time, the other shared-tag users could reserve * budget for it. */ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx) { if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) && !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) atomic_inc(&hctx->tags->active_queues); return true; } /* * Wakeup all potentially sleeping on tags */ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve) { sbitmap_queue_wake_all(&tags->bitmap_tags); if (include_reserve) sbitmap_queue_wake_all(&tags->breserved_tags); } /* * If a previously busy queue goes inactive, potential waiters could now * be allowed to queue. Wake them up and check. */ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx) { struct blk_mq_tags *tags = hctx->tags; if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) return; atomic_dec(&tags->active_queues); blk_mq_tag_wakeup_all(tags, false); } /* * For shared tag users, we track the number of currently active users * and attempt to provide a fair share of the tag depth for each of them. */ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt) { unsigned int depth, users; if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED)) return true; if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) return true; /* * Don't try dividing an ant */ if (bt->sb.depth == 1) return true; users = atomic_read(&hctx->tags->active_queues); if (!users) return true; /* * Allow at least some tags */ depth = max((bt->sb.depth + users - 1) / users, 4U); return atomic_read(&hctx->nr_active) < depth; } static int __blk_mq_get_tag(struct blk_mq_alloc_data *data, struct sbitmap_queue *bt) { if (!(data->flags & BLK_MQ_REQ_INTERNAL) && !hctx_may_queue(data->hctx, bt)) return -1; if (data->shallow_depth) return __sbitmap_queue_get_shallow(bt, data->shallow_depth); else return __sbitmap_queue_get(bt); } unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data) { struct blk_mq_tags *tags = blk_mq_tags_from_data(data); struct sbitmap_queue *bt; struct sbq_wait_state *ws; DEFINE_WAIT(wait); unsigned int tag_offset; bool drop_ctx; int tag; if (data->flags & BLK_MQ_REQ_RESERVED) { if (unlikely(!tags->nr_reserved_tags)) { WARN_ON_ONCE(1); return BLK_MQ_TAG_FAIL; } bt = &tags->breserved_tags; tag_offset = 0; } else { bt = &tags->bitmap_tags; tag_offset = tags->nr_reserved_tags; } tag = __blk_mq_get_tag(data, bt); if (tag != -1) goto found_tag; if (data->flags & BLK_MQ_REQ_NOWAIT) return BLK_MQ_TAG_FAIL; ws = bt_wait_ptr(bt, data->hctx); drop_ctx = data->ctx == NULL; do { struct sbitmap_queue *bt_prev; /* * We're out of tags on this hardware queue, kick any * pending IO submits before going to sleep waiting for * some to complete. */ blk_mq_run_hw_queue(data->hctx, false); /* * Retry tag allocation after running the hardware queue, * as running the queue may also have found completions. */ tag = __blk_mq_get_tag(data, bt); if (tag != -1) break; prepare_to_wait_exclusive(&ws->wait, &wait, TASK_UNINTERRUPTIBLE); tag = __blk_mq_get_tag(data, bt); if (tag != -1) break; if (data->ctx) blk_mq_put_ctx(data->ctx); bt_prev = bt; io_schedule(); data->ctx = blk_mq_get_ctx(data->q); data->hctx = blk_mq_map_queue(data->q, data->ctx->cpu); tags = blk_mq_tags_from_data(data); if (data->flags & BLK_MQ_REQ_RESERVED) bt = &tags->breserved_tags; else bt = &tags->bitmap_tags; finish_wait(&ws->wait, &wait); /* * If destination hw queue is changed, fake wake up on * previous queue for compensating the wake up miss, so * other allocations on previous queue won't be starved. */ if (bt != bt_prev) sbitmap_queue_wake_up(bt_prev); ws = bt_wait_ptr(bt, data->hctx); } while (1); if (drop_ctx && data->ctx) blk_mq_put_ctx(data->ctx); finish_wait(&ws->wait, &wait); found_tag: return tag + tag_offset; } void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag) { if (!blk_mq_tag_is_reserved(tags, tag)) { const int real_tag = tag - tags->nr_reserved_tags; BUG_ON(real_tag >= tags->nr_tags); sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu); } else { BUG_ON(tag >= tags->nr_reserved_tags); sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu); } } struct bt_iter_data { struct blk_mq_hw_ctx *hctx; busy_iter_fn *fn; void *data; bool reserved; }; static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) { struct bt_iter_data *iter_data = data; struct blk_mq_hw_ctx *hctx = iter_data->hctx; struct blk_mq_tags *tags = hctx->tags; bool reserved = iter_data->reserved; struct request *rq; if (!reserved) bitnr += tags->nr_reserved_tags; rq = tags->rqs[bitnr]; /* * We can hit rq == NULL here, because the tagging functions * test and set the bit before assining ->rqs[]. */ if (rq && rq->q == hctx->queue) iter_data->fn(hctx, rq, iter_data->data, reserved); return true; } static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt, busy_iter_fn *fn, void *data, bool reserved) { struct bt_iter_data iter_data = { .hctx = hctx, .fn = fn, .data = data, .reserved = reserved, }; sbitmap_for_each_set(&bt->sb, bt_iter, &iter_data); } struct bt_tags_iter_data { struct blk_mq_tags *tags; busy_tag_iter_fn *fn; void *data; bool reserved; }; static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) { struct bt_tags_iter_data *iter_data = data; struct blk_mq_tags *tags = iter_data->tags; bool reserved = iter_data->reserved; struct request *rq; if (!reserved) bitnr += tags->nr_reserved_tags; /* * We can hit rq == NULL here, because the tagging functions * test and set the bit before assining ->rqs[]. */ rq = tags->rqs[bitnr]; if (rq && blk_mq_request_started(rq)) iter_data->fn(rq, iter_data->data, reserved); return true; } static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt, busy_tag_iter_fn *fn, void *data, bool reserved) { struct bt_tags_iter_data iter_data = { .tags = tags, .fn = fn, .data = data, .reserved = reserved, }; if (tags->rqs) sbitmap_for_each_set(&bt->sb, bt_tags_iter, &iter_data); } static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn, void *priv) { if (tags->nr_reserved_tags) bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true); bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false); } void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset, busy_tag_iter_fn *fn, void *priv) { int i; for (i = 0; i < tagset->nr_hw_queues; i++) { if (tagset->tags && tagset->tags[i]) blk_mq_all_tag_busy_iter(tagset->tags[i], fn, priv); } } EXPORT_SYMBOL(blk_mq_tagset_busy_iter); void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn, void *priv) { struct blk_mq_hw_ctx *hctx; int i; /* * __blk_mq_update_nr_hw_queues will update the nr_hw_queues and * queue_hw_ctx after freeze the queue, so we use q_usage_counter * to avoid race with it. */ if (!percpu_ref_tryget(&q->q_usage_counter)) return; queue_for_each_hw_ctx(q, hctx, i) { struct blk_mq_tags *tags = hctx->tags; /* * If not software queues are currently mapped to this * hardware queue, there's nothing to check */ if (!blk_mq_hw_queue_mapped(hctx)) continue; if (tags->nr_reserved_tags) bt_for_each(hctx, &tags->breserved_tags, fn, priv, true); bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false); } blk_queue_exit(q); } static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth, bool round_robin, int node) { return sbitmap_queue_init_node(bt, depth, -1, round_robin, GFP_KERNEL, node); } static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags, int node, int alloc_policy) { unsigned int depth = tags->nr_tags - tags->nr_reserved_tags; bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR; if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node)) goto free_tags; if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin, node)) goto free_bitmap_tags; return tags; free_bitmap_tags: sbitmap_queue_free(&tags->bitmap_tags); free_tags: kfree(tags); return NULL; } struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, unsigned int reserved_tags, int node, int alloc_policy) { struct blk_mq_tags *tags; if (total_tags > BLK_MQ_TAG_MAX) { pr_err("blk-mq: tag depth too large\n"); return NULL; } tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node); if (!tags) return NULL; tags->nr_tags = total_tags; tags->nr_reserved_tags = reserved_tags; return blk_mq_init_bitmap_tags(tags, node, alloc_policy); } void blk_mq_free_tags(struct blk_mq_tags *tags) { sbitmap_queue_free(&tags->bitmap_tags); sbitmap_queue_free(&tags->breserved_tags); kfree(tags); } int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags **tagsptr, unsigned int tdepth, bool can_grow) { struct blk_mq_tags *tags = *tagsptr; if (tdepth <= tags->nr_reserved_tags) return -EINVAL; /* * If we are allowed to grow beyond the original size, allocate * a new set of tags before freeing the old one. */ if (tdepth > tags->nr_tags) { struct blk_mq_tag_set *set = hctx->queue->tag_set; struct blk_mq_tags *new; bool ret; if (!can_grow) return -EINVAL; /* * We need some sort of upper limit, set it high enough that * no valid use cases should require more. */ if (tdepth > 16 * BLKDEV_MAX_RQ) return -EINVAL; new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth, tags->nr_reserved_tags); if (!new) return -ENOMEM; ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth); if (ret) { blk_mq_free_rq_map(new); return -ENOMEM; } blk_mq_free_rqs(set, *tagsptr, hctx->queue_num); blk_mq_free_rq_map(*tagsptr); *tagsptr = new; } else { /* * Don't need (or can't) update reserved tags here, they * remain static and should never need resizing. */ sbitmap_queue_resize(&tags->bitmap_tags, tdepth - tags->nr_reserved_tags); } return 0; } /** * blk_mq_unique_tag() - return a tag that is unique queue-wide * @rq: request for which to compute a unique tag * * The tag field in struct request is unique per hardware queue but not over * all hardware queues. Hence this function that returns a tag with the * hardware context index in the upper bits and the per hardware queue tag in * the lower bits. * * Note: When called for a request that is queued on a non-multiqueue request * queue, the hardware context index is set to zero. */ u32 blk_mq_unique_tag(struct request *rq) { struct request_queue *q = rq->q; struct blk_mq_hw_ctx *hctx; int hwq = 0; if (q->mq_ops) { hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu); hwq = hctx->queue_num; } return (hwq << BLK_MQ_UNIQUE_TAG_BITS) | (rq->tag & BLK_MQ_UNIQUE_TAG_MASK); } EXPORT_SYMBOL(blk_mq_unique_tag);
65 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 /* request_key authorisation token key type * * Copyright (C) 2005 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public Licence * as published by the Free Software Foundation; either version * 2 of the Licence, or (at your option) any later version. */ #ifndef _KEYS_REQUEST_KEY_AUTH_TYPE_H #define _KEYS_REQUEST_KEY_AUTH_TYPE_H #include <linux/key.h> /* * Authorisation record for request_key(). */ struct request_key_auth { struct key *target_key; struct key *dest_keyring; const struct cred *cred; void *callout_info; size_t callout_len; pid_t pid; char op[8]; } __randomize_layout; static inline struct request_key_auth *get_request_key_auth(const struct key *key) { return key->payload.data[0]; } #endif /* _KEYS_REQUEST_KEY_AUTH_TYPE_H */
19 19 19 19 11 11 2 2 36 34 34 34 19 19 18 18 18 20 19 20 19 19 19 19 7 7 6 6 6 6 6 6 6 2 2 1 1 1 1 1 4 4 2 2 2 2 2 2 2 9 9 8 8 8 8 8 8 8 3 2 2 1 1 4 3 2 1 1 36 2 2 3 36 36 36 35 34 1 1 5 2 1 12 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 /* * Copyright 1997-1998 Transmeta Corporation -- All Rights Reserved * Copyright 1999-2000 Jeremy Fitzhardinge <jeremy@goop.org> * Copyright 2001-2006 Ian Kent <raven@themaw.net> * * This file is part of the Linux kernel and is made available under * the terms of the GNU General Public License, version 2, or at your * option, any later version, incorporated herein by reference. */ #include <linux/capability.h> #include <linux/compat.h> #include "autofs_i.h" static int autofs_dir_symlink(struct inode *, struct dentry *, const char *); static int autofs_dir_unlink(struct inode *, struct dentry *); static int autofs_dir_rmdir(struct inode *, struct dentry *); static int autofs_dir_mkdir(struct inode *, struct dentry *, umode_t); static long autofs_root_ioctl(struct file *, unsigned int, unsigned long); #ifdef CONFIG_COMPAT static long autofs_root_compat_ioctl(struct file *, unsigned int, unsigned long); #endif static int autofs_dir_open(struct inode *inode, struct file *file); static struct dentry *autofs_lookup(struct inode *, struct dentry *, unsigned int); static struct vfsmount *autofs_d_automount(struct path *); static int autofs_d_manage(const struct path *, bool); static void autofs_dentry_release(struct dentry *); const struct file_operations autofs_root_operations = { .open = dcache_dir_open, .release = dcache_dir_close, .read = generic_read_dir, .iterate_shared = dcache_readdir, .llseek = dcache_dir_lseek, .unlocked_ioctl = autofs_root_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = autofs_root_compat_ioctl, #endif }; const struct file_operations autofs_dir_operations = { .open = autofs_dir_open, .release = dcache_dir_close, .read = generic_read_dir, .iterate_shared = dcache_readdir, .llseek = dcache_dir_lseek, }; const struct inode_operations autofs_dir_inode_operations = { .lookup = autofs_lookup, .unlink = autofs_dir_unlink, .symlink = autofs_dir_symlink, .mkdir = autofs_dir_mkdir, .rmdir = autofs_dir_rmdir, }; const struct dentry_operations autofs_dentry_operations = { .d_automount = autofs_d_automount, .d_manage = autofs_d_manage, .d_release = autofs_dentry_release, }; static void autofs_add_active(struct dentry *dentry) { struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct autofs_info *ino; ino = autofs_dentry_ino(dentry); if (ino) { spin_lock(&sbi->lookup_lock); if (!ino->active_count) { if (list_empty(&ino->active)) list_add(&ino->active, &sbi->active_list); } ino->active_count++; spin_unlock(&sbi->lookup_lock); } } static void autofs_del_active(struct dentry *dentry) { struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct autofs_info *ino; ino = autofs_dentry_ino(dentry); if (ino) { spin_lock(&sbi->lookup_lock); ino->active_count--; if (!ino->active_count) { if (!list_empty(&ino->active)) list_del_init(&ino->active); } spin_unlock(&sbi->lookup_lock); } } static int autofs_dir_open(struct inode *inode, struct file *file) { struct dentry *dentry = file->f_path.dentry; struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); pr_debug("file=%p dentry=%p %pd\n", file, dentry, dentry); if (autofs_oz_mode(sbi)) goto out; /* * An empty directory in an autofs file system is always a * mount point. The daemon must have failed to mount this * during lookup so it doesn't exist. This can happen, for * example, if user space returns an incorrect status for a * mount request. Otherwise we're doing a readdir on the * autofs file system so just let the libfs routines handle * it. */ spin_lock(&sbi->lookup_lock); if (!path_is_mountpoint(&file->f_path) && simple_empty(dentry)) { spin_unlock(&sbi->lookup_lock); return -ENOENT; } spin_unlock(&sbi->lookup_lock); out: return dcache_dir_open(inode, file); } static void autofs_dentry_release(struct dentry *de) { struct autofs_info *ino = autofs_dentry_ino(de); struct autofs_sb_info *sbi = autofs_sbi(de->d_sb); pr_debug("releasing %p\n", de); if (!ino) return; if (sbi) { spin_lock(&sbi->lookup_lock); if (!list_empty(&ino->active)) list_del(&ino->active); if (!list_empty(&ino->expiring)) list_del(&ino->expiring); spin_unlock(&sbi->lookup_lock); } autofs_free_ino(ino); } static struct dentry *autofs_lookup_active(struct dentry *dentry) { struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct dentry *parent = dentry->d_parent; const struct qstr *name = &dentry->d_name; unsigned int len = name->len; unsigned int hash = name->hash; const unsigned char *str = name->name; struct list_head *p, *head; head = &sbi->active_list; if (list_empty(head)) return NULL; spin_lock(&sbi->lookup_lock); list_for_each(p, head) { struct autofs_info *ino; struct dentry *active; const struct qstr *qstr; ino = list_entry(p, struct autofs_info, active); active = ino->dentry; spin_lock(&active->d_lock); /* Already gone? */ if ((int) d_count(active) <= 0) goto next; qstr = &active->d_name; if (active->d_name.hash != hash) goto next; if (active->d_parent != parent) goto next; if (qstr->len != len) goto next; if (memcmp(qstr->name, str, len)) goto next; if (d_unhashed(active)) { dget_dlock(active); spin_unlock(&active->d_lock); spin_unlock(&sbi->lookup_lock); return active; } next: spin_unlock(&active->d_lock); } spin_unlock(&sbi->lookup_lock); return NULL; } static struct dentry *autofs_lookup_expiring(struct dentry *dentry, bool rcu_walk) { struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct dentry *parent = dentry->d_parent; const struct qstr *name = &dentry->d_name; unsigned int len = name->len; unsigned int hash = name->hash; const unsigned char *str = name->name; struct list_head *p, *head; head = &sbi->expiring_list; if (list_empty(head)) return NULL; spin_lock(&sbi->lookup_lock); list_for_each(p, head) { struct autofs_info *ino; struct dentry *expiring; const struct qstr *qstr; if (rcu_walk) { spin_unlock(&sbi->lookup_lock); return ERR_PTR(-ECHILD); } ino = list_entry(p, struct autofs_info, expiring); expiring = ino->dentry; spin_lock(&expiring->d_lock); /* We've already been dentry_iput or unlinked */ if (d_really_is_negative(expiring)) goto next; qstr = &expiring->d_name; if (expiring->d_name.hash != hash) goto next; if (expiring->d_parent != parent) goto next; if (qstr->len != len) goto next; if (memcmp(qstr->name, str, len)) goto next; if (d_unhashed(expiring)) { dget_dlock(expiring); spin_unlock(&expiring->d_lock); spin_unlock(&sbi->lookup_lock); return expiring; } next: spin_unlock(&expiring->d_lock); } spin_unlock(&sbi->lookup_lock); return NULL; } static int autofs_mount_wait(const struct path *path, bool rcu_walk) { struct autofs_sb_info *sbi = autofs_sbi(path->dentry->d_sb); struct autofs_info *ino = autofs_dentry_ino(path->dentry); int status = 0; if (ino->flags & AUTOFS_INF_PENDING) { if (rcu_walk) return -ECHILD; pr_debug("waiting for mount name=%pd\n", path->dentry); status = autofs_wait(sbi, path, NFY_MOUNT); pr_debug("mount wait done status=%d\n", status); } ino->last_used = jiffies; return status; } static int do_expire_wait(const struct path *path, bool rcu_walk) { struct dentry *dentry = path->dentry; struct dentry *expiring; expiring = autofs_lookup_expiring(dentry, rcu_walk); if (IS_ERR(expiring)) return PTR_ERR(expiring); if (!expiring) return autofs_expire_wait(path, rcu_walk); else { const struct path this = { .mnt = path->mnt, .dentry = expiring }; /* * If we are racing with expire the request might not * be quite complete, but the directory has been removed * so it must have been successful, just wait for it. */ autofs_expire_wait(&this, 0); autofs_del_expiring(expiring); dput(expiring); } return 0; } static struct dentry *autofs_mountpoint_changed(struct path *path) { struct dentry *dentry = path->dentry; struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); /* * If this is an indirect mount the dentry could have gone away * as a result of an expire and a new one created. */ if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) { struct dentry *parent = dentry->d_parent; struct autofs_info *ino; struct dentry *new; new = d_lookup(parent, &dentry->d_name); if (!new) return NULL; ino = autofs_dentry_ino(new); ino->last_used = jiffies; dput(path->dentry); path->dentry = new; } return path->dentry; } static struct vfsmount *autofs_d_automount(struct path *path) { struct dentry *dentry = path->dentry; struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); int status; pr_debug("dentry=%p %pd\n", dentry, dentry); /* The daemon never triggers a mount. */ if (autofs_oz_mode(sbi)) return NULL; /* * If an expire request is pending everyone must wait. * If the expire fails we're still mounted so continue * the follow and return. A return of -EAGAIN (which only * happens with indirect mounts) means the expire completed * and the directory was removed, so just go ahead and try * the mount. */ status = do_expire_wait(path, 0); if (status && status != -EAGAIN) return NULL; /* Callback to the daemon to perform the mount or wait */ spin_lock(&sbi->fs_lock); if (ino->flags & AUTOFS_INF_PENDING) { spin_unlock(&sbi->fs_lock); status = autofs_mount_wait(path, 0); if (status) return ERR_PTR(status); goto done; } /* * If the dentry is a symlink it's equivalent to a directory * having path_is_mountpoint() true, so there's no need to call * back to the daemon. */ if (d_really_is_positive(dentry) && d_is_symlink(dentry)) { spin_unlock(&sbi->fs_lock); goto done; } if (!path_is_mountpoint(path)) { /* * It's possible that user space hasn't removed directories * after umounting a rootless multi-mount, although it * should. For v5 path_has_submounts() is sufficient to * handle this because the leaves of the directory tree under * the mount never trigger mounts themselves (they have an * autofs trigger mount mounted on them). But v4 pseudo direct * mounts do need the leaves to trigger mounts. In this case * we have no choice but to use the list_empty() check and * require user space behave. */ if (sbi->version > 4) { if (path_has_submounts(path)) { spin_unlock(&sbi->fs_lock); goto done; } } else { if (!simple_empty(dentry)) { spin_unlock(&sbi->fs_lock); goto done; } } ino->flags |= AUTOFS_INF_PENDING; spin_unlock(&sbi->fs_lock); status = autofs_mount_wait(path, 0); spin_lock(&sbi->fs_lock); ino->flags &= ~AUTOFS_INF_PENDING; if (status) { spin_unlock(&sbi->fs_lock); return ERR_PTR(status); } } spin_unlock(&sbi->fs_lock); done: /* Mount succeeded, check if we ended up with a new dentry */ dentry = autofs_mountpoint_changed(path); if (!dentry) return ERR_PTR(-ENOENT); return NULL; } static int autofs_d_manage(const struct path *path, bool rcu_walk) { struct dentry *dentry = path->dentry; struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); int status; pr_debug("dentry=%p %pd\n", dentry, dentry); /* The daemon never waits. */ if (autofs_oz_mode(sbi)) { if (!path_is_mountpoint(path)) return -EISDIR; return 0; } /* Wait for pending expires */ if (do_expire_wait(path, rcu_walk) == -ECHILD) return -ECHILD; /* * This dentry may be under construction so wait on mount * completion. */ status = autofs_mount_wait(path, rcu_walk); if (status) return status; if (rcu_walk) { /* We don't need fs_lock in rcu_walk mode, * just testing 'AUTOFS_INFO_NO_RCU' is enough. * simple_empty() takes a spinlock, so leave it * to last. * We only return -EISDIR when certain this isn't * a mount-trap. */ struct inode *inode; if (ino->flags & AUTOFS_INF_WANT_EXPIRE) return 0; if (path_is_mountpoint(path)) return 0; inode = d_inode_rcu(dentry); if (inode && S_ISLNK(inode->i_mode)) return -EISDIR; if (list_empty(&dentry->d_subdirs)) return 0; if (!simple_empty(dentry)) return -EISDIR; return 0; } spin_lock(&sbi->fs_lock); /* * If the dentry has been selected for expire while we slept * on the lock then it might go away. We'll deal with that in * ->d_automount() and wait on a new mount if the expire * succeeds or return here if it doesn't (since there's no * mount to follow with a rootless multi-mount). */ if (!(ino->flags & AUTOFS_INF_EXPIRING)) { /* * Any needed mounting has been completed and the path * updated so check if this is a rootless multi-mount so * we can avoid needless calls ->d_automount() and avoid * an incorrect ELOOP error return. */ if ((!path_is_mountpoint(path) && !simple_empty(dentry)) || (d_really_is_positive(dentry) && d_is_symlink(dentry))) status = -EISDIR; } spin_unlock(&sbi->fs_lock); return status; } /* Lookups in the root directory */ static struct dentry *autofs_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) { struct autofs_sb_info *sbi; struct autofs_info *ino; struct dentry *active; pr_debug("name = %pd\n", dentry); /* File name too long to exist */ if (dentry->d_name.len > NAME_MAX) return ERR_PTR(-ENAMETOOLONG); sbi = autofs_sbi(dir->i_sb); pr_debug("pid = %u, pgrp = %u, catatonic = %d, oz_mode = %d\n", current->pid, task_pgrp_nr(current), sbi->catatonic, autofs_oz_mode(sbi)); active = autofs_lookup_active(dentry); if (active) return active; else { /* * A dentry that is not within the root can never trigger a * mount operation, unless the directory already exists, so we * can return fail immediately. The daemon however does need * to create directories within the file system. */ if (!autofs_oz_mode(sbi) && !IS_ROOT(dentry->d_parent)) return ERR_PTR(-ENOENT); /* Mark entries in the root as mount triggers */ if (IS_ROOT(dentry->d_parent) && autofs_type_indirect(sbi->type)) __managed_dentry_set_managed(dentry); ino = autofs_new_ino(sbi); if (!ino) return ERR_PTR(-ENOMEM); dentry->d_fsdata = ino; ino->dentry = dentry; autofs_add_active(dentry); } return NULL; } static int autofs_dir_symlink(struct inode *dir, struct dentry *dentry, const char *symname) { struct autofs_sb_info *sbi = autofs_sbi(dir->i_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); struct autofs_info *p_ino; struct inode *inode; size_t size = strlen(symname); char *cp; pr_debug("%s <- %pd\n", symname, dentry); if (!autofs_oz_mode(sbi)) return -EACCES; /* autofs_oz_mode() needs to allow path walks when the * autofs mount is catatonic but the state of an autofs * file system needs to be preserved over restarts. */ if (sbi->catatonic) return -EACCES; BUG_ON(!ino); autofs_clean_ino(ino); autofs_del_active(dentry); cp = kmalloc(size + 1, GFP_KERNEL); if (!cp) return -ENOMEM; strcpy(cp, symname); inode = autofs_get_inode(dir->i_sb, S_IFLNK | 0555); if (!inode) { kfree(cp); return -ENOMEM; } inode->i_private = cp; inode->i_size = size; d_add(dentry, inode); dget(dentry); atomic_inc(&ino->count); p_ino = autofs_dentry_ino(dentry->d_parent); if (p_ino && !IS_ROOT(dentry)) atomic_inc(&p_ino->count); dir->i_mtime = current_time(dir); return 0; } /* * NOTE! * * Normal filesystems would do a "d_delete()" to tell the VFS dcache * that the file no longer exists. However, doing that means that the * VFS layer can turn the dentry into a negative dentry. We don't want * this, because the unlink is probably the result of an expire. * We simply d_drop it and add it to a expiring list in the super block, * which allows the dentry lookup to check for an incomplete expire. * * If a process is blocked on the dentry waiting for the expire to finish, * it will invalidate the dentry and try to mount with a new one. * * Also see autofs_dir_rmdir().. */ static int autofs_dir_unlink(struct inode *dir, struct dentry *dentry) { struct autofs_sb_info *sbi = autofs_sbi(dir->i_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); struct autofs_info *p_ino; if (!autofs_oz_mode(sbi)) return -EACCES; /* autofs_oz_mode() needs to allow path walks when the * autofs mount is catatonic but the state of an autofs * file system needs to be preserved over restarts. */ if (sbi->catatonic) return -EACCES; if (atomic_dec_and_test(&ino->count)) { p_ino = autofs_dentry_ino(dentry->d_parent); if (p_ino && !IS_ROOT(dentry)) atomic_dec(&p_ino->count); } dput(ino->dentry); d_inode(dentry)->i_size = 0; clear_nlink(d_inode(dentry)); dir->i_mtime = current_time(dir); spin_lock(&sbi->lookup_lock); __autofs_add_expiring(dentry); d_drop(dentry); spin_unlock(&sbi->lookup_lock); return 0; } /* * Version 4 of autofs provides a pseudo direct mount implementation * that relies on directories at the leaves of a directory tree under * an indirect mount to trigger mounts. To allow for this we need to * set the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags on the leaves * of the directory tree. There is no need to clear the automount flag * following a mount or restore it after an expire because these mounts * are always covered. However, it is necessary to ensure that these * flags are clear on non-empty directories to avoid unnecessary calls * during path walks. */ static void autofs_set_leaf_automount_flags(struct dentry *dentry) { struct dentry *parent; /* root and dentrys in the root are already handled */ if (IS_ROOT(dentry->d_parent)) return; managed_dentry_set_managed(dentry); parent = dentry->d_parent; /* only consider parents below dentrys in the root */ if (IS_ROOT(parent->d_parent)) return; managed_dentry_clear_managed(parent); } static void autofs_clear_leaf_automount_flags(struct dentry *dentry) { struct list_head *d_child; struct dentry *parent; /* flags for dentrys in the root are handled elsewhere */ if (IS_ROOT(dentry->d_parent)) return; managed_dentry_clear_managed(dentry); parent = dentry->d_parent; /* only consider parents below dentrys in the root */ if (IS_ROOT(parent->d_parent)) return; d_child = &dentry->d_child; /* Set parent managed if it's becoming empty */ if (d_child->next == &parent->d_subdirs && d_child->prev == &parent->d_subdirs) managed_dentry_set_managed(parent); } static int autofs_dir_rmdir(struct inode *dir, struct dentry *dentry) { struct autofs_sb_info *sbi = autofs_sbi(dir->i_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); struct autofs_info *p_ino; pr_debug("dentry %p, removing %pd\n", dentry, dentry); if (!autofs_oz_mode(sbi)) return -EACCES; /* autofs_oz_mode() needs to allow path walks when the * autofs mount is catatonic but the state of an autofs * file system needs to be preserved over restarts. */ if (sbi->catatonic) return -EACCES; spin_lock(&sbi->lookup_lock); if (!simple_empty(dentry)) { spin_unlock(&sbi->lookup_lock); return -ENOTEMPTY; } __autofs_add_expiring(dentry); d_drop(dentry); spin_unlock(&sbi->lookup_lock); if (sbi->version < 5) autofs_clear_leaf_automount_flags(dentry); if (atomic_dec_and_test(&ino->count)) { p_ino = autofs_dentry_ino(dentry->d_parent); if (p_ino && dentry->d_parent != dentry) atomic_dec(&p_ino->count); } dput(ino->dentry); d_inode(dentry)->i_size = 0; clear_nlink(d_inode(dentry)); if (dir->i_nlink) drop_nlink(dir); return 0; } static int autofs_dir_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) { struct autofs_sb_info *sbi = autofs_sbi(dir->i_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); struct autofs_info *p_ino; struct inode *inode; if (!autofs_oz_mode(sbi)) return -EACCES; /* autofs_oz_mode() needs to allow path walks when the * autofs mount is catatonic but the state of an autofs * file system needs to be preserved over restarts. */ if (sbi->catatonic) return -EACCES; pr_debug("dentry %p, creating %pd\n", dentry, dentry); BUG_ON(!ino); autofs_clean_ino(ino); autofs_del_active(dentry); inode = autofs_get_inode(dir->i_sb, S_IFDIR | mode); if (!inode) return -ENOMEM; d_add(dentry, inode); if (sbi->version < 5) autofs_set_leaf_automount_flags(dentry); dget(dentry); atomic_inc(&ino->count); p_ino = autofs_dentry_ino(dentry->d_parent); if (p_ino && !IS_ROOT(dentry)) atomic_inc(&p_ino->count); inc_nlink(dir); dir->i_mtime = current_time(dir); return 0; } /* Get/set timeout ioctl() operation */ #ifdef CONFIG_COMPAT static inline int autofs_compat_get_set_timeout(struct autofs_sb_info *sbi, compat_ulong_t __user *p) { unsigned long ntimeout; int rv; rv = get_user(ntimeout, p); if (rv) goto error; rv = put_user(sbi->exp_timeout/HZ, p); if (rv) goto error; if (ntimeout > UINT_MAX/HZ) sbi->exp_timeout = 0; else sbi->exp_timeout = ntimeout * HZ; return 0; error: return rv; } #endif static inline int autofs_get_set_timeout(struct autofs_sb_info *sbi, unsigned long __user *p) { unsigned long ntimeout; int rv; rv = get_user(ntimeout, p); if (rv) goto error; rv = put_user(sbi->exp_timeout/HZ, p); if (rv) goto error; if (ntimeout > ULONG_MAX/HZ) sbi->exp_timeout = 0; else sbi->exp_timeout = ntimeout * HZ; return 0; error: return rv; } /* Return protocol version */ static inline int autofs_get_protover(struct autofs_sb_info *sbi, int __user *p) { return put_user(sbi->version, p); } /* Return protocol sub version */ static inline int autofs_get_protosubver(struct autofs_sb_info *sbi, int __user *p) { return put_user(sbi->sub_version, p); } /* * Tells the daemon whether it can umount the autofs mount. */ static inline int autofs_ask_umount(struct vfsmount *mnt, int __user *p) { int status = 0; if (may_umount(mnt)) status = 1; pr_debug("may umount %d\n", status); status = put_user(status, p); return status; } /* Identify autofs_dentries - this is so we can tell if there's * an extra dentry refcount or not. We only hold a refcount on the * dentry if its non-negative (ie, d_inode != NULL) */ int is_autofs_dentry(struct dentry *dentry) { return dentry && d_really_is_positive(dentry) && dentry->d_op == &autofs_dentry_operations && dentry->d_fsdata != NULL; } /* * ioctl()'s on the root directory is the chief method for the daemon to * generate kernel reactions */ static int autofs_root_ioctl_unlocked(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg) { struct autofs_sb_info *sbi = autofs_sbi(inode->i_sb); void __user *p = (void __user *)arg; pr_debug("cmd = 0x%08x, arg = 0x%08lx, sbi = %p, pgrp = %u\n", cmd, arg, sbi, task_pgrp_nr(current)); if (_IOC_TYPE(cmd) != _IOC_TYPE(AUTOFS_IOC_FIRST) || _IOC_NR(cmd) - _IOC_NR(AUTOFS_IOC_FIRST) >= AUTOFS_IOC_COUNT) return -ENOTTY; if (!autofs_oz_mode(sbi) && !capable(CAP_SYS_ADMIN)) return -EPERM; switch (cmd) { case AUTOFS_IOC_READY: /* Wait queue: go ahead and retry */ return autofs_wait_release(sbi, (autofs_wqt_t) arg, 0); case AUTOFS_IOC_FAIL: /* Wait queue: fail with ENOENT */ return autofs_wait_release(sbi, (autofs_wqt_t) arg, -ENOENT); case AUTOFS_IOC_CATATONIC: /* Enter catatonic mode (daemon shutdown) */ autofs_catatonic_mode(sbi); return 0; case AUTOFS_IOC_PROTOVER: /* Get protocol version */ return autofs_get_protover(sbi, p); case AUTOFS_IOC_PROTOSUBVER: /* Get protocol sub version */ return autofs_get_protosubver(sbi, p); case AUTOFS_IOC_SETTIMEOUT: return autofs_get_set_timeout(sbi, p); #ifdef CONFIG_COMPAT case AUTOFS_IOC_SETTIMEOUT32: return autofs_compat_get_set_timeout(sbi, p); #endif case AUTOFS_IOC_ASKUMOUNT: return autofs_ask_umount(filp->f_path.mnt, p); /* return a single thing to expire */ case AUTOFS_IOC_EXPIRE: return autofs_expire_run(inode->i_sb, filp->f_path.mnt, sbi, p); /* same as above, but can send multiple expires through pipe */ case AUTOFS_IOC_EXPIRE_MULTI: return autofs_expire_multi(inode->i_sb, filp->f_path.mnt, sbi, p); default: return -EINVAL; } } static long autofs_root_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { struct inode *inode = file_inode(filp); return autofs_root_ioctl_unlocked(inode, filp, cmd, arg); } #ifdef CONFIG_COMPAT static long autofs_root_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { struct inode *inode = file_inode(filp); int ret; if (cmd == AUTOFS_IOC_READY || cmd == AUTOFS_IOC_FAIL) ret = autofs_root_ioctl_unlocked(inode, filp, cmd, arg); else ret = autofs_root_ioctl_unlocked(inode, filp, cmd, (unsigned long) compat_ptr(arg)); return ret; } #endif
1 7 7 7 7 20 18 18 18 1 1 20 20 2 20 1 1 1 1 1 1 5 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 2 1 1 2 1 1 1 11 10 10 8 2 7 7 1 5 2 2 3 5 5 5 3 2 4 4 2 1 1 5 3 5 5 20 20 20 20 20 20 9 9 20 2 4 9 9 9 9 9 9 1 1 1 4 9 9 9 9 9 9 9 9 9 9 9 9 9 4 4 4 1 1 1 1 1 1 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 4727 4728 4729 4730 4731 4732 4733 4734 4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797 4798 4799 4800 4801 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 4819 4820 4821 4822 4823 4824 4825 4826 4827 4828 4829 4830 4831 4832 4833 4834 4835 4836 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 4847 4848 4849 4850 4851 4852 4853 4854 4855 4856 4857 4858 4859 4860 4861 4862 4863 4864 4865 4866 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919 4920 4921 4922 4923 4924 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 4935 4936 4937 4938 4939 4940 4941 4942 4943 4944 4945 4946 4947 4948 4949 4950 4951 4952 4953 4954 4955 4956 4957 4958 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 4975 4976 4977 4978 4979 4980 4981 4982 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010 5011 5012 5013 5014 5015 5016 5017 5018 5019 5020 5021 5022 5023 5024 5025 5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 5112 5113 5114 5115 5116 5117 5118 5119 5120 5121 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5303 5304 5305 5306 5307 5308 5309 5310 5311 5312 5313 5314 5315 5316 5317 5318 5319 5320 5321 5322 5323 5324 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378 5379 5380 5381 5382 5383 5384 5385 5386 5387 5388 5389 5390 5391 5392 5393 5394 5395 5396 5397 5398 5399 5400 5401 5402 5403 5404 5405 5406 5407 5408 5409 5410 5411 5412 5413 5414 5415 5416 5417 5418 5419 5420 5421 5422 5423 5424 5425 5426 5427 5428 5429 5430 5431 5432 5433 5434 5435 5436 5437 5438 5439 5440 5441 5442 5443 5444 5445 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 5457 5458 5459 5460 5461 5462 5463 5464 5465 5466 5467 5468 5469 5470 5471 5472 5473 5474 5475 5476 5477 5478 5479 5480 5481 5482 5483 5484 5485 5486 5487 5488 5489 5490 5491 5492 5493 5494 5495 5496 5497 5498 5499 5500 5501 5502 5503 5504 5505 5506 5507 5508 5509 5510 5511 5512 5513 5514 5515 5516 5517 5518 5519 5520 5521 5522 5523 5524 5525 5526 5527 5528 5529 5530 5531 5532 5533 5534 5535 5536 5537 5538 5539 5540 5541 5542 5543 5544 5545 5546 5547 5548 5549 5550 5551 5552 5553 5554 5555 5556 5557 5558 5559 5560 5561 5562 5563 5564 5565 5566 5567 5568 5569 5570 5571 5572 5573 5574 5575 5576 5577 5578 5579 5580 5581 5582 5583 5584 5585 5586 5587 5588 5589 5590 5591 5592 5593 5594 5595 5596 5597 5598 5599 5600 5601 5602 5603 5604 5605 5606 5607 5608 5609 5610 5611 5612 5613 5614 5615 5616 5617 5618 5619 5620 5621 5622 5623 5624 5625 5626 5627 5628 5629 5630 5631 5632 5633 5634 5635 5636 5637 5638 5639 5640 5641 5642 5643 5644 5645 5646 5647 5648 5649 5650 5651 5652 5653 5654 5655 5656 5657 5658 5659 5660 5661 5662 5663 5664 5665 5666 5667 5668 5669 5670 5671 5672 5673 5674 5675 5676 5677 5678 5679 5680 5681 5682 5683 5684 5685 5686 5687 5688 5689 5690 5691 5692 5693 5694 5695 5696 5697 5698 5699 5700 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 5712 5713 5714 5715 5716 5717 5718 5719 5720 5721 5722 5723 5724 5725 5726 5727 5728 5729 5730 5731 5732 5733 5734 5735 5736 5737 5738 5739 5740 5741 5742 5743 5744 5745 5746 5747 5748 5749 5750 5751 5752 5753 5754 5755 5756 5757 5758 5759 5760 5761 5762 5763 5764 5765 5766 5767 5768 5769 5770 5771 5772 5773 5774 5775 5776 5777 5778 5779 5780 5781 5782 5783 5784 5785 5786 5787 5788 5789 5790 5791 5792 5793 5794 5795 5796 5797 5798 5799 5800 5801 5802 5803 5804 5805 5806 5807 5808 5809 5810 5811 5812 5813 5814 5815 5816 5817 5818 5819 5820 5821 5822 5823 5824 5825 5826 5827 5828 5829 5830 5831 5832 5833 5834 5835 5836 5837 5838 5839 5840 5841 5842 5843 5844 5845 5846 5847 5848 5849 5850 5851 5852 5853 5854 5855 5856 5857 5858 5859 5860 5861 5862 5863 5864 5865 5866 5867 5868 5869 5870 5871 5872 5873 5874 5875 5876 5877 5878 5879 5880 5881 5882 5883 5884 5885 5886 5887 5888 5889 5890 5891 5892 5893 5894 5895 5896 5897 5898 5899 5900 5901 5902 5903 5904 5905 5906 5907 5908 5909 5910 5911 5912 5913 5914 5915 5916 5917 5918 5919 5920 5921 5922 5923 5924 5925 5926 5927 5928 5929 5930 5931 5932 5933 5934 5935 5936 5937 5938 5939 5940 5941 5942 5943 5944 5945 5946 5947 5948 5949 5950 5951 5952 5953 5954 5955 5956 5957 5958 5959 5960 5961 5962 5963 5964 5965 5966 5967 5968 5969 5970 5971 5972 5973 5974 5975 5976 5977 5978 5979 5980 5981 5982 5983 5984 5985 5986 5987 5988 5989 5990 5991 5992 5993 5994 5995 5996 5997 5998 5999 6000 6001 6002 6003 6004 6005 6006 6007 6008 6009 6010 6011 6012 6013 6014 6015 6016 6017 6018 6019 6020 6021 6022 6023 6024 6025 6026 6027 6028 6029 6030 6031 6032 6033 6034 6035 6036 6037 6038 6039 6040 6041 6042 6043 6044 6045 6046 6047 6048 6049 6050 6051 6052 6053 6054 6055 6056 6057 6058 6059 6060 6061 6062 6063 6064 6065 6066 6067 6068 6069 6070 6071 6072 6073 6074 6075 6076 6077 6078 6079 6080 6081 6082 6083 6084 6085 6086 6087 6088 6089 6090 6091 6092 6093 6094 6095 6096 6097 6098 6099 6100 6101 6102 6103 6104 6105 6106 6107 6108 6109 6110 6111 6112 6113 6114 6115 6116 6117 6118 6119 6120 6121 6122 6123 6124 6125 6126 6127 6128 6129 6130 6131 6132 6133 6134 6135 6136 6137 6138 6139 6140 6141 6142 6143 6144 6145 6146 6147 6148 6149 6150 6151 6152 6153 6154 6155 6156 6157 6158 6159 6160 6161 6162 6163 6164 6165 6166 6167 6168 6169 6170 6171 6172 6173 6174 6175 6176 6177 6178 6179 6180 6181 6182 6183 6184 6185 6186 6187 6188 6189 6190 6191 6192 6193 6194 6195 6196 6197 6198 6199 6200 6201 6202 6203 6204 6205 6206 6207 6208 6209 6210 6211 6212 6213 6214 6215 6216 6217 6218 6219 6220 6221 6222 6223 6224 6225 6226 6227 6228 6229 6230 6231 6232 6233 6234 6235 6236 6237 6238 6239 6240 6241 6242 6243 6244 6245 6246 6247 6248 6249 6250 6251 6252 6253 6254 6255 6256 6257 6258 6259 6260 6261 6262 6263 6264 6265 6266 6267 6268 6269 6270 6271 6272 6273 6274 6275 6276 6277 6278 6279 6280 6281 6282 6283 6284 6285 6286 6287 6288 6289 6290 6291 6292 6293 6294 6295 6296 6297 6298 6299 6300 6301 6302 6303 6304 6305 6306 6307 6308 6309 6310 6311 6312 6313 6314 6315 6316 6317 6318 6319 6320 6321 6322 6323 6324 6325 6326 6327 6328 6329 6330 6331 6332 6333 6334 6335 6336 6337 6338 6339 6340 6341 6342 6343 6344 6345 6346 6347 6348 6349 6350 6351 6352 6353 6354 6355 6356 6357 6358 6359 6360 6361 6362 6363 6364 6365 6366 6367 6368 6369 6370 6371 6372 6373 6374 6375 6376 6377 6378 6379 6380 6381 6382 6383 6384 6385 6386 6387 6388 6389 6390 6391 6392 6393 6394 6395 6396 6397 6398 6399 6400 6401 6402 6403 6404 6405 6406 6407 6408 6409 6410 6411 6412 6413 6414 6415 6416 6417 6418 6419 6420 6421 6422 6423 6424 6425 6426 6427 6428 6429 6430 6431 6432 6433 6434 6435 6436 6437 6438 6439 6440 6441 6442 6443 6444 6445 6446 6447 6448 6449 6450 6451 6452 6453 6454 6455 6456 6457 6458 6459 6460 6461 6462 6463 6464 6465 6466 6467 6468 6469 6470 6471 6472 6473 6474 6475 6476 6477 6478 6479 6480 6481 6482 6483 6484 6485 6486 6487 6488 6489 6490 6491 6492 6493 6494 6495 6496 6497 6498 6499 6500 6501 6502 6503 6504 6505 6506 6507 6508 6509 6510 6511 6512 6513 6514 6515 6516 6517 6518 6519 6520 6521 6522 6523 6524 6525 6526 6527 6528 6529 6530 6531 6532 6533 6534 6535 6536 6537 6538 6539 6540 6541 6542 6543 6544 6545 6546 6547 6548 6549 6550 6551 6552 6553 6554 6555 6556 6557 6558 6559 6560 6561 6562 6563 6564 6565 6566 6567 6568 6569 6570 6571 6572 6573 6574 6575 6576 6577 6578 6579 6580 6581 6582 6583 6584 6585 6586 6587 6588 6589 6590 6591 6592 6593 6594 6595 6596 6597 6598 6599 6600 6601 6602 6603 6604 6605 6606 6607 6608 6609 6610 6611 6612 6613 6614 6615 6616 6617 6618 6619 6620 6621 6622 6623 6624 6625 6626 6627 6628 6629 6630 6631 6632 6633 6634 6635 6636 6637 6638 6639 6640 6641 6642 6643 6644 6645 6646 6647 6648 6649 6650 6651 6652 6653 6654 6655 6656 6657 6658 6659 6660 6661 6662 6663 6664 6665 6666 6667 6668 6669 6670 6671 6672 6673 6674 6675 6676 6677 6678 6679 6680 6681 6682 6683 6684 6685 6686 6687 6688 6689 6690 6691 6692 6693 6694 6695 6696 6697 6698 6699 6700 6701 6702 6703 6704 6705 6706 6707 6708 6709 6710 6711 6712 6713 6714 6715 6716 6717 6718 6719 6720 6721 6722 6723 6724 6725 6726 6727 6728 6729 6730 6731 6732 6733 6734 6735 6736 6737 6738 6739 6740 6741 6742 6743 6744 6745 6746 6747 6748 6749 6750 6751 6752 6753 6754 6755 6756 6757 6758 6759 6760 6761 6762 6763 6764 6765 6766 6767 6768 6769 6770 6771 6772 6773 6774 6775 6776 6777 6778 6779 6780 6781 6782 6783 6784 6785 6786 6787 6788 6789 6790 6791 6792 6793 6794 6795 6796 6797 6798 6799 6800 6801 6802 6803 6804 6805 6806 6807 6808 6809 6810 6811 6812 6813 6814 6815 6816 6817 6818 6819 6820 6821 6822 6823 6824 6825 6826 6827 6828 6829 6830 6831 6832 6833 6834 6835 6836 6837 6838 6839 6840 6841 6842 6843 6844 6845 6846 6847 6848 6849 6850 6851 6852 6853 6854 6855 6856 6857 6858 6859 6860 6861 6862 6863 6864 6865 6866 6867 6868 6869 6870 6871 6872 6873 6874 6875 6876 6877 6878 6879 6880 6881 6882 6883 6884 6885 6886 6887 6888 6889 6890 6891 6892 6893 6894 6895 6896 6897 6898 6899 6900 6901 6902 6903 6904 6905 6906 6907 6908 6909 6910 6911 6912 6913 6914 6915 6916 6917 6918 6919 6920 6921 6922 6923 6924 6925 6926 6927 6928 6929 6930 6931 6932 6933 6934 6935 6936 6937 6938 6939 6940 6941 6942 6943 6944 6945 6946 6947 6948 6949 6950 6951 6952 6953 6954 6955 6956 6957 6958 6959 6960 6961 6962 6963 6964 6965 6966 6967 6968 6969 6970 6971 6972 6973 6974 6975 6976 6977 6978 6979 6980 6981 6982 6983 6984 6985 6986 6987 6988 6989 6990 6991 6992 6993 6994 6995 6996 6997 6998 6999 7000 7001 7002 7003 7004 7005 7006 7007 7008 7009 7010 7011 7012 7013 7014 7015 7016 7017 7018 7019 7020 7021 7022 7023 7024 7025 7026 7027 7028 7029 7030 7031 7032 7033 7034 7035 7036 7037 7038 7039 7040 7041 7042 7043 7044 7045 7046 7047 7048 7049 7050 7051 7052 7053 7054 7055 7056 7057 7058 7059 7060 7061 7062 7063 7064 7065 7066 7067 7068 7069 7070 7071 7072 7073 7074 7075 7076 7077 7078 7079 7080 7081 7082 7083 7084 7085 7086 7087 7088 7089 7090 7091 7092 7093 7094 7095 7096 7097 7098 7099 7100 7101 7102 7103 7104 7105 7106 7107 7108 7109 7110 7111 7112 7113 7114 7115 7116 7117 7118 7119 7120 7121 7122 7123 7124 7125 7126 7127 7128 7129 7130 7131 7132 7133 7134 7135 7136 7137 7138 7139 7140 7141 7142 7143 7144 7145 7146 7147 7148 7149 7150 7151 7152 7153 7154 7155 7156 7157 7158 7159 7160 7161 7162 7163 7164 7165 7166 7167 7168 7169 7170 7171 7172 7173 7174 7175 7176 7177 7178 7179 7180 7181 7182 7183 7184 7185 7186 7187 7188 7189 7190 7191 7192 7193 7194 7195 7196 7197 7198 7199 7200 7201 7202 7203 7204 7205 7206 7207 7208 7209 7210 7211 7212 7213 7214 7215 7216 7217 7218 7219 7220 7221 7222 7223 7224 7225 7226 7227 7228 7229 7230 7231 7232 7233 7234 7235 7236 7237 7238 7239 7240 7241 7242 7243 7244 7245 7246 7247 7248 7249 7250 7251 7252 7253 7254 7255 7256 7257 7258 7259 7260 7261 7262 7263 7264 7265 7266 7267 7268 7269 7270 7271 7272 7273 7274 7275 7276 7277 7278 7279 7280 7281 7282 7283 7284 7285 7286 7287 7288 7289 7290 7291 7292 7293 7294 7295 7296 7297 7298 7299 7300 7301 7302 7303 7304 7305 7306 7307 7308 7309 7310 7311 7312 7313 7314 7315 7316 7317 7318 7319 7320 7321 7322 7323 7324 7325 7326 7327 7328 7329 7330 7331 7332 7333 7334 7335 7336 7337 7338 7339 7340 7341 7342 7343 7344 7345 7346 7347 7348 7349 7350 7351 7352 7353 7354 7355 7356 7357 7358 7359 7360 7361 7362 7363 7364 7365 7366 7367 7368 7369 7370 7371 7372 7373 7374 7375 7376 7377 7378 7379 7380 7381 7382 7383 7384 7385 7386 7387 7388 7389 7390 7391 7392 7393 7394 7395 7396 7397 7398 7399 7400 7401 7402 7403 7404 7405 7406 7407 7408 7409 7410 7411 7412 7413 7414 7415 7416 7417 7418 7419 7420 7421 7422 7423 7424 7425 7426 7427 7428 7429 7430 7431 7432 7433 7434 7435 7436 7437 7438 7439 7440 7441 7442 7443 7444 7445 7446 7447 7448 7449 7450 7451 7452 7453 7454 7455 7456 7457 7458 7459 7460 7461 7462 7463 7464 7465 7466 7467 7468 7469 7470 7471 7472 7473 7474 7475 7476 7477 7478 7479 7480 7481 7482 7483 7484 7485 7486 7487 7488 7489 7490 7491 7492 7493 7494 7495 7496 7497 7498 7499 7500 7501 7502 7503 7504 7505 7506 7507 7508 7509 7510 7511 7512 7513 7514 7515 7516 7517 7518 7519 7520 7521 7522 7523 7524 7525 7526 7527 7528 7529 7530 7531 7532 7533 7534 7535 7536 7537 7538 7539 7540 7541 7542 7543 7544 7545 7546 7547 7548 7549 7550 7551 7552 7553 7554 7555 7556 7557 7558 7559 7560 7561 7562 7563 7564 7565 7566 7567 7568 7569 7570 7571 7572 7573 7574 7575 7576 7577 7578 7579 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 7590 7591 7592 7593 7594 7595 7596 7597 7598 7599 7600 7601 7602 7603 7604 7605 7606 7607 7608 7609 7610 7611 7612 7613 7614 7615 7616 7617 7618 7619 7620 7621 7622 7623 7624 7625 7626 7627 7628 7629 7630 7631 7632 7633 7634 7635 7636 7637 7638 7639 7640 7641 7642 7643 7644 7645 7646 7647 7648 7649 7650 7651 7652 7653 7654 7655 7656 7657 7658 7659 7660 7661 7662 7663 7664 7665 7666 7667 7668 7669 7670 7671 7672 7673 7674 7675 7676 7677 7678 7679 7680 7681 7682 7683 7684 7685 7686 7687 7688 7689 7690 7691 7692 7693 7694 7695 7696 7697 7698 7699 7700 7701 7702 7703 7704 7705 7706 7707 7708 7709 7710 7711 7712 7713 7714 7715 7716 7717 7718 7719 7720 7721 7722 7723 7724 7725 7726 7727 7728 7729 7730 7731 7732 7733 7734 7735 7736 7737 7738 7739 7740 7741 7742 7743 7744 7745 7746 7747 7748 7749 7750 7751 7752 7753 7754 7755 7756 7757 7758 7759 7760 7761 7762 7763 7764 7765 7766 7767 7768 7769 7770 7771 7772 7773 7774 7775 7776 7777 7778 7779 7780 7781 7782 7783 7784 7785 7786 7787 7788 7789 7790 7791 7792 7793 7794 7795 7796 7797 7798 7799 7800 7801 7802 7803 7804 7805 7806 7807 7808 7809 7810 7811 7812 7813 7814 7815 7816 7817 7818 7819 7820 7821 7822 7823 7824 7825 7826 7827 7828 7829 7830 7831 7832 7833 7834 7835 7836 7837 7838 7839 7840 7841 7842 7843 7844 7845 7846 7847 7848 7849 7850 7851 7852 7853 7854 7855 7856 7857 7858 7859 7860 7861 7862 7863 7864 7865 7866 7867 7868 7869 7870 7871 7872 7873 7874 7875 7876 7877 7878 7879 7880 7881 7882 7883 7884 7885 7886 7887 7888 7889 7890 7891 7892 7893 7894 7895 7896 7897 7898 7899 7900 7901 7902 7903 7904 7905 7906 7907 7908 7909 7910 7911 7912 7913 7914 7915 7916 7917 7918 7919 7920 7921 7922 7923 7924 7925 7926 7927 7928 7929 7930 7931 7932 7933 7934 7935 7936 7937 7938 7939 7940 7941 7942 7943 7944 7945 7946 7947 7948 7949 7950 7951 7952 7953 7954 7955 7956 7957 7958 7959 7960 7961 7962 7963 7964 7965 7966 7967 7968 7969 7970 7971 7972 7973 7974 7975 7976 7977 7978 7979 7980 7981 7982 7983 7984 7985 7986 7987 7988 7989 7990 7991 7992 7993 7994 7995 7996 7997 7998 7999 8000 8001 8002 8003 8004 8005 8006 8007 8008 8009 8010 8011 8012 8013 8014 8015 8016 8017 8018 8019 8020 8021 8022 8023 8024 8025 8026 8027 8028 8029 8030 8031 8032 8033 8034 8035 8036 8037 8038 8039 8040 8041 8042 8043 8044 8045 8046 8047 8048 8049 8050 8051 8052 8053 8054 8055 8056 8057 8058 8059 8060 8061 8062 8063 8064 8065 8066 8067 8068 8069 8070 8071 8072 8073 8074 8075 8076 8077 8078 8079 8080 8081 8082 8083 8084 8085 8086 8087 8088 8089 8090 8091 8092 8093 8094 8095 8096 8097 8098 8099 8100 8101 8102 8103 8104 8105 8106 8107 8108 8109 8110 8111 8112 8113 8114 8115 8116 8117 8118 8119 8120 8121 8122 8123 8124 8125 8126 8127 8128 8129 8130 8131 8132 8133 8134 8135 8136 8137 8138 8139 8140 8141 8142 8143 8144 8145 8146 8147 8148 8149 8150 8151 8152 8153 8154 8155 8156 8157 8158 8159 8160 8161 8162 8163 8164 8165 8166 8167 8168 8169 8170 8171 8172 8173 8174 8175 8176 8177 8178 8179 8180 8181 8182 8183 8184 8185 8186 8187 8188 8189 8190 8191 8192 8193 8194 8195 8196 8197 8198 8199 8200 8201 8202 8203 8204 8205 8206 8207 8208 8209 8210 8211 8212 8213 8214 8215 8216 8217 8218 8219 8220 8221 8222 8223 8224 8225 8226 8227 8228 8229 8230 8231 8232 8233 8234 8235 8236 8237 8238 8239 8240 8241 8242 8243 8244 8245 8246 8247 8248 8249 8250 8251 8252 8253 8254 8255 8256 8257 8258 8259 8260 8261 8262 8263 8264 8265 8266 8267 8268 8269 8270 8271 8272 8273 8274 8275 8276 8277 8278 8279 8280 8281 8282 8283 8284 8285 8286 8287 8288 8289 8290 8291 8292 8293 8294 8295 8296 8297 8298 8299 8300 8301 8302 8303 8304 8305 8306 8307 8308 8309 8310 8311 8312 8313 8314 8315 8316 8317 8318 8319 8320 8321 8322 8323 8324 8325 8326 8327 8328 8329 8330 8331 8332 8333 8334 8335 8336 8337 8338 8339 8340 8341 8342 8343 8344 8345 8346 8347 8348 8349 8350 8351 8352 8353 8354 8355 8356 8357 8358 8359 8360 8361 8362 8363 8364 8365 8366 8367 8368 8369 8370 8371 8372 8373 8374 8375 8376 8377 8378 8379 8380 8381 8382 8383 8384 8385 8386 8387 8388 8389 8390 8391 8392 8393 8394 8395 8396 8397 8398 8399 8400 8401 8402 8403 8404 8405 8406 8407 8408 8409 8410 8411 8412 8413 8414 8415 8416 8417 8418 8419 8420 8421 8422 8423 8424 8425 8426 8427 8428 8429 8430 8431 8432 8433 8434 8435 8436 8437 8438 8439 8440 8441 8442 8443 8444 8445 8446 8447 8448 8449 8450 8451 8452 8453 8454 8455 8456 8457 8458 8459 8460 8461 8462 8463 8464 8465 8466 8467 8468 8469 8470 8471 8472 8473 8474 8475 8476 8477 8478 8479 8480 8481 8482 8483 8484 8485 8486 8487 8488 8489 8490 8491 8492 8493 8494 8495 8496 8497 8498 8499 8500 8501 8502 8503 8504 8505 8506 8507 8508 8509 8510 8511 8512 8513 8514 8515 8516 8517 8518 8519 8520 8521 8522 8523 8524 8525 8526 8527 8528 8529 8530 8531 8532 8533 8534 8535 8536 8537 8538 8539 8540 8541 8542 8543 8544 8545 8546 8547 8548 8549 8550 8551 8552 8553 8554 8555 8556 8557 8558 8559 8560 8561 8562 8563 8564 8565 8566 8567 8568 8569 8570 8571 8572 8573 8574 8575 8576 8577 8578 8579 8580 8581 8582 8583 8584 8585 8586 8587 8588 8589 8590 8591 8592 8593 8594 8595 8596 8597 8598 8599 8600 8601 8602 8603 8604 8605 8606 8607 8608 8609 8610 8611 8612 8613 8614 8615 8616 8617 8618 8619 8620 8621 8622 8623 8624 8625 8626 8627 8628 8629 8630 8631 8632 8633 8634 8635 8636 8637 8638 8639 8640 8641 8642 8643 8644 8645 8646 8647 8648 8649 8650 8651 8652 8653 8654 8655 8656 8657 8658 8659 8660 8661 8662 8663 8664 8665 8666 8667 8668 8669 8670 8671 8672 8673 8674 8675 8676 8677 8678 8679 8680 8681 8682 8683 8684 8685 8686 8687 8688 8689 8690 8691 8692 8693 8694 8695 8696 8697 8698 8699 8700 8701 8702 8703 8704 8705 8706 8707 8708 8709 8710 8711 8712 8713 8714 8715 8716 8717 8718 8719 8720 8721 8722 8723 8724 8725 8726 8727 8728 8729 8730 8731 8732 8733 8734 8735 8736 8737 8738 8739 8740 8741 8742 8743 8744 8745 8746 8747 8748 8749 8750 8751 8752 8753 8754 8755 8756 8757 8758 8759 8760 8761 8762 8763 8764 8765 8766 8767 8768 8769 8770 8771 8772 8773 8774 8775 8776 8777 8778 8779 8780 8781 8782 8783 8784 8785 8786 8787 8788 8789 8790 8791 8792 8793 8794 8795 8796 8797 8798 8799 8800 8801 8802 8803 8804 8805 8806 8807 8808 8809 8810 8811 8812 8813 8814 8815 8816 8817 8818 8819 8820 8821 8822 8823 8824 8825 8826 8827 8828 8829 8830 8831 8832 8833 8834 8835 8836 8837 8838 8839 8840 8841 8842 8843 8844 8845 8846 8847 8848 8849 8850 8851 8852 8853 8854 8855 8856 8857 8858 8859 8860 8861 8862 8863 8864 8865 8866 8867 8868 8869 8870 8871 8872 8873 8874 8875 8876 8877 8878 8879 8880 8881 8882 8883 8884 8885 8886 8887 8888 8889 8890 8891 8892 8893 8894 8895 8896 8897 8898 8899 8900 8901 8902 8903 8904 8905 8906 8907 8908 8909 8910 8911 8912 8913 8914 8915 8916 8917 8918 8919 8920 8921 8922 8923 8924 8925 8926 8927 8928 8929 8930 8931 8932 8933 8934 8935 8936 8937 8938 8939 8940 8941 8942 8943 8944 8945 8946 8947 8948 8949 8950 8951 8952 8953 8954 8955 8956 8957 8958 8959 8960 8961 8962 8963 8964 8965 8966 8967 8968 8969 8970 8971 8972 8973 8974 8975 8976 8977 8978 8979 8980 8981 8982 8983 8984 8985 8986 8987 8988 8989 8990 8991 8992 8993 8994 8995 8996 8997 8998 8999 9000 9001 9002 9003 9004 9005 9006 9007 9008 9009 9010 9011 9012 9013 9014 9015 9016 9017 9018 9019 9020 9021 9022 9023 9024 9025 9026 9027 9028 9029 9030 9031 9032 9033 9034 9035 9036 9037 9038 9039 9040 9041 9042 9043 9044 9045 9046 9047 9048 9049 9050 9051 9052 9053 9054 9055 9056 9057 9058 9059 9060 9061 9062 9063 9064 9065 9066 9067 9068 9069 9070 9071 9072 9073 9074 9075 9076 9077 9078 9079 9080 9081 9082 9083 9084 9085 9086 9087 9088 9089 9090 9091 9092 9093 9094 9095 9096 9097 9098 9099 9100 9101 9102 9103 9104 9105 9106 9107 9108 9109 9110 9111 9112 9113 9114 9115 9116 9117 9118 9119 9120 9121 9122 9123 9124 9125 9126 9127 9128 9129 9130 9131 9132 9133 9134 9135 9136 9137 9138 9139 9140 9141 9142 9143 9144 9145 9146 9147 9148 9149 9150 9151 9152 9153 9154 9155 9156 9157 9158 9159 9160 9161 9162 9163 9164 9165 9166 9167 9168 9169 9170 9171 9172 9173 9174 9175 9176 9177 9178 9179 9180 9181 9182 9183 9184 9185 9186 9187 9188 9189 9190 9191 9192 9193 9194 9195 9196 9197 9198 9199 9200 9201 9202 9203 9204 9205 9206 9207 9208 9209 9210 9211 9212 9213 9214 9215 9216 9217 9218 9219 9220 9221 9222 9223 9224 9225 9226 9227 9228 9229 9230 9231 9232 9233 9234 9235 9236 9237 9238 9239 9240 9241 9242 9243 9244 9245 9246 9247 9248 9249 9250 9251 9252 9253 9254 9255 9256 9257 9258 9259 9260 9261 9262 9263 9264 9265 9266 9267 9268 9269 9270 9271 9272 9273 9274 9275 9276 9277 9278 9279 9280 9281 9282 9283 9284 9285 9286 9287 9288 9289 9290 9291 9292 9293 9294 9295 9296 9297 9298 9299 9300 9301 9302 9303 9304 9305 9306 9307 9308 9309 9310 9311 9312 9313 9314 9315 9316 9317 9318 9319 9320 9321 9322 9323 9324 9325 9326 9327 9328 9329 9330 9331 9332 9333 9334 9335 9336 9337 9338 9339 9340 9341 9342 9343 9344 9345 9346 9347 9348 9349 9350 9351 9352 9353 9354 9355 9356 9357 9358 9359 9360 9361 9362 9363 9364 9365 9366 9367 9368 9369 9370 9371 9372 9373 9374 9375 9376 9377 9378 9379 9380 9381 9382 9383 9384 9385 9386 9387 9388 9389 9390 9391 9392 9393 9394 9395 9396 9397 9398 9399 9400 9401 9402 9403 9404 9405 9406 9407 9408 9409 9410 9411 9412 9413 9414 9415 9416 9417 9418 9419 9420 9421 9422 9423 9424 9425 9426 9427 9428 9429 9430 9431 9432 9433 9434 9435 9436 9437 9438 9439 9440 9441 9442 9443 9444 9445 9446 9447 9448 9449 9450 9451 9452 9453 9454 9455 9456 9457 9458 9459 9460 9461 9462 9463 9464 9465 9466 9467 9468 9469 9470 9471 9472 9473 9474 9475 9476 9477 9478 9479 9480 9481 9482 9483 9484 9485 9486 9487 9488 9489 9490 9491 9492 9493 9494 /* md.c : Multiple Devices driver for Linux Copyright (C) 1998, 1999, 2000 Ingo Molnar completely rewritten, based on the MD driver code from Marc Zyngier Changes: - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar - RAID-6 extensions by H. Peter Anvin <hpa@zytor.com> - boot support for linear and striped mode by Harald Hoyer <HarryH@Royal.Net> - kerneld support by Boris Tobotras <boris@xtalk.msk.su> - kmod support by: Cyrus Durgin - RAID0 bugfixes: Mark Anthony Lisher <markal@iname.com> - Devfs support by Richard Gooch <rgooch@atnf.csiro.au> - lots of fixes and improvements to the RAID1/RAID5 and generic RAID code (such as request based resynchronization): Neil Brown <neilb@cse.unsw.edu.au>. - persistent bitmap code Copyright (C) 2003-2004, Paul Clements, SteelEye Technology, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. You should have received a copy of the GNU General Public License (for example /usr/src/linux/COPYING); if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. Errors, Warnings, etc. Please use: pr_crit() for error conditions that risk data loss pr_err() for error conditions that are unexpected, like an IO error or internal inconsistency pr_warn() for error conditions that could have been predicated, like adding a device to an array when it has incompatible metadata pr_info() for every interesting, very rare events, like an array starting or stopping, or resync starting or stopping pr_debug() for everything else. */ #include <linux/sched/signal.h> #include <linux/kthread.h> #include <linux/blkdev.h> #include <linux/badblocks.h> #include <linux/sysctl.h> #include <linux/seq_file.h> #include <linux/fs.h> #include <linux/poll.h> #include <linux/ctype.h> #include <linux/string.h> #include <linux/hdreg.h> #include <linux/proc_fs.h> #include <linux/random.h> #include <linux/module.h> #include <linux/reboot.h> #include <linux/file.h> #include <linux/compat.h> #include <linux/delay.h> #include <linux/raid/md_p.h> #include <linux/raid/md_u.h> #include <linux/slab.h> #include <linux/percpu-refcount.h> #include <trace/events/block.h> #include "md.h" #include "md-bitmap.h" #include "md-cluster.h" #ifndef MODULE static void autostart_arrays(int part); #endif /* pers_list is a list of registered personalities protected * by pers_lock. * pers_lock does extra service to protect accesses to * mddev->thread when the mutex cannot be held. */ static LIST_HEAD(pers_list); static DEFINE_SPINLOCK(pers_lock); static struct kobj_type md_ktype; struct md_cluster_operations *md_cluster_ops; EXPORT_SYMBOL(md_cluster_ops); struct module *md_cluster_mod; EXPORT_SYMBOL(md_cluster_mod); static DECLARE_WAIT_QUEUE_HEAD(resync_wait); static struct workqueue_struct *md_wq; static struct workqueue_struct *md_misc_wq; static int remove_and_add_spares(struct mddev *mddev, struct md_rdev *this); static void mddev_detach(struct mddev *mddev); /* * Default number of read corrections we'll attempt on an rdev * before ejecting it from the array. We divide the read error * count by 2 for every hour elapsed between read errors. */ #define MD_DEFAULT_MAX_CORRECTED_READ_ERRORS 20 /* * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit' * is 1000 KB/sec, so the extra system load does not show up that much. * Increase it if you want to have more _guaranteed_ speed. Note that * the RAID driver will use the maximum available bandwidth if the IO * subsystem is idle. There is also an 'absolute maximum' reconstruction * speed limit - in case reconstruction slows down your system despite * idle IO detection. * * you can change it via /proc/sys/dev/raid/speed_limit_min and _max. * or /sys/block/mdX/md/sync_speed_{min,max} */ static int sysctl_speed_limit_min = 1000; static int sysctl_speed_limit_max = 200000; static inline int speed_min(struct mddev *mddev) { return mddev->sync_speed_min ? mddev->sync_speed_min : sysctl_speed_limit_min; } static inline int speed_max(struct mddev *mddev) { return mddev->sync_speed_max ? mddev->sync_speed_max : sysctl_speed_limit_max; } static struct ctl_table_header *raid_table_header; static struct ctl_table raid_table[] = { { .procname = "speed_limit_min", .data = &sysctl_speed_limit_min, .maxlen = sizeof(int), .mode = S_IRUGO|S_IWUSR, .proc_handler = proc_dointvec, }, { .procname = "speed_limit_max", .data = &sysctl_speed_limit_max, .maxlen = sizeof(int), .mode = S_IRUGO|S_IWUSR, .proc_handler = proc_dointvec, }, { } }; static struct ctl_table raid_dir_table[] = { { .procname = "raid", .maxlen = 0, .mode = S_IRUGO|S_IXUGO, .child = raid_table, }, { } }; static struct ctl_table raid_root_table[] = { { .procname = "dev", .maxlen = 0, .mode = 0555, .child = raid_dir_table, }, { } }; static const struct block_device_operations md_fops; static int start_readonly; /* * The original mechanism for creating an md device is to create * a device node in /dev and to open it. This causes races with device-close. * The preferred method is to write to the "new_array" module parameter. * This can avoid races. * Setting create_on_open to false disables the original mechanism * so all the races disappear. */ static bool create_on_open = true; struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs, struct mddev *mddev) { struct bio *b; if (!mddev || !bioset_initialized(&mddev->bio_set)) return bio_alloc(gfp_mask, nr_iovecs); b = bio_alloc_bioset(gfp_mask, nr_iovecs, &mddev->bio_set); if (!b) return NULL; return b; } EXPORT_SYMBOL_GPL(bio_alloc_mddev); static struct bio *md_bio_alloc_sync(struct mddev *mddev) { if (!mddev || !bioset_initialized(&mddev->sync_set)) return bio_alloc(GFP_NOIO, 1); return bio_alloc_bioset(GFP_NOIO, 1, &mddev->sync_set); } /* * We have a system wide 'event count' that is incremented * on any 'interesting' event, and readers of /proc/mdstat * can use 'poll' or 'select' to find out when the event * count increases. * * Events are: * start array, stop array, error, add device, remove device, * start build, activate spare */ static DECLARE_WAIT_QUEUE_HEAD(md_event_waiters); static atomic_t md_event_count; void md_new_event(struct mddev *mddev) { atomic_inc(&md_event_count); wake_up(&md_event_waiters); } EXPORT_SYMBOL_GPL(md_new_event); /* * Enables to iterate over all existing md arrays * all_mddevs_lock protects this list. */ static LIST_HEAD(all_mddevs); static DEFINE_SPINLOCK(all_mddevs_lock); /* * iterates through all used mddevs in the system. * We take care to grab the all_mddevs_lock whenever navigating * the list, and to always hold a refcount when unlocked. * Any code which breaks out of this loop while own * a reference to the current mddev and must mddev_put it. */ #define for_each_mddev(_mddev,_tmp) \ \ for (({ spin_lock(&all_mddevs_lock); \ _tmp = all_mddevs.next; \ _mddev = NULL;}); \ ({ if (_tmp != &all_mddevs) \ mddev_get(list_entry(_tmp, struct mddev, all_mddevs));\ spin_unlock(&all_mddevs_lock); \ if (_mddev) mddev_put(_mddev); \ _mddev = list_entry(_tmp, struct mddev, all_mddevs); \ _tmp != &all_mddevs;}); \ ({ spin_lock(&all_mddevs_lock); \ _tmp = _tmp->next;}) \ ) /* Rather than calling directly into the personality make_request function, * IO requests come here first so that we can check if the device is * being suspended pending a reconfiguration. * We hold a refcount over the call to ->make_request. By the time that * call has finished, the bio has been linked into some internal structure * and so is visible to ->quiesce(), so we don't need the refcount any more. */ static bool is_suspended(struct mddev *mddev, struct bio *bio) { if (mddev->suspended) return true; if (bio_data_dir(bio) != WRITE) return false; if (mddev->suspend_lo >= mddev->suspend_hi) return false; if (bio->bi_iter.bi_sector >= mddev->suspend_hi) return false; if (bio_end_sector(bio) < mddev->suspend_lo) return false; return true; } void md_handle_request(struct mddev *mddev, struct bio *bio) { check_suspended: rcu_read_lock(); if (is_suspended(mddev, bio)) { DEFINE_WAIT(__wait); for (;;) { prepare_to_wait(&mddev->sb_wait, &__wait, TASK_UNINTERRUPTIBLE); if (!is_suspended(mddev, bio)) break; rcu_read_unlock(); schedule(); rcu_read_lock(); } finish_wait(&mddev->sb_wait, &__wait); } atomic_inc(&mddev->active_io); rcu_read_unlock(); if (!mddev->pers->make_request(mddev, bio)) { atomic_dec(&mddev->active_io); wake_up(&mddev->sb_wait); goto check_suspended; } if (atomic_dec_and_test(&mddev->active_io) && mddev->suspended) wake_up(&mddev->sb_wait); } EXPORT_SYMBOL(md_handle_request); static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio) { const int rw = bio_data_dir(bio); const int sgrp = op_stat_group(bio_op(bio)); struct mddev *mddev = q->queuedata; unsigned int sectors; int cpu; blk_queue_split(q, &bio); if (mddev == NULL || mddev->pers == NULL) { bio_io_error(bio); return BLK_QC_T_NONE; } if (mddev->ro == 1 && unlikely(rw == WRITE)) { if (bio_sectors(bio) != 0) bio->bi_status = BLK_STS_IOERR; bio_endio(bio); return BLK_QC_T_NONE; } /* * save the sectors now since our bio can * go away inside make_request */ sectors = bio_sectors(bio); /* bio could be mergeable after passing to underlayer */ bio->bi_opf &= ~REQ_NOMERGE; md_handle_request(mddev, bio); cpu = part_stat_lock(); part_stat_inc(cpu, &mddev->gendisk->part0, ios[sgrp]); part_stat_add(cpu, &mddev->gendisk->part0, sectors[sgrp], sectors); part_stat_unlock(); return BLK_QC_T_NONE; } /* mddev_suspend makes sure no new requests are submitted * to the device, and that any requests that have been submitted * are completely handled. * Once mddev_detach() is called and completes, the module will be * completely unused. */ void mddev_suspend(struct mddev *mddev) { WARN_ON_ONCE(mddev->thread && current == mddev->thread->tsk); lockdep_assert_held(&mddev->reconfig_mutex); if (mddev->suspended++) return; synchronize_rcu(); wake_up(&mddev->sb_wait); set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags); smp_mb__after_atomic(); wait_event(mddev->sb_wait, atomic_read(&mddev->active_io) == 0); mddev->pers->quiesce(mddev, 1); clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags); wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags)); del_timer_sync(&mddev->safemode_timer); } EXPORT_SYMBOL_GPL(mddev_suspend); void mddev_resume(struct mddev *mddev) { lockdep_assert_held(&mddev->reconfig_mutex); if (--mddev->suspended) return; wake_up(&mddev->sb_wait); mddev->pers->quiesce(mddev, 0); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */ } EXPORT_SYMBOL_GPL(mddev_resume); int mddev_congested(struct mddev *mddev, int bits) { struct md_personality *pers = mddev->pers; int ret = 0; rcu_read_lock(); if (mddev->suspended) ret = 1; else if (pers && pers->congested) ret = pers->congested(mddev, bits); rcu_read_unlock(); return ret; } EXPORT_SYMBOL_GPL(mddev_congested); static int md_congested(void *data, int bits) { struct mddev *mddev = data; return mddev_congested(mddev, bits); } /* * Generic flush handling for md */ static void md_end_flush(struct bio *bio) { struct md_rdev *rdev = bio->bi_private; struct mddev *mddev = rdev->mddev; rdev_dec_pending(rdev, mddev); if (atomic_dec_and_test(&mddev->flush_pending)) { /* The pre-request flush has finished */ queue_work(md_wq, &mddev->flush_work); } bio_put(bio); } static void md_submit_flush_data(struct work_struct *ws); static void submit_flushes(struct work_struct *ws) { struct mddev *mddev = container_of(ws, struct mddev, flush_work); struct md_rdev *rdev; mddev->start_flush = ktime_get_boottime(); INIT_WORK(&mddev->flush_work, md_submit_flush_data); atomic_set(&mddev->flush_pending, 1); rcu_read_lock(); rdev_for_each_rcu(rdev, mddev) if (rdev->raid_disk >= 0 && !test_bit(Faulty, &rdev->flags)) { /* Take two references, one is dropped * when request finishes, one after * we reclaim rcu_read_lock */ struct bio *bi; atomic_inc(&rdev->nr_pending); atomic_inc(&rdev->nr_pending); rcu_read_unlock(); bi = bio_alloc_mddev(GFP_NOIO, 0, mddev); bi->bi_end_io = md_end_flush; bi->bi_private = rdev; bio_set_dev(bi, rdev->bdev); bi->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; atomic_inc(&mddev->flush_pending); submit_bio(bi); rcu_read_lock(); rdev_dec_pending(rdev, mddev); } rcu_read_unlock(); if (atomic_dec_and_test(&mddev->flush_pending)) queue_work(md_wq, &mddev->flush_work); } static void md_submit_flush_data(struct work_struct *ws) { struct mddev *mddev = container_of(ws, struct mddev, flush_work); struct bio *bio = mddev->flush_bio; /* * must reset flush_bio before calling into md_handle_request to avoid a * deadlock, because other bios passed md_handle_request suspend check * could wait for this and below md_handle_request could wait for those * bios because of suspend check */ spin_lock_irq(&mddev->lock); mddev->last_flush = mddev->start_flush; mddev->flush_bio = NULL; spin_unlock_irq(&mddev->lock); wake_up(&mddev->sb_wait); if (bio->bi_iter.bi_size == 0) { /* an empty barrier - all done */ bio_endio(bio); } else { bio->bi_opf &= ~REQ_PREFLUSH; md_handle_request(mddev, bio); } } /* * Manages consolidation of flushes and submitting any flushes needed for * a bio with REQ_PREFLUSH. Returns true if the bio is finished or is * being finished in another context. Returns false if the flushing is * complete but still needs the I/O portion of the bio to be processed. */ bool md_flush_request(struct mddev *mddev, struct bio *bio) { ktime_t start = ktime_get_boottime(); spin_lock_irq(&mddev->lock); wait_event_lock_irq(mddev->sb_wait, !mddev->flush_bio || ktime_after(mddev->last_flush, start), mddev->lock); if (!ktime_after(mddev->last_flush, start)) { WARN_ON(mddev->flush_bio); mddev->flush_bio = bio; bio = NULL; } spin_unlock_irq(&mddev->lock); if (!bio) { INIT_WORK(&mddev->flush_work, submit_flushes); queue_work(md_wq, &mddev->flush_work); } else { /* flush was performed for some other bio while we waited. */ if (bio->bi_iter.bi_size == 0) /* an empty barrier - all done */ bio_endio(bio); else { bio->bi_opf &= ~REQ_PREFLUSH; return false; } } return true; } EXPORT_SYMBOL(md_flush_request); static inline struct mddev *mddev_get(struct mddev *mddev) { atomic_inc(&mddev->active); return mddev; } static void mddev_delayed_delete(struct work_struct *ws); static void mddev_put(struct mddev *mddev) { if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) return; if (!mddev->raid_disks && list_empty(&mddev->disks) && mddev->ctime == 0 && !mddev->hold_active) { /* Array is not configured at all, and not held active, * so destroy it */ list_del_init(&mddev->all_mddevs); /* * Call queue_work inside the spinlock so that * flush_workqueue() after mddev_find will succeed in waiting * for the work to be done. */ INIT_WORK(&mddev->del_work, mddev_delayed_delete); queue_work(md_misc_wq, &mddev->del_work); } spin_unlock(&all_mddevs_lock); } static void md_safemode_timeout(struct timer_list *t); void mddev_init(struct mddev *mddev) { kobject_init(&mddev->kobj, &md_ktype); mutex_init(&mddev->open_mutex); mutex_init(&mddev->reconfig_mutex); mutex_init(&mddev->bitmap_info.mutex); INIT_LIST_HEAD(&mddev->disks); INIT_LIST_HEAD(&mddev->all_mddevs); timer_setup(&mddev->safemode_timer, md_safemode_timeout, 0); atomic_set(&mddev->active, 1); atomic_set(&mddev->openers, 0); atomic_set(&mddev->active_io, 0); spin_lock_init(&mddev->lock); atomic_set(&mddev->flush_pending, 0); init_waitqueue_head(&mddev->sb_wait); init_waitqueue_head(&mddev->recovery_wait); mddev->reshape_position = MaxSector; mddev->reshape_backwards = 0; mddev->last_sync_action = "none"; mddev->resync_min = 0; mddev->resync_max = MaxSector; mddev->level = LEVEL_NONE; } EXPORT_SYMBOL_GPL(mddev_init); static struct mddev *mddev_find_locked(dev_t unit) { struct mddev *mddev; list_for_each_entry(mddev, &all_mddevs, all_mddevs) if (mddev->unit == unit) return mddev; return NULL; } static struct mddev *mddev_find(dev_t unit) { struct mddev *mddev; if (MAJOR(unit) != MD_MAJOR) unit &= ~((1 << MdpMinorShift) - 1); spin_lock(&all_mddevs_lock); mddev = mddev_find_locked(unit); if (mddev) mddev_get(mddev); spin_unlock(&all_mddevs_lock); return mddev; } static struct mddev *mddev_find_or_alloc(dev_t unit) { struct mddev *mddev, *new = NULL; if (unit && MAJOR(unit) != MD_MAJOR) unit &= ~((1<<MdpMinorShift)-1); retry: spin_lock(&all_mddevs_lock); if (unit) { mddev = mddev_find_locked(unit); if (mddev) { mddev_get(mddev); spin_unlock(&all_mddevs_lock); kfree(new); return mddev; } if (new) { list_add(&new->all_mddevs, &all_mddevs); spin_unlock(&all_mddevs_lock); new->hold_active = UNTIL_IOCTL; return new; } } else if (new) { /* find an unused unit number */ static int next_minor = 512; int start = next_minor; int is_free = 0; int dev = 0; while (!is_free) { dev = MKDEV(MD_MAJOR, next_minor); next_minor++; if (next_minor > MINORMASK) next_minor = 0; if (next_minor == start) { /* Oh dear, all in use. */ spin_unlock(&all_mddevs_lock); kfree(new); return NULL; } is_free = !mddev_find_locked(dev); } new->unit = dev; new->md_minor = MINOR(dev); new->hold_active = UNTIL_STOP; list_add(&new->all_mddevs, &all_mddevs); spin_unlock(&all_mddevs_lock); return new; } spin_unlock(&all_mddevs_lock); new = kzalloc(sizeof(*new), GFP_KERNEL); if (!new) return NULL; new->unit = unit; if (MAJOR(unit) == MD_MAJOR) new->md_minor = MINOR(unit); else new->md_minor = MINOR(unit) >> MdpMinorShift; mddev_init(new); goto retry; } static struct attribute_group md_redundancy_group; void mddev_unlock(struct mddev *mddev) { if (mddev->to_remove) { /* These cannot be removed under reconfig_mutex as * an access to the files will try to take reconfig_mutex * while holding the file unremovable, which leads to * a deadlock. * So hold set sysfs_active while the remove in happeing, * and anything else which might set ->to_remove or my * otherwise change the sysfs namespace will fail with * -EBUSY if sysfs_active is still set. * We set sysfs_active under reconfig_mutex and elsewhere * test it under the same mutex to ensure its correct value * is seen. */ struct attribute_group *to_remove = mddev->to_remove; mddev->to_remove = NULL; mddev->sysfs_active = 1; mutex_unlock(&mddev->reconfig_mutex); if (mddev->kobj.sd) { if (to_remove != &md_redundancy_group) sysfs_remove_group(&mddev->kobj, to_remove); if (mddev->pers == NULL || mddev->pers->sync_request == NULL) { sysfs_remove_group(&mddev->kobj, &md_redundancy_group); if (mddev->sysfs_action) sysfs_put(mddev->sysfs_action); mddev->sysfs_action = NULL; } } mddev->sysfs_active = 0; } else mutex_unlock(&mddev->reconfig_mutex); /* As we've dropped the mutex we need a spinlock to * make sure the thread doesn't disappear */ spin_lock(&pers_lock); md_wakeup_thread(mddev->thread); wake_up(&mddev->sb_wait); spin_unlock(&pers_lock); } EXPORT_SYMBOL_GPL(mddev_unlock); struct md_rdev *md_find_rdev_nr_rcu(struct mddev *mddev, int nr) { struct md_rdev *rdev; rdev_for_each_rcu(rdev, mddev) if (rdev->desc_nr == nr) return rdev; return NULL; } EXPORT_SYMBOL_GPL(md_find_rdev_nr_rcu); static struct md_rdev *find_rdev(struct mddev *mddev, dev_t dev) { struct md_rdev *rdev; rdev_for_each(rdev, mddev) if (rdev->bdev->bd_dev == dev) return rdev; return NULL; } struct md_rdev *md_find_rdev_rcu(struct mddev *mddev, dev_t dev) { struct md_rdev *rdev; rdev_for_each_rcu(rdev, mddev) if (rdev->bdev->bd_dev == dev) return rdev; return NULL; } EXPORT_SYMBOL_GPL(md_find_rdev_rcu); static struct md_personality *find_pers(int level, char *clevel) { struct md_personality *pers; list_for_each_entry(pers, &pers_list, list) { if (level != LEVEL_NONE && pers->level == level) return pers; if (strcmp(pers->name, clevel)==0) return pers; } return NULL; } /* return the offset of the super block in 512byte sectors */ static inline sector_t calc_dev_sboffset(struct md_rdev *rdev) { sector_t num_sectors = i_size_read(rdev->bdev->bd_inode) / 512; return MD_NEW_SIZE_SECTORS(num_sectors); } static int alloc_disk_sb(struct md_rdev *rdev) { rdev->sb_page = alloc_page(GFP_KERNEL); if (!rdev->sb_page) return -ENOMEM; return 0; } void md_rdev_clear(struct md_rdev *rdev) { if (rdev->sb_page) { put_page(rdev->sb_page); rdev->sb_loaded = 0; rdev->sb_page = NULL; rdev->sb_start = 0; rdev->sectors = 0; } if (rdev->bb_page) { put_page(rdev->bb_page); rdev->bb_page = NULL; } badblocks_exit(&rdev->badblocks); } EXPORT_SYMBOL_GPL(md_rdev_clear); static void super_written(struct bio *bio) { struct md_rdev *rdev = bio->bi_private; struct mddev *mddev = rdev->mddev; if (bio->bi_status) { pr_err("md: super_written gets error=%d\n", bio->bi_status); md_error(mddev, rdev); if (!test_bit(Faulty, &rdev->flags) && (bio->bi_opf & MD_FAILFAST)) { set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags); set_bit(LastDev, &rdev->flags); } } else clear_bit(LastDev, &rdev->flags); if (atomic_dec_and_test(&mddev->pending_writes)) wake_up(&mddev->sb_wait); rdev_dec_pending(rdev, mddev); bio_put(bio); } void md_super_write(struct mddev *mddev, struct md_rdev *rdev, sector_t sector, int size, struct page *page) { /* write first size bytes of page to sector of rdev * Increment mddev->pending_writes before returning * and decrement it on completion, waking up sb_wait * if zero is reached. * If an error occurred, call md_error */ struct bio *bio; int ff = 0; if (!page) return; if (test_bit(Faulty, &rdev->flags)) return; bio = md_bio_alloc_sync(mddev); atomic_inc(&rdev->nr_pending); bio_set_dev(bio, rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev); bio->bi_iter.bi_sector = sector; bio_add_page(bio, page, size, 0); bio->bi_private = rdev; bio->bi_end_io = super_written; if (test_bit(MD_FAILFAST_SUPPORTED, &mddev->flags) && test_bit(FailFast, &rdev->flags) && !test_bit(LastDev, &rdev->flags)) ff = MD_FAILFAST; bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH | REQ_FUA | ff; atomic_inc(&mddev->pending_writes); submit_bio(bio); } int md_super_wait(struct mddev *mddev) { /* wait for all superblock writes that were scheduled to complete */ wait_event(mddev->sb_wait, atomic_read(&mddev->pending_writes)==0); if (test_and_clear_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags)) return -EAGAIN; return 0; } int sync_page_io(struct md_rdev *rdev, sector_t sector, int size, struct page *page, int op, int op_flags, bool metadata_op) { struct bio *bio = md_bio_alloc_sync(rdev->mddev); int ret; if (metadata_op && rdev->meta_bdev) bio_set_dev(bio, rdev->meta_bdev); else bio_set_dev(bio, rdev->bdev); bio_set_op_attrs(bio, op, op_flags); if (metadata_op) bio->bi_iter.bi_sector = sector + rdev->sb_start; else if (rdev->mddev->reshape_position != MaxSector && (rdev->mddev->reshape_backwards == (sector >= rdev->mddev->reshape_position))) bio->bi_iter.bi_sector = sector + rdev->new_data_offset; else bio->bi_iter.bi_sector = sector + rdev->data_offset; bio_add_page(bio, page, size, 0); submit_bio_wait(bio); ret = !bio->bi_status; bio_put(bio); return ret; } EXPORT_SYMBOL_GPL(sync_page_io); static int read_disk_sb(struct md_rdev *rdev, int size) { char b[BDEVNAME_SIZE]; if (rdev->sb_loaded) return 0; if (!sync_page_io(rdev, 0, size, rdev->sb_page, REQ_OP_READ, 0, true)) goto fail; rdev->sb_loaded = 1; return 0; fail: pr_err("md: disabled device %s, could not read superblock.\n", bdevname(rdev->bdev,b)); return -EINVAL; } static int md_uuid_equal(mdp_super_t *sb1, mdp_super_t *sb2) { return sb1->set_uuid0 == sb2->set_uuid0 && sb1->set_uuid1 == sb2->set_uuid1 && sb1->set_uuid2 == sb2->set_uuid2 && sb1->set_uuid3 == sb2->set_uuid3; } static int md_sb_equal(mdp_super_t *sb1, mdp_super_t *sb2) { int ret; mdp_super_t *tmp1, *tmp2; tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL); tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL); if (!tmp1 || !tmp2) { ret = 0; goto abort; } *tmp1 = *sb1; *tmp2 = *sb2; /* * nr_disks is not constant */ tmp1->nr_disks = 0; tmp2->nr_disks = 0; ret = (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4) == 0); abort: kfree(tmp1); kfree(tmp2); return ret; } static u32 md_csum_fold(u32 csum) { csum = (csum & 0xffff) + (csum >> 16); return (csum & 0xffff) + (csum >> 16); } static unsigned int calc_sb_csum(mdp_super_t *sb) { u64 newcsum = 0; u32 *sb32 = (u32*)sb; int i; unsigned int disk_csum, csum; disk_csum = sb->sb_csum; sb->sb_csum = 0; for (i = 0; i < MD_SB_BYTES/4 ; i++) newcsum += sb32[i]; csum = (newcsum & 0xffffffff) + (newcsum>>32); #ifdef CONFIG_ALPHA /* This used to use csum_partial, which was wrong for several * reasons including that different results are returned on * different architectures. It isn't critical that we get exactly * the same return value as before (we always csum_fold before * testing, and that removes any differences). However as we * know that csum_partial always returned a 16bit value on * alphas, do a fold to maximise conformity to previous behaviour. */ sb->sb_csum = md_csum_fold(disk_csum); #else sb->sb_csum = disk_csum; #endif return csum; } /* * Handle superblock details. * We want to be able to handle multiple superblock formats * so we have a common interface to them all, and an array of * different handlers. * We rely on user-space to write the initial superblock, and support * reading and updating of superblocks. * Interface methods are: * int load_super(struct md_rdev *dev, struct md_rdev *refdev, int minor_version) * loads and validates a superblock on dev. * if refdev != NULL, compare superblocks on both devices * Return: * 0 - dev has a superblock that is compatible with refdev * 1 - dev has a superblock that is compatible and newer than refdev * so dev should be used as the refdev in future * -EINVAL superblock incompatible or invalid * -othererror e.g. -EIO * * int validate_super(struct mddev *mddev, struct md_rdev *dev) * Verify that dev is acceptable into mddev. * The first time, mddev->raid_disks will be 0, and data from * dev should be merged in. Subsequent calls check that dev * is new enough. Return 0 or -EINVAL * * void sync_super(struct mddev *mddev, struct md_rdev *dev) * Update the superblock for rdev with data in mddev * This does not write to disc. * */ struct super_type { char *name; struct module *owner; int (*load_super)(struct md_rdev *rdev, struct md_rdev *refdev, int minor_version); int (*validate_super)(struct mddev *mddev, struct md_rdev *rdev); void (*sync_super)(struct mddev *mddev, struct md_rdev *rdev); unsigned long long (*rdev_size_change)(struct md_rdev *rdev, sector_t num_sectors); int (*allow_new_offset)(struct md_rdev *rdev, unsigned long long new_offset); }; /* * Check that the given mddev has no bitmap. * * This function is called from the run method of all personalities that do not * support bitmaps. It prints an error message and returns non-zero if mddev * has a bitmap. Otherwise, it returns 0. * */ int md_check_no_bitmap(struct mddev *mddev) { if (!mddev->bitmap_info.file && !mddev->bitmap_info.offset) return 0; pr_warn("%s: bitmaps are not supported for %s\n", mdname(mddev), mddev->pers->name); return 1; } EXPORT_SYMBOL(md_check_no_bitmap); /* * load_super for 0.90.0 */ static int super_90_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_version) { char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; mdp_super_t *sb; int ret; /* * Calculate the position of the superblock (512byte sectors), * it's at the end of the disk. * * It also happens to be a multiple of 4Kb. */ rdev->sb_start = calc_dev_sboffset(rdev); ret = read_disk_sb(rdev, MD_SB_BYTES); if (ret) return ret; ret = -EINVAL; bdevname(rdev->bdev, b); sb = page_address(rdev->sb_page); if (sb->md_magic != MD_SB_MAGIC) { pr_warn("md: invalid raid superblock magic on %s\n", b); goto abort; } if (sb->major_version != 0 || sb->minor_version < 90 || sb->minor_version > 91) { pr_warn("Bad version number %d.%d on %s\n", sb->major_version, sb->minor_version, b); goto abort; } if (sb->raid_disks <= 0) goto abort; if (md_csum_fold(calc_sb_csum(sb)) != md_csum_fold(sb->sb_csum)) { pr_warn("md: invalid superblock checksum on %s\n", b); goto abort; } rdev->preferred_minor = sb->md_minor; rdev->data_offset = 0; rdev->new_data_offset = 0; rdev->sb_size = MD_SB_BYTES; rdev->badblocks.shift = -1; if (sb->level == LEVEL_MULTIPATH) rdev->desc_nr = -1; else rdev->desc_nr = sb->this_disk.number; if (!refdev) { ret = 1; } else { __u64 ev1, ev2; mdp_super_t *refsb = page_address(refdev->sb_page); if (!md_uuid_equal(refsb, sb)) { pr_warn("md: %s has different UUID to %s\n", b, bdevname(refdev->bdev,b2)); goto abort; } if (!md_sb_equal(refsb, sb)) { pr_warn("md: %s has same UUID but different superblock to %s\n", b, bdevname(refdev->bdev, b2)); goto abort; } ev1 = md_event(sb); ev2 = md_event(refsb); if (ev1 > ev2) ret = 1; else ret = 0; } rdev->sectors = rdev->sb_start; /* Limit to 4TB as metadata cannot record more than that. * (not needed for Linear and RAID0 as metadata doesn't * record this size) */ if (IS_ENABLED(CONFIG_LBDAF) && (u64)rdev->sectors >= (2ULL << 32) && sb->level >= 1) rdev->sectors = (sector_t)(2ULL << 32) - 2; if (rdev->sectors < ((sector_t)sb->size) * 2 && sb->level >= 1) /* "this cannot possibly happen" ... */ ret = -EINVAL; abort: return ret; } /* * validate_super for 0.90.0 */ static int super_90_validate(struct mddev *mddev, struct md_rdev *rdev) { mdp_disk_t *desc; mdp_super_t *sb = page_address(rdev->sb_page); __u64 ev1 = md_event(sb); rdev->raid_disk = -1; clear_bit(Faulty, &rdev->flags); clear_bit(In_sync, &rdev->flags); clear_bit(Bitmap_sync, &rdev->flags); clear_bit(WriteMostly, &rdev->flags); if (mddev->raid_disks == 0) { mddev->major_version = 0; mddev->minor_version = sb->minor_version; mddev->patch_version = sb->patch_version; mddev->external = 0; mddev->chunk_sectors = sb->chunk_size >> 9; mddev->ctime = sb->ctime; mddev->utime = sb->utime; mddev->level = sb->level; mddev->clevel[0] = 0; mddev->layout = sb->layout; mddev->raid_disks = sb->raid_disks; mddev->dev_sectors = ((sector_t)sb->size) * 2; mddev->events = ev1; mddev->bitmap_info.offset = 0; mddev->bitmap_info.space = 0; /* bitmap can use 60 K after the 4K superblocks */ mddev->bitmap_info.default_offset = MD_SB_BYTES >> 9; mddev->bitmap_info.default_space = 64*2 - (MD_SB_BYTES >> 9); mddev->reshape_backwards = 0; if (mddev->minor_version >= 91) { mddev->reshape_position = sb->reshape_position; mddev->delta_disks = sb->delta_disks; mddev->new_level = sb->new_level; mddev->new_layout = sb->new_layout; mddev->new_chunk_sectors = sb->new_chunk >> 9; if (mddev->delta_disks < 0) mddev->reshape_backwards = 1; } else { mddev->reshape_position = MaxSector; mddev->delta_disks = 0; mddev->new_level = mddev->level; mddev->new_layout = mddev->layout; mddev->new_chunk_sectors = mddev->chunk_sectors; } if (mddev->level == 0) mddev->layout = -1; if (sb->state & (1<<MD_SB_CLEAN)) mddev->recovery_cp = MaxSector; else { if (sb->events_hi == sb->cp_events_hi && sb->events_lo == sb->cp_events_lo) { mddev->recovery_cp = sb->recovery_cp; } else mddev->recovery_cp = 0; } memcpy(mddev->uuid+0, &sb->set_uuid0, 4); memcpy(mddev->uuid+4, &sb->set_uuid1, 4); memcpy(mddev->uuid+8, &sb->set_uuid2, 4); memcpy(mddev->uuid+12,&sb->set_uuid3, 4); mddev->max_disks = MD_SB_DISKS; if (sb->state & (1<<MD_SB_BITMAP_PRESENT) && mddev->bitmap_info.file == NULL) { mddev->bitmap_info.offset = mddev->bitmap_info.default_offset; mddev->bitmap_info.space = mddev->bitmap_info.default_space; } } else if (mddev->pers == NULL) { /* Insist on good event counter while assembling, except * for spares (which don't need an event count) */ ++ev1; if (sb->disks[rdev->desc_nr].state & ( (1<<MD_DISK_SYNC) | (1 << MD_DISK_ACTIVE))) if (ev1 < mddev->events) return -EINVAL; } else if (mddev->bitmap) { /* if adding to array with a bitmap, then we can accept an * older device ... but not too old. */ if (ev1 < mddev->bitmap->events_cleared) return 0; if (ev1 < mddev->events) set_bit(Bitmap_sync, &rdev->flags); } else { if (ev1 < mddev->events) /* just a hot-add of a new device, leave raid_disk at -1 */ return 0; } if (mddev->level != LEVEL_MULTIPATH) { desc = sb->disks + rdev->desc_nr; if (desc->state & (1<<MD_DISK_FAULTY)) set_bit(Faulty, &rdev->flags); else if (desc->state & (1<<MD_DISK_SYNC) /* && desc->raid_disk < mddev->raid_disks */) { set_bit(In_sync, &rdev->flags); rdev->raid_disk = desc->raid_disk; rdev->saved_raid_disk = desc->raid_disk; } else if (desc->state & (1<<MD_DISK_ACTIVE)) { /* active but not in sync implies recovery up to * reshape position. We don't know exactly where * that is, so set to zero for now */ if (mddev->minor_version >= 91) { rdev->recovery_offset = 0; rdev->raid_disk = desc->raid_disk; } } if (desc->state & (1<<MD_DISK_WRITEMOSTLY)) set_bit(WriteMostly, &rdev->flags); if (desc->state & (1<<MD_DISK_FAILFAST)) set_bit(FailFast, &rdev->flags); } else /* MULTIPATH are always insync */ set_bit(In_sync, &rdev->flags); return 0; } /* * sync_super for 0.90.0 */ static void super_90_sync(struct mddev *mddev, struct md_rdev *rdev) { mdp_super_t *sb; struct md_rdev *rdev2; int next_spare = mddev->raid_disks; /* make rdev->sb match mddev data.. * * 1/ zero out disks * 2/ Add info for each disk, keeping track of highest desc_nr (next_spare); * 3/ any empty disks < next_spare become removed * * disks[0] gets initialised to REMOVED because * we cannot be sure from other fields if it has * been initialised or not. */ int i; int active=0, working=0,failed=0,spare=0,nr_disks=0; rdev->sb_size = MD_SB_BYTES; sb = page_address(rdev->sb_page); memset(sb, 0, sizeof(*sb)); sb->md_magic = MD_SB_MAGIC; sb->major_version = mddev->major_version; sb->patch_version = mddev->patch_version; sb->gvalid_words = 0; /* ignored */ memcpy(&sb->set_uuid0, mddev->uuid+0, 4); memcpy(&sb->set_uuid1, mddev->uuid+4, 4); memcpy(&sb->set_uuid2, mddev->uuid+8, 4); memcpy(&sb->set_uuid3, mddev->uuid+12,4); sb->ctime = clamp_t(time64_t, mddev->ctime, 0, U32_MAX); sb->level = mddev->level; sb->size = mddev->dev_sectors / 2; sb->raid_disks = mddev->raid_disks; sb->md_minor = mddev->md_minor; sb->not_persistent = 0; sb->utime = clamp_t(time64_t, mddev->utime, 0, U32_MAX); sb->state = 0; sb->events_hi = (mddev->events>>32); sb->events_lo = (u32)mddev->events; if (mddev->reshape_position == MaxSector) sb->minor_version = 90; else { sb->minor_version = 91; sb->reshape_position = mddev->reshape_position; sb->new_level = mddev->new_level; sb->delta_disks = mddev->delta_disks; sb->new_layout = mddev->new_layout; sb->new_chunk = mddev->new_chunk_sectors << 9; } mddev->minor_version = sb->minor_version; if (mddev->in_sync) { sb->recovery_cp = mddev->recovery_cp; sb->cp_events_hi = (mddev->events>>32); sb->cp_events_lo = (u32)mddev->events; if (mddev->recovery_cp == MaxSector) sb->state = (1<< MD_SB_CLEAN); } else sb->recovery_cp = 0; sb->layout = mddev->layout; sb->chunk_size = mddev->chunk_sectors << 9; if (mddev->bitmap && mddev->bitmap_info.file == NULL) sb->state |= (1<<MD_SB_BITMAP_PRESENT); sb->disks[0].state = (1<<MD_DISK_REMOVED); rdev_for_each(rdev2, mddev) { mdp_disk_t *d; int desc_nr; int is_active = test_bit(In_sync, &rdev2->flags); if (rdev2->raid_disk >= 0 && sb->minor_version >= 91) /* we have nowhere to store the recovery_offset, * but if it is not below the reshape_position, * we can piggy-back on that. */ is_active = 1; if (rdev2->raid_disk < 0 || test_bit(Faulty, &rdev2->flags)) is_active = 0; if (is_active) desc_nr = rdev2->raid_disk; else desc_nr = next_spare++; rdev2->desc_nr = desc_nr; d = &sb->disks[rdev2->desc_nr]; nr_disks++; d->number = rdev2->desc_nr; d->major = MAJOR(rdev2->bdev->bd_dev); d->minor = MINOR(rdev2->bdev->bd_dev); if (is_active) d->raid_disk = rdev2->raid_disk; else d->raid_disk = rdev2->desc_nr; /* compatibility */ if (test_bit(Faulty, &rdev2->flags)) d->state = (1<<MD_DISK_FAULTY); else if (is_active) { d->state = (1<<MD_DISK_ACTIVE); if (test_bit(In_sync, &rdev2->flags)) d->state |= (1<<MD_DISK_SYNC); active++; working++; } else { d->state = 0; spare++; working++; } if (test_bit(WriteMostly, &rdev2->flags)) d->state |= (1<<MD_DISK_WRITEMOSTLY); if (test_bit(FailFast, &rdev2->flags)) d->state |= (1<<MD_DISK_FAILFAST); } /* now set the "removed" and "faulty" bits on any missing devices */ for (i=0 ; i < mddev->raid_disks ; i++) { mdp_disk_t *d = &sb->disks[i]; if (d->state == 0 && d->number == 0) { d->number = i; d->raid_disk = i; d->state = (1<<MD_DISK_REMOVED); d->state |= (1<<MD_DISK_FAULTY); failed++; } } sb->nr_disks = nr_disks; sb->active_disks = active; sb->working_disks = working; sb->failed_disks = failed; sb->spare_disks = spare; sb->this_disk = sb->disks[rdev->desc_nr]; sb->sb_csum = calc_sb_csum(sb); } /* * rdev_size_change for 0.90.0 */ static unsigned long long super_90_rdev_size_change(struct md_rdev *rdev, sector_t num_sectors) { if (num_sectors && num_sectors < rdev->mddev->dev_sectors) return 0; /* component must fit device */ if (rdev->mddev->bitmap_info.offset) return 0; /* can't move bitmap */ rdev->sb_start = calc_dev_sboffset(rdev); if (!num_sectors || num_sectors > rdev->sb_start) num_sectors = rdev->sb_start; /* Limit to 4TB as metadata cannot record more than that. * 4TB == 2^32 KB, or 2*2^32 sectors. */ if (IS_ENABLED(CONFIG_LBDAF) && (u64)num_sectors >= (2ULL << 32) && rdev->mddev->level >= 1) num_sectors = (sector_t)(2ULL << 32) - 2; do { md_super_write(rdev->mddev, rdev, rdev->sb_start, rdev->sb_size, rdev->sb_page); } while (md_super_wait(rdev->mddev) < 0); return num_sectors; } static int super_90_allow_new_offset(struct md_rdev *rdev, unsigned long long new_offset) { /* non-zero offset changes not possible with v0.90 */ return new_offset == 0; } /* * version 1 superblock */ static __le32 calc_sb_1_csum(struct mdp_superblock_1 *sb) { __le32 disk_csum; u32 csum; unsigned long long newcsum; int size = 256 + le32_to_cpu(sb->max_dev)*2; __le32 *isuper = (__le32*)sb; disk_csum = sb->sb_csum; sb->sb_csum = 0; newcsum = 0; for (; size >= 4; size -= 4) newcsum += le32_to_cpu(*isuper++); if (size == 2) newcsum += le16_to_cpu(*(__le16*) isuper); csum = (newcsum & 0xffffffff) + (newcsum >> 32); sb->sb_csum = disk_csum; return cpu_to_le32(csum); } static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_version) { struct mdp_superblock_1 *sb; int ret; sector_t sb_start; sector_t sectors; char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; int bmask; /* * Calculate the position of the superblock in 512byte sectors. * It is always aligned to a 4K boundary and * depeding on minor_version, it can be: * 0: At least 8K, but less than 12K, from end of device * 1: At start of device * 2: 4K from start of device. */ switch(minor_version) { case 0: sb_start = i_size_read(rdev->bdev->bd_inode) >> 9; sb_start -= 8*2; sb_start &= ~(sector_t)(4*2-1); break; case 1: sb_start = 0; break; case 2: sb_start = 8; break; default: return -EINVAL; } rdev->sb_start = sb_start; /* superblock is rarely larger than 1K, but it can be larger, * and it is safe to read 4k, so we do that */ ret = read_disk_sb(rdev, 4096); if (ret) return ret; sb = page_address(rdev->sb_page); if (sb->magic != cpu_to_le32(MD_SB_MAGIC) || sb->major_version != cpu_to_le32(1) || le32_to_cpu(sb->max_dev) > (4096-256)/2 || le64_to_cpu(sb->super_offset) != rdev->sb_start || (le32_to_cpu(sb->feature_map) & ~MD_FEATURE_ALL) != 0) return -EINVAL; if (calc_sb_1_csum(sb) != sb->sb_csum) { pr_warn("md: invalid superblock checksum on %s\n", bdevname(rdev->bdev,b)); return -EINVAL; } if (le64_to_cpu(sb->data_size) < 10) { pr_warn("md: data_size too small on %s\n", bdevname(rdev->bdev,b)); return -EINVAL; } if (sb->pad0 || sb->pad3[0] || memcmp(sb->pad3, sb->pad3+1, sizeof(sb->pad3) - sizeof(sb->pad3[1]))) /* Some padding is non-zero, might be a new feature */ return -EINVAL; rdev->preferred_minor = 0xffff; rdev->data_offset = le64_to_cpu(sb->data_offset); rdev->new_data_offset = rdev->data_offset; if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RESHAPE_ACTIVE) && (le32_to_cpu(sb->feature_map) & MD_FEATURE_NEW_OFFSET)) rdev->new_data_offset += (s32)le32_to_cpu(sb->new_offset); atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read)); rdev->sb_size = le32_to_cpu(sb->max_dev) * 2 + 256; bmask = queue_logical_block_size(rdev->bdev->bd_disk->queue)-1; if (rdev->sb_size & bmask) rdev->sb_size = (rdev->sb_size | bmask) + 1; if (minor_version && rdev->data_offset < sb_start + (rdev->sb_size/512)) return -EINVAL; if (minor_version && rdev->new_data_offset < sb_start + (rdev->sb_size/512)) return -EINVAL; if (sb->level == cpu_to_le32(LEVEL_MULTIPATH)) rdev->desc_nr = -1; else rdev->desc_nr = le32_to_cpu(sb->dev_number); if (!rdev->bb_page) { rdev->bb_page = alloc_page(GFP_KERNEL); if (!rdev->bb_page) return -ENOMEM; } if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BAD_BLOCKS) && rdev->badblocks.count == 0) { /* need to load the bad block list. * Currently we limit it to one page. */ s32 offset; sector_t bb_sector; u64 *bbp; int i; int sectors = le16_to_cpu(sb->bblog_size); if (sectors > (PAGE_SIZE / 512)) return -EINVAL; offset = le32_to_cpu(sb->bblog_offset); if (offset == 0) return -EINVAL; bb_sector = (long long)offset; if (!sync_page_io(rdev, bb_sector, sectors << 9, rdev->bb_page, REQ_OP_READ, 0, true)) return -EIO; bbp = (u64 *)page_address(rdev->bb_page); rdev->badblocks.shift = sb->bblog_shift; for (i = 0 ; i < (sectors << (9-3)) ; i++, bbp++) { u64 bb = le64_to_cpu(*bbp); int count = bb & (0x3ff); u64 sector = bb >> 10; sector <<= sb->bblog_shift; count <<= sb->bblog_shift; if (bb + 1 == 0) break; if (badblocks_set(&rdev->badblocks, sector, count, 1)) return -EINVAL; } } else if (sb->bblog_offset != 0) rdev->badblocks.shift = 0; if ((le32_to_cpu(sb->feature_map) & (MD_FEATURE_PPL | MD_FEATURE_MULTIPLE_PPLS))) { rdev->ppl.offset = (__s16)le16_to_cpu(sb->ppl.offset); rdev->ppl.size = le16_to_cpu(sb->ppl.size); rdev->ppl.sector = rdev->sb_start + rdev->ppl.offset; } if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RAID0_LAYOUT) && sb->level != 0) return -EINVAL; if (!refdev) { ret = 1; } else { __u64 ev1, ev2; struct mdp_superblock_1 *refsb = page_address(refdev->sb_page); if (memcmp(sb->set_uuid, refsb->set_uuid, 16) != 0 || sb->level != refsb->level || sb->layout != refsb->layout || sb->chunksize != refsb->chunksize) { pr_warn("md: %s has strangely different superblock to %s\n", bdevname(rdev->bdev,b), bdevname(refdev->bdev,b2)); return -EINVAL; } ev1 = le64_to_cpu(sb->events); ev2 = le64_to_cpu(refsb->events); if (ev1 > ev2) ret = 1; else ret = 0; } if (minor_version) { sectors = (i_size_read(rdev->bdev->bd_inode) >> 9); sectors -= rdev->data_offset; } else sectors = rdev->sb_start; if (sectors < le64_to_cpu(sb->data_size)) return -EINVAL; rdev->sectors = le64_to_cpu(sb->data_size); return ret; } static int super_1_validate(struct mddev *mddev, struct md_rdev *rdev) { struct mdp_superblock_1 *sb = page_address(rdev->sb_page); __u64 ev1 = le64_to_cpu(sb->events); rdev->raid_disk = -1; clear_bit(Faulty, &rdev->flags); clear_bit(In_sync, &rdev->flags); clear_bit(Bitmap_sync, &rdev->flags); clear_bit(WriteMostly, &rdev->flags); if (mddev->raid_disks == 0) { mddev->major_version = 1; mddev->patch_version = 0; mddev->external = 0; mddev->chunk_sectors = le32_to_cpu(sb->chunksize); mddev->ctime = le64_to_cpu(sb->ctime); mddev->utime = le64_to_cpu(sb->utime); mddev->level = le32_to_cpu(sb->level); mddev->clevel[0] = 0; mddev->layout = le32_to_cpu(sb->layout); mddev->raid_disks = le32_to_cpu(sb->raid_disks); mddev->dev_sectors = le64_to_cpu(sb->size); mddev->events = ev1; mddev->bitmap_info.offset = 0; mddev->bitmap_info.space = 0; /* Default location for bitmap is 1K after superblock * using 3K - total of 4K */ mddev->bitmap_info.default_offset = 1024 >> 9; mddev->bitmap_info.default_space = (4096-1024) >> 9; mddev->reshape_backwards = 0; mddev->recovery_cp = le64_to_cpu(sb->resync_offset); memcpy(mddev->uuid, sb->set_uuid, 16); mddev->max_disks = (4096-256)/2; if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) && mddev->bitmap_info.file == NULL) { mddev->bitmap_info.offset = (__s32)le32_to_cpu(sb->bitmap_offset); /* Metadata doesn't record how much space is available. * For 1.0, we assume we can use up to the superblock * if before, else to 4K beyond superblock. * For others, assume no change is possible. */ if (mddev->minor_version > 0) mddev->bitmap_info.space = 0; else if (mddev->bitmap_info.offset > 0) mddev->bitmap_info.space = 8 - mddev->bitmap_info.offset; else mddev->bitmap_info.space = -mddev->bitmap_info.offset; } if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RESHAPE_ACTIVE)) { mddev->reshape_position = le64_to_cpu(sb->reshape_position); mddev->delta_disks = le32_to_cpu(sb->delta_disks); mddev->new_level = le32_to_cpu(sb->new_level); mddev->new_layout = le32_to_cpu(sb->new_layout); mddev->new_chunk_sectors = le32_to_cpu(sb->new_chunk); if (mddev->delta_disks < 0 || (mddev->delta_disks == 0 && (le32_to_cpu(sb->feature_map) & MD_FEATURE_RESHAPE_BACKWARDS))) mddev->reshape_backwards = 1; } else { mddev->reshape_position = MaxSector; mddev->delta_disks = 0; mddev->new_level = mddev->level; mddev->new_layout = mddev->layout; mddev->new_chunk_sectors = mddev->chunk_sectors; } if (mddev->level == 0 && !(le32_to_cpu(sb->feature_map) & MD_FEATURE_RAID0_LAYOUT)) mddev->layout = -1; if (le32_to_cpu(sb->feature_map) & MD_FEATURE_JOURNAL) set_bit(MD_HAS_JOURNAL, &mddev->flags); if (le32_to_cpu(sb->feature_map) & (MD_FEATURE_PPL | MD_FEATURE_MULTIPLE_PPLS)) { if (le32_to_cpu(sb->feature_map) & (MD_FEATURE_BITMAP_OFFSET | MD_FEATURE_JOURNAL)) return -EINVAL; if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_PPL) && (le32_to_cpu(sb->feature_map) & MD_FEATURE_MULTIPLE_PPLS)) return -EINVAL; set_bit(MD_HAS_PPL, &mddev->flags); } } else if (mddev->pers == NULL) { /* Insist of good event counter while assembling, except for * spares (which don't need an event count) */ ++ev1; if (rdev->desc_nr >= 0 && rdev->desc_nr < le32_to_cpu(sb->max_dev) && (le16_to_cpu(sb->dev_roles[rdev->desc_nr]) < MD_DISK_ROLE_MAX || le16_to_cpu(sb->dev_roles[rdev->desc_nr]) == MD_DISK_ROLE_JOURNAL)) if (ev1 < mddev->events) return -EINVAL; } else if (mddev->bitmap) { /* If adding to array with a bitmap, then we can accept an * older device, but not too old. */ if (ev1 < mddev->bitmap->events_cleared) return 0; if (ev1 < mddev->events) set_bit(Bitmap_sync, &rdev->flags); } else { if (ev1 < mddev->events) /* just a hot-add of a new device, leave raid_disk at -1 */ return 0; } if (mddev->level != LEVEL_MULTIPATH) { int role; if (rdev->desc_nr < 0 || rdev->desc_nr >= le32_to_cpu(sb->max_dev)) { role = MD_DISK_ROLE_SPARE; rdev->desc_nr = -1; } else role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]); switch(role) { case MD_DISK_ROLE_SPARE: /* spare */ break; case MD_DISK_ROLE_FAULTY: /* faulty */ set_bit(Faulty, &rdev->flags); break; case MD_DISK_ROLE_JOURNAL: /* journal device */ if (!(le32_to_cpu(sb->feature_map) & MD_FEATURE_JOURNAL)) { /* journal device without journal feature */ pr_warn("md: journal device provided without journal feature, ignoring the device\n"); return -EINVAL; } set_bit(Journal, &rdev->flags); rdev->journal_tail = le64_to_cpu(sb->journal_tail); rdev->raid_disk = 0; break; default: rdev->saved_raid_disk = role; if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RECOVERY_OFFSET)) { rdev->recovery_offset = le64_to_cpu(sb->recovery_offset); if (!(le32_to_cpu(sb->feature_map) & MD_FEATURE_RECOVERY_BITMAP)) rdev->saved_raid_disk = -1; } else { /* * If the array is FROZEN, then the device can't * be in_sync with rest of array. */ if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) set_bit(In_sync, &rdev->flags); } rdev->raid_disk = role; break; } if (sb->devflags & WriteMostly1) set_bit(WriteMostly, &rdev->flags); if (sb->devflags & FailFast1) set_bit(FailFast, &rdev->flags); if (le32_to_cpu(sb->feature_map) & MD_FEATURE_REPLACEMENT) set_bit(Replacement, &rdev->flags); } else /* MULTIPATH are always insync */ set_bit(In_sync, &rdev->flags); return 0; } static void super_1_sync(struct mddev *mddev, struct md_rdev *rdev) { struct mdp_superblock_1 *sb; struct md_rdev *rdev2; int max_dev, i; /* make rdev->sb match mddev and rdev data. */ sb = page_address(rdev->sb_page); sb->feature_map = 0; sb->pad0 = 0; sb->recovery_offset = cpu_to_le64(0); memset(sb->pad3, 0, sizeof(sb->pad3)); sb->utime = cpu_to_le64((__u64)mddev->utime); sb->events = cpu_to_le64(mddev->events); if (mddev->in_sync) sb->resync_offset = cpu_to_le64(mddev->recovery_cp); else if (test_bit(MD_JOURNAL_CLEAN, &mddev->flags)) sb->resync_offset = cpu_to_le64(MaxSector); else sb->resync_offset = cpu_to_le64(0); sb->cnt_corrected_read = cpu_to_le32(atomic_read(&rdev->corrected_errors)); sb->raid_disks = cpu_to_le32(mddev->raid_disks); sb->size = cpu_to_le64(mddev->dev_sectors); sb->chunksize = cpu_to_le32(mddev->chunk_sectors); sb->level = cpu_to_le32(mddev->level); sb->layout = cpu_to_le32(mddev->layout); if (test_bit(FailFast, &rdev->flags)) sb->devflags |= FailFast1; else sb->devflags &= ~FailFast1; if (test_bit(WriteMostly, &rdev->flags)) sb->devflags |= WriteMostly1; else sb->devflags &= ~WriteMostly1; sb->data_offset = cpu_to_le64(rdev->data_offset); sb->data_size = cpu_to_le64(rdev->sectors); if (mddev->bitmap && mddev->bitmap_info.file == NULL) { sb->bitmap_offset = cpu_to_le32((__u32)mddev->bitmap_info.offset); sb->feature_map = cpu_to_le32(MD_FEATURE_BITMAP_OFFSET); } if (rdev->raid_disk >= 0 && !test_bit(Journal, &rdev->flags) && !test_bit(In_sync, &rdev->flags)) { sb->feature_map |= cpu_to_le32(MD_FEATURE_RECOVERY_OFFSET); sb->recovery_offset = cpu_to_le64(rdev->recovery_offset); if (rdev->saved_raid_disk >= 0 && mddev->bitmap) sb->feature_map |= cpu_to_le32(MD_FEATURE_RECOVERY_BITMAP); } /* Note: recovery_offset and journal_tail share space */ if (test_bit(Journal, &rdev->flags)) sb->journal_tail = cpu_to_le64(rdev->journal_tail); if (test_bit(Replacement, &rdev->flags)) sb->feature_map |= cpu_to_le32(MD_FEATURE_REPLACEMENT); if (mddev->reshape_position != MaxSector) { sb->feature_map |= cpu_to_le32(MD_FEATURE_RESHAPE_ACTIVE); sb->reshape_position = cpu_to_le64(mddev->reshape_position); sb->new_layout = cpu_to_le32(mddev->new_layout); sb->delta_disks = cpu_to_le32(mddev->delta_disks); sb->new_level = cpu_to_le32(mddev->new_level); sb->new_chunk = cpu_to_le32(mddev->new_chunk_sectors); if (mddev->delta_disks == 0 && mddev->reshape_backwards) sb->feature_map |= cpu_to_le32(MD_FEATURE_RESHAPE_BACKWARDS); if (rdev->new_data_offset != rdev->data_offset) { sb->feature_map |= cpu_to_le32(MD_FEATURE_NEW_OFFSET); sb->new_offset = cpu_to_le32((__u32)(rdev->new_data_offset - rdev->data_offset)); } } if (mddev_is_clustered(mddev)) sb->feature_map |= cpu_to_le32(MD_FEATURE_CLUSTERED); if (rdev->badblocks.count == 0) /* Nothing to do for bad blocks*/ ; else if (sb->bblog_offset == 0) /* Cannot record bad blocks on this device */ md_error(mddev, rdev); else { struct badblocks *bb = &rdev->badblocks; u64 *bbp = (u64 *)page_address(rdev->bb_page); u64 *p = bb->page; sb->feature_map |= cpu_to_le32(MD_FEATURE_BAD_BLOCKS); if (bb->changed) { unsigned seq; retry: seq = read_seqbegin(&bb->lock); memset(bbp, 0xff, PAGE_SIZE); for (i = 0 ; i < bb->count ; i++) { u64 internal_bb = p[i]; u64 store_bb = ((BB_OFFSET(internal_bb) << 10) | BB_LEN(internal_bb)); bbp[i] = cpu_to_le64(store_bb); } bb->changed = 0; if (read_seqretry(&bb->lock, seq)) goto retry; bb->sector = (rdev->sb_start + (int)le32_to_cpu(sb->bblog_offset)); bb->size = le16_to_cpu(sb->bblog_size); } } max_dev = 0; rdev_for_each(rdev2, mddev) if (rdev2->desc_nr+1 > max_dev) max_dev = rdev2->desc_nr+1; if (max_dev > le32_to_cpu(sb->max_dev)) { int bmask; sb->max_dev = cpu_to_le32(max_dev); rdev->sb_size = max_dev * 2 + 256; bmask = queue_logical_block_size(rdev->bdev->bd_disk->queue)-1; if (rdev->sb_size & bmask) rdev->sb_size = (rdev->sb_size | bmask) + 1; } else max_dev = le32_to_cpu(sb->max_dev); for (i=0; i<max_dev;i++) sb->dev_roles[i] = cpu_to_le16(MD_DISK_ROLE_SPARE); if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) sb->feature_map |= cpu_to_le32(MD_FEATURE_JOURNAL); if (test_bit(MD_HAS_PPL, &mddev->flags)) { if (test_bit(MD_HAS_MULTIPLE_PPLS, &mddev->flags)) sb->feature_map |= cpu_to_le32(MD_FEATURE_MULTIPLE_PPLS); else sb->feature_map |= cpu_to_le32(MD_FEATURE_PPL); sb->ppl.offset = cpu_to_le16(rdev->ppl.offset); sb->ppl.size = cpu_to_le16(rdev->ppl.size); } rdev_for_each(rdev2, mddev) { i = rdev2->desc_nr; if (test_bit(Faulty, &rdev2->flags)) sb->dev_roles[i] = cpu_to_le16(MD_DISK_ROLE_FAULTY); else if (test_bit(In_sync, &rdev2->flags)) sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk); else if (test_bit(Journal, &rdev2->flags)) sb->dev_roles[i] = cpu_to_le16(MD_DISK_ROLE_JOURNAL); else if (rdev2->raid_disk >= 0) sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk); else sb->dev_roles[i] = cpu_to_le16(MD_DISK_ROLE_SPARE); } sb->sb_csum = calc_sb_1_csum(sb); } static unsigned long long super_1_rdev_size_change(struct md_rdev *rdev, sector_t num_sectors) { struct mdp_superblock_1 *sb; sector_t max_sectors; if (num_sectors && num_sectors < rdev->mddev->dev_sectors) return 0; /* component must fit device */ if (rdev->data_offset != rdev->new_data_offset) return 0; /* too confusing */ if (rdev->sb_start < rdev->data_offset) { /* minor versions 1 and 2; superblock before data */ max_sectors = i_size_read(rdev->bdev->bd_inode) >> 9; max_sectors -= rdev->data_offset; if (!num_sectors || num_sectors > max_sectors) num_sectors = max_sectors; } else if (rdev->mddev->bitmap_info.offset) { /* minor version 0 with bitmap we can't move */ return 0; } else { /* minor version 0; superblock after data */ sector_t sb_start; sb_start = (i_size_read(rdev->bdev->bd_inode) >> 9) - 8*2; sb_start &= ~(sector_t)(4*2 - 1); max_sectors = rdev->sectors + sb_start - rdev->sb_start; if (!num_sectors || num_sectors > max_sectors) num_sectors = max_sectors; rdev->sb_start = sb_start; } sb = page_address(rdev->sb_page); sb->data_size = cpu_to_le64(num_sectors); sb->super_offset = cpu_to_le64(rdev->sb_start); sb->sb_csum = calc_sb_1_csum(sb); do { md_super_write(rdev->mddev, rdev, rdev->sb_start, rdev->sb_size, rdev->sb_page); } while (md_super_wait(rdev->mddev) < 0); return num_sectors; } static int super_1_allow_new_offset(struct md_rdev *rdev, unsigned long long new_offset) { /* All necessary checks on new >= old have been done */ struct bitmap *bitmap; if (new_offset >= rdev->data_offset) return 1; /* with 1.0 metadata, there is no metadata to tread on * so we can always move back */ if (rdev->mddev->minor_version == 0) return 1; /* otherwise we must be sure not to step on * any metadata, so stay: * 36K beyond start of superblock * beyond end of badblocks * beyond write-intent bitmap */ if (rdev->sb_start + (32+4)*2 > new_offset) return 0; bitmap = rdev->mddev->bitmap; if (bitmap && !rdev->mddev->bitmap_info.file && rdev->sb_start + rdev->mddev->bitmap_info.offset + bitmap->storage.file_pages * (PAGE_SIZE>>9) > new_offset) return 0; if (rdev->badblocks.sector + rdev->badblocks.size > new_offset) return 0; return 1; } static struct super_type super_types[] = { [0] = { .name = "0.90.0", .owner = THIS_MODULE, .load_super = super_90_load, .validate_super = super_90_validate, .sync_super = super_90_sync, .rdev_size_change = super_90_rdev_size_change, .allow_new_offset = super_90_allow_new_offset, }, [1] = { .name = "md-1", .owner = THIS_MODULE, .load_super = super_1_load, .validate_super = super_1_validate, .sync_super = super_1_sync, .rdev_size_change = super_1_rdev_size_change, .allow_new_offset = super_1_allow_new_offset, }, }; static void sync_super(struct mddev *mddev, struct md_rdev *rdev) { if (mddev->sync_super) { mddev->sync_super(mddev, rdev); return; } BUG_ON(mddev->major_version >= ARRAY_SIZE(super_types)); super_types[mddev->major_version].sync_super(mddev, rdev); } static int match_mddev_units(struct mddev *mddev1, struct mddev *mddev2) { struct md_rdev *rdev, *rdev2; rcu_read_lock(); rdev_for_each_rcu(rdev, mddev1) { if (test_bit(Faulty, &rdev->flags) || test_bit(Journal, &rdev->flags) || rdev->raid_disk == -1) continue; rdev_for_each_rcu(rdev2, mddev2) { if (test_bit(Faulty, &rdev2->flags) || test_bit(Journal, &rdev2->flags) || rdev2->raid_disk == -1) continue; if (rdev->bdev->bd_contains == rdev2->bdev->bd_contains) { rcu_read_unlock(); return 1; } } } rcu_read_unlock(); return 0; } static LIST_HEAD(pending_raid_disks); /* * Try to register data integrity profile for an mddev * * This is called when an array is started and after a disk has been kicked * from the array. It only succeeds if all working and active component devices * are integrity capable with matching profiles. */ int md_integrity_register(struct mddev *mddev) { struct md_rdev *rdev, *reference = NULL; if (list_empty(&mddev->disks)) return 0; /* nothing to do */ if (!mddev->gendisk || blk_get_integrity(mddev->gendisk)) return 0; /* shouldn't register, or already is */ rdev_for_each(rdev, mddev) { /* skip spares and non-functional disks */ if (test_bit(Faulty, &rdev->flags)) continue; if (rdev->raid_disk < 0) continue; if (!reference) { /* Use the first rdev as the reference */ reference = rdev; continue; } /* does this rdev's profile match the reference profile? */ if (blk_integrity_compare(reference->bdev->bd_disk, rdev->bdev->bd_disk) < 0) return -EINVAL; } if (!reference || !bdev_get_integrity(reference->bdev)) return 0; /* * All component devices are integrity capable and have matching * profiles, register the common profile for the md device. */ blk_integrity_register(mddev->gendisk, bdev_get_integrity(reference->bdev)); pr_debug("md: data integrity enabled on %s\n", mdname(mddev)); if (bioset_integrity_create(&mddev->bio_set, BIO_POOL_SIZE)) { pr_err("md: failed to create integrity pool for %s\n", mdname(mddev)); return -EINVAL; } return 0; } EXPORT_SYMBOL(md_integrity_register); /* * Attempt to add an rdev, but only if it is consistent with the current * integrity profile */ int md_integrity_add_rdev(struct md_rdev *rdev, struct mddev *mddev) { struct blk_integrity *bi_rdev; struct blk_integrity *bi_mddev; char name[BDEVNAME_SIZE]; if (!mddev->gendisk) return 0; bi_rdev = bdev_get_integrity(rdev->bdev); bi_mddev = blk_get_integrity(mddev->gendisk); if (!bi_mddev) /* nothing to do */ return 0; if (blk_integrity_compare(mddev->gendisk, rdev->bdev->bd_disk) != 0) { pr_err("%s: incompatible integrity profile for %s\n", mdname(mddev), bdevname(rdev->bdev, name)); return -ENXIO; } return 0; } EXPORT_SYMBOL(md_integrity_add_rdev); static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev) { char b[BDEVNAME_SIZE]; struct kobject *ko; int err; /* prevent duplicates */ if (find_rdev(mddev, rdev->bdev->bd_dev)) return -EEXIST; if ((bdev_read_only(rdev->bdev) || bdev_read_only(rdev->meta_bdev)) && mddev->pers) return -EROFS; /* make sure rdev->sectors exceeds mddev->dev_sectors */ if (!test_bit(Journal, &rdev->flags) && rdev->sectors && (mddev->dev_sectors == 0 || rdev->sectors < mddev->dev_sectors)) { if (mddev->pers) { /* Cannot change size, so fail * If mddev->level <= 0, then we don't care * about aligning sizes (e.g. linear) */ if (mddev->level > 0) return -ENOSPC; } else mddev->dev_sectors = rdev->sectors; } /* Verify rdev->desc_nr is unique. * If it is -1, assign a free number, else * check number is not in use */ rcu_read_lock(); if (rdev->desc_nr < 0) { int choice = 0; if (mddev->pers) choice = mddev->raid_disks; while (md_find_rdev_nr_rcu(mddev, choice)) choice++; rdev->desc_nr = choice; } else { if (md_find_rdev_nr_rcu(mddev, rdev->desc_nr)) { rcu_read_unlock(); return -EBUSY; } } rcu_read_unlock(); if (!test_bit(Journal, &rdev->flags) && mddev->max_disks && rdev->desc_nr >= mddev->max_disks) { pr_warn("md: %s: array is limited to %d devices\n", mdname(mddev), mddev->max_disks); return -EBUSY; } bdevname(rdev->bdev,b); strreplace(b, '/', '!'); rdev->mddev = mddev; pr_debug("md: bind<%s>\n", b); if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b))) goto fail; ko = &part_to_dev(rdev->bdev->bd_part)->kobj; if (sysfs_create_link(&rdev->kobj, ko, "block")) /* failure here is OK */; rdev->sysfs_state = sysfs_get_dirent_safe(rdev->kobj.sd, "state"); list_add_rcu(&rdev->same_set, &mddev->disks); bd_link_disk_holder(rdev->bdev, mddev->gendisk); /* May as well allow recovery to be retried once */ mddev->recovery_disabled++; return 0; fail: pr_warn("md: failed to register dev-%s for %s\n", b, mdname(mddev)); return err; } static void md_delayed_delete(struct work_struct *ws) { struct md_rdev *rdev = container_of(ws, struct md_rdev, del_work); kobject_del(&rdev->kobj); kobject_put(&rdev->kobj); } static void unbind_rdev_from_array(struct md_rdev *rdev) { char b[BDEVNAME_SIZE]; bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk); list_del_rcu(&rdev->same_set); pr_debug("md: unbind<%s>\n", bdevname(rdev->bdev,b)); rdev->mddev = NULL; sysfs_remove_link(&rdev->kobj, "block"); sysfs_put(rdev->sysfs_state); rdev->sysfs_state = NULL; rdev->badblocks.count = 0; /* We need to delay this, otherwise we can deadlock when * writing to 'remove' to "dev/state". We also need * to delay it due to rcu usage. */ synchronize_rcu(); INIT_WORK(&rdev->del_work, md_delayed_delete); kobject_get(&rdev->kobj); queue_work(md_misc_wq, &rdev->del_work); } /* * prevent the device from being mounted, repartitioned or * otherwise reused by a RAID array (or any other kernel * subsystem), by bd_claiming the device. */ static int lock_rdev(struct md_rdev *rdev, dev_t dev, int shared) { int err = 0; struct block_device *bdev; char b[BDEVNAME_SIZE]; bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, shared ? (struct md_rdev *)lock_rdev : rdev); if (IS_ERR(bdev)) { pr_warn("md: could not open %s.\n", __bdevname(dev, b)); return PTR_ERR(bdev); } rdev->bdev = bdev; return err; } static void unlock_rdev(struct md_rdev *rdev) { struct block_device *bdev = rdev->bdev; rdev->bdev = NULL; blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL); } void md_autodetect_dev(dev_t dev); static void export_rdev(struct md_rdev *rdev) { char b[BDEVNAME_SIZE]; pr_debug("md: export_rdev(%s)\n", bdevname(rdev->bdev,b)); md_rdev_clear(rdev); #ifndef MODULE if (test_bit(AutoDetected, &rdev->flags)) md_autodetect_dev(rdev->bdev->bd_dev); #endif unlock_rdev(rdev); kobject_put(&rdev->kobj); } void md_kick_rdev_from_array(struct md_rdev *rdev) { unbind_rdev_from_array(rdev); export_rdev(rdev); } EXPORT_SYMBOL_GPL(md_kick_rdev_from_array); static void export_array(struct mddev *mddev) { struct md_rdev *rdev; while (!list_empty(&mddev->disks)) { rdev = list_first_entry(&mddev->disks, struct md_rdev, same_set); md_kick_rdev_from_array(rdev); } mddev->raid_disks = 0; mddev->major_version = 0; } static bool set_in_sync(struct mddev *mddev) { lockdep_assert_held(&mddev->lock); if (!mddev->in_sync) { mddev->sync_checkers++; spin_unlock(&mddev->lock); percpu_ref_switch_to_atomic_sync(&mddev->writes_pending); spin_lock(&mddev->lock); if (!mddev->in_sync && percpu_ref_is_zero(&mddev->writes_pending)) { mddev->in_sync = 1; /* * Ensure ->in_sync is visible before we clear * ->sync_checkers. */ smp_mb(); set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); sysfs_notify_dirent_safe(mddev->sysfs_state); } if (--mddev->sync_checkers == 0) percpu_ref_switch_to_percpu(&mddev->writes_pending); } if (mddev->safemode == 1) mddev->safemode = 0; return mddev->in_sync; } static void sync_sbs(struct mddev *mddev, int nospares) { /* Update each superblock (in-memory image), but * if we are allowed to, skip spares which already * have the right event counter, or have one earlier * (which would mean they aren't being marked as dirty * with the rest of the array) */ struct md_rdev *rdev; rdev_for_each(rdev, mddev) { if (rdev->sb_events == mddev->events || (nospares && rdev->raid_disk < 0 && rdev->sb_events+1 == mddev->events)) { /* Don't update this superblock */ rdev->sb_loaded = 2; } else { sync_super(mddev, rdev); rdev->sb_loaded = 1; } } } static bool does_sb_need_changing(struct mddev *mddev) { struct md_rdev *rdev = NULL, *iter; struct mdp_superblock_1 *sb; int role; /* Find a good rdev */ rdev_for_each(iter, mddev) if ((iter->raid_disk >= 0) && !test_bit(Faulty, &iter->flags)) { rdev = iter; break; } /* No good device found. */ if (!rdev) return false; sb = page_address(rdev->sb_page); /* Check if a device has become faulty or a spare become active */ rdev_for_each(rdev, mddev) { role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]); /* Device activated? */ if (role == 0xffff && rdev->raid_disk >=0 && !test_bit(Faulty, &rdev->flags)) return true; /* Device turned faulty? */ if (test_bit(Faulty, &rdev->flags) && (role < 0xfffd)) return true; } /* Check if any mddev parameters have changed */ if ((mddev->dev_sectors != le64_to_cpu(sb->size)) || (mddev->reshape_position != le64_to_cpu(sb->reshape_position)) || (mddev->layout != le32_to_cpu(sb->layout)) || (mddev->raid_disks != le32_to_cpu(sb->raid_disks)) || (mddev->chunk_sectors != le32_to_cpu(sb->chunksize))) return true; return false; } void md_update_sb(struct mddev *mddev, int force_change) { struct md_rdev *rdev; int sync_req; int nospares = 0; int any_badblocks_changed = 0; int ret = -1; if (mddev->ro) { if (force_change) set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); return; } repeat: if (mddev_is_clustered(mddev)) { if (test_and_clear_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags)) force_change = 1; if (test_and_clear_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags)) nospares = 1; ret = md_cluster_ops->metadata_update_start(mddev); /* Has someone else has updated the sb */ if (!does_sb_need_changing(mddev)) { if (ret == 0) md_cluster_ops->metadata_update_cancel(mddev); bit_clear_unless(&mddev->sb_flags, BIT(MD_SB_CHANGE_PENDING), BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_CLEAN)); return; } } /* * First make sure individual recovery_offsets are correct * curr_resync_completed can only be used during recovery. * During reshape/resync it might use array-addresses rather * that device addresses. */ rdev_for_each(rdev, mddev) { if (rdev->raid_disk >= 0 && mddev->delta_disks >= 0 && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && test_bit(MD_RECOVERY_RECOVER, &mddev->recovery) && !test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && !test_bit(Journal, &rdev->flags) && !test_bit(In_sync, &rdev->flags) && mddev->curr_resync_completed > rdev->recovery_offset) rdev->recovery_offset = mddev->curr_resync_completed; } if (!mddev->persistent) { clear_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); clear_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); if (!mddev->external) { clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); rdev_for_each(rdev, mddev) { if (rdev->badblocks.changed) { rdev->badblocks.changed = 0; ack_all_badblocks(&rdev->badblocks); md_error(mddev, rdev); } clear_bit(Blocked, &rdev->flags); clear_bit(BlockedBadBlocks, &rdev->flags); wake_up(&rdev->blocked_wait); } } wake_up(&mddev->sb_wait); return; } spin_lock(&mddev->lock); mddev->utime = ktime_get_real_seconds(); if (test_and_clear_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags)) force_change = 1; if (test_and_clear_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags)) /* just a clean<-> dirty transition, possibly leave spares alone, * though if events isn't the right even/odd, we will have to do * spares after all */ nospares = 1; if (force_change) nospares = 0; if (mddev->degraded) /* If the array is degraded, then skipping spares is both * dangerous and fairly pointless. * Dangerous because a device that was removed from the array * might have a event_count that still looks up-to-date, * so it can be re-added without a resync. * Pointless because if there are any spares to skip, * then a recovery will happen and soon that array won't * be degraded any more and the spare can go back to sleep then. */ nospares = 0; sync_req = mddev->in_sync; /* If this is just a dirty<->clean transition, and the array is clean * and 'events' is odd, we can roll back to the previous clean state */ if (nospares && (mddev->in_sync && mddev->recovery_cp == MaxSector) && mddev->can_decrease_events && mddev->events != 1) { mddev->events--; mddev->can_decrease_events = 0; } else { /* otherwise we have to go forward and ... */ mddev->events ++; mddev->can_decrease_events = nospares; } /* * This 64-bit counter should never wrap. * Either we are in around ~1 trillion A.C., assuming * 1 reboot per second, or we have a bug... */ WARN_ON(mddev->events == 0); rdev_for_each(rdev, mddev) { if (rdev->badblocks.changed) any_badblocks_changed++; if (test_bit(Faulty, &rdev->flags)) set_bit(FaultRecorded, &rdev->flags); } sync_sbs(mddev, nospares); spin_unlock(&mddev->lock); pr_debug("md: updating %s RAID superblock on device (in sync %d)\n", mdname(mddev), mddev->in_sync); if (mddev->queue) blk_add_trace_msg(mddev->queue, "md md_update_sb"); rewrite: md_bitmap_update_sb(mddev->bitmap); rdev_for_each(rdev, mddev) { char b[BDEVNAME_SIZE]; if (rdev->sb_loaded != 1) continue; /* no noise on spare devices */ if (!test_bit(Faulty, &rdev->flags)) { md_super_write(mddev,rdev, rdev->sb_start, rdev->sb_size, rdev->sb_page); pr_debug("md: (write) %s's sb offset: %llu\n", bdevname(rdev->bdev, b), (unsigned long long)rdev->sb_start); rdev->sb_events = mddev->events; if (rdev->badblocks.size) { md_super_write(mddev, rdev, rdev->badblocks.sector, rdev->badblocks.size << 9, rdev->bb_page); rdev->badblocks.size = 0; } } else pr_debug("md: %s (skipping faulty)\n", bdevname(rdev->bdev, b)); if (mddev->level == LEVEL_MULTIPATH) /* only need to write one superblock... */ break; } if (md_super_wait(mddev) < 0) goto rewrite; /* if there was a failure, MD_SB_CHANGE_DEVS was set, and we re-write super */ if (mddev_is_clustered(mddev) && ret == 0) md_cluster_ops->metadata_update_finish(mddev); if (mddev->in_sync != sync_req || !bit_clear_unless(&mddev->sb_flags, BIT(MD_SB_CHANGE_PENDING), BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_CLEAN))) /* have to write it out again */ goto repeat; wake_up(&mddev->sb_wait); if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) sysfs_notify(&mddev->kobj, NULL, "sync_completed"); rdev_for_each(rdev, mddev) { if (test_and_clear_bit(FaultRecorded, &rdev->flags)) clear_bit(Blocked, &rdev->flags); if (any_badblocks_changed) ack_all_badblocks(&rdev->badblocks); clear_bit(BlockedBadBlocks, &rdev->flags); wake_up(&rdev->blocked_wait); } } EXPORT_SYMBOL(md_update_sb); static int add_bound_rdev(struct md_rdev *rdev) { struct mddev *mddev = rdev->mddev; int err = 0; bool add_journal = test_bit(Journal, &rdev->flags); if (!mddev->pers->hot_remove_disk || add_journal) { /* If there is hot_add_disk but no hot_remove_disk * then added disks for geometry changes, * and should be added immediately. */ super_types[mddev->major_version]. validate_super(mddev, rdev); if (add_journal) mddev_suspend(mddev); err = mddev->pers->hot_add_disk(mddev, rdev); if (add_journal) mddev_resume(mddev); if (err) { md_kick_rdev_from_array(rdev); return err; } } sysfs_notify_dirent_safe(rdev->sysfs_state); set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); if (mddev->degraded) set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_new_event(mddev); md_wakeup_thread(mddev->thread); return 0; } /* words written to sysfs files may, or may not, be \n terminated. * We want to accept with case. For this we use cmd_match. */ static int cmd_match(const char *cmd, const char *str) { /* See if cmd, written into a sysfs file, matches * str. They must either be the same, or cmd can * have a trailing newline */ while (*cmd && *str && *cmd == *str) { cmd++; str++; } if (*cmd == '\n') cmd++; if (*str || *cmd) return 0; return 1; } struct rdev_sysfs_entry { struct attribute attr; ssize_t (*show)(struct md_rdev *, char *); ssize_t (*store)(struct md_rdev *, const char *, size_t); }; static ssize_t state_show(struct md_rdev *rdev, char *page) { char *sep = ","; size_t len = 0; unsigned long flags = READ_ONCE(rdev->flags); if (test_bit(Faulty, &flags) || (!test_bit(ExternalBbl, &flags) && rdev->badblocks.unacked_exist)) len += sprintf(page+len, "faulty%s", sep); if (test_bit(In_sync, &flags)) len += sprintf(page+len, "in_sync%s", sep); if (test_bit(Journal, &flags)) len += sprintf(page+len, "journal%s", sep); if (test_bit(WriteMostly, &flags)) len += sprintf(page+len, "write_mostly%s", sep); if (test_bit(Blocked, &flags) || (rdev->badblocks.unacked_exist && !test_bit(Faulty, &flags))) len += sprintf(page+len, "blocked%s", sep); if (!test_bit(Faulty, &flags) && !test_bit(Journal, &flags) && !test_bit(In_sync, &flags)) len += sprintf(page+len, "spare%s", sep); if (test_bit(WriteErrorSeen, &flags)) len += sprintf(page+len, "write_error%s", sep); if (test_bit(WantReplacement, &flags)) len += sprintf(page+len, "want_replacement%s", sep); if (test_bit(Replacement, &flags)) len += sprintf(page+len, "replacement%s", sep); if (test_bit(ExternalBbl, &flags)) len += sprintf(page+len, "external_bbl%s", sep); if (test_bit(FailFast, &flags)) len += sprintf(page+len, "failfast%s", sep); if (len) len -= strlen(sep); return len+sprintf(page+len, "\n"); } static ssize_t state_store(struct md_rdev *rdev, const char *buf, size_t len) { /* can write * faulty - simulates an error * remove - disconnects the device * writemostly - sets write_mostly * -writemostly - clears write_mostly * blocked - sets the Blocked flags * -blocked - clears the Blocked and possibly simulates an error * insync - sets Insync providing device isn't active * -insync - clear Insync for a device with a slot assigned, * so that it gets rebuilt based on bitmap * write_error - sets WriteErrorSeen * -write_error - clears WriteErrorSeen * {,-}failfast - set/clear FailFast */ int err = -EINVAL; if (cmd_match(buf, "faulty") && rdev->mddev->pers) { md_error(rdev->mddev, rdev); if (test_bit(Faulty, &rdev->flags)) err = 0; else err = -EBUSY; } else if (cmd_match(buf, "remove")) { if (rdev->mddev->pers) { clear_bit(Blocked, &rdev->flags); remove_and_add_spares(rdev->mddev, rdev); } if (rdev->raid_disk >= 0) err = -EBUSY; else { struct mddev *mddev = rdev->mddev; err = 0; if (mddev_is_clustered(mddev)) err = md_cluster_ops->remove_disk(mddev, rdev); if (err == 0) { md_kick_rdev_from_array(rdev); if (mddev->pers) { set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); md_wakeup_thread(mddev->thread); } md_new_event(mddev); } } } else if (cmd_match(buf, "writemostly")) { set_bit(WriteMostly, &rdev->flags); err = 0; } else if (cmd_match(buf, "-writemostly")) { clear_bit(WriteMostly, &rdev->flags); err = 0; } else if (cmd_match(buf, "blocked")) { set_bit(Blocked, &rdev->flags); err = 0; } else if (cmd_match(buf, "-blocked")) { if (!test_bit(Faulty, &rdev->flags) && !test_bit(ExternalBbl, &rdev->flags) && rdev->badblocks.unacked_exist) { /* metadata handler doesn't understand badblocks, * so we need to fail the device */ md_error(rdev->mddev, rdev); } clear_bit(Blocked, &rdev->flags); clear_bit(BlockedBadBlocks, &rdev->flags); wake_up(&rdev->blocked_wait); set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery); md_wakeup_thread(rdev->mddev->thread); err = 0; } else if (cmd_match(buf, "insync") && rdev->raid_disk == -1) { set_bit(In_sync, &rdev->flags); err = 0; } else if (cmd_match(buf, "failfast")) { set_bit(FailFast, &rdev->flags); err = 0; } else if (cmd_match(buf, "-failfast")) { clear_bit(FailFast, &rdev->flags); err = 0; } else if (cmd_match(buf, "-insync") && rdev->raid_disk >= 0 && !test_bit(Journal, &rdev->flags)) { if (rdev->mddev->pers == NULL) { clear_bit(In_sync, &rdev->flags); rdev->saved_raid_disk = rdev->raid_disk; rdev->raid_disk = -1; err = 0; } } else if (cmd_match(buf, "write_error")) { set_bit(WriteErrorSeen, &rdev->flags); err = 0; } else if (cmd_match(buf, "-write_error")) { clear_bit(WriteErrorSeen, &rdev->flags); err = 0; } else if (cmd_match(buf, "want_replacement")) { /* Any non-spare device that is not a replacement can * become want_replacement at any time, but we then need to * check if recovery is needed. */ if (rdev->raid_disk >= 0 && !test_bit(Journal, &rdev->flags) && !test_bit(Replacement, &rdev->flags)) set_bit(WantReplacement, &rdev->flags); set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery); md_wakeup_thread(rdev->mddev->thread); err = 0; } else if (cmd_match(buf, "-want_replacement")) { /* Clearing 'want_replacement' is always allowed. * Once replacements starts it is too late though. */ err = 0; clear_bit(WantReplacement, &rdev->flags); } else if (cmd_match(buf, "replacement")) { /* Can only set a device as a replacement when array has not * yet been started. Once running, replacement is automatic * from spares, or by assigning 'slot'. */ if (rdev->mddev->pers) err = -EBUSY; else { set_bit(Replacement, &rdev->flags); err = 0; } } else if (cmd_match(buf, "-replacement")) { /* Similarly, can only clear Replacement before start */ if (rdev->mddev->pers) err = -EBUSY; else { clear_bit(Replacement, &rdev->flags); err = 0; } } else if (cmd_match(buf, "re-add")) { if (!rdev->mddev->pers) err = -EINVAL; else if (test_bit(Faulty, &rdev->flags) && (rdev->raid_disk == -1) && rdev->saved_raid_disk >= 0) { /* clear_bit is performed _after_ all the devices * have their local Faulty bit cleared. If any writes * happen in the meantime in the local node, they * will land in the local bitmap, which will be synced * by this node eventually */ if (!mddev_is_clustered(rdev->mddev) || (err = md_cluster_ops->gather_bitmaps(rdev)) == 0) { clear_bit(Faulty, &rdev->flags); err = add_bound_rdev(rdev); } } else err = -EBUSY; } else if (cmd_match(buf, "external_bbl") && (rdev->mddev->external)) { set_bit(ExternalBbl, &rdev->flags); rdev->badblocks.shift = 0; err = 0; } else if (cmd_match(buf, "-external_bbl") && (rdev->mddev->external)) { clear_bit(ExternalBbl, &rdev->flags); err = 0; } if (!err) sysfs_notify_dirent_safe(rdev->sysfs_state); return err ? err : len; } static struct rdev_sysfs_entry rdev_state = __ATTR_PREALLOC(state, S_IRUGO|S_IWUSR, state_show, state_store); static ssize_t errors_show(struct md_rdev *rdev, char *page) { return sprintf(page, "%d\n", atomic_read(&rdev->corrected_errors)); } static ssize_t errors_store(struct md_rdev *rdev, const char *buf, size_t len) { unsigned int n; int rv; rv = kstrtouint(buf, 10, &n); if (rv < 0) return rv; atomic_set(&rdev->corrected_errors, n); return len; } static struct rdev_sysfs_entry rdev_errors = __ATTR(errors, S_IRUGO|S_IWUSR, errors_show, errors_store); static ssize_t slot_show(struct md_rdev *rdev, char *page) { if (test_bit(Journal, &rdev->flags)) return sprintf(page, "journal\n"); else if (rdev->raid_disk < 0) return sprintf(page, "none\n"); else return sprintf(page, "%d\n", rdev->raid_disk); } static ssize_t slot_store(struct md_rdev *rdev, const char *buf, size_t len) { int slot; int err; if (test_bit(Journal, &rdev->flags)) return -EBUSY; if (strncmp(buf, "none", 4)==0) slot = -1; else { err = kstrtouint(buf, 10, (unsigned int *)&slot); if (err < 0) return err; } if (rdev->mddev->pers && slot == -1) { /* Setting 'slot' on an active array requires also * updating the 'rd%d' link, and communicating * with the personality with ->hot_*_disk. * For now we only support removing * failed/spare devices. This normally happens automatically, * but not when the metadata is externally managed. */ if (rdev->raid_disk == -1) return -EEXIST; /* personality does all needed checks */ if (rdev->mddev->pers->hot_remove_disk == NULL) return -EINVAL; clear_bit(Blocked, &rdev->flags); remove_and_add_spares(rdev->mddev, rdev); if (rdev->raid_disk >= 0) return -EBUSY; set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery); md_wakeup_thread(rdev->mddev->thread); } else if (rdev->mddev->pers) { /* Activating a spare .. or possibly reactivating * if we ever get bitmaps working here. */ int err; if (rdev->raid_disk != -1) return -EBUSY; if (test_bit(MD_RECOVERY_RUNNING, &rdev->mddev->recovery)) return -EBUSY; if (rdev->mddev->pers->hot_add_disk == NULL) return -EINVAL; if (slot >= rdev->mddev->raid_disks && slot >= rdev->mddev->raid_disks + rdev->mddev->delta_disks) return -ENOSPC; rdev->raid_disk = slot; if (test_bit(In_sync, &rdev->flags)) rdev->saved_raid_disk = slot; else rdev->saved_raid_disk = -1; clear_bit(In_sync, &rdev->flags); clear_bit(Bitmap_sync, &rdev->flags); err = rdev->mddev->pers-> hot_add_disk(rdev->mddev, rdev); if (err) { rdev->raid_disk = -1; return err; } else sysfs_notify_dirent_safe(rdev->sysfs_state); if (sysfs_link_rdev(rdev->mddev, rdev)) /* failure here is OK */; /* don't wakeup anyone, leave that to userspace. */ } else { if (slot >= rdev->mddev->raid_disks && slot >= rdev->mddev->raid_disks + rdev->mddev->delta_disks) return -ENOSPC; rdev->raid_disk = slot; /* assume it is working */ clear_bit(Faulty, &rdev->flags); clear_bit(WriteMostly, &rdev->flags); set_bit(In_sync, &rdev->flags); sysfs_notify_dirent_safe(rdev->sysfs_state); } return len; } static struct rdev_sysfs_entry rdev_slot = __ATTR(slot, S_IRUGO|S_IWUSR, slot_show, slot_store); static ssize_t offset_show(struct md_rdev *rdev, char *page) { return sprintf(page, "%llu\n", (unsigned long long)rdev->data_offset); } static ssize_t offset_store(struct md_rdev *rdev, const char *buf, size_t len) { unsigned long long offset; if (kstrtoull(buf, 10, &offset) < 0) return -EINVAL; if (rdev->mddev->pers && rdev->raid_disk >= 0) return -EBUSY; if (rdev->sectors && rdev->mddev->external) /* Must set offset before size, so overlap checks * can be sane */ return -EBUSY; rdev->data_offset = offset; rdev->new_data_offset = offset; return len; } static struct rdev_sysfs_entry rdev_offset = __ATTR(offset, S_IRUGO|S_IWUSR, offset_show, offset_store); static ssize_t new_offset_show(struct md_rdev *rdev, char *page) { return sprintf(page, "%llu\n", (unsigned long long)rdev->new_data_offset); } static ssize_t new_offset_store(struct md_rdev *rdev, const char *buf, size_t len) { unsigned long long new_offset; struct mddev *mddev = rdev->mddev; if (kstrtoull(buf, 10, &new_offset) < 0) return -EINVAL; if (mddev->sync_thread || test_bit(MD_RECOVERY_RUNNING,&mddev->recovery)) return -EBUSY; if (new_offset == rdev->data_offset) /* reset is always permitted */ ; else if (new_offset > rdev->data_offset) { /* must not push array size beyond rdev_sectors */ if (new_offset - rdev->data_offset + mddev->dev_sectors > rdev->sectors) return -E2BIG; } /* Metadata worries about other space details. */ /* decreasing the offset is inconsistent with a backwards * reshape. */ if (new_offset < rdev->data_offset && mddev->reshape_backwards) return -EINVAL; /* Increasing offset is inconsistent with forwards * reshape. reshape_direction should be set to * 'backwards' first. */ if (new_offset > rdev->data_offset && !mddev->reshape_backwards) return -EINVAL; if (mddev->pers && mddev->persistent && !super_types[mddev->major_version] .allow_new_offset(rdev, new_offset)) return -E2BIG; rdev->new_data_offset = new_offset; if (new_offset > rdev->data_offset) mddev->reshape_backwards = 1; else if (new_offset < rdev->data_offset) mddev->reshape_backwards = 0; return len; } static struct rdev_sysfs_entry rdev_new_offset = __ATTR(new_offset, S_IRUGO|S_IWUSR, new_offset_show, new_offset_store); static ssize_t rdev_size_show(struct md_rdev *rdev, char *page) { return sprintf(page, "%llu\n", (unsigned long long)rdev->sectors / 2); } static int overlaps(sector_t s1, sector_t l1, sector_t s2, sector_t l2) { /* check if two start/length pairs overlap */ if (s1+l1 <= s2) return 0; if (s2+l2 <= s1) return 0; return 1; } static int strict_blocks_to_sectors(const char *buf, sector_t *sectors) { unsigned long long blocks; sector_t new; if (kstrtoull(buf, 10, &blocks) < 0) return -EINVAL; if (blocks & 1ULL << (8 * sizeof(blocks) - 1)) return -EINVAL; /* sector conversion overflow */ new = blocks * 2; if (new != blocks * 2) return -EINVAL; /* unsigned long long to sector_t overflow */ *sectors = new; return 0; } static ssize_t rdev_size_store(struct md_rdev *rdev, const char *buf, size_t len) { struct mddev *my_mddev = rdev->mddev; sector_t oldsectors = rdev->sectors; sector_t sectors; if (test_bit(Journal, &rdev->flags)) return -EBUSY; if (strict_blocks_to_sectors(buf, &sectors) < 0) return -EINVAL; if (rdev->data_offset != rdev->new_data_offset) return -EINVAL; /* too confusing */ if (my_mddev->pers && rdev->raid_disk >= 0) { if (my_mddev->persistent) { sectors = super_types[my_mddev->major_version]. rdev_size_change(rdev, sectors); if (!sectors) return -EBUSY; } else if (!sectors) sectors = (i_size_read(rdev->bdev->bd_inode) >> 9) - rdev->data_offset; if (!my_mddev->pers->resize) /* Cannot change size for RAID0 or Linear etc */ return -EINVAL; } if (sectors < my_mddev->dev_sectors) return -EINVAL; /* component must fit device */ rdev->sectors = sectors; if (sectors > oldsectors && my_mddev->external) { /* Need to check that all other rdevs with the same * ->bdev do not overlap. 'rcu' is sufficient to walk * the rdev lists safely. * This check does not provide a hard guarantee, it * just helps avoid dangerous mistakes. */ struct mddev *mddev; int overlap = 0; struct list_head *tmp; rcu_read_lock(); for_each_mddev(mddev, tmp) { struct md_rdev *rdev2; rdev_for_each(rdev2, mddev) if (rdev->bdev == rdev2->bdev && rdev != rdev2 && overlaps(rdev->data_offset, rdev->sectors, rdev2->data_offset, rdev2->sectors)) { overlap = 1; break; } if (overlap) { mddev_put(mddev); break; } } rcu_read_unlock(); if (overlap) { /* Someone else could have slipped in a size * change here, but doing so is just silly. * We put oldsectors back because we *know* it is * safe, and trust userspace not to race with * itself */ rdev->sectors = oldsectors; return -EBUSY; } } return len; } static struct rdev_sysfs_entry rdev_size = __ATTR(size, S_IRUGO|S_IWUSR, rdev_size_show, rdev_size_store); static ssize_t recovery_start_show(struct md_rdev *rdev, char *page) { unsigned long long recovery_start = rdev->recovery_offset; if (test_bit(In_sync, &rdev->flags) || recovery_start == MaxSector) return sprintf(page, "none\n"); return sprintf(page, "%llu\n", recovery_start); } static ssize_t recovery_start_store(struct md_rdev *rdev, const char *buf, size_t len) { unsigned long long recovery_start; if (cmd_match(buf, "none")) recovery_start = MaxSector; else if (kstrtoull(buf, 10, &recovery_start)) return -EINVAL; if (rdev->mddev->pers && rdev->raid_disk >= 0) return -EBUSY; rdev->recovery_offset = recovery_start; if (recovery_start == MaxSector) set_bit(In_sync, &rdev->flags); else clear_bit(In_sync, &rdev->flags); return len; } static struct rdev_sysfs_entry rdev_recovery_start = __ATTR(recovery_start, S_IRUGO|S_IWUSR, recovery_start_show, recovery_start_store); /* sysfs access to bad-blocks list. * We present two files. * 'bad-blocks' lists sector numbers and lengths of ranges that * are recorded as bad. The list is truncated to fit within * the one-page limit of sysfs. * Writing "sector length" to this file adds an acknowledged * bad block list. * 'unacknowledged-bad-blocks' lists bad blocks that have not yet * been acknowledged. Writing to this file adds bad blocks * without acknowledging them. This is largely for testing. */ static ssize_t bb_show(struct md_rdev *rdev, char *page) { return badblocks_show(&rdev->badblocks, page, 0); } static ssize_t bb_store(struct md_rdev *rdev, const char *page, size_t len) { int rv = badblocks_store(&rdev->badblocks, page, len, 0); /* Maybe that ack was all we needed */ if (test_and_clear_bit(BlockedBadBlocks, &rdev->flags)) wake_up(&rdev->blocked_wait); return rv; } static struct rdev_sysfs_entry rdev_bad_blocks = __ATTR(bad_blocks, S_IRUGO|S_IWUSR, bb_show, bb_store); static ssize_t ubb_show(struct md_rdev *rdev, char *page) { return badblocks_show(&rdev->badblocks, page, 1); } static ssize_t ubb_store(struct md_rdev *rdev, const char *page, size_t len) { return badblocks_store(&rdev->badblocks, page, len, 1); } static struct rdev_sysfs_entry rdev_unack_bad_blocks = __ATTR(unacknowledged_bad_blocks, S_IRUGO|S_IWUSR, ubb_show, ubb_store); static ssize_t ppl_sector_show(struct md_rdev *rdev, char *page) { return sprintf(page, "%llu\n", (unsigned long long)rdev->ppl.sector); } static ssize_t ppl_sector_store(struct md_rdev *rdev, const char *buf, size_t len) { unsigned long long sector; if (kstrtoull(buf, 10, &sector) < 0) return -EINVAL; if (sector != (sector_t)sector) return -EINVAL; if (rdev->mddev->pers && test_bit(MD_HAS_PPL, &rdev->mddev->flags) && rdev->raid_disk >= 0) return -EBUSY; if (rdev->mddev->persistent) { if (rdev->mddev->major_version == 0) return -EINVAL; if ((sector > rdev->sb_start && sector - rdev->sb_start > S16_MAX) || (sector < rdev->sb_start && rdev->sb_start - sector > -S16_MIN)) return -EINVAL; rdev->ppl.offset = sector - rdev->sb_start; } else if (!rdev->mddev->external) { return -EBUSY; } rdev->ppl.sector = sector; return len; } static struct rdev_sysfs_entry rdev_ppl_sector = __ATTR(ppl_sector, S_IRUGO|S_IWUSR, ppl_sector_show, ppl_sector_store); static ssize_t ppl_size_show(struct md_rdev *rdev, char *page) { return sprintf(page, "%u\n", rdev->ppl.size); } static ssize_t ppl_size_store(struct md_rdev *rdev, const char *buf, size_t len) { unsigned int size; if (kstrtouint(buf, 10, &size) < 0) return -EINVAL; if (rdev->mddev->pers && test_bit(MD_HAS_PPL, &rdev->mddev->flags) && rdev->raid_disk >= 0) return -EBUSY; if (rdev->mddev->persistent) { if (rdev->mddev->major_version == 0) return -EINVAL; if (size > U16_MAX) return -EINVAL; } else if (!rdev->mddev->external) { return -EBUSY; } rdev->ppl.size = size; return len; } static struct rdev_sysfs_entry rdev_ppl_size = __ATTR(ppl_size, S_IRUGO|S_IWUSR, ppl_size_show, ppl_size_store); static struct attribute *rdev_default_attrs[] = { &rdev_state.attr, &rdev_errors.attr, &rdev_slot.attr, &rdev_offset.attr, &rdev_new_offset.attr, &rdev_size.attr, &rdev_recovery_start.attr, &rdev_bad_blocks.attr, &rdev_unack_bad_blocks.attr, &rdev_ppl_sector.attr, &rdev_ppl_size.attr, NULL, }; static ssize_t rdev_attr_show(struct kobject *kobj, struct attribute *attr, char *page) { struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr); struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj); if (!entry->show) return -EIO; if (!rdev->mddev) return -EBUSY; return entry->show(rdev, page); } static ssize_t rdev_attr_store(struct kobject *kobj, struct attribute *attr, const char *page, size_t length) { struct rdev_sysfs_entry *entry = container_of(attr, struct rdev_sysfs_entry, attr); struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj); ssize_t rv; struct mddev *mddev = rdev->mddev; if (!entry->store) return -EIO; if (!capable(CAP_SYS_ADMIN)) return -EACCES; rv = mddev ? mddev_lock(mddev): -EBUSY; if (!rv) { if (rdev->mddev == NULL) rv = -EBUSY; else rv = entry->store(rdev, page, length); mddev_unlock(mddev); } return rv; } static void rdev_free(struct kobject *ko) { struct md_rdev *rdev = container_of(ko, struct md_rdev, kobj); kfree(rdev); } static const struct sysfs_ops rdev_sysfs_ops = { .show = rdev_attr_show, .store = rdev_attr_store, }; static struct kobj_type rdev_ktype = { .release = rdev_free, .sysfs_ops = &rdev_sysfs_ops, .default_attrs = rdev_default_attrs, }; int md_rdev_init(struct md_rdev *rdev) { rdev->desc_nr = -1; rdev->saved_raid_disk = -1; rdev->raid_disk = -1; rdev->flags = 0; rdev->data_offset = 0; rdev->new_data_offset = 0; rdev->sb_events = 0; rdev->last_read_error = 0; rdev->sb_loaded = 0; rdev->bb_page = NULL; atomic_set(&rdev->nr_pending, 0); atomic_set(&rdev->read_errors, 0); atomic_set(&rdev->corrected_errors, 0); INIT_LIST_HEAD(&rdev->same_set); init_waitqueue_head(&rdev->blocked_wait); /* Add space to store bad block list. * This reserves the space even on arrays where it cannot * be used - I wonder if that matters */ return badblocks_init(&rdev->badblocks, 0); } EXPORT_SYMBOL_GPL(md_rdev_init); /* * Import a device. If 'super_format' >= 0, then sanity check the superblock * * mark the device faulty if: * * - the device is nonexistent (zero size) * - the device has no valid superblock * * a faulty rdev _never_ has rdev->sb set. */ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int super_minor) { char b[BDEVNAME_SIZE]; int err; struct md_rdev *rdev; sector_t size; rdev = kzalloc(sizeof(*rdev), GFP_KERNEL); if (!rdev) return ERR_PTR(-ENOMEM); err = md_rdev_init(rdev); if (err) goto abort_free; err = alloc_disk_sb(rdev); if (err) goto abort_free; err = lock_rdev(rdev, newdev, super_format == -2); if (err) goto abort_free; kobject_init(&rdev->kobj, &rdev_ktype); size = i_size_read(rdev->bdev->bd_inode) >> BLOCK_SIZE_BITS; if (!size) { pr_warn("md: %s has zero or unknown size, marking faulty!\n", bdevname(rdev->bdev,b)); err = -EINVAL; goto abort_free; } if (super_format >= 0) { err = super_types[super_format]. load_super(rdev, NULL, super_minor); if (err == -EINVAL) { pr_warn("md: %s does not have a valid v%d.%d superblock, not importing!\n", bdevname(rdev->bdev,b), super_format, super_minor); goto abort_free; } if (err < 0) { pr_warn("md: could not read %s's sb, not importing!\n", bdevname(rdev->bdev,b)); goto abort_free; } } return rdev; abort_free: if (rdev->bdev) unlock_rdev(rdev); md_rdev_clear(rdev); kfree(rdev); return ERR_PTR(err); } /* * Check a full RAID array for plausibility */ static void analyze_sbs(struct mddev *mddev) { int i; struct md_rdev *rdev, *freshest, *tmp; char b[BDEVNAME_SIZE]; freshest = NULL; rdev_for_each_safe(rdev, tmp, mddev) switch (super_types[mddev->major_version]. load_super(rdev, freshest, mddev->minor_version)) { case 1: freshest = rdev; break; case 0: break; default: pr_warn("md: fatal superblock inconsistency in %s -- removing from array\n", bdevname(rdev->bdev,b)); md_kick_rdev_from_array(rdev); } super_types[mddev->major_version]. validate_super(mddev, freshest); i = 0; rdev_for_each_safe(rdev, tmp, mddev) { if (mddev->max_disks && (rdev->desc_nr >= mddev->max_disks || i > mddev->max_disks)) { pr_warn("md: %s: %s: only %d devices permitted\n", mdname(mddev), bdevname(rdev->bdev, b), mddev->max_disks); md_kick_rdev_from_array(rdev); continue; } if (rdev != freshest) { if (super_types[mddev->major_version]. validate_super(mddev, rdev)) { pr_warn("md: kicking non-fresh %s from array!\n", bdevname(rdev->bdev,b)); md_kick_rdev_from_array(rdev); continue; } } if (mddev->level == LEVEL_MULTIPATH) { rdev->desc_nr = i++; rdev->raid_disk = rdev->desc_nr; set_bit(In_sync, &rdev->flags); } else if (rdev->raid_disk >= (mddev->raid_disks - min(0, mddev->delta_disks)) && !test_bit(Journal, &rdev->flags)) { rdev->raid_disk = -1; clear_bit(In_sync, &rdev->flags); } } } /* Read a fixed-point number. * Numbers in sysfs attributes should be in "standard" units where * possible, so time should be in seconds. * However we internally use a a much smaller unit such as * milliseconds or jiffies. * This function takes a decimal number with a possible fractional * component, and produces an integer which is the result of * multiplying that number by 10^'scale'. * all without any floating-point arithmetic. */ int strict_strtoul_scaled(const char *cp, unsigned long *res, int scale) { unsigned long result = 0; long decimals = -1; while (isdigit(*cp) || (*cp == '.' && decimals < 0)) { if (*cp == '.') decimals = 0; else if (decimals < scale) { unsigned int value; value = *cp - '0'; result = result * 10 + value; if (decimals >= 0) decimals++; } cp++; } if (*cp == '\n') cp++; if (*cp) return -EINVAL; if (decimals < 0) decimals = 0; while (decimals < scale) { result *= 10; decimals ++; } *res = result; return 0; } static ssize_t safe_delay_show(struct mddev *mddev, char *page) { int msec = (mddev->safemode_delay*1000)/HZ; return sprintf(page, "%d.%03d\n", msec/1000, msec%1000); } static ssize_t safe_delay_store(struct mddev *mddev, const char *cbuf, size_t len) { unsigned long msec; if (mddev_is_clustered(mddev)) { pr_warn("md: Safemode is disabled for clustered mode\n"); return -EINVAL; } if (strict_strtoul_scaled(cbuf, &msec, 3) < 0) return -EINVAL; if (msec == 0) mddev->safemode_delay = 0; else { unsigned long old_delay = mddev->safemode_delay; unsigned long new_delay = (msec*HZ)/1000; if (new_delay == 0) new_delay = 1; mddev->safemode_delay = new_delay; if (new_delay < old_delay || old_delay == 0) mod_timer(&mddev->safemode_timer, jiffies+1); } return len; } static struct md_sysfs_entry md_safe_delay = __ATTR(safe_mode_delay, S_IRUGO|S_IWUSR,safe_delay_show, safe_delay_store); static ssize_t level_show(struct mddev *mddev, char *page) { struct md_personality *p; int ret; spin_lock(&mddev->lock); p = mddev->pers; if (p) ret = sprintf(page, "%s\n", p->name); else if (mddev->clevel[0]) ret = sprintf(page, "%s\n", mddev->clevel); else if (mddev->level != LEVEL_NONE) ret = sprintf(page, "%d\n", mddev->level); else ret = 0; spin_unlock(&mddev->lock); return ret; } static ssize_t level_store(struct mddev *mddev, const char *buf, size_t len) { char clevel[16]; ssize_t rv; size_t slen = len; struct md_personality *pers, *oldpers; long level; void *priv, *oldpriv; struct md_rdev *rdev; if (slen == 0 || slen >= sizeof(clevel)) return -EINVAL; rv = mddev_lock(mddev); if (rv) return rv; if (mddev->pers == NULL) { strncpy(mddev->clevel, buf, slen); if (mddev->clevel[slen-1] == '\n') slen--; mddev->clevel[slen] = 0; mddev->level = LEVEL_NONE; rv = len; goto out_unlock; } rv = -EROFS; if (mddev->ro) goto out_unlock; /* request to change the personality. Need to ensure: * - array is not engaged in resync/recovery/reshape * - old personality can be suspended * - new personality will access other array. */ rv = -EBUSY; if (mddev->sync_thread || test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || mddev->reshape_position != MaxSector || mddev->sysfs_active) goto out_unlock; rv = -EINVAL; if (!mddev->pers->quiesce) { pr_warn("md: %s: %s does not support online personality change\n", mdname(mddev), mddev->pers->name); goto out_unlock; } /* Now find the new personality */ strncpy(clevel, buf, slen); if (clevel[slen-1] == '\n') slen--; clevel[slen] = 0; if (kstrtol(clevel, 10, &level)) level = LEVEL_NONE; if (request_module("md-%s", clevel) != 0) request_module("md-level-%s", clevel); spin_lock(&pers_lock); pers = find_pers(level, clevel); if (!pers || !try_module_get(pers->owner)) { spin_unlock(&pers_lock); pr_warn("md: personality %s not loaded\n", clevel); rv = -EINVAL; goto out_unlock; } spin_unlock(&pers_lock); if (pers == mddev->pers) { /* Nothing to do! */ module_put(pers->owner); rv = len; goto out_unlock; } if (!pers->takeover) { module_put(pers->owner); pr_warn("md: %s: %s does not support personality takeover\n", mdname(mddev), clevel); rv = -EINVAL; goto out_unlock; } rdev_for_each(rdev, mddev) rdev->new_raid_disk = rdev->raid_disk; /* ->takeover must set new_* and/or delta_disks * if it succeeds, and may set them when it fails. */ priv = pers->takeover(mddev); if (IS_ERR(priv)) { mddev->new_level = mddev->level; mddev->new_layout = mddev->layout; mddev->new_chunk_sectors = mddev->chunk_sectors; mddev->raid_disks -= mddev->delta_disks; mddev->delta_disks = 0; mddev->reshape_backwards = 0; module_put(pers->owner); pr_warn("md: %s: %s would not accept array\n", mdname(mddev), clevel); rv = PTR_ERR(priv); goto out_unlock; } /* Looks like we have a winner */ mddev_suspend(mddev); mddev_detach(mddev); spin_lock(&mddev->lock); oldpers = mddev->pers; oldpriv = mddev->private; mddev->pers = pers; mddev->private = priv; strlcpy(mddev->clevel, pers->name, sizeof(mddev->clevel)); mddev->level = mddev->new_level; mddev->layout = mddev->new_layout; mddev->chunk_sectors = mddev->new_chunk_sectors; mddev->delta_disks = 0; mddev->reshape_backwards = 0; mddev->degraded = 0; spin_unlock(&mddev->lock); if (oldpers->sync_request == NULL && mddev->external) { /* We are converting from a no-redundancy array * to a redundancy array and metadata is managed * externally so we need to be sure that writes * won't block due to a need to transition * clean->dirty * until external management is started. */ mddev->in_sync = 0; mddev->safemode_delay = 0; mddev->safemode = 0; } oldpers->free(mddev, oldpriv); if (oldpers->sync_request == NULL && pers->sync_request != NULL) { /* need to add the md_redundancy_group */ if (sysfs_create_group(&mddev->kobj, &md_redundancy_group)) pr_warn("md: cannot register extra attributes for %s\n", mdname(mddev)); mddev->sysfs_action = sysfs_get_dirent(mddev->kobj.sd, "sync_action"); } if (oldpers->sync_request != NULL && pers->sync_request == NULL) { /* need to remove the md_redundancy_group */ if (mddev->to_remove == NULL) mddev->to_remove = &md_redundancy_group; } module_put(oldpers->owner); rdev_for_each(rdev, mddev) { if (rdev->raid_disk < 0) continue; if (rdev->new_raid_disk >= mddev->raid_disks) rdev->new_raid_disk = -1; if (rdev->new_raid_disk == rdev->raid_disk) continue; sysfs_unlink_rdev(mddev, rdev); } rdev_for_each(rdev, mddev) { if (rdev->raid_disk < 0) continue; if (rdev->new_raid_disk == rdev->raid_disk) continue; rdev->raid_disk = rdev->new_raid_disk; if (rdev->raid_disk < 0) clear_bit(In_sync, &rdev->flags); else { if (sysfs_link_rdev(mddev, rdev)) pr_warn("md: cannot register rd%d for %s after level change\n", rdev->raid_disk, mdname(mddev)); } } if (pers->sync_request == NULL) { /* this is now an array without redundancy, so * it must always be in_sync */ mddev->in_sync = 1; del_timer_sync(&mddev->safemode_timer); } blk_set_stacking_limits(&mddev->queue->limits); pers->run(mddev); set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); mddev_resume(mddev); if (!mddev->thread) md_update_sb(mddev, 1); sysfs_notify(&mddev->kobj, NULL, "level"); md_new_event(mddev); rv = len; out_unlock: mddev_unlock(mddev); return rv; } static struct md_sysfs_entry md_level = __ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store); static ssize_t layout_show(struct mddev *mddev, char *page) { /* just a number, not meaningful for all levels */ if (mddev->reshape_position != MaxSector && mddev->layout != mddev->new_layout) return sprintf(page, "%d (%d)\n", mddev->new_layout, mddev->layout); return sprintf(page, "%d\n", mddev->layout); } static ssize_t layout_store(struct mddev *mddev, const char *buf, size_t len) { unsigned int n; int err; err = kstrtouint(buf, 10, &n); if (err < 0) return err; err = mddev_lock(mddev); if (err) return err; if (mddev->pers) { if (mddev->pers->check_reshape == NULL) err = -EBUSY; else if (mddev->ro) err = -EROFS; else { mddev->new_layout = n; err = mddev->pers->check_reshape(mddev); if (err) mddev->new_layout = mddev->layout; } } else { mddev->new_layout = n; if (mddev->reshape_position == MaxSector) mddev->layout = n; } mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_layout = __ATTR(layout, S_IRUGO|S_IWUSR, layout_show, layout_store); static ssize_t raid_disks_show(struct mddev *mddev, char *page) { if (mddev->raid_disks == 0) return 0; if (mddev->reshape_position != MaxSector && mddev->delta_disks != 0) return sprintf(page, "%d (%d)\n", mddev->raid_disks, mddev->raid_disks - mddev->delta_disks); return sprintf(page, "%d\n", mddev->raid_disks); } static int update_raid_disks(struct mddev *mddev, int raid_disks); static ssize_t raid_disks_store(struct mddev *mddev, const char *buf, size_t len) { unsigned int n; int err; err = kstrtouint(buf, 10, &n); if (err < 0) return err; err = mddev_lock(mddev); if (err) return err; if (mddev->pers) err = update_raid_disks(mddev, n); else if (mddev->reshape_position != MaxSector) { struct md_rdev *rdev; int olddisks = mddev->raid_disks - mddev->delta_disks; err = -EINVAL; rdev_for_each(rdev, mddev) { if (olddisks < n && rdev->data_offset < rdev->new_data_offset) goto out_unlock; if (olddisks > n && rdev->data_offset > rdev->new_data_offset) goto out_unlock; } err = 0; mddev->delta_disks = n - olddisks; mddev->raid_disks = n; mddev->reshape_backwards = (mddev->delta_disks < 0); } else mddev->raid_disks = n; out_unlock: mddev_unlock(mddev); return err ? err : len; } static struct md_sysfs_entry md_raid_disks = __ATTR(raid_disks, S_IRUGO|S_IWUSR, raid_disks_show, raid_disks_store); static ssize_t chunk_size_show(struct mddev *mddev, char *page) { if (mddev->reshape_position != MaxSector && mddev->chunk_sectors != mddev->new_chunk_sectors) return sprintf(page, "%d (%d)\n", mddev->new_chunk_sectors << 9, mddev->chunk_sectors << 9); return sprintf(page, "%d\n", mddev->chunk_sectors << 9); } static ssize_t chunk_size_store(struct mddev *mddev, const char *buf, size_t len) { unsigned long n; int err; err = kstrtoul(buf, 10, &n); if (err < 0) return err; err = mddev_lock(mddev); if (err) return err; if (mddev->pers) { if (mddev->pers->check_reshape == NULL) err = -EBUSY; else if (mddev->ro) err = -EROFS; else { mddev->new_chunk_sectors = n >> 9; err = mddev->pers->check_reshape(mddev); if (err) mddev->new_chunk_sectors = mddev->chunk_sectors; } } else { mddev->new_chunk_sectors = n >> 9; if (mddev->reshape_position == MaxSector) mddev->chunk_sectors = n >> 9; } mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_chunk_size = __ATTR(chunk_size, S_IRUGO|S_IWUSR, chunk_size_show, chunk_size_store); static ssize_t resync_start_show(struct mddev *mddev, char *page) { if (mddev->recovery_cp == MaxSector) return sprintf(page, "none\n"); return sprintf(page, "%llu\n", (unsigned long long)mddev->recovery_cp); } static ssize_t resync_start_store(struct mddev *mddev, const char *buf, size_t len) { unsigned long long n; int err; if (cmd_match(buf, "none")) n = MaxSector; else { err = kstrtoull(buf, 10, &n); if (err < 0) return err; if (n != (sector_t)n) return -EINVAL; } err = mddev_lock(mddev); if (err) return err; if (mddev->pers && !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) err = -EBUSY; if (!err) { mddev->recovery_cp = n; if (mddev->pers) set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); } mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_resync_start = __ATTR_PREALLOC(resync_start, S_IRUGO|S_IWUSR, resync_start_show, resync_start_store); /* * The array state can be: * * clear * No devices, no size, no level * Equivalent to STOP_ARRAY ioctl * inactive * May have some settings, but array is not active * all IO results in error * When written, doesn't tear down array, but just stops it * suspended (not supported yet) * All IO requests will block. The array can be reconfigured. * Writing this, if accepted, will block until array is quiescent * readonly * no resync can happen. no superblocks get written. * write requests fail * read-auto * like readonly, but behaves like 'clean' on a write request. * * clean - no pending writes, but otherwise active. * When written to inactive array, starts without resync * If a write request arrives then * if metadata is known, mark 'dirty' and switch to 'active'. * if not known, block and switch to write-pending * If written to an active array that has pending writes, then fails. * active * fully active: IO and resync can be happening. * When written to inactive array, starts with resync * * write-pending * clean, but writes are blocked waiting for 'active' to be written. * * active-idle * like active, but no writes have been seen for a while (100msec). * */ enum array_state { clear, inactive, suspended, readonly, read_auto, clean, active, write_pending, active_idle, bad_word}; static char *array_states[] = { "clear", "inactive", "suspended", "readonly", "read-auto", "clean", "active", "write-pending", "active-idle", NULL }; static int match_word(const char *word, char **list) { int n; for (n=0; list[n]; n++) if (cmd_match(word, list[n])) break; return n; } static ssize_t array_state_show(struct mddev *mddev, char *page) { enum array_state st = inactive; if (mddev->pers && !test_bit(MD_NOT_READY, &mddev->flags)) switch(mddev->ro) { case 1: st = readonly; break; case 2: st = read_auto; break; case 0: spin_lock(&mddev->lock); if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) st = write_pending; else if (mddev->in_sync) st = clean; else if (mddev->safemode) st = active_idle; else st = active; spin_unlock(&mddev->lock); } else { if (list_empty(&mddev->disks) && mddev->raid_disks == 0 && mddev->dev_sectors == 0) st = clear; else st = inactive; } return sprintf(page, "%s\n", array_states[st]); } static int do_md_stop(struct mddev *mddev, int ro, struct block_device *bdev); static int md_set_readonly(struct mddev *mddev, struct block_device *bdev); static int do_md_run(struct mddev *mddev); static int restart_array(struct mddev *mddev); static ssize_t array_state_store(struct mddev *mddev, const char *buf, size_t len) { int err = 0; enum array_state st = match_word(buf, array_states); if (mddev->pers && (st == active || st == clean) && mddev->ro != 1) { /* don't take reconfig_mutex when toggling between * clean and active */ spin_lock(&mddev->lock); if (st == active) { restart_array(mddev); clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); md_wakeup_thread(mddev->thread); wake_up(&mddev->sb_wait); } else /* st == clean */ { restart_array(mddev); if (!set_in_sync(mddev)) err = -EBUSY; } if (!err) sysfs_notify_dirent_safe(mddev->sysfs_state); spin_unlock(&mddev->lock); return err ?: len; } err = mddev_lock(mddev); if (err) return err; err = -EINVAL; switch(st) { case bad_word: break; case clear: /* stopping an active array */ err = do_md_stop(mddev, 0, NULL); break; case inactive: /* stopping an active array */ if (mddev->pers) err = do_md_stop(mddev, 2, NULL); else err = 0; /* already inactive */ break; case suspended: break; /* not supported yet */ case readonly: if (mddev->pers) err = md_set_readonly(mddev, NULL); else { mddev->ro = 1; set_disk_ro(mddev->gendisk, 1); err = do_md_run(mddev); } break; case read_auto: if (mddev->pers) { if (mddev->ro == 0) err = md_set_readonly(mddev, NULL); else if (mddev->ro == 1) err = restart_array(mddev); if (err == 0) { mddev->ro = 2; set_disk_ro(mddev->gendisk, 0); } } else { mddev->ro = 2; err = do_md_run(mddev); } break; case clean: if (mddev->pers) { err = restart_array(mddev); if (err) break; spin_lock(&mddev->lock); if (!set_in_sync(mddev)) err = -EBUSY; spin_unlock(&mddev->lock); } else err = -EINVAL; break; case active: if (mddev->pers) { err = restart_array(mddev); if (err) break; clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); wake_up(&mddev->sb_wait); err = 0; } else { mddev->ro = 0; set_disk_ro(mddev->gendisk, 0); err = do_md_run(mddev); } break; case write_pending: case active_idle: /* these cannot be set */ break; } if (!err) { if (mddev->hold_active == UNTIL_IOCTL) mddev->hold_active = 0; sysfs_notify_dirent_safe(mddev->sysfs_state); } mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_array_state = __ATTR_PREALLOC(array_state, S_IRUGO|S_IWUSR, array_state_show, array_state_store); static ssize_t max_corrected_read_errors_show(struct mddev *mddev, char *page) { return sprintf(page, "%d\n", atomic_read(&mddev->max_corr_read_errors)); } static ssize_t max_corrected_read_errors_store(struct mddev *mddev, const char *buf, size_t len) { unsigned int n; int rv; rv = kstrtouint(buf, 10, &n); if (rv < 0) return rv; atomic_set(&mddev->max_corr_read_errors, n); return len; } static struct md_sysfs_entry max_corr_read_errors = __ATTR(max_read_errors, S_IRUGO|S_IWUSR, max_corrected_read_errors_show, max_corrected_read_errors_store); static ssize_t null_show(struct mddev *mddev, char *page) { return -EINVAL; } static ssize_t new_dev_store(struct mddev *mddev, const char *buf, size_t len) { /* buf must be %d:%d\n? giving major and minor numbers */ /* The new device is added to the array. * If the array has a persistent superblock, we read the * superblock to initialise info and check validity. * Otherwise, only checking done is that in bind_rdev_to_array, * which mainly checks size. */ char *e; int major = simple_strtoul(buf, &e, 10); int minor; dev_t dev; struct md_rdev *rdev; int err; if (!*buf || *e != ':' || !e[1] || e[1] == '\n') return -EINVAL; minor = simple_strtoul(e+1, &e, 10); if (*e && *e != '\n') return -EINVAL; dev = MKDEV(major, minor); if (major != MAJOR(dev) || minor != MINOR(dev)) return -EOVERFLOW; flush_workqueue(md_misc_wq); err = mddev_lock(mddev); if (err) return err; if (mddev->persistent) { rdev = md_import_device(dev, mddev->major_version, mddev->minor_version); if (!IS_ERR(rdev) && !list_empty(&mddev->disks)) { struct md_rdev *rdev0 = list_entry(mddev->disks.next, struct md_rdev, same_set); err = super_types[mddev->major_version] .load_super(rdev, rdev0, mddev->minor_version); if (err < 0) goto out; } } else if (mddev->external) rdev = md_import_device(dev, -2, -1); else rdev = md_import_device(dev, -1, -1); if (IS_ERR(rdev)) { mddev_unlock(mddev); return PTR_ERR(rdev); } err = bind_rdev_to_array(rdev, mddev); out: if (err) export_rdev(rdev); mddev_unlock(mddev); if (!err) md_new_event(mddev); return err ? err : len; } static struct md_sysfs_entry md_new_device = __ATTR(new_dev, S_IWUSR, null_show, new_dev_store); static ssize_t bitmap_store(struct mddev *mddev, const char *buf, size_t len) { char *end; unsigned long chunk, end_chunk; int err; err = mddev_lock(mddev); if (err) return err; if (!mddev->bitmap) goto out; /* buf should be <chunk> <chunk> ... or <chunk>-<chunk> ... (range) */ while (*buf) { chunk = end_chunk = simple_strtoul(buf, &end, 0); if (buf == end) break; if (*end == '-') { /* range */ buf = end + 1; end_chunk = simple_strtoul(buf, &end, 0); if (buf == end) break; } if (*end && !isspace(*end)) break; md_bitmap_dirty_bits(mddev->bitmap, chunk, end_chunk); buf = skip_spaces(end); } md_bitmap_unplug(mddev->bitmap); /* flush the bits to disk */ out: mddev_unlock(mddev); return len; } static struct md_sysfs_entry md_bitmap = __ATTR(bitmap_set_bits, S_IWUSR, null_show, bitmap_store); static ssize_t size_show(struct mddev *mddev, char *page) { return sprintf(page, "%llu\n", (unsigned long long)mddev->dev_sectors / 2); } static int update_size(struct mddev *mddev, sector_t num_sectors); static ssize_t size_store(struct mddev *mddev, const char *buf, size_t len) { /* If array is inactive, we can reduce the component size, but * not increase it (except from 0). * If array is active, we can try an on-line resize */ sector_t sectors; int err = strict_blocks_to_sectors(buf, &sectors); if (err < 0) return err; err = mddev_lock(mddev); if (err) return err; if (mddev->pers) { err = update_size(mddev, sectors); if (err == 0) md_update_sb(mddev, 1); } else { if (mddev->dev_sectors == 0 || mddev->dev_sectors > sectors) mddev->dev_sectors = sectors; else err = -ENOSPC; } mddev_unlock(mddev); return err ? err : len; } static struct md_sysfs_entry md_size = __ATTR(component_size, S_IRUGO|S_IWUSR, size_show, size_store); /* Metadata version. * This is one of * 'none' for arrays with no metadata (good luck...) * 'external' for arrays with externally managed metadata, * or N.M for internally known formats */ static ssize_t metadata_show(struct mddev *mddev, char *page) { if (mddev->persistent) return sprintf(page, "%d.%d\n", mddev->major_version, mddev->minor_version); else if (mddev->external) return sprintf(page, "external:%s\n", mddev->metadata_type); else return sprintf(page, "none\n"); } static ssize_t metadata_store(struct mddev *mddev, const char *buf, size_t len) { int major, minor; char *e; int err; /* Changing the details of 'external' metadata is * always permitted. Otherwise there must be * no devices attached to the array. */ err = mddev_lock(mddev); if (err) return err; err = -EBUSY; if (mddev->external && strncmp(buf, "external:", 9) == 0) ; else if (!list_empty(&mddev->disks)) goto out_unlock; err = 0; if (cmd_match(buf, "none")) { mddev->persistent = 0; mddev->external = 0; mddev->major_version = 0; mddev->minor_version = 90; goto out_unlock; } if (strncmp(buf, "external:", 9) == 0) { size_t namelen = len-9; if (namelen >= sizeof(mddev->metadata_type)) namelen = sizeof(mddev->metadata_type)-1; strncpy(mddev->metadata_type, buf+9, namelen); mddev->metadata_type[namelen] = 0; if (namelen && mddev->metadata_type[namelen-1] == '\n') mddev->metadata_type[--namelen] = 0; mddev->persistent = 0; mddev->external = 1; mddev->major_version = 0; mddev->minor_version = 90; goto out_unlock; } major = simple_strtoul(buf, &e, 10); err = -EINVAL; if (e==buf || *e != '.') goto out_unlock; buf = e+1; minor = simple_strtoul(buf, &e, 10); if (e==buf || (*e && *e != '\n') ) goto out_unlock; err = -ENOENT; if (major >= ARRAY_SIZE(super_types) || super_types[major].name == NULL) goto out_unlock; mddev->major_version = major; mddev->minor_version = minor; mddev->persistent = 1; mddev->external = 0; err = 0; out_unlock: mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_metadata = __ATTR_PREALLOC(metadata_version, S_IRUGO|S_IWUSR, metadata_show, metadata_store); static ssize_t action_show(struct mddev *mddev, char *page) { char *type = "idle"; unsigned long recovery = mddev->recovery; if (test_bit(MD_RECOVERY_FROZEN, &recovery)) type = "frozen"; else if (test_bit(MD_RECOVERY_RUNNING, &recovery) || (!mddev->ro && test_bit(MD_RECOVERY_NEEDED, &recovery))) { if (test_bit(MD_RECOVERY_RESHAPE, &recovery)) type = "reshape"; else if (test_bit(MD_RECOVERY_SYNC, &recovery)) { if (!test_bit(MD_RECOVERY_REQUESTED, &recovery)) type = "resync"; else if (test_bit(MD_RECOVERY_CHECK, &recovery)) type = "check"; else type = "repair"; } else if (test_bit(MD_RECOVERY_RECOVER, &recovery)) type = "recover"; else if (mddev->reshape_position != MaxSector) type = "reshape"; } return sprintf(page, "%s\n", type); } static ssize_t action_store(struct mddev *mddev, const char *page, size_t len) { if (!mddev->pers || !mddev->pers->sync_request) return -EINVAL; if (cmd_match(page, "idle") || cmd_match(page, "frozen")) { if (cmd_match(page, "frozen")) set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); else clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && mddev_lock(mddev) == 0) { flush_workqueue(md_misc_wq); if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); md_reap_sync_thread(mddev); } mddev_unlock(mddev); } } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) return -EBUSY; else if (cmd_match(page, "resync")) clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); else if (cmd_match(page, "recover")) { clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); } else if (cmd_match(page, "reshape")) { int err; if (mddev->pers->start_reshape == NULL) return -EINVAL; err = mddev_lock(mddev); if (!err) { if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) err = -EBUSY; else { clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); err = mddev->pers->start_reshape(mddev); } mddev_unlock(mddev); } if (err) return err; sysfs_notify(&mddev->kobj, NULL, "degraded"); } else { if (cmd_match(page, "check")) set_bit(MD_RECOVERY_CHECK, &mddev->recovery); else if (!cmd_match(page, "repair")) return -EINVAL; clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); set_bit(MD_RECOVERY_SYNC, &mddev->recovery); } if (mddev->ro == 2) { /* A write to sync_action is enough to justify * canceling read-auto mode */ mddev->ro = 0; md_wakeup_thread(mddev->sync_thread); } set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); sysfs_notify_dirent_safe(mddev->sysfs_action); return len; } static struct md_sysfs_entry md_scan_mode = __ATTR_PREALLOC(sync_action, S_IRUGO|S_IWUSR, action_show, action_store); static ssize_t last_sync_action_show(struct mddev *mddev, char *page) { return sprintf(page, "%s\n", mddev->last_sync_action); } static struct md_sysfs_entry md_last_scan_mode = __ATTR_RO(last_sync_action); static ssize_t mismatch_cnt_show(struct mddev *mddev, char *page) { return sprintf(page, "%llu\n", (unsigned long long) atomic64_read(&mddev->resync_mismatches)); } static struct md_sysfs_entry md_mismatches = __ATTR_RO(mismatch_cnt); static ssize_t sync_min_show(struct mddev *mddev, char *page) { return sprintf(page, "%d (%s)\n", speed_min(mddev), mddev->sync_speed_min ? "local": "system"); } static ssize_t sync_min_store(struct mddev *mddev, const char *buf, size_t len) { unsigned int min; int rv; if (strncmp(buf, "system", 6)==0) { min = 0; } else { rv = kstrtouint(buf, 10, &min); if (rv < 0) return rv; if (min == 0) return -EINVAL; } mddev->sync_speed_min = min; return len; } static struct md_sysfs_entry md_sync_min = __ATTR(sync_speed_min, S_IRUGO|S_IWUSR, sync_min_show, sync_min_store); static ssize_t sync_max_show(struct mddev *mddev, char *page) { return sprintf(page, "%d (%s)\n", speed_max(mddev), mddev->sync_speed_max ? "local": "system"); } static ssize_t sync_max_store(struct mddev *mddev, const char *buf, size_t len) { unsigned int max; int rv; if (strncmp(buf, "system", 6)==0) { max = 0; } else { rv = kstrtouint(buf, 10, &max); if (rv < 0) return rv; if (max == 0) return -EINVAL; } mddev->sync_speed_max = max; return len; } static struct md_sysfs_entry md_sync_max = __ATTR(sync_speed_max, S_IRUGO|S_IWUSR, sync_max_show, sync_max_store); static ssize_t degraded_show(struct mddev *mddev, char *page) { return sprintf(page, "%d\n", mddev->degraded); } static struct md_sysfs_entry md_degraded = __ATTR_RO(degraded); static ssize_t sync_force_parallel_show(struct mddev *mddev, char *page) { return sprintf(page, "%d\n", mddev->parallel_resync); } static ssize_t sync_force_parallel_store(struct mddev *mddev, const char *buf, size_t len) { long n; if (kstrtol(buf, 10, &n)) return -EINVAL; if (n != 0 && n != 1) return -EINVAL; mddev->parallel_resync = n; if (mddev->sync_thread) wake_up(&resync_wait); return len; } /* force parallel resync, even with shared block devices */ static struct md_sysfs_entry md_sync_force_parallel = __ATTR(sync_force_parallel, S_IRUGO|S_IWUSR, sync_force_parallel_show, sync_force_parallel_store); static ssize_t sync_speed_show(struct mddev *mddev, char *page) { unsigned long resync, dt, db; if (mddev->curr_resync == 0) return sprintf(page, "none\n"); resync = mddev->curr_mark_cnt - atomic_read(&mddev->recovery_active); dt = (jiffies - mddev->resync_mark) / HZ; if (!dt) dt++; db = resync - mddev->resync_mark_cnt; return sprintf(page, "%lu\n", db/dt/2); /* K/sec */ } static struct md_sysfs_entry md_sync_speed = __ATTR_RO(sync_speed); static ssize_t sync_completed_show(struct mddev *mddev, char *page) { unsigned long long max_sectors, resync; if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) return sprintf(page, "none\n"); if (mddev->curr_resync == 1 || mddev->curr_resync == 2) return sprintf(page, "delayed\n"); if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) || test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) max_sectors = mddev->resync_max_sectors; else max_sectors = mddev->dev_sectors; resync = mddev->curr_resync_completed; return sprintf(page, "%llu / %llu\n", resync, max_sectors); } static struct md_sysfs_entry md_sync_completed = __ATTR_PREALLOC(sync_completed, S_IRUGO, sync_completed_show, NULL); static ssize_t min_sync_show(struct mddev *mddev, char *page) { return sprintf(page, "%llu\n", (unsigned long long)mddev->resync_min); } static ssize_t min_sync_store(struct mddev *mddev, const char *buf, size_t len) { unsigned long long min; int err; if (kstrtoull(buf, 10, &min)) return -EINVAL; spin_lock(&mddev->lock); err = -EINVAL; if (min > mddev->resync_max) goto out_unlock; err = -EBUSY; if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) goto out_unlock; /* Round down to multiple of 4K for safety */ mddev->resync_min = round_down(min, 8); err = 0; out_unlock: spin_unlock(&mddev->lock); return err ?: len; } static struct md_sysfs_entry md_min_sync = __ATTR(sync_min, S_IRUGO|S_IWUSR, min_sync_show, min_sync_store); static ssize_t max_sync_show(struct mddev *mddev, char *page) { if (mddev->resync_max == MaxSector) return sprintf(page, "max\n"); else return sprintf(page, "%llu\n", (unsigned long long)mddev->resync_max); } static ssize_t max_sync_store(struct mddev *mddev, const char *buf, size_t len) { int err; spin_lock(&mddev->lock); if (strncmp(buf, "max", 3) == 0) mddev->resync_max = MaxSector; else { unsigned long long max; int chunk; err = -EINVAL; if (kstrtoull(buf, 10, &max)) goto out_unlock; if (max < mddev->resync_min) goto out_unlock; err = -EBUSY; if (max < mddev->resync_max && mddev->ro == 0 && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) goto out_unlock; /* Must be a multiple of chunk_size */ chunk = mddev->chunk_sectors; if (chunk) { sector_t temp = max; err = -EINVAL; if (sector_div(temp, chunk)) goto out_unlock; } mddev->resync_max = max; } wake_up(&mddev->recovery_wait); err = 0; out_unlock: spin_unlock(&mddev->lock); return err ?: len; } static struct md_sysfs_entry md_max_sync = __ATTR(sync_max, S_IRUGO|S_IWUSR, max_sync_show, max_sync_store); static ssize_t suspend_lo_show(struct mddev *mddev, char *page) { return sprintf(page, "%llu\n", (unsigned long long)mddev->suspend_lo); } static ssize_t suspend_lo_store(struct mddev *mddev, const char *buf, size_t len) { unsigned long long new; int err; err = kstrtoull(buf, 10, &new); if (err < 0) return err; if (new != (sector_t)new) return -EINVAL; err = mddev_lock(mddev); if (err) return err; err = -EINVAL; if (mddev->pers == NULL || mddev->pers->quiesce == NULL) goto unlock; mddev_suspend(mddev); mddev->suspend_lo = new; mddev_resume(mddev); err = 0; unlock: mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_suspend_lo = __ATTR(suspend_lo, S_IRUGO|S_IWUSR, suspend_lo_show, suspend_lo_store); static ssize_t suspend_hi_show(struct mddev *mddev, char *page) { return sprintf(page, "%llu\n", (unsigned long long)mddev->suspend_hi); } static ssize_t suspend_hi_store(struct mddev *mddev, const char *buf, size_t len) { unsigned long long new; int err; err = kstrtoull(buf, 10, &new); if (err < 0) return err; if (new != (sector_t)new) return -EINVAL; err = mddev_lock(mddev); if (err) return err; err = -EINVAL; if (mddev->pers == NULL) goto unlock; mddev_suspend(mddev); mddev->suspend_hi = new; mddev_resume(mddev); err = 0; unlock: mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_suspend_hi = __ATTR(suspend_hi, S_IRUGO|S_IWUSR, suspend_hi_show, suspend_hi_store); static ssize_t reshape_position_show(struct mddev *mddev, char *page) { if (mddev->reshape_position != MaxSector) return sprintf(page, "%llu\n", (unsigned long long)mddev->reshape_position); strcpy(page, "none\n"); return 5; } static ssize_t reshape_position_store(struct mddev *mddev, const char *buf, size_t len) { struct md_rdev *rdev; unsigned long long new; int err; err = kstrtoull(buf, 10, &new); if (err < 0) return err; if (new != (sector_t)new) return -EINVAL; err = mddev_lock(mddev); if (err) return err; err = -EBUSY; if (mddev->pers) goto unlock; mddev->reshape_position = new; mddev->delta_disks = 0; mddev->reshape_backwards = 0; mddev->new_level = mddev->level; mddev->new_layout = mddev->layout; mddev->new_chunk_sectors = mddev->chunk_sectors; rdev_for_each(rdev, mddev) rdev->new_data_offset = rdev->data_offset; err = 0; unlock: mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_reshape_position = __ATTR(reshape_position, S_IRUGO|S_IWUSR, reshape_position_show, reshape_position_store); static ssize_t reshape_direction_show(struct mddev *mddev, char *page) { return sprintf(page, "%s\n", mddev->reshape_backwards ? "backwards" : "forwards"); } static ssize_t reshape_direction_store(struct mddev *mddev, const char *buf, size_t len) { int backwards = 0; int err; if (cmd_match(buf, "forwards")) backwards = 0; else if (cmd_match(buf, "backwards")) backwards = 1; else return -EINVAL; if (mddev->reshape_backwards == backwards) return len; err = mddev_lock(mddev); if (err) return err; /* check if we are allowed to change */ if (mddev->delta_disks) err = -EBUSY; else if (mddev->persistent && mddev->major_version == 0) err = -EINVAL; else mddev->reshape_backwards = backwards; mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_reshape_direction = __ATTR(reshape_direction, S_IRUGO|S_IWUSR, reshape_direction_show, reshape_direction_store); static ssize_t array_size_show(struct mddev *mddev, char *page) { if (mddev->external_size) return sprintf(page, "%llu\n", (unsigned long long)mddev->array_sectors/2); else return sprintf(page, "default\n"); } static ssize_t array_size_store(struct mddev *mddev, const char *buf, size_t len) { sector_t sectors; int err; err = mddev_lock(mddev); if (err) return err; /* cluster raid doesn't support change array_sectors */ if (mddev_is_clustered(mddev)) { mddev_unlock(mddev); return -EINVAL; } if (strncmp(buf, "default", 7) == 0) { if (mddev->pers) sectors = mddev->pers->size(mddev, 0, 0); else sectors = mddev->array_sectors; mddev->external_size = 0; } else { if (strict_blocks_to_sectors(buf, &sectors) < 0) err = -EINVAL; else if (mddev->pers && mddev->pers->size(mddev, 0, 0) < sectors) err = -E2BIG; else mddev->external_size = 1; } if (!err) { mddev->array_sectors = sectors; if (mddev->pers) { set_capacity(mddev->gendisk, mddev->array_sectors); revalidate_disk(mddev->gendisk); } } mddev_unlock(mddev); return err ?: len; } static struct md_sysfs_entry md_array_size = __ATTR(array_size, S_IRUGO|S_IWUSR, array_size_show, array_size_store); static ssize_t consistency_policy_show(struct mddev *mddev, char *page) { int ret; if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) { ret = sprintf(page, "journal\n"); } else if (test_bit(MD_HAS_PPL, &mddev->flags)) { ret = sprintf(page, "ppl\n"); } else if (mddev->bitmap) { ret = sprintf(page, "bitmap\n"); } else if (mddev->pers) { if (mddev->pers->sync_request) ret = sprintf(page, "resync\n"); else ret = sprintf(page, "none\n"); } else { ret = sprintf(page, "unknown\n"); } return ret; } static ssize_t consistency_policy_store(struct mddev *mddev, const char *buf, size_t len) { int err = 0; if (mddev->pers) { if (mddev->pers->change_consistency_policy) err = mddev->pers->change_consistency_policy(mddev, buf); else err = -EBUSY; } else if (mddev->external && strncmp(buf, "ppl", 3) == 0) { set_bit(MD_HAS_PPL, &mddev->flags); } else { err = -EINVAL; } return err ? err : len; } static struct md_sysfs_entry md_consistency_policy = __ATTR(consistency_policy, S_IRUGO | S_IWUSR, consistency_policy_show, consistency_policy_store); static struct attribute *md_default_attrs[] = { &md_level.attr, &md_layout.attr, &md_raid_disks.attr, &md_chunk_size.attr, &md_size.attr, &md_resync_start.attr, &md_metadata.attr, &md_new_device.attr, &md_safe_delay.attr, &md_array_state.attr, &md_reshape_position.attr, &md_reshape_direction.attr, &md_array_size.attr, &max_corr_read_errors.attr, &md_consistency_policy.attr, NULL, }; static struct attribute *md_redundancy_attrs[] = { &md_scan_mode.attr, &md_last_scan_mode.attr, &md_mismatches.attr, &md_sync_min.attr, &md_sync_max.attr, &md_sync_speed.attr, &md_sync_force_parallel.attr, &md_sync_completed.attr, &md_min_sync.attr, &md_max_sync.attr, &md_suspend_lo.attr, &md_suspend_hi.attr, &md_bitmap.attr, &md_degraded.attr, NULL, }; static struct attribute_group md_redundancy_group = { .name = NULL, .attrs = md_redundancy_attrs, }; static ssize_t md_attr_show(struct kobject *kobj, struct attribute *attr, char *page) { struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr); struct mddev *mddev = container_of(kobj, struct mddev, kobj); ssize_t rv; if (!entry->show) return -EIO; spin_lock(&all_mddevs_lock); if (list_empty(&mddev->all_mddevs)) { spin_unlock(&all_mddevs_lock); return -EBUSY; } mddev_get(mddev); spin_unlock(&all_mddevs_lock); rv = entry->show(mddev, page); mddev_put(mddev); return rv; } static ssize_t md_attr_store(struct kobject *kobj, struct attribute *attr, const char *page, size_t length) { struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr); struct mddev *mddev = container_of(kobj, struct mddev, kobj); ssize_t rv; if (!entry->store) return -EIO; if (!capable(CAP_SYS_ADMIN)) return -EACCES; spin_lock(&all_mddevs_lock); if (list_empty(&mddev->all_mddevs)) { spin_unlock(&all_mddevs_lock); return -EBUSY; } mddev_get(mddev); spin_unlock(&all_mddevs_lock); rv = entry->store(mddev, page, length); mddev_put(mddev); return rv; } static void md_free(struct kobject *ko) { struct mddev *mddev = container_of(ko, struct mddev, kobj); if (mddev->sysfs_state) sysfs_put(mddev->sysfs_state); if (mddev->gendisk) del_gendisk(mddev->gendisk); if (mddev->queue) blk_cleanup_queue(mddev->queue); if (mddev->gendisk) put_disk(mddev->gendisk); percpu_ref_exit(&mddev->writes_pending); bioset_exit(&mddev->bio_set); bioset_exit(&mddev->sync_set); kfree(mddev); } static const struct sysfs_ops md_sysfs_ops = { .show = md_attr_show, .store = md_attr_store, }; static struct kobj_type md_ktype = { .release = md_free, .sysfs_ops = &md_sysfs_ops, .default_attrs = md_default_attrs, }; int mdp_major = 0; static void mddev_delayed_delete(struct work_struct *ws) { struct mddev *mddev = container_of(ws, struct mddev, del_work); sysfs_remove_group(&mddev->kobj, &md_bitmap_group); kobject_del(&mddev->kobj); kobject_put(&mddev->kobj); } static void no_op(struct percpu_ref *r) {} int mddev_init_writes_pending(struct mddev *mddev) { if (mddev->writes_pending.percpu_count_ptr) return 0; if (percpu_ref_init(&mddev->writes_pending, no_op, 0, GFP_KERNEL) < 0) return -ENOMEM; /* We want to start with the refcount at zero */ percpu_ref_put(&mddev->writes_pending); return 0; } EXPORT_SYMBOL_GPL(mddev_init_writes_pending); static int md_alloc(dev_t dev, char *name) { /* * If dev is zero, name is the name of a device to allocate with * an arbitrary minor number. It will be "md_???" * If dev is non-zero it must be a device number with a MAJOR of * MD_MAJOR or mdp_major. In this case, if "name" is NULL, then * the device is being created by opening a node in /dev. * If "name" is not NULL, the device is being created by * writing to /sys/module/md_mod/parameters/new_array. */ static DEFINE_MUTEX(disks_mutex); struct mddev *mddev = mddev_find_or_alloc(dev); struct gendisk *disk; int partitioned; int shift; int unit; int error; if (!mddev) return -ENODEV; partitioned = (MAJOR(mddev->unit) != MD_MAJOR); shift = partitioned ? MdpMinorShift : 0; unit = MINOR(mddev->unit) >> shift; /* wait for any previous instance of this device to be * completely removed (mddev_delayed_delete). */ flush_workqueue(md_misc_wq); mutex_lock(&disks_mutex); error = -EEXIST; if (mddev->gendisk) goto abort; if (name && !dev) { /* Need to ensure that 'name' is not a duplicate. */ struct mddev *mddev2; spin_lock(&all_mddevs_lock); list_for_each_entry(mddev2, &all_mddevs, all_mddevs) if (mddev2->gendisk && strcmp(mddev2->gendisk->disk_name, name) == 0) { spin_unlock(&all_mddevs_lock); goto abort; } spin_unlock(&all_mddevs_lock); } if (name && dev) /* * Creating /dev/mdNNN via "newarray", so adjust hold_active. */ mddev->hold_active = UNTIL_STOP; error = -ENOMEM; mddev->queue = blk_alloc_queue(GFP_KERNEL); if (!mddev->queue) goto abort; mddev->queue->queuedata = mddev; blk_queue_make_request(mddev->queue, md_make_request); blk_set_stacking_limits(&mddev->queue->limits); disk = alloc_disk(1 << shift); if (!disk) { blk_cleanup_queue(mddev->queue); mddev->queue = NULL; goto abort; } disk->major = MAJOR(mddev->unit); disk->first_minor = unit << shift; if (name) strcpy(disk->disk_name, name); else if (partitioned) sprintf(disk->disk_name, "md_d%d", unit); else sprintf(disk->disk_name, "md%d", unit); disk->fops = &md_fops; disk->private_data = mddev; disk->queue = mddev->queue; blk_queue_write_cache(mddev->queue, true, true); /* Allow extended partitions. This makes the * 'mdp' device redundant, but we can't really * remove it now. */ disk->flags |= GENHD_FL_EXT_DEVT; mddev->gendisk = disk; add_disk(disk); error = kobject_add(&mddev->kobj, &disk_to_dev(disk)->kobj, "%s", "md"); if (error) { /* This isn't possible, but as kobject_init_and_add is marked * __must_check, we must do something with the result */ pr_debug("md: cannot register %s/md - name in use\n", disk->disk_name); error = 0; } if (mddev->kobj.sd && sysfs_create_group(&mddev->kobj, &md_bitmap_group)) pr_debug("pointless warning\n"); abort: mutex_unlock(&disks_mutex); if (!error && mddev->kobj.sd) { kobject_uevent(&mddev->kobj, KOBJ_ADD); mddev->sysfs_state = sysfs_get_dirent_safe(mddev->kobj.sd, "array_state"); } mddev_put(mddev); return error; } static struct kobject *md_probe(dev_t dev, int *part, void *data) { if (create_on_open) md_alloc(dev, NULL); return NULL; } static int add_named_array(const char *val, const struct kernel_param *kp) { /* * val must be "md_*" or "mdNNN". * For "md_*" we allocate an array with a large free minor number, and * set the name to val. val must not already be an active name. * For "mdNNN" we allocate an array with the minor number NNN * which must not already be in use. */ int len = strlen(val); char buf[DISK_NAME_LEN]; unsigned long devnum; while (len && val[len-1] == '\n') len--; if (len >= DISK_NAME_LEN) return -E2BIG; strlcpy(buf, val, len+1); if (strncmp(buf, "md_", 3) == 0) return md_alloc(0, buf); if (strncmp(buf, "md", 2) == 0 && isdigit(buf[2]) && kstrtoul(buf+2, 10, &devnum) == 0 && devnum <= MINORMASK) return md_alloc(MKDEV(MD_MAJOR, devnum), NULL); return -EINVAL; } static void md_safemode_timeout(struct timer_list *t) { struct mddev *mddev = from_timer(mddev, t, safemode_timer); mddev->safemode = 1; if (mddev->external) sysfs_notify_dirent_safe(mddev->sysfs_state); md_wakeup_thread(mddev->thread); } static int start_dirty_degraded; int md_run(struct mddev *mddev) { int err; struct md_rdev *rdev; struct md_personality *pers; if (list_empty(&mddev->disks)) /* cannot run an array with no devices.. */ return -EINVAL; if (mddev->pers) return -EBUSY; /* Cannot run until previous stop completes properly */ if (mddev->sysfs_active) return -EBUSY; /* * Analyze all RAID superblock(s) */ if (!mddev->raid_disks) { if (!mddev->persistent) return -EINVAL; analyze_sbs(mddev); } if (mddev->level != LEVEL_NONE) request_module("md-level-%d", mddev->level); else if (mddev->clevel[0]) request_module("md-%s", mddev->clevel); /* * Drop all container device buffers, from now on * the only valid external interface is through the md * device. */ mddev->has_superblocks = false; rdev_for_each(rdev, mddev) { if (test_bit(Faulty, &rdev->flags)) continue; sync_blockdev(rdev->bdev); invalidate_bdev(rdev->bdev); if (mddev->ro != 1 && (bdev_read_only(rdev->bdev) || bdev_read_only(rdev->meta_bdev))) { mddev->ro = 1; if (mddev->gendisk) set_disk_ro(mddev->gendisk, 1); } if (rdev->sb_page) mddev->has_superblocks = true; /* perform some consistency tests on the device. * We don't want the data to overlap the metadata, * Internal Bitmap issues have been handled elsewhere. */ if (rdev->meta_bdev) { /* Nothing to check */; } else if (rdev->data_offset < rdev->sb_start) { if (mddev->dev_sectors && rdev->data_offset + mddev->dev_sectors > rdev->sb_start) { pr_warn("md: %s: data overlaps metadata\n", mdname(mddev)); return -EINVAL; } } else { if (rdev->sb_start + rdev->sb_size/512 > rdev->data_offset) { pr_warn("md: %s: metadata overlaps data\n", mdname(mddev)); return -EINVAL; } } sysfs_notify_dirent_safe(rdev->sysfs_state); } if (!bioset_initialized(&mddev->bio_set)) { err = bioset_init(&mddev->bio_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS); if (err) return err; } if (!bioset_initialized(&mddev->sync_set)) { err = bioset_init(&mddev->sync_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS); if (err) return err; } spin_lock(&pers_lock); pers = find_pers(mddev->level, mddev->clevel); if (!pers || !try_module_get(pers->owner)) { spin_unlock(&pers_lock); if (mddev->level != LEVEL_NONE) pr_warn("md: personality for level %d is not loaded!\n", mddev->level); else pr_warn("md: personality for level %s is not loaded!\n", mddev->clevel); err = -EINVAL; goto abort; } spin_unlock(&pers_lock); if (mddev->level != pers->level) { mddev->level = pers->level; mddev->new_level = pers->level; } strlcpy(mddev->clevel, pers->name, sizeof(mddev->clevel)); if (mddev->reshape_position != MaxSector && pers->start_reshape == NULL) { /* This personality cannot handle reshaping... */ module_put(pers->owner); err = -EINVAL; goto abort; } if (pers->sync_request) { /* Warn if this is a potentially silly * configuration. */ char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; struct md_rdev *rdev2; int warned = 0; rdev_for_each(rdev, mddev) rdev_for_each(rdev2, mddev) { if (rdev < rdev2 && rdev->bdev->bd_contains == rdev2->bdev->bd_contains) { pr_warn("%s: WARNING: %s appears to be on the same physical disk as %s.\n", mdname(mddev), bdevname(rdev->bdev,b), bdevname(rdev2->bdev,b2)); warned = 1; } } if (warned) pr_warn("True protection against single-disk failure might be compromised.\n"); } mddev->recovery = 0; /* may be over-ridden by personality */ mddev->resync_max_sectors = mddev->dev_sectors; mddev->ok_start_degraded = start_dirty_degraded; if (start_readonly && mddev->ro == 0) mddev->ro = 2; /* read-only, but switch on first write */ err = pers->run(mddev); if (err) pr_warn("md: pers->run() failed ...\n"); else if (pers->size(mddev, 0, 0) < mddev->array_sectors) { WARN_ONCE(!mddev->external_size, "%s: default size too small, but 'external_size' not in effect?\n", __func__); pr_warn("md: invalid array_size %llu > default size %llu\n", (unsigned long long)mddev->array_sectors / 2, (unsigned long long)pers->size(mddev, 0, 0) / 2); err = -EINVAL; } if (err == 0 && pers->sync_request && (mddev->bitmap_info.file || mddev->bitmap_info.offset)) { struct bitmap *bitmap; bitmap = md_bitmap_create(mddev, -1); if (IS_ERR(bitmap)) { err = PTR_ERR(bitmap); pr_warn("%s: failed to create bitmap (%d)\n", mdname(mddev), err); } else mddev->bitmap = bitmap; } if (err) { mddev_detach(mddev); if (mddev->private) pers->free(mddev, mddev->private); mddev->private = NULL; module_put(pers->owner); md_bitmap_destroy(mddev); goto abort; } if (mddev->queue) { bool nonrot = true; rdev_for_each(rdev, mddev) { if (rdev->raid_disk >= 0 && !blk_queue_nonrot(bdev_get_queue(rdev->bdev))) { nonrot = false; break; } } if (mddev->degraded) nonrot = false; if (nonrot) blk_queue_flag_set(QUEUE_FLAG_NONROT, mddev->queue); else blk_queue_flag_clear(QUEUE_FLAG_NONROT, mddev->queue); mddev->queue->backing_dev_info->congested_data = mddev; mddev->queue->backing_dev_info->congested_fn = md_congested; } if (pers->sync_request) { if (mddev->kobj.sd && sysfs_create_group(&mddev->kobj, &md_redundancy_group)) pr_warn("md: cannot register extra attributes for %s\n", mdname(mddev)); mddev->sysfs_action = sysfs_get_dirent_safe(mddev->kobj.sd, "sync_action"); } else if (mddev->ro == 2) /* auto-readonly not meaningful */ mddev->ro = 0; atomic_set(&mddev->max_corr_read_errors, MD_DEFAULT_MAX_CORRECTED_READ_ERRORS); mddev->safemode = 0; if (mddev_is_clustered(mddev)) mddev->safemode_delay = 0; else mddev->safemode_delay = (200 * HZ)/1000 +1; /* 200 msec delay */ mddev->in_sync = 1; smp_wmb(); spin_lock(&mddev->lock); mddev->pers = pers; spin_unlock(&mddev->lock); rdev_for_each(rdev, mddev) if (rdev->raid_disk >= 0) if (sysfs_link_rdev(mddev, rdev)) /* failure here is OK */; if (mddev->degraded && !mddev->ro) /* This ensures that recovering status is reported immediately * via sysfs - until a lack of spares is confirmed. */ set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); if (mddev->sb_flags) md_update_sb(mddev, 0); md_new_event(mddev); return 0; abort: bioset_exit(&mddev->bio_set); bioset_exit(&mddev->sync_set); return err; } EXPORT_SYMBOL_GPL(md_run); static int do_md_run(struct mddev *mddev) { int err; set_bit(MD_NOT_READY, &mddev->flags); err = md_run(mddev); if (err) goto out; err = md_bitmap_load(mddev); if (err) { md_bitmap_destroy(mddev); goto out; } if (mddev_is_clustered(mddev)) md_allow_write(mddev); /* run start up tasks that require md_thread */ md_start(mddev); md_wakeup_thread(mddev->thread); md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */ set_capacity(mddev->gendisk, mddev->array_sectors); revalidate_disk(mddev->gendisk); clear_bit(MD_NOT_READY, &mddev->flags); mddev->changed = 1; kobject_uevent(&disk_to_dev(mddev->gendisk)->kobj, KOBJ_CHANGE); sysfs_notify_dirent_safe(mddev->sysfs_state); sysfs_notify_dirent_safe(mddev->sysfs_action); sysfs_notify(&mddev->kobj, NULL, "degraded"); out: clear_bit(MD_NOT_READY, &mddev->flags); return err; } int md_start(struct mddev *mddev) { int ret = 0; if (mddev->pers->start) { set_bit(MD_RECOVERY_WAIT, &mddev->recovery); md_wakeup_thread(mddev->thread); ret = mddev->pers->start(mddev); clear_bit(MD_RECOVERY_WAIT, &mddev->recovery); md_wakeup_thread(mddev->sync_thread); } return ret; } EXPORT_SYMBOL_GPL(md_start); static int restart_array(struct mddev *mddev) { struct gendisk *disk = mddev->gendisk; struct md_rdev *rdev; bool has_journal = false; bool has_readonly = false; /* Complain if it has no devices */ if (list_empty(&mddev->disks)) return -ENXIO; if (!mddev->pers) return -EINVAL; if (!mddev->ro) return -EBUSY; rcu_read_lock(); rdev_for_each_rcu(rdev, mddev) { if (test_bit(Journal, &rdev->flags) && !test_bit(Faulty, &rdev->flags)) has_journal = true; if (bdev_read_only(rdev->bdev)) has_readonly = true; } rcu_read_unlock(); if (test_bit(MD_HAS_JOURNAL, &mddev->flags) && !has_journal) /* Don't restart rw with journal missing/faulty */ return -EINVAL; if (has_readonly) return -EROFS; mddev->safemode = 0; mddev->ro = 0; set_disk_ro(disk, 0); pr_debug("md: %s switched to read-write mode.\n", mdname(mddev)); /* Kick recovery or resync if necessary */ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); md_wakeup_thread(mddev->sync_thread); sysfs_notify_dirent_safe(mddev->sysfs_state); return 0; } static void md_clean(struct mddev *mddev) { mddev->array_sectors = 0; mddev->external_size = 0; mddev->dev_sectors = 0; mddev->raid_disks = 0; mddev->recovery_cp = 0; mddev->resync_min = 0; mddev->resync_max = MaxSector; mddev->reshape_position = MaxSector; mddev->external = 0; mddev->persistent = 0; mddev->level = LEVEL_NONE; mddev->clevel[0] = 0; mddev->flags = 0; mddev->sb_flags = 0; mddev->ro = 0; mddev->metadata_type[0] = 0; mddev->chunk_sectors = 0; mddev->ctime = mddev->utime = 0; mddev->layout = 0; mddev->max_disks = 0; mddev->events = 0; mddev->can_decrease_events = 0; mddev->delta_disks = 0; mddev->reshape_backwards = 0; mddev->new_level = LEVEL_NONE; mddev->new_layout = 0; mddev->new_chunk_sectors = 0; mddev->curr_resync = 0; atomic64_set(&mddev->resync_mismatches, 0); mddev->suspend_lo = mddev->suspend_hi = 0; mddev->sync_speed_min = mddev->sync_speed_max = 0; mddev->recovery = 0; mddev->in_sync = 0; mddev->changed = 0; mddev->degraded = 0; mddev->safemode = 0; mddev->private = NULL; mddev->cluster_info = NULL; mddev->bitmap_info.offset = 0; mddev->bitmap_info.default_offset = 0; mddev->bitmap_info.default_space = 0; mddev->bitmap_info.chunksize = 0; mddev->bitmap_info.daemon_sleep = 0; mddev->bitmap_info.max_write_behind = 0; mddev->bitmap_info.nodes = 0; } static void __md_stop_writes(struct mddev *mddev) { set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); flush_workqueue(md_misc_wq); if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); md_reap_sync_thread(mddev); } del_timer_sync(&mddev->safemode_timer); if (mddev->pers && mddev->pers->quiesce) { mddev->pers->quiesce(mddev, 1); mddev->pers->quiesce(mddev, 0); } md_bitmap_flush(mddev); if (mddev->ro == 0 && ((!mddev->in_sync && !mddev_is_clustered(mddev)) || mddev->sb_flags)) { /* mark array as shutdown cleanly */ if (!mddev_is_clustered(mddev)) mddev->in_sync = 1; md_update_sb(mddev, 1); } } void md_stop_writes(struct mddev *mddev) { mddev_lock_nointr(mddev); __md_stop_writes(mddev); mddev_unlock(mddev); } EXPORT_SYMBOL_GPL(md_stop_writes); static void mddev_detach(struct mddev *mddev) { md_bitmap_wait_behind_writes(mddev); if (mddev->pers && mddev->pers->quiesce && !mddev->suspended) { mddev->pers->quiesce(mddev, 1); mddev->pers->quiesce(mddev, 0); } md_unregister_thread(&mddev->thread); if (mddev->queue) blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/ } static void __md_stop(struct mddev *mddev) { struct md_personality *pers = mddev->pers; md_bitmap_destroy(mddev); mddev_detach(mddev); /* Ensure ->event_work is done */ flush_workqueue(md_misc_wq); spin_lock(&mddev->lock); mddev->pers = NULL; spin_unlock(&mddev->lock); pers->free(mddev, mddev->private); mddev->private = NULL; if (pers->sync_request && mddev->to_remove == NULL) mddev->to_remove = &md_redundancy_group; module_put(pers->owner); clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); } void md_stop(struct mddev *mddev) { /* stop the array and free an attached data structures. * This is called from dm-raid */ __md_stop_writes(mddev); __md_stop(mddev); bioset_exit(&mddev->bio_set); bioset_exit(&mddev->sync_set); } EXPORT_SYMBOL_GPL(md_stop); static int md_set_readonly(struct mddev *mddev, struct block_device *bdev) { int err = 0; int did_freeze = 0; if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) { did_freeze = 1; set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); md_wakeup_thread(mddev->thread); } if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) set_bit(MD_RECOVERY_INTR, &mddev->recovery); if (mddev->sync_thread) /* Thread might be blocked waiting for metadata update * which will now never happen */ wake_up_process(mddev->sync_thread->tsk); if (mddev->external && test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) return -EBUSY; mddev_unlock(mddev); wait_event(resync_wait, !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)); wait_event(mddev->sb_wait, !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)); mddev_lock_nointr(mddev); mutex_lock(&mddev->open_mutex); if ((mddev->pers && atomic_read(&mddev->openers) > !!bdev) || mddev->sync_thread || test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) { pr_warn("md: %s still in use.\n",mdname(mddev)); if (did_freeze) { clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); } err = -EBUSY; goto out; } if (mddev->pers) { __md_stop_writes(mddev); err = -ENXIO; if (mddev->ro==1) goto out; mddev->ro = 1; set_disk_ro(mddev->gendisk, 1); clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); sysfs_notify_dirent_safe(mddev->sysfs_state); err = 0; } out: mutex_unlock(&mddev->open_mutex); return err; } /* mode: * 0 - completely stop and dis-assemble array * 2 - stop but do not disassemble array */ static int do_md_stop(struct mddev *mddev, int mode, struct block_device *bdev) { struct gendisk *disk = mddev->gendisk; struct md_rdev *rdev; int did_freeze = 0; if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) { did_freeze = 1; set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); md_wakeup_thread(mddev->thread); } if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) set_bit(MD_RECOVERY_INTR, &mddev->recovery); if (mddev->sync_thread) /* Thread might be blocked waiting for metadata update * which will now never happen */ wake_up_process(mddev->sync_thread->tsk); mddev_unlock(mddev); wait_event(resync_wait, (mddev->sync_thread == NULL && !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))); mddev_lock_nointr(mddev); mutex_lock(&mddev->open_mutex); if ((mddev->pers && atomic_read(&mddev->openers) > !!bdev) || mddev->sysfs_active || mddev->sync_thread || test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) { pr_warn("md: %s still in use.\n",mdname(mddev)); mutex_unlock(&mddev->open_mutex); if (did_freeze) { clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); } return -EBUSY; } if (mddev->pers) { if (mddev->ro) set_disk_ro(disk, 0); __md_stop_writes(mddev); __md_stop(mddev); mddev->queue->backing_dev_info->congested_fn = NULL; /* tell userspace to handle 'inactive' */ sysfs_notify_dirent_safe(mddev->sysfs_state); rdev_for_each(rdev, mddev) if (rdev->raid_disk >= 0) sysfs_unlink_rdev(mddev, rdev); set_capacity(disk, 0); mutex_unlock(&mddev->open_mutex); mddev->changed = 1; revalidate_disk(disk); if (mddev->ro) mddev->ro = 0; } else mutex_unlock(&mddev->open_mutex); /* * Free resources if final stop */ if (mode == 0) { pr_info("md: %s stopped.\n", mdname(mddev)); if (mddev->bitmap_info.file) { struct file *f = mddev->bitmap_info.file; spin_lock(&mddev->lock); mddev->bitmap_info.file = NULL; spin_unlock(&mddev->lock); fput(f); } mddev->bitmap_info.offset = 0; export_array(mddev); md_clean(mddev); if (mddev->hold_active == UNTIL_STOP) mddev->hold_active = 0; } md_new_event(mddev); sysfs_notify_dirent_safe(mddev->sysfs_state); return 0; } #ifndef MODULE static void autorun_array(struct mddev *mddev) { struct md_rdev *rdev; int err; if (list_empty(&mddev->disks)) return; pr_info("md: running: "); rdev_for_each(rdev, mddev) { char b[BDEVNAME_SIZE]; pr_cont("<%s>", bdevname(rdev->bdev,b)); } pr_cont("\n"); err = do_md_run(mddev); if (err) { pr_warn("md: do_md_run() returned %d\n", err); do_md_stop(mddev, 0, NULL); } } /* * lets try to run arrays based on all disks that have arrived * until now. (those are in pending_raid_disks) * * the method: pick the first pending disk, collect all disks with * the same UUID, remove all from the pending list and put them into * the 'same_array' list. Then order this list based on superblock * update time (freshest comes first), kick out 'old' disks and * compare superblocks. If everything's fine then run it. * * If "unit" is allocated, then bump its reference count */ static void autorun_devices(int part) { struct md_rdev *rdev0, *rdev, *tmp; struct mddev *mddev; char b[BDEVNAME_SIZE]; pr_info("md: autorun ...\n"); while (!list_empty(&pending_raid_disks)) { int unit; dev_t dev; LIST_HEAD(candidates); rdev0 = list_entry(pending_raid_disks.next, struct md_rdev, same_set); pr_debug("md: considering %s ...\n", bdevname(rdev0->bdev,b)); INIT_LIST_HEAD(&candidates); rdev_for_each_list(rdev, tmp, &pending_raid_disks) if (super_90_load(rdev, rdev0, 0) >= 0) { pr_debug("md: adding %s ...\n", bdevname(rdev->bdev,b)); list_move(&rdev->same_set, &candidates); } /* * now we have a set of devices, with all of them having * mostly sane superblocks. It's time to allocate the * mddev. */ if (part) { dev = MKDEV(mdp_major, rdev0->preferred_minor << MdpMinorShift); unit = MINOR(dev) >> MdpMinorShift; } else { dev = MKDEV(MD_MAJOR, rdev0->preferred_minor); unit = MINOR(dev); } if (rdev0->preferred_minor != unit) { pr_warn("md: unit number in %s is bad: %d\n", bdevname(rdev0->bdev, b), rdev0->preferred_minor); break; } md_probe(dev, NULL, NULL); mddev = mddev_find(dev); if (!mddev) break; if (mddev_lock(mddev)) pr_warn("md: %s locked, cannot run\n", mdname(mddev)); else if (mddev->raid_disks || mddev->major_version || !list_empty(&mddev->disks)) { pr_warn("md: %s already running, cannot run %s\n", mdname(mddev), bdevname(rdev0->bdev,b)); mddev_unlock(mddev); } else { pr_debug("md: created %s\n", mdname(mddev)); mddev->persistent = 1; rdev_for_each_list(rdev, tmp, &candidates) { list_del_init(&rdev->same_set); if (bind_rdev_to_array(rdev, mddev)) export_rdev(rdev); } autorun_array(mddev); mddev_unlock(mddev); } /* on success, candidates will be empty, on error * it won't... */ rdev_for_each_list(rdev, tmp, &candidates) { list_del_init(&rdev->same_set); export_rdev(rdev); } mddev_put(mddev); } pr_info("md: ... autorun DONE.\n"); } #endif /* !MODULE */ static int get_version(void __user *arg) { mdu_version_t ver; ver.major = MD_MAJOR_VERSION; ver.minor = MD_MINOR_VERSION; ver.patchlevel = MD_PATCHLEVEL_VERSION; if (copy_to_user(arg, &ver, sizeof(ver))) return -EFAULT; return 0; } static int get_array_info(struct mddev *mddev, void __user *arg) { mdu_array_info_t info; int nr,working,insync,failed,spare; struct md_rdev *rdev; nr = working = insync = failed = spare = 0; rcu_read_lock(); rdev_for_each_rcu(rdev, mddev) { nr++; if (test_bit(Faulty, &rdev->flags)) failed++; else { working++; if (test_bit(In_sync, &rdev->flags)) insync++; else if (test_bit(Journal, &rdev->flags)) /* TODO: add journal count to md_u.h */ ; else spare++; } } rcu_read_unlock(); info.major_version = mddev->major_version; info.minor_version = mddev->minor_version; info.patch_version = MD_PATCHLEVEL_VERSION; info.ctime = clamp_t(time64_t, mddev->ctime, 0, U32_MAX); info.level = mddev->level; info.size = mddev->dev_sectors / 2; if (info.size != mddev->dev_sectors / 2) /* overflow */ info.size = -1; info.nr_disks = nr; info.raid_disks = mddev->raid_disks; info.md_minor = mddev->md_minor; info.not_persistent= !mddev->persistent; info.utime = clamp_t(time64_t, mddev->utime, 0, U32_MAX); info.state = 0; if (mddev->in_sync) info.state = (1<<MD_SB_CLEAN); if (mddev->bitmap && mddev->bitmap_info.offset) info.state |= (1<<MD_SB_BITMAP_PRESENT); if (mddev_is_clustered(mddev)) info.state |= (1<<MD_SB_CLUSTERED); info.active_disks = insync; info.working_disks = working; info.failed_disks = failed; info.spare_disks = spare; info.layout = mddev->layout; info.chunk_size = mddev->chunk_sectors << 9; if (copy_to_user(arg, &info, sizeof(info))) return -EFAULT; return 0; } static int get_bitmap_file(struct mddev *mddev, void __user * arg) { mdu_bitmap_file_t *file = NULL; /* too big for stack allocation */ char *ptr; int err; file = kzalloc(sizeof(*file), GFP_NOIO); if (!file) return -ENOMEM; err = 0; spin_lock(&mddev->lock); /* bitmap enabled */ if (mddev->bitmap_info.file) { ptr = file_path(mddev->bitmap_info.file, file->pathname, sizeof(file->pathname)); if (IS_ERR(ptr)) err = PTR_ERR(ptr); else memmove(file->pathname, ptr, sizeof(file->pathname)-(ptr-file->pathname)); } spin_unlock(&mddev->lock); if (err == 0 && copy_to_user(arg, file, sizeof(*file))) err = -EFAULT; kfree(file); return err; } static int get_disk_info(struct mddev *mddev, void __user * arg) { mdu_disk_info_t info; struct md_rdev *rdev; if (copy_from_user(&info, arg, sizeof(info))) return -EFAULT; rcu_read_lock(); rdev = md_find_rdev_nr_rcu(mddev, info.number); if (rdev) { info.major = MAJOR(rdev->bdev->bd_dev); info.minor = MINOR(rdev->bdev->bd_dev); info.raid_disk = rdev->raid_disk; info.state = 0; if (test_bit(Faulty, &rdev->flags)) info.state |= (1<<MD_DISK_FAULTY); else if (test_bit(In_sync, &rdev->flags)) { info.state |= (1<<MD_DISK_ACTIVE); info.state |= (1<<MD_DISK_SYNC); } if (test_bit(Journal, &rdev->flags)) info.state |= (1<<MD_DISK_JOURNAL); if (test_bit(WriteMostly, &rdev->flags)) info.state |= (1<<MD_DISK_WRITEMOSTLY); if (test_bit(FailFast, &rdev->flags)) info.state |= (1<<MD_DISK_FAILFAST); } else { info.major = info.minor = 0; info.raid_disk = -1; info.state = (1<<MD_DISK_REMOVED); } rcu_read_unlock(); if (copy_to_user(arg, &info, sizeof(info))) return -EFAULT; return 0; } static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info) { char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE]; struct md_rdev *rdev; dev_t dev = MKDEV(info->major,info->minor); if (mddev_is_clustered(mddev) && !(info->state & ((1 << MD_DISK_CLUSTER_ADD) | (1 << MD_DISK_CANDIDATE)))) { pr_warn("%s: Cannot add to clustered mddev.\n", mdname(mddev)); return -EINVAL; } if (info->major != MAJOR(dev) || info->minor != MINOR(dev)) return -EOVERFLOW; if (!mddev->raid_disks) { int err; /* expecting a device which has a superblock */ rdev = md_import_device(dev, mddev->major_version, mddev->minor_version); if (IS_ERR(rdev)) { pr_warn("md: md_import_device returned %ld\n", PTR_ERR(rdev)); return PTR_ERR(rdev); } if (!list_empty(&mddev->disks)) { struct md_rdev *rdev0 = list_entry(mddev->disks.next, struct md_rdev, same_set); err = super_types[mddev->major_version] .load_super(rdev, rdev0, mddev->minor_version); if (err < 0) { pr_warn("md: %s has different UUID to %s\n", bdevname(rdev->bdev,b), bdevname(rdev0->bdev,b2)); export_rdev(rdev); return -EINVAL; } } err = bind_rdev_to_array(rdev, mddev); if (err) export_rdev(rdev); return err; } /* * add_new_disk can be used once the array is assembled * to add "hot spares". They must already have a superblock * written */ if (mddev->pers) { int err; if (!mddev->pers->hot_add_disk) { pr_warn("%s: personality does not support diskops!\n", mdname(mddev)); return -EINVAL; } if (mddev->persistent) rdev = md_import_device(dev, mddev->major_version, mddev->minor_version); else rdev = md_import_device(dev, -1, -1); if (IS_ERR(rdev)) { pr_warn("md: md_import_device returned %ld\n", PTR_ERR(rdev)); return PTR_ERR(rdev); } /* set saved_raid_disk if appropriate */ if (!mddev->persistent) { if (info->state & (1<<MD_DISK_SYNC) && info->raid_disk < mddev->raid_disks) { rdev->raid_disk = info->raid_disk; set_bit(In_sync, &rdev->flags); clear_bit(Bitmap_sync, &rdev->flags); } else rdev->raid_disk = -1; rdev->saved_raid_disk = rdev->raid_disk; } else super_types[mddev->major_version]. validate_super(mddev, rdev); if ((info->state & (1<<MD_DISK_SYNC)) && rdev->raid_disk != info->raid_disk) { /* This was a hot-add request, but events doesn't * match, so reject it. */ export_rdev(rdev); return -EINVAL; } clear_bit(In_sync, &rdev->flags); /* just to be sure */ if (info->state & (1<<MD_DISK_WRITEMOSTLY)) set_bit(WriteMostly, &rdev->flags); else clear_bit(WriteMostly, &rdev->flags); if (info->state & (1<<MD_DISK_FAILFAST)) set_bit(FailFast, &rdev->flags); else clear_bit(FailFast, &rdev->flags); if (info->state & (1<<MD_DISK_JOURNAL)) { struct md_rdev *rdev2; bool has_journal = false; /* make sure no existing journal disk */ rdev_for_each(rdev2, mddev) { if (test_bit(Journal, &rdev2->flags)) { has_journal = true; break; } } if (has_journal || mddev->bitmap) { export_rdev(rdev); return -EBUSY; } set_bit(Journal, &rdev->flags); } /* * check whether the device shows up in other nodes */ if (mddev_is_clustered(mddev)) { if (info->state & (1 << MD_DISK_CANDIDATE)) set_bit(Candidate, &rdev->flags); else if (info->state & (1 << MD_DISK_CLUSTER_ADD)) { /* --add initiated by this node */ err = md_cluster_ops->add_new_disk(mddev, rdev); if (err) { export_rdev(rdev); return err; } } } rdev->raid_disk = -1; err = bind_rdev_to_array(rdev, mddev); if (err) export_rdev(rdev); if (mddev_is_clustered(mddev)) { if (info->state & (1 << MD_DISK_CANDIDATE)) { if (!err) { err = md_cluster_ops->new_disk_ack(mddev, err == 0); if (err) md_kick_rdev_from_array(rdev); } } else { if (err) md_cluster_ops->add_new_disk_cancel(mddev); else err = add_bound_rdev(rdev); } } else if (!err) err = add_bound_rdev(rdev); return err; } /* otherwise, add_new_disk is only allowed * for major_version==0 superblocks */ if (mddev->major_version != 0) { pr_warn("%s: ADD_NEW_DISK not supported\n", mdname(mddev)); return -EINVAL; } if (!(info->state & (1<<MD_DISK_FAULTY))) { int err; rdev = md_import_device(dev, -1, 0); if (IS_ERR(rdev)) { pr_warn("md: error, md_import_device() returned %ld\n", PTR_ERR(rdev)); return PTR_ERR(rdev); } rdev->desc_nr = info->number; if (info->raid_disk < mddev->raid_disks) rdev->raid_disk = info->raid_disk; else rdev->raid_disk = -1; if (rdev->raid_disk < mddev->raid_disks) if (info->state & (1<<MD_DISK_SYNC)) set_bit(In_sync, &rdev->flags); if (info->state & (1<<MD_DISK_WRITEMOSTLY)) set_bit(WriteMostly, &rdev->flags); if (info->state & (1<<MD_DISK_FAILFAST)) set_bit(FailFast, &rdev->flags); if (!mddev->persistent) { pr_debug("md: nonpersistent superblock ...\n"); rdev->sb_start = i_size_read(rdev->bdev->bd_inode) / 512; } else rdev->sb_start = calc_dev_sboffset(rdev); rdev->sectors = rdev->sb_start; err = bind_rdev_to_array(rdev, mddev); if (err) { export_rdev(rdev); return err; } } return 0; } static int hot_remove_disk(struct mddev *mddev, dev_t dev) { char b[BDEVNAME_SIZE]; struct md_rdev *rdev; if (!mddev->pers) return -ENODEV; rdev = find_rdev(mddev, dev); if (!rdev) return -ENXIO; if (rdev->raid_disk < 0) goto kick_rdev; clear_bit(Blocked, &rdev->flags); remove_and_add_spares(mddev, rdev); if (rdev->raid_disk >= 0) goto busy; kick_rdev: if (mddev_is_clustered(mddev)) { if (md_cluster_ops->remove_disk(mddev, rdev)) goto busy; } md_kick_rdev_from_array(rdev); set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); if (mddev->thread) md_wakeup_thread(mddev->thread); else md_update_sb(mddev, 1); md_new_event(mddev); return 0; busy: pr_debug("md: cannot remove active disk %s from %s ...\n", bdevname(rdev->bdev,b), mdname(mddev)); return -EBUSY; } static int hot_add_disk(struct mddev *mddev, dev_t dev) { char b[BDEVNAME_SIZE]; int err; struct md_rdev *rdev; if (!mddev->pers) return -ENODEV; if (mddev->major_version != 0) { pr_warn("%s: HOT_ADD may only be used with version-0 superblocks.\n", mdname(mddev)); return -EINVAL; } if (!mddev->pers->hot_add_disk) { pr_warn("%s: personality does not support diskops!\n", mdname(mddev)); return -EINVAL; } rdev = md_import_device(dev, -1, 0); if (IS_ERR(rdev)) { pr_warn("md: error, md_import_device() returned %ld\n", PTR_ERR(rdev)); return -EINVAL; } if (mddev->persistent) rdev->sb_start = calc_dev_sboffset(rdev); else rdev->sb_start = i_size_read(rdev->bdev->bd_inode) / 512; rdev->sectors = rdev->sb_start; if (test_bit(Faulty, &rdev->flags)) { pr_warn("md: can not hot-add faulty %s disk to %s!\n", bdevname(rdev->bdev,b), mdname(mddev)); err = -EINVAL; goto abort_export; } clear_bit(In_sync, &rdev->flags); rdev->desc_nr = -1; rdev->saved_raid_disk = -1; err = bind_rdev_to_array(rdev, mddev); if (err) goto abort_export; /* * The rest should better be atomic, we can have disk failures * noticed in interrupt contexts ... */ rdev->raid_disk = -1; set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); if (!mddev->thread) md_update_sb(mddev, 1); /* * Kick recovery, maybe this spare has to be added to the * array immediately. */ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); md_new_event(mddev); return 0; abort_export: export_rdev(rdev); return err; } static int set_bitmap_file(struct mddev *mddev, int fd) { int err = 0; if (mddev->pers) { if (!mddev->pers->quiesce || !mddev->thread) return -EBUSY; if (mddev->recovery || mddev->sync_thread) return -EBUSY; /* we should be able to change the bitmap.. */ } if (fd >= 0) { struct inode *inode; struct file *f; if (mddev->bitmap || mddev->bitmap_info.file) return -EEXIST; /* cannot add when bitmap is present */ f = fget(fd); if (f == NULL) { pr_warn("%s: error: failed to get bitmap file\n", mdname(mddev)); return -EBADF; } inode = f->f_mapping->host; if (!S_ISREG(inode->i_mode)) { pr_warn("%s: error: bitmap file must be a regular file\n", mdname(mddev)); err = -EBADF; } else if (!(f->f_mode & FMODE_WRITE)) { pr_warn("%s: error: bitmap file must open for write\n", mdname(mddev)); err = -EBADF; } else if (atomic_read(&inode->i_writecount) != 1) { pr_warn("%s: error: bitmap file is already in use\n", mdname(mddev)); err = -EBUSY; } if (err) { fput(f); return err; } mddev->bitmap_info.file = f; mddev->bitmap_info.offset = 0; /* file overrides offset */ } else if (mddev->bitmap == NULL) return -ENOENT; /* cannot remove what isn't there */ err = 0; if (mddev->pers) { if (fd >= 0) { struct bitmap *bitmap; bitmap = md_bitmap_create(mddev, -1); mddev_suspend(mddev); if (!IS_ERR(bitmap)) { mddev->bitmap = bitmap; err = md_bitmap_load(mddev); } else err = PTR_ERR(bitmap); if (err) { md_bitmap_destroy(mddev); fd = -1; } mddev_resume(mddev); } else if (fd < 0) { mddev_suspend(mddev); md_bitmap_destroy(mddev); mddev_resume(mddev); } } if (fd < 0) { struct file *f = mddev->bitmap_info.file; if (f) { spin_lock(&mddev->lock); mddev->bitmap_info.file = NULL; spin_unlock(&mddev->lock); fput(f); } } return err; } /* * set_array_info is used two different ways * The original usage is when creating a new array. * In this usage, raid_disks is > 0 and it together with * level, size, not_persistent,layout,chunksize determine the * shape of the array. * This will always create an array with a type-0.90.0 superblock. * The newer usage is when assembling an array. * In this case raid_disks will be 0, and the major_version field is * use to determine which style super-blocks are to be found on the devices. * The minor and patch _version numbers are also kept incase the * super_block handler wishes to interpret them. */ static int set_array_info(struct mddev *mddev, mdu_array_info_t *info) { if (info->raid_disks == 0) { /* just setting version number for superblock loading */ if (info->major_version < 0 || info->major_version >= ARRAY_SIZE(super_types) || super_types[info->major_version].name == NULL) { /* maybe try to auto-load a module? */ pr_warn("md: superblock version %d not known\n", info->major_version); return -EINVAL; } mddev->major_version = info->major_version; mddev->minor_version = info->minor_version; mddev->patch_version = info->patch_version; mddev->persistent = !info->not_persistent; /* ensure mddev_put doesn't delete this now that there * is some minimal configuration. */ mddev->ctime = ktime_get_real_seconds(); return 0; } mddev->major_version = MD_MAJOR_VERSION; mddev->minor_version = MD_MINOR_VERSION; mddev->patch_version = MD_PATCHLEVEL_VERSION; mddev->ctime = ktime_get_real_seconds(); mddev->level = info->level; mddev->clevel[0] = 0; mddev->dev_sectors = 2 * (sector_t)info->size; mddev->raid_disks = info->raid_disks; /* don't set md_minor, it is determined by which /dev/md* was * openned */ if (info->state & (1<<MD_SB_CLEAN)) mddev->recovery_cp = MaxSector; else mddev->recovery_cp = 0; mddev->persistent = ! info->not_persistent; mddev->external = 0; mddev->layout = info->layout; if (mddev->level == 0) /* Cannot trust RAID0 layout info here */ mddev->layout = -1; mddev->chunk_sectors = info->chunk_size >> 9; if (mddev->persistent) { mddev->max_disks = MD_SB_DISKS; mddev->flags = 0; mddev->sb_flags = 0; } set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); mddev->bitmap_info.default_offset = MD_SB_BYTES >> 9; mddev->bitmap_info.default_space = 64*2 - (MD_SB_BYTES >> 9); mddev->bitmap_info.offset = 0; mddev->reshape_position = MaxSector; /* * Generate a 128 bit UUID */ get_random_bytes(mddev->uuid, 16); mddev->new_level = mddev->level; mddev->new_chunk_sectors = mddev->chunk_sectors; mddev->new_layout = mddev->layout; mddev->delta_disks = 0; mddev->reshape_backwards = 0; return 0; } void md_set_array_sectors(struct mddev *mddev, sector_t array_sectors) { lockdep_assert_held(&mddev->reconfig_mutex); if (mddev->external_size) return; mddev->array_sectors = array_sectors; } EXPORT_SYMBOL(md_set_array_sectors); static int update_size(struct mddev *mddev, sector_t num_sectors) { struct md_rdev *rdev; int rv; int fit = (num_sectors == 0); sector_t old_dev_sectors = mddev->dev_sectors; if (mddev->pers->resize == NULL) return -EINVAL; /* The "num_sectors" is the number of sectors of each device that * is used. This can only make sense for arrays with redundancy. * linear and raid0 always use whatever space is available. We can only * consider changing this number if no resync or reconstruction is * happening, and if the new size is acceptable. It must fit before the * sb_start or, if that is <data_offset, it must fit before the size * of each device. If num_sectors is zero, we find the largest size * that fits. */ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || mddev->sync_thread) return -EBUSY; if (mddev->ro) return -EROFS; rdev_for_each(rdev, mddev) { sector_t avail = rdev->sectors; if (fit && (num_sectors == 0 || num_sectors > avail)) num_sectors = avail; if (avail < num_sectors) return -ENOSPC; } rv = mddev->pers->resize(mddev, num_sectors); if (!rv) { if (mddev_is_clustered(mddev)) md_cluster_ops->update_size(mddev, old_dev_sectors); else if (mddev->queue) { set_capacity(mddev->gendisk, mddev->array_sectors); revalidate_disk(mddev->gendisk); } } return rv; } static int update_raid_disks(struct mddev *mddev, int raid_disks) { int rv; struct md_rdev *rdev; /* change the number of raid disks */ if (mddev->pers->check_reshape == NULL) return -EINVAL; if (mddev->ro) return -EROFS; if (raid_disks <= 0 || (mddev->max_disks && raid_disks >= mddev->max_disks)) return -EINVAL; if (mddev->sync_thread || test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || test_bit(MD_RESYNCING_REMOTE, &mddev->recovery) || mddev->reshape_position != MaxSector) return -EBUSY; rdev_for_each(rdev, mddev) { if (mddev->raid_disks < raid_disks && rdev->data_offset < rdev->new_data_offset) return -EINVAL; if (mddev->raid_disks > raid_disks && rdev->data_offset > rdev->new_data_offset) return -EINVAL; } mddev->delta_disks = raid_disks - mddev->raid_disks; if (mddev->delta_disks < 0) mddev->reshape_backwards = 1; else if (mddev->delta_disks > 0) mddev->reshape_backwards = 0; rv = mddev->pers->check_reshape(mddev); if (rv < 0) { mddev->delta_disks = 0; mddev->reshape_backwards = 0; } return rv; } /* * update_array_info is used to change the configuration of an * on-line array. * The version, ctime,level,size,raid_disks,not_persistent, layout,chunk_size * fields in the info are checked against the array. * Any differences that cannot be handled will cause an error. * Normally, only one change can be managed at a time. */ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info) { int rv = 0; int cnt = 0; int state = 0; /* calculate expected state,ignoring low bits */ if (mddev->bitmap && mddev->bitmap_info.offset) state |= (1 << MD_SB_BITMAP_PRESENT); if (mddev->major_version != info->major_version || mddev->minor_version != info->minor_version || /* mddev->patch_version != info->patch_version || */ mddev->ctime != info->ctime || mddev->level != info->level || /* mddev->layout != info->layout || */ mddev->persistent != !info->not_persistent || mddev->chunk_sectors != info->chunk_size >> 9 || /* ignore bottom 8 bits of state, and allow SB_BITMAP_PRESENT to change */ ((state^info->state) & 0xfffffe00) ) return -EINVAL; /* Check there is only one change */ if (info->size >= 0 && mddev->dev_sectors / 2 != info->size) cnt++; if (mddev->raid_disks != info->raid_disks) cnt++; if (mddev->layout != info->layout) cnt++; if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) cnt++; if (cnt == 0) return 0; if (cnt > 1) return -EINVAL; if (mddev->layout != info->layout) { /* Change layout * we don't need to do anything at the md level, the * personality will take care of it all. */ if (mddev->pers->check_reshape == NULL) return -EINVAL; else { mddev->new_layout = info->layout; rv = mddev->pers->check_reshape(mddev); if (rv) mddev->new_layout = mddev->layout; return rv; } } if (info->size >= 0 && mddev->dev_sectors / 2 != info->size) rv = update_size(mddev, (sector_t)info->size * 2); if (mddev->raid_disks != info->raid_disks) rv = update_raid_disks(mddev, info->raid_disks); if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) { if (mddev->pers->quiesce == NULL || mddev->thread == NULL) { rv = -EINVAL; goto err; } if (mddev->recovery || mddev->sync_thread) { rv = -EBUSY; goto err; } if (info->state & (1<<MD_SB_BITMAP_PRESENT)) { struct bitmap *bitmap; /* add the bitmap */ if (mddev->bitmap) { rv = -EEXIST; goto err; } if (mddev->bitmap_info.default_offset == 0) { rv = -EINVAL; goto err; } mddev->bitmap_info.offset = mddev->bitmap_info.default_offset; mddev->bitmap_info.space = mddev->bitmap_info.default_space; bitmap = md_bitmap_create(mddev, -1); mddev_suspend(mddev); if (!IS_ERR(bitmap)) { mddev->bitmap = bitmap; rv = md_bitmap_load(mddev); } else rv = PTR_ERR(bitmap); if (rv) md_bitmap_destroy(mddev); mddev_resume(mddev); } else { /* remove the bitmap */ if (!mddev->bitmap) { rv = -ENOENT; goto err; } if (mddev->bitmap->storage.file) { rv = -EINVAL; goto err; } if (mddev->bitmap_info.nodes) { /* hold PW on all the bitmap lock */ if (md_cluster_ops->lock_all_bitmaps(mddev) <= 0) { pr_warn("md: can't change bitmap to none since the array is in use by more than one node\n"); rv = -EPERM; md_cluster_ops->unlock_all_bitmaps(mddev); goto err; } mddev->bitmap_info.nodes = 0; md_cluster_ops->leave(mddev); } mddev_suspend(mddev); md_bitmap_destroy(mddev); mddev_resume(mddev); mddev->bitmap_info.offset = 0; } } md_update_sb(mddev, 1); return rv; err: return rv; } static int set_disk_faulty(struct mddev *mddev, dev_t dev) { struct md_rdev *rdev; int err = 0; if (mddev->pers == NULL) return -ENODEV; rcu_read_lock(); rdev = md_find_rdev_rcu(mddev, dev); if (!rdev) err = -ENODEV; else { md_error(mddev, rdev); if (!test_bit(Faulty, &rdev->flags)) err = -EBUSY; } rcu_read_unlock(); return err; } /* * We have a problem here : there is no easy way to give a CHS * virtual geometry. We currently pretend that we have a 2 heads * 4 sectors (with a BIG number of cylinders...). This drives * dosfs just mad... ;-) */ static int md_getgeo(struct block_device *bdev, struct hd_geometry *geo) { struct mddev *mddev = bdev->bd_disk->private_data; geo->heads = 2; geo->sectors = 4; geo->cylinders = mddev->array_sectors / 8; return 0; } static inline bool md_ioctl_valid(unsigned int cmd) { switch (cmd) { case ADD_NEW_DISK: case BLKROSET: case GET_ARRAY_INFO: case GET_BITMAP_FILE: case GET_DISK_INFO: case HOT_ADD_DISK: case HOT_REMOVE_DISK: case RAID_AUTORUN: case RAID_VERSION: case RESTART_ARRAY_RW: case RUN_ARRAY: case SET_ARRAY_INFO: case SET_BITMAP_FILE: case SET_DISK_FAULTY: case STOP_ARRAY: case STOP_ARRAY_RO: case CLUSTERED_DISK_NACK: return true; default: return false; } } static int md_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg) { int err = 0; void __user *argp = (void __user *)arg; struct mddev *mddev = NULL; int ro; bool did_set_md_closing = false; if (!md_ioctl_valid(cmd)) return -ENOTTY; switch (cmd) { case RAID_VERSION: case GET_ARRAY_INFO: case GET_DISK_INFO: break; default: if (!capable(CAP_SYS_ADMIN)) return -EACCES; } /* * Commands dealing with the RAID driver but not any * particular array: */ switch (cmd) { case RAID_VERSION: err = get_version(argp); goto out; #ifndef MODULE case RAID_AUTORUN: err = 0; autostart_arrays(arg); goto out; #endif default:; } /* * Commands creating/starting a new array: */ mddev = bdev->bd_disk->private_data; if (!mddev) { BUG(); goto out; } /* Some actions do not requires the mutex */ switch (cmd) { case GET_ARRAY_INFO: if (!mddev->raid_disks && !mddev->external) err = -ENODEV; else err = get_array_info(mddev, argp); goto out; case GET_DISK_INFO: if (!mddev->raid_disks && !mddev->external) err = -ENODEV; else err = get_disk_info(mddev, argp); goto out; case SET_DISK_FAULTY: err = set_disk_faulty(mddev, new_decode_dev(arg)); goto out; case GET_BITMAP_FILE: err = get_bitmap_file(mddev, argp); goto out; } if (cmd == ADD_NEW_DISK) /* need to ensure md_delayed_delete() has completed */ flush_workqueue(md_misc_wq); if (cmd == HOT_REMOVE_DISK) /* need to ensure recovery thread has run */ wait_event_interruptible_timeout(mddev->sb_wait, !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery), msecs_to_jiffies(5000)); if (cmd == STOP_ARRAY || cmd == STOP_ARRAY_RO) { /* Need to flush page cache, and ensure no-one else opens * and writes */ mutex_lock(&mddev->open_mutex); if (mddev->pers && atomic_read(&mddev->openers) > 1) { mutex_unlock(&mddev->open_mutex); err = -EBUSY; goto out; } if (test_and_set_bit(MD_CLOSING, &mddev->flags)) { mutex_unlock(&mddev->open_mutex); err = -EBUSY; goto out; } did_set_md_closing = true; mutex_unlock(&mddev->open_mutex); sync_blockdev(bdev); } err = mddev_lock(mddev); if (err) { pr_debug("md: ioctl lock interrupted, reason %d, cmd %d\n", err, cmd); goto out; } if (cmd == SET_ARRAY_INFO) { mdu_array_info_t info; if (!arg) memset(&info, 0, sizeof(info)); else if (copy_from_user(&info, argp, sizeof(info))) { err = -EFAULT; goto unlock; } if (mddev->pers) { err = update_array_info(mddev, &info); if (err) { pr_warn("md: couldn't update array info. %d\n", err); goto unlock; } goto unlock; } if (!list_empty(&mddev->disks)) { pr_warn("md: array %s already has disks!\n", mdname(mddev)); err = -EBUSY; goto unlock; } if (mddev->raid_disks) { pr_warn("md: array %s already initialised!\n", mdname(mddev)); err = -EBUSY; goto unlock; } err = set_array_info(mddev, &info); if (err) { pr_warn("md: couldn't set array info. %d\n", err); goto unlock; } goto unlock; } /* * Commands querying/configuring an existing array: */ /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY, * RUN_ARRAY, and GET_ and SET_BITMAP_FILE are allowed */ if ((!mddev->raid_disks && !mddev->external) && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE && cmd != GET_BITMAP_FILE) { err = -ENODEV; goto unlock; } /* * Commands even a read-only array can execute: */ switch (cmd) { case RESTART_ARRAY_RW: err = restart_array(mddev); goto unlock; case STOP_ARRAY: err = do_md_stop(mddev, 0, bdev); goto unlock; case STOP_ARRAY_RO: err = md_set_readonly(mddev, bdev); goto unlock; case HOT_REMOVE_DISK: err = hot_remove_disk(mddev, new_decode_dev(arg)); goto unlock; case ADD_NEW_DISK: /* We can support ADD_NEW_DISK on read-only arrays * only if we are re-adding a preexisting device. * So require mddev->pers and MD_DISK_SYNC. */ if (mddev->pers) { mdu_disk_info_t info; if (copy_from_user(&info, argp, sizeof(info))) err = -EFAULT; else if (!(info.state & (1<<MD_DISK_SYNC))) /* Need to clear read-only for this */ break; else err = add_new_disk(mddev, &info); goto unlock; } break; case BLKROSET: if (get_user(ro, (int __user *)(arg))) { err = -EFAULT; goto unlock; } err = -EINVAL; /* if the bdev is going readonly the value of mddev->ro * does not matter, no writes are coming */ if (ro) goto unlock; /* are we are already prepared for writes? */ if (mddev->ro != 1) goto unlock; /* transitioning to readauto need only happen for * arrays that call md_write_start */ if (mddev->pers) { err = restart_array(mddev); if (err == 0) { mddev->ro = 2; set_disk_ro(mddev->gendisk, 0); } } goto unlock; } /* * The remaining ioctls are changing the state of the * superblock, so we do not allow them on read-only arrays. */ if (mddev->ro && mddev->pers) { if (mddev->ro == 2) { mddev->ro = 0; sysfs_notify_dirent_safe(mddev->sysfs_state); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); /* mddev_unlock will wake thread */ /* If a device failed while we were read-only, we * need to make sure the metadata is updated now. */ if (test_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags)) { mddev_unlock(mddev); wait_event(mddev->sb_wait, !test_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags) && !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)); mddev_lock_nointr(mddev); } } else { err = -EROFS; goto unlock; } } switch (cmd) { case ADD_NEW_DISK: { mdu_disk_info_t info; if (copy_from_user(&info, argp, sizeof(info))) err = -EFAULT; else err = add_new_disk(mddev, &info); goto unlock; } case CLUSTERED_DISK_NACK: if (mddev_is_clustered(mddev)) md_cluster_ops->new_disk_ack(mddev, false); else err = -EINVAL; goto unlock; case HOT_ADD_DISK: err = hot_add_disk(mddev, new_decode_dev(arg)); goto unlock; case RUN_ARRAY: err = do_md_run(mddev); goto unlock; case SET_BITMAP_FILE: err = set_bitmap_file(mddev, (int)arg); goto unlock; default: err = -EINVAL; goto unlock; } unlock: if (mddev->hold_active == UNTIL_IOCTL && err != -EINVAL) mddev->hold_active = 0; mddev_unlock(mddev); out: if(did_set_md_closing) clear_bit(MD_CLOSING, &mddev->flags); return err; } #ifdef CONFIG_COMPAT static int md_compat_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg) { switch (cmd) { case HOT_REMOVE_DISK: case HOT_ADD_DISK: case SET_DISK_FAULTY: case SET_BITMAP_FILE: /* These take in integer arg, do not convert */ break; default: arg = (unsigned long)compat_ptr(arg); break; } return md_ioctl(bdev, mode, cmd, arg); } #endif /* CONFIG_COMPAT */ static int md_open(struct block_device *bdev, fmode_t mode) { /* * Succeed if we can lock the mddev, which confirms that * it isn't being stopped right now. */ struct mddev *mddev = mddev_find(bdev->bd_dev); int err; if (!mddev) return -ENODEV; if (mddev->gendisk != bdev->bd_disk) { /* we are racing with mddev_put which is discarding this * bd_disk. */ mddev_put(mddev); /* Wait until bdev->bd_disk is definitely gone */ if (work_pending(&mddev->del_work)) flush_workqueue(md_misc_wq); return -EBUSY; } BUG_ON(mddev != bdev->bd_disk->private_data); if ((err = mutex_lock_interruptible(&mddev->open_mutex))) goto out; if (test_bit(MD_CLOSING, &mddev->flags)) { mutex_unlock(&mddev->open_mutex); err = -ENODEV; goto out; } err = 0; atomic_inc(&mddev->openers); mutex_unlock(&mddev->open_mutex); check_disk_change(bdev); out: if (err) mddev_put(mddev); return err; } static void md_release(struct gendisk *disk, fmode_t mode) { struct mddev *mddev = disk->private_data; BUG_ON(!mddev); atomic_dec(&mddev->openers); mddev_put(mddev); } static int md_media_changed(struct gendisk *disk) { struct mddev *mddev = disk->private_data; return mddev->changed; } static int md_revalidate(struct gendisk *disk) { struct mddev *mddev = disk->private_data; mddev->changed = 0; return 0; } static const struct block_device_operations md_fops = { .owner = THIS_MODULE, .open = md_open, .release = md_release, .ioctl = md_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = md_compat_ioctl, #endif .getgeo = md_getgeo, .media_changed = md_media_changed, .revalidate_disk= md_revalidate, }; static int md_thread(void *arg) { struct md_thread *thread = arg; /* * md_thread is a 'system-thread', it's priority should be very * high. We avoid resource deadlocks individually in each * raid personality. (RAID5 does preallocation) We also use RR and * the very same RT priority as kswapd, thus we will never get * into a priority inversion deadlock. * * we definitely have to have equal or higher priority than * bdflush, otherwise bdflush will deadlock if there are too * many dirty RAID5 blocks. */ allow_signal(SIGKILL); while (!kthread_should_stop()) { /* We need to wait INTERRUPTIBLE so that * we don't add to the load-average. * That means we need to be sure no signals are * pending */ if (signal_pending(current)) flush_signals(current); wait_event_interruptible_timeout (thread->wqueue, test_bit(THREAD_WAKEUP, &thread->flags) || kthread_should_stop() || kthread_should_park(), thread->timeout); clear_bit(THREAD_WAKEUP, &thread->flags); if (kthread_should_park()) kthread_parkme(); if (!kthread_should_stop()) thread->run(thread); } return 0; } void md_wakeup_thread(struct md_thread *thread) { if (thread) { pr_debug("md: waking up MD thread %s.\n", thread->tsk->comm); set_bit(THREAD_WAKEUP, &thread->flags); wake_up(&thread->wqueue); } } EXPORT_SYMBOL(md_wakeup_thread); struct md_thread *md_register_thread(void (*run) (struct md_thread *), struct mddev *mddev, const char *name) { struct md_thread *thread; thread = kzalloc(sizeof(struct md_thread), GFP_KERNEL); if (!thread) return NULL; init_waitqueue_head(&thread->wqueue); thread->run = run; thread->mddev = mddev; thread->timeout = MAX_SCHEDULE_TIMEOUT; thread->tsk = kthread_run(md_thread, thread, "%s_%s", mdname(thread->mddev), name); if (IS_ERR(thread->tsk)) { kfree(thread); return NULL; } return thread; } EXPORT_SYMBOL(md_register_thread); void md_unregister_thread(struct md_thread **threadp) { struct md_thread *thread; /* * Locking ensures that mddev_unlock does not wake_up a * non-existent thread */ spin_lock(&pers_lock); thread = *threadp; if (!thread) { spin_unlock(&pers_lock); return; } *threadp = NULL; spin_unlock(&pers_lock); pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk)); kthread_stop(thread->tsk); kfree(thread); } EXPORT_SYMBOL(md_unregister_thread); void md_error(struct mddev *mddev, struct md_rdev *rdev) { if (!rdev || test_bit(Faulty, &rdev->flags)) return; if (!mddev->pers || !mddev->pers->error_handler) return; mddev->pers->error_handler(mddev,rdev); if (mddev->degraded) set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); sysfs_notify_dirent_safe(rdev->sysfs_state); set_bit(MD_RECOVERY_INTR, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); if (mddev->event_work.func) queue_work(md_misc_wq, &mddev->event_work); md_new_event(mddev); } EXPORT_SYMBOL(md_error); /* seq_file implementation /proc/mdstat */ static void status_unused(struct seq_file *seq) { int i = 0; struct md_rdev *rdev; seq_printf(seq, "unused devices: "); list_for_each_entry(rdev, &pending_raid_disks, same_set) { char b[BDEVNAME_SIZE]; i++; seq_printf(seq, "%s ", bdevname(rdev->bdev,b)); } if (!i) seq_printf(seq, "<none>"); seq_printf(seq, "\n"); } static int status_resync(struct seq_file *seq, struct mddev *mddev) { sector_t max_sectors, resync, res; unsigned long dt, db = 0; sector_t rt, curr_mark_cnt, resync_mark_cnt; int scale, recovery_active; unsigned int per_milli; if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) || test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) max_sectors = mddev->resync_max_sectors; else max_sectors = mddev->dev_sectors; resync = mddev->curr_resync; if (resync <= 3) { if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) /* Still cleaning up */ resync = max_sectors; } else if (resync > max_sectors) resync = max_sectors; else resync -= atomic_read(&mddev->recovery_active); if (resync == 0) { if (test_bit(MD_RESYNCING_REMOTE, &mddev->recovery)) { struct md_rdev *rdev; rdev_for_each(rdev, mddev) if (rdev->raid_disk >= 0 && !test_bit(Faulty, &rdev->flags) && rdev->recovery_offset != MaxSector && rdev->recovery_offset) { seq_printf(seq, "\trecover=REMOTE"); return 1; } if (mddev->reshape_position != MaxSector) seq_printf(seq, "\treshape=REMOTE"); else seq_printf(seq, "\tresync=REMOTE"); return 1; } if (mddev->recovery_cp < MaxSector) { seq_printf(seq, "\tresync=PENDING"); return 1; } return 0; } if (resync < 3) { seq_printf(seq, "\tresync=DELAYED"); return 1; } WARN_ON(max_sectors == 0); /* Pick 'scale' such that (resync>>scale)*1000 will fit * in a sector_t, and (max_sectors>>scale) will fit in a * u32, as those are the requirements for sector_div. * Thus 'scale' must be at least 10 */ scale = 10; if (sizeof(sector_t) > sizeof(unsigned long)) { while ( max_sectors/2 > (1ULL<<(scale+32))) scale++; } res = (resync>>scale)*1000; sector_div(res, (u32)((max_sectors>>scale)+1)); per_milli = res; { int i, x = per_milli/50, y = 20-x; seq_printf(seq, "["); for (i = 0; i < x; i++) seq_printf(seq, "="); seq_printf(seq, ">"); for (i = 0; i < y; i++) seq_printf(seq, "."); seq_printf(seq, "] "); } seq_printf(seq, " %s =%3u.%u%% (%llu/%llu)", (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)? "reshape" : (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)? "check" : (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery"))), per_milli/10, per_milli % 10, (unsigned long long) resync/2, (unsigned long long) max_sectors/2); /* * dt: time from mark until now * db: blocks written from mark until now * rt: remaining time * * rt is a sector_t, which is always 64bit now. We are keeping * the original algorithm, but it is not really necessary. * * Original algorithm: * So we divide before multiply in case it is 32bit and close * to the limit. * We scale the divisor (db) by 32 to avoid losing precision * near the end of resync when the number of remaining sectors * is close to 'db'. * We then divide rt by 32 after multiplying by db to compensate. * The '+1' avoids division by zero if db is very small. */ dt = ((jiffies - mddev->resync_mark) / HZ); if (!dt) dt++; curr_mark_cnt = mddev->curr_mark_cnt; recovery_active = atomic_read(&mddev->recovery_active); resync_mark_cnt = mddev->resync_mark_cnt; if (curr_mark_cnt >= (recovery_active + resync_mark_cnt)) db = curr_mark_cnt - (recovery_active + resync_mark_cnt); rt = max_sectors - resync; /* number of remaining sectors */ rt = div64_u64(rt, db/32+1); rt *= dt; rt >>= 5; seq_printf(seq, " finish=%lu.%lumin", (unsigned long)rt / 60, ((unsigned long)rt % 60)/6); seq_printf(seq, " speed=%ldK/sec", db/2/dt); return 1; } static void *md_seq_start(struct seq_file *seq, loff_t *pos) { struct list_head *tmp; loff_t l = *pos; struct mddev *mddev; if (l == 0x10000) { ++*pos; return (void *)2; } if (l > 0x10000) return NULL; if (!l--) /* header */ return (void*)1; spin_lock(&all_mddevs_lock); list_for_each(tmp,&all_mddevs) if (!l--) { mddev = list_entry(tmp, struct mddev, all_mddevs); mddev_get(mddev); spin_unlock(&all_mddevs_lock); return mddev; } spin_unlock(&all_mddevs_lock); if (!l--) return (void*)2;/* tail */ return NULL; } static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct list_head *tmp; struct mddev *next_mddev, *mddev = v; ++*pos; if (v == (void*)2) return NULL; spin_lock(&all_mddevs_lock); if (v == (void*)1) tmp = all_mddevs.next; else tmp = mddev->all_mddevs.next; if (tmp != &all_mddevs) next_mddev = mddev_get(list_entry(tmp,struct mddev,all_mddevs)); else { next_mddev = (void*)2; *pos = 0x10000; } spin_unlock(&all_mddevs_lock); if (v != (void*)1) mddev_put(mddev); return next_mddev; } static void md_seq_stop(struct seq_file *seq, void *v) { struct mddev *mddev = v; if (mddev && v != (void*)1 && v != (void*)2) mddev_put(mddev); } static int md_seq_show(struct seq_file *seq, void *v) { struct mddev *mddev = v; sector_t sectors; struct md_rdev *rdev; if (v == (void*)1) { struct md_personality *pers; seq_printf(seq, "Personalities : "); spin_lock(&pers_lock); list_for_each_entry(pers, &pers_list, list) seq_printf(seq, "[%s] ", pers->name); spin_unlock(&pers_lock); seq_printf(seq, "\n"); seq->poll_event = atomic_read(&md_event_count); return 0; } if (v == (void*)2) { status_unused(seq); return 0; } spin_lock(&mddev->lock); if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { seq_printf(seq, "%s : %sactive", mdname(mddev), mddev->pers ? "" : "in"); if (mddev->pers) { if (mddev->ro==1) seq_printf(seq, " (read-only)"); if (mddev->ro==2) seq_printf(seq, " (auto-read-only)"); seq_printf(seq, " %s", mddev->pers->name); } sectors = 0; rcu_read_lock(); rdev_for_each_rcu(rdev, mddev) { char b[BDEVNAME_SIZE]; seq_printf(seq, " %s[%d]", bdevname(rdev->bdev,b), rdev->desc_nr); if (test_bit(WriteMostly, &rdev->flags)) seq_printf(seq, "(W)"); if (test_bit(Journal, &rdev->flags)) seq_printf(seq, "(J)"); if (test_bit(Faulty, &rdev->flags)) { seq_printf(seq, "(F)"); continue; } if (rdev->raid_disk < 0) seq_printf(seq, "(S)"); /* spare */ if (test_bit(Replacement, &rdev->flags)) seq_printf(seq, "(R)"); sectors += rdev->sectors; } rcu_read_unlock(); if (!list_empty(&mddev->disks)) { if (mddev->pers) seq_printf(seq, "\n %llu blocks", (unsigned long long) mddev->array_sectors / 2); else seq_printf(seq, "\n %llu blocks", (unsigned long long)sectors / 2); } if (mddev->persistent) { if (mddev->major_version != 0 || mddev->minor_version != 90) { seq_printf(seq," super %d.%d", mddev->major_version, mddev->minor_version); } } else if (mddev->external) seq_printf(seq, " super external:%s", mddev->metadata_type); else seq_printf(seq, " super non-persistent"); if (mddev->pers) { mddev->pers->status(seq, mddev); seq_printf(seq, "\n "); if (mddev->pers->sync_request) { if (status_resync(seq, mddev)) seq_printf(seq, "\n "); } } else seq_printf(seq, "\n "); md_bitmap_status(seq, mddev->bitmap); seq_printf(seq, "\n"); } spin_unlock(&mddev->lock); return 0; } static const struct seq_operations md_seq_ops = { .start = md_seq_start, .next = md_seq_next, .stop = md_seq_stop, .show = md_seq_show, }; static int md_seq_open(struct inode *inode, struct file *file) { struct seq_file *seq; int error; error = seq_open(file, &md_seq_ops); if (error) return error; seq = file->private_data; seq->poll_event = atomic_read(&md_event_count); return error; } static int md_unloading; static __poll_t mdstat_poll(struct file *filp, poll_table *wait) { struct seq_file *seq = filp->private_data; __poll_t mask; if (md_unloading) return EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI; poll_wait(filp, &md_event_waiters, wait); /* always allow read */ mask = EPOLLIN | EPOLLRDNORM; if (seq->poll_event != atomic_read(&md_event_count)) mask |= EPOLLERR | EPOLLPRI; return mask; } static const struct file_operations md_seq_fops = { .owner = THIS_MODULE, .open = md_seq_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release, .poll = mdstat_poll, }; int register_md_personality(struct md_personality *p) { pr_debug("md: %s personality registered for level %d\n", p->name, p->level); spin_lock(&pers_lock); list_add_tail(&p->list, &pers_list); spin_unlock(&pers_lock); return 0; } EXPORT_SYMBOL(register_md_personality); int unregister_md_personality(struct md_personality *p) { pr_debug("md: %s personality unregistered\n", p->name); spin_lock(&pers_lock); list_del_init(&p->list); spin_unlock(&pers_lock); return 0; } EXPORT_SYMBOL(unregister_md_personality); int register_md_cluster_operations(struct md_cluster_operations *ops, struct module *module) { int ret = 0; spin_lock(&pers_lock); if (md_cluster_ops != NULL) ret = -EALREADY; else { md_cluster_ops = ops; md_cluster_mod = module; } spin_unlock(&pers_lock); return ret; } EXPORT_SYMBOL(register_md_cluster_operations); int unregister_md_cluster_operations(void) { spin_lock(&pers_lock); md_cluster_ops = NULL; spin_unlock(&pers_lock); return 0; } EXPORT_SYMBOL(unregister_md_cluster_operations); int md_setup_cluster(struct mddev *mddev, int nodes) { if (!md_cluster_ops) request_module("md-cluster"); spin_lock(&pers_lock); /* ensure module won't be unloaded */ if (!md_cluster_ops || !try_module_get(md_cluster_mod)) { pr_warn("can't find md-cluster module or get it's reference.\n"); spin_unlock(&pers_lock); return -ENOENT; } spin_unlock(&pers_lock); return md_cluster_ops->join(mddev, nodes); } void md_cluster_stop(struct mddev *mddev) { if (!md_cluster_ops) return; md_cluster_ops->leave(mddev); module_put(md_cluster_mod); } static int is_mddev_idle(struct mddev *mddev, int init) { struct md_rdev *rdev; int idle; int curr_events; idle = 1; rcu_read_lock(); rdev_for_each_rcu(rdev, mddev) { struct gendisk *disk = rdev->bdev->bd_contains->bd_disk; curr_events = (int)part_stat_read_accum(&disk->part0, sectors) - atomic_read(&disk->sync_io); /* sync IO will cause sync_io to increase before the disk_stats * as sync_io is counted when a request starts, and * disk_stats is counted when it completes. * So resync activity will cause curr_events to be smaller than * when there was no such activity. * non-sync IO will cause disk_stat to increase without * increasing sync_io so curr_events will (eventually) * be larger than it was before. Once it becomes * substantially larger, the test below will cause * the array to appear non-idle, and resync will slow * down. * If there is a lot of outstanding resync activity when * we set last_event to curr_events, then all that activity * completing might cause the array to appear non-idle * and resync will be slowed down even though there might * not have been non-resync activity. This will only * happen once though. 'last_events' will soon reflect * the state where there is little or no outstanding * resync requests, and further resync activity will * always make curr_events less than last_events. * */ if (init || curr_events - rdev->last_events > 64) { rdev->last_events = curr_events; idle = 0; } } rcu_read_unlock(); return idle; } void md_done_sync(struct mddev *mddev, int blocks, int ok) { /* another "blocks" (512byte) blocks have been synced */ atomic_sub(blocks, &mddev->recovery_active); wake_up(&mddev->recovery_wait); if (!ok) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); set_bit(MD_RECOVERY_ERROR, &mddev->recovery); md_wakeup_thread(mddev->thread); // stop recovery, signal do_sync .... } } EXPORT_SYMBOL(md_done_sync); /* md_write_start(mddev, bi) * If we need to update some array metadata (e.g. 'active' flag * in superblock) before writing, schedule a superblock update * and wait for it to complete. * A return value of 'false' means that the write wasn't recorded * and cannot proceed as the array is being suspend. */ bool md_write_start(struct mddev *mddev, struct bio *bi) { int did_change = 0; if (bio_data_dir(bi) != WRITE) return true; BUG_ON(mddev->ro == 1); if (mddev->ro == 2) { /* need to switch to read/write */ mddev->ro = 0; set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); md_wakeup_thread(mddev->sync_thread); did_change = 1; } rcu_read_lock(); percpu_ref_get(&mddev->writes_pending); smp_mb(); /* Match smp_mb in set_in_sync() */ if (mddev->safemode == 1) mddev->safemode = 0; /* sync_checkers is always 0 when writes_pending is in per-cpu mode */ if (mddev->in_sync || mddev->sync_checkers) { spin_lock(&mddev->lock); if (mddev->in_sync) { mddev->in_sync = 0; set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); md_wakeup_thread(mddev->thread); did_change = 1; } spin_unlock(&mddev->lock); } rcu_read_unlock(); if (did_change) sysfs_notify_dirent_safe(mddev->sysfs_state); if (!mddev->has_superblocks) return true; wait_event(mddev->sb_wait, !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) || mddev->suspended); if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) { percpu_ref_put(&mddev->writes_pending); return false; } return true; } EXPORT_SYMBOL(md_write_start); /* md_write_inc can only be called when md_write_start() has * already been called at least once of the current request. * It increments the counter and is useful when a single request * is split into several parts. Each part causes an increment and * so needs a matching md_write_end(). * Unlike md_write_start(), it is safe to call md_write_inc() inside * a spinlocked region. */ void md_write_inc(struct mddev *mddev, struct bio *bi) { if (bio_data_dir(bi) != WRITE) return; WARN_ON_ONCE(mddev->in_sync || mddev->ro); percpu_ref_get(&mddev->writes_pending); } EXPORT_SYMBOL(md_write_inc); void md_write_end(struct mddev *mddev) { percpu_ref_put(&mddev->writes_pending); if (mddev->safemode == 2) md_wakeup_thread(mddev->thread); else if (mddev->safemode_delay) /* The roundup() ensures this only performs locking once * every ->safemode_delay jiffies */ mod_timer(&mddev->safemode_timer, roundup(jiffies, mddev->safemode_delay) + mddev->safemode_delay); } EXPORT_SYMBOL(md_write_end); /* md_allow_write(mddev) * Calling this ensures that the array is marked 'active' so that writes * may proceed without blocking. It is important to call this before * attempting a GFP_KERNEL allocation while holding the mddev lock. * Must be called with mddev_lock held. */ void md_allow_write(struct mddev *mddev) { if (!mddev->pers) return; if (mddev->ro) return; if (!mddev->pers->sync_request) return; spin_lock(&mddev->lock); if (mddev->in_sync) { mddev->in_sync = 0; set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); if (mddev->safemode_delay && mddev->safemode == 0) mddev->safemode = 1; spin_unlock(&mddev->lock); md_update_sb(mddev, 0); sysfs_notify_dirent_safe(mddev->sysfs_state); /* wait for the dirty state to be recorded in the metadata */ wait_event(mddev->sb_wait, !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)); } else spin_unlock(&mddev->lock); } EXPORT_SYMBOL_GPL(md_allow_write); #define SYNC_MARKS 10 #define SYNC_MARK_STEP (3*HZ) #define UPDATE_FREQUENCY (5*60*HZ) void md_do_sync(struct md_thread *thread) { struct mddev *mddev = thread->mddev; struct mddev *mddev2; unsigned int currspeed = 0, window; sector_t max_sectors,j, io_sectors, recovery_done; unsigned long mark[SYNC_MARKS]; unsigned long update_time; sector_t mark_cnt[SYNC_MARKS]; int last_mark,m; struct list_head *tmp; sector_t last_check; int skipped = 0; struct md_rdev *rdev; char *desc, *action = NULL; struct blk_plug plug; int ret; /* just incase thread restarts... */ if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) || test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) return; if (mddev->ro) {/* never try to sync a read-only array */ set_bit(MD_RECOVERY_INTR, &mddev->recovery); return; } if (mddev_is_clustered(mddev)) { ret = md_cluster_ops->resync_start(mddev); if (ret) goto skip; set_bit(MD_CLUSTER_RESYNC_LOCKED, &mddev->flags); if (!(test_bit(MD_RECOVERY_SYNC, &mddev->recovery) || test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) || test_bit(MD_RECOVERY_RECOVER, &mddev->recovery)) && ((unsigned long long)mddev->curr_resync_completed < (unsigned long long)mddev->resync_max_sectors)) goto skip; } if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) { desc = "data-check"; action = "check"; } else if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) { desc = "requested-resync"; action = "repair"; } else desc = "resync"; } else if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) desc = "reshape"; else desc = "recovery"; mddev->last_sync_action = action ?: desc; /* we overload curr_resync somewhat here. * 0 == not engaged in resync at all * 2 == checking that there is no conflict with another sync * 1 == like 2, but have yielded to allow conflicting resync to * commense * other == active in resync - this many blocks * * Before starting a resync we must have set curr_resync to * 2, and then checked that every "conflicting" array has curr_resync * less than ours. When we find one that is the same or higher * we wait on resync_wait. To avoid deadlock, we reduce curr_resync * to 1 if we choose to yield (based arbitrarily on address of mddev structure). * This will mean we have to start checking from the beginning again. * */ do { int mddev2_minor = -1; mddev->curr_resync = 2; try_again: if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) goto skip; for_each_mddev(mddev2, tmp) { if (mddev2 == mddev) continue; if (!mddev->parallel_resync && mddev2->curr_resync && match_mddev_units(mddev, mddev2)) { DEFINE_WAIT(wq); if (mddev < mddev2 && mddev->curr_resync == 2) { /* arbitrarily yield */ mddev->curr_resync = 1; wake_up(&resync_wait); } if (mddev > mddev2 && mddev->curr_resync == 1) /* no need to wait here, we can wait the next * time 'round when curr_resync == 2 */ continue; /* We need to wait 'interruptible' so as not to * contribute to the load average, and not to * be caught by 'softlockup' */ prepare_to_wait(&resync_wait, &wq, TASK_INTERRUPTIBLE); if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) && mddev2->curr_resync >= mddev->curr_resync) { if (mddev2_minor != mddev2->md_minor) { mddev2_minor = mddev2->md_minor; pr_info("md: delaying %s of %s until %s has finished (they share one or more physical units)\n", desc, mdname(mddev), mdname(mddev2)); } mddev_put(mddev2); if (signal_pending(current)) flush_signals(current); schedule(); finish_wait(&resync_wait, &wq); goto try_again; } finish_wait(&resync_wait, &wq); } } } while (mddev->curr_resync < 2); j = 0; if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { /* resync follows the size requested by the personality, * which defaults to physical size, but can be virtual size */ max_sectors = mddev->resync_max_sectors; atomic64_set(&mddev->resync_mismatches, 0); /* we don't use the checkpoint if there's a bitmap */ if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) j = mddev->resync_min; else if (!mddev->bitmap) j = mddev->recovery_cp; } else if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) max_sectors = mddev->resync_max_sectors; else { /* recovery follows the physical size of devices */ max_sectors = mddev->dev_sectors; j = MaxSector; rcu_read_lock(); rdev_for_each_rcu(rdev, mddev) if (rdev->raid_disk >= 0 && !test_bit(Journal, &rdev->flags) && !test_bit(Faulty, &rdev->flags) && !test_bit(In_sync, &rdev->flags) && rdev->recovery_offset < j) j = rdev->recovery_offset; rcu_read_unlock(); /* If there is a bitmap, we need to make sure all * writes that started before we added a spare * complete before we start doing a recovery. * Otherwise the write might complete and (via * bitmap_endwrite) set a bit in the bitmap after the * recovery has checked that bit and skipped that * region. */ if (mddev->bitmap) { mddev->pers->quiesce(mddev, 1); mddev->pers->quiesce(mddev, 0); } } pr_info("md: %s of RAID array %s\n", desc, mdname(mddev)); pr_debug("md: minimum _guaranteed_ speed: %d KB/sec/disk.\n", speed_min(mddev)); pr_debug("md: using maximum available idle IO bandwidth (but not more than %d KB/sec) for %s.\n", speed_max(mddev), desc); is_mddev_idle(mddev, 1); /* this initializes IO event counters */ io_sectors = 0; for (m = 0; m < SYNC_MARKS; m++) { mark[m] = jiffies; mark_cnt[m] = io_sectors; } last_mark = 0; mddev->resync_mark = mark[last_mark]; mddev->resync_mark_cnt = mark_cnt[last_mark]; /* * Tune reconstruction: */ window = 32*(PAGE_SIZE/512); pr_debug("md: using %dk window, over a total of %lluk.\n", window/2, (unsigned long long)max_sectors/2); atomic_set(&mddev->recovery_active, 0); last_check = 0; if (j>2) { pr_debug("md: resuming %s of %s from checkpoint.\n", desc, mdname(mddev)); mddev->curr_resync = j; } else mddev->curr_resync = 3; /* no longer delayed */ mddev->curr_resync_completed = j; sysfs_notify(&mddev->kobj, NULL, "sync_completed"); md_new_event(mddev); update_time = jiffies; blk_start_plug(&plug); while (j < max_sectors) { sector_t sectors; skipped = 0; if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && ((mddev->curr_resync > mddev->curr_resync_completed && (mddev->curr_resync - mddev->curr_resync_completed) > (max_sectors >> 4)) || time_after_eq(jiffies, update_time + UPDATE_FREQUENCY) || (j - mddev->curr_resync_completed)*2 >= mddev->resync_max - mddev->curr_resync_completed || mddev->curr_resync_completed > mddev->resync_max )) { /* time to update curr_resync_completed */ wait_event(mddev->recovery_wait, atomic_read(&mddev->recovery_active) == 0); mddev->curr_resync_completed = j; if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && j > mddev->recovery_cp) mddev->recovery_cp = j; update_time = jiffies; set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); sysfs_notify(&mddev->kobj, NULL, "sync_completed"); } while (j >= mddev->resync_max && !test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { /* As this condition is controlled by user-space, * we can block indefinitely, so use '_interruptible' * to avoid triggering warnings. */ flush_signals(current); /* just in case */ wait_event_interruptible(mddev->recovery_wait, mddev->resync_max > j || test_bit(MD_RECOVERY_INTR, &mddev->recovery)); } if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) break; sectors = mddev->pers->sync_request(mddev, j, &skipped); if (sectors == 0) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); break; } if (!skipped) { /* actual IO requested */ io_sectors += sectors; atomic_add(sectors, &mddev->recovery_active); } if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) break; j += sectors; if (j > max_sectors) /* when skipping, extra large numbers can be returned. */ j = max_sectors; if (j > 2) mddev->curr_resync = j; mddev->curr_mark_cnt = io_sectors; if (last_check == 0) /* this is the earliest that rebuild will be * visible in /proc/mdstat */ md_new_event(mddev); if (last_check + window > io_sectors || j == max_sectors) continue; last_check = io_sectors; repeat: if (time_after_eq(jiffies, mark[last_mark] + SYNC_MARK_STEP )) { /* step marks */ int next = (last_mark+1) % SYNC_MARKS; mddev->resync_mark = mark[next]; mddev->resync_mark_cnt = mark_cnt[next]; mark[next] = jiffies; mark_cnt[next] = io_sectors - atomic_read(&mddev->recovery_active); last_mark = next; } if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) break; /* * this loop exits only if either when we are slower than * the 'hard' speed limit, or the system was IO-idle for * a jiffy. * the system might be non-idle CPU-wise, but we only care * about not overloading the IO subsystem. (things like an * e2fsck being done on the RAID array should execute fast) */ cond_resched(); recovery_done = io_sectors - atomic_read(&mddev->recovery_active); currspeed = ((unsigned long)(recovery_done - mddev->resync_mark_cnt))/2 /((jiffies-mddev->resync_mark)/HZ +1) +1; if (currspeed > speed_min(mddev)) { if (currspeed > speed_max(mddev)) { msleep(500); goto repeat; } if (!is_mddev_idle(mddev, 0)) { /* * Give other IO more of a chance. * The faster the devices, the less we wait. */ wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); } } } pr_info("md: %s: %s %s.\n",mdname(mddev), desc, test_bit(MD_RECOVERY_INTR, &mddev->recovery) ? "interrupted" : "done"); /* * this also signals 'finished resyncing' to md_stop */ blk_finish_plug(&plug); wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && !test_bit(MD_RECOVERY_INTR, &mddev->recovery) && mddev->curr_resync > 3) { mddev->curr_resync_completed = mddev->curr_resync; sysfs_notify(&mddev->kobj, NULL, "sync_completed"); } mddev->pers->sync_request(mddev, max_sectors, &skipped); if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) && mddev->curr_resync > 3) { if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { if (mddev->curr_resync >= mddev->recovery_cp) { pr_debug("md: checkpointing %s of %s.\n", desc, mdname(mddev)); if (test_bit(MD_RECOVERY_ERROR, &mddev->recovery)) mddev->recovery_cp = mddev->curr_resync_completed; else mddev->recovery_cp = mddev->curr_resync; } } else mddev->recovery_cp = MaxSector; } else { if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) mddev->curr_resync = MaxSector; if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && test_bit(MD_RECOVERY_RECOVER, &mddev->recovery)) { rcu_read_lock(); rdev_for_each_rcu(rdev, mddev) if (rdev->raid_disk >= 0 && mddev->delta_disks >= 0 && !test_bit(Journal, &rdev->flags) && !test_bit(Faulty, &rdev->flags) && !test_bit(In_sync, &rdev->flags) && rdev->recovery_offset < mddev->curr_resync) rdev->recovery_offset = mddev->curr_resync; rcu_read_unlock(); } } } skip: /* set CHANGE_PENDING here since maybe another update is needed, * so other nodes are informed. It should be harmless for normal * raid */ set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_PENDING) | BIT(MD_SB_CHANGE_DEVS)); if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && !test_bit(MD_RECOVERY_INTR, &mddev->recovery) && mddev->delta_disks > 0 && mddev->pers->finish_reshape && mddev->pers->size && mddev->queue) { mddev_lock_nointr(mddev); md_set_array_sectors(mddev, mddev->pers->size(mddev, 0, 0)); mddev_unlock(mddev); set_capacity(mddev->gendisk, mddev->array_sectors); revalidate_disk(mddev->gendisk); } spin_lock(&mddev->lock); if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { /* We completed so min/max setting can be forgotten if used. */ if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) mddev->resync_min = 0; mddev->resync_max = MaxSector; } else if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) mddev->resync_min = mddev->curr_resync_completed; set_bit(MD_RECOVERY_DONE, &mddev->recovery); mddev->curr_resync = 0; spin_unlock(&mddev->lock); wake_up(&resync_wait); md_wakeup_thread(mddev->thread); return; } EXPORT_SYMBOL_GPL(md_do_sync); static int remove_and_add_spares(struct mddev *mddev, struct md_rdev *this) { struct md_rdev *rdev; int spares = 0; int removed = 0; bool remove_some = false; if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) /* Mustn't remove devices when resync thread is running */ return 0; rdev_for_each(rdev, mddev) { if ((this == NULL || rdev == this) && rdev->raid_disk >= 0 && !test_bit(Blocked, &rdev->flags) && test_bit(Faulty, &rdev->flags) && atomic_read(&rdev->nr_pending)==0) { /* Faulty non-Blocked devices with nr_pending == 0 * never get nr_pending incremented, * never get Faulty cleared, and never get Blocked set. * So we can synchronize_rcu now rather than once per device */ remove_some = true; set_bit(RemoveSynchronized, &rdev->flags); } } if (remove_some) synchronize_rcu(); rdev_for_each(rdev, mddev) { if ((this == NULL || rdev == this) && rdev->raid_disk >= 0 && !test_bit(Blocked, &rdev->flags) && ((test_bit(RemoveSynchronized, &rdev->flags) || (!test_bit(In_sync, &rdev->flags) && !test_bit(Journal, &rdev->flags))) && atomic_read(&rdev->nr_pending)==0)) { if (mddev->pers->hot_remove_disk( mddev, rdev) == 0) { sysfs_unlink_rdev(mddev, rdev); rdev->saved_raid_disk = rdev->raid_disk; rdev->raid_disk = -1; removed++; } } if (remove_some && test_bit(RemoveSynchronized, &rdev->flags)) clear_bit(RemoveSynchronized, &rdev->flags); } if (removed && mddev->kobj.sd) sysfs_notify(&mddev->kobj, NULL, "degraded"); if (this && removed) goto no_add; rdev_for_each(rdev, mddev) { if (this && this != rdev) continue; if (test_bit(Candidate, &rdev->flags)) continue; if (rdev->raid_disk >= 0 && !test_bit(In_sync, &rdev->flags) && !test_bit(Journal, &rdev->flags) && !test_bit(Faulty, &rdev->flags)) spares++; if (rdev->raid_disk >= 0) continue; if (test_bit(Faulty, &rdev->flags)) continue; if (!test_bit(Journal, &rdev->flags)) { if (mddev->ro && ! (rdev->saved_raid_disk >= 0 && !test_bit(Bitmap_sync, &rdev->flags))) continue; rdev->recovery_offset = 0; } if (mddev->pers-> hot_add_disk(mddev, rdev) == 0) { if (sysfs_link_rdev(mddev, rdev)) /* failure here is OK */; if (!test_bit(Journal, &rdev->flags)) spares++; md_new_event(mddev); set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); } } no_add: if (removed) set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); return spares; } static void md_start_sync(struct work_struct *ws) { struct mddev *mddev = container_of(ws, struct mddev, del_work); mddev->sync_thread = md_register_thread(md_do_sync, mddev, "resync"); if (!mddev->sync_thread) { pr_warn("%s: could not start resync thread...\n", mdname(mddev)); /* leave the spares where they are, it shouldn't hurt */ clear_bit(MD_RECOVERY_SYNC, &mddev->recovery); clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery); wake_up(&resync_wait); if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery)) if (mddev->sysfs_action) sysfs_notify_dirent_safe(mddev->sysfs_action); } else md_wakeup_thread(mddev->sync_thread); sysfs_notify_dirent_safe(mddev->sysfs_action); md_new_event(mddev); } /* * This routine is regularly called by all per-raid-array threads to * deal with generic issues like resync and super-block update. * Raid personalities that don't have a thread (linear/raid0) do not * need this as they never do any recovery or update the superblock. * * It does not do any resync itself, but rather "forks" off other threads * to do that as needed. * When it is determined that resync is needed, we set MD_RECOVERY_RUNNING in * "->recovery" and create a thread at ->sync_thread. * When the thread finishes it sets MD_RECOVERY_DONE * and wakeups up this thread which will reap the thread and finish up. * This thread also removes any faulty devices (with nr_pending == 0). * * The overall approach is: * 1/ if the superblock needs updating, update it. * 2/ If a recovery thread is running, don't do anything else. * 3/ If recovery has finished, clean up, possibly marking spares active. * 4/ If there are any faulty devices, remove them. * 5/ If array is degraded, try to add spares devices * 6/ If array has spares or is not in-sync, start a resync thread. */ void md_check_recovery(struct mddev *mddev) { if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags) && mddev->sb_flags) { /* Write superblock - thread that called mddev_suspend() * holds reconfig_mutex for us. */ set_bit(MD_UPDATING_SB, &mddev->flags); smp_mb__after_atomic(); if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags)) md_update_sb(mddev, 0); clear_bit_unlock(MD_UPDATING_SB, &mddev->flags); wake_up(&mddev->sb_wait); } if (mddev->suspended) return; if (mddev->bitmap) md_bitmap_daemon_work(mddev); if (signal_pending(current)) { if (mddev->pers->sync_request && !mddev->external) { pr_debug("md: %s in immediate safe mode\n", mdname(mddev)); mddev->safemode = 2; } flush_signals(current); } if (mddev->ro && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) return; if ( ! ( (mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) || test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) || test_bit(MD_RECOVERY_DONE, &mddev->recovery) || (mddev->external == 0 && mddev->safemode == 1) || (mddev->safemode == 2 && !mddev->in_sync && mddev->recovery_cp == MaxSector) )) return; if (mddev_trylock(mddev)) { int spares = 0; bool try_set_sync = mddev->safemode != 0; if (!mddev->external && mddev->safemode == 1) mddev->safemode = 0; if (mddev->ro) { struct md_rdev *rdev; if (!mddev->external && mddev->in_sync) /* 'Blocked' flag not needed as failed devices * will be recorded if array switched to read/write. * Leaving it set will prevent the device * from being removed. */ rdev_for_each(rdev, mddev) clear_bit(Blocked, &rdev->flags); /* On a read-only array we can: * - remove failed devices * - add already-in_sync devices if the array itself * is in-sync. * As we only add devices that are already in-sync, * we can activate the spares immediately. */ remove_and_add_spares(mddev, NULL); /* There is no thread, but we need to call * ->spare_active and clear saved_raid_disk */ set_bit(MD_RECOVERY_INTR, &mddev->recovery); md_reap_sync_thread(mddev); clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); goto unlock; } if (mddev_is_clustered(mddev)) { struct md_rdev *rdev, *tmp; /* kick the device if another node issued a * remove disk. */ rdev_for_each_safe(rdev, tmp, mddev) { if (test_and_clear_bit(ClusterRemove, &rdev->flags) && rdev->raid_disk < 0) md_kick_rdev_from_array(rdev); } } if (try_set_sync && !mddev->external && !mddev->in_sync) { spin_lock(&mddev->lock); set_in_sync(mddev); spin_unlock(&mddev->lock); } if (mddev->sb_flags) md_update_sb(mddev, 0); if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) { /* resync/recovery still happening */ clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); goto unlock; } if (mddev->sync_thread) { md_reap_sync_thread(mddev); goto unlock; } /* Set RUNNING before clearing NEEDED to avoid * any transients in the value of "sync_action". */ mddev->curr_resync_completed = 0; spin_lock(&mddev->lock); set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); spin_unlock(&mddev->lock); /* Clear some bits that don't mean anything, but * might be left set */ clear_bit(MD_RECOVERY_INTR, &mddev->recovery); clear_bit(MD_RECOVERY_DONE, &mddev->recovery); if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) || test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) goto not_running; /* no recovery is running. * remove any failed drives, then * add spares if possible. * Spares are also removed and re-added, to allow * the personality to fail the re-add. */ if (mddev->reshape_position != MaxSector) { if (mddev->pers->check_reshape == NULL || mddev->pers->check_reshape(mddev) != 0) /* Cannot proceed */ goto not_running; set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); } else if ((spares = remove_and_add_spares(mddev, NULL))) { clear_bit(MD_RECOVERY_SYNC, &mddev->recovery); clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); } else if (mddev->recovery_cp < MaxSector) { set_bit(MD_RECOVERY_SYNC, &mddev->recovery); clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); } else if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) /* nothing to be done ... */ goto not_running; if (mddev->pers->sync_request) { if (spares) { /* We are adding a device or devices to an array * which has the bitmap stored on all devices. * So make sure all bitmap pages get written */ md_bitmap_write_all(mddev->bitmap); } INIT_WORK(&mddev->del_work, md_start_sync); queue_work(md_misc_wq, &mddev->del_work); goto unlock; } not_running: if (!mddev->sync_thread) { clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery); wake_up(&resync_wait); if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery)) if (mddev->sysfs_action) sysfs_notify_dirent_safe(mddev->sysfs_action); } unlock: wake_up(&mddev->sb_wait); mddev_unlock(mddev); } } EXPORT_SYMBOL(md_check_recovery); void md_reap_sync_thread(struct mddev *mddev) { struct md_rdev *rdev; /* resync has finished, collect result */ md_unregister_thread(&mddev->sync_thread); if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) && mddev->degraded != mddev->raid_disks) { /* success...*/ /* activate any spares */ if (mddev->pers->spare_active(mddev)) { sysfs_notify(&mddev->kobj, NULL, "degraded"); set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); } } if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && mddev->pers->finish_reshape) mddev->pers->finish_reshape(mddev); /* If array is no-longer degraded, then any saved_raid_disk * information must be scrapped. */ if (!mddev->degraded) rdev_for_each(rdev, mddev) rdev->saved_raid_disk = -1; md_update_sb(mddev, 1); /* MD_SB_CHANGE_PENDING should be cleared by md_update_sb, so we can * call resync_finish here if MD_CLUSTER_RESYNC_LOCKED is set by * clustered raid */ if (test_and_clear_bit(MD_CLUSTER_RESYNC_LOCKED, &mddev->flags)) md_cluster_ops->resync_finish(mddev); clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery); clear_bit(MD_RECOVERY_DONE, &mddev->recovery); clear_bit(MD_RECOVERY_SYNC, &mddev->recovery); clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); wake_up(&resync_wait); /* flag recovery needed just to double check */ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); sysfs_notify_dirent_safe(mddev->sysfs_action); md_new_event(mddev); if (mddev->event_work.func) queue_work(md_misc_wq, &mddev->event_work); } EXPORT_SYMBOL(md_reap_sync_thread); void md_wait_for_blocked_rdev(struct md_rdev *rdev, struct mddev *mddev) { sysfs_notify_dirent_safe(rdev->sysfs_state); wait_event_timeout(rdev->blocked_wait, !test_bit(Blocked, &rdev->flags) && !test_bit(BlockedBadBlocks, &rdev->flags), msecs_to_jiffies(5000)); rdev_dec_pending(rdev, mddev); } EXPORT_SYMBOL(md_wait_for_blocked_rdev); void md_finish_reshape(struct mddev *mddev) { /* called be personality module when reshape completes. */ struct md_rdev *rdev; rdev_for_each(rdev, mddev) { if (rdev->data_offset > rdev->new_data_offset) rdev->sectors += rdev->data_offset - rdev->new_data_offset; else rdev->sectors -= rdev->new_data_offset - rdev->data_offset; rdev->data_offset = rdev->new_data_offset; } } EXPORT_SYMBOL(md_finish_reshape); /* Bad block management */ /* Returns 1 on success, 0 on failure */ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors, int is_new) { struct mddev *mddev = rdev->mddev; int rv; if (is_new) s += rdev->new_data_offset; else s += rdev->data_offset; rv = badblocks_set(&rdev->badblocks, s, sectors, 0); if (rv == 0) { /* Make sure they get written out promptly */ if (test_bit(ExternalBbl, &rdev->flags)) sysfs_notify(&rdev->kobj, NULL, "unacknowledged_bad_blocks"); sysfs_notify_dirent_safe(rdev->sysfs_state); set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_CLEAN) | BIT(MD_SB_CHANGE_PENDING)); md_wakeup_thread(rdev->mddev->thread); return 1; } else return 0; } EXPORT_SYMBOL_GPL(rdev_set_badblocks); int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors, int is_new) { int rv; if (is_new) s += rdev->new_data_offset; else s += rdev->data_offset; rv = badblocks_clear(&rdev->badblocks, s, sectors); if ((rv == 0) && test_bit(ExternalBbl, &rdev->flags)) sysfs_notify(&rdev->kobj, NULL, "bad_blocks"); return rv; } EXPORT_SYMBOL_GPL(rdev_clear_badblocks); static int md_notify_reboot(struct notifier_block *this, unsigned long code, void *x) { struct list_head *tmp; struct mddev *mddev; int need_delay = 0; for_each_mddev(mddev, tmp) { if (mddev_trylock(mddev)) { if (mddev->pers) __md_stop_writes(mddev); if (mddev->persistent) mddev->safemode = 2; mddev_unlock(mddev); } need_delay = 1; } /* * certain more exotic SCSI devices are known to be * volatile wrt too early system reboots. While the * right place to handle this issue is the given * driver, we do want to have a safe RAID driver ... */ if (need_delay) mdelay(1000*1); return NOTIFY_DONE; } static struct notifier_block md_notifier = { .notifier_call = md_notify_reboot, .next = NULL, .priority = INT_MAX, /* before any real devices */ }; static void md_geninit(void) { pr_debug("md: sizeof(mdp_super_t) = %d\n", (int)sizeof(mdp_super_t)); proc_create("mdstat", S_IRUGO, NULL, &md_seq_fops); } static int __init md_init(void) { int ret = -ENOMEM; md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0); if (!md_wq) goto err_wq; md_misc_wq = alloc_workqueue("md_misc", 0, 0); if (!md_misc_wq) goto err_misc_wq; if ((ret = register_blkdev(MD_MAJOR, "md")) < 0) goto err_md; if ((ret = register_blkdev(0, "mdp")) < 0) goto err_mdp; mdp_major = ret; blk_register_region(MKDEV(MD_MAJOR, 0), 512, THIS_MODULE, md_probe, NULL, NULL); blk_register_region(MKDEV(mdp_major, 0), 1UL<<MINORBITS, THIS_MODULE, md_probe, NULL, NULL); register_reboot_notifier(&md_notifier); raid_table_header = register_sysctl_table(raid_root_table); md_geninit(); return 0; err_mdp: unregister_blkdev(MD_MAJOR, "md"); err_md: destroy_workqueue(md_misc_wq); err_misc_wq: destroy_workqueue(md_wq); err_wq: return ret; } static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev) { struct mdp_superblock_1 *sb = page_address(rdev->sb_page); struct md_rdev *rdev2, *tmp; int role, ret; char b[BDEVNAME_SIZE]; /* * If size is changed in another node then we need to * do resize as well. */ if (mddev->dev_sectors != le64_to_cpu(sb->size)) { ret = mddev->pers->resize(mddev, le64_to_cpu(sb->size)); if (ret) pr_info("md-cluster: resize failed\n"); else md_bitmap_update_sb(mddev->bitmap); } /* Check for change of roles in the active devices */ rdev_for_each_safe(rdev2, tmp, mddev) { if (test_bit(Faulty, &rdev2->flags)) continue; /* Check if the roles changed */ role = le16_to_cpu(sb->dev_roles[rdev2->desc_nr]); if (test_bit(Candidate, &rdev2->flags)) { if (role == 0xfffe) { pr_info("md: Removing Candidate device %s because add failed\n", bdevname(rdev2->bdev,b)); md_kick_rdev_from_array(rdev2); continue; } else clear_bit(Candidate, &rdev2->flags); } if (role != rdev2->raid_disk) { /* got activated */ if (rdev2->raid_disk == -1 && role != 0xffff) { rdev2->saved_raid_disk = role; ret = remove_and_add_spares(mddev, rdev2); pr_info("Activated spare: %s\n", bdevname(rdev2->bdev,b)); /* wakeup mddev->thread here, so array could * perform resync with the new activated disk */ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); } /* device faulty * We just want to do the minimum to mark the disk * as faulty. The recovery is performed by the * one who initiated the error. */ if ((role == 0xfffe) || (role == 0xfffd)) { md_error(mddev, rdev2); clear_bit(Blocked, &rdev2->flags); } } } if (mddev->raid_disks != le32_to_cpu(sb->raid_disks)) { ret = update_raid_disks(mddev, le32_to_cpu(sb->raid_disks)); if (ret) pr_warn("md: updating array disks failed. %d\n", ret); } /* Finally set the event to be up to date */ mddev->events = le64_to_cpu(sb->events); } static int read_rdev(struct mddev *mddev, struct md_rdev *rdev) { int err; struct page *swapout = rdev->sb_page; struct mdp_superblock_1 *sb; /* Store the sb page of the rdev in the swapout temporary * variable in case we err in the future */ rdev->sb_page = NULL; err = alloc_disk_sb(rdev); if (err == 0) { ClearPageUptodate(rdev->sb_page); rdev->sb_loaded = 0; err = super_types[mddev->major_version]. load_super(rdev, NULL, mddev->minor_version); } if (err < 0) { pr_warn("%s: %d Could not reload rdev(%d) err: %d. Restoring old values\n", __func__, __LINE__, rdev->desc_nr, err); if (rdev->sb_page) put_page(rdev->sb_page); rdev->sb_page = swapout; rdev->sb_loaded = 1; return err; } sb = page_address(rdev->sb_page); /* Read the offset unconditionally, even if MD_FEATURE_RECOVERY_OFFSET * is not set */ if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RECOVERY_OFFSET)) rdev->recovery_offset = le64_to_cpu(sb->recovery_offset); /* The other node finished recovery, call spare_active to set * device In_sync and mddev->degraded */ if (rdev->recovery_offset == MaxSector && !test_bit(In_sync, &rdev->flags) && mddev->pers->spare_active(mddev)) sysfs_notify(&mddev->kobj, NULL, "degraded"); put_page(swapout); return 0; } void md_reload_sb(struct mddev *mddev, int nr) { struct md_rdev *rdev = NULL, *iter; int err; /* Find the rdev */ rdev_for_each_rcu(iter, mddev) { if (iter->desc_nr == nr) { rdev = iter; break; } } if (!rdev) { pr_warn("%s: %d Could not find rdev with nr %d\n", __func__, __LINE__, nr); return; } err = read_rdev(mddev, rdev); if (err < 0) return; check_sb_changes(mddev, rdev); /* Read all rdev's to update recovery_offset */ rdev_for_each_rcu(rdev, mddev) { if (!test_bit(Faulty, &rdev->flags)) read_rdev(mddev, rdev); } } EXPORT_SYMBOL(md_reload_sb); #ifndef MODULE /* * Searches all registered partitions for autorun RAID arrays * at boot time. */ static DEFINE_MUTEX(detected_devices_mutex); static LIST_HEAD(all_detected_devices); struct detected_devices_node { struct list_head list; dev_t dev; }; void md_autodetect_dev(dev_t dev) { struct detected_devices_node *node_detected_dev; node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL); if (node_detected_dev) { node_detected_dev->dev = dev; mutex_lock(&detected_devices_mutex); list_add_tail(&node_detected_dev->list, &all_detected_devices); mutex_unlock(&detected_devices_mutex); } } static void autostart_arrays(int part) { struct md_rdev *rdev; struct detected_devices_node *node_detected_dev; dev_t dev; int i_scanned, i_passed; i_scanned = 0; i_passed = 0; pr_info("md: Autodetecting RAID arrays.\n"); mutex_lock(&detected_devices_mutex); while (!list_empty(&all_detected_devices) && i_scanned < INT_MAX) { i_scanned++; node_detected_dev = list_entry(all_detected_devices.next, struct detected_devices_node, list); list_del(&node_detected_dev->list); dev = node_detected_dev->dev; kfree(node_detected_dev); mutex_unlock(&detected_devices_mutex); rdev = md_import_device(dev,0, 90); mutex_lock(&detected_devices_mutex); if (IS_ERR(rdev)) continue; if (test_bit(Faulty, &rdev->flags)) continue; set_bit(AutoDetected, &rdev->flags); list_add(&rdev->same_set, &pending_raid_disks); i_passed++; } mutex_unlock(&detected_devices_mutex); pr_debug("md: Scanned %d and added %d devices.\n", i_scanned, i_passed); autorun_devices(part); } #endif /* !MODULE */ static __exit void md_exit(void) { struct mddev *mddev; struct list_head *tmp; int delay = 1; blk_unregister_region(MKDEV(MD_MAJOR,0), 512); blk_unregister_region(MKDEV(mdp_major,0), 1U << MINORBITS); unregister_blkdev(MD_MAJOR,"md"); unregister_blkdev(mdp_major, "mdp"); unregister_reboot_notifier(&md_notifier); unregister_sysctl_table(raid_table_header); /* We cannot unload the modules while some process is * waiting for us in select() or poll() - wake them up */ md_unloading = 1; while (waitqueue_active(&md_event_waiters)) { /* not safe to leave yet */ wake_up(&md_event_waiters); msleep(delay); delay += delay; } remove_proc_entry("mdstat", NULL); for_each_mddev(mddev, tmp) { export_array(mddev); mddev->ctime = 0; mddev->hold_active = 0; /* * for_each_mddev() will call mddev_put() at the end of each * iteration. As the mddev is now fully clear, this will * schedule the mddev for destruction by a workqueue, and the * destroy_workqueue() below will wait for that to complete. */ } destroy_workqueue(md_misc_wq); destroy_workqueue(md_wq); } subsys_initcall(md_init); module_exit(md_exit) static int get_ro(char *buffer, const struct kernel_param *kp) { return sprintf(buffer, "%d", start_readonly); } static int set_ro(const char *val, const struct kernel_param *kp) { return kstrtouint(val, 10, (unsigned int *)&start_readonly); } module_param_call(start_ro, set_ro, get_ro, NULL, S_IRUSR|S_IWUSR); module_param(start_dirty_degraded, int, S_IRUGO|S_IWUSR); module_param_call(new_array, add_named_array, NULL, NULL, S_IWUSR); module_param(create_on_open, bool, S_IRUSR|S_IWUSR); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("MD RAID framework"); MODULE_ALIAS("md"); MODULE_ALIAS_BLOCKDEV_MAJOR(MD_MAJOR);
6 184 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 /* * Handle firewalling core * Linux ethernet bridge * * Authors: * Lennert Buytenhek <buytenh@gnu.org> * Bart De Schuymer <bdschuym@pandora.be> * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * * Lennert dedicates this file to Kerstin Wurdinger. */ #include <linux/module.h> #include <linux/kernel.h> #include <linux/in_route.h> #include <linux/inetdevice.h> #include <net/route.h> #include "br_private.h" #ifdef CONFIG_SYSCTL #include <linux/sysctl.h> #endif static void fake_update_pmtu(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, bool confirm_neigh) { } static void fake_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb) { } static u32 *fake_cow_metrics(struct dst_entry *dst, unsigned long old) { return NULL; } static struct neighbour *fake_neigh_lookup(const struct dst_entry *dst, struct sk_buff *skb, const void *daddr) { return NULL; } static unsigned int fake_mtu(const struct dst_entry *dst) { return dst->dev->mtu; } static struct dst_ops fake_dst_ops = { .family = AF_INET, .update_pmtu = fake_update_pmtu, .redirect = fake_redirect, .cow_metrics = fake_cow_metrics, .neigh_lookup = fake_neigh_lookup, .mtu = fake_mtu, }; /* * Initialize bogus route table used to keep netfilter happy. * Currently, we fill in the PMTU entry because netfilter * refragmentation needs it, and the rt_flags entry because * ipt_REJECT needs it. Future netfilter modules might * require us to fill additional fields. */ static const u32 br_dst_default_metrics[RTAX_MAX] = { [RTAX_MTU - 1] = 1500, }; void br_netfilter_rtable_init(struct net_bridge *br) { struct rtable *rt = &br->fake_rtable; atomic_set(&rt->dst.__refcnt, 1); rt->dst.dev = br->dev; dst_init_metrics(&rt->dst, br_dst_default_metrics, true); rt->dst.flags = DST_NOXFRM | DST_FAKE_RTABLE; rt->dst.ops = &fake_dst_ops; } int __init br_nf_core_init(void) { return dst_entries_init(&fake_dst_ops); } void br_nf_core_fini(void) { dst_entries_destroy(&fake_dst_ops); }
3358 3356 275 275 275 3356 200 71 272 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 /* * Fast and scalable bitmaps. * * Copyright (C) 2016 Facebook * Copyright (C) 2013-2014 Jens Axboe * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License v2 as published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program. If not, see <https://www.gnu.org/licenses/>. */ #ifndef __LINUX_SCALE_BITMAP_H #define __LINUX_SCALE_BITMAP_H #include <linux/kernel.h> #include <linux/slab.h> struct seq_file; /** * struct sbitmap_word - Word in a &struct sbitmap. */ struct sbitmap_word { /** * @word: The bitmap word itself. */ unsigned long word; /** * @depth: Number of bits being used in @word. */ unsigned long depth; } ____cacheline_aligned_in_smp; /** * struct sbitmap - Scalable bitmap. * * A &struct sbitmap is spread over multiple cachelines to avoid ping-pong. This * trades off higher memory usage for better scalability. */ struct sbitmap { /** * @depth: Number of bits used in the whole bitmap. */ unsigned int depth; /** * @shift: log2(number of bits used per word) */ unsigned int shift; /** * @map_nr: Number of words (cachelines) being used for the bitmap. */ unsigned int map_nr; /** * @map: Allocated bitmap. */ struct sbitmap_word *map; }; #define SBQ_WAIT_QUEUES 8 #define SBQ_WAKE_BATCH 8 /** * struct sbq_wait_state - Wait queue in a &struct sbitmap_queue. */ struct sbq_wait_state { /** * @wait_cnt: Number of frees remaining before we wake up. */ atomic_t wait_cnt; /** * @wait: Wait queue. */ wait_queue_head_t wait; } ____cacheline_aligned_in_smp; /** * struct sbitmap_queue - Scalable bitmap with the added ability to wait on free * bits. * * A &struct sbitmap_queue uses multiple wait queues and rolling wakeups to * avoid contention on the wait queue spinlock. This ensures that we don't hit a * scalability wall when we run out of free bits and have to start putting tasks * to sleep. */ struct sbitmap_queue { /** * @sb: Scalable bitmap. */ struct sbitmap sb; /* * @alloc_hint: Cache of last successfully allocated or freed bit. * * This is per-cpu, which allows multiple users to stick to different * cachelines until the map is exhausted. */ unsigned int __percpu *alloc_hint; /** * @wake_batch: Number of bits which must be freed before we wake up any * waiters. */ unsigned int wake_batch; /** * @wake_index: Next wait queue in @ws to wake up. */ atomic_t wake_index; /** * @ws: Wait queues. */ struct sbq_wait_state *ws; /** * @round_robin: Allocate bits in strict round-robin order. */ bool round_robin; /** * @min_shallow_depth: The minimum shallow depth which may be passed to * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow(). */ unsigned int min_shallow_depth; }; /** * sbitmap_init_node() - Initialize a &struct sbitmap on a specific memory node. * @sb: Bitmap to initialize. * @depth: Number of bits to allocate. * @shift: Use 2^@shift bits per word in the bitmap; if a negative number if * given, a good default is chosen. * @flags: Allocation flags. * @node: Memory node to allocate on. * * Return: Zero on success or negative errno on failure. */ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, gfp_t flags, int node); /** * sbitmap_free() - Free memory used by a &struct sbitmap. * @sb: Bitmap to free. */ static inline void sbitmap_free(struct sbitmap *sb) { kfree(sb->map); sb->map = NULL; } /** * sbitmap_resize() - Resize a &struct sbitmap. * @sb: Bitmap to resize. * @depth: New number of bits to resize to. * * Doesn't reallocate anything. It's up to the caller to ensure that the new * depth doesn't exceed the depth that the sb was initialized with. */ void sbitmap_resize(struct sbitmap *sb, unsigned int depth); /** * sbitmap_get() - Try to allocate a free bit from a &struct sbitmap. * @sb: Bitmap to allocate from. * @alloc_hint: Hint for where to start searching for a free bit. * @round_robin: If true, be stricter about allocation order; always allocate * starting from the last allocated bit. This is less efficient * than the default behavior (false). * * This operation provides acquire barrier semantics if it succeeds. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint, bool round_robin); /** * sbitmap_get_shallow() - Try to allocate a free bit from a &struct sbitmap, * limiting the depth used from each word. * @sb: Bitmap to allocate from. * @alloc_hint: Hint for where to start searching for a free bit. * @shallow_depth: The maximum number of bits to allocate from a single word. * * This rather specific operation allows for having multiple users with * different allocation limits. E.g., there can be a high-priority class that * uses sbitmap_get() and a low-priority class that uses sbitmap_get_shallow() * with a @shallow_depth of (1 << (@sb->shift - 1)). Then, the low-priority * class can only allocate half of the total bits in the bitmap, preventing it * from starving out the high-priority class. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint, unsigned long shallow_depth); /** * sbitmap_any_bit_set() - Check for a set bit in a &struct sbitmap. * @sb: Bitmap to check. * * Return: true if any bit in the bitmap is set, false otherwise. */ bool sbitmap_any_bit_set(const struct sbitmap *sb); /** * sbitmap_any_bit_clear() - Check for an unset bit in a &struct * sbitmap. * @sb: Bitmap to check. * * Return: true if any bit in the bitmap is clear, false otherwise. */ bool sbitmap_any_bit_clear(const struct sbitmap *sb); #define SB_NR_TO_INDEX(sb, bitnr) ((bitnr) >> (sb)->shift) #define SB_NR_TO_BIT(sb, bitnr) ((bitnr) & ((1U << (sb)->shift) - 1U)) typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *); /** * __sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap. * @start: Where to start the iteration. * @sb: Bitmap to iterate over. * @fn: Callback. Should return true to continue or false to break early. * @data: Pointer to pass to callback. * * This is inline even though it's non-trivial so that the function calls to the * callback will hopefully get optimized away. */ static inline void __sbitmap_for_each_set(struct sbitmap *sb, unsigned int start, sb_for_each_fn fn, void *data) { unsigned int index; unsigned int nr; unsigned int scanned = 0; if (start >= sb->depth) start = 0; index = SB_NR_TO_INDEX(sb, start); nr = SB_NR_TO_BIT(sb, start); while (scanned < sb->depth) { struct sbitmap_word *word = &sb->map[index]; unsigned int depth = min_t(unsigned int, word->depth - nr, sb->depth - scanned); scanned += depth; if (!word->word) goto next; /* * On the first iteration of the outer loop, we need to add the * bit offset back to the size of the word for find_next_bit(). * On all other iterations, nr is zero, so this is a noop. */ depth += nr; while (1) { nr = find_next_bit(&word->word, depth, nr); if (nr >= depth) break; if (!fn(sb, (index << sb->shift) + nr, data)) return; nr++; } next: nr = 0; if (++index >= sb->map_nr) index = 0; } } /** * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap. * @sb: Bitmap to iterate over. * @fn: Callback. Should return true to continue or false to break early. * @data: Pointer to pass to callback. */ static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn, void *data) { __sbitmap_for_each_set(sb, 0, fn, data); } static inline unsigned long *__sbitmap_word(struct sbitmap *sb, unsigned int bitnr) { return &sb->map[SB_NR_TO_INDEX(sb, bitnr)].word; } /* Helpers equivalent to the operations in asm/bitops.h and linux/bitmap.h */ static inline void sbitmap_set_bit(struct sbitmap *sb, unsigned int bitnr) { set_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); } static inline void sbitmap_clear_bit(struct sbitmap *sb, unsigned int bitnr) { clear_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); } static inline void sbitmap_clear_bit_unlock(struct sbitmap *sb, unsigned int bitnr) { clear_bit_unlock(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); } static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr) { return test_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr)); } unsigned int sbitmap_weight(const struct sbitmap *sb); /** * sbitmap_show() - Dump &struct sbitmap information to a &struct seq_file. * @sb: Bitmap to show. * @m: struct seq_file to write to. * * This is intended for debugging. The format may change at any time. */ void sbitmap_show(struct sbitmap *sb, struct seq_file *m); /** * sbitmap_bitmap_show() - Write a hex dump of a &struct sbitmap to a &struct * seq_file. * @sb: Bitmap to show. * @m: struct seq_file to write to. * * This is intended for debugging. The output isn't guaranteed to be internally * consistent. */ void sbitmap_bitmap_show(struct sbitmap *sb, struct seq_file *m); /** * sbitmap_queue_init_node() - Initialize a &struct sbitmap_queue on a specific * memory node. * @sbq: Bitmap queue to initialize. * @depth: See sbitmap_init_node(). * @shift: See sbitmap_init_node(). * @round_robin: See sbitmap_get(). * @flags: Allocation flags. * @node: Memory node to allocate on. * * Return: Zero on success or negative errno on failure. */ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth, int shift, bool round_robin, gfp_t flags, int node); /** * sbitmap_queue_free() - Free memory used by a &struct sbitmap_queue. * * @sbq: Bitmap queue to free. */ static inline void sbitmap_queue_free(struct sbitmap_queue *sbq) { kfree(sbq->ws); free_percpu(sbq->alloc_hint); sbitmap_free(&sbq->sb); } /** * sbitmap_queue_resize() - Resize a &struct sbitmap_queue. * @sbq: Bitmap queue to resize. * @depth: New number of bits to resize to. * * Like sbitmap_resize(), this doesn't reallocate anything. It has to do * some extra work on the &struct sbitmap_queue, so it's not safe to just * resize the underlying &struct sbitmap. */ void sbitmap_queue_resize(struct sbitmap_queue *sbq, unsigned int depth); /** * __sbitmap_queue_get() - Try to allocate a free bit from a &struct * sbitmap_queue with preemption already disabled. * @sbq: Bitmap queue to allocate from. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ int __sbitmap_queue_get(struct sbitmap_queue *sbq); /** * __sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct * sbitmap_queue, limiting the depth used from each word, with preemption * already disabled. * @sbq: Bitmap queue to allocate from. * @shallow_depth: The maximum number of bits to allocate from a single word. * See sbitmap_get_shallow(). * * If you call this, make sure to call sbitmap_queue_min_shallow_depth() after * initializing @sbq. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq, unsigned int shallow_depth); /** * sbitmap_queue_get() - Try to allocate a free bit from a &struct * sbitmap_queue. * @sbq: Bitmap queue to allocate from. * @cpu: Output parameter; will contain the CPU we ran on (e.g., to be passed to * sbitmap_queue_clear()). * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ static inline int sbitmap_queue_get(struct sbitmap_queue *sbq, unsigned int *cpu) { int nr; *cpu = get_cpu(); nr = __sbitmap_queue_get(sbq); put_cpu(); return nr; } /** * sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct * sbitmap_queue, limiting the depth used from each word. * @sbq: Bitmap queue to allocate from. * @cpu: Output parameter; will contain the CPU we ran on (e.g., to be passed to * sbitmap_queue_clear()). * @shallow_depth: The maximum number of bits to allocate from a single word. * See sbitmap_get_shallow(). * * If you call this, make sure to call sbitmap_queue_min_shallow_depth() after * initializing @sbq. * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ static inline int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq, unsigned int *cpu, unsigned int shallow_depth) { int nr; *cpu = get_cpu(); nr = __sbitmap_queue_get_shallow(sbq, shallow_depth); put_cpu(); return nr; } /** * sbitmap_queue_min_shallow_depth() - Inform a &struct sbitmap_queue of the * minimum shallow depth that will be used. * @sbq: Bitmap queue in question. * @min_shallow_depth: The minimum shallow depth that will be passed to * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow(). * * sbitmap_queue_clear() batches wakeups as an optimization. The batch size * depends on the depth of the bitmap. Since the shallow allocation functions * effectively operate with a different depth, the shallow depth must be taken * into account when calculating the batch size. This function must be called * with the minimum shallow depth that will be used. Failure to do so can result * in missed wakeups. */ void sbitmap_queue_min_shallow_depth(struct sbitmap_queue *sbq, unsigned int min_shallow_depth); /** * sbitmap_queue_clear() - Free an allocated bit and wake up waiters on a * &struct sbitmap_queue. * @sbq: Bitmap to free from. * @nr: Bit number to free. * @cpu: CPU the bit was allocated on. */ void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr, unsigned int cpu); static inline int sbq_index_inc(int index) { return (index + 1) & (SBQ_WAIT_QUEUES - 1); } static inline void sbq_index_atomic_inc(atomic_t *index) { int old = atomic_read(index); int new = sbq_index_inc(old); atomic_cmpxchg(index, old, new); } /** * sbq_wait_ptr() - Get the next wait queue to use for a &struct * sbitmap_queue. * @sbq: Bitmap queue to wait on. * @wait_index: A counter per "user" of @sbq. */ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq, atomic_t *wait_index) { struct sbq_wait_state *ws; ws = &sbq->ws[atomic_read(wait_index)]; sbq_index_atomic_inc(wait_index); return ws; } /** * sbitmap_queue_wake_all() - Wake up everything waiting on a &struct * sbitmap_queue. * @sbq: Bitmap queue to wake up. */ void sbitmap_queue_wake_all(struct sbitmap_queue *sbq); /** * sbitmap_queue_wake_up() - Wake up some of waiters in one waitqueue * on a &struct sbitmap_queue. * @sbq: Bitmap queue to wake up. */ void sbitmap_queue_wake_up(struct sbitmap_queue *sbq); /** * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct * seq_file. * @sbq: Bitmap queue to show. * @m: struct seq_file to write to. * * This is intended for debugging. The format may change at any time. */ void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m); #endif /* __LINUX_SCALE_BITMAP_H */
444 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 /* * (C) 2016 by Pablo Neira Ayuso <pablo@netfilter.org> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ #include <linux/module.h> #include <linux/spinlock.h> #include <linux/skbuff.h> #include <linux/ip.h> #include <net/route.h> #include <linux/netfilter.h> #include <net/netfilter/nf_log.h> static void nf_log_netdev_packet(struct net *net, u_int8_t pf, unsigned int hooknum, const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, const struct nf_loginfo *loginfo, const char *prefix) { nf_log_l2packet(net, pf, skb->protocol, hooknum, skb, in, out, loginfo, prefix); } static struct nf_logger nf_netdev_logger __read_mostly = { .name = "nf_log_netdev", .type = NF_LOG_TYPE_LOG, .logfn = nf_log_netdev_packet, .me = THIS_MODULE, }; static int __net_init nf_log_netdev_net_init(struct net *net) { return nf_log_set(net, NFPROTO_NETDEV, &nf_netdev_logger); } static void __net_exit nf_log_netdev_net_exit(struct net *net) { nf_log_unset(net, &nf_netdev_logger); } static struct pernet_operations nf_log_netdev_net_ops = { .init = nf_log_netdev_net_init, .exit = nf_log_netdev_net_exit, }; static int __init nf_log_netdev_init(void) { int ret; /* Request to load the real packet loggers. */ nf_logger_request_module(NFPROTO_IPV4, NF_LOG_TYPE_LOG); nf_logger_request_module(NFPROTO_IPV6, NF_LOG_TYPE_LOG); nf_logger_request_module(NFPROTO_ARP, NF_LOG_TYPE_LOG); ret = register_pernet_subsys(&nf_log_netdev_net_ops); if (ret < 0) return ret; nf_log_register(NFPROTO_NETDEV, &nf_netdev_logger); return 0; } static void __exit nf_log_netdev_exit(void) { unregister_pernet_subsys(&nf_log_netdev_net_ops); nf_log_unregister(&nf_netdev_logger); } module_init(nf_log_netdev_init); module_exit(nf_log_netdev_exit); MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>"); MODULE_DESCRIPTION("Netfilter netdev packet logging"); MODULE_LICENSE("GPL"); MODULE_ALIAS_NF_LOGGER(5, 0); /* NFPROTO_NETDEV */
13 1 7 10 7 7 2 3 3 7 7 7 7 23 3 3 3 10 10 1 9 1 2 9 2 2 2 2 2 2 2 2 13 15 14 10 7 7 3 2 3 3 4 2 6 4 4 4 8 10 5 5 4 4 5 5 1 5 1 7 7 7 2 7 5 7 53 11 10 10 11 54 15 1 7 1 2 27 12 15 27 5 14 12 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 /* Copyright (C) 2009 Red Hat, Inc. * Author: Michael S. Tsirkin <mst@redhat.com> * * This work is licensed under the terms of the GNU GPL, version 2. * * virtio-net server in host kernel. */ #include <linux/compat.h> #include <linux/eventfd.h> #include <linux/vhost.h> #include <linux/virtio_net.h> #include <linux/miscdevice.h> #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/mutex.h> #include <linux/workqueue.h> #include <linux/file.h> #include <linux/slab.h> #include <linux/sched/clock.h> #include <linux/sched/signal.h> #include <linux/vmalloc.h> #include <linux/net.h> #include <linux/if_packet.h> #include <linux/if_arp.h> #include <linux/if_tun.h> #include <linux/if_macvlan.h> #include <linux/if_tap.h> #include <linux/if_vlan.h> #include <linux/skb_array.h> #include <linux/skbuff.h> #include <net/sock.h> #include <net/xdp.h> #include "vhost.h" static int experimental_zcopytx = 0; module_param(experimental_zcopytx, int, 0444); MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" " 1 -Enable; 0 - Disable"); /* Max number of bytes transferred before requeueing the job. * Using this limit prevents one virtqueue from starving others. */ #define VHOST_NET_WEIGHT 0x80000 /* Max number of packets transferred before requeueing the job. * Using this limit prevents one virtqueue from starving others with small * pkts. */ #define VHOST_NET_PKT_WEIGHT 256 /* MAX number of TX used buffers for outstanding zerocopy */ #define VHOST_MAX_PEND 128 #define VHOST_GOODCOPY_LEN 256 /* * For transmit, used buffer len is unused; we override it to track buffer * status internally; used for zerocopy tx only. */ /* Lower device DMA failed */ #define VHOST_DMA_FAILED_LEN ((__force __virtio32)3) /* Lower device DMA done */ #define VHOST_DMA_DONE_LEN ((__force __virtio32)2) /* Lower device DMA in progress */ #define VHOST_DMA_IN_PROGRESS ((__force __virtio32)1) /* Buffer unused */ #define VHOST_DMA_CLEAR_LEN ((__force __virtio32)0) #define VHOST_DMA_IS_DONE(len) ((__force u32)(len) >= (__force u32)VHOST_DMA_DONE_LEN) enum { VHOST_NET_FEATURES = VHOST_FEATURES | (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) | (1ULL << VIRTIO_NET_F_MRG_RXBUF) | (1ULL << VIRTIO_F_IOMMU_PLATFORM) }; enum { VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) }; enum { VHOST_NET_VQ_RX = 0, VHOST_NET_VQ_TX = 1, VHOST_NET_VQ_MAX = 2, }; struct vhost_net_ubuf_ref { /* refcount follows semantics similar to kref: * 0: object is released * 1: no outstanding ubufs * >1: outstanding ubufs */ atomic_t refcount; wait_queue_head_t wait; struct vhost_virtqueue *vq; }; #define VHOST_NET_BATCH 64 struct vhost_net_buf { void **queue; int tail; int head; }; struct vhost_net_virtqueue { struct vhost_virtqueue vq; size_t vhost_hlen; size_t sock_hlen; /* vhost zerocopy support fields below: */ /* last used idx for outstanding DMA zerocopy buffers */ int upend_idx; /* For TX, first used idx for DMA done zerocopy buffers * For RX, number of batched heads */ int done_idx; /* an array of userspace buffers info */ struct ubuf_info *ubuf_info; /* Reference counting for outstanding ubufs. * Protected by vq mutex. Writers must also take device mutex. */ struct vhost_net_ubuf_ref *ubufs; struct ptr_ring *rx_ring; struct vhost_net_buf rxq; }; struct vhost_net { struct vhost_dev dev; struct vhost_net_virtqueue vqs[VHOST_NET_VQ_MAX]; struct vhost_poll poll[VHOST_NET_VQ_MAX]; /* Number of TX recently submitted. * Protected by tx vq lock. */ unsigned tx_packets; /* Number of times zerocopy TX recently failed. * Protected by tx vq lock. */ unsigned tx_zcopy_err; /* Flush in progress. Protected by tx vq lock. */ bool tx_flush; }; static unsigned vhost_net_zcopy_mask __read_mostly; static void *vhost_net_buf_get_ptr(struct vhost_net_buf *rxq) { if (rxq->tail != rxq->head) return rxq->queue[rxq->head]; else return NULL; } static int vhost_net_buf_get_size(struct vhost_net_buf *rxq) { return rxq->tail - rxq->head; } static int vhost_net_buf_is_empty(struct vhost_net_buf *rxq) { return rxq->tail == rxq->head; } static void *vhost_net_buf_consume(struct vhost_net_buf *rxq) { void *ret = vhost_net_buf_get_ptr(rxq); ++rxq->head; return ret; } static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq) { struct vhost_net_buf *rxq = &nvq->rxq; rxq->head = 0; rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue, VHOST_NET_BATCH); return rxq->tail; } static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq) { struct vhost_net_buf *rxq = &nvq->rxq; if (nvq->rx_ring && !vhost_net_buf_is_empty(rxq)) { ptr_ring_unconsume(nvq->rx_ring, rxq->queue + rxq->head, vhost_net_buf_get_size(rxq), tun_ptr_free); rxq->head = rxq->tail = 0; } } static int vhost_net_buf_peek_len(void *ptr) { if (tun_is_xdp_frame(ptr)) { struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr); return xdpf->len; } return __skb_array_len_with_tag(ptr); } static int vhost_net_buf_peek(struct vhost_net_virtqueue *nvq) { struct vhost_net_buf *rxq = &nvq->rxq; if (!vhost_net_buf_is_empty(rxq)) goto out; if (!vhost_net_buf_produce(nvq)) return 0; out: return vhost_net_buf_peek_len(vhost_net_buf_get_ptr(rxq)); } static void vhost_net_buf_init(struct vhost_net_buf *rxq) { rxq->head = rxq->tail = 0; } static void vhost_net_enable_zcopy(int vq) { vhost_net_zcopy_mask |= 0x1 << vq; } static struct vhost_net_ubuf_ref * vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy) { struct vhost_net_ubuf_ref *ubufs; /* No zero copy backend? Nothing to count. */ if (!zcopy) return NULL; ubufs = kmalloc(sizeof(*ubufs), GFP_KERNEL); if (!ubufs) return ERR_PTR(-ENOMEM); atomic_set(&ubufs->refcount, 1); init_waitqueue_head(&ubufs->wait); ubufs->vq = vq; return ubufs; } static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs) { int r = atomic_sub_return(1, &ubufs->refcount); if (unlikely(!r)) wake_up(&ubufs->wait); return r; } static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs) { vhost_net_ubuf_put(ubufs); wait_event(ubufs->wait, !atomic_read(&ubufs->refcount)); } static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs) { vhost_net_ubuf_put_and_wait(ubufs); kfree(ubufs); } static void vhost_net_clear_ubuf_info(struct vhost_net *n) { int i; for (i = 0; i < VHOST_NET_VQ_MAX; ++i) { kfree(n->vqs[i].ubuf_info); n->vqs[i].ubuf_info = NULL; } } static int vhost_net_set_ubuf_info(struct vhost_net *n) { bool zcopy; int i; for (i = 0; i < VHOST_NET_VQ_MAX; ++i) { zcopy = vhost_net_zcopy_mask & (0x1 << i); if (!zcopy) continue; n->vqs[i].ubuf_info = kmalloc_array(UIO_MAXIOV, sizeof(*n->vqs[i].ubuf_info), GFP_KERNEL); if (!n->vqs[i].ubuf_info) goto err; } return 0; err: vhost_net_clear_ubuf_info(n); return -ENOMEM; } static void vhost_net_vq_reset(struct vhost_net *n) { int i; vhost_net_clear_ubuf_info(n); for (i = 0; i < VHOST_NET_VQ_MAX; i++) { n->vqs[i].done_idx = 0; n->vqs[i].upend_idx = 0; n->vqs[i].ubufs = NULL; n->vqs[i].vhost_hlen = 0; n->vqs[i].sock_hlen = 0; vhost_net_buf_init(&n->vqs[i].rxq); } } static void vhost_net_tx_packet(struct vhost_net *net) { ++net->tx_packets; if (net->tx_packets < 1024) return; net->tx_packets = 0; net->tx_zcopy_err = 0; } static void vhost_net_tx_err(struct vhost_net *net) { ++net->tx_zcopy_err; } static bool vhost_net_tx_select_zcopy(struct vhost_net *net) { /* TX flush waits for outstanding DMAs to be done. * Don't start new DMAs. */ return !net->tx_flush && net->tx_packets / 64 >= net->tx_zcopy_err; } static bool vhost_sock_zcopy(struct socket *sock) { return unlikely(experimental_zcopytx) && sock_flag(sock->sk, SOCK_ZEROCOPY); } /* In case of DMA done not in order in lower device driver for some reason. * upend_idx is used to track end of used idx, done_idx is used to track head * of used idx. Once lower device DMA done contiguously, we will signal KVM * guest used idx. */ static void vhost_zerocopy_signal_used(struct vhost_net *net, struct vhost_virtqueue *vq) { struct vhost_net_virtqueue *nvq = container_of(vq, struct vhost_net_virtqueue, vq); int i, add; int j = 0; for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) { if (vq->heads[i].len == VHOST_DMA_FAILED_LEN) vhost_net_tx_err(net); if (VHOST_DMA_IS_DONE(vq->heads[i].len)) { vq->heads[i].len = VHOST_DMA_CLEAR_LEN; ++j; } else break; } while (j) { add = min(UIO_MAXIOV - nvq->done_idx, j); vhost_add_used_and_signal_n(vq->dev, vq, &vq->heads[nvq->done_idx], add); nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV; j -= add; } } static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success) { struct vhost_net_ubuf_ref *ubufs = ubuf->ctx; struct vhost_virtqueue *vq = ubufs->vq; int cnt; rcu_read_lock_bh(); /* set len to mark this desc buffers done DMA */ vq->heads[ubuf->desc].len = success ? VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN; cnt = vhost_net_ubuf_put(ubufs); /* * Trigger polling thread if guest stopped submitting new buffers: * in this case, the refcount after decrement will eventually reach 1. * We also trigger polling periodically after each 16 packets * (the value 16 here is more or less arbitrary, it's tuned to trigger * less than 10% of times). */ if (cnt <= 1 || !(cnt % 16)) vhost_poll_queue(&vq->poll); rcu_read_unlock_bh(); } static inline unsigned long busy_clock(void) { return local_clock() >> 10; } static bool vhost_can_busy_poll(unsigned long endtime) { return likely(!need_resched() && !time_after(busy_clock(), endtime) && !signal_pending(current)); } static void vhost_net_disable_vq(struct vhost_net *n, struct vhost_virtqueue *vq) { struct vhost_net_virtqueue *nvq = container_of(vq, struct vhost_net_virtqueue, vq); struct vhost_poll *poll = n->poll + (nvq - n->vqs); if (!vq->private_data) return; vhost_poll_stop(poll); } static int vhost_net_enable_vq(struct vhost_net *n, struct vhost_virtqueue *vq) { struct vhost_net_virtqueue *nvq = container_of(vq, struct vhost_net_virtqueue, vq); struct vhost_poll *poll = n->poll + (nvq - n->vqs); struct socket *sock; sock = vq->private_data; if (!sock) return 0; return vhost_poll_start(poll, sock->file); } static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq) { struct vhost_virtqueue *vq = &nvq->vq; struct vhost_dev *dev = vq->dev; if (!nvq->done_idx) return; vhost_add_used_and_signal_n(dev, vq, vq->heads, nvq->done_idx); nvq->done_idx = 0; } static int vhost_net_tx_get_vq_desc(struct vhost_net *net, struct vhost_net_virtqueue *nvq, unsigned int *out_num, unsigned int *in_num, bool *busyloop_intr) { struct vhost_virtqueue *vq = &nvq->vq; unsigned long uninitialized_var(endtime); int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), out_num, in_num, NULL, NULL); if (r == vq->num && vq->busyloop_timeout) { if (!vhost_sock_zcopy(vq->private_data)) vhost_net_signal_used(nvq); preempt_disable(); endtime = busy_clock() + vq->busyloop_timeout; while (vhost_can_busy_poll(endtime)) { if (vhost_has_work(vq->dev)) { *busyloop_intr = true; break; } if (!vhost_vq_avail_empty(vq->dev, vq)) break; cpu_relax(); } preempt_enable(); r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), out_num, in_num, NULL, NULL); } return r; } static bool vhost_exceeds_maxpend(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV > min_t(unsigned int, VHOST_MAX_PEND, vq->num >> 2); } static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter, size_t hdr_size, int out) { /* Skip header. TODO: support TSO. */ size_t len = iov_length(vq->iov, out); iov_iter_init(iter, WRITE, vq->iov, out, len); iov_iter_advance(iter, hdr_size); return iov_iter_count(iter); } static int get_tx_bufs(struct vhost_net *net, struct vhost_net_virtqueue *nvq, struct msghdr *msg, unsigned int *out, unsigned int *in, size_t *len, bool *busyloop_intr) { struct vhost_virtqueue *vq = &nvq->vq; int ret; ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, busyloop_intr); if (ret < 0 || ret == vq->num) return ret; if (*in) { vq_err(vq, "Unexpected descriptor format for TX: out %d, int %d\n", *out, *in); return -EFAULT; } /* Sanity check */ *len = init_iov_iter(vq, &msg->msg_iter, nvq->vhost_hlen, *out); if (*len == 0) { vq_err(vq, "Unexpected header len for TX: %zd expected %zd\n", *len, nvq->vhost_hlen); return -EFAULT; } return ret; } static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len) { return total_len < VHOST_NET_WEIGHT && !vhost_vq_avail_empty(vq->dev, vq); } static void handle_tx_copy(struct vhost_net *net, struct socket *sock) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; unsigned out, in; int head; struct msghdr msg = { .msg_name = NULL, .msg_namelen = 0, .msg_control = NULL, .msg_controllen = 0, .msg_flags = MSG_DONTWAIT, }; size_t len, total_len = 0; int err; int sent_pkts = 0; do { bool busyloop_intr = false; head = get_tx_bufs(net, nvq, &msg, &out, &in, &len, &busyloop_intr); /* On error, stop handling until the next kick. */ if (unlikely(head < 0)) break; /* Nothing new? Wait for eventfd to tell us they refilled. */ if (head == vq->num) { if (unlikely(busyloop_intr)) { vhost_poll_queue(&vq->poll); } else if (unlikely(vhost_enable_notify(&net->dev, vq))) { vhost_disable_notify(&net->dev, vq); continue; } break; } vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head); vq->heads[nvq->done_idx].len = 0; total_len += len; if (tx_can_batch(vq, total_len)) msg.msg_flags |= MSG_MORE; else msg.msg_flags &= ~MSG_MORE; /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock->ops->sendmsg(sock, &msg, len); if (unlikely(err < 0)) { vhost_discard_vq_desc(vq, 1); vhost_net_enable_vq(net, vq); break; } if (err != len) pr_debug("Truncated TX packet: len %d != %zd\n", err, len); if (++nvq->done_idx >= VHOST_NET_BATCH) vhost_net_signal_used(nvq); } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len))); vhost_net_signal_used(nvq); } static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; unsigned out, in; int head; struct msghdr msg = { .msg_name = NULL, .msg_namelen = 0, .msg_control = NULL, .msg_controllen = 0, .msg_flags = MSG_DONTWAIT, }; size_t len, total_len = 0; int err; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); struct ubuf_info *ubuf; bool zcopy_used; int sent_pkts = 0; do { bool busyloop_intr; /* Release DMAs done buffers first */ vhost_zerocopy_signal_used(net, vq); busyloop_intr = false; head = get_tx_bufs(net, nvq, &msg, &out, &in, &len, &busyloop_intr); /* On error, stop handling until the next kick. */ if (unlikely(head < 0)) break; /* Nothing new? Wait for eventfd to tell us they refilled. */ if (head == vq->num) { if (unlikely(busyloop_intr)) { vhost_poll_queue(&vq->poll); } else if (unlikely(vhost_enable_notify(&net->dev, vq))) { vhost_disable_notify(&net->dev, vq); continue; } break; } zcopy_used = len >= VHOST_GOODCOPY_LEN && !vhost_exceeds_maxpend(net) && vhost_net_tx_select_zcopy(net); /* use msg_control to pass vhost zerocopy ubuf info to skb */ if (zcopy_used) { ubuf = nvq->ubuf_info + nvq->upend_idx; vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head); vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS; ubuf->callback = vhost_zerocopy_callback; ubuf->ctx = nvq->ubufs; ubuf->desc = nvq->upend_idx; refcount_set(&ubuf->refcnt, 1); msg.msg_control = ubuf; msg.msg_controllen = sizeof(ubuf); ubufs = nvq->ubufs; atomic_inc(&ubufs->refcount); nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV; } else { msg.msg_control = NULL; ubufs = NULL; } total_len += len; if (tx_can_batch(vq, total_len) && likely(!vhost_exceeds_maxpend(net))) { msg.msg_flags |= MSG_MORE; } else { msg.msg_flags &= ~MSG_MORE; } /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock->ops->sendmsg(sock, &msg, len); if (unlikely(err < 0)) { if (zcopy_used) { if (vq->heads[ubuf->desc].len == VHOST_DMA_IN_PROGRESS) vhost_net_ubuf_put(ubufs); nvq->upend_idx = ((unsigned)nvq->upend_idx - 1) % UIO_MAXIOV; } vhost_discard_vq_desc(vq, 1); vhost_net_enable_vq(net, vq); break; } if (err != len) pr_debug("Truncated TX packet: " " len %d != %zd\n", err, len); if (!zcopy_used) vhost_add_used_and_signal(&net->dev, vq, head, 0); else vhost_zerocopy_signal_used(net, vq); vhost_net_tx_packet(net); } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len))); } /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_tx(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; struct socket *sock; mutex_lock(&vq->mutex); sock = vq->private_data; if (!sock) goto out; if (!vq_iotlb_prefetch(vq)) goto out; vhost_disable_notify(&net->dev, vq); vhost_net_disable_vq(net, vq); if (vhost_sock_zcopy(sock)) handle_tx_zerocopy(net, sock); else handle_tx_copy(net, sock); out: mutex_unlock(&vq->mutex); } static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk) { struct sk_buff *head; int len = 0; unsigned long flags; if (rvq->rx_ring) return vhost_net_buf_peek(rvq); spin_lock_irqsave(&sk->sk_receive_queue.lock, flags); head = skb_peek(&sk->sk_receive_queue); if (likely(head)) { len = head->len; if (skb_vlan_tag_present(head)) len += VLAN_HLEN; } spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags); return len; } static int sk_has_rx_data(struct sock *sk) { struct socket *sock = sk->sk_socket; if (sock->ops->peek_len) return sock->ops->peek_len(sock); return skb_queue_empty(&sk->sk_receive_queue); } static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk, bool *busyloop_intr) { struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX]; struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *rvq = &rnvq->vq; struct vhost_virtqueue *tvq = &tnvq->vq; unsigned long uninitialized_var(endtime); int len = peek_head_len(rnvq, sk); if (!len && tvq->busyloop_timeout) { /* Flush batched heads first */ vhost_net_signal_used(rnvq); /* Both tx vq and rx socket were polled here */ mutex_lock_nested(&tvq->mutex, 1); vhost_disable_notify(&net->dev, tvq); preempt_disable(); endtime = busy_clock() + tvq->busyloop_timeout; while (vhost_can_busy_poll(endtime)) { if (vhost_has_work(&net->dev)) { *busyloop_intr = true; break; } if ((sk_has_rx_data(sk) && !vhost_vq_avail_empty(&net->dev, rvq)) || !vhost_vq_avail_empty(&net->dev, tvq)) break; cpu_relax(); } preempt_enable(); if (!vhost_vq_avail_empty(&net->dev, tvq)) { vhost_poll_queue(&tvq->poll); } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) { vhost_disable_notify(&net->dev, tvq); vhost_poll_queue(&tvq->poll); } mutex_unlock(&tvq->mutex); len = peek_head_len(rnvq, sk); } return len; } /* This is a multi-buffer version of vhost_get_desc, that works if * vq has read descriptors only. * @vq - the relevant virtqueue * @datalen - data length we'll be reading * @iovcount - returned count of io vectors we fill * @log - vhost log * @log_num - log offset * @quota - headcount quota, 1 for big buffer * returns number of buffer heads allocated, negative on error */ static int get_rx_bufs(struct vhost_virtqueue *vq, struct vring_used_elem *heads, int datalen, unsigned *iovcount, struct vhost_log *log, unsigned *log_num, unsigned int quota) { unsigned int out, in; int seg = 0; int headcount = 0; unsigned d; int r, nlogs = 0; /* len is always initialized before use since we are always called with * datalen > 0. */ u32 uninitialized_var(len); while (datalen > 0 && headcount < quota) { if (unlikely(seg >= UIO_MAXIOV)) { r = -ENOBUFS; goto err; } r = vhost_get_vq_desc(vq, vq->iov + seg, ARRAY_SIZE(vq->iov) - seg, &out, &in, log, log_num); if (unlikely(r < 0)) goto err; d = r; if (d == vq->num) { r = 0; goto err; } if (unlikely(out || in <= 0)) { vq_err(vq, "unexpected descriptor format for RX: " "out %d, in %d\n", out, in); r = -EINVAL; goto err; } if (unlikely(log)) { nlogs += *log_num; log += *log_num; } heads[headcount].id = cpu_to_vhost32(vq, d); len = iov_length(vq->iov + seg, in); heads[headcount].len = cpu_to_vhost32(vq, len); datalen -= len; ++headcount; seg += in; } heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen); *iovcount = seg; if (unlikely(log)) *log_num = nlogs; /* Detect overrun */ if (unlikely(datalen > 0)) { r = UIO_MAXIOV + 1; goto err; } return headcount; err: vhost_discard_vq_desc(vq, headcount); return r; } /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_rx(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_RX]; struct vhost_virtqueue *vq = &nvq->vq; unsigned uninitialized_var(in), log; struct vhost_log *vq_log; struct msghdr msg = { .msg_name = NULL, .msg_namelen = 0, .msg_control = NULL, /* FIXME: get and handle RX aux data. */ .msg_controllen = 0, .msg_flags = MSG_DONTWAIT, }; struct virtio_net_hdr hdr = { .flags = 0, .gso_type = VIRTIO_NET_HDR_GSO_NONE }; size_t total_len = 0; int err, mergeable; s16 headcount; size_t vhost_hlen, sock_hlen; size_t vhost_len, sock_len; bool busyloop_intr = false; struct socket *sock; struct iov_iter fixup; __virtio16 num_buffers; int recv_pkts = 0; mutex_lock_nested(&vq->mutex, 0); sock = vq->private_data; if (!sock) goto out; if (!vq_iotlb_prefetch(vq)) goto out; vhost_disable_notify(&net->dev, vq); vhost_net_disable_vq(net, vq); vhost_hlen = nvq->vhost_hlen; sock_hlen = nvq->sock_hlen; vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ? vq->log : NULL; mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF); do { sock_len = vhost_net_rx_peek_head_len(net, sock->sk, &busyloop_intr); if (!sock_len) break; sock_len += sock_hlen; vhost_len = sock_len + vhost_hlen; headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx, vhost_len, &in, vq_log, &log, likely(mergeable) ? UIO_MAXIOV : 1); /* On error, stop handling until the next kick. */ if (unlikely(headcount < 0)) goto out; /* OK, now we need to know about added descriptors. */ if (!headcount) { if (unlikely(busyloop_intr)) { vhost_poll_queue(&vq->poll); } else if (unlikely(vhost_enable_notify(&net->dev, vq))) { /* They have slipped one in as we were * doing that: check again. */ vhost_disable_notify(&net->dev, vq); continue; } /* Nothing new? Wait for eventfd to tell us * they refilled. */ goto out; } busyloop_intr = false; if (nvq->rx_ring) msg.msg_control = vhost_net_buf_consume(&nvq->rxq); /* On overrun, truncate and discard */ if (unlikely(headcount > UIO_MAXIOV)) { iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); err = sock->ops->recvmsg(sock, &msg, 1, MSG_DONTWAIT | MSG_TRUNC); pr_debug("Discarded rx packet: len %zd\n", sock_len); continue; } /* We don't need to be notified again. */ iov_iter_init(&msg.msg_iter, READ, vq->iov, in, vhost_len); fixup = msg.msg_iter; if (unlikely((vhost_hlen))) { /* We will supply the header ourselves * TODO: support TSO. */ iov_iter_advance(&msg.msg_iter, vhost_hlen); } err = sock->ops->recvmsg(sock, &msg, sock_len, MSG_DONTWAIT | MSG_TRUNC); /* Userspace might have consumed the packet meanwhile: * it's not supposed to do this usually, but might be hard * to prevent. Discard data we got (if any) and keep going. */ if (unlikely(err != sock_len)) { pr_debug("Discarded rx packet: " " len %d, expected %zd\n", err, sock_len); vhost_discard_vq_desc(vq, headcount); continue; } /* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */ if (unlikely(vhost_hlen)) { if (copy_to_iter(&hdr, sizeof(hdr), &fixup) != sizeof(hdr)) { vq_err(vq, "Unable to write vnet_hdr " "at addr %p\n", vq->iov->iov_base); goto out; } } else { /* Header came from socket; we'll need to patch * ->num_buffers over if VIRTIO_NET_F_MRG_RXBUF */ iov_iter_advance(&fixup, sizeof(hdr)); } /* TODO: Should check and handle checksum. */ num_buffers = cpu_to_vhost16(vq, headcount); if (likely(mergeable) && copy_to_iter(&num_buffers, sizeof num_buffers, &fixup) != sizeof num_buffers) { vq_err(vq, "Failed num_buffers write"); vhost_discard_vq_desc(vq, headcount); goto out; } nvq->done_idx += headcount; if (nvq->done_idx > VHOST_NET_BATCH) vhost_net_signal_used(nvq); if (unlikely(vq_log)) vhost_log_write(vq, vq_log, log, vhost_len, vq->iov, in); total_len += vhost_len; } while (likely(!vhost_exceeds_weight(vq, ++recv_pkts, total_len))); if (unlikely(busyloop_intr)) vhost_poll_queue(&vq->poll); else if (!sock_len) vhost_net_enable_vq(net, vq); out: vhost_net_signal_used(nvq); mutex_unlock(&vq->mutex); } static void handle_tx_kick(struct vhost_work *work) { struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue, poll.work); struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev); handle_tx(net); } static void handle_rx_kick(struct vhost_work *work) { struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue, poll.work); struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev); handle_rx(net); } static void handle_tx_net(struct vhost_work *work) { struct vhost_net *net = container_of(work, struct vhost_net, poll[VHOST_NET_VQ_TX].work); handle_tx(net); } static void handle_rx_net(struct vhost_work *work) { struct vhost_net *net = container_of(work, struct vhost_net, poll[VHOST_NET_VQ_RX].work); handle_rx(net); } static int vhost_net_open(struct inode *inode, struct file *f) { struct vhost_net *n; struct vhost_dev *dev; struct vhost_virtqueue **vqs; void **queue; int i; n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL); if (!n) return -ENOMEM; vqs = kmalloc_array(VHOST_NET_VQ_MAX, sizeof(*vqs), GFP_KERNEL); if (!vqs) { kvfree(n); return -ENOMEM; } queue = kmalloc_array(VHOST_NET_BATCH, sizeof(void *), GFP_KERNEL); if (!queue) { kfree(vqs); kvfree(n); return -ENOMEM; } n->vqs[VHOST_NET_VQ_RX].rxq.queue = queue; dev = &n->dev; vqs[VHOST_NET_VQ_TX] = &n->vqs[VHOST_NET_VQ_TX].vq; vqs[VHOST_NET_VQ_RX] = &n->vqs[VHOST_NET_VQ_RX].vq; n->vqs[VHOST_NET_VQ_TX].vq.handle_kick = handle_tx_kick; n->vqs[VHOST_NET_VQ_RX].vq.handle_kick = handle_rx_kick; for (i = 0; i < VHOST_NET_VQ_MAX; i++) { n->vqs[i].ubufs = NULL; n->vqs[i].ubuf_info = NULL; n->vqs[i].upend_idx = 0; n->vqs[i].done_idx = 0; n->vqs[i].vhost_hlen = 0; n->vqs[i].sock_hlen = 0; n->vqs[i].rx_ring = NULL; vhost_net_buf_init(&n->vqs[i].rxq); } vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, UIO_MAXIOV + VHOST_NET_BATCH, VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT); vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev); vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev); f->private_data = n; return 0; } static struct socket *vhost_net_stop_vq(struct vhost_net *n, struct vhost_virtqueue *vq) { struct socket *sock; struct vhost_net_virtqueue *nvq = container_of(vq, struct vhost_net_virtqueue, vq); mutex_lock(&vq->mutex); sock = vq->private_data; vhost_net_disable_vq(n, vq); vq->private_data = NULL; vhost_net_buf_unproduce(nvq); nvq->rx_ring = NULL; mutex_unlock(&vq->mutex); return sock; } static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock, struct socket **rx_sock) { *tx_sock = vhost_net_stop_vq(n, &n->vqs[VHOST_NET_VQ_TX].vq); *rx_sock = vhost_net_stop_vq(n, &n->vqs[VHOST_NET_VQ_RX].vq); } static void vhost_net_flush_vq(struct vhost_net *n, int index) { vhost_poll_flush(n->poll + index); vhost_poll_flush(&n->vqs[index].vq.poll); } static void vhost_net_flush(struct vhost_net *n) { vhost_net_flush_vq(n, VHOST_NET_VQ_TX); vhost_net_flush_vq(n, VHOST_NET_VQ_RX); if (n->vqs[VHOST_NET_VQ_TX].ubufs) { mutex_lock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex); n->tx_flush = true; mutex_unlock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex); /* Wait for all lower device DMAs done. */ vhost_net_ubuf_put_and_wait(n->vqs[VHOST_NET_VQ_TX].ubufs); mutex_lock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex); n->tx_flush = false; atomic_set(&n->vqs[VHOST_NET_VQ_TX].ubufs->refcount, 1); mutex_unlock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex); } } static int vhost_net_release(struct inode *inode, struct file *f) { struct vhost_net *n = f->private_data; struct socket *tx_sock; struct socket *rx_sock; vhost_net_stop(n, &tx_sock, &rx_sock); vhost_net_flush(n); vhost_dev_stop(&n->dev); vhost_dev_cleanup(&n->dev); vhost_net_vq_reset(n); if (tx_sock) sockfd_put(tx_sock); if (rx_sock) sockfd_put(rx_sock); /* Make sure no callbacks are outstanding */ synchronize_rcu_bh(); /* We do an extra flush before freeing memory, * since jobs can re-queue themselves. */ vhost_net_flush(n); kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue); kfree(n->dev.vqs); kvfree(n); return 0; } static struct socket *get_raw_socket(int fd) { int r; struct socket *sock = sockfd_lookup(fd, &r); if (!sock) return ERR_PTR(-ENOTSOCK); /* Parameter checking */ if (sock->sk->sk_type != SOCK_RAW) { r = -ESOCKTNOSUPPORT; goto err; } if (sock->sk->sk_family != AF_PACKET) { r = -EPFNOSUPPORT; goto err; } return sock; err: sockfd_put(sock); return ERR_PTR(r); } static struct ptr_ring *get_tap_ptr_ring(struct file *file) { struct ptr_ring *ring; ring = tun_get_tx_ring(file); if (!IS_ERR(ring)) goto out; ring = tap_get_ptr_ring(file); if (!IS_ERR(ring)) goto out; ring = NULL; out: return ring; } static struct socket *get_tap_socket(int fd) { struct file *file = fget(fd); struct socket *sock; if (!file) return ERR_PTR(-EBADF); sock = tun_get_socket(file); if (!IS_ERR(sock)) return sock; sock = tap_get_socket(file); if (IS_ERR(sock)) fput(file); return sock; } static struct socket *get_socket(int fd) { struct socket *sock; /* special case to disable backend */ if (fd == -1) return NULL; sock = get_raw_socket(fd); if (!IS_ERR(sock)) return sock; sock = get_tap_socket(fd); if (!IS_ERR(sock)) return sock; return ERR_PTR(-ENOTSOCK); } static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) { struct socket *sock, *oldsock; struct vhost_virtqueue *vq; struct vhost_net_virtqueue *nvq; struct vhost_net_ubuf_ref *ubufs, *oldubufs = NULL; int r; mutex_lock(&n->dev.mutex); r = vhost_dev_check_owner(&n->dev); if (r) goto err; if (index >= VHOST_NET_VQ_MAX) { r = -ENOBUFS; goto err; } vq = &n->vqs[index].vq; nvq = &n->vqs[index]; mutex_lock(&vq->mutex); /* Verify that ring has been setup correctly. */ if (!vhost_vq_access_ok(vq)) { r = -EFAULT; goto err_vq; } sock = get_socket(fd); if (IS_ERR(sock)) { r = PTR_ERR(sock); goto err_vq; } /* start polling new socket */ oldsock = vq->private_data; if (sock != oldsock) { ubufs = vhost_net_ubuf_alloc(vq, sock && vhost_sock_zcopy(sock)); if (IS_ERR(ubufs)) { r = PTR_ERR(ubufs); goto err_ubufs; } vhost_net_disable_vq(n, vq); vq->private_data = sock; vhost_net_buf_unproduce(nvq); r = vhost_vq_init_access(vq); if (r) goto err_used; r = vhost_net_enable_vq(n, vq); if (r) goto err_used; if (index == VHOST_NET_VQ_RX) { if (sock) nvq->rx_ring = get_tap_ptr_ring(sock->file); else nvq->rx_ring = NULL; } oldubufs = nvq->ubufs; nvq->ubufs = ubufs; n->tx_packets = 0; n->tx_zcopy_err = 0; n->tx_flush = false; } mutex_unlock(&vq->mutex); if (oldubufs) { vhost_net_ubuf_put_wait_and_free(oldubufs); mutex_lock(&vq->mutex); vhost_zerocopy_signal_used(n, vq); mutex_unlock(&vq->mutex); } if (oldsock) { vhost_net_flush_vq(n, index); sockfd_put(oldsock); } mutex_unlock(&n->dev.mutex); return 0; err_used: vq->private_data = oldsock; vhost_net_enable_vq(n, vq); if (ubufs) vhost_net_ubuf_put_wait_and_free(ubufs); err_ubufs: if (sock) sockfd_put(sock); err_vq: mutex_unlock(&vq->mutex); err: mutex_unlock(&n->dev.mutex); return r; } static long vhost_net_reset_owner(struct vhost_net *n) { struct socket *tx_sock = NULL; struct socket *rx_sock = NULL; long err; struct vhost_umem *umem; mutex_lock(&n->dev.mutex); err = vhost_dev_check_owner(&n->dev); if (err) goto done; umem = vhost_dev_reset_owner_prepare(); if (!umem) { err = -ENOMEM; goto done; } vhost_net_stop(n, &tx_sock, &rx_sock); vhost_net_flush(n); vhost_dev_stop(&n->dev); vhost_dev_reset_owner(&n->dev, umem); vhost_net_vq_reset(n); done: mutex_unlock(&n->dev.mutex); if (tx_sock) sockfd_put(tx_sock); if (rx_sock) sockfd_put(rx_sock); return err; } static int vhost_net_set_backend_features(struct vhost_net *n, u64 features) { int i; mutex_lock(&n->dev.mutex); for (i = 0; i < VHOST_NET_VQ_MAX; ++i) { mutex_lock(&n->vqs[i].vq.mutex); n->vqs[i].vq.acked_backend_features = features; mutex_unlock(&n->vqs[i].vq.mutex); } mutex_unlock(&n->dev.mutex); return 0; } static int vhost_net_set_features(struct vhost_net *n, u64 features) { size_t vhost_hlen, sock_hlen, hdr_len; int i; hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | (1ULL << VIRTIO_F_VERSION_1))) ? sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr); if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) { /* vhost provides vnet_hdr */ vhost_hlen = hdr_len; sock_hlen = 0; } else { /* socket provides vnet_hdr */ vhost_hlen = 0; sock_hlen = hdr_len; } mutex_lock(&n->dev.mutex); if ((features & (1 << VHOST_F_LOG_ALL)) && !vhost_log_access_ok(&n->dev)) goto out_unlock; if ((features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) { if (vhost_init_device_iotlb(&n->dev, true)) goto out_unlock; } for (i = 0; i < VHOST_NET_VQ_MAX; ++i) { mutex_lock(&n->vqs[i].vq.mutex); n->vqs[i].vq.acked_features = features; n->vqs[i].vhost_hlen = vhost_hlen; n->vqs[i].sock_hlen = sock_hlen; mutex_unlock(&n->vqs[i].vq.mutex); } mutex_unlock(&n->dev.mutex); return 0; out_unlock: mutex_unlock(&n->dev.mutex); return -EFAULT; } static long vhost_net_set_owner(struct vhost_net *n) { int r; mutex_lock(&n->dev.mutex); if (vhost_dev_has_owner(&n->dev)) { r = -EBUSY; goto out; } r = vhost_net_set_ubuf_info(n); if (r) goto out; r = vhost_dev_set_owner(&n->dev); if (r) vhost_net_clear_ubuf_info(n); vhost_net_flush(n); out: mutex_unlock(&n->dev.mutex); return r; } static long vhost_net_ioctl(struct file *f, unsigned int ioctl, unsigned long arg) { struct vhost_net *n = f->private_data; void __user *argp = (void __user *)arg; u64 __user *featurep = argp; struct vhost_vring_file backend; u64 features; int r; switch (ioctl) { case VHOST_NET_SET_BACKEND: if (copy_from_user(&backend, argp, sizeof backend)) return -EFAULT; return vhost_net_set_backend(n, backend.index, backend.fd); case VHOST_GET_FEATURES: features = VHOST_NET_FEATURES; if (copy_to_user(featurep, &features, sizeof features)) return -EFAULT; return 0; case VHOST_SET_FEATURES: if (copy_from_user(&features, featurep, sizeof features)) return -EFAULT; if (features & ~VHOST_NET_FEATURES) return -EOPNOTSUPP; return vhost_net_set_features(n, features); case VHOST_GET_BACKEND_FEATURES: features = VHOST_NET_BACKEND_FEATURES; if (copy_to_user(featurep, &features, sizeof(features))) return -EFAULT; return 0; case VHOST_SET_BACKEND_FEATURES: if (copy_from_user(&features, featurep, sizeof(features))) return -EFAULT; if (features & ~VHOST_NET_BACKEND_FEATURES) return -EOPNOTSUPP; return vhost_net_set_backend_features(n, features); case VHOST_RESET_OWNER: return vhost_net_reset_owner(n); case VHOST_SET_OWNER: return vhost_net_set_owner(n); default: mutex_lock(&n->dev.mutex); r = vhost_dev_ioctl(&n->dev, ioctl, argp); if (r == -ENOIOCTLCMD) r = vhost_vring_ioctl(&n->dev, ioctl, argp); else vhost_net_flush(n); mutex_unlock(&n->dev.mutex); return r; } } #ifdef CONFIG_COMPAT static long vhost_net_compat_ioctl(struct file *f, unsigned int ioctl, unsigned long arg) { return vhost_net_ioctl(f, ioctl, (unsigned long)compat_ptr(arg)); } #endif static ssize_t vhost_net_chr_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct file *file = iocb->ki_filp; struct vhost_net *n = file->private_data; struct vhost_dev *dev = &n->dev; int noblock = file->f_flags & O_NONBLOCK; return vhost_chr_read_iter(dev, to, noblock); } static ssize_t vhost_net_chr_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct vhost_net *n = file->private_data; struct vhost_dev *dev = &n->dev; return vhost_chr_write_iter(dev, from); } static __poll_t vhost_net_chr_poll(struct file *file, poll_table *wait) { struct vhost_net *n = file->private_data; struct vhost_dev *dev = &n->dev; return vhost_chr_poll(file, dev, wait); } static const struct file_operations vhost_net_fops = { .owner = THIS_MODULE, .release = vhost_net_release, .read_iter = vhost_net_chr_read_iter, .write_iter = vhost_net_chr_write_iter, .poll = vhost_net_chr_poll, .unlocked_ioctl = vhost_net_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = vhost_net_compat_ioctl, #endif .open = vhost_net_open, .llseek = noop_llseek, }; static struct miscdevice vhost_net_misc = { .minor = VHOST_NET_MINOR, .name = "vhost-net", .fops = &vhost_net_fops, }; static int vhost_net_init(void) { if (experimental_zcopytx) vhost_net_enable_zcopy(VHOST_NET_VQ_TX); return misc_register(&vhost_net_misc); } module_init(vhost_net_init); static void vhost_net_exit(void) { misc_deregister(&vhost_net_misc); } module_exit(vhost_net_exit); MODULE_VERSION("0.0.1"); MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Michael S. Tsirkin"); MODULE_DESCRIPTION("Host kernel accelerator for virtio net"); MODULE_ALIAS_MISCDEV(VHOST_NET_MINOR); MODULE_ALIAS("devname:vhost-net");
54 203 203 203 203 203 55 55 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 // SPDX-License-Identifier: GPL-2.0 /* * linux/fs/ext4/sysfs.c * * Copyright (C) 1992, 1993, 1994, 1995 * Remy Card (card@masi.ibp.fr) * Theodore Ts'o (tytso@mit.edu) * */ #include <linux/time.h> #include <linux/fs.h> #include <linux/seq_file.h> #include <linux/slab.h> #include <linux/proc_fs.h> #include "ext4.h" #include "ext4_jbd2.h" typedef enum { attr_noop, attr_delayed_allocation_blocks, attr_session_write_kbytes, attr_lifetime_write_kbytes, attr_reserved_clusters, attr_inode_readahead, attr_trigger_test_error, attr_first_error_time, attr_last_error_time, attr_feature, attr_pointer_ui, attr_pointer_atomic, } attr_id_t; typedef enum { ptr_explicit, ptr_ext4_sb_info_offset, ptr_ext4_super_block_offset, } attr_ptr_t; static const char proc_dirname[] = "fs/ext4"; static struct proc_dir_entry *ext4_proc_root; struct ext4_attr { struct attribute attr; short attr_id; short attr_ptr; union { int offset; void *explicit_ptr; } u; }; static ssize_t session_write_kbytes_show(struct ext4_sb_info *sbi, char *buf) { struct super_block *sb = sbi->s_buddy_cache->i_sb; if (!sb->s_bdev->bd_part) return snprintf(buf, PAGE_SIZE, "0\n"); return snprintf(buf, PAGE_SIZE, "%lu\n", (part_stat_read(sb->s_bdev->bd_part, sectors[STAT_WRITE]) - sbi->s_sectors_written_start) >> 1); } static ssize_t lifetime_write_kbytes_show(struct ext4_sb_info *sbi, char *buf) { struct super_block *sb = sbi->s_buddy_cache->i_sb; if (!sb->s_bdev->bd_part) return snprintf(buf, PAGE_SIZE, "0\n"); return snprintf(buf, PAGE_SIZE, "%llu\n", (unsigned long long)(sbi->s_kbytes_written + ((part_stat_read(sb->s_bdev->bd_part, sectors[STAT_WRITE]) - EXT4_SB(sb)->s_sectors_written_start) >> 1))); } static ssize_t inode_readahead_blks_store(struct ext4_sb_info *sbi, const char *buf, size_t count) { unsigned long t; int ret; ret = kstrtoul(skip_spaces(buf), 0, &t); if (ret) return ret; if (t && (!is_power_of_2(t) || t > 0x40000000)) return -EINVAL; sbi->s_inode_readahead_blks = t; return count; } static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi, const char *buf, size_t count) { unsigned long long val; ext4_fsblk_t clusters = (ext4_blocks_count(sbi->s_es) >> sbi->s_cluster_bits); int ret; ret = kstrtoull(skip_spaces(buf), 0, &val); if (ret || val >= clusters) return -EINVAL; atomic64_set(&sbi->s_resv_clusters, val); return count; } static ssize_t trigger_test_error(struct ext4_sb_info *sbi, const char *buf, size_t count) { int len = count; if (!capable(CAP_SYS_ADMIN)) return -EPERM; if (len && buf[len-1] == '\n') len--; if (len) ext4_error(sbi->s_sb, "%.*s", len, buf); return count; } #define EXT4_ATTR(_name,_mode,_id) \ static struct ext4_attr ext4_attr_##_name = { \ .attr = {.name = __stringify(_name), .mode = _mode }, \ .attr_id = attr_##_id, \ } #define EXT4_ATTR_FUNC(_name,_mode) EXT4_ATTR(_name,_mode,_name) #define EXT4_ATTR_FEATURE(_name) EXT4_ATTR(_name, 0444, feature) #define EXT4_ATTR_OFFSET(_name,_mode,_id,_struct,_elname) \ static struct ext4_attr ext4_attr_##_name = { \ .attr = {.name = __stringify(_name), .mode = _mode }, \ .attr_id = attr_##_id, \ .attr_ptr = ptr_##_struct##_offset, \ .u = { \ .offset = offsetof(struct _struct, _elname),\ }, \ } #define EXT4_RO_ATTR_ES_UI(_name,_elname) \ EXT4_ATTR_OFFSET(_name, 0444, pointer_ui, ext4_super_block, _elname) #define EXT4_RW_ATTR_SBI_UI(_name,_elname) \ EXT4_ATTR_OFFSET(_name, 0644, pointer_ui, ext4_sb_info, _elname) #define EXT4_ATTR_PTR(_name,_mode,_id,_ptr) \ static struct ext4_attr ext4_attr_##_name = { \ .attr = {.name = __stringify(_name), .mode = _mode }, \ .attr_id = attr_##_id, \ .attr_ptr = ptr_explicit, \ .u = { \ .explicit_ptr = _ptr, \ }, \ } #define ATTR_LIST(name) &ext4_attr_##name.attr EXT4_ATTR_FUNC(delayed_allocation_blocks, 0444); EXT4_ATTR_FUNC(session_write_kbytes, 0444); EXT4_ATTR_FUNC(lifetime_write_kbytes, 0444); EXT4_ATTR_FUNC(reserved_clusters, 0644); EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead, ext4_sb_info, s_inode_readahead_blks); EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal); EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats); EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan); EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan); EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs); EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request); EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc); EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb); EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error); EXT4_RW_ATTR_SBI_UI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval); EXT4_RW_ATTR_SBI_UI(err_ratelimit_burst, s_err_ratelimit_state.burst); EXT4_RW_ATTR_SBI_UI(warning_ratelimit_interval_ms, s_warning_ratelimit_state.interval); EXT4_RW_ATTR_SBI_UI(warning_ratelimit_burst, s_warning_ratelimit_state.burst); EXT4_RW_ATTR_SBI_UI(msg_ratelimit_interval_ms, s_msg_ratelimit_state.interval); EXT4_RW_ATTR_SBI_UI(msg_ratelimit_burst, s_msg_ratelimit_state.burst); EXT4_RO_ATTR_ES_UI(errors_count, s_error_count); EXT4_ATTR(first_error_time, 0444, first_error_time); EXT4_ATTR(last_error_time, 0444, last_error_time); static unsigned int old_bump_val = 128; EXT4_ATTR_PTR(max_writeback_mb_bump, 0444, pointer_ui, &old_bump_val); static struct attribute *ext4_attrs[] = { ATTR_LIST(delayed_allocation_blocks), ATTR_LIST(session_write_kbytes), ATTR_LIST(lifetime_write_kbytes), ATTR_LIST(reserved_clusters), ATTR_LIST(inode_readahead_blks), ATTR_LIST(inode_goal), ATTR_LIST(mb_stats), ATTR_LIST(mb_max_to_scan), ATTR_LIST(mb_min_to_scan), ATTR_LIST(mb_order2_req), ATTR_LIST(mb_stream_req), ATTR_LIST(mb_group_prealloc), ATTR_LIST(max_writeback_mb_bump), ATTR_LIST(extent_max_zeroout_kb), ATTR_LIST(trigger_fs_error), ATTR_LIST(err_ratelimit_interval_ms), ATTR_LIST(err_ratelimit_burst), ATTR_LIST(warning_ratelimit_interval_ms), ATTR_LIST(warning_ratelimit_burst), ATTR_LIST(msg_ratelimit_interval_ms), ATTR_LIST(msg_ratelimit_burst), ATTR_LIST(errors_count), ATTR_LIST(first_error_time), ATTR_LIST(last_error_time), NULL, }; /* Features this copy of ext4 supports */ EXT4_ATTR_FEATURE(lazy_itable_init); EXT4_ATTR_FEATURE(batched_discard); EXT4_ATTR_FEATURE(meta_bg_resize); #ifdef CONFIG_EXT4_FS_ENCRYPTION EXT4_ATTR_FEATURE(encryption); #endif EXT4_ATTR_FEATURE(metadata_csum_seed); static struct attribute *ext4_feat_attrs[] = { ATTR_LIST(lazy_itable_init), ATTR_LIST(batched_discard), ATTR_LIST(meta_bg_resize), #ifdef CONFIG_EXT4_FS_ENCRYPTION ATTR_LIST(encryption), #endif ATTR_LIST(metadata_csum_seed), NULL, }; static void *calc_ptr(struct ext4_attr *a, struct ext4_sb_info *sbi) { switch (a->attr_ptr) { case ptr_explicit: return a->u.explicit_ptr; case ptr_ext4_sb_info_offset: return (void *) (((char *) sbi) + a->u.offset); case ptr_ext4_super_block_offset: return (void *) (((char *) sbi->s_es) + a->u.offset); } return NULL; } static ssize_t __print_tstamp(char *buf, __le32 lo, __u8 hi) { return snprintf(buf, PAGE_SIZE, "%lld", ((time64_t)hi << 32) + le32_to_cpu(lo)); } #define print_tstamp(buf, es, tstamp) \ __print_tstamp(buf, (es)->tstamp, (es)->tstamp ## _hi) static ssize_t ext4_attr_show(struct kobject *kobj, struct attribute *attr, char *buf) { struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info, s_kobj); struct ext4_attr *a = container_of(attr, struct ext4_attr, attr); void *ptr = calc_ptr(a, sbi); switch (a->attr_id) { case attr_delayed_allocation_blocks: return snprintf(buf, PAGE_SIZE, "%llu\n", (s64) EXT4_C2B(sbi, percpu_counter_sum(&sbi->s_dirtyclusters_counter))); case attr_session_write_kbytes: return session_write_kbytes_show(sbi, buf); case attr_lifetime_write_kbytes: return lifetime_write_kbytes_show(sbi, buf); case attr_reserved_clusters: return snprintf(buf, PAGE_SIZE, "%llu\n", (unsigned long long) atomic64_read(&sbi->s_resv_clusters)); case attr_inode_readahead: case attr_pointer_ui: if (!ptr) return 0; if (a->attr_ptr == ptr_ext4_super_block_offset) return snprintf(buf, PAGE_SIZE, "%u\n", le32_to_cpup(ptr)); else return snprintf(buf, PAGE_SIZE, "%u\n", *((unsigned int *) ptr)); case attr_pointer_atomic: if (!ptr) return 0; return snprintf(buf, PAGE_SIZE, "%d\n", atomic_read((atomic_t *) ptr)); case attr_feature: return snprintf(buf, PAGE_SIZE, "supported\n"); case attr_first_error_time: return print_tstamp(buf, sbi->s_es, s_first_error_time); case attr_last_error_time: return print_tstamp(buf, sbi->s_es, s_last_error_time); } return 0; } static ssize_t ext4_attr_store(struct kobject *kobj, struct attribute *attr, const char *buf, size_t len) { struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info, s_kobj); struct ext4_attr *a = container_of(attr, struct ext4_attr, attr); void *ptr = calc_ptr(a, sbi); unsigned long t; int ret; switch (a->attr_id) { case attr_reserved_clusters: return reserved_clusters_store(sbi, buf, len); case attr_pointer_ui: if (!ptr) return 0; ret = kstrtoul(skip_spaces(buf), 0, &t); if (ret) return ret; if (a->attr_ptr == ptr_ext4_super_block_offset) *((__le32 *) ptr) = cpu_to_le32(t); else *((unsigned int *) ptr) = t; return len; case attr_inode_readahead: return inode_readahead_blks_store(sbi, buf, len); case attr_trigger_test_error: return trigger_test_error(sbi, buf, len); } return 0; } static void ext4_sb_release(struct kobject *kobj) { struct ext4_sb_info *sbi = container_of(kobj, struct ext4_sb_info, s_kobj); complete(&sbi->s_kobj_unregister); } static const struct sysfs_ops ext4_attr_ops = { .show = ext4_attr_show, .store = ext4_attr_store, }; static struct kobj_type ext4_sb_ktype = { .default_attrs = ext4_attrs, .sysfs_ops = &ext4_attr_ops, .release = ext4_sb_release, }; static struct kobj_type ext4_feat_ktype = { .default_attrs = ext4_feat_attrs, .sysfs_ops = &ext4_attr_ops, .release = (void (*)(struct kobject *))kfree, }; static struct kobject *ext4_root; static struct kobject *ext4_feat; int ext4_register_sysfs(struct super_block *sb) { struct ext4_sb_info *sbi = EXT4_SB(sb); int err; init_completion(&sbi->s_kobj_unregister); err = kobject_init_and_add(&sbi->s_kobj, &ext4_sb_ktype, ext4_root, "%s", sb->s_id); if (err) { kobject_put(&sbi->s_kobj); wait_for_completion(&sbi->s_kobj_unregister); return err; } if (ext4_proc_root) sbi->s_proc = proc_mkdir(sb->s_id, ext4_proc_root); if (sbi->s_proc) { proc_create_single_data("options", S_IRUGO, sbi->s_proc, ext4_seq_options_show, sb); proc_create_single_data("es_shrinker_info", S_IRUGO, sbi->s_proc, ext4_seq_es_shrinker_info_show, sb); proc_create_seq_data("mb_groups", S_IRUGO, sbi->s_proc, &ext4_mb_seq_groups_ops, sb); } return 0; } void ext4_unregister_sysfs(struct super_block *sb) { struct ext4_sb_info *sbi = EXT4_SB(sb); if (sbi->s_proc) remove_proc_subtree(sb->s_id, ext4_proc_root); kobject_del(&sbi->s_kobj); } int __init ext4_init_sysfs(void) { int ret; ext4_root = kobject_create_and_add("ext4", fs_kobj); if (!ext4_root) return -ENOMEM; ext4_feat = kzalloc(sizeof(*ext4_feat), GFP_KERNEL); if (!ext4_feat) { ret = -ENOMEM; goto root_err; } ret = kobject_init_and_add(ext4_feat, &ext4_feat_ktype, ext4_root, "features"); if (ret) goto feat_err; ext4_proc_root = proc_mkdir(proc_dirname, NULL); return ret; feat_err: kobject_put(ext4_feat); ext4_feat = NULL; root_err: kobject_put(ext4_root); ext4_root = NULL; return ret; } void ext4_exit_sysfs(void) { kobject_put(ext4_feat); ext4_feat = NULL; kobject_put(ext4_root); ext4_root = NULL; remove_proc_entry(proc_dirname, NULL); ext4_proc_root = NULL; }
1 1 1 1 1 10 10 10 10 10 10 10 10 10 10 10 10 2 10 10 10 10 10 10 10 1 10 10 10 10 10 2 10 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 // SPDX-License-Identifier: GPL-2.0 /* * fs/bfs/dir.c * BFS directory operations. * Copyright (C) 1999,2000 Tigran Aivazian <tigran@veritas.com> * Made endianness-clean by Andrew Stribblehill <ads@wompom.org> 2005 */ #include <linux/time.h> #include <linux/string.h> #include <linux/fs.h> #include <linux/buffer_head.h> #include <linux/sched.h> #include "bfs.h" #undef DEBUG #ifdef DEBUG #define dprintf(x...) printf(x) #else #define dprintf(x...) #endif static int bfs_add_entry(struct inode *dir, const struct qstr *child, int ino); static struct buffer_head *bfs_find_entry(struct inode *dir, const struct qstr *child, struct bfs_dirent **res_dir); static int bfs_readdir(struct file *f, struct dir_context *ctx) { struct inode *dir = file_inode(f); struct buffer_head *bh; struct bfs_dirent *de; unsigned int offset; int block; if (ctx->pos & (BFS_DIRENT_SIZE - 1)) { printf("Bad f_pos=%08lx for %s:%08lx\n", (unsigned long)ctx->pos, dir->i_sb->s_id, dir->i_ino); return -EINVAL; } while (ctx->pos < dir->i_size) { offset = ctx->pos & (BFS_BSIZE - 1); block = BFS_I(dir)->i_sblock + (ctx->pos >> BFS_BSIZE_BITS); bh = sb_bread(dir->i_sb, block); if (!bh) { ctx->pos += BFS_BSIZE - offset; continue; } do { de = (struct bfs_dirent *)(bh->b_data + offset); if (de->ino) { int size = strnlen(de->name, BFS_NAMELEN); if (!dir_emit(ctx, de->name, size, le16_to_cpu(de->ino), DT_UNKNOWN)) { brelse(bh); return 0; } } offset += BFS_DIRENT_SIZE; ctx->pos += BFS_DIRENT_SIZE; } while ((offset < BFS_BSIZE) && (ctx->pos < dir->i_size)); brelse(bh); } return 0; } const struct file_operations bfs_dir_operations = { .read = generic_read_dir, .iterate_shared = bfs_readdir, .fsync = generic_file_fsync, .llseek = generic_file_llseek, }; static int bfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool excl) { int err; struct inode *inode; struct super_block *s = dir->i_sb; struct bfs_sb_info *info = BFS_SB(s); unsigned long ino; inode = new_inode(s); if (!inode) return -ENOMEM; mutex_lock(&info->bfs_lock); ino = find_first_zero_bit(info->si_imap, info->si_lasti + 1); if (ino > info->si_lasti) { mutex_unlock(&info->bfs_lock); iput(inode); return -ENOSPC; } set_bit(ino, info->si_imap); info->si_freei--; inode_init_owner(inode, dir, mode); inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); inode->i_blocks = 0; inode->i_op = &bfs_file_inops; inode->i_fop = &bfs_file_operations; inode->i_mapping->a_ops = &bfs_aops; inode->i_ino = ino; BFS_I(inode)->i_dsk_ino = ino; BFS_I(inode)->i_sblock = 0; BFS_I(inode)->i_eblock = 0; insert_inode_hash(inode); mark_inode_dirty(inode); bfs_dump_imap("create", s); err = bfs_add_entry(dir, &dentry->d_name, inode->i_ino); if (err) { inode_dec_link_count(inode); mutex_unlock(&info->bfs_lock); iput(inode); return err; } mutex_unlock(&info->bfs_lock); d_instantiate(dentry, inode); return 0; } static struct dentry *bfs_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) { struct inode *inode = NULL; struct buffer_head *bh; struct bfs_dirent *de; struct bfs_sb_info *info = BFS_SB(dir->i_sb); if (dentry->d_name.len > BFS_NAMELEN) return ERR_PTR(-ENAMETOOLONG); mutex_lock(&info->bfs_lock); bh = bfs_find_entry(dir, &dentry->d_name, &de); if (bh) { unsigned long ino = (unsigned long)le16_to_cpu(de->ino); brelse(bh); inode = bfs_iget(dir->i_sb, ino); } mutex_unlock(&info->bfs_lock); return d_splice_alias(inode, dentry); } static int bfs_link(struct dentry *old, struct inode *dir, struct dentry *new) { struct inode *inode = d_inode(old); struct bfs_sb_info *info = BFS_SB(inode->i_sb); int err; mutex_lock(&info->bfs_lock); err = bfs_add_entry(dir, &new->d_name, inode->i_ino); if (err) { mutex_unlock(&info->bfs_lock); return err; } inc_nlink(inode); inode->i_ctime = current_time(inode); mark_inode_dirty(inode); ihold(inode); d_instantiate(new, inode); mutex_unlock(&info->bfs_lock); return 0; } static int bfs_unlink(struct inode *dir, struct dentry *dentry) { int error = -ENOENT; struct inode *inode = d_inode(dentry); struct buffer_head *bh; struct bfs_dirent *de; struct bfs_sb_info *info = BFS_SB(inode->i_sb); mutex_lock(&info->bfs_lock); bh = bfs_find_entry(dir, &dentry->d_name, &de); if (!bh || (le16_to_cpu(de->ino) != inode->i_ino)) goto out_brelse; if (!inode->i_nlink) { printf("unlinking non-existent file %s:%lu (nlink=%d)\n", inode->i_sb->s_id, inode->i_ino, inode->i_nlink); set_nlink(inode, 1); } de->ino = 0; mark_buffer_dirty_inode(bh, dir); dir->i_ctime = dir->i_mtime = current_time(dir); mark_inode_dirty(dir); inode->i_ctime = dir->i_ctime; inode_dec_link_count(inode); error = 0; out_brelse: brelse(bh); mutex_unlock(&info->bfs_lock); return error; } static int bfs_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry, unsigned int flags) { struct inode *old_inode, *new_inode; struct buffer_head *old_bh, *new_bh; struct bfs_dirent *old_de, *new_de; struct bfs_sb_info *info; int error = -ENOENT; if (flags & ~RENAME_NOREPLACE) return -EINVAL; old_bh = new_bh = NULL; old_inode = d_inode(old_dentry); if (S_ISDIR(old_inode->i_mode)) return -EINVAL; info = BFS_SB(old_inode->i_sb); mutex_lock(&info->bfs_lock); old_bh = bfs_find_entry(old_dir, &old_dentry->d_name, &old_de); if (!old_bh || (le16_to_cpu(old_de->ino) != old_inode->i_ino)) goto end_rename; error = -EPERM; new_inode = d_inode(new_dentry); new_bh = bfs_find_entry(new_dir, &new_dentry->d_name, &new_de); if (new_bh && !new_inode) { brelse(new_bh); new_bh = NULL; } if (!new_bh) { error = bfs_add_entry(new_dir, &new_dentry->d_name, old_inode->i_ino); if (error) goto end_rename; } old_de->ino = 0; old_dir->i_ctime = old_dir->i_mtime = current_time(old_dir); mark_inode_dirty(old_dir); if (new_inode) { new_inode->i_ctime = current_time(new_inode); inode_dec_link_count(new_inode); } mark_buffer_dirty_inode(old_bh, old_dir); error = 0; end_rename: mutex_unlock(&info->bfs_lock); brelse(old_bh); brelse(new_bh); return error; } const struct inode_operations bfs_dir_inops = { .create = bfs_create, .lookup = bfs_lookup, .link = bfs_link, .unlink = bfs_unlink, .rename = bfs_rename, }; static int bfs_add_entry(struct inode *dir, const struct qstr *child, int ino) { const unsigned char *name = child->name; int namelen = child->len; struct buffer_head *bh; struct bfs_dirent *de; int block, sblock, eblock, off, pos; int i; dprintf("name=%s, namelen=%d\n", name, namelen); if (!namelen) return -ENOENT; if (namelen > BFS_NAMELEN) return -ENAMETOOLONG; sblock = BFS_I(dir)->i_sblock; eblock = BFS_I(dir)->i_eblock; for (block = sblock; block <= eblock; block++) { bh = sb_bread(dir->i_sb, block); if (!bh) return -EIO; for (off = 0; off < BFS_BSIZE; off += BFS_DIRENT_SIZE) { de = (struct bfs_dirent *)(bh->b_data + off); if (!de->ino) { pos = (block - sblock) * BFS_BSIZE + off; if (pos >= dir->i_size) { dir->i_size += BFS_DIRENT_SIZE; dir->i_ctime = current_time(dir); } dir->i_mtime = current_time(dir); mark_inode_dirty(dir); de->ino = cpu_to_le16((u16)ino); for (i = 0; i < BFS_NAMELEN; i++) de->name[i] = (i < namelen) ? name[i] : 0; mark_buffer_dirty_inode(bh, dir); brelse(bh); return 0; } } brelse(bh); } return -ENOSPC; } static inline int bfs_namecmp(int len, const unsigned char *name, const char *buffer) { if ((len < BFS_NAMELEN) && buffer[len]) return 0; return !memcmp(name, buffer, len); } static struct buffer_head *bfs_find_entry(struct inode *dir, const struct qstr *child, struct bfs_dirent **res_dir) { unsigned long block = 0, offset = 0; struct buffer_head *bh = NULL; struct bfs_dirent *de; const unsigned char *name = child->name; int namelen = child->len; *res_dir = NULL; if (namelen > BFS_NAMELEN) return NULL; while (block * BFS_BSIZE + offset < dir->i_size) { if (!bh) { bh = sb_bread(dir->i_sb, BFS_I(dir)->i_sblock + block); if (!bh) { block++; continue; } } de = (struct bfs_dirent *)(bh->b_data + offset); offset += BFS_DIRENT_SIZE; if (le16_to_cpu(de->ino) && bfs_namecmp(namelen, name, de->name)) { *res_dir = de; return bh; } if (offset < bh->b_size) continue; brelse(bh); bh = NULL; offset = 0; block++; } brelse(bh); return NULL; }
7 7 7 7 3 3 3 1 1 3 3 3 1 1 1 2 3 5 4 5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 /* Daemon interface * * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public Licence * as published by the Free Software Foundation; either version * 2 of the Licence, or (at your option) any later version. */ #include <linux/module.h> #include <linux/init.h> #include <linux/sched.h> #include <linux/completion.h> #include <linux/slab.h> #include <linux/fs.h> #include <linux/file.h> #include <linux/namei.h> #include <linux/poll.h> #include <linux/mount.h> #include <linux/statfs.h> #include <linux/ctype.h> #include <linux/string.h> #include <linux/fs_struct.h> #include "internal.h" static int cachefiles_daemon_open(struct inode *, struct file *); static int cachefiles_daemon_release(struct inode *, struct file *); static ssize_t cachefiles_daemon_read(struct file *, char __user *, size_t, loff_t *); static ssize_t cachefiles_daemon_write(struct file *, const char __user *, size_t, loff_t *); static __poll_t cachefiles_daemon_poll(struct file *, struct poll_table_struct *); static int cachefiles_daemon_frun(struct cachefiles_cache *, char *); static int cachefiles_daemon_fcull(struct cachefiles_cache *, char *); static int cachefiles_daemon_fstop(struct cachefiles_cache *, char *); static int cachefiles_daemon_brun(struct cachefiles_cache *, char *); static int cachefiles_daemon_bcull(struct cachefiles_cache *, char *); static int cachefiles_daemon_bstop(struct cachefiles_cache *, char *); static int cachefiles_daemon_cull(struct cachefiles_cache *, char *); static int cachefiles_daemon_debug(struct cachefiles_cache *, char *); static int cachefiles_daemon_dir(struct cachefiles_cache *, char *); static int cachefiles_daemon_inuse(struct cachefiles_cache *, char *); static int cachefiles_daemon_secctx(struct cachefiles_cache *, char *); static int cachefiles_daemon_tag(struct cachefiles_cache *, char *); static unsigned long cachefiles_open; const struct file_operations cachefiles_daemon_fops = { .owner = THIS_MODULE, .open = cachefiles_daemon_open, .release = cachefiles_daemon_release, .read = cachefiles_daemon_read, .write = cachefiles_daemon_write, .poll = cachefiles_daemon_poll, .llseek = noop_llseek, }; struct cachefiles_daemon_cmd { char name[8]; int (*handler)(struct cachefiles_cache *cache, char *args); }; static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = { { "bind", cachefiles_daemon_bind }, { "brun", cachefiles_daemon_brun }, { "bcull", cachefiles_daemon_bcull }, { "bstop", cachefiles_daemon_bstop }, { "cull", cachefiles_daemon_cull }, { "debug", cachefiles_daemon_debug }, { "dir", cachefiles_daemon_dir }, { "frun", cachefiles_daemon_frun }, { "fcull", cachefiles_daemon_fcull }, { "fstop", cachefiles_daemon_fstop }, { "inuse", cachefiles_daemon_inuse }, { "secctx", cachefiles_daemon_secctx }, { "tag", cachefiles_daemon_tag }, { "", NULL } }; /* * do various checks */ static int cachefiles_daemon_open(struct inode *inode, struct file *file) { struct cachefiles_cache *cache; _enter(""); /* only the superuser may do this */ if (!capable(CAP_SYS_ADMIN)) return -EPERM; /* the cachefiles device may only be open once at a time */ if (xchg(&cachefiles_open, 1) == 1) return -EBUSY; /* allocate a cache record */ cache = kzalloc(sizeof(struct cachefiles_cache), GFP_KERNEL); if (!cache) { cachefiles_open = 0; return -ENOMEM; } mutex_init(&cache->daemon_mutex); cache->active_nodes = RB_ROOT; rwlock_init(&cache->active_lock); init_waitqueue_head(&cache->daemon_pollwq); /* set default caching limits * - limit at 1% free space and/or free files * - cull below 5% free space and/or free files * - cease culling above 7% free space and/or free files */ cache->frun_percent = 7; cache->fcull_percent = 5; cache->fstop_percent = 1; cache->brun_percent = 7; cache->bcull_percent = 5; cache->bstop_percent = 1; file->private_data = cache; cache->cachefilesd = file; return 0; } /* * release a cache */ static int cachefiles_daemon_release(struct inode *inode, struct file *file) { struct cachefiles_cache *cache = file->private_data; _enter(""); ASSERT(cache); set_bit(CACHEFILES_DEAD, &cache->flags); cachefiles_daemon_unbind(cache); ASSERT(!cache->active_nodes.rb_node); /* clean up the control file interface */ cache->cachefilesd = NULL; file->private_data = NULL; cachefiles_open = 0; kfree(cache); _leave(""); return 0; } /* * read the cache state */ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer, size_t buflen, loff_t *pos) { struct cachefiles_cache *cache = file->private_data; unsigned long long b_released; unsigned f_released; char buffer[256]; int n; //_enter(",,%zu,", buflen); if (!test_bit(CACHEFILES_READY, &cache->flags)) return 0; /* check how much space the cache has */ cachefiles_has_space(cache, 0, 0); /* summarise */ f_released = atomic_xchg(&cache->f_released, 0); b_released = atomic_long_xchg(&cache->b_released, 0); clear_bit(CACHEFILES_STATE_CHANGED, &cache->flags); n = snprintf(buffer, sizeof(buffer), "cull=%c" " frun=%llx" " fcull=%llx" " fstop=%llx" " brun=%llx" " bcull=%llx" " bstop=%llx" " freleased=%x" " breleased=%llx", test_bit(CACHEFILES_CULLING, &cache->flags) ? '1' : '0', (unsigned long long) cache->frun, (unsigned long long) cache->fcull, (unsigned long long) cache->fstop, (unsigned long long) cache->brun, (unsigned long long) cache->bcull, (unsigned long long) cache->bstop, f_released, b_released); if (n > buflen) return -EMSGSIZE; if (copy_to_user(_buffer, buffer, n) != 0) return -EFAULT; return n; } /* * command the cache */ static ssize_t cachefiles_daemon_write(struct file *file, const char __user *_data, size_t datalen, loff_t *pos) { const struct cachefiles_daemon_cmd *cmd; struct cachefiles_cache *cache = file->private_data; ssize_t ret; char *data, *args, *cp; //_enter(",,%zu,", datalen); ASSERT(cache); if (test_bit(CACHEFILES_DEAD, &cache->flags)) return -EIO; if (datalen < 0 || datalen > PAGE_SIZE - 1) return -EOPNOTSUPP; /* drag the command string into the kernel so we can parse it */ data = memdup_user_nul(_data, datalen); if (IS_ERR(data)) return PTR_ERR(data); ret = -EINVAL; if (memchr(data, '\0', datalen)) goto error; /* strip any newline */ cp = memchr(data, '\n', datalen); if (cp) { if (cp == data) goto error; *cp = '\0'; } /* parse the command */ ret = -EOPNOTSUPP; for (args = data; *args; args++) if (isspace(*args)) break; if (*args) { if (args == data) goto error; *args = '\0'; args = skip_spaces(++args); } /* run the appropriate command handler */ for (cmd = cachefiles_daemon_cmds; cmd->name[0]; cmd++) if (strcmp(cmd->name, data) == 0) goto found_command; error: kfree(data); //_leave(" = %zd", ret); return ret; found_command: mutex_lock(&cache->daemon_mutex); ret = -EIO; if (!test_bit(CACHEFILES_DEAD, &cache->flags)) ret = cmd->handler(cache, args); mutex_unlock(&cache->daemon_mutex); if (ret == 0) ret = datalen; goto error; } /* * poll for culling state * - use EPOLLOUT to indicate culling state */ static __poll_t cachefiles_daemon_poll(struct file *file, struct poll_table_struct *poll) { struct cachefiles_cache *cache = file->private_data; __poll_t mask; poll_wait(file, &cache->daemon_pollwq, poll); mask = 0; if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags)) mask |= EPOLLIN; if (test_bit(CACHEFILES_CULLING, &cache->flags)) mask |= EPOLLOUT; return mask; } /* * give a range error for cache space constraints * - can be tail-called */ static int cachefiles_daemon_range_error(struct cachefiles_cache *cache, char *args) { pr_err("Free space limits must be in range 0%%<=stop<cull<run<100%%\n"); return -EINVAL; } /* * set the percentage of files at which to stop culling * - command: "frun <N>%" */ static int cachefiles_daemon_frun(struct cachefiles_cache *cache, char *args) { unsigned long frun; _enter(",%s", args); if (!*args) return -EINVAL; frun = simple_strtoul(args, &args, 10); if (args[0] != '%' || args[1] != '\0') return -EINVAL; if (frun <= cache->fcull_percent || frun >= 100) return cachefiles_daemon_range_error(cache, args); cache->frun_percent = frun; return 0; } /* * set the percentage of files at which to start culling * - command: "fcull <N>%" */ static int cachefiles_daemon_fcull(struct cachefiles_cache *cache, char *args) { unsigned long fcull; _enter(",%s", args); if (!*args) return -EINVAL; fcull = simple_strtoul(args, &args, 10); if (args[0] != '%' || args[1] != '\0') return -EINVAL; if (fcull <= cache->fstop_percent || fcull >= cache->frun_percent) return cachefiles_daemon_range_error(cache, args); cache->fcull_percent = fcull; return 0; } /* * set the percentage of files at which to stop allocating * - command: "fstop <N>%" */ static int cachefiles_daemon_fstop(struct cachefiles_cache *cache, char *args) { unsigned long fstop; _enter(",%s", args); if (!*args) return -EINVAL; fstop = simple_strtoul(args, &args, 10); if (args[0] != '%' || args[1] != '\0') return -EINVAL; if (fstop < 0 || fstop >= cache->fcull_percent) return cachefiles_daemon_range_error(cache, args); cache->fstop_percent = fstop; return 0; } /* * set the percentage of blocks at which to stop culling * - command: "brun <N>%" */ static int cachefiles_daemon_brun(struct cachefiles_cache *cache, char *args) { unsigned long brun; _enter(",%s", args); if (!*args) return -EINVAL; brun = simple_strtoul(args, &args, 10); if (args[0] != '%' || args[1] != '\0') return -EINVAL; if (brun <= cache->bcull_percent || brun >= 100) return cachefiles_daemon_range_error(cache, args); cache->brun_percent = brun; return 0; } /* * set the percentage of blocks at which to start culling * - command: "bcull <N>%" */ static int cachefiles_daemon_bcull(struct cachefiles_cache *cache, char *args) { unsigned long bcull; _enter(",%s", args); if (!*args) return -EINVAL; bcull = simple_strtoul(args, &args, 10); if (args[0] != '%' || args[1] != '\0') return -EINVAL; if (bcull <= cache->bstop_percent || bcull >= cache->brun_percent) return cachefiles_daemon_range_error(cache, args); cache->bcull_percent = bcull; return 0; } /* * set the percentage of blocks at which to stop allocating * - command: "bstop <N>%" */ static int cachefiles_daemon_bstop(struct cachefiles_cache *cache, char *args) { unsigned long bstop; _enter(",%s", args); if (!*args) return -EINVAL; bstop = simple_strtoul(args, &args, 10); if (args[0] != '%' || args[1] != '\0') return -EINVAL; if (bstop < 0 || bstop >= cache->bcull_percent) return cachefiles_daemon_range_error(cache, args); cache->bstop_percent = bstop; return 0; } /* * set the cache directory * - command: "dir <name>" */ static int cachefiles_daemon_dir(struct cachefiles_cache *cache, char *args) { char *dir; _enter(",%s", args); if (!*args) { pr_err("Empty directory specified\n"); return -EINVAL; } if (cache->rootdirname) { pr_err("Second cache directory specified\n"); return -EEXIST; } dir = kstrdup(args, GFP_KERNEL); if (!dir) return -ENOMEM; cache->rootdirname = dir; return 0; } /* * set the cache security context * - command: "secctx <ctx>" */ static int cachefiles_daemon_secctx(struct cachefiles_cache *cache, char *args) { char *secctx; _enter(",%s", args); if (!*args) { pr_err("Empty security context specified\n"); return -EINVAL; } if (cache->secctx) { pr_err("Second security context specified\n"); return -EINVAL; } secctx = kstrdup(args, GFP_KERNEL); if (!secctx) return -ENOMEM; cache->secctx = secctx; return 0; } /* * set the cache tag * - command: "tag <name>" */ static int cachefiles_daemon_tag(struct cachefiles_cache *cache, char *args) { char *tag; _enter(",%s", args); if (!*args) { pr_err("Empty tag specified\n"); return -EINVAL; } if (cache->tag) return -EEXIST; tag = kstrdup(args, GFP_KERNEL); if (!tag) return -ENOMEM; cache->tag = tag; return 0; } /* * request a node in the cache be culled from the current working directory * - command: "cull <name>" */ static int cachefiles_daemon_cull(struct cachefiles_cache *cache, char *args) { struct path path; const struct cred *saved_cred; int ret; _enter(",%s", args); if (strchr(args, '/')) goto inval; if (!test_bit(CACHEFILES_READY, &cache->flags)) { pr_err("cull applied to unready cache\n"); return -EIO; } if (test_bit(CACHEFILES_DEAD, &cache->flags)) { pr_err("cull applied to dead cache\n"); return -EIO; } /* extract the directory dentry from the cwd */ get_fs_pwd(current->fs, &path); if (!d_can_lookup(path.dentry)) goto notdir; cachefiles_begin_secure(cache, &saved_cred); ret = cachefiles_cull(cache, path.dentry, args); cachefiles_end_secure(cache, saved_cred); path_put(&path); _leave(" = %d", ret); return ret; notdir: path_put(&path); pr_err("cull command requires dirfd to be a directory\n"); return -ENOTDIR; inval: pr_err("cull command requires dirfd and filename\n"); return -EINVAL; } /* * set debugging mode * - command: "debug <mask>" */ static int cachefiles_daemon_debug(struct cachefiles_cache *cache, char *args) { unsigned long mask; _enter(",%s", args); mask = simple_strtoul(args, &args, 0); if (args[0] != '\0') goto inval; cachefiles_debug = mask; _leave(" = 0"); return 0; inval: pr_err("debug command requires mask\n"); return -EINVAL; } /* * find out whether an object in the current working directory is in use or not * - command: "inuse <name>" */ static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args) { struct path path; const struct cred *saved_cred; int ret; //_enter(",%s", args); if (strchr(args, '/')) goto inval; if (!test_bit(CACHEFILES_READY, &cache->flags)) { pr_err("inuse applied to unready cache\n"); return -EIO; } if (test_bit(CACHEFILES_DEAD, &cache->flags)) { pr_err("inuse applied to dead cache\n"); return -EIO; } /* extract the directory dentry from the cwd */ get_fs_pwd(current->fs, &path); if (!d_can_lookup(path.dentry)) goto notdir; cachefiles_begin_secure(cache, &saved_cred); ret = cachefiles_check_in_use(cache, path.dentry, args); cachefiles_end_secure(cache, saved_cred); path_put(&path); //_leave(" = %d", ret); return ret; notdir: path_put(&path); pr_err("inuse command requires dirfd to be a directory\n"); return -ENOTDIR; inval: pr_err("inuse command requires dirfd and filename\n"); return -EINVAL; } /* * see if we have space for a number of pages and/or a number of files in the * cache */ int cachefiles_has_space(struct cachefiles_cache *cache, unsigned fnr, unsigned bnr) { struct kstatfs stats; struct path path = { .mnt = cache->mnt, .dentry = cache->mnt->mnt_root, }; int ret; //_enter("{%llu,%llu,%llu,%llu,%llu,%llu},%u,%u", // (unsigned long long) cache->frun, // (unsigned long long) cache->fcull, // (unsigned long long) cache->fstop, // (unsigned long long) cache->brun, // (unsigned long long) cache->bcull, // (unsigned long long) cache->bstop, // fnr, bnr); /* find out how many pages of blockdev are available */ memset(&stats, 0, sizeof(stats)); ret = vfs_statfs(&path, &stats); if (ret < 0) { if (ret == -EIO) cachefiles_io_error(cache, "statfs failed"); _leave(" = %d", ret); return ret; } stats.f_bavail >>= cache->bshift; //_debug("avail %llu,%llu", // (unsigned long long) stats.f_ffree, // (unsigned long long) stats.f_bavail); /* see if there is sufficient space */ if (stats.f_ffree > fnr) stats.f_ffree -= fnr; else stats.f_ffree = 0; if (stats.f_bavail > bnr) stats.f_bavail -= bnr; else stats.f_bavail = 0; ret = -ENOBUFS; if (stats.f_ffree < cache->fstop || stats.f_bavail < cache->bstop) goto begin_cull; ret = 0; if (stats.f_ffree < cache->fcull || stats.f_bavail < cache->bcull) goto begin_cull; if (test_bit(CACHEFILES_CULLING, &cache->flags) && stats.f_ffree >= cache->frun && stats.f_bavail >= cache->brun && test_and_clear_bit(CACHEFILES_CULLING, &cache->flags) ) { _debug("cease culling"); cachefiles_state_changed(cache); } //_leave(" = 0"); return 0; begin_cull: if (!test_and_set_bit(CACHEFILES_CULLING, &cache->flags)) { _debug("### CULL CACHE ###"); cachefiles_state_changed(cache); } _leave(" = %d", ret); return ret; }
31 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 /* module that allows mangling of the arp payload */ #include <linux/module.h> #include <linux/netfilter.h> #include <linux/netfilter_arp/arpt_mangle.h> #include <net/sock.h> MODULE_LICENSE("GPL"); MODULE_AUTHOR("Bart De Schuymer <bdschuym@pandora.be>"); MODULE_DESCRIPTION("arptables arp payload mangle target"); static unsigned int target(struct sk_buff *skb, const struct xt_action_param *par) { const struct arpt_mangle *mangle = par->targinfo; const struct arphdr *arp; unsigned char *arpptr; int pln, hln; if (!skb_make_writable(skb, skb->len)) return NF_DROP; arp = arp_hdr(skb); arpptr = skb_network_header(skb) + sizeof(*arp); pln = arp->ar_pln; hln = arp->ar_hln; /* We assume that pln and hln were checked in the match */ if (mangle->flags & ARPT_MANGLE_SDEV) { if (ARPT_DEV_ADDR_LEN_MAX < hln || (arpptr + hln > skb_tail_pointer(skb))) return NF_DROP; memcpy(arpptr, mangle->src_devaddr, hln); } arpptr += hln; if (mangle->flags & ARPT_MANGLE_SIP) { if (ARPT_MANGLE_ADDR_LEN_MAX < pln || (arpptr + pln > skb_tail_pointer(skb))) return NF_DROP; memcpy(arpptr, &mangle->u_s.src_ip, pln); } arpptr += pln; if (mangle->flags & ARPT_MANGLE_TDEV) { if (ARPT_DEV_ADDR_LEN_MAX < hln || (arpptr + hln > skb_tail_pointer(skb))) return NF_DROP; memcpy(arpptr, mangle->tgt_devaddr, hln); } arpptr += hln; if (mangle->flags & ARPT_MANGLE_TIP) { if (ARPT_MANGLE_ADDR_LEN_MAX < pln || (arpptr + pln > skb_tail_pointer(skb))) return NF_DROP; memcpy(arpptr, &mangle->u_t.tgt_ip, pln); } return mangle->target; } static int checkentry(const struct xt_tgchk_param *par) { const struct arpt_mangle *mangle = par->targinfo; if (mangle->flags & ~ARPT_MANGLE_MASK || !(mangle->flags & ARPT_MANGLE_MASK)) return -EINVAL; if (mangle->target != NF_DROP && mangle->target != NF_ACCEPT && mangle->target != XT_CONTINUE) return -EINVAL; return 0; } static struct xt_target arpt_mangle_reg __read_mostly = { .name = "mangle", .family = NFPROTO_ARP, .target = target, .targetsize = sizeof(struct arpt_mangle), .checkentry = checkentry, .me = THIS_MODULE, }; static int __init arpt_mangle_init(void) { return xt_register_target(&arpt_mangle_reg); } static void __exit arpt_mangle_fini(void) { xt_unregister_target(&arpt_mangle_reg); } module_init(arpt_mangle_init); module_exit(arpt_mangle_fini);
254 149 6 148 149 260 427 427 427 427 433 433 433 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 /* * ip_vs_proto_tcp.c: TCP load balancing support for IPVS * * Authors: Wensong Zhang <wensong@linuxvirtualserver.org> * Julian Anastasov <ja@ssi.bg> * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * * Changes: Hans Schillstrom <hans.schillstrom@ericsson.com> * * Network name space (netns) aware. * Global data moved to netns i.e struct netns_ipvs * tcp_timeouts table has copy per netns in a hash table per * protocol ip_vs_proto_data and is handled by netns */ #define KMSG_COMPONENT "IPVS" #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt #include <linux/kernel.h> #include <linux/ip.h> #include <linux/tcp.h> /* for tcphdr */ #include <net/ip.h> #include <net/tcp.h> /* for csum_tcpudp_magic */ #include <net/ip6_checksum.h> #include <linux/netfilter.h> #include <linux/netfilter_ipv4.h> #include <net/ip_vs.h> static int tcp_conn_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb, struct ip_vs_proto_data *pd, int *verdict, struct ip_vs_conn **cpp, struct ip_vs_iphdr *iph) { struct ip_vs_service *svc; struct tcphdr _tcph, *th; __be16 _ports[2], *ports = NULL; /* In the event of icmp, we're only guaranteed to have the first 8 * bytes of the transport header, so we only check the rest of the * TCP packet for non-ICMP packets */ if (likely(!ip_vs_iph_icmp(iph))) { th = skb_header_pointer(skb, iph->len, sizeof(_tcph), &_tcph); if (th) { if (th->rst || !(sysctl_sloppy_tcp(ipvs) || th->syn)) return 1; ports = &th->source; } } else { ports = skb_header_pointer( skb, iph->len, sizeof(_ports), &_ports); } if (!ports) { *verdict = NF_DROP; return 0; } /* No !th->ack check to allow scheduling on SYN+ACK for Active FTP */ if (likely(!ip_vs_iph_inverse(iph))) svc = ip_vs_service_find(ipvs, af, skb->mark, iph->protocol, &iph->daddr, ports[1]); else svc = ip_vs_service_find(ipvs, af, skb->mark, iph->protocol, &iph->saddr, ports[0]); if (svc) { int ignored; if (ip_vs_todrop(ipvs)) { /* * It seems that we are very loaded. * We have to drop this packet :( */ *verdict = NF_DROP; return 0; } /* * Let the virtual server select a real server for the * incoming connection, and create a connection entry. */ *cpp = ip_vs_schedule(svc, skb, pd, &ignored, iph); if (!*cpp && ignored <= 0) { if (!ignored) *verdict = ip_vs_leave(svc, skb, pd, iph); else *verdict = NF_DROP; return 0; } } /* NF_ACCEPT */ return 1; } static inline void tcp_fast_csum_update(int af, struct tcphdr *tcph, const union nf_inet_addr *oldip, const union nf_inet_addr *newip, __be16 oldport, __be16 newport) { #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) tcph->check = csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6, ip_vs_check_diff2(oldport, newport, ~csum_unfold(tcph->check)))); else #endif tcph->check = csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip, ip_vs_check_diff2(oldport, newport, ~csum_unfold(tcph->check)))); } static inline void tcp_partial_csum_update(int af, struct tcphdr *tcph, const union nf_inet_addr *oldip, const union nf_inet_addr *newip, __be16 oldlen, __be16 newlen) { #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) tcph->check = ~csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6, ip_vs_check_diff2(oldlen, newlen, csum_unfold(tcph->check)))); else #endif tcph->check = ~csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip, ip_vs_check_diff2(oldlen, newlen, csum_unfold(tcph->check)))); } static int tcp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, struct ip_vs_conn *cp, struct ip_vs_iphdr *iph) { struct tcphdr *tcph; unsigned int tcphoff = iph->len; int oldlen; int payload_csum = 0; #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6 && iph->fragoffs) return 1; #endif oldlen = skb->len - tcphoff; /* csum_check requires unshared skb */ if (!skb_make_writable(skb, tcphoff+sizeof(*tcph))) return 0; if (unlikely(cp->app != NULL)) { int ret; /* Some checks before mangling */ if (pp->csum_check && !pp->csum_check(cp->af, skb, pp)) return 0; /* Call application helper if needed */ if (!(ret = ip_vs_app_pkt_out(cp, skb, iph))) return 0; /* ret=2: csum update is needed after payload mangling */ if (ret == 1) oldlen = skb->len - tcphoff; else payload_csum = 1; } tcph = (void *)skb_network_header(skb) + tcphoff; tcph->source = cp->vport; /* Adjust TCP checksums */ if (skb->ip_summed == CHECKSUM_PARTIAL) { tcp_partial_csum_update(cp->af, tcph, &cp->daddr, &cp->vaddr, htons(oldlen), htons(skb->len - tcphoff)); } else if (!payload_csum) { /* Only port and addr are changed, do fast csum update */ tcp_fast_csum_update(cp->af, tcph, &cp->daddr, &cp->vaddr, cp->dport, cp->vport); if (skb->ip_summed == CHECKSUM_COMPLETE) skb->ip_summed = (cp->app && pp->csum_check) ? CHECKSUM_UNNECESSARY : CHECKSUM_NONE; } else { /* full checksum calculation */ tcph->check = 0; skb->csum = skb_checksum(skb, tcphoff, skb->len - tcphoff, 0); #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6) tcph->check = csum_ipv6_magic(&cp->vaddr.in6, &cp->caddr.in6, skb->len - tcphoff, cp->protocol, skb->csum); else #endif tcph->check = csum_tcpudp_magic(cp->vaddr.ip, cp->caddr.ip, skb->len - tcphoff, cp->protocol, skb->csum); skb->ip_summed = CHECKSUM_UNNECESSARY; IP_VS_DBG(11, "O-pkt: %s O-csum=%d (+%zd)\n", pp->name, tcph->check, (char*)&(tcph->check) - (char*)tcph); } return 1; } static int tcp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, struct ip_vs_conn *cp, struct ip_vs_iphdr *iph) { struct tcphdr *tcph; unsigned int tcphoff = iph->len; int oldlen; int payload_csum = 0; #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6 && iph->fragoffs) return 1; #endif oldlen = skb->len - tcphoff; /* csum_check requires unshared skb */ if (!skb_make_writable(skb, tcphoff+sizeof(*tcph))) return 0; if (unlikely(cp->app != NULL)) { int ret; /* Some checks before mangling */ if (pp->csum_check && !pp->csum_check(cp->af, skb, pp)) return 0; /* * Attempt ip_vs_app call. * It will fix ip_vs_conn and iph ack_seq stuff */ if (!(ret = ip_vs_app_pkt_in(cp, skb, iph))) return 0; /* ret=2: csum update is needed after payload mangling */ if (ret == 1) oldlen = skb->len - tcphoff; else payload_csum = 1; } tcph = (void *)skb_network_header(skb) + tcphoff; tcph->dest = cp->dport; /* * Adjust TCP checksums */ if (skb->ip_summed == CHECKSUM_PARTIAL) { tcp_partial_csum_update(cp->af, tcph, &cp->vaddr, &cp->daddr, htons(oldlen), htons(skb->len - tcphoff)); } else if (!payload_csum) { /* Only port and addr are changed, do fast csum update */ tcp_fast_csum_update(cp->af, tcph, &cp->vaddr, &cp->daddr, cp->vport, cp->dport); if (skb->ip_summed == CHECKSUM_COMPLETE) skb->ip_summed = (cp->app && pp->csum_check) ? CHECKSUM_UNNECESSARY : CHECKSUM_NONE; } else { /* full checksum calculation */ tcph->check = 0; skb->csum = skb_checksum(skb, tcphoff, skb->len - tcphoff, 0); #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6) tcph->check = csum_ipv6_magic(&cp->caddr.in6, &cp->daddr.in6, skb->len - tcphoff, cp->protocol, skb->csum); else #endif tcph->check = csum_tcpudp_magic(cp->caddr.ip, cp->daddr.ip, skb->len - tcphoff, cp->protocol, skb->csum); skb->ip_summed = CHECKSUM_UNNECESSARY; } return 1; } static int tcp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp) { unsigned int tcphoff; #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) tcphoff = sizeof(struct ipv6hdr); else #endif tcphoff = ip_hdrlen(skb); switch (skb->ip_summed) { case CHECKSUM_NONE: skb->csum = skb_checksum(skb, tcphoff, skb->len - tcphoff, 0); /* fall through */ case CHECKSUM_COMPLETE: #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { if (csum_ipv6_magic(&ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr, skb->len - tcphoff, ipv6_hdr(skb)->nexthdr, skb->csum)) { IP_VS_DBG_RL_PKT(0, af, pp, skb, 0, "Failed checksum for"); return 0; } } else #endif if (csum_tcpudp_magic(ip_hdr(skb)->saddr, ip_hdr(skb)->daddr, skb->len - tcphoff, ip_hdr(skb)->protocol, skb->csum)) { IP_VS_DBG_RL_PKT(0, af, pp, skb, 0, "Failed checksum for"); return 0; } break; default: /* No need to checksum. */ break; } return 1; } #define TCP_DIR_INPUT 0 #define TCP_DIR_OUTPUT 4 #define TCP_DIR_INPUT_ONLY 8 static const int tcp_state_off[IP_VS_DIR_LAST] = { [IP_VS_DIR_INPUT] = TCP_DIR_INPUT, [IP_VS_DIR_OUTPUT] = TCP_DIR_OUTPUT, [IP_VS_DIR_INPUT_ONLY] = TCP_DIR_INPUT_ONLY, }; /* * Timeout table[state] */ static const int tcp_timeouts[IP_VS_TCP_S_LAST+1] = { [IP_VS_TCP_S_NONE] = 2*HZ, [IP_VS_TCP_S_ESTABLISHED] = 15*60*HZ, [IP_VS_TCP_S_SYN_SENT] = 2*60*HZ, [IP_VS_TCP_S_SYN_RECV] = 1*60*HZ, [IP_VS_TCP_S_FIN_WAIT] = 2*60*HZ, [IP_VS_TCP_S_TIME_WAIT] = 2*60*HZ, [IP_VS_TCP_S_CLOSE] = 10*HZ, [IP_VS_TCP_S_CLOSE_WAIT] = 60*HZ, [IP_VS_TCP_S_LAST_ACK] = 30*HZ, [IP_VS_TCP_S_LISTEN] = 2*60*HZ, [IP_VS_TCP_S_SYNACK] = 120*HZ, [IP_VS_TCP_S_LAST] = 2*HZ, }; static const char *const tcp_state_name_table[IP_VS_TCP_S_LAST+1] = { [IP_VS_TCP_S_NONE] = "NONE", [IP_VS_TCP_S_ESTABLISHED] = "ESTABLISHED", [IP_VS_TCP_S_SYN_SENT] = "SYN_SENT", [IP_VS_TCP_S_SYN_RECV] = "SYN_RECV", [IP_VS_TCP_S_FIN_WAIT] = "FIN_WAIT", [IP_VS_TCP_S_TIME_WAIT] = "TIME_WAIT", [IP_VS_TCP_S_CLOSE] = "CLOSE", [IP_VS_TCP_S_CLOSE_WAIT] = "CLOSE_WAIT", [IP_VS_TCP_S_LAST_ACK] = "LAST_ACK", [IP_VS_TCP_S_LISTEN] = "LISTEN", [IP_VS_TCP_S_SYNACK] = "SYNACK", [IP_VS_TCP_S_LAST] = "BUG!", }; static const bool tcp_state_active_table[IP_VS_TCP_S_LAST] = { [IP_VS_TCP_S_NONE] = false, [IP_VS_TCP_S_ESTABLISHED] = true, [IP_VS_TCP_S_SYN_SENT] = true, [IP_VS_TCP_S_SYN_RECV] = true, [IP_VS_TCP_S_FIN_WAIT] = false, [IP_VS_TCP_S_TIME_WAIT] = false, [IP_VS_TCP_S_CLOSE] = false, [IP_VS_TCP_S_CLOSE_WAIT] = false, [IP_VS_TCP_S_LAST_ACK] = false, [IP_VS_TCP_S_LISTEN] = false, [IP_VS_TCP_S_SYNACK] = true, }; #define sNO IP_VS_TCP_S_NONE #define sES IP_VS_TCP_S_ESTABLISHED #define sSS IP_VS_TCP_S_SYN_SENT #define sSR IP_VS_TCP_S_SYN_RECV #define sFW IP_VS_TCP_S_FIN_WAIT #define sTW IP_VS_TCP_S_TIME_WAIT #define sCL IP_VS_TCP_S_CLOSE #define sCW IP_VS_TCP_S_CLOSE_WAIT #define sLA IP_VS_TCP_S_LAST_ACK #define sLI IP_VS_TCP_S_LISTEN #define sSA IP_VS_TCP_S_SYNACK struct tcp_states_t { int next_state[IP_VS_TCP_S_LAST]; }; static const char * tcp_state_name(int state) { if (state >= IP_VS_TCP_S_LAST) return "ERR!"; return tcp_state_name_table[state] ? tcp_state_name_table[state] : "?"; } static bool tcp_state_active(int state) { if (state >= IP_VS_TCP_S_LAST) return false; return tcp_state_active_table[state]; } static struct tcp_states_t tcp_states[] = { /* INPUT */ /* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */ /*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }}, /*fin*/ {{sCL, sCW, sSS, sTW, sTW, sTW, sCL, sCW, sLA, sLI, sTW }}, /*ack*/ {{sES, sES, sSS, sES, sFW, sTW, sCL, sCW, sCL, sLI, sES }}, /*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sSR }}, /* OUTPUT */ /* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */ /*syn*/ {{sSS, sES, sSS, sSR, sSS, sSS, sSS, sSS, sSS, sLI, sSR }}, /*fin*/ {{sTW, sFW, sSS, sTW, sFW, sTW, sCL, sTW, sLA, sLI, sTW }}, /*ack*/ {{sES, sES, sSS, sES, sFW, sTW, sCL, sCW, sLA, sES, sES }}, /*rst*/ {{sCL, sCL, sSS, sCL, sCL, sTW, sCL, sCL, sCL, sCL, sCL }}, /* INPUT-ONLY */ /* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */ /*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }}, /*fin*/ {{sCL, sFW, sSS, sTW, sFW, sTW, sCL, sCW, sLA, sLI, sTW }}, /*ack*/ {{sES, sES, sSS, sES, sFW, sTW, sCL, sCW, sCL, sLI, sES }}, /*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sCL }}, }; static struct tcp_states_t tcp_states_dos[] = { /* INPUT */ /* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */ /*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSA }}, /*fin*/ {{sCL, sCW, sSS, sTW, sTW, sTW, sCL, sCW, sLA, sLI, sSA }}, /*ack*/ {{sES, sES, sSS, sSR, sFW, sTW, sCL, sCW, sCL, sLI, sSA }}, /*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sCL }}, /* OUTPUT */ /* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */ /*syn*/ {{sSS, sES, sSS, sSA, sSS, sSS, sSS, sSS, sSS, sLI, sSA }}, /*fin*/ {{sTW, sFW, sSS, sTW, sFW, sTW, sCL, sTW, sLA, sLI, sTW }}, /*ack*/ {{sES, sES, sSS, sES, sFW, sTW, sCL, sCW, sLA, sES, sES }}, /*rst*/ {{sCL, sCL, sSS, sCL, sCL, sTW, sCL, sCL, sCL, sCL, sCL }}, /* INPUT-ONLY */ /* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */ /*syn*/ {{sSA, sES, sES, sSR, sSA, sSA, sSA, sSA, sSA, sSA, sSA }}, /*fin*/ {{sCL, sFW, sSS, sTW, sFW, sTW, sCL, sCW, sLA, sLI, sTW }}, /*ack*/ {{sES, sES, sSS, sES, sFW, sTW, sCL, sCW, sCL, sLI, sES }}, /*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sCL }}, }; static void tcp_timeout_change(struct ip_vs_proto_data *pd, int flags) { int on = (flags & 1); /* secure_tcp */ /* ** FIXME: change secure_tcp to independent sysctl var ** or make it per-service or per-app because it is valid ** for most if not for all of the applications. Something ** like "capabilities" (flags) for each object. */ pd->tcp_state_table = (on ? tcp_states_dos : tcp_states); } static inline int tcp_state_idx(struct tcphdr *th) { if (th->rst) return 3; if (th->syn) return 0; if (th->fin) return 1; if (th->ack) return 2; return -1; } static inline void set_tcp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp, int direction, struct tcphdr *th) { int state_idx; int new_state = IP_VS_TCP_S_CLOSE; int state_off = tcp_state_off[direction]; /* * Update state offset to INPUT_ONLY if necessary * or delete NO_OUTPUT flag if output packet detected */ if (cp->flags & IP_VS_CONN_F_NOOUTPUT) { if (state_off == TCP_DIR_OUTPUT) cp->flags &= ~IP_VS_CONN_F_NOOUTPUT; else state_off = TCP_DIR_INPUT_ONLY; } if ((state_idx = tcp_state_idx(th)) < 0) { IP_VS_DBG(8, "tcp_state_idx=%d!!!\n", state_idx); goto tcp_state_out; } new_state = pd->tcp_state_table[state_off+state_idx].next_state[cp->state]; tcp_state_out: if (new_state != cp->state) { struct ip_vs_dest *dest = cp->dest; IP_VS_DBG_BUF(8, "%s %s [%c%c%c%c] %s:%d->" "%s:%d state: %s->%s conn->refcnt:%d\n", pd->pp->name, ((state_off == TCP_DIR_OUTPUT) ? "output " : "input "), th->syn ? 'S' : '.', th->fin ? 'F' : '.', th->ack ? 'A' : '.', th->rst ? 'R' : '.', IP_VS_DBG_ADDR(cp->daf, &cp->daddr), ntohs(cp->dport), IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), tcp_state_name(cp->state), tcp_state_name(new_state), refcount_read(&cp->refcnt)); if (dest) { if (!(cp->flags & IP_VS_CONN_F_INACTIVE) && !tcp_state_active(new_state)) { atomic_dec(&dest->activeconns); atomic_inc(&dest->inactconns); cp->flags |= IP_VS_CONN_F_INACTIVE; } else if ((cp->flags & IP_VS_CONN_F_INACTIVE) && tcp_state_active(new_state)) { atomic_inc(&dest->activeconns); atomic_dec(&dest->inactconns); cp->flags &= ~IP_VS_CONN_F_INACTIVE; } } if (new_state == IP_VS_TCP_S_ESTABLISHED) ip_vs_control_assure_ct(cp); } if (likely(pd)) cp->timeout = pd->timeout_table[cp->state = new_state]; else /* What to do ? */ cp->timeout = tcp_timeouts[cp->state = new_state]; } /* * Handle state transitions */ static void tcp_state_transition(struct ip_vs_conn *cp, int direction, const struct sk_buff *skb, struct ip_vs_proto_data *pd) { struct tcphdr _tcph, *th; #ifdef CONFIG_IP_VS_IPV6 int ihl = cp->af == AF_INET ? ip_hdrlen(skb) : sizeof(struct ipv6hdr); #else int ihl = ip_hdrlen(skb); #endif th = skb_header_pointer(skb, ihl, sizeof(_tcph), &_tcph); if (th == NULL) return; spin_lock_bh(&cp->lock); set_tcp_state(pd, cp, direction, th); spin_unlock_bh(&cp->lock); } static inline __u16 tcp_app_hashkey(__be16 port) { return (((__force u16)port >> TCP_APP_TAB_BITS) ^ (__force u16)port) & TCP_APP_TAB_MASK; } static int tcp_register_app(struct netns_ipvs *ipvs, struct ip_vs_app *inc) { struct ip_vs_app *i; __u16 hash; __be16 port = inc->port; int ret = 0; struct ip_vs_proto_data *pd = ip_vs_proto_data_get(ipvs, IPPROTO_TCP); hash = tcp_app_hashkey(port); list_for_each_entry(i, &ipvs->tcp_apps[hash], p_list) { if (i->port == port) { ret = -EEXIST; goto out; } } list_add_rcu(&inc->p_list, &ipvs->tcp_apps[hash]); atomic_inc(&pd->appcnt); out: return ret; } static void tcp_unregister_app(struct netns_ipvs *ipvs, struct ip_vs_app *inc) { struct ip_vs_proto_data *pd = ip_vs_proto_data_get(ipvs, IPPROTO_TCP); atomic_dec(&pd->appcnt); list_del_rcu(&inc->p_list); } static int tcp_app_conn_bind(struct ip_vs_conn *cp) { struct netns_ipvs *ipvs = cp->ipvs; int hash; struct ip_vs_app *inc; int result = 0; /* Default binding: bind app only for NAT */ if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) return 0; /* Lookup application incarnations and bind the right one */ hash = tcp_app_hashkey(cp->vport); list_for_each_entry_rcu(inc, &ipvs->tcp_apps[hash], p_list) { if (inc->port == cp->vport) { if (unlikely(!ip_vs_app_inc_get(inc))) break; IP_VS_DBG_BUF(9, "%s(): Binding conn %s:%u->" "%s:%u to app %s on port %u\n", __func__, IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport), inc->name, ntohs(inc->port)); cp->app = inc; if (inc->init_conn) result = inc->init_conn(inc, cp); break; } } return result; } /* * Set LISTEN timeout. (ip_vs_conn_put will setup timer) */ void ip_vs_tcp_conn_listen(struct ip_vs_conn *cp) { struct ip_vs_proto_data *pd = ip_vs_proto_data_get(cp->ipvs, IPPROTO_TCP); spin_lock_bh(&cp->lock); cp->state = IP_VS_TCP_S_LISTEN; cp->timeout = (pd ? pd->timeout_table[IP_VS_TCP_S_LISTEN] : tcp_timeouts[IP_VS_TCP_S_LISTEN]); spin_unlock_bh(&cp->lock); } /* --------------------------------------------- * timeouts is netns related now. * --------------------------------------------- */ static int __ip_vs_tcp_init(struct netns_ipvs *ipvs, struct ip_vs_proto_data *pd) { ip_vs_init_hash_table(ipvs->tcp_apps, TCP_APP_TAB_SIZE); pd->timeout_table = ip_vs_create_timeout_table((int *)tcp_timeouts, sizeof(tcp_timeouts)); if (!pd->timeout_table) return -ENOMEM; pd->tcp_state_table = tcp_states; return 0; } static void __ip_vs_tcp_exit(struct netns_ipvs *ipvs, struct ip_vs_proto_data *pd) { kfree(pd->timeout_table); } struct ip_vs_protocol ip_vs_protocol_tcp = { .name = "TCP", .protocol = IPPROTO_TCP, .num_states = IP_VS_TCP_S_LAST, .dont_defrag = 0, .init = NULL, .exit = NULL, .init_netns = __ip_vs_tcp_init, .exit_netns = __ip_vs_tcp_exit, .register_app = tcp_register_app, .unregister_app = tcp_unregister_app, .conn_schedule = tcp_conn_schedule, .conn_in_get = ip_vs_conn_in_get_proto, .conn_out_get = ip_vs_conn_out_get_proto, .snat_handler = tcp_snat_handler, .dnat_handler = tcp_dnat_handler, .csum_check = tcp_csum_check, .state_name = tcp_state_name, .state_transition = tcp_state_transition, .app_conn_bind = tcp_app_conn_bind, .debug_packet = ip_vs_tcpudp_debug_packet, .timeout_change = tcp_timeout_change, };
22 21 20 15 2 19 18 5 5 19 1 1 3 3 3 3 3 1 3 9 7 3 1 9 15 11 9 9 15 3 3 15 15 15 24 24 24 23 23 21 23 21 21 15 15 15 15 15 15 15 15 15 3 1 15 15 11 11 11 11 3 12 15 3 2 2 3 2 1 2 6 5 5 1 2 1 1 5 5 4 3 2 3 2 2 6 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 /* * Copyright (c) 2007, 2017 Oracle and/or its affiliates. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * */ #include <linux/pagemap.h> #include <linux/slab.h> #include <linux/rbtree.h> #include <linux/dma-mapping.h> /* for DMA_*_DEVICE */ #include "rds.h" /* * XXX * - build with sparse * - should we detect duplicate keys on a socket? hmm. * - an rdma is an mlock, apply rlimit? */ /* * get the number of pages by looking at the page indices that the start and * end addresses fall in. * * Returns 0 if the vec is invalid. It is invalid if the number of bytes * causes the address to wrap or overflows an unsigned int. This comes * from being stored in the 'length' member of 'struct scatterlist'. */ static unsigned int rds_pages_in_vec(struct rds_iovec *vec) { if ((vec->addr + vec->bytes <= vec->addr) || (vec->bytes > (u64)UINT_MAX)) return 0; return ((vec->addr + vec->bytes + PAGE_SIZE - 1) >> PAGE_SHIFT) - (vec->addr >> PAGE_SHIFT); } static struct rds_mr *rds_mr_tree_walk(struct rb_root *root, u64 key, struct rds_mr *insert) { struct rb_node **p = &root->rb_node; struct rb_node *parent = NULL; struct rds_mr *mr; while (*p) { parent = *p; mr = rb_entry(parent, struct rds_mr, r_rb_node); if (key < mr->r_key) p = &(*p)->rb_left; else if (key > mr->r_key) p = &(*p)->rb_right; else return mr; } if (insert) { rb_link_node(&insert->r_rb_node, parent, p); rb_insert_color(&insert->r_rb_node, root); refcount_inc(&insert->r_refcount); } return NULL; } /* * Destroy the transport-specific part of a MR. */ static void rds_destroy_mr(struct rds_mr *mr) { struct rds_sock *rs = mr->r_sock; void *trans_private = NULL; unsigned long flags; rdsdebug("RDS: destroy mr key is %x refcnt %u\n", mr->r_key, refcount_read(&mr->r_refcount)); if (test_and_set_bit(RDS_MR_DEAD, &mr->r_state)) return; spin_lock_irqsave(&rs->rs_rdma_lock, flags); if (!RB_EMPTY_NODE(&mr->r_rb_node)) rb_erase(&mr->r_rb_node, &rs->rs_rdma_keys); trans_private = mr->r_trans_private; mr->r_trans_private = NULL; spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); if (trans_private) mr->r_trans->free_mr(trans_private, mr->r_invalidate); } void __rds_put_mr_final(struct rds_mr *mr) { rds_destroy_mr(mr); kfree(mr); } /* * By the time this is called we can't have any more ioctls called on * the socket so we don't need to worry about racing with others. */ void rds_rdma_drop_keys(struct rds_sock *rs) { struct rds_mr *mr; struct rb_node *node; unsigned long flags; /* Release any MRs associated with this socket */ spin_lock_irqsave(&rs->rs_rdma_lock, flags); while ((node = rb_first(&rs->rs_rdma_keys))) { mr = rb_entry(node, struct rds_mr, r_rb_node); if (mr->r_trans == rs->rs_transport) mr->r_invalidate = 0; rb_erase(&mr->r_rb_node, &rs->rs_rdma_keys); RB_CLEAR_NODE(&mr->r_rb_node); spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); rds_destroy_mr(mr); rds_mr_put(mr); spin_lock_irqsave(&rs->rs_rdma_lock, flags); } spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); if (rs->rs_transport && rs->rs_transport->flush_mrs) rs->rs_transport->flush_mrs(); } /* * Helper function to pin user pages. */ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages, struct page **pages, int write) { int ret; ret = get_user_pages_fast(user_addr, nr_pages, write, pages); if (ret >= 0 && ret < nr_pages) { while (ret--) put_page(pages[ret]); ret = -EFAULT; } return ret; } static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args, u64 *cookie_ret, struct rds_mr **mr_ret, struct rds_conn_path *cp) { struct rds_mr *mr = NULL, *found; unsigned int nr_pages; struct page **pages = NULL; struct scatterlist *sg; void *trans_private; unsigned long flags; rds_rdma_cookie_t cookie; unsigned int nents; long i; int ret; if (ipv6_addr_any(&rs->rs_bound_addr) || !rs->rs_transport) { ret = -ENOTCONN; /* XXX not a great errno */ goto out; } if (!rs->rs_transport->get_mr) { ret = -EOPNOTSUPP; goto out; } nr_pages = rds_pages_in_vec(&args->vec); if (nr_pages == 0) { ret = -EINVAL; goto out; } /* Restrict the size of mr irrespective of underlying transport * To account for unaligned mr regions, subtract one from nr_pages */ if ((nr_pages - 1) > (RDS_MAX_MSG_SIZE >> PAGE_SHIFT)) { ret = -EMSGSIZE; goto out; } rdsdebug("RDS: get_mr addr %llx len %llu nr_pages %u\n", args->vec.addr, args->vec.bytes, nr_pages); /* XXX clamp nr_pages to limit the size of this alloc? */ pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); if (!pages) { ret = -ENOMEM; goto out; } mr = kzalloc(sizeof(struct rds_mr), GFP_KERNEL); if (!mr) { ret = -ENOMEM; goto out; } refcount_set(&mr->r_refcount, 1); RB_CLEAR_NODE(&mr->r_rb_node); mr->r_trans = rs->rs_transport; mr->r_sock = rs; if (args->flags & RDS_RDMA_USE_ONCE) mr->r_use_once = 1; if (args->flags & RDS_RDMA_INVALIDATE) mr->r_invalidate = 1; if (args->flags & RDS_RDMA_READWRITE) mr->r_write = 1; /* * Pin the pages that make up the user buffer and transfer the page * pointers to the mr's sg array. We check to see if we've mapped * the whole region after transferring the partial page references * to the sg array so that we can have one page ref cleanup path. * * For now we have no flag that tells us whether the mapping is * r/o or r/w. We need to assume r/w, or we'll do a lot of RDMA to * the zero page. */ ret = rds_pin_pages(args->vec.addr, nr_pages, pages, 1); if (ret < 0) goto out; nents = ret; sg = kcalloc(nents, sizeof(*sg), GFP_KERNEL); if (!sg) { ret = -ENOMEM; goto out; } WARN_ON(!nents); sg_init_table(sg, nents); /* Stick all pages into the scatterlist */ for (i = 0 ; i < nents; i++) sg_set_page(&sg[i], pages[i], PAGE_SIZE, 0); rdsdebug("RDS: trans_private nents is %u\n", nents); /* Obtain a transport specific MR. If this succeeds, the * s/g list is now owned by the MR. * Note that dma_map() implies that pending writes are * flushed to RAM, so no dma_sync is needed here. */ trans_private = rs->rs_transport->get_mr(sg, nents, rs, &mr->r_key, cp ? cp->cp_conn : NULL); if (IS_ERR(trans_private)) { for (i = 0 ; i < nents; i++) put_page(sg_page(&sg[i])); kfree(sg); ret = PTR_ERR(trans_private); goto out; } mr->r_trans_private = trans_private; rdsdebug("RDS: get_mr put_user key is %x cookie_addr %p\n", mr->r_key, (void *)(unsigned long) args->cookie_addr); /* The user may pass us an unaligned address, but we can only * map page aligned regions. So we keep the offset, and build * a 64bit cookie containing <R_Key, offset> and pass that * around. */ cookie = rds_rdma_make_cookie(mr->r_key, args->vec.addr & ~PAGE_MASK); if (cookie_ret) *cookie_ret = cookie; if (args->cookie_addr && put_user(cookie, (u64 __user *)(unsigned long) args->cookie_addr)) { ret = -EFAULT; goto out; } /* Inserting the new MR into the rbtree bumps its * reference count. */ spin_lock_irqsave(&rs->rs_rdma_lock, flags); found = rds_mr_tree_walk(&rs->rs_rdma_keys, mr->r_key, mr); spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); BUG_ON(found && found != mr); rdsdebug("RDS: get_mr key is %x\n", mr->r_key); if (mr_ret) { refcount_inc(&mr->r_refcount); *mr_ret = mr; } ret = 0; out: kfree(pages); if (mr) rds_mr_put(mr); return ret; } int rds_get_mr(struct rds_sock *rs, char __user *optval, int optlen) { struct rds_get_mr_args args; if (optlen != sizeof(struct rds_get_mr_args)) return -EINVAL; if (copy_from_user(&args, (struct rds_get_mr_args __user *)optval, sizeof(struct rds_get_mr_args))) return -EFAULT; return __rds_rdma_map(rs, &args, NULL, NULL, NULL); } int rds_get_mr_for_dest(struct rds_sock *rs, char __user *optval, int optlen) { struct rds_get_mr_for_dest_args args; struct rds_get_mr_args new_args; if (optlen != sizeof(struct rds_get_mr_for_dest_args)) return -EINVAL; if (copy_from_user(&args, (struct rds_get_mr_for_dest_args __user *)optval, sizeof(struct rds_get_mr_for_dest_args))) return -EFAULT; /* * Initially, just behave like get_mr(). * TODO: Implement get_mr as wrapper around this * and deprecate it. */ new_args.vec = args.vec; new_args.cookie_addr = args.cookie_addr; new_args.flags = args.flags; return __rds_rdma_map(rs, &new_args, NULL, NULL, NULL); } /* * Free the MR indicated by the given R_Key */ int rds_free_mr(struct rds_sock *rs, char __user *optval, int optlen) { struct rds_free_mr_args args; struct rds_mr *mr; unsigned long flags; if (optlen != sizeof(struct rds_free_mr_args)) return -EINVAL; if (copy_from_user(&args, (struct rds_free_mr_args __user *)optval, sizeof(struct rds_free_mr_args))) return -EFAULT; /* Special case - a null cookie means flush all unused MRs */ if (args.cookie == 0) { if (!rs->rs_transport || !rs->rs_transport->flush_mrs) return -EINVAL; rs->rs_transport->flush_mrs(); return 0; } /* Look up the MR given its R_key and remove it from the rbtree * so nobody else finds it. * This should also prevent races with rds_rdma_unuse. */ spin_lock_irqsave(&rs->rs_rdma_lock, flags); mr = rds_mr_tree_walk(&rs->rs_rdma_keys, rds_rdma_cookie_key(args.cookie), NULL); if (mr) { rb_erase(&mr->r_rb_node, &rs->rs_rdma_keys); RB_CLEAR_NODE(&mr->r_rb_node); if (args.flags & RDS_RDMA_INVALIDATE) mr->r_invalidate = 1; } spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); if (!mr) return -EINVAL; /* * call rds_destroy_mr() ourselves so that we're sure it's done by the time * we return. If we let rds_mr_put() do it it might not happen until * someone else drops their ref. */ rds_destroy_mr(mr); rds_mr_put(mr); return 0; } /* * This is called when we receive an extension header that * tells us this MR was used. It allows us to implement * use_once semantics */ void rds_rdma_unuse(struct rds_sock *rs, u32 r_key, int force) { struct rds_mr *mr; unsigned long flags; int zot_me = 0; spin_lock_irqsave(&rs->rs_rdma_lock, flags); mr = rds_mr_tree_walk(&rs->rs_rdma_keys, r_key, NULL); if (!mr) { pr_debug("rds: trying to unuse MR with unknown r_key %u!\n", r_key); spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); return; } if (mr->r_use_once || force) { rb_erase(&mr->r_rb_node, &rs->rs_rdma_keys); RB_CLEAR_NODE(&mr->r_rb_node); zot_me = 1; } spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); /* May have to issue a dma_sync on this memory region. * Note we could avoid this if the operation was a RDMA READ, * but at this point we can't tell. */ if (mr->r_trans->sync_mr) mr->r_trans->sync_mr(mr->r_trans_private, DMA_FROM_DEVICE); /* If the MR was marked as invalidate, this will * trigger an async flush. */ if (zot_me) { rds_destroy_mr(mr); rds_mr_put(mr); } } void rds_rdma_free_op(struct rm_rdma_op *ro) { unsigned int i; for (i = 0; i < ro->op_nents; i++) { struct page *page = sg_page(&ro->op_sg[i]); /* Mark page dirty if it was possibly modified, which * is the case for a RDMA_READ which copies from remote * to local memory */ if (!ro->op_write) { WARN_ON(!page->mapping && irqs_disabled()); set_page_dirty(page); } put_page(page); } kfree(ro->op_notifier); ro->op_notifier = NULL; ro->op_active = 0; } void rds_atomic_free_op(struct rm_atomic_op *ao) { struct page *page = sg_page(ao->op_sg); /* Mark page dirty if it was possibly modified, which * is the case for a RDMA_READ which copies from remote * to local memory */ set_page_dirty(page); put_page(page); kfree(ao->op_notifier); ao->op_notifier = NULL; ao->op_active = 0; } /* * Count the number of pages needed to describe an incoming iovec array. */ static int rds_rdma_pages(struct rds_iovec iov[], int nr_iovecs) { int tot_pages = 0; unsigned int nr_pages; unsigned int i; /* figure out the number of pages in the vector */ for (i = 0; i < nr_iovecs; i++) { nr_pages = rds_pages_in_vec(&iov[i]); if (nr_pages == 0) return -EINVAL; tot_pages += nr_pages; /* * nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1, * so tot_pages cannot overflow without first going negative. */ if (tot_pages < 0) return -EINVAL; } return tot_pages; } int rds_rdma_extra_size(struct rds_rdma_args *args, struct rds_iov_vector *iov) { struct rds_iovec *vec; struct rds_iovec __user *local_vec; int tot_pages = 0; unsigned int nr_pages; unsigned int i; local_vec = (struct rds_iovec __user *)(unsigned long) args->local_vec_addr; if (args->nr_local == 0) return -EINVAL; if (args->nr_local > UIO_MAXIOV) return -EMSGSIZE; iov->iov = kcalloc(args->nr_local, sizeof(struct rds_iovec), GFP_KERNEL); if (!iov->iov) return -ENOMEM; vec = &iov->iov[0]; if (copy_from_user(vec, local_vec, args->nr_local * sizeof(struct rds_iovec))) return -EFAULT; iov->len = args->nr_local; /* figure out the number of pages in the vector */ for (i = 0; i < args->nr_local; i++, vec++) { nr_pages = rds_pages_in_vec(vec); if (nr_pages == 0) return -EINVAL; tot_pages += nr_pages; /* * nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1, * so tot_pages cannot overflow without first going negative. */ if (tot_pages < 0) return -EINVAL; } return tot_pages * sizeof(struct scatterlist); } /* * The application asks for a RDMA transfer. * Extract all arguments and set up the rdma_op */ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg, struct rds_iov_vector *vec) { struct rds_rdma_args *args; struct rm_rdma_op *op = &rm->rdma; int nr_pages; unsigned int nr_bytes; struct page **pages = NULL; struct rds_iovec *iovs; unsigned int i, j; int ret = 0; if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct rds_rdma_args)) || rm->rdma.op_active) return -EINVAL; args = CMSG_DATA(cmsg); if (ipv6_addr_any(&rs->rs_bound_addr)) { ret = -ENOTCONN; /* XXX not a great errno */ goto out_ret; } if (args->nr_local > UIO_MAXIOV) { ret = -EMSGSIZE; goto out_ret; } if (vec->len != args->nr_local) { ret = -EINVAL; goto out_ret; } iovs = vec->iov; nr_pages = rds_rdma_pages(iovs, args->nr_local); if (nr_pages < 0) { ret = -EINVAL; goto out_ret; } pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); if (!pages) { ret = -ENOMEM; goto out_ret; } op->op_write = !!(args->flags & RDS_RDMA_READWRITE); op->op_fence = !!(args->flags & RDS_RDMA_FENCE); op->op_notify = !!(args->flags & RDS_RDMA_NOTIFY_ME); op->op_silent = !!(args->flags & RDS_RDMA_SILENT); op->op_active = 1; op->op_recverr = rs->rs_recverr; WARN_ON(!nr_pages); op->op_sg = rds_message_alloc_sgs(rm, nr_pages); if (!op->op_sg) { ret = -ENOMEM; goto out_pages; } if (op->op_notify || op->op_recverr) { /* We allocate an uninitialized notifier here, because * we don't want to do that in the completion handler. We * would have to use GFP_ATOMIC there, and don't want to deal * with failed allocations. */ op->op_notifier = kmalloc(sizeof(struct rds_notifier), GFP_KERNEL); if (!op->op_notifier) { ret = -ENOMEM; goto out_pages; } op->op_notifier->n_user_token = args->user_token; op->op_notifier->n_status = RDS_RDMA_SUCCESS; /* Enable rmda notification on data operation for composite * rds messages and make sure notification is enabled only * for the data operation which follows it so that application * gets notified only after full message gets delivered. */ if (rm->data.op_sg) { rm->rdma.op_notify = 0; rm->data.op_notify = !!(args->flags & RDS_RDMA_NOTIFY_ME); } } /* The cookie contains the R_Key of the remote memory region, and * optionally an offset into it. This is how we implement RDMA into * unaligned memory. * When setting up the RDMA, we need to add that offset to the * destination address (which is really an offset into the MR) * FIXME: We may want to move this into ib_rdma.c */ op->op_rkey = rds_rdma_cookie_key(args->cookie); op->op_remote_addr = args->remote_vec.addr + rds_rdma_cookie_offset(args->cookie); nr_bytes = 0; rdsdebug("RDS: rdma prepare nr_local %llu rva %llx rkey %x\n", (unsigned long long)args->nr_local, (unsigned long long)args->remote_vec.addr, op->op_rkey); for (i = 0; i < args->nr_local; i++) { struct rds_iovec *iov = &iovs[i]; /* don't need to check, rds_rdma_pages() verified nr will be +nonzero */ unsigned int nr = rds_pages_in_vec(iov); rs->rs_user_addr = iov->addr; rs->rs_user_bytes = iov->bytes; /* If it's a WRITE operation, we want to pin the pages for reading. * If it's a READ operation, we need to pin the pages for writing. */ ret = rds_pin_pages(iov->addr, nr, pages, !op->op_write); if (ret < 0) goto out_pages; else ret = 0; rdsdebug("RDS: nr_bytes %u nr %u iov->bytes %llu iov->addr %llx\n", nr_bytes, nr, iov->bytes, iov->addr); nr_bytes += iov->bytes; for (j = 0; j < nr; j++) { unsigned int offset = iov->addr & ~PAGE_MASK; struct scatterlist *sg; sg = &op->op_sg[op->op_nents + j]; sg_set_page(sg, pages[j], min_t(unsigned int, iov->bytes, PAGE_SIZE - offset), offset); rdsdebug("RDS: sg->offset %x sg->len %x iov->addr %llx iov->bytes %llu\n", sg->offset, sg->length, iov->addr, iov->bytes); iov->addr += sg->length; iov->bytes -= sg->length; } op->op_nents += nr; } if (nr_bytes > args->remote_vec.bytes) { rdsdebug("RDS nr_bytes %u remote_bytes %u do not match\n", nr_bytes, (unsigned int) args->remote_vec.bytes); ret = -EINVAL; goto out_pages; } op->op_bytes = nr_bytes; out_pages: kfree(pages); out_ret: if (ret) rds_rdma_free_op(op); else rds_stats_inc(s_send_rdma); return ret; } /* * The application wants us to pass an RDMA destination (aka MR) * to the remote */ int rds_cmsg_rdma_dest(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg) { unsigned long flags; struct rds_mr *mr; u32 r_key; int err = 0; if (cmsg->cmsg_len < CMSG_LEN(sizeof(rds_rdma_cookie_t)) || rm->m_rdma_cookie != 0) return -EINVAL; memcpy(&rm->m_rdma_cookie, CMSG_DATA(cmsg), sizeof(rm->m_rdma_cookie)); /* We are reusing a previously mapped MR here. Most likely, the * application has written to the buffer, so we need to explicitly * flush those writes to RAM. Otherwise the HCA may not see them * when doing a DMA from that buffer. */ r_key = rds_rdma_cookie_key(rm->m_rdma_cookie); spin_lock_irqsave(&rs->rs_rdma_lock, flags); mr = rds_mr_tree_walk(&rs->rs_rdma_keys, r_key, NULL); if (!mr) err = -EINVAL; /* invalid r_key */ else refcount_inc(&mr->r_refcount); spin_unlock_irqrestore(&rs->rs_rdma_lock, flags); if (mr) { mr->r_trans->sync_mr(mr->r_trans_private, DMA_TO_DEVICE); rm->rdma.op_rdma_mr = mr; } return err; } /* * The application passes us an address range it wants to enable RDMA * to/from. We map the area, and save the <R_Key,offset> pair * in rm->m_rdma_cookie. This causes it to be sent along to the peer * in an extension header. */ int rds_cmsg_rdma_map(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg) { if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct rds_get_mr_args)) || rm->m_rdma_cookie != 0) return -EINVAL; return __rds_rdma_map(rs, CMSG_DATA(cmsg), &rm->m_rdma_cookie, &rm->rdma.op_rdma_mr, rm->m_conn_path); } /* * Fill in rds_message for an atomic request. */ int rds_cmsg_atomic(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg) { struct page *page = NULL; struct rds_atomic_args *args; int ret = 0; if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct rds_atomic_args)) || rm->atomic.op_active) return -EINVAL; args = CMSG_DATA(cmsg); /* Nonmasked & masked cmsg ops converted to masked hw ops */ switch (cmsg->cmsg_type) { case RDS_CMSG_ATOMIC_FADD: rm->atomic.op_type = RDS_ATOMIC_TYPE_FADD; rm->atomic.op_m_fadd.add = args->fadd.add; rm->atomic.op_m_fadd.nocarry_mask = 0; break; case RDS_CMSG_MASKED_ATOMIC_FADD: rm->atomic.op_type = RDS_ATOMIC_TYPE_FADD; rm->atomic.op_m_fadd.add = args->m_fadd.add; rm->atomic.op_m_fadd.nocarry_mask = args->m_fadd.nocarry_mask; break; case RDS_CMSG_ATOMIC_CSWP: rm->atomic.op_type = RDS_ATOMIC_TYPE_CSWP; rm->atomic.op_m_cswp.compare = args->cswp.compare; rm->atomic.op_m_cswp.swap = args->cswp.swap; rm->atomic.op_m_cswp.compare_mask = ~0; rm->atomic.op_m_cswp.swap_mask = ~0; break; case RDS_CMSG_MASKED_ATOMIC_CSWP: rm->atomic.op_type = RDS_ATOMIC_TYPE_CSWP; rm->atomic.op_m_cswp.compare = args->m_cswp.compare; rm->atomic.op_m_cswp.swap = args->m_cswp.swap; rm->atomic.op_m_cswp.compare_mask = args->m_cswp.compare_mask; rm->atomic.op_m_cswp.swap_mask = args->m_cswp.swap_mask; break; default: BUG(); /* should never happen */ } rm->atomic.op_notify = !!(args->flags & RDS_RDMA_NOTIFY_ME); rm->atomic.op_silent = !!(args->flags & RDS_RDMA_SILENT); rm->atomic.op_active = 1; rm->atomic.op_recverr = rs->rs_recverr; rm->atomic.op_sg = rds_message_alloc_sgs(rm, 1); if (!rm->atomic.op_sg) { ret = -ENOMEM; goto err; } /* verify 8 byte-aligned */ if (args->local_addr & 0x7) { ret = -EFAULT; goto err; } ret = rds_pin_pages(args->local_addr, 1, &page, 1); if (ret != 1) goto err; ret = 0; sg_set_page(rm->atomic.op_sg, page, 8, offset_in_page(args->local_addr)); if (rm->atomic.op_notify || rm->atomic.op_recverr) { /* We allocate an uninitialized notifier here, because * we don't want to do that in the completion handler. We * would have to use GFP_ATOMIC there, and don't want to deal * with failed allocations. */ rm->atomic.op_notifier = kmalloc(sizeof(*rm->atomic.op_notifier), GFP_KERNEL); if (!rm->atomic.op_notifier) { ret = -ENOMEM; goto err; } rm->atomic.op_notifier->n_user_token = args->user_token; rm->atomic.op_notifier->n_status = RDS_RDMA_SUCCESS; } rm->atomic.op_rkey = rds_rdma_cookie_key(args->cookie); rm->atomic.op_remote_addr = args->remote_addr + rds_rdma_cookie_offset(args->cookie); return ret; err: if (page) put_page(page); rm->atomic.op_active = 0; kfree(rm->atomic.op_notifier); return ret; }
1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 /* * * Bluetooth HCI UART driver for Intel/AG6xx devices * * Copyright (C) 2016 Intel Corporation * * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * */ #include <linux/kernel.h> #include <linux/errno.h> #include <linux/skbuff.h> #include <linux/firmware.h> #include <linux/module.h> #include <linux/tty.h> #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> #include "hci_uart.h" #include "btintel.h" struct ag6xx_data { struct sk_buff *rx_skb; struct sk_buff_head txq; }; struct pbn_entry { __le32 addr; __le32 plen; __u8 data[0]; } __packed; static int ag6xx_open(struct hci_uart *hu) { struct ag6xx_data *ag6xx; BT_DBG("hu %p", hu); ag6xx = kzalloc(sizeof(*ag6xx), GFP_KERNEL); if (!ag6xx) return -ENOMEM; skb_queue_head_init(&ag6xx->txq); hu->priv = ag6xx; return 0; } static int ag6xx_close(struct hci_uart *hu) { struct ag6xx_data *ag6xx = hu->priv; BT_DBG("hu %p", hu); skb_queue_purge(&ag6xx->txq); kfree_skb(ag6xx->rx_skb); kfree(ag6xx); hu->priv = NULL; return 0; } static int ag6xx_flush(struct hci_uart *hu) { struct ag6xx_data *ag6xx = hu->priv; BT_DBG("hu %p", hu); skb_queue_purge(&ag6xx->txq); return 0; } static struct sk_buff *ag6xx_dequeue(struct hci_uart *hu) { struct ag6xx_data *ag6xx = hu->priv; struct sk_buff *skb; skb = skb_dequeue(&ag6xx->txq); if (!skb) return skb; /* Prepend skb with frame type */ memcpy(skb_push(skb, 1), &bt_cb(skb)->pkt_type, 1); return skb; } static int ag6xx_enqueue(struct hci_uart *hu, struct sk_buff *skb) { struct ag6xx_data *ag6xx = hu->priv; skb_queue_tail(&ag6xx->txq, skb); return 0; } static const struct h4_recv_pkt ag6xx_recv_pkts[] = { { H4_RECV_ACL, .recv = hci_recv_frame }, { H4_RECV_SCO, .recv = hci_recv_frame }, { H4_RECV_EVENT, .recv = hci_recv_frame }, }; static int ag6xx_recv(struct hci_uart *hu, const void *data, int count) { struct ag6xx_data *ag6xx = hu->priv; if (!test_bit(HCI_UART_REGISTERED, &hu->flags)) return -EUNATCH; ag6xx->rx_skb = h4_recv_buf(hu->hdev, ag6xx->rx_skb, data, count, ag6xx_recv_pkts, ARRAY_SIZE(ag6xx_recv_pkts)); if (IS_ERR(ag6xx->rx_skb)) { int err = PTR_ERR(ag6xx->rx_skb); bt_dev_err(hu->hdev, "Frame reassembly failed (%d)", err); ag6xx->rx_skb = NULL; return err; } return count; } static int intel_mem_write(struct hci_dev *hdev, u32 addr, u32 plen, const void *data) { /* Can write a maximum of 247 bytes per HCI command. * HCI cmd Header (3), Intel mem write header (6), data (247). */ while (plen > 0) { struct sk_buff *skb; u8 cmd_param[253], fragment_len = (plen > 247) ? 247 : plen; __le32 leaddr = cpu_to_le32(addr); memcpy(cmd_param, &leaddr, 4); cmd_param[4] = 0; cmd_param[5] = fragment_len; memcpy(cmd_param + 6, data, fragment_len); skb = __hci_cmd_sync(hdev, 0xfc8e, fragment_len + 6, cmd_param, HCI_INIT_TIMEOUT); if (IS_ERR(skb)) return PTR_ERR(skb); kfree_skb(skb); plen -= fragment_len; data += fragment_len; addr += fragment_len; } return 0; } static int ag6xx_setup(struct hci_uart *hu) { struct hci_dev *hdev = hu->hdev; struct sk_buff *skb; struct intel_version ver; const struct firmware *fw; const u8 *fw_ptr; char fwname[64]; bool patched = false; int err; hu->hdev->set_diag = btintel_set_diag; hu->hdev->set_bdaddr = btintel_set_bdaddr; err = btintel_enter_mfg(hdev); if (err) return err; err = btintel_read_version(hdev, &ver); if (err) return err; btintel_version_info(hdev, &ver); /* The hardware platform number has a fixed value of 0x37 and * for now only accept this single value. */ if (ver.hw_platform != 0x37) { bt_dev_err(hdev, "Unsupported Intel hardware platform: 0x%X", ver.hw_platform); return -EINVAL; } /* Only the hardware variant iBT 2.1 (AG6XX) is supported by this * firmware setup method. */ if (ver.hw_variant != 0x0a) { bt_dev_err(hdev, "Unsupported Intel hardware variant: 0x%x", ver.hw_variant); return -EINVAL; } snprintf(fwname, sizeof(fwname), "intel/ibt-hw-%x.%x.bddata", ver.hw_platform, ver.hw_variant); err = request_firmware(&fw, fwname, &hdev->dev); if (err < 0) { bt_dev_err(hdev, "Failed to open Intel bddata file: %s (%d)", fwname, err); goto patch; } fw_ptr = fw->data; bt_dev_info(hdev, "Applying bddata (%s)", fwname); skb = __hci_cmd_sync_ev(hdev, 0xfc2f, fw->size, fw->data, HCI_EV_CMD_STATUS, HCI_CMD_TIMEOUT); if (IS_ERR(skb)) { bt_dev_err(hdev, "Applying bddata failed (%ld)", PTR_ERR(skb)); release_firmware(fw); return PTR_ERR(skb); } kfree_skb(skb); release_firmware(fw); patch: /* If there is no applied patch, fw_patch_num is always 0x00. In other * cases, current firmware is already patched. No need to patch it. */ if (ver.fw_patch_num) { bt_dev_info(hdev, "Device is already patched. patch num: %02x", ver.fw_patch_num); patched = true; goto complete; } snprintf(fwname, sizeof(fwname), "intel/ibt-hw-%x.%x.%x-fw-%x.%x.%x.%x.%x.pbn", ver.hw_platform, ver.hw_variant, ver.hw_revision, ver.fw_variant, ver.fw_revision, ver.fw_build_num, ver.fw_build_ww, ver.fw_build_yy); err = request_firmware(&fw, fwname, &hdev->dev); if (err < 0) { bt_dev_err(hdev, "Failed to open Intel patch file: %s(%d)", fwname, err); goto complete; } fw_ptr = fw->data; bt_dev_info(hdev, "Patching firmware file (%s)", fwname); /* PBN patch file contains a list of binary patches to be applied on top * of the embedded firmware. Each patch entry header contains the target * address and patch size. * * Patch entry: * | addr(le) | patch_len(le) | patch_data | * | 4 Bytes | 4 Bytes | n Bytes | * * PBN file is terminated by a patch entry whose address is 0xffffffff. */ while (fw->size > fw_ptr - fw->data) { struct pbn_entry *pbn = (void *)fw_ptr; u32 addr, plen; if (pbn->addr == 0xffffffff) { bt_dev_info(hdev, "Patching complete"); patched = true; break; } addr = le32_to_cpu(pbn->addr); plen = le32_to_cpu(pbn->plen); if (fw->data + fw->size <= pbn->data + plen) { bt_dev_info(hdev, "Invalid patch len (%d)", plen); break; } bt_dev_info(hdev, "Patching %td/%zu", (fw_ptr - fw->data), fw->size); err = intel_mem_write(hdev, addr, plen, pbn->data); if (err) { bt_dev_err(hdev, "Patching failed"); break; } fw_ptr = pbn->data + plen; } release_firmware(fw); complete: /* Exit manufacturing mode and reset */ err = btintel_exit_mfg(hdev, true, patched); if (err) return err; /* Set the event mask for Intel specific vendor events. This enables * a few extra events that are useful during general operation. */ btintel_set_event_mask_mfg(hdev, false); btintel_check_bdaddr(hdev); return 0; } static const struct hci_uart_proto ag6xx_proto = { .id = HCI_UART_AG6XX, .name = "AG6XX", .manufacturer = 2, .open = ag6xx_open, .close = ag6xx_close, .flush = ag6xx_flush, .setup = ag6xx_setup, .recv = ag6xx_recv, .enqueue = ag6xx_enqueue, .dequeue = ag6xx_dequeue, }; int __init ag6xx_init(void) { return hci_uart_register_proto(&ag6xx_proto); } int __exit ag6xx_deinit(void) { return hci_uart_unregister_proto(&ag6xx_proto); }
8 8 8 7 5 7 1 6 6 6 6 6 6 1 5 5 4 1 3 2 1 2 2 1 1 12 13 3 10 2 8 1 1 1 1 13 14 2 2 1 2 1 2 3 2 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 /* * Copyright (c) 2016 Intel Corporation * * Permission to use, copy, modify, distribute, and sell this software and its * documentation for any purpose is hereby granted without fee, provided that * the above copyright notice appear in all copies and that both that copyright * notice and this permission notice appear in supporting documentation, and * that the name of the copyright holders not be used in advertising or * publicity pertaining to distribution of the software without specific, * written prior permission. The copyright holders make no representations * about the suitability of this software for any purpose. It is provided "as * is" without express or implied warranty. * * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, * INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO * EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR * CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE * OF THIS SOFTWARE. */ #include <linux/export.h> #include <drm/drmP.h> #include <drm/drm_auth.h> #include <drm/drm_framebuffer.h> #include <drm/drm_atomic.h> #include <drm/drm_print.h> #include "drm_internal.h" #include "drm_crtc_internal.h" /** * DOC: overview * * Frame buffers are abstract memory objects that provide a source of pixels to * scanout to a CRTC. Applications explicitly request the creation of frame * buffers through the DRM_IOCTL_MODE_ADDFB(2) ioctls and receive an opaque * handle that can be passed to the KMS CRTC control, plane configuration and * page flip functions. * * Frame buffers rely on the underlying memory manager for allocating backing * storage. When creating a frame buffer applications pass a memory handle * (or a list of memory handles for multi-planar formats) through the * &struct drm_mode_fb_cmd2 argument. For drivers using GEM as their userspace * buffer management interface this would be a GEM handle. Drivers are however * free to use their own backing storage object handles, e.g. vmwgfx directly * exposes special TTM handles to userspace and so expects TTM handles in the * create ioctl and not GEM handles. * * Framebuffers are tracked with &struct drm_framebuffer. They are published * using drm_framebuffer_init() - after calling that function userspace can use * and access the framebuffer object. The helper function * drm_helper_mode_fill_fb_struct() can be used to pre-fill the required * metadata fields. * * The lifetime of a drm framebuffer is controlled with a reference count, * drivers can grab additional references with drm_framebuffer_get() and drop * them again with drm_framebuffer_put(). For driver-private framebuffers for * which the last reference is never dropped (e.g. for the fbdev framebuffer * when the struct &struct drm_framebuffer is embedded into the fbdev helper * struct) drivers can manually clean up a framebuffer at module unload time * with drm_framebuffer_unregister_private(). But doing this is not * recommended, and it's better to have a normal free-standing &struct * drm_framebuffer. */ int drm_framebuffer_check_src_coords(uint32_t src_x, uint32_t src_y, uint32_t src_w, uint32_t src_h, const struct drm_framebuffer *fb) { unsigned int fb_width, fb_height; fb_width = fb->width << 16; fb_height = fb->height << 16; /* Make sure source coordinates are inside the fb. */ if (src_w > fb_width || src_x > fb_width - src_w || src_h > fb_height || src_y > fb_height - src_h) { DRM_DEBUG_KMS("Invalid source coordinates " "%u.%06ux%u.%06u+%u.%06u+%u.%06u (fb %ux%u)\n", src_w >> 16, ((src_w & 0xffff) * 15625) >> 10, src_h >> 16, ((src_h & 0xffff) * 15625) >> 10, src_x >> 16, ((src_x & 0xffff) * 15625) >> 10, src_y >> 16, ((src_y & 0xffff) * 15625) >> 10, fb->width, fb->height); return -ENOSPC; } return 0; } /** * drm_mode_addfb - add an FB to the graphics configuration * @dev: drm device for the ioctl * @or: pointer to request structure * @file_priv: drm file * * Add a new FB to the specified CRTC, given a user request. This is the * original addfb ioctl which only supported RGB formats. * * Called by the user via ioctl, or by an in-kernel client. * * Returns: * Zero on success, negative errno on failure. */ int drm_mode_addfb(struct drm_device *dev, struct drm_mode_fb_cmd *or, struct drm_file *file_priv) { struct drm_mode_fb_cmd2 r = {}; int ret; /* convert to new format and call new ioctl */ r.fb_id = or->fb_id; r.width = or->width; r.height = or->height; r.pitches[0] = or->pitch; r.pixel_format = drm_mode_legacy_fb_format(or->bpp, or->depth); r.handles[0] = or->handle; if (r.pixel_format == DRM_FORMAT_XRGB2101010 && dev->driver->driver_features & DRIVER_PREFER_XBGR_30BPP) r.pixel_format = DRM_FORMAT_XBGR2101010; ret = drm_mode_addfb2(dev, &r, file_priv); if (ret) return ret; or->fb_id = r.fb_id; return 0; } int drm_mode_addfb_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { return drm_mode_addfb(dev, data, file_priv); } static int fb_plane_width(int width, const struct drm_format_info *format, int plane) { if (plane == 0) return width; return DIV_ROUND_UP(width, format->hsub); } static int fb_plane_height(int height, const struct drm_format_info *format, int plane) { if (plane == 0) return height; return DIV_ROUND_UP(height, format->vsub); } static int framebuffer_check(struct drm_device *dev, const struct drm_mode_fb_cmd2 *r) { const struct drm_format_info *info; int i; /* check if the format is supported at all */ info = __drm_format_info(r->pixel_format & ~DRM_FORMAT_BIG_ENDIAN); if (!info) { struct drm_format_name_buf format_name; DRM_DEBUG_KMS("bad framebuffer format %s\n", drm_get_format_name(r->pixel_format, &format_name)); return -EINVAL; } /* now let the driver pick its own format info */ info = drm_get_format_info(dev, r); if (r->width == 0) { DRM_DEBUG_KMS("bad framebuffer width %u\n", r->width); return -EINVAL; } if (r->height == 0) { DRM_DEBUG_KMS("bad framebuffer height %u\n", r->height); return -EINVAL; } for (i = 0; i < info->num_planes; i++) { unsigned int width = fb_plane_width(r->width, info, i); unsigned int height = fb_plane_height(r->height, info, i); unsigned int cpp = info->cpp[i]; if (!r->handles[i]) { DRM_DEBUG_KMS("no buffer object handle for plane %d\n", i); return -EINVAL; } if ((uint64_t) width * cpp > UINT_MAX) return -ERANGE; if ((uint64_t) height * r->pitches[i] + r->offsets[i] > UINT_MAX) return -ERANGE; if (r->pitches[i] < width * cpp) { DRM_DEBUG_KMS("bad pitch %u for plane %d\n", r->pitches[i], i); return -EINVAL; } if (r->modifier[i] && !(r->flags & DRM_MODE_FB_MODIFIERS)) { DRM_DEBUG_KMS("bad fb modifier %llu for plane %d\n", r->modifier[i], i); return -EINVAL; } if (r->flags & DRM_MODE_FB_MODIFIERS && r->modifier[i] != r->modifier[0]) { DRM_DEBUG_KMS("bad fb modifier %llu for plane %d\n", r->modifier[i], i); return -EINVAL; } /* modifier specific checks: */ switch (r->modifier[i]) { case DRM_FORMAT_MOD_SAMSUNG_64_32_TILE: /* NOTE: the pitch restriction may be lifted later if it turns * out that no hw has this restriction: */ if (r->pixel_format != DRM_FORMAT_NV12 || width % 128 || height % 32 || r->pitches[i] % 128) { DRM_DEBUG_KMS("bad modifier data for plane %d\n", i); return -EINVAL; } break; default: break; } } for (i = info->num_planes; i < 4; i++) { if (r->modifier[i]) { DRM_DEBUG_KMS("non-zero modifier for unused plane %d\n", i); return -EINVAL; } /* Pre-FB_MODIFIERS userspace didn't clear the structs properly. */ if (!(r->flags & DRM_MODE_FB_MODIFIERS)) continue; if (r->handles[i]) { DRM_DEBUG_KMS("buffer object handle for unused plane %d\n", i); return -EINVAL; } if (r->pitches[i]) { DRM_DEBUG_KMS("non-zero pitch for unused plane %d\n", i); return -EINVAL; } if (r->offsets[i]) { DRM_DEBUG_KMS("non-zero offset for unused plane %d\n", i); return -EINVAL; } } return 0; } struct drm_framebuffer * drm_internal_framebuffer_create(struct drm_device *dev, const struct drm_mode_fb_cmd2 *r, struct drm_file *file_priv) { struct drm_mode_config *config = &dev->mode_config; struct drm_framebuffer *fb; int ret; if (r->flags & ~(DRM_MODE_FB_INTERLACED | DRM_MODE_FB_MODIFIERS)) { DRM_DEBUG_KMS("bad framebuffer flags 0x%08x\n", r->flags); return ERR_PTR(-EINVAL); } if ((config->min_width > r->width) || (r->width > config->max_width)) { DRM_DEBUG_KMS("bad framebuffer width %d, should be >= %d && <= %d\n", r->width, config->min_width, config->max_width); return ERR_PTR(-EINVAL); } if ((config->min_height > r->height) || (r->height > config->max_height)) { DRM_DEBUG_KMS("bad framebuffer height %d, should be >= %d && <= %d\n", r->height, config->min_height, config->max_height); return ERR_PTR(-EINVAL); } if (r->flags & DRM_MODE_FB_MODIFIERS && !dev->mode_config.allow_fb_modifiers) { DRM_DEBUG_KMS("driver does not support fb modifiers\n"); return ERR_PTR(-EINVAL); } ret = framebuffer_check(dev, r); if (ret) return ERR_PTR(ret); fb = dev->mode_config.funcs->fb_create(dev, file_priv, r); if (IS_ERR(fb)) { DRM_DEBUG_KMS("could not create framebuffer\n"); return fb; } return fb; } /** * drm_mode_addfb2 - add an FB to the graphics configuration * @dev: drm device for the ioctl * @data: data pointer for the ioctl * @file_priv: drm file for the ioctl call * * Add a new FB to the specified CRTC, given a user request with format. This is * the 2nd version of the addfb ioctl, which supports multi-planar framebuffers * and uses fourcc codes as pixel format specifiers. * * Called by the user via ioctl. * * Returns: * Zero on success, negative errno on failure. */ int drm_mode_addfb2(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_mode_fb_cmd2 *r = data; struct drm_framebuffer *fb; if (!drm_core_check_feature(dev, DRIVER_MODESET)) return -EINVAL; fb = drm_internal_framebuffer_create(dev, r, file_priv); if (IS_ERR(fb)) return PTR_ERR(fb); DRM_DEBUG_KMS("[FB:%d]\n", fb->base.id); r->fb_id = fb->base.id; /* Transfer ownership to the filp for reaping on close */ mutex_lock(&file_priv->fbs_lock); list_add(&fb->filp_head, &file_priv->fbs); mutex_unlock(&file_priv->fbs_lock); return 0; } struct drm_mode_rmfb_work { struct work_struct work; struct list_head fbs; }; static void drm_mode_rmfb_work_fn(struct work_struct *w) { struct drm_mode_rmfb_work *arg = container_of(w, typeof(*arg), work); while (!list_empty(&arg->fbs)) { struct drm_framebuffer *fb = list_first_entry(&arg->fbs, typeof(*fb), filp_head); list_del_init(&fb->filp_head); drm_framebuffer_remove(fb); } } /** * drm_mode_rmfb - remove an FB from the configuration * @dev: drm device * @fb_id: id of framebuffer to remove * @file_priv: drm file * * Remove the specified FB. * * Called by the user via ioctl, or by an in-kernel client. * * Returns: * Zero on success, negative errno on failure. */ int drm_mode_rmfb(struct drm_device *dev, u32 fb_id, struct drm_file *file_priv) { struct drm_framebuffer *fb = NULL; struct drm_framebuffer *fbl = NULL; int found = 0; if (!drm_core_check_feature(dev, DRIVER_MODESET)) return -EINVAL; fb = drm_framebuffer_lookup(dev, file_priv, fb_id); if (!fb) return -ENOENT; mutex_lock(&file_priv->fbs_lock); list_for_each_entry(fbl, &file_priv->fbs, filp_head) if (fb == fbl) found = 1; if (!found) { mutex_unlock(&file_priv->fbs_lock); goto fail_unref; } list_del_init(&fb->filp_head); mutex_unlock(&file_priv->fbs_lock); /* drop the reference we picked up in framebuffer lookup */ drm_framebuffer_put(fb); /* * we now own the reference that was stored in the fbs list * * drm_framebuffer_remove may fail with -EINTR on pending signals, * so run this in a separate stack as there's no way to correctly * handle this after the fb is already removed from the lookup table. */ if (drm_framebuffer_read_refcount(fb) > 1) { struct drm_mode_rmfb_work arg; INIT_WORK_ONSTACK(&arg.work, drm_mode_rmfb_work_fn); INIT_LIST_HEAD(&arg.fbs); list_add_tail(&fb->filp_head, &arg.fbs); schedule_work(&arg.work); flush_work(&arg.work); destroy_work_on_stack(&arg.work); } else drm_framebuffer_put(fb); return 0; fail_unref: drm_framebuffer_put(fb); return -ENOENT; } int drm_mode_rmfb_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { uint32_t *fb_id = data; return drm_mode_rmfb(dev, *fb_id, file_priv); } /** * drm_mode_getfb - get FB info * @dev: drm device for the ioctl * @data: data pointer for the ioctl * @file_priv: drm file for the ioctl call * * Lookup the FB given its ID and return info about it. * * Called by the user via ioctl. * * Returns: * Zero on success, negative errno on failure. */ int drm_mode_getfb(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_mode_fb_cmd *r = data; struct drm_framebuffer *fb; int ret; if (!drm_core_check_feature(dev, DRIVER_MODESET)) return -EINVAL; fb = drm_framebuffer_lookup(dev, file_priv, r->fb_id); if (!fb) return -ENOENT; /* Multi-planar framebuffers need getfb2. */ if (fb->format->num_planes > 1) { ret = -EINVAL; goto out; } if (!fb->funcs->create_handle) { ret = -ENODEV; goto out; } r->height = fb->height; r->width = fb->width; r->depth = fb->format->depth; r->bpp = fb->format->cpp[0] * 8; r->pitch = fb->pitches[0]; /* GET_FB() is an unprivileged ioctl so we must not return a * buffer-handle to non-master processes! For * backwards-compatibility reasons, we cannot make GET_FB() privileged, * so just return an invalid handle for non-masters. */ if (!drm_is_current_master(file_priv) && !capable(CAP_SYS_ADMIN)) { r->handle = 0; ret = 0; goto out; } ret = fb->funcs->create_handle(fb, file_priv, &r->handle); out: drm_framebuffer_put(fb); return ret; } /** * drm_mode_dirtyfb_ioctl - flush frontbuffer rendering on an FB * @dev: drm device for the ioctl * @data: data pointer for the ioctl * @file_priv: drm file for the ioctl call * * Lookup the FB and flush out the damaged area supplied by userspace as a clip * rectangle list. Generic userspace which does frontbuffer rendering must call * this ioctl to flush out the changes on manual-update display outputs, e.g. * usb display-link, mipi manual update panels or edp panel self refresh modes. * * Modesetting drivers which always update the frontbuffer do not need to * implement the corresponding &drm_framebuffer_funcs.dirty callback. * * Called by the user via ioctl. * * Returns: * Zero on success, negative errno on failure. */ int drm_mode_dirtyfb_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_clip_rect __user *clips_ptr; struct drm_clip_rect *clips = NULL; struct drm_mode_fb_dirty_cmd *r = data; struct drm_framebuffer *fb; unsigned flags; int num_clips; int ret; if (!drm_core_check_feature(dev, DRIVER_MODESET)) return -EINVAL; fb = drm_framebuffer_lookup(dev, file_priv, r->fb_id); if (!fb) return -ENOENT; num_clips = r->num_clips; clips_ptr = (struct drm_clip_rect __user *)(unsigned long)r->clips_ptr; if (!num_clips != !clips_ptr) { ret = -EINVAL; goto out_err1; } flags = DRM_MODE_FB_DIRTY_FLAGS & r->flags; /* If userspace annotates copy, clips must come in pairs */ if (flags & DRM_MODE_FB_DIRTY_ANNOTATE_COPY && (num_clips % 2)) { ret = -EINVAL; goto out_err1; } if (num_clips && clips_ptr) { if (num_clips < 0 || num_clips > DRM_MODE_FB_DIRTY_MAX_CLIPS) { ret = -EINVAL; goto out_err1; } clips = kcalloc(num_clips, sizeof(*clips), GFP_KERNEL); if (!clips) { ret = -ENOMEM; goto out_err1; } ret = copy_from_user(clips, clips_ptr, num_clips * sizeof(*clips)); if (ret) { ret = -EFAULT; goto out_err2; } } if (fb->funcs->dirty) { ret = fb->funcs->dirty(fb, file_priv, flags, r->color, clips, num_clips); } else { ret = -ENOSYS; } out_err2: kfree(clips); out_err1: drm_framebuffer_put(fb); return ret; } /** * drm_fb_release - remove and free the FBs on this file * @priv: drm file for the ioctl * * Destroy all the FBs associated with @filp. * * Called by the user via ioctl. * * Returns: * Zero on success, negative errno on failure. */ void drm_fb_release(struct drm_file *priv) { struct drm_framebuffer *fb, *tfb; struct drm_mode_rmfb_work arg; INIT_LIST_HEAD(&arg.fbs); /* * When the file gets released that means no one else can access the fb * list any more, so no need to grab fpriv->fbs_lock. And we need to * avoid upsetting lockdep since the universal cursor code adds a * framebuffer while holding mutex locks. * * Note that a real deadlock between fpriv->fbs_lock and the modeset * locks is impossible here since no one else but this function can get * at it any more. */ list_for_each_entry_safe(fb, tfb, &priv->fbs, filp_head) { if (drm_framebuffer_read_refcount(fb) > 1) { list_move_tail(&fb->filp_head, &arg.fbs); } else { list_del_init(&fb->filp_head); /* This drops the fpriv->fbs reference. */ drm_framebuffer_put(fb); } } if (!list_empty(&arg.fbs)) { INIT_WORK_ONSTACK(&arg.work, drm_mode_rmfb_work_fn); schedule_work(&arg.work); flush_work(&arg.work); destroy_work_on_stack(&arg.work); } } void drm_framebuffer_free(struct kref *kref) { struct drm_framebuffer *fb = container_of(kref, struct drm_framebuffer, base.refcount); struct drm_device *dev = fb->dev; /* * The lookup idr holds a weak reference, which has not necessarily been * removed at this point. Check for that. */ drm_mode_object_unregister(dev, &fb->base); fb->funcs->destroy(fb); } /** * drm_framebuffer_init - initialize a framebuffer * @dev: DRM device * @fb: framebuffer to be initialized * @funcs: ... with these functions * * Allocates an ID for the framebuffer's parent mode object, sets its mode * functions & device file and adds it to the master fd list. * * IMPORTANT: * This functions publishes the fb and makes it available for concurrent access * by other users. Which means by this point the fb _must_ be fully set up - * since all the fb attributes are invariant over its lifetime, no further * locking but only correct reference counting is required. * * Returns: * Zero on success, error code on failure. */ int drm_framebuffer_init(struct drm_device *dev, struct drm_framebuffer *fb, const struct drm_framebuffer_funcs *funcs) { int ret; if (WARN_ON_ONCE(fb->dev != dev || !fb->format)) return -EINVAL; INIT_LIST_HEAD(&fb->filp_head); fb->funcs = funcs; strcpy(fb->comm, current->comm); ret = __drm_mode_object_add(dev, &fb->base, DRM_MODE_OBJECT_FB, false, drm_framebuffer_free); if (ret) goto out; mutex_lock(&dev->mode_config.fb_lock); dev->mode_config.num_fb++; list_add(&fb->head, &dev->mode_config.fb_list); mutex_unlock(&dev->mode_config.fb_lock); drm_mode_object_register(dev, &fb->base); out: return ret; } EXPORT_SYMBOL(drm_framebuffer_init); /** * drm_framebuffer_lookup - look up a drm framebuffer and grab a reference * @dev: drm device * @file_priv: drm file to check for lease against. * @id: id of the fb object * * If successful, this grabs an additional reference to the framebuffer - * callers need to make sure to eventually unreference the returned framebuffer * again, using drm_framebuffer_put(). */ struct drm_framebuffer *drm_framebuffer_lookup(struct drm_device *dev, struct drm_file *file_priv, uint32_t id) { struct drm_mode_object *obj; struct drm_framebuffer *fb = NULL; obj = __drm_mode_object_find(dev, file_priv, id, DRM_MODE_OBJECT_FB); if (obj) fb = obj_to_fb(obj); return fb; } EXPORT_SYMBOL(drm_framebuffer_lookup); /** * drm_framebuffer_unregister_private - unregister a private fb from the lookup idr * @fb: fb to unregister * * Drivers need to call this when cleaning up driver-private framebuffers, e.g. * those used for fbdev. Note that the caller must hold a reference of it's own, * i.e. the object may not be destroyed through this call (since it'll lead to a * locking inversion). * * NOTE: This function is deprecated. For driver-private framebuffers it is not * recommended to embed a framebuffer struct info fbdev struct, instead, a * framebuffer pointer is preferred and drm_framebuffer_put() should be called * when the framebuffer is to be cleaned up. */ void drm_framebuffer_unregister_private(struct drm_framebuffer *fb) { struct drm_device *dev; if (!fb) return; dev = fb->dev; /* Mark fb as reaped and drop idr ref. */ drm_mode_object_unregister(dev, &fb->base); } EXPORT_SYMBOL(drm_framebuffer_unregister_private); /** * drm_framebuffer_cleanup - remove a framebuffer object * @fb: framebuffer to remove * * Cleanup framebuffer. This function is intended to be used from the drivers * &drm_framebuffer_funcs.destroy callback. It can also be used to clean up * driver private framebuffers embedded into a larger structure. * * Note that this function does not remove the fb from active usage - if it is * still used anywhere, hilarity can ensue since userspace could call getfb on * the id and get back -EINVAL. Obviously no concern at driver unload time. * * Also, the framebuffer will not be removed from the lookup idr - for * user-created framebuffers this will happen in in the rmfb ioctl. For * driver-private objects (e.g. for fbdev) drivers need to explicitly call * drm_framebuffer_unregister_private. */ void drm_framebuffer_cleanup(struct drm_framebuffer *fb) { struct drm_device *dev = fb->dev; mutex_lock(&dev->mode_config.fb_lock); list_del(&fb->head); dev->mode_config.num_fb--; mutex_unlock(&dev->mode_config.fb_lock); } EXPORT_SYMBOL(drm_framebuffer_cleanup); static int atomic_remove_fb(struct drm_framebuffer *fb) { struct drm_modeset_acquire_ctx ctx; struct drm_device *dev = fb->dev; struct drm_atomic_state *state; struct drm_plane *plane; struct drm_connector *conn __maybe_unused; struct drm_connector_state *conn_state; int i, ret; unsigned plane_mask; bool disable_crtcs = false; retry_disable: drm_modeset_acquire_init(&ctx, 0); state = drm_atomic_state_alloc(dev); if (!state) { ret = -ENOMEM; goto out; } state->acquire_ctx = &ctx; retry: plane_mask = 0; ret = drm_modeset_lock_all_ctx(dev, &ctx); if (ret) goto unlock; drm_for_each_plane(plane, dev) { struct drm_plane_state *plane_state; if (plane->state->fb != fb) continue; plane_state = drm_atomic_get_plane_state(state, plane); if (IS_ERR(plane_state)) { ret = PTR_ERR(plane_state); goto unlock; } if (disable_crtcs && plane_state->crtc->primary == plane) { struct drm_crtc_state *crtc_state; crtc_state = drm_atomic_get_existing_crtc_state(state, plane_state->crtc); ret = drm_atomic_add_affected_connectors(state, plane_state->crtc); if (ret) goto unlock; crtc_state->active = false; ret = drm_atomic_set_mode_for_crtc(crtc_state, NULL); if (ret) goto unlock; } drm_atomic_set_fb_for_plane(plane_state, NULL); ret = drm_atomic_set_crtc_for_plane(plane_state, NULL); if (ret) goto unlock; plane_mask |= drm_plane_mask(plane); } /* This list is only filled when disable_crtcs is set. */ for_each_new_connector_in_state(state, conn, conn_state, i) { ret = drm_atomic_set_crtc_for_connector(conn_state, NULL); if (ret) goto unlock; } if (plane_mask) ret = drm_atomic_commit(state); unlock: if (ret == -EDEADLK) { drm_atomic_state_clear(state); drm_modeset_backoff(&ctx); goto retry; } drm_atomic_state_put(state); out: drm_modeset_drop_locks(&ctx); drm_modeset_acquire_fini(&ctx); if (ret == -EINVAL && !disable_crtcs) { disable_crtcs = true; goto retry_disable; } return ret; } static void legacy_remove_fb(struct drm_framebuffer *fb) { struct drm_device *dev = fb->dev; struct drm_crtc *crtc; struct drm_plane *plane; drm_modeset_lock_all(dev); /* remove from any CRTC */ drm_for_each_crtc(crtc, dev) { if (crtc->primary->fb == fb) { /* should turn off the crtc */ if (drm_crtc_force_disable(crtc)) DRM_ERROR("failed to reset crtc %p when fb was deleted\n", crtc); } } drm_for_each_plane(plane, dev) { if (plane->fb == fb) drm_plane_force_disable(plane); } drm_modeset_unlock_all(dev); } /** * drm_framebuffer_remove - remove and unreference a framebuffer object * @fb: framebuffer to remove * * Scans all the CRTCs and planes in @dev's mode_config. If they're * using @fb, removes it, setting it to NULL. Then drops the reference to the * passed-in framebuffer. Might take the modeset locks. * * Note that this function optimizes the cleanup away if the caller holds the * last reference to the framebuffer. It is also guaranteed to not take the * modeset locks in this case. */ void drm_framebuffer_remove(struct drm_framebuffer *fb) { struct drm_device *dev; if (!fb) return; dev = fb->dev; WARN_ON(!list_empty(&fb->filp_head)); /* * drm ABI mandates that we remove any deleted framebuffers from active * useage. But since most sane clients only remove framebuffers they no * longer need, try to optimize this away. * * Since we're holding a reference ourselves, observing a refcount of 1 * means that we're the last holder and can skip it. Also, the refcount * can never increase from 1 again, so we don't need any barriers or * locks. * * Note that userspace could try to race with use and instate a new * usage _after_ we've cleared all current ones. End result will be an * in-use fb with fb-id == 0. Userspace is allowed to shoot its own foot * in this manner. */ if (drm_framebuffer_read_refcount(fb) > 1) { if (drm_drv_uses_atomic_modeset(dev)) { int ret = atomic_remove_fb(fb); WARN(ret, "atomic remove_fb failed with %i\n", ret); } else legacy_remove_fb(fb); } drm_framebuffer_put(fb); } EXPORT_SYMBOL(drm_framebuffer_remove); /** * drm_framebuffer_plane_width - width of the plane given the first plane * @width: width of the first plane * @fb: the framebuffer * @plane: plane index * * Returns: * The width of @plane, given that the width of the first plane is @width. */ int drm_framebuffer_plane_width(int width, const struct drm_framebuffer *fb, int plane) { if (plane >= fb->format->num_planes) return 0; return fb_plane_width(width, fb->format, plane); } EXPORT_SYMBOL(drm_framebuffer_plane_width); /** * drm_framebuffer_plane_height - height of the plane given the first plane * @height: height of the first plane * @fb: the framebuffer * @plane: plane index * * Returns: * The height of @plane, given that the height of the first plane is @height. */ int drm_framebuffer_plane_height(int height, const struct drm_framebuffer *fb, int plane) { if (plane >= fb->format->num_planes) return 0; return fb_plane_height(height, fb->format, plane); } EXPORT_SYMBOL(drm_framebuffer_plane_height); void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent, const struct drm_framebuffer *fb) { struct drm_format_name_buf format_name; unsigned int i; drm_printf_indent(p, indent, "allocated by = %s\n", fb->comm); drm_printf_indent(p, indent, "refcount=%u\n", drm_framebuffer_read_refcount(fb)); drm_printf_indent(p, indent, "format=%s\n", drm_get_format_name(fb->format->format, &format_name)); drm_printf_indent(p, indent, "modifier=0x%llx\n", fb->modifier); drm_printf_indent(p, indent, "size=%ux%u\n", fb->width, fb->height); drm_printf_indent(p, indent, "layers:\n"); for (i = 0; i < fb->format->num_planes; i++) { drm_printf_indent(p, indent + 1, "size[%u]=%dx%d\n", i, drm_framebuffer_plane_width(fb->width, fb, i), drm_framebuffer_plane_height(fb->height, fb, i)); drm_printf_indent(p, indent + 1, "pitch[%u]=%u\n", i, fb->pitches[i]); drm_printf_indent(p, indent + 1, "offset[%u]=%u\n", i, fb->offsets[i]); drm_printf_indent(p, indent + 1, "obj[%u]:%s\n", i, fb->obj[i] ? "" : "(null)"); if (fb->obj[i]) drm_gem_print_info(p, indent + 2, fb->obj[i]); } } #ifdef CONFIG_DEBUG_FS static int drm_framebuffer_info(struct seq_file *m, void *data) { struct drm_info_node *node = m->private; struct drm_device *dev = node->minor->dev; struct drm_printer p = drm_seq_file_printer(m); struct drm_framebuffer *fb; mutex_lock(&dev->mode_config.fb_lock); drm_for_each_fb(fb, dev) { drm_printf(&p, "framebuffer[%u]:\n", fb->base.id); drm_framebuffer_print_info(&p, 1, fb); } mutex_unlock(&dev->mode_config.fb_lock); return 0; } static const struct drm_info_list drm_framebuffer_debugfs_list[] = { { "framebuffer", drm_framebuffer_info, 0 }, }; int drm_framebuffer_debugfs_init(struct drm_minor *minor) { return drm_debugfs_create_files(drm_framebuffer_debugfs_list, ARRAY_SIZE(drm_framebuffer_debugfs_list), minor->debugfs_root, minor); } #endif
8 8 25 25 9 9 9 9 9 9 9 9 9 4 2 4 4 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 /* * Copyright 2011 Red Hat, Inc. * Copyright © 2014 The Chromium OS Authors * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software") * to deal in the software without restriction, including without limitation * on the rights to use, copy, modify, merge, publish, distribute, sub * license, and/or sell copies of the Software, and to permit persons to whom * them Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice (including the next * paragraph) shall be included in all copies or substantial portions of the * Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTIBILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL * THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER * IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * Authors: * Adam Jackson <ajax@redhat.com> * Ben Widawsky <ben@bwidawsk.net> */ /** * This is vgem, a (non-hardware-backed) GEM service. This is used by Mesa's * software renderer and the X server for efficient buffer sharing. */ #include <linux/module.h> #include <linux/ramfs.h> #include <linux/shmem_fs.h> #include <linux/dma-buf.h> #include "vgem_drv.h" #define DRIVER_NAME "vgem" #define DRIVER_DESC "Virtual GEM provider" #define DRIVER_DATE "20120112" #define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 static struct vgem_device { struct drm_device drm; struct platform_device *platform; } *vgem_device; static void vgem_gem_free_object(struct drm_gem_object *obj) { struct drm_vgem_gem_object *vgem_obj = to_vgem_bo(obj); kvfree(vgem_obj->pages); mutex_destroy(&vgem_obj->pages_lock); if (obj->import_attach) drm_prime_gem_destroy(obj, vgem_obj->table); drm_gem_object_release(obj); kfree(vgem_obj); } static vm_fault_t vgem_gem_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct drm_vgem_gem_object *obj = vma->vm_private_data; /* We don't use vmf->pgoff since that has the fake offset */ unsigned long vaddr = vmf->address; vm_fault_t ret = VM_FAULT_SIGBUS; loff_t num_pages; pgoff_t page_offset; page_offset = (vaddr - vma->vm_start) >> PAGE_SHIFT; num_pages = DIV_ROUND_UP(obj->base.size, PAGE_SIZE); if (page_offset >= num_pages) return VM_FAULT_SIGBUS; mutex_lock(&obj->pages_lock); if (obj->pages) { get_page(obj->pages[page_offset]); vmf->page = obj->pages[page_offset]; ret = 0; } mutex_unlock(&obj->pages_lock); if (ret) { struct page *page; page = shmem_read_mapping_page( file_inode(obj->base.filp)->i_mapping, page_offset); if (!IS_ERR(page)) { vmf->page = page; ret = 0; } else switch (PTR_ERR(page)) { case -ENOSPC: case -ENOMEM: ret = VM_FAULT_OOM; break; case -EBUSY: ret = VM_FAULT_RETRY; break; case -EFAULT: case -EINVAL: ret = VM_FAULT_SIGBUS; break; default: WARN_ON(PTR_ERR(page)); ret = VM_FAULT_SIGBUS; break; } } return ret; } static const struct vm_operations_struct vgem_gem_vm_ops = { .fault = vgem_gem_fault, .open = drm_gem_vm_open, .close = drm_gem_vm_close, }; static int vgem_open(struct drm_device *dev, struct drm_file *file) { struct vgem_file *vfile; int ret; vfile = kzalloc(sizeof(*vfile), GFP_KERNEL); if (!vfile) return -ENOMEM; file->driver_priv = vfile; ret = vgem_fence_open(vfile); if (ret) { kfree(vfile); return ret; } return 0; } static void vgem_postclose(struct drm_device *dev, struct drm_file *file) { struct vgem_file *vfile = file->driver_priv; vgem_fence_close(vfile); kfree(vfile); } static struct drm_vgem_gem_object *__vgem_gem_create(struct drm_device *dev, unsigned long size) { struct drm_vgem_gem_object *obj; int ret; obj = kzalloc(sizeof(*obj), GFP_KERNEL); if (!obj) return ERR_PTR(-ENOMEM); ret = drm_gem_object_init(dev, &obj->base, roundup(size, PAGE_SIZE)); if (ret) { kfree(obj); return ERR_PTR(ret); } mutex_init(&obj->pages_lock); return obj; } static void __vgem_gem_destroy(struct drm_vgem_gem_object *obj) { drm_gem_object_release(&obj->base); kfree(obj); } static struct drm_gem_object *vgem_gem_create(struct drm_device *dev, struct drm_file *file, unsigned int *handle, unsigned long size) { struct drm_vgem_gem_object *obj; int ret; obj = __vgem_gem_create(dev, size); if (IS_ERR(obj)) return ERR_CAST(obj); ret = drm_gem_handle_create(file, &obj->base, handle); if (ret) { drm_gem_object_put_unlocked(&obj->base); return ERR_PTR(ret); } return &obj->base; } static int vgem_gem_dumb_create(struct drm_file *file, struct drm_device *dev, struct drm_mode_create_dumb *args) { struct drm_gem_object *gem_object; u64 pitch, size; pitch = args->width * DIV_ROUND_UP(args->bpp, 8); size = args->height * pitch; if (size == 0) return -EINVAL; gem_object = vgem_gem_create(dev, file, &args->handle, size); if (IS_ERR(gem_object)) return PTR_ERR(gem_object); args->size = gem_object->size; args->pitch = pitch; drm_gem_object_put_unlocked(gem_object); DRM_DEBUG_DRIVER("Created object of size %llu\n", args->size); return 0; } static struct drm_ioctl_desc vgem_ioctls[] = { DRM_IOCTL_DEF_DRV(VGEM_FENCE_ATTACH, vgem_fence_attach_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(VGEM_FENCE_SIGNAL, vgem_fence_signal_ioctl, DRM_AUTH|DRM_RENDER_ALLOW), }; static int vgem_mmap(struct file *filp, struct vm_area_struct *vma) { unsigned long flags = vma->vm_flags; int ret; ret = drm_gem_mmap(filp, vma); if (ret) return ret; /* Keep the WC mmaping set by drm_gem_mmap() but our pages * are ordinary and not special. */ vma->vm_flags = flags | VM_DONTEXPAND | VM_DONTDUMP; return 0; } static const struct file_operations vgem_driver_fops = { .owner = THIS_MODULE, .open = drm_open, .mmap = vgem_mmap, .poll = drm_poll, .read = drm_read, .unlocked_ioctl = drm_ioctl, .compat_ioctl = drm_compat_ioctl, .release = drm_release, }; static struct page **vgem_pin_pages(struct drm_vgem_gem_object *bo) { mutex_lock(&bo->pages_lock); if (bo->pages_pin_count++ == 0) { struct page **pages; pages = drm_gem_get_pages(&bo->base); if (IS_ERR(pages)) { bo->pages_pin_count--; mutex_unlock(&bo->pages_lock); return pages; } bo->pages = pages; } mutex_unlock(&bo->pages_lock); return bo->pages; } static void vgem_unpin_pages(struct drm_vgem_gem_object *bo) { mutex_lock(&bo->pages_lock); if (--bo->pages_pin_count == 0) { drm_gem_put_pages(&bo->base, bo->pages, true, true); bo->pages = NULL; } mutex_unlock(&bo->pages_lock); } static int vgem_prime_pin(struct drm_gem_object *obj) { struct drm_vgem_gem_object *bo = to_vgem_bo(obj); long n_pages = obj->size >> PAGE_SHIFT; struct page **pages; pages = vgem_pin_pages(bo); if (IS_ERR(pages)) return PTR_ERR(pages); /* Flush the object from the CPU cache so that importers can rely * on coherent indirect access via the exported dma-address. */ drm_clflush_pages(pages, n_pages); return 0; } static void vgem_prime_unpin(struct drm_gem_object *obj) { struct drm_vgem_gem_object *bo = to_vgem_bo(obj); vgem_unpin_pages(bo); } static struct sg_table *vgem_prime_get_sg_table(struct drm_gem_object *obj) { struct drm_vgem_gem_object *bo = to_vgem_bo(obj); return drm_prime_pages_to_sg(bo->pages, bo->base.size >> PAGE_SHIFT); } static struct drm_gem_object* vgem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf) { struct vgem_device *vgem = container_of(dev, typeof(*vgem), drm); return drm_gem_prime_import_dev(dev, dma_buf, &vgem->platform->dev); } static struct drm_gem_object *vgem_prime_import_sg_table(struct drm_device *dev, struct dma_buf_attachment *attach, struct sg_table *sg) { struct drm_vgem_gem_object *obj; int npages; obj = __vgem_gem_create(dev, attach->dmabuf->size); if (IS_ERR(obj)) return ERR_CAST(obj); npages = PAGE_ALIGN(attach->dmabuf->size) / PAGE_SIZE; obj->table = sg; obj->pages = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL); if (!obj->pages) { __vgem_gem_destroy(obj); return ERR_PTR(-ENOMEM); } obj->pages_pin_count++; /* perma-pinned */ drm_prime_sg_to_page_addr_arrays(obj->table, obj->pages, NULL, npages); return &obj->base; } static void *vgem_prime_vmap(struct drm_gem_object *obj) { struct drm_vgem_gem_object *bo = to_vgem_bo(obj); long n_pages = obj->size >> PAGE_SHIFT; struct page **pages; pages = vgem_pin_pages(bo); if (IS_ERR(pages)) return NULL; return vmap(pages, n_pages, 0, pgprot_writecombine(PAGE_KERNEL)); } static void vgem_prime_vunmap(struct drm_gem_object *obj, void *vaddr) { struct drm_vgem_gem_object *bo = to_vgem_bo(obj); vunmap(vaddr); vgem_unpin_pages(bo); } static int vgem_prime_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma) { int ret; if (obj->size < vma->vm_end - vma->vm_start) return -EINVAL; if (!obj->filp) return -ENODEV; ret = call_mmap(obj->filp, vma); if (ret) return ret; fput(vma->vm_file); vma->vm_file = get_file(obj->filp); vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags)); return 0; } static void vgem_release(struct drm_device *dev) { struct vgem_device *vgem = container_of(dev, typeof(*vgem), drm); platform_device_unregister(vgem->platform); drm_dev_fini(&vgem->drm); kfree(vgem); } static struct drm_driver vgem_driver = { .driver_features = DRIVER_GEM | DRIVER_PRIME, .release = vgem_release, .open = vgem_open, .postclose = vgem_postclose, .gem_free_object_unlocked = vgem_gem_free_object, .gem_vm_ops = &vgem_gem_vm_ops, .ioctls = vgem_ioctls, .num_ioctls = ARRAY_SIZE(vgem_ioctls), .fops = &vgem_driver_fops, .dumb_create = vgem_gem_dumb_create, .prime_handle_to_fd = drm_gem_prime_handle_to_fd, .prime_fd_to_handle = drm_gem_prime_fd_to_handle, .gem_prime_pin = vgem_prime_pin, .gem_prime_unpin = vgem_prime_unpin, .gem_prime_import = vgem_prime_import, .gem_prime_export = drm_gem_prime_export, .gem_prime_import_sg_table = vgem_prime_import_sg_table, .gem_prime_get_sg_table = vgem_prime_get_sg_table, .gem_prime_vmap = vgem_prime_vmap, .gem_prime_vunmap = vgem_prime_vunmap, .gem_prime_mmap = vgem_prime_mmap, .name = DRIVER_NAME, .desc = DRIVER_DESC, .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, }; static int __init vgem_init(void) { int ret; vgem_device = kzalloc(sizeof(*vgem_device), GFP_KERNEL); if (!vgem_device) return -ENOMEM; vgem_device->platform = platform_device_register_simple("vgem", -1, NULL, 0); if (IS_ERR(vgem_device->platform)) { ret = PTR_ERR(vgem_device->platform); goto out_free; } dma_coerce_mask_and_coherent(&vgem_device->platform->dev, DMA_BIT_MASK(64)); ret = drm_dev_init(&vgem_device->drm, &vgem_driver, &vgem_device->platform->dev); if (ret) goto out_unregister; /* Final step: expose the device/driver to userspace */ ret = drm_dev_register(&vgem_device->drm, 0); if (ret) goto out_fini; return 0; out_fini: drm_dev_fini(&vgem_device->drm); out_unregister: platform_device_unregister(vgem_device->platform); out_free: kfree(vgem_device); return ret; } static void __exit vgem_exit(void) { drm_dev_unregister(&vgem_device->drm); drm_dev_unref(&vgem_device->drm); } module_init(vgem_init); module_exit(vgem_exit); MODULE_AUTHOR("Red Hat, Inc."); MODULE_AUTHOR("Intel Corporation"); MODULE_DESCRIPTION(DRIVER_DESC); MODULE_LICENSE("GPL and additional rights");
183 13 425 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __NET_PKT_SCHED_H #define __NET_PKT_SCHED_H #include <linux/jiffies.h> #include <linux/ktime.h> #include <linux/if_vlan.h> #include <linux/netdevice.h> #include <net/sch_generic.h> #include <net/net_namespace.h> #include <uapi/linux/pkt_sched.h> #define DEFAULT_TX_QUEUE_LEN 1000 #define STAB_SIZE_LOG_MAX 30 struct qdisc_walker { int stop; int skip; int count; int (*fn)(struct Qdisc *, unsigned long cl, struct qdisc_walker *); }; #define QDISC_ALIGNTO 64 #define QDISC_ALIGN(len) (((len) + QDISC_ALIGNTO-1) & ~(QDISC_ALIGNTO-1)) static inline void *qdisc_priv(struct Qdisc *q) { return (char *) q + QDISC_ALIGN(sizeof(struct Qdisc)); } /* Timer resolution MUST BE < 10% of min_schedulable_packet_size/bandwidth Normal IP packet size ~ 512byte, hence: 0.5Kbyte/1Mbyte/sec = 0.5msec, so that we need 50usec timer for 10Mbit ethernet. 10msec resolution -> <50Kbit/sec. The result: [34]86 is not good choice for QoS router :-( The things are not so bad, because we may use artificial clock evaluated by integration of network data flow in the most critical places. */ typedef u64 psched_time_t; typedef long psched_tdiff_t; /* Avoid doing 64 bit divide */ #define PSCHED_SHIFT 6 #define PSCHED_TICKS2NS(x) ((s64)(x) << PSCHED_SHIFT) #define PSCHED_NS2TICKS(x) ((x) >> PSCHED_SHIFT) #define PSCHED_TICKS_PER_SEC PSCHED_NS2TICKS(NSEC_PER_SEC) #define PSCHED_PASTPERFECT 0 static inline psched_time_t psched_get_time(void) { return PSCHED_NS2TICKS(ktime_get_ns()); } static inline psched_tdiff_t psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound) { return min(tv1 - tv2, bound); } struct qdisc_watchdog { u64 last_expires; struct hrtimer timer; struct Qdisc *qdisc; }; void qdisc_watchdog_init_clockid(struct qdisc_watchdog *wd, struct Qdisc *qdisc, clockid_t clockid); void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc); void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires); static inline void qdisc_watchdog_schedule(struct qdisc_watchdog *wd, psched_time_t expires) { qdisc_watchdog_schedule_ns(wd, PSCHED_TICKS2NS(expires)); } void qdisc_watchdog_cancel(struct qdisc_watchdog *wd); extern struct Qdisc_ops pfifo_qdisc_ops; extern struct Qdisc_ops bfifo_qdisc_ops; extern struct Qdisc_ops pfifo_head_drop_qdisc_ops; int fifo_set_limit(struct Qdisc *q, unsigned int limit); struct Qdisc *fifo_create_dflt(struct Qdisc *sch, struct Qdisc_ops *ops, unsigned int limit, struct netlink_ext_ack *extack); int register_qdisc(struct Qdisc_ops *qops); int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); void qdisc_hash_add(struct Qdisc *q, bool invisible); void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle); struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, struct nlattr *tab, struct netlink_ext_ack *extack); void qdisc_put_rtab(struct qdisc_rate_table *tab); void qdisc_put_stab(struct qdisc_size_table *tab); void qdisc_warn_nonwc(const char *txt, struct Qdisc *qdisc); bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, struct net_device *dev, struct netdev_queue *txq, spinlock_t *root_lock, bool validate); void __qdisc_run(struct Qdisc *q); static inline void qdisc_run(struct Qdisc *q) { if (qdisc_run_begin(q)) { __qdisc_run(q); qdisc_run_end(q); } } /* Calculate maximal size of packet seen by hard_start_xmit routine of this device. */ static inline unsigned int psched_mtu(const struct net_device *dev) { return dev->mtu + dev->hard_header_len; } static inline struct net *qdisc_net(struct Qdisc *q) { return dev_net(q->dev_queue->dev); } struct tc_cbs_qopt_offload { u8 enable; s32 queue; s32 hicredit; s32 locredit; s32 idleslope; s32 sendslope; }; struct tc_etf_qopt_offload { u8 enable; s32 queue; }; #endif
1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 /* * vmx.h: VMX Architecture related definitions * Copyright (c) 2004, Intel Corporation. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, * version 2, as published by the Free Software Foundation. * * This program is distributed in the hope it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for * more details. * * You should have received a copy of the GNU General Public License along with * this program; if not, write to the Free Software Foundation, Inc., 59 Temple * Place - Suite 330, Boston, MA 02111-1307 USA. * * A few random additions are: * Copyright (C) 2006 Qumranet * Avi Kivity <avi@qumranet.com> * Yaniv Kamay <yaniv@qumranet.com> * */ #ifndef VMX_H #define VMX_H #include <linux/bitops.h> #include <linux/types.h> #include <uapi/asm/vmx.h> /* * Definitions of Primary Processor-Based VM-Execution Controls. */ #define CPU_BASED_VIRTUAL_INTR_PENDING 0x00000004 #define CPU_BASED_USE_TSC_OFFSETING 0x00000008 #define CPU_BASED_HLT_EXITING 0x00000080 #define CPU_BASED_INVLPG_EXITING 0x00000200 #define CPU_BASED_MWAIT_EXITING 0x00000400 #define CPU_BASED_RDPMC_EXITING 0x00000800 #define CPU_BASED_RDTSC_EXITING 0x00001000 #define CPU_BASED_CR3_LOAD_EXITING 0x00008000 #define CPU_BASED_CR3_STORE_EXITING 0x00010000 #define CPU_BASED_CR8_LOAD_EXITING 0x00080000 #define CPU_BASED_CR8_STORE_EXITING 0x00100000 #define CPU_BASED_TPR_SHADOW 0x00200000 #define CPU_BASED_VIRTUAL_NMI_PENDING 0x00400000 #define CPU_BASED_MOV_DR_EXITING 0x00800000 #define CPU_BASED_UNCOND_IO_EXITING 0x01000000 #define CPU_BASED_USE_IO_BITMAPS 0x02000000 #define CPU_BASED_MONITOR_TRAP_FLAG 0x08000000 #define CPU_BASED_USE_MSR_BITMAPS 0x10000000 #define CPU_BASED_MONITOR_EXITING 0x20000000 #define CPU_BASED_PAUSE_EXITING 0x40000000 #define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x80000000 #define CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR 0x0401e172 /* * Definitions of Secondary Processor-Based VM-Execution Controls. */ #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001 #define SECONDARY_EXEC_ENABLE_EPT 0x00000002 #define SECONDARY_EXEC_DESC 0x00000004 #define SECONDARY_EXEC_RDTSCP 0x00000008 #define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x00000010 #define SECONDARY_EXEC_ENABLE_VPID 0x00000020 #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 #define SECONDARY_EXEC_RDRAND_EXITING 0x00000800 #define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 #define SECONDARY_EXEC_ENABLE_VMFUNC 0x00002000 #define SECONDARY_EXEC_SHADOW_VMCS 0x00004000 #define SECONDARY_EXEC_ENCLS_EXITING 0x00008000 #define SECONDARY_EXEC_RDSEED_EXITING 0x00010000 #define SECONDARY_EXEC_ENABLE_PML 0x00020000 #define SECONDARY_EXEC_XSAVES 0x00100000 #define SECONDARY_EXEC_TSC_SCALING 0x02000000 #define PIN_BASED_EXT_INTR_MASK 0x00000001 #define PIN_BASED_NMI_EXITING 0x00000008 #define PIN_BASED_VIRTUAL_NMIS 0x00000020 #define PIN_BASED_VMX_PREEMPTION_TIMER 0x00000040 #define PIN_BASED_POSTED_INTR 0x00000080 #define PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR 0x00000016 #define VM_EXIT_SAVE_DEBUG_CONTROLS 0x00000004 #define VM_EXIT_HOST_ADDR_SPACE_SIZE 0x00000200 #define VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL 0x00001000 #define VM_EXIT_ACK_INTR_ON_EXIT 0x00008000 #define VM_EXIT_SAVE_IA32_PAT 0x00040000 #define VM_EXIT_LOAD_IA32_PAT 0x00080000 #define VM_EXIT_SAVE_IA32_EFER 0x00100000 #define VM_EXIT_LOAD_IA32_EFER 0x00200000 #define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER 0x00400000 #define VM_EXIT_CLEAR_BNDCFGS 0x00800000 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff #define VM_ENTRY_LOAD_DEBUG_CONTROLS 0x00000004 #define VM_ENTRY_IA32E_MODE 0x00000200 #define VM_ENTRY_SMM 0x00000400 #define VM_ENTRY_DEACT_DUAL_MONITOR 0x00000800 #define VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL 0x00002000 #define VM_ENTRY_LOAD_IA32_PAT 0x00004000 #define VM_ENTRY_LOAD_IA32_EFER 0x00008000 #define VM_ENTRY_LOAD_BNDCFGS 0x00010000 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff #define VMX_MISC_PREEMPTION_TIMER_RATE_MASK 0x0000001f #define VMX_MISC_SAVE_EFER_LMA 0x00000020 #define VMX_MISC_ACTIVITY_HLT 0x00000040 #define VMX_MISC_ZERO_LEN_INS 0x40000000 /* VMFUNC functions */ #define VMX_VMFUNC_EPTP_SWITCHING 0x00000001 #define VMFUNC_EPTP_ENTRIES 512 static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic) { return vmx_basic & GENMASK_ULL(30, 0); } static inline u32 vmx_basic_vmcs_size(u64 vmx_basic) { return (vmx_basic & GENMASK_ULL(44, 32)) >> 32; } static inline int vmx_misc_preemption_timer_rate(u64 vmx_misc) { return vmx_misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK; } static inline int vmx_misc_cr3_count(u64 vmx_misc) { return (vmx_misc & GENMASK_ULL(24, 16)) >> 16; } static inline int vmx_misc_max_msr(u64 vmx_misc) { return (vmx_misc & GENMASK_ULL(27, 25)) >> 25; } static inline int vmx_misc_mseg_revid(u64 vmx_misc) { return (vmx_misc & GENMASK_ULL(63, 32)) >> 32; } /* VMCS Encodings */ enum vmcs_field { VIRTUAL_PROCESSOR_ID = 0x00000000, POSTED_INTR_NV = 0x00000002, GUEST_ES_SELECTOR = 0x00000800, GUEST_CS_SELECTOR = 0x00000802, GUEST_SS_SELECTOR = 0x00000804, GUEST_DS_SELECTOR = 0x00000806, GUEST_FS_SELECTOR = 0x00000808, GUEST_GS_SELECTOR = 0x0000080a, GUEST_LDTR_SELECTOR = 0x0000080c, GUEST_TR_SELECTOR = 0x0000080e, GUEST_INTR_STATUS = 0x00000810, GUEST_PML_INDEX = 0x00000812, HOST_ES_SELECTOR = 0x00000c00, HOST_CS_SELECTOR = 0x00000c02, HOST_SS_SELECTOR = 0x00000c04, HOST_DS_SELECTOR = 0x00000c06, HOST_FS_SELECTOR = 0x00000c08, HOST_GS_SELECTOR = 0x00000c0a, HOST_TR_SELECTOR = 0x00000c0c, IO_BITMAP_A = 0x00002000, IO_BITMAP_A_HIGH = 0x00002001, IO_BITMAP_B = 0x00002002, IO_BITMAP_B_HIGH = 0x00002003, MSR_BITMAP = 0x00002004, MSR_BITMAP_HIGH = 0x00002005, VM_EXIT_MSR_STORE_ADDR = 0x00002006, VM_EXIT_MSR_STORE_ADDR_HIGH = 0x00002007, VM_EXIT_MSR_LOAD_ADDR = 0x00002008, VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009, VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a, VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b, PML_ADDRESS = 0x0000200e, PML_ADDRESS_HIGH = 0x0000200f, TSC_OFFSET = 0x00002010, TSC_OFFSET_HIGH = 0x00002011, VIRTUAL_APIC_PAGE_ADDR = 0x00002012, VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x00002013, APIC_ACCESS_ADDR = 0x00002014, APIC_ACCESS_ADDR_HIGH = 0x00002015, POSTED_INTR_DESC_ADDR = 0x00002016, POSTED_INTR_DESC_ADDR_HIGH = 0x00002017, VM_FUNCTION_CONTROL = 0x00002018, VM_FUNCTION_CONTROL_HIGH = 0x00002019, EPT_POINTER = 0x0000201a, EPT_POINTER_HIGH = 0x0000201b, EOI_EXIT_BITMAP0 = 0x0000201c, EOI_EXIT_BITMAP0_HIGH = 0x0000201d, EOI_EXIT_BITMAP1 = 0x0000201e, EOI_EXIT_BITMAP1_HIGH = 0x0000201f, EOI_EXIT_BITMAP2 = 0x00002020, EOI_EXIT_BITMAP2_HIGH = 0x00002021, EOI_EXIT_BITMAP3 = 0x00002022, EOI_EXIT_BITMAP3_HIGH = 0x00002023, EPTP_LIST_ADDRESS = 0x00002024, EPTP_LIST_ADDRESS_HIGH = 0x00002025, VMREAD_BITMAP = 0x00002026, VMREAD_BITMAP_HIGH = 0x00002027, VMWRITE_BITMAP = 0x00002028, VMWRITE_BITMAP_HIGH = 0x00002029, XSS_EXIT_BITMAP = 0x0000202C, XSS_EXIT_BITMAP_HIGH = 0x0000202D, ENCLS_EXITING_BITMAP = 0x0000202E, ENCLS_EXITING_BITMAP_HIGH = 0x0000202F, TSC_MULTIPLIER = 0x00002032, TSC_MULTIPLIER_HIGH = 0x00002033, GUEST_PHYSICAL_ADDRESS = 0x00002400, GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, VMCS_LINK_POINTER = 0x00002800, VMCS_LINK_POINTER_HIGH = 0x00002801, GUEST_IA32_DEBUGCTL = 0x00002802, GUEST_IA32_DEBUGCTL_HIGH = 0x00002803, GUEST_IA32_PAT = 0x00002804, GUEST_IA32_PAT_HIGH = 0x00002805, GUEST_IA32_EFER = 0x00002806, GUEST_IA32_EFER_HIGH = 0x00002807, GUEST_IA32_PERF_GLOBAL_CTRL = 0x00002808, GUEST_IA32_PERF_GLOBAL_CTRL_HIGH= 0x00002809, GUEST_PDPTR0 = 0x0000280a, GUEST_PDPTR0_HIGH = 0x0000280b, GUEST_PDPTR1 = 0x0000280c, GUEST_PDPTR1_HIGH = 0x0000280d, GUEST_PDPTR2 = 0x0000280e, GUEST_PDPTR2_HIGH = 0x0000280f, GUEST_PDPTR3 = 0x00002810, GUEST_PDPTR3_HIGH = 0x00002811, GUEST_BNDCFGS = 0x00002812, GUEST_BNDCFGS_HIGH = 0x00002813, HOST_IA32_PAT = 0x00002c00, HOST_IA32_PAT_HIGH = 0x00002c01, HOST_IA32_EFER = 0x00002c02, HOST_IA32_EFER_HIGH = 0x00002c03, HOST_IA32_PERF_GLOBAL_CTRL = 0x00002c04, HOST_IA32_PERF_GLOBAL_CTRL_HIGH = 0x00002c05, PIN_BASED_VM_EXEC_CONTROL = 0x00004000, CPU_BASED_VM_EXEC_CONTROL = 0x00004002, EXCEPTION_BITMAP = 0x00004004, PAGE_FAULT_ERROR_CODE_MASK = 0x00004006, PAGE_FAULT_ERROR_CODE_MATCH = 0x00004008, CR3_TARGET_COUNT = 0x0000400a, VM_EXIT_CONTROLS = 0x0000400c, VM_EXIT_MSR_STORE_COUNT = 0x0000400e, VM_EXIT_MSR_LOAD_COUNT = 0x00004010, VM_ENTRY_CONTROLS = 0x00004012, VM_ENTRY_MSR_LOAD_COUNT = 0x00004014, VM_ENTRY_INTR_INFO_FIELD = 0x00004016, VM_ENTRY_EXCEPTION_ERROR_CODE = 0x00004018, VM_ENTRY_INSTRUCTION_LEN = 0x0000401a, TPR_THRESHOLD = 0x0000401c, SECONDARY_VM_EXEC_CONTROL = 0x0000401e, PLE_GAP = 0x00004020, PLE_WINDOW = 0x00004022, VM_INSTRUCTION_ERROR = 0x00004400, VM_EXIT_REASON = 0x00004402, VM_EXIT_INTR_INFO = 0x00004404, VM_EXIT_INTR_ERROR_CODE = 0x00004406, IDT_VECTORING_INFO_FIELD = 0x00004408, IDT_VECTORING_ERROR_CODE = 0x0000440a, VM_EXIT_INSTRUCTION_LEN = 0x0000440c, VMX_INSTRUCTION_INFO = 0x0000440e, GUEST_ES_LIMIT = 0x00004800, GUEST_CS_LIMIT = 0x00004802, GUEST_SS_LIMIT = 0x00004804, GUEST_DS_LIMIT = 0x00004806, GUEST_FS_LIMIT = 0x00004808, GUEST_GS_LIMIT = 0x0000480a, GUEST_LDTR_LIMIT = 0x0000480c, GUEST_TR_LIMIT = 0x0000480e, GUEST_GDTR_LIMIT = 0x00004810, GUEST_IDTR_LIMIT = 0x00004812, GUEST_ES_AR_BYTES = 0x00004814, GUEST_CS_AR_BYTES = 0x00004816, GUEST_SS_AR_BYTES = 0x00004818, GUEST_DS_AR_BYTES = 0x0000481a, GUEST_FS_AR_BYTES = 0x0000481c, GUEST_GS_AR_BYTES = 0x0000481e, GUEST_LDTR_AR_BYTES = 0x00004820, GUEST_TR_AR_BYTES = 0x00004822, GUEST_INTERRUPTIBILITY_INFO = 0x00004824, GUEST_ACTIVITY_STATE = 0X00004826, GUEST_SYSENTER_CS = 0x0000482A, VMX_PREEMPTION_TIMER_VALUE = 0x0000482E, HOST_IA32_SYSENTER_CS = 0x00004c00, CR0_GUEST_HOST_MASK = 0x00006000, CR4_GUEST_HOST_MASK = 0x00006002, CR0_READ_SHADOW = 0x00006004, CR4_READ_SHADOW = 0x00006006, CR3_TARGET_VALUE0 = 0x00006008, CR3_TARGET_VALUE1 = 0x0000600a, CR3_TARGET_VALUE2 = 0x0000600c, CR3_TARGET_VALUE3 = 0x0000600e, EXIT_QUALIFICATION = 0x00006400, GUEST_LINEAR_ADDRESS = 0x0000640a, GUEST_CR0 = 0x00006800, GUEST_CR3 = 0x00006802, GUEST_CR4 = 0x00006804, GUEST_ES_BASE = 0x00006806, GUEST_CS_BASE = 0x00006808, GUEST_SS_BASE = 0x0000680a, GUEST_DS_BASE = 0x0000680c, GUEST_FS_BASE = 0x0000680e, GUEST_GS_BASE = 0x00006810, GUEST_LDTR_BASE = 0x00006812, GUEST_TR_BASE = 0x00006814, GUEST_GDTR_BASE = 0x00006816, GUEST_IDTR_BASE = 0x00006818, GUEST_DR7 = 0x0000681a, GUEST_RSP = 0x0000681c, GUEST_RIP = 0x0000681e, GUEST_RFLAGS = 0x00006820, GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822, GUEST_SYSENTER_ESP = 0x00006824, GUEST_SYSENTER_EIP = 0x00006826, HOST_CR0 = 0x00006c00, HOST_CR3 = 0x00006c02, HOST_CR4 = 0x00006c04, HOST_FS_BASE = 0x00006c06, HOST_GS_BASE = 0x00006c08, HOST_TR_BASE = 0x00006c0a, HOST_GDTR_BASE = 0x00006c0c, HOST_IDTR_BASE = 0x00006c0e, HOST_IA32_SYSENTER_ESP = 0x00006c10, HOST_IA32_SYSENTER_EIP = 0x00006c12, HOST_RSP = 0x00006c14, HOST_RIP = 0x00006c16, }; /* * Interruption-information format */ #define INTR_INFO_VECTOR_MASK 0xff /* 7:0 */ #define INTR_INFO_INTR_TYPE_MASK 0x700 /* 10:8 */ #define INTR_INFO_DELIVER_CODE_MASK 0x800 /* 11 */ #define INTR_INFO_UNBLOCK_NMI 0x1000 /* 12 */ #define INTR_INFO_VALID_MASK 0x80000000 /* 31 */ #define INTR_INFO_RESVD_BITS_MASK 0x7ffff000 #define VECTORING_INFO_VECTOR_MASK INTR_INFO_VECTOR_MASK #define VECTORING_INFO_TYPE_MASK INTR_INFO_INTR_TYPE_MASK #define VECTORING_INFO_DELIVER_CODE_MASK INTR_INFO_DELIVER_CODE_MASK #define VECTORING_INFO_VALID_MASK INTR_INFO_VALID_MASK #define INTR_TYPE_EXT_INTR (0 << 8) /* external interrupt */ #define INTR_TYPE_RESERVED (1 << 8) /* reserved */ #define INTR_TYPE_NMI_INTR (2 << 8) /* NMI */ #define INTR_TYPE_HARD_EXCEPTION (3 << 8) /* processor exception */ #define INTR_TYPE_SOFT_INTR (4 << 8) /* software interrupt */ #define INTR_TYPE_PRIV_SW_EXCEPTION (5 << 8) /* ICE breakpoint - undocumented */ #define INTR_TYPE_SOFT_EXCEPTION (6 << 8) /* software exception */ #define INTR_TYPE_OTHER_EVENT (7 << 8) /* other event */ /* GUEST_INTERRUPTIBILITY_INFO flags. */ #define GUEST_INTR_STATE_STI 0x00000001 #define GUEST_INTR_STATE_MOV_SS 0x00000002 #define GUEST_INTR_STATE_SMI 0x00000004 #define GUEST_INTR_STATE_NMI 0x00000008 /* GUEST_ACTIVITY_STATE flags */ #define GUEST_ACTIVITY_ACTIVE 0 #define GUEST_ACTIVITY_HLT 1 #define GUEST_ACTIVITY_SHUTDOWN 2 #define GUEST_ACTIVITY_WAIT_SIPI 3 /* * Exit Qualifications for MOV for Control Register Access */ #define CONTROL_REG_ACCESS_NUM 0x7 /* 2:0, number of control reg.*/ #define CONTROL_REG_ACCESS_TYPE 0x30 /* 5:4, access type */ #define CONTROL_REG_ACCESS_REG 0xf00 /* 10:8, general purpose reg. */ #define LMSW_SOURCE_DATA_SHIFT 16 #define LMSW_SOURCE_DATA (0xFFFF << LMSW_SOURCE_DATA_SHIFT) /* 16:31 lmsw source */ #define REG_EAX (0 << 8) #define REG_ECX (1 << 8) #define REG_EDX (2 << 8) #define REG_EBX (3 << 8) #define REG_ESP (4 << 8) #define REG_EBP (5 << 8) #define REG_ESI (6 << 8) #define REG_EDI (7 << 8) #define REG_R8 (8 << 8) #define REG_R9 (9 << 8) #define REG_R10 (10 << 8) #define REG_R11 (11 << 8) #define REG_R12 (12 << 8) #define REG_R13 (13 << 8) #define REG_R14 (14 << 8) #define REG_R15 (15 << 8) /* * Exit Qualifications for MOV for Debug Register Access */ #define DEBUG_REG_ACCESS_NUM 0x7 /* 2:0, number of debug reg. */ #define DEBUG_REG_ACCESS_TYPE 0x10 /* 4, direction of access */ #define TYPE_MOV_TO_DR (0 << 4) #define TYPE_MOV_FROM_DR (1 << 4) #define DEBUG_REG_ACCESS_REG(eq) (((eq) >> 8) & 0xf) /* 11:8, general purpose reg. */ /* * Exit Qualifications for APIC-Access */ #define APIC_ACCESS_OFFSET 0xfff /* 11:0, offset within the APIC page */ #define APIC_ACCESS_TYPE 0xf000 /* 15:12, access type */ #define TYPE_LINEAR_APIC_INST_READ (0 << 12) #define TYPE_LINEAR_APIC_INST_WRITE (1 << 12) #define TYPE_LINEAR_APIC_INST_FETCH (2 << 12) #define TYPE_LINEAR_APIC_EVENT (3 << 12) #define TYPE_PHYSICAL_APIC_EVENT (10 << 12) #define TYPE_PHYSICAL_APIC_INST (15 << 12) /* segment AR in VMCS -- these are different from what LAR reports */ #define VMX_SEGMENT_AR_L_MASK (1 << 13) #define VMX_AR_TYPE_ACCESSES_MASK 1 #define VMX_AR_TYPE_READABLE_MASK (1 << 1) #define VMX_AR_TYPE_WRITEABLE_MASK (1 << 2) #define VMX_AR_TYPE_CODE_MASK (1 << 3) #define VMX_AR_TYPE_MASK 0x0f #define VMX_AR_TYPE_BUSY_64_TSS 11 #define VMX_AR_TYPE_BUSY_32_TSS 11 #define VMX_AR_TYPE_BUSY_16_TSS 3 #define VMX_AR_TYPE_LDT 2 #define VMX_AR_UNUSABLE_MASK (1 << 16) #define VMX_AR_S_MASK (1 << 4) #define VMX_AR_P_MASK (1 << 7) #define VMX_AR_L_MASK (1 << 13) #define VMX_AR_DB_MASK (1 << 14) #define VMX_AR_G_MASK (1 << 15) #define VMX_AR_DPL_SHIFT 5 #define VMX_AR_DPL(ar) (((ar) >> VMX_AR_DPL_SHIFT) & 3) #define VMX_AR_RESERVD_MASK 0xfffe0f00 #define TSS_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 0) #define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 1) #define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 2) #define VMX_NR_VPIDS (1 << 16) #define VMX_VPID_EXTENT_INDIVIDUAL_ADDR 0 #define VMX_VPID_EXTENT_SINGLE_CONTEXT 1 #define VMX_VPID_EXTENT_ALL_CONTEXT 2 #define VMX_VPID_EXTENT_SINGLE_NON_GLOBAL 3 #define VMX_EPT_EXTENT_CONTEXT 1 #define VMX_EPT_EXTENT_GLOBAL 2 #define VMX_EPT_EXTENT_SHIFT 24 #define VMX_EPT_EXECUTE_ONLY_BIT (1ull) #define VMX_EPT_PAGE_WALK_4_BIT (1ull << 6) #define VMX_EPT_PAGE_WALK_5_BIT (1ull << 7) #define VMX_EPTP_UC_BIT (1ull << 8) #define VMX_EPTP_WB_BIT (1ull << 14) #define VMX_EPT_2MB_PAGE_BIT (1ull << 16) #define VMX_EPT_1GB_PAGE_BIT (1ull << 17) #define VMX_EPT_INVEPT_BIT (1ull << 20) #define VMX_EPT_AD_BIT (1ull << 21) #define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25) #define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26) #define VMX_VPID_INVVPID_BIT (1ull << 0) /* (32 - 32) */ #define VMX_VPID_EXTENT_INDIVIDUAL_ADDR_BIT (1ull << 8) /* (40 - 32) */ #define VMX_VPID_EXTENT_SINGLE_CONTEXT_BIT (1ull << 9) /* (41 - 32) */ #define VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT (1ull << 10) /* (42 - 32) */ #define VMX_VPID_EXTENT_SINGLE_NON_GLOBAL_BIT (1ull << 11) /* (43 - 32) */ #define VMX_EPT_MT_EPTE_SHIFT 3 #define VMX_EPTP_PWL_MASK 0x38ull #define VMX_EPTP_PWL_4 0x18ull #define VMX_EPTP_PWL_5 0x20ull #define VMX_EPTP_AD_ENABLE_BIT (1ull << 6) #define VMX_EPTP_MT_MASK 0x7ull #define VMX_EPTP_MT_WB 0x6ull #define VMX_EPTP_MT_UC 0x0ull #define VMX_EPT_READABLE_MASK 0x1ull #define VMX_EPT_WRITABLE_MASK 0x2ull #define VMX_EPT_EXECUTABLE_MASK 0x4ull #define VMX_EPT_IPAT_BIT (1ull << 6) #define VMX_EPT_ACCESS_BIT (1ull << 8) #define VMX_EPT_DIRTY_BIT (1ull << 9) #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | \ VMX_EPT_WRITABLE_MASK | \ VMX_EPT_EXECUTABLE_MASK) #define VMX_EPT_MT_MASK (7ull << VMX_EPT_MT_EPTE_SHIFT) /* The mask to use to trigger an EPT Misconfiguration in order to track MMIO */ #define VMX_EPT_MISCONFIG_WX_VALUE (VMX_EPT_WRITABLE_MASK | \ VMX_EPT_EXECUTABLE_MASK) #define VMX_EPT_IDENTITY_PAGETABLE_ADDR 0xfffbc000ul #define ASM_VMX_VMCLEAR_RAX ".byte 0x66, 0x0f, 0xc7, 0x30" #define ASM_VMX_VMLAUNCH ".byte 0x0f, 0x01, 0xc2" #define ASM_VMX_VMRESUME ".byte 0x0f, 0x01, 0xc3" #define ASM_VMX_VMPTRLD_RAX ".byte 0x0f, 0xc7, 0x30" #define ASM_VMX_VMREAD_RDX_RAX ".byte 0x0f, 0x78, 0xd0" #define ASM_VMX_VMWRITE_RAX_RDX ".byte 0x0f, 0x79, 0xd0" #define ASM_VMX_VMWRITE_RSP_RDX ".byte 0x0f, 0x79, 0xd4" #define ASM_VMX_VMXOFF ".byte 0x0f, 0x01, 0xc4" #define ASM_VMX_VMXON_RAX ".byte 0xf3, 0x0f, 0xc7, 0x30" #define ASM_VMX_INVEPT ".byte 0x66, 0x0f, 0x38, 0x80, 0x08" #define ASM_VMX_INVVPID ".byte 0x66, 0x0f, 0x38, 0x81, 0x08" struct vmx_msr_entry { u32 index; u32 reserved; u64 value; } __aligned(16); /* * Exit Qualifications for entry failure during or after loading guest state */ #define ENTRY_FAIL_DEFAULT 0 #define ENTRY_FAIL_PDPTE 2 #define ENTRY_FAIL_NMI 3 #define ENTRY_FAIL_VMCS_LINK_PTR 4 /* * Exit Qualifications for EPT Violations */ #define EPT_VIOLATION_ACC_READ_BIT 0 #define EPT_VIOLATION_ACC_WRITE_BIT 1 #define EPT_VIOLATION_ACC_INSTR_BIT 2 #define EPT_VIOLATION_READABLE_BIT 3 #define EPT_VIOLATION_WRITABLE_BIT 4 #define EPT_VIOLATION_EXECUTABLE_BIT 5 #define EPT_VIOLATION_GVA_TRANSLATED_BIT 8 #define EPT_VIOLATION_ACC_READ (1 << EPT_VIOLATION_ACC_READ_BIT) #define EPT_VIOLATION_ACC_WRITE (1 << EPT_VIOLATION_ACC_WRITE_BIT) #define EPT_VIOLATION_ACC_INSTR (1 << EPT_VIOLATION_ACC_INSTR_BIT) #define EPT_VIOLATION_READABLE (1 << EPT_VIOLATION_READABLE_BIT) #define EPT_VIOLATION_WRITABLE (1 << EPT_VIOLATION_WRITABLE_BIT) #define EPT_VIOLATION_EXECUTABLE (1 << EPT_VIOLATION_EXECUTABLE_BIT) #define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT) /* * VM-instruction error numbers */ enum vm_instruction_error_number { VMXERR_VMCALL_IN_VMX_ROOT_OPERATION = 1, VMXERR_VMCLEAR_INVALID_ADDRESS = 2, VMXERR_VMCLEAR_VMXON_POINTER = 3, VMXERR_VMLAUNCH_NONCLEAR_VMCS = 4, VMXERR_VMRESUME_NONLAUNCHED_VMCS = 5, VMXERR_VMRESUME_AFTER_VMXOFF = 6, VMXERR_ENTRY_INVALID_CONTROL_FIELD = 7, VMXERR_ENTRY_INVALID_HOST_STATE_FIELD = 8, VMXERR_VMPTRLD_INVALID_ADDRESS = 9, VMXERR_VMPTRLD_VMXON_POINTER = 10, VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID = 11, VMXERR_UNSUPPORTED_VMCS_COMPONENT = 12, VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT = 13, VMXERR_VMXON_IN_VMX_ROOT_OPERATION = 15, VMXERR_ENTRY_INVALID_EXECUTIVE_VMCS_POINTER = 16, VMXERR_ENTRY_NONLAUNCHED_EXECUTIVE_VMCS = 17, VMXERR_ENTRY_EXECUTIVE_VMCS_POINTER_NOT_VMXON_POINTER = 18, VMXERR_VMCALL_NONCLEAR_VMCS = 19, VMXERR_VMCALL_INVALID_VM_EXIT_CONTROL_FIELDS = 20, VMXERR_VMCALL_INCORRECT_MSEG_REVISION_ID = 22, VMXERR_VMXOFF_UNDER_DUAL_MONITOR_TREATMENT_OF_SMIS_AND_SMM = 23, VMXERR_VMCALL_INVALID_SMM_MONITOR_FEATURES = 24, VMXERR_ENTRY_INVALID_VM_EXECUTION_CONTROL_FIELDS_IN_EXECUTIVE_VMCS = 25, VMXERR_ENTRY_EVENTS_BLOCKED_BY_MOV_SS = 26, VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID = 28, }; enum vmx_l1d_flush_state { VMENTER_L1D_FLUSH_AUTO, VMENTER_L1D_FLUSH_NEVER, VMENTER_L1D_FLUSH_COND, VMENTER_L1D_FLUSH_ALWAYS, VMENTER_L1D_FLUSH_EPT_DISABLED, VMENTER_L1D_FLUSH_NOT_REQUIRED, }; extern enum vmx_l1d_flush_state l1tf_vmx_mitigation; #endif
1 6 4 6 18 18 18 18 18 18 18 18 7 18 3 3 3 2 2 2 2 2 2 21 1 33 32 32 3 3 3 1 3 3 3 34 1 1 3 5 5 4 2 3 1 1 3 5 5 1 1 2 3 1 1 1 1 7 5 1 4 4 11 11 6 2 1 4 3 2 6 5 5 5 5 1 1 1 4 5 5 5 5 5 5 2 2 2 4 4 2 2 2 4 5 5 5 3 3 3 23 23 2 2 2 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 /* * VMware VMCI Driver * * Copyright (C) 2012 VMware, Inc. All rights reserved. * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License as published by the * Free Software Foundation version 2 and no later version. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * for more details. */ #include <linux/vmw_vmci_defs.h> #include <linux/vmw_vmci_api.h> #include <linux/highmem.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/sched.h> #include <linux/cred.h> #include <linux/slab.h> #include "vmci_queue_pair.h" #include "vmci_datagram.h" #include "vmci_doorbell.h" #include "vmci_context.h" #include "vmci_driver.h" #include "vmci_event.h" /* Use a wide upper bound for the maximum contexts. */ #define VMCI_MAX_CONTEXTS 2000 /* * List of current VMCI contexts. Contexts can be added by * vmci_ctx_create() and removed via vmci_ctx_destroy(). * These, along with context lookup, are protected by the * list structure's lock. */ static struct { struct list_head head; spinlock_t lock; /* Spinlock for context list operations */ } ctx_list = { .head = LIST_HEAD_INIT(ctx_list.head), .lock = __SPIN_LOCK_UNLOCKED(ctx_list.lock), }; /* Used by contexts that did not set up notify flag pointers */ static bool ctx_dummy_notify; static void ctx_signal_notify(struct vmci_ctx *context) { *context->notify = true; } static void ctx_clear_notify(struct vmci_ctx *context) { *context->notify = false; } /* * If nothing requires the attention of the guest, clears both * notify flag and call. */ static void ctx_clear_notify_call(struct vmci_ctx *context) { if (context->pending_datagrams == 0 && vmci_handle_arr_get_size(context->pending_doorbell_array) == 0) ctx_clear_notify(context); } /* * Sets the context's notify flag iff datagrams are pending for this * context. Called from vmci_setup_notify(). */ void vmci_ctx_check_signal_notify(struct vmci_ctx *context) { spin_lock(&context->lock); if (context->pending_datagrams) ctx_signal_notify(context); spin_unlock(&context->lock); } /* * Allocates and initializes a VMCI context. */ struct vmci_ctx *vmci_ctx_create(u32 cid, u32 priv_flags, uintptr_t event_hnd, int user_version, const struct cred *cred) { struct vmci_ctx *context; int error; if (cid == VMCI_INVALID_ID) { pr_devel("Invalid context ID for VMCI context\n"); error = -EINVAL; goto err_out; } if (priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS) { pr_devel("Invalid flag (flags=0x%x) for VMCI context\n", priv_flags); error = -EINVAL; goto err_out; } if (user_version == 0) { pr_devel("Invalid suer_version %d\n", user_version); error = -EINVAL; goto err_out; } context = kzalloc(sizeof(*context), GFP_KERNEL); if (!context) { pr_warn("Failed to allocate memory for VMCI context\n"); error = -EINVAL; goto err_out; } kref_init(&context->kref); spin_lock_init(&context->lock); INIT_LIST_HEAD(&context->list_item); INIT_LIST_HEAD(&context->datagram_queue); INIT_LIST_HEAD(&context->notifier_list); /* Initialize host-specific VMCI context. */ init_waitqueue_head(&context->host_context.wait_queue); context->queue_pair_array = vmci_handle_arr_create(0, VMCI_MAX_GUEST_QP_COUNT); if (!context->queue_pair_array) { error = -ENOMEM; goto err_free_ctx; } context->doorbell_array = vmci_handle_arr_create(0, VMCI_MAX_GUEST_DOORBELL_COUNT); if (!context->doorbell_array) { error = -ENOMEM; goto err_free_qp_array; } context->pending_doorbell_array = vmci_handle_arr_create(0, VMCI_MAX_GUEST_DOORBELL_COUNT); if (!context->pending_doorbell_array) { error = -ENOMEM; goto err_free_db_array; } context->user_version = user_version; context->priv_flags = priv_flags; if (cred) context->cred = get_cred(cred); context->notify = &ctx_dummy_notify; context->notify_page = NULL; /* * If we collide with an existing context we generate a new * and use it instead. The VMX will determine if regeneration * is okay. Since there isn't 4B - 16 VMs running on a given * host, the below loop will terminate. */ spin_lock(&ctx_list.lock); while (vmci_ctx_exists(cid)) { /* We reserve the lowest 16 ids for fixed contexts. */ cid = max(cid, VMCI_RESERVED_CID_LIMIT - 1) + 1; if (cid == VMCI_INVALID_ID) cid = VMCI_RESERVED_CID_LIMIT; } context->cid = cid; list_add_tail_rcu(&context->list_item, &ctx_list.head); spin_unlock(&ctx_list.lock); return context; err_free_db_array: vmci_handle_arr_destroy(context->doorbell_array); err_free_qp_array: vmci_handle_arr_destroy(context->queue_pair_array); err_free_ctx: kfree(context); err_out: return ERR_PTR(error); } /* * Destroy VMCI context. */ void vmci_ctx_destroy(struct vmci_ctx *context) { spin_lock(&ctx_list.lock); list_del_rcu(&context->list_item); spin_unlock(&ctx_list.lock); synchronize_rcu(); vmci_ctx_put(context); } /* * Fire notification for all contexts interested in given cid. */ static int ctx_fire_notification(u32 context_id, u32 priv_flags) { u32 i, array_size; struct vmci_ctx *sub_ctx; struct vmci_handle_arr *subscriber_array; struct vmci_handle context_handle = vmci_make_handle(context_id, VMCI_EVENT_HANDLER); /* * We create an array to hold the subscribers we find when * scanning through all contexts. */ subscriber_array = vmci_handle_arr_create(0, VMCI_MAX_CONTEXTS); if (subscriber_array == NULL) return VMCI_ERROR_NO_MEM; /* * Scan all contexts to find who is interested in being * notified about given contextID. */ rcu_read_lock(); list_for_each_entry_rcu(sub_ctx, &ctx_list.head, list_item) { struct vmci_handle_list *node; /* * We only deliver notifications of the removal of * contexts, if the two contexts are allowed to * interact. */ if (vmci_deny_interaction(priv_flags, sub_ctx->priv_flags)) continue; list_for_each_entry_rcu(node, &sub_ctx->notifier_list, node) { if (!vmci_handle_is_equal(node->handle, context_handle)) continue; vmci_handle_arr_append_entry(&subscriber_array, vmci_make_handle(sub_ctx->cid, VMCI_EVENT_HANDLER)); } } rcu_read_unlock(); /* Fire event to all subscribers. */ array_size = vmci_handle_arr_get_size(subscriber_array); for (i = 0; i < array_size; i++) { int result; struct vmci_event_ctx ev; ev.msg.hdr.dst = vmci_handle_arr_get_entry(subscriber_array, i); ev.msg.hdr.src = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID, VMCI_CONTEXT_RESOURCE_ID); ev.msg.hdr.payload_size = sizeof(ev) - sizeof(ev.msg.hdr); ev.msg.event_data.event = VMCI_EVENT_CTX_REMOVED; ev.payload.context_id = context_id; result = vmci_datagram_dispatch(VMCI_HYPERVISOR_CONTEXT_ID, &ev.msg.hdr, false); if (result < VMCI_SUCCESS) { pr_devel("Failed to enqueue event datagram (type=%d) for context (ID=0x%x)\n", ev.msg.event_data.event, ev.msg.hdr.dst.context); /* We continue to enqueue on next subscriber. */ } } vmci_handle_arr_destroy(subscriber_array); return VMCI_SUCCESS; } /* * Returns the current number of pending datagrams. The call may * also serve as a synchronization point for the datagram queue, * as no enqueue operations can occur concurrently. */ int vmci_ctx_pending_datagrams(u32 cid, u32 *pending) { struct vmci_ctx *context; context = vmci_ctx_get(cid); if (context == NULL) return VMCI_ERROR_INVALID_ARGS; spin_lock(&context->lock); if (pending) *pending = context->pending_datagrams; spin_unlock(&context->lock); vmci_ctx_put(context); return VMCI_SUCCESS; } /* * Queues a VMCI datagram for the appropriate target VM context. */ int vmci_ctx_enqueue_datagram(u32 cid, struct vmci_datagram *dg) { struct vmci_datagram_queue_entry *dq_entry; struct vmci_ctx *context; struct vmci_handle dg_src; size_t vmci_dg_size; vmci_dg_size = VMCI_DG_SIZE(dg); if (vmci_dg_size > VMCI_MAX_DG_SIZE) { pr_devel("Datagram too large (bytes=%zu)\n", vmci_dg_size); return VMCI_ERROR_INVALID_ARGS; } /* Get the target VM's VMCI context. */ context = vmci_ctx_get(cid); if (!context) { pr_devel("Invalid context (ID=0x%x)\n", cid); return VMCI_ERROR_INVALID_ARGS; } /* Allocate guest call entry and add it to the target VM's queue. */ dq_entry = kmalloc(sizeof(*dq_entry), GFP_KERNEL); if (dq_entry == NULL) { pr_warn("Failed to allocate memory for datagram\n"); vmci_ctx_put(context); return VMCI_ERROR_NO_MEM; } dq_entry->dg = dg; dq_entry->dg_size = vmci_dg_size; dg_src = dg->src; INIT_LIST_HEAD(&dq_entry->list_item); spin_lock(&context->lock); /* * We put a higher limit on datagrams from the hypervisor. If * the pending datagram is not from hypervisor, then we check * if enqueueing it would exceed the * VMCI_MAX_DATAGRAM_QUEUE_SIZE limit on the destination. If * the pending datagram is from hypervisor, we allow it to be * queued at the destination side provided we don't reach the * VMCI_MAX_DATAGRAM_AND_EVENT_QUEUE_SIZE limit. */ if (context->datagram_queue_size + vmci_dg_size >= VMCI_MAX_DATAGRAM_QUEUE_SIZE && (!vmci_handle_is_equal(dg_src, vmci_make_handle (VMCI_HYPERVISOR_CONTEXT_ID, VMCI_CONTEXT_RESOURCE_ID)) || context->datagram_queue_size + vmci_dg_size >= VMCI_MAX_DATAGRAM_AND_EVENT_QUEUE_SIZE)) { spin_unlock(&context->lock); vmci_ctx_put(context); kfree(dq_entry); pr_devel("Context (ID=0x%x) receive queue is full\n", cid); return VMCI_ERROR_NO_RESOURCES; } list_add(&dq_entry->list_item, &context->datagram_queue); context->pending_datagrams++; context->datagram_queue_size += vmci_dg_size; ctx_signal_notify(context); wake_up(&context->host_context.wait_queue); spin_unlock(&context->lock); vmci_ctx_put(context); return vmci_dg_size; } /* * Verifies whether a context with the specified context ID exists. * FIXME: utility is dubious as no decisions can be reliably made * using this data as context can appear and disappear at any time. */ bool vmci_ctx_exists(u32 cid) { struct vmci_ctx *context; bool exists = false; rcu_read_lock(); list_for_each_entry_rcu(context, &ctx_list.head, list_item) { if (context->cid == cid) { exists = true; break; } } rcu_read_unlock(); return exists; } /* * Retrieves VMCI context corresponding to the given cid. */ struct vmci_ctx *vmci_ctx_get(u32 cid) { struct vmci_ctx *c, *context = NULL; if (cid == VMCI_INVALID_ID) return NULL; rcu_read_lock(); list_for_each_entry_rcu(c, &ctx_list.head, list_item) { if (c->cid == cid) { /* * The context owner drops its own reference to the * context only after removing it from the list and * waiting for RCU grace period to expire. This * means that we are not about to increase the * reference count of something that is in the * process of being destroyed. */ context = c; kref_get(&context->kref); break; } } rcu_read_unlock(); return context; } /* * Deallocates all parts of a context data structure. This * function doesn't lock the context, because it assumes that * the caller was holding the last reference to context. */ static void ctx_free_ctx(struct kref *kref) { struct vmci_ctx *context = container_of(kref, struct vmci_ctx, kref); struct vmci_datagram_queue_entry *dq_entry, *dq_entry_tmp; struct vmci_handle temp_handle; struct vmci_handle_list *notifier, *tmp; /* * Fire event to all contexts interested in knowing this * context is dying. */ ctx_fire_notification(context->cid, context->priv_flags); /* * Cleanup all queue pair resources attached to context. If * the VM dies without cleaning up, this code will make sure * that no resources are leaked. */ temp_handle = vmci_handle_arr_get_entry(context->queue_pair_array, 0); while (!vmci_handle_is_equal(temp_handle, VMCI_INVALID_HANDLE)) { if (vmci_qp_broker_detach(temp_handle, context) < VMCI_SUCCESS) { /* * When vmci_qp_broker_detach() succeeds it * removes the handle from the array. If * detach fails, we must remove the handle * ourselves. */ vmci_handle_arr_remove_entry(context->queue_pair_array, temp_handle); } temp_handle = vmci_handle_arr_get_entry(context->queue_pair_array, 0); } /* * It is fine to destroy this without locking the callQueue, as * this is the only thread having a reference to the context. */ list_for_each_entry_safe(dq_entry, dq_entry_tmp, &context->datagram_queue, list_item) { WARN_ON(dq_entry->dg_size != VMCI_DG_SIZE(dq_entry->dg)); list_del(&dq_entry->list_item); kfree(dq_entry->dg); kfree(dq_entry); } list_for_each_entry_safe(notifier, tmp, &context->notifier_list, node) { list_del(&notifier->node); kfree(notifier); } vmci_handle_arr_destroy(context->queue_pair_array); vmci_handle_arr_destroy(context->doorbell_array); vmci_handle_arr_destroy(context->pending_doorbell_array); vmci_ctx_unset_notify(context); if (context->cred) put_cred(context->cred); kfree(context); } /* * Drops reference to VMCI context. If this is the last reference to * the context it will be deallocated. A context is created with * a reference count of one, and on destroy, it is removed from * the context list before its reference count is decremented. Thus, * if we reach zero, we are sure that nobody else are about to increment * it (they need the entry in the context list for that), and so there * is no need for locking. */ void vmci_ctx_put(struct vmci_ctx *context) { kref_put(&context->kref, ctx_free_ctx); } /* * Dequeues the next datagram and returns it to caller. * The caller passes in a pointer to the max size datagram * it can handle and the datagram is only unqueued if the * size is less than max_size. If larger max_size is set to * the size of the datagram to give the caller a chance to * set up a larger buffer for the guestcall. */ int vmci_ctx_dequeue_datagram(struct vmci_ctx *context, size_t *max_size, struct vmci_datagram **dg) { struct vmci_datagram_queue_entry *dq_entry; struct list_head *list_item; int rv; /* Dequeue the next datagram entry. */ spin_lock(&context->lock); if (context->pending_datagrams == 0) { ctx_clear_notify_call(context); spin_unlock(&context->lock); pr_devel("No datagrams pending\n"); return VMCI_ERROR_NO_MORE_DATAGRAMS; } list_item = context->datagram_queue.next; dq_entry = list_entry(list_item, struct vmci_datagram_queue_entry, list_item); /* Check size of caller's buffer. */ if (*max_size < dq_entry->dg_size) { *max_size = dq_entry->dg_size; spin_unlock(&context->lock); pr_devel("Caller's buffer should be at least (size=%u bytes)\n", (u32) *max_size); return VMCI_ERROR_NO_MEM; } list_del(list_item); context->pending_datagrams--; context->datagram_queue_size -= dq_entry->dg_size; if (context->pending_datagrams == 0) { ctx_clear_notify_call(context); rv = VMCI_SUCCESS; } else { /* * Return the size of the next datagram. */ struct vmci_datagram_queue_entry *next_entry; list_item = context->datagram_queue.next; next_entry = list_entry(list_item, struct vmci_datagram_queue_entry, list_item); /* * The following size_t -> int truncation is fine as * the maximum size of a (routable) datagram is 68KB. */ rv = (int)next_entry->dg_size; } spin_unlock(&context->lock); /* Caller must free datagram. */ *dg = dq_entry->dg; dq_entry->dg = NULL; kfree(dq_entry); return rv; } /* * Reverts actions set up by vmci_setup_notify(). Unmaps and unlocks the * page mapped/locked by vmci_setup_notify(). */ void vmci_ctx_unset_notify(struct vmci_ctx *context) { struct page *notify_page; spin_lock(&context->lock); notify_page = context->notify_page; context->notify = &ctx_dummy_notify; context->notify_page = NULL; spin_unlock(&context->lock); if (notify_page) { kunmap(notify_page); put_page(notify_page); } } /* * Add remote_cid to list of contexts current contexts wants * notifications from/about. */ int vmci_ctx_add_notification(u32 context_id, u32 remote_cid) { struct vmci_ctx *context; struct vmci_handle_list *notifier, *n; int result; bool exists = false; context = vmci_ctx_get(context_id); if (!context) return VMCI_ERROR_NOT_FOUND; if (VMCI_CONTEXT_IS_VM(context_id) && VMCI_CONTEXT_IS_VM(remote_cid)) { pr_devel("Context removed notifications for other VMs not supported (src=0x%x, remote=0x%x)\n", context_id, remote_cid); result = VMCI_ERROR_DST_UNREACHABLE; goto out; } if (context->priv_flags & VMCI_PRIVILEGE_FLAG_RESTRICTED) { result = VMCI_ERROR_NO_ACCESS; goto out; } notifier = kmalloc(sizeof(struct vmci_handle_list), GFP_KERNEL); if (!notifier) { result = VMCI_ERROR_NO_MEM; goto out; } INIT_LIST_HEAD(&notifier->node); notifier->handle = vmci_make_handle(remote_cid, VMCI_EVENT_HANDLER); spin_lock(&context->lock); if (context->n_notifiers < VMCI_MAX_CONTEXTS) { list_for_each_entry(n, &context->notifier_list, node) { if (vmci_handle_is_equal(n->handle, notifier->handle)) { exists = true; break; } } if (exists) { kfree(notifier); result = VMCI_ERROR_ALREADY_EXISTS; } else { list_add_tail_rcu(&notifier->node, &context->notifier_list); context->n_notifiers++; result = VMCI_SUCCESS; } } else { kfree(notifier); result = VMCI_ERROR_NO_MEM; } spin_unlock(&context->lock); out: vmci_ctx_put(context); return result; } /* * Remove remote_cid from current context's list of contexts it is * interested in getting notifications from/about. */ int vmci_ctx_remove_notification(u32 context_id, u32 remote_cid) { struct vmci_ctx *context; struct vmci_handle_list *notifier, *tmp; struct vmci_handle handle; bool found = false; context = vmci_ctx_get(context_id); if (!context) return VMCI_ERROR_NOT_FOUND; handle = vmci_make_handle(remote_cid, VMCI_EVENT_HANDLER); spin_lock(&context->lock); list_for_each_entry_safe(notifier, tmp, &context->notifier_list, node) { if (vmci_handle_is_equal(notifier->handle, handle)) { list_del_rcu(&notifier->node); context->n_notifiers--; found = true; break; } } spin_unlock(&context->lock); if (found) { synchronize_rcu(); kfree(notifier); } vmci_ctx_put(context); return found ? VMCI_SUCCESS : VMCI_ERROR_NOT_FOUND; } static int vmci_ctx_get_chkpt_notifiers(struct vmci_ctx *context, u32 *buf_size, void **pbuf) { u32 *notifiers; size_t data_size; struct vmci_handle_list *entry; int i = 0; if (context->n_notifiers == 0) { *buf_size = 0; *pbuf = NULL; return VMCI_SUCCESS; } data_size = context->n_notifiers * sizeof(*notifiers); if (*buf_size < data_size) { *buf_size = data_size; return VMCI_ERROR_MORE_DATA; } notifiers = kmalloc(data_size, GFP_ATOMIC); /* FIXME: want GFP_KERNEL */ if (!notifiers) return VMCI_ERROR_NO_MEM; list_for_each_entry(entry, &context->notifier_list, node) notifiers[i++] = entry->handle.context; *buf_size = data_size; *pbuf = notifiers; return VMCI_SUCCESS; } static int vmci_ctx_get_chkpt_doorbells(struct vmci_ctx *context, u32 *buf_size, void **pbuf) { struct dbell_cpt_state *dbells; u32 i, n_doorbells; n_doorbells = vmci_handle_arr_get_size(context->doorbell_array); if (n_doorbells > 0) { size_t data_size = n_doorbells * sizeof(*dbells); if (*buf_size < data_size) { *buf_size = data_size; return VMCI_ERROR_MORE_DATA; } dbells = kzalloc(data_size, GFP_ATOMIC); if (!dbells) return VMCI_ERROR_NO_MEM; for (i = 0; i < n_doorbells; i++) dbells[i].handle = vmci_handle_arr_get_entry( context->doorbell_array, i); *buf_size = data_size; *pbuf = dbells; } else { *buf_size = 0; *pbuf = NULL; } return VMCI_SUCCESS; } /* * Get current context's checkpoint state of given type. */ int vmci_ctx_get_chkpt_state(u32 context_id, u32 cpt_type, u32 *buf_size, void **pbuf) { struct vmci_ctx *context; int result; context = vmci_ctx_get(context_id); if (!context) return VMCI_ERROR_NOT_FOUND; spin_lock(&context->lock); switch (cpt_type) { case VMCI_NOTIFICATION_CPT_STATE: result = vmci_ctx_get_chkpt_notifiers(context, buf_size, pbuf); break; case VMCI_WELLKNOWN_CPT_STATE: /* * For compatibility with VMX'en with VM to VM communication, we * always return zero wellknown handles. */ *buf_size = 0; *pbuf = NULL; result = VMCI_SUCCESS; break; case VMCI_DOORBELL_CPT_STATE: result = vmci_ctx_get_chkpt_doorbells(context, buf_size, pbuf); break; default: pr_devel("Invalid cpt state (type=%d)\n", cpt_type); result = VMCI_ERROR_INVALID_ARGS; break; } spin_unlock(&context->lock); vmci_ctx_put(context); return result; } /* * Set current context's checkpoint state of given type. */ int vmci_ctx_set_chkpt_state(u32 context_id, u32 cpt_type, u32 buf_size, void *cpt_buf) { u32 i; u32 current_id; int result = VMCI_SUCCESS; u32 num_ids = buf_size / sizeof(u32); if (cpt_type == VMCI_WELLKNOWN_CPT_STATE && num_ids > 0) { /* * We would end up here if VMX with VM to VM communication * attempts to restore a checkpoint with wellknown handles. */ pr_warn("Attempt to restore checkpoint with obsolete wellknown handles\n"); return VMCI_ERROR_OBSOLETE; } if (cpt_type != VMCI_NOTIFICATION_CPT_STATE) { pr_devel("Invalid cpt state (type=%d)\n", cpt_type); return VMCI_ERROR_INVALID_ARGS; } for (i = 0; i < num_ids && result == VMCI_SUCCESS; i++) { current_id = ((u32 *)cpt_buf)[i]; result = vmci_ctx_add_notification(context_id, current_id); if (result != VMCI_SUCCESS) break; } if (result != VMCI_SUCCESS) pr_devel("Failed to set cpt state (type=%d) (error=%d)\n", cpt_type, result); return result; } /* * Retrieves the specified context's pending notifications in the * form of a handle array. The handle arrays returned are the * actual data - not a copy and should not be modified by the * caller. They must be released using * vmci_ctx_rcv_notifications_release. */ int vmci_ctx_rcv_notifications_get(u32 context_id, struct vmci_handle_arr **db_handle_array, struct vmci_handle_arr **qp_handle_array) { struct vmci_ctx *context; int result = VMCI_SUCCESS; context = vmci_ctx_get(context_id); if (context == NULL) return VMCI_ERROR_NOT_FOUND; spin_lock(&context->lock); *db_handle_array = context->pending_doorbell_array; context->pending_doorbell_array = vmci_handle_arr_create(0, VMCI_MAX_GUEST_DOORBELL_COUNT); if (!context->pending_doorbell_array) { context->pending_doorbell_array = *db_handle_array; *db_handle_array = NULL; result = VMCI_ERROR_NO_MEM; } *qp_handle_array = NULL; spin_unlock(&context->lock); vmci_ctx_put(context); return result; } /* * Releases handle arrays with pending notifications previously * retrieved using vmci_ctx_rcv_notifications_get. If the * notifications were not successfully handed over to the guest, * success must be false. */ void vmci_ctx_rcv_notifications_release(u32 context_id, struct vmci_handle_arr *db_handle_array, struct vmci_handle_arr *qp_handle_array, bool success) { struct vmci_ctx *context = vmci_ctx_get(context_id); spin_lock(&context->lock); if (!success) { struct vmci_handle handle; /* * New notifications may have been added while we were not * holding the context lock, so we transfer any new pending * doorbell notifications to the old array, and reinstate the * old array. */ handle = vmci_handle_arr_remove_tail( context->pending_doorbell_array); while (!vmci_handle_is_invalid(handle)) { if (!vmci_handle_arr_has_entry(db_handle_array, handle)) { vmci_handle_arr_append_entry( &db_handle_array, handle); } handle = vmci_handle_arr_remove_tail( context->pending_doorbell_array); } vmci_handle_arr_destroy(context->pending_doorbell_array); context->pending_doorbell_array = db_handle_array; db_handle_array = NULL; } else { ctx_clear_notify_call(context); } spin_unlock(&context->lock); vmci_ctx_put(context); if (db_handle_array) vmci_handle_arr_destroy(db_handle_array); if (qp_handle_array) vmci_handle_arr_destroy(qp_handle_array); } /* * Registers that a new doorbell handle has been allocated by the * context. Only doorbell handles registered can be notified. */ int vmci_ctx_dbell_create(u32 context_id, struct vmci_handle handle) { struct vmci_ctx *context; int result; if (context_id == VMCI_INVALID_ID || vmci_handle_is_invalid(handle)) return VMCI_ERROR_INVALID_ARGS; context = vmci_ctx_get(context_id); if (context == NULL) return VMCI_ERROR_NOT_FOUND; spin_lock(&context->lock); if (!vmci_handle_arr_has_entry(context->doorbell_array, handle)) result = vmci_handle_arr_append_entry(&context->doorbell_array, handle); else result = VMCI_ERROR_DUPLICATE_ENTRY; spin_unlock(&context->lock); vmci_ctx_put(context); return result; } /* * Unregisters a doorbell handle that was previously registered * with vmci_ctx_dbell_create. */ int vmci_ctx_dbell_destroy(u32 context_id, struct vmci_handle handle) { struct vmci_ctx *context; struct vmci_handle removed_handle; if (context_id == VMCI_INVALID_ID || vmci_handle_is_invalid(handle)) return VMCI_ERROR_INVALID_ARGS; context = vmci_ctx_get(context_id); if (context == NULL) return VMCI_ERROR_NOT_FOUND; spin_lock(&context->lock); removed_handle = vmci_handle_arr_remove_entry(context->doorbell_array, handle); vmci_handle_arr_remove_entry(context->pending_doorbell_array, handle); spin_unlock(&context->lock); vmci_ctx_put(context); return vmci_handle_is_invalid(removed_handle) ? VMCI_ERROR_NOT_FOUND : VMCI_SUCCESS; } /* * Unregisters all doorbell handles that were previously * registered with vmci_ctx_dbell_create. */ int vmci_ctx_dbell_destroy_all(u32 context_id) { struct vmci_ctx *context; struct vmci_handle handle; if (context_id == VMCI_INVALID_ID) return VMCI_ERROR_INVALID_ARGS; context = vmci_ctx_get(context_id); if (context == NULL) return VMCI_ERROR_NOT_FOUND; spin_lock(&context->lock); do { struct vmci_handle_arr *arr = context->doorbell_array; handle = vmci_handle_arr_remove_tail(arr); } while (!vmci_handle_is_invalid(handle)); do { struct vmci_handle_arr *arr = context->pending_doorbell_array; handle = vmci_handle_arr_remove_tail(arr); } while (!vmci_handle_is_invalid(handle)); spin_unlock(&context->lock); vmci_ctx_put(context); return VMCI_SUCCESS; } /* * Registers a notification of a doorbell handle initiated by the * specified source context. The notification of doorbells are * subject to the same isolation rules as datagram delivery. To * allow host side senders of notifications a finer granularity * of sender rights than those assigned to the sending context * itself, the host context is required to specify a different * set of privilege flags that will override the privileges of * the source context. */ int vmci_ctx_notify_dbell(u32 src_cid, struct vmci_handle handle, u32 src_priv_flags) { struct vmci_ctx *dst_context; int result; if (vmci_handle_is_invalid(handle)) return VMCI_ERROR_INVALID_ARGS; /* Get the target VM's VMCI context. */ dst_context = vmci_ctx_get(handle.context); if (!dst_context) { pr_devel("Invalid context (ID=0x%x)\n", handle.context); return VMCI_ERROR_NOT_FOUND; } if (src_cid != handle.context) { u32 dst_priv_flags; if (VMCI_CONTEXT_IS_VM(src_cid) && VMCI_CONTEXT_IS_VM(handle.context)) { pr_devel("Doorbell notification from VM to VM not supported (src=0x%x, dst=0x%x)\n", src_cid, handle.context); result = VMCI_ERROR_DST_UNREACHABLE; goto out; } result = vmci_dbell_get_priv_flags(handle, &dst_priv_flags); if (result < VMCI_SUCCESS) { pr_warn("Failed to get privilege flags for destination (handle=0x%x:0x%x)\n", handle.context, handle.resource); goto out; } if (src_cid != VMCI_HOST_CONTEXT_ID || src_priv_flags == VMCI_NO_PRIVILEGE_FLAGS) { src_priv_flags = vmci_context_get_priv_flags(src_cid); } if (vmci_deny_interaction(src_priv_flags, dst_priv_flags)) { result = VMCI_ERROR_NO_ACCESS; goto out; } } if (handle.context == VMCI_HOST_CONTEXT_ID) { result = vmci_dbell_host_context_notify(src_cid, handle); } else { spin_lock(&dst_context->lock); if (!vmci_handle_arr_has_entry(dst_context->doorbell_array, handle)) { result = VMCI_ERROR_NOT_FOUND; } else { if (!vmci_handle_arr_has_entry( dst_context->pending_doorbell_array, handle)) { result = vmci_handle_arr_append_entry( &dst_context->pending_doorbell_array, handle); if (result == VMCI_SUCCESS) { ctx_signal_notify(dst_context); wake_up(&dst_context->host_context.wait_queue); } } else { result = VMCI_SUCCESS; } } spin_unlock(&dst_context->lock); } out: vmci_ctx_put(dst_context); return result; } bool vmci_ctx_supports_host_qp(struct vmci_ctx *context) { return context && context->user_version >= VMCI_VERSION_HOSTQP; } /* * Registers that a new queue pair handle has been allocated by * the context. */ int vmci_ctx_qp_create(struct vmci_ctx *context, struct vmci_handle handle) { int result; if (context == NULL || vmci_handle_is_invalid(handle)) return VMCI_ERROR_INVALID_ARGS; if (!vmci_handle_arr_has_entry(context->queue_pair_array, handle)) result = vmci_handle_arr_append_entry( &context->queue_pair_array, handle); else result = VMCI_ERROR_DUPLICATE_ENTRY; return result; } /* * Unregisters a queue pair handle that was previously registered * with vmci_ctx_qp_create. */ int vmci_ctx_qp_destroy(struct vmci_ctx *context, struct vmci_handle handle) { struct vmci_handle hndl; if (context == NULL || vmci_handle_is_invalid(handle)) return VMCI_ERROR_INVALID_ARGS; hndl = vmci_handle_arr_remove_entry(context->queue_pair_array, handle); return vmci_handle_is_invalid(hndl) ? VMCI_ERROR_NOT_FOUND : VMCI_SUCCESS; } /* * Determines whether a given queue pair handle is registered * with the given context. */ bool vmci_ctx_qp_exists(struct vmci_ctx *context, struct vmci_handle handle) { if (context == NULL || vmci_handle_is_invalid(handle)) return false; return vmci_handle_arr_has_entry(context->queue_pair_array, handle); } /* * vmci_context_get_priv_flags() - Retrieve privilege flags. * @context_id: The context ID of the VMCI context. * * Retrieves privilege flags of the given VMCI context ID. */ u32 vmci_context_get_priv_flags(u32 context_id) { if (vmci_host_code_active()) { u32 flags; struct vmci_ctx *context; context = vmci_ctx_get(context_id); if (!context) return VMCI_LEAST_PRIVILEGE_FLAGS; flags = context->priv_flags; vmci_ctx_put(context); return flags; } return VMCI_NO_PRIVILEGE_FLAGS; } EXPORT_SYMBOL_GPL(vmci_context_get_priv_flags); /* * vmci_is_context_owner() - Determimnes if user is the context owner * @context_id: The context ID of the VMCI context. * @uid: The host user id (real kernel value). * * Determines whether a given UID is the owner of given VMCI context. */ bool vmci_is_context_owner(u32 context_id, kuid_t uid) { bool is_owner = false; if (vmci_host_code_active()) { struct vmci_ctx *context = vmci_ctx_get(context_id); if (context) { if (context->cred) is_owner = uid_eq(context->cred->uid, uid); vmci_ctx_put(context); } } return is_owner; } EXPORT_SYMBOL_GPL(vmci_is_context_owner);
3 3 2 2 1 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 /* * (C) 2000-2001 Svenning Soerensen <svenning@post5.tele.dk> * Copyright (c) 2011 Patrick McHardy <kaber@trash.net> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ #include <linux/ip.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/netdevice.h> #include <linux/ipv6.h> #include <linux/netfilter.h> #include <linux/netfilter_ipv4.h> #include <linux/netfilter_ipv6.h> #include <linux/netfilter/x_tables.h> #include <net/netfilter/nf_nat.h> static unsigned int netmap_tg6(struct sk_buff *skb, const struct xt_action_param *par) { const struct nf_nat_range2 *range = par->targinfo; struct nf_nat_range2 newrange; struct nf_conn *ct; enum ip_conntrack_info ctinfo; union nf_inet_addr new_addr, netmask; unsigned int i; ct = nf_ct_get(skb, &ctinfo); for (i = 0; i < ARRAY_SIZE(range->min_addr.ip6); i++) netmask.ip6[i] = ~(range->min_addr.ip6[i] ^ range->max_addr.ip6[i]); if (xt_hooknum(par) == NF_INET_PRE_ROUTING || xt_hooknum(par) == NF_INET_LOCAL_OUT) new_addr.in6 = ipv6_hdr(skb)->daddr; else new_addr.in6 = ipv6_hdr(skb)->saddr; for (i = 0; i < ARRAY_SIZE(new_addr.ip6); i++) { new_addr.ip6[i] &= ~netmask.ip6[i]; new_addr.ip6[i] |= range->min_addr.ip6[i] & netmask.ip6[i]; } newrange.flags = range->flags | NF_NAT_RANGE_MAP_IPS; newrange.min_addr = new_addr; newrange.max_addr = new_addr; newrange.min_proto = range->min_proto; newrange.max_proto = range->max_proto; return nf_nat_setup_info(ct, &newrange, HOOK2MANIP(xt_hooknum(par))); } static int netmap_tg6_checkentry(const struct xt_tgchk_param *par) { const struct nf_nat_range2 *range = par->targinfo; if (!(range->flags & NF_NAT_RANGE_MAP_IPS)) return -EINVAL; return nf_ct_netns_get(par->net, par->family); } static void netmap_tg_destroy(const struct xt_tgdtor_param *par) { nf_ct_netns_put(par->net, par->family); } static unsigned int netmap_tg4(struct sk_buff *skb, const struct xt_action_param *par) { struct nf_conn *ct; enum ip_conntrack_info ctinfo; __be32 new_ip, netmask; const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo; struct nf_nat_range2 newrange; WARN_ON(xt_hooknum(par) != NF_INET_PRE_ROUTING && xt_hooknum(par) != NF_INET_POST_ROUTING && xt_hooknum(par) != NF_INET_LOCAL_OUT && xt_hooknum(par) != NF_INET_LOCAL_IN); ct = nf_ct_get(skb, &ctinfo); netmask = ~(mr->range[0].min_ip ^ mr->range[0].max_ip); if (xt_hooknum(par) == NF_INET_PRE_ROUTING || xt_hooknum(par) == NF_INET_LOCAL_OUT) new_ip = ip_hdr(skb)->daddr & ~netmask; else new_ip = ip_hdr(skb)->saddr & ~netmask; new_ip |= mr->range[0].min_ip & netmask; memset(&newrange.min_addr, 0, sizeof(newrange.min_addr)); memset(&newrange.max_addr, 0, sizeof(newrange.max_addr)); newrange.flags = mr->range[0].flags | NF_NAT_RANGE_MAP_IPS; newrange.min_addr.ip = new_ip; newrange.max_addr.ip = new_ip; newrange.min_proto = mr->range[0].min; newrange.max_proto = mr->range[0].max; /* Hand modified range to generic setup. */ return nf_nat_setup_info(ct, &newrange, HOOK2MANIP(xt_hooknum(par))); } static int netmap_tg4_check(const struct xt_tgchk_param *par) { const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo; if (!(mr->range[0].flags & NF_NAT_RANGE_MAP_IPS)) { pr_debug("bad MAP_IPS.\n"); return -EINVAL; } if (mr->rangesize != 1) { pr_debug("bad rangesize %u.\n", mr->rangesize); return -EINVAL; } return nf_ct_netns_get(par->net, par->family); } static struct xt_target netmap_tg_reg[] __read_mostly = { { .name = "NETMAP", .family = NFPROTO_IPV6, .revision = 0, .target = netmap_tg6, .targetsize = sizeof(struct nf_nat_range), .table = "nat", .hooks = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_POST_ROUTING) | (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_LOCAL_IN), .checkentry = netmap_tg6_checkentry, .destroy = netmap_tg_destroy, .me = THIS_MODULE, }, { .name = "NETMAP", .family = NFPROTO_IPV4, .revision = 0, .target = netmap_tg4, .targetsize = sizeof(struct nf_nat_ipv4_multi_range_compat), .table = "nat", .hooks = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_POST_ROUTING) | (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_LOCAL_IN), .checkentry = netmap_tg4_check, .destroy = netmap_tg_destroy, .me = THIS_MODULE, }, }; static int __init netmap_tg_init(void) { return xt_register_targets(netmap_tg_reg, ARRAY_SIZE(netmap_tg_reg)); } static void netmap_tg_exit(void) { xt_unregister_targets(netmap_tg_reg, ARRAY_SIZE(netmap_tg_reg)); } module_init(netmap_tg_init); module_exit(netmap_tg_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Xtables: 1:1 NAT mapping of subnets"); MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); MODULE_ALIAS("ip6t_NETMAP"); MODULE_ALIAS("ipt_NETMAP");
37 37 37 347 232 231 232 231 232 231 231 231 1 1 13 311 311 307 307 29 29 309 4 313 4 4 4 4 4 4 1 1 4 4 4 4 1 1 1 4 3 1 1 1 3 3 9 4 4 4 3 31 11 4 3 4 2 3 3 3 3 1 3 3 3 3 3 11 29 18 18 18 4 4 4 4 29 12 4 4 7 5 8 8 3 8 7 8 20 20 4 2174 2174 477 115 115 311 311 307 305 30 29 29 9 29 28 28 28 30 19 20 20 20 20 6 4 4 4 6 2 2 3 7 6 1 35 15 15 6 12 31 231 231 72 71 71 69 71 60 66 8 6 18 16 15 15 8 7 35 33 2 34 31 30 3 2 1 4 3 2 8 7 6 4 4 52 8 8 8 10 2 3 1 6 4 3 3 29 29 29 1 2 1 1 1 1 1 3 3 3 3 3 42 42 42 41 42 11 31 31 31 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 /* * Linux IPv6 multicast routing support for BSD pim6sd * Based on net/ipv4/ipmr.c. * * (c) 2004 Mickael Hoerdt, <hoerdt@clarinet.u-strasbg.fr> * LSIIT Laboratory, Strasbourg, France * (c) 2004 Jean-Philippe Andriot, <jean-philippe.andriot@6WIND.com> * 6WIND, Paris, France * Copyright (C)2007,2008 USAGI/WIDE Project * YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * */ #include <linux/uaccess.h> #include <linux/types.h> #include <linux/sched.h> #include <linux/errno.h> #include <linux/mm.h> #include <linux/kernel.h> #include <linux/fcntl.h> #include <linux/stat.h> #include <linux/socket.h> #include <linux/inet.h> #include <linux/netdevice.h> #include <linux/inetdevice.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> #include <linux/init.h> #include <linux/compat.h> #include <linux/rhashtable.h> #include <net/protocol.h> #include <linux/skbuff.h> #include <net/raw.h> #include <linux/notifier.h> #include <linux/if_arp.h> #include <net/checksum.h> #include <net/netlink.h> #include <net/fib_rules.h> #include <net/ipv6.h> #include <net/ip6_route.h> #include <linux/mroute6.h> #include <linux/pim.h> #include <net/addrconf.h> #include <linux/netfilter_ipv6.h> #include <linux/export.h> #include <net/ip6_checksum.h> #include <linux/netconf.h> #include <net/ip_tunnels.h> #include <linux/nospec.h> struct ip6mr_rule { struct fib_rule common; }; struct ip6mr_result { struct mr_table *mrt; }; /* Big lock, protecting vif table, mrt cache and mroute socket state. Note that the changes are semaphored via rtnl_lock. */ static DEFINE_RWLOCK(mrt_lock); /* Multicast router control variables */ /* Special spinlock for queue of unresolved entries */ static DEFINE_SPINLOCK(mfc_unres_lock); /* We return to original Alan's scheme. Hash table of resolved entries is changed only in process context and protected with weak lock mrt_lock. Queue of unresolved entries is protected with strong spinlock mfc_unres_lock. In this case data path is free of exclusive locks at all. */ static struct kmem_cache *mrt_cachep __read_mostly; static struct mr_table *ip6mr_new_table(struct net *net, u32 id); static void ip6mr_free_table(struct mr_table *mrt); static void ip6_mr_forward(struct net *net, struct mr_table *mrt, struct sk_buff *skb, struct mfc6_cache *cache); static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt, mifi_t mifi, int assert); static void mr6_netlink_event(struct mr_table *mrt, struct mfc6_cache *mfc, int cmd); static void mrt6msg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt); static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb); static void mroute_clean_tables(struct mr_table *mrt, bool all); static void ipmr_expire_process(struct timer_list *t); #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES #define ip6mr_for_each_table(mrt, net) \ list_for_each_entry_rcu(mrt, &net->ipv6.mr6_tables, list) static struct mr_table *ip6mr_mr_table_iter(struct net *net, struct mr_table *mrt) { struct mr_table *ret; if (!mrt) ret = list_entry_rcu(net->ipv6.mr6_tables.next, struct mr_table, list); else ret = list_entry_rcu(mrt->list.next, struct mr_table, list); if (&ret->list == &net->ipv6.mr6_tables) return NULL; return ret; } static struct mr_table *ip6mr_get_table(struct net *net, u32 id) { struct mr_table *mrt; ip6mr_for_each_table(mrt, net) { if (mrt->id == id) return mrt; } return NULL; } static int ip6mr_fib_lookup(struct net *net, struct flowi6 *flp6, struct mr_table **mrt) { int err; struct ip6mr_result res; struct fib_lookup_arg arg = { .result = &res, .flags = FIB_LOOKUP_NOREF, }; err = fib_rules_lookup(net->ipv6.mr6_rules_ops, flowi6_to_flowi(flp6), 0, &arg); if (err < 0) return err; *mrt = res.mrt; return 0; } static int ip6mr_rule_action(struct fib_rule *rule, struct flowi *flp, int flags, struct fib_lookup_arg *arg) { struct ip6mr_result *res = arg->result; struct mr_table *mrt; switch (rule->action) { case FR_ACT_TO_TBL: break; case FR_ACT_UNREACHABLE: return -ENETUNREACH; case FR_ACT_PROHIBIT: return -EACCES; case FR_ACT_BLACKHOLE: default: return -EINVAL; } mrt = ip6mr_get_table(rule->fr_net, rule->table); if (!mrt) return -EAGAIN; res->mrt = mrt; return 0; } static int ip6mr_rule_match(struct fib_rule *rule, struct flowi *flp, int flags) { return 1; } static const struct nla_policy ip6mr_rule_policy[FRA_MAX + 1] = { FRA_GENERIC_POLICY, }; static int ip6mr_rule_configure(struct fib_rule *rule, struct sk_buff *skb, struct fib_rule_hdr *frh, struct nlattr **tb, struct netlink_ext_ack *extack) { return 0; } static int ip6mr_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh, struct nlattr **tb) { return 1; } static int ip6mr_rule_fill(struct fib_rule *rule, struct sk_buff *skb, struct fib_rule_hdr *frh) { frh->dst_len = 0; frh->src_len = 0; frh->tos = 0; return 0; } static const struct fib_rules_ops __net_initconst ip6mr_rules_ops_template = { .family = RTNL_FAMILY_IP6MR, .rule_size = sizeof(struct ip6mr_rule), .addr_size = sizeof(struct in6_addr), .action = ip6mr_rule_action, .match = ip6mr_rule_match, .configure = ip6mr_rule_configure, .compare = ip6mr_rule_compare, .fill = ip6mr_rule_fill, .nlgroup = RTNLGRP_IPV6_RULE, .policy = ip6mr_rule_policy, .owner = THIS_MODULE, }; static int __net_init ip6mr_rules_init(struct net *net) { struct fib_rules_ops *ops; struct mr_table *mrt; int err; ops = fib_rules_register(&ip6mr_rules_ops_template, net); if (IS_ERR(ops)) return PTR_ERR(ops); INIT_LIST_HEAD(&net->ipv6.mr6_tables); mrt = ip6mr_new_table(net, RT6_TABLE_DFLT); if (IS_ERR(mrt)) { err = PTR_ERR(mrt); goto err1; } err = fib_default_rule_add(ops, 0x7fff, RT6_TABLE_DFLT, 0); if (err < 0) goto err2; net->ipv6.mr6_rules_ops = ops; return 0; err2: rtnl_lock(); ip6mr_free_table(mrt); rtnl_unlock(); err1: fib_rules_unregister(ops); return err; } static void __net_exit ip6mr_rules_exit(struct net *net) { struct mr_table *mrt, *next; rtnl_lock(); list_for_each_entry_safe(mrt, next, &net->ipv6.mr6_tables, list) { list_del(&mrt->list); ip6mr_free_table(mrt); } fib_rules_unregister(net->ipv6.mr6_rules_ops); rtnl_unlock(); } static int ip6mr_rules_dump(struct net *net, struct notifier_block *nb) { return fib_rules_dump(net, nb, RTNL_FAMILY_IP6MR); } static unsigned int ip6mr_rules_seq_read(struct net *net) { return fib_rules_seq_read(net, RTNL_FAMILY_IP6MR); } bool ip6mr_rule_default(const struct fib_rule *rule) { return fib_rule_matchall(rule) && rule->action == FR_ACT_TO_TBL && rule->table == RT6_TABLE_DFLT && !rule->l3mdev; } EXPORT_SYMBOL(ip6mr_rule_default); #else #define ip6mr_for_each_table(mrt, net) \ for (mrt = net->ipv6.mrt6; mrt; mrt = NULL) static struct mr_table *ip6mr_mr_table_iter(struct net *net, struct mr_table *mrt) { if (!mrt) return net->ipv6.mrt6; return NULL; } static struct mr_table *ip6mr_get_table(struct net *net, u32 id) { return net->ipv6.mrt6; } static int ip6mr_fib_lookup(struct net *net, struct flowi6 *flp6, struct mr_table **mrt) { *mrt = net->ipv6.mrt6; return 0; } static int __net_init ip6mr_rules_init(struct net *net) { struct mr_table *mrt; mrt = ip6mr_new_table(net, RT6_TABLE_DFLT); if (IS_ERR(mrt)) return PTR_ERR(mrt); net->ipv6.mrt6 = mrt; return 0; } static void __net_exit ip6mr_rules_exit(struct net *net) { rtnl_lock(); ip6mr_free_table(net->ipv6.mrt6); net->ipv6.mrt6 = NULL; rtnl_unlock(); } static int ip6mr_rules_dump(struct net *net, struct notifier_block *nb) { return 0; } static unsigned int ip6mr_rules_seq_read(struct net *net) { return 0; } #endif static int ip6mr_hash_cmp(struct rhashtable_compare_arg *arg, const void *ptr) { const struct mfc6_cache_cmp_arg *cmparg = arg->key; struct mfc6_cache *c = (struct mfc6_cache *)ptr; return !ipv6_addr_equal(&c->mf6c_mcastgrp, &cmparg->mf6c_mcastgrp) || !ipv6_addr_equal(&c->mf6c_origin, &cmparg->mf6c_origin); } static const struct rhashtable_params ip6mr_rht_params = { .head_offset = offsetof(struct mr_mfc, mnode), .key_offset = offsetof(struct mfc6_cache, cmparg), .key_len = sizeof(struct mfc6_cache_cmp_arg), .nelem_hint = 3, .locks_mul = 1, .obj_cmpfn = ip6mr_hash_cmp, .automatic_shrinking = true, }; static void ip6mr_new_table_set(struct mr_table *mrt, struct net *net) { #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES list_add_tail_rcu(&mrt->list, &net->ipv6.mr6_tables); #endif } static struct mfc6_cache_cmp_arg ip6mr_mr_table_ops_cmparg_any = { .mf6c_origin = IN6ADDR_ANY_INIT, .mf6c_mcastgrp = IN6ADDR_ANY_INIT, }; static struct mr_table_ops ip6mr_mr_table_ops = { .rht_params = &ip6mr_rht_params, .cmparg_any = &ip6mr_mr_table_ops_cmparg_any, }; static struct mr_table *ip6mr_new_table(struct net *net, u32 id) { struct mr_table *mrt; mrt = ip6mr_get_table(net, id); if (mrt) return mrt; return mr_table_alloc(net, id, &ip6mr_mr_table_ops, ipmr_expire_process, ip6mr_new_table_set); } static void ip6mr_free_table(struct mr_table *mrt) { del_timer_sync(&mrt->ipmr_expire_timer); mroute_clean_tables(mrt, true); rhltable_destroy(&mrt->mfc_hash); kfree(mrt); } #ifdef CONFIG_PROC_FS /* The /proc interfaces to multicast routing * /proc/ip6_mr_cache /proc/ip6_mr_vif */ static void *ip6mr_vif_seq_start(struct seq_file *seq, loff_t *pos) __acquires(mrt_lock) { struct mr_vif_iter *iter = seq->private; struct net *net = seq_file_net(seq); struct mr_table *mrt; mrt = ip6mr_get_table(net, RT6_TABLE_DFLT); if (!mrt) return ERR_PTR(-ENOENT); iter->mrt = mrt; read_lock(&mrt_lock); return mr_vif_seq_start(seq, pos); } static void ip6mr_vif_seq_stop(struct seq_file *seq, void *v) __releases(mrt_lock) { read_unlock(&mrt_lock); } static int ip6mr_vif_seq_show(struct seq_file *seq, void *v) { struct mr_vif_iter *iter = seq->private; struct mr_table *mrt = iter->mrt; if (v == SEQ_START_TOKEN) { seq_puts(seq, "Interface BytesIn PktsIn BytesOut PktsOut Flags\n"); } else { const struct vif_device *vif = v; const char *name = vif->dev ? vif->dev->name : "none"; seq_printf(seq, "%2td %-10s %8ld %7ld %8ld %7ld %05X\n", vif - mrt->vif_table, name, vif->bytes_in, vif->pkt_in, vif->bytes_out, vif->pkt_out, vif->flags); } return 0; } static const struct seq_operations ip6mr_vif_seq_ops = { .start = ip6mr_vif_seq_start, .next = mr_vif_seq_next, .stop = ip6mr_vif_seq_stop, .show = ip6mr_vif_seq_show, }; static void *ipmr_mfc_seq_start(struct seq_file *seq, loff_t *pos) { struct net *net = seq_file_net(seq); struct mr_table *mrt; mrt = ip6mr_get_table(net, RT6_TABLE_DFLT); if (!mrt) return ERR_PTR(-ENOENT); return mr_mfc_seq_start(seq, pos, mrt, &mfc_unres_lock); } static int ipmr_mfc_seq_show(struct seq_file *seq, void *v) { int n; if (v == SEQ_START_TOKEN) { seq_puts(seq, "Group " "Origin " "Iif Pkts Bytes Wrong Oifs\n"); } else { const struct mfc6_cache *mfc = v; const struct mr_mfc_iter *it = seq->private; struct mr_table *mrt = it->mrt; seq_printf(seq, "%pI6 %pI6 %-3hd", &mfc->mf6c_mcastgrp, &mfc->mf6c_origin, mfc->_c.mfc_parent); if (it->cache != &mrt->mfc_unres_queue) { seq_printf(seq, " %8lu %8lu %8lu", mfc->_c.mfc_un.res.pkt, mfc->_c.mfc_un.res.bytes, mfc->_c.mfc_un.res.wrong_if); for (n = mfc->_c.mfc_un.res.minvif; n < mfc->_c.mfc_un.res.maxvif; n++) { if (VIF_EXISTS(mrt, n) && mfc->_c.mfc_un.res.ttls[n] < 255) seq_printf(seq, " %2d:%-3d", n, mfc->_c.mfc_un.res.ttls[n]); } } else { /* unresolved mfc_caches don't contain * pkt, bytes and wrong_if values */ seq_printf(seq, " %8lu %8lu %8lu", 0ul, 0ul, 0ul); } seq_putc(seq, '\n'); } return 0; } static const struct seq_operations ipmr_mfc_seq_ops = { .start = ipmr_mfc_seq_start, .next = mr_mfc_seq_next, .stop = mr_mfc_seq_stop, .show = ipmr_mfc_seq_show, }; #endif #ifdef CONFIG_IPV6_PIMSM_V2 static int pim6_rcv(struct sk_buff *skb) { struct pimreghdr *pim; struct ipv6hdr *encap; struct net_device *reg_dev = NULL; struct net *net = dev_net(skb->dev); struct mr_table *mrt; struct flowi6 fl6 = { .flowi6_iif = skb->dev->ifindex, .flowi6_mark = skb->mark, }; int reg_vif_num; if (!pskb_may_pull(skb, sizeof(*pim) + sizeof(*encap))) goto drop; pim = (struct pimreghdr *)skb_transport_header(skb); if (pim->type != ((PIM_VERSION << 4) | PIM_TYPE_REGISTER) || (pim->flags & PIM_NULL_REGISTER) || (csum_ipv6_magic(&ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr, sizeof(*pim), IPPROTO_PIM, csum_partial((void *)pim, sizeof(*pim), 0)) && csum_fold(skb_checksum(skb, 0, skb->len, 0)))) goto drop; /* check if the inner packet is destined to mcast group */ encap = (struct ipv6hdr *)(skb_transport_header(skb) + sizeof(*pim)); if (!ipv6_addr_is_multicast(&encap->daddr) || encap->payload_len == 0 || ntohs(encap->payload_len) + sizeof(*pim) > skb->len) goto drop; if (ip6mr_fib_lookup(net, &fl6, &mrt) < 0) goto drop; reg_vif_num = mrt->mroute_reg_vif_num; read_lock(&mrt_lock); if (reg_vif_num >= 0) reg_dev = mrt->vif_table[reg_vif_num].dev; if (reg_dev) dev_hold(reg_dev); read_unlock(&mrt_lock); if (!reg_dev) goto drop; skb->mac_header = skb->network_header; skb_pull(skb, (u8 *)encap - skb->data); skb_reset_network_header(skb); skb->protocol = htons(ETH_P_IPV6); skb->ip_summed = CHECKSUM_NONE; skb_tunnel_rx(skb, reg_dev, dev_net(reg_dev)); netif_rx(skb); dev_put(reg_dev); return 0; drop: kfree_skb(skb); return 0; } static const struct inet6_protocol pim6_protocol = { .handler = pim6_rcv, }; /* Service routines creating virtual interfaces: PIMREG */ static netdev_tx_t reg_vif_xmit(struct sk_buff *skb, struct net_device *dev) { struct net *net = dev_net(dev); struct mr_table *mrt; struct flowi6 fl6 = { .flowi6_oif = dev->ifindex, .flowi6_iif = skb->skb_iif ? : LOOPBACK_IFINDEX, .flowi6_mark = skb->mark, }; if (!pskb_inet_may_pull(skb)) goto tx_err; if (ip6mr_fib_lookup(net, &fl6, &mrt) < 0) goto tx_err; read_lock(&mrt_lock); dev->stats.tx_bytes += skb->len; dev->stats.tx_packets++; ip6mr_cache_report(mrt, skb, mrt->mroute_reg_vif_num, MRT6MSG_WHOLEPKT); read_unlock(&mrt_lock); kfree_skb(skb); return NETDEV_TX_OK; tx_err: dev->stats.tx_errors++; kfree_skb(skb); return NETDEV_TX_OK; } static int reg_vif_get_iflink(const struct net_device *dev) { return 0; } static const struct net_device_ops reg_vif_netdev_ops = { .ndo_start_xmit = reg_vif_xmit, .ndo_get_iflink = reg_vif_get_iflink, }; static void reg_vif_setup(struct net_device *dev) { dev->type = ARPHRD_PIMREG; dev->mtu = 1500 - sizeof(struct ipv6hdr) - 8; dev->flags = IFF_NOARP; dev->netdev_ops = &reg_vif_netdev_ops; dev->needs_free_netdev = true; dev->features |= NETIF_F_NETNS_LOCAL; } static struct net_device *ip6mr_reg_vif(struct net *net, struct mr_table *mrt) { struct net_device *dev; char name[IFNAMSIZ]; if (mrt->id == RT6_TABLE_DFLT) sprintf(name, "pim6reg"); else sprintf(name, "pim6reg%u", mrt->id); dev = alloc_netdev(0, name, NET_NAME_UNKNOWN, reg_vif_setup); if (!dev) return NULL; dev_net_set(dev, net); if (register_netdevice(dev)) { free_netdev(dev); return NULL; } if (dev_open(dev)) goto failure; dev_hold(dev); return dev; failure: unregister_netdevice(dev); return NULL; } #endif static int call_ip6mr_vif_entry_notifiers(struct net *net, enum fib_event_type event_type, struct vif_device *vif, mifi_t vif_index, u32 tb_id) { return mr_call_vif_notifiers(net, RTNL_FAMILY_IP6MR, event_type, vif, vif_index, tb_id, &net->ipv6.ipmr_seq); } static int call_ip6mr_mfc_entry_notifiers(struct net *net, enum fib_event_type event_type, struct mfc6_cache *mfc, u32 tb_id) { return mr_call_mfc_notifiers(net, RTNL_FAMILY_IP6MR, event_type, &mfc->_c, tb_id, &net->ipv6.ipmr_seq); } /* Delete a VIF entry */ static int mif6_delete(struct mr_table *mrt, int vifi, int notify, struct list_head *head) { struct vif_device *v; struct net_device *dev; struct inet6_dev *in6_dev; if (vifi < 0 || vifi >= mrt->maxvif) return -EADDRNOTAVAIL; v = &mrt->vif_table[vifi]; if (VIF_EXISTS(mrt, vifi)) call_ip6mr_vif_entry_notifiers(read_pnet(&mrt->net), FIB_EVENT_VIF_DEL, v, vifi, mrt->id); write_lock_bh(&mrt_lock); dev = v->dev; v->dev = NULL; if (!dev) { write_unlock_bh(&mrt_lock); return -EADDRNOTAVAIL; } #ifdef CONFIG_IPV6_PIMSM_V2 if (vifi == mrt->mroute_reg_vif_num) mrt->mroute_reg_vif_num = -1; #endif if (vifi + 1 == mrt->maxvif) { int tmp; for (tmp = vifi - 1; tmp >= 0; tmp--) { if (VIF_EXISTS(mrt, tmp)) break; } mrt->maxvif = tmp + 1; } write_unlock_bh(&mrt_lock); dev_set_allmulti(dev, -1); in6_dev = __in6_dev_get(dev); if (in6_dev) { in6_dev->cnf.mc_forwarding--; inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, NETCONFA_MC_FORWARDING, dev->ifindex, &in6_dev->cnf); } if ((v->flags & MIFF_REGISTER) && !notify) unregister_netdevice_queue(dev, head); dev_put(dev); return 0; } static inline void ip6mr_cache_free_rcu(struct rcu_head *head) { struct mr_mfc *c = container_of(head, struct mr_mfc, rcu); kmem_cache_free(mrt_cachep, (struct mfc6_cache *)c); } static inline void ip6mr_cache_free(struct mfc6_cache *c) { call_rcu(&c->_c.rcu, ip6mr_cache_free_rcu); } /* Destroy an unresolved cache entry, killing queued skbs and reporting error to netlink readers. */ static void ip6mr_destroy_unres(struct mr_table *mrt, struct mfc6_cache *c) { struct net *net = read_pnet(&mrt->net); struct sk_buff *skb; atomic_dec(&mrt->cache_resolve_queue_len); while ((skb = skb_dequeue(&c->_c.mfc_un.unres.unresolved)) != NULL) { if (ipv6_hdr(skb)->version == 0) { struct nlmsghdr *nlh = skb_pull(skb, sizeof(struct ipv6hdr)); nlh->nlmsg_type = NLMSG_ERROR; nlh->nlmsg_len = nlmsg_msg_size(sizeof(struct nlmsgerr)); skb_trim(skb, nlh->nlmsg_len); ((struct nlmsgerr *)nlmsg_data(nlh))->error = -ETIMEDOUT; rtnl_unicast(skb, net, NETLINK_CB(skb).portid); } else kfree_skb(skb); } ip6mr_cache_free(c); } /* Timer process for all the unresolved queue. */ static void ipmr_do_expire_process(struct mr_table *mrt) { unsigned long now = jiffies; unsigned long expires = 10 * HZ; struct mr_mfc *c, *next; list_for_each_entry_safe(c, next, &mrt->mfc_unres_queue, list) { if (time_after(c->mfc_un.unres.expires, now)) { /* not yet... */ unsigned long interval = c->mfc_un.unres.expires - now; if (interval < expires) expires = interval; continue; } list_del(&c->list); mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE); ip6mr_destroy_unres(mrt, (struct mfc6_cache *)c); } if (!list_empty(&mrt->mfc_unres_queue)) mod_timer(&mrt->ipmr_expire_timer, jiffies + expires); } static void ipmr_expire_process(struct timer_list *t) { struct mr_table *mrt = from_timer(mrt, t, ipmr_expire_timer); if (!spin_trylock(&mfc_unres_lock)) { mod_timer(&mrt->ipmr_expire_timer, jiffies + 1); return; } if (!list_empty(&mrt->mfc_unres_queue)) ipmr_do_expire_process(mrt); spin_unlock(&mfc_unres_lock); } /* Fill oifs list. It is called under write locked mrt_lock. */ static void ip6mr_update_thresholds(struct mr_table *mrt, struct mr_mfc *cache, unsigned char *ttls) { int vifi; cache->mfc_un.res.minvif = MAXMIFS; cache->mfc_un.res.maxvif = 0; memset(cache->mfc_un.res.ttls, 255, MAXMIFS); for (vifi = 0; vifi < mrt->maxvif; vifi++) { if (VIF_EXISTS(mrt, vifi) && ttls[vifi] && ttls[vifi] < 255) { cache->mfc_un.res.ttls[vifi] = ttls[vifi]; if (cache->mfc_un.res.minvif > vifi) cache->mfc_un.res.minvif = vifi; if (cache->mfc_un.res.maxvif <= vifi) cache->mfc_un.res.maxvif = vifi + 1; } } cache->mfc_un.res.lastuse = jiffies; } static int mif6_add(struct net *net, struct mr_table *mrt, struct mif6ctl *vifc, int mrtsock) { int vifi = vifc->mif6c_mifi; struct vif_device *v = &mrt->vif_table[vifi]; struct net_device *dev; struct inet6_dev *in6_dev; int err; /* Is vif busy ? */ if (VIF_EXISTS(mrt, vifi)) return -EADDRINUSE; switch (vifc->mif6c_flags) { #ifdef CONFIG_IPV6_PIMSM_V2 case MIFF_REGISTER: /* * Special Purpose VIF in PIM * All the packets will be sent to the daemon */ if (mrt->mroute_reg_vif_num >= 0) return -EADDRINUSE; dev = ip6mr_reg_vif(net, mrt); if (!dev) return -ENOBUFS; err = dev_set_allmulti(dev, 1); if (err) { unregister_netdevice(dev); dev_put(dev); return err; } break; #endif case 0: dev = dev_get_by_index(net, vifc->mif6c_pifi); if (!dev) return -EADDRNOTAVAIL; err = dev_set_allmulti(dev, 1); if (err) { dev_put(dev); return err; } break; default: return -EINVAL; } in6_dev = __in6_dev_get(dev); if (in6_dev) { in6_dev->cnf.mc_forwarding++; inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, NETCONFA_MC_FORWARDING, dev->ifindex, &in6_dev->cnf); } /* Fill in the VIF structures */ vif_device_init(v, dev, vifc->vifc_rate_limit, vifc->vifc_threshold, vifc->mif6c_flags | (!mrtsock ? VIFF_STATIC : 0), MIFF_REGISTER); /* And finish update writing critical data */ write_lock_bh(&mrt_lock); v->dev = dev; #ifdef CONFIG_IPV6_PIMSM_V2 if (v->flags & MIFF_REGISTER) mrt->mroute_reg_vif_num = vifi; #endif if (vifi + 1 > mrt->maxvif) mrt->maxvif = vifi + 1; write_unlock_bh(&mrt_lock); call_ip6mr_vif_entry_notifiers(net, FIB_EVENT_VIF_ADD, v, vifi, mrt->id); return 0; } static struct mfc6_cache *ip6mr_cache_find(struct mr_table *mrt, const struct in6_addr *origin, const struct in6_addr *mcastgrp) { struct mfc6_cache_cmp_arg arg = { .mf6c_origin = *origin, .mf6c_mcastgrp = *mcastgrp, }; return mr_mfc_find(mrt, &arg); } /* Look for a (*,G) entry */ static struct mfc6_cache *ip6mr_cache_find_any(struct mr_table *mrt, struct in6_addr *mcastgrp, mifi_t mifi) { struct mfc6_cache_cmp_arg arg = { .mf6c_origin = in6addr_any, .mf6c_mcastgrp = *mcastgrp, }; if (ipv6_addr_any(mcastgrp)) return mr_mfc_find_any_parent(mrt, mifi); return mr_mfc_find_any(mrt, mifi, &arg); } /* Look for a (S,G,iif) entry if parent != -1 */ static struct mfc6_cache * ip6mr_cache_find_parent(struct mr_table *mrt, const struct in6_addr *origin, const struct in6_addr *mcastgrp, int parent) { struct mfc6_cache_cmp_arg arg = { .mf6c_origin = *origin, .mf6c_mcastgrp = *mcastgrp, }; return mr_mfc_find_parent(mrt, &arg, parent); } /* Allocate a multicast cache entry */ static struct mfc6_cache *ip6mr_cache_alloc(void) { struct mfc6_cache *c = kmem_cache_zalloc(mrt_cachep, GFP_KERNEL); if (!c) return NULL; c->_c.mfc_un.res.last_assert = jiffies - MFC_ASSERT_THRESH - 1; c->_c.mfc_un.res.minvif = MAXMIFS; c->_c.free = ip6mr_cache_free_rcu; refcount_set(&c->_c.mfc_un.res.refcount, 1); return c; } static struct mfc6_cache *ip6mr_cache_alloc_unres(void) { struct mfc6_cache *c = kmem_cache_zalloc(mrt_cachep, GFP_ATOMIC); if (!c) return NULL; skb_queue_head_init(&c->_c.mfc_un.unres.unresolved); c->_c.mfc_un.unres.expires = jiffies + 10 * HZ; return c; } /* * A cache entry has gone into a resolved state from queued */ static void ip6mr_cache_resolve(struct net *net, struct mr_table *mrt, struct mfc6_cache *uc, struct mfc6_cache *c) { struct sk_buff *skb; /* * Play the pending entries through our router */ while ((skb = __skb_dequeue(&uc->_c.mfc_un.unres.unresolved))) { if (ipv6_hdr(skb)->version == 0) { struct nlmsghdr *nlh = skb_pull(skb, sizeof(struct ipv6hdr)); if (mr_fill_mroute(mrt, skb, &c->_c, nlmsg_data(nlh)) > 0) { nlh->nlmsg_len = skb_tail_pointer(skb) - (u8 *)nlh; } else { nlh->nlmsg_type = NLMSG_ERROR; nlh->nlmsg_len = nlmsg_msg_size(sizeof(struct nlmsgerr)); skb_trim(skb, nlh->nlmsg_len); ((struct nlmsgerr *)nlmsg_data(nlh))->error = -EMSGSIZE; } rtnl_unicast(skb, net, NETLINK_CB(skb).portid); } else ip6_mr_forward(net, mrt, skb, c); } } /* * Bounce a cache query up to pim6sd and netlink. * * Called under mrt_lock. */ static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt, mifi_t mifi, int assert) { struct sock *mroute6_sk; struct sk_buff *skb; struct mrt6msg *msg; int ret; #ifdef CONFIG_IPV6_PIMSM_V2 if (assert == MRT6MSG_WHOLEPKT) skb = skb_realloc_headroom(pkt, -skb_network_offset(pkt) +sizeof(*msg)); else #endif skb = alloc_skb(sizeof(struct ipv6hdr) + sizeof(*msg), GFP_ATOMIC); if (!skb) return -ENOBUFS; /* I suppose that internal messages * do not require checksums */ skb->ip_summed = CHECKSUM_UNNECESSARY; #ifdef CONFIG_IPV6_PIMSM_V2 if (assert == MRT6MSG_WHOLEPKT) { /* Ugly, but we have no choice with this interface. Duplicate old header, fix length etc. And all this only to mangle msg->im6_msgtype and to set msg->im6_mbz to "mbz" :-) */ skb_push(skb, -skb_network_offset(pkt)); skb_push(skb, sizeof(*msg)); skb_reset_transport_header(skb); msg = (struct mrt6msg *)skb_transport_header(skb); msg->im6_mbz = 0; msg->im6_msgtype = MRT6MSG_WHOLEPKT; msg->im6_mif = mrt->mroute_reg_vif_num; msg->im6_pad = 0; msg->im6_src = ipv6_hdr(pkt)->saddr; msg->im6_dst = ipv6_hdr(pkt)->daddr; skb->ip_summed = CHECKSUM_UNNECESSARY; } else #endif { /* * Copy the IP header */ skb_put(skb, sizeof(struct ipv6hdr)); skb_reset_network_header(skb); skb_copy_to_linear_data(skb, ipv6_hdr(pkt), sizeof(struct ipv6hdr)); /* * Add our header */ skb_put(skb, sizeof(*msg)); skb_reset_transport_header(skb); msg = (struct mrt6msg *)skb_transport_header(skb); msg->im6_mbz = 0; msg->im6_msgtype = assert; msg->im6_mif = mifi; msg->im6_pad = 0; msg->im6_src = ipv6_hdr(pkt)->saddr; msg->im6_dst = ipv6_hdr(pkt)->daddr; skb_dst_set(skb, dst_clone(skb_dst(pkt))); skb->ip_summed = CHECKSUM_UNNECESSARY; } rcu_read_lock(); mroute6_sk = rcu_dereference(mrt->mroute_sk); if (!mroute6_sk) { rcu_read_unlock(); kfree_skb(skb); return -EINVAL; } mrt6msg_netlink_event(mrt, skb); /* Deliver to user space multicast routing algorithms */ ret = sock_queue_rcv_skb(mroute6_sk, skb); rcu_read_unlock(); if (ret < 0) { net_warn_ratelimited("mroute6: pending queue full, dropping entries\n"); kfree_skb(skb); } return ret; } /* Queue a packet for resolution. It gets locked cache entry! */ static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi, struct sk_buff *skb) { struct mfc6_cache *c; bool found = false; int err; spin_lock_bh(&mfc_unres_lock); list_for_each_entry(c, &mrt->mfc_unres_queue, _c.list) { if (ipv6_addr_equal(&c->mf6c_mcastgrp, &ipv6_hdr(skb)->daddr) && ipv6_addr_equal(&c->mf6c_origin, &ipv6_hdr(skb)->saddr)) { found = true; break; } } if (!found) { /* * Create a new entry if allowable */ if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 || (c = ip6mr_cache_alloc_unres()) == NULL) { spin_unlock_bh(&mfc_unres_lock); kfree_skb(skb); return -ENOBUFS; } /* Fill in the new cache entry */ c->_c.mfc_parent = -1; c->mf6c_origin = ipv6_hdr(skb)->saddr; c->mf6c_mcastgrp = ipv6_hdr(skb)->daddr; /* * Reflect first query at pim6sd */ err = ip6mr_cache_report(mrt, skb, mifi, MRT6MSG_NOCACHE); if (err < 0) { /* If the report failed throw the cache entry out - Brad Parker */ spin_unlock_bh(&mfc_unres_lock); ip6mr_cache_free(c); kfree_skb(skb); return err; } atomic_inc(&mrt->cache_resolve_queue_len); list_add(&c->_c.list, &mrt->mfc_unres_queue); mr6_netlink_event(mrt, c, RTM_NEWROUTE); ipmr_do_expire_process(mrt); } /* See if we can append the packet */ if (c->_c.mfc_un.unres.unresolved.qlen > 3) { kfree_skb(skb); err = -ENOBUFS; } else { skb_queue_tail(&c->_c.mfc_un.unres.unresolved, skb); err = 0; } spin_unlock_bh(&mfc_unres_lock); return err; } /* * MFC6 cache manipulation by user space */ static int ip6mr_mfc_delete(struct mr_table *mrt, struct mf6cctl *mfc, int parent) { struct mfc6_cache *c; /* The entries are added/deleted only under RTNL */ rcu_read_lock(); c = ip6mr_cache_find_parent(mrt, &mfc->mf6cc_origin.sin6_addr, &mfc->mf6cc_mcastgrp.sin6_addr, parent); rcu_read_unlock(); if (!c) return -ENOENT; rhltable_remove(&mrt->mfc_hash, &c->_c.mnode, ip6mr_rht_params); list_del_rcu(&c->_c.list); call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net), FIB_EVENT_ENTRY_DEL, c, mrt->id); mr6_netlink_event(mrt, c, RTM_DELROUTE); mr_cache_put(&c->_c); return 0; } static int ip6mr_device_event(struct notifier_block *this, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct net *net = dev_net(dev); struct mr_table *mrt; struct vif_device *v; int ct; if (event != NETDEV_UNREGISTER) return NOTIFY_DONE; ip6mr_for_each_table(mrt, net) { v = &mrt->vif_table[0]; for (ct = 0; ct < mrt->maxvif; ct++, v++) { if (v->dev == dev) mif6_delete(mrt, ct, 1, NULL); } } return NOTIFY_DONE; } static unsigned int ip6mr_seq_read(struct net *net) { ASSERT_RTNL(); return net->ipv6.ipmr_seq + ip6mr_rules_seq_read(net); } static int ip6mr_dump(struct net *net, struct notifier_block *nb) { return mr_dump(net, nb, RTNL_FAMILY_IP6MR, ip6mr_rules_dump, ip6mr_mr_table_iter, &mrt_lock); } static struct notifier_block ip6_mr_notifier = { .notifier_call = ip6mr_device_event }; static const struct fib_notifier_ops ip6mr_notifier_ops_template = { .family = RTNL_FAMILY_IP6MR, .fib_seq_read = ip6mr_seq_read, .fib_dump = ip6mr_dump, .owner = THIS_MODULE, }; static int __net_init ip6mr_notifier_init(struct net *net) { struct fib_notifier_ops *ops; net->ipv6.ipmr_seq = 0; ops = fib_notifier_ops_register(&ip6mr_notifier_ops_template, net); if (IS_ERR(ops)) return PTR_ERR(ops); net->ipv6.ip6mr_notifier_ops = ops; return 0; } static void __net_exit ip6mr_notifier_exit(struct net *net) { fib_notifier_ops_unregister(net->ipv6.ip6mr_notifier_ops); net->ipv6.ip6mr_notifier_ops = NULL; } /* Setup for IP multicast routing */ static int __net_init ip6mr_net_init(struct net *net) { int err; err = ip6mr_notifier_init(net); if (err) return err; err = ip6mr_rules_init(net); if (err < 0) goto ip6mr_rules_fail; #ifdef CONFIG_PROC_FS err = -ENOMEM; if (!proc_create_net("ip6_mr_vif", 0, net->proc_net, &ip6mr_vif_seq_ops, sizeof(struct mr_vif_iter))) goto proc_vif_fail; if (!proc_create_net("ip6_mr_cache", 0, net->proc_net, &ipmr_mfc_seq_ops, sizeof(struct mr_mfc_iter))) goto proc_cache_fail; #endif return 0; #ifdef CONFIG_PROC_FS proc_cache_fail: remove_proc_entry("ip6_mr_vif", net->proc_net); proc_vif_fail: ip6mr_rules_exit(net); #endif ip6mr_rules_fail: ip6mr_notifier_exit(net); return err; } static void __net_exit ip6mr_net_exit(struct net *net) { #ifdef CONFIG_PROC_FS remove_proc_entry("ip6_mr_cache", net->proc_net); remove_proc_entry("ip6_mr_vif", net->proc_net); #endif ip6mr_rules_exit(net); ip6mr_notifier_exit(net); } static struct pernet_operations ip6mr_net_ops = { .init = ip6mr_net_init, .exit = ip6mr_net_exit, }; int __init ip6_mr_init(void) { int err; mrt_cachep = kmem_cache_create("ip6_mrt_cache", sizeof(struct mfc6_cache), 0, SLAB_HWCACHE_ALIGN, NULL); if (!mrt_cachep) return -ENOMEM; err = register_pernet_subsys(&ip6mr_net_ops); if (err) goto reg_pernet_fail; err = register_netdevice_notifier(&ip6_mr_notifier); if (err) goto reg_notif_fail; #ifdef CONFIG_IPV6_PIMSM_V2 if (inet6_add_protocol(&pim6_protocol, IPPROTO_PIM) < 0) { pr_err("%s: can't add PIM protocol\n", __func__); err = -EAGAIN; goto add_proto_fail; } #endif err = rtnl_register_module(THIS_MODULE, RTNL_FAMILY_IP6MR, RTM_GETROUTE, NULL, ip6mr_rtm_dumproute, 0); if (err == 0) return 0; #ifdef CONFIG_IPV6_PIMSM_V2 inet6_del_protocol(&pim6_protocol, IPPROTO_PIM); add_proto_fail: unregister_netdevice_notifier(&ip6_mr_notifier); #endif reg_notif_fail: unregister_pernet_subsys(&ip6mr_net_ops); reg_pernet_fail: kmem_cache_destroy(mrt_cachep); return err; } void ip6_mr_cleanup(void) { rtnl_unregister(RTNL_FAMILY_IP6MR, RTM_GETROUTE); #ifdef CONFIG_IPV6_PIMSM_V2 inet6_del_protocol(&pim6_protocol, IPPROTO_PIM); #endif unregister_netdevice_notifier(&ip6_mr_notifier); unregister_pernet_subsys(&ip6mr_net_ops); kmem_cache_destroy(mrt_cachep); } static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt, struct mf6cctl *mfc, int mrtsock, int parent) { unsigned char ttls[MAXMIFS]; struct mfc6_cache *uc, *c; struct mr_mfc *_uc; bool found; int i, err; if (mfc->mf6cc_parent >= MAXMIFS) return -ENFILE; memset(ttls, 255, MAXMIFS); for (i = 0; i < MAXMIFS; i++) { if (IF_ISSET(i, &mfc->mf6cc_ifset)) ttls[i] = 1; } /* The entries are added/deleted only under RTNL */ rcu_read_lock(); c = ip6mr_cache_find_parent(mrt, &mfc->mf6cc_origin.sin6_addr, &mfc->mf6cc_mcastgrp.sin6_addr, parent); rcu_read_unlock(); if (c) { write_lock_bh(&mrt_lock); c->_c.mfc_parent = mfc->mf6cc_parent; ip6mr_update_thresholds(mrt, &c->_c, ttls); if (!mrtsock) c->_c.mfc_flags |= MFC_STATIC; write_unlock_bh(&mrt_lock); call_ip6mr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_REPLACE, c, mrt->id); mr6_netlink_event(mrt, c, RTM_NEWROUTE); return 0; } if (!ipv6_addr_any(&mfc->mf6cc_mcastgrp.sin6_addr) && !ipv6_addr_is_multicast(&mfc->mf6cc_mcastgrp.sin6_addr)) return -EINVAL; c = ip6mr_cache_alloc(); if (!c) return -ENOMEM; c->mf6c_origin = mfc->mf6cc_origin.sin6_addr; c->mf6c_mcastgrp = mfc->mf6cc_mcastgrp.sin6_addr; c->_c.mfc_parent = mfc->mf6cc_parent; ip6mr_update_thresholds(mrt, &c->_c, ttls); if (!mrtsock) c->_c.mfc_flags |= MFC_STATIC; err = rhltable_insert_key(&mrt->mfc_hash, &c->cmparg, &c->_c.mnode, ip6mr_rht_params); if (err) { pr_err("ip6mr: rhtable insert error %d\n", err); ip6mr_cache_free(c); return err; } list_add_tail_rcu(&c->_c.list, &mrt->mfc_cache_list); /* Check to see if we resolved a queued list. If so we * need to send on the frames and tidy up. */ found = false; spin_lock_bh(&mfc_unres_lock); list_for_each_entry(_uc, &mrt->mfc_unres_queue, list) { uc = (struct mfc6_cache *)_uc; if (ipv6_addr_equal(&uc->mf6c_origin, &c->mf6c_origin) && ipv6_addr_equal(&uc->mf6c_mcastgrp, &c->mf6c_mcastgrp)) { list_del(&_uc->list); atomic_dec(&mrt->cache_resolve_queue_len); found = true; break; } } if (list_empty(&mrt->mfc_unres_queue)) del_timer(&mrt->ipmr_expire_timer); spin_unlock_bh(&mfc_unres_lock); if (found) { ip6mr_cache_resolve(net, mrt, uc, c); ip6mr_cache_free(uc); } call_ip6mr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_ADD, c, mrt->id); mr6_netlink_event(mrt, c, RTM_NEWROUTE); return 0; } /* * Close the multicast socket, and clear the vif tables etc */ static void mroute_clean_tables(struct mr_table *mrt, bool all) { struct mr_mfc *c, *tmp; LIST_HEAD(list); int i; /* Shut down all active vif entries */ for (i = 0; i < mrt->maxvif; i++) { if (!all && (mrt->vif_table[i].flags & VIFF_STATIC)) continue; mif6_delete(mrt, i, 0, &list); } unregister_netdevice_many(&list); /* Wipe the cache */ list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) { if (!all && (c->mfc_flags & MFC_STATIC)) continue; rhltable_remove(&mrt->mfc_hash, &c->mnode, ip6mr_rht_params); list_del_rcu(&c->list); call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net), FIB_EVENT_ENTRY_DEL, (struct mfc6_cache *)c, mrt->id); mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE); mr_cache_put(c); } if (atomic_read(&mrt->cache_resolve_queue_len) != 0) { spin_lock_bh(&mfc_unres_lock); list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) { list_del(&c->list); mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE); ip6mr_destroy_unres(mrt, (struct mfc6_cache *)c); } spin_unlock_bh(&mfc_unres_lock); } } static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk) { int err = 0; struct net *net = sock_net(sk); rtnl_lock(); write_lock_bh(&mrt_lock); if (rtnl_dereference(mrt->mroute_sk)) { err = -EADDRINUSE; } else { rcu_assign_pointer(mrt->mroute_sk, sk); sock_set_flag(sk, SOCK_RCU_FREE); net->ipv6.devconf_all->mc_forwarding++; } write_unlock_bh(&mrt_lock); if (!err) inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_MC_FORWARDING, NETCONFA_IFINDEX_ALL, net->ipv6.devconf_all); rtnl_unlock(); return err; } int ip6mr_sk_done(struct sock *sk) { int err = -EACCES; struct net *net = sock_net(sk); struct mr_table *mrt; if (sk->sk_type != SOCK_RAW || inet_sk(sk)->inet_num != IPPROTO_ICMPV6) return err; rtnl_lock(); ip6mr_for_each_table(mrt, net) { if (sk == rtnl_dereference(mrt->mroute_sk)) { write_lock_bh(&mrt_lock); RCU_INIT_POINTER(mrt->mroute_sk, NULL); /* Note that mroute_sk had SOCK_RCU_FREE set, * so the RCU grace period before sk freeing * is guaranteed by sk_destruct() */ net->ipv6.devconf_all->mc_forwarding--; write_unlock_bh(&mrt_lock); inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_MC_FORWARDING, NETCONFA_IFINDEX_ALL, net->ipv6.devconf_all); mroute_clean_tables(mrt, false); err = 0; break; } } rtnl_unlock(); return err; } bool mroute6_is_socket(struct net *net, struct sk_buff *skb) { struct mr_table *mrt; struct flowi6 fl6 = { .flowi6_iif = skb->skb_iif ? : LOOPBACK_IFINDEX, .flowi6_oif = skb->dev->ifindex, .flowi6_mark = skb->mark, }; if (ip6mr_fib_lookup(net, &fl6, &mrt) < 0) return NULL; return rcu_access_pointer(mrt->mroute_sk); } EXPORT_SYMBOL(mroute6_is_socket); /* * Socket options and virtual interface manipulation. The whole * virtual interface system is a complete heap, but unfortunately * that's how BSD mrouted happens to think. Maybe one day with a proper * MOSPF/PIM router set up we can clean this up. */ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsigned int optlen) { int ret, parent = 0; struct mif6ctl vif; struct mf6cctl mfc; mifi_t mifi; struct net *net = sock_net(sk); struct mr_table *mrt; if (sk->sk_type != SOCK_RAW || inet_sk(sk)->inet_num != IPPROTO_ICMPV6) return -EOPNOTSUPP; mrt = ip6mr_get_table(net, raw6_sk(sk)->ip6mr_table ? : RT6_TABLE_DFLT); if (!mrt) return -ENOENT; if (optname != MRT6_INIT) { if (sk != rcu_access_pointer(mrt->mroute_sk) && !ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EACCES; } switch (optname) { case MRT6_INIT: if (optlen < sizeof(int)) return -EINVAL; return ip6mr_sk_init(mrt, sk); case MRT6_DONE: return ip6mr_sk_done(sk); case MRT6_ADD_MIF: if (optlen < sizeof(vif)) return -EINVAL; if (copy_from_user(&vif, optval, sizeof(vif))) return -EFAULT; if (vif.mif6c_mifi >= MAXMIFS) return -ENFILE; rtnl_lock(); ret = mif6_add(net, mrt, &vif, sk == rtnl_dereference(mrt->mroute_sk)); rtnl_unlock(); return ret; case MRT6_DEL_MIF: if (optlen < sizeof(mifi_t)) return -EINVAL; if (copy_from_user(&mifi, optval, sizeof(mifi_t))) return -EFAULT; rtnl_lock(); ret = mif6_delete(mrt, mifi, 0, NULL); rtnl_unlock(); return ret; /* * Manipulate the forwarding caches. These live * in a sort of kernel/user symbiosis. */ case MRT6_ADD_MFC: case MRT6_DEL_MFC: parent = -1; /* fall through */ case MRT6_ADD_MFC_PROXY: case MRT6_DEL_MFC_PROXY: if (optlen < sizeof(mfc)) return -EINVAL; if (copy_from_user(&mfc, optval, sizeof(mfc))) return -EFAULT; if (parent == 0) parent = mfc.mf6cc_parent; rtnl_lock(); if (optname == MRT6_DEL_MFC || optname == MRT6_DEL_MFC_PROXY) ret = ip6mr_mfc_delete(mrt, &mfc, parent); else ret = ip6mr_mfc_add(net, mrt, &mfc, sk == rtnl_dereference(mrt->mroute_sk), parent); rtnl_unlock(); return ret; /* * Control PIM assert (to activate pim will activate assert) */ case MRT6_ASSERT: { int v; if (optlen != sizeof(v)) return -EINVAL; if (get_user(v, (int __user *)optval)) return -EFAULT; mrt->mroute_do_assert = v; return 0; } #ifdef CONFIG_IPV6_PIMSM_V2 case MRT6_PIM: { int v; if (optlen != sizeof(v)) return -EINVAL; if (get_user(v, (int __user *)optval)) return -EFAULT; v = !!v; rtnl_lock(); ret = 0; if (v != mrt->mroute_do_pim) { mrt->mroute_do_pim = v; mrt->mroute_do_assert = v; } rtnl_unlock(); return ret; } #endif #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES case MRT6_TABLE: { u32 v; if (optlen != sizeof(u32)) return -EINVAL; if (get_user(v, (u32 __user *)optval)) return -EFAULT; /* "pim6reg%u" should not exceed 16 bytes (IFNAMSIZ) */ if (v != RT_TABLE_DEFAULT && v >= 100000000) return -EINVAL; if (sk == rcu_access_pointer(mrt->mroute_sk)) return -EBUSY; rtnl_lock(); ret = 0; mrt = ip6mr_new_table(net, v); if (IS_ERR(mrt)) ret = PTR_ERR(mrt); else raw6_sk(sk)->ip6mr_table = v; rtnl_unlock(); return ret; } #endif /* * Spurious command, or MRT6_VERSION which you cannot * set. */ default: return -ENOPROTOOPT; } } /* * Getsock opt support for the multicast routing system. */ int ip6_mroute_getsockopt(struct sock *sk, int optname, char __user *optval, int __user *optlen) { int olr; int val; struct net *net = sock_net(sk); struct mr_table *mrt; if (sk->sk_type != SOCK_RAW || inet_sk(sk)->inet_num != IPPROTO_ICMPV6) return -EOPNOTSUPP; mrt = ip6mr_get_table(net, raw6_sk(sk)->ip6mr_table ? : RT6_TABLE_DFLT); if (!mrt) return -ENOENT; switch (optname) { case MRT6_VERSION: val = 0x0305; break; #ifdef CONFIG_IPV6_PIMSM_V2 case MRT6_PIM: val = mrt->mroute_do_pim; break; #endif case MRT6_ASSERT: val = mrt->mroute_do_assert; break; default: return -ENOPROTOOPT; } if (get_user(olr, optlen)) return -EFAULT; olr = min_t(int, olr, sizeof(int)); if (olr < 0) return -EINVAL; if (put_user(olr, optlen)) return -EFAULT; if (copy_to_user(optval, &val, olr)) return -EFAULT; return 0; } /* * The IP multicast ioctl support routines. */ int ip6mr_ioctl(struct sock *sk, int cmd, void __user *arg) { struct sioc_sg_req6 sr; struct sioc_mif_req6 vr; struct vif_device *vif; struct mfc6_cache *c; struct net *net = sock_net(sk); struct mr_table *mrt; mrt = ip6mr_get_table(net, raw6_sk(sk)->ip6mr_table ? : RT6_TABLE_DFLT); if (!mrt) return -ENOENT; switch (cmd) { case SIOCGETMIFCNT_IN6: if (copy_from_user(&vr, arg, sizeof(vr))) return -EFAULT; if (vr.mifi >= mrt->maxvif) return -EINVAL; vr.mifi = array_index_nospec(vr.mifi, mrt->maxvif); read_lock(&mrt_lock); vif = &mrt->vif_table[vr.mifi]; if (VIF_EXISTS(mrt, vr.mifi)) { vr.icount = vif->pkt_in; vr.ocount = vif->pkt_out; vr.ibytes = vif->bytes_in; vr.obytes = vif->bytes_out; read_unlock(&mrt_lock); if (copy_to_user(arg, &vr, sizeof(vr))) return -EFAULT; return 0; } read_unlock(&mrt_lock); return -EADDRNOTAVAIL; case SIOCGETSGCNT_IN6: if (copy_from_user(&sr, arg, sizeof(sr))) return -EFAULT; rcu_read_lock(); c = ip6mr_cache_find(mrt, &sr.src.sin6_addr, &sr.grp.sin6_addr); if (c) { sr.pktcnt = c->_c.mfc_un.res.pkt; sr.bytecnt = c->_c.mfc_un.res.bytes; sr.wrong_if = c->_c.mfc_un.res.wrong_if; rcu_read_unlock(); if (copy_to_user(arg, &sr, sizeof(sr))) return -EFAULT; return 0; } rcu_read_unlock(); return -EADDRNOTAVAIL; default: return -ENOIOCTLCMD; } } #ifdef CONFIG_COMPAT struct compat_sioc_sg_req6 { struct sockaddr_in6 src; struct sockaddr_in6 grp; compat_ulong_t pktcnt; compat_ulong_t bytecnt; compat_ulong_t wrong_if; }; struct compat_sioc_mif_req6 { mifi_t mifi; compat_ulong_t icount; compat_ulong_t ocount; compat_ulong_t ibytes; compat_ulong_t obytes; }; int ip6mr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg) { struct compat_sioc_sg_req6 sr; struct compat_sioc_mif_req6 vr; struct vif_device *vif; struct mfc6_cache *c; struct net *net = sock_net(sk); struct mr_table *mrt; mrt = ip6mr_get_table(net, raw6_sk(sk)->ip6mr_table ? : RT6_TABLE_DFLT); if (!mrt) return -ENOENT; switch (cmd) { case SIOCGETMIFCNT_IN6: if (copy_from_user(&vr, arg, sizeof(vr))) return -EFAULT; if (vr.mifi >= mrt->maxvif) return -EINVAL; vr.mifi = array_index_nospec(vr.mifi, mrt->maxvif); read_lock(&mrt_lock); vif = &mrt->vif_table[vr.mifi]; if (VIF_EXISTS(mrt, vr.mifi)) { vr.icount = vif->pkt_in; vr.ocount = vif->pkt_out; vr.ibytes = vif->bytes_in; vr.obytes = vif->bytes_out; read_unlock(&mrt_lock); if (copy_to_user(arg, &vr, sizeof(vr))) return -EFAULT; return 0; } read_unlock(&mrt_lock); return -EADDRNOTAVAIL; case SIOCGETSGCNT_IN6: if (copy_from_user(&sr, arg, sizeof(sr))) return -EFAULT; rcu_read_lock(); c = ip6mr_cache_find(mrt, &sr.src.sin6_addr, &sr.grp.sin6_addr); if (c) { sr.pktcnt = c->_c.mfc_un.res.pkt; sr.bytecnt = c->_c.mfc_un.res.bytes; sr.wrong_if = c->_c.mfc_un.res.wrong_if; rcu_read_unlock(); if (copy_to_user(arg, &sr, sizeof(sr))) return -EFAULT; return 0; } rcu_read_unlock(); return -EADDRNOTAVAIL; default: return -ENOIOCTLCMD; } } #endif static inline int ip6mr_forward2_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_OUTFORWDATAGRAMS); IP6_ADD_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_OUTOCTETS, skb->len); return dst_output(net, sk, skb); } /* * Processing handlers for ip6mr_forward */ static int ip6mr_forward2(struct net *net, struct mr_table *mrt, struct sk_buff *skb, struct mfc6_cache *c, int vifi) { struct ipv6hdr *ipv6h; struct vif_device *vif = &mrt->vif_table[vifi]; struct net_device *dev; struct dst_entry *dst; struct flowi6 fl6; if (!vif->dev) goto out_free; #ifdef CONFIG_IPV6_PIMSM_V2 if (vif->flags & MIFF_REGISTER) { vif->pkt_out++; vif->bytes_out += skb->len; vif->dev->stats.tx_bytes += skb->len; vif->dev->stats.tx_packets++; ip6mr_cache_report(mrt, skb, vifi, MRT6MSG_WHOLEPKT); goto out_free; } #endif ipv6h = ipv6_hdr(skb); fl6 = (struct flowi6) { .flowi6_oif = vif->link, .daddr = ipv6h->daddr, }; dst = ip6_route_output(net, NULL, &fl6); if (dst->error) { dst_release(dst); goto out_free; } skb_dst_drop(skb); skb_dst_set(skb, dst); /* * RFC1584 teaches, that DVMRP/PIM router must deliver packets locally * not only before forwarding, but after forwarding on all output * interfaces. It is clear, if mrouter runs a multicasting * program, it should receive packets not depending to what interface * program is joined. * If we will not make it, the program will have to join on all * interfaces. On the other hand, multihoming host (or router, but * not mrouter) cannot join to more than one interface - it will * result in receiving multiple packets. */ dev = vif->dev; skb->dev = dev; vif->pkt_out++; vif->bytes_out += skb->len; /* We are about to write */ /* XXX: extension headers? */ if (skb_cow(skb, sizeof(*ipv6h) + LL_RESERVED_SPACE(dev))) goto out_free; ipv6h = ipv6_hdr(skb); ipv6h->hop_limit--; IP6CB(skb)->flags |= IP6SKB_FORWARDED; return NF_HOOK(NFPROTO_IPV6, NF_INET_FORWARD, net, NULL, skb, skb->dev, dev, ip6mr_forward2_finish); out_free: kfree_skb(skb); return 0; } static int ip6mr_find_vif(struct mr_table *mrt, struct net_device *dev) { int ct; for (ct = mrt->maxvif - 1; ct >= 0; ct--) { if (mrt->vif_table[ct].dev == dev) break; } return ct; } static void ip6_mr_forward(struct net *net, struct mr_table *mrt, struct sk_buff *skb, struct mfc6_cache *c) { int psend = -1; int vif, ct; int true_vifi = ip6mr_find_vif(mrt, skb->dev); vif = c->_c.mfc_parent; c->_c.mfc_un.res.pkt++; c->_c.mfc_un.res.bytes += skb->len; c->_c.mfc_un.res.lastuse = jiffies; if (ipv6_addr_any(&c->mf6c_origin) && true_vifi >= 0) { struct mfc6_cache *cache_proxy; /* For an (*,G) entry, we only check that the incoming * interface is part of the static tree. */ rcu_read_lock(); cache_proxy = mr_mfc_find_any_parent(mrt, vif); if (cache_proxy && cache_proxy->_c.mfc_un.res.ttls[true_vifi] < 255) { rcu_read_unlock(); goto forward; } rcu_read_unlock(); } /* * Wrong interface: drop packet and (maybe) send PIM assert. */ if (mrt->vif_table[vif].dev != skb->dev) { c->_c.mfc_un.res.wrong_if++; if (true_vifi >= 0 && mrt->mroute_do_assert && /* pimsm uses asserts, when switching from RPT to SPT, so that we cannot check that packet arrived on an oif. It is bad, but otherwise we would need to move pretty large chunk of pimd to kernel. Ough... --ANK */ (mrt->mroute_do_pim || c->_c.mfc_un.res.ttls[true_vifi] < 255) && time_after(jiffies, c->_c.mfc_un.res.last_assert + MFC_ASSERT_THRESH)) { c->_c.mfc_un.res.last_assert = jiffies; ip6mr_cache_report(mrt, skb, true_vifi, MRT6MSG_WRONGMIF); } goto dont_forward; } forward: mrt->vif_table[vif].pkt_in++; mrt->vif_table[vif].bytes_in += skb->len; /* * Forward the frame */ if (ipv6_addr_any(&c->mf6c_origin) && ipv6_addr_any(&c->mf6c_mcastgrp)) { if (true_vifi >= 0 && true_vifi != c->_c.mfc_parent && ipv6_hdr(skb)->hop_limit > c->_c.mfc_un.res.ttls[c->_c.mfc_parent]) { /* It's an (*,*) entry and the packet is not coming from * the upstream: forward the packet to the upstream * only. */ psend = c->_c.mfc_parent; goto last_forward; } goto dont_forward; } for (ct = c->_c.mfc_un.res.maxvif - 1; ct >= c->_c.mfc_un.res.minvif; ct--) { /* For (*,G) entry, don't forward to the incoming interface */ if ((!ipv6_addr_any(&c->mf6c_origin) || ct != true_vifi) && ipv6_hdr(skb)->hop_limit > c->_c.mfc_un.res.ttls[ct]) { if (psend != -1) { struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC); if (skb2) ip6mr_forward2(net, mrt, skb2, c, psend); } psend = ct; } } last_forward: if (psend != -1) { ip6mr_forward2(net, mrt, skb, c, psend); return; } dont_forward: kfree_skb(skb); } /* * Multicast packets for forwarding arrive here */ int ip6_mr_input(struct sk_buff *skb) { struct mfc6_cache *cache; struct net *net = dev_net(skb->dev); struct mr_table *mrt; struct flowi6 fl6 = { .flowi6_iif = skb->dev->ifindex, .flowi6_mark = skb->mark, }; int err; err = ip6mr_fib_lookup(net, &fl6, &mrt); if (err < 0) { kfree_skb(skb); return err; } read_lock(&mrt_lock); cache = ip6mr_cache_find(mrt, &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr); if (!cache) { int vif = ip6mr_find_vif(mrt, skb->dev); if (vif >= 0) cache = ip6mr_cache_find_any(mrt, &ipv6_hdr(skb)->daddr, vif); } /* * No usable cache entry */ if (!cache) { int vif; vif = ip6mr_find_vif(mrt, skb->dev); if (vif >= 0) { int err = ip6mr_cache_unresolved(mrt, vif, skb); read_unlock(&mrt_lock); return err; } read_unlock(&mrt_lock); kfree_skb(skb); return -ENODEV; } ip6_mr_forward(net, mrt, skb, cache); read_unlock(&mrt_lock); return 0; } int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm, u32 portid) { int err; struct mr_table *mrt; struct mfc6_cache *cache; struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); mrt = ip6mr_get_table(net, RT6_TABLE_DFLT); if (!mrt) return -ENOENT; read_lock(&mrt_lock); cache = ip6mr_cache_find(mrt, &rt->rt6i_src.addr, &rt->rt6i_dst.addr); if (!cache && skb->dev) { int vif = ip6mr_find_vif(mrt, skb->dev); if (vif >= 0) cache = ip6mr_cache_find_any(mrt, &rt->rt6i_dst.addr, vif); } if (!cache) { struct sk_buff *skb2; struct ipv6hdr *iph; struct net_device *dev; int vif; dev = skb->dev; if (!dev || (vif = ip6mr_find_vif(mrt, dev)) < 0) { read_unlock(&mrt_lock); return -ENODEV; } /* really correct? */ skb2 = alloc_skb(sizeof(struct ipv6hdr), GFP_ATOMIC); if (!skb2) { read_unlock(&mrt_lock); return -ENOMEM; } NETLINK_CB(skb2).portid = portid; skb_reset_transport_header(skb2); skb_put(skb2, sizeof(struct ipv6hdr)); skb_reset_network_header(skb2); iph = ipv6_hdr(skb2); iph->version = 0; iph->priority = 0; iph->flow_lbl[0] = 0; iph->flow_lbl[1] = 0; iph->flow_lbl[2] = 0; iph->payload_len = 0; iph->nexthdr = IPPROTO_NONE; iph->hop_limit = 0; iph->saddr = rt->rt6i_src.addr; iph->daddr = rt->rt6i_dst.addr; err = ip6mr_cache_unresolved(mrt, vif, skb2); read_unlock(&mrt_lock); return err; } err = mr_fill_mroute(mrt, skb, &cache->_c, rtm); read_unlock(&mrt_lock); return err; } static int ip6mr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb, u32 portid, u32 seq, struct mfc6_cache *c, int cmd, int flags) { struct nlmsghdr *nlh; struct rtmsg *rtm; int err; nlh = nlmsg_put(skb, portid, seq, cmd, sizeof(*rtm), flags); if (!nlh) return -EMSGSIZE; rtm = nlmsg_data(nlh); rtm->rtm_family = RTNL_FAMILY_IP6MR; rtm->rtm_dst_len = 128; rtm->rtm_src_len = 128; rtm->rtm_tos = 0; rtm->rtm_table = mrt->id; if (nla_put_u32(skb, RTA_TABLE, mrt->id)) goto nla_put_failure; rtm->rtm_type = RTN_MULTICAST; rtm->rtm_scope = RT_SCOPE_UNIVERSE; if (c->_c.mfc_flags & MFC_STATIC) rtm->rtm_protocol = RTPROT_STATIC; else rtm->rtm_protocol = RTPROT_MROUTED; rtm->rtm_flags = 0; if (nla_put_in6_addr(skb, RTA_SRC, &c->mf6c_origin) || nla_put_in6_addr(skb, RTA_DST, &c->mf6c_mcastgrp)) goto nla_put_failure; err = mr_fill_mroute(mrt, skb, &c->_c, rtm); /* do not break the dump if cache is unresolved */ if (err < 0 && err != -ENOENT) goto nla_put_failure; nlmsg_end(skb, nlh); return 0; nla_put_failure: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } static int _ip6mr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb, u32 portid, u32 seq, struct mr_mfc *c, int cmd, int flags) { return ip6mr_fill_mroute(mrt, skb, portid, seq, (struct mfc6_cache *)c, cmd, flags); } static int mr6_msgsize(bool unresolved, int maxvif) { size_t len = NLMSG_ALIGN(sizeof(struct rtmsg)) + nla_total_size(4) /* RTA_TABLE */ + nla_total_size(sizeof(struct in6_addr)) /* RTA_SRC */ + nla_total_size(sizeof(struct in6_addr)) /* RTA_DST */ ; if (!unresolved) len = len + nla_total_size(4) /* RTA_IIF */ + nla_total_size(0) /* RTA_MULTIPATH */ + maxvif * NLA_ALIGN(sizeof(struct rtnexthop)) /* RTA_MFC_STATS */ + nla_total_size_64bit(sizeof(struct rta_mfc_stats)) ; return len; } static void mr6_netlink_event(struct mr_table *mrt, struct mfc6_cache *mfc, int cmd) { struct net *net = read_pnet(&mrt->net); struct sk_buff *skb; int err = -ENOBUFS; skb = nlmsg_new(mr6_msgsize(mfc->_c.mfc_parent >= MAXMIFS, mrt->maxvif), GFP_ATOMIC); if (!skb) goto errout; err = ip6mr_fill_mroute(mrt, skb, 0, 0, mfc, cmd, 0); if (err < 0) goto errout; rtnl_notify(skb, net, 0, RTNLGRP_IPV6_MROUTE, NULL, GFP_ATOMIC); return; errout: kfree_skb(skb); if (err < 0) rtnl_set_sk_err(net, RTNLGRP_IPV6_MROUTE, err); } static size_t mrt6msg_netlink_msgsize(size_t payloadlen) { size_t len = NLMSG_ALIGN(sizeof(struct rtgenmsg)) + nla_total_size(1) /* IP6MRA_CREPORT_MSGTYPE */ + nla_total_size(4) /* IP6MRA_CREPORT_MIF_ID */ /* IP6MRA_CREPORT_SRC_ADDR */ + nla_total_size(sizeof(struct in6_addr)) /* IP6MRA_CREPORT_DST_ADDR */ + nla_total_size(sizeof(struct in6_addr)) /* IP6MRA_CREPORT_PKT */ + nla_total_size(payloadlen) ; return len; } static void mrt6msg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt) { struct net *net = read_pnet(&mrt->net); struct nlmsghdr *nlh; struct rtgenmsg *rtgenm; struct mrt6msg *msg; struct sk_buff *skb; struct nlattr *nla; int payloadlen; payloadlen = pkt->len - sizeof(struct mrt6msg); msg = (struct mrt6msg *)skb_transport_header(pkt); skb = nlmsg_new(mrt6msg_netlink_msgsize(payloadlen), GFP_ATOMIC); if (!skb) goto errout; nlh = nlmsg_put(skb, 0, 0, RTM_NEWCACHEREPORT, sizeof(struct rtgenmsg), 0); if (!nlh) goto errout; rtgenm = nlmsg_data(nlh); rtgenm->rtgen_family = RTNL_FAMILY_IP6MR; if (nla_put_u8(skb, IP6MRA_CREPORT_MSGTYPE, msg->im6_msgtype) || nla_put_u32(skb, IP6MRA_CREPORT_MIF_ID, msg->im6_mif) || nla_put_in6_addr(skb, IP6MRA_CREPORT_SRC_ADDR, &msg->im6_src) || nla_put_in6_addr(skb, IP6MRA_CREPORT_DST_ADDR, &msg->im6_dst)) goto nla_put_failure; nla = nla_reserve(skb, IP6MRA_CREPORT_PKT, payloadlen); if (!nla || skb_copy_bits(pkt, sizeof(struct mrt6msg), nla_data(nla), payloadlen)) goto nla_put_failure; nlmsg_end(skb, nlh); rtnl_notify(skb, net, 0, RTNLGRP_IPV6_MROUTE_R, NULL, GFP_ATOMIC); return; nla_put_failure: nlmsg_cancel(skb, nlh); errout: kfree_skb(skb); rtnl_set_sk_err(net, RTNLGRP_IPV6_MROUTE_R, -ENOBUFS); } static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb) { return mr_rtm_dumproute(skb, cb, ip6mr_mr_table_iter, _ip6mr_fill_mroute, &mfc_unres_lock); }
47 19 11 34 2 23 2 53 53 53 53 52 53 1 53 53 25 33 33 20 16 1 1 2 1 3 31 34 34 16 16 16 16 16 3 21 21 34 42 42 42 39 6 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 /* * Copyright 1997-1998 Transmeta Corporation -- All Rights Reserved * Copyright 2005-2006 Ian Kent <raven@themaw.net> * * This file is part of the Linux kernel and is made available under * the terms of the GNU General Public License, version 2, or at your * option, any later version, incorporated herein by reference. */ #include <linux/seq_file.h> #include <linux/pagemap.h> #include <linux/parser.h> #include "autofs_i.h" struct autofs_info *autofs_new_ino(struct autofs_sb_info *sbi) { struct autofs_info *ino; ino = kzalloc(sizeof(*ino), GFP_KERNEL); if (ino) { INIT_LIST_HEAD(&ino->active); INIT_LIST_HEAD(&ino->expiring); ino->last_used = jiffies; ino->sbi = sbi; } return ino; } void autofs_clean_ino(struct autofs_info *ino) { ino->uid = GLOBAL_ROOT_UID; ino->gid = GLOBAL_ROOT_GID; ino->last_used = jiffies; } void autofs_free_ino(struct autofs_info *ino) { kfree(ino); } void autofs_kill_sb(struct super_block *sb) { struct autofs_sb_info *sbi = autofs_sbi(sb); /* * In the event of a failure in get_sb_nodev the superblock * info is not present so nothing else has been setup, so * just call kill_anon_super when we are called from * deactivate_super. */ if (sbi) { /* Free wait queues, close pipe */ autofs_catatonic_mode(sbi); put_pid(sbi->oz_pgrp); } pr_debug("shutting down\n"); kill_litter_super(sb); if (sbi) kfree_rcu(sbi, rcu); } static int autofs_show_options(struct seq_file *m, struct dentry *root) { struct autofs_sb_info *sbi = autofs_sbi(root->d_sb); struct inode *root_inode = d_inode(root->d_sb->s_root); if (!sbi) return 0; seq_printf(m, ",fd=%d", sbi->pipefd); if (!uid_eq(root_inode->i_uid, GLOBAL_ROOT_UID)) seq_printf(m, ",uid=%u", from_kuid_munged(&init_user_ns, root_inode->i_uid)); if (!gid_eq(root_inode->i_gid, GLOBAL_ROOT_GID)) seq_printf(m, ",gid=%u", from_kgid_munged(&init_user_ns, root_inode->i_gid)); seq_printf(m, ",pgrp=%d", pid_vnr(sbi->oz_pgrp)); seq_printf(m, ",timeout=%lu", sbi->exp_timeout/HZ); seq_printf(m, ",minproto=%d", sbi->min_proto); seq_printf(m, ",maxproto=%d", sbi->max_proto); if (autofs_type_offset(sbi->type)) seq_printf(m, ",offset"); else if (autofs_type_direct(sbi->type)) seq_printf(m, ",direct"); else seq_printf(m, ",indirect"); #ifdef CONFIG_CHECKPOINT_RESTORE if (sbi->pipe) seq_printf(m, ",pipe_ino=%ld", file_inode(sbi->pipe)->i_ino); else seq_printf(m, ",pipe_ino=-1"); #endif return 0; } static void autofs_evict_inode(struct inode *inode) { clear_inode(inode); kfree(inode->i_private); } static const struct super_operations autofs_sops = { .statfs = simple_statfs, .show_options = autofs_show_options, .evict_inode = autofs_evict_inode, }; enum {Opt_err, Opt_fd, Opt_uid, Opt_gid, Opt_pgrp, Opt_minproto, Opt_maxproto, Opt_indirect, Opt_direct, Opt_offset}; static const match_table_t tokens = { {Opt_fd, "fd=%u"}, {Opt_uid, "uid=%u"}, {Opt_gid, "gid=%u"}, {Opt_pgrp, "pgrp=%u"}, {Opt_minproto, "minproto=%u"}, {Opt_maxproto, "maxproto=%u"}, {Opt_indirect, "indirect"}, {Opt_direct, "direct"}, {Opt_offset, "offset"}, {Opt_err, NULL} }; static int parse_options(char *options, int *pipefd, kuid_t *uid, kgid_t *gid, int *pgrp, bool *pgrp_set, unsigned int *type, int *minproto, int *maxproto) { char *p; substring_t args[MAX_OPT_ARGS]; int option; *uid = current_uid(); *gid = current_gid(); *minproto = AUTOFS_MIN_PROTO_VERSION; *maxproto = AUTOFS_MAX_PROTO_VERSION; *pipefd = -1; if (!options) return 1; while ((p = strsep(&options, ",")) != NULL) { int token; if (!*p) continue; token = match_token(p, tokens, args); switch (token) { case Opt_fd: if (match_int(args, pipefd)) return 1; break; case Opt_uid: if (match_int(args, &option)) return 1; *uid = make_kuid(current_user_ns(), option); if (!uid_valid(*uid)) return 1; break; case Opt_gid: if (match_int(args, &option)) return 1; *gid = make_kgid(current_user_ns(), option); if (!gid_valid(*gid)) return 1; break; case Opt_pgrp: if (match_int(args, &option)) return 1; *pgrp = option; *pgrp_set = true; break; case Opt_minproto: if (match_int(args, &option)) return 1; *minproto = option; break; case Opt_maxproto: if (match_int(args, &option)) return 1; *maxproto = option; break; case Opt_indirect: set_autofs_type_indirect(type); break; case Opt_direct: set_autofs_type_direct(type); break; case Opt_offset: set_autofs_type_offset(type); break; default: return 1; } } return (*pipefd < 0); } int autofs_fill_super(struct super_block *s, void *data, int silent) { struct inode *root_inode; struct dentry *root; struct file *pipe; int pipefd; struct autofs_sb_info *sbi; struct autofs_info *ino; int pgrp = 0; bool pgrp_set = false; int ret = -EINVAL; sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) return -ENOMEM; pr_debug("starting up, sbi = %p\n", sbi); s->s_fs_info = sbi; sbi->magic = AUTOFS_SBI_MAGIC; sbi->pipefd = -1; sbi->pipe = NULL; sbi->catatonic = 1; sbi->exp_timeout = 0; sbi->oz_pgrp = NULL; sbi->sb = s; sbi->version = 0; sbi->sub_version = 0; set_autofs_type_indirect(&sbi->type); sbi->min_proto = 0; sbi->max_proto = 0; mutex_init(&sbi->wq_mutex); mutex_init(&sbi->pipe_mutex); spin_lock_init(&sbi->fs_lock); sbi->queues = NULL; spin_lock_init(&sbi->lookup_lock); INIT_LIST_HEAD(&sbi->active_list); INIT_LIST_HEAD(&sbi->expiring_list); s->s_blocksize = 1024; s->s_blocksize_bits = 10; s->s_magic = AUTOFS_SUPER_MAGIC; s->s_op = &autofs_sops; s->s_d_op = &autofs_dentry_operations; s->s_time_gran = 1; /* * Get the root inode and dentry, but defer checking for errors. */ ino = autofs_new_ino(sbi); if (!ino) { ret = -ENOMEM; goto fail_free; } root_inode = autofs_get_inode(s, S_IFDIR | 0755); root = d_make_root(root_inode); if (!root) { ret = -ENOMEM; goto fail_ino; } pipe = NULL; root->d_fsdata = ino; /* Can this call block? */ if (parse_options(data, &pipefd, &root_inode->i_uid, &root_inode->i_gid, &pgrp, &pgrp_set, &sbi->type, &sbi->min_proto, &sbi->max_proto)) { pr_err("called with bogus options\n"); goto fail_dput; } /* Test versions first */ if (sbi->max_proto < AUTOFS_MIN_PROTO_VERSION || sbi->min_proto > AUTOFS_MAX_PROTO_VERSION) { pr_err("kernel does not match daemon version " "daemon (%d, %d) kernel (%d, %d)\n", sbi->min_proto, sbi->max_proto, AUTOFS_MIN_PROTO_VERSION, AUTOFS_MAX_PROTO_VERSION); goto fail_dput; } /* Establish highest kernel protocol version */ if (sbi->max_proto > AUTOFS_MAX_PROTO_VERSION) sbi->version = AUTOFS_MAX_PROTO_VERSION; else sbi->version = sbi->max_proto; sbi->sub_version = AUTOFS_PROTO_SUBVERSION; if (pgrp_set) { sbi->oz_pgrp = find_get_pid(pgrp); if (!sbi->oz_pgrp) { pr_err("could not find process group %d\n", pgrp); goto fail_dput; } } else { sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID); } if (autofs_type_trigger(sbi->type)) __managed_dentry_set_managed(root); root_inode->i_fop = &autofs_root_operations; root_inode->i_op = &autofs_dir_inode_operations; pr_debug("pipe fd = %d, pgrp = %u\n", pipefd, pid_nr(sbi->oz_pgrp)); pipe = fget(pipefd); if (!pipe) { pr_err("could not open pipe file descriptor\n"); goto fail_put_pid; } ret = autofs_prepare_pipe(pipe); if (ret < 0) goto fail_fput; sbi->pipe = pipe; sbi->pipefd = pipefd; sbi->catatonic = 0; /* * Success! Install the root dentry now to indicate completion. */ s->s_root = root; return 0; /* * Failure ... clean up. */ fail_fput: pr_err("pipe file descriptor does not contain proper ops\n"); fput(pipe); fail_put_pid: put_pid(sbi->oz_pgrp); fail_dput: dput(root); goto fail_free; fail_ino: autofs_free_ino(ino); fail_free: kfree(sbi); s->s_fs_info = NULL; return ret; } struct inode *autofs_get_inode(struct super_block *sb, umode_t mode) { struct inode *inode = new_inode(sb); if (inode == NULL) return NULL; inode->i_mode = mode; if (sb->s_root) { inode->i_uid = d_inode(sb->s_root)->i_uid; inode->i_gid = d_inode(sb->s_root)->i_gid; } inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode); inode->i_ino = get_next_ino(); if (S_ISDIR(mode)) { set_nlink(inode, 2); inode->i_op = &autofs_dir_inode_operations; inode->i_fop = &autofs_dir_operations; } else if (S_ISLNK(mode)) { inode->i_op = &autofs_symlink_inode_operations; } else WARN_ON(1); return inode; }
910 910 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 /* * Copyright (C) 2013 Politecnico di Torino, Italy * TORSEC group -- http://security.polito.it * * Author: Roberto Sassu <roberto.sassu@polito.it> * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License as * published by the Free Software Foundation, version 2 of the * License. * * File: ima_template.c * Helpers to manage template descriptors. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/rculist.h> #include "ima.h" #include "ima_template_lib.h" enum header_fields { HDR_PCR, HDR_DIGEST, HDR_TEMPLATE_NAME, HDR_TEMPLATE_DATA, HDR__LAST }; static struct ima_template_desc builtin_templates[] = { {.name = IMA_TEMPLATE_IMA_NAME, .fmt = IMA_TEMPLATE_IMA_FMT}, {.name = "ima-ng", .fmt = "d-ng|n-ng"}, {.name = "ima-sig", .fmt = "d-ng|n-ng|sig"}, {.name = "", .fmt = ""}, /* placeholder for a custom format */ }; static LIST_HEAD(defined_templates); static DEFINE_SPINLOCK(template_list); static int template_setup_done; static struct ima_template_field supported_fields[] = { {.field_id = "d", .field_init = ima_eventdigest_init, .field_show = ima_show_template_digest}, {.field_id = "n", .field_init = ima_eventname_init, .field_show = ima_show_template_string}, {.field_id = "d-ng", .field_init = ima_eventdigest_ng_init, .field_show = ima_show_template_digest_ng}, {.field_id = "n-ng", .field_init = ima_eventname_ng_init, .field_show = ima_show_template_string}, {.field_id = "sig", .field_init = ima_eventsig_init, .field_show = ima_show_template_sig}, }; #define MAX_TEMPLATE_NAME_LEN 15 static struct ima_template_desc *ima_template; static struct ima_template_desc *lookup_template_desc(const char *name); static int template_desc_init_fields(const char *template_fmt, struct ima_template_field ***fields, int *num_fields); static int __init ima_template_setup(char *str) { struct ima_template_desc *template_desc; int template_len = strlen(str); if (template_setup_done) return 1; if (!ima_template) ima_init_template_list(); /* * Verify that a template with the supplied name exists. * If not, use CONFIG_IMA_DEFAULT_TEMPLATE. */ template_desc = lookup_template_desc(str); if (!template_desc) { pr_err("template %s not found, using %s\n", str, CONFIG_IMA_DEFAULT_TEMPLATE); return 1; } /* * Verify whether the current hash algorithm is supported * by the 'ima' template. */ if (template_len == 3 && strcmp(str, IMA_TEMPLATE_IMA_NAME) == 0 && ima_hash_algo != HASH_ALGO_SHA1 && ima_hash_algo != HASH_ALGO_MD5) { pr_err("template does not support hash alg\n"); return 1; } ima_template = template_desc; template_setup_done = 1; return 1; } __setup("ima_template=", ima_template_setup); static int __init ima_template_fmt_setup(char *str) { int num_templates = ARRAY_SIZE(builtin_templates); if (template_setup_done) return 1; if (template_desc_init_fields(str, NULL, NULL) < 0) { pr_err("format string '%s' not valid, using template %s\n", str, CONFIG_IMA_DEFAULT_TEMPLATE); return 1; } builtin_templates[num_templates - 1].fmt = str; ima_template = builtin_templates + num_templates - 1; template_setup_done = 1; return 1; } __setup("ima_template_fmt=", ima_template_fmt_setup); static struct ima_template_desc *lookup_template_desc(const char *name) { struct ima_template_desc *template_desc; int found = 0; rcu_read_lock(); list_for_each_entry_rcu(template_desc, &defined_templates, list) { if ((strcmp(template_desc->name, name) == 0) || (strcmp(template_desc->fmt, name) == 0)) { found = 1; break; } } rcu_read_unlock(); return found ? template_desc : NULL; } static struct ima_template_field *lookup_template_field(const char *field_id) { int i; for (i = 0; i < ARRAY_SIZE(supported_fields); i++) if (strncmp(supported_fields[i].field_id, field_id, IMA_TEMPLATE_FIELD_ID_MAX_LEN) == 0) return &supported_fields[i]; return NULL; } static int template_fmt_size(const char *template_fmt) { char c; int template_fmt_len = strlen(template_fmt); int i = 0, j = 0; while (i < template_fmt_len) { c = template_fmt[i]; if (c == '|') j++; i++; } return j + 1; } static int template_desc_init_fields(const char *template_fmt, struct ima_template_field ***fields, int *num_fields) { const char *template_fmt_ptr; struct ima_template_field *found_fields[IMA_TEMPLATE_NUM_FIELDS_MAX]; int template_num_fields; int i, len; if (num_fields && *num_fields > 0) /* already initialized? */ return 0; template_num_fields = template_fmt_size(template_fmt); if (template_num_fields > IMA_TEMPLATE_NUM_FIELDS_MAX) { pr_err("format string '%s' contains too many fields\n", template_fmt); return -EINVAL; } for (i = 0, template_fmt_ptr = template_fmt; i < template_num_fields; i++, template_fmt_ptr += len + 1) { char tmp_field_id[IMA_TEMPLATE_FIELD_ID_MAX_LEN + 1]; len = strchrnul(template_fmt_ptr, '|') - template_fmt_ptr; if (len == 0 || len > IMA_TEMPLATE_FIELD_ID_MAX_LEN) { pr_err("Invalid field with length %d\n", len); return -EINVAL; } memcpy(tmp_field_id, template_fmt_ptr, len); tmp_field_id[len] = '\0'; found_fields[i] = lookup_template_field(tmp_field_id); if (!found_fields[i]) { pr_err("field '%s' not found\n", tmp_field_id); return -ENOENT; } } if (fields && num_fields) { *fields = kmalloc_array(i, sizeof(*fields), GFP_KERNEL); if (*fields == NULL) return -ENOMEM; memcpy(*fields, found_fields, i * sizeof(*fields)); *num_fields = i; } return 0; } void ima_init_template_list(void) { int i; if (!list_empty(&defined_templates)) return; spin_lock(&template_list); for (i = 0; i < ARRAY_SIZE(builtin_templates); i++) { list_add_tail_rcu(&builtin_templates[i].list, &defined_templates); } spin_unlock(&template_list); } struct ima_template_desc *ima_template_desc_current(void) { if (!ima_template) { ima_init_template_list(); ima_template = lookup_template_desc(CONFIG_IMA_DEFAULT_TEMPLATE); } return ima_template; } int __init ima_init_template(void) { struct ima_template_desc *template = ima_template_desc_current(); int result; result = template_desc_init_fields(template->fmt, &(template->fields), &(template->num_fields)); if (result < 0) pr_err("template %s init failed, result: %d\n", (strlen(template->name) ? template->name : template->fmt), result); return result; } static struct ima_template_desc *restore_template_fmt(char *template_name) { struct ima_template_desc *template_desc = NULL; int ret; ret = template_desc_init_fields(template_name, NULL, NULL); if (ret < 0) { pr_err("attempting to initialize the template \"%s\" failed\n", template_name); goto out; } template_desc = kzalloc(sizeof(*template_desc), GFP_KERNEL); if (!template_desc) goto out; template_desc->name = ""; template_desc->fmt = kstrdup(template_name, GFP_KERNEL); if (!template_desc->fmt) goto out; spin_lock(&template_list); list_add_tail_rcu(&template_desc->list, &defined_templates); spin_unlock(&template_list); out: return template_desc; } static int ima_restore_template_data(struct ima_template_desc *template_desc, void *template_data, int template_data_size, struct ima_template_entry **entry) { int ret = 0; int i; *entry = kzalloc(sizeof(**entry) + template_desc->num_fields * sizeof(struct ima_field_data), GFP_NOFS); if (!*entry) return -ENOMEM; ret = ima_parse_buf(template_data, template_data + template_data_size, NULL, template_desc->num_fields, (*entry)->template_data, NULL, NULL, ENFORCE_FIELDS | ENFORCE_BUFEND, "template data"); if (ret < 0) { kfree(*entry); return ret; } (*entry)->template_desc = template_desc; for (i = 0; i < template_desc->num_fields; i++) { struct ima_field_data *field_data = &(*entry)->template_data[i]; u8 *data = field_data->data; (*entry)->template_data[i].data = kzalloc(field_data->len + 1, GFP_KERNEL); if (!(*entry)->template_data[i].data) { ret = -ENOMEM; break; } memcpy((*entry)->template_data[i].data, data, field_data->len); (*entry)->template_data_len += sizeof(field_data->len); (*entry)->template_data_len += field_data->len; } if (ret < 0) { ima_free_template_entry(*entry); *entry = NULL; } return ret; } /* Restore the serialized binary measurement list without extending PCRs. */ int ima_restore_measurement_list(loff_t size, void *buf) { char template_name[MAX_TEMPLATE_NAME_LEN]; struct ima_kexec_hdr *khdr = buf; struct ima_field_data hdr[HDR__LAST] = { [HDR_PCR] = {.len = sizeof(u32)}, [HDR_DIGEST] = {.len = TPM_DIGEST_SIZE}, }; void *bufp = buf + sizeof(*khdr); void *bufendp; struct ima_template_entry *entry; struct ima_template_desc *template_desc; DECLARE_BITMAP(hdr_mask, HDR__LAST); unsigned long count = 0; int ret = 0; if (!buf || size < sizeof(*khdr)) return 0; if (ima_canonical_fmt) { khdr->version = le16_to_cpu(khdr->version); khdr->count = le64_to_cpu(khdr->count); khdr->buffer_size = le64_to_cpu(khdr->buffer_size); } if (khdr->version != 1) { pr_err("attempting to restore a incompatible measurement list"); return -EINVAL; } if (khdr->count > ULONG_MAX - 1) { pr_err("attempting to restore too many measurements"); return -EINVAL; } bitmap_zero(hdr_mask, HDR__LAST); bitmap_set(hdr_mask, HDR_PCR, 1); bitmap_set(hdr_mask, HDR_DIGEST, 1); /* * ima kexec buffer prefix: version, buffer size, count * v1 format: pcr, digest, template-name-len, template-name, * template-data-size, template-data */ bufendp = buf + khdr->buffer_size; while ((bufp < bufendp) && (count++ < khdr->count)) { int enforce_mask = ENFORCE_FIELDS; enforce_mask |= (count == khdr->count) ? ENFORCE_BUFEND : 0; ret = ima_parse_buf(bufp, bufendp, &bufp, HDR__LAST, hdr, NULL, hdr_mask, enforce_mask, "entry header"); if (ret < 0) break; if (hdr[HDR_TEMPLATE_NAME].len >= MAX_TEMPLATE_NAME_LEN) { pr_err("attempting to restore a template name that is too long\n"); ret = -EINVAL; break; } /* template name is not null terminated */ memcpy(template_name, hdr[HDR_TEMPLATE_NAME].data, hdr[HDR_TEMPLATE_NAME].len); template_name[hdr[HDR_TEMPLATE_NAME].len] = 0; if (strcmp(template_name, "ima") == 0) { pr_err("attempting to restore an unsupported template \"%s\" failed\n", template_name); ret = -EINVAL; break; } template_desc = lookup_template_desc(template_name); if (!template_desc) { template_desc = restore_template_fmt(template_name); if (!template_desc) break; } /* * Only the running system's template format is initialized * on boot. As needed, initialize the other template formats. */ ret = template_desc_init_fields(template_desc->fmt, &(template_desc->fields), &(template_desc->num_fields)); if (ret < 0) { pr_err("attempting to restore the template fmt \"%s\" failed\n", template_desc->fmt); ret = -EINVAL; break; } ret = ima_restore_template_data(template_desc, hdr[HDR_TEMPLATE_DATA].data, hdr[HDR_TEMPLATE_DATA].len, &entry); if (ret < 0) break; memcpy(entry->digest, hdr[HDR_DIGEST].data, hdr[HDR_DIGEST].len); entry->pcr = !ima_canonical_fmt ? *(hdr[HDR_PCR].data) : le32_to_cpu(*(hdr[HDR_PCR].data)); ret = ima_restore_measurement_entry(entry); if (ret < 0) break; } return ret; }
215 61 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_RANDOM_H #define _LINUX_RANDOM_H #include <linux/bug.h> #include <linux/kernel.h> #include <linux/list.h> #include <linux/once.h> #include <uapi/linux/random.h> struct notifier_block; void add_device_randomness(const void *buf, size_t len); void __init add_bootloader_randomness(const void *buf, size_t len); void add_input_randomness(unsigned int type, unsigned int code, unsigned int value) __latent_entropy; void add_interrupt_randomness(int irq) __latent_entropy; void add_hwgenerator_randomness(const void *buf, size_t len, size_t entropy); #if defined(LATENT_ENTROPY_PLUGIN) && !defined(__CHECKER__) static inline void add_latent_entropy(void) { add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy)); } #else static inline void add_latent_entropy(void) { } #endif void get_random_bytes(void *buf, size_t len); size_t __must_check get_random_bytes_arch(void *buf, size_t len); u32 get_random_u32(void); u64 get_random_u64(void); static inline unsigned int get_random_int(void) { return get_random_u32(); } static inline unsigned long get_random_long(void) { #if BITS_PER_LONG == 64 return get_random_u64(); #else return get_random_u32(); #endif } /* * On 64-bit architectures, protect against non-terminated C string overflows * by zeroing out the first byte of the canary; this leaves 56 bits of entropy. */ #ifdef CONFIG_64BIT # ifdef __LITTLE_ENDIAN # define CANARY_MASK 0xffffffffffffff00UL # else /* big endian, 64 bits: */ # define CANARY_MASK 0x00ffffffffffffffUL # endif #else /* 32 bits: */ # define CANARY_MASK 0xffffffffUL #endif static inline unsigned long get_random_canary(void) { return get_random_long() & CANARY_MASK; } int __init random_init(const char *command_line); bool rng_is_initialized(void); int wait_for_random_bytes(void); int register_random_ready_notifier(struct notifier_block *nb); int unregister_random_ready_notifier(struct notifier_block *nb); /* Calls wait_for_random_bytes() and then calls get_random_bytes(buf, nbytes). * Returns the result of the call to wait_for_random_bytes. */ static inline int get_random_bytes_wait(void *buf, size_t nbytes) { int ret = wait_for_random_bytes(); get_random_bytes(buf, nbytes); return ret; } #define declare_get_random_var_wait(name, ret_type) \ static inline int get_random_ ## name ## _wait(ret_type *out) { \ int ret = wait_for_random_bytes(); \ if (unlikely(ret)) \ return ret; \ *out = get_random_ ## name(); \ return 0; \ } declare_get_random_var_wait(u32, u32) declare_get_random_var_wait(u64, u32) declare_get_random_var_wait(int, unsigned int) declare_get_random_var_wait(long, unsigned long) #undef declare_get_random_var /* * This is designed to be standalone for just prandom * users, but for now we include it from <linux/random.h> * for legacy reasons. */ #include <linux/prandom.h> #ifdef CONFIG_ARCH_RANDOM # include <asm/archrandom.h> #else static inline bool __must_check arch_get_random_long(unsigned long *v) { return false; } static inline bool __must_check arch_get_random_int(unsigned int *v) { return false; } static inline bool __must_check arch_get_random_seed_long(unsigned long *v) { return false; } static inline bool __must_check arch_get_random_seed_int(unsigned int *v) { return false; } #endif /* * Called from the boot CPU during startup; not valid to call once * secondary CPUs are up and preemption is possible. */ #ifndef arch_get_random_seed_long_early static inline bool __init arch_get_random_seed_long_early(unsigned long *v) { WARN_ON(system_state != SYSTEM_BOOTING); return arch_get_random_seed_long(v); } #endif #ifndef arch_get_random_long_early static inline bool __init arch_get_random_long_early(unsigned long *v) { WARN_ON(system_state != SYSTEM_BOOTING); return arch_get_random_long(v); } #endif #ifdef CONFIG_SMP int random_prepare_cpu(unsigned int cpu); int random_online_cpu(unsigned int cpu); #endif #ifndef MODULE extern const struct file_operations random_fops, urandom_fops; #endif #endif /* _LINUX_RANDOM_H */
57 28 73 8 6 4 66 65 72 42 41 1 40 40 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 /** -*- linux-c -*- *********************************************************** * Linux PPP over X/Ethernet (PPPoX/PPPoE) Sockets * * PPPoX --- Generic PPP encapsulation socket family * PPPoE --- PPP over Ethernet (RFC 2516) * * * Version: 0.5.2 * * Author: Michal Ostrowski <mostrows@speakeasy.net> * * 051000 : Initialization cleanup * * License: * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * */ #include <linux/string.h> #include <linux/module.h> #include <linux/kernel.h> #include <linux/compat.h> #include <linux/errno.h> #include <linux/netdevice.h> #include <linux/net.h> #include <linux/init.h> #include <linux/if_pppox.h> #include <linux/ppp_defs.h> #include <linux/ppp-ioctl.h> #include <linux/ppp_channel.h> #include <linux/kmod.h> #include <net/sock.h> #include <linux/uaccess.h> static const struct pppox_proto *pppox_protos[PX_MAX_PROTO + 1]; int register_pppox_proto(int proto_num, const struct pppox_proto *pp) { if (proto_num < 0 || proto_num > PX_MAX_PROTO) return -EINVAL; if (pppox_protos[proto_num]) return -EALREADY; pppox_protos[proto_num] = pp; return 0; } void unregister_pppox_proto(int proto_num) { if (proto_num >= 0 && proto_num <= PX_MAX_PROTO) pppox_protos[proto_num] = NULL; } void pppox_unbind_sock(struct sock *sk) { /* Clear connection to ppp device, if attached. */ if (sk->sk_state & (PPPOX_BOUND | PPPOX_CONNECTED)) { ppp_unregister_channel(&pppox_sk(sk)->chan); sk->sk_state = PPPOX_DEAD; } } EXPORT_SYMBOL(register_pppox_proto); EXPORT_SYMBOL(unregister_pppox_proto); EXPORT_SYMBOL(pppox_unbind_sock); int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { struct sock *sk = sock->sk; struct pppox_sock *po = pppox_sk(sk); int rc; lock_sock(sk); switch (cmd) { case PPPIOCGCHAN: { int index; rc = -ENOTCONN; if (!(sk->sk_state & PPPOX_CONNECTED)) break; rc = -EINVAL; index = ppp_channel_index(&po->chan); if (put_user(index , (int __user *) arg)) break; rc = 0; sk->sk_state |= PPPOX_BOUND; break; } default: rc = pppox_protos[sk->sk_protocol]->ioctl ? pppox_protos[sk->sk_protocol]->ioctl(sock, cmd, arg) : -ENOTTY; } release_sock(sk); return rc; } EXPORT_SYMBOL(pppox_ioctl); #ifdef CONFIG_COMPAT int pppox_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { if (cmd == PPPOEIOCSFWD32) cmd = PPPOEIOCSFWD; return pppox_ioctl(sock, cmd, (unsigned long)compat_ptr(arg)); } EXPORT_SYMBOL(pppox_compat_ioctl); #endif static int pppox_create(struct net *net, struct socket *sock, int protocol, int kern) { int rc = -EPROTOTYPE; if (protocol < 0 || protocol > PX_MAX_PROTO) goto out; rc = -EPROTONOSUPPORT; if (!pppox_protos[protocol]) request_module("net-pf-%d-proto-%d", PF_PPPOX, protocol); if (!pppox_protos[protocol] || !try_module_get(pppox_protos[protocol]->owner)) goto out; rc = pppox_protos[protocol]->create(net, sock, kern); module_put(pppox_protos[protocol]->owner); out: return rc; } static const struct net_proto_family pppox_proto_family = { .family = PF_PPPOX, .create = pppox_create, .owner = THIS_MODULE, }; static int __init pppox_init(void) { return sock_register(&pppox_proto_family); } static void __exit pppox_exit(void) { sock_unregister(PF_PPPOX); } module_init(pppox_init); module_exit(pppox_exit); MODULE_AUTHOR("Michal Ostrowski <mostrows@speakeasy.net>"); MODULE_DESCRIPTION("PPP over Ethernet driver (generic socket layer)"); MODULE_LICENSE("GPL"); MODULE_ALIAS_NETPROTO(PF_PPPOX);
89 102 76 96 63 17 90 90 90 90 90 68 90 90 90 11 11 11 11 11 11 17 17 2 2 2 2 12 13 13 9 6 10 10 23 1 104 104 5 2 2 3 3 17 17 17 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 /* * Copyright (c) 2007, 2017 Oracle and/or its affiliates. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * */ #include <linux/slab.h> #include <linux/types.h> #include <linux/rbtree.h> #include <linux/bitops.h> #include <linux/export.h> #include "rds.h" /* * This file implements the receive side of the unconventional congestion * management in RDS. * * Messages waiting in the receive queue on the receiving socket are accounted * against the sockets SO_RCVBUF option value. Only the payload bytes in the * message are accounted for. If the number of bytes queued equals or exceeds * rcvbuf then the socket is congested. All sends attempted to this socket's * address should return block or return -EWOULDBLOCK. * * Applications are expected to be reasonably tuned such that this situation * very rarely occurs. An application encountering this "back-pressure" is * considered a bug. * * This is implemented by having each node maintain bitmaps which indicate * which ports on bound addresses are congested. As the bitmap changes it is * sent through all the connections which terminate in the local address of the * bitmap which changed. * * The bitmaps are allocated as connections are brought up. This avoids * allocation in the interrupt handling path which queues messages on sockets. * The dense bitmaps let transports send the entire bitmap on any bitmap change * reasonably efficiently. This is much easier to implement than some * finer-grained communication of per-port congestion. The sender does a very * inexpensive bit test to test if the port it's about to send to is congested * or not. */ /* * Interaction with poll is a tad tricky. We want all processes stuck in * poll to wake up and check whether a congested destination became uncongested. * The really sad thing is we have no idea which destinations the application * wants to send to - we don't even know which rds_connections are involved. * So until we implement a more flexible rds poll interface, we have to make * do with this: * We maintain a global counter that is incremented each time a congestion map * update is received. Each rds socket tracks this value, and if rds_poll * finds that the saved generation number is smaller than the global generation * number, it wakes up the process. */ static atomic_t rds_cong_generation = ATOMIC_INIT(0); /* * Congestion monitoring */ static LIST_HEAD(rds_cong_monitor); static DEFINE_RWLOCK(rds_cong_monitor_lock); /* * Yes, a global lock. It's used so infrequently that it's worth keeping it * global to simplify the locking. It's only used in the following * circumstances: * * - on connection buildup to associate a conn with its maps * - on map changes to inform conns of a new map to send * * It's sadly ordered under the socket callback lock and the connection lock. * Receive paths can mark ports congested from interrupt context so the * lock masks interrupts. */ static DEFINE_SPINLOCK(rds_cong_lock); static struct rb_root rds_cong_tree = RB_ROOT; static struct rds_cong_map *rds_cong_tree_walk(const struct in6_addr *addr, struct rds_cong_map *insert) { struct rb_node **p = &rds_cong_tree.rb_node; struct rb_node *parent = NULL; struct rds_cong_map *map; while (*p) { int diff; parent = *p; map = rb_entry(parent, struct rds_cong_map, m_rb_node); diff = rds_addr_cmp(addr, &map->m_addr); if (diff < 0) p = &(*p)->rb_left; else if (diff > 0) p = &(*p)->rb_right; else return map; } if (insert) { rb_link_node(&insert->m_rb_node, parent, p); rb_insert_color(&insert->m_rb_node, &rds_cong_tree); } return NULL; } /* * There is only ever one bitmap for any address. Connections try and allocate * these bitmaps in the process getting pointers to them. The bitmaps are only * ever freed as the module is removed after all connections have been freed. */ static struct rds_cong_map *rds_cong_from_addr(const struct in6_addr *addr) { struct rds_cong_map *map; struct rds_cong_map *ret = NULL; unsigned long zp; unsigned long i; unsigned long flags; map = kzalloc(sizeof(struct rds_cong_map), GFP_KERNEL); if (!map) return NULL; map->m_addr = *addr; init_waitqueue_head(&map->m_waitq); INIT_LIST_HEAD(&map->m_conn_list); for (i = 0; i < RDS_CONG_MAP_PAGES; i++) { zp = get_zeroed_page(GFP_KERNEL); if (zp == 0) goto out; map->m_page_addrs[i] = zp; } spin_lock_irqsave(&rds_cong_lock, flags); ret = rds_cong_tree_walk(addr, map); spin_unlock_irqrestore(&rds_cong_lock, flags); if (!ret) { ret = map; map = NULL; } out: if (map) { for (i = 0; i < RDS_CONG_MAP_PAGES && map->m_page_addrs[i]; i++) free_page(map->m_page_addrs[i]); kfree(map); } rdsdebug("map %p for addr %pI6c\n", ret, addr); return ret; } /* * Put the conn on its local map's list. This is called when the conn is * really added to the hash. It's nested under the rds_conn_lock, sadly. */ void rds_cong_add_conn(struct rds_connection *conn) { unsigned long flags; rdsdebug("conn %p now on map %p\n", conn, conn->c_lcong); spin_lock_irqsave(&rds_cong_lock, flags); list_add_tail(&conn->c_map_item, &conn->c_lcong->m_conn_list); spin_unlock_irqrestore(&rds_cong_lock, flags); } void rds_cong_remove_conn(struct rds_connection *conn) { unsigned long flags; rdsdebug("removing conn %p from map %p\n", conn, conn->c_lcong); spin_lock_irqsave(&rds_cong_lock, flags); list_del_init(&conn->c_map_item); spin_unlock_irqrestore(&rds_cong_lock, flags); } int rds_cong_get_maps(struct rds_connection *conn) { conn->c_lcong = rds_cong_from_addr(&conn->c_laddr); conn->c_fcong = rds_cong_from_addr(&conn->c_faddr); if (!(conn->c_lcong && conn->c_fcong)) return -ENOMEM; return 0; } void rds_cong_queue_updates(struct rds_cong_map *map) { struct rds_connection *conn; unsigned long flags; spin_lock_irqsave(&rds_cong_lock, flags); list_for_each_entry(conn, &map->m_conn_list, c_map_item) { struct rds_conn_path *cp = &conn->c_path[0]; rcu_read_lock(); if (!test_and_set_bit(0, &conn->c_map_queued) && !rds_destroy_pending(cp->cp_conn)) { rds_stats_inc(s_cong_update_queued); /* We cannot inline the call to rds_send_xmit() here * for two reasons (both pertaining to a TCP transport): * 1. When we get here from the receive path, we * are already holding the sock_lock (held by * tcp_v4_rcv()). So inlining calls to * tcp_setsockopt and/or tcp_sendmsg will deadlock * when it tries to get the sock_lock()) * 2. Interrupts are masked so that we can mark the * the port congested from both send and recv paths. * (See comment around declaration of rdc_cong_lock). * An attempt to get the sock_lock() here will * therefore trigger warnings. * Defer the xmit to rds_send_worker() instead. */ queue_delayed_work(rds_wq, &cp->cp_send_w, 0); } rcu_read_unlock(); } spin_unlock_irqrestore(&rds_cong_lock, flags); } void rds_cong_map_updated(struct rds_cong_map *map, uint64_t portmask) { rdsdebug("waking map %p for %pI4\n", map, &map->m_addr); rds_stats_inc(s_cong_update_received); atomic_inc(&rds_cong_generation); if (waitqueue_active(&map->m_waitq)) wake_up(&map->m_waitq); if (waitqueue_active(&rds_poll_waitq)) wake_up_all(&rds_poll_waitq); if (portmask && !list_empty(&rds_cong_monitor)) { unsigned long flags; struct rds_sock *rs; read_lock_irqsave(&rds_cong_monitor_lock, flags); list_for_each_entry(rs, &rds_cong_monitor, rs_cong_list) { spin_lock(&rs->rs_lock); rs->rs_cong_notify |= (rs->rs_cong_mask & portmask); rs->rs_cong_mask &= ~portmask; spin_unlock(&rs->rs_lock); if (rs->rs_cong_notify) rds_wake_sk_sleep(rs); } read_unlock_irqrestore(&rds_cong_monitor_lock, flags); } } EXPORT_SYMBOL_GPL(rds_cong_map_updated); int rds_cong_updated_since(unsigned long *recent) { unsigned long gen = atomic_read(&rds_cong_generation); if (likely(*recent == gen)) return 0; *recent = gen; return 1; } /* * We're called under the locking that protects the sockets receive buffer * consumption. This makes it a lot easier for the caller to only call us * when it knows that an existing set bit needs to be cleared, and vice versa. * We can't block and we need to deal with concurrent sockets working against * the same per-address map. */ void rds_cong_set_bit(struct rds_cong_map *map, __be16 port) { unsigned long i; unsigned long off; rdsdebug("setting congestion for %pI4:%u in map %p\n", &map->m_addr, ntohs(port), map); i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS; off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS; set_bit_le(off, (void *)map->m_page_addrs[i]); } void rds_cong_clear_bit(struct rds_cong_map *map, __be16 port) { unsigned long i; unsigned long off; rdsdebug("clearing congestion for %pI4:%u in map %p\n", &map->m_addr, ntohs(port), map); i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS; off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS; clear_bit_le(off, (void *)map->m_page_addrs[i]); } static int rds_cong_test_bit(struct rds_cong_map *map, __be16 port) { unsigned long i; unsigned long off; i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS; off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS; return test_bit_le(off, (void *)map->m_page_addrs[i]); } void rds_cong_add_socket(struct rds_sock *rs) { unsigned long flags; write_lock_irqsave(&rds_cong_monitor_lock, flags); if (list_empty(&rs->rs_cong_list)) list_add(&rs->rs_cong_list, &rds_cong_monitor); write_unlock_irqrestore(&rds_cong_monitor_lock, flags); } void rds_cong_remove_socket(struct rds_sock *rs) { unsigned long flags; struct rds_cong_map *map; write_lock_irqsave(&rds_cong_monitor_lock, flags); list_del_init(&rs->rs_cong_list); write_unlock_irqrestore(&rds_cong_monitor_lock, flags); /* update congestion map for now-closed port */ spin_lock_irqsave(&rds_cong_lock, flags); map = rds_cong_tree_walk(&rs->rs_bound_addr, NULL); spin_unlock_irqrestore(&rds_cong_lock, flags); if (map && rds_cong_test_bit(map, rs->rs_bound_port)) { rds_cong_clear_bit(map, rs->rs_bound_port); rds_cong_queue_updates(map); } } int rds_cong_wait(struct rds_cong_map *map, __be16 port, int nonblock, struct rds_sock *rs) { if (!rds_cong_test_bit(map, port)) return 0; if (nonblock) { if (rs && rs->rs_cong_monitor) { unsigned long flags; /* It would have been nice to have an atomic set_bit on * a uint64_t. */ spin_lock_irqsave(&rs->rs_lock, flags); rs->rs_cong_mask |= RDS_CONG_MONITOR_MASK(ntohs(port)); spin_unlock_irqrestore(&rs->rs_lock, flags); /* Test again - a congestion update may have arrived in * the meantime. */ if (!rds_cong_test_bit(map, port)) return 0; } rds_stats_inc(s_cong_send_error); return -ENOBUFS; } rds_stats_inc(s_cong_send_blocked); rdsdebug("waiting on map %p for port %u\n", map, be16_to_cpu(port)); return wait_event_interruptible(map->m_waitq, !rds_cong_test_bit(map, port)); } void rds_cong_exit(void) { struct rb_node *node; struct rds_cong_map *map; unsigned long i; while ((node = rb_first(&rds_cong_tree))) { map = rb_entry(node, struct rds_cong_map, m_rb_node); rdsdebug("freeing map %p\n", map); rb_erase(&map->m_rb_node, &rds_cong_tree); for (i = 0; i < RDS_CONG_MAP_PAGES && map->m_page_addrs[i]; i++) free_page(map->m_page_addrs[i]); kfree(map); } } /* * Allocate a RDS message containing a congestion update. */ struct rds_message *rds_cong_update_alloc(struct rds_connection *conn) { struct rds_cong_map *map = conn->c_lcong; struct rds_message *rm; rm = rds_message_map_pages(map->m_page_addrs, RDS_CONG_MAP_BYTES); if (!IS_ERR(rm)) rm->m_inc.i_hdr.h_flags = RDS_FLAG_CONG_BITMAP; return rm; }
3 1 2 2 1 1 1 2 2 1 2 2 1 1 1 5 5 1 1 1 6 5 1 1 1 1 1 1 40 40 40 36 2 127 3 3 3 25 1 1 2 1 1 2 1 1 1 22 127 127 2 74 2 3 3 1 319 320 2 1 2 2 1 2 4 3 2 2 2 2 2 2 3 1 4 2 2 1 1 4 2 3 2 1 2 1 5 50 2 1 3 1 1 2 2 1 12 17 2 2 2 1 1 1 6 9 27 24 24 27 14 14 125 22 22 1 8 2 1 8 7 6 4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 /* * KVM Microsoft Hyper-V emulation * * derived from arch/x86/kvm/x86.c * * Copyright (C) 2006 Qumranet, Inc. * Copyright (C) 2008 Qumranet, Inc. * Copyright IBM Corporation, 2008 * Copyright 2010 Red Hat, Inc. and/or its affiliates. * Copyright (C) 2015 Andrey Smetanin <asmetanin@virtuozzo.com> * * Authors: * Avi Kivity <avi@qumranet.com> * Yaniv Kamay <yaniv@qumranet.com> * Amit Shah <amit.shah@qumranet.com> * Ben-Ami Yassour <benami@il.ibm.com> * Andrey Smetanin <asmetanin@virtuozzo.com> * * This work is licensed under the terms of the GNU GPL, version 2. See * the COPYING file in the top-level directory. * */ #include "x86.h" #include "lapic.h" #include "ioapic.h" #include "hyperv.h" #include <linux/kvm_host.h> #include <linux/highmem.h> #include <linux/sched/cputime.h> #include <linux/eventfd.h> #include <asm/apicdef.h> #include <trace/events/kvm.h> #include "trace.h" static inline u64 synic_read_sint(struct kvm_vcpu_hv_synic *synic, int sint) { return atomic64_read(&synic->sint[sint]); } static inline int synic_get_sint_vector(u64 sint_value) { if (sint_value & HV_SYNIC_SINT_MASKED) return -1; return sint_value & HV_SYNIC_SINT_VECTOR_MASK; } static bool synic_has_vector_connected(struct kvm_vcpu_hv_synic *synic, int vector) { int i; for (i = 0; i < ARRAY_SIZE(synic->sint); i++) { if (synic_get_sint_vector(synic_read_sint(synic, i)) == vector) return true; } return false; } static bool synic_has_vector_auto_eoi(struct kvm_vcpu_hv_synic *synic, int vector) { int i; u64 sint_value; for (i = 0; i < ARRAY_SIZE(synic->sint); i++) { sint_value = synic_read_sint(synic, i); if (synic_get_sint_vector(sint_value) == vector && sint_value & HV_SYNIC_SINT_AUTO_EOI) return true; } return false; } static void synic_update_vector(struct kvm_vcpu_hv_synic *synic, int vector) { if (vector < HV_SYNIC_FIRST_VALID_VECTOR) return; if (synic_has_vector_connected(synic, vector)) __set_bit(vector, synic->vec_bitmap); else __clear_bit(vector, synic->vec_bitmap); if (synic_has_vector_auto_eoi(synic, vector)) __set_bit(vector, synic->auto_eoi_bitmap); else __clear_bit(vector, synic->auto_eoi_bitmap); } static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint, u64 data, bool host) { int vector, old_vector; bool masked; vector = data & HV_SYNIC_SINT_VECTOR_MASK; masked = data & HV_SYNIC_SINT_MASKED; /* * Valid vectors are 16-255, however, nested Hyper-V attempts to write * default '0x10000' value on boot and this should not #GP. We need to * allow zero-initing the register from host as well. */ if (vector < HV_SYNIC_FIRST_VALID_VECTOR && !host && !masked) return 1; /* * Guest may configure multiple SINTs to use the same vector, so * we maintain a bitmap of vectors handled by synic, and a * bitmap of vectors with auto-eoi behavior. The bitmaps are * updated here, and atomically queried on fast paths. */ old_vector = synic_read_sint(synic, sint) & HV_SYNIC_SINT_VECTOR_MASK; atomic64_set(&synic->sint[sint], data); synic_update_vector(synic, old_vector); synic_update_vector(synic, vector); /* Load SynIC vectors into EOI exit bitmap */ kvm_make_request(KVM_REQ_SCAN_IOAPIC, synic_to_vcpu(synic)); return 0; } static struct kvm_vcpu *get_vcpu_by_vpidx(struct kvm *kvm, u32 vpidx) { struct kvm_vcpu *vcpu = NULL; int i; if (vpidx >= KVM_MAX_VCPUS) return NULL; vcpu = kvm_get_vcpu(kvm, vpidx); if (vcpu && vcpu_to_hv_vcpu(vcpu)->vp_index == vpidx) return vcpu; kvm_for_each_vcpu(i, vcpu, kvm) if (vcpu_to_hv_vcpu(vcpu)->vp_index == vpidx) return vcpu; return NULL; } static struct kvm_vcpu_hv_synic *synic_get(struct kvm *kvm, u32 vpidx) { struct kvm_vcpu *vcpu; struct kvm_vcpu_hv_synic *synic; vcpu = get_vcpu_by_vpidx(kvm, vpidx); if (!vcpu) return NULL; synic = vcpu_to_synic(vcpu); return (synic->active) ? synic : NULL; } static void synic_clear_sint_msg_pending(struct kvm_vcpu_hv_synic *synic, u32 sint) { struct kvm_vcpu *vcpu = synic_to_vcpu(synic); struct page *page; gpa_t gpa; struct hv_message *msg; struct hv_message_page *msg_page; gpa = synic->msg_page & PAGE_MASK; page = kvm_vcpu_gfn_to_page(vcpu, gpa >> PAGE_SHIFT); if (is_error_page(page)) { vcpu_err(vcpu, "Hyper-V SynIC can't get msg page, gpa 0x%llx\n", gpa); return; } msg_page = kmap_atomic(page); msg = &msg_page->sint_message[sint]; msg->header.message_flags.msg_pending = 0; kunmap_atomic(msg_page); kvm_release_page_dirty(page); kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT); } static void kvm_hv_notify_acked_sint(struct kvm_vcpu *vcpu, u32 sint) { struct kvm *kvm = vcpu->kvm; struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu); struct kvm_vcpu_hv *hv_vcpu = vcpu_to_hv_vcpu(vcpu); struct kvm_vcpu_hv_stimer *stimer; int gsi, idx, stimers_pending; trace_kvm_hv_notify_acked_sint(vcpu->vcpu_id, sint); if (synic->msg_page & HV_SYNIC_SIMP_ENABLE) synic_clear_sint_msg_pending(synic, sint); /* Try to deliver pending Hyper-V SynIC timers messages */ stimers_pending = 0; for (idx = 0; idx < ARRAY_SIZE(hv_vcpu->stimer); idx++) { stimer = &hv_vcpu->stimer[idx]; if (stimer->msg_pending && (stimer->config & HV_STIMER_ENABLE) && HV_STIMER_SINT(stimer->config) == sint) { set_bit(stimer->index, hv_vcpu->stimer_pending_bitmap); stimers_pending++; } } if (stimers_pending) kvm_make_request(KVM_REQ_HV_STIMER, vcpu); idx = srcu_read_lock(&kvm->irq_srcu); gsi = atomic_read(&synic->sint_to_gsi[sint]); if (gsi != -1) kvm_notify_acked_gsi(kvm, gsi); srcu_read_unlock(&kvm->irq_srcu, idx); } static void synic_exit(struct kvm_vcpu_hv_synic *synic, u32 msr) { struct kvm_vcpu *vcpu = synic_to_vcpu(synic); struct kvm_vcpu_hv *hv_vcpu = &vcpu->arch.hyperv; hv_vcpu->exit.type = KVM_EXIT_HYPERV_SYNIC; hv_vcpu->exit.u.synic.msr = msr; hv_vcpu->exit.u.synic.control = synic->control; hv_vcpu->exit.u.synic.evt_page = synic->evt_page; hv_vcpu->exit.u.synic.msg_page = synic->msg_page; kvm_make_request(KVM_REQ_HV_EXIT, vcpu); } static int synic_set_msr(struct kvm_vcpu_hv_synic *synic, u32 msr, u64 data, bool host) { struct kvm_vcpu *vcpu = synic_to_vcpu(synic); int ret; if (!synic->active && (!host || data)) return 1; trace_kvm_hv_synic_set_msr(vcpu->vcpu_id, msr, data, host); ret = 0; switch (msr) { case HV_X64_MSR_SCONTROL: synic->control = data; if (!host) synic_exit(synic, msr); break; case HV_X64_MSR_SVERSION: if (!host) { ret = 1; break; } synic->version = data; break; case HV_X64_MSR_SIEFP: if ((data & HV_SYNIC_SIEFP_ENABLE) && !host && !synic->dont_zero_synic_pages) if (kvm_clear_guest(vcpu->kvm, data & PAGE_MASK, PAGE_SIZE)) { ret = 1; break; } synic->evt_page = data; if (!host) synic_exit(synic, msr); break; case HV_X64_MSR_SIMP: if ((data & HV_SYNIC_SIMP_ENABLE) && !host && !synic->dont_zero_synic_pages) if (kvm_clear_guest(vcpu->kvm, data & PAGE_MASK, PAGE_SIZE)) { ret = 1; break; } synic->msg_page = data; if (!host) synic_exit(synic, msr); break; case HV_X64_MSR_EOM: { int i; if (!synic->active) break; for (i = 0; i < ARRAY_SIZE(synic->sint); i++) kvm_hv_notify_acked_sint(vcpu, i); break; } case HV_X64_MSR_SINT0 ... HV_X64_MSR_SINT15: ret = synic_set_sint(synic, msr - HV_X64_MSR_SINT0, data, host); break; default: ret = 1; break; } return ret; } static int synic_get_msr(struct kvm_vcpu_hv_synic *synic, u32 msr, u64 *pdata, bool host) { int ret; if (!synic->active && !host) return 1; ret = 0; switch (msr) { case HV_X64_MSR_SCONTROL: *pdata = synic->control; break; case HV_X64_MSR_SVERSION: *pdata = synic->version; break; case HV_X64_MSR_SIEFP: *pdata = synic->evt_page; break; case HV_X64_MSR_SIMP: *pdata = synic->msg_page; break; case HV_X64_MSR_EOM: *pdata = 0; break; case HV_X64_MSR_SINT0 ... HV_X64_MSR_SINT15: *pdata = atomic64_read(&synic->sint[msr - HV_X64_MSR_SINT0]); break; default: ret = 1; break; } return ret; } static int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint) { struct kvm_vcpu *vcpu = synic_to_vcpu(synic); struct kvm_lapic_irq irq; int ret, vector; if (KVM_BUG_ON(!lapic_in_kernel(vcpu), vcpu->kvm)) return -EINVAL; if (sint >= ARRAY_SIZE(synic->sint)) return -EINVAL; vector = synic_get_sint_vector(synic_read_sint(synic, sint)); if (vector < 0) return -ENOENT; memset(&irq, 0, sizeof(irq)); irq.shorthand = APIC_DEST_SELF; irq.dest_mode = APIC_DEST_PHYSICAL; irq.delivery_mode = APIC_DM_FIXED; irq.vector = vector; irq.level = 1; ret = kvm_irq_delivery_to_apic(vcpu->kvm, vcpu->arch.apic, &irq, NULL); trace_kvm_hv_synic_set_irq(vcpu->vcpu_id, sint, irq.vector, ret); return ret; } int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vpidx, u32 sint) { struct kvm_vcpu_hv_synic *synic; synic = synic_get(kvm, vpidx); if (!synic) return -EINVAL; return synic_set_irq(synic, sint); } void kvm_hv_synic_send_eoi(struct kvm_vcpu *vcpu, int vector) { struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu); int i; trace_kvm_hv_synic_send_eoi(vcpu->vcpu_id, vector); for (i = 0; i < ARRAY_SIZE(synic->sint); i++) if (synic_get_sint_vector(synic_read_sint(synic, i)) == vector) kvm_hv_notify_acked_sint(vcpu, i); } static int kvm_hv_set_sint_gsi(struct kvm *kvm, u32 vpidx, u32 sint, int gsi) { struct kvm_vcpu_hv_synic *synic; synic = synic_get(kvm, vpidx); if (!synic) return -EINVAL; if (sint >= ARRAY_SIZE(synic->sint_to_gsi)) return -EINVAL; atomic_set(&synic->sint_to_gsi[sint], gsi); return 0; } void kvm_hv_irq_routing_update(struct kvm *kvm) { struct kvm_irq_routing_table *irq_rt; struct kvm_kernel_irq_routing_entry *e; u32 gsi; irq_rt = srcu_dereference_check(kvm->irq_routing, &kvm->irq_srcu, lockdep_is_held(&kvm->irq_lock)); for (gsi = 0; gsi < irq_rt->nr_rt_entries; gsi++) { hlist_for_each_entry(e, &irq_rt->map[gsi], link) { if (e->type == KVM_IRQ_ROUTING_HV_SINT) kvm_hv_set_sint_gsi(kvm, e->hv_sint.vcpu, e->hv_sint.sint, gsi); } } } static void synic_init(struct kvm_vcpu_hv_synic *synic) { int i; memset(synic, 0, sizeof(*synic)); synic->version = HV_SYNIC_VERSION_1; for (i = 0; i < ARRAY_SIZE(synic->sint); i++) { atomic64_set(&synic->sint[i], HV_SYNIC_SINT_MASKED); atomic_set(&synic->sint_to_gsi[i], -1); } } static u64 get_time_ref_counter(struct kvm *kvm) { struct kvm_hv *hv = &kvm->arch.hyperv; struct kvm_vcpu *vcpu; u64 tsc; /* * The guest has not set up the TSC page or the clock isn't * stable, fall back to get_kvmclock_ns. */ if (!hv->tsc_ref.tsc_sequence) return div_u64(get_kvmclock_ns(kvm), 100); vcpu = kvm_get_vcpu(kvm, 0); tsc = kvm_read_l1_tsc(vcpu, rdtsc()); return mul_u64_u64_shr(tsc, hv->tsc_ref.tsc_scale, 64) + hv->tsc_ref.tsc_offset; } static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer, bool vcpu_kick) { struct kvm_vcpu *vcpu = stimer_to_vcpu(stimer); set_bit(stimer->index, vcpu_to_hv_vcpu(vcpu)->stimer_pending_bitmap); kvm_make_request(KVM_REQ_HV_STIMER, vcpu); if (vcpu_kick) kvm_vcpu_kick(vcpu); } static void stimer_cleanup(struct kvm_vcpu_hv_stimer *stimer) { struct kvm_vcpu *vcpu = stimer_to_vcpu(stimer); trace_kvm_hv_stimer_cleanup(stimer_to_vcpu(stimer)->vcpu_id, stimer->index); hrtimer_cancel(&stimer->timer); clear_bit(stimer->index, vcpu_to_hv_vcpu(vcpu)->stimer_pending_bitmap); stimer->msg_pending = false; stimer->exp_time = 0; } static enum hrtimer_restart stimer_timer_callback(struct hrtimer *timer) { struct kvm_vcpu_hv_stimer *stimer; stimer = container_of(timer, struct kvm_vcpu_hv_stimer, timer); trace_kvm_hv_stimer_callback(stimer_to_vcpu(stimer)->vcpu_id, stimer->index); stimer_mark_pending(stimer, true); return HRTIMER_NORESTART; } /* * stimer_start() assumptions: * a) stimer->count is not equal to 0 * b) stimer->config has HV_STIMER_ENABLE flag */ static int stimer_start(struct kvm_vcpu_hv_stimer *stimer) { u64 time_now; ktime_t ktime_now; time_now = get_time_ref_counter(stimer_to_vcpu(stimer)->kvm); ktime_now = ktime_get(); if (stimer->config & HV_STIMER_PERIODIC) { if (stimer->exp_time) { if (time_now >= stimer->exp_time) { u64 remainder; div64_u64_rem(time_now - stimer->exp_time, stimer->count, &remainder); stimer->exp_time = time_now + (stimer->count - remainder); } } else stimer->exp_time = time_now + stimer->count; trace_kvm_hv_stimer_start_periodic( stimer_to_vcpu(stimer)->vcpu_id, stimer->index, time_now, stimer->exp_time); hrtimer_start(&stimer->timer, ktime_add_ns(ktime_now, 100 * (stimer->exp_time - time_now)), HRTIMER_MODE_ABS); return 0; } stimer->exp_time = stimer->count; if (time_now >= stimer->count) { /* * Expire timer according to Hypervisor Top-Level Functional * specification v4(15.3.1): * "If a one shot is enabled and the specified count is in * the past, it will expire immediately." */ stimer_mark_pending(stimer, false); return 0; } trace_kvm_hv_stimer_start_one_shot(stimer_to_vcpu(stimer)->vcpu_id, stimer->index, time_now, stimer->count); hrtimer_start(&stimer->timer, ktime_add_ns(ktime_now, 100 * (stimer->count - time_now)), HRTIMER_MODE_ABS); return 0; } static int stimer_set_config(struct kvm_vcpu_hv_stimer *stimer, u64 config, bool host) { struct kvm_vcpu *vcpu = stimer_to_vcpu(stimer); struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu); if (!synic->active && (!host || config)) return 1; trace_kvm_hv_stimer_set_config(stimer_to_vcpu(stimer)->vcpu_id, stimer->index, config, host); stimer_cleanup(stimer); if ((stimer->config & HV_STIMER_ENABLE) && HV_STIMER_SINT(config) == 0) config &= ~HV_STIMER_ENABLE; stimer->config = config; stimer_mark_pending(stimer, false); return 0; } static int stimer_set_count(struct kvm_vcpu_hv_stimer *stimer, u64 count, bool host) { struct kvm_vcpu *vcpu = stimer_to_vcpu(stimer); struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu); if (!synic->active && (!host || count)) return 1; trace_kvm_hv_stimer_set_count(stimer_to_vcpu(stimer)->vcpu_id, stimer->index, count, host); stimer_cleanup(stimer); stimer->count = count; if (stimer->count == 0) stimer->config &= ~HV_STIMER_ENABLE; else if (stimer->config & HV_STIMER_AUTOENABLE) stimer->config |= HV_STIMER_ENABLE; stimer_mark_pending(stimer, false); return 0; } static int stimer_get_config(struct kvm_vcpu_hv_stimer *stimer, u64 *pconfig) { *pconfig = stimer->config; return 0; } static int stimer_get_count(struct kvm_vcpu_hv_stimer *stimer, u64 *pcount) { *pcount = stimer->count; return 0; } static int synic_deliver_msg(struct kvm_vcpu_hv_synic *synic, u32 sint, struct hv_message *src_msg) { struct kvm_vcpu *vcpu = synic_to_vcpu(synic); struct page *page; gpa_t gpa; struct hv_message *dst_msg; int r; struct hv_message_page *msg_page; if (!(synic->msg_page & HV_SYNIC_SIMP_ENABLE)) return -ENOENT; gpa = synic->msg_page & PAGE_MASK; page = kvm_vcpu_gfn_to_page(vcpu, gpa >> PAGE_SHIFT); if (is_error_page(page)) return -EFAULT; msg_page = kmap_atomic(page); dst_msg = &msg_page->sint_message[sint]; if (sync_cmpxchg(&dst_msg->header.message_type, HVMSG_NONE, src_msg->header.message_type) != HVMSG_NONE) { dst_msg->header.message_flags.msg_pending = 1; r = -EAGAIN; } else { memcpy(&dst_msg->u.payload, &src_msg->u.payload, src_msg->header.payload_size); dst_msg->header.message_type = src_msg->header.message_type; dst_msg->header.payload_size = src_msg->header.payload_size; r = synic_set_irq(synic, sint); if (r >= 1) r = 0; else if (r == 0) r = -EFAULT; } kunmap_atomic(msg_page); kvm_release_page_dirty(page); kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT); return r; } static int stimer_send_msg(struct kvm_vcpu_hv_stimer *stimer) { struct kvm_vcpu *vcpu = stimer_to_vcpu(stimer); struct hv_message *msg = &stimer->msg; struct hv_timer_message_payload *payload = (struct hv_timer_message_payload *)&msg->u.payload; payload->expiration_time = stimer->exp_time; payload->delivery_time = get_time_ref_counter(vcpu->kvm); return synic_deliver_msg(vcpu_to_synic(vcpu), HV_STIMER_SINT(stimer->config), msg); } static void stimer_expiration(struct kvm_vcpu_hv_stimer *stimer) { int r; stimer->msg_pending = true; r = stimer_send_msg(stimer); trace_kvm_hv_stimer_expiration(stimer_to_vcpu(stimer)->vcpu_id, stimer->index, r); if (!r) { stimer->msg_pending = false; if (!(stimer->config & HV_STIMER_PERIODIC)) stimer->config &= ~HV_STIMER_ENABLE; } } void kvm_hv_process_stimers(struct kvm_vcpu *vcpu) { struct kvm_vcpu_hv *hv_vcpu = vcpu_to_hv_vcpu(vcpu); struct kvm_vcpu_hv_stimer *stimer; u64 time_now, exp_time; int i; for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++) if (test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap)) { stimer = &hv_vcpu->stimer[i]; if (stimer->config & HV_STIMER_ENABLE) { exp_time = stimer->exp_time; if (exp_time) { time_now = get_time_ref_counter(vcpu->kvm); if (time_now >= exp_time) stimer_expiration(stimer); } if ((stimer->config & HV_STIMER_ENABLE) && stimer->count) { if (!stimer->msg_pending) stimer_start(stimer); } else stimer_cleanup(stimer); } } } void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu) { struct kvm_vcpu_hv *hv_vcpu = vcpu_to_hv_vcpu(vcpu); int i; for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++) stimer_cleanup(&hv_vcpu->stimer[i]); } bool kvm_hv_assist_page_enabled(struct kvm_vcpu *vcpu) { if (!(vcpu->arch.hyperv.hv_vapic & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE)) return false; return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED; } EXPORT_SYMBOL_GPL(kvm_hv_assist_page_enabled); bool kvm_hv_get_assist_page(struct kvm_vcpu *vcpu, struct hv_vp_assist_page *assist_page) { if (!kvm_hv_assist_page_enabled(vcpu)) return false; return !kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, assist_page, sizeof(*assist_page)); } EXPORT_SYMBOL_GPL(kvm_hv_get_assist_page); static void stimer_prepare_msg(struct kvm_vcpu_hv_stimer *stimer) { struct hv_message *msg = &stimer->msg; struct hv_timer_message_payload *payload = (struct hv_timer_message_payload *)&msg->u.payload; memset(&msg->header, 0, sizeof(msg->header)); msg->header.message_type = HVMSG_TIMER_EXPIRED; msg->header.payload_size = sizeof(*payload); payload->timer_index = stimer->index; payload->expiration_time = 0; payload->delivery_time = 0; } static void stimer_init(struct kvm_vcpu_hv_stimer *stimer, int timer_index) { memset(stimer, 0, sizeof(*stimer)); stimer->index = timer_index; hrtimer_init(&stimer->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); stimer->timer.function = stimer_timer_callback; stimer_prepare_msg(stimer); } void kvm_hv_vcpu_init(struct kvm_vcpu *vcpu) { struct kvm_vcpu_hv *hv_vcpu = vcpu_to_hv_vcpu(vcpu); int i; synic_init(&hv_vcpu->synic); bitmap_zero(hv_vcpu->stimer_pending_bitmap, HV_SYNIC_STIMER_COUNT); for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++) stimer_init(&hv_vcpu->stimer[i], i); } void kvm_hv_vcpu_postcreate(struct kvm_vcpu *vcpu) { struct kvm_vcpu_hv *hv_vcpu = vcpu_to_hv_vcpu(vcpu); hv_vcpu->vp_index = kvm_vcpu_get_idx(vcpu); } int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages) { struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu); /* * Hyper-V SynIC auto EOI SINT's are * not compatible with APICV, so deactivate APICV */ kvm_vcpu_deactivate_apicv(vcpu); synic->active = true; synic->dont_zero_synic_pages = dont_zero_synic_pages; return 0; } static bool kvm_hv_msr_partition_wide(u32 msr) { bool r = false; switch (msr) { case HV_X64_MSR_GUEST_OS_ID: case HV_X64_MSR_HYPERCALL: case HV_X64_MSR_REFERENCE_TSC: case HV_X64_MSR_TIME_REF_COUNT: case HV_X64_MSR_CRASH_CTL: case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4: case HV_X64_MSR_RESET: case HV_X64_MSR_REENLIGHTENMENT_CONTROL: case HV_X64_MSR_TSC_EMULATION_CONTROL: case HV_X64_MSR_TSC_EMULATION_STATUS: r = true; break; } return r; } static int kvm_hv_msr_get_crash_data(struct kvm_vcpu *vcpu, u32 index, u64 *pdata) { struct kvm_hv *hv = &vcpu->kvm->arch.hyperv; size_t size = ARRAY_SIZE(hv->hv_crash_param); if (WARN_ON_ONCE(index >= size)) return -EINVAL; *pdata = hv->hv_crash_param[array_index_nospec(index, size)]; return 0; } static int kvm_hv_msr_get_crash_ctl(struct kvm_vcpu *vcpu, u64 *pdata) { struct kvm_hv *hv = &vcpu->kvm->arch.hyperv; *pdata = hv->hv_crash_ctl; return 0; } static int kvm_hv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u64 data, bool host) { struct kvm_hv *hv = &vcpu->kvm->arch.hyperv; if (host) hv->hv_crash_ctl = data & HV_X64_MSR_CRASH_CTL_NOTIFY; if (!host && (data & HV_X64_MSR_CRASH_CTL_NOTIFY)) { vcpu_debug(vcpu, "hv crash (0x%llx 0x%llx 0x%llx 0x%llx 0x%llx)\n", hv->hv_crash_param[0], hv->hv_crash_param[1], hv->hv_crash_param[2], hv->hv_crash_param[3], hv->hv_crash_param[4]); /* Send notification about crash to user space */ kvm_make_request(KVM_REQ_HV_CRASH, vcpu); } return 0; } static int kvm_hv_msr_set_crash_data(struct kvm_vcpu *vcpu, u32 index, u64 data) { struct kvm_hv *hv = &vcpu->kvm->arch.hyperv; size_t size = ARRAY_SIZE(hv->hv_crash_param); if (WARN_ON_ONCE(index >= size)) return -EINVAL; hv->hv_crash_param[array_index_nospec(index, size)] = data; return 0; } /* * The kvmclock and Hyper-V TSC page use similar formulas, and converting * between them is possible: * * kvmclock formula: * nsec = (ticks - tsc_timestamp) * tsc_to_system_mul * 2^(tsc_shift-32) * + system_time * * Hyper-V formula: * nsec/100 = ticks * scale / 2^64 + offset * * When tsc_timestamp = system_time = 0, offset is zero in the Hyper-V formula. * By dividing the kvmclock formula by 100 and equating what's left we get: * ticks * scale / 2^64 = ticks * tsc_to_system_mul * 2^(tsc_shift-32) / 100 * scale / 2^64 = tsc_to_system_mul * 2^(tsc_shift-32) / 100 * scale = tsc_to_system_mul * 2^(32+tsc_shift) / 100 * * Now expand the kvmclock formula and divide by 100: * nsec = ticks * tsc_to_system_mul * 2^(tsc_shift-32) * - tsc_timestamp * tsc_to_system_mul * 2^(tsc_shift-32) * + system_time * nsec/100 = ticks * tsc_to_system_mul * 2^(tsc_shift-32) / 100 * - tsc_timestamp * tsc_to_system_mul * 2^(tsc_shift-32) / 100 * + system_time / 100 * * Replace tsc_to_system_mul * 2^(tsc_shift-32) / 100 by scale / 2^64: * nsec/100 = ticks * scale / 2^64 * - tsc_timestamp * scale / 2^64 * + system_time / 100 * * Equate with the Hyper-V formula so that ticks * scale / 2^64 cancels out: * offset = system_time / 100 - tsc_timestamp * scale / 2^64 * * These two equivalencies are implemented in this function. */ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock, HV_REFERENCE_TSC_PAGE *tsc_ref) { u64 max_mul; if (!(hv_clock->flags & PVCLOCK_TSC_STABLE_BIT)) return false; /* * check if scale would overflow, if so we use the time ref counter * tsc_to_system_mul * 2^(tsc_shift+32) / 100 >= 2^64 * tsc_to_system_mul / 100 >= 2^(32-tsc_shift) * tsc_to_system_mul >= 100 * 2^(32-tsc_shift) */ max_mul = 100ull << (32 - hv_clock->tsc_shift); if (hv_clock->tsc_to_system_mul >= max_mul) return false; /* * Otherwise compute the scale and offset according to the formulas * derived above. */ tsc_ref->tsc_scale = mul_u64_u32_div(1ULL << (32 + hv_clock->tsc_shift), hv_clock->tsc_to_system_mul, 100); tsc_ref->tsc_offset = hv_clock->system_time; do_div(tsc_ref->tsc_offset, 100); tsc_ref->tsc_offset -= mul_u64_u64_shr(hv_clock->tsc_timestamp, tsc_ref->tsc_scale, 64); return true; } void kvm_hv_setup_tsc_page(struct kvm *kvm, struct pvclock_vcpu_time_info *hv_clock) { struct kvm_hv *hv = &kvm->arch.hyperv; u32 tsc_seq; u64 gfn; BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence)); BUILD_BUG_ON(offsetof(HV_REFERENCE_TSC_PAGE, tsc_sequence) != 0); if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)) return; mutex_lock(&kvm->arch.hyperv.hv_lock); if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)) goto out_unlock; gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT; /* * Because the TSC parameters only vary when there is a * change in the master clock, do not bother with caching. */ if (unlikely(kvm_read_guest(kvm, gfn_to_gpa(gfn), &tsc_seq, sizeof(tsc_seq)))) goto out_unlock; /* * While we're computing and writing the parameters, force the * guest to use the time reference count MSR. */ hv->tsc_ref.tsc_sequence = 0; if (kvm_write_guest(kvm, gfn_to_gpa(gfn), &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence))) goto out_unlock; if (!compute_tsc_page_parameters(hv_clock, &hv->tsc_ref)) goto out_unlock; /* Ensure sequence is zero before writing the rest of the struct. */ smp_wmb(); if (kvm_write_guest(kvm, gfn_to_gpa(gfn), &hv->tsc_ref, sizeof(hv->tsc_ref))) goto out_unlock; /* * Now switch to the TSC page mechanism by writing the sequence. */ tsc_seq++; if (tsc_seq == 0xFFFFFFFF || tsc_seq == 0) tsc_seq = 1; /* Write the struct entirely before the non-zero sequence. */ smp_wmb(); hv->tsc_ref.tsc_sequence = tsc_seq; kvm_write_guest(kvm, gfn_to_gpa(gfn), &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)); out_unlock: mutex_unlock(&kvm->arch.hyperv.hv_lock); } static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host) { struct kvm *kvm = vcpu->kvm; struct kvm_hv *hv = &kvm->arch.hyperv; switch (msr) { case HV_X64_MSR_GUEST_OS_ID: hv->hv_guest_os_id = data; /* setting guest os id to zero disables hypercall page */ if (!hv->hv_guest_os_id) hv->hv_hypercall &= ~HV_X64_MSR_HYPERCALL_ENABLE; break; case HV_X64_MSR_HYPERCALL: { u64 gfn; unsigned long addr; u8 instructions[4]; /* if guest os id is not set hypercall should remain disabled */ if (!hv->hv_guest_os_id) break; if (!(data & HV_X64_MSR_HYPERCALL_ENABLE)) { hv->hv_hypercall = data; break; } gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT; addr = gfn_to_hva(kvm, gfn); if (kvm_is_error_hva(addr)) return 1; kvm_x86_ops->patch_hypercall(vcpu, instructions); ((unsigned char *)instructions)[3] = 0xc3; /* ret */ if (__copy_to_user((void __user *)addr, instructions, 4)) return 1; hv->hv_hypercall = data; mark_page_dirty(kvm, gfn); break; } case HV_X64_MSR_REFERENCE_TSC: hv->hv_tsc_page = data; if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu); break; case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4: return kvm_hv_msr_set_crash_data(vcpu, msr - HV_X64_MSR_CRASH_P0, data); case HV_X64_MSR_CRASH_CTL: return kvm_hv_msr_set_crash_ctl(vcpu, data, host); case HV_X64_MSR_RESET: if (data == 1) { vcpu_debug(vcpu, "hyper-v reset requested\n"); kvm_make_request(KVM_REQ_HV_RESET, vcpu); } break; case HV_X64_MSR_REENLIGHTENMENT_CONTROL: hv->hv_reenlightenment_control = data; break; case HV_X64_MSR_TSC_EMULATION_CONTROL: hv->hv_tsc_emulation_control = data; break; case HV_X64_MSR_TSC_EMULATION_STATUS: hv->hv_tsc_emulation_status = data; break; case HV_X64_MSR_TIME_REF_COUNT: /* read-only, but still ignore it if host-initiated */ if (!host) return 1; break; default: vcpu_unimpl(vcpu, "Hyper-V uhandled wrmsr: 0x%x data 0x%llx\n", msr, data); return 1; } return 0; } /* Calculate cpu time spent by current task in 100ns units */ static u64 current_task_runtime_100ns(void) { u64 utime, stime; task_cputime_adjusted(current, &utime, &stime); return div_u64(utime + stime, 100); } static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host) { struct kvm_vcpu_hv *hv_vcpu = &vcpu->arch.hyperv; switch (msr) { case HV_X64_MSR_VP_INDEX: { struct kvm_hv *hv = &vcpu->kvm->arch.hyperv; int vcpu_idx = kvm_vcpu_get_idx(vcpu); u32 new_vp_index = (u32)data; if (!host || new_vp_index >= KVM_MAX_VCPUS) return 1; if (new_vp_index == hv_vcpu->vp_index) return 0; /* * The VP index is initialized to vcpu_index by * kvm_hv_vcpu_postcreate so they initially match. Now the * VP index is changing, adjust num_mismatched_vp_indexes if * it now matches or no longer matches vcpu_idx. */ if (hv_vcpu->vp_index == vcpu_idx) atomic_inc(&hv->num_mismatched_vp_indexes); else if (new_vp_index == vcpu_idx) atomic_dec(&hv->num_mismatched_vp_indexes); hv_vcpu->vp_index = new_vp_index; break; } case HV_X64_MSR_VP_ASSIST_PAGE: { u64 gfn; unsigned long addr; if (!(data & HV_X64_MSR_VP_ASSIST_PAGE_ENABLE)) { hv_vcpu->hv_vapic = data; if (kvm_lapic_enable_pv_eoi(vcpu, 0, 0)) return 1; break; } gfn = data >> HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT; addr = kvm_vcpu_gfn_to_hva(vcpu, gfn); if (kvm_is_error_hva(addr)) return 1; if (__clear_user((void __user *)addr, PAGE_SIZE)) return 1; hv_vcpu->hv_vapic = data; kvm_vcpu_mark_page_dirty(vcpu, gfn); if (kvm_lapic_enable_pv_eoi(vcpu, gfn_to_gpa(gfn) | KVM_MSR_ENABLED, sizeof(struct hv_vp_assist_page))) return 1; break; } case HV_X64_MSR_EOI: return kvm_hv_vapic_msr_write(vcpu, APIC_EOI, data); case HV_X64_MSR_ICR: return kvm_hv_vapic_msr_write(vcpu, APIC_ICR, data); case HV_X64_MSR_TPR: return kvm_hv_vapic_msr_write(vcpu, APIC_TASKPRI, data); case HV_X64_MSR_VP_RUNTIME: if (!host) return 1; hv_vcpu->runtime_offset = data - current_task_runtime_100ns(); break; case HV_X64_MSR_SCONTROL: case HV_X64_MSR_SVERSION: case HV_X64_MSR_SIEFP: case HV_X64_MSR_SIMP: case HV_X64_MSR_EOM: case HV_X64_MSR_SINT0 ... HV_X64_MSR_SINT15: return synic_set_msr(vcpu_to_synic(vcpu), msr, data, host); case HV_X64_MSR_STIMER0_CONFIG: case HV_X64_MSR_STIMER1_CONFIG: case HV_X64_MSR_STIMER2_CONFIG: case HV_X64_MSR_STIMER3_CONFIG: { int timer_index = (msr - HV_X64_MSR_STIMER0_CONFIG)/2; return stimer_set_config(vcpu_to_stimer(vcpu, timer_index), data, host); } case HV_X64_MSR_STIMER0_COUNT: case HV_X64_MSR_STIMER1_COUNT: case HV_X64_MSR_STIMER2_COUNT: case HV_X64_MSR_STIMER3_COUNT: { int timer_index = (msr - HV_X64_MSR_STIMER0_COUNT)/2; return stimer_set_count(vcpu_to_stimer(vcpu, timer_index), data, host); } case HV_X64_MSR_TSC_FREQUENCY: case HV_X64_MSR_APIC_FREQUENCY: /* read-only, but still ignore it if host-initiated */ if (!host) return 1; break; default: vcpu_unimpl(vcpu, "Hyper-V uhandled wrmsr: 0x%x data 0x%llx\n", msr, data); return 1; } return 0; } static int kvm_hv_get_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) { u64 data = 0; struct kvm *kvm = vcpu->kvm; struct kvm_hv *hv = &kvm->arch.hyperv; switch (msr) { case HV_X64_MSR_GUEST_OS_ID: data = hv->hv_guest_os_id; break; case HV_X64_MSR_HYPERCALL: data = hv->hv_hypercall; break; case HV_X64_MSR_TIME_REF_COUNT: data = get_time_ref_counter(kvm); break; case HV_X64_MSR_REFERENCE_TSC: data = hv->hv_tsc_page; break; case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4: return kvm_hv_msr_get_crash_data(vcpu, msr - HV_X64_MSR_CRASH_P0, pdata); case HV_X64_MSR_CRASH_CTL: return kvm_hv_msr_get_crash_ctl(vcpu, pdata); case HV_X64_MSR_RESET: data = 0; break; case HV_X64_MSR_REENLIGHTENMENT_CONTROL: data = hv->hv_reenlightenment_control; break; case HV_X64_MSR_TSC_EMULATION_CONTROL: data = hv->hv_tsc_emulation_control; break; case HV_X64_MSR_TSC_EMULATION_STATUS: data = hv->hv_tsc_emulation_status; break; default: vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); return 1; } *pdata = data; return 0; } static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host) { u64 data = 0; struct kvm_vcpu_hv *hv_vcpu = &vcpu->arch.hyperv; switch (msr) { case HV_X64_MSR_VP_INDEX: data = hv_vcpu->vp_index; break; case HV_X64_MSR_EOI: return kvm_hv_vapic_msr_read(vcpu, APIC_EOI, pdata); case HV_X64_MSR_ICR: return kvm_hv_vapic_msr_read(vcpu, APIC_ICR, pdata); case HV_X64_MSR_TPR: return kvm_hv_vapic_msr_read(vcpu, APIC_TASKPRI, pdata); case HV_X64_MSR_VP_ASSIST_PAGE: data = hv_vcpu->hv_vapic; break; case HV_X64_MSR_VP_RUNTIME: data = current_task_runtime_100ns() + hv_vcpu->runtime_offset; break; case HV_X64_MSR_SCONTROL: case HV_X64_MSR_SVERSION: case HV_X64_MSR_SIEFP: case HV_X64_MSR_SIMP: case HV_X64_MSR_EOM: case HV_X64_MSR_SINT0 ... HV_X64_MSR_SINT15: return synic_get_msr(vcpu_to_synic(vcpu), msr, pdata, host); case HV_X64_MSR_STIMER0_CONFIG: case HV_X64_MSR_STIMER1_CONFIG: case HV_X64_MSR_STIMER2_CONFIG: case HV_X64_MSR_STIMER3_CONFIG: { int timer_index = (msr - HV_X64_MSR_STIMER0_CONFIG)/2; return stimer_get_config(vcpu_to_stimer(vcpu, timer_index), pdata); } case HV_X64_MSR_STIMER0_COUNT: case HV_X64_MSR_STIMER1_COUNT: case HV_X64_MSR_STIMER2_COUNT: case HV_X64_MSR_STIMER3_COUNT: { int timer_index = (msr - HV_X64_MSR_STIMER0_COUNT)/2; return stimer_get_count(vcpu_to_stimer(vcpu, timer_index), pdata); } case HV_X64_MSR_TSC_FREQUENCY: data = (u64)vcpu->arch.virtual_tsc_khz * 1000; break; case HV_X64_MSR_APIC_FREQUENCY: data = APIC_BUS_FREQUENCY; break; default: vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr); return 1; } *pdata = data; return 0; } int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host) { if (kvm_hv_msr_partition_wide(msr)) { int r; mutex_lock(&vcpu->kvm->arch.hyperv.hv_lock); r = kvm_hv_set_msr_pw(vcpu, msr, data, host); mutex_unlock(&vcpu->kvm->arch.hyperv.hv_lock); return r; } else return kvm_hv_set_msr(vcpu, msr, data, host); } int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host) { if (kvm_hv_msr_partition_wide(msr)) { int r; mutex_lock(&vcpu->kvm->arch.hyperv.hv_lock); r = kvm_hv_get_msr_pw(vcpu, msr, pdata); mutex_unlock(&vcpu->kvm->arch.hyperv.hv_lock); return r; } else return kvm_hv_get_msr(vcpu, msr, pdata, host); } static __always_inline int get_sparse_bank_no(u64 valid_bank_mask, int bank_no) { int i = 0, j; if (!(valid_bank_mask & BIT_ULL(bank_no))) return -1; for (j = 0; j < bank_no; j++) if (valid_bank_mask & BIT_ULL(j)) i++; return i; } static u64 kvm_hv_flush_tlb(struct kvm_vcpu *current_vcpu, u64 ingpa, u16 rep_cnt, bool ex) { struct kvm *kvm = current_vcpu->kvm; struct kvm_vcpu_hv *hv_current = &current_vcpu->arch.hyperv; struct hv_tlb_flush_ex flush_ex; struct hv_tlb_flush flush; struct kvm_vcpu *vcpu; unsigned long vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)] = {0}; unsigned long valid_bank_mask = 0; u64 sparse_banks[64]; int sparse_banks_len, i; bool all_cpus; if (!ex) { if (unlikely(kvm_read_guest(kvm, ingpa, &flush, sizeof(flush)))) return HV_STATUS_INVALID_HYPERCALL_INPUT; trace_kvm_hv_flush_tlb(flush.processor_mask, flush.address_space, flush.flags); sparse_banks[0] = flush.processor_mask; /* * Work around possible WS2012 bug: it sends hypercalls * with processor_mask = 0x0 and HV_FLUSH_ALL_PROCESSORS clear, * while also expecting us to flush something and crashing if * we don't. Let's treat processor_mask == 0 same as * HV_FLUSH_ALL_PROCESSORS. */ all_cpus = (flush.flags & HV_FLUSH_ALL_PROCESSORS) || flush.processor_mask == 0; } else { if (unlikely(kvm_read_guest(kvm, ingpa, &flush_ex, sizeof(flush_ex)))) return HV_STATUS_INVALID_HYPERCALL_INPUT; trace_kvm_hv_flush_tlb_ex(flush_ex.hv_vp_set.valid_bank_mask, flush_ex.hv_vp_set.format, flush_ex.address_space, flush_ex.flags); valid_bank_mask = flush_ex.hv_vp_set.valid_bank_mask; all_cpus = flush_ex.hv_vp_set.format != HV_GENERIC_SET_SPARSE_4K; sparse_banks_len = bitmap_weight(&valid_bank_mask, 64) * sizeof(sparse_banks[0]); if (!sparse_banks_len && !all_cpus) goto ret_success; if (!all_cpus && kvm_read_guest(kvm, ingpa + offsetof(struct hv_tlb_flush_ex, hv_vp_set.bank_contents), sparse_banks, sparse_banks_len)) return HV_STATUS_INVALID_HYPERCALL_INPUT; } cpumask_clear(&hv_current->tlb_lush); kvm_for_each_vcpu(i, vcpu, kvm) { struct kvm_vcpu_hv *hv = &vcpu->arch.hyperv; int bank = hv->vp_index / 64, sbank = 0; if (!all_cpus) { /* Banks >64 can't be represented */ if (bank >= 64) continue; /* Non-ex hypercalls can only address first 64 vCPUs */ if (!ex && bank) continue; if (ex) { /* * Check is the bank of this vCPU is in sparse * set and get the sparse bank number. */ sbank = get_sparse_bank_no(valid_bank_mask, bank); if (sbank < 0) continue; } if (!(sparse_banks[sbank] & BIT_ULL(hv->vp_index % 64))) continue; } /* * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we * can't analyze it here, flush TLB regardless of the specified * address space. */ __set_bit(i, vcpu_bitmap); } kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH | KVM_REQUEST_NO_WAKEUP, vcpu_bitmap, &hv_current->tlb_lush); ret_success: /* We always do full TLB flush, set rep_done = rep_cnt. */ return (u64)HV_STATUS_SUCCESS | ((u64)rep_cnt << HV_HYPERCALL_REP_COMP_OFFSET); } bool kvm_hv_hypercall_enabled(struct kvm *kvm) { return READ_ONCE(kvm->arch.hyperv.hv_hypercall) & HV_X64_MSR_HYPERCALL_ENABLE; } static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result) { bool longmode; longmode = is_64_bit_mode(vcpu); if (longmode) kvm_register_write(vcpu, VCPU_REGS_RAX, result); else { kvm_register_write(vcpu, VCPU_REGS_RDX, result >> 32); kvm_register_write(vcpu, VCPU_REGS_RAX, result & 0xffffffff); } } static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result) { kvm_hv_hypercall_set_result(vcpu, result); ++vcpu->stat.hypercalls; return kvm_skip_emulated_instruction(vcpu); } static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu) { return kvm_hv_hypercall_complete(vcpu, vcpu->run->hyperv.u.hcall.result); } static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param) { struct eventfd_ctx *eventfd; if (unlikely(!fast)) { int ret; gpa_t gpa = param; if ((gpa & (__alignof__(param) - 1)) || offset_in_page(gpa) + sizeof(param) > PAGE_SIZE) return HV_STATUS_INVALID_ALIGNMENT; ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param)); if (ret < 0) return HV_STATUS_INVALID_ALIGNMENT; } /* * Per spec, bits 32-47 contain the extra "flag number". However, we * have no use for it, and in all known usecases it is zero, so just * report lookup failure if it isn't. */ if (param & 0xffff00000000ULL) return HV_STATUS_INVALID_PORT_ID; /* remaining bits are reserved-zero */ if (param & ~KVM_HYPERV_CONN_ID_MASK) return HV_STATUS_INVALID_HYPERCALL_INPUT; /* the eventfd is protected by vcpu->kvm->srcu, but conn_to_evt isn't */ rcu_read_lock(); eventfd = idr_find(&vcpu->kvm->arch.hyperv.conn_to_evt, param); rcu_read_unlock(); if (!eventfd) return HV_STATUS_INVALID_PORT_ID; eventfd_signal(eventfd, 1); return HV_STATUS_SUCCESS; } int kvm_hv_hypercall(struct kvm_vcpu *vcpu) { u64 param, ingpa, outgpa, ret = HV_STATUS_SUCCESS; uint16_t code, rep_idx, rep_cnt; bool fast, longmode, rep; /* * hypercall generates UD from non zero cpl and real mode * per HYPER-V spec */ if (kvm_x86_ops->get_cpl(vcpu) != 0 || !is_protmode(vcpu)) { kvm_queue_exception(vcpu, UD_VECTOR); return 1; } longmode = is_64_bit_mode(vcpu); if (!longmode) { param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) | (kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff); ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) | (kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff); outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) | (kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff); } #ifdef CONFIG_X86_64 else { param = kvm_register_read(vcpu, VCPU_REGS_RCX); ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX); outgpa = kvm_register_read(vcpu, VCPU_REGS_R8); } #endif code = param & 0xffff; fast = !!(param & HV_HYPERCALL_FAST_BIT); rep_cnt = (param >> HV_HYPERCALL_REP_COMP_OFFSET) & 0xfff; rep_idx = (param >> HV_HYPERCALL_REP_START_OFFSET) & 0xfff; rep = !!(rep_cnt || rep_idx); trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa); switch (code) { case HVCALL_NOTIFY_LONG_SPIN_WAIT: if (unlikely(rep)) { ret = HV_STATUS_INVALID_HYPERCALL_INPUT; break; } kvm_vcpu_on_spin(vcpu, true); break; case HVCALL_SIGNAL_EVENT: if (unlikely(rep)) { ret = HV_STATUS_INVALID_HYPERCALL_INPUT; break; } ret = kvm_hvcall_signal_event(vcpu, fast, ingpa); if (ret != HV_STATUS_INVALID_PORT_ID) break; /* maybe userspace knows this conn_id: fall through */ case HVCALL_POST_MESSAGE: /* don't bother userspace if it has no way to handle it */ if (unlikely(rep || !vcpu_to_synic(vcpu)->active)) { ret = HV_STATUS_INVALID_HYPERCALL_INPUT; break; } vcpu->run->exit_reason = KVM_EXIT_HYPERV; vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL; vcpu->run->hyperv.u.hcall.input = param; vcpu->run->hyperv.u.hcall.params[0] = ingpa; vcpu->run->hyperv.u.hcall.params[1] = outgpa; vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace; return 0; case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST: if (unlikely(fast || !rep_cnt || rep_idx)) { ret = HV_STATUS_INVALID_HYPERCALL_INPUT; break; } ret = kvm_hv_flush_tlb(vcpu, ingpa, rep_cnt, false); break; case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE: if (unlikely(fast || rep)) { ret = HV_STATUS_INVALID_HYPERCALL_INPUT; break; } ret = kvm_hv_flush_tlb(vcpu, ingpa, rep_cnt, false); break; case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX: if (unlikely(fast || !rep_cnt || rep_idx)) { ret = HV_STATUS_INVALID_HYPERCALL_INPUT; break; } ret = kvm_hv_flush_tlb(vcpu, ingpa, rep_cnt, true); break; case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX: if (unlikely(fast || rep)) { ret = HV_STATUS_INVALID_HYPERCALL_INPUT; break; } ret = kvm_hv_flush_tlb(vcpu, ingpa, rep_cnt, true); break; default: ret = HV_STATUS_INVALID_HYPERCALL_CODE; break; } return kvm_hv_hypercall_complete(vcpu, ret); } void kvm_hv_init_vm(struct kvm *kvm) { mutex_init(&kvm->arch.hyperv.hv_lock); idr_init(&kvm->arch.hyperv.conn_to_evt); } void kvm_hv_destroy_vm(struct kvm *kvm) { struct eventfd_ctx *eventfd; int i; idr_for_each_entry(&kvm->arch.hyperv.conn_to_evt, eventfd, i) eventfd_ctx_put(eventfd); idr_destroy(&kvm->arch.hyperv.conn_to_evt); } static int kvm_hv_eventfd_assign(struct kvm *kvm, u32 conn_id, int fd) { struct kvm_hv *hv = &kvm->arch.hyperv; struct eventfd_ctx *eventfd; int ret; eventfd = eventfd_ctx_fdget(fd); if (IS_ERR(eventfd)) return PTR_ERR(eventfd); mutex_lock(&hv->hv_lock); ret = idr_alloc(&hv->conn_to_evt, eventfd, conn_id, conn_id + 1, GFP_KERNEL); mutex_unlock(&hv->hv_lock); if (ret >= 0) return 0; if (ret == -ENOSPC) ret = -EEXIST; eventfd_ctx_put(eventfd); return ret; } static int kvm_hv_eventfd_deassign(struct kvm *kvm, u32 conn_id) { struct kvm_hv *hv = &kvm->arch.hyperv; struct eventfd_ctx *eventfd; mutex_lock(&hv->hv_lock); eventfd = idr_remove(&hv->conn_to_evt, conn_id); mutex_unlock(&hv->hv_lock); if (!eventfd) return -ENOENT; synchronize_srcu(&kvm->srcu); eventfd_ctx_put(eventfd); return 0; } int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args) { if ((args->flags & ~KVM_HYPERV_EVENTFD_DEASSIGN) || (args->conn_id & ~KVM_HYPERV_CONN_ID_MASK)) return -EINVAL; if (args->flags == KVM_HYPERV_EVENTFD_DEASSIGN) return kvm_hv_eventfd_deassign(kvm, args->conn_id); return kvm_hv_eventfd_assign(kvm, args->conn_id, args->fd); }
211 210 67 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 /* FS-Cache tracepoints * * Copyright (C) 2016 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public Licence * as published by the Free Software Foundation; either version * 2 of the Licence, or (at your option) any later version. */ #undef TRACE_SYSTEM #define TRACE_SYSTEM fscache #if !defined(_TRACE_FSCACHE_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_FSCACHE_H #include <linux/fscache.h> #include <linux/tracepoint.h> /* * Define enums for tracing information. */ #ifndef __FSCACHE_DECLARE_TRACE_ENUMS_ONCE_ONLY #define __FSCACHE_DECLARE_TRACE_ENUMS_ONCE_ONLY enum fscache_cookie_trace { fscache_cookie_collision, fscache_cookie_discard, fscache_cookie_get_acquire_parent, fscache_cookie_get_attach_object, fscache_cookie_get_reacquire, fscache_cookie_get_register_netfs, fscache_cookie_put_acquire_nobufs, fscache_cookie_put_dup_netfs, fscache_cookie_put_relinquish, fscache_cookie_put_object, fscache_cookie_put_parent, }; enum fscache_page_trace { fscache_page_cached, fscache_page_inval, fscache_page_maybe_release, fscache_page_radix_clear_store, fscache_page_radix_delete, fscache_page_radix_insert, fscache_page_radix_pend2store, fscache_page_radix_set_pend, fscache_page_uncache, fscache_page_write, fscache_page_write_end, fscache_page_write_end_pend, fscache_page_write_end_noc, fscache_page_write_wait, fscache_page_trace__nr }; enum fscache_op_trace { fscache_op_cancel, fscache_op_cancel_all, fscache_op_cancelled, fscache_op_completed, fscache_op_enqueue_async, fscache_op_enqueue_mythread, fscache_op_gc, fscache_op_init, fscache_op_put, fscache_op_run, fscache_op_signal, fscache_op_submit, fscache_op_submit_ex, fscache_op_work, fscache_op_trace__nr }; enum fscache_page_op_trace { fscache_page_op_alloc_one, fscache_page_op_attr_changed, fscache_page_op_check_consistency, fscache_page_op_invalidate, fscache_page_op_retr_multi, fscache_page_op_retr_one, fscache_page_op_write_one, fscache_page_op_trace__nr }; #endif /* * Declare tracing information enums and their string mappings for display. */ #define fscache_cookie_traces \ EM(fscache_cookie_collision, "*COLLISION*") \ EM(fscache_cookie_discard, "DISCARD") \ EM(fscache_cookie_get_acquire_parent, "GET prn") \ EM(fscache_cookie_get_attach_object, "GET obj") \ EM(fscache_cookie_get_reacquire, "GET raq") \ EM(fscache_cookie_get_register_netfs, "GET net") \ EM(fscache_cookie_put_acquire_nobufs, "PUT nbf") \ EM(fscache_cookie_put_dup_netfs, "PUT dnt") \ EM(fscache_cookie_put_relinquish, "PUT rlq") \ EM(fscache_cookie_put_object, "PUT obj") \ E_(fscache_cookie_put_parent, "PUT prn") #define fscache_page_traces \ EM(fscache_page_cached, "Cached ") \ EM(fscache_page_inval, "InvalPg") \ EM(fscache_page_maybe_release, "MayRels") \ EM(fscache_page_uncache, "Uncache") \ EM(fscache_page_radix_clear_store, "RxCStr ") \ EM(fscache_page_radix_delete, "RxDel ") \ EM(fscache_page_radix_insert, "RxIns ") \ EM(fscache_page_radix_pend2store, "RxP2S ") \ EM(fscache_page_radix_set_pend, "RxSPend ") \ EM(fscache_page_write, "WritePg") \ EM(fscache_page_write_end, "EndPgWr") \ EM(fscache_page_write_end_pend, "EndPgWP") \ EM(fscache_page_write_end_noc, "EndPgNC") \ E_(fscache_page_write_wait, "WtOnWrt") #define fscache_op_traces \ EM(fscache_op_cancel, "Cancel1") \ EM(fscache_op_cancel_all, "CancelA") \ EM(fscache_op_cancelled, "Canclld") \ EM(fscache_op_completed, "Complet") \ EM(fscache_op_enqueue_async, "EnqAsyn") \ EM(fscache_op_enqueue_mythread, "EnqMyTh") \ EM(fscache_op_gc, "GC ") \ EM(fscache_op_init, "Init ") \ EM(fscache_op_put, "Put ") \ EM(fscache_op_run, "Run ") \ EM(fscache_op_signal, "Signal ") \ EM(fscache_op_submit, "Submit ") \ EM(fscache_op_submit_ex, "SubmitX") \ E_(fscache_op_work, "Work ") #define fscache_page_op_traces \ EM(fscache_page_op_alloc_one, "Alloc1 ") \ EM(fscache_page_op_attr_changed, "AttrChg") \ EM(fscache_page_op_check_consistency, "CheckCn") \ EM(fscache_page_op_invalidate, "Inval ") \ EM(fscache_page_op_retr_multi, "RetrMul") \ EM(fscache_page_op_retr_one, "Retr1 ") \ E_(fscache_page_op_write_one, "Write1 ") /* * Export enum symbols via userspace. */ #undef EM #undef E_ #define EM(a, b) TRACE_DEFINE_ENUM(a); #define E_(a, b) TRACE_DEFINE_ENUM(a); fscache_cookie_traces; /* * Now redefine the EM() and E_() macros to map the enums to the strings that * will be printed in the output. */ #undef EM #undef E_ #define EM(a, b) { a, b }, #define E_(a, b) { a, b } TRACE_EVENT(fscache_cookie, TP_PROTO(struct fscache_cookie *cookie, enum fscache_cookie_trace where, int usage), TP_ARGS(cookie, where, usage), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(struct fscache_cookie *, parent ) __field(enum fscache_cookie_trace, where ) __field(int, usage ) __field(int, n_children ) __field(int, n_active ) __field(u8, flags ) ), TP_fast_assign( __entry->cookie = cookie; __entry->parent = cookie->parent; __entry->where = where; __entry->usage = usage; __entry->n_children = atomic_read(&cookie->n_children); __entry->n_active = atomic_read(&cookie->n_active); __entry->flags = cookie->flags; ), TP_printk("%s c=%p u=%d p=%p Nc=%d Na=%d f=%02x", __print_symbolic(__entry->where, fscache_cookie_traces), __entry->cookie, __entry->usage, __entry->parent, __entry->n_children, __entry->n_active, __entry->flags) ); TRACE_EVENT(fscache_netfs, TP_PROTO(struct fscache_netfs *netfs), TP_ARGS(netfs), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __array(char, name, 8 ) ), TP_fast_assign( __entry->cookie = netfs->primary_index; strncpy(__entry->name, netfs->name, 8); __entry->name[7] = 0; ), TP_printk("c=%p n=%s", __entry->cookie, __entry->name) ); TRACE_EVENT(fscache_acquire, TP_PROTO(struct fscache_cookie *cookie), TP_ARGS(cookie), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(struct fscache_cookie *, parent ) __array(char, name, 8 ) __field(int, p_usage ) __field(int, p_n_children ) __field(u8, p_flags ) ), TP_fast_assign( __entry->cookie = cookie; __entry->parent = cookie->parent; __entry->p_usage = atomic_read(&cookie->parent->usage); __entry->p_n_children = atomic_read(&cookie->parent->n_children); __entry->p_flags = cookie->parent->flags; memcpy(__entry->name, cookie->def->name, 8); __entry->name[7] = 0; ), TP_printk("c=%p p=%p pu=%d pc=%d pf=%02x n=%s", __entry->cookie, __entry->parent, __entry->p_usage, __entry->p_n_children, __entry->p_flags, __entry->name) ); TRACE_EVENT(fscache_relinquish, TP_PROTO(struct fscache_cookie *cookie, bool retire), TP_ARGS(cookie, retire), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(struct fscache_cookie *, parent ) __field(int, usage ) __field(int, n_children ) __field(int, n_active ) __field(u8, flags ) __field(bool, retire ) ), TP_fast_assign( __entry->cookie = cookie; __entry->parent = cookie->parent; __entry->usage = atomic_read(&cookie->usage); __entry->n_children = atomic_read(&cookie->n_children); __entry->n_active = atomic_read(&cookie->n_active); __entry->flags = cookie->flags; __entry->retire = retire; ), TP_printk("c=%p u=%d p=%p Nc=%d Na=%d f=%02x r=%u", __entry->cookie, __entry->usage, __entry->parent, __entry->n_children, __entry->n_active, __entry->flags, __entry->retire) ); TRACE_EVENT(fscache_enable, TP_PROTO(struct fscache_cookie *cookie), TP_ARGS(cookie), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(int, usage ) __field(int, n_children ) __field(int, n_active ) __field(u8, flags ) ), TP_fast_assign( __entry->cookie = cookie; __entry->usage = atomic_read(&cookie->usage); __entry->n_children = atomic_read(&cookie->n_children); __entry->n_active = atomic_read(&cookie->n_active); __entry->flags = cookie->flags; ), TP_printk("c=%p u=%d Nc=%d Na=%d f=%02x", __entry->cookie, __entry->usage, __entry->n_children, __entry->n_active, __entry->flags) ); TRACE_EVENT(fscache_disable, TP_PROTO(struct fscache_cookie *cookie), TP_ARGS(cookie), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(int, usage ) __field(int, n_children ) __field(int, n_active ) __field(u8, flags ) ), TP_fast_assign( __entry->cookie = cookie; __entry->usage = atomic_read(&cookie->usage); __entry->n_children = atomic_read(&cookie->n_children); __entry->n_active = atomic_read(&cookie->n_active); __entry->flags = cookie->flags; ), TP_printk("c=%p u=%d Nc=%d Na=%d f=%02x", __entry->cookie, __entry->usage, __entry->n_children, __entry->n_active, __entry->flags) ); TRACE_EVENT(fscache_osm, TP_PROTO(struct fscache_object *object, const struct fscache_state *state, bool wait, bool oob, s8 event_num), TP_ARGS(object, state, wait, oob, event_num), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(struct fscache_object *, object ) __array(char, state, 8 ) __field(bool, wait ) __field(bool, oob ) __field(s8, event_num ) ), TP_fast_assign( __entry->cookie = object->cookie; __entry->object = object; __entry->wait = wait; __entry->oob = oob; __entry->event_num = event_num; memcpy(__entry->state, state->short_name, 8); ), TP_printk("c=%p o=%p %s %s%sev=%d", __entry->cookie, __entry->object, __entry->state, __print_symbolic(__entry->wait, { true, "WAIT" }, { false, "WORK" }), __print_symbolic(__entry->oob, { true, " OOB " }, { false, " " }), __entry->event_num) ); TRACE_EVENT(fscache_page, TP_PROTO(struct fscache_cookie *cookie, struct page *page, enum fscache_page_trace why), TP_ARGS(cookie, page, why), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(pgoff_t, page ) __field(enum fscache_page_trace, why ) ), TP_fast_assign( __entry->cookie = cookie; __entry->page = page->index; __entry->why = why; ), TP_printk("c=%p %s pg=%lx", __entry->cookie, __print_symbolic(__entry->why, fscache_page_traces), __entry->page) ); TRACE_EVENT(fscache_check_page, TP_PROTO(struct fscache_cookie *cookie, struct page *page, void *val, int n), TP_ARGS(cookie, page, val, n), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(void *, page ) __field(void *, val ) __field(int, n ) ), TP_fast_assign( __entry->cookie = cookie; __entry->page = page; __entry->val = val; __entry->n = n; ), TP_printk("c=%p pg=%p val=%p n=%d", __entry->cookie, __entry->page, __entry->val, __entry->n) ); TRACE_EVENT(fscache_wake_cookie, TP_PROTO(struct fscache_cookie *cookie), TP_ARGS(cookie), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) ), TP_fast_assign( __entry->cookie = cookie; ), TP_printk("c=%p", __entry->cookie) ); TRACE_EVENT(fscache_op, TP_PROTO(struct fscache_cookie *cookie, struct fscache_operation *op, enum fscache_op_trace why), TP_ARGS(cookie, op, why), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(struct fscache_operation *, op ) __field(enum fscache_op_trace, why ) ), TP_fast_assign( __entry->cookie = cookie; __entry->op = op; __entry->why = why; ), TP_printk("c=%p op=%p %s", __entry->cookie, __entry->op, __print_symbolic(__entry->why, fscache_op_traces)) ); TRACE_EVENT(fscache_page_op, TP_PROTO(struct fscache_cookie *cookie, struct page *page, struct fscache_operation *op, enum fscache_page_op_trace what), TP_ARGS(cookie, page, op, what), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(pgoff_t, page ) __field(struct fscache_operation *, op ) __field(enum fscache_page_op_trace, what ) ), TP_fast_assign( __entry->cookie = cookie; __entry->page = page ? page->index : 0; __entry->op = op; __entry->what = what; ), TP_printk("c=%p %s pg=%lx op=%p", __entry->cookie, __print_symbolic(__entry->what, fscache_page_op_traces), __entry->page, __entry->op) ); TRACE_EVENT(fscache_wrote_page, TP_PROTO(struct fscache_cookie *cookie, struct page *page, struct fscache_operation *op, int ret), TP_ARGS(cookie, page, op, ret), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(pgoff_t, page ) __field(struct fscache_operation *, op ) __field(int, ret ) ), TP_fast_assign( __entry->cookie = cookie; __entry->page = page->index; __entry->op = op; __entry->ret = ret; ), TP_printk("c=%p pg=%lx op=%p ret=%d", __entry->cookie, __entry->page, __entry->op, __entry->ret) ); TRACE_EVENT(fscache_gang_lookup, TP_PROTO(struct fscache_cookie *cookie, struct fscache_operation *op, void **results, int n, pgoff_t store_limit), TP_ARGS(cookie, op, results, n, store_limit), TP_STRUCT__entry( __field(struct fscache_cookie *, cookie ) __field(struct fscache_operation *, op ) __field(pgoff_t, results0 ) __field(int, n ) __field(pgoff_t, store_limit ) ), TP_fast_assign( __entry->cookie = cookie; __entry->op = op; __entry->results0 = results[0] ? ((struct page *)results[0])->index : (pgoff_t)-1; __entry->n = n; __entry->store_limit = store_limit; ), TP_printk("c=%p op=%p r0=%lx n=%d sl=%lx", __entry->cookie, __entry->op, __entry->results0, __entry->n, __entry->store_limit) ); #endif /* _TRACE_FSCACHE_H */ /* This part must be outside protection */ #include <trace/define_trace.h>
30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _FS_CEPH_SUPER_H #define _FS_CEPH_SUPER_H #include <linux/ceph/ceph_debug.h> #include <asm/unaligned.h> #include <linux/backing-dev.h> #include <linux/completion.h> #include <linux/exportfs.h> #include <linux/fs.h> #include <linux/mempool.h> #include <linux/pagemap.h> #include <linux/wait.h> #include <linux/writeback.h> #include <linux/slab.h> #include <linux/posix_acl.h> #include <linux/refcount.h> #include <linux/ceph/libceph.h> #ifdef CONFIG_CEPH_FSCACHE #include <linux/fscache.h> #endif /* f_type in struct statfs */ #define CEPH_SUPER_MAGIC 0x00c36400 /* large granularity for statfs utilization stats to facilitate * large volume sizes on 32-bit machines. */ #define CEPH_BLOCK_SHIFT 22 /* 4 MB */ #define CEPH_BLOCK (1 << CEPH_BLOCK_SHIFT) #define CEPH_MOUNT_OPT_DIRSTAT (1<<4) /* `cat dirname` for stats */ #define CEPH_MOUNT_OPT_RBYTES (1<<5) /* dir st_bytes = rbytes */ #define CEPH_MOUNT_OPT_NOASYNCREADDIR (1<<7) /* no dcache readdir */ #define CEPH_MOUNT_OPT_INO32 (1<<8) /* 32 bit inos */ #define CEPH_MOUNT_OPT_DCACHE (1<<9) /* use dcache for readdir etc */ #define CEPH_MOUNT_OPT_FSCACHE (1<<10) /* use fscache */ #define CEPH_MOUNT_OPT_NOPOOLPERM (1<<11) /* no pool permission check */ #define CEPH_MOUNT_OPT_MOUNTWAIT (1<<12) /* mount waits if no mds is up */ #define CEPH_MOUNT_OPT_NOQUOTADF (1<<13) /* no root dir quota in statfs */ #define CEPH_MOUNT_OPT_DEFAULT CEPH_MOUNT_OPT_DCACHE #define ceph_set_mount_opt(fsc, opt) \ (fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt; #define ceph_test_mount_opt(fsc, opt) \ (!!((fsc)->mount_options->flags & CEPH_MOUNT_OPT_##opt)) /* max size of osd read request, limited by libceph */ #define CEPH_MAX_READ_SIZE CEPH_MSG_MAX_DATA_LEN /* osd has a configurable limitaion of max write size. * CEPH_MSG_MAX_DATA_LEN should be small enough. */ #define CEPH_MAX_WRITE_SIZE CEPH_MSG_MAX_DATA_LEN #define CEPH_RASIZE_DEFAULT (8192*1024) /* max readahead */ #define CEPH_MAX_READDIR_DEFAULT 1024 #define CEPH_MAX_READDIR_BYTES_DEFAULT (512*1024) #define CEPH_SNAPDIRNAME_DEFAULT ".snap" /* * Delay telling the MDS we no longer want caps, in case we reopen * the file. Delay a minimum amount of time, even if we send a cap * message for some other reason. Otherwise, take the oppotunity to * update the mds to avoid sending another message later. */ #define CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT 5 /* cap release delay */ #define CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT 60 /* cap release delay */ struct ceph_mount_options { int flags; int sb_flags; int wsize; /* max write size */ int rsize; /* max read size */ int rasize; /* max readahead */ int congestion_kb; /* max writeback in flight */ int caps_wanted_delay_min, caps_wanted_delay_max; int max_readdir; /* max readdir result (entires) */ int max_readdir_bytes; /* max readdir result (bytes) */ /* * everything above this point can be memcmp'd; everything below * is handled in compare_mount_options() */ char *snapdir_name; /* default ".snap" */ char *mds_namespace; /* default NULL */ char *server_path; /* default NULL (means "/") */ char *fscache_uniq; /* default NULL */ }; struct ceph_fs_client { struct super_block *sb; struct ceph_mount_options *mount_options; struct ceph_client *client; unsigned long mount_state; int min_caps; /* min caps i added */ loff_t max_file_size; struct ceph_mds_client *mdsc; /* writeback */ mempool_t *wb_pagevec_pool; struct workqueue_struct *wb_wq; struct workqueue_struct *pg_inv_wq; struct workqueue_struct *trunc_wq; atomic_long_t writeback_count; #ifdef CONFIG_DEBUG_FS struct dentry *debugfs_dentry_lru, *debugfs_caps; struct dentry *debugfs_congestion_kb; struct dentry *debugfs_bdi; struct dentry *debugfs_mdsc, *debugfs_mdsmap; struct dentry *debugfs_mds_sessions; #endif #ifdef CONFIG_CEPH_FSCACHE struct fscache_cookie *fscache; #endif }; /* * File i/o capability. This tracks shared state with the metadata * server that allows us to cache or writeback attributes or to read * and write data. For any given inode, we should have one or more * capabilities, one issued by each metadata server, and our * cumulative access is the OR of all issued capabilities. * * Each cap is referenced by the inode's i_caps rbtree and by per-mds * session capability lists. */ struct ceph_cap { struct ceph_inode_info *ci; struct rb_node ci_node; /* per-ci cap tree */ struct ceph_mds_session *session; struct list_head session_caps; /* per-session caplist */ u64 cap_id; /* unique cap id (mds provided) */ union { /* in-use caps */ struct { int issued; /* latest, from the mds */ int implemented; /* implemented superset of issued (for revocation) */ int mds, mds_wanted; }; /* caps to release */ struct { u64 cap_ino; int queue_release; }; }; u32 seq, issue_seq, mseq; u32 cap_gen; /* active/stale cycle */ unsigned long last_used; struct list_head caps_item; }; #define CHECK_CAPS_NODELAY 1 /* do not delay any further */ #define CHECK_CAPS_AUTHONLY 2 /* only check auth cap */ #define CHECK_CAPS_FLUSH 4 /* flush any dirty caps */ struct ceph_cap_flush { u64 tid; int caps; /* 0 means capsnap */ bool wake; /* wake up flush waiters when finish ? */ struct list_head g_list; // global struct list_head i_list; // per inode }; /* * Snapped cap state that is pending flush to mds. When a snapshot occurs, * we first complete any in-process sync writes and writeback any dirty * data before flushing the snapped state (tracked here) back to the MDS. */ struct ceph_cap_snap { refcount_t nref; struct list_head ci_item; struct ceph_cap_flush cap_flush; u64 follows; int issued, dirty; struct ceph_snap_context *context; umode_t mode; kuid_t uid; kgid_t gid; struct ceph_buffer *xattr_blob; u64 xattr_version; u64 size; struct timespec64 mtime, atime, ctime; u64 time_warp_seq; u64 truncate_size; u32 truncate_seq; int writing; /* a sync write is still in progress */ int dirty_pages; /* dirty pages awaiting writeback */ bool inline_data; bool need_flush; }; static inline void ceph_put_cap_snap(struct ceph_cap_snap *capsnap) { if (refcount_dec_and_test(&capsnap->nref)) { if (capsnap->xattr_blob) ceph_buffer_put(capsnap->xattr_blob); kfree(capsnap); } } /* * The frag tree describes how a directory is fragmented, potentially across * multiple metadata servers. It is also used to indicate points where * metadata authority is delegated, and whether/where metadata is replicated. * * A _leaf_ frag will be present in the i_fragtree IFF there is * delegation info. That is, if mds >= 0 || ndist > 0. */ #define CEPH_MAX_DIRFRAG_REP 4 struct ceph_inode_frag { struct rb_node node; /* fragtree state */ u32 frag; int split_by; /* i.e. 2^(split_by) children */ /* delegation and replication info */ int mds; /* -1 if same authority as parent */ int ndist; /* >0 if replicated */ int dist[CEPH_MAX_DIRFRAG_REP]; }; /* * We cache inode xattrs as an encoded blob until they are first used, * at which point we parse them into an rbtree. */ struct ceph_inode_xattr { struct rb_node node; const char *name; int name_len; const char *val; int val_len; int dirty; int should_free_name; int should_free_val; }; /* * Ceph dentry state */ struct ceph_dentry_info { struct ceph_mds_session *lease_session; int lease_shared_gen; u32 lease_gen; u32 lease_seq; unsigned long lease_renew_after, lease_renew_from; struct list_head lru; struct dentry *dentry; unsigned long time; u64 offset; }; struct ceph_inode_xattrs_info { /* * (still encoded) xattr blob. we avoid the overhead of parsing * this until someone actually calls getxattr, etc. * * blob->vec.iov_len == 4 implies there are no xattrs; blob == * NULL means we don't know. */ struct ceph_buffer *blob, *prealloc_blob; struct rb_root index; bool dirty; int count; int names_size; int vals_size; u64 version, index_version; }; /* * Ceph inode. */ struct ceph_inode_info { struct ceph_vino i_vino; /* ceph ino + snap */ spinlock_t i_ceph_lock; u64 i_version; u64 i_inline_version; u32 i_time_warp_seq; unsigned i_ceph_flags; atomic64_t i_release_count; atomic64_t i_ordered_count; atomic64_t i_complete_seq[2]; struct ceph_dir_layout i_dir_layout; struct ceph_file_layout i_layout; char *i_symlink; /* for dirs */ struct timespec64 i_rctime; u64 i_rbytes, i_rfiles, i_rsubdirs; u64 i_files, i_subdirs; /* quotas */ u64 i_max_bytes, i_max_files; struct rb_root i_fragtree; int i_fragtree_nsplits; struct mutex i_fragtree_mutex; struct ceph_inode_xattrs_info i_xattrs; /* capabilities. protected _both_ by i_ceph_lock and cap->session's * s_mutex. */ struct rb_root i_caps; /* cap list */ struct ceph_cap *i_auth_cap; /* authoritative cap, if any */ unsigned i_dirty_caps, i_flushing_caps; /* mask of dirtied fields */ struct list_head i_dirty_item, i_flushing_item; /* we need to track cap writeback on a per-cap-bit basis, to allow * overlapping, pipelined cap flushes to the mds. we can probably * reduce the tid to 8 bits if we're concerned about inode size. */ struct ceph_cap_flush *i_prealloc_cap_flush; struct list_head i_cap_flush_list; wait_queue_head_t i_cap_wq; /* threads waiting on a capability */ unsigned long i_hold_caps_min; /* jiffies */ unsigned long i_hold_caps_max; /* jiffies */ struct list_head i_cap_delay_list; /* for delayed cap release to mds */ struct ceph_cap_reservation i_cap_migration_resv; struct list_head i_cap_snaps; /* snapped state pending flush to mds */ struct ceph_snap_context *i_head_snapc; /* set if wr_buffer_head > 0 or dirty|flushing caps */ unsigned i_snap_caps; /* cap bits for snapped files */ int i_nr_by_mode[CEPH_FILE_MODE_BITS]; /* open file counts */ struct mutex i_truncate_mutex; u32 i_truncate_seq; /* last truncate to smaller size */ u64 i_truncate_size; /* and the size we last truncated down to */ int i_truncate_pending; /* still need to call vmtruncate */ u64 i_max_size; /* max file size authorized by mds */ u64 i_reported_size; /* (max_)size reported to or requested of mds */ u64 i_wanted_max_size; /* offset we'd like to write too */ u64 i_requested_max_size; /* max_size we've requested */ /* held references to caps */ int i_pin_ref; int i_rd_ref, i_rdcache_ref, i_wr_ref, i_wb_ref; int i_wrbuffer_ref, i_wrbuffer_ref_head; atomic_t i_filelock_ref; atomic_t i_shared_gen; /* increment each time we get FILE_SHARED */ u32 i_rdcache_gen; /* incremented each time we get FILE_CACHE. */ u32 i_rdcache_revoking; /* RDCACHE gen to async invalidate, if any */ struct list_head i_unsafe_dirops; /* uncommitted mds dir ops */ struct list_head i_unsafe_iops; /* uncommitted mds inode ops */ spinlock_t i_unsafe_lock; struct ceph_snap_realm *i_snap_realm; /* snap realm (if caps) */ int i_snap_realm_counter; /* snap realm (if caps) */ struct list_head i_snap_realm_item; struct list_head i_snap_flush_item; struct work_struct i_wb_work; /* writeback work */ struct work_struct i_pg_inv_work; /* page invalidation work */ struct work_struct i_vmtruncate_work; #ifdef CONFIG_CEPH_FSCACHE struct fscache_cookie *fscache; u32 i_fscache_gen; #endif struct inode vfs_inode; /* at end */ }; static inline struct ceph_inode_info *ceph_inode(struct inode *inode) { return container_of(inode, struct ceph_inode_info, vfs_inode); } static inline struct ceph_fs_client *ceph_inode_to_client(struct inode *inode) { return (struct ceph_fs_client *)inode->i_sb->s_fs_info; } static inline struct ceph_fs_client *ceph_sb_to_client(struct super_block *sb) { return (struct ceph_fs_client *)sb->s_fs_info; } static inline struct ceph_vino ceph_vino(struct inode *inode) { return ceph_inode(inode)->i_vino; } /* * ino_t is <64 bits on many architectures, blech. * * i_ino (kernel inode) st_ino (userspace) * i386 32 32 * x86_64+ino32 64 32 * x86_64 64 64 */ static inline u32 ceph_ino_to_ino32(__u64 vino) { u32 ino = vino & 0xffffffff; ino ^= vino >> 32; if (!ino) ino = 2; return ino; } /* * kernel i_ino value */ static inline ino_t ceph_vino_to_ino(struct ceph_vino vino) { #if BITS_PER_LONG == 32 return ceph_ino_to_ino32(vino.ino); #else return (ino_t)vino.ino; #endif } /* * user-visible ino (stat, filldir) */ #if BITS_PER_LONG == 32 static inline ino_t ceph_translate_ino(struct super_block *sb, ino_t ino) { return ino; } #else static inline ino_t ceph_translate_ino(struct super_block *sb, ino_t ino) { if (ceph_test_mount_opt(ceph_sb_to_client(sb), INO32)) ino = ceph_ino_to_ino32(ino); return ino; } #endif /* for printf-style formatting */ #define ceph_vinop(i) ceph_inode(i)->i_vino.ino, ceph_inode(i)->i_vino.snap static inline u64 ceph_ino(struct inode *inode) { return ceph_inode(inode)->i_vino.ino; } static inline u64 ceph_snap(struct inode *inode) { return ceph_inode(inode)->i_vino.snap; } static inline int ceph_ino_compare(struct inode *inode, void *data) { struct ceph_vino *pvino = (struct ceph_vino *)data; struct ceph_inode_info *ci = ceph_inode(inode); return ci->i_vino.ino == pvino->ino && ci->i_vino.snap == pvino->snap; } static inline struct inode *ceph_find_inode(struct super_block *sb, struct ceph_vino vino) { ino_t t = ceph_vino_to_ino(vino); return ilookup5(sb, t, ceph_ino_compare, &vino); } /* * Ceph inode. */ #define CEPH_I_DIR_ORDERED (1 << 0) /* dentries in dir are ordered */ #define CEPH_I_NODELAY (1 << 1) /* do not delay cap release */ #define CEPH_I_FLUSH (1 << 2) /* do not delay flush of dirty metadata */ #define CEPH_I_NOFLUSH (1 << 3) /* do not flush dirty caps */ #define CEPH_I_POOL_PERM (1 << 4) /* pool rd/wr bits are valid */ #define CEPH_I_POOL_RD (1 << 5) /* can read from pool */ #define CEPH_I_POOL_WR (1 << 6) /* can write to pool */ #define CEPH_I_SEC_INITED (1 << 7) /* security initialized */ #define CEPH_I_CAP_DROPPED (1 << 8) /* caps were forcibly dropped */ #define CEPH_I_KICK_FLUSH (1 << 9) /* kick flushing caps */ #define CEPH_I_FLUSH_SNAPS (1 << 10) /* need flush snapss */ #define CEPH_I_ERROR_WRITE (1 << 11) /* have seen write errors */ #define CEPH_I_ERROR_FILELOCK (1 << 12) /* have seen file lock errors */ /* * We set the ERROR_WRITE bit when we start seeing write errors on an inode * and then clear it when they start succeeding. Note that we do a lockless * check first, and only take the lock if it looks like it needs to be changed. * The write submission code just takes this as a hint, so we're not too * worried if a few slip through in either direction. */ static inline void ceph_set_error_write(struct ceph_inode_info *ci) { if (!(READ_ONCE(ci->i_ceph_flags) & CEPH_I_ERROR_WRITE)) { spin_lock(&ci->i_ceph_lock); ci->i_ceph_flags |= CEPH_I_ERROR_WRITE; spin_unlock(&ci->i_ceph_lock); } } static inline void ceph_clear_error_write(struct ceph_inode_info *ci) { if (READ_ONCE(ci->i_ceph_flags) & CEPH_I_ERROR_WRITE) { spin_lock(&ci->i_ceph_lock); ci->i_ceph_flags &= ~CEPH_I_ERROR_WRITE; spin_unlock(&ci->i_ceph_lock); } } static inline void __ceph_dir_set_complete(struct ceph_inode_info *ci, long long release_count, long long ordered_count) { /* * Makes sure operations that setup readdir cache (update page * cache and i_size) are strongly ordered w.r.t. the following * atomic64_set() operations. */ smp_mb(); atomic64_set(&ci->i_complete_seq[0], release_count); atomic64_set(&ci->i_complete_seq[1], ordered_count); } static inline void __ceph_dir_clear_complete(struct ceph_inode_info *ci) { atomic64_inc(&ci->i_release_count); } static inline void __ceph_dir_clear_ordered(struct ceph_inode_info *ci) { atomic64_inc(&ci->i_ordered_count); } static inline bool __ceph_dir_is_complete(struct ceph_inode_info *ci) { return atomic64_read(&ci->i_complete_seq[0]) == atomic64_read(&ci->i_release_count); } static inline bool __ceph_dir_is_complete_ordered(struct ceph_inode_info *ci) { return atomic64_read(&ci->i_complete_seq[0]) == atomic64_read(&ci->i_release_count) && atomic64_read(&ci->i_complete_seq[1]) == atomic64_read(&ci->i_ordered_count); } static inline void ceph_dir_clear_complete(struct inode *inode) { __ceph_dir_clear_complete(ceph_inode(inode)); } static inline void ceph_dir_clear_ordered(struct inode *inode) { __ceph_dir_clear_ordered(ceph_inode(inode)); } static inline bool ceph_dir_is_complete_ordered(struct inode *inode) { bool ret = __ceph_dir_is_complete_ordered(ceph_inode(inode)); smp_rmb(); return ret; } /* find a specific frag @f */ extern struct ceph_inode_frag *__ceph_find_frag(struct ceph_inode_info *ci, u32 f); /* * choose fragment for value @v. copy frag content to pfrag, if leaf * exists */ extern u32 ceph_choose_frag(struct ceph_inode_info *ci, u32 v, struct ceph_inode_frag *pfrag, int *found); static inline struct ceph_dentry_info *ceph_dentry(struct dentry *dentry) { return (struct ceph_dentry_info *)dentry->d_fsdata; } /* * caps helpers */ static inline bool __ceph_is_any_real_caps(struct ceph_inode_info *ci) { return !RB_EMPTY_ROOT(&ci->i_caps); } extern int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented); extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int t); extern int __ceph_caps_issued_other(struct ceph_inode_info *ci, struct ceph_cap *cap); static inline int ceph_caps_issued(struct ceph_inode_info *ci) { int issued; spin_lock(&ci->i_ceph_lock); issued = __ceph_caps_issued(ci, NULL); spin_unlock(&ci->i_ceph_lock); return issued; } static inline int ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int touch) { int r; spin_lock(&ci->i_ceph_lock); r = __ceph_caps_issued_mask(ci, mask, touch); spin_unlock(&ci->i_ceph_lock); return r; } static inline int __ceph_caps_dirty(struct ceph_inode_info *ci) { return ci->i_dirty_caps | ci->i_flushing_caps; } extern struct ceph_cap_flush *ceph_alloc_cap_flush(void); extern void ceph_free_cap_flush(struct ceph_cap_flush *cf); extern int __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask, struct ceph_cap_flush **pcf); extern int __ceph_caps_revoking_other(struct ceph_inode_info *ci, struct ceph_cap *ocap, int mask); extern int ceph_caps_revoking(struct ceph_inode_info *ci, int mask); extern int __ceph_caps_used(struct ceph_inode_info *ci); extern int __ceph_caps_file_wanted(struct ceph_inode_info *ci); /* * wanted, by virtue of open file modes AND cap refs (buffered/cached data) */ static inline int __ceph_caps_wanted(struct ceph_inode_info *ci) { int w = __ceph_caps_file_wanted(ci) | __ceph_caps_used(ci); if (w & CEPH_CAP_FILE_BUFFER) w |= CEPH_CAP_FILE_EXCL; /* we want EXCL if dirty data */ return w; } /* what the mds thinks we want */ extern int __ceph_caps_mds_wanted(struct ceph_inode_info *ci, bool check); extern void ceph_caps_init(struct ceph_mds_client *mdsc); extern void ceph_caps_finalize(struct ceph_mds_client *mdsc); extern void ceph_adjust_min_caps(struct ceph_mds_client *mdsc, int delta); extern int ceph_reserve_caps(struct ceph_mds_client *mdsc, struct ceph_cap_reservation *ctx, int need); extern void ceph_unreserve_caps(struct ceph_mds_client *mdsc, struct ceph_cap_reservation *ctx); extern void ceph_reservation_status(struct ceph_fs_client *client, int *total, int *avail, int *used, int *reserved, int *min); /* * we keep buffered readdir results attached to file->private_data */ #define CEPH_F_SYNC 1 #define CEPH_F_ATEND 2 struct ceph_file_info { short fmode; /* initialized on open */ short flags; /* CEPH_F_* */ spinlock_t rw_contexts_lock; struct list_head rw_contexts; }; struct ceph_dir_file_info { struct ceph_file_info file_info; /* readdir: position within the dir */ u32 frag; struct ceph_mds_request *last_readdir; /* readdir: position within a frag */ unsigned next_offset; /* offset of next chunk (last_name's + 1) */ char *last_name; /* last entry in previous chunk */ long long dir_release_count; long long dir_ordered_count; int readdir_cache_idx; /* used for -o dirstat read() on directory thing */ char *dir_info; int dir_info_len; }; struct ceph_rw_context { struct list_head list; struct task_struct *thread; int caps; }; #define CEPH_DEFINE_RW_CONTEXT(_name, _caps) \ struct ceph_rw_context _name = { \ .thread = current, \ .caps = _caps, \ } static inline void ceph_add_rw_context(struct ceph_file_info *cf, struct ceph_rw_context *ctx) { spin_lock(&cf->rw_contexts_lock); list_add(&ctx->list, &cf->rw_contexts); spin_unlock(&cf->rw_contexts_lock); } static inline void ceph_del_rw_context(struct ceph_file_info *cf, struct ceph_rw_context *ctx) { spin_lock(&cf->rw_contexts_lock); list_del(&ctx->list); spin_unlock(&cf->rw_contexts_lock); } static inline struct ceph_rw_context* ceph_find_rw_context(struct ceph_file_info *cf) { struct ceph_rw_context *ctx, *found = NULL; spin_lock(&cf->rw_contexts_lock); list_for_each_entry(ctx, &cf->rw_contexts, list) { if (ctx->thread == current) { found = ctx; break; } } spin_unlock(&cf->rw_contexts_lock); return found; } struct ceph_readdir_cache_control { struct page *page; struct dentry **dentries; int index; }; /* * A "snap realm" describes a subset of the file hierarchy sharing * the same set of snapshots that apply to it. The realms themselves * are organized into a hierarchy, such that children inherit (some of) * the snapshots of their parents. * * All inodes within the realm that have capabilities are linked into a * per-realm list. */ struct ceph_snap_realm { u64 ino; struct inode *inode; atomic_t nref; struct rb_node node; u64 created, seq; u64 parent_ino; u64 parent_since; /* snapid when our current parent became so */ u64 *prior_parent_snaps; /* snaps inherited from any parents we */ u32 num_prior_parent_snaps; /* had prior to parent_since */ u64 *snaps; /* snaps specific to this realm */ u32 num_snaps; struct ceph_snap_realm *parent; struct list_head children; /* list of child realms */ struct list_head child_item; struct list_head empty_item; /* if i have ref==0 */ struct list_head dirty_item; /* if realm needs new context */ /* the current set of snaps for this realm */ struct ceph_snap_context *cached_context; struct list_head inodes_with_caps; spinlock_t inodes_with_caps_lock; }; static inline int default_congestion_kb(void) { int congestion_kb; /* * Copied from NFS * * congestion size, scale with available memory. * * 64MB: 8192k * 128MB: 11585k * 256MB: 16384k * 512MB: 23170k * 1GB: 32768k * 2GB: 46340k * 4GB: 65536k * 8GB: 92681k * 16GB: 131072k * * This allows larger machines to have larger/more transfers. * Limit the default to 256M */ congestion_kb = (16*int_sqrt(totalram_pages)) << (PAGE_SHIFT-10); if (congestion_kb > 256*1024) congestion_kb = 256*1024; return congestion_kb; } /* snap.c */ struct ceph_snap_realm *ceph_lookup_snap_realm(struct ceph_mds_client *mdsc, u64 ino); extern void ceph_get_snap_realm(struct ceph_mds_client *mdsc, struct ceph_snap_realm *realm); extern void ceph_put_snap_realm(struct ceph_mds_client *mdsc, struct ceph_snap_realm *realm); extern int ceph_update_snap_trace(struct ceph_mds_client *m, void *p, void *e, bool deletion, struct ceph_snap_realm **realm_ret); extern void ceph_handle_snap(struct ceph_mds_client *mdsc, struct ceph_mds_session *session, struct ceph_msg *msg); extern void ceph_queue_cap_snap(struct ceph_inode_info *ci); extern int __ceph_finish_cap_snap(struct ceph_inode_info *ci, struct ceph_cap_snap *capsnap); extern void ceph_cleanup_empty_realms(struct ceph_mds_client *mdsc); /* * a cap_snap is "pending" if it is still awaiting an in-progress * sync write (that may/may not still update size, mtime, etc.). */ static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci) { return !list_empty(&ci->i_cap_snaps) && list_last_entry(&ci->i_cap_snaps, struct ceph_cap_snap, ci_item)->writing; } /* inode.c */ extern const struct inode_operations ceph_file_iops; extern struct inode *ceph_alloc_inode(struct super_block *sb); extern void ceph_evict_inode(struct inode *inode); extern void ceph_destroy_inode(struct inode *inode); extern int ceph_drop_inode(struct inode *inode); extern struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino); extern struct inode *ceph_get_snapdir(struct inode *parent); extern int ceph_fill_file_size(struct inode *inode, int issued, u32 truncate_seq, u64 truncate_size, u64 size); extern void ceph_fill_file_time(struct inode *inode, int issued, u64 time_warp_seq, struct timespec64 *ctime, struct timespec64 *mtime, struct timespec64 *atime); extern int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req); extern int ceph_readdir_prepopulate(struct ceph_mds_request *req, struct ceph_mds_session *session); extern int ceph_inode_holds_cap(struct inode *inode, int mask); extern bool ceph_inode_set_size(struct inode *inode, loff_t size); extern void __ceph_do_pending_vmtruncate(struct inode *inode); extern void ceph_queue_vmtruncate(struct inode *inode); extern void ceph_queue_invalidate(struct inode *inode); extern void ceph_queue_writeback(struct inode *inode); extern int __ceph_do_getattr(struct inode *inode, struct page *locked_page, int mask, bool force); static inline int ceph_do_getattr(struct inode *inode, int mask, bool force) { return __ceph_do_getattr(inode, NULL, mask, force); } extern int ceph_permission(struct inode *inode, int mask); extern int __ceph_setattr(struct inode *inode, struct iattr *attr); extern int ceph_setattr(struct dentry *dentry, struct iattr *attr); extern int ceph_getattr(const struct path *path, struct kstat *stat, u32 request_mask, unsigned int flags); /* xattr.c */ int __ceph_setxattr(struct inode *, const char *, const void *, size_t, int); ssize_t __ceph_getxattr(struct inode *, const char *, void *, size_t); extern ssize_t ceph_listxattr(struct dentry *, char *, size_t); extern struct ceph_buffer *__ceph_build_xattrs_blob(struct ceph_inode_info *ci); extern void __ceph_destroy_xattrs(struct ceph_inode_info *ci); extern void __init ceph_xattr_init(void); extern void ceph_xattr_exit(void); extern const struct xattr_handler *ceph_xattr_handlers[]; #ifdef CONFIG_SECURITY extern bool ceph_security_xattr_deadlock(struct inode *in); extern bool ceph_security_xattr_wanted(struct inode *in); #else static inline bool ceph_security_xattr_deadlock(struct inode *in) { return false; } static inline bool ceph_security_xattr_wanted(struct inode *in) { return false; } #endif /* acl.c */ struct ceph_acls_info { void *default_acl; void *acl; struct ceph_pagelist *pagelist; }; #ifdef CONFIG_CEPH_FS_POSIX_ACL struct posix_acl *ceph_get_acl(struct inode *, int); int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type); int ceph_pre_init_acls(struct inode *dir, umode_t *mode, struct ceph_acls_info *info); void ceph_init_inode_acls(struct inode *inode, struct ceph_acls_info *info); void ceph_release_acls_info(struct ceph_acls_info *info); static inline void ceph_forget_all_cached_acls(struct inode *inode) { forget_all_cached_acls(inode); } #else #define ceph_get_acl NULL #define ceph_set_acl NULL static inline int ceph_pre_init_acls(struct inode *dir, umode_t *mode, struct ceph_acls_info *info) { return 0; } static inline void ceph_init_inode_acls(struct inode *inode, struct ceph_acls_info *info) { } static inline void ceph_release_acls_info(struct ceph_acls_info *info) { } static inline int ceph_acl_chmod(struct dentry *dentry, struct inode *inode) { return 0; } static inline void ceph_forget_all_cached_acls(struct inode *inode) { } #endif /* caps.c */ extern const char *ceph_cap_string(int c); extern void ceph_handle_caps(struct ceph_mds_session *session, struct ceph_msg *msg); extern struct ceph_cap *ceph_get_cap(struct ceph_mds_client *mdsc, struct ceph_cap_reservation *ctx); extern void ceph_add_cap(struct inode *inode, struct ceph_mds_session *session, u64 cap_id, int fmode, unsigned issued, unsigned wanted, unsigned cap, unsigned seq, u64 realmino, int flags, struct ceph_cap **new_cap); extern void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release); extern void ceph_put_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap); extern int ceph_is_any_caps(struct inode *inode); extern void ceph_queue_caps_release(struct inode *inode); extern int ceph_write_inode(struct inode *inode, struct writeback_control *wbc); extern int ceph_fsync(struct file *file, loff_t start, loff_t end, int datasync); extern void ceph_early_kick_flushing_caps(struct ceph_mds_client *mdsc, struct ceph_mds_session *session); extern void ceph_kick_flushing_caps(struct ceph_mds_client *mdsc, struct ceph_mds_session *session); extern struct ceph_cap *ceph_get_cap_for_mds(struct ceph_inode_info *ci, int mds); extern int ceph_get_cap_mds(struct inode *inode); extern void ceph_get_cap_refs(struct ceph_inode_info *ci, int caps); extern void ceph_put_cap_refs(struct ceph_inode_info *ci, int had); extern void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr, struct ceph_snap_context *snapc); extern void ceph_flush_snaps(struct ceph_inode_info *ci, struct ceph_mds_session **psession); extern bool __ceph_should_report_size(struct ceph_inode_info *ci); extern void ceph_check_caps(struct ceph_inode_info *ci, int flags, struct ceph_mds_session *session); extern void ceph_check_delayed_caps(struct ceph_mds_client *mdsc); extern void ceph_flush_dirty_caps(struct ceph_mds_client *mdsc); extern int ceph_drop_caps_for_unlink(struct inode *inode); extern int ceph_encode_inode_release(void **p, struct inode *inode, int mds, int drop, int unless, int force); extern int ceph_encode_dentry_release(void **p, struct dentry *dn, struct inode *dir, int mds, int drop, int unless); extern int ceph_get_caps(struct ceph_inode_info *ci, int need, int want, loff_t endoff, int *got, struct page **pinned_page); extern int ceph_try_get_caps(struct ceph_inode_info *ci, int need, int want, int *got); /* for counting open files by mode */ extern void __ceph_get_fmode(struct ceph_inode_info *ci, int mode); extern void ceph_put_fmode(struct ceph_inode_info *ci, int mode); /* addr.c */ extern const struct address_space_operations ceph_aops; extern int ceph_mmap(struct file *file, struct vm_area_struct *vma); extern int ceph_uninline_data(struct file *filp, struct page *locked_page); extern int ceph_pool_perm_check(struct ceph_inode_info *ci, int need); extern void ceph_pool_perm_destroy(struct ceph_mds_client* mdsc); /* file.c */ extern const struct file_operations ceph_file_fops; extern int ceph_renew_caps(struct inode *inode); extern int ceph_open(struct inode *inode, struct file *file); extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry, struct file *file, unsigned flags, umode_t mode); extern int ceph_release(struct inode *inode, struct file *filp); extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page, char *data, size_t len); /* dir.c */ extern const struct file_operations ceph_dir_fops; extern const struct file_operations ceph_snapdir_fops; extern const struct inode_operations ceph_dir_iops; extern const struct inode_operations ceph_snapdir_iops; extern const struct dentry_operations ceph_dentry_ops; extern loff_t ceph_make_fpos(unsigned high, unsigned off, bool hash_order); extern int ceph_handle_notrace_create(struct inode *dir, struct dentry *dentry); extern int ceph_handle_snapdir(struct ceph_mds_request *req, struct dentry *dentry, int err); extern struct dentry *ceph_finish_lookup(struct ceph_mds_request *req, struct dentry *dentry, int err); extern void ceph_dentry_lru_add(struct dentry *dn); extern void ceph_dentry_lru_touch(struct dentry *dn); extern void ceph_dentry_lru_del(struct dentry *dn); extern void ceph_invalidate_dentry_lease(struct dentry *dentry); extern unsigned ceph_dentry_hash(struct inode *dir, struct dentry *dn); extern void ceph_readdir_cache_release(struct ceph_readdir_cache_control *ctl); /* ioctl.c */ extern long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg); /* export.c */ extern const struct export_operations ceph_export_ops; /* locks.c */ extern __init void ceph_flock_init(void); extern int ceph_lock(struct file *file, int cmd, struct file_lock *fl); extern int ceph_flock(struct file *file, int cmd, struct file_lock *fl); extern void ceph_count_locks(struct inode *inode, int *p_num, int *f_num); extern int ceph_encode_locks_to_buffer(struct inode *inode, struct ceph_filelock *flocks, int num_fcntl_locks, int num_flock_locks); extern int ceph_locks_to_pagelist(struct ceph_filelock *flocks, struct ceph_pagelist *pagelist, int num_fcntl_locks, int num_flock_locks); /* debugfs.c */ extern int ceph_fs_debugfs_init(struct ceph_fs_client *client); extern void ceph_fs_debugfs_cleanup(struct ceph_fs_client *client); /* quota.c */ static inline bool __ceph_has_any_quota(struct ceph_inode_info *ci) { return ci->i_max_files || ci->i_max_bytes; } extern void ceph_adjust_quota_realms_count(struct inode *inode, bool inc); static inline void __ceph_update_quota(struct ceph_inode_info *ci, u64 max_bytes, u64 max_files) { bool had_quota, has_quota; had_quota = __ceph_has_any_quota(ci); ci->i_max_bytes = max_bytes; ci->i_max_files = max_files; has_quota = __ceph_has_any_quota(ci); if (had_quota != has_quota) ceph_adjust_quota_realms_count(&ci->vfs_inode, has_quota); } extern void ceph_handle_quota(struct ceph_mds_client *mdsc, struct ceph_mds_session *session, struct ceph_msg *msg); extern bool ceph_quota_is_max_files_exceeded(struct inode *inode); extern bool ceph_quota_is_same_realm(struct inode *old, struct inode *new); extern bool ceph_quota_is_max_bytes_exceeded(struct inode *inode, loff_t newlen); extern bool ceph_quota_is_max_bytes_approaching(struct inode *inode, loff_t newlen); extern bool ceph_quota_update_statfs(struct ceph_fs_client *fsc, struct kstatfs *buf); #endif /* _FS_CEPH_SUPER_H */
3 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 /* * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. * Copyright (C) 2004-2008 Red Hat, Inc. All rights reserved. * * This copyrighted material is made available to anyone wishing to use, * modify, copy, or redistribute it subject to the terms and conditions * of the GNU General Public License version 2. */ #include <linux/sched.h> #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/completion.h> #include <linux/buffer_head.h> #include <linux/pagemap.h> #include <linux/pagevec.h> #include <linux/mpage.h> #include <linux/fs.h> #include <linux/writeback.h> #include <linux/swap.h> #include <linux/gfs2_ondisk.h> #include <linux/backing-dev.h> #include <linux/uio.h> #include <trace/events/writeback.h> #include <linux/sched/signal.h> #include "gfs2.h" #include "incore.h" #include "bmap.h" #include "glock.h" #include "inode.h" #include "log.h" #include "meta_io.h" #include "quota.h" #include "trans.h" #include "rgrp.h" #include "super.h" #include "util.h" #include "glops.h" #include "aops.h" void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page, unsigned int from, unsigned int len) { struct buffer_head *head = page_buffers(page); unsigned int bsize = head->b_size; struct buffer_head *bh; unsigned int to = from + len; unsigned int start, end; for (bh = head, start = 0; bh != head || !start; bh = bh->b_this_page, start = end) { end = start + bsize; if (end <= from) continue; if (start >= to) break; set_buffer_uptodate(bh); gfs2_trans_add_data(ip->i_gl, bh); } } /** * gfs2_get_block_noalloc - Fills in a buffer head with details about a block * @inode: The inode * @lblock: The block number to look up * @bh_result: The buffer head to return the result in * @create: Non-zero if we may add block to the file * * Returns: errno */ static int gfs2_get_block_noalloc(struct inode *inode, sector_t lblock, struct buffer_head *bh_result, int create) { int error; error = gfs2_block_map(inode, lblock, bh_result, 0); if (error) return error; if (!buffer_mapped(bh_result)) return -EIO; return 0; } /** * gfs2_writepage_common - Common bits of writepage * @page: The page to be written * @wbc: The writeback control * * Returns: 1 if writepage is ok, otherwise an error code or zero if no error. */ static int gfs2_writepage_common(struct page *page, struct writeback_control *wbc) { struct inode *inode = page->mapping->host; struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_sbd *sdp = GFS2_SB(inode); loff_t i_size = i_size_read(inode); pgoff_t end_index = i_size >> PAGE_SHIFT; unsigned offset; if (gfs2_assert_withdraw(sdp, gfs2_glock_is_held_excl(ip->i_gl))) goto out; if (current->journal_info) goto redirty; /* Is the page fully outside i_size? (truncate in progress) */ offset = i_size & (PAGE_SIZE-1); if (page->index > end_index || (page->index == end_index && !offset)) { page->mapping->a_ops->invalidatepage(page, 0, PAGE_SIZE); goto out; } return 1; redirty: redirty_page_for_writepage(wbc, page); out: unlock_page(page); return 0; } /** * gfs2_writepage - Write page for writeback mappings * @page: The page * @wbc: The writeback control * */ static int gfs2_writepage(struct page *page, struct writeback_control *wbc) { int ret; ret = gfs2_writepage_common(page, wbc); if (ret <= 0) return ret; return nobh_writepage(page, gfs2_get_block_noalloc, wbc); } /* This is the same as calling block_write_full_page, but it also * writes pages outside of i_size */ static int gfs2_write_full_page(struct page *page, get_block_t *get_block, struct writeback_control *wbc) { struct inode * const inode = page->mapping->host; loff_t i_size = i_size_read(inode); const pgoff_t end_index = i_size >> PAGE_SHIFT; unsigned offset; /* * The page straddles i_size. It must be zeroed out on each and every * writepage invocation because it may be mmapped. "A file is mapped * in multiples of the page size. For a file that is not a multiple of * the page size, the remaining memory is zeroed when mapped, and * writes to that region are not written out to the file." */ offset = i_size & (PAGE_SIZE-1); if (page->index == end_index && offset) zero_user_segment(page, offset, PAGE_SIZE); return __block_write_full_page(inode, page, get_block, wbc, end_buffer_async_write); } /** * __gfs2_jdata_writepage - The core of jdata writepage * @page: The page to write * @wbc: The writeback control * * This is shared between writepage and writepages and implements the * core of the writepage operation. If a transaction is required then * PageChecked will have been set and the transaction will have * already been started before this is called. */ static int __gfs2_jdata_writepage(struct page *page, struct writeback_control *wbc) { struct inode *inode = page->mapping->host; struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_sbd *sdp = GFS2_SB(inode); if (PageChecked(page)) { ClearPageChecked(page); if (!page_has_buffers(page)) { create_empty_buffers(page, inode->i_sb->s_blocksize, BIT(BH_Dirty)|BIT(BH_Uptodate)); } gfs2_page_add_databufs(ip, page, 0, sdp->sd_vfs->s_blocksize); } return gfs2_write_full_page(page, gfs2_get_block_noalloc, wbc); } /** * gfs2_jdata_writepage - Write complete page * @page: Page to write * @wbc: The writeback control * * Returns: errno * */ static int gfs2_jdata_writepage(struct page *page, struct writeback_control *wbc) { struct inode *inode = page->mapping->host; struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_sbd *sdp = GFS2_SB(inode); int ret; if (gfs2_assert_withdraw(sdp, gfs2_glock_is_held_excl(ip->i_gl))) goto out; if (PageChecked(page) || current->journal_info) goto out_ignore; ret = __gfs2_jdata_writepage(page, wbc); return ret; out_ignore: redirty_page_for_writepage(wbc, page); out: unlock_page(page); return 0; } /** * gfs2_writepages - Write a bunch of dirty pages back to disk * @mapping: The mapping to write * @wbc: Write-back control * * Used for both ordered and writeback modes. */ static int gfs2_writepages(struct address_space *mapping, struct writeback_control *wbc) { struct gfs2_sbd *sdp = gfs2_mapping2sbd(mapping); int ret = mpage_writepages(mapping, wbc, gfs2_get_block_noalloc); /* * Even if we didn't write any pages here, we might still be holding * dirty pages in the ail. We forcibly flush the ail because we don't * want balance_dirty_pages() to loop indefinitely trying to write out * pages held in the ail that it can't find. */ if (ret == 0) set_bit(SDF_FORCE_AIL_FLUSH, &sdp->sd_flags); return ret; } /** * gfs2_write_jdata_pagevec - Write back a pagevec's worth of pages * @mapping: The mapping * @wbc: The writeback control * @pvec: The vector of pages * @nr_pages: The number of pages to write * @done_index: Page index * * Returns: non-zero if loop should terminate, zero otherwise */ static int gfs2_write_jdata_pagevec(struct address_space *mapping, struct writeback_control *wbc, struct pagevec *pvec, int nr_pages, pgoff_t *done_index) { struct inode *inode = mapping->host; struct gfs2_sbd *sdp = GFS2_SB(inode); unsigned nrblocks = nr_pages * (PAGE_SIZE/inode->i_sb->s_blocksize); int i; int ret; ret = gfs2_trans_begin(sdp, nrblocks, nrblocks); if (ret < 0) return ret; for(i = 0; i < nr_pages; i++) { struct page *page = pvec->pages[i]; *done_index = page->index; lock_page(page); if (unlikely(page->mapping != mapping)) { continue_unlock: unlock_page(page); continue; } if (!PageDirty(page)) { /* someone wrote it for us */ goto continue_unlock; } if (PageWriteback(page)) { if (wbc->sync_mode != WB_SYNC_NONE) wait_on_page_writeback(page); else goto continue_unlock; } BUG_ON(PageWriteback(page)); if (!clear_page_dirty_for_io(page)) goto continue_unlock; trace_wbc_writepage(wbc, inode_to_bdi(inode)); ret = __gfs2_jdata_writepage(page, wbc); if (unlikely(ret)) { if (ret == AOP_WRITEPAGE_ACTIVATE) { unlock_page(page); ret = 0; } else { /* * done_index is set past this page, * so media errors will not choke * background writeout for the entire * file. This has consequences for * range_cyclic semantics (ie. it may * not be suitable for data integrity * writeout). */ *done_index = page->index + 1; ret = 1; break; } } /* * We stop writing back only if we are not doing * integrity sync. In case of integrity sync we have to * keep going until we have written all the pages * we tagged for writeback prior to entering this loop. */ if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) { ret = 1; break; } } gfs2_trans_end(sdp); return ret; } /** * gfs2_write_cache_jdata - Like write_cache_pages but different * @mapping: The mapping to write * @wbc: The writeback control * * The reason that we use our own function here is that we need to * start transactions before we grab page locks. This allows us * to get the ordering right. */ static int gfs2_write_cache_jdata(struct address_space *mapping, struct writeback_control *wbc) { int ret = 0; int done = 0; struct pagevec pvec; int nr_pages; pgoff_t uninitialized_var(writeback_index); pgoff_t index; pgoff_t end; pgoff_t done_index; int cycled; int range_whole = 0; int tag; pagevec_init(&pvec); if (wbc->range_cyclic) { writeback_index = mapping->writeback_index; /* prev offset */ index = writeback_index; if (index == 0) cycled = 1; else cycled = 0; end = -1; } else { index = wbc->range_start >> PAGE_SHIFT; end = wbc->range_end >> PAGE_SHIFT; if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) range_whole = 1; cycled = 1; /* ignore range_cyclic tests */ } if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) tag = PAGECACHE_TAG_TOWRITE; else tag = PAGECACHE_TAG_DIRTY; retry: if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) tag_pages_for_writeback(mapping, index, end); done_index = index; while (!done && (index <= end)) { nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, tag); if (nr_pages == 0) break; ret = gfs2_write_jdata_pagevec(mapping, wbc, &pvec, nr_pages, &done_index); if (ret) done = 1; if (ret > 0) ret = 0; pagevec_release(&pvec); cond_resched(); } if (!cycled && !done) { /* * range_cyclic: * We hit the last page and there is more work to be done: wrap * back to the start of the file */ cycled = 1; index = 0; end = writeback_index - 1; goto retry; } if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0)) mapping->writeback_index = done_index; return ret; } /** * gfs2_jdata_writepages - Write a bunch of dirty pages back to disk * @mapping: The mapping to write * @wbc: The writeback control * */ static int gfs2_jdata_writepages(struct address_space *mapping, struct writeback_control *wbc) { struct gfs2_inode *ip = GFS2_I(mapping->host); struct gfs2_sbd *sdp = GFS2_SB(mapping->host); int ret; ret = gfs2_write_cache_jdata(mapping, wbc); if (ret == 0 && wbc->sync_mode == WB_SYNC_ALL) { gfs2_log_flush(sdp, ip->i_gl, GFS2_LOG_HEAD_FLUSH_NORMAL | GFS2_LFC_JDATA_WPAGES); ret = gfs2_write_cache_jdata(mapping, wbc); } return ret; } /** * stuffed_readpage - Fill in a Linux page with stuffed file data * @ip: the inode * @page: the page * * Returns: errno */ int stuffed_readpage(struct gfs2_inode *ip, struct page *page) { struct buffer_head *dibh; u64 dsize = i_size_read(&ip->i_inode); void *kaddr; int error; /* * Due to the order of unstuffing files and ->fault(), we can be * asked for a zero page in the case of a stuffed file being extended, * so we need to supply one here. It doesn't happen often. */ if (unlikely(page->index)) { zero_user(page, 0, PAGE_SIZE); SetPageUptodate(page); return 0; } error = gfs2_meta_inode_buffer(ip, &dibh); if (error) return error; kaddr = kmap_atomic(page); if (dsize > gfs2_max_stuffed_size(ip)) dsize = gfs2_max_stuffed_size(ip); memcpy(kaddr, dibh->b_data + sizeof(struct gfs2_dinode), dsize); memset(kaddr + dsize, 0, PAGE_SIZE - dsize); kunmap_atomic(kaddr); flush_dcache_page(page); brelse(dibh); SetPageUptodate(page); return 0; } /** * __gfs2_readpage - readpage * @file: The file to read a page for * @page: The page to read * * This is the core of gfs2's readpage. It's used by the internal file * reading code as in that case we already hold the glock. Also it's * called by gfs2_readpage() once the required lock has been granted. */ static int __gfs2_readpage(void *file, struct page *page) { struct gfs2_inode *ip = GFS2_I(page->mapping->host); struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host); int error; if (i_blocksize(page->mapping->host) == PAGE_SIZE && !page_has_buffers(page)) { error = iomap_readpage(page, &gfs2_iomap_ops); } else if (gfs2_is_stuffed(ip)) { error = stuffed_readpage(ip, page); unlock_page(page); } else { error = mpage_readpage(page, gfs2_block_map); } if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) return -EIO; return error; } /** * gfs2_readpage - read a page of a file * @file: The file to read * @page: The page of the file * * This deals with the locking required. We have to unlock and * relock the page in order to get the locking in the right * order. */ static int gfs2_readpage(struct file *file, struct page *page) { struct address_space *mapping = page->mapping; struct gfs2_inode *ip = GFS2_I(mapping->host); struct gfs2_holder gh; int error; unlock_page(page); gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); error = gfs2_glock_nq(&gh); if (unlikely(error)) goto out; error = AOP_TRUNCATED_PAGE; lock_page(page); if (page->mapping == mapping && !PageUptodate(page)) error = __gfs2_readpage(file, page); else unlock_page(page); gfs2_glock_dq(&gh); out: gfs2_holder_uninit(&gh); if (error && error != AOP_TRUNCATED_PAGE) lock_page(page); return error; } /** * gfs2_internal_read - read an internal file * @ip: The gfs2 inode * @buf: The buffer to fill * @pos: The file position * @size: The amount to read * */ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, loff_t *pos, unsigned size) { struct address_space *mapping = ip->i_inode.i_mapping; unsigned long index = *pos / PAGE_SIZE; unsigned offset = *pos & (PAGE_SIZE - 1); unsigned copied = 0; unsigned amt; struct page *page; void *p; do { amt = size - copied; if (offset + size > PAGE_SIZE) amt = PAGE_SIZE - offset; page = read_cache_page(mapping, index, __gfs2_readpage, NULL); if (IS_ERR(page)) return PTR_ERR(page); p = kmap_atomic(page); memcpy(buf + copied, p + offset, amt); kunmap_atomic(p); put_page(page); copied += amt; index++; offset = 0; } while(copied < size); (*pos) += size; return size; } /** * gfs2_readpages - Read a bunch of pages at once * @file: The file to read from * @mapping: Address space info * @pages: List of pages to read * @nr_pages: Number of pages to read * * Some notes: * 1. This is only for readahead, so we can simply ignore any things * which are slightly inconvenient (such as locking conflicts between * the page lock and the glock) and return having done no I/O. Its * obviously not something we'd want to do on too regular a basis. * Any I/O we ignore at this time will be done via readpage later. * 2. We don't handle stuffed files here we let readpage do the honours. * 3. mpage_readpages() does most of the heavy lifting in the common case. * 4. gfs2_block_map() is relied upon to set BH_Boundary in the right places. */ static int gfs2_readpages(struct file *file, struct address_space *mapping, struct list_head *pages, unsigned nr_pages) { struct inode *inode = mapping->host; struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_sbd *sdp = GFS2_SB(inode); struct gfs2_holder gh; int ret; gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); ret = gfs2_glock_nq(&gh); if (unlikely(ret)) goto out_uninit; if (!gfs2_is_stuffed(ip)) ret = mpage_readpages(mapping, pages, nr_pages, gfs2_block_map); gfs2_glock_dq(&gh); out_uninit: gfs2_holder_uninit(&gh); if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) ret = -EIO; return ret; } /** * adjust_fs_space - Adjusts the free space available due to gfs2_grow * @inode: the rindex inode */ void adjust_fs_space(struct inode *inode) { struct gfs2_sbd *sdp = inode->i_sb->s_fs_info; struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode); struct gfs2_inode *l_ip = GFS2_I(sdp->sd_sc_inode); struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master; struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local; struct buffer_head *m_bh, *l_bh; u64 fs_total, new_free; /* Total up the file system space, according to the latest rindex. */ fs_total = gfs2_ri_total(sdp); if (gfs2_meta_inode_buffer(m_ip, &m_bh) != 0) return; spin_lock(&sdp->sd_statfs_spin); gfs2_statfs_change_in(m_sc, m_bh->b_data + sizeof(struct gfs2_dinode)); if (fs_total > (m_sc->sc_total + l_sc->sc_total)) new_free = fs_total - (m_sc->sc_total + l_sc->sc_total); else new_free = 0; spin_unlock(&sdp->sd_statfs_spin); fs_warn(sdp, "File system extended by %llu blocks.\n", (unsigned long long)new_free); gfs2_statfs_change(sdp, new_free, new_free, 0); if (gfs2_meta_inode_buffer(l_ip, &l_bh) != 0) goto out; update_statfs(sdp, m_bh, l_bh); brelse(l_bh); out: brelse(m_bh); } /** * gfs2_stuffed_write_end - Write end for stuffed files * @inode: The inode * @dibh: The buffer_head containing the on-disk inode * @pos: The file position * @copied: How much was actually copied by the VFS * @page: The page * * This copies the data from the page into the inode block after * the inode data structure itself. * * Returns: copied bytes or errno */ int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh, loff_t pos, unsigned copied, struct page *page) { struct gfs2_inode *ip = GFS2_I(inode); u64 to = pos + copied; void *kaddr; unsigned char *buf = dibh->b_data + sizeof(struct gfs2_dinode); BUG_ON(pos + copied > gfs2_max_stuffed_size(ip)); kaddr = kmap_atomic(page); memcpy(buf + pos, kaddr + pos, copied); flush_dcache_page(page); kunmap_atomic(kaddr); WARN_ON(!PageUptodate(page)); unlock_page(page); put_page(page); if (copied) { if (inode->i_size < to) i_size_write(inode, to); mark_inode_dirty(inode); } return copied; } /** * jdata_set_page_dirty - Page dirtying function * @page: The page to dirty * * Returns: 1 if it dirtyed the page, or 0 otherwise */ static int jdata_set_page_dirty(struct page *page) { SetPageChecked(page); return __set_page_dirty_buffers(page); } /** * gfs2_bmap - Block map function * @mapping: Address space info * @lblock: The block to map * * Returns: The disk address for the block or 0 on hole or error */ static sector_t gfs2_bmap(struct address_space *mapping, sector_t lblock) { struct gfs2_inode *ip = GFS2_I(mapping->host); struct gfs2_holder i_gh; sector_t dblock = 0; int error; error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY, &i_gh); if (error) return 0; if (!gfs2_is_stuffed(ip)) dblock = generic_block_bmap(mapping, lblock, gfs2_block_map); gfs2_glock_dq_uninit(&i_gh); return dblock; } static void gfs2_discard(struct gfs2_sbd *sdp, struct buffer_head *bh) { struct gfs2_bufdata *bd; lock_buffer(bh); gfs2_log_lock(sdp); clear_buffer_dirty(bh); bd = bh->b_private; if (bd) { if (!list_empty(&bd->bd_list) && !buffer_pinned(bh)) list_del_init(&bd->bd_list); else gfs2_remove_from_journal(bh, REMOVE_JDATA); } bh->b_bdev = NULL; clear_buffer_mapped(bh); clear_buffer_req(bh); clear_buffer_new(bh); gfs2_log_unlock(sdp); unlock_buffer(bh); } static void gfs2_invalidatepage(struct page *page, unsigned int offset, unsigned int length) { struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host); unsigned int stop = offset + length; int partial_page = (offset || length < PAGE_SIZE); struct buffer_head *bh, *head; unsigned long pos = 0; BUG_ON(!PageLocked(page)); if (!partial_page) ClearPageChecked(page); if (!page_has_buffers(page)) goto out; bh = head = page_buffers(page); do { if (pos + bh->b_size > stop) return; if (offset <= pos) gfs2_discard(sdp, bh); pos += bh->b_size; bh = bh->b_this_page; } while (bh != head); out: if (!partial_page) try_to_release_page(page, 0); } /** * gfs2_releasepage - free the metadata associated with a page * @page: the page that's being released * @gfp_mask: passed from Linux VFS, ignored by us * * Call try_to_free_buffers() if the buffers in this page can be * released. * * Returns: 0 */ int gfs2_releasepage(struct page *page, gfp_t gfp_mask) { struct address_space *mapping = page->mapping; struct gfs2_sbd *sdp = gfs2_mapping2sbd(mapping); struct buffer_head *bh, *head; struct gfs2_bufdata *bd; if (!page_has_buffers(page)) return 0; /* * From xfs_vm_releasepage: mm accommodates an old ext3 case where * clean pages might not have had the dirty bit cleared. Thus, it can * send actual dirty pages to ->releasepage() via shrink_active_list(). * * As a workaround, we skip pages that contain dirty buffers below. * Once ->releasepage isn't called on dirty pages anymore, we can warn * on dirty buffers like we used to here again. */ gfs2_log_lock(sdp); spin_lock(&sdp->sd_ail_lock); head = bh = page_buffers(page); do { if (atomic_read(&bh->b_count)) goto cannot_release; bd = bh->b_private; if (bd && bd->bd_tr) goto cannot_release; if (buffer_dirty(bh) || WARN_ON(buffer_pinned(bh))) goto cannot_release; bh = bh->b_this_page; } while(bh != head); spin_unlock(&sdp->sd_ail_lock); head = bh = page_buffers(page); do { bd = bh->b_private; if (bd) { gfs2_assert_warn(sdp, bd->bd_bh == bh); if (!list_empty(&bd->bd_list)) list_del_init(&bd->bd_list); bd->bd_bh = NULL; bh->b_private = NULL; kmem_cache_free(gfs2_bufdata_cachep, bd); } bh = bh->b_this_page; } while (bh != head); gfs2_log_unlock(sdp); return try_to_free_buffers(page); cannot_release: spin_unlock(&sdp->sd_ail_lock); gfs2_log_unlock(sdp); return 0; } static const struct address_space_operations gfs2_writeback_aops = { .writepage = gfs2_writepage, .writepages = gfs2_writepages, .readpage = gfs2_readpage, .readpages = gfs2_readpages, .bmap = gfs2_bmap, .invalidatepage = gfs2_invalidatepage, .releasepage = gfs2_releasepage, .direct_IO = noop_direct_IO, .migratepage = buffer_migrate_page, .is_partially_uptodate = block_is_partially_uptodate, .error_remove_page = generic_error_remove_page, }; static const struct address_space_operations gfs2_ordered_aops = { .writepage = gfs2_writepage, .writepages = gfs2_writepages, .readpage = gfs2_readpage, .readpages = gfs2_readpages, .set_page_dirty = __set_page_dirty_buffers, .bmap = gfs2_bmap, .invalidatepage = gfs2_invalidatepage, .releasepage = gfs2_releasepage, .direct_IO = noop_direct_IO, .migratepage = buffer_migrate_page, .is_partially_uptodate = block_is_partially_uptodate, .error_remove_page = generic_error_remove_page, }; static const struct address_space_operations gfs2_jdata_aops = { .writepage = gfs2_jdata_writepage, .writepages = gfs2_jdata_writepages, .readpage = gfs2_readpage, .readpages = gfs2_readpages, .set_page_dirty = jdata_set_page_dirty, .bmap = gfs2_bmap, .invalidatepage = gfs2_invalidatepage, .releasepage = gfs2_releasepage, .is_partially_uptodate = block_is_partially_uptodate, .error_remove_page = generic_error_remove_page, }; void gfs2_set_aops(struct inode *inode) { struct gfs2_inode *ip = GFS2_I(inode); if (gfs2_is_writeback(ip)) inode->i_mapping->a_ops = &gfs2_writeback_aops; else if (gfs2_is_ordered(ip)) inode->i_mapping->a_ops = &gfs2_ordered_aops; else if (gfs2_is_jdata(ip)) inode->i_mapping->a_ops = &gfs2_jdata_aops; else BUG(); }
10 10 32 1 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 /* * Creates audit record for dropped/accepted packets * * (C) 2010-2011 Thomas Graf <tgraf@redhat.com> * (C) 2010-2011 Red Hat, Inc. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/audit.h> #include <linux/module.h> #include <linux/skbuff.h> #include <linux/tcp.h> #include <linux/udp.h> #include <linux/if_arp.h> #include <linux/netfilter/x_tables.h> #include <linux/netfilter/xt_AUDIT.h> #include <linux/netfilter_bridge/ebtables.h> #include <net/ipv6.h> #include <net/ip.h> MODULE_LICENSE("GPL"); MODULE_AUTHOR("Thomas Graf <tgraf@redhat.com>"); MODULE_DESCRIPTION("Xtables: creates audit records for dropped/accepted packets"); MODULE_ALIAS("ipt_AUDIT"); MODULE_ALIAS("ip6t_AUDIT"); MODULE_ALIAS("ebt_AUDIT"); MODULE_ALIAS("arpt_AUDIT"); static bool audit_ip4(struct audit_buffer *ab, struct sk_buff *skb) { struct iphdr _iph; const struct iphdr *ih; ih = skb_header_pointer(skb, skb_network_offset(skb), sizeof(_iph), &_iph); if (!ih) return false; audit_log_format(ab, " saddr=%pI4 daddr=%pI4 proto=%hhu", &ih->saddr, &ih->daddr, ih->protocol); return true; } static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb) { struct ipv6hdr _ip6h; const struct ipv6hdr *ih; u8 nexthdr; __be16 frag_off; ih = skb_header_pointer(skb, skb_network_offset(skb), sizeof(_ip6h), &_ip6h); if (!ih) return false; nexthdr = ih->nexthdr; ipv6_skip_exthdr(skb, skb_network_offset(skb) + sizeof(_ip6h), &nexthdr, &frag_off); audit_log_format(ab, " saddr=%pI6c daddr=%pI6c proto=%hhu", &ih->saddr, &ih->daddr, nexthdr); return true; } static unsigned int audit_tg(struct sk_buff *skb, const struct xt_action_param *par) { struct audit_buffer *ab; int fam = -1; if (audit_enabled == AUDIT_OFF) goto errout; ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT); if (ab == NULL) goto errout; audit_log_format(ab, "mark=%#x", skb->mark); switch (xt_family(par)) { case NFPROTO_BRIDGE: switch (eth_hdr(skb)->h_proto) { case htons(ETH_P_IP): fam = audit_ip4(ab, skb) ? NFPROTO_IPV4 : -1; break; case htons(ETH_P_IPV6): fam = audit_ip6(ab, skb) ? NFPROTO_IPV6 : -1; break; } break; case NFPROTO_IPV4: fam = audit_ip4(ab, skb) ? NFPROTO_IPV4 : -1; break; case NFPROTO_IPV6: fam = audit_ip6(ab, skb) ? NFPROTO_IPV6 : -1; break; } if (fam == -1) audit_log_format(ab, " saddr=? daddr=? proto=-1"); audit_log_end(ab); errout: return XT_CONTINUE; } static unsigned int audit_tg_ebt(struct sk_buff *skb, const struct xt_action_param *par) { audit_tg(skb, par); return EBT_CONTINUE; } static int audit_tg_check(const struct xt_tgchk_param *par) { const struct xt_audit_info *info = par->targinfo; if (info->type > XT_AUDIT_TYPE_MAX) { pr_info_ratelimited("Audit type out of range (valid range: 0..%hhu)\n", XT_AUDIT_TYPE_MAX); return -ERANGE; } return 0; } static struct xt_target audit_tg_reg[] __read_mostly = { { .name = "AUDIT", .family = NFPROTO_UNSPEC, .target = audit_tg, .targetsize = sizeof(struct xt_audit_info), .checkentry = audit_tg_check, .me = THIS_MODULE, }, { .name = "AUDIT", .family = NFPROTO_BRIDGE, .target = audit_tg_ebt, .targetsize = sizeof(struct xt_audit_info), .checkentry = audit_tg_check, .me = THIS_MODULE, }, }; static int __init audit_tg_init(void) { return xt_register_targets(audit_tg_reg, ARRAY_SIZE(audit_tg_reg)); } static void __exit audit_tg_exit(void) { xt_unregister_targets(audit_tg_reg, ARRAY_SIZE(audit_tg_reg)); } module_init(audit_tg_init); module_exit(audit_tg_exit);
10 10 9 9 1 1 8 1 10 10 1 9 9 9 9 9 9 9 9 6 6 9 8 10 10 10 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 /* * linux/fs/affs/inode.c * * (c) 1996 Hans-Joachim Widmaier - Rewritten * * (C) 1993 Ray Burr - Modified for Amiga FFS filesystem. * * (C) 1992 Eric Youngdale Modified for ISO 9660 filesystem. * * (C) 1991 Linus Torvalds - minix filesystem */ #include <linux/module.h> #include <linux/init.h> #include <linux/statfs.h> #include <linux/parser.h> #include <linux/magic.h> #include <linux/sched.h> #include <linux/cred.h> #include <linux/slab.h> #include <linux/writeback.h> #include <linux/blkdev.h> #include <linux/seq_file.h> #include <linux/iversion.h> #include "affs.h" static int affs_statfs(struct dentry *dentry, struct kstatfs *buf); static int affs_show_options(struct seq_file *m, struct dentry *root); static int affs_remount (struct super_block *sb, int *flags, char *data); static void affs_commit_super(struct super_block *sb, int wait) { struct affs_sb_info *sbi = AFFS_SB(sb); struct buffer_head *bh = sbi->s_root_bh; struct affs_root_tail *tail = AFFS_ROOT_TAIL(sb, bh); lock_buffer(bh); affs_secs_to_datestamp(ktime_get_real_seconds(), &tail->disk_change); affs_fix_checksum(sb, bh); unlock_buffer(bh); mark_buffer_dirty(bh); if (wait) sync_dirty_buffer(bh); } static void affs_put_super(struct super_block *sb) { struct affs_sb_info *sbi = AFFS_SB(sb); pr_debug("%s()\n", __func__); cancel_delayed_work_sync(&sbi->sb_work); } static int affs_sync_fs(struct super_block *sb, int wait) { affs_commit_super(sb, wait); return 0; } static void flush_superblock(struct work_struct *work) { struct affs_sb_info *sbi; struct super_block *sb; sbi = container_of(work, struct affs_sb_info, sb_work.work); sb = sbi->sb; spin_lock(&sbi->work_lock); sbi->work_queued = 0; spin_unlock(&sbi->work_lock); affs_commit_super(sb, 1); } void affs_mark_sb_dirty(struct super_block *sb) { struct affs_sb_info *sbi = AFFS_SB(sb); unsigned long delay; if (sb_rdonly(sb)) return; spin_lock(&sbi->work_lock); if (!sbi->work_queued) { delay = msecs_to_jiffies(dirty_writeback_interval * 10); queue_delayed_work(system_long_wq, &sbi->sb_work, delay); sbi->work_queued = 1; } spin_unlock(&sbi->work_lock); } static struct kmem_cache * affs_inode_cachep; static struct inode *affs_alloc_inode(struct super_block *sb) { struct affs_inode_info *i; i = kmem_cache_alloc(affs_inode_cachep, GFP_KERNEL); if (!i) return NULL; inode_set_iversion(&i->vfs_inode, 1); i->i_lc = NULL; i->i_ext_bh = NULL; i->i_pa_cnt = 0; return &i->vfs_inode; } static void affs_i_callback(struct rcu_head *head) { struct inode *inode = container_of(head, struct inode, i_rcu); kmem_cache_free(affs_inode_cachep, AFFS_I(inode)); } static void affs_destroy_inode(struct inode *inode) { call_rcu(&inode->i_rcu, affs_i_callback); } static void init_once(void *foo) { struct affs_inode_info *ei = (struct affs_inode_info *) foo; sema_init(&ei->i_link_lock, 1); sema_init(&ei->i_ext_lock, 1); inode_init_once(&ei->vfs_inode); } static int __init init_inodecache(void) { affs_inode_cachep = kmem_cache_create("affs_inode_cache", sizeof(struct affs_inode_info), 0, (SLAB_RECLAIM_ACCOUNT| SLAB_MEM_SPREAD|SLAB_ACCOUNT), init_once); if (affs_inode_cachep == NULL) return -ENOMEM; return 0; } static void destroy_inodecache(void) { /* * Make sure all delayed rcu free inodes are flushed before we * destroy cache. */ rcu_barrier(); kmem_cache_destroy(affs_inode_cachep); } static const struct super_operations affs_sops = { .alloc_inode = affs_alloc_inode, .destroy_inode = affs_destroy_inode, .write_inode = affs_write_inode, .evict_inode = affs_evict_inode, .put_super = affs_put_super, .sync_fs = affs_sync_fs, .statfs = affs_statfs, .remount_fs = affs_remount, .show_options = affs_show_options, }; enum { Opt_bs, Opt_mode, Opt_mufs, Opt_notruncate, Opt_prefix, Opt_protect, Opt_reserved, Opt_root, Opt_setgid, Opt_setuid, Opt_verbose, Opt_volume, Opt_ignore, Opt_err, }; static const match_table_t tokens = { {Opt_bs, "bs=%u"}, {Opt_mode, "mode=%o"}, {Opt_mufs, "mufs"}, {Opt_notruncate, "nofilenametruncate"}, {Opt_prefix, "prefix=%s"}, {Opt_protect, "protect"}, {Opt_reserved, "reserved=%u"}, {Opt_root, "root=%u"}, {Opt_setgid, "setgid=%u"}, {Opt_setuid, "setuid=%u"}, {Opt_verbose, "verbose"}, {Opt_volume, "volume=%s"}, {Opt_ignore, "grpquota"}, {Opt_ignore, "noquota"}, {Opt_ignore, "quota"}, {Opt_ignore, "usrquota"}, {Opt_err, NULL}, }; static int parse_options(char *options, kuid_t *uid, kgid_t *gid, int *mode, int *reserved, s32 *root, int *blocksize, char **prefix, char *volume, unsigned long *mount_opts) { char *p; substring_t args[MAX_OPT_ARGS]; /* Fill in defaults */ *uid = current_uid(); *gid = current_gid(); *reserved = 2; *root = -1; *blocksize = -1; volume[0] = ':'; volume[1] = 0; *mount_opts = 0; if (!options) return 1; while ((p = strsep(&options, ",")) != NULL) { int token, n, option; if (!*p) continue; token = match_token(p, tokens, args); switch (token) { case Opt_bs: if (match_int(&args[0], &n)) return 0; if (n != 512 && n != 1024 && n != 2048 && n != 4096) { pr_warn("Invalid blocksize (512, 1024, 2048, 4096 allowed)\n"); return 0; } *blocksize = n; break; case Opt_mode: if (match_octal(&args[0], &option)) return 0; *mode = option & 0777; affs_set_opt(*mount_opts, SF_SETMODE); break; case Opt_mufs: affs_set_opt(*mount_opts, SF_MUFS); break; case Opt_notruncate: affs_set_opt(*mount_opts, SF_NO_TRUNCATE); break; case Opt_prefix: kfree(*prefix); *prefix = match_strdup(&args[0]); if (!*prefix) return 0; affs_set_opt(*mount_opts, SF_PREFIX); break; case Opt_protect: affs_set_opt(*mount_opts, SF_IMMUTABLE); break; case Opt_reserved: if (match_int(&args[0], reserved)) return 0; break; case Opt_root: if (match_int(&args[0], root)) return 0; break; case Opt_setgid: if (match_int(&args[0], &option)) return 0; *gid = make_kgid(current_user_ns(), option); if (!gid_valid(*gid)) return 0; affs_set_opt(*mount_opts, SF_SETGID); break; case Opt_setuid: if (match_int(&args[0], &option)) return 0; *uid = make_kuid(current_user_ns(), option); if (!uid_valid(*uid)) return 0; affs_set_opt(*mount_opts, SF_SETUID); break; case Opt_verbose: affs_set_opt(*mount_opts, SF_VERBOSE); break; case Opt_volume: { char *vol = match_strdup(&args[0]); if (!vol) return 0; strlcpy(volume, vol, 32); kfree(vol); break; } case Opt_ignore: /* Silently ignore the quota options */ break; default: pr_warn("Unrecognized mount option \"%s\" or missing value\n", p); return 0; } } return 1; } static int affs_show_options(struct seq_file *m, struct dentry *root) { struct super_block *sb = root->d_sb; struct affs_sb_info *sbi = AFFS_SB(sb); if (sb->s_blocksize) seq_printf(m, ",bs=%lu", sb->s_blocksize); if (affs_test_opt(sbi->s_flags, SF_SETMODE)) seq_printf(m, ",mode=%o", sbi->s_mode); if (affs_test_opt(sbi->s_flags, SF_MUFS)) seq_puts(m, ",mufs"); if (affs_test_opt(sbi->s_flags, SF_NO_TRUNCATE)) seq_puts(m, ",nofilenametruncate"); if (affs_test_opt(sbi->s_flags, SF_PREFIX)) seq_printf(m, ",prefix=%s", sbi->s_prefix); if (affs_test_opt(sbi->s_flags, SF_IMMUTABLE)) seq_puts(m, ",protect"); if (sbi->s_reserved != 2) seq_printf(m, ",reserved=%u", sbi->s_reserved); if (sbi->s_root_block != (sbi->s_reserved + sbi->s_partition_size - 1) / 2) seq_printf(m, ",root=%u", sbi->s_root_block); if (affs_test_opt(sbi->s_flags, SF_SETGID)) seq_printf(m, ",setgid=%u", from_kgid_munged(&init_user_ns, sbi->s_gid)); if (affs_test_opt(sbi->s_flags, SF_SETUID)) seq_printf(m, ",setuid=%u", from_kuid_munged(&init_user_ns, sbi->s_uid)); if (affs_test_opt(sbi->s_flags, SF_VERBOSE)) seq_puts(m, ",verbose"); if (sbi->s_volume[0]) seq_printf(m, ",volume=%s", sbi->s_volume); return 0; } /* This function definitely needs to be split up. Some fine day I'll * hopefully have the guts to do so. Until then: sorry for the mess. */ static int affs_fill_super(struct super_block *sb, void *data, int silent) { struct affs_sb_info *sbi; struct buffer_head *root_bh = NULL; struct buffer_head *boot_bh; struct inode *root_inode = NULL; s32 root_block; int size, blocksize; u32 chksum; int num_bm; int i, j; kuid_t uid; kgid_t gid; int reserved; unsigned long mount_flags; int tmp_flags; /* fix remount prototype... */ u8 sig[4]; int ret; pr_debug("read_super(%s)\n", data ? (const char *)data : "no options"); sb->s_magic = AFFS_SUPER_MAGIC; sb->s_op = &affs_sops; sb->s_flags |= SB_NODIRATIME; sbi = kzalloc(sizeof(struct affs_sb_info), GFP_KERNEL); if (!sbi) return -ENOMEM; sb->s_fs_info = sbi; sbi->sb = sb; mutex_init(&sbi->s_bmlock); spin_lock_init(&sbi->symlink_lock); spin_lock_init(&sbi->work_lock); INIT_DELAYED_WORK(&sbi->sb_work, flush_superblock); if (!parse_options(data,&uid,&gid,&i,&reserved,&root_block, &blocksize,&sbi->s_prefix, sbi->s_volume, &mount_flags)) { pr_err("Error parsing options\n"); return -EINVAL; } /* N.B. after this point s_prefix must be released */ sbi->s_flags = mount_flags; sbi->s_mode = i; sbi->s_uid = uid; sbi->s_gid = gid; sbi->s_reserved= reserved; /* Get the size of the device in 512-byte blocks. * If we later see that the partition uses bigger * blocks, we will have to change it. */ size = i_size_read(sb->s_bdev->bd_inode) >> 9; pr_debug("initial blocksize=%d, #blocks=%d\n", 512, size); affs_set_blocksize(sb, PAGE_SIZE); /* Try to find root block. Its location depends on the block size. */ i = bdev_logical_block_size(sb->s_bdev); j = PAGE_SIZE; if (blocksize > 0) { i = j = blocksize; size = size / (blocksize / 512); } for (blocksize = i; blocksize <= j; blocksize <<= 1, size >>= 1) { sbi->s_root_block = root_block; if (root_block < 0) sbi->s_root_block = (reserved + size - 1) / 2; pr_debug("setting blocksize to %d\n", blocksize); affs_set_blocksize(sb, blocksize); sbi->s_partition_size = size; /* The root block location that was calculated above is not * correct if the partition size is an odd number of 512- * byte blocks, which will be rounded down to a number of * 1024-byte blocks, and if there were an even number of * reserved blocks. Ideally, all partition checkers should * report the real number of blocks of the real blocksize, * but since this just cannot be done, we have to try to * find the root block anyways. In the above case, it is one * block behind the calculated one. So we check this one, too. */ for (num_bm = 0; num_bm < 2; num_bm++) { pr_debug("Dev %s, trying root=%u, bs=%d, " "size=%d, reserved=%d\n", sb->s_id, sbi->s_root_block + num_bm, blocksize, size, reserved); root_bh = affs_bread(sb, sbi->s_root_block + num_bm); if (!root_bh) continue; if (!affs_checksum_block(sb, root_bh) && be32_to_cpu(AFFS_ROOT_HEAD(root_bh)->ptype) == T_SHORT && be32_to_cpu(AFFS_ROOT_TAIL(sb, root_bh)->stype) == ST_ROOT) { sbi->s_hashsize = blocksize / 4 - 56; sbi->s_root_block += num_bm; goto got_root; } affs_brelse(root_bh); root_bh = NULL; } } if (!silent) pr_err("No valid root block on device %s\n", sb->s_id); return -EINVAL; /* N.B. after this point bh must be released */ got_root: /* Keep super block in cache */ sbi->s_root_bh = root_bh; root_block = sbi->s_root_block; /* Find out which kind of FS we have */ boot_bh = sb_bread(sb, 0); if (!boot_bh) { pr_err("Cannot read boot block\n"); return -EINVAL; } memcpy(sig, boot_bh->b_data, 4); brelse(boot_bh); chksum = be32_to_cpu(*(__be32 *)sig); /* Dircache filesystems are compatible with non-dircache ones * when reading. As long as they aren't supported, writing is * not recommended. */ if ((chksum == FS_DCFFS || chksum == MUFS_DCFFS || chksum == FS_DCOFS || chksum == MUFS_DCOFS) && !sb_rdonly(sb)) { pr_notice("Dircache FS - mounting %s read only\n", sb->s_id); sb->s_flags |= SB_RDONLY; } switch (chksum) { case MUFS_FS: case MUFS_INTLFFS: case MUFS_DCFFS: affs_set_opt(sbi->s_flags, SF_MUFS); /* fall thru */ case FS_INTLFFS: case FS_DCFFS: affs_set_opt(sbi->s_flags, SF_INTL); break; case MUFS_FFS: affs_set_opt(sbi->s_flags, SF_MUFS); break; case FS_FFS: break; case MUFS_OFS: affs_set_opt(sbi->s_flags, SF_MUFS); /* fall thru */ case FS_OFS: affs_set_opt(sbi->s_flags, SF_OFS); sb->s_flags |= SB_NOEXEC; break; case MUFS_DCOFS: case MUFS_INTLOFS: affs_set_opt(sbi->s_flags, SF_MUFS); case FS_DCOFS: case FS_INTLOFS: affs_set_opt(sbi->s_flags, SF_INTL); affs_set_opt(sbi->s_flags, SF_OFS); sb->s_flags |= SB_NOEXEC; break; default: pr_err("Unknown filesystem on device %s: %08X\n", sb->s_id, chksum); return -EINVAL; } if (affs_test_opt(mount_flags, SF_VERBOSE)) { u8 len = AFFS_ROOT_TAIL(sb, root_bh)->disk_name[0]; pr_notice("Mounting volume \"%.*s\": Type=%.3s\\%c, Blocksize=%d\n", len > 31 ? 31 : len, AFFS_ROOT_TAIL(sb, root_bh)->disk_name + 1, sig, sig[3] + '0', blocksize); } sb->s_flags |= SB_NODEV | SB_NOSUID; sbi->s_data_blksize = sb->s_blocksize; if (affs_test_opt(sbi->s_flags, SF_OFS)) sbi->s_data_blksize -= 24; tmp_flags = sb->s_flags; ret = affs_init_bitmap(sb, &tmp_flags); if (ret) return ret; sb->s_flags = tmp_flags; /* set up enough so that it can read an inode */ root_inode = affs_iget(sb, root_block); if (IS_ERR(root_inode)) return PTR_ERR(root_inode); if (affs_test_opt(AFFS_SB(sb)->s_flags, SF_INTL)) sb->s_d_op = &affs_intl_dentry_operations; else sb->s_d_op = &affs_dentry_operations; sb->s_root = d_make_root(root_inode); if (!sb->s_root) { pr_err("AFFS: Get root inode failed\n"); return -ENOMEM; } sb->s_export_op = &affs_export_ops; pr_debug("s_flags=%lX\n", sb->s_flags); return 0; } static int affs_remount(struct super_block *sb, int *flags, char *data) { struct affs_sb_info *sbi = AFFS_SB(sb); int blocksize; kuid_t uid; kgid_t gid; int mode; int reserved; int root_block; unsigned long mount_flags; int res = 0; char volume[32]; char *prefix = NULL; pr_debug("%s(flags=0x%x,opts=\"%s\")\n", __func__, *flags, data); sync_filesystem(sb); *flags |= SB_NODIRATIME; memcpy(volume, sbi->s_volume, 32); if (!parse_options(data, &uid, &gid, &mode, &reserved, &root_block, &blocksize, &prefix, volume, &mount_flags)) { kfree(prefix); return -EINVAL; } flush_delayed_work(&sbi->sb_work); sbi->s_flags = mount_flags; sbi->s_mode = mode; sbi->s_uid = uid; sbi->s_gid = gid; /* protect against readers */ spin_lock(&sbi->symlink_lock); if (prefix) { kfree(sbi->s_prefix); sbi->s_prefix = prefix; } memcpy(sbi->s_volume, volume, 32); spin_unlock(&sbi->symlink_lock); if ((bool)(*flags & SB_RDONLY) == sb_rdonly(sb)) return 0; if (*flags & SB_RDONLY) affs_free_bitmap(sb); else res = affs_init_bitmap(sb, flags); return res; } static int affs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; int free; u64 id = huge_encode_dev(sb->s_bdev->bd_dev); pr_debug("%s() partsize=%d, reserved=%d\n", __func__, AFFS_SB(sb)->s_partition_size, AFFS_SB(sb)->s_reserved); free = affs_count_free_blocks(sb); buf->f_type = AFFS_SUPER_MAGIC; buf->f_bsize = sb->s_blocksize; buf->f_blocks = AFFS_SB(sb)->s_partition_size - AFFS_SB(sb)->s_reserved; buf->f_bfree = free; buf->f_bavail = free; buf->f_fsid.val[0] = (u32)id; buf->f_fsid.val[1] = (u32)(id >> 32); buf->f_namelen = AFFSNAMEMAX; return 0; } static struct dentry *affs_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { return mount_bdev(fs_type, flags, dev_name, data, affs_fill_super); } static void affs_kill_sb(struct super_block *sb) { struct affs_sb_info *sbi = AFFS_SB(sb); kill_block_super(sb); if (sbi) { affs_free_bitmap(sb); affs_brelse(sbi->s_root_bh); kfree(sbi->s_prefix); mutex_destroy(&sbi->s_bmlock); kfree(sbi); } } static struct file_system_type affs_fs_type = { .owner = THIS_MODULE, .name = "affs", .mount = affs_mount, .kill_sb = affs_kill_sb, .fs_flags = FS_REQUIRES_DEV, }; MODULE_ALIAS_FS("affs"); static int __init init_affs_fs(void) { int err = init_inodecache(); if (err) goto out1; err = register_filesystem(&affs_fs_type); if (err) goto out; return 0; out: destroy_inodecache(); out1: return err; } static void __exit exit_affs_fs(void) { unregister_filesystem(&affs_fs_type); destroy_inodecache(); } MODULE_DESCRIPTION("Amiga filesystem support for Linux"); MODULE_LICENSE("GPL"); module_init(init_affs_fs) module_exit(exit_affs_fs)
4 3 1 13 1 9 9 9 3 3 3 48 41 24 19 6 5 22 22 22 42 42 41 38 7 48 1 3 1 1 3 10 10 10 10 10 10 10 9 10 10 10 10 10 10 10 10 6 4 1 4 4 5 4 4 6 6 2 1 1 2 2 2 4 2 1 4 2 1 1 1 2 4 2 2 2 2 2 2 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 1 5 5 5 5 5 5 5 5 5 8 1 1 4 4 4 2 2 1 1 4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 /* * L2TP netlink layer, for management * * Copyright (c) 2008,2009,2010 Katalix Systems Ltd * * Partly based on the IrDA nelink implementation * (see net/irda/irnetlink.c) which is: * Copyright (c) 2007 Samuel Ortiz <samuel@sortiz.org> * which is in turn partly based on the wireless netlink code: * Copyright 2006 Johannes Berg <johannes@sipsolutions.net> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <net/sock.h> #include <net/genetlink.h> #include <net/udp.h> #include <linux/in.h> #include <linux/udp.h> #include <linux/socket.h> #include <linux/module.h> #include <linux/list.h> #include <net/net_namespace.h> #include <linux/l2tp.h> #include "l2tp_core.h" static struct genl_family l2tp_nl_family; static const struct genl_multicast_group l2tp_multicast_group[] = { { .name = L2TP_GENL_MCGROUP, }, }; static int l2tp_nl_tunnel_send(struct sk_buff *skb, u32 portid, u32 seq, int flags, struct l2tp_tunnel *tunnel, u8 cmd); static int l2tp_nl_session_send(struct sk_buff *skb, u32 portid, u32 seq, int flags, struct l2tp_session *session, u8 cmd); /* Accessed under genl lock */ static const struct l2tp_nl_cmd_ops *l2tp_nl_cmd_ops[__L2TP_PWTYPE_MAX]; static struct l2tp_session *l2tp_nl_session_get(struct genl_info *info) { u32 tunnel_id; u32 session_id; char *ifname; struct l2tp_tunnel *tunnel; struct l2tp_session *session = NULL; struct net *net = genl_info_net(info); if (info->attrs[L2TP_ATTR_IFNAME]) { ifname = nla_data(info->attrs[L2TP_ATTR_IFNAME]); session = l2tp_session_get_by_ifname(net, ifname); } else if ((info->attrs[L2TP_ATTR_SESSION_ID]) && (info->attrs[L2TP_ATTR_CONN_ID])) { tunnel_id = nla_get_u32(info->attrs[L2TP_ATTR_CONN_ID]); session_id = nla_get_u32(info->attrs[L2TP_ATTR_SESSION_ID]); tunnel = l2tp_tunnel_get(net, tunnel_id); if (tunnel) { session = l2tp_tunnel_get_session(tunnel, session_id); l2tp_tunnel_dec_refcount(tunnel); } } return session; } static int l2tp_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info) { struct sk_buff *msg; void *hdr; int ret = -ENOBUFS; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) { ret = -ENOMEM; goto out; } hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &l2tp_nl_family, 0, L2TP_CMD_NOOP); if (!hdr) { ret = -EMSGSIZE; goto err_out; } genlmsg_end(msg, hdr); return genlmsg_unicast(genl_info_net(info), msg, info->snd_portid); err_out: nlmsg_free(msg); out: return ret; } static int l2tp_tunnel_notify(struct genl_family *family, struct genl_info *info, struct l2tp_tunnel *tunnel, u8 cmd) { struct sk_buff *msg; int ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; ret = l2tp_nl_tunnel_send(msg, info->snd_portid, info->snd_seq, NLM_F_ACK, tunnel, cmd); if (ret >= 0) { ret = genlmsg_multicast_allns(family, msg, 0, 0, GFP_ATOMIC); /* We don't care if no one is listening */ if (ret == -ESRCH) ret = 0; return ret; } nlmsg_free(msg); return ret; } static int l2tp_session_notify(struct genl_family *family, struct genl_info *info, struct l2tp_session *session, u8 cmd) { struct sk_buff *msg; int ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; ret = l2tp_nl_session_send(msg, info->snd_portid, info->snd_seq, NLM_F_ACK, session, cmd); if (ret >= 0) { ret = genlmsg_multicast_allns(family, msg, 0, 0, GFP_ATOMIC); /* We don't care if no one is listening */ if (ret == -ESRCH) ret = 0; return ret; } nlmsg_free(msg); return ret; } static int l2tp_nl_cmd_tunnel_create(struct sk_buff *skb, struct genl_info *info) { u32 tunnel_id; u32 peer_tunnel_id; int proto_version; int fd; int ret = 0; struct l2tp_tunnel_cfg cfg = { 0, }; struct l2tp_tunnel *tunnel; struct net *net = genl_info_net(info); if (!info->attrs[L2TP_ATTR_CONN_ID]) { ret = -EINVAL; goto out; } tunnel_id = nla_get_u32(info->attrs[L2TP_ATTR_CONN_ID]); if (!info->attrs[L2TP_ATTR_PEER_CONN_ID]) { ret = -EINVAL; goto out; } peer_tunnel_id = nla_get_u32(info->attrs[L2TP_ATTR_PEER_CONN_ID]); if (!info->attrs[L2TP_ATTR_PROTO_VERSION]) { ret = -EINVAL; goto out; } proto_version = nla_get_u8(info->attrs[L2TP_ATTR_PROTO_VERSION]); if (!info->attrs[L2TP_ATTR_ENCAP_TYPE]) { ret = -EINVAL; goto out; } cfg.encap = nla_get_u16(info->attrs[L2TP_ATTR_ENCAP_TYPE]); fd = -1; if (info->attrs[L2TP_ATTR_FD]) { fd = nla_get_u32(info->attrs[L2TP_ATTR_FD]); } else { #if IS_ENABLED(CONFIG_IPV6) if (info->attrs[L2TP_ATTR_IP6_SADDR] && info->attrs[L2TP_ATTR_IP6_DADDR]) { cfg.local_ip6 = nla_data( info->attrs[L2TP_ATTR_IP6_SADDR]); cfg.peer_ip6 = nla_data( info->attrs[L2TP_ATTR_IP6_DADDR]); } else #endif if (info->attrs[L2TP_ATTR_IP_SADDR] && info->attrs[L2TP_ATTR_IP_DADDR]) { cfg.local_ip.s_addr = nla_get_in_addr( info->attrs[L2TP_ATTR_IP_SADDR]); cfg.peer_ip.s_addr = nla_get_in_addr( info->attrs[L2TP_ATTR_IP_DADDR]); } else { ret = -EINVAL; goto out; } if (info->attrs[L2TP_ATTR_UDP_SPORT]) cfg.local_udp_port = nla_get_u16(info->attrs[L2TP_ATTR_UDP_SPORT]); if (info->attrs[L2TP_ATTR_UDP_DPORT]) cfg.peer_udp_port = nla_get_u16(info->attrs[L2TP_ATTR_UDP_DPORT]); cfg.use_udp_checksums = nla_get_flag( info->attrs[L2TP_ATTR_UDP_CSUM]); #if IS_ENABLED(CONFIG_IPV6) cfg.udp6_zero_tx_checksums = nla_get_flag( info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX]); cfg.udp6_zero_rx_checksums = nla_get_flag( info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX]); #endif } if (info->attrs[L2TP_ATTR_DEBUG]) cfg.debug = nla_get_u32(info->attrs[L2TP_ATTR_DEBUG]); ret = -EINVAL; switch (cfg.encap) { case L2TP_ENCAPTYPE_UDP: case L2TP_ENCAPTYPE_IP: ret = l2tp_tunnel_create(net, fd, proto_version, tunnel_id, peer_tunnel_id, &cfg, &tunnel); break; } if (ret < 0) goto out; l2tp_tunnel_inc_refcount(tunnel); ret = l2tp_tunnel_register(tunnel, net, &cfg); if (ret < 0) { kfree(tunnel); goto out; } ret = l2tp_tunnel_notify(&l2tp_nl_family, info, tunnel, L2TP_CMD_TUNNEL_CREATE); l2tp_tunnel_dec_refcount(tunnel); out: return ret; } static int l2tp_nl_cmd_tunnel_delete(struct sk_buff *skb, struct genl_info *info) { struct l2tp_tunnel *tunnel; u32 tunnel_id; int ret = 0; struct net *net = genl_info_net(info); if (!info->attrs[L2TP_ATTR_CONN_ID]) { ret = -EINVAL; goto out; } tunnel_id = nla_get_u32(info->attrs[L2TP_ATTR_CONN_ID]); tunnel = l2tp_tunnel_get(net, tunnel_id); if (!tunnel) { ret = -ENODEV; goto out; } l2tp_tunnel_notify(&l2tp_nl_family, info, tunnel, L2TP_CMD_TUNNEL_DELETE); l2tp_tunnel_delete(tunnel); l2tp_tunnel_dec_refcount(tunnel); out: return ret; } static int l2tp_nl_cmd_tunnel_modify(struct sk_buff *skb, struct genl_info *info) { struct l2tp_tunnel *tunnel; u32 tunnel_id; int ret = 0; struct net *net = genl_info_net(info); if (!info->attrs[L2TP_ATTR_CONN_ID]) { ret = -EINVAL; goto out; } tunnel_id = nla_get_u32(info->attrs[L2TP_ATTR_CONN_ID]); tunnel = l2tp_tunnel_get(net, tunnel_id); if (!tunnel) { ret = -ENODEV; goto out; } if (info->attrs[L2TP_ATTR_DEBUG]) tunnel->debug = nla_get_u32(info->attrs[L2TP_ATTR_DEBUG]); ret = l2tp_tunnel_notify(&l2tp_nl_family, info, tunnel, L2TP_CMD_TUNNEL_MODIFY); l2tp_tunnel_dec_refcount(tunnel); out: return ret; } static int l2tp_nl_tunnel_send(struct sk_buff *skb, u32 portid, u32 seq, int flags, struct l2tp_tunnel *tunnel, u8 cmd) { void *hdr; struct nlattr *nest; struct sock *sk = NULL; struct inet_sock *inet; #if IS_ENABLED(CONFIG_IPV6) struct ipv6_pinfo *np = NULL; #endif hdr = genlmsg_put(skb, portid, seq, &l2tp_nl_family, flags, cmd); if (!hdr) return -EMSGSIZE; if (nla_put_u8(skb, L2TP_ATTR_PROTO_VERSION, tunnel->version) || nla_put_u32(skb, L2TP_ATTR_CONN_ID, tunnel->tunnel_id) || nla_put_u32(skb, L2TP_ATTR_PEER_CONN_ID, tunnel->peer_tunnel_id) || nla_put_u32(skb, L2TP_ATTR_DEBUG, tunnel->debug) || nla_put_u16(skb, L2TP_ATTR_ENCAP_TYPE, tunnel->encap)) goto nla_put_failure; nest = nla_nest_start(skb, L2TP_ATTR_STATS); if (nest == NULL) goto nla_put_failure; if (nla_put_u64_64bit(skb, L2TP_ATTR_TX_PACKETS, atomic_long_read(&tunnel->stats.tx_packets), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_TX_BYTES, atomic_long_read(&tunnel->stats.tx_bytes), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_TX_ERRORS, atomic_long_read(&tunnel->stats.tx_errors), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_PACKETS, atomic_long_read(&tunnel->stats.rx_packets), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_BYTES, atomic_long_read(&tunnel->stats.rx_bytes), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_SEQ_DISCARDS, atomic_long_read(&tunnel->stats.rx_seq_discards), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_OOS_PACKETS, atomic_long_read(&tunnel->stats.rx_oos_packets), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_ERRORS, atomic_long_read(&tunnel->stats.rx_errors), L2TP_ATTR_STATS_PAD)) goto nla_put_failure; nla_nest_end(skb, nest); sk = tunnel->sock; if (!sk) goto out; #if IS_ENABLED(CONFIG_IPV6) if (sk->sk_family == AF_INET6) np = inet6_sk(sk); #endif inet = inet_sk(sk); switch (tunnel->encap) { case L2TP_ENCAPTYPE_UDP: switch (sk->sk_family) { case AF_INET: if (nla_put_u8(skb, L2TP_ATTR_UDP_CSUM, !sk->sk_no_check_tx)) goto nla_put_failure; break; #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: if (udp_get_no_check6_tx(sk) && nla_put_flag(skb, L2TP_ATTR_UDP_ZERO_CSUM6_TX)) goto nla_put_failure; if (udp_get_no_check6_rx(sk) && nla_put_flag(skb, L2TP_ATTR_UDP_ZERO_CSUM6_RX)) goto nla_put_failure; break; #endif } if (nla_put_u16(skb, L2TP_ATTR_UDP_SPORT, ntohs(inet->inet_sport)) || nla_put_u16(skb, L2TP_ATTR_UDP_DPORT, ntohs(inet->inet_dport))) goto nla_put_failure; /* fall through */ case L2TP_ENCAPTYPE_IP: #if IS_ENABLED(CONFIG_IPV6) if (np) { if (nla_put_in6_addr(skb, L2TP_ATTR_IP6_SADDR, &np->saddr) || nla_put_in6_addr(skb, L2TP_ATTR_IP6_DADDR, &sk->sk_v6_daddr)) goto nla_put_failure; } else #endif if (nla_put_in_addr(skb, L2TP_ATTR_IP_SADDR, inet->inet_saddr) || nla_put_in_addr(skb, L2TP_ATTR_IP_DADDR, inet->inet_daddr)) goto nla_put_failure; break; } out: genlmsg_end(skb, hdr); return 0; nla_put_failure: genlmsg_cancel(skb, hdr); return -1; } static int l2tp_nl_cmd_tunnel_get(struct sk_buff *skb, struct genl_info *info) { struct l2tp_tunnel *tunnel; struct sk_buff *msg; u32 tunnel_id; int ret = -ENOBUFS; struct net *net = genl_info_net(info); if (!info->attrs[L2TP_ATTR_CONN_ID]) { ret = -EINVAL; goto err; } tunnel_id = nla_get_u32(info->attrs[L2TP_ATTR_CONN_ID]); msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) { ret = -ENOMEM; goto err; } tunnel = l2tp_tunnel_get(net, tunnel_id); if (!tunnel) { ret = -ENODEV; goto err_nlmsg; } ret = l2tp_nl_tunnel_send(msg, info->snd_portid, info->snd_seq, NLM_F_ACK, tunnel, L2TP_CMD_TUNNEL_GET); if (ret < 0) goto err_nlmsg_tunnel; l2tp_tunnel_dec_refcount(tunnel); return genlmsg_unicast(net, msg, info->snd_portid); err_nlmsg_tunnel: l2tp_tunnel_dec_refcount(tunnel); err_nlmsg: nlmsg_free(msg); err: return ret; } static int l2tp_nl_cmd_tunnel_dump(struct sk_buff *skb, struct netlink_callback *cb) { int ti = cb->args[0]; struct l2tp_tunnel *tunnel; struct net *net = sock_net(skb->sk); for (;;) { tunnel = l2tp_tunnel_get_nth(net, ti); if (tunnel == NULL) goto out; if (l2tp_nl_tunnel_send(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, tunnel, L2TP_CMD_TUNNEL_GET) < 0) { l2tp_tunnel_dec_refcount(tunnel); goto out; } l2tp_tunnel_dec_refcount(tunnel); ti++; } out: cb->args[0] = ti; return skb->len; } static int l2tp_nl_cmd_session_create(struct sk_buff *skb, struct genl_info *info) { u32 tunnel_id = 0; u32 session_id; u32 peer_session_id; int ret = 0; struct l2tp_tunnel *tunnel; struct l2tp_session *session; struct l2tp_session_cfg cfg = { 0, }; struct net *net = genl_info_net(info); if (!info->attrs[L2TP_ATTR_CONN_ID]) { ret = -EINVAL; goto out; } tunnel_id = nla_get_u32(info->attrs[L2TP_ATTR_CONN_ID]); tunnel = l2tp_tunnel_get(net, tunnel_id); if (!tunnel) { ret = -ENODEV; goto out; } if (!info->attrs[L2TP_ATTR_SESSION_ID]) { ret = -EINVAL; goto out_tunnel; } session_id = nla_get_u32(info->attrs[L2TP_ATTR_SESSION_ID]); if (!info->attrs[L2TP_ATTR_PEER_SESSION_ID]) { ret = -EINVAL; goto out_tunnel; } peer_session_id = nla_get_u32(info->attrs[L2TP_ATTR_PEER_SESSION_ID]); if (!info->attrs[L2TP_ATTR_PW_TYPE]) { ret = -EINVAL; goto out_tunnel; } cfg.pw_type = nla_get_u16(info->attrs[L2TP_ATTR_PW_TYPE]); if (cfg.pw_type >= __L2TP_PWTYPE_MAX) { ret = -EINVAL; goto out_tunnel; } /* L2TPv2 only accepts PPP pseudo-wires */ if (tunnel->version == 2 && cfg.pw_type != L2TP_PWTYPE_PPP) { ret = -EPROTONOSUPPORT; goto out_tunnel; } if (tunnel->version > 2) { if (info->attrs[L2TP_ATTR_L2SPEC_TYPE]) { cfg.l2specific_type = nla_get_u8(info->attrs[L2TP_ATTR_L2SPEC_TYPE]); if (cfg.l2specific_type != L2TP_L2SPECTYPE_DEFAULT && cfg.l2specific_type != L2TP_L2SPECTYPE_NONE) { ret = -EINVAL; goto out_tunnel; } } else { cfg.l2specific_type = L2TP_L2SPECTYPE_DEFAULT; } if (info->attrs[L2TP_ATTR_COOKIE]) { u16 len = nla_len(info->attrs[L2TP_ATTR_COOKIE]); if (len > 8) { ret = -EINVAL; goto out_tunnel; } cfg.cookie_len = len; memcpy(&cfg.cookie[0], nla_data(info->attrs[L2TP_ATTR_COOKIE]), len); } if (info->attrs[L2TP_ATTR_PEER_COOKIE]) { u16 len = nla_len(info->attrs[L2TP_ATTR_PEER_COOKIE]); if (len > 8) { ret = -EINVAL; goto out_tunnel; } cfg.peer_cookie_len = len; memcpy(&cfg.peer_cookie[0], nla_data(info->attrs[L2TP_ATTR_PEER_COOKIE]), len); } if (info->attrs[L2TP_ATTR_IFNAME]) cfg.ifname = nla_data(info->attrs[L2TP_ATTR_IFNAME]); } if (info->attrs[L2TP_ATTR_DEBUG]) cfg.debug = nla_get_u32(info->attrs[L2TP_ATTR_DEBUG]); if (info->attrs[L2TP_ATTR_RECV_SEQ]) cfg.recv_seq = nla_get_u8(info->attrs[L2TP_ATTR_RECV_SEQ]); if (info->attrs[L2TP_ATTR_SEND_SEQ]) cfg.send_seq = nla_get_u8(info->attrs[L2TP_ATTR_SEND_SEQ]); if (info->attrs[L2TP_ATTR_LNS_MODE]) cfg.lns_mode = nla_get_u8(info->attrs[L2TP_ATTR_LNS_MODE]); if (info->attrs[L2TP_ATTR_RECV_TIMEOUT]) cfg.reorder_timeout = nla_get_msecs(info->attrs[L2TP_ATTR_RECV_TIMEOUT]); #ifdef CONFIG_MODULES if (l2tp_nl_cmd_ops[cfg.pw_type] == NULL) { genl_unlock(); request_module("net-l2tp-type-%u", cfg.pw_type); genl_lock(); } #endif if ((l2tp_nl_cmd_ops[cfg.pw_type] == NULL) || (l2tp_nl_cmd_ops[cfg.pw_type]->session_create == NULL)) { ret = -EPROTONOSUPPORT; goto out_tunnel; } ret = l2tp_nl_cmd_ops[cfg.pw_type]->session_create(net, tunnel, session_id, peer_session_id, &cfg); if (ret >= 0) { session = l2tp_tunnel_get_session(tunnel, session_id); if (session) { ret = l2tp_session_notify(&l2tp_nl_family, info, session, L2TP_CMD_SESSION_CREATE); l2tp_session_dec_refcount(session); } } out_tunnel: l2tp_tunnel_dec_refcount(tunnel); out: return ret; } static int l2tp_nl_cmd_session_delete(struct sk_buff *skb, struct genl_info *info) { int ret = 0; struct l2tp_session *session; u16 pw_type; session = l2tp_nl_session_get(info); if (session == NULL) { ret = -ENODEV; goto out; } l2tp_session_notify(&l2tp_nl_family, info, session, L2TP_CMD_SESSION_DELETE); pw_type = session->pwtype; if (pw_type < __L2TP_PWTYPE_MAX) if (l2tp_nl_cmd_ops[pw_type] && l2tp_nl_cmd_ops[pw_type]->session_delete) ret = (*l2tp_nl_cmd_ops[pw_type]->session_delete)(session); l2tp_session_dec_refcount(session); out: return ret; } static int l2tp_nl_cmd_session_modify(struct sk_buff *skb, struct genl_info *info) { int ret = 0; struct l2tp_session *session; session = l2tp_nl_session_get(info); if (session == NULL) { ret = -ENODEV; goto out; } if (info->attrs[L2TP_ATTR_DEBUG]) session->debug = nla_get_u32(info->attrs[L2TP_ATTR_DEBUG]); if (info->attrs[L2TP_ATTR_RECV_SEQ]) session->recv_seq = nla_get_u8(info->attrs[L2TP_ATTR_RECV_SEQ]); if (info->attrs[L2TP_ATTR_SEND_SEQ]) { session->send_seq = nla_get_u8(info->attrs[L2TP_ATTR_SEND_SEQ]); l2tp_session_set_header_len(session, session->tunnel->version); } if (info->attrs[L2TP_ATTR_LNS_MODE]) session->lns_mode = nla_get_u8(info->attrs[L2TP_ATTR_LNS_MODE]); if (info->attrs[L2TP_ATTR_RECV_TIMEOUT]) session->reorder_timeout = nla_get_msecs(info->attrs[L2TP_ATTR_RECV_TIMEOUT]); ret = l2tp_session_notify(&l2tp_nl_family, info, session, L2TP_CMD_SESSION_MODIFY); l2tp_session_dec_refcount(session); out: return ret; } static int l2tp_nl_session_send(struct sk_buff *skb, u32 portid, u32 seq, int flags, struct l2tp_session *session, u8 cmd) { void *hdr; struct nlattr *nest; struct l2tp_tunnel *tunnel = session->tunnel; hdr = genlmsg_put(skb, portid, seq, &l2tp_nl_family, flags, cmd); if (!hdr) return -EMSGSIZE; if (nla_put_u32(skb, L2TP_ATTR_CONN_ID, tunnel->tunnel_id) || nla_put_u32(skb, L2TP_ATTR_SESSION_ID, session->session_id) || nla_put_u32(skb, L2TP_ATTR_PEER_CONN_ID, tunnel->peer_tunnel_id) || nla_put_u32(skb, L2TP_ATTR_PEER_SESSION_ID, session->peer_session_id) || nla_put_u32(skb, L2TP_ATTR_DEBUG, session->debug) || nla_put_u16(skb, L2TP_ATTR_PW_TYPE, session->pwtype)) goto nla_put_failure; if ((session->ifname[0] && nla_put_string(skb, L2TP_ATTR_IFNAME, session->ifname)) || (session->cookie_len && nla_put(skb, L2TP_ATTR_COOKIE, session->cookie_len, &session->cookie[0])) || (session->peer_cookie_len && nla_put(skb, L2TP_ATTR_PEER_COOKIE, session->peer_cookie_len, &session->peer_cookie[0])) || nla_put_u8(skb, L2TP_ATTR_RECV_SEQ, session->recv_seq) || nla_put_u8(skb, L2TP_ATTR_SEND_SEQ, session->send_seq) || nla_put_u8(skb, L2TP_ATTR_LNS_MODE, session->lns_mode) || (l2tp_tunnel_uses_xfrm(tunnel) && nla_put_u8(skb, L2TP_ATTR_USING_IPSEC, 1)) || (session->reorder_timeout && nla_put_msecs(skb, L2TP_ATTR_RECV_TIMEOUT, session->reorder_timeout, L2TP_ATTR_PAD))) goto nla_put_failure; nest = nla_nest_start(skb, L2TP_ATTR_STATS); if (nest == NULL) goto nla_put_failure; if (nla_put_u64_64bit(skb, L2TP_ATTR_TX_PACKETS, atomic_long_read(&session->stats.tx_packets), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_TX_BYTES, atomic_long_read(&session->stats.tx_bytes), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_TX_ERRORS, atomic_long_read(&session->stats.tx_errors), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_PACKETS, atomic_long_read(&session->stats.rx_packets), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_BYTES, atomic_long_read(&session->stats.rx_bytes), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_SEQ_DISCARDS, atomic_long_read(&session->stats.rx_seq_discards), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_OOS_PACKETS, atomic_long_read(&session->stats.rx_oos_packets), L2TP_ATTR_STATS_PAD) || nla_put_u64_64bit(skb, L2TP_ATTR_RX_ERRORS, atomic_long_read(&session->stats.rx_errors), L2TP_ATTR_STATS_PAD)) goto nla_put_failure; nla_nest_end(skb, nest); genlmsg_end(skb, hdr); return 0; nla_put_failure: genlmsg_cancel(skb, hdr); return -1; } static int l2tp_nl_cmd_session_get(struct sk_buff *skb, struct genl_info *info) { struct l2tp_session *session; struct sk_buff *msg; int ret; session = l2tp_nl_session_get(info); if (session == NULL) { ret = -ENODEV; goto err; } msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) { ret = -ENOMEM; goto err_ref; } ret = l2tp_nl_session_send(msg, info->snd_portid, info->snd_seq, 0, session, L2TP_CMD_SESSION_GET); if (ret < 0) goto err_ref_msg; ret = genlmsg_unicast(genl_info_net(info), msg, info->snd_portid); l2tp_session_dec_refcount(session); return ret; err_ref_msg: nlmsg_free(msg); err_ref: l2tp_session_dec_refcount(session); err: return ret; } static int l2tp_nl_cmd_session_dump(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); struct l2tp_session *session; struct l2tp_tunnel *tunnel = NULL; int ti = cb->args[0]; int si = cb->args[1]; for (;;) { if (tunnel == NULL) { tunnel = l2tp_tunnel_get_nth(net, ti); if (tunnel == NULL) goto out; } session = l2tp_session_get_nth(tunnel, si); if (session == NULL) { ti++; l2tp_tunnel_dec_refcount(tunnel); tunnel = NULL; si = 0; continue; } if (l2tp_nl_session_send(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, session, L2TP_CMD_SESSION_GET) < 0) { l2tp_session_dec_refcount(session); l2tp_tunnel_dec_refcount(tunnel); break; } l2tp_session_dec_refcount(session); si++; } out: cb->args[0] = ti; cb->args[1] = si; return skb->len; } static const struct nla_policy l2tp_nl_policy[L2TP_ATTR_MAX + 1] = { [L2TP_ATTR_NONE] = { .type = NLA_UNSPEC, }, [L2TP_ATTR_PW_TYPE] = { .type = NLA_U16, }, [L2TP_ATTR_ENCAP_TYPE] = { .type = NLA_U16, }, [L2TP_ATTR_OFFSET] = { .type = NLA_U16, }, [L2TP_ATTR_DATA_SEQ] = { .type = NLA_U8, }, [L2TP_ATTR_L2SPEC_TYPE] = { .type = NLA_U8, }, [L2TP_ATTR_L2SPEC_LEN] = { .type = NLA_U8, }, [L2TP_ATTR_PROTO_VERSION] = { .type = NLA_U8, }, [L2TP_ATTR_CONN_ID] = { .type = NLA_U32, }, [L2TP_ATTR_PEER_CONN_ID] = { .type = NLA_U32, }, [L2TP_ATTR_SESSION_ID] = { .type = NLA_U32, }, [L2TP_ATTR_PEER_SESSION_ID] = { .type = NLA_U32, }, [L2TP_ATTR_UDP_CSUM] = { .type = NLA_U8, }, [L2TP_ATTR_VLAN_ID] = { .type = NLA_U16, }, [L2TP_ATTR_DEBUG] = { .type = NLA_U32, }, [L2TP_ATTR_RECV_SEQ] = { .type = NLA_U8, }, [L2TP_ATTR_SEND_SEQ] = { .type = NLA_U8, }, [L2TP_ATTR_LNS_MODE] = { .type = NLA_U8, }, [L2TP_ATTR_USING_IPSEC] = { .type = NLA_U8, }, [L2TP_ATTR_RECV_TIMEOUT] = { .type = NLA_MSECS, }, [L2TP_ATTR_FD] = { .type = NLA_U32, }, [L2TP_ATTR_IP_SADDR] = { .type = NLA_U32, }, [L2TP_ATTR_IP_DADDR] = { .type = NLA_U32, }, [L2TP_ATTR_UDP_SPORT] = { .type = NLA_U16, }, [L2TP_ATTR_UDP_DPORT] = { .type = NLA_U16, }, [L2TP_ATTR_MTU] = { .type = NLA_U16, }, [L2TP_ATTR_MRU] = { .type = NLA_U16, }, [L2TP_ATTR_STATS] = { .type = NLA_NESTED, }, [L2TP_ATTR_IP6_SADDR] = { .type = NLA_BINARY, .len = sizeof(struct in6_addr), }, [L2TP_ATTR_IP6_DADDR] = { .type = NLA_BINARY, .len = sizeof(struct in6_addr), }, [L2TP_ATTR_IFNAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1, }, [L2TP_ATTR_COOKIE] = { .type = NLA_BINARY, .len = 8, }, [L2TP_ATTR_PEER_COOKIE] = { .type = NLA_BINARY, .len = 8, }, }; static const struct genl_ops l2tp_nl_ops[] = { { .cmd = L2TP_CMD_NOOP, .doit = l2tp_nl_cmd_noop, .policy = l2tp_nl_policy, /* can be retrieved by unprivileged users */ }, { .cmd = L2TP_CMD_TUNNEL_CREATE, .doit = l2tp_nl_cmd_tunnel_create, .policy = l2tp_nl_policy, .flags = GENL_ADMIN_PERM, }, { .cmd = L2TP_CMD_TUNNEL_DELETE, .doit = l2tp_nl_cmd_tunnel_delete, .policy = l2tp_nl_policy, .flags = GENL_ADMIN_PERM, }, { .cmd = L2TP_CMD_TUNNEL_MODIFY, .doit = l2tp_nl_cmd_tunnel_modify, .policy = l2tp_nl_policy, .flags = GENL_ADMIN_PERM, }, { .cmd = L2TP_CMD_TUNNEL_GET, .doit = l2tp_nl_cmd_tunnel_get, .dumpit = l2tp_nl_cmd_tunnel_dump, .policy = l2tp_nl_policy, .flags = GENL_ADMIN_PERM, }, { .cmd = L2TP_CMD_SESSION_CREATE, .doit = l2tp_nl_cmd_session_create, .policy = l2tp_nl_policy, .flags = GENL_ADMIN_PERM, }, { .cmd = L2TP_CMD_SESSION_DELETE, .doit = l2tp_nl_cmd_session_delete, .policy = l2tp_nl_policy, .flags = GENL_ADMIN_PERM, }, { .cmd = L2TP_CMD_SESSION_MODIFY, .doit = l2tp_nl_cmd_session_modify, .policy = l2tp_nl_policy, .flags = GENL_ADMIN_PERM, }, { .cmd = L2TP_CMD_SESSION_GET, .doit = l2tp_nl_cmd_session_get, .dumpit = l2tp_nl_cmd_session_dump, .policy = l2tp_nl_policy, .flags = GENL_ADMIN_PERM, }, }; static struct genl_family l2tp_nl_family __ro_after_init = { .name = L2TP_GENL_NAME, .version = L2TP_GENL_VERSION, .hdrsize = 0, .maxattr = L2TP_ATTR_MAX, .netnsok = true, .module = THIS_MODULE, .ops = l2tp_nl_ops, .n_ops = ARRAY_SIZE(l2tp_nl_ops), .mcgrps = l2tp_multicast_group, .n_mcgrps = ARRAY_SIZE(l2tp_multicast_group), }; int l2tp_nl_register_ops(enum l2tp_pwtype pw_type, const struct l2tp_nl_cmd_ops *ops) { int ret; ret = -EINVAL; if (pw_type >= __L2TP_PWTYPE_MAX) goto err; genl_lock(); ret = -EBUSY; if (l2tp_nl_cmd_ops[pw_type]) goto out; l2tp_nl_cmd_ops[pw_type] = ops; ret = 0; out: genl_unlock(); err: return ret; } EXPORT_SYMBOL_GPL(l2tp_nl_register_ops); void l2tp_nl_unregister_ops(enum l2tp_pwtype pw_type) { if (pw_type < __L2TP_PWTYPE_MAX) { genl_lock(); l2tp_nl_cmd_ops[pw_type] = NULL; genl_unlock(); } } EXPORT_SYMBOL_GPL(l2tp_nl_unregister_ops); static int __init l2tp_nl_init(void) { pr_info("L2TP netlink interface\n"); return genl_register_family(&l2tp_nl_family); } static void l2tp_nl_cleanup(void) { genl_unregister_family(&l2tp_nl_family); } module_init(l2tp_nl_init); module_exit(l2tp_nl_cleanup); MODULE_AUTHOR("James Chapman <jchapman@katalix.com>"); MODULE_DESCRIPTION("L2TP netlink"); MODULE_LICENSE("GPL"); MODULE_VERSION("1.0"); MODULE_ALIAS_GENL_FAMILY("l2tp");
46 34 47 34 47 34 47 34 47 44 6 47 46 119 111 110 111 111 111 111 111 92 92 6 6 107 107 107 92 92 92 92 92 92 49 111 110 118 55 33 55 33 55 33 55 33 55 55 8 55 8 144 143 144 15 10 8 8 92 144 92 92 92 92 92 92 35 35 35 35 35 141 141 141 158 158 158 158 158 149 100 158 158 158 162 162 162 160 160 162 2 2 162 92 162 162 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 /* * net/dccp/options.c * * An implementation of the DCCP protocol * Copyright (c) 2005 Aristeu Sergio Rozanski Filho <aris@cathedrallabs.org> * Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@ghostprotocols.net> * Copyright (c) 2005 Ian McDonald <ian.mcdonald@jandi.co.nz> * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ #include <linux/dccp.h> #include <linux/module.h> #include <linux/types.h> #include <asm/unaligned.h> #include <linux/kernel.h> #include <linux/skbuff.h> #include "ackvec.h" #include "ccid.h" #include "dccp.h" #include "feat.h" u64 dccp_decode_value_var(const u8 *bf, const u8 len) { u64 value = 0; if (len >= DCCP_OPTVAL_MAXLEN) value += ((u64)*bf++) << 40; if (len > 4) value += ((u64)*bf++) << 32; if (len > 3) value += ((u64)*bf++) << 24; if (len > 2) value += ((u64)*bf++) << 16; if (len > 1) value += ((u64)*bf++) << 8; if (len > 0) value += *bf; return value; } /** * dccp_parse_options - Parse DCCP options present in @skb * @sk: client|server|listening dccp socket (when @dreq != NULL) * @dreq: request socket to use during connection setup, or NULL */ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); const struct dccp_hdr *dh = dccp_hdr(skb); const u8 pkt_type = DCCP_SKB_CB(skb)->dccpd_type; unsigned char *options = (unsigned char *)dh + dccp_hdr_len(skb); unsigned char *opt_ptr = options; const unsigned char *opt_end = (unsigned char *)dh + (dh->dccph_doff * 4); struct dccp_options_received *opt_recv = &dp->dccps_options_received; unsigned char opt, len; unsigned char *uninitialized_var(value); u32 elapsed_time; __be32 opt_val; int rc; int mandatory = 0; memset(opt_recv, 0, sizeof(*opt_recv)); opt = len = 0; while (opt_ptr != opt_end) { opt = *opt_ptr++; len = 0; value = NULL; /* Check if this isn't a single byte option */ if (opt > DCCPO_MAX_RESERVED) { if (opt_ptr == opt_end) goto out_nonsensical_length; len = *opt_ptr++; if (len < 2) goto out_nonsensical_length; /* * Remove the type and len fields, leaving * just the value size */ len -= 2; value = opt_ptr; opt_ptr += len; if (opt_ptr > opt_end) goto out_nonsensical_length; } /* * CCID-specific options are ignored during connection setup, as * negotiation may still be in progress (see RFC 4340, 10.3). * The same applies to Ack Vectors, as these depend on the CCID. */ if (dreq != NULL && (opt >= DCCPO_MIN_RX_CCID_SPECIFIC || opt == DCCPO_ACK_VECTOR_0 || opt == DCCPO_ACK_VECTOR_1)) goto ignore_option; switch (opt) { case DCCPO_PADDING: break; case DCCPO_MANDATORY: if (mandatory) goto out_invalid_option; if (pkt_type != DCCP_PKT_DATA) mandatory = 1; break; case DCCPO_NDP_COUNT: if (len > 6) goto out_invalid_option; opt_recv->dccpor_ndp = dccp_decode_value_var(value, len); dccp_pr_debug("%s opt: NDP count=%llu\n", dccp_role(sk), (unsigned long long)opt_recv->dccpor_ndp); break; case DCCPO_CHANGE_L ... DCCPO_CONFIRM_R: if (pkt_type == DCCP_PKT_DATA) /* RFC 4340, 6 */ break; if (len == 0) goto out_invalid_option; rc = dccp_feat_parse_options(sk, dreq, mandatory, opt, *value, value + 1, len - 1); if (rc) goto out_featneg_failed; break; case DCCPO_TIMESTAMP: if (len != 4) goto out_invalid_option; /* * RFC 4340 13.1: "The precise time corresponding to * Timestamp Value zero is not specified". We use * zero to indicate absence of a meaningful timestamp. */ opt_val = get_unaligned((__be32 *)value); if (unlikely(opt_val == 0)) { DCCP_WARN("Timestamp with zero value\n"); break; } if (dreq != NULL) { dreq->dreq_timestamp_echo = ntohl(opt_val); dreq->dreq_timestamp_time = dccp_timestamp(); } else { opt_recv->dccpor_timestamp = dp->dccps_timestamp_echo = ntohl(opt_val); dp->dccps_timestamp_time = dccp_timestamp(); } dccp_pr_debug("%s rx opt: TIMESTAMP=%u, ackno=%llu\n", dccp_role(sk), ntohl(opt_val), (unsigned long long) DCCP_SKB_CB(skb)->dccpd_ack_seq); /* schedule an Ack in case this sender is quiescent */ inet_csk_schedule_ack(sk); break; case DCCPO_TIMESTAMP_ECHO: if (len != 4 && len != 6 && len != 8) goto out_invalid_option; opt_val = get_unaligned((__be32 *)value); opt_recv->dccpor_timestamp_echo = ntohl(opt_val); dccp_pr_debug("%s rx opt: TIMESTAMP_ECHO=%u, len=%d, " "ackno=%llu", dccp_role(sk), opt_recv->dccpor_timestamp_echo, len + 2, (unsigned long long) DCCP_SKB_CB(skb)->dccpd_ack_seq); value += 4; if (len == 4) { /* no elapsed time included */ dccp_pr_debug_cat("\n"); break; } if (len == 6) { /* 2-byte elapsed time */ __be16 opt_val2 = get_unaligned((__be16 *)value); elapsed_time = ntohs(opt_val2); } else { /* 4-byte elapsed time */ opt_val = get_unaligned((__be32 *)value); elapsed_time = ntohl(opt_val); } dccp_pr_debug_cat(", ELAPSED_TIME=%u\n", elapsed_time); /* Give precedence to the biggest ELAPSED_TIME */ if (elapsed_time > opt_recv->dccpor_elapsed_time) opt_recv->dccpor_elapsed_time = elapsed_time; break; case DCCPO_ELAPSED_TIME: if (dccp_packet_without_ack(skb)) /* RFC 4340, 13.2 */ break; if (len == 2) { __be16 opt_val2 = get_unaligned((__be16 *)value); elapsed_time = ntohs(opt_val2); } else if (len == 4) { opt_val = get_unaligned((__be32 *)value); elapsed_time = ntohl(opt_val); } else { goto out_invalid_option; } if (elapsed_time > opt_recv->dccpor_elapsed_time) opt_recv->dccpor_elapsed_time = elapsed_time; dccp_pr_debug("%s rx opt: ELAPSED_TIME=%d\n", dccp_role(sk), elapsed_time); break; case DCCPO_MIN_RX_CCID_SPECIFIC ... DCCPO_MAX_RX_CCID_SPECIFIC: if (ccid_hc_rx_parse_options(dp->dccps_hc_rx_ccid, sk, pkt_type, opt, value, len)) goto out_invalid_option; break; case DCCPO_ACK_VECTOR_0: case DCCPO_ACK_VECTOR_1: if (dccp_packet_without_ack(skb)) /* RFC 4340, 11.4 */ break; /* * Ack vectors are processed by the TX CCID if it is * interested. The RX CCID need not parse Ack Vectors, * since it is only interested in clearing old state. */ /* fall through */ case DCCPO_MIN_TX_CCID_SPECIFIC ... DCCPO_MAX_TX_CCID_SPECIFIC: if (ccid_hc_tx_parse_options(dp->dccps_hc_tx_ccid, sk, pkt_type, opt, value, len)) goto out_invalid_option; break; default: DCCP_CRIT("DCCP(%p): option %d(len=%d) not " "implemented, ignoring", sk, opt, len); break; } ignore_option: if (opt != DCCPO_MANDATORY) mandatory = 0; } /* mandatory was the last byte in option list -> reset connection */ if (mandatory) goto out_invalid_option; out_nonsensical_length: /* RFC 4340, 5.8: ignore option and all remaining option space */ return 0; out_invalid_option: DCCP_INC_STATS(DCCP_MIB_INVALIDOPT); rc = DCCP_RESET_CODE_OPTION_ERROR; out_featneg_failed: DCCP_WARN("DCCP(%p): Option %d (len=%d) error=%u\n", sk, opt, len, rc); DCCP_SKB_CB(skb)->dccpd_reset_code = rc; DCCP_SKB_CB(skb)->dccpd_reset_data[0] = opt; DCCP_SKB_CB(skb)->dccpd_reset_data[1] = len > 0 ? value[0] : 0; DCCP_SKB_CB(skb)->dccpd_reset_data[2] = len > 1 ? value[1] : 0; return -1; } EXPORT_SYMBOL_GPL(dccp_parse_options); void dccp_encode_value_var(const u64 value, u8 *to, const u8 len) { if (len >= DCCP_OPTVAL_MAXLEN) *to++ = (value & 0xFF0000000000ull) >> 40; if (len > 4) *to++ = (value & 0xFF00000000ull) >> 32; if (len > 3) *to++ = (value & 0xFF000000) >> 24; if (len > 2) *to++ = (value & 0xFF0000) >> 16; if (len > 1) *to++ = (value & 0xFF00) >> 8; if (len > 0) *to++ = (value & 0xFF); } static inline u8 dccp_ndp_len(const u64 ndp) { if (likely(ndp <= 0xFF)) return 1; return likely(ndp <= USHRT_MAX) ? 2 : (ndp <= UINT_MAX ? 4 : 6); } int dccp_insert_option(struct sk_buff *skb, const unsigned char option, const void *value, const unsigned char len) { unsigned char *to; if (DCCP_SKB_CB(skb)->dccpd_opt_len + len + 2 > DCCP_MAX_OPT_LEN) return -1; DCCP_SKB_CB(skb)->dccpd_opt_len += len + 2; to = skb_push(skb, len + 2); *to++ = option; *to++ = len + 2; memcpy(to, value, len); return 0; } EXPORT_SYMBOL_GPL(dccp_insert_option); static int dccp_insert_option_ndp(struct sock *sk, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); u64 ndp = dp->dccps_ndp_count; if (dccp_non_data_packet(skb)) ++dp->dccps_ndp_count; else dp->dccps_ndp_count = 0; if (ndp > 0) { unsigned char *ptr; const int ndp_len = dccp_ndp_len(ndp); const int len = ndp_len + 2; if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) return -1; DCCP_SKB_CB(skb)->dccpd_opt_len += len; ptr = skb_push(skb, len); *ptr++ = DCCPO_NDP_COUNT; *ptr++ = len; dccp_encode_value_var(ndp, ptr, ndp_len); } return 0; } static inline int dccp_elapsed_time_len(const u32 elapsed_time) { return elapsed_time == 0 ? 0 : elapsed_time <= 0xFFFF ? 2 : 4; } static int dccp_insert_option_timestamp(struct sk_buff *skb) { __be32 now = htonl(dccp_timestamp()); /* yes this will overflow but that is the point as we want a * 10 usec 32 bit timer which mean it wraps every 11.9 hours */ return dccp_insert_option(skb, DCCPO_TIMESTAMP, &now, sizeof(now)); } static int dccp_insert_option_timestamp_echo(struct dccp_sock *dp, struct dccp_request_sock *dreq, struct sk_buff *skb) { __be32 tstamp_echo; unsigned char *to; u32 elapsed_time, elapsed_time_len, len; if (dreq != NULL) { elapsed_time = dccp_timestamp() - dreq->dreq_timestamp_time; tstamp_echo = htonl(dreq->dreq_timestamp_echo); dreq->dreq_timestamp_echo = 0; } else { elapsed_time = dccp_timestamp() - dp->dccps_timestamp_time; tstamp_echo = htonl(dp->dccps_timestamp_echo); dp->dccps_timestamp_echo = 0; } elapsed_time_len = dccp_elapsed_time_len(elapsed_time); len = 6 + elapsed_time_len; if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) return -1; DCCP_SKB_CB(skb)->dccpd_opt_len += len; to = skb_push(skb, len); *to++ = DCCPO_TIMESTAMP_ECHO; *to++ = len; memcpy(to, &tstamp_echo, 4); to += 4; if (elapsed_time_len == 2) { const __be16 var16 = htons((u16)elapsed_time); memcpy(to, &var16, 2); } else if (elapsed_time_len == 4) { const __be32 var32 = htonl(elapsed_time); memcpy(to, &var32, 4); } return 0; } static int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec; struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb); const u16 buflen = dccp_ackvec_buflen(av); /* Figure out how many options do we need to represent the ackvec */ const u8 nr_opts = DIV_ROUND_UP(buflen, DCCP_SINGLE_OPT_MAXLEN); u16 len = buflen + 2 * nr_opts; u8 i, nonce = 0; const unsigned char *tail, *from; unsigned char *to; if (dcb->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) { DCCP_WARN("Lacking space for %u bytes on %s packet\n", len, dccp_packet_name(dcb->dccpd_type)); return -1; } /* * Since Ack Vectors are variable-length, we can not always predict * their size. To catch exception cases where the space is running out * on the skb, a separate Sync is scheduled to carry the Ack Vector. */ if (len > DCCPAV_MIN_OPTLEN && len + dcb->dccpd_opt_len + skb->len > dp->dccps_mss_cache) { DCCP_WARN("No space left for Ack Vector (%u) on skb (%u+%u), " "MPS=%u ==> reduce payload size?\n", len, skb->len, dcb->dccpd_opt_len, dp->dccps_mss_cache); dp->dccps_sync_scheduled = 1; return 0; } dcb->dccpd_opt_len += len; to = skb_push(skb, len); len = buflen; from = av->av_buf + av->av_buf_head; tail = av->av_buf + DCCPAV_MAX_ACKVEC_LEN; for (i = 0; i < nr_opts; ++i) { int copylen = len; if (len > DCCP_SINGLE_OPT_MAXLEN) copylen = DCCP_SINGLE_OPT_MAXLEN; /* * RFC 4340, 12.2: Encode the Nonce Echo for this Ack Vector via * its type; ack_nonce is the sum of all individual buf_nonce's. */ nonce ^= av->av_buf_nonce[i]; *to++ = DCCPO_ACK_VECTOR_0 + av->av_buf_nonce[i]; *to++ = copylen + 2; /* Check if buf_head wraps */ if (from + copylen > tail) { const u16 tailsize = tail - from; memcpy(to, from, tailsize); to += tailsize; len -= tailsize; copylen -= tailsize; from = av->av_buf; } memcpy(to, from, copylen); from += copylen; to += copylen; len -= copylen; } /* * Each sent Ack Vector is recorded in the list, as per A.2 of RFC 4340. */ if (dccp_ackvec_update_records(av, dcb->dccpd_seq, nonce)) return -ENOBUFS; return 0; } /** * dccp_insert_option_mandatory - Mandatory option (5.8.2) * Note that since we are using skb_push, this function needs to be called * _after_ inserting the option it is supposed to influence (stack order). */ int dccp_insert_option_mandatory(struct sk_buff *skb) { if (DCCP_SKB_CB(skb)->dccpd_opt_len >= DCCP_MAX_OPT_LEN) return -1; DCCP_SKB_CB(skb)->dccpd_opt_len++; *(u8 *)skb_push(skb, 1) = DCCPO_MANDATORY; return 0; } /** * dccp_insert_fn_opt - Insert single Feature-Negotiation option into @skb * @type: %DCCPO_CHANGE_L, %DCCPO_CHANGE_R, %DCCPO_CONFIRM_L, %DCCPO_CONFIRM_R * @feat: one out of %dccp_feature_numbers * @val: NN value or SP array (preferred element first) to copy * @len: true length of @val in bytes (excluding first element repetition) * @repeat_first: whether to copy the first element of @val twice * * The last argument is used to construct Confirm options, where the preferred * value and the preference list appear separately (RFC 4340, 6.3.1). Preference * lists are kept such that the preferred entry is always first, so we only need * to copy twice, and avoid the overhead of cloning into a bigger array. */ int dccp_insert_fn_opt(struct sk_buff *skb, u8 type, u8 feat, u8 *val, u8 len, bool repeat_first) { u8 tot_len, *to; /* take the `Feature' field and possible repetition into account */ if (len > (DCCP_SINGLE_OPT_MAXLEN - 2)) { DCCP_WARN("length %u for feature %u too large\n", len, feat); return -1; } if (unlikely(val == NULL || len == 0)) len = repeat_first = false; tot_len = 3 + repeat_first + len; if (DCCP_SKB_CB(skb)->dccpd_opt_len + tot_len > DCCP_MAX_OPT_LEN) { DCCP_WARN("packet too small for feature %d option!\n", feat); return -1; } DCCP_SKB_CB(skb)->dccpd_opt_len += tot_len; to = skb_push(skb, tot_len); *to++ = type; *to++ = tot_len; *to++ = feat; if (repeat_first) *to++ = *val; if (len) memcpy(to, val, len); return 0; } /* The length of all options needs to be a multiple of 4 (5.8) */ static void dccp_insert_option_padding(struct sk_buff *skb) { int padding = DCCP_SKB_CB(skb)->dccpd_opt_len % 4; if (padding != 0) { padding = 4 - padding; memset(skb_push(skb, padding), 0, padding); DCCP_SKB_CB(skb)->dccpd_opt_len += padding; } } int dccp_insert_options(struct sock *sk, struct sk_buff *skb) { struct dccp_sock *dp = dccp_sk(sk); DCCP_SKB_CB(skb)->dccpd_opt_len = 0; if (dp->dccps_send_ndp_count && dccp_insert_option_ndp(sk, skb)) return -1; if (DCCP_SKB_CB(skb)->dccpd_type != DCCP_PKT_DATA) { /* Feature Negotiation */ if (dccp_feat_insert_opts(dp, NULL, skb)) return -1; if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_REQUEST) { /* * Obtain RTT sample from Request/Response exchange. * This is currently used for TFRC initialisation. */ if (dccp_insert_option_timestamp(skb)) return -1; } else if (dccp_ackvec_pending(sk) && dccp_insert_option_ackvec(sk, skb)) { return -1; } } if (dp->dccps_hc_rx_insert_options) { if (ccid_hc_rx_insert_options(dp->dccps_hc_rx_ccid, sk, skb)) return -1; dp->dccps_hc_rx_insert_options = 0; } if (dp->dccps_timestamp_echo != 0 && dccp_insert_option_timestamp_echo(dp, NULL, skb)) return -1; dccp_insert_option_padding(skb); return 0; } int dccp_insert_options_rsk(struct dccp_request_sock *dreq, struct sk_buff *skb) { DCCP_SKB_CB(skb)->dccpd_opt_len = 0; if (dccp_feat_insert_opts(NULL, dreq, skb)) return -1; /* Obtain RTT sample from Response/Ack exchange (used by TFRC). */ if (dccp_insert_option_timestamp(skb)) return -1; if (dreq->dreq_timestamp_echo != 0 && dccp_insert_option_timestamp_echo(NULL, dreq, skb)) return -1; dccp_insert_option_padding(skb); return 0; }
7 7 5 5 1 1 1 3 3 2 2 1 7 7 5 7 1 1 1 1 6 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 /* * inode.c - part of tracefs, a pseudo file system for activating tracing * * Based on debugfs by: Greg Kroah-Hartman <greg@kroah.com> * * Copyright (C) 2014 Red Hat Inc, author: Steven Rostedt <srostedt@redhat.com> * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License version * 2 as published by the Free Software Foundation. * * tracefs is the file system that is used by the tracing infrastructure. * */ #include <linux/module.h> #include <linux/fs.h> #include <linux/mount.h> #include <linux/kobject.h> #include <linux/namei.h> #include <linux/tracefs.h> #include <linux/fsnotify.h> #include <linux/seq_file.h> #include <linux/parser.h> #include <linux/magic.h> #include <linux/slab.h> #define TRACEFS_DEFAULT_MODE 0700 static struct vfsmount *tracefs_mount; static int tracefs_mount_count; static bool tracefs_registered; static ssize_t default_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos) { return 0; } static ssize_t default_write_file(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { return count; } static const struct file_operations tracefs_file_operations = { .read = default_read_file, .write = default_write_file, .open = simple_open, .llseek = noop_llseek, }; static struct tracefs_dir_ops { int (*mkdir)(const char *name); int (*rmdir)(const char *name); } tracefs_ops __ro_after_init; static char *get_dname(struct dentry *dentry) { const char *dname; char *name; int len = dentry->d_name.len; dname = dentry->d_name.name; name = kmalloc(len + 1, GFP_KERNEL); if (!name) return NULL; memcpy(name, dname, len); name[len] = 0; return name; } static int tracefs_syscall_mkdir(struct inode *inode, struct dentry *dentry, umode_t mode) { char *name; int ret; name = get_dname(dentry); if (!name) return -ENOMEM; /* * The mkdir call can call the generic functions that create * the files within the tracefs system. It is up to the individual * mkdir routine to handle races. */ inode_unlock(inode); ret = tracefs_ops.mkdir(name); inode_lock(inode); kfree(name); return ret; } static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry) { char *name; int ret; name = get_dname(dentry); if (!name) return -ENOMEM; /* * The rmdir call can call the generic functions that create * the files within the tracefs system. It is up to the individual * rmdir routine to handle races. * This time we need to unlock not only the parent (inode) but * also the directory that is being deleted. */ inode_unlock(inode); inode_unlock(dentry->d_inode); ret = tracefs_ops.rmdir(name); inode_lock_nested(inode, I_MUTEX_PARENT); inode_lock(dentry->d_inode); kfree(name); return ret; } static const struct inode_operations tracefs_dir_inode_operations = { .lookup = simple_lookup, .mkdir = tracefs_syscall_mkdir, .rmdir = tracefs_syscall_rmdir, }; static struct inode *tracefs_get_inode(struct super_block *sb) { struct inode *inode = new_inode(sb); if (inode) { inode->i_ino = get_next_ino(); inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode); } return inode; } struct tracefs_mount_opts { kuid_t uid; kgid_t gid; umode_t mode; }; enum { Opt_uid, Opt_gid, Opt_mode, Opt_err }; static const match_table_t tokens = { {Opt_uid, "uid=%u"}, {Opt_gid, "gid=%u"}, {Opt_mode, "mode=%o"}, {Opt_err, NULL} }; struct tracefs_fs_info { struct tracefs_mount_opts mount_opts; }; static void change_gid(struct dentry *dentry, kgid_t gid) { if (!dentry->d_inode) return; dentry->d_inode->i_gid = gid; } /* * Taken from d_walk, but without he need for handling renames. * Nothing can be renamed while walking the list, as tracefs * does not support renames. This is only called when mounting * or remounting the file system, to set all the files to * the given gid. */ static void set_gid(struct dentry *parent, kgid_t gid) { struct dentry *this_parent; struct list_head *next; this_parent = parent; spin_lock(&this_parent->d_lock); change_gid(this_parent, gid); repeat: next = this_parent->d_subdirs.next; resume: while (next != &this_parent->d_subdirs) { struct list_head *tmp = next; struct dentry *dentry = list_entry(tmp, struct dentry, d_child); next = tmp->next; spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); change_gid(dentry, gid); if (!list_empty(&dentry->d_subdirs)) { spin_unlock(&this_parent->d_lock); spin_release(&dentry->d_lock.dep_map, 1, _RET_IP_); this_parent = dentry; spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_); goto repeat; } spin_unlock(&dentry->d_lock); } /* * All done at this level ... ascend and resume the search. */ rcu_read_lock(); ascend: if (this_parent != parent) { struct dentry *child = this_parent; this_parent = child->d_parent; spin_unlock(&child->d_lock); spin_lock(&this_parent->d_lock); /* go into the first sibling still alive */ do { next = child->d_child.next; if (next == &this_parent->d_subdirs) goto ascend; child = list_entry(next, struct dentry, d_child); } while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED)); rcu_read_unlock(); goto resume; } rcu_read_unlock(); spin_unlock(&this_parent->d_lock); return; } static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts) { substring_t args[MAX_OPT_ARGS]; int option; int token; kuid_t uid; kgid_t gid; char *p; opts->mode = TRACEFS_DEFAULT_MODE; while ((p = strsep(&data, ",")) != NULL) { if (!*p) continue; token = match_token(p, tokens, args); switch (token) { case Opt_uid: if (match_int(&args[0], &option)) return -EINVAL; uid = make_kuid(current_user_ns(), option); if (!uid_valid(uid)) return -EINVAL; opts->uid = uid; break; case Opt_gid: if (match_int(&args[0], &option)) return -EINVAL; gid = make_kgid(current_user_ns(), option); if (!gid_valid(gid)) return -EINVAL; opts->gid = gid; break; case Opt_mode: if (match_octal(&args[0], &option)) return -EINVAL; opts->mode = option & S_IALLUGO; break; /* * We might like to report bad mount options here; * but traditionally tracefs has ignored all mount options */ } } return 0; } static int tracefs_apply_options(struct super_block *sb) { struct tracefs_fs_info *fsi = sb->s_fs_info; struct inode *inode = sb->s_root->d_inode; struct tracefs_mount_opts *opts = &fsi->mount_opts; inode->i_mode &= ~S_IALLUGO; inode->i_mode |= opts->mode; inode->i_uid = opts->uid; /* Set all the group ids to the mount option */ set_gid(sb->s_root, opts->gid); return 0; } static int tracefs_remount(struct super_block *sb, int *flags, char *data) { int err; struct tracefs_fs_info *fsi = sb->s_fs_info; sync_filesystem(sb); err = tracefs_parse_options(data, &fsi->mount_opts); if (err) goto fail; tracefs_apply_options(sb); fail: return err; } static int tracefs_show_options(struct seq_file *m, struct dentry *root) { struct tracefs_fs_info *fsi = root->d_sb->s_fs_info; struct tracefs_mount_opts *opts = &fsi->mount_opts; if (!uid_eq(opts->uid, GLOBAL_ROOT_UID)) seq_printf(m, ",uid=%u", from_kuid_munged(&init_user_ns, opts->uid)); if (!gid_eq(opts->gid, GLOBAL_ROOT_GID)) seq_printf(m, ",gid=%u", from_kgid_munged(&init_user_ns, opts->gid)); if (opts->mode != TRACEFS_DEFAULT_MODE) seq_printf(m, ",mode=%o", opts->mode); return 0; } static const struct super_operations tracefs_super_operations = { .statfs = simple_statfs, .remount_fs = tracefs_remount, .show_options = tracefs_show_options, }; static int trace_fill_super(struct super_block *sb, void *data, int silent) { static const struct tree_descr trace_files[] = {{""}}; struct tracefs_fs_info *fsi; int err; fsi = kzalloc(sizeof(struct tracefs_fs_info), GFP_KERNEL); sb->s_fs_info = fsi; if (!fsi) { err = -ENOMEM; goto fail; } err = tracefs_parse_options(data, &fsi->mount_opts); if (err) goto fail; err = simple_fill_super(sb, TRACEFS_MAGIC, trace_files); if (err) goto fail; sb->s_op = &tracefs_super_operations; tracefs_apply_options(sb); return 0; fail: kfree(fsi); sb->s_fs_info = NULL; return err; } static struct dentry *trace_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data) { return mount_single(fs_type, flags, data, trace_fill_super); } static struct file_system_type trace_fs_type = { .owner = THIS_MODULE, .name = "tracefs", .mount = trace_mount, .kill_sb = kill_litter_super, }; MODULE_ALIAS_FS("tracefs"); static struct dentry *start_creating(const char *name, struct dentry *parent) { struct dentry *dentry; int error; pr_debug("tracefs: creating file '%s'\n",name); error = simple_pin_fs(&trace_fs_type, &tracefs_mount, &tracefs_mount_count); if (error) return ERR_PTR(error); /* If the parent is not specified, we create it in the root. * We need the root dentry to do this, which is in the super * block. A pointer to that is in the struct vfsmount that we * have around. */ if (!parent) parent = tracefs_mount->mnt_root; inode_lock(parent->d_inode); dentry = lookup_one_len(name, parent, strlen(name)); if (!IS_ERR(dentry) && dentry->d_inode) { dput(dentry); dentry = ERR_PTR(-EEXIST); } if (IS_ERR(dentry)) { inode_unlock(parent->d_inode); simple_release_fs(&tracefs_mount, &tracefs_mount_count); } return dentry; } static struct dentry *failed_creating(struct dentry *dentry) { inode_unlock(dentry->d_parent->d_inode); dput(dentry); simple_release_fs(&tracefs_mount, &tracefs_mount_count); return NULL; } static struct dentry *end_creating(struct dentry *dentry) { inode_unlock(dentry->d_parent->d_inode); return dentry; } /** * tracefs_create_file - create a file in the tracefs filesystem * @name: a pointer to a string containing the name of the file to create. * @mode: the permission that the file should have. * @parent: a pointer to the parent dentry for this file. This should be a * directory dentry if set. If this parameter is NULL, then the * file will be created in the root of the tracefs filesystem. * @data: a pointer to something that the caller will want to get to later * on. The inode.i_private pointer will point to this value on * the open() call. * @fops: a pointer to a struct file_operations that should be used for * this file. * * This is the basic "create a file" function for tracefs. It allows for a * wide range of flexibility in creating a file, or a directory (if you want * to create a directory, the tracefs_create_dir() function is * recommended to be used instead.) * * This function will return a pointer to a dentry if it succeeds. This * pointer must be passed to the tracefs_remove() function when the file is * to be removed (no automatic cleanup happens if your module is unloaded, * you are responsible here.) If an error occurs, %NULL will be returned. * * If tracefs is not enabled in the kernel, the value -%ENODEV will be * returned. */ struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops) { struct dentry *dentry; struct inode *inode; if (!(mode & S_IFMT)) mode |= S_IFREG; BUG_ON(!S_ISREG(mode)); dentry = start_creating(name, parent); if (IS_ERR(dentry)) return NULL; inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) return failed_creating(dentry); inode->i_mode = mode; inode->i_fop = fops ? fops : &tracefs_file_operations; inode->i_private = data; inode->i_uid = d_inode(dentry->d_parent)->i_uid; inode->i_gid = d_inode(dentry->d_parent)->i_gid; d_instantiate(dentry, inode); fsnotify_create(dentry->d_parent->d_inode, dentry); return end_creating(dentry); } static struct dentry *__create_dir(const char *name, struct dentry *parent, const struct inode_operations *ops) { struct dentry *dentry = start_creating(name, parent); struct inode *inode; if (IS_ERR(dentry)) return NULL; inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) return failed_creating(dentry); /* Do not set bits for OTH */ inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR| S_IRGRP | S_IXUSR | S_IXGRP; inode->i_op = ops; inode->i_fop = &simple_dir_operations; inode->i_uid = d_inode(dentry->d_parent)->i_uid; inode->i_gid = d_inode(dentry->d_parent)->i_gid; /* directory inodes start off with i_nlink == 2 (for "." entry) */ inc_nlink(inode); d_instantiate(dentry, inode); inc_nlink(dentry->d_parent->d_inode); fsnotify_mkdir(dentry->d_parent->d_inode, dentry); return end_creating(dentry); } /** * tracefs_create_dir - create a directory in the tracefs filesystem * @name: a pointer to a string containing the name of the directory to * create. * @parent: a pointer to the parent dentry for this file. This should be a * directory dentry if set. If this parameter is NULL, then the * directory will be created in the root of the tracefs filesystem. * * This function creates a directory in tracefs with the given name. * * This function will return a pointer to a dentry if it succeeds. This * pointer must be passed to the tracefs_remove() function when the file is * to be removed. If an error occurs, %NULL will be returned. * * If tracing is not enabled in the kernel, the value -%ENODEV will be * returned. */ struct dentry *tracefs_create_dir(const char *name, struct dentry *parent) { return __create_dir(name, parent, &simple_dir_inode_operations); } /** * tracefs_create_instance_dir - create the tracing instances directory * @name: The name of the instances directory to create * @parent: The parent directory that the instances directory will exist * @mkdir: The function to call when a mkdir is performed. * @rmdir: The function to call when a rmdir is performed. * * Only one instances directory is allowed. * * The instances directory is special as it allows for mkdir and rmdir to * to be done by userspace. When a mkdir or rmdir is performed, the inode * locks are released and the methhods passed in (@mkdir and @rmdir) are * called without locks and with the name of the directory being created * within the instances directory. * * Returns the dentry of the instances directory. */ __init struct dentry *tracefs_create_instance_dir(const char *name, struct dentry *parent, int (*mkdir)(const char *name), int (*rmdir)(const char *name)) { struct dentry *dentry; /* Only allow one instance of the instances directory. */ if (WARN_ON(tracefs_ops.mkdir || tracefs_ops.rmdir)) return NULL; dentry = __create_dir(name, parent, &tracefs_dir_inode_operations); if (!dentry) return NULL; tracefs_ops.mkdir = mkdir; tracefs_ops.rmdir = rmdir; return dentry; } static int __tracefs_remove(struct dentry *dentry, struct dentry *parent) { int ret = 0; if (simple_positive(dentry)) { if (dentry->d_inode) { dget(dentry); switch (dentry->d_inode->i_mode & S_IFMT) { case S_IFDIR: ret = simple_rmdir(parent->d_inode, dentry); break; default: simple_unlink(parent->d_inode, dentry); break; } if (!ret) d_delete(dentry); dput(dentry); } } return ret; } /** * tracefs_remove - removes a file or directory from the tracefs filesystem * @dentry: a pointer to a the dentry of the file or directory to be * removed. * * This function removes a file or directory in tracefs that was previously * created with a call to another tracefs function (like * tracefs_create_file() or variants thereof.) */ void tracefs_remove(struct dentry *dentry) { struct dentry *parent; int ret; if (IS_ERR_OR_NULL(dentry)) return; parent = dentry->d_parent; inode_lock(parent->d_inode); ret = __tracefs_remove(dentry, parent); inode_unlock(parent->d_inode); if (!ret) simple_release_fs(&tracefs_mount, &tracefs_mount_count); } /** * tracefs_remove_recursive - recursively removes a directory * @dentry: a pointer to a the dentry of the directory to be removed. * * This function recursively removes a directory tree in tracefs that * was previously created with a call to another tracefs function * (like tracefs_create_file() or variants thereof.) */ void tracefs_remove_recursive(struct dentry *dentry) { struct dentry *child, *parent; if (IS_ERR_OR_NULL(dentry)) return; parent = dentry; down: inode_lock(parent->d_inode); loop: /* * The parent->d_subdirs is protected by the d_lock. Outside that * lock, the child can be unlinked and set to be freed which can * use the d_u.d_child as the rcu head and corrupt this list. */ spin_lock(&parent->d_lock); list_for_each_entry(child, &parent->d_subdirs, d_child) { if (!simple_positive(child)) continue; /* perhaps simple_empty(child) makes more sense */ if (!list_empty(&child->d_subdirs)) { spin_unlock(&parent->d_lock); inode_unlock(parent->d_inode); parent = child; goto down; } spin_unlock(&parent->d_lock); if (!__tracefs_remove(child, parent)) simple_release_fs(&tracefs_mount, &tracefs_mount_count); /* * The parent->d_lock protects agaist child from unlinking * from d_subdirs. When releasing the parent->d_lock we can * no longer trust that the next pointer is valid. * Restart the loop. We'll skip this one with the * simple_positive() check. */ goto loop; } spin_unlock(&parent->d_lock); inode_unlock(parent->d_inode); child = parent; parent = parent->d_parent; inode_lock(parent->d_inode); if (child != dentry) /* go up */ goto loop; if (!__tracefs_remove(child, parent)) simple_release_fs(&tracefs_mount, &tracefs_mount_count); inode_unlock(parent->d_inode); } /** * tracefs_initialized - Tells whether tracefs has been registered */ bool tracefs_initialized(void) { return tracefs_registered; } static int __init tracefs_init(void) { int retval; retval = sysfs_create_mount_point(kernel_kobj, "tracing"); if (retval) return -EINVAL; retval = register_filesystem(&trace_fs_type); if (!retval) tracefs_registered = true; return retval; } core_initcall(tracefs_init);
12 12 12 12 12 8 8 8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 /* * Cryptographic API. * * RIPEMD-160 - RACE Integrity Primitives Evaluation Message Digest. * * Based on the reference implementation by Antoon Bosselaers, ESAT-COSIC * * Copyright (c) 2008 Adrian-Ken Rueegsegger <ken@codelabs.ch> * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License as published by the Free * Software Foundation; either version 2 of the License, or (at your option) * any later version. * */ #include <crypto/internal/hash.h> #include <linux/init.h> #include <linux/module.h> #include <linux/mm.h> #include <linux/types.h> #include <asm/byteorder.h> #include "ripemd.h" struct rmd160_ctx { u64 byte_count; u32 state[5]; __le32 buffer[16]; }; #define K1 RMD_K1 #define K2 RMD_K2 #define K3 RMD_K3 #define K4 RMD_K4 #define K5 RMD_K5 #define KK1 RMD_K6 #define KK2 RMD_K7 #define KK3 RMD_K8 #define KK4 RMD_K9 #define KK5 RMD_K1 #define F1(x, y, z) (x ^ y ^ z) /* XOR */ #define F2(x, y, z) (z ^ (x & (y ^ z))) /* x ? y : z */ #define F3(x, y, z) ((x | ~y) ^ z) #define F4(x, y, z) (y ^ (z & (x ^ y))) /* z ? x : y */ #define F5(x, y, z) (x ^ (y | ~z)) #define ROUND(a, b, c, d, e, f, k, x, s) { \ (a) += f((b), (c), (d)) + le32_to_cpup(&(x)) + (k); \ (a) = rol32((a), (s)) + (e); \ (c) = rol32((c), 10); \ } static void rmd160_transform(u32 *state, const __le32 *in) { u32 aa, bb, cc, dd, ee, aaa, bbb, ccc, ddd, eee; /* Initialize left lane */ aa = state[0]; bb = state[1]; cc = state[2]; dd = state[3]; ee = state[4]; /* Initialize right lane */ aaa = state[0]; bbb = state[1]; ccc = state[2]; ddd = state[3]; eee = state[4]; /* round 1: left lane */ ROUND(aa, bb, cc, dd, ee, F1, K1, in[0], 11); ROUND(ee, aa, bb, cc, dd, F1, K1, in[1], 14); ROUND(dd, ee, aa, bb, cc, F1, K1, in[2], 15); ROUND(cc, dd, ee, aa, bb, F1, K1, in[3], 12); ROUND(bb, cc, dd, ee, aa, F1, K1, in[4], 5); ROUND(aa, bb, cc, dd, ee, F1, K1, in[5], 8); ROUND(ee, aa, bb, cc, dd, F1, K1, in[6], 7); ROUND(dd, ee, aa, bb, cc, F1, K1, in[7], 9); ROUND(cc, dd, ee, aa, bb, F1, K1, in[8], 11); ROUND(bb, cc, dd, ee, aa, F1, K1, in[9], 13); ROUND(aa, bb, cc, dd, ee, F1, K1, in[10], 14); ROUND(ee, aa, bb, cc, dd, F1, K1, in[11], 15); ROUND(dd, ee, aa, bb, cc, F1, K1, in[12], 6); ROUND(cc, dd, ee, aa, bb, F1, K1, in[13], 7); ROUND(bb, cc, dd, ee, aa, F1, K1, in[14], 9); ROUND(aa, bb, cc, dd, ee, F1, K1, in[15], 8); /* round 2: left lane" */ ROUND(ee, aa, bb, cc, dd, F2, K2, in[7], 7); ROUND(dd, ee, aa, bb, cc, F2, K2, in[4], 6); ROUND(cc, dd, ee, aa, bb, F2, K2, in[13], 8); ROUND(bb, cc, dd, ee, aa, F2, K2, in[1], 13); ROUND(aa, bb, cc, dd, ee, F2, K2, in[10], 11); ROUND(ee, aa, bb, cc, dd, F2, K2, in[6], 9); ROUND(dd, ee, aa, bb, cc, F2, K2, in[15], 7); ROUND(cc, dd, ee, aa, bb, F2, K2, in[3], 15); ROUND(bb, cc, dd, ee, aa, F2, K2, in[12], 7); ROUND(aa, bb, cc, dd, ee, F2, K2, in[0], 12); ROUND(ee, aa, bb, cc, dd, F2, K2, in[9], 15); ROUND(dd, ee, aa, bb, cc, F2, K2, in[5], 9); ROUND(cc, dd, ee, aa, bb, F2, K2, in[2], 11); ROUND(bb, cc, dd, ee, aa, F2, K2, in[14], 7); ROUND(aa, bb, cc, dd, ee, F2, K2, in[11], 13); ROUND(ee, aa, bb, cc, dd, F2, K2, in[8], 12); /* round 3: left lane" */ ROUND(dd, ee, aa, bb, cc, F3, K3, in[3], 11); ROUND(cc, dd, ee, aa, bb, F3, K3, in[10], 13); ROUND(bb, cc, dd, ee, aa, F3, K3, in[14], 6); ROUND(aa, bb, cc, dd, ee, F3, K3, in[4], 7); ROUND(ee, aa, bb, cc, dd, F3, K3, in[9], 14); ROUND(dd, ee, aa, bb, cc, F3, K3, in[15], 9); ROUND(cc, dd, ee, aa, bb, F3, K3, in[8], 13); ROUND(bb, cc, dd, ee, aa, F3, K3, in[1], 15); ROUND(aa, bb, cc, dd, ee, F3, K3, in[2], 14); ROUND(ee, aa, bb, cc, dd, F3, K3, in[7], 8); ROUND(dd, ee, aa, bb, cc, F3, K3, in[0], 13); ROUND(cc, dd, ee, aa, bb, F3, K3, in[6], 6); ROUND(bb, cc, dd, ee, aa, F3, K3, in[13], 5); ROUND(aa, bb, cc, dd, ee, F3, K3, in[11], 12); ROUND(ee, aa, bb, cc, dd, F3, K3, in[5], 7); ROUND(dd, ee, aa, bb, cc, F3, K3, in[12], 5); /* round 4: left lane" */ ROUND(cc, dd, ee, aa, bb, F4, K4, in[1], 11); ROUND(bb, cc, dd, ee, aa, F4, K4, in[9], 12); ROUND(aa, bb, cc, dd, ee, F4, K4, in[11], 14); ROUND(ee, aa, bb, cc, dd, F4, K4, in[10], 15); ROUND(dd, ee, aa, bb, cc, F4, K4, in[0], 14); ROUND(cc, dd, ee, aa, bb, F4, K4, in[8], 15); ROUND(bb, cc, dd, ee, aa, F4, K4, in[12], 9); ROUND(aa, bb, cc, dd, ee, F4, K4, in[4], 8); ROUND(ee, aa, bb, cc, dd, F4, K4, in[13], 9); ROUND(dd, ee, aa, bb, cc, F4, K4, in[3], 14); ROUND(cc, dd, ee, aa, bb, F4, K4, in[7], 5); ROUND(bb, cc, dd, ee, aa, F4, K4, in[15], 6); ROUND(aa, bb, cc, dd, ee, F4, K4, in[14], 8); ROUND(ee, aa, bb, cc, dd, F4, K4, in[5], 6); ROUND(dd, ee, aa, bb, cc, F4, K4, in[6], 5); ROUND(cc, dd, ee, aa, bb, F4, K4, in[2], 12); /* round 5: left lane" */ ROUND(bb, cc, dd, ee, aa, F5, K5, in[4], 9); ROUND(aa, bb, cc, dd, ee, F5, K5, in[0], 15); ROUND(ee, aa, bb, cc, dd, F5, K5, in[5], 5); ROUND(dd, ee, aa, bb, cc, F5, K5, in[9], 11); ROUND(cc, dd, ee, aa, bb, F5, K5, in[7], 6); ROUND(bb, cc, dd, ee, aa, F5, K5, in[12], 8); ROUND(aa, bb, cc, dd, ee, F5, K5, in[2], 13); ROUND(ee, aa, bb, cc, dd, F5, K5, in[10], 12); ROUND(dd, ee, aa, bb, cc, F5, K5, in[14], 5); ROUND(cc, dd, ee, aa, bb, F5, K5, in[1], 12); ROUND(bb, cc, dd, ee, aa, F5, K5, in[3], 13); ROUND(aa, bb, cc, dd, ee, F5, K5, in[8], 14); ROUND(ee, aa, bb, cc, dd, F5, K5, in[11], 11); ROUND(dd, ee, aa, bb, cc, F5, K5, in[6], 8); ROUND(cc, dd, ee, aa, bb, F5, K5, in[15], 5); ROUND(bb, cc, dd, ee, aa, F5, K5, in[13], 6); /* round 1: right lane */ ROUND(aaa, bbb, ccc, ddd, eee, F5, KK1, in[5], 8); ROUND(eee, aaa, bbb, ccc, ddd, F5, KK1, in[14], 9); ROUND(ddd, eee, aaa, bbb, ccc, F5, KK1, in[7], 9); ROUND(ccc, ddd, eee, aaa, bbb, F5, KK1, in[0], 11); ROUND(bbb, ccc, ddd, eee, aaa, F5, KK1, in[9], 13); ROUND(aaa, bbb, ccc, ddd, eee, F5, KK1, in[2], 15); ROUND(eee, aaa, bbb, ccc, ddd, F5, KK1, in[11], 15); ROUND(ddd, eee, aaa, bbb, ccc, F5, KK1, in[4], 5); ROUND(ccc, ddd, eee, aaa, bbb, F5, KK1, in[13], 7); ROUND(bbb, ccc, ddd, eee, aaa, F5, KK1, in[6], 7); ROUND(aaa, bbb, ccc, ddd, eee, F5, KK1, in[15], 8); ROUND(eee, aaa, bbb, ccc, ddd, F5, KK1, in[8], 11); ROUND(ddd, eee, aaa, bbb, ccc, F5, KK1, in[1], 14); ROUND(ccc, ddd, eee, aaa, bbb, F5, KK1, in[10], 14); ROUND(bbb, ccc, ddd, eee, aaa, F5, KK1, in[3], 12); ROUND(aaa, bbb, ccc, ddd, eee, F5, KK1, in[12], 6); /* round 2: right lane */ ROUND(eee, aaa, bbb, ccc, ddd, F4, KK2, in[6], 9); ROUND(ddd, eee, aaa, bbb, ccc, F4, KK2, in[11], 13); ROUND(ccc, ddd, eee, aaa, bbb, F4, KK2, in[3], 15); ROUND(bbb, ccc, ddd, eee, aaa, F4, KK2, in[7], 7); ROUND(aaa, bbb, ccc, ddd, eee, F4, KK2, in[0], 12); ROUND(eee, aaa, bbb, ccc, ddd, F4, KK2, in[13], 8); ROUND(ddd, eee, aaa, bbb, ccc, F4, KK2, in[5], 9); ROUND(ccc, ddd, eee, aaa, bbb, F4, KK2, in[10], 11); ROUND(bbb, ccc, ddd, eee, aaa, F4, KK2, in[14], 7); ROUND(aaa, bbb, ccc, ddd, eee, F4, KK2, in[15], 7); ROUND(eee, aaa, bbb, ccc, ddd, F4, KK2, in[8], 12); ROUND(ddd, eee, aaa, bbb, ccc, F4, KK2, in[12], 7); ROUND(ccc, ddd, eee, aaa, bbb, F4, KK2, in[4], 6); ROUND(bbb, ccc, ddd, eee, aaa, F4, KK2, in[9], 15); ROUND(aaa, bbb, ccc, ddd, eee, F4, KK2, in[1], 13); ROUND(eee, aaa, bbb, ccc, ddd, F4, KK2, in[2], 11); /* round 3: right lane */ ROUND(ddd, eee, aaa, bbb, ccc, F3, KK3, in[15], 9); ROUND(ccc, ddd, eee, aaa, bbb, F3, KK3, in[5], 7); ROUND(bbb, ccc, ddd, eee, aaa, F3, KK3, in[1], 15); ROUND(aaa, bbb, ccc, ddd, eee, F3, KK3, in[3], 11); ROUND(eee, aaa, bbb, ccc, ddd, F3, KK3, in[7], 8); ROUND(ddd, eee, aaa, bbb, ccc, F3, KK3, in[14], 6); ROUND(ccc, ddd, eee, aaa, bbb, F3, KK3, in[6], 6); ROUND(bbb, ccc, ddd, eee, aaa, F3, KK3, in[9], 14); ROUND(aaa, bbb, ccc, ddd, eee, F3, KK3, in[11], 12); ROUND(eee, aaa, bbb, ccc, ddd, F3, KK3, in[8], 13); ROUND(ddd, eee, aaa, bbb, ccc, F3, KK3, in[12], 5); ROUND(ccc, ddd, eee, aaa, bbb, F3, KK3, in[2], 14); ROUND(bbb, ccc, ddd, eee, aaa, F3, KK3, in[10], 13); ROUND(aaa, bbb, ccc, ddd, eee, F3, KK3, in[0], 13); ROUND(eee, aaa, bbb, ccc, ddd, F3, KK3, in[4], 7); ROUND(ddd, eee, aaa, bbb, ccc, F3, KK3, in[13], 5); /* round 4: right lane */ ROUND(ccc, ddd, eee, aaa, bbb, F2, KK4, in[8], 15); ROUND(bbb, ccc, ddd, eee, aaa, F2, KK4, in[6], 5); ROUND(aaa, bbb, ccc, ddd, eee, F2, KK4, in[4], 8); ROUND(eee, aaa, bbb, ccc, ddd, F2, KK4, in[1], 11); ROUND(ddd, eee, aaa, bbb, ccc, F2, KK4, in[3], 14); ROUND(ccc, ddd, eee, aaa, bbb, F2, KK4, in[11], 14); ROUND(bbb, ccc, ddd, eee, aaa, F2, KK4, in[15], 6); ROUND(aaa, bbb, ccc, ddd, eee, F2, KK4, in[0], 14); ROUND(eee, aaa, bbb, ccc, ddd, F2, KK4, in[5], 6); ROUND(ddd, eee, aaa, bbb, ccc, F2, KK4, in[12], 9); ROUND(ccc, ddd, eee, aaa, bbb, F2, KK4, in[2], 12); ROUND(bbb, ccc, ddd, eee, aaa, F2, KK4, in[13], 9); ROUND(aaa, bbb, ccc, ddd, eee, F2, KK4, in[9], 12); ROUND(eee, aaa, bbb, ccc, ddd, F2, KK4, in[7], 5); ROUND(ddd, eee, aaa, bbb, ccc, F2, KK4, in[10], 15); ROUND(ccc, ddd, eee, aaa, bbb, F2, KK4, in[14], 8); /* round 5: right lane */ ROUND(bbb, ccc, ddd, eee, aaa, F1, KK5, in[12], 8); ROUND(aaa, bbb, ccc, ddd, eee, F1, KK5, in[15], 5); ROUND(eee, aaa, bbb, ccc, ddd, F1, KK5, in[10], 12); ROUND(ddd, eee, aaa, bbb, ccc, F1, KK5, in[4], 9); ROUND(ccc, ddd, eee, aaa, bbb, F1, KK5, in[1], 12); ROUND(bbb, ccc, ddd, eee, aaa, F1, KK5, in[5], 5); ROUND(aaa, bbb, ccc, ddd, eee, F1, KK5, in[8], 14); ROUND(eee, aaa, bbb, ccc, ddd, F1, KK5, in[7], 6); ROUND(ddd, eee, aaa, bbb, ccc, F1, KK5, in[6], 8); ROUND(ccc, ddd, eee, aaa, bbb, F1, KK5, in[2], 13); ROUND(bbb, ccc, ddd, eee, aaa, F1, KK5, in[13], 6); ROUND(aaa, bbb, ccc, ddd, eee, F1, KK5, in[14], 5); ROUND(eee, aaa, bbb, ccc, ddd, F1, KK5, in[0], 15); ROUND(ddd, eee, aaa, bbb, ccc, F1, KK5, in[3], 13); ROUND(ccc, ddd, eee, aaa, bbb, F1, KK5, in[9], 11); ROUND(bbb, ccc, ddd, eee, aaa, F1, KK5, in[11], 11); /* combine results */ ddd += cc + state[1]; /* final result for state[0] */ state[1] = state[2] + dd + eee; state[2] = state[3] + ee + aaa; state[3] = state[4] + aa + bbb; state[4] = state[0] + bb + ccc; state[0] = ddd; } static int rmd160_init(struct shash_desc *desc) { struct rmd160_ctx *rctx = shash_desc_ctx(desc); rctx->byte_count = 0; rctx->state[0] = RMD_H0; rctx->state[1] = RMD_H1; rctx->state[2] = RMD_H2; rctx->state[3] = RMD_H3; rctx->state[4] = RMD_H4; memset(rctx->buffer, 0, sizeof(rctx->buffer)); return 0; } static int rmd160_update(struct shash_desc *desc, const u8 *data, unsigned int len) { struct rmd160_ctx *rctx = shash_desc_ctx(desc); const u32 avail = sizeof(rctx->buffer) - (rctx->byte_count & 0x3f); rctx->byte_count += len; /* Enough space in buffer? If so copy and we're done */ if (avail > len) { memcpy((char *)rctx->buffer + (sizeof(rctx->buffer) - avail), data, len); goto out; } memcpy((char *)rctx->buffer + (sizeof(rctx->buffer) - avail), data, avail); rmd160_transform(rctx->state, rctx->buffer); data += avail; len -= avail; while (len >= sizeof(rctx->buffer)) { memcpy(rctx->buffer, data, sizeof(rctx->buffer)); rmd160_transform(rctx->state, rctx->buffer); data += sizeof(rctx->buffer); len -= sizeof(rctx->buffer); } memcpy(rctx->buffer, data, len); out: return 0; } /* Add padding and return the message digest. */ static int rmd160_final(struct shash_desc *desc, u8 *out) { struct rmd160_ctx *rctx = shash_desc_ctx(desc); u32 i, index, padlen; __le64 bits; __le32 *dst = (__le32 *)out; static const u8 padding[64] = { 0x80, }; bits = cpu_to_le64(rctx->byte_count << 3); /* Pad out to 56 mod 64 */ index = rctx->byte_count & 0x3f; padlen = (index < 56) ? (56 - index) : ((64+56) - index); rmd160_update(desc, padding, padlen); /* Append length */ rmd160_update(desc, (const u8 *)&bits, sizeof(bits)); /* Store state in digest */ for (i = 0; i < 5; i++) dst[i] = cpu_to_le32p(&rctx->state[i]); /* Wipe context */ memset(rctx, 0, sizeof(*rctx)); return 0; } static struct shash_alg alg = { .digestsize = RMD160_DIGEST_SIZE, .init = rmd160_init, .update = rmd160_update, .final = rmd160_final, .descsize = sizeof(struct rmd160_ctx), .base = { .cra_name = "rmd160", .cra_blocksize = RMD160_BLOCK_SIZE, .cra_module = THIS_MODULE, } }; static int __init rmd160_mod_init(void) { return crypto_register_shash(&alg); } static void __exit rmd160_mod_fini(void) { crypto_unregister_shash(&alg); } module_init(rmd160_mod_init); module_exit(rmd160_mod_fini); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Adrian-Ken Rueegsegger <ken@codelabs.ch>"); MODULE_DESCRIPTION("RIPEMD-160 Message Digest"); MODULE_ALIAS_CRYPTO("rmd160");
4 3 4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 /* * X.25 Packet Layer release 002 * * This is ALPHA test software. This code may break your machine, * randomly fail to work with new releases, misbehave and/or generally * screw up. It might even work. * * This code REQUIRES 2.1.15 or higher * * This module: * This module is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * * History * X.25 001 Jonathan Naylor Started coding. * X.25 002 Jonathan Naylor New timer architecture. * mar/20/00 Daniela Squassoni Disabling/enabling of facilities * negotiation. * 2000-09-04 Henner Eisen dev_hold() / dev_put() for x25_neigh. */ #define pr_fmt(fmt) "X25: " fmt #include <linux/kernel.h> #include <linux/jiffies.h> #include <linux/timer.h> #include <linux/slab.h> #include <linux/netdevice.h> #include <linux/skbuff.h> #include <linux/uaccess.h> #include <linux/init.h> #include <net/x25.h> LIST_HEAD(x25_neigh_list); DEFINE_RWLOCK(x25_neigh_list_lock); static void x25_t20timer_expiry(struct timer_list *); static void x25_transmit_restart_confirmation(struct x25_neigh *nb); static void x25_transmit_restart_request(struct x25_neigh *nb); /* * Linux set/reset timer routines */ static inline void x25_start_t20timer(struct x25_neigh *nb) { mod_timer(&nb->t20timer, jiffies + nb->t20); } static void x25_t20timer_expiry(struct timer_list *t) { struct x25_neigh *nb = from_timer(nb, t, t20timer); x25_transmit_restart_request(nb); x25_start_t20timer(nb); } static inline void x25_stop_t20timer(struct x25_neigh *nb) { del_timer(&nb->t20timer); } static inline int x25_t20timer_pending(struct x25_neigh *nb) { return timer_pending(&nb->t20timer); } /* * This handles all restart and diagnostic frames. */ void x25_link_control(struct sk_buff *skb, struct x25_neigh *nb, unsigned short frametype) { struct sk_buff *skbn; int confirm; switch (frametype) { case X25_RESTART_REQUEST: confirm = !x25_t20timer_pending(nb); x25_stop_t20timer(nb); nb->state = X25_LINK_STATE_3; if (confirm) x25_transmit_restart_confirmation(nb); break; case X25_RESTART_CONFIRMATION: x25_stop_t20timer(nb); nb->state = X25_LINK_STATE_3; break; case X25_DIAGNOSTIC: if (!pskb_may_pull(skb, X25_STD_MIN_LEN + 4)) break; pr_warn("diagnostic #%d - %02X %02X %02X\n", skb->data[3], skb->data[4], skb->data[5], skb->data[6]); break; default: pr_warn("received unknown %02X with LCI 000\n", frametype); break; } if (nb->state == X25_LINK_STATE_3) while ((skbn = skb_dequeue(&nb->queue)) != NULL) x25_send_frame(skbn, nb); } /* * This routine is called when a Restart Request is needed */ static void x25_transmit_restart_request(struct x25_neigh *nb) { unsigned char *dptr; int len = X25_MAX_L2_LEN + X25_STD_MIN_LEN + 2; struct sk_buff *skb = alloc_skb(len, GFP_ATOMIC); if (!skb) return; skb_reserve(skb, X25_MAX_L2_LEN); dptr = skb_put(skb, X25_STD_MIN_LEN + 2); *dptr++ = nb->extended ? X25_GFI_EXTSEQ : X25_GFI_STDSEQ; *dptr++ = 0x00; *dptr++ = X25_RESTART_REQUEST; *dptr++ = 0x00; *dptr++ = 0; skb->sk = NULL; x25_send_frame(skb, nb); } /* * This routine is called when a Restart Confirmation is needed */ static void x25_transmit_restart_confirmation(struct x25_neigh *nb) { unsigned char *dptr; int len = X25_MAX_L2_LEN + X25_STD_MIN_LEN; struct sk_buff *skb = alloc_skb(len, GFP_ATOMIC); if (!skb) return; skb_reserve(skb, X25_MAX_L2_LEN); dptr = skb_put(skb, X25_STD_MIN_LEN); *dptr++ = nb->extended ? X25_GFI_EXTSEQ : X25_GFI_STDSEQ; *dptr++ = 0x00; *dptr++ = X25_RESTART_CONFIRMATION; skb->sk = NULL; x25_send_frame(skb, nb); } /* * This routine is called when a Clear Request is needed outside of the context * of a connected socket. */ void x25_transmit_clear_request(struct x25_neigh *nb, unsigned int lci, unsigned char cause) { unsigned char *dptr; int len = X25_MAX_L2_LEN + X25_STD_MIN_LEN + 2; struct sk_buff *skb = alloc_skb(len, GFP_ATOMIC); if (!skb) return; skb_reserve(skb, X25_MAX_L2_LEN); dptr = skb_put(skb, X25_STD_MIN_LEN + 2); *dptr++ = ((lci >> 8) & 0x0F) | (nb->extended ? X25_GFI_EXTSEQ : X25_GFI_STDSEQ); *dptr++ = (lci >> 0) & 0xFF; *dptr++ = X25_CLEAR_REQUEST; *dptr++ = cause; *dptr++ = 0x00; skb->sk = NULL; x25_send_frame(skb, nb); } void x25_transmit_link(struct sk_buff *skb, struct x25_neigh *nb) { switch (nb->state) { case X25_LINK_STATE_0: skb_queue_tail(&nb->queue, skb); nb->state = X25_LINK_STATE_1; x25_establish_link(nb); break; case X25_LINK_STATE_1: case X25_LINK_STATE_2: skb_queue_tail(&nb->queue, skb); break; case X25_LINK_STATE_3: x25_send_frame(skb, nb); break; } } /* * Called when the link layer has become established. */ void x25_link_established(struct x25_neigh *nb) { switch (nb->state) { case X25_LINK_STATE_0: nb->state = X25_LINK_STATE_2; break; case X25_LINK_STATE_1: x25_transmit_restart_request(nb); nb->state = X25_LINK_STATE_2; x25_start_t20timer(nb); break; } } /* * Called when the link layer has terminated, or an establishment * request has failed. */ void x25_link_terminated(struct x25_neigh *nb) { nb->state = X25_LINK_STATE_0; /* Out of order: clear existing virtual calls (X.25 03/93 4.6.3) */ x25_kill_by_neigh(nb); } /* * Add a new device. */ void x25_link_device_up(struct net_device *dev) { struct x25_neigh *nb = kmalloc(sizeof(*nb), GFP_ATOMIC); if (!nb) return; skb_queue_head_init(&nb->queue); timer_setup(&nb->t20timer, x25_t20timer_expiry, 0); dev_hold(dev); nb->dev = dev; nb->state = X25_LINK_STATE_0; nb->extended = 0; /* * Enables negotiation */ nb->global_facil_mask = X25_MASK_REVERSE | X25_MASK_THROUGHPUT | X25_MASK_PACKET_SIZE | X25_MASK_WINDOW_SIZE; nb->t20 = sysctl_x25_restart_request_timeout; refcount_set(&nb->refcnt, 1); write_lock_bh(&x25_neigh_list_lock); list_add(&nb->node, &x25_neigh_list); write_unlock_bh(&x25_neigh_list_lock); } /** * __x25_remove_neigh - remove neighbour from x25_neigh_list * @nb - neigh to remove * * Remove neighbour from x25_neigh_list. If it was there. * Caller must hold x25_neigh_list_lock. */ static void __x25_remove_neigh(struct x25_neigh *nb) { skb_queue_purge(&nb->queue); x25_stop_t20timer(nb); if (nb->node.next) { list_del(&nb->node); x25_neigh_put(nb); } } /* * A device has been removed, remove its links. */ void x25_link_device_down(struct net_device *dev) { struct x25_neigh *nb; struct list_head *entry, *tmp; write_lock_bh(&x25_neigh_list_lock); list_for_each_safe(entry, tmp, &x25_neigh_list) { nb = list_entry(entry, struct x25_neigh, node); if (nb->dev == dev) { __x25_remove_neigh(nb); dev_put(dev); } } write_unlock_bh(&x25_neigh_list_lock); } /* * Given a device, return the neighbour address. */ struct x25_neigh *x25_get_neigh(struct net_device *dev) { struct x25_neigh *nb, *use = NULL; struct list_head *entry; read_lock_bh(&x25_neigh_list_lock); list_for_each(entry, &x25_neigh_list) { nb = list_entry(entry, struct x25_neigh, node); if (nb->dev == dev) { use = nb; break; } } if (use) x25_neigh_hold(use); read_unlock_bh(&x25_neigh_list_lock); return use; } /* * Handle the ioctls that control the subscription functions. */ int x25_subscr_ioctl(unsigned int cmd, void __user *arg) { struct x25_subscrip_struct x25_subscr; struct x25_neigh *nb; struct net_device *dev; int rc = -EINVAL; if (cmd != SIOCX25GSUBSCRIP && cmd != SIOCX25SSUBSCRIP) goto out; rc = -EFAULT; if (copy_from_user(&x25_subscr, arg, sizeof(x25_subscr))) goto out; rc = -EINVAL; if ((dev = x25_dev_get(x25_subscr.device)) == NULL) goto out; if ((nb = x25_get_neigh(dev)) == NULL) goto out_dev_put; dev_put(dev); if (cmd == SIOCX25GSUBSCRIP) { read_lock_bh(&x25_neigh_list_lock); x25_subscr.extended = nb->extended; x25_subscr.global_facil_mask = nb->global_facil_mask; read_unlock_bh(&x25_neigh_list_lock); rc = copy_to_user(arg, &x25_subscr, sizeof(x25_subscr)) ? -EFAULT : 0; } else { rc = -EINVAL; if (!(x25_subscr.extended && x25_subscr.extended != 1)) { rc = 0; write_lock_bh(&x25_neigh_list_lock); nb->extended = x25_subscr.extended; nb->global_facil_mask = x25_subscr.global_facil_mask; write_unlock_bh(&x25_neigh_list_lock); } } x25_neigh_put(nb); out: return rc; out_dev_put: dev_put(dev); goto out; } /* * Release all memory associated with X.25 neighbour structures. */ void __exit x25_link_free(void) { struct x25_neigh *nb; struct list_head *entry, *tmp; write_lock_bh(&x25_neigh_list_lock); list_for_each_safe(entry, tmp, &x25_neigh_list) { struct net_device *dev; nb = list_entry(entry, struct x25_neigh, node); dev = nb->dev; __x25_remove_neigh(nb); dev_put(dev); } write_unlock_bh(&x25_neigh_list_lock); }
213 83 91 6 119 3 215 201 206 220 216 165 165 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 /* * Copyright (c) 2006 Jiri Benc <jbenc@suse.cz> * Copyright 2007 Johannes Berg <johannes@sipsolutions.net> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ #include <linux/kernel.h> #include <linux/device.h> #include <linux/if.h> #include <linux/if_ether.h> #include <linux/interrupt.h> #include <linux/netdevice.h> #include <linux/rtnetlink.h> #include <linux/slab.h> #include <linux/notifier.h> #include <net/mac80211.h> #include <net/cfg80211.h> #include "ieee80211_i.h" #include "rate.h" #include "debugfs.h" #include "debugfs_netdev.h" #include "driver-ops.h" static ssize_t ieee80211_if_read( struct ieee80211_sub_if_data *sdata, char __user *userbuf, size_t count, loff_t *ppos, ssize_t (*format)(const struct ieee80211_sub_if_data *, char *, int)) { char buf[200]; ssize_t ret = -EINVAL; read_lock(&dev_base_lock); ret = (*format)(sdata, buf, sizeof(buf)); read_unlock(&dev_base_lock); if (ret >= 0) ret = simple_read_from_buffer(userbuf, count, ppos, buf, ret); return ret; } static ssize_t ieee80211_if_write( struct ieee80211_sub_if_data *sdata, const char __user *userbuf, size_t count, loff_t *ppos, ssize_t (*write)(struct ieee80211_sub_if_data *, const char *, int)) { char buf[64]; ssize_t ret; if (count >= sizeof(buf)) return -E2BIG; if (copy_from_user(buf, userbuf, count)) return -EFAULT; buf[count] = '\0'; ret = -ENODEV; rtnl_lock(); ret = (*write)(sdata, buf, count); rtnl_unlock(); return ret; } #define IEEE80211_IF_FMT(name, field, format_string) \ static ssize_t ieee80211_if_fmt_##name( \ const struct ieee80211_sub_if_data *sdata, char *buf, \ int buflen) \ { \ return scnprintf(buf, buflen, format_string, sdata->field); \ } #define IEEE80211_IF_FMT_DEC(name, field) \ IEEE80211_IF_FMT(name, field, "%d\n") #define IEEE80211_IF_FMT_HEX(name, field) \ IEEE80211_IF_FMT(name, field, "%#x\n") #define IEEE80211_IF_FMT_LHEX(name, field) \ IEEE80211_IF_FMT(name, field, "%#lx\n") #define IEEE80211_IF_FMT_SIZE(name, field) \ IEEE80211_IF_FMT(name, field, "%zd\n") #define IEEE80211_IF_FMT_HEXARRAY(name, field) \ static ssize_t ieee80211_if_fmt_##name( \ const struct ieee80211_sub_if_data *sdata, \ char *buf, int buflen) \ { \ char *p = buf; \ int i; \ for (i = 0; i < sizeof(sdata->field); i++) { \ p += scnprintf(p, buflen + buf - p, "%.2x ", \ sdata->field[i]); \ } \ p += scnprintf(p, buflen + buf - p, "\n"); \ return p - buf; \ } #define IEEE80211_IF_FMT_ATOMIC(name, field) \ static ssize_t ieee80211_if_fmt_##name( \ const struct ieee80211_sub_if_data *sdata, \ char *buf, int buflen) \ { \ return scnprintf(buf, buflen, "%d\n", atomic_read(&sdata->field));\ } #define IEEE80211_IF_FMT_MAC(name, field) \ static ssize_t ieee80211_if_fmt_##name( \ const struct ieee80211_sub_if_data *sdata, char *buf, \ int buflen) \ { \ return scnprintf(buf, buflen, "%pM\n", sdata->field); \ } #define IEEE80211_IF_FMT_JIFFIES_TO_MS(name, field) \ static ssize_t ieee80211_if_fmt_##name( \ const struct ieee80211_sub_if_data *sdata, \ char *buf, int buflen) \ { \ return scnprintf(buf, buflen, "%d\n", \ jiffies_to_msecs(sdata->field)); \ } #define _IEEE80211_IF_FILE_OPS(name, _read, _write) \ static const struct file_operations name##_ops = { \ .read = (_read), \ .write = (_write), \ .open = simple_open, \ .llseek = generic_file_llseek, \ } #define _IEEE80211_IF_FILE_R_FN(name) \ static ssize_t ieee80211_if_read_##name(struct file *file, \ char __user *userbuf, \ size_t count, loff_t *ppos) \ { \ return ieee80211_if_read(file->private_data, \ userbuf, count, ppos, \ ieee80211_if_fmt_##name); \ } #define _IEEE80211_IF_FILE_W_FN(name) \ static ssize_t ieee80211_if_write_##name(struct file *file, \ const char __user *userbuf, \ size_t count, loff_t *ppos) \ { \ return ieee80211_if_write(file->private_data, userbuf, count, \ ppos, ieee80211_if_parse_##name); \ } #define IEEE80211_IF_FILE_R(name) \ _IEEE80211_IF_FILE_R_FN(name) \ _IEEE80211_IF_FILE_OPS(name, ieee80211_if_read_##name, NULL) #define IEEE80211_IF_FILE_W(name) \ _IEEE80211_IF_FILE_W_FN(name) \ _IEEE80211_IF_FILE_OPS(name, NULL, ieee80211_if_write_##name) #define IEEE80211_IF_FILE_RW(name) \ _IEEE80211_IF_FILE_R_FN(name) \ _IEEE80211_IF_FILE_W_FN(name) \ _IEEE80211_IF_FILE_OPS(name, ieee80211_if_read_##name, \ ieee80211_if_write_##name) #define IEEE80211_IF_FILE(name, field, format) \ IEEE80211_IF_FMT_##format(name, field) \ IEEE80211_IF_FILE_R(name) /* common attributes */ IEEE80211_IF_FILE(rc_rateidx_mask_2ghz, rc_rateidx_mask[NL80211_BAND_2GHZ], HEX); IEEE80211_IF_FILE(rc_rateidx_mask_5ghz, rc_rateidx_mask[NL80211_BAND_5GHZ], HEX); IEEE80211_IF_FILE(rc_rateidx_mcs_mask_2ghz, rc_rateidx_mcs_mask[NL80211_BAND_2GHZ], HEXARRAY); IEEE80211_IF_FILE(rc_rateidx_mcs_mask_5ghz, rc_rateidx_mcs_mask[NL80211_BAND_5GHZ], HEXARRAY); static ssize_t ieee80211_if_fmt_rc_rateidx_vht_mcs_mask_2ghz( const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { int i, len = 0; const u16 *mask = sdata->rc_rateidx_vht_mcs_mask[NL80211_BAND_2GHZ]; for (i = 0; i < NL80211_VHT_NSS_MAX; i++) len += scnprintf(buf + len, buflen - len, "%04x ", mask[i]); len += scnprintf(buf + len, buflen - len, "\n"); return len; } IEEE80211_IF_FILE_R(rc_rateidx_vht_mcs_mask_2ghz); static ssize_t ieee80211_if_fmt_rc_rateidx_vht_mcs_mask_5ghz( const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { int i, len = 0; const u16 *mask = sdata->rc_rateidx_vht_mcs_mask[NL80211_BAND_5GHZ]; for (i = 0; i < NL80211_VHT_NSS_MAX; i++) len += scnprintf(buf + len, buflen - len, "%04x ", mask[i]); len += scnprintf(buf + len, buflen - len, "\n"); return len; } IEEE80211_IF_FILE_R(rc_rateidx_vht_mcs_mask_5ghz); IEEE80211_IF_FILE(flags, flags, HEX); IEEE80211_IF_FILE(state, state, LHEX); IEEE80211_IF_FILE(txpower, vif.bss_conf.txpower, DEC); IEEE80211_IF_FILE(ap_power_level, ap_power_level, DEC); IEEE80211_IF_FILE(user_power_level, user_power_level, DEC); static ssize_t ieee80211_if_fmt_hw_queues(const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { int len; len = scnprintf(buf, buflen, "AC queues: VO:%d VI:%d BE:%d BK:%d\n", sdata->vif.hw_queue[IEEE80211_AC_VO], sdata->vif.hw_queue[IEEE80211_AC_VI], sdata->vif.hw_queue[IEEE80211_AC_BE], sdata->vif.hw_queue[IEEE80211_AC_BK]); if (sdata->vif.type == NL80211_IFTYPE_AP) len += scnprintf(buf + len, buflen - len, "cab queue: %d\n", sdata->vif.cab_queue); return len; } IEEE80211_IF_FILE_R(hw_queues); /* STA attributes */ IEEE80211_IF_FILE(bssid, u.mgd.bssid, MAC); IEEE80211_IF_FILE(aid, u.mgd.aid, DEC); IEEE80211_IF_FILE(beacon_timeout, u.mgd.beacon_timeout, JIFFIES_TO_MS); static int ieee80211_set_smps(struct ieee80211_sub_if_data *sdata, enum ieee80211_smps_mode smps_mode) { struct ieee80211_local *local = sdata->local; int err; if (!(local->hw.wiphy->features & NL80211_FEATURE_STATIC_SMPS) && smps_mode == IEEE80211_SMPS_STATIC) return -EINVAL; /* auto should be dynamic if in PS mode */ if (!(local->hw.wiphy->features & NL80211_FEATURE_DYNAMIC_SMPS) && (smps_mode == IEEE80211_SMPS_DYNAMIC || smps_mode == IEEE80211_SMPS_AUTOMATIC)) return -EINVAL; if (sdata->vif.type != NL80211_IFTYPE_STATION && sdata->vif.type != NL80211_IFTYPE_AP) return -EOPNOTSUPP; sdata_lock(sdata); if (sdata->vif.type == NL80211_IFTYPE_STATION) err = __ieee80211_request_smps_mgd(sdata, smps_mode); else err = __ieee80211_request_smps_ap(sdata, smps_mode); sdata_unlock(sdata); return err; } static const char *smps_modes[IEEE80211_SMPS_NUM_MODES] = { [IEEE80211_SMPS_AUTOMATIC] = "auto", [IEEE80211_SMPS_OFF] = "off", [IEEE80211_SMPS_STATIC] = "static", [IEEE80211_SMPS_DYNAMIC] = "dynamic", }; static ssize_t ieee80211_if_fmt_smps(const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { if (sdata->vif.type == NL80211_IFTYPE_STATION) return snprintf(buf, buflen, "request: %s\nused: %s\n", smps_modes[sdata->u.mgd.req_smps], smps_modes[sdata->smps_mode]); if (sdata->vif.type == NL80211_IFTYPE_AP) return snprintf(buf, buflen, "request: %s\nused: %s\n", smps_modes[sdata->u.ap.req_smps], smps_modes[sdata->smps_mode]); return -EINVAL; } static ssize_t ieee80211_if_parse_smps(struct ieee80211_sub_if_data *sdata, const char *buf, int buflen) { enum ieee80211_smps_mode mode; for (mode = 0; mode < IEEE80211_SMPS_NUM_MODES; mode++) { if (strncmp(buf, smps_modes[mode], buflen) == 0) { int err = ieee80211_set_smps(sdata, mode); if (!err) return buflen; return err; } } return -EINVAL; } IEEE80211_IF_FILE_RW(smps); static ssize_t ieee80211_if_parse_tkip_mic_test( struct ieee80211_sub_if_data *sdata, const char *buf, int buflen) { struct ieee80211_local *local = sdata->local; u8 addr[ETH_ALEN]; struct sk_buff *skb; struct ieee80211_hdr *hdr; __le16 fc; if (!mac_pton(buf, addr)) return -EINVAL; if (!ieee80211_sdata_running(sdata)) return -ENOTCONN; skb = dev_alloc_skb(local->hw.extra_tx_headroom + 24 + 100); if (!skb) return -ENOMEM; skb_reserve(skb, local->hw.extra_tx_headroom); hdr = skb_put_zero(skb, 24); fc = cpu_to_le16(IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA); switch (sdata->vif.type) { case NL80211_IFTYPE_AP: fc |= cpu_to_le16(IEEE80211_FCTL_FROMDS); /* DA BSSID SA */ memcpy(hdr->addr1, addr, ETH_ALEN); memcpy(hdr->addr2, sdata->vif.addr, ETH_ALEN); memcpy(hdr->addr3, sdata->vif.addr, ETH_ALEN); break; case NL80211_IFTYPE_STATION: fc |= cpu_to_le16(IEEE80211_FCTL_TODS); /* BSSID SA DA */ sdata_lock(sdata); if (!sdata->u.mgd.associated) { sdata_unlock(sdata); dev_kfree_skb(skb); return -ENOTCONN; } memcpy(hdr->addr1, sdata->u.mgd.associated->bssid, ETH_ALEN); memcpy(hdr->addr2, sdata->vif.addr, ETH_ALEN); memcpy(hdr->addr3, addr, ETH_ALEN); sdata_unlock(sdata); break; default: dev_kfree_skb(skb); return -EOPNOTSUPP; } hdr->frame_control = fc; /* * Add some length to the test frame to make it look bit more valid. * The exact contents does not matter since the recipient is required * to drop this because of the Michael MIC failure. */ skb_put_zero(skb, 50); IEEE80211_SKB_CB(skb)->flags |= IEEE80211_TX_INTFL_TKIP_MIC_FAILURE; ieee80211_tx_skb(sdata, skb); return buflen; } IEEE80211_IF_FILE_W(tkip_mic_test); static ssize_t ieee80211_if_parse_beacon_loss( struct ieee80211_sub_if_data *sdata, const char *buf, int buflen) { if (!ieee80211_sdata_running(sdata) || !sdata->vif.bss_conf.assoc) return -ENOTCONN; ieee80211_beacon_loss(&sdata->vif); return buflen; } IEEE80211_IF_FILE_W(beacon_loss); static ssize_t ieee80211_if_fmt_uapsd_queues( const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { const struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; return snprintf(buf, buflen, "0x%x\n", ifmgd->uapsd_queues); } static ssize_t ieee80211_if_parse_uapsd_queues( struct ieee80211_sub_if_data *sdata, const char *buf, int buflen) { struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; u8 val; int ret; ret = kstrtou8(buf, 0, &val); if (ret) return ret; if (val & ~IEEE80211_WMM_IE_STA_QOSINFO_AC_MASK) return -ERANGE; ifmgd->uapsd_queues = val; return buflen; } IEEE80211_IF_FILE_RW(uapsd_queues); static ssize_t ieee80211_if_fmt_uapsd_max_sp_len( const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { const struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; return snprintf(buf, buflen, "0x%x\n", ifmgd->uapsd_max_sp_len); } static ssize_t ieee80211_if_parse_uapsd_max_sp_len( struct ieee80211_sub_if_data *sdata, const char *buf, int buflen) { struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; unsigned long val; int ret; ret = kstrtoul(buf, 0, &val); if (ret) return -EINVAL; if (val & ~IEEE80211_WMM_IE_STA_QOSINFO_SP_MASK) return -ERANGE; ifmgd->uapsd_max_sp_len = val; return buflen; } IEEE80211_IF_FILE_RW(uapsd_max_sp_len); static ssize_t ieee80211_if_fmt_tdls_wider_bw( const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { const struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; bool tdls_wider_bw; tdls_wider_bw = ieee80211_hw_check(&sdata->local->hw, TDLS_WIDER_BW) && !ifmgd->tdls_wider_bw_prohibited; return snprintf(buf, buflen, "%d\n", tdls_wider_bw); } static ssize_t ieee80211_if_parse_tdls_wider_bw( struct ieee80211_sub_if_data *sdata, const char *buf, int buflen) { struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; u8 val; int ret; ret = kstrtou8(buf, 0, &val); if (ret) return ret; ifmgd->tdls_wider_bw_prohibited = !val; return buflen; } IEEE80211_IF_FILE_RW(tdls_wider_bw); /* AP attributes */ IEEE80211_IF_FILE(num_mcast_sta, u.ap.num_mcast_sta, ATOMIC); IEEE80211_IF_FILE(num_sta_ps, u.ap.ps.num_sta_ps, ATOMIC); IEEE80211_IF_FILE(dtim_count, u.ap.ps.dtim_count, DEC); IEEE80211_IF_FILE(num_mcast_sta_vlan, u.vlan.num_mcast_sta, ATOMIC); static ssize_t ieee80211_if_fmt_num_buffered_multicast( const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { return scnprintf(buf, buflen, "%u\n", skb_queue_len(&sdata->u.ap.ps.bc_buf)); } IEEE80211_IF_FILE_R(num_buffered_multicast); static ssize_t ieee80211_if_fmt_aqm( const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { struct ieee80211_local *local = sdata->local; struct txq_info *txqi; int len; if (!sdata->vif.txq) return 0; txqi = to_txq_info(sdata->vif.txq); spin_lock_bh(&local->fq.lock); rcu_read_lock(); len = scnprintf(buf, buflen, "ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets\n" "%u %u %u %u %u %u %u %u %u %u\n", txqi->txq.ac, txqi->tin.backlog_bytes, txqi->tin.backlog_packets, txqi->tin.flows, txqi->cstats.drop_count, txqi->cstats.ecn_mark, txqi->tin.overlimit, txqi->tin.collisions, txqi->tin.tx_bytes, txqi->tin.tx_packets); rcu_read_unlock(); spin_unlock_bh(&local->fq.lock); return len; } IEEE80211_IF_FILE_R(aqm); IEEE80211_IF_FILE(multicast_to_unicast, u.ap.multicast_to_unicast, HEX); /* IBSS attributes */ static ssize_t ieee80211_if_fmt_tsf( const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { struct ieee80211_local *local = sdata->local; u64 tsf; tsf = drv_get_tsf(local, (struct ieee80211_sub_if_data *)sdata); return scnprintf(buf, buflen, "0x%016llx\n", (unsigned long long) tsf); } static ssize_t ieee80211_if_parse_tsf( struct ieee80211_sub_if_data *sdata, const char *buf, int buflen) { struct ieee80211_local *local = sdata->local; unsigned long long tsf; int ret; int tsf_is_delta = 0; if (strncmp(buf, "reset", 5) == 0) { if (local->ops->reset_tsf) { drv_reset_tsf(local, sdata); wiphy_info(local->hw.wiphy, "debugfs reset TSF\n"); } } else { if (buflen > 10 && buf[1] == '=') { if (buf[0] == '+') tsf_is_delta = 1; else if (buf[0] == '-') tsf_is_delta = -1; else return -EINVAL; buf += 2; } ret = kstrtoull(buf, 10, &tsf); if (ret < 0) return ret; if (tsf_is_delta && local->ops->offset_tsf) { drv_offset_tsf(local, sdata, tsf_is_delta * tsf); wiphy_info(local->hw.wiphy, "debugfs offset TSF by %018lld\n", tsf_is_delta * tsf); } else if (local->ops->set_tsf) { if (tsf_is_delta) tsf = drv_get_tsf(local, sdata) + tsf_is_delta * tsf; drv_set_tsf(local, sdata, tsf); wiphy_info(local->hw.wiphy, "debugfs set TSF to %#018llx\n", tsf); } } ieee80211_recalc_dtim(local, sdata); return buflen; } IEEE80211_IF_FILE_RW(tsf); /* WDS attributes */ IEEE80211_IF_FILE(peer, u.wds.remote_addr, MAC); #ifdef CONFIG_MAC80211_MESH IEEE80211_IF_FILE(estab_plinks, u.mesh.estab_plinks, ATOMIC); /* Mesh stats attributes */ IEEE80211_IF_FILE(fwded_mcast, u.mesh.mshstats.fwded_mcast, DEC); IEEE80211_IF_FILE(fwded_unicast, u.mesh.mshstats.fwded_unicast, DEC); IEEE80211_IF_FILE(fwded_frames, u.mesh.mshstats.fwded_frames, DEC); IEEE80211_IF_FILE(dropped_frames_ttl, u.mesh.mshstats.dropped_frames_ttl, DEC); IEEE80211_IF_FILE(dropped_frames_congestion, u.mesh.mshstats.dropped_frames_congestion, DEC); IEEE80211_IF_FILE(dropped_frames_no_route, u.mesh.mshstats.dropped_frames_no_route, DEC); /* Mesh parameters */ IEEE80211_IF_FILE(dot11MeshMaxRetries, u.mesh.mshcfg.dot11MeshMaxRetries, DEC); IEEE80211_IF_FILE(dot11MeshRetryTimeout, u.mesh.mshcfg.dot11MeshRetryTimeout, DEC); IEEE80211_IF_FILE(dot11MeshConfirmTimeout, u.mesh.mshcfg.dot11MeshConfirmTimeout, DEC); IEEE80211_IF_FILE(dot11MeshHoldingTimeout, u.mesh.mshcfg.dot11MeshHoldingTimeout, DEC); IEEE80211_IF_FILE(dot11MeshTTL, u.mesh.mshcfg.dot11MeshTTL, DEC); IEEE80211_IF_FILE(element_ttl, u.mesh.mshcfg.element_ttl, DEC); IEEE80211_IF_FILE(auto_open_plinks, u.mesh.mshcfg.auto_open_plinks, DEC); IEEE80211_IF_FILE(dot11MeshMaxPeerLinks, u.mesh.mshcfg.dot11MeshMaxPeerLinks, DEC); IEEE80211_IF_FILE(dot11MeshHWMPactivePathTimeout, u.mesh.mshcfg.dot11MeshHWMPactivePathTimeout, DEC); IEEE80211_IF_FILE(dot11MeshHWMPpreqMinInterval, u.mesh.mshcfg.dot11MeshHWMPpreqMinInterval, DEC); IEEE80211_IF_FILE(dot11MeshHWMPperrMinInterval, u.mesh.mshcfg.dot11MeshHWMPperrMinInterval, DEC); IEEE80211_IF_FILE(dot11MeshHWMPnetDiameterTraversalTime, u.mesh.mshcfg.dot11MeshHWMPnetDiameterTraversalTime, DEC); IEEE80211_IF_FILE(dot11MeshHWMPmaxPREQretries, u.mesh.mshcfg.dot11MeshHWMPmaxPREQretries, DEC); IEEE80211_IF_FILE(path_refresh_time, u.mesh.mshcfg.path_refresh_time, DEC); IEEE80211_IF_FILE(min_discovery_timeout, u.mesh.mshcfg.min_discovery_timeout, DEC); IEEE80211_IF_FILE(dot11MeshHWMPRootMode, u.mesh.mshcfg.dot11MeshHWMPRootMode, DEC); IEEE80211_IF_FILE(dot11MeshGateAnnouncementProtocol, u.mesh.mshcfg.dot11MeshGateAnnouncementProtocol, DEC); IEEE80211_IF_FILE(dot11MeshHWMPRannInterval, u.mesh.mshcfg.dot11MeshHWMPRannInterval, DEC); IEEE80211_IF_FILE(dot11MeshForwarding, u.mesh.mshcfg.dot11MeshForwarding, DEC); IEEE80211_IF_FILE(rssi_threshold, u.mesh.mshcfg.rssi_threshold, DEC); IEEE80211_IF_FILE(ht_opmode, u.mesh.mshcfg.ht_opmode, DEC); IEEE80211_IF_FILE(dot11MeshHWMPactivePathToRootTimeout, u.mesh.mshcfg.dot11MeshHWMPactivePathToRootTimeout, DEC); IEEE80211_IF_FILE(dot11MeshHWMProotInterval, u.mesh.mshcfg.dot11MeshHWMProotInterval, DEC); IEEE80211_IF_FILE(dot11MeshHWMPconfirmationInterval, u.mesh.mshcfg.dot11MeshHWMPconfirmationInterval, DEC); IEEE80211_IF_FILE(power_mode, u.mesh.mshcfg.power_mode, DEC); IEEE80211_IF_FILE(dot11MeshAwakeWindowDuration, u.mesh.mshcfg.dot11MeshAwakeWindowDuration, DEC); #endif #define DEBUGFS_ADD_MODE(name, mode) \ debugfs_create_file(#name, mode, sdata->vif.debugfs_dir, \ sdata, &name##_ops); #define DEBUGFS_ADD(name) DEBUGFS_ADD_MODE(name, 0400) static void add_common_files(struct ieee80211_sub_if_data *sdata) { DEBUGFS_ADD(rc_rateidx_mask_2ghz); DEBUGFS_ADD(rc_rateidx_mask_5ghz); DEBUGFS_ADD(rc_rateidx_mcs_mask_2ghz); DEBUGFS_ADD(rc_rateidx_mcs_mask_5ghz); DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_2ghz); DEBUGFS_ADD(rc_rateidx_vht_mcs_mask_5ghz); DEBUGFS_ADD(hw_queues); if (sdata->local->ops->wake_tx_queue && sdata->vif.type != NL80211_IFTYPE_P2P_DEVICE && sdata->vif.type != NL80211_IFTYPE_NAN) DEBUGFS_ADD(aqm); } static void add_sta_files(struct ieee80211_sub_if_data *sdata) { DEBUGFS_ADD(bssid); DEBUGFS_ADD(aid); DEBUGFS_ADD(beacon_timeout); DEBUGFS_ADD_MODE(smps, 0600); DEBUGFS_ADD_MODE(tkip_mic_test, 0200); DEBUGFS_ADD_MODE(beacon_loss, 0200); DEBUGFS_ADD_MODE(uapsd_queues, 0600); DEBUGFS_ADD_MODE(uapsd_max_sp_len, 0600); DEBUGFS_ADD_MODE(tdls_wider_bw, 0600); } static void add_ap_files(struct ieee80211_sub_if_data *sdata) { DEBUGFS_ADD(num_mcast_sta); DEBUGFS_ADD_MODE(smps, 0600); DEBUGFS_ADD(num_sta_ps); DEBUGFS_ADD(dtim_count); DEBUGFS_ADD(num_buffered_multicast); DEBUGFS_ADD_MODE(tkip_mic_test, 0200); DEBUGFS_ADD_MODE(multicast_to_unicast, 0600); } static void add_vlan_files(struct ieee80211_sub_if_data *sdata) { /* add num_mcast_sta_vlan using name num_mcast_sta */ debugfs_create_file("num_mcast_sta", 0400, sdata->vif.debugfs_dir, sdata, &num_mcast_sta_vlan_ops); } static void add_ibss_files(struct ieee80211_sub_if_data *sdata) { DEBUGFS_ADD_MODE(tsf, 0600); } static void add_wds_files(struct ieee80211_sub_if_data *sdata) { DEBUGFS_ADD(peer); } #ifdef CONFIG_MAC80211_MESH static void add_mesh_files(struct ieee80211_sub_if_data *sdata) { DEBUGFS_ADD_MODE(tsf, 0600); DEBUGFS_ADD_MODE(estab_plinks, 0400); } static void add_mesh_stats(struct ieee80211_sub_if_data *sdata) { struct dentry *dir = debugfs_create_dir("mesh_stats", sdata->vif.debugfs_dir); #define MESHSTATS_ADD(name)\ debugfs_create_file(#name, 0400, dir, sdata, &name##_ops); MESHSTATS_ADD(fwded_mcast); MESHSTATS_ADD(fwded_unicast); MESHSTATS_ADD(fwded_frames); MESHSTATS_ADD(dropped_frames_ttl); MESHSTATS_ADD(dropped_frames_no_route); MESHSTATS_ADD(dropped_frames_congestion); #undef MESHSTATS_ADD } static void add_mesh_config(struct ieee80211_sub_if_data *sdata) { struct dentry *dir = debugfs_create_dir("mesh_config", sdata->vif.debugfs_dir); #define MESHPARAMS_ADD(name) \ debugfs_create_file(#name, 0600, dir, sdata, &name##_ops); MESHPARAMS_ADD(dot11MeshMaxRetries); MESHPARAMS_ADD(dot11MeshRetryTimeout); MESHPARAMS_ADD(dot11MeshConfirmTimeout); MESHPARAMS_ADD(dot11MeshHoldingTimeout); MESHPARAMS_ADD(dot11MeshTTL); MESHPARAMS_ADD(element_ttl); MESHPARAMS_ADD(auto_open_plinks); MESHPARAMS_ADD(dot11MeshMaxPeerLinks); MESHPARAMS_ADD(dot11MeshHWMPactivePathTimeout); MESHPARAMS_ADD(dot11MeshHWMPpreqMinInterval); MESHPARAMS_ADD(dot11MeshHWMPperrMinInterval); MESHPARAMS_ADD(dot11MeshHWMPnetDiameterTraversalTime); MESHPARAMS_ADD(dot11MeshHWMPmaxPREQretries); MESHPARAMS_ADD(path_refresh_time); MESHPARAMS_ADD(min_discovery_timeout); MESHPARAMS_ADD(dot11MeshHWMPRootMode); MESHPARAMS_ADD(dot11MeshHWMPRannInterval); MESHPARAMS_ADD(dot11MeshForwarding); MESHPARAMS_ADD(dot11MeshGateAnnouncementProtocol); MESHPARAMS_ADD(rssi_threshold); MESHPARAMS_ADD(ht_opmode); MESHPARAMS_ADD(dot11MeshHWMPactivePathToRootTimeout); MESHPARAMS_ADD(dot11MeshHWMProotInterval); MESHPARAMS_ADD(dot11MeshHWMPconfirmationInterval); MESHPARAMS_ADD(power_mode); MESHPARAMS_ADD(dot11MeshAwakeWindowDuration); #undef MESHPARAMS_ADD } #endif static void add_files(struct ieee80211_sub_if_data *sdata) { if (!sdata->vif.debugfs_dir) return; DEBUGFS_ADD(flags); DEBUGFS_ADD(state); DEBUGFS_ADD(txpower); DEBUGFS_ADD(user_power_level); DEBUGFS_ADD(ap_power_level); if (sdata->vif.type != NL80211_IFTYPE_MONITOR) add_common_files(sdata); switch (sdata->vif.type) { case NL80211_IFTYPE_MESH_POINT: #ifdef CONFIG_MAC80211_MESH add_mesh_files(sdata); add_mesh_stats(sdata); add_mesh_config(sdata); #endif break; case NL80211_IFTYPE_STATION: add_sta_files(sdata); break; case NL80211_IFTYPE_ADHOC: add_ibss_files(sdata); break; case NL80211_IFTYPE_AP: add_ap_files(sdata); break; case NL80211_IFTYPE_AP_VLAN: add_vlan_files(sdata); break; case NL80211_IFTYPE_WDS: add_wds_files(sdata); break; default: break; } } void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata) { char buf[10+IFNAMSIZ]; sprintf(buf, "netdev:%s", sdata->name); sdata->vif.debugfs_dir = debugfs_create_dir(buf, sdata->local->hw.wiphy->debugfsdir); if (sdata->vif.debugfs_dir) sdata->debugfs.subdir_stations = debugfs_create_dir("stations", sdata->vif.debugfs_dir); add_files(sdata); } void ieee80211_debugfs_remove_netdev(struct ieee80211_sub_if_data *sdata) { if (!sdata->vif.debugfs_dir) return; debugfs_remove_recursive(sdata->vif.debugfs_dir); sdata->vif.debugfs_dir = NULL; sdata->debugfs.subdir_stations = NULL; } void ieee80211_debugfs_rename_netdev(struct ieee80211_sub_if_data *sdata) { struct dentry *dir; char buf[10 + IFNAMSIZ]; dir = sdata->vif.debugfs_dir; if (IS_ERR_OR_NULL(dir)) return; sprintf(buf, "netdev:%s", sdata->name); if (!debugfs_rename(dir->d_parent, dir, dir->d_parent, buf)) sdata_err(sdata, "debugfs: failed to rename debugfs dir to %s\n", buf); }
30 20160 116 27 15 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_WAIT_H #define _LINUX_WAIT_H /* * Linux wait queue related types and methods */ #include <linux/list.h> #include <linux/stddef.h> #include <linux/spinlock.h> #include <asm/current.h> #include <uapi/linux/wait.h> typedef struct wait_queue_entry wait_queue_entry_t; typedef int (*wait_queue_func_t)(struct wait_queue_entry *wq_entry, unsigned mode, int flags, void *key); int default_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int flags, void *key); /* wait_queue_entry::flags */ #define WQ_FLAG_EXCLUSIVE 0x01 #define WQ_FLAG_WOKEN 0x02 #define WQ_FLAG_BOOKMARK 0x04 /* * A single wait-queue entry structure: */ struct wait_queue_entry { unsigned int flags; void *private; wait_queue_func_t func; struct list_head entry; }; struct wait_queue_head { spinlock_t lock; struct list_head head; }; typedef struct wait_queue_head wait_queue_head_t; struct task_struct; /* * Macros for declaration and initialisaton of the datatypes */ #define __WAITQUEUE_INITIALIZER(name, tsk) { \ .private = tsk, \ .func = default_wake_function, \ .entry = { NULL, NULL } } #define DECLARE_WAITQUEUE(name, tsk) \ struct wait_queue_entry name = __WAITQUEUE_INITIALIZER(name, tsk) #define __WAIT_QUEUE_HEAD_INITIALIZER(name) { \ .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ .head = { &(name).head, &(name).head } } #define DECLARE_WAIT_QUEUE_HEAD(name) \ struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name) extern void __init_waitqueue_head(struct wait_queue_head *wq_head, const char *name, struct lock_class_key *); #define init_waitqueue_head(wq_head) \ do { \ static struct lock_class_key __key; \ \ __init_waitqueue_head((wq_head), #wq_head, &__key); \ } while (0) #ifdef CONFIG_LOCKDEP # define __WAIT_QUEUE_HEAD_INIT_ONSTACK(name) \ ({ init_waitqueue_head(&name); name; }) # define DECLARE_WAIT_QUEUE_HEAD_ONSTACK(name) \ struct wait_queue_head name = __WAIT_QUEUE_HEAD_INIT_ONSTACK(name) #else # define DECLARE_WAIT_QUEUE_HEAD_ONSTACK(name) DECLARE_WAIT_QUEUE_HEAD(name) #endif static inline void init_waitqueue_entry(struct wait_queue_entry *wq_entry, struct task_struct *p) { wq_entry->flags = 0; wq_entry->private = p; wq_entry->func = default_wake_function; } static inline void init_waitqueue_func_entry(struct wait_queue_entry *wq_entry, wait_queue_func_t func) { wq_entry->flags = 0; wq_entry->private = NULL; wq_entry->func = func; } /** * waitqueue_active -- locklessly test for waiters on the queue * @wq_head: the waitqueue to test for waiters * * returns true if the wait list is not empty * * NOTE: this function is lockless and requires care, incorrect usage _will_ * lead to sporadic and non-obvious failure. * * Use either while holding wait_queue_head::lock or when used for wakeups * with an extra smp_mb() like: * * CPU0 - waker CPU1 - waiter * * for (;;) { * @cond = true; prepare_to_wait(&wq_head, &wait, state); * smp_mb(); // smp_mb() from set_current_state() * if (waitqueue_active(wq_head)) if (@cond) * wake_up(wq_head); break; * schedule(); * } * finish_wait(&wq_head, &wait); * * Because without the explicit smp_mb() it's possible for the * waitqueue_active() load to get hoisted over the @cond store such that we'll * observe an empty wait list while the waiter might not observe @cond. * * Also note that this 'optimization' trades a spin_lock() for an smp_mb(), * which (when the lock is uncontended) are of roughly equal cost. */ static inline int waitqueue_active(struct wait_queue_head *wq_head) { return !list_empty(&wq_head->head); } /** * wq_has_sleeper - check if there are any waiting processes * @wq_head: wait queue head * * Returns true if wq_head has waiting processes * * Please refer to the comment for waitqueue_active. */ static inline bool wq_has_sleeper(struct wait_queue_head *wq_head) { /* * We need to be sure we are in sync with the * add_wait_queue modifications to the wait queue. * * This memory barrier should be paired with one on the * waiting side. */ smp_mb(); return waitqueue_active(wq_head); } extern void add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry); extern void add_wait_queue_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry); extern void remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry); static inline void __add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry) { list_add(&wq_entry->entry, &wq_head->head); } /* * Used for wake-one threads: */ static inline void __add_wait_queue_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry) { wq_entry->flags |= WQ_FLAG_EXCLUSIVE; __add_wait_queue(wq_head, wq_entry); } static inline void __add_wait_queue_entry_tail(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry) { list_add_tail(&wq_entry->entry, &wq_head->head); } static inline void __add_wait_queue_entry_tail_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry) { wq_entry->flags |= WQ_FLAG_EXCLUSIVE; __add_wait_queue_entry_tail(wq_head, wq_entry); } static inline void __remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry) { list_del(&wq_entry->entry); } void __wake_up(struct wait_queue_head *wq_head, unsigned int mode, int nr, void *key); void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key); void __wake_up_locked_key_bookmark(struct wait_queue_head *wq_head, unsigned int mode, void *key, wait_queue_entry_t *bookmark); void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, int nr, void *key); void __wake_up_locked(struct wait_queue_head *wq_head, unsigned int mode, int nr); void __wake_up_sync(struct wait_queue_head *wq_head, unsigned int mode, int nr); void __wake_up_pollfree(struct wait_queue_head *wq_head); #define wake_up(x) __wake_up(x, TASK_NORMAL, 1, NULL) #define wake_up_nr(x, nr) __wake_up(x, TASK_NORMAL, nr, NULL) #define wake_up_all(x) __wake_up(x, TASK_NORMAL, 0, NULL) #define wake_up_locked(x) __wake_up_locked((x), TASK_NORMAL, 1) #define wake_up_all_locked(x) __wake_up_locked((x), TASK_NORMAL, 0) #define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL) #define wake_up_interruptible_nr(x, nr) __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL) #define wake_up_interruptible_all(x) __wake_up(x, TASK_INTERRUPTIBLE, 0, NULL) #define wake_up_interruptible_sync(x) __wake_up_sync((x), TASK_INTERRUPTIBLE, 1) /* * Wakeup macros to be used to report events to the targets. */ #define poll_to_key(m) ((void *)(__force uintptr_t)(__poll_t)(m)) #define key_to_poll(m) ((__force __poll_t)(uintptr_t)(void *)(m)) #define wake_up_poll(x, m) \ __wake_up(x, TASK_NORMAL, 1, poll_to_key(m)) #define wake_up_locked_poll(x, m) \ __wake_up_locked_key((x), TASK_NORMAL, poll_to_key(m)) #define wake_up_interruptible_poll(x, m) \ __wake_up(x, TASK_INTERRUPTIBLE, 1, poll_to_key(m)) #define wake_up_interruptible_sync_poll(x, m) \ __wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, poll_to_key(m)) /** * wake_up_pollfree - signal that a polled waitqueue is going away * @wq_head: the wait queue head * * In the very rare cases where a ->poll() implementation uses a waitqueue whose * lifetime is tied to a task rather than to the 'struct file' being polled, * this function must be called before the waitqueue is freed so that * non-blocking polls (e.g. epoll) are notified that the queue is going away. * * The caller must also RCU-delay the freeing of the wait_queue_head, e.g. via * an explicit synchronize_rcu() or call_rcu(), or via SLAB_TYPESAFE_BY_RCU. */ static inline void wake_up_pollfree(struct wait_queue_head *wq_head) { /* * For performance reasons, we don't always take the queue lock here. * Therefore, we might race with someone removing the last entry from * the queue, and proceed while they still hold the queue lock. * However, rcu_read_lock() is required to be held in such cases, so we * can safely proceed with an RCU-delayed free. */ if (waitqueue_active(wq_head)) __wake_up_pollfree(wq_head); } #define ___wait_cond_timeout(condition) \ ({ \ bool __cond = (condition); \ if (__cond && !__ret) \ __ret = 1; \ __cond || !__ret; \ }) #define ___wait_is_interruptible(state) \ (!__builtin_constant_p(state) || \ state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \ extern void init_wait_entry(struct wait_queue_entry *wq_entry, int flags); /* * The below macro ___wait_event() has an explicit shadow of the __ret * variable when used from the wait_event_*() macros. * * This is so that both can use the ___wait_cond_timeout() construct * to wrap the condition. * * The type inconsistency of the wait_event_*() __ret variable is also * on purpose; we use long where we can return timeout values and int * otherwise. */ #define ___wait_event(wq_head, condition, state, exclusive, ret, cmd) \ ({ \ __label__ __out; \ struct wait_queue_entry __wq_entry; \ long __ret = ret; /* explicit shadow */ \ \ init_wait_entry(&__wq_entry, exclusive ? WQ_FLAG_EXCLUSIVE : 0); \ for (;;) { \ long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state);\ \ if (condition) \ break; \ \ if (___wait_is_interruptible(state) && __int) { \ __ret = __int; \ goto __out; \ } \ \ cmd; \ } \ finish_wait(&wq_head, &__wq_entry); \ __out: __ret; \ }) #define __wait_event(wq_head, condition) \ (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \ schedule()) /** * wait_event - sleep until a condition gets true * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the * @condition evaluates to true. The @condition is checked each time * the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. */ #define wait_event(wq_head, condition) \ do { \ might_sleep(); \ if (condition) \ break; \ __wait_event(wq_head, condition); \ } while (0) #define __io_wait_event(wq_head, condition) \ (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \ io_schedule()) /* * io_wait_event() -- like wait_event() but with io_schedule() */ #define io_wait_event(wq_head, condition) \ do { \ might_sleep(); \ if (condition) \ break; \ __io_wait_event(wq_head, condition); \ } while (0) #define __wait_event_freezable(wq_head, condition) \ ___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0, \ schedule(); try_to_freeze()) /** * wait_event_freezable - sleep (or freeze) until a condition gets true * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_INTERRUPTIBLE -- so as not to contribute * to system load) until the @condition evaluates to true. The * @condition is checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. */ #define wait_event_freezable(wq_head, condition) \ ({ \ int __ret = 0; \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_freezable(wq_head, condition); \ __ret; \ }) #define __wait_event_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_UNINTERRUPTIBLE, 0, timeout, \ __ret = schedule_timeout(__ret)) /** * wait_event_timeout - sleep until a condition gets true or a timeout elapses * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @timeout: timeout, in jiffies * * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the * @condition evaluates to true. The @condition is checked each time * the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * Returns: * 0 if the @condition evaluated to %false after the @timeout elapsed, * 1 if the @condition evaluated to %true after the @timeout elapsed, * or the remaining jiffies (at least 1) if the @condition evaluated * to %true before the @timeout elapsed. */ #define wait_event_timeout(wq_head, condition, timeout) \ ({ \ long __ret = timeout; \ might_sleep(); \ if (!___wait_cond_timeout(condition)) \ __ret = __wait_event_timeout(wq_head, condition, timeout); \ __ret; \ }) #define __wait_event_freezable_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_INTERRUPTIBLE, 0, timeout, \ __ret = schedule_timeout(__ret); try_to_freeze()) /* * like wait_event_timeout() -- except it uses TASK_INTERRUPTIBLE to avoid * increasing load and is freezable. */ #define wait_event_freezable_timeout(wq_head, condition, timeout) \ ({ \ long __ret = timeout; \ might_sleep(); \ if (!___wait_cond_timeout(condition)) \ __ret = __wait_event_freezable_timeout(wq_head, condition, timeout); \ __ret; \ }) #define __wait_event_exclusive_cmd(wq_head, condition, cmd1, cmd2) \ (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 1, 0, \ cmd1; schedule(); cmd2) /* * Just like wait_event_cmd(), except it sets exclusive flag */ #define wait_event_exclusive_cmd(wq_head, condition, cmd1, cmd2) \ do { \ if (condition) \ break; \ __wait_event_exclusive_cmd(wq_head, condition, cmd1, cmd2); \ } while (0) #define __wait_event_cmd(wq_head, condition, cmd1, cmd2) \ (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \ cmd1; schedule(); cmd2) /** * wait_event_cmd - sleep until a condition gets true * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @cmd1: the command will be executed before sleep * @cmd2: the command will be executed after sleep * * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the * @condition evaluates to true. The @condition is checked each time * the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. */ #define wait_event_cmd(wq_head, condition, cmd1, cmd2) \ do { \ if (condition) \ break; \ __wait_event_cmd(wq_head, condition, cmd1, cmd2); \ } while (0) #define __wait_event_interruptible(wq_head, condition) \ ___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0, \ schedule()) /** * wait_event_interruptible - sleep until a condition gets true * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or a signal is received. * The @condition is checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * The function will return -ERESTARTSYS if it was interrupted by a * signal and 0 if @condition evaluated to true. */ #define wait_event_interruptible(wq_head, condition) \ ({ \ int __ret = 0; \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_interruptible(wq_head, condition); \ __ret; \ }) #define __wait_event_interruptible_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_INTERRUPTIBLE, 0, timeout, \ __ret = schedule_timeout(__ret)) /** * wait_event_interruptible_timeout - sleep until a condition gets true or a timeout elapses * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @timeout: timeout, in jiffies * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or a signal is received. * The @condition is checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * Returns: * 0 if the @condition evaluated to %false after the @timeout elapsed, * 1 if the @condition evaluated to %true after the @timeout elapsed, * the remaining jiffies (at least 1) if the @condition evaluated * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was * interrupted by a signal. */ #define wait_event_interruptible_timeout(wq_head, condition, timeout) \ ({ \ long __ret = timeout; \ might_sleep(); \ if (!___wait_cond_timeout(condition)) \ __ret = __wait_event_interruptible_timeout(wq_head, \ condition, timeout); \ __ret; \ }) #define __wait_event_hrtimeout(wq_head, condition, timeout, state) \ ({ \ int __ret = 0; \ struct hrtimer_sleeper __t; \ \ hrtimer_init_on_stack(&__t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); \ hrtimer_init_sleeper(&__t, current); \ if ((timeout) != KTIME_MAX) \ hrtimer_start_range_ns(&__t.timer, timeout, \ current->timer_slack_ns, \ HRTIMER_MODE_REL); \ \ __ret = ___wait_event(wq_head, condition, state, 0, 0, \ if (!__t.task) { \ __ret = -ETIME; \ break; \ } \ schedule()); \ \ hrtimer_cancel(&__t.timer); \ destroy_hrtimer_on_stack(&__t.timer); \ __ret; \ }) /** * wait_event_hrtimeout - sleep until a condition gets true or a timeout elapses * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @timeout: timeout, as a ktime_t * * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the * @condition evaluates to true or a signal is received. * The @condition is checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * The function returns 0 if @condition became true, or -ETIME if the timeout * elapsed. */ #define wait_event_hrtimeout(wq_head, condition, timeout) \ ({ \ int __ret = 0; \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_hrtimeout(wq_head, condition, timeout, \ TASK_UNINTERRUPTIBLE); \ __ret; \ }) /** * wait_event_interruptible_hrtimeout - sleep until a condition gets true or a timeout elapses * @wq: the waitqueue to wait on * @condition: a C expression for the event to wait for * @timeout: timeout, as a ktime_t * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or a signal is received. * The @condition is checked each time the waitqueue @wq is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * The function returns 0 if @condition became true, -ERESTARTSYS if it was * interrupted by a signal, or -ETIME if the timeout elapsed. */ #define wait_event_interruptible_hrtimeout(wq, condition, timeout) \ ({ \ long __ret = 0; \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_hrtimeout(wq, condition, timeout, \ TASK_INTERRUPTIBLE); \ __ret; \ }) #define __wait_event_interruptible_exclusive(wq, condition) \ ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0, \ schedule()) #define wait_event_interruptible_exclusive(wq, condition) \ ({ \ int __ret = 0; \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_interruptible_exclusive(wq, condition); \ __ret; \ }) #define __wait_event_killable_exclusive(wq, condition) \ ___wait_event(wq, condition, TASK_KILLABLE, 1, 0, \ schedule()) #define wait_event_killable_exclusive(wq, condition) \ ({ \ int __ret = 0; \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_killable_exclusive(wq, condition); \ __ret; \ }) #define __wait_event_freezable_exclusive(wq, condition) \ ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0, \ schedule(); try_to_freeze()) #define wait_event_freezable_exclusive(wq, condition) \ ({ \ int __ret = 0; \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_freezable_exclusive(wq, condition); \ __ret; \ }) /** * wait_event_idle - wait for a condition without contributing to system load * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_IDLE) until the * @condition evaluates to true. * The @condition is checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * */ #define wait_event_idle(wq_head, condition) \ do { \ might_sleep(); \ if (!(condition)) \ ___wait_event(wq_head, condition, TASK_IDLE, 0, 0, schedule()); \ } while (0) /** * wait_event_idle_exclusive - wait for a condition with contributing to system load * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_IDLE) until the * @condition evaluates to true. * The @condition is checked each time the waitqueue @wq_head is woken up. * * The process is put on the wait queue with an WQ_FLAG_EXCLUSIVE flag * set thus if other processes wait on the same list, when this * process is woken further processes are not considered. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * */ #define wait_event_idle_exclusive(wq_head, condition) \ do { \ might_sleep(); \ if (!(condition)) \ ___wait_event(wq_head, condition, TASK_IDLE, 1, 0, schedule()); \ } while (0) #define __wait_event_idle_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_IDLE, 0, timeout, \ __ret = schedule_timeout(__ret)) /** * wait_event_idle_timeout - sleep without load until a condition becomes true or a timeout elapses * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @timeout: timeout, in jiffies * * The process is put to sleep (TASK_IDLE) until the * @condition evaluates to true. The @condition is checked each time * the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * Returns: * 0 if the @condition evaluated to %false after the @timeout elapsed, * 1 if the @condition evaluated to %true after the @timeout elapsed, * or the remaining jiffies (at least 1) if the @condition evaluated * to %true before the @timeout elapsed. */ #define wait_event_idle_timeout(wq_head, condition, timeout) \ ({ \ long __ret = timeout; \ might_sleep(); \ if (!___wait_cond_timeout(condition)) \ __ret = __wait_event_idle_timeout(wq_head, condition, timeout); \ __ret; \ }) #define __wait_event_idle_exclusive_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_IDLE, 1, timeout, \ __ret = schedule_timeout(__ret)) /** * wait_event_idle_exclusive_timeout - sleep without load until a condition becomes true or a timeout elapses * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @timeout: timeout, in jiffies * * The process is put to sleep (TASK_IDLE) until the * @condition evaluates to true. The @condition is checked each time * the waitqueue @wq_head is woken up. * * The process is put on the wait queue with an WQ_FLAG_EXCLUSIVE flag * set thus if other processes wait on the same list, when this * process is woken further processes are not considered. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * Returns: * 0 if the @condition evaluated to %false after the @timeout elapsed, * 1 if the @condition evaluated to %true after the @timeout elapsed, * or the remaining jiffies (at least 1) if the @condition evaluated * to %true before the @timeout elapsed. */ #define wait_event_idle_exclusive_timeout(wq_head, condition, timeout) \ ({ \ long __ret = timeout; \ might_sleep(); \ if (!___wait_cond_timeout(condition)) \ __ret = __wait_event_idle_exclusive_timeout(wq_head, condition, timeout);\ __ret; \ }) extern int do_wait_intr(wait_queue_head_t *, wait_queue_entry_t *); extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *); #define __wait_event_interruptible_locked(wq, condition, exclusive, fn) \ ({ \ int __ret; \ DEFINE_WAIT(__wait); \ if (exclusive) \ __wait.flags |= WQ_FLAG_EXCLUSIVE; \ do { \ __ret = fn(&(wq), &__wait); \ if (__ret) \ break; \ } while (!(condition)); \ __remove_wait_queue(&(wq), &__wait); \ __set_current_state(TASK_RUNNING); \ __ret; \ }) /** * wait_event_interruptible_locked - sleep until a condition gets true * @wq: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or a signal is received. * The @condition is checked each time the waitqueue @wq is woken up. * * It must be called with wq.lock being held. This spinlock is * unlocked while sleeping but @condition testing is done while lock * is held and when this macro exits the lock is held. * * The lock is locked/unlocked using spin_lock()/spin_unlock() * functions which must match the way they are locked/unlocked outside * of this macro. * * wake_up_locked() has to be called after changing any variable that could * change the result of the wait condition. * * The function will return -ERESTARTSYS if it was interrupted by a * signal and 0 if @condition evaluated to true. */ #define wait_event_interruptible_locked(wq, condition) \ ((condition) \ ? 0 : __wait_event_interruptible_locked(wq, condition, 0, do_wait_intr)) /** * wait_event_interruptible_locked_irq - sleep until a condition gets true * @wq: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or a signal is received. * The @condition is checked each time the waitqueue @wq is woken up. * * It must be called with wq.lock being held. This spinlock is * unlocked while sleeping but @condition testing is done while lock * is held and when this macro exits the lock is held. * * The lock is locked/unlocked using spin_lock_irq()/spin_unlock_irq() * functions which must match the way they are locked/unlocked outside * of this macro. * * wake_up_locked() has to be called after changing any variable that could * change the result of the wait condition. * * The function will return -ERESTARTSYS if it was interrupted by a * signal and 0 if @condition evaluated to true. */ #define wait_event_interruptible_locked_irq(wq, condition) \ ((condition) \ ? 0 : __wait_event_interruptible_locked(wq, condition, 0, do_wait_intr_irq)) /** * wait_event_interruptible_exclusive_locked - sleep exclusively until a condition gets true * @wq: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or a signal is received. * The @condition is checked each time the waitqueue @wq is woken up. * * It must be called with wq.lock being held. This spinlock is * unlocked while sleeping but @condition testing is done while lock * is held and when this macro exits the lock is held. * * The lock is locked/unlocked using spin_lock()/spin_unlock() * functions which must match the way they are locked/unlocked outside * of this macro. * * The process is put on the wait queue with an WQ_FLAG_EXCLUSIVE flag * set thus when other process waits process on the list if this * process is awaken further processes are not considered. * * wake_up_locked() has to be called after changing any variable that could * change the result of the wait condition. * * The function will return -ERESTARTSYS if it was interrupted by a * signal and 0 if @condition evaluated to true. */ #define wait_event_interruptible_exclusive_locked(wq, condition) \ ((condition) \ ? 0 : __wait_event_interruptible_locked(wq, condition, 1, do_wait_intr)) /** * wait_event_interruptible_exclusive_locked_irq - sleep until a condition gets true * @wq: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or a signal is received. * The @condition is checked each time the waitqueue @wq is woken up. * * It must be called with wq.lock being held. This spinlock is * unlocked while sleeping but @condition testing is done while lock * is held and when this macro exits the lock is held. * * The lock is locked/unlocked using spin_lock_irq()/spin_unlock_irq() * functions which must match the way they are locked/unlocked outside * of this macro. * * The process is put on the wait queue with an WQ_FLAG_EXCLUSIVE flag * set thus when other process waits process on the list if this * process is awaken further processes are not considered. * * wake_up_locked() has to be called after changing any variable that could * change the result of the wait condition. * * The function will return -ERESTARTSYS if it was interrupted by a * signal and 0 if @condition evaluated to true. */ #define wait_event_interruptible_exclusive_locked_irq(wq, condition) \ ((condition) \ ? 0 : __wait_event_interruptible_locked(wq, condition, 1, do_wait_intr_irq)) #define __wait_event_killable(wq, condition) \ ___wait_event(wq, condition, TASK_KILLABLE, 0, 0, schedule()) /** * wait_event_killable - sleep until a condition gets true * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * * The process is put to sleep (TASK_KILLABLE) until the * @condition evaluates to true or a signal is received. * The @condition is checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * The function will return -ERESTARTSYS if it was interrupted by a * signal and 0 if @condition evaluated to true. */ #define wait_event_killable(wq_head, condition) \ ({ \ int __ret = 0; \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_killable(wq_head, condition); \ __ret; \ }) #define __wait_event_killable_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_KILLABLE, 0, timeout, \ __ret = schedule_timeout(__ret)) /** * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @timeout: timeout, in jiffies * * The process is put to sleep (TASK_KILLABLE) until the * @condition evaluates to true or a kill signal is received. * The @condition is checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * Returns: * 0 if the @condition evaluated to %false after the @timeout elapsed, * 1 if the @condition evaluated to %true after the @timeout elapsed, * the remaining jiffies (at least 1) if the @condition evaluated * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was * interrupted by a kill signal. * * Only kill signals interrupt this process. */ #define wait_event_killable_timeout(wq_head, condition, timeout) \ ({ \ long __ret = timeout; \ might_sleep(); \ if (!___wait_cond_timeout(condition)) \ __ret = __wait_event_killable_timeout(wq_head, \ condition, timeout); \ __ret; \ }) #define __wait_event_lock_irq(wq_head, condition, lock, cmd) \ (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \ spin_unlock_irq(&lock); \ cmd; \ schedule(); \ spin_lock_irq(&lock)) /** * wait_event_lock_irq_cmd - sleep until a condition gets true. The * condition is checked under the lock. This * is expected to be called with the lock * taken. * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @lock: a locked spinlock_t, which will be released before cmd * and schedule() and reacquired afterwards. * @cmd: a command which is invoked outside the critical section before * sleep * * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the * @condition evaluates to true. The @condition is checked each time * the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * This is supposed to be called while holding the lock. The lock is * dropped before invoking the cmd and going to sleep and is reacquired * afterwards. */ #define wait_event_lock_irq_cmd(wq_head, condition, lock, cmd) \ do { \ if (condition) \ break; \ __wait_event_lock_irq(wq_head, condition, lock, cmd); \ } while (0) /** * wait_event_lock_irq - sleep until a condition gets true. The * condition is checked under the lock. This * is expected to be called with the lock * taken. * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @lock: a locked spinlock_t, which will be released before schedule() * and reacquired afterwards. * * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the * @condition evaluates to true. The @condition is checked each time * the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * This is supposed to be called while holding the lock. The lock is * dropped before going to sleep and is reacquired afterwards. */ #define wait_event_lock_irq(wq_head, condition, lock) \ do { \ if (condition) \ break; \ __wait_event_lock_irq(wq_head, condition, lock, ); \ } while (0) #define __wait_event_interruptible_lock_irq(wq_head, condition, lock, cmd) \ ___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0, \ spin_unlock_irq(&lock); \ cmd; \ schedule(); \ spin_lock_irq(&lock)) /** * wait_event_interruptible_lock_irq_cmd - sleep until a condition gets true. * The condition is checked under the lock. This is expected to * be called with the lock taken. * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @lock: a locked spinlock_t, which will be released before cmd and * schedule() and reacquired afterwards. * @cmd: a command which is invoked outside the critical section before * sleep * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or a signal is received. The @condition is * checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * This is supposed to be called while holding the lock. The lock is * dropped before invoking the cmd and going to sleep and is reacquired * afterwards. * * The macro will return -ERESTARTSYS if it was interrupted by a signal * and 0 if @condition evaluated to true. */ #define wait_event_interruptible_lock_irq_cmd(wq_head, condition, lock, cmd) \ ({ \ int __ret = 0; \ if (!(condition)) \ __ret = __wait_event_interruptible_lock_irq(wq_head, \ condition, lock, cmd); \ __ret; \ }) /** * wait_event_interruptible_lock_irq - sleep until a condition gets true. * The condition is checked under the lock. This is expected * to be called with the lock taken. * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @lock: a locked spinlock_t, which will be released before schedule() * and reacquired afterwards. * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or signal is received. The @condition is * checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * This is supposed to be called while holding the lock. The lock is * dropped before going to sleep and is reacquired afterwards. * * The macro will return -ERESTARTSYS if it was interrupted by a signal * and 0 if @condition evaluated to true. */ #define wait_event_interruptible_lock_irq(wq_head, condition, lock) \ ({ \ int __ret = 0; \ if (!(condition)) \ __ret = __wait_event_interruptible_lock_irq(wq_head, \ condition, lock,); \ __ret; \ }) #define __wait_event_lock_irq_timeout(wq_head, condition, lock, timeout, state) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ state, 0, timeout, \ spin_unlock_irq(&lock); \ __ret = schedule_timeout(__ret); \ spin_lock_irq(&lock)); /** * wait_event_interruptible_lock_irq_timeout - sleep until a condition gets * true or a timeout elapses. The condition is checked under * the lock. This is expected to be called with the lock taken. * @wq_head: the waitqueue to wait on * @condition: a C expression for the event to wait for * @lock: a locked spinlock_t, which will be released before schedule() * and reacquired afterwards. * @timeout: timeout, in jiffies * * The process is put to sleep (TASK_INTERRUPTIBLE) until the * @condition evaluates to true or signal is received. The @condition is * checked each time the waitqueue @wq_head is woken up. * * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * * This is supposed to be called while holding the lock. The lock is * dropped before going to sleep and is reacquired afterwards. * * The function returns 0 if the @timeout elapsed, -ERESTARTSYS if it * was interrupted by a signal, and the remaining jiffies otherwise * if the condition evaluated to true before the timeout elapsed. */ #define wait_event_interruptible_lock_irq_timeout(wq_head, condition, lock, \ timeout) \ ({ \ long __ret = timeout; \ if (!___wait_cond_timeout(condition)) \ __ret = __wait_event_lock_irq_timeout( \ wq_head, condition, lock, timeout, \ TASK_INTERRUPTIBLE); \ __ret; \ }) #define wait_event_lock_irq_timeout(wq_head, condition, lock, timeout) \ ({ \ long __ret = timeout; \ if (!___wait_cond_timeout(condition)) \ __ret = __wait_event_lock_irq_timeout( \ wq_head, condition, lock, timeout, \ TASK_UNINTERRUPTIBLE); \ __ret; \ }) /* * Waitqueues which are removed from the waitqueue_head at wakeup time */ void prepare_to_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state); void prepare_to_wait_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state); long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state); void finish_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry); long wait_woken(struct wait_queue_entry *wq_entry, unsigned mode, long timeout); int woken_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key); int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key); #define DEFINE_WAIT_FUNC(name, function) \ struct wait_queue_entry name = { \ .private = current, \ .func = function, \ .entry = LIST_HEAD_INIT((name).entry), \ } #define DEFINE_WAIT(name) DEFINE_WAIT_FUNC(name, autoremove_wake_function) #define init_wait(wait) \ do { \ (wait)->private = current; \ (wait)->func = autoremove_wake_function; \ INIT_LIST_HEAD(&(wait)->entry); \ (wait)->flags = 0; \ } while (0) #endif /* _LINUX_WAIT_H */
35 35 34 25 25 25 25 11 11 11 10 10 10 10 11 10 35 35 35 35 25 25 25 11 11 11 11 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 /* RomFS storage access routines * * Copyright © 2007 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ #include <linux/fs.h> #include <linux/mtd/super.h> #include <linux/buffer_head.h> #include "internal.h" #if !defined(CONFIG_ROMFS_ON_MTD) && !defined(CONFIG_ROMFS_ON_BLOCK) #error no ROMFS backing store interface configured #endif #ifdef CONFIG_ROMFS_ON_MTD #define ROMFS_MTD_READ(sb, ...) mtd_read((sb)->s_mtd, ##__VA_ARGS__) /* * read data from an romfs image on an MTD device */ static int romfs_mtd_read(struct super_block *sb, unsigned long pos, void *buf, size_t buflen) { size_t rlen; int ret; ret = ROMFS_MTD_READ(sb, pos, buflen, &rlen, buf); return (ret < 0 || rlen != buflen) ? -EIO : 0; } /* * determine the length of a string in a romfs image on an MTD device */ static ssize_t romfs_mtd_strnlen(struct super_block *sb, unsigned long pos, size_t maxlen) { ssize_t n = 0; size_t segment; u_char buf[16], *p; size_t len; int ret; /* scan the string up to 16 bytes at a time */ while (maxlen > 0) { segment = min_t(size_t, maxlen, 16); ret = ROMFS_MTD_READ(sb, pos, segment, &len, buf); if (ret < 0) return ret; p = memchr(buf, 0, len); if (p) return n + (p - buf); maxlen -= len; pos += len; n += len; } return n; } /* * compare a string to one in a romfs image on MTD * - return 1 if matched, 0 if differ, -ve if error */ static int romfs_mtd_strcmp(struct super_block *sb, unsigned long pos, const char *str, size_t size) { u_char buf[17]; size_t len, segment; int ret; /* scan the string up to 16 bytes at a time, and attempt to grab the * trailing NUL whilst we're at it */ buf[0] = 0xff; while (size > 0) { segment = min_t(size_t, size + 1, 17); ret = ROMFS_MTD_READ(sb, pos, segment, &len, buf); if (ret < 0) return ret; len--; if (memcmp(buf, str, len) != 0) return 0; buf[0] = buf[len]; size -= len; pos += len; str += len; } /* check the trailing NUL was */ if (buf[0]) return 0; return 1; } #endif /* CONFIG_ROMFS_ON_MTD */ #ifdef CONFIG_ROMFS_ON_BLOCK /* * read data from an romfs image on a block device */ static int romfs_blk_read(struct super_block *sb, unsigned long pos, void *buf, size_t buflen) { struct buffer_head *bh; unsigned long offset; size_t segment; /* copy the string up to blocksize bytes at a time */ while (buflen > 0) { offset = pos & (ROMBSIZE - 1); segment = min_t(size_t, buflen, ROMBSIZE - offset); bh = sb_bread(sb, pos >> ROMBSBITS); if (!bh) return -EIO; memcpy(buf, bh->b_data + offset, segment); brelse(bh); buf += segment; buflen -= segment; pos += segment; } return 0; } /* * determine the length of a string in romfs on a block device */ static ssize_t romfs_blk_strnlen(struct super_block *sb, unsigned long pos, size_t limit) { struct buffer_head *bh; unsigned long offset; ssize_t n = 0; size_t segment; u_char *buf, *p; /* scan the string up to blocksize bytes at a time */ while (limit > 0) { offset = pos & (ROMBSIZE - 1); segment = min_t(size_t, limit, ROMBSIZE - offset); bh = sb_bread(sb, pos >> ROMBSBITS); if (!bh) return -EIO; buf = bh->b_data + offset; p = memchr(buf, 0, segment); brelse(bh); if (p) return n + (p - buf); limit -= segment; pos += segment; n += segment; } return n; } /* * compare a string to one in a romfs image on a block device * - return 1 if matched, 0 if differ, -ve if error */ static int romfs_blk_strcmp(struct super_block *sb, unsigned long pos, const char *str, size_t size) { struct buffer_head *bh; unsigned long offset; size_t segment; bool matched, terminated = false; /* compare string up to a block at a time */ while (size > 0) { offset = pos & (ROMBSIZE - 1); segment = min_t(size_t, size, ROMBSIZE - offset); bh = sb_bread(sb, pos >> ROMBSBITS); if (!bh) return -EIO; matched = (memcmp(bh->b_data + offset, str, segment) == 0); size -= segment; pos += segment; str += segment; if (matched && size == 0 && offset + segment < ROMBSIZE) { if (!bh->b_data[offset + segment]) terminated = true; else matched = false; } brelse(bh); if (!matched) return 0; } if (!terminated) { /* the terminating NUL must be on the first byte of the next * block */ BUG_ON((pos & (ROMBSIZE - 1)) != 0); bh = sb_bread(sb, pos >> ROMBSBITS); if (!bh) return -EIO; matched = !bh->b_data[0]; brelse(bh); if (!matched) return 0; } return 1; } #endif /* CONFIG_ROMFS_ON_BLOCK */ /* * read data from the romfs image */ int romfs_dev_read(struct super_block *sb, unsigned long pos, void *buf, size_t buflen) { size_t limit; limit = romfs_maxsize(sb); if (pos >= limit || buflen > limit - pos) return -EIO; #ifdef CONFIG_ROMFS_ON_MTD if (sb->s_mtd) return romfs_mtd_read(sb, pos, buf, buflen); #endif #ifdef CONFIG_ROMFS_ON_BLOCK if (sb->s_bdev) return romfs_blk_read(sb, pos, buf, buflen); #endif return -EIO; } /* * determine the length of a string in romfs */ ssize_t romfs_dev_strnlen(struct super_block *sb, unsigned long pos, size_t maxlen) { size_t limit; limit = romfs_maxsize(sb); if (pos >= limit) return -EIO; if (maxlen > limit - pos) maxlen = limit - pos; #ifdef CONFIG_ROMFS_ON_MTD if (sb->s_mtd) return romfs_mtd_strnlen(sb, pos, maxlen); #endif #ifdef CONFIG_ROMFS_ON_BLOCK if (sb->s_bdev) return romfs_blk_strnlen(sb, pos, maxlen); #endif return -EIO; } /* * compare a string to one in romfs * - the string to be compared to, str, may not be NUL-terminated; instead the * string is of the specified size * - return 1 if matched, 0 if differ, -ve if error */ int romfs_dev_strcmp(struct super_block *sb, unsigned long pos, const char *str, size_t size) { size_t limit; limit = romfs_maxsize(sb); if (pos >= limit) return -EIO; if (size > ROMFS_MAXFN) return -ENAMETOOLONG; if (size + 1 > limit - pos) return -EIO; #ifdef CONFIG_ROMFS_ON_MTD if (sb->s_mtd) return romfs_mtd_strcmp(sb, pos, str, size); #endif #ifdef CONFIG_ROMFS_ON_BLOCK if (sb->s_bdev) return romfs_blk_strcmp(sb, pos, str, size); #endif return -EIO; }
5560 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __LINUX_COMPLETION_H #define __LINUX_COMPLETION_H /* * (C) Copyright 2001 Linus Torvalds * * Atomic wait-for-completion handler data structures. * See kernel/sched/completion.c for details. */ #include <linux/wait.h> /* * struct completion - structure used to maintain state for a "completion" * * This is the opaque structure used to maintain the state for a "completion". * Completions currently use a FIFO to queue threads that have to wait for * the "completion" event. * * See also: complete(), wait_for_completion() (and friends _timeout, * _interruptible, _interruptible_timeout, and _killable), init_completion(), * reinit_completion(), and macros DECLARE_COMPLETION(), * DECLARE_COMPLETION_ONSTACK(). */ struct completion { unsigned int done; wait_queue_head_t wait; }; #define init_completion_map(x, m) __init_completion(x) #define init_completion(x) __init_completion(x) static inline void complete_acquire(struct completion *x) {} static inline void complete_release(struct completion *x) {} #define COMPLETION_INITIALIZER(work) \ { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) } #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \ (*({ init_completion_map(&(work), &(map)); &(work); })) #define COMPLETION_INITIALIZER_ONSTACK(work) \ (*({ init_completion(&work); &work; })) /** * DECLARE_COMPLETION - declare and initialize a completion structure * @work: identifier for the completion structure * * This macro declares and initializes a completion structure. Generally used * for static declarations. You should use the _ONSTACK variant for automatic * variables. */ #define DECLARE_COMPLETION(work) \ struct completion work = COMPLETION_INITIALIZER(work) /* * Lockdep needs to run a non-constant initializer for on-stack * completions - so we use the _ONSTACK() variant for those that * are on the kernel stack: */ /** * DECLARE_COMPLETION_ONSTACK - declare and initialize a completion structure * @work: identifier for the completion structure * * This macro declares and initializes a completion structure on the kernel * stack. */ #ifdef CONFIG_LOCKDEP # define DECLARE_COMPLETION_ONSTACK(work) \ struct completion work = COMPLETION_INITIALIZER_ONSTACK(work) # define DECLARE_COMPLETION_ONSTACK_MAP(work, map) \ struct completion work = COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) #else # define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work) # define DECLARE_COMPLETION_ONSTACK_MAP(work, map) DECLARE_COMPLETION(work) #endif /** * init_completion - Initialize a dynamically allocated completion * @x: pointer to completion structure that is to be initialized * * This inline function will initialize a dynamically created completion * structure. */ static inline void __init_completion(struct completion *x) { x->done = 0; init_waitqueue_head(&x->wait); } /** * reinit_completion - reinitialize a completion structure * @x: pointer to completion structure that is to be reinitialized * * This inline function should be used to reinitialize a completion structure so it can * be reused. This is especially important after complete_all() is used. */ static inline void reinit_completion(struct completion *x) { x->done = 0; } extern void wait_for_completion(struct completion *); extern void wait_for_completion_io(struct completion *); extern int wait_for_completion_interruptible(struct completion *x); extern int wait_for_completion_killable(struct completion *x); extern unsigned long wait_for_completion_timeout(struct completion *x, unsigned long timeout); extern unsigned long wait_for_completion_io_timeout(struct completion *x, unsigned long timeout); extern long wait_for_completion_interruptible_timeout( struct completion *x, unsigned long timeout); extern long wait_for_completion_killable_timeout( struct completion *x, unsigned long timeout); extern bool try_wait_for_completion(struct completion *x); extern bool completion_done(struct completion *x); extern void complete(struct completion *); extern void complete_all(struct completion *); #endif
3 3 3 3 3 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 /* * linux/fs/nls/nls_cp737.c * * Charset cp737 translation tables. * Generated automatically from the Unicode and charset * tables from the Unicode Organization (www.unicode.org). * The Unicode to charset table has only exact mappings. */ #include <linux/module.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/nls.h> #include <linux/errno.h> static const wchar_t charset2uni[256] = { /* 0x00*/ 0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008, 0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x000e, 0x000f, /* 0x10*/ 0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0015, 0x0016, 0x0017, 0x0018, 0x0019, 0x001a, 0x001b, 0x001c, 0x001d, 0x001e, 0x001f, /* 0x20*/ 0x0020, 0x0021, 0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027, 0x0028, 0x0029, 0x002a, 0x002b, 0x002c, 0x002d, 0x002e, 0x002f, /* 0x30*/ 0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037, 0x0038, 0x0039, 0x003a, 0x003b, 0x003c, 0x003d, 0x003e, 0x003f, /* 0x40*/ 0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047, 0x0048, 0x0049, 0x004a, 0x004b, 0x004c, 0x004d, 0x004e, 0x004f, /* 0x50*/ 0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057, 0x0058, 0x0059, 0x005a, 0x005b, 0x005c, 0x005d, 0x005e, 0x005f, /* 0x60*/ 0x0060, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067, 0x0068, 0x0069, 0x006a, 0x006b, 0x006c, 0x006d, 0x006e, 0x006f, /* 0x70*/ 0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077, 0x0078, 0x0079, 0x007a, 0x007b, 0x007c, 0x007d, 0x007e, 0x007f, /* 0x80*/ 0x0391, 0x0392, 0x0393, 0x0394, 0x0395, 0x0396, 0x0397, 0x0398, 0x0399, 0x039a, 0x039b, 0x039c, 0x039d, 0x039e, 0x039f, 0x03a0, /* 0x90*/ 0x03a1, 0x03a3, 0x03a4, 0x03a5, 0x03a6, 0x03a7, 0x03a8, 0x03a9, 0x03b1, 0x03b2, 0x03b3, 0x03b4, 0x03b5, 0x03b6, 0x03b7, 0x03b8, /* 0xa0*/ 0x03b9, 0x03ba, 0x03bb, 0x03bc, 0x03bd, 0x03be, 0x03bf, 0x03c0, 0x03c1, 0x03c3, 0x03c2, 0x03c4, 0x03c5, 0x03c6, 0x03c7, 0x03c8, /* 0xb0*/ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556, 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510, /* 0xc0*/ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f, 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567, /* 0xd0*/ 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b, 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580, /* 0xe0*/ 0x03c9, 0x03ac, 0x03ad, 0x03ae, 0x03ca, 0x03af, 0x03cc, 0x03cd, 0x03cb, 0x03ce, 0x0386, 0x0388, 0x0389, 0x038a, 0x038c, 0x038e, /* 0xf0*/ 0x038f, 0x00b1, 0x2265, 0x2264, 0x03aa, 0x03ab, 0x00f7, 0x2248, 0x00b0, 0x2219, 0x00b7, 0x221a, 0x207f, 0x00b2, 0x25a0, 0x00a0, }; static const unsigned char page00[256] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* 0x00-0x07 */ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, /* 0x08-0x0f */ 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, /* 0x10-0x17 */ 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, /* 0x18-0x1f */ 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, /* 0x20-0x27 */ 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, /* 0x28-0x2f */ 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, /* 0x30-0x37 */ 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, /* 0x38-0x3f */ 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, /* 0x40-0x47 */ 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, /* 0x48-0x4f */ 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, /* 0x50-0x57 */ 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, /* 0x58-0x5f */ 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, /* 0x60-0x67 */ 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, /* 0x68-0x6f */ 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, /* 0x70-0x77 */ 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa8-0xaf */ 0xf8, 0xf1, 0xfd, 0x00, 0x00, 0x00, 0x00, 0xfa, /* 0xb0-0xb7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb8-0xbf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc0-0xc7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc8-0xcf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd0-0xd7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd8-0xdf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf6, /* 0xf0-0xf7 */ }; static const unsigned char page03[256] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xea, 0x00, /* 0x80-0x87 */ 0xeb, 0xec, 0xed, 0x00, 0xee, 0x00, 0xef, 0xf0, /* 0x88-0x8f */ 0x00, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, /* 0x90-0x97 */ 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, /* 0x98-0x9f */ 0x8f, 0x90, 0x00, 0x91, 0x92, 0x93, 0x94, 0x95, /* 0xa0-0xa7 */ 0x96, 0x97, 0xf4, 0xf5, 0xe1, 0xe2, 0xe3, 0xe5, /* 0xa8-0xaf */ 0x00, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, /* 0xb0-0xb7 */ 0x9f, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, /* 0xb8-0xbf */ 0xa7, 0xa8, 0xaa, 0xa9, 0xab, 0xac, 0xad, 0xae, /* 0xc0-0xc7 */ 0xaf, 0xe0, 0xe4, 0xe8, 0xe6, 0xe7, 0xe9, 0x00, /* 0xc8-0xcf */ }; static const unsigned char page20[256] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xfc, /* 0x78-0x7f */ }; static const unsigned char page22[256] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0x00, 0xf9, 0xfb, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0xf7, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0xf3, 0xf2, 0x00, 0x00, /* 0x60-0x67 */ }; static const unsigned char page25[256] = { 0xc4, 0x00, 0xb3, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0xda, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0xbf, 0x00, 0x00, 0x00, 0xc0, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0xd9, 0x00, 0x00, 0x00, 0xc3, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0xb4, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0xc2, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0xc1, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0xc5, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0xcd, 0xba, 0xd5, 0xd6, 0xc9, 0xb8, 0xb7, 0xbb, /* 0x50-0x57 */ 0xd4, 0xd3, 0xc8, 0xbe, 0xbd, 0xbc, 0xc6, 0xc7, /* 0x58-0x5f */ 0xcc, 0xb5, 0xb6, 0xb9, 0xd1, 0xd2, 0xcb, 0xcf, /* 0x60-0x67 */ 0xd0, 0xca, 0xd8, 0xd7, 0xce, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0xdf, 0x00, 0x00, 0x00, 0xdc, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0xdb, 0x00, 0x00, 0x00, 0xdd, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0xde, 0xb0, 0xb1, 0xb2, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0xfe, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ }; static const unsigned char *const page_uni2charset[256] = { page00, NULL, NULL, page03, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, page20, NULL, page22, NULL, NULL, page25, NULL, NULL, }; static const unsigned char charset2lower[256] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* 0x00-0x07 */ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, /* 0x08-0x0f */ 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, /* 0x10-0x17 */ 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, /* 0x18-0x1f */ 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, /* 0x20-0x27 */ 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, /* 0x28-0x2f */ 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, /* 0x30-0x37 */ 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, /* 0x38-0x3f */ 0x40, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, /* 0x40-0x47 */ 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, /* 0x48-0x4f */ 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, /* 0x50-0x57 */ 0x78, 0x79, 0x7a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, /* 0x58-0x5f */ 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, /* 0x60-0x67 */ 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, /* 0x68-0x6f */ 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, /* 0x70-0x77 */ 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, /* 0x78-0x7f */ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, /* 0x80-0x87 */ 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, /* 0x88-0x8f */ 0xa8, 0xa9, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xe0, /* 0x90-0x97 */ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, /* 0x98-0x9f */ 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, /* 0xa0-0xa7 */ 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, /* 0xa8-0xaf */ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, /* 0xb0-0xb7 */ 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, /* 0xb8-0xbf */ 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, /* 0xc0-0xc7 */ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, /* 0xc8-0xcf */ 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, /* 0xd0-0xd7 */ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, /* 0xd8-0xdf */ 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, /* 0xe0-0xe7 */ 0xe8, 0xe9, 0xe1, 0xe2, 0xe3, 0xe5, 0xe6, 0xe7, /* 0xe8-0xef */ 0xe9, 0xf1, 0xf2, 0xf3, 0xe4, 0xe8, 0xf6, 0xf7, /* 0xf0-0xf7 */ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff, /* 0xf8-0xff */ }; static const unsigned char charset2upper[256] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* 0x00-0x07 */ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, /* 0x08-0x0f */ 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, /* 0x10-0x17 */ 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, /* 0x18-0x1f */ 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, /* 0x20-0x27 */ 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, /* 0x28-0x2f */ 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, /* 0x30-0x37 */ 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, /* 0x38-0x3f */ 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, /* 0x40-0x47 */ 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, /* 0x48-0x4f */ 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, /* 0x50-0x57 */ 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, /* 0x58-0x5f */ 0x60, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, /* 0x60-0x67 */ 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, /* 0x68-0x6f */ 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, /* 0x70-0x77 */ 0x58, 0x59, 0x5a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, /* 0x78-0x7f */ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, /* 0x80-0x87 */ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, /* 0x88-0x8f */ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, /* 0x90-0x97 */ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, /* 0x98-0x9f */ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, /* 0xa0-0xa7 */ 0x90, 0x91, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, /* 0xa8-0xaf */ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, /* 0xb0-0xb7 */ 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, /* 0xb8-0xbf */ 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, /* 0xc0-0xc7 */ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, /* 0xc8-0xcf */ 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, /* 0xd0-0xd7 */ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, /* 0xd8-0xdf */ 0x97, 0xea, 0xeb, 0xec, 0xf4, 0xed, 0xee, 0xef, /* 0xe0-0xe7 */ 0xf5, 0xf0, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, /* 0xe8-0xef */ 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, /* 0xf0-0xf7 */ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff, /* 0xf8-0xff */ }; static int uni2char(wchar_t uni, unsigned char *out, int boundlen) { const unsigned char *uni2charset; unsigned char cl = uni & 0x00ff; unsigned char ch = (uni & 0xff00) >> 8; if (boundlen <= 0) return -ENAMETOOLONG; uni2charset = page_uni2charset[ch]; if (uni2charset && uni2charset[cl]) out[0] = uni2charset[cl]; else return -EINVAL; return 1; } static int char2uni(const unsigned char *rawstring, int boundlen, wchar_t *uni) { *uni = charset2uni[*rawstring]; if (*uni == 0x0000) return -EINVAL; return 1; } static struct nls_table table = { .charset = "cp737", .uni2char = uni2char, .char2uni = char2uni, .charset2lower = charset2lower, .charset2upper = charset2upper, }; static int __init init_nls_cp737(void) { return register_nls(&table); } static void __exit exit_nls_cp737(void) { unregister_nls(&table); } module_init(init_nls_cp737) module_exit(exit_nls_cp737) MODULE_LICENSE("Dual BSD/GPL");
5999 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 // SPDX-License-Identifier: GPL-2.0 #include <linux/export.h> #include <linux/spinlock.h> #include <linux/atomic.h> /* * This is an implementation of the notion of "decrement a * reference count, and return locked if it decremented to zero". * * NOTE NOTE NOTE! This is _not_ equivalent to * * if (atomic_dec_and_test(&atomic)) { * spin_lock(&lock); * return 1; * } * return 0; * * because the spin-lock and the decrement must be * "atomic". */ int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock) { /* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */ if (atomic_add_unless(atomic, -1, 1)) return 0; /* Otherwise do it the slow way */ spin_lock(lock); if (atomic_dec_and_test(atomic)) return 1; spin_unlock(lock); return 0; } EXPORT_SYMBOL(_atomic_dec_and_lock); int _atomic_dec_and_lock_irqsave(atomic_t *atomic, spinlock_t *lock, unsigned long *flags) { /* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */ if (atomic_add_unless(atomic, -1, 1)) return 0; /* Otherwise do it the slow way */ spin_lock_irqsave(lock, *flags); if (atomic_dec_and_test(atomic)) return 1; spin_unlock_irqrestore(lock, *flags); return 0; } EXPORT_SYMBOL(_atomic_dec_and_lock_irqsave);
43 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _SCSI_SCSI_REQUEST_H #define _SCSI_SCSI_REQUEST_H #include <linux/blk-mq.h> #define BLK_MAX_CDB 16 struct scsi_request { unsigned char __cmd[BLK_MAX_CDB]; unsigned char *cmd; unsigned short cmd_len; int result; unsigned int sense_len; unsigned int resid_len; /* residual count */ int retries; void *sense; }; static inline struct scsi_request *scsi_req(struct request *rq) { return blk_mq_rq_to_pdu(rq); } static inline void scsi_req_free_cmd(struct scsi_request *req) { if (req->cmd != req->__cmd) kfree(req->cmd); } void scsi_req_init(struct scsi_request *req); #endif /* _SCSI_SCSI_REQUEST_H */
31 30 28 27 27 15 9 13 12 11 15 14 10 7 4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 /* Copyright (c) 2017 Facebook * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public * License as published by the Free Software Foundation. */ #include <linux/slab.h> #include <linux/bpf.h> #include "map_in_map.h" struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd) { struct bpf_map *inner_map, *inner_map_meta; u32 inner_map_meta_size; struct fd f; f = fdget(inner_map_ufd); inner_map = __bpf_map_get(f); if (IS_ERR(inner_map)) return inner_map; /* prog_array->owner_prog_type and owner_jited * is a runtime binding. Doing static check alone * in the verifier is not enough. */ if (inner_map->map_type == BPF_MAP_TYPE_PROG_ARRAY || inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE) { fdput(f); return ERR_PTR(-ENOTSUPP); } /* Does not support >1 level map-in-map */ if (inner_map->inner_map_meta) { fdput(f); return ERR_PTR(-EINVAL); } inner_map_meta_size = sizeof(*inner_map_meta); /* In some cases verifier needs to access beyond just base map. */ if (inner_map->ops == &array_map_ops) inner_map_meta_size = sizeof(struct bpf_array); inner_map_meta = kzalloc(inner_map_meta_size, GFP_USER); if (!inner_map_meta) { fdput(f); return ERR_PTR(-ENOMEM); } inner_map_meta->map_type = inner_map->map_type; inner_map_meta->key_size = inner_map->key_size; inner_map_meta->value_size = inner_map->value_size; inner_map_meta->map_flags = inner_map->map_flags; inner_map_meta->max_entries = inner_map->max_entries; /* Misc members not needed in bpf_map_meta_equal() check. */ inner_map_meta->ops = inner_map->ops; if (inner_map->ops == &array_map_ops) { inner_map_meta->unpriv_array = inner_map->unpriv_array; container_of(inner_map_meta, struct bpf_array, map)->index_mask = container_of(inner_map, struct bpf_array, map)->index_mask; } fdput(f); return inner_map_meta; } void bpf_map_meta_free(struct bpf_map *map_meta) { kfree(map_meta); } bool bpf_map_meta_equal(const struct bpf_map *meta0, const struct bpf_map *meta1) { /* No need to compare ops because it is covered by map_type */ return meta0->map_type == meta1->map_type && meta0->key_size == meta1->key_size && meta0->value_size == meta1->value_size && meta0->map_flags == meta1->map_flags && meta0->max_entries == meta1->max_entries; } void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file /* not used */, int ufd) { struct bpf_map *inner_map; struct fd f; f = fdget(ufd); inner_map = __bpf_map_get(f); if (IS_ERR(inner_map)) return inner_map; if (bpf_map_meta_equal(map->inner_map_meta, inner_map)) inner_map = bpf_map_inc(inner_map, false); else inner_map = ERR_PTR(-EINVAL); fdput(f); return inner_map; } void bpf_map_fd_put_ptr(void *ptr) { /* ptr->ops->map_free() has to go through one * rcu grace period by itself. */ bpf_map_put(ptr); } u32 bpf_map_fd_sys_lookup_elem(void *ptr) { return ((struct bpf_map *)ptr)->id; }
2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 /* * usnjrnl.h - NTFS kernel transaction log ($UsnJrnl) handling. Part of the * Linux-NTFS project. * * Copyright (c) 2005 Anton Altaparmakov * * This program/include file is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License as published * by the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program/include file is distributed in the hope that it will be * useful, but WITHOUT ANY WARRANTY; without even the implied warranty * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program (in the main directory of the Linux-NTFS * distribution in the file COPYING); if not, write to the Free Software * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #ifdef NTFS_RW #include <linux/fs.h> #include <linux/highmem.h> #include <linux/mm.h> #include "aops.h" #include "debug.h" #include "endian.h" #include "time.h" #include "types.h" #include "usnjrnl.h" #include "volume.h" /** * ntfs_stamp_usnjrnl - stamp the transaction log ($UsnJrnl) on an ntfs volume * @vol: ntfs volume on which to stamp the transaction log * * Stamp the transaction log ($UsnJrnl) on the ntfs volume @vol and return * 'true' on success and 'false' on error. * * This function assumes that the transaction log has already been loaded and * consistency checked by a call to fs/ntfs/super.c::load_and_init_usnjrnl(). */ bool ntfs_stamp_usnjrnl(ntfs_volume *vol) { ntfs_debug("Entering."); if (likely(!NVolUsnJrnlStamped(vol))) { sle64 stamp; struct page *page; USN_HEADER *uh; page = ntfs_map_page(vol->usnjrnl_max_ino->i_mapping, 0); if (IS_ERR(page)) { ntfs_error(vol->sb, "Failed to read from " "$UsnJrnl/$DATA/$Max attribute."); return false; } uh = (USN_HEADER*)page_address(page); stamp = get_current_ntfs_time(); ntfs_debug("Stamping transaction log ($UsnJrnl): old " "journal_id 0x%llx, old lowest_valid_usn " "0x%llx, new journal_id 0x%llx, new " "lowest_valid_usn 0x%llx.", (long long)sle64_to_cpu(uh->journal_id), (long long)sle64_to_cpu(uh->lowest_valid_usn), (long long)sle64_to_cpu(stamp), i_size_read(vol->usnjrnl_j_ino)); uh->lowest_valid_usn = cpu_to_sle64(i_size_read(vol->usnjrnl_j_ino)); uh->journal_id = stamp; flush_dcache_page(page); set_page_dirty(page); ntfs_unmap_page(page); /* Set the flag so we do not have to do it again on remount. */ NVolSetUsnJrnlStamped(vol); } ntfs_debug("Done."); return true; } #endif /* NTFS_RW */
5 5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 // SPDX-License-Identifier: GPL-2.0 /* net/atm/pvc.c - ATM PVC sockets */ /* Written 1995-2000 by Werner Almesberger, EPFL LRC/ICA */ #include <linux/net.h> /* struct socket, struct proto_ops */ #include <linux/atm.h> /* ATM stuff */ #include <linux/atmdev.h> /* ATM devices */ #include <linux/errno.h> /* error codes */ #include <linux/kernel.h> /* printk */ #include <linux/init.h> #include <linux/skbuff.h> #include <linux/bitops.h> #include <linux/export.h> #include <net/sock.h> /* for sock_no_* */ #include "resources.h" /* devs and vccs */ #include "common.h" /* common for PVCs and SVCs */ static int pvc_shutdown(struct socket *sock, int how) { return 0; } static int pvc_bind(struct socket *sock, struct sockaddr *sockaddr, int sockaddr_len) { struct sock *sk = sock->sk; struct sockaddr_atmpvc *addr; struct atm_vcc *vcc; int error; if (sockaddr_len != sizeof(struct sockaddr_atmpvc)) return -EINVAL; addr = (struct sockaddr_atmpvc *)sockaddr; if (addr->sap_family != AF_ATMPVC) return -EAFNOSUPPORT; lock_sock(sk); vcc = ATM_SD(sock); if (!test_bit(ATM_VF_HASQOS, &vcc->flags)) { error = -EBADFD; goto out; } if (test_bit(ATM_VF_PARTIAL, &vcc->flags)) { if (vcc->vpi != ATM_VPI_UNSPEC) addr->sap_addr.vpi = vcc->vpi; if (vcc->vci != ATM_VCI_UNSPEC) addr->sap_addr.vci = vcc->vci; } error = vcc_connect(sock, addr->sap_addr.itf, addr->sap_addr.vpi, addr->sap_addr.vci); out: release_sock(sk); return error; } static int pvc_connect(struct socket *sock, struct sockaddr *sockaddr, int sockaddr_len, int flags) { return pvc_bind(sock, sockaddr, sockaddr_len); } static int pvc_setsockopt(struct socket *sock, int level, int optname, char __user *optval, unsigned int optlen) { struct sock *sk = sock->sk; int error; lock_sock(sk); error = vcc_setsockopt(sock, level, optname, optval, optlen); release_sock(sk); return error; } static int pvc_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { struct sock *sk = sock->sk; int error; lock_sock(sk); error = vcc_getsockopt(sock, level, optname, optval, optlen); release_sock(sk); return error; } static int pvc_getname(struct socket *sock, struct sockaddr *sockaddr, int peer) { struct sockaddr_atmpvc *addr; struct atm_vcc *vcc = ATM_SD(sock); if (!vcc->dev || !test_bit(ATM_VF_ADDR, &vcc->flags)) return -ENOTCONN; addr = (struct sockaddr_atmpvc *)sockaddr; memset(addr, 0, sizeof(*addr)); addr->sap_family = AF_ATMPVC; addr->sap_addr.itf = vcc->dev->number; addr->sap_addr.vpi = vcc->vpi; addr->sap_addr.vci = vcc->vci; return sizeof(struct sockaddr_atmpvc); } static const struct proto_ops pvc_proto_ops = { .family = PF_ATMPVC, .owner = THIS_MODULE, .release = vcc_release, .bind = pvc_bind, .connect = pvc_connect, .socketpair = sock_no_socketpair, .accept = sock_no_accept, .getname = pvc_getname, .poll = vcc_poll, .ioctl = vcc_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = vcc_compat_ioctl, #endif .listen = sock_no_listen, .shutdown = pvc_shutdown, .setsockopt = pvc_setsockopt, .getsockopt = pvc_getsockopt, .sendmsg = vcc_sendmsg, .recvmsg = vcc_recvmsg, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, }; static int pvc_create(struct net *net, struct socket *sock, int protocol, int kern) { if (net != &init_net) return -EAFNOSUPPORT; sock->ops = &pvc_proto_ops; return vcc_create(net, sock, protocol, PF_ATMPVC, kern); } static const struct net_proto_family pvc_family_ops = { .family = PF_ATMPVC, .create = pvc_create, .owner = THIS_MODULE, }; /* * Initialize the ATM PVC protocol family */ int __init atmpvc_init(void) { return sock_register(&pvc_family_ops); } void atmpvc_exit(void) { sock_unregister(PF_ATMPVC); }
56 125 137 64 64 62 201 201 146 47 157 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 4727 4728 4729 4730 4731 4732 4733 4734 4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797 4798 4799 4800 4801 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 4819 4820 4821 4822 4823 4824 4825 4826 4827 4828 4829 4830 4831 4832 4833 4834 4835 4836 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 4847 4848 4849 4850 4851 4852 4853 4854 4855 4856 4857 4858 4859 4860 4861 4862 4863 4864 4865 4866 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919 4920 4921 4922 4923 4924 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 4935 4936 4937 4938 4939 4940 4941 4942 4943 4944 4945 4946 4947 4948 4949 4950 4951 4952 4953 4954 4955 4956 4957 4958 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 4975 4976 4977 4978 4979 4980 4981 4982 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010 5011 5012 5013 5014 5015 5016 5017 5018 5019 5020 5021 5022 5023 5024 5025 5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 5112 5113 5114 5115 5116 5117 5118 5119 5120 5121 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5303 5304 5305 5306 5307 5308 5309 5310 5311 5312 5313 5314 5315 5316 5317 5318 5319 5320 5321 5322 5323 5324 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378 5379 5380 5381 5382 5383 5384 5385 5386 5387 5388 5389 5390 5391 5392 5393 5394 5395 5396 5397 5398 5399 5400 5401 5402 5403 5404 5405 5406 5407 5408 5409 5410 5411 5412 5413 5414 5415 5416 5417 5418 5419 5420 5421 5422 5423 5424 5425 5426 5427 5428 5429 5430 5431 5432 5433 5434 5435 5436 5437 5438 5439 5440 5441 5442 5443 5444 5445 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 5457 5458 5459 5460 5461 5462 5463 5464 5465 5466 5467 5468 5469 5470 5471 5472 5473 5474 5475 5476 5477 5478 5479 5480 5481 5482 5483 5484 5485 5486 5487 5488 5489 5490 5491 5492 5493 5494 5495 5496 5497 5498 5499 5500 5501 5502 5503 5504 5505 5506 5507 5508 5509 5510 5511 5512 5513 5514 5515 5516 5517 5518 5519 5520 5521 5522 5523 5524 5525 5526 5527 5528 5529 5530 5531 5532 5533 5534 5535 5536 5537 5538 5539 5540 5541 5542 5543 5544 5545 5546 5547 5548 5549 5550 5551 5552 5553 5554 5555 5556 5557 5558 5559 5560 5561 5562 5563 5564 5565 5566 5567 5568 5569 5570 5571 5572 5573 5574 5575 5576 5577 5578 5579 5580 5581 5582 5583 5584 5585 5586 5587 5588 5589 5590 5591 5592 5593 5594 5595 5596 5597 5598 5599 5600 5601 5602 5603 5604 5605 5606 5607 5608 5609 5610 5611 5612 5613 5614 5615 5616 5617 5618 5619 5620 5621 5622 5623 5624 5625 5626 5627 5628 5629 5630 5631 5632 5633 5634 5635 5636 5637 5638 5639 5640 5641 5642 5643 5644 5645 5646 5647 5648 5649 5650 5651 5652 5653 5654 5655 5656 5657 5658 5659 5660 5661 5662 5663 5664 5665 5666 5667 5668 5669 5670 5671 5672 5673 5674 5675 5676 5677 5678 5679 5680 5681 5682 5683 5684 5685 5686 5687 5688 5689 5690 5691 5692 5693 5694 5695 5696 5697 5698 5699 5700 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 5712 5713 5714 5715 5716 5717 5718 5719 5720 5721 5722 5723 5724 5725 5726 5727 5728 5729 5730 5731 5732 5733 5734 5735 5736 5737 5738 5739 5740 5741 5742 5743 5744 5745 5746 5747 5748 5749 5750 5751 5752 5753 5754 5755 5756 5757 5758 5759 5760 5761 5762 5763 5764 5765 5766 5767 5768 5769 5770 5771 5772 5773 5774 5775 5776 5777 5778 5779 5780 5781 5782 5783 5784 5785 5786 5787 5788 5789 5790 5791 5792 5793 5794 5795 5796 5797 5798 5799 5800 5801 5802 5803 5804 5805 5806 5807 5808 5809 5810 5811 5812 5813 5814 5815 5816 5817 5818 5819 5820 5821 5822 5823 5824 5825 5826 5827 5828 5829 5830 5831 5832 5833 5834 5835 5836 5837 5838 5839 5840 5841 5842 5843 5844 5845 5846 5847 5848 5849 5850 5851 5852 5853 5854 5855 5856 5857 5858 5859 5860 5861 5862 5863 5864 5865 5866 5867 5868 5869 5870 5871 5872 5873 5874 5875 5876 5877 5878 5879 5880 5881 5882 5883 5884 5885 5886 5887 5888 5889 5890 5891 5892 5893 5894 5895 5896 5897 5898 5899 5900 5901 5902 5903 5904 5905 5906 5907 5908 5909 5910 5911 5912 5913 5914 5915 5916 5917 5918 5919 5920 5921 5922 5923 5924 5925 5926 5927 5928 5929 5930 5931 5932 5933 5934 5935 5936 5937 5938 5939 5940 5941 5942 5943 5944 5945 5946 5947 5948 5949 5950 5951 5952 5953 5954 5955 5956 5957 5958 5959 5960 5961 5962 5963 5964 5965 5966 5967 5968 5969 5970 5971 5972 5973 5974 5975 5976 5977 5978 5979 5980 5981 5982 5983 5984 5985 5986 5987 5988 5989 5990 5991 5992 5993 5994 5995 5996 5997 5998 5999 6000 6001 6002 6003 6004 6005 6006 6007 6008 6009 6010 6011 6012 6013 6014 6015 6016 6017 6018 6019 6020 6021 6022 6023 6024 6025 6026 6027 /* * mac80211 <-> driver interface * * Copyright 2002-2005, Devicescape Software, Inc. * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> * Copyright 2007-2010 Johannes Berg <johannes@sipsolutions.net> * Copyright 2013-2014 Intel Mobile Communications GmbH * Copyright (C) 2015 - 2017 Intel Deutschland GmbH * Copyright (C) 2018 Intel Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ #ifndef MAC80211_H #define MAC80211_H #include <linux/bug.h> #include <linux/kernel.h> #include <linux/if_ether.h> #include <linux/skbuff.h> #include <linux/ieee80211.h> #include <net/cfg80211.h> #include <net/codel.h> #include <net/ieee80211_radiotap.h> #include <asm/unaligned.h> /** * DOC: Introduction * * mac80211 is the Linux stack for 802.11 hardware that implements * only partial functionality in hard- or firmware. This document * defines the interface between mac80211 and low-level hardware * drivers. */ /** * DOC: Calling mac80211 from interrupts * * Only ieee80211_tx_status_irqsafe() and ieee80211_rx_irqsafe() can be * called in hardware interrupt context. The low-level driver must not call any * other functions in hardware interrupt context. If there is a need for such * call, the low-level driver should first ACK the interrupt and perform the * IEEE 802.11 code call after this, e.g. from a scheduled workqueue or even * tasklet function. * * NOTE: If the driver opts to use the _irqsafe() functions, it may not also * use the non-IRQ-safe functions! */ /** * DOC: Warning * * If you're reading this document and not the header file itself, it will * be incomplete because not all documentation has been converted yet. */ /** * DOC: Frame format * * As a general rule, when frames are passed between mac80211 and the driver, * they start with the IEEE 802.11 header and include the same octets that are * sent over the air except for the FCS which should be calculated by the * hardware. * * There are, however, various exceptions to this rule for advanced features: * * The first exception is for hardware encryption and decryption offload * where the IV/ICV may or may not be generated in hardware. * * Secondly, when the hardware handles fragmentation, the frame handed to * the driver from mac80211 is the MSDU, not the MPDU. */ /** * DOC: mac80211 workqueue * * mac80211 provides its own workqueue for drivers and internal mac80211 use. * The workqueue is a single threaded workqueue and can only be accessed by * helpers for sanity checking. Drivers must ensure all work added onto the * mac80211 workqueue should be cancelled on the driver stop() callback. * * mac80211 will flushed the workqueue upon interface removal and during * suspend. * * All work performed on the mac80211 workqueue must not acquire the RTNL lock. * */ /** * DOC: mac80211 software tx queueing * * mac80211 provides an optional intermediate queueing implementation designed * to allow the driver to keep hardware queues short and provide some fairness * between different stations/interfaces. * In this model, the driver pulls data frames from the mac80211 queue instead * of letting mac80211 push them via drv_tx(). * Other frames (e.g. control or management) are still pushed using drv_tx(). * * Drivers indicate that they use this model by implementing the .wake_tx_queue * driver operation. * * Intermediate queues (struct ieee80211_txq) are kept per-sta per-tid, with a * single per-vif queue for multicast data frames. * * The driver is expected to initialize its private per-queue data for stations * and interfaces in the .add_interface and .sta_add ops. * * The driver can't access the queue directly. To dequeue a frame, it calls * ieee80211_tx_dequeue(). Whenever mac80211 adds a new frame to a queue, it * calls the .wake_tx_queue driver op. * * For AP powersave TIM handling, the driver only needs to indicate if it has * buffered packets in the driver specific data structures by calling * ieee80211_sta_set_buffered(). For frames buffered in the ieee80211_txq * struct, mac80211 sets the appropriate TIM PVB bits and calls * .release_buffered_frames(). * In that callback the driver is therefore expected to release its own * buffered frames and afterwards also frames from the ieee80211_txq (obtained * via the usual ieee80211_tx_dequeue). */ struct device; /** * enum ieee80211_max_queues - maximum number of queues * * @IEEE80211_MAX_QUEUES: Maximum number of regular device queues. * @IEEE80211_MAX_QUEUE_MAP: bitmap with maximum queues set */ enum ieee80211_max_queues { IEEE80211_MAX_QUEUES = 16, IEEE80211_MAX_QUEUE_MAP = BIT(IEEE80211_MAX_QUEUES) - 1, }; #define IEEE80211_INVAL_HW_QUEUE 0xff /** * enum ieee80211_ac_numbers - AC numbers as used in mac80211 * @IEEE80211_AC_VO: voice * @IEEE80211_AC_VI: video * @IEEE80211_AC_BE: best effort * @IEEE80211_AC_BK: background */ enum ieee80211_ac_numbers { IEEE80211_AC_VO = 0, IEEE80211_AC_VI = 1, IEEE80211_AC_BE = 2, IEEE80211_AC_BK = 3, }; /** * struct ieee80211_tx_queue_params - transmit queue configuration * * The information provided in this structure is required for QoS * transmit queue configuration. Cf. IEEE 802.11 7.3.2.29. * * @aifs: arbitration interframe space [0..255] * @cw_min: minimum contention window [a value of the form * 2^n-1 in the range 1..32767] * @cw_max: maximum contention window [like @cw_min] * @txop: maximum burst time in units of 32 usecs, 0 meaning disabled * @acm: is mandatory admission control required for the access category * @uapsd: is U-APSD mode enabled for the queue * @mu_edca: is the MU EDCA configured * @mu_edca_param_rec: MU EDCA Parameter Record for HE */ struct ieee80211_tx_queue_params { u16 txop; u16 cw_min; u16 cw_max; u8 aifs; bool acm; bool uapsd; bool mu_edca; struct ieee80211_he_mu_edca_param_ac_rec mu_edca_param_rec; }; struct ieee80211_low_level_stats { unsigned int dot11ACKFailureCount; unsigned int dot11RTSFailureCount; unsigned int dot11FCSErrorCount; unsigned int dot11RTSSuccessCount; }; /** * enum ieee80211_chanctx_change - change flag for channel context * @IEEE80211_CHANCTX_CHANGE_WIDTH: The channel width changed * @IEEE80211_CHANCTX_CHANGE_RX_CHAINS: The number of RX chains changed * @IEEE80211_CHANCTX_CHANGE_RADAR: radar detection flag changed * @IEEE80211_CHANCTX_CHANGE_CHANNEL: switched to another operating channel, * this is used only with channel switching with CSA * @IEEE80211_CHANCTX_CHANGE_MIN_WIDTH: The min required channel width changed */ enum ieee80211_chanctx_change { IEEE80211_CHANCTX_CHANGE_WIDTH = BIT(0), IEEE80211_CHANCTX_CHANGE_RX_CHAINS = BIT(1), IEEE80211_CHANCTX_CHANGE_RADAR = BIT(2), IEEE80211_CHANCTX_CHANGE_CHANNEL = BIT(3), IEEE80211_CHANCTX_CHANGE_MIN_WIDTH = BIT(4), }; /** * struct ieee80211_chanctx_conf - channel context that vifs may be tuned to * * This is the driver-visible part. The ieee80211_chanctx * that contains it is visible in mac80211 only. * * @def: the channel definition * @min_def: the minimum channel definition currently required. * @rx_chains_static: The number of RX chains that must always be * active on the channel to receive MIMO transmissions * @rx_chains_dynamic: The number of RX chains that must be enabled * after RTS/CTS handshake to receive SMPS MIMO transmissions; * this will always be >= @rx_chains_static. * @radar_enabled: whether radar detection is enabled on this channel. * @drv_priv: data area for driver use, will always be aligned to * sizeof(void *), size is determined in hw information. */ struct ieee80211_chanctx_conf { struct cfg80211_chan_def def; struct cfg80211_chan_def min_def; u8 rx_chains_static, rx_chains_dynamic; bool radar_enabled; u8 drv_priv[0] __aligned(sizeof(void *)); }; /** * enum ieee80211_chanctx_switch_mode - channel context switch mode * @CHANCTX_SWMODE_REASSIGN_VIF: Both old and new contexts already * exist (and will continue to exist), but the virtual interface * needs to be switched from one to the other. * @CHANCTX_SWMODE_SWAP_CONTEXTS: The old context exists but will stop * to exist with this call, the new context doesn't exist but * will be active after this call, the virtual interface switches * from the old to the new (note that the driver may of course * implement this as an on-the-fly chandef switch of the existing * hardware context, but the mac80211 pointer for the old context * will cease to exist and only the new one will later be used * for changes/removal.) */ enum ieee80211_chanctx_switch_mode { CHANCTX_SWMODE_REASSIGN_VIF, CHANCTX_SWMODE_SWAP_CONTEXTS, }; /** * struct ieee80211_vif_chanctx_switch - vif chanctx switch information * * This is structure is used to pass information about a vif that * needs to switch from one chanctx to another. The * &ieee80211_chanctx_switch_mode defines how the switch should be * done. * * @vif: the vif that should be switched from old_ctx to new_ctx * @old_ctx: the old context to which the vif was assigned * @new_ctx: the new context to which the vif must be assigned */ struct ieee80211_vif_chanctx_switch { struct ieee80211_vif *vif; struct ieee80211_chanctx_conf *old_ctx; struct ieee80211_chanctx_conf *new_ctx; }; /** * enum ieee80211_bss_change - BSS change notification flags * * These flags are used with the bss_info_changed() callback * to indicate which BSS parameter changed. * * @BSS_CHANGED_ASSOC: association status changed (associated/disassociated), * also implies a change in the AID. * @BSS_CHANGED_ERP_CTS_PROT: CTS protection changed * @BSS_CHANGED_ERP_PREAMBLE: preamble changed * @BSS_CHANGED_ERP_SLOT: slot timing changed * @BSS_CHANGED_HT: 802.11n parameters changed * @BSS_CHANGED_BASIC_RATES: Basic rateset changed * @BSS_CHANGED_BEACON_INT: Beacon interval changed * @BSS_CHANGED_BSSID: BSSID changed, for whatever * reason (IBSS and managed mode) * @BSS_CHANGED_BEACON: Beacon data changed, retrieve * new beacon (beaconing modes) * @BSS_CHANGED_BEACON_ENABLED: Beaconing should be * enabled/disabled (beaconing modes) * @BSS_CHANGED_CQM: Connection quality monitor config changed * @BSS_CHANGED_IBSS: IBSS join status changed * @BSS_CHANGED_ARP_FILTER: Hardware ARP filter address list or state changed. * @BSS_CHANGED_QOS: QoS for this association was enabled/disabled. Note * that it is only ever disabled for station mode. * @BSS_CHANGED_IDLE: Idle changed for this BSS/interface. * @BSS_CHANGED_SSID: SSID changed for this BSS (AP and IBSS mode) * @BSS_CHANGED_AP_PROBE_RESP: Probe Response changed for this BSS (AP mode) * @BSS_CHANGED_PS: PS changed for this BSS (STA mode) * @BSS_CHANGED_TXPOWER: TX power setting changed for this interface * @BSS_CHANGED_P2P_PS: P2P powersave settings (CTWindow, opportunistic PS) * changed * @BSS_CHANGED_BEACON_INFO: Data from the AP's beacon became available: * currently dtim_period only is under consideration. * @BSS_CHANGED_BANDWIDTH: The bandwidth used by this interface changed, * note that this is only called when it changes after the channel * context had been assigned. * @BSS_CHANGED_OCB: OCB join status changed * @BSS_CHANGED_MU_GROUPS: VHT MU-MIMO group id or user position changed * @BSS_CHANGED_KEEP_ALIVE: keep alive options (idle period or protected * keep alive) changed. * @BSS_CHANGED_MCAST_RATE: Multicast Rate setting changed for this interface * */ enum ieee80211_bss_change { BSS_CHANGED_ASSOC = 1<<0, BSS_CHANGED_ERP_CTS_PROT = 1<<1, BSS_CHANGED_ERP_PREAMBLE = 1<<2, BSS_CHANGED_ERP_SLOT = 1<<3, BSS_CHANGED_HT = 1<<4, BSS_CHANGED_BASIC_RATES = 1<<5, BSS_CHANGED_BEACON_INT = 1<<6, BSS_CHANGED_BSSID = 1<<7, BSS_CHANGED_BEACON = 1<<8, BSS_CHANGED_BEACON_ENABLED = 1<<9, BSS_CHANGED_CQM = 1<<10, BSS_CHANGED_IBSS = 1<<11, BSS_CHANGED_ARP_FILTER = 1<<12, BSS_CHANGED_QOS = 1<<13, BSS_CHANGED_IDLE = 1<<14, BSS_CHANGED_SSID = 1<<15, BSS_CHANGED_AP_PROBE_RESP = 1<<16, BSS_CHANGED_PS = 1<<17, BSS_CHANGED_TXPOWER = 1<<18, BSS_CHANGED_P2P_PS = 1<<19, BSS_CHANGED_BEACON_INFO = 1<<20, BSS_CHANGED_BANDWIDTH = 1<<21, BSS_CHANGED_OCB = 1<<22, BSS_CHANGED_MU_GROUPS = 1<<23, BSS_CHANGED_KEEP_ALIVE = 1<<24, BSS_CHANGED_MCAST_RATE = 1<<25, /* when adding here, make sure to change ieee80211_reconfig */ }; /* * The maximum number of IPv4 addresses listed for ARP filtering. If the number * of addresses for an interface increase beyond this value, hardware ARP * filtering will be disabled. */ #define IEEE80211_BSS_ARP_ADDR_LIST_LEN 4 /** * enum ieee80211_event_type - event to be notified to the low level driver * @RSSI_EVENT: AP's rssi crossed the a threshold set by the driver. * @MLME_EVENT: event related to MLME * @BAR_RX_EVENT: a BAR was received * @BA_FRAME_TIMEOUT: Frames were released from the reordering buffer because * they timed out. This won't be called for each frame released, but only * once each time the timeout triggers. */ enum ieee80211_event_type { RSSI_EVENT, MLME_EVENT, BAR_RX_EVENT, BA_FRAME_TIMEOUT, }; /** * enum ieee80211_rssi_event_data - relevant when event type is %RSSI_EVENT * @RSSI_EVENT_HIGH: AP's rssi went below the threshold set by the driver. * @RSSI_EVENT_LOW: AP's rssi went above the threshold set by the driver. */ enum ieee80211_rssi_event_data { RSSI_EVENT_HIGH, RSSI_EVENT_LOW, }; /** * struct ieee80211_rssi_event - data attached to an %RSSI_EVENT * @data: See &enum ieee80211_rssi_event_data */ struct ieee80211_rssi_event { enum ieee80211_rssi_event_data data; }; /** * enum ieee80211_mlme_event_data - relevant when event type is %MLME_EVENT * @AUTH_EVENT: the MLME operation is authentication * @ASSOC_EVENT: the MLME operation is association * @DEAUTH_RX_EVENT: deauth received.. * @DEAUTH_TX_EVENT: deauth sent. */ enum ieee80211_mlme_event_data { AUTH_EVENT, ASSOC_EVENT, DEAUTH_RX_EVENT, DEAUTH_TX_EVENT, }; /** * enum ieee80211_mlme_event_status - relevant when event type is %MLME_EVENT * @MLME_SUCCESS: the MLME operation completed successfully. * @MLME_DENIED: the MLME operation was denied by the peer. * @MLME_TIMEOUT: the MLME operation timed out. */ enum ieee80211_mlme_event_status { MLME_SUCCESS, MLME_DENIED, MLME_TIMEOUT, }; /** * struct ieee80211_mlme_event - data attached to an %MLME_EVENT * @data: See &enum ieee80211_mlme_event_data * @status: See &enum ieee80211_mlme_event_status * @reason: the reason code if applicable */ struct ieee80211_mlme_event { enum ieee80211_mlme_event_data data; enum ieee80211_mlme_event_status status; u16 reason; }; /** * struct ieee80211_ba_event - data attached for BlockAck related events * @sta: pointer to the &ieee80211_sta to which this event relates * @tid: the tid * @ssn: the starting sequence number (for %BAR_RX_EVENT) */ struct ieee80211_ba_event { struct ieee80211_sta *sta; u16 tid; u16 ssn; }; /** * struct ieee80211_event - event to be sent to the driver * @type: The event itself. See &enum ieee80211_event_type. * @rssi: relevant if &type is %RSSI_EVENT * @mlme: relevant if &type is %AUTH_EVENT * @ba: relevant if &type is %BAR_RX_EVENT or %BA_FRAME_TIMEOUT * @u:union holding the fields above */ struct ieee80211_event { enum ieee80211_event_type type; union { struct ieee80211_rssi_event rssi; struct ieee80211_mlme_event mlme; struct ieee80211_ba_event ba; } u; }; /** * struct ieee80211_mu_group_data - STA's VHT MU-MIMO group data * * This structure describes the group id data of VHT MU-MIMO * * @membership: 64 bits array - a bit is set if station is member of the group * @position: 2 bits per group id indicating the position in the group */ struct ieee80211_mu_group_data { u8 membership[WLAN_MEMBERSHIP_LEN]; u8 position[WLAN_USER_POSITION_LEN]; }; /** * struct ieee80211_bss_conf - holds the BSS's changing parameters * * This structure keeps information about a BSS (and an association * to that BSS) that can change during the lifetime of the BSS. * * @bss_color: 6-bit value to mark inter-BSS frame, if BSS supports HE * @htc_trig_based_pkt_ext: default PE in 4us units, if BSS supports HE * @multi_sta_back_32bit: supports BA bitmap of 32-bits in Multi-STA BACK * @uora_exists: is the UORA element advertised by AP * @ack_enabled: indicates support to receive a multi-TID that solicits either * ACK, BACK or both * @uora_ocw_range: UORA element's OCW Range field * @frame_time_rts_th: HE duration RTS threshold, in units of 32us * @he_support: does this BSS support HE * @assoc: association status * @ibss_joined: indicates whether this station is part of an IBSS * or not * @ibss_creator: indicates if a new IBSS network is being created * @aid: association ID number, valid only when @assoc is true * @use_cts_prot: use CTS protection * @use_short_preamble: use 802.11b short preamble * @use_short_slot: use short slot time (only relevant for ERP) * @dtim_period: num of beacons before the next DTIM, for beaconing, * valid in station mode only if after the driver was notified * with the %BSS_CHANGED_BEACON_INFO flag, will be non-zero then. * @sync_tsf: last beacon's/probe response's TSF timestamp (could be old * as it may have been received during scanning long ago). If the * HW flag %IEEE80211_HW_TIMING_BEACON_ONLY is set, then this can * only come from a beacon, but might not become valid until after * association when a beacon is received (which is notified with the * %BSS_CHANGED_DTIM flag.). See also sync_dtim_count important notice. * @sync_device_ts: the device timestamp corresponding to the sync_tsf, * the driver/device can use this to calculate synchronisation * (see @sync_tsf). See also sync_dtim_count important notice. * @sync_dtim_count: Only valid when %IEEE80211_HW_TIMING_BEACON_ONLY * is requested, see @sync_tsf/@sync_device_ts. * IMPORTANT: These three sync_* parameters would possibly be out of sync * by the time the driver will use them. The synchronized view is currently * guaranteed only in certain callbacks. * @beacon_int: beacon interval * @assoc_capability: capabilities taken from assoc resp * @basic_rates: bitmap of basic rates, each bit stands for an * index into the rate table configured by the driver in * the current band. * @beacon_rate: associated AP's beacon TX rate * @mcast_rate: per-band multicast rate index + 1 (0: disabled) * @bssid: The BSSID for this BSS * @enable_beacon: whether beaconing should be enabled or not * @chandef: Channel definition for this BSS -- the hardware might be * configured a higher bandwidth than this BSS uses, for example. * @mu_group: VHT MU-MIMO group membership data * @ht_operation_mode: HT operation mode like in &struct ieee80211_ht_operation. * This field is only valid when the channel is a wide HT/VHT channel. * Note that with TDLS this can be the case (channel is HT, protection must * be used from this field) even when the BSS association isn't using HT. * @cqm_rssi_thold: Connection quality monitor RSSI threshold, a zero value * implies disabled. As with the cfg80211 callback, a change here should * cause an event to be sent indicating where the current value is in * relation to the newly configured threshold. * @cqm_rssi_low: Connection quality monitor RSSI lower threshold, a zero value * implies disabled. This is an alternative mechanism to the single * threshold event and can't be enabled simultaneously with it. * @cqm_rssi_high: Connection quality monitor RSSI upper threshold. * @cqm_rssi_hyst: Connection quality monitor RSSI hysteresis * @arp_addr_list: List of IPv4 addresses for hardware ARP filtering. The * may filter ARP queries targeted for other addresses than listed here. * The driver must allow ARP queries targeted for all address listed here * to pass through. An empty list implies no ARP queries need to pass. * @arp_addr_cnt: Number of addresses currently on the list. Note that this * may be larger than %IEEE80211_BSS_ARP_ADDR_LIST_LEN (the arp_addr_list * array size), it's up to the driver what to do in that case. * @qos: This is a QoS-enabled BSS. * @idle: This interface is idle. There's also a global idle flag in the * hardware config which may be more appropriate depending on what * your driver/device needs to do. * @ps: power-save mode (STA only). This flag is NOT affected by * offchannel/dynamic_ps operations. * @ssid: The SSID of the current vif. Valid in AP and IBSS mode. * @ssid_len: Length of SSID given in @ssid. * @hidden_ssid: The SSID of the current vif is hidden. Only valid in AP-mode. * @txpower: TX power in dBm * @txpower_type: TX power adjustment used to control per packet Transmit * Power Control (TPC) in lower driver for the current vif. In particular * TPC is enabled if value passed in %txpower_type is * NL80211_TX_POWER_LIMITED (allow using less than specified from * userspace), whereas TPC is disabled if %txpower_type is set to * NL80211_TX_POWER_FIXED (use value configured from userspace) * @p2p_noa_attr: P2P NoA attribute for P2P powersave * @allow_p2p_go_ps: indication for AP or P2P GO interface, whether it's allowed * to use P2P PS mechanism or not. AP/P2P GO is not allowed to use P2P PS * if it has associated clients without P2P PS support. * @max_idle_period: the time period during which the station can refrain from * transmitting frames to its associated AP without being disassociated. * In units of 1000 TUs. Zero value indicates that the AP did not include * a (valid) BSS Max Idle Period Element. * @protected_keep_alive: if set, indicates that the station should send an RSN * protected frame to the AP to reset the idle timer at the AP for the * station. */ struct ieee80211_bss_conf { const u8 *bssid; u8 bss_color; u8 htc_trig_based_pkt_ext; bool multi_sta_back_32bit; bool uora_exists; bool ack_enabled; u8 uora_ocw_range; u16 frame_time_rts_th; bool he_support; /* association related data */ bool assoc, ibss_joined; bool ibss_creator; u16 aid; /* erp related data */ bool use_cts_prot; bool use_short_preamble; bool use_short_slot; bool enable_beacon; u8 dtim_period; u16 beacon_int; u16 assoc_capability; u64 sync_tsf; u32 sync_device_ts; u8 sync_dtim_count; u32 basic_rates; struct ieee80211_rate *beacon_rate; int mcast_rate[NUM_NL80211_BANDS]; u16 ht_operation_mode; s32 cqm_rssi_thold; u32 cqm_rssi_hyst; s32 cqm_rssi_low; s32 cqm_rssi_high; struct cfg80211_chan_def chandef; struct ieee80211_mu_group_data mu_group; __be32 arp_addr_list[IEEE80211_BSS_ARP_ADDR_LIST_LEN]; int arp_addr_cnt; bool qos; bool idle; bool ps; u8 ssid[IEEE80211_MAX_SSID_LEN]; size_t ssid_len; bool hidden_ssid; int txpower; enum nl80211_tx_power_setting txpower_type; struct ieee80211_p2p_noa_attr p2p_noa_attr; bool allow_p2p_go_ps; u16 max_idle_period; bool protected_keep_alive; }; /** * enum mac80211_tx_info_flags - flags to describe transmission information/status * * These flags are used with the @flags member of &ieee80211_tx_info. * * @IEEE80211_TX_CTL_REQ_TX_STATUS: require TX status callback for this frame. * @IEEE80211_TX_CTL_ASSIGN_SEQ: The driver has to assign a sequence * number to this frame, taking care of not overwriting the fragment * number and increasing the sequence number only when the * IEEE80211_TX_CTL_FIRST_FRAGMENT flag is set. mac80211 will properly * assign sequence numbers to QoS-data frames but cannot do so correctly * for non-QoS-data and management frames because beacons need them from * that counter as well and mac80211 cannot guarantee proper sequencing. * If this flag is set, the driver should instruct the hardware to * assign a sequence number to the frame or assign one itself. Cf. IEEE * 802.11-2007 7.1.3.4.1 paragraph 3. This flag will always be set for * beacons and always be clear for frames without a sequence number field. * @IEEE80211_TX_CTL_NO_ACK: tell the low level not to wait for an ack * @IEEE80211_TX_CTL_CLEAR_PS_FILT: clear powersave filter for destination * station * @IEEE80211_TX_CTL_FIRST_FRAGMENT: this is a first fragment of the frame * @IEEE80211_TX_CTL_SEND_AFTER_DTIM: send this frame after DTIM beacon * @IEEE80211_TX_CTL_AMPDU: this frame should be sent as part of an A-MPDU * @IEEE80211_TX_CTL_INJECTED: Frame was injected, internal to mac80211. * @IEEE80211_TX_STAT_TX_FILTERED: The frame was not transmitted * because the destination STA was in powersave mode. Note that to * avoid race conditions, the filter must be set by the hardware or * firmware upon receiving a frame that indicates that the station * went to sleep (must be done on device to filter frames already on * the queue) and may only be unset after mac80211 gives the OK for * that by setting the IEEE80211_TX_CTL_CLEAR_PS_FILT (see above), * since only then is it guaranteed that no more frames are in the * hardware queue. * @IEEE80211_TX_STAT_ACK: Frame was acknowledged * @IEEE80211_TX_STAT_AMPDU: The frame was aggregated, so status * is for the whole aggregation. * @IEEE80211_TX_STAT_AMPDU_NO_BACK: no block ack was returned, * so consider using block ack request (BAR). * @IEEE80211_TX_CTL_RATE_CTRL_PROBE: internal to mac80211, can be * set by rate control algorithms to indicate probe rate, will * be cleared for fragmented frames (except on the last fragment) * @IEEE80211_TX_INTFL_OFFCHAN_TX_OK: Internal to mac80211. Used to indicate * that a frame can be transmitted while the queues are stopped for * off-channel operation. * @IEEE80211_TX_INTFL_NEED_TXPROCESSING: completely internal to mac80211, * used to indicate that a pending frame requires TX processing before * it can be sent out. * @IEEE80211_TX_INTFL_RETRIED: completely internal to mac80211, * used to indicate that a frame was already retried due to PS * @IEEE80211_TX_INTFL_DONT_ENCRYPT: completely internal to mac80211, * used to indicate frame should not be encrypted * @IEEE80211_TX_CTL_NO_PS_BUFFER: This frame is a response to a poll * frame (PS-Poll or uAPSD) or a non-bufferable MMPDU and must * be sent although the station is in powersave mode. * @IEEE80211_TX_CTL_MORE_FRAMES: More frames will be passed to the * transmit function after the current frame, this can be used * by drivers to kick the DMA queue only if unset or when the * queue gets full. * @IEEE80211_TX_INTFL_RETRANSMISSION: This frame is being retransmitted * after TX status because the destination was asleep, it must not * be modified again (no seqno assignment, crypto, etc.) * @IEEE80211_TX_INTFL_MLME_CONN_TX: This frame was transmitted by the MLME * code for connection establishment, this indicates that its status * should kick the MLME state machine. * @IEEE80211_TX_INTFL_NL80211_FRAME_TX: Frame was requested through nl80211 * MLME command (internal to mac80211 to figure out whether to send TX * status to user space) * @IEEE80211_TX_CTL_LDPC: tells the driver to use LDPC for this frame * @IEEE80211_TX_CTL_STBC: Enables Space-Time Block Coding (STBC) for this * frame and selects the maximum number of streams that it can use. * @IEEE80211_TX_CTL_TX_OFFCHAN: Marks this packet to be transmitted on * the off-channel channel when a remain-on-channel offload is done * in hardware -- normal packets still flow and are expected to be * handled properly by the device. * @IEEE80211_TX_INTFL_TKIP_MIC_FAILURE: Marks this packet to be used for TKIP * testing. It will be sent out with incorrect Michael MIC key to allow * TKIP countermeasures to be tested. * @IEEE80211_TX_CTL_NO_CCK_RATE: This frame will be sent at non CCK rate. * This flag is actually used for management frame especially for P2P * frames not being sent at CCK rate in 2GHz band. * @IEEE80211_TX_STATUS_EOSP: This packet marks the end of service period, * when its status is reported the service period ends. For frames in * an SP that mac80211 transmits, it is already set; for driver frames * the driver may set this flag. It is also used to do the same for * PS-Poll responses. * @IEEE80211_TX_CTL_USE_MINRATE: This frame will be sent at lowest rate. * This flag is used to send nullfunc frame at minimum rate when * the nullfunc is used for connection monitoring purpose. * @IEEE80211_TX_CTL_DONTFRAG: Don't fragment this packet even if it * would be fragmented by size (this is optional, only used for * monitor injection). * @IEEE80211_TX_STAT_NOACK_TRANSMITTED: A frame that was marked with * IEEE80211_TX_CTL_NO_ACK has been successfully transmitted without * any errors (like issues specific to the driver/HW). * This flag must not be set for frames that don't request no-ack * behaviour with IEEE80211_TX_CTL_NO_ACK. * * Note: If you have to add new flags to the enumeration, then don't * forget to update %IEEE80211_TX_TEMPORARY_FLAGS when necessary. */ enum mac80211_tx_info_flags { IEEE80211_TX_CTL_REQ_TX_STATUS = BIT(0), IEEE80211_TX_CTL_ASSIGN_SEQ = BIT(1), IEEE80211_TX_CTL_NO_ACK = BIT(2), IEEE80211_TX_CTL_CLEAR_PS_FILT = BIT(3), IEEE80211_TX_CTL_FIRST_FRAGMENT = BIT(4), IEEE80211_TX_CTL_SEND_AFTER_DTIM = BIT(5), IEEE80211_TX_CTL_AMPDU = BIT(6), IEEE80211_TX_CTL_INJECTED = BIT(7), IEEE80211_TX_STAT_TX_FILTERED = BIT(8), IEEE80211_TX_STAT_ACK = BIT(9), IEEE80211_TX_STAT_AMPDU = BIT(10), IEEE80211_TX_STAT_AMPDU_NO_BACK = BIT(11), IEEE80211_TX_CTL_RATE_CTRL_PROBE = BIT(12), IEEE80211_TX_INTFL_OFFCHAN_TX_OK = BIT(13), IEEE80211_TX_INTFL_NEED_TXPROCESSING = BIT(14), IEEE80211_TX_INTFL_RETRIED = BIT(15), IEEE80211_TX_INTFL_DONT_ENCRYPT = BIT(16), IEEE80211_TX_CTL_NO_PS_BUFFER = BIT(17), IEEE80211_TX_CTL_MORE_FRAMES = BIT(18), IEEE80211_TX_INTFL_RETRANSMISSION = BIT(19), IEEE80211_TX_INTFL_MLME_CONN_TX = BIT(20), IEEE80211_TX_INTFL_NL80211_FRAME_TX = BIT(21), IEEE80211_TX_CTL_LDPC = BIT(22), IEEE80211_TX_CTL_STBC = BIT(23) | BIT(24), IEEE80211_TX_CTL_TX_OFFCHAN = BIT(25), IEEE80211_TX_INTFL_TKIP_MIC_FAILURE = BIT(26), IEEE80211_TX_CTL_NO_CCK_RATE = BIT(27), IEEE80211_TX_STATUS_EOSP = BIT(28), IEEE80211_TX_CTL_USE_MINRATE = BIT(29), IEEE80211_TX_CTL_DONTFRAG = BIT(30), IEEE80211_TX_STAT_NOACK_TRANSMITTED = BIT(31), }; #define IEEE80211_TX_CTL_STBC_SHIFT 23 /** * enum mac80211_tx_control_flags - flags to describe transmit control * * @IEEE80211_TX_CTRL_PORT_CTRL_PROTO: this frame is a port control * protocol frame (e.g. EAP) * @IEEE80211_TX_CTRL_PS_RESPONSE: This frame is a response to a poll * frame (PS-Poll or uAPSD). * @IEEE80211_TX_CTRL_RATE_INJECT: This frame is injected with rate information * @IEEE80211_TX_CTRL_AMSDU: This frame is an A-MSDU frame * @IEEE80211_TX_CTRL_FAST_XMIT: This frame is going through the fast_xmit path * * These flags are used in tx_info->control.flags. */ enum mac80211_tx_control_flags { IEEE80211_TX_CTRL_PORT_CTRL_PROTO = BIT(0), IEEE80211_TX_CTRL_PS_RESPONSE = BIT(1), IEEE80211_TX_CTRL_RATE_INJECT = BIT(2), IEEE80211_TX_CTRL_AMSDU = BIT(3), IEEE80211_TX_CTRL_FAST_XMIT = BIT(4), }; /* * This definition is used as a mask to clear all temporary flags, which are * set by the tx handlers for each transmission attempt by the mac80211 stack. */ #define IEEE80211_TX_TEMPORARY_FLAGS (IEEE80211_TX_CTL_NO_ACK | \ IEEE80211_TX_CTL_CLEAR_PS_FILT | IEEE80211_TX_CTL_FIRST_FRAGMENT | \ IEEE80211_TX_CTL_SEND_AFTER_DTIM | IEEE80211_TX_CTL_AMPDU | \ IEEE80211_TX_STAT_TX_FILTERED | IEEE80211_TX_STAT_ACK | \ IEEE80211_TX_STAT_AMPDU | IEEE80211_TX_STAT_AMPDU_NO_BACK | \ IEEE80211_TX_CTL_RATE_CTRL_PROBE | IEEE80211_TX_CTL_NO_PS_BUFFER | \ IEEE80211_TX_CTL_MORE_FRAMES | IEEE80211_TX_CTL_LDPC | \ IEEE80211_TX_CTL_STBC | IEEE80211_TX_STATUS_EOSP) /** * enum mac80211_rate_control_flags - per-rate flags set by the * Rate Control algorithm. * * These flags are set by the Rate control algorithm for each rate during tx, * in the @flags member of struct ieee80211_tx_rate. * * @IEEE80211_TX_RC_USE_RTS_CTS: Use RTS/CTS exchange for this rate. * @IEEE80211_TX_RC_USE_CTS_PROTECT: CTS-to-self protection is required. * This is set if the current BSS requires ERP protection. * @IEEE80211_TX_RC_USE_SHORT_PREAMBLE: Use short preamble. * @IEEE80211_TX_RC_MCS: HT rate. * @IEEE80211_TX_RC_VHT_MCS: VHT MCS rate, in this case the idx field is split * into a higher 4 bits (Nss) and lower 4 bits (MCS number) * @IEEE80211_TX_RC_GREEN_FIELD: Indicates whether this rate should be used in * Greenfield mode. * @IEEE80211_TX_RC_40_MHZ_WIDTH: Indicates if the Channel Width should be 40 MHz. * @IEEE80211_TX_RC_80_MHZ_WIDTH: Indicates 80 MHz transmission * @IEEE80211_TX_RC_160_MHZ_WIDTH: Indicates 160 MHz transmission * (80+80 isn't supported yet) * @IEEE80211_TX_RC_DUP_DATA: The frame should be transmitted on both of the * adjacent 20 MHz channels, if the current channel type is * NL80211_CHAN_HT40MINUS or NL80211_CHAN_HT40PLUS. * @IEEE80211_TX_RC_SHORT_GI: Short Guard interval should be used for this rate. */ enum mac80211_rate_control_flags { IEEE80211_TX_RC_USE_RTS_CTS = BIT(0), IEEE80211_TX_RC_USE_CTS_PROTECT = BIT(1), IEEE80211_TX_RC_USE_SHORT_PREAMBLE = BIT(2), /* rate index is an HT/VHT MCS instead of an index */ IEEE80211_TX_RC_MCS = BIT(3), IEEE80211_TX_RC_GREEN_FIELD = BIT(4), IEEE80211_TX_RC_40_MHZ_WIDTH = BIT(5), IEEE80211_TX_RC_DUP_DATA = BIT(6), IEEE80211_TX_RC_SHORT_GI = BIT(7), IEEE80211_TX_RC_VHT_MCS = BIT(8), IEEE80211_TX_RC_80_MHZ_WIDTH = BIT(9), IEEE80211_TX_RC_160_MHZ_WIDTH = BIT(10), }; /* there are 40 bytes if you don't need the rateset to be kept */ #define IEEE80211_TX_INFO_DRIVER_DATA_SIZE 40 /* if you do need the rateset, then you have less space */ #define IEEE80211_TX_INFO_RATE_DRIVER_DATA_SIZE 24 /* maximum number of rate stages */ #define IEEE80211_TX_MAX_RATES 4 /* maximum number of rate table entries */ #define IEEE80211_TX_RATE_TABLE_SIZE 4 /** * struct ieee80211_tx_rate - rate selection/status * * @idx: rate index to attempt to send with * @flags: rate control flags (&enum mac80211_rate_control_flags) * @count: number of tries in this rate before going to the next rate * * A value of -1 for @idx indicates an invalid rate and, if used * in an array of retry rates, that no more rates should be tried. * * When used for transmit status reporting, the driver should * always report the rate along with the flags it used. * * &struct ieee80211_tx_info contains an array of these structs * in the control information, and it will be filled by the rate * control algorithm according to what should be sent. For example, * if this array contains, in the format { <idx>, <count> } the * information:: * * { 3, 2 }, { 2, 2 }, { 1, 4 }, { -1, 0 }, { -1, 0 } * * then this means that the frame should be transmitted * up to twice at rate 3, up to twice at rate 2, and up to four * times at rate 1 if it doesn't get acknowledged. Say it gets * acknowledged by the peer after the fifth attempt, the status * information should then contain:: * * { 3, 2 }, { 2, 2 }, { 1, 1 }, { -1, 0 } ... * * since it was transmitted twice at rate 3, twice at rate 2 * and once at rate 1 after which we received an acknowledgement. */ struct ieee80211_tx_rate { s8 idx; u16 count:5, flags:11; } __packed; #define IEEE80211_MAX_TX_RETRY 31 static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate *rate, u8 mcs, u8 nss) { WARN_ON(mcs & ~0xF); WARN_ON((nss - 1) & ~0x7); rate->idx = ((nss - 1) << 4) | mcs; } static inline u8 ieee80211_rate_get_vht_mcs(const struct ieee80211_tx_rate *rate) { return rate->idx & 0xF; } static inline u8 ieee80211_rate_get_vht_nss(const struct ieee80211_tx_rate *rate) { return (rate->idx >> 4) + 1; } /** * struct ieee80211_tx_info - skb transmit information * * This structure is placed in skb->cb for three uses: * (1) mac80211 TX control - mac80211 tells the driver what to do * (2) driver internal use (if applicable) * (3) TX status information - driver tells mac80211 what happened * * @flags: transmit info flags, defined above * @band: the band to transmit on (use for checking for races) * @hw_queue: HW queue to put the frame on, skb_get_queue_mapping() gives the AC * @ack_frame_id: internal frame ID for TX status, used internally * @control: union for control data * @status: union for status data * @driver_data: array of driver_data pointers * @ampdu_ack_len: number of acked aggregated frames. * relevant only if IEEE80211_TX_STAT_AMPDU was set. * @ampdu_len: number of aggregated frames. * relevant only if IEEE80211_TX_STAT_AMPDU was set. * @ack_signal: signal strength of the ACK frame */ struct ieee80211_tx_info { /* common information */ u32 flags; u8 band; u8 hw_queue; u16 ack_frame_id; union { struct { union { /* rate control */ struct { struct ieee80211_tx_rate rates[ IEEE80211_TX_MAX_RATES]; s8 rts_cts_rate_idx; u8 use_rts:1; u8 use_cts_prot:1; u8 short_preamble:1; u8 skip_table:1; /* 2 bytes free */ }; /* only needed before rate control */ unsigned long jiffies; }; /* NB: vif can be NULL for injected frames */ struct ieee80211_vif *vif; struct ieee80211_key_conf *hw_key; u32 flags; codel_time_t enqueue_time; } control; struct { u64 cookie; } ack; struct { struct ieee80211_tx_rate rates[IEEE80211_TX_MAX_RATES]; s32 ack_signal; u8 ampdu_ack_len; u8 ampdu_len; u8 antenna; u16 tx_time; bool is_valid_ack_signal; void *status_driver_data[19 / sizeof(void *)]; } status; struct { struct ieee80211_tx_rate driver_rates[ IEEE80211_TX_MAX_RATES]; u8 pad[4]; void *rate_driver_data[ IEEE80211_TX_INFO_RATE_DRIVER_DATA_SIZE / sizeof(void *)]; }; void *driver_data[ IEEE80211_TX_INFO_DRIVER_DATA_SIZE / sizeof(void *)]; }; }; /** * struct ieee80211_tx_status - extended tx staus info for rate control * * @sta: Station that the packet was transmitted for * @info: Basic tx status information * @skb: Packet skb (can be NULL if not provided by the driver) */ struct ieee80211_tx_status { struct ieee80211_sta *sta; struct ieee80211_tx_info *info; struct sk_buff *skb; }; /** * struct ieee80211_scan_ies - descriptors for different blocks of IEs * * This structure is used to point to different blocks of IEs in HW scan * and scheduled scan. These blocks contain the IEs passed by userspace * and the ones generated by mac80211. * * @ies: pointers to band specific IEs. * @len: lengths of band_specific IEs. * @common_ies: IEs for all bands (especially vendor specific ones) * @common_ie_len: length of the common_ies */ struct ieee80211_scan_ies { const u8 *ies[NUM_NL80211_BANDS]; size_t len[NUM_NL80211_BANDS]; const u8 *common_ies; size_t common_ie_len; }; static inline struct ieee80211_tx_info *IEEE80211_SKB_CB(struct sk_buff *skb) { return (struct ieee80211_tx_info *)skb->cb; } static inline struct ieee80211_rx_status *IEEE80211_SKB_RXCB(struct sk_buff *skb) { return (struct ieee80211_rx_status *)skb->cb; } /** * ieee80211_tx_info_clear_status - clear TX status * * @info: The &struct ieee80211_tx_info to be cleared. * * When the driver passes an skb back to mac80211, it must report * a number of things in TX status. This function clears everything * in the TX status but the rate control information (it does clear * the count since you need to fill that in anyway). * * NOTE: You can only use this function if you do NOT use * info->driver_data! Use info->rate_driver_data * instead if you need only the less space that allows. */ static inline void ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info) { int i; BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != offsetof(struct ieee80211_tx_info, control.rates)); BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != offsetof(struct ieee80211_tx_info, driver_rates)); BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8); /* clear the rate counts */ for (i = 0; i < IEEE80211_TX_MAX_RATES; i++) info->status.rates[i].count = 0; BUILD_BUG_ON( offsetof(struct ieee80211_tx_info, status.ack_signal) != 20); memset(&info->status.ampdu_ack_len, 0, sizeof(struct ieee80211_tx_info) - offsetof(struct ieee80211_tx_info, status.ampdu_ack_len)); } /** * enum mac80211_rx_flags - receive flags * * These flags are used with the @flag member of &struct ieee80211_rx_status. * @RX_FLAG_MMIC_ERROR: Michael MIC error was reported on this frame. * Use together with %RX_FLAG_MMIC_STRIPPED. * @RX_FLAG_DECRYPTED: This frame was decrypted in hardware. * @RX_FLAG_MMIC_STRIPPED: the Michael MIC is stripped off this frame, * verification has been done by the hardware. * @RX_FLAG_IV_STRIPPED: The IV and ICV are stripped from this frame. * If this flag is set, the stack cannot do any replay detection * hence the driver or hardware will have to do that. * @RX_FLAG_PN_VALIDATED: Currently only valid for CCMP/GCMP frames, this * flag indicates that the PN was verified for replay protection. * Note that this flag is also currently only supported when a frame * is also decrypted (ie. @RX_FLAG_DECRYPTED must be set) * @RX_FLAG_DUP_VALIDATED: The driver should set this flag if it did * de-duplication by itself. * @RX_FLAG_FAILED_FCS_CRC: Set this flag if the FCS check failed on * the frame. * @RX_FLAG_FAILED_PLCP_CRC: Set this flag if the PCLP check failed on * the frame. * @RX_FLAG_MACTIME_START: The timestamp passed in the RX status (@mactime * field) is valid and contains the time the first symbol of the MPDU * was received. This is useful in monitor mode and for proper IBSS * merging. * @RX_FLAG_MACTIME_END: The timestamp passed in the RX status (@mactime * field) is valid and contains the time the last symbol of the MPDU * (including FCS) was received. * @RX_FLAG_MACTIME_PLCP_START: The timestamp passed in the RX status (@mactime * field) is valid and contains the time the SYNC preamble was received. * @RX_FLAG_NO_SIGNAL_VAL: The signal strength value is not present. * Valid only for data frames (mainly A-MPDU) * @RX_FLAG_AMPDU_DETAILS: A-MPDU details are known, in particular the reference * number (@ampdu_reference) must be populated and be a distinct number for * each A-MPDU * @RX_FLAG_AMPDU_LAST_KNOWN: last subframe is known, should be set on all * subframes of a single A-MPDU * @RX_FLAG_AMPDU_IS_LAST: this subframe is the last subframe of the A-MPDU * @RX_FLAG_AMPDU_DELIM_CRC_ERROR: A delimiter CRC error has been detected * on this subframe * @RX_FLAG_AMPDU_DELIM_CRC_KNOWN: The delimiter CRC field is known (the CRC * is stored in the @ampdu_delimiter_crc field) * @RX_FLAG_MIC_STRIPPED: The mic was stripped of this packet. Decryption was * done by the hardware * @RX_FLAG_ONLY_MONITOR: Report frame only to monitor interfaces without * processing it in any regular way. * This is useful if drivers offload some frames but still want to report * them for sniffing purposes. * @RX_FLAG_SKIP_MONITOR: Process and report frame to all interfaces except * monitor interfaces. * This is useful if drivers offload some frames but still want to report * them for sniffing purposes. * @RX_FLAG_AMSDU_MORE: Some drivers may prefer to report separate A-MSDU * subframes instead of a one huge frame for performance reasons. * All, but the last MSDU from an A-MSDU should have this flag set. E.g. * if an A-MSDU has 3 frames, the first 2 must have the flag set, while * the 3rd (last) one must not have this flag set. The flag is used to * deal with retransmission/duplication recovery properly since A-MSDU * subframes share the same sequence number. Reported subframes can be * either regular MSDU or singly A-MSDUs. Subframes must not be * interleaved with other frames. * @RX_FLAG_RADIOTAP_VENDOR_DATA: This frame contains vendor-specific * radiotap data in the skb->data (before the frame) as described by * the &struct ieee80211_vendor_radiotap. * @RX_FLAG_ALLOW_SAME_PN: Allow the same PN as same packet before. * This is used for AMSDU subframes which can have the same PN as * the first subframe. * @RX_FLAG_ICV_STRIPPED: The ICV is stripped from this frame. CRC checking must * be done in the hardware. * @RX_FLAG_AMPDU_EOF_BIT: Value of the EOF bit in the A-MPDU delimiter for this * frame * @RX_FLAG_AMPDU_EOF_BIT_KNOWN: The EOF value is known * @RX_FLAG_RADIOTAP_HE: HE radiotap data is present * (&struct ieee80211_radiotap_he, mac80211 will fill in * - DATA3_DATA_MCS * - DATA3_DATA_DCM * - DATA3_CODING * - DATA5_GI * - DATA5_DATA_BW_RU_ALLOC * - DATA6_NSTS * - DATA3_STBC * from the RX info data, so leave those zeroed when building this data) * @RX_FLAG_RADIOTAP_HE_MU: HE MU radiotap data is present * (&struct ieee80211_radiotap_he_mu) */ enum mac80211_rx_flags { RX_FLAG_MMIC_ERROR = BIT(0), RX_FLAG_DECRYPTED = BIT(1), RX_FLAG_MACTIME_PLCP_START = BIT(2), RX_FLAG_MMIC_STRIPPED = BIT(3), RX_FLAG_IV_STRIPPED = BIT(4), RX_FLAG_FAILED_FCS_CRC = BIT(5), RX_FLAG_FAILED_PLCP_CRC = BIT(6), RX_FLAG_MACTIME_START = BIT(7), RX_FLAG_NO_SIGNAL_VAL = BIT(8), RX_FLAG_AMPDU_DETAILS = BIT(9), RX_FLAG_PN_VALIDATED = BIT(10), RX_FLAG_DUP_VALIDATED = BIT(11), RX_FLAG_AMPDU_LAST_KNOWN = BIT(12), RX_FLAG_AMPDU_IS_LAST = BIT(13), RX_FLAG_AMPDU_DELIM_CRC_ERROR = BIT(14), RX_FLAG_AMPDU_DELIM_CRC_KNOWN = BIT(15), RX_FLAG_MACTIME_END = BIT(16), RX_FLAG_ONLY_MONITOR = BIT(17), RX_FLAG_SKIP_MONITOR = BIT(18), RX_FLAG_AMSDU_MORE = BIT(19), RX_FLAG_RADIOTAP_VENDOR_DATA = BIT(20), RX_FLAG_MIC_STRIPPED = BIT(21), RX_FLAG_ALLOW_SAME_PN = BIT(22), RX_FLAG_ICV_STRIPPED = BIT(23), RX_FLAG_AMPDU_EOF_BIT = BIT(24), RX_FLAG_AMPDU_EOF_BIT_KNOWN = BIT(25), RX_FLAG_RADIOTAP_HE = BIT(26), RX_FLAG_RADIOTAP_HE_MU = BIT(27), }; /** * enum mac80211_rx_encoding_flags - MCS & bandwidth flags * * @RX_ENC_FLAG_SHORTPRE: Short preamble was used for this frame * @RX_ENC_FLAG_SHORT_GI: Short guard interval was used * @RX_ENC_FLAG_HT_GF: This frame was received in a HT-greenfield transmission, * if the driver fills this value it should add * %IEEE80211_RADIOTAP_MCS_HAVE_FMT * to hw.radiotap_mcs_details to advertise that fact * @RX_ENC_FLAG_LDPC: LDPC was used * @RX_ENC_FLAG_STBC_MASK: STBC 2 bit bitmask. 1 - Nss=1, 2 - Nss=2, 3 - Nss=3 * @RX_ENC_FLAG_BF: packet was beamformed */ enum mac80211_rx_encoding_flags { RX_ENC_FLAG_SHORTPRE = BIT(0), RX_ENC_FLAG_SHORT_GI = BIT(2), RX_ENC_FLAG_HT_GF = BIT(3), RX_ENC_FLAG_STBC_MASK = BIT(4) | BIT(5), RX_ENC_FLAG_LDPC = BIT(6), RX_ENC_FLAG_BF = BIT(7), }; #define RX_ENC_FLAG_STBC_SHIFT 4 enum mac80211_rx_encoding { RX_ENC_LEGACY = 0, RX_ENC_HT, RX_ENC_VHT, RX_ENC_HE, }; /** * struct ieee80211_rx_status - receive status * * The low-level driver should provide this information (the subset * supported by hardware) to the 802.11 code with each received * frame, in the skb's control buffer (cb). * * @mactime: value in microseconds of the 64-bit Time Synchronization Function * (TSF) timer when the first data symbol (MPDU) arrived at the hardware. * @boottime_ns: CLOCK_BOOTTIME timestamp the frame was received at, this is * needed only for beacons and probe responses that update the scan cache. * @device_timestamp: arbitrary timestamp for the device, mac80211 doesn't use * it but can store it and pass it back to the driver for synchronisation * @band: the active band when this frame was received * @freq: frequency the radio was tuned to when receiving this frame, in MHz * This field must be set for management frames, but isn't strictly needed * for data (other) frames - for those it only affects radiotap reporting. * @signal: signal strength when receiving this frame, either in dBm, in dB or * unspecified depending on the hardware capabilities flags * @IEEE80211_HW_SIGNAL_* * @chains: bitmask of receive chains for which separate signal strength * values were filled. * @chain_signal: per-chain signal strength, in dBm (unlike @signal, doesn't * support dB or unspecified units) * @antenna: antenna used * @rate_idx: index of data rate into band's supported rates or MCS index if * HT or VHT is used (%RX_FLAG_HT/%RX_FLAG_VHT) * @nss: number of streams (VHT and HE only) * @flag: %RX_FLAG_\* * @encoding: &enum mac80211_rx_encoding * @bw: &enum rate_info_bw * @enc_flags: uses bits from &enum mac80211_rx_encoding_flags * @he_ru: HE RU, from &enum nl80211_he_ru_alloc * @he_gi: HE GI, from &enum nl80211_he_gi * @he_dcm: HE DCM value * @rx_flags: internal RX flags for mac80211 * @ampdu_reference: A-MPDU reference number, must be a different value for * each A-MPDU but the same for each subframe within one A-MPDU * @ampdu_delimiter_crc: A-MPDU delimiter CRC */ struct ieee80211_rx_status { u64 mactime; u64 boottime_ns; u32 device_timestamp; u32 ampdu_reference; u32 flag; u16 freq; u8 enc_flags; u8 encoding:2, bw:3, he_ru:3; u8 he_gi:2, he_dcm:1; u8 rate_idx; u8 nss; u8 rx_flags; u8 band; u8 antenna; s8 signal; u8 chains; s8 chain_signal[IEEE80211_MAX_CHAINS]; u8 ampdu_delimiter_crc; }; /** * struct ieee80211_vendor_radiotap - vendor radiotap data information * @present: presence bitmap for this vendor namespace * (this could be extended in the future if any vendor needs more * bits, the radiotap spec does allow for that) * @align: radiotap vendor namespace alignment. This defines the needed * alignment for the @data field below, not for the vendor namespace * description itself (which has a fixed 2-byte alignment) * Must be a power of two, and be set to at least 1! * @oui: radiotap vendor namespace OUI * @subns: radiotap vendor sub namespace * @len: radiotap vendor sub namespace skip length, if alignment is done * then that's added to this, i.e. this is only the length of the * @data field. * @pad: number of bytes of padding after the @data, this exists so that * the skb data alignment can be preserved even if the data has odd * length * @data: the actual vendor namespace data * * This struct, including the vendor data, goes into the skb->data before * the 802.11 header. It's split up in mac80211 using the align/oui/subns * data. */ struct ieee80211_vendor_radiotap { u32 present; u8 align; u8 oui[3]; u8 subns; u8 pad; u16 len; u8 data[]; } __packed; /** * enum ieee80211_conf_flags - configuration flags * * Flags to define PHY configuration options * * @IEEE80211_CONF_MONITOR: there's a monitor interface present -- use this * to determine for example whether to calculate timestamps for packets * or not, do not use instead of filter flags! * @IEEE80211_CONF_PS: Enable 802.11 power save mode (managed mode only). * This is the power save mode defined by IEEE 802.11-2007 section 11.2, * meaning that the hardware still wakes up for beacons, is able to * transmit frames and receive the possible acknowledgment frames. * Not to be confused with hardware specific wakeup/sleep states, * driver is responsible for that. See the section "Powersave support" * for more. * @IEEE80211_CONF_IDLE: The device is running, but idle; if the flag is set * the driver should be prepared to handle configuration requests but * may turn the device off as much as possible. Typically, this flag will * be set when an interface is set UP but not associated or scanning, but * it can also be unset in that case when monitor interfaces are active. * @IEEE80211_CONF_OFFCHANNEL: The device is currently not on its main * operating channel. */ enum ieee80211_conf_flags { IEEE80211_CONF_MONITOR = (1<<0), IEEE80211_CONF_PS = (1<<1), IEEE80211_CONF_IDLE = (1<<2), IEEE80211_CONF_OFFCHANNEL = (1<<3), }; /** * enum ieee80211_conf_changed - denotes which configuration changed * * @IEEE80211_CONF_CHANGE_LISTEN_INTERVAL: the listen interval changed * @IEEE80211_CONF_CHANGE_MONITOR: the monitor flag changed * @IEEE80211_CONF_CHANGE_PS: the PS flag or dynamic PS timeout changed * @IEEE80211_CONF_CHANGE_POWER: the TX power changed * @IEEE80211_CONF_CHANGE_CHANNEL: the channel/channel_type changed * @IEEE80211_CONF_CHANGE_RETRY_LIMITS: retry limits changed * @IEEE80211_CONF_CHANGE_IDLE: Idle flag changed * @IEEE80211_CONF_CHANGE_SMPS: Spatial multiplexing powersave mode changed * Note that this is only valid if channel contexts are not used, * otherwise each channel context has the number of chains listed. */ enum ieee80211_conf_changed { IEEE80211_CONF_CHANGE_SMPS = BIT(1), IEEE80211_CONF_CHANGE_LISTEN_INTERVAL = BIT(2), IEEE80211_CONF_CHANGE_MONITOR = BIT(3), IEEE80211_CONF_CHANGE_PS = BIT(4), IEEE80211_CONF_CHANGE_POWER = BIT(5), IEEE80211_CONF_CHANGE_CHANNEL = BIT(6), IEEE80211_CONF_CHANGE_RETRY_LIMITS = BIT(7), IEEE80211_CONF_CHANGE_IDLE = BIT(8), }; /** * enum ieee80211_smps_mode - spatial multiplexing power save mode * * @IEEE80211_SMPS_AUTOMATIC: automatic * @IEEE80211_SMPS_OFF: off * @IEEE80211_SMPS_STATIC: static * @IEEE80211_SMPS_DYNAMIC: dynamic * @IEEE80211_SMPS_NUM_MODES: internal, don't use */ enum ieee80211_smps_mode { IEEE80211_SMPS_AUTOMATIC, IEEE80211_SMPS_OFF, IEEE80211_SMPS_STATIC, IEEE80211_SMPS_DYNAMIC, /* keep last */ IEEE80211_SMPS_NUM_MODES, }; /** * struct ieee80211_conf - configuration of the device * * This struct indicates how the driver shall configure the hardware. * * @flags: configuration flags defined above * * @listen_interval: listen interval in units of beacon interval * @ps_dtim_period: The DTIM period of the AP we're connected to, for use * in power saving. Power saving will not be enabled until a beacon * has been received and the DTIM period is known. * @dynamic_ps_timeout: The dynamic powersave timeout (in ms), see the * powersave documentation below. This variable is valid only when * the CONF_PS flag is set. * * @power_level: requested transmit power (in dBm), backward compatibility * value only that is set to the minimum of all interfaces * * @chandef: the channel definition to tune to * @radar_enabled: whether radar detection is enabled * * @long_frame_max_tx_count: Maximum number of transmissions for a "long" frame * (a frame not RTS protected), called "dot11LongRetryLimit" in 802.11, * but actually means the number of transmissions not the number of retries * @short_frame_max_tx_count: Maximum number of transmissions for a "short" * frame, called "dot11ShortRetryLimit" in 802.11, but actually means the * number of transmissions not the number of retries * * @smps_mode: spatial multiplexing powersave mode; note that * %IEEE80211_SMPS_STATIC is used when the device is not * configured for an HT channel. * Note that this is only valid if channel contexts are not used, * otherwise each channel context has the number of chains listed. */ struct ieee80211_conf { u32 flags; int power_level, dynamic_ps_timeout; u16 listen_interval; u8 ps_dtim_period; u8 long_frame_max_tx_count, short_frame_max_tx_count; struct cfg80211_chan_def chandef; bool radar_enabled; enum ieee80211_smps_mode smps_mode; }; /** * struct ieee80211_channel_switch - holds the channel switch data * * The information provided in this structure is required for channel switch * operation. * * @timestamp: value in microseconds of the 64-bit Time Synchronization * Function (TSF) timer when the frame containing the channel switch * announcement was received. This is simply the rx.mactime parameter * the driver passed into mac80211. * @device_timestamp: arbitrary timestamp for the device, this is the * rx.device_timestamp parameter the driver passed to mac80211. * @block_tx: Indicates whether transmission must be blocked before the * scheduled channel switch, as indicated by the AP. * @chandef: the new channel to switch to * @count: the number of TBTT's until the channel switch event */ struct ieee80211_channel_switch { u64 timestamp; u32 device_timestamp; bool block_tx; struct cfg80211_chan_def chandef; u8 count; }; /** * enum ieee80211_vif_flags - virtual interface flags * * @IEEE80211_VIF_BEACON_FILTER: the device performs beacon filtering * on this virtual interface to avoid unnecessary CPU wakeups * @IEEE80211_VIF_SUPPORTS_CQM_RSSI: the device can do connection quality * monitoring on this virtual interface -- i.e. it can monitor * connection quality related parameters, such as the RSSI level and * provide notifications if configured trigger levels are reached. * @IEEE80211_VIF_SUPPORTS_UAPSD: The device can do U-APSD for this * interface. This flag should be set during interface addition, * but may be set/cleared as late as authentication to an AP. It is * only valid for managed/station mode interfaces. * @IEEE80211_VIF_GET_NOA_UPDATE: request to handle NOA attributes * and send P2P_PS notification to the driver if NOA changed, even * this is not pure P2P vif. */ enum ieee80211_vif_flags { IEEE80211_VIF_BEACON_FILTER = BIT(0), IEEE80211_VIF_SUPPORTS_CQM_RSSI = BIT(1), IEEE80211_VIF_SUPPORTS_UAPSD = BIT(2), IEEE80211_VIF_GET_NOA_UPDATE = BIT(3), }; /** * struct ieee80211_vif - per-interface data * * Data in this structure is continually present for driver * use during the life of a virtual interface. * * @type: type of this virtual interface * @bss_conf: BSS configuration for this interface, either our own * or the BSS we're associated to * @addr: address of this interface * @p2p: indicates whether this AP or STA interface is a p2p * interface, i.e. a GO or p2p-sta respectively * @csa_active: marks whether a channel switch is going on. Internally it is * write-protected by sdata_lock and local->mtx so holding either is fine * for read access. * @mu_mimo_owner: indicates interface owns MU-MIMO capability * @driver_flags: flags/capabilities the driver has for this interface, * these need to be set (or cleared) when the interface is added * or, if supported by the driver, the interface type is changed * at runtime, mac80211 will never touch this field * @hw_queue: hardware queue for each AC * @cab_queue: content-after-beacon (DTIM beacon really) queue, AP mode only * @chanctx_conf: The channel context this interface is assigned to, or %NULL * when it is not assigned. This pointer is RCU-protected due to the TX * path needing to access it; even though the netdev carrier will always * be off when it is %NULL there can still be races and packets could be * processed after it switches back to %NULL. * @debugfs_dir: debugfs dentry, can be used by drivers to create own per * interface debug files. Note that it will be NULL for the virtual * monitor interface (if that is requested.) * @probe_req_reg: probe requests should be reported to mac80211 for this * interface. * @drv_priv: data area for driver use, will always be aligned to * sizeof(void \*). * @txq: the multicast data TX queue (if driver uses the TXQ abstraction) */ struct ieee80211_vif { enum nl80211_iftype type; struct ieee80211_bss_conf bss_conf; u8 addr[ETH_ALEN] __aligned(2); bool p2p; bool csa_active; bool mu_mimo_owner; u8 cab_queue; u8 hw_queue[IEEE80211_NUM_ACS]; struct ieee80211_txq *txq; struct ieee80211_chanctx_conf __rcu *chanctx_conf; u32 driver_flags; #ifdef CONFIG_MAC80211_DEBUGFS struct dentry *debugfs_dir; #endif unsigned int probe_req_reg; /* must be last */ u8 drv_priv[0] __aligned(sizeof(void *)); }; static inline bool ieee80211_vif_is_mesh(struct ieee80211_vif *vif) { #ifdef CONFIG_MAC80211_MESH return vif->type == NL80211_IFTYPE_MESH_POINT; #endif return false; } /** * wdev_to_ieee80211_vif - return a vif struct from a wdev * @wdev: the wdev to get the vif for * * This can be used by mac80211 drivers with direct cfg80211 APIs * (like the vendor commands) that get a wdev. * * Note that this function may return %NULL if the given wdev isn't * associated with a vif that the driver knows about (e.g. monitor * or AP_VLAN interfaces.) */ struct ieee80211_vif *wdev_to_ieee80211_vif(struct wireless_dev *wdev); /** * ieee80211_vif_to_wdev - return a wdev struct from a vif * @vif: the vif to get the wdev for * * This can be used by mac80211 drivers with direct cfg80211 APIs * (like the vendor commands) that needs to get the wdev for a vif. * * Note that this function may return %NULL if the given wdev isn't * associated with a vif that the driver knows about (e.g. monitor * or AP_VLAN interfaces.) */ struct wireless_dev *ieee80211_vif_to_wdev(struct ieee80211_vif *vif); /** * enum ieee80211_key_flags - key flags * * These flags are used for communication about keys between the driver * and mac80211, with the @flags parameter of &struct ieee80211_key_conf. * * @IEEE80211_KEY_FLAG_GENERATE_IV: This flag should be set by the * driver to indicate that it requires IV generation for this * particular key. Setting this flag does not necessarily mean that SKBs * will have sufficient tailroom for ICV or MIC. * @IEEE80211_KEY_FLAG_GENERATE_MMIC: This flag should be set by * the driver for a TKIP key if it requires Michael MIC * generation in software. * @IEEE80211_KEY_FLAG_PAIRWISE: Set by mac80211, this flag indicates * that the key is pairwise rather then a shared key. * @IEEE80211_KEY_FLAG_SW_MGMT_TX: This flag should be set by the driver for a * CCMP/GCMP key if it requires CCMP/GCMP encryption of management frames * (MFP) to be done in software. * @IEEE80211_KEY_FLAG_PUT_IV_SPACE: This flag should be set by the driver * if space should be prepared for the IV, but the IV * itself should not be generated. Do not set together with * @IEEE80211_KEY_FLAG_GENERATE_IV on the same key. Setting this flag does * not necessarily mean that SKBs will have sufficient tailroom for ICV or * MIC. * @IEEE80211_KEY_FLAG_RX_MGMT: This key will be used to decrypt received * management frames. The flag can help drivers that have a hardware * crypto implementation that doesn't deal with management frames * properly by allowing them to not upload the keys to hardware and * fall back to software crypto. Note that this flag deals only with * RX, if your crypto engine can't deal with TX you can also set the * %IEEE80211_KEY_FLAG_SW_MGMT_TX flag to encrypt such frames in SW. * @IEEE80211_KEY_FLAG_GENERATE_IV_MGMT: This flag should be set by the * driver for a CCMP/GCMP key to indicate that is requires IV generation * only for managment frames (MFP). * @IEEE80211_KEY_FLAG_RESERVE_TAILROOM: This flag should be set by the * driver for a key to indicate that sufficient tailroom must always * be reserved for ICV or MIC, even when HW encryption is enabled. * @IEEE80211_KEY_FLAG_PUT_MIC_SPACE: This flag should be set by the driver for * a TKIP key if it only requires MIC space. Do not set together with * @IEEE80211_KEY_FLAG_GENERATE_MMIC on the same key. */ enum ieee80211_key_flags { IEEE80211_KEY_FLAG_GENERATE_IV_MGMT = BIT(0), IEEE80211_KEY_FLAG_GENERATE_IV = BIT(1), IEEE80211_KEY_FLAG_GENERATE_MMIC = BIT(2), IEEE80211_KEY_FLAG_PAIRWISE = BIT(3), IEEE80211_KEY_FLAG_SW_MGMT_TX = BIT(4), IEEE80211_KEY_FLAG_PUT_IV_SPACE = BIT(5), IEEE80211_KEY_FLAG_RX_MGMT = BIT(6), IEEE80211_KEY_FLAG_RESERVE_TAILROOM = BIT(7), IEEE80211_KEY_FLAG_PUT_MIC_SPACE = BIT(8), }; /** * struct ieee80211_key_conf - key information * * This key information is given by mac80211 to the driver by * the set_key() callback in &struct ieee80211_ops. * * @hw_key_idx: To be set by the driver, this is the key index the driver * wants to be given when a frame is transmitted and needs to be * encrypted in hardware. * @cipher: The key's cipher suite selector. * @tx_pn: PN used for TX keys, may be used by the driver as well if it * needs to do software PN assignment by itself (e.g. due to TSO) * @flags: key flags, see &enum ieee80211_key_flags. * @keyidx: the key index (0-3) * @keylen: key material length * @key: key material. For ALG_TKIP the key is encoded as a 256-bit (32 byte) * data block: * - Temporal Encryption Key (128 bits) * - Temporal Authenticator Tx MIC Key (64 bits) * - Temporal Authenticator Rx MIC Key (64 bits) * @icv_len: The ICV length for this key type * @iv_len: The IV length for this key type */ struct ieee80211_key_conf { atomic64_t tx_pn; u32 cipher; u8 icv_len; u8 iv_len; u8 hw_key_idx; s8 keyidx; u16 flags; u8 keylen; u8 key[0]; }; #define IEEE80211_MAX_PN_LEN 16 #define TKIP_PN_TO_IV16(pn) ((u16)(pn & 0xffff)) #define TKIP_PN_TO_IV32(pn) ((u32)((pn >> 16) & 0xffffffff)) /** * struct ieee80211_key_seq - key sequence counter * * @tkip: TKIP data, containing IV32 and IV16 in host byte order * @ccmp: PN data, most significant byte first (big endian, * reverse order than in packet) * @aes_cmac: PN data, most significant byte first (big endian, * reverse order than in packet) * @aes_gmac: PN data, most significant byte first (big endian, * reverse order than in packet) * @gcmp: PN data, most significant byte first (big endian, * reverse order than in packet) * @hw: data for HW-only (e.g. cipher scheme) keys */ struct ieee80211_key_seq { union { struct { u32 iv32; u16 iv16; } tkip; struct { u8 pn[6]; } ccmp; struct { u8 pn[6]; } aes_cmac; struct { u8 pn[6]; } aes_gmac; struct { u8 pn[6]; } gcmp; struct { u8 seq[IEEE80211_MAX_PN_LEN]; u8 seq_len; } hw; }; }; /** * struct ieee80211_cipher_scheme - cipher scheme * * This structure contains a cipher scheme information defining * the secure packet crypto handling. * * @cipher: a cipher suite selector * @iftype: a cipher iftype bit mask indicating an allowed cipher usage * @hdr_len: a length of a security header used the cipher * @pn_len: a length of a packet number in the security header * @pn_off: an offset of pn from the beginning of the security header * @key_idx_off: an offset of key index byte in the security header * @key_idx_mask: a bit mask of key_idx bits * @key_idx_shift: a bit shift needed to get key_idx * key_idx value calculation: * (sec_header_base[key_idx_off] & key_idx_mask) >> key_idx_shift * @mic_len: a mic length in bytes */ struct ieee80211_cipher_scheme { u32 cipher; u16 iftype; u8 hdr_len; u8 pn_len; u8 pn_off; u8 key_idx_off; u8 key_idx_mask; u8 key_idx_shift; u8 mic_len; }; /** * enum set_key_cmd - key command * * Used with the set_key() callback in &struct ieee80211_ops, this * indicates whether a key is being removed or added. * * @SET_KEY: a key is set * @DISABLE_KEY: a key must be disabled */ enum set_key_cmd { SET_KEY, DISABLE_KEY, }; /** * enum ieee80211_sta_state - station state * * @IEEE80211_STA_NOTEXIST: station doesn't exist at all, * this is a special state for add/remove transitions * @IEEE80211_STA_NONE: station exists without special state * @IEEE80211_STA_AUTH: station is authenticated * @IEEE80211_STA_ASSOC: station is associated * @IEEE80211_STA_AUTHORIZED: station is authorized (802.1X) */ enum ieee80211_sta_state { /* NOTE: These need to be ordered correctly! */ IEEE80211_STA_NOTEXIST, IEEE80211_STA_NONE, IEEE80211_STA_AUTH, IEEE80211_STA_ASSOC, IEEE80211_STA_AUTHORIZED, }; /** * enum ieee80211_sta_rx_bandwidth - station RX bandwidth * @IEEE80211_STA_RX_BW_20: station can only receive 20 MHz * @IEEE80211_STA_RX_BW_40: station can receive up to 40 MHz * @IEEE80211_STA_RX_BW_80: station can receive up to 80 MHz * @IEEE80211_STA_RX_BW_160: station can receive up to 160 MHz * (including 80+80 MHz) * * Implementation note: 20 must be zero to be initialized * correctly, the values must be sorted. */ enum ieee80211_sta_rx_bandwidth { IEEE80211_STA_RX_BW_20 = 0, IEEE80211_STA_RX_BW_40, IEEE80211_STA_RX_BW_80, IEEE80211_STA_RX_BW_160, }; /** * struct ieee80211_sta_rates - station rate selection table * * @rcu_head: RCU head used for freeing the table on update * @rate: transmit rates/flags to be used by default. * Overriding entries per-packet is possible by using cb tx control. */ struct ieee80211_sta_rates { struct rcu_head rcu_head; struct { s8 idx; u8 count; u8 count_cts; u8 count_rts; u16 flags; } rate[IEEE80211_TX_RATE_TABLE_SIZE]; }; /** * struct ieee80211_sta - station table entry * * A station table entry represents a station we are possibly * communicating with. Since stations are RCU-managed in * mac80211, any ieee80211_sta pointer you get access to must * either be protected by rcu_read_lock() explicitly or implicitly, * or you must take good care to not use such a pointer after a * call to your sta_remove callback that removed it. * * @addr: MAC address * @aid: AID we assigned to the station if we're an AP * @supp_rates: Bitmap of supported rates (per band) * @ht_cap: HT capabilities of this STA; restricted to our own capabilities * @vht_cap: VHT capabilities of this STA; restricted to our own capabilities * @he_cap: HE capabilities of this STA * @max_rx_aggregation_subframes: maximal amount of frames in a single AMPDU * that this station is allowed to transmit to us. * Can be modified by driver. * @wme: indicates whether the STA supports QoS/WME (if local devices does, * otherwise always false) * @drv_priv: data area for driver use, will always be aligned to * sizeof(void \*), size is determined in hw information. * @uapsd_queues: bitmap of queues configured for uapsd. Only valid * if wme is supported. The bits order is like in * IEEE80211_WMM_IE_STA_QOSINFO_AC_*. * @max_sp: max Service Period. Only valid if wme is supported. * @bandwidth: current bandwidth the station can receive with * @rx_nss: in HT/VHT, the maximum number of spatial streams the * station can receive at the moment, changed by operating mode * notifications and capabilities. The value is only valid after * the station moves to associated state. * @smps_mode: current SMPS mode (off, static or dynamic) * @rates: rate control selection table * @tdls: indicates whether the STA is a TDLS peer * @tdls_initiator: indicates the STA is an initiator of the TDLS link. Only * valid if the STA is a TDLS peer in the first place. * @mfp: indicates whether the STA uses management frame protection or not. * @max_amsdu_subframes: indicates the maximal number of MSDUs in a single * A-MSDU. Taken from the Extended Capabilities element. 0 means * unlimited. * @support_p2p_ps: indicates whether the STA supports P2P PS mechanism or not. * @max_rc_amsdu_len: Maximum A-MSDU size in bytes recommended by rate control. * @txq: per-TID data TX queues (if driver uses the TXQ abstraction) */ struct ieee80211_sta { u32 supp_rates[NUM_NL80211_BANDS]; u8 addr[ETH_ALEN]; u16 aid; struct ieee80211_sta_ht_cap ht_cap; struct ieee80211_sta_vht_cap vht_cap; struct ieee80211_sta_he_cap he_cap; u16 max_rx_aggregation_subframes; bool wme; u8 uapsd_queues; u8 max_sp; u8 rx_nss; enum ieee80211_sta_rx_bandwidth bandwidth; enum ieee80211_smps_mode smps_mode; struct ieee80211_sta_rates __rcu *rates; bool tdls; bool tdls_initiator; bool mfp; u8 max_amsdu_subframes; /** * @max_amsdu_len: * indicates the maximal length of an A-MSDU in bytes. * This field is always valid for packets with a VHT preamble. * For packets with a HT preamble, additional limits apply: * * * If the skb is transmitted as part of a BA agreement, the * A-MSDU maximal size is min(max_amsdu_len, 4065) bytes. * * If the skb is not part of a BA aggreement, the A-MSDU maximal * size is min(max_amsdu_len, 7935) bytes. * * Both additional HT limits must be enforced by the low level * driver. This is defined by the spec (IEEE 802.11-2012 section * 8.3.2.2 NOTE 2). */ u16 max_amsdu_len; bool support_p2p_ps; u16 max_rc_amsdu_len; struct ieee80211_txq *txq[IEEE80211_NUM_TIDS]; /* must be last */ u8 drv_priv[0] __aligned(sizeof(void *)); }; /** * enum sta_notify_cmd - sta notify command * * Used with the sta_notify() callback in &struct ieee80211_ops, this * indicates if an associated station made a power state transition. * * @STA_NOTIFY_SLEEP: a station is now sleeping * @STA_NOTIFY_AWAKE: a sleeping station woke up */ enum sta_notify_cmd { STA_NOTIFY_SLEEP, STA_NOTIFY_AWAKE, }; /** * struct ieee80211_tx_control - TX control data * * @sta: station table entry, this sta pointer may be NULL and * it is not allowed to copy the pointer, due to RCU. */ struct ieee80211_tx_control { struct ieee80211_sta *sta; }; /** * struct ieee80211_txq - Software intermediate tx queue * * @vif: &struct ieee80211_vif pointer from the add_interface callback. * @sta: station table entry, %NULL for per-vif queue * @tid: the TID for this queue (unused for per-vif queue) * @ac: the AC for this queue * @drv_priv: driver private area, sized by hw->txq_data_size * * The driver can obtain packets from this queue by calling * ieee80211_tx_dequeue(). */ struct ieee80211_txq { struct ieee80211_vif *vif; struct ieee80211_sta *sta; u8 tid; u8 ac; /* must be last */ u8 drv_priv[0] __aligned(sizeof(void *)); }; /** * enum ieee80211_hw_flags - hardware flags * * These flags are used to indicate hardware capabilities to * the stack. Generally, flags here should have their meaning * done in a way that the simplest hardware doesn't need setting * any particular flags. There are some exceptions to this rule, * however, so you are advised to review these flags carefully. * * @IEEE80211_HW_HAS_RATE_CONTROL: * The hardware or firmware includes rate control, and cannot be * controlled by the stack. As such, no rate control algorithm * should be instantiated, and the TX rate reported to userspace * will be taken from the TX status instead of the rate control * algorithm. * Note that this requires that the driver implement a number of * callbacks so it has the correct information, it needs to have * the @set_rts_threshold callback and must look at the BSS config * @use_cts_prot for G/N protection, @use_short_slot for slot * timing in 2.4 GHz and @use_short_preamble for preambles for * CCK frames. * * @IEEE80211_HW_RX_INCLUDES_FCS: * Indicates that received frames passed to the stack include * the FCS at the end. * * @IEEE80211_HW_HOST_BROADCAST_PS_BUFFERING: * Some wireless LAN chipsets buffer broadcast/multicast frames * for power saving stations in the hardware/firmware and others * rely on the host system for such buffering. This option is used * to configure the IEEE 802.11 upper layer to buffer broadcast and * multicast frames when there are power saving stations so that * the driver can fetch them with ieee80211_get_buffered_bc(). * * @IEEE80211_HW_SIGNAL_UNSPEC: * Hardware can provide signal values but we don't know its units. We * expect values between 0 and @max_signal. * If possible please provide dB or dBm instead. * * @IEEE80211_HW_SIGNAL_DBM: * Hardware gives signal values in dBm, decibel difference from * one milliwatt. This is the preferred method since it is standardized * between different devices. @max_signal does not need to be set. * * @IEEE80211_HW_SPECTRUM_MGMT: * Hardware supports spectrum management defined in 802.11h * Measurement, Channel Switch, Quieting, TPC * * @IEEE80211_HW_AMPDU_AGGREGATION: * Hardware supports 11n A-MPDU aggregation. * * @IEEE80211_HW_SUPPORTS_PS: * Hardware has power save support (i.e. can go to sleep). * * @IEEE80211_HW_PS_NULLFUNC_STACK: * Hardware requires nullfunc frame handling in stack, implies * stack support for dynamic PS. * * @IEEE80211_HW_SUPPORTS_DYNAMIC_PS: * Hardware has support for dynamic PS. * * @IEEE80211_HW_MFP_CAPABLE: * Hardware supports management frame protection (MFP, IEEE 802.11w). * * @IEEE80211_HW_REPORTS_TX_ACK_STATUS: * Hardware can provide ack status reports of Tx frames to * the stack. * * @IEEE80211_HW_CONNECTION_MONITOR: * The hardware performs its own connection monitoring, including * periodic keep-alives to the AP and probing the AP on beacon loss. * * @IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC: * This device needs to get data from beacon before association (i.e. * dtim_period). * * @IEEE80211_HW_SUPPORTS_PER_STA_GTK: The device's crypto engine supports * per-station GTKs as used by IBSS RSN or during fast transition. If * the device doesn't support per-station GTKs, but can be asked not * to decrypt group addressed frames, then IBSS RSN support is still * possible but software crypto will be used. Advertise the wiphy flag * only in that case. * * @IEEE80211_HW_AP_LINK_PS: When operating in AP mode the device * autonomously manages the PS status of connected stations. When * this flag is set mac80211 will not trigger PS mode for connected * stations based on the PM bit of incoming frames. * Use ieee80211_start_ps()/ieee8021_end_ps() to manually configure * the PS mode of connected stations. * * @IEEE80211_HW_TX_AMPDU_SETUP_IN_HW: The device handles TX A-MPDU session * setup strictly in HW. mac80211 should not attempt to do this in * software. * * @IEEE80211_HW_WANT_MONITOR_VIF: The driver would like to be informed of * a virtual monitor interface when monitor interfaces are the only * active interfaces. * * @IEEE80211_HW_NO_AUTO_VIF: The driver would like for no wlanX to * be created. It is expected user-space will create vifs as * desired (and thus have them named as desired). * * @IEEE80211_HW_SW_CRYPTO_CONTROL: The driver wants to control which of the * crypto algorithms can be done in software - so don't automatically * try to fall back to it if hardware crypto fails, but do so only if * the driver returns 1. This also forces the driver to advertise its * supported cipher suites. * * @IEEE80211_HW_SUPPORT_FAST_XMIT: The driver/hardware supports fast-xmit, * this currently requires only the ability to calculate the duration * for frames. * * @IEEE80211_HW_QUEUE_CONTROL: The driver wants to control per-interface * queue mapping in order to use different queues (not just one per AC) * for different virtual interfaces. See the doc section on HW queue * control for more details. * * @IEEE80211_HW_SUPPORTS_RC_TABLE: The driver supports using a rate * selection table provided by the rate control algorithm. * * @IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF: Use the P2P Device address for any * P2P Interface. This will be honoured even if more than one interface * is supported. * * @IEEE80211_HW_TIMING_BEACON_ONLY: Use sync timing from beacon frames * only, to allow getting TBTT of a DTIM beacon. * * @IEEE80211_HW_SUPPORTS_HT_CCK_RATES: Hardware supports mixing HT/CCK rates * and can cope with CCK rates in an aggregation session (e.g. by not * using aggregation for such frames.) * * @IEEE80211_HW_CHANCTX_STA_CSA: Support 802.11h based channel-switch (CSA) * for a single active channel while using channel contexts. When support * is not enabled the default action is to disconnect when getting the * CSA frame. * * @IEEE80211_HW_SUPPORTS_CLONED_SKBS: The driver will never modify the payload * or tailroom of TX skbs without copying them first. * * @IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS: The HW supports scanning on all bands * in one command, mac80211 doesn't have to run separate scans per band. * * @IEEE80211_HW_TDLS_WIDER_BW: The device/driver supports wider bandwidth * than then BSS bandwidth for a TDLS link on the base channel. * * @IEEE80211_HW_SUPPORTS_AMSDU_IN_AMPDU: The driver supports receiving A-MSDUs * within A-MPDU. * * @IEEE80211_HW_BEACON_TX_STATUS: The device/driver provides TX status * for sent beacons. * * @IEEE80211_HW_NEEDS_UNIQUE_STA_ADDR: Hardware (or driver) requires that each * station has a unique address, i.e. each station entry can be identified * by just its MAC address; this prevents, for example, the same station * from connecting to two virtual AP interfaces at the same time. * * @IEEE80211_HW_SUPPORTS_REORDERING_BUFFER: Hardware (or driver) manages the * reordering buffer internally, guaranteeing mac80211 receives frames in * order and does not need to manage its own reorder buffer or BA session * timeout. * * @IEEE80211_HW_USES_RSS: The device uses RSS and thus requires parallel RX, * which implies using per-CPU station statistics. * * @IEEE80211_HW_TX_AMSDU: Hardware (or driver) supports software aggregated * A-MSDU frames. Requires software tx queueing and fast-xmit support. * When not using minstrel/minstrel_ht rate control, the driver must * limit the maximum A-MSDU size based on the current tx rate by setting * max_rc_amsdu_len in struct ieee80211_sta. * * @IEEE80211_HW_TX_FRAG_LIST: Hardware (or driver) supports sending frag_list * skbs, needed for zero-copy software A-MSDU. * * @IEEE80211_HW_REPORTS_LOW_ACK: The driver (or firmware) reports low ack event * by ieee80211_report_low_ack() based on its own algorithm. For such * drivers, mac80211 packet loss mechanism will not be triggered and driver * is completely depending on firmware event for station kickout. * * @IEEE80211_HW_SUPPORTS_TX_FRAG: Hardware does fragmentation by itself. * The stack will not do fragmentation. * The callback for @set_frag_threshold should be set as well. * * @IEEE80211_HW_SUPPORTS_TDLS_BUFFER_STA: Hardware supports buffer STA on * TDLS links. * * @IEEE80211_HW_DEAUTH_NEED_MGD_TX_PREP: The driver requires the * mgd_prepare_tx() callback to be called before transmission of a * deauthentication frame in case the association was completed but no * beacon was heard. This is required in multi-channel scenarios, where the * virtual interface might not be given air time for the transmission of * the frame, as it is not synced with the AP/P2P GO yet, and thus the * deauthentication frame might not be transmitted. * * @IEEE80211_HW_DOESNT_SUPPORT_QOS_NDP: The driver (or firmware) doesn't * support QoS NDP for AP probing - that's most likely a driver bug. * * @NUM_IEEE80211_HW_FLAGS: number of hardware flags, used for sizing arrays */ enum ieee80211_hw_flags { IEEE80211_HW_HAS_RATE_CONTROL, IEEE80211_HW_RX_INCLUDES_FCS, IEEE80211_HW_HOST_BROADCAST_PS_BUFFERING, IEEE80211_HW_SIGNAL_UNSPEC, IEEE80211_HW_SIGNAL_DBM, IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC, IEEE80211_HW_SPECTRUM_MGMT, IEEE80211_HW_AMPDU_AGGREGATION, IEEE80211_HW_SUPPORTS_PS, IEEE80211_HW_PS_NULLFUNC_STACK, IEEE80211_HW_SUPPORTS_DYNAMIC_PS, IEEE80211_HW_MFP_CAPABLE, IEEE80211_HW_WANT_MONITOR_VIF, IEEE80211_HW_NO_AUTO_VIF, IEEE80211_HW_SW_CRYPTO_CONTROL, IEEE80211_HW_SUPPORT_FAST_XMIT, IEEE80211_HW_REPORTS_TX_ACK_STATUS, IEEE80211_HW_CONNECTION_MONITOR, IEEE80211_HW_QUEUE_CONTROL, IEEE80211_HW_SUPPORTS_PER_STA_GTK, IEEE80211_HW_AP_LINK_PS, IEEE80211_HW_TX_AMPDU_SETUP_IN_HW, IEEE80211_HW_SUPPORTS_RC_TABLE, IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF, IEEE80211_HW_TIMING_BEACON_ONLY, IEEE80211_HW_SUPPORTS_HT_CCK_RATES, IEEE80211_HW_CHANCTX_STA_CSA, IEEE80211_HW_SUPPORTS_CLONED_SKBS, IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS, IEEE80211_HW_TDLS_WIDER_BW, IEEE80211_HW_SUPPORTS_AMSDU_IN_AMPDU, IEEE80211_HW_BEACON_TX_STATUS, IEEE80211_HW_NEEDS_UNIQUE_STA_ADDR, IEEE80211_HW_SUPPORTS_REORDERING_BUFFER, IEEE80211_HW_USES_RSS, IEEE80211_HW_TX_AMSDU, IEEE80211_HW_TX_FRAG_LIST, IEEE80211_HW_REPORTS_LOW_ACK, IEEE80211_HW_SUPPORTS_TX_FRAG, IEEE80211_HW_SUPPORTS_TDLS_BUFFER_STA, IEEE80211_HW_DEAUTH_NEED_MGD_TX_PREP, IEEE80211_HW_DOESNT_SUPPORT_QOS_NDP, /* keep last, obviously */ NUM_IEEE80211_HW_FLAGS }; /** * struct ieee80211_hw - hardware information and state * * This structure contains the configuration and hardware * information for an 802.11 PHY. * * @wiphy: This points to the &struct wiphy allocated for this * 802.11 PHY. You must fill in the @perm_addr and @dev * members of this structure using SET_IEEE80211_DEV() * and SET_IEEE80211_PERM_ADDR(). Additionally, all supported * bands (with channels, bitrates) are registered here. * * @conf: &struct ieee80211_conf, device configuration, don't use. * * @priv: pointer to private area that was allocated for driver use * along with this structure. * * @flags: hardware flags, see &enum ieee80211_hw_flags. * * @extra_tx_headroom: headroom to reserve in each transmit skb * for use by the driver (e.g. for transmit headers.) * * @extra_beacon_tailroom: tailroom to reserve in each beacon tx skb. * Can be used by drivers to add extra IEs. * * @max_signal: Maximum value for signal (rssi) in RX information, used * only when @IEEE80211_HW_SIGNAL_UNSPEC or @IEEE80211_HW_SIGNAL_DB * * @max_listen_interval: max listen interval in units of beacon interval * that HW supports * * @queues: number of available hardware transmit queues for * data packets. WMM/QoS requires at least four, these * queues need to have configurable access parameters. * * @rate_control_algorithm: rate control algorithm for this hardware. * If unset (NULL), the default algorithm will be used. Must be * set before calling ieee80211_register_hw(). * * @vif_data_size: size (in bytes) of the drv_priv data area * within &struct ieee80211_vif. * @sta_data_size: size (in bytes) of the drv_priv data area * within &struct ieee80211_sta. * @chanctx_data_size: size (in bytes) of the drv_priv data area * within &struct ieee80211_chanctx_conf. * @txq_data_size: size (in bytes) of the drv_priv data area * within @struct ieee80211_txq. * * @max_rates: maximum number of alternate rate retry stages the hw * can handle. * @max_report_rates: maximum number of alternate rate retry stages * the hw can report back. * @max_rate_tries: maximum number of tries for each stage * * @max_rx_aggregation_subframes: maximum buffer size (number of * sub-frames) to be used for A-MPDU block ack receiver * aggregation. * This is only relevant if the device has restrictions on the * number of subframes, if it relies on mac80211 to do reordering * it shouldn't be set. * * @max_tx_aggregation_subframes: maximum number of subframes in an * aggregate an HT/HE device will transmit. In HT AddBA we'll * advertise a constant value of 64 as some older APs crash if * the window size is smaller (an example is LinkSys WRT120N * with FW v1.0.07 build 002 Jun 18 2012). * For AddBA to HE capable peers this value will be used. * * @max_tx_fragments: maximum number of tx buffers per (A)-MSDU, sum * of 1 + skb_shinfo(skb)->nr_frags for each skb in the frag_list. * * @offchannel_tx_hw_queue: HW queue ID to use for offchannel TX * (if %IEEE80211_HW_QUEUE_CONTROL is set) * * @radiotap_mcs_details: lists which MCS information can the HW * reports, by default it is set to _MCS, _GI and _BW but doesn't * include _FMT. Use %IEEE80211_RADIOTAP_MCS_HAVE_\* values, only * adding _BW is supported today. * * @radiotap_vht_details: lists which VHT MCS information the HW reports, * the default is _GI | _BANDWIDTH. * Use the %IEEE80211_RADIOTAP_VHT_KNOWN_\* values. * * @radiotap_he: HE radiotap validity flags * * @radiotap_timestamp: Information for the radiotap timestamp field; if the * 'units_pos' member is set to a non-negative value it must be set to * a combination of a IEEE80211_RADIOTAP_TIMESTAMP_UNIT_* and a * IEEE80211_RADIOTAP_TIMESTAMP_SPOS_* value, and then the timestamp * field will be added and populated from the &struct ieee80211_rx_status * device_timestamp. If the 'accuracy' member is non-negative, it's put * into the accuracy radiotap field and the accuracy known flag is set. * * @netdev_features: netdev features to be set in each netdev created * from this HW. Note that not all features are usable with mac80211, * other features will be rejected during HW registration. * * @uapsd_queues: This bitmap is included in (re)association frame to indicate * for each access category if it is uAPSD trigger-enabled and delivery- * enabled. Use IEEE80211_WMM_IE_STA_QOSINFO_AC_* to set this bitmap. * Each bit corresponds to different AC. Value '1' in specific bit means * that corresponding AC is both trigger- and delivery-enabled. '0' means * neither enabled. * * @uapsd_max_sp_len: maximum number of total buffered frames the WMM AP may * deliver to a WMM STA during any Service Period triggered by the WMM STA. * Use IEEE80211_WMM_IE_STA_QOSINFO_SP_* for correct values. * * @n_cipher_schemes: a size of an array of cipher schemes definitions. * @cipher_schemes: a pointer to an array of cipher scheme definitions * supported by HW. * @max_nan_de_entries: maximum number of NAN DE functions supported by the * device. */ struct ieee80211_hw { struct ieee80211_conf conf; struct wiphy *wiphy; const char *rate_control_algorithm; void *priv; unsigned long flags[BITS_TO_LONGS(NUM_IEEE80211_HW_FLAGS)]; unsigned int extra_tx_headroom; unsigned int extra_beacon_tailroom; int vif_data_size; int sta_data_size; int chanctx_data_size; int txq_data_size; u16 queues; u16 max_listen_interval; s8 max_signal; u8 max_rates; u8 max_report_rates; u8 max_rate_tries; u16 max_rx_aggregation_subframes; u16 max_tx_aggregation_subframes; u8 max_tx_fragments; u8 offchannel_tx_hw_queue; u8 radiotap_mcs_details; u16 radiotap_vht_details; struct { int units_pos; s16 accuracy; } radiotap_timestamp; netdev_features_t netdev_features; u8 uapsd_queues; u8 uapsd_max_sp_len; u8 n_cipher_schemes; const struct ieee80211_cipher_scheme *cipher_schemes; u8 max_nan_de_entries; }; static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw, enum ieee80211_hw_flags flg) { return test_bit(flg, hw->flags); } #define ieee80211_hw_check(hw, flg) _ieee80211_hw_check(hw, IEEE80211_HW_##flg) static inline void _ieee80211_hw_set(struct ieee80211_hw *hw, enum ieee80211_hw_flags flg) { return __set_bit(flg, hw->flags); } #define ieee80211_hw_set(hw, flg) _ieee80211_hw_set(hw, IEEE80211_HW_##flg) /** * struct ieee80211_scan_request - hw scan request * * @ies: pointers different parts of IEs (in req.ie) * @req: cfg80211 request. */ struct ieee80211_scan_request { struct ieee80211_scan_ies ies; /* Keep last */ struct cfg80211_scan_request req; }; /** * struct ieee80211_tdls_ch_sw_params - TDLS channel switch parameters * * @sta: peer this TDLS channel-switch request/response came from * @chandef: channel referenced in a TDLS channel-switch request * @action_code: see &enum ieee80211_tdls_actioncode * @status: channel-switch response status * @timestamp: time at which the frame was received * @switch_time: switch-timing parameter received in the frame * @switch_timeout: switch-timing parameter received in the frame * @tmpl_skb: TDLS switch-channel response template * @ch_sw_tm_ie: offset of the channel-switch timing IE inside @tmpl_skb */ struct ieee80211_tdls_ch_sw_params { struct ieee80211_sta *sta; struct cfg80211_chan_def *chandef; u8 action_code; u32 status; u32 timestamp; u16 switch_time; u16 switch_timeout; struct sk_buff *tmpl_skb; u32 ch_sw_tm_ie; }; /** * wiphy_to_ieee80211_hw - return a mac80211 driver hw struct from a wiphy * * @wiphy: the &struct wiphy which we want to query * * mac80211 drivers can use this to get to their respective * &struct ieee80211_hw. Drivers wishing to get to their own private * structure can then access it via hw->priv. Note that mac802111 drivers should * not use wiphy_priv() to try to get their private driver structure as this * is already used internally by mac80211. * * Return: The mac80211 driver hw struct of @wiphy. */ struct ieee80211_hw *wiphy_to_ieee80211_hw(struct wiphy *wiphy); /** * SET_IEEE80211_DEV - set device for 802.11 hardware * * @hw: the &struct ieee80211_hw to set the device for * @dev: the &struct device of this 802.11 device */ static inline void SET_IEEE80211_DEV(struct ieee80211_hw *hw, struct device *dev) { set_wiphy_dev(hw->wiphy, dev); } /** * SET_IEEE80211_PERM_ADDR - set the permanent MAC address for 802.11 hardware * * @hw: the &struct ieee80211_hw to set the MAC address for * @addr: the address to set */ static inline void SET_IEEE80211_PERM_ADDR(struct ieee80211_hw *hw, const u8 *addr) { memcpy(hw->wiphy->perm_addr, addr, ETH_ALEN); } static inline struct ieee80211_rate * ieee80211_get_tx_rate(const struct ieee80211_hw *hw, const struct ieee80211_tx_info *c) { if (WARN_ON_ONCE(c->control.rates[0].idx < 0)) return NULL; return &hw->wiphy->bands[c->band]->bitrates[c->control.rates[0].idx]; } static inline struct ieee80211_rate * ieee80211_get_rts_cts_rate(const struct ieee80211_hw *hw, const struct ieee80211_tx_info *c) { if (c->control.rts_cts_rate_idx < 0) return NULL; return &hw->wiphy->bands[c->band]->bitrates[c->control.rts_cts_rate_idx]; } static inline struct ieee80211_rate * ieee80211_get_alt_retry_rate(const struct ieee80211_hw *hw, const struct ieee80211_tx_info *c, int idx) { if (c->control.rates[idx + 1].idx < 0) return NULL; return &hw->wiphy->bands[c->band]->bitrates[c->control.rates[idx + 1].idx]; } /** * ieee80211_free_txskb - free TX skb * @hw: the hardware * @skb: the skb * * Free a transmit skb. Use this funtion when some failure * to transmit happened and thus status cannot be reported. */ void ieee80211_free_txskb(struct ieee80211_hw *hw, struct sk_buff *skb); /** * DOC: Hardware crypto acceleration * * mac80211 is capable of taking advantage of many hardware * acceleration designs for encryption and decryption operations. * * The set_key() callback in the &struct ieee80211_ops for a given * device is called to enable hardware acceleration of encryption and * decryption. The callback takes a @sta parameter that will be NULL * for default keys or keys used for transmission only, or point to * the station information for the peer for individual keys. * Multiple transmission keys with the same key index may be used when * VLANs are configured for an access point. * * When transmitting, the TX control data will use the @hw_key_idx * selected by the driver by modifying the &struct ieee80211_key_conf * pointed to by the @key parameter to the set_key() function. * * The set_key() call for the %SET_KEY command should return 0 if * the key is now in use, -%EOPNOTSUPP or -%ENOSPC if it couldn't be * added; if you return 0 then hw_key_idx must be assigned to the * hardware key index, you are free to use the full u8 range. * * Note that in the case that the @IEEE80211_HW_SW_CRYPTO_CONTROL flag is * set, mac80211 will not automatically fall back to software crypto if * enabling hardware crypto failed. The set_key() call may also return the * value 1 to permit this specific key/algorithm to be done in software. * * When the cmd is %DISABLE_KEY then it must succeed. * * Note that it is permissible to not decrypt a frame even if a key * for it has been uploaded to hardware, the stack will not make any * decision based on whether a key has been uploaded or not but rather * based on the receive flags. * * The &struct ieee80211_key_conf structure pointed to by the @key * parameter is guaranteed to be valid until another call to set_key() * removes it, but it can only be used as a cookie to differentiate * keys. * * In TKIP some HW need to be provided a phase 1 key, for RX decryption * acceleration (i.e. iwlwifi). Those drivers should provide update_tkip_key * handler. * The update_tkip_key() call updates the driver with the new phase 1 key. * This happens every time the iv16 wraps around (every 65536 packets). The * set_key() call will happen only once for each key (unless the AP did * rekeying), it will not include a valid phase 1 key. The valid phase 1 key is * provided by update_tkip_key only. The trigger that makes mac80211 call this * handler is software decryption with wrap around of iv16. * * The set_default_unicast_key() call updates the default WEP key index * configured to the hardware for WEP encryption type. This is required * for devices that support offload of data packets (e.g. ARP responses). */ /** * DOC: Powersave support * * mac80211 has support for various powersave implementations. * * First, it can support hardware that handles all powersaving by itself, * such hardware should simply set the %IEEE80211_HW_SUPPORTS_PS hardware * flag. In that case, it will be told about the desired powersave mode * with the %IEEE80211_CONF_PS flag depending on the association status. * The hardware must take care of sending nullfunc frames when necessary, * i.e. when entering and leaving powersave mode. The hardware is required * to look at the AID in beacons and signal to the AP that it woke up when * it finds traffic directed to it. * * %IEEE80211_CONF_PS flag enabled means that the powersave mode defined in * IEEE 802.11-2007 section 11.2 is enabled. This is not to be confused * with hardware wakeup and sleep states. Driver is responsible for waking * up the hardware before issuing commands to the hardware and putting it * back to sleep at appropriate times. * * When PS is enabled, hardware needs to wakeup for beacons and receive the * buffered multicast/broadcast frames after the beacon. Also it must be * possible to send frames and receive the acknowledment frame. * * Other hardware designs cannot send nullfunc frames by themselves and also * need software support for parsing the TIM bitmap. This is also supported * by mac80211 by combining the %IEEE80211_HW_SUPPORTS_PS and * %IEEE80211_HW_PS_NULLFUNC_STACK flags. The hardware is of course still * required to pass up beacons. The hardware is still required to handle * waking up for multicast traffic; if it cannot the driver must handle that * as best as it can, mac80211 is too slow to do that. * * Dynamic powersave is an extension to normal powersave in which the * hardware stays awake for a user-specified period of time after sending a * frame so that reply frames need not be buffered and therefore delayed to * the next wakeup. It's compromise of getting good enough latency when * there's data traffic and still saving significantly power in idle * periods. * * Dynamic powersave is simply supported by mac80211 enabling and disabling * PS based on traffic. Driver needs to only set %IEEE80211_HW_SUPPORTS_PS * flag and mac80211 will handle everything automatically. Additionally, * hardware having support for the dynamic PS feature may set the * %IEEE80211_HW_SUPPORTS_DYNAMIC_PS flag to indicate that it can support * dynamic PS mode itself. The driver needs to look at the * @dynamic_ps_timeout hardware configuration value and use it that value * whenever %IEEE80211_CONF_PS is set. In this case mac80211 will disable * dynamic PS feature in stack and will just keep %IEEE80211_CONF_PS * enabled whenever user has enabled powersave. * * Driver informs U-APSD client support by enabling * %IEEE80211_VIF_SUPPORTS_UAPSD flag. The mode is configured through the * uapsd parameter in conf_tx() operation. Hardware needs to send the QoS * Nullfunc frames and stay awake until the service period has ended. To * utilize U-APSD, dynamic powersave is disabled for voip AC and all frames * from that AC are transmitted with powersave enabled. * * Note: U-APSD client mode is not yet supported with * %IEEE80211_HW_PS_NULLFUNC_STACK. */ /** * DOC: Beacon filter support * * Some hardware have beacon filter support to reduce host cpu wakeups * which will reduce system power consumption. It usually works so that * the firmware creates a checksum of the beacon but omits all constantly * changing elements (TSF, TIM etc). Whenever the checksum changes the * beacon is forwarded to the host, otherwise it will be just dropped. That * way the host will only receive beacons where some relevant information * (for example ERP protection or WMM settings) have changed. * * Beacon filter support is advertised with the %IEEE80211_VIF_BEACON_FILTER * interface capability. The driver needs to enable beacon filter support * whenever power save is enabled, that is %IEEE80211_CONF_PS is set. When * power save is enabled, the stack will not check for beacon loss and the * driver needs to notify about loss of beacons with ieee80211_beacon_loss(). * * The time (or number of beacons missed) until the firmware notifies the * driver of a beacon loss event (which in turn causes the driver to call * ieee80211_beacon_loss()) should be configurable and will be controlled * by mac80211 and the roaming algorithm in the future. * * Since there may be constantly changing information elements that nothing * in the software stack cares about, we will, in the future, have mac80211 * tell the driver which information elements are interesting in the sense * that we want to see changes in them. This will include * * - a list of information element IDs * - a list of OUIs for the vendor information element * * Ideally, the hardware would filter out any beacons without changes in the * requested elements, but if it cannot support that it may, at the expense * of some efficiency, filter out only a subset. For example, if the device * doesn't support checking for OUIs it should pass up all changes in all * vendor information elements. * * Note that change, for the sake of simplification, also includes information * elements appearing or disappearing from the beacon. * * Some hardware supports an "ignore list" instead, just make sure nothing * that was requested is on the ignore list, and include commonly changing * information element IDs in the ignore list, for example 11 (BSS load) and * the various vendor-assigned IEs with unknown contents (128, 129, 133-136, * 149, 150, 155, 156, 173, 176, 178, 179, 219); for forward compatibility * it could also include some currently unused IDs. * * * In addition to these capabilities, hardware should support notifying the * host of changes in the beacon RSSI. This is relevant to implement roaming * when no traffic is flowing (when traffic is flowing we see the RSSI of * the received data packets). This can consist in notifying the host when * the RSSI changes significantly or when it drops below or rises above * configurable thresholds. In the future these thresholds will also be * configured by mac80211 (which gets them from userspace) to implement * them as the roaming algorithm requires. * * If the hardware cannot implement this, the driver should ask it to * periodically pass beacon frames to the host so that software can do the * signal strength threshold checking. */ /** * DOC: Spatial multiplexing power save * * SMPS (Spatial multiplexing power save) is a mechanism to conserve * power in an 802.11n implementation. For details on the mechanism * and rationale, please refer to 802.11 (as amended by 802.11n-2009) * "11.2.3 SM power save". * * The mac80211 implementation is capable of sending action frames * to update the AP about the station's SMPS mode, and will instruct * the driver to enter the specific mode. It will also announce the * requested SMPS mode during the association handshake. Hardware * support for this feature is required, and can be indicated by * hardware flags. * * The default mode will be "automatic", which nl80211/cfg80211 * defines to be dynamic SMPS in (regular) powersave, and SMPS * turned off otherwise. * * To support this feature, the driver must set the appropriate * hardware support flags, and handle the SMPS flag to the config() * operation. It will then with this mechanism be instructed to * enter the requested SMPS mode while associated to an HT AP. */ /** * DOC: Frame filtering * * mac80211 requires to see many management frames for proper * operation, and users may want to see many more frames when * in monitor mode. However, for best CPU usage and power consumption, * having as few frames as possible percolate through the stack is * desirable. Hence, the hardware should filter as much as possible. * * To achieve this, mac80211 uses filter flags (see below) to tell * the driver's configure_filter() function which frames should be * passed to mac80211 and which should be filtered out. * * Before configure_filter() is invoked, the prepare_multicast() * callback is invoked with the parameters @mc_count and @mc_list * for the combined multicast address list of all virtual interfaces. * It's use is optional, and it returns a u64 that is passed to * configure_filter(). Additionally, configure_filter() has the * arguments @changed_flags telling which flags were changed and * @total_flags with the new flag states. * * If your device has no multicast address filters your driver will * need to check both the %FIF_ALLMULTI flag and the @mc_count * parameter to see whether multicast frames should be accepted * or dropped. * * All unsupported flags in @total_flags must be cleared. * Hardware does not support a flag if it is incapable of _passing_ * the frame to the stack. Otherwise the driver must ignore * the flag, but not clear it. * You must _only_ clear the flag (announce no support for the * flag to mac80211) if you are not able to pass the packet type * to the stack (so the hardware always filters it). * So for example, you should clear @FIF_CONTROL, if your hardware * always filters control frames. If your hardware always passes * control frames to the kernel and is incapable of filtering them, * you do _not_ clear the @FIF_CONTROL flag. * This rule applies to all other FIF flags as well. */ /** * DOC: AP support for powersaving clients * * In order to implement AP and P2P GO modes, mac80211 has support for * client powersaving, both "legacy" PS (PS-Poll/null data) and uAPSD. * There currently is no support for sAPSD. * * There is one assumption that mac80211 makes, namely that a client * will not poll with PS-Poll and trigger with uAPSD at the same time. * Both are supported, and both can be used by the same client, but * they can't be used concurrently by the same client. This simplifies * the driver code. * * The first thing to keep in mind is that there is a flag for complete * driver implementation: %IEEE80211_HW_AP_LINK_PS. If this flag is set, * mac80211 expects the driver to handle most of the state machine for * powersaving clients and will ignore the PM bit in incoming frames. * Drivers then use ieee80211_sta_ps_transition() to inform mac80211 of * stations' powersave transitions. In this mode, mac80211 also doesn't * handle PS-Poll/uAPSD. * * In the mode without %IEEE80211_HW_AP_LINK_PS, mac80211 will check the * PM bit in incoming frames for client powersave transitions. When a * station goes to sleep, we will stop transmitting to it. There is, * however, a race condition: a station might go to sleep while there is * data buffered on hardware queues. If the device has support for this * it will reject frames, and the driver should give the frames back to * mac80211 with the %IEEE80211_TX_STAT_TX_FILTERED flag set which will * cause mac80211 to retry the frame when the station wakes up. The * driver is also notified of powersave transitions by calling its * @sta_notify callback. * * When the station is asleep, it has three choices: it can wake up, * it can PS-Poll, or it can possibly start a uAPSD service period. * Waking up is implemented by simply transmitting all buffered (and * filtered) frames to the station. This is the easiest case. When * the station sends a PS-Poll or a uAPSD trigger frame, mac80211 * will inform the driver of this with the @allow_buffered_frames * callback; this callback is optional. mac80211 will then transmit * the frames as usual and set the %IEEE80211_TX_CTL_NO_PS_BUFFER * on each frame. The last frame in the service period (or the only * response to a PS-Poll) also has %IEEE80211_TX_STATUS_EOSP set to * indicate that it ends the service period; as this frame must have * TX status report it also sets %IEEE80211_TX_CTL_REQ_TX_STATUS. * When TX status is reported for this frame, the service period is * marked has having ended and a new one can be started by the peer. * * Additionally, non-bufferable MMPDUs can also be transmitted by * mac80211 with the %IEEE80211_TX_CTL_NO_PS_BUFFER set in them. * * Another race condition can happen on some devices like iwlwifi * when there are frames queued for the station and it wakes up * or polls; the frames that are already queued could end up being * transmitted first instead, causing reordering and/or wrong * processing of the EOSP. The cause is that allowing frames to be * transmitted to a certain station is out-of-band communication to * the device. To allow this problem to be solved, the driver can * call ieee80211_sta_block_awake() if frames are buffered when it * is notified that the station went to sleep. When all these frames * have been filtered (see above), it must call the function again * to indicate that the station is no longer blocked. * * If the driver buffers frames in the driver for aggregation in any * way, it must use the ieee80211_sta_set_buffered() call when it is * notified of the station going to sleep to inform mac80211 of any * TIDs that have frames buffered. Note that when a station wakes up * this information is reset (hence the requirement to call it when * informed of the station going to sleep). Then, when a service * period starts for any reason, @release_buffered_frames is called * with the number of frames to be released and which TIDs they are * to come from. In this case, the driver is responsible for setting * the EOSP (for uAPSD) and MORE_DATA bits in the released frames, * to help the @more_data parameter is passed to tell the driver if * there is more data on other TIDs -- the TIDs to release frames * from are ignored since mac80211 doesn't know how many frames the * buffers for those TIDs contain. * * If the driver also implement GO mode, where absence periods may * shorten service periods (or abort PS-Poll responses), it must * filter those response frames except in the case of frames that * are buffered in the driver -- those must remain buffered to avoid * reordering. Because it is possible that no frames are released * in this case, the driver must call ieee80211_sta_eosp() * to indicate to mac80211 that the service period ended anyway. * * Finally, if frames from multiple TIDs are released from mac80211 * but the driver might reorder them, it must clear & set the flags * appropriately (only the last frame may have %IEEE80211_TX_STATUS_EOSP) * and also take care of the EOSP and MORE_DATA bits in the frame. * The driver may also use ieee80211_sta_eosp() in this case. * * Note that if the driver ever buffers frames other than QoS-data * frames, it must take care to never send a non-QoS-data frame as * the last frame in a service period, adding a QoS-nulldata frame * after a non-QoS-data frame if needed. */ /** * DOC: HW queue control * * Before HW queue control was introduced, mac80211 only had a single static * assignment of per-interface AC software queues to hardware queues. This * was problematic for a few reasons: * 1) off-channel transmissions might get stuck behind other frames * 2) multiple virtual interfaces couldn't be handled correctly * 3) after-DTIM frames could get stuck behind other frames * * To solve this, hardware typically uses multiple different queues for all * the different usages, and this needs to be propagated into mac80211 so it * won't have the same problem with the software queues. * * Therefore, mac80211 now offers the %IEEE80211_HW_QUEUE_CONTROL capability * flag that tells it that the driver implements its own queue control. To do * so, the driver will set up the various queues in each &struct ieee80211_vif * and the offchannel queue in &struct ieee80211_hw. In response, mac80211 will * use those queue IDs in the hw_queue field of &struct ieee80211_tx_info and * if necessary will queue the frame on the right software queue that mirrors * the hardware queue. * Additionally, the driver has to then use these HW queue IDs for the queue * management functions (ieee80211_stop_queue() et al.) * * The driver is free to set up the queue mappings as needed, multiple virtual * interfaces may map to the same hardware queues if needed. The setup has to * happen during add_interface or change_interface callbacks. For example, a * driver supporting station+station and station+AP modes might decide to have * 10 hardware queues to handle different scenarios: * * 4 AC HW queues for 1st vif: 0, 1, 2, 3 * 4 AC HW queues for 2nd vif: 4, 5, 6, 7 * after-DTIM queue for AP: 8 * off-channel queue: 9 * * It would then set up the hardware like this: * hw.offchannel_tx_hw_queue = 9 * * and the first virtual interface that is added as follows: * vif.hw_queue[IEEE80211_AC_VO] = 0 * vif.hw_queue[IEEE80211_AC_VI] = 1 * vif.hw_queue[IEEE80211_AC_BE] = 2 * vif.hw_queue[IEEE80211_AC_BK] = 3 * vif.cab_queue = 8 // if AP mode, otherwise %IEEE80211_INVAL_HW_QUEUE * and the second virtual interface with 4-7. * * If queue 6 gets full, for example, mac80211 would only stop the second * virtual interface's BE queue since virtual interface queues are per AC. * * Note that the vif.cab_queue value should be set to %IEEE80211_INVAL_HW_QUEUE * whenever the queue is not used (i.e. the interface is not in AP mode) if the * queue could potentially be shared since mac80211 will look at cab_queue when * a queue is stopped/woken even if the interface is not in AP mode. */ /** * enum ieee80211_filter_flags - hardware filter flags * * These flags determine what the filter in hardware should be * programmed to let through and what should not be passed to the * stack. It is always safe to pass more frames than requested, * but this has negative impact on power consumption. * * @FIF_ALLMULTI: pass all multicast frames, this is used if requested * by the user or if the hardware is not capable of filtering by * multicast address. * * @FIF_FCSFAIL: pass frames with failed FCS (but you need to set the * %RX_FLAG_FAILED_FCS_CRC for them) * * @FIF_PLCPFAIL: pass frames with failed PLCP CRC (but you need to set * the %RX_FLAG_FAILED_PLCP_CRC for them * * @FIF_BCN_PRBRESP_PROMISC: This flag is set during scanning to indicate * to the hardware that it should not filter beacons or probe responses * by BSSID. Filtering them can greatly reduce the amount of processing * mac80211 needs to do and the amount of CPU wakeups, so you should * honour this flag if possible. * * @FIF_CONTROL: pass control frames (except for PS Poll) addressed to this * station * * @FIF_OTHER_BSS: pass frames destined to other BSSes * * @FIF_PSPOLL: pass PS Poll frames * * @FIF_PROBE_REQ: pass probe request frames */ enum ieee80211_filter_flags { FIF_ALLMULTI = 1<<1, FIF_FCSFAIL = 1<<2, FIF_PLCPFAIL = 1<<3, FIF_BCN_PRBRESP_PROMISC = 1<<4, FIF_CONTROL = 1<<5, FIF_OTHER_BSS = 1<<6, FIF_PSPOLL = 1<<7, FIF_PROBE_REQ = 1<<8, }; /** * enum ieee80211_ampdu_mlme_action - A-MPDU actions * * These flags are used with the ampdu_action() callback in * &struct ieee80211_ops to indicate which action is needed. * * Note that drivers MUST be able to deal with a TX aggregation * session being stopped even before they OK'ed starting it by * calling ieee80211_start_tx_ba_cb_irqsafe, because the peer * might receive the addBA frame and send a delBA right away! * * @IEEE80211_AMPDU_RX_START: start RX aggregation * @IEEE80211_AMPDU_RX_STOP: stop RX aggregation * @IEEE80211_AMPDU_TX_START: start TX aggregation * @IEEE80211_AMPDU_TX_OPERATIONAL: TX aggregation has become operational * @IEEE80211_AMPDU_TX_STOP_CONT: stop TX aggregation but continue transmitting * queued packets, now unaggregated. After all packets are transmitted the * driver has to call ieee80211_stop_tx_ba_cb_irqsafe(). * @IEEE80211_AMPDU_TX_STOP_FLUSH: stop TX aggregation and flush all packets, * called when the station is removed. There's no need or reason to call * ieee80211_stop_tx_ba_cb_irqsafe() in this case as mac80211 assumes the * session is gone and removes the station. * @IEEE80211_AMPDU_TX_STOP_FLUSH_CONT: called when TX aggregation is stopped * but the driver hasn't called ieee80211_stop_tx_ba_cb_irqsafe() yet and * now the connection is dropped and the station will be removed. Drivers * should clean up and drop remaining packets when this is called. */ enum ieee80211_ampdu_mlme_action { IEEE80211_AMPDU_RX_START, IEEE80211_AMPDU_RX_STOP, IEEE80211_AMPDU_TX_START, IEEE80211_AMPDU_TX_STOP_CONT, IEEE80211_AMPDU_TX_STOP_FLUSH, IEEE80211_AMPDU_TX_STOP_FLUSH_CONT, IEEE80211_AMPDU_TX_OPERATIONAL, }; /** * struct ieee80211_ampdu_params - AMPDU action parameters * * @action: the ampdu action, value from %ieee80211_ampdu_mlme_action. * @sta: peer of this AMPDU session * @tid: tid of the BA session * @ssn: start sequence number of the session. TX/RX_STOP can pass 0. When * action is set to %IEEE80211_AMPDU_RX_START the driver passes back the * actual ssn value used to start the session and writes the value here. * @buf_size: reorder buffer size (number of subframes). Valid only when the * action is set to %IEEE80211_AMPDU_RX_START or * %IEEE80211_AMPDU_TX_OPERATIONAL * @amsdu: indicates the peer's ability to receive A-MSDU within A-MPDU. * valid when the action is set to %IEEE80211_AMPDU_TX_OPERATIONAL * @timeout: BA session timeout. Valid only when the action is set to * %IEEE80211_AMPDU_RX_START */ struct ieee80211_ampdu_params { enum ieee80211_ampdu_mlme_action action; struct ieee80211_sta *sta; u16 tid; u16 ssn; u16 buf_size; bool amsdu; u16 timeout; }; /** * enum ieee80211_frame_release_type - frame release reason * @IEEE80211_FRAME_RELEASE_PSPOLL: frame released for PS-Poll * @IEEE80211_FRAME_RELEASE_UAPSD: frame(s) released due to * frame received on trigger-enabled AC */ enum ieee80211_frame_release_type { IEEE80211_FRAME_RELEASE_PSPOLL, IEEE80211_FRAME_RELEASE_UAPSD, }; /** * enum ieee80211_rate_control_changed - flags to indicate what changed * * @IEEE80211_RC_BW_CHANGED: The bandwidth that can be used to transmit * to this station changed. The actual bandwidth is in the station * information -- for HT20/40 the IEEE80211_HT_CAP_SUP_WIDTH_20_40 * flag changes, for HT and VHT the bandwidth field changes. * @IEEE80211_RC_SMPS_CHANGED: The SMPS state of the station changed. * @IEEE80211_RC_SUPP_RATES_CHANGED: The supported rate set of this peer * changed (in IBSS mode) due to discovering more information about * the peer. * @IEEE80211_RC_NSS_CHANGED: N_SS (number of spatial streams) was changed * by the peer */ enum ieee80211_rate_control_changed { IEEE80211_RC_BW_CHANGED = BIT(0), IEEE80211_RC_SMPS_CHANGED = BIT(1), IEEE80211_RC_SUPP_RATES_CHANGED = BIT(2), IEEE80211_RC_NSS_CHANGED = BIT(3), }; /** * enum ieee80211_roc_type - remain on channel type * * With the support for multi channel contexts and multi channel operations, * remain on channel operations might be limited/deferred/aborted by other * flows/operations which have higher priority (and vise versa). * Specifying the ROC type can be used by devices to prioritize the ROC * operations compared to other operations/flows. * * @IEEE80211_ROC_TYPE_NORMAL: There are no special requirements for this ROC. * @IEEE80211_ROC_TYPE_MGMT_TX: The remain on channel request is required * for sending managment frames offchannel. */ enum ieee80211_roc_type { IEEE80211_ROC_TYPE_NORMAL = 0, IEEE80211_ROC_TYPE_MGMT_TX, }; /** * enum ieee80211_reconfig_complete_type - reconfig type * * This enum is used by the reconfig_complete() callback to indicate what * reconfiguration type was completed. * * @IEEE80211_RECONFIG_TYPE_RESTART: hw restart type * (also due to resume() callback returning 1) * @IEEE80211_RECONFIG_TYPE_SUSPEND: suspend type (regardless * of wowlan configuration) */ enum ieee80211_reconfig_type { IEEE80211_RECONFIG_TYPE_RESTART, IEEE80211_RECONFIG_TYPE_SUSPEND, }; /** * struct ieee80211_ops - callbacks from mac80211 to the driver * * This structure contains various callbacks that the driver may * handle or, in some cases, must handle, for example to configure * the hardware to a new channel or to transmit a frame. * * @tx: Handler that 802.11 module calls for each transmitted frame. * skb contains the buffer starting from the IEEE 802.11 header. * The low-level driver should send the frame out based on * configuration in the TX control data. This handler should, * preferably, never fail and stop queues appropriately. * Must be atomic. * * @start: Called before the first netdevice attached to the hardware * is enabled. This should turn on the hardware and must turn on * frame reception (for possibly enabled monitor interfaces.) * Returns negative error codes, these may be seen in userspace, * or zero. * When the device is started it should not have a MAC address * to avoid acknowledging frames before a non-monitor device * is added. * Must be implemented and can sleep. * * @stop: Called after last netdevice attached to the hardware * is disabled. This should turn off the hardware (at least * it must turn off frame reception.) * May be called right after add_interface if that rejects * an interface. If you added any work onto the mac80211 workqueue * you should ensure to cancel it on this callback. * Must be implemented and can sleep. * * @suspend: Suspend the device; mac80211 itself will quiesce before and * stop transmitting and doing any other configuration, and then * ask the device to suspend. This is only invoked when WoWLAN is * configured, otherwise the device is deconfigured completely and * reconfigured at resume time. * The driver may also impose special conditions under which it * wants to use the "normal" suspend (deconfigure), say if it only * supports WoWLAN when the device is associated. In this case, it * must return 1 from this function. * * @resume: If WoWLAN was configured, this indicates that mac80211 is * now resuming its operation, after this the device must be fully * functional again. If this returns an error, the only way out is * to also unregister the device. If it returns 1, then mac80211 * will also go through the regular complete restart on resume. * * @set_wakeup: Enable or disable wakeup when WoWLAN configuration is * modified. The reason is that device_set_wakeup_enable() is * supposed to be called when the configuration changes, not only * in suspend(). * * @add_interface: Called when a netdevice attached to the hardware is * enabled. Because it is not called for monitor mode devices, @start * and @stop must be implemented. * The driver should perform any initialization it needs before * the device can be enabled. The initial configuration for the * interface is given in the conf parameter. * The callback may refuse to add an interface by returning a * negative error code (which will be seen in userspace.) * Must be implemented and can sleep. * * @change_interface: Called when a netdevice changes type. This callback * is optional, but only if it is supported can interface types be * switched while the interface is UP. The callback may sleep. * Note that while an interface is being switched, it will not be * found by the interface iteration callbacks. * * @remove_interface: Notifies a driver that an interface is going down. * The @stop callback is called after this if it is the last interface * and no monitor interfaces are present. * When all interfaces are removed, the MAC address in the hardware * must be cleared so the device no longer acknowledges packets, * the mac_addr member of the conf structure is, however, set to the * MAC address of the device going away. * Hence, this callback must be implemented. It can sleep. * * @config: Handler for configuration requests. IEEE 802.11 code calls this * function to change hardware configuration, e.g., channel. * This function should never fail but returns a negative error code * if it does. The callback can sleep. * * @bss_info_changed: Handler for configuration requests related to BSS * parameters that may vary during BSS's lifespan, and may affect low * level driver (e.g. assoc/disassoc status, erp parameters). * This function should not be used if no BSS has been set, unless * for association indication. The @changed parameter indicates which * of the bss parameters has changed when a call is made. The callback * can sleep. * * @prepare_multicast: Prepare for multicast filter configuration. * This callback is optional, and its return value is passed * to configure_filter(). This callback must be atomic. * * @configure_filter: Configure the device's RX filter. * See the section "Frame filtering" for more information. * This callback must be implemented and can sleep. * * @config_iface_filter: Configure the interface's RX filter. * This callback is optional and is used to configure which frames * should be passed to mac80211. The filter_flags is the combination * of FIF_* flags. The changed_flags is a bit mask that indicates * which flags are changed. * This callback can sleep. * * @set_tim: Set TIM bit. mac80211 calls this function when a TIM bit * must be set or cleared for a given STA. Must be atomic. * * @set_key: See the section "Hardware crypto acceleration" * This callback is only called between add_interface and * remove_interface calls, i.e. while the given virtual interface * is enabled. * Returns a negative error code if the key can't be added. * The callback can sleep. * * @update_tkip_key: See the section "Hardware crypto acceleration" * This callback will be called in the context of Rx. Called for drivers * which set IEEE80211_KEY_FLAG_TKIP_REQ_RX_P1_KEY. * The callback must be atomic. * * @set_rekey_data: If the device supports GTK rekeying, for example while the * host is suspended, it can assign this callback to retrieve the data * necessary to do GTK rekeying, this is the KEK, KCK and replay counter. * After rekeying was done it should (for example during resume) notify * userspace of the new replay counter using ieee80211_gtk_rekey_notify(). * * @set_default_unicast_key: Set the default (unicast) key index, useful for * WEP when the device sends data packets autonomously, e.g. for ARP * offloading. The index can be 0-3, or -1 for unsetting it. * * @hw_scan: Ask the hardware to service the scan request, no need to start * the scan state machine in stack. The scan must honour the channel * configuration done by the regulatory agent in the wiphy's * registered bands. The hardware (or the driver