// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * fs/inotify_user.c - inotify support for userspace
 *
 * Authors:
 *	John McCutchan	<ttb@tentacle.dhs.org>
 *	Robert Love	<rml@novell.com>
 *
 * Copyright (C) 2005 John McCutchan
 * Copyright 2006 Hewlett-Packard Development Company, L.P.
 *
 * Copyright (C) 2009 Eric Paris <Red Hat Inc>
 * inotify was largely rewritten to make use of the fsnotify infrastructure
 */

#include <linux/dcache.h> /* d_unlinked */
#include <linux/fs.h> /* struct inode */
#include <linux/fsnotify_backend.h>
#include <linux/inotify.h>
#include <linux/path.h> /* struct path */
#include <linux/slab.h> /* kmem_* */
#include <linux/types.h>
#include <linux/sched.h>
#include <linux/sched/user.h>
#include <linux/sched/mm.h>

#include "inotify.h"

/*
 * Check if two events contain the same information.
 */
static bool event_compare(struct fsnotify_event *old_fsn,
			  struct fsnotify_event *new_fsn)
{
	struct inotify_event_info *old, *new;

	old = INOTIFY_E(old_fsn);
	new = INOTIFY_E(new_fsn);
	if (old->mask & FS_IN_IGNORED)
		return false;
	if ((old->mask == new->mask) &&
	    (old->wd == new->wd) &&
	    (old->name_len == new->name_len) &&
	    (!old->name_len || !strcmp(old->name, new->name)))
		return true;
	return false;
}

static int inotify_merge(struct fsnotify_group *group,
			 struct fsnotify_event *event)
{
	struct list_head *list = &group->notification_list;
	struct fsnotify_event *last_event;

	last_event = list_entry(list->prev, struct fsnotify_event, list);
	return event_compare(last_event, event);
}

int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
			       struct inode *inode, struct inode *dir,
			       const struct qstr *name, u32 cookie)
{
	struct inotify_inode_mark *i_mark;
	struct inotify_event_info *event;
	struct fsnotify_event *fsn_event;
	struct fsnotify_group *group = inode_mark->group;
	int ret;
	int len = 0, wd;
	int alloc_len = sizeof(struct inotify_event_info);
	struct mem_cgroup *old_memcg;

	if (name) {
		len = name->len;
		alloc_len += len + 1;
	}

	pr_debug("%s: group=%p mark=%p mask=%x\n", __func__, group, inode_mark,
		 mask);

	i_mark = container_of(inode_mark, struct inotify_inode_mark,
			      fsn_mark);

	/*
	 * We can be racing with mark being detached. Don't report event with
	 * invalid wd.
	 */
	wd = READ_ONCE(i_mark->wd);
	if (wd == -1)
		return 0;
	/*
	 * Whoever is interested in the event, pays for the allocation. Do not
	 * trigger OOM killer in the target monitoring memcg as it may have
	 * security repercussion.
	 */
	old_memcg = set_active_memcg(group->memcg);
	event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
	set_active_memcg(old_memcg);

	if (unlikely(!event)) {
		/*
		 * Treat lost event due to ENOMEM the same way as queue
		 * overflow to let userspace know event was lost.
		 */
		fsnotify_queue_overflow(group);
		return -ENOMEM;
	}

	/*
	 * We now report FS_ISDIR flag with MOVE_SELF and DELETE_SELF events
	 * for fanotify. inotify never reported IN_ISDIR with those events.
	 * It looks like an oversight, but to avoid the risk of breaking
	 * existing inotify programs, mask the flag out from those events.
	 */
	if (mask & (IN_MOVE_SELF | IN_DELETE_SELF))
		mask &= ~IN_ISDIR;

	fsn_event = &event->fse;
	fsnotify_init_event(fsn_event);
	event->mask = mask;
	event->wd = wd;
	event->sync_cookie = cookie;
	event->name_len = len;
	if (len)
		strscpy(event->name, name->name, event->name_len + 1);

	ret = fsnotify_add_event(group, fsn_event, inotify_merge);
	if (ret) {
		/* Our event wasn't used in the end. Free it. */
		fsnotify_destroy_event(group, fsn_event);
	}

	if (inode_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT)
		fsnotify_destroy_mark(inode_mark, group);

	return 0;
}

static void inotify_freeing_mark(struct fsnotify_mark *fsn_mark,
				 struct fsnotify_group *group)
{
	inotify_ignored_and_remove_idr(fsn_mark, group);
}

/*
 * This is NEVER supposed to be called. Inotify marks should either have been
 * removed from the idr when the watch was removed or in the
 * fsnotify_destroy_mark_by_group() call when the inotify instance was being
 * torn down. This is only called if the idr is about to be freed but there
 * are still marks in it.
 */
static int idr_callback(int id, void *p, void *data)
{
	struct fsnotify_mark *fsn_mark;
	struct inotify_inode_mark *i_mark;
	static bool warned = false;

	if (warned)
		return 0;

	warned = true;
	fsn_mark = p;
	i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark);

	WARN(1, "inotify closing but id=%d for fsn_mark=%p in group=%p still in "
		"idr. Probably leaking memory\n", id, p, data);

	/*
	 * I'm taking the liberty of assuming that the mark in question is a
	 * valid address and I'm dereferencing it. This might help to figure
	 * out why we got here and the panic is no worse than the original
	 * BUG() that was here.
	 */
	if (fsn_mark)
		printk(KERN_WARNING "fsn_mark->group=%p wd=%d\n",
			fsn_mark->group, i_mark->wd);
	return 0;
}

static void inotify_free_group_priv(struct fsnotify_group *group)
{
	/* ideally the idr is empty and we won't hit the BUG in the callback */
	idr_for_each(&group->inotify_data.idr, idr_callback, group);
	idr_destroy(&group->inotify_data.idr);
	if (group->inotify_data.ucounts)
		dec_inotify_instances(group->inotify_data.ucounts);
}

static void inotify_free_event(struct fsnotify_group *group,
			       struct fsnotify_event *fsn_event)
{
	kfree(INOTIFY_E(fsn_event));
}

/* ding dong the mark is dead */
static void inotify_free_mark(struct fsnotify_mark *fsn_mark)
{
	struct inotify_inode_mark *i_mark;

	i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark);

	kmem_cache_free(inotify_inode_mark_cachep, i_mark);
}

const struct fsnotify_ops inotify_fsnotify_ops = {
	.handle_inode_event = inotify_handle_inode_event,
	.free_group_priv = inotify_free_group_priv,
	.free_event = inotify_free_event,
	.freeing_mark = inotify_freeing_mark,
	.free_mark = inotify_free_mark,
};
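/*
 * A minimal sketch of the userspace side that consumes the events queued by
 * inotify_handle_inode_event() above, via the inotify(7) syscalls. The
 * watched path "/tmp" and the event mask are placeholders; error handling is
 * kept to the bare minimum.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
	/* Buffer aligned for struct inotify_event, as inotify(7) recommends. */
	char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
	const struct inotify_event *ev;
	ssize_t len;
	char *ptr;
	int fd, wd;

	fd = inotify_init1(0);
	if (fd < 0) {
		perror("inotify_init1");
		return EXIT_FAILURE;
	}

	wd = inotify_add_watch(fd, "/tmp", IN_CREATE | IN_DELETE | IN_MODIFY);
	if (wd < 0) {
		perror("inotify_add_watch");
		return EXIT_FAILURE;
	}

	/* A single read() returns one or more queued events. */
	len = read(fd, buf, sizeof(buf));
	if (len <= 0) {
		perror("read");
		return EXIT_FAILURE;
	}
	for (ptr = buf; ptr < buf + len; ptr += sizeof(*ev) + ev->len) {
		ev = (const struct inotify_event *)ptr;
		printf("wd=%d mask=%#x name=%s\n", ev->wd, ev->mask,
		       ev->len ? ev->name : "");
	}

	close(fd);
	return 0;
}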
// SPDX-License-Identifier: GPL-2.0
/*
 * uprobes-based tracing events
 *
 * Copyright (C) IBM Corporation, 2010-2012
 * Author:	Srikar Dronamraju <srikar@linux.vnet.ibm.com>
 */
#define pr_fmt(fmt)	"trace_uprobe: " fmt

#include <linux/bpf-cgroup.h>
#include <linux/security.h>
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/uaccess.h>
#include <linux/uprobes.h>
#include <linux/namei.h>
#include <linux/string.h>
#include <linux/rculist.h>
#include <linux/filter.h>
#include <linux/percpu.h>

#include "trace_dynevent.h"
#include "trace_probe.h"
#include "trace_probe_tmpl.h"

#define UPROBE_EVENT_SYSTEM	"uprobes"

struct uprobe_trace_entry_head {
	struct trace_entry	ent;
	unsigned long		vaddr[];
};

#define SIZEOF_TRACE_ENTRY(is_return)			\
	(sizeof(struct uprobe_trace_entry_head) +	\
	 sizeof(unsigned long) * (is_return ? 2 : 1))

#define DATAOF_TRACE_ENTRY(entry, is_return)		\
	((void*)(entry) + SIZEOF_TRACE_ENTRY(is_return))

static int trace_uprobe_create(const char *raw_command);
static int trace_uprobe_show(struct seq_file *m, struct dyn_event *ev);
static int trace_uprobe_release(struct dyn_event *ev);
static bool trace_uprobe_is_busy(struct dyn_event *ev);
static bool trace_uprobe_match(const char *system, const char *event,
			       int argc, const char **argv, struct dyn_event *ev);

static struct dyn_event_operations trace_uprobe_ops = {
	.create = trace_uprobe_create,
	.show = trace_uprobe_show,
	.is_busy = trace_uprobe_is_busy,
	.free = trace_uprobe_release,
	.match = trace_uprobe_match,
};

/*
 * uprobe event core functions
 */
struct trace_uprobe {
	struct dyn_event		devent;
	struct uprobe_consumer		consumer;
	struct path			path;
	char				*filename;
	struct uprobe			*uprobe;
	unsigned long			offset;
	unsigned long			ref_ctr_offset;
	unsigned long __percpu		*nhits;
	struct trace_probe		tp;
};

static bool is_trace_uprobe(struct dyn_event *ev)
{
	return ev->ops == &trace_uprobe_ops;
}

static struct trace_uprobe *to_trace_uprobe(struct dyn_event *ev)
{
	return container_of(ev, struct trace_uprobe, devent);
}

/**
 * for_each_trace_uprobe - iterate over the trace_uprobe list
 * @pos:	the struct trace_uprobe * for each entry
 * @dpos:	the struct dyn_event * to use as a loop cursor
 */
#define for_each_trace_uprobe(pos, dpos)	\
	for_each_dyn_event(dpos)		\
		if (is_trace_uprobe(dpos) && (pos = to_trace_uprobe(dpos)))

static int register_uprobe_event(struct trace_uprobe *tu);
static int unregister_uprobe_event(struct trace_uprobe *tu);

static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs,
			     __u64 *data);
static int uretprobe_dispatcher(struct uprobe_consumer *con,
				unsigned long func, struct pt_regs *regs,
				__u64 *data);

#ifdef CONFIG_STACK_GROWSUP
static unsigned long adjust_stack_addr(unsigned long addr, unsigned int n)
{
	return addr - (n * sizeof(long));
}
#else
static unsigned long adjust_stack_addr(unsigned long addr, unsigned int n)
{
	return addr + (n * sizeof(long));
}
#endif

static unsigned long get_user_stack_nth(struct pt_regs *regs, unsigned int n)
{
	unsigned long ret;
	unsigned long addr = user_stack_pointer(regs);

	addr = adjust_stack_addr(addr, n);

	if (copy_from_user(&ret, (void __force __user *) addr, sizeof(ret)))
		return 0;

	return ret;
}

/*
 * Uprobes-specific fetch functions
 */
static nokprobe_inline int
probe_mem_read(void *dest, void *src, size_t size)
{
	void __user *vaddr = (void __force __user *)src;

	return copy_from_user(dest, vaddr, size) ? -EFAULT : 0;
}

static nokprobe_inline int
probe_mem_read_user(void *dest, void *src, size_t size)
{
	return probe_mem_read(dest, src, size);
}

/*
 * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
 * length and relative data location.
 */
static nokprobe_inline int
fetch_store_string(unsigned long addr, void *dest, void *base)
{
	long ret;
	u32 loc = *(u32 *)dest;
	int maxlen = get_loc_len(loc);
	u8 *dst = get_loc_data(dest, base);
	void __user *src = (void __force __user *) addr;

	if (unlikely(!maxlen))
		return -ENOMEM;

	if (addr == FETCH_TOKEN_COMM)
		ret = strscpy(dst, current->comm, maxlen);
	else
		ret = strncpy_from_user(dst, src, maxlen);
	if (ret >= 0) {
		if (ret == maxlen)
			dst[ret - 1] = '\0';
		else
			/*
			 * Include the terminating null byte. In this case it
			 * was copied by strncpy_from_user but not accounted
			 * for in ret.
			 */
			ret++;
		*(u32 *)dest = make_data_loc(ret, (void *)dst - base);
	} else
		*(u32 *)dest = make_data_loc(0, (void *)dst - base);

	return ret;
}

static nokprobe_inline int
fetch_store_string_user(unsigned long addr, void *dest, void *base)
{
	return fetch_store_string(addr, dest, base);
}

/* Return the length of the string -- including the terminating null byte */
static nokprobe_inline int
fetch_store_strlen(unsigned long addr)
{
	int len;
	void __user *vaddr = (void __force __user *) addr;

	if (addr == FETCH_TOKEN_COMM)
		len = strlen(current->comm) + 1;
	else
		len = strnlen_user(vaddr, MAX_STRING_SIZE);

	return (len > MAX_STRING_SIZE) ? 0 : len;
}

static nokprobe_inline int
fetch_store_strlen_user(unsigned long addr)
{
	return fetch_store_strlen(addr);
}

static unsigned long translate_user_vaddr(unsigned long file_offset)
{
	unsigned long base_addr;
	struct uprobe_dispatch_data *udd;

	udd = (void *) current->utask->vaddr;

	base_addr = udd->bp_addr - udd->tu->offset;
	return base_addr + file_offset;
}

/* Note that we don't verify it, since the code does not come from user space */
static int
process_fetch_insn(struct fetch_insn *code, void *rec, void *edata,
		   void *dest, void *base)
{
	struct pt_regs *regs = rec;
	unsigned long val;
	int ret;

	/* 1st stage: get value from context */
	switch (code->op) {
	case FETCH_OP_REG:
		val = regs_get_register(regs, code->param);
		break;
	case FETCH_OP_STACK:
		val = get_user_stack_nth(regs, code->param);
		break;
	case FETCH_OP_STACKP:
		val = user_stack_pointer(regs);
		break;
	case FETCH_OP_RETVAL:
		val = regs_return_value(regs);
		break;
	case FETCH_OP_COMM:
		val = FETCH_TOKEN_COMM;
		break;
	case FETCH_OP_FOFFS:
		val = translate_user_vaddr(code->immediate);
		break;
	default:
		ret = process_common_fetch_insn(code, &val);
		if (ret < 0)
			return ret;
	}
	code++;

	return process_fetch_insn_bottom(code, val, dest, base);
}
NOKPROBE_SYMBOL(process_fetch_insn)

static inline void init_trace_uprobe_filter(struct trace_uprobe_filter *filter)
{
	rwlock_init(&filter->rwlock);
	filter->nr_systemwide = 0;
	INIT_LIST_HEAD(&filter->perf_events);
}

static inline bool uprobe_filter_is_empty(struct trace_uprobe_filter *filter)
{
	return !filter->nr_systemwide && list_empty(&filter->perf_events);
}

static inline bool is_ret_probe(struct trace_uprobe *tu)
{
	return tu->consumer.ret_handler != NULL;
}

static bool trace_uprobe_is_busy(struct dyn_event *ev)
{
	struct trace_uprobe *tu = to_trace_uprobe(ev);

	return trace_probe_is_enabled(&tu->tp);
}

static bool trace_uprobe_match_command_head(struct trace_uprobe *tu,
					    int argc, const char **argv)
{
	char buf[MAX_ARGSTR_LEN + 1];
	int len;

	if (!argc)
		return true;

	len = strlen(tu->filename);
	if (strncmp(tu->filename, argv[0], len) || argv[0][len] != ':')
		return false;

	if (tu->ref_ctr_offset == 0)
		snprintf(buf, sizeof(buf), "0x%0*lx",
			 (int)(sizeof(void *) * 2), tu->offset);
	else
		snprintf(buf, sizeof(buf), "0x%0*lx(0x%lx)",
			 (int)(sizeof(void *) * 2), tu->offset,
			 tu->ref_ctr_offset);
	if (strcmp(buf, &argv[0][len + 1]))
		return false;

	argc--; argv++;

	return trace_probe_match_command_args(&tu->tp, argc, argv);
}

static bool trace_uprobe_match(const char *system, const char *event,
			       int argc, const char **argv, struct dyn_event *ev)
{
	struct trace_uprobe *tu = to_trace_uprobe(ev);

	return (event[0] == '\0' ||
		strcmp(trace_probe_name(&tu->tp), event) == 0) &&
	       (!system || strcmp(trace_probe_group_name(&tu->tp), system) == 0) &&
	       trace_uprobe_match_command_head(tu, argc, argv);
}

static nokprobe_inline struct trace_uprobe *
trace_uprobe_primary_from_call(struct trace_event_call *call)
{
	struct trace_probe *tp;

	tp = trace_probe_primary_from_call(call);
	if (WARN_ON_ONCE(!tp))
		return NULL;

	return container_of(tp, struct trace_uprobe, tp);
}

/*
 * Allocate new trace_uprobe and initialize it (including uprobes).
 */
static struct trace_uprobe *
alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
{
	struct trace_uprobe *tu;
	int ret;

	tu = kzalloc(struct_size(tu, tp.args, nargs), GFP_KERNEL);
	if (!tu)
		return ERR_PTR(-ENOMEM);

	tu->nhits = alloc_percpu(unsigned long);
	if (!tu->nhits) {
		ret = -ENOMEM;
		goto error;
	}

	ret = trace_probe_init(&tu->tp, event, group, true, nargs);
	if (ret < 0)
		goto error;

	dyn_event_init(&tu->devent, &trace_uprobe_ops);
	tu->consumer.handler = uprobe_dispatcher;
	if (is_ret)
		tu->consumer.ret_handler = uretprobe_dispatcher;
	init_trace_uprobe_filter(tu->tp.event->filter);
	return tu;

error:
	free_percpu(tu->nhits);
	kfree(tu);

	return ERR_PTR(ret);
}

static void free_trace_uprobe(struct trace_uprobe *tu)
{
	if (!tu)
		return;

	path_put(&tu->path);
	trace_probe_cleanup(&tu->tp);
	kfree(tu->filename);
	free_percpu(tu->nhits);
	kfree(tu);
}

static struct trace_uprobe *find_probe_event(const char *event, const char *group)
{
	struct dyn_event *pos;
	struct trace_uprobe *tu;

	for_each_trace_uprobe(tu, pos)
		if (strcmp(trace_probe_name(&tu->tp), event) == 0 &&
		    strcmp(trace_probe_group_name(&tu->tp), group) == 0)
			return tu;

	return NULL;
}

/* Unregister a trace_uprobe and probe_event */
static int unregister_trace_uprobe(struct trace_uprobe *tu)
{
	int ret;

	if (trace_probe_has_sibling(&tu->tp))
		goto unreg;

	/* If there's a reference to the dynamic event */
	if (trace_event_dyn_busy(trace_probe_event_call(&tu->tp)))
		return -EBUSY;

	ret = unregister_uprobe_event(tu);
	if (ret)
		return ret;

unreg:
	dyn_event_remove(&tu->devent);
	trace_probe_unlink(&tu->tp);
	free_trace_uprobe(tu);
	return 0;
}

static bool trace_uprobe_has_same_uprobe(struct trace_uprobe *orig,
					 struct trace_uprobe *comp)
{
	struct trace_probe_event *tpe = orig->tp.event;
	struct inode *comp_inode = d_real_inode(comp->path.dentry);
	int i;

	list_for_each_entry(orig, &tpe->probes, tp.list) {
		if (comp_inode != d_real_inode(orig->path.dentry) ||
		    comp->offset != orig->offset)
			continue;

		/*
		 * trace_probe_compare_arg_type() ensured that nr_args and
		 * each argument name and type are same. Let's compare comm.
		 */
		for (i = 0; i < orig->tp.nr_args; i++) {
			if (strcmp(orig->tp.args[i].comm,
				   comp->tp.args[i].comm))
				break;
		}

		if (i == orig->tp.nr_args)
			return true;
	}

	return false;
}

static int append_trace_uprobe(struct trace_uprobe *tu, struct trace_uprobe *to)
{
	int ret;

	ret = trace_probe_compare_arg_type(&tu->tp, &to->tp);
	if (ret) {
		/* Note that argument starts index = 2 */
		trace_probe_log_set_index(ret + 1);
		trace_probe_log_err(0, DIFF_ARG_TYPE);
		return -EEXIST;
	}
	if (trace_uprobe_has_same_uprobe(to, tu)) {
		trace_probe_log_set_index(0);
		trace_probe_log_err(0, SAME_PROBE);
		return -EEXIST;
	}

	/* Append to existing event */
	ret = trace_probe_append(&tu->tp, &to->tp);
	if (!ret)
		dyn_event_add(&tu->devent, trace_probe_event_call(&tu->tp));

	return ret;
}

/*
 * A uprobe with multiple reference counters is not allowed, i.e. if inode
 * and offset match, the reference counter offset *must* match as well.
 * Though there is one exception: if the user is replacing an old
 * trace_uprobe with a new one (same group/event), we allow the same uprobe
 * with a new reference counter as long as the new one does not conflict
 * with any other existing ones.
 */
static int validate_ref_ctr_offset(struct trace_uprobe *new)
{
	struct dyn_event *pos;
	struct trace_uprobe *tmp;
	struct inode *new_inode = d_real_inode(new->path.dentry);

	for_each_trace_uprobe(tmp, pos) {
		if (new_inode == d_real_inode(tmp->path.dentry) &&
		    new->offset == tmp->offset &&
		    new->ref_ctr_offset != tmp->ref_ctr_offset) {
			pr_warn("Reference counter offset mismatch.");
			return -EINVAL;
		}
	}
	return 0;
}

/* Register a trace_uprobe and probe_event */
static int register_trace_uprobe(struct trace_uprobe *tu)
{
	struct trace_uprobe *old_tu;
	int ret;

	guard(mutex)(&event_mutex);

	ret = validate_ref_ctr_offset(tu);
	if (ret)
		return ret;

	/* register as an event */
	old_tu = find_probe_event(trace_probe_name(&tu->tp),
				  trace_probe_group_name(&tu->tp));
	if (old_tu) {
		if (is_ret_probe(tu) != is_ret_probe(old_tu)) {
			trace_probe_log_set_index(0);
			trace_probe_log_err(0, DIFF_PROBE_TYPE);
			return -EEXIST;
		}
		return append_trace_uprobe(tu, old_tu);
	}

	ret = register_uprobe_event(tu);
	if (ret) {
		if (ret == -EEXIST) {
			trace_probe_log_set_index(0);
			trace_probe_log_err(0, EVENT_EXIST);
		} else
			pr_warn("Failed to register probe event(%d)\n", ret);
		return ret;
	}

	dyn_event_add(&tu->devent, trace_probe_event_call(&tu->tp));

	return ret;
}

/*
 * Argument syntax:
 *  - Add uprobe: p|r[:[GRP/][EVENT]] PATH:OFFSET[%return][(REF)] [FETCHARGS]
 */
static int __trace_uprobe_create(int argc, const char **argv)
{
	struct trace_uprobe *tu;
	const char *event = NULL, *group = UPROBE_EVENT_SYSTEM;
	char *arg, *filename, *rctr, *rctr_end, *tmp;
	char buf[MAX_EVENT_NAME_LEN];
	char gbuf[MAX_EVENT_NAME_LEN];
	enum probe_print_type ptype;
	struct path path;
	unsigned long offset, ref_ctr_offset;
	bool is_return = false;
	int i, ret;

	ref_ctr_offset = 0;

	switch (argv[0][0]) {
	case 'r':
		is_return = true;
		break;
	case 'p':
		break;
	default:
		return -ECANCELED;
	}

	if (argc < 2)
		return -ECANCELED;

	trace_probe_log_init("trace_uprobe", argc, argv);

	if (argc - 2 > MAX_TRACE_ARGS) {
		trace_probe_log_set_index(2);
		trace_probe_log_err(0, TOO_MANY_ARGS);
		return -E2BIG;
	}

	if (argv[0][1] == ':')
		event = &argv[0][2];

	if (!strchr(argv[1], '/'))
		return -ECANCELED;

	filename = kstrdup(argv[1], GFP_KERNEL);
	if (!filename)
		return -ENOMEM;

	/* Find the last occurrence, in case the path contains ':' too. */
	arg = strrchr(filename, ':');
	if (!arg || !isdigit(arg[1])) {
		kfree(filename);
		return -ECANCELED;
	}

	trace_probe_log_set_index(1);	/* filename is the 2nd argument */

	*arg++ = '\0';
	ret = kern_path(filename, LOOKUP_FOLLOW, &path);
	if (ret) {
		trace_probe_log_err(0, FILE_NOT_FOUND);
		kfree(filename);
		trace_probe_log_clear();
		return ret;
	}
	if (!d_is_reg(path.dentry)) {
		trace_probe_log_err(0, NO_REGULAR_FILE);
		ret = -EINVAL;
		goto fail_address_parse;
	}

	/* Parse reference counter offset if specified. */
	rctr = strchr(arg, '(');
	if (rctr) {
		rctr_end = strchr(rctr, ')');
		if (!rctr_end) {
			ret = -EINVAL;
			rctr_end = rctr + strlen(rctr);
			trace_probe_log_err(rctr_end - filename,
					    REFCNT_OPEN_BRACE);
			goto fail_address_parse;
		} else if (rctr_end[1] != '\0') {
			ret = -EINVAL;
			trace_probe_log_err(rctr_end + 1 - filename,
					    BAD_REFCNT_SUFFIX);
			goto fail_address_parse;
		}

		*rctr++ = '\0';
		*rctr_end = '\0';
		ret = kstrtoul(rctr, 0, &ref_ctr_offset);
		if (ret) {
			trace_probe_log_err(rctr - filename, BAD_REFCNT);
			goto fail_address_parse;
		}
	}

	/* Check if there is a %return suffix */
	tmp = strchr(arg, '%');
	if (tmp) {
		if (!strcmp(tmp, "%return")) {
			*tmp = '\0';
			is_return = true;
		} else {
			trace_probe_log_err(tmp - filename, BAD_ADDR_SUFFIX);
			ret = -EINVAL;
			goto fail_address_parse;
		}
	}

	/* Parse uprobe offset. */
	ret = kstrtoul(arg, 0, &offset);
	if (ret) {
		trace_probe_log_err(arg - filename, BAD_UPROBE_OFFS);
		goto fail_address_parse;
	}

	/* setup a probe */
	trace_probe_log_set_index(0);
	if (event) {
		ret = traceprobe_parse_event_name(&event, &group, gbuf,
						  event - argv[0]);
		if (ret)
			goto fail_address_parse;
	}

	if (!event) {
		char *tail;
		char *ptr;

		tail = kstrdup(kbasename(filename), GFP_KERNEL);
		if (!tail) {
			ret = -ENOMEM;
			goto fail_address_parse;
		}

		ptr = strpbrk(tail, ".-_");
		if (ptr)
			*ptr = '\0';

		snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx", 'p', tail, offset);
		event = buf;
		kfree(tail);
	}

	argc -= 2;
	argv += 2;

	tu = alloc_trace_uprobe(group, event, argc, is_return);
	if (IS_ERR(tu)) {
		ret = PTR_ERR(tu);
		/* This must return -ENOMEM otherwise there is a bug */
		WARN_ON_ONCE(ret != -ENOMEM);
		goto fail_address_parse;
	}
	tu->offset = offset;
	tu->ref_ctr_offset = ref_ctr_offset;
	tu->path = path;
	tu->filename = filename;

	/* parse arguments */
	for (i = 0; i < argc; i++) {
		struct traceprobe_parse_context ctx = {
			.flags = (is_return ? TPARG_FL_RETURN : 0) | TPARG_FL_USER,
		};

		trace_probe_log_set_index(i + 2);
		ret = traceprobe_parse_probe_arg(&tu->tp, i, argv[i], &ctx);
		traceprobe_finish_parse(&ctx);
		if (ret)
			goto error;
	}

	ptype = is_ret_probe(tu) ? PROBE_PRINT_RETURN : PROBE_PRINT_NORMAL;
	ret = traceprobe_set_print_fmt(&tu->tp, ptype);
	if (ret < 0)
		goto error;

	ret = register_trace_uprobe(tu);
	if (!ret)
		goto out;

error:
	free_trace_uprobe(tu);
out:
	trace_probe_log_clear();
	return ret;

fail_address_parse:
	trace_probe_log_clear();
	path_put(&path);
	kfree(filename);

	return ret;
}

static int trace_uprobe_create(const char *raw_command)
{
	return trace_probe_create(raw_command, __trace_uprobe_create);
}

static int create_or_delete_trace_uprobe(const char *raw_command)
{
	int ret;

	if (raw_command[0] == '-')
		return dyn_event_release(raw_command, &trace_uprobe_ops);

	ret = dyn_event_create(raw_command, &trace_uprobe_ops);
	return ret == -ECANCELED ? -EINVAL : ret;
}

static int trace_uprobe_release(struct dyn_event *ev)
{
	struct trace_uprobe *tu = to_trace_uprobe(ev);

	return unregister_trace_uprobe(tu);
}

/* Probes listing interfaces */
static int trace_uprobe_show(struct seq_file *m, struct dyn_event *ev)
{
	struct trace_uprobe *tu = to_trace_uprobe(ev);
	char c = is_ret_probe(tu) ? 'r' : 'p';
	int i;

	seq_printf(m, "%c:%s/%s %s:0x%0*lx", c, trace_probe_group_name(&tu->tp),
		   trace_probe_name(&tu->tp), tu->filename,
		   (int)(sizeof(void *) * 2), tu->offset);

	if (tu->ref_ctr_offset)
		seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);

	for (i = 0; i < tu->tp.nr_args; i++)
		seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);

	seq_putc(m, '\n');
	return 0;
}

static int probes_seq_show(struct seq_file *m, void *v)
{
	struct dyn_event *ev = v;

	if (!is_trace_uprobe(ev))
		return 0;

	return trace_uprobe_show(m, ev);
}

static const struct seq_operations probes_seq_op = {
	.start	= dyn_event_seq_start,
	.next	= dyn_event_seq_next,
	.stop	= dyn_event_seq_stop,
	.show	= probes_seq_show
};

static int probes_open(struct inode *inode, struct file *file)
{
	int ret;

	ret = security_locked_down(LOCKDOWN_TRACEFS);
	if (ret)
		return ret;

	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
		ret = dyn_events_release_all(&trace_uprobe_ops);
		if (ret)
			return ret;
	}

	return seq_open(file, &probes_seq_op);
}

static ssize_t probes_write(struct file *file, const char __user *buffer,
			    size_t count, loff_t *ppos)
{
	return trace_parse_run_command(file, buffer, count, ppos,
				       create_or_delete_trace_uprobe);
}

static const struct file_operations uprobe_events_ops = {
	.owner		= THIS_MODULE,
	.open		= probes_open,
	.read		= seq_read,
	.llseek		= seq_lseek,
	.release	= seq_release,
	.write		= probes_write,
};

/* Probes profiling interfaces */
static int probes_profile_seq_show(struct seq_file *m, void *v)
{
	struct dyn_event *ev = v;
	struct trace_uprobe *tu;
	unsigned long nhits;
	int cpu;

	if (!is_trace_uprobe(ev))
		return 0;

	tu = to_trace_uprobe(ev);

	nhits = 0;
	for_each_possible_cpu(cpu) {
		nhits += per_cpu(*tu->nhits, cpu);
	}

	seq_printf(m, " %s %-44s %15lu\n", tu->filename,
		   trace_probe_name(&tu->tp), nhits);
	return 0;
}

static const struct seq_operations profile_seq_op = {
	.start	= dyn_event_seq_start,
	.next	= dyn_event_seq_next,
	.stop	= dyn_event_seq_stop,
	.show	= probes_profile_seq_show
};

static int profile_open(struct inode *inode, struct file *file)
{
	int ret;

	ret = security_locked_down(LOCKDOWN_TRACEFS);
	if (ret)
		return ret;

	return seq_open(file, &profile_seq_op);
}

static const struct file_operations uprobe_profile_ops = {
	.owner		= THIS_MODULE,
	.open		= profile_open,
	.read		= seq_read,
	.llseek		= seq_lseek,
	.release	= seq_release,
};

struct uprobe_cpu_buffer {
	struct mutex mutex;
	void *buf;
	int dsize;
};
static struct uprobe_cpu_buffer __percpu *uprobe_cpu_buffer;
static int uprobe_buffer_refcnt;
#define MAX_UCB_BUFFER_SIZE PAGE_SIZE

static int uprobe_buffer_init(void)
{
	int cpu, err_cpu;

	uprobe_cpu_buffer = alloc_percpu(struct uprobe_cpu_buffer);
	if (uprobe_cpu_buffer == NULL)
		return -ENOMEM;

	for_each_possible_cpu(cpu) {
		struct page *p = alloc_pages_node(cpu_to_node(cpu),
						  GFP_KERNEL, 0);
		if (p == NULL) {
			err_cpu = cpu;
			goto err;
		}
		per_cpu_ptr(uprobe_cpu_buffer, cpu)->buf = page_address(p);
		mutex_init(&per_cpu_ptr(uprobe_cpu_buffer, cpu)->mutex);
	}

	return 0;

err:
	for_each_possible_cpu(cpu) {
		if (cpu == err_cpu)
			break;
		free_page((unsigned long)per_cpu_ptr(uprobe_cpu_buffer, cpu)->buf);
	}

	free_percpu(uprobe_cpu_buffer);
	return -ENOMEM;
}

static int uprobe_buffer_enable(void)
{
	int ret = 0;

	BUG_ON(!mutex_is_locked(&event_mutex));

	if (uprobe_buffer_refcnt++ == 0) {
		ret = uprobe_buffer_init();
		if (ret < 0)
			uprobe_buffer_refcnt--;
	}

	return ret;
}

static void uprobe_buffer_disable(void)
{
	int cpu;

	BUG_ON(!mutex_is_locked(&event_mutex));

	if (--uprobe_buffer_refcnt == 0) {
		for_each_possible_cpu(cpu)
			free_page((unsigned long)per_cpu_ptr(uprobe_cpu_buffer,
							     cpu)->buf);

		free_percpu(uprobe_cpu_buffer);
		uprobe_cpu_buffer = NULL;
	}
}

static struct uprobe_cpu_buffer *uprobe_buffer_get(void)
{
	struct uprobe_cpu_buffer *ucb;
	int cpu;

	cpu = raw_smp_processor_id();
	ucb = per_cpu_ptr(uprobe_cpu_buffer, cpu);

	/*
	 * Use per-cpu buffers for fastest access, but we might migrate
	 * so the mutex makes sure we have sole access to it.
	 */
	mutex_lock(&ucb->mutex);

	return ucb;
}

static void uprobe_buffer_put(struct uprobe_cpu_buffer *ucb)
{
	if (!ucb)
		return;
	mutex_unlock(&ucb->mutex);
}

static struct uprobe_cpu_buffer *prepare_uprobe_buffer(struct trace_uprobe *tu,
						       struct pt_regs *regs,
						       struct uprobe_cpu_buffer **ucbp)
{
	struct uprobe_cpu_buffer *ucb;
	int dsize, esize;

	if (*ucbp)
		return *ucbp;

	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
	dsize = __get_data_size(&tu->tp, regs, NULL);

	ucb = uprobe_buffer_get();
	ucb->dsize = tu->tp.size + dsize;

	if (WARN_ON_ONCE(ucb->dsize > MAX_UCB_BUFFER_SIZE)) {
		ucb->dsize = MAX_UCB_BUFFER_SIZE;
		dsize = MAX_UCB_BUFFER_SIZE - tu->tp.size;
	}

	store_trace_args(ucb->buf, &tu->tp, regs, NULL, esize, dsize);

	*ucbp = ucb;
	return ucb;
}

static void __uprobe_trace_func(struct trace_uprobe *tu,
				unsigned long func, struct pt_regs *regs,
				struct uprobe_cpu_buffer *ucb,
				struct trace_event_file *trace_file)
{
	struct uprobe_trace_entry_head *entry;
	struct trace_event_buffer fbuffer;
	void *data;
	int size, esize;
	struct trace_event_call *call = trace_probe_event_call(&tu->tp);

	WARN_ON(call != trace_file->event_call);

	if (trace_trigger_soft_disabled(trace_file))
		return;

	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
	size = esize + ucb->dsize;
	entry = trace_event_buffer_reserve(&fbuffer, trace_file, size);
	if (!entry)
		return;

	if (is_ret_probe(tu)) {
		entry->vaddr[0] = func;
		entry->vaddr[1] = instruction_pointer(regs);
		data = DATAOF_TRACE_ENTRY(entry, true);
	} else {
		entry->vaddr[0] = instruction_pointer(regs);
		data = DATAOF_TRACE_ENTRY(entry, false);
	}

	memcpy(data, ucb->buf, ucb->dsize);

	trace_event_buffer_commit(&fbuffer);
}

/* uprobe handler */
static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs,
			     struct uprobe_cpu_buffer **ucbp)
{
	struct event_file_link *link;
	struct uprobe_cpu_buffer *ucb;

	if (is_ret_probe(tu))
		return 0;

	ucb = prepare_uprobe_buffer(tu, regs, ucbp);

	rcu_read_lock();
	trace_probe_for_each_link_rcu(link, &tu->tp)
		__uprobe_trace_func(tu, 0, regs, ucb, link->file);
	rcu_read_unlock();

	return 0;
}

static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
				 struct pt_regs *regs,
				 struct uprobe_cpu_buffer **ucbp)
{
	struct event_file_link *link;
	struct uprobe_cpu_buffer *ucb;

	ucb = prepare_uprobe_buffer(tu, regs, ucbp);

	rcu_read_lock();
	trace_probe_for_each_link_rcu(link, &tu->tp)
		__uprobe_trace_func(tu, func, regs, ucb, link->file);
	rcu_read_unlock();
}

/* Event entry printers */
static enum print_line_t
print_uprobe_event(struct trace_iterator *iter, int flags,
		   struct trace_event *event)
{
	struct uprobe_trace_entry_head *entry;
	struct trace_seq *s = &iter->seq;
	struct trace_uprobe *tu;
	u8 *data;

	entry = (struct uprobe_trace_entry_head *)iter->ent;
	tu = trace_uprobe_primary_from_call(
		container_of(event, struct trace_event_call, event));
	if (unlikely(!tu))
		goto out;

	if (is_ret_probe(tu)) {
		trace_seq_printf(s, "%s: (0x%lx <- 0x%lx)",
				 trace_probe_name(&tu->tp),
				 entry->vaddr[1], entry->vaddr[0]);
		data = DATAOF_TRACE_ENTRY(entry, true);
	} else {
		trace_seq_printf(s, "%s: (0x%lx)",
				 trace_probe_name(&tu->tp),
				 entry->vaddr[0]);
		data = DATAOF_TRACE_ENTRY(entry, false);
	}

	if (trace_probe_print_args(s, tu->tp.args, tu->tp.nr_args, data, entry) < 0)
		goto out;

	trace_seq_putc(s, '\n');

out:
	return trace_handle_return(s);
}

typedef bool (*filter_func_t)(struct uprobe_consumer *self, struct mm_struct *mm);

static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter)
{
	struct inode *inode = d_real_inode(tu->path.dentry);
	struct uprobe *uprobe;

	tu->consumer.filter = filter;
	uprobe = uprobe_register(inode, tu->offset, tu->ref_ctr_offset, &tu->consumer);
	if (IS_ERR(uprobe))
		return PTR_ERR(uprobe);

	tu->uprobe = uprobe;
	return 0;
}

static void __probe_event_disable(struct trace_probe *tp)
{
	struct trace_uprobe *tu;
	bool sync = false;

	tu = container_of(tp, struct trace_uprobe, tp);
	WARN_ON(!uprobe_filter_is_empty(tu->tp.event->filter));

	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
		if (!tu->uprobe)
			continue;

		uprobe_unregister_nosync(tu->uprobe, &tu->consumer);
		sync = true;
		tu->uprobe = NULL;
	}
	if (sync)
		uprobe_unregister_sync();
}

static int probe_event_enable(struct trace_event_call *call,
			      struct trace_event_file *file, filter_func_t filter)
{
	struct trace_probe *tp;
	struct trace_uprobe *tu;
	bool enabled;
	int ret;

	tp = trace_probe_primary_from_call(call);
	if (WARN_ON_ONCE(!tp))
		return -ENODEV;
	enabled = trace_probe_is_enabled(tp);

	/* This may also change "enabled" state */
	if (file) {
		if (trace_probe_test_flag(tp, TP_FLAG_PROFILE))
			return -EINTR;

		ret = trace_probe_add_file(tp, file);
		if (ret < 0)
			return ret;
	} else {
		if (trace_probe_test_flag(tp, TP_FLAG_TRACE))
			return -EINTR;

		trace_probe_set_flag(tp, TP_FLAG_PROFILE);
	}

	tu = container_of(tp, struct trace_uprobe, tp);
	WARN_ON(!uprobe_filter_is_empty(tu->tp.event->filter));

	if (enabled)
		return 0;

	ret = uprobe_buffer_enable();
	if (ret)
		goto err_flags;

	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
		ret = trace_uprobe_enable(tu, filter);
		if (ret) {
			__probe_event_disable(tp);
			goto err_buffer;
		}
	}

	return 0;

err_buffer:
	uprobe_buffer_disable();

err_flags:
	if (file)
		trace_probe_remove_file(tp, file);
	else
		trace_probe_clear_flag(tp, TP_FLAG_PROFILE);

	return ret;
}

static void probe_event_disable(struct trace_event_call *call,
				struct trace_event_file *file)
{
	struct trace_probe *tp;

	tp = trace_probe_primary_from_call(call);
	if (WARN_ON_ONCE(!tp))
		return;

	if (!trace_probe_is_enabled(tp))
		return;

	if (file) {
		if (trace_probe_remove_file(tp, file) < 0)
			return;

		if (trace_probe_is_enabled(tp))
			return;
	} else
		trace_probe_clear_flag(tp, TP_FLAG_PROFILE);

	__probe_event_disable(tp);
	uprobe_buffer_disable();
}

static int uprobe_event_define_fields(struct trace_event_call *event_call)
{
	int ret, size;
	struct uprobe_trace_entry_head field;
	struct trace_uprobe *tu;

	tu = trace_uprobe_primary_from_call(event_call);
	if (unlikely(!tu))
		return -ENODEV;

	if (is_ret_probe(tu)) {
		DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_FUNC, 0);
		DEFINE_FIELD(unsigned long, vaddr[1], FIELD_STRING_RETIP, 0);
		size = SIZEOF_TRACE_ENTRY(true);
	} else {
		DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
		size = SIZEOF_TRACE_ENTRY(false);
	}

	return traceprobe_define_arg_fields(event_call, size, &tu->tp);
}

#ifdef CONFIG_PERF_EVENTS
static bool
__uprobe_perf_filter(struct trace_uprobe_filter *filter, struct mm_struct *mm)
{
	struct perf_event *event;

	list_for_each_entry(event, &filter->perf_events, hw.tp_list) {
		if (event->hw.target->mm == mm)
			return true;
	}

	return false;
}

static inline bool
trace_uprobe_filter_event(struct trace_uprobe_filter *filter,
			  struct perf_event *event)
{
	return __uprobe_perf_filter(filter, event->hw.target->mm);
}

static bool trace_uprobe_filter_remove(struct trace_uprobe_filter *filter,
				       struct perf_event *event)
{
	bool done;

	write_lock(&filter->rwlock);
	if (event->hw.target) {
		list_del(&event->hw.tp_list);
		done = filter->nr_systemwide ||
			(event->hw.target->flags & PF_EXITING) ||
			trace_uprobe_filter_event(filter, event);
	} else {
		filter->nr_systemwide--;
		done = filter->nr_systemwide;
	}
	write_unlock(&filter->rwlock);

	return done;
}

/* This returns true if the filter always covers target mm */
static bool trace_uprobe_filter_add(struct trace_uprobe_filter *filter,
				    struct perf_event *event)
{
	bool done;

	write_lock(&filter->rwlock);
	if (event->hw.target) {
		/*
		 * event->parent != NULL means copy_process(), we can avoid
		 * uprobe_apply(). current->mm must be probed and we can rely
		 * on dup_mmap() which preserves the already installed bp's.
		 *
		 * attr.enable_on_exec means that exec/mmap will install the
		 * breakpoints we need.
		 */
		done = filter->nr_systemwide ||
			event->parent || event->attr.enable_on_exec ||
			trace_uprobe_filter_event(filter, event);
		list_add(&event->hw.tp_list, &filter->perf_events);
	} else {
		done = filter->nr_systemwide;
		filter->nr_systemwide++;
	}
	write_unlock(&filter->rwlock);

	return done;
}

static int uprobe_perf_close(struct trace_event_call *call,
			     struct perf_event *event)
{
	struct trace_probe *tp;
	struct trace_uprobe *tu;
	int ret = 0;

	tp = trace_probe_primary_from_call(call);
	if (WARN_ON_ONCE(!tp))
		return -ENODEV;

	tu = container_of(tp, struct trace_uprobe, tp);
	if (trace_uprobe_filter_remove(tu->tp.event->filter, event))
		return 0;

	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
		ret = uprobe_apply(tu->uprobe, &tu->consumer, false);
		if (ret)
			break;
	}

	return ret;
}

static int uprobe_perf_open(struct trace_event_call *call,
			    struct perf_event *event)
{
	struct trace_probe *tp;
	struct trace_uprobe *tu;
	int err = 0;

	tp = trace_probe_primary_from_call(call);
	if (WARN_ON_ONCE(!tp))
		return -ENODEV;

	tu = container_of(tp, struct trace_uprobe, tp);
	if (trace_uprobe_filter_add(tu->tp.event->filter, event))
		return 0;

	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
		err = uprobe_apply(tu->uprobe, &tu->consumer, true);
		if (err) {
			uprobe_perf_close(call, event);
			break;
		}
	}

	return err;
}

static bool uprobe_perf_filter(struct uprobe_consumer *uc, struct mm_struct *mm)
{
	struct trace_uprobe_filter *filter;
	struct trace_uprobe *tu;
	int ret;

	tu = container_of(uc, struct trace_uprobe, consumer);
	filter = tu->tp.event->filter;

	/*
	 * Speculative short-circuiting check to avoid unnecessarily taking
	 * filter->rwlock below, if the uprobe has a system-wide consumer.
	 */
	if (READ_ONCE(filter->nr_systemwide))
		return true;

	read_lock(&filter->rwlock);
	ret = __uprobe_perf_filter(filter, mm);
	read_unlock(&filter->rwlock);

	return ret;
}

static void __uprobe_perf_func(struct trace_uprobe *tu,
			       unsigned long func, struct pt_regs *regs,
			       struct uprobe_cpu_buffer **ucbp)
{
	struct trace_event_call *call = trace_probe_event_call(&tu->tp);
	struct uprobe_trace_entry_head *entry;
	struct uprobe_cpu_buffer *ucb;
	struct hlist_head *head;
	void *data;
	int size, esize;
	int rctx;

#ifdef CONFIG_BPF_EVENTS
	if (bpf_prog_array_valid(call)) {
		const struct bpf_prog_array *array;
		u32 ret;

		rcu_read_lock_trace();
		array = rcu_dereference_check(call->prog_array,
					      rcu_read_lock_trace_held());
		ret = bpf_prog_run_array_uprobe(array, regs, bpf_prog_run);
		rcu_read_unlock_trace();
		if (!ret)
			return;
	}
#endif /* CONFIG_BPF_EVENTS */

	esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));

	ucb = prepare_uprobe_buffer(tu, regs, ucbp);

	size = esize + ucb->dsize;
	size = ALIGN(size + sizeof(u32), sizeof(u64)) - sizeof(u32);
	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
		return;

	preempt_disable();
	head = this_cpu_ptr(call->perf_events);
	if (hlist_empty(head))
		goto out;

	entry = perf_trace_buf_alloc(size, NULL, &rctx);
	if (!entry)
		goto out;

	if (is_ret_probe(tu)) {
		entry->vaddr[0] = func;
		entry->vaddr[1] = instruction_pointer(regs);
		data = DATAOF_TRACE_ENTRY(entry, true);
	} else {
		entry->vaddr[0] = instruction_pointer(regs);
		data = DATAOF_TRACE_ENTRY(entry, false);
	}

	memcpy(data, ucb->buf, ucb->dsize);

	if (size - esize > ucb->dsize)
		memset(data + ucb->dsize, 0, size - esize - ucb->dsize);

	perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
			      head, NULL);
out:
	preempt_enable();
}

/* uprobe profile handler */
static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs,
			    struct uprobe_cpu_buffer **ucbp)
{
	if (!uprobe_perf_filter(&tu->consumer, current->mm))
		return UPROBE_HANDLER_REMOVE;

	if (!is_ret_probe(tu))
		__uprobe_perf_func(tu, 0, regs, ucbp);
	return 0;
}

static void uretprobe_perf_func(struct trace_uprobe *tu, unsigned long func,
				struct pt_regs *regs,
				struct uprobe_cpu_buffer **ucbp)
{
	__uprobe_perf_func(tu, func, regs, ucbp);
}

int bpf_get_uprobe_info(const struct perf_event *event, u32 *fd_type,
			const char **filename, u64 *probe_offset,
			u64 *probe_addr, bool perf_type_tracepoint)
{
	const char *pevent = trace_event_name(event->tp_event);
	const char *group = event->tp_event->class->system;
	struct trace_uprobe *tu;

	if (perf_type_tracepoint)
		tu = find_probe_event(pevent, group);
	else
		tu = trace_uprobe_primary_from_call(event->tp_event);
	if (!tu)
		return -EINVAL;

	*fd_type = is_ret_probe(tu) ? BPF_FD_TYPE_URETPROBE
				    : BPF_FD_TYPE_UPROBE;
	*filename = tu->filename;
	*probe_offset = tu->offset;
	*probe_addr = tu->ref_ctr_offset;
	return 0;
}
#endif	/* CONFIG_PERF_EVENTS */

static int
trace_uprobe_register(struct trace_event_call *event, enum trace_reg type,
		      void *data)
{
	struct trace_event_file *file = data;

	switch (type) {
	case TRACE_REG_REGISTER:
		return probe_event_enable(event, file, NULL);

	case TRACE_REG_UNREGISTER:
		probe_event_disable(event, file);
		return 0;

#ifdef CONFIG_PERF_EVENTS
	case TRACE_REG_PERF_REGISTER:
		return probe_event_enable(event, NULL, uprobe_perf_filter);

	case TRACE_REG_PERF_UNREGISTER:
		probe_event_disable(event, NULL);
		return 0;

	case TRACE_REG_PERF_OPEN:
		return uprobe_perf_open(event, data);

	case TRACE_REG_PERF_CLOSE:
		return uprobe_perf_close(event, data);

#endif
	default:
		return 0;
	}
}

static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs,
			     __u64 *data)
{
	struct trace_uprobe *tu;
	struct uprobe_dispatch_data udd;
	struct uprobe_cpu_buffer *ucb = NULL;
	int ret = 0;

	tu = container_of(con, struct trace_uprobe, consumer);

	this_cpu_inc(*tu->nhits);

	udd.tu = tu;
	udd.bp_addr = instruction_pointer(regs);

	current->utask->vaddr = (unsigned long) &udd;

	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
		return 0;

	if (trace_probe_test_flag(&tu->tp, TP_FLAG_TRACE))
		ret |= uprobe_trace_func(tu, regs, &ucb);

#ifdef CONFIG_PERF_EVENTS
	if (trace_probe_test_flag(&tu->tp, TP_FLAG_PROFILE))
		ret |= uprobe_perf_func(tu, regs, &ucb);
#endif
	uprobe_buffer_put(ucb);
	return ret;
}

static int uretprobe_dispatcher(struct uprobe_consumer *con,
				unsigned long func, struct pt_regs *regs,
				__u64 *data)
{
	struct trace_uprobe *tu;
	struct uprobe_dispatch_data udd;
	struct uprobe_cpu_buffer *ucb = NULL;

	tu = container_of(con, struct trace_uprobe, consumer);

	udd.tu = tu;
	udd.bp_addr = func;

	current->utask->vaddr = (unsigned long) &udd;

	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
		return 0;

	if (trace_probe_test_flag(&tu->tp, TP_FLAG_TRACE))
		uretprobe_trace_func(tu, func, regs, &ucb);

#ifdef CONFIG_PERF_EVENTS
	if (trace_probe_test_flag(&tu->tp, TP_FLAG_PROFILE))
		uretprobe_perf_func(tu, func, regs, &ucb);
#endif
	uprobe_buffer_put(ucb);
	return 0;
}

static struct trace_event_functions uprobe_funcs = {
	.trace		= print_uprobe_event
};

static struct trace_event_fields uprobe_fields_array[] = {
	{ .type = TRACE_FUNCTION_TYPE,
	  .define_fields = uprobe_event_define_fields },
	{}
};

static inline void init_trace_event_call(struct trace_uprobe *tu)
{
	struct trace_event_call *call = trace_probe_event_call(&tu->tp);
	call->event.funcs = &uprobe_funcs;
	call->class->fields_array = uprobe_fields_array;

	call->flags = TRACE_EVENT_FL_UPROBE | TRACE_EVENT_FL_CAP_ANY;
	call->class->reg = trace_uprobe_register;
}

static int register_uprobe_event(struct trace_uprobe *tu)
{
	init_trace_event_call(tu);

	return trace_probe_register_event_call(&tu->tp);
}

static int unregister_uprobe_event(struct trace_uprobe *tu)
{
	return trace_probe_unregister_event_call(&tu->tp);
}

#ifdef CONFIG_PERF_EVENTS
struct trace_event_call *
create_local_trace_uprobe(char *name, unsigned long offs,
			  unsigned long ref_ctr_offset, bool is_return)
{
	enum probe_print_type ptype;
	struct trace_uprobe *tu;
	struct path path;
	int ret;

	ret = kern_path(name, LOOKUP_FOLLOW, &path);
	if (ret)
		return ERR_PTR(ret);

	if (!d_is_reg(path.dentry)) {
		path_put(&path);
		return ERR_PTR(-EINVAL);
	}

	/*
	 * local trace_uprobes are not added to dyn_event, so they are never
	 * searched in find_probe_event(). Therefore, there is no concern of
	 * duplicated name "DUMMY_EVENT" here.
	 */
	tu = alloc_trace_uprobe(UPROBE_EVENT_SYSTEM, "DUMMY_EVENT", 0,
				is_return);
	if (IS_ERR(tu)) {
		pr_info("Failed to allocate trace_uprobe.(%d)\n",
			(int)PTR_ERR(tu));
		path_put(&path);
		return ERR_CAST(tu);
	}

	tu->offset = offs;
	tu->path = path;
	tu->ref_ctr_offset = ref_ctr_offset;
	tu->filename = kstrdup(name, GFP_KERNEL);
	if (!tu->filename) {
		ret = -ENOMEM;
		goto error;
	}

	init_trace_event_call(tu);

	ptype = is_ret_probe(tu) ? PROBE_PRINT_RETURN : PROBE_PRINT_NORMAL;
	if (traceprobe_set_print_fmt(&tu->tp, ptype) < 0) {
		ret = -ENOMEM;
		goto error;
	}

	return trace_probe_event_call(&tu->tp);
error:
	free_trace_uprobe(tu);
	return ERR_PTR(ret);
}

void destroy_local_trace_uprobe(struct trace_event_call *event_call)
{
	struct trace_uprobe *tu;

	tu = trace_uprobe_primary_from_call(event_call);

	free_trace_uprobe(tu);
}
#endif /* CONFIG_PERF_EVENTS */

/* Make a trace interface for controlling probe points */
static __init int init_uprobe_trace(void)
{
	int ret;

	ret = dyn_event_register(&trace_uprobe_ops);
	if (ret)
		return ret;

	ret = tracing_init_dentry();
	if (ret)
		return 0;

	trace_create_file("uprobe_events", TRACE_MODE_WRITE, NULL,
			  NULL, &uprobe_events_ops);
	/* Profile interface */
	trace_create_file("uprobe_profile", TRACE_MODE_READ, NULL,
			  NULL, &uprobe_profile_ops);
	return 0;
}

fs_initcall(init_uprobe_trace);
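/*
 * A minimal sketch of driving the interface above from userspace, using the
 * "p|r[:[GRP/][EVENT]] PATH:OFFSET" syntax documented at
 * __trace_uprobe_create(). It assumes tracefs is mounted at
 * /sys/kernel/tracing; the /bin/bash target and the 0x4710 offset are
 * placeholders (a real offset would come from nm/objdump output).
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical probe: uprobes/myprobe at file offset 0x4710. */
	static const char cmd[] = "p:uprobes/myprobe /bin/bash:0x4710\n";
	int fd;

	/* O_APPEND adds an event; opening with O_TRUNC would clear them all. */
	fd = open("/sys/kernel/tracing/uprobe_events", O_WRONLY | O_APPEND);
	if (fd < 0) {
		perror("open uprobe_events");
		return 1;
	}
	if (write(fd, cmd, strlen(cmd)) < 0)
		perror("write uprobe_events");
	close(fd);
	return 0;
}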
866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 | /* * \author Rickard E. (Rik) Faith <faith@valinux.com> * \author Daryll Strauss <daryll@valinux.com> * \author Gareth Hughes <gareth@valinux.com> */ /* * Created: Mon Jan 4 08:58:31 1999 by faith@valinux.com * * Copyright 1999 Precision Insight, Inc., Cedar Park, Texas. * Copyright 2000 VA Linux Systems, Inc., Sunnyvale, California. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice (including the next * paragraph) shall be included in all copies or substantial portions of the * Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR * OTHER DEALINGS IN THE SOFTWARE. */ #include <linux/anon_inodes.h> #include <linux/dma-fence.h> #include <linux/file.h> #include <linux/module.h> #include <linux/pci.h> #include <linux/poll.h> #include <linux/slab.h> #include <linux/vga_switcheroo.h> #include <drm/drm_client_event.h> #include <drm/drm_drv.h> #include <drm/drm_file.h> #include <drm/drm_gem.h> #include <drm/drm_print.h> #include "drm_crtc_internal.h" #include "drm_internal.h" /* from BKL pushdown */ DEFINE_MUTEX(drm_global_mutex); bool drm_dev_needs_global_mutex(struct drm_device *dev) { /* * The deprecated ->load callback must be called after the driver is * already registered. This means such drivers rely on the BKL to make * sure an open can't proceed until the driver is actually fully set up. * Similar hilarity holds for the unload callback. */ if (dev->driver->load || dev->driver->unload) return true; return false; } /** * DOC: file operations * * Drivers must define the file operations structure that forms the DRM * userspace API entry point, even though most of those operations are * implemented in the DRM core. The resulting &struct file_operations must be * stored in the &drm_driver.fops field. 
 * The mandatory functions are drm_open(),
 * drm_read(), drm_ioctl() and drm_compat_ioctl() if CONFIG_COMPAT is enabled.
 * Note that drm_compat_ioctl will be NULL if CONFIG_COMPAT=n, so there's no
 * need to sprinkle #ifdef into the code. Drivers which implement private ioctls
 * that require 32/64 bit compatibility support must provide their own
 * &file_operations.compat_ioctl handler that processes private ioctls and calls
 * drm_compat_ioctl() for core ioctls.
 *
 * In addition, drm_read() and drm_poll() provide support for DRM events. DRM
 * events are a generic and extensible means to send asynchronous events to
 * userspace through the file descriptor. They are used to send vblank events
 * and page flip completions by the KMS API. But drivers can also use them for
 * their own needs, e.g. to signal completion of rendering.
 *
 * For the driver-side event interface see drm_event_reserve_init() and
 * drm_send_event() as the main starting points.
 *
 * The memory mapping implementation will vary depending on how the driver
 * manages memory. For GEM-based drivers this is drm_gem_mmap().
 *
 * No other file operations are supported by the DRM userspace API. Overall the
 * following is an example &file_operations structure::
 *
 *     static const struct file_operations example_drm_fops = {
 *             .owner = THIS_MODULE,
 *             .open = drm_open,
 *             .release = drm_release,
 *             .unlocked_ioctl = drm_ioctl,
 *             .compat_ioctl = drm_compat_ioctl, // NULL if CONFIG_COMPAT=n
 *             .poll = drm_poll,
 *             .read = drm_read,
 *             .mmap = drm_gem_mmap,
 *     };
 *
 * For plain GEM based drivers there is the DEFINE_DRM_GEM_FOPS() macro, and for
 * DMA based drivers there is the DEFINE_DRM_GEM_DMA_FOPS() macro to make this
 * simpler.
 *
 * The driver's &file_operations must be stored in &drm_driver.fops.
 *
 * For driver-private IOCTL handling see the more detailed discussion in
 * :ref:`IOCTL support in the userland interfaces chapter<drm_driver_ioctl>`.
 */

/**
 * drm_file_alloc - allocate file context
 * @minor: minor to allocate on
 *
 * This allocates a new DRM file context. It is not linked into any context and
 * can be used by the caller freely. Note that the context keeps a pointer to
 * @minor, so it must be freed before @minor is.
 *
 * RETURNS:
 * Pointer to newly allocated context, ERR_PTR on failure.
*/ struct drm_file *drm_file_alloc(struct drm_minor *minor) { static atomic64_t ident = ATOMIC64_INIT(0); struct drm_device *dev = minor->dev; struct drm_file *file; int ret; file = kzalloc(sizeof(*file), GFP_KERNEL); if (!file) return ERR_PTR(-ENOMEM); /* Get a unique identifier for fdinfo: */ file->client_id = atomic64_inc_return(&ident); rcu_assign_pointer(file->pid, get_pid(task_tgid(current))); file->minor = minor; /* for compatibility root is always authenticated */ file->authenticated = capable(CAP_SYS_ADMIN); INIT_LIST_HEAD(&file->lhead); INIT_LIST_HEAD(&file->fbs); mutex_init(&file->fbs_lock); INIT_LIST_HEAD(&file->blobs); INIT_LIST_HEAD(&file->pending_event_list); INIT_LIST_HEAD(&file->event_list); init_waitqueue_head(&file->event_wait); file->event_space = 4096; /* set aside 4k for event buffer */ spin_lock_init(&file->master_lookup_lock); mutex_init(&file->event_read_lock); mutex_init(&file->client_name_lock); if (drm_core_check_feature(dev, DRIVER_GEM)) drm_gem_open(dev, file); if (drm_core_check_feature(dev, DRIVER_SYNCOBJ)) drm_syncobj_open(file); drm_prime_init_file_private(&file->prime); if (dev->driver->open) { ret = dev->driver->open(dev, file); if (ret < 0) goto out_prime_destroy; } return file; out_prime_destroy: drm_prime_destroy_file_private(&file->prime); if (drm_core_check_feature(dev, DRIVER_SYNCOBJ)) drm_syncobj_release(file); if (drm_core_check_feature(dev, DRIVER_GEM)) drm_gem_release(dev, file); put_pid(rcu_access_pointer(file->pid)); kfree(file); return ERR_PTR(ret); } static void drm_events_release(struct drm_file *file_priv) { struct drm_device *dev = file_priv->minor->dev; struct drm_pending_event *e, *et; unsigned long flags; spin_lock_irqsave(&dev->event_lock, flags); /* Unlink pending events */ list_for_each_entry_safe(e, et, &file_priv->pending_event_list, pending_link) { list_del(&e->pending_link); e->file_priv = NULL; } /* Remove unconsumed events */ list_for_each_entry_safe(e, et, &file_priv->event_list, link) { list_del(&e->link); kfree(e); } spin_unlock_irqrestore(&dev->event_lock, flags); } /** * drm_file_free - free file context * @file: context to free, or NULL * * This destroys and deallocates a DRM file context previously allocated via * drm_file_alloc(). The caller must make sure to unlink it from any contexts * before calling this. * * If NULL is passed, this is a no-op. 
 */
void drm_file_free(struct drm_file *file)
{
	struct drm_device *dev;

	if (!file)
		return;

	dev = file->minor->dev;

	drm_dbg_core(dev, "comm=\"%s\", pid=%d, dev=0x%lx, open_count=%d\n",
		     current->comm, task_pid_nr(current),
		     (long)old_encode_dev(file->minor->kdev->devt),
		     atomic_read(&dev->open_count));

	drm_events_release(file);

	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
		drm_fb_release(file);
		drm_property_destroy_user_blobs(dev, file);
	}

	if (drm_core_check_feature(dev, DRIVER_SYNCOBJ))
		drm_syncobj_release(file);

	if (drm_core_check_feature(dev, DRIVER_GEM))
		drm_gem_release(dev, file);

	if (drm_is_primary_client(file))
		drm_master_release(file);

	if (dev->driver->postclose)
		dev->driver->postclose(dev, file);

	drm_prime_destroy_file_private(&file->prime);

	WARN_ON(!list_empty(&file->event_list));

	put_pid(rcu_access_pointer(file->pid));

	mutex_destroy(&file->client_name_lock);
	kfree(file->client_name);

	kfree(file);
}

static void drm_close_helper(struct file *filp)
{
	struct drm_file *file_priv = filp->private_data;
	struct drm_device *dev = file_priv->minor->dev;

	mutex_lock(&dev->filelist_mutex);
	list_del(&file_priv->lhead);
	mutex_unlock(&dev->filelist_mutex);

	drm_file_free(file_priv);
}

/*
 * Check whether DRI will run on this CPU.
 *
 * \return non-zero if the DRI will run on this CPU, or zero otherwise.
 */
static int drm_cpu_valid(void)
{
#if defined(__sparc__) && !defined(__sparc_v9__)
	return 0;		/* No cmpxchg before v9 sparc. */
#endif
	return 1;
}

/*
 * Called whenever a process opens a drm node
 *
 * \param filp file pointer.
 * \param minor acquired minor-object.
 * \return zero on success or a negative number on failure.
 *
 * Creates and initializes a drm_file structure for the file private data in \p
 * filp and adds it into the doubly-linked list in \p dev.
 */
int drm_open_helper(struct file *filp, struct drm_minor *minor)
{
	struct drm_device *dev = minor->dev;
	struct drm_file *priv;
	int ret;

	if (filp->f_flags & O_EXCL)
		return -EBUSY;	/* No exclusive opens */
	if (!drm_cpu_valid())
		return -EINVAL;
	if (dev->switch_power_state != DRM_SWITCH_POWER_ON &&
	    dev->switch_power_state != DRM_SWITCH_POWER_DYNAMIC_OFF)
		return -EINVAL;
	if (WARN_ON_ONCE(!(filp->f_op->fop_flags & FOP_UNSIGNED_OFFSET)))
		return -EINVAL;

	drm_dbg_core(dev, "comm=\"%s\", pid=%d, minor=%d\n",
		     current->comm, task_pid_nr(current), minor->index);

	priv = drm_file_alloc(minor);
	if (IS_ERR(priv))
		return PTR_ERR(priv);

	if (drm_is_primary_client(priv)) {
		ret = drm_master_open(priv);
		if (ret) {
			drm_file_free(priv);
			return ret;
		}
	}

	filp->private_data = priv;
	priv->filp = filp;

	mutex_lock(&dev->filelist_mutex);
	list_add(&priv->lhead, &dev->filelist);
	mutex_unlock(&dev->filelist_mutex);

	return 0;
}

/**
 * drm_open - open method for DRM file
 * @inode: device inode
 * @filp: file pointer.
 *
 * This function must be used by drivers as their &file_operations.open method.
 * It looks up the correct DRM device and instantiates all the per-file
 * resources for it. It also calls the &drm_driver.open driver callback.
 *
 * RETURNS:
 * 0 on success or negative errno value on failure.
*/ int drm_open(struct inode *inode, struct file *filp) { struct drm_device *dev; struct drm_minor *minor; int retcode; minor = drm_minor_acquire(&drm_minors_xa, iminor(inode)); if (IS_ERR(minor)) return PTR_ERR(minor); dev = minor->dev; if (drm_dev_needs_global_mutex(dev)) mutex_lock(&drm_global_mutex); atomic_fetch_inc(&dev->open_count); /* share address_space across all char-devs of a single device */ filp->f_mapping = dev->anon_inode->i_mapping; retcode = drm_open_helper(filp, minor); if (retcode) goto err_undo; if (drm_dev_needs_global_mutex(dev)) mutex_unlock(&drm_global_mutex); return 0; err_undo: atomic_dec(&dev->open_count); if (drm_dev_needs_global_mutex(dev)) mutex_unlock(&drm_global_mutex); drm_minor_release(minor); return retcode; } EXPORT_SYMBOL(drm_open); static void drm_lastclose(struct drm_device *dev) { drm_client_dev_restore(dev); if (dev_is_pci(dev->dev)) vga_switcheroo_process_delayed_switch(); } /** * drm_release - release method for DRM file * @inode: device inode * @filp: file pointer. * * This function must be used by drivers as their &file_operations.release * method. It frees any resources associated with the open file. If this * is the last open file for the DRM device, it also restores the active * in-kernel DRM client. * * RETURNS: * Always succeeds and returns 0. */ int drm_release(struct inode *inode, struct file *filp) { struct drm_file *file_priv = filp->private_data; struct drm_minor *minor = file_priv->minor; struct drm_device *dev = minor->dev; if (drm_dev_needs_global_mutex(dev)) mutex_lock(&drm_global_mutex); drm_dbg_core(dev, "open_count = %d\n", atomic_read(&dev->open_count)); drm_close_helper(filp); if (atomic_dec_and_test(&dev->open_count)) drm_lastclose(dev); if (drm_dev_needs_global_mutex(dev)) mutex_unlock(&drm_global_mutex); drm_minor_release(minor); return 0; } EXPORT_SYMBOL(drm_release); void drm_file_update_pid(struct drm_file *filp) { struct drm_device *dev; struct pid *pid, *old; /* * Master nodes need to keep the original ownership in order for * drm_master_check_perm to keep working correctly. (See comment in * drm_auth.c.) */ if (filp->was_master) return; pid = task_tgid(current); /* * Quick unlocked check since the model is a single handover followed by * exclusive repeated use. */ if (pid == rcu_access_pointer(filp->pid)) return; dev = filp->minor->dev; mutex_lock(&dev->filelist_mutex); get_pid(pid); old = rcu_replace_pointer(filp->pid, pid, 1); mutex_unlock(&dev->filelist_mutex); synchronize_rcu(); put_pid(old); } /** * drm_release_noglobal - release method for DRM file * @inode: device inode * @filp: file pointer. * * This function may be used by drivers as their &file_operations.release * method. It frees any resources associated with the open file prior to taking * the drm_global_mutex. If this is the last open file for the DRM device, it * then restores the active in-kernel DRM client. * * RETURNS: * Always succeeds and returns 0. 
 */
int drm_release_noglobal(struct inode *inode, struct file *filp)
{
	struct drm_file *file_priv = filp->private_data;
	struct drm_minor *minor = file_priv->minor;
	struct drm_device *dev = minor->dev;

	drm_close_helper(filp);

	if (atomic_dec_and_mutex_lock(&dev->open_count, &drm_global_mutex)) {
		drm_lastclose(dev);
		mutex_unlock(&drm_global_mutex);
	}

	drm_minor_release(minor);

	return 0;
}
EXPORT_SYMBOL(drm_release_noglobal);

/**
 * drm_read - read method for DRM file
 * @filp: file pointer
 * @buffer: userspace destination pointer for the read
 * @count: count in bytes to read
 * @offset: offset to read
 *
 * This function must be used by drivers as their &file_operations.read
 * method if they use DRM events for asynchronous signalling to userspace.
 * Since events are used by the KMS API for vblank and page flip completion,
 * this means all modern display drivers must use it.
 *
 * @offset is ignored, DRM events are read like a pipe. Polling support is
 * provided by drm_poll().
 *
 * This function will only ever read a full event. Therefore userspace must
 * supply a big enough buffer to fit any event to ensure forward progress. Since
 * the maximum event space is currently 4K it's recommended to just use that for
 * safety.
 *
 * RETURNS:
 * Number of bytes read (always aligned to full events, and can be 0) or a
 * negative error code on failure.
 */
ssize_t drm_read(struct file *filp, char __user *buffer,
		 size_t count, loff_t *offset)
{
	struct drm_file *file_priv = filp->private_data;
	struct drm_device *dev = file_priv->minor->dev;
	ssize_t ret;

	ret = mutex_lock_interruptible(&file_priv->event_read_lock);
	if (ret)
		return ret;

	for (;;) {
		struct drm_pending_event *e = NULL;

		spin_lock_irq(&dev->event_lock);
		if (!list_empty(&file_priv->event_list)) {
			e = list_first_entry(&file_priv->event_list,
					     struct drm_pending_event, link);
			file_priv->event_space += e->event->length;
			list_del(&e->link);
		}
		spin_unlock_irq(&dev->event_lock);

		if (e == NULL) {
			if (ret)
				break;

			if (filp->f_flags & O_NONBLOCK) {
				ret = -EAGAIN;
				break;
			}

			mutex_unlock(&file_priv->event_read_lock);
			ret = wait_event_interruptible(file_priv->event_wait,
						       !list_empty(&file_priv->event_list));
			if (ret >= 0)
				ret = mutex_lock_interruptible(&file_priv->event_read_lock);
			if (ret)
				return ret;
		} else {
			unsigned length = e->event->length;

			if (length > count - ret) {
put_back_event:
				spin_lock_irq(&dev->event_lock);
				file_priv->event_space -= length;
				list_add(&e->link, &file_priv->event_list);
				spin_unlock_irq(&dev->event_lock);
				wake_up_interruptible_poll(&file_priv->event_wait,
					EPOLLIN | EPOLLRDNORM);
				break;
			}

			if (copy_to_user(buffer + ret, e->event, length)) {
				if (ret == 0)
					ret = -EFAULT;
				goto put_back_event;
			}

			ret += length;
			kfree(e);
		}
	}
	mutex_unlock(&file_priv->event_read_lock);

	return ret;
}
EXPORT_SYMBOL(drm_read);

/**
 * drm_poll - poll method for DRM file
 * @filp: file pointer
 * @wait: poll waiter table
 *
 * This function must be used by drivers as their &file_operations.poll method
 * if they use DRM events for asynchronous signalling to userspace. Since
 * events are used by the KMS API for vblank and page flip completion, this
 * means all modern display drivers must use it.
 *
 * See also drm_read().
 *
 * RETURNS:
 * Mask of POLL flags indicating the current status of the file.
 */
__poll_t drm_poll(struct file *filp, struct poll_table_struct *wait)
{
	struct drm_file *file_priv = filp->private_data;
	__poll_t mask = 0;

	poll_wait(filp, &file_priv->event_wait, wait);

	if (!list_empty(&file_priv->event_list))
		mask |= EPOLLIN | EPOLLRDNORM;

	return mask;
}
EXPORT_SYMBOL(drm_poll);

/**
 * drm_event_reserve_init_locked - init a DRM event and reserve space for it
 * @dev: DRM device
 * @file_priv: DRM file private data
 * @p: tracking structure for the pending event
 * @e: actual event data to deliver to userspace
 *
 * This function prepares the passed in event for eventual delivery. If the event
 * doesn't get delivered (because the IOCTL fails later on, before queuing up
 * anything) then the event must be cancelled and freed using
 * drm_event_cancel_free(). Successfully initialized events should be sent out
 * using drm_send_event() or drm_send_event_locked() to signal completion of the
 * asynchronous event to userspace.
 *
 * If callers embed @p into a larger structure it must be allocated with
 * kmalloc and @p must be the first member element.
 *
 * This is the locked version of drm_event_reserve_init() for callers which
 * already hold &drm_device.event_lock.
 *
 * RETURNS:
 * 0 on success or a negative error code on failure.
 */
int drm_event_reserve_init_locked(struct drm_device *dev,
				  struct drm_file *file_priv,
				  struct drm_pending_event *p,
				  struct drm_event *e)
{
	if (file_priv->event_space < e->length)
		return -ENOMEM;

	file_priv->event_space -= e->length;

	p->event = e;
	list_add(&p->pending_link, &file_priv->pending_event_list);
	p->file_priv = file_priv;

	return 0;
}
EXPORT_SYMBOL(drm_event_reserve_init_locked);

/**
 * drm_event_reserve_init - init a DRM event and reserve space for it
 * @dev: DRM device
 * @file_priv: DRM file private data
 * @p: tracking structure for the pending event
 * @e: actual event data to deliver to userspace
 *
 * This function prepares the passed in event for eventual delivery. If the event
 * doesn't get delivered (because the IOCTL fails later on, before queuing up
 * anything) then the event must be cancelled and freed using
 * drm_event_cancel_free(). Successfully initialized events should be sent out
 * using drm_send_event() or drm_send_event_locked() to signal completion of the
 * asynchronous event to userspace.
 *
 * If callers embed @p into a larger structure it must be allocated with
 * kmalloc and @p must be the first member element.
 *
 * Callers which already hold &drm_device.event_lock should use
 * drm_event_reserve_init_locked() instead.
 *
 * RETURNS:
 * 0 on success or a negative error code on failure.
 */
int drm_event_reserve_init(struct drm_device *dev,
			   struct drm_file *file_priv,
			   struct drm_pending_event *p,
			   struct drm_event *e)
{
	unsigned long flags;
	int ret;

	spin_lock_irqsave(&dev->event_lock, flags);
	ret = drm_event_reserve_init_locked(dev, file_priv, p, e);
	spin_unlock_irqrestore(&dev->event_lock, flags);

	return ret;
}
EXPORT_SYMBOL(drm_event_reserve_init);

/**
 * drm_event_cancel_free - free a DRM event and release its space
 * @dev: DRM device
 * @p: tracking structure for the pending event
 *
 * This function frees the event @p initialized with drm_event_reserve_init()
 * and releases any allocated space. It is used to cancel an event when the
 * nonblocking operation could not be submitted and needed to be aborted.
*/ void drm_event_cancel_free(struct drm_device *dev, struct drm_pending_event *p) { unsigned long flags; spin_lock_irqsave(&dev->event_lock, flags); if (p->file_priv) { p->file_priv->event_space += p->event->length; list_del(&p->pending_link); } spin_unlock_irqrestore(&dev->event_lock, flags); if (p->fence) dma_fence_put(p->fence); kfree(p); } EXPORT_SYMBOL(drm_event_cancel_free); static void drm_send_event_helper(struct drm_device *dev, struct drm_pending_event *e, ktime_t timestamp) { assert_spin_locked(&dev->event_lock); if (e->completion) { complete_all(e->completion); e->completion_release(e->completion); e->completion = NULL; } if (e->fence) { if (timestamp) dma_fence_signal_timestamp(e->fence, timestamp); else dma_fence_signal(e->fence); dma_fence_put(e->fence); } if (!e->file_priv) { kfree(e); return; } list_del(&e->pending_link); list_add_tail(&e->link, &e->file_priv->event_list); wake_up_interruptible_poll(&e->file_priv->event_wait, EPOLLIN | EPOLLRDNORM); } /** * drm_send_event_timestamp_locked - send DRM event to file descriptor * @dev: DRM device * @e: DRM event to deliver * @timestamp: timestamp to set for the fence event in kernel's CLOCK_MONOTONIC * time domain * * This function sends the event @e, initialized with drm_event_reserve_init(), * to its associated userspace DRM file. Callers must already hold * &drm_device.event_lock. * * Note that the core will take care of unlinking and disarming events when the * corresponding DRM file is closed. Drivers need not worry about whether the * DRM file for this event still exists and can call this function upon * completion of the asynchronous work unconditionally. */ void drm_send_event_timestamp_locked(struct drm_device *dev, struct drm_pending_event *e, ktime_t timestamp) { drm_send_event_helper(dev, e, timestamp); } EXPORT_SYMBOL(drm_send_event_timestamp_locked); /** * drm_send_event_locked - send DRM event to file descriptor * @dev: DRM device * @e: DRM event to deliver * * This function sends the event @e, initialized with drm_event_reserve_init(), * to its associated userspace DRM file. Callers must already hold * &drm_device.event_lock, see drm_send_event() for the unlocked version. * * Note that the core will take care of unlinking and disarming events when the * corresponding DRM file is closed. Drivers need not worry about whether the * DRM file for this event still exists and can call this function upon * completion of the asynchronous work unconditionally. */ void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event *e) { drm_send_event_helper(dev, e, 0); } EXPORT_SYMBOL(drm_send_event_locked); /** * drm_send_event - send DRM event to file descriptor * @dev: DRM device * @e: DRM event to deliver * * This function sends the event @e, initialized with drm_event_reserve_init(), * to its associated userspace DRM file. This function acquires * &drm_device.event_lock, see drm_send_event_locked() for callers which already * hold this lock. * * Note that the core will take care of unlinking and disarming events when the * corresponding DRM file is closed. Drivers need not worry about whether the * DRM file for this event still exists and can call this function upon * completion of the asynchronous work unconditionally. 
*/ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e) { unsigned long irqflags; spin_lock_irqsave(&dev->event_lock, irqflags); drm_send_event_helper(dev, e, 0); spin_unlock_irqrestore(&dev->event_lock, irqflags); } EXPORT_SYMBOL(drm_send_event); void drm_fdinfo_print_size(struct drm_printer *p, const char *prefix, const char *stat, const char *region, u64 sz) { const char *units[] = {"", " KiB", " MiB"}; unsigned u; for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { if (sz == 0 || !IS_ALIGNED(sz, SZ_1K)) break; sz = div_u64(sz, SZ_1K); } drm_printf(p, "%s-%s-%s:\t%llu%s\n", prefix, stat, region, sz, units[u]); } EXPORT_SYMBOL(drm_fdinfo_print_size); int drm_memory_stats_is_zero(const struct drm_memory_stats *stats) { return (stats->shared == 0 && stats->private == 0 && stats->resident == 0 && stats->purgeable == 0 && stats->active == 0); } EXPORT_SYMBOL(drm_memory_stats_is_zero); /** * drm_print_memory_stats - A helper to print memory stats * @p: The printer to print output to * @stats: The collected memory stats * @supported_status: Bitmask of optional stats which are available * @region: The memory region * */ void drm_print_memory_stats(struct drm_printer *p, const struct drm_memory_stats *stats, enum drm_gem_object_status supported_status, const char *region) { const char *prefix = "drm"; drm_fdinfo_print_size(p, prefix, "total", region, stats->private + stats->shared); drm_fdinfo_print_size(p, prefix, "shared", region, stats->shared); if (supported_status & DRM_GEM_OBJECT_ACTIVE) drm_fdinfo_print_size(p, prefix, "active", region, stats->active); if (supported_status & DRM_GEM_OBJECT_RESIDENT) drm_fdinfo_print_size(p, prefix, "resident", region, stats->resident); if (supported_status & DRM_GEM_OBJECT_PURGEABLE) drm_fdinfo_print_size(p, prefix, "purgeable", region, stats->purgeable); } EXPORT_SYMBOL(drm_print_memory_stats); /** * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats * @p: the printer to print output to * @file: the DRM file * * Helper to iterate over GEM objects with a handle allocated in the specified * file. */ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) { struct drm_gem_object *obj; struct drm_memory_stats status = {}; enum drm_gem_object_status supported_status = 0; int id; spin_lock(&file->table_lock); idr_for_each_entry (&file->object_idr, obj, id) { enum drm_gem_object_status s = 0; size_t add_size = (obj->funcs && obj->funcs->rss) ? obj->funcs->rss(obj) : obj->size; if (obj->funcs && obj->funcs->status) { s = obj->funcs->status(obj); supported_status |= s; } if (drm_gem_object_is_shared_for_memory_stats(obj)) status.shared += obj->size; else status.private += obj->size; if (s & DRM_GEM_OBJECT_RESIDENT) { status.resident += add_size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: */ s &= ~DRM_GEM_OBJECT_PURGEABLE; } if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) { status.active += add_size; supported_status |= DRM_GEM_OBJECT_ACTIVE; /* If still active, don't count as purgeable: */ s &= ~DRM_GEM_OBJECT_PURGEABLE; } if (s & DRM_GEM_OBJECT_PURGEABLE) status.purgeable += add_size; } spin_unlock(&file->table_lock); drm_print_memory_stats(p, &status, supported_status, "memory"); } EXPORT_SYMBOL(drm_show_memory_stats); /** * drm_show_fdinfo - helper for drm file fops * @m: output stream * @f: the device file instance * * Helper to implement fdinfo, for userspace to query usage stats, etc, of a * process using the GPU. 
 * See also &drm_driver.show_fdinfo.
 *
 * For text output format description please see Documentation/gpu/drm-usage-stats.rst
 */
void drm_show_fdinfo(struct seq_file *m, struct file *f)
{
	struct drm_file *file = f->private_data;
	struct drm_device *dev = file->minor->dev;
	struct drm_printer p = drm_seq_file_printer(m);
	int idx;

	if (!drm_dev_enter(dev, &idx))
		return;

	drm_printf(&p, "drm-driver:\t%s\n", dev->driver->name);
	drm_printf(&p, "drm-client-id:\t%llu\n", file->client_id);

	if (dev_is_pci(dev->dev)) {
		struct pci_dev *pdev = to_pci_dev(dev->dev);

		drm_printf(&p, "drm-pdev:\t%04x:%02x:%02x.%d\n",
			   pci_domain_nr(pdev->bus), pdev->bus->number,
			   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
	}

	mutex_lock(&file->client_name_lock);
	if (file->client_name)
		drm_printf(&p, "drm-client-name:\t%s\n", file->client_name);
	mutex_unlock(&file->client_name_lock);

	if (dev->driver->show_fdinfo)
		dev->driver->show_fdinfo(&p, file);

	drm_dev_exit(idx);
}
EXPORT_SYMBOL(drm_show_fdinfo);

/**
 * drm_file_err - log process name, pid and client_name associated with a drm_file
 * @file_priv: context of interest for process name and pid
 * @fmt: printf() like format string
 *
 * Helper function for clients which need to log process details, such as
 * name and pid, along with user logs.
 */
void drm_file_err(struct drm_file *file_priv, const char *fmt, ...)
{
	va_list args;
	struct va_format vaf;
	struct pid *pid;
	struct task_struct *task;
	struct drm_device *dev = file_priv->minor->dev;

	va_start(args, fmt);
	vaf.fmt = fmt;
	vaf.va = &args;

	mutex_lock(&file_priv->client_name_lock);
	rcu_read_lock();
	pid = rcu_dereference(file_priv->pid);
	task = pid_task(pid, PIDTYPE_TGID);

	drm_err(dev, "comm: %s pid: %d client: %s ... %pV", task ? task->comm : "Unset",
		task ? task->pid : 0, file_priv->client_name ?: "Unset", &vaf);

	va_end(args);
	rcu_read_unlock();
	mutex_unlock(&file_priv->client_name_lock);
}
EXPORT_SYMBOL(drm_file_err);

/**
 * mock_drm_getfile - Create a new struct file for the drm device
 * @minor: drm minor to wrap (e.g. #drm_device.primary)
 * @flags: file creation mode (O_RDWR etc)
 *
 * This creates a new struct file that wraps a DRM file context around a
 * DRM minor. This mimics userspace opening e.g. /dev/dri/card0, but without
 * invoking userspace. The struct file may be operated on using its f_op
 * (the drm_device.driver.fops) to mimic userspace operations, or be supplied
 * to userspace facing functions as an internal/anonymous client.
 *
 * RETURNS:
 * Pointer to newly created struct file, ERR_PTR on failure.
 */
struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags)
{
	struct drm_device *dev = minor->dev;
	struct drm_file *priv;
	struct file *file;

	priv = drm_file_alloc(minor);
	if (IS_ERR(priv))
		return ERR_CAST(priv);

	file = anon_inode_getfile("drm", dev->driver->fops, priv, flags);
	if (IS_ERR(file)) {
		drm_file_free(priv);
		return file;
	}

	/* Everyone shares a single global address space */
	file->f_mapping = dev->anon_inode->i_mapping;

	drm_dev_get(dev);
	priv->filp = file;

	return file;
}
EXPORT_SYMBOL_FOR_TESTS_ONLY(mock_drm_getfile);
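
/*
 * Illustrative sketch (not part of the original file): the helpers above
 * wired into a minimal driver-side &file_operations, as described in the
 * "file operations" DOC comment. "example_driver_fops" is a hypothetical
 * name, and the sketch assumes the usual DRM headers are in scope; plain
 * GEM drivers would normally just write DEFINE_DRM_GEM_FOPS(...) instead
 * of spelling this out.
 */
static const struct file_operations example_driver_fops __maybe_unused = {
	.owner		= THIS_MODULE,
	.open		= drm_open,		/* instantiates the drm_file */
	.release	= drm_release,		/* may trigger lastclose handling */
	.unlocked_ioctl	= drm_ioctl,
	.compat_ioctl	= drm_compat_ioctl,	/* NULL if CONFIG_COMPAT=n */
	.poll		= drm_poll,		/* event queue readiness */
	.read		= drm_read,		/* drains the event queue */
	.llseek		= noop_llseek,
	.mmap		= drm_gem_mmap,		/* GEM-based memory mapping */
	.fop_flags	= FOP_UNSIGNED_OFFSET,	/* checked by drm_open_helper() */
};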
// SPDX-License-Identifier: GPL-2.0
#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include "internal.h"
#include "mount.h"

static DEFINE_SPINLOCK(pin_lock);

void pin_remove(struct fs_pin *pin)
{
	spin_lock(&pin_lock);
	hlist_del_init(&pin->m_list);
	hlist_del_init(&pin->s_list);
	spin_unlock(&pin_lock);
	spin_lock_irq(&pin->wait.lock);
	pin->done = 1;
	wake_up_locked(&pin->wait);
	spin_unlock_irq(&pin->wait.lock);
}

void pin_insert(struct fs_pin *pin, struct vfsmount *m)
{
	spin_lock(&pin_lock);
	hlist_add_head(&pin->s_list, &m->mnt_sb->s_pins);
	hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
	spin_unlock(&pin_lock);
}

void pin_kill(struct fs_pin *p)
{
	wait_queue_entry_t wait;

	if (!p) {
		rcu_read_unlock();
		return;
	}
	init_wait(&wait);
	spin_lock_irq(&p->wait.lock);
	if (likely(!p->done)) {
		p->done = -1;
		spin_unlock_irq(&p->wait.lock);
		rcu_read_unlock();
		p->kill(p);
		return;
	}
	if (p->done > 0) {
		spin_unlock_irq(&p->wait.lock);
		rcu_read_unlock();
		return;
	}
	__add_wait_queue(&p->wait, &wait);
	while (1) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		spin_unlock_irq(&p->wait.lock);
		rcu_read_unlock();
		schedule();
		rcu_read_lock();
		if (likely(list_empty(&wait.entry)))
			break;
		/* OK, we know p couldn't have been freed yet */
		spin_lock_irq(&p->wait.lock);
		if (p->done > 0) {
			spin_unlock_irq(&p->wait.lock);
			break;
		}
	}
	rcu_read_unlock();
}

void mnt_pin_kill(struct mount *m)
{
	while (1) {
		struct hlist_node *p;

		rcu_read_lock();
		p = READ_ONCE(m->mnt_pins.first);
		if (!p) {
			rcu_read_unlock();
			break;
		}
		pin_kill(hlist_entry(p, struct fs_pin, m_list));
	}
}

void group_pin_kill(struct hlist_head *p)
{
	while (1) {
		struct hlist_node *q;

		rcu_read_lock();
		q = READ_ONCE(p->first);
		if (!q) {
			rcu_read_unlock();
			break;
		}
		pin_kill(hlist_entry(q, struct fs_pin, s_list));
	}
}
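
/*
 * Illustrative sketch (not part of the original file): how a typical user
 * of this API ties an object's lifetime to a vfsmount, modelled loosely on
 * kernel/acct.c. All "example_*" names are hypothetical. The ->kill
 * callback is entered from pin_kill() with p->done set to -1 and the RCU
 * read lock already dropped, and it must eventually call pin_remove().
 * Freeing is RCU-deferred here because concurrent pin_kill() callers may
 * still inspect the pin under rcu_read_lock().
 */
#include <linux/fs_pin.h>	/* would normally sit with the includes above */

struct example_pin {
	struct fs_pin pin;
	struct rcu_head rcu;
	/* plus whatever state must die before the mount goes away */
};

static void example_pin_kill(struct fs_pin *p)
{
	struct example_pin *ep = container_of(p, struct example_pin, pin);

	/* drop the pinned state here, then unhash and wake any waiters: */
	pin_remove(&ep->pin);
	kfree_rcu(ep, rcu);
}

static struct example_pin *example_pin_attach(struct vfsmount *m)
{
	struct example_pin *ep = kzalloc(sizeof(*ep), GFP_KERNEL);

	if (!ep)
		return NULL;
	init_fs_pin(&ep->pin, example_pin_kill);
	pin_insert(&ep->pin, m);
	return ep;
}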
// SPDX-License-Identifier: GPL-2.0
/*
 * trace_export.c - export basic ftrace utilities to user space
 *
 * Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
 */
#include <linux/stringify.h>
#include <linux/kallsyms.h>
#include <linux/seq_file.h>
#include <linux/uaccess.h>
#include <linux/ftrace.h>
#include <linux/module.h>
#include <linux/init.h>

#include "trace_output.h"

/* Stub function for events with triggers */
static int ftrace_event_register(struct trace_event_call *call,
				 enum trace_reg type, void *data)
{
	return 0;
}

#undef TRACE_SYSTEM
#define TRACE_SYSTEM	ftrace

/*
 * The FTRACE_ENTRY_REG macro allows an ftrace entry to define a register
 * function and thus become accessible via perf.
 */
#undef FTRACE_ENTRY_REG
#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

/* not needed for this file */
#undef __field_struct
#define __field_struct(type, item)

#undef __field
#define __field(type, item)				type item;

#undef __field_fn
#define __field_fn(type, item)				type item;

#undef __field_desc
#define __field_desc(type, container, item)		type item;

#undef __field_packed
#define __field_packed(type, container, item)		type item;

#undef __array
#define __array(type, item, size)			type item[size];

#undef __stack_array
#define __stack_array(type, item, size, field)		__array(type, item, size)

#undef __array_desc
#define __array_desc(type, container, item, size)	type item[size];

#undef __dynamic_array
#define __dynamic_array(type, item)			type item[];

#undef F_STRUCT
#define F_STRUCT(args...)				args

#undef F_printk
#define F_printk(fmt, args...) fmt, args
#undef FTRACE_ENTRY
#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
struct ____ftrace_##name {						\
	tstruct								\
};									\
static void __always_unused ____ftrace_check_##name(void)		\
{									\
	struct ____ftrace_##name *__entry = NULL;			\
									\
	/* force compile-time check on F_printk() */			\
	printk(print);							\
}

#undef FTRACE_ENTRY_DUP
#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print)		\
	FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

#include "trace_entries.h"

#undef __field_ext
#define __field_ext(_type, _item, _filter_type) {			\
	.type = #_type, .name = #_item,					\
	.size = sizeof(_type), .align = __alignof__(_type),		\
	is_signed_type(_type), .filter_type = _filter_type },

#undef __field_ext_packed
#define __field_ext_packed(_type, _item, _filter_type) {		\
	.type = #_type, .name = #_item,					\
	.size = sizeof(_type), .align = 1,				\
	is_signed_type(_type), .filter_type = _filter_type },

#undef __field
#define __field(_type, _item) __field_ext(_type, _item, FILTER_OTHER)

#undef __field_fn
#define __field_fn(_type, _item) __field_ext(_type, _item, FILTER_TRACE_FN)

#undef __field_desc
#define __field_desc(_type, _container, _item) __field_ext(_type, _item, FILTER_OTHER)

#undef __field_packed
#define __field_packed(_type, _container, _item) __field_ext_packed(_type, _item, FILTER_OTHER)

#undef __array
#define __array(_type, _item, _len) {					\
	.type = #_type"["__stringify(_len)"]", .name = #_item,		\
	.size = sizeof(_type[_len]), .align = __alignof__(_type),	\
	is_signed_type(_type), .filter_type = FILTER_OTHER,		\
	.len = _len },

#undef __stack_array
#define __stack_array(_type, _item, _len, _field) __array(_type, _item, _len)

#undef __array_desc
#define __array_desc(_type, _container, _item, _len) __array(_type, _item, _len)

#undef __dynamic_array
#define __dynamic_array(_type, _item) {					\
	.type = #_type "[]", .name = #_item,				\
	.size = 0, .align = __alignof__(_type),				\
	is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef FTRACE_ENTRY
#define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
static struct trace_event_fields ftrace_event_fields_##name[] = {	\
	tstruct								\
	{} };

#include "trace_entries.h"

#undef __entry
#define __entry REC

#undef __field
#define __field(type, item)

#undef __field_fn
#define __field_fn(type, item)

#undef __field_desc
#define __field_desc(type, container, item)

#undef __field_packed
#define __field_packed(type, container, item)

#undef __array
#define __array(type, item, len)

#undef __stack_array
#define __stack_array(type, item, len, field)

#undef __array_desc
#define __array_desc(type, container, item, len)

#undef __dynamic_array
#define __dynamic_array(type, item)

#undef F_printk
#define F_printk(fmt, args...) __stringify(fmt) ", " __stringify(args)

#undef FTRACE_ENTRY_REG
#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn) \
static struct trace_event_class __refdata event_class_ftrace_##call = {   \
	.system			= __stringify(TRACE_SYSTEM),		   \
	.fields_array		= ftrace_event_fields_##call,		   \
	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
	.reg			= regfn,				   \
};									   \
									   \
struct trace_event_call __used event_##call = {				   \
	.class			= &event_class_ftrace_##call,		   \
	{								   \
		.name			= #call,			   \
	},								   \
	.event.type		= etype,				   \
	.print_fmt		= print,				   \
	.flags			= TRACE_EVENT_FL_IGNORE_ENABLE,		   \
};									   \
static struct trace_event_call __used					   \
__section("_ftrace_events") *__event_##call = &event_##call;

#undef FTRACE_ENTRY
#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print)		\
	FTRACE_ENTRY_REG(call, struct_name, etype,			\
			 PARAMS(tstruct), PARAMS(print), NULL)

bool ftrace_event_is_function(struct trace_event_call *call)
{
	return call == &event_function;
}

#include "trace_entries.h"
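
/*
 * Illustrative sketch (not part of the original file): the multi-pass
 * expansion used above, in miniature. trace_entries.h is #included several
 * times with the description macros redefined between passes; the same
 * effect is shown here with a hypothetical EXAMPLE_ENTRIES() list macro so
 * the sketch stays self-contained. All "example"/"E_*" names are made up.
 */
#define EXAMPLE_ENTRIES(E)		\
	E(foo, u32, a)			\
	E(bar, u64, b)

/* pass 1: stamp out one struct per described entry */
#define E_STRUCT(name, type, item)	struct ____example_##name { type item; };
EXAMPLE_ENTRIES(E_STRUCT)
#undef E_STRUCT

/* pass 2: reuse the very same list to build a field table */
struct example_field_def {
	const char *name;
	int size;
};

#define E_FIELD(name, type, item)	{ .name = #name "." #item, .size = sizeof(type) },
static struct example_field_def example_field_defs[] __maybe_unused = {
	EXAMPLE_ENTRIES(E_FIELD)
};
#undef E_FIELD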
/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Trace point definitions for the RDMA Connect Manager.
 *
 * Author: Chuck Lever <chuck.lever@oracle.com>
 *
 * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
 */

#undef TRACE_SYSTEM
#define TRACE_SYSTEM rdma_cma

#if !defined(_TRACE_RDMA_CMA_H) || defined(TRACE_HEADER_MULTI_READ)

#define _TRACE_RDMA_CMA_H

#include <linux/tracepoint.h>
#include <trace/misc/rdma.h>

DECLARE_EVENT_CLASS(cma_fsm_class,
	TP_PROTO(
		const struct rdma_id_private *id_priv
	),

	TP_ARGS(id_priv),

	TP_STRUCT__entry(
		__field(u32, cm_id)
		__field(u32, tos)
		__array(unsigned char, srcaddr, sizeof(struct sockaddr_in6))
		__array(unsigned char, dstaddr, sizeof(struct sockaddr_in6))
	),

	TP_fast_assign(
		__entry->cm_id = id_priv->res.id;
		__entry->tos = id_priv->tos;
		memcpy(__entry->srcaddr, &id_priv->id.route.addr.src_addr,
		       sizeof(struct sockaddr_in6));
		memcpy(__entry->dstaddr, &id_priv->id.route.addr.dst_addr,
		       sizeof(struct sockaddr_in6));
	),

	TP_printk("cm.id=%u src=%pISpc dst=%pISpc tos=%u",
		__entry->cm_id, __entry->srcaddr,
		__entry->dstaddr, __entry->tos
	)
);

#define DEFINE_CMA_FSM_EVENT(name)						\
		DEFINE_EVENT(cma_fsm_class, cm_##name,				\
				TP_PROTO(					\
					const struct rdma_id_private *id_priv	\
				),						\
				TP_ARGS(id_priv))

DEFINE_CMA_FSM_EVENT(send_rtu);
DEFINE_CMA_FSM_EVENT(send_rej);
DEFINE_CMA_FSM_EVENT(prepare_mra);
DEFINE_CMA_FSM_EVENT(send_sidr_req);
DEFINE_CMA_FSM_EVENT(send_sidr_rep);
DEFINE_CMA_FSM_EVENT(disconnect);
DEFINE_CMA_FSM_EVENT(sent_drep);
DEFINE_CMA_FSM_EVENT(sent_dreq);
DEFINE_CMA_FSM_EVENT(id_destroy);

TRACE_EVENT(cm_id_attach,
	TP_PROTO(
		const struct rdma_id_private *id_priv,
		const struct ib_device *device
	),

	TP_ARGS(id_priv, device),

	TP_STRUCT__entry(
		__field(u32, cm_id)
		__array(unsigned char, srcaddr, sizeof(struct sockaddr_in6))
		__array(unsigned char, dstaddr, sizeof(struct sockaddr_in6))
		__string(devname, device->name)
	),

	TP_fast_assign(
		__entry->cm_id = id_priv->res.id;
		memcpy(__entry->srcaddr, &id_priv->id.route.addr.src_addr,
		       sizeof(struct sockaddr_in6));
		memcpy(__entry->dstaddr, &id_priv->id.route.addr.dst_addr,
		       sizeof(struct sockaddr_in6));
		__assign_str(devname);
	),

	TP_printk("cm.id=%u src=%pISpc dst=%pISpc device=%s",
		__entry->cm_id, __entry->srcaddr,
		__entry->dstaddr, __get_str(devname)
	)
);

DECLARE_EVENT_CLASS(cma_qp_class,
	TP_PROTO(
		const struct rdma_id_private *id_priv
	),

	TP_ARGS(id_priv),

	TP_STRUCT__entry(
		__field(u32, cm_id)
		__field(u32, tos)
		__field(u32, qp_num)
		__array(unsigned char, srcaddr, sizeof(struct sockaddr_in6))
		__array(unsigned char, dstaddr, sizeof(struct sockaddr_in6))
	),

	TP_fast_assign(
		__entry->cm_id = id_priv->res.id;
		__entry->tos = id_priv->tos;
		__entry->qp_num = id_priv->qp_num;
		memcpy(__entry->srcaddr, &id_priv->id.route.addr.src_addr,
		       sizeof(struct sockaddr_in6));
		memcpy(__entry->dstaddr, &id_priv->id.route.addr.dst_addr,
		       sizeof(struct sockaddr_in6));
	),

	TP_printk("cm.id=%u src=%pISpc dst=%pISpc tos=%u qp_num=%u",
		__entry->cm_id, __entry->srcaddr,
		__entry->dstaddr, __entry->tos, __entry->qp_num
	)
);

#define DEFINE_CMA_QP_EVENT(name)						\
		DEFINE_EVENT(cma_qp_class, cm_##name,				\
				TP_PROTO(					\
					const struct rdma_id_private *id_priv	\
				),						\
				TP_ARGS(id_priv))

DEFINE_CMA_QP_EVENT(send_req);
DEFINE_CMA_QP_EVENT(send_rep);
DEFINE_CMA_QP_EVENT(qp_destroy);

/*
 * enum ib_qp_type, from include/rdma/ib_verbs.h
 */
#define IB_QP_TYPE_LIST			\
	ib_qp_type(SMI)			\
	ib_qp_type(GSI)			\
	ib_qp_type(RC)			\
	ib_qp_type(UC)			\
	ib_qp_type(UD)			\
	ib_qp_type(RAW_IPV6)		\
	ib_qp_type(RAW_ETHERTYPE)	\
	ib_qp_type(RAW_PACKET)		\
	ib_qp_type(XRC_INI)		\
	ib_qp_type_end(XRC_TGT)

#undef ib_qp_type
#undef ib_qp_type_end

#define ib_qp_type(x)		TRACE_DEFINE_ENUM(IB_QPT_##x);
#define ib_qp_type_end(x)	TRACE_DEFINE_ENUM(IB_QPT_##x);

IB_QP_TYPE_LIST

#undef ib_qp_type
#undef ib_qp_type_end

#define ib_qp_type(x)		{ IB_QPT_##x, #x },
#define ib_qp_type_end(x)	{ IB_QPT_##x, #x }

#define rdma_show_qp_type(x) \
		__print_symbolic(x, IB_QP_TYPE_LIST)

TRACE_EVENT(cm_qp_create,
	TP_PROTO(
		const struct rdma_id_private *id_priv,
		const struct ib_pd *pd,
		const struct ib_qp_init_attr *qp_init_attr,
		int rc
	),

	TP_ARGS(id_priv, pd, qp_init_attr, rc),

	TP_STRUCT__entry(
		__field(u32, cm_id)
		__field(u32, pd_id)
		__field(u32, tos)
		__field(u32, qp_num)
		__field(u32, send_wr)
		__field(u32, recv_wr)
		__field(int, rc)
		__field(unsigned long, qp_type)
		__array(unsigned char, srcaddr, sizeof(struct sockaddr_in6))
		__array(unsigned char, dstaddr, sizeof(struct sockaddr_in6))
	),

	TP_fast_assign(
		__entry->cm_id = id_priv->res.id;
		__entry->pd_id = pd->res.id;
		__entry->tos = id_priv->tos;
		__entry->send_wr = qp_init_attr->cap.max_send_wr;
		__entry->recv_wr = qp_init_attr->cap.max_recv_wr;
		__entry->rc = rc;
		if (!rc) {
			__entry->qp_num = id_priv->qp_num;
			__entry->qp_type = id_priv->id.qp_type;
		} else {
			__entry->qp_num = 0;
			__entry->qp_type = 0;
		}
		memcpy(__entry->srcaddr, &id_priv->id.route.addr.src_addr,
		       sizeof(struct sockaddr_in6));
		memcpy(__entry->dstaddr, &id_priv->id.route.addr.dst_addr,
		       sizeof(struct sockaddr_in6));
	),

	TP_printk("cm.id=%u src=%pISpc dst=%pISpc tos=%u pd.id=%u qp_type=%s"
		" send_wr=%u recv_wr=%u qp_num=%u rc=%d",
		__entry->cm_id, __entry->srcaddr, __entry->dstaddr,
		__entry->tos, __entry->pd_id,
		rdma_show_qp_type(__entry->qp_type), __entry->send_wr,
		__entry->recv_wr, __entry->qp_num, __entry->rc
	)
);

TRACE_EVENT(cm_req_handler,
	TP_PROTO(
		const struct rdma_id_private *id_priv,
		int event
	),

	TP_ARGS(id_priv, event),

	TP_STRUCT__entry(
		__field(u32, cm_id)
		__field(u32, tos)
		__field(unsigned long, event)
		__array(unsigned char, srcaddr, sizeof(struct sockaddr_in6))
		__array(unsigned char, dstaddr, sizeof(struct sockaddr_in6))
	),

	TP_fast_assign(
		__entry->cm_id = id_priv->res.id;
		__entry->tos = id_priv->tos;
		__entry->event = event;
		memcpy(__entry->srcaddr,
&id_priv->id.route.addr.src_addr, sizeof(struct sockaddr_in6)); memcpy(__entry->dstaddr, &id_priv->id.route.addr.dst_addr, sizeof(struct sockaddr_in6)); ), TP_printk("cm.id=%u src=%pISpc dst=%pISpc tos=%u %s (%lu)", __entry->cm_id, __entry->srcaddr, __entry->dstaddr, __entry->tos, rdma_show_ib_cm_event(__entry->event), __entry->event ) ); TRACE_EVENT(cm_event_handler, TP_PROTO( const struct rdma_id_private *id_priv, const struct rdma_cm_event *event ), TP_ARGS(id_priv, event), TP_STRUCT__entry( __field(u32, cm_id) __field(u32, tos) __field(unsigned long, event) __field(int, status) __array(unsigned char, srcaddr, sizeof(struct sockaddr_in6)) __array(unsigned char, dstaddr, sizeof(struct sockaddr_in6)) ), TP_fast_assign( __entry->cm_id = id_priv->res.id; __entry->tos = id_priv->tos; __entry->event = event->event; __entry->status = event->status; memcpy(__entry->srcaddr, &id_priv->id.route.addr.src_addr, sizeof(struct sockaddr_in6)); memcpy(__entry->dstaddr, &id_priv->id.route.addr.dst_addr, sizeof(struct sockaddr_in6)); ), TP_printk("cm.id=%u src=%pISpc dst=%pISpc tos=%u %s (%lu/%d)", __entry->cm_id, __entry->srcaddr, __entry->dstaddr, __entry->tos, rdma_show_cm_event(__entry->event), __entry->event, __entry->status ) ); TRACE_EVENT(cm_event_done, TP_PROTO( const struct rdma_id_private *id_priv, const struct rdma_cm_event *event, int result ), TP_ARGS(id_priv, event, result), TP_STRUCT__entry( __field(u32, cm_id) __field(u32, tos) __field(unsigned long, event) __field(int, result) __array(unsigned char, srcaddr, sizeof(struct sockaddr_in6)) __array(unsigned char, dstaddr, sizeof(struct sockaddr_in6)) ), TP_fast_assign( __entry->cm_id = id_priv->res.id; __entry->tos = id_priv->tos; __entry->event = event->event; __entry->result = result; memcpy(__entry->srcaddr, &id_priv->id.route.addr.src_addr, sizeof(struct sockaddr_in6)); memcpy(__entry->dstaddr, &id_priv->id.route.addr.dst_addr, sizeof(struct sockaddr_in6)); ), TP_printk("cm.id=%u src=%pISpc dst=%pISpc tos=%u %s consumer returns %d", __entry->cm_id, __entry->srcaddr, __entry->dstaddr, __entry->tos, rdma_show_cm_event(__entry->event), __entry->result ) ); DECLARE_EVENT_CLASS(cma_client_class, TP_PROTO( const struct ib_device *device ), TP_ARGS(device), TP_STRUCT__entry( __string(name, device->name) ), TP_fast_assign( __assign_str(name); ), TP_printk("device name=%s", __get_str(name) ) ); #define DEFINE_CMA_CLIENT_EVENT(name) \ DEFINE_EVENT(cma_client_class, cm_##name, \ TP_PROTO( \ const struct ib_device *device \ ), \ TP_ARGS(device)) DEFINE_CMA_CLIENT_EVENT(add_one); DEFINE_CMA_CLIENT_EVENT(remove_one); #endif /* _TRACE_RDMA_CMA_H */ #undef TRACE_INCLUDE_PATH #define TRACE_INCLUDE_PATH . #define TRACE_INCLUDE_FILE cma_trace #include <trace/define_trace.h> |
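
/*
 * Usage sketch (not part of the original header, and guarded out so that
 * TRACE_HEADER_MULTI_READ re-inclusion cannot redefine it): each
 * DEFINE_EVENT/TRACE_EVENT above generates a trace_<name>() static inline,
 * which cma.c calls at the matching state transitions. The wrapper name
 * and guard macro are hypothetical; only the trace_*() symbols come from
 * the definitions above.
 */
#ifdef EXAMPLE_CMA_TRACE_USAGE
static void example_cma_trace_usage(struct rdma_id_private *id_priv,
				    const struct ib_device *device)
{
	trace_cm_send_rtu(id_priv);		/* from DEFINE_CMA_FSM_EVENT(send_rtu) */
	trace_cm_id_attach(id_priv, device);	/* from TRACE_EVENT(cm_id_attach) */
}
#endif /* EXAMPLE_CMA_TRACE_USAGE */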
/* SPDX-License-Identifier: GPL-2.0 */

/*
 * This file provides wrappers with sanitizer instrumentation for non-atomic
 * bit operations.
 *
 * To use this functionality, an arch's bitops.h file needs to define each of
 * the below bit operations with an arch_ prefix (e.g. arch_set_bit(),
 * arch___set_bit(), etc.).
 */
#ifndef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
#define _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H

#include <linux/instrumented.h>

/**
 * ___set_bit - Set a bit in memory
 * @nr: the bit to set
 * @addr: the address to start counting from
 *
 * Unlike set_bit(), this function is non-atomic. If it is called on the same
 * region of memory concurrently, the effect may be that only one operation
 * succeeds.
 */
static __always_inline void
___set_bit(unsigned long nr, volatile unsigned long *addr)
{
	instrument_write(addr + BIT_WORD(nr), sizeof(long));
	arch___set_bit(nr, addr);
}

/**
 * ___clear_bit - Clears a bit in memory
 * @nr: the bit to clear
 * @addr: the address to start counting from
 *
 * Unlike clear_bit(), this function is non-atomic. If it is called on the same
 * region of memory concurrently, the effect may be that only one operation
 * succeeds.
 */
static __always_inline void
___clear_bit(unsigned long nr, volatile unsigned long *addr)
{
	instrument_write(addr + BIT_WORD(nr), sizeof(long));
	arch___clear_bit(nr, addr);
}

/**
 * ___change_bit - Toggle a bit in memory
 * @nr: the bit to change
 * @addr: the address to start counting from
 *
 * Unlike change_bit(), this function is non-atomic. If it is called on the same
 * region of memory concurrently, the effect may be that only one operation
 * succeeds.
 */
static __always_inline void
___change_bit(unsigned long nr, volatile unsigned long *addr)
{
	instrument_write(addr + BIT_WORD(nr), sizeof(long));
	arch___change_bit(nr, addr);
}

static __always_inline void __instrument_read_write_bitop(long nr, volatile unsigned long *addr)
{
	if (IS_ENABLED(CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC)) {
		/*
		 * We treat non-atomic read-write bitops a little more special.
		 * Given the operations here only modify a single bit, assuming
		 * non-atomicity of the writer is sufficient may be reasonable
		 * for certain usage (and follows the permissible nature of the
		 * assume-plain-writes-atomic rule):
		 * 1. report read-modify-write races -> check read;
		 * 2. do not report races with marked readers, but do report
		 *    races with unmarked readers -> check "atomic" write.
		 */
		kcsan_check_read(addr + BIT_WORD(nr), sizeof(long));
		/*
		 * Use generic write instrumentation, in case other sanitizers
		 * or tools are enabled alongside KCSAN.
		 */
		instrument_write(addr + BIT_WORD(nr), sizeof(long));
	} else {
		instrument_read_write(addr + BIT_WORD(nr), sizeof(long));
	}
}

/**
 * ___test_and_set_bit - Set a bit and return its old value
 * @nr: Bit to set
 * @addr: Address to count from
 *
 * This operation is non-atomic. If two instances of this operation race, one
 * can appear to succeed but actually fail.
*/ static __always_inline bool ___test_and_set_bit(unsigned long nr, volatile unsigned long *addr) { __instrument_read_write_bitop(nr, addr); return arch___test_and_set_bit(nr, addr); } /** * ___test_and_clear_bit - Clear a bit and return its old value * @nr: Bit to clear * @addr: Address to count from * * This operation is non-atomic. If two instances of this operation race, one * can appear to succeed but actually fail. */ static __always_inline bool ___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr) { __instrument_read_write_bitop(nr, addr); return arch___test_and_clear_bit(nr, addr); } /** * ___test_and_change_bit - Change a bit and return its old value * @nr: Bit to change * @addr: Address to count from * * This operation is non-atomic. If two instances of this operation race, one * can appear to succeed but actually fail. */ static __always_inline bool ___test_and_change_bit(unsigned long nr, volatile unsigned long *addr) { __instrument_read_write_bitop(nr, addr); return arch___test_and_change_bit(nr, addr); } /** * _test_bit - Determine whether a bit is set * @nr: bit number to test * @addr: Address to start counting from */ static __always_inline bool _test_bit(unsigned long nr, const volatile unsigned long *addr) { instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long)); return arch_test_bit(nr, addr); } /** * _test_bit_acquire - Determine, with acquire semantics, whether a bit is set * @nr: bit number to test * @addr: Address to start counting from */ static __always_inline bool _test_bit_acquire(unsigned long nr, const volatile unsigned long *addr) { instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long)); return arch_test_bit_acquire(nr, addr); } #endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H */ |
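
/*
 * Usage sketch (not part of the original header, kept under its own guard
 * so double inclusion stays harmless): the non-atomic variants are meant
 * for bitmaps already serialized by a lock or owned by a single context,
 * where the instrumentation above still lets KCSAN/KASAN check the
 * accesses. "example_claim_slot" is a hypothetical name.
 */
#ifndef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_EXAMPLE
#define _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_EXAMPLE

static __always_inline bool example_claim_slot(unsigned long *bitmap,
					       unsigned long nr)
{
	/* Caller must hold the lock protecting @bitmap: non-atomic RMW. */
	return !___test_and_set_bit(nr, bitmap);
}

#endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_EXAMPLE */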
// SPDX-License-Identifier: GPL-2.0-or-later /* * Copyright (c) 2016 Mellanox Technologies. All rights reserved. * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com> */ #include "devl_internal.h" #define DEVLINK_PORT_FN_CAPS_VALID_MASK \ (_BITUL(__DEVLINK_PORT_FN_ATTR_CAPS_MAX) - 1) static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1] = { [DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] = { .type = NLA_BINARY }, [DEVLINK_PORT_FN_ATTR_STATE] = NLA_POLICY_RANGE(NLA_U8, DEVLINK_PORT_FN_STATE_INACTIVE, DEVLINK_PORT_FN_STATE_ACTIVE), [DEVLINK_PORT_FN_ATTR_CAPS] = NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK), [DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] = { .type = NLA_U32 }, }; #define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port) \ WARN_ON_ONCE(!(devlink_port)->registered) #define ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port) \ WARN_ON_ONCE((devlink_port)->registered) struct devlink_port *devlink_port_get_by_index(struct devlink *devlink, unsigned int port_index) { return xa_load(&devlink->ports, port_index); } struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) { if (attrs[DEVLINK_ATTR_PORT_INDEX]) { u32 port_index = nla_get_u32(attrs[DEVLINK_ATTR_PORT_INDEX]); struct devlink_port *devlink_port; devlink_port = devlink_port_get_by_index(devlink, port_index); if (!devlink_port) return ERR_PTR(-ENODEV); return devlink_port; } return ERR_PTR(-EINVAL); } struct devlink_port *devlink_port_get_from_info(struct devlink *devlink, struct genl_info *info) { return devlink_port_get_from_attrs(devlink, info->attrs); } static void devlink_port_fn_cap_fill(struct nla_bitfield32 *caps, u32 cap, bool is_enable) { caps->selector |= cap; if (is_enable) caps->value |= cap; } static int devlink_port_fn_roce_fill(struct devlink_port *devlink_port, struct nla_bitfield32 *caps, struct netlink_ext_ack *extack) { bool is_enable; int err; if (!devlink_port->ops->port_fn_roce_get) return 0; err = devlink_port->ops->port_fn_roce_get(devlink_port, &is_enable, extack); if (err) { if (err == -EOPNOTSUPP) return 0; return err; } devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_ROCE, is_enable); return 0; } static int devlink_port_fn_migratable_fill(struct devlink_port *devlink_port, struct nla_bitfield32 *caps, struct netlink_ext_ack *extack) { bool is_enable; int err; if (!devlink_port->ops->port_fn_migratable_get || devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) return 0; err = devlink_port->ops->port_fn_migratable_get(devlink_port, &is_enable, extack); if (err) { if (err == -EOPNOTSUPP) return 0; return err; } devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_MIGRATABLE, is_enable); return 0; } static int devlink_port_fn_ipsec_crypto_fill(struct devlink_port *devlink_port, struct nla_bitfield32 *caps, struct netlink_ext_ack *extack) { bool is_enable; int err; if (!devlink_port->ops->port_fn_ipsec_crypto_get || devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) return 0; err = devlink_port->ops->port_fn_ipsec_crypto_get(devlink_port, &is_enable, extack); if (err) { if (err == -EOPNOTSUPP) return 0; return err; } devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_IPSEC_CRYPTO, is_enable); return 0; } static int devlink_port_fn_ipsec_packet_fill(struct devlink_port *devlink_port, struct nla_bitfield32 *caps, struct netlink_ext_ack *extack) { bool is_enable; int err; if
(!devlink_port->ops->port_fn_ipsec_packet_get || devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) return 0; err = devlink_port->ops->port_fn_ipsec_packet_get(devlink_port, &is_enable, extack); if (err) { if (err == -EOPNOTSUPP) return 0; return err; } devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_IPSEC_PACKET, is_enable); return 0; } static int devlink_port_fn_caps_fill(struct devlink_port *devlink_port, struct sk_buff *msg, struct netlink_ext_ack *extack, bool *msg_updated) { struct nla_bitfield32 caps = {}; int err; err = devlink_port_fn_roce_fill(devlink_port, &caps, extack); if (err) return err; err = devlink_port_fn_migratable_fill(devlink_port, &caps, extack); if (err) return err; err = devlink_port_fn_ipsec_crypto_fill(devlink_port, &caps, extack); if (err) return err; err = devlink_port_fn_ipsec_packet_fill(devlink_port, &caps, extack); if (err) return err; if (!caps.selector) return 0; err = nla_put_bitfield32(msg, DEVLINK_PORT_FN_ATTR_CAPS, caps.value, caps.selector); if (err) return err; *msg_updated = true; return 0; } static int devlink_port_fn_max_io_eqs_fill(struct devlink_port *port, struct sk_buff *msg, struct netlink_ext_ack *extack, bool *msg_updated) { u32 max_io_eqs; int err; if (!port->ops->port_fn_max_io_eqs_get) return 0; err = port->ops->port_fn_max_io_eqs_get(port, &max_io_eqs, extack); if (err) { if (err == -EOPNOTSUPP) return 0; return err; } err = nla_put_u32(msg, DEVLINK_PORT_FN_ATTR_MAX_IO_EQS, max_io_eqs); if (err) return err; *msg_updated = true; return 0; } int devlink_nl_port_handle_fill(struct sk_buff *msg, struct devlink_port *devlink_port) { if (devlink_nl_put_handle(msg, devlink_port->devlink)) return -EMSGSIZE; if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) return -EMSGSIZE; return 0; } size_t devlink_nl_port_handle_size(struct devlink_port *devlink_port) { struct devlink *devlink = devlink_port->devlink; return nla_total_size(strlen(devlink->dev->bus->name) + 1) /* DEVLINK_ATTR_BUS_NAME */ + nla_total_size(strlen(dev_name(devlink->dev)) + 1) /* DEVLINK_ATTR_DEV_NAME */ + nla_total_size(4); /* DEVLINK_ATTR_PORT_INDEX */ } static int devlink_nl_port_attrs_put(struct sk_buff *msg, struct devlink_port *devlink_port) { struct devlink_port_attrs *attrs = &devlink_port->attrs; if (!devlink_port->attrs_set) return 0; if (attrs->lanes) { if (nla_put_u32(msg, DEVLINK_ATTR_PORT_LANES, attrs->lanes)) return -EMSGSIZE; } if (nla_put_u8(msg, DEVLINK_ATTR_PORT_SPLITTABLE, attrs->splittable)) return -EMSGSIZE; if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour)) return -EMSGSIZE; switch (devlink_port->attrs.flavour) { case DEVLINK_PORT_FLAVOUR_PCI_PF: if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, attrs->pci_pf.controller) || nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_pf.pf)) return -EMSGSIZE; if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_pf.external)) return -EMSGSIZE; break; case DEVLINK_PORT_FLAVOUR_PCI_VF: if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, attrs->pci_vf.controller) || nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_vf.pf) || nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER, attrs->pci_vf.vf)) return -EMSGSIZE; if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external)) return -EMSGSIZE; break; case DEVLINK_PORT_FLAVOUR_PCI_SF: if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, attrs->pci_sf.controller) || nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_sf.pf) || nla_put_u32(msg, 
DEVLINK_ATTR_PORT_PCI_SF_NUMBER, attrs->pci_sf.sf)) return -EMSGSIZE; break; case DEVLINK_PORT_FLAVOUR_PHYSICAL: case DEVLINK_PORT_FLAVOUR_CPU: case DEVLINK_PORT_FLAVOUR_DSA: if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER, attrs->phys.port_number)) return -EMSGSIZE; if (!attrs->split) return 0; if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP, attrs->phys.port_number)) return -EMSGSIZE; if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER, attrs->phys.split_subport_number)) return -EMSGSIZE; break; default: break; } return 0; } static int devlink_port_fn_hw_addr_fill(struct devlink_port *port, struct sk_buff *msg, struct netlink_ext_ack *extack, bool *msg_updated) { u8 hw_addr[MAX_ADDR_LEN]; int hw_addr_len; int err; if (!port->ops->port_fn_hw_addr_get) return 0; err = port->ops->port_fn_hw_addr_get(port, hw_addr, &hw_addr_len, extack); if (err) { if (err == -EOPNOTSUPP) return 0; return err; } err = nla_put(msg, DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, hw_addr_len, hw_addr); if (err) return err; *msg_updated = true; return 0; } static bool devlink_port_fn_state_valid(enum devlink_port_fn_state state) { return state == DEVLINK_PORT_FN_STATE_INACTIVE || state == DEVLINK_PORT_FN_STATE_ACTIVE; } static bool devlink_port_fn_opstate_valid(enum devlink_port_fn_opstate opstate) { return opstate == DEVLINK_PORT_FN_OPSTATE_DETACHED || opstate == DEVLINK_PORT_FN_OPSTATE_ATTACHED; } static int devlink_port_fn_state_fill(struct devlink_port *port, struct sk_buff *msg, struct netlink_ext_ack *extack, bool *msg_updated) { enum devlink_port_fn_opstate opstate; enum devlink_port_fn_state state; int err; if (!port->ops->port_fn_state_get) return 0; err = port->ops->port_fn_state_get(port, &state, &opstate, extack); if (err) { if (err == -EOPNOTSUPP) return 0; return err; } if (!devlink_port_fn_state_valid(state)) { WARN_ON_ONCE(1); NL_SET_ERR_MSG(extack, "Invalid state read from driver"); return -EINVAL; } if (!devlink_port_fn_opstate_valid(opstate)) { WARN_ON_ONCE(1); NL_SET_ERR_MSG(extack, "Invalid operational state read from driver"); return -EINVAL; } if (nla_put_u8(msg, DEVLINK_PORT_FN_ATTR_STATE, state) || nla_put_u8(msg, DEVLINK_PORT_FN_ATTR_OPSTATE, opstate)) return -EMSGSIZE; *msg_updated = true; return 0; } static int devlink_port_fn_mig_set(struct devlink_port *devlink_port, bool enable, struct netlink_ext_ack *extack) { return devlink_port->ops->port_fn_migratable_set(devlink_port, enable, extack); } static int devlink_port_fn_roce_set(struct devlink_port *devlink_port, bool enable, struct netlink_ext_ack *extack) { return devlink_port->ops->port_fn_roce_set(devlink_port, enable, extack); } static int devlink_port_fn_ipsec_crypto_set(struct devlink_port *devlink_port, bool enable, struct netlink_ext_ack *extack) { return devlink_port->ops->port_fn_ipsec_crypto_set(devlink_port, enable, extack); } static int devlink_port_fn_ipsec_packet_set(struct devlink_port *devlink_port, bool enable, struct netlink_ext_ack *extack) { return devlink_port->ops->port_fn_ipsec_packet_set(devlink_port, enable, extack); } static int devlink_port_fn_caps_set(struct devlink_port *devlink_port, const struct nlattr *attr, struct netlink_ext_ack *extack) { struct nla_bitfield32 caps; u32 caps_value; int err; caps = nla_get_bitfield32(attr); caps_value = caps.value & caps.selector; if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE) { err = devlink_port_fn_roce_set(devlink_port, caps_value & DEVLINK_PORT_FN_CAP_ROCE, extack); if (err) return err; } if (caps.selector & DEVLINK_PORT_FN_CAP_MIGRATABLE) { err = 
devlink_port_fn_mig_set(devlink_port, caps_value & DEVLINK_PORT_FN_CAP_MIGRATABLE, extack); if (err) return err; } if (caps.selector & DEVLINK_PORT_FN_CAP_IPSEC_CRYPTO) { err = devlink_port_fn_ipsec_crypto_set(devlink_port, caps_value & DEVLINK_PORT_FN_CAP_IPSEC_CRYPTO, extack); if (err) return err; } if (caps.selector & DEVLINK_PORT_FN_CAP_IPSEC_PACKET) { err = devlink_port_fn_ipsec_packet_set(devlink_port, caps_value & DEVLINK_PORT_FN_CAP_IPSEC_PACKET, extack); if (err) return err; } return 0; } static int devlink_port_fn_max_io_eqs_set(struct devlink_port *devlink_port, const struct nlattr *attr, struct netlink_ext_ack *extack) { u32 max_io_eqs; max_io_eqs = nla_get_u32(attr); return devlink_port->ops->port_fn_max_io_eqs_set(devlink_port, max_io_eqs, extack); } static int devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port, struct netlink_ext_ack *extack) { struct nlattr *function_attr; bool msg_updated = false; int err; function_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_PORT_FUNCTION); if (!function_attr) return -EMSGSIZE; err = devlink_port_fn_hw_addr_fill(port, msg, extack, &msg_updated); if (err) goto out; err = devlink_port_fn_caps_fill(port, msg, extack, &msg_updated); if (err) goto out; err = devlink_port_fn_state_fill(port, msg, extack, &msg_updated); if (err) goto out; err = devlink_port_fn_max_io_eqs_fill(port, msg, extack, &msg_updated); if (err) goto out; err = devlink_rel_devlink_handle_put(msg, port->devlink, port->rel_index, DEVLINK_PORT_FN_ATTR_DEVLINK, &msg_updated); out: if (err || !msg_updated) nla_nest_cancel(msg, function_attr); else nla_nest_end(msg, function_attr); return err; } static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink_port *devlink_port, enum devlink_command cmd, u32 portid, u32 seq, int flags, struct netlink_ext_ack *extack) { struct devlink *devlink = devlink_port->devlink; void *hdr; hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); if (!hdr) return -EMSGSIZE; if (devlink_nl_put_handle(msg, devlink)) goto nla_put_failure; if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) goto nla_put_failure; spin_lock_bh(&devlink_port->type_lock); if (nla_put_u16(msg, DEVLINK_ATTR_PORT_TYPE, devlink_port->type)) goto nla_put_failure_type_locked; if (devlink_port->desired_type != DEVLINK_PORT_TYPE_NOTSET && nla_put_u16(msg, DEVLINK_ATTR_PORT_DESIRED_TYPE, devlink_port->desired_type)) goto nla_put_failure_type_locked; if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) { if (devlink_port->type_eth.netdev && (nla_put_u32(msg, DEVLINK_ATTR_PORT_NETDEV_IFINDEX, devlink_port->type_eth.ifindex) || nla_put_string(msg, DEVLINK_ATTR_PORT_NETDEV_NAME, devlink_port->type_eth.ifname))) goto nla_put_failure_type_locked; } if (devlink_port->type == DEVLINK_PORT_TYPE_IB) { struct ib_device *ibdev = devlink_port->type_ib.ibdev; if (ibdev && nla_put_string(msg, DEVLINK_ATTR_PORT_IBDEV_NAME, ibdev->name)) goto nla_put_failure_type_locked; } spin_unlock_bh(&devlink_port->type_lock); if (devlink_nl_port_attrs_put(msg, devlink_port)) goto nla_put_failure; if (devlink_nl_port_function_attrs_put(msg, devlink_port, extack)) goto nla_put_failure; if (devlink_port->linecard && nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, devlink_linecard_index(devlink_port->linecard))) goto nla_put_failure; genlmsg_end(msg, hdr); return 0; nla_put_failure_type_locked: spin_unlock_bh(&devlink_port->type_lock); nla_put_failure: genlmsg_cancel(msg, hdr); return -EMSGSIZE; } static void devlink_port_notify(struct devlink_port 
*devlink_port, enum devlink_command cmd) { struct devlink *devlink = devlink_port->devlink; struct devlink_obj_desc desc; struct sk_buff *msg; int err; WARN_ON(cmd != DEVLINK_CMD_PORT_NEW && cmd != DEVLINK_CMD_PORT_DEL); if (!__devl_is_registered(devlink) || !devlink_nl_notify_need(devlink)) return; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return; err = devlink_nl_port_fill(msg, devlink_port, cmd, 0, 0, 0, NULL); if (err) { nlmsg_free(msg); return; } devlink_nl_obj_desc_init(&desc, devlink); devlink_nl_obj_desc_port_set(&desc, devlink_port); devlink_nl_notify_send_desc(devlink, msg, &desc); } static void devlink_ports_notify(struct devlink *devlink, enum devlink_command cmd) { struct devlink_port *devlink_port; unsigned long port_index; xa_for_each(&devlink->ports, port_index, devlink_port) devlink_port_notify(devlink_port, cmd); } void devlink_ports_notify_register(struct devlink *devlink) { devlink_ports_notify(devlink, DEVLINK_CMD_PORT_NEW); } void devlink_ports_notify_unregister(struct devlink *devlink) { devlink_ports_notify(devlink, DEVLINK_CMD_PORT_DEL); } int devlink_nl_port_get_doit(struct sk_buff *skb, struct genl_info *info) { struct devlink_port *devlink_port = info->user_ptr[1]; struct sk_buff *msg; int err; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_PORT_NEW, info->snd_portid, info->snd_seq, 0, info->extack); if (err) { nlmsg_free(msg); return err; } return genlmsg_reply(msg, info); } static int devlink_nl_port_get_dump_one(struct sk_buff *msg, struct devlink *devlink, struct netlink_callback *cb, int flags) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_port *devlink_port; unsigned long port_index; int err = 0; xa_for_each_start(&devlink->ports, port_index, devlink_port, state->idx) { err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_PORT_NEW, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, flags, cb->extack); if (err) { state->idx = port_index; break; } } return err; } int devlink_nl_port_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { return devlink_nl_dumpit(skb, cb, devlink_nl_port_get_dump_one); } static int devlink_port_type_set(struct devlink_port *devlink_port, enum devlink_port_type port_type) { int err; if (!devlink_port->ops->port_type_set) return -EOPNOTSUPP; if (port_type == devlink_port->type) return 0; err = devlink_port->ops->port_type_set(devlink_port, port_type); if (err) return err; devlink_port->desired_type = port_type; devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); return 0; } static int devlink_port_function_hw_addr_set(struct devlink_port *port, const struct nlattr *attr, struct netlink_ext_ack *extack) { const u8 *hw_addr; int hw_addr_len; hw_addr = nla_data(attr); hw_addr_len = nla_len(attr); if (hw_addr_len > MAX_ADDR_LEN) { NL_SET_ERR_MSG(extack, "Port function hardware address too long"); return -EINVAL; } if (port->type == DEVLINK_PORT_TYPE_ETH) { if (hw_addr_len != ETH_ALEN) { NL_SET_ERR_MSG(extack, "Address must be 6 bytes for Ethernet device"); return -EINVAL; } if (!is_unicast_ether_addr(hw_addr)) { NL_SET_ERR_MSG(extack, "Non-unicast hardware address unsupported"); return -EINVAL; } } return port->ops->port_fn_hw_addr_set(port, hw_addr, hw_addr_len, extack); } static int devlink_port_fn_state_set(struct devlink_port *port, const struct nlattr *attr, struct netlink_ext_ack *extack) { enum devlink_port_fn_state state; state = nla_get_u8(attr); return 
port->ops->port_fn_state_set(port, state, extack); } static int devlink_port_function_validate(struct devlink_port *devlink_port, struct nlattr **tb, struct netlink_ext_ack *extack) { const struct devlink_port_ops *ops = devlink_port->ops; struct nlattr *attr; if (tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] && !ops->port_fn_hw_addr_set) { NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR], "Port doesn't support function attributes"); return -EOPNOTSUPP; } if (tb[DEVLINK_PORT_FN_ATTR_STATE] && !ops->port_fn_state_set) { NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FN_ATTR_STATE], "Function does not support state setting"); return -EOPNOTSUPP; } attr = tb[DEVLINK_PORT_FN_ATTR_CAPS]; if (attr) { struct nla_bitfield32 caps; caps = nla_get_bitfield32(attr); if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE && !ops->port_fn_roce_set) { NL_SET_ERR_MSG_ATTR(extack, attr, "Port doesn't support RoCE function attribute"); return -EOPNOTSUPP; } if (caps.selector & DEVLINK_PORT_FN_CAP_MIGRATABLE) { if (!ops->port_fn_migratable_set) { NL_SET_ERR_MSG_ATTR(extack, attr, "Port doesn't support migratable function attribute"); return -EOPNOTSUPP; } if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) { NL_SET_ERR_MSG_ATTR(extack, attr, "migratable function attribute supported for VFs only"); return -EOPNOTSUPP; } } if (caps.selector & DEVLINK_PORT_FN_CAP_IPSEC_CRYPTO) { if (!ops->port_fn_ipsec_crypto_set) { NL_SET_ERR_MSG_ATTR(extack, attr, "Port doesn't support ipsec_crypto function attribute"); return -EOPNOTSUPP; } if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) { NL_SET_ERR_MSG_ATTR(extack, attr, "ipsec_crypto function attribute supported for VFs only"); return -EOPNOTSUPP; } } if (caps.selector & DEVLINK_PORT_FN_CAP_IPSEC_PACKET) { if (!ops->port_fn_ipsec_packet_set) { NL_SET_ERR_MSG_ATTR(extack, attr, "Port doesn't support ipsec_packet function attribute"); return -EOPNOTSUPP; } if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) { NL_SET_ERR_MSG_ATTR(extack, attr, "ipsec_packet function attribute supported for VFs only"); return -EOPNOTSUPP; } } } if (tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] && !ops->port_fn_max_io_eqs_set) { NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS], "Function does not support max_io_eqs setting"); return -EOPNOTSUPP; } return 0; } static int devlink_port_function_set(struct devlink_port *port, const struct nlattr *attr, struct netlink_ext_ack *extack) { struct nlattr *tb[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1]; int err; err = nla_parse_nested(tb, DEVLINK_PORT_FUNCTION_ATTR_MAX, attr, devlink_function_nl_policy, extack); if (err < 0) { NL_SET_ERR_MSG(extack, "Failed to parse port function attributes"); return err; } err = devlink_port_function_validate(port, tb, extack); if (err) return err; attr = tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR]; if (attr) { err = devlink_port_function_hw_addr_set(port, attr, extack); if (err) return err; } attr = tb[DEVLINK_PORT_FN_ATTR_CAPS]; if (attr) { err = devlink_port_fn_caps_set(port, attr, extack); if (err) return err; } attr = tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS]; if (attr) { err = devlink_port_fn_max_io_eqs_set(port, attr, extack); if (err) return err; } /* Keep this as the last function attribute set, so that when * multiple port function attributes are set along with state, * those can be applied first before activating the state.
*/ attr = tb[DEVLINK_PORT_FN_ATTR_STATE]; if (attr) err = devlink_port_fn_state_set(port, attr, extack); if (!err) devlink_port_notify(port, DEVLINK_CMD_PORT_NEW); return err; } int devlink_nl_port_set_doit(struct sk_buff *skb, struct genl_info *info) { struct devlink_port *devlink_port = info->user_ptr[1]; int err; if (info->attrs[DEVLINK_ATTR_PORT_TYPE]) { enum devlink_port_type port_type; port_type = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_TYPE]); err = devlink_port_type_set(devlink_port, port_type); if (err) return err; } if (info->attrs[DEVLINK_ATTR_PORT_FUNCTION]) { struct nlattr *attr = info->attrs[DEVLINK_ATTR_PORT_FUNCTION]; struct netlink_ext_ack *extack = info->extack; err = devlink_port_function_set(devlink_port, attr, extack); if (err) return err; } return 0; } int devlink_nl_port_split_doit(struct sk_buff *skb, struct genl_info *info) { struct devlink_port *devlink_port = info->user_ptr[1]; struct devlink *devlink = info->user_ptr[0]; u32 count; if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PORT_SPLIT_COUNT)) return -EINVAL; if (!devlink_port->ops->port_split) return -EOPNOTSUPP; count = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_SPLIT_COUNT]); if (!devlink_port->attrs.splittable) { /* Split ports cannot be split. */ if (devlink_port->attrs.split) NL_SET_ERR_MSG(info->extack, "Port cannot be split further"); else NL_SET_ERR_MSG(info->extack, "Port cannot be split"); return -EINVAL; } if (count < 2 || !is_power_of_2(count) || count > devlink_port->attrs.lanes) { NL_SET_ERR_MSG(info->extack, "Invalid split count"); return -EINVAL; } return devlink_port->ops->port_split(devlink, devlink_port, count, info->extack); } int devlink_nl_port_unsplit_doit(struct sk_buff *skb, struct genl_info *info) { struct devlink_port *devlink_port = info->user_ptr[1]; struct devlink *devlink = info->user_ptr[0]; if (!devlink_port->ops->port_unsplit) return -EOPNOTSUPP; return devlink_port->ops->port_unsplit(devlink, devlink_port, info->extack); } int devlink_nl_port_new_doit(struct sk_buff *skb, struct genl_info *info) { struct netlink_ext_ack *extack = info->extack; struct devlink_port_new_attrs new_attrs = {}; struct devlink *devlink = info->user_ptr[0]; struct devlink_port *devlink_port; struct sk_buff *msg; int err; if (!devlink->ops->port_new) return -EOPNOTSUPP; if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] || !info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]) { NL_SET_ERR_MSG(extack, "Port flavour or PCI PF are not specified"); return -EINVAL; } new_attrs.flavour = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_FLAVOUR]); new_attrs.pfnum = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]); if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { /* Port index of the new port being created by driver. 
*/ new_attrs.port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); new_attrs.port_index_valid = true; } if (info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]) { new_attrs.controller = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]); new_attrs.controller_valid = true; } if (new_attrs.flavour == DEVLINK_PORT_FLAVOUR_PCI_SF && info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]) { new_attrs.sfnum = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]); new_attrs.sfnum_valid = true; } err = devlink->ops->port_new(devlink, &new_attrs, extack, &devlink_port); if (err) return err; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) { err = -ENOMEM; goto err_out_port_del; } err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_PORT_NEW, info->snd_portid, info->snd_seq, 0, NULL); if (WARN_ON_ONCE(err)) goto err_out_msg_free; err = genlmsg_reply(msg, info); if (err) goto err_out_port_del; return 0; err_out_msg_free: nlmsg_free(msg); err_out_port_del: devlink_port->ops->port_del(devlink, devlink_port, NULL); return err; } int devlink_nl_port_del_doit(struct sk_buff *skb, struct genl_info *info) { struct devlink_port *devlink_port = info->user_ptr[1]; struct netlink_ext_ack *extack = info->extack; struct devlink *devlink = info->user_ptr[0]; if (!devlink_port->ops->port_del) return -EOPNOTSUPP; return devlink_port->ops->port_del(devlink, devlink_port, extack); } static void devlink_port_type_warn(struct work_struct *work) { struct devlink_port *port = container_of(to_delayed_work(work), struct devlink_port, type_warn_dw); dev_warn(port->devlink->dev, "Type was not set for devlink port."); } static bool devlink_port_type_should_warn(struct devlink_port *devlink_port) { /* Ignore CPU and DSA flavours. */ return devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_CPU && devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_DSA && devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_UNUSED; } #define DEVLINK_PORT_TYPE_WARN_TIMEOUT (HZ * 3600) static void devlink_port_type_warn_schedule(struct devlink_port *devlink_port) { if (!devlink_port_type_should_warn(devlink_port)) return; /* Schedule a work to WARN in case driver does not set port * type within timeout. */ schedule_delayed_work(&devlink_port->type_warn_dw, DEVLINK_PORT_TYPE_WARN_TIMEOUT); } static void devlink_port_type_warn_cancel(struct devlink_port *devlink_port) { if (!devlink_port_type_should_warn(devlink_port)) return; cancel_delayed_work_sync(&devlink_port->type_warn_dw); } /** * devlink_port_init() - Init devlink port * * @devlink: devlink * @devlink_port: devlink port * * Initialize essential stuff that is needed for functions * that may be called before devlink port registration. * Call to this function is optional and not needed * in case the driver does not use such functions. */ void devlink_port_init(struct devlink *devlink, struct devlink_port *devlink_port) { if (devlink_port->initialized) return; devlink_port->devlink = devlink; INIT_LIST_HEAD(&devlink_port->region_list); devlink_port->initialized = true; } EXPORT_SYMBOL_GPL(devlink_port_init); /** * devlink_port_fini() - Deinitialize devlink port * * @devlink_port: devlink port * * Deinitialize essential stuff that is in use for functions * that may be called after devlink port unregistration. * Call to this function is optional and not needed * in case the driver does not use such functions. 
*/ void devlink_port_fini(struct devlink_port *devlink_port) { WARN_ON(!list_empty(&devlink_port->region_list)); } EXPORT_SYMBOL_GPL(devlink_port_fini); static const struct devlink_port_ops devlink_port_dummy_ops = {}; /** * devl_port_register_with_ops() - Register devlink port * * @devlink: devlink * @devlink_port: devlink port * @port_index: driver-specific numerical identifier of the port * @ops: port ops * * Register devlink port with provided port index. User can use * any indexing, even hw-related one. devlink_port structure * is convenient to be embedded inside user driver private structure. * Note that the caller should take care of zeroing the devlink_port * structure. */ int devl_port_register_with_ops(struct devlink *devlink, struct devlink_port *devlink_port, unsigned int port_index, const struct devlink_port_ops *ops) { int err; devl_assert_locked(devlink); ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); devlink_port_init(devlink, devlink_port); devlink_port->registered = true; devlink_port->index = port_index; devlink_port->ops = ops ? ops : &devlink_port_dummy_ops; spin_lock_init(&devlink_port->type_lock); INIT_LIST_HEAD(&devlink_port->reporter_list); err = xa_insert(&devlink->ports, port_index, devlink_port, GFP_KERNEL); if (err) { devlink_port->registered = false; return err; } INIT_DELAYED_WORK(&devlink_port->type_warn_dw, &devlink_port_type_warn); devlink_port_type_warn_schedule(devlink_port); devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); return 0; } EXPORT_SYMBOL_GPL(devl_port_register_with_ops); /** * devlink_port_register_with_ops - Register devlink port * * @devlink: devlink * @devlink_port: devlink port * @port_index: driver-specific numerical identifier of the port * @ops: port ops * * Register devlink port with provided port index. User can use * any indexing, even hw-related one. devlink_port structure * is convenient to be embedded inside user driver private structure. * Note that the caller should take care of zeroing the devlink_port * structure. * * Context: Takes and releases devlink->lock <mutex>. */ int devlink_port_register_with_ops(struct devlink *devlink, struct devlink_port *devlink_port, unsigned int port_index, const struct devlink_port_ops *ops) { int err; devl_lock(devlink); err = devl_port_register_with_ops(devlink, devlink_port, port_index, ops); devl_unlock(devlink); return err; } EXPORT_SYMBOL_GPL(devlink_port_register_with_ops); /** * devl_port_unregister() - Unregister devlink port * * @devlink_port: devlink port */ void devl_port_unregister(struct devlink_port *devlink_port) { lockdep_assert_held(&devlink_port->devlink->lock); WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET); devlink_port_type_warn_cancel(devlink_port); devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL); xa_erase(&devlink_port->devlink->ports, devlink_port->index); WARN_ON(!list_empty(&devlink_port->reporter_list)); devlink_port->registered = false; } EXPORT_SYMBOL_GPL(devl_port_unregister); /** * devlink_port_unregister - Unregister devlink port * * @devlink_port: devlink port * * Context: Takes and releases devlink->lock <mutex>.
*/ void devlink_port_unregister(struct devlink_port *devlink_port) { struct devlink *devlink = devlink_port->devlink; devl_lock(devlink); devl_port_unregister(devlink_port); devl_unlock(devlink); } EXPORT_SYMBOL_GPL(devlink_port_unregister); static void devlink_port_type_netdev_checks(struct devlink_port *devlink_port, struct net_device *netdev) { const struct net_device_ops *ops = netdev->netdev_ops; /* If driver registers devlink port, it should set devlink port * attributes accordingly so the compat functions are called * and the original ops are not used. */ if (ops->ndo_get_phys_port_name) { /* Some drivers use the same set of ndos for netdevs * that have devlink_port registered and also for * those who don't. Make sure that ndo_get_phys_port_name * returns -EOPNOTSUPP here in case it is defined. * Warn if not. */ char name[IFNAMSIZ]; int err; err = ops->ndo_get_phys_port_name(netdev, name, sizeof(name)); WARN_ON(err != -EOPNOTSUPP); } if (ops->ndo_get_port_parent_id) { /* Some drivers use the same set of ndos for netdevs * that have devlink_port registered and also for * those who don't. Make sure that ndo_get_port_parent_id * returns -EOPNOTSUPP here in case it is defined. * Warn if not. */ struct netdev_phys_item_id ppid; int err; err = ops->ndo_get_port_parent_id(netdev, &ppid); WARN_ON(err != -EOPNOTSUPP); } } static void __devlink_port_type_set(struct devlink_port *devlink_port, enum devlink_port_type type, void *type_dev) { struct net_device *netdev = type_dev; ASSERT_DEVLINK_PORT_REGISTERED(devlink_port); if (type == DEVLINK_PORT_TYPE_NOTSET) { devlink_port_type_warn_schedule(devlink_port); } else { devlink_port_type_warn_cancel(devlink_port); if (type == DEVLINK_PORT_TYPE_ETH && netdev) devlink_port_type_netdev_checks(devlink_port, netdev); } spin_lock_bh(&devlink_port->type_lock); devlink_port->type = type; switch (type) { case DEVLINK_PORT_TYPE_ETH: devlink_port->type_eth.netdev = netdev; if (netdev) { ASSERT_RTNL(); devlink_port->type_eth.ifindex = netdev->ifindex; BUILD_BUG_ON(sizeof(devlink_port->type_eth.ifname) != sizeof(netdev->name)); strcpy(devlink_port->type_eth.ifname, netdev->name); } break; case DEVLINK_PORT_TYPE_IB: devlink_port->type_ib.ibdev = type_dev; break; default: break; } spin_unlock_bh(&devlink_port->type_lock); devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); } /** * devlink_port_type_eth_set - Set port type to Ethernet * * @devlink_port: devlink port * * If driver is calling this, most likely it is doing something wrong. */ void devlink_port_type_eth_set(struct devlink_port *devlink_port) { dev_warn(devlink_port->devlink->dev, "devlink port type for port %d set to Ethernet without a software interface reference, device type not supported by the kernel?\n", devlink_port->index); __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_ETH, NULL); } EXPORT_SYMBOL_GPL(devlink_port_type_eth_set); /** * devlink_port_type_ib_set - Set port type to InfiniBand * * @devlink_port: devlink port * @ibdev: related IB device */ void devlink_port_type_ib_set(struct devlink_port *devlink_port, struct ib_device *ibdev) { __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_IB, ibdev); } EXPORT_SYMBOL_GPL(devlink_port_type_ib_set); /** * devlink_port_type_clear - Clear port type * * @devlink_port: devlink port * * If driver is calling this for clearing Ethernet type, most likely * it is doing something wrong. 
*/ void devlink_port_type_clear(struct devlink_port *devlink_port) { if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) dev_warn(devlink_port->devlink->dev, "devlink port type for port %d cleared without a software interface reference, device type not supported by the kernel?\n", devlink_port->index); __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_NOTSET, NULL); } EXPORT_SYMBOL_GPL(devlink_port_type_clear); int devlink_port_netdevice_event(struct notifier_block *nb, unsigned long event, void *ptr) { struct net_device *netdev = netdev_notifier_info_to_dev(ptr); struct devlink_port *devlink_port = netdev->devlink_port; struct devlink *devlink; if (!devlink_port) return NOTIFY_OK; devlink = devlink_port->devlink; switch (event) { case NETDEV_POST_INIT: /* Set the type but not netdev pointer. It is going to be set * later on by NETDEV_REGISTER event. Happens once during * netdevice register. */ __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_ETH, NULL); break; case NETDEV_REGISTER: case NETDEV_CHANGENAME: if (devlink_net(devlink) != dev_net(netdev)) return NOTIFY_OK; /* Set the netdev on top of previously set type. Note this * event happens also during net namespace change so here * we take into account netdev pointer appearing in this * namespace. */ __devlink_port_type_set(devlink_port, devlink_port->type, netdev); break; case NETDEV_UNREGISTER: if (devlink_net(devlink) != dev_net(netdev)) return NOTIFY_OK; /* Clear netdev pointer, but not the type. This event happens * also during net namespace change so we need to clear * pointer to netdev that is going to another net namespace. */ __devlink_port_type_set(devlink_port, devlink_port->type, NULL); break; case NETDEV_PRE_UNINIT: /* Clear the type and the netdev pointer. Happens once during * netdevice unregister.
*/ __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_NOTSET, NULL); break; } return NOTIFY_OK; } static int __devlink_port_attrs_set(struct devlink_port *devlink_port, enum devlink_port_flavour flavour) { struct devlink_port_attrs *attrs = &devlink_port->attrs; devlink_port->attrs_set = true; attrs->flavour = flavour; if (attrs->switch_id.id_len) { devlink_port->switch_port = true; if (WARN_ON(attrs->switch_id.id_len > MAX_PHYS_ITEM_ID_LEN)) attrs->switch_id.id_len = MAX_PHYS_ITEM_ID_LEN; } else { devlink_port->switch_port = false; } return 0; } /** * devlink_port_attrs_set - Set port attributes * * @devlink_port: devlink port * @attrs: devlink port attrs */ void devlink_port_attrs_set(struct devlink_port *devlink_port, struct devlink_port_attrs *attrs) { int ret; ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); devlink_port->attrs = *attrs; ret = __devlink_port_attrs_set(devlink_port, attrs->flavour); if (ret) return; WARN_ON(attrs->splittable && attrs->split); } EXPORT_SYMBOL_GPL(devlink_port_attrs_set); /** * devlink_port_attrs_pci_pf_set - Set PCI PF port attributes * * @devlink_port: devlink port * @controller: associated controller number for the devlink port instance * @pf: associated PCI function number for the devlink port instance * @external: indicates if the port is for an external controller */ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 controller, u16 pf, bool external) { struct devlink_port_attrs *attrs = &devlink_port->attrs; int ret; ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); ret = __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_PF); if (ret) return; attrs->pci_pf.controller = controller; attrs->pci_pf.pf = pf; attrs->pci_pf.external = external; } EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set); /** * devlink_port_attrs_pci_vf_set - Set PCI VF port attributes * * @devlink_port: devlink port * @controller: associated controller number for the devlink port instance * @pf: associated PCI function number for the devlink port instance * @vf: associated PCI VF number of a PF for the devlink port instance; * VF number starts from 0 for the first PCI virtual function * @external: indicates if the port is for an external controller */ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller, u16 pf, u16 vf, bool external) { struct devlink_port_attrs *attrs = &devlink_port->attrs; int ret; ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); ret = __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_VF); if (ret) return; attrs->pci_vf.controller = controller; attrs->pci_vf.pf = pf; attrs->pci_vf.vf = vf; attrs->pci_vf.external = external; } EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set); /** * devlink_port_attrs_pci_sf_set - Set PCI SF port attributes * * @devlink_port: devlink port * @controller: associated controller number for the devlink port instance * @pf: associated PCI function number for the devlink port instance * @sf: associated SF number of a PF for the devlink port instance * @external: indicates if the port is for an external controller */ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller, u16 pf, u32 sf, bool external) { struct devlink_port_attrs *attrs = &devlink_port->attrs; int ret; ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); ret = __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_SF); if (ret) return; attrs->pci_sf.controller = controller; attrs->pci_sf.pf = pf; attrs->pci_sf.sf = sf; attrs->pci_sf.external = 
external; } EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set); static void devlink_port_rel_notify_cb(struct devlink *devlink, u32 port_index) { struct devlink_port *devlink_port; devlink_port = devlink_port_get_by_index(devlink, port_index); if (!devlink_port) return; devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); } static void devlink_port_rel_cleanup_cb(struct devlink *devlink, u32 port_index, u32 rel_index) { struct devlink_port *devlink_port; devlink_port = devlink_port_get_by_index(devlink, port_index); if (devlink_port && devlink_port->rel_index == rel_index) devlink_port->rel_index = 0; } /** * devl_port_fn_devlink_set - Attach peer devlink * instance to port function. * @devlink_port: devlink port * @fn_devlink: devlink instance to attach */ int devl_port_fn_devlink_set(struct devlink_port *devlink_port, struct devlink *fn_devlink) { ASSERT_DEVLINK_PORT_REGISTERED(devlink_port); if (WARN_ON(devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_SF || devlink_port->attrs.pci_sf.external)) return -EINVAL; return devlink_rel_nested_in_add(&devlink_port->rel_index, devlink_port->devlink->index, devlink_port->index, devlink_port_rel_notify_cb, devlink_port_rel_cleanup_cb, fn_devlink); } EXPORT_SYMBOL_GPL(devl_port_fn_devlink_set); /** * devlink_port_linecard_set - Link port with a linecard * * @devlink_port: devlink port * @linecard: devlink linecard */ void devlink_port_linecard_set(struct devlink_port *devlink_port, struct devlink_linecard *linecard) { ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); devlink_port->linecard = linecard; } EXPORT_SYMBOL_GPL(devlink_port_linecard_set); static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port, char *name, size_t len) { struct devlink_port_attrs *attrs = &devlink_port->attrs; int n = 0; if (!devlink_port->attrs_set) return -EOPNOTSUPP; switch (attrs->flavour) { case DEVLINK_PORT_FLAVOUR_PHYSICAL: if (devlink_port->linecard) n = snprintf(name, len, "l%u", devlink_linecard_index(devlink_port->linecard)); if (n < len) n += snprintf(name + n, len - n, "p%u", attrs->phys.port_number); if (n < len && attrs->split) n += snprintf(name + n, len - n, "s%u", attrs->phys.split_subport_number); break; case DEVLINK_PORT_FLAVOUR_CPU: case DEVLINK_PORT_FLAVOUR_DSA: case DEVLINK_PORT_FLAVOUR_UNUSED: /* As CPU and DSA ports do not have a netdevice associated, * this case should never happen. */ WARN_ON(1); return -EINVAL; case DEVLINK_PORT_FLAVOUR_PCI_PF: if (attrs->pci_pf.external) { n = snprintf(name, len, "c%u", attrs->pci_pf.controller); if (n >= len) return -EINVAL; len -= n; name += n; } n = snprintf(name, len, "pf%u", attrs->pci_pf.pf); break; case DEVLINK_PORT_FLAVOUR_PCI_VF: if (attrs->pci_vf.external) { n = snprintf(name, len, "c%u", attrs->pci_vf.controller); if (n >= len) return -EINVAL; len -= n; name += n; } n = snprintf(name, len, "pf%uvf%u", attrs->pci_vf.pf, attrs->pci_vf.vf); break; case DEVLINK_PORT_FLAVOUR_PCI_SF: if (attrs->pci_sf.external) { n = snprintf(name, len, "c%u", attrs->pci_sf.controller); if (n >= len) return -EINVAL; len -= n; name += n; } n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs->pci_sf.sf); break; case DEVLINK_PORT_FLAVOUR_VIRTUAL: return -EOPNOTSUPP; } if (n >= len) return -EINVAL; return 0; } int devlink_compat_phys_port_name_get(struct net_device *dev, char *name, size_t len) { struct devlink_port *devlink_port; /* RTNL mutex is held here, which ensures that devlink_port * instance cannot disappear in the middle.
No need to take * any devlink lock as only permanent values are accessed. */ ASSERT_RTNL(); devlink_port = dev->devlink_port; if (!devlink_port) return -EOPNOTSUPP; return __devlink_port_phys_port_name_get(devlink_port, name, len); } int devlink_compat_switch_id_get(struct net_device *dev, struct netdev_phys_item_id *ppid) { struct devlink_port *devlink_port; /* Caller must hold RTNL mutex or reference to dev, which ensures that * devlink_port instance cannot disappear in the middle. No need to take * any devlink lock as only permanent values are accessed. */ devlink_port = dev->devlink_port; if (!devlink_port || !devlink_port->switch_port) return -EOPNOTSUPP; memcpy(ppid, &devlink_port->attrs.switch_id, sizeof(*ppid)); return 0; } |
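/*
 * Editor's illustration (not part of the kernel sources above): a minimal
 * userspace sketch of the phys_port_name scheme implemented by
 * __devlink_port_phys_port_name_get(). Names compose as an optional
 * controller prefix ("c<N>", external controllers only), then either a
 * physical-port part ("p<N>" plus "s<N>" for split sub-ports) or a PCI
 * function part ("pf<N>", "pf<N>vf<N>", "pf<N>sf<N>"). The helper and
 * its parameter names here are hypothetical.
 */
#include <stdio.h>

static void sketch_pci_vf_name(char *buf, size_t len, unsigned int controller,
			       int external, unsigned int pf, unsigned int vf)
{
	int n = 0;

	/* external controllers get a "c<N>" prefix, as in the code above */
	if (external)
		n = snprintf(buf, len, "c%u", controller);
	snprintf(buf + n, len - n, "pf%uvf%u", pf, vf);
}

int main(void)
{
	char name[32];

	sketch_pci_vf_name(name, sizeof(name), 1, 1, 0, 2);
	printf("%s\n", name);	/* prints "c1pf0vf2" */
	return 0;
}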
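/*
 * Editor's illustration: how the DEVLINK_PORT_FN_ATTR_CAPS bitfield32 is
 * interpreted by devlink_port_fn_caps_set() above. A bitfield32 carries a
 * "selector" (which bits the sender wants to change) and a "value" (the
 * requested state of those bits); bits outside the selector are ignored,
 * hence the "caps.value & caps.selector" masking. Standalone sketch with
 * hypothetical flag values, not the kernel definitions.
 */
#include <stdint.h>
#include <stdio.h>

#define SKETCH_CAP_ROCE		(1u << 0)	/* hypothetical bit layout */
#define SKETCH_CAP_MIGRATABLE	(1u << 1)

struct sketch_bitfield32 {
	uint32_t value;
	uint32_t selector;
};

int main(void)
{
	/* request: change only ROCE, and set it to "enabled" */
	struct sketch_bitfield32 caps = {
		.value = SKETCH_CAP_ROCE | SKETCH_CAP_MIGRATABLE,
		.selector = SKETCH_CAP_ROCE,
	};
	uint32_t effective = caps.value & caps.selector;

	/* MIGRATABLE is not selected, so its value bit has no effect */
	printf("roce=%u migratable-selected=%u\n",
	       !!(effective & SKETCH_CAP_ROCE),
	       !!(caps.selector & SKETCH_CAP_MIGRATABLE));
	return 0;
}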
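/*
 * Editor's illustration: a minimal kernel-side sketch (not standalone, not a
 * real driver) of how a driver might register a physical devlink port with
 * the helpers above. "struct foo_port" and foo_port_add() are hypothetical;
 * error unwinding is trimmed. The devl_* entry point requires the devlink
 * instance lock, taken here via devl_lock()/devl_unlock() as in the
 * devlink_port_register_with_ops() wrapper above.
 */
struct foo_port {
	struct devlink_port dl_port;	/* embedded, zeroed with the parent */
	unsigned int index;
};

static int foo_port_add(struct devlink *devlink, struct foo_port *fp)
{
	struct devlink_port_attrs attrs = {};
	int err;

	/* attributes must be set before registration */
	attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
	attrs.phys.port_number = fp->index;
	devlink_port_attrs_set(&fp->dl_port, &attrs);

	devl_lock(devlink);
	err = devl_port_register_with_ops(devlink, &fp->dl_port, fp->index,
					  NULL /* falls back to dummy ops */);
	devl_unlock(devlink);
	return err;
}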
// SPDX-License-Identifier: GPL-2.0-or-later /* */ #include <linux/gfp.h> #include <linux/init.h> #include <linux/ratelimit.h> #include <linux/usb.h> #include <linux/usb/audio.h> #include <linux/slab.h> #include <sound/core.h> #include <sound/pcm.h> #include <sound/pcm_params.h> #include "usbaudio.h" #include "helper.h" #include "card.h" #include "endpoint.h" #include "pcm.h" #include "clock.h" #include "quirks.h" enum { EP_STATE_STOPPED, EP_STATE_RUNNING, EP_STATE_STOPPING, }; /* interface refcounting */ struct snd_usb_iface_ref { unsigned char iface; bool need_setup; int opened; int altset; struct list_head list; }; /* clock refcounting */ struct snd_usb_clock_ref { unsigned char clock; atomic_t locked; int opened; int rate; bool need_setup; struct list_head list; }; /* * snd_usb_endpoint is a model that abstracts everything related to a * USB endpoint and its streaming. * * There are functions to activate and deactivate the streaming URBs and * optional callbacks to let the pcm logic handle the actual content of the * packets for playback and record. Thus, the bus streaming and the audio * handlers are fully decoupled. * * There are two different types of endpoints in audio applications. * * SND_USB_ENDPOINT_TYPE_DATA handles full audio data payload for both * inbound and outbound traffic. * * SND_USB_ENDPOINT_TYPE_SYNC endpoints are for inbound traffic only and * expect the payload to carry Q10.14 / Q16.16 formatted sync information * (3 or 4 bytes). * * Each endpoint has to be configured prior to being used by calling * snd_usb_endpoint_set_params(). * * The model incorporates reference counting, so that multiple users * can call snd_usb_endpoint_start() and snd_usb_endpoint_stop(), and * only the first user will effectively start the URBs, and only the last * one to stop it will tear the URBs down again. */
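/*
 * Editor's illustration: worked examples for the two feedback-rate
 * fixed-point encodings referenced above and implemented by the helpers
 * that follow. Full speed expresses samples per frame (rate/1000) in
 * Q16.16, of which 3 bytes (Q10.14) travel on the wire; high speed
 * expresses samples per microframe (rate/8000) in Q16.16. Standalone
 * arithmetic check only; the sketch_* names are not kernel symbols.
 */
#include <stdio.h>

static unsigned int sketch_full_speed(unsigned int rate)
{
	return ((rate << 13) + 62) / 125;	/* rate/1000 in Q16.16 */
}

static unsigned int sketch_high_speed(unsigned int rate)
{
	return ((rate << 10) + 62) / 125;	/* rate/8000 in Q16.16 */
}

int main(void)
{
	/* 48000 Hz -> 48.0 samples/frame -> 0x300000 in Q16.16 */
	printf("fs: 0x%x\n", sketch_full_speed(48000));
	/* 48000 Hz -> 6.0 samples/microframe -> 0x60000 in Q16.16 */
	printf("hs: 0x%x\n", sketch_high_speed(48000));
	return 0;
}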
/* * convert a sampling rate into our full speed format (fs/1000 in Q16.16) * this will overflow at approx 524 kHz */ static inline unsigned get_usb_full_speed_rate(unsigned int rate) { return ((rate << 13) + 62) / 125; } /* * convert a sampling rate into USB high speed format (fs/8000 in Q16.16) * this will overflow at approx 4 MHz */ static inline unsigned get_usb_high_speed_rate(unsigned int rate) { return ((rate << 10) + 62) / 125; } /* * release urb data */ static void release_urb_ctx(struct snd_urb_ctx *u) { if (u->urb && u->buffer_size) usb_free_coherent(u->ep->chip->dev, u->buffer_size, u->urb->transfer_buffer, u->urb->transfer_dma); usb_free_urb(u->urb); u->urb = NULL; u->buffer_size = 0; } static const char *usb_error_string(int err) { switch (err) { case -ENODEV: return "no device"; case -ENOENT: return "endpoint not enabled"; case -EPIPE: return "endpoint stalled"; case -ENOSPC: return "not enough bandwidth"; case -ESHUTDOWN: return "device disabled"; case -EHOSTUNREACH: return "device suspended"; case -EINVAL: case -EAGAIN: case -EFBIG: case -EMSGSIZE: return "internal error"; default: return "unknown error"; } } static inline bool ep_state_running(struct snd_usb_endpoint *ep) { return atomic_read(&ep->state) == EP_STATE_RUNNING; } static inline bool ep_state_update(struct snd_usb_endpoint *ep, int old, int new) { return atomic_try_cmpxchg(&ep->state, &old, new); } /** * snd_usb_endpoint_implicit_feedback_sink: Report endpoint usage type * * @ep: The snd_usb_endpoint * * Determine whether an endpoint is driven by an implicit feedback * data endpoint source. */ int snd_usb_endpoint_implicit_feedback_sink(struct snd_usb_endpoint *ep) { return ep->implicit_fb_sync && usb_pipeout(ep->pipe); } /* * Return the number of samples to be sent in the next packet * for streaming based on information derived from sync endpoints * * This won't be used for implicit feedback, which takes the packet size * returned from the sync source */ static int slave_next_packet_size(struct snd_usb_endpoint *ep, unsigned int avail) { unsigned long flags; unsigned int phase; int ret; if (ep->fill_max) return ep->maxframesize; spin_lock_irqsave(&ep->lock, flags); phase = (ep->phase & 0xffff) + (ep->freqm << ep->datainterval); ret = min(phase >> 16, ep->maxframesize); if (avail && ret >= avail) ret = -EAGAIN; else ep->phase = phase; spin_unlock_irqrestore(&ep->lock, flags); return ret; } /* * Return the number of samples to be sent in the next packet * for adaptive and synchronous endpoints */ static int next_packet_size(struct snd_usb_endpoint *ep, unsigned int avail) { unsigned int sample_accum; int ret; if (ep->fill_max) return ep->maxframesize; sample_accum = ep->sample_accum + ep->sample_rem; if (sample_accum >= ep->pps) { sample_accum -= ep->pps; ret = ep->packsize[1]; } else { ret = ep->packsize[0]; } if (avail && ret >= avail) ret = -EAGAIN; else ep->sample_accum = sample_accum; return ret; }
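/*
 * Editor's illustration: the fractional-rate accumulator behind
 * next_packet_size() above, as a standalone sketch. For 44.1 kHz at full
 * speed (1 ms packets) the base size is 44 frames with remainder 100/1000,
 * so on average one packet in ten carries 45 frames. Field names mirror
 * the endpoint fields, but the struct here is illustrative and the
 * avail-based -EAGAIN path is omitted.
 */
#include <stdio.h>

struct sketch_ep {
	unsigned int sample_rem;	/* rate % pps */
	unsigned int sample_accum;	/* running remainder */
	unsigned int pps;		/* packets per second */
	unsigned int packsize[2];	/* base size, base size + 1 */
};

static unsigned int sketch_next_packet_size(struct sketch_ep *ep)
{
	ep->sample_accum += ep->sample_rem;
	if (ep->sample_accum >= ep->pps) {
		ep->sample_accum -= ep->pps;
		return ep->packsize[1];	/* the occasional larger packet */
	}
	return ep->packsize[0];
}

int main(void)
{
	struct sketch_ep ep = {
		.sample_rem = 44100 % 1000,
		.pps = 1000,
		.packsize = { 44100 / 1000, 44100 / 1000 + 1 },
	};
	unsigned int i, total = 0;

	for (i = 0; i < 1000; i++)
		total += sketch_next_packet_size(&ep);
	printf("frames in one second: %u\n", total);	/* 44100 */
	return 0;
}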
/* * snd_usb_endpoint_next_packet_size: Return the number of samples to be sent * in the next packet * * If the size equals or exceeds @avail, don't proceed but return -EAGAIN. * Exception: @avail = 0 for skipping the check. */ int snd_usb_endpoint_next_packet_size(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx, int idx, unsigned int avail) { unsigned int packet; packet = ctx->packet_size[idx]; if (packet) { if (avail && packet >= avail) return -EAGAIN; return packet; } if (ep->sync_source) return slave_next_packet_size(ep, avail); else return next_packet_size(ep, avail); } static void call_retire_callback(struct snd_usb_endpoint *ep, struct urb *urb) { struct snd_usb_substream *data_subs; data_subs = READ_ONCE(ep->data_subs); if (data_subs && ep->retire_data_urb) ep->retire_data_urb(data_subs, urb); } static void retire_outbound_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *urb_ctx) { call_retire_callback(ep, urb_ctx->urb); } static void snd_usb_handle_sync_urb(struct snd_usb_endpoint *ep, struct snd_usb_endpoint *sender, const struct urb *urb); static void retire_inbound_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *urb_ctx) { struct urb *urb = urb_ctx->urb; struct snd_usb_endpoint *sync_sink; if (unlikely(ep->skip_packets > 0)) { ep->skip_packets--; return; } sync_sink = READ_ONCE(ep->sync_sink); if (sync_sink) snd_usb_handle_sync_urb(sync_sink, ep, urb); call_retire_callback(ep, urb); } static inline bool has_tx_length_quirk(struct snd_usb_audio *chip) { return chip->quirk_flags & QUIRK_FLAG_TX_LENGTH; } static void prepare_silent_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx) { struct urb *urb = ctx->urb; unsigned int offs = 0; unsigned int extra = 0; __le32 packet_length; int i; /* For tx_length_quirk, put packet length at start of packet */ if (has_tx_length_quirk(ep->chip)) extra = sizeof(packet_length); for (i = 0; i < ctx->packets; ++i) { unsigned int offset; unsigned int length; int counts; counts = snd_usb_endpoint_next_packet_size(ep, ctx, i, 0); length = counts * ep->stride; /* number of silent bytes */ offset = offs * ep->stride + extra * i; urb->iso_frame_desc[i].offset = offset; urb->iso_frame_desc[i].length = length + extra; if (extra) { packet_length = cpu_to_le32(length); memcpy(urb->transfer_buffer + offset, &packet_length, sizeof(packet_length)); } memset(urb->transfer_buffer + offset + extra, ep->silence_value, length); offs += counts; } urb->number_of_packets = ctx->packets; urb->transfer_buffer_length = offs * ep->stride + ctx->packets * extra; ctx->queued = 0; } /* * Prepare a PLAYBACK urb for submission to the bus. */ static int prepare_outbound_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx, bool in_stream_lock) { struct urb *urb = ctx->urb; unsigned char *cp = urb->transfer_buffer; struct snd_usb_substream *data_subs; urb->dev = ep->chip->dev; /* we need to set this each time */ switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: data_subs = READ_ONCE(ep->data_subs); if (data_subs && ep->prepare_data_urb) return ep->prepare_data_urb(data_subs, urb, in_stream_lock); /* no data provider, so send silence */ prepare_silent_urb(ep, ctx); break; case SND_USB_ENDPOINT_TYPE_SYNC: if (snd_usb_get_speed(ep->chip->dev) >= USB_SPEED_HIGH) { /* * fill the length and offset of each urb descriptor. * the fixed 12.13 frequency is passed as 16.16 through the pipe. */ urb->iso_frame_desc[0].length = 4; urb->iso_frame_desc[0].offset = 0; cp[0] = ep->freqn; cp[1] = ep->freqn >> 8; cp[2] = ep->freqn >> 16; cp[3] = ep->freqn >> 24; } else { /* * fill the length and offset of each urb descriptor. * the fixed 10.14 frequency is passed through the pipe.
*/ urb->iso_frame_desc[0].length = 3; urb->iso_frame_desc[0].offset = 0; cp[0] = ep->freqn >> 2; cp[1] = ep->freqn >> 10; cp[2] = ep->freqn >> 18; } break; } return 0; } /* * Prepare a CAPTURE or SYNC urb for submission to the bus. */ static int prepare_inbound_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *urb_ctx) { int i, offs; struct urb *urb = urb_ctx->urb; urb->dev = ep->chip->dev; /* we need to set this at each time */ switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: offs = 0; for (i = 0; i < urb_ctx->packets; i++) { urb->iso_frame_desc[i].offset = offs; urb->iso_frame_desc[i].length = ep->curpacksize; offs += ep->curpacksize; } urb->transfer_buffer_length = offs; urb->number_of_packets = urb_ctx->packets; break; case SND_USB_ENDPOINT_TYPE_SYNC: urb->iso_frame_desc[0].length = min(4u, ep->syncmaxsize); urb->iso_frame_desc[0].offset = 0; break; } return 0; } /* notify an error as XRUN to the assigned PCM data substream */ static void notify_xrun(struct snd_usb_endpoint *ep) { struct snd_usb_substream *data_subs; struct snd_pcm_substream *psubs; data_subs = READ_ONCE(ep->data_subs); if (!data_subs) return; psubs = data_subs->pcm_substream; if (psubs && psubs->runtime && psubs->runtime->state == SNDRV_PCM_STATE_RUNNING) snd_pcm_stop_xrun(psubs); } static struct snd_usb_packet_info * next_packet_fifo_enqueue(struct snd_usb_endpoint *ep) { struct snd_usb_packet_info *p; p = ep->next_packet + (ep->next_packet_head + ep->next_packet_queued) % ARRAY_SIZE(ep->next_packet); ep->next_packet_queued++; return p; } static struct snd_usb_packet_info * next_packet_fifo_dequeue(struct snd_usb_endpoint *ep) { struct snd_usb_packet_info *p; p = ep->next_packet + ep->next_packet_head; ep->next_packet_head++; ep->next_packet_head %= ARRAY_SIZE(ep->next_packet); ep->next_packet_queued--; return p; } static void push_back_to_ready_list(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx) { unsigned long flags; spin_lock_irqsave(&ep->lock, flags); list_add_tail(&ctx->ready_list, &ep->ready_playback_urbs); spin_unlock_irqrestore(&ep->lock, flags); } /* * Send output urbs that have been prepared previously. URBs are dequeued * from ep->ready_playback_urbs and in case there aren't any available * or there are no packets that have been prepared, this function does * nothing. * * The reason why the functionality of sending and preparing URBs is separated * is that host controllers don't guarantee the order in which they return * inbound and outbound packets to their submitters. * * This function is used both for implicit feedback endpoints and in low- * latency playback mode. 
*/ int snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, bool in_stream_lock) { bool implicit_fb = snd_usb_endpoint_implicit_feedback_sink(ep); while (ep_state_running(ep)) { unsigned long flags; struct snd_usb_packet_info *packet; struct snd_urb_ctx *ctx = NULL; int err, i; spin_lock_irqsave(&ep->lock, flags); if ((!implicit_fb || ep->next_packet_queued > 0) && !list_empty(&ep->ready_playback_urbs)) { /* take URB out of FIFO */ ctx = list_first_entry(&ep->ready_playback_urbs, struct snd_urb_ctx, ready_list); list_del_init(&ctx->ready_list); if (implicit_fb) packet = next_packet_fifo_dequeue(ep); } spin_unlock_irqrestore(&ep->lock, flags); if (ctx == NULL) break; /* copy over the length information */ if (implicit_fb) { for (i = 0; i < packet->packets; i++) ctx->packet_size[i] = packet->packet_size[i]; } /* call the data handler to fill in playback data */ err = prepare_outbound_urb(ep, ctx, in_stream_lock); /* can be stopped during prepare callback */ if (unlikely(!ep_state_running(ep))) break; if (err < 0) { /* push back to ready list again for -EAGAIN */ if (err == -EAGAIN) { push_back_to_ready_list(ep, ctx); break; } if (!in_stream_lock) notify_xrun(ep); return -EPIPE; } if (!atomic_read(&ep->chip->shutdown)) err = usb_submit_urb(ctx->urb, GFP_ATOMIC); else err = -ENODEV; if (err < 0) { if (!atomic_read(&ep->chip->shutdown)) { usb_audio_err(ep->chip, "Unable to submit urb #%d: %d at %s\n", ctx->index, err, __func__); if (!in_stream_lock) notify_xrun(ep); } return -EPIPE; } set_bit(ctx->index, &ep->active_mask); atomic_inc(&ep->submitted_urbs); } return 0; } /* * complete callback for urbs */ static void snd_complete_urb(struct urb *urb) { struct snd_urb_ctx *ctx = urb->context; struct snd_usb_endpoint *ep = ctx->ep; int err; if (unlikely(urb->status == -ENOENT || /* unlinked */ urb->status == -ENODEV || /* device removed */ urb->status == -ECONNRESET || /* unlinked */ urb->status == -ESHUTDOWN)) /* device disabled */ goto exit_clear; /* device disconnected */ if (unlikely(atomic_read(&ep->chip->shutdown))) goto exit_clear; if (unlikely(!ep_state_running(ep))) goto exit_clear; if (usb_pipeout(ep->pipe)) { retire_outbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!ep_state_running(ep))) goto exit_clear; /* in low-latency and implicit-feedback modes, push back the * URB to ready list at first, then process as much as possible */ if (ep->lowlatency_playback || snd_usb_endpoint_implicit_feedback_sink(ep)) { push_back_to_ready_list(ep, ctx); clear_bit(ctx->index, &ep->active_mask); snd_usb_queue_pending_output_urbs(ep, false); /* decrement at last, and check xrun */ if (atomic_dec_and_test(&ep->submitted_urbs) && !snd_usb_endpoint_implicit_feedback_sink(ep)) notify_xrun(ep); return; } /* in non-lowlatency mode, no error handling for prepare */ prepare_outbound_urb(ep, ctx, false); /* can be stopped during prepare callback */ if (unlikely(!ep_state_running(ep))) goto exit_clear; } else { retire_inbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!ep_state_running(ep))) goto exit_clear; prepare_inbound_urb(ep, ctx); } if (!atomic_read(&ep->chip->shutdown)) err = usb_submit_urb(urb, GFP_ATOMIC); else err = -ENODEV; if (err == 0) return; if (!atomic_read(&ep->chip->shutdown)) { usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n", err); notify_xrun(ep); } exit_clear: clear_bit(ctx->index, &ep->active_mask); atomic_dec(&ep->submitted_urbs); } /* * Find or create a refcount object for the given interface * * The objects are 
released altogether in snd_usb_endpoint_free_all() */ static struct snd_usb_iface_ref * iface_ref_find(struct snd_usb_audio *chip, int iface) { struct snd_usb_iface_ref *ip; list_for_each_entry(ip, &chip->iface_ref_list, list) if (ip->iface == iface) return ip; ip = kzalloc(sizeof(*ip), GFP_KERNEL); if (!ip) return NULL; ip->iface = iface; list_add_tail(&ip->list, &chip->iface_ref_list); return ip; } /* Similarly, a refcount object for clock */ static struct snd_usb_clock_ref * clock_ref_find(struct snd_usb_audio *chip, int clock) { struct snd_usb_clock_ref *ref; list_for_each_entry(ref, &chip->clock_ref_list, list) if (ref->clock == clock) return ref; ref = kzalloc(sizeof(*ref), GFP_KERNEL); if (!ref) return NULL; ref->clock = clock; atomic_set(&ref->locked, 0); list_add_tail(&ref->list, &chip->clock_ref_list); return ref; } /* * Get the existing endpoint object corresponding to the given EP number * Returns NULL if not present. */ struct snd_usb_endpoint * snd_usb_get_endpoint(struct snd_usb_audio *chip, int ep_num) { struct snd_usb_endpoint *ep; list_for_each_entry(ep, &chip->ep_list, list) { if (ep->ep_num == ep_num) return ep; } return NULL; } #define ep_type_name(type) \ (type == SND_USB_ENDPOINT_TYPE_DATA ? "data" : "sync") /** * snd_usb_add_endpoint: Add an endpoint to a USB audio chip * * @chip: The chip * @ep_num: The number of the endpoint to use * @type: SND_USB_ENDPOINT_TYPE_DATA or SND_USB_ENDPOINT_TYPE_SYNC * * If the requested endpoint has not been added to the given chip before, * a new instance is created. * * Returns zero on success or a negative error code. * * New endpoints will be added to chip->ep_list and freed by * calling snd_usb_endpoint_free_all(). * * For SND_USB_ENDPOINT_TYPE_SYNC, the caller needs to guarantee that * bNumEndpoints > 1 beforehand.
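 * Note that @ep_num keeps the USB direction bit: e.g. (illustrative values)
 * ep_num 0x01 denotes EP 1 OUT and is mapped to a playback (snd) pipe, while
 * 0x81 denotes EP 1 IN and is mapped to a capture (rcv) pipe, matching the
 * is_playback test in the function body.
 *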
*/ int snd_usb_add_endpoint(struct snd_usb_audio *chip, int ep_num, int type) { struct snd_usb_endpoint *ep; bool is_playback; ep = snd_usb_get_endpoint(chip, ep_num); if (ep) return 0; usb_audio_dbg(chip, "Creating new %s endpoint #%x\n", ep_type_name(type), ep_num); ep = kzalloc(sizeof(*ep), GFP_KERNEL); if (!ep) return -ENOMEM; ep->chip = chip; spin_lock_init(&ep->lock); ep->type = type; ep->ep_num = ep_num; INIT_LIST_HEAD(&ep->ready_playback_urbs); atomic_set(&ep->submitted_urbs, 0); is_playback = ((ep_num & USB_ENDPOINT_DIR_MASK) == USB_DIR_OUT); ep_num &= USB_ENDPOINT_NUMBER_MASK; if (is_playback) ep->pipe = usb_sndisocpipe(chip->dev, ep_num); else ep->pipe = usb_rcvisocpipe(chip->dev, ep_num); list_add_tail(&ep->list, &chip->ep_list); return 0; } /* Set up syncinterval and maxsyncsize for a sync EP */ static void endpoint_set_syncinterval(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { struct usb_host_interface *alts; struct usb_endpoint_descriptor *desc; alts = snd_usb_get_host_interface(chip, ep->iface, ep->altsetting); if (!alts) return; desc = get_endpoint(alts, ep->ep_idx); if (desc->bLength >= USB_DT_ENDPOINT_AUDIO_SIZE && desc->bRefresh >= 1 && desc->bRefresh <= 9) ep->syncinterval = desc->bRefresh; else if (snd_usb_get_speed(chip->dev) == USB_SPEED_FULL) ep->syncinterval = 1; else if (desc->bInterval >= 1 && desc->bInterval <= 16) ep->syncinterval = desc->bInterval - 1; else ep->syncinterval = 3; ep->syncmaxsize = le16_to_cpu(desc->wMaxPacketSize); } static bool endpoint_compatible(struct snd_usb_endpoint *ep, const struct audioformat *fp, const struct snd_pcm_hw_params *params) { if (!ep->opened) return false; if (ep->cur_audiofmt != fp) return false; if (ep->cur_rate != params_rate(params) || ep->cur_format != params_format(params) || ep->cur_period_frames != params_period_size(params) || ep->cur_buffer_periods != params_periods(params)) return false; return true; } /* * Check whether the given fp and hw params are compatible with the current * setup of the target EP for implicit feedback sync */ bool snd_usb_endpoint_compatible(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep, const struct audioformat *fp, const struct snd_pcm_hw_params *params) { bool ret; mutex_lock(&chip->mutex); ret = endpoint_compatible(ep, fp, params); mutex_unlock(&chip->mutex); return ret; } /* * snd_usb_endpoint_open: Open the endpoint * * Called from hw_params to assign the endpoint to the substream. * It's reference-counted, and only the first opener is allowed to set up * arbitrary parameters. The later opener must be compatible with the * former opened parameters. * The endpoint needs to be closed via snd_usb_endpoint_close() later. * * Note that this function doesn't configure the endpoint. The substream * needs to set it up later via snd_usb_endpoint_set_params() and * snd_usb_endpoint_prepare(). */ struct snd_usb_endpoint * snd_usb_endpoint_open(struct snd_usb_audio *chip, const struct audioformat *fp, const struct snd_pcm_hw_params *params, bool is_sync_ep, bool fixed_rate) { struct snd_usb_endpoint *ep; int ep_num = is_sync_ep ? 
fp->sync_ep : fp->endpoint; mutex_lock(&chip->mutex); ep = snd_usb_get_endpoint(chip, ep_num); if (!ep) { usb_audio_err(chip, "Cannot find EP 0x%x to open\n", ep_num); goto unlock; } if (!ep->opened) { if (is_sync_ep) { ep->iface = fp->sync_iface; ep->altsetting = fp->sync_altsetting; ep->ep_idx = fp->sync_ep_idx; } else { ep->iface = fp->iface; ep->altsetting = fp->altsetting; ep->ep_idx = fp->ep_idx; } usb_audio_dbg(chip, "Open EP 0x%x, iface=%d:%d, idx=%d\n", ep_num, ep->iface, ep->altsetting, ep->ep_idx); ep->iface_ref = iface_ref_find(chip, ep->iface); if (!ep->iface_ref) { ep = NULL; goto unlock; } if (fp->protocol != UAC_VERSION_1) { ep->clock_ref = clock_ref_find(chip, fp->clock); if (!ep->clock_ref) { ep = NULL; goto unlock; } ep->clock_ref->opened++; } ep->cur_audiofmt = fp; ep->cur_channels = fp->channels; ep->cur_rate = params_rate(params); ep->cur_format = params_format(params); ep->cur_frame_bytes = snd_pcm_format_physical_width(ep->cur_format) * ep->cur_channels / 8; ep->cur_period_frames = params_period_size(params); ep->cur_period_bytes = ep->cur_period_frames * ep->cur_frame_bytes; ep->cur_buffer_periods = params_periods(params); if (ep->type == SND_USB_ENDPOINT_TYPE_SYNC) endpoint_set_syncinterval(chip, ep); ep->implicit_fb_sync = fp->implicit_fb; ep->need_setup = true; ep->need_prepare = true; ep->fixed_rate = fixed_rate; usb_audio_dbg(chip, " channels=%d, rate=%d, format=%s, period_bytes=%d, periods=%d, implicit_fb=%d\n", ep->cur_channels, ep->cur_rate, snd_pcm_format_name(ep->cur_format), ep->cur_period_bytes, ep->cur_buffer_periods, ep->implicit_fb_sync); } else { if (WARN_ON(!ep->iface_ref)) { ep = NULL; goto unlock; } if (!endpoint_compatible(ep, fp, params)) { usb_audio_err(chip, "Incompatible EP setup for 0x%x\n", ep_num); ep = NULL; goto unlock; } usb_audio_dbg(chip, "Reopened EP 0x%x (count %d)\n", ep_num, ep->opened); } if (!ep->iface_ref->opened++) ep->iface_ref->need_setup = true; ep->opened++; unlock: mutex_unlock(&chip->mutex); return ep; } /* * snd_usb_endpoint_set_sync: Link data and sync endpoints * * Pass NULL to sync_ep to unlink again */ void snd_usb_endpoint_set_sync(struct snd_usb_audio *chip, struct snd_usb_endpoint *data_ep, struct snd_usb_endpoint *sync_ep) { data_ep->sync_source = sync_ep; } /* * Set data endpoint callbacks and the assigned data stream * * Called at PCM trigger and cleanups. * Pass NULL to deactivate each callback. */ void snd_usb_endpoint_set_callback(struct snd_usb_endpoint *ep, int (*prepare)(struct snd_usb_substream *subs, struct urb *urb, bool in_stream_lock), void (*retire)(struct snd_usb_substream *subs, struct urb *urb), struct snd_usb_substream *data_subs) { ep->prepare_data_urb = prepare; ep->retire_data_urb = retire; if (data_subs) ep->lowlatency_playback = data_subs->lowlatency_playback; else ep->lowlatency_playback = false; WRITE_ONCE(ep->data_subs, data_subs); } static int endpoint_set_interface(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep, bool set) { int altset = set ? ep->altsetting : 0; int err; int retries = 0; const int max_retries = 5; if (ep->iface_ref->altset == altset) return 0; /* already disconnected? 
*/ if (unlikely(atomic_read(&chip->shutdown))) return -ENODEV; usb_audio_dbg(chip, "Setting usb interface %d:%d for EP 0x%x\n", ep->iface, altset, ep->ep_num); retry: err = usb_set_interface(chip->dev, ep->iface, altset); if (err < 0) { if (err == -EPROTO && ++retries <= max_retries) { msleep(5 * (1 << (retries - 1))); goto retry; } usb_audio_err_ratelimited( chip, "%d:%d: usb_set_interface failed (%d)\n", ep->iface, altset, err); return err; } if (chip->quirk_flags & QUIRK_FLAG_IFACE_DELAY) msleep(50); ep->iface_ref->altset = altset; return 0; } /* * snd_usb_endpoint_close: Close the endpoint * * Unreference the already opened endpoint via snd_usb_endpoint_open(). */ void snd_usb_endpoint_close(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { mutex_lock(&chip->mutex); usb_audio_dbg(chip, "Closing EP 0x%x (count %d)\n", ep->ep_num, ep->opened); if (!--ep->iface_ref->opened && !(chip->quirk_flags & QUIRK_FLAG_IFACE_SKIP_CLOSE)) endpoint_set_interface(chip, ep, false); if (!--ep->opened) { if (ep->clock_ref) { if (!--ep->clock_ref->opened) ep->clock_ref->rate = 0; } ep->iface = 0; ep->altsetting = 0; ep->cur_audiofmt = NULL; ep->cur_rate = 0; ep->iface_ref = NULL; ep->clock_ref = NULL; usb_audio_dbg(chip, "EP 0x%x closed\n", ep->ep_num); } mutex_unlock(&chip->mutex); } /* Prepare for suspending EP, called from the main suspend handler */ void snd_usb_endpoint_suspend(struct snd_usb_endpoint *ep) { ep->need_prepare = true; if (ep->iface_ref) ep->iface_ref->need_setup = true; if (ep->clock_ref) ep->clock_ref->rate = 0; } /* * wait until all urbs are processed. */ static int wait_clear_urbs(struct snd_usb_endpoint *ep) { unsigned long end_time = jiffies + msecs_to_jiffies(1000); int alive; if (atomic_read(&ep->state) != EP_STATE_STOPPING) return 0; do { alive = atomic_read(&ep->submitted_urbs); if (!alive) break; schedule_timeout_uninterruptible(1); } while (time_before(jiffies, end_time)); if (alive) usb_audio_err(ep->chip, "timeout: still %d active urbs on EP #%x\n", alive, ep->ep_num); if (ep_state_update(ep, EP_STATE_STOPPING, EP_STATE_STOPPED)) { ep->sync_sink = NULL; snd_usb_endpoint_set_callback(ep, NULL, NULL, NULL); } return 0; } /* sync the pending stop operation; * this function itself doesn't trigger the stop operation */ void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep) { if (ep) wait_clear_urbs(ep); } /* * Stop active urbs * * This function moves the EP to STOPPING state if it is currently RUNNING.
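 * State transitions, as driven by ep_state_update(): STOPPED -> RUNNING in
 * snd_usb_endpoint_start(), RUNNING -> STOPPING here, and STOPPING ->
 * STOPPED in wait_clear_urbs() once all in-flight URBs have returned.
 *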
*/ static int stop_urbs(struct snd_usb_endpoint *ep, bool force, bool keep_pending) { unsigned int i; unsigned long flags; if (!force && atomic_read(&ep->running)) return -EBUSY; if (!ep_state_update(ep, EP_STATE_RUNNING, EP_STATE_STOPPING)) return 0; spin_lock_irqsave(&ep->lock, flags); INIT_LIST_HEAD(&ep->ready_playback_urbs); ep->next_packet_head = 0; ep->next_packet_queued = 0; spin_unlock_irqrestore(&ep->lock, flags); if (keep_pending) return 0; for (i = 0; i < ep->nurbs; i++) { if (test_bit(i, &ep->active_mask)) { if (!test_and_set_bit(i, &ep->unlink_mask)) { struct urb *u = ep->urb[i].urb; usb_unlink_urb(u); } } } return 0; } /* * release an endpoint's urbs */ static int release_urbs(struct snd_usb_endpoint *ep, bool force) { int i, err; /* route incoming urbs to nirvana */ snd_usb_endpoint_set_callback(ep, NULL, NULL, NULL); /* stop and unlink urbs */ err = stop_urbs(ep, force, false); if (err) return err; wait_clear_urbs(ep); for (i = 0; i < ep->nurbs; i++) release_urb_ctx(&ep->urb[i]); usb_free_coherent(ep->chip->dev, SYNC_URBS * 4, ep->syncbuf, ep->sync_dma); ep->syncbuf = NULL; ep->nurbs = 0; return 0; } /* * configure a data endpoint */ static int data_ep_set_params(struct snd_usb_endpoint *ep) { struct snd_usb_audio *chip = ep->chip; unsigned int maxsize, minsize, packs_per_ms, max_packs_per_urb; unsigned int max_packs_per_period, urbs_per_period, urb_packs; unsigned int max_urbs, i; const struct audioformat *fmt = ep->cur_audiofmt; int frame_bits = ep->cur_frame_bytes * 8; int tx_length_quirk = (has_tx_length_quirk(chip) && usb_pipeout(ep->pipe)); usb_audio_dbg(chip, "Setting params for data EP 0x%x, pipe 0x%x\n", ep->ep_num, ep->pipe); if (ep->cur_format == SNDRV_PCM_FORMAT_DSD_U16_LE && fmt->dsd_dop) { /* * When operating in DSD DOP mode, the size of a sample frame * in hardware differs from the actual physical format width * because we need to make room for the DOP markers. */ frame_bits += ep->cur_channels << 3; } ep->datainterval = fmt->datainterval; ep->stride = frame_bits >> 3; switch (ep->cur_format) { case SNDRV_PCM_FORMAT_U8: ep->silence_value = 0x80; break; case SNDRV_PCM_FORMAT_DSD_U8: case SNDRV_PCM_FORMAT_DSD_U16_LE: case SNDRV_PCM_FORMAT_DSD_U32_LE: case SNDRV_PCM_FORMAT_DSD_U16_BE: case SNDRV_PCM_FORMAT_DSD_U32_BE: ep->silence_value = 0x69; break; default: ep->silence_value = 0; } /* assume max. frequency is 50% higher than nominal */ ep->freqmax = ep->freqn + (ep->freqn >> 1); /* Round up freqmax to nearest integer in order to calculate maximum * packet size, which must represent a whole number of frames. * This is accomplished by adding 0x0.ffff before converting the * Q16.16 format into integer. * In order to accurately calculate the maximum packet size when * the data interval is more than 1 (i.e. ep->datainterval > 0), * multiply by the data interval prior to rounding. For instance, * a freqmax of 41 kHz will result in a max packet size of 6 (5.125) * frames with a data interval of 1, but 11 (10.25) frames with a * data interval of 2. * (ep->freqmax << ep->datainterval overflows at 8.192 MHz for the * maximum datainterval value of 3, at USB full speed, higher for * USB high speed, noting that ep->freqmax is in units of * frames per packet in Q16.16 format.) */ maxsize = (((ep->freqmax << ep->datainterval) + 0xffff) >> 16) * (frame_bits >> 3); if (tx_length_quirk) maxsize += sizeof(__le32); /* Space for length descriptor */ /* but wMaxPacketSize might reduce this */ if (ep->maxpacksize && ep->maxpacksize < maxsize) { /* whatever fits into a max. 
size packet */ unsigned int data_maxsize = maxsize = ep->maxpacksize; if (tx_length_quirk) /* Need to remove the length descriptor to calc freq */ data_maxsize -= sizeof(__le32); ep->freqmax = (data_maxsize / (frame_bits >> 3)) << (16 - ep->datainterval); } if (ep->fill_max) ep->curpacksize = ep->maxpacksize; else ep->curpacksize = maxsize; if (snd_usb_get_speed(chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval; max_packs_per_urb = MAX_PACKS_HS; } else { packs_per_ms = 1; max_packs_per_urb = MAX_PACKS; } if (ep->sync_source && !ep->implicit_fb_sync) max_packs_per_urb = min(max_packs_per_urb, 1U << ep->sync_source->syncinterval); max_packs_per_urb = max(1u, max_packs_per_urb >> ep->datainterval); /* * Capture endpoints need to use small URBs because there's no way * to tell in advance where the next period will end, and we don't * want the next URB to complete much after the period ends. * * Playback endpoints with implicit sync must use the same parameters * as their corresponding capture endpoint. */ if (usb_pipein(ep->pipe) || ep->implicit_fb_sync) { /* make capture URBs <= 1 ms and smaller than a period */ urb_packs = min(max_packs_per_urb, packs_per_ms); while (urb_packs > 1 && urb_packs * maxsize >= ep->cur_period_bytes) urb_packs >>= 1; ep->nurbs = MAX_URBS; /* * Playback endpoints without implicit sync are adjusted so that * a period fits as evenly as possible in the smallest number of * URBs. The total number of URBs is adjusted to the size of the * ALSA buffer, subject to the MAX_URBS and MAX_QUEUE limits. */ } else { /* determine how small a packet can be */ minsize = (ep->freqn >> (16 - ep->datainterval)) * (frame_bits >> 3); /* with sync from device, assume it can be 12% lower */ if (ep->sync_source) minsize -= minsize >> 3; minsize = max(minsize, 1u); /* how many packets will contain an entire ALSA period? */ max_packs_per_period = DIV_ROUND_UP(ep->cur_period_bytes, minsize); /* how many URBs will contain a period? */ urbs_per_period = DIV_ROUND_UP(max_packs_per_period, max_packs_per_urb); /* how many packets are needed in each URB?
*/ urb_packs = DIV_ROUND_UP(max_packs_per_period, urbs_per_period); /* limit the number of frames in a single URB */ ep->max_urb_frames = DIV_ROUND_UP(ep->cur_period_frames, urbs_per_period); /* try to use enough URBs to contain an entire ALSA buffer */ max_urbs = min((unsigned) MAX_URBS, MAX_QUEUE * packs_per_ms / urb_packs); ep->nurbs = min(max_urbs, urbs_per_period * ep->cur_buffer_periods); } /* allocate and initialize data urbs */ for (i = 0; i < ep->nurbs; i++) { struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep; u->packets = urb_packs; u->buffer_size = maxsize * u->packets; if (fmt->fmt_type == UAC_FORMAT_TYPE_II) u->packets++; /* for transfer delimiter */ u->urb = usb_alloc_urb(u->packets, GFP_KERNEL); if (!u->urb) goto out_of_memory; u->urb->transfer_buffer = usb_alloc_coherent(chip->dev, u->buffer_size, GFP_KERNEL, &u->urb->transfer_dma); if (!u->urb->transfer_buffer) goto out_of_memory; u->urb->pipe = ep->pipe; u->urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP; u->urb->interval = 1 << ep->datainterval; u->urb->context = u; u->urb->complete = snd_complete_urb; INIT_LIST_HEAD(&u->ready_list); } return 0; out_of_memory: release_urbs(ep, false); return -ENOMEM; } /* * configure a sync endpoint */ static int sync_ep_set_params(struct snd_usb_endpoint *ep) { struct snd_usb_audio *chip = ep->chip; int i; usb_audio_dbg(chip, "Setting params for sync EP 0x%x, pipe 0x%x\n", ep->ep_num, ep->pipe); ep->syncbuf = usb_alloc_coherent(chip->dev, SYNC_URBS * 4, GFP_KERNEL, &ep->sync_dma); if (!ep->syncbuf) return -ENOMEM; ep->nurbs = SYNC_URBS; for (i = 0; i < SYNC_URBS; i++) { struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep; u->packets = 1; u->urb = usb_alloc_urb(1, GFP_KERNEL); if (!u->urb) goto out_of_memory; u->urb->transfer_buffer = ep->syncbuf + i * 4; u->urb->transfer_dma = ep->sync_dma + i * 4; u->urb->transfer_buffer_length = 4; u->urb->pipe = ep->pipe; u->urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP; u->urb->number_of_packets = 1; u->urb->interval = 1 << ep->syncinterval; u->urb->context = u; u->urb->complete = snd_complete_urb; } return 0; out_of_memory: release_urbs(ep, false); return -ENOMEM; } /* update the rate of the referred clock; return the actual rate */ static int update_clock_ref_rate(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { struct snd_usb_clock_ref *clock = ep->clock_ref; int rate = ep->cur_rate; if (!clock || clock->rate == rate) return rate; if (clock->rate) { if (atomic_read(&clock->locked)) return clock->rate; if (clock->rate != rate) { usb_audio_err(chip, "Mismatched sample rate %d vs %d for EP 0x%x\n", clock->rate, rate, ep->ep_num); return clock->rate; } } clock->rate = rate; clock->need_setup = true; return rate; } /* * snd_usb_endpoint_set_params: configure an snd_usb_endpoint * * It's called from the hw_params callback. * Determine the number of URBs to be used on this endpoint. * An endpoint must be configured before it can be started. * An endpoint that is already running cannot be reconfigured.
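 * Worked example (illustrative numbers): for 44100 Hz on a full-speed
 * device with datainterval 0, freqn = get_usb_full_speed_rate(44100), i.e.
 * about 44.1 in Q16.16, pps = 1000, sample_rem = 44100 % 1000 = 100,
 * packsize[0] = 44 and packsize[1] = 45; next_packet_size() then alternates
 * between the two sizes to hit the nominal rate on average.
 *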
*/ int snd_usb_endpoint_set_params(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { const struct audioformat *fmt = ep->cur_audiofmt; int err = 0; mutex_lock(&chip->mutex); if (!ep->need_setup) goto unlock; /* release old buffers, if any */ err = release_urbs(ep, false); if (err < 0) goto unlock; ep->datainterval = fmt->datainterval; ep->maxpacksize = fmt->maxpacksize; ep->fill_max = !!(fmt->attributes & UAC_EP_CS_ATTR_FILL_MAX); if (snd_usb_get_speed(chip->dev) == USB_SPEED_FULL) { ep->freqn = get_usb_full_speed_rate(ep->cur_rate); ep->pps = 1000 >> ep->datainterval; } else { ep->freqn = get_usb_high_speed_rate(ep->cur_rate); ep->pps = 8000 >> ep->datainterval; } ep->sample_rem = ep->cur_rate % ep->pps; ep->packsize[0] = ep->cur_rate / ep->pps; ep->packsize[1] = (ep->cur_rate + (ep->pps - 1)) / ep->pps; /* calculate the frequency in 16.16 format */ ep->freqm = ep->freqn; ep->freqshift = INT_MIN; ep->phase = 0; switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: err = data_ep_set_params(ep); break; case SND_USB_ENDPOINT_TYPE_SYNC: err = sync_ep_set_params(ep); break; default: err = -EINVAL; } usb_audio_dbg(chip, "Set up %d URBS, ret=%d\n", ep->nurbs, err); if (err < 0) goto unlock; /* some unit conversions in runtime */ ep->maxframesize = ep->maxpacksize / ep->cur_frame_bytes; ep->curframesize = ep->curpacksize / ep->cur_frame_bytes; err = update_clock_ref_rate(chip, ep); if (err >= 0) { ep->need_setup = false; err = 0; } unlock: mutex_unlock(&chip->mutex); return err; } static int init_sample_rate(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { struct snd_usb_clock_ref *clock = ep->clock_ref; int rate, err; rate = update_clock_ref_rate(chip, ep); if (rate < 0) return rate; if (clock && !clock->need_setup) return 0; if (!ep->fixed_rate) { err = snd_usb_init_sample_rate(chip, ep->cur_audiofmt, rate); if (err < 0) { if (clock) clock->rate = 0; /* reset rate */ return err; } } if (clock) clock->need_setup = false; return 0; } /* * snd_usb_endpoint_prepare: Prepare the endpoint * * This function sets up the EP into a fully usable state. * It's called from the prepare callback. * The function checks the need_prepare flag, and performs nothing unless * needed, so it's safe to call this multiple times. * * This returns zero if unchanged, 1 if the configuration has changed, * or a negative error code. */ int snd_usb_endpoint_prepare(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { bool iface_first; int err = 0; mutex_lock(&chip->mutex); if (WARN_ON(!ep->iface_ref)) goto unlock; if (!ep->need_prepare) goto unlock; /* If the interface has been already set up, just set EP parameters */ if (!ep->iface_ref->need_setup) { /* sample rate setup of UAC1 is per endpoint, and we need * to update at each EP configuration */ if (ep->cur_audiofmt->protocol == UAC_VERSION_1) { err = init_sample_rate(chip, ep); if (err < 0) goto unlock; } goto done; } /* Need to deselect altsetting at first */ endpoint_set_interface(chip, ep, false); /* Some UAC1 devices (e.g.
Yamaha THR10) need the host interface * to be set up before parameter setups */ iface_first = ep->cur_audiofmt->protocol == UAC_VERSION_1; /* Workaround for devices that require the interface setup at first like UAC1 */ if (chip->quirk_flags & QUIRK_FLAG_SET_IFACE_FIRST) iface_first = true; if (iface_first) { err = endpoint_set_interface(chip, ep, true); if (err < 0) goto unlock; } err = snd_usb_init_pitch(chip, ep->cur_audiofmt); if (err < 0) goto unlock; err = init_sample_rate(chip, ep); if (err < 0) goto unlock; err = snd_usb_select_mode_quirk(chip, ep->cur_audiofmt); if (err < 0) goto unlock; /* for UAC2/3, enable the interface altset here at last */ if (!iface_first) { err = endpoint_set_interface(chip, ep, true); if (err < 0) goto unlock; } ep->iface_ref->need_setup = false; done: ep->need_prepare = false; err = 1; unlock: mutex_unlock(&chip->mutex); return err; } EXPORT_SYMBOL_GPL(snd_usb_endpoint_prepare); /* get the current rate set to the given clock by any endpoint */ int snd_usb_endpoint_get_clock_rate(struct snd_usb_audio *chip, int clock) { struct snd_usb_clock_ref *ref; int rate = 0; if (!clock) return 0; mutex_lock(&chip->mutex); list_for_each_entry(ref, &chip->clock_ref_list, list) { if (ref->clock == clock) { rate = ref->rate; break; } } mutex_unlock(&chip->mutex); return rate; } /** * snd_usb_endpoint_start: start an snd_usb_endpoint * * @ep: the endpoint to start * * A call to this function will increment the running count of the endpoint. * In case it is not already running, the URBs for this endpoint will be * submitted. Otherwise, this function does nothing. * * Must be balanced to calls of snd_usb_endpoint_stop(). * * Returns an error if the URB submission failed, 0 in all other cases. */ int snd_usb_endpoint_start(struct snd_usb_endpoint *ep) { bool is_playback = usb_pipeout(ep->pipe); int err; unsigned int i; if (atomic_read(&ep->chip->shutdown)) return -EBADFD; if (ep->sync_source) WRITE_ONCE(ep->sync_source->sync_sink, ep); usb_audio_dbg(ep->chip, "Starting %s EP 0x%x (running %d)\n", ep_type_name(ep->type), ep->ep_num, atomic_read(&ep->running)); /* already running? */ if (atomic_inc_return(&ep->running) != 1) return 0; if (ep->clock_ref) atomic_inc(&ep->clock_ref->locked); ep->active_mask = 0; ep->unlink_mask = 0; ep->phase = 0; ep->sample_accum = 0; snd_usb_endpoint_start_quirk(ep); /* * If this endpoint has a data endpoint as implicit feedback source, * don't start the urbs here. Instead, mark them all as available, * wait for the record urbs to return and queue the playback urbs * from that context. 
*/ if (!ep_state_update(ep, EP_STATE_STOPPED, EP_STATE_RUNNING)) goto __error; if (snd_usb_endpoint_implicit_feedback_sink(ep) && !(ep->chip->quirk_flags & QUIRK_FLAG_PLAYBACK_FIRST)) { usb_audio_dbg(ep->chip, "No URB submission due to implicit fb sync\n"); i = 0; goto fill_rest; } for (i = 0; i < ep->nurbs; i++) { struct urb *urb = ep->urb[i].urb; if (snd_BUG_ON(!urb)) goto __error; if (is_playback) err = prepare_outbound_urb(ep, urb->context, true); else err = prepare_inbound_urb(ep, urb->context); if (err < 0) { /* stop filling at applptr */ if (err == -EAGAIN) break; usb_audio_dbg(ep->chip, "EP 0x%x: failed to prepare urb: %d\n", ep->ep_num, err); goto __error; } if (!atomic_read(&ep->chip->shutdown)) err = usb_submit_urb(urb, GFP_ATOMIC); else err = -ENODEV; if (err < 0) { if (!atomic_read(&ep->chip->shutdown)) usb_audio_err(ep->chip, "cannot submit urb %d, error %d: %s\n", i, err, usb_error_string(err)); goto __error; } set_bit(i, &ep->active_mask); atomic_inc(&ep->submitted_urbs); } if (!i) { usb_audio_dbg(ep->chip, "XRUN at starting EP 0x%x\n", ep->ep_num); goto __error; } usb_audio_dbg(ep->chip, "%d URBs submitted for EP 0x%x\n", i, ep->ep_num); fill_rest: /* put the remaining URBs to ready list */ if (is_playback) { for (; i < ep->nurbs; i++) push_back_to_ready_list(ep, ep->urb + i); } return 0; __error: snd_usb_endpoint_stop(ep, false); return -EPIPE; } /** * snd_usb_endpoint_stop: stop an snd_usb_endpoint * * @ep: the endpoint to stop (may be NULL) * @keep_pending: keep in-flight URBs * * A call to this function will decrement the running count of the endpoint. * In case the last user has requested the endpoint stop, the URBs will * actually be deactivated. * * Must be balanced to calls of snd_usb_endpoint_start(). * * The caller needs to synchronize the pending stop operation via * snd_usb_endpoint_sync_pending_stop(). */ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep, bool keep_pending) { if (!ep) return; usb_audio_dbg(ep->chip, "Stopping %s EP 0x%x (running %d)\n", ep_type_name(ep->type), ep->ep_num, atomic_read(&ep->running)); if (snd_BUG_ON(!atomic_read(&ep->running))) return; if (!atomic_dec_return(&ep->running)) { if (ep->sync_source) WRITE_ONCE(ep->sync_source->sync_sink, NULL); stop_urbs(ep, false, keep_pending); if (ep->clock_ref) atomic_dec(&ep->clock_ref->locked); if (ep->chip->quirk_flags & QUIRK_FLAG_FORCE_IFACE_RESET && usb_pipeout(ep->pipe)) { ep->need_prepare = true; if (ep->iface_ref) ep->iface_ref->need_setup = true; } } } /** * snd_usb_endpoint_release: Tear down an snd_usb_endpoint * * @ep: the endpoint to release * * This function does not care for the endpoint's running count but will tear * down all the streaming URBs immediately. 
*/ void snd_usb_endpoint_release(struct snd_usb_endpoint *ep) { release_urbs(ep, true); } /** * snd_usb_endpoint_free_all: Free the resources of an snd_usb_endpoint * @chip: The chip * * This frees all endpoints and their resources */ void snd_usb_endpoint_free_all(struct snd_usb_audio *chip) { struct snd_usb_endpoint *ep, *en; struct snd_usb_iface_ref *ip, *in; struct snd_usb_clock_ref *cp, *cn; list_for_each_entry_safe(ep, en, &chip->ep_list, list) kfree(ep); list_for_each_entry_safe(ip, in, &chip->iface_ref_list, list) kfree(ip); list_for_each_entry_safe(cp, cn, &chip->clock_ref_list, list) kfree(cp); } /* * snd_usb_handle_sync_urb: parse a USB sync packet * * @ep: the endpoint to handle the packet * @sender: the sending endpoint * @urb: the received packet * * This function is called from the context of an endpoint that received * the packet and is used to let another endpoint object handle the payload. */ static void snd_usb_handle_sync_urb(struct snd_usb_endpoint *ep, struct snd_usb_endpoint *sender, const struct urb *urb) { int shift; unsigned int f; unsigned long flags; snd_BUG_ON(ep == sender); /* * In case the endpoint is operating in implicit feedback mode, prepare * a new outbound URB that has the same layout as the received packet * and add it to the list of pending urbs. snd_usb_queue_pending_output_urbs() * will take care of them later. */ if (snd_usb_endpoint_implicit_feedback_sink(ep) && atomic_read(&ep->running)) { /* implicit feedback case */ int i, bytes = 0; struct snd_urb_ctx *in_ctx; struct snd_usb_packet_info *out_packet; in_ctx = urb->context; /* Count overall packet size */ for (i = 0; i < in_ctx->packets; i++) if (urb->iso_frame_desc[i].status == 0) bytes += urb->iso_frame_desc[i].actual_length; /* * skip empty packets. At least M-Audio's Fast Track Ultra stops * streaming once it receives a 0-byte OUT URB */ if (bytes == 0) return; spin_lock_irqsave(&ep->lock, flags); if (ep->next_packet_queued >= ARRAY_SIZE(ep->next_packet)) { spin_unlock_irqrestore(&ep->lock, flags); usb_audio_err(ep->chip, "next package FIFO overflow EP 0x%x\n", ep->ep_num); notify_xrun(ep); return; } out_packet = next_packet_fifo_enqueue(ep); /* * Iterate through the inbound packet and prepare the lengths * for the output packet. The OUT packet we are about to send * will have the same amount of payload bytes per stride as the * IN packet we just received. Since the actual size is scaled * by the stride, use the sender stride to calculate the length * in case the number of channels differs between the implicitly * fed-back endpoint and the synchronizing endpoint. */ out_packet->packets = in_ctx->packets; for (i = 0; i < in_ctx->packets; i++) { if (urb->iso_frame_desc[i].status == 0) out_packet->packet_size[i] = urb->iso_frame_desc[i].actual_length / sender->stride; else out_packet->packet_size[i] = 0; } spin_unlock_irqrestore(&ep->lock, flags); snd_usb_queue_pending_output_urbs(ep, false); return; } /* * process after playback sync complete * * Full speed devices report feedback values in 10.14 format as samples * per frame, high speed devices in 16.16 format as samples per * microframe. * * Because the Audio Class 1 spec was written before USB 2.0, many high * speed devices use a wrong interpretation, some others use an * entirely different format. * * Therefore, we cannot predict what format any particular device uses * and must detect it automatically.
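 * Worked example (illustrative): a well-behaved full-speed device at
 * 44.1 kHz reports about 44.1 * 16384 = 0x0b0666 in 10.14 format, while
 * ep->freqn holds 44.1 in Q16.16 (about 0x2c199a). The detection below
 * shifts the received value left until it falls within [-25%, +50%] of
 * freqn (two shifts here), so ep->freqshift becomes 2.
 *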
*/ if (urb->iso_frame_desc[0].status != 0 || urb->iso_frame_desc[0].actual_length < 3) return; f = le32_to_cpup(urb->transfer_buffer); if (urb->iso_frame_desc[0].actual_length == 3) f &= 0x00ffffff; else f &= 0x0fffffff; if (f == 0) return; if (unlikely(sender->tenor_fb_quirk)) { /* * Devices based on Tenor 8802 chipsets (TEAC UD-H01 * and others) sometimes change the feedback value * by +/- 0x1.0000. */ if (f < ep->freqn - 0x8000) f += 0xf000; else if (f > ep->freqn + 0x8000) f -= 0xf000; } else if (unlikely(ep->freqshift == INT_MIN)) { /* * The first time we see a feedback value, determine its format * by shifting it left or right until it matches the nominal * frequency value. This assumes that the feedback does not * differ from the nominal value more than +50% or -25%. */ shift = 0; while (f < ep->freqn - ep->freqn / 4) { f <<= 1; shift++; } while (f > ep->freqn + ep->freqn / 2) { f >>= 1; shift--; } ep->freqshift = shift; } else if (ep->freqshift >= 0) f <<= ep->freqshift; else f >>= -ep->freqshift; if (likely(f >= ep->freqn - ep->freqn / 8 && f <= ep->freqmax)) { /* * If the frequency looks valid, set it. * This value is referred to in prepare_playback_urb(). */ spin_lock_irqsave(&ep->lock, flags); ep->freqm = f; spin_unlock_irqrestore(&ep->lock, flags); } else { /* * Out of range; maybe the shift value is wrong. * Reset it so that we autodetect again the next time. */ ep->freqshift = INT_MIN; } } |
/* SPDX-License-Identifier: GPL-2.0 */ /* * Copyright (C) 2001 Jens Axboe <axboe@suse.de> */ #ifndef __LINUX_BIO_H #define __LINUX_BIO_H #include <linux/mempool.h> /* struct bio, bio_vec and BIO_* flags are defined in blk_types.h */ #include <linux/blk_types.h> #include <linux/uio.h> #define BIO_MAX_VECS 256U #define BIO_MAX_INLINE_VECS UIO_MAXIOV struct queue_limits; static inline unsigned int bio_max_segs(unsigned int nr_segs) { return min(nr_segs, BIO_MAX_VECS); } #define bio_iter_iovec(bio, iter) \ bvec_iter_bvec((bio)->bi_io_vec, (iter)) #define bio_iter_page(bio, iter) \ bvec_iter_page((bio)->bi_io_vec, (iter)) #define bio_iter_len(bio, iter) \
bvec_iter_len((bio)->bi_io_vec, (iter)) #define bio_iter_offset(bio, iter) \ bvec_iter_offset((bio)->bi_io_vec, (iter)) #define bio_page(bio) bio_iter_page((bio), (bio)->bi_iter) #define bio_offset(bio) bio_iter_offset((bio), (bio)->bi_iter) #define bio_iovec(bio) bio_iter_iovec((bio), (bio)->bi_iter) #define bvec_iter_sectors(iter) ((iter).bi_size >> 9) #define bvec_iter_end_sector(iter) ((iter).bi_sector + bvec_iter_sectors((iter))) #define bio_sectors(bio) bvec_iter_sectors((bio)->bi_iter) #define bio_end_sector(bio) bvec_iter_end_sector((bio)->bi_iter) /* * Return the data direction, READ or WRITE. */ #define bio_data_dir(bio) \ (op_is_write(bio_op(bio)) ? WRITE : READ) /* * Check whether this bio carries any data or not. A NULL bio is allowed. */ static inline bool bio_has_data(struct bio *bio) { if (bio && bio->bi_iter.bi_size && bio_op(bio) != REQ_OP_DISCARD && bio_op(bio) != REQ_OP_SECURE_ERASE && bio_op(bio) != REQ_OP_WRITE_ZEROES) return true; return false; } static inline bool bio_no_advance_iter(const struct bio *bio) { return bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_SECURE_ERASE || bio_op(bio) == REQ_OP_WRITE_ZEROES; } static inline void *bio_data(struct bio *bio) { if (bio_has_data(bio)) return page_address(bio_page(bio)) + bio_offset(bio); return NULL; } static inline bool bio_next_segment(const struct bio *bio, struct bvec_iter_all *iter) { if (iter->idx >= bio->bi_vcnt) return false; bvec_advance(&bio->bi_io_vec[iter->idx], iter); return true; } /* * drivers should _never_ use the all version - the bio may have been split * before it got to the driver and the driver won't own all of it */ #define bio_for_each_segment_all(bvl, bio, iter) \ for (bvl = bvec_init_iter_all(&iter); bio_next_segment((bio), &iter); ) static inline void bio_advance_iter(const struct bio *bio, struct bvec_iter *iter, unsigned int bytes) { iter->bi_sector += bytes >> 9; if (bio_no_advance_iter(bio)) iter->bi_size -= bytes; else bvec_iter_advance(bio->bi_io_vec, iter, bytes); /* TODO: It is reasonable to complete bio with error here. */ } /* @bytes should be less than or equal to bvec[i->bi_idx].bv_len */ static inline void bio_advance_iter_single(const struct bio *bio, struct bvec_iter *iter, unsigned int bytes) { iter->bi_sector += bytes >> 9; if (bio_no_advance_iter(bio)) iter->bi_size -= bytes; else bvec_iter_advance_single(bio->bi_io_vec, iter, bytes); } void __bio_advance(struct bio *, unsigned bytes); /** * bio_advance - increment/complete a bio by some number of bytes * @bio: bio to advance * @nbytes: number of bytes to complete * * This updates bi_sector, bi_size and bi_idx; if the number of bytes to * complete doesn't align with a bvec boundary, then bv_len and bv_offset will * be updated on the last bvec as well. * * @bio will then represent the remaining, uncompleted portion of the io.
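 * For example (illustrative): advancing a freshly initialized 4096-byte bio
 * by 512 bytes leaves bi_iter.bi_size == 3584 and moves bi_iter.bi_sector
 * forward by one 512-byte sector.
 *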
*/ static inline void bio_advance(struct bio *bio, unsigned int nbytes) { if (nbytes == bio->bi_iter.bi_size) { bio->bi_iter.bi_size = 0; return; } __bio_advance(bio, nbytes); } #define __bio_for_each_segment(bvl, bio, iter, start) \ for (iter = (start); \ (iter).bi_size && \ ((bvl = bio_iter_iovec((bio), (iter))), 1); \ bio_advance_iter_single((bio), &(iter), (bvl).bv_len)) #define bio_for_each_segment(bvl, bio, iter) \ __bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter) #define __bio_for_each_bvec(bvl, bio, iter, start) \ for (iter = (start); \ (iter).bi_size && \ ((bvl = mp_bvec_iter_bvec((bio)->bi_io_vec, (iter))), 1); \ bio_advance_iter_single((bio), &(iter), (bvl).bv_len)) /* iterate over multi-page bvec */ #define bio_for_each_bvec(bvl, bio, iter) \ __bio_for_each_bvec(bvl, bio, iter, (bio)->bi_iter) /* * Iterate over all multi-page bvecs. Drivers shouldn't use this version for the * same reasons as bio_for_each_segment_all(). */ #define bio_for_each_bvec_all(bvl, bio, i) \ for (i = 0, bvl = bio_first_bvec_all(bio); \ i < (bio)->bi_vcnt; i++, bvl++) #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len) static inline unsigned bio_segments(struct bio *bio) { unsigned segs = 0; struct bio_vec bv; struct bvec_iter iter; /* * We special case discard/write same/write zeroes, because they * interpret bi_size differently: */ switch (bio_op(bio)) { case REQ_OP_DISCARD: case REQ_OP_SECURE_ERASE: case REQ_OP_WRITE_ZEROES: return 0; default: break; } bio_for_each_segment(bv, bio, iter) segs++; return segs; } /* * get a reference to a bio, so it won't disappear. the intended use is * something like: * * bio_get(bio); * submit_bio(rw, bio); * if (bio->bi_flags ...) * do_something * bio_put(bio); * * without the bio_get(), it could potentially complete I/O before submit_bio * returns, and then the bio would already be freed memory by the time the * if (bio->bi_flags ...) check runs */ static inline void bio_get(struct bio *bio) { bio->bi_flags |= (1 << BIO_REFFED); smp_mb__before_atomic(); atomic_inc(&bio->__bi_cnt); } static inline void bio_cnt_set(struct bio *bio, unsigned int count) { if (count != 1) { bio->bi_flags |= (1 << BIO_REFFED); smp_mb(); } atomic_set(&bio->__bi_cnt, count); } static inline bool bio_flagged(struct bio *bio, unsigned int bit) { return bio->bi_flags & (1U << bit); } static inline void bio_set_flag(struct bio *bio, unsigned int bit) { bio->bi_flags |= (1U << bit); } static inline void bio_clear_flag(struct bio *bio, unsigned int bit) { bio->bi_flags &= ~(1U << bit); } static inline struct bio_vec *bio_first_bvec_all(struct bio *bio) { WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); return bio->bi_io_vec; } static inline struct page *bio_first_page_all(struct bio *bio) { return bio_first_bvec_all(bio)->bv_page; } static inline struct folio *bio_first_folio_all(struct bio *bio) { return page_folio(bio_first_page_all(bio)); } static inline struct bio_vec *bio_last_bvec_all(struct bio *bio) { WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); return &bio->bi_io_vec[bio->bi_vcnt - 1]; } /** * struct folio_iter - State for iterating all folios in a bio. * @folio: The current folio we're iterating. NULL after the last folio. * @offset: The byte offset within the current folio. * @length: The number of bytes in this iteration (will not cross folio * boundary).
*/ struct folio_iter { struct folio *folio; size_t offset; size_t length; /* private: for use by the iterator */ struct folio *_next; size_t _seg_count; int _i; }; static inline void bio_first_folio(struct folio_iter *fi, struct bio *bio, int i) { struct bio_vec *bvec = bio_first_bvec_all(bio) + i; if (unlikely(i >= bio->bi_vcnt)) { fi->folio = NULL; return; } fi->folio = page_folio(bvec->bv_page); fi->offset = bvec->bv_offset + PAGE_SIZE * folio_page_idx(fi->folio, bvec->bv_page); fi->_seg_count = bvec->bv_len; fi->length = min(folio_size(fi->folio) - fi->offset, fi->_seg_count); fi->_next = folio_next(fi->folio); fi->_i = i; } static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio) { fi->_seg_count -= fi->length; if (fi->_seg_count) { fi->folio = fi->_next; fi->offset = 0; fi->length = min(folio_size(fi->folio), fi->_seg_count); fi->_next = folio_next(fi->folio); } else { bio_first_folio(fi, bio, fi->_i + 1); } } /** * bio_for_each_folio_all - Iterate over each folio in a bio. * @fi: struct folio_iter which is updated for each folio. * @bio: struct bio to iterate over. */ #define bio_for_each_folio_all(fi, bio) \ for (bio_first_folio(&fi, bio, 0); fi.folio; bio_next_folio(&fi, bio)) void bio_trim(struct bio *bio, sector_t offset, sector_t size); extern struct bio *bio_split(struct bio *bio, int sectors, gfp_t gfp, struct bio_set *bs); int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim, unsigned *segs, unsigned max_bytes); /** * bio_next_split - get next @sectors from a bio, splitting if necessary * @bio: bio to split * @sectors: number of sectors to split from the front of @bio * @gfp: gfp mask * @bs: bio set to allocate from * * Return: a bio representing the next @sectors of @bio - if the bio is smaller * than @sectors, returns the original bio unchanged. */ static inline struct bio *bio_next_split(struct bio *bio, int sectors, gfp_t gfp, struct bio_set *bs) { if (sectors >= bio_sectors(bio)) return bio; return bio_split(bio, sectors, gfp, bs); } enum { BIOSET_NEED_BVECS = BIT(0), BIOSET_NEED_RESCUER = BIT(1), BIOSET_PERCPU_CACHE = BIT(2), }; extern int bioset_init(struct bio_set *, unsigned int, unsigned int, int flags); extern void bioset_exit(struct bio_set *); extern int biovec_init_pool(mempool_t *pool, int pool_entries); struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, blk_opf_t opf, gfp_t gfp_mask, struct bio_set *bs); struct bio *bio_kmalloc(unsigned short nr_vecs, gfp_t gfp_mask); extern void bio_put(struct bio *); struct bio *bio_alloc_clone(struct block_device *bdev, struct bio *bio_src, gfp_t gfp, struct bio_set *bs); int bio_init_clone(struct block_device *bdev, struct bio *bio, struct bio *bio_src, gfp_t gfp); extern struct bio_set fs_bio_set; static inline struct bio *bio_alloc(struct block_device *bdev, unsigned short nr_vecs, blk_opf_t opf, gfp_t gfp_mask) { return bio_alloc_bioset(bdev, nr_vecs, opf, gfp_mask, &fs_bio_set); } void submit_bio(struct bio *bio); extern void bio_endio(struct bio *); static inline void bio_io_error(struct bio *bio) { bio->bi_status = BLK_STS_IOERR; bio_endio(bio); } static inline void bio_wouldblock_error(struct bio *bio) { bio_set_flag(bio, BIO_QUIET); bio->bi_status = BLK_STS_AGAIN; bio_endio(bio); } /* * Calculate number of bvec segments that should be allocated to fit data * pointed by @iter. If @iter is backed by bvec it's going to be reused * instead of allocating a new one. 
*/ static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs) { if (iov_iter_is_bvec(iter)) return 0; return iov_iter_npages(iter, max_segs); } struct request_queue; void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, unsigned short max_vecs, blk_opf_t opf); extern void bio_uninit(struct bio *); void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf); void bio_chain(struct bio *, struct bio *); int __must_check bio_add_page(struct bio *bio, struct page *page, unsigned len, unsigned off); bool __must_check bio_add_folio(struct bio *bio, struct folio *folio, size_t len, size_t off); void __bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int off); void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len, size_t off); void bio_add_virt_nofail(struct bio *bio, void *vaddr, unsigned len); /** * bio_add_max_vecs - number of bio_vecs needed to add data to a bio * @kaddr: kernel virtual address to add * @len: length in bytes to add * * Calculate how many bio_vecs need to be allocated to add the kernel virtual * address range in [@kaddr:@len] in the worst case. */ static inline unsigned int bio_add_max_vecs(void *kaddr, unsigned int len) { if (is_vmalloc_addr(kaddr)) return DIV_ROUND_UP(offset_in_page(kaddr) + len, PAGE_SIZE); return 1; } unsigned int bio_add_vmalloc_chunk(struct bio *bio, void *vaddr, unsigned len); bool bio_add_vmalloc(struct bio *bio, void *vaddr, unsigned int len); int submit_bio_wait(struct bio *bio); int bdev_rw_virt(struct block_device *bdev, sector_t sector, void *data, size_t len, enum req_op op); int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter); void bio_iov_bvec_set(struct bio *bio, const struct iov_iter *iter); void __bio_release_pages(struct bio *bio, bool mark_dirty); extern void bio_set_pages_dirty(struct bio *bio); extern void bio_check_pages_dirty(struct bio *bio); extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter, struct bio *src, struct bvec_iter *src_iter); extern void bio_copy_data(struct bio *dst, struct bio *src); extern void bio_free_pages(struct bio *bio); void guard_bio_eod(struct bio *bio); void zero_fill_bio_iter(struct bio *bio, struct bvec_iter iter); static inline void zero_fill_bio(struct bio *bio) { zero_fill_bio_iter(bio, bio->bi_iter); } static inline void bio_release_pages(struct bio *bio, bool mark_dirty) { if (bio_flagged(bio, BIO_PAGE_PINNED)) __bio_release_pages(bio, mark_dirty); } #define bio_dev(bio) \ disk_devt((bio)->bi_bdev->bd_disk) #ifdef CONFIG_BLK_CGROUP void bio_associate_blkg(struct bio *bio); void bio_associate_blkg_from_css(struct bio *bio, struct cgroup_subsys_state *css); void bio_clone_blkg_association(struct bio *dst, struct bio *src); void blkcg_punt_bio_submit(struct bio *bio); #else /* CONFIG_BLK_CGROUP */ static inline void bio_associate_blkg(struct bio *bio) { } static inline void bio_associate_blkg_from_css(struct bio *bio, struct cgroup_subsys_state *css) { } static inline void bio_clone_blkg_association(struct bio *dst, struct bio *src) { } static inline void blkcg_punt_bio_submit(struct bio *bio) { submit_bio(bio); } #endif /* CONFIG_BLK_CGROUP */ static inline void bio_set_dev(struct bio *bio, struct block_device *bdev) { bio_clear_flag(bio, BIO_REMAPPED); if (bio->bi_bdev != bdev) bio_clear_flag(bio, BIO_BPS_THROTTLED); bio->bi_bdev = bdev; bio_associate_blkg(bio); } /* * BIO list management for use by remapping drivers (e.g. DM or MD) and loop.
* * A bio_list anchors a singly-linked list of bios chained through the bi_next * member of the bio. The bio_list also caches the last list member to allow * fast access to the tail. */ struct bio_list { struct bio *head; struct bio *tail; }; static inline int bio_list_empty(const struct bio_list *bl) { return bl->head == NULL; } static inline void bio_list_init(struct bio_list *bl) { bl->head = bl->tail = NULL; } #define BIO_EMPTY_LIST { NULL, NULL } #define bio_list_for_each(bio, bl) \ for (bio = (bl)->head; bio; bio = bio->bi_next) static inline unsigned bio_list_size(const struct bio_list *bl) { unsigned sz = 0; struct bio *bio; bio_list_for_each(bio, bl) sz++; return sz; } static inline void bio_list_add(struct bio_list *bl, struct bio *bio) { bio->bi_next = NULL; if (bl->tail) bl->tail->bi_next = bio; else bl->head = bio; bl->tail = bio; } static inline void bio_list_add_head(struct bio_list *bl, struct bio *bio) { bio->bi_next = bl->head; bl->head = bio; if (!bl->tail) bl->tail = bio; } static inline void bio_list_merge(struct bio_list *bl, struct bio_list *bl2) { if (!bl2->head) return; if (bl->tail) bl->tail->bi_next = bl2->head; else bl->head = bl2->head; bl->tail = bl2->tail; } static inline void bio_list_merge_init(struct bio_list *bl, struct bio_list *bl2) { bio_list_merge(bl, bl2); bio_list_init(bl2); } static inline void bio_list_merge_head(struct bio_list *bl, struct bio_list *bl2) { if (!bl2->head) return; if (bl->head) bl2->tail->bi_next = bl->head; else bl->tail = bl2->tail; bl->head = bl2->head; } static inline struct bio *bio_list_peek(struct bio_list *bl) { return bl->head; } static inline struct bio *bio_list_pop(struct bio_list *bl) { struct bio *bio = bl->head; if (bio) { bl->head = bl->head->bi_next; if (!bl->head) bl->tail = NULL; bio->bi_next = NULL; } return bio; } static inline struct bio *bio_list_get(struct bio_list *bl) { struct bio *bio = bl->head; bl->head = bl->tail = NULL; return bio; } /* * Increment chain count for the bio. Make sure the CHAIN flag update * is visible before the raised count. */ static inline void bio_inc_remaining(struct bio *bio) { bio_set_flag(bio, BIO_CHAIN); smp_mb__before_atomic(); atomic_inc(&bio->__bi_remaining); } /* * bio_set is used to allow other portions of the IO system to * allocate their own private memory pools for bio and iovec structures. * These memory pools in turn all allocate from the bio_slab * and the bvec_slabs[]. */ #define BIO_POOL_SIZE 2 struct bio_set { struct kmem_cache *bio_slab; unsigned int front_pad; /* * per-cpu bio alloc cache */ struct bio_alloc_cache __percpu *cache; mempool_t bio_pool; mempool_t bvec_pool; unsigned int back_pad; /* * Deadlock avoidance for stacking block drivers: see comments in * bio_alloc_bioset() for details */ spinlock_t rescue_lock; struct bio_list rescue_list; struct work_struct rescue_work; struct workqueue_struct *rescue_workqueue; /* * Hot un-plug notifier for the per-cpu cache, if used */ struct hlist_node cpuhp_dead; }; static inline bool bioset_initialized(struct bio_set *bs) { return bs->bio_slab != NULL; } /* * Mark a bio as polled. Note that for async polled IO, the caller must * expect -EWOULDBLOCK if we cannot allocate a request (or other resources). * We cannot block waiting for requests on polled IO, as those completions * must be found by the caller. This is different than IRQ driven IO, where * it's safe to wait for IO to complete. 
*/ static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb) { bio->bi_opf |= REQ_POLLED; if (kiocb->ki_flags & IOCB_NOWAIT) bio->bi_opf |= REQ_NOWAIT; } static inline void bio_clear_polled(struct bio *bio) { bio->bi_opf &= ~REQ_POLLED; } /** * bio_is_zone_append - is this a zone append bio? * @bio: bio to check * * Check if @bio is a zone append operation. Core block layer code and end_io * handlers must use this instead of an open coded REQ_OP_ZONE_APPEND check * because the block layer can rewrite REQ_OP_ZONE_APPEND to REQ_OP_WRITE if * it is not natively supported. */ static inline bool bio_is_zone_append(struct bio *bio) { if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) return false; return bio_op(bio) == REQ_OP_ZONE_APPEND || bio_flagged(bio, BIO_EMULATES_ZONE_APPEND); } struct bio *blk_next_bio(struct bio *bio, struct block_device *bdev, unsigned int nr_pages, blk_opf_t opf, gfp_t gfp); struct bio *bio_chain_and_submit(struct bio *prev, struct bio *new); struct bio *blk_alloc_discard_bio(struct block_device *bdev, sector_t *sector, sector_t *nr_sects, gfp_t gfp_mask); #endif /* __LINUX_BIO_H */ |
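/*
 * Editor's sketch (not part of bio.h): a minimal example of the bio_list
 * API documented above, in the shape a stacking driver might use it --
 * queue bios under a lock, then splice them to a local list and submit
 * outside the lock.  The struct and function names here are illustrative
 * assumptions, not code from the header.
 */
#include <linux/bio.h>
#include <linux/blkdev.h>	/* submit_bio_noacct() */
#include <linux/spinlock.h>

struct deferred_io {
	spinlock_t lock;
	struct bio_list bios;		/* chained through bi_next */
};

static void defer_bio(struct deferred_io *d, struct bio *bio)
{
	spin_lock(&d->lock);
	bio_list_add(&d->bios, bio);	/* O(1) append via the cached tail */
	spin_unlock(&d->lock);
}

static void drain_bios(struct deferred_io *d)
{
	struct bio_list local;
	struct bio *bio;

	bio_list_init(&local);

	/* splice the shared list into a local one and reinitialise it */
	spin_lock(&d->lock);
	bio_list_merge_init(&local, &d->bios);
	spin_unlock(&d->lock);

	while ((bio = bio_list_pop(&local)))
		submit_bio_noacct(bio);
}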
// SPDX-License-Identifier: GPL-2.0-or-later /* RomFS storage access routines * * Copyright © 2007 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) */ #include <linux/fs.h> #include <linux/mtd/super.h> #include <linux/buffer_head.h> #include "internal.h" #if !defined(CONFIG_ROMFS_ON_MTD) && !defined(CONFIG_ROMFS_ON_BLOCK) #error no ROMFS backing store interface configured #endif #ifdef CONFIG_ROMFS_ON_MTD #define ROMFS_MTD_READ(sb, ...) mtd_read((sb)->s_mtd, ##__VA_ARGS__) /* * read data from an romfs image on an MTD device */ static int romfs_mtd_read(struct super_block *sb, unsigned long pos, void *buf, size_t buflen) { size_t rlen; int ret; ret = ROMFS_MTD_READ(sb, pos, buflen, &rlen, buf); return (ret < 0 || rlen != buflen) ?
-EIO : 0; } /* * determine the length of a string in a romfs image on an MTD device */ static ssize_t romfs_mtd_strnlen(struct super_block *sb, unsigned long pos, size_t maxlen) { ssize_t n = 0; size_t segment; u_char buf[16], *p; size_t len; int ret; /* scan the string up to 16 bytes at a time */ while (maxlen > 0) { segment = min_t(size_t, maxlen, 16); ret = ROMFS_MTD_READ(sb, pos, segment, &len, buf); if (ret < 0) return ret; p = memchr(buf, 0, len); if (p) return n + (p - buf); maxlen -= len; pos += len; n += len; } return n; } /* * compare a string to one in a romfs image on MTD * - return 1 if matched, 0 if differ, -ve if error */ static int romfs_mtd_strcmp(struct super_block *sb, unsigned long pos, const char *str, size_t size) { u_char buf[17]; size_t len, segment; int ret; /* scan the string up to 16 bytes at a time, and attempt to grab the * trailing NUL whilst we're at it */ buf[0] = 0xff; while (size > 0) { segment = min_t(size_t, size + 1, 17); ret = ROMFS_MTD_READ(sb, pos, segment, &len, buf); if (ret < 0) return ret; len--; if (memcmp(buf, str, len) != 0) return 0; buf[0] = buf[len]; size -= len; pos += len; str += len; } /* check the trailing NUL was */ if (buf[0]) return 0; return 1; } #endif /* CONFIG_ROMFS_ON_MTD */ #ifdef CONFIG_ROMFS_ON_BLOCK /* * read data from an romfs image on a block device */ static int romfs_blk_read(struct super_block *sb, unsigned long pos, void *buf, size_t buflen) { struct buffer_head *bh; unsigned long offset; size_t segment; /* copy the string up to blocksize bytes at a time */ while (buflen > 0) { offset = pos & (ROMBSIZE - 1); segment = min_t(size_t, buflen, ROMBSIZE - offset); bh = sb_bread(sb, pos >> ROMBSBITS); if (!bh) return -EIO; memcpy(buf, bh->b_data + offset, segment); brelse(bh); buf += segment; buflen -= segment; pos += segment; } return 0; } /* * determine the length of a string in romfs on a block device */ static ssize_t romfs_blk_strnlen(struct super_block *sb, unsigned long pos, size_t limit) { struct buffer_head *bh; unsigned long offset; ssize_t n = 0; size_t segment; u_char *buf, *p; /* scan the string up to blocksize bytes at a time */ while (limit > 0) { offset = pos & (ROMBSIZE - 1); segment = min_t(size_t, limit, ROMBSIZE - offset); bh = sb_bread(sb, pos >> ROMBSBITS); if (!bh) return -EIO; buf = bh->b_data + offset; p = memchr(buf, 0, segment); brelse(bh); if (p) return n + (p - buf); limit -= segment; pos += segment; n += segment; } return n; } /* * compare a string to one in a romfs image on a block device * - return 1 if matched, 0 if differ, -ve if error */ static int romfs_blk_strcmp(struct super_block *sb, unsigned long pos, const char *str, size_t size) { struct buffer_head *bh; unsigned long offset; size_t segment; bool matched, terminated = false; /* compare string up to a block at a time */ while (size > 0) { offset = pos & (ROMBSIZE - 1); segment = min_t(size_t, size, ROMBSIZE - offset); bh = sb_bread(sb, pos >> ROMBSBITS); if (!bh) return -EIO; matched = (memcmp(bh->b_data + offset, str, segment) == 0); size -= segment; pos += segment; str += segment; if (matched && size == 0 && offset + segment < ROMBSIZE) { if (!bh->b_data[offset + segment]) terminated = true; else matched = false; } brelse(bh); if (!matched) return 0; } if (!terminated) { /* the terminating NUL must be on the first byte of the next * block */ BUG_ON((pos & (ROMBSIZE - 1)) != 0); bh = sb_bread(sb, pos >> ROMBSBITS); if (!bh) return -EIO; matched = !bh->b_data[0]; brelse(bh); if (!matched) return 0; } return 1; } #endif /* 
CONFIG_ROMFS_ON_BLOCK */ /* * read data from the romfs image */ int romfs_dev_read(struct super_block *sb, unsigned long pos, void *buf, size_t buflen) { size_t limit; limit = romfs_maxsize(sb); if (pos >= limit || buflen > limit - pos) return -EIO; #ifdef CONFIG_ROMFS_ON_MTD if (sb->s_mtd) return romfs_mtd_read(sb, pos, buf, buflen); #endif #ifdef CONFIG_ROMFS_ON_BLOCK if (sb->s_bdev) return romfs_blk_read(sb, pos, buf, buflen); #endif return -EIO; } /* * determine the length of a string in romfs */ ssize_t romfs_dev_strnlen(struct super_block *sb, unsigned long pos, size_t maxlen) { size_t limit; limit = romfs_maxsize(sb); if (pos >= limit) return -EIO; if (maxlen > limit - pos) maxlen = limit - pos; #ifdef CONFIG_ROMFS_ON_MTD if (sb->s_mtd) return romfs_mtd_strnlen(sb, pos, maxlen); #endif #ifdef CONFIG_ROMFS_ON_BLOCK if (sb->s_bdev) return romfs_blk_strnlen(sb, pos, maxlen); #endif return -EIO; } /* * compare a string to one in romfs * - the string to be compared to, str, may not be NUL-terminated; instead the * string is of the specified size * - return 1 if matched, 0 if differ, -ve if error */ int romfs_dev_strcmp(struct super_block *sb, unsigned long pos, const char *str, size_t size) { size_t limit; limit = romfs_maxsize(sb); if (pos >= limit) return -EIO; if (size > ROMFS_MAXFN) return -ENAMETOOLONG; if (size + 1 > limit - pos) return -EIO; #ifdef CONFIG_ROMFS_ON_MTD if (sb->s_mtd) return romfs_mtd_strcmp(sb, pos, str, size); #endif #ifdef CONFIG_ROMFS_ON_BLOCK if (sb->s_bdev) return romfs_blk_strcmp(sb, pos, str, size); #endif return -EIO; } |
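/*
 * Editor's sketch (not part of storage.c): how a caller such as the
 * lookup code in fs/romfs/super.c can be expected to combine the helpers
 * above -- read the fixed-size on-disk inode header, then compare a
 * candidate name in place without copying it out of the image.
 * ROMFH_SIZE and struct romfs_inode come from <linux/romfs_fs.h>; the
 * wrapper function itself is illustrative.
 */
#include <linux/romfs_fs.h>

static int romfs_sketch_name_matches(struct super_block *sb,
				     unsigned long pos,
				     const char *name, size_t len)
{
	struct romfs_inode ri;
	int ret;

	/* pull in the fixed part of the on-disk inode */
	ret = romfs_dev_read(sb, pos, &ri, sizeof(ri));
	if (ret < 0)
		return ret;

	/*
	 * The name is stored immediately after the header; returns 1 on
	 * match, 0 on mismatch, -ve on error, per romfs_dev_strcmp().
	 */
	return romfs_dev_strcmp(sb, pos + ROMFH_SIZE, name, len);
}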
// SPDX-License-Identifier: GPL-2.0-only /* * Fd transport layer. Includes deprecated socket layer. * * Copyright (C) 2006 by Russ Cox <rsc@swtch.com> * Copyright (C) 2004-2005 by Latchesar Ionkov <lucho@ionkov.net> * Copyright (C) 2004-2008 by Eric Van Hensbergen <ericvh@gmail.com> * Copyright (C) 1997-2002 by Ron Minnich <rminnich@sarnoff.com> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/in.h> #include <linux/in6.h> #include <linux/module.h> #include <linux/net.h> #include <linux/ipv6.h> #include <linux/kthread.h> #include <linux/errno.h> #include <linux/kernel.h> #include <linux/un.h> #include <linux/uaccess.h> #include <linux/inet.h> #include <linux/file.h> #include <linux/parser.h> #include <linux/slab.h> #include <linux/seq_file.h> #include <net/9p/9p.h> #include <net/9p/client.h> #include <net/9p/transport.h> #include <linux/syscalls.h> /* killme */ #define P9_PORT 564 #define MAX_SOCK_BUF (1024*1024) #define MAXPOLLWADDR 2 static struct p9_trans_module p9_tcp_trans; static struct p9_trans_module p9_fd_trans; /** * struct p9_fd_opts - per-transport options * @rfd: file descriptor for reading (trans=fd) * @wfd: file descriptor for writing (trans=fd) * @port: port to connect to (trans=tcp) * @privport: port is privileged */ struct p9_fd_opts { int rfd; int wfd; u16 port; bool privport; }; /* * Option Parsing (code inspired by NFS code) * - a little lazy - parse all fd-transport options */ enum { /* Options that take integer arguments */ Opt_port, Opt_rfdno, Opt_wfdno, Opt_err, /* Options that take no arguments */ Opt_privport, }; static const match_table_t tokens = { {Opt_port, "port=%u"}, {Opt_rfdno, "rfdno=%u"}, {Opt_wfdno, "wfdno=%u"}, {Opt_privport, "privport"}, {Opt_err, NULL}, }; enum { Rworksched = 1, /* read work scheduled or running */ Rpending = 2, /* can read */ Wworksched = 4, /* write work scheduled or running */ Wpending = 8, /* can write */ }; struct p9_poll_wait { struct p9_conn *conn; wait_queue_entry_t
wait; wait_queue_head_t *wait_addr; }; /** * struct p9_conn - fd mux connection state information * @mux_list: list link for mux to manage multiple connections (?) * @client: reference to client instance for this connection * @err: error state * @req_lock: lock protecting req_list and requests statuses * @req_list: accounting for requests which have been sent * @unsent_req_list: accounting for requests that haven't been sent * @rreq: read request * @wreq: write request * @tmp_buf: temporary buffer to read in header * @rc: temporary fcall for reading current frame * @wpos: write position for current frame * @wsize: amount of data to write for current frame * @wbuf: current write buffer * @poll_pending_link: pending links to be polled per conn * @poll_wait: array of wait_q's for various worker threads * @pt: poll state * @rq: current read work * @wq: current write work * @wsched: ???? * */ struct p9_conn { struct list_head mux_list; struct p9_client *client; int err; spinlock_t req_lock; struct list_head req_list; struct list_head unsent_req_list; struct p9_req_t *rreq; struct p9_req_t *wreq; char tmp_buf[P9_HDRSZ]; struct p9_fcall rc; int wpos; int wsize; char *wbuf; struct list_head poll_pending_link; struct p9_poll_wait poll_wait[MAXPOLLWADDR]; poll_table pt; struct work_struct rq; struct work_struct wq; unsigned long wsched; }; /** * struct p9_trans_fd - transport state * @rd: reference to file to read from * @wr: reference of file to write to * @conn: connection state reference * */ struct p9_trans_fd { struct file *rd; struct file *wr; struct p9_conn conn; }; static void p9_poll_workfn(struct work_struct *work); static DEFINE_SPINLOCK(p9_poll_lock); static LIST_HEAD(p9_poll_pending_list); static DECLARE_WORK(p9_poll_work, p9_poll_workfn); static unsigned int p9_ipport_resv_min = P9_DEF_MIN_RESVPORT; static unsigned int p9_ipport_resv_max = P9_DEF_MAX_RESVPORT; static void p9_mux_poll_stop(struct p9_conn *m) { unsigned long flags; int i; for (i = 0; i < ARRAY_SIZE(m->poll_wait); i++) { struct p9_poll_wait *pwait = &m->poll_wait[i]; if (pwait->wait_addr) { remove_wait_queue(pwait->wait_addr, &pwait->wait); pwait->wait_addr = NULL; } } spin_lock_irqsave(&p9_poll_lock, flags); list_del_init(&m->poll_pending_link); spin_unlock_irqrestore(&p9_poll_lock, flags); flush_work(&p9_poll_work); } /** * p9_conn_cancel - cancel all pending requests with error * @m: mux data * @err: error code * */ static void p9_conn_cancel(struct p9_conn *m, int err) { struct p9_req_t *req, *rtmp; LIST_HEAD(cancel_list); p9_debug(P9_DEBUG_ERROR, "mux %p err %d\n", m, err); spin_lock(&m->req_lock); if (READ_ONCE(m->err)) { spin_unlock(&m->req_lock); return; } WRITE_ONCE(m->err, err); ASSERT_EXCLUSIVE_WRITER(m->err); list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) { list_move(&req->req_list, &cancel_list); WRITE_ONCE(req->status, REQ_STATUS_ERROR); } list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) { list_move(&req->req_list, &cancel_list); WRITE_ONCE(req->status, REQ_STATUS_ERROR); } spin_unlock(&m->req_lock); list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) { p9_debug(P9_DEBUG_ERROR, "call back req %p\n", req); list_del(&req->req_list); if (!req->t_err) req->t_err = err; p9_client_cb(m->client, req, REQ_STATUS_ERROR); } } static __poll_t p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err) { __poll_t ret; struct p9_trans_fd *ts = NULL; if (client && client->status == Connected) ts = client->trans; if (!ts) { if (err) *err = -EREMOTEIO; return EPOLLERR; 
} ret = vfs_poll(ts->rd, pt); if (ts->rd != ts->wr) ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN); return ret; } /** * p9_fd_read- read from a fd * @client: client instance * @v: buffer to receive data into * @len: size of receive buffer * */ static int p9_fd_read(struct p9_client *client, void *v, int len) { int ret; struct p9_trans_fd *ts = NULL; loff_t pos; if (client && client->status != Disconnected) ts = client->trans; if (!ts) return -EREMOTEIO; if (!(ts->rd->f_flags & O_NONBLOCK)) p9_debug(P9_DEBUG_ERROR, "blocking read ...\n"); pos = ts->rd->f_pos; ret = kernel_read(ts->rd, v, len, &pos); if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) client->status = Disconnected; return ret; } /** * p9_read_work - called when there is some data to be read from a transport * @work: container of work to be done * */ static void p9_read_work(struct work_struct *work) { __poll_t n; int err; struct p9_conn *m; m = container_of(work, struct p9_conn, rq); if (READ_ONCE(m->err) < 0) return; p9_debug(P9_DEBUG_TRANS, "start mux %p pos %zd\n", m, m->rc.offset); if (!m->rc.sdata) { m->rc.sdata = m->tmp_buf; m->rc.offset = 0; m->rc.capacity = P9_HDRSZ; /* start by reading header */ } clear_bit(Rpending, &m->wsched); p9_debug(P9_DEBUG_TRANS, "read mux %p pos %zd size: %zd = %zd\n", m, m->rc.offset, m->rc.capacity, m->rc.capacity - m->rc.offset); err = p9_fd_read(m->client, m->rc.sdata + m->rc.offset, m->rc.capacity - m->rc.offset); p9_debug(P9_DEBUG_TRANS, "mux %p got %d bytes\n", m, err); if (err == -EAGAIN) goto end_clear; if (err <= 0) goto error; m->rc.offset += err; /* header read in */ if ((!m->rreq) && (m->rc.offset == m->rc.capacity)) { p9_debug(P9_DEBUG_TRANS, "got new header\n"); /* Header size */ m->rc.size = P9_HDRSZ; err = p9_parse_header(&m->rc, &m->rc.size, NULL, NULL, 0); if (err) { p9_debug(P9_DEBUG_ERROR, "error parsing header: %d\n", err); goto error; } p9_debug(P9_DEBUG_TRANS, "mux %p pkt: size: %d bytes tag: %d\n", m, m->rc.size, m->rc.tag); m->rreq = p9_tag_lookup(m->client, m->rc.tag); if (!m->rreq || (m->rreq->status != REQ_STATUS_SENT)) { p9_debug(P9_DEBUG_ERROR, "Unexpected packet tag %d\n", m->rc.tag); err = -EIO; goto error; } if (m->rc.size > m->rreq->rc.capacity) { p9_debug(P9_DEBUG_ERROR, "requested packet size too big: %d for tag %d with capacity %zd\n", m->rc.size, m->rc.tag, m->rreq->rc.capacity); err = -EIO; goto error; } if (!m->rreq->rc.sdata) { p9_debug(P9_DEBUG_ERROR, "No recv fcall for tag %d (req %p), disconnecting!\n", m->rc.tag, m->rreq); p9_req_put(m->client, m->rreq); m->rreq = NULL; err = -EIO; goto error; } m->rc.sdata = m->rreq->rc.sdata; memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity); m->rc.capacity = m->rc.size; } /* packet is read in * not an else because some packets (like clunk) have no payload */ if ((m->rreq) && (m->rc.offset == m->rc.capacity)) { p9_debug(P9_DEBUG_TRANS, "got new packet\n"); m->rreq->rc.size = m->rc.offset; spin_lock(&m->req_lock); if (m->rreq->status == REQ_STATUS_SENT) { list_del(&m->rreq->req_list); p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD); } else if (m->rreq->status == REQ_STATUS_FLSHD) { /* Ignore replies associated with a cancelled request. 
*/ p9_debug(P9_DEBUG_TRANS, "Ignore replies associated with a cancelled request\n"); } else { spin_unlock(&m->req_lock); p9_debug(P9_DEBUG_ERROR, "Request tag %d errored out while we were reading the reply\n", m->rc.tag); err = -EIO; goto error; } spin_unlock(&m->req_lock); m->rc.sdata = NULL; m->rc.offset = 0; m->rc.capacity = 0; p9_req_put(m->client, m->rreq); m->rreq = NULL; } end_clear: clear_bit(Rworksched, &m->wsched); if (!list_empty(&m->req_list)) { if (test_and_clear_bit(Rpending, &m->wsched)) n = EPOLLIN; else n = p9_fd_poll(m->client, NULL, NULL); if ((n & EPOLLIN) && !test_and_set_bit(Rworksched, &m->wsched)) { p9_debug(P9_DEBUG_TRANS, "sched read work %p\n", m); schedule_work(&m->rq); } } return; error: p9_conn_cancel(m, err); clear_bit(Rworksched, &m->wsched); } /** * p9_fd_write - write to a socket * @client: client instance * @v: buffer to send data from * @len: size of send buffer * */ static int p9_fd_write(struct p9_client *client, void *v, int len) { ssize_t ret; struct p9_trans_fd *ts = NULL; if (client && client->status != Disconnected) ts = client->trans; if (!ts) return -EREMOTEIO; if (!(ts->wr->f_flags & O_NONBLOCK)) p9_debug(P9_DEBUG_ERROR, "blocking write ...\n"); ret = kernel_write(ts->wr, v, len, &ts->wr->f_pos); if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) client->status = Disconnected; return ret; } /** * p9_write_work - called when a transport can send some data * @work: container for work to be done * */ static void p9_write_work(struct work_struct *work) { __poll_t n; int err; struct p9_conn *m; struct p9_req_t *req; m = container_of(work, struct p9_conn, wq); if (READ_ONCE(m->err) < 0) { clear_bit(Wworksched, &m->wsched); return; } if (!m->wsize) { spin_lock(&m->req_lock); if (list_empty(&m->unsent_req_list)) { clear_bit(Wworksched, &m->wsched); spin_unlock(&m->req_lock); return; } req = list_entry(m->unsent_req_list.next, struct p9_req_t, req_list); WRITE_ONCE(req->status, REQ_STATUS_SENT); p9_debug(P9_DEBUG_TRANS, "move req %p\n", req); list_move_tail(&req->req_list, &m->req_list); m->wbuf = req->tc.sdata; m->wsize = req->tc.size; m->wpos = 0; p9_req_get(req); m->wreq = req; spin_unlock(&m->req_lock); } p9_debug(P9_DEBUG_TRANS, "mux %p pos %d size %d\n", m, m->wpos, m->wsize); clear_bit(Wpending, &m->wsched); err = p9_fd_write(m->client, m->wbuf + m->wpos, m->wsize - m->wpos); p9_debug(P9_DEBUG_TRANS, "mux %p sent %d bytes\n", m, err); if (err == -EAGAIN) goto end_clear; if (err < 0) goto error; else if (err == 0) { err = -EREMOTEIO; goto error; } m->wpos += err; if (m->wpos == m->wsize) { m->wpos = m->wsize = 0; p9_req_put(m->client, m->wreq); m->wreq = NULL; } end_clear: clear_bit(Wworksched, &m->wsched); if (m->wsize || !list_empty(&m->unsent_req_list)) { if (test_and_clear_bit(Wpending, &m->wsched)) n = EPOLLOUT; else n = p9_fd_poll(m->client, NULL, NULL); if ((n & EPOLLOUT) && !test_and_set_bit(Wworksched, &m->wsched)) { p9_debug(P9_DEBUG_TRANS, "sched write work %p\n", m); schedule_work(&m->wq); } } return; error: p9_conn_cancel(m, err); clear_bit(Wworksched, &m->wsched); } static int p9_pollwake(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key) { struct p9_poll_wait *pwait = container_of(wait, struct p9_poll_wait, wait); struct p9_conn *m = pwait->conn; unsigned long flags; spin_lock_irqsave(&p9_poll_lock, flags); if (list_empty(&m->poll_pending_link)) list_add_tail(&m->poll_pending_link, &p9_poll_pending_list); spin_unlock_irqrestore(&p9_poll_lock, flags); schedule_work(&p9_poll_work); return 1; } /** * p9_pollwait - 
add poll task to the wait queue * @filp: file pointer being polled * @wait_address: wait_q to block on * @p: poll state * * called by files poll operation to add v9fs-poll task to files wait queue */ static void p9_pollwait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p) { struct p9_conn *m = container_of(p, struct p9_conn, pt); struct p9_poll_wait *pwait = NULL; int i; for (i = 0; i < ARRAY_SIZE(m->poll_wait); i++) { if (m->poll_wait[i].wait_addr == NULL) { pwait = &m->poll_wait[i]; break; } } if (!pwait) { p9_debug(P9_DEBUG_ERROR, "not enough wait_address slots\n"); return; } pwait->conn = m; pwait->wait_addr = wait_address; init_waitqueue_func_entry(&pwait->wait, p9_pollwake); add_wait_queue(wait_address, &pwait->wait); } /** * p9_conn_create - initialize the per-session mux data * @client: client instance * * Note: Creates the polling task if this is the first session. */ static void p9_conn_create(struct p9_client *client) { __poll_t n; struct p9_trans_fd *ts = client->trans; struct p9_conn *m = &ts->conn; p9_debug(P9_DEBUG_TRANS, "client %p msize %d\n", client, client->msize); INIT_LIST_HEAD(&m->mux_list); m->client = client; spin_lock_init(&m->req_lock); INIT_LIST_HEAD(&m->req_list); INIT_LIST_HEAD(&m->unsent_req_list); INIT_WORK(&m->rq, p9_read_work); INIT_WORK(&m->wq, p9_write_work); INIT_LIST_HEAD(&m->poll_pending_link); init_poll_funcptr(&m->pt, p9_pollwait); n = p9_fd_poll(client, &m->pt, NULL); if (n & EPOLLIN) { p9_debug(P9_DEBUG_TRANS, "mux %p can read\n", m); set_bit(Rpending, &m->wsched); } if (n & EPOLLOUT) { p9_debug(P9_DEBUG_TRANS, "mux %p can write\n", m); set_bit(Wpending, &m->wsched); } } /** * p9_poll_mux - polls a mux and schedules read or write works if necessary * @m: connection to poll * */ static void p9_poll_mux(struct p9_conn *m) { __poll_t n; int err = -ECONNRESET; if (READ_ONCE(m->err) < 0) return; n = p9_fd_poll(m->client, NULL, &err); if (n & (EPOLLERR | EPOLLHUP | EPOLLNVAL)) { p9_debug(P9_DEBUG_TRANS, "error mux %p err %d\n", m, n); p9_conn_cancel(m, err); } if (n & EPOLLIN) { set_bit(Rpending, &m->wsched); p9_debug(P9_DEBUG_TRANS, "mux %p can read\n", m); if (!test_and_set_bit(Rworksched, &m->wsched)) { p9_debug(P9_DEBUG_TRANS, "sched read work %p\n", m); schedule_work(&m->rq); } } if (n & EPOLLOUT) { set_bit(Wpending, &m->wsched); p9_debug(P9_DEBUG_TRANS, "mux %p can write\n", m); if ((m->wsize || !list_empty(&m->unsent_req_list)) && !test_and_set_bit(Wworksched, &m->wsched)) { p9_debug(P9_DEBUG_TRANS, "sched write work %p\n", m); schedule_work(&m->wq); } } } /** * p9_fd_request - send 9P request * The function can sleep until the request is scheduled for sending. * The function can be interrupted. Return from the function is not * a guarantee that the request is sent successfully. 
* * @client: client instance * @req: request to be sent * */ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req) { __poll_t n; int err; struct p9_trans_fd *ts = client->trans; struct p9_conn *m = &ts->conn; p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n", m, current, &req->tc, req->tc.id); spin_lock(&m->req_lock); err = READ_ONCE(m->err); if (err < 0) { spin_unlock(&m->req_lock); return err; } WRITE_ONCE(req->status, REQ_STATUS_UNSENT); list_add_tail(&req->req_list, &m->unsent_req_list); spin_unlock(&m->req_lock); if (test_and_clear_bit(Wpending, &m->wsched)) n = EPOLLOUT; else n = p9_fd_poll(m->client, NULL, NULL); if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched)) schedule_work(&m->wq); return 0; } static int p9_fd_cancel(struct p9_client *client, struct p9_req_t *req) { struct p9_trans_fd *ts = client->trans; struct p9_conn *m = &ts->conn; int ret = 1; p9_debug(P9_DEBUG_TRANS, "client %p req %p\n", client, req); spin_lock(&m->req_lock); if (req->status == REQ_STATUS_UNSENT) { list_del(&req->req_list); WRITE_ONCE(req->status, REQ_STATUS_FLSHD); p9_req_put(client, req); ret = 0; } spin_unlock(&m->req_lock); return ret; } static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req) { struct p9_trans_fd *ts = client->trans; struct p9_conn *m = &ts->conn; p9_debug(P9_DEBUG_TRANS, "client %p req %p\n", client, req); spin_lock(&m->req_lock); /* Ignore cancelled request if message has been received * before lock. */ if (req->status == REQ_STATUS_RCVD) { spin_unlock(&m->req_lock); return 0; } /* we haven't received a response for oldreq, * remove it from the list. */ list_del(&req->req_list); WRITE_ONCE(req->status, REQ_STATUS_FLSHD); spin_unlock(&m->req_lock); p9_req_put(client, req); return 0; } static int p9_fd_show_options(struct seq_file *m, struct p9_client *clnt) { if (clnt->trans_mod == &p9_tcp_trans) { if (clnt->trans_opts.tcp.port != P9_PORT) seq_printf(m, ",port=%u", clnt->trans_opts.tcp.port); } else if (clnt->trans_mod == &p9_fd_trans) { if (clnt->trans_opts.fd.rfd != ~0) seq_printf(m, ",rfd=%u", clnt->trans_opts.fd.rfd); if (clnt->trans_opts.fd.wfd != ~0) seq_printf(m, ",wfd=%u", clnt->trans_opts.fd.wfd); } return 0; } /** * parse_opts - parse mount options into p9_fd_opts structure * @params: options string passed from mount * @opts: fd transport-specific structure to parse options into * * Returns 0 upon success, -ERRNO upon failure */ static int parse_opts(char *params, struct p9_fd_opts *opts) { char *p; substring_t args[MAX_OPT_ARGS]; int option; char *options, *tmp_options; opts->port = P9_PORT; opts->rfd = ~0; opts->wfd = ~0; opts->privport = false; if (!params) return 0; tmp_options = kstrdup(params, GFP_KERNEL); if (!tmp_options) { p9_debug(P9_DEBUG_ERROR, "failed to allocate copy of option string\n"); return -ENOMEM; } options = tmp_options; while ((p = strsep(&options, ",")) != NULL) { int token; int r; if (!*p) continue; token = match_token(p, tokens, args); if ((token != Opt_err) && (token != Opt_privport)) { r = match_int(&args[0], &option); if (r < 0) { p9_debug(P9_DEBUG_ERROR, "integer field, but no integer?\n"); continue; } } switch (token) { case Opt_port: opts->port = option; break; case Opt_rfdno: opts->rfd = option; break; case Opt_wfdno: opts->wfd = option; break; case Opt_privport: opts->privport = true; break; default: continue; } } kfree(tmp_options); return 0; } static int p9_fd_open(struct p9_client *client, int rfd, int wfd) { struct p9_trans_fd *ts = kzalloc(sizeof(struct p9_trans_fd), 
GFP_KERNEL); if (!ts) return -ENOMEM; ts->rd = fget(rfd); if (!ts->rd) goto out_free_ts; if (!(ts->rd->f_mode & FMODE_READ)) goto out_put_rd; /* Prevent workers from hanging on IO when fd is a pipe. * It's technically possible for userspace or concurrent mounts to * modify this flag concurrently, which will likely result in a * broken filesystem. However, just having bad flags here should * not crash the kernel or cause any other sort of bug, so mark this * particular data race as intentional so that tooling (like KCSAN) * can allow it and detect further problems. */ data_race(ts->rd->f_flags |= O_NONBLOCK); ts->wr = fget(wfd); if (!ts->wr) goto out_put_rd; if (!(ts->wr->f_mode & FMODE_WRITE)) goto out_put_wr; data_race(ts->wr->f_flags |= O_NONBLOCK); client->trans = ts; client->status = Connected; return 0; out_put_wr: fput(ts->wr); out_put_rd: fput(ts->rd); out_free_ts: kfree(ts); return -EIO; } static int p9_socket_open(struct p9_client *client, struct socket *csocket) { struct p9_trans_fd *p; struct file *file; p = kzalloc(sizeof(struct p9_trans_fd), GFP_KERNEL); if (!p) { sock_release(csocket); return -ENOMEM; } csocket->sk->sk_allocation = GFP_NOIO; csocket->sk->sk_use_task_frag = false; file = sock_alloc_file(csocket, 0, NULL); if (IS_ERR(file)) { pr_err("%s (%d): failed to map fd\n", __func__, task_pid_nr(current)); kfree(p); return PTR_ERR(file); } get_file(file); p->wr = p->rd = file; client->trans = p; client->status = Connected; p->rd->f_flags |= O_NONBLOCK; p9_conn_create(client); return 0; } /** * p9_conn_destroy - cancels all pending requests of mux * @m: mux to destroy * */ static void p9_conn_destroy(struct p9_conn *m) { p9_debug(P9_DEBUG_TRANS, "mux %p prev %p next %p\n", m, m->mux_list.prev, m->mux_list.next); p9_mux_poll_stop(m); cancel_work_sync(&m->rq); if (m->rreq) { p9_req_put(m->client, m->rreq); m->rreq = NULL; } cancel_work_sync(&m->wq); if (m->wreq) { p9_req_put(m->client, m->wreq); m->wreq = NULL; } p9_conn_cancel(m, -ECONNRESET); m->client = NULL; } /** * p9_fd_close - shutdown file descriptor transport * @client: client instance * */ static void p9_fd_close(struct p9_client *client) { struct p9_trans_fd *ts; if (!client) return; ts = client->trans; if (!ts) return; client->status = Disconnected; p9_conn_destroy(&ts->conn); if (ts->rd) fput(ts->rd); if (ts->wr) fput(ts->wr); kfree(ts); } static int p9_bind_privport(struct socket *sock) { struct sockaddr_storage stor = { 0 }; int port, err = -EINVAL; stor.ss_family = sock->ops->family; if (stor.ss_family == AF_INET) ((struct sockaddr_in *)&stor)->sin_addr.s_addr = htonl(INADDR_ANY); else ((struct sockaddr_in6 *)&stor)->sin6_addr = in6addr_any; for (port = p9_ipport_resv_max; port >= p9_ipport_resv_min; port--) { if (stor.ss_family == AF_INET) ((struct sockaddr_in *)&stor)->sin_port = htons((ushort)port); else ((struct sockaddr_in6 *)&stor)->sin6_port = htons((ushort)port); err = kernel_bind(sock, (struct sockaddr *)&stor, sizeof(stor)); if (err != -EADDRINUSE) break; } return err; } static int p9_fd_create_tcp(struct p9_client *client, const char *addr, char *args) { int err; char port_str[6]; struct socket *csocket; struct sockaddr_storage stor = { 0 }; struct p9_fd_opts opts; err = parse_opts(args, &opts); if (err < 0) return err; if (!addr) return -EINVAL; sprintf(port_str, "%u", opts.port); err = inet_pton_with_scope(current->nsproxy->net_ns, AF_UNSPEC, addr, port_str, &stor); if (err < 0) return err; csocket = NULL; client->trans_opts.tcp.port = opts.port; client->trans_opts.tcp.privport = opts.privport; 
err = __sock_create(current->nsproxy->net_ns, stor.ss_family, SOCK_STREAM, IPPROTO_TCP, &csocket, 1); if (err) { pr_err("%s (%d): problem creating socket\n", __func__, task_pid_nr(current)); return err; } if (opts.privport) { err = p9_bind_privport(csocket); if (err < 0) { pr_err("%s (%d): problem binding to privport\n", __func__, task_pid_nr(current)); sock_release(csocket); return err; } } err = READ_ONCE(csocket->ops)->connect(csocket, (struct sockaddr *)&stor, sizeof(stor), 0); if (err < 0) { pr_err("%s (%d): problem connecting socket to %s\n", __func__, task_pid_nr(current), addr); sock_release(csocket); return err; } return p9_socket_open(client, csocket); } static int p9_fd_create_unix(struct p9_client *client, const char *addr, char *args) { int err; struct socket *csocket; struct sockaddr_un sun_server; csocket = NULL; if (!addr || !strlen(addr)) return -EINVAL; if (strlen(addr) >= UNIX_PATH_MAX) { pr_err("%s (%d): address too long: %s\n", __func__, task_pid_nr(current), addr); return -ENAMETOOLONG; } sun_server.sun_family = PF_UNIX; strcpy(sun_server.sun_path, addr); err = __sock_create(current->nsproxy->net_ns, PF_UNIX, SOCK_STREAM, 0, &csocket, 1); if (err < 0) { pr_err("%s (%d): problem creating socket\n", __func__, task_pid_nr(current)); return err; } err = READ_ONCE(csocket->ops)->connect(csocket, (struct sockaddr *)&sun_server, sizeof(struct sockaddr_un) - 1, 0); if (err < 0) { pr_err("%s (%d): problem connecting socket: %s: %d\n", __func__, task_pid_nr(current), addr, err); sock_release(csocket); return err; } return p9_socket_open(client, csocket); } static int p9_fd_create(struct p9_client *client, const char *addr, char *args) { int err; struct p9_fd_opts opts; err = parse_opts(args, &opts); if (err < 0) return err; client->trans_opts.fd.rfd = opts.rfd; client->trans_opts.fd.wfd = opts.wfd; if (opts.rfd == ~0 || opts.wfd == ~0) { pr_err("Insufficient options for proto=fd\n"); return -ENOPROTOOPT; } err = p9_fd_open(client, opts.rfd, opts.wfd); if (err < 0) return err; p9_conn_create(client); return 0; } static struct p9_trans_module p9_tcp_trans = { .name = "tcp", .maxsize = MAX_SOCK_BUF, .pooled_rbuffers = false, .def = 0, .create = p9_fd_create_tcp, .close = p9_fd_close, .request = p9_fd_request, .cancel = p9_fd_cancel, .cancelled = p9_fd_cancelled, .show_options = p9_fd_show_options, .owner = THIS_MODULE, }; MODULE_ALIAS_9P("tcp"); static struct p9_trans_module p9_unix_trans = { .name = "unix", .maxsize = MAX_SOCK_BUF, .def = 0, .create = p9_fd_create_unix, .close = p9_fd_close, .request = p9_fd_request, .cancel = p9_fd_cancel, .cancelled = p9_fd_cancelled, .show_options = p9_fd_show_options, .owner = THIS_MODULE, }; MODULE_ALIAS_9P("unix"); static struct p9_trans_module p9_fd_trans = { .name = "fd", .maxsize = MAX_SOCK_BUF, .def = 0, .create = p9_fd_create, .close = p9_fd_close, .request = p9_fd_request, .cancel = p9_fd_cancel, .cancelled = p9_fd_cancelled, .show_options = p9_fd_show_options, .owner = THIS_MODULE, }; MODULE_ALIAS_9P("fd"); /** * p9_poll_workfn - poll worker thread * @work: work queue * * polls all v9fs transports for new events and queues the appropriate * work to the work queue * */ static void p9_poll_workfn(struct work_struct *work) { unsigned long flags; p9_debug(P9_DEBUG_TRANS, "start %p\n", current); spin_lock_irqsave(&p9_poll_lock, flags); while (!list_empty(&p9_poll_pending_list)) { struct p9_conn *conn = list_first_entry(&p9_poll_pending_list, struct p9_conn, poll_pending_link); list_del_init(&conn->poll_pending_link); 
spin_unlock_irqrestore(&p9_poll_lock, flags); p9_poll_mux(conn); spin_lock_irqsave(&p9_poll_lock, flags); } spin_unlock_irqrestore(&p9_poll_lock, flags); p9_debug(P9_DEBUG_TRANS, "finish\n"); } static int __init p9_trans_fd_init(void) { v9fs_register_trans(&p9_tcp_trans); v9fs_register_trans(&p9_unix_trans); v9fs_register_trans(&p9_fd_trans); return 0; } static void __exit p9_trans_fd_exit(void) { flush_work(&p9_poll_work); v9fs_unregister_trans(&p9_tcp_trans); v9fs_unregister_trans(&p9_unix_trans); v9fs_unregister_trans(&p9_fd_trans); } module_init(p9_trans_fd_init); module_exit(p9_trans_fd_exit); MODULE_AUTHOR("Eric Van Hensbergen <ericvh@gmail.com>"); MODULE_DESCRIPTION("Filedescriptor Transport for 9P"); MODULE_LICENSE("GPL"); |
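/*
 * Editor's sketch (not from trans_fd.c): the wait-queue technique the fd
 * transport builds on.  init_poll_funcptr() routes the file's ->poll()
 * callback into a custom function, which registers a wait_queue entry
 * whose wake function schedules deferred work instead of waking a
 * sleeping task.  All "my_*" names are illustrative assumptions; the real
 * transport keeps a small array of wait entries, one per wait queue.
 */
#include <linux/poll.h>
#include <linux/wait.h>
#include <linux/workqueue.h>

struct my_conn {
	poll_table pt;			/* handed to vfs_poll() */
	wait_queue_entry_t wait;	/* hooked onto the file's waitqueue */
	struct work_struct work;	/* runs the actual I/O handling */
};

static int my_pollwake(wait_queue_entry_t *wait, unsigned int mode,
		       int sync, void *key)
{
	struct my_conn *c = container_of(wait, struct my_conn, wait);

	schedule_work(&c->work);	/* defer; we may be in irq context */
	return 1;
}

static void my_pollwait(struct file *filp, wait_queue_head_t *wq,
			poll_table *p)
{
	struct my_conn *c = container_of(p, struct my_conn, pt);

	init_waitqueue_func_entry(&c->wait, my_pollwake);
	add_wait_queue(wq, &c->wait);
}

/* setup: init_poll_funcptr(&c->pt, my_pollwait); then vfs_poll(file, &c->pt); */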
// SPDX-License-Identifier: GPL-2.0+ #include <linux/iosys-map.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_helper.h> #include <drm/drm_blend.h> #include <drm/drm_fourcc.h> #include <drm/drm_gem_atomic_helper.h> #include <drm/drm_gem_framebuffer_helper.h> #include "vkms_drv.h" #include "vkms_formats.h" static const u32 vkms_formats[] = { DRM_FORMAT_ARGB8888, DRM_FORMAT_XRGB8888, DRM_FORMAT_ABGR8888, DRM_FORMAT_XRGB16161616, DRM_FORMAT_ARGB16161616, DRM_FORMAT_RGB565 }; static struct drm_plane_state * vkms_plane_duplicate_state(struct drm_plane *plane) { struct vkms_plane_state *vkms_state; struct vkms_frame_info *frame_info; vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL); if (!vkms_state) return NULL; frame_info = kzalloc(sizeof(*frame_info), GFP_KERNEL); if (!frame_info) { DRM_DEBUG_KMS("Couldn't allocate frame_info\n"); kfree(vkms_state); return NULL; } vkms_state->frame_info = frame_info; __drm_gem_duplicate_shadow_plane_state(plane, &vkms_state->base); return &vkms_state->base.base; } static void vkms_plane_destroy_state(struct drm_plane *plane, struct drm_plane_state *old_state) { struct vkms_plane_state *vkms_state = to_vkms_plane_state(old_state); struct drm_crtc *crtc = vkms_state->base.base.crtc; if (crtc && vkms_state->frame_info->fb) { /* dropping the reference we acquired in * vkms_primary_plane_update() */ if (drm_framebuffer_read_refcount(vkms_state->frame_info->fb)) drm_framebuffer_put(vkms_state->frame_info->fb); } kfree(vkms_state->frame_info); vkms_state->frame_info = NULL; __drm_gem_destroy_shadow_plane_state(&vkms_state->base); kfree(vkms_state); } static void vkms_plane_reset(struct drm_plane *plane) { struct vkms_plane_state *vkms_state; if (plane->state) { vkms_plane_destroy_state(plane, plane->state); plane->state = NULL; /* must be set to NULL here */ } vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL); if (!vkms_state) { DRM_ERROR("Cannot allocate vkms_plane_state\n"); return; } __drm_gem_reset_shadow_plane(plane, &vkms_state->base); } static const struct drm_plane_funcs vkms_plane_funcs = { .update_plane = drm_atomic_helper_update_plane, .disable_plane = drm_atomic_helper_disable_plane, .reset = vkms_plane_reset, .atomic_duplicate_state = vkms_plane_duplicate_state, .atomic_destroy_state = vkms_plane_destroy_state, }; static void vkms_plane_atomic_update(struct drm_plane *plane, struct drm_atomic_state *state) { struct drm_plane_state *new_state = drm_atomic_get_new_plane_state(state, plane); struct vkms_plane_state *vkms_plane_state; struct drm_shadow_plane_state *shadow_plane_state; struct drm_framebuffer *fb = new_state->fb; struct vkms_frame_info *frame_info; u32 fmt; if (!new_state->crtc || !fb) return; fmt = fb->format->format; vkms_plane_state =
to_vkms_plane_state(new_state); shadow_plane_state = &vkms_plane_state->base; frame_info = vkms_plane_state->frame_info; memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect)); memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect)); frame_info->fb = fb; memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map)); drm_framebuffer_get(frame_info->fb); frame_info->rotation = new_state->rotation; vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt); } static int vkms_plane_atomic_check(struct drm_plane *plane, struct drm_atomic_state *state) { struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane); struct drm_crtc_state *crtc_state; int ret; if (!new_plane_state->fb || WARN_ON(!new_plane_state->crtc)) return 0; crtc_state = drm_atomic_get_crtc_state(state, new_plane_state->crtc); if (IS_ERR(crtc_state)) return PTR_ERR(crtc_state); ret = drm_atomic_helper_check_plane_state(new_plane_state, crtc_state, DRM_PLANE_NO_SCALING, DRM_PLANE_NO_SCALING, true, true); if (ret != 0) return ret; return 0; } static int vkms_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) { struct drm_shadow_plane_state *shadow_plane_state; struct drm_framebuffer *fb = state->fb; int ret; if (!fb) return 0; shadow_plane_state = to_drm_shadow_plane_state(state); ret = drm_gem_plane_helper_prepare_fb(plane, state); if (ret) return ret; return drm_gem_fb_vmap(fb, shadow_plane_state->map, shadow_plane_state->data); } static void vkms_cleanup_fb(struct drm_plane *plane, struct drm_plane_state *state) { struct drm_shadow_plane_state *shadow_plane_state; struct drm_framebuffer *fb = state->fb; if (!fb) return; shadow_plane_state = to_drm_shadow_plane_state(state); drm_gem_fb_vunmap(fb, shadow_plane_state->map); } static const struct drm_plane_helper_funcs vkms_plane_helper_funcs = { .atomic_update = vkms_plane_atomic_update, .atomic_check = vkms_plane_atomic_check, .prepare_fb = vkms_prepare_fb, .cleanup_fb = vkms_cleanup_fb, }; struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev, enum drm_plane_type type) { struct drm_device *dev = &vkmsdev->drm; struct vkms_plane *plane; plane = drmm_universal_plane_alloc(dev, struct vkms_plane, base, 0, &vkms_plane_funcs, vkms_formats, ARRAY_SIZE(vkms_formats), NULL, type, NULL); if (IS_ERR(plane)) return plane; drm_plane_helper_add(&plane->base, &vkms_plane_helper_funcs); drm_plane_create_rotation_property(&plane->base, DRM_MODE_ROTATE_0, DRM_MODE_ROTATE_MASK | DRM_MODE_REFLECT_MASK); return plane; } |
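/*
 * Editor's sketch (not part of vkms_plane.c): the expected call site for
 * vkms_plane_init() during output setup.  The wrapper function here is a
 * simplified, illustrative stand-in rather than a copy of the real vkms
 * output code.
 */
static int sketch_create_primary_plane(struct vkms_device *vkmsdev)
{
	struct vkms_plane *primary;

	primary = vkms_plane_init(vkmsdev, DRM_PLANE_TYPE_PRIMARY);
	if (IS_ERR(primary))
		return PTR_ERR(primary);

	/*
	 * The plane was allocated with drmm_universal_plane_alloc(), so it
	 * is torn down with the drm_device; no explicit cleanup on unwind.
	 */
	return 0;
}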
// SPDX-License-Identifier: GPL-2.0 /* * message.c - synchronous message handling * * Released under the GPLv2 only. */ #include <linux/acpi.h> #include <linux/pci.h> /* for scatterlist macros */ #include <linux/usb.h> #include <linux/module.h> #include <linux/of.h> #include <linux/slab.h> #include <linux/mm.h> #include <linux/timer.h> #include <linux/ctype.h> #include <linux/nls.h> #include <linux/device.h> #include <linux/scatterlist.h> #include <linux/usb/cdc.h> #include <linux/usb/quirks.h> #include <linux/usb/hcd.h> /* for usbcore internals */ #include <linux/usb/of.h> #include <asm/byteorder.h> #include "usb.h" static void cancel_async_set_config(struct usb_device *udev); struct api_context { struct completion done; int status; }; static void usb_api_blocking_completion(struct urb *urb) { struct api_context *ctx = urb->context; ctx->status = urb->status; complete(&ctx->done); } /* * Starts urb and waits for completion or timeout. Note that this call * is NOT interruptible. Many device driver i/o requests should be * interruptible and therefore these drivers should implement their * own interruptible routines. */ static int usb_start_wait_urb(struct urb *urb, int timeout, int *actual_length) { struct api_context ctx; unsigned long expire; int retval; init_completion(&ctx.done); urb->context = &ctx; urb->actual_length = 0; retval = usb_submit_urb(urb, GFP_NOIO); if (unlikely(retval)) goto out; expire = timeout ? msecs_to_jiffies(timeout) : MAX_SCHEDULE_TIMEOUT; if (!wait_for_completion_timeout(&ctx.done, expire)) { usb_kill_urb(urb); retval = (ctx.status == -ENOENT ? -ETIMEDOUT : ctx.status); dev_dbg(&urb->dev->dev, "%s timed out on ep%d%s len=%u/%u\n", current->comm, usb_endpoint_num(&urb->ep->desc), usb_urb_dir_in(urb) ?
"in" : "out", urb->actual_length, urb->transfer_buffer_length); } else retval = ctx.status; out: if (actual_length) *actual_length = urb->actual_length; usb_free_urb(urb); return retval; } /*-------------------------------------------------------------------*/ /* returns status (negative) or length (positive) */ static int usb_internal_control_msg(struct usb_device *usb_dev, unsigned int pipe, struct usb_ctrlrequest *cmd, void *data, int len, int timeout) { struct urb *urb; int retv; int length; urb = usb_alloc_urb(0, GFP_NOIO); if (!urb) return -ENOMEM; usb_fill_control_urb(urb, usb_dev, pipe, (unsigned char *)cmd, data, len, usb_api_blocking_completion, NULL); retv = usb_start_wait_urb(urb, timeout, &length); if (retv < 0) return retv; else return length; } /** * usb_control_msg - Builds a control urb, sends it off and waits for completion * @dev: pointer to the usb device to send the message to * @pipe: endpoint "pipe" to send the message to * @request: USB message request value * @requesttype: USB message request type value * @value: USB message value * @index: USB message index value * @data: pointer to the data to send * @size: length in bytes of the data to send * @timeout: time in msecs to wait for the message to complete before timing * out (if 0 the wait is forever) * * Context: task context, might sleep. * * This function sends a simple control message to a specified endpoint and * waits for the message to complete, or timeout. * * Don't use this function from within an interrupt context. If you need * an asynchronous message, or need to send a message from within interrupt * context, use usb_submit_urb(). If a thread in your driver uses this call, * make sure your disconnect() method can wait for it to complete. Since you * don't have a handle on the URB used, you can't cancel the request. * * Return: If successful, the number of bytes transferred. Otherwise, a negative * error number. */ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request, __u8 requesttype, __u16 value, __u16 index, void *data, __u16 size, int timeout) { struct usb_ctrlrequest *dr; int ret; dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO); if (!dr) return -ENOMEM; dr->bRequestType = requesttype; dr->bRequest = request; dr->wValue = cpu_to_le16(value); dr->wIndex = cpu_to_le16(index); dr->wLength = cpu_to_le16(size); ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout); /* Linger a bit, prior to the next control message. */ if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG) msleep(200); kfree(dr); return ret; } EXPORT_SYMBOL_GPL(usb_control_msg); /** * usb_control_msg_send - Builds a control "send" message, sends it off and waits for completion * @dev: pointer to the usb device to send the message to * @endpoint: endpoint to send the message to * @request: USB message request value * @requesttype: USB message request type value * @value: USB message value * @index: USB message index value * @driver_data: pointer to the data to send * @size: length in bytes of the data to send * @timeout: time in msecs to wait for the message to complete before timing * out (if 0 the wait is forever) * @memflags: the flags for memory allocation for buffers * * Context: !in_interrupt () * * This function sends a control message to a specified endpoint that is not * expected to fill in a response (i.e. a "send message") and waits for the * message to complete, or timeout. * * Do not use this function from within an interrupt context. 
 * an asynchronous message, or need to send a message from within interrupt
 * context, use usb_submit_urb(). If a thread in your driver uses this call,
 * make sure your disconnect() method can wait for it to complete. Since you
 * don't have a handle on the URB used, you can't cancel the request.
 *
 * The data pointer can point to a buffer on the stack, or anywhere else,
 * as it will not be modified at all. This does not have the restriction that
 * usb_control_msg() has where the data pointer must be to dynamically allocated
 * memory (i.e. memory that can be successfully DMAed to a device).
 *
 * Return: If successful, 0 is returned. Otherwise, a negative error number.
 */
int usb_control_msg_send(struct usb_device *dev, __u8 endpoint, __u8 request,
			 __u8 requesttype, __u16 value, __u16 index,
			 const void *driver_data, __u16 size, int timeout,
			 gfp_t memflags)
{
	unsigned int pipe = usb_sndctrlpipe(dev, endpoint);
	int ret;
	u8 *data = NULL;

	if (size) {
		data = kmemdup(driver_data, size, memflags);
		if (!data)
			return -ENOMEM;
	}

	ret = usb_control_msg(dev, pipe, request, requesttype, value, index,
			      data, size, timeout);
	kfree(data);

	if (ret < 0)
		return ret;

	return 0;
}
EXPORT_SYMBOL_GPL(usb_control_msg_send);

/**
 * usb_control_msg_recv - Builds a control "receive" message, sends it off and waits for completion
 * @dev: pointer to the usb device to send the message to
 * @endpoint: endpoint to send the message to
 * @request: USB message request value
 * @requesttype: USB message request type value
 * @value: USB message value
 * @index: USB message index value
 * @driver_data: pointer to the data to be filled in by the message
 * @size: length in bytes of the data to be received
 * @timeout: time in msecs to wait for the message to complete before timing
 *	out (if 0 the wait is forever)
 * @memflags: the flags for memory allocation for buffers
 *
 * Context: task context, might sleep.
 *
 * This function sends a control message to a specified endpoint that is
 * expected to fill in a response (i.e. a "receive message") and waits for the
 * message to complete, or timeout.
 *
 * Do not use this function from within an interrupt context. If you need
 * an asynchronous message, or need to send a message from within interrupt
 * context, use usb_submit_urb(). If a thread in your driver uses this call,
 * make sure your disconnect() method can wait for it to complete. Since you
 * don't have a handle on the URB used, you can't cancel the request.
 *
 * The data pointer can point to a buffer on the stack, or anywhere else
 * that can be successfully written to. This function does not have the
 * restriction that usb_control_msg() has where the data pointer must be to
 * dynamically allocated memory (i.e. memory that can be successfully DMAed to a
 * device).
 *
 * The "whole" message must be properly received from the device in order for
 * this function to be successful. If a device returns less than the expected
 * amount of data, then the function will fail. Do not use this for messages
 * where a variable amount of data might be returned.
 *
 * Return: If successful, 0 is returned. Otherwise, a negative error number.
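 *
 * A minimal usage sketch, for illustration only (the device pointer udev,
 * the vendor request number 0x01 and the two-byte reply size are all
 * hypothetical placeholders):
 *
 *	u8 reply[2];
 *	int ret;
 *
 *	ret = usb_control_msg_recv(udev, 0, 0x01,
 *				   USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
 *				   0, 0, reply, sizeof(reply),
 *				   USB_CTRL_GET_TIMEOUT, GFP_KERNEL);
 *	if (ret)
 *		return ret;
 *
 * Note that a short read is reported as -EREMOTEIO, per the rule above
 * that the whole message must be received.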
*/ int usb_control_msg_recv(struct usb_device *dev, __u8 endpoint, __u8 request, __u8 requesttype, __u16 value, __u16 index, void *driver_data, __u16 size, int timeout, gfp_t memflags) { unsigned int pipe = usb_rcvctrlpipe(dev, endpoint); int ret; u8 *data; if (!size || !driver_data) return -EINVAL; data = kmalloc(size, memflags); if (!data) return -ENOMEM; ret = usb_control_msg(dev, pipe, request, requesttype, value, index, data, size, timeout); if (ret < 0) goto exit; if (ret == size) { memcpy(driver_data, data, size); ret = 0; } else { ret = -EREMOTEIO; } exit: kfree(data); return ret; } EXPORT_SYMBOL_GPL(usb_control_msg_recv); /** * usb_interrupt_msg - Builds an interrupt urb, sends it off and waits for completion * @usb_dev: pointer to the usb device to send the message to * @pipe: endpoint "pipe" to send the message to * @data: pointer to the data to send * @len: length in bytes of the data to send * @actual_length: pointer to a location to put the actual length transferred * in bytes * @timeout: time in msecs to wait for the message to complete before * timing out (if 0 the wait is forever) * * Context: task context, might sleep. * * This function sends a simple interrupt message to a specified endpoint and * waits for the message to complete, or timeout. * * Don't use this function from within an interrupt context. If you need * an asynchronous message, or need to send a message from within interrupt * context, use usb_submit_urb() If a thread in your driver uses this call, * make sure your disconnect() method can wait for it to complete. Since you * don't have a handle on the URB used, you can't cancel the request. * * Return: * If successful, 0. Otherwise a negative error number. The number of actual * bytes transferred will be stored in the @actual_length parameter. */ int usb_interrupt_msg(struct usb_device *usb_dev, unsigned int pipe, void *data, int len, int *actual_length, int timeout) { return usb_bulk_msg(usb_dev, pipe, data, len, actual_length, timeout); } EXPORT_SYMBOL_GPL(usb_interrupt_msg); /** * usb_bulk_msg - Builds a bulk urb, sends it off and waits for completion * @usb_dev: pointer to the usb device to send the message to * @pipe: endpoint "pipe" to send the message to * @data: pointer to the data to send * @len: length in bytes of the data to send * @actual_length: pointer to a location to put the actual length transferred * in bytes * @timeout: time in msecs to wait for the message to complete before * timing out (if 0 the wait is forever) * * Context: task context, might sleep. * * This function sends a simple bulk message to a specified endpoint * and waits for the message to complete, or timeout. * * Don't use this function from within an interrupt context. If you need * an asynchronous message, or need to send a message from within interrupt * context, use usb_submit_urb() If a thread in your driver uses this call, * make sure your disconnect() method can wait for it to complete. Since you * don't have a handle on the URB used, you can't cancel the request. * * Because there is no usb_interrupt_msg() and no USBDEVFS_INTERRUPT ioctl, * users are forced to abuse this routine by using it to submit URBs for * interrupt endpoints. We will take the liberty of creating an interrupt URB * (with the default interval) if the target is an interrupt endpoint. * * Return: * If successful, 0. Otherwise a negative error number. The number of actual * bytes transferred will be stored in the @actual_length parameter. 
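 *
 * A minimal usage sketch, for illustration only (the device pointer udev,
 * endpoint number 0x01 and the buffer buf of buflen bytes are hypothetical;
 * the buffer must be DMA-able, e.g. kmalloc()ed, not on the stack):
 *
 *	int actual;
 *	int ret;
 *
 *	ret = usb_bulk_msg(udev, usb_rcvbulkpipe(udev, 0x01),
 *			   buf, buflen, &actual, 5000);
 *
 * On success, ret is 0 and actual holds the number of bytes received
 * into buf.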
* */ int usb_bulk_msg(struct usb_device *usb_dev, unsigned int pipe, void *data, int len, int *actual_length, int timeout) { struct urb *urb; struct usb_host_endpoint *ep; ep = usb_pipe_endpoint(usb_dev, pipe); if (!ep || len < 0) return -EINVAL; urb = usb_alloc_urb(0, GFP_KERNEL); if (!urb) return -ENOMEM; if ((ep->desc.bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) == USB_ENDPOINT_XFER_INT) { pipe = (pipe & ~(3 << 30)) | (PIPE_INTERRUPT << 30); usb_fill_int_urb(urb, usb_dev, pipe, data, len, usb_api_blocking_completion, NULL, ep->desc.bInterval); } else usb_fill_bulk_urb(urb, usb_dev, pipe, data, len, usb_api_blocking_completion, NULL); return usb_start_wait_urb(urb, timeout, actual_length); } EXPORT_SYMBOL_GPL(usb_bulk_msg); /*-------------------------------------------------------------------*/ static void sg_clean(struct usb_sg_request *io) { if (io->urbs) { while (io->entries--) usb_free_urb(io->urbs[io->entries]); kfree(io->urbs); io->urbs = NULL; } io->dev = NULL; } static void sg_complete(struct urb *urb) { unsigned long flags; struct usb_sg_request *io = urb->context; int status = urb->status; spin_lock_irqsave(&io->lock, flags); /* In 2.5 we require hcds' endpoint queues not to progress after fault * reports, until the completion callback (this!) returns. That lets * device driver code (like this routine) unlink queued urbs first, * if it needs to, since the HC won't work on them at all. So it's * not possible for page N+1 to overwrite page N, and so on. * * That's only for "hard" faults; "soft" faults (unlinks) sometimes * complete before the HCD can get requests away from hardware, * though never during cleanup after a hard fault. */ if (io->status && (io->status != -ECONNRESET || status != -ECONNRESET) && urb->actual_length) { dev_err(io->dev->bus->controller, "dev %s ep%d%s scatterlist error %d/%d\n", io->dev->devpath, usb_endpoint_num(&urb->ep->desc), usb_urb_dir_in(urb) ? "in" : "out", status, io->status); /* BUG (); */ } if (io->status == 0 && status && status != -ECONNRESET) { int i, found, retval; io->status = status; /* the previous urbs, and this one, completed already. * unlink pending urbs so they won't rx/tx bad data. * careful: unlink can sometimes be synchronous... */ spin_unlock_irqrestore(&io->lock, flags); for (i = 0, found = 0; i < io->entries; i++) { if (!io->urbs[i]) continue; if (found) { usb_block_urb(io->urbs[i]); retval = usb_unlink_urb(io->urbs[i]); if (retval != -EINPROGRESS && retval != -ENODEV && retval != -EBUSY && retval != -EIDRM) dev_err(&io->dev->dev, "%s, unlink --> %d\n", __func__, retval); } else if (urb == io->urbs[i]) found = 1; } spin_lock_irqsave(&io->lock, flags); } /* on the last completion, signal usb_sg_wait() */ io->bytes += urb->actual_length; io->count--; if (!io->count) complete(&io->complete); spin_unlock_irqrestore(&io->lock, flags); } /** * usb_sg_init - initializes scatterlist-based bulk/interrupt I/O request * @io: request block being initialized. until usb_sg_wait() returns, * treat this as a pointer to an opaque block of memory, * @dev: the usb device that will send or receive the data * @pipe: endpoint "pipe" used to transfer the data * @period: polling rate for interrupt endpoints, in frames or * (for high speed endpoints) microframes; ignored for bulk * @sg: scatterlist entries * @nents: how many entries in the scatterlist * @length: how many bytes to send from the scatterlist, or zero to * send every byte identified in the list. 
* @mem_flags: SLAB_* flags affecting memory allocations in this call * * This initializes a scatter/gather request, allocating resources such as * I/O mappings and urb memory (except maybe memory used by USB controller * drivers). * * The request must be issued using usb_sg_wait(), which waits for the I/O to * complete (or to be canceled) and then cleans up all resources allocated by * usb_sg_init(). * * The request may be canceled with usb_sg_cancel(), either before or after * usb_sg_wait() is called. * * Return: Zero for success, else a negative errno value. */ int usb_sg_init(struct usb_sg_request *io, struct usb_device *dev, unsigned pipe, unsigned period, struct scatterlist *sg, int nents, size_t length, gfp_t mem_flags) { int i; int urb_flags; int use_sg; if (!io || !dev || !sg || usb_pipecontrol(pipe) || usb_pipeisoc(pipe) || nents <= 0) return -EINVAL; spin_lock_init(&io->lock); io->dev = dev; io->pipe = pipe; if (dev->bus->sg_tablesize > 0) { use_sg = true; io->entries = 1; } else { use_sg = false; io->entries = nents; } /* initialize all the urbs we'll use */ io->urbs = kmalloc_array(io->entries, sizeof(*io->urbs), mem_flags); if (!io->urbs) goto nomem; urb_flags = URB_NO_INTERRUPT; if (usb_pipein(pipe)) urb_flags |= URB_SHORT_NOT_OK; for_each_sg(sg, sg, io->entries, i) { struct urb *urb; unsigned len; urb = usb_alloc_urb(0, mem_flags); if (!urb) { io->entries = i; goto nomem; } io->urbs[i] = urb; urb->dev = NULL; urb->pipe = pipe; urb->interval = period; urb->transfer_flags = urb_flags; urb->complete = sg_complete; urb->context = io; urb->sg = sg; if (use_sg) { /* There is no single transfer buffer */ urb->transfer_buffer = NULL; urb->num_sgs = nents; /* A length of zero means transfer the whole sg list */ len = length; if (len == 0) { struct scatterlist *sg2; int j; for_each_sg(sg, sg2, nents, j) len += sg2->length; } } else { /* * Some systems can't use DMA; they use PIO instead. * For their sakes, transfer_buffer is set whenever * possible. */ if (!PageHighMem(sg_page(sg))) urb->transfer_buffer = sg_virt(sg); else urb->transfer_buffer = NULL; len = sg->length; if (length) { len = min_t(size_t, len, length); length -= len; if (length == 0) io->entries = i + 1; } } urb->transfer_buffer_length = len; } io->urbs[--i]->transfer_flags &= ~URB_NO_INTERRUPT; /* transaction state */ io->count = io->entries; io->status = 0; io->bytes = 0; init_completion(&io->complete); return 0; nomem: sg_clean(io); return -ENOMEM; } EXPORT_SYMBOL_GPL(usb_sg_init); /** * usb_sg_wait - synchronously execute scatter/gather request * @io: request block handle, as initialized with usb_sg_init(). * some fields become accessible when this call returns. * * Context: task context, might sleep. * * This function blocks until the specified I/O operation completes. It * leverages the grouping of the related I/O requests to get good transfer * rates, by queueing the requests. At higher speeds, such queuing can * significantly improve USB throughput. * * There are three kinds of completion for this function. * * (1) success, where io->status is zero. The number of io->bytes * transferred is as requested. * (2) error, where io->status is a negative errno value. The number * of io->bytes transferred before the error is usually less * than requested, and can be nonzero. * (3) cancellation, a type of error with status -ECONNRESET that * is initiated by usb_sg_cancel(). * * When this function returns, all memory allocated through usb_sg_init() or * this call will have been freed. 
 * The request block parameter may still be
 * passed to usb_sg_cancel(), or it may be freed. It could also be
 * reinitialized and then reused.
 *
 * Data Transfer Rates:
 *
 * Bulk transfers are valid for full or high speed endpoints.
 * The best full speed data rate is 19 packets of 64 bytes each
 * per frame, or 1216 bytes per millisecond.
 * The best high speed data rate is 13 packets of 512 bytes each
 * per microframe, or 52 KBytes per millisecond.
 *
 * The reason to use interrupt transfers through this API would most likely
 * be to reserve high speed bandwidth, where up to 24 KBytes per millisecond
 * could be transferred. That capability is less useful for low or full
 * speed interrupt endpoints, which allow at most one packet per millisecond,
 * of at most 8 or 64 bytes (respectively).
 *
 * It is not necessary to call this function to reserve bandwidth for devices
 * under an xHCI host controller, as the bandwidth is reserved when the
 * configuration or interface alt setting is selected.
 */
void usb_sg_wait(struct usb_sg_request *io)
{
	int i;
	int entries = io->entries;

	/* queue the urbs. */
	spin_lock_irq(&io->lock);
	i = 0;
	while (i < entries && !io->status) {
		int retval;

		io->urbs[i]->dev = io->dev;
		spin_unlock_irq(&io->lock);

		retval = usb_submit_urb(io->urbs[i], GFP_NOIO);

		switch (retval) {
			/* maybe retrying will recover */
		case -ENXIO:	/* hc didn't queue this one */
		case -EAGAIN:
		case -ENOMEM:
			retval = 0;
			yield();
			break;

			/* no error? continue immediately.
			 *
			 * NOTE: to work better with UHCI (4K I/O buffer may
			 * need 3K of TDs) it may be good to limit how many
			 * URBs are queued at once; N milliseconds?
			 */
		case 0:
			++i;
			cpu_relax();
			break;

			/* fail any uncompleted urbs */
		default:
			io->urbs[i]->status = retval;
			dev_dbg(&io->dev->dev, "%s, submit --> %d\n",
				__func__, retval);
			usb_sg_cancel(io);
		}
		spin_lock_irq(&io->lock);
		if (retval && (io->status == 0 || io->status == -ECONNRESET))
			io->status = retval;
	}
	io->count -= entries - i;
	if (io->count == 0)
		complete(&io->complete);
	spin_unlock_irq(&io->lock);

	/* OK, yes, this could be packaged as non-blocking.
	 * So could the submit loop above ... but it's easier to
	 * solve neither problem than to solve both!
	 */
	wait_for_completion(&io->complete);

	sg_clean(io);
}
EXPORT_SYMBOL_GPL(usb_sg_wait);

/**
 * usb_sg_cancel - stop scatter/gather i/o issued by usb_sg_wait()
 * @io: request block, initialized with usb_sg_init()
 *
 * This stops a request after it has been started by usb_sg_wait().
 * It can also prevent one initialized by usb_sg_init() from starting,
 * so that call just frees resources allocated to the request.
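 *
 * For orientation, a minimal sketch of the whole scatter/gather cycle,
 * for illustration only (udev, sg and nents stand for a caller-provided
 * device and scatterlist; endpoint 0x02 is hypothetical):
 *
 *	struct usb_sg_request io;
 *	int ret;
 *
 *	ret = usb_sg_init(&io, udev, usb_sndbulkpipe(udev, 0x02), 0,
 *			  sg, nents, 0, GFP_KERNEL);
 *	if (ret)
 *		return ret;
 *	usb_sg_wait(&io);
 *	ret = io.status;
 *
 * A concurrent thread could call usb_sg_cancel(&io) while usb_sg_wait()
 * blocks; io.status then ends up as -ECONNRESET.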
*/ void usb_sg_cancel(struct usb_sg_request *io) { unsigned long flags; int i, retval; spin_lock_irqsave(&io->lock, flags); if (io->status || io->count == 0) { spin_unlock_irqrestore(&io->lock, flags); return; } /* shut everything down */ io->status = -ECONNRESET; io->count++; /* Keep the request alive until we're done */ spin_unlock_irqrestore(&io->lock, flags); for (i = io->entries - 1; i >= 0; --i) { usb_block_urb(io->urbs[i]); retval = usb_unlink_urb(io->urbs[i]); if (retval != -EINPROGRESS && retval != -ENODEV && retval != -EBUSY && retval != -EIDRM) dev_warn(&io->dev->dev, "%s, unlink --> %d\n", __func__, retval); } spin_lock_irqsave(&io->lock, flags); io->count--; if (!io->count) complete(&io->complete); spin_unlock_irqrestore(&io->lock, flags); } EXPORT_SYMBOL_GPL(usb_sg_cancel); /*-------------------------------------------------------------------*/ /** * usb_get_descriptor - issues a generic GET_DESCRIPTOR request * @dev: the device whose descriptor is being retrieved * @type: the descriptor type (USB_DT_*) * @index: the number of the descriptor * @buf: where to put the descriptor * @size: how big is "buf"? * * Context: task context, might sleep. * * Gets a USB descriptor. Convenience functions exist to simplify * getting some types of descriptors. Use * usb_get_string() or usb_string() for USB_DT_STRING. * Device (USB_DT_DEVICE) and configuration descriptors (USB_DT_CONFIG) * are part of the device structure. * In addition to a number of USB-standard descriptors, some * devices also use class-specific or vendor-specific descriptors. * * This call is synchronous, and may not be used in an interrupt context. * * Return: The number of bytes received on success, or else the status code * returned by the underlying usb_control_msg() call. */ int usb_get_descriptor(struct usb_device *dev, unsigned char type, unsigned char index, void *buf, int size) { int i; int result; if (size <= 0) /* No point in asking for no data */ return -EINVAL; memset(buf, 0, size); /* Make sure we parse really received data */ for (i = 0; i < 3; ++i) { /* retry on length 0 or error; some devices are flakey */ result = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0), USB_REQ_GET_DESCRIPTOR, USB_DIR_IN, (type << 8) + index, 0, buf, size, USB_CTRL_GET_TIMEOUT); if (result <= 0 && result != -ETIMEDOUT) continue; if (result > 1 && ((u8 *)buf)[1] != type) { result = -ENODATA; continue; } break; } return result; } EXPORT_SYMBOL_GPL(usb_get_descriptor); /** * usb_get_string - gets a string descriptor * @dev: the device whose string descriptor is being retrieved * @langid: code for language chosen (from string descriptor zero) * @index: the number of the descriptor * @buf: where to put the string * @size: how big is "buf"? * * Context: task context, might sleep. * * Retrieves a string, encoded using UTF-16LE (Unicode, 16 bits per character, * in little-endian byte order). * The usb_string() function will often be a convenient way to turn * these strings into kernel-printable form. * * Strings may be referenced in device, configuration, interface, or other * descriptors, and could also be used in vendor-specific ways. * * This call is synchronous, and may not be used in an interrupt context. * * Return: The number of bytes received on success, or else the status code * returned by the underlying usb_control_msg() call. 
 */
static int usb_get_string(struct usb_device *dev, unsigned short langid,
			  unsigned char index, void *buf, int size)
{
	int i;
	int result;

	if (size <= 0)		/* No point in asking for no data */
		return -EINVAL;

	for (i = 0; i < 3; ++i) {
		/* retry on length 0 or stall; some devices are flakey */
		result = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0),
			USB_REQ_GET_DESCRIPTOR, USB_DIR_IN,
			(USB_DT_STRING << 8) + index, langid, buf, size,
			USB_CTRL_GET_TIMEOUT);
		if (result == 0 || result == -EPIPE)
			continue;
		if (result > 1 && ((u8 *) buf)[1] != USB_DT_STRING) {
			result = -ENODATA;
			continue;
		}
		break;
	}
	return result;
}

static void usb_try_string_workarounds(unsigned char *buf, int *length)
{
	int newlength, oldlength = *length;

	for (newlength = 2; newlength + 1 < oldlength; newlength += 2)
		if (!isprint(buf[newlength]) || buf[newlength + 1])
			break;

	if (newlength > 2) {
		buf[0] = newlength;
		*length = newlength;
	}
}

static int usb_string_sub(struct usb_device *dev, unsigned int langid,
			  unsigned int index, unsigned char *buf)
{
	int rc;

	/* Try to read the string descriptor by asking for the maximum
	 * possible number of bytes */
	if (dev->quirks & USB_QUIRK_STRING_FETCH_255)
		rc = -EIO;
	else
		rc = usb_get_string(dev, langid, index, buf, 255);

	/* If that failed try to read the descriptor length, then
	 * ask for just that many bytes */
	if (rc < 2) {
		rc = usb_get_string(dev, langid, index, buf, 2);
		if (rc == 2)
			rc = usb_get_string(dev, langid, index, buf, buf[0]);
	}

	if (rc >= 2) {
		if (!buf[0] && !buf[1])
			usb_try_string_workarounds(buf, &rc);

		/* There might be extra junk at the end of the descriptor */
		if (buf[0] < rc)
			rc = buf[0];

		rc = rc - (rc & 1);	/* force a multiple of two */
	}

	if (rc < 2)
		rc = (rc < 0 ? rc : -EINVAL);

	return rc;
}

static int usb_get_langid(struct usb_device *dev, unsigned char *tbuf)
{
	int err;

	if (dev->have_langid)
		return 0;

	if (dev->string_langid < 0)
		return -EPIPE;

	err = usb_string_sub(dev, 0, 0, tbuf);

	/* If the string was reported but is malformed, default to English
	 * (0x0409) */
	if (err == -ENODATA || (err > 0 && err < 4)) {
		dev->string_langid = 0x0409;
		dev->have_langid = 1;
		dev_err(&dev->dev,
			"language id specifier not provided by device, defaulting to English\n");
		return 0;
	}

	/* In case of all other errors, we assume the device is not able to
	 * deal with strings at all. Set string_langid to -1 in order to
	 * prevent any string from being retrieved from the device */
	if (err < 0) {
		dev_info(&dev->dev, "string descriptor 0 read error: %d\n",
					err);
		dev->string_langid = -1;
		return -EPIPE;
	}

	/* always use the first langid listed */
	dev->string_langid = tbuf[2] | (tbuf[3] << 8);
	dev->have_langid = 1;
	dev_dbg(&dev->dev, "default language 0x%04x\n",
				dev->string_langid);
	return 0;
}

/**
 * usb_string - returns UTF-8 version of a string descriptor
 * @dev: the device whose string descriptor is being retrieved
 * @index: the number of the descriptor
 * @buf: where to put the string
 * @size: how big is "buf"?
 *
 * Context: task context, might sleep.
 *
 * This converts the UTF-16LE encoded strings returned by devices, from
 * usb_get_string_descriptor(), to null-terminated UTF-8 encoded ones
 * that are more usable in most kernel contexts. Note that this function
 * chooses strings in the first language supported by the device.
 *
 * This call is synchronous, and may not be used in an interrupt context.
 *
 * Return: length of the string (>= 0) or usb_control_msg status (< 0).
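 *
 * A minimal usage sketch, for illustration only (udev is a hypothetical
 * device pointer; iProduct names the product string descriptor):
 *
 *	char name[64];
 *	int len;
 *
 *	len = usb_string(udev, udev->descriptor.iProduct,
 *			 name, sizeof(name));
 *
 * On success, len is the string length and name holds a NUL-terminated
 * UTF-8 copy of the descriptor.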
*/ int usb_string(struct usb_device *dev, int index, char *buf, size_t size) { unsigned char *tbuf; int err; if (dev->state == USB_STATE_SUSPENDED) return -EHOSTUNREACH; if (size <= 0 || !buf) return -EINVAL; buf[0] = 0; if (index <= 0 || index >= 256) return -EINVAL; tbuf = kmalloc(256, GFP_NOIO); if (!tbuf) return -ENOMEM; err = usb_get_langid(dev, tbuf); if (err < 0) goto errout; err = usb_string_sub(dev, dev->string_langid, index, tbuf); if (err < 0) goto errout; size--; /* leave room for trailing NULL char in output buffer */ err = utf16s_to_utf8s((wchar_t *) &tbuf[2], (err - 2) / 2, UTF16_LITTLE_ENDIAN, buf, size); buf[err] = 0; if (tbuf[1] != USB_DT_STRING) dev_dbg(&dev->dev, "wrong descriptor type %02x for string %d (\"%s\")\n", tbuf[1], index, buf); errout: kfree(tbuf); return err; } EXPORT_SYMBOL_GPL(usb_string); /* one UTF-8-encoded 16-bit character has at most three bytes */ #define MAX_USB_STRING_SIZE (127 * 3 + 1) /** * usb_cache_string - read a string descriptor and cache it for later use * @udev: the device whose string descriptor is being read * @index: the descriptor index * * Return: A pointer to a kmalloc'ed buffer containing the descriptor string, * or %NULL if the index is 0 or the string could not be read. */ char *usb_cache_string(struct usb_device *udev, int index) { char *buf; char *smallbuf = NULL; int len; if (index <= 0) return NULL; buf = kmalloc(MAX_USB_STRING_SIZE, GFP_NOIO); if (buf) { len = usb_string(udev, index, buf, MAX_USB_STRING_SIZE); if (len > 0) { smallbuf = kmalloc(++len, GFP_NOIO); if (!smallbuf) return buf; memcpy(smallbuf, buf, len); } kfree(buf); } return smallbuf; } EXPORT_SYMBOL_GPL(usb_cache_string); /* * usb_get_device_descriptor - read the device descriptor * @udev: the device whose device descriptor should be read * * Context: task context, might sleep. * * Not exported, only for use by the core. If drivers really want to read * the device descriptor directly, they can call usb_get_descriptor() with * type = USB_DT_DEVICE and index = 0. * * Returns: a pointer to a dynamically allocated usb_device_descriptor * structure (which the caller must deallocate), or an ERR_PTR value. */ struct usb_device_descriptor *usb_get_device_descriptor(struct usb_device *udev) { struct usb_device_descriptor *desc; int ret; desc = kmalloc(sizeof(*desc), GFP_NOIO); if (!desc) return ERR_PTR(-ENOMEM); ret = usb_get_descriptor(udev, USB_DT_DEVICE, 0, desc, sizeof(*desc)); if (ret == sizeof(*desc)) return desc; if (ret >= 0) ret = -EMSGSIZE; kfree(desc); return ERR_PTR(ret); } /* * usb_set_isoch_delay - informs the device of the packet transmit delay * @dev: the device whose delay is to be informed * Context: task context, might sleep * * Since this is an optional request, we don't bother if it fails. */ int usb_set_isoch_delay(struct usb_device *dev) { /* skip hub devices */ if (dev->descriptor.bDeviceClass == USB_CLASS_HUB) return 0; /* skip non-SS/non-SSP devices */ if (dev->speed < USB_SPEED_SUPER) return 0; return usb_control_msg_send(dev, 0, USB_REQ_SET_ISOCH_DELAY, USB_DIR_OUT | USB_TYPE_STANDARD | USB_RECIP_DEVICE, dev->hub_delay, 0, NULL, 0, USB_CTRL_SET_TIMEOUT, GFP_NOIO); } /** * usb_get_status - issues a GET_STATUS call * @dev: the device whose status is being checked * @recip: USB_RECIP_*; for device, interface, or endpoint * @type: USB_STATUS_TYPE_*; for standard or PTM status types * @target: zero (for device), else interface or endpoint number * @data: pointer to two bytes of bitmap data * * Context: task context, might sleep. 
* * Returns device, interface, or endpoint status. Normally only of * interest to see if the device is self powered, or has enabled the * remote wakeup facility; or whether a bulk or interrupt endpoint * is halted ("stalled"). * * Bits in these status bitmaps are set using the SET_FEATURE request, * and cleared using the CLEAR_FEATURE request. The usb_clear_halt() * function should be used to clear halt ("stall") status. * * This call is synchronous, and may not be used in an interrupt context. * * Returns 0 and the status value in *@data (in host byte order) on success, * or else the status code from the underlying usb_control_msg() call. */ int usb_get_status(struct usb_device *dev, int recip, int type, int target, void *data) { int ret; void *status; int length; switch (type) { case USB_STATUS_TYPE_STANDARD: length = 2; break; case USB_STATUS_TYPE_PTM: if (recip != USB_RECIP_DEVICE) return -EINVAL; length = 4; break; default: return -EINVAL; } status = kmalloc(length, GFP_KERNEL); if (!status) return -ENOMEM; ret = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0), USB_REQ_GET_STATUS, USB_DIR_IN | recip, USB_STATUS_TYPE_STANDARD, target, status, length, USB_CTRL_GET_TIMEOUT); switch (ret) { case 4: if (type != USB_STATUS_TYPE_PTM) { ret = -EIO; break; } *(u32 *) data = le32_to_cpu(*(__le32 *) status); ret = 0; break; case 2: if (type != USB_STATUS_TYPE_STANDARD) { ret = -EIO; break; } *(u16 *) data = le16_to_cpu(*(__le16 *) status); ret = 0; break; default: ret = -EIO; } kfree(status); return ret; } EXPORT_SYMBOL_GPL(usb_get_status); /** * usb_clear_halt - tells device to clear endpoint halt/stall condition * @dev: device whose endpoint is halted * @pipe: endpoint "pipe" being cleared * * Context: task context, might sleep. * * This is used to clear halt conditions for bulk and interrupt endpoints, * as reported by URB completion status. Endpoints that are halted are * sometimes referred to as being "stalled". Such endpoints are unable * to transmit or receive data until the halt status is cleared. Any URBs * queued for such an endpoint should normally be unlinked by the driver * before clearing the halt condition, as described in sections 5.7.5 * and 5.8.5 of the USB 2.0 spec. * * Note that control and isochronous endpoints don't halt, although control * endpoints report "protocol stall" (for unsupported requests) using the * same status code used to report a true stall. * * This call is synchronous, and may not be used in an interrupt context. * If a thread in your driver uses this call, make sure your disconnect() * method can wait for it to complete. * * Return: Zero on success, or else the status code returned by the * underlying usb_control_msg() call. */ int usb_clear_halt(struct usb_device *dev, int pipe) { int result; int endp = usb_pipeendpoint(pipe); if (usb_pipein(pipe)) endp |= USB_DIR_IN; /* we don't care if it wasn't halted first. in fact some devices * (like some ibmcam model 1 units) seem to expect hosts to make * this request for iso endpoints, which can't halt! */ result = usb_control_msg_send(dev, 0, USB_REQ_CLEAR_FEATURE, USB_RECIP_ENDPOINT, USB_ENDPOINT_HALT, endp, NULL, 0, USB_CTRL_SET_TIMEOUT, GFP_NOIO); /* don't un-halt or force to DATA0 except on success */ if (result) return result; /* NOTE: seems like Microsoft and Apple don't bother verifying * the clear "took", so some devices could lock up if you check... * such as the Hagiwara FlashGate DUAL. So we won't bother. 
* * NOTE: make sure the logic here doesn't diverge much from * the copy in usb-storage, for as long as we need two copies. */ usb_reset_endpoint(dev, endp); return 0; } EXPORT_SYMBOL_GPL(usb_clear_halt); static int create_intf_ep_devs(struct usb_interface *intf) { struct usb_device *udev = interface_to_usbdev(intf); struct usb_host_interface *alt = intf->cur_altsetting; int i; if (intf->ep_devs_created || intf->unregistering) return 0; for (i = 0; i < alt->desc.bNumEndpoints; ++i) (void) usb_create_ep_devs(&intf->dev, &alt->endpoint[i], udev); intf->ep_devs_created = 1; return 0; } static void remove_intf_ep_devs(struct usb_interface *intf) { struct usb_host_interface *alt = intf->cur_altsetting; int i; if (!intf->ep_devs_created) return; for (i = 0; i < alt->desc.bNumEndpoints; ++i) usb_remove_ep_devs(&alt->endpoint[i]); intf->ep_devs_created = 0; } /** * usb_disable_endpoint -- Disable an endpoint by address * @dev: the device whose endpoint is being disabled * @epaddr: the endpoint's address. Endpoint number for output, * endpoint number + USB_DIR_IN for input * @reset_hardware: flag to erase any endpoint state stored in the * controller hardware * * Disables the endpoint for URB submission and nukes all pending URBs. * If @reset_hardware is set then also deallocates hcd/hardware state * for the endpoint. */ void usb_disable_endpoint(struct usb_device *dev, unsigned int epaddr, bool reset_hardware) { unsigned int epnum = epaddr & USB_ENDPOINT_NUMBER_MASK; struct usb_host_endpoint *ep; if (!dev) return; if (usb_endpoint_out(epaddr)) { ep = dev->ep_out[epnum]; if (reset_hardware && epnum != 0) dev->ep_out[epnum] = NULL; } else { ep = dev->ep_in[epnum]; if (reset_hardware && epnum != 0) dev->ep_in[epnum] = NULL; } if (ep) { ep->enabled = 0; usb_hcd_flush_endpoint(dev, ep); if (reset_hardware) usb_hcd_disable_endpoint(dev, ep); } } /** * usb_reset_endpoint - Reset an endpoint's state. * @dev: the device whose endpoint is to be reset * @epaddr: the endpoint's address. Endpoint number for output, * endpoint number + USB_DIR_IN for input * * Resets any host-side endpoint state such as the toggle bit, * sequence number or current window. */ void usb_reset_endpoint(struct usb_device *dev, unsigned int epaddr) { unsigned int epnum = epaddr & USB_ENDPOINT_NUMBER_MASK; struct usb_host_endpoint *ep; if (usb_endpoint_out(epaddr)) ep = dev->ep_out[epnum]; else ep = dev->ep_in[epnum]; if (ep) usb_hcd_reset_endpoint(dev, ep); } EXPORT_SYMBOL_GPL(usb_reset_endpoint); /** * usb_disable_interface -- Disable all endpoints for an interface * @dev: the device whose interface is being disabled * @intf: pointer to the interface descriptor * @reset_hardware: flag to erase any endpoint state stored in the * controller hardware * * Disables all the endpoints for the interface's current altsetting. */ void usb_disable_interface(struct usb_device *dev, struct usb_interface *intf, bool reset_hardware) { struct usb_host_interface *alt = intf->cur_altsetting; int i; for (i = 0; i < alt->desc.bNumEndpoints; ++i) { usb_disable_endpoint(dev, alt->endpoint[i].desc.bEndpointAddress, reset_hardware); } } /* * usb_disable_device_endpoints -- Disable all endpoints for a device * @dev: the device whose endpoints are being disabled * @skip_ep0: 0 to disable endpoint 0, 1 to skip it. */ static void usb_disable_device_endpoints(struct usb_device *dev, int skip_ep0) { struct usb_hcd *hcd = bus_to_hcd(dev->bus); int i; if (hcd->driver->check_bandwidth) { /* First pass: Cancel URBs, leave endpoint pointers intact. 
*/ for (i = skip_ep0; i < 16; ++i) { usb_disable_endpoint(dev, i, false); usb_disable_endpoint(dev, i + USB_DIR_IN, false); } /* Remove endpoints from the host controller internal state */ mutex_lock(hcd->bandwidth_mutex); usb_hcd_alloc_bandwidth(dev, NULL, NULL, NULL); mutex_unlock(hcd->bandwidth_mutex); } /* Second pass: remove endpoint pointers */ for (i = skip_ep0; i < 16; ++i) { usb_disable_endpoint(dev, i, true); usb_disable_endpoint(dev, i + USB_DIR_IN, true); } } /** * usb_disable_device - Disable all the endpoints for a USB device * @dev: the device whose endpoints are being disabled * @skip_ep0: 0 to disable endpoint 0, 1 to skip it. * * Disables all the device's endpoints, potentially including endpoint 0. * Deallocates hcd/hardware state for the endpoints (nuking all or most * pending urbs) and usbcore state for the interfaces, so that usbcore * must usb_set_configuration() before any interfaces could be used. */ void usb_disable_device(struct usb_device *dev, int skip_ep0) { int i; /* getting rid of interfaces will disconnect * any drivers bound to them (a key side effect) */ if (dev->actconfig) { /* * FIXME: In order to avoid self-deadlock involving the * bandwidth_mutex, we have to mark all the interfaces * before unregistering any of them. */ for (i = 0; i < dev->actconfig->desc.bNumInterfaces; i++) dev->actconfig->interface[i]->unregistering = 1; for (i = 0; i < dev->actconfig->desc.bNumInterfaces; i++) { struct usb_interface *interface; /* remove this interface if it has been registered */ interface = dev->actconfig->interface[i]; if (!device_is_registered(&interface->dev)) continue; dev_dbg(&dev->dev, "unregistering interface %s\n", dev_name(&interface->dev)); remove_intf_ep_devs(interface); device_del(&interface->dev); } /* Now that the interfaces are unbound, nobody should * try to access them. */ for (i = 0; i < dev->actconfig->desc.bNumInterfaces; i++) { put_device(&dev->actconfig->interface[i]->dev); dev->actconfig->interface[i] = NULL; } usb_disable_usb2_hardware_lpm(dev); usb_unlocked_disable_lpm(dev); usb_disable_ltm(dev); dev->actconfig = NULL; if (dev->state == USB_STATE_CONFIGURED) usb_set_device_state(dev, USB_STATE_ADDRESS); } dev_dbg(&dev->dev, "%s nuking %s URBs\n", __func__, skip_ep0 ? "non-ep0" : "all"); usb_disable_device_endpoints(dev, skip_ep0); } /** * usb_enable_endpoint - Enable an endpoint for USB communications * @dev: the device whose interface is being enabled * @ep: the endpoint * @reset_ep: flag to reset the endpoint state * * Resets the endpoint state if asked, and sets dev->ep_{in,out} pointers. * For control endpoints, both the input and output sides are handled. */ void usb_enable_endpoint(struct usb_device *dev, struct usb_host_endpoint *ep, bool reset_ep) { int epnum = usb_endpoint_num(&ep->desc); int is_out = usb_endpoint_dir_out(&ep->desc); int is_control = usb_endpoint_xfer_control(&ep->desc); if (reset_ep) usb_hcd_reset_endpoint(dev, ep); if (is_out || is_control) dev->ep_out[epnum] = ep; if (!is_out || is_control) dev->ep_in[epnum] = ep; ep->enabled = 1; } /** * usb_enable_interface - Enable all the endpoints for an interface * @dev: the device whose interface is being enabled * @intf: pointer to the interface descriptor * @reset_eps: flag to reset the endpoints' state * * Enables all the endpoints for the interface's current altsetting. 
*/ void usb_enable_interface(struct usb_device *dev, struct usb_interface *intf, bool reset_eps) { struct usb_host_interface *alt = intf->cur_altsetting; int i; for (i = 0; i < alt->desc.bNumEndpoints; ++i) usb_enable_endpoint(dev, &alt->endpoint[i], reset_eps); } /** * usb_set_interface - Makes a particular alternate setting be current * @dev: the device whose interface is being updated * @interface: the interface being updated * @alternate: the setting being chosen. * * Context: task context, might sleep. * * This is used to enable data transfers on interfaces that may not * be enabled by default. Not all devices support such configurability. * Only the driver bound to an interface may change its setting. * * Within any given configuration, each interface may have several * alternative settings. These are often used to control levels of * bandwidth consumption. For example, the default setting for a high * speed interrupt endpoint may not send more than 64 bytes per microframe, * while interrupt transfers of up to 3KBytes per microframe are legal. * Also, isochronous endpoints may never be part of an * interface's default setting. To access such bandwidth, alternate * interface settings must be made current. * * Note that in the Linux USB subsystem, bandwidth associated with * an endpoint in a given alternate setting is not reserved until an URB * is submitted that needs that bandwidth. Some other operating systems * allocate bandwidth early, when a configuration is chosen. * * xHCI reserves bandwidth and configures the alternate setting in * usb_hcd_alloc_bandwidth(). If it fails the original interface altsetting * may be disabled. Drivers cannot rely on any particular alternate * setting being in effect after a failure. * * This call is synchronous, and may not be used in an interrupt context. * Also, drivers must not change altsettings while urbs are scheduled for * endpoints in that interface; all such urbs must first be completed * (perhaps forced by unlinking). If a thread in your driver uses this call, * make sure your disconnect() method can wait for it to complete. * * Return: Zero on success, or else the status code returned by the * underlying usb_control_msg() call. */ int usb_set_interface(struct usb_device *dev, int interface, int alternate) { struct usb_interface *iface; struct usb_host_interface *alt; struct usb_hcd *hcd = bus_to_hcd(dev->bus); int i, ret, manual = 0; unsigned int epaddr; unsigned int pipe; if (dev->state == USB_STATE_SUSPENDED) return -EHOSTUNREACH; iface = usb_ifnum_to_if(dev, interface); if (!iface) { dev_dbg(&dev->dev, "selecting invalid interface %d\n", interface); return -EINVAL; } if (iface->unregistering) return -ENODEV; alt = usb_altnum_to_altsetting(iface, alternate); if (!alt) { dev_warn(&dev->dev, "selecting invalid altsetting %d\n", alternate); return -EINVAL; } /* * usb3 hosts configure the interface in usb_hcd_alloc_bandwidth, * including freeing dropped endpoint ring buffers. * Make sure the interface endpoints are flushed before that */ usb_disable_interface(dev, iface, false); /* Make sure we have enough bandwidth for this alternate interface. * Remove the current alt setting and add the new alt setting. */ mutex_lock(hcd->bandwidth_mutex); /* Disable LPM, and re-enable it once the new alt setting is installed, * so that the xHCI driver can recalculate the U1/U2 timeouts. 
*/ if (usb_disable_lpm(dev)) { dev_err(&iface->dev, "%s Failed to disable LPM\n", __func__); mutex_unlock(hcd->bandwidth_mutex); return -ENOMEM; } /* Changing alt-setting also frees any allocated streams */ for (i = 0; i < iface->cur_altsetting->desc.bNumEndpoints; i++) iface->cur_altsetting->endpoint[i].streams = 0; ret = usb_hcd_alloc_bandwidth(dev, NULL, iface->cur_altsetting, alt); if (ret < 0) { dev_info(&dev->dev, "Not enough bandwidth for altsetting %d\n", alternate); usb_enable_lpm(dev); mutex_unlock(hcd->bandwidth_mutex); return ret; } if (dev->quirks & USB_QUIRK_NO_SET_INTF) ret = -EPIPE; else ret = usb_control_msg_send(dev, 0, USB_REQ_SET_INTERFACE, USB_RECIP_INTERFACE, alternate, interface, NULL, 0, 5000, GFP_NOIO); /* 9.4.10 says devices don't need this and are free to STALL the * request if the interface only has one alternate setting. */ if (ret == -EPIPE && iface->num_altsetting == 1) { dev_dbg(&dev->dev, "manual set_interface for iface %d, alt %d\n", interface, alternate); manual = 1; } else if (ret) { /* Re-instate the old alt setting */ usb_hcd_alloc_bandwidth(dev, NULL, alt, iface->cur_altsetting); usb_enable_lpm(dev); mutex_unlock(hcd->bandwidth_mutex); return ret; } mutex_unlock(hcd->bandwidth_mutex); /* FIXME drivers shouldn't need to replicate/bugfix the logic here * when they implement async or easily-killable versions of this or * other "should-be-internal" functions (like clear_halt). * should hcd+usbcore postprocess control requests? */ /* prevent submissions using previous endpoint settings */ if (iface->cur_altsetting != alt) { remove_intf_ep_devs(iface); usb_remove_sysfs_intf_files(iface); } usb_disable_interface(dev, iface, true); iface->cur_altsetting = alt; /* Now that the interface is installed, re-enable LPM. */ usb_unlocked_enable_lpm(dev); /* If the interface only has one altsetting and the device didn't * accept the request, we attempt to carry out the equivalent action * by manually clearing the HALT feature for each endpoint in the * new altsetting. */ if (manual) { for (i = 0; i < alt->desc.bNumEndpoints; i++) { epaddr = alt->endpoint[i].desc.bEndpointAddress; pipe = __create_pipe(dev, USB_ENDPOINT_NUMBER_MASK & epaddr) | (usb_endpoint_out(epaddr) ? USB_DIR_OUT : USB_DIR_IN); usb_clear_halt(dev, pipe); } } /* 9.1.1.5: reset toggles for all endpoints in the new altsetting * * Note: * Despite EP0 is always present in all interfaces/AS, the list of * endpoints from the descriptor does not contain EP0. Due to its * omnipresence one might expect EP0 being considered "affected" by * any SetInterface request and hence assume toggles need to be reset. * However, EP0 toggles are re-synced for every individual transfer * during the SETUP stage - hence EP0 toggles are "don't care" here. * (Likewise, EP0 never "halts" on well designed devices.) */ usb_enable_interface(dev, iface, true); if (device_is_registered(&iface->dev)) { usb_create_sysfs_intf_files(iface); create_intf_ep_devs(iface); } return 0; } EXPORT_SYMBOL_GPL(usb_set_interface); /** * usb_reset_configuration - lightweight device reset * @dev: the device whose configuration is being reset * * This issues a standard SET_CONFIGURATION request to the device using * the current configuration. The effect is to reset most USB-related * state in the device, including interface altsettings (reset to zero), * endpoint halts (cleared), and endpoint state (only for bulk and interrupt * endpoints). Other usbcore state is unchanged, including bindings of * usb device drivers to interfaces. 
* * Because this affects multiple interfaces, avoid using this with composite * (multi-interface) devices. Instead, the driver for each interface may * use usb_set_interface() on the interfaces it claims. Be careful though; * some devices don't support the SET_INTERFACE request, and others won't * reset all the interface state (notably endpoint state). Resetting the whole * configuration would affect other drivers' interfaces. * * The caller must own the device lock. * * Return: Zero on success, else a negative error code. * * If this routine fails the device will probably be in an unusable state * with endpoints disabled, and interfaces only partially enabled. */ int usb_reset_configuration(struct usb_device *dev) { int i, retval; struct usb_host_config *config; struct usb_hcd *hcd = bus_to_hcd(dev->bus); if (dev->state == USB_STATE_SUSPENDED) return -EHOSTUNREACH; /* caller must have locked the device and must own * the usb bus readlock (so driver bindings are stable); * calls during probe() are fine */ usb_disable_device_endpoints(dev, 1); /* skip ep0*/ config = dev->actconfig; retval = 0; mutex_lock(hcd->bandwidth_mutex); /* Disable LPM, and re-enable it once the configuration is reset, so * that the xHCI driver can recalculate the U1/U2 timeouts. */ if (usb_disable_lpm(dev)) { dev_err(&dev->dev, "%s Failed to disable LPM\n", __func__); mutex_unlock(hcd->bandwidth_mutex); return -ENOMEM; } /* xHCI adds all endpoints in usb_hcd_alloc_bandwidth */ retval = usb_hcd_alloc_bandwidth(dev, config, NULL, NULL); if (retval < 0) { usb_enable_lpm(dev); mutex_unlock(hcd->bandwidth_mutex); return retval; } retval = usb_control_msg_send(dev, 0, USB_REQ_SET_CONFIGURATION, 0, config->desc.bConfigurationValue, 0, NULL, 0, USB_CTRL_SET_TIMEOUT, GFP_NOIO); if (retval) { usb_hcd_alloc_bandwidth(dev, NULL, NULL, NULL); usb_enable_lpm(dev); mutex_unlock(hcd->bandwidth_mutex); return retval; } mutex_unlock(hcd->bandwidth_mutex); /* re-init hc/hcd interface/endpoint state */ for (i = 0; i < config->desc.bNumInterfaces; i++) { struct usb_interface *intf = config->interface[i]; struct usb_host_interface *alt; alt = usb_altnum_to_altsetting(intf, 0); /* No altsetting 0? We'll assume the first altsetting. * We could use a GetInterface call, but if a device is * so non-compliant that it doesn't have altsetting 0 * then I wouldn't trust its reply anyway. */ if (!alt) alt = &intf->altsetting[0]; if (alt != intf->cur_altsetting) { remove_intf_ep_devs(intf); usb_remove_sysfs_intf_files(intf); } intf->cur_altsetting = alt; usb_enable_interface(dev, intf, true); if (device_is_registered(&intf->dev)) { usb_create_sysfs_intf_files(intf); create_intf_ep_devs(intf); } } /* Now that the interfaces are installed, re-enable LPM. 
*/ usb_unlocked_enable_lpm(dev); return 0; } EXPORT_SYMBOL_GPL(usb_reset_configuration); static void usb_release_interface(struct device *dev) { struct usb_interface *intf = to_usb_interface(dev); struct usb_interface_cache *intfc = altsetting_to_usb_interface_cache(intf->altsetting); kref_put(&intfc->ref, usb_release_interface_cache); usb_put_dev(interface_to_usbdev(intf)); of_node_put(dev->of_node); kfree(intf); } /* * usb_deauthorize_interface - deauthorize an USB interface * * @intf: USB interface structure */ void usb_deauthorize_interface(struct usb_interface *intf) { struct device *dev = &intf->dev; device_lock(dev->parent); if (intf->authorized) { device_lock(dev); intf->authorized = 0; device_unlock(dev); usb_forced_unbind_intf(intf); } device_unlock(dev->parent); } /* * usb_authorize_interface - authorize an USB interface * * @intf: USB interface structure */ void usb_authorize_interface(struct usb_interface *intf) { struct device *dev = &intf->dev; if (!intf->authorized) { device_lock(dev); intf->authorized = 1; /* authorize interface */ device_unlock(dev); } } static int usb_if_uevent(const struct device *dev, struct kobj_uevent_env *env) { const struct usb_device *usb_dev; const struct usb_interface *intf; const struct usb_host_interface *alt; intf = to_usb_interface(dev); usb_dev = interface_to_usbdev(intf); alt = intf->cur_altsetting; if (add_uevent_var(env, "INTERFACE=%d/%d/%d", alt->desc.bInterfaceClass, alt->desc.bInterfaceSubClass, alt->desc.bInterfaceProtocol)) return -ENOMEM; if (add_uevent_var(env, "MODALIAS=usb:" "v%04Xp%04Xd%04Xdc%02Xdsc%02Xdp%02Xic%02Xisc%02Xip%02Xin%02X", le16_to_cpu(usb_dev->descriptor.idVendor), le16_to_cpu(usb_dev->descriptor.idProduct), le16_to_cpu(usb_dev->descriptor.bcdDevice), usb_dev->descriptor.bDeviceClass, usb_dev->descriptor.bDeviceSubClass, usb_dev->descriptor.bDeviceProtocol, alt->desc.bInterfaceClass, alt->desc.bInterfaceSubClass, alt->desc.bInterfaceProtocol, alt->desc.bInterfaceNumber)) return -ENOMEM; return 0; } const struct device_type usb_if_device_type = { .name = "usb_interface", .release = usb_release_interface, .uevent = usb_if_uevent, }; static struct usb_interface_assoc_descriptor *find_iad(struct usb_device *dev, struct usb_host_config *config, u8 inum) { struct usb_interface_assoc_descriptor *retval = NULL; struct usb_interface_assoc_descriptor *intf_assoc; int first_intf; int last_intf; int i; for (i = 0; (i < USB_MAXIADS && config->intf_assoc[i]); i++) { intf_assoc = config->intf_assoc[i]; if (intf_assoc->bInterfaceCount == 0) continue; first_intf = intf_assoc->bFirstInterface; last_intf = first_intf + (intf_assoc->bInterfaceCount - 1); if (inum >= first_intf && inum <= last_intf) { if (!retval) retval = intf_assoc; else dev_err(&dev->dev, "Interface #%d referenced" " by multiple IADs\n", inum); } } return retval; } /* * Internal function to queue a device reset * See usb_queue_reset_device() for more details */ static void __usb_queue_reset_device(struct work_struct *ws) { int rc; struct usb_interface *iface = container_of(ws, struct usb_interface, reset_ws); struct usb_device *udev = interface_to_usbdev(iface); rc = usb_lock_device_for_reset(udev, iface); if (rc >= 0) { usb_reset_device(udev); usb_unlock_device(udev); } usb_put_intf(iface); /* Undo _get_ in usb_queue_reset_device() */ } /* * Internal function to set the wireless_status sysfs attribute * See usb_set_wireless_status() for more details */ static void __usb_wireless_status_intf(struct work_struct *ws) { struct usb_interface *iface = container_of(ws, 
					struct usb_interface, wireless_status_work);

	device_lock(iface->dev.parent);

	if (iface->sysfs_files_created)
		usb_update_wireless_status_attr(iface);

	device_unlock(iface->dev.parent);
	usb_put_intf(iface);	/* Undo _get_ in usb_set_wireless_status() */
}

/**
 * usb_set_wireless_status - sets the wireless_status struct member
 * @iface: the interface to modify
 * @status: the new wireless status
 *
 * Set the wireless_status struct member to the new value, and emit
 * sysfs changes as necessary.
 *
 * Returns: 0 on success, -EALREADY if already set.
 */
int usb_set_wireless_status(struct usb_interface *iface,
		enum usb_wireless_status status)
{
	if (iface->wireless_status == status)
		return -EALREADY;

	usb_get_intf(iface);
	iface->wireless_status = status;
	schedule_work(&iface->wireless_status_work);

	return 0;
}
EXPORT_SYMBOL_GPL(usb_set_wireless_status);

/*
 * usb_set_configuration - Makes a particular device setting be current
 * @dev: the device whose configuration is being updated
 * @configuration: the configuration being chosen.
 *
 * Context: task context, might sleep. Caller holds device lock.
 *
 * This is used to enable non-default device modes. Not all devices
 * use this kind of configurability; many devices only have one
 * configuration.
 *
 * @configuration is the value of the configuration to be installed.
 * According to the USB spec (e.g. section 9.1.1.5), configuration values
 * must be non-zero; a value of zero indicates that the device is
 * unconfigured. However some devices erroneously use 0 as one of their
 * configuration values. To help manage such devices, this routine will
 * accept @configuration = -1 as indicating the device should be put in
 * an unconfigured state.
 *
 * USB device configurations may affect Linux interoperability,
 * power consumption and the functionality available. For example,
 * the default configuration is limited to using 100mA of bus power,
 * so that when certain device functionality requires more power,
 * and the device is bus powered, that functionality should be in some
 * non-default device configuration. Other device modes may also be
 * reflected as configuration options, such as whether two ISDN
 * channels are available independently; and choosing between open
 * standard device protocols (like CDC) or proprietary ones.
 *
 * Note that a non-authorized device (dev->authorized == 0) will only
 * be put in unconfigured mode.
 *
 * Note that USB has an additional level of device configurability,
 * associated with interfaces. That configurability is accessed using
 * usb_set_interface().
 *
 * This call is synchronous. The calling context must be able to sleep,
 * must own the device lock, and must not hold the driver model's USB
 * bus mutex; usb interface driver probe() methods cannot use this routine.
 *
 * Returns zero on success, or else the status code returned by the
 * underlying call that failed. On successful completion, each interface
 * in the original device configuration has been destroyed, and each one
 * in the new configuration has been probed by all relevant usb device
 * drivers currently known to the kernel.
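 *
 * A minimal usage sketch, for illustration only (configuration value 1
 * and the device pointer udev are hypothetical; callers outside usbcore
 * are rare, and interface drivers must use
 * usb_driver_set_configuration() instead):
 *
 *	usb_lock_device(udev);
 *	ret = usb_set_configuration(udev, 1);
 *	usb_unlock_device(udev);
 *
 * On success, every interface of the newly installed configuration has
 * been registered and offered to matching drivers.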
*/ int usb_set_configuration(struct usb_device *dev, int configuration) { int i, ret; struct usb_host_config *cp = NULL; struct usb_interface **new_interfaces = NULL; struct usb_hcd *hcd = bus_to_hcd(dev->bus); int n, nintf; if (dev->authorized == 0 || configuration == -1) configuration = 0; else { for (i = 0; i < dev->descriptor.bNumConfigurations; i++) { if (dev->config[i].desc.bConfigurationValue == configuration) { cp = &dev->config[i]; break; } } } if ((!cp && configuration != 0)) return -EINVAL; /* The USB spec says configuration 0 means unconfigured. * But if a device includes a configuration numbered 0, * we will accept it as a correctly configured state. * Use -1 if you really want to unconfigure the device. */ if (cp && configuration == 0) dev_warn(&dev->dev, "config 0 descriptor??\n"); /* Allocate memory for new interfaces before doing anything else, * so that if we run out then nothing will have changed. */ n = nintf = 0; if (cp) { nintf = cp->desc.bNumInterfaces; new_interfaces = kmalloc_array(nintf, sizeof(*new_interfaces), GFP_NOIO); if (!new_interfaces) return -ENOMEM; for (; n < nintf; ++n) { new_interfaces[n] = kzalloc( sizeof(struct usb_interface), GFP_NOIO); if (!new_interfaces[n]) { ret = -ENOMEM; free_interfaces: while (--n >= 0) kfree(new_interfaces[n]); kfree(new_interfaces); return ret; } } i = dev->bus_mA - usb_get_max_power(dev, cp); if (i < 0) dev_warn(&dev->dev, "new config #%d exceeds power " "limit by %dmA\n", configuration, -i); } /* Wake up the device so we can send it the Set-Config request */ ret = usb_autoresume_device(dev); if (ret) goto free_interfaces; /* if it's already configured, clear out old state first. * getting rid of old interfaces means unbinding their drivers. */ if (dev->state != USB_STATE_ADDRESS) usb_disable_device(dev, 1); /* Skip ep0 */ /* Get rid of pending async Set-Config requests for this device */ cancel_async_set_config(dev); /* Make sure we have bandwidth (and available HCD resources) for this * configuration. Remove endpoints from the schedule if we're dropping * this configuration to set configuration 0. After this point, the * host controller will not allow submissions to dropped endpoints. If * this call fails, the device state is unchanged. */ mutex_lock(hcd->bandwidth_mutex); /* Disable LPM, and re-enable it once the new configuration is * installed, so that the xHCI driver can recalculate the U1/U2 * timeouts. */ if (dev->actconfig && usb_disable_lpm(dev)) { dev_err(&dev->dev, "%s Failed to disable LPM\n", __func__); mutex_unlock(hcd->bandwidth_mutex); ret = -ENOMEM; goto free_interfaces; } ret = usb_hcd_alloc_bandwidth(dev, cp, NULL, NULL); if (ret < 0) { if (dev->actconfig) usb_enable_lpm(dev); mutex_unlock(hcd->bandwidth_mutex); usb_autosuspend_device(dev); goto free_interfaces; } /* * Initialize the new interface structures and the * hc/hcd/usbcore interface/endpoint state. */ for (i = 0; i < nintf; ++i) { struct usb_interface_cache *intfc; struct usb_interface *intf; struct usb_host_interface *alt; u8 ifnum; cp->interface[i] = intf = new_interfaces[i]; intfc = cp->intf_cache[i]; intf->altsetting = intfc->altsetting; intf->num_altsetting = intfc->num_altsetting; intf->authorized = !!HCD_INTF_AUTHORIZED(hcd); kref_get(&intfc->ref); alt = usb_altnum_to_altsetting(intf, 0); /* No altsetting 0? We'll assume the first altsetting. * We could use a GetInterface call, but if a device is * so non-compliant that it doesn't have altsetting 0 * then I wouldn't trust its reply anyway. 
*/ if (!alt) alt = &intf->altsetting[0]; ifnum = alt->desc.bInterfaceNumber; intf->intf_assoc = find_iad(dev, cp, ifnum); intf->cur_altsetting = alt; usb_enable_interface(dev, intf, true); intf->dev.parent = &dev->dev; if (usb_of_has_combined_node(dev)) { device_set_of_node_from_dev(&intf->dev, &dev->dev); } else { intf->dev.of_node = usb_of_get_interface_node(dev, configuration, ifnum); } ACPI_COMPANION_SET(&intf->dev, ACPI_COMPANION(&dev->dev)); intf->dev.driver = NULL; intf->dev.bus = &usb_bus_type; intf->dev.type = &usb_if_device_type; intf->dev.groups = usb_interface_groups; INIT_WORK(&intf->reset_ws, __usb_queue_reset_device); INIT_WORK(&intf->wireless_status_work, __usb_wireless_status_intf); intf->minor = -1; device_initialize(&intf->dev); pm_runtime_no_callbacks(&intf->dev); dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum, dev->devpath, configuration, ifnum); usb_get_dev(dev); } kfree(new_interfaces); ret = usb_control_msg_send(dev, 0, USB_REQ_SET_CONFIGURATION, 0, configuration, 0, NULL, 0, USB_CTRL_SET_TIMEOUT, GFP_NOIO); if (ret && cp) { /* * All the old state is gone, so what else can we do? * The device is probably useless now anyway. */ usb_hcd_alloc_bandwidth(dev, NULL, NULL, NULL); for (i = 0; i < nintf; ++i) { usb_disable_interface(dev, cp->interface[i], true); put_device(&cp->interface[i]->dev); cp->interface[i] = NULL; } cp = NULL; } dev->actconfig = cp; mutex_unlock(hcd->bandwidth_mutex); if (!cp) { usb_set_device_state(dev, USB_STATE_ADDRESS); /* Leave LPM disabled while the device is unconfigured. */ usb_autosuspend_device(dev); return ret; } usb_set_device_state(dev, USB_STATE_CONFIGURED); if (cp->string == NULL && !(dev->quirks & USB_QUIRK_CONFIG_INTF_STRINGS)) cp->string = usb_cache_string(dev, cp->desc.iConfiguration); /* Now that the interfaces are installed, re-enable LPM. */ usb_unlocked_enable_lpm(dev); /* Enable LTM if it was turned off by usb_disable_device. */ usb_enable_ltm(dev); /* Now that all the interfaces are set up, register them * to trigger binding of drivers to interfaces. probe() * routines may install different altsettings and may * claim() any interfaces not yet bound. Many class drivers * need that: CDC, audio, video, etc. */ for (i = 0; i < nintf; ++i) { struct usb_interface *intf = cp->interface[i]; if (intf->dev.of_node && !of_device_is_available(intf->dev.of_node)) { dev_info(&dev->dev, "skipping disabled interface %d\n", intf->cur_altsetting->desc.bInterfaceNumber); continue; } dev_dbg(&dev->dev, "adding %s (config #%d, interface %d)\n", dev_name(&intf->dev), configuration, intf->cur_altsetting->desc.bInterfaceNumber); device_enable_async_suspend(&intf->dev); ret = device_add(&intf->dev); if (ret != 0) { dev_err(&dev->dev, "device_add(%s) --> %d\n", dev_name(&intf->dev), ret); continue; } create_intf_ep_devs(intf); } usb_autosuspend_device(dev); return 0; } EXPORT_SYMBOL_GPL(usb_set_configuration); static LIST_HEAD(set_config_list); static DEFINE_SPINLOCK(set_config_lock); struct set_config_request { struct usb_device *udev; int config; struct work_struct work; struct list_head node; }; /* Worker routine for usb_driver_set_configuration() */ static void driver_set_config_work(struct work_struct *work) { struct set_config_request *req = container_of(work, struct set_config_request, work); struct usb_device *udev = req->udev; usb_lock_device(udev); spin_lock(&set_config_lock); list_del(&req->node); spin_unlock(&set_config_lock); if (req->config >= -1) /* Is req still valid? 
*/ usb_set_configuration(udev, req->config); usb_unlock_device(udev); usb_put_dev(udev); kfree(req); } /* Cancel pending Set-Config requests for a device whose configuration * was just changed */ static void cancel_async_set_config(struct usb_device *udev) { struct set_config_request *req; spin_lock(&set_config_lock); list_for_each_entry(req, &set_config_list, node) { if (req->udev == udev) req->config = -999; /* Mark as cancelled */ } spin_unlock(&set_config_lock); } /** * usb_driver_set_configuration - Provide a way for drivers to change device configurations * @udev: the device whose configuration is being updated * @config: the configuration being chosen. * Context: In process context, must be able to sleep * * Device interface drivers are not allowed to change device configurations. * This is because changing configurations will destroy the interface the * driver is bound to and create new ones; it would be like a floppy-disk * driver telling the computer to replace the floppy-disk drive with a * tape drive! * * Still, in certain specialized circumstances the need may arise. This * routine gets around the normal restrictions by using a work thread to * submit the change-config request. * * Return: 0 if the request was successfully queued, error code otherwise. * The caller has no way to know whether the queued request will eventually * succeed. */ int usb_driver_set_configuration(struct usb_device *udev, int config) { struct set_config_request *req; req = kmalloc(sizeof(*req), GFP_KERNEL); if (!req) return -ENOMEM; req->udev = udev; req->config = config; INIT_WORK(&req->work, driver_set_config_work); spin_lock(&set_config_lock); list_add(&req->node, &set_config_list); spin_unlock(&set_config_lock); usb_get_dev(udev); schedule_work(&req->work); return 0; } EXPORT_SYMBOL_GPL(usb_driver_set_configuration); /** * cdc_parse_cdc_header - parse the extra headers present in CDC devices * @hdr: the place to put the results of the parsing * @intf: the interface for which parsing is requested * @buffer: pointer to the extra headers to be parsed * @buflen: length of the extra headers * * This evaluates the extra headers present in CDC devices which * bind the interfaces for data and control and provide details * about the capabilities of the device. 
* * Return: number of descriptors parsed or -EINVAL * if the header is contradictory beyond salvage */ int cdc_parse_cdc_header(struct usb_cdc_parsed_header *hdr, struct usb_interface *intf, u8 *buffer, int buflen) { /* duplicates are ignored */ struct usb_cdc_union_desc *union_header = NULL; /* duplicates are not tolerated */ struct usb_cdc_header_desc *header = NULL; struct usb_cdc_ether_desc *ether = NULL; struct usb_cdc_mdlm_detail_desc *detail = NULL; struct usb_cdc_mdlm_desc *desc = NULL; unsigned int elength; int cnt = 0; memset(hdr, 0x00, sizeof(struct usb_cdc_parsed_header)); hdr->phonet_magic_present = false; while (buflen > 0) { elength = buffer[0]; if (!elength) { dev_err(&intf->dev, "skipping garbage byte\n"); elength = 1; goto next_desc; } if ((buflen < elength) || (elength < 3)) { dev_err(&intf->dev, "invalid descriptor buffer length\n"); break; } if (buffer[1] != USB_DT_CS_INTERFACE) { dev_err(&intf->dev, "skipping garbage\n"); goto next_desc; } switch (buffer[2]) { case USB_CDC_UNION_TYPE: /* we've found it */ if (elength < sizeof(struct usb_cdc_union_desc)) goto next_desc; if (union_header) { dev_err(&intf->dev, "More than one union descriptor, skipping ...\n"); goto next_desc; } union_header = (struct usb_cdc_union_desc *)buffer; break; case USB_CDC_COUNTRY_TYPE: if (elength < sizeof(struct usb_cdc_country_functional_desc)) goto next_desc; hdr->usb_cdc_country_functional_desc = (struct usb_cdc_country_functional_desc *)buffer; break; case USB_CDC_HEADER_TYPE: if (elength != sizeof(struct usb_cdc_header_desc)) goto next_desc; if (header) return -EINVAL; header = (struct usb_cdc_header_desc *)buffer; break; case USB_CDC_ACM_TYPE: if (elength < sizeof(struct usb_cdc_acm_descriptor)) goto next_desc; hdr->usb_cdc_acm_descriptor = (struct usb_cdc_acm_descriptor *)buffer; break; case USB_CDC_ETHERNET_TYPE: if (elength != sizeof(struct usb_cdc_ether_desc)) goto next_desc; if (ether) return -EINVAL; ether = (struct usb_cdc_ether_desc *)buffer; break; case USB_CDC_CALL_MANAGEMENT_TYPE: if (elength < sizeof(struct usb_cdc_call_mgmt_descriptor)) goto next_desc; hdr->usb_cdc_call_mgmt_descriptor = (struct usb_cdc_call_mgmt_descriptor *)buffer; break; case USB_CDC_DMM_TYPE: if (elength < sizeof(struct usb_cdc_dmm_desc)) goto next_desc; hdr->usb_cdc_dmm_desc = (struct usb_cdc_dmm_desc *)buffer; break; case USB_CDC_MDLM_TYPE: if (elength < sizeof(struct usb_cdc_mdlm_desc)) goto next_desc; if (desc) return -EINVAL; desc = (struct usb_cdc_mdlm_desc *)buffer; break; case USB_CDC_MDLM_DETAIL_TYPE: if (elength < sizeof(struct usb_cdc_mdlm_detail_desc)) goto next_desc; if (detail) return -EINVAL; detail = (struct usb_cdc_mdlm_detail_desc *)buffer; break; case USB_CDC_NCM_TYPE: if (elength < sizeof(struct usb_cdc_ncm_desc)) goto next_desc; hdr->usb_cdc_ncm_desc = (struct usb_cdc_ncm_desc *)buffer; break; case USB_CDC_MBIM_TYPE: if (elength < sizeof(struct usb_cdc_mbim_desc)) goto next_desc; hdr->usb_cdc_mbim_desc = (struct usb_cdc_mbim_desc *)buffer; break; case USB_CDC_MBIM_EXTENDED_TYPE: if (elength < sizeof(struct usb_cdc_mbim_extended_desc)) break; hdr->usb_cdc_mbim_extended_desc = (struct usb_cdc_mbim_extended_desc *)buffer; break; case CDC_PHONET_MAGIC_NUMBER: hdr->phonet_magic_present = true; break; default: /* * there are LOTS more CDC descriptors that * could legitimately be found here. 
		 */
			dev_dbg(&intf->dev, "Ignoring descriptor: type %02x, length %u\n",
				buffer[2], elength);
			goto next_desc;
		}
		cnt++;
next_desc:
		buflen -= elength;
		buffer += elength;
	}

	hdr->usb_cdc_union_desc = union_header;
	hdr->usb_cdc_header_desc = header;
	hdr->usb_cdc_mdlm_detail_desc = detail;
	hdr->usb_cdc_mdlm_desc = desc;
	hdr->usb_cdc_ether_desc = ether;

	return cnt;
}
EXPORT_SYMBOL(cdc_parse_cdc_header);
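/*
 * Illustrative consumer of cdc_parse_cdc_header(), not part of this file: a
 * sketch of how a CDC class driver might pair its control and data
 * interfaces from the parsed union descriptor. The function name is
 * hypothetical; the extra-descriptor buffer comes from cur_altsetting->extra
 * as in existing callers such as cdc-acm.
 */
static int example_cdc_pair_interfaces(struct usb_interface *intf)
{
	struct usb_cdc_parsed_header hdr;
	struct usb_host_interface *alt = intf->cur_altsetting;
	int ret;

	ret = cdc_parse_cdc_header(&hdr, intf, alt->extra, alt->extralen);
	if (ret < 0)
		return ret;		/* headers contradictory beyond salvage */

	if (!hdr.usb_cdc_union_desc)
		return -ENODEV;		/* cannot pair without a union descriptor */

	dev_dbg(&intf->dev, "control intf %u, data intf %u\n",
		hdr.usb_cdc_union_desc->bMasterInterface0,
		hdr.usb_cdc_union_desc->bSlaveInterface0);
	return 0;
}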
// SPDX-License-Identifier: GPL-2.0
#include <linux/ceph/ceph_debug.h>

#include <linux/err.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <linux/vmalloc.h>

#include <linux/ceph/messenger.h>
#include <linux/ceph/msgpool.h>

static void *msgpool_alloc(gfp_t gfp_mask, void *arg)
{
	struct ceph_msgpool *pool = arg;
	struct ceph_msg *msg;

	msg = ceph_msg_new2(pool->type, pool->front_len, pool->max_data_items,
			    gfp_mask, true);
	if (!msg) {
		dout("msgpool_alloc %s failed\n", pool->name);
	} else {
		dout("msgpool_alloc %s %p\n", pool->name, msg);
		msg->pool = pool;
	}
	return msg;
}

static void msgpool_free(void *element, void *arg)
{
	struct ceph_msgpool *pool = arg;
	struct ceph_msg *msg = element;

	dout("msgpool_release %s %p\n", pool->name, msg);
	msg->pool = NULL;
	ceph_msg_put(msg);
}

int ceph_msgpool_init(struct ceph_msgpool *pool, int type,
		      int front_len, int max_data_items, int size,
		      const char *name)
{
	dout("msgpool %s init\n", name);

	pool->type = type;
	pool->front_len = front_len;
	pool->max_data_items = max_data_items;
	pool->pool = mempool_create(size, msgpool_alloc, msgpool_free, pool);
	if (!pool->pool)
		return -ENOMEM;
	pool->name = name;
	return 0;
}

void ceph_msgpool_destroy(struct ceph_msgpool *pool)
{
	dout("msgpool %s destroy\n", pool->name);
	mempool_destroy(pool->pool);
}

struct ceph_msg *ceph_msgpool_get(struct ceph_msgpool *pool, int front_len,
				  int max_data_items)
{
	struct ceph_msg *msg;

	if (front_len > pool->front_len ||
	    max_data_items > pool->max_data_items) {
		pr_warn_ratelimited("%s need %d/%d, pool %s has %d/%d\n",
				    __func__, front_len, max_data_items,
				    pool->name, pool->front_len,
				    pool->max_data_items);
		WARN_ON_ONCE(1);

		/* try to alloc a fresh message */
		return ceph_msg_new2(pool->type, front_len, max_data_items,
				     GFP_NOFS, false);
	}

	msg = mempool_alloc(pool->pool, GFP_NOFS);
	dout("msgpool_get %s %p\n", pool->name, msg);
	return msg;
}

void ceph_msgpool_put(struct ceph_msgpool *pool, struct ceph_msg *msg)
{
	dout("msgpool_put %s %p\n", pool->name, msg);

	/* reset msg front_len; user may have changed it */
	msg->front.iov_len = pool->front_len;
	msg->hdr.front_len = cpu_to_le32(pool->front_len);

	msg->data_length = 0;
	msg->num_data_items = 0;

	kref_init(&msg->kref);	/* retake single ref */
	mempool_free(msg, pool->pool);
}
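/*
 * Usage sketch for the pool above, not part of this file: reserve a handful
 * of reply messages at setup so replies can still be allocated under memory
 * pressure. The pool name, message type and sizes are illustrative only.
 */
static struct ceph_msgpool example_pool;

static int example_pool_setup(void)
{
	/* four preallocated messages, 512-byte fronts, no data items */
	return ceph_msgpool_init(&example_pool, CEPH_MSG_OSD_OPREPLY,
				 512, 0, 4, "example-reply");
}

static void example_pool_cycle(void)
{
	struct ceph_msg *msg;

	msg = ceph_msgpool_get(&example_pool, 512, 0);
	if (msg)
		ceph_msgpool_put(&example_pool, msg);	/* back to the pool */
}

static void example_pool_teardown(void)
{
	ceph_msgpool_destroy(&example_pool);
}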
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _BR_NETFILTER_H_
#define _BR_NETFILTER_H_

#include <linux/netfilter.h>

#include "../../../net/bridge/br_private.h"

static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
{
#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
	struct nf_bridge_info *b = skb_ext_add(skb, SKB_EXT_BRIDGE_NF);

	if (b)
		memset(b, 0, sizeof(*b));

	return b;
#else
	return NULL;
#endif
}

void nf_bridge_update_protocol(struct sk_buff *skb);

int br_nf_hook_thresh(unsigned int hook, struct net *net, struct sock *sk,
		      struct sk_buff *skb, struct net_device *indev,
		      struct net_device *outdev,
		      int (*okfn)(struct net *, struct sock *,
				  struct sk_buff *));

unsigned int nf_bridge_encap_header_len(const struct sk_buff *skb);

static inline void nf_bridge_push_encap_header(struct sk_buff *skb)
{
	unsigned int len = nf_bridge_encap_header_len(skb);

	skb_push(skb, len);
	skb->network_header -= len;
}

int br_nf_pre_routing_finish_bridge(struct net *net, struct sock *sk,
				    struct sk_buff *skb);

static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
{
#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
	struct net_bridge_port *port;

	port = br_port_get_rcu(dev);
	return port ? &port->br->fake_rtable : NULL;
#else
	return NULL;
#endif
}

struct net_device *setup_pre_routing(struct sk_buff *skb,
				     const struct net *net);

#if IS_ENABLED(CONFIG_IPV6)
int br_validate_ipv6(struct net *net, struct sk_buff *skb);
unsigned int br_nf_pre_routing_ipv6(void *priv, struct sk_buff *skb,
				    const struct nf_hook_state *state);
#else
static inline int br_validate_ipv6(struct net *net, struct sk_buff *skb)
{
	return -1;
}

static inline unsigned int
br_nf_pre_routing_ipv6(void *priv, struct sk_buff *skb,
		       const struct nf_hook_state *state)
{
	return NF_ACCEPT;
}
#endif

#endif /* _BR_NETFILTER_H_ */
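/*
 * Sketch only, not part of this header: how a bridge-netfilter okfn might
 * restore a previously pulled encapsulation header before re-injecting the
 * skb. The function name is hypothetical; it merely strings together the
 * helpers declared above.
 */
static int example_bridge_finish(struct net *net, struct sock *sk,
				 struct sk_buff *skb)
{
	if (nf_bridge_encap_header_len(skb)) {
		/* undo the earlier skb_pull() of the VLAN/PPPoE header */
		nf_bridge_push_encap_header(skb);
		nf_bridge_update_protocol(skb);
	}

	return br_nf_pre_routing_finish_bridge(net, sk, skb);
}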
/* SPDX-License-Identifier: GPL-2.0 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM tlb

#if !defined(_TRACE_TLB_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_TLB_H

#include <linux/mm_types.h>
#include <linux/tracepoint.h>

#define TLB_FLUSH_REASON						\
	EM(  TLB_FLUSH_ON_TASK_SWITCH,	"flush on task switch" )	\
	EM(  TLB_REMOTE_SHOOTDOWN,	"remote shootdown" )		\
	EM(  TLB_LOCAL_SHOOTDOWN,	"local shootdown" )		\
	EM(  TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" )		\
	EMe( TLB_REMOTE_SEND_IPI,	"remote ipi send" )

/*
 * First define the enums in TLB_FLUSH_REASON to be exported to userspace
 * via TRACE_DEFINE_ENUM().
 */
#undef EM
#undef EMe
#define EM(a,b)		TRACE_DEFINE_ENUM(a);
#define EMe(a,b)	TRACE_DEFINE_ENUM(a);

TLB_FLUSH_REASON

/*
 * Now redefine the EM() and EMe() macros to map the enums to the strings
 * that will be printed in the output.
 */
#undef EM
#undef EMe
#define EM(a,b)		{ a, b },
#define EMe(a,b)	{ a, b }

TRACE_EVENT(tlb_flush,

	TP_PROTO(int reason, unsigned long pages),
	TP_ARGS(reason, pages),

	TP_STRUCT__entry(
		__field(	  int, reason)
		__field(unsigned long,  pages)
	),

	TP_fast_assign(
		__entry->reason = reason;
		__entry->pages  = pages;
	),

	TP_printk("pages:%ld reason:%s (%d)",
		__entry->pages,
		__print_symbolic(__entry->reason, TLB_FLUSH_REASON),
		__entry->reason)
);

#endif /* _TRACE_TLB_H */

/* This part must be outside protection */
#include <trace/define_trace.h>
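/*
 * Emission sketch, not part of this header: arch code fires the event with a
 * reason from TLB_FLUSH_REASON and a page count (cf. arch/x86/mm/tlb.c). The
 * surrounding invalidation logic is elided here.
 */
#include <trace/events/tlb.h>

static void example_local_flush(unsigned long nr_pages)
{
	/* ... architectural TLB invalidation would happen here ... */
	trace_tlb_flush(TLB_LOCAL_SHOOTDOWN, nr_pages);
}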
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright (c) 2017 Red Hat, Inc
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/libps2.h>
#include <linux/i2c.h>
#include <linux/serio.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

#include "psmouse.h"

struct psmouse_smbus_dev {
	struct i2c_board_info board;
	struct psmouse *psmouse;
	struct i2c_client *client;
	struct list_head node;
	bool dead;
	bool need_deactivate;
};

static LIST_HEAD(psmouse_smbus_list);
static DEFINE_MUTEX(psmouse_smbus_mutex);
static struct workqueue_struct *psmouse_smbus_wq;

static void psmouse_smbus_check_adapter(struct i2c_adapter *adapter)
{
	struct psmouse_smbus_dev *smbdev;

	if (!i2c_check_functionality(adapter, I2C_FUNC_SMBUS_HOST_NOTIFY))
		return;

	guard(mutex)(&psmouse_smbus_mutex);

	list_for_each_entry(smbdev, &psmouse_smbus_list, node) {
		if (smbdev->dead)
			continue;

		if (smbdev->client)
			continue;

		/*
		 * Here would be a good place to check if device is actually
		 * present, but it seems that SMBus will not respond unless we
		 * fully reset PS/2 connection. So cross our fingers, and try
		 * to switch over, hopefully our system will not have too many
		 * "host notify" I2C adapters.
		 */
		psmouse_dbg(smbdev->psmouse,
			    "SMBus candidate adapter appeared, triggering rescan\n");
		serio_rescan(smbdev->psmouse->ps2dev.serio);
	}
}

static void psmouse_smbus_detach_i2c_client(struct i2c_client *client)
{
	struct psmouse_smbus_dev *smbdev, *tmp;

	guard(mutex)(&psmouse_smbus_mutex);

	list_for_each_entry_safe(smbdev, tmp, &psmouse_smbus_list, node) {
		if (smbdev->client != client)
			continue;

		kfree(client->dev.platform_data);
		client->dev.platform_data = NULL;

		if (!smbdev->dead) {
			psmouse_dbg(smbdev->psmouse,
				    "Marking SMBus companion %s as gone\n",
				    dev_name(&smbdev->client->dev));
			smbdev->dead = true;
			device_link_remove(&smbdev->client->dev,
					   &smbdev->psmouse->ps2dev.serio->dev);
			serio_rescan(smbdev->psmouse->ps2dev.serio);
		} else {
			list_del(&smbdev->node);
			kfree(smbdev);
		}
	}
}

static int psmouse_smbus_notifier_call(struct notifier_block *nb,
				       unsigned long action, void *data)
{
	struct device *dev = data;

	switch (action) {
	case BUS_NOTIFY_ADD_DEVICE:
		if (dev->type == &i2c_adapter_type)
			psmouse_smbus_check_adapter(to_i2c_adapter(dev));
		break;

	case BUS_NOTIFY_REMOVED_DEVICE:
		if (dev->type == &i2c_client_type)
			psmouse_smbus_detach_i2c_client(to_i2c_client(dev));
		break;
	}

	return 0;
}

static struct notifier_block psmouse_smbus_notifier = {
	.notifier_call = psmouse_smbus_notifier_call,
};

static psmouse_ret_t psmouse_smbus_process_byte(struct psmouse *psmouse)
{
	return PSMOUSE_FULL_PACKET;
}

static int psmouse_smbus_reconnect(struct psmouse *psmouse)
{
	struct psmouse_smbus_dev *smbdev = psmouse->private;

	if (smbdev->need_deactivate)
		psmouse_deactivate(psmouse);

	return 0;
}

struct psmouse_smbus_removal_work {
	struct work_struct work;
	struct i2c_client *client;
};

static void psmouse_smbus_remove_i2c_device(struct work_struct *work)
{
	struct psmouse_smbus_removal_work *rwork =
		container_of(work, struct psmouse_smbus_removal_work, work);

	dev_dbg(&rwork->client->dev, "destroying SMBus companion device\n");
	i2c_unregister_device(rwork->client);

	kfree(rwork);
}

/*
 * This schedules removal of SMBus companion device. We have to do
 * it in a separate thread to avoid deadlocking on psmouse_mutex in
 * case the device has a trackstick (which is also driven by psmouse).
 *
 * Note that this may be racing with i2c adapter removal, but we
 * can't do anything about that: i2c automatically destroys clients
 * attached to an adapter that is being removed. This has to be
 * fixed in i2c core.
 */
static void psmouse_smbus_schedule_remove(struct i2c_client *client)
{
	struct psmouse_smbus_removal_work *rwork;

	rwork = kzalloc(sizeof(*rwork), GFP_KERNEL);
	if (rwork) {
		INIT_WORK(&rwork->work, psmouse_smbus_remove_i2c_device);
		rwork->client = client;

		queue_work(psmouse_smbus_wq, &rwork->work);
	}
}

static void psmouse_smbus_disconnect(struct psmouse *psmouse)
{
	struct psmouse_smbus_dev *smbdev = psmouse->private;

	guard(mutex)(&psmouse_smbus_mutex);

	if (smbdev->dead) {
		list_del(&smbdev->node);
		kfree(smbdev);
	} else {
		smbdev->dead = true;
		device_link_remove(&smbdev->client->dev,
				   &psmouse->ps2dev.serio->dev);
		psmouse_dbg(smbdev->psmouse,
			    "posting removal request for SMBus companion %s\n",
			    dev_name(&smbdev->client->dev));
		psmouse_smbus_schedule_remove(smbdev->client);
	}

	psmouse->private = NULL;
}

static int psmouse_smbus_create_companion(struct device *dev, void *data)
{
	struct psmouse_smbus_dev *smbdev = data;
	unsigned short addr_list[] = { smbdev->board.addr, I2C_CLIENT_END };
	struct i2c_adapter *adapter;
	struct i2c_client *client;

	adapter = i2c_verify_adapter(dev);
	if (!adapter)
		return 0;

	if (!i2c_check_functionality(adapter, I2C_FUNC_SMBUS_HOST_NOTIFY))
		return 0;

	client = i2c_new_scanned_device(adapter, &smbdev->board,
					addr_list, NULL);
	if (IS_ERR(client))
		return 0;

	/* We have our(?) device, stop iterating i2c bus. */
	smbdev->client = client;
	return 1;
}

void psmouse_smbus_cleanup(struct psmouse *psmouse)
{
	struct psmouse_smbus_dev *smbdev, *tmp;

	guard(mutex)(&psmouse_smbus_mutex);

	list_for_each_entry_safe(smbdev, tmp, &psmouse_smbus_list, node) {
		if (psmouse == smbdev->psmouse) {
			list_del(&smbdev->node);
			kfree(smbdev);
		}
	}
}

int psmouse_smbus_init(struct psmouse *psmouse,
		       const struct i2c_board_info *board,
		       const void *pdata, size_t pdata_size,
		       bool need_deactivate,
		       bool leave_breadcrumbs)
{
	struct psmouse_smbus_dev *smbdev;
	int error;

	smbdev = kzalloc(sizeof(*smbdev), GFP_KERNEL);
	if (!smbdev)
		return -ENOMEM;

	smbdev->psmouse = psmouse;
	smbdev->board = *board;
	smbdev->need_deactivate = need_deactivate;

	if (pdata) {
		smbdev->board.platform_data = kmemdup(pdata, pdata_size,
						      GFP_KERNEL);
		if (!smbdev->board.platform_data) {
			kfree(smbdev);
			return -ENOMEM;
		}
	}

	if (need_deactivate)
		psmouse_deactivate(psmouse);

	psmouse->private = smbdev;
	psmouse->protocol_handler = psmouse_smbus_process_byte;
	psmouse->reconnect = psmouse_smbus_reconnect;
	psmouse->fast_reconnect = psmouse_smbus_reconnect;
	psmouse->disconnect = psmouse_smbus_disconnect;
	psmouse->resync_time = 0;

	scoped_guard(mutex, &psmouse_smbus_mutex) {
		list_add_tail(&smbdev->node, &psmouse_smbus_list);
	}

	/* Bind to already existing adapters right away */
	error = i2c_for_each_dev(smbdev, psmouse_smbus_create_companion);

	if (smbdev->client) {
		/* We have our companion device */
		if (!device_link_add(&smbdev->client->dev,
				     &psmouse->ps2dev.serio->dev,
				     DL_FLAG_STATELESS))
			psmouse_warn(psmouse,
				     "failed to set up link with SMBus companion %s\n",
				     dev_name(&smbdev->client->dev));
		return 0;
	}

	/*
	 * If we did not create i2c device we will not need platform
	 * data even if we are leaving breadcrumbs.
	 */
	kfree(smbdev->board.platform_data);
	smbdev->board.platform_data = NULL;

	if (error < 0 || !leave_breadcrumbs) {
		scoped_guard(mutex, &psmouse_smbus_mutex) {
			list_del(&smbdev->node);
		}
		kfree(smbdev);
	}

	return error < 0 ?
			error : -EAGAIN;
}

int __init psmouse_smbus_module_init(void)
{
	int error;

	psmouse_smbus_wq = alloc_workqueue("psmouse-smbus", 0, 0);
	if (!psmouse_smbus_wq)
		return -ENOMEM;

	error = bus_register_notifier(&i2c_bus_type, &psmouse_smbus_notifier);
	if (error) {
		pr_err("failed to register i2c bus notifier: %d\n", error);
		destroy_workqueue(psmouse_smbus_wq);
		return error;
	}

	return 0;
}

void psmouse_smbus_module_exit(void)
{
	bus_unregister_notifier(&i2c_bus_type, &psmouse_smbus_notifier);
	destroy_workqueue(psmouse_smbus_wq);
}
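/*
 * Caller-side sketch, not part of this file: a PS/2 protocol driver hands
 * the device over to its SMBus companion (cf. the synaptics and elantech
 * glue). The device name and address below are hypothetical.
 */
static int example_try_smbus(struct psmouse *psmouse)
{
	const struct i2c_board_info board = {
		I2C_BOARD_INFO("example_touchpad", 0x2c),
	};

	/* deactivate the PS/2 side, leave no breadcrumbs on failure */
	return psmouse_smbus_init(psmouse, &board, NULL, 0, true, false);
}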
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_ATOMIC_H
#define _ASM_X86_ATOMIC_H

#include <linux/compiler.h>
#include <linux/types.h>
#include <asm/alternative.h>
#include <asm/cmpxchg.h>
#include <asm/rmwcc.h>
#include <asm/barrier.h>

/*
 * Atomic operations that C can't guarantee us. Useful for
 * resource counting etc..
 */

static __always_inline int arch_atomic_read(const atomic_t *v)
{
	/*
	 * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
	 * it's non-inlined function that increases binary size and stack usage.
	 */
	return __READ_ONCE((v)->counter);
}

static __always_inline void arch_atomic_set(atomic_t *v, int i)
{
	__WRITE_ONCE(v->counter, i);
}

static __always_inline void arch_atomic_add(int i, atomic_t *v)
{
	asm_inline volatile(LOCK_PREFIX "addl %1, %0"
		     : "+m" (v->counter)
		     : "ir" (i) : "memory");
}

static __always_inline void arch_atomic_sub(int i, atomic_t *v)
{
	asm_inline volatile(LOCK_PREFIX "subl %1, %0"
		     : "+m" (v->counter)
		     : "ir" (i) : "memory");
}

static __always_inline bool arch_atomic_sub_and_test(int i, atomic_t *v)
{
	return GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, e, "er", i);
}
#define arch_atomic_sub_and_test arch_atomic_sub_and_test

static __always_inline void arch_atomic_inc(atomic_t *v)
{
	asm_inline volatile(LOCK_PREFIX "incl %0"
		     : "+m" (v->counter) :: "memory");
}
#define arch_atomic_inc arch_atomic_inc

static __always_inline void arch_atomic_dec(atomic_t *v)
{
	asm_inline volatile(LOCK_PREFIX "decl %0"
		     : "+m" (v->counter) :: "memory");
}
#define arch_atomic_dec arch_atomic_dec

static __always_inline bool arch_atomic_dec_and_test(atomic_t *v)
{
	return GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, e);
}
#define arch_atomic_dec_and_test arch_atomic_dec_and_test

static __always_inline bool arch_atomic_inc_and_test(atomic_t *v)
{
	return GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, e);
}
#define arch_atomic_inc_and_test arch_atomic_inc_and_test

static __always_inline bool arch_atomic_add_negative(int i, atomic_t *v)
{
	return GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, s, "er", i);
}
#define arch_atomic_add_negative arch_atomic_add_negative

static __always_inline int arch_atomic_add_return(int i, atomic_t *v)
{
	return i + xadd(&v->counter, i);
}
#define arch_atomic_add_return arch_atomic_add_return

#define arch_atomic_sub_return(i, v)	arch_atomic_add_return(-(i), v)

static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
{
	return xadd(&v->counter, i);
}
#define arch_atomic_fetch_add arch_atomic_fetch_add

#define arch_atomic_fetch_sub(i, v)	arch_atomic_fetch_add(-(i), v)

static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
{
	return arch_cmpxchg(&v->counter, old, new);
}
#define arch_atomic_cmpxchg arch_atomic_cmpxchg

static __always_inline bool arch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
{
	return arch_try_cmpxchg(&v->counter, old, new);
}
#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg

static __always_inline int arch_atomic_xchg(atomic_t *v, int new)
{
	return arch_xchg(&v->counter, new);
}
#define arch_atomic_xchg arch_atomic_xchg

static __always_inline void arch_atomic_and(int i, atomic_t *v)
{
	asm_inline volatile(LOCK_PREFIX "andl %1, %0"
		     : "+m" (v->counter)
		     : "ir" (i)
		     : "memory");
}

static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
{
	int val = arch_atomic_read(v);

	do { } while (!arch_atomic_try_cmpxchg(v, &val, val & i));

	return val;
}
#define arch_atomic_fetch_and arch_atomic_fetch_and

static __always_inline void arch_atomic_or(int i, atomic_t *v)
{
	asm_inline volatile(LOCK_PREFIX "orl %1, %0"
		     : "+m" (v->counter)
		     : "ir" (i)
		     : "memory");
}

static __always_inline int arch_atomic_fetch_or(int i, atomic_t *v)
{
	int val = arch_atomic_read(v);

	do { } while (!arch_atomic_try_cmpxchg(v, &val, val | i));

	return val;
}
#define arch_atomic_fetch_or arch_atomic_fetch_or

static __always_inline void arch_atomic_xor(int i, atomic_t *v)
{
	asm_inline volatile(LOCK_PREFIX "xorl %1, %0"
		     : "+m" (v->counter)
		     : "ir" (i)
		     : "memory");
}

static __always_inline int arch_atomic_fetch_xor(int i, atomic_t *v)
{
	int val = arch_atomic_read(v);

	do { } while (!arch_atomic_try_cmpxchg(v, &val, val ^ i));

	return val;
}
#define arch_atomic_fetch_xor arch_atomic_fetch_xor

#ifdef CONFIG_X86_32
# include <asm/atomic64_32.h>
#else
# include <asm/atomic64_64.h>
#endif

#endif /* _ASM_X86_ATOMIC_H */
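/*
 * Consumer-side sketch, not part of this header: drivers use the generic
 * <linux/atomic.h> wrappers, which resolve to the arch_atomic_*()
 * implementations above. The refcounting pattern here is illustrative.
 */
#include <linux/atomic.h>

static atomic_t example_refs = ATOMIC_INIT(1);

static void example_get(void)
{
	atomic_inc(&example_refs);
}

static void example_put(void)
{
	/* dec_and_test() returns true only when the last reference drops */
	if (atomic_dec_and_test(&example_refs))
		pr_debug("example: last reference gone\n");
}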
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * Copyright (C) 2002, 2004, 2005 Oracle. All rights reserved.
 */

#ifndef OCFS2_AOPS_H
#define OCFS2_AOPS_H

#include <linux/fs.h>

int ocfs2_map_folio_blocks(struct folio *folio, u64 *p_blkno,
			   struct inode *inode, unsigned int from,
			   unsigned int to, int new);

void ocfs2_unlock_and_free_folios(struct folio **folios, int num_folios);

int walk_page_buffers(handle_t *handle,
		      struct buffer_head *head,
		      unsigned from,
		      unsigned to,
		      int *partial,
		      int (*fn)(handle_t *handle,
				struct buffer_head *bh));

int ocfs2_write_end_nolock(struct address_space *mapping, loff_t pos,
			   unsigned len, unsigned copied, void *fsdata);

typedef enum {
	OCFS2_WRITE_BUFFER = 0,
	OCFS2_WRITE_DIRECT,
	OCFS2_WRITE_MMAP,
} ocfs2_write_type_t;

int ocfs2_write_begin_nolock(struct address_space *mapping,
			     loff_t pos, unsigned len, ocfs2_write_type_t type,
			     struct folio **foliop, void **fsdata,
			     struct buffer_head *di_bh, struct folio *mmap_folio);

int ocfs2_read_inline_data(struct inode *inode, struct folio *folio,
			   struct buffer_head *di_bh);
int ocfs2_size_fits_inline_data(struct buffer_head *di_bh, u64 new_size);

int ocfs2_get_block(struct inode *inode, sector_t iblock,
		    struct buffer_head *bh_result, int create);

/* all ocfs2_dio_end_io()'s fault */
#define ocfs2_iocb_is_rw_locked(iocb) \
	test_bit(0, (unsigned long *)&iocb->private)

static inline void ocfs2_iocb_set_rw_locked(struct kiocb *iocb, int level)
{
	set_bit(0, (unsigned long *)&iocb->private);
	if (level)
		set_bit(1, (unsigned long *)&iocb->private);
	else
		clear_bit(1, (unsigned long *)&iocb->private);
}

/*
 * Using a named enum representing lock types in terms of #N bit stored in
 * iocb->private, which is going to be used for communication between
 * ocfs2_dio_end_io() and ocfs2_file_write/read_iter().
 */
enum ocfs2_iocb_lock_bits {
	OCFS2_IOCB_RW_LOCK = 0,
	OCFS2_IOCB_RW_LOCK_LEVEL,
	OCFS2_IOCB_NUM_LOCKS
};

#define ocfs2_iocb_init_rw_locked(iocb) \
	(iocb->private = NULL)
#define ocfs2_iocb_clear_rw_locked(iocb) \
	clear_bit(OCFS2_IOCB_RW_LOCK, (unsigned long *)&iocb->private)
#define ocfs2_iocb_rw_locked_level(iocb) \
	test_bit(OCFS2_IOCB_RW_LOCK_LEVEL, (unsigned long *)&iocb->private)

#endif /* OCFS2_AOPS_H */
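/*
 * Calling-pattern sketch, not part of this header: the read/write paths mark
 * the iocb before direct I/O so ocfs2_dio_end_io() knows which rw lock (and
 * level) to drop. The function names here are hypothetical.
 */
static void example_dio_begin(struct kiocb *iocb, int write_level)
{
	ocfs2_iocb_init_rw_locked(iocb);
	ocfs2_iocb_set_rw_locked(iocb, write_level);
}

static void example_dio_end(struct kiocb *iocb)
{
	if (ocfs2_iocb_is_rw_locked(iocb))
		ocfs2_iocb_clear_rw_locked(iocb);
}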
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_VDPA_H
#define _LINUX_VDPA_H

#include <linux/kernel.h>
#include <linux/device.h>
#include <linux/interrupt.h>
#include <linux/vhost_iotlb.h>
#include <linux/virtio_net.h>
#include <linux/virtio_blk.h>
#include <linux/if_ether.h>

/**
 * struct vdpa_callback - vDPA callback definition.
 * @callback: interrupt callback function
 * @private: the data passed to the callback function
 * @trigger: the eventfd for the callback (Optional).
 *           When it is set, the vDPA driver must guarantee that
 *           signaling it is functionally equivalent to triggering
 *           the callback. Then vDPA parent can signal it directly
 *           instead of triggering the callback.
*/ struct vdpa_callback { irqreturn_t (*callback)(void *data); void *private; struct eventfd_ctx *trigger; }; /** * struct vdpa_notification_area - vDPA notification area * @addr: base address of the notification area * @size: size of the notification area */ struct vdpa_notification_area { resource_size_t addr; resource_size_t size; }; /** * struct vdpa_vq_state_split - vDPA split virtqueue state * @avail_index: available index */ struct vdpa_vq_state_split { u16 avail_index; }; /** * struct vdpa_vq_state_packed - vDPA packed virtqueue state * @last_avail_counter: last driver ring wrap counter observed by device * @last_avail_idx: device available index * @last_used_counter: device ring wrap counter * @last_used_idx: used index */ struct vdpa_vq_state_packed { u16 last_avail_counter:1; u16 last_avail_idx:15; u16 last_used_counter:1; u16 last_used_idx:15; }; struct vdpa_vq_state { union { struct vdpa_vq_state_split split; struct vdpa_vq_state_packed packed; }; }; struct vdpa_mgmt_dev; /** * struct vdpa_device - representation of a vDPA device * @dev: underlying device * @dma_dev: the actual device that is performing DMA * @driver_override: driver name to force a match; do not set directly, * because core frees it; use driver_set_override() to * set or clear it. * @config: the configuration ops for this device. * @cf_lock: Protects get and set access to configuration layout. * @index: device index * @features_valid: were features initialized? for legacy guests * @ngroups: the number of virtqueue groups * @nas: the number of address spaces * @use_va: indicate whether virtual address must be used by this device * @nvqs: maximum number of supported virtqueues * @mdev: management device pointer; caller must setup when registering device as part * of dev_add() mgmtdev ops callback before invoking _vdpa_register_device(). */ struct vdpa_device { struct device dev; struct device *dma_dev; const char *driver_override; const struct vdpa_config_ops *config; struct rw_semaphore cf_lock; /* Protects get/set config */ unsigned int index; bool features_valid; bool use_va; u32 nvqs; struct vdpa_mgmt_dev *mdev; unsigned int ngroups; unsigned int nas; }; /** * struct vdpa_iova_range - the IOVA range support by the device * @first: start of the IOVA range * @last: end of the IOVA range */ struct vdpa_iova_range { u64 first; u64 last; }; struct vdpa_dev_set_config { u64 device_features; struct { u8 mac[ETH_ALEN]; u16 mtu; u16 max_vq_pairs; } net; u64 mask; }; /** * struct vdpa_map_file - file area for device memory mapping * @file: vma->vm_file for the mapping * @offset: mapping offset in the vm_file */ struct vdpa_map_file { struct file *file; u64 offset; }; /** * struct vdpa_config_ops - operations for configuring a vDPA device. * Note: vDPA device drivers are required to implement all of the * operations unless it is mentioned to be optional in the following * list. 
* * @set_vq_address: Set the address of virtqueue * @vdev: vdpa device * @idx: virtqueue index * @desc_area: address of desc area * @driver_area: address of driver area * @device_area: address of device area * Returns integer: success (0) or error (< 0) * @set_vq_num: Set the size of virtqueue * @vdev: vdpa device * @idx: virtqueue index * @num: the size of virtqueue * @kick_vq: Kick the virtqueue * @vdev: vdpa device * @idx: virtqueue index * @kick_vq_with_data: Kick the virtqueue and supply extra data * (only if VIRTIO_F_NOTIFICATION_DATA is negotiated) * @vdev: vdpa device * @data for split virtqueue: * 16 bits vqn and 16 bits next available index. * @data for packed virtqueue: * 16 bits vqn, 15 least significant bits of * next available index and 1 bit next_wrap. * @set_vq_cb: Set the interrupt callback function for * a virtqueue * @vdev: vdpa device * @idx: virtqueue index * @cb: virtio-vdev interrupt callback structure * @set_vq_ready: Set ready status for a virtqueue * @vdev: vdpa device * @idx: virtqueue index * @ready: ready (true) not ready(false) * @get_vq_ready: Get ready status for a virtqueue * @vdev: vdpa device * @idx: virtqueue index * Returns boolean: ready (true) or not (false) * @set_vq_state: Set the state for a virtqueue * @vdev: vdpa device * @idx: virtqueue index * @state: pointer to set virtqueue state (last_avail_idx) * Returns integer: success (0) or error (< 0) * @get_vq_state: Get the state for a virtqueue * @vdev: vdpa device * @idx: virtqueue index * @state: pointer to returned state (last_avail_idx) * @get_vendor_vq_stats: Get the vendor statistics of a device. * @vdev: vdpa device * @idx: virtqueue index * @msg: socket buffer holding stats message * @extack: extack for reporting error messages * Returns integer: success (0) or error (< 0) * @get_vq_notification: Get the notification area for a virtqueue (optional) * @vdev: vdpa device * @idx: virtqueue index * Returns the notification area * @get_vq_irq: Get the irq number of a virtqueue (optional, * but must implemented if require vq irq offloading) * @vdev: vdpa device * @idx: virtqueue index * Returns int: irq number of a virtqueue, * negative number if no irq assigned. * @get_vq_size: Get the size of a specific virtqueue (optional) * @vdev: vdpa device * @idx: virtqueue index * Return u16: the size of the virtqueue * @get_vq_align: Get the virtqueue align requirement * for the device * @vdev: vdpa device * Returns virtqueue algin requirement * @get_vq_group: Get the group id for a specific * virtqueue (optional) * @vdev: vdpa device * @idx: virtqueue index * Returns u32: group id for this virtqueue * @get_vq_desc_group: Get the group id for the descriptor table of * a specific virtqueue (optional) * @vdev: vdpa device * @idx: virtqueue index * Returns u32: group id for the descriptor table * portion of this virtqueue. Could be different * than the one from @get_vq_group, in which case * the access to the descriptor table can be * confined to a separate asid, isolating from * the virtqueue's buffer address access. * @get_device_features: Get virtio features supported by the device * @vdev: vdpa device * Returns the virtio features support by the * device * @get_backend_features: Get parent-specific backend features (optional) * Returns the vdpa features supported by the * device. 
* @set_driver_features: Set virtio features supported by the driver * @vdev: vdpa device * @features: feature support by the driver * Returns integer: success (0) or error (< 0) * @get_driver_features: Get the virtio driver features in action * @vdev: vdpa device * Returns the virtio features accepted * @set_config_cb: Set the config interrupt callback * @vdev: vdpa device * @cb: virtio-vdev interrupt callback structure * @get_vq_num_max: Get the max size of virtqueue * @vdev: vdpa device * Returns u16: max size of virtqueue * @get_vq_num_min: Get the min size of virtqueue (optional) * @vdev: vdpa device * Returns u16: min size of virtqueue * @get_device_id: Get virtio device id * @vdev: vdpa device * Returns u32: virtio device id * @get_vendor_id: Get id for the vendor that provides this device * @vdev: vdpa device * Returns u32: virtio vendor id * @get_status: Get the device status * @vdev: vdpa device * Returns u8: virtio device status * @set_status: Set the device status * @vdev: vdpa device * @status: virtio device status * @reset: Reset device * @vdev: vdpa device * Returns integer: success (0) or error (< 0) * @compat_reset: Reset device with compatibility quirks to * accommodate older userspace. Only needed by * parent driver which used to have bogus reset * behaviour, and has to maintain such behaviour * for compatibility with older userspace. * Historically compliant driver only has to * implement .reset, Historically non-compliant * driver should implement both. * @vdev: vdpa device * @flags: compatibility quirks for reset * Returns integer: success (0) or error (< 0) * @suspend: Suspend the device (optional) * @vdev: vdpa device * Returns integer: success (0) or error (< 0) * @resume: Resume the device (optional) * @vdev: vdpa device * Returns integer: success (0) or error (< 0) * @get_config_size: Get the size of the configuration space includes * fields that are conditional on feature bits. * @vdev: vdpa device * Returns size_t: configuration size * @get_config: Read from device specific configuration space * @vdev: vdpa device * @offset: offset from the beginning of * configuration space * @buf: buffer used to read to * @len: the length to read from * configuration space * @set_config: Write to device specific configuration space * @vdev: vdpa device * @offset: offset from the beginning of * configuration space * @buf: buffer used to write from * @len: the length to write to * configuration space * @get_generation: Get device config generation (optional) * @vdev: vdpa device * Returns u32: device generation * @get_iova_range: Get supported iova range (optional) * @vdev: vdpa device * Returns the iova range supported by * the device. 
* @set_vq_affinity: Set the affinity of virtqueue (optional) * @vdev: vdpa device * @idx: virtqueue index * @cpu_mask: the affinity mask * Returns integer: success (0) or error (< 0) * @get_vq_affinity: Get the affinity of virtqueue (optional) * @vdev: vdpa device * @idx: virtqueue index * Returns the affinity mask * @set_group_asid: Set address space identifier for a * virtqueue group (optional) * @vdev: vdpa device * @group: virtqueue group * @asid: address space id for this group * Returns integer: success (0) or error (< 0) * @set_map: Set device memory mapping (optional) * Needed for device that using device * specific DMA translation (on-chip IOMMU) * @vdev: vdpa device * @asid: address space identifier * @iotlb: vhost memory mapping to be * used by the vDPA * Returns integer: success (0) or error (< 0) * @dma_map: Map an area of PA to IOVA (optional) * Needed for device that using device * specific DMA translation (on-chip IOMMU) * and preferring incremental map. * @vdev: vdpa device * @asid: address space identifier * @iova: iova to be mapped * @size: size of the area * @pa: physical address for the map * @perm: device access permission (VHOST_MAP_XX) * Returns integer: success (0) or error (< 0) * @dma_unmap: Unmap an area of IOVA (optional but * must be implemented with dma_map) * Needed for device that using device * specific DMA translation (on-chip IOMMU) * and preferring incremental unmap. * @vdev: vdpa device * @asid: address space identifier * @iova: iova to be unmapped * @size: size of the area * Returns integer: success (0) or error (< 0) * @reset_map: Reset device memory mapping to the default * state (optional) * Needed for devices that are using device * specific DMA translation and prefer mapping * to be decoupled from the virtio life cycle, * i.e. device .reset op does not reset mapping * @vdev: vdpa device * @asid: address space identifier * Returns integer: success (0) or error (< 0) * @get_vq_dma_dev: Get the dma device for a specific * virtqueue (optional) * @vdev: vdpa device * @idx: virtqueue index * Returns pointer to structure device or error (NULL) * @bind_mm: Bind the device to a specific address space * so the vDPA framework can use VA when this * callback is implemented. (optional) * @vdev: vdpa device * @mm: address space to bind * @unbind_mm: Unbind the device from the address space * bound using the bind_mm callback. 
(optional) * @vdev: vdpa device * @free: Free resources that belongs to vDPA (optional) * @vdev: vdpa device */ struct vdpa_config_ops { /* Virtqueue ops */ int (*set_vq_address)(struct vdpa_device *vdev, u16 idx, u64 desc_area, u64 driver_area, u64 device_area); void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num); void (*kick_vq)(struct vdpa_device *vdev, u16 idx); void (*kick_vq_with_data)(struct vdpa_device *vdev, u32 data); void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx, struct vdpa_callback *cb); void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready); bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx); int (*set_vq_state)(struct vdpa_device *vdev, u16 idx, const struct vdpa_vq_state *state); int (*get_vq_state)(struct vdpa_device *vdev, u16 idx, struct vdpa_vq_state *state); int (*get_vendor_vq_stats)(struct vdpa_device *vdev, u16 idx, struct sk_buff *msg, struct netlink_ext_ack *extack); struct vdpa_notification_area (*get_vq_notification)(struct vdpa_device *vdev, u16 idx); /* vq irq is not expected to be changed once DRIVER_OK is set */ int (*get_vq_irq)(struct vdpa_device *vdev, u16 idx); u16 (*get_vq_size)(struct vdpa_device *vdev, u16 idx); /* Device ops */ u32 (*get_vq_align)(struct vdpa_device *vdev); u32 (*get_vq_group)(struct vdpa_device *vdev, u16 idx); u32 (*get_vq_desc_group)(struct vdpa_device *vdev, u16 idx); u64 (*get_device_features)(struct vdpa_device *vdev); u64 (*get_backend_features)(const struct vdpa_device *vdev); int (*set_driver_features)(struct vdpa_device *vdev, u64 features); u64 (*get_driver_features)(struct vdpa_device *vdev); void (*set_config_cb)(struct vdpa_device *vdev, struct vdpa_callback *cb); u16 (*get_vq_num_max)(struct vdpa_device *vdev); u16 (*get_vq_num_min)(struct vdpa_device *vdev); u32 (*get_device_id)(struct vdpa_device *vdev); u32 (*get_vendor_id)(struct vdpa_device *vdev); u8 (*get_status)(struct vdpa_device *vdev); void (*set_status)(struct vdpa_device *vdev, u8 status); int (*reset)(struct vdpa_device *vdev); int (*compat_reset)(struct vdpa_device *vdev, u32 flags); #define VDPA_RESET_F_CLEAN_MAP 1 int (*suspend)(struct vdpa_device *vdev); int (*resume)(struct vdpa_device *vdev); size_t (*get_config_size)(struct vdpa_device *vdev); void (*get_config)(struct vdpa_device *vdev, unsigned int offset, void *buf, unsigned int len); void (*set_config)(struct vdpa_device *vdev, unsigned int offset, const void *buf, unsigned int len); u32 (*get_generation)(struct vdpa_device *vdev); struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev); int (*set_vq_affinity)(struct vdpa_device *vdev, u16 idx, const struct cpumask *cpu_mask); const struct cpumask *(*get_vq_affinity)(struct vdpa_device *vdev, u16 idx); /* DMA ops */ int (*set_map)(struct vdpa_device *vdev, unsigned int asid, struct vhost_iotlb *iotlb); int (*dma_map)(struct vdpa_device *vdev, unsigned int asid, u64 iova, u64 size, u64 pa, u32 perm, void *opaque); int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid, u64 iova, u64 size); int (*reset_map)(struct vdpa_device *vdev, unsigned int asid); int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group, unsigned int asid); struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx); int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm); void (*unbind_mm)(struct vdpa_device *vdev); /* Free device resources */ void (*free)(struct vdpa_device *vdev); }; struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, unsigned int 
ngroups, unsigned int nas, size_t size, const char *name, bool use_va); /** * vdpa_alloc_device - allocate and initilaize a vDPA device * * @dev_struct: the type of the parent structure * @member: the name of struct vdpa_device within the @dev_struct * @parent: the parent device * @config: the bus operations that is supported by this device * @ngroups: the number of virtqueue groups supported by this device * @nas: the number of address spaces * @name: name of the vdpa device * @use_va: indicate whether virtual address must be used by this device * * Return allocated data structure or ERR_PTR upon error */ #define vdpa_alloc_device(dev_struct, member, parent, config, ngroups, nas, \ name, use_va) \ container_of((__vdpa_alloc_device( \ parent, config, ngroups, nas, \ (sizeof(dev_struct) + \ BUILD_BUG_ON_ZERO(offsetof( \ dev_struct, member))), name, use_va)), \ dev_struct, member) int vdpa_register_device(struct vdpa_device *vdev, u32 nvqs); void vdpa_unregister_device(struct vdpa_device *vdev); int _vdpa_register_device(struct vdpa_device *vdev, u32 nvqs); void _vdpa_unregister_device(struct vdpa_device *vdev); /** * struct vdpa_driver - operations for a vDPA driver * @driver: underlying device driver * @probe: the function to call when a device is found. Returns 0 or -errno. * @remove: the function to call when a device is removed. */ struct vdpa_driver { struct device_driver driver; int (*probe)(struct vdpa_device *vdev); void (*remove)(struct vdpa_device *vdev); }; #define vdpa_register_driver(drv) \ __vdpa_register_driver(drv, THIS_MODULE) int __vdpa_register_driver(struct vdpa_driver *drv, struct module *owner); void vdpa_unregister_driver(struct vdpa_driver *drv); #define module_vdpa_driver(__vdpa_driver) \ module_driver(__vdpa_driver, vdpa_register_driver, \ vdpa_unregister_driver) static inline struct vdpa_driver *drv_to_vdpa(struct device_driver *driver) { return container_of(driver, struct vdpa_driver, driver); } static inline struct vdpa_device *dev_to_vdpa(struct device *_dev) { return container_of(_dev, struct vdpa_device, dev); } static inline void *vdpa_get_drvdata(const struct vdpa_device *vdev) { return dev_get_drvdata(&vdev->dev); } static inline void vdpa_set_drvdata(struct vdpa_device *vdev, void *data) { dev_set_drvdata(&vdev->dev, data); } static inline struct device *vdpa_get_dma_dev(struct vdpa_device *vdev) { return vdev->dma_dev; } static inline int vdpa_reset(struct vdpa_device *vdev, u32 flags) { const struct vdpa_config_ops *ops = vdev->config; int ret; down_write(&vdev->cf_lock); vdev->features_valid = false; if (ops->compat_reset && flags) ret = ops->compat_reset(vdev, flags); else ret = ops->reset(vdev); up_write(&vdev->cf_lock); return ret; } static inline int vdpa_set_features_unlocked(struct vdpa_device *vdev, u64 features) { const struct vdpa_config_ops *ops = vdev->config; int ret; vdev->features_valid = true; ret = ops->set_driver_features(vdev, features); return ret; } static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features) { int ret; down_write(&vdev->cf_lock); ret = vdpa_set_features_unlocked(vdev, features); up_write(&vdev->cf_lock); return ret; } void vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, void *buf, unsigned int len); void vdpa_set_config(struct vdpa_device *dev, unsigned int offset, const void *buf, unsigned int length); void vdpa_set_status(struct vdpa_device *vdev, u8 status); /** * struct vdpa_mgmtdev_ops - vdpa device ops * @dev_add: Add a vdpa device using alloc and register * @mdev: parent device 
 *           to use for device addition
 * @name: name of the new vdpa device
 * @config: config attributes to apply to the device under creation
 *          The driver needs to add a new device using
 *          _vdpa_register_device() after fully initializing the vdpa
 *          device. The driver must return 0 on success or an appropriate
 *          error code.
 * @dev_del: Remove a vdpa device using unregister
 *           @mdev: parent device to use for device removal
 *           @dev: vdpa device to remove
 *           The driver needs to remove the specified device by calling
 *           _vdpa_unregister_device().
 * @dev_set_attr: change a vdpa device's attributes after it was created
 *                @mdev: parent device to use for the device
 *                @dev: vdpa device structure
 *                @config: attributes to be set for the device.
 *                The driver needs to check the mask of the structure and
 *                then set the related information to the vdpa device.
 *                The driver must return 0 if set successfully.
 */
struct vdpa_mgmtdev_ops {
	int (*dev_add)(struct vdpa_mgmt_dev *mdev, const char *name,
		       const struct vdpa_dev_set_config *config);
	void (*dev_del)(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev);
	int (*dev_set_attr)(struct vdpa_mgmt_dev *mdev,
			    struct vdpa_device *dev,
			    const struct vdpa_dev_set_config *config);
};

/**
 * struct vdpa_mgmt_dev - vdpa management device
 * @device: Management parent device
 * @ops: operations supported by management device
 * @id_table: Pointer to device id table of supported ids
 * @config_attr_mask: bit mask of attributes of type enum vdpa_attr that
 *		      management device support during dev_add callback
 * @list: list entry
 * @supported_features: features supported by device
 * @max_supported_vqs: maximum number of virtqueues supported by device
 */
struct vdpa_mgmt_dev {
	struct device *device;
	const struct vdpa_mgmtdev_ops *ops;
	struct virtio_device_id *id_table;
	u64 config_attr_mask;
	struct list_head list;
	u64 supported_features;
	u32 max_supported_vqs;
};

int vdpa_mgmtdev_register(struct vdpa_mgmt_dev *mdev);
void vdpa_mgmtdev_unregister(struct vdpa_mgmt_dev *mdev);

#endif /* _LINUX_VDPA_H */
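/*
 * Minimal bus-driver skeleton against the declarations above, not part of
 * this header: all names are hypothetical and the probe body is a stub.
 */
static int example_vdpa_probe(struct vdpa_device *vdev)
{
	dev_info(&vdev->dev, "bound example driver, %u vqs\n", vdev->nvqs);
	return 0;
}

static void example_vdpa_remove(struct vdpa_device *vdev)
{
}

static struct vdpa_driver example_vdpa_driver = {
	.driver = {
		.name = "example_vdpa",
	},
	.probe = example_vdpa_probe,
	.remove = example_vdpa_remove,
};

module_vdpa_driver(example_vdpa_driver);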
// SPDX-License-Identifier: GPL-2.0

#include <linux/kernel.h>
#include <linux/irqflags.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/bug.h>
#include "printk_ringbuffer.h"
#include "internal.h"

/**
 * DOC: printk_ringbuffer overview
 *
 * Data Structure
 * --------------
 * The printk_ringbuffer is made up of 2 internal ringbuffers:
 *
 * desc_ring
 *	A ring of descriptors and their meta data (such as sequence number,
 *	timestamp, loglevel, etc.) as well as internal state information about
 *	the record and logical positions specifying where in the other
 *	ringbuffer the text strings are located.
 *
 * text_data_ring
 *	A ring of data blocks. A data block consists of an unsigned long
 *	integer (ID) that maps to a desc_ring index followed by the text
 *	string of the record.
 *
 * The internal state information of a descriptor is the key element to allow
 * readers and writers to locklessly synchronize access to the data.
 *
 * Implementation
 * --------------
 *
 * Descriptor Ring
 * ~~~~~~~~~~~~~~~
 * The descriptor ring is an array of descriptors. A descriptor contains
 * essential meta data to track the data of a printk record using
 * blk_lpos structs pointing to associated text data blocks (see
 * "Data Ring" below). Each descriptor is assigned an ID that maps
 * directly to index values of the descriptor array and has a state. The ID
 * and the state are bitwise combined into a single descriptor field named
 * @state_var, allowing ID and state to be synchronously and atomically
 * updated.
 *
 * Descriptors have four states:
 *
 * reserved
 *	A writer is modifying the record.
 *
 * committed
 *	The record and all its data are written. A writer can reopen the
 *	descriptor (transitioning it back to reserved), but in the committed
 *	state the data is consistent.
 *
 * finalized
 *	The record and all its data are complete and available for reading. A
 *	writer cannot reopen the descriptor.
 *
 * reusable
 *	The record exists, but its text and/or meta data may no longer be
 *	available.
 *
 * Querying the @state_var of a record requires providing the ID of the
 * descriptor to query. This can yield a possible fifth (pseudo) state:
 *
 * miss
 *	The descriptor being queried has an unexpected ID.
 *
 * The descriptor ring has a @tail_id that contains the ID of the oldest
 * descriptor and @head_id that contains the ID of the newest descriptor.
 *
 * When a new descriptor should be created (and the ring is full), the tail
 * descriptor is invalidated by first transitioning to the reusable state and
 * then invalidating all tail data blocks up to and including the data blocks
 * associated with the tail descriptor (for the text ring). Then
 * @tail_id is advanced, followed by advancing @head_id. And finally the
 * @state_var of the new descriptor is initialized to the new ID and reserved
 * state.
 *
 * The @tail_id can only be advanced if the new @tail_id would be in the
 * committed or reusable queried state. This ensures that a valid sequence
 * number for the tail is always available.
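 *
 * For illustration (a sketch based on the DESC_SV(), DESC_ID() and
 * DESC_STATE() helpers defined in printk_ringbuffer.h), the @state_var
 * packing can be pictured as::
 *
 *	state_var = DESC_SV(id, state);	// state in the 2 high bits, ID below
 *	id = DESC_ID(state_var);	// mask off the state bits
 *	state = DESC_STATE(state_var);	// extract the 2 high bits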
* * Descriptor Finalization * ~~~~~~~~~~~~~~~~~~~~~~~ * When a writer calls the commit function prb_commit(), record data is * fully stored and is consistent within the ringbuffer. However, a writer can * reopen that record, claiming exclusive access (as with prb_reserve()), and * modify that record. When finished, the writer must again commit the record. * * In order for a record to be made available to readers (and also become * recyclable for writers), it must be finalized. A finalized record cannot be * reopened and can never become "unfinalized". Record finalization can occur * in three different scenarios: * * 1) A writer can simultaneously commit and finalize its record by calling * prb_final_commit() instead of prb_commit(). * * 2) When a new record is reserved and the previous record has been * committed via prb_commit(), that previous record is automatically * finalized. * * 3) When a record is committed via prb_commit() and a newer record * already exists, the record being committed is automatically finalized. * * Data Ring * ~~~~~~~~~ * The text data ring is a byte array composed of data blocks. Data blocks are * referenced by blk_lpos structs that point to the logical position of the * beginning of a data block and the beginning of the next adjacent data * block. Logical positions are mapped directly to index values of the byte * array ringbuffer. * * Each data block consists of an ID followed by the writer data. The ID is * the identifier of a descriptor that is associated with the data block. A * given data block is considered valid if all of the following conditions * are met: * * 1) The descriptor associated with the data block is in the committed * or finalized queried state. * * 2) The blk_lpos struct within the descriptor associated with the data * block references back to the same data block. * * 3) The data block is within the head/tail logical position range. * * If the writer data of a data block would extend beyond the end of the * byte array, only the ID of the data block is stored at the logical * position and the full data block (ID and writer data) is stored at the * beginning of the byte array. The referencing blk_lpos will point to the * ID before the wrap and the next data block will be at the logical * position adjacent the full data block after the wrap. * * Data rings have a @tail_lpos that points to the beginning of the oldest * data block and a @head_lpos that points to the logical position of the * next (not yet existing) data block. * * When a new data block should be created (and the ring is full), tail data * blocks will first be invalidated by putting their associated descriptors * into the reusable state and then pushing the @tail_lpos forward beyond * them. Then the @head_lpos is pushed forward and is associated with a new * descriptor. If a data block is not valid, the @tail_lpos cannot be * advanced beyond it. * * Info Array * ~~~~~~~~~~ * The general meta data of printk records are stored in printk_info structs, * stored in an array with the same number of elements as the descriptor ring. * Each info corresponds to the descriptor of the same index in the * descriptor ring. Info validity is confirmed by evaluating the corresponding * descriptor before and after loading the info. * * Usage * ----- * Here are some simple examples demonstrating writers and readers. 
For the * examples a global ringbuffer (test_rb) is available (which is not the * actual ringbuffer used by printk):: * * DEFINE_PRINTKRB(test_rb, 15, 5); * * This ringbuffer allows up to 32768 records (2 ^ 15) and has a size of * 1 MiB (2 ^ (15 + 5)) for text data. * * Sample writer code:: * * const char *textstr = "message text"; * struct prb_reserved_entry e; * struct printk_record r; * * // specify how much to allocate * prb_rec_init_wr(&r, strlen(textstr) + 1); * * if (prb_reserve(&e, &test_rb, &r)) { * snprintf(r.text_buf, r.text_buf_size, "%s", textstr); * * r.info->text_len = strlen(textstr); * r.info->ts_nsec = local_clock(); * r.info->caller_id = printk_caller_id(); * * // commit and finalize the record * prb_final_commit(&e); * } * * Note that additional writer functions are available to extend a record * after it has been committed but not yet finalized. This can be done as * long as no new records have been reserved and the caller is the same. * * Sample writer code (record extending):: * * // alternate rest of previous example * * r.info->text_len = strlen(textstr); * r.info->ts_nsec = local_clock(); * r.info->caller_id = printk_caller_id(); * * // commit the record (but do not finalize yet) * prb_commit(&e); * } * * ... * * // specify additional 5 bytes text space to extend * prb_rec_init_wr(&r, 5); * * // try to extend, but only if it does not exceed 32 bytes * if (prb_reserve_in_last(&e, &test_rb, &r, printk_caller_id(), 32)) { * snprintf(&r.text_buf[r.info->text_len], * r.text_buf_size - r.info->text_len, "hello"); * * r.info->text_len += 5; * * // commit and finalize the record * prb_final_commit(&e); * } * * Sample reader code:: * * struct printk_info info; * struct printk_record r; * char text_buf[32]; * u64 seq; * * prb_rec_init_rd(&r, &info, &text_buf[0], sizeof(text_buf)); * * prb_for_each_record(0, &test_rb, &seq, &r) { * if (info.seq != seq) * pr_warn("lost %llu records\n", info.seq - seq); * * if (info.text_len > r.text_buf_size) { * pr_warn("record %llu text truncated\n", info.seq); * text_buf[r.text_buf_size - 1] = 0; * } * * pr_info("%llu: %llu: %s\n", info.seq, info.ts_nsec, * &text_buf[0]); * } * * Note that additional less convenient reader functions are available to * allow complex record access. * * ABA Issues * ~~~~~~~~~~ * To help avoid ABA issues, descriptors are referenced by IDs (array index * values combined with tagged bits counting array wraps) and data blocks are * referenced by logical positions (array index values combined with tagged * bits counting array wraps). However, on 32-bit systems the number of * tagged bits is relatively small such that an ABA incident is (at least * theoretically) possible. For example, if 4 million maximally sized (1KiB) * printk messages were to occur in NMI context on a 32-bit system, the * interrupted context would not be able to recognize that the 32-bit integer * completely wrapped and thus represents a different data block than the one * the interrupted context expects. * * To help combat this possibility, additional state checking is performed * (such as using cmpxchg() even though set() would suffice). These extra * checks are commented as such and will hopefully catch any ABA issue that * a 32-bit system might experience. * * Memory Barriers * ~~~~~~~~~~~~~~~ * Multiple memory barriers are used. 
To simplify proving correctness and * generating litmus tests, lines of code related to memory barriers * (loads, stores, and the associated memory barriers) are labeled:: * * LMM(function:letter) * * Comments reference the labels using only the "function:letter" part. * * The memory barrier pairs and their ordering are: * * desc_reserve:D / desc_reserve:B * push descriptor tail (id), then push descriptor head (id) * * desc_reserve:D / data_push_tail:B * push data tail (lpos), then set new descriptor reserved (state) * * desc_reserve:D / desc_push_tail:C * push descriptor tail (id), then set new descriptor reserved (state) * * desc_reserve:D / prb_first_seq:C * push descriptor tail (id), then set new descriptor reserved (state) * * desc_reserve:F / desc_read:D * set new descriptor id and reserved (state), then allow writer changes * * data_alloc:A (or data_realloc:A) / desc_read:D * set old descriptor reusable (state), then modify new data block area * * data_alloc:A (or data_realloc:A) / data_push_tail:B * push data tail (lpos), then modify new data block area * * _prb_commit:B / desc_read:B * store writer changes, then set new descriptor committed (state) * * desc_reopen_last:A / _prb_commit:B * set descriptor reserved (state), then read descriptor data * * _prb_commit:B / desc_reserve:D * set new descriptor committed (state), then check descriptor head (id) * * data_push_tail:D / data_push_tail:A * set descriptor reusable (state), then push data tail (lpos) * * desc_push_tail:B / desc_reserve:D * set descriptor reusable (state), then push descriptor tail (id) * * desc_update_last_finalized:A / desc_last_finalized_seq:A * store finalized record, then set new highest finalized sequence number */ #define DATA_SIZE(data_ring) _DATA_SIZE((data_ring)->size_bits) #define DATA_SIZE_MASK(data_ring) (DATA_SIZE(data_ring) - 1) #define DESCS_COUNT(desc_ring) _DESCS_COUNT((desc_ring)->count_bits) #define DESCS_COUNT_MASK(desc_ring) (DESCS_COUNT(desc_ring) - 1) /* Determine the data array index from a logical position. */ #define DATA_INDEX(data_ring, lpos) ((lpos) & DATA_SIZE_MASK(data_ring)) /* Determine the desc array index from an ID or sequence number. */ #define DESC_INDEX(desc_ring, n) ((n) & DESCS_COUNT_MASK(desc_ring)) /* Determine how many times the data array has wrapped. */ #define DATA_WRAPS(data_ring, lpos) ((lpos) >> (data_ring)->size_bits) /* Determine if a logical position refers to a data-less block. */ #define LPOS_DATALESS(lpos) ((lpos) & 1UL) #define BLK_DATALESS(blk) (LPOS_DATALESS((blk)->begin) && \ LPOS_DATALESS((blk)->next)) /* Get the logical position at index 0 of the current wrap. */ #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \ ((lpos) & ~DATA_SIZE_MASK(data_ring)) /* Get the ID for the same index of the previous wrap as the given ID. */ #define DESC_ID_PREV_WRAP(desc_ring, id) \ DESC_ID((id) - DESCS_COUNT(desc_ring)) /* * A data block: mapped directly to the beginning of the data block area * specified as a logical position within the data ring. * * @id: the ID of the associated descriptor * @data: the writer data * * Note that the size of a data block is only known by its associated * descriptor. */ struct prb_data_block { unsigned long id; char data[]; }; /* * Return the descriptor associated with @n. @n can be either a * descriptor ID or a sequence number. */ static struct prb_desc *to_desc(struct prb_desc_ring *desc_ring, u64 n) { return &desc_ring->descs[DESC_INDEX(desc_ring, n)]; } /* * Return the printk_info associated with @n. 
@n can be either a * descriptor ID or a sequence number. */ static struct printk_info *to_info(struct prb_desc_ring *desc_ring, u64 n) { return &desc_ring->infos[DESC_INDEX(desc_ring, n)]; } static struct prb_data_block *to_block(struct prb_data_ring *data_ring, unsigned long begin_lpos) { return (void *)&data_ring->data[DATA_INDEX(data_ring, begin_lpos)]; } /* * Increase the data size to account for data block meta data plus any * padding so that the adjacent data block is aligned on the ID size. */ static unsigned int to_blk_size(unsigned int size) { struct prb_data_block *db = NULL; size += sizeof(*db); size = ALIGN(size, sizeof(db->id)); return size; } /* * Sanity checker for reserve size. The ringbuffer code assumes that a data * block does not exceed the maximum possible size that could fit within the * ringbuffer. This function provides that basic size check so that the * assumption is safe. */ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size) { struct prb_data_block *db = NULL; if (size == 0) return true; /* * Ensure the alignment padded size could possibly fit in the data * array. The largest possible data block must still leave room for * at least the ID of the next block. */ size = to_blk_size(size); if (size > DATA_SIZE(data_ring) - sizeof(db->id)) return false; return true; } /* Query the state of a descriptor. */ static enum desc_state get_desc_state(unsigned long id, unsigned long state_val) { if (id != DESC_ID(state_val)) return desc_miss; return DESC_STATE(state_val); } /* * Get a copy of a specified descriptor and return its queried state. If the * descriptor is in an inconsistent state (miss or reserved), the caller can * only expect the descriptor's @state_var field to be valid. * * The sequence number and caller_id can be optionally retrieved. Like all * non-state_var data, they are only valid if the descriptor is in a * consistent state. */ static enum desc_state desc_read(struct prb_desc_ring *desc_ring, unsigned long id, struct prb_desc *desc_out, u64 *seq_out, u32 *caller_id_out) { struct printk_info *info = to_info(desc_ring, id); struct prb_desc *desc = to_desc(desc_ring, id); atomic_long_t *state_var = &desc->state_var; enum desc_state d_state; unsigned long state_val; /* Check the descriptor state. */ state_val = atomic_long_read(state_var); /* LMM(desc_read:A) */ d_state = get_desc_state(id, state_val); if (d_state == desc_miss || d_state == desc_reserved) { /* * The descriptor is in an inconsistent state. Set at least * @state_var so that the caller can see the details of * the inconsistent state. */ goto out; } /* * Guarantee the state is loaded before copying the descriptor * content. This avoids copying obsolete descriptor content that might * not apply to the descriptor state. This pairs with _prb_commit:B. * * Memory barrier involvement: * * If desc_read:A reads from _prb_commit:B, then desc_read:C reads * from _prb_commit:A. * * Relies on: * * WMB from _prb_commit:A to _prb_commit:B * matching * RMB from desc_read:A to desc_read:C */ smp_rmb(); /* LMM(desc_read:B) */ /* * Copy the descriptor data. The data is not valid until the * state has been re-checked. A memcpy() for all of @desc * cannot be used because of the atomic_t @state_var field. 
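	 *
	 * (Descriptive note: only @text_blk_lpos is copied below;
	 * @desc_out->state_var is instead set with atomic_long_set() at
	 * the out: label, once the final state value is known.)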
*/ if (desc_out) { memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos, sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */ } if (seq_out) *seq_out = info->seq; /* also part of desc_read:C */ if (caller_id_out) *caller_id_out = info->caller_id; /* also part of desc_read:C */ /* * 1. Guarantee the descriptor content is loaded before re-checking * the state. This avoids reading an obsolete descriptor state * that may not apply to the copied content. This pairs with * desc_reserve:F. * * Memory barrier involvement: * * If desc_read:C reads from desc_reserve:G, then desc_read:E * reads from desc_reserve:F. * * Relies on: * * WMB from desc_reserve:F to desc_reserve:G * matching * RMB from desc_read:C to desc_read:E * * 2. Guarantee the record data is loaded before re-checking the * state. This avoids reading an obsolete descriptor state that may * not apply to the copied data. This pairs with data_alloc:A and * data_realloc:A. * * Memory barrier involvement: * * If copy_data:A reads from data_alloc:B, then desc_read:E * reads from desc_make_reusable:A. * * Relies on: * * MB from desc_make_reusable:A to data_alloc:B * matching * RMB from desc_read:C to desc_read:E * * Note: desc_make_reusable:A and data_alloc:B can be different * CPUs. However, the data_alloc:B CPU (which performs the * full memory barrier) must have previously seen * desc_make_reusable:A. */ smp_rmb(); /* LMM(desc_read:D) */ /* * The data has been copied. Return the current descriptor state, * which may have changed since the load above. */ state_val = atomic_long_read(state_var); /* LMM(desc_read:E) */ d_state = get_desc_state(id, state_val); out: if (desc_out) atomic_long_set(&desc_out->state_var, state_val); return d_state; } /* * Take a specified descriptor out of the finalized state by attempting * the transition from finalized to reusable. Either this context or some * other context will have been successful. */ static void desc_make_reusable(struct prb_desc_ring *desc_ring, unsigned long id) { unsigned long val_finalized = DESC_SV(id, desc_finalized); unsigned long val_reusable = DESC_SV(id, desc_reusable); struct prb_desc *desc = to_desc(desc_ring, id); atomic_long_t *state_var = &desc->state_var; atomic_long_cmpxchg_relaxed(state_var, val_finalized, val_reusable); /* LMM(desc_make_reusable:A) */ } /* * Given the text data ring, put the associated descriptor of each * data block from @lpos_begin until @lpos_end into the reusable state. * * If there is any problem making the associated descriptor reusable, either * the descriptor has not yet been finalized or another writer context has * already pushed the tail lpos past the problematic data block. Regardless, * on error the caller can re-load the tail lpos to determine the situation. */ static bool data_make_reusable(struct printk_ringbuffer *rb, unsigned long lpos_begin, unsigned long lpos_end, unsigned long *lpos_out) { struct prb_data_ring *data_ring = &rb->text_data_ring; struct prb_desc_ring *desc_ring = &rb->desc_ring; struct prb_data_block *blk; enum desc_state d_state; struct prb_desc desc; struct prb_data_blk_lpos *blk_lpos = &desc.text_blk_lpos; unsigned long id; /* Loop until @lpos_begin has advanced to or beyond @lpos_end. */ while ((lpos_end - lpos_begin) - 1 < DATA_SIZE(data_ring)) { blk = to_block(data_ring, lpos_begin); /* * Load the block ID from the data block. This is a data race * against a writer that may have newly reserved this data * area. 
If the loaded value matches a valid descriptor ID, * the blk_lpos of that descriptor will be checked to make * sure it points back to this data block. If the check fails, * the data area has been recycled by another writer. */ id = blk->id; /* LMM(data_make_reusable:A) */ d_state = desc_read(desc_ring, id, &desc, NULL, NULL); /* LMM(data_make_reusable:B) */ switch (d_state) { case desc_miss: case desc_reserved: case desc_committed: return false; case desc_finalized: /* * This data block is invalid if the descriptor * does not point back to it. */ if (blk_lpos->begin != lpos_begin) return false; desc_make_reusable(desc_ring, id); break; case desc_reusable: /* * This data block is invalid if the descriptor * does not point back to it. */ if (blk_lpos->begin != lpos_begin) return false; break; } /* Advance @lpos_begin to the next data block. */ lpos_begin = blk_lpos->next; } *lpos_out = lpos_begin; return true; } /* * Advance the data ring tail to at least @lpos. This function puts * descriptors into the reusable state if the tail is pushed beyond * their associated data block. */ static bool data_push_tail(struct printk_ringbuffer *rb, unsigned long lpos) { struct prb_data_ring *data_ring = &rb->text_data_ring; unsigned long tail_lpos_new; unsigned long tail_lpos; unsigned long next_lpos; /* If @lpos is from a data-less block, there is nothing to do. */ if (LPOS_DATALESS(lpos)) return true; /* * Any descriptor states that have transitioned to reusable due to the * data tail being pushed to this loaded value will be visible to this * CPU. This pairs with data_push_tail:D. * * Memory barrier involvement: * * If data_push_tail:A reads from data_push_tail:D, then this CPU can * see desc_make_reusable:A. * * Relies on: * * MB from desc_make_reusable:A to data_push_tail:D * matches * READFROM from data_push_tail:D to data_push_tail:A * thus * READFROM from desc_make_reusable:A to this CPU */ tail_lpos = atomic_long_read(&data_ring->tail_lpos); /* LMM(data_push_tail:A) */ /* * Loop until the tail lpos is at or beyond @lpos. This condition * may already be satisfied, resulting in no full memory barrier * from data_push_tail:D being performed. However, since this CPU * sees the new tail lpos, any descriptor states that transitioned to * the reusable state must already be visible. */ while ((lpos - tail_lpos) - 1 < DATA_SIZE(data_ring)) { /* * Make all descriptors reusable that are associated with * data blocks before @lpos. */ if (!data_make_reusable(rb, tail_lpos, lpos, &next_lpos)) { /* * 1. Guarantee the block ID loaded in * data_make_reusable() is performed before * reloading the tail lpos. The failed * data_make_reusable() may be due to a newly * recycled data area causing the tail lpos to * have been previously pushed. This pairs with * data_alloc:A and data_realloc:A. * * Memory barrier involvement: * * If data_make_reusable:A reads from data_alloc:B, * then data_push_tail:C reads from * data_push_tail:D. * * Relies on: * * MB from data_push_tail:D to data_alloc:B * matching * RMB from data_make_reusable:A to * data_push_tail:C * * Note: data_push_tail:D and data_alloc:B can be * different CPUs. However, the data_alloc:B * CPU (which performs the full memory * barrier) must have previously seen * data_push_tail:D. * * 2. Guarantee the descriptor state loaded in * data_make_reusable() is performed before * reloading the tail lpos. The failed * data_make_reusable() may be due to a newly * recycled descriptor causing the tail lpos to * have been previously pushed. 
This pairs with * desc_reserve:D. * * Memory barrier involvement: * * If data_make_reusable:B reads from * desc_reserve:F, then data_push_tail:C reads * from data_push_tail:D. * * Relies on: * * MB from data_push_tail:D to desc_reserve:F * matching * RMB from data_make_reusable:B to * data_push_tail:C * * Note: data_push_tail:D and desc_reserve:F can * be different CPUs. However, the * desc_reserve:F CPU (which performs the * full memory barrier) must have previously * seen data_push_tail:D. */ smp_rmb(); /* LMM(data_push_tail:B) */ tail_lpos_new = atomic_long_read(&data_ring->tail_lpos ); /* LMM(data_push_tail:C) */ if (tail_lpos_new == tail_lpos) return false; /* Another CPU pushed the tail. Try again. */ tail_lpos = tail_lpos_new; continue; } /* * Guarantee any descriptor states that have transitioned to * reusable are stored before pushing the tail lpos. A full * memory barrier is needed since other CPUs may have made * the descriptor states reusable. This pairs with * data_push_tail:A. */ if (atomic_long_try_cmpxchg(&data_ring->tail_lpos, &tail_lpos, next_lpos)) { /* LMM(data_push_tail:D) */ break; } } return true; } /* * Advance the desc ring tail. This function advances the tail by one * descriptor, thus invalidating the oldest descriptor. Before advancing * the tail, the tail descriptor is made reusable and all data blocks up to * and including the descriptor's data block are invalidated (i.e. the data * ring tail is pushed past the data block of the descriptor being made * reusable). */ static bool desc_push_tail(struct printk_ringbuffer *rb, unsigned long tail_id) { struct prb_desc_ring *desc_ring = &rb->desc_ring; enum desc_state d_state; struct prb_desc desc; d_state = desc_read(desc_ring, tail_id, &desc, NULL, NULL); switch (d_state) { case desc_miss: /* * If the ID is exactly 1 wrap behind the expected, it is * in the process of being reserved by another writer and * must be considered reserved. */ if (DESC_ID(atomic_long_read(&desc.state_var)) == DESC_ID_PREV_WRAP(desc_ring, tail_id)) { return false; } /* * The ID has changed. Another writer must have pushed the * tail and recycled the descriptor already. Success is * returned because the caller is only interested in the * specified tail being pushed, which it was. */ return true; case desc_reserved: case desc_committed: return false; case desc_finalized: desc_make_reusable(desc_ring, tail_id); break; case desc_reusable: break; } /* * Data blocks must be invalidated before their associated * descriptor can be made available for recycling. Invalidating * them later is not possible because there is no way to trust * data blocks once their associated descriptor is gone. */ if (!data_push_tail(rb, desc.text_blk_lpos.next)) return false; /* * Check the next descriptor after @tail_id before pushing the tail * to it because the tail must always be in a finalized or reusable * state. The implementation of prb_first_seq() relies on this. * * A successful read implies that the next descriptor is less than or * equal to @head_id so there is no risk of pushing the tail past the * head. */ d_state = desc_read(desc_ring, DESC_ID(tail_id + 1), &desc, NULL, NULL); /* LMM(desc_push_tail:A) */ if (d_state == desc_finalized || d_state == desc_reusable) { /* * Guarantee any descriptor states that have transitioned to * reusable are stored before pushing the tail ID. This allows * verifying the recycled descriptor state. A full memory * barrier is needed since other CPUs may have made the * descriptor states reusable. 
This pairs with desc_reserve:D. */ atomic_long_cmpxchg(&desc_ring->tail_id, tail_id, DESC_ID(tail_id + 1)); /* LMM(desc_push_tail:B) */ } else { /* * Guarantee the last state load from desc_read() is before * reloading @tail_id in order to see a new tail ID in the * case that the descriptor has been recycled. This pairs * with desc_reserve:D. * * Memory barrier involvement: * * If desc_push_tail:A reads from desc_reserve:F, then * desc_push_tail:D reads from desc_push_tail:B. * * Relies on: * * MB from desc_push_tail:B to desc_reserve:F * matching * RMB from desc_push_tail:A to desc_push_tail:D * * Note: desc_push_tail:B and desc_reserve:F can be different * CPUs. However, the desc_reserve:F CPU (which performs * the full memory barrier) must have previously seen * desc_push_tail:B. */ smp_rmb(); /* LMM(desc_push_tail:C) */ /* * Re-check the tail ID. The descriptor following @tail_id is * not in an allowed tail state. But if the tail has since * been moved by another CPU, then it does not matter. */ if (atomic_long_read(&desc_ring->tail_id) == tail_id) /* LMM(desc_push_tail:D) */ return false; } return true; } /* Reserve a new descriptor, invalidating the oldest if necessary. */ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out) { struct prb_desc_ring *desc_ring = &rb->desc_ring; unsigned long prev_state_val; unsigned long id_prev_wrap; struct prb_desc *desc; unsigned long head_id; unsigned long id; head_id = atomic_long_read(&desc_ring->head_id); /* LMM(desc_reserve:A) */ do { id = DESC_ID(head_id + 1); id_prev_wrap = DESC_ID_PREV_WRAP(desc_ring, id); /* * Guarantee the head ID is read before reading the tail ID. * Since the tail ID is updated before the head ID, this * guarantees that @id_prev_wrap is never ahead of the tail * ID. This pairs with desc_reserve:D. * * Memory barrier involvement: * * If desc_reserve:A reads from desc_reserve:D, then * desc_reserve:C reads from desc_push_tail:B. * * Relies on: * * MB from desc_push_tail:B to desc_reserve:D * matching * RMB from desc_reserve:A to desc_reserve:C * * Note: desc_push_tail:B and desc_reserve:D can be different * CPUs. However, the desc_reserve:D CPU (which performs * the full memory barrier) must have previously seen * desc_push_tail:B. */ smp_rmb(); /* LMM(desc_reserve:B) */ if (id_prev_wrap == atomic_long_read(&desc_ring->tail_id )) { /* LMM(desc_reserve:C) */ /* * Make space for the new descriptor by * advancing the tail. */ if (!desc_push_tail(rb, id_prev_wrap)) return false; } /* * 1. Guarantee the tail ID is read before validating the * recycled descriptor state. A read memory barrier is * sufficient for this. This pairs with desc_push_tail:B. * * Memory barrier involvement: * * If desc_reserve:C reads from desc_push_tail:B, then * desc_reserve:E reads from desc_make_reusable:A. * * Relies on: * * MB from desc_make_reusable:A to desc_push_tail:B * matching * RMB from desc_reserve:C to desc_reserve:E * * Note: desc_make_reusable:A and desc_push_tail:B can be * different CPUs. However, the desc_push_tail:B CPU * (which performs the full memory barrier) must have * previously seen desc_make_reusable:A. * * 2. Guarantee the tail ID is stored before storing the head * ID. This pairs with desc_reserve:B. * * 3. Guarantee any data ring tail changes are stored before * recycling the descriptor. Data ring tail changes can * happen via desc_push_tail()->data_push_tail(). A full * memory barrier is needed since another CPU may have * pushed the data ring tails. This pairs with * data_push_tail:B. * * 4. 
Guarantee a new tail ID is stored before recycling the * descriptor. A full memory barrier is needed since * another CPU may have pushed the tail ID. This pairs * with desc_push_tail:C and this also pairs with * prb_first_seq:C. * * 5. Guarantee the head ID is stored before trying to * finalize the previous descriptor. This pairs with * _prb_commit:B. */ } while (!atomic_long_try_cmpxchg(&desc_ring->head_id, &head_id, id)); /* LMM(desc_reserve:D) */ desc = to_desc(desc_ring, id); /* * If the descriptor has been recycled, verify the old state val. * See "ABA Issues" about why this verification is performed. */ prev_state_val = atomic_long_read(&desc->state_var); /* LMM(desc_reserve:E) */ if (prev_state_val && get_desc_state(id_prev_wrap, prev_state_val) != desc_reusable) { WARN_ON_ONCE(1); return false; } /* * Assign the descriptor a new ID and set its state to reserved. * See "ABA Issues" about why cmpxchg() instead of set() is used. * * Guarantee the new descriptor ID and state is stored before making * any other changes. A write memory barrier is sufficient for this. * This pairs with desc_read:D. */ if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val, DESC_SV(id, desc_reserved))) { /* LMM(desc_reserve:F) */ WARN_ON_ONCE(1); return false; } /* Now data in @desc can be modified: LMM(desc_reserve:G) */ *id_out = id; return true; } /* Determine the end of a data block. */ static unsigned long get_next_lpos(struct prb_data_ring *data_ring, unsigned long lpos, unsigned int size) { unsigned long begin_lpos; unsigned long next_lpos; begin_lpos = lpos; next_lpos = lpos + size; /* First check if the data block does not wrap. */ if (DATA_WRAPS(data_ring, begin_lpos) == DATA_WRAPS(data_ring, next_lpos)) return next_lpos; /* Wrapping data blocks store their data at the beginning. */ return (DATA_THIS_WRAP_START_LPOS(data_ring, next_lpos) + size); } /* * Allocate a new data block, invalidating the oldest data block(s) * if necessary. This function also associates the data block with * a specified descriptor. */ static char *data_alloc(struct printk_ringbuffer *rb, unsigned int size, struct prb_data_blk_lpos *blk_lpos, unsigned long id) { struct prb_data_ring *data_ring = &rb->text_data_ring; struct prb_data_block *blk; unsigned long begin_lpos; unsigned long next_lpos; if (size == 0) { /* * Data blocks are not created for empty lines. Instead, the * reader will recognize these special lpos values and handle * it appropriately. */ blk_lpos->begin = EMPTY_LINE_LPOS; blk_lpos->next = EMPTY_LINE_LPOS; return NULL; } size = to_blk_size(size); begin_lpos = atomic_long_read(&data_ring->head_lpos); do { next_lpos = get_next_lpos(data_ring, begin_lpos, size); if (!data_push_tail(rb, next_lpos - DATA_SIZE(data_ring))) { /* Failed to allocate, specify a data-less block. */ blk_lpos->begin = FAILED_LPOS; blk_lpos->next = FAILED_LPOS; return NULL; } /* * 1. Guarantee any descriptor states that have transitioned * to reusable are stored before modifying the newly * allocated data area. A full memory barrier is needed * since other CPUs may have made the descriptor states * reusable. See data_push_tail:A about why the reusable * states are visible. This pairs with desc_read:D. * * 2. Guarantee any updated tail lpos is stored before * modifying the newly allocated data area. Another CPU may * be in data_make_reusable() and is reading a block ID * from this area. data_make_reusable() can handle reading * a garbage block ID value, but then it must be able to * load a new tail lpos. 
A full memory barrier is needed
		 * since other CPUs may have updated the tail lpos. This
		 * pairs with data_push_tail:B.
		 */
	} while (!atomic_long_try_cmpxchg(&data_ring->head_lpos, &begin_lpos,
					  next_lpos)); /* LMM(data_alloc:A) */

	blk = to_block(data_ring, begin_lpos);
	blk->id = id; /* LMM(data_alloc:B) */

	if (DATA_WRAPS(data_ring, begin_lpos) !=
	    DATA_WRAPS(data_ring, next_lpos)) {
		/* Wrapping data blocks store their data at the beginning. */
		blk = to_block(data_ring, 0);

		/*
		 * Store the ID on the wrapped block for consistency.
		 * The printk_ringbuffer does not actually use it.
		 */
		blk->id = id;
	}

	blk_lpos->begin = begin_lpos;
	blk_lpos->next = next_lpos;

	return &blk->data[0];
}

/*
 * Try to resize an existing data block associated with the descriptor
 * specified by @id. If the resized data block should become wrapped, it
 * copies the old data to the new data block. If @size yields a data block
 * with the same or less size, the data block is left as is.
 *
 * Fail if this is not the last allocated data block or if there is not
 * enough space or it is not possible to make enough space.
 *
 * Return a pointer to the beginning of the entire data buffer or NULL on
 * failure.
 */
static char *data_realloc(struct printk_ringbuffer *rb, unsigned int size,
			  struct prb_data_blk_lpos *blk_lpos, unsigned long id)
{
	struct prb_data_ring *data_ring = &rb->text_data_ring;
	struct prb_data_block *blk;
	unsigned long head_lpos;
	unsigned long next_lpos;
	bool wrapped;

	/* Reallocation only works if @blk_lpos is the newest data block. */
	head_lpos = atomic_long_read(&data_ring->head_lpos);
	if (head_lpos != blk_lpos->next)
		return NULL;

	/* Keep track if @blk_lpos was a wrapping data block. */
	wrapped = (DATA_WRAPS(data_ring, blk_lpos->begin) !=
		   DATA_WRAPS(data_ring, blk_lpos->next));

	size = to_blk_size(size);

	next_lpos = get_next_lpos(data_ring, blk_lpos->begin, size);

	/* If the data block does not increase, there is nothing to do. */
	if (head_lpos - next_lpos < DATA_SIZE(data_ring)) {
		if (wrapped)
			blk = to_block(data_ring, 0);
		else
			blk = to_block(data_ring, blk_lpos->begin);
		return &blk->data[0];
	}

	if (!data_push_tail(rb, next_lpos - DATA_SIZE(data_ring)))
		return NULL;

	/* The memory barrier involvement is the same as data_alloc:A. */
	if (!atomic_long_try_cmpxchg(&data_ring->head_lpos, &head_lpos,
				     next_lpos)) { /* LMM(data_realloc:A) */
		return NULL;
	}

	blk = to_block(data_ring, blk_lpos->begin);

	if (DATA_WRAPS(data_ring, blk_lpos->begin) != DATA_WRAPS(data_ring, next_lpos)) {
		struct prb_data_block *old_blk = blk;

		/* Wrapping data blocks store their data at the beginning. */
		blk = to_block(data_ring, 0);

		/*
		 * Store the ID on the wrapped block for consistency.
		 * The printk_ringbuffer does not actually use it.
		 */
		blk->id = id;

		if (!wrapped) {
			/*
			 * Since the allocated space is now in the newly
			 * created wrapping data block, copy the content
			 * from the old data block.
			 */
			memcpy(&blk->data[0], &old_blk->data[0],
			       (blk_lpos->next - blk_lpos->begin) - sizeof(blk->id));
		}
	}

	blk_lpos->next = next_lpos;

	return &blk->data[0];
}

/* Return the number of bytes used by a data block. */
static unsigned int space_used(struct prb_data_ring *data_ring,
			       struct prb_data_blk_lpos *blk_lpos)
{
	/* Data-less blocks take no space. */
	if (BLK_DATALESS(blk_lpos))
		return 0;

	if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next)) {
		/* Data block does not wrap. */
		return (DATA_INDEX(data_ring, blk_lpos->next) -
			DATA_INDEX(data_ring, blk_lpos->begin));
	}

	/*
	 * For wrapping data blocks, the trailing (wasted) space is
	 * also counted.
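	 *
	 * For example (an illustrative sketch assuming a 4 KiB data ring):
	 * a block with begin index 4000 and next index 100 uses
	 * 100 + 4096 - 4000 = 196 bytes, counting the unused bytes at the
	 * end of the ring.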
	 */
	return (DATA_INDEX(data_ring, blk_lpos->next) + DATA_SIZE(data_ring) -
		DATA_INDEX(data_ring, blk_lpos->begin));
}

/*
 * Given @blk_lpos, return a pointer to the writer data from the data block
 * and calculate the size of the data part. A NULL pointer is returned if
 * @blk_lpos specifies values that could never be legal.
 *
 * This function (used by readers) performs strict validation on the lpos
 * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
 * triggered if an internal error is detected.
 */
static const char *get_data(struct prb_data_ring *data_ring,
			    struct prb_data_blk_lpos *blk_lpos,
			    unsigned int *data_size)
{
	struct prb_data_block *db;

	/* Data-less data block description. */
	if (BLK_DATALESS(blk_lpos)) {
		/*
		 * Records that are just empty lines are also valid, even
		 * though they do not have a data block. For such records
		 * explicitly return empty string data to signify success.
		 */
		if (blk_lpos->begin == EMPTY_LINE_LPOS &&
		    blk_lpos->next == EMPTY_LINE_LPOS) {
			*data_size = 0;
			return "";
		}

		/* Data lost, invalid, or otherwise unavailable. */
		return NULL;
	}

	/* Regular data block: @begin less than @next and in same wrap. */
	if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next) &&
	    blk_lpos->begin < blk_lpos->next) {
		db = to_block(data_ring, blk_lpos->begin);
		*data_size = blk_lpos->next - blk_lpos->begin;

	/* Wrapping data block: @begin is one wrap behind @next. */
	} else if (DATA_WRAPS(data_ring, blk_lpos->begin + DATA_SIZE(data_ring)) ==
		   DATA_WRAPS(data_ring, blk_lpos->next)) {
		db = to_block(data_ring, 0);
		*data_size = DATA_INDEX(data_ring, blk_lpos->next);

	/* Illegal block description. */
	} else {
		WARN_ON_ONCE(1);
		return NULL;
	}

	/* A valid data block will always be aligned to the ID size. */
	if (WARN_ON_ONCE(blk_lpos->begin != ALIGN(blk_lpos->begin, sizeof(db->id))) ||
	    WARN_ON_ONCE(blk_lpos->next != ALIGN(blk_lpos->next, sizeof(db->id)))) {
		return NULL;
	}

	/* A valid data block will always have at least an ID. */
	if (WARN_ON_ONCE(*data_size < sizeof(db->id)))
		return NULL;

	/* Subtract block ID space from size to reflect data size. */
	*data_size -= sizeof(db->id);

	return &db->data[0];
}

/*
 * Attempt to transition the newest descriptor from committed back to reserved
 * so that the record can be modified by a writer again. This is only possible
 * if the descriptor is not yet finalized and the provided @caller_id matches.
 */
static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
					 u32 caller_id, unsigned long *id_out)
{
	unsigned long prev_state_val;
	enum desc_state d_state;
	struct prb_desc desc;
	struct prb_desc *d;
	unsigned long id;
	u32 cid;

	id = atomic_long_read(&desc_ring->head_id);

	/*
	 * To reduce unnecessary reopening, first check if the descriptor
	 * state and caller ID are correct.
	 */
	d_state = desc_read(desc_ring, id, &desc, NULL, &cid);
	if (d_state != desc_committed || cid != caller_id)
		return NULL;

	d = to_desc(desc_ring, id);

	prev_state_val = DESC_SV(id, desc_committed);

	/*
	 * Guarantee the reserved state is stored before reading any
	 * record data. A full memory barrier is needed because @state_var
	 * modification is followed by reading. This pairs with _prb_commit:B.
	 *
	 * Memory barrier involvement:
	 *
	 * If desc_reopen_last:A reads from _prb_commit:B, then
	 * prb_reserve_in_last:A reads from _prb_commit:A.
	 *
	 * Relies on:
	 *
	 * WMB from _prb_commit:A to _prb_commit:B
	 *    matching
	 * MB from desc_reopen_last:A to prb_reserve_in_last:A
	 */
	if (!atomic_long_try_cmpxchg(&d->state_var, &prev_state_val,
			DESC_SV(id, desc_reserved))) { /* LMM(desc_reopen_last:A) */
		return NULL;
	}

	*id_out = id;
	return d;
}

/**
 * prb_reserve_in_last() - Re-reserve and extend the space in the ringbuffer
 *                         used by the newest record.
 *
 * @e:         The entry structure to setup.
 * @rb:        The ringbuffer to re-reserve and extend data in.
 * @r:         The record structure to allocate buffers for.
 * @caller_id: The caller ID of the caller (reserving writer).
 * @max_size:  Fail if the extended size would be greater than this.
 *
 * This is the public function available to writers to re-reserve and extend
 * data.
 *
 * The writer specifies the text size to extend (not the new total size) by
 * setting the @text_buf_size field of @r. To ensure proper initialization
 * of @r, prb_rec_init_wr() should be used.
 *
 * This function will fail if @caller_id does not match the caller ID of the
 * newest record. In that case the caller must reserve new data using
 * prb_reserve().
 *
 * Context: Any context. Disables local interrupts on success.
 * Return: true if text data could be extended, otherwise false.
 *
 * On success:
 *
 *   - @r->text_buf points to the beginning of the entire text buffer.
 *
 *   - @r->text_buf_size is set to the new total size of the buffer.
 *
 *   - @r->info is not touched so that @r->info->text_len could be used
 *     to append the text.
 *
 *   - prb_record_text_space() can be used on @e to query the new
 *     actually used space.
 *
 * Important: All @r->info fields will already be set with the current values
 *            for the record. I.e. @r->info->text_len will be less than
 *            @text_buf_size. Writers can use @r->info->text_len to know
 *            where concatenation begins and writers should update
 *            @r->info->text_len after concatenating.
 */
bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
			 struct printk_record *r, u32 caller_id, unsigned int max_size)
{
	struct prb_desc_ring *desc_ring = &rb->desc_ring;
	struct printk_info *info;
	unsigned int data_size;
	struct prb_desc *d;
	unsigned long id;

	local_irq_save(e->irqflags);

	/* Transition the newest descriptor back to the reserved state. */
	d = desc_reopen_last(desc_ring, caller_id, &id);
	if (!d) {
		local_irq_restore(e->irqflags);
		goto fail_reopen;
	}

	/* Now the writer has exclusive access: LMM(prb_reserve_in_last:A) */

	info = to_info(desc_ring, id);

	/*
	 * Set the @e fields here so that prb_commit() can be used if
	 * anything fails from now on.
	 */
	e->rb = rb;
	e->id = id;

	/*
	 * desc_reopen_last() checked the caller_id, but there was no
	 * exclusive access at that point. The descriptor may have
	 * changed since then.
	 */
	if (caller_id != info->caller_id)
		goto fail;

	if (BLK_DATALESS(&d->text_blk_lpos)) {
		if (WARN_ON_ONCE(info->text_len != 0)) {
			pr_warn_once("wrong text_len value (%hu, expecting 0)\n",
				     info->text_len);
			info->text_len = 0;
		}

		if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
			goto fail;

		if (r->text_buf_size > max_size)
			goto fail;

		r->text_buf = data_alloc(rb, r->text_buf_size,
					 &d->text_blk_lpos, id);
	} else {
		if (!get_data(&rb->text_data_ring, &d->text_blk_lpos, &data_size))
			goto fail;

		/*
		 * Increase the buffer size to include the original size. If
		 * the meta data (@text_len) is not sane, use the full data
		 * block size.
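		 *
		 * For example (an illustrative sketch): if the existing
		 * record has @text_len == 12 and the writer asked to extend
		 * it by 5 bytes, @text_buf_size becomes 12 + 5 = 17 before
		 * the size checks below are applied.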
		 */
		if (WARN_ON_ONCE(info->text_len > data_size)) {
			pr_warn_once("wrong text_len value (%hu, expecting <=%u)\n",
				     info->text_len, data_size);
			info->text_len = data_size;
		}
		r->text_buf_size += info->text_len;

		if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
			goto fail;

		if (r->text_buf_size > max_size)
			goto fail;

		r->text_buf = data_realloc(rb, r->text_buf_size,
					   &d->text_blk_lpos, id);
	}

	if (r->text_buf_size && !r->text_buf)
		goto fail;

	r->info = info;

	e->text_space = space_used(&rb->text_data_ring, &d->text_blk_lpos);

	return true;
fail:
	prb_commit(e);
	/* prb_commit() re-enabled interrupts. */
fail_reopen:
	/* Make it clear to the caller that the re-reserve failed. */
	memset(r, 0, sizeof(*r));
	return false;
}

/*
 * @last_finalized_seq value guarantees that all records up to and including
 * this sequence number are finalized and can be read. The only exceptions
 * are records so old that they have already been overwritten.
 *
 * It is also guaranteed that @last_finalized_seq only increases.
 *
 * Be aware that finalized records following non-finalized records are not
 * reported because they are not yet available to the reader. For example,
 * a new record stored via printk() will not be available to a printer if
 * it follows a record that has not been finalized yet. However, once that
 * non-finalized record becomes finalized, @last_finalized_seq will be
 * appropriately updated and the full set of finalized records will be
 * available to the printer. And since each printk() caller will either
 * directly print or trigger deferred printing of all available unprinted
 * records, all printk() messages will get printed.
 */
static u64 desc_last_finalized_seq(struct printk_ringbuffer *rb)
{
	struct prb_desc_ring *desc_ring = &rb->desc_ring;
	unsigned long ulseq;

	/*
	 * Guarantee the sequence number is loaded before loading the
	 * associated record in order to guarantee that the record can be
	 * seen by this CPU. This pairs with desc_update_last_finalized:A.
	 */
	ulseq = atomic_long_read_acquire(&desc_ring->last_finalized_seq
					); /* LMM(desc_last_finalized_seq:A) */

	return __ulseq_to_u64seq(rb, ulseq);
}

static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
			    struct printk_record *r, unsigned int *line_count);

/*
 * Check if there are records directly following @last_finalized_seq that are
 * finalized. If so, update @last_finalized_seq to the latest of these
 * records. It is not allowed to skip over records that are not yet finalized.
 */
static void desc_update_last_finalized(struct printk_ringbuffer *rb)
{
	struct prb_desc_ring *desc_ring = &rb->desc_ring;
	u64 old_seq = desc_last_finalized_seq(rb);
	unsigned long oldval;
	unsigned long newval;
	u64 finalized_seq;
	u64 try_seq;

try_again:
	finalized_seq = old_seq;
	try_seq = finalized_seq + 1;

	/* Try to find later finalized records. */
	while (_prb_read_valid(rb, &try_seq, NULL, NULL)) {
		finalized_seq = try_seq;
		try_seq++;
	}

	/* No update needed if no later finalized record was found. */
	if (finalized_seq == old_seq)
		return;

	oldval = __u64seq_to_ulseq(old_seq);
	newval = __u64seq_to_ulseq(finalized_seq);

	/*
	 * Set the sequence number of a later finalized record that has been
	 * seen.
	 *
	 * Guarantee the record data is visible to other CPUs before storing
	 * its sequence number. This pairs with desc_last_finalized_seq:A.
	 *
	 * Memory barrier involvement:
	 *
	 * If desc_last_finalized_seq:A reads from
	 * desc_update_last_finalized:A, then desc_read:A reads from
	 * _prb_commit:B.
* * Relies on: * * RELEASE from _prb_commit:B to desc_update_last_finalized:A * matching * ACQUIRE from desc_last_finalized_seq:A to desc_read:A * * Note: _prb_commit:B and desc_update_last_finalized:A can be * different CPUs. However, the desc_update_last_finalized:A * CPU (which performs the release) must have previously seen * _prb_commit:B. */ if (!atomic_long_try_cmpxchg_release(&desc_ring->last_finalized_seq, &oldval, newval)) { /* LMM(desc_update_last_finalized:A) */ old_seq = __ulseq_to_u64seq(rb, oldval); goto try_again; } } /* * Attempt to finalize a specified descriptor. If this fails, the descriptor * is either already final or it will finalize itself when the writer commits. */ static void desc_make_final(struct printk_ringbuffer *rb, unsigned long id) { struct prb_desc_ring *desc_ring = &rb->desc_ring; unsigned long prev_state_val = DESC_SV(id, desc_committed); struct prb_desc *d = to_desc(desc_ring, id); if (atomic_long_try_cmpxchg_relaxed(&d->state_var, &prev_state_val, DESC_SV(id, desc_finalized))) { /* LMM(desc_make_final:A) */ desc_update_last_finalized(rb); } } /** * prb_reserve() - Reserve space in the ringbuffer. * * @e: The entry structure to setup. * @rb: The ringbuffer to reserve data in. * @r: The record structure to allocate buffers for. * * This is the public function available to writers to reserve data. * * The writer specifies the text size to reserve by setting the * @text_buf_size field of @r. To ensure proper initialization of @r, * prb_rec_init_wr() should be used. * * Context: Any context. Disables local interrupts on success. * Return: true if at least text data could be allocated, otherwise false. * * On success, the fields @info and @text_buf of @r will be set by this * function and should be filled in by the writer before committing. Also * on success, prb_record_text_space() can be used on @e to query the actual * space used for the text data block. * * Important: @info->text_len needs to be set correctly by the writer in * order for data to be readable and/or extended. Its value * is initialized to 0. */ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb, struct printk_record *r) { struct prb_desc_ring *desc_ring = &rb->desc_ring; struct printk_info *info; struct prb_desc *d; unsigned long id; u64 seq; if (!data_check_size(&rb->text_data_ring, r->text_buf_size)) goto fail; /* * Descriptors in the reserved state act as blockers to all further * reservations once the desc_ring has fully wrapped. Disable * interrupts during the reserve/commit window in order to minimize * the likelihood of this happening. */ local_irq_save(e->irqflags); if (!desc_reserve(rb, &id)) { /* Descriptor reservation failures are tracked. */ atomic_long_inc(&rb->fail); local_irq_restore(e->irqflags); goto fail; } d = to_desc(desc_ring, id); info = to_info(desc_ring, id); /* * All @info fields (except @seq) are cleared and must be filled in * by the writer. Save @seq before clearing because it is used to * determine the new sequence number. */ seq = info->seq; memset(info, 0, sizeof(*info)); /* * Set the @e fields here so that prb_commit() can be used if * text data allocation fails. */ e->rb = rb; e->id = id; /* * Initialize the sequence number if it has "never been set". * Otherwise just increment it by a full wrap. * * @seq is considered "never been set" if it has a value of 0, * _except_ for @infos[0], which was specially setup by the ringbuffer * initializer and therefore is always considered as set. 
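	 *
	 * For example (an illustrative sketch with 2^15 descriptors): if
	 * @seq reads 0 for the descriptor at index 5, the new sequence
	 * number becomes 5; if @seq reads 32773, it becomes
	 * 32773 + 32768 = 65541.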
* * See the "Bootstrap" comment block in printk_ringbuffer.h for * details about how the initializer bootstraps the descriptors. */ if (seq == 0 && DESC_INDEX(desc_ring, id) != 0) info->seq = DESC_INDEX(desc_ring, id); else info->seq = seq + DESCS_COUNT(desc_ring); /* * New data is about to be reserved. Once that happens, previous * descriptors are no longer able to be extended. Finalize the * previous descriptor now so that it can be made available to * readers. (For seq==0 there is no previous descriptor.) */ if (info->seq > 0) desc_make_final(rb, DESC_ID(id - 1)); r->text_buf = data_alloc(rb, r->text_buf_size, &d->text_blk_lpos, id); /* If text data allocation fails, a data-less record is committed. */ if (r->text_buf_size && !r->text_buf) { prb_commit(e); /* prb_commit() re-enabled interrupts. */ goto fail; } r->info = info; /* Record full text space used by record. */ e->text_space = space_used(&rb->text_data_ring, &d->text_blk_lpos); return true; fail: /* Make it clear to the caller that the reserve failed. */ memset(r, 0, sizeof(*r)); return false; } /* Commit the data (possibly finalizing it) and restore interrupts. */ static void _prb_commit(struct prb_reserved_entry *e, unsigned long state_val) { struct prb_desc_ring *desc_ring = &e->rb->desc_ring; struct prb_desc *d = to_desc(desc_ring, e->id); unsigned long prev_state_val = DESC_SV(e->id, desc_reserved); /* Now the writer has finished all writing: LMM(_prb_commit:A) */ /* * Set the descriptor as committed. See "ABA Issues" about why * cmpxchg() instead of set() is used. * * 1 Guarantee all record data is stored before the descriptor state * is stored as committed. A write memory barrier is sufficient * for this. This pairs with desc_read:B and desc_reopen_last:A. * * 2. Guarantee the descriptor state is stored as committed before * re-checking the head ID in order to possibly finalize this * descriptor. This pairs with desc_reserve:D. * * Memory barrier involvement: * * If prb_commit:A reads from desc_reserve:D, then * desc_make_final:A reads from _prb_commit:B. * * Relies on: * * MB _prb_commit:B to prb_commit:A * matching * MB desc_reserve:D to desc_make_final:A */ if (!atomic_long_try_cmpxchg(&d->state_var, &prev_state_val, DESC_SV(e->id, state_val))) { /* LMM(_prb_commit:B) */ WARN_ON_ONCE(1); } /* Restore interrupts, the reserve/commit window is finished. */ local_irq_restore(e->irqflags); } /** * prb_commit() - Commit (previously reserved) data to the ringbuffer. * * @e: The entry containing the reserved data information. * * This is the public function available to writers to commit data. * * Note that the data is not yet available to readers until it is finalized. * Finalizing happens automatically when space for the next record is * reserved. * * See prb_final_commit() for a version of this function that finalizes * immediately. * * Context: Any context. Enables local interrupts. */ void prb_commit(struct prb_reserved_entry *e) { struct prb_desc_ring *desc_ring = &e->rb->desc_ring; unsigned long head_id; _prb_commit(e, desc_committed); /* * If this descriptor is no longer the head (i.e. a new record has * been allocated), extending the data for this record is no longer * allowed and therefore it must be finalized. */ head_id = atomic_long_read(&desc_ring->head_id); /* LMM(prb_commit:A) */ if (head_id != e->id) desc_make_final(e->rb, e->id); } /** * prb_final_commit() - Commit and finalize (previously reserved) data to * the ringbuffer. * * @e: The entry containing the reserved data information. 
* * This is the public function available to writers to commit+finalize data. * * By finalizing, the data is made immediately available to readers. * * This function should only be used if there are no intentions of extending * this data using prb_reserve_in_last(). * * Context: Any context. Enables local interrupts. */ void prb_final_commit(struct prb_reserved_entry *e) { _prb_commit(e, desc_finalized); desc_update_last_finalized(e->rb); } /* * Count the number of lines in provided text. All text has at least 1 line * (even if @text_size is 0). Each '\n' processed is counted as an additional * line. */ static unsigned int count_lines(const char *text, unsigned int text_size) { unsigned int next_size = text_size; unsigned int line_count = 1; const char *next = text; while (next_size) { next = memchr(next, '\n', next_size); if (!next) break; line_count++; next++; next_size = text_size - (next - text); } return line_count; } /* * Given @blk_lpos, copy an expected @len of data into the provided buffer. * If @line_count is provided, count the number of lines in the data. * * This function (used by readers) performs strict validation on the data * size to possibly detect bugs in the writer code. A WARN_ON_ONCE() is * triggered if an internal error is detected. */ static bool copy_data(struct prb_data_ring *data_ring, struct prb_data_blk_lpos *blk_lpos, u16 len, char *buf, unsigned int buf_size, unsigned int *line_count) { unsigned int data_size; const char *data; /* Caller might not want any data. */ if ((!buf || !buf_size) && !line_count) return true; data = get_data(data_ring, blk_lpos, &data_size); if (!data) return false; /* * Actual cannot be less than expected. It can be more than expected * because of the trailing alignment padding. * * Note that invalid @len values can occur because the caller loads * the value during an allowed data race. */ if (data_size < (unsigned int)len) return false; /* Caller interested in the line count? */ if (line_count) *line_count = count_lines(data, len); /* Caller interested in the data content? */ if (!buf || !buf_size) return true; data_size = min_t(unsigned int, buf_size, len); memcpy(&buf[0], data, data_size); /* LMM(copy_data:A) */ return true; } /* * This is an extended version of desc_read(). It gets a copy of a specified * descriptor. However, it also verifies that the record is finalized and has * the sequence number @seq. On success, 0 is returned. * * Error return values: * -EINVAL: A finalized record with sequence number @seq does not exist. * -ENOENT: A finalized record with sequence number @seq exists, but its data * is not available. This is a valid record, so readers should * continue with the next record. */ static int desc_read_finalized_seq(struct prb_desc_ring *desc_ring, unsigned long id, u64 seq, struct prb_desc *desc_out) { struct prb_data_blk_lpos *blk_lpos = &desc_out->text_blk_lpos; enum desc_state d_state; u64 s; d_state = desc_read(desc_ring, id, desc_out, &s, NULL); /* * An unexpected @id (desc_miss) or @seq mismatch means the record * does not exist. A descriptor in the reserved or committed state * means the record does not yet exist for the reader. */ if (d_state == desc_miss || d_state == desc_reserved || d_state == desc_committed || s != seq) { return -EINVAL; } /* * A descriptor in the reusable state may no longer have its data * available; report it as existing but with lost data. Or the record * may actually be a record with lost data. 
 */
	if (d_state == desc_reusable ||
	    (blk_lpos->begin == FAILED_LPOS && blk_lpos->next == FAILED_LPOS)) {
		return -ENOENT;
	}

	return 0;
}

/*
 * Copy the ringbuffer data from the record with @seq to the provided
 * @r buffer. On success, 0 is returned.
 *
 * See desc_read_finalized_seq() for error return values.
 */
static int prb_read(struct printk_ringbuffer *rb, u64 seq,
		    struct printk_record *r, unsigned int *line_count)
{
	struct prb_desc_ring *desc_ring = &rb->desc_ring;
	struct printk_info *info = to_info(desc_ring, seq);
	struct prb_desc *rdesc = to_desc(desc_ring, seq);
	atomic_long_t *state_var = &rdesc->state_var;
	struct prb_desc desc;
	unsigned long id;
	int err;

	/* Extract the ID, used to specify the descriptor to read. */
	id = DESC_ID(atomic_long_read(state_var));

	/* Get a local copy of the correct descriptor (if available). */
	err = desc_read_finalized_seq(desc_ring, id, seq, &desc);

	/*
	 * If @r is NULL, the caller is only interested in the availability
	 * of the record.
	 */
	if (err || !r)
		return err;

	/* If requested, copy meta data. */
	if (r->info)
		memcpy(r->info, info, sizeof(*(r->info)));

	/* Copy text data. If it fails, this is a data-less record. */
	if (!copy_data(&rb->text_data_ring, &desc.text_blk_lpos, info->text_len,
		       r->text_buf, r->text_buf_size, line_count)) {
		return -ENOENT;
	}

	/* Ensure the record is still finalized and has the same @seq. */
	return desc_read_finalized_seq(desc_ring, id, seq, &desc);
}

/* Get the sequence number of the tail descriptor. */
u64 prb_first_seq(struct printk_ringbuffer *rb)
{
	struct prb_desc_ring *desc_ring = &rb->desc_ring;
	enum desc_state d_state;
	struct prb_desc desc;
	unsigned long id;
	u64 seq;

	for (;;) {
		id = atomic_long_read(&rb->desc_ring.tail_id); /* LMM(prb_first_seq:A) */

		d_state = desc_read(desc_ring, id, &desc, &seq, NULL); /* LMM(prb_first_seq:B) */

		/*
		 * This loop will not be infinite because the tail is
		 * _always_ in the finalized or reusable state.
		 */
		if (d_state == desc_finalized || d_state == desc_reusable)
			break;

		/*
		 * Guarantee the last state load from desc_read() is before
		 * reloading @tail_id in order to see a new tail in the case
		 * that the descriptor has been recycled. This pairs with
		 * desc_reserve:D.
		 *
		 * Memory barrier involvement:
		 *
		 * If prb_first_seq:B reads from desc_reserve:F, then
		 * prb_first_seq:A reads from desc_push_tail:B.
		 *
		 * Relies on:
		 *
		 * MB from desc_push_tail:B to desc_reserve:F
		 *    matching
		 * RMB prb_first_seq:B to prb_first_seq:A
		 */
		smp_rmb(); /* LMM(prb_first_seq:C) */
	}

	return seq;
}

/**
 * prb_next_reserve_seq() - Get the sequence number after the most recently
 *                          reserved record.
 *
 * @rb: The ringbuffer to get the sequence number from.
 *
 * This is the public function available to readers to see what sequence
 * number will be assigned to the next reserved record.
 *
 * Note that depending on the situation, this value can be equal to or
 * higher than the sequence number returned by prb_next_seq().
 *
 * Context: Any context.
 * Return: The sequence number that will be assigned to the next record
 *         reserved.
 */
u64 prb_next_reserve_seq(struct printk_ringbuffer *rb)
{
	struct prb_desc_ring *desc_ring = &rb->desc_ring;
	unsigned long last_finalized_id;
	atomic_long_t *state_var;
	u64 last_finalized_seq;
	unsigned long head_id;
	struct prb_desc desc;
	unsigned long diff;
	struct prb_desc *d;
	int err;

	/*
	 * It may not be possible to read a sequence number for @head_id.
	 * So the ID of @last_finalized_seq is used to calculate what the
	 * sequence number of @head_id will be.
*/ try_again: last_finalized_seq = desc_last_finalized_seq(rb); /* * @head_id is loaded after @last_finalized_seq to ensure that * it points to the record with @last_finalized_seq or newer. * * Memory barrier involvement: * * If desc_last_finalized_seq:A reads from * desc_update_last_finalized:A, then * prb_next_reserve_seq:A reads from desc_reserve:D. * * Relies on: * * RELEASE from desc_reserve:D to desc_update_last_finalized:A * matching * ACQUIRE from desc_last_finalized_seq:A to prb_next_reserve_seq:A * * Note: desc_reserve:D and desc_update_last_finalized:A can be * different CPUs. However, the desc_update_last_finalized:A CPU * (which performs the release) must have previously seen * desc_read:C, which implies desc_reserve:D can be seen. */ head_id = atomic_long_read(&desc_ring->head_id); /* LMM(prb_next_reserve_seq:A) */ d = to_desc(desc_ring, last_finalized_seq); state_var = &d->state_var; /* Extract the ID, used to specify the descriptor to read. */ last_finalized_id = DESC_ID(atomic_long_read(state_var)); /* Ensure @last_finalized_id is correct. */ err = desc_read_finalized_seq(desc_ring, last_finalized_id, last_finalized_seq, &desc); if (err == -EINVAL) { if (last_finalized_seq == 0) { /* * No record has been finalized or even reserved yet. * * The @head_id is initialized such that the first * increment will yield the first record (seq=0). * Handle it separately to avoid a negative @diff * below. */ if (head_id == DESC0_ID(desc_ring->count_bits)) return 0; /* * One or more descriptors are already reserved. Use * the descriptor ID of the first one (@seq=0) for * the @diff below. */ last_finalized_id = DESC0_ID(desc_ring->count_bits) + 1; } else { /* Record must have been overwritten. Try again. */ goto try_again; } } /* Diff of known descriptor IDs to compute related sequence numbers. */ diff = head_id - last_finalized_id; /* * @head_id points to the most recently reserved record, but this * function returns the sequence number that will be assigned to the * next (not yet reserved) record. Thus +1 is needed. */ return (last_finalized_seq + diff + 1); } /* * Non-blocking read of a record. * * On success @seq is updated to the record that was read and (if provided) * @r and @line_count will contain the read/calculated data. * * On failure @seq is updated to a record that is not yet available to the * reader, but it will be the next record available to the reader. * * Note: When the current CPU is in panic, this function will skip over any * non-existent/non-finalized records in order to allow the panic CPU * to print any and all records that have been finalized. */ static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq, struct printk_record *r, unsigned int *line_count) { u64 tail_seq; int err; while ((err = prb_read(rb, *seq, r, line_count))) { tail_seq = prb_first_seq(rb); if (*seq < tail_seq) { /* * Behind the tail. Catch up and try again. This * can happen for -ENOENT and -EINVAL cases. */ *seq = tail_seq; } else if (err == -ENOENT) { /* Record exists, but the data was lost. Skip. */ (*seq)++; } else { /* * Non-existent/non-finalized record. Must stop. * * For panic situations it cannot be expected that * non-finalized records will become finalized. But * there may be other finalized records beyond that * need to be printed for a panic situation. If this * is the panic CPU, skip this * non-existent/non-finalized record unless non-panic * CPUs are still running and their debugging is * explicitly enabled. 
* * Note that new messages printed on panic CPU are * finalized when we are here. The only exception * might be the last message without trailing newline. * But it would have the sequence number returned * by "prb_next_reserve_seq() - 1". */ if (this_cpu_in_panic() && (!debug_non_panic_cpus || legacy_allow_panic_sync) && ((*seq + 1) < prb_next_reserve_seq(rb))) { (*seq)++; } else { return false; } } } return true; } /** * prb_read_valid() - Non-blocking read of a requested record or (if gone) * the next available record. * * @rb: The ringbuffer to read from. * @seq: The sequence number of the record to read. * @r: A record data buffer to store the read record to. * * This is the public function available to readers to read a record. * * The reader provides the @info and @text_buf buffers of @r to be * filled in. Any of the buffer pointers can be set to NULL if the reader * is not interested in that data. To ensure proper initialization of @r, * prb_rec_init_rd() should be used. * * Context: Any context. * Return: true if a record was read, otherwise false. * * On success, the reader must check r->info.seq to see which record was * actually read. This allows the reader to detect dropped records. * * Failure means @seq refers to a record not yet available to the reader. */ bool prb_read_valid(struct printk_ringbuffer *rb, u64 seq, struct printk_record *r) { return _prb_read_valid(rb, &seq, r, NULL); } /** * prb_read_valid_info() - Non-blocking read of meta data for a requested * record or (if gone) the next available record. * * @rb: The ringbuffer to read from. * @seq: The sequence number of the record to read. * @info: A buffer to store the read record meta data to. * @line_count: A buffer to store the number of lines in the record text. * * This is the public function available to readers to read only the * meta data of a record. * * The reader provides the @info, @line_count buffers to be filled in. * Either of the buffer pointers can be set to NULL if the reader is not * interested in that data. * * Context: Any context. * Return: true if a record's meta data was read, otherwise false. * * On success, the reader must check info->seq to see which record meta data * was actually read. This allows the reader to detect dropped records. * * Failure means @seq refers to a record not yet available to the reader. */ bool prb_read_valid_info(struct printk_ringbuffer *rb, u64 seq, struct printk_info *info, unsigned int *line_count) { struct printk_record r; prb_rec_init_rd(&r, info, NULL, 0); return _prb_read_valid(rb, &seq, &r, line_count); } /** * prb_first_valid_seq() - Get the sequence number of the oldest available * record. * * @rb: The ringbuffer to get the sequence number from. * * This is the public function available to readers to see what the * first/oldest valid sequence number is. * * This provides readers a starting point to begin iterating the ringbuffer. * * Context: Any context. * Return: The sequence number of the first/oldest record or, if the * ringbuffer is empty, 0 is returned. */ u64 prb_first_valid_seq(struct printk_ringbuffer *rb) { u64 seq = 0; if (!_prb_read_valid(rb, &seq, NULL, NULL)) return 0; return seq; } /** * prb_next_seq() - Get the sequence number after the last available record. * * @rb: The ringbuffer to get the sequence number from. * * This is the public function available to readers to see what the next * newest sequence number available to readers will be. 
 *
 * This provides readers a sequence number to jump to if all currently
 * available records should be skipped. It is guaranteed that all records
 * previous to the returned value have been finalized and are (or were)
 * available to the reader.
 *
 * Context: Any context.
 * Return: The sequence number of the next newest (not yet available) record
 *         for readers.
 */
u64 prb_next_seq(struct printk_ringbuffer *rb)
{
	u64 seq;

	seq = desc_last_finalized_seq(rb);

	/*
	 * Begin searching after the last finalized record.
	 *
	 * On 0, the search must begin at 0 because, due to hack#2 of the
	 * bootstrapping phase, it is not known if a record at index 0
	 * exists.
	 */
	if (seq != 0)
		seq++;

	/*
	 * The information about the last finalized @seq might be inaccurate.
	 * Search forward to find the current one.
	 */
	while (_prb_read_valid(rb, &seq, NULL, NULL))
		seq++;

	return seq;
}

/**
 * prb_init() - Initialize a ringbuffer to use provided external buffers.
 *
 * @rb:       The ringbuffer to initialize.
 * @text_buf: The data buffer for text data.
 * @textbits: The size of @text_buf as a power-of-2 value.
 * @descs:    The descriptor buffer for ringbuffer records.
 * @descbits: The count of @descs items as a power-of-2 value.
 * @infos:    The printk_info buffer for ringbuffer records.
 *
 * This is the public function available to writers to setup a ringbuffer
 * during runtime using provided buffers.
 *
 * This must match the initialization of DEFINE_PRINTKRB().
 *
 * Context: Any context.
 */
void prb_init(struct printk_ringbuffer *rb,
	      char *text_buf, unsigned int textbits,
	      struct prb_desc *descs, unsigned int descbits,
	      struct printk_info *infos)
{
	memset(descs, 0, _DESCS_COUNT(descbits) * sizeof(descs[0]));
	memset(infos, 0, _DESCS_COUNT(descbits) * sizeof(infos[0]));

	rb->desc_ring.count_bits = descbits;
	rb->desc_ring.descs = descs;
	rb->desc_ring.infos = infos;
	atomic_long_set(&rb->desc_ring.head_id, DESC0_ID(descbits));
	atomic_long_set(&rb->desc_ring.tail_id, DESC0_ID(descbits));
	atomic_long_set(&rb->desc_ring.last_finalized_seq, 0);

	rb->text_data_ring.size_bits = textbits;
	rb->text_data_ring.data = text_buf;
	atomic_long_set(&rb->text_data_ring.head_lpos, BLK0_LPOS(textbits));
	atomic_long_set(&rb->text_data_ring.tail_lpos, BLK0_LPOS(textbits));

	atomic_long_set(&rb->fail, 0);

	atomic_long_set(&(descs[_DESCS_COUNT(descbits) - 1].state_var), DESC0_SV(descbits));
	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.begin = FAILED_LPOS;
	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.next = FAILED_LPOS;

	infos[0].seq = -(u64)_DESCS_COUNT(descbits);
	infos[_DESCS_COUNT(descbits) - 1].seq = 0;
}

/**
 * prb_record_text_space() - Query the full actual used ringbuffer space for
 *                           the text data of a reserved entry.
 *
 * @e: The successfully reserved entry to query.
 *
 * This is the public function available to writers to see how much actual
 * space is used in the ringbuffer to store the text data of the specified
 * entry.
 *
 * This function is only valid if @e has been successfully reserved using
 * prb_reserve().
 *
 * Context: Any context.
 * Return: The size in bytes used by the text data of the associated record.
 */
unsigned int prb_record_text_space(struct prb_reserved_entry *e)
{
	return e->text_space;
}
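/*
 * A minimal usage sketch for the writer/reader API above, modelled on
 * the usage sample in printk_ringbuffer.h. The ringbuffer "test_rb",
 * its size parameters, and the example_*() function names are
 * illustrative assumptions; only prb_rec_init_wr(), prb_reserve(),
 * prb_final_commit(), prb_rec_init_rd() and prb_read_valid() come from
 * the API documented above. A sketch, not part of this file.
 */
DEFINE_PRINTKRB(test_rb, 7, 5);

static void example_write(const char *textstr)
{
	struct prb_reserved_entry e;
	struct printk_record r;

	/* Specify how much text buffer space to reserve. */
	prb_rec_init_wr(&r, strlen(textstr) + 1);

	if (prb_reserve(&e, &test_rb, &r)) {
		snprintf(r.text_buf, r.text_buf_size, "%s", textstr);
		/* @text_len must be set for the record to be readable. */
		r.info->text_len = strlen(textstr);
		/* Finalize immediately so readers can see the record. */
		prb_final_commit(&e);
	}
}

static void example_read_all(void)
{
	struct printk_info info;
	struct printk_record r;
	char text[128];
	u64 seq = 0;

	prb_rec_init_rd(&r, &info, text, sizeof(text));

	/*
	 * The record actually read is info.seq, which may be newer than
	 * the requested @seq if old records have been overwritten.
	 */
	while (prb_read_valid(&test_rb, seq, &r)) {
		pr_info("%llu: %.*s\n", info.seq,
			(int)min_t(unsigned int, info.text_len, sizeof(text)),
			text);
		seq = info.seq + 1;
	}
}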
// SPDX-License-Identifier: GPL-2.0+
/*
 * Copyright (C) 2016 Oracle.  All Rights Reserved.
 * Author: Darrick J. Wong <darrick.wong@oracle.com>
 */
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_alloc.h"
#include "xfs_errortag.h"
#include "xfs_error.h"
#include "xfs_trace.h"
#include "xfs_trans.h"
#include "xfs_rmap_btree.h"
#include "xfs_btree.h"
#include "xfs_refcount_btree.h"
#include "xfs_ialloc_btree.h"
#include "xfs_ag.h"
#include "xfs_ag_resv.h"

/*
 * Per-AG Block Reservations
 *
 * For some kinds of allocation group metadata structures, it is advantageous
 * to reserve a small number of blocks in each AG so that future expansions of
 * that data structure do not encounter ENOSPC because errors during a btree
 * split cause the filesystem to go offline.
 *
 * Prior to the introduction of reflink, this wasn't an issue because the free
 * space btrees maintain a reserve of space (the AGFL) to handle any expansion
 * that may be necessary; and allocations of other metadata (inodes, BMBT,
 * dir/attr) aren't restricted to a single AG.  However, with reflink it is
 * possible to allocate all the space in an AG, have subsequent reflink/CoW
 * activity expand the refcount btree, and discover that there's no space left
 * to handle that expansion.  Since we can calculate the maximum size of the
 * refcount btree, we can reserve space for it and avoid ENOSPC.
 *
 * Handling per-AG reservations consists of four changes to the allocator's
 * behavior:  First, because these reservations are always needed, we decrease
 * the ag_max_usable counter to reflect the size of the AG after the reserved
 * blocks are taken.  Second, the reservations must be reflected in the
 * fdblocks count to maintain proper accounting.
Third, each AG must maintain * its own reserved block counter so that we can calculate the amount of space * that must remain free to maintain the reservations. Fourth, the "remaining * reserved blocks" count must be used when calculating the length of the * longest free extent in an AG and to clamp maxlen in the per-AG allocation * functions. In other words, we maintain a virtual allocation via in-core * accounting tricks so that we don't have to clean up after a crash. :) * * Reserved blocks can be managed by passing one of the enum xfs_ag_resv_type * values via struct xfs_alloc_arg or directly to the xfs_free_extent * function. It might seem a little funny to maintain a reservoir of blocks * to feed another reservoir, but the AGFL only holds enough blocks to get * through the next transaction. The per-AG reservation is to ensure (we * hope) that each AG never runs out of blocks. Each data structure wanting * to use the reservation system should update ask/used in xfs_ag_resv_init. */ /* * Are we critically low on blocks? For now we'll define that as the number * of blocks we can get our hands on being less than 10% of what we reserved * or less than some arbitrary number (maximum btree height). */ bool xfs_ag_resv_critical( struct xfs_perag *pag, enum xfs_ag_resv_type type) { struct xfs_mount *mp = pag_mount(pag); xfs_extlen_t avail; xfs_extlen_t orig; switch (type) { case XFS_AG_RESV_METADATA: avail = pag->pagf_freeblks - pag->pag_rmapbt_resv.ar_reserved; orig = pag->pag_meta_resv.ar_asked; break; case XFS_AG_RESV_RMAPBT: avail = pag->pagf_freeblks + pag->pagf_flcount - pag->pag_meta_resv.ar_reserved; orig = pag->pag_rmapbt_resv.ar_asked; break; default: ASSERT(0); return false; } trace_xfs_ag_resv_critical(pag, type, avail); /* Critically low if less than 10% or max btree height remains. */ return XFS_TEST_ERROR(avail < orig / 10 || avail < mp->m_agbtree_maxlevels, mp, XFS_ERRTAG_AG_RESV_CRITICAL); } /* * How many blocks are reserved but not used, and therefore must not be * allocated away? */ xfs_extlen_t xfs_ag_resv_needed( struct xfs_perag *pag, enum xfs_ag_resv_type type) { xfs_extlen_t len; len = pag->pag_meta_resv.ar_reserved + pag->pag_rmapbt_resv.ar_reserved; switch (type) { case XFS_AG_RESV_METADATA: case XFS_AG_RESV_RMAPBT: len -= xfs_perag_resv(pag, type)->ar_reserved; break; case XFS_AG_RESV_METAFILE: case XFS_AG_RESV_NONE: /* empty */ break; default: ASSERT(0); } trace_xfs_ag_resv_needed(pag, type, len); return len; } /* Clean out a reservation */ static void __xfs_ag_resv_free( struct xfs_perag *pag, enum xfs_ag_resv_type type) { struct xfs_ag_resv *resv; xfs_extlen_t oldresv; trace_xfs_ag_resv_free(pag, type, 0); resv = xfs_perag_resv(pag, type); if (pag_agno(pag) == 0) pag_mount(pag)->m_ag_max_usable += resv->ar_asked; /* * RMAPBT blocks come from the AGFL and AGFL blocks are always * considered "free", so whatever was reserved at mount time must be * given back at umount. */ if (type == XFS_AG_RESV_RMAPBT) oldresv = resv->ar_orig_reserved; else oldresv = resv->ar_reserved; xfs_add_fdblocks(pag_mount(pag), oldresv); resv->ar_reserved = 0; resv->ar_asked = 0; resv->ar_orig_reserved = 0; } /* Free a per-AG reservation. 
*/ void xfs_ag_resv_free( struct xfs_perag *pag) { __xfs_ag_resv_free(pag, XFS_AG_RESV_RMAPBT); __xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA); } static int __xfs_ag_resv_init( struct xfs_perag *pag, enum xfs_ag_resv_type type, xfs_extlen_t ask, xfs_extlen_t used) { struct xfs_mount *mp = pag_mount(pag); struct xfs_ag_resv *resv; int error; xfs_extlen_t hidden_space; if (used > ask) ask = used; switch (type) { case XFS_AG_RESV_RMAPBT: /* * Space taken by the rmapbt is not subtracted from fdblocks * because the rmapbt lives in the free space. Here we must * subtract the entire reservation from fdblocks so that we * always have blocks available for rmapbt expansion. */ hidden_space = ask; break; case XFS_AG_RESV_METADATA: /* * Space taken by all other metadata btrees are accounted * on-disk as used space. We therefore only hide the space * that is reserved but not used by the trees. */ hidden_space = ask - used; break; default: ASSERT(0); return -EINVAL; } if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_AG_RESV_FAIL)) error = -ENOSPC; else error = xfs_dec_fdblocks(mp, hidden_space, true); if (error) { trace_xfs_ag_resv_init_error(pag, error, _RET_IP_); xfs_warn(mp, "Per-AG reservation for AG %u failed. Filesystem may run out of space.", pag_agno(pag)); return error; } /* * Reduce the maximum per-AG allocation length by however much we're * trying to reserve for an AG. Since this is a filesystem-wide * counter, we only make the adjustment for AG 0. This assumes that * there aren't any AGs hungrier for per-AG reservation than AG 0. */ if (pag_agno(pag) == 0) mp->m_ag_max_usable -= ask; resv = xfs_perag_resv(pag, type); resv->ar_asked = ask; resv->ar_orig_reserved = hidden_space; resv->ar_reserved = ask - used; trace_xfs_ag_resv_init(pag, type, ask); return 0; } /* Create a per-AG block reservation. */ int xfs_ag_resv_init( struct xfs_perag *pag, struct xfs_trans *tp) { struct xfs_mount *mp = pag_mount(pag); xfs_extlen_t ask; xfs_extlen_t used; int error = 0, error2; bool has_resv = false; /* Create the metadata reservation. */ if (pag->pag_meta_resv.ar_asked == 0) { ask = used = 0; error = xfs_refcountbt_calc_reserves(mp, tp, pag, &ask, &used); if (error) goto out; error = xfs_finobt_calc_reserves(pag, tp, &ask, &used); if (error) goto out; error = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA, ask, used); if (error) { /* * Because we didn't have per-AG reservations when the * finobt feature was added we might not be able to * reserve all needed blocks. Warn and fall back to the * old and potentially buggy code in that case, but * ensure we do have the reservation for the refcountbt. */ ask = used = 0; mp->m_finobt_nores = true; error = xfs_refcountbt_calc_reserves(mp, tp, pag, &ask, &used); if (error) goto out; error = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA, ask, used); if (error) goto out; } if (ask) has_resv = true; } /* Create the RMAPBT metadata reservation */ if (pag->pag_rmapbt_resv.ar_asked == 0) { ask = used = 0; error = xfs_rmapbt_calc_reserves(mp, tp, pag, &ask, &used); if (error) goto out; error = __xfs_ag_resv_init(pag, XFS_AG_RESV_RMAPBT, ask, used); if (error) goto out; if (ask) has_resv = true; } out: /* * Initialize the pagf if we have at least one active reservation on the * AG. This may have occurred already via reservation calculation, but * fall back to an explicit init to ensure the in-core allocbt usage * counters are initialized as soon as possible. 
This is important
	 * because filesystems with large perag reservations are susceptible to
	 * free space reservation problems that the allocbt counter is used to
	 * address.
	 */
	if (has_resv) {
		error2 = xfs_alloc_read_agf(pag, tp, 0, NULL);
		if (error2)
			return error2;

		/*
		 * If there isn't enough space in the AG to satisfy the
		 * reservation, let the caller know that there wasn't enough
		 * space.  Callers are responsible for deciding what to do
		 * next, since (in theory) we can stumble along with
		 * insufficient reservation if data blocks are being freed to
		 * replenish the AG's free space.
		 */
		if (!error &&
		    xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved +
		    xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved >
		    pag->pagf_freeblks + pag->pagf_flcount)
			error = -ENOSPC;
	}

	return error;
}

/* Allocate a block from the reservation. */
void
xfs_ag_resv_alloc_extent(
	struct xfs_perag		*pag,
	enum xfs_ag_resv_type		type,
	struct xfs_alloc_arg		*args)
{
	struct xfs_ag_resv		*resv;
	xfs_extlen_t			len;
	uint				field;

	trace_xfs_ag_resv_alloc_extent(pag, type, args->len);

	switch (type) {
	case XFS_AG_RESV_AGFL:
	case XFS_AG_RESV_METAFILE:
		return;
	case XFS_AG_RESV_METADATA:
	case XFS_AG_RESV_RMAPBT:
		resv = xfs_perag_resv(pag, type);
		break;
	default:
		ASSERT(0);
		fallthrough;
	case XFS_AG_RESV_NONE:
		field = args->wasdel ? XFS_TRANS_SB_RES_FDBLOCKS :
				       XFS_TRANS_SB_FDBLOCKS;
		xfs_trans_mod_sb(args->tp, field, -(int64_t)args->len);
		return;
	}

	len = min_t(xfs_extlen_t, args->len, resv->ar_reserved);
	resv->ar_reserved -= len;
	if (type == XFS_AG_RESV_RMAPBT)
		return;
	/* Allocations of reserved blocks only need on-disk sb updates... */
	xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_RES_FDBLOCKS, -(int64_t)len);
	/* ...but non-reserved blocks need in-core and on-disk updates. */
	if (args->len > len)
		xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_FDBLOCKS,
				 -((int64_t)args->len - len));
}

/* Free a block to the reservation. */
void
xfs_ag_resv_free_extent(
	struct xfs_perag		*pag,
	enum xfs_ag_resv_type		type,
	struct xfs_trans		*tp,
	xfs_extlen_t			len)
{
	xfs_extlen_t			leftover;
	struct xfs_ag_resv		*resv;

	trace_xfs_ag_resv_free_extent(pag, type, len);

	switch (type) {
	case XFS_AG_RESV_AGFL:
	case XFS_AG_RESV_METAFILE:
		return;
	case XFS_AG_RESV_METADATA:
	case XFS_AG_RESV_RMAPBT:
		resv = xfs_perag_resv(pag, type);
		break;
	default:
		ASSERT(0);
		fallthrough;
	case XFS_AG_RESV_NONE:
		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, (int64_t)len);
		fallthrough;
	case XFS_AG_RESV_IGNORE:
		return;
	}

	leftover = min_t(xfs_extlen_t, len, resv->ar_asked - resv->ar_reserved);
	resv->ar_reserved += leftover;
	if (type == XFS_AG_RESV_RMAPBT)
		return;
	/* Freeing into the reserved pool only requires on-disk update... */
	xfs_trans_mod_sb(tp, XFS_TRANS_SB_RES_FDBLOCKS, len);
	/* ...but freeing beyond that requires in-core and on-disk update. */
	if (len > leftover)
		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len - leftover);
}
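/*
 * A back-of-the-envelope model of the ask/used arithmetic in
 * __xfs_ag_resv_init() above, as standalone userspace C (not kernel
 * code). The block counts are invented; only the formulas mirror the
 * switch statement in that function.
 */
#include <stdio.h>

int main(void)
{
	/* Suppose a btree could grow to 100 blocks, 40 of which exist. */
	unsigned int ask = 100, used = 40;

	/*
	 * XFS_AG_RESV_METADATA: the 40 in-use blocks are already counted
	 * as allocated on disk, so only the unused remainder is hidden
	 * from fdblocks.
	 */
	unsigned int meta_hidden = ask - used;		/* 60 blocks */

	/*
	 * XFS_AG_RESV_RMAPBT: rmapbt blocks live in the free space, so
	 * the entire reservation must be hidden from fdblocks.
	 */
	unsigned int rmapbt_hidden = ask;		/* 100 blocks */

	/* Either way, the unused portion remains reserved (ar_reserved). */
	unsigned int reserved = ask - used;		/* 60 blocks */

	printf("hidden: meta=%u rmapbt=%u, ar_reserved=%u\n",
	       meta_hidden, rmapbt_hidden, reserved);
	return 0;
}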
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2000-2001 Silicon Graphics, Inc.  All Rights Reserved.
 */
#ifndef __XFS_ITABLE_H__
#define	__XFS_ITABLE_H__

/* In-memory representation of a userspace request for batch inode data. */
struct xfs_ibulk {
	struct xfs_mount	*mp;
	struct mnt_idmap	*idmap;
	void __user		*ubuffer; /* user output buffer */
	xfs_ino_t		startino; /* start with this inode */
	unsigned int		icount;   /* number of elements in ubuffer */
	unsigned int		ocount;   /* number of records returned */
	unsigned int		flags;    /* see XFS_IBULK_FLAG_* */
};

/* Only iterate within the same AG as startino */
#define XFS_IBULK_SAME_AG	(1U << 0)

/* Fill out the bs_extents64 field if set. */
#define XFS_IBULK_NREXT64	(1U << 1)

/* Signal that we can return metadata directories. */
#define XFS_IBULK_METADIR	(1U << 2)

/*
 * Advance the user buffer pointer by one record of the given size.  If the
 * buffer is now full, return the appropriate error code.
 */
static inline int
xfs_ibulk_advance(
	struct xfs_ibulk	*breq,
	size_t			bytes)
{
	char __user *b = breq->ubuffer;

	breq->ubuffer = b + bytes;
	breq->ocount++;
	return breq->ocount == breq->icount ? -ECANCELED : 0;
}

/*
 * Return stat information in bulk (by-inode) for the filesystem.
 */

/*
 * Return codes for the formatter function are 0 to continue iterating, and
 * non-zero to stop iterating.  Any non-zero value will be passed up to the
 * bulkstat/inumbers caller.  The special value -ECANCELED can be used to stop
 * iteration, as neither bulkstat nor inumbers will ever generate that error
 * code on their own.
 */
typedef int (*bulkstat_one_fmt_pf)(struct xfs_ibulk *breq,
		const struct xfs_bulkstat *bstat);

int xfs_bulkstat_one(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
int xfs_bulkstat(struct xfs_ibulk *breq, bulkstat_one_fmt_pf formatter);
void xfs_bulkstat_to_bstat(struct xfs_mount *mp, struct xfs_bstat *bs1,
		const struct xfs_bulkstat *bstat);

typedef int (*inumbers_fmt_pf)(struct xfs_ibulk *breq,
		const struct xfs_inumbers *igrp);

int xfs_inumbers(struct xfs_ibulk *breq, inumbers_fmt_pf formatter);
void xfs_inumbers_to_inogrp(struct xfs_inogrp *ig1,
		const struct xfs_inumbers *ig);

#endif	/* __XFS_ITABLE_H__ */
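/*
 * A sketch of a bulkstat_one_fmt_pf formatter for the interface above,
 * modelled on the formatter the XFS ioctl layer supplies
 * (xfs_bulkstat_fmt() in xfs_ioctl.c); the example_ name is
 * illustrative. When the user buffer fills, xfs_ibulk_advance()
 * returns -ECANCELED, which stops iteration without being mistaken
 * for a real error, per the contract documented above.
 */
static int
example_bulkstat_fmt(
	struct xfs_ibulk		*breq,
	const struct xfs_bulkstat	*bstat)
{
	if (copy_to_user(breq->ubuffer, bstat, sizeof(*bstat)))
		return -EFAULT;
	return xfs_ibulk_advance(breq, sizeof(*bstat));
}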
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * User-space Probes (UProbes) for x86
 *
 * Copyright (C) IBM Corporation, 2008-2011
 * Authors:
 *	Srikar Dronamraju
 *	Jim Keniston
 */
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/ptrace.h>
#include <linux/uprobes.h>
#include <linux/uaccess.h>
#include <linux/syscalls.h>

#include <linux/kdebug.h>
#include <asm/processor.h>
#include <asm/insn.h>
#include <asm/mmu_context.h>

/* Post-execution fixups. */

/* Adjust IP back to vicinity of actual insn */
#define UPROBE_FIX_IP		0x01

/* Adjust the return address of a call insn */
#define UPROBE_FIX_CALL		0x02

/* Instruction will modify TF, don't change it */
#define UPROBE_FIX_SETF		0x04

#define UPROBE_FIX_RIP_SI	0x08
#define UPROBE_FIX_RIP_DI	0x10
#define UPROBE_FIX_RIP_BX	0x20
#define UPROBE_FIX_RIP_MASK	\
	(UPROBE_FIX_RIP_SI | UPROBE_FIX_RIP_DI | UPROBE_FIX_RIP_BX)

#define	UPROBE_TRAP_NR		UINT_MAX

/* Adaptations for mhiramat x86 decoder v14. */
#define OPCODE1(insn)		((insn)->opcode.bytes[0])
#define OPCODE2(insn)		((insn)->opcode.bytes[1])
#define OPCODE3(insn)		((insn)->opcode.bytes[2])
#define MODRM_REG(insn)		X86_MODRM_REG((insn)->modrm.value)

#define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
	  (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) |   \
	  (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) |   \
	  (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf))    \
	 << (row % 32))

/*
 * Good-instruction tables for 32-bit apps.  This is non-const and volatile
 * to keep gcc from statically optimizing it out, as variable_test_bit makes
 * some versions of gcc think only *(unsigned long*) is used.
 *
 * Opcodes we'll probably never support:
 * 6c-6f - ins,outs. SEGVs if used in userspace
 * e4-e7 - in,out imm. SEGVs if used in userspace
 * ec-ef - in,out acc. SEGVs if used in userspace
 * cc - int3.
SIGTRAP if used in userspace * ce - into. Not used in userspace - no kernel support to make it useful. SEGVs * (why we support bound (62) then? it's similar, and similarly unused...) * f1 - int1. SIGTRAP if used in userspace * f4 - hlt. SEGVs if used in userspace * fa - cli. SEGVs if used in userspace * fb - sti. SEGVs if used in userspace * * Opcodes which need some work to be supported: * 07,17,1f - pop es/ss/ds * Normally not used in userspace, but would execute if used. * Can cause GP or stack exception if tries to load wrong segment descriptor. * We hesitate to run them under single step since kernel's handling * of userspace single-stepping (TF flag) is fragile. * We can easily refuse to support push es/cs/ss/ds (06/0e/16/1e) * on the same grounds that they are never used. * cd - int N. * Used by userspace for "int 80" syscall entry. (Other "int N" * cause GP -> SEGV since their IDT gates don't allow calls from CPL 3). * Not supported since kernel's handling of userspace single-stepping * (TF flag) is fragile. * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad */ #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION) static volatile u32 good_insns_32[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ---------------------------------------------- */ W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 00 */ W(0x10, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 10 */ W(0x20, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 20 */ W(0x30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 30 */ W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */ W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */ W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */ W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */ W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */ W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */ W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */ W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */ W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */ W(0xf0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */ /* ---------------------------------------------- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; #else #define good_insns_32 NULL #endif /* Good-instruction tables for 64-bit apps. * * Genuinely invalid opcodes: * 06,07 - formerly push/pop es * 0e - formerly push cs * 16,17 - formerly push/pop ss * 1e,1f - formerly push/pop ds * 27,2f,37,3f - formerly daa/das/aaa/aas * 60,61 - formerly pusha/popa * 62 - formerly bound. EVEX prefix for AVX512 (not yet supported) * 82 - formerly redundant encoding of Group1 * 9a - formerly call seg:ofs * ce - formerly into * d4,d5 - formerly aam/aad * d6 - formerly undocumented salc * ea - formerly jmp seg:ofs * * Opcodes we'll probably never support: * 6c-6f - ins,outs. SEGVs if used in userspace * e4-e7 - in,out imm. SEGVs if used in userspace * ec-ef - in,out acc. SEGVs if used in userspace * cc - int3. SIGTRAP if used in userspace * f1 - int1. SIGTRAP if used in userspace * f4 - hlt. SEGVs if used in userspace * fa - cli. SEGVs if used in userspace * fb - sti. SEGVs if used in userspace * * Opcodes which need some work to be supported: * cd - int N. * Used by userspace for "int 80" syscall entry. 
(Other "int N" * cause GP -> SEGV since their IDT gates don't allow calls from CPL 3). * Not supported since kernel's handling of userspace single-stepping * (TF flag) is fragile. * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad */ #if defined(CONFIG_X86_64) static volatile u32 good_insns_64[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ---------------------------------------------- */ W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1) | /* 00 */ W(0x10, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 10 */ W(0x20, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 20 */ W(0x30, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 30 */ W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */ W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */ W(0x60, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */ W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */ W(0x80, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1) , /* 90 */ W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */ W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */ W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */ W(0xd0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0) | /* e0 */ W(0xf0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */ /* ---------------------------------------------- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; #else #define good_insns_64 NULL #endif /* Using this for both 64-bit and 32-bit apps. * Opcodes we don't support: * 0f 00 - SLDT/STR/LLDT/LTR/VERR/VERW/-/- group. System insns * 0f 01 - SGDT/SIDT/LGDT/LIDT/SMSW/-/LMSW/INVLPG group. * Also encodes tons of other system insns if mod=11. * Some are in fact non-system: xend, xtest, rdtscp, maybe more * 0f 05 - syscall * 0f 06 - clts (CPL0 insn) * 0f 07 - sysret * 0f 08 - invd (CPL0 insn) * 0f 09 - wbinvd (CPL0 insn) * 0f 0b - ud2 * 0f 30 - wrmsr (CPL0 insn) (then why rdmsr is allowed, it's also CPL0 insn?) * 0f 34 - sysenter * 0f 35 - sysexit * 0f 37 - getsec * 0f 78 - vmread (Intel VMX. CPL0 insn) * 0f 79 - vmwrite (Intel VMX. CPL0 insn) * Note: with prefixes, these two opcodes are * extrq/insertq/AVX512 convert vector ops. * 0f ae - group15: [f]xsave,[f]xrstor,[v]{ld,st}mxcsr,clflush[opt], * {rd,wr}{fs,gs}base,{s,l,m}fence. * Why? They are all user-executable. 
*/ static volatile u32 good_2byte_insns[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ---------------------------------------------- */ W(0x00, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1) | /* 00 */ W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 10 */ W(0x20, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 20 */ W(0x30, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* 30 */ W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */ W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */ W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 60 */ W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* 70 */ W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */ W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1) | /* a0 */ W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */ W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* c0 */ W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* e0 */ W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) /* f0 */ /* ---------------------------------------------- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; #undef W /* * opcodes we may need to refine support for: * * 0f - 2-byte instructions: For many of these instructions, the validity * depends on the prefix and/or the reg field. On such instructions, we * just consider the opcode combination valid if it corresponds to any * valid instruction. * * 8f - Group 1 - only reg = 0 is OK * c6-c7 - Group 11 - only reg = 0 is OK * d9-df - fpu insns with some illegal encodings * f2, f3 - repnz, repz prefixes. These are also the first byte for * certain floating-point instructions, such as addsd. * * fe - Group 4 - only reg = 0 or 1 is OK * ff - Group 5 - only reg = 0-6 is OK * * others -- Do we need to support these? * * 0f - (floating-point?) prefetch instructions * 07, 17, 1f - pop es, pop ss, pop ds * 26, 2e, 36, 3e - es:, cs:, ss:, ds: segment prefixes -- * but 64 and 65 (fs: and gs:) seem to be used, so we support them * 67 - addr16 prefix * ce - into * f0 - lock prefix */ /* * TODO: * - Where necessary, examine the modrm byte and allow only valid instructions * in the different Groups and fpu instructions. */ static bool is_prefix_bad(struct insn *insn) { insn_byte_t p; int i; for_each_insn_prefix(insn, i, p) { insn_attr_t attr; attr = inat_get_opcode_attribute(p); switch (attr) { case INAT_MAKE_PREFIX(INAT_PFX_ES): case INAT_MAKE_PREFIX(INAT_PFX_CS): case INAT_MAKE_PREFIX(INAT_PFX_DS): case INAT_MAKE_PREFIX(INAT_PFX_SS): case INAT_MAKE_PREFIX(INAT_PFX_LOCK): return true; } } return false; } static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool x86_64) { enum insn_mode m = x86_64 ? 
INSN_MODE_64 : INSN_MODE_32; u32 volatile *good_insns; int ret; ret = insn_decode(insn, auprobe->insn, sizeof(auprobe->insn), m); if (ret < 0) return -ENOEXEC; if (is_prefix_bad(insn)) return -ENOTSUPP; /* We should not singlestep on the exception masking instructions */ if (insn_masking_exception(insn)) return -ENOTSUPP; if (x86_64) good_insns = good_insns_64; else good_insns = good_insns_32; if (test_bit(OPCODE1(insn), (unsigned long *)good_insns)) return 0; if (insn->opcode.nbytes == 2) { if (test_bit(OPCODE2(insn), (unsigned long *)good_2byte_insns)) return 0; } return -ENOTSUPP; } #ifdef CONFIG_X86_64 asm ( ".pushsection .rodata\n" ".global uretprobe_trampoline_entry\n" "uretprobe_trampoline_entry:\n" "pushq %rax\n" "pushq %rcx\n" "pushq %r11\n" "movq $" __stringify(__NR_uretprobe) ", %rax\n" "syscall\n" ".global uretprobe_syscall_check\n" "uretprobe_syscall_check:\n" "popq %r11\n" "popq %rcx\n" /* The uretprobe syscall replaces stored %rax value with final * return address, so we don't restore %rax in here and just * call ret. */ "retq\n" ".global uretprobe_trampoline_end\n" "uretprobe_trampoline_end:\n" ".popsection\n" ); extern u8 uretprobe_trampoline_entry[]; extern u8 uretprobe_trampoline_end[]; extern u8 uretprobe_syscall_check[]; void *arch_uprobe_trampoline(unsigned long *psize) { static uprobe_opcode_t insn = UPROBE_SWBP_INSN; struct pt_regs *regs = task_pt_regs(current); /* * At the moment the uretprobe syscall trampoline is supported * only for native 64-bit process, the compat process still uses * standard breakpoint. */ if (user_64bit_mode(regs)) { *psize = uretprobe_trampoline_end - uretprobe_trampoline_entry; return uretprobe_trampoline_entry; } *psize = UPROBE_SWBP_INSN_SIZE; return &insn; } static unsigned long trampoline_check_ip(unsigned long tramp) { return tramp + (uretprobe_syscall_check - uretprobe_trampoline_entry); } SYSCALL_DEFINE0(uretprobe) { struct pt_regs *regs = task_pt_regs(current); unsigned long err, ip, sp, r11_cx_ax[3], tramp; /* If there's no trampoline, we are called from wrong place. */ tramp = uprobe_get_trampoline_vaddr(); if (unlikely(tramp == UPROBE_NO_TRAMPOLINE_VADDR)) goto sigill; /* Make sure the ip matches the only allowed sys_uretprobe caller. */ if (unlikely(regs->ip != trampoline_check_ip(tramp))) goto sigill; err = copy_from_user(r11_cx_ax, (void __user *)regs->sp, sizeof(r11_cx_ax)); if (err) goto sigill; /* expose the "right" values of r11/cx/ax/sp to uprobe_consumer/s */ regs->r11 = r11_cx_ax[0]; regs->cx = r11_cx_ax[1]; regs->ax = r11_cx_ax[2]; regs->sp += sizeof(r11_cx_ax); regs->orig_ax = -1; ip = regs->ip; sp = regs->sp; uprobe_handle_trampoline(regs); /* * Some of the uprobe consumers has changed sp, we can do nothing, * just return via iret. * .. or shadow stack is enabled, in which case we need to skip * return through the user space stack address. 
*/ if (regs->sp != sp || shstk_is_enabled()) return regs->ax; regs->sp -= sizeof(r11_cx_ax); /* for the case uprobe_consumer has changed r11/cx */ r11_cx_ax[0] = regs->r11; r11_cx_ax[1] = regs->cx; /* * ax register is passed through as return value, so we can use * its space on stack for ip value and jump to it through the * trampoline's ret instruction */ r11_cx_ax[2] = regs->ip; regs->ip = ip; err = copy_to_user((void __user *)regs->sp, r11_cx_ax, sizeof(r11_cx_ax)); if (err) goto sigill; /* ensure sysret, see do_syscall_64() */ regs->r11 = regs->flags; regs->cx = regs->ip; return regs->ax; sigill: force_sig(SIGILL); return -1; } /* * If arch_uprobe->insn doesn't use rip-relative addressing, return * immediately. Otherwise, rewrite the instruction so that it accesses * its memory operand indirectly through a scratch register. Set * defparam->fixups accordingly. (The contents of the scratch register * will be saved before we single-step the modified instruction, * and restored afterward). * * We do this because a rip-relative instruction can access only a * relatively small area (+/- 2 GB from the instruction), and the XOL * area typically lies beyond that area. At least for instructions * that store to memory, we can't execute the original instruction * and "fix things up" later, because the misdirected store could be * disastrous. * * Some useful facts about rip-relative instructions: * * - There's always a modrm byte with bit layout "00 reg 101". * - There's never a SIB byte. * - The displacement is always 4 bytes. * - REX.B=1 bit in REX prefix, which normally extends r/m field, * has no effect on rip-relative mode. It doesn't make modrm byte * with r/m=101 refer to register 1101 = R13. */ static void riprel_analyze(struct arch_uprobe *auprobe, struct insn *insn) { u8 *cursor; u8 reg; u8 reg2; if (!insn_rip_relative(insn)) return; /* * insn_rip_relative() would have decoded rex_prefix, vex_prefix, modrm. * Clear REX.b bit (extension of MODRM.rm field): * we want to encode low numbered reg, not r8+. */ if (insn->rex_prefix.nbytes) { cursor = auprobe->insn + insn_offset_rex_prefix(insn); /* REX byte has 0100wrxb layout, clearing REX.b bit */ *cursor &= 0xfe; } /* * Similar treatment for VEX3/EVEX prefix. * TODO: add XOP treatment when insn decoder supports them */ if (insn->vex_prefix.nbytes >= 3) { /* * vex2: c5 rvvvvLpp (has no b bit) * vex3/xop: c4/8f rxbmmmmm wvvvvLpp * evex: 62 rxbR00mm wvvvv1pp zllBVaaa * Setting VEX3.b (setting because it has inverted meaning). * Setting EVEX.x since (in non-SIB encoding) EVEX.x * is the 4th bit of MODRM.rm, and needs the same treatment. * For VEX3-encoded insns, VEX3.x value has no effect in * non-SIB encoding, the change is superfluous but harmless. */ cursor = auprobe->insn + insn_offset_vex_prefix(insn) + 1; *cursor |= 0x60; } /* * Convert from rip-relative addressing to register-relative addressing * via a scratch register. * * This is tricky since there are insns with modrm byte * which also use registers not encoded in modrm byte: * [i]div/[i]mul: implicitly use dx:ax * shift ops: implicitly use cx * cmpxchg: implicitly uses ax * cmpxchg8/16b: implicitly uses dx:ax and bx:cx * Encoding: 0f c7/1 modrm * The code below thinks that reg=1 (cx), chooses si as scratch. * mulx: implicitly uses dx: mulx r/m,r1,r2 does r1:r2 = dx * r/m. * First appeared in Haswell (BMI2 insn). It is vex-encoded. 
* Example where none of bx,cx,dx can be used as scratch reg: * c4 e2 63 f6 0d disp32 mulx disp32(%rip),%ebx,%ecx * [v]pcmpistri: implicitly uses cx, xmm0 * [v]pcmpistrm: implicitly uses xmm0 * [v]pcmpestri: implicitly uses ax, dx, cx, xmm0 * [v]pcmpestrm: implicitly uses ax, dx, xmm0 * Evil SSE4.2 string comparison ops from hell. * maskmovq/[v]maskmovdqu: implicitly uses (ds:rdi) as destination. * Encoding: 0f f7 modrm, 66 0f f7 modrm, vex-encoded: c5 f9 f7 modrm. * Store op1, byte-masked by op2 msb's in each byte, to (ds:rdi). * AMD says it has no 3-operand form (vex.vvvv must be 1111) * and that it can have only register operands, not mem * (its modrm byte must have mode=11). * If these restrictions will ever be lifted, * we'll need code to prevent selection of di as scratch reg! * * Summary: I don't know any insns with modrm byte which * use SI register implicitly. DI register is used only * by one insn (maskmovq) and BX register is used * only by one too (cmpxchg8b). * BP is stack-segment based (may be a problem?). * AX, DX, CX are off-limits (many implicit users). * SP is unusable (it's stack pointer - think about "pop mem"; * also, rsp+disp32 needs sib encoding -> insn length change). */ reg = MODRM_REG(insn); /* Fetch modrm.reg */ reg2 = 0xff; /* Fetch vex.vvvv */ if (insn->vex_prefix.nbytes) reg2 = insn->vex_prefix.bytes[2]; /* * TODO: add XOP vvvv reading. * * vex.vvvv field is in bits 6-3, bits are inverted. * But in 32-bit mode, high-order bit may be ignored. * Therefore, let's consider only 3 low-order bits. */ reg2 = ((reg2 >> 3) & 0x7) ^ 0x7; /* * Register numbering is ax,cx,dx,bx, sp,bp,si,di, r8..r15. * * Choose scratch reg. Order is important: must not select bx * if we can use si (cmpxchg8b case!) */ if (reg != 6 && reg2 != 6) { reg2 = 6; auprobe->defparam.fixups |= UPROBE_FIX_RIP_SI; } else if (reg != 7 && reg2 != 7) { reg2 = 7; auprobe->defparam.fixups |= UPROBE_FIX_RIP_DI; /* TODO (paranoia): force maskmovq to not use di */ } else { reg2 = 3; auprobe->defparam.fixups |= UPROBE_FIX_RIP_BX; } /* * Point cursor at the modrm byte. The next 4 bytes are the * displacement. Beyond the displacement, for some instructions, * is the immediate operand. */ cursor = auprobe->insn + insn_offset_modrm(insn); /* * Change modrm from "00 reg 101" to "10 reg reg2". Example: * 89 05 disp32 mov %eax,disp32(%rip) becomes * 89 86 disp32 mov %eax,disp32(%rsi) */ *cursor = 0x80 | (reg << 3) | reg2; } static inline unsigned long * scratch_reg(struct arch_uprobe *auprobe, struct pt_regs *regs) { if (auprobe->defparam.fixups & UPROBE_FIX_RIP_SI) return ®s->si; if (auprobe->defparam.fixups & UPROBE_FIX_RIP_DI) return ®s->di; return ®s->bx; } /* * If we're emulating a rip-relative instruction, save the contents * of the scratch register and store the target address in that register. 
*/ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) { if (auprobe->defparam.fixups & UPROBE_FIX_RIP_MASK) { struct uprobe_task *utask = current->utask; unsigned long *sr = scratch_reg(auprobe, regs); utask->autask.saved_scratch_register = *sr; *sr = utask->vaddr + auprobe->defparam.ilen; } } static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) { if (auprobe->defparam.fixups & UPROBE_FIX_RIP_MASK) { struct uprobe_task *utask = current->utask; unsigned long *sr = scratch_reg(auprobe, regs); *sr = utask->autask.saved_scratch_register; } } #else /* 32-bit: */ /* * No RIP-relative addressing on 32-bit */ static void riprel_analyze(struct arch_uprobe *auprobe, struct insn *insn) { } static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) { } static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) { } #endif /* CONFIG_X86_64 */ struct uprobe_xol_ops { bool (*emulate)(struct arch_uprobe *, struct pt_regs *); int (*pre_xol)(struct arch_uprobe *, struct pt_regs *); int (*post_xol)(struct arch_uprobe *, struct pt_regs *); void (*abort)(struct arch_uprobe *, struct pt_regs *); }; static inline int sizeof_long(struct pt_regs *regs) { /* * Check registers for mode as in_xxx_syscall() does not apply here. */ return user_64bit_mode(regs) ? 8 : 4; } static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs) { riprel_pre_xol(auprobe, regs); return 0; } static int emulate_push_stack(struct pt_regs *regs, unsigned long val) { unsigned long new_sp = regs->sp - sizeof_long(regs); if (copy_to_user((void __user *)new_sp, &val, sizeof_long(regs))) return -EFAULT; regs->sp = new_sp; return 0; } /* * We have to fix things up as follows: * * Typically, the new ip is relative to the copied instruction. We need * to make it relative to the original instruction (FIX_IP). Exceptions * are return instructions and absolute or indirect jump or call instructions. * * If the single-stepped instruction was a call, the return address that * is atop the stack is the address following the copied instruction. We * need to make it the address following the original instruction (FIX_CALL). * * If the original instruction was a rip-relative instruction such as * "movl %edx,0xnnnn(%rip)", we have instead executed an equivalent * instruction using a scratch register -- e.g., "movl %edx,0xnnnn(%rsi)". * We need to restore the contents of the scratch register * (FIX_RIP_reg). 
*/ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs) { struct uprobe_task *utask = current->utask; riprel_post_xol(auprobe, regs); if (auprobe->defparam.fixups & UPROBE_FIX_IP) { long correction = utask->vaddr - utask->xol_vaddr; regs->ip += correction; } else if (auprobe->defparam.fixups & UPROBE_FIX_CALL) { regs->sp += sizeof_long(regs); /* Pop incorrect return address */ if (emulate_push_stack(regs, utask->vaddr + auprobe->defparam.ilen)) return -ERESTART; } /* popf; tell the caller to not touch TF */ if (auprobe->defparam.fixups & UPROBE_FIX_SETF) utask->autask.saved_tf = true; return 0; } static void default_abort_op(struct arch_uprobe *auprobe, struct pt_regs *regs) { riprel_post_xol(auprobe, regs); } static const struct uprobe_xol_ops default_xol_ops = { .pre_xol = default_pre_xol_op, .post_xol = default_post_xol_op, .abort = default_abort_op, }; static bool branch_is_call(struct arch_uprobe *auprobe) { return auprobe->branch.opc1 == 0xe8; } #define CASE_COND \ COND(70, 71, XF(OF)) \ COND(72, 73, XF(CF)) \ COND(74, 75, XF(ZF)) \ COND(78, 79, XF(SF)) \ COND(7a, 7b, XF(PF)) \ COND(76, 77, XF(CF) || XF(ZF)) \ COND(7c, 7d, XF(SF) != XF(OF)) \ COND(7e, 7f, XF(ZF) || XF(SF) != XF(OF)) #define COND(op_y, op_n, expr) \ case 0x ## op_y: DO((expr) != 0) \ case 0x ## op_n: DO((expr) == 0) #define XF(xf) (!!(flags & X86_EFLAGS_ ## xf)) static bool is_cond_jmp_opcode(u8 opcode) { switch (opcode) { #define DO(expr) \ return true; CASE_COND #undef DO default: return false; } } static bool check_jmp_cond(struct arch_uprobe *auprobe, struct pt_regs *regs) { unsigned long flags = regs->flags; switch (auprobe->branch.opc1) { #define DO(expr) \ return expr; CASE_COND #undef DO default: /* not a conditional jmp */ return true; } } #undef XF #undef COND #undef CASE_COND static bool branch_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs) { unsigned long new_ip = regs->ip += auprobe->branch.ilen; unsigned long offs = (long)auprobe->branch.offs; if (branch_is_call(auprobe)) { /* * If it fails we execute this (mangled, see the comment in * branch_clear_offset) insn out-of-line. In the likely case * this should trigger the trap, and the probed application * should die or restart the same insn after it handles the * signal, arch_uprobe_post_xol() won't be even called. * * But there is corner case, see the comment in ->post_xol(). */ if (emulate_push_stack(regs, new_ip)) return false; } else if (!check_jmp_cond(auprobe, regs)) { offs = 0; } regs->ip = new_ip + offs; return true; } static bool push_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs) { unsigned long *src_ptr = (void *)regs + auprobe->push.reg_offset; if (emulate_push_stack(regs, *src_ptr)) return false; regs->ip += auprobe->push.ilen; return true; } static int branch_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs) { BUG_ON(!branch_is_call(auprobe)); /* * We can only get here if branch_emulate_op() failed to push the ret * address _and_ another thread expanded our stack before the (mangled) * "call" insn was executed out-of-line. Just restore ->sp and restart. * We could also restore ->ip and try to call branch_emulate_op() again. */ regs->sp += sizeof_long(regs); return -ERESTART; } static void branch_clear_offset(struct arch_uprobe *auprobe, struct insn *insn) { /* * Turn this insn into "call 1f; 1:", this is what we will execute * out-of-line if ->emulate() fails. 
We only need this to generate * a trap, so that the probed task receives the correct signal with * the properly filled siginfo. * * But see the comment in ->post_xol(), in the unlikely case it can * succeed. So we need to ensure that the new ->ip can not fall into * the non-canonical area and trigger #GP. * * We could turn it into (say) "pushf", but then we would need to * divorce ->insn[] and ->ixol[]. We need to preserve the 1st byte * of ->insn[] for set_orig_insn(). */ memset(auprobe->insn + insn_offset_immediate(insn), 0, insn->immediate.nbytes); } static const struct uprobe_xol_ops branch_xol_ops = { .emulate = branch_emulate_op, .post_xol = branch_post_xol_op, }; static const struct uprobe_xol_ops push_xol_ops = { .emulate = push_emulate_op, }; /* Returns -ENOSYS if branch_xol_ops doesn't handle this insn */ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn) { u8 opc1 = OPCODE1(insn); insn_byte_t p; int i; /* x86_nops[insn->length]; same as jmp with .offs = 0 */ if (insn->length <= ASM_NOP_MAX && !memcmp(insn->kaddr, x86_nops[insn->length], insn->length)) goto setup; switch (opc1) { case 0xeb: /* jmp 8 */ case 0xe9: /* jmp 32 */ break; case 0x90: /* prefix* + nop; same as jmp with .offs = 0 */ goto setup; case 0xe8: /* call relative */ branch_clear_offset(auprobe, insn); break; case 0x0f: if (insn->opcode.nbytes != 2) return -ENOSYS; /* * If it is a "near" conditional jmp, OPCODE2() - 0x10 matches * OPCODE1() of the "short" jmp which checks the same condition. */ opc1 = OPCODE2(insn) - 0x10; fallthrough; default: if (!is_cond_jmp_opcode(opc1)) return -ENOSYS; } /* * 16-bit overrides such as CALLW (66 e8 nn nn) are not supported. * Intel and AMD behavior differ in 64-bit mode: Intel ignores 66 prefix. * No one uses these insns, reject any branch insns with such prefix. 
*/ for_each_insn_prefix(insn, i, p) { if (p == 0x66) return -ENOTSUPP; } setup: auprobe->branch.opc1 = opc1; auprobe->branch.ilen = insn->length; auprobe->branch.offs = insn->immediate.value; auprobe->ops = &branch_xol_ops; return 0; } /* Returns -ENOSYS if push_xol_ops doesn't handle this insn */ static int push_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn) { u8 opc1 = OPCODE1(insn), reg_offset = 0; if (opc1 < 0x50 || opc1 > 0x57) return -ENOSYS; if (insn->length > 2) return -ENOSYS; if (insn->length == 2) { /* only support rex_prefix 0x41 (x64 only) */ #ifdef CONFIG_X86_64 if (insn->rex_prefix.nbytes != 1 || insn->rex_prefix.bytes[0] != 0x41) return -ENOSYS; switch (opc1) { case 0x50: reg_offset = offsetof(struct pt_regs, r8); break; case 0x51: reg_offset = offsetof(struct pt_regs, r9); break; case 0x52: reg_offset = offsetof(struct pt_regs, r10); break; case 0x53: reg_offset = offsetof(struct pt_regs, r11); break; case 0x54: reg_offset = offsetof(struct pt_regs, r12); break; case 0x55: reg_offset = offsetof(struct pt_regs, r13); break; case 0x56: reg_offset = offsetof(struct pt_regs, r14); break; case 0x57: reg_offset = offsetof(struct pt_regs, r15); break; } #else return -ENOSYS; #endif } else { switch (opc1) { case 0x50: reg_offset = offsetof(struct pt_regs, ax); break; case 0x51: reg_offset = offsetof(struct pt_regs, cx); break; case 0x52: reg_offset = offsetof(struct pt_regs, dx); break; case 0x53: reg_offset = offsetof(struct pt_regs, bx); break; case 0x54: reg_offset = offsetof(struct pt_regs, sp); break; case 0x55: reg_offset = offsetof(struct pt_regs, bp); break; case 0x56: reg_offset = offsetof(struct pt_regs, si); break; case 0x57: reg_offset = offsetof(struct pt_regs, di); break; } } auprobe->push.reg_offset = reg_offset; auprobe->push.ilen = insn->length; auprobe->ops = &push_xol_ops; return 0; } /** * arch_uprobe_analyze_insn - instruction analysis including validity and fixups. * @auprobe: the probepoint information. * @mm: the probed address space. * @addr: virtual address at which to install the probepoint * Return 0 on success or a -ve number on error. */ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr) { struct insn insn; u8 fix_ip_or_call = UPROBE_FIX_IP; int ret; ret = uprobe_init_insn(auprobe, &insn, is_64bit_mm(mm)); if (ret) return ret; ret = branch_setup_xol_ops(auprobe, &insn); if (ret != -ENOSYS) return ret; ret = push_setup_xol_ops(auprobe, &insn); if (ret != -ENOSYS) return ret; /* * Figure out which fixups default_post_xol_op() will need to perform, * and annotate defparam->fixups accordingly. */ switch (OPCODE1(&insn)) { case 0x9d: /* popf */ auprobe->defparam.fixups |= UPROBE_FIX_SETF; break; case 0xc3: /* ret or lret -- ip is correct */ case 0xcb: case 0xc2: case 0xca: case 0xea: /* jmp absolute -- ip is correct */ fix_ip_or_call = 0; break; case 0x9a: /* call absolute - Fix return addr, not ip */ fix_ip_or_call = UPROBE_FIX_CALL; break; case 0xff: switch (MODRM_REG(&insn)) { case 2: case 3: /* call or lcall, indirect */ fix_ip_or_call = UPROBE_FIX_CALL; break; case 4: case 5: /* jmp or ljmp, indirect */ fix_ip_or_call = 0; break; } fallthrough; default: riprel_analyze(auprobe, &insn); } auprobe->defparam.ilen = insn.length; auprobe->defparam.fixups |= fix_ip_or_call; auprobe->ops = &default_xol_ops; return 0; } /* * arch_uprobe_pre_xol - prepare to execute out of line. * @auprobe: the probepoint information. * @regs: reflects the saved user state of current task. 
*/ int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) { struct uprobe_task *utask = current->utask; if (auprobe->ops->pre_xol) { int err = auprobe->ops->pre_xol(auprobe, regs); if (err) return err; } regs->ip = utask->xol_vaddr; utask->autask.saved_trap_nr = current->thread.trap_nr; current->thread.trap_nr = UPROBE_TRAP_NR; utask->autask.saved_tf = !!(regs->flags & X86_EFLAGS_TF); regs->flags |= X86_EFLAGS_TF; if (test_tsk_thread_flag(current, TIF_BLOCKSTEP)) set_task_blockstep(current, false); return 0; } /* * If xol insn itself traps and generates a signal(Say, * SIGILL/SIGSEGV/etc), then detect the case where a singlestepped * instruction jumps back to its own address. It is assumed that anything * like do_page_fault/do_trap/etc sets thread.trap_nr != -1. * * arch_uprobe_pre_xol/arch_uprobe_post_xol save/restore thread.trap_nr, * arch_uprobe_xol_was_trapped() simply checks that ->trap_nr is not equal to * UPROBE_TRAP_NR == -1 set by arch_uprobe_pre_xol(). */ bool arch_uprobe_xol_was_trapped(struct task_struct *t) { if (t->thread.trap_nr != UPROBE_TRAP_NR) return true; return false; } /* * Called after single-stepping. To avoid the SMP problems that can * occur when we temporarily put back the original opcode to * single-step, we single-stepped a copy of the instruction. * * This function prepares to resume execution after the single-step. */ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) { struct uprobe_task *utask = current->utask; bool send_sigtrap = utask->autask.saved_tf; int err = 0; WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR); current->thread.trap_nr = utask->autask.saved_trap_nr; if (auprobe->ops->post_xol) { err = auprobe->ops->post_xol(auprobe, regs); if (err) { /* * Restore ->ip for restart or post mortem analysis. * ->post_xol() must not return -ERESTART unless this * is really possible. */ regs->ip = utask->vaddr; if (err == -ERESTART) err = 0; send_sigtrap = false; } } /* * arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP * so we can get an extra SIGTRAP if we do not clear TF. We need * to examine the opcode to make it right. */ if (send_sigtrap) send_sig(SIGTRAP, current, 0); if (!utask->autask.saved_tf) regs->flags &= ~X86_EFLAGS_TF; return err; } /* callback routine for handling exceptions. */ int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data) { struct die_args *args = data; struct pt_regs *regs = args->regs; int ret = NOTIFY_DONE; /* We are only interested in userspace traps */ if (regs && !user_mode(regs)) return NOTIFY_DONE; switch (val) { case DIE_INT3: if (uprobe_pre_sstep_notifier(regs)) ret = NOTIFY_STOP; break; case DIE_DEBUG: if (uprobe_post_sstep_notifier(regs)) ret = NOTIFY_STOP; break; default: break; } return ret; } /* * This function gets called when XOL instruction either gets trapped or * the thread has a fatal signal. Reset the instruction pointer to its * probed address for the potential restart or for post mortem analysis. 
*/ void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) { struct uprobe_task *utask = current->utask; if (auprobe->ops->abort) auprobe->ops->abort(auprobe, regs); current->thread.trap_nr = utask->autask.saved_trap_nr; regs->ip = utask->vaddr; /* clear TF if it was set by us in arch_uprobe_pre_xol() */ if (!utask->autask.saved_tf) regs->flags &= ~X86_EFLAGS_TF; } static bool __skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs) { if (auprobe->ops->emulate) return auprobe->ops->emulate(auprobe, regs); return false; } bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs) { bool ret = __skip_sstep(auprobe, regs); if (ret && (regs->flags & X86_EFLAGS_TF)) send_sig(SIGTRAP, current, 0); return ret; } unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs) { int rasize = sizeof_long(regs), nleft; unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */ if (copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, rasize)) return -1; /* check whether the address has already been hijacked */ if (orig_ret_vaddr == trampoline_vaddr) return orig_ret_vaddr; nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize); if (likely(!nleft)) { if (shstk_update_last_frame(trampoline_vaddr)) { force_sig(SIGSEGV); return -1; } return orig_ret_vaddr; } if (nleft != rasize) { pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n", current->pid, regs->sp, regs->ip); force_sig(SIGSEGV); } return -1; } bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx, struct pt_regs *regs) { if (ctx == RP_CHECK_CALL) /* sp was just decremented by "call" insn */ return regs->sp < ret->stack; else return regs->sp <= ret->stack; }
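/*
 * Illustration (not part of the kernel sources): the Jcc emulation above
 * decides whether a conditional jump is taken purely from EFLAGS, using the
 * opcode pairs wired up in CASE_COND / check_jmp_cond(). The standalone
 * userspace sketch below mirrors that predicate table for the short-form
 * 0x70..0x7f opcodes; the FL_* constants match the architectural
 * X86_EFLAGS_* bit positions, and jcc_taken() is a made-up helper name.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* EFLAGS bits as defined by the x86 architecture */
#define FL_CF 0x0001
#define FL_PF 0x0004
#define FL_ZF 0x0040
#define FL_SF 0x0080
#define FL_OF 0x0800

/* Mirror of check_jmp_cond(): true if the short Jcc at 'opc' is taken. */
static bool jcc_taken(uint8_t opc, unsigned long flags)
{
	bool of = flags & FL_OF, cf = flags & FL_CF;
	bool zf = flags & FL_ZF, sf = flags & FL_SF;
	bool pf = flags & FL_PF;
	bool cond;

	switch (opc & 0xfe) {	/* low opcode bit selects the negated form */
	case 0x70: cond = of; break;			/* jo/jno */
	case 0x72: cond = cf; break;			/* jb/jae */
	case 0x74: cond = zf; break;			/* je/jne */
	case 0x76: cond = cf || zf; break;		/* jbe/ja */
	case 0x78: cond = sf; break;			/* js/jns */
	case 0x7a: cond = pf; break;			/* jp/jnp */
	case 0x7c: cond = sf != of; break;		/* jl/jge */
	case 0x7e: cond = zf || (sf != of); break;	/* jle/jg */
	default: return true;				/* not a conditional jmp */
	}
	return (opc & 1) ? !cond : cond;
}

int main(void)
{
	/* je (0x74) is taken when ZF is set, jne (0x75) when it is clear */
	printf("je  with ZF=1: %d\n", jcc_taken(0x74, FL_ZF));	/* 1 */
	printf("jne with ZF=1: %d\n", jcc_taken(0x75, FL_ZF));	/* 0 */
	printf("jg  with ZF=0, SF==OF: %d\n", jcc_taken(0x7f, 0));	/* 1 */
	return 0;
}
/*
 * Note how branch_emulate_op() only needs this predicate plus the immediate:
 * a not-taken branch simply zeroes the displacement before adding it to ip.
 */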
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2007 Oracle. All rights reserved. */ #include <linux/err.h> #include <linux/uuid.h> #include "ctree.h" #include "fs.h" #include "messages.h" #include "transaction.h" #include "disk-io.h" #include "qgroup.h" #include "space-info.h" #include "accessors.h" #include "root-tree.h" #include "orphan.h" /* * Read a root item from the tree. In case we detect a root item smaller than * sizeof(root_item), we know it's an old version of the root structure and * initialize all new fields to zero. The same happens if we detect mismatching * generation numbers, because then we know the root was once mounted with an older * kernel that was not aware of the root item structure change. */ static void btrfs_read_root_item(struct extent_buffer *eb, int slot, struct btrfs_root_item *item) { u32 len; int need_reset = 0; len = btrfs_item_size(eb, slot); read_extent_buffer(eb, item, btrfs_item_ptr_offset(eb, slot), min_t(u32, len, sizeof(*item))); if (len < sizeof(*item)) need_reset = 1; if (!need_reset && btrfs_root_generation(item) != btrfs_root_generation_v2(item)) { if (btrfs_root_generation_v2(item) != 0) { btrfs_warn(eb->fs_info, "mismatching generation and generation_v2 found in root item. This root was probably mounted with an older kernel.
Resetting all new fields."); } need_reset = 1; } if (need_reset) { /* Clear all members from generation_v2 onwards. */ memset_startat(item, 0, generation_v2); generate_random_guid(item->uuid); } } /* * Lookup the root by the key. * * root: the root of the root tree * search_key: the key to search * path: the path we search * root_item: the root item of the tree we look for * root_key: the root key of the tree we look for * * If ->offset of 'search_key' is -1ULL, it means we are not sure the offset * of the search key, just lookup the root with the highest offset for a * given objectid. * * If we find something return 0, otherwise > 0, < 0 on error. */ int btrfs_find_root(struct btrfs_root *root, const struct btrfs_key *search_key, struct btrfs_path *path, struct btrfs_root_item *root_item, struct btrfs_key *root_key) { struct btrfs_key found_key; struct extent_buffer *l; int ret; int slot; ret = btrfs_search_slot(NULL, root, search_key, path, 0, 0); if (ret < 0) return ret; if (search_key->offset != -1ULL) { /* the search key is exact */ if (ret > 0) goto out; } else { /* * Key with offset -1 found, there would have to exist a root * with such id, but this is out of the valid range. */ if (ret == 0) { ret = -EUCLEAN; goto out; } if (path->slots[0] == 0) goto out; path->slots[0]--; ret = 0; } l = path->nodes[0]; slot = path->slots[0]; btrfs_item_key_to_cpu(l, &found_key, slot); if (found_key.objectid != search_key->objectid || found_key.type != BTRFS_ROOT_ITEM_KEY) { ret = 1; goto out; } if (root_item) btrfs_read_root_item(l, slot, root_item); if (root_key) memcpy(root_key, &found_key, sizeof(found_key)); out: btrfs_release_path(path); return ret; } void btrfs_set_root_node(struct btrfs_root_item *item, struct extent_buffer *node) { btrfs_set_root_bytenr(item, node->start); btrfs_set_root_level(item, btrfs_header_level(node)); btrfs_set_root_generation(item, btrfs_header_generation(node)); } /* * copy the data in 'item' into the btree */ int btrfs_update_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_key *key, struct btrfs_root_item *item) { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_path *path; struct extent_buffer *l; int ret; int slot; unsigned long ptr; u32 old_len; path = btrfs_alloc_path(); if (!path) return -ENOMEM; ret = btrfs_search_slot(trans, root, key, path, 0, 1); if (ret < 0) goto out; if (ret > 0) { btrfs_crit(fs_info, "unable to find root key (%llu %u %llu) in tree %llu", key->objectid, key->type, key->offset, btrfs_root_id(root)); ret = -EUCLEAN; btrfs_abort_transaction(trans, ret); goto out; } l = path->nodes[0]; slot = path->slots[0]; ptr = btrfs_item_ptr_offset(l, slot); old_len = btrfs_item_size(l, slot); /* * If this is the first time we update the root item which originated * from an older kernel, we need to enlarge the item size to make room * for the added fields. */ if (old_len < sizeof(*item)) { btrfs_release_path(path); ret = btrfs_search_slot(trans, root, key, path, -1, 1); if (ret < 0) { btrfs_abort_transaction(trans, ret); goto out; } ret = btrfs_del_item(trans, root, path); if (ret < 0) { btrfs_abort_transaction(trans, ret); goto out; } btrfs_release_path(path); ret = btrfs_insert_empty_item(trans, root, path, key, sizeof(*item)); if (ret < 0) { btrfs_abort_transaction(trans, ret); goto out; } l = path->nodes[0]; slot = path->slots[0]; ptr = btrfs_item_ptr_offset(l, slot); } /* * Update generation_v2 so at the next mount we know the new root * fields are valid. 
*/ btrfs_set_root_generation_v2(item, btrfs_root_generation(item)); write_extent_buffer(l, item, ptr, sizeof(*item)); out: btrfs_free_path(path); return ret; } int btrfs_insert_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, const struct btrfs_key *key, struct btrfs_root_item *item) { /* * Make sure generation v1 and v2 match. See update_root for details. */ btrfs_set_root_generation_v2(item, btrfs_root_generation(item)); return btrfs_insert_item(trans, root, key, item, sizeof(*item)); } int btrfs_find_orphan_roots(struct btrfs_fs_info *fs_info) { struct btrfs_root *tree_root = fs_info->tree_root; struct extent_buffer *leaf; struct btrfs_path *path; struct btrfs_key key; struct btrfs_root *root; int err = 0; int ret; path = btrfs_alloc_path(); if (!path) return -ENOMEM; key.objectid = BTRFS_ORPHAN_OBJECTID; key.type = BTRFS_ORPHAN_ITEM_KEY; key.offset = 0; while (1) { u64 root_objectid; ret = btrfs_search_slot(NULL, tree_root, &key, path, 0, 0); if (ret < 0) { err = ret; break; } leaf = path->nodes[0]; if (path->slots[0] >= btrfs_header_nritems(leaf)) { ret = btrfs_next_leaf(tree_root, path); if (ret < 0) err = ret; if (ret != 0) break; leaf = path->nodes[0]; } btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); btrfs_release_path(path); if (key.objectid != BTRFS_ORPHAN_OBJECTID || key.type != BTRFS_ORPHAN_ITEM_KEY) break; root_objectid = key.offset; key.offset++; root = btrfs_get_fs_root(fs_info, root_objectid, false); err = PTR_ERR_OR_ZERO(root); if (err && err != -ENOENT) { break; } else if (err == -ENOENT) { struct btrfs_trans_handle *trans; btrfs_release_path(path); trans = btrfs_join_transaction(tree_root); if (IS_ERR(trans)) { err = PTR_ERR(trans); btrfs_handle_fs_error(fs_info, err, "Failed to start trans to delete orphan item"); break; } err = btrfs_del_orphan_item(trans, tree_root, root_objectid); btrfs_end_transaction(trans); if (err) { btrfs_handle_fs_error(fs_info, err, "Failed to delete root orphan item"); break; } continue; } WARN_ON(!test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state)); if (btrfs_root_refs(&root->root_item) == 0) { struct btrfs_key drop_key; btrfs_disk_key_to_cpu(&drop_key, &root->root_item.drop_progress); /* * If we have a non-zero drop_progress then we know we * made it partly through deleting this snapshot, and * thus we need to make sure we block any balance from * happening until this snapshot is completely dropped. */ if (drop_key.objectid != 0 || drop_key.type != 0 || drop_key.offset != 0) { set_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags); set_bit(BTRFS_ROOT_UNFINISHED_DROP, &root->state); } set_bit(BTRFS_ROOT_DEAD_TREE, &root->state); btrfs_add_dead_root(root); } btrfs_put_root(root); } btrfs_free_path(path); return err; } /* drop the root item for 'key' from the tree root */ int btrfs_del_root(struct btrfs_trans_handle *trans, const struct btrfs_key *key) { struct btrfs_root *root = trans->fs_info->tree_root; struct btrfs_path *path; int ret; path = btrfs_alloc_path(); if (!path) return -ENOMEM; ret = btrfs_search_slot(trans, root, key, path, -1, 1); if (ret < 0) goto out; if (ret != 0) { /* The root must exist but we did not find it by the key. 
*/ ret = -EUCLEAN; goto out; } ret = btrfs_del_item(trans, root, path); out: btrfs_free_path(path); return ret; } int btrfs_del_root_ref(struct btrfs_trans_handle *trans, u64 root_id, u64 ref_id, u64 dirid, u64 *sequence, const struct fscrypt_str *name) { struct btrfs_root *tree_root = trans->fs_info->tree_root; struct btrfs_path *path; struct btrfs_root_ref *ref; struct extent_buffer *leaf; struct btrfs_key key; unsigned long ptr; int ret; path = btrfs_alloc_path(); if (!path) return -ENOMEM; key.objectid = root_id; key.type = BTRFS_ROOT_BACKREF_KEY; key.offset = ref_id; again: ret = btrfs_search_slot(trans, tree_root, &key, path, -1, 1); if (ret < 0) { goto out; } else if (ret == 0) { leaf = path->nodes[0]; ref = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_root_ref); ptr = (unsigned long)(ref + 1); if ((btrfs_root_ref_dirid(leaf, ref) != dirid) || (btrfs_root_ref_name_len(leaf, ref) != name->len) || memcmp_extent_buffer(leaf, name->name, ptr, name->len)) { ret = -ENOENT; goto out; } *sequence = btrfs_root_ref_sequence(leaf, ref); ret = btrfs_del_item(trans, tree_root, path); if (ret) goto out; } else { ret = -ENOENT; goto out; } if (key.type == BTRFS_ROOT_BACKREF_KEY) { btrfs_release_path(path); key.objectid = ref_id; key.type = BTRFS_ROOT_REF_KEY; key.offset = root_id; goto again; } out: btrfs_free_path(path); return ret; } /* * add a btrfs_root_ref item. type is either BTRFS_ROOT_REF_KEY * or BTRFS_ROOT_BACKREF_KEY. * * The dirid, sequence, name and name_len refer to the directory entry * that is referencing the root. * * For a forward ref, the root_id is the id of the tree referencing * the root and ref_id is the id of the subvol or snapshot. * * For a back ref the root_id is the id of the subvol or snapshot and * ref_id is the id of the tree referencing it. * * Will return 0, -ENOMEM, or anything from the CoW path */ int btrfs_add_root_ref(struct btrfs_trans_handle *trans, u64 root_id, u64 ref_id, u64 dirid, u64 sequence, const struct fscrypt_str *name) { struct btrfs_root *tree_root = trans->fs_info->tree_root; struct btrfs_key key; int ret; struct btrfs_path *path; struct btrfs_root_ref *ref; struct extent_buffer *leaf; unsigned long ptr; path = btrfs_alloc_path(); if (!path) return -ENOMEM; key.objectid = root_id; key.type = BTRFS_ROOT_BACKREF_KEY; key.offset = ref_id; again: ret = btrfs_insert_empty_item(trans, tree_root, path, &key, sizeof(*ref) + name->len); if (ret) { btrfs_abort_transaction(trans, ret); btrfs_free_path(path); return ret; } leaf = path->nodes[0]; ref = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_root_ref); btrfs_set_root_ref_dirid(leaf, ref, dirid); btrfs_set_root_ref_sequence(leaf, ref, sequence); btrfs_set_root_ref_name_len(leaf, ref, name->len); ptr = (unsigned long)(ref + 1); write_extent_buffer(leaf, name->name, ptr, name->len); if (key.type == BTRFS_ROOT_BACKREF_KEY) { btrfs_release_path(path); key.objectid = ref_id; key.type = BTRFS_ROOT_REF_KEY; key.offset = root_id; goto again; } btrfs_free_path(path); return 0; } /* * Old btrfs forgets to init root_item->flags and root_item->byte_limit * for subvolumes. To work around this problem, we steal a bit from * root_item->inode_item->flags, and use it to indicate if those fields * have been properly initialized. 
*/ void btrfs_check_and_init_root_item(struct btrfs_root_item *root_item) { u64 inode_flags = btrfs_stack_inode_flags(&root_item->inode); if (!(inode_flags & BTRFS_INODE_ROOT_ITEM_INIT)) { inode_flags |= BTRFS_INODE_ROOT_ITEM_INIT; btrfs_set_stack_inode_flags(&root_item->inode, inode_flags); btrfs_set_root_flags(root_item, 0); btrfs_set_root_limit(root_item, 0); } } void btrfs_update_root_times(struct btrfs_trans_handle *trans, struct btrfs_root *root) { struct btrfs_root_item *item = &root->root_item; struct timespec64 ct; ktime_get_real_ts64(&ct); spin_lock(&root->root_item_lock); btrfs_set_root_ctransid(item, trans->transid); btrfs_set_stack_timespec_sec(&item->ctime, ct.tv_sec); btrfs_set_stack_timespec_nsec(&item->ctime, ct.tv_nsec); spin_unlock(&root->root_item_lock); } /* * Reserve space for a subvolume operation. * * root: the root of the parent directory * rsv: block reservation * items: the number of items we need to reserve space for * use_global_rsv: allow fallback to the global block reservation * * This function is used to reserve the space for snapshot/subvolume * creation and deletion. Those operations differ from common * file/directory operations: they change two fs/file trees * and the root tree, and the number of items that the qgroup reserves is * different from the free space reservation, so we cannot use * the space reservation mechanism in start_transaction(). */ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root, struct btrfs_block_rsv *rsv, int items, bool use_global_rsv) { u64 qgroup_num_bytes = 0; u64 num_bytes; int ret; struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv; if (btrfs_qgroup_enabled(fs_info)) { /* One for parent inode, two for dir entries */ qgroup_num_bytes = 3 * fs_info->nodesize; ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_num_bytes, true, false); if (ret) return ret; } num_bytes = btrfs_calc_insert_metadata_size(fs_info, items); rsv->space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA); ret = btrfs_block_rsv_add(fs_info, rsv, num_bytes, BTRFS_RESERVE_FLUSH_ALL); if (ret == -ENOSPC && use_global_rsv) ret = btrfs_block_rsv_migrate(global_rsv, rsv, num_bytes, true); if (ret && qgroup_num_bytes) btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes); if (!ret) { spin_lock(&rsv->lock); rsv->qgroup_rsv_reserved += qgroup_num_bytes; spin_unlock(&rsv->lock); } return ret; }
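/*
 * Illustration (not part of the kernel sources): btrfs_find_root() above
 * relies on a btree property -- a search for (objectid, ROOT_ITEM, -1ULL)
 * can never match exactly, so btrfs_search_slot() lands one slot past the
 * last item for that objectid, and stepping back one slot yields the root
 * item with the highest offset. The userspace sketch below reproduces that
 * slot arithmetic on a sorted array; struct demo_key, search_slot() and
 * find_latest() are made-up names for the example.
 */
#include <stdio.h>
#include <stdint.h>

struct demo_key {
	uint64_t objectid;
	uint64_t offset;
};

/* Lower-bound search, like btrfs_search_slot() returning ret > 0. */
static int search_slot(const struct demo_key *keys, int n,
		       const struct demo_key *want)
{
	int slot = 0;

	while (slot < n && (keys[slot].objectid < want->objectid ||
			    (keys[slot].objectid == want->objectid &&
			     keys[slot].offset < want->offset)))
		slot++;
	return slot;	/* first slot >= want */
}

/* Find the key with the highest offset for @objectid, or -1 if none. */
static int find_latest(const struct demo_key *keys, int n, uint64_t objectid)
{
	struct demo_key want = { .objectid = objectid, .offset = UINT64_MAX };
	int slot = search_slot(keys, n, &want);

	if (slot == 0)
		return -1;	/* nothing at or before the search key */
	slot--;			/* step back, as btrfs_find_root() does */
	if (keys[slot].objectid != objectid)
		return -1;	/* previous item belongs to another tree */
	return slot;
}

int main(void)
{
	const struct demo_key keys[] = {
		{ 256, 0 }, { 257, 0 }, { 257, 10 }, { 257, 42 }, { 300, 0 },
	};
	int slot = find_latest(keys, 5, 257);

	/* prints: objectid 257: highest offset 42 at slot 3 */
	if (slot >= 0)
		printf("objectid 257: highest offset %llu at slot %d\n",
		       (unsigned long long)keys[slot].offset, slot);
	return 0;
}
/*
 * The kernel additionally treats an exact match of offset -1ULL as
 * corruption (-EUCLEAN), since no valid root can carry that offset.
 */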
#ifndef __DRM_VMA_MANAGER_H__ #define __DRM_VMA_MANAGER_H__ /* * Copyright (c) 2013 David Herrmann <dh.herrmann@gmail.com> * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR * OTHER DEALINGS IN THE SOFTWARE. */ #include <drm/drm_mm.h> #include <linux/mm.h> #include <linux/rbtree.h> #include <linux/spinlock.h> #include <linux/types.h> /* We make up offsets for buffer objects so we can recognize them at * mmap time.
pgoff in mmap is an unsigned long, so we need to make sure * that the faked up offset will fit */ #if BITS_PER_LONG == 64 #define DRM_FILE_PAGE_OFFSET_START ((0xFFFFFFFFUL >> PAGE_SHIFT) + 1) #define DRM_FILE_PAGE_OFFSET_SIZE ((0xFFFFFFFFUL >> PAGE_SHIFT) * 256) #else #define DRM_FILE_PAGE_OFFSET_START ((0xFFFFFFFUL >> PAGE_SHIFT) + 1) #define DRM_FILE_PAGE_OFFSET_SIZE ((0xFFFFFFFUL >> PAGE_SHIFT) * 16) #endif struct drm_file; struct drm_vma_offset_file { struct rb_node vm_rb; struct drm_file *vm_tag; unsigned long vm_count; }; struct drm_vma_offset_node { rwlock_t vm_lock; struct drm_mm_node vm_node; struct rb_root vm_files; void *driver_private; }; struct drm_vma_offset_manager { rwlock_t vm_lock; struct drm_mm vm_addr_space_mm; }; void drm_vma_offset_manager_init(struct drm_vma_offset_manager *mgr, unsigned long page_offset, unsigned long size); void drm_vma_offset_manager_destroy(struct drm_vma_offset_manager *mgr); struct drm_vma_offset_node *drm_vma_offset_lookup_locked(struct drm_vma_offset_manager *mgr, unsigned long start, unsigned long pages); int drm_vma_offset_add(struct drm_vma_offset_manager *mgr, struct drm_vma_offset_node *node, unsigned long pages); void drm_vma_offset_remove(struct drm_vma_offset_manager *mgr, struct drm_vma_offset_node *node); int drm_vma_node_allow(struct drm_vma_offset_node *node, struct drm_file *tag); int drm_vma_node_allow_once(struct drm_vma_offset_node *node, struct drm_file *tag); void drm_vma_node_revoke(struct drm_vma_offset_node *node, struct drm_file *tag); bool drm_vma_node_is_allowed(struct drm_vma_offset_node *node, struct drm_file *tag); /** * drm_vma_offset_exact_lookup_locked() - Look up node by exact address * @mgr: Manager object * @start: Start address (page-based, not byte-based) * @pages: Size of object (page-based) * * Same as drm_vma_offset_lookup_locked() but does not allow any offset into the node. * It only returns the exact object with the given start address. * * RETURNS: * Node at exact start address @start. */ static inline struct drm_vma_offset_node * drm_vma_offset_exact_lookup_locked(struct drm_vma_offset_manager *mgr, unsigned long start, unsigned long pages) { struct drm_vma_offset_node *node; node = drm_vma_offset_lookup_locked(mgr, start, pages); return (node && node->vm_node.start == start) ? node : NULL; } /** * drm_vma_offset_lock_lookup() - Lock lookup for extended private use * @mgr: Manager object * * Lock VMA manager for extended lookups. Only locked VMA function calls * are allowed while holding this lock. All other contexts are blocked from VMA * until the lock is released via drm_vma_offset_unlock_lookup(). * * Use this if you need to take a reference to the objects returned by * drm_vma_offset_lookup_locked() before releasing this lock again. * * This lock must not be used for anything else than extended lookups. You must * not call any other VMA helpers while holding this lock. * * Note: You're in atomic-context while holding this lock! */ static inline void drm_vma_offset_lock_lookup(struct drm_vma_offset_manager *mgr) { read_lock(&mgr->vm_lock); } /** * drm_vma_offset_unlock_lookup() - Unlock lookup for extended private use * @mgr: Manager object * * Release lookup-lock. See drm_vma_offset_lock_lookup() for more information. */ static inline void drm_vma_offset_unlock_lookup(struct drm_vma_offset_manager *mgr) { read_unlock(&mgr->vm_lock); } /** * drm_vma_node_reset() - Initialize or reset node object * @node: Node to initialize or reset * * Reset a node to its initial state. 
This must be called before using it with * any VMA offset manager. * * This must not be called on an already allocated node, or you will leak * memory. */ static inline void drm_vma_node_reset(struct drm_vma_offset_node *node) { memset(node, 0, sizeof(*node)); node->vm_files = RB_ROOT; rwlock_init(&node->vm_lock); } /** * drm_vma_node_start() - Return start address for page-based addressing * @node: Node to inspect * * Return the start address of the given node. This can be used as offset into * the linear VM space that is provided by the VMA offset manager. Note that * this can only be used for page-based addressing. If you need a proper offset * for user-space mappings, you must apply "<< PAGE_SHIFT" or use the * drm_vma_node_offset_addr() helper instead. * * RETURNS: * Start address of @node for page-based addressing. 0 if the node does not * have an offset allocated. */ static inline unsigned long drm_vma_node_start(const struct drm_vma_offset_node *node) { return node->vm_node.start; } /** * drm_vma_node_size() - Return size (page-based) * @node: Node to inspect * * Return the size as number of pages for the given node. This is the same size * that was passed to drm_vma_offset_add(). If no offset is allocated for the * node, this is 0. * * RETURNS: * Size of @node as number of pages. 0 if the node does not have an offset * allocated. */ static inline unsigned long drm_vma_node_size(struct drm_vma_offset_node *node) { return node->vm_node.size; } /** * drm_vma_node_offset_addr() - Return sanitized offset for user-space mmaps * @node: Linked offset node * * Same as drm_vma_node_start() but returns the address as a valid offset that * can be used for user-space mappings during mmap(). * This must not be called on unlinked nodes. * * RETURNS: * Offset of @node for byte-based addressing. 0 if the node does not have an * object allocated. */ static inline __u64 drm_vma_node_offset_addr(struct drm_vma_offset_node *node) { return ((__u64)node->vm_node.start) << PAGE_SHIFT; } /** * drm_vma_node_unmap() - Unmap offset node * @node: Offset node * @file_mapping: Address space to unmap @node from * * Unmap all userspace mappings for a given offset node. The mappings must be * associated with the @file_mapping address-space. If no offset exists * nothing is done. * * This call is unlocked. The caller must guarantee that drm_vma_offset_remove() * is not called on this node concurrently. */ static inline void drm_vma_node_unmap(struct drm_vma_offset_node *node, struct address_space *file_mapping) { if (drm_mm_node_allocated(&node->vm_node)) unmap_mapping_range(file_mapping, drm_vma_node_offset_addr(node), drm_vma_node_size(node) << PAGE_SHIFT, 1); } /** * drm_vma_node_verify_access() - Access verification helper for TTM * @node: Offset node * @tag: Tag of file to check * * This checks whether @tag is granted access to @node. It is the same as * drm_vma_node_is_allowed() but suitable as drop-in helper for TTM * verify_access() callbacks. * * RETURNS: * 0 if access is granted, -EACCES otherwise. */ static inline int drm_vma_node_verify_access(struct drm_vma_offset_node *node, struct drm_file *tag) { return drm_vma_node_is_allowed(node, tag) ? 0 : -EACCES; } #endif /* __DRM_VMA_MANAGER_H__ */
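/*
 * Illustration (not part of the kernel sources): the helpers above address
 * buffer objects in units of pages and only convert to a byte offset at the
 * mmap() boundary via drm_vma_node_offset_addr(). The userspace sketch below
 * shows the same arithmetic -- on 64-bit, fake offsets start just past the
 * 32-bit byte range (cf. DRM_FILE_PAGE_OFFSET_START), so they cannot collide
 * with "real" file offsets. struct demo_node and the DEMO_* constants are
 * assumptions for the example.
 */
#include <stdio.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* 4 KiB pages, as on x86 */
#define DEMO_OFFSET_START ((((uint64_t)0xFFFFFFFFu) >> DEMO_PAGE_SHIFT) + 1)

struct demo_node {
	uint64_t start;		/* page-based, like drm_mm_node.start */
	uint64_t pages;		/* object size in pages */
};

/* Same conversion as drm_vma_node_offset_addr(). */
static uint64_t node_offset_addr(const struct demo_node *node)
{
	return node->start << DEMO_PAGE_SHIFT;
}

/* Map a byte offset from mmap() back to a node-relative page index. */
static int64_t node_page_of(const struct demo_node *node, uint64_t byte_off)
{
	uint64_t pgoff = byte_off >> DEMO_PAGE_SHIFT;

	if (pgoff < node->start || pgoff >= node->start + node->pages)
		return -1;	/* offset does not hit this object */
	return (int64_t)(pgoff - node->start);
}

int main(void)
{
	/* A 16-page object placed at the start of the fake-offset range. */
	struct demo_node node = { .start = DEMO_OFFSET_START, .pages = 16 };
	uint64_t off = node_offset_addr(&node);
	uint64_t page3 = off + (3u << DEMO_PAGE_SHIFT);

	printf("mmap offset: %#llx\n", (unsigned long long)off);	/* 0x100000000 */
	printf("byte offset %#llx -> page index %lld\n",
	       (unsigned long long)page3,
	       (long long)node_page_of(&node, page3));			/* 3 */
	return 0;
}
/*
 * Working in pages until the last moment is what lets the manager pack
 * objects densely in the fake address space regardless of PAGE_SIZE.
 */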
3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 
4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 4727 4728 4729 4730 4731 4732 4733 4734 4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797 4798 4799 4800 4801 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 4819 4820 4821 4822 4823 4824 4825 4826 4827 4828 4829 4830 4831 4832 4833 4834 4835 4836 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 4847 4848 4849 4850 4851 4852 4853 4854 4855 
4856 4857 4858 4859 4860 4861 4862 4863 4864 4865 4866 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919 4920 4921 4922 4923 4924 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 4935 4936 4937 4938 4939 4940 4941 4942 4943 4944 4945 4946 4947 4948 4949 4950 4951 4952 4953 4954 4955 4956 4957 4958 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 4975 4976 4977 4978 4979 4980 4981 4982 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010 5011 5012 5013 5014 5015 5016 5017 5018 5019 5020 5021 5022 5023 5024 5025 5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 5112 5113 5114 5115 5116 5117 5118 5119 5120 5121 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5303 5304 5305 5306 5307 5308 5309 5310 5311 5312 5313 5314 5315 5316 5317 5318 5319 5320 5321 5322 5323 5324 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378 5379 5380 5381 5382 5383 5384 5385 5386 5387 5388 5389 5390 5391 5392 5393 5394 5395 5396 5397 5398 5399 5400 5401 5402 5403 5404 5405 5406 5407 5408 5409 5410 5411 5412 5413 5414 5415 5416 5417 5418 5419 5420 5421 5422 5423 5424 5425 5426 5427 5428 5429 5430 5431 5432 5433 5434 5435 5436 5437 5438 5439 5440 5441 5442 5443 5444 5445 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 5457 5458 5459 5460 5461 5462 5463 5464 5465 5466 5467 5468 5469 5470 5471 5472 5473 5474 5475 5476 5477 5478 5479 5480 5481 5482 5483 5484 5485 5486 5487 5488 5489 5490 5491 5492 5493 5494 5495 5496 5497 5498 5499 5500 5501 5502 5503 5504 5505 5506 5507 5508 5509 5510 5511 5512 5513 5514 5515 5516 5517 5518 5519 5520 5521 5522 5523 5524 5525 5526 5527 5528 5529 5530 5531 5532 5533 5534 5535 5536 5537 5538 5539 5540 5541 5542 5543 5544 5545 5546 5547 5548 5549 5550 5551 5552 5553 5554 5555 5556 5557 5558 5559 5560 5561 5562 5563 5564 5565 5566 
5567 5568 5569 5570 5571 5572 5573 5574 5575 5576 5577 5578 5579 5580 5581 5582 5583 5584 5585 5586 5587 5588 5589 5590 5591 5592 5593 5594 5595 5596 5597 5598 5599 5600 5601 5602 5603 5604 5605 5606 5607 5608 5609 5610 5611 5612 5613 5614 5615 5616 5617 5618 5619 5620 5621 5622 5623 5624 5625 5626 5627 5628 5629 5630 5631 5632 5633 5634 5635 5636 5637 5638 5639 5640 5641 5642 5643 5644 5645 5646 5647 5648 5649 5650 5651 5652 5653 5654 5655 5656 5657 5658 5659 5660 5661 5662 5663 5664 5665 5666 5667 5668 5669 5670 5671 5672 5673 5674 5675 5676 5677 5678 5679 5680 5681 5682 5683 5684 5685 5686 5687 5688 5689 5690 5691 5692 5693 5694 5695 5696 5697 5698 5699 5700 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 5712 5713 5714 5715 5716 5717 5718 5719 5720 5721 5722 5723 5724 5725 5726 5727 5728 5729 5730 5731 5732 5733 5734 5735 5736 5737 5738 5739 5740 5741 5742 5743 5744 5745 5746 5747 5748 5749 5750 5751 5752 5753 5754 5755 5756 5757 5758 5759 5760 5761 5762 5763 5764 5765 5766 5767 5768 5769 5770 5771 5772 5773 5774 5775 5776 5777 5778 5779 5780 5781 5782 5783 5784 5785 5786 5787 5788 5789 5790 5791 5792 5793 5794 5795 5796 5797 5798 5799 5800 5801 5802 5803 5804 5805 5806 5807 5808 5809 5810 5811 5812 5813 5814 5815 5816 5817 5818 5819 5820 5821 5822 5823 5824 5825 5826 5827 5828 5829 5830 5831 5832 5833 5834 5835 5836 5837 5838 5839 5840 5841 5842 5843 5844 5845 5846 5847 5848 5849 5850 5851 5852 5853 5854 5855 5856 5857 5858 5859 5860 5861 5862 5863 5864 5865 5866 5867 5868 5869 5870 5871 5872 5873 5874 5875 5876 5877 5878 5879 5880 5881 5882 5883 5884 5885 5886 5887 5888 5889 5890 5891 5892 5893 5894 5895 5896 5897 5898 5899 5900 5901 5902 5903 5904 5905 5906 5907 5908 5909 5910 5911 5912 5913 5914 5915 5916 5917 5918 5919 5920 5921 5922 5923 5924 5925 5926 5927 5928 5929 5930 5931 5932 5933 5934 5935 5936 5937 5938 5939 5940 5941 5942 5943 5944 5945 5946 5947 5948 5949 5950 5951 5952 5953 5954 5955 5956 5957 5958 5959 5960 5961 5962 5963 5964 5965 5966 5967 5968 5969 5970 5971 5972 5973 5974 5975 5976 5977 5978 5979 5980 5981 5982 5983 5984 5985 5986 5987 5988 5989 5990 5991 5992 5993 5994 5995 5996 5997 5998 5999 6000 6001 6002 6003 6004 6005 6006 6007 6008 6009 6010 6011 6012 6013 6014 6015 6016 6017 6018 6019 6020 6021 6022 6023 6024 6025 6026 6027 6028 6029 6030 6031 6032 6033 6034 6035 6036 6037 6038 6039 6040 6041 6042 6043 6044 6045 6046 6047 6048 6049 6050 6051 6052 6053 6054 6055 6056 6057 6058 6059 6060 6061 6062 6063 6064 6065 6066 6067 6068 6069 6070 6071 6072 6073 6074 6075 6076 6077 6078 6079 6080 6081 6082 6083 6084 6085 6086 6087 6088 6089 6090 6091 6092 6093 6094 6095 6096 6097 6098 6099 6100 6101 6102 6103 6104 6105 6106 6107 6108 6109 6110 6111 6112 6113 6114 6115 6116 6117 6118 6119 6120 6121 6122 6123 6124 6125 6126 6127 6128 6129 6130 6131 6132 6133 6134 6135 6136 6137 6138 6139 6140 6141 6142 6143 6144 6145 6146 6147 6148 6149 6150 6151 6152 6153 6154 6155 6156 6157 6158 6159 6160 6161 6162 6163 6164 6165 6166 6167 6168 6169 6170 6171 6172 6173 6174 6175 6176 6177 6178 6179 6180 6181 6182 6183 6184 6185 6186 6187 6188 6189 6190 6191 6192 6193 6194 6195 6196 6197 6198 6199 6200 6201 6202 6203 6204 6205 6206 6207 6208 6209 6210 6211 6212 6213 6214 6215 6216 6217 6218 6219 6220 6221 6222 6223 6224 6225 6226 6227 6228 6229 6230 6231 6232 6233 6234 6235 6236 6237 6238 6239 6240 6241 6242 6243 6244 6245 6246 6247 6248 6249 6250 6251 6252 6253 6254 6255 6256 6257 6258 6259 6260 6261 6262 6263 6264 6265 6266 6267 6268 6269 6270 6271 6272 6273 6274 6275 | // 
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2000-2006 Silicon Graphics, Inc.
 * All Rights Reserved.
 */
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_trans_resv.h"
#include "xfs_bit.h"
#include "xfs_sb.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_dir2.h"
#include "xfs_inode.h"
#include "xfs_btree.h"
#include "xfs_trans.h"
#include "xfs_alloc.h"
#include "xfs_bmap.h"
#include "xfs_bmap_util.h"
#include "xfs_bmap_btree.h"
#include "xfs_rtbitmap.h"
#include "xfs_errortag.h"
#include "xfs_error.h"
#include "xfs_quota.h"
#include "xfs_trans_space.h"
#include "xfs_buf_item.h"
#include "xfs_trace.h"
#include "xfs_attr_leaf.h"
#include "xfs_filestream.h"
#include "xfs_rmap.h"
#include "xfs_ag.h"
#include "xfs_ag_resv.h"
#include "xfs_refcount.h"
#include "xfs_iomap.h"
#include "xfs_health.h"
#include "xfs_bmap_item.h"
#include "xfs_symlink_remote.h"
#include "xfs_inode_util.h"
#include "xfs_rtgroup.h"
#include "xfs_zone_alloc.h"

struct kmem_cache	*xfs_bmap_intent_cache;

/*
 * Miscellaneous helper functions
 */

/*
 * Compute and fill in the value of the maximum depth of a bmap btree
 * in this filesystem.  Done once, during mount.
 */
void
xfs_bmap_compute_maxlevels(
	xfs_mount_t	*mp,		/* file system mount structure */
	int		whichfork)	/* data or attr fork */
{
	uint64_t	maxblocks;	/* max blocks at this level */
	xfs_extnum_t	maxleafents;	/* max leaf entries possible */
	int		level;		/* btree level */
	int		maxrootrecs;	/* max records in root block */
	int		minleafrecs;	/* min records in leaf block */
	int		minnoderecs;	/* min records in node block */
	int		sz;		/* root block size */

	/*
	 * The maximum number of extents in a fork, hence the maximum number of
	 * leaf entries, is controlled by the size of the on-disk extent count.
	 *
	 * Note that we can no longer assume that if we are in ATTR1 that the
	 * fork offset of all the inodes will be
	 * (xfs_default_attroffset(ip) >> 3) because we could have mounted with
	 * ATTR2 and then mounted back with ATTR1, keeping the i_forkoff's
	 * fixed but probably at various positions. Therefore, for both ATTR1
	 * and ATTR2 we have to assume the worst case scenario of a minimum
	 * size available.
*/ maxleafents = xfs_iext_max_nextents(xfs_has_large_extent_counts(mp), whichfork); if (whichfork == XFS_DATA_FORK) sz = xfs_bmdr_space_calc(MINDBTPTRS); else sz = xfs_bmdr_space_calc(MINABTPTRS); maxrootrecs = xfs_bmdr_maxrecs(sz, 0); minleafrecs = mp->m_bmap_dmnr[0]; minnoderecs = mp->m_bmap_dmnr[1]; maxblocks = howmany_64(maxleafents, minleafrecs); for (level = 1; maxblocks > 1; level++) { if (maxblocks <= maxrootrecs) maxblocks = 1; else maxblocks = howmany_64(maxblocks, minnoderecs); } mp->m_bm_maxlevels[whichfork] = level; ASSERT(mp->m_bm_maxlevels[whichfork] <= xfs_bmbt_maxlevels_ondisk()); } unsigned int xfs_bmap_compute_attr_offset( struct xfs_mount *mp) { if (mp->m_sb.sb_inodesize == 256) return XFS_LITINO(mp) - xfs_bmdr_space_calc(MINABTPTRS); return xfs_bmdr_space_calc(6 * MINABTPTRS); } STATIC int /* error */ xfs_bmbt_lookup_eq( struct xfs_btree_cur *cur, struct xfs_bmbt_irec *irec, int *stat) /* success/failure */ { cur->bc_rec.b = *irec; return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat); } STATIC int /* error */ xfs_bmbt_lookup_first( struct xfs_btree_cur *cur, int *stat) /* success/failure */ { cur->bc_rec.b.br_startoff = 0; cur->bc_rec.b.br_startblock = 0; cur->bc_rec.b.br_blockcount = 0; return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat); } /* * Check if the inode needs to be converted to btree format. */ static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork) { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); return whichfork != XFS_COW_FORK && ifp->if_format == XFS_DINODE_FMT_EXTENTS && ifp->if_nextents > XFS_IFORK_MAXEXT(ip, whichfork); } /* * Check if the inode should be converted to extent format. */ static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork) { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); return whichfork != XFS_COW_FORK && ifp->if_format == XFS_DINODE_FMT_BTREE && ifp->if_nextents <= XFS_IFORK_MAXEXT(ip, whichfork); } /* * Update the record referred to by cur to the value given by irec * This either works (return 0) or gets an EFSCORRUPTED error. */ STATIC int xfs_bmbt_update( struct xfs_btree_cur *cur, struct xfs_bmbt_irec *irec) { union xfs_btree_rec rec; xfs_bmbt_disk_set_all(&rec.bmbt, irec); return xfs_btree_update(cur, &rec); } /* * Compute the worst-case number of indirect blocks that will be used * for ip's delayed extent of length "len". */ xfs_filblks_t xfs_bmap_worst_indlen( struct xfs_inode *ip, /* incore inode pointer */ xfs_filblks_t len) /* delayed extent length */ { struct xfs_mount *mp = ip->i_mount; int maxrecs = mp->m_bmap_dmxr[0]; int level; xfs_filblks_t rval; for (level = 0, rval = 0; level < XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK); level++) { len += maxrecs - 1; do_div(len, maxrecs); rval += len; if (len == 1) return rval + XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - level - 1; if (level == 0) maxrecs = mp->m_bmap_dmxr[1]; } return rval; } /* * Calculate the default attribute fork offset for newly created inodes. */ uint xfs_default_attroffset( struct xfs_inode *ip) { if (ip->i_df.if_format == XFS_DINODE_FMT_DEV) return roundup(sizeof(xfs_dev_t), 8); return M_IGEO(ip->i_mount)->attr_fork_offset; } /* * Helper routine to reset inode i_forkoff field when switching attribute fork * from local to extent format - we reset it where possible to make space * available for inline data fork extents. 
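/*
 * Editorial aside: the depth computation above can be exercised on its own.
 * A minimal userspace sketch, using made-up record counts rather than the
 * real mp->m_bmap_dmnr[] and xfs_bmdr_maxrecs() values:
 */
#include <stdint.h>
#include <stdio.h>

/* ceil(a / b), mirroring howmany_64() */
static uint64_t howmany64(uint64_t a, uint64_t b)
{
	return (a + b - 1) / b;
}

static int bmap_maxlevels(uint64_t maxleafents, uint64_t minleafrecs,
			  uint64_t minnoderecs, uint64_t maxrootrecs)
{
	uint64_t maxblocks = howmany64(maxleafents, minleafrecs);
	int level;

	for (level = 1; maxblocks > 1; level++) {
		if (maxblocks <= maxrootrecs)
			maxblocks = 1;	/* remaining blocks fit in the root */
		else
			maxblocks = howmany64(maxblocks, minnoderecs);
	}
	return level;
}

int main(void)
{
	/* hypothetical: 2^31 extents, 62 recs/leaf, 125 ptrs/node, 9 root ptrs */
	printf("max depth = %d\n", bmap_maxlevels(1ULL << 31, 62, 125, 9));
	return 0;
}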
 */
STATIC void
xfs_bmap_forkoff_reset(
	xfs_inode_t	*ip,
	int		whichfork)
{
	if (whichfork == XFS_ATTR_FORK &&
	    ip->i_df.if_format != XFS_DINODE_FMT_DEV &&
	    ip->i_df.if_format != XFS_DINODE_FMT_BTREE) {
		uint	dfl_forkoff = xfs_default_attroffset(ip) >> 3;

		if (dfl_forkoff > ip->i_forkoff)
			ip->i_forkoff = dfl_forkoff;
	}
}

static int
xfs_bmap_read_buf(
	struct xfs_mount	*mp,		/* file system mount point */
	struct xfs_trans	*tp,		/* transaction pointer */
	xfs_fsblock_t		fsbno,		/* file system block number */
	struct xfs_buf		**bpp)		/* buffer for fsbno */
{
	struct xfs_buf		*bp;		/* return value */
	int			error;

	if (!xfs_verify_fsbno(mp, fsbno))
		return -EFSCORRUPTED;
	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
			XFS_FSB_TO_DADDR(mp, fsbno), mp->m_bsize, 0, &bp,
			&xfs_bmbt_buf_ops);
	if (!error) {
		xfs_buf_set_ref(bp, XFS_BMAP_BTREE_REF);
		*bpp = bp;
	}
	return error;
}

#ifdef DEBUG
STATIC struct xfs_buf *
xfs_bmap_get_bp(
	struct xfs_btree_cur	*cur,
	xfs_fsblock_t		bno)
{
	struct xfs_log_item	*lip;
	int			i;

	if (!cur)
		return NULL;

	for (i = 0; i < cur->bc_maxlevels; i++) {
		if (!cur->bc_levels[i].bp)
			break;
		if (xfs_buf_daddr(cur->bc_levels[i].bp) == bno)
			return cur->bc_levels[i].bp;
	}

	/* Chase down all the log items to see if the bp is there */
	list_for_each_entry(lip, &cur->bc_tp->t_items, li_trans) {
		struct xfs_buf_log_item	*bip = (struct xfs_buf_log_item *)lip;

		if (bip->bli_item.li_type == XFS_LI_BUF &&
		    xfs_buf_daddr(bip->bli_buf) == bno)
			return bip->bli_buf;
	}

	return NULL;
}

STATIC void
xfs_check_block(
	struct xfs_btree_block	*block,
	xfs_mount_t		*mp,
	int			root,
	short			sz)
{
	int			i, j, dmxr;
	__be64			*pp, *thispa;	/* pointer to block address */
	xfs_bmbt_key_t		*prevp, *keyp;

	ASSERT(be16_to_cpu(block->bb_level) > 0);

	prevp = NULL;
	for (i = 1; i <= xfs_btree_get_numrecs(block); i++) {
		dmxr = mp->m_bmap_dmxr[0];
		keyp = xfs_bmbt_key_addr(mp, block, i);

		if (prevp) {
			ASSERT(be64_to_cpu(prevp->br_startoff) <
			       be64_to_cpu(keyp->br_startoff));
		}
		prevp = keyp;

		/*
		 * Compare the block numbers to see if there are dups.
		 */
		if (root)
			pp = xfs_bmap_broot_ptr_addr(mp, block, i, sz);
		else
			pp = xfs_bmbt_ptr_addr(mp, block, i, dmxr);

		for (j = i + 1; j <= be16_to_cpu(block->bb_numrecs); j++) {
			if (root)
				thispa = xfs_bmap_broot_ptr_addr(mp, block, j, sz);
			else
				thispa = xfs_bmbt_ptr_addr(mp, block, j, dmxr);
			if (*thispa == *pp) {
				xfs_warn(mp, "%s: thispa(%d) == pp(%d) %lld",
					__func__, j, i,
					(unsigned long long)be64_to_cpu(*thispa));
				xfs_err(mp, "%s: ptrs are equal in node\n",
					__func__);
				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
			}
		}
	}
}

/*
 * Check that the extents for the inode ip are in the right order in all
 * btree leaves. This becomes prohibitively expensive for large extent count
 * files, so don't bother with inodes that have more than 10,000 extents in
 * them. The btree record ordering checks will still be done, so for such large
 * bmapbt constructs that is going to catch most corruptions.
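/*
 * Editorial aside: the heart of xfs_check_block() above is two invariants,
 * "keys ascend" and "no two sibling child pointers are equal". A minimal
 * standalone sketch of those checks over plain arrays (hypothetical inputs,
 * not real on-disk btree blocks):
 */
#include <assert.h>
#include <stdint.h>

static void check_node(const uint64_t *keys, const uint64_t *ptrs, int nrecs)
{
	int i, j;

	for (i = 0; i < nrecs; i++) {
		if (i > 0)
			assert(keys[i - 1] < keys[i]);	/* ascending keys */
		for (j = i + 1; j < nrecs; j++)
			assert(ptrs[i] != ptrs[j]);	/* no duplicate ptrs */
	}
}

int main(void)
{
	uint64_t keys[] = { 10, 20, 30 };
	uint64_t ptrs[] = { 100, 200, 300 };

	check_node(keys, ptrs, 3);
	return 0;
}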
*/ STATIC void xfs_bmap_check_leaf_extents( struct xfs_btree_cur *cur, /* btree cursor or null */ xfs_inode_t *ip, /* incore inode pointer */ int whichfork) /* data or attr fork */ { struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_block *block; /* current btree block */ xfs_fsblock_t bno; /* block # of "block" */ struct xfs_buf *bp; /* buffer for "block" */ int error; /* error return value */ xfs_extnum_t i=0, j; /* index into the extents list */ int level; /* btree level, for checking */ __be64 *pp; /* pointer to block address */ xfs_bmbt_rec_t *ep; /* pointer to current extent */ xfs_bmbt_rec_t last = {0, 0}; /* last extent in prev block */ xfs_bmbt_rec_t *nextp; /* pointer to next extent */ int bp_release = 0; if (ifp->if_format != XFS_DINODE_FMT_BTREE) return; /* skip large extent count inodes */ if (ip->i_df.if_nextents > 10000) return; bno = NULLFSBLOCK; block = ifp->if_broot; /* * Root level must use BMAP_BROOT_PTR_ADDR macro to get ptr out. */ level = be16_to_cpu(block->bb_level); ASSERT(level > 0); xfs_check_block(block, mp, 1, ifp->if_broot_bytes); pp = xfs_bmap_broot_ptr_addr(mp, block, 1, ifp->if_broot_bytes); bno = be64_to_cpu(*pp); ASSERT(bno != NULLFSBLOCK); ASSERT(XFS_FSB_TO_AGNO(mp, bno) < mp->m_sb.sb_agcount); ASSERT(XFS_FSB_TO_AGBNO(mp, bno) < mp->m_sb.sb_agblocks); /* * Go down the tree until leaf level is reached, following the first * pointer (leftmost) at each level. */ while (level-- > 0) { /* See if buf is in cur first */ bp_release = 0; bp = xfs_bmap_get_bp(cur, XFS_FSB_TO_DADDR(mp, bno)); if (!bp) { bp_release = 1; error = xfs_bmap_read_buf(mp, NULL, bno, &bp); if (xfs_metadata_is_sick(error)) xfs_btree_mark_sick(cur); if (error) goto error_norelse; } block = XFS_BUF_TO_BLOCK(bp); if (level == 0) break; /* * Check this block for basic sanity (increasing keys and * no duplicate blocks). */ xfs_check_block(block, mp, 0, 0); pp = xfs_bmbt_ptr_addr(mp, block, 1, mp->m_bmap_dmxr[1]); bno = be64_to_cpu(*pp); if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, bno))) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto error0; } if (bp_release) { bp_release = 0; xfs_trans_brelse(NULL, bp); } } /* * Here with bp and block set to the leftmost leaf node in the tree. */ i = 0; /* * Loop over all leaf nodes checking that all extents are in the right order. */ for (;;) { xfs_fsblock_t nextbno; xfs_extnum_t num_recs; num_recs = xfs_btree_get_numrecs(block); /* * Read-ahead the next leaf block, if any. */ nextbno = be64_to_cpu(block->bb_u.l.bb_rightsib); /* * Check all the extents to make sure they are OK. * If we had a previous block, the last entry should * conform with the first entry in this one. */ ep = xfs_bmbt_rec_addr(mp, block, 1); if (i) { ASSERT(xfs_bmbt_disk_get_startoff(&last) + xfs_bmbt_disk_get_blockcount(&last) <= xfs_bmbt_disk_get_startoff(ep)); } for (j = 1; j < num_recs; j++) { nextp = xfs_bmbt_rec_addr(mp, block, j + 1); ASSERT(xfs_bmbt_disk_get_startoff(ep) + xfs_bmbt_disk_get_blockcount(ep) <= xfs_bmbt_disk_get_startoff(nextp)); ep = nextp; } last = *ep; i += num_recs; if (bp_release) { bp_release = 0; xfs_trans_brelse(NULL, bp); } bno = nextbno; /* * If we've reached the end, stop. 
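/*
 * Editorial aside: the leaf walk in xfs_bmap_check_leaf_extents() above
 * boils down to "extents are sorted by start offset and do not overlap",
 * including across block boundaries. A minimal sketch of that predicate,
 * assuming a simplified in-memory extent struct instead of the on-disk
 * xfs_bmbt_rec_t:
 */
#include <stdbool.h>
#include <stdint.h>

struct ext {
	uint64_t startoff;	/* file offset, in blocks */
	uint64_t blockcount;	/* length, in blocks */
};

static bool extents_ordered(const struct ext *e, int n)
{
	int i;

	for (i = 1; i < n; i++) {
		/* previous extent must end at or before the next one starts */
		if (e[i - 1].startoff + e[i - 1].blockcount > e[i].startoff)
			return false;
	}
	return true;
}

int main(void)
{
	struct ext e[] = { { 0, 4 }, { 4, 2 }, { 10, 1 } };

	return extents_ordered(e, 3) ? 0 : 1;
}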
*/ if (bno == NULLFSBLOCK) break; bp_release = 0; bp = xfs_bmap_get_bp(cur, XFS_FSB_TO_DADDR(mp, bno)); if (!bp) { bp_release = 1; error = xfs_bmap_read_buf(mp, NULL, bno, &bp); if (xfs_metadata_is_sick(error)) xfs_btree_mark_sick(cur); if (error) goto error_norelse; } block = XFS_BUF_TO_BLOCK(bp); } return; error0: xfs_warn(mp, "%s: at error0", __func__); if (bp_release) xfs_trans_brelse(NULL, bp); error_norelse: xfs_warn(mp, "%s: BAD after btree leaves for %llu extents", __func__, i); xfs_err(mp, "%s: CORRUPTED BTREE OR SOMETHING", __func__); xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); return; } /* * Validate that the bmbt_irecs being returned from bmapi are valid * given the caller's original parameters. Specifically check the * ranges of the returned irecs to ensure that they only extend beyond * the given parameters if the XFS_BMAPI_ENTIRE flag was set. */ STATIC void xfs_bmap_validate_ret( xfs_fileoff_t bno, xfs_filblks_t len, uint32_t flags, xfs_bmbt_irec_t *mval, int nmap, int ret_nmap) { int i; /* index to map values */ ASSERT(ret_nmap <= nmap); for (i = 0; i < ret_nmap; i++) { ASSERT(mval[i].br_blockcount > 0); if (!(flags & XFS_BMAPI_ENTIRE)) { ASSERT(mval[i].br_startoff >= bno); ASSERT(mval[i].br_blockcount <= len); ASSERT(mval[i].br_startoff + mval[i].br_blockcount <= bno + len); } else { ASSERT(mval[i].br_startoff < bno + len); ASSERT(mval[i].br_startoff + mval[i].br_blockcount > bno); } ASSERT(i == 0 || mval[i - 1].br_startoff + mval[i - 1].br_blockcount == mval[i].br_startoff); ASSERT(mval[i].br_startblock != DELAYSTARTBLOCK && mval[i].br_startblock != HOLESTARTBLOCK); ASSERT(mval[i].br_state == XFS_EXT_NORM || mval[i].br_state == XFS_EXT_UNWRITTEN); } } #else #define xfs_bmap_check_leaf_extents(cur, ip, whichfork) do { } while (0) #define xfs_bmap_validate_ret(bno,len,flags,mval,onmap,nmap) do { } while (0) #endif /* DEBUG */ /* * Inode fork format manipulation functions */ /* * Convert the inode format to extent format if it currently is in btree format, * but the extent list is small enough that it fits into the extent format. * * Since the extents are already in-core, all we have to do is give up the space * for the btree root and pitch the leaf block. 
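/*
 * Editorial aside: the XFS_BMAPI_ENTIRE checks in xfs_bmap_validate_ret()
 * above encode two range disciplines. Without the flag, every returned
 * mapping must lie inside [bno, bno + len); with it, a mapping only has to
 * overlap that range. A minimal sketch of the predicate:
 */
#include <stdbool.h>
#include <stdint.h>

static bool mapping_in_bounds(uint64_t startoff, uint64_t blockcount,
			      uint64_t bno, uint64_t len, bool entire)
{
	if (!entire)
		return startoff >= bno &&
		       startoff + blockcount <= bno + len;
	/* XFS_BMAPI_ENTIRE: mere overlap with the request is enough */
	return startoff < bno + len && startoff + blockcount > bno;
}

int main(void)
{
	return mapping_in_bounds(5, 3, 0, 10, false) ? 0 : 1;
}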
*/ STATIC int /* error */ xfs_bmap_btree_to_extents( struct xfs_trans *tp, /* transaction pointer */ struct xfs_inode *ip, /* incore inode pointer */ struct xfs_btree_cur *cur, /* btree cursor */ int *logflagsp, /* inode logging flags */ int whichfork) /* data or attr fork */ { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_mount *mp = ip->i_mount; struct xfs_btree_block *rblock = ifp->if_broot; struct xfs_btree_block *cblock;/* child btree block */ xfs_fsblock_t cbno; /* child block number */ struct xfs_buf *cbp; /* child block's buffer */ int error; /* error return value */ __be64 *pp; /* ptr to block address */ struct xfs_owner_info oinfo; /* check if we actually need the extent format first: */ if (!xfs_bmap_wants_extents(ip, whichfork)) return 0; ASSERT(cur); ASSERT(whichfork != XFS_COW_FORK); ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE); ASSERT(be16_to_cpu(rblock->bb_level) == 1); ASSERT(be16_to_cpu(rblock->bb_numrecs) == 1); ASSERT(xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, false) == 1); pp = xfs_bmap_broot_ptr_addr(mp, rblock, 1, ifp->if_broot_bytes); cbno = be64_to_cpu(*pp); #ifdef DEBUG if (XFS_IS_CORRUPT(cur->bc_mp, !xfs_verify_fsbno(mp, cbno))) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } #endif error = xfs_bmap_read_buf(mp, tp, cbno, &cbp); if (xfs_metadata_is_sick(error)) xfs_btree_mark_sick(cur); if (error) return error; cblock = XFS_BUF_TO_BLOCK(cbp); if ((error = xfs_btree_check_block(cur, cblock, 0, cbp))) return error; xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork); error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo, XFS_AG_RESV_NONE, 0); if (error) return error; ip->i_nblocks--; xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); xfs_trans_binval(tp, cbp); if (cur->bc_levels[0].bp == cbp) cur->bc_levels[0].bp = NULL; xfs_bmap_broot_realloc(ip, whichfork, 0); ASSERT(ifp->if_broot == NULL); ifp->if_format = XFS_DINODE_FMT_EXTENTS; *logflagsp |= XFS_ILOG_CORE | xfs_ilog_fext(whichfork); return 0; } /* * Convert an extents-format file into a btree-format file. * The new file will have a root block (in the inode) and a single child block. */ STATIC int /* error */ xfs_bmap_extents_to_btree( struct xfs_trans *tp, /* transaction pointer */ struct xfs_inode *ip, /* incore inode pointer */ struct xfs_btree_cur **curp, /* cursor returned to caller */ int wasdel, /* converting a delayed alloc */ int *logflagsp, /* inode logging flags */ int whichfork) /* data or attr fork */ { struct xfs_btree_block *ablock; /* allocated (child) bt block */ struct xfs_buf *abp; /* buffer for ablock */ struct xfs_alloc_arg args; /* allocation arguments */ struct xfs_bmbt_rec *arp; /* child record pointer */ struct xfs_btree_block *block; /* btree root block */ struct xfs_btree_cur *cur; /* bmap btree cursor */ int error; /* error return value */ struct xfs_ifork *ifp; /* inode fork pointer */ struct xfs_bmbt_key *kp; /* root block key pointer */ struct xfs_mount *mp; /* mount structure */ xfs_bmbt_ptr_t *pp; /* root block address pointer */ struct xfs_iext_cursor icur; struct xfs_bmbt_irec rec; xfs_extnum_t cnt = 0; mp = ip->i_mount; ASSERT(whichfork != XFS_COW_FORK); ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(ifp->if_format == XFS_DINODE_FMT_EXTENTS); /* * Make space in the inode incore. This needs to be undone if we fail * to expand the root. */ block = xfs_bmap_broot_realloc(ip, whichfork, 1); /* * Fill in the root. */ xfs_bmbt_init_block(ip, block, NULL, 1, 1); /* * Need a cursor. Can't allocate until bb_level is filled in. 
*/ cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); if (wasdel) cur->bc_flags |= XFS_BTREE_BMBT_WASDEL; /* * Convert to a btree with two levels, one record in root. */ ifp->if_format = XFS_DINODE_FMT_BTREE; memset(&args, 0, sizeof(args)); args.tp = tp; args.mp = mp; xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, whichfork); args.minlen = args.maxlen = args.prod = 1; args.wasdel = wasdel; *logflagsp = 0; error = xfs_alloc_vextent_start_ag(&args, XFS_INO_TO_FSB(mp, ip->i_ino)); if (error) goto out_root_realloc; /* * Allocation can't fail, the space was reserved. */ if (WARN_ON_ONCE(args.fsbno == NULLFSBLOCK)) { error = -ENOSPC; goto out_root_realloc; } cur->bc_bmap.allocated++; ip->i_nblocks++; xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, 1L); error = xfs_trans_get_buf(tp, mp->m_ddev_targp, XFS_FSB_TO_DADDR(mp, args.fsbno), mp->m_bsize, 0, &abp); if (error) goto out_unreserve_dquot; /* * Fill in the child block. */ ablock = XFS_BUF_TO_BLOCK(abp); xfs_bmbt_init_block(ip, ablock, abp, 0, 0); for_each_xfs_iext(ifp, &icur, &rec) { if (isnullstartblock(rec.br_startblock)) continue; arp = xfs_bmbt_rec_addr(mp, ablock, 1 + cnt); xfs_bmbt_disk_set_all(arp, &rec); cnt++; } ASSERT(cnt == ifp->if_nextents); xfs_btree_set_numrecs(ablock, cnt); /* * Fill in the root key and pointer. */ kp = xfs_bmbt_key_addr(mp, block, 1); arp = xfs_bmbt_rec_addr(mp, ablock, 1); kp->br_startoff = cpu_to_be64(xfs_bmbt_disk_get_startoff(arp)); pp = xfs_bmbt_ptr_addr(mp, block, 1, xfs_bmbt_get_maxrecs(cur, be16_to_cpu(block->bb_level))); *pp = cpu_to_be64(args.fsbno); /* * Do all this logging at the end so that * the root is at the right level. */ xfs_btree_log_block(cur, abp, XFS_BB_ALL_BITS); xfs_btree_log_recs(cur, abp, 1, be16_to_cpu(ablock->bb_numrecs)); ASSERT(*curp == NULL); *curp = cur; *logflagsp = XFS_ILOG_CORE | xfs_ilog_fbroot(whichfork); return 0; out_unreserve_dquot: xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); out_root_realloc: xfs_bmap_broot_realloc(ip, whichfork, 0); ifp->if_format = XFS_DINODE_FMT_EXTENTS; ASSERT(ifp->if_broot == NULL); xfs_btree_del_cursor(cur, XFS_BTREE_ERROR); return error; } /* * Convert a local file to an extents file. * This code is out of bounds for data forks of regular files, * since the file data needs to get logged so things will stay consistent. * (The bmap-level manipulations are ok, though). */ void xfs_bmap_local_to_extents_empty( struct xfs_trans *tp, struct xfs_inode *ip, int whichfork) { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(whichfork != XFS_COW_FORK); ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL); ASSERT(ifp->if_bytes == 0); ASSERT(ifp->if_nextents == 0); xfs_bmap_forkoff_reset(ip, whichfork); ifp->if_data = NULL; ifp->if_height = 0; ifp->if_format = XFS_DINODE_FMT_EXTENTS; xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); } int /* error */ xfs_bmap_local_to_extents( xfs_trans_t *tp, /* transaction pointer */ xfs_inode_t *ip, /* incore inode pointer */ xfs_extlen_t total, /* total blocks needed by transaction */ int *logflagsp, /* inode logging flags */ int whichfork, void (*init_fn)(struct xfs_trans *tp, struct xfs_buf *bp, struct xfs_inode *ip, struct xfs_ifork *ifp, void *priv), void *priv) { int error = 0; int flags; /* logging flags returned */ struct xfs_ifork *ifp; /* inode fork pointer */ xfs_alloc_arg_t args; /* allocation arguments */ struct xfs_buf *bp; /* buffer for extent block */ struct xfs_bmbt_irec rec; struct xfs_iext_cursor icur; /* * We don't want to deal with the case of keeping inode data inline yet. 
* So sending the data fork of a regular inode is invalid. */ ASSERT(!(S_ISREG(VFS_I(ip)->i_mode) && whichfork == XFS_DATA_FORK)); ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(ifp->if_format == XFS_DINODE_FMT_LOCAL); if (!ifp->if_bytes) { xfs_bmap_local_to_extents_empty(tp, ip, whichfork); flags = XFS_ILOG_CORE; goto done; } flags = 0; error = 0; memset(&args, 0, sizeof(args)); args.tp = tp; args.mp = ip->i_mount; args.total = total; args.minlen = args.maxlen = args.prod = 1; xfs_rmap_ino_owner(&args.oinfo, ip->i_ino, whichfork, 0); /* * Allocate a block. We know we need only one, since the * file currently fits in an inode. */ args.total = total; args.minlen = args.maxlen = args.prod = 1; error = xfs_alloc_vextent_start_ag(&args, XFS_INO_TO_FSB(args.mp, ip->i_ino)); if (error) goto done; /* Can't fail, the space was reserved. */ ASSERT(args.fsbno != NULLFSBLOCK); ASSERT(args.len == 1); error = xfs_trans_get_buf(tp, args.mp->m_ddev_targp, XFS_FSB_TO_DADDR(args.mp, args.fsbno), args.mp->m_bsize, 0, &bp); if (error) goto done; /* * Initialize the block, copy the data and log the remote buffer. * * The callout is responsible for logging because the remote format * might differ from the local format and thus we don't know how much to * log here. Note that init_fn must also set the buffer log item type * correctly. */ init_fn(tp, bp, ip, ifp, priv); /* account for the change in fork size */ xfs_idata_realloc(ip, -ifp->if_bytes, whichfork); xfs_bmap_local_to_extents_empty(tp, ip, whichfork); flags |= XFS_ILOG_CORE; ifp->if_data = NULL; ifp->if_height = 0; rec.br_startoff = 0; rec.br_startblock = args.fsbno; rec.br_blockcount = 1; rec.br_state = XFS_EXT_NORM; xfs_iext_first(ifp, &icur); xfs_iext_insert(ip, &icur, &rec, 0); ifp->if_nextents = 1; ip->i_nblocks = 1; xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, 1L); flags |= xfs_ilog_fext(whichfork); done: *logflagsp = flags; return error; } /* * Called from xfs_bmap_add_attrfork to handle btree format files. */ STATIC int /* error */ xfs_bmap_add_attrfork_btree( xfs_trans_t *tp, /* transaction pointer */ xfs_inode_t *ip, /* incore inode pointer */ int *flags) /* inode logging flags */ { struct xfs_btree_block *block = ip->i_df.if_broot; struct xfs_btree_cur *cur; /* btree cursor */ int error; /* error return value */ xfs_mount_t *mp; /* file system mount struct */ int stat; /* newroot status */ mp = ip->i_mount; if (xfs_bmap_bmdr_space(block) <= xfs_inode_data_fork_size(ip)) *flags |= XFS_ILOG_DBROOT; else { cur = xfs_bmbt_init_cursor(mp, tp, ip, XFS_DATA_FORK); error = xfs_bmbt_lookup_first(cur, &stat); if (error) goto error0; /* must be at least one entry */ if (XFS_IS_CORRUPT(mp, stat != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto error0; } if ((error = xfs_btree_new_iroot(cur, flags, &stat))) goto error0; if (stat == 0) { xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR); return -ENOSPC; } cur->bc_bmap.allocated = 0; xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR); } return 0; error0: xfs_btree_del_cursor(cur, XFS_BTREE_ERROR); return error; } /* * Called from xfs_bmap_add_attrfork to handle extents format files. 
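/*
 * Editorial aside: xfs_bmap_add_attrfork_extents(), just below, converts to
 * btree format only when the in-core extent records no longer fit in the
 * inode's data fork area. The threshold test in isolation (the on-disk
 * struct xfs_bmbt_rec is 16 bytes; the fork size here is a made-up example
 * value):
 */
#include <stdbool.h>
#include <stdint.h>

static bool extents_fit_inline(uint64_t nextents, unsigned int rec_size,
			       unsigned int fork_bytes)
{
	return nextents * rec_size <= fork_bytes;
}

int main(void)
{
	/* a 256-byte fork area holds at most 16 records of 16 bytes */
	return extents_fit_inline(16, 16, 256) ? 0 : 1;
}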
 */
STATIC int					/* error */
xfs_bmap_add_attrfork_extents(
	struct xfs_trans	*tp,		/* transaction pointer */
	struct xfs_inode	*ip,		/* incore inode pointer */
	int			*flags)		/* inode logging flags */
{
	struct xfs_btree_cur	*cur;		/* bmap btree cursor */
	int			error;		/* error return value */

	if (ip->i_df.if_nextents * sizeof(struct xfs_bmbt_rec) <=
	    xfs_inode_data_fork_size(ip))
		return 0;
	cur = NULL;
	error = xfs_bmap_extents_to_btree(tp, ip, &cur, 0, flags,
					  XFS_DATA_FORK);
	if (cur) {
		cur->bc_bmap.allocated = 0;
		xfs_btree_del_cursor(cur, error);
	}
	return error;
}

/*
 * Called from xfs_bmap_add_attrfork to handle local format files. Each
 * different data fork content type needs a different callout to do the
 * conversion. Some are basic and only require special block initialisation
 * callouts for the data formatting, others (directories) are so specialised
 * they handle everything themselves.
 *
 * XXX (dgc): investigate whether directory conversion can use the generic
 * formatting callout. It should be possible - it's just a very complex
 * formatter.
 */
STATIC int					/* error */
xfs_bmap_add_attrfork_local(
	struct xfs_trans	*tp,		/* transaction pointer */
	struct xfs_inode	*ip,		/* incore inode pointer */
	int			*flags)		/* inode logging flags */
{
	struct xfs_da_args	dargs;		/* args for dir/attr code */

	if (ip->i_df.if_bytes <= xfs_inode_data_fork_size(ip))
		return 0;

	if (S_ISDIR(VFS_I(ip)->i_mode)) {
		memset(&dargs, 0, sizeof(dargs));
		dargs.geo = ip->i_mount->m_dir_geo;
		dargs.dp = ip;
		dargs.total = dargs.geo->fsbcount;
		dargs.whichfork = XFS_DATA_FORK;
		dargs.trans = tp;
		dargs.owner = ip->i_ino;
		return xfs_dir2_sf_to_block(&dargs);
	}

	if (S_ISLNK(VFS_I(ip)->i_mode))
		return xfs_bmap_local_to_extents(tp, ip, 1, flags,
						 XFS_DATA_FORK,
						 xfs_symlink_local_to_remote,
						 NULL);

	/* should only be called for types that support local format data */
	ASSERT(0);
	xfs_bmap_mark_sick(ip, XFS_ATTR_FORK);
	return -EFSCORRUPTED;
}

/*
 * Set an inode attr fork offset based on the format of the data fork.
 */
static int
xfs_bmap_set_attrforkoff(
	struct xfs_inode	*ip,
	int			size,
	int			*version)
{
	int			default_size = xfs_default_attroffset(ip) >> 3;

	switch (ip->i_df.if_format) {
	case XFS_DINODE_FMT_DEV:
		ip->i_forkoff = default_size;
		break;
	case XFS_DINODE_FMT_LOCAL:
	case XFS_DINODE_FMT_EXTENTS:
	case XFS_DINODE_FMT_BTREE:
		ip->i_forkoff = xfs_attr_shortform_bytesfit(ip, size);
		if (!ip->i_forkoff)
			ip->i_forkoff = default_size;
		else if (xfs_has_attr2(ip->i_mount) && version)
			*version = 2;
		break;
	default:
		ASSERT(0);
		return -EINVAL;
	}

	return 0;
}

/*
 * Convert inode from non-attributed to attributed. Caller must hold the
 * ILOCK_EXCL and the file cannot have an attr fork.
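/*
 * Editorial aside: i_forkoff, as set by xfs_bmap_set_attrforkoff() above, is
 * stored in 8-byte units, which is why the default byte offset is shifted
 * right by 3 before use. A minimal sketch of the unit conversion:
 */
#include <stdint.h>

/* byte offset of the attr fork -> i_forkoff units, and back */
static inline uint8_t forkoff_from_bytes(unsigned int attroffset_bytes)
{
	return attroffset_bytes >> 3;
}

static inline unsigned int forkoff_to_bytes(uint8_t forkoff)
{
	return (unsigned int)forkoff << 3;
}

int main(void)
{
	return forkoff_to_bytes(forkoff_from_bytes(120)) == 120 ? 0 : 1;
}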
*/ int /* error code */ xfs_bmap_add_attrfork( struct xfs_trans *tp, struct xfs_inode *ip, /* incore inode pointer */ int size, /* space new attribute needs */ int rsvd) /* xact may use reserved blks */ { struct xfs_mount *mp = tp->t_mountp; int version = 1; /* superblock attr version */ int logflags; /* logging flags */ int error; /* error return value */ xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); if (!xfs_is_metadir_inode(ip)) ASSERT(!XFS_NOT_DQATTACHED(mp, ip)); ASSERT(!xfs_inode_has_attr_fork(ip)); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); error = xfs_bmap_set_attrforkoff(ip, size, &version); if (error) return error; xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0); logflags = 0; switch (ip->i_df.if_format) { case XFS_DINODE_FMT_LOCAL: error = xfs_bmap_add_attrfork_local(tp, ip, &logflags); break; case XFS_DINODE_FMT_EXTENTS: error = xfs_bmap_add_attrfork_extents(tp, ip, &logflags); break; case XFS_DINODE_FMT_BTREE: error = xfs_bmap_add_attrfork_btree(tp, ip, &logflags); break; default: error = 0; break; } if (logflags) xfs_trans_log_inode(tp, ip, logflags); if (error) return error; if (!xfs_has_attr(mp) || (!xfs_has_attr2(mp) && version == 2)) { bool log_sb = false; spin_lock(&mp->m_sb_lock); if (!xfs_has_attr(mp)) { xfs_add_attr(mp); log_sb = true; } if (!xfs_has_attr2(mp) && version == 2) { xfs_add_attr2(mp); log_sb = true; } spin_unlock(&mp->m_sb_lock); if (log_sb) xfs_log_sb(tp); } return 0; } /* * Internal and external extent tree search functions. */ struct xfs_iread_state { struct xfs_iext_cursor icur; xfs_extnum_t loaded; }; int xfs_bmap_complain_bad_rec( struct xfs_inode *ip, int whichfork, xfs_failaddr_t fa, const struct xfs_bmbt_irec *irec) { struct xfs_mount *mp = ip->i_mount; const char *forkname; switch (whichfork) { case XFS_DATA_FORK: forkname = "data"; break; case XFS_ATTR_FORK: forkname = "attr"; break; case XFS_COW_FORK: forkname = "CoW"; break; default: forkname = "???"; break; } xfs_warn(mp, "Bmap BTree record corruption in inode 0x%llx %s fork detected at %pS!", ip->i_ino, forkname, fa); xfs_warn(mp, "Offset 0x%llx, start block 0x%llx, block count 0x%llx state 0x%x", irec->br_startoff, irec->br_startblock, irec->br_blockcount, irec->br_state); return -EFSCORRUPTED; } /* Stuff every bmbt record from this block into the incore extent map. */ static int xfs_iread_bmbt_block( struct xfs_btree_cur *cur, int level, void *priv) { struct xfs_iread_state *ir = priv; struct xfs_mount *mp = cur->bc_mp; struct xfs_inode *ip = cur->bc_ino.ip; struct xfs_btree_block *block; struct xfs_buf *bp; struct xfs_bmbt_rec *frp; xfs_extnum_t num_recs; xfs_extnum_t j; int whichfork = cur->bc_ino.whichfork; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); block = xfs_btree_get_block(cur, level, &bp); /* Abort if we find more records than nextents. */ num_recs = xfs_btree_get_numrecs(block); if (unlikely(ir->loaded + num_recs > ifp->if_nextents)) { xfs_warn(ip->i_mount, "corrupt dinode %llu, (btree extents).", (unsigned long long)ip->i_ino); xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, block, sizeof(*block), __this_address); xfs_bmap_mark_sick(ip, whichfork); return -EFSCORRUPTED; } /* Copy records into the incore cache. 
*/ frp = xfs_bmbt_rec_addr(mp, block, 1); for (j = 0; j < num_recs; j++, frp++, ir->loaded++) { struct xfs_bmbt_irec new; xfs_failaddr_t fa; xfs_bmbt_disk_get_all(frp, &new); fa = xfs_bmap_validate_extent(ip, whichfork, &new); if (fa) { xfs_inode_verifier_error(ip, -EFSCORRUPTED, "xfs_iread_extents(2)", frp, sizeof(*frp), fa); xfs_bmap_mark_sick(ip, whichfork); return xfs_bmap_complain_bad_rec(ip, whichfork, fa, &new); } xfs_iext_insert(ip, &ir->icur, &new, xfs_bmap_fork_to_state(whichfork)); trace_xfs_read_extent(ip, &ir->icur, xfs_bmap_fork_to_state(whichfork), _THIS_IP_); xfs_iext_next(ifp, &ir->icur); } return 0; } /* * Read in extents from a btree-format inode. */ int xfs_iread_extents( struct xfs_trans *tp, struct xfs_inode *ip, int whichfork) { struct xfs_iread_state ir; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_mount *mp = ip->i_mount; struct xfs_btree_cur *cur; int error; if (!xfs_need_iread_extents(ifp)) return 0; xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); ir.loaded = 0; xfs_iext_first(ifp, &ir.icur); cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); error = xfs_btree_visit_blocks(cur, xfs_iread_bmbt_block, XFS_BTREE_VISIT_RECORDS, &ir); xfs_btree_del_cursor(cur, error); if (error) goto out; if (XFS_IS_CORRUPT(mp, ir.loaded != ifp->if_nextents)) { xfs_bmap_mark_sick(ip, whichfork); error = -EFSCORRUPTED; goto out; } ASSERT(ir.loaded == xfs_iext_count(ifp)); /* * Use release semantics so that we can use acquire semantics in * xfs_need_iread_extents and be guaranteed to see a valid mapping tree * after that load. */ smp_store_release(&ifp->if_needextents, 0); return 0; out: if (xfs_metadata_is_sick(error)) xfs_bmap_mark_sick(ip, whichfork); xfs_iext_destroy(ifp); return error; } /* * Returns the relative block number of the first unused block(s) in the given * fork with at least "len" logically contiguous blocks free. This is the * lowest-address hole if the fork has holes, else the first block past the end * of fork. Return 0 if the fork is currently local (in-inode). */ int /* error */ xfs_bmap_first_unused( struct xfs_trans *tp, /* transaction pointer */ struct xfs_inode *ip, /* incore inode */ xfs_extlen_t len, /* size of hole to find */ xfs_fileoff_t *first_unused, /* unused block */ int whichfork) /* data or attr fork */ { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec got; struct xfs_iext_cursor icur; xfs_fileoff_t lastaddr = 0; xfs_fileoff_t lowest, max; int error; if (ifp->if_format == XFS_DINODE_FMT_LOCAL) { *first_unused = 0; return 0; } ASSERT(xfs_ifork_has_extents(ifp)); error = xfs_iread_extents(tp, ip, whichfork); if (error) return error; lowest = max = *first_unused; for_each_xfs_iext(ifp, &icur, &got) { /* * See if the hole before this extent will work. */ if (got.br_startoff >= lowest + len && got.br_startoff - max >= len) break; lastaddr = got.br_startoff + got.br_blockcount; max = XFS_FILEOFF_MAX(lastaddr, lowest); } *first_unused = max; return 0; } /* * Returns the file-relative block number of the last block - 1 before * last_block (input value) in the file. * This is not based on i_size, it is based on the extent records. * Returns 0 for local files, as they do not have extent records. 
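/*
 * Editorial aside: the hole search in xfs_bmap_first_unused() above scans
 * the sorted extent list for the first gap of at least "len" blocks. The
 * same logic over a plain array (simplified types, not the real iext cursor
 * machinery; lowest/max start at the caller's minimum offset, 0 here):
 */
#include <stdint.h>

struct uext {
	uint64_t startoff;
	uint64_t blockcount;
};

static uint64_t first_unused(const struct uext *e, int n, uint64_t len)
{
	uint64_t lowest = 0, max = 0, lastaddr = 0;
	int i;

	for (i = 0; i < n; i++) {
		/* does the hole before this extent fit "len" blocks? */
		if (e[i].startoff >= lowest + len &&
		    e[i].startoff - max >= len)
			break;
		lastaddr = e[i].startoff + e[i].blockcount;
		max = lastaddr > lowest ? lastaddr : lowest;
	}
	return max;	/* first block of the hole (or past EOF) */
}

int main(void)
{
	struct uext e[] = { { 0, 4 }, { 8, 2 } };

	/* the 4-block hole at [4, 8) should be found */
	return first_unused(e, 2, 4) == 4 ? 0 : 1;
}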
 */
int						/* error */
xfs_bmap_last_before(
	struct xfs_trans	*tp,		/* transaction pointer */
	struct xfs_inode	*ip,		/* incore inode */
	xfs_fileoff_t		*last_block,	/* last block */
	int			whichfork)	/* data or attr fork */
{
	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
	struct xfs_bmbt_irec	got;
	struct xfs_iext_cursor	icur;
	int			error;

	switch (ifp->if_format) {
	case XFS_DINODE_FMT_LOCAL:
		*last_block = 0;
		return 0;
	case XFS_DINODE_FMT_BTREE:
	case XFS_DINODE_FMT_EXTENTS:
		break;
	default:
		ASSERT(0);
		xfs_bmap_mark_sick(ip, whichfork);
		return -EFSCORRUPTED;
	}

	error = xfs_iread_extents(tp, ip, whichfork);
	if (error)
		return error;

	if (!xfs_iext_lookup_extent_before(ip, ifp, last_block, &icur, &got))
		*last_block = 0;
	return 0;
}

int
xfs_bmap_last_extent(
	struct xfs_trans	*tp,
	struct xfs_inode	*ip,
	int			whichfork,
	struct xfs_bmbt_irec	*rec,
	int			*is_empty)
{
	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
	struct xfs_iext_cursor	icur;
	int			error;

	error = xfs_iread_extents(tp, ip, whichfork);
	if (error)
		return error;

	xfs_iext_last(ifp, &icur);
	if (!xfs_iext_get_extent(ifp, &icur, rec))
		*is_empty = 1;
	else
		*is_empty = 0;
	return 0;
}

/*
 * Check the last inode extent to determine whether this allocation will result
 * in blocks being allocated at the end of the file. When we allocate new data
 * blocks at the end of the file which do not start at the previous data block,
 * we will try to align the new blocks at stripe unit boundaries.
 *
 * Returns 1 in bma->aeof if the file (fork) is empty as any new write will be
 * at, or past the EOF.
 */
STATIC int
xfs_bmap_isaeof(
	struct xfs_bmalloca	*bma,
	int			whichfork)
{
	struct xfs_bmbt_irec	rec;
	int			is_empty;
	int			error;

	bma->aeof = false;
	error = xfs_bmap_last_extent(NULL, bma->ip, whichfork, &rec,
				     &is_empty);
	if (error)
		return error;

	if (is_empty) {
		bma->aeof = true;
		return 0;
	}

	/*
	 * Check if we are allocating at or past the last extent, or at least
	 * into the last delayed allocated extent.
	 */
	bma->aeof = bma->offset >= rec.br_startoff + rec.br_blockcount ||
		(bma->offset >= rec.br_startoff &&
		 isnullstartblock(rec.br_startblock));
	return 0;
}

/*
 * Returns the file-relative block number of the first block past eof in
 * the file. This is not based on i_size, it is based on the extent records.
 * Returns 0 for local files, as they do not have extent records.
 */
int
xfs_bmap_last_offset(
	struct xfs_inode	*ip,
	xfs_fileoff_t		*last_block,
	int			whichfork)
{
	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
	struct xfs_bmbt_irec	rec;
	int			is_empty;
	int			error;

	*last_block = 0;

	if (ifp->if_format == XFS_DINODE_FMT_LOCAL)
		return 0;

	if (XFS_IS_CORRUPT(ip->i_mount, !xfs_ifork_has_extents(ifp))) {
		xfs_bmap_mark_sick(ip, whichfork);
		return -EFSCORRUPTED;
	}

	error = xfs_bmap_last_extent(NULL, ip, whichfork, &rec, &is_empty);
	if (error || is_empty)
		return error;

	*last_block = rec.br_startoff + rec.br_blockcount;
	return 0;
}

/*
 * Extent tree manipulation functions used during allocation.
 */

static inline bool
xfs_bmap_same_rtgroup(
	struct xfs_inode	*ip,
	int			whichfork,
	struct xfs_bmbt_irec	*left,
	struct xfs_bmbt_irec	*right)
{
	struct xfs_mount	*mp = ip->i_mount;

	if (xfs_ifork_is_realtime(ip, whichfork) && xfs_has_rtgroups(mp)) {
		if (xfs_rtb_to_rgno(mp, left->br_startblock) !=
		    xfs_rtb_to_rgno(mp, right->br_startblock))
			return false;
	}

	return true;
}

/*
 * Convert a delayed allocation to a real allocation.
*/ STATIC int /* error */ xfs_bmap_add_extent_delay_real( struct xfs_bmalloca *bma, int whichfork) { struct xfs_mount *mp = bma->ip->i_mount; struct xfs_ifork *ifp = xfs_ifork_ptr(bma->ip, whichfork); struct xfs_bmbt_irec *new = &bma->got; int error; /* error return value */ int i; /* temp state */ xfs_fileoff_t new_endoff; /* end offset of new entry */ xfs_bmbt_irec_t r[3]; /* neighbor extent entries */ /* left is 0, right is 1, prev is 2 */ int rval=0; /* return value (logging flags) */ uint32_t state = xfs_bmap_fork_to_state(whichfork); xfs_filblks_t da_new; /* new count del alloc blocks used */ xfs_filblks_t da_old; /* old count del alloc blocks used */ xfs_filblks_t temp=0; /* value for da_new calculations */ int tmp_rval; /* partial logging flags */ struct xfs_bmbt_irec old; ASSERT(whichfork != XFS_ATTR_FORK); ASSERT(!isnullstartblock(new->br_startblock)); ASSERT(!bma->cur || (bma->cur->bc_flags & XFS_BTREE_BMBT_WASDEL)); XFS_STATS_INC(mp, xs_add_exlist); #define LEFT r[0] #define RIGHT r[1] #define PREV r[2] /* * Set up a bunch of variables to make the tests simpler. */ xfs_iext_get_extent(ifp, &bma->icur, &PREV); new_endoff = new->br_startoff + new->br_blockcount; ASSERT(isnullstartblock(PREV.br_startblock)); ASSERT(PREV.br_startoff <= new->br_startoff); ASSERT(PREV.br_startoff + PREV.br_blockcount >= new_endoff); da_old = startblockval(PREV.br_startblock); da_new = 0; /* * Set flags determining what part of the previous delayed allocation * extent is being replaced by a real allocation. */ if (PREV.br_startoff == new->br_startoff) state |= BMAP_LEFT_FILLING; if (PREV.br_startoff + PREV.br_blockcount == new_endoff) state |= BMAP_RIGHT_FILLING; /* * Check and set flags if this segment has a left neighbor. * Don't set contiguous if the combined extent would be too large. */ if (xfs_iext_peek_prev_extent(ifp, &bma->icur, &LEFT)) { state |= BMAP_LEFT_VALID; if (isnullstartblock(LEFT.br_startblock)) state |= BMAP_LEFT_DELAY; } if ((state & BMAP_LEFT_VALID) && !(state & BMAP_LEFT_DELAY) && LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff && LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock && LEFT.br_state == new->br_state && LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN && xfs_bmap_same_rtgroup(bma->ip, whichfork, &LEFT, new)) state |= BMAP_LEFT_CONTIG; /* * Check and set flags if this segment has a right neighbor. * Don't set contiguous if the combined extent would be too large. * Also check for all-three-contiguous being too large. */ if (xfs_iext_peek_next_extent(ifp, &bma->icur, &RIGHT)) { state |= BMAP_RIGHT_VALID; if (isnullstartblock(RIGHT.br_startblock)) state |= BMAP_RIGHT_DELAY; } if ((state & BMAP_RIGHT_VALID) && !(state & BMAP_RIGHT_DELAY) && new_endoff == RIGHT.br_startoff && new->br_startblock + new->br_blockcount == RIGHT.br_startblock && new->br_state == RIGHT.br_state && new->br_blockcount + RIGHT.br_blockcount <= XFS_MAX_BMBT_EXTLEN && ((state & (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING)) != (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING) || LEFT.br_blockcount + new->br_blockcount + RIGHT.br_blockcount <= XFS_MAX_BMBT_EXTLEN) && xfs_bmap_same_rtgroup(bma->ip, whichfork, new, &RIGHT)) state |= BMAP_RIGHT_CONTIG; error = 0; /* * Switch out based on the FILLING and CONTIG state bits. 
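/*
 * Editorial aside: the BMAP_LEFT_CONTIG test above asks whether an existing
 * real extent can simply be extended to absorb the new allocation. A sketch
 * of the core predicate with simplified fields (the rtgroup check is
 * omitted; the length cap assumes the 21-bit XFS_MAX_BMBT_EXTLEN):
 */
#include <stdbool.h>
#include <stdint.h>

#define MAX_BMBT_EXTLEN	((1ULL << 21) - 1)

struct simple_irec {
	uint64_t startoff;
	uint64_t startblock;
	uint64_t blockcount;
	int	 state;		/* normal vs. unwritten */
};

static bool left_contig(const struct simple_irec *left,
			const struct simple_irec *newext)
{
	return left->startoff + left->blockcount == newext->startoff &&
	       left->startblock + left->blockcount == newext->startblock &&
	       left->state == newext->state &&
	       left->blockcount + newext->blockcount <= MAX_BMBT_EXTLEN;
}

int main(void)
{
	struct simple_irec left = { 0, 100, 4, 0 };
	struct simple_irec newext = { 4, 104, 2, 0 };

	return left_contig(&left, &newext) ? 0 : 1;
}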
*/ switch (state & (BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG)) { case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG: /* * Filling in all of a previously delayed allocation extent. * The left and right neighbors are both contiguous with new. */ LEFT.br_blockcount += PREV.br_blockcount + RIGHT.br_blockcount; xfs_iext_remove(bma->ip, &bma->icur, state); xfs_iext_remove(bma->ip, &bma->icur, state); xfs_iext_prev(ifp, &bma->icur); xfs_iext_update_extent(bma->ip, state, &bma->icur, &LEFT); ifp->if_nextents--; if (bma->cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(bma->cur, &RIGHT, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_btree_delete(bma->cur, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_btree_decrement(bma->cur, 0, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(bma->cur, &LEFT); if (error) goto done; } ASSERT(da_new <= da_old); break; case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG: /* * Filling in all of a previously delayed allocation extent. * The left neighbor is contiguous, the right is not. */ old = LEFT; LEFT.br_blockcount += PREV.br_blockcount; xfs_iext_remove(bma->ip, &bma->icur, state); xfs_iext_prev(ifp, &bma->icur); xfs_iext_update_extent(bma->ip, state, &bma->icur, &LEFT); if (bma->cur == NULL) rval = XFS_ILOG_DEXT; else { rval = 0; error = xfs_bmbt_lookup_eq(bma->cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(bma->cur, &LEFT); if (error) goto done; } ASSERT(da_new <= da_old); break; case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG: /* * Filling in all of a previously delayed allocation extent. * The right neighbor is contiguous, the left is not. Take care * with delay -> unwritten extent allocation here because the * delalloc record we are overwriting is always written. */ PREV.br_startblock = new->br_startblock; PREV.br_blockcount += RIGHT.br_blockcount; PREV.br_state = new->br_state; xfs_iext_next(ifp, &bma->icur); xfs_iext_remove(bma->ip, &bma->icur, state); xfs_iext_prev(ifp, &bma->icur); xfs_iext_update_extent(bma->ip, state, &bma->icur, &PREV); if (bma->cur == NULL) rval = XFS_ILOG_DEXT; else { rval = 0; error = xfs_bmbt_lookup_eq(bma->cur, &RIGHT, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(bma->cur, &PREV); if (error) goto done; } ASSERT(da_new <= da_old); break; case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING: /* * Filling in all of a previously delayed allocation extent. * Neither the left nor right neighbors are contiguous with * the new one. 
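		 *
		 * The record is rewritten in place as a real extent.  Delalloc
		 * extents have no bmbt record, so if_nextents grows and a new
		 * record is inserted into the btree below.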
*/ PREV.br_startblock = new->br_startblock; PREV.br_state = new->br_state; xfs_iext_update_extent(bma->ip, state, &bma->icur, &PREV); ifp->if_nextents++; if (bma->cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(bma->cur, new, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 0)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_btree_insert(bma->cur, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } } ASSERT(da_new <= da_old); break; case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG: /* * Filling in the first part of a previous delayed allocation. * The left neighbor is contiguous. */ old = LEFT; temp = PREV.br_blockcount - new->br_blockcount; da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp), startblockval(PREV.br_startblock)); LEFT.br_blockcount += new->br_blockcount; PREV.br_blockcount = temp; PREV.br_startoff += new->br_blockcount; PREV.br_startblock = nullstartblock(da_new); xfs_iext_update_extent(bma->ip, state, &bma->icur, &PREV); xfs_iext_prev(ifp, &bma->icur); xfs_iext_update_extent(bma->ip, state, &bma->icur, &LEFT); if (bma->cur == NULL) rval = XFS_ILOG_DEXT; else { rval = 0; error = xfs_bmbt_lookup_eq(bma->cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(bma->cur, &LEFT); if (error) goto done; } ASSERT(da_new <= da_old); break; case BMAP_LEFT_FILLING: /* * Filling in the first part of a previous delayed allocation. * The left neighbor is not contiguous. */ xfs_iext_update_extent(bma->ip, state, &bma->icur, new); ifp->if_nextents++; if (bma->cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(bma->cur, new, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 0)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_btree_insert(bma->cur, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } } if (xfs_bmap_needs_btree(bma->ip, whichfork)) { error = xfs_bmap_extents_to_btree(bma->tp, bma->ip, &bma->cur, 1, &tmp_rval, whichfork); rval |= tmp_rval; if (error) goto done; } temp = PREV.br_blockcount - new->br_blockcount; da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp), startblockval(PREV.br_startblock) - (bma->cur ? bma->cur->bc_bmap.allocated : 0)); PREV.br_startoff = new_endoff; PREV.br_blockcount = temp; PREV.br_startblock = nullstartblock(da_new); xfs_iext_next(ifp, &bma->icur); xfs_iext_insert(bma->ip, &bma->icur, &PREV, state); xfs_iext_prev(ifp, &bma->icur); break; case BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG: /* * Filling in the last part of a previous delayed allocation. * The right neighbor is contiguous with the new allocation. 
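		 *
		 * 'new' is merged into RIGHT, and PREV is trimmed to the
		 * remaining delalloc range with a freshly computed worst-case
		 * indlen reservation.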
*/ old = RIGHT; RIGHT.br_startoff = new->br_startoff; RIGHT.br_startblock = new->br_startblock; RIGHT.br_blockcount += new->br_blockcount; if (bma->cur == NULL) rval = XFS_ILOG_DEXT; else { rval = 0; error = xfs_bmbt_lookup_eq(bma->cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(bma->cur, &RIGHT); if (error) goto done; } temp = PREV.br_blockcount - new->br_blockcount; da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp), startblockval(PREV.br_startblock)); PREV.br_blockcount = temp; PREV.br_startblock = nullstartblock(da_new); xfs_iext_update_extent(bma->ip, state, &bma->icur, &PREV); xfs_iext_next(ifp, &bma->icur); xfs_iext_update_extent(bma->ip, state, &bma->icur, &RIGHT); ASSERT(da_new <= da_old); break; case BMAP_RIGHT_FILLING: /* * Filling in the last part of a previous delayed allocation. * The right neighbor is not contiguous. */ xfs_iext_update_extent(bma->ip, state, &bma->icur, new); ifp->if_nextents++; if (bma->cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(bma->cur, new, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 0)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_btree_insert(bma->cur, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } } if (xfs_bmap_needs_btree(bma->ip, whichfork)) { error = xfs_bmap_extents_to_btree(bma->tp, bma->ip, &bma->cur, 1, &tmp_rval, whichfork); rval |= tmp_rval; if (error) goto done; } temp = PREV.br_blockcount - new->br_blockcount; da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp), startblockval(PREV.br_startblock) - (bma->cur ? bma->cur->bc_bmap.allocated : 0)); PREV.br_startblock = nullstartblock(da_new); PREV.br_blockcount = temp; xfs_iext_insert(bma->ip, &bma->icur, &PREV, state); xfs_iext_next(ifp, &bma->icur); ASSERT(da_new <= da_old); break; case 0: /* * Filling in the middle part of a previous delayed allocation. * Contiguity is impossible here. * This case is avoided almost all the time. 
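		 *
		 * One delalloc record is split into three: a shortened
		 * delalloc PREV, the new real extent, and a fresh delalloc
		 * RIGHT, as the diagram below shows.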
* * We start with a delayed allocation: * * +ddddddddddddddddddddddddddddddddddddddddddddddddddddddd+ * PREV @ idx * * and we are allocating: * +rrrrrrrrrrrrrrrrr+ * new * * and we set it up for insertion as: * +ddddddddddddddddddd+rrrrrrrrrrrrrrrrr+ddddddddddddddddd+ * new * PREV @ idx LEFT RIGHT * inserted at idx + 1 */ old = PREV; /* LEFT is the new middle */ LEFT = *new; /* RIGHT is the new right */ RIGHT.br_state = PREV.br_state; RIGHT.br_startoff = new_endoff; RIGHT.br_blockcount = PREV.br_startoff + PREV.br_blockcount - new_endoff; RIGHT.br_startblock = nullstartblock(xfs_bmap_worst_indlen(bma->ip, RIGHT.br_blockcount)); /* truncate PREV */ PREV.br_blockcount = new->br_startoff - PREV.br_startoff; PREV.br_startblock = nullstartblock(xfs_bmap_worst_indlen(bma->ip, PREV.br_blockcount)); xfs_iext_update_extent(bma->ip, state, &bma->icur, &PREV); xfs_iext_next(ifp, &bma->icur); xfs_iext_insert(bma->ip, &bma->icur, &RIGHT, state); xfs_iext_insert(bma->ip, &bma->icur, &LEFT, state); ifp->if_nextents++; if (bma->cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(bma->cur, new, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 0)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } error = xfs_btree_insert(bma->cur, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(bma->cur); error = -EFSCORRUPTED; goto done; } } if (xfs_bmap_needs_btree(bma->ip, whichfork)) { error = xfs_bmap_extents_to_btree(bma->tp, bma->ip, &bma->cur, 1, &tmp_rval, whichfork); rval |= tmp_rval; if (error) goto done; } da_new = startblockval(PREV.br_startblock) + startblockval(RIGHT.br_startblock); break; case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG: case BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG: case BMAP_LEFT_FILLING | BMAP_RIGHT_CONTIG: case BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG: case BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG: case BMAP_LEFT_CONTIG: case BMAP_RIGHT_CONTIG: /* * These cases are all impossible. */ ASSERT(0); } /* add reverse mapping unless caller opted out */ if (!(bma->flags & XFS_BMAPI_NORMAP)) xfs_rmap_map_extent(bma->tp, bma->ip, whichfork, new); /* convert to a btree if necessary */ if (xfs_bmap_needs_btree(bma->ip, whichfork)) { int tmp_logflags; /* partial log flag return val */ ASSERT(bma->cur == NULL); error = xfs_bmap_extents_to_btree(bma->tp, bma->ip, &bma->cur, da_old > 0, &tmp_logflags, whichfork); bma->logflags |= tmp_logflags; if (error) goto done; } if (da_new != da_old) xfs_mod_delalloc(bma->ip, 0, (int64_t)da_new - da_old); if (bma->cur) { da_new += bma->cur->bc_bmap.allocated; bma->cur->bc_bmap.allocated = 0; } /* adjust for changes in reserved delayed indirect blocks */ if (da_new < da_old) xfs_add_fdblocks(mp, da_old - da_new); else if (da_new > da_old) error = xfs_dec_fdblocks(mp, da_new - da_old, true); xfs_bmap_check_leaf_extents(bma->cur, bma->ip, whichfork); done: if (whichfork != XFS_COW_FORK) bma->logflags |= rval; return error; #undef LEFT #undef RIGHT #undef PREV } /* * Convert an unwritten allocation to a real allocation or vice versa. 
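 *
 * Illustrative sketch only (hypothetical caller, not part of this file): a
 * helper completing a write into preallocated space could convert the
 * covered range roughly like this, assuming icur is already positioned at
 * the unwritten record and irec describes the converted range:
 *
 *	irec.br_state = XFS_EXT_NORM;
 *	error = xfs_bmap_add_extent_unwritten_real(tp, ip, XFS_DATA_FORK,
 *			&icur, &cur, &irec, &logflags);
 *
 * In practice this path is typically reached via xfs_bmapi_write() (see
 * xfs_bmapi_convert_unwritten() below) rather than called directly.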
*/ int /* error */ xfs_bmap_add_extent_unwritten_real( struct xfs_trans *tp, xfs_inode_t *ip, /* incore inode pointer */ int whichfork, struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp, /* if *curp is null, not a btree */ xfs_bmbt_irec_t *new, /* new data to add to file extents */ int *logflagsp) /* inode logging flags */ { struct xfs_btree_cur *cur; /* btree cursor */ int error; /* error return value */ int i; /* temp state */ struct xfs_ifork *ifp; /* inode fork pointer */ xfs_fileoff_t new_endoff; /* end offset of new entry */ xfs_bmbt_irec_t r[3]; /* neighbor extent entries */ /* left is 0, right is 1, prev is 2 */ int rval=0; /* return value (logging flags) */ uint32_t state = xfs_bmap_fork_to_state(whichfork); struct xfs_mount *mp = ip->i_mount; struct xfs_bmbt_irec old; *logflagsp = 0; cur = *curp; ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(!isnullstartblock(new->br_startblock)); XFS_STATS_INC(mp, xs_add_exlist); #define LEFT r[0] #define RIGHT r[1] #define PREV r[2] /* * Set up a bunch of variables to make the tests simpler. */ error = 0; xfs_iext_get_extent(ifp, icur, &PREV); ASSERT(new->br_state != PREV.br_state); new_endoff = new->br_startoff + new->br_blockcount; ASSERT(PREV.br_startoff <= new->br_startoff); ASSERT(PREV.br_startoff + PREV.br_blockcount >= new_endoff); /* * Set flags determining what part of the previous oldext allocation * extent is being replaced by a newext allocation. */ if (PREV.br_startoff == new->br_startoff) state |= BMAP_LEFT_FILLING; if (PREV.br_startoff + PREV.br_blockcount == new_endoff) state |= BMAP_RIGHT_FILLING; /* * Check and set flags if this segment has a left neighbor. * Don't set contiguous if the combined extent would be too large. */ if (xfs_iext_peek_prev_extent(ifp, icur, &LEFT)) { state |= BMAP_LEFT_VALID; if (isnullstartblock(LEFT.br_startblock)) state |= BMAP_LEFT_DELAY; } if ((state & BMAP_LEFT_VALID) && !(state & BMAP_LEFT_DELAY) && LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff && LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock && LEFT.br_state == new->br_state && LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN && xfs_bmap_same_rtgroup(ip, whichfork, &LEFT, new)) state |= BMAP_LEFT_CONTIG; /* * Check and set flags if this segment has a right neighbor. * Don't set contiguous if the combined extent would be too large. * Also check for all-three-contiguous being too large. */ if (xfs_iext_peek_next_extent(ifp, icur, &RIGHT)) { state |= BMAP_RIGHT_VALID; if (isnullstartblock(RIGHT.br_startblock)) state |= BMAP_RIGHT_DELAY; } if ((state & BMAP_RIGHT_VALID) && !(state & BMAP_RIGHT_DELAY) && new_endoff == RIGHT.br_startoff && new->br_startblock + new->br_blockcount == RIGHT.br_startblock && new->br_state == RIGHT.br_state && new->br_blockcount + RIGHT.br_blockcount <= XFS_MAX_BMBT_EXTLEN && ((state & (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING)) != (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING) || LEFT.br_blockcount + new->br_blockcount + RIGHT.br_blockcount <= XFS_MAX_BMBT_EXTLEN) && xfs_bmap_same_rtgroup(ip, whichfork, new, &RIGHT)) state |= BMAP_RIGHT_CONTIG; /* * Switch out based on the FILLING and CONTIG state bits. */ switch (state & (BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG)) { case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG: /* * Setting all of a previous oldext extent to newext. * The left and right neighbors are both contiguous with new. 
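		 *
		 * LEFT absorbs both PREV and RIGHT, so two in-core records and
		 * two bmbt records are deleted below.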
*/ LEFT.br_blockcount += PREV.br_blockcount + RIGHT.br_blockcount; xfs_iext_remove(ip, icur, state); xfs_iext_remove(ip, icur, state); xfs_iext_prev(ifp, icur); xfs_iext_update_extent(ip, state, icur, &LEFT); ifp->if_nextents -= 2; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(cur, &RIGHT, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } if ((error = xfs_btree_delete(cur, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } if ((error = xfs_btree_delete(cur, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &LEFT); if (error) goto done; } break; case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG: /* * Setting all of a previous oldext extent to newext. * The left neighbor is contiguous, the right is not. */ LEFT.br_blockcount += PREV.br_blockcount; xfs_iext_remove(ip, icur, state); xfs_iext_prev(ifp, icur); xfs_iext_update_extent(ip, state, icur, &LEFT); ifp->if_nextents--; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(cur, &PREV, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } if ((error = xfs_btree_delete(cur, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &LEFT); if (error) goto done; } break; case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG: /* * Setting all of a previous oldext extent to newext. * The right neighbor is contiguous, the left is not. */ PREV.br_blockcount += RIGHT.br_blockcount; PREV.br_state = new->br_state; xfs_iext_next(ifp, icur); xfs_iext_remove(ip, icur, state); xfs_iext_prev(ifp, icur); xfs_iext_update_extent(ip, state, icur, &PREV); ifp->if_nextents--; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(cur, &RIGHT, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } if ((error = xfs_btree_delete(cur, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } if ((error = xfs_btree_decrement(cur, 0, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &PREV); if (error) goto done; } break; case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING: /* * Setting all of a previous oldext extent to newext. * Neither the left nor right neighbors are contiguous with * the new one. 
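		 *
		 * Only br_state changes; the record is updated in place, both
		 * in-core and in the bmbt, with no inserts or deletes.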
*/ PREV.br_state = new->br_state; xfs_iext_update_extent(ip, state, icur, &PREV); if (cur == NULL) rval = XFS_ILOG_DEXT; else { rval = 0; error = xfs_bmbt_lookup_eq(cur, new, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &PREV); if (error) goto done; } break; case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG: /* * Setting the first part of a previous oldext extent to newext. * The left neighbor is contiguous. */ LEFT.br_blockcount += new->br_blockcount; old = PREV; PREV.br_startoff += new->br_blockcount; PREV.br_startblock += new->br_blockcount; PREV.br_blockcount -= new->br_blockcount; xfs_iext_update_extent(ip, state, icur, &PREV); xfs_iext_prev(ifp, icur); xfs_iext_update_extent(ip, state, icur, &LEFT); if (cur == NULL) rval = XFS_ILOG_DEXT; else { rval = 0; error = xfs_bmbt_lookup_eq(cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &PREV); if (error) goto done; error = xfs_btree_decrement(cur, 0, &i); if (error) goto done; error = xfs_bmbt_update(cur, &LEFT); if (error) goto done; } break; case BMAP_LEFT_FILLING: /* * Setting the first part of a previous oldext extent to newext. * The left neighbor is not contiguous. */ old = PREV; PREV.br_startoff += new->br_blockcount; PREV.br_startblock += new->br_blockcount; PREV.br_blockcount -= new->br_blockcount; xfs_iext_update_extent(ip, state, icur, &PREV); xfs_iext_insert(ip, icur, new, state); ifp->if_nextents++; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &PREV); if (error) goto done; cur->bc_rec.b = *new; if ((error = xfs_btree_insert(cur, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } } break; case BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG: /* * Setting the last part of a previous oldext extent to newext. * The right neighbor is contiguous with the new allocation. */ old = PREV; PREV.br_blockcount -= new->br_blockcount; RIGHT.br_startoff = new->br_startoff; RIGHT.br_startblock = new->br_startblock; RIGHT.br_blockcount += new->br_blockcount; xfs_iext_update_extent(ip, state, icur, &PREV); xfs_iext_next(ifp, icur); xfs_iext_update_extent(ip, state, icur, &RIGHT); if (cur == NULL) rval = XFS_ILOG_DEXT; else { rval = 0; error = xfs_bmbt_lookup_eq(cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &PREV); if (error) goto done; error = xfs_btree_increment(cur, 0, &i); if (error) goto done; error = xfs_bmbt_update(cur, &RIGHT); if (error) goto done; } break; case BMAP_RIGHT_FILLING: /* * Setting the last part of a previous oldext extent to newext. * The right neighbor is not contiguous. 
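		 *
		 * PREV is shortened and 'new' is inserted after it, growing
		 * both the in-core record count and the bmbt by one.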
*/ old = PREV; PREV.br_blockcount -= new->br_blockcount; xfs_iext_update_extent(ip, state, icur, &PREV); xfs_iext_next(ifp, icur); xfs_iext_insert(ip, icur, new, state); ifp->if_nextents++; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &PREV); if (error) goto done; error = xfs_bmbt_lookup_eq(cur, new, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 0)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } if ((error = xfs_btree_insert(cur, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } } break; case 0: /* * Setting the middle part of a previous oldext extent to * newext. Contiguity is impossible here. * One extent becomes three extents. */ old = PREV; PREV.br_blockcount = new->br_startoff - PREV.br_startoff; r[0] = *new; r[1].br_startoff = new_endoff; r[1].br_blockcount = old.br_startoff + old.br_blockcount - new_endoff; r[1].br_startblock = new->br_startblock + new->br_blockcount; r[1].br_state = PREV.br_state; xfs_iext_update_extent(ip, state, icur, &PREV); xfs_iext_next(ifp, icur); xfs_iext_insert(ip, icur, &r[1], state); xfs_iext_insert(ip, icur, &r[0], state); ifp->if_nextents += 2; if (cur == NULL) rval = XFS_ILOG_CORE | XFS_ILOG_DEXT; else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } /* new right extent - oldext */ error = xfs_bmbt_update(cur, &r[1]); if (error) goto done; /* new left extent - oldext */ cur->bc_rec.b = PREV; if ((error = xfs_btree_insert(cur, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } /* * Reset the cursor to the position of the new extent * we are about to insert as we can't trust it after * the previous insert. */ error = xfs_bmbt_lookup_eq(cur, new, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 0)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } /* new middle extent - newext */ if ((error = xfs_btree_insert(cur, &i))) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } } break; case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG: case BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG: case BMAP_LEFT_FILLING | BMAP_RIGHT_CONTIG: case BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG: case BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG: case BMAP_LEFT_CONTIG: case BMAP_RIGHT_CONTIG: /* * These cases are all impossible. */ ASSERT(0); } /* update reverse mappings */ xfs_rmap_convert_extent(mp, tp, ip, whichfork, new); /* convert to a btree if necessary */ if (xfs_bmap_needs_btree(ip, whichfork)) { int tmp_logflags; /* partial log flag return val */ ASSERT(cur == NULL); error = xfs_bmap_extents_to_btree(tp, ip, &cur, 0, &tmp_logflags, whichfork); *logflagsp |= tmp_logflags; if (error) goto done; } /* clear out the allocated field, done with it now in any case. */ if (cur) { cur->bc_bmap.allocated = 0; *curp = cur; } xfs_bmap_check_leaf_extents(*curp, ip, whichfork); done: *logflagsp |= rval; return error; #undef LEFT #undef RIGHT #undef PREV } /* * Convert a hole to a real allocation. 
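 *
 * There is no previous record here, so only the BMAP_LEFT/RIGHT_CONTIG bits
 * matter: the new extent is merged into one or both neighbours if it abuts
 * them, otherwise it is inserted as a fresh record.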
*/ STATIC int /* error */ xfs_bmap_add_extent_hole_real( struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp, struct xfs_bmbt_irec *new, int *logflagsp, uint32_t flags) { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_mount *mp = ip->i_mount; struct xfs_btree_cur *cur = *curp; int error; /* error return value */ int i; /* temp state */ xfs_bmbt_irec_t left; /* left neighbor extent entry */ xfs_bmbt_irec_t right; /* right neighbor extent entry */ int rval=0; /* return value (logging flags) */ uint32_t state = xfs_bmap_fork_to_state(whichfork); struct xfs_bmbt_irec old; ASSERT(!isnullstartblock(new->br_startblock)); ASSERT(!cur || !(cur->bc_flags & XFS_BTREE_BMBT_WASDEL)); XFS_STATS_INC(mp, xs_add_exlist); /* * Check and set flags if this segment has a left neighbor. */ if (xfs_iext_peek_prev_extent(ifp, icur, &left)) { state |= BMAP_LEFT_VALID; if (isnullstartblock(left.br_startblock)) state |= BMAP_LEFT_DELAY; } /* * Check and set flags if this segment has a current value. * Not true if we're inserting into the "hole" at eof. */ if (xfs_iext_get_extent(ifp, icur, &right)) { state |= BMAP_RIGHT_VALID; if (isnullstartblock(right.br_startblock)) state |= BMAP_RIGHT_DELAY; } /* * We're inserting a real allocation between "left" and "right". * Set the contiguity flags. Don't let extents get too large. */ if ((state & BMAP_LEFT_VALID) && !(state & BMAP_LEFT_DELAY) && left.br_startoff + left.br_blockcount == new->br_startoff && left.br_startblock + left.br_blockcount == new->br_startblock && left.br_state == new->br_state && left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN && xfs_bmap_same_rtgroup(ip, whichfork, &left, new)) state |= BMAP_LEFT_CONTIG; if ((state & BMAP_RIGHT_VALID) && !(state & BMAP_RIGHT_DELAY) && new->br_startoff + new->br_blockcount == right.br_startoff && new->br_startblock + new->br_blockcount == right.br_startblock && new->br_state == right.br_state && new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN && (!(state & BMAP_LEFT_CONTIG) || left.br_blockcount + new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN) && xfs_bmap_same_rtgroup(ip, whichfork, new, &right)) state |= BMAP_RIGHT_CONTIG; error = 0; /* * Select which case we're in here, and implement it. */ switch (state & (BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG)) { case BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG: /* * New allocation is contiguous with real allocations on the * left and on the right. * Merge all three into a single extent record. */ left.br_blockcount += new->br_blockcount + right.br_blockcount; xfs_iext_remove(ip, icur, state); xfs_iext_prev(ifp, icur); xfs_iext_update_extent(ip, state, icur, &left); ifp->if_nextents--; if (cur == NULL) { rval = XFS_ILOG_CORE | xfs_ilog_fext(whichfork); } else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(cur, &right, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_btree_delete(cur, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_btree_decrement(cur, 0, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &left); if (error) goto done; } break; case BMAP_LEFT_CONTIG: /* * New allocation is contiguous with a real allocation * on the left. * Merge the new allocation with the left neighbor. 
*/ old = left; left.br_blockcount += new->br_blockcount; xfs_iext_prev(ifp, icur); xfs_iext_update_extent(ip, state, icur, &left); if (cur == NULL) { rval = xfs_ilog_fext(whichfork); } else { rval = 0; error = xfs_bmbt_lookup_eq(cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &left); if (error) goto done; } break; case BMAP_RIGHT_CONTIG: /* * New allocation is contiguous with a real allocation * on the right. * Merge the new allocation with the right neighbor. */ old = right; right.br_startoff = new->br_startoff; right.br_startblock = new->br_startblock; right.br_blockcount += new->br_blockcount; xfs_iext_update_extent(ip, state, icur, &right); if (cur == NULL) { rval = xfs_ilog_fext(whichfork); } else { rval = 0; error = xfs_bmbt_lookup_eq(cur, &old, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_bmbt_update(cur, &right); if (error) goto done; } break; case 0: /* * New allocation is not contiguous with another * real allocation. * Insert a new entry. */ xfs_iext_insert(ip, icur, new, state); ifp->if_nextents++; if (cur == NULL) { rval = XFS_ILOG_CORE | xfs_ilog_fext(whichfork); } else { rval = XFS_ILOG_CORE; error = xfs_bmbt_lookup_eq(cur, new, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 0)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } error = xfs_btree_insert(cur, &i); if (error) goto done; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto done; } } break; } /* add reverse mapping unless caller opted out */ if (!(flags & XFS_BMAPI_NORMAP)) xfs_rmap_map_extent(tp, ip, whichfork, new); /* convert to a btree if necessary */ if (xfs_bmap_needs_btree(ip, whichfork)) { int tmp_logflags; /* partial log flag return val */ ASSERT(cur == NULL); error = xfs_bmap_extents_to_btree(tp, ip, curp, 0, &tmp_logflags, whichfork); *logflagsp |= tmp_logflags; cur = *curp; if (error) goto done; } /* clear out the allocated field, done with it now in any case. */ if (cur) cur->bc_bmap.allocated = 0; xfs_bmap_check_leaf_extents(cur, ip, whichfork); done: *logflagsp |= rval; return error; } /* * Functions used in the extent read, allocate and remove paths */ /* * Adjust the size of the new extent based on i_extsize and rt extsize. */ int xfs_bmap_extsize_align( xfs_mount_t *mp, xfs_bmbt_irec_t *gotp, /* next extent pointer */ xfs_bmbt_irec_t *prevp, /* previous extent pointer */ xfs_extlen_t extsz, /* align to this extent size */ int rt, /* is this a realtime inode? */ int eof, /* is extent at end-of-file? */ int delay, /* creating delalloc extent? */ int convert, /* overwriting unwritten extent? */ xfs_fileoff_t *offp, /* in/out: aligned offset */ xfs_extlen_t *lenp) /* in/out: aligned length */ { xfs_fileoff_t orig_off; /* original offset */ xfs_extlen_t orig_alen; /* original length */ xfs_fileoff_t orig_end; /* original off+len */ xfs_fileoff_t nexto; /* next file offset */ xfs_fileoff_t prevo; /* previous file offset */ xfs_fileoff_t align_off; /* temp for offset */ xfs_extlen_t align_alen; /* temp for length */ xfs_extlen_t temp; /* temp for calculations */ if (convert) return 0; orig_off = align_off = *offp; orig_alen = align_alen = *lenp; orig_end = orig_off + orig_alen; /* * If this request overlaps an existing extent, then don't * attempt to perform any additional alignment. 
*/ if (!delay && !eof && (orig_off >= gotp->br_startoff) && (orig_end <= gotp->br_startoff + gotp->br_blockcount)) { return 0; } /* * If the file offset is unaligned vs. the extent size * we need to align it. This will be possible unless * the file was previously written with a kernel that didn't * perform this alignment, or if a truncate shot us in the * foot. */ div_u64_rem(orig_off, extsz, &temp); if (temp) { align_alen += temp; align_off -= temp; } /* Same adjustment for the end of the requested area. */ temp = (align_alen % extsz); if (temp) align_alen += extsz - temp; /* * For large extent hint sizes, the aligned extent might be larger than * XFS_BMBT_MAX_EXTLEN. In that case, reduce the size by an extsz so * that it pulls the length back under XFS_BMBT_MAX_EXTLEN. The outer * allocation loops handle short allocation just fine, so it is safe to * do this. We only want to do it when we are forced to, though, because * it means more allocation operations are required. */ while (align_alen > XFS_MAX_BMBT_EXTLEN) align_alen -= extsz; ASSERT(align_alen <= XFS_MAX_BMBT_EXTLEN); /* * If the previous block overlaps with this proposed allocation * then move the start forward without adjusting the length. */ if (prevp->br_startoff != NULLFILEOFF) { if (prevp->br_startblock == HOLESTARTBLOCK) prevo = prevp->br_startoff; else prevo = prevp->br_startoff + prevp->br_blockcount; } else prevo = 0; if (align_off != orig_off && align_off < prevo) align_off = prevo; /* * If the next block overlaps with this proposed allocation * then move the start back without adjusting the length, * but not before offset 0. * This may of course make the start overlap previous block, * and if we hit the offset 0 limit then the next block * can still overlap too. */ if (!eof && gotp->br_startoff != NULLFILEOFF) { if ((delay && gotp->br_startblock == HOLESTARTBLOCK) || (!delay && gotp->br_startblock == DELAYSTARTBLOCK)) nexto = gotp->br_startoff + gotp->br_blockcount; else nexto = gotp->br_startoff; } else nexto = NULLFILEOFF; if (!eof && align_off + align_alen != orig_end && align_off + align_alen > nexto) align_off = nexto > align_alen ? nexto - align_alen : 0; /* * If we're now overlapping the next or previous extent that * means we can't fit an extsz piece in this hole. Just move * the start forward to the first valid spot and set * the length so we hit the end. */ if (align_off != orig_off && align_off < prevo) align_off = prevo; if (align_off + align_alen != orig_end && align_off + align_alen > nexto && nexto != NULLFILEOFF) { ASSERT(nexto > prevo); align_alen = nexto - align_off; } /* * If realtime, and the result isn't a multiple of the realtime * extent size we need to remove blocks until it is. */ if (rt && (temp = xfs_extlen_to_rtxmod(mp, align_alen))) { /* * We're not covering the original request, or * we won't be able to once we fix the length. */ if (orig_off < align_off || orig_end > align_off + align_alen || align_alen - temp < orig_alen) return -EINVAL; /* * Try to fix it by moving the start up. */ if (align_off + temp <= orig_off) { align_alen -= temp; align_off += temp; } /* * Try to fix it by moving the end in. */ else if (align_off + align_alen - temp >= orig_end) align_alen -= temp; /* * Set the start to the minimum then trim the length. */ else { align_alen -= orig_off - align_off; align_off = orig_off; align_alen -= xfs_extlen_to_rtxmod(mp, align_alen); } /* * Result doesn't cover the request, fail it. 
*/ if (orig_off < align_off || orig_end > align_off + align_alen) return -EINVAL; } else { ASSERT(orig_off >= align_off); /* see XFS_BMBT_MAX_EXTLEN handling above */ ASSERT(orig_end <= align_off + align_alen || align_alen + extsz > XFS_MAX_BMBT_EXTLEN); } #ifdef DEBUG if (!eof && gotp->br_startoff != NULLFILEOFF) ASSERT(align_off + align_alen <= gotp->br_startoff); if (prevp->br_startoff != NULLFILEOFF) ASSERT(align_off >= prevp->br_startoff + prevp->br_blockcount); #endif *lenp = align_alen; *offp = align_off; return 0; } static inline bool xfs_bmap_adjacent_valid( struct xfs_bmalloca *ap, xfs_fsblock_t x, xfs_fsblock_t y) { struct xfs_mount *mp = ap->ip->i_mount; if (XFS_IS_REALTIME_INODE(ap->ip) && (ap->datatype & XFS_ALLOC_USERDATA)) { if (!xfs_has_rtgroups(mp)) return x < mp->m_sb.sb_rblocks; return xfs_rtb_to_rgno(mp, x) == xfs_rtb_to_rgno(mp, y) && xfs_rtb_to_rgno(mp, x) < mp->m_sb.sb_rgcount && xfs_rtb_to_rtx(mp, x) < mp->m_sb.sb_rgextents; } return XFS_FSB_TO_AGNO(mp, x) == XFS_FSB_TO_AGNO(mp, y) && XFS_FSB_TO_AGNO(mp, x) < mp->m_sb.sb_agcount && XFS_FSB_TO_AGBNO(mp, x) < mp->m_sb.sb_agblocks; } #define XFS_ALLOC_GAP_UNITS 4 /* returns true if ap->blkno was modified */ bool xfs_bmap_adjacent( struct xfs_bmalloca *ap) /* bmap alloc argument struct */ { xfs_fsblock_t adjust; /* adjustment to block numbers */ /* * If allocating at eof, and there's a previous real block, * try to use its last block as our starting point. */ if (ap->eof && ap->prev.br_startoff != NULLFILEOFF && !isnullstartblock(ap->prev.br_startblock) && xfs_bmap_adjacent_valid(ap, ap->prev.br_startblock + ap->prev.br_blockcount, ap->prev.br_startblock)) { ap->blkno = ap->prev.br_startblock + ap->prev.br_blockcount; /* * Adjust for the gap between prevp and us. */ adjust = ap->offset - (ap->prev.br_startoff + ap->prev.br_blockcount); if (adjust && xfs_bmap_adjacent_valid(ap, ap->blkno + adjust, ap->prev.br_startblock)) ap->blkno += adjust; return true; } /* * If not at eof, then compare the two neighbor blocks. * Figure out whether either one gives us a good starting point, * and pick the better one. */ if (!ap->eof) { xfs_fsblock_t gotbno; /* right side block number */ xfs_fsblock_t gotdiff=0; /* right side difference */ xfs_fsblock_t prevbno; /* left side block number */ xfs_fsblock_t prevdiff=0; /* left side difference */ /* * If there's a previous (left) block, select a requested * start block based on it. */ if (ap->prev.br_startoff != NULLFILEOFF && !isnullstartblock(ap->prev.br_startblock) && (prevbno = ap->prev.br_startblock + ap->prev.br_blockcount) && xfs_bmap_adjacent_valid(ap, prevbno, ap->prev.br_startblock)) { /* * Calculate gap to end of previous block. */ adjust = prevdiff = ap->offset - (ap->prev.br_startoff + ap->prev.br_blockcount); /* * Figure the startblock based on the previous block's * end and the gap size. * Heuristic! * If the gap is large relative to the piece we're * allocating, or using it gives us an invalid block * number, then just use the end of the previous block. */ if (prevdiff <= XFS_ALLOC_GAP_UNITS * ap->length && xfs_bmap_adjacent_valid(ap, prevbno + prevdiff, ap->prev.br_startblock)) prevbno += adjust; else prevdiff += adjust; } /* * No previous block or can't follow it, just default. */ else prevbno = NULLFSBLOCK; /* * If there's a following (right) block, select a requested * start block based on it. */ if (!isnullstartblock(ap->got.br_startblock)) { /* * Calculate gap to start of next block. 
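			 *
			 * e.g. writing at offset 100 with the next extent
			 * mapped at offset 120 leaves a 20 block gap; if that
			 * is small relative to this allocation we aim for the
			 * next extent's start block minus the gap, so the
			 * physical layout stays linear with the file offsets.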
*/ adjust = gotdiff = ap->got.br_startoff - ap->offset; /* * Figure the startblock based on the next block's * start and the gap size. */ gotbno = ap->got.br_startblock; /* * Heuristic! * If the gap is large relative to the piece we're * allocating, or using it gives us an invalid block * number, then just use the start of the next block * offset by our length. */ if (gotdiff <= XFS_ALLOC_GAP_UNITS * ap->length && xfs_bmap_adjacent_valid(ap, gotbno - gotdiff, gotbno)) gotbno -= adjust; else if (xfs_bmap_adjacent_valid(ap, gotbno - ap->length, gotbno)) { gotbno -= ap->length; gotdiff += adjust - ap->length; } else gotdiff += adjust; } /* * No next block, just default. */ else gotbno = NULLFSBLOCK; /* * If both valid, pick the better one, else the only good * one, else ap->blkno is already set (to 0 or the inode block). */ if (prevbno != NULLFSBLOCK && gotbno != NULLFSBLOCK) { ap->blkno = prevdiff <= gotdiff ? prevbno : gotbno; return true; } if (prevbno != NULLFSBLOCK) { ap->blkno = prevbno; return true; } if (gotbno != NULLFSBLOCK) { ap->blkno = gotbno; return true; } } return false; } int xfs_bmap_longest_free_extent( struct xfs_perag *pag, struct xfs_trans *tp, xfs_extlen_t *blen) { xfs_extlen_t longest; int error = 0; if (!xfs_perag_initialised_agf(pag)) { error = xfs_alloc_read_agf(pag, tp, XFS_ALLOC_FLAG_TRYLOCK, NULL); if (error) return error; } longest = xfs_alloc_longest_free_extent(pag, xfs_alloc_min_freelist(pag_mount(pag), pag), xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE)); if (*blen < longest) *blen = longest; return 0; } static xfs_extlen_t xfs_bmap_select_minlen( struct xfs_bmalloca *ap, struct xfs_alloc_arg *args, xfs_extlen_t blen) { /* * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is * possible that there is enough contiguous free space for this request. */ if (blen < ap->minlen) return ap->minlen; /* * If the best seen length is less than the request length, * use the best as the minimum, otherwise we've got the maxlen we * were asked for. */ if (blen < args->maxlen) return blen; return args->maxlen; } static int xfs_bmap_btalloc_select_lengths( struct xfs_bmalloca *ap, struct xfs_alloc_arg *args, xfs_extlen_t *blen) { struct xfs_mount *mp = args->mp; struct xfs_perag *pag; xfs_agnumber_t agno, startag; int error = 0; if (ap->tp->t_flags & XFS_TRANS_LOWMODE) { args->total = ap->minlen; args->minlen = ap->minlen; return 0; } args->total = ap->total; startag = XFS_FSB_TO_AGNO(mp, ap->blkno); if (startag == NULLAGNUMBER) startag = 0; *blen = 0; for_each_perag_wrap(mp, startag, agno, pag) { error = xfs_bmap_longest_free_extent(pag, args->tp, blen); if (error && error != -EAGAIN) break; error = 0; if (*blen >= args->maxlen) break; } if (pag) xfs_perag_rele(pag); args->minlen = xfs_bmap_select_minlen(ap, args, *blen); return error; } /* Update all inode and quota accounting for the allocation we just did. */ void xfs_bmap_alloc_account( struct xfs_bmalloca *ap) { bool isrt = XFS_IS_REALTIME_INODE(ap->ip) && !(ap->flags & XFS_BMAPI_ATTRFORK); uint fld; if (ap->flags & XFS_BMAPI_COWFORK) { /* * COW fork blocks are in-core only and thus are treated as * in-core quota reservation (like delalloc blocks) even when * converted to real blocks. The quota reservation is not * accounted to disk until blocks are remapped to the data * fork. So if these blocks were previously delalloc, we * already have quota reservation and there's nothing to do * yet. 
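		 *
		 * All that is left for the wasdel case below is to drop the
		 * in-core delalloc block counter.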
*/ if (ap->wasdel) { xfs_mod_delalloc(ap->ip, -(int64_t)ap->length, 0); return; } /* * Otherwise, we've allocated blocks in a hole. The transaction * has acquired in-core quota reservation for this extent. * Rather than account these as real blocks, however, we reduce * the transaction quota reservation based on the allocation. * This essentially transfers the transaction quota reservation * to that of a delalloc extent. */ ap->ip->i_delayed_blks += ap->length; xfs_trans_mod_dquot_byino(ap->tp, ap->ip, isrt ? XFS_TRANS_DQ_RES_RTBLKS : XFS_TRANS_DQ_RES_BLKS, -(long)ap->length); return; } /* data/attr fork only */ ap->ip->i_nblocks += ap->length; xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE); if (ap->wasdel) { ap->ip->i_delayed_blks -= ap->length; xfs_mod_delalloc(ap->ip, -(int64_t)ap->length, 0); fld = isrt ? XFS_TRANS_DQ_DELRTBCOUNT : XFS_TRANS_DQ_DELBCOUNT; } else { fld = isrt ? XFS_TRANS_DQ_RTBCOUNT : XFS_TRANS_DQ_BCOUNT; } xfs_trans_mod_dquot_byino(ap->tp, ap->ip, fld, ap->length); } static int xfs_bmap_compute_alignments( struct xfs_bmalloca *ap, struct xfs_alloc_arg *args) { struct xfs_mount *mp = args->mp; xfs_extlen_t align = 0; /* minimum allocation alignment */ int stripe_align = 0; /* stripe alignment for allocation is determined by mount parameters */ if (mp->m_swidth && xfs_has_swalloc(mp)) stripe_align = mp->m_swidth; else if (mp->m_dalign) stripe_align = mp->m_dalign; if (ap->flags & XFS_BMAPI_COWFORK) align = xfs_get_cowextsz_hint(ap->ip); else if (ap->datatype & XFS_ALLOC_USERDATA) align = xfs_get_extsz_hint(ap->ip); /* Try to align start block to any minimum allocation alignment */ if (align > 1 && (ap->flags & XFS_BMAPI_EXTSZALIGN)) args->alignment = align; if (align) { if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0, ap->eof, 0, ap->conv, &ap->offset, &ap->length)) ASSERT(0); ASSERT(ap->length); } /* apply extent size hints if obtained earlier */ if (align) { args->prod = align; div_u64_rem(ap->offset, args->prod, &args->mod); if (args->mod) args->mod = args->prod - args->mod; } else if (mp->m_sb.sb_blocksize >= PAGE_SIZE) { args->prod = 1; args->mod = 0; } else { args->prod = PAGE_SIZE >> mp->m_sb.sb_blocklog; div_u64_rem(ap->offset, args->prod, &args->mod); if (args->mod) args->mod = args->prod - args->mod; } return stripe_align; } static void xfs_bmap_process_allocated_extent( struct xfs_bmalloca *ap, struct xfs_alloc_arg *args, xfs_fileoff_t orig_offset, xfs_extlen_t orig_length) { ap->blkno = args->fsbno; ap->length = args->len; /* * If the extent size hint is active, we tried to round the * caller's allocation request offset down to extsz and the * length up to another extsz boundary. If we found a free * extent we mapped it in starting at this new offset. If the * newly mapped space isn't long enough to cover any of the * range of offsets that was originally requested, move the * mapping up so that we can fill as much of the caller's * original request as possible. Free space is apparently * very fragmented so we're unlikely to be able to satisfy the * hints anyway. 
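	 *
	 * Continuing the extsz example above: if the request for offset 21,
	 * length 10 was aligned to offset 16, length 16 but only 8 blocks
	 * were allocated, the mapping is moved back up to offset 21 so the
	 * caller's own range is filled first.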
*/ if (ap->length <= orig_length) ap->offset = orig_offset; else if (ap->offset + ap->length < orig_offset + orig_length) ap->offset = orig_offset + orig_length - ap->length; xfs_bmap_alloc_account(ap); } static int xfs_bmap_exact_minlen_extent_alloc( struct xfs_bmalloca *ap, struct xfs_alloc_arg *args) { if (ap->minlen != 1) { args->fsbno = NULLFSBLOCK; return 0; } args->alloc_minlen_only = 1; args->minlen = args->maxlen = ap->minlen; args->total = ap->total; /* * Unlike the longest extent available in an AG, we don't track * the length of an AG's shortest extent. * XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT is a debug only knob and * hence we can afford to start traversing from the 0th AG since * we need not be concerned about a drop in performance in * "debug only" code paths. */ ap->blkno = XFS_AGB_TO_FSB(ap->ip->i_mount, 0, 0); /* * Call xfs_bmap_btalloc_low_space here as it first does a "normal" AG * iteration and then drops args->total to args->minlen, which might be * required to find an allocation for the transaction reservation when * the file system is very full. */ return xfs_bmap_btalloc_low_space(ap, args); } /* * If we are not low on available data blocks and we are allocating at * EOF, optimise allocation for contiguous file extension and/or stripe * alignment of the new extent. * * NOTE: ap->aeof is only set if the allocation length is >= the * stripe unit and the allocation offset is at the end of file. */ static int xfs_bmap_btalloc_at_eof( struct xfs_bmalloca *ap, struct xfs_alloc_arg *args, xfs_extlen_t blen, int stripe_align, bool ag_only) { struct xfs_mount *mp = args->mp; struct xfs_perag *caller_pag = args->pag; int error; /* * If there are already extents in the file, and xfs_bmap_adjacent() has * given a better blkno, try an exact EOF block allocation to extend the * file as a contiguous extent. If that fails, or it's the first * allocation in a file, just try for a stripe aligned allocation. */ if (ap->eof) { xfs_extlen_t nextminlen = 0; /* * Compute the minlen+alignment for the next case. Set slop so * that the value of minlen+alignment+slop doesn't go up between * the calls. */ args->alignment = 1; if (blen > stripe_align && blen <= args->maxlen) nextminlen = blen - stripe_align; else nextminlen = args->minlen; if (nextminlen + stripe_align > args->minlen + 1) args->minalignslop = nextminlen + stripe_align - args->minlen - 1; else args->minalignslop = 0; if (!caller_pag) args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno)); error = xfs_alloc_vextent_exact_bno(args, ap->blkno); if (!caller_pag) { xfs_perag_put(args->pag); args->pag = NULL; } if (error) return error; if (args->fsbno != NULLFSBLOCK) return 0; /* * Exact allocation failed. Reset to try an aligned allocation * according to the original allocation specification. */ args->alignment = stripe_align; args->minlen = nextminlen; args->minalignslop = 0; } else { /* * Adjust minlen to try and preserve alignment if we * can't guarantee an aligned maxlen extent. 
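		 *
		 * e.g. if the best free extent seen is 70 blocks, maxlen is 64
		 * and stripe_align is 8, minlen drops to 62 so that an aligned
		 * allocation trimmed by up to one stripe unit can still
		 * succeed.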
 */
		args->alignment = stripe_align;
		if (blen > args->alignment &&
		    blen <= args->maxlen + args->alignment)
			args->minlen = blen - args->alignment;
		args->minalignslop = 0;
	}

	if (ag_only) {
		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
	} else {
		args->pag = NULL;
		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
		ASSERT(args->pag == NULL);
		args->pag = caller_pag;
	}
	if (error)
		return error;

	if (args->fsbno != NULLFSBLOCK)
		return 0;

	/*
	 * Allocation failed, so restore the allocation args to their original
	 * non-aligned state so the caller can proceed on allocation failure
	 * as if this function was never called.
	 */
	args->alignment = 1;
	return 0;
}

/*
 * We have failed multiple allocation attempts so we are now in a low space
 * allocation situation. Try a locality-first, full filesystem, minimum length
 * allocation whilst still maintaining necessary total block reservation
 * requirements.
 *
 * If that fails, we are now critically low on space, so perform a last resort
 * allocation attempt: no reserve, no locality, blocking, minimum length, full
 * filesystem free space scan. We also indicate to future allocations in this
 * transaction that we are critically low on space so they don't waste time on
 * allocation modes that are unlikely to succeed.
 */
int
xfs_bmap_btalloc_low_space(
	struct xfs_bmalloca	*ap,
	struct xfs_alloc_arg	*args)
{
	int			error;

	if (args->minlen > ap->minlen) {
		args->minlen = ap->minlen;
		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
		if (error || args->fsbno != NULLFSBLOCK)
			return error;
	}

	/* Last-ditch attempt before failure is declared. */
	args->total = ap->minlen;
	error = xfs_alloc_vextent_first_ag(args, 0);
	if (error)
		return error;
	ap->tp->t_flags |= XFS_TRANS_LOWMODE;
	return 0;
}

static int
xfs_bmap_btalloc_filestreams(
	struct xfs_bmalloca	*ap,
	struct xfs_alloc_arg	*args,
	int			stripe_align)
{
	xfs_extlen_t		blen = 0;
	int			error = 0;

	error = xfs_filestream_select_ag(ap, args, &blen);
	if (error)
		return error;
	ASSERT(args->pag);

	/*
	 * If we are in low space mode, then optimal allocation will fail so
	 * prepare for minimal allocation and jump to the low space algorithm
	 * immediately.
	 */
	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
		args->minlen = ap->minlen;
		ASSERT(args->fsbno == NULLFSBLOCK);
		goto out_low_space;
	}

	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
	if (ap->aeof)
		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
				true);

	if (!error && args->fsbno == NULLFSBLOCK)
		error = xfs_alloc_vextent_near_bno(args, ap->blkno);

out_low_space:
	/*
	 * We are now done with the perag reference for the filestreams
	 * association provided by xfs_filestream_select_ag(). Release it now
	 * as we've either succeeded, had a fatal error, or we are out of
	 * space and need to do a full filesystem scan for free space which
	 * will take its own references.
	 */
	xfs_perag_rele(args->pag);
	args->pag = NULL;
	if (error || args->fsbno != NULLFSBLOCK)
		return error;

	return xfs_bmap_btalloc_low_space(ap, args);
}

static int
xfs_bmap_btalloc_best_length(
	struct xfs_bmalloca	*ap,
	struct xfs_alloc_arg	*args,
	int			stripe_align)
{
	xfs_extlen_t		blen = 0;
	int			error;

	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
	if (!xfs_bmap_adjacent(ap))
		ap->eof = false;

	/*
	 * Search for an allocation group with a single extent large enough
	 * for the request. If one isn't found, then adjust the minimum
	 * allocation size to the largest space found.
*/ error = xfs_bmap_btalloc_select_lengths(ap, args, &blen); if (error) return error; /* * Don't attempt optimal EOF allocation if previous allocations barely * succeeded due to being near ENOSPC. It is highly unlikely we'll get * optimal or even aligned allocations in this case, so don't waste time * trying. */ if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) { error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align, false); if (error || args->fsbno != NULLFSBLOCK) return error; } error = xfs_alloc_vextent_start_ag(args, ap->blkno); if (error || args->fsbno != NULLFSBLOCK) return error; return xfs_bmap_btalloc_low_space(ap, args); } static int xfs_bmap_btalloc( struct xfs_bmalloca *ap) { struct xfs_mount *mp = ap->ip->i_mount; struct xfs_alloc_arg args = { .tp = ap->tp, .mp = mp, .fsbno = NULLFSBLOCK, .oinfo = XFS_RMAP_OINFO_SKIP_UPDATE, .minleft = ap->minleft, .wasdel = ap->wasdel, .resv = XFS_AG_RESV_NONE, .datatype = ap->datatype, .alignment = 1, .minalignslop = 0, }; xfs_fileoff_t orig_offset; xfs_extlen_t orig_length; int error; int stripe_align; ASSERT(ap->length); orig_offset = ap->offset; orig_length = ap->length; stripe_align = xfs_bmap_compute_alignments(ap, &args); /* Trim the allocation back to the maximum an AG can fit. */ args.maxlen = min(ap->length, mp->m_ag_max_usable); if (unlikely(XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT))) error = xfs_bmap_exact_minlen_extent_alloc(ap, &args); else if ((ap->datatype & XFS_ALLOC_USERDATA) && xfs_inode_is_filestream(ap->ip)) error = xfs_bmap_btalloc_filestreams(ap, &args, stripe_align); else error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align); if (error) return error; if (args.fsbno != NULLFSBLOCK) { xfs_bmap_process_allocated_extent(ap, &args, orig_offset, orig_length); } else { ap->blkno = NULLFSBLOCK; ap->length = 0; } return 0; } /* Trim extent to fit a logical block range. */ void xfs_trim_extent( struct xfs_bmbt_irec *irec, xfs_fileoff_t bno, xfs_filblks_t len) { xfs_fileoff_t distance; xfs_fileoff_t end = bno + len; if (irec->br_startoff + irec->br_blockcount <= bno || irec->br_startoff >= end) { irec->br_blockcount = 0; return; } if (irec->br_startoff < bno) { distance = bno - irec->br_startoff; if (isnullstartblock(irec->br_startblock)) irec->br_startblock = DELAYSTARTBLOCK; if (irec->br_startblock != DELAYSTARTBLOCK && irec->br_startblock != HOLESTARTBLOCK) irec->br_startblock += distance; irec->br_startoff += distance; irec->br_blockcount -= distance; } if (end < irec->br_startoff + irec->br_blockcount) { distance = irec->br_startoff + irec->br_blockcount - end; irec->br_blockcount -= distance; } } /* * Trim the returned map to the required bounds */ STATIC void xfs_bmapi_trim_map( struct xfs_bmbt_irec *mval, struct xfs_bmbt_irec *got, xfs_fileoff_t *bno, xfs_filblks_t len, xfs_fileoff_t obno, xfs_fileoff_t end, int n, uint32_t flags) { if ((flags & XFS_BMAPI_ENTIRE) || got->br_startoff + got->br_blockcount <= obno) { *mval = *got; if (isnullstartblock(got->br_startblock)) mval->br_startblock = DELAYSTARTBLOCK; return; } if (obno > *bno) *bno = obno; ASSERT((*bno >= obno) || (n == 0)); ASSERT(*bno < end); mval->br_startoff = *bno; if (isnullstartblock(got->br_startblock)) mval->br_startblock = DELAYSTARTBLOCK; else mval->br_startblock = got->br_startblock + (*bno - got->br_startoff); /* * Return the minimum of what we got and what we asked for for * the length. 
We can use the len variable here because it is * modified below and we could have been there before coming * here if the first part of the allocation didn't overlap what * was asked for. */ mval->br_blockcount = XFS_FILBLKS_MIN(end - *bno, got->br_blockcount - (*bno - got->br_startoff)); mval->br_state = got->br_state; ASSERT(mval->br_blockcount <= len); return; } /* * Update and validate the extent map to return */ STATIC void xfs_bmapi_update_map( struct xfs_bmbt_irec **map, xfs_fileoff_t *bno, xfs_filblks_t *len, xfs_fileoff_t obno, xfs_fileoff_t end, int *n, uint32_t flags) { xfs_bmbt_irec_t *mval = *map; ASSERT((flags & XFS_BMAPI_ENTIRE) || ((mval->br_startoff + mval->br_blockcount) <= end)); ASSERT((flags & XFS_BMAPI_ENTIRE) || (mval->br_blockcount <= *len) || (mval->br_startoff < obno)); *bno = mval->br_startoff + mval->br_blockcount; *len = end - *bno; if (*n > 0 && mval->br_startoff == mval[-1].br_startoff) { /* update previous map with new information */ ASSERT(mval->br_startblock == mval[-1].br_startblock); ASSERT(mval->br_blockcount > mval[-1].br_blockcount); ASSERT(mval->br_state == mval[-1].br_state); mval[-1].br_blockcount = mval->br_blockcount; mval[-1].br_state = mval->br_state; } else if (*n > 0 && mval->br_startblock != DELAYSTARTBLOCK && mval[-1].br_startblock != DELAYSTARTBLOCK && mval[-1].br_startblock != HOLESTARTBLOCK && mval->br_startblock == mval[-1].br_startblock + mval[-1].br_blockcount && mval[-1].br_state == mval->br_state) { ASSERT(mval->br_startoff == mval[-1].br_startoff + mval[-1].br_blockcount); mval[-1].br_blockcount += mval->br_blockcount; } else if (*n > 0 && mval->br_startblock == DELAYSTARTBLOCK && mval[-1].br_startblock == DELAYSTARTBLOCK && mval->br_startoff == mval[-1].br_startoff + mval[-1].br_blockcount) { mval[-1].br_blockcount += mval->br_blockcount; mval[-1].br_state = mval->br_state; } else if (!((*n == 0) && ((mval->br_startoff + mval->br_blockcount) <= obno))) { mval++; (*n)++; } *map = mval; } /* * Map file blocks to filesystem blocks without allocation. */ int xfs_bmapi_read( struct xfs_inode *ip, xfs_fileoff_t bno, xfs_filblks_t len, struct xfs_bmbt_irec *mval, int *nmap, uint32_t flags) { struct xfs_mount *mp = ip->i_mount; int whichfork = xfs_bmapi_whichfork(flags); struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec got; xfs_fileoff_t obno; xfs_fileoff_t end; struct xfs_iext_cursor icur; int error; bool eof = false; int n = 0; ASSERT(*nmap >= 1); ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK | XFS_BMAPI_ENTIRE))); xfs_assert_ilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL); if (WARN_ON_ONCE(!ifp)) { xfs_bmap_mark_sick(ip, whichfork); return -EFSCORRUPTED; } if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) || XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) { xfs_bmap_mark_sick(ip, whichfork); return -EFSCORRUPTED; } if (xfs_is_shutdown(mp)) return -EIO; XFS_STATS_INC(mp, xs_blk_mapr); error = xfs_iread_extents(NULL, ip, whichfork); if (error) return error; if (!xfs_iext_lookup_extent(ip, ifp, bno, &icur, &got)) eof = true; end = bno + len; obno = bno; while (bno < end && n < *nmap) { /* Reading past eof, act as though there's a hole up to end. */ if (eof) got.br_startoff = end; if (got.br_startoff > bno) { /* Reading in a hole. */ mval->br_startoff = bno; mval->br_startblock = HOLESTARTBLOCK; mval->br_blockcount = XFS_FILBLKS_MIN(len, got.br_startoff - bno); mval->br_state = XFS_EXT_NORM; bno += mval->br_blockcount; len -= mval->br_blockcount; mval++; n++; continue; } /* set up the extent map to return. 
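	 * xfs_bmapi_trim_map() clips the record to the requested range and
	 * xfs_bmapi_update_map() merges it with the previous entry when the
	 * two are contiguous.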
*/ xfs_bmapi_trim_map(mval, &got, &bno, len, obno, end, n, flags); xfs_bmapi_update_map(&mval, &bno, &len, obno, end, &n, flags); /* If we're done, stop now. */ if (bno >= end || n >= *nmap) break; /* Else go on to the next record. */ if (!xfs_iext_next_extent(ifp, &icur, &got)) eof = true; } *nmap = n; return 0; } static int xfs_bmapi_allocate( struct xfs_bmalloca *bma) { struct xfs_mount *mp = bma->ip->i_mount; int whichfork = xfs_bmapi_whichfork(bma->flags); struct xfs_ifork *ifp = xfs_ifork_ptr(bma->ip, whichfork); int error; ASSERT(bma->length > 0); ASSERT(bma->length <= XFS_MAX_BMBT_EXTLEN); if (bma->flags & XFS_BMAPI_CONTIG) bma->minlen = bma->length; else bma->minlen = 1; if (!(bma->flags & XFS_BMAPI_METADATA)) { /* * For the data and COW fork, the first data in the file is * treated differently to all other allocations. For the * attribute fork, we only need to ensure the allocated range * is not on the busy list. */ bma->datatype = XFS_ALLOC_NOBUSY; if (whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK) { bma->datatype |= XFS_ALLOC_USERDATA; if (bma->offset == 0) bma->datatype |= XFS_ALLOC_INITIAL_USER_DATA; if (mp->m_dalign && bma->length >= mp->m_dalign) { error = xfs_bmap_isaeof(bma, whichfork); if (error) return error; } } } if ((bma->datatype & XFS_ALLOC_USERDATA) && XFS_IS_REALTIME_INODE(bma->ip)) error = xfs_bmap_rtalloc(bma); else error = xfs_bmap_btalloc(bma); if (error) return error; if (bma->blkno == NULLFSBLOCK) return -ENOSPC; if (WARN_ON_ONCE(!xfs_valid_startblock(bma->ip, bma->blkno))) { xfs_bmap_mark_sick(bma->ip, whichfork); return -EFSCORRUPTED; } if (bma->flags & XFS_BMAPI_ZERO) { error = xfs_zero_extent(bma->ip, bma->blkno, bma->length); if (error) return error; } if (ifp->if_format == XFS_DINODE_FMT_BTREE && !bma->cur) bma->cur = xfs_bmbt_init_cursor(mp, bma->tp, bma->ip, whichfork); /* * Bump the number of extents we've allocated * in this call. */ bma->nallocs++; if (bma->cur && bma->wasdel) bma->cur->bc_flags |= XFS_BTREE_BMBT_WASDEL; bma->got.br_startoff = bma->offset; bma->got.br_startblock = bma->blkno; bma->got.br_blockcount = bma->length; bma->got.br_state = XFS_EXT_NORM; if (bma->flags & XFS_BMAPI_PREALLOC) bma->got.br_state = XFS_EXT_UNWRITTEN; if (bma->wasdel) error = xfs_bmap_add_extent_delay_real(bma, whichfork); else error = xfs_bmap_add_extent_hole_real(bma->tp, bma->ip, whichfork, &bma->icur, &bma->cur, &bma->got, &bma->logflags, bma->flags); if (error) return error; /* * Update our extent pointer, given that xfs_bmap_add_extent_delay_real * or xfs_bmap_add_extent_hole_real might have merged it into one of * the neighbouring ones. 
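* The refreshed record may therefore start before bma->offset and end past * the requested range, which is exactly what the asserts below allow for.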
*/ xfs_iext_get_extent(ifp, &bma->icur, &bma->got); ASSERT(bma->got.br_startoff <= bma->offset); ASSERT(bma->got.br_startoff + bma->got.br_blockcount >= bma->offset + bma->length); ASSERT(bma->got.br_state == XFS_EXT_NORM || bma->got.br_state == XFS_EXT_UNWRITTEN); return 0; } STATIC int xfs_bmapi_convert_unwritten( struct xfs_bmalloca *bma, struct xfs_bmbt_irec *mval, xfs_filblks_t len, uint32_t flags) { int whichfork = xfs_bmapi_whichfork(flags); struct xfs_ifork *ifp = xfs_ifork_ptr(bma->ip, whichfork); int tmp_logflags = 0; int error; /* check if we need to do unwritten->real conversion */ if (mval->br_state == XFS_EXT_UNWRITTEN && (flags & XFS_BMAPI_PREALLOC)) return 0; /* check if we need to do real->unwritten conversion */ if (mval->br_state == XFS_EXT_NORM && (flags & (XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT)) != (XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT)) return 0; /* * Modify (by adding) the state flag, if writing. */ ASSERT(mval->br_blockcount <= len); if (ifp->if_format == XFS_DINODE_FMT_BTREE && !bma->cur) { bma->cur = xfs_bmbt_init_cursor(bma->ip->i_mount, bma->tp, bma->ip, whichfork); } mval->br_state = (mval->br_state == XFS_EXT_UNWRITTEN) ? XFS_EXT_NORM : XFS_EXT_UNWRITTEN; /* * Before insertion into the bmbt, zero the range being converted * if required. */ if (flags & XFS_BMAPI_ZERO) { error = xfs_zero_extent(bma->ip, mval->br_startblock, mval->br_blockcount); if (error) return error; } error = xfs_bmap_add_extent_unwritten_real(bma->tp, bma->ip, whichfork, &bma->icur, &bma->cur, mval, &tmp_logflags); /* * Log the inode core unconditionally in the unwritten extent conversion * path because the conversion might not have done so (e.g., if the * extent count hasn't changed). We need to make sure the inode is dirty * in the transaction for the sake of fsync(), even if nothing has * changed, because fsync() will not force the log for this transaction * unless it sees the inode pinned. * * Note: If we're only converting cow fork extents, there aren't * any on-disk updates to make, so we don't need to log anything. */ if (whichfork != XFS_COW_FORK) bma->logflags |= tmp_logflags | XFS_ILOG_CORE; if (error) return error; /* * Update our extent pointer, given that * xfs_bmap_add_extent_unwritten_real might have merged it into one * of the neighbouring ones. */ xfs_iext_get_extent(ifp, &bma->icur, &bma->got); /* * We may have combined previously unwritten space with written space, * so generate another request. */ if (mval->br_blockcount < len) return -EAGAIN; return 0; } xfs_extlen_t xfs_bmapi_minleft( struct xfs_trans *tp, struct xfs_inode *ip, int fork) { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, fork); if (tp && tp->t_highest_agno != NULLAGNUMBER) return 0; if (ifp->if_format != XFS_DINODE_FMT_BTREE) return 1; return be16_to_cpu(ifp->if_broot->bb_level) + 1; } /* * Log whatever the flags say, even if error. Otherwise we might miss detecting * a case where the data is changed, there's an error, and it's not logged so we * don't shutdown when we should. Don't bother logging extents/btree changes if * we converted to the other format. 
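* Concretely: the extent-record flag is dropped once the fork is no longer in * extents format, and the btree-root flag is dropped once the fork is no * longer in btree format.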
*/ static void xfs_bmapi_finish( struct xfs_bmalloca *bma, int whichfork, int error) { struct xfs_ifork *ifp = xfs_ifork_ptr(bma->ip, whichfork); if ((bma->logflags & xfs_ilog_fext(whichfork)) && ifp->if_format != XFS_DINODE_FMT_EXTENTS) bma->logflags &= ~xfs_ilog_fext(whichfork); else if ((bma->logflags & xfs_ilog_fbroot(whichfork)) && ifp->if_format != XFS_DINODE_FMT_BTREE) bma->logflags &= ~xfs_ilog_fbroot(whichfork); if (bma->logflags) xfs_trans_log_inode(bma->tp, bma->ip, bma->logflags); if (bma->cur) xfs_btree_del_cursor(bma->cur, error); } /* * Map file blocks to filesystem blocks, and allocate blocks or convert the * extent state if necessary. Detailed behaviour is controlled by the flags * parameter. Only allocates blocks from a single allocation group, to avoid * locking problems. * * Returns 0 on success and places the extent mappings in mval. nmaps is used * as an input/output parameter where the caller specifies the maximum number * of mappings that may be returned and xfs_bmapi_write passes back the number * of mappings (including existing mappings) it found. * * Returns a negative error code on failure, including -ENOSPC when it could not * allocate any blocks and -ENOSR when it did allocate blocks to convert a * delalloc range, but those blocks were before the passed in range. */ int xfs_bmapi_write( struct xfs_trans *tp, /* transaction pointer */ struct xfs_inode *ip, /* incore inode */ xfs_fileoff_t bno, /* starting file offs. mapped */ xfs_filblks_t len, /* length to map in file */ uint32_t flags, /* XFS_BMAPI_... */ xfs_extlen_t total, /* total blocks needed */ struct xfs_bmbt_irec *mval, /* output: map values */ int *nmap) /* i/o: mval size/count */ { struct xfs_bmalloca bma = { .tp = tp, .ip = ip, .total = total, }; struct xfs_mount *mp = ip->i_mount; int whichfork = xfs_bmapi_whichfork(flags); struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); xfs_fileoff_t end; /* end of mapped file region */ bool eof = false; /* after the end of extents */ int error; /* error return */ int n; /* current extent index */ xfs_fileoff_t obno; /* old block number (offset) */ #ifdef DEBUG xfs_fileoff_t orig_bno; /* original block number value */ int orig_flags; /* original flags arg value */ xfs_filblks_t orig_len; /* original value of len arg */ struct xfs_bmbt_irec *orig_mval; /* original value of mval */ int orig_nmap; /* original value of *nmap */ orig_bno = bno; orig_len = len; orig_flags = flags; orig_mval = mval; orig_nmap = *nmap; #endif ASSERT(*nmap >= 1); ASSERT(*nmap <= XFS_BMAP_MAX_NMAP); ASSERT(tp != NULL); ASSERT(len > 0); ASSERT(ifp->if_format != XFS_DINODE_FMT_LOCAL); xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); ASSERT(!(flags & XFS_BMAPI_REMAP)); /* zeroing is currently only for data extents, not metadata */ ASSERT((flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ZERO)) != (XFS_BMAPI_METADATA | XFS_BMAPI_ZERO)); /* * we can allocate unwritten extents or pre-zero allocated blocks, * but it makes no sense to do both at once. This would result in * zeroing the unwritten extent twice while still leaving it an * unwritten extent.
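* (For example, passing XFS_BMAPI_PREALLOC | XFS_BMAPI_ZERO together trips * the assert below, while either flag on its own is fine.)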
*/ ASSERT((flags & (XFS_BMAPI_PREALLOC | XFS_BMAPI_ZERO)) != (XFS_BMAPI_PREALLOC | XFS_BMAPI_ZERO)); if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) || XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) { xfs_bmap_mark_sick(ip, whichfork); return -EFSCORRUPTED; } if (xfs_is_shutdown(mp)) return -EIO; XFS_STATS_INC(mp, xs_blk_mapw); error = xfs_iread_extents(tp, ip, whichfork); if (error) goto error0; if (!xfs_iext_lookup_extent(ip, ifp, bno, &bma.icur, &bma.got)) eof = true; if (!xfs_iext_peek_prev_extent(ifp, &bma.icur, &bma.prev)) bma.prev.br_startoff = NULLFILEOFF; bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork); n = 0; end = bno + len; obno = bno; while (bno < end && n < *nmap) { bool need_alloc = false, wasdelay = false; /* in hole or beyond EOF? */ if (eof || bma.got.br_startoff > bno) { /* * CoW fork conversions should /never/ hit EOF or * holes. There should always be something for us * to work on. */ ASSERT(!((flags & XFS_BMAPI_CONVERT) && (flags & XFS_BMAPI_COWFORK))); need_alloc = true; } else if (isnullstartblock(bma.got.br_startblock)) { wasdelay = true; } /* * First, deal with the hole before the allocated space * that we found, if any. */ if (need_alloc || wasdelay) { bma.eof = eof; bma.conv = !!(flags & XFS_BMAPI_CONVERT); bma.wasdel = wasdelay; bma.offset = bno; bma.flags = flags; /* * There's a 32/64 bit type mismatch between the * allocation length request (which can be 64 bits in * length) and the bma length request, which is * xfs_extlen_t and therefore 32 bits. Hence we have to * be careful and do the min() using the larger type to * avoid overflows. */ bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN); if (wasdelay) { bma.length = XFS_FILBLKS_MIN(bma.length, bma.got.br_blockcount - (bno - bma.got.br_startoff)); } else { if (!eof) bma.length = XFS_FILBLKS_MIN(bma.length, bma.got.br_startoff - bno); } ASSERT(bma.length > 0); error = xfs_bmapi_allocate(&bma); if (error) { /* * If we already allocated space in a previous * iteration, return what we got so far when * running out of space. */ if (error == -ENOSPC && bma.nallocs) break; goto error0; } /* * If this is a CoW allocation, record the data in * the refcount btree for orphan recovery. */ if (whichfork == XFS_COW_FORK) xfs_refcount_alloc_cow_extent(tp, XFS_IS_REALTIME_INODE(ip), bma.blkno, bma.length); } /* Deal with the allocated space we found. */ xfs_bmapi_trim_map(mval, &bma.got, &bno, len, obno, end, n, flags); /* Execute unwritten extent conversion if necessary */ error = xfs_bmapi_convert_unwritten(&bma, mval, len, flags); if (error == -EAGAIN) continue; if (error) goto error0; /* update the extent map to return */ xfs_bmapi_update_map(&mval, &bno, &len, obno, end, &n, flags); /* * If we're done, stop now. Stop when we've allocated * XFS_BMAP_MAX_NMAP extents no matter what. Otherwise * the transaction may get too big. */ if (bno >= end || n >= *nmap || bma.nallocs >= *nmap) break; /* Else go on to the next record. */ bma.prev = bma.got; if (!xfs_iext_next_extent(ifp, &bma.icur, &bma.got)) eof = true; } error = xfs_bmap_btree_to_extents(tp, ip, bma.cur, &bma.logflags, whichfork); if (error) goto error0; ASSERT(ifp->if_format != XFS_DINODE_FMT_BTREE || ifp->if_nextents > XFS_IFORK_MAXEXT(ip, whichfork)); xfs_bmapi_finish(&bma, whichfork, 0); xfs_bmap_validate_ret(orig_bno, orig_len, orig_flags, orig_mval, orig_nmap, n); /* * When converting delayed allocations, xfs_bmapi_allocate ignores * the passed in bno and always converts from the start of the found * delalloc extent.
* * To avoid a successful return with *nmap set to 0, return the magic * -ENOSR error code for this particular case so that the caller can * handle it. */ if (!n) { ASSERT(bma.nallocs >= *nmap); return -ENOSR; } *nmap = n; return 0; error0: xfs_bmapi_finish(&bma, whichfork, error); return error; } /* * Convert an existing delalloc extent to real blocks based on file offset. This * attempts to allocate the entire delalloc extent and may require multiple * invocations to allocate the target offset if a large enough physical extent * is not available. */ static int xfs_bmapi_convert_one_delalloc( struct xfs_inode *ip, int whichfork, xfs_off_t offset, struct iomap *iomap, unsigned int *seq) { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_mount *mp = ip->i_mount; xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); struct xfs_bmalloca bma = { NULL }; uint16_t flags = 0; struct xfs_trans *tp; int error; if (whichfork == XFS_COW_FORK) flags |= IOMAP_F_SHARED; /* * Space for the extent and indirect blocks was reserved when the * delalloc extent was created so there's no need to do so here. */ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, XFS_TRANS_RESERVE, &tp); if (error) return error; xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0); error = xfs_iext_count_extend(tp, ip, whichfork, XFS_IEXT_ADD_NOSPLIT_CNT); if (error) goto out_trans_cancel; if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &bma.icur, &bma.got) || bma.got.br_startoff > offset_fsb) { /* * No extent found in the range we are trying to convert. This * should only happen for the COW fork, where another thread * might have moved the extent to the data fork in the meantime. */ WARN_ON_ONCE(whichfork != XFS_COW_FORK); error = -EAGAIN; goto out_trans_cancel; } /* * If we find a real extent here, we raced with another thread converting * the extent. Just return the real extent at this offset. */ if (!isnullstartblock(bma.got.br_startblock)) { xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags, xfs_iomap_inode_sequence(ip, flags)); if (seq) *seq = READ_ONCE(ifp->if_seq); goto out_trans_cancel; } bma.tp = tp; bma.ip = ip; bma.wasdel = true; bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork); /* * Always allocate/convert from the start of the delalloc extent even if * that is outside the passed in range to create large contiguous * extents on disk. */ bma.offset = bma.got.br_startoff; bma.length = bma.got.br_blockcount; /* * When we're converting the delalloc reservations backing dirty pages * in the page cache, we must be careful about how we create the new * extents: * * New CoW fork extents are created unwritten, turned into real extents * when we're about to write the data to disk, and mapped into the data * fork after the write finishes. End of story. * * New data fork extents must be mapped in as unwritten and converted * to real extents after the write succeeds to avoid exposing stale * disk contents if we crash.
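* The XFS_BMAPI_PREALLOC flag set just below is what makes the new extent * come out in unwritten state.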
*/ bma.flags = XFS_BMAPI_PREALLOC; if (whichfork == XFS_COW_FORK) bma.flags |= XFS_BMAPI_COWFORK; if (!xfs_iext_peek_prev_extent(ifp, &bma.icur, &bma.prev)) bma.prev.br_startoff = NULLFILEOFF; error = xfs_bmapi_allocate(&bma); if (error) goto out_finish; XFS_STATS_ADD(mp, xs_xstrat_bytes, XFS_FSB_TO_B(mp, bma.length)); XFS_STATS_INC(mp, xs_xstrat_quick); ASSERT(!isnullstartblock(bma.got.br_startblock)); xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags, xfs_iomap_inode_sequence(ip, flags)); if (seq) *seq = READ_ONCE(ifp->if_seq); if (whichfork == XFS_COW_FORK) xfs_refcount_alloc_cow_extent(tp, XFS_IS_REALTIME_INODE(ip), bma.blkno, bma.length); error = xfs_bmap_btree_to_extents(tp, ip, bma.cur, &bma.logflags, whichfork); if (error) goto out_finish; xfs_bmapi_finish(&bma, whichfork, 0); error = xfs_trans_commit(tp); xfs_iunlock(ip, XFS_ILOCK_EXCL); return error; out_finish: xfs_bmapi_finish(&bma, whichfork, error); out_trans_cancel: xfs_trans_cancel(tp); xfs_iunlock(ip, XFS_ILOCK_EXCL); return error; } /* * Pass in a delalloc extent and convert it to real extents, returning the real * extent that maps offset_fsb in iomap. */ int xfs_bmapi_convert_delalloc( struct xfs_inode *ip, int whichfork, loff_t offset, struct iomap *iomap, unsigned int *seq) { int error; /* * Attempt to allocate whatever delalloc extent currently backs offset * and put the result into iomap. Allocate in a loop because it may * take several attempts to allocate real blocks for a contiguous * delalloc extent if free space is sufficiently fragmented. */ do { error = xfs_bmapi_convert_one_delalloc(ip, whichfork, offset, iomap, seq); if (error) return error; } while (iomap->offset + iomap->length <= offset); return 0; } int xfs_bmapi_remap( struct xfs_trans *tp, struct xfs_inode *ip, xfs_fileoff_t bno, xfs_filblks_t len, xfs_fsblock_t startblock, uint32_t flags) { struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp; struct xfs_btree_cur *cur = NULL; struct xfs_bmbt_irec got; struct xfs_iext_cursor icur; int whichfork = xfs_bmapi_whichfork(flags); int logflags = 0, error; ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(len > 0); ASSERT(len <= (xfs_filblks_t)XFS_MAX_BMBT_EXTLEN); xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK | XFS_BMAPI_PREALLOC | XFS_BMAPI_NORMAP))); ASSERT((flags & (XFS_BMAPI_ATTRFORK | XFS_BMAPI_PREALLOC)) != (XFS_BMAPI_ATTRFORK | XFS_BMAPI_PREALLOC)); if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) || XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) { xfs_bmap_mark_sick(ip, whichfork); return -EFSCORRUPTED; } if (xfs_is_shutdown(mp)) return -EIO; error = xfs_iread_extents(tp, ip, whichfork); if (error) return error; if (xfs_iext_lookup_extent(ip, ifp, bno, &icur, &got)) { /* make sure we only reflink into a hole.
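(the destination range must already have been unmapped, so any record found here has to start beyond the range being remapped)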
*/ ASSERT(got.br_startoff > bno); ASSERT(got.br_startoff - bno >= len); } ip->i_nblocks += len; ip->i_delayed_blks -= len; /* see xfs_bmap_defer_add */ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); if (ifp->if_format == XFS_DINODE_FMT_BTREE) cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); got.br_startoff = bno; got.br_startblock = startblock; got.br_blockcount = len; if (flags & XFS_BMAPI_PREALLOC) got.br_state = XFS_EXT_UNWRITTEN; else got.br_state = XFS_EXT_NORM; error = xfs_bmap_add_extent_hole_real(tp, ip, whichfork, &icur, &cur, &got, &logflags, flags); if (error) goto error0; error = xfs_bmap_btree_to_extents(tp, ip, cur, &logflags, whichfork); error0: if (ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS) logflags &= ~XFS_ILOG_DEXT; else if (ip->i_df.if_format != XFS_DINODE_FMT_BTREE) logflags &= ~XFS_ILOG_DBROOT; if (logflags) xfs_trans_log_inode(tp, ip, logflags); if (cur) xfs_btree_del_cursor(cur, error); return error; } /* * When a delalloc extent is split (e.g., due to a hole punch), the original * indlen reservation must be shared across the two new extents that are left * behind. * * Given the original reservation and the worst case indlen for the two new * extents (as calculated by xfs_bmap_worst_indlen()), split the original * reservation fairly across the two new extents. Any blocks stolen from the * deleted extent to make up a reservation deficiency (e.g., if ores == 1) have * already been folded into ores by the caller, which also remains responsible * for the availability and subsequent accounting of stolen blocks. */ static void xfs_bmap_split_indlen( xfs_filblks_t ores, /* original res. */ xfs_filblks_t *indlen1, /* ext1 worst indlen */ xfs_filblks_t *indlen2) /* ext2 worst indlen */ { xfs_filblks_t len1 = *indlen1; xfs_filblks_t len2 = *indlen2; xfs_filblks_t nres = len1 + len2; /* new total res. */ xfs_filblks_t resfactor; /* * We can't meet the total required reservation for the two extents. * Calculate the percent of the overall shortage between both extents * and apply this percentage to each of the requested indlen values. * This distributes the shortage fairly and reduces the chances that one * of the two extents is left with nothing when extents are repeatedly * split. */ resfactor = (ores * 100); do_div(resfactor, nres); len1 *= resfactor; do_div(len1, 100); len2 *= resfactor; do_div(len2, 100); ASSERT(len1 + len2 <= ores); ASSERT(len1 < *indlen1 && len2 < *indlen2); /* * Hand out the remainder to each extent. If one of the two reservations * is zero, we want to make sure that one gets a block first. The loop * below starts with len1, so hand len2 a block right off the bat if it * is zero.
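* For example, ores = 5 with worst-case indlens of 4 and 4 gives * resfactor = 62; the percentage pass hands each extent 2 blocks, and the * loop below gives the one remaining block to len1 for a final 3/2 split.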
*/ ores -= (len1 + len2); ASSERT((*indlen1 - len1) + (*indlen2 - len2) >= ores); if (ores && !len2 && *indlen2) { len2++; ores--; } while (ores) { if (len1 < *indlen1) { len1++; ores--; } if (!ores) break; if (len2 < *indlen2) { len2++; ores--; } } *indlen1 = len1; *indlen2 = len2; } void xfs_bmap_del_extent_delay( struct xfs_inode *ip, int whichfork, struct xfs_iext_cursor *icur, struct xfs_bmbt_irec *got, struct xfs_bmbt_irec *del, uint32_t bflags) /* bmapi flags */ { struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec new; int64_t da_old, da_new, da_diff = 0; xfs_fileoff_t del_endoff, got_endoff; xfs_filblks_t got_indlen, new_indlen, stolen = 0; uint32_t state = xfs_bmap_fork_to_state(whichfork); uint64_t fdblocks; bool isrt; XFS_STATS_INC(mp, xs_del_exlist); isrt = xfs_ifork_is_realtime(ip, whichfork); del_endoff = del->br_startoff + del->br_blockcount; got_endoff = got->br_startoff + got->br_blockcount; da_old = startblockval(got->br_startblock); da_new = 0; ASSERT(del->br_blockcount > 0); ASSERT(got->br_startoff <= del->br_startoff); ASSERT(got_endoff >= del_endoff); /* * Update the inode delalloc counter now and wait to update the * sb counters as we might have to borrow some blocks for the * indirect block accounting. */ xfs_quota_unreserve_blkres(ip, del->br_blockcount); ip->i_delayed_blks -= del->br_blockcount; if (got->br_startoff == del->br_startoff) state |= BMAP_LEFT_FILLING; if (got_endoff == del_endoff) state |= BMAP_RIGHT_FILLING; switch (state & (BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING)) { case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING: /* * Matches the whole extent. Delete the entry. */ xfs_iext_remove(ip, icur, state); xfs_iext_prev(ifp, icur); break; case BMAP_LEFT_FILLING: /* * Deleting the first part of the extent. */ got->br_startoff = del_endoff; got->br_blockcount -= del->br_blockcount; da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, got->br_blockcount), da_old); got->br_startblock = nullstartblock((int)da_new); xfs_iext_update_extent(ip, state, icur, got); break; case BMAP_RIGHT_FILLING: /* * Deleting the last part of the extent. */ got->br_blockcount = got->br_blockcount - del->br_blockcount; da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, got->br_blockcount), da_old); got->br_startblock = nullstartblock((int)da_new); xfs_iext_update_extent(ip, state, icur, got); break; case 0: /* * Deleting the middle of the extent. * * Distribute the original indlen reservation across the two new * extents. Steal blocks from the deleted extent if necessary. * Stealing blocks simply fudges the fdblocks accounting below. * Warn if either of the new indlen reservations is zero as this * can lead to delalloc problems. */ got->br_blockcount = del->br_startoff - got->br_startoff; got_indlen = xfs_bmap_worst_indlen(ip, got->br_blockcount); new.br_blockcount = got_endoff - del_endoff; new_indlen = xfs_bmap_worst_indlen(ip, new.br_blockcount); WARN_ON_ONCE(!got_indlen || !new_indlen); /* * Steal as many blocks as we can to try and satisfy the worst * case indlen for both new extents. * * However, we can't just steal reservations from the data * blocks if this is an RT inode, as the data and metadata * blocks come from different pools. We'll have to live with an * under-filled indirect reservation in this case.
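* Blocks stolen here are deducted from del->br_blockcount before the * fdblocks update below, so they stay reserved for indlen instead of being * returned to free space.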
*/ da_new = got_indlen + new_indlen; if (da_new > da_old && !isrt) { stolen = XFS_FILBLKS_MIN(da_new - da_old, del->br_blockcount); da_old += stolen; } if (da_new > da_old) xfs_bmap_split_indlen(da_old, &got_indlen, &new_indlen); da_new = got_indlen + new_indlen; got->br_startblock = nullstartblock((int)got_indlen); new.br_startoff = del_endoff; new.br_state = got->br_state; new.br_startblock = nullstartblock((int)new_indlen); xfs_iext_update_extent(ip, state, icur, got); xfs_iext_next(ifp, icur); xfs_iext_insert(ip, icur, &new, state); del->br_blockcount -= stolen; break; } ASSERT(da_old >= da_new); da_diff = da_old - da_new; fdblocks = da_diff; if (bflags & XFS_BMAPI_REMAP) { ; } else if (isrt) { xfs_rtbxlen_t rtxlen; rtxlen = xfs_blen_to_rtbxlen(mp, del->br_blockcount); if (xfs_is_zoned_inode(ip)) xfs_zoned_add_available(mp, rtxlen); xfs_add_frextents(mp, rtxlen); } else { fdblocks += del->br_blockcount; } xfs_add_fdblocks(mp, fdblocks); xfs_mod_delalloc(ip, -(int64_t)del->br_blockcount, -da_diff); } void xfs_bmap_del_extent_cow( struct xfs_inode *ip, struct xfs_iext_cursor *icur, struct xfs_bmbt_irec *got, struct xfs_bmbt_irec *del) { struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK); struct xfs_bmbt_irec new; xfs_fileoff_t del_endoff, got_endoff; uint32_t state = BMAP_COWFORK; XFS_STATS_INC(mp, xs_del_exlist); del_endoff = del->br_startoff + del->br_blockcount; got_endoff = got->br_startoff + got->br_blockcount; ASSERT(del->br_blockcount > 0); ASSERT(got->br_startoff <= del->br_startoff); ASSERT(got_endoff >= del_endoff); ASSERT(!isnullstartblock(got->br_startblock)); if (got->br_startoff == del->br_startoff) state |= BMAP_LEFT_FILLING; if (got_endoff == del_endoff) state |= BMAP_RIGHT_FILLING; switch (state & (BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING)) { case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING: /* * Matches the whole extent. Delete the entry. */ xfs_iext_remove(ip, icur, state); xfs_iext_prev(ifp, icur); break; case BMAP_LEFT_FILLING: /* * Deleting the first part of the extent. */ got->br_startoff = del_endoff; got->br_blockcount -= del->br_blockcount; got->br_startblock = del->br_startblock + del->br_blockcount; xfs_iext_update_extent(ip, state, icur, got); break; case BMAP_RIGHT_FILLING: /* * Deleting the last part of the extent. */ got->br_blockcount -= del->br_blockcount; xfs_iext_update_extent(ip, state, icur, got); break; case 0: /* * Deleting the middle of the extent. */ got->br_blockcount = del->br_startoff - got->br_startoff; new.br_startoff = del_endoff; new.br_blockcount = got_endoff - del_endoff; new.br_state = got->br_state; new.br_startblock = del->br_startblock + del->br_blockcount; xfs_iext_update_extent(ip, state, icur, got); xfs_iext_next(ifp, icur); xfs_iext_insert(ip, icur, &new, state); break; } ip->i_delayed_blks -= del->br_blockcount; } static int xfs_bmap_free_rtblocks( struct xfs_trans *tp, struct xfs_bmbt_irec *del) { struct xfs_rtgroup *rtg; int error; rtg = xfs_rtgroup_grab(tp->t_mountp, 0); if (!rtg) return -EIO; /* * Ensure the bitmap and summary inodes are locked and joined to the * transaction before modifying them. */ if (!(tp->t_flags & XFS_TRANS_RTBITMAP_LOCKED)) { tp->t_flags |= XFS_TRANS_RTBITMAP_LOCKED; xfs_rtgroup_lock(rtg, XFS_RTGLOCK_BITMAP); xfs_rtgroup_trans_join(tp, rtg, XFS_RTGLOCK_BITMAP); } error = xfs_rtfree_blocks(tp, rtg, del->br_startblock, del->br_blockcount); xfs_rtgroup_rele(rtg); return error; } /* * Called by xfs_bmapi to update file extent records and the btree * after removing space. 
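* As with the delalloc variant above, the deleted range covers either the * whole extent, its first part, its last part, or its middle; the middle * case splits the record in two and may need a btree insert.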
*/ STATIC int /* error */ xfs_bmap_del_extent_real( xfs_inode_t *ip, /* incore inode pointer */ xfs_trans_t *tp, /* current transaction pointer */ struct xfs_iext_cursor *icur, struct xfs_btree_cur *cur, /* if null, not a btree */ xfs_bmbt_irec_t *del, /* data to remove from extents */ int *logflagsp, /* inode logging flags */ int whichfork, /* data or attr fork */ uint32_t bflags) /* bmapi flags */ { xfs_fsblock_t del_endblock=0; /* first block past del */ xfs_fileoff_t del_endoff; /* first offset past del */ int error = 0; /* error return value */ struct xfs_bmbt_irec got; /* current extent entry */ xfs_fileoff_t got_endoff; /* first offset past got */ int i; /* temp state */ struct xfs_ifork *ifp; /* inode fork pointer */ xfs_mount_t *mp; /* mount structure */ xfs_filblks_t nblks; /* quota/sb block count */ xfs_bmbt_irec_t new; /* new record to be inserted */ /* REFERENCED */ uint qfield; /* quota field to update */ uint32_t state = xfs_bmap_fork_to_state(whichfork); struct xfs_bmbt_irec old; *logflagsp = 0; mp = ip->i_mount; XFS_STATS_INC(mp, xs_del_exlist); ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(del->br_blockcount > 0); xfs_iext_get_extent(ifp, icur, &got); ASSERT(got.br_startoff <= del->br_startoff); del_endoff = del->br_startoff + del->br_blockcount; got_endoff = got.br_startoff + got.br_blockcount; ASSERT(got_endoff >= del_endoff); ASSERT(!isnullstartblock(got.br_startblock)); qfield = 0; /* * If it's the case where the directory code is running with no block * reservation, and the deleted block is in the middle of its extent, * and the resulting insert of an extent would cause transformation to * btree format, then reject it. The calling code will then swap blocks * around instead. We have to do this now, rather than waiting for the * conversion to btree format, since the transaction will be dirty then. */ if (tp->t_blk_res == 0 && ifp->if_format == XFS_DINODE_FMT_EXTENTS && ifp->if_nextents >= XFS_IFORK_MAXEXT(ip, whichfork) && del->br_startoff > got.br_startoff && del_endoff < got_endoff) return -ENOSPC; *logflagsp = XFS_ILOG_CORE; if (xfs_ifork_is_realtime(ip, whichfork)) qfield = XFS_TRANS_DQ_RTBCOUNT; else qfield = XFS_TRANS_DQ_BCOUNT; nblks = del->br_blockcount; del_endblock = del->br_startblock + del->br_blockcount; if (cur) { error = xfs_bmbt_lookup_eq(cur, &got, &i); if (error) return error; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } } if (got.br_startoff == del->br_startoff) state |= BMAP_LEFT_FILLING; if (got_endoff == del_endoff) state |= BMAP_RIGHT_FILLING; switch (state & (BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING)) { case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING: /* * Matches the whole extent. Delete the entry. */ xfs_iext_remove(ip, icur, state); xfs_iext_prev(ifp, icur); ifp->if_nextents--; *logflagsp |= XFS_ILOG_CORE; if (!cur) { *logflagsp |= xfs_ilog_fext(whichfork); break; } if ((error = xfs_btree_delete(cur, &i))) return error; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } break; case BMAP_LEFT_FILLING: /* * Deleting the first part of the extent. */ got.br_startoff = del_endoff; got.br_startblock = del_endblock; got.br_blockcount -= del->br_blockcount; xfs_iext_update_extent(ip, state, icur, &got); if (!cur) { *logflagsp |= xfs_ilog_fext(whichfork); break; } error = xfs_bmbt_update(cur, &got); if (error) return error; break; case BMAP_RIGHT_FILLING: /* * Deleting the last part of the extent. 
*/ got.br_blockcount -= del->br_blockcount; xfs_iext_update_extent(ip, state, icur, &got); if (!cur) { *logflagsp |= xfs_ilog_fext(whichfork); break; } error = xfs_bmbt_update(cur, &got); if (error) return error; break; case 0: /* * Deleting the middle of the extent. */ old = got; got.br_blockcount = del->br_startoff - got.br_startoff; xfs_iext_update_extent(ip, state, icur, &got); new.br_startoff = del_endoff; new.br_blockcount = got_endoff - del_endoff; new.br_state = got.br_state; new.br_startblock = del_endblock; *logflagsp |= XFS_ILOG_CORE; if (cur) { error = xfs_bmbt_update(cur, &got); if (error) return error; error = xfs_btree_increment(cur, 0, &i); if (error) return error; cur->bc_rec.b = new; error = xfs_btree_insert(cur, &i); if (error && error != -ENOSPC) return error; /* * If we get no-space back from the btree insert, it tried a * split, and we have a zero block reservation. Fix up * our state and return the error. */ if (error == -ENOSPC) { /* * Reset the cursor, don't trust it after any * insert operation. */ error = xfs_bmbt_lookup_eq(cur, &got, &i); if (error) return error; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } /* * Update the btree record back * to the original value. */ error = xfs_bmbt_update(cur, &old); if (error) return error; /* * Reset the extent record back * to the original value. */ xfs_iext_update_extent(ip, state, icur, &old); *logflagsp = 0; return -ENOSPC; } if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } } else *logflagsp |= xfs_ilog_fext(whichfork); ifp->if_nextents++; xfs_iext_next(ifp, icur); xfs_iext_insert(ip, icur, &new, state); break; } /* remove reverse mapping */ xfs_rmap_unmap_extent(tp, ip, whichfork, del); /* * If we need to, add to list of extents to delete. */ if (!(bflags & XFS_BMAPI_REMAP)) { bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, isrt, del); } else if (isrt && !xfs_has_rtgroups(mp)) { error = xfs_bmap_free_rtblocks(tp, del); } else { unsigned int efi_flags = 0; if ((bflags & XFS_BMAPI_NODISCARD) || del->br_state == XFS_EXT_UNWRITTEN) efi_flags |= XFS_FREE_EXTENT_SKIP_DISCARD; /* * Historically, we did not use EFIs to free realtime * extents. However, when reverse mapping is enabled, * we must maintain the same order of operations as the * data device, which is: Remove the file mapping, * remove the reverse mapping, and then free the * blocks. Reflink for realtime volumes requires the * same sort of ordering. Both features rely on * rtgroups, so let's gate rt EFI usage on rtgroups. */ if (isrt) efi_flags |= XFS_FREE_EXTENT_REALTIME; error = xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, XFS_AG_RESV_NONE, efi_flags); } if (error) return error; } /* * Adjust inode # blocks in the file. */ if (nblks) ip->i_nblocks -= nblks; /* * Adjust quota data. */ if (qfield && !(bflags & XFS_BMAPI_REMAP)) xfs_trans_mod_dquot_byino(tp, ip, qfield, (long)-nblks); return 0; } /* * Unmap (remove) blocks from a file. * If nexts is nonzero then the number of extents to remove is limited to * that value. If not all extents in the block range can be removed then * *rlen is updated to the length that remains.
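* Extents are walked from the end of the range backwards, so whatever * cannot be removed is left at the front of the range.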
*/ static int __xfs_bunmapi( struct xfs_trans *tp, /* transaction pointer */ struct xfs_inode *ip, /* incore inode */ xfs_fileoff_t start, /* first file offset deleted */ xfs_filblks_t *rlen, /* i/o: amount remaining */ uint32_t flags, /* misc flags */ xfs_extnum_t nexts) /* number of extents max */ { struct xfs_btree_cur *cur; /* bmap btree cursor */ struct xfs_bmbt_irec del; /* extent being deleted */ int error; /* error return value */ xfs_extnum_t extno; /* extent number in list */ struct xfs_bmbt_irec got; /* current extent record */ struct xfs_ifork *ifp; /* inode fork pointer */ int isrt; /* freeing in rt area */ int logflags; /* transaction logging flags */ xfs_extlen_t mod; /* rt extent offset */ struct xfs_mount *mp = ip->i_mount; int tmp_logflags; /* partial logging flags */ int wasdel; /* was a delayed alloc extent */ int whichfork; /* data or attribute fork */ xfs_filblks_t len = *rlen; /* length to unmap in file */ xfs_fileoff_t end; struct xfs_iext_cursor icur; bool done = false; trace_xfs_bunmap(ip, start, len, flags, _RET_IP_); whichfork = xfs_bmapi_whichfork(flags); ASSERT(whichfork != XFS_COW_FORK); ifp = xfs_ifork_ptr(ip, whichfork); if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp))) { xfs_bmap_mark_sick(ip, whichfork); return -EFSCORRUPTED; } if (xfs_is_shutdown(mp)) return -EIO; xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); ASSERT(len > 0); ASSERT(nexts >= 0); error = xfs_iread_extents(tp, ip, whichfork); if (error) return error; if (xfs_iext_count(ifp) == 0) { *rlen = 0; return 0; } XFS_STATS_INC(mp, xs_blk_unmap); isrt = xfs_ifork_is_realtime(ip, whichfork); end = start + len; if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) { *rlen = 0; return 0; } end--; logflags = 0; if (ifp->if_format == XFS_DINODE_FMT_BTREE) { ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE); cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); } else cur = NULL; extno = 0; while (end != (xfs_fileoff_t)-1 && end >= start && (nexts == 0 || extno < nexts)) { /* * Is the found extent after a hole in which end lives? * Just back up to the previous extent, if so. */ if (got.br_startoff > end && !xfs_iext_prev_extent(ifp, &icur, &got)) { done = true; break; } /* * Is the last block of this extent before the range * we're supposed to delete? If so, we're done. */ end = XFS_FILEOFF_MIN(end, got.br_startoff + got.br_blockcount - 1); if (end < start) break; /* * Then deal with the (possibly delayed) allocated space * we found. */ del = got; wasdel = isnullstartblock(del.br_startblock); if (got.br_startoff < start) { del.br_startoff = start; del.br_blockcount -= start - got.br_startoff; if (!wasdel) del.br_startblock += start - got.br_startoff; } if (del.br_startoff + del.br_blockcount > end + 1) del.br_blockcount = end + 1 - del.br_startoff; if (!isrt || (flags & XFS_BMAPI_REMAP)) goto delete; mod = xfs_rtb_to_rtxoff(mp, del.br_startblock + del.br_blockcount); if (mod) { /* * Realtime extent not lined up at the end. * The extent could have been split into written * and unwritten pieces, or we could just be * unmapping part of it. But we can't really * get rid of part of a realtime extent. */ if (del.br_state == XFS_EXT_UNWRITTEN) { /* * This piece is unwritten, or we're not * using unwritten extents. Skip over it. */ ASSERT((flags & XFS_BMAPI_REMAP) || end >= mod); end -= mod > del.br_blockcount ? del.br_blockcount : mod; if (end < got.br_startoff && !xfs_iext_prev_extent(ifp, &icur, &got)) { done = true; break; } continue; } /* * It's written, turn it unwritten. * This is better than zeroing it. 
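* (Unwritten extents read back as zeroes, so flipping the extent state is a * metadata-only way to get the same effect without issuing any data I/O.)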
*/ ASSERT(del.br_state == XFS_EXT_NORM); ASSERT(tp->t_blk_res > 0); /* * If this spans a realtime extent boundary, * chop it back to the start of the one we end at. */ if (del.br_blockcount > mod) { del.br_startoff += del.br_blockcount - mod; del.br_startblock += del.br_blockcount - mod; del.br_blockcount = mod; } del.br_state = XFS_EXT_UNWRITTEN; error = xfs_bmap_add_extent_unwritten_real(tp, ip, whichfork, &icur, &cur, &del, &logflags); if (error) goto error0; goto nodelete; } mod = xfs_rtb_to_rtxoff(mp, del.br_startblock); if (mod) { xfs_extlen_t off = mp->m_sb.sb_rextsize - mod; /* * Realtime extent is lined up at the end but not * at the front. We'll get rid of full extents if * we can. */ if (del.br_blockcount > off) { del.br_blockcount -= off; del.br_startoff += off; del.br_startblock += off; } else if (del.br_startoff == start && (del.br_state == XFS_EXT_UNWRITTEN || tp->t_blk_res == 0)) { /* * Can't make it unwritten. There isn't * a full extent here so just skip it. */ ASSERT(end >= del.br_blockcount); end -= del.br_blockcount; if (got.br_startoff > end && !xfs_iext_prev_extent(ifp, &icur, &got)) { done = true; break; } continue; } else if (del.br_state == XFS_EXT_UNWRITTEN) { struct xfs_bmbt_irec prev; xfs_fileoff_t unwrite_start; /* * This one is already unwritten. * It must have a written left neighbor. * Unwrite the killed part of that one and * try again. */ if (!xfs_iext_prev_extent(ifp, &icur, &prev)) ASSERT(0); ASSERT(prev.br_state == XFS_EXT_NORM); ASSERT(!isnullstartblock(prev.br_startblock)); ASSERT(del.br_startblock == prev.br_startblock + prev.br_blockcount); unwrite_start = max3(start, del.br_startoff - mod, prev.br_startoff); mod = unwrite_start - prev.br_startoff; prev.br_startoff = unwrite_start; prev.br_startblock += mod; prev.br_blockcount -= mod; prev.br_state = XFS_EXT_UNWRITTEN; error = xfs_bmap_add_extent_unwritten_real(tp, ip, whichfork, &icur, &cur, &prev, &logflags); if (error) goto error0; goto nodelete; } else { ASSERT(del.br_state == XFS_EXT_NORM); del.br_state = XFS_EXT_UNWRITTEN; error = xfs_bmap_add_extent_unwritten_real(tp, ip, whichfork, &icur, &cur, &del, &logflags); if (error) goto error0; goto nodelete; } } delete: if (wasdel) { xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got, &del, flags); } else { error = xfs_bmap_del_extent_real(ip, tp, &icur, cur, &del, &tmp_logflags, whichfork, flags); logflags |= tmp_logflags; if (error) goto error0; } end = del.br_startoff - 1; nodelete: /* * If not done go on to the next (previous) record. */ if (end != (xfs_fileoff_t)-1 && end >= start) { if (!xfs_iext_get_extent(ifp, &icur, &got) || (got.br_startoff > end && !xfs_iext_prev_extent(ifp, &icur, &got))) { done = true; break; } extno++; } } if (done || end == (xfs_fileoff_t)-1 || end < start) *rlen = 0; else *rlen = end - start + 1; /* * Convert to a btree if necessary. */ if (xfs_bmap_needs_btree(ip, whichfork)) { ASSERT(cur == NULL); error = xfs_bmap_extents_to_btree(tp, ip, &cur, 0, &tmp_logflags, whichfork); logflags |= tmp_logflags; } else { error = xfs_bmap_btree_to_extents(tp, ip, cur, &logflags, whichfork); } error0: /* * Log everything. Do this after conversion, there's no point in * logging the extent records if we've converted to btree format. 
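* As in xfs_bmapi_finish(), log flags that no longer match the fork format * are masked off first, immediately below.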
*/ if ((logflags & xfs_ilog_fext(whichfork)) && ifp->if_format != XFS_DINODE_FMT_EXTENTS) logflags &= ~xfs_ilog_fext(whichfork); else if ((logflags & xfs_ilog_fbroot(whichfork)) && ifp->if_format != XFS_DINODE_FMT_BTREE) logflags &= ~xfs_ilog_fbroot(whichfork); /* * Log the inode even in the error case; if the transaction * is dirty we'll need to shut down the filesystem. */ if (logflags) xfs_trans_log_inode(tp, ip, logflags); if (cur) { if (!error) cur->bc_bmap.allocated = 0; xfs_btree_del_cursor(cur, error); } return error; } /* Unmap a range of a file. */ int xfs_bunmapi( xfs_trans_t *tp, struct xfs_inode *ip, xfs_fileoff_t bno, xfs_filblks_t len, uint32_t flags, xfs_extnum_t nexts, int *done) { int error; error = __xfs_bunmapi(tp, ip, bno, &len, flags, nexts); *done = (len == 0); return error; } /* * Determine whether an extent shift can be accomplished by a merge with the * extent that precedes the target hole of the shift. */ STATIC bool xfs_bmse_can_merge( struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *left, /* preceding extent */ struct xfs_bmbt_irec *got, /* current extent to shift */ xfs_fileoff_t shift) /* shift fsb */ { xfs_fileoff_t startoff; startoff = got->br_startoff - shift; /* * The extent, once shifted, must be adjacent in-file and on-disk with * the preceding extent. */ if ((left->br_startoff + left->br_blockcount != startoff) || (left->br_startblock + left->br_blockcount != got->br_startblock) || (left->br_state != got->br_state) || (left->br_blockcount + got->br_blockcount > XFS_MAX_BMBT_EXTLEN) || !xfs_bmap_same_rtgroup(ip, whichfork, left, got)) return false; return true; } /* * A bmap extent shift adjusts the file offset of an extent to fill a preceding * hole in the file. If an extent shift would result in the extent being fully * adjacent to the extent that currently precedes the hole, we can merge with * the preceding extent rather than do the shift. * * This function assumes the caller has verified a shift-by-merge is possible * with the provided extents via xfs_bmse_can_merge(). */ STATIC int xfs_bmse_merge( struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, xfs_fileoff_t shift, /* shift fsb */ struct xfs_iext_cursor *icur, struct xfs_bmbt_irec *got, /* extent to shift */ struct xfs_bmbt_irec *left, /* preceding extent */ struct xfs_btree_cur *cur, int *logflags) /* output */ { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_bmbt_irec new; xfs_filblks_t blockcount; int error, i; struct xfs_mount *mp = ip->i_mount; blockcount = left->br_blockcount + got->br_blockcount; xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL); ASSERT(xfs_bmse_can_merge(ip, whichfork, left, got, shift)); new = *left; new.br_blockcount = blockcount; /* * Update the on-disk extent count, the btree if necessary and log the * inode.
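* The merge nets out to one fewer extent record: 'got' is removed and * 'left' is rewritten to cover both ranges.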
*/ ifp->if_nextents--; *logflags |= XFS_ILOG_CORE; if (!cur) { *logflags |= XFS_ILOG_DEXT; goto done; } /* lookup and remove the extent to merge */ error = xfs_bmbt_lookup_eq(cur, got, &i); if (error) return error; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } error = xfs_btree_delete(cur, &i); if (error) return error; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } /* lookup and update size of the previous extent */ error = xfs_bmbt_lookup_eq(cur, left, &i); if (error) return error; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } error = xfs_bmbt_update(cur, &new); if (error) return error; /* change to extent format if required after extent removal */ error = xfs_bmap_btree_to_extents(tp, ip, cur, logflags, whichfork); if (error) return error; done: xfs_iext_remove(ip, icur, 0); xfs_iext_prev(ifp, icur); xfs_iext_update_extent(ip, xfs_bmap_fork_to_state(whichfork), icur, &new); /* update reverse mapping. rmap functions merge the rmaps for us */ xfs_rmap_unmap_extent(tp, ip, whichfork, got); memcpy(&new, got, sizeof(new)); new.br_startoff = left->br_startoff + left->br_blockcount; xfs_rmap_map_extent(tp, ip, whichfork, &new); return 0; } static int xfs_bmap_shift_update_extent( struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_iext_cursor *icur, struct xfs_bmbt_irec *got, struct xfs_btree_cur *cur, int *logflags, xfs_fileoff_t startoff) { struct xfs_mount *mp = ip->i_mount; struct xfs_bmbt_irec prev = *got; int error, i; *logflags |= XFS_ILOG_CORE; got->br_startoff = startoff; if (cur) { error = xfs_bmbt_lookup_eq(cur, &prev, &i); if (error) return error; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } error = xfs_bmbt_update(cur, got); if (error) return error; } else { *logflags |= XFS_ILOG_DEXT; } xfs_iext_update_extent(ip, xfs_bmap_fork_to_state(whichfork), icur, got); /* update reverse mapping */ xfs_rmap_unmap_extent(tp, ip, whichfork, &prev); xfs_rmap_map_extent(tp, ip, whichfork, got); return 0; } int xfs_bmap_collapse_extents( struct xfs_trans *tp, struct xfs_inode *ip, xfs_fileoff_t *next_fsb, xfs_fileoff_t offset_shift_fsb, bool *done) { int whichfork = XFS_DATA_FORK; struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_cur *cur = NULL; struct xfs_bmbt_irec got, prev; struct xfs_iext_cursor icur; xfs_fileoff_t new_startoff; int error = 0; int logflags = 0; if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) || XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) { xfs_bmap_mark_sick(ip, whichfork); return -EFSCORRUPTED; } if (xfs_is_shutdown(mp)) return -EIO; xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL); error = xfs_iread_extents(tp, ip, whichfork); if (error) return error; if (ifp->if_format == XFS_DINODE_FMT_BTREE) cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); if (!xfs_iext_lookup_extent(ip, ifp, *next_fsb, &icur, &got)) { *done = true; goto del_cursor; } if (XFS_IS_CORRUPT(mp, isnullstartblock(got.br_startblock))) { xfs_bmap_mark_sick(ip, whichfork); error = -EFSCORRUPTED; goto del_cursor; } new_startoff = got.br_startoff - offset_shift_fsb; if (xfs_iext_peek_prev_extent(ifp, &icur, &prev)) { if (new_startoff < prev.br_startoff + prev.br_blockcount) { error = -EINVAL; goto del_cursor; } if (xfs_bmse_can_merge(ip, whichfork, &prev, &got, offset_shift_fsb)) { error = xfs_bmse_merge(tp, ip, whichfork, offset_shift_fsb, &icur, &got, &prev, cur, 
&logflags); if (error) goto del_cursor; goto done; } } else { if (got.br_startoff < offset_shift_fsb) { error = -EINVAL; goto del_cursor; } } error = xfs_bmap_shift_update_extent(tp, ip, whichfork, &icur, &got, cur, &logflags, new_startoff); if (error) goto del_cursor; done: if (!xfs_iext_next_extent(ifp, &icur, &got)) { *done = true; goto del_cursor; } *next_fsb = got.br_startoff; del_cursor: if (cur) xfs_btree_del_cursor(cur, error); if (logflags) xfs_trans_log_inode(tp, ip, logflags); return error; } /* Make sure we won't be right-shifting an extent past the maximum bound. */ int xfs_bmap_can_insert_extents( struct xfs_inode *ip, xfs_fileoff_t off, xfs_fileoff_t shift) { struct xfs_bmbt_irec got; int is_empty; int error = 0; xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL); if (xfs_is_shutdown(ip->i_mount)) return -EIO; xfs_ilock(ip, XFS_ILOCK_EXCL); error = xfs_bmap_last_extent(NULL, ip, XFS_DATA_FORK, &got, &is_empty); if (!error && !is_empty && got.br_startoff >= off && ((got.br_startoff + shift) & BMBT_STARTOFF_MASK) < got.br_startoff) error = -EINVAL; xfs_iunlock(ip, XFS_ILOCK_EXCL); return error; } int xfs_bmap_insert_extents( struct xfs_trans *tp, struct xfs_inode *ip, xfs_fileoff_t *next_fsb, xfs_fileoff_t offset_shift_fsb, bool *done, xfs_fileoff_t stop_fsb) { int whichfork = XFS_DATA_FORK; struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_cur *cur = NULL; struct xfs_bmbt_irec got, next; struct xfs_iext_cursor icur; xfs_fileoff_t new_startoff; int error = 0; int logflags = 0; if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) || XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) { xfs_bmap_mark_sick(ip, whichfork); return -EFSCORRUPTED; } if (xfs_is_shutdown(mp)) return -EIO; xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL); error = xfs_iread_extents(tp, ip, whichfork); if (error) return error; if (ifp->if_format == XFS_DINODE_FMT_BTREE) cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); if (*next_fsb == NULLFSBLOCK) { xfs_iext_last(ifp, &icur); if (!xfs_iext_get_extent(ifp, &icur, &got) || stop_fsb > got.br_startoff) { *done = true; goto del_cursor; } } else { if (!xfs_iext_lookup_extent(ip, ifp, *next_fsb, &icur, &got)) { *done = true; goto del_cursor; } } if (XFS_IS_CORRUPT(mp, isnullstartblock(got.br_startblock))) { xfs_bmap_mark_sick(ip, whichfork); error = -EFSCORRUPTED; goto del_cursor; } if (XFS_IS_CORRUPT(mp, stop_fsb > got.br_startoff)) { xfs_bmap_mark_sick(ip, whichfork); error = -EFSCORRUPTED; goto del_cursor; } new_startoff = got.br_startoff + offset_shift_fsb; if (xfs_iext_peek_next_extent(ifp, &icur, &next)) { if (new_startoff + got.br_blockcount > next.br_startoff) { error = -EINVAL; goto del_cursor; } /* * Unlike a left shift (which involves a hole punch), a right * shift does not modify extent neighbors in any way. We should * never find mergeable extents in this scenario. Check anyways * and warn if we encounter two extents that could be one. 
*/ if (xfs_bmse_can_merge(ip, whichfork, &got, &next, offset_shift_fsb)) WARN_ON_ONCE(1); } error = xfs_bmap_shift_update_extent(tp, ip, whichfork, &icur, &got, cur, &logflags, new_startoff); if (error) goto del_cursor; if (!xfs_iext_prev_extent(ifp, &icur, &got) || stop_fsb >= got.br_startoff + got.br_blockcount) { *done = true; goto del_cursor; } *next_fsb = got.br_startoff; del_cursor: if (cur) xfs_btree_del_cursor(cur, error); if (logflags) xfs_trans_log_inode(tp, ip, logflags); return error; } /* * Split an extent into two extents at the block split_fsb, such that split_fsb * becomes the first block of the new second extent. If split_fsb lies in a * hole or at the first block of an extent, just return 0. */ int xfs_bmap_split_extent( struct xfs_trans *tp, struct xfs_inode *ip, xfs_fileoff_t split_fsb) { int whichfork = XFS_DATA_FORK; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_cur *cur = NULL; struct xfs_bmbt_irec got; struct xfs_bmbt_irec new; /* split extent */ struct xfs_mount *mp = ip->i_mount; xfs_fsblock_t gotblkcnt; /* new block count for got */ struct xfs_iext_cursor icur; int error = 0; int logflags = 0; int i = 0; if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) || XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) { xfs_bmap_mark_sick(ip, whichfork); return -EFSCORRUPTED; } if (xfs_is_shutdown(mp)) return -EIO; /* Read in all the extents */ error = xfs_iread_extents(tp, ip, whichfork); if (error) return error; /* * If there are no extents, or split_fsb lies in a hole, we are done. */ if (!xfs_iext_lookup_extent(ip, ifp, split_fsb, &icur, &got) || got.br_startoff >= split_fsb) return 0; gotblkcnt = split_fsb - got.br_startoff; new.br_startoff = split_fsb; new.br_startblock = got.br_startblock + gotblkcnt; new.br_blockcount = got.br_blockcount - gotblkcnt; new.br_state = got.br_state; if (ifp->if_format == XFS_DINODE_FMT_BTREE) { cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); error = xfs_bmbt_lookup_eq(cur, &got, &i); if (error) goto del_cursor; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto del_cursor; } } got.br_blockcount = gotblkcnt; xfs_iext_update_extent(ip, xfs_bmap_fork_to_state(whichfork), &icur, &got); logflags = XFS_ILOG_CORE; if (cur) { error = xfs_bmbt_update(cur, &got); if (error) goto del_cursor; } else logflags |= XFS_ILOG_DEXT; /* Add new extent */ xfs_iext_next(ifp, &icur); xfs_iext_insert(ip, &icur, &new, 0); ifp->if_nextents++; if (cur) { error = xfs_bmbt_lookup_eq(cur, &new, &i); if (error) goto del_cursor; if (XFS_IS_CORRUPT(mp, i != 0)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto del_cursor; } error = xfs_btree_insert(cur, &i); if (error) goto del_cursor; if (XFS_IS_CORRUPT(mp, i != 1)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto del_cursor; } } /* * Convert to a btree if necessary. */ if (xfs_bmap_needs_btree(ip, whichfork)) { int tmp_logflags; /* partial log flag return val */ ASSERT(cur == NULL); error = xfs_bmap_extents_to_btree(tp, ip, &cur, 0, &tmp_logflags, whichfork); logflags |= tmp_logflags; } del_cursor: if (cur) { cur->bc_bmap.allocated = 0; xfs_btree_del_cursor(cur, error); } if (logflags) xfs_trans_log_inode(tp, ip, logflags); return error; } /* Record a bmap intent.
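Queued intents are processed later via xfs_bmap_finish_one().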
*/ static inline void __xfs_bmap_add( struct xfs_trans *tp, enum xfs_bmap_intent_type type, struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *bmap) { struct xfs_bmap_intent *bi; if ((whichfork != XFS_DATA_FORK && whichfork != XFS_ATTR_FORK) || bmap->br_startblock == HOLESTARTBLOCK || bmap->br_startblock == DELAYSTARTBLOCK) return; bi = kmem_cache_alloc(xfs_bmap_intent_cache, GFP_KERNEL | __GFP_NOFAIL); INIT_LIST_HEAD(&bi->bi_list); bi->bi_type = type; bi->bi_owner = ip; bi->bi_whichfork = whichfork; bi->bi_bmap = *bmap; xfs_bmap_defer_add(tp, bi); } /* Map an extent into a file. */ void xfs_bmap_map_extent( struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *PREV) { __xfs_bmap_add(tp, XFS_BMAP_MAP, ip, whichfork, PREV); } /* Unmap an extent out of a file. */ void xfs_bmap_unmap_extent( struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *PREV) { __xfs_bmap_add(tp, XFS_BMAP_UNMAP, ip, whichfork, PREV); } /* * Process one of the deferred bmap operations. We pass back the * btree cursor to maintain our lock on the bmapbt between calls. */ int xfs_bmap_finish_one( struct xfs_trans *tp, struct xfs_bmap_intent *bi) { struct xfs_bmbt_irec *bmap = &bi->bi_bmap; int error = 0; int flags = 0; if (bi->bi_whichfork == XFS_ATTR_FORK) flags |= XFS_BMAPI_ATTRFORK; ASSERT(tp->t_highest_agno == NULLAGNUMBER); trace_xfs_bmap_deferred(bi); if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_BMAP_FINISH_ONE)) return -EIO; switch (bi->bi_type) { case XFS_BMAP_MAP: if (bi->bi_bmap.br_state == XFS_EXT_UNWRITTEN) flags |= XFS_BMAPI_PREALLOC; error = xfs_bmapi_remap(tp, bi->bi_owner, bmap->br_startoff, bmap->br_blockcount, bmap->br_startblock, flags); bmap->br_blockcount = 0; break; case XFS_BMAP_UNMAP: error = __xfs_bunmapi(tp, bi->bi_owner, bmap->br_startoff, &bmap->br_blockcount, flags | XFS_BMAPI_REMAP, 1); break; default: ASSERT(0); xfs_bmap_mark_sick(bi->bi_owner, bi->bi_whichfork); error = -EFSCORRUPTED; } return error; } /* Check that an extent does not have invalid flags or bad ranges. */ xfs_failaddr_t xfs_bmap_validate_extent_raw( struct xfs_mount *mp, bool rtfile, int whichfork, struct xfs_bmbt_irec *irec) { if (!xfs_verify_fileext(mp, irec->br_startoff, irec->br_blockcount)) return __this_address; if (rtfile && whichfork == XFS_DATA_FORK) { if (!xfs_verify_rtbext(mp, irec->br_startblock, irec->br_blockcount)) return __this_address; } else { if (!xfs_verify_fsbext(mp, irec->br_startblock, irec->br_blockcount)) return __this_address; } if (irec->br_state != XFS_EXT_NORM && whichfork != XFS_DATA_FORK) return __this_address; return NULL; } int __init xfs_bmap_intent_init_cache(void) { xfs_bmap_intent_cache = kmem_cache_create("xfs_bmap_intent", sizeof(struct xfs_bmap_intent), 0, 0, NULL); return xfs_bmap_intent_cache != NULL ? 0 : -ENOMEM; } void xfs_bmap_intent_destroy_cache(void) { kmem_cache_destroy(xfs_bmap_intent_cache); xfs_bmap_intent_cache = NULL; } /* Check that an inode's extent does not have invalid flags or bad ranges. */ xfs_failaddr_t xfs_bmap_validate_extent( struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *irec) { return xfs_bmap_validate_extent_raw(ip->i_mount, XFS_IS_REALTIME_INODE(ip), whichfork, irec); } /* * Used in xfs_itruncate_extents(). This is the maximum number of extents * freed from a file in a single transaction. */ #define XFS_ITRUNC_MAX_EXTENTS 2 /* * Unmap every extent in part of an inode's fork. We don't do any higher level * invalidation work at all. 
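* Each pass unmaps at most XFS_ITRUNC_MAX_EXTENTS extents and then finishes * the deferred work, which also rolls the transaction, so that no single * transaction grows too large.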
*/ int xfs_bunmapi_range( struct xfs_trans **tpp, struct xfs_inode *ip, uint32_t flags, xfs_fileoff_t startoff, xfs_fileoff_t endoff) { xfs_filblks_t unmap_len = endoff - startoff + 1; int error = 0; xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); while (unmap_len > 0) { ASSERT((*tpp)->t_highest_agno == NULLAGNUMBER); error = __xfs_bunmapi(*tpp, ip, startoff, &unmap_len, flags, XFS_ITRUNC_MAX_EXTENTS); if (error) goto out; /* free the just unmapped extents */ error = xfs_defer_finish(tpp); if (error) goto out; cond_resched(); } out: return error; } struct xfs_bmap_query_range { xfs_bmap_query_range_fn fn; void *priv; }; /* Format btree record and pass to our callback. */ STATIC int xfs_bmap_query_range_helper( struct xfs_btree_cur *cur, const union xfs_btree_rec *rec, void *priv) { struct xfs_bmap_query_range *query = priv; struct xfs_bmbt_irec irec; xfs_failaddr_t fa; xfs_bmbt_disk_get_all(&rec->bmbt, &irec); fa = xfs_bmap_validate_extent(cur->bc_ino.ip, cur->bc_ino.whichfork, &irec); if (fa) { xfs_btree_mark_sick(cur); return xfs_bmap_complain_bad_rec(cur->bc_ino.ip, cur->bc_ino.whichfork, fa, &irec); } return query->fn(cur, &irec, query->priv); } /* Find all bmaps. */ int xfs_bmap_query_all( struct xfs_btree_cur *cur, xfs_bmap_query_range_fn fn, void *priv) { struct xfs_bmap_query_range query = { .priv = priv, .fn = fn, }; return xfs_btree_query_all(cur, xfs_bmap_query_range_helper, &query); } /* Helper function to extract extent size hint from inode */ xfs_extlen_t xfs_get_extsz_hint( struct xfs_inode *ip) { /* * No point in aligning allocations if we need to COW to actually * write to them. */ if (!xfs_is_always_cow_inode(ip) && (ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize) return ip->i_extsize; if (XFS_IS_REALTIME_INODE(ip) && ip->i_mount->m_sb.sb_rextsize > 1) return ip->i_mount->m_sb.sb_rextsize; return 0; } /* * Helper function to extract CoW extent size hint from inode. * Between the extent size hint and the CoW extent size hint, we * return the greater of the two. If the value is zero (automatic), * use the default size. */ xfs_extlen_t xfs_get_cowextsz_hint( struct xfs_inode *ip) { xfs_extlen_t a, b; a = 0; if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) a = ip->i_cowextsize; if (XFS_IS_REALTIME_INODE(ip)) { b = 0; if (ip->i_diflags & XFS_DIFLAG_EXTSIZE) b = ip->i_extsize; } else { b = xfs_get_extsz_hint(ip); } a = max(a, b); if (a == 0) return XFS_DEFAULT_COWEXTSZ_HINT; return a; } |
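/*
 * Editor's illustrative sketch (not part of xfs_bmap.c): the hint selection
 * in xfs_get_cowextsz_hint() above boils down to "take the larger of the CoW
 * and regular extent size hints, and fall back to a default when both are
 * zero".  A minimal stand-alone model of that rule; the demo default value
 * is made up and merely stands in for XFS_DEFAULT_COWEXTSZ_HINT.
 */
#include <stdint.h>
#include <stdio.h>

#define DEMO_DEFAULT_COWEXTSZ_HINT 32	/* stand-in, not the kernel's value */

static uint32_t demo_cowextsz_hint(uint32_t cowextsize, uint32_t extsize)
{
	/* Between the two hints, the greater one wins. */
	uint32_t hint = cowextsize > extsize ? cowextsize : extsize;

	/* Zero means "automatic": use the default size. */
	return hint ? hint : DEMO_DEFAULT_COWEXTSZ_HINT;
}

int main(void)
{
	printf("%u\n", demo_cowextsz_hint(0, 0));	/* 32: default */
	printf("%u\n", demo_cowextsz_hint(0, 64));	/* 64: extsize wins */
	printf("%u\n", demo_cowextsz_hint(128, 64));	/* 128: cowextsize wins */
	return 0;
}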
// SPDX-License-Identifier: GPL-2.0 #include <linux/slab.h> #include "messages.h" #include "subpage.h" #include "btrfs_inode.h" /* * Subpage (block size < folio size) support overview: * * Limitations: * * - Only support 64K page size for now * This is to make metadata handling easier, as
a 64K page ensures * every nodesize fits inside one page, thus we don't need to handle * cases where a tree block crosses several pages. * * - Only metadata read-write for now * The data read-write part is in development. * * - Metadata can't cross 64K page boundary * btrfs-progs and kernel have done that for a while, thus only ancient * filesystems could have such a problem. For such cases, do a graceful * rejection. * * Special behavior: * * - Metadata * Metadata read is fully supported. * Reading one tree block will only trigger the read for the * needed range; other unrelated ranges in the same page will not be touched. * * Metadata write support is partial. * The writeback is still for the full page, but we will only submit * the dirty extent buffers in the page. * * This means, if we have a metadata page like this: * * Page offset * 0 16K 32K 48K 64K * |/////////| |///////////| * \- Tree block A \- Tree block B * * Even if we just want to writeback tree block A, we will also writeback * tree block B if it's also dirty. * * This may cause extra metadata writeback, which results in more COW. * * Implementation: * * - Common * Both metadata and data will use a new structure, btrfs_subpage, to * record the status of each sector inside a page. This provides the extra * granularity needed. * * - Metadata * Since we have multiple tree blocks inside one page, we can't rely on page * locking anymore, or we will have greatly reduced concurrency or even * deadlocks (hold one tree lock while trying to lock another tree lock in * the same page). * * Thus for metadata locking, subpage support relies on io_tree locking only. * This means a slightly higher tree locking latency. */ int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info, struct folio *folio, enum btrfs_subpage_type type) { struct btrfs_subpage *subpage; /* For metadata we don't support large folio yet. */ if (type == BTRFS_SUBPAGE_METADATA) ASSERT(!folio_test_large(folio)); /* * We have cases like a dummy extent buffer page, which is not mapped * and doesn't need to be locked. */ if (folio->mapping) ASSERT(folio_test_locked(folio)); /* Either not subpage, or the folio already has private attached. */ if (folio_test_private(folio)) return 0; if (type == BTRFS_SUBPAGE_METADATA && !btrfs_meta_is_subpage(fs_info)) return 0; if (type == BTRFS_SUBPAGE_DATA && !btrfs_is_subpage(fs_info, folio)) return 0; subpage = btrfs_alloc_subpage(fs_info, folio_size(folio), type); if (IS_ERR(subpage)) return PTR_ERR(subpage); folio_attach_private(folio, subpage); return 0; } void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info, struct folio *folio, enum btrfs_subpage_type type) { struct btrfs_subpage *subpage; /* Either not subpage, or the folio has no private attached; nothing to detach. 
*/ if (!folio_test_private(folio)) return; if (type == BTRFS_SUBPAGE_METADATA && !btrfs_meta_is_subpage(fs_info)) return; if (type == BTRFS_SUBPAGE_DATA && !btrfs_is_subpage(fs_info, folio)) return; subpage = folio_detach_private(folio); ASSERT(subpage); btrfs_free_subpage(subpage); } struct btrfs_subpage *btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info, size_t fsize, enum btrfs_subpage_type type) { struct btrfs_subpage *ret; unsigned int real_size; ASSERT(fs_info->sectorsize < fsize); real_size = struct_size(ret, bitmaps, BITS_TO_LONGS(btrfs_bitmap_nr_max * (fsize >> fs_info->sectorsize_bits))); ret = kzalloc(real_size, GFP_NOFS); if (!ret) return ERR_PTR(-ENOMEM); spin_lock_init(&ret->lock); if (type == BTRFS_SUBPAGE_METADATA) atomic_set(&ret->eb_refs, 0); else atomic_set(&ret->nr_locked, 0); return ret; } void btrfs_free_subpage(struct btrfs_subpage *subpage) { kfree(subpage); } /* * Increase the eb_refs of the current subpage. * * This is important for eb allocation, to prevent a race with the last eb of * the same page being freed. * With the eb_refs increased before the eb is inserted into the radix tree, * detach_extent_buffer_page() won't detach the folio private while we're still * allocating the extent buffer. */ void btrfs_folio_inc_eb_refs(const struct btrfs_fs_info *fs_info, struct folio *folio) { struct btrfs_subpage *subpage; if (!btrfs_meta_is_subpage(fs_info)) return; ASSERT(folio_test_private(folio) && folio->mapping); lockdep_assert_held(&folio->mapping->i_private_lock); subpage = folio_get_private(folio); atomic_inc(&subpage->eb_refs); } void btrfs_folio_dec_eb_refs(const struct btrfs_fs_info *fs_info, struct folio *folio) { struct btrfs_subpage *subpage; if (!btrfs_meta_is_subpage(fs_info)) return; ASSERT(folio_test_private(folio) && folio->mapping); lockdep_assert_held(&folio->mapping->i_private_lock); subpage = folio_get_private(folio); ASSERT(atomic_read(&subpage->eb_refs)); atomic_dec(&subpage->eb_refs); } static void btrfs_subpage_assert(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { /* Basic checks */ ASSERT(folio_test_private(folio) && folio_get_private(folio)); ASSERT(IS_ALIGNED(start, fs_info->sectorsize) && IS_ALIGNED(len, fs_info->sectorsize)); /* * The range check only works for mapped pages; we can still have * unmapped pages like dummy extent buffer pages. */ if (folio->mapping) ASSERT(folio_pos(folio) <= start && start + len <= folio_pos(folio) + folio_size(folio)); } #define subpage_calc_start_bit(fs_info, folio, name, start, len) \ ({ \ unsigned int __start_bit; \ const unsigned int blocks_per_folio = \ btrfs_blocks_per_folio(fs_info, folio); \ \ btrfs_subpage_assert(fs_info, folio, start, len); \ __start_bit = offset_in_folio(folio, start) >> fs_info->sectorsize_bits; \ __start_bit += blocks_per_folio * btrfs_bitmap_nr_##name; \ __start_bit; \ }) static void btrfs_subpage_clamp_range(struct folio *folio, u64 *start, u32 *len) { u64 orig_start = *start; u32 orig_len = *len; *start = max_t(u64, folio_pos(folio), orig_start); /* * For certain call sites like btrfs_drop_pages(), we may have pages * beyond the target range. In that case, just set @len to 0; the subpage * helpers can handle @len == 0 without any problem. 
*/ if (folio_pos(folio) >= orig_start + orig_len) *len = 0; else *len = min_t(u64, folio_pos(folio) + folio_size(folio), orig_start + orig_len) - *start; } static bool btrfs_subpage_end_and_test_lock(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); const int start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len); const int nbits = (len >> fs_info->sectorsize_bits); unsigned long flags; unsigned int cleared = 0; int bit = start_bit; bool last; btrfs_subpage_assert(fs_info, folio, start, len); spin_lock_irqsave(&subpage->lock, flags); /* * We have call sites passing @locked_page into * extent_clear_unlock_delalloc() for the compression path. * * This @locked_page is locked by plain lock_page(), thus its * subpage::locked is 0. Handle it in a special way. */ if (atomic_read(&subpage->nr_locked) == 0) { spin_unlock_irqrestore(&subpage->lock, flags); return true; } for_each_set_bit_from(bit, subpage->bitmaps, start_bit + nbits) { clear_bit(bit, subpage->bitmaps); cleared++; } ASSERT(atomic_read(&subpage->nr_locked) >= cleared); last = atomic_sub_and_test(cleared, &subpage->nr_locked); spin_unlock_irqrestore(&subpage->lock, flags); return last; } /* * Handle different locked folios: * * - Non-subpage folio * Just unlock it. * * - folio locked but without any subpage locked * This happens either before writepage_delalloc() or when the delalloc range * was already handled by a previous folio. * We can simply unlock it. * * - folio locked with subpage range locked. * We go through the locked sectors inside the range, clear them in the locked * bitmap, reduce the writer lock number, and unlock the page if that was * the last locked range. */ void btrfs_folio_end_lock(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); ASSERT(folio_test_locked(folio)); if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio)) { folio_unlock(folio); return; } /* * For the subpage case, there are two types of locked folios: with or * without a locked number. * * Since we own the page lock, no one else could touch subpage::locked * and we are safe to do several atomic operations without spinlock. */ if (atomic_read(&subpage->nr_locked) == 0) { /* No subpage lock, locked by plain lock_page(). */ folio_unlock(folio); return; } btrfs_subpage_clamp_range(folio, &start, &len); if (btrfs_subpage_end_and_test_lock(fs_info, folio, start, len)) folio_unlock(folio); } void btrfs_folio_end_lock_bitmap(const struct btrfs_fs_info *fs_info, struct folio *folio, unsigned long bitmap) { struct btrfs_subpage *subpage = folio_get_private(folio); const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio); const int start_bit = blocks_per_folio * btrfs_bitmap_nr_locked; unsigned long flags; bool last = false; int cleared = 0; int bit; if (!btrfs_is_subpage(fs_info, folio)) { folio_unlock(folio); return; } if (atomic_read(&subpage->nr_locked) == 0) { /* No subpage lock, locked by plain lock_page(). 
*/ folio_unlock(folio); return; } spin_lock_irqsave(&subpage->lock, flags); for_each_set_bit(bit, &bitmap, blocks_per_folio) { if (test_and_clear_bit(bit + start_bit, subpage->bitmaps)) cleared++; } ASSERT(atomic_read(&subpage->nr_locked) >= cleared); last = atomic_sub_and_test(cleared, &subpage->nr_locked); spin_unlock_irqrestore(&subpage->lock, flags); if (last) folio_unlock(folio); } #define subpage_test_bitmap_all_set(fs_info, folio, name) \ ({ \ struct btrfs_subpage *subpage = folio_get_private(folio); \ const unsigned int blocks_per_folio = \ btrfs_blocks_per_folio(fs_info, folio); \ \ bitmap_test_range_all_set(subpage->bitmaps, \ blocks_per_folio * btrfs_bitmap_nr_##name, \ blocks_per_folio); \ }) #define subpage_test_bitmap_all_zero(fs_info, folio, name) \ ({ \ struct btrfs_subpage *subpage = folio_get_private(folio); \ const unsigned int blocks_per_folio = \ btrfs_blocks_per_folio(fs_info, folio); \ \ bitmap_test_range_all_zero(subpage->bitmaps, \ blocks_per_folio * btrfs_bitmap_nr_##name, \ blocks_per_folio); \ }) void btrfs_subpage_set_uptodate(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, uptodate, start, len); unsigned long flags; spin_lock_irqsave(&subpage->lock, flags); bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); if (subpage_test_bitmap_all_set(fs_info, folio, uptodate)) folio_mark_uptodate(folio); spin_unlock_irqrestore(&subpage->lock, flags); } void btrfs_subpage_clear_uptodate(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, uptodate, start, len); unsigned long flags; spin_lock_irqsave(&subpage->lock, flags); bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); folio_clear_uptodate(folio); spin_unlock_irqrestore(&subpage->lock, flags); } void btrfs_subpage_set_dirty(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, dirty, start, len); unsigned long flags; spin_lock_irqsave(&subpage->lock, flags); bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); spin_unlock_irqrestore(&subpage->lock, flags); folio_mark_dirty(folio); } /* * Extra clear_and_test function for the subpage dirty bitmap. * * Return true if this call cleared the last set bits in the dirty_bitmap; * return false otherwise. * * NOTE: Callers should manually clear page dirty for the true case, as we have * extra handling for tree blocks. 
*/ bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, dirty, start, len); unsigned long flags; bool last = false; spin_lock_irqsave(&subpage->lock, flags); bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); if (subpage_test_bitmap_all_zero(fs_info, folio, dirty)) last = true; spin_unlock_irqrestore(&subpage->lock, flags); return last; } void btrfs_subpage_clear_dirty(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { bool last; last = btrfs_subpage_clear_and_test_dirty(fs_info, folio, start, len); if (last) folio_clear_dirty_for_io(folio); } void btrfs_subpage_set_writeback(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, writeback, start, len); unsigned long flags; spin_lock_irqsave(&subpage->lock, flags); bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); if (!folio_test_writeback(folio)) folio_start_writeback(folio); spin_unlock_irqrestore(&subpage->lock, flags); } void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, writeback, start, len); unsigned long flags; spin_lock_irqsave(&subpage->lock, flags); bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); if (subpage_test_bitmap_all_zero(fs_info, folio, writeback)) { ASSERT(folio_test_writeback(folio)); folio_end_writeback(folio); } spin_unlock_irqrestore(&subpage->lock, flags); } void btrfs_subpage_set_ordered(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, ordered, start, len); unsigned long flags; spin_lock_irqsave(&subpage->lock, flags); bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); folio_set_ordered(folio); spin_unlock_irqrestore(&subpage->lock, flags); } void btrfs_subpage_clear_ordered(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, ordered, start, len); unsigned long flags; spin_lock_irqsave(&subpage->lock, flags); bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); if (subpage_test_bitmap_all_zero(fs_info, folio, ordered)) folio_clear_ordered(folio); spin_unlock_irqrestore(&subpage->lock, flags); } void btrfs_subpage_set_checked(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, checked, start, len); unsigned long flags; spin_lock_irqsave(&subpage->lock, flags); bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); if (subpage_test_bitmap_all_set(fs_info, folio, checked)) folio_set_checked(folio); spin_unlock_irqrestore(&subpage->lock, flags); } void btrfs_subpage_clear_checked(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage = 
folio_get_private(folio); unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, checked, start, len); unsigned long flags; spin_lock_irqsave(&subpage->lock, flags); bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits); folio_clear_checked(folio); spin_unlock_irqrestore(&subpage->lock, flags); } /* * Unlike set/clear which is dependent on each page status, for test all bits * are tested in the same way. */ #define IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(name) \ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info, \ struct folio *folio, u64 start, u32 len) \ { \ struct btrfs_subpage *subpage = folio_get_private(folio); \ unsigned int start_bit = subpage_calc_start_bit(fs_info, folio, \ name, start, len); \ unsigned long flags; \ bool ret; \ \ spin_lock_irqsave(&subpage->lock, flags); \ ret = bitmap_test_range_all_set(subpage->bitmaps, start_bit, \ len >> fs_info->sectorsize_bits); \ spin_unlock_irqrestore(&subpage->lock, flags); \ return ret; \ } IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered); IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(checked); /* * Note that, in selftests (extent-io-tests), we can have empty fs_info passed * in. We only test sectorsize == PAGE_SIZE cases so far, thus we can fall * back to regular sectorsize branch. */ #define IMPLEMENT_BTRFS_PAGE_OPS(name, folio_set_func, \ folio_clear_func, folio_test_func) \ void btrfs_folio_set_##name(const struct btrfs_fs_info *fs_info, \ struct folio *folio, u64 start, u32 len) \ { \ if (unlikely(!fs_info) || \ !btrfs_is_subpage(fs_info, folio)) { \ folio_set_func(folio); \ return; \ } \ btrfs_subpage_set_##name(fs_info, folio, start, len); \ } \ void btrfs_folio_clear_##name(const struct btrfs_fs_info *fs_info, \ struct folio *folio, u64 start, u32 len) \ { \ if (unlikely(!fs_info) || \ !btrfs_is_subpage(fs_info, folio)) { \ folio_clear_func(folio); \ return; \ } \ btrfs_subpage_clear_##name(fs_info, folio, start, len); \ } \ bool btrfs_folio_test_##name(const struct btrfs_fs_info *fs_info, \ struct folio *folio, u64 start, u32 len) \ { \ if (unlikely(!fs_info) || \ !btrfs_is_subpage(fs_info, folio)) \ return folio_test_func(folio); \ return btrfs_subpage_test_##name(fs_info, folio, start, len); \ } \ void btrfs_folio_clamp_set_##name(const struct btrfs_fs_info *fs_info, \ struct folio *folio, u64 start, u32 len) \ { \ if (unlikely(!fs_info) || \ !btrfs_is_subpage(fs_info, folio)) { \ folio_set_func(folio); \ return; \ } \ btrfs_subpage_clamp_range(folio, &start, &len); \ btrfs_subpage_set_##name(fs_info, folio, start, len); \ } \ void btrfs_folio_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \ struct folio *folio, u64 start, u32 len) \ { \ if (unlikely(!fs_info) || \ !btrfs_is_subpage(fs_info, folio)) { \ folio_clear_func(folio); \ return; \ } \ btrfs_subpage_clamp_range(folio, &start, &len); \ btrfs_subpage_clear_##name(fs_info, folio, start, len); \ } \ bool btrfs_folio_clamp_test_##name(const struct btrfs_fs_info *fs_info, \ struct folio *folio, u64 start, u32 len) \ { \ if (unlikely(!fs_info) || \ !btrfs_is_subpage(fs_info, folio)) \ return folio_test_func(folio); \ btrfs_subpage_clamp_range(folio, &start, &len); \ return btrfs_subpage_test_##name(fs_info, folio, start, len); \ } \ void btrfs_meta_folio_set_##name(struct folio *folio, const struct extent_buffer *eb) \ { \ if (!btrfs_meta_is_subpage(eb->fs_info)) { \ folio_set_func(folio); \ return; \ } \ 
btrfs_subpage_set_##name(eb->fs_info, folio, eb->start, eb->len); \ } \ void btrfs_meta_folio_clear_##name(struct folio *folio, const struct extent_buffer *eb) \ { \ if (!btrfs_meta_is_subpage(eb->fs_info)) { \ folio_clear_func(folio); \ return; \ } \ btrfs_subpage_clear_##name(eb->fs_info, folio, eb->start, eb->len); \ } \ bool btrfs_meta_folio_test_##name(struct folio *folio, const struct extent_buffer *eb) \ { \ if (!btrfs_meta_is_subpage(eb->fs_info)) \ return folio_test_func(folio); \ return btrfs_subpage_test_##name(eb->fs_info, folio, eb->start, eb->len); \ } IMPLEMENT_BTRFS_PAGE_OPS(uptodate, folio_mark_uptodate, folio_clear_uptodate, folio_test_uptodate); IMPLEMENT_BTRFS_PAGE_OPS(dirty, folio_mark_dirty, folio_clear_dirty_for_io, folio_test_dirty); IMPLEMENT_BTRFS_PAGE_OPS(writeback, folio_start_writeback, folio_end_writeback, folio_test_writeback); IMPLEMENT_BTRFS_PAGE_OPS(ordered, folio_set_ordered, folio_clear_ordered, folio_test_ordered); IMPLEMENT_BTRFS_PAGE_OPS(checked, folio_set_checked, folio_clear_checked, folio_test_checked); #define GET_SUBPAGE_BITMAP(fs_info, folio, name, dst) \ { \ const unsigned int blocks_per_folio = \ btrfs_blocks_per_folio(fs_info, folio); \ const struct btrfs_subpage *subpage = folio_get_private(folio); \ \ ASSERT(blocks_per_folio <= BITS_PER_LONG); \ *dst = bitmap_read(subpage->bitmaps, \ blocks_per_folio * btrfs_bitmap_nr_##name, \ blocks_per_folio); \ } #define SUBPAGE_DUMP_BITMAP(fs_info, folio, name, start, len) \ { \ unsigned long bitmap; \ const unsigned int blocks_per_folio = \ btrfs_blocks_per_folio(fs_info, folio); \ \ GET_SUBPAGE_BITMAP(fs_info, folio, name, &bitmap); \ btrfs_warn(fs_info, \ "dumping bitmap start=%llu len=%u folio=%llu " #name "_bitmap=%*pbl", \ start, len, folio_pos(folio), \ blocks_per_folio, &bitmap); \ } /* * Make sure that not only the page dirty bit is cleared, but also the subpage * dirty bit is cleared. */ void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage; unsigned int start_bit; unsigned int nbits; unsigned long flags; if (!IS_ENABLED(CONFIG_BTRFS_ASSERT)) return; if (!btrfs_is_subpage(fs_info, folio)) { ASSERT(!folio_test_dirty(folio)); return; } start_bit = subpage_calc_start_bit(fs_info, folio, dirty, start, len); nbits = len >> fs_info->sectorsize_bits; subpage = folio_get_private(folio); ASSERT(subpage); spin_lock_irqsave(&subpage->lock, flags); if (unlikely(!bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits))) { SUBPAGE_DUMP_BITMAP(fs_info, folio, dirty, start, len); ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); } ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); spin_unlock_irqrestore(&subpage->lock, flags); } /* * This is for a folio already locked by plain lock_page()/folio_lock(), which * doesn't have any subpage awareness. * * This populates the involved subpage ranges so that subpage helpers can * properly unlock them. 
*/ void btrfs_folio_set_lock(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage; unsigned long flags; unsigned int start_bit; unsigned int nbits; int ret; ASSERT(folio_test_locked(folio)); if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio)) return; subpage = folio_get_private(folio); start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len); nbits = len >> fs_info->sectorsize_bits; spin_lock_irqsave(&subpage->lock, flags); /* Target range should not yet be locked. */ if (unlikely(!bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits))) { SUBPAGE_DUMP_BITMAP(fs_info, folio, locked, start, len); ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits)); } bitmap_set(subpage->bitmaps, start_bit, nbits); ret = atomic_add_return(nbits, &subpage->nr_locked); ASSERT(ret <= btrfs_blocks_per_folio(fs_info, folio)); spin_unlock_irqrestore(&subpage->lock, flags); } /* * Clear the dirty flag for the folio. * * If the affected folio is no longer dirty, return true. Otherwise return false. */ bool btrfs_meta_folio_clear_and_test_dirty(struct folio *folio, const struct extent_buffer *eb) { bool last; if (!btrfs_meta_is_subpage(eb->fs_info)) { folio_clear_dirty_for_io(folio); return true; } last = btrfs_subpage_clear_and_test_dirty(eb->fs_info, folio, eb->start, eb->len); if (last) { folio_clear_dirty_for_io(folio); return true; } return false; } void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info, struct folio *folio, u64 start, u32 len) { struct btrfs_subpage *subpage; const unsigned int blocks_per_folio = btrfs_blocks_per_folio(fs_info, folio); unsigned long uptodate_bitmap; unsigned long dirty_bitmap; unsigned long writeback_bitmap; unsigned long ordered_bitmap; unsigned long checked_bitmap; unsigned long locked_bitmap; unsigned long flags; ASSERT(folio_test_private(folio) && folio_get_private(folio)); ASSERT(blocks_per_folio > 1); subpage = folio_get_private(folio); spin_lock_irqsave(&subpage->lock, flags); GET_SUBPAGE_BITMAP(fs_info, folio, uptodate, &uptodate_bitmap); GET_SUBPAGE_BITMAP(fs_info, folio, dirty, &dirty_bitmap); GET_SUBPAGE_BITMAP(fs_info, folio, writeback, &writeback_bitmap); GET_SUBPAGE_BITMAP(fs_info, folio, ordered, &ordered_bitmap); GET_SUBPAGE_BITMAP(fs_info, folio, checked, &checked_bitmap); GET_SUBPAGE_BITMAP(fs_info, folio, locked, &locked_bitmap); spin_unlock_irqrestore(&subpage->lock, flags); dump_page(folio_page(folio, 0), "btrfs subpage dump"); btrfs_warn(fs_info, "start=%llu len=%u page=%llu, bitmaps uptodate=%*pbl dirty=%*pbl locked=%*pbl writeback=%*pbl ordered=%*pbl checked=%*pbl", start, len, folio_pos(folio), blocks_per_folio, &uptodate_bitmap, blocks_per_folio, &dirty_bitmap, blocks_per_folio, &locked_bitmap, blocks_per_folio, &writeback_bitmap, blocks_per_folio, &ordered_bitmap, blocks_per_folio, &checked_bitmap); } void btrfs_get_subpage_dirty_bitmap(struct btrfs_fs_info *fs_info, struct folio *folio, unsigned long *ret_bitmap) { struct btrfs_subpage *subpage; unsigned long flags; ASSERT(folio_test_private(folio) && folio_get_private(folio)); ASSERT(btrfs_blocks_per_folio(fs_info, folio) > 1); subpage = folio_get_private(folio); spin_lock_irqsave(&subpage->lock, flags); GET_SUBPAGE_BITMAP(fs_info, folio, dirty, ret_bitmap); spin_unlock_irqrestore(&subpage->lock, flags); } |
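/*
 * Editor's illustrative sketch (not part of subpage.c): every helper above
 * follows the same pattern -- translate a byte range inside the folio into a
 * bit range inside one of the per-type areas of subpage->bitmaps, mirroring
 * subpage_calc_start_bit().  A stand-alone model assuming hypothetical 4K
 * sectors inside a 64K folio (16 blocks per folio).
 */
#include <stdint.h>
#include <stdio.h>

#define DEMO_SECTORSIZE_BITS	12	/* 4K sectors (assumption) */
#define DEMO_BLOCKS_PER_FOLIO	16	/* 64K folio / 4K sector */

/* bitmap_nr picks the per-type area (uptodate, dirty, ...) in the bitmap. */
static unsigned int demo_calc_start_bit(uint64_t folio_pos, uint64_t start,
					unsigned int bitmap_nr)
{
	unsigned int start_bit = (start - folio_pos) >> DEMO_SECTORSIZE_BITS;

	return start_bit + bitmap_nr * DEMO_BLOCKS_PER_FOLIO;
}

int main(void)
{
	/* Range [16K, 24K) of a folio at file offset 0, in area 1 ("dirty"). */
	unsigned int bit = demo_calc_start_bit(0, 16 * 1024, 1);
	unsigned int nbits = (8 * 1024) >> DEMO_SECTORSIZE_BITS;

	printf("start_bit=%u nbits=%u\n", bit, nbits); /* start_bit=20 nbits=2 */
	return 0;
}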
// SPDX-License-Identifier: GPL-2.0-or-later /* * * Copyright (C) Jonathan Naylor G4KLX (g4klx@g4klx.demon.co.uk) */ #include <linux/errno.h> #include <linux/types.h> #include <linux/socket.h> #include <linux/in.h> #include <linux/kernel.h> #include <linux/timer.h> #include <linux/string.h> #include <linux/sockios.h> #include <linux/net.h> #include <linux/slab.h> #include <net/ax25.h> #include <linux/inet.h> #include <linux/netdevice.h> #include <linux/skbuff.h> #include <net/sock.h> #include <net/tcp_states.h> #include <linux/fcntl.h> #include <linux/mm.h> #include <linux/interrupt.h> #include <net/rose.h> static int rose_create_facilities(unsigned char *buffer, struct rose_sock *rose); /* * This routine purges all of the queues of frames. */ void rose_clear_queues(struct sock *sk) { skb_queue_purge(&sk->sk_write_queue); skb_queue_purge(&rose_sk(sk)->ack_queue); } /* * This routine purges the input queue of those frames that have been * acknowledged. This replaces the boxes labelled "V(a) <- N(r)" on the * SDL diagram. */ void rose_frames_acked(struct sock *sk, unsigned short nr) { struct sk_buff *skb; struct rose_sock *rose = rose_sk(sk); /* * Remove all the ack-ed frames from the ack queue. 
*/ if (rose->va != nr) { while (skb_peek(&rose->ack_queue) != NULL && rose->va != nr) { skb = skb_dequeue(&rose->ack_queue); kfree_skb(skb); rose->va = (rose->va + 1) % ROSE_MODULUS; } } } void rose_requeue_frames(struct sock *sk) { struct sk_buff *skb, *skb_prev = NULL; /* * Requeue all the un-ack-ed frames on the output queue to be picked * up by rose_kick. This arrangement handles the possibility of an * empty output queue. */ while ((skb = skb_dequeue(&rose_sk(sk)->ack_queue)) != NULL) { if (skb_prev == NULL) skb_queue_head(&sk->sk_write_queue, skb); else skb_append(skb_prev, skb, &sk->sk_write_queue); skb_prev = skb; } } /* * Validate that the value of nr is between va and vs. Return true or * false for testing. */ int rose_validate_nr(struct sock *sk, unsigned short nr) { struct rose_sock *rose = rose_sk(sk); unsigned short vc = rose->va; while (vc != rose->vs) { if (nr == vc) return 1; vc = (vc + 1) % ROSE_MODULUS; } return nr == rose->vs; } /* * This routine is called when the packet layer internally generates a * control frame. */ void rose_write_internal(struct sock *sk, int frametype) { struct rose_sock *rose = rose_sk(sk); struct sk_buff *skb; unsigned char *dptr; unsigned char lci1, lci2; int maxfaclen = 0; int len, faclen; int reserve; reserve = AX25_BPQ_HEADER_LEN + AX25_MAX_HEADER_LEN + 1; len = ROSE_MIN_LEN; switch (frametype) { case ROSE_CALL_REQUEST: len += 1 + ROSE_ADDR_LEN + ROSE_ADDR_LEN; maxfaclen = 256; break; case ROSE_CALL_ACCEPTED: case ROSE_CLEAR_REQUEST: case ROSE_RESET_REQUEST: len += 2; break; } skb = alloc_skb(reserve + len + maxfaclen, GFP_ATOMIC); if (!skb) return; /* * Space for AX.25 header and PID. */ skb_reserve(skb, reserve); dptr = skb_put(skb, len); lci1 = (rose->lci >> 8) & 0x0F; lci2 = (rose->lci >> 0) & 0xFF; switch (frametype) { case ROSE_CALL_REQUEST: *dptr++ = ROSE_GFI | lci1; *dptr++ = lci2; *dptr++ = frametype; *dptr++ = ROSE_CALL_REQ_ADDR_LEN_VAL; memcpy(dptr, &rose->dest_addr, ROSE_ADDR_LEN); dptr += ROSE_ADDR_LEN; memcpy(dptr, &rose->source_addr, ROSE_ADDR_LEN); dptr += ROSE_ADDR_LEN; faclen = rose_create_facilities(dptr, rose); skb_put(skb, faclen); dptr += faclen; break; case ROSE_CALL_ACCEPTED: *dptr++ = ROSE_GFI | lci1; *dptr++ = lci2; *dptr++ = frametype; *dptr++ = 0x00; /* Address length */ *dptr++ = 0; /* Facilities length */ break; case ROSE_CLEAR_REQUEST: *dptr++ = ROSE_GFI | lci1; *dptr++ = lci2; *dptr++ = frametype; *dptr++ = rose->cause; *dptr++ = rose->diagnostic; break; case ROSE_RESET_REQUEST: *dptr++ = ROSE_GFI | lci1; *dptr++ = lci2; *dptr++ = frametype; *dptr++ = ROSE_DTE_ORIGINATED; *dptr++ = 0; break; case ROSE_RR: case ROSE_RNR: *dptr++ = ROSE_GFI | lci1; *dptr++ = lci2; *dptr = frametype; *dptr++ |= (rose->vr << 5) & 0xE0; break; case ROSE_CLEAR_CONFIRMATION: case ROSE_RESET_CONFIRMATION: *dptr++ = ROSE_GFI | lci1; *dptr++ = lci2; *dptr++ = frametype; break; default: printk(KERN_ERR "ROSE: rose_write_internal - invalid frametype %02X\n", frametype); kfree_skb(skb); return; } rose_transmit_link(skb, rose->neighbour); } int rose_decode(struct sk_buff *skb, int *ns, int *nr, int *q, int *d, int *m) { unsigned char *frame; frame = skb->data; *ns = *nr = *q = *d = *m = 0; switch (frame[2]) { case ROSE_CALL_REQUEST: case ROSE_CALL_ACCEPTED: case ROSE_CLEAR_REQUEST: case ROSE_CLEAR_CONFIRMATION: case ROSE_RESET_REQUEST: case ROSE_RESET_CONFIRMATION: return frame[2]; default: break; } if ((frame[2] & 0x1F) == ROSE_RR || (frame[2] & 0x1F) == ROSE_RNR) { *nr = (frame[2] >> 5) & 0x07; return frame[2] & 0x1F; } if ((frame[2] & 
0x01) == ROSE_DATA) { *q = (frame[0] & ROSE_Q_BIT) == ROSE_Q_BIT; *d = (frame[0] & ROSE_D_BIT) == ROSE_D_BIT; *m = (frame[2] & ROSE_M_BIT) == ROSE_M_BIT; *nr = (frame[2] >> 5) & 0x07; *ns = (frame[2] >> 1) & 0x07; return ROSE_DATA; } return ROSE_ILLEGAL; } static int rose_parse_national(unsigned char *p, struct rose_facilities_struct *facilities, int len) { unsigned char *pt; unsigned char l, lg, n = 0; int fac_national_digis_received = 0; do { switch (*p & 0xC0) { case 0x00: if (len < 2) return -1; p += 2; n += 2; len -= 2; break; case 0x40: if (len < 3) return -1; if (*p == FAC_NATIONAL_RAND) facilities->rand = ((p[1] << 8) & 0xFF00) + ((p[2] << 0) & 0x00FF); p += 3; n += 3; len -= 3; break; case 0x80: if (len < 4) return -1; p += 4; n += 4; len -= 4; break; case 0xC0: if (len < 2) return -1; l = p[1]; if (len < 2 + l) return -1; if (*p == FAC_NATIONAL_DEST_DIGI) { if (!fac_national_digis_received) { if (l < AX25_ADDR_LEN) return -1; memcpy(&facilities->source_digis[0], p + 2, AX25_ADDR_LEN); facilities->source_ndigis = 1; } } else if (*p == FAC_NATIONAL_SRC_DIGI) { if (!fac_national_digis_received) { if (l < AX25_ADDR_LEN) return -1; memcpy(&facilities->dest_digis[0], p + 2, AX25_ADDR_LEN); facilities->dest_ndigis = 1; } } else if (*p == FAC_NATIONAL_FAIL_CALL) { if (l < AX25_ADDR_LEN) return -1; memcpy(&facilities->fail_call, p + 2, AX25_ADDR_LEN); } else if (*p == FAC_NATIONAL_FAIL_ADD) { if (l < 1 + ROSE_ADDR_LEN) return -1; memcpy(&facilities->fail_addr, p + 3, ROSE_ADDR_LEN); } else if (*p == FAC_NATIONAL_DIGIS) { if (l % AX25_ADDR_LEN) return -1; fac_national_digis_received = 1; facilities->source_ndigis = 0; facilities->dest_ndigis = 0; for (pt = p + 2, lg = 0 ; lg < l ; pt += AX25_ADDR_LEN, lg += AX25_ADDR_LEN) { if (pt[6] & AX25_HBIT) { if (facilities->dest_ndigis >= ROSE_MAX_DIGIS) return -1; memcpy(&facilities->dest_digis[facilities->dest_ndigis++], pt, AX25_ADDR_LEN); } else { if (facilities->source_ndigis >= ROSE_MAX_DIGIS) return -1; memcpy(&facilities->source_digis[facilities->source_ndigis++], pt, AX25_ADDR_LEN); } } } p += l + 2; n += l + 2; len -= l + 2; break; } } while (*p != 0x00 && len > 0); return n; } static int rose_parse_ccitt(unsigned char *p, struct rose_facilities_struct *facilities, int len) { unsigned char l, n = 0; char callsign[11]; do { switch (*p & 0xC0) { case 0x00: if (len < 2) return -1; p += 2; n += 2; len -= 2; break; case 0x40: if (len < 3) return -1; p += 3; n += 3; len -= 3; break; case 0x80: if (len < 4) return -1; p += 4; n += 4; len -= 4; break; case 0xC0: if (len < 2) return -1; l = p[1]; /* Prevent overflows*/ if (l < 10 || l > 20) return -1; if (*p == FAC_CCITT_DEST_NSAP) { memcpy(&facilities->source_addr, p + 7, ROSE_ADDR_LEN); memcpy(callsign, p + 12, l - 10); callsign[l - 10] = '\0'; asc2ax(&facilities->source_call, callsign); } if (*p == FAC_CCITT_SRC_NSAP) { memcpy(&facilities->dest_addr, p + 7, ROSE_ADDR_LEN); memcpy(callsign, p + 12, l - 10); callsign[l - 10] = '\0'; asc2ax(&facilities->dest_call, callsign); } p += l + 2; n += l + 2; len -= l + 2; break; } } while (*p != 0x00 && len > 0); return n; } int rose_parse_facilities(unsigned char *p, unsigned packet_len, struct rose_facilities_struct *facilities) { int facilities_len, len; facilities_len = *p++; if (facilities_len == 0 || (unsigned int)facilities_len > packet_len) return 0; while (facilities_len >= 3 && *p == 0x00) { facilities_len--; p++; switch (*p) { case FAC_NATIONAL: /* National */ len = rose_parse_national(p + 1, facilities, facilities_len - 1); break; case 
FAC_CCITT: /* CCITT */ len = rose_parse_ccitt(p + 1, facilities, facilities_len - 1); break; default: printk(KERN_DEBUG "ROSE: rose_parse_facilities - unknown facilities family %02X\n", *p); len = 1; break; } if (len < 0) return 0; if (WARN_ON(len >= facilities_len)) return 0; facilities_len -= len + 1; p += len + 1; } return facilities_len == 0; } static int rose_create_facilities(unsigned char *buffer, struct rose_sock *rose) { unsigned char *p = buffer + 1; char *callsign; char buf[11]; int len, nb; /* National Facilities */ if (rose->rand != 0 || rose->source_ndigis == 1 || rose->dest_ndigis == 1) { *p++ = 0x00; *p++ = FAC_NATIONAL; if (rose->rand != 0) { *p++ = FAC_NATIONAL_RAND; *p++ = (rose->rand >> 8) & 0xFF; *p++ = (rose->rand >> 0) & 0xFF; } /* Sent before older facilities */ if ((rose->source_ndigis > 0) || (rose->dest_ndigis > 0)) { int maxdigi = 0; *p++ = FAC_NATIONAL_DIGIS; *p++ = AX25_ADDR_LEN * (rose->source_ndigis + rose->dest_ndigis); for (nb = 0 ; nb < rose->source_ndigis ; nb++) { if (++maxdigi >= ROSE_MAX_DIGIS) break; memcpy(p, &rose->source_digis[nb], AX25_ADDR_LEN); p[6] |= AX25_HBIT; p += AX25_ADDR_LEN; } for (nb = 0 ; nb < rose->dest_ndigis ; nb++) { if (++maxdigi >= ROSE_MAX_DIGIS) break; memcpy(p, &rose->dest_digis[nb], AX25_ADDR_LEN); p[6] &= ~AX25_HBIT; p += AX25_ADDR_LEN; } } /* For compatibility */ if (rose->source_ndigis > 0) { *p++ = FAC_NATIONAL_SRC_DIGI; *p++ = AX25_ADDR_LEN; memcpy(p, &rose->source_digis[0], AX25_ADDR_LEN); p += AX25_ADDR_LEN; } /* For compatibility */ if (rose->dest_ndigis > 0) { *p++ = FAC_NATIONAL_DEST_DIGI; *p++ = AX25_ADDR_LEN; memcpy(p, &rose->dest_digis[0], AX25_ADDR_LEN); p += AX25_ADDR_LEN; } } *p++ = 0x00; *p++ = FAC_CCITT; *p++ = FAC_CCITT_DEST_NSAP; callsign = ax2asc(buf, &rose->dest_call); *p++ = strlen(callsign) + 10; *p++ = (strlen(callsign) + 9) * 2; /* ??? */ *p++ = 0x47; *p++ = 0x00; *p++ = 0x11; *p++ = ROSE_ADDR_LEN * 2; memcpy(p, &rose->dest_addr, ROSE_ADDR_LEN); p += ROSE_ADDR_LEN; memcpy(p, callsign, strlen(callsign)); p += strlen(callsign); *p++ = FAC_CCITT_SRC_NSAP; callsign = ax2asc(buf, &rose->source_call); *p++ = strlen(callsign) + 10; *p++ = (strlen(callsign) + 9) * 2; /* ??? */ *p++ = 0x47; *p++ = 0x00; *p++ = 0x11; *p++ = ROSE_ADDR_LEN * 2; memcpy(p, &rose->source_addr, ROSE_ADDR_LEN); p += ROSE_ADDR_LEN; memcpy(p, callsign, strlen(callsign)); p += strlen(callsign); len = p - buffer; buffer[0] = len - 1; return len; } void rose_disconnect(struct sock *sk, int reason, int cause, int diagnostic) { struct rose_sock *rose = rose_sk(sk); rose_stop_timer(sk); rose_stop_idletimer(sk); rose_clear_queues(sk); rose->lci = 0; rose->state = ROSE_STATE_0; if (cause != -1) rose->cause = cause; if (diagnostic != -1) rose->diagnostic = diagnostic; sk->sk_state = TCP_CLOSE; sk->sk_err = reason; sk->sk_shutdown |= SEND_SHUTDOWN; if (!sock_flag(sk, SOCK_DEAD)) { sk->sk_state_change(sk); sock_set_flag(sk, SOCK_DEAD); } } |
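/*
 * Editor's illustrative sketch (not part of rose_subr.c): rose_validate_nr()
 * above accepts any N(R) that lies in the inclusive window [V(A), V(S)]
 * modulo ROSE_MODULUS (8).  A stand-alone model of the same walk.
 */
#include <stdio.h>

#define DEMO_ROSE_MODULUS 8

static int demo_validate_nr(unsigned short va, unsigned short vs,
			    unsigned short nr)
{
	unsigned short vc = va;

	/* Walk from V(A) toward V(S); every value on the way is acceptable. */
	while (vc != vs) {
		if (nr == vc)
			return 1;
		vc = (vc + 1) % DEMO_ROSE_MODULUS;
	}
	return nr == vs;
}

int main(void)
{
	/* va=6, vs=1: the window wraps, so 6, 7, 0 and 1 are all valid. */
	printf("%d %d %d\n", demo_validate_nr(6, 1, 7),
	       demo_validate_nr(6, 1, 1),
	       demo_validate_nr(6, 1, 3));	/* prints: 1 1 0 */
	return 0;
}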
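/*
 * Editor's illustrative sketch: the file below packs each in-core extent
 * record into two 64-bit words -- 54 bits of startoff plus the low 10 bits
 * of startblock in the first word; 21 bits of length, the unwritten bit and
 * the high 42 bits of startblock in the second.  A stand-alone round-trip
 * model of that packing (demo names, not the kernel's).
 */
#include <assert.h>
#include <stdint.h>

struct demo_rec { uint64_t lo, hi; };

static struct demo_rec demo_pack(uint64_t startoff, uint64_t startblock,
				 uint64_t len, int unwritten)
{
	struct demo_rec rec;

	rec.lo = (startoff & ((1ULL << 54) - 1)) | (startblock << 54);
	rec.hi = (len & ((1ULL << 21) - 1)) |
		 ((uint64_t)unwritten << 21) |
		 ((startblock & ~((1ULL << 10) - 1)) << (22 - 10));
	return rec;
}

static uint64_t demo_unpack_startblock(struct demo_rec rec)
{
	/* Low 10 bits from word 0, the rest from bit 22 upward in word 1. */
	return (rec.lo >> 54) | ((rec.hi & ~((1ULL << 22) - 1)) >> (22 - 10));
}

int main(void)
{
	struct demo_rec rec = demo_pack(123, 0xABCDE, 17, 1);

	assert(demo_unpack_startblock(rec) == 0xABCDE);
	assert((rec.lo & ((1ULL << 54) - 1)) == 123);	/* startoff intact */
	assert(((rec.hi >> 21) & 1) == 1);		/* unwritten bit set */
	return 0;
}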
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (c) 2017 Christoph Hellwig. */ #include "xfs.h" #include "xfs_shared.h" #include "xfs_format.h" #include "xfs_bit.h" #include "xfs_log_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" #include "xfs_inode.h" #include "xfs_trace.h" /* * In-core extent record layout: * * +-------+----------------------------+ * | 00:53 | all 54 bits of startoff | * | 54:63 | low 10 bits of startblock | * +-------+----------------------------+ * | 00:20 | all 21 bits of length | * | 21 | unwritten extent bit | * | 22:63 | high 42 bits of startblock | * +-------+----------------------------+ */ #define XFS_IEXT_STARTOFF_MASK xfs_mask64lo(BMBT_STARTOFF_BITLEN) #define XFS_IEXT_LENGTH_MASK xfs_mask64lo(BMBT_BLOCKCOUNT_BITLEN) #define XFS_IEXT_STARTBLOCK_MASK xfs_mask64lo(BMBT_STARTBLOCK_BITLEN) struct xfs_iext_rec { uint64_t lo; uint64_t hi; }; /* * Given that the length can't be zero, only an empty hi value indicates an * unused record. 
*/ static bool xfs_iext_rec_is_empty(struct xfs_iext_rec *rec) { return rec->hi == 0; } static inline void xfs_iext_rec_clear(struct xfs_iext_rec *rec) { rec->lo = 0; rec->hi = 0; } static void xfs_iext_set( struct xfs_iext_rec *rec, struct xfs_bmbt_irec *irec) { ASSERT((irec->br_startoff & ~XFS_IEXT_STARTOFF_MASK) == 0); ASSERT((irec->br_blockcount & ~XFS_IEXT_LENGTH_MASK) == 0); ASSERT((irec->br_startblock & ~XFS_IEXT_STARTBLOCK_MASK) == 0); rec->lo = irec->br_startoff & XFS_IEXT_STARTOFF_MASK; rec->hi = irec->br_blockcount & XFS_IEXT_LENGTH_MASK; rec->lo |= (irec->br_startblock << 54); rec->hi |= ((irec->br_startblock & ~xfs_mask64lo(10)) << (22 - 10)); if (irec->br_state == XFS_EXT_UNWRITTEN) rec->hi |= (1 << 21); } static void xfs_iext_get( struct xfs_bmbt_irec *irec, struct xfs_iext_rec *rec) { irec->br_startoff = rec->lo & XFS_IEXT_STARTOFF_MASK; irec->br_blockcount = rec->hi & XFS_IEXT_LENGTH_MASK; irec->br_startblock = rec->lo >> 54; irec->br_startblock |= (rec->hi & xfs_mask64hi(42)) >> (22 - 10); if (rec->hi & (1 << 21)) irec->br_state = XFS_EXT_UNWRITTEN; else irec->br_state = XFS_EXT_NORM; } enum { NODE_SIZE = 256, KEYS_PER_NODE = NODE_SIZE / (sizeof(uint64_t) + sizeof(void *)), RECS_PER_LEAF = (NODE_SIZE - (2 * sizeof(struct xfs_iext_leaf *))) / sizeof(struct xfs_iext_rec), }; /* * In-core extent btree block layout: * * There are two types of blocks in the btree: leaf and inner (non-leaf) blocks. * * The leaf blocks are made up of RECS_PER_LEAF extent records, each of which * contains the startoff, blockcount, startblock and unwritten extent flag. * See above for the exact format. The records are followed by pointers to the * previous and next leaf blocks (if there are any). * * The inner (non-leaf) blocks first contain KEYS_PER_NODE lookup keys, followed * by an equal number of pointers to the btree blocks at the next lower level. 
* * +-------+-------+-------+-------+-------+----------+----------+ * Leaf: | rec 1 | rec 2 | rec 3 | rec 4 | rec N | prev-ptr | next-ptr | * +-------+-------+-------+-------+-------+----------+----------+ * * +-------+-------+-------+-------+-------+-------+------+-------+ * Inner: | key 1 | key 2 | key 3 | key N | ptr 1 | ptr 2 | ptr3 | ptr N | * +-------+-------+-------+-------+-------+-------+------+-------+ */ struct xfs_iext_node { uint64_t keys[KEYS_PER_NODE]; #define XFS_IEXT_KEY_INVALID (1ULL << 63) void *ptrs[KEYS_PER_NODE]; }; struct xfs_iext_leaf { struct xfs_iext_rec recs[RECS_PER_LEAF]; struct xfs_iext_leaf *prev; struct xfs_iext_leaf *next; }; inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp) { return ifp->if_bytes / sizeof(struct xfs_iext_rec); } static inline int xfs_iext_max_recs(struct xfs_ifork *ifp) { if (ifp->if_height == 1) return xfs_iext_count(ifp); return RECS_PER_LEAF; } static inline struct xfs_iext_rec *cur_rec(struct xfs_iext_cursor *cur) { return &cur->leaf->recs[cur->pos]; } static inline bool xfs_iext_valid(struct xfs_ifork *ifp, struct xfs_iext_cursor *cur) { if (!cur->leaf) return false; if (cur->pos < 0 || cur->pos >= xfs_iext_max_recs(ifp)) return false; if (xfs_iext_rec_is_empty(cur_rec(cur))) return false; return true; } static void * xfs_iext_find_first_leaf( struct xfs_ifork *ifp) { struct xfs_iext_node *node = ifp->if_data; int height; if (!ifp->if_height) return NULL; for (height = ifp->if_height; height > 1; height--) { node = node->ptrs[0]; ASSERT(node); } return node; } static void * xfs_iext_find_last_leaf( struct xfs_ifork *ifp) { struct xfs_iext_node *node = ifp->if_data; int height, i; if (!ifp->if_height) return NULL; for (height = ifp->if_height; height > 1; height--) { for (i = 1; i < KEYS_PER_NODE; i++) if (!node->ptrs[i]) break; node = node->ptrs[i - 1]; ASSERT(node); } return node; } void xfs_iext_first( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur) { cur->pos = 0; cur->leaf = xfs_iext_find_first_leaf(ifp); } void xfs_iext_last( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur) { int i; cur->leaf = xfs_iext_find_last_leaf(ifp); if (!cur->leaf) { cur->pos = 0; return; } for (i = 1; i < xfs_iext_max_recs(ifp); i++) { if (xfs_iext_rec_is_empty(&cur->leaf->recs[i])) break; } cur->pos = i - 1; } void xfs_iext_next( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur) { if (!cur->leaf) { ASSERT(cur->pos <= 0 || cur->pos >= RECS_PER_LEAF); xfs_iext_first(ifp, cur); return; } ASSERT(cur->pos >= 0); ASSERT(cur->pos < xfs_iext_max_recs(ifp)); cur->pos++; if (ifp->if_height > 1 && !xfs_iext_valid(ifp, cur) && cur->leaf->next) { cur->leaf = cur->leaf->next; cur->pos = 0; } } void xfs_iext_prev( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur) { if (!cur->leaf) { ASSERT(cur->pos <= 0 || cur->pos >= RECS_PER_LEAF); xfs_iext_last(ifp, cur); return; } ASSERT(cur->pos >= 0); ASSERT(cur->pos <= RECS_PER_LEAF); recurse: do { cur->pos--; if (xfs_iext_valid(ifp, cur)) return; } while (cur->pos > 0); if (ifp->if_height > 1 && cur->leaf->prev) { cur->leaf = cur->leaf->prev; cur->pos = RECS_PER_LEAF; goto recurse; } } static inline int xfs_iext_key_cmp( struct xfs_iext_node *node, int n, xfs_fileoff_t offset) { if (node->keys[n] > offset) return 1; if (node->keys[n] < offset) return -1; return 0; } static inline int xfs_iext_rec_cmp( struct xfs_iext_rec *rec, xfs_fileoff_t offset) { uint64_t rec_offset = rec->lo & XFS_IEXT_STARTOFF_MASK; uint32_t rec_len = rec->hi & XFS_IEXT_LENGTH_MASK; if (rec_offset > offset) return 1; if (rec_offset + 
rec_len <= offset) return -1; return 0; } static void * xfs_iext_find_level( struct xfs_ifork *ifp, xfs_fileoff_t offset, int level) { struct xfs_iext_node *node = ifp->if_data; int height, i; if (!ifp->if_height) return NULL; for (height = ifp->if_height; height > level; height--) { for (i = 1; i < KEYS_PER_NODE; i++) if (xfs_iext_key_cmp(node, i, offset) > 0) break; node = node->ptrs[i - 1]; if (!node) break; } return node; } static int xfs_iext_node_pos( struct xfs_iext_node *node, xfs_fileoff_t offset) { int i; for (i = 1; i < KEYS_PER_NODE; i++) { if (xfs_iext_key_cmp(node, i, offset) > 0) break; } return i - 1; } static int xfs_iext_node_insert_pos( struct xfs_iext_node *node, xfs_fileoff_t offset) { int i; for (i = 0; i < KEYS_PER_NODE; i++) { if (xfs_iext_key_cmp(node, i, offset) > 0) return i; } return KEYS_PER_NODE; } static int xfs_iext_node_nr_entries( struct xfs_iext_node *node, int start) { int i; for (i = start; i < KEYS_PER_NODE; i++) { if (node->keys[i] == XFS_IEXT_KEY_INVALID) break; } return i; } static int xfs_iext_leaf_nr_entries( struct xfs_ifork *ifp, struct xfs_iext_leaf *leaf, int start) { int i; for (i = start; i < xfs_iext_max_recs(ifp); i++) { if (xfs_iext_rec_is_empty(&leaf->recs[i])) break; } return i; } static inline uint64_t xfs_iext_leaf_key( struct xfs_iext_leaf *leaf, int n) { return leaf->recs[n].lo & XFS_IEXT_STARTOFF_MASK; } static inline void * xfs_iext_alloc_node( int size) { return kzalloc(size, GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL); } static void xfs_iext_grow( struct xfs_ifork *ifp) { struct xfs_iext_node *node = xfs_iext_alloc_node(NODE_SIZE); int i; if (ifp->if_height == 1) { struct xfs_iext_leaf *prev = ifp->if_data; node->keys[0] = xfs_iext_leaf_key(prev, 0); node->ptrs[0] = prev; } else { struct xfs_iext_node *prev = ifp->if_data; ASSERT(ifp->if_height > 1); node->keys[0] = prev->keys[0]; node->ptrs[0] = prev; } for (i = 1; i < KEYS_PER_NODE; i++) node->keys[i] = XFS_IEXT_KEY_INVALID; ifp->if_data = node; ifp->if_height++; } static void xfs_iext_update_node( struct xfs_ifork *ifp, xfs_fileoff_t old_offset, xfs_fileoff_t new_offset, int level, void *ptr) { struct xfs_iext_node *node = ifp->if_data; int height, i; for (height = ifp->if_height; height > level; height--) { for (i = 0; i < KEYS_PER_NODE; i++) { if (i > 0 && xfs_iext_key_cmp(node, i, old_offset) > 0) break; if (node->keys[i] == old_offset) node->keys[i] = new_offset; } node = node->ptrs[i - 1]; ASSERT(node); } ASSERT(node == ptr); } static struct xfs_iext_node * xfs_iext_split_node( struct xfs_iext_node **nodep, int *pos, int *nr_entries) { struct xfs_iext_node *node = *nodep; struct xfs_iext_node *new = xfs_iext_alloc_node(NODE_SIZE); const int nr_move = KEYS_PER_NODE / 2; int nr_keep = nr_move + (KEYS_PER_NODE & 1); int i = 0; /* for sequential append operations just spill over into the new node */ if (*pos == KEYS_PER_NODE) { *nodep = new; *pos = 0; *nr_entries = 0; goto done; } for (i = 0; i < nr_move; i++) { new->keys[i] = node->keys[nr_keep + i]; new->ptrs[i] = node->ptrs[nr_keep + i]; node->keys[nr_keep + i] = XFS_IEXT_KEY_INVALID; node->ptrs[nr_keep + i] = NULL; } if (*pos >= nr_keep) { *nodep = new; *pos -= nr_keep; *nr_entries = nr_move; } else { *nr_entries = nr_keep; } done: for (; i < KEYS_PER_NODE; i++) new->keys[i] = XFS_IEXT_KEY_INVALID; return new; } static void xfs_iext_insert_node( struct xfs_ifork *ifp, uint64_t offset, void *ptr, int level) { struct xfs_iext_node *node, *new; int i, pos, nr_entries; again: if (ifp->if_height < level) xfs_iext_grow(ifp); 
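/*
 * Editor's note: from here the insert proceeds level by level -- find the
 * node for @offset at this level and the slot where it sorts; if the node
 * is full, split it, and the new sibling is then inserted one level up via
 * the "again" loop at the bottom of this function.
 */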
new = NULL; node = xfs_iext_find_level(ifp, offset, level); pos = xfs_iext_node_insert_pos(node, offset); nr_entries = xfs_iext_node_nr_entries(node, pos); ASSERT(pos >= nr_entries || xfs_iext_key_cmp(node, pos, offset) != 0); ASSERT(nr_entries <= KEYS_PER_NODE); if (nr_entries == KEYS_PER_NODE) new = xfs_iext_split_node(&node, &pos, &nr_entries); /* * Update the pointers in higher levels if the first entry changes * in an existing node. */ if (node != new && pos == 0 && nr_entries > 0) xfs_iext_update_node(ifp, node->keys[0], offset, level, node); for (i = nr_entries; i > pos; i--) { node->keys[i] = node->keys[i - 1]; node->ptrs[i] = node->ptrs[i - 1]; } node->keys[pos] = offset; node->ptrs[pos] = ptr; if (new) { offset = new->keys[0]; ptr = new; level++; goto again; } } static struct xfs_iext_leaf * xfs_iext_split_leaf( struct xfs_iext_cursor *cur, int *nr_entries) { struct xfs_iext_leaf *leaf = cur->leaf; struct xfs_iext_leaf *new = xfs_iext_alloc_node(NODE_SIZE); const int nr_move = RECS_PER_LEAF / 2; int nr_keep = nr_move + (RECS_PER_LEAF & 1); int i; /* for sequential append operations just spill over into the new node */ if (cur->pos == RECS_PER_LEAF) { cur->leaf = new; cur->pos = 0; *nr_entries = 0; goto done; } for (i = 0; i < nr_move; i++) { new->recs[i] = leaf->recs[nr_keep + i]; xfs_iext_rec_clear(&leaf->recs[nr_keep + i]); } if (cur->pos >= nr_keep) { cur->leaf = new; cur->pos -= nr_keep; *nr_entries = nr_move; } else { *nr_entries = nr_keep; } done: if (leaf->next) leaf->next->prev = new; new->next = leaf->next; new->prev = leaf; leaf->next = new; return new; } static void xfs_iext_alloc_root( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur) { ASSERT(ifp->if_bytes == 0); ifp->if_data = xfs_iext_alloc_node(sizeof(struct xfs_iext_rec)); ifp->if_height = 1; /* now that we have a node step into it */ cur->leaf = ifp->if_data; cur->pos = 0; } static void xfs_iext_realloc_root( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur) { int64_t new_size = ifp->if_bytes + sizeof(struct xfs_iext_rec); void *new; /* account for the prev/next pointers */ if (new_size / sizeof(struct xfs_iext_rec) == RECS_PER_LEAF) new_size = NODE_SIZE; new = krealloc(ifp->if_data, new_size, GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL); memset(new + ifp->if_bytes, 0, new_size - ifp->if_bytes); ifp->if_data = new; cur->leaf = new; } /* * Increment the sequence counter on extent tree changes. If we are on a COW * fork, this allows the writeback code to skip looking for a COW extent if the * COW fork hasn't changed. We use WRITE_ONCE here to ensure the update to the * sequence counter is seen before the modifications to the extent tree itself * take effect. */ static inline void xfs_iext_inc_seq(struct xfs_ifork *ifp) { WRITE_ONCE(ifp->if_seq, READ_ONCE(ifp->if_seq) + 1); } void xfs_iext_insert_raw( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *irec) { xfs_fileoff_t offset = irec->br_startoff; struct xfs_iext_leaf *new = NULL; int nr_entries, i; xfs_iext_inc_seq(ifp); if (ifp->if_height == 0) xfs_iext_alloc_root(ifp, cur); else if (ifp->if_height == 1) xfs_iext_realloc_root(ifp, cur); nr_entries = xfs_iext_leaf_nr_entries(ifp, cur->leaf, cur->pos); ASSERT(nr_entries <= RECS_PER_LEAF); ASSERT(cur->pos >= nr_entries || xfs_iext_rec_cmp(cur_rec(cur), irec->br_startoff) != 0); if (nr_entries == RECS_PER_LEAF) new = xfs_iext_split_leaf(cur, &nr_entries); /* * Update the pointers in higher levels if the first entry changes * in an existing node. 
*/ if (cur->leaf != new && cur->pos == 0 && nr_entries > 0) { xfs_iext_update_node(ifp, xfs_iext_leaf_key(cur->leaf, 0), offset, 1, cur->leaf); } for (i = nr_entries; i > cur->pos; i--) cur->leaf->recs[i] = cur->leaf->recs[i - 1]; xfs_iext_set(cur_rec(cur), irec); ifp->if_bytes += sizeof(struct xfs_iext_rec); if (new) xfs_iext_insert_node(ifp, xfs_iext_leaf_key(new, 0), new, 2); } void xfs_iext_insert( struct xfs_inode *ip, struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *irec, int state) { struct xfs_ifork *ifp = xfs_iext_state_to_fork(ip, state); xfs_iext_insert_raw(ifp, cur, irec); trace_xfs_iext_insert(ip, cur, state, _RET_IP_); } static struct xfs_iext_node * xfs_iext_rebalance_node( struct xfs_iext_node *parent, int *pos, struct xfs_iext_node *node, int nr_entries) { /* * If the neighbouring nodes are completely full, or have different * parents, we might never be able to merge our node, and will only * delete it once the number of entries hits zero. */ if (nr_entries == 0) return node; if (*pos > 0) { struct xfs_iext_node *prev = parent->ptrs[*pos - 1]; int nr_prev = xfs_iext_node_nr_entries(prev, 0), i; if (nr_prev + nr_entries <= KEYS_PER_NODE) { for (i = 0; i < nr_entries; i++) { prev->keys[nr_prev + i] = node->keys[i]; prev->ptrs[nr_prev + i] = node->ptrs[i]; } return node; } } if (*pos + 1 < xfs_iext_node_nr_entries(parent, *pos)) { struct xfs_iext_node *next = parent->ptrs[*pos + 1]; int nr_next = xfs_iext_node_nr_entries(next, 0), i; if (nr_entries + nr_next <= KEYS_PER_NODE) { /* * Merge the next node into this node so that we don't * have to do an additional update of the keys in the * higher levels. */ for (i = 0; i < nr_next; i++) { node->keys[nr_entries + i] = next->keys[i]; node->ptrs[nr_entries + i] = next->ptrs[i]; } ++*pos; return next; } } return NULL; } static void xfs_iext_remove_node( struct xfs_ifork *ifp, xfs_fileoff_t offset, void *victim) { struct xfs_iext_node *node, *parent; int level = 2, pos, nr_entries, i; ASSERT(level <= ifp->if_height); node = xfs_iext_find_level(ifp, offset, level); pos = xfs_iext_node_pos(node, offset); again: ASSERT(node->ptrs[pos]); ASSERT(node->ptrs[pos] == victim); kfree(victim); nr_entries = xfs_iext_node_nr_entries(node, pos) - 1; offset = node->keys[0]; for (i = pos; i < nr_entries; i++) { node->keys[i] = node->keys[i + 1]; node->ptrs[i] = node->ptrs[i + 1]; } node->keys[nr_entries] = XFS_IEXT_KEY_INVALID; node->ptrs[nr_entries] = NULL; if (pos == 0 && nr_entries > 0) { xfs_iext_update_node(ifp, offset, node->keys[0], level, node); offset = node->keys[0]; } if (nr_entries >= KEYS_PER_NODE / 2) return; if (level < ifp->if_height) { /* * If we aren't at the root yet try to find a neighbour node to * merge with (or delete the node if it is empty), and then * recurse up to the next level. */ level++; parent = xfs_iext_find_level(ifp, offset, level); pos = xfs_iext_node_pos(parent, offset); ASSERT(pos != KEYS_PER_NODE); ASSERT(parent->ptrs[pos] == node); node = xfs_iext_rebalance_node(parent, &pos, node, nr_entries); if (node) { victim = node; node = parent; goto again; } } else if (nr_entries == 1) { /* * If we are at the root and only one entry is left we can just * free this node and update the root pointer. 
*/ ASSERT(node == ifp->if_data); ifp->if_data = node->ptrs[0]; ifp->if_height--; kfree(node); } } static void xfs_iext_rebalance_leaf( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur, struct xfs_iext_leaf *leaf, xfs_fileoff_t offset, int nr_entries) { /* * If the neighbouring nodes are completely full we might never be able * to merge our node, and will only delete it once the number of * entries hits zero. */ if (nr_entries == 0) goto remove_node; if (leaf->prev) { int nr_prev = xfs_iext_leaf_nr_entries(ifp, leaf->prev, 0), i; if (nr_prev + nr_entries <= RECS_PER_LEAF) { for (i = 0; i < nr_entries; i++) leaf->prev->recs[nr_prev + i] = leaf->recs[i]; if (cur->leaf == leaf) { cur->leaf = leaf->prev; cur->pos += nr_prev; } goto remove_node; } } if (leaf->next) { int nr_next = xfs_iext_leaf_nr_entries(ifp, leaf->next, 0), i; if (nr_entries + nr_next <= RECS_PER_LEAF) { /* * Merge the next node into this node so that we don't * have to do an additional update of the keys in the * higher levels. */ for (i = 0; i < nr_next; i++) { leaf->recs[nr_entries + i] = leaf->next->recs[i]; } if (cur->leaf == leaf->next) { cur->leaf = leaf; cur->pos += nr_entries; } offset = xfs_iext_leaf_key(leaf->next, 0); leaf = leaf->next; goto remove_node; } } return; remove_node: if (leaf->prev) leaf->prev->next = leaf->next; if (leaf->next) leaf->next->prev = leaf->prev; xfs_iext_remove_node(ifp, offset, leaf); } static void xfs_iext_free_last_leaf( struct xfs_ifork *ifp) { ifp->if_height--; kfree(ifp->if_data); ifp->if_data = NULL; } void xfs_iext_remove( struct xfs_inode *ip, struct xfs_iext_cursor *cur, int state) { struct xfs_ifork *ifp = xfs_iext_state_to_fork(ip, state); struct xfs_iext_leaf *leaf = cur->leaf; xfs_fileoff_t offset = xfs_iext_leaf_key(leaf, 0); int i, nr_entries; trace_xfs_iext_remove(ip, cur, state, _RET_IP_); ASSERT(ifp->if_height > 0); ASSERT(ifp->if_data != NULL); ASSERT(xfs_iext_valid(ifp, cur)); xfs_iext_inc_seq(ifp); nr_entries = xfs_iext_leaf_nr_entries(ifp, leaf, cur->pos) - 1; for (i = cur->pos; i < nr_entries; i++) leaf->recs[i] = leaf->recs[i + 1]; xfs_iext_rec_clear(&leaf->recs[nr_entries]); ifp->if_bytes -= sizeof(struct xfs_iext_rec); if (cur->pos == 0 && nr_entries > 0) { xfs_iext_update_node(ifp, offset, xfs_iext_leaf_key(leaf, 0), 1, leaf); offset = xfs_iext_leaf_key(leaf, 0); } else if (cur->pos == nr_entries) { if (ifp->if_height > 1 && leaf->next) cur->leaf = leaf->next; else cur->leaf = NULL; cur->pos = 0; } if (nr_entries >= RECS_PER_LEAF / 2) return; if (ifp->if_height > 1) xfs_iext_rebalance_leaf(ifp, cur, leaf, offset, nr_entries); else if (nr_entries == 0) xfs_iext_free_last_leaf(ifp); } /* * Lookup the extent covering bno. * * If there is an extent covering bno return the extent index, and store the * expanded extent structure in *gotp, and the extent cursor in *cur. * If there is no extent covering bno, but there is an extent after it (e.g. * it lies in a hole) return that extent in *gotp and its cursor in *cur * instead. * If bno is beyond the last extent return false, and return an invalid * cursor value. 
*/ bool xfs_iext_lookup_extent( struct xfs_inode *ip, struct xfs_ifork *ifp, xfs_fileoff_t offset, struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *gotp) { XFS_STATS_INC(ip->i_mount, xs_look_exlist); cur->leaf = xfs_iext_find_level(ifp, offset, 1); if (!cur->leaf) { cur->pos = 0; return false; } for (cur->pos = 0; cur->pos < xfs_iext_max_recs(ifp); cur->pos++) { struct xfs_iext_rec *rec = cur_rec(cur); if (xfs_iext_rec_is_empty(rec)) break; if (xfs_iext_rec_cmp(rec, offset) >= 0) goto found; } /* Try looking in the next node for an entry > offset */ if (ifp->if_height == 1 || !cur->leaf->next) return false; cur->leaf = cur->leaf->next; cur->pos = 0; if (!xfs_iext_valid(ifp, cur)) return false; found: xfs_iext_get(gotp, cur_rec(cur)); return true; } /* * Returns the last extent before end, and if this extent doesn't cover * end, update end to the end of the extent. */ bool xfs_iext_lookup_extent_before( struct xfs_inode *ip, struct xfs_ifork *ifp, xfs_fileoff_t *end, struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *gotp) { /* could be optimized to not even look up the next on a match.. */ if (xfs_iext_lookup_extent(ip, ifp, *end - 1, cur, gotp) && gotp->br_startoff <= *end - 1) return true; if (!xfs_iext_prev_extent(ifp, cur, gotp)) return false; *end = gotp->br_startoff + gotp->br_blockcount; return true; } void xfs_iext_update_extent( struct xfs_inode *ip, int state, struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *new) { struct xfs_ifork *ifp = xfs_iext_state_to_fork(ip, state); xfs_iext_inc_seq(ifp); if (cur->pos == 0) { struct xfs_bmbt_irec old; xfs_iext_get(&old, cur_rec(cur)); if (new->br_startoff != old.br_startoff) { xfs_iext_update_node(ifp, old.br_startoff, new->br_startoff, 1, cur->leaf); } } trace_xfs_bmap_pre_update(ip, cur, state, _RET_IP_); xfs_iext_set(cur_rec(cur), new); trace_xfs_bmap_post_update(ip, cur, state, _RET_IP_); } /* * Return true if the cursor points at an extent and return the extent structure * in gotp. Else return false. */ bool xfs_iext_get_extent( struct xfs_ifork *ifp, struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *gotp) { if (!xfs_iext_valid(ifp, cur)) return false; xfs_iext_get(gotp, cur_rec(cur)); return true; } /* * This is a recursive function, because of that we need to be extremely * careful with stack usage. */ static void xfs_iext_destroy_node( struct xfs_iext_node *node, int level) { int i; if (level > 1) { for (i = 0; i < KEYS_PER_NODE; i++) { if (node->keys[i] == XFS_IEXT_KEY_INVALID) break; xfs_iext_destroy_node(node->ptrs[i], level - 1); } } kfree(node); } void xfs_iext_destroy( struct xfs_ifork *ifp) { xfs_iext_destroy_node(ifp->if_data, ifp->if_height); ifp->if_bytes = 0; ifp->if_height = 0; ifp->if_data = NULL; } |
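/*
 * Illustrative only, not part of xfs_iext_tree.c: a minimal userspace toy of
 * the downward walk that xfs_iext_find_level() performs above, and that
 * xfs_iext_lookup_extent() relies on to reach the right leaf. The demo_
 * names, the node layout, and KEYS_PER_NODE = 4 are stand-ins; the real
 * tree derives its geometry from NODE_SIZE and stores packed records, not
 * bare keys, in its leaves. The explicit DEMO_KEY_INVALID check folds in
 * what xfs_iext_key_cmp() achieves by having invalid keys sort high.
 */
#include <stdint.h>
#include <stdio.h>

#define DEMO_KEYS_PER_NODE	4
#define DEMO_KEY_INVALID	UINT64_MAX

struct demo_node {
	uint64_t		keys[DEMO_KEYS_PER_NODE];
	struct demo_node	*ptrs[DEMO_KEYS_PER_NODE];
};

/*
 * Walk from the root towards the leaves: at each level find the first
 * separator key greater than offset, then step back one and descend into
 * that child, mirroring the scan in xfs_iext_find_level().
 */
static struct demo_node *demo_find_level(struct demo_node *node, int height,
					 int level, uint64_t offset)
{
	int h, i;

	for (h = height; h > level; h--) {
		for (i = 1; i < DEMO_KEYS_PER_NODE; i++)
			if (node->keys[i] == DEMO_KEY_INVALID ||
			    node->keys[i] > offset)
				break;
		node = node->ptrs[i - 1];
		if (!node)
			break;
	}
	return node;
}

int main(void)
{
	struct demo_node leaf1 = { .keys = { 0, 10 } };
	struct demo_node leaf2 = { .keys = { 50, 80 } };
	struct demo_node root = {
		.keys = { 0, 50, DEMO_KEY_INVALID, DEMO_KEY_INVALID },
		.ptrs = { &leaf1, &leaf2 },
	};

	/* offset 60 is >= the separator 50, so the walk lands in leaf2 */
	printf("%s\n", demo_find_level(&root, 2, 1, 60) == &leaf2 ?
			"leaf2" : "leaf1");
	return 0;
}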
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Copyright (c) 2000,2002,2005 Silicon Graphics, Inc.
 * All Rights Reserved.
 */
#ifndef __XFS_BIT_H__
#define __XFS_BIT_H__

/*
 * XFS bit manipulation routines.
 */

/*
 * masks with n high/low bits set, 64-bit values
 */
static inline uint64_t xfs_mask64hi(int n)
{
	return (uint64_t)-1 << (64 - (n));
}
static inline uint32_t xfs_mask32lo(int n)
{
	return ((uint32_t)1 << (n)) - 1;
}
static inline uint64_t xfs_mask64lo(int n)
{
	return ((uint64_t)1 << (n)) - 1;
}

/* Get high bit set out of 32-bit argument, -1 if none set */
static inline int xfs_highbit32(uint32_t v)
{
	return fls(v) - 1;
}

/* Get high bit set out of 64-bit argument, -1 if none set */
static inline int xfs_highbit64(uint64_t v)
{
	return fls64(v) - 1;
}

/* Get low bit set out of 32-bit argument, -1 if none set */
static inline int xfs_lowbit32(uint32_t v)
{
	return ffs(v) - 1;
}

/* Get low bit set out of 64-bit argument, -1 if none set */
static inline int xfs_lowbit64(uint64_t v)
{
	uint32_t	w = (uint32_t)v;
	int		n = 0;

	if (w) {	/* lower bits */
		n = ffs(w);
	} else {	/* upper bits */
		w = (uint32_t)(v >> 32);
		if (w) {
			n = ffs(w);
			if (n)
				n += 32;
		}
	}
	return n - 1;
}

/* Return whether bitmap is empty (1 == empty) */
extern int xfs_bitmap_empty(uint *map, uint size);

/* Count continuous one bits in map starting with start_bit */
extern int xfs_contig_bits(uint *map, uint size, uint start_bit);

/* Find next set bit in map */
extern int xfs_next_bit(uint *map, uint size, uint start_bit);

#endif	/* __XFS_BIT_H__ */
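/*
 * Illustrative only, not part of xfs_bit.h: a small userspace sketch of the
 * split-word search in xfs_lowbit64() above. The kernel's ffs() is modelled
 * with the GCC/Clang __builtin_ffs(), which follows the same 1-based-index,
 * 0-means-no-bit convention; that equivalence, and the demo_ name, are
 * assumptions of this sketch.
 */
#include <stdint.h>
#include <stdio.h>

/* Lowest set bit of a 64-bit value, -1 if none: check the low 32 bits
 * first, and only fall back to the high word when they are all zero. */
static int demo_lowbit64(uint64_t v)
{
	uint32_t w = (uint32_t)v;
	int n = 0;

	if (w) {			/* lower bits */
		n = __builtin_ffs(w);
	} else {			/* upper bits */
		w = (uint32_t)(v >> 32);
		if (w) {
			n = __builtin_ffs(w);
			if (n)
				n += 32;
		}
	}
	return n - 1;
}

int main(void)
{
	/* 1ULL << 40 has no low-word bits, so the high-word path runs */
	printf("%d\n", demo_lowbit64(1ULL << 40));	/* prints 40 */
	printf("%d\n", demo_lowbit64(0x18ULL));		/* prints 3 */
	printf("%d\n", demo_lowbit64(0));		/* prints -1 */
	return 0;
}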
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * Copyright 2003-2005 Red Hat, Inc.  All rights reserved.
 * Copyright 2003-2005 Jeff Garzik
 *
 * libata documentation is available via 'make {ps|pdf}docs',
 * as Documentation/driver-api/libata.rst
 */

#ifndef __LINUX_LIBATA_H__
#define __LINUX_LIBATA_H__

#include <linux/delay.h>
#include <linux/jiffies.h>
#include <linux/interrupt.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/io.h>
#include <linux/ata.h>
#include <linux/workqueue.h>
#include <scsi/scsi_host.h>
#include <linux/acpi.h>
#include <linux/cdrom.h>
#include <linux/sched.h>
#include <linux/async.h>

/*
 * Define if arch has non-standard setup.  This is a _PCI_ standard
 * not a legacy or ISA standard.
*/ #ifdef CONFIG_ATA_NONSTANDARD #include <asm/libata-portmap.h> #else #define ATA_PRIMARY_IRQ(dev) 14 #define ATA_SECONDARY_IRQ(dev) 15 #endif /* * compile-time options: to be removed as soon as all the drivers are * converted to the new debugging mechanism */ #undef ATA_IRQ_TRAP /* define to ack screaming irqs */ /* defines only for the constants which don't work well as enums */ #define ATA_TAG_POISON 0xfafbfcfdU /* * Quirk flags bits. * ata_device->quirks is an unsigned int, so __ATA_QUIRK_MAX must not exceed 32. */ enum ata_quirks { __ATA_QUIRK_DIAGNOSTIC, /* Failed boot diag */ __ATA_QUIRK_NODMA, /* DMA problems */ __ATA_QUIRK_NONCQ, /* Don't use NCQ */ __ATA_QUIRK_MAX_SEC_128, /* Limit max sects to 128 */ __ATA_QUIRK_BROKEN_HPA, /* Broken HPA */ __ATA_QUIRK_DISABLE, /* Disable it */ __ATA_QUIRK_HPA_SIZE, /* Native size off by one */ __ATA_QUIRK_IVB, /* cbl det validity bit bugs */ __ATA_QUIRK_STUCK_ERR, /* Stuck ERR on next PACKET */ __ATA_QUIRK_BRIDGE_OK, /* No bridge limits */ __ATA_QUIRK_ATAPI_MOD16_DMA, /* Use ATAPI DMA for commands that */ /* are not a multiple of 16 bytes */ __ATA_QUIRK_FIRMWARE_WARN, /* Firmware update warning */ __ATA_QUIRK_1_5_GBPS, /* Force 1.5 Gbps */ __ATA_QUIRK_NOSETXFER, /* Skip SETXFER, SATA only */ __ATA_QUIRK_BROKEN_FPDMA_AA, /* Skip AA */ __ATA_QUIRK_DUMP_ID, /* Dump IDENTIFY data */ __ATA_QUIRK_MAX_SEC_LBA48, /* Set max sects to 65535 */ __ATA_QUIRK_ATAPI_DMADIR, /* Device requires dmadir */ __ATA_QUIRK_NO_NCQ_TRIM, /* Do not use queued TRIM */ __ATA_QUIRK_NOLPM, /* Do not use LPM */ __ATA_QUIRK_WD_BROKEN_LPM, /* Some WDs have broken LPM */ __ATA_QUIRK_ZERO_AFTER_TRIM, /* Guarantees zero after trim */ __ATA_QUIRK_NO_DMA_LOG, /* Do not use DMA for log read */ __ATA_QUIRK_NOTRIM, /* Do not use TRIM */ __ATA_QUIRK_MAX_SEC_1024, /* Limit max sects to 1024 */ __ATA_QUIRK_MAX_TRIM_128M, /* Limit max trim size to 128M */ __ATA_QUIRK_NO_NCQ_ON_ATI, /* Disable NCQ on ATI chipset */ __ATA_QUIRK_NO_LPM_ON_ATI, /* Disable LPM on ATI chipset */ __ATA_QUIRK_NO_ID_DEV_LOG, /* Identify device log missing */ __ATA_QUIRK_NO_LOG_DIR, /* Do not read log directory */ __ATA_QUIRK_NO_FUA, /* Do not use FUA */ __ATA_QUIRK_MAX, }; enum { /* various global constants */ LIBATA_MAX_PRD = ATA_MAX_PRD / 2, LIBATA_DUMB_MAX_PRD = ATA_MAX_PRD / 4, /* Worst case */ ATA_DEF_QUEUE = 1, ATA_MAX_QUEUE = 32, ATA_TAG_INTERNAL = ATA_MAX_QUEUE, ATA_SHORT_PAUSE = 16, ATAPI_MAX_DRAIN = 16 << 10, ATA_ALL_DEVICES = (1 << ATA_MAX_DEVICES) - 1, ATA_SHT_EMULATED = 1, ATA_SHT_THIS_ID = -1, /* struct ata_taskfile flags */ ATA_TFLAG_LBA48 = (1 << 0), /* enable 48-bit LBA and "HOB" */ ATA_TFLAG_ISADDR = (1 << 1), /* enable r/w to nsect/lba regs */ ATA_TFLAG_DEVICE = (1 << 2), /* enable r/w to device reg */ ATA_TFLAG_WRITE = (1 << 3), /* data dir: host->dev==1 (write) */ ATA_TFLAG_LBA = (1 << 4), /* enable LBA */ ATA_TFLAG_FUA = (1 << 5), /* enable FUA */ ATA_TFLAG_POLLING = (1 << 6), /* set nIEN to 1 and use polling */ /* struct ata_device stuff */ ATA_DFLAG_LBA = (1 << 0), /* device supports LBA */ ATA_DFLAG_LBA48 = (1 << 1), /* device supports LBA48 */ ATA_DFLAG_CDB_INTR = (1 << 2), /* device asserts INTRQ when ready for CDB */ ATA_DFLAG_NCQ = (1 << 3), /* device supports NCQ */ ATA_DFLAG_FLUSH_EXT = (1 << 4), /* do FLUSH_EXT instead of FLUSH */ ATA_DFLAG_ACPI_PENDING = (1 << 5), /* ACPI resume action pending */ ATA_DFLAG_ACPI_FAILED = (1 << 6), /* ACPI on devcfg has failed */ ATA_DFLAG_AN = (1 << 7), /* AN configured */ ATA_DFLAG_TRUSTED = (1 << 8), /* device supports trusted send/recv */ 
ATA_DFLAG_FUA = (1 << 9), /* device supports FUA */ ATA_DFLAG_DMADIR = (1 << 10), /* device requires DMADIR */ ATA_DFLAG_NCQ_SEND_RECV = (1 << 11), /* device supports NCQ SEND and RECV */ ATA_DFLAG_NCQ_PRIO = (1 << 12), /* device supports NCQ priority */ ATA_DFLAG_CDL = (1 << 13), /* supports cmd duration limits */ ATA_DFLAG_CFG_MASK = (1 << 14) - 1, ATA_DFLAG_PIO = (1 << 14), /* device limited to PIO mode */ ATA_DFLAG_NCQ_OFF = (1 << 15), /* device limited to non-NCQ mode */ ATA_DFLAG_SLEEPING = (1 << 16), /* device is sleeping */ ATA_DFLAG_DUBIOUS_XFER = (1 << 17), /* data transfer not verified */ ATA_DFLAG_NO_UNLOAD = (1 << 18), /* device doesn't support unload */ ATA_DFLAG_UNLOCK_HPA = (1 << 19), /* unlock HPA */ ATA_DFLAG_INIT_MASK = (1 << 20) - 1, ATA_DFLAG_NCQ_PRIO_ENABLED = (1 << 20), /* Priority cmds sent to dev */ ATA_DFLAG_CDL_ENABLED = (1 << 21), /* cmd duration limits is enabled */ ATA_DFLAG_RESUMING = (1 << 22), /* Device is resuming */ ATA_DFLAG_DETACH = (1 << 24), ATA_DFLAG_DETACHED = (1 << 25), ATA_DFLAG_DA = (1 << 26), /* device supports Device Attention */ ATA_DFLAG_DEVSLP = (1 << 27), /* device supports Device Sleep */ ATA_DFLAG_ACPI_DISABLED = (1 << 28), /* ACPI for the device is disabled */ ATA_DFLAG_D_SENSE = (1 << 29), /* Descriptor sense requested */ ATA_DFLAG_ZAC = (1 << 30), /* ZAC device */ ATA_DFLAG_FEATURES_MASK = (ATA_DFLAG_TRUSTED | ATA_DFLAG_DA | \ ATA_DFLAG_DEVSLP | ATA_DFLAG_NCQ_SEND_RECV | \ ATA_DFLAG_NCQ_PRIO | ATA_DFLAG_FUA | \ ATA_DFLAG_CDL), ATA_DEV_UNKNOWN = 0, /* unknown device */ ATA_DEV_ATA = 1, /* ATA device */ ATA_DEV_ATA_UNSUP = 2, /* ATA device (unsupported) */ ATA_DEV_ATAPI = 3, /* ATAPI device */ ATA_DEV_ATAPI_UNSUP = 4, /* ATAPI device (unsupported) */ ATA_DEV_PMP = 5, /* SATA port multiplier */ ATA_DEV_PMP_UNSUP = 6, /* SATA port multiplier (unsupported) */ ATA_DEV_SEMB = 7, /* SEMB */ ATA_DEV_SEMB_UNSUP = 8, /* SEMB (unsupported) */ ATA_DEV_ZAC = 9, /* ZAC device */ ATA_DEV_ZAC_UNSUP = 10, /* ZAC device (unsupported) */ ATA_DEV_NONE = 11, /* no device */ /* struct ata_link flags */ /* NOTE: struct ata_force_param currently stores lflags in u16 */ ATA_LFLAG_NO_HRST = (1 << 1), /* avoid hardreset */ ATA_LFLAG_NO_SRST = (1 << 2), /* avoid softreset */ ATA_LFLAG_ASSUME_ATA = (1 << 3), /* assume ATA class */ ATA_LFLAG_ASSUME_SEMB = (1 << 4), /* assume SEMB class */ ATA_LFLAG_ASSUME_CLASS = ATA_LFLAG_ASSUME_ATA | ATA_LFLAG_ASSUME_SEMB, ATA_LFLAG_NO_RETRY = (1 << 5), /* don't retry this link */ ATA_LFLAG_DISABLED = (1 << 6), /* link is disabled */ ATA_LFLAG_SW_ACTIVITY = (1 << 7), /* keep activity stats */ ATA_LFLAG_NO_LPM = (1 << 8), /* disable LPM on this link */ ATA_LFLAG_RST_ONCE = (1 << 9), /* limit recovery to one reset */ ATA_LFLAG_CHANGED = (1 << 10), /* LPM state changed on this link */ ATA_LFLAG_NO_DEBOUNCE_DELAY = (1 << 11), /* no debounce delay on link resume */ /* struct ata_port flags */ ATA_FLAG_SLAVE_POSS = (1 << 0), /* host supports slave dev */ /* (doesn't imply presence) */ ATA_FLAG_SATA = (1 << 1), ATA_FLAG_NO_LPM = (1 << 2), /* host not happy with LPM */ ATA_FLAG_NO_LOG_PAGE = (1 << 5), /* do not issue log page read */ ATA_FLAG_NO_ATAPI = (1 << 6), /* No ATAPI support */ ATA_FLAG_PIO_DMA = (1 << 7), /* PIO cmds via DMA */ ATA_FLAG_PIO_LBA48 = (1 << 8), /* Host DMA engine is LBA28 only */ ATA_FLAG_PIO_POLLING = (1 << 9), /* use polling PIO if LLD * doesn't handle PIO interrupts */ ATA_FLAG_NCQ = (1 << 10), /* host supports NCQ */ ATA_FLAG_NO_POWEROFF_SPINDOWN = (1 << 11), /* don't spindown before poweroff */ 
ATA_FLAG_NO_HIBERNATE_SPINDOWN = (1 << 12), /* don't spindown before hibernation */ ATA_FLAG_DEBUGMSG = (1 << 13), ATA_FLAG_FPDMA_AA = (1 << 14), /* driver supports Auto-Activate */ ATA_FLAG_IGN_SIMPLEX = (1 << 15), /* ignore SIMPLEX */ ATA_FLAG_NO_IORDY = (1 << 16), /* controller lacks iordy */ ATA_FLAG_ACPI_SATA = (1 << 17), /* need native SATA ACPI layout */ ATA_FLAG_AN = (1 << 18), /* controller supports AN */ ATA_FLAG_PMP = (1 << 19), /* controller supports PMP */ ATA_FLAG_FPDMA_AUX = (1 << 20), /* controller supports H2DFIS aux field */ ATA_FLAG_EM = (1 << 21), /* driver supports enclosure * management */ ATA_FLAG_SW_ACTIVITY = (1 << 22), /* driver supports sw activity * led */ ATA_FLAG_NO_DIPM = (1 << 23), /* host not happy with DIPM */ ATA_FLAG_SAS_HOST = (1 << 24), /* SAS host */ /* bits 24:31 of ap->flags are reserved for LLD specific flags */ /* struct ata_port pflags */ ATA_PFLAG_EH_PENDING = (1 << 0), /* EH pending */ ATA_PFLAG_EH_IN_PROGRESS = (1 << 1), /* EH in progress */ ATA_PFLAG_FROZEN = (1 << 2), /* port is frozen */ ATA_PFLAG_RECOVERED = (1 << 3), /* recovery action performed */ ATA_PFLAG_LOADING = (1 << 4), /* boot/loading probe */ ATA_PFLAG_SCSI_HOTPLUG = (1 << 6), /* SCSI hotplug scheduled */ ATA_PFLAG_INITIALIZING = (1 << 7), /* being initialized, don't touch */ ATA_PFLAG_RESETTING = (1 << 8), /* reset in progress */ ATA_PFLAG_UNLOADING = (1 << 9), /* driver is being unloaded */ ATA_PFLAG_UNLOADED = (1 << 10), /* driver is unloaded */ ATA_PFLAG_RESUMING = (1 << 16), /* port is being resumed */ ATA_PFLAG_SUSPENDED = (1 << 17), /* port is suspended (power) */ ATA_PFLAG_PM_PENDING = (1 << 18), /* PM operation pending */ ATA_PFLAG_INIT_GTM_VALID = (1 << 19), /* initial gtm data valid */ ATA_PFLAG_PIO32 = (1 << 20), /* 32bit PIO */ ATA_PFLAG_PIO32CHANGE = (1 << 21), /* 32bit PIO can be turned on/off */ ATA_PFLAG_EXTERNAL = (1 << 22), /* eSATA/external port */ /* struct ata_queued_cmd flags */ ATA_QCFLAG_ACTIVE = (1 << 0), /* cmd not yet ack'd to scsi lyer */ ATA_QCFLAG_DMAMAP = (1 << 1), /* SG table is DMA mapped */ ATA_QCFLAG_RTF_FILLED = (1 << 2), /* result TF has been filled */ ATA_QCFLAG_IO = (1 << 3), /* standard IO command */ ATA_QCFLAG_RESULT_TF = (1 << 4), /* result TF requested */ ATA_QCFLAG_CLEAR_EXCL = (1 << 5), /* clear excl_link on completion */ ATA_QCFLAG_QUIET = (1 << 6), /* don't report device error */ ATA_QCFLAG_RETRY = (1 << 7), /* retry after failure */ ATA_QCFLAG_HAS_CDL = (1 << 8), /* qc has CDL a descriptor set */ ATA_QCFLAG_EH = (1 << 16), /* cmd aborted and owned by EH */ ATA_QCFLAG_SENSE_VALID = (1 << 17), /* sense data valid */ ATA_QCFLAG_EH_SCHEDULED = (1 << 18), /* EH scheduled (obsolete) */ ATA_QCFLAG_EH_SUCCESS_CMD = (1 << 19), /* EH should fetch sense for this successful cmd */ /* host set flags */ ATA_HOST_SIMPLEX = (1 << 0), /* Host is simplex, one DMA channel per host only */ ATA_HOST_STARTED = (1 << 1), /* Host started */ ATA_HOST_PARALLEL_SCAN = (1 << 2), /* Ports on this host can be scanned in parallel */ ATA_HOST_IGNORE_ATA = (1 << 3), /* Ignore ATA devices on this host. */ ATA_HOST_NO_PART = (1 << 4), /* Host does not support partial */ ATA_HOST_NO_SSC = (1 << 5), /* Host does not support slumber */ ATA_HOST_NO_DEVSLP = (1 << 6), /* Host does not support devslp */ /* bits 24:31 of host->flags are reserved for LLD specific flags */ /* Various lengths of time */ ATA_TMOUT_INTERNAL_QUICK = 5000, ATA_TMOUT_MAX_PARK = 30000, /* * GoVault needs 2s and iVDR disk HHD424020F7SV00 800ms. 2s * is too much without parallel probing. 
Use 2s if parallel * probing is available, 800ms otherwise. */ ATA_TMOUT_FF_WAIT_LONG = 2000, ATA_TMOUT_FF_WAIT = 800, /* Spec mandates to wait for ">= 2ms" before checking status * after reset. We wait 150ms, because that was the magic * delay used for ATAPI devices in Hale Landis's ATADRVR, for * the period of time between when the ATA command register is * written, and then status is checked. Because waiting for * "a while" before checking status is fine, post SRST, we * perform this magic delay here as well. * * Old drivers/ide uses the 2mS rule and then waits for ready. */ ATA_WAIT_AFTER_RESET = 150, /* If PMP is supported, we have to do follow-up SRST. As some * PMPs don't send D2H Reg FIS after hardreset, LLDs are * advised to wait only for the following duration before * doing SRST. */ ATA_TMOUT_PMP_SRST_WAIT = 10000, /* When the LPM policy is set to ATA_LPM_MAX_POWER, there might * be a spurious PHY event, so ignore the first PHY event that * occurs within 10s after the policy change. */ ATA_TMOUT_SPURIOUS_PHY = 10000, /* ATA bus states */ BUS_UNKNOWN = 0, BUS_DMA = 1, BUS_IDLE = 2, BUS_NOINTR = 3, BUS_NODATA = 4, BUS_TIMER = 5, BUS_PIO = 6, BUS_EDD = 7, BUS_IDENTIFY = 8, BUS_PACKET = 9, /* SATA port states */ PORT_UNKNOWN = 0, PORT_ENABLED = 1, PORT_DISABLED = 2, /* encoding various smaller bitmaps into a single * unsigned int bitmap */ ATA_NR_PIO_MODES = 7, ATA_NR_MWDMA_MODES = 5, ATA_NR_UDMA_MODES = 8, ATA_SHIFT_PIO = 0, ATA_SHIFT_MWDMA = ATA_SHIFT_PIO + ATA_NR_PIO_MODES, ATA_SHIFT_UDMA = ATA_SHIFT_MWDMA + ATA_NR_MWDMA_MODES, ATA_SHIFT_PRIO = 6, ATA_PRIO_HIGH = 2, /* size of buffer to pad xfers ending on unaligned boundaries */ ATA_DMA_PAD_SZ = 4, /* ering size */ ATA_ERING_SIZE = 32, /* return values for ->qc_defer */ ATA_DEFER_LINK = 1, ATA_DEFER_PORT = 2, /* desc_len for ata_eh_info and context */ ATA_EH_DESC_LEN = 80, /* reset / recovery action types */ ATA_EH_REVALIDATE = (1 << 0), ATA_EH_SOFTRESET = (1 << 1), /* meaningful only in ->prereset */ ATA_EH_HARDRESET = (1 << 2), /* meaningful only in ->prereset */ ATA_EH_RESET = ATA_EH_SOFTRESET | ATA_EH_HARDRESET, ATA_EH_ENABLE_LINK = (1 << 3), ATA_EH_PARK = (1 << 5), /* unload heads and stop I/O */ ATA_EH_GET_SUCCESS_SENSE = (1 << 6), /* Get sense data for successful cmd */ ATA_EH_SET_ACTIVE = (1 << 7), /* Set a device to active power mode */ ATA_EH_PERDEV_MASK = ATA_EH_REVALIDATE | ATA_EH_PARK | ATA_EH_GET_SUCCESS_SENSE | ATA_EH_SET_ACTIVE, ATA_EH_ALL_ACTIONS = ATA_EH_REVALIDATE | ATA_EH_RESET | ATA_EH_ENABLE_LINK, /* ata_eh_info->flags */ ATA_EHI_HOTPLUGGED = (1 << 0), /* could have been hotplugged */ ATA_EHI_NO_AUTOPSY = (1 << 2), /* no autopsy */ ATA_EHI_QUIET = (1 << 3), /* be quiet */ ATA_EHI_NO_RECOVERY = (1 << 4), /* no recovery */ ATA_EHI_DID_SOFTRESET = (1 << 16), /* already soft-reset this port */ ATA_EHI_DID_HARDRESET = (1 << 17), /* already soft-reset this port */ ATA_EHI_PRINTINFO = (1 << 18), /* print configuration info */ ATA_EHI_SETMODE = (1 << 19), /* configure transfer mode */ ATA_EHI_POST_SETMODE = (1 << 20), /* revalidating after setmode */ ATA_EHI_DID_PRINT_QUIRKS = (1 << 21), /* already printed quirks info */ ATA_EHI_DID_RESET = ATA_EHI_DID_SOFTRESET | ATA_EHI_DID_HARDRESET, /* mask of flags to transfer *to* the slave link */ ATA_EHI_TO_SLAVE_MASK = ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET, /* max tries if error condition is still set after ->error_handler */ ATA_EH_MAX_TRIES = 5, /* sometimes resuming a link requires several retries */ ATA_LINK_RESUME_TRIES = 5, /* how hard are we gonna try to probe/recover 
devices */ ATA_EH_DEV_TRIES = 3, ATA_EH_PMP_TRIES = 5, ATA_EH_PMP_LINK_TRIES = 3, SATA_PMP_RW_TIMEOUT = 3000, /* PMP read/write timeout */ /* This should match the actual table size of * ata_eh_cmd_timeout_table in libata-eh.c. */ ATA_EH_CMD_TIMEOUT_TABLE_SIZE = 8, /* * Quirk flags: may be set by libata or controller drivers on drives. * Some quirks may be drive/controller pair dependent. */ ATA_QUIRK_DIAGNOSTIC = (1U << __ATA_QUIRK_DIAGNOSTIC), ATA_QUIRK_NODMA = (1U << __ATA_QUIRK_NODMA), ATA_QUIRK_NONCQ = (1U << __ATA_QUIRK_NONCQ), ATA_QUIRK_MAX_SEC_128 = (1U << __ATA_QUIRK_MAX_SEC_128), ATA_QUIRK_BROKEN_HPA = (1U << __ATA_QUIRK_BROKEN_HPA), ATA_QUIRK_DISABLE = (1U << __ATA_QUIRK_DISABLE), ATA_QUIRK_HPA_SIZE = (1U << __ATA_QUIRK_HPA_SIZE), ATA_QUIRK_IVB = (1U << __ATA_QUIRK_IVB), ATA_QUIRK_STUCK_ERR = (1U << __ATA_QUIRK_STUCK_ERR), ATA_QUIRK_BRIDGE_OK = (1U << __ATA_QUIRK_BRIDGE_OK), ATA_QUIRK_ATAPI_MOD16_DMA = (1U << __ATA_QUIRK_ATAPI_MOD16_DMA), ATA_QUIRK_FIRMWARE_WARN = (1U << __ATA_QUIRK_FIRMWARE_WARN), ATA_QUIRK_1_5_GBPS = (1U << __ATA_QUIRK_1_5_GBPS), ATA_QUIRK_NOSETXFER = (1U << __ATA_QUIRK_NOSETXFER), ATA_QUIRK_BROKEN_FPDMA_AA = (1U << __ATA_QUIRK_BROKEN_FPDMA_AA), ATA_QUIRK_DUMP_ID = (1U << __ATA_QUIRK_DUMP_ID), ATA_QUIRK_MAX_SEC_LBA48 = (1U << __ATA_QUIRK_MAX_SEC_LBA48), ATA_QUIRK_ATAPI_DMADIR = (1U << __ATA_QUIRK_ATAPI_DMADIR), ATA_QUIRK_NO_NCQ_TRIM = (1U << __ATA_QUIRK_NO_NCQ_TRIM), ATA_QUIRK_NOLPM = (1U << __ATA_QUIRK_NOLPM), ATA_QUIRK_WD_BROKEN_LPM = (1U << __ATA_QUIRK_WD_BROKEN_LPM), ATA_QUIRK_ZERO_AFTER_TRIM = (1U << __ATA_QUIRK_ZERO_AFTER_TRIM), ATA_QUIRK_NO_DMA_LOG = (1U << __ATA_QUIRK_NO_DMA_LOG), ATA_QUIRK_NOTRIM = (1U << __ATA_QUIRK_NOTRIM), ATA_QUIRK_MAX_SEC_1024 = (1U << __ATA_QUIRK_MAX_SEC_1024), ATA_QUIRK_MAX_TRIM_128M = (1U << __ATA_QUIRK_MAX_TRIM_128M), ATA_QUIRK_NO_NCQ_ON_ATI = (1U << __ATA_QUIRK_NO_NCQ_ON_ATI), ATA_QUIRK_NO_LPM_ON_ATI = (1U << __ATA_QUIRK_NO_LPM_ON_ATI), ATA_QUIRK_NO_ID_DEV_LOG = (1U << __ATA_QUIRK_NO_ID_DEV_LOG), ATA_QUIRK_NO_LOG_DIR = (1U << __ATA_QUIRK_NO_LOG_DIR), ATA_QUIRK_NO_FUA = (1U << __ATA_QUIRK_NO_FUA), /* User visible DMA mask for DMA control. DO NOT renumber. 
*/ ATA_DMA_MASK_ATA = (1 << 0), /* DMA on ATA Disk */ ATA_DMA_MASK_ATAPI = (1 << 1), /* DMA on ATAPI */ ATA_DMA_MASK_CFA = (1 << 2), /* DMA on CF Card */ /* ATAPI command types */ ATAPI_READ = 0, /* READs */ ATAPI_WRITE = 1, /* WRITEs */ ATAPI_READ_CD = 2, /* READ CD [MSF] */ ATAPI_PASS_THRU = 3, /* SAT pass-thru */ ATAPI_MISC = 4, /* the rest */ /* Timing constants */ ATA_TIMING_SETUP = (1 << 0), ATA_TIMING_ACT8B = (1 << 1), ATA_TIMING_REC8B = (1 << 2), ATA_TIMING_CYC8B = (1 << 3), ATA_TIMING_8BIT = ATA_TIMING_ACT8B | ATA_TIMING_REC8B | ATA_TIMING_CYC8B, ATA_TIMING_ACTIVE = (1 << 4), ATA_TIMING_RECOVER = (1 << 5), ATA_TIMING_DMACK_HOLD = (1 << 6), ATA_TIMING_CYCLE = (1 << 7), ATA_TIMING_UDMA = (1 << 8), ATA_TIMING_ALL = ATA_TIMING_SETUP | ATA_TIMING_ACT8B | ATA_TIMING_REC8B | ATA_TIMING_CYC8B | ATA_TIMING_ACTIVE | ATA_TIMING_RECOVER | ATA_TIMING_DMACK_HOLD | ATA_TIMING_CYCLE | ATA_TIMING_UDMA, /* ACPI constants */ ATA_ACPI_FILTER_SETXFER = 1 << 0, ATA_ACPI_FILTER_LOCK = 1 << 1, ATA_ACPI_FILTER_DIPM = 1 << 2, ATA_ACPI_FILTER_FPDMA_OFFSET = 1 << 3, /* FPDMA non-zero offset */ ATA_ACPI_FILTER_FPDMA_AA = 1 << 4, /* FPDMA auto activate */ ATA_ACPI_FILTER_DEFAULT = ATA_ACPI_FILTER_SETXFER | ATA_ACPI_FILTER_LOCK | ATA_ACPI_FILTER_DIPM, }; enum ata_xfer_mask { ATA_MASK_PIO = ((1U << ATA_NR_PIO_MODES) - 1) << ATA_SHIFT_PIO, ATA_MASK_MWDMA = ((1U << ATA_NR_MWDMA_MODES) - 1) << ATA_SHIFT_MWDMA, ATA_MASK_UDMA = ((1U << ATA_NR_UDMA_MODES) - 1) << ATA_SHIFT_UDMA, }; enum hsm_task_states { HSM_ST_IDLE, /* no command on going */ HSM_ST_FIRST, /* (waiting the device to) write CDB or first data block */ HSM_ST, /* (waiting the device to) transfer data */ HSM_ST_LAST, /* (waiting the device to) complete command */ HSM_ST_ERR, /* error */ }; enum ata_completion_errors { AC_ERR_OK = 0, /* no error */ AC_ERR_DEV = (1 << 0), /* device reported error */ AC_ERR_HSM = (1 << 1), /* host state machine violation */ AC_ERR_TIMEOUT = (1 << 2), /* timeout */ AC_ERR_MEDIA = (1 << 3), /* media error */ AC_ERR_ATA_BUS = (1 << 4), /* ATA bus error */ AC_ERR_HOST_BUS = (1 << 5), /* host bus error */ AC_ERR_SYSTEM = (1 << 6), /* system error */ AC_ERR_INVALID = (1 << 7), /* invalid argument */ AC_ERR_OTHER = (1 << 8), /* unknown */ AC_ERR_NODEV_HINT = (1 << 9), /* polling device detection hint */ AC_ERR_NCQ = (1 << 10), /* marker for offending NCQ qc */ }; /* * Link power management policy: If you alter this, you also need to * alter libata-sata.c (for the ascii descriptions) */ enum ata_lpm_policy { ATA_LPM_UNKNOWN, ATA_LPM_MAX_POWER, ATA_LPM_MED_POWER, ATA_LPM_MED_POWER_WITH_DIPM, /* Med power + DIPM as win IRST does */ ATA_LPM_MIN_POWER_WITH_PARTIAL, /* Min Power + partial and slumber */ ATA_LPM_MIN_POWER, /* Min power + no partial (slumber only) */ }; enum ata_lpm_hints { ATA_LPM_EMPTY = (1 << 0), /* port empty/probing */ ATA_LPM_HIPM = (1 << 1), /* may use HIPM */ ATA_LPM_WAKE_ONLY = (1 << 2), /* only wake up link */ }; /* forward declarations */ struct scsi_device; struct ata_port_operations; struct ata_port; struct ata_link; struct ata_queued_cmd; /* typedefs */ typedef void (*ata_qc_cb_t) (struct ata_queued_cmd *qc); typedef int (*ata_prereset_fn_t)(struct ata_link *link, unsigned long deadline); typedef int (*ata_reset_fn_t)(struct ata_link *link, unsigned int *classes, unsigned long deadline); typedef void (*ata_postreset_fn_t)(struct ata_link *link, unsigned int *classes); extern struct device_attribute dev_attr_unload_heads; #ifdef CONFIG_SATA_HOST extern struct device_attribute 
dev_attr_link_power_management_policy; extern struct device_attribute dev_attr_ncq_prio_supported; extern struct device_attribute dev_attr_ncq_prio_enable; extern struct device_attribute dev_attr_em_message_type; extern struct device_attribute dev_attr_em_message; extern struct device_attribute dev_attr_sw_activity; #endif enum sw_activity { OFF, BLINK_ON, BLINK_OFF, }; struct ata_taskfile { unsigned long flags; /* ATA_TFLAG_xxx */ u8 protocol; /* ATA_PROT_xxx */ u8 ctl; /* control reg */ u8 hob_feature; /* additional data */ u8 hob_nsect; /* to support LBA48 */ u8 hob_lbal; u8 hob_lbam; u8 hob_lbah; union { u8 error; u8 feature; }; u8 nsect; u8 lbal; u8 lbam; u8 lbah; u8 device; union { u8 status; u8 command; }; u32 auxiliary; /* auxiliary field */ /* from SATA 3.1 and */ /* ATA-8 ACS-3 */ }; #ifdef CONFIG_ATA_SFF struct ata_ioports { void __iomem *cmd_addr; void __iomem *data_addr; void __iomem *error_addr; void __iomem *feature_addr; void __iomem *nsect_addr; void __iomem *lbal_addr; void __iomem *lbam_addr; void __iomem *lbah_addr; void __iomem *device_addr; void __iomem *status_addr; void __iomem *command_addr; void __iomem *altstatus_addr; void __iomem *ctl_addr; #ifdef CONFIG_ATA_BMDMA void __iomem *bmdma_addr; #endif /* CONFIG_ATA_BMDMA */ void __iomem *scr_addr; }; #endif /* CONFIG_ATA_SFF */ struct ata_host { spinlock_t lock; struct device *dev; void __iomem * const *iomap; unsigned int n_ports; unsigned int n_tags; /* nr of NCQ tags */ void *private_data; struct ata_port_operations *ops; unsigned long flags; struct kref kref; struct mutex eh_mutex; struct task_struct *eh_owner; struct ata_port *simplex_claimed; /* channel owning the DMA */ struct ata_port *ports[]; }; struct ata_queued_cmd { struct ata_port *ap; struct ata_device *dev; struct scsi_cmnd *scsicmd; void (*scsidone)(struct scsi_cmnd *); struct ata_taskfile tf; u8 cdb[ATAPI_CDB_LEN]; unsigned long flags; /* ATA_QCFLAG_xxx */ unsigned int tag; /* libata core tag */ unsigned int hw_tag; /* driver tag */ unsigned int n_elem; unsigned int orig_n_elem; int dma_dir; unsigned int sect_size; unsigned int nbytes; unsigned int extrabytes; unsigned int curbytes; struct scatterlist sgent; struct scatterlist *sg; struct scatterlist *cursg; unsigned int cursg_ofs; unsigned int err_mask; struct ata_taskfile result_tf; ata_qc_cb_t complete_fn; void *private_data; void *lldd_task; }; struct ata_port_stats { unsigned long unhandled_irq; unsigned long idle_irq; unsigned long rw_reqbuf; }; struct ata_ering_entry { unsigned int eflags; unsigned int err_mask; u64 timestamp; }; struct ata_ering { int cursor; struct ata_ering_entry ring[ATA_ERING_SIZE]; }; struct ata_cpr { u8 num; u8 num_storage_elements; u64 start_lba; u64 num_lbas; }; struct ata_cpr_log { u8 nr_cpr; struct ata_cpr cpr[] __counted_by(nr_cpr); }; struct ata_cdl { /* * Buffer to cache the CDL log page 18h (command duration descriptors) * for SCSI-ATA translation. */ u8 desc_log_buf[ATA_LOG_CDL_SIZE]; /* * Buffer to handle reading the sense data for successful NCQ Commands * log page for commands using a CDL with one of the limits policy set * to 0xD (successful completion with sense data available bit set). 
*/ u8 ncq_sense_log_buf[ATA_LOG_SENSE_NCQ_SIZE]; }; struct ata_device { struct ata_link *link; unsigned int devno; /* 0 or 1 */ unsigned int quirks; /* List of broken features */ unsigned long flags; /* ATA_DFLAG_xxx */ struct scsi_device *sdev; /* attached SCSI device */ void *private_data; #ifdef CONFIG_ATA_ACPI union acpi_object *gtf_cache; unsigned int gtf_filter; #endif #ifdef CONFIG_SATA_ZPODD void *zpodd; #endif struct device tdev; /* n_sector is CLEAR_BEGIN, read comment above CLEAR_BEGIN */ u64 n_sectors; /* size of device, if ATA */ u64 n_native_sectors; /* native size, if ATA */ unsigned int class; /* ATA_DEV_xxx */ unsigned long unpark_deadline; u8 pio_mode; u8 dma_mode; u8 xfer_mode; unsigned int xfer_shift; /* ATA_SHIFT_xxx */ unsigned int multi_count; /* sectors count for READ/WRITE MULTIPLE */ unsigned int max_sectors; /* per-device max sectors */ unsigned int cdb_len; /* per-dev xfer mask */ unsigned int pio_mask; unsigned int mwdma_mask; unsigned int udma_mask; /* for CHS addressing */ u16 cylinders; /* Number of cylinders */ u16 heads; /* Number of heads */ u16 sectors; /* Number of sectors per track */ union { u16 id[ATA_ID_WORDS]; /* IDENTIFY xxx DEVICE data */ u32 gscr[SATA_PMP_GSCR_DWORDS]; /* PMP GSCR block */ } ____cacheline_aligned; /* DEVSLP Timing Variables from Identify Device Data Log */ u8 devslp_timing[ATA_LOG_DEVSLP_SIZE]; /* NCQ send and receive log subcommand support */ u8 ncq_send_recv_cmds[ATA_LOG_NCQ_SEND_RECV_SIZE]; u8 ncq_non_data_cmds[ATA_LOG_NCQ_NON_DATA_SIZE]; /* ZAC zone configuration */ u32 zac_zoned_cap; u32 zac_zones_optimal_open; u32 zac_zones_optimal_nonseq; u32 zac_zones_max_open; /* Concurrent positioning ranges */ struct ata_cpr_log *cpr_log; /* Command Duration Limits support */ struct ata_cdl *cdl; /* error history */ int spdn_cnt; /* ering is CLEAR_END, read comment above CLEAR_END */ struct ata_ering ering; /* For EH */ u8 sector_buf[ATA_SECT_SIZE] ____cacheline_aligned; }; /* Fields between ATA_DEVICE_CLEAR_BEGIN and ATA_DEVICE_CLEAR_END are * cleared to zero on ata_dev_init(). 
*/ #define ATA_DEVICE_CLEAR_BEGIN offsetof(struct ata_device, n_sectors) #define ATA_DEVICE_CLEAR_END offsetof(struct ata_device, ering) struct ata_eh_info { struct ata_device *dev; /* offending device */ u32 serror; /* SError from LLDD */ unsigned int err_mask; /* port-wide err_mask */ unsigned int action; /* ATA_EH_* action mask */ unsigned int dev_action[ATA_MAX_DEVICES]; /* dev EH action */ unsigned int flags; /* ATA_EHI_* flags */ unsigned int probe_mask; char desc[ATA_EH_DESC_LEN]; int desc_len; }; struct ata_eh_context { struct ata_eh_info i; int tries[ATA_MAX_DEVICES]; int cmd_timeout_idx[ATA_MAX_DEVICES] [ATA_EH_CMD_TIMEOUT_TABLE_SIZE]; unsigned int classes[ATA_MAX_DEVICES]; unsigned int did_probe_mask; unsigned int unloaded_mask; unsigned int saved_ncq_enabled; u8 saved_xfer_mode[ATA_MAX_DEVICES]; /* timestamp for the last reset attempt or success */ unsigned long last_reset; }; struct ata_acpi_drive { u32 pio; u32 dma; } __packed; struct ata_acpi_gtm { struct ata_acpi_drive drive[2]; u32 flags; } __packed; struct ata_link { struct ata_port *ap; int pmp; /* port multiplier port # */ struct device tdev; unsigned int active_tag; /* active tag on this link */ u32 sactive; /* active NCQ commands */ unsigned int flags; /* ATA_LFLAG_xxx */ u32 saved_scontrol; /* SControl on probe */ unsigned int hw_sata_spd_limit; unsigned int sata_spd_limit; unsigned int sata_spd; /* current SATA PHY speed */ enum ata_lpm_policy lpm_policy; /* record runtime error info, protected by host_set lock */ struct ata_eh_info eh_info; /* EH context */ struct ata_eh_context eh_context; struct ata_device device[ATA_MAX_DEVICES]; unsigned long last_lpm_change; /* when last LPM change happened */ }; #define ATA_LINK_CLEAR_BEGIN offsetof(struct ata_link, active_tag) #define ATA_LINK_CLEAR_END offsetof(struct ata_link, device[0]) struct ata_port { struct Scsi_Host *scsi_host; /* our co-allocated scsi host */ struct ata_port_operations *ops; spinlock_t *lock; /* Flags owned by the EH context. Only EH should touch these once the port is active */ unsigned long flags; /* ATA_FLAG_xxx */ /* Flags that change dynamically, protected by ap->lock */ unsigned int pflags; /* ATA_PFLAG_xxx */ unsigned int print_id; /* user visible unique port ID */ unsigned int port_no; /* 0 based port no. 
inside the host */ #ifdef CONFIG_ATA_SFF struct ata_ioports ioaddr; /* ATA cmd/ctl/dma register blocks */ u8 ctl; /* cache of ATA control register */ u8 last_ctl; /* Cache last written value */ struct ata_link* sff_pio_task_link; /* link currently used */ struct delayed_work sff_pio_task; #ifdef CONFIG_ATA_BMDMA struct ata_bmdma_prd *bmdma_prd; /* BMDMA SG list */ dma_addr_t bmdma_prd_dma; /* and its DMA mapping */ #endif /* CONFIG_ATA_BMDMA */ #endif /* CONFIG_ATA_SFF */ unsigned int pio_mask; unsigned int mwdma_mask; unsigned int udma_mask; unsigned int cbl; /* cable type; ATA_CBL_xxx */ struct ata_queued_cmd qcmd[ATA_MAX_QUEUE + 1]; u64 qc_active; int nr_active_links; /* #links with active qcs */ struct ata_link link; /* host default link */ struct ata_link *slave_link; /* see ata_slave_link_init() */ int nr_pmp_links; /* nr of available PMP links */ struct ata_link *pmp_link; /* array of PMP links */ struct ata_link *excl_link; /* for PMP qc exclusion */ struct ata_port_stats stats; struct ata_host *host; struct device *dev; struct device tdev; struct mutex scsi_scan_mutex; struct delayed_work hotplug_task; struct delayed_work scsi_rescan_task; unsigned int hsm_task_state; struct list_head eh_done_q; wait_queue_head_t eh_wait_q; int eh_tries; struct completion park_req_pending; pm_message_t pm_mesg; enum ata_lpm_policy target_lpm_policy; struct timer_list fastdrain_timer; unsigned int fastdrain_cnt; async_cookie_t cookie; int em_message_type; void *private_data; #ifdef CONFIG_ATA_ACPI struct ata_acpi_gtm __acpi_init_gtm; /* use ata_acpi_init_gtm() */ #endif }; /* The following initializer overrides a method to NULL whether one of * its parent has the method defined or not. This is equivalent to * ERR_PTR(-ENOENT). Unfortunately, ERR_PTR doesn't render a constant * expression and thus can't be used as an initializer. 
*/ #define ATA_OP_NULL (void *)(unsigned long)(-ENOENT) struct ata_port_operations { /* * Command execution */ int (*qc_defer)(struct ata_queued_cmd *qc); int (*check_atapi_dma)(struct ata_queued_cmd *qc); enum ata_completion_errors (*qc_prep)(struct ata_queued_cmd *qc); unsigned int (*qc_issue)(struct ata_queued_cmd *qc); void (*qc_fill_rtf)(struct ata_queued_cmd *qc); void (*qc_ncq_fill_rtf)(struct ata_port *ap, u64 done_mask); /* * Configuration and exception handling */ int (*cable_detect)(struct ata_port *ap); unsigned int (*mode_filter)(struct ata_device *dev, unsigned int xfer_mask); void (*set_piomode)(struct ata_port *ap, struct ata_device *dev); void (*set_dmamode)(struct ata_port *ap, struct ata_device *dev); int (*set_mode)(struct ata_link *link, struct ata_device **r_failed_dev); unsigned int (*read_id)(struct ata_device *dev, struct ata_taskfile *tf, __le16 *id); void (*dev_config)(struct ata_device *dev); void (*freeze)(struct ata_port *ap); void (*thaw)(struct ata_port *ap); ata_prereset_fn_t prereset; ata_reset_fn_t softreset; ata_reset_fn_t hardreset; ata_postreset_fn_t postreset; ata_prereset_fn_t pmp_prereset; ata_reset_fn_t pmp_softreset; ata_reset_fn_t pmp_hardreset; ata_postreset_fn_t pmp_postreset; void (*error_handler)(struct ata_port *ap); void (*lost_interrupt)(struct ata_port *ap); void (*post_internal_cmd)(struct ata_queued_cmd *qc); void (*sched_eh)(struct ata_port *ap); void (*end_eh)(struct ata_port *ap); /* * Optional features */ int (*scr_read)(struct ata_link *link, unsigned int sc_reg, u32 *val); int (*scr_write)(struct ata_link *link, unsigned int sc_reg, u32 val); void (*pmp_attach)(struct ata_port *ap); void (*pmp_detach)(struct ata_port *ap); int (*set_lpm)(struct ata_link *link, enum ata_lpm_policy policy, unsigned hints); /* * Start, stop, suspend and resume */ int (*port_suspend)(struct ata_port *ap, pm_message_t mesg); int (*port_resume)(struct ata_port *ap); int (*port_start)(struct ata_port *ap); void (*port_stop)(struct ata_port *ap); void (*host_stop)(struct ata_host *host); #ifdef CONFIG_ATA_SFF /* * SFF / taskfile oriented ops */ void (*sff_dev_select)(struct ata_port *ap, unsigned int device); void (*sff_set_devctl)(struct ata_port *ap, u8 ctl); u8 (*sff_check_status)(struct ata_port *ap); u8 (*sff_check_altstatus)(struct ata_port *ap); void (*sff_tf_load)(struct ata_port *ap, const struct ata_taskfile *tf); void (*sff_tf_read)(struct ata_port *ap, struct ata_taskfile *tf); void (*sff_exec_command)(struct ata_port *ap, const struct ata_taskfile *tf); unsigned int (*sff_data_xfer)(struct ata_queued_cmd *qc, unsigned char *buf, unsigned int buflen, int rw); void (*sff_irq_on)(struct ata_port *); bool (*sff_irq_check)(struct ata_port *); void (*sff_irq_clear)(struct ata_port *); void (*sff_drain_fifo)(struct ata_queued_cmd *qc); #ifdef CONFIG_ATA_BMDMA void (*bmdma_setup)(struct ata_queued_cmd *qc); void (*bmdma_start)(struct ata_queued_cmd *qc); void (*bmdma_stop)(struct ata_queued_cmd *qc); u8 (*bmdma_status)(struct ata_port *ap); #endif /* CONFIG_ATA_BMDMA */ #endif /* CONFIG_ATA_SFF */ ssize_t (*em_show)(struct ata_port *ap, char *buf); ssize_t (*em_store)(struct ata_port *ap, const char *message, size_t size); ssize_t (*sw_activity_show)(struct ata_device *dev, char *buf); ssize_t (*sw_activity_store)(struct ata_device *dev, enum sw_activity val); ssize_t (*transmit_led_message)(struct ata_port *ap, u32 state, ssize_t size); /* * ->inherits must be the last field and all the preceding * fields must be pointers. 
*/ const struct ata_port_operations *inherits; }; struct ata_port_info { unsigned long flags; unsigned long link_flags; unsigned int pio_mask; unsigned int mwdma_mask; unsigned int udma_mask; struct ata_port_operations *port_ops; void *private_data; }; struct ata_timing { unsigned short mode; /* ATA mode */ unsigned short setup; /* t1 */ unsigned short act8b; /* t2 for 8-bit I/O */ unsigned short rec8b; /* t2i for 8-bit I/O */ unsigned short cyc8b; /* t0 for 8-bit I/O */ unsigned short active; /* t2 or tD */ unsigned short recover; /* t2i or tK */ unsigned short dmack_hold; /* tj */ unsigned short cycle; /* t0 */ unsigned short udma; /* t2CYCTYP/2 */ }; /* * Core layer - drivers/ata/libata-core.c */ extern struct ata_port_operations ata_dummy_port_ops; extern const struct ata_port_info ata_dummy_port_info; static inline bool ata_is_atapi(u8 prot) { return prot & ATA_PROT_FLAG_ATAPI; } static inline bool ata_is_pio(u8 prot) { return prot & ATA_PROT_FLAG_PIO; } static inline bool ata_is_dma(u8 prot) { return prot & ATA_PROT_FLAG_DMA; } static inline bool ata_is_ncq(u8 prot) { return prot & ATA_PROT_FLAG_NCQ; } static inline bool ata_is_data(u8 prot) { return prot & (ATA_PROT_FLAG_PIO | ATA_PROT_FLAG_DMA); } static inline int is_multi_taskfile(struct ata_taskfile *tf) { return (tf->command == ATA_CMD_READ_MULTI) || (tf->command == ATA_CMD_WRITE_MULTI) || (tf->command == ATA_CMD_READ_MULTI_EXT) || (tf->command == ATA_CMD_WRITE_MULTI_EXT) || (tf->command == ATA_CMD_WRITE_MULTI_FUA_EXT); } static inline int ata_port_is_dummy(struct ata_port *ap) { return ap->ops == &ata_dummy_port_ops; } static inline bool ata_port_is_frozen(const struct ata_port *ap) { return ap->pflags & ATA_PFLAG_FROZEN; } extern int ata_std_prereset(struct ata_link *link, unsigned long deadline); extern int ata_wait_after_reset(struct ata_link *link, unsigned long deadline, int (*check_ready)(struct ata_link *link)); extern void ata_std_postreset(struct ata_link *link, unsigned int *classes); extern struct ata_host *ata_host_alloc(struct device *dev, int n_ports); extern struct ata_host *ata_host_alloc_pinfo(struct device *dev, const struct ata_port_info * const * ppi, int n_ports); extern void ata_host_get(struct ata_host *host); extern void ata_host_put(struct ata_host *host); extern int ata_host_start(struct ata_host *host); extern int ata_host_register(struct ata_host *host, const struct scsi_host_template *sht); extern int ata_host_activate(struct ata_host *host, int irq, irq_handler_t irq_handler, unsigned long irq_flags, const struct scsi_host_template *sht); extern void ata_host_detach(struct ata_host *host); extern void ata_host_init(struct ata_host *, struct device *, struct ata_port_operations *); extern int ata_scsi_ioctl(struct scsi_device *dev, unsigned int cmd, void __user *arg); #ifdef CONFIG_COMPAT #define ATA_SCSI_COMPAT_IOCTL .compat_ioctl = ata_scsi_ioctl, #else #define ATA_SCSI_COMPAT_IOCTL /* empty */ #endif extern int ata_scsi_queuecmd(struct Scsi_Host *h, struct scsi_cmnd *cmd); #if IS_REACHABLE(CONFIG_ATA) bool ata_scsi_dma_need_drain(struct request *rq); #else #define ata_scsi_dma_need_drain NULL #endif extern int ata_sas_scsi_ioctl(struct ata_port *ap, struct scsi_device *dev, unsigned int cmd, void __user *arg); extern bool ata_link_online(struct ata_link *link); extern bool ata_link_offline(struct ata_link *link); #ifdef CONFIG_PM extern void ata_host_suspend(struct ata_host *host, pm_message_t mesg); extern void ata_host_resume(struct ata_host *host); extern void ata_sas_port_suspend(struct 
ata_port *ap); extern void ata_sas_port_resume(struct ata_port *ap); #else static inline void ata_sas_port_suspend(struct ata_port *ap) { } static inline void ata_sas_port_resume(struct ata_port *ap) { } #endif extern int ata_ratelimit(void); extern void ata_msleep(struct ata_port *ap, unsigned int msecs); extern u32 ata_wait_register(struct ata_port *ap, void __iomem *reg, u32 mask, u32 val, unsigned int interval, unsigned int timeout); extern int atapi_cmd_type(u8 opcode); extern unsigned int ata_pack_xfermask(unsigned int pio_mask, unsigned int mwdma_mask, unsigned int udma_mask); extern void ata_unpack_xfermask(unsigned int xfer_mask, unsigned int *pio_mask, unsigned int *mwdma_mask, unsigned int *udma_mask); extern u8 ata_xfer_mask2mode(unsigned int xfer_mask); extern unsigned int ata_xfer_mode2mask(u8 xfer_mode); extern int ata_xfer_mode2shift(u8 xfer_mode); extern const char *ata_mode_string(unsigned int xfer_mask); extern unsigned int ata_id_xfermask(const u16 *id); extern int ata_std_qc_defer(struct ata_queued_cmd *qc); extern void ata_sg_init(struct ata_queued_cmd *qc, struct scatterlist *sg, unsigned int n_elem); extern unsigned int ata_dev_classify(const struct ata_taskfile *tf); extern unsigned int ata_port_classify(struct ata_port *ap, const struct ata_taskfile *tf); extern void ata_dev_disable(struct ata_device *adev); extern void ata_id_string(const u16 *id, unsigned char *s, unsigned int ofs, unsigned int len); extern void ata_id_c_string(const u16 *id, unsigned char *s, unsigned int ofs, unsigned int len); extern unsigned int ata_do_dev_read_id(struct ata_device *dev, struct ata_taskfile *tf, __le16 *id); extern void ata_qc_complete(struct ata_queued_cmd *qc); extern u64 ata_qc_get_active(struct ata_port *ap); extern void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd); extern int ata_std_bios_param(struct scsi_device *sdev, struct block_device *bdev, sector_t capacity, int geom[]); extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev); extern int ata_scsi_sdev_init(struct scsi_device *sdev); int ata_scsi_sdev_configure(struct scsi_device *sdev, struct queue_limits *lim); extern void ata_scsi_sdev_destroy(struct scsi_device *sdev); extern int ata_scsi_change_queue_depth(struct scsi_device *sdev, int queue_depth); extern int ata_change_queue_depth(struct ata_port *ap, struct scsi_device *sdev, int queue_depth); extern int ata_ncq_prio_supported(struct ata_port *ap, struct scsi_device *sdev, bool *supported); extern int ata_ncq_prio_enabled(struct ata_port *ap, struct scsi_device *sdev, bool *enabled); extern int ata_ncq_prio_enable(struct ata_port *ap, struct scsi_device *sdev, bool enable); extern struct ata_device *ata_dev_pair(struct ata_device *adev); extern int ata_do_set_mode(struct ata_link *link, struct ata_device **r_failed_dev); extern void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap); extern void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, struct list_head *eh_q); /* * SATA specific code - drivers/ata/libata-sata.c */ #ifdef CONFIG_SATA_HOST extern const unsigned int sata_deb_timing_normal[]; extern const unsigned int sata_deb_timing_hotplug[]; extern const unsigned int sata_deb_timing_long[]; static inline const unsigned int * sata_ehc_deb_timing(struct ata_eh_context *ehc) { if (ehc->i.flags & ATA_EHI_HOTPLUGGED) return sata_deb_timing_hotplug; else return sata_deb_timing_normal; } extern int sata_scr_valid(struct ata_link *link); extern int sata_scr_read(struct ata_link 
*link, int reg, u32 *val); extern int sata_scr_write(struct ata_link *link, int reg, u32 val); extern int sata_scr_write_flush(struct ata_link *link, int reg, u32 val); extern int sata_set_spd(struct ata_link *link); int sata_std_hardreset(struct ata_link *link, unsigned int *class, unsigned long deadline); extern int sata_link_hardreset(struct ata_link *link, const unsigned int *timing, unsigned long deadline, bool *online, int (*check_ready)(struct ata_link *)); extern int sata_link_resume(struct ata_link *link, const unsigned int *params, unsigned long deadline); extern void ata_eh_analyze_ncq_error(struct ata_link *link); #else static inline const unsigned int * sata_ehc_deb_timing(struct ata_eh_context *ehc) { return NULL; } static inline int sata_scr_valid(struct ata_link *link) { return 0; } static inline int sata_scr_read(struct ata_link *link, int reg, u32 *val) { return -EOPNOTSUPP; } static inline int sata_scr_write(struct ata_link *link, int reg, u32 val) { return -EOPNOTSUPP; } static inline int sata_scr_write_flush(struct ata_link *link, int reg, u32 val) { return -EOPNOTSUPP; } static inline int sata_set_spd(struct ata_link *link) { return -EOPNOTSUPP; } static inline int sata_std_hardreset(struct ata_link *link, unsigned int *class, unsigned long deadline) { return -EOPNOTSUPP; } static inline int sata_link_hardreset(struct ata_link *link, const unsigned int *timing, unsigned long deadline, bool *online, int (*check_ready)(struct ata_link *)) { if (online) *online = false; return -EOPNOTSUPP; } static inline int sata_link_resume(struct ata_link *link, const unsigned int *params, unsigned long deadline) { return -EOPNOTSUPP; } static inline void ata_eh_analyze_ncq_error(struct ata_link *link) { } #endif extern int sata_link_debounce(struct ata_link *link, const unsigned int *params, unsigned long deadline); extern int sata_link_scr_lpm(struct ata_link *link, enum ata_lpm_policy policy, bool spm_wakeup); extern int ata_slave_link_init(struct ata_port *ap); extern void ata_port_probe(struct ata_port *ap); extern struct ata_port *ata_port_alloc(struct ata_host *host); extern void ata_port_free(struct ata_port *ap); extern int ata_tport_add(struct device *parent, struct ata_port *ap); extern void ata_tport_delete(struct ata_port *ap); int ata_sas_sdev_configure(struct scsi_device *sdev, struct queue_limits *lim, struct ata_port *ap); extern int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap); extern void ata_tf_to_fis(const struct ata_taskfile *tf, u8 pmp, int is_cmd, u8 *fis); extern void ata_tf_from_fis(const u8 *fis, struct ata_taskfile *tf); extern int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active); extern bool sata_lpm_ignore_phy_events(struct ata_link *link); extern int sata_async_notification(struct ata_port *ap); extern int ata_cable_40wire(struct ata_port *ap); extern int ata_cable_80wire(struct ata_port *ap); extern int ata_cable_sata(struct ata_port *ap); extern int ata_cable_ignore(struct ata_port *ap); extern int ata_cable_unknown(struct ata_port *ap); /* Timing helpers */ extern unsigned int ata_pio_need_iordy(const struct ata_device *); extern u8 ata_timing_cycle2mode(unsigned int xfer_shift, int cycle); /* PCI */ #ifdef CONFIG_PCI struct pci_dev; struct pci_bits { unsigned int reg; /* PCI config register to read */ unsigned int width; /* 1 (8 bit), 2 (16 bit), 4 (32 bit) */ unsigned long mask; unsigned long val; }; extern int pci_test_config_bits(struct pci_dev *pdev, const struct pci_bits *bits); extern void ata_pci_shutdown_one(struct 
pci_dev *pdev); extern void ata_pci_remove_one(struct pci_dev *pdev); #ifdef CONFIG_PM extern void ata_pci_device_do_suspend(struct pci_dev *pdev, pm_message_t mesg); extern int __must_check ata_pci_device_do_resume(struct pci_dev *pdev); extern int ata_pci_device_suspend(struct pci_dev *pdev, pm_message_t mesg); extern int ata_pci_device_resume(struct pci_dev *pdev); #endif /* CONFIG_PM */ #endif /* CONFIG_PCI */ struct platform_device; extern void ata_platform_remove_one(struct platform_device *pdev); /* * ACPI - drivers/ata/libata-acpi.c */ #ifdef CONFIG_ATA_ACPI static inline const struct ata_acpi_gtm *ata_acpi_init_gtm(struct ata_port *ap) { if (ap->pflags & ATA_PFLAG_INIT_GTM_VALID) return &ap->__acpi_init_gtm; return NULL; } int ata_acpi_stm(struct ata_port *ap, const struct ata_acpi_gtm *stm); int ata_acpi_gtm(struct ata_port *ap, struct ata_acpi_gtm *stm); unsigned int ata_acpi_gtm_xfermask(struct ata_device *dev, const struct ata_acpi_gtm *gtm); int ata_acpi_cbl_pata_type(struct ata_port *ap); #else static inline const struct ata_acpi_gtm *ata_acpi_init_gtm(struct ata_port *ap) { return NULL; } static inline int ata_acpi_stm(const struct ata_port *ap, struct ata_acpi_gtm *stm) { return -ENOSYS; } static inline int ata_acpi_gtm(const struct ata_port *ap, struct ata_acpi_gtm *stm) { return -ENOSYS; } static inline unsigned int ata_acpi_gtm_xfermask(struct ata_device *dev, const struct ata_acpi_gtm *gtm) { return 0; } static inline int ata_acpi_cbl_pata_type(struct ata_port *ap) { return ATA_CBL_PATA40; } #endif /* * EH - drivers/ata/libata-eh.c */ extern void ata_port_schedule_eh(struct ata_port *ap); extern void ata_port_wait_eh(struct ata_port *ap); extern int ata_link_abort(struct ata_link *link); extern int ata_port_abort(struct ata_port *ap); extern int ata_port_freeze(struct ata_port *ap); extern void ata_eh_freeze_port(struct ata_port *ap); extern void ata_eh_thaw_port(struct ata_port *ap); extern void ata_eh_qc_complete(struct ata_queued_cmd *qc); extern void ata_eh_qc_retry(struct ata_queued_cmd *qc); extern void ata_do_eh(struct ata_port *ap, ata_prereset_fn_t prereset, ata_reset_fn_t softreset, ata_reset_fn_t hardreset, ata_postreset_fn_t postreset); extern void ata_std_error_handler(struct ata_port *ap); extern void ata_std_sched_eh(struct ata_port *ap); extern void ata_std_end_eh(struct ata_port *ap); extern int ata_link_nr_enabled(struct ata_link *link); /* * Base operations to inherit from and initializers for sht * * Operations * * base : Common to all libata drivers. * sata : SATA controllers w/ native interface. * pmp : SATA controllers w/ PMP support. * sff : SFF ATA controllers w/o BMDMA support. * bmdma : SFF ATA controllers w/ BMDMA support. * * sht initializers * * BASE : Common to all libata drivers. The user must set * sg_tablesize and dma_boundary. * PIO : SFF ATA controllers w/ only PIO support. * BMDMA : SFF ATA controllers w/ BMDMA support. sg_tablesize and * dma_boundary are set to BMDMA limits. * NCQ : SATA controllers supporting NCQ. The user must set * sg_tablesize, dma_boundary and can_queue. */ extern const struct ata_port_operations ata_base_port_ops; extern const struct ata_port_operations sata_port_ops; extern const struct attribute_group *ata_common_sdev_groups[]; /* * All sht initializers (BASE, PIO, BMDMA, NCQ) must be instantiated * by the edge drivers. This is because the 'module' field of sht must be * the edge driver's module reference; otherwise the driver can be unloaded * even while the scsi_device is still being accessed.
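 *
 * For example, a hypothetical NCQ-capable edge driver "foo" would
 * instantiate its template as:
 *
 *	static const struct scsi_host_template foo_sht = {
 *		ATA_NCQ_SHT("foo"),
 *		.sg_tablesize	= LIBATA_MAX_PRD,
 *		.dma_boundary	= ATA_DMA_BOUNDARY,
 *	};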
*/ #define __ATA_BASE_SHT(drv_name) \ .module = THIS_MODULE, \ .name = drv_name, \ .ioctl = ata_scsi_ioctl, \ ATA_SCSI_COMPAT_IOCTL \ .queuecommand = ata_scsi_queuecmd, \ .dma_need_drain = ata_scsi_dma_need_drain, \ .this_id = ATA_SHT_THIS_ID, \ .emulated = ATA_SHT_EMULATED, \ .proc_name = drv_name, \ .sdev_init = ata_scsi_sdev_init, \ .sdev_destroy = ata_scsi_sdev_destroy, \ .bios_param = ata_std_bios_param, \ .unlock_native_capacity = ata_scsi_unlock_native_capacity,\ .max_sectors = ATA_MAX_SECTORS_LBA48 #define ATA_SUBBASE_SHT(drv_name) \ __ATA_BASE_SHT(drv_name), \ .can_queue = ATA_DEF_QUEUE, \ .tag_alloc_policy_rr = true, \ .sdev_configure = ata_scsi_sdev_configure #define ATA_SUBBASE_SHT_QD(drv_name, drv_qd) \ __ATA_BASE_SHT(drv_name), \ .can_queue = drv_qd, \ .tag_alloc_policy_rr = true, \ .sdev_configure = ata_scsi_sdev_configure #define ATA_BASE_SHT(drv_name) \ ATA_SUBBASE_SHT(drv_name), \ .sdev_groups = ata_common_sdev_groups #ifdef CONFIG_SATA_HOST extern const struct attribute_group *ata_ncq_sdev_groups[]; #define ATA_NCQ_SHT(drv_name) \ ATA_SUBBASE_SHT(drv_name), \ .sdev_groups = ata_ncq_sdev_groups, \ .change_queue_depth = ata_scsi_change_queue_depth #define ATA_NCQ_SHT_QD(drv_name, drv_qd) \ ATA_SUBBASE_SHT_QD(drv_name, drv_qd), \ .sdev_groups = ata_ncq_sdev_groups, \ .change_queue_depth = ata_scsi_change_queue_depth #endif /* * PMP helpers */ #ifdef CONFIG_SATA_PMP static inline bool sata_pmp_supported(struct ata_port *ap) { return ap->flags & ATA_FLAG_PMP; } static inline bool sata_pmp_attached(struct ata_port *ap) { return ap->nr_pmp_links != 0; } static inline bool ata_is_host_link(const struct ata_link *link) { return link == &link->ap->link || link == link->ap->slave_link; } #else /* CONFIG_SATA_PMP */ static inline bool sata_pmp_supported(struct ata_port *ap) { return false; } static inline bool sata_pmp_attached(struct ata_port *ap) { return false; } static inline bool ata_is_host_link(const struct ata_link *link) { return true; } #endif /* CONFIG_SATA_PMP */ static inline int sata_srst_pmp(struct ata_link *link) { if (sata_pmp_supported(link->ap) && ata_is_host_link(link)) return SATA_PMP_CTRL_PORT; return link->pmp; } #define ata_port_printk(level, ap, fmt, ...) \ pr_ ## level ("ata%u: " fmt, (ap)->print_id, ##__VA_ARGS__) #define ata_port_err(ap, fmt, ...) \ ata_port_printk(err, ap, fmt, ##__VA_ARGS__) #define ata_port_warn(ap, fmt, ...) \ ata_port_printk(warn, ap, fmt, ##__VA_ARGS__) #define ata_port_notice(ap, fmt, ...) \ ata_port_printk(notice, ap, fmt, ##__VA_ARGS__) #define ata_port_info(ap, fmt, ...) \ ata_port_printk(info, ap, fmt, ##__VA_ARGS__) #define ata_port_dbg(ap, fmt, ...) \ ata_port_printk(debug, ap, fmt, ##__VA_ARGS__) #define ata_link_printk(level, link, fmt, ...) \ do { \ if (sata_pmp_attached((link)->ap) || \ (link)->ap->slave_link) \ pr_ ## level ("ata%u.%02u: " fmt, \ (link)->ap->print_id, \ (link)->pmp, \ ##__VA_ARGS__); \ else \ pr_ ## level ("ata%u: " fmt, \ (link)->ap->print_id, \ ##__VA_ARGS__); \ } while (0) #define ata_link_err(link, fmt, ...) \ ata_link_printk(err, link, fmt, ##__VA_ARGS__) #define ata_link_warn(link, fmt, ...) \ ata_link_printk(warn, link, fmt, ##__VA_ARGS__) #define ata_link_notice(link, fmt, ...) \ ata_link_printk(notice, link, fmt, ##__VA_ARGS__) #define ata_link_info(link, fmt, ...) \ ata_link_printk(info, link, fmt, ##__VA_ARGS__) #define ata_link_dbg(link, fmt, ...) \ ata_link_printk(debug, link, fmt, ##__VA_ARGS__) #define ata_dev_printk(level, dev, fmt, ...) 
\ pr_ ## level("ata%u.%02u: " fmt, \ (dev)->link->ap->print_id, \ (dev)->link->pmp + (dev)->devno, \ ##__VA_ARGS__) #define ata_dev_err(dev, fmt, ...) \ ata_dev_printk(err, dev, fmt, ##__VA_ARGS__) #define ata_dev_warn(dev, fmt, ...) \ ata_dev_printk(warn, dev, fmt, ##__VA_ARGS__) #define ata_dev_notice(dev, fmt, ...) \ ata_dev_printk(notice, dev, fmt, ##__VA_ARGS__) #define ata_dev_info(dev, fmt, ...) \ ata_dev_printk(info, dev, fmt, ##__VA_ARGS__) #define ata_dev_dbg(dev, fmt, ...) \ ata_dev_printk(debug, dev, fmt, ##__VA_ARGS__) static inline void ata_print_version_once(const struct device *dev, const char *version) { dev_dbg_once(dev, "version %s\n", version); } /* * ata_eh_info helpers */ extern __printf(2, 3) void __ata_ehi_push_desc(struct ata_eh_info *ehi, const char *fmt, ...); extern __printf(2, 3) void ata_ehi_push_desc(struct ata_eh_info *ehi, const char *fmt, ...); extern void ata_ehi_clear_desc(struct ata_eh_info *ehi); static inline void ata_ehi_hotplugged(struct ata_eh_info *ehi) { ehi->probe_mask |= (1 << ATA_MAX_DEVICES) - 1; ehi->flags |= ATA_EHI_HOTPLUGGED; ehi->action |= ATA_EH_RESET | ATA_EH_ENABLE_LINK; ehi->err_mask |= AC_ERR_ATA_BUS; } /* * port description helpers */ extern __printf(2, 3) void ata_port_desc(struct ata_port *ap, const char *fmt, ...); #ifdef CONFIG_PCI extern void ata_port_pbar_desc(struct ata_port *ap, int bar, ssize_t offset, const char *name); #endif static inline void ata_port_desc_misc(struct ata_port *ap, int irq) { ata_port_desc(ap, "irq %d", irq); ata_port_desc(ap, "lpm-pol %d", ap->target_lpm_policy); if (ap->pflags & ATA_PFLAG_EXTERNAL) ata_port_desc(ap, "ext"); } static inline bool ata_tag_internal(unsigned int tag) { return tag == ATA_TAG_INTERNAL; } static inline bool ata_tag_valid(unsigned int tag) { return tag < ATA_MAX_QUEUE || ata_tag_internal(tag); } #define __ata_qc_for_each(ap, qc, tag, max_tag, fn) \ for ((tag) = 0; (tag) < (max_tag) && \ ({ qc = fn((ap), (tag)); 1; }); (tag)++) \ /* * Internal use only, iterate commands ignoring error handling and * status of 'qc'. 
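 *
 * (Typically needed from error-handling context, which must also visit
 * commands that are no longer ATA_QCFLAG_ACTIVE or that have already been
 * claimed by EH and would therefore be skipped by ata_qc_from_tag().)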
*/ #define ata_qc_for_each_raw(ap, qc, tag) \ __ata_qc_for_each(ap, qc, tag, ATA_MAX_QUEUE, __ata_qc_from_tag) /* * Iterate all potential commands that can be queued */ #define ata_qc_for_each(ap, qc, tag) \ __ata_qc_for_each(ap, qc, tag, ATA_MAX_QUEUE, ata_qc_from_tag) /* * Like ata_qc_for_each, but with the internal tag included */ #define ata_qc_for_each_with_internal(ap, qc, tag) \ __ata_qc_for_each(ap, qc, tag, ATA_MAX_QUEUE + 1, ata_qc_from_tag) /* * device helpers */ static inline unsigned int ata_class_enabled(unsigned int class) { return class == ATA_DEV_ATA || class == ATA_DEV_ATAPI || class == ATA_DEV_PMP || class == ATA_DEV_SEMB || class == ATA_DEV_ZAC; } static inline unsigned int ata_class_disabled(unsigned int class) { return class == ATA_DEV_ATA_UNSUP || class == ATA_DEV_ATAPI_UNSUP || class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP || class == ATA_DEV_ZAC_UNSUP; } static inline unsigned int ata_class_absent(unsigned int class) { return !ata_class_enabled(class) && !ata_class_disabled(class); } static inline unsigned int ata_dev_enabled(const struct ata_device *dev) { return ata_class_enabled(dev->class); } static inline unsigned int ata_dev_disabled(const struct ata_device *dev) { return ata_class_disabled(dev->class); } static inline unsigned int ata_dev_absent(const struct ata_device *dev) { return ata_class_absent(dev->class); } /* * link helpers */ static inline int ata_link_max_devices(const struct ata_link *link) { if (ata_is_host_link(link) && link->ap->flags & ATA_FLAG_SLAVE_POSS) return 2; return 1; } static inline int ata_link_active(struct ata_link *link) { return ata_tag_valid(link->active_tag) || link->sactive; } /* * Iterators * * ATA_LITER_* constants are used to select link iteration mode and * ATA_DITER_* device iteration mode. * * For a custom iteration directly using ata_{link|dev}_next(), if * @link or @dev, respectively, is NULL, the first element is * returned. @dev and @link can be any valid device or link and the * next element according to the iteration mode will be returned. * After the last element, NULL is returned. */ enum ata_link_iter_mode { ATA_LITER_EDGE, /* if present, PMP links only; otherwise, * host link. no slave link */ ATA_LITER_HOST_FIRST, /* host link followed by PMP or slave links */ ATA_LITER_PMP_FIRST, /* PMP links followed by host link, * slave link still comes after host link */ }; enum ata_dev_iter_mode { ATA_DITER_ENABLED, ATA_DITER_ENABLED_REVERSE, ATA_DITER_ALL, ATA_DITER_ALL_REVERSE, }; extern struct ata_link *ata_link_next(struct ata_link *link, struct ata_port *ap, enum ata_link_iter_mode mode); extern struct ata_device *ata_dev_next(struct ata_device *dev, struct ata_link *link, enum ata_dev_iter_mode mode); /* * Shortcut notation for iterations * * ata_for_each_link() iterates over each link of @ap according to * @mode. @link points to the current link in the loop. @link is * NULL after loop termination. ata_for_each_dev() works the same way * except that it iterates over each device of @link. * * Note that the mode prefixes ATA_{L|D}ITER_ shouldn't need to be * specified when using the following shorthand notations. Only the * mode itself (EDGE, HOST_FIRST, ENABLED, etc...) should be * specified. This not only increases brevity but also makes it * impossible to use ATA_LITER_* for device iteration or vice-versa. 
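 *
 * For example, to log every enabled device reachable from port @ap:
 *
 *	struct ata_link *link;
 *	struct ata_device *dev;
 *
 *	ata_for_each_link(link, ap, EDGE)
 *		ata_for_each_dev(dev, link, ENABLED)
 *			ata_dev_info(dev, "device is enabled\n");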
*/ #define ata_for_each_link(link, ap, mode) \ for ((link) = ata_link_next(NULL, (ap), ATA_LITER_##mode); (link); \ (link) = ata_link_next((link), (ap), ATA_LITER_##mode)) #define ata_for_each_dev(dev, link, mode) \ for ((dev) = ata_dev_next(NULL, (link), ATA_DITER_##mode); (dev); \ (dev) = ata_dev_next((dev), (link), ATA_DITER_##mode)) /** * ata_ncq_supported - Test whether NCQ is supported * @dev: ATA device to test * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * true if @dev supports NCQ, false otherwise. */ static inline bool ata_ncq_supported(struct ata_device *dev) { if (!IS_ENABLED(CONFIG_SATA_HOST)) return false; return (dev->flags & (ATA_DFLAG_PIO | ATA_DFLAG_NCQ)) == ATA_DFLAG_NCQ; } /** * ata_ncq_enabled - Test whether NCQ is enabled * @dev: ATA device to test * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * true if NCQ is enabled for @dev, false otherwise. */ static inline bool ata_ncq_enabled(struct ata_device *dev) { return ata_ncq_supported(dev) && !(dev->flags & ATA_DFLAG_NCQ_OFF); } static inline bool ata_fpdma_dsm_supported(struct ata_device *dev) { return (dev->flags & ATA_DFLAG_NCQ_SEND_RECV) && (dev->ncq_send_recv_cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] & ATA_LOG_NCQ_SEND_RECV_DSM_TRIM); } static inline bool ata_fpdma_read_log_supported(struct ata_device *dev) { return (dev->flags & ATA_DFLAG_NCQ_SEND_RECV) && (dev->ncq_send_recv_cmds[ATA_LOG_NCQ_SEND_RECV_RD_LOG_OFFSET] & ATA_LOG_NCQ_SEND_RECV_RD_LOG_SUPPORTED); } static inline bool ata_fpdma_zac_mgmt_in_supported(struct ata_device *dev) { return (dev->flags & ATA_DFLAG_NCQ_SEND_RECV) && (dev->ncq_send_recv_cmds[ATA_LOG_NCQ_SEND_RECV_ZAC_MGMT_OFFSET] & ATA_LOG_NCQ_SEND_RECV_ZAC_MGMT_IN_SUPPORTED); } static inline bool ata_fpdma_zac_mgmt_out_supported(struct ata_device *dev) { return (dev->ncq_non_data_cmds[ATA_LOG_NCQ_NON_DATA_ZAC_MGMT_OFFSET] & ATA_LOG_NCQ_NON_DATA_ZAC_MGMT_OUT); } static inline void ata_qc_set_polling(struct ata_queued_cmd *qc) { qc->tf.ctl |= ATA_NIEN; } static inline struct ata_queued_cmd *__ata_qc_from_tag(struct ata_port *ap, unsigned int tag) { if (ata_tag_valid(tag)) return &ap->qcmd[tag]; return NULL; } static inline struct ata_queued_cmd *ata_qc_from_tag(struct ata_port *ap, unsigned int tag) { struct ata_queued_cmd *qc = __ata_qc_from_tag(ap, tag); if (unlikely(!qc)) return qc; if ((qc->flags & (ATA_QCFLAG_ACTIVE | ATA_QCFLAG_EH)) == ATA_QCFLAG_ACTIVE) return qc; return NULL; } static inline unsigned int ata_qc_raw_nbytes(struct ata_queued_cmd *qc) { return qc->nbytes - min(qc->extrabytes, qc->nbytes); } static inline void ata_tf_init(struct ata_device *dev, struct ata_taskfile *tf) { memset(tf, 0, sizeof(*tf)); #ifdef CONFIG_ATA_SFF tf->ctl = dev->link->ap->ctl; #else tf->ctl = ATA_DEVCTL_OBS; #endif if (dev->devno == 0) tf->device = ATA_DEVICE_OBS; else tf->device = ATA_DEVICE_OBS | ATA_DEV1; } static inline void ata_qc_reinit(struct ata_queued_cmd *qc) { qc->dma_dir = DMA_NONE; qc->sg = NULL; qc->flags = 0; qc->cursg = NULL; qc->cursg_ofs = 0; qc->nbytes = qc->extrabytes = qc->curbytes = 0; qc->n_elem = 0; qc->err_mask = 0; qc->sect_size = ATA_SECT_SIZE; ata_tf_init(qc->dev, &qc->tf); /* init result_tf such that it indicates normal completion */ qc->result_tf.command = ATA_DRDY; qc->result_tf.feature = 0; } static inline int ata_try_flush_cache(const struct ata_device *dev) { return ata_id_wcache_enabled(dev->id) || ata_id_has_flush(dev->id) || ata_id_has_flush_ext(dev->id); } static inline unsigned int ac_err_mask(u8 status) { if (status & (ATA_BUSY | ATA_DRQ)) 
return AC_ERR_HSM; if (status & (ATA_ERR | ATA_DF)) return AC_ERR_DEV; return 0; } static inline unsigned int __ac_err_mask(u8 status) { unsigned int mask = ac_err_mask(status); if (mask == 0) return AC_ERR_OTHER; return mask; } static inline struct ata_port *ata_shost_to_port(struct Scsi_Host *host) { return *(struct ata_port **)&host->hostdata[0]; } static inline int ata_check_ready(u8 status) { if (!(status & ATA_BUSY)) return 1; /* 0xff indicates either no device or device not ready */ if (status == 0xff) return -ENODEV; return 0; } static inline unsigned long ata_deadline(unsigned long from_jiffies, unsigned int timeout_msecs) { return from_jiffies + msecs_to_jiffies(timeout_msecs); } /* Don't open code these in drivers as there are traps. First, the range may change in future hardware and specs; second, 0xFF means 'no DMA' but is > UDMA_0. Here be dragons. */ static inline bool ata_using_mwdma(struct ata_device *adev) { return adev->dma_mode >= XFER_MW_DMA_0 && adev->dma_mode <= XFER_MW_DMA_4; } static inline bool ata_using_udma(struct ata_device *adev) { return adev->dma_mode >= XFER_UDMA_0 && adev->dma_mode <= XFER_UDMA_7; } static inline bool ata_dma_enabled(struct ata_device *adev) { return adev->dma_mode != 0xFF; } /************************************************************************** * PATA timings - drivers/ata/libata-pata-timings.c */ extern const struct ata_timing *ata_timing_find_mode(u8 xfer_mode); extern int ata_timing_compute(struct ata_device *, unsigned short, struct ata_timing *, int, int); extern void ata_timing_merge(const struct ata_timing *, const struct ata_timing *, struct ata_timing *, unsigned int); /************************************************************************** * PMP - drivers/ata/libata-pmp.c */ #ifdef CONFIG_SATA_PMP extern const struct ata_port_operations sata_pmp_port_ops; extern int sata_pmp_qc_defer_cmd_switch(struct ata_queued_cmd *qc); extern void sata_pmp_error_handler(struct ata_port *ap); #else /* CONFIG_SATA_PMP */ #define sata_pmp_port_ops sata_port_ops #define sata_pmp_qc_defer_cmd_switch ata_std_qc_defer #define sata_pmp_error_handler ata_std_error_handler #endif /* CONFIG_SATA_PMP */ /************************************************************************** * SFF - drivers/ata/libata-sff.c */ #ifdef CONFIG_ATA_SFF extern const struct ata_port_operations ata_sff_port_ops; extern const struct ata_port_operations ata_bmdma32_port_ops; /* PIO only, sg_tablesize and dma_boundary limits can be removed */ #define ATA_PIO_SHT(drv_name) \ ATA_BASE_SHT(drv_name), \ .sg_tablesize = LIBATA_MAX_PRD, \ .dma_boundary = ATA_DMA_BOUNDARY extern void ata_sff_dev_select(struct ata_port *ap, unsigned int device); extern u8 ata_sff_check_status(struct ata_port *ap); extern void ata_sff_pause(struct ata_port *ap); extern void ata_sff_dma_pause(struct ata_port *ap); extern int ata_sff_wait_ready(struct ata_link *link, unsigned long deadline); extern void ata_sff_tf_load(struct ata_port *ap, const struct ata_taskfile *tf); extern void ata_sff_tf_read(struct ata_port *ap, struct ata_taskfile *tf); extern void ata_sff_exec_command(struct ata_port *ap, const struct ata_taskfile *tf); extern unsigned int ata_sff_data_xfer(struct ata_queued_cmd *qc, unsigned char *buf, unsigned int buflen, int rw); extern unsigned int ata_sff_data_xfer32(struct ata_queued_cmd *qc, unsigned char *buf, unsigned int buflen, int rw); extern void ata_sff_irq_on(struct ata_port *ap); extern int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, u8 status, int
in_wq); extern void ata_sff_queue_work(struct work_struct *work); extern void ata_sff_queue_delayed_work(struct delayed_work *dwork, unsigned long delay); extern void ata_sff_queue_pio_task(struct ata_link *link, unsigned long delay); extern unsigned int ata_sff_qc_issue(struct ata_queued_cmd *qc); extern void ata_sff_qc_fill_rtf(struct ata_queued_cmd *qc); extern unsigned int ata_sff_port_intr(struct ata_port *ap, struct ata_queued_cmd *qc); extern irqreturn_t ata_sff_interrupt(int irq, void *dev_instance); extern void ata_sff_lost_interrupt(struct ata_port *ap); extern void ata_sff_freeze(struct ata_port *ap); extern void ata_sff_thaw(struct ata_port *ap); extern int ata_sff_prereset(struct ata_link *link, unsigned long deadline); extern unsigned int ata_sff_dev_classify(struct ata_device *dev, int present, u8 *r_err); extern int ata_sff_wait_after_reset(struct ata_link *link, unsigned int devmask, unsigned long deadline); extern int ata_sff_softreset(struct ata_link *link, unsigned int *classes, unsigned long deadline); extern int sata_sff_hardreset(struct ata_link *link, unsigned int *class, unsigned long deadline); extern void ata_sff_postreset(struct ata_link *link, unsigned int *classes); extern void ata_sff_drain_fifo(struct ata_queued_cmd *qc); extern void ata_sff_error_handler(struct ata_port *ap); extern void ata_sff_std_ports(struct ata_ioports *ioaddr); #ifdef CONFIG_PCI extern int ata_pci_sff_init_host(struct ata_host *host); extern int ata_pci_sff_prepare_host(struct pci_dev *pdev, const struct ata_port_info * const * ppi, struct ata_host **r_host); extern int ata_pci_sff_activate_host(struct ata_host *host, irq_handler_t irq_handler, const struct scsi_host_template *sht); extern int ata_pci_sff_init_one(struct pci_dev *pdev, const struct ata_port_info * const * ppi, const struct scsi_host_template *sht, void *host_priv, int hflags); #endif /* CONFIG_PCI */ #ifdef CONFIG_ATA_BMDMA extern const struct ata_port_operations ata_bmdma_port_ops; #define ATA_BMDMA_SHT(drv_name) \ ATA_BASE_SHT(drv_name), \ .sg_tablesize = LIBATA_MAX_PRD, \ .dma_boundary = ATA_DMA_BOUNDARY extern enum ata_completion_errors ata_bmdma_qc_prep(struct ata_queued_cmd *qc); extern unsigned int ata_bmdma_qc_issue(struct ata_queued_cmd *qc); extern enum ata_completion_errors ata_bmdma_dumb_qc_prep(struct ata_queued_cmd *qc); extern unsigned int ata_bmdma_port_intr(struct ata_port *ap, struct ata_queued_cmd *qc); extern irqreturn_t ata_bmdma_interrupt(int irq, void *dev_instance); extern void ata_bmdma_error_handler(struct ata_port *ap); extern void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc); extern void ata_bmdma_irq_clear(struct ata_port *ap); extern void ata_bmdma_setup(struct ata_queued_cmd *qc); extern void ata_bmdma_start(struct ata_queued_cmd *qc); extern void ata_bmdma_stop(struct ata_queued_cmd *qc); extern u8 ata_bmdma_status(struct ata_port *ap); extern int ata_bmdma_port_start(struct ata_port *ap); extern int ata_bmdma_port_start32(struct ata_port *ap); #ifdef CONFIG_PCI extern int ata_pci_bmdma_clear_simplex(struct pci_dev *pdev); extern void ata_pci_bmdma_init(struct ata_host *host); extern int ata_pci_bmdma_prepare_host(struct pci_dev *pdev, const struct ata_port_info * const * ppi, struct ata_host **r_host); extern int ata_pci_bmdma_init_one(struct pci_dev *pdev, const struct ata_port_info * const * ppi, const struct scsi_host_template *sht, void *host_priv, int hflags); #endif /* CONFIG_PCI */ #endif /* CONFIG_ATA_BMDMA */ /** * ata_sff_busy_wait - Wait for a port status register 
* @ap: Port to wait for. * @bits: bits that must be clear * @max: number of 10uS waits to perform * * Waits up to max*10 microseconds for the selected bits in the port's * status register to be cleared. * Returns final value of status register. * * LOCKING: * Inherited from caller. */ static inline u8 ata_sff_busy_wait(struct ata_port *ap, unsigned int bits, unsigned int max) { u8 status; do { udelay(10); status = ap->ops->sff_check_status(ap); max--; } while (status != 0xff && (status & bits) && (max > 0)); return status; } /** * ata_wait_idle - Wait for a port to be idle. * @ap: Port to wait for. * * Waits up to 10ms for port's BUSY and DRQ signals to clear. * Returns final value of status register. * * LOCKING: * Inherited from caller. */ static inline u8 ata_wait_idle(struct ata_port *ap) { u8 status = ata_sff_busy_wait(ap, ATA_BUSY | ATA_DRQ, 1000); if (status != 0xff && (status & (ATA_BUSY | ATA_DRQ))) ata_port_dbg(ap, "abnormal Status 0x%X\n", status); return status; } #endif /* CONFIG_ATA_SFF */ #endif /* __LINUX_LIBATA_H__ */ |
// SPDX-License-Identifier: GPL-2.0 /* * mm/mremap.c * * (C) Copyright 1996 Linus Torvalds * * Address space accounting code <alan@lxorguk.ukuu.org.uk> * (C) Copyright 2002 Red Hat Inc, All Rights Reserved */ #include <linux/mm.h> #include <linux/mm_inline.h> #include <linux/hugetlb.h> #include <linux/shm.h> #include <linux/ksm.h> #include <linux/mman.h> #include <linux/swap.h> #include <linux/capability.h> #include <linux/fs.h> #include <linux/swapops.h> #include <linux/highmem.h> #include <linux/security.h> #include <linux/syscalls.h> #include <linux/mmu_notifier.h> #include <linux/uaccess.h> #include <linux/userfaultfd_k.h> #include <linux/mempolicy.h> #include <asm/cacheflush.h> #include <asm/tlb.h> #include <asm/pgalloc.h> #include "internal.h" /* Classify the kind of remap operation being performed. */ enum mremap_type { MREMAP_INVALID, /* Initial state. */ MREMAP_NO_RESIZE, /* old_len == new_len, if not moved, do nothing. */ MREMAP_SHRINK, /* old_len > new_len. */ MREMAP_EXPAND, /* old_len < new_len. */ }; /* * Describes a VMA mremap() operation and is threaded throughout it. * * Any of the fields may be mutated by the operation; however, these values will * always accurately reflect the remap (for instance, we may adjust lengths and * delta to account for hugetlb alignment). */ struct vma_remap_struct { /* User-provided state. */ unsigned long addr; /* User-specified address from which we remap. */ unsigned long old_len; /* Length of range being remapped. */ unsigned long new_len; /* Desired new length of mapping. */ unsigned long flags; /* user-specified MREMAP_* flags. */ unsigned long new_addr; /* Optionally, desired new address. */ /* uffd state. */ struct vm_userfaultfd_ctx *uf; struct list_head *uf_unmap_early; struct list_head *uf_unmap; /* VMA state, determined in do_mremap(). */ struct vm_area_struct *vma; /* Internal state, determined in do_mremap(). */ unsigned long delta; /* Absolute delta between old_len and new_len. */ bool mlocked; /* Was the VMA mlock()'d? */ enum mremap_type remap_type; /* expand, shrink, etc. */ bool mmap_locked; /* Is mm currently write-locked? */ unsigned long charged; /* If VM_ACCOUNT, # pages to account.
*/ }; static pud_t *get_old_pud(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; p4d_t *p4d; pud_t *pud; pgd = pgd_offset(mm, addr); if (pgd_none_or_clear_bad(pgd)) return NULL; p4d = p4d_offset(pgd, addr); if (p4d_none_or_clear_bad(p4d)) return NULL; pud = pud_offset(p4d, addr); if (pud_none_or_clear_bad(pud)) return NULL; return pud; } static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr) { pud_t *pud; pmd_t *pmd; pud = get_old_pud(mm, addr); if (!pud) return NULL; pmd = pmd_offset(pud, addr); if (pmd_none(*pmd)) return NULL; return pmd; } static pud_t *alloc_new_pud(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; p4d_t *p4d; pgd = pgd_offset(mm, addr); p4d = p4d_alloc(mm, pgd, addr); if (!p4d) return NULL; return pud_alloc(mm, p4d, addr); } static pmd_t *alloc_new_pmd(struct mm_struct *mm, unsigned long addr) { pud_t *pud; pmd_t *pmd; pud = alloc_new_pud(mm, addr); if (!pud) return NULL; pmd = pmd_alloc(mm, pud, addr); if (!pmd) return NULL; VM_BUG_ON(pmd_trans_huge(*pmd)); return pmd; } static void take_rmap_locks(struct vm_area_struct *vma) { if (vma->vm_file) i_mmap_lock_write(vma->vm_file->f_mapping); if (vma->anon_vma) anon_vma_lock_write(vma->anon_vma); } static void drop_rmap_locks(struct vm_area_struct *vma) { if (vma->anon_vma) anon_vma_unlock_write(vma->anon_vma); if (vma->vm_file) i_mmap_unlock_write(vma->vm_file->f_mapping); } static pte_t move_soft_dirty_pte(pte_t pte) { /* * Set soft dirty bit so we can notice * in userspace the ptes were moved. */ #ifdef CONFIG_MEM_SOFT_DIRTY if (pte_present(pte)) pte = pte_mksoft_dirty(pte); else if (is_swap_pte(pte)) pte = pte_swp_mksoft_dirty(pte); #endif return pte; } static int move_ptes(struct pagetable_move_control *pmc, unsigned long extent, pmd_t *old_pmd, pmd_t *new_pmd) { struct vm_area_struct *vma = pmc->old; bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma); struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; pmd_t dummy_pmdval; spinlock_t *old_ptl, *new_ptl; bool force_flush = false; unsigned long old_addr = pmc->old_addr; unsigned long new_addr = pmc->new_addr; unsigned long old_end = old_addr + extent; unsigned long len = old_end - old_addr; int err = 0; /* * When need_rmap_locks is true, we take the i_mmap_rwsem and anon_vma * locks to ensure that rmap will always observe either the old or the * new ptes. This is the easiest way to avoid races with * truncate_pagecache(), page migration, etc... * * When need_rmap_locks is false, we use other ways to avoid * such races: * * - During exec() shift_arg_pages(), we use a specially tagged vma * which rmap call sites look for using vma_is_temporary_stack(). * * - During mremap(), new_vma is often known to be placed after vma * in rmap traversal order. This ensures rmap will always observe * either the old pte, or the new pte, or both (the page table locks * serialize access to individual ptes, but only rmap traversal * order guarantees that we won't miss both the old and new ptes). */ if (pmc->need_rmap_locks) take_rmap_locks(vma); /* * We don't have to worry about the ordering of src and dst * pte locks because exclusive mmap_lock prevents deadlock. */ old_pte = pte_offset_map_lock(mm, old_pmd, old_addr, &old_ptl); if (!old_pte) { err = -EAGAIN; goto out; } /* * Now new_pte is none, so hpage_collapse_scan_file() path can not find * this by traversing file->f_mapping, so there is no concurrency with * retract_page_tables(). 
In addition, we already hold the exclusive * mmap_lock, so this new_pte page is stable, so there is no need to get * pmdval and do pmd_same() check. */ new_pte = pte_offset_map_rw_nolock(mm, new_pmd, new_addr, &dummy_pmdval, &new_ptl); if (!new_pte) { pte_unmap_unlock(old_pte, old_ptl); err = -EAGAIN; goto out; } if (new_ptl != old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); flush_tlb_batched_pending(vma->vm_mm); arch_enter_lazy_mmu_mode(); for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE, new_pte++, new_addr += PAGE_SIZE) { VM_WARN_ON_ONCE(!pte_none(*new_pte)); if (pte_none(ptep_get(old_pte))) continue; pte = ptep_get_and_clear(mm, old_addr, old_pte); /* * If we are remapping a valid PTE, make sure * to flush TLB before we drop the PTL for the * PTE. * * NOTE! Both old and new PTL matter: the old one * for racing with folio_mkclean(), the new one to * make sure the physical page stays valid until * the TLB entry for the old mapping has been * flushed. */ if (pte_present(pte)) force_flush = true; pte = move_pte(pte, old_addr, new_addr); pte = move_soft_dirty_pte(pte); if (need_clear_uffd_wp && pte_marker_uffd_wp(pte)) pte_clear(mm, new_addr, new_pte); else { if (need_clear_uffd_wp) { if (pte_present(pte)) pte = pte_clear_uffd_wp(pte); else if (is_swap_pte(pte)) pte = pte_swp_clear_uffd_wp(pte); } set_pte_at(mm, new_addr, new_pte, pte); } } arch_leave_lazy_mmu_mode(); if (force_flush) flush_tlb_range(vma, old_end - len, old_end); if (new_ptl != old_ptl) spin_unlock(new_ptl); pte_unmap(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); out: if (pmc->need_rmap_locks) drop_rmap_locks(vma); return err; } #ifndef arch_supports_page_table_move #define arch_supports_page_table_move arch_supports_page_table_move static inline bool arch_supports_page_table_move(void) { return IS_ENABLED(CONFIG_HAVE_MOVE_PMD) || IS_ENABLED(CONFIG_HAVE_MOVE_PUD); } #endif #ifdef CONFIG_HAVE_MOVE_PMD static bool move_normal_pmd(struct pagetable_move_control *pmc, pmd_t *old_pmd, pmd_t *new_pmd) { spinlock_t *old_ptl, *new_ptl; struct vm_area_struct *vma = pmc->old; struct mm_struct *mm = vma->vm_mm; bool res = false; pmd_t pmd; if (!arch_supports_page_table_move()) return false; /* * The destination pmd shouldn't be established, free_pgtables() * should have released it. * * However, there's a case during execve() where we use mremap * to move the initial stack, and in that case the target area * may overlap the source area (always moving down). * * If everything is PMD-aligned, that works fine, as moving * each pmd down will clear the source pmd. But if we first * have a few 4kB-only pages that get moved down, and then * hit the "now the rest is PMD-aligned, let's do everything * one pmd at a time", we will still have the old (now empty * of any 4kB pages, but still there) PMD in the page table * tree. * * Warn on it once - because we really should try to figure * out how to do this better - but then say "I won't move * this pmd". * * One alternative might be to just unmap the target pmd at * this point, and verify that it really is empty. We'll see. */ if (WARN_ON_ONCE(!pmd_none(*new_pmd))) return false; /* If this pmd belongs to a uffd vma with remap events disabled, we need * to ensure that the uffd-wp state is cleared from all pgtables. This * means recursing into lower page tables in move_page_tables(), and we * can reuse the existing code if we simply treat the entry as "not * moved". 
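 *
 * (Returning false makes the caller fall back to the PTE-level path,
 * move_ptes(), which clears or drops the uffd-wp state entry by entry.)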
*/ if (vma_has_uffd_without_event_remap(vma)) return false; /* * We don't have to worry about the ordering of src and dst * ptlocks because exclusive mmap_lock prevents deadlock. */ old_ptl = pmd_lock(mm, old_pmd); new_ptl = pmd_lockptr(mm, new_pmd); if (new_ptl != old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); pmd = *old_pmd; /* Racing with collapse? */ if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd))) goto out_unlock; /* Clear the pmd */ pmd_clear(old_pmd); res = true; VM_BUG_ON(!pmd_none(*new_pmd)); pmd_populate(mm, new_pmd, pmd_pgtable(pmd)); flush_tlb_range(vma, pmc->old_addr, pmc->old_addr + PMD_SIZE); out_unlock: if (new_ptl != old_ptl) spin_unlock(new_ptl); spin_unlock(old_ptl); return res; } #else static inline bool move_normal_pmd(struct pagetable_move_control *pmc, pmd_t *old_pmd, pmd_t *new_pmd) { return false; } #endif #if CONFIG_PGTABLE_LEVELS > 2 && defined(CONFIG_HAVE_MOVE_PUD) static bool move_normal_pud(struct pagetable_move_control *pmc, pud_t *old_pud, pud_t *new_pud) { spinlock_t *old_ptl, *new_ptl; struct vm_area_struct *vma = pmc->old; struct mm_struct *mm = vma->vm_mm; pud_t pud; if (!arch_supports_page_table_move()) return false; /* * The destination pud shouldn't be established, free_pgtables() * should have released it. */ if (WARN_ON_ONCE(!pud_none(*new_pud))) return false; /* If this pud belongs to a uffd vma with remap events disabled, we need * to ensure that the uffd-wp state is cleared from all pgtables. This * means recursing into lower page tables in move_page_tables(), and we * can reuse the existing code if we simply treat the entry as "not * moved". */ if (vma_has_uffd_without_event_remap(vma)) return false; /* * We don't have to worry about the ordering of src and dst * ptlocks because exclusive mmap_lock prevents deadlock. */ old_ptl = pud_lock(mm, old_pud); new_ptl = pud_lockptr(mm, new_pud); if (new_ptl != old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); /* Clear the pud */ pud = *old_pud; pud_clear(old_pud); VM_BUG_ON(!pud_none(*new_pud)); pud_populate(mm, new_pud, pud_pgtable(pud)); flush_tlb_range(vma, pmc->old_addr, pmc->old_addr + PUD_SIZE); if (new_ptl != old_ptl) spin_unlock(new_ptl); spin_unlock(old_ptl); return true; } #else static inline bool move_normal_pud(struct pagetable_move_control *pmc, pud_t *old_pud, pud_t *new_pud) { return false; } #endif #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) static bool move_huge_pud(struct pagetable_move_control *pmc, pud_t *old_pud, pud_t *new_pud) { spinlock_t *old_ptl, *new_ptl; struct vm_area_struct *vma = pmc->old; struct mm_struct *mm = vma->vm_mm; pud_t pud; /* * The destination pud shouldn't be established, free_pgtables() * should have released it. */ if (WARN_ON_ONCE(!pud_none(*new_pud))) return false; /* * We don't have to worry about the ordering of src and dst * ptlocks because exclusive mmap_lock prevents deadlock. 
*/ old_ptl = pud_lock(mm, old_pud); new_ptl = pud_lockptr(mm, new_pud); if (new_ptl != old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); /* Clear the pud */ pud = *old_pud; pud_clear(old_pud); VM_BUG_ON(!pud_none(*new_pud)); /* Set the new pud */ /* mark soft_dirty when we add pud level soft dirty support */ set_pud_at(mm, pmc->new_addr, new_pud, pud); flush_pud_tlb_range(vma, pmc->old_addr, pmc->old_addr + HPAGE_PUD_SIZE); if (new_ptl != old_ptl) spin_unlock(new_ptl); spin_unlock(old_ptl); return true; } #else static bool move_huge_pud(struct pagetable_move_control *pmc, pud_t *old_pud, pud_t *new_pud) { WARN_ON_ONCE(1); return false; } #endif enum pgt_entry { NORMAL_PMD, HPAGE_PMD, NORMAL_PUD, HPAGE_PUD, }; /* * Returns an extent of the corresponding size for the pgt_entry specified if * valid; else returns a smaller extent bounded by the end of the source and * destination pgt_entry. */ static __always_inline unsigned long get_extent(enum pgt_entry entry, struct pagetable_move_control *pmc) { unsigned long next, extent, mask, size; unsigned long old_addr = pmc->old_addr; unsigned long old_end = pmc->old_end; unsigned long new_addr = pmc->new_addr; switch (entry) { case HPAGE_PMD: case NORMAL_PMD: mask = PMD_MASK; size = PMD_SIZE; break; case HPAGE_PUD: case NORMAL_PUD: mask = PUD_MASK; size = PUD_SIZE; break; default: BUILD_BUG(); break; } next = (old_addr + size) & mask; /* even if next overflowed, extent below will be ok */ extent = next - old_addr; if (extent > old_end - old_addr) extent = old_end - old_addr; next = (new_addr + size) & mask; if (extent > next - new_addr) extent = next - new_addr; return extent; } /* * Should move_pgt_entry() acquire the rmap locks? This is either expressed in * the PMC, or overridden in the case of normal, larger page tables. */ static bool should_take_rmap_locks(struct pagetable_move_control *pmc, enum pgt_entry entry) { switch (entry) { case NORMAL_PMD: case NORMAL_PUD: return true; default: return pmc->need_rmap_locks; } } /* * Attempts to speed up the move by moving the entry at the level corresponding * to pgt_entry. Returns true if the move was successful, else false. */ static bool move_pgt_entry(struct pagetable_move_control *pmc, enum pgt_entry entry, void *old_entry, void *new_entry) { bool moved = false; bool need_rmap_locks = should_take_rmap_locks(pmc, entry); /* See comment in move_ptes() */ if (need_rmap_locks) take_rmap_locks(pmc->old); switch (entry) { case NORMAL_PMD: moved = move_normal_pmd(pmc, old_entry, new_entry); break; case NORMAL_PUD: moved = move_normal_pud(pmc, old_entry, new_entry); break; case HPAGE_PMD: moved = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && move_huge_pmd(pmc->old, pmc->old_addr, pmc->new_addr, old_entry, new_entry); break; case HPAGE_PUD: moved = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && move_huge_pud(pmc, old_entry, new_entry); break; default: WARN_ON_ONCE(1); break; } if (need_rmap_locks) drop_rmap_locks(pmc->old); return moved; } /* * A helper to check if aligning down is OK. The aligned address should fall * on *no mapping*. For the stack moving down, that's a special move within * the VMA that is created to span the source and destination of the move, * so we make an exception for it.
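 *
 * Worked example (hypothetical addresses): with PMD-sized page tables
 * (2 MiB spans), addr_to_align == 0x2ff000 and mask == PMD_MASK give
 * addr_masked == 0x200000; aligning down is then only OK if no other
 * mapping intersects [0x200000, vma->vm_start).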
*/ static bool can_align_down(struct pagetable_move_control *pmc, struct vm_area_struct *vma, unsigned long addr_to_align, unsigned long mask) { unsigned long addr_masked = addr_to_align & mask; /* * If @addr_to_align of either source or destination is not the beginning * of the corresponding VMA, we can't align down or we will destroy part * of the current mapping. */ if (!pmc->for_stack && vma->vm_start != addr_to_align) return false; /* In the stack case we explicitly permit in-VMA alignment. */ if (pmc->for_stack && addr_masked >= vma->vm_start) return true; /* * Make sure the realignment doesn't cause the address to fall on an * existing mapping. */ return find_vma_intersection(vma->vm_mm, addr_masked, vma->vm_start) == NULL; } /* * Determine if we are in fact able to realign for efficiency to a higher page * table boundary. */ static bool can_realign_addr(struct pagetable_move_control *pmc, unsigned long pagetable_mask) { unsigned long align_mask = ~pagetable_mask; unsigned long old_align = pmc->old_addr & align_mask; unsigned long new_align = pmc->new_addr & align_mask; unsigned long pagetable_size = align_mask + 1; unsigned long old_align_next = pagetable_size - old_align; /* * We don't want to have to go hunting for VMAs from the end of the old * VMA to the next page table boundary, and we want to make sure the * operation is worthwhile. * * So ensure that we only perform this realignment if the end of the * range being copied reaches or crosses the page table boundary. * * boundary boundary * .<- old_align -> . * . |----------------.-----------| * . | vma . | * . |----------------.-----------| * . <----------------.-----------> * . len_in * <-------------------------------> * . pagetable_size . * . <----------------> * . old_align_next . */ if (pmc->len_in < old_align_next) return false; /* Skip if the addresses are already aligned. */ if (old_align == 0) return false; /* Only realign if the new and old addresses are mutually aligned. */ if (old_align != new_align) return false; /* Ensure realignment doesn't cause overlap with existing mappings. */ if (!can_align_down(pmc, pmc->old, pmc->old_addr, pagetable_mask) || !can_align_down(pmc, pmc->new, pmc->new_addr, pagetable_mask)) return false; return true; } /* * Opportunistically realign to specified boundary for faster copy. * * Consider an mremap() of a VMA with page table boundaries as below, and no * preceding VMAs from the lower page table boundary to the start of the VMA, * with the end of the range reaching or crossing the page table boundary. * * boundary boundary * . |----------------.-----------| * . | vma . | * . |----------------.-----------| * . pmc->old_addr . pmc->old_end * . <----------------------------> * . move these page tables * * If we proceed with moving page tables in this scenario, we will have a lot of * work to do traversing old page tables and establishing new ones in the * destination across multiple lower level page tables. * * The idea here is simply to align pmc->old_addr, pmc->new_addr down to the * page table boundary, so we can simply copy a single page table entry for the * aligned portion of the VMA instead: * * boundary boundary * . |----------------.-----------| * . | vma . | * . |----------------.-----------| * pmc->old_addr . pmc->old_end * <-------------------------------------------> * . 
move these page tables */ static void try_realign_addr(struct pagetable_move_control *pmc, unsigned long pagetable_mask) { if (!can_realign_addr(pmc, pagetable_mask)) return; /* * Simply align to page table boundaries. Note that we do NOT update the * pmc->old_end value, and since the move_page_tables() operation spans * from [old_addr, old_end) (offsetting new_addr as it is performed), * this simply changes the start of the copy, not the end. */ pmc->old_addr &= pagetable_mask; pmc->new_addr &= pagetable_mask; } /* Is the page table move operation done? */ static bool pmc_done(struct pagetable_move_control *pmc) { return pmc->old_addr >= pmc->old_end; } /* Advance to the next page table, offset by extent bytes. */ static void pmc_next(struct pagetable_move_control *pmc, unsigned long extent) { pmc->old_addr += extent; pmc->new_addr += extent; } /* * Determine how many bytes in the specified input range have had their page * tables moved so far. */ static unsigned long pmc_progress(struct pagetable_move_control *pmc) { unsigned long orig_old_addr = pmc->old_end - pmc->len_in; unsigned long old_addr = pmc->old_addr; /* * Prevent negative return values when {old,new}_addr was realigned but * we broke out of the loop in move_page_tables() for the first PMD * itself. */ return old_addr < orig_old_addr ? 0 : old_addr - orig_old_addr; } unsigned long move_page_tables(struct pagetable_move_control *pmc) { unsigned long extent; struct mmu_notifier_range range; pmd_t *old_pmd, *new_pmd; pud_t *old_pud, *new_pud; struct mm_struct *mm = pmc->old->vm_mm; if (!pmc->len_in) return 0; if (is_vm_hugetlb_page(pmc->old)) return move_hugetlb_page_tables(pmc->old, pmc->new, pmc->old_addr, pmc->new_addr, pmc->len_in); /* * If possible, realign addresses to PMD boundary for faster copy. * Only realign if the mremap copying hits a PMD boundary. */ try_realign_addr(pmc, PMD_MASK); flush_cache_range(pmc->old, pmc->old_addr, pmc->old_end); mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, mm, pmc->old_addr, pmc->old_end); mmu_notifier_invalidate_range_start(&range); for (; !pmc_done(pmc); pmc_next(pmc, extent)) { cond_resched(); /* * If extent is PUD-sized try to speed up the move by moving at the * PUD level if possible. */ extent = get_extent(NORMAL_PUD, pmc); old_pud = get_old_pud(mm, pmc->old_addr); if (!old_pud) continue; new_pud = alloc_new_pud(mm, pmc->new_addr); if (!new_pud) break; if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) { if (extent == HPAGE_PUD_SIZE) { move_pgt_entry(pmc, HPAGE_PUD, old_pud, new_pud); /* We ignore and continue on error? */ continue; } } else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) { if (move_pgt_entry(pmc, NORMAL_PUD, old_pud, new_pud)) continue; } extent = get_extent(NORMAL_PMD, pmc); old_pmd = get_old_pmd(mm, pmc->old_addr); if (!old_pmd) continue; new_pmd = alloc_new_pmd(mm, pmc->new_addr); if (!new_pmd) break; again: if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) || pmd_devmap(*old_pmd)) { if (extent == HPAGE_PMD_SIZE && move_pgt_entry(pmc, HPAGE_PMD, old_pmd, new_pmd)) continue; split_huge_pmd(pmc->old, old_pmd, pmc->old_addr); } else if (IS_ENABLED(CONFIG_HAVE_MOVE_PMD) && extent == PMD_SIZE) { /* * If the extent is PMD-sized, try to speed the move by * moving at the PMD level if possible. 
*/ if (move_pgt_entry(pmc, NORMAL_PMD, old_pmd, new_pmd)) continue; } if (pmd_none(*old_pmd)) continue; if (pte_alloc(pmc->new->vm_mm, new_pmd)) break; if (move_ptes(pmc, extent, old_pmd, new_pmd) < 0) goto again; } mmu_notifier_invalidate_range_end(&range); return pmc_progress(pmc); } /* Set vrm->delta to the difference in VMA size specified by user. */ static void vrm_set_delta(struct vma_remap_struct *vrm) { vrm->delta = abs_diff(vrm->old_len, vrm->new_len); } /* Determine what kind of remap this is - shrink, expand or no resize at all. */ static enum mremap_type vrm_remap_type(struct vma_remap_struct *vrm) { if (vrm->delta == 0) return MREMAP_NO_RESIZE; if (vrm->old_len > vrm->new_len) return MREMAP_SHRINK; return MREMAP_EXPAND; } /* * When moving a VMA to vrm->new_addr, does this result in the new and old VMAs * overlapping? */ static bool vrm_overlaps(struct vma_remap_struct *vrm) { unsigned long start_old = vrm->addr; unsigned long start_new = vrm->new_addr; unsigned long end_old = vrm->addr + vrm->old_len; unsigned long end_new = vrm->new_addr + vrm->new_len; /* * start_old end_old * |-----------| * | | * |-----------| * |-------------| * | | * |-------------| * start_new end_new */ if (end_old > start_new && end_new > start_old) return true; return false; } /* Do the mremap() flags require that the new_addr parameter be specified? */ static bool vrm_implies_new_addr(struct vma_remap_struct *vrm) { return vrm->flags & (MREMAP_FIXED | MREMAP_DONTUNMAP); } /* * Find an unmapped area for the requested vrm->new_addr. * * If MREMAP_FIXED then this is equivalent to a MAP_FIXED mmap() call. If only * MREMAP_DONTUNMAP is set, then this is equivalent to providing a hint to * mmap(), otherwise this is equivalent to mmap() specifying a NULL address. * * Returns 0 on success (with vrm->new_addr updated), or an error code upon * failure. */ static unsigned long vrm_set_new_addr(struct vma_remap_struct *vrm) { struct vm_area_struct *vma = vrm->vma; unsigned long map_flags = 0; /* Page Offset _into_ the VMA. */ pgoff_t internal_pgoff = (vrm->addr - vma->vm_start) >> PAGE_SHIFT; pgoff_t pgoff = vma->vm_pgoff + internal_pgoff; unsigned long new_addr = vrm_implies_new_addr(vrm) ? vrm->new_addr : 0; unsigned long res; if (vrm->flags & MREMAP_FIXED) map_flags |= MAP_FIXED; if (vma->vm_flags & VM_MAYSHARE) map_flags |= MAP_SHARED; res = get_unmapped_area(vma->vm_file, new_addr, vrm->new_len, pgoff, map_flags); if (IS_ERR_VALUE(res)) return res; vrm->new_addr = res; return 0; } /* * Keep track of pages which have been added to the memory mapping. If the VMA * is accounted, also check to see if there is sufficient memory. * * Returns true on success, false if insufficient memory to charge. */ static bool vrm_charge(struct vma_remap_struct *vrm) { unsigned long charged; if (!(vrm->vma->vm_flags & VM_ACCOUNT)) return true; /* * If we don't unmap the old mapping, then we account the entirety of * the length of the new one. Otherwise it's just the delta in size. */ if (vrm->flags & MREMAP_DONTUNMAP) charged = vrm->new_len >> PAGE_SHIFT; else charged = vrm->delta >> PAGE_SHIFT; /* This accounts 'charged' pages of memory. */ if (security_vm_enough_memory_mm(current->mm, charged)) return false; vrm->charged = charged; return true; } /* * An error has occurred so we will not be using vrm->charged memory. Unaccount * this memory if the VMA is accounted. 
*/ static void vrm_uncharge(struct vma_remap_struct *vrm) { if (!(vrm->vma->vm_flags & VM_ACCOUNT)) return; vm_unacct_memory(vrm->charged); vrm->charged = 0; } /* * Update mm exec_vm, stack_vm, data_vm, and locked_vm fields as needed to * account for 'bytes' memory used, and if locked, indicate this in the VRM so * we can handle this correctly later. */ static void vrm_stat_account(struct vma_remap_struct *vrm, unsigned long bytes) { unsigned long pages = bytes >> PAGE_SHIFT; struct mm_struct *mm = current->mm; struct vm_area_struct *vma = vrm->vma; vm_stat_account(mm, vma->vm_flags, pages); if (vma->vm_flags & VM_LOCKED) { mm->locked_vm += pages; vrm->mlocked = true; } } /* * Perform checks before attempting to write a VMA prior to it being * moved. */ static unsigned long prep_move_vma(struct vma_remap_struct *vrm) { unsigned long err = 0; struct vm_area_struct *vma = vrm->vma; unsigned long old_addr = vrm->addr; unsigned long old_len = vrm->old_len; unsigned long dummy = vma->vm_flags; /* * We'd prefer to avoid failure later on in do_munmap: * which may split one vma into three before unmapping. */ if (current->mm->map_count >= sysctl_max_map_count - 3) return -ENOMEM; if (vma->vm_ops && vma->vm_ops->may_split) { if (vma->vm_start != old_addr) err = vma->vm_ops->may_split(vma, old_addr); if (!err && vma->vm_end != old_addr + old_len) err = vma->vm_ops->may_split(vma, old_addr + old_len); if (err) return err; } /* * Advise KSM to break any KSM pages in the area to be moved: * it would be confusing if they were to turn up at the new * location, where they happen to coincide with different KSM * pages recently unmapped. But leave vma->vm_flags as it was, * so KSM can come around to merge on vma and new_vma afterwards. */ err = ksm_madvise(vma, old_addr, old_addr + old_len, MADV_UNMERGEABLE, &dummy); if (err) return err; return 0; } /* * Unmap source VMA for VMA move, turning it from a copy to a move, being * careful to ensure we do not underflow the memory account while doing so if * this is an accountable move. * * This is best effort; if we fail to unmap then we simply try to correct * accounting and exit. */ static void unmap_source_vma(struct vma_remap_struct *vrm) { struct mm_struct *mm = current->mm; unsigned long addr = vrm->addr; unsigned long len = vrm->old_len; struct vm_area_struct *vma = vrm->vma; VMA_ITERATOR(vmi, mm, addr); int err; unsigned long vm_start; unsigned long vm_end; /* * It might seem odd that we check for MREMAP_DONTUNMAP here, given this * function implies that we unmap the original VMA, which seems * contradictory. * * However, this occurs when this operation was attempted and an error * arose, in which case we _do_ wish to unmap the _new_ VMA, which means * we actually _do_ want it to be unaccounted. */ bool accountable_move = (vma->vm_flags & VM_ACCOUNT) && !(vrm->flags & MREMAP_DONTUNMAP); /* * So we perform a trick here to prevent incorrect accounting. Any merge * or new VMA allocation performed in copy_vma() does not adjust * accounting; it is expected that callers handle this. * * And indeed we already have, accounting appropriately for both cases * in vrm_charge(). * * However, when we unmap the existing VMA (to effect the move), this * code will, if the VMA has VM_ACCOUNT set, attempt to unaccount * removed pages. * * To avoid this we temporarily clear this flag, reinstating on any * portions of the original VMA that remain. */ if (accountable_move) { vm_flags_clear(vma, VM_ACCOUNT); /* We are about to split vma, so store the start/end. 
*/ vm_start = vma->vm_start; vm_end = vma->vm_end; } err = do_vmi_munmap(&vmi, mm, addr, len, vrm->uf_unmap, /* unlock= */false); vrm->vma = NULL; /* Invalidated. */ if (err) { /* OOM: unable to split vma, just get accounts right */ vm_acct_memory(len >> PAGE_SHIFT); return; } /* * If we mremap() from a VMA like this: * * addr end * | | * v v * |-------------| * | | * |-------------| * * Having cleared VM_ACCOUNT from the whole VMA, after we unmap above * we'll end up with: * * addr end * | | * v v * |---| |---| * | A | | B | * |---| |---| * * The VMI is still pointing at addr, so vma_prev() will give us A, and * a subsequent or lone vma_next() will give us B. * * do_vmi_munmap() will have restored the VMI back to addr. */ if (accountable_move) { unsigned long end = addr + len; if (vm_start < addr) { struct vm_area_struct *prev = vma_prev(&vmi); vm_flags_set(prev, VM_ACCOUNT); /* Acquires VMA lock. */ } if (vm_end > end) { struct vm_area_struct *next = vma_next(&vmi); vm_flags_set(next, VM_ACCOUNT); /* Acquires VMA lock. */ } } } /* * Copy vrm->vma over to vrm->new_addr possibly adjusting size as part of the * process. Additionally handle an error occurring on moving of page tables, * where we reset vrm state to cause unmapping of the new VMA. * * Outputs the newly installed VMA to new_vma_ptr. Returns 0 on success or an * error code. */ static int copy_vma_and_data(struct vma_remap_struct *vrm, struct vm_area_struct **new_vma_ptr) { unsigned long internal_offset = vrm->addr - vrm->vma->vm_start; unsigned long internal_pgoff = internal_offset >> PAGE_SHIFT; unsigned long new_pgoff = vrm->vma->vm_pgoff + internal_pgoff; unsigned long moved_len; struct vm_area_struct *vma = vrm->vma; struct vm_area_struct *new_vma; int err = 0; PAGETABLE_MOVE(pmc, NULL, NULL, vrm->addr, vrm->new_addr, vrm->old_len); new_vma = copy_vma(&vma, vrm->new_addr, vrm->new_len, new_pgoff, &pmc.need_rmap_locks); if (!new_vma) { vrm_uncharge(vrm); *new_vma_ptr = NULL; return -ENOMEM; } vrm->vma = vma; pmc.old = vma; pmc.new = new_vma; moved_len = move_page_tables(&pmc); if (moved_len < vrm->old_len) err = -ENOMEM; else if (vma->vm_ops && vma->vm_ops->mremap) err = vma->vm_ops->mremap(new_vma); if (unlikely(err)) { PAGETABLE_MOVE(pmc_revert, new_vma, vma, vrm->new_addr, vrm->addr, moved_len); /* * On error, move entries back from new area to old, * which will succeed since page tables still there, * and then proceed to unmap new area instead of old. */ pmc_revert.need_rmap_locks = true; move_page_tables(&pmc_revert); vrm->vma = new_vma; vrm->old_len = vrm->new_len; vrm->addr = vrm->new_addr; } else { mremap_userfaultfd_prep(new_vma, vrm->uf); } fixup_hugetlb_reservations(vma); *new_vma_ptr = new_vma; return err; } /* * Perform final tasks for MREMAP_DONTUNMAP operation, clearing mlock() and * account flags on remaining VMA by convention (it cannot be mlock()'d any * longer, as pages in range are no longer mapped), and removing anon_vma_chain * links from it (if the entire VMA was copied over). */ static void dontunmap_complete(struct vma_remap_struct *vrm, struct vm_area_struct *new_vma) { unsigned long start = vrm->addr; unsigned long end = vrm->addr + vrm->old_len; unsigned long old_start = vrm->vma->vm_start; unsigned long old_end = vrm->vma->vm_end; /* * We always clear VM_LOCKED[ONFAULT] | VM_ACCOUNT on the old * vma. */ vm_flags_clear(vrm->vma, VM_LOCKED_MASK | VM_ACCOUNT); /* * anon_vma links of the old vma are no longer needed after its page * table has been moved. 
*/ if (new_vma != vrm->vma && start == old_start && end == old_end) unlink_anon_vmas(vrm->vma); /* Because we won't unmap we don't need to touch locked_vm. */ } static unsigned long move_vma(struct vma_remap_struct *vrm) { struct mm_struct *mm = current->mm; struct vm_area_struct *new_vma; unsigned long hiwater_vm; int err; err = prep_move_vma(vrm); if (err) return err; /* If accounted, charge the number of bytes the operation will use. */ if (!vrm_charge(vrm)) return -ENOMEM; /* We don't want racing faults. */ vma_start_write(vrm->vma); /* Perform copy step. */ err = copy_vma_and_data(vrm, &new_vma); /* * If we established the copied-to VMA, we attempt to recover from the * error by setting the destination VMA to the source VMA and unmapping * it below. */ if (err && !new_vma) return err; /* * If we failed to move page tables we still do total_vm increment * since do_munmap() will decrement it by old_len == new_len. * * Since total_vm is about to be raised artificially high for a * moment, we need to restore high watermark afterwards: if stats * are taken meanwhile, total_vm and hiwater_vm appear too high. * If this were a serious issue, we'd add a flag to do_munmap(). */ hiwater_vm = mm->hiwater_vm; vrm_stat_account(vrm, vrm->new_len); if (unlikely(!err && (vrm->flags & MREMAP_DONTUNMAP))) dontunmap_complete(vrm, new_vma); else unmap_source_vma(vrm); mm->hiwater_vm = hiwater_vm; return err ? (unsigned long)err : vrm->new_addr; } /* * resize_is_valid() - Ensure the vma can be resized to the new length at the given * address. * * Return 0 on success, error otherwise. */ static int resize_is_valid(struct vma_remap_struct *vrm) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = vrm->vma; unsigned long addr = vrm->addr; unsigned long old_len = vrm->old_len; unsigned long new_len = vrm->new_len; unsigned long pgoff; /* * !old_len is a special case where an attempt is made to 'duplicate' * a mapping. This makes no sense for private mappings as it will * instead create a fresh/new mapping unrelated to the original. This * is contrary to the basic idea of mremap which creates new mappings * based on the original. There are no known use cases for this * behavior. As a result, fail such attempts. */ if (!old_len && !(vma->vm_flags & (VM_SHARED | VM_MAYSHARE))) { pr_warn_once("%s (%d): attempted to duplicate a private mapping with mremap. This is not supported.\n", current->comm, current->pid); return -EINVAL; } if ((vrm->flags & MREMAP_DONTUNMAP) && (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))) return -EINVAL; /* We can't remap across vm area boundaries */ if (old_len > vma->vm_end - addr) return -EFAULT; if (new_len == old_len) return 0; /* Need to be careful about a growing mapping */ pgoff = (addr - vma->vm_start) >> PAGE_SHIFT; pgoff += vma->vm_pgoff; if (pgoff + (new_len >> PAGE_SHIFT) < pgoff) return -EINVAL; if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)) return -EFAULT; if (!mlock_future_ok(mm, vma->vm_flags, vrm->delta)) return -EAGAIN; if (!may_expand_vm(mm, vma->vm_flags, vrm->delta >> PAGE_SHIFT)) return -ENOMEM; return 0; } /* * The user has requested that the VMA be shrunk (i.e., old_len > new_len), so * execute this, optionally dropping the mmap lock when we do so. * * In both cases this invalidates the VMA; however, if we don't drop the lock, * we then load the correct VMA into vrm->vma afterwards. 
*/ static unsigned long shrink_vma(struct vma_remap_struct *vrm, bool drop_lock) { struct mm_struct *mm = current->mm; unsigned long unmap_start = vrm->addr + vrm->new_len; unsigned long unmap_bytes = vrm->delta; unsigned long res; VMA_ITERATOR(vmi, mm, unmap_start); VM_BUG_ON(vrm->remap_type != MREMAP_SHRINK); res = do_vmi_munmap(&vmi, mm, unmap_start, unmap_bytes, vrm->uf_unmap, drop_lock); vrm->vma = NULL; /* Invalidated. */ if (res) return res; /* * If we've not dropped the lock, then we should reload the VMA to * replace the invalidated VMA with the one that may have now been * split. */ if (drop_lock) { vrm->mmap_locked = false; } else { vrm->vma = vma_lookup(mm, vrm->addr); if (!vrm->vma) return -EFAULT; } return 0; } /* * mremap_to() - remap a vma to a new location. * Returns: The new address of the vma or an error. */ static unsigned long mremap_to(struct vma_remap_struct *vrm) { struct mm_struct *mm = current->mm; unsigned long err; /* Is the new length or address silly? */ if (vrm->new_len > TASK_SIZE || vrm->new_addr > TASK_SIZE - vrm->new_len) return -EINVAL; if (vrm_overlaps(vrm)) return -EINVAL; if (vrm->flags & MREMAP_FIXED) { /* * In mremap_to(). * VMA is moved to dst address, and munmap dst first. * do_munmap will check if dst is sealed. */ err = do_munmap(mm, vrm->new_addr, vrm->new_len, vrm->uf_unmap_early); vrm->vma = NULL; /* Invalidated. */ if (err) return err; /* * If we remap a portion of a VMA elsewhere in the same VMA, * this can invalidate the old VMA. Reset. */ vrm->vma = vma_lookup(mm, vrm->addr); if (!vrm->vma) return -EFAULT; } if (vrm->remap_type == MREMAP_SHRINK) { err = shrink_vma(vrm, /* drop_lock= */false); if (err) return err; /* Set up for the move now shrink has been executed. */ vrm->old_len = vrm->new_len; } err = resize_is_valid(vrm); if (err) return err; /* MREMAP_DONTUNMAP expands by old_len since old_len == new_len */ if (vrm->flags & MREMAP_DONTUNMAP) { vm_flags_t vm_flags = vrm->vma->vm_flags; unsigned long pages = vrm->old_len >> PAGE_SHIFT; if (!may_expand_vm(mm, vm_flags, pages)) return -ENOMEM; } err = vrm_set_new_addr(vrm); if (err) return err; return move_vma(vrm); } static int vma_expandable(struct vm_area_struct *vma, unsigned long delta) { unsigned long end = vma->vm_end + delta; if (end < vma->vm_end) /* overflow */ return 0; if (find_vma_intersection(vma->vm_mm, vma->vm_end, end)) return 0; if (get_unmapped_area(NULL, vma->vm_start, end - vma->vm_start, 0, MAP_FIXED) & ~PAGE_MASK) return 0; return 1; } /* Determine whether we are actually able to execute an in-place expansion. */ static bool vrm_can_expand_in_place(struct vma_remap_struct *vrm) { /* Number of bytes from vrm->addr to end of VMA. */ unsigned long suffix_bytes = vrm->vma->vm_end - vrm->addr; /* If end of range aligns to end of VMA, we can just expand in-place. */ if (suffix_bytes != vrm->old_len) return false; /* Check whether this is feasible. */ if (!vma_expandable(vrm->vma, vrm->delta)) return false; return true; } /* * Are the parameters passed to mremap() valid? If so return 0, otherwise return * error. */ static unsigned long check_mremap_params(struct vma_remap_struct *vrm) { unsigned long addr = vrm->addr; unsigned long flags = vrm->flags; /* Ensure no unexpected flag values. */ if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP)) return -EINVAL; /* Start address must be page-aligned. */ if (offset_in_page(addr)) return -EINVAL; /* * We allow a zero old-len as a special case * for DOS-emu "duplicate shm area" thing. 
But * a zero new-len is nonsensical. */ if (!PAGE_ALIGN(vrm->new_len)) return -EINVAL; /* Remainder of checks are for cases with specific new_addr. */ if (!vrm_implies_new_addr(vrm)) return 0; /* The new address must be page-aligned. */ if (offset_in_page(vrm->new_addr)) return -EINVAL; /* A fixed address implies a move. */ if (!(flags & MREMAP_MAYMOVE)) return -EINVAL; /* MREMAP_DONTUNMAP does not allow resizing in the process. */ if (flags & MREMAP_DONTUNMAP && vrm->old_len != vrm->new_len) return -EINVAL; /* * move_vma() needs us to stay 4 maps below the threshold, otherwise * it will bail out at the very beginning. * That is a problem if we have already unmapped the regions here * (new_addr and old_addr), because userspace will not know the * state of the vma's after it gets -ENOMEM. * So, to avoid such a scenario we can pre-compute whether the whole * operation has a good chance of succeeding map-wise. * The worst-case scenario is when both vma's (new_addr and old_addr) get * split in 3 before unmapping them. * That means 2 more maps (1 for each) to the ones we already hold. * Check whether current map count plus 2 still leads us to 4 maps below * the threshold, otherwise return -ENOMEM here to be safe. */ if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3) return -ENOMEM; return 0; } /* * We know we can expand the VMA in-place by delta pages, so do so. * * If we discover the VMA is locked, update mm_struct statistics accordingly and * indicate so to the caller. */ static unsigned long expand_vma_in_place(struct vma_remap_struct *vrm) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = vrm->vma; VMA_ITERATOR(vmi, mm, vma->vm_end); if (!vrm_charge(vrm)) return -ENOMEM; /* * Function vma_merge_extend() is called on the * extension we are adding to the already existing vma, * vma_merge_extend() will merge this extension with the * already existing vma (expand operation itself) and * possibly also with the next vma if it becomes * adjacent to the expanded vma and otherwise * compatible. */ vma = vma_merge_extend(&vmi, vma, vrm->delta); if (!vma) { vrm_uncharge(vrm); return -ENOMEM; } vrm->vma = vma; vrm_stat_account(vrm, vrm->delta); return 0; } static bool align_hugetlb(struct vma_remap_struct *vrm) { struct hstate *h __maybe_unused = hstate_vma(vrm->vma); vrm->old_len = ALIGN(vrm->old_len, huge_page_size(h)); vrm->new_len = ALIGN(vrm->new_len, huge_page_size(h)); /* addrs must be huge page aligned */ if (vrm->addr & ~huge_page_mask(h)) return false; if (vrm->new_addr & ~huge_page_mask(h)) return false; /* * Don't allow remap expansion, because the underlying hugetlb * reservation is not yet capable of handling split reservations. */ if (vrm->new_len > vrm->old_len) return false; vrm_set_delta(vrm); return true; } /* * We are mremap()'ing without specifying a fixed address to move to, but are * requesting that the VMA's size be increased. * * Try to do so in-place; if this fails, move the VMA to a new location to * action the change. */ static unsigned long expand_vma(struct vma_remap_struct *vrm) { unsigned long err; unsigned long addr = vrm->addr; err = resize_is_valid(vrm); if (err) return err; /* * [addr, old_len) spans precisely to the end of the VMA, so try to * expand it in-place. */ if (vrm_can_expand_in_place(vrm)) { err = expand_vma_in_place(vrm); if (err) return err; /* * We want to populate the newly expanded portion of the VMA to * satisfy the expectation that mlock()'ing a VMA maintains all * of its pages in memory. 
*/ if (vrm->mlocked) vrm->new_addr = addr; /* OK we're done! */ return addr; } /* * We weren't able to just expand or shrink the area, * we need to create a new one and move it. */ /* We're not allowed to move the VMA, so error out. */ if (!(vrm->flags & MREMAP_MAYMOVE)) return -ENOMEM; /* Find a new location to move the VMA to. */ err = vrm_set_new_addr(vrm); if (err) return err; return move_vma(vrm); } /* * Attempt to resize the VMA in-place, if we cannot, then move the VMA to the * first available address to perform the operation. */ static unsigned long mremap_at(struct vma_remap_struct *vrm) { unsigned long res; switch (vrm->remap_type) { case MREMAP_INVALID: break; case MREMAP_NO_RESIZE: /* NO-OP CASE - resizing to the same size. */ return vrm->addr; case MREMAP_SHRINK: /* * SHRINK CASE. Can always be done in-place. * * Simply unmap the shrunken portion of the VMA. This does all * the needed commit accounting, and we indicate that the mmap * lock should be dropped. */ res = shrink_vma(vrm, /* drop_lock= */true); if (res) return res; return vrm->addr; case MREMAP_EXPAND: return expand_vma(vrm); } BUG(); } static unsigned long do_mremap(struct vma_remap_struct *vrm) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; unsigned long ret; ret = check_mremap_params(vrm); if (ret) return ret; vrm->old_len = PAGE_ALIGN(vrm->old_len); vrm->new_len = PAGE_ALIGN(vrm->new_len); vrm_set_delta(vrm); if (mmap_write_lock_killable(mm)) return -EINTR; vrm->mmap_locked = true; vma = vrm->vma = vma_lookup(mm, vrm->addr); if (!vma) { ret = -EFAULT; goto out; } /* If mseal()'d, mremap() is prohibited. */ if (!can_modify_vma(vma)) { ret = -EPERM; goto out; } /* Align to hugetlb page size, if required. */ if (is_vm_hugetlb_page(vma) && !align_hugetlb(vrm)) { ret = -EINVAL; goto out; } vrm->remap_type = vrm_remap_type(vrm); /* Actually execute mremap. */ ret = vrm_implies_new_addr(vrm) ? mremap_to(vrm) : mremap_at(vrm); out: if (vrm->mmap_locked) { mmap_write_unlock(mm); vrm->mmap_locked = false; if (!offset_in_page(ret) && vrm->mlocked && vrm->new_len > vrm->old_len) mm_populate(vrm->new_addr + vrm->old_len, vrm->delta); } userfaultfd_unmap_complete(mm, vrm->uf_unmap_early); mremap_userfaultfd_complete(vrm->uf, vrm->addr, ret, vrm->old_len); userfaultfd_unmap_complete(mm, vrm->uf_unmap); return ret; } /* * Expand (or shrink) an existing mapping, potentially moving it at the * same time (controlled by the MREMAP_MAYMOVE flag and available VM space) * * MREMAP_FIXED option added 5-Dec-1999 by Benjamin LaHaise * This option implies MREMAP_MAYMOVE. */ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, unsigned long, new_len, unsigned long, flags, unsigned long, new_addr) { struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap); /* * There is a deliberate asymmetry here: we strip the pointer tag * from the old address but leave the new address alone. This is * for consistency with mmap(), where we prevent the creation of * aliasing mappings in userspace by leaving the tag bits of the * mapping address intact. A non-zero tag will cause the subsequent * range checks to reject the address as invalid. * * See Documentation/arch/arm64/tagged-address-abi.rst for more * information. */ struct vma_remap_struct vrm = { .addr = untagged_addr(addr), .old_len = old_len, .new_len = new_len, .flags = flags, .new_addr = new_addr, .uf = &uf, .uf_unmap_early = &uf_unmap_early, .uf_unmap = &uf_unmap, .remap_type = MREMAP_INVALID, /* We set later. 
*/ }; return do_mremap(&vrm); }
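/*
 * Editor's sketch (not part of mm/mremap.c): userspace arithmetic mirroring
 * get_extent() above for the PMD case, assuming a 2 MiB PMD. The PMD_SIZE
 * and PMD_MASK below are illustrative stand-ins for the kernel macros,
 * included only to show why an extent is clamped by *both* the source and
 * the destination page table boundaries.
 */
#include <stdio.h>

#define PMD_SIZE	(2UL << 20)
#define PMD_MASK	(~(PMD_SIZE - 1))

/* Largest chunk that stays within one PMD on both the src and dst sides. */
static unsigned long pmd_extent(unsigned long old_addr, unsigned long old_end,
				unsigned long new_addr)
{
	unsigned long next = (old_addr + PMD_SIZE) & PMD_MASK;
	unsigned long extent = next - old_addr;

	if (extent > old_end - old_addr)
		extent = old_end - old_addr;
	next = (new_addr + PMD_SIZE) & PMD_MASK;
	if (extent > next - new_addr)
		extent = next - new_addr;
	return extent;
}

int main(void)
{
	/*
	 * src is 4 KiB past a PMD boundary, dst is 8 KiB past one: only
	 * 2 MiB - 8 KiB can move before one side crosses a boundary. Note
	 * the offsets also differ, so can_realign_addr() would refuse to
	 * align these addresses down (they are not mutually aligned).
	 */
	printf("extent = %#lx\n",
	       pmd_extent(0x201000, 0x201000 + (4UL << 20), 0x402000));
	return 0;
}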
// SPDX-License-Identifier: GPL-2.0 /* * Media device request objects * * Copyright 2018 Cisco Systems, Inc. and/or its affiliates. All rights reserved. * Copyright (C) 2018 Intel Corporation * Copyright (C) 2018 Google, Inc. * * Author: Hans Verkuil <hansverk@cisco.com> * Author: Sakari Ailus <sakari.ailus@linux.intel.com> */ #include <linux/anon_inodes.h> #include <linux/file.h> #include <linux/refcount.h> #include <media/media-device.h> #include <media/media-request.h> static const char * const request_state[] = { [MEDIA_REQUEST_STATE_IDLE] = "idle", [MEDIA_REQUEST_STATE_VALIDATING] = "validating", [MEDIA_REQUEST_STATE_QUEUED] = "queued", [MEDIA_REQUEST_STATE_COMPLETE] = "complete", [MEDIA_REQUEST_STATE_CLEANING] = "cleaning", [MEDIA_REQUEST_STATE_UPDATING] = "updating", }; static const char * media_request_state_str(enum media_request_state state) { BUILD_BUG_ON(ARRAY_SIZE(request_state) != NR_OF_MEDIA_REQUEST_STATE); if (WARN_ON(state >= ARRAY_SIZE(request_state))) return "invalid"; return request_state[state]; } static void media_request_clean(struct media_request *req) { struct media_request_object *obj, *obj_safe; /* Just a sanity check. No other code path is allowed to change this. 
*/ WARN_ON(req->state != MEDIA_REQUEST_STATE_CLEANING); WARN_ON(req->updating_count); WARN_ON(req->access_count); list_for_each_entry_safe(obj, obj_safe, &req->objects, list) { media_request_object_unbind(obj); media_request_object_put(obj); } req->updating_count = 0; req->access_count = 0; WARN_ON(req->num_incomplete_objects); req->num_incomplete_objects = 0; wake_up_interruptible_all(&req->poll_wait); } static void media_request_release(struct kref *kref) { struct media_request *req = container_of(kref, struct media_request, kref); struct media_device *mdev = req->mdev; dev_dbg(mdev->dev, "request: release %s\n", req->debug_str); /* No other users, no need for a spinlock */ req->state = MEDIA_REQUEST_STATE_CLEANING; media_request_clean(req); if (mdev->ops->req_free) mdev->ops->req_free(req); else kfree(req); } void media_request_put(struct media_request *req) { kref_put(&req->kref, media_request_release); } EXPORT_SYMBOL_GPL(media_request_put); static int media_request_close(struct inode *inode, struct file *filp) { struct media_request *req = filp->private_data; media_request_put(req); return 0; } static __poll_t media_request_poll(struct file *filp, struct poll_table_struct *wait) { struct media_request *req = filp->private_data; unsigned long flags; __poll_t ret = 0; if (!(poll_requested_events(wait) & EPOLLPRI)) return 0; poll_wait(filp, &req->poll_wait, wait); spin_lock_irqsave(&req->lock, flags); if (req->state == MEDIA_REQUEST_STATE_COMPLETE) { ret = EPOLLPRI; goto unlock; } if (req->state != MEDIA_REQUEST_STATE_QUEUED) { ret = EPOLLERR; goto unlock; } unlock: spin_unlock_irqrestore(&req->lock, flags); return ret; } static long media_request_ioctl_queue(struct media_request *req) { struct media_device *mdev = req->mdev; enum media_request_state state; unsigned long flags; int ret; dev_dbg(mdev->dev, "request: queue %s\n", req->debug_str); /* * Ensure the request that is validated will be the one that gets queued * next by serialising the queueing process. This mutex is also used * to serialize with canceling a vb2 queue and with setting values such * as controls in a request. */ mutex_lock(&mdev->req_queue_mutex); media_request_get(req); spin_lock_irqsave(&req->lock, flags); if (req->state == MEDIA_REQUEST_STATE_IDLE) req->state = MEDIA_REQUEST_STATE_VALIDATING; state = req->state; spin_unlock_irqrestore(&req->lock, flags); if (state != MEDIA_REQUEST_STATE_VALIDATING) { dev_dbg(mdev->dev, "request: unable to queue %s, request in state %s\n", req->debug_str, media_request_state_str(state)); media_request_put(req); mutex_unlock(&mdev->req_queue_mutex); return -EBUSY; } ret = mdev->ops->req_validate(req); /* * If the req_validate was successful, then we mark the state as QUEUED * and call req_queue. The reason we set the state first is that this * allows req_queue to unbind or complete the queued objects in case * they are immediately 'consumed'. State changes from QUEUED to another * state can only happen if either the driver changes the state or if * the user cancels the vb2 queue. The driver can only change the state * after each object is queued through the req_queue op (and note that * that op cannot fail), so setting the state to QUEUED up front is * safe. * * The other reason for changing the state is if the vb2 queue is * canceled, and that uses the req_queue_mutex which is still locked * while req_queue is called, so that's safe as well. */ spin_lock_irqsave(&req->lock, flags); req->state = ret ? 
MEDIA_REQUEST_STATE_IDLE : MEDIA_REQUEST_STATE_QUEUED; spin_unlock_irqrestore(&req->lock, flags); if (!ret) mdev->ops->req_queue(req); mutex_unlock(&mdev->req_queue_mutex); if (ret) { dev_dbg(mdev->dev, "request: can't queue %s (%d)\n", req->debug_str, ret); media_request_put(req); } return ret; } static long media_request_ioctl_reinit(struct media_request *req) { struct media_device *mdev = req->mdev; unsigned long flags; spin_lock_irqsave(&req->lock, flags); if (req->state != MEDIA_REQUEST_STATE_IDLE && req->state != MEDIA_REQUEST_STATE_COMPLETE) { dev_dbg(mdev->dev, "request: %s not in idle or complete state, cannot reinit\n", req->debug_str); spin_unlock_irqrestore(&req->lock, flags); return -EBUSY; } if (req->access_count) { dev_dbg(mdev->dev, "request: %s is being accessed, cannot reinit\n", req->debug_str); spin_unlock_irqrestore(&req->lock, flags); return -EBUSY; } req->state = MEDIA_REQUEST_STATE_CLEANING; spin_unlock_irqrestore(&req->lock, flags); media_request_clean(req); spin_lock_irqsave(&req->lock, flags); req->state = MEDIA_REQUEST_STATE_IDLE; spin_unlock_irqrestore(&req->lock, flags); return 0; } static long media_request_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { struct media_request *req = filp->private_data; switch (cmd) { case MEDIA_REQUEST_IOC_QUEUE: return media_request_ioctl_queue(req); case MEDIA_REQUEST_IOC_REINIT: return media_request_ioctl_reinit(req); default: return -ENOIOCTLCMD; } } static const struct file_operations request_fops = { .owner = THIS_MODULE, .poll = media_request_poll, .unlocked_ioctl = media_request_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = media_request_ioctl, #endif /* CONFIG_COMPAT */ .release = media_request_close, }; struct media_request * media_request_get_by_fd(struct media_device *mdev, int request_fd) { struct media_request *req; if (!mdev || !mdev->ops || !mdev->ops->req_validate || !mdev->ops->req_queue) return ERR_PTR(-EBADR); CLASS(fd, f)(request_fd); if (fd_empty(f)) goto err; if (fd_file(f)->f_op != &request_fops) goto err; req = fd_file(f)->private_data; if (req->mdev != mdev) goto err; /* * Note: as long as someone has an open filehandle of the request, * the request can never be released. The fdget() above ensures that * even if userspace closes the request filehandle, the release() * fop won't be called, so the media_request_get() always succeeds * and there is no race condition where the request was released * before media_request_get() is called. 
*/ media_request_get(req); return req; err: dev_dbg(mdev->dev, "cannot find request_fd %d\n", request_fd); return ERR_PTR(-EINVAL); } EXPORT_SYMBOL_GPL(media_request_get_by_fd); int media_request_alloc(struct media_device *mdev, int *alloc_fd) { struct media_request *req; struct file *filp; int fd; int ret; /* Either both are NULL or both are non-NULL */ if (WARN_ON(!mdev->ops->req_alloc ^ !mdev->ops->req_free)) return -ENOMEM; if (mdev->ops->req_alloc) req = mdev->ops->req_alloc(mdev); else req = kzalloc(sizeof(*req), GFP_KERNEL); if (!req) return -ENOMEM; fd = get_unused_fd_flags(O_CLOEXEC); if (fd < 0) { ret = fd; goto err_free_req; } filp = anon_inode_getfile("request", &request_fops, NULL, O_CLOEXEC); if (IS_ERR(filp)) { ret = PTR_ERR(filp); goto err_put_fd; } filp->private_data = req; req->mdev = mdev; req->state = MEDIA_REQUEST_STATE_IDLE; req->num_incomplete_objects = 0; kref_init(&req->kref); INIT_LIST_HEAD(&req->objects); spin_lock_init(&req->lock); init_waitqueue_head(&req->poll_wait); req->updating_count = 0; req->access_count = 0; *alloc_fd = fd; snprintf(req->debug_str, sizeof(req->debug_str), "%u:%d", atomic_inc_return(&mdev->request_id), fd); dev_dbg(mdev->dev, "request: allocated %s\n", req->debug_str); fd_install(fd, filp); return 0; err_put_fd: put_unused_fd(fd); err_free_req: if (mdev->ops->req_free) mdev->ops->req_free(req); else kfree(req); return ret; } static void media_request_object_release(struct kref *kref) { struct media_request_object *obj = container_of(kref, struct media_request_object, kref); struct media_request *req = obj->req; if (WARN_ON(req)) media_request_object_unbind(obj); obj->ops->release(obj); } struct media_request_object * media_request_object_find(struct media_request *req, const struct media_request_object_ops *ops, void *priv) { struct media_request_object *obj; struct media_request_object *found = NULL; unsigned long flags; if (WARN_ON(!ops || !priv)) return NULL; spin_lock_irqsave(&req->lock, flags); list_for_each_entry(obj, &req->objects, list) { if (obj->ops == ops && obj->priv == priv) { media_request_object_get(obj); found = obj; break; } } spin_unlock_irqrestore(&req->lock, flags); return found; } EXPORT_SYMBOL_GPL(media_request_object_find); void media_request_object_put(struct media_request_object *obj) { kref_put(&obj->kref, media_request_object_release); } EXPORT_SYMBOL_GPL(media_request_object_put); void media_request_object_init(struct media_request_object *obj) { obj->ops = NULL; obj->req = NULL; obj->priv = NULL; obj->completed = false; INIT_LIST_HEAD(&obj->list); kref_init(&obj->kref); } EXPORT_SYMBOL_GPL(media_request_object_init); int media_request_object_bind(struct media_request *req, const struct media_request_object_ops *ops, void *priv, bool is_buffer, struct media_request_object *obj) { unsigned long flags; int ret = -EBUSY; if (WARN_ON(!ops->release)) return -EBADR; spin_lock_irqsave(&req->lock, flags); if (WARN_ON(req->state != MEDIA_REQUEST_STATE_UPDATING && req->state != MEDIA_REQUEST_STATE_QUEUED)) goto unlock; obj->req = req; obj->ops = ops; obj->priv = priv; if (is_buffer) list_add_tail(&obj->list, &req->objects); else list_add(&obj->list, &req->objects); req->num_incomplete_objects++; ret = 0; unlock: spin_unlock_irqrestore(&req->lock, flags); return ret; } EXPORT_SYMBOL_GPL(media_request_object_bind); void media_request_object_unbind(struct media_request_object *obj) { struct media_request *req = obj->req; unsigned long flags; bool completed = false; if (WARN_ON(!req)) return; spin_lock_irqsave(&req->lock, 
flags); list_del(&obj->list); obj->req = NULL; if (req->state == MEDIA_REQUEST_STATE_COMPLETE) goto unlock; if (WARN_ON(req->state == MEDIA_REQUEST_STATE_VALIDATING)) goto unlock; if (req->state == MEDIA_REQUEST_STATE_CLEANING) { if (!obj->completed) req->num_incomplete_objects--; goto unlock; } if (WARN_ON(!req->num_incomplete_objects)) goto unlock; req->num_incomplete_objects--; if (req->state == MEDIA_REQUEST_STATE_QUEUED && !req->num_incomplete_objects) { req->state = MEDIA_REQUEST_STATE_COMPLETE; completed = true; wake_up_interruptible_all(&req->poll_wait); } unlock: spin_unlock_irqrestore(&req->lock, flags); if (obj->ops->unbind) obj->ops->unbind(obj); if (completed) media_request_put(req); } EXPORT_SYMBOL_GPL(media_request_object_unbind); void media_request_object_complete(struct media_request_object *obj) { struct media_request *req = obj->req; unsigned long flags; bool completed = false; spin_lock_irqsave(&req->lock, flags); if (obj->completed) goto unlock; obj->completed = true; if (WARN_ON(!req->num_incomplete_objects) || WARN_ON(req->state != MEDIA_REQUEST_STATE_QUEUED)) goto unlock; if (!--req->num_incomplete_objects) { req->state = MEDIA_REQUEST_STATE_COMPLETE; wake_up_interruptible_all(&req->poll_wait); completed = true; } unlock: spin_unlock_irqrestore(&req->lock, flags); if (completed) media_request_put(req); } EXPORT_SYMBOL_GPL(media_request_object_complete);
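/*
 * Editor's sketch (not part of this file): the request lifecycle implemented
 * above, seen from userspace. It assumes a media controller node at
 * /dev/media0 whose driver supports requests; error handling is minimal.
 * MEDIA_IOC_REQUEST_ALLOC yields the anon-inode fd created by
 * media_request_alloc(), MEDIA_REQUEST_IOC_QUEUE reaches
 * media_request_ioctl_queue(), and polling for POLLPRI matches
 * media_request_poll() signalling the COMPLETE state.
 */
#include <fcntl.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

int main(void)
{
	int mfd = open("/dev/media0", O_RDWR);
	int req_fd = -1;
	struct pollfd pfd;

	if (mfd < 0 || ioctl(mfd, MEDIA_IOC_REQUEST_ALLOC, &req_fd) < 0)
		return 1;

	/* ...apply controls / queue buffers against req_fd here... */

	if (ioctl(req_fd, MEDIA_REQUEST_IOC_QUEUE) < 0)	/* -EBUSY unless IDLE */
		return 1;

	pfd.fd = req_fd;
	pfd.events = POLLPRI;	/* the only event media_request_poll() reports */
	poll(&pfd, 1, -1);	/* wakes once the request reaches COMPLETE */

	ioctl(req_fd, MEDIA_REQUEST_IOC_REINIT);	/* back to IDLE for reuse */
	close(req_fd);
	close(mfd);
	return 0;
}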
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* SCTP kernel Implementation * (C) Copyright IBM Corp. 2001, 2004 * Copyright (C) 1999-2001 Cisco, Motorola * * This file is part of the SCTP kernel implementation * * These are the definitions needed for the command object. * * Please send any bug reports or fixes you make to the * email address(es): * lksctp developers <linux-sctp@vger.kernel.org> * * Written or modified by: * La Monte H.P. Yarroll <piggy@acm.org> * Karl Knutson <karl@athena.chicago.il.us> * Ardelle Fan <ardelle.fan@intel.com> * Sridhar Samudrala <sri@us.ibm.com> */ #ifndef __net_sctp_command_h__ #define __net_sctp_command_h__ #include <net/sctp/constants.h> #include <net/sctp/structs.h> enum sctp_verb { SCTP_CMD_NOP = 0, /* Do nothing. */ SCTP_CMD_NEW_ASOC, /* Register a new association. */ SCTP_CMD_DELETE_TCB, /* Delete the current association. */ SCTP_CMD_NEW_STATE, /* Enter a new state. */ SCTP_CMD_REPORT_TSN, /* Record the arrival of a TSN. */ SCTP_CMD_GEN_SACK, /* Send a Selective ACK (maybe). */ SCTP_CMD_PROCESS_SACK, /* Process an inbound SACK. */ SCTP_CMD_GEN_INIT_ACK, /* Generate an INIT ACK chunk. */ SCTP_CMD_PEER_INIT, /* Process an INIT from the peer. */ SCTP_CMD_GEN_COOKIE_ECHO, /* Generate a COOKIE ECHO chunk. */ SCTP_CMD_CHUNK_ULP, /* Send a chunk to the sockets layer. */ SCTP_CMD_EVENT_ULP, /* Send a notification to the sockets layer. */ SCTP_CMD_REPLY, /* Send a chunk to our peer. */ SCTP_CMD_SEND_PKT, /* Send a full packet to our peer. */ SCTP_CMD_RETRAN, /* Mark a transport for retransmission. */ SCTP_CMD_ECN_CE, /* Do delayed CE processing. */ SCTP_CMD_ECN_ECNE, /* Do delayed ECNE processing. */ SCTP_CMD_ECN_CWR, /* Do delayed CWR processing. */ SCTP_CMD_TIMER_START, /* Start a timer. */ SCTP_CMD_TIMER_START_ONCE, /* Start a timer once */ SCTP_CMD_TIMER_RESTART, /* Restart a timer. */ SCTP_CMD_TIMER_STOP, /* Stop a timer. */ SCTP_CMD_INIT_CHOOSE_TRANSPORT, /* Choose transport for an INIT. */ SCTP_CMD_INIT_COUNTER_RESET, /* Reset init counter. */ SCTP_CMD_INIT_COUNTER_INC, /* Increment init counter. */ SCTP_CMD_INIT_RESTART, /* High level, do init timer work. */ SCTP_CMD_COOKIEECHO_RESTART, /* High level, do cookie-echo timer work. */ SCTP_CMD_INIT_FAILED, /* High level, do init failure work. */ SCTP_CMD_REPORT_DUP, /* Report a duplicate TSN. */ SCTP_CMD_STRIKE, /* Mark a strike against a transport. */ SCTP_CMD_HB_TIMERS_START, /* Start the heartbeat timers. */ SCTP_CMD_HB_TIMER_UPDATE, /* Update a heartbeat timer. */ SCTP_CMD_HB_TIMERS_STOP, /* Stop the heartbeat timers. */ SCTP_CMD_PROBE_TIMER_UPDATE, /* Update a probe timer. */ SCTP_CMD_TRANSPORT_HB_SENT, /* Reset the status of a transport. 
*/ SCTP_CMD_TRANSPORT_IDLE, /* Do manipulations on idle transport */ SCTP_CMD_TRANSPORT_ON, /* Mark the transport as active. */ SCTP_CMD_REPORT_ERROR, /* Pass this error back out of the sm. */ SCTP_CMD_REPORT_BAD_TAG, /* Verification tags didn't match. */ SCTP_CMD_PROCESS_CTSN, /* Side effect from shutdown. */ SCTP_CMD_ASSOC_FAILED, /* Handle association failure. */ SCTP_CMD_DISCARD_PACKET, /* Discard the whole packet. */ SCTP_CMD_GEN_SHUTDOWN, /* Generate a SHUTDOWN chunk. */ SCTP_CMD_PURGE_OUTQUEUE, /* Purge all data waiting to be sent. */ SCTP_CMD_SETUP_T2, /* Hi-level, setup T2-shutdown parms. */ SCTP_CMD_RTO_PENDING, /* Set transport's rto_pending. */ SCTP_CMD_PART_DELIVER, /* Partial data delivery considerations. */ SCTP_CMD_RENEGE, /* Renege data on an association. */ SCTP_CMD_SETUP_T4, /* ADDIP, setup T4 RTO timer parms. */ SCTP_CMD_PROCESS_OPERR, /* Process an ERROR chunk. */ SCTP_CMD_REPORT_FWDTSN, /* Report new cumulative TSN Ack. */ SCTP_CMD_PROCESS_FWDTSN, /* Skips were reported, so process further. */ SCTP_CMD_CLEAR_INIT_TAG, /* Clears association peer's inittag. */ SCTP_CMD_DEL_NON_PRIMARY, /* Removes non-primary peer transports. */ SCTP_CMD_T3_RTX_TIMERS_STOP, /* Stops T3-rtx pending timers */ SCTP_CMD_FORCE_PRIM_RETRAN, /* Forces retrans. over primary path. */ SCTP_CMD_SET_SK_ERR, /* Set sk_err */ SCTP_CMD_ASSOC_CHANGE, /* generate and send assoc_change event */ SCTP_CMD_ADAPTATION_IND, /* generate and send adaptation event */ SCTP_CMD_PEER_NO_AUTH, /* generate and send authentication event */ SCTP_CMD_ASSOC_SHKEY, /* generate the association shared keys */ SCTP_CMD_T1_RETRAN, /* Mark for retransmission after T1 timeout */ SCTP_CMD_UPDATE_INITTAG, /* Update peer inittag */ SCTP_CMD_SEND_MSG, /* Send the whole user message */ SCTP_CMD_PURGE_ASCONF_QUEUE, /* Purge all asconf queues. */ SCTP_CMD_SET_ASOC, /* Restore association context */ SCTP_CMD_LAST }; /* How many commands can you put in a struct sctp_cmd_seq? * This is a rather arbitrary number, ideally derived from a careful * analysis of the state functions, but in reality just taken from * thin air in the hopes that we don't trigger a kernel panic. */ #define SCTP_MAX_NUM_COMMANDS 20 union sctp_arg { void *zero_all; /* Set to NULL to clear the entire union */ __s32 i32; __u32 u32; __be32 be32; __u16 u16; __u8 u8; int error; __be16 err; enum sctp_state state; enum sctp_event_timeout to; struct sctp_chunk *chunk; struct sctp_association *asoc; struct sctp_transport *transport; struct sctp_bind_addr *bp; struct sctp_init_chunk *init; struct sctp_ulpevent *ulpevent; struct sctp_packet *packet; struct sctp_sackhdr *sackh; struct sctp_datamsg *msg; }; /* We are simulating ML type constructors here. * * SCTP_ARG_CONSTRUCTOR(NAME, TYPE, ELT) builds a function called * SCTP_NAME() which takes an argument of type TYPE and returns a * union sctp_arg. It does this by inserting the sole argument into * the ELT union element of a local union sctp_arg. * * E.g., SCTP_ARG_CONSTRUCTOR(I32, __s32, i32) builds SCTP_I32(arg), * which takes an __s32 and returns a union sctp_arg containing the * __s32. So, after foo = SCTP_I32(arg), foo.i32 == arg. 
*/ #define SCTP_ARG_CONSTRUCTOR(name, type, elt) \ static inline union sctp_arg \ SCTP_## name (type arg) \ { union sctp_arg retval;\ retval.zero_all = NULL;\ retval.elt = arg;\ return retval;\ } SCTP_ARG_CONSTRUCTOR(I32, __s32, i32) SCTP_ARG_CONSTRUCTOR(U32, __u32, u32) SCTP_ARG_CONSTRUCTOR(BE32, __be32, be32) SCTP_ARG_CONSTRUCTOR(U16, __u16, u16) SCTP_ARG_CONSTRUCTOR(U8, __u8, u8) SCTP_ARG_CONSTRUCTOR(ERROR, int, error) SCTP_ARG_CONSTRUCTOR(PERR, __be16, err) /* protocol error */ SCTP_ARG_CONSTRUCTOR(STATE, enum sctp_state, state) SCTP_ARG_CONSTRUCTOR(TO, enum sctp_event_timeout, to) SCTP_ARG_CONSTRUCTOR(CHUNK, struct sctp_chunk *, chunk) SCTP_ARG_CONSTRUCTOR(ASOC, struct sctp_association *, asoc) SCTP_ARG_CONSTRUCTOR(TRANSPORT, struct sctp_transport *, transport) SCTP_ARG_CONSTRUCTOR(BA, struct sctp_bind_addr *, bp) SCTP_ARG_CONSTRUCTOR(PEER_INIT, struct sctp_init_chunk *, init) SCTP_ARG_CONSTRUCTOR(ULPEVENT, struct sctp_ulpevent *, ulpevent) SCTP_ARG_CONSTRUCTOR(PACKET, struct sctp_packet *, packet) SCTP_ARG_CONSTRUCTOR(SACKH, struct sctp_sackhdr *, sackh) SCTP_ARG_CONSTRUCTOR(DATAMSG, struct sctp_datamsg *, msg) static inline union sctp_arg SCTP_FORCE(void) { return SCTP_I32(1); } static inline union sctp_arg SCTP_NOFORCE(void) { return SCTP_I32(0); } static inline union sctp_arg SCTP_NULL(void) { union sctp_arg retval; retval.zero_all = NULL; return retval; } struct sctp_cmd { union sctp_arg obj; enum sctp_verb verb; }; struct sctp_cmd_seq { struct sctp_cmd cmds[SCTP_MAX_NUM_COMMANDS]; struct sctp_cmd *last_used_slot; struct sctp_cmd *next_cmd; }; /* Initialize a block of memory as a command sequence. * Return 0 if the initialization fails. */ static inline int sctp_init_cmd_seq(struct sctp_cmd_seq *seq) { /* cmds[] is filled backwards to simplify the overflow BUG() check */ seq->last_used_slot = seq->cmds + SCTP_MAX_NUM_COMMANDS; seq->next_cmd = seq->last_used_slot; return 1; /* We always succeed. */ } /* Add a command to a struct sctp_cmd_seq. * * Use the SCTP_* constructors defined by SCTP_ARG_CONSTRUCTOR() above * to wrap data which goes in the obj argument. */ static inline void sctp_add_cmd_sf(struct sctp_cmd_seq *seq, enum sctp_verb verb, union sctp_arg obj) { struct sctp_cmd *cmd = seq->last_used_slot - 1; BUG_ON(cmd < seq->cmds); cmd->verb = verb; cmd->obj = obj; seq->last_used_slot = cmd; } /* Return the next command structure in an sctp_cmd_seq. * Return NULL at the end of the sequence. */ static inline struct sctp_cmd *sctp_next_cmd(struct sctp_cmd_seq *seq) { if (seq->next_cmd <= seq->last_used_slot) return NULL; return --seq->next_cmd; } #endif /* __net_sctp_command_h__ */
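/*
 * Editor's sketch (not in-tree): how a state function and a side-effect
 * dispatcher would be expected to use the API above, in kernel context with
 * this header included. example_side_effects() and example_dispatch() are
 * hypothetical names; asoc and chunk are assumed to come from the caller.
 * Note that although cmds[] is filled backwards, sctp_next_cmd() hands
 * commands back in the order they were added.
 */
static void example_side_effects(struct sctp_cmd_seq *commands,
				 struct sctp_association *asoc,
				 struct sctp_chunk *chunk)
{
	sctp_init_cmd_seq(commands);

	/* Wrap each argument with the matching SCTP_* constructor. */
	sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
			SCTP_STATE(SCTP_STATE_ESTABLISHED));
	sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(chunk));
	sctp_add_cmd_sf(commands, SCTP_CMD_SET_ASOC, SCTP_ASOC(asoc));
}

static void example_dispatch(struct sctp_cmd_seq *commands)
{
	struct sctp_cmd *cmd;

	while ((cmd = sctp_next_cmd(commands)) != NULL) {
		switch (cmd->verb) {
		case SCTP_CMD_REPLY:
			/* ...transmit cmd->obj.chunk... */
			break;
		default:
			break;
		}
	}
}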
// SPDX-License-Identifier: GPL-2.0 #include "bcachefs.h" #include "bkey.h" #include "bkey_cmp.h" #include "bkey_methods.h" #include "bset.h" #include "util.h" const struct bkey_format bch2_bkey_format_current = BKEY_FORMAT_CURRENT; void bch2_bkey_packed_to_binary_text(struct printbuf *out, const struct bkey_format *f, const struct bkey_packed *k) { const u64 *p = high_word(f, k); unsigned word_bits = 64 - high_bit_offset; unsigned nr_key_bits = bkey_format_key_bits(f) + high_bit_offset; u64 v = *p & (~0ULL >> high_bit_offset); if (!nr_key_bits) { prt_str(out, "(empty)"); return; } while (1) { unsigned next_key_bits = nr_key_bits; if (nr_key_bits < 64) { v >>= 64 - nr_key_bits; next_key_bits = 0; } else { next_key_bits -= 64; } bch2_prt_u64_base2_nbits(out, v, min(word_bits, nr_key_bits)); if (!next_key_bits) break; prt_char(out, ' '); p = next_word(p); v = *p; word_bits = 64; nr_key_bits = next_key_bits; } } static void __bch2_bkey_pack_verify(const struct bkey_packed *packed, const struct bkey *unpacked, const struct bkey_format *format) { struct bkey tmp; BUG_ON(bkeyp_val_u64s(format, packed) != bkey_val_u64s(unpacked)); BUG_ON(packed->u64s < bkeyp_key_u64s(format, packed)); tmp = __bch2_bkey_unpack_key(format, packed); if (memcmp(&tmp, unpacked, sizeof(struct bkey))) { struct printbuf buf = PRINTBUF; prt_printf(&buf, "keys differ: format u64s %u fields %u %u %u %u %u\n", format->key_u64s, format->bits_per_field[0], format->bits_per_field[1], format->bits_per_field[2], format->bits_per_field[3], format->bits_per_field[4]); prt_printf(&buf, "compiled unpack: "); bch2_bkey_to_text(&buf, unpacked); prt_newline(&buf); prt_printf(&buf, "c unpack: "); bch2_bkey_to_text(&buf, &tmp); prt_newline(&buf); prt_printf(&buf, "compiled unpack: "); bch2_bkey_packed_to_binary_text(&buf, &bch2_bkey_format_current, (struct bkey_packed *) unpacked); prt_newline(&buf); prt_printf(&buf, "c unpack: "); bch2_bkey_packed_to_binary_text(&buf, &bch2_bkey_format_current, (struct bkey_packed *) &tmp); prt_newline(&buf); panic("%s", buf.buf); } } static inline void bch2_bkey_pack_verify(const struct bkey_packed *packed, const struct bkey *unpacked, const struct bkey_format *format) { if (static_branch_unlikely(&bch2_debug_check_bkey_unpack)) __bch2_bkey_pack_verify(packed, unpacked, format); } struct pack_state { const
struct bkey_format *format; unsigned bits; /* bits remaining in current word */ u64 w; /* current word */ u64 *p; /* pointer to next word */ }; __always_inline static struct pack_state pack_state_init(const struct bkey_format *format, struct bkey_packed *k) { u64 *p = high_word(format, k); return (struct pack_state) { .format = format, .bits = 64 - high_bit_offset, .w = 0, .p = p, }; } __always_inline static void pack_state_finish(struct pack_state *state, struct bkey_packed *k) { EBUG_ON(state->p < k->_data); EBUG_ON(state->p >= (u64 *) k->_data + state->format->key_u64s); *state->p = state->w; } struct unpack_state { const struct bkey_format *format; unsigned bits; /* bits remaining in current word */ u64 w; /* current word */ const u64 *p; /* pointer to next word */ }; __always_inline static struct unpack_state unpack_state_init(const struct bkey_format *format, const struct bkey_packed *k) { const u64 *p = high_word(format, k); return (struct unpack_state) { .format = format, .bits = 64 - high_bit_offset, .w = *p << high_bit_offset, .p = p, }; } __always_inline static u64 get_inc_field(struct unpack_state *state, unsigned field) { unsigned bits = state->format->bits_per_field[field]; u64 v = 0, offset = le64_to_cpu(state->format->field_offset[field]); if (bits >= state->bits) { v = state->w >> (64 - bits); bits -= state->bits; state->p = next_word(state->p); state->w = *state->p; state->bits = 64; } /* avoid shift by 64 if bits is 0 - bits is never 64 here: */ v |= (state->w >> 1) >> (63 - bits); state->w <<= bits; state->bits -= bits; return v + offset; } __always_inline static void __set_inc_field(struct pack_state *state, unsigned field, u64 v) { unsigned bits = state->format->bits_per_field[field]; if (bits) { if (bits > state->bits) { bits -= state->bits; /* avoid shift by 64 if bits is 64 - bits is never 0 here: */ state->w |= (v >> 1) >> (bits - 1); *state->p = state->w; state->p = next_word(state->p); state->w = 0; state->bits = 64; } state->bits -= bits; state->w |= v << state->bits; } } __always_inline static bool set_inc_field(struct pack_state *state, unsigned field, u64 v) { unsigned bits = state->format->bits_per_field[field]; u64 offset = le64_to_cpu(state->format->field_offset[field]); if (v < offset) return false; v -= offset; if (fls64(v) > bits) return false; __set_inc_field(state, field, v); return true; } /* * Note: does NOT set out->format (we don't know what it should be here!) 
* * Also: doesn't work on extents - it doesn't preserve the invariant that * if k is packed bkey_start_pos(k) will successfully pack */ static bool bch2_bkey_transform_key(const struct bkey_format *out_f, struct bkey_packed *out, const struct bkey_format *in_f, const struct bkey_packed *in) { struct pack_state out_s = pack_state_init(out_f, out); struct unpack_state in_s = unpack_state_init(in_f, in); u64 *w = out->_data; unsigned i; *w = 0; for (i = 0; i < BKEY_NR_FIELDS; i++) if (!set_inc_field(&out_s, i, get_inc_field(&in_s, i))) return false; /* Can't happen because the val would be too big to unpack: */ EBUG_ON(in->u64s - in_f->key_u64s + out_f->key_u64s > U8_MAX); pack_state_finish(&out_s, out); out->u64s = out_f->key_u64s + in->u64s - in_f->key_u64s; out->needs_whiteout = in->needs_whiteout; out->type = in->type; return true; } bool bch2_bkey_transform(const struct bkey_format *out_f, struct bkey_packed *out, const struct bkey_format *in_f, const struct bkey_packed *in) { if (!bch2_bkey_transform_key(out_f, out, in_f, in)) return false; memcpy_u64s((u64 *) out + out_f->key_u64s, (u64 *) in + in_f->key_u64s, (in->u64s - in_f->key_u64s)); return true; } struct bkey __bch2_bkey_unpack_key(const struct bkey_format *format, const struct bkey_packed *in) { struct unpack_state state = unpack_state_init(format, in); struct bkey out; EBUG_ON(format->nr_fields != BKEY_NR_FIELDS); EBUG_ON(in->u64s < format->key_u64s); EBUG_ON(in->format != KEY_FORMAT_LOCAL_BTREE); EBUG_ON(in->u64s - format->key_u64s + BKEY_U64s > U8_MAX); out.u64s = BKEY_U64s + in->u64s - format->key_u64s; out.format = KEY_FORMAT_CURRENT; out.needs_whiteout = in->needs_whiteout; out.type = in->type; out.pad[0] = 0; #define x(id, field) out.field = get_inc_field(&state, id); bkey_fields() #undef x return out; } #ifndef HAVE_BCACHEFS_COMPILED_UNPACK struct bpos __bkey_unpack_pos(const struct bkey_format *format, const struct bkey_packed *in) { struct unpack_state state = unpack_state_init(format, in); struct bpos out; EBUG_ON(format->nr_fields != BKEY_NR_FIELDS); EBUG_ON(in->u64s < format->key_u64s); EBUG_ON(in->format != KEY_FORMAT_LOCAL_BTREE); out.inode = get_inc_field(&state, BKEY_FIELD_INODE); out.offset = get_inc_field(&state, BKEY_FIELD_OFFSET); out.snapshot = get_inc_field(&state, BKEY_FIELD_SNAPSHOT); return out; } #endif /** * bch2_bkey_pack_key -- pack just the key, not the value * @out: packed result * @in: key to pack * @format: format of packed result * * Returns: true on success, false on failure */ bool bch2_bkey_pack_key(struct bkey_packed *out, const struct bkey *in, const struct bkey_format *format) { struct pack_state state = pack_state_init(format, out); u64 *w = out->_data; EBUG_ON((void *) in == (void *) out); EBUG_ON(format->nr_fields != BKEY_NR_FIELDS); EBUG_ON(in->format != KEY_FORMAT_CURRENT); *w = 0; #define x(id, field) if (!set_inc_field(&state, id, in->field)) return false; bkey_fields() #undef x pack_state_finish(&state, out); out->u64s = format->key_u64s + in->u64s - BKEY_U64s; out->format = KEY_FORMAT_LOCAL_BTREE; out->needs_whiteout = in->needs_whiteout; out->type = in->type; bch2_bkey_pack_verify(out, in, format); return true; } /** * bch2_bkey_unpack -- unpack the key and the value * @b: btree node of @src key (for packed format) * @dst: unpacked result * @src: packed input */ void bch2_bkey_unpack(const struct btree *b, struct bkey_i *dst, const struct bkey_packed *src) { __bkey_unpack_key(b, &dst->k, src); memcpy_u64s(&dst->v, bkeyp_val(&b->format, src), bkeyp_val_u64s(&b->format, src)); } 
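/*
 * Round-trip sketch (illustrative, not from the original file): packing
 * then unpacking is expected to be the identity, which is exactly what
 * __bch2_bkey_pack_verify() above checks with memcmp().  Assumes @f was
 * built by the bkey_format_state machinery so that every field of @in
 * fits; bch2_bkey_pack_key() returns false if one does not.
 */
static inline bool bkey_roundtrip_sketch(const struct bkey_format *f,
					 const struct bkey *in)
{
	struct bkey_packed packed;
	struct bkey unpacked;

	if (!bch2_bkey_pack_key(&packed, in, f))
		return false;	/* a field exceeded f's bits/offset range */

	unpacked = __bch2_bkey_unpack_key(f, &packed);

	/* pack then unpack must reproduce the key exactly: */
	return !memcmp(&unpacked, in, sizeof(unpacked));
}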
/** * bch2_bkey_pack -- pack the key and the value * @dst: packed result * @src: unpacked input * @format: format of packed result * * Returns: true on success, false on failure */ bool bch2_bkey_pack(struct bkey_packed *dst, const struct bkey_i *src, const struct bkey_format *format) { struct bkey_packed tmp; if (!bch2_bkey_pack_key(&tmp, &src->k, format)) return false; memmove_u64s((u64 *) dst + format->key_u64s, &src->v, bkey_val_u64s(&src->k)); memcpy_u64s_small(dst, &tmp, format->key_u64s); return true; } __always_inline static bool set_inc_field_lossy(struct pack_state *state, unsigned field, u64 v) { unsigned bits = state->format->bits_per_field[field]; u64 offset = le64_to_cpu(state->format->field_offset[field]); bool ret = true; EBUG_ON(v < offset); v -= offset; if (fls64(v) > bits) { v = ~(~0ULL << bits); ret = false; } __set_inc_field(state, field, v); return ret; } static bool bkey_packed_successor(struct bkey_packed *out, const struct btree *b, struct bkey_packed k) { const struct bkey_format *f = &b->format; unsigned nr_key_bits = b->nr_key_bits; unsigned first_bit, offset; u64 *p; EBUG_ON(b->nr_key_bits != bkey_format_key_bits(f)); if (!nr_key_bits) return false; *out = k; first_bit = high_bit_offset + nr_key_bits - 1; p = nth_word(high_word(f, out), first_bit >> 6); offset = 63 - (first_bit & 63); while (nr_key_bits) { unsigned bits = min(64 - offset, nr_key_bits); u64 mask = (~0ULL >> (64 - bits)) << offset; if ((*p & mask) != mask) { *p += 1ULL << offset; EBUG_ON(bch2_bkey_cmp_packed(b, out, &k) <= 0); return true; } *p &= ~mask; p = prev_word(p); nr_key_bits -= bits; offset = 0; } return false; } static bool bkey_format_has_too_big_fields(const struct bkey_format *f) { for (unsigned i = 0; i < f->nr_fields; i++) { unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i]; u64 unpacked_max = ~((~0ULL << 1) << (unpacked_bits - 1)); u64 packed_max = f->bits_per_field[i] ? ~((~0ULL << 1) << (f->bits_per_field[i] - 1)) : 0; u64 field_offset = le64_to_cpu(f->field_offset[i]); if (packed_max + field_offset < packed_max || packed_max + field_offset > unpacked_max) return true; } return false; } /* * Returns a packed key that compares <= in * * This is used in bset_search_tree(), where we need a packed pos in order to be * able to compare against the keys in the auxiliary search tree - and it's * legal to use a packed pos that isn't equivalent to the original pos, * _provided_ it compares <= to the original pos. 
*/ enum bkey_pack_pos_ret bch2_bkey_pack_pos_lossy(struct bkey_packed *out, struct bpos in, const struct btree *b) { const struct bkey_format *f = &b->format; struct pack_state state = pack_state_init(f, out); u64 *w = out->_data; struct bpos orig = in; bool exact = true; unsigned i; /* * bch2_bkey_pack_key() will write to all of f->key_u64s, minus the 3 * byte header, but pack_pos() won't if the len/version fields are big * enough - we need to make sure to zero them out: */ for (i = 0; i < f->key_u64s; i++) w[i] = 0; if (unlikely(in.snapshot < le64_to_cpu(f->field_offset[BKEY_FIELD_SNAPSHOT]))) { if (!in.offset-- && !in.inode--) return BKEY_PACK_POS_FAIL; in.snapshot = KEY_SNAPSHOT_MAX; exact = false; } if (unlikely(in.offset < le64_to_cpu(f->field_offset[BKEY_FIELD_OFFSET]))) { if (!in.inode--) return BKEY_PACK_POS_FAIL; in.offset = KEY_OFFSET_MAX; in.snapshot = KEY_SNAPSHOT_MAX; exact = false; } if (unlikely(in.inode < le64_to_cpu(f->field_offset[BKEY_FIELD_INODE]))) return BKEY_PACK_POS_FAIL; if (unlikely(!set_inc_field_lossy(&state, BKEY_FIELD_INODE, in.inode))) { in.offset = KEY_OFFSET_MAX; in.snapshot = KEY_SNAPSHOT_MAX; exact = false; } if (unlikely(!set_inc_field_lossy(&state, BKEY_FIELD_OFFSET, in.offset))) { in.snapshot = KEY_SNAPSHOT_MAX; exact = false; } if (unlikely(!set_inc_field_lossy(&state, BKEY_FIELD_SNAPSHOT, in.snapshot))) exact = false; pack_state_finish(&state, out); out->u64s = f->key_u64s; out->format = KEY_FORMAT_LOCAL_BTREE; out->type = KEY_TYPE_deleted; if (static_branch_unlikely(&bch2_debug_check_bkey_unpack)) { if (exact) { BUG_ON(bkey_cmp_left_packed(b, out, &orig)); } else { struct bkey_packed successor; BUG_ON(bkey_cmp_left_packed(b, out, &orig) >= 0); BUG_ON(bkey_packed_successor(&successor, b, *out) && bkey_cmp_left_packed(b, &successor, &orig) < 0 && !bkey_format_has_too_big_fields(f)); } } return exact ? BKEY_PACK_POS_EXACT : BKEY_PACK_POS_SMALLER; } void bch2_bkey_format_init(struct bkey_format_state *s) { unsigned i; for (i = 0; i < ARRAY_SIZE(s->field_min); i++) s->field_min[i] = U64_MAX; for (i = 0; i < ARRAY_SIZE(s->field_max); i++) s->field_max[i] = 0; /* Make sure we can store a size of 0: */ s->field_min[BKEY_FIELD_SIZE] = 0; } void bch2_bkey_format_add_pos(struct bkey_format_state *s, struct bpos p) { unsigned field = 0; __bkey_format_add(s, field++, p.inode); __bkey_format_add(s, field++, p.offset); __bkey_format_add(s, field++, p.snapshot); } /* * We don't want it to be possible for the packed format to represent fields * bigger than a u64... that will cause confusion and issues (like with * bkey_packed_successor()) */ static void set_format_field(struct bkey_format *f, enum bch_bkey_fields i, unsigned bits, u64 offset) { unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i]; u64 unpacked_max = ~((~0ULL << 1) << (unpacked_bits - 1)); bits = min(bits, unpacked_bits); offset = bits == unpacked_bits ? 
0 : min(offset, unpacked_max - ((1ULL << bits) - 1)); f->bits_per_field[i] = bits; f->field_offset[i] = cpu_to_le64(offset); } struct bkey_format bch2_bkey_format_done(struct bkey_format_state *s) { unsigned i, bits = KEY_PACKED_BITS_START; struct bkey_format ret = { .nr_fields = BKEY_NR_FIELDS, }; for (i = 0; i < ARRAY_SIZE(s->field_min); i++) { s->field_min[i] = min(s->field_min[i], s->field_max[i]); set_format_field(&ret, i, fls64(s->field_max[i] - s->field_min[i]), s->field_min[i]); bits += ret.bits_per_field[i]; } /* allow for extent merging: */ if (ret.bits_per_field[BKEY_FIELD_SIZE]) { unsigned b = min(4U, 32U - ret.bits_per_field[BKEY_FIELD_SIZE]); ret.bits_per_field[BKEY_FIELD_SIZE] += b; bits += b; } ret.key_u64s = DIV_ROUND_UP(bits, 64); /* if we have enough spare bits, round fields up to nearest byte */ bits = ret.key_u64s * 64 - bits; for (i = 0; i < ARRAY_SIZE(ret.bits_per_field); i++) { unsigned r = round_up(ret.bits_per_field[i], 8) - ret.bits_per_field[i]; if (r <= bits) { set_format_field(&ret, i, ret.bits_per_field[i] + r, le64_to_cpu(ret.field_offset[i])); bits -= r; } } if (static_branch_unlikely(&bch2_debug_check_bkey_unpack)) { struct printbuf buf = PRINTBUF; BUG_ON(bch2_bkey_format_invalid(NULL, &ret, 0, &buf)); printbuf_exit(&buf); } return ret; } int bch2_bkey_format_invalid(struct bch_fs *c, struct bkey_format *f, enum bch_validate_flags flags, struct printbuf *err) { unsigned bits = KEY_PACKED_BITS_START; if (f->nr_fields != BKEY_NR_FIELDS) { prt_printf(err, "incorrect number of fields: got %u, should be %u", f->nr_fields, BKEY_NR_FIELDS); return -BCH_ERR_invalid; } /* * Verify that the packed format can't represent fields larger than the * unpacked format: */ for (unsigned i = 0; i < f->nr_fields; i++) { if (bch2_bkey_format_field_overflows(f, i)) { unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i]; u64 unpacked_max = ~((~0ULL << 1) << (unpacked_bits - 1)); unsigned packed_bits = min(64, f->bits_per_field[i]); u64 packed_max = packed_bits ? 
~((~0ULL << 1) << (packed_bits - 1)) : 0; prt_printf(err, "field %u too large: %llu + %llu > %llu", i, packed_max, le64_to_cpu(f->field_offset[i]), unpacked_max); return -BCH_ERR_invalid; } bits += f->bits_per_field[i]; } if (f->key_u64s != DIV_ROUND_UP(bits, 64)) { prt_printf(err, "incorrect key_u64s: got %u, should be %u", f->key_u64s, DIV_ROUND_UP(bits, 64)); return -BCH_ERR_invalid; } return 0; } void bch2_bkey_format_to_text(struct printbuf *out, const struct bkey_format *f) { prt_printf(out, "u64s %u fields ", f->key_u64s); for (unsigned i = 0; i < ARRAY_SIZE(f->bits_per_field); i++) { if (i) prt_str(out, ", "); prt_printf(out, "%u:%llu", f->bits_per_field[i], le64_to_cpu(f->field_offset[i])); } } /* * Most significant differing bit * Bits are indexed from 0 - return is [0, nr_key_bits) */ __pure unsigned bch2_bkey_greatest_differing_bit(const struct btree *b, const struct bkey_packed *l_k, const struct bkey_packed *r_k) { const u64 *l = high_word(&b->format, l_k); const u64 *r = high_word(&b->format, r_k); unsigned nr_key_bits = b->nr_key_bits; unsigned word_bits = 64 - high_bit_offset; u64 l_v, r_v; EBUG_ON(b->nr_key_bits != bkey_format_key_bits(&b->format)); /* for big endian, skip past header */ l_v = *l & (~0ULL >> high_bit_offset); r_v = *r & (~0ULL >> high_bit_offset); while (nr_key_bits) { if (nr_key_bits < word_bits) { l_v >>= word_bits - nr_key_bits; r_v >>= word_bits - nr_key_bits; nr_key_bits = 0; } else { nr_key_bits -= word_bits; } if (l_v != r_v) return fls64(l_v ^ r_v) - 1 + nr_key_bits; l = next_word(l); r = next_word(r); l_v = *l; r_v = *r; word_bits = 64; } return 0; } /* * First set bit * Bits are indexed from 0 - return is [0, nr_key_bits) */ __pure unsigned bch2_bkey_ffs(const struct btree *b, const struct bkey_packed *k) { const u64 *p = high_word(&b->format, k); unsigned nr_key_bits = b->nr_key_bits; unsigned ret = 0, offset; EBUG_ON(b->nr_key_bits != bkey_format_key_bits(&b->format)); offset = nr_key_bits; while (offset > 64) { p = next_word(p); offset -= 64; } offset = 64 - offset; while (nr_key_bits) { unsigned bits = nr_key_bits + offset < 64 ? 
nr_key_bits : 64 - offset; u64 mask = (~0ULL >> (64 - bits)) << offset; if (*p & mask) return ret + __ffs64(*p & mask) - offset; p = prev_word(p); nr_key_bits -= bits; ret += bits; offset = 0; } return 0; } #ifdef HAVE_BCACHEFS_COMPILED_UNPACK #define I(_x) (*(out)++ = (_x)) #define I1(i0) I(i0) #define I2(i0, i1) (I1(i0), I(i1)) #define I3(i0, i1, i2) (I2(i0, i1), I(i2)) #define I4(i0, i1, i2, i3) (I3(i0, i1, i2), I(i3)) #define I5(i0, i1, i2, i3, i4) (I4(i0, i1, i2, i3), I(i4)) static u8 *compile_bkey_field(const struct bkey_format *format, u8 *out, enum bch_bkey_fields field, unsigned dst_offset, unsigned dst_size, bool *eax_zeroed) { unsigned bits = format->bits_per_field[field]; u64 offset = le64_to_cpu(format->field_offset[field]); unsigned i, byte, bit_offset, align, shl, shr; if (!bits && !offset) { if (!*eax_zeroed) { /* xor eax, eax */ I2(0x31, 0xc0); } *eax_zeroed = true; goto set_field; } if (!bits) { /* just return offset: */ switch (dst_size) { case 8: if (offset > S32_MAX) { /* mov [rdi + dst_offset], offset */ I3(0xc7, 0x47, dst_offset); memcpy(out, &offset, 4); out += 4; I3(0xc7, 0x47, dst_offset + 4); memcpy(out, (void *) &offset + 4, 4); out += 4; } else { /* mov [rdi + dst_offset], offset */ /* sign extended */ I4(0x48, 0xc7, 0x47, dst_offset); memcpy(out, &offset, 4); out += 4; } break; case 4: /* mov [rdi + dst_offset], offset */ I3(0xc7, 0x47, dst_offset); memcpy(out, &offset, 4); out += 4; break; default: BUG(); } return out; } bit_offset = format->key_u64s * 64; for (i = 0; i <= field; i++) bit_offset -= format->bits_per_field[i]; byte = bit_offset / 8; bit_offset -= byte * 8; *eax_zeroed = false; if (bit_offset == 0 && bits == 8) { /* movzx eax, BYTE PTR [rsi + imm8] */ I4(0x0f, 0xb6, 0x46, byte); } else if (bit_offset == 0 && bits == 16) { /* movzx eax, WORD PTR [rsi + imm8] */ I4(0x0f, 0xb7, 0x46, byte); } else if (bit_offset + bits <= 32) { align = min(4 - DIV_ROUND_UP(bit_offset + bits, 8), byte & 3); byte -= align; bit_offset += align * 8; BUG_ON(bit_offset + bits > 32); /* mov eax, [rsi + imm8] */ I3(0x8b, 0x46, byte); if (bit_offset) { /* shr eax, imm8 */ I3(0xc1, 0xe8, bit_offset); } if (bit_offset + bits < 32) { unsigned mask = ~0U >> (32 - bits); /* and eax, imm32 */ I1(0x25); memcpy(out, &mask, 4); out += 4; } } else if (bit_offset + bits <= 64) { align = min(8 - DIV_ROUND_UP(bit_offset + bits, 8), byte & 7); byte -= align; bit_offset += align * 8; BUG_ON(bit_offset + bits > 64); /* mov rax, [rsi + imm8] */ I4(0x48, 0x8b, 0x46, byte); shl = 64 - bit_offset - bits; shr = bit_offset + shl; if (shl) { /* shl rax, imm8 */ I4(0x48, 0xc1, 0xe0, shl); } if (shr) { /* shr rax, imm8 */ I4(0x48, 0xc1, 0xe8, shr); } } else { align = min(4 - DIV_ROUND_UP(bit_offset + bits, 8), byte & 3); byte -= align; bit_offset += align * 8; BUG_ON(bit_offset + bits > 96); /* mov rax, [rsi + byte] */ I4(0x48, 0x8b, 0x46, byte); /* mov edx, [rsi + byte + 8] */ I3(0x8b, 0x56, byte + 8); /* bits from next word: */ shr = bit_offset + bits - 64; BUG_ON(shr > bit_offset); /* shr rax, bit_offset */ I4(0x48, 0xc1, 0xe8, shr); /* shl rdx, imm8 */ I4(0x48, 0xc1, 0xe2, 64 - shr); /* or rax, rdx */ I3(0x48, 0x09, 0xd0); shr = bit_offset - shr; if (shr) { /* shr rax, imm8 */ I4(0x48, 0xc1, 0xe8, shr); } } /* rax += offset: */ if (offset > S32_MAX) { /* mov rdx, imm64 */ I2(0x48, 0xba); memcpy(out, &offset, 8); out += 8; /* add %rdx, %rax */ I3(0x48, 0x01, 0xd0); } else if (offset + (~0ULL >> (64 - bits)) > U32_MAX) { /* add rax, imm32 */ I2(0x48, 0x05); memcpy(out, &offset, 4); out += 4; } 
else if (offset) { /* add eax, imm32 */ I1(0x05); memcpy(out, &offset, 4); out += 4; } set_field: switch (dst_size) { case 8: /* mov [rdi + dst_offset], rax */ I4(0x48, 0x89, 0x47, dst_offset); break; case 4: /* mov [rdi + dst_offset], eax */ I3(0x89, 0x47, dst_offset); break; default: BUG(); } return out; } int bch2_compile_bkey_format(const struct bkey_format *format, void *_out) { bool eax_zeroed = false; u8 *out = _out; /* * rdi: dst - unpacked key * rsi: src - packed key */ /* k->u64s, k->format, k->type */ /* mov eax, [rsi] */ I2(0x8b, 0x06); /* add eax, BKEY_U64s - format->key_u64s */ I5(0x05, BKEY_U64s - format->key_u64s, KEY_FORMAT_CURRENT, 0, 0); /* and eax, imm32: mask out k->pad: */ I5(0x25, 0xff, 0xff, 0xff, 0); /* mov [rdi], eax */ I2(0x89, 0x07); #define x(id, field) \ out = compile_bkey_field(format, out, id, \ offsetof(struct bkey, field), \ sizeof(((struct bkey *) NULL)->field), \ &eax_zeroed); bkey_fields() #undef x /* retq */ I1(0xc3); return (void *) out - _out; } #else #endif __pure int __bch2_bkey_cmp_packed_format_checked(const struct bkey_packed *l, const struct bkey_packed *r, const struct btree *b) { return __bch2_bkey_cmp_packed_format_checked_inlined(l, r, b); } __pure __flatten int __bch2_bkey_cmp_left_packed_format_checked(const struct btree *b, const struct bkey_packed *l, const struct bpos *r) { return bpos_cmp(bkey_unpack_pos_format_checked(b, l), *r); } __pure __flatten int bch2_bkey_cmp_packed(const struct btree *b, const struct bkey_packed *l, const struct bkey_packed *r) { return bch2_bkey_cmp_packed_inlined(b, l, r); } __pure __flatten int __bch2_bkey_cmp_left_packed(const struct btree *b, const struct bkey_packed *l, const struct bpos *r) { const struct bkey *l_unpacked; return unlikely(l_unpacked = packed_to_bkey_c(l)) ? bpos_cmp(l_unpacked->p, *r) : __bch2_bkey_cmp_left_packed_format_checked(b, l, r); } void bch2_bpos_swab(struct bpos *p) { u8 *l = (u8 *) p; u8 *h = ((u8 *) &p[1]) - 1; while (l < h) { swap(*l, *h); l++; --h; } } void bch2_bkey_swab_key(const struct bkey_format *_f, struct bkey_packed *k) { const struct bkey_format *f = bkey_packed(k) ? _f : &bch2_bkey_format_current; u8 *l = k->key_start; u8 *h = (u8 *) ((u64 *) k->_data + f->key_u64s) - 1; while (l < h) { swap(*l, *h); l++; --h; } } #ifdef CONFIG_BCACHEFS_DEBUG void bch2_bkey_pack_test(void) { struct bkey t = KEY(4134ULL, 1250629070527416633ULL, 0); struct bkey_packed p; struct bkey_format test_format = { .key_u64s = 3, .nr_fields = BKEY_NR_FIELDS, .bits_per_field = { 13, 64, 32, }, }; struct unpack_state in_s = unpack_state_init(&bch2_bkey_format_current, (void *) &t); struct pack_state out_s = pack_state_init(&test_format, &p); unsigned i; for (i = 0; i < out_s.format->nr_fields; i++) { u64 a, v = get_inc_field(&in_s, i); switch (i) { #define x(id, field) case id: a = t.field; break; bkey_fields() #undef x default: BUG(); } if (a != v) panic("got %llu actual %llu i %u\n", v, a, i); if (!set_inc_field(&out_s, i, v)) panic("failed at %u\n", i); } BUG_ON(!bch2_bkey_pack_key(&p, &t, &test_format)); } #endif
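/*
 * Field-encoding sketch (illustrative, not from the original file): a
 * plain-C model of the per-field transform that set_inc_field() and
 * get_inc_field() perform, under the simplifying assumption that the
 * field fits in a single u64 (the real code streams bits across words).
 * Each field is stored as (value - field_offset) in bits_per_field
 * bits, so a node's format only needs enough bits to span the range of
 * values actually present in that node.
 */
static inline bool field_encode_sketch(u64 v, u64 offset, unsigned bits,
				       u64 *packed)
{
	if (v < offset)
		return false;		/* below the format's base offset */
	v -= offset;
	if (bits < 64 && (v >> bits))
		return false;		/* needs more bits than the format has */
	*packed = v;
	return true;
}

static inline u64 field_decode_sketch(u64 packed, u64 offset)
{
	return packed + offset;		/* inverse of the encode step */
}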
/*
SPDX-License-Identifier: GPL-2.0 */ #ifndef _SCSI_SCSI_HOST_H #define _SCSI_SCSI_HOST_H #include <linux/device.h> #include <linux/list.h> #include <linux/types.h> #include <linux/workqueue.h> #include <linux/mutex.h> #include <linux/seq_file.h> #include <linux/blk-mq.h> #include <scsi/scsi.h> struct block_device; struct completion; struct module; struct scsi_cmnd; struct scsi_device; struct scsi_target; struct Scsi_Host; struct scsi_transport_template; #define SG_ALL SG_CHUNK_SIZE #define MODE_UNKNOWN 0x00 #define MODE_INITIATOR 0x01 #define MODE_TARGET 0x02 /** * enum scsi_timeout_action - How to handle a command that timed out. * @SCSI_EH_DONE: The command has already been completed. * @SCSI_EH_RESET_TIMER: Reset the timer and continue waiting for completion. * @SCSI_EH_NOT_HANDLED: The command has not yet finished. Abort the command. */ enum scsi_timeout_action { SCSI_EH_DONE, SCSI_EH_RESET_TIMER, SCSI_EH_NOT_HANDLED, }; struct scsi_host_template { /* * Put fields referenced in IO submission path together in * same cacheline */ /* * Additional per-command data allocated for the driver. */ unsigned int cmd_size; /* * The queuecommand function is used to queue up a scsi * command block to the LLDD. When the driver has finished * processing the command, the done callback is invoked. * * If queuecommand returns 0, then the driver has accepted the * command. It must also push it to the HBA if the scsi_cmnd * flag SCMD_LAST is set, or if the driver does not implement * commit_rqs. The done() function must be called on the command * when the driver has finished with it. (you may call done on the * command before queuecommand returns, but in this case you * *must* return 0 from queuecommand). * * Queuecommand may also reject the command, in which case it may * not touch the command and must not call done() for it. * * There are two possible rejection returns: * * SCSI_MLQUEUE_DEVICE_BUSY: Block this device temporarily, but * allow commands to other devices serviced by this host. * * SCSI_MLQUEUE_HOST_BUSY: Block all devices served by this * host temporarily. * * For compatibility, any other non-zero return is treated the * same as SCSI_MLQUEUE_HOST_BUSY. * * NOTE: "temporarily" means either until the next command for * this device/host completes, or a period of time determined by * I/O pressure in the system if there are no other outstanding * commands. * * STATUS: REQUIRED */ int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *); /* * The commit_rqs function is used to trigger a hardware * doorbell after some requests have been queued with * queuecommand, when an error is encountered before sending * the request with SCMD_LAST set. * * STATUS: OPTIONAL */ void (*commit_rqs)(struct Scsi_Host *, u16); struct module *module; const char *name; /* * The info function will return whatever useful information the * developer sees fit. If not provided, then the name field will * be used instead. * * Status: OPTIONAL */ const char *(*info)(struct Scsi_Host *); /* * Ioctl interface * * Status: OPTIONAL */ int (*ioctl)(struct scsi_device *dev, unsigned int cmd, void __user *arg); #ifdef CONFIG_COMPAT /* * Compat handler. Handle 32bit ABI. * When unknown ioctl is passed return -ENOIOCTLCMD. * * Status: OPTIONAL */ int (*compat_ioctl)(struct scsi_device *dev, unsigned int cmd, void __user *arg); #endif int (*init_cmd_priv)(struct Scsi_Host *shost, struct scsi_cmnd *cmd); int (*exit_cmd_priv)(struct Scsi_Host *shost, struct scsi_cmnd *cmd); /* * This is an error handling strategy routine.
You don't need to * define one of these if you don't want to - there is a default * routine that is present that should work in most cases. For those * driver authors that have the inclination and ability to write their * own strategy routine, this is where it is specified. Note - the * strategy routine is *ALWAYS* run in the context of the kernel eh * thread. Thus you are guaranteed to *NOT* be in an interrupt * handler when you execute this, and you are also guaranteed to * *NOT* have any other commands being queued while you are in the * strategy routine. When you return from this function, operations * return to normal. * * See scsi_error.c scsi_unjam_host for additional comments about * what this function should and should not be attempting to do. * * Status: REQUIRED (at least one of them) */ int (* eh_abort_handler)(struct scsi_cmnd *); int (* eh_device_reset_handler)(struct scsi_cmnd *); int (* eh_target_reset_handler)(struct scsi_cmnd *); int (* eh_bus_reset_handler)(struct scsi_cmnd *); int (* eh_host_reset_handler)(struct scsi_cmnd *); /* * Before the mid layer attempts to scan for a new device where none * currently exists, it will call this entry in your driver. Should * your driver need to allocate any structs or perform any other init * items in order to send commands to a currently unused target/lun * combo, then this is where you can perform those allocations. This * is specifically so that drivers won't have to perform any kind of * "is this a new device" checks in their queuecommand routine, * thereby making the hot path a bit quicker. * * Return values: 0 on success, non-0 on failure * * Deallocation: If we didn't find any devices at this ID, you will * get an immediate call to sdev_destroy(). If we find something * here then you will get a call to sdev_configure(), then the * device will be used for however long it is kept around, then when * the device is removed from the system (or * possibly at reboot * time), you will then get a call to sdev_destroy(). This is * assuming you implement sdev_configure and sdev_destroy. * However, if you allocate memory and hang it off the device struct, * then you must implement the sdev_destroy() routine at a minimum * in order to avoid leaking memory * each time a device is torn down. * * Status: OPTIONAL */ int (* sdev_init)(struct scsi_device *); /* * Once the device has responded to an INQUIRY and we know the * device is online, we call into the low level driver with the * struct scsi_device *. If the low level device driver implements * this function, it *must* perform the task of setting the queue * depth on the device. All other tasks are optional and depend * on what the driver supports and various implementation details. * * Things currently recommended to be handled at this time include: * * 1. Setting the device queue depth. Proper setting of this is * described in the comments for scsi_change_queue_depth. * 2. Determining if the device supports the various synchronous * negotiation protocols. The device struct will already have * responded to INQUIRY and the results of the standard items * will have been shoved into the various device flag bits, eg. * device->sdtr will be true if the device supports SDTR messages. * 3. Allocating command structs that the device will need. * 4. Setting the default timeout on this device (if needed). * 5. Anything else the low level driver might want to do on a device * specific setup basis... * 6. Return 0 on success, non-0 on error.
The device will be marked * as offline on error so that no access will occur. If you return * non-0, your sdev_destroy routine will never get called for this * device, so don't leave any loose memory hanging around, clean * up after yourself before returning non-0 * * Status: OPTIONAL */ int (* sdev_configure)(struct scsi_device *, struct queue_limits *lim); /* * Immediately prior to deallocating the device and after all activity * has ceased the mid layer calls this point so that the low level * driver may completely detach itself from the scsi device and vice * versa. The low level driver is responsible for freeing any memory * it allocated in the sdev_init or sdev_configure calls. * * Status: OPTIONAL */ void (* sdev_destroy)(struct scsi_device *); /* * Before the mid layer attempts to scan for a new device attached * to a target where no target currently exists, it will call this * entry in your driver. Should your driver need to allocate any * structs or perform any other init items in order to send commands * to a currently unused target, then this is where you can perform * those allocations. * * Return values: 0 on success, non-0 on failure * * Status: OPTIONAL */ int (* target_alloc)(struct scsi_target *); /* * Immediately prior to deallocating the target structure, and * after all activity to attached scsi devices has ceased, the * midlayer calls this point so that the driver may deallocate * and terminate any references to the target. * * Note: This callback is called with the host lock held and hence * must not sleep. * * Status: OPTIONAL */ void (* target_destroy)(struct scsi_target *); /* * If a host has the ability to discover targets on its own instead * of scanning the entire bus, it can fill in this function and * call scsi_scan_host(). This function will be called periodically * until it returns 1 with the scsi_host and the elapsed time of * the scan in jiffies. * * Status: OPTIONAL */ int (* scan_finished)(struct Scsi_Host *, unsigned long); /* * If the host wants to be called before the scan starts, but * after the midlayer has set up ready for the scan, it can fill * in this function. * * Status: OPTIONAL */ void (* scan_start)(struct Scsi_Host *); /* * Fill in this function to allow the queue depth of this host * to be changeable (on a per device basis). Returns either * the current queue depth setting (may be different from what * was passed in) or an error. An error should only be * returned if the requested depth is legal but the driver was * unable to set it. If the requested depth is illegal, the * driver should set and return the closest legal queue depth. * * Status: OPTIONAL */ int (* change_queue_depth)(struct scsi_device *, int); /* * This function lets the driver expose the queue mapping * to the block layer. * * Status: OPTIONAL */ void (* map_queues)(struct Scsi_Host *shost); /* * SCSI interface of blk_poll - poll for IO completions. * Only applicable if SCSI LLD exposes multiple h/w queues. * * Return value: Number of completed entries found. * * Status: OPTIONAL */ int (* mq_poll)(struct Scsi_Host *shost, unsigned int queue_num); /* * Check if scatterlists need to be padded for DMA draining. * * Status: OPTIONAL */ bool (* dma_need_drain)(struct request *rq); /* * This function determines the BIOS parameters for a given * hard disk. These tend to be numbers that are made up by * the host adapter.
Parameters: * size, device, list (heads, sectors, cylinders) * * Status: OPTIONAL */ int (* bios_param)(struct scsi_device *, struct block_device *, sector_t, int []); /* * This function is called when one or more partitions on the * device reach beyond the end of the device. * * Status: OPTIONAL */ void (*unlock_native_capacity)(struct scsi_device *); /* * Can be used to export driver statistics and other info to the * world outside the kernel, i.e. userspace, and it also provides an * interface to feed the driver with information. * * Status: OBSOLETE */ int (*show_info)(struct seq_file *, struct Scsi_Host *); int (*write_info)(struct Scsi_Host *, char *, int); /* * This is an optional routine that allows the transport to become * involved when a scsi io timer fires. The return value tells the * timer routine how to finish the io timeout handling. * * Status: OPTIONAL */ enum scsi_timeout_action (*eh_timed_out)(struct scsi_cmnd *); /* * Optional routine that allows the transport to decide if a cmd * is retryable. Return true if the transport is in a state the * cmd should be retried on. */ bool (*eh_should_retry_cmd)(struct scsi_cmnd *scmd); /* This is an optional routine that allows transport to initiate * LLD adapter or firmware reset using sysfs attribute. * * Return values: 0 on success, -ve value on failure. * * Status: OPTIONAL */ int (*host_reset)(struct Scsi_Host *shost, int reset_type); #define SCSI_ADAPTER_RESET 1 #define SCSI_FIRMWARE_RESET 2 /* * Name of proc directory */ const char *proc_name; /* * This determines if we will use a non-interrupt driven * or an interrupt driven scheme. It is set to the maximum number * of simultaneous commands a single hw queue in HBA will accept. */ int can_queue; /* * In many instances, especially where disconnect / reconnect are * supported, our host also has an ID on the SCSI bus. If this is * the case, then it must be reserved. Please set this_id to -1 if * your setup is in single initiator mode, and the host lacks an * ID. */ int this_id; /* * This determines the degree to which the host adapter is capable * of scatter-gather. */ unsigned short sg_tablesize; unsigned short sg_prot_tablesize; /* * Set this if the host adapter has limitations beside segment count. */ unsigned int max_sectors; /* * Maximum size in bytes of a single segment. */ unsigned int max_segment_size; unsigned int dma_alignment; /* * DMA scatter gather segment boundary limit. A segment crossing this * boundary will be split in two. */ unsigned long dma_boundary; unsigned long virt_boundary_mask; /* * This specifies "machine infinity" for host templates which don't * limit the transfer size. Note this limit represents an absolute * maximum, and may be over the transfer limits allowed for * individual devices (e.g. 256 for SCSI-1). */ #define SCSI_DEFAULT_MAX_SECTORS 1024 /* * True if this host adapter can make good use of linked commands. * This will allow more than one command to be queued to a given * unit on a given host. Set this to the maximum number of command * blocks to be provided for each device. Set this to 1 for one * command block per lun, 2 for two, etc. Do not set this to 0. * You should make sure that the host adapter will do the right thing * before you try setting this above 1. */ short cmd_per_lun; /* * Allocate tags starting from last allocated tag. */ bool tag_alloc_policy_rr : 1; /* * Track QUEUE_FULL events and reduce queue depth on demand. */ unsigned track_queue_depth:1; /* * This specifies the mode that an LLD supports.
*/ unsigned supported_mode:2; /* * True for emulated SCSI host adapters (e.g. ATAPI). */ unsigned emulated:1; /* * True if the low-level driver performs its own reset-settle delays. */ unsigned skip_settle_delay:1; /* True if the controller does not support WRITE SAME */ unsigned no_write_same:1; /* True if the host uses host-wide tagspace */ unsigned host_tagset:1; /* The queuecommand callback may block. See also BLK_MQ_F_BLOCKING. */ unsigned queuecommand_may_block:1; /* * Countdown for host blocking with no commands outstanding. */ unsigned int max_host_blocked; /* * Default value for the blocking. If the queue is empty, * host_blocked counts down in the request_fn until it restarts * host operations as zero is reached. * * FIXME: This should probably be a value in the template */ #define SCSI_DEFAULT_HOST_BLOCKED 7 /* * Pointer to the SCSI host sysfs attribute groups, NULL terminated. */ const struct attribute_group **shost_groups; /* * Pointer to the SCSI device attribute groups for this host, * NULL terminated. */ const struct attribute_group **sdev_groups; /* * Vendor Identifier associated with the host * * Note: When specifying vendor_id, be sure to read the * Vendor Type and ID formatting requirements specified in * scsi_netlink.h */ u64 vendor_id; }; /* * Temporary #define for host lock push down. Can be removed when all * drivers have been updated to take advantage of unlocked * queuecommand. * */ #define DEF_SCSI_QCMD(func_name) \ int func_name(struct Scsi_Host *shost, struct scsi_cmnd *cmd) \ { \ unsigned long irq_flags; \ int rc; \ spin_lock_irqsave(shost->host_lock, irq_flags); \ rc = func_name##_lck(cmd); \ spin_unlock_irqrestore(shost->host_lock, irq_flags); \ return rc; \ } /* * shost state: If you alter this, you also need to alter scsi_sysfs.c * (for the ascii descriptions) and the state model enforcer: * scsi_host_set_state() */ enum scsi_host_state { SHOST_CREATED = 1, SHOST_RUNNING, SHOST_CANCEL, SHOST_DEL, SHOST_RECOVERY, SHOST_CANCEL_RECOVERY, SHOST_DEL_RECOVERY, }; struct Scsi_Host { /* * __devices is protected by the host_lock, but you should * usually use scsi_device_lookup / shost_for_each_device * to access it and don't care about locking yourself. * In the rare case of being in irq context you can use * their __ prefixed variants with the lock held. NEVER * access this list directly from a driver. */ struct list_head __devices; struct list_head __targets; struct list_head starved_list; spinlock_t default_lock; spinlock_t *host_lock; struct mutex scan_mutex;/* serialize scanning activity */ struct list_head eh_abort_list; struct list_head eh_cmd_q; struct task_struct * ehandler; /* Error recovery thread. */ struct completion * eh_action; /* Wait for specific actions on the host. */ wait_queue_head_t host_wait; const struct scsi_host_template *hostt; struct scsi_transport_template *transportt; struct kref tagset_refcnt; struct completion tagset_freed; /* Area to keep a shared tag map */ struct blk_mq_tag_set tag_set; atomic_t host_blocked; unsigned int host_failed; /* commands that failed. protected by host_lock */ unsigned int host_eh_scheduled; /* EH scheduled without command */ unsigned int host_no; /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */ /* next two fields are used to bound the time spent in error handling */ int eh_deadline; unsigned long last_reset; /* * These three parameters can be used to allow for wide scsi, * and for host adapters that support multiple busses. * The last two should be set to 1 more than the actual max id * or lun (e.g.
8 for SCSI parallel systems). */ unsigned int max_channel; unsigned int max_id; u64 max_lun; /* * This is a unique identifier that must be assigned so that we * have some way of identifying each detected host adapter properly * and uniquely. For hosts that do not support more than one card * in the system at one time, this does not need to be set. It is * initialized to 0 in scsi_host_alloc. */ unsigned int unique_id; /* * The maximum length of SCSI commands that this host can accept. * Probably 12 for most host adapters, but could be 16 for others, * or 260 if the driver supports variable length cdbs. * For drivers that don't set this field, a value of 12 is * assumed. */ unsigned short max_cmd_len; int this_id; int can_queue; short cmd_per_lun; short unsigned int sg_tablesize; short unsigned int sg_prot_tablesize; unsigned int max_sectors; unsigned int opt_sectors; unsigned int max_segment_size; unsigned int dma_alignment; unsigned long dma_boundary; unsigned long virt_boundary_mask; /* * In scsi-mq mode, the number of hardware queues supported by the LLD. * * Note: it is assumed that each hardware queue has a queue depth of * can_queue. In other words, the total queue depth per host * is nr_hw_queues * can_queue. However, when host_tagset is set, * the total queue depth is can_queue. */ unsigned nr_hw_queues; unsigned nr_maps; unsigned active_mode:2; /* * Host has requested that no further requests come through for the * time being. */ unsigned host_self_blocked:1; /* * Host uses correct SCSI ordering not PC ordering. The bit is * set for the minority of drivers whose authors actually read * the spec ;). */ unsigned reverse_ordering:1; /* Task mgmt function in progress */ unsigned tmf_in_progress:1; /* Asynchronous scan in progress */ unsigned async_scan:1; /* Don't resume host in EH */ unsigned eh_noresume:1; /* The controller does not support WRITE SAME */ unsigned no_write_same:1; /* True if the host uses host-wide tagspace */ unsigned host_tagset:1; /* The queuecommand callback may block. See also BLK_MQ_F_BLOCKING. */ unsigned queuecommand_may_block:1; /* Host responded with short (<36 bytes) INQUIRY result */ unsigned short_inquiry:1; /* The transport requires the LUN bits NOT to be stored in CDB[1] */ unsigned no_scsi2_lun_in_cdb:1; /* * Optional work queue to be utilized by the transport */ struct workqueue_struct *work_q; /* * Task management function work queue */ struct workqueue_struct *tmf_work_q; /* * Value host_blocked counts down from */ unsigned int max_host_blocked; /* Protection Information */ unsigned int prot_capabilities; unsigned char prot_guard_type; /* legacy crap */ unsigned long base; unsigned long io_port; unsigned char n_io_port; unsigned char dma_channel; unsigned int irq; enum scsi_host_state shost_state; /* ldm bits */ struct device shost_gendev, shost_dev; /* * Points to the transport data (if any) which is allocated * separately */ void *shost_data; /* * Points to the physical bus device we'd use to do DMA * Needed just in case we have virtual hosts. */ struct device *dma_dev; /* Delay for runtime autosuspend */ int rpm_autosuspend_delay; /* * We should ensure that this is aligned, both for better performance * and also because some compilers (m68k) don't automatically force * alignment to a long boundary. */ unsigned long hostdata[] /* Used for storage of host specific stuff */ __attribute__ ((aligned (sizeof(unsigned long)))); }; #define class_to_shost(d) \ container_of(d, struct Scsi_Host, shost_dev) #define shost_printk(prefix, shost, fmt, a...)
\ dev_printk(prefix, &(shost)->shost_gendev, fmt, ##a) static inline void *shost_priv(struct Scsi_Host *shost) { return (void *)shost->hostdata; } int scsi_is_host_device(const struct device *); static inline struct Scsi_Host *dev_to_shost(struct device *dev) { while (!scsi_is_host_device(dev)) { if (!dev->parent) return NULL; dev = dev->parent; } return container_of(dev, struct Scsi_Host, shost_gendev); } static inline int scsi_host_in_recovery(struct Scsi_Host *shost) { return shost->shost_state == SHOST_RECOVERY || shost->shost_state == SHOST_CANCEL_RECOVERY || shost->shost_state == SHOST_DEL_RECOVERY || shost->tmf_in_progress; } extern int scsi_queue_work(struct Scsi_Host *, struct work_struct *); extern void scsi_flush_work(struct Scsi_Host *); extern struct Scsi_Host *scsi_host_alloc(const struct scsi_host_template *, int); extern int __must_check scsi_add_host_with_dma(struct Scsi_Host *, struct device *, struct device *); #if defined(CONFIG_SCSI_PROC_FS) struct proc_dir_entry * scsi_template_proc_dir(const struct scsi_host_template *sht); #else #define scsi_template_proc_dir(sht) NULL #endif extern void scsi_scan_host(struct Scsi_Host *); extern int scsi_resume_device(struct scsi_device *sdev); extern int scsi_rescan_device(struct scsi_device *sdev); extern void scsi_remove_host(struct Scsi_Host *); extern struct Scsi_Host *scsi_host_get(struct Scsi_Host *); extern int scsi_host_busy(struct Scsi_Host *shost); extern void scsi_host_put(struct Scsi_Host *t); extern struct Scsi_Host *scsi_host_lookup(unsigned int hostnum); extern const char *scsi_host_state_name(enum scsi_host_state); extern void scsi_host_complete_all_commands(struct Scsi_Host *shost, enum scsi_host_status status); static inline int __must_check scsi_add_host(struct Scsi_Host *host, struct device *dev) { return scsi_add_host_with_dma(host, dev, dev); } static inline struct device *scsi_get_device(struct Scsi_Host *shost) { return shost->shost_gendev.parent; } /** * scsi_host_scan_allowed - Is scanning of this host allowed * @shost: Pointer to Scsi_Host. **/ static inline int scsi_host_scan_allowed(struct Scsi_Host *shost) { return shost->shost_state == SHOST_RUNNING || shost->shost_state == SHOST_RECOVERY; } extern void scsi_unblock_requests(struct Scsi_Host *); extern void scsi_block_requests(struct Scsi_Host *); extern int scsi_host_block(struct Scsi_Host *shost); extern int scsi_host_unblock(struct Scsi_Host *shost, int new_state); void scsi_host_busy_iter(struct Scsi_Host *, bool (*fn)(struct scsi_cmnd *, void *), void *priv); struct class_container; /* * DIF defines the exchange of protection information between * initiator and SBC block device. * * DIX defines the exchange of protection information between OS and * initiator. */ enum scsi_host_prot_capabilities { SHOST_DIF_TYPE1_PROTECTION = 1 << 0, /* T10 DIF Type 1 */ SHOST_DIF_TYPE2_PROTECTION = 1 << 1, /* T10 DIF Type 2 */ SHOST_DIF_TYPE3_PROTECTION = 1 << 2, /* T10 DIF Type 3 */ SHOST_DIX_TYPE0_PROTECTION = 1 << 3, /* DIX between OS and HBA only */ SHOST_DIX_TYPE1_PROTECTION = 1 << 4, /* DIX with DIF Type 1 */ SHOST_DIX_TYPE2_PROTECTION = 1 << 5, /* DIX with DIF Type 2 */ SHOST_DIX_TYPE3_PROTECTION = 1 << 6, /* DIX with DIF Type 3 */ }; /* * SCSI hosts which support the Data Integrity Extensions must * indicate their capabilities by setting the prot_capabilities using * this call. 
*/ static inline void scsi_host_set_prot(struct Scsi_Host *shost, unsigned int mask) { shost->prot_capabilities = mask; } static inline unsigned int scsi_host_get_prot(struct Scsi_Host *shost) { return shost->prot_capabilities; } static inline int scsi_host_prot_dma(struct Scsi_Host *shost) { return shost->prot_capabilities >= SHOST_DIX_TYPE0_PROTECTION; } static inline unsigned int scsi_host_dif_capable(struct Scsi_Host *shost, unsigned int target_type) { static unsigned char cap[] = { 0, SHOST_DIF_TYPE1_PROTECTION, SHOST_DIF_TYPE2_PROTECTION, SHOST_DIF_TYPE3_PROTECTION }; if (target_type >= ARRAY_SIZE(cap)) return 0; return shost->prot_capabilities & cap[target_type] ? target_type : 0; } static inline unsigned int scsi_host_dix_capable(struct Scsi_Host *shost, unsigned int target_type) { #if defined(CONFIG_BLK_DEV_INTEGRITY) static unsigned char cap[] = { SHOST_DIX_TYPE0_PROTECTION, SHOST_DIX_TYPE1_PROTECTION, SHOST_DIX_TYPE2_PROTECTION, SHOST_DIX_TYPE3_PROTECTION }; if (target_type >= ARRAY_SIZE(cap)) return 0; return shost->prot_capabilities & cap[target_type]; #endif return 0; } /* * All DIX-capable initiators must support the T10-mandated CRC * checksum. Controllers can optionally implement the IP checksum * scheme which has much lower impact on system performance. Note * that the main rationale for the checksum is to match integrity * metadata with data. Detecting bit errors is a job for ECC memory * and buses. */ enum scsi_host_guard_type { SHOST_DIX_GUARD_CRC = 1 << 0, SHOST_DIX_GUARD_IP = 1 << 1, }; static inline void scsi_host_set_guard(struct Scsi_Host *shost, unsigned char type) { shost->prot_guard_type = type; } static inline unsigned char scsi_host_get_guard(struct Scsi_Host *shost) { return shost->prot_guard_type; } extern int scsi_host_set_state(struct Scsi_Host *, enum scsi_host_state); #endif /* _SCSI_SCSI_HOST_H */
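/*
 * Minimal host-template sketch (illustrative, assuming kernel context;
 * all example_* names are hypothetical, not part of the header above).
 * DEF_SCSI_QCMD supplies the host-lock wrapper around the _lck variant,
 * as defined earlier in this header.
 */
static int example_queuecommand_lck(struct scsi_cmnd *cmd)
{
	/* hand @cmd to the hardware; complete it via scsi_done(cmd) */
	return 0;
}

static DEF_SCSI_QCMD(example_queuecommand)

static const struct scsi_host_template example_sht = {
	.module		= THIS_MODULE,
	.name		= "example",
	.proc_name	= "example",
	.queuecommand	= example_queuecommand,
	.this_id	= -1,	/* no host ID reserved on the bus */
	.can_queue	= 1,
	.cmd_per_lun	= 1,
	.sg_tablesize	= SG_ALL,
};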
// SPDX-License-Identifier: GPL-2.0-or-later /* * * Bluetooth HCI UART driver * * Copyright (C) 2002-2003 Fabrizio Gennari <fabrizio.gennari@philips.com> * Copyright (C) 2004-2005 Marcel Holtmann <marcel@holtmann.org> */ #include <linux/module.h> #include <linux/kernel.h> #include <linux/init.h> #include <linux/types.h> #include <linux/fcntl.h> #include <linux/interrupt.h> #include <linux/ptrace.h> #include <linux/poll.h> #include <linux/slab.h> #include <linux/tty.h>
#include <linux/errno.h> #include <linux/string.h> #include <linux/signal.h> #include <linux/ioctl.h> #include <linux/skbuff.h> #include <linux/bitrev.h> #include <linux/unaligned.h> #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> #include "hci_uart.h" static bool txcrc = true; static bool hciextn = true; #define BCSP_TXWINSIZE 4 #define BCSP_ACK_PKT 0x05 #define BCSP_LE_PKT 0x06 struct bcsp_struct { struct sk_buff_head unack; /* Unack'ed packets queue */ struct sk_buff_head rel; /* Reliable packets queue */ struct sk_buff_head unrel; /* Unreliable packets queue */ unsigned long rx_count; struct sk_buff *rx_skb; u8 rxseq_txack; /* rxseq == txack. */ u8 rxack; /* Last packet sent by us that the peer ack'ed */ struct timer_list tbcsp; struct hci_uart *hu; enum { BCSP_W4_PKT_DELIMITER, BCSP_W4_PKT_START, BCSP_W4_BCSP_HDR, BCSP_W4_DATA, BCSP_W4_CRC } rx_state; enum { BCSP_ESCSTATE_NOESC, BCSP_ESCSTATE_ESC } rx_esc_state; u8 use_crc; u16 message_crc; u8 txack_req; /* Do we need to send ack's to the peer? */ /* Reliable packet sequence number - used to assign seq to each rel pkt. */ u8 msgq_txseq; }; /* ---- BCSP CRC calculation ---- */ /* Table for calculating CRC for polynomial 0x1021, LSB processed first, * initial value 0xffff, bits shifted in reverse order. */ static const u16 crc_table[] = { 0x0000, 0x1081, 0x2102, 0x3183, 0x4204, 0x5285, 0x6306, 0x7387, 0x8408, 0x9489, 0xa50a, 0xb58b, 0xc60c, 0xd68d, 0xe70e, 0xf78f }; /* Initialise the crc calculator */ #define BCSP_CRC_INIT(x) x = 0xffff /* Update crc with next data byte * * Implementation note * The data byte is treated as two nibbles. The crc is generated * in reverse, i.e., bits are fed into the register from the top. */ static void bcsp_crc_update(u16 *crc, u8 d) { u16 reg = *crc; reg = (reg >> 4) ^ crc_table[(reg ^ d) & 0x000f]; reg = (reg >> 4) ^ crc_table[(reg ^ (d >> 4)) & 0x000f]; *crc = reg; } /* ---- BCSP core ---- */ static void bcsp_slip_msgdelim(struct sk_buff *skb) { const char pkt_delim = 0xc0; skb_put_data(skb, &pkt_delim, 1); } static void bcsp_slip_one_byte(struct sk_buff *skb, u8 c) { const char esc_c0[2] = { 0xdb, 0xdc }; const char esc_db[2] = { 0xdb, 0xdd }; switch (c) { case 0xc0: skb_put_data(skb, &esc_c0, 2); break; case 0xdb: skb_put_data(skb, &esc_db, 2); break; default: skb_put_data(skb, &c, 1); } } static int bcsp_enqueue(struct hci_uart *hu, struct sk_buff *skb) { struct bcsp_struct *bcsp = hu->priv; if (skb->len > 0xFFF) { BT_ERR("Packet too long"); kfree_skb(skb); return 0; } switch (hci_skb_pkt_type(skb)) { case HCI_ACLDATA_PKT: case HCI_COMMAND_PKT: skb_queue_tail(&bcsp->rel, skb); break; case HCI_SCODATA_PKT: skb_queue_tail(&bcsp->unrel, skb); break; default: BT_ERR("Unknown packet type"); kfree_skb(skb); break; } return 0; } static struct sk_buff *bcsp_prepare_pkt(struct bcsp_struct *bcsp, u8 *data, int len, int pkt_type) { struct sk_buff *nskb; u8 hdr[4], chan; u16 BCSP_CRC_INIT(bcsp_txmsg_crc); int rel, i; switch (pkt_type) { case HCI_ACLDATA_PKT: chan = 6; /* BCSP ACL channel */ rel = 1; /* reliable channel */ break; case HCI_COMMAND_PKT: chan = 5; /* BCSP cmd/evt channel */ rel = 1; /* reliable channel */ break; case HCI_SCODATA_PKT: chan = 7; /* BCSP SCO channel */ rel = 0; /* unreliable channel */ break; case BCSP_LE_PKT: chan = 1; /* BCSP LE channel */ rel = 0; /* unreliable channel */ break; case BCSP_ACK_PKT: chan = 0; /* BCSP internal channel */ rel = 0; /* unreliable channel */ break; default: BT_ERR("Unknown packet type"); return NULL; } if (hciextn && chan == 5) { 
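/* With hciextn enabled, the block below tunnels vendor-specific HCI * commands (OGF 0x3f) whose first parameter byte carries 0xc0 in its * high nibble straight onto the BCSP channel named by the low nibble; * the HCI command header and that descriptor byte are stripped before * framing. */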
__le16 opcode = ((struct hci_command_hdr *)data)->opcode; /* Vendor specific commands */ if (hci_opcode_ogf(__le16_to_cpu(opcode)) == 0x3f) { u8 desc = *(data + HCI_COMMAND_HDR_SIZE); if ((desc & 0xf0) == 0xc0) { data += HCI_COMMAND_HDR_SIZE + 1; len -= HCI_COMMAND_HDR_SIZE + 1; chan = desc & 0x0f; } } } /* Max len of packet: (original len +4(bcsp hdr) +2(crc))*2 * (because bytes 0xc0 and 0xdb are escaped, worst case is * when the packet is all made of 0xc0 and 0xdb :) ) * + 2 (0xc0 delimiters at start and end). */ nskb = alloc_skb((len + 6) * 2 + 2, GFP_ATOMIC); if (!nskb) return NULL; hci_skb_pkt_type(nskb) = pkt_type; bcsp_slip_msgdelim(nskb); hdr[0] = bcsp->rxseq_txack << 3; bcsp->txack_req = 0; BT_DBG("We request packet no %u to card", bcsp->rxseq_txack); if (rel) { hdr[0] |= 0x80 + bcsp->msgq_txseq; BT_DBG("Sending packet with seqno %u", bcsp->msgq_txseq); bcsp->msgq_txseq = (bcsp->msgq_txseq + 1) & 0x07; } if (bcsp->use_crc) hdr[0] |= 0x40; hdr[1] = ((len << 4) & 0xff) | chan; hdr[2] = len >> 4; hdr[3] = ~(hdr[0] + hdr[1] + hdr[2]); /* Put BCSP header */ for (i = 0; i < 4; i++) { bcsp_slip_one_byte(nskb, hdr[i]); if (bcsp->use_crc) bcsp_crc_update(&bcsp_txmsg_crc, hdr[i]); } /* Put payload */ for (i = 0; i < len; i++) { bcsp_slip_one_byte(nskb, data[i]); if (bcsp->use_crc) bcsp_crc_update(&bcsp_txmsg_crc, data[i]); } /* Put CRC */ if (bcsp->use_crc) { bcsp_txmsg_crc = bitrev16(bcsp_txmsg_crc); bcsp_slip_one_byte(nskb, (u8)((bcsp_txmsg_crc >> 8) & 0x00ff)); bcsp_slip_one_byte(nskb, (u8)(bcsp_txmsg_crc & 0x00ff)); } bcsp_slip_msgdelim(nskb); return nskb; } /* This is a rewrite of pkt_avail in ABCSP */ static struct sk_buff *bcsp_dequeue(struct hci_uart *hu) { struct bcsp_struct *bcsp = hu->priv; unsigned long flags; struct sk_buff *skb; /* First of all, check for unreliable messages in the queue, * since they have priority */ skb = skb_dequeue(&bcsp->unrel); if (skb != NULL) { struct sk_buff *nskb; nskb = bcsp_prepare_pkt(bcsp, skb->data, skb->len, hci_skb_pkt_type(skb)); if (nskb) { kfree_skb(skb); return nskb; } else { skb_queue_head(&bcsp->unrel, skb); BT_ERR("Could not dequeue pkt because alloc_skb failed"); } } /* Now, try to send a reliable pkt. We can only send a * reliable packet if the number of packets sent but not yet ack'ed * is < than the winsize */ spin_lock_irqsave_nested(&bcsp->unack.lock, flags, SINGLE_DEPTH_NESTING); if (bcsp->unack.qlen < BCSP_TXWINSIZE) { skb = skb_dequeue(&bcsp->rel); if (skb != NULL) { struct sk_buff *nskb; nskb = bcsp_prepare_pkt(bcsp, skb->data, skb->len, hci_skb_pkt_type(skb)); if (nskb) { __skb_queue_tail(&bcsp->unack, skb); mod_timer(&bcsp->tbcsp, jiffies + HZ / 4); spin_unlock_irqrestore(&bcsp->unack.lock, flags); return nskb; } else { skb_queue_head(&bcsp->rel, skb); BT_ERR("Could not dequeue pkt because alloc_skb failed"); } } } spin_unlock_irqrestore(&bcsp->unack.lock, flags); /* We could not send a reliable packet, either because there are * none or because there are too many unack'ed pkts. Did we receive * any packets we have not acknowledged yet ? 
*/ if (bcsp->txack_req) { /* if so, craft an empty ACK pkt and send it on BCSP unreliable * channel 0 */ struct sk_buff *nskb = bcsp_prepare_pkt(bcsp, NULL, 0, BCSP_ACK_PKT); return nskb; } /* We have nothing to send */ return NULL; } static int bcsp_flush(struct hci_uart *hu) { BT_DBG("hu %p", hu); return 0; } /* Remove ack'ed packets */ static void bcsp_pkt_cull(struct bcsp_struct *bcsp) { struct sk_buff *skb, *tmp; unsigned long flags; int i, pkts_to_be_removed; u8 seqno; spin_lock_irqsave(&bcsp->unack.lock, flags); pkts_to_be_removed = skb_queue_len(&bcsp->unack); seqno = bcsp->msgq_txseq; while (pkts_to_be_removed) { if (bcsp->rxack == seqno) break; pkts_to_be_removed--; seqno = (seqno - 1) & 0x07; } if (bcsp->rxack != seqno) BT_ERR("Peer acked invalid packet"); BT_DBG("Removing %u pkts out of %u, up to seqno %u", pkts_to_be_removed, skb_queue_len(&bcsp->unack), (seqno - 1) & 0x07); i = 0; skb_queue_walk_safe(&bcsp->unack, skb, tmp) { if (i >= pkts_to_be_removed) break; i++; __skb_unlink(skb, &bcsp->unack); dev_kfree_skb_irq(skb); } if (skb_queue_empty(&bcsp->unack)) timer_delete(&bcsp->tbcsp); spin_unlock_irqrestore(&bcsp->unack.lock, flags); if (i != pkts_to_be_removed) BT_ERR("Removed only %u out of %u pkts", i, pkts_to_be_removed); } /* Handle BCSP link-establishment packets. When we * detect a "sync" packet, a symptom that the BT module has reset, * we do nothing :) (yet) */ static void bcsp_handle_le_pkt(struct hci_uart *hu) { struct bcsp_struct *bcsp = hu->priv; u8 conf_pkt[4] = { 0xad, 0xef, 0xac, 0xed }; u8 conf_rsp_pkt[4] = { 0xde, 0xad, 0xd0, 0xd0 }; u8 sync_pkt[4] = { 0xda, 0xdc, 0xed, 0xed }; /* spot "conf" pkts and reply with a "conf rsp" pkt */ if (bcsp->rx_skb->data[1] >> 4 == 4 && bcsp->rx_skb->data[2] == 0 && !memcmp(&bcsp->rx_skb->data[4], conf_pkt, 4)) { struct sk_buff *nskb = alloc_skb(4, GFP_ATOMIC); BT_DBG("Found a LE conf pkt"); if (!nskb) return; skb_put_data(nskb, conf_rsp_pkt, 4); hci_skb_pkt_type(nskb) = BCSP_LE_PKT; skb_queue_head(&bcsp->unrel, nskb); hci_uart_tx_wakeup(hu); } /* Spot "sync" pkts. If we find one...disaster!
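* A sync packet at this stage means the peer has rebooted and lost all * BCSP link state; for now the driver only logs the event.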
*/ else if (bcsp->rx_skb->data[1] >> 4 == 4 && bcsp->rx_skb->data[2] == 0 && !memcmp(&bcsp->rx_skb->data[4], sync_pkt, 4)) { BT_ERR("Found a LE sync pkt, card has reset"); } } static inline void bcsp_unslip_one_byte(struct bcsp_struct *bcsp, unsigned char byte) { const u8 c0 = 0xc0, db = 0xdb; switch (bcsp->rx_esc_state) { case BCSP_ESCSTATE_NOESC: switch (byte) { case 0xdb: bcsp->rx_esc_state = BCSP_ESCSTATE_ESC; break; default: skb_put_data(bcsp->rx_skb, &byte, 1); if ((bcsp->rx_skb->data[0] & 0x40) != 0 && bcsp->rx_state != BCSP_W4_CRC) bcsp_crc_update(&bcsp->message_crc, byte); bcsp->rx_count--; } break; case BCSP_ESCSTATE_ESC: switch (byte) { case 0xdc: skb_put_data(bcsp->rx_skb, &c0, 1); if ((bcsp->rx_skb->data[0] & 0x40) != 0 && bcsp->rx_state != BCSP_W4_CRC) bcsp_crc_update(&bcsp->message_crc, 0xc0); bcsp->rx_esc_state = BCSP_ESCSTATE_NOESC; bcsp->rx_count--; break; case 0xdd: skb_put_data(bcsp->rx_skb, &db, 1); if ((bcsp->rx_skb->data[0] & 0x40) != 0 && bcsp->rx_state != BCSP_W4_CRC) bcsp_crc_update(&bcsp->message_crc, 0xdb); bcsp->rx_esc_state = BCSP_ESCSTATE_NOESC; bcsp->rx_count--; break; default: BT_ERR("Invalid byte %02x after esc byte", byte); kfree_skb(bcsp->rx_skb); bcsp->rx_skb = NULL; bcsp->rx_state = BCSP_W4_PKT_DELIMITER; bcsp->rx_count = 0; } } } static void bcsp_complete_rx_pkt(struct hci_uart *hu) { struct bcsp_struct *bcsp = hu->priv; int pass_up = 0; if (bcsp->rx_skb->data[0] & 0x80) { /* reliable pkt */ BT_DBG("Received seqno %u from card", bcsp->rxseq_txack); /* check the rx sequence number is as expected */ if ((bcsp->rx_skb->data[0] & 0x07) == bcsp->rxseq_txack) { bcsp->rxseq_txack++; bcsp->rxseq_txack %= 0x8; } else { /* handle re-transmitted packet or * when packet was missed */ BT_ERR("Out-of-order packet arrived, got %u expected %u", bcsp->rx_skb->data[0] & 0x07, bcsp->rxseq_txack); /* do not process out-of-order packet payload */ pass_up = 2; } /* send current txack value to all received reliable packets */ bcsp->txack_req = 1; /* If needed, transmit an ack pkt */ hci_uart_tx_wakeup(hu); } bcsp->rxack = (bcsp->rx_skb->data[0] >> 3) & 0x07; BT_DBG("Request for pkt %u from card", bcsp->rxack); /* handle received ACK indications, * including those from out-of-order packets */ bcsp_pkt_cull(bcsp); if (pass_up != 2) { if ((bcsp->rx_skb->data[1] & 0x0f) == 6 && (bcsp->rx_skb->data[0] & 0x80)) { hci_skb_pkt_type(bcsp->rx_skb) = HCI_ACLDATA_PKT; pass_up = 1; } else if ((bcsp->rx_skb->data[1] & 0x0f) == 5 && (bcsp->rx_skb->data[0] & 0x80)) { hci_skb_pkt_type(bcsp->rx_skb) = HCI_EVENT_PKT; pass_up = 1; } else if ((bcsp->rx_skb->data[1] & 0x0f) == 7) { hci_skb_pkt_type(bcsp->rx_skb) = HCI_SCODATA_PKT; pass_up = 1; } else if ((bcsp->rx_skb->data[1] & 0x0f) == 1 && !(bcsp->rx_skb->data[0] & 0x80)) { bcsp_handle_le_pkt(hu); pass_up = 0; } else { pass_up = 0; } } if (pass_up == 0) { struct hci_event_hdr hdr; u8 desc = (bcsp->rx_skb->data[1] & 0x0f); if (desc != 0 && desc != 1) { if (hciextn) { desc |= 0xc0; skb_pull(bcsp->rx_skb, 4); memcpy(skb_push(bcsp->rx_skb, 1), &desc, 1); hdr.evt = 0xff; hdr.plen = bcsp->rx_skb->len; memcpy(skb_push(bcsp->rx_skb, HCI_EVENT_HDR_SIZE), &hdr, HCI_EVENT_HDR_SIZE); hci_skb_pkt_type(bcsp->rx_skb) = HCI_EVENT_PKT; hci_recv_frame(hu->hdev, bcsp->rx_skb); } else { BT_ERR("Packet for unknown channel (%u %s)", bcsp->rx_skb->data[1] & 0x0f, bcsp->rx_skb->data[0] & 0x80 ? 
"reliable" : "unreliable"); kfree_skb(bcsp->rx_skb); } } else kfree_skb(bcsp->rx_skb); } else if (pass_up == 1) { /* Pull out BCSP hdr */ skb_pull(bcsp->rx_skb, 4); hci_recv_frame(hu->hdev, bcsp->rx_skb); } else { /* ignore packet payload of already ACKed re-transmitted * packets or when a packet was missed in the BCSP window */ kfree_skb(bcsp->rx_skb); } bcsp->rx_state = BCSP_W4_PKT_DELIMITER; bcsp->rx_skb = NULL; } static u16 bscp_get_crc(struct bcsp_struct *bcsp) { return get_unaligned_be16(&bcsp->rx_skb->data[bcsp->rx_skb->len - 2]); } /* Recv data */ static int bcsp_recv(struct hci_uart *hu, const void *data, int count) { struct bcsp_struct *bcsp = hu->priv; const unsigned char *ptr; BT_DBG("hu %p count %d rx_state %d rx_count %ld", hu, count, bcsp->rx_state, bcsp->rx_count); ptr = data; while (count) { if (bcsp->rx_count) { if (*ptr == 0xc0) { BT_ERR("Short BCSP packet"); kfree_skb(bcsp->rx_skb); bcsp->rx_skb = NULL; bcsp->rx_state = BCSP_W4_PKT_START; bcsp->rx_count = 0; } else bcsp_unslip_one_byte(bcsp, *ptr); ptr++; count--; continue; } switch (bcsp->rx_state) { case BCSP_W4_BCSP_HDR: if ((0xff & (u8)~(bcsp->rx_skb->data[0] + bcsp->rx_skb->data[1] + bcsp->rx_skb->data[2])) != bcsp->rx_skb->data[3]) { BT_ERR("Error in BCSP hdr checksum"); kfree_skb(bcsp->rx_skb); bcsp->rx_skb = NULL; bcsp->rx_state = BCSP_W4_PKT_DELIMITER; bcsp->rx_count = 0; continue; } bcsp->rx_state = BCSP_W4_DATA; bcsp->rx_count = (bcsp->rx_skb->data[1] >> 4) + (bcsp->rx_skb->data[2] << 4); /* May be 0 */ continue; case BCSP_W4_DATA: if (bcsp->rx_skb->data[0] & 0x40) { /* pkt with crc */ bcsp->rx_state = BCSP_W4_CRC; bcsp->rx_count = 2; } else bcsp_complete_rx_pkt(hu); continue; case BCSP_W4_CRC: if (bitrev16(bcsp->message_crc) != bscp_get_crc(bcsp)) { BT_ERR("Checksum failed: computed %04x received %04x", bitrev16(bcsp->message_crc), bscp_get_crc(bcsp)); kfree_skb(bcsp->rx_skb); bcsp->rx_skb = NULL; bcsp->rx_state = BCSP_W4_PKT_DELIMITER; bcsp->rx_count = 0; continue; } skb_trim(bcsp->rx_skb, bcsp->rx_skb->len - 2); bcsp_complete_rx_pkt(hu); continue; case BCSP_W4_PKT_DELIMITER: switch (*ptr) { case 0xc0: bcsp->rx_state = BCSP_W4_PKT_START; break; default: /*BT_ERR("Ignoring byte %02x", *ptr);*/ break; } ptr++; count--; break; case BCSP_W4_PKT_START: switch (*ptr) { case 0xc0: ptr++; count--; break; default: bcsp->rx_state = BCSP_W4_BCSP_HDR; bcsp->rx_count = 4; bcsp->rx_esc_state = BCSP_ESCSTATE_NOESC; BCSP_CRC_INIT(bcsp->message_crc); /* Do not increment ptr or decrement count * Allocate packet. Max len of a BCSP pkt= * 0xFFF (payload) +4 (header) +2 (crc) */ bcsp->rx_skb = bt_skb_alloc(0x1005, GFP_ATOMIC); if (!bcsp->rx_skb) { BT_ERR("Can't allocate mem for new packet"); bcsp->rx_state = BCSP_W4_PKT_DELIMITER; bcsp->rx_count = 0; return 0; } break; } break; } } return count; } /* Arrange to retransmit all messages in the relq. 
*/ static void bcsp_timed_event(struct timer_list *t) { struct bcsp_struct *bcsp = timer_container_of(bcsp, t, tbcsp); struct hci_uart *hu = bcsp->hu; struct sk_buff *skb; unsigned long flags; BT_DBG("hu %p retransmitting %u pkts", hu, bcsp->unack.qlen); spin_lock_irqsave_nested(&bcsp->unack.lock, flags, SINGLE_DEPTH_NESTING); while ((skb = __skb_dequeue_tail(&bcsp->unack)) != NULL) { bcsp->msgq_txseq = (bcsp->msgq_txseq - 1) & 0x07; skb_queue_head(&bcsp->rel, skb); } spin_unlock_irqrestore(&bcsp->unack.lock, flags); hci_uart_tx_wakeup(hu); } static int bcsp_open(struct hci_uart *hu) { struct bcsp_struct *bcsp; BT_DBG("hu %p", hu); bcsp = kzalloc(sizeof(*bcsp), GFP_KERNEL); if (!bcsp) return -ENOMEM; hu->priv = bcsp; bcsp->hu = hu; skb_queue_head_init(&bcsp->unack); skb_queue_head_init(&bcsp->rel); skb_queue_head_init(&bcsp->unrel); timer_setup(&bcsp->tbcsp, bcsp_timed_event, 0); bcsp->rx_state = BCSP_W4_PKT_DELIMITER; if (txcrc) bcsp->use_crc = 1; return 0; } static int bcsp_close(struct hci_uart *hu) { struct bcsp_struct *bcsp = hu->priv; timer_shutdown_sync(&bcsp->tbcsp); hu->priv = NULL; BT_DBG("hu %p", hu); skb_queue_purge(&bcsp->unack); skb_queue_purge(&bcsp->rel); skb_queue_purge(&bcsp->unrel); if (bcsp->rx_skb) { kfree_skb(bcsp->rx_skb); bcsp->rx_skb = NULL; } kfree(bcsp); return 0; } static const struct hci_uart_proto bcsp = { .id = HCI_UART_BCSP, .name = "BCSP", .open = bcsp_open, .close = bcsp_close, .enqueue = bcsp_enqueue, .dequeue = bcsp_dequeue, .recv = bcsp_recv, .flush = bcsp_flush }; int __init bcsp_init(void) { return hci_uart_register_proto(&bcsp); } int __exit bcsp_deinit(void) { return hci_uart_unregister_proto(&bcsp); } module_param(txcrc, bool, 0644); MODULE_PARM_DESC(txcrc, "Transmit CRC with every BCSP packet"); module_param(hciextn, bool, 0644); MODULE_PARM_DESC(hciextn, "Convert HCI Extensions into BCSP packets");
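/*
 * Standalone illustration (added for clarity, not part of the driver):
 * the BCSP CRC and its on-the-wire byte order reproduced as a userspace
 * program, so a frame's trailer can be checked by hand. rev16() is a
 * stand-in for the kernel's bitrev16().
 */
#include <stdio.h>
#include <stdint.h>

static const uint16_t crc_table[] = {
	0x0000, 0x1081, 0x2102, 0x3183,
	0x4204, 0x5285, 0x6306, 0x7387,
	0x8408, 0x9489, 0xa50a, 0xb58b,
	0xc60c, 0xd68d, 0xe70e, 0xf78f
};

/* Same nibble-at-a-time update as bcsp_crc_update() above */
static void crc_update(uint16_t *crc, uint8_t d)
{
	uint16_t reg = *crc;

	reg = (reg >> 4) ^ crc_table[(reg ^ d) & 0x000f];
	reg = (reg >> 4) ^ crc_table[(reg ^ (d >> 4)) & 0x000f];
	*crc = reg;
}

/* Userspace stand-in for bitrev16() */
static uint16_t rev16(uint16_t x)
{
	uint16_t r = 0;
	int i;

	for (i = 0; i < 16; i++)
		r = (r << 1) | ((x >> i) & 1);
	return r;
}

int main(void)
{
	/* The 4-byte header bcsp_prepare_pkt() builds for a CRC-protected
	 * ACK packet with both sequence fields zero; hdr[3] is the
	 * complement of the sum of the first three bytes. */
	uint8_t hdr[4] = { 0x40, 0x00, 0x00, 0xbf };
	uint16_t crc = 0xffff;	/* BCSP_CRC_INIT */
	int i;

	for (i = 0; i < 4; i++)
		crc_update(&crc, hdr[i]);

	/* The driver transmits rev16(crc) high byte first; the receiver
	 * compares rev16(message_crc) against the trailing two bytes. */
	printf("crc=0x%04x, on the wire: %02x %02x\n", crc,
	       (rev16(crc) >> 8) & 0xff, rev16(crc) & 0xff);
	return 0;
}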
// SPDX-License-Identifier: GPL-2.0-only /* binder.c * * Android IPC Subsystem * * Copyright (C) 2007-2008 Google, Inc. */ /* * Locking overview * * There are 3 main spinlocks which must be acquired in the * order shown: * * 1) proc->outer_lock : protects binder_ref * binder_proc_lock() and binder_proc_unlock() are * used to acq/rel. * 2) node->lock : protects most fields of binder_node. * binder_node_lock() and binder_node_unlock() are * used to acq/rel * 3) proc->inner_lock : protects the thread and node lists * (proc->threads, proc->waiting_threads, proc->nodes) * and all todo lists associated with the binder_proc * (proc->todo, thread->todo, proc->delivered_death and * node->async_todo), as well as thread->transaction_stack * binder_inner_proc_lock() and binder_inner_proc_unlock() * are used to acq/rel * * Any lock under procA must never be nested under any lock at the same * level or below on procB. * * Functions that require a lock held on entry indicate which lock * in the suffix of the function name: * * foo_olocked() : requires proc->outer_lock * foo_nlocked() : requires node->lock * foo_ilocked() : requires proc->inner_lock * foo_oilocked(): requires proc->outer_lock and proc->inner_lock * foo_nilocked(): requires node->lock and proc->inner_lock * ...
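*
* For example, code that must update a binder_node and queue work on
* its proc's todo list nests the locks in the order listed above
* (binder_node_inner_lock() below wraps exactly this pairing):
*
*   binder_node_lock(node);              // 2) node->lock
*   binder_inner_proc_lock(node->proc);  // 3) proc->inner_lock
*   ... update node state, queue to a todo list ...
*   binder_inner_proc_unlock(node->proc);
*   binder_node_unlock(node);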
*/ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/fdtable.h> #include <linux/file.h> #include <linux/freezer.h> #include <linux/fs.h> #include <linux/list.h> #include <linux/miscdevice.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/nsproxy.h> #include <linux/poll.h> #include <linux/debugfs.h> #include <linux/rbtree.h> #include <linux/sched/signal.h> #include <linux/sched/mm.h> #include <linux/seq_file.h> #include <linux/string.h> #include <linux/uaccess.h> #include <linux/pid_namespace.h> #include <linux/security.h> #include <linux/spinlock.h> #include <linux/ratelimit.h> #include <linux/syscalls.h> #include <linux/task_work.h> #include <linux/sizes.h> #include <linux/ktime.h> #include <uapi/linux/android/binder.h> #include <linux/cacheflush.h> #include "binder_internal.h" #include "binder_trace.h" static HLIST_HEAD(binder_deferred_list); static DEFINE_MUTEX(binder_deferred_lock); static HLIST_HEAD(binder_devices); static DEFINE_SPINLOCK(binder_devices_lock); static HLIST_HEAD(binder_procs); static DEFINE_MUTEX(binder_procs_lock); static HLIST_HEAD(binder_dead_nodes); static DEFINE_SPINLOCK(binder_dead_nodes_lock); static struct dentry *binder_debugfs_dir_entry_root; static struct dentry *binder_debugfs_dir_entry_proc; static atomic_t binder_last_id; static int proc_show(struct seq_file *m, void *unused); DEFINE_SHOW_ATTRIBUTE(proc); #define FORBIDDEN_MMAP_FLAGS (VM_WRITE) enum { BINDER_DEBUG_USER_ERROR = 1U << 0, BINDER_DEBUG_FAILED_TRANSACTION = 1U << 1, BINDER_DEBUG_DEAD_TRANSACTION = 1U << 2, BINDER_DEBUG_OPEN_CLOSE = 1U << 3, BINDER_DEBUG_DEAD_BINDER = 1U << 4, BINDER_DEBUG_DEATH_NOTIFICATION = 1U << 5, BINDER_DEBUG_READ_WRITE = 1U << 6, BINDER_DEBUG_USER_REFS = 1U << 7, BINDER_DEBUG_THREADS = 1U << 8, BINDER_DEBUG_TRANSACTION = 1U << 9, BINDER_DEBUG_TRANSACTION_COMPLETE = 1U << 10, BINDER_DEBUG_FREE_BUFFER = 1U << 11, BINDER_DEBUG_INTERNAL_REFS = 1U << 12, BINDER_DEBUG_PRIORITY_CAP = 1U << 13, BINDER_DEBUG_SPINLOCKS = 1U << 14, }; static uint32_t binder_debug_mask = BINDER_DEBUG_USER_ERROR | BINDER_DEBUG_FAILED_TRANSACTION | BINDER_DEBUG_DEAD_TRANSACTION; module_param_named(debug_mask, binder_debug_mask, uint, 0644); char *binder_devices_param = CONFIG_ANDROID_BINDER_DEVICES; module_param_named(devices, binder_devices_param, charp, 0444); static DECLARE_WAIT_QUEUE_HEAD(binder_user_error_wait); static int binder_stop_on_user_error; static int binder_set_stop_on_user_error(const char *val, const struct kernel_param *kp) { int ret; ret = param_set_int(val, kp); if (binder_stop_on_user_error < 2) wake_up(&binder_user_error_wait); return ret; } module_param_call(stop_on_user_error, binder_set_stop_on_user_error, param_get_int, &binder_stop_on_user_error, 0644); static __printf(2, 3) void binder_debug(int mask, const char *format, ...) { struct va_format vaf; va_list args; if (binder_debug_mask & mask) { va_start(args, format); vaf.va = &args; vaf.fmt = format; pr_info_ratelimited("%pV", &vaf); va_end(args); } } #define binder_txn_error(x...) \ binder_debug(BINDER_DEBUG_FAILED_TRANSACTION, x) static __printf(1, 2) void binder_user_error(const char *format, ...) 
{
	struct va_format vaf;
	va_list args;

	if (binder_debug_mask & BINDER_DEBUG_USER_ERROR) {
		va_start(args, format);
		vaf.va = &args;
		vaf.fmt = format;
		pr_info_ratelimited("%pV", &vaf);
		va_end(args);
	}

	if (binder_stop_on_user_error)
		binder_stop_on_user_error = 2;
}

#define binder_set_extended_error(ee, _id, _command, _param) \
	do { \
		(ee)->id = _id; \
		(ee)->command = _command; \
		(ee)->param = _param; \
	} while (0)

#define to_flat_binder_object(hdr) \
	container_of(hdr, struct flat_binder_object, hdr)

#define to_binder_fd_object(hdr) container_of(hdr, struct binder_fd_object, hdr)

#define to_binder_buffer_object(hdr) \
	container_of(hdr, struct binder_buffer_object, hdr)

#define to_binder_fd_array_object(hdr) \
	container_of(hdr, struct binder_fd_array_object, hdr)

static struct binder_stats binder_stats;

static inline void binder_stats_deleted(enum binder_stat_types type)
{
	atomic_inc(&binder_stats.obj_deleted[type]);
}

static inline void binder_stats_created(enum binder_stat_types type)
{
	atomic_inc(&binder_stats.obj_created[type]);
}

struct binder_transaction_log_entry {
	int debug_id;
	int debug_id_done;
	int call_type;
	int from_proc;
	int from_thread;
	int target_handle;
	int to_proc;
	int to_thread;
	int to_node;
	int data_size;
	int offsets_size;
	int return_error_line;
	uint32_t return_error;
	uint32_t return_error_param;
	char context_name[BINDERFS_MAX_NAME + 1];
};

struct binder_transaction_log {
	atomic_t cur;
	bool full;
	struct binder_transaction_log_entry entry[32];
};

static struct binder_transaction_log binder_transaction_log;
static struct binder_transaction_log binder_transaction_log_failed;

static struct binder_transaction_log_entry *binder_transaction_log_add(
	struct binder_transaction_log *log)
{
	struct binder_transaction_log_entry *e;
	unsigned int cur = atomic_inc_return(&log->cur);

	if (cur >= ARRAY_SIZE(log->entry))
		log->full = true;
	e = &log->entry[cur % ARRAY_SIZE(log->entry)];
	WRITE_ONCE(e->debug_id_done, 0);
	/*
	 * write-barrier to synchronize access to e->debug_id_done.
	 * We make sure the initialized 0 value is seen before
	 * the other fields are zeroed by memset().
	 */
	smp_wmb();
	memset(e, 0, sizeof(*e));
	return e;
}

enum binder_deferred_state {
	BINDER_DEFERRED_FLUSH = 0x01,
	BINDER_DEFERRED_RELEASE = 0x02,
};

enum {
	BINDER_LOOPER_STATE_REGISTERED = 0x01,
	BINDER_LOOPER_STATE_ENTERED = 0x02,
	BINDER_LOOPER_STATE_EXITED = 0x04,
	BINDER_LOOPER_STATE_INVALID = 0x08,
	BINDER_LOOPER_STATE_WAITING = 0x10,
	BINDER_LOOPER_STATE_POLL = 0x20,
};

/**
 * binder_proc_lock() - Acquire outer lock for given binder_proc
 * @proc: struct binder_proc to acquire
 *
 * Acquires proc->outer_lock. Used to protect binder_ref
 * structures associated with the given proc.
 */
#define binder_proc_lock(proc) _binder_proc_lock(proc, __LINE__)
static void
_binder_proc_lock(struct binder_proc *proc, int line)
	__acquires(&proc->outer_lock)
{
	binder_debug(BINDER_DEBUG_SPINLOCKS,
		     "%s: line=%d\n", __func__, line);
	spin_lock(&proc->outer_lock);
}

/**
 * binder_proc_unlock() - Release outer lock for given binder_proc
 * @proc: struct binder_proc whose outer lock is being released
 *
 * Release lock acquired via binder_proc_lock()
 */
#define binder_proc_unlock(proc) _binder_proc_unlock(proc, __LINE__)
static void
_binder_proc_unlock(struct binder_proc *proc, int line)
	__releases(&proc->outer_lock)
{
	binder_debug(BINDER_DEBUG_SPINLOCKS,
		     "%s: line=%d\n", __func__, line);
	spin_unlock(&proc->outer_lock);
}

/**
 * binder_inner_proc_lock() - Acquire inner lock for given binder_proc
 * @proc: struct binder_proc to acquire
 *
 * Acquires proc->inner_lock.
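 * (Illustrative usage, mirroring binder_worklist_empty() further below:
 *
 *	binder_inner_proc_lock(proc);
 *	ret = binder_worklist_empty_ilocked(list);
 *	binder_inner_proc_unlock(proc);
 *
 * every proc->todo or thread->todo access follows this pattern.)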
 * Used to protect todo lists
 */
#define binder_inner_proc_lock(proc) _binder_inner_proc_lock(proc, __LINE__)
static void
_binder_inner_proc_lock(struct binder_proc *proc, int line)
	__acquires(&proc->inner_lock)
{
	binder_debug(BINDER_DEBUG_SPINLOCKS,
		     "%s: line=%d\n", __func__, line);
	spin_lock(&proc->inner_lock);
}

/**
 * binder_inner_proc_unlock() - Release inner lock for given binder_proc
 * @proc: struct binder_proc whose inner lock is being released
 *
 * Release lock acquired via binder_inner_proc_lock()
 */
#define binder_inner_proc_unlock(proc) _binder_inner_proc_unlock(proc, __LINE__)
static void
_binder_inner_proc_unlock(struct binder_proc *proc, int line)
	__releases(&proc->inner_lock)
{
	binder_debug(BINDER_DEBUG_SPINLOCKS,
		     "%s: line=%d\n", __func__, line);
	spin_unlock(&proc->inner_lock);
}

/**
 * binder_node_lock() - Acquire spinlock for given binder_node
 * @node: struct binder_node to acquire
 *
 * Acquires node->lock. Used to protect binder_node fields
 */
#define binder_node_lock(node) _binder_node_lock(node, __LINE__)
static void
_binder_node_lock(struct binder_node *node, int line)
	__acquires(&node->lock)
{
	binder_debug(BINDER_DEBUG_SPINLOCKS,
		     "%s: line=%d\n", __func__, line);
	spin_lock(&node->lock);
}

/**
 * binder_node_unlock() - Release spinlock for given binder_node
 * @node: struct binder_node whose lock is being released
 *
 * Release lock acquired via binder_node_lock()
 */
#define binder_node_unlock(node) _binder_node_unlock(node, __LINE__)
static void
_binder_node_unlock(struct binder_node *node, int line)
	__releases(&node->lock)
{
	binder_debug(BINDER_DEBUG_SPINLOCKS,
		     "%s: line=%d\n", __func__, line);
	spin_unlock(&node->lock);
}

/**
 * binder_node_inner_lock() - Acquire node and inner locks
 * @node: struct binder_node to acquire
 *
 * Acquires node->lock. If node->proc is non-NULL, also acquires
 * proc->inner_lock.
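 * (Illustrative: callers need not test node->proc themselves; e.g.
 * binder_dec_node() below simply writes
 *
 *	binder_node_inner_lock(node);
 *	free_node = binder_dec_node_nilocked(node, strong, internal);
 *	binder_node_inner_unlock(node);
 *
 * and the dead-node case is covered by the sparse annotations here.)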
 * Used to protect binder_node fields
 */
#define binder_node_inner_lock(node) _binder_node_inner_lock(node, __LINE__)
static void
_binder_node_inner_lock(struct binder_node *node, int line)
	__acquires(&node->lock) __acquires(&node->proc->inner_lock)
{
	binder_debug(BINDER_DEBUG_SPINLOCKS,
		     "%s: line=%d\n", __func__, line);
	spin_lock(&node->lock);
	if (node->proc)
		binder_inner_proc_lock(node->proc);
	else
		/* annotation for sparse */
		__acquire(&node->proc->inner_lock);
}

/**
 * binder_node_inner_unlock() - Release node and inner locks
 * @node: struct binder_node whose locks are being released
 *
 * Release locks acquired via binder_node_inner_lock()
 */
#define binder_node_inner_unlock(node) _binder_node_inner_unlock(node, __LINE__)
static void
_binder_node_inner_unlock(struct binder_node *node, int line)
	__releases(&node->lock) __releases(&node->proc->inner_lock)
{
	struct binder_proc *proc = node->proc;

	binder_debug(BINDER_DEBUG_SPINLOCKS,
		     "%s: line=%d\n", __func__, line);
	if (proc)
		binder_inner_proc_unlock(proc);
	else
		/* annotation for sparse */
		__release(&node->proc->inner_lock);
	spin_unlock(&node->lock);
}

static bool binder_worklist_empty_ilocked(struct list_head *list)
{
	return list_empty(list);
}

/**
 * binder_worklist_empty() - Check if no items on the work list
 * @proc:	binder_proc associated with list
 * @list:	list to check
 *
 * Return: true if there are no items on list, else false
 */
static bool binder_worklist_empty(struct binder_proc *proc,
				  struct list_head *list)
{
	bool ret;

	binder_inner_proc_lock(proc);
	ret = binder_worklist_empty_ilocked(list);
	binder_inner_proc_unlock(proc);
	return ret;
}

/**
 * binder_enqueue_work_ilocked() - Add an item to the work list
 * @work:	struct binder_work to add to list
 * @target_list: list to add work to
 *
 * Adds the work to the specified list. Asserts that work
 * is not already on a list.
 *
 * Requires the proc->inner_lock to be held.
 */
static void
binder_enqueue_work_ilocked(struct binder_work *work,
			    struct list_head *target_list)
{
	BUG_ON(target_list == NULL);
	BUG_ON(work->entry.next && !list_empty(&work->entry));
	list_add_tail(&work->entry, target_list);
}

/**
 * binder_enqueue_deferred_thread_work_ilocked() - Add deferred thread work
 * @thread:	thread to queue work to
 * @work:	struct binder_work to add to list
 *
 * Adds the work to the todo list of the thread. Doesn't set the process_todo
 * flag, which means that (if it wasn't already set) the thread will go to
 * sleep without handling this work when it calls read.
 *
 * Requires the proc->inner_lock to be held.
 */
static void
binder_enqueue_deferred_thread_work_ilocked(struct binder_thread *thread,
					    struct binder_work *work)
{
	WARN_ON(!list_empty(&thread->waiting_thread_node));
	binder_enqueue_work_ilocked(work, &thread->todo);
}

/**
 * binder_enqueue_thread_work_ilocked() - Add an item to the thread work list
 * @thread:	thread to queue work to
 * @work:	struct binder_work to add to list
 *
 * Adds the work to the todo list of the thread, and enables processing
 * of the todo queue.
 *
 * Requires the proc->inner_lock to be held.
 */
static void
binder_enqueue_thread_work_ilocked(struct binder_thread *thread,
				   struct binder_work *work)
{
	WARN_ON(!list_empty(&thread->waiting_thread_node));
	binder_enqueue_work_ilocked(work, &thread->todo);

	/* (e)poll-based threads require an explicit wakeup signal when
	 * queuing their own work; they rely on these events to consume
	 * messages without blocking on I/O. Without it, threads risk
	 * waiting indefinitely without handling the work.
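	 *
	 * Illustrative scenario (a sketch, not driver code): a looper
	 * thread blocked in poll() on the binder fd that queues work
	 * to itself, e.g.
	 *
	 *	// current == thread, looper has BINDER_LOOPER_STATE_POLL
	 *	binder_enqueue_thread_work_ilocked(thread, work);
	 *
	 * only notices the work because of the explicit wake-up below.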
*/ if (thread->looper & BINDER_LOOPER_STATE_POLL && thread->pid == current->pid && !thread->process_todo) wake_up_interruptible_sync(&thread->wait); thread->process_todo = true; } /** * binder_enqueue_thread_work() - Add an item to the thread work list * @thread: thread to queue work to * @work: struct binder_work to add to list * * Adds the work to the todo list of the thread, and enables processing * of the todo queue. */ static void binder_enqueue_thread_work(struct binder_thread *thread, struct binder_work *work) { binder_inner_proc_lock(thread->proc); binder_enqueue_thread_work_ilocked(thread, work); binder_inner_proc_unlock(thread->proc); } static void binder_dequeue_work_ilocked(struct binder_work *work) { list_del_init(&work->entry); } /** * binder_dequeue_work() - Removes an item from the work list * @proc: binder_proc associated with list * @work: struct binder_work to remove from list * * Removes the specified work item from whatever list it is on. * Can safely be called if work is not on any list. */ static void binder_dequeue_work(struct binder_proc *proc, struct binder_work *work) { binder_inner_proc_lock(proc); binder_dequeue_work_ilocked(work); binder_inner_proc_unlock(proc); } static struct binder_work *binder_dequeue_work_head_ilocked( struct list_head *list) { struct binder_work *w; w = list_first_entry_or_null(list, struct binder_work, entry); if (w) list_del_init(&w->entry); return w; } static void binder_defer_work(struct binder_proc *proc, enum binder_deferred_state defer); static void binder_free_thread(struct binder_thread *thread); static void binder_free_proc(struct binder_proc *proc); static void binder_inc_node_tmpref_ilocked(struct binder_node *node); static bool binder_has_work_ilocked(struct binder_thread *thread, bool do_proc_work) { return thread->process_todo || thread->looper_need_return || (do_proc_work && !binder_worklist_empty_ilocked(&thread->proc->todo)); } static bool binder_has_work(struct binder_thread *thread, bool do_proc_work) { bool has_work; binder_inner_proc_lock(thread->proc); has_work = binder_has_work_ilocked(thread, do_proc_work); binder_inner_proc_unlock(thread->proc); return has_work; } static bool binder_available_for_proc_work_ilocked(struct binder_thread *thread) { return !thread->transaction_stack && binder_worklist_empty_ilocked(&thread->todo); } static void binder_wakeup_poll_threads_ilocked(struct binder_proc *proc, bool sync) { struct rb_node *n; struct binder_thread *thread; for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) { thread = rb_entry(n, struct binder_thread, rb_node); if (thread->looper & BINDER_LOOPER_STATE_POLL && binder_available_for_proc_work_ilocked(thread)) { if (sync) wake_up_interruptible_sync(&thread->wait); else wake_up_interruptible(&thread->wait); } } } /** * binder_select_thread_ilocked() - selects a thread for doing proc work. * @proc: process to select a thread from * * Note that calling this function moves the thread off the waiting_threads * list, so it can only be woken up by the caller of this function, or a * signal. Therefore, callers *should* always wake up the thread this function * returns. * * Return: If there's a thread currently waiting for process work, * returns that thread. Otherwise returns NULL. 
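 *
 * Illustrative pairing (a sketch of what callers in this file do):
 *
 *	thread = binder_select_thread_ilocked(proc);
 *	// ... enqueue work on proc->todo ...
 *	binder_wakeup_thread_ilocked(proc, thread, sync);
 *
 * See binder_wakeup_proc_ilocked() below for the canonical form.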
*/ static struct binder_thread * binder_select_thread_ilocked(struct binder_proc *proc) { struct binder_thread *thread; assert_spin_locked(&proc->inner_lock); thread = list_first_entry_or_null(&proc->waiting_threads, struct binder_thread, waiting_thread_node); if (thread) list_del_init(&thread->waiting_thread_node); return thread; } /** * binder_wakeup_thread_ilocked() - wakes up a thread for doing proc work. * @proc: process to wake up a thread in * @thread: specific thread to wake-up (may be NULL) * @sync: whether to do a synchronous wake-up * * This function wakes up a thread in the @proc process. * The caller may provide a specific thread to wake-up in * the @thread parameter. If @thread is NULL, this function * will wake up threads that have called poll(). * * Note that for this function to work as expected, callers * should first call binder_select_thread() to find a thread * to handle the work (if they don't have a thread already), * and pass the result into the @thread parameter. */ static void binder_wakeup_thread_ilocked(struct binder_proc *proc, struct binder_thread *thread, bool sync) { assert_spin_locked(&proc->inner_lock); if (thread) { if (sync) wake_up_interruptible_sync(&thread->wait); else wake_up_interruptible(&thread->wait); return; } /* Didn't find a thread waiting for proc work; this can happen * in two scenarios: * 1. All threads are busy handling transactions * In that case, one of those threads should call back into * the kernel driver soon and pick up this work. * 2. Threads are using the (e)poll interface, in which case * they may be blocked on the waitqueue without having been * added to waiting_threads. For this case, we just iterate * over all threads not handling transaction work, and * wake them all up. We wake all because we don't know whether * a thread that called into (e)poll is handling non-binder * work currently. 
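	 *
	 * As an illustration of case 2, a userspace thread sitting in
	 *
	 *	epoll_wait(epfd, events, maxevents, -1);  // binder fd armed
	 *
	 * blocks on thread->wait without appearing on waiting_threads,
	 * so a direct wake-up is the only way to reach it.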
*/ binder_wakeup_poll_threads_ilocked(proc, sync); } static void binder_wakeup_proc_ilocked(struct binder_proc *proc) { struct binder_thread *thread = binder_select_thread_ilocked(proc); binder_wakeup_thread_ilocked(proc, thread, /* sync = */false); } static void binder_set_nice(long nice) { long min_nice; if (can_nice(current, nice)) { set_user_nice(current, nice); return; } min_nice = rlimit_to_nice(rlimit(RLIMIT_NICE)); binder_debug(BINDER_DEBUG_PRIORITY_CAP, "%d: nice value %ld not allowed use %ld instead\n", current->pid, nice, min_nice); set_user_nice(current, min_nice); if (min_nice <= MAX_NICE) return; binder_user_error("%d RLIMIT_NICE not set\n", current->pid); } static struct binder_node *binder_get_node_ilocked(struct binder_proc *proc, binder_uintptr_t ptr) { struct rb_node *n = proc->nodes.rb_node; struct binder_node *node; assert_spin_locked(&proc->inner_lock); while (n) { node = rb_entry(n, struct binder_node, rb_node); if (ptr < node->ptr) n = n->rb_left; else if (ptr > node->ptr) n = n->rb_right; else { /* * take an implicit weak reference * to ensure node stays alive until * call to binder_put_node() */ binder_inc_node_tmpref_ilocked(node); return node; } } return NULL; } static struct binder_node *binder_get_node(struct binder_proc *proc, binder_uintptr_t ptr) { struct binder_node *node; binder_inner_proc_lock(proc); node = binder_get_node_ilocked(proc, ptr); binder_inner_proc_unlock(proc); return node; } static struct binder_node *binder_init_node_ilocked( struct binder_proc *proc, struct binder_node *new_node, struct flat_binder_object *fp) { struct rb_node **p = &proc->nodes.rb_node; struct rb_node *parent = NULL; struct binder_node *node; binder_uintptr_t ptr = fp ? fp->binder : 0; binder_uintptr_t cookie = fp ? fp->cookie : 0; __u32 flags = fp ? fp->flags : 0; assert_spin_locked(&proc->inner_lock); while (*p) { parent = *p; node = rb_entry(parent, struct binder_node, rb_node); if (ptr < node->ptr) p = &(*p)->rb_left; else if (ptr > node->ptr) p = &(*p)->rb_right; else { /* * A matching node is already in * the rb tree. Abandon the init * and return it. 
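			 *
			 * This is the losing side of a race between
			 * two binder_new_node() callers: both
			 * preallocated a node, the winner linked its
			 * copy first, and the caller then frees the
			 * unused new_node (see binder_new_node()
			 * below).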
*/ binder_inc_node_tmpref_ilocked(node); return node; } } node = new_node; binder_stats_created(BINDER_STAT_NODE); node->tmp_refs++; rb_link_node(&node->rb_node, parent, p); rb_insert_color(&node->rb_node, &proc->nodes); node->debug_id = atomic_inc_return(&binder_last_id); node->proc = proc; node->ptr = ptr; node->cookie = cookie; node->work.type = BINDER_WORK_NODE; node->min_priority = flags & FLAT_BINDER_FLAG_PRIORITY_MASK; node->accept_fds = !!(flags & FLAT_BINDER_FLAG_ACCEPTS_FDS); node->txn_security_ctx = !!(flags & FLAT_BINDER_FLAG_TXN_SECURITY_CTX); spin_lock_init(&node->lock); INIT_LIST_HEAD(&node->work.entry); INIT_LIST_HEAD(&node->async_todo); binder_debug(BINDER_DEBUG_INTERNAL_REFS, "%d:%d node %d u%016llx c%016llx created\n", proc->pid, current->pid, node->debug_id, (u64)node->ptr, (u64)node->cookie); return node; } static struct binder_node *binder_new_node(struct binder_proc *proc, struct flat_binder_object *fp) { struct binder_node *node; struct binder_node *new_node = kzalloc(sizeof(*node), GFP_KERNEL); if (!new_node) return NULL; binder_inner_proc_lock(proc); node = binder_init_node_ilocked(proc, new_node, fp); binder_inner_proc_unlock(proc); if (node != new_node) /* * The node was already added by another thread */ kfree(new_node); return node; } static void binder_free_node(struct binder_node *node) { kfree(node); binder_stats_deleted(BINDER_STAT_NODE); } static int binder_inc_node_nilocked(struct binder_node *node, int strong, int internal, struct list_head *target_list) { struct binder_proc *proc = node->proc; assert_spin_locked(&node->lock); if (proc) assert_spin_locked(&proc->inner_lock); if (strong) { if (internal) { if (target_list == NULL && node->internal_strong_refs == 0 && !(node->proc && node == node->proc->context->binder_context_mgr_node && node->has_strong_ref)) { pr_err("invalid inc strong node for %d\n", node->debug_id); return -EINVAL; } node->internal_strong_refs++; } else node->local_strong_refs++; if (!node->has_strong_ref && target_list) { struct binder_thread *thread = container_of(target_list, struct binder_thread, todo); binder_dequeue_work_ilocked(&node->work); BUG_ON(&thread->todo != target_list); binder_enqueue_deferred_thread_work_ilocked(thread, &node->work); } } else { if (!internal) node->local_weak_refs++; if (!node->has_weak_ref && list_empty(&node->work.entry)) { if (target_list == NULL) { pr_err("invalid inc weak node for %d\n", node->debug_id); return -EINVAL; } /* * See comment above */ binder_enqueue_work_ilocked(&node->work, target_list); } } return 0; } static int binder_inc_node(struct binder_node *node, int strong, int internal, struct list_head *target_list) { int ret; binder_node_inner_lock(node); ret = binder_inc_node_nilocked(node, strong, internal, target_list); binder_node_inner_unlock(node); return ret; } static bool binder_dec_node_nilocked(struct binder_node *node, int strong, int internal) { struct binder_proc *proc = node->proc; assert_spin_locked(&node->lock); if (proc) assert_spin_locked(&proc->inner_lock); if (strong) { if (internal) node->internal_strong_refs--; else node->local_strong_refs--; if (node->local_strong_refs || node->internal_strong_refs) return false; } else { if (!internal) node->local_weak_refs--; if (node->local_weak_refs || node->tmp_refs || !hlist_empty(&node->refs)) return false; } if (proc && (node->has_strong_ref || node->has_weak_ref)) { if (list_empty(&node->work.entry)) { binder_enqueue_work_ilocked(&node->work, &proc->todo); binder_wakeup_proc_ilocked(proc); } } else { if 
(hlist_empty(&node->refs) && !node->local_strong_refs && !node->local_weak_refs && !node->tmp_refs) { if (proc) { binder_dequeue_work_ilocked(&node->work); rb_erase(&node->rb_node, &proc->nodes); binder_debug(BINDER_DEBUG_INTERNAL_REFS, "refless node %d deleted\n", node->debug_id); } else { BUG_ON(!list_empty(&node->work.entry)); spin_lock(&binder_dead_nodes_lock); /* * tmp_refs could have changed so * check it again */ if (node->tmp_refs) { spin_unlock(&binder_dead_nodes_lock); return false; } hlist_del(&node->dead_node); spin_unlock(&binder_dead_nodes_lock); binder_debug(BINDER_DEBUG_INTERNAL_REFS, "dead node %d deleted\n", node->debug_id); } return true; } } return false; } static void binder_dec_node(struct binder_node *node, int strong, int internal) { bool free_node; binder_node_inner_lock(node); free_node = binder_dec_node_nilocked(node, strong, internal); binder_node_inner_unlock(node); if (free_node) binder_free_node(node); } static void binder_inc_node_tmpref_ilocked(struct binder_node *node) { /* * No call to binder_inc_node() is needed since we * don't need to inform userspace of any changes to * tmp_refs */ node->tmp_refs++; } /** * binder_inc_node_tmpref() - take a temporary reference on node * @node: node to reference * * Take reference on node to prevent the node from being freed * while referenced only by a local variable. The inner lock is * needed to serialize with the node work on the queue (which * isn't needed after the node is dead). If the node is dead * (node->proc is NULL), use binder_dead_nodes_lock to protect * node->tmp_refs against dead-node-only cases where the node * lock cannot be acquired (eg traversing the dead node list to * print nodes) */ static void binder_inc_node_tmpref(struct binder_node *node) { binder_node_lock(node); if (node->proc) binder_inner_proc_lock(node->proc); else spin_lock(&binder_dead_nodes_lock); binder_inc_node_tmpref_ilocked(node); if (node->proc) binder_inner_proc_unlock(node->proc); else spin_unlock(&binder_dead_nodes_lock); binder_node_unlock(node); } /** * binder_dec_node_tmpref() - remove a temporary reference on node * @node: node to reference * * Release temporary reference on node taken via binder_inc_node_tmpref() */ static void binder_dec_node_tmpref(struct binder_node *node) { bool free_node; binder_node_inner_lock(node); if (!node->proc) spin_lock(&binder_dead_nodes_lock); else __acquire(&binder_dead_nodes_lock); node->tmp_refs--; BUG_ON(node->tmp_refs < 0); if (!node->proc) spin_unlock(&binder_dead_nodes_lock); else __release(&binder_dead_nodes_lock); /* * Call binder_dec_node() to check if all refcounts are 0 * and cleanup is needed. Calling with strong=0 and internal=1 * causes no actual reference to be released in binder_dec_node(). * If that changes, a change is needed here too. 
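	 *
	 * (The call below uses the _nilocked variant directly because
	 * the node inner lock is already held at this point.)
	 *
	 * Illustrative tmp-ref lifetime, as seen elsewhere in this file:
	 *
	 *	node = binder_get_node(proc, ptr);  // takes a tmp ref
	 *	// ... node cannot be freed here ...
	 *	binder_put_node(node);              // drops it via this path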
*/ free_node = binder_dec_node_nilocked(node, 0, 1); binder_node_inner_unlock(node); if (free_node) binder_free_node(node); } static void binder_put_node(struct binder_node *node) { binder_dec_node_tmpref(node); } static struct binder_ref *binder_get_ref_olocked(struct binder_proc *proc, u32 desc, bool need_strong_ref) { struct rb_node *n = proc->refs_by_desc.rb_node; struct binder_ref *ref; while (n) { ref = rb_entry(n, struct binder_ref, rb_node_desc); if (desc < ref->data.desc) { n = n->rb_left; } else if (desc > ref->data.desc) { n = n->rb_right; } else if (need_strong_ref && !ref->data.strong) { binder_user_error("tried to use weak ref as strong ref\n"); return NULL; } else { return ref; } } return NULL; } /* Find the smallest unused descriptor the "slow way" */ static u32 slow_desc_lookup_olocked(struct binder_proc *proc, u32 offset) { struct binder_ref *ref; struct rb_node *n; u32 desc; desc = offset; for (n = rb_first(&proc->refs_by_desc); n; n = rb_next(n)) { ref = rb_entry(n, struct binder_ref, rb_node_desc); if (ref->data.desc > desc) break; desc = ref->data.desc + 1; } return desc; } /* * Find an available reference descriptor ID. The proc->outer_lock might * be released in the process, in which case -EAGAIN is returned and the * @desc should be considered invalid. */ static int get_ref_desc_olocked(struct binder_proc *proc, struct binder_node *node, u32 *desc) { struct dbitmap *dmap = &proc->dmap; unsigned int nbits, offset; unsigned long *new, bit; /* 0 is reserved for the context manager */ offset = (node == proc->context->binder_context_mgr_node) ? 0 : 1; if (!dbitmap_enabled(dmap)) { *desc = slow_desc_lookup_olocked(proc, offset); return 0; } if (dbitmap_acquire_next_zero_bit(dmap, offset, &bit) == 0) { *desc = bit; return 0; } /* * The dbitmap is full and needs to grow. The proc->outer_lock * is briefly released to allocate the new bitmap safely. */ nbits = dbitmap_grow_nbits(dmap); binder_proc_unlock(proc); new = bitmap_zalloc(nbits, GFP_KERNEL); binder_proc_lock(proc); dbitmap_grow(dmap, new, nbits); return -EAGAIN; } /** * binder_get_ref_for_node_olocked() - get the ref associated with given node * @proc: binder_proc that owns the ref * @node: binder_node of target * @new_ref: newly allocated binder_ref to be initialized or %NULL * * Look up the ref for the given node and return it if it exists * * If it doesn't exist and the caller provides a newly allocated * ref, initialize the fields of the newly allocated ref and insert * into the given proc rb_trees and node refs list. * * Return: the ref for node. It is possible that another thread * allocated/initialized the ref first in which case the * returned ref would be different than the passed-in * new_ref. new_ref must be kfree'd by the caller in * this case. 
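 *
 * Illustrative caller pattern (a sketch, assuming a preallocated ref):
 *
 *	new_ref = kzalloc(sizeof(*new_ref), GFP_KERNEL);
 *	binder_proc_lock(proc);
 *	ref = binder_get_ref_for_node_olocked(proc, node, new_ref);
 *	binder_proc_unlock(proc);
 *	if (new_ref && ref != new_ref)
 *		kfree(new_ref);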
*/ static struct binder_ref *binder_get_ref_for_node_olocked( struct binder_proc *proc, struct binder_node *node, struct binder_ref *new_ref) { struct binder_ref *ref; struct rb_node *parent; struct rb_node **p; u32 desc; retry: p = &proc->refs_by_node.rb_node; parent = NULL; while (*p) { parent = *p; ref = rb_entry(parent, struct binder_ref, rb_node_node); if (node < ref->node) p = &(*p)->rb_left; else if (node > ref->node) p = &(*p)->rb_right; else return ref; } if (!new_ref) return NULL; /* might release the proc->outer_lock */ if (get_ref_desc_olocked(proc, node, &desc) == -EAGAIN) goto retry; binder_stats_created(BINDER_STAT_REF); new_ref->data.debug_id = atomic_inc_return(&binder_last_id); new_ref->proc = proc; new_ref->node = node; rb_link_node(&new_ref->rb_node_node, parent, p); rb_insert_color(&new_ref->rb_node_node, &proc->refs_by_node); new_ref->data.desc = desc; p = &proc->refs_by_desc.rb_node; while (*p) { parent = *p; ref = rb_entry(parent, struct binder_ref, rb_node_desc); if (new_ref->data.desc < ref->data.desc) p = &(*p)->rb_left; else if (new_ref->data.desc > ref->data.desc) p = &(*p)->rb_right; else BUG(); } rb_link_node(&new_ref->rb_node_desc, parent, p); rb_insert_color(&new_ref->rb_node_desc, &proc->refs_by_desc); binder_node_lock(node); hlist_add_head(&new_ref->node_entry, &node->refs); binder_debug(BINDER_DEBUG_INTERNAL_REFS, "%d new ref %d desc %d for node %d\n", proc->pid, new_ref->data.debug_id, new_ref->data.desc, node->debug_id); binder_node_unlock(node); return new_ref; } static void binder_cleanup_ref_olocked(struct binder_ref *ref) { struct dbitmap *dmap = &ref->proc->dmap; bool delete_node = false; binder_debug(BINDER_DEBUG_INTERNAL_REFS, "%d delete ref %d desc %d for node %d\n", ref->proc->pid, ref->data.debug_id, ref->data.desc, ref->node->debug_id); if (dbitmap_enabled(dmap)) dbitmap_clear_bit(dmap, ref->data.desc); rb_erase(&ref->rb_node_desc, &ref->proc->refs_by_desc); rb_erase(&ref->rb_node_node, &ref->proc->refs_by_node); binder_node_inner_lock(ref->node); if (ref->data.strong) binder_dec_node_nilocked(ref->node, 1, 1); hlist_del(&ref->node_entry); delete_node = binder_dec_node_nilocked(ref->node, 0, 1); binder_node_inner_unlock(ref->node); /* * Clear ref->node unless we want the caller to free the node */ if (!delete_node) { /* * The caller uses ref->node to determine * whether the node needs to be freed. Clear * it since the node is still alive. */ ref->node = NULL; } if (ref->death) { binder_debug(BINDER_DEBUG_DEAD_BINDER, "%d delete ref %d desc %d has death notification\n", ref->proc->pid, ref->data.debug_id, ref->data.desc); binder_dequeue_work(ref->proc, &ref->death->work); binder_stats_deleted(BINDER_STAT_DEATH); } if (ref->freeze) { binder_dequeue_work(ref->proc, &ref->freeze->work); binder_stats_deleted(BINDER_STAT_FREEZE); } binder_stats_deleted(BINDER_STAT_REF); } /** * binder_inc_ref_olocked() - increment the ref for given handle * @ref: ref to be incremented * @strong: if true, strong increment, else weak * @target_list: list to queue node work on * * Increment the ref. 
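 * (The first strong or weak increment also takes a matching reference
 * on the underlying node via binder_inc_node(), so node lifetime
 * follows ref lifetime.)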
@ref->proc->outer_lock must be held on entry * * Return: 0, if successful, else errno */ static int binder_inc_ref_olocked(struct binder_ref *ref, int strong, struct list_head *target_list) { int ret; if (strong) { if (ref->data.strong == 0) { ret = binder_inc_node(ref->node, 1, 1, target_list); if (ret) return ret; } ref->data.strong++; } else { if (ref->data.weak == 0) { ret = binder_inc_node(ref->node, 0, 1, target_list); if (ret) return ret; } ref->data.weak++; } return 0; } /** * binder_dec_ref_olocked() - dec the ref for given handle * @ref: ref to be decremented * @strong: if true, strong decrement, else weak * * Decrement the ref. * * Return: %true if ref is cleaned up and ready to be freed. */ static bool binder_dec_ref_olocked(struct binder_ref *ref, int strong) { if (strong) { if (ref->data.strong == 0) { binder_user_error("%d invalid dec strong, ref %d desc %d s %d w %d\n", ref->proc->pid, ref->data.debug_id, ref->data.desc, ref->data.strong, ref->data.weak); return false; } ref->data.strong--; if (ref->data.strong == 0) binder_dec_node(ref->node, strong, 1); } else { if (ref->data.weak == 0) { binder_user_error("%d invalid dec weak, ref %d desc %d s %d w %d\n", ref->proc->pid, ref->data.debug_id, ref->data.desc, ref->data.strong, ref->data.weak); return false; } ref->data.weak--; } if (ref->data.strong == 0 && ref->data.weak == 0) { binder_cleanup_ref_olocked(ref); return true; } return false; } /** * binder_get_node_from_ref() - get the node from the given proc/desc * @proc: proc containing the ref * @desc: the handle associated with the ref * @need_strong_ref: if true, only return node if ref is strong * @rdata: the id/refcount data for the ref * * Given a proc and ref handle, return the associated binder_node * * Return: a binder_node or NULL if not found or not strong when strong required */ static struct binder_node *binder_get_node_from_ref( struct binder_proc *proc, u32 desc, bool need_strong_ref, struct binder_ref_data *rdata) { struct binder_node *node; struct binder_ref *ref; binder_proc_lock(proc); ref = binder_get_ref_olocked(proc, desc, need_strong_ref); if (!ref) goto err_no_ref; node = ref->node; /* * Take an implicit reference on the node to ensure * it stays alive until the call to binder_put_node() */ binder_inc_node_tmpref(node); if (rdata) *rdata = ref->data; binder_proc_unlock(proc); return node; err_no_ref: binder_proc_unlock(proc); return NULL; } /** * binder_free_ref() - free the binder_ref * @ref: ref to free * * Free the binder_ref. Free the binder_node indicated by ref->node * (if non-NULL) and the binder_ref_death indicated by ref->death. */ static void binder_free_ref(struct binder_ref *ref) { if (ref->node) binder_free_node(ref->node); kfree(ref->death); kfree(ref->freeze); kfree(ref); } /* shrink descriptor bitmap if needed */ static void try_shrink_dmap(struct binder_proc *proc) { unsigned long *new; int nbits; binder_proc_lock(proc); nbits = dbitmap_shrink_nbits(&proc->dmap); binder_proc_unlock(proc); if (!nbits) return; new = bitmap_zalloc(nbits, GFP_KERNEL); binder_proc_lock(proc); dbitmap_shrink(&proc->dmap, new, nbits); binder_proc_unlock(proc); } /** * binder_upd |